.. _AdministrationGuide:

********************
Administration Guide
********************

.. contents:: Table of Contents
   :depth: 2
   :local:
   :backlinks: none

This guide describes the deployment of the VDJServer services on
computing resources provided by the Texas Advanced Computing Center (TACC) at
The University of Texas at Austin. VDJServer is managed by
Dr Lindsay G. Cowell at UT Southwestern Medical Center. Dr Scott Christley
is the software development manager and is responsible for day-to-day operations.

If you log on to this computer system, you acknowledge your awareness
and acceptance of the UT Austin Acceptable Use Policy. The
University will prosecute violators to the full extent of the law.

TACC Usage Policies: https://portal.tacc.utexas.edu/tacc-usage-policy

Service Deployment
-----------------------

There are currently three primary deployments of VDJServer: development, staging
and production. Each resides on its own VM at TACC:
vdj-dev.tacc.utexas.edu (development), vdj-staging.tacc.utexas.edu
(staging), vdj-prod.tacc.utexas.edu (production). vdj-prod is the same
as https://vdjserver.org. These are full public deployments, meaning that Google reCAPTCHA,
emails, and other system services should be accessible and functional.

These VMs cannot be accessed by ssh from outside of TACC, so you first
need to ssh to a public TACC machine such as ls6.tacc.utexas.edu.
Furthermore, they all require 2-factor authentication
with the TACC Token app. The only publicly accessible ports are the
standard http/https ports, 80 and 443. SSL security, as explained below, is configured
on each of the deployment VMs.

Other backend VMs include:

+ vdj-rep-01: data-storage.vdjserver.org Tapis storage system, production Mongo DB
+ vdj-rep-02: production ADC API
+ vdj-rep-03: V2 API
+ vdj-rep-04: open

These backend VMs are not publicly accessible, and network traffic to these VMs
is protected by firewall rules.

VM Setup
^^^^^^^^

We try to avoid customizing the VMs when possible to reduce maintenance and allow
for services to be more easily migrated from one VM to another. The VMs need only the
following software, each of which is described in more detail in its own section below.

* docker, for VDJServer programs
* nginx, for SSL certificates and proxy
* nfs, for mounting Corral disk

Docker
^^^^^^

We use docker exclusively for running the VDJServer server programs on the VMs. Follow the
standard docker installation for CentOS: https://docs.docker.com/engine/install/centos/

Be sure to enable docker with systemctl so that it gets started on reboot::

 systemctl enable docker
 systemctl enable containerd

Also add the vdj user and any others to the docker group::

 usermod -aG docker vdj
 usermod -aG docker another_user

nginx and SSL Security
^^^^^^^^^^^^^^^^^^^^^^

SSL security is handled at the system level rather than in each server
process. Specifically, a system `nginx` is installed as a proxy that
accepts https requests and reroutes them to a local port or to a port on
another VM. Incoming non-secure http requests are redirected to https,
but proxied requests going to local server processes are sent over http. Proxied
requests to other VMs use https for additional security.
The config file `/etc/nginx/nginx.conf` should be kept simple, if possible,
routing all locations to a single port. A second `nginx`, which runs as
part of the `docker-compose` and uses http, can then handle the routing of
specific locations to specific services. This allows flexibility in
deployment without having to continually modify the system config file.

Install the system nginx with yum::

 yum install nginx

After you have nginx configured properly, make sure to enable the service
with systemctl so that it gets started on reboot::

 systemctl start nginx
 systemctl enable nginx

It's probably best not to try to create the configuration from scratch but
copy from an existing nginx configuration.
The current setup is to have one section for redirecting http to https
as shown here for the vdj-dev VM::

    server {
        listen         80;
        server_name    vdj-dev.tacc.utexas.edu www.vdj-dev.tacc.utexas.edu;
        return         301 https://vdj-dev.tacc.utexas.edu$request_uri;
    }

and another section to route the https request to the local nginx port within
the docker containers. Note that we've had to add a number of custom headers
to responses for security purposes::

    server {
        listen       443 ssl;
        server_name  vdj-dev.tacc.utexas.edu;

        #root /var/www/html/vdjserver-backbone/live-site;

        #ssl                  on;
        ssl_certificate      /etc/pki/tls/certs/vdj-dev.tacc.utexas.edu.cer;
        ssl_certificate_key  /etc/pki/tls/private/vdj-dev.tacc.utexas.edu.key;

        ssl_session_timeout  5m;

        ssl_protocols TLSv1.2;
        ssl_ciphers  EECDH+AESGCM:EDH+AESGCM:AES256+EECDH:AES256+EDH;
        ssl_prefer_server_ciphers   on;
        ssl_dhparam /etc/ssl/certs/dhparam.pem;

        if ($host ~ ^www\.) {
            rewrite ^(.*) https://vdj-dev.tacc.utexas.edu$1 permanent;
        }

        # Deny all attempts to access hidden files
        # such as .htaccess, .htpasswd, .DS_Store (Mac).
        location ~ /\. {
          deny all;
        }

        # route everything to local nginx in the VDJServer docker container
        location / {
            proxy_pass http://127.0.0.1:8080;
            proxy_http_version 1.1;
            proxy_set_header Upgrade $http_upgrade;
            proxy_set_header Connection "upgrade";
            proxy_read_timeout 86400;
            # additional security headers required by TACC
            add_header Strict-Transport-Security "max-age=31536000; includeSubDomains" always;
            add_header X-Frame-Options SAMEORIGIN always;
            add_header X-Content-Type-Options nosniff always;
            add_header Content-Security-Policy "default-src https: wss: 'self' 'unsafe-inline' 'unsafe-eval'" always;
            add_header X-XSS-Protection "1" always;
            add_header Referrer-Policy "strict-origin-when-cross-origin" always;
            add_header Permissions-Policy "geolocation=(self)" always;
        }

    }

Whenever you change nginx.conf, you need to reload it in the running service::

 nginx -s reload

With V1, all of the components (nginx, web, api) fit on one VM, but in V2 we
add the data repository (repository), which should have its own VM, and there may
be additional services added later.

Corral Disk
^^^^^^^^^^^

Some API services require direct access to the Corral project disks so that they can
access files more efficiently than by going through the Tapis API.
TACC needs to enable the NFS mount for any VM that will access the disks.
To mount the disk, the VM needs the NFS software. We don't enable any of the
systemd services because we only want the NFS client and do not want to run an NFS server::

 sudo yum install nfs-utils

Create the mount folder and set its permission::

 sudo mkdir /vdjZ
 sudo chown vdj /vdjZ
 sudo chgrp G-803419 /vdjZ
 sudo chmod a+rw /vdjZ

An entry needs to be put into the system `/etc/fstab` so that it is mounted on VM start::

 129.114.52.166:/corral/main/projects/vdjZ   /vdjZ   nfs   rw,proto=tcp,nfsvers=3,nosuid,rsize=1024768,wsize=1024768,intr 0 0

Note that only the vdj account is allowed to write to the project disk; even root is
not allowed. This means you need to switch to the vdj account if you want to work
with the files at the command line. Also, API service programs must change their
group and user in order to access the disk.
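
As a quick check that the ``/etc/fstab`` entry took effect, the following sketch scans
mount-table text (in the format of ``/proc/mounts`` on Linux) for the ``/vdjZ`` mount point.
The ``find_mount`` helper is an illustrative name and is not part of the VDJServer tooling.

.. code-block:: python

   def find_mount(mounts_text, mount_point="/vdjZ"):
       """Return (device, fstype, options) for mount_point, or None if absent.

       mounts_text is expected in /proc/mounts format:
       device mount_point fstype options dump pass
       """
       for line in mounts_text.splitlines():
           fields = line.split()
           if len(fields) >= 4 and fields[1] == mount_point:
               return fields[0], fields[2], fields[3].split(",")
       return None

On a VM, reading ``/proc/mounts`` and passing its contents to ``find_mount`` should return
the Corral device; a result of ``None`` suggests the disk is not mounted.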

Tapis V3
^^^^^^^^

The Tapis V3 Tenant server for VDJServer is `vdjserver.tapis.io`, and
the staging server is `vdjserver.staging.tapis.io`. Each needs to be
separately configured with clients, token accounts, storage and execution
systems, permissions, and apps. Token accounts are created by Tapis
administrators; these are long-lived token accounts separate from
standard user accounts that are used for automated testing and command
line automation. These accounts cannot be used to log in to the VDJServer
web portal because they lack OAuth authentication. The JSON description
files for systems and apps are stored in the
`vdjserver-tapis <https://github.com/vdjserver/vdjserver-tapis>`_ repository
with setup instructions. Setup is performed at the command line using the
`vdjserver-tools` python program which resides in that repository.

TACC firewall rules need to be defined to allow Tapis machines access to the
backend VMs. The two primary uses are for file transfers and for database access
through the Meta API.

Google Email Accounts
^^^^^^^^^^^^^^^^^^^^^

A couple of Google email accounts have been created for testing purposes. There
is also a Google email account for centralizing study curation for the ADC. These
accounts can be used to log in to the VDJServer web portal.

VDJServer Development
^^^^^^^^^^^^^^^^^^^^^

This deployment is meant to support the development process. While a significant amount of
development can be done on a local machine, there are a number of functions that require the
deployment environment to work properly. Some of these include:

* Google captcha.
* Notifications from Tapis.
* Access to TACC restricted resources.

Set up directory for vdj account::

 mkdir -p /var/www/docker
 chown vdj /var/www/docker
 chgrp G-803419 /var/www/docker

Then, as the vdj account, pull down the appropriate source repositories and set up services::

 su - vdj
 cd /var/www/docker
 git clone https://bitbucket.org/vdjserver/vdjserver-web.git

Avoid doing source code development with the vdj account, i.e., avoid doing commits
and pushes with the source code in /var/www/docker. Use it strictly for pulling changes.
For development, use your personal account and manually start/stop the services to test.

VDJServer Production
^^^^^^^^^^^^^^^^^^^^

This is the production deployment of VDJServer. Care should be taken in making any
changes to minimize disruption to users.

VDJServer Staging
^^^^^^^^^^^^^^^^^^^^

This is the staging deployment of VDJServer. 


Additional VMs
^^^^^^^^^^^^^^

There are four additional VMs that can be used for running API services.

* vdj-rep-01: This is the current production machine for the Mongo DB. It is also
  the Tapis storage system `data-storage.vdjserver.org`, which is actually a proxy to access the
  Corral project storage mounted at `/vdjZ`.
* vdj-rep-02: This is the current production machine for VDJServer ADC API.
* vdj-rep-03: This is the current staging machine for VDJServer API V2.
* vdj-rep-04: This is currently open.

API Ports
^^^^^^^^^

VDJServer V1 only had the single API process, but V2 has introduced additional services. To
avoid conflict, we try to use unique ports for each service. To complicate matters, most
services run within docker containers and internal ports can be exposed differently, but we
try to use the same port number for both.

* 80/443: nginx https proxy (host machine).
* 8080: nginx proxy (inside docker container).
* 8081: VDJServer API V2, `/api/v2`.
* 8020: VDJServer ADC API, `/airr/v1`.
* 8021: VDJServer ADC Async API, `/airr/async/v1`.
* 8025: VDJServer iR+ Stats API, `/irplus/stats/v1`.
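
The port assignments above can be captured in a small table and checked for collisions
before a new service is added. This is a minimal sketch; the ``SERVICE_PORTS`` dictionary
and the ``find_conflicts`` helper are illustrative names, not part of the VDJServer code.

.. code-block:: python

   # Port assignments copied from the list above (host nginx on 80/443 omitted).
   SERVICE_PORTS = {
       "nginx proxy (docker)": 8080,
       "VDJServer API V2": 8081,
       "VDJServer ADC API": 8020,
       "VDJServer ADC Async API": 8021,
       "VDJServer iR+ Stats API": 8025,
   }


   def find_conflicts(ports):
       """Return {port: [service names]} for ports claimed by more than one service."""
       by_port = {}
       for name, port in ports.items():
           by_port.setdefault(port, []).append(name)
       return {port: names for port, names in by_port.items() if len(names) > 1}

An empty result from ``find_conflicts(SERVICE_PORTS)`` means every service has its own port.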

Processes and Checklists
------------------------

New Release of the VDJServer Web Portal
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Deploying a new release of the VDJServer Web Portal involves
two main steps.

* Create new releases for all of the repositories.
* Deploy the new release on the production (or staging, dev) VM.

Create Release of Repositories
++++++++++++++++++++++++++++++

Create a new release for each component that has changed. Some of
these are libraries that are used by multiple services:

* vdjserver-schema
* vdj-tapis-js

Then the web portal submodules are:

* vdjserver-web-api
* vdjserver-web-backbone
* vdjserver-web-nginx
* vdjserver-web-plumber

To make a release, merge all the applicable pull requests. If there are no changes
to the submodule, then we do not need to make a new release for that submodule.
For submodules that have changes, we need to determine the next version number
and set it in the appropriate file.

* For vdjserver-schema, make sure the new version number is in `package.json`.
* For vdj-tapis-js, make sure the new version number is in `package.json`.

* For vdjserver-web, there is no file with a version number.
* For vdjserver-web-api, make sure the new version number is in `package.json`.
* For vdjserver-web-backbone, make sure the new version number is in `package.json`.
* For vdjserver-web-nginx, there is no file with a version number.
* For vdjserver-web-plumber, there is currently no file with a version number.
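
For the components that keep their version in `package.json`, a small helper can read the
current value and propose the next one. This is only a sketch: it assumes version numbers
of the form ``MAJOR.MINOR.PATCH`` (such as ``2.8.0``), and both function names are
hypothetical.

.. code-block:: python

   import json


   def version_from_package_json(text):
       """Return the "version" string from the text of a package.json file."""
       return json.loads(text)["version"]


   def next_version(version, part="minor"):
       """Bump a MAJOR.MINOR.PATCH version string by one in the given part."""
       major, minor, patch = (int(x) for x in version.split("."))
       if part == "major":
           return f"{major + 1}.0.0"
       if part == "minor":
           return f"{major}.{minor + 1}.0"
       if part == "patch":
           return f"{major}.{minor}.{patch + 1}"
       raise ValueError(f"unknown version part: {part}")

Which part to bump for a given release remains a judgment call for the release manager.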

For vdjserver-web, we should use a clean git clone directory.

* ``git clone https://github.com/vdjserver/vdjserver-web.git vdjserver-web-clean``
* ``cd vdjserver-web-clean`` and then ``git submodule update --init --recursive``
* For each submodule, check out the master branch and pull all the new changes.
* If the submodule has its own submodules, then run the submodule init command to bring them up to date.
* Finally, commit those submodules and create a new release of vdjserver-web.

Deploy Release to VM at TACC
++++++++++++++++++++++++++++

* ssh login to ls6.tacc.utexas.edu with your user account
* ssh login to vdj-prod.tacc.utexas.edu (production VM) or vdj-staging.tacc.utexas.edu (staging VM) or vdj-dev.tacc.utexas.edu (dev VM)
* ``docker ps`` will show if services are currently running. Also ``systemctl status vdjserver`` will show if services are automatically started on the system.
* ``sudo bash`` to become root.
* ``su - vdj`` to become the vdj account.
* ``cd /var/www/docker`` where the code resides.
* git clone into a new version directory, e.g. for version 2.8.0, ``git clone https://github.com/vdjserver/vdjserver-web.git vdjserver-web-v2.8.0``
* cd into that new version directory and initialize the submodules ``git submodule update --init --recursive``
* migrate the configuration files

  * For vdjserver-web-api, this is the ``.env`` file.
  * For vdjserver-web-backbone, this is the ``environment-config.js`` file.

* cd into the docker-compose/v2 directory from the top-level vdjserver-web directory and setup ``.env`` to point to appropriate DATA_ROOT directory.
* do a ``docker compose build``
* If everything builds ok, exit this terminal to go back to root user.
* stop the current running service with ``systemctl stop vdjserver``
* ``cd /var/www/docker`` and swap the current version directory with the new version.

  * rename the ``vdjserver-web`` directory to old version name, e.g. ``mv vdjserver-web vdjserver-web-v2.7.0``
  * rename the new version directory to ``vdjserver-web``, e.g. ``mv vdjserver-web-v2.8.0 vdjserver-web``

* start the service with ``systemctl start vdjserver``
* after the service is up, verify that you can login.
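
The directory swap in the steps above can be sketched in Python as follows. The
``swap_release`` function is a hypothetical helper, not an existing VDJServer script; it
refuses to proceed if the new release directory is missing or if the archive name is
already taken, which guards against the most likely mistakes.

.. code-block:: python

   import os


   def swap_release(base_dir, old_version, new_version, name="vdjserver-web"):
       """Rename the live directory to its old version name, then promote the new one.

       Mirrors the two ``mv`` commands above; run only while the service is stopped.
       """
       live = os.path.join(base_dir, name)
       old = os.path.join(base_dir, f"{name}-v{old_version}")
       new = os.path.join(base_dir, f"{name}-v{new_version}")
       if not os.path.isdir(new):
           raise FileNotFoundError(f"new release directory missing: {new}")
       if os.path.exists(old):
           raise FileExistsError(f"archive directory already exists: {old}")
       os.rename(live, old)
       os.rename(new, live)
       return old, live

For example, ``swap_release("/var/www/docker", "2.7.0", "2.8.0")`` would perform the same
renames as the example commands in the list.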

The disk space on these VMs is pretty small, and the docker images will start filling up the space, so
it's a good idea to prune the images and volumes occasionally.

* ``df -h .`` will show you the current disk usage.
* ``docker system prune --volumes`` will clean up docker disk space.
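
If the disk check is to be scripted, for example from a periodic job, the Python standard
library can report usage directly. This is a minimal sketch; the 80% threshold is an
arbitrary assumption, and the percentage may differ slightly from what ``df`` reports
because ``df`` excludes reserved blocks from its calculation.

.. code-block:: python

   import shutil


   def disk_usage_percent(path="."):
       """Return the percentage of disk space in use on the filesystem holding path."""
       usage = shutil.disk_usage(path)
       return 100.0 * usage.used / usage.total


   def needs_prune(path=".", threshold=80.0):
       """True when usage meets or exceeds threshold (the 80% default is arbitrary)."""
       return disk_usage_percent(path) >= threshold
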

Start/Stop Services
^^^^^^^^^^^^^^^^^^^

Maintenance Mode
^^^^^^^^^^^^^^^^

Putting VDJServer in maintenance mode displays a message on the home page and prevents users
from logging in, as well as disabling other home page functionality such as creating an account or
resetting a password. VDJServer may need to be put in maintenance mode for a number of reasons:

* TACC or Tapis is experiencing issues that prevent VDJServer from working properly.
* VDJServer itself has an issue that prevents it from working properly.
* VDJServer is going through a significant upgrade.

The `environment-config.js` configuration file has a simple mechanism to enable maintenance
mode and display a maintenance message:

1. Log in to vdj-prod and become root.
2. Go to the directory that holds the active `environment-config.js`; this is typically
   `/var/www/docker/vdjserver-web/vdjserver-web-backbone/docker/environment-config/run`
3. Make a backup: `cp environment-config.js environment-config.js.bak`
4. Open `environment-config.js` in an editor such as emacs
5. Change `maintenance` value to `true`
6. Change `maintenanceMessage` to the message to be displayed on the home page.
7. Save the file. Changes are active immediately.
8. Load https://vdjserver.org to verify maintenance mode and the display message.

Disabling maintenance mode can be done by replacing `environment-config.js` with the
backup file, or editing it directly:

1. Change `maintenance` value to `false`
2. Save the file. Changes are active immediately.
3. Load https://vdjserver.org and verify you can login.

These instructions are for the production website. If necessary, put the staging and
development deployments in maintenance mode, or shut down the services to prevent access.

This maintenance mode only applies to the VDJServer GUI. Other VDJServer services like
the Web API and the ADC repository will still be active. These services have to be shut down
if access needs to be disabled.

SSL Certificate for vdjserver.org
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

NOTE: OBSOLETE. This section is obsolete and needs to be rewritten.

Once a year, about a month before the expiration date, UTSW's SysOps will send an email
indicating that the vdjserver.org certificate will expire. Installing a new certificate involves
these main steps:

+ Submit a TACC request for certificate renewal.
+ TACC generates a certificate signing request (CSR) and private key for vdjserver.org
+ Submit a UTSW ServiceDesk request with the CSR
+ Pay for the new certificate
+ Install the new certificate into vdjserver.org

Because TACC controls the vdjserver.org domain, they have to generate the certificate
renewal. Generally, after getting TACC to respond to the certificate renewal request,
they can be put on the same email chain as a UTSW person, and together they work through
the steps of generating a new certificate.

Sometimes TACC will install the certificate for us, but in case they just give us
the files, then we can install them. There are two files needed, a private key file
and the certificate file. When installing, make copies of the existing certificates files
and be careful not to accidentally overwrite or delete them. The private key file
is put in the `/etc/pki/tls/private` directory, and the certificate file is put in
the `/etc/pki/tls/certs` directory. In both cases, there is a `backup`
subdirectory to put backup copies in, e.g., with these commands, where `YEAR` is the active
year for the certificate:

+ ssh to vdjserver.org and become root
+ cd /etc/pki/tls/certs
+ copy certificate file to vdjserver.org.cer.YEAR
+ cp vdjserver.org.cer.YEAR backup
+ cd /etc/pki/tls/private
+ copy private key file to vdjserver.org.key.YEAR
+ cp vdjserver.org.key.YEAR backup

If there are multiple files in the directory, and it is not clear which are the
current files, look in the nginx config file, `/etc/nginx/nginx.conf`, and
`ssl_certificate` and `ssl_certificate_key` will have the full path to the files.

If you copy the new files over the old files then there is no need to modify the
nginx config file, but I suggest using a `YEAR` suffix to keep the files separate.
This helps avoid accidentally overwriting a file.

+ edit /etc/nginx/nginx.conf and set `ssl_certificate` and `ssl_certificate_key` to
  the absolute paths to the certificate and private key files.
+ restart nginx with `systemctl restart nginx`
+ check nginx is running and no errors with `systemctl status nginx`

Verify that the new certificate is installed by going to vdjserver.org from your
browser. You may need to refresh and/or clear your cache. Check the certificate
and verify it has a new expiration date.
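
The expiration date can also be checked without a browser. The sketch below uses only the
Python standard library; the function names are illustrative, and ``fetch_not_after`` needs
network access to the host being checked.

.. code-block:: python

   import socket
   import ssl
   import time


   def days_until_expiry(not_after, now=None):
       """Days from now until a certificate 'notAfter' string such as
       'Jan  5 09:34:43 2018 GMT' (the format returned by getpeercert())."""
       now = time.time() if now is None else now
       return (ssl.cert_time_to_seconds(not_after) - now) / 86400.0


   def fetch_not_after(host, port=443, timeout=10):
       """Connect over TLS and return the server certificate's 'notAfter' field."""
       context = ssl.create_default_context()
       with socket.create_connection((host, port), timeout=timeout) as sock:
           with context.wrap_socket(sock, server_hostname=host) as tls:
               return tls.getpeercert()["notAfter"]

Calling ``days_until_expiry(fetch_not_after("vdjserver.org"))`` after installation should
give a value close to the full validity period of the new certificate.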

Note that these instructions only apply to the vdjserver.org production machine. The vdj-staging,
vdj-dev, and other VMs are in the tacc.utexas.edu domain. If the certificate expires
for any of them, submit a TACC request and they will update.

VDJServer Users Mailing List
----------------------------

NOTE: OBSOLETE. This section is obsolete and needs to be rewritten.

We utilize UTSW's mailing list service, running GNU mailman, to manage VDJServer's user
mailing list. Currently, the process is not automated and new users must be manually
added to the mailing list. Automating is difficult as the mailing list administration
can only be accessed on UTSW's internal network, which is not accessible from the TACC
VMs running the VDJServer code. There are two essential tasks: 1) generate list of new
user accounts, and 2) add new emails to the mailing list.

The email account for the mailing list is `vdjserver-users@lists.utsouthwestern.edu`

The script to list user accounts is part of the `vdjserver-repair` repository. Here are
the steps to generate a list of new user accounts:

1. Log in to TACC (stampede2, etc.) and become vdj.
2. `source vdjserver.env` to setup environment.
3. Go to $WORK directory, then `cd ../common/vdjserver-repair`
4. `module load python3`
5. `python3 list_all_users.py`
6. To keep historical records, the `users` subdirectory contains dated files with
   the list of all users, e.g. `agave_users_Jul_13_2021.txt`, `agave_users_Jul_6_2020.txt`,
   and so on. Also, there is a file with the last user added, e.g. `last_user_Jul_13_2021.txt`,
   `last_user_Jul_6_2020.txt`, and so on. Run the script again and send the output to a
   file with the current date.
7. `python3 list_all_users.py >users/agave_users_MON_DAY_YEAR.txt`
8. `cd users`
9. Now you need to extract just the new users created since the last time. Open the
   `agave_users_MON_DAY_YEAR.txt` file in an editor, search for the user account for
   the last user added from the previous time period, then copy/paste the rows and save
   into file `users_to_subscribe.txt`. Also, put the last user in the list into a new
   file with the current date, `last_user_MON_DAY_YEAR.txt`.
10. Finally, extract just the email address from `users_to_subscribe.txt`.
11. `awk '{print $2}' users_to_subscribe.txt`
12. You can copy/paste that list of emails into the mailing list administration website
    as described below. There is no need to worry about removing duplicate emails because
    the mailing list will automatically filter those out.
13. Lastly, add, commit and push the new files with `git` to the repository so the information
    is saved.
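
The manual copy/paste in steps 9 through 11 could be scripted along these lines. This is a
sketch under stated assumptions, not an existing part of `vdjserver-repair`: it assumes each
row of the listing is whitespace-separated with the username in the first column and the
email in the second (the column that the ``awk`` command extracts), and that rows appear in
account creation order.

.. code-block:: python

   def new_user_emails(listing_lines, last_user):
       """Return emails for rows that come after last_user in the listing.

       Assumes whitespace-separated rows with the username in column 1 and the
       email in column 2 (the column that ``awk '{print $2}'`` extracts).
       """
       emails = []
       seen_last = False
       for line in listing_lines:
           fields = line.split()
           if len(fields) < 2:
               continue  # skip blank or malformed rows
           if seen_last:
               emails.append(fields[1])
           elif fields[0] == last_user:
               seen_last = True
       if not seen_last:
           raise ValueError(f"last user {last_user!r} not found in listing")
       return emails

Raising an error when the last user is not found avoids silently subscribing nobody, or
everybody, if the wrong name is supplied.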

Administration of the VDJServer mailing list requires being on the UTSW internal network.
Also, the website is not secure (http versus https) and some browsers will automatically try
to switch to https. This can usually be overcome by opening a private (incognito) browser
window.

1. Open a private (incognito) browser window.
2. Go to `http://lists.utsouthwestern.edu/mailman/listinfo/vdjserver-users`
3. At the bottom of the screen is a link to the VDJServer-users administrative interface:
   `http://lists.utsouthwestern.edu/mailman/admin/vdjserver-users`
4. Authenticate with the admin password.
5. From the configuration categories, click `Membership Management...` then click
   `Mass Subscription`.
6. There is a text box labeled `Enter one address per line below`, copy/paste the list of
   emails into that text box, and click the `Submit Your Changes` button to mass subscribe
   all the emails. The result page should list the emails successfully subscribed, and
   any duplicates will be automatically filtered out. Emails will also be sent to the mailing list
   owners indicating that the emails have been successfully subscribed.

Project Data Loads into the AIRR Data Commons
---------------------------------------------

VDJServer has two processes for publicly sharing project data. The first is project publishing
which makes metadata, files and analysis results available for read-only access by the public. Project
publishing can be done by the user, as it is quick, and mainly involves setting permissions on the data.
The second is loading project data into the VDJServer repository for the AIRR Data Commons.
This is manually initiated by a VDJServer administrator because loading the rearrangement data
can take a long time, and additional processes should occur, like generating the download cache
and statistics, before the data is made public.

Loading rearrangement data can be very time-consuming, not just due to the amount of data to be loaded
but in particular because the `junction_suffixes` index, which is used to optimize CDR3 substring searches,
imposes significant overhead. Furthermore, we do not want to load data into the production database that is currently
responding to public queries, because query results may be actively changing while data is being loaded. This
has the potential to generate unreproducible results for users. Because of this, we have designed a system
with two features for optimization.

+ Double-buffering scheme. One set of collections for production queries, and another set of collections
  for data loading. The roles of the collections are switched when the current data load collections are
  put into production, and conversely the current production collections are set up for data loading. This
  also means the studies need to be loaded twice.
+ Delete the `junction_suffixes` index for the data loading collections.

In general, the production VDJServer repository (`https://vdjserver.org/airr/v1`) points to the production
query collections, while the staging VDJServer repository (`https://vdj-staging.tacc.utexas.edu/airr/v1`)
points to the data loading collections. Setting the collections for a service involves changing the
environment config file for `MONGODB_QUERY_COLLECTION` and `MONGODB_LOAD_COLLECTION` to either `_0` or
`_1` based upon which collections are which. These settings are in the VDJServer repository service
and in the VDJServer API service. The former handles queries and the statistics cache, while the latter handles
the download cache and the data loading functions. 
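
The role switch amounts to exchanging the two suffix values. The following sketch shows
the idea on a dictionary of settings; ``flip_collections`` is a hypothetical helper, and
how the settings are read from and written back to each service's environment config file
is left out.

.. code-block:: python

   def flip_collections(settings):
       """Return a copy of settings with the query and load collection suffixes swapped.

       Expects MONGODB_QUERY_COLLECTION and MONGODB_LOAD_COLLECTION to hold
       "_0" and "_1" in either order.
       """
       query = settings["MONGODB_QUERY_COLLECTION"]
       load = settings["MONGODB_LOAD_COLLECTION"]
       if {query, load} != {"_0", "_1"}:
           raise ValueError(f"expected one of each suffix, got {query!r} and {load!r}")
       flipped = dict(settings)
       flipped["MONGODB_QUERY_COLLECTION"] = load
       flipped["MONGODB_LOAD_COLLECTION"] = query
       return flipped

The check that the two values differ catches a configuration in which queries and data
loading would point at the same set of collections.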

New Release of the VDJServer Repository
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

We strive to perform a periodic release, e.g., every 3-6 months, of the VDJServer repository
so that any loaded studies will be made publicly available. The primary steps are:

+ Create the `junction_suffixes` index on the rearrangement collection.
+ Ensure that the ADC Download Cache has been populated for the new studies so that they can be downloaded
  when the repository is made public.
+ Ensure that the Statistics Cache has been populated for the new studies so that they will be
  available when the repository is made public.
+ Write a brief release announcement and update the :ref:`Release Announcements <RepositoryReleaseAnnouncements>`.
+ Inform iReceptor (and possibly others) so they can perform any updates. Generally, the staging collection
  should be tested on iReceptor Gateway staging to expose any issues with increased database size.
+ Flip the values for the `MONGODB_QUERY_COLLECTION` and `MONGODB_LOAD_COLLECTION` settings in the
  environment config for both production services and restart the services. This will make the database live.
+ Verify that queries are working to the new production database in VDJServer CDP. Check that the new
  studies are available with statistics and can be downloaded.
+ Send an email with the release announcement to the VDJServer Users Mailing List.

Prepare the staging VDJServer Repository for Data Loading
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Assuming a new release of the VDJServer Repository was put into production, the staging repository
should be set up so that it can be used for data loading.

+ Flip the values for the `MONGODB_QUERY_COLLECTION` and `MONGODB_LOAD_COLLECTION` settings in the
  environment config for the staging VDJServer repository service and restart the service.
+ Verify that the `MONGODB_LOAD_COLLECTION` for VDJServer API is pointing to the data loading collections.
  We do not want to accidentally load data into the production collections.
+ Delete the `junction_suffixes` index on the data loading rearrangement collection.

Because of the double-buffering scheme, the staging repository, which used to be the production
repository, is missing all of the new studies that were recently made public. Therefore, the
first steps should be:

+ Determine the set of studies that were newly made public.
+ Load those studies into the staging repository.

After those studies have been loaded, the staging repository should be identical to the production
repository. Verify that the number of repertoires and rearrangement counts match between the
two collections. Finally, start loading any new studies.
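
The count verification can be expressed as a simple comparison once the counts have been
gathered. This sketch assumes the counts are already available as dictionaries keyed by a
study identifier; how they are obtained from each repository is not shown, and
``count_mismatches`` is an illustrative name.

.. code-block:: python

   def count_mismatches(production, staging):
       """Compare per-study counts and return {study_id: (prod, staging)} where they differ.

       Each argument maps a study identifier to a (repertoire_count,
       rearrangement_count) tuple; a study missing on one side is reported as None.
       """
       mismatches = {}
       for study in sorted(set(production) | set(staging)):
           prod_counts = production.get(study)
           staging_counts = staging.get(study)
           if prod_counts != staging_counts:
               mismatches[study] = (prod_counts, staging_counts)
       return mismatches

An empty result indicates the two sets of collections agree and loading of new studies can begin.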

.. toctree::
   :maxdepth: 1