Browser Mirrors

* [[Media:doDownloads.sh]] - Script for downloading selected files from hgdownload.cse.ucsc.edu
* [[Media:doUpdateDb.sh]] - Script for updating local mysql databases with downloaded files.
* [[Media:databases]] - File identifying which databases to mirror.
* [[Media:gbdb.exclude]] - File identifying directories to be excluded when rsyncing /gbdb.

Revision as of 15:34, 10 August 2006

This page contains information for users interested in mirroring the UCSC Genome Browser on their own servers. See also http://genome.ucsc.edu/mirror.html

Partial Mirrors

A complete mirror of all assemblies requires a large amount of disk space (currently on the order of a terabyte). However, it is fairly easy to set things up so that only a subset of the assemblies is mirrored. The following scripts and auxiliary files are used for this purpose at Cornell (http://genome-mirror.bscb.cornell.edu).

These programs are run nightly by cron, using the following crontab entry:

0 0 * * * /usr/data/mirror-download/doAll.sh 

where doAll.sh is a simple wrapper for doDownloads.sh and doUpdateDb.sh:

#!/bin/bash -e

# do downloads and updates
# for use with cron

echo "#####################################"
/usr/data/mirror-download/doDownloads.sh

echo "#####################################"
/usr/data/mirror-download/doUpdateDb.sh

echo "#####################################"
echo "Successfully updated mirror."

To mirror a new database, simply add the name (e.g., galGal3) to the databases file and, if necessary, delete it from gbdb.exclude. Note that doDownloads.sh and doUpdateDb.sh can be run manually, either on all databases or just on selected databases (see -d option). The "dry-run" (-n) option is handy for seeing what they will do without actually making any changes.
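
The step of registering a new database could be scripted along the following lines. This is only a sketch: the function name add_mirror_db is hypothetical, and it assumes that both the databases file and gbdb.exclude hold one name per line and that the entry in gbdb.exclude matches the database name exactly, which the scripts themselves may not require.

```shell
# Hypothetical helper (not part of doDownloads.sh or doUpdateDb.sh).
# Assumption: both files contain one entry per line.
add_mirror_db() {
    amd_db="$1"          # database name, e.g. galGal3
    amd_databases="$2"   # path to the databases file
    amd_exclude="$3"     # path to the gbdb.exclude file

    # Append the name only if it is not already listed (exact whole-line match).
    if ! grep -qxF -- "$amd_db" "$amd_databases" 2>/dev/null; then
        printf '%s\n' "$amd_db" >> "$amd_databases"
    fi

    # Drop the name from the exclude list, if that file exists.
    if [ -f "$amd_exclude" ]; then
        amd_tmp="$(mktemp)"
        # grep -v exits non-zero when nothing is left; that is not an error here.
        grep -vxF -- "$amd_db" "$amd_exclude" > "$amd_tmp" || true
        cat "$amd_tmp" > "$amd_exclude"
        rm -f "$amd_tmp"
    fi
}
```

For example, add_mirror_db galGal3 databases gbdb.exclude would leave galGal3 listed once in databases and absent from gbdb.exclude. Running the download and update scripts with the dry-run option afterwards is a cheap way to check the effect before any data is transferred.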

These programs are a work in progress. One minor problem is that they contain hardcoded paths to /usr/data/mirror-download, which is the working directory on our system; this is easy to change.

Another issue is that we want to allow for local tracks in addition to mirrored tracks. As a result, we cannot use the --delete option when we rsync files to /gbdb, and we cannot simply overwrite the hgcentral tables with the hgcentral.sql file provided on hgdownload (see the comments in the scripts). A related issue is that updates on hgdownload to trackDb tables currently overwrite our own local versions of these tables, so we have to rebuild them from our local trackDb.ra files each time this happens. Addressing these problems will require some programming to merge updated data from hgdownload with our own local data.

Users who do not maintain their own local files need not worry about these issues, but they may want to edit the scripts to use the --delete option when rsyncing /gbdb and to download and load hgcentral.sql.
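
The choice between keeping local files and making an exact copy of /gbdb comes down to whether rsync is given --delete. The sketch below only prints the command it would run, so it is safe to try. The function name gbdb_rsync_cmd is hypothetical, and the use of --exclude-from with gbdb.exclude and the rsync://hgdownload.cse.ucsc.edu/gbdb/ source path are assumptions rather than a copy of what doDownloads.sh does.

```shell
# Hypothetical sketch: print an rsync command for /gbdb.
# First argument "yes" keeps local-only files (no --delete);
# anything else requests an exact mirror (adds --delete).
gbdb_rsync_cmd() {
    grc_keep_local="$1"
    grc_exclude="$2"     # path to gbdb.exclude (assumed to be an rsync exclude list)

    grc_opts="-avz --exclude-from=$grc_exclude"
    if [ "$grc_keep_local" != "yes" ]; then
        # Without local tracks, files removed upstream can be removed locally too.
        grc_opts="$grc_opts --delete"
    fi

    # Source URL is an assumption; adjust it to match the download script.
    echo "rsync $grc_opts rsync://hgdownload.cse.ucsc.edu/gbdb/ /gbdb/"
}
```

Calling gbdb_rsync_cmd yes gbdb.exclude corresponds to the setup with local tracks described above, while gbdb_rsync_cmd no gbdb.exclude corresponds to a mirror with no local files. Note that rsync does not delete excluded paths on the receiving side unless --delete-excluded is also given.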