Minimal Browser Installation
NOTE: This page is not necessarily maintained with the most up-to-date information. The information in our mirror instructions is current.
Most browser installations carry only a subset of the data in the UCSC Genome Browser, typically all data for a selection of genomes. A mirror with a subset of genomes is often called a "partial mirror". Instead of rsyncing everything for some genomes, as documented in the Mirror Instructions or on this wiki (Browser installation), you may want only a subset of the data from one genome, or a genome browser for an entirely new genome. This is more work, but it is possible. We call this a "Minimal Browser" in the following. This page describes a minimal browser that displays only the genome sequence and the gaps. It is admittedly not a very useful genome browser, but it will get you started.
A license is required for commercial download and/or installation of the Genome Browser binaries and source code. No license is needed for academic, nonprofit, and personal use. To purchase a license, see our License Instructions.
Please see the full discussion in the README.* files in the source tree directory src/product/, and the scripts in src/product/scripts/ that assist with these browser installation procedures.
A minimal browser database needs six tables:
The gateway page needs the hgcentral database to function. The hgcentral database can be copied directly from the MySQL data files on the FTP server ftp://hgdownload.soe.ucsc.edu/mysql/hgcentral or loaded from the SQL text file at http://hgdownload.soe.ucsc.edu/admin/hgcentral.sql
Enter a defaultGenome=<your species> specification in your /cgi-bin/hg.conf file. See notes in the src/product/ex.hg.conf file for this option.
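As a minimal sketch of this step, the shell function below sets the defaultGenome line in an hg.conf file, replacing an existing entry if there is one. The function name set_default_genome, the example path, and the example genome value are illustrative assumptions and not part of the source tree; src/product/ex.hg.conf remains the reference for the option itself.

```shell
# Hypothetical helper: set defaultGenome=<genome> in an hg.conf file.
# Usage: set_default_genome /path/to/cgi-bin/hg.conf "Human"
set_default_genome() {
    conf=$1
    genome=$2
    tmp=$(mktemp)
    # Keep every line except an existing defaultGenome entry.
    # grep exits non-zero when nothing is kept, so ignore its status.
    grep -v '^defaultGenome=' "$conf" > "$tmp" || true
    # Append the new setting.
    printf 'defaultGenome=%s\n' "$genome" >> "$tmp"
    # Write back through cat so the original file keeps its permissions.
    cat "$tmp" > "$conf"
    rm -f "$tmp"
}
```

The value should match the genome name your hgcentral database uses for the assembly you want shown by default.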
For the /gbdb/ data area, at a minimum you will need the .2bit file or the nib files for the assembly. This is either:
/gbdb/<database>/<database>.2bit or for older genome assemblies: /gbdb/<database>/nib/*.nib
Various tracks use other files in this directory. If you don't care about all the tracks, you won't need other files here.
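To check that this minimum is in place before starting the browser, something like the following sketch may help. It reports whether a .2bit file or nib files are present for a database. The function name sequence_source and the optional second argument (an alternative /gbdb root, handy for testing) are assumptions made for this example.

```shell
# Hypothetical helper: report which sequence files exist for a database.
# Usage: sequence_source <database> [gbdb-root]
# Prints "2bit", "nib", or "missing" (and returns non-zero when missing).
sequence_source() {
    db=$1
    gbdb=${2:-/gbdb}
    if [ -f "$gbdb/$db/$db.2bit" ]; then
        echo "2bit"
    elif ls "$gbdb/$db/nib/"*.nib >/dev/null 2>&1; then
        # Older assemblies keep one .nib file per sequence.
        echo "nib"
    else
        echo "missing"
        return 1
    fi
}
```

For example, `sequence_source hg19` prints `2bit` when /gbdb/hg19/hg19.2bit exists.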
For the genbank sequences, you can check the gbExtFile table for your database to see exactly which files are used by that assembly in /gbdb/genbank/
Extract the "path" column from that table and use that list in a --files-from specification for your rsync.
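The two steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not a tested mirror script: the function names are made up for this example, the paths in gbExtFile are assumed to begin with /gbdb/genbank/, and the rsync source URL is an assumption that you should check against the current mirror instructions.

```shell
# Hypothetical helper: turn absolute gbExtFile paths (read on stdin) into
# paths relative to /gbdb/genbank/, which is the form --files-from expects.
genbank_relative_paths() {
    sed -e 's#^/gbdb/genbank/##' -e 's#^\./##' | sort -u
}

# Hypothetical wrapper: fetch only the genbank files one assembly uses.
# Usage: mirror_genbank_files <database>
mirror_genbank_files() {
    db=$1
    list=$(mktemp)
    # -N suppresses the column header so only path values are written.
    hgsql -N -e "select path from gbExtFile" "$db" \
        | genbank_relative_paths > "$list"
    # ASSUMPTION: source URL; verify against the mirror instructions.
    rsync -avP --files-from="$list" \
        rsync://hgdownload.soe.ucsc.edu/gbdb/genbank/ /gbdb/genbank/
    rm -f "$list"
}
```

Inspect the generated list before running rsync for real; if the paths in your gbExtFile table have a different prefix, adjust the sed expression to match.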
You will also need the /gbdb/hgFixed/ directory, and an hg19 installation requires the /gbdb/hg19Patch5/ directory and database.
To mirror a single genome, there are a few extra databases that are required to enable the full functions of that single genome database. These databases contain data that are not specific to a single genome assembly. Your particular genome may need one or more of the following databases:
With symlinks in your MySQL data directory:
- go -> go080130
- proteome -> proteins090821
- uniProt -> sp090821
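A minimal sketch of creating these symlinks is shown below. The function name link_shared_dbs is hypothetical, and the MySQL data directory location (often /var/lib/mysql) is an assumption that depends on your installation. The dated database names are the ones listed above; substitute the versions your gdbPdb table actually references.

```shell
# Hypothetical helper: create the go, proteome and uniProt symlinks.
# Usage: link_shared_dbs /var/lib/mysql
link_shared_dbs() {
    datadir=$1
    # Each entry is <link name>:<dated database directory>.
    for pair in go:go080130 proteome:proteins090821 uniProt:sp090821; do
        name=${pair%%:*}
        target=${pair#*:}
        # Never overwrite an existing database directory or link.
        if [ -e "$datadir/$name" ] || [ -L "$datadir/$name" ]; then
            echo "skipping $name: already exists" >&2
            continue
        fi
        # Relative target, so the link resolves inside the data directory.
        ln -s "$target" "$datadir/$name"
    done
}
```

Run it as a user who may write to the data directory, and make sure the MySQL server user can read the resulting links.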
The specific selection of the go, proteins, and uniProt databases can be found in the hgcentral gdbPdb table:
hgsql -e "select * from gdbPdb;" hgcentral
The symlinks remain as indicated above; other genomes will reference a specific protein, go, or uniProt database explicitly.
You may need to add MySQL GRANT permissions for these new databases if your read-only MySQL user has a specific list of database accesses.
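As an illustration, the sketch below prints one GRANT statement per database so you can review them before feeding them to MySQL. The function name grant_statements and the account 'readonly'@'localhost' are assumptions; use whatever read-only user your hg.conf is configured with.

```shell
# Hypothetical helper: print GRANT SELECT statements for a read-only user.
# Usage: grant_statements "'readonly'@'localhost'" go080130 proteins090821
grant_statements() {
    user=$1
    shift
    for db in "$@"; do
        # One statement per database; SELECT is enough for read-only use.
        printf 'GRANT SELECT ON %s.* TO %s;\n' "$db" "$user"
    done
}
```

After checking the output, it can be piped into a MySQL client running as an administrative user, for example `grant_statements "'readonly'@'localhost'" go080130 | mysql -u root -p`.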
The currently recommended UCSC browser mirror procedures can be found in the source tree scripts directory (src/product/scripts/).
To fully utilize scripts such as these, you should be familiar with shell programming and you should be able to understand what the scripts are doing so you can customize them to your particular installation.
They are not going to work blindly out of the box.
Building a new genome database
I made hgBlat work on my local browser installation by putting the full hostnames into hgcentral.blatServers, e.g. 'blat4' was replaced by 'blat4.soe.ucsc.edu'. I wonder if it wouldn't be a good idea to mention this in the mirroring instructions somewhere. --- max
Before you start using our blat servers, you need to verify with us that you have permission. We can't have everyone with a mirror site simply use our blat servers; the load would take them down for everyone. See also: Kent Informatics for a commercial blat license.
A nice command from Paul McKenna: UPDATE blatServers SET host=concat(host,'.soe.ucsc.edu'); Max 15:11, 3 February 2007 (PST)