Adding New Tracks to a browser installation
NOTE: This page is not necessarily maintained with the most up-to-date information. The information in the README files in the source tree is guaranteed to be current: src/product
For a generic discussion of how to load any type of track into your local browser mirror, refer to the document in the source tree src/product/README.trackDb and the discussion of how the trackDb entries function in src/hg/makeDb/trackDb/README. See also How_to_add_a_track_to_a_mirror.
Your .bed file must not contain any browser or track lines if you want the loader to work. For example, to load only the lines that begin with chr:
grep '^chr' yourFile.bed | hgLoadBed hg18 myData stdin
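A slightly more defensive variant of the filter above is sketched here. It assumes the header lines begin with the words browser or track, and it keeps every other line, including any whose sequence name does not start with chr. The file names and sample records are made up for illustration.

```shell
# Build a small example file (hypothetical data) that still has its
# custom-track header lines.
workdir=$(mktemp -d)
{
  printf 'browser position chr22:49512000-49525000\n'
  printf 'track name=example description="example track"\n'
  printf 'chr22\t49512530\t49512531\trs8137951\n'
  printf 'chrX\t100017093\t100017094\trs5921682\n'
} > "$workdir/yourFile.bed"

# Drop the browser/track lines and keep everything else.
grep -v -E '^(browser|track)[[:space:]]' "$workdir/yourFile.bed" > "$workdir/clean.bed"

clean_lines=$(wc -l < "$workdir/clean.bed" | tr -d ' ')
echo "kept $clean_lines data lines in $workdir/clean.bed"

# The cleaned file could then be handed to the loader, e.g.:
#   hgLoadBed hg18 myData "$workdir/clean.bed"
```

Writing the cleaned data to a file rather than piping it makes it easy to inspect what will be loaded before touching the database.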
You need to hop all over the source tree like a frog on a griddle to get things done, so it may be convenient to declare aliases for frequent destinations:
alias tdb='cd ~/kent/src/hg/makeDb/trackDb'
alias mdb='cd ~/kent/src/hg/makeDb/doc'
The docs are perhaps a little misleading. I seemed to need a .hg.conf file in my home directory; setting environment options didn't seem to help at all for command-line access when building. (see also: src/product/README.mysql.setup)
Sometimes (there is documentation in the files listed above for this) you will need to comment out, with a '#', the first line of make alpha for it to work. This is only applicable to older versions of mirrors. (see also: src/product/README.trackDb)
Instructions for the older mirrors
From the readme on addTrack.txt
(warning: this text is from 2003. It is good for general concepts. For current instructions, use the README files in the source tree src/product/README.*)
This describes how to add a track to the UCSC genome browser. The two major steps in adding a track are creating a table containing the track information, and putting a description of the track in trackDb. The browser has one mysql database for each version of each genome that it displays. Both the track table and the track description live in this database. The current human genome database is hg13, while the current mouse database is mm2.
II. MySQL Preliminaries. (see also: src/product/README.mysql.setup)
Before you get started it is good to look at these databases a little, and make sure that you have update access to them. You could do this directly with the 'mysql' command, but let's do it instead with the 'hgsql' command, which will keep you from having to type your user name and password all the time.
Assuming you've got the browser source already installed in ~/kent/src, do the following to create hgsql:
cd ~/kent/src/lib
make
This make almost always goes smoothly on Linux. You may need to remove the '-ggdb' flag in the makefile on other systems, and possibly set up a MACHTYPE environment variable, and then mkdir $MACHTYPE on some systems. Next:
cd ../hg/lib
make
The main problem that can happen with this make is if the mysql libraries and include files are not found. See kent/src/README for details. The next step is:
cd ../hgsql
make
rehash
The hgsql program is just a thin wrapper around mysql. It looks for the password and username in the file ~/.hg.conf. Here are the necessary parts of .hg.conf:
- db.host is the name of the MySQL host to connect to
- db.user is the username to use when connecting to the host
- db.password is the password to use with the above username
- central.db is a database where stuff common to all versions of the genome is stored
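As a rough sketch of what such a file might look like, the snippet below writes an example configuration to a temporary location. The key names db.password and central.db, and every value shown, are assumptions based on the usual hg.conf layout rather than text from this page; check them against src/product/README.mysql.setup before copying anything into a real ~/.hg.conf.

```shell
# Write an example configuration to a scratch file (NOT to ~/.hg.conf).
conf_dir=$(mktemp -d)
conf_file="$conf_dir/hg.conf.example"

cat > "$conf_file" <<'EOF'
# All values below are placeholders (assumptions for illustration).
db.host=localhost
db.user=yourUserName
db.password=yourPassword
central.db=hgcentral
EOF

# The file holds a password, so keep it readable by its owner only.
chmod 600 "$conf_file"

setting_count=$(grep -c '^[a-z.]*=' "$conf_file")
echo "wrote $setting_count settings to $conf_file"
```

Once the values are right for your mirror, the file would be copied to ~/.hg.conf with the same restrictive permissions.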
This .hg.conf is similar to the cgi-bin/hg.conf file that the browser uses, but it need not contain everything that file does. Also, it's advisable to have a read-only user/password in cgi-bin/hg.conf, while you'll want a read-write user/password in ~/.hg.conf. Setting this up can involve doing some 'grants' in mysql. See the documentation at www.mysql.com for how to set up various users.
Assuming your mysql and .hg.conf are set up, and that you already have a mirror site going, then the command
hgsql database
(where database is something like hg13 or mm2) should bring you to the mysql prompt. Do
mysql> show tables;
and you should see a large list of tables. When you've finished adding a track, the table(s) for your track will be among them. Also try doing:
mysql> describe trackDb;
This will list the fields of the trackDb table, which has a row for each track. Then do
mysql> select tableName,shortLabel,type from trackDb;
This will show you some of the key fields from this table. You won't be updating this table directly, but it can be handy to look at it sometimes for debugging purposes. Some other useful mysql commands are
mysql> select count(*) from sex;
This will count up all the items in the sex table. You thought there were only two? This table reflects the diversity of the sex fields in genbank. Try
mysql> select * from sex;
to see the full diversity. Well, enough of that non-normalized nightmare. To get out do:
mysql> quit
III. Loading the main track table.
The UCSC Genome Browser Database is usually loaded from a text file of some sort. The most popular types of text files are .psl files for blat alignments, .bed files for a wide variety of data, and GTF files for gene predictions. See the UCSC file format documentation for further information on these formats. For now we'll assume you have a file in one of these formats that you want to add to the browser.
(see also: src/hg/makeDb/trackDb/README and src/product/README.building.source)
A) Creating the loader programs
cd ~/kent/src/hg/makeDb
cd hgLoadPsl
make
cd ../hgLoadBed
make
cd ../ldHgGene
make
cd ../hgPepPred
make
The makeDb directory also contains loaders for a number of more specialized tables, including hgLoadOut for RepeatMasker data. There is also documentation describing in detail how we created each database, in files with names like makeHg13.txt and makeMm2.txt (in src/hg/makeDb/doc/*.txt).
B) Loading a bed file
Loading a bed file is the most straightforward. First decide what you want to call the table. Then do
hgLoadBed database tableName file.bed
Type hgLoadBed with no arguments for further information. Database will be something like hg13 or mm2.
C) Loading a psl file
Loading a psl file is also easy. Make sure that the psl file is sorted by chromosome (tName) and start position (tStart). Use kent/src/hg/pslSort or just plain Unix sort for this if necessary. If the number of alignments is somewhat modest (say fewer than 500,000) then do
hgLoadPsl database -table=tableName file.psl -tNameIx
This will load everything into one big table. For huge numbers of alignments the browser will be faster if you first split up the data into one file for each chromosome. Name these files chr1_tableName.psl, chr2_tableName.psl and so forth. Then do
hgLoadPsl database chr*_tableName.psl
This will end up making a separate table for each chromosome. Unfortunately it is still a bit complicated to make the details pages for a psl format track include the alignments themselves. Please contact us at UCSC if this is a priority for you and we will try to make it easier.
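The sorting and per-chromosome splitting described above can be sketched with plain Unix tools. This assumes a headerless, tab-separated psl file in the standard 21-column layout, where tName is column 14 and tStart is column 16; the psl_line helper, the sample alignments and the table name are hypothetical and exist only to make the example self-contained.

```shell
workdir=$(mktemp -d)

# Hypothetical helper: emit one 21-column psl line for the given
# target name, start and end; every other field is fixed filler.
psl_line() {
  printf '100\t0\t0\t0\t0\t0\t0\t0\t+\tquery1\t100\t0\t100\t%s\t1000000\t%s\t%s\t1\t100,\t0,\t%s,\n' \
    "$1" "$2" "$3" "$2"
}

{
  psl_line chr2 500 600
  psl_line chr1 900 1000
  psl_line chr1 100 200
} > "$workdir/unsorted.psl"

# Sort by tName (column 14), then numerically by tStart (column 16).
tab=$(printf '\t')
sort -t "$tab" -k14,14 -k16,16n "$workdir/unsorted.psl" > "$workdir/sorted.psl"

# Split into one file per chromosome, named chrN_tableName.psl.
awk -F'\t' -v dir="$workdir" -v table="tableName" \
  '{ print > (dir "/" $14 "_" table ".psl") }' "$workdir/sorted.psl"

ls "$workdir"
```

If the psl file still carries its text header, that header would need to be removed before sorting, since sort treats it as data.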
D) Loading a GTF (or GFF) file
Generally GTF is a much more tightly defined standard than GFF, so GTF files are more likely to work without tweaking. However, most reasonable GFFs will work as well. To load do
ldHgGene database tableName file(s).gtf
This will make a gene-prediction type table. You will often want to create an associated predicted peptide table as well. To do this do
hgPepPred database generic tableNamePep file(s).fa
The first word after the '>' in the fasta files should use the same symbol as the 'group' in GFF files or 'transcript_id' in GTF files.
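Because the peptide names have to match the transcript_id values, it may be worth comparing the two sets of names before loading. The following is only a sketch: it assumes GTF attributes of the form transcript_id "name"; and uses made-up file names and records.

```shell
workdir=$(mktemp -d)

# Hypothetical GTF with two transcripts.
{
  printf 'chr1\tsrc\texon\t100\t200\t.\t+\t.\tgene_id "g1"; transcript_id "tx1";\n'
  printf 'chr1\tsrc\texon\t500\t600\t.\t+\t.\tgene_id "g2"; transcript_id "tx2";\n'
} > "$workdir/genes.gtf"

# Hypothetical peptide fasta; tx3 has no matching transcript.
printf '>tx1 some description\nMKV\n>tx3\nMAA\n' > "$workdir/pep.fa"

# Collect the transcript_id values from the GTF.
sed -n 's/.*transcript_id "\([^"]*\)".*/\1/p' "$workdir/genes.gtf" \
  | sort -u > "$workdir/gtf_ids.txt"

# Collect the first word after '>' from each fasta header.
sed -n 's/^>\([^[:space:]]*\).*/\1/p' "$workdir/pep.fa" \
  | sort -u > "$workdir/fa_ids.txt"

# Names present in the fasta but absent from the GTF.
missing=$(comm -13 "$workdir/gtf_ids.txt" "$workdir/fa_ids.txt")
echo "peptides without a matching transcript: $missing"
```

An empty result would suggest that every peptide record has a transcript to attach to.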
IV. Updating trackDb
WARNING: these trackDb instructions are OBSOLETE, please follow the instructions in src/product/README.trackDb
Your data will not display in the browser until you load it into trackDb. To do this first
cd ~/kent/src/hg/makeDb/trackDb
and look at the file trackDb.ra, and read the README file. Then decide whether your new track should be global, organism specific, or assembly specific, and edit the corresponding trackDb.ra file. Generally it's good to find an existing track as similar as possible to the track you want to add, copy and paste it, and modify the copy. Then put any explanatory text you want on the track in trackName.html in the appropriate directory. After this do a
make alpha
to update the trackDb table, or a
make update
to update trackDb_user (where user is your Unix username). If multiple engineers are working on the project you can set up cgi-bin-user directories with hg.conf files that tell the browser to use trackDb_user instead of trackDb, to avoid conflicts with other engineers' code.
After the 'make alpha' the browser should show your track. Congratulations if you've made it this far.
-Jim Kent Feb 14, 2003.
Instructions to upload wiggle files
The following instructions are for wiggle bed files. For the other wiggle formats, please alter accordingly. When uploading wiggle files, as with bed files, the header must be removed, and the type must be set to wig 0 100. Groups should be in separate files to be seen as separate tracks. There are other ways of displaying wig files, but this is the best currently. Each line of the file should have the format
chr bpstart bpstop score
for example:
chr17 30258 30259 1
It doesn't matter what order the chromosomes are in, but within each chromosome the lines MUST be sorted by bp position, ascending. It is possible to alter the view to dot points, colour, etc. by changing my_tracks; these settings are described in the help of the UCSC genome browser. Then use wigEncode and hgLoadWiggle. The -pathPrefix option is used because the loader otherwise tries to write to the read-only section of the mirror, which, depending on how your mirror is set up, can cause issues. The option redirects it to the directory you're working in, where wigEncode writes the .wig and .wib files. hg18 is the build.
wigEncode file label.wig label.wib
hgLoadWiggle -pathPrefix=/home/c.wells/ucsc/file hg18 label label.wig
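The preparation step before wigEncode (removing the header and sorting by position) might look like the sketch below. It assumes header lines start with browser or track; the sample scores and file names are invented.

```shell
workdir=$(mktemp -d)

# Hypothetical raw file: a track header plus unsorted data lines.
{
  printf 'track type=wiggle_0 name="example"\n'
  printf 'chr17\t30300\t30301\t5\n'
  printf 'chr17\t30258\t30259\t1\n'
  printf 'chr2\t100\t101\t7\n'
} > "$workdir/raw.txt"

# Remove header lines, then sort by chromosome name and by
# ascending start position within each chromosome.
grep -v -E '^(browser|track)' "$workdir/raw.txt" \
  | sort -k1,1 -k2,2n > "$workdir/file"

first_start=$(awk 'NR==1 {print $2}' "$workdir/file")
data_lines=$(wc -l < "$workdir/file" | tr -d ' ')
echo "$data_lines data lines, first start position $first_start"

# The prepared file would then go through the two commands above:
#   wigEncode file label.wig label.wib
#   hgLoadWiggle -pathPrefix=... hg18 label label.wig
```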
A BED track description for a custom SNP track
The .bed file has lines like this:
chr22 49512530 49512531 rs8137951 0 + 49512530 49512531 255,0,0
chr22 49518559 49518560 rs756638 0 + 49518559 49518560 255,0,0
chr22 49522492 49522493 rs3810648 0 + 49522492 49522493 255,0,0
chr22 49524956 49524957 rs2285395 0 + 49524956 49524957 255,0,0
chrX 100017093 100017094 rs5921682 0 + 100017093 100017094 255,0,0
chrX 100019802 100019803 rs5967204 0 - 100019802 100019803 255,0,0
Load the .bed file into a custom track called Ill550v3. Note that the .bed file must have NO browser or track lines, otherwise hgLoadBed will fail.
/var/www/cgi-bin/loader/hgLoadBed -noBin -tab hg18 Ill550v3 /home/rerla/public_html/illHH550v3.BED
Adjust your trackDb.ra file
I'm working (eg) on meme in
I add these settings - they seem to work the way I want, allowing the name field from the .bed file to appear in full view:
track Ill550v3
priority 20.1
shortLabel Illum550kv3
longLabel Illumina 550k v3 snps
visibility pack
type bed 9 .
color 255,0,0
group varRep
thickDrawItem on
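The type bed 9 setting declares nine BED fields, so it can be useful to confirm that every line of the file really has nine columns before loading. This is a minimal sketch assuming a tab-separated file (as implied by the -tab option used when loading); the file name is hypothetical and the two records are copied from the example above.

```shell
workdir=$(mktemp -d)

# Two records from the SNP example, written tab-separated.
{
  printf 'chr22\t49512530\t49512531\trs8137951\t0\t+\t49512530\t49512531\t255,0,0\n'
  printf 'chrX\t100019802\t100019803\trs5967204\t0\t-\t100019802\t100019803\t255,0,0\n'
} > "$workdir/snps.bed"

# Count lines whose field count differs from the declared bed size.
expected_fields=9
bad_lines=$(awk -F'\t' -v n="$expected_fields" 'NF != n' "$workdir/snps.bed" \
  | wc -l | tr -d ' ')
echo "$bad_lines lines do not have $expected_fields fields"
```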
Recompile the trackDb
make alpha DBS=hg18