Using hgWiggle without a database

From genomewiki
Revision as of 22:33, 15 July 2010 by Hiram (talk | contribs) (note public MySQL db access)

hgWiggle used on local files

The hgWiggle command is used to extract the compressed data values from a "wiggle" type of data track in the genome browser. It is often useful to be able to run this command locally without a database. The following example explains how to use hgWiggle on local files only without a database.

If you do have internet access, you can use the UCSC public database server to dramatically speed up these types of queries. In that case, you only need to download the .wib files. See the comments in the instructions below for this alternative.

Download files from hgdownload

The ".wig" files to use for this are actually the database table dumps available from the hgdownload system. Fetch the files you need from hgdownload. The example here uses the gc5Base track of the Stickleback assembly (gasAcu1):

Fetch the ".wig" file from the database dump:

ftp://hgdownload.cse.ucsc.edu/goldenPath/gasAcu1/database/gc5Base.txt.gz

And you need the compressed data values in the ".wib" file from the gbdb filesystem files:

ftp://hgdownload.cse.ucsc.edu/gbdb/gasAcu1/wib/gc5Base.wib
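The two downloads can also be scripted. The following is only a minimal sketch: it assumes wget is installed with FTP support and that the URLs above are still being served. The helper name fetch_wiggle_files and the variable names are made up for illustration.

```shell
# URLs copied from the text above; db and track are the example's names.
db=gasAcu1
track=gc5Base
wig_url="ftp://hgdownload.cse.ucsc.edu/goldenPath/${db}/database/${track}.txt.gz"
wib_url="ftp://hgdownload.cse.ucsc.edu/gbdb/${db}/wib/${track}.wib"

# Hypothetical helper: download both files into the current directory.
# Stops after the first download if it fails.
fetch_wiggle_files() {
    wget "$wig_url" && wget "$wib_url"
}
```

Run fetch_wiggle_files from the directory you intend to work in, so that both files land in the same place.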

Place these files together in the same directory. The compressed gc5Base.txt.gz file is the so-called ".wig" file. Uncompress it and give it a .wig name with a symbolic link:

$ gunzip gc5Base.txt.gz
$ ln -s gc5Base.txt gc5Base.wig

The resulting files appear as:

$ ls -ogrt gc5Base*
lrwxrwxrwx  1       11 May 25 09:19 gc5Base.wig -> gc5Base.txt
-rw-rw-r--  1  9869820 May 25 09:36 gc5Base.txt
-rw-rw-r--  1 90820835 May 25 09:37 gc5Base.wib

The hgWiggle command

Then run hgWiggle on the track name. For example, to get statistics on chrI:

$ hgWiggle -chr=chrI -doStats gc5Base
looking for: gc5Base.wig
#        from file, Table: gc5Base
# Chrom Data    Data    # Data  Data    Bases   Minimum Maximum Range   Mean Variance Standard
#       start   end     values  span    covered                                       deviation
chrI    1   28185910   5512103     5   27560515      0     100   100 44.4915 533.509 23.0978

To get statistics on a set of genomic regions, create a BED file containing the regions (chrom, chromStart, chromEnd), and supply this to hgWiggle, using the -bedFile option.
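As a sketch of the BED-file approach just described: the region coordinates and the file name regions.bed below are arbitrary values chosen for illustration, and the hgWiggle call is skipped when the command is not on your PATH.

```shell
# Three-column BED (chrom, chromStart, chromEnd), tab-separated.
# The coordinates are arbitrary illustration values on chrI.
printf 'chrI\t1000\t2000\n'    > regions.bed
printf 'chrI\t50000\t60000\n' >> regions.bed

# Statistics restricted to the regions listed in the BED file.
# Expects gc5Base.wig and gc5Base.wib in the current directory.
if command -v hgWiggle >/dev/null 2>&1; then
    hgWiggle -bedFile=regions.bed -doStats gc5Base
fi
```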

What is special about this process

The database dump file is slightly different from an actual ".wig" file. It has an extra "bin" column at the beginning, which the hgWiggle command ignores. The "file" column of the dump holds a fully qualified path, /gbdb/gasAcu1/wib/gc5Base.wib. The hgWiggle command ignores the directory part of this path and finds the gc5Base.wib file in the current directory.
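To look at the two columns discussed here, something like the following can be used. It is a sketch that assumes the dump follows the standard wiggle table layout, in which "bin" is column 1 and "file" is column 9 of the tab-separated file. The helper name show_wig_columns is hypothetical.

```shell
# Hypothetical helper: print the bin, chrom, chromStart, chromEnd and file
# columns for the first three rows of a wiggle table dump.
# Assumes the file column is the ninth tab-separated field.
show_wig_columns() {
    head -n 3 "$1" | awk -F'\t' -v OFS='\t' '{ print $1, $2, $3, $4, $9 }'
}
```

Calling show_wig_columns gc5Base.txt should then show the /gbdb/gasAcu1/wib/gc5Base.wib path in the last field of each row.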

Multiple .wib files

Some older assembly databases have per-chromosome .wib files in the gbdb wib directory. In this case, download the .wib file for each chromosome of interest. The process described here will work in the same manner.