Using hgWiggle without a database

hgWiggle used on local files

The hgWiggle command extracts the compressed data values from a "wiggle" type of data track in the genome browser. It is often useful to run this command locally, without a database. The following example explains how to use hgWiggle with local files only.

If you have internet access, you can instead use the UCSC public database server, which dramatically speeds up these types of queries. In that case, you only need to download the .wib files. The instructions below note where this alternative differs.

Download files from hgdownload

If you want to use the UCSC public MySQL server, you only need to download the .wib files. You do not need to download the database .txt.gz files.

The ".wig" files used here are actually the database table dumps available from the hgdownload system. Fetch the files you need from hgdownload. For example, for the gc5Base track of the stickleback assembly (gasAcu1):

Fetch the ".wig" file from the database dump:

rsync -aP rsync://hgdownload.cse.ucsc.edu/goldenPath/gasAcu1/database/gc5Base.txt.gz .

And you need the compressed data values in the ".wib" file from the gbdb filesystem files:

rsync -aP rsync://hgdownload.cse.ucsc.edu/gbdb/gasAcu1/wib/gc5Base.wib .

Place these files together in the same directory. The compressed gc5Base.txt.gz file is the so-called ".wig" file. Uncompress it and give it a .wig name with a symbolic link:

gunzip gc5Base.txt.gz
ln -s gc5Base.txt gc5Base.wig

The resulting files appear as:

$ ls -ogrt gc5Base*
lrwxrwxrwx  1       11 May 25 09:19 gc5Base.wig -> gc5Base.txt
-rw-rw-r--  1  9869820 May 25 09:36 gc5Base.txt
-rw-rw-r--  1 90820835 May 25 09:37 gc5Base.wib
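
Before running hgWiggle, it can help to confirm that both files are in place. The following is a minimal sketch, not part of the UCSC tools: check_track_files is a hypothetical helper name, and it only tests that <track>.wig and <track>.wib exist in the current directory (a symbolic link that points at a missing file counts as missing).

```shell
# Hypothetical helper: verify that <track>.wig and <track>.wib both exist
# in the current directory. Returns non-zero on the first missing file.
check_track_files() {
    track="$1"
    for ext in wig wib; do
        if [ ! -e "$track.$ext" ]; then
            echo "missing: $track.$ext" >&2
            return 1
        fi
    done
    echo "ok: $track.wig and $track.wib are present"
}

# Example call for the track used on this page.
check_track_files gc5Base || echo "fetch the missing file before running hgWiggle" >&2
```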

The hgWiggle command

Then run hgWiggle. For example, to get statistics on chrI:

$ hgWiggle -chr=chrI -doStats gc5Base
looking for: gc5Base.wig
#        from file, Table: gc5Base
# Chrom Data    Data    # Data  Data    Bases   Minimum Maximum Range   Mean Variance Standard
#       start   end     values  span    covered                                       deviation
chrI    1   28185910   5512103     5   27560515      0     100   100 44.4915 533.509 23.0978
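
To use these statistics in a script, the data rows can be separated from the "#" comment lines. This is a sketch that assumes the twelve-column layout of the sample output above, where the mean is the tenth field. stats_mean is a hypothetical helper name, not an hgWiggle feature.

```shell
# Hypothetical helper: read hgWiggle -doStats output on stdin and print
# "chrom mean" for each data row. Lines starting with "#" are comments;
# data rows are assumed to have 12 whitespace-separated fields, with the
# mean in field 10 (after Minimum, Maximum and Range).
stats_mean() {
    awk '!/^#/ && NF == 12 { print $1, $10 }'
}

# Run only when hgWiggle and the local gc5Base.wig are available.
if command -v hgWiggle >/dev/null 2>&1 && [ -e gc5Base.wig ]; then
    hgWiggle -chr=chrI -doStats gc5Base | stats_mean
fi
```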

To get statistics on a set of genomic regions, create a BED file containing the regions (chrom, chromStart, chromEnd), and supply this to hgWiggle, using the -bedFile option.
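
As a sketch of that step, the commands below write a two-region BED file and pass it to hgWiggle. The file name regions.bed and the coordinates are made-up examples, and the -bedFile=<file> form is assumed to follow the same option style as -chr=chrI above.

```shell
# Write a small tab-separated BED file: chrom, chromStart, chromEnd.
# Both regions are illustrative coordinates on chrI, not real features.
printf 'chrI\t1000000\t1010000\nchrI\t2000000\t2005000\n' > regions.bed

# Run only when hgWiggle and the local gc5Base.wig are available.
if command -v hgWiggle >/dev/null 2>&1 && [ -e gc5Base.wig ]; then
    hgWiggle -bedFile=regions.bed -doStats gc5Base
fi
```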

Using the UCSC public MySQL server

To use the hgWiggle command with the public MySQL server, place the following four lines in a file named .hg.conf in your home directory, and set its permissions to 600 (chmod 600 .hg.conf):

db.host=genome-mysql.cse.ucsc.edu
db.user=genomep
db.password=password
central.db=hgcentral

The password shown here really is the literal word "password". It is public, not a secret.
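
The file can also be created from the shell. The following is a minimal sketch of one way to do it. It writes the same four settings, leaves an existing .hg.conf untouched, and uses a restrictive umask so the file is never readable by other users, even briefly.

```shell
conf="$HOME/.hg.conf"

# Do not clobber an existing configuration file.
if [ -e "$conf" ]; then
    echo "$conf already exists; leaving it untouched" >&2
else
    # umask 077 in a subshell: the new file is created owner-only.
    (
        umask 077
        cat > "$conf" <<'EOF'
db.host=genome-mysql.cse.ucsc.edu
db.user=genomep
db.password=password
central.db=hgcentral
EOF
    )
    # Set the permissions explicitly, as the instructions require.
    chmod 600 "$conf"
fi
```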

With this file in place, and the .wib file present in the directory you want to work in, use the hgWiggle command with the -db argument:

hgWiggle -db=gasAcu1 -chr=chrI -doStats gc5Base

What is special about this process

The database dump file differs slightly from an actual ".wig" file: it has an extra "bin" column at the beginning, which the hgWiggle command ignores. The "file" column holds a fully qualified file name, /gbdb/gasAcu1/wib/gc5Base.wib. The hgWiggle command ignores the directory part of this name and finds the gc5Base.wib file in the current directory.
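
To see these columns in the dump file itself, the first row can be inspected. This sketch assumes the tab-separated layout of the standard wiggle table, in which "bin" is column 1, "chrom" is column 2 and "file" is column 9. show_dump_fields is a hypothetical helper name.

```shell
# Hypothetical helper: print the "bin", "chrom" and "file" fields of the
# first row of a wiggle table dump (assumed to be columns 1, 2 and 9 of
# a tab-separated file).
show_dump_fields() {
    head -n 1 "$1" | cut -f 1,2,9
}

# Run only when the uncompressed dump is present.
if [ -e gc5Base.txt ]; then
    show_dump_fields gc5Base.txt
fi
```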

Multiple .wib files

Some older assembly databases have per-chromosome .wib files in the gbdb wib directory. In this case, download the .wib file for each chromosome of interest. The process described here works in the same manner.
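
When it is unclear which .wib file a chromosome needs, the table dump can be consulted. The sketch below lists the distinct "file" values recorded for one chromosome, assuming the standard wiggle table layout with "chrom" in column 2 and "file" in column 9. wib_files_for_chrom is a hypothetical helper name.

```shell
# Hypothetical helper: list the distinct .wib paths that a table dump
# references for one chromosome.
# Usage: wib_files_for_chrom <dump.txt> <chrom>
wib_files_for_chrom() {
    awk -F '\t' -v chrom="$2" '$2 == chrom { print $9 }' "$1" | sort -u
}

# Run only when the uncompressed dump is present.
if [ -e gc5Base.txt ]; then
    wib_files_for_chrom gc5Base.txt chrI
fi
```

The file name at the end of each listed path is the one to fetch from the gbdb wib directory.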