Programmatic access to the Genome Browser: Difference between revisions

From genomewiki
(Created page with "* Get the sequence of a genome at a particular place ** Download the tool twoBitToFa from http://hgdownload.cse.ucsc.edu/admin/exe/ ** twoBitToFa http://hgdownload.cse.ucsc.edu/g...")
 
No edit summary
Line 1: Line 1:
* Get the sequence of a genome at a particular place
The UCSC API for retrieving and uploading data is not REST-driven; it revolves around client-side C tools that convert to and from binary files.
** Download the tool twoBitToFa from http://hgdownload.cse.ucsc.edu/admin/exe/
 
** twoBitToFa http://hgdownload.cse.ucsc.edu/gbdb/hg19/hg19.2bit test.fa -seq=chr21 -start=1 -end=10000
Here are some common tasks that scripts can solve by calling the UCSC Genome Browser, assuming that you know the standard Unix command-line tools:
 
* Get the chromosome sequence for a range
** Download the tool twoBitToFa from http://hgdownload.cse.ucsc.edu/admin/exe/ e.g. with <code>curl http://hgdownload.cse.ucsc.edu/admin/exe/linux.x86_64/twoBitToFa > twoBitToFa; chmod a+x twoBitToFa</code>
** To get the DNA sequence from e.g. the human genome hg19, run a command like <code>twoBitToFa http://hgdownload.cse.ucsc.edu/gbdb/hg19/hg19.2bit stdout -seq=chr21 -start=1 -end=10000</code>. You can replace stdout with a filename of your choice.
** for best performance, download the 2bit file for your genome from http://hgdownload.cse.ucsc.edu/gbdb/<databaseId> to local disk.
   
   
* Get the "wiggle" (x-y-plot) graph data
* Get the "wiggle" (x-y-plot) graph data for a chromosome range
** Download bigWigToWig from http://hgdownload.cse.ucsc.edu/admin/exe/
** Download bigWigToWig from http://hgdownload.cse.ucsc.edu/admin/exe/ as shown above
** bigWigToWig http://hgdownload.cse.ucsc.edu/gbdb/hg19/bbi/wgEncodeBroadHistoneK562Cbx2Sig.bigWig -chrom=chr21 -start=0 -end=10000000 stdout
** run a command like <code>bigWigToWig http://hgdownload.cse.ucsc.edu/gbdb/hg19/bbi/wgEncodeBroadHistoneK562Cbx2Sig.bigWig -chrom=chr21 -start=0 -end=10000000 stdout</code>. You can also replace stdout with a filename of your choice.
 
* Download data from a database table


* Download data stored in a database table
** use Tools - Table Browser -  "Describe schema" to browse the database schema. All fields have a human readable description and the links to other tables are shown.
** mysql --no-defaults -h genome-mysql.cse.ucsc.edu -u genome -A -e 'select * from pubsBingBlat' -NB > out.txt
Line 21: Line 24:
* Upload a custom track and link to the genome browser with the track loaded


** create a file temp.bed with contents like these:
** create a custom track file as documented at http://genome.ucsc.edu/goldenpath/help/customTrack.html, e.g. with <code>printf 'track name="TestTrack" description="TestTrack with links on features" url="http://www.google.com/$$"\nchr1 1 1000 testIdForUrl' > temp.bed</code>
  track name="TestTrack" description="TestTrack with links on features" url="http://www.google.com/$$"
** upload your file with a command like the following; it prints a string to stdout, which is called $HGSID below: <code>curl -s -F db=hg19 -F 'hgct_customText=@temp.bed' http://genome.ucsc.edu/cgi-bin/hgCustom  | grep -o 'hgsid=[0-9]*_[a-zA-Z0-9]*' | uniq | sed -e 's/hgsid=//'</code>
  chr1 1 1000 testIdForUrl
** you can link to a fresh genome browser session that has only this track loaded using http://genome.ucsc.edu/cgi-bin/hgTracks?hgsid=$HGSID&position=chr1:1-1000
** upload your file with a command like this, it will print a string to stdout which we are calling $HGSID in the following
** you can load more tracks into this session by adding the parameter hgsid=$HGSID to all future curl calls
  curl -s -F db=hg19 -F 'hgct_customText=chr1 1 1000' http://genome.ucsc.edu/cgi-bin/hgCustom  | grep -o 'hgsid=[0-9]*_[a-zA-Z0-9]*' | uniq | sed -e 's/hgsid=//'
</ul>
** you can link to this track with http://genome.ucsc.edu/cgi-bin/hgTracks?hgsid=$HGSID&position=chr1:1-1000
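
The upload-and-link steps above can be wrapped in a few small shell functions. This is only a sketch: the function names <code>extract_hgsid</code>, <code>session_url</code> and <code>upload_track</code> are hypothetical helpers, not part of any UCSC tool, and <code>head -n 1</code> is used instead of <code>uniq</code> so that exactly one session id is printed even if the response contains several.

```shell
#!/bin/sh
# Sketch only: the function names below are hypothetical helpers.

# Read the HTML returned by hgCustom on stdin and print the first session id.
# Uses the same pattern as the grep/sed pipeline shown in the text.
extract_hgsid() {
    grep -o 'hgsid=[0-9]*_[a-zA-Z0-9]*' | head -n 1 | sed -e 's/hgsid=//'
}

# Print an hgTracks link for a session id ($1) and a position ($2).
session_url() {
    printf 'http://genome.ucsc.edu/cgi-bin/hgTracks?hgsid=%s&position=%s\n' "$1" "$2"
}

# Upload a custom track file ($1) for an assembly ($2) and print a link
# to the new session at a position ($3).
# This makes a network request, so it only runs when called explicitly.
upload_track() {
    hgsid=$(curl -s -F "db=$2" -F "hgct_customText=@$1" \
        http://genome.ucsc.edu/cgi-bin/hgCustom | extract_hgsid)
    if [ -z "$hgsid" ]; then
        echo "no hgsid found in the hgCustom response" >&2
        return 1
    fi
    session_url "$hgsid" "$3"
}
```

A call such as <code>upload_track temp.bed hg19 chr1:1-1000</code> would then print the link for the uploaded track, provided the server responds as described above.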

Revision as of 09:34, 6 July 2015

The UCSC API for retrieving and uploading data is not REST-driven; it revolves around client-side C tools that convert to and from binary files.

Here are some common tasks that scripts can solve by calling the UCSC Genome Browser, assuming that you know the standard Unix command-line tools:

  • Download data stored in a database table
    • use Tools - Table Browser - "Describe schema" to browse the database schema. All fields have a human-readable description, and links to other tables are shown.
    • run a query against the public MySQL server, naming the database (here hg19) as the last argument: mysql --no-defaults -h genome-mysql.cse.ucsc.edu -u genome -A -e 'select * from pubsBingBlat' -NB hg19 > out.txt
  • Get a copy of the current Genome Browser image from a script
    • use "curl http://genome.ucsc.edu/cgi-bin/hgRenderTracks > test.png". hgRenderTracks understands the same parameters and options as the main hgTracks CGI, e.g. <trackName>=pack.
    • to show only a single track with hgRenderTracks, make sure that the first track parameter is hideTracks=1
    • for example, to download the image for a chromosomal location with only the RefSeq transcripts and publications tracks shown, both in "pack" mode, use this command:
  curl 'http://genome.ucsc.edu/cgi-bin/hgRenderTracks?position=chr17:41570860-41650551&hideTracks=1&refGene=pack&pubs=pack' > temp.png 
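
When the position or the list of tracks varies between calls, the hgRenderTracks URL can be assembled by a small helper. The sketch below assumes a hypothetical function name, <code>render_url</code>; it places hideTracks=1 before any track parameter, as required above, and sets every listed track to "pack".

```shell
#!/bin/sh
# Sketch only: render_url is a hypothetical helper name.

# Print an hgRenderTracks URL for a position ($1) that shows only the
# tracks named in the remaining arguments, each in "pack" mode.
render_url() {
    position=$1
    shift
    # hideTracks=1 must come before the individual track settings.
    url="http://genome.ucsc.edu/cgi-bin/hgRenderTracks?position=${position}&hideTracks=1"
    for track in "$@"; do
        url="${url}&${track}=pack"
    done
    printf '%s\n' "$url"
}

# Example (needs network access): save the image to temp.png
# curl -s "$(render_url chr17:41570860-41650551 refGene pubs)" > temp.png
```

Called with the position and track names from the example above, the helper prints the same URL as the hand-written curl command.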
  • Upload a custom track and link to the genome browser with the track loaded