Programmatic access to the Genome Browser

* To get the DNA sequence from e.g. the human genome hg19, run a command like <code>twoBitToFa http://hgdownload.cse.ucsc.edu/gbdb/hg19/hg19.2bit stdout -seq=chr21 -start=1 -end=10000</code>. You can replace stdout with a filename of your choice.
* for best performance, download the 2bit file for your genome from http://hgdownload.cse.ucsc.edu/gbdb/ to local disk

Revision as of 13:34, 6 July 2015

The UCSC API for retrieving and uploading data is RESTful over HTTP but, to save computational time on the server, does not use JSON. Downloading most data formats requires client-side C tools that convert to and from binary files. Data upload uses custom text files.

Here are some common tasks that can be done from scripts with the UCSC Genome Browser. It is assumed that the reader knows the standard Unix command line tools.

Download data stored in a database table

  • use Tools - Table Browser - "Describe schema" to browse the database schema. All fields have a human readable description and the links to other tables are shown.
  • to access the public MySQL server, use a command like mysql --no-defaults -h genome-mysql.cse.ucsc.edu -u genome -A -e 'select * from pubsBingBlat' -NB hg19 > out.txt
  • the list of data tracks is part of the table trackDb. The first column is the internal name of the track.
  • tracks with types that start with "big" are stored in files, all others are stored at least to some extent in MySQL tables.
  • to use the genomic coordinates of non-big tables, retrieve the table and consult our documentation of file formats at http://genome.ucsc.edu/FAQ/FAQformat.html
  • the first column in many tables with genomic coordinates is called "bin" and can be stripped for most applications
 mysql --no-defaults -h genome-mysql.cse.ucsc.edu -u genome -A -e "select tableName, type, priority from trackDb where tableName in ('gold', 'refGene','knownGene', 'ccds', 'clinvar') limit 5" hg19
+-----------+-------------------------------------+----------+
| tableName | type                                | priority |
+-----------+-------------------------------------+----------+
| clinvar   | bigBed 12 .                         |      100 |
| gold      | bed 3 +                             |      100 |
| knownGene | genePred knownGenePep knownGeneMrna |        1 |
| refGene   | genePred refPep refMrna             |        2 |
+-----------+-------------------------------------+----------+
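
The rule that "big" types live in files can be applied to the trackDb output in a script. The following is a minimal sketch, assuming the query was run with -NB so that each line is tab-separated with the table name in column 1 and the type in column 2. The function name classify_tracks and the labels "file" and "mysql" are made up for this illustration.

```shell
# Label each track as file-backed or MySQL-backed based on its type.
# Input: tab-separated lines "tableName<TAB>type[<TAB>...]" (mysql -NB output).
# classify_tracks is a hypothetical helper name, not a UCSC tool.
classify_tracks() {
    awk -F '\t' -v OFS='\t' '{
        # Types starting with "big" (bigBed, bigWig, ...) are stored in files.
        storage = ($2 ~ /^big/) ? "file" : "mysql"
        print $1, storage
    }'
}
```

It could be used by piping the trackDb query into it, for example: mysql --no-defaults -h genome-mysql.cse.ucsc.edu -u genome -A -NB -e "select tableName, type from trackDb" hg19 | classify_tracks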

mysql --no-defaults -h genome-mysql.cse.ucsc.edu -u genome -A -e "select * from knownGene limit 3"  hg19 
+------------+-------+--------+---------+-------+----------+--------+-----------+--------------------+--------------------+-----------+------------+
| name       | chrom | strand | txStart | txEnd | cdsStart | cdsEnd | exonCount | exonStarts         | exonEnds           | proteinID | alignID    |
+------------+-------+--------+---------+-------+----------+--------+-----------+--------------------+--------------------+-----------+------------+
| uc001aaa.3 | chr1  | +      |   11873 | 14409 |    11873 |  11873 |         3 | 11873,12612,13220, | 12227,12721,14409, |           | uc001aaa.3 |
| uc010nxr.1 | chr1  | +      |   11873 | 14409 |    11873 |  11873 |         3 | 11873,12645,13220, | 12227,12697,14409, |           | uc010nxr.1 |
| uc010nxq.1 | chr1  | +      |   11873 | 14409 |    12189 |  13639 |         3 | 11873,12594,13402, | 12227,12721,14409, | B7ZGX9    | uc010nxq.1 |
+------------+-------+--------+---------+-------+----------+--------+-----------+--------------------+--------------------+-----------+------------+
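
Two small post-processing steps follow from the notes above: dropping the leading "bin" column and turning the comma-separated exonStarts and exonEnds fields into one line per exon. This is a rough sketch, assuming tab-separated mysql -NB output. genepred_exons assumes the knownGene column order shown above (name, chrom, ..., exonStarts in column 9, exonEnds in column 10). Both function names are invented for this example.

```shell
# Drop the first column. Use this only on tables whose first column is "bin".
strip_bin() {
    cut -f2-
}

# Print one line per exon: chrom, exon start, exon end, transcript name.
# Assumes the knownGene column layout shown above and no leading "bin" column.
genepred_exons() {
    awk -F '\t' -v OFS='\t' '{
        n = split($9, starts, ",")
        split($10, ends, ",")
        # The lists end with a trailing comma, so the last element is empty.
        for (i = 1; i <= n; i++)
            if (starts[i] != "")
                print $2, starts[i], ends[i], $1
    }'
}
```

For a table that does start with "bin", the two can be chained, as in strip_bin | genepred_exons, provided the remaining columns follow the same order.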

Get the chromosome sequence for a range

twoBitToFa http://hgdownload.cse.ucsc.edu/gbdb/hg19/hg19.2bit stdout -seq=chr21 -start=15000000 -end=15000050
>chr21:15000000-15000050  
agccctgaacaaagacagggcttggcttatataggcaaacttacagaagc
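
Note that the range above returns 50 bases: twoBitToFa treats -start as zero-based and -end as non-inclusive. Positions copied from the Genome Browser position box are 1-based and inclusive, so the start has to be reduced by one. A minimal sketch of that conversion, where region_to_twobit_args is a hypothetical helper name and the input is assumed to look like chr21:15000001-15000050 without thousands separators:

```shell
# Convert a 1-based, inclusive browser position (chrom:start-end) into
# twoBitToFa arguments (0-based start, non-inclusive end).
# region_to_twobit_args is an illustrative helper, not part of the UCSC tools.
region_to_twobit_args() {
    region=$1
    chrom=${region%%:*}    # text before the first ":"
    range=${region#*:}     # text after the first ":"
    start=${range%-*}      # number before the "-"
    end=${range#*-}        # number after the "-"
    printf '%s\n' "-seq=$chrom -start=$((start - 1)) -end=$end"
}
```

Usage could look like: twoBitToFa http://hgdownload.cse.ucsc.edu/gbdb/hg19/hg19.2bit stdout $(region_to_twobit_args chr21:15000001-15000050)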

Get the "wiggle" (x-y-plot) graph data for a chromosome range

Get a copy of the current Genome Browser image from a script

Upload a custom track and link to the genome browser with the track loaded