Table Browser URL

From genomewiki
Revision as of 16:38, 14 July 2017 by Cath Tyner (talk | contribs)

Before fetching data from the table browser on the command line via a URL, please note that these types of operations are usually easier and more efficient with the kent source tree utilities featureBits and overlapSelect. With an appropriately configured $HOME/.hg.conf file, these utilities can access the UCSC public MySQL server.
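As a rough sketch of that setup, the following writes a minimal configuration file pointing at the public server. The host, user, and password values are the public read-only settings as best recalled here and should be treated as assumptions to verify against the current UCSC MySQL access documentation; write_hg_conf is a name made up for this illustration.

```shell
#!/bin/sh
# Sketch: write a minimal hg.conf for the UCSC public MySQL server.
# All values below are assumptions; verify them against UCSC's documentation.
write_hg_conf() {
    conf_path="$1"
    cat > "$conf_path" <<'EOF'
db.host=genome-mysql.soe.ucsc.edu
db.user=genomep
db.password=password
central.db=hgcentral
EOF
    # Keep the file private to the user; the kent tools expect this.
    chmod 600 "$conf_path"
}

# Usage (this overwrites any existing file at that path):
# write_hg_conf "$HOME/.hg.conf"
```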

However, if you are intent on programming the table browser, here is a brief discussion of how to create a command line script that fetches data from it.

Please take note of the following notice found on the home page of the UCSC Genome Browser website:

Program-driven use of this software is limited to a maximum of
one hit every 15 seconds and no more than 5,000 hits per day.
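One way a script might respect that notice is to pause between requests and stop after a fixed number of them. The sketch below is only an illustration: DELAY, MAX_HITS, FETCH, and throttled_fetch are hypothetical names, and the hit count covers a single run of the script, not a whole day.

```shell
#!/bin/sh
# Sketch: fetch a list of URLs no faster than one request per DELAY seconds.
# DELAY, MAX_HITS, and FETCH are illustrative names, not Table Browser settings.
DELAY="${DELAY:-15}"           # seconds to wait between requests
MAX_HITS="${MAX_HITS:-5000}"   # stop after this many requests in one run
FETCH="${FETCH:-wget -q -O -}" # command used to fetch one URL to stdout

throttled_fetch() {
    # Reads one URL per line from standard input.
    hits=0
    while IFS= read -r url; do
        [ -n "$url" ] || continue
        if [ "$hits" -ge "$MAX_HITS" ]; then
            echo "limit of $MAX_HITS requests reached, stopping" >&2
            return 1
        fi
        # Pause before every request except the first.
        [ "$hits" -eq 0 ] || sleep "$DELAY"
        $FETCH "$url"
        hits=$((hits + 1))
    done
}

# Usage: throttled_fetch < urls.txt > combined_output.txt
```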

With that limitation in mind, consider the following procedure.

The trick is to use the table browser in the normal manner until it gives an example of the type of output desired.

Then use cartDump to obtain the cgi variables used by the table browser as it produced that output. Copy those cgi variables into a command line, and add the two special URL variables:

'submit=submit&hgta_doTopSubmit=1'

to trick hgTables into thinking it just got a submit button press. Other form inputs may have different trigger variables. Examine the page source of the form to determine the trigger variable.
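The copy-and-append step can be sketched as a small helper. It assumes the cgi variables have been saved by hand to a file with one name=value pair per line and with values that are already URL-safe; that file format, build_url, and cart_vars.txt are assumptions made for this illustration, not something cartDump produces directly.

```shell
#!/bin/sh
# Sketch: join saved CGI variables into an hgTables URL and append the trigger.
# Assumes one name=value pair per line, already URL-safe (no spaces or '&').
BASE_URL='http://genome.ucsc.edu/cgi-bin/hgTables'
TRIGGER='submit=submit&hgta_doTopSubmit=1'

build_url() {
    vars_file="$1"
    # Drop blank lines, then join the remaining lines with '&'.
    query=$(grep -v '^[[:space:]]*$' "$vars_file" | paste -s -d '&' -)
    printf '%s?%s&%s\n' "$BASE_URL" "$query" "$TRIGGER"
}

# Usage: wget -O output.gff "$(build_url cart_vars.txt)"
```

For a form with a different trigger variable, TRIGGER would need to be changed to match.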

With this process, you can get hgTables to produce any of its outputs with a URL fetch, as in the examples here. It gets tricky if filters or intersections are involved. You cannot use the paste list of identifiers option, since that is a multi-step process in the browser and cannot be duplicated with a command line sequence. Instead, use the filter option and place the name of the item to filter by in a name field for the table. You may need to examine the source of a form page to determine the exact form variable names used for inputs.
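As an illustration of the filter approach, the helper below assembles the name filter variables for one item. The variable names are copied from the knownGene example on this page and may differ for other tables or browser versions; name_filter_vars is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: build the filter variables that match one item by its name field.
# Variable names follow the knownGene example on this page; other tables
# may use different ones, so check the form source.
name_filter_vars() {
    db="$1"; table="$2"; item="$3"
    prefix="hgta_fil.v.${db}.${table}"
    printf '%s.name.dd=does&%s.name.pat=%s&hgta_filterTable=%s.%s\n' \
        "$prefix" "$prefix" "$item" "$db" "$table"
}

# Usage: name_filter_vars hg18 knownGene NM_003742
```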

However, for extensive use of this type of function, it is usually more convenient and efficient to download the actual MySQL table data from hgdownload and use the kent source tree tools to manipulate and calculate with the data locally.
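For the download route, a helper along these lines prints the two files that usually accompany a table. The host name and directory layout shown are assumptions based on the common hgdownload arrangement and should be checked in a browser first; table_urls is a name made up for this sketch.

```shell
#!/bin/sh
# Sketch: print the hgdownload URLs for one table's data and schema files.
# The host name and directory layout are assumptions to verify before use.
table_urls() {
    db="$1"
    table="$2"
    base="http://hgdownload.soe.ucsc.edu/goldenPath/${db}/database"
    printf '%s/%s.txt.gz\n' "$base" "$table"   # tab-separated table dump
    printf '%s/%s.sql\n' "$base" "$table"      # CREATE TABLE statement
}

# Usage: table_urls hg18 genscan | wget -i -
```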

Here is an example of fetching genscan genes within a specified position:

#!/bin/sh
# Fetch genscan gene predictions (GFF output) for one region of hg18.
POSITION="chrX:151073054-151383976"
wget --progress=dot \
'http://genome.ucsc.edu/cgi-bin/hgTables?db=hg18&hgta_compressType=none&'\
'hgta_group=genes&hgta_outputType=gff&outGff=1&hgta_regionType=range&'\
'hgta_table=genscan&hgta_track=genscan&org=Human&position='"${POSITION}"\
'&submit=submit&hgta_doTopSubmit=1' \
    -O "genscan.${POSITION}.gtf"

Another example demonstrates fetching intron sequence for a named gene:

#!/bin/sh
# Fetch sequence for one knownGene item, selected with a name filter.
# Here hgta_doGenomicDna plays the trigger role (the "get sequence" button);
# its value is URL-encoded, with '+' standing for the space.
GENE_NAME=NM_003742
wget 'http://genome.ucsc.edu/cgi-bin/hgTables?clade=vertebrate&db=hg18&'\
'hgSeq.promoterSize=1000&hgSeq.cdsExon=on&hgSeq.intron=on&'\
'hgSeq.downstreamSize=1000&hgSeq.granularity=gene&hgSeq.padding5=0&'\
'hgSeq.padding3=0&hgSeq.casing=exon&hgSeq.repMasking=lower&'\
'hgta_doGenomicDna=get+sequence&'\
'hgta_fil.v.hg18.knownGene..rawLogic=AND&'\
'hgta_fil.v.hg18.knownGene..rawQuery=&'\
'hgta_fil.v.hg18.knownGene.name.dd=does&'\
'hgta_fil.v.hg18.knownGene.name.pat='"${GENE_NAME}"\
'&hgta_filterTable=hg18.knownGene&'\
'hgta_geneSeqType=genomic&hgta_group=genes&hgta_outputType=sequence&'\
'hgta_regionType=genome&hgta_table=knownGene&hgta_track=knownGene&'\
'org=Human' -O "${GENE_NAME}.introns.fa"

Some of those variables are probably unnecessary in the query. This is a maximum set.