Where is the annotation data

From genomewiki
Revision as of 13:12, 17 June 2020 by Max

Sometimes you know the name of the track, e.g. clinvarMain or est, from clicking around in the UI, and you need the name of the database table or file. This is not as straightforward as it seems.

1) First, look at the trackDb files. In kent/src/hg/makeDb/trackDb, run "grep -r <trackName> *" to find the trackDb.ra file that defines your track. If the track's stanza has a bigDataUrl line, the data is in the bigBed/bigWig file referenced there. The path MUST be of the form /gbdb/<db>/xxxxx. We have a habit of storing big files under /gbdb/<db>/bbi, but not always.
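The lookup in step 1 can also be scripted once grep has pointed you at the right file. The following is a minimal Python sketch, assuming the .ra file consists of blank-line-separated stanzas of "key value" lines. The function name find_big_data_url and the sample stanza are made up for illustration, and line continuations and include directives are not handled.

```python
import re


def find_big_data_url(ra_text, track_name):
    """Return the bigDataUrl of the stanza that defines track_name, or None.

    Assumes stanzas are separated by blank lines and each setting is a
    'key value' line; subtrack stanzas may be indented.
    """
    for stanza in re.split(r"\n\s*\n", ra_text):
        settings = {}
        for line in stanza.splitlines():
            line = line.strip()
            if not line or line.startswith("#"):
                continue  # skip empty lines and comments
            key, _, value = line.partition(" ")
            settings[key] = value.strip()
        # Compare only the first word after "track", so that trailing
        # modifiers on the track line do not prevent a match.
        if settings.get("track", "").split()[:1] == [track_name]:
            return settings.get("bigDataUrl")
    return None


# Invented sample stanzas, only to show the shape of the input.
sample = """\
track clinvar
compositeTrack on
shortLabel ClinVar Variants

    track clinvarMain
    parent clinvar
    bigDataUrl /gbdb/hg19/bbi/clinvar/clinvarMain.bb
    type bigBed 12 +
"""

print(find_big_data_url(sample, "clinvarMain"))
# prints /gbdb/hg19/bbi/clinvar/clinvarMain.bb
```

A return value of None means the stanza has no bigDataUrl line (or the track is not defined in this file), in which case the table lookup in step 2 is the next thing to try.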

2) In most cases, the data is in a database table with the same name as the track. Connect with hgsql or 'mysql --no-defaults -h genome-mysql.cse.ucsc.edu -u genome -A', then run:

  use hg19
  select * from clinvarMain;

This shows either (1) the data itself as rows or (2) a pointer to the data file. In case (1), run "SELECT * FROM <tableName> LIMIT 10" in MySQL to get an idea of the contents, or "DESCRIBE <tableName>" to get the schema.

Sometimes the table contains only a single row holding a file path, e.g. /gbdb/hg19/bbi/clinvar/clinvarMain.bb. This is case (2): the data is not in MySQL, but you now know where to look. You can display it with "bigBedToBed /gbdb/hg19/bbi/clinvar/clinvarMain.bb stdout" and get the schema with "bigBedInfo -as /gbdb/hg19/bbi/clinvar/clinvarMain.bb".
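The distinction between case (1) and case (2) can be sketched in Python as follows. This is only an illustration under some assumptions: the rows are whatever your MySQL client returned for "SELECT * FROM <tableName> LIMIT 10", a pointer table is taken to be a single row with a single value starting with /gbdb/, and the pointed-to file is assumed to be a bigBed. The function names pointer_file and inspect_commands are invented here.

```python
def pointer_file(rows):
    """Return the file path if rows look like case (2), otherwise None.

    Assumption: a pointer table has exactly one row with exactly one
    column, and that value is a path under /gbdb/.
    """
    if len(rows) == 1 and len(rows[0]) == 1:
        value = str(rows[0][0])
        if value.startswith("/gbdb/"):
            return value
    return None


def inspect_commands(rows, table):
    """Suggest the next commands to run for a table, given its first rows."""
    path = pointer_file(rows)
    if path is None:
        # Case (1): the data lives in MySQL itself.
        return [f"SELECT * FROM {table} LIMIT 10", f"DESCRIBE {table}"]
    # Case (2): assumed here to be a bigBed file; "stdout" as the output
    # file name makes bigBedToBed write to the terminal.
    return [f"bigBedToBed {path} stdout", f"bigBedInfo -as {path}"]


for cmd in inspect_commands([("/gbdb/hg19/bbi/clinvar/clinvarMain.bb",)], "clinvarMain"):
    print(cmd)
```

The functions only build command strings and run nothing themselves, so the sketch does not need access to a database or to the kent tools.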

3) Other cases are obscure and very rare these days. E.g. in hg18 some tables are split by chromosome. In MySQL, "SHOW TABLES LIKE '%tableName%'" tells you whether there is a table like "chr4_clinvarMain" (there isn't). For a very few, very old tracks, the table name is different from the track name.
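If you already have the output of SHOW TABLES as a list, the check for split tables can be sketched as below. This assumes split tables are named chrN_<trackName>, following the chr4_clinvarMain example above. The helper name tables_for_track and the table list are illustrative only, and tracks whose table name differs from the track name are not covered.

```python
import re


def tables_for_track(all_tables, track_name):
    """Return tables named exactly track_name or chr<something>_track_name."""
    pattern = re.compile(r"(chr\w+_)?" + re.escape(track_name))
    return [t for t in all_tables if pattern.fullmatch(t)]


# Illustrative table list, not a real SHOW TABLES result.
tables = ["chr1_est", "chr2_est", "chr1_intronEst", "all_est", "clinvarMain"]

print(tables_for_track(tables, "est"))          # ['chr1_est', 'chr2_est']
print(tables_for_track(tables, "clinvarMain"))  # ['clinvarMain']
```

Filtering in Python rather than with LIKE avoids the over-matching of '%tableName%', which would also return unrelated tables that merely contain the track name.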