Where is the annotation data

From genomewiki
Revision as of 13:59, 17 June 2020 by Max (talk | contribs)
(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)
Jump to navigationJump to search

Sometimes you know the name of the track, e.g. clinvarMain or est, from clicking around in the UI, and you need the name of the database table or file. This is not as straightforward as it seems.

1) First you can look at the trackDb files. in kent/src/hg/makeDb/trackDb, "grep -r <trackName> *" to find the trackDb.ra file that defines your track. If it has a bigDataUrl line, then the data is in a bigBed/bigWig file referenced here. The path MUST be /gbdb/<db>/xxxxx, we have a habit of storing big files under /gbdb/<db>/bbi, but not always. Either way, use the bigBedToBed to bigBedInfo -as or similar tools for bigWig to look at the data.

2) In all other cases, the data is in a database table with the same name as the track. You can use hgSql or 'mysql --no-defaults -h genome-mysql.cse.ucsc.edu -u genome -A' and then

  use hg19
  select * from clinvarMain;

This either shows (A) the data itself as rows or (B) a pointer to the data file. If it shows the data itself, case 2A, you can go to Mysql, and do a "SELECT * from <tableName> limit 10" to get an idea of the data itself (appending \G if your screen is too small) or "DESCRIBE <tableName>" to get the table schema.

Sometimes the table contains only a single row e.g. /gbdb/hg19/bbi/clinvar/clinvarMain.bb, this is case 2B. So this is non-MySQL data, you know where to look and can display the data with "bigBedToBed /gbdb/hg19/bbi/clinvar/clinvarMain.bb" and "bigBedInfo -as /gbdb/hg19/bbi/clinvar/clinvarMain.bb" to get the data schema.

3) other cases are obscure and very very rare these days. E.g. in hg18 there are some tables that are split by chromosome. You can check with "SHOW TABLES LIKE '%tableName%'" in mysql if there is a table like "chr4_clinvarMain" (there isn't, for this example). For very few and very old tracks, the table name is just different from the track name and the table browser is your friend, or the source code or your trusted older colleagues.