GENCODEqa

From Genecats
Revision as of 01:22, 1 March 2023 by Lrnassar (talk | contribs) (Adding more info on knownGene hgcentral pushes)

QA process for both GENCODE versions (hg19, mm10, mm39) and GENCODE knownGene (hg38, mm39)

This track is 'semi-otto': the original data comes standardized from GENCODE and is then run through an almost entirely automatic script to create the Genome Browser tracks. For this reason, QA is expedited and limited to the following steps. All GENCODE tracks should be QA'd together.

  1. Make the necessary trackDb edits. For GENCODE versions, that means hiding the previous track. Add the new/updated pennantIcon pointing to what will be the new news archive post.
  2. Perform a sanity check on dev: open the track at a single locus alongside a similar gene model track to make sure nothing is obviously wrong. This should take no more than 5 minutes. For knownGene tracks: click on an item and test all of the hgGene linkouts, and report any that are not working. Also, for knownGene tracks only, update the description page statistics and any mentions of old assembly numbers, etc.
  3. Push tables/data to hgwbeta
  4. Sanity check that all the correct tables/files were pushed and that the track displays. This should be quick, like the previous check. There is no need to re-check links on knownGene.
  5. Push tables/data to RR
  6. Final sanity check on the RR, quick like all the others.
    1. For knownGene you will also want to update the BLAT servers and the targetDb table, and afterwards push the blastTab tables.
    2. For targetDb, see http://genomewiki.ucsc.edu/genecats/index.php?title=UCSC_Genes_Staging_Process. Essentially, the process for each database being updated should be the following, shown here for hg38:
# Always sanity check first with a select statement
hgsql -e "select * from targetDb where db='hg38'" hgcentraltest
# -N omits the column header row so that it is not loaded as a data row later
hgsql -N -e "select * from targetDb where db='hg38'" hgcentraltest > targetDb.dev
# hgwbeta: check the existing row, delete it, then load the dev copy
hgsql -h hgwbeta -e "select * from targetDb where db='hg38' limit 1" hgcentralbeta
hgsql -h hgwbeta -e "delete from targetDb where db='hg38' limit 1" hgcentralbeta
hgsql -h hgwbeta -e "load data local infile 'targetDb.dev' into table targetDb" hgcentralbeta
# RR: same three steps against hgcentral
hgsql -h genome-centdb -e "select * from targetDb where db='hg38' limit 1" hgcentral
hgsql -h genome-centdb -e "delete from targetDb where db='hg38' limit 1" hgcentral
hgsql -h genome-centdb -e "load data local infile 'targetDb.dev' into table targetDb" hgcentral
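
Before loading targetDb.dev into hgcentralbeta or hgcentral, it may be worth checking that the dump contains only rows for the intended database. The following is a minimal sketch, not part of the documented process: check_dump is a hypothetical helper, and the position of the db column is passed as an argument because the table layout is not shown on this page.

```shell
# Hypothetical helper (an assumption, not an existing UCSC tool).
# Usage: check_dump FILE DB COL
# Succeeds only if FILE is non-empty and every tab-separated row
# has DB in column number COL.
check_dump() {
    file=$1; db=$2; col=$3
    # -s is true only for an existing file with size greater than zero.
    [ -s "$file" ] || { echo "empty or missing: $file" >&2; return 1; }
    awk -F'\t' -v db="$db" -v col="$col" '
        BEGIN { bad = 0 }
        # Report any row (for example a stray header line) whose value differs.
        $col != db { printf "line %d: unexpected value \"%s\"\n", NR, $col; bad = 1 }
        END { exit bad }
    ' "$file"
}
```

A call such as `check_dump targetDb.dev hg38 3` would apply if db were the third column of the dump; confirm the column position against the output of the select statement first.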

Then for the BLAT update you will want to see which host/port is being used now, e.g.

$ hgsql -e "select * from blatServers where db like '%hg38Kg%'" hgcentraltest
+--------------+---------------------+-------+---------+--------+---------+
| db           | host                | port  | isTrans | canPcr | dynamic |
+--------------+---------------------+-------+---------+--------+---------+
| hg38KgSeqV43 | blat1a.soe.ucsc.edu | 17915 |       0 |      1 |       0 |
+--------------+---------------------+-------+---------+--------+---------+
$ hgsql -h genome-centdb -e "select * from blatServers where db like '%hg38Kg%'" hgcentral
+--------------+--------+-------+---------+--------+---------+
| db           | host   | port  | isTrans | canPcr | dynamic |
+--------------+--------+-------+---------+--------+---------+
| hg38KgSeqV41 | blat1a | 17909 |       0 |      1 |       0 |
+--------------+--------+-------+---------+--------+---------+

Note: We drop the '.soe.ucsc.edu' part, so when updating beta/RR we would only keep blat1a from that.
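
As a small illustration of the note above, the domain can be dropped with plain shell parameter expansion. This is only a sketch; short_host is a hypothetical helper name, not an existing command.

```shell
# Hypothetical helper: print the host name without its domain part.
short_host() {
    # ${1%%.*} removes the longest suffix that starts at the first dot,
    # so a name without any dot is printed unchanged.
    printf '%s\n' "${1%%.*}"
}

short_host blat1a.soe.ucsc.edu   # prints: blat1a
```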

In this case the host, blat1a, is the same on dev as on hgwbeta and the RR, so there is no need to update it. Only the port number changes.

hgsql -h hgwbeta -e "update blatServers set port='17915' where db like '%hg38Kg%'" hgcentralbeta
hgsql -h hgwbeta -e "select * from blatServers where db like '%hg38Kg%'" hgcentralbeta
hgsql -h genome-centdb -e "update blatServers set port='17915' where db like '%hg38Kg%'" hgcentral
hgsql -h genome-centdb -e "select * from blatServers where db like '%hg38Kg%'" hgcentral
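
Because these updates touch the central databases directly, one option is to print the commands for review before running anything. The sketch below is a dry run only: blat_port_cmds is a hypothetical helper that echoes the update and verify commands shown above and rejects a port that is not a plain number. It does not call hgsql itself.

```shell
# Hypothetical dry-run helper: print (do not run) the update and verify
# commands for one central database.
# Usage: blat_port_cmds HOST CENTRALDB PORT PATTERN
blat_port_cmds() {
    host=$1; centdb=$2; port=$3; pattern=$4
    # Refuse anything that is not a plain decimal port number.
    case $port in
        ''|*[!0-9]*) echo "bad port: $port" >&2; return 1 ;;
    esac
    printf "hgsql -h %s -e \"update blatServers set port='%s' where db like '%s'\" %s\n" \
        "$host" "$port" "$pattern" "$centdb"
    printf "hgsql -h %s -e \"select * from blatServers where db like '%s'\" %s\n" \
        "$host" "$pattern" "$centdb"
}

# Print the beta and RR commands for the hg38 example.
blat_port_cmds hgwbeta hgcentralbeta 17915 '%hg38Kg%'
blat_port_cmds genome-centdb hgcentral 17915 '%hg38Kg%'
```

The printed lines can be compared against the dev values from the blatServers query and then pasted into the shell once they look right.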

Lastly, draft the news archive post for all of the tracks together and announce them. For knownGene, let the engineer know in RM that we are ready for the blastTab tables.

If any errors are encountered while performing these steps, bring them up with the track developer and ask them to include checks for the error in future builds of the track.