Chains and Nets QA

From Genecats


See also: Editing the human trackDb.ra file

Chains/nets (see Chains_Nets) have two components: the chains/nets themselves and the liftOver directories/files. The liftOver files make it possible to convert coordinates from one assembly to another. They also have several associated files that need to be pushed to hgdownload at the end of the QA process. Chains and nets are typically QAed in reciprocal pairs, since similar tests are run for both database-chain/net sets; however, this page only documents QAing one track.

Prior to QAing on hgwbeta: Pushing from hgwdev

1) Use sudo mypush to push the chain/net tables from hgwdev to hgwbeta, unless they are part of a new assembly (in which case they have most likely already been pushed). There are three tables: a chain, a net and a chain link table. The syntax for this command is: sudo mypush [from database] [tableName] hgwbeta. See the example below:

  • hgwdev: sudo mypush danRer6 chainMm9 hgwbeta
  • hgwdev: sudo mypush danRer6 netMm9 hgwbeta
  • hgwdev: sudo mypush danRer6 chainMm9Link hgwbeta

Additionally, you will have to mypush the chain/net files from the reciprocal assemblies:

  • hgwdev: sudo mypush mm9 chainDanRer6 hgwbeta

You might see other chain/net tables besides chain$db, chain$dbLink and net$db. These are the reciprocal best and syntenic tables, and they should not be pushed to beta/RR (at least as of Jan 2018, although there is some hope of adding them someday). Example tables:

chainRBestNomLeu3 chainRBestNomLeu3Link chainSynNomLeu3 chainSynNomLeu3Link netRBestNomLeu3 netSynNomLeu3

Note that the Reciprocal Best and syntenics DO have files (not tables), and those will most certainly be in Hiram's file push list to hgdownload.
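The table names in step 1 follow a fixed pattern, so the pushes for both directions can be listed with a small dry-run helper. This is only a sketch: cap_first and print_mypush_cmds are hypothetical names, and the helper only prints the commands so they can be reviewed before being run by hand on hgwdev.

```shell
# Upper-case the first letter of a database name (mm9 -> Mm9).
cap_first() {
    first=$(printf '%s' "$1" | cut -c1 | tr '[:lower:]' '[:upper:]')
    rest=$(printf '%s' "$1" | cut -c2-)
    printf '%s%s\n' "$first" "$rest"
}

# Print (do not run) the three mypush commands for one direction.
print_mypush_cmds() {
    from_db=$1
    other=$(cap_first "$2")
    for table in "chain${other}" "net${other}" "chain${other}Link"; do
        echo "sudo mypush ${from_db} ${table} hgwbeta"
    done
}

print_mypush_cmds danRer6 mm9   # tables in danRer6
print_mypush_cmds mm9 danRer6   # reciprocal tables in mm9
```

The reciprocal best and syntenic tables are deliberately left out, since they should not be pushed.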


2) "Make beta" on dev from ~/kent/src/hg/makeDb/trackDb to include the tables just pushed. Syntax for this command is: make beta DBS=[from database].

  • ex: hgwdev: make beta DBS=danRer6


3) Email a push request to copy /gbdb/[from database]/liftOver/[from database]To[to database].over.chain.gz from hgwdev to hgnfs1. First check whether the push is needed by looking for your file in /gbdb/[from database]/liftOver on hgwbeta. Because the file lives on hgnfs1, it will be available to both beta and the RR once pushed. However, because the hgcentral database on the RR doesn't yet have an entry pointing to this file, it will only show up in the Convert tool on beta.

  • ex: email: Please push /gbdb/danRer6/liftOver/danRer6ToMm9.over.chain.gz from hgwdev to hgnfs1.
  • Note that the first letter of the "to" database is capitalized in the file name (Mm9 in the example above).
  • NOTE: push the inverse file as well. ex: /gbdb/mm10/liftOver/mm10ToDanRer6.over.chain.gz
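The check in step 3 can be sketched as a helper that builds the expected file name (with the capitalized "to" database) and reports whether it is already present. The function names are hypothetical, and the existence test is only meaningful when run on a machine that sees the same /gbdb as hgwbeta.

```shell
# Upper-case the first letter of a database name (mm9 -> Mm9).
cap_first() {
    first=$(printf '%s' "$1" | cut -c1 | tr '[:lower:]' '[:upper:]')
    rest=$(printf '%s' "$1" | cut -c2-)
    printf '%s%s\n' "$first" "$rest"
}

# Build the /gbdb path of the liftOver file for one direction.
liftover_path() {
    from_db=$1
    to_cap=$(cap_first "$2")
    printf '/gbdb/%s/liftOver/%sTo%s.over.chain.gz\n' "$from_db" "$from_db" "$to_cap"
}

# Say whether the file is already there or still needs a push request.
report_liftover() {
    path=$(liftover_path "$1" "$2")
    if [ -e "$path" ]; then
        echo "present, no push needed: $path"
    else
        echo "missing, request a push: $path"
    fi
}

report_liftover danRer6 mm9
report_liftover mm9 danRer6   # the inverse file
```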


4) Copy a line from the hgcentraltest database to hgcentralbeta for every file pushed in step 3, so that the file is visible on beta (this is similar to the way that tables are not visible until trackDb is updated). To do this, first find the line in hgcentraltest.liftOverChain, copy it, and then insert it into hgcentralbeta.liftOverChain. Note that it is very easy to make a mistake that can bring down hgcentral, so make sure that your syntax and mysql statements are correct. See the example below:

  • hgwdev~$ hgsql hgcentraltest
mysql> select * from liftOverChain where fromDb="danRer6" and toDb="mm9";
  • Now copy this line into a local file:
hgwdev~$ hgsql -Ne "select * from liftOverChain where fromDb='danRer6' and toDb='mm9'" hgcentraltest > chain.dev
  • Insert the line into hgcentralbeta on hgwbeta:
First check to make sure that this line isn't already on hgcentralbeta (duplicate lines are hard to remove):
hgwdev~$ hgsql -h hgwbeta -Ne "select * from liftOverChain where fromDb='danRer6' and toDb='mm9'" hgcentralbeta
Insert the line copied from hgcentraltest into hgcentralbeta:
hgwdev~$ hgsql -h hgwbeta -e "load data local infile 'chain.dev' into table liftOverChain" hgcentralbeta
  • check your work:
hgwdev~$ hgsql -h hgwbeta -Ne "select * from liftOverChain where fromDb='danRer6' and toDb='mm9'" hgcentralbeta


  • Repeat for every liftOver file for your organism (including the inverse files).
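Since duplicate lines are hard to remove, one option is to guard the load with a quick row count. The sketch below assumes the two select commands from the example above have been saved to files (chain.dev from hgcentraltest, and a hypothetical chain.beta from hgcentralbeta); ok_to_insert is a hypothetical helper and does not touch any database itself.

```shell
# Count the rows in a saved "hgsql -Ne" result file.
count_rows() {
    awk 'END { print NR }' "$1"
}

# Succeed only if exactly one row will be loaded and the
# destination table does not already contain a matching row.
ok_to_insert() {
    dump_rows=$(count_rows "$1")       # rows selected from the source
    existing_rows=$(count_rows "$2")   # rows already in the destination
    [ "$dump_rows" -eq 1 ] && [ "$existing_rows" -eq 0 ]
}

# Demo with placeholder files standing in for the real query output.
demo_dir=$(mktemp -d)
printf 'one placeholder row\n' > "$demo_dir/chain.dev"
: > "$demo_dir/chain.beta"
if ok_to_insert "$demo_dir/chain.dev" "$demo_dir/chain.beta"; then
    echo "safe to load"
else
    echo "stop and investigate before loading"
fi
```

On hgwdev the guard would sit in front of the load data command, for example: ok_to_insert chain.dev chain.beta && hgsql -h hgwbeta -e "load data local infile 'chain.dev' into table liftOverChain" hgcentralbeta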

QAing on hgwbeta

Test CGIs

There are two CGIs that use the liftOver data: hgConvert and hgLiftOver. To test these CGIs, first select a short top-level net that has very few other chains matching the same area, and note its coordinates in both assemblies. For example: Tenrec position: scaffold_229106:701-8492; Human position: chr10:69666116-69673705

To test hgConvert, first go to the genome browser of your "fromDb" database and center the window on the pre-selected net. Click the "View" pulldown in the top blue bar and select the Convert link. Set the pulldown menu to the "toDb" being tested and click "submit". The results should be the same as the chain/net track details for that chain.

To test hgLiftOver, first click the "Tools" pulldown in the side blue bar on the home page, then click the "Other Utilities" link to reach the "Batch Conversion (liftOver)" link. Set the pulldown menus to match the fromDb and toDb being tested, enter the pre-selected test coordinates and click "submit". If the conversion fails, lower minMatch (as low as 0.01) until a result is obtained for some region. Check whether the results match those from hgConvert and the chain/net track. Results in hgConvert may not match hgLiftOver but should still match the net and chains exactly. If hgLiftOver returns results that are NOT in hgConvert, contact the developer.
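If the test coordinates are needed in BED form (for example, to paste a BED line instead of a position), keep in mind that browser positions are 1-based while BED starts are 0-based. The sketch below does that conversion; pos_to_bed is a hypothetical helper and assumes the position contains no commas.

```shell
# Convert a browser position (chrom:start-end, 1-based) into a
# tab-separated BED line (0-based start, same end).
pos_to_bed() {
    chrom=${1%%:*}
    range=${1#*:}
    start=${range%-*}
    end=${range#*-}
    printf '%s\t%d\t%s\n' "$chrom" "$((start - 1))" "$end"
}

pos_to_bed scaffold_229106:701-8492
pos_to_bed chr10:69666116-69673705
```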

Also see the notes at the bottom of the page about Editing the human trackDb.ra file and about Replacing old tables with new ones (i.e. dropping old chains and nets, which also need to be dropped from genome-euro).

QA the track either quick or full

Quick QA

You can run the following two scripts instead of doing the "Full QA" (below), which uses the chainNetTrio.csh script. Pick one (Quick or Full) and then move on to Test CGIs.

Run getChainLines.csh
getChainLines.csh [database] [other database]

Check that the values for chainMinScore and chainLinearGap output by the script match the values listed in the track description page, under "Methods, Chain track", for the relevant track in the Genome Browser.

Example:

Script output:

chainMinScore 3000
chainLinearGap medium

Track description text:

Chains scoring below a minimum score of '3000' were discarded; the remaining chains are displayed in this track. 
The linear gap matrix used with axtChain:
-linearGap=medium

Example to do this in a loop:

for dbs in $(cat dbsWithHg38ChainNet) ; do echo $dbs ; getChainLines.csh hg38 $dbs ; done > getChainLines.output.9dbs
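The comparison between the script output and the description text can also be done mechanically. The sketch below assumes both have been saved to files and that the description uses the same wording as the example above ("minimum score of '3000'" and "-linearGap=medium"); check_chain_lines is a hypothetical helper, and a reported mismatch should still be confirmed by eye.

```shell
# Succeed if the chainMinScore and chainLinearGap values in the saved
# script output both appear in the saved description text.
check_chain_lines() {
    out_file=$1    # saved getChainLines.csh output
    desc_file=$2   # saved "Methods, Chain track" text
    min_score=$(awk '$1 == "chainMinScore" { print $2 }' "$out_file")
    linear_gap=$(awk '$1 == "chainLinearGap" { print $2 }' "$out_file")
    [ -n "$min_score" ] && [ -n "$linear_gap" ] &&
        grep -q -- "minimum score of '${min_score}'" "$desc_file" &&
        grep -q -- "-linearGap=${linear_gap}" "$desc_file"
}

# Demo using the example values from this page.
demo_dir=$(mktemp -d)
printf 'chainMinScore 3000\nchainLinearGap medium\n' > "$demo_dir/script.out"
cat > "$demo_dir/description.txt" <<'EOF'
Chains scoring below a minimum score of '3000' were discarded; the remaining chains are displayed in this track.
The linear gap matrix used with axtChain:
-linearGap=medium
EOF
if check_chain_lines "$demo_dir/script.out" "$demo_dir/description.txt"; then
    echo "values match the description"
else
    echo "mismatch: compare by hand"
fi
```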

Run getMatrixLines.csh

NOTE: In the latest assemblies, lastz is used instead of blastz for the scoring matrices, and getMatrixLines.csh will not work in these cases. Use the findScores.pl script instead.

getMatrixLines.csh [database] [other database]

Check that the values for the matrix output by the script match the values shown in the track description page, under "Methods, Chain track", for the relevant track in the Genome Browser.

Example:

Script output:

matrixHeader A, C, G, T
matrix 16 91,-114,-31,-123,-114,100,-125,-31,-31,-125,100,-114,-123,-31,-114,91

Track description text:

  	A	C	G	T
A	91	-114	-31	-123
C	-114	100	-125	-31
G	-31	-125	100	-114
T	-123	-31	-114	91
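To make the flat "matrix 16" line easier to compare with the table on the description page, it can be reshaped into four labelled rows. This is only a convenience sketch; matrix_rows is a hypothetical helper that takes the comma-separated values from the script output.

```shell
# Print the 16 comma-separated matrix values as four tab-separated
# rows labelled A, C, G, T, in the same order as the description table.
matrix_rows() {
    printf '%s\n' "$1" | awk -F, '{
        split("A C G T", base, " ")
        for (r = 0; r < 4; r++) {
            row = base[r + 1]
            for (c = 1; c <= 4; c++) row = row "\t" $(r * 4 + c)
            print row
        }
    }'
}

matrix_rows "91,-114,-31,-123,-114,100,-125,-31,-31,-125,100,-114,-123,-31,-114,91"
```

With the example values above, the four rows printed are the same as the A, C, G and T rows of the description table.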

If you're not sure what the output should match

For some assemblies like mm10, there is no listing of the chainMinScore, the chainLinearGap or the matrix in the chain/net track description page. In cases such as this where you're not sure what the output of getChainLines.csh and getMatrixLines.csh is supposed to be, you can run the command findScores.pl:

hgwdev> findScores.pl mm10 petMar2
looking in file:
  /hive/data/genomes/mm10/bed/lastz.petMar2/axtChain/run/chain.csh
-scoreScheme=/scratch/data/blastz/HoxD55.q
matrix 16 91,-90,-25,-100,-90,100,-100,-25,-25,-100,100,-90,-100,-25,-90,91
-minScore=5000
-linearGap=loose
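The findScores.pl output uses different labels from getChainLines.csh. As a rough sketch, the two relevant lines can be rewritten into the chainMinScore/chainLinearGap form so that both scripts can be read the same way; scores_to_chain_lines is a hypothetical helper that reads saved findScores.pl output on standard input.

```shell
# Rewrite -minScore= and -linearGap= lines from findScores.pl output
# into the "chainMinScore N" / "chainLinearGap X" form.
scores_to_chain_lines() {
    awk -F= '
        $1 == "-minScore"  { print "chainMinScore", $2 }
        $1 == "-linearGap" { print "chainLinearGap", $2 }
    '
}

# Demo using the example output from this page.
scores_to_chain_lines <<'EOF'
-scoreScheme=/scratch/data/blastz/HoxD55.q
matrix 16 91,-90,-25,-100,-90,100,-100,-25,-25,-100,100,-90,-100,-25,-90,91
-minScore=5000
-linearGap=loose
EOF
```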

If script output does not match description page

If you run getChainLines.csh and getMatrixLines.csh and the output of one or both does not match what is listed on the description page, first make sure that you are looking at the correct assembly. If the mismatch is real, you will need to edit the trackDb.chainNet.ra file. For instructions on how to do this, see Editing trackDb.chainNet.ra.

Full QA

Use the following only if you do not do the Quick QA (above). You do not need to do both, as the Full QA steps (i.e. the chainNetTrio.csh script) include the above Quick QA scripts.

Run the chainNetTrio.csh script. It runs several other scripts and writes the results to several files. It is recommended that you create a separate directory to run this script in, so that the output files don't end up scattered in another directory. Read through the results in these three output files: *.chain.* , *.chain2.* , *.net.*. Mainly check that all the scripts tested OK, and do the few manual checks that are requested in these files.

To run the script:

chainNetTrio.csh  [database] [other database]
  • ex: chainNetTrio.csh danRer6 mm9
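Running the script in its own directory can be wrapped in a small helper. This is a sketch only: run_trio and the default directory name are hypothetical, and when chainNetTrio.csh is not on the PATH the helper just prints the command it would have run.

```shell
# Run chainNetTrio.csh inside a dedicated working directory.
# The body is a subshell, so the caller's directory is unchanged.
run_trio() (
    db=$1
    other=$2
    workdir=${3:-chainNetQa.$db.$other}   # hypothetical default name
    mkdir -p "$workdir" && cd "$workdir" || exit 1
    if command -v chainNetTrio.csh >/dev/null 2>&1; then
        chainNetTrio.csh "$db" "$other"
    else
        echo "chainNetTrio.csh not found; would run: chainNetTrio.csh $db $other"
    fi
    # List the three kinds of result files to read through, if present.
    ls -- *.chain.* *.chain2.* *.net.* 2>/dev/null || true
)

run_trio danRer6 mm9 "$(mktemp -d)/chainNetQa.danRer6.mm9"
```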

After QAing on hgwbeta: Push to and QA on the RR

Push to the RR

Note: for steps 1-3 you may have to do some sneaky/selective pushing (i.e. dropping, including genome-euro) that is different from the steps listed if there is already a chain/net on the RR for an older assembly. See Replacing old tables with new ones (i.e. replacing chainXXX,netFelCat4 with chainXXX,netFelCat5).

1) Make public on beta so that trackDb is completely up-to-date with anyone else's changes.

  • ex: hgwbeta: make public DBS=danRer6

2) Email a push request to copy the three chain/net (chain, net and chain link) tables from hgwbeta to mysqlrr and also trackDb and friends (also see: [Three State TrackDb]).

  • ex: email: Please push trackDb and Friends and the following tables from the danRer6 database: chainMm9, netMm9, chainMm9Link from hgwbeta to mysqlrr

If this is not part of a new assembly release, make sure to add the URL for the Release Log in the PushQ (see Instructions).

Test on the RR and Add Line to hgcentral

3) Test the chain/Net track on the RR to make sure it is working properly.

4) Add line to hgcentral.liftOverChain from hgcentralbeta.liftOverChain. Use the same steps listed in step 4 above under "Prior to QAing on hgwbeta". Note that you do not need to push the liftOver file as the RR and hgwbeta both have access to hgnfs1.

  • To log into hgcentral from hgwdev: hgsql -h genome-centdb hgcentral
  • Insert the line copied from hgcentralbeta into hgcentral:
hgsql -h genome-centdb -e "load data local infile 'chain.dev' into table liftOverChain" hgcentral

5) Test hgLiftOver and hgConvert to make sure they are working as expected.

6) Email a push request to copy this file: /usr/local/apache/htdocs-hgdownload/goldenPath/[from database]/liftOver/[from database]To[to database].over.chain.gz and this directory: /usr/local/apache/htdocs-hgdownload/goldenPath/[from database]/vs[to database]/* from hgwdev to hgdownload so that the public can download these files (note that the .net.axt. files may be in their own /axtNet/ directory, as described in the README file). Note that the destination paths on hgdownload will only be "htdocs", not "htdocs-hgdownload".

  • Before pushing, check the file as described in New_track_checklist#Downloads.
  • Make sure that the vs[database]/md5sum.txt file has been updated to include the new files, and push along with the second liftOver/md5sum.txt too.
  • ex: email: Please push /usr/local/apache/htdocs-hgdownload/goldenPath/danRer6/liftOver/danRer6ToMm9.over.chain.gz and /usr/local/apache/htdocs-hgdownload/goldenPath/danRer6/vsMm9/* from hgwdev to hgdownload.
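The md5sum.txt check can be sketched as a helper that lists files in a download directory with no entry in that directory's md5sum.txt. It assumes the usual md5sum layout (hash, then file name) and only looks at files directly in the directory, not in subdirectories such as axtNet/; missing_from_md5sum and the demo file names are hypothetical.

```shell
# Print the name of every regular file in a directory that has no
# entry in that directory's md5sum.txt.
missing_from_md5sum() {
    dir=$1
    for f in "$dir"/*; do
        name=$(basename "$f")
        if [ ! -f "$f" ] || [ "$name" = "md5sum.txt" ]; then
            continue
        fi
        # md5sum lines look like "<hash>  <name>" or "<hash> *<name>".
        if ! awk -v n="$name" '$2 == n || $2 == ("*" n) { found = 1 }
                               END { exit !found }' "$dir/md5sum.txt"; then
            echo "$name"
        fi
    done
}

# Demo: two empty files, only one of them listed in md5sum.txt.
demo_dir=$(mktemp -d)
: > "$demo_dir/danRer6.mm9.net.gz"
: > "$demo_dir/danRer6.mm9.all.chain.gz"
printf '%s  %s\n' d41d8cd98f00b204e9800998ecf8427e danRer6.mm9.net.gz > "$demo_dir/md5sum.txt"
missing_from_md5sum "$demo_dir"   # prints danRer6.mm9.all.chain.gz
```

Any name that is printed deserves a look before the push request is sent.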

7) If tables were dropped on hgwbeta, email pushers to drop the tables on the RR. Be sure to request genome-euro drop along with mysqlrr.

Update and push downloads.html

8) To make a link to these new download files change the downloads.html file in your hgdownload source tree.

  • hgwdev (normally in /cluster/home/yourname/hgdownload):vi downloads.html
  • in your hgdownload directory, do a 'git pull'
  • Find a link under "Pair-wise alignments" to a similar database. Copy it, paste it, and change it so that it points to the new vs directory.
  • do a 'make alpha'
  • view on http://hgdownload-test.soe.ucsc.edu/downloads.html to see your changes, then:
hgwdev> git add downloads.html
hgwdev> git commit -m "msg describing your changes" downloads.html
hgwdev> git push


9) Your changes should now appear in /usr/local/apache/htdocs-hgdownload/downloads.html

10) Email a push request to copy this file: /usr/local/apache/htdocs-hgdownload/downloads.html from hgwdev to hgdownload, with a note to replace "htdocs-hgdownload" with "htdocs" in the directory path. Once push is complete, check to see that your changes are now on the public downloads page.


The end!

Adding chains/nets to a human assembly

If you are adding chains/nets to a human assembly, in addition to the above steps, you will also need to modify the human trackDb.ra file. For instructions on how to do this, see Editing the human trackDb.ra file.