Assembly QA Part 3 BETA Steps: Difference between revisions

From Genecats
Revision as of 23:14, 17 April 2017

This page is currently a draft in progress. For now, use Releasing an assembly instead.

Navigation Menu

Home Page
Assembly QA Part 1: DEV Steps
Assembly QA Part 2: Track Steps
Assembly QA Part 3: BETA Steps
Assembly QA Part 4: RR Steps
Assembly QA Part 5: Post Release Steps


NOTE TO SELF ADD LINKS TO ALL CHAIN-NET STEPS http://genomewiki.ucsc.edu/genecats/index.php/Chains_and_Nets_QA

Tracks: Populate spreadsheet steps

  • We need to create a checklist for your beta steps.
  • You can add a new tab to track beta steps, or you can pick up where you left off on the same tab as your "dev" steps.
To populate the "wiki link" for each step, add this formula to cell A2 (or the row after your last "dev" step) in your new "track checklist" spreadsheet and drag the formula down:
A2 (or the row after your last "dev" step)
=HYPERLINK("http://genomewiki.ucsc.edu/genecats/index.php/Assembly_QA_Part_3_BETA_Steps#"&SUBSTITUTE(B2," ", "_"),"link")
IMPORTANT: Drag the formula for "A" down the spreadsheet to populate the other rows.
To populate the "track checklist steps," add this formula to cell B2 (or the row after your last "dev" step) in your new "track checklist" spreadsheet. Do NOT drag the formula down.
B2
=IMPORTXML("http://genomewiki.ucsc.edu/genecats/index.php/Assembly_QA_Part_3_BETA_Steps#", "/html/body/div/div/div/div/div/h4/span/span")

This formula will populate all the rows below it with the wiki section titles. You do not need to drag this formula down.
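As a rough illustration of what the HYPERLINK formula computes, the sketch below rebuilds the link for one section title in shell. The helper name make_link is hypothetical and exists only for this example; the transformation itself (spaces replaced by underscores, appended to the page URL) is the one in the formula.

```shell
# Page URL used by the HYPERLINK formula, ending in "#".
base="http://genomewiki.ucsc.edu/genecats/index.php/Assembly_QA_Part_3_BETA_Steps#"

# make_link is a hypothetical helper name, not part of any UCSC tool.
make_link() {
    # Same transformation as SUBSTITUTE(B2, " ", "_"):
    # every space in the section title becomes an underscore.
    printf '%s%s\n' "$base" "$(printf '%s' "$1" | tr ' ' '_')"
}

make_link "Beta: Make clean table list"
```

If a link in column A does not jump to the right section, comparing it with the output of a sketch like this can show whether the title in column B contains something other than plain spaces.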

Beta: Check for Ensembl tracks

Before pushing Ensembl tracks to beta, review the Ensembl wiki page and the Ensembl_QA script.

Beta: Make clean table list

Step 1. Verify table list location

In Redmine for your assembly, look at the "Table List" field. The engineer should have provided a path to redmine.$db.table.list, e.g.:

/hive/data/genomes/manPen1/redmine5515/redmine.manPen1.table.list

Your file contents should be something like this:

head redmine.manPen1.table.list
hg38.chainManPen1
hg38.chainManPen1Link
hg38.netManPen1
manPen1.augustusGene

Step 2. Copy table list to your hive dir

From hive, copy the table list to your assembly dir:

cp /hive/data/genomes/manPen1/redmine5515/redmine.manPen1.table.list .

Step 3. Make a clean file containing only the table names

cut -d'.' -f2 redmine.manPen1.table.list > cleanTableList

Beta: Remove certain tables from cleanTableList

Remove tables from the list that start with trackDb or hgFindSpec:

sed -i.bak '/trackDb/d' cleanTableList

sed -i.bak '/hgFindSpec/d' cleanTableList

You'll be using this cleanTableList when doing your push.
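The cut and filter steps above can be tried on a scratch copy before touching the real list. This is a minimal sketch, not the official procedure: the first four sample lines come from the example earlier on this page, while the trackDb and hgFindSpec lines are made up to show the filtering. It uses a single grep with a ^ anchor instead of the two sed calls, which is a substitution made here so that only names starting with those prefixes are dropped.

```shell
# Work in a scratch directory so the real table list is not touched.
workdir=$(mktemp -d)
list="$workdir/redmine.manPen1.table.list"
clean="$workdir/cleanTableList"

# Sample list: the last two lines are made-up entries for illustration.
cat > "$list" <<'EOF'
hg38.chainManPen1
hg38.chainManPen1Link
hg38.netManPen1
manPen1.augustusGene
manPen1.trackDb
manPen1.hgFindSpec
EOF

# Keep only the part after the "db." prefix.
cut -d'.' -f2 "$list" > "$clean"

# Drop names that start with trackDb or hgFindSpec.
grep -v -E '^(trackDb|hgFindSpec)' "$clean" > "$clean.tmp"
mv "$clean.tmp" "$clean"

cat "$clean"
```

Note that the database prefix is discarded by cut, so tables listed under another database (hg38 in the sample) end up in the same list as tables from your own assembly.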

Beta: Remove 'seq' and 'extFile' tables

Follow the steps above to remove seq or extFile tables from cleanTableList. Do not push seq or extFile tables from dev to beta.

You must use the copyExtSeqRows.csh script to move only the rows needed.

More information can be found here.
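A possible way to drop these two tables from the list is sketched below. It assumes the tables are named exactly seq and extFile and uses whole-line matching so that other tables whose names merely contain "seq" stay in the list. The list contents are made up for the example.

```shell
workdir=$(mktemp -d)
clean="$workdir/cleanTableList"

# Made-up list contents for illustration only.
printf '%s\n' augustusGene extFile seq seqNcbiRefSeq > "$clean"

# -x matches whole lines only, so a name such as seqNcbiRefSeq is kept
# while the exact names seq and extFile are removed.
grep -v -x -E 'seq|extFile' "$clean" > "$clean.tmp"
mv "$clean.tmp" "$clean"

cat "$clean"
```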

Beta: Push all tables to beta

Push all tables (EXCEPT seq, extFile, hgFindSpec and trackDb tables) from hgwdev to hgwbeta:

This command will push one table from dev to beta for your database/assembly:

 sudo mypush $db $table mysqlbeta

Or, push them all in a loop:

for table in $(cat cleanTableList); do sudo mypush $db $table mysqlbeta; done
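Before running the real loop, it can help to print the commands it would issue and read them over. The following is a dry-run sketch only: mypush is never invoked, the function name print_push_commands is hypothetical, and the table names are sample values.

```shell
db="manPen1"                 # example assembly name used on this page
listfile=$(mktemp)           # stand-in for cleanTableList
printf '%s\n' chainManPen1 netManPen1 augustusGene > "$listfile"

# print_push_commands is a hypothetical helper: it prints one mypush
# command per table instead of running it.
print_push_commands() {
    # $1 = database, $2 = file with one table name per line
    while IFS= read -r table; do
        [ -n "$table" ] || continue      # skip blank lines
        printf 'sudo mypush %s %s mysqlbeta\n' "$1" "$table"
    done < "$2"
}

print_push_commands "$db" "$listfile"
```

Reading the list with while/read rather than $(cat ...) keeps each line intact as one table name, and the printed output makes a mismatched variable name easy to spot before anything is pushed.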

Beta: Do a 'make beta' in trackDb for your assembly

Do make beta on hgwdev in kent/src/hg/makeDb/trackDb like so:

 make beta DBS=$db

Example to make beta on more than one db at a time:

 make beta DBS="$db1 $db2 $db3 $db4"




Running multiple dbs in parallel to save time

Multiple assemblies can be run in parallel by using the make -j option (as of 2/10/17, thanks to Mark Diekhans). Updating all dev dbs used to take about 50 minutes; with 16 running in parallel it now takes about 5 minutes. While Mark has safely run 16 dbs at a time on dev, it is recommended to run only 8 or fewer at a time on beta or the RR. Use make -j N beta and make -j N public, where N is the number of parallel processes; for example, make -j 16 alpha runs 16 at a time.

For example, if you do:
  make -j 8 alpha
it updates everything, 8 at a time. If you do:
  make -j 2 DBS="hg19 hg38 mm10 felCat5"
it updates those 4 databases, 2 at a time.
Note: the 'make in parallel' process creates and removes temporary files. The temporary directory is chosen by getTempDir, declared in:
 kent/src/inc/portable.h:
   char *getTempDir(void);
   /* get temporary directory to use for programs.  This first checks TMPDIR environment
    * variable, then /data/tmp, /scratch/tmp, /var/tmp, /tmp.  Return is static and
    * only set of first call */
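To get a feel for which directory would likely be picked on a given machine, the lookup order in that comment can be approximated in shell. This is only a sketch of the order described in the comment, not the C implementation; in particular, it assumes a candidate is used only if it exists as a directory, which the comment does not state. The function name guess_temp_dir is hypothetical.

```shell
# Approximation of the lookup order in the getTempDir comment.
# Assumption: a candidate is accepted only if it is an existing directory.
guess_temp_dir() {
    for dir in "${TMPDIR:-}" /data/tmp /scratch/tmp /var/tmp /tmp; do
        if [ -n "$dir" ] && [ -d "$dir" ]; then
            printf '%s\n' "$dir"
            return 0
        fi
    done
    return 1
}

guess_temp_dir
```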

Examples:

 make beta -j 4 DBS="dm6 ce11 sacCer3 droEre1 droSec1 droSim1 droYak2 droAna2 dp3 droMoj2 droVir2 droGri1 droPer1"
 make public -j 4 DBS="dm6 ce11 sacCer3 droEre1 droSec1 droSim1 droYak2 droAna2 dp3 droMoj2 droVir2 droGri1 droPer1"

Beta: Do a 'make beta' in trackDb for chain/net organisms

If your assembly has alignments to other organisms, such as chain/net alignments to hg38 or to the previous assembly version of your organism, be sure to also do a 'make beta' for those assemblies.

See the details in the 'make beta' section above.



Beta: Do a 'make public' in trackDb for your assembly

Make your track public by using the "make public" command on hgwdev while in the trackDb directory (src/hg/makeDb/trackDb):

   [user@hgwdev trackDb]$ make public DBS=$db

Example to make public on more than one db at a time:

 make public DBS="$db1 $db2 $db3 $db4"

Your track should now be visible on the hgwbeta-public server.

If your track is not visible, you may want to check that your track has the correct release tag. Also see the Three State TrackDb page for more information.


Beta: Do a 'make public' in trackDb for chain/net organisms

If your assembly has alignments to other organisms, such as chain/net alignments to hg38 or to the previous assembly version of your organism, be sure to also do a 'make public' for those assemblies.

See the details in the 'make public' section above.

Beta: Check release tags: compare dev and beta tracks side-by-side

Compare your tracks by bringing up a dev and a beta browser window side-by-side. If some beta tracks can't be seen, you may need to edit the release tag. See this page for more information: http://genomewiki.ucsc.edu/index.php/ThreeStateTrackDb

Beta: Review copyHgcentral steps

Beta: 1. copyHgcentral test $db blatServers dev beta

Beta: 2. copyHgcentral test $db dbDb dev beta

Beta: 3. copyHgcentral test $db defaultDb dev beta

Beta: 4. copyHgcentral test $db genomeClade dev beta

Beta: 5. copyHgcentral: liftOverChain (manual move)

liftOverChain is not copied by the copyHgcentral script; it needs to be copied manually.

  • Only copy lines from liftOverChain on hgcentraltest to hgcentralbeta if there are liftOver files listed in the pushQ and if the assemblies they go to/from exist on the RR.
  • Check for lines in liftOverChain that should be in the pushQ, but aren't (e.g., the liftOver from a previous assembly).
  • Email the developer and ask them to add them to the pushQ if necessary.
 hgsql -Ne "SELECT * FROM liftOverChain WHERE fromDb = '$db' OR toDb = '$db'" hgcentraltest > chain.dev 
Check beta, load if not present and recheck:
 hgsql -h mysqlbeta -Ne "SELECT * FROM liftOverChain WHERE fromDb = '$db' OR toDb = '$db'" hgcentralbeta 
 hgsql -h mysqlbeta -e "LOAD DATA LOCAL INFILE 'chain.dev' INTO TABLE liftOverChain" hgcentralbeta
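One way to see which rows are on dev but not yet on beta is to compare the two query outputs as sorted text. The sketch below assumes the beta query output has been saved to a file called chain.beta (a hypothetical name) and uses made-up two-column rows; real liftOverChain rows have more columns.

```shell
workdir=$(mktemp -d)

# Made-up rows standing in for hgsql output (fromDb, toDb only).
printf 'hg38\tmanPen1\nmanPen1\thg38\n' > "$workdir/chain.dev"
printf 'hg38\tmanPen1\n' > "$workdir/chain.beta"

# comm needs sorted input; LC_ALL=C keeps the sort order consistent.
LC_ALL=C sort "$workdir/chain.dev"  > "$workdir/chain.dev.sorted"
LC_ALL=C sort "$workdir/chain.beta" > "$workdir/chain.beta.sorted"

# -23 keeps only lines unique to the first file: rows on dev missing from beta.
missing=$(LC_ALL=C comm -23 "$workdir/chain.dev.sorted" "$workdir/chain.beta.sorted")
printf '%s\n' "$missing"
```

If some rows are already present on beta, loading only the missing rows rather than the whole chain.dev file may avoid creating duplicates.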

Beta: checkMetaData

After you have completed the steps above, use the script checkMetaData.csh to make sure that all of the metadata is the same on hgwdev and on hgwbeta. Run this script in a temporary folder; it creates some comparison files that can be deleted after the check.

Beta: joinerCheck common keys

Check that common keys between tables are in sync:

 hgwdev > cd ~/kent/src/hg/makeDb/schema 
 hgwdev > joinerCheck -database=$db -keys all.joiner

If there are errors related to genbank identifiers, it is likely because of the genbank load process, and not an issue with your database. Run joinerCheck once the tables are on beta to confirm:

 hgwdev > HGDB_CONF=~/.hg.conf.beta joinerCheck -keys -identifier=$identifier all.joiner

Beta: joinerCheck table times

Check table update times:

 hgwdev > joinerCheck -database=$db -times all.joiner

Beta: joinerCheck tableCoverage

Check that all tables in this database are mentioned/referenced in all.joiner

 hgwdev > joinerCheck -database=$db -tableCoverage all.joiner 

If not all of the tables are listed, email the developer and ask them to add those tables to the tablesIgnored $db entry in all.joiner. According to Hiram it is probably ok for us to edit all.joiner ourselves.

Beta: repush any tables? check updateTimes

Make sure no tables need to be repushed from hgwdev to hgwbeta

You can use

hgwdev > updateTimesDb.sh -d $db

to compare table update times between hgwdev and hgwbeta. Everything but hgFindSpec, history, tableDescriptions, trackDb and the genbank tables should have the same update times.

To see all of the tables in the assembly that are related to genbank do this:

hgwdev > hgsql -Ne 'show tables' $db | egrep -f /cluster/data/genbank/etc/genbank.tbls
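The same pattern file can also be used the other way round, to list the tables that are not genbank-related. The sketch below only illustrates the grep mechanics: the pattern file and table list are made up, and the real genbank.tbls contents may look different. It does not handle hgFindSpec, history, tableDescriptions or trackDb, which are also allowed to differ.

```shell
workdir=$(mktemp -d)

# Made-up pattern file standing in for genbank.tbls.
printf '%s\n' '^gbCdnaInfo$' '^all_mrna$' > "$workdir/genbank.tbls"
# Made-up table list standing in for the output of 'show tables'.
printf '%s\n' all_mrna augustusGene chromInfo gbCdnaInfo > "$workdir/tables"

# Tables matching a genbank pattern: update times may differ for these.
genbank_tables=$(grep -E -f "$workdir/genbank.tbls" "$workdir/tables")

# All other tables (-v inverts the match).
other_tables=$(grep -v -E -f "$workdir/genbank.tbls" "$workdir/tables")

printf 'genbank:\n%s\nother:\n%s\n' "$genbank_tables" "$other_tables"
```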

END OF SECTIONS

Request a push of any listed supporting files in /gbdb from hgwdev to hgnfs1 and check on hgwbeta. Note that hgwbeta and the RR share the files on hgnfs1, so once these files are in place, there is not another push required when the track is released to the RR. Be sure to send a push request to have the gbdb files pushed to hgdownload in advance of the usual Sunday sync, if this is necessary for your track.

If there are images associated with any track description pages, be sure to run a make beta from within kent/src/hg/htdocs/ to get the images to beta.

Tracks: Remove release tag for big*/vcf track types

Once you verify that the track looks good on hgwbeta, remove the release tag from trackDb.ra.




🔵 Done with BETA steps? Go to Assembly QA Part 4: RR Steps