Assembly QA Part 3 BETA Steps: Difference between revisions

Revision as of 17:52, 20 April 2017

This page is currently a draft in progress. For now, use Releasing an assembly instead.

Navigation Menu
Home Page
Assembly QA Part 1: DEV Steps
Assembly QA Part 2: Track Steps
Assembly QA Part 3: BETA Steps
Assembly QA Part 4: RR Steps
Assembly QA Part 5: Post Release Steps

NOTE TO SELF ADD LINKS TO ALL CHAIN-NET STEPS http://genomewiki.ucsc.edu/genecats/index.php/Chains_and_Nets_QA

Tracks: Populate spreadsheet steps

We need to create a checklist for your beta steps.
You can add a new tab to track beta steps, or you can pick up where you left off on the same tab as your "dev" steps.

To populate the "wiki link" for each step, add this formula to cell A2 (or the row after your last "dev" step) in your new "track checklist" spreadsheet and drag the formula down:

A2 (or the row after your last "dev" step)

=HYPERLINK("http://genomewiki.ucsc.edu/genecats/index.php/Assembly_QA_Part_3_BETA_Steps#"&SUBSTITUTE(B2," ", "_"),"link")

IMPORTANT: Drag the formula for "A" down the spreadsheet to populate the other rows.
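
To preview the link that the formula builds for a given section title, the same space-to-underscore substitution can be sketched in the shell. The function name section_link is made up for this illustration, and titles containing characters that the wiki encodes specially in anchors may not match exactly.

```shell
# Hypothetical helper: mimics SUBSTITUTE(B2," ","_") from the spreadsheet formula.
section_link() {
  base='http://genomewiki.ucsc.edu/genecats/index.php/Assembly_QA_Part_3_BETA_Steps#'
  printf '%s%s\n' "$base" "$(printf '%s' "$1" | tr ' ' '_')"
}

link=$(section_link 'Beta: checkMetaData')
printf '%s\n' "$link"
```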

To populate the "track checklist steps," add this formula to cell B2 (or the row after your last "dev" step) in your new "track checklist" spreadsheet. Do NOT drag the formula down.

B2

=IMPORTXML("http://genomewiki.ucsc.edu/genecats/index.php/Assembly_QA_Part_3_BETA_Steps#", "/html/body/div/div/div/div/div/h4/span/span")

This formula will populate all the rows below it with the wiki section titles. You do not need to drag this formula down.

Beta: Check for Ensembl tracks

Before pushing Ensembl tracks to beta, review the Ensembl wiki page and the Ensembl_QA script.

Beta: Copy redmine table list to hive

In Redmine for your assembly, look at the "Table List" field. The engineer should have provided a path to redmine.$db.table.list. For example: /hive/data/genomes/manPen1/redmine5515/redmine.manPen1.table.list

Your file contents should be something like this:

head redmine.manPen1.table.list
hg38.chainManPen1
hg38.chainManPen1Link
hg38.netManPen1
manPen1.augustusGene

From hive, copy the file list to your assembly directory:

cp /hive/data/genomes/manPen1/redmine5515/redmine.manPen1.table.list .

Make a note of which assemblies are listed in this file list by looking at the unique databases/assemblies in column 1. For example:

cut -d'.' -f1 redmine.manPen1.table.list | sort -u

hg38
manPen1
mm10
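
As a small extension of the command above, the number of tables listed for each database can be tallied at the same time. This is only a sketch run against a made-up sample list in a scratch directory; the real list is the one provided in Redmine.

```shell
# Sample list for illustration only; the real list comes from Redmine.
workdir=$(mktemp -d)
cat > "$workdir/redmine.manPen1.table.list" <<'EOF'
hg38.chainManPen1
hg38.chainManPen1Link
hg38.netManPen1
manPen1.augustusGene
manPen1.gold
mm10.chainManPen1
EOF

# Column 1 (before the dot) is the database; count the tables listed for each.
counts=$(cut -d'.' -f1 "$workdir/redmine.manPen1.table.list" | sort | uniq -c | awk '{print $2, $1}')
printf '%s\n' "$counts"
```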

Beta: Remove 'trackDb' and 'hgFindSpec' tables from your HIVE table list

Remove tables whose names start with trackDb or hgFindSpec from cleanTableList:

sed -i.bak '/trackDb/d' cleanTableList

sed -i.bak '/hgFindSpec/d' cleanTableList

You'll be using this cleanTableList when doing your push.

Beta: Remove 'seq' and 'extFile' tables from your HIVE table list

Follow the steps above to remove seq or extFile tables from cleanTableList.
NOTE: Do not push seq or extFile tables from dev to beta.
You must use the copyExtSeqRows.csh script to move only the rows needed.
More information can be found on the conservation track wiki page.
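
The two removal steps can also be done in one pass with anchored patterns. The sed commands above delete any line that contains the string, so a pattern such as /seq/ would also drop a table whose name merely includes "seq". The sketch below assumes that cleanTableList starts out as a copy of the Redmine table list (this page does not say how it is created) and uses a made-up sample list, including a hypothetical trackDb_someUser table.

```shell
# Sample list for illustration only.
workdir=$(mktemp -d)
cat > "$workdir/redmine.manPen1.table.list" <<'EOF'
hg38.chainManPen1
manPen1.augustusGene
manPen1.extFile
manPen1.hgFindSpec
manPen1.seq
manPen1.seqNcbiRefSeq
manPen1.trackDb
manPen1.trackDb_someUser
EOF

# Drop tables whose names start with trackDb or hgFindSpec, and tables named
# exactly seq or extFile; names that merely contain "seq" are kept.
grep -E -v \
  -e '^[^.]+\.(trackDb|hgFindSpec)' \
  -e '^[^.]+\.(seq|extFile)$' \
  "$workdir/redmine.manPen1.table.list" > "$workdir/cleanTableList"

cat "$workdir/cleanTableList"
```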

Beta: Make clean table lists for pushing

In the example above, the assembly manPen1 has tables in 3 different databases (manPen1, hg38, and mm10). It will be helpful to create a file, containing only table names, for each unique database listed in the Table List. These files can then be used for pushing tables in a loop.

Using the example above, make 3 files, each listing the tables to push for one database.

1 for mm10 with tables to push

grep '^mm10\.' redmine.manPen1.table.list | cut -d'.' -f2 > mm10.push

1 for hg38 with tables to push

grep '^hg38\.' redmine.manPen1.table.list | cut -d'.' -f2 > hg38.push

1 for manPen1 with tables to push

grep '^manPen1\.' redmine.manPen1.table.list | cut -d'.' -f2 > manPen1.push
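
When the table list covers several databases, the per-database files can be generated in a loop instead of one command at a time. This is a sketch against a made-up sample list in a scratch directory; the .push naming follows the examples above.

```shell
# Sample list for illustration only.
workdir=$(mktemp -d)
list="$workdir/redmine.manPen1.table.list"
cat > "$list" <<'EOF'
hg38.chainManPen1
hg38.chainManPen1Link
hg38.netManPen1
manPen1.augustusGene
manPen1.gold
mm10.chainManPen1
mm10.netManPen1
EOF

# One .push file per database, holding only the table names (column 2).
for db in $(cut -d'.' -f1 "$list" | sort -u); do
  grep "^${db}\." "$list" | cut -d'.' -f2 > "$workdir/${db}.push"
done

cat "$workdir/hg38.push"
```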

Beta: Compare cleanTableList with dev mySQL table list

It's worthwhile to compare the Redmine Table List with the tables that are on dev for your assembly database and also for any other databases listed (e.g., databases that have associated chain net files).

hgsql -Ne "show tables" manPen1 > devTablesAll

cat devTablesAll | grep -v trackDb | grep -v hgFindSpec > devMySqlTableList

This command will show a side-by-side difference between the tables on dev and the tables you will push.

sdiff -s devMySqlTableList manPen1.push

Examine which tables are missing, and ask if you have questions.
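
If only the tables that exist on dev but are absent from the push list are of interest, comm gives a shorter report than sdiff. The sketch below uses made-up stand-ins for devMySqlTableList and manPen1.push.

```shell
# Stand-in files for illustration only.
workdir=$(mktemp -d)
printf '%s\n' augustusGene chromInfo gap gold history > "$workdir/devMySqlTableList"
printf '%s\n' augustusGene gap gold > "$workdir/manPen1.push"

# comm needs sorted input; -23 keeps lines found only in the first file.
sort "$workdir/devMySqlTableList" > "$workdir/dev.sorted"
sort "$workdir/manPen1.push" > "$workdir/push.sorted"
only_on_dev=$(comm -23 "$workdir/dev.sorted" "$workdir/push.sorted")
printf '%s\n' "$only_on_dev"
```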

Beta: Push all tables for your assembly to beta

Push all tables (EXCEPT seq, extFile, hgFindSpec and trackDb tables) from hgwdev to hgwbeta:

Option 1: Push 1 table at a time

This command will push one table from dev to beta for your database/assembly:

 sudo mypush $db $table mysqlbeta

Option 2: Push tables in a loop

for table in $(cat yourTableList); do sudo mypush $db $table mysqlbeta; done

For example:


for table in $(cat manPen1.push); do sudo mypush manPen1 $table mysqlbeta; done
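
Before pushing in a loop, it can help to print the commands first and read them over. The sketch below is a dry run: it only echoes the mypush command lines for a made-up push list and does not call mypush.

```shell
# Stand-in push list for illustration only.
workdir=$(mktemp -d)
printf '%s\n' augustusGene gap gold > "$workdir/manPen1.push"
db=manPen1

# Print each push command instead of running it; drop "echo" to push for real.
preview=$(while read -r table; do
  echo sudo mypush "$db" "$table" mysqlbeta
done < "$workdir/manPen1.push")
printf '%s\n' "$preview"
```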

Option 3: Use the bigPush.csh script, list tables in a file

List the tables in a file.

hgwdev > bigPush.csh $db myTableList

Option 4: Use the bigPush.csh script, list tables in quotes

List space-separated tables in quotes.

hgwdev > bigPush.csh $db "table1 table2 table3 table4"

You can count the number of tables in a database if you want to compare the number of tables listed in your push file to the number of tables that have been pushed to beta.

FROM HIVE:

wc -l manPen1.push
55 manPen1.push

FROM MYSQLBETA

USE manPen1; SHOW TABLES; SELECT FOUND_ROWS();

+--------------+
| FOUND_ROWS() |
+--------------+
|           55 |
+--------------+
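
Matching counts do not show which table is missing when the numbers differ. A sketch of a name-by-name check follows. It assumes the beta table names have been saved to a file called betaTables (for example with hgsql -h mysqlbeta -Ne "show tables" manPen1, by analogy with the hgsql calls elsewhere on this page); both files here are made-up stand-ins.

```shell
# Stand-in files for illustration only.
workdir=$(mktemp -d)
printf '%s\n' augustusGene gap gold > "$workdir/manPen1.push"
printf '%s\n' augustusGene gold chromInfo > "$workdir/betaTables"

# -F fixed strings, -x whole-line match, -v invert:
# prints the push-list tables that are not present on beta.
missing=$(grep -vxFf "$workdir/betaTables" "$workdir/manPen1.push")
printf 'not yet on beta: %s\n' "$missing"
```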

Beta: Push all tables to other assemblies to beta

Repeat the push steps for other assemblies listed in your table list (e.g., chain/net tables to other assemblies). For example:

for table in $(cat hg38.push); do sudo mypush hg38 $table mysqlbeta; done

for table in $(cat mm10.push); do sudo mypush mm10 $table mysqlbeta; done

Beta: Review copyHgcentral steps

You can copy items from hgcentraltest to hgcentralbeta with the copyHgcentral script. For the usage statement, run:

hgwdev > copyHgcentral -h

The copyHgcentral script must be run in test mode first.
Test mode will show you the state of hgcentraltest, hgcentralbeta and hgcentral.
Once test mode has been run and reviewed, run execute mode to copy from hgcentraltest to hgcentralbeta.
Note that test mode generates output files, which must be deleted manually afterward. Run copyHgcentral in hive or your home directory, not in a directory where temporary files do not belong.
Note that copyHgcentral can be run for "all" (blatServers, dbDb, defaultDb, genomeClade):

hgwdev > copyHgcentral test $db all dev beta
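
The per-table sections that follow all use the same test-then-execute order. As a reminder of that order, the sketch below only prints the command lines for each table; it does not run copyHgcentral, and manPen1 is just the example database used on this page.

```shell
db=manPen1

# Print, for each hgcentral table, the test command followed by the execute command.
plan=$(for tbl in blatServers dbDb defaultDb genomeClade; do
  echo "copyHgcentral test $db $tbl dev beta"
  echo "copyHgcentral execute $db $tbl dev beta"
done)
printf '%s\n' "$plan"
```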

Beta: copyHgcentral test $db blatServers dev beta

Generates files, run in hive:

hgwdev > copyHgcentral test $db blatServers dev beta

hgwdev > copyHgcentral execute $db blatServers dev beta

You can also check on mysqlbeta:

use hgcentralbeta;

select * from blatServers where db='manPen1';

Beta: copyHgcentral test $db dbDb dev beta

Generates files, run in hive:

hgwdev > copyHgcentral test $db dbDb dev beta

hgwdev > copyHgcentral execute $db dbDb dev beta

You can also check on mysqlbeta:

use hgcentralbeta;

select * from dbDb where name='manPen1' \G

Beta: copyHgcentral test $db defaultDb dev beta

Generates files, run in hive:

hgwdev > copyHgcentral test $db defaultDb dev beta

hgwdev > copyHgcentral execute $db defaultDb dev beta

You can also check on mysqlbeta:

use hgcentralbeta;

select * from defaultDb where name="manPen1" limit 1;

Beta: copyHgcentral test $db genomeClade dev beta

NOTE: This table probably will not need to be updated. It contains records like this:

mysql> select * from genomeClade order by rand() limit 5;
+-----------------+------------+----------+
| genome          | clade      | priority |
+-----------------+------------+----------+
| GRCh38.p2       | haplotypes |      134 |
| C. japonica     | worm       |       70 |
| Atlantic cod    | vertebrate |      125 |
| D. melanogaster | insect     |       10 |
| D. persimilis   | insect     |       55 |
+-----------------+------------+----------+

Generates files, run in hive:

hgwdev > copyHgcentral test $db genomeClade dev beta

hgwdev > copyHgcentral execute $db genomeClade dev beta

Beta: copyHgcentral: liftOverChain (manual move)

liftOverChain is not copied by the copyHgcentral script; it must be copied manually.

Only copy lines from liftOverChain on hgcentraltest to hgcentralbeta if there are liftOver files listed in the pushQ and if the assemblies they go to/from exist on the RR.
Check for lines in liftOverChain that should be in the pushQ, but aren't (e.g., the liftOver from a previous assembly).
Add lines related to your assembly, any previous versions of your organism, and any other organisms that are associated with liftOver files and your assembly.
More details on the Chain and Net QA wiki page.

 hgsql -Ne "SELECT * FROM liftOverChain WHERE fromDb = '$db' OR toDb = '$db'" hgcentraltest > chain.dev

Check beta, load if not present and recheck:

 hgsql -h mysqlbeta -Ne "SELECT * FROM liftOverChain WHERE fromDb = '$db' OR toDb = '$db'" hgcentralbeta 

 hgsql -h mysqlbeta -e "LOAD DATA LOCAL INFILE 'chain.dev' INTO TABLE liftOverChain" hgcentralbeta
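
If beta already holds some of the rows, loading chain.dev as is would add them a second time. One possible safeguard, sketched below, is to save the beta query output to a file (called chain.beta here, a name chosen for this example) and load only the dev rows that are not already present. The rows shown are made-up stand-ins with the columns abbreviated.

```shell
# Stand-in rows (tab-separated, columns abbreviated) for illustration only.
workdir=$(mktemp -d)
printf 'manPen1\thg38\t/gbdb/manPen1/liftOver/manPen1ToHg38.over.chain.gz\n' > "$workdir/chain.dev"
printf 'hg38\tmanPen1\t/gbdb/hg38/liftOver/hg38ToManPen1.over.chain.gz\n' >> "$workdir/chain.dev"
printf 'hg38\tmanPen1\t/gbdb/hg38/liftOver/hg38ToManPen1.over.chain.gz\n' > "$workdir/chain.beta"

# Remember every row from the first file (beta), then print the rows of the
# second file (dev) that were not seen. Works even if the beta file is empty.
awk 'FILENAME == ARGV[1] { seen[$0]; next } !($0 in seen)' \
  "$workdir/chain.beta" "$workdir/chain.dev" > "$workdir/chain.toLoad"
cat "$workdir/chain.toLoad"
```

The resulting chain.toLoad file would then take the place of chain.dev in the LOAD DATA command.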

Beta: checkMetaData

After completing copyHgcentral steps, run checkMetaData.csh $db

This checks that all of the metadata is the same on hgwdev and on hgwbeta.
Run this script in a temporary folder or hive; it creates some comparison files that can be deleted after the check.

Beta: Check that your assembly is listed in align.dbs

The new assembly should already be listed in the files align.dbs and hgwdev.dbs in the source tree at ~/kent/src/hg/makeDb/genbank/etc/.

Beta: Double-check /gbdb files

Double-check that hgnfs1 (which is /gbdb on hgwbeta) has the files listed in the Redmine file list.

cd /gbdb/manPen1

ls -ld $(find .)

Remove any unnecessary files.
If any files were updated on hgwdev in the course of QAing tracks, make sure the correct version is on hgnfs1.
To see the timestamps of files with symlinks on hgwdev, use the options "-lL" with ls.
The large "L" shows information for a file that a link references, rather than for the link itself.
You may notice that hgwdev has TRIX .ix and .ixx files; that is OK. These live on beta in a different location, /data/trix. More information is available at the TrackDb page.

Note: files on hgwdev may appear to be double the size of those on beta; this is not a concern.
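
The check against the file list can be scripted. The sketch below assumes the file list holds one path per line, which this page does not state, and uses a scratch directory with made-up file names in place of the real /gbdb tree.

```shell
# Scratch stand-in for /gbdb, for illustration only.
workdir=$(mktemp -d)
mkdir -p "$workdir/gbdb/manPen1/bbi"
: > "$workdir/gbdb/manPen1/manPen1.2bit"

# Hypothetical file list, one path per line; the second file does not exist.
printf '%s\n' \
  "$workdir/gbdb/manPen1/manPen1.2bit" \
  "$workdir/gbdb/manPen1/bbi/gc5Base.bw" > "$workdir/fileList"

# -e follows symlinks, so a dangling link is reported as missing too.
missing_files=$(while read -r path; do
  [ -e "$path" ] || echo "$path"
done < "$workdir/fileList")
printf 'missing: %s\n' "$missing_files"
```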

Beta: Push gbdb files

In order to check your assembly on beta, you will need to:

Request a push of any listed supporting files in /gbdb from hgwdev to hgnfs1
Note that hgwbeta and the RR share the files on hgnfs1, so once these files are in place, there is not another push required when the track is released to the RR.
Note that the gbdb files are sync'd to hgdownload on Sundays. If needed sooner, request a push.
If there are images associated with any track description pages, be sure to run a make beta from within kent/src/hg/htdocs/ to get the images to beta.
Example request

Beta: Admins/pushers completed gbdb push request

Beta: Do a 'make beta' in trackDb for your assembly

Do make beta on hgwdev in kent/src/hg/makeDb/trackDb like so:

 make beta DBS=$db

Example to make beta on more than one db at a time:

 make beta DBS="$db1 $db2 $db3 $db4"

Running multiple dbs in parallel to save time

Multiple assemblies can be run in parallel by using the make -j option (as of 2/10/17, thanks to Mark Diekhans). Updating all dev dbs used to take about 50 minutes; it now takes about 5 minutes at 16 in parallel. While Mark has safely run 16 dbs at a time on dev, it is recommended to run 8 or fewer at a time on beta or the RR. Use make -j N beta and make -j N public, where N is the number of parallel processes (for example, make -j 16 alpha runs 16 at a time).

For example, if you do:

  make -j 8 alpha

it updates everything, 8 at a time. If you do:

  make -j 2 DBS="hg19 hg38 mm10 felCat5"

it updates those 4 databases, 2 at a time.

Note: the 'make in parallel' process creates and removes temporary files.

The temporary directory is chosen as described in this declaration:

 kent/src/inc/portable.h:
   char *getTempDir(void);
   /* get temporary directory to use for programs.  This first checks TMPDIR environment
    * variable, then /data/tmp, /scratch/tmp, /var/tmp, /tmp.  Return is static and
    * only set of first call */

Examples:

 make beta -j 4 DBS="dm6 ce11 sacCer3 droEre1 droSec1 droSim1 droYak2 droAna2 dp3 droMoj2 droVir2 droGri1 droPer1"
 make public -j 4 DBS="dm6 ce11 sacCer3 droEre1 droSec1 droSim1 droYak2 droAna2 dp3 droMoj2 droVir2 droGri1 droPer1"

Beta: Do a 'make beta' in trackDb for chain/net organisms

If your assembly has alignments to other organisms, such as chain/net alignments to hg38 or to the previous assembly version of your organism, be sure to also do a 'make beta' for those assemblies.

See details in above section

Beta: Do a 'make public' in trackDb for your assembly

Make your track public by using the "make public" command on hgwdev while in the trackDb directory (src/hg/makeDb/trackDb):

   [user@hgwdev trackDb]$ make public DBS=$db

Example to make public on more than one db at a time:

 make public DBS="$db1 $db2 $db3 $db4"

Your track should now be visible on the hgwbeta-public server.

If your track is not visible, you may want to check that your track has the correct release tag. Also see [Three State TrackDb] for more information.

Beta: Do a 'make public' in trackDb for chain/net organisms

If your assembly has alignments to other organisms, such as chain/net alignments to hg38 or to the previous assembly version of your organism, be sure to also do a 'make public' for those assemblies.

See details in above section

To confirm that your assembly is listed in align.dbs:

grep manPen ~/kent/src/hg/makeDb/genbank/etc/align.dbs
manPen1

If your assembly is missing from align.dbs, check with Brian Raney.

Beta: Turn on GenBank updates

Once your assembly is listed in align.dbs, turn on GenBank updates on hgwbeta before 4:30 p.m.
Add the new assembly to ~/kent/src/hg/makeDb/genbank/etc/hgwbeta.dbs in alphabetical order.
Be sure to save, git add, git commit, and git push the file.
Do not edit rr.dbs yet; that comes later, when the assembly is on the RR.

Beta: GenBank updates: make libs & run make

After committing the change, make sure your libs are up to date:

cd ~/kent/src ; make libs

then go ahead and run the make:

cd ~/kent/src/hg/makeDb/genbank/ 
git pull 
make install-rr install-server

To see whether updates have run (at least a day after the *.dbs files were updated), check the update times of the table 'gbLoaded':

Beta: GenBank updates: check Genbank update times

hgwdev > updateTimes.csh $db gbLoaded verbose

For example, you'll see updates for dev and beta (but not yet for the rr/euro/asia):

updateTimes.csh manPen1 gbLoaded verbose

gbLoaded
=============
dev  2017-04-18 11:40:07
beta 2017-04-18 11:40:07

rr
euro
asia

The update times will be out of sync between machines, but not by more than 24 hours or so if updates are running. The gbLoaded table will be updated regardless of whether changes to other GenBank tables were picked up. More GenBank update instructions are available at Genbank updates.

The etc-update-server part of the make will cause the downloads mentioned below in the "Verify downloads" section to be created.

Beta: joinerCheck common keys

Check that common keys between tables are in sync:

 hgwdev > cd ~/kent/src/hg/makeDb/schema 

 hgwdev > joinerCheck -database=$db -keys all.joiner

If there are errors related to genbank identifiers, it is likely because of the genbank load process, and not an issue with your database. Run joinerCheck once the tables are on beta to confirm:

 hgwdev > HGDB_CONF=~/.hg.conf.beta joinerCheck -keys -identifier=$identifier all.joiner

Beta: joinerCheck table times

Check table update times:

hgwdev > cd ~/kent/src/hg/makeDb/schema 

hgwdev > joinerCheck -database=$db -times all.joiner

Beta: joinerCheck tableCoverage

Check that all tables in this database are mentioned/referenced in all.joiner

 hgwdev > joinerCheck -database=$db -tableCoverage all.joiner

If not all of the tables are listed, email the developer and ask them to add those tables to the tablesIgnored entry for $db. According to Hiram, it is probably OK for us to edit all.joiner ourselves.

Beta: Check release tags: compare dev and beta tracks side-by-side

Compare your tracks by bringing up a dev and a beta browser window side-by-side. If some beta tracks can't be seen, you may need to edit the release tag. See this page for more information: http://genomewiki.ucsc.edu/index.php/ThreeStateTrackDb

Where are your release tags? You can do a recursive search, like this:

find . -name '*ra' | xargs grep "manPen1"
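
To list the release lines themselves, with file names and line numbers, a variation of the search can look for the setting rather than the assembly name. This sketch assumes release tags appear in .ra files as lines of the form "release alpha" (see the ThreeStateTrackDb page for the authoritative format); the track stanza below is made up.

```shell
# Scratch stand-in for a trackDb directory, for illustration only.
workdir=$(mktemp -d)
mkdir -p "$workdir/trackDb/manPen1"
cat > "$workdir/trackDb/manPen1/trackDb.ra" <<'EOF'
track exampleTrack
shortLabel Example
release alpha
EOF

# -H prints the file name, -n the line number; leading whitespace is allowed.
tags=$(cd "$workdir/trackDb" && find . -name '*.ra' -exec grep -Hn '^[[:space:]]*release ' {} +)
printf '%s\n' "$tags"
```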

Beta: Remove release tag for big*/vcf track types

Once you verify that the track looks good on hgwbeta, remove the release tag from trackDb.ra.

Beta: Check Tools > LiftOver

Check LiftOver on beta and beta public.

Beta: Check Tools > BLAT (dna)

Copy some DNA and check BLAT for a DNA search on beta and beta public.

Beta: Check Tools > BLAT (protein)

Copy some protein and check BLAT for a protein search on beta and beta public.

Beta: Check Tools > PCR

Check PCR on beta and beta public.

Beta: Review chain/net track QA wiki

Review the Chain/Net QA wiki page; see the section QAing on hgwbeta.

Beta: run chainNetTrio.csh script

Details in Chain/Net QA wiki page, section QAing on hgwbeta

chainNetTrio.csh  [database] [other database]

Beta: repush any tables? check updateTimes

Make sure no tables need to be repushed from hgwdev to hgwbeta. You can use

hgwdev > updateTimesDb.sh -d $db

to compare table update times between hgwdev and hgwbeta. Everything but hgFindSpec, history, tableDescriptions, trackDb and the genbank tables should have the same update times.

To see all of the tables in the assembly that are related to genbank do this:

hgwdev > hgsql -Ne 'show tables' $db | egrep -f /cluster/data/genbank/etc/genbank.tbls
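
Inverting the same filter gives the complementary list, that is, the tables that are not related to genbank. The sketch below uses made-up stand-ins for the pattern file and the table list; the real patterns live in /cluster/data/genbank/etc/genbank.tbls.

```shell
# Stand-in pattern file and table list for illustration only.
workdir=$(mktemp -d)
printf '%s\n' '^gbLoaded$' '^all_mrna$' '^refGene$' > "$workdir/genbank.tbls"
printf '%s\n' all_mrna augustusGene gbLoaded gold refGene > "$workdir/allTables"

# -v inverts the match, leaving only the tables that match no genbank pattern.
non_genbank=$(grep -E -v -f "$workdir/genbank.tbls" "$workdir/allTables")
printf '%s\n' "$non_genbank"
```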

🔵 Done with BETA steps? Go to Assembly QA Part 4: RR Steps

