Releasing an assembly: Difference between revisions

(Replacing page with 'This page is no longer maintained.')
 
==Pre-staging of assembly on hgwbeta==
This page is no longer maintained.
 
===Check if chromosome sizes have changed significantly===
* If you are releasing an update to an assembly, check to see if chromosome sizes have changed significantly. Report any significant changes to the developer.
* Output chromosome sizes from the old and new assemblies into two files and compare them
hgwdev > hgsql -Ne "select chrom, size from chromInfo" $oldDb > oldChromSizes
hgwdev > hgsql -Ne "select chrom, size from chromInfo" $newDb > newChromSizes
hgwdev > sdiff -s oldChromSizes newChromSizes
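The steps above do not define what counts as a "significant" change. As a rough sketch, the comparison can be narrowed to chromosomes whose size differs by more than some percentage. The function name chrom_size_changes and the 5% default are assumptions chosen for illustration; they are not part of the QA scripts.

```shell
# Hypothetical helper (not a QA script): report chromosomes that were added,
# removed, or changed size by more than PCT percent (default 5, an assumption).
# Inputs are the two-column (chrom, size) files written by the hgsql commands.
chrom_size_changes() {
    old_file="$1"; new_file="$2"; pct="${3:-5}"
    awk -v pct="$pct" '
        NR == FNR { old[$1] = $2; next }           # first file: remember old sizes
        !($1 in old) { print $1, "added"; next }   # chrom only in the new assembly
        {
            seen[$1] = 1
            if (old[$1] == 0) next                  # avoid dividing by zero
            diff = $2 - old[$1]; if (diff < 0) diff = -diff
            if (diff * 100 > pct * old[$1]) print $1, old[$1], $2
        }
        END { for (c in old) if (!(c in seen)) print c, "removed" }
    ' "$old_file" "$new_file"
}
```

For example, chrom_size_changes oldChromSizes newChromSizes 10 lists only chromosomes that changed by more than 10%, plus any that appear in just one of the two files.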
 
===Check that the chain/nets/liftOvers listed in the pushQ are to valid assemblies on the RR===
If your assembly has Chain/Nets to/from an assembly that is *not* on the RR (and not in the pushQ as another new assembly), you do not need to QA those Chain/Nets or push them to the RR. You will need to drop those three rows from your sub-pushQ. To remove them from the pushQ, go to the chain/net track entry, click lock, and then click the delete button. Also check that the pushQ lists no liftOver files in gbdb to organisms that haven't been released. To find the other-organism liftOver files, cd to /gbdb on hgwdev and use this command: ls -d */liftOver/*$db* .
 
===Send email to markd and Jeltje van Baren (jeltje at cse.wustl.edu) so that they can start the N-Scan predictions===
They don't always do N-Scan predictions for every assembly, but this way they know so they can get their track in the pushQ soon after the assembly is pushed to the RR.
 
==Stage assembly on hgwbeta==
 
====Add assembly to mirror exclude list====
Add the new assembly to the mirror exclude list for the gbdb and mysql rsync download targets at hgdownload:/opt/csw/etc/rsyncd.conf or email push-request@soe.ucsc.edu and have them do it for you.
 
===Push tables to mysqlbeta===
 
====Create database on hgwbeta and push tables for $db====
* Create the database on hgwbeta.
hgwbeta > hgsql
mysql > CREATE DATABASE $db;
* Create a list of tables to import from hgwdev from the tables listed in the push queue. There should be a table called '$db' in the qapushq database on hgwbeta which can be used to get all of the tables at once:
hgwbeta > hgsql -Ne "SELECT tbls FROM $db WHERE dbs='$db'" qapushq > tableList
To convert spaces to newlines for the tableList:
awk '{ for (i=1;i<=NF;i++) print $i }' infileName > outfileName
* Remove the hgFindSpec, trackDb, and tableDescriptions tables from the tableList.
* Push the tables to hgwbeta.
hgwdev > bigPush.csh $db tableList
bigPush.csh reports the size of the push at the end, which you can use to confirm that it is similar to the original size on hgwdev. You can also compare sizes in the main pushQ by putting a "*" in the tables field, selecting hgwdev from "Current Location", and then clicking on the "show sizes" button.
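The two list-cleanup steps (splitting on spaces, then removing hgFindSpec, trackDb, and tableDescriptions) can be combined into one filter. This is a minimal sketch; the function name clean_table_list is hypothetical, and it assumes only the three exact table names should be dropped.

```shell
# Hypothetical helper: read a whitespace-separated table list on stdin,
# write one table per line, and drop the three metadata tables by exact name.
clean_table_list() {
    awk '{ for (i = 1; i <= NF; i++) print $i }' |
        grep -vx -e hgFindSpec -e trackDb -e tableDescriptions
}
```

Usage would look like clean_table_list < infileName > tableList, where infileName holds the raw output of the qapushq query.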
 
====Push chain/net tables in other organisms====
This will involve these tables: otherDb.(chrN_)chain$db, otherDb.(chrN_)chainLink$db, and otherDb.net$db. First create a file listing the 3 tables and use bigPush.csh to push to all the otherDbs in the pushQ. Verify that the other assembly is on the RR before pushing the chain/net tables to it.
hgwdev > bigPush.csh $db tableList
After pushing the tables you will need to run make beta for trackDb on hgwbeta for each of these organisms.
 
===Update hgcentralbeta===
After you have completed the steps below, you can use the script '''checkMetaData.csh''' to make sure that all of the metadata is the same on hgwdev and on hgwbeta. Run this script in a temporary folder because it creates several files.
 
====dbDb====
Create (or update) hgcentralbeta.dbDb metadata. This will add the new assembly to the hgGateway page. When checking on hgwdev, make sure that the assembly date is correct under the description column. It should be later than the previous assembly. If it is not, contact the developer.
* Check to make sure your row doesn't exist in hgcentralbeta:
hgwbeta > hgsql hgcentralbeta
mysql > select * from dbDb where name = '$db'\G
* Check to make sure the row exists on hgcentraltest:
hgwdev > hgsql hgcentraltest
mysql > select * from dbDb where name = '$db'\G
* If the above looks correct, then redirect it to a file:
hgwdev > hgsql -N -e "select * from dbDb where name = '$db'" hgcentraltest > hgcentraltest.dbDb
* Check the newly created file:
hgwdev > cat hgcentraltest.dbDb
* Load onto hgcentralbeta:
hgwdev > hgsql -h mysqlbeta -e "LOAD DATA LOCAL INFILE 'hgcentraltest.dbDb' INTO TABLE dbDb" hgcentralbeta
* Check from hgwdev to see if hgcentralbeta has been updated with the new row:
hgwdev > hgsql -h mysqlbeta -e "select * from dbDb where name = '$db'" hgcentralbeta
 
====blatServers====
The developer has often already requested that the blat servers be set up for the new assembly, so check whether there are lines in hgcentraltest.blatServers. If they are there, follow the steps below. If not, request a blat server from the cluster-admins and create 2 lines in hgcentraltest.blatServers and hgcentralbeta.blatServers. The cluster-admins will give you the name of the blat server and the port numbers for the isTrans and canPcr. You can then add the two new lines with this information to the blatServers table in both hgcentraltest (on hgwdev) and hgcentralbeta. If this is an update to a previous assembly, you will want to leave the entries for the previous assemblies in the blatServers table. For more information about where the blat servers for different machines should be hosted, go to [[Updating blat servers]].
* Get the data from hgwdev:
hgwdev > hgsql -Ne "SELECT * FROM blatServers WHERE db = '$db'" hgcentraltest > blat.dev
* Check if the lines are already on beta and load if not:
hgwdev > hgsql -h mysqlbeta -Ne "SELECT * FROM blatServers WHERE db = '$db'" hgcentralbeta
hgwdev > hgsql -h mysqlbeta -e "LOAD DATA LOCAL INFILE 'blat.dev' INTO TABLE blatServers" hgcentralbeta
 
====genomeClade and gdbPdb====
Also move data for hgcentralbeta.genomeClade (select * from genomeClade where genome='$db';), hgcentralbeta.gdbPdb (for Known Genes), etc. as needed using same methods. If this is not the first assembly for an organism, genomeClade will already be fine.
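Since the same dump, check, and load pattern repeats for each hgcentral table, it may help to generate the three commands rather than retype them. The sketch below only prints the commands, built from the hgsql invocations already shown on this page; it does not run them. The function name print_hgcentral_copy is hypothetical, and the caller supplies the WHERE clause because the column to filter on differs per table.

```shell
# Hypothetical helper: print (do not run) the dump / check / load commands
# for copying rows of one hgcentral table from hgcentraltest to hgcentralbeta.
# usage: print_hgcentral_copy TABLE WHERE_CLAUSE OUTFILE
print_hgcentral_copy() {
    table="$1"; where="$2"; outfile="$3"
    printf '%s\n' \
        "hgsql -Ne \"SELECT * FROM $table WHERE $where\" hgcentraltest > $outfile" \
        "hgsql -h mysqlbeta -Ne \"SELECT * FROM $table WHERE $where\" hgcentralbeta" \
        "hgsql -h mysqlbeta -e \"LOAD DATA LOCAL INFILE '$outfile' INTO TABLE $table\" hgcentralbeta"
}
```

For instance, print_hgcentral_copy blatServers "db = 'danRer6'" blat.dev prints the blatServers commands for danRer6. Review the output of the second command before running the third, so that rows already on beta are not loaded twice.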
 
====liftOverChain====
You only need to copy lines from liftOverChain from hgcentraltest to hgcentralbeta if there are liftOver files - check the push Q. Also, make sure that you only move lines in liftOverChain that are to assemblies that are on the RR. Check that there aren't lines in liftOverChain that should be in the pushQ but aren't (e.g. lift ups from old orgs, etc). Email the developer and ask them to add them to the pushQ if necessary.
hgsql -Ne "SELECT * FROM liftOverChain WHERE fromDb = '$db' OR toDb = '$db'" hgcentraltest > chain.dev
Check beta, load if not present and recheck:
hgsql -h mysqlbeta -Ne "SELECT * FROM liftOverChain WHERE fromDb = '$db' OR toDb = '$db'" hgcentralbeta
hgsql -h mysqlbeta -e "LOAD DATA LOCAL INFILE 'chain.dev' INTO TABLE liftOverChain" hgcentralbeta
 
====defaultDb====
Do not change the value for defaultDb (leave it set to the previous assembly for this organism) for human and mouse on hgwbeta. This is because many people use these assemblies and will be confused when it changes on hgwdev and hgwbeta. For assemblies other than human and mouse, change the defaultDb so that you don't accidentally test the previous assembly. If this is the first assembly for an organism, you will need the defaultDb entry in order for the assembly to appear on hgwbeta.
 
===Push /gbdb/$db and html/description.html===
Extract all of the gbdb files from the pushQ for your org and those for the other orgs as well:
 
hgwbeta > hgsql -Ne "SELECT files FROM $db" qapushq > fileList
 
Ask for a push of the list of /gbdb files above from hgwdev to hgnfs1. Remind the pushers that items that are symlinked on hgwdev should become real files on hgnfs1. To see how big these files are:
 
hgwdev > cd /gbdb/$db
hgwdev > du -hscL `ls -d */liftOver/*$db*` .
 
===Push image file to hgwbeta and rr===
The image file that appears on the gateway page should reside in the kent source tree in:
~/kent/src/hg/htdocs/images/
and a copy should exist at:
hgwdev > /usr/local/apache/htdocs/images/
 
If there is a previous assembly, it is possible that it is using the same image on the gateway page. Check on hgwbeta to see if the image is missing. If it isn't, you don't need to ask for the image to be pushed.
 
To get the image to appear on hgwbeta and the RR, ask for a push of the file from hgwdev to hgwbeta and the RR. It's a good idea to ask for the push of the image to the RR during the staging process, as you will inevitably forget to push it when it's time to release the assembly. If there are any other images for this assembly (for instance, the phylo image that goes with the Conservation track), you can push them too.
 
===Make trackDb on hgwbeta===
Remake the trackDb on hgwbeta. Will likely need to be done again as track descriptions are updated.
hgwbeta> cd kent/src/hg/makeDb/trackDb
hgwbeta> make beta DBS=$db
 
===Turn on GenBank updates===
The new assembly should already be listed in the files align.dbs and hgwdev.dbs. If it is not, check with Mark Diekhans. Turn on GenBank updates on hgwbeta before 4:30 p.m., when the daily updates start, by adding the new assembly to /kent/src/hg/makeDb/genbank/etc/hgwbeta.dbs in alphabetical order. After committing the change:
ssh hgwbeta
cd ~/kent/src/hg/makeDb/genbank/
git pull
make etc-update-rr
Note: etc-update-rr is correct, as this updates all of the /genbank/etc files viewable by the rr.
To see whether updates have run (at least a day after the *.dbs files were updated), check the update times of the table 'gbLoaded':
hgwdev > updateTimes.csh $db gbLoaded
The update times will be out of sync between machines, but not by more than 24 hours or so if updates are running. The gbLoaded table will be updated regardless of whether changes to other GenBank tables were picked up. More genbank update instructions are available at [[Genbank updates]].
 
==Test on hgwbeta==
 
===Check the .2bit files===
 
====Check that it is not a symlink on hgnfs1, that there is only one file, and that it is not in a subdirectory====
It is fine if the file on hgwdev is a symlink.
 
====Check that all 3 copies of the .2bit file (gbdb, downloads, blat servers) are identical====
The 3 locations of the .2bit file are /gbdb/$db/, /usr/local/apache/htdocs-hgdownload/goldenPath/$db/bigZips/ and on the blat server (/scratch/$db). Use md5sum to confirm they are identical. Get the blat server from the hgcentral database and ssh into the machine:
 
hgwdev > ssh qateam@blat#.cse.ucsc.edu
 
This will let you on to the blat machine after which you can look in /scratch/$db to see the .2bit file. If it is not the same as the other .2bit files ask the pushers to restart the assembly and to pull the newest .2bit file from /gbdb.
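One way to compare the copies with md5sum is sketched below. It assumes all copies are readable from one machine (for example, after copying the blat server's file over, or by running md5sum on each host and comparing by eye instead). The function name same_md5 is hypothetical.

```shell
# Hypothetical helper: succeed only if every file given has the same md5 checksum.
# Returns 0 if all match, 1 if they differ, 2 on usage or read errors.
same_md5() {
    [ "$#" -ge 2 ] || return 2
    sums=$(md5sum "$@") || return 2          # fail if any file is unreadable
    # Count distinct checksums (first column of md5sum output).
    printf '%s\n' "$sums" |
        awk '!seen[$1]++ { n++ } END { exit (n == 1 ? 0 : 1) }'
}
```

A call such as same_md5 /gbdb/$db/$db.2bit /usr/local/apache/htdocs-hgdownload/goldenPath/$db/bigZips/$db.2bit would cover the two hgwdev copies; the $db.2bit file name is an assumption about how the file is named.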
 
===Review the phylogenetic location in pull-down menus on hgGateway===
Organisms are supposed to be listed in phylogenetic order in the pull-down menus on hgGateway. Check to see if your new genome is in the right place evolutionarily. Find a map or get help if unclear about the proper location.
hgwbeta > hgsql hgcentralbeta
mysql > select name, orderKey from dbDb order by orderKey;
 
===joinerCheck===
 
====Check that common keys between tables are in sync====
hgwbeta > cd ~/kent/src/hg/makeDb/schema
hgwbeta > joinerCheck -database=$db -keys all.joiner
 
====Check that all tables in this database are mentioned in all.joiner====
hgwbeta > cd ~/kent/src/hg/makeDb/schema
hgwbeta > joinerCheck -database=$db -tableCoverage all.joiner
If not all of the tables are listed, email the developer asking him to add those tables to the tablesIgnored $db. According to Hiram it is probably ok for us to edit all.joiner ourselves.
 
===Check indices===
Every table should have at least one index, which should be on columns that make sense to index. You can check this the easy way (in the pushQ, click on the "show sizes" button) or the mysql way:
mysql > show index from $table_name;
 
===Verify makedoc for all the tracks listed in the pushQ===
File should be /src/hg/makeDb/doc/$db.txt. Check that all the tracks listed in the pushQ are included. Things that probably won't be in the makedoc explicitly are the supporting tables, gc5Base, nestedRepeats, genbank tables, assembly, and gap. If everything is there, be sure to click on “Y” in pushQ for both the main pushQ and all the tracks in the sub-pushQ. Note that you can quickly change the values for all the tracks in the sub-pushQ by accessing the database (qapushq) directly from hgwbeta.
 
===Run featureBits to verify that the gold and gap tables together cover the entire genome===
Run featureBits -countGaps -or $db gold gap, to make sure that the gold and gap table together cover the entire genome (should be 100%)
 
===Check to make sure that none of the table names have underscores(_)===
There are some older tables that have underscores (all_est and all_mrna) -- these are OK. What is definitely *not* OK is for split tables (tables that start with chr) to have more than one underscore in their name. Run the two queries below and verify that the only returned results follow these rules:
 
mysql> show tables like "%\_%";
mysql> show tables like "%\_%\_%";
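The naming rules above can also be applied mechanically to a list of table names. The following is a sketch under the assumption that the only acceptable underscores are in all_est, all_mrna, and split tables of the form chrN_table with exactly one underscore; the function name flag_bad_table_names is hypothetical.

```shell
# Hypothetical helper: read table names (one per line) on stdin and print
# only the names that break the underscore rules described above.
flag_bad_table_names() {
    awk '
        index($0, "_") == 0 { next }                   # no underscore: fine
        $0 == "all_est" || $0 == "all_mrna" { next }   # older tables that are OK
        /^chr[^_]*_[^_]*$/ { next }                    # split table, exactly one underscore
        { print }                                      # everything else is suspect
    '
}
```

The input could come from hgsql -Ne 'show tables' $db, the same command used elsewhere on this page to list tables. Any output should be looked at by hand, since this filter is stricter than the two queries above.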
 
===Make sure that there is a liftOver file from the previous assembly to this assembly===
This is the number one request after a new release. These files are located here:
 
/gbdb/[from database]/liftOver/[from database]To[to database].over.chain.gz
 
===If the new assembly is an update to the human, mouse, rat, zebrafish, D. melanogaster, C. elegans, or S. cerevisiae genomes, make sure that the appropriate blastTab tables to this assembly are built===
More information about blastTabs can be found [[BlastTabs|here]]
 
===Review all tracks in the sub-pushQ===
Make sure to run doGenbankTests to check genbank tracks in the pushQ. If there are Ensembl Genes also run qaEnsGenes.csh.
 
===Check that all of the MySQL tables are in good repair===
To do this run:
 
hgwbeta> sudo dbCheck.sh $db
 
This will do a myisamchk on all tables in that $db and repair any that need repairing (noted in the output by the words "REPAIR needed").
 
===Check all sample queries on hgGateway page===
From the gateway page, check all of the sample queries listed in the assembly details. Edit them if they do not work.
 
===Check default position and default tracks are scientifically interesting and aesthetically pleasing===
From the gateway page, press 'Click here to reset'. Go back to your assembly, then press 'submit'. You will be taken to the default position for your assembly. Make sure that the resulting area is scientifically interesting and aesthetically pleasing! You can edit the default location here: hgcentralbeta.dbDb.defaultPos and the default tracks here: /kent/src/hg/makeDb/trackDb/$db/trackDb.ra.
 
===Check Blat and PCR===
See <link> for more information
 
===Verify downloads===
The downloads are located at:
 
hgwdev> /usr/local/apache/htdocs-hgdownload/goldenPath/$db/
 
Note that you should only push the downloads needed for the tracks in your pushQ. LiftOver files and vs* directories are for the chain/net tracks and the multiz*way, phastCons*way and phyloP*way directories are for the conservation track.
 
Note that /$db/database will be empty except for README.txt. This directory will contain a dump of the database on the RR <link>, but will always remain empty on hgwdev.
 
====Check that the permissions are group protein writable (at least chmod 664)====
The developer who created this assembly will probably be the owner of the directory and the files in it; you may need to ask him/her to change the permissions.
 
====Check the md5sum====
Check the md5sums against the md5sum.txt file (if present) for each directory you are planning to push. Note that the md5sum.txt in the liftOver directory may need to be edited (at least temporarily) to include only the liftOver files contained in the pushQ.
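A minimal sketch of the per-directory check, assuming md5sum.txt is in the standard format written by md5sum and that GNU md5sum is available (the --quiet option is a GNU coreutils feature). The function name check_dir_md5 is hypothetical.

```shell
# Hypothetical helper: verify every file listed in DIR/md5sum.txt.
# Returns 0 if all listed files match, non-zero otherwise.
check_dir_md5() {
    # Run in a subshell so the caller's working directory is unchanged.
    (
        cd "$1" || exit 2
        [ -f md5sum.txt ] || { echo "no md5sum.txt in $1" >&2; exit 2; }
        md5sum -c --quiet md5sum.txt   # prints only files that fail (GNU coreutils)
    )
}
```

Because md5sum -c also fails on files that are listed but missing, a liftOver directory that holds only a subset of the listed files will report errors until md5sum.txt is trimmed to match.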
 
====Read and verify READMEs====
Check that we have READMEs at top level, and for bigZips, chromosomes, liftOvers and comparatives (multiz, phastCons, vsXXX). Verify that the information in the READMEs is correct. Note that some of the files mentioned in the README are generated by the Genbank process.
 
====Verify that the number of records in the upstream*.zip files is consistent====
In the bigZips directory, check that all upstream*.zip files unzip into the same number of records. These files are created by the Genbank process and contain the upstream bases for every refSeq record.
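As an illustration, the record counts can be compared once the files have been unzipped. This sketch assumes the unzipped files are FASTA, so that each record starts with a ">" header line; that format is an assumption, as is the function name same_record_count.

```shell
# Hypothetical helper: print the number of FASTA records in each file and
# return non-zero if the counts are not all equal.
# Assumes the arguments are already-unzipped FASTA files.
same_record_count() {
    status=0; first=""
    for f in "$@"; do
        n=$(grep -c '^>' "$f" || true)    # grep -c exits 1 on a zero count
        printf '%s\t%s\n' "$f" "$n"
        if [ -z "$first" ]; then
            first="$n"
        elif [ "$n" != "$first" ]; then
            status=1
        fi
    done
    return "$status"
}
```

The per-file counts are printed either way, so a mismatch can be traced to the file that differs.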
 
====Verify that liftOvers are in liftOvers directories and not in vs* Directories====
 
==Stage assembly on Round Robin==
 
 
===Let Donna know that you are getting ready to push so she can decide if she wants to do the static docs or not===
 
 
===Make sure no tables need to be repushed from hgwdev to hgwbeta===
You can use hgwdev> updateTimesDb.csh to compare table update times between hgwdev and hgwbeta. Everything but the genbank tables should have the same updateTimes. To see all of the tables in the assembly that are related to genbank run this: hgsql -Ne 'show tables' $db | egrep -f /cluster/data/genbank/etc/genbank.tbls.
 
 
===Email warning genome-mirror at least 24 hours before release===
Send an email to genome-mirror to let them know there is going to be a bunch more data they might want to host. The way to find out how much data is:
 
Size of entire assembly database:
hgwbeta> cd /tmp
hgwbeta> dbSnoop -unsplit $db $db.dbSnoop
hgwbeta> head $db.dbSnoop
 
Size of entire assembly gbdb:
hgwbeta> cd /gbdb
hgwbeta> du -hsc $db
 
 
===Adjust the release log in main pushQ===
Compile a list of the tracks being released on this assembly and paste it into the release log box of the main pushQ entry for the initial release of the assembly. You can fetch the list from the assembly pushQ, but note that for the genbank tracks you will need to get the names manually.
 
===Request rsync of entire database from push-request===
Make sure about trackDb_public and hgFindSpec_public. Will still need to push trackDb and friends to get track search to work (pushing trackDb and friends tells the pushers to push the trix files that are needed by track search)
 
Note that if you are going to repush any genbank tables, you must push ALL genbank tables together. To see all of the tables in the assembly that are related to genbank run this: hgsql -Ne 'show tables' $db | egrep -f /cluster/data/genbank/etc/genbank.tbls.
 
 
===Mark main pushQ entry as "push requested"===
 
 
===rsync /gbdb again as necessary===
Check that the files that were pushed to hgnfs1 are identical to those on hgwdev.
 
 
===Update hgcentral===
Use the files you created when transferring the appropriate lines from hgcentraltest to hgcentralbeta to also load those lines into the hgcentral database on the RR. This is preferred because the lines have already been verified to work on hgcentralbeta. You can log into hgcentral on the RR like so:
 
hgwdev> hgsql -h genome-centdb hgcentral
 
See <link> to see how to load the files into hgcentral
   
====Change the active column in dbDb to be set to 0====
The active column dictates whether the assembly appears in the drop-down menu on the gateway page. When it equals 0, it doesn't show in the pull down, when it equals 1, it does show.
 
====blatServers, genomeClade, gdbPdb (NOT liftOver)====
It is only necessary to edit genomeClade if this is the first assembly for this species or if the order of species was changed. It is only necessary to update gdbPdb for assemblies that are being released with knownGenes. Also, note that it is ok for hgNearOk in dbDb to equal 1 for an older assembly of same organism.
 
 
===Turn on genbank updates on the RR===
 
 
===Request dump and autodump of database===
Ask the pushers to dump the mysql tables from the RR to .txt.gz and .sql files on hgdownload:/usr/local/apache/htdocs/goldenPath/$db/database, and to start the autodump for this database so that the files will be updated with RR tables.
 
 
===Test the assembly tracks, BLAT and PCR on the RR===
It is possible to do this with active=0 by forcing db=$org and position= into the hgTracks URL. First view an older assembly, then edit the URL so that you are actually viewing your new assembly.
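A small sketch of building such a URL follows. The host name genome.ucsc.edu and the function name rr_test_url are assumptions, and the sketch assumes the db parameter takes the database name of the new assembly.

```shell
# Hypothetical helper: print an hgTracks URL that forces a database and position.
# usage: rr_test_url DB POSITION      e.g. rr_test_url danRer6 chr1:1-100000
rr_test_url() {
    printf 'http://genome.ucsc.edu/cgi-bin/hgTracks?db=%s&position=%s\n' "$1" "$2"
}
```

Paste the printed URL into a browser session that has already viewed an older assembly, as described above.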
 
==Enable Assembly on RR and post-release follow-up==
 
===Set active=1 in hgcentral===
 
When everything is working as expected, set the assembly to active:
hgwdev> hgsql -h genome-centdb hgcentral
mysql> UPDATE dbDb SET active = 1 WHERE name = "$db";
 
 
===Verify again that everything is working as expected on the RR===
 
 
===Update liftOvers in hgcentral and test hgLiftOver and hgConvert===
 
 
===Update defaultDb in hgcentral===
Set your assembly as the default assembly for this organism. If this was a human or mouse assembly, go back and update hgcentraltest and hgcentralbeta too.
 
 
===Remove assembly from mirror "exclude" list===
Remind the pushers to remove this assembly from the mirror's "exclude" list. This will allow mirror sites to rsync the /gbdb for this assembly.
 
 
===Push Downloads from hgwdev to hgdownload===
These files are pushed directly from hgwdev: /usr/local/apache/'''htdocs-hgdownload'''/goldenPath/$db/ to hgdownload: /usr/local/apache/'''htdocs'''/goldenPath/$db/. Be sure to specify this in your push request. Ask the pushers to be sure to keep the permissions as they are when they push the files (especially making sure that they are group protein writable). Make sure to only push the directories that are applicable to the tracks that are in the pushQ.
 
 
===Update/Add symlink in /usr/local/apache/htdocs-hgdownload/goldenPath/currentGenomes===
Update/Add the symlink in hgwdev> /usr/local/apache/htdocs-hgdownload/goldenPath/currentGenomes so that it points to the most recent assembly, and request a push to hgdownload. This is for ftp users who only want to go to the most recent assembly for an organism.
 
 
===Confirm that downloads and currentGenomes are functional===
For human, mouse and rat (KG assemblies), ask for push of htdocs/knownGeneList/$db/* and knownGeneLists.html
 
 
===Turn on genome-mysql===
Notify cluster-admin that the new assembly is available and needs to be released to genome-mysql. Permissions should be made for users "genome" and "genomep". The admins also need to update the mysql.db table permissions. (Jorge says we can ask them to follow the instructions in their wiki for "Mirror_Server".)
 
 
===Push Static Content from hgwdev to hgwbeta and Round Robin===
 
Pages to update:
* /usr/local/apache/htdocs/indexNews.html
* /usr/local/apache/htdocs/goldenPath/newsarch.html
* /usr/local/apache/htdocs/goldenPath/credits.html
* /usr/local/apache/htdocs/FAQ/FAQreleases.html
* /usr/local/apache/htdocs-hgdownload/downloads.html
* /gbdb/$db/html/description.html
 
For more information go to: [[Static_content_for_new_assemblies]]
 
If there are new types of tables, also update goldenPath/gbdDescriptions.html and goldenPath/help/hgTracksHelp.html.
 
 
===Announce the Release on genome@soe.ucsc.edu and genecats@soe.ucsc.edu===
Send an email to Donna letting her know that the assembly is released and working on the RR. She will send announcements to: genome mailing list (genome@soe.ucsc.edu) and the genecats mailing list (genecats@soe.ucsc.edu). If Donna is busy you can also edit the documentation yourself and send announcements as above. See "Push Static Content" section.
 
 
===Update hgcentral.sql for mirrors===
cd $WEEKLYBLD; Run buildHgCentralSql.csh real (can run w/o 'real' to just see diffs). hgcentral.sql will be automatically pushed to hgdownload.
 
 
===You may need to update all.joiner for the RR===
 
==Chain/Nets to other organisms==
This can be done at various times
* Now push the trackDb/hgFindSpec as needed for the OtherOrgs that have chains and nets pointing to YourOrg so that these tracks will be turned on. Be sure to run compareTrackDbAll.csh and compareHgFindSpec.csh. Resolve issues as usual.
 
* Drop the old chains and nets that these are replacing, if any. After you drop the chains/nets, be sure to send an email to genome-mirror letting them know what has been dropped (so that they can drop them from their mirror sites).

* Set liftOverChain. Do not change hgcentral.liftOverChain for assemblies on the RR going to the new db until the new db is active on the RR. Prepare a file to load into hgcentral immediately after setting active = 1.
 
 
 
 
 
==Next day follow-up==
 
===Check that Genbank is running===
* Make sure Genbank daily updates are running on the Round Robin. You can do this by viewing the dates on the download files in htdocs/goldenPath/$db/bigZips/ (they should be more recent than the ones you pushed with your release).
 
 
===Retire the assembly sub pushQ using retirePushQ.csh===
Make sure there are release log entries for the net and chain tracks in other databases. (The script will remove release log notes for all push queue entries where dbs=$db.)
 
 
===Press "done!" in the main push queue and verify the next day that the release log was updated===
*The day after you press “done!” in the main push queue for your assembly, the Release Log on the website will be updated with the information about the new release (from whatever you entered into the Release Log field of the main push queue).
 
 
===Check the downloads against the md5sum size===
You can download each one and then run: md5sum <filename>
 
 
===Check that genome-mysql is working===
From hgwdev:
  mysql -h genome-mysql -A -u genome $db
 
   
 
 
 
 
===Double check next day: release log, database dump/autodump, genome-mysql, genbank auto-update, downloads (bigZip autodownloads) ===
 
 
 
 
[[Category:Browser QA]]

Latest revision as of 19:23, 10 March 2011
