Assembly QA Part 1 DEV Steps

From Genecats

These steps were revised in 2017, but you can also see the old steps: Releasing an assembly (old steps)


Navigation Menu

Home Page
Assembly QA Part 1: DEV Steps
Assembly QA Part 2: Track Steps
Assembly QA Part 3: BETA Steps
Assembly QA Part 4: RR Steps

Setup: Create a Google spreadsheet checklist from a template

Steps:

  1. Open a new Google Spreadsheet.
  2. Go to the Google spreadsheet Template: Assembly Release Checklist
  3. Copy the template: File > Make a copy
  4. Give your new spreadsheet a title, like "manPen1 Assembly Release Checklist".
  5. Move your spreadsheet to a good folder on your Google Drive so that you can easily find it later.
  6. All set! You can now use your checklist.

Tips:

  1. Note: This system works best when you create one spreadsheet per assembly.
  2. See the tab, "README" for more info.
  3. If a wiki section is h4 ("====Wiki Section===="), denoted by surrounding the section with exactly 4 equal signs, then the h4 section will appear as a step in your checklist.
  4. To add a new step to your checklist - do not add it directly to your spreadsheet. Instead add a new h4 section to the wiki. Just copy an existing h4 and edit it!
  5. Your h4 will become a URL, so the only punctuation you can use is a colon " : "; otherwise, the wiki link in column A will break.
  6. To see your change, toggle the "#" character in your formula. The "#" is not really needed in the formula, and removing it or adding it back in will re-load the page.

Setup: Make a directory in your hive

During this assembly release process, you will be generating a lot of output, and you'll need a place to put everything. The use of the "hive" directory is encouraged as the best location because of ample space.

 
mkdir /hive/users/userName/assemblies/assemblyName  

e.g.:  mkdir /hive/users/cath/assemblies/manPen1

Setup: Create an alias to your new dir

When you add an alias from your .bashrc file, you can simply type that alias in your command line as a shortcut to the associated command. A "shortcut" alias can be created to allow fast access to your hive directory for this assembly.

To do this, follow the steps below:

  1. In your terminal, connect to hgwdev and type "cd" (go to your home directory).
  2. Confirm the location of .bashrc. Type "ls -a" in your home directory to list all files, including hidden files whose names begin with a ".". This way you can confirm that your .bashrc file is there.
  3. Open your .bashrc file for editing. If you're using the vi editor, you can type "vi .bashrc" to edit the file. Add an alias by typing in the line below, then save your changes.
alias hive='cd /hive/users/yourUserName/assemblies/yourAssembly'

e.g., alias hive='cd /hive/users/cath/assemblies/manPen1'


Redmine: Review the Redmine as PushQ wiki

  • As of March 2017, the PushQ has been replaced with Redmine to track and release new assemblies.
  • Review the Redmine as the pushQ replacement wiki page.
  • Go to Redmine > GB > Issues > Filter: "Ready for QA"
  • Find the assembly you will QA/Release

Redmine: Set assignee as yourself

Redmine: Set the engineer as watcher if they are not the developer

Redmine: Set Status to Reviewing

Dev: Check minimal browser criteria

Does this assembly have the required tracks?

Visit this page to check that the assembly contains the required tracks to be considered a minimal browser on the RR.

To add explanation: GenBank mRNAs & ESTs (/cluster/data/genbank/data/organism.lst). How to view/interpret the file.

Dev: Check that BLAT Server is running

To check whether your organism has blat servers set up, run the following command (beware that copyHgcentral creates many temp files):

copyHgcentral test $db blatServers dev beta

A better approach that does not create many temporary files is to query hgcentraltest yourself:

hgsql -e "select * from blatServers where db='$db'" hgcentraltest

The developer has often already requested that the blat servers be set up for the new assembly. If not, or if entries for your assembly are missing from hgcentraltest.blatServers, please make a note in the Redmine ticket and ask the assembly builder to 1) request the setup of the blat servers and 2) manually add the entries to hgcentraltest.blatServers. Make sure that this assembly is not hosted on the "blatx" BLAT server. That server is less stable and is therefore meant for assemblies that are not destined for the RR. For more information about where the blat servers for different machines should be hosted, go to Updating blat servers.

You should see results like those below, since this should only be set up on dev so far:


copyHgcentral test manPen1 blatServers dev beta

--------------------------------------------------
--------------------------------------------------
<<< blatServers >>>

hgcentraltest
-------------
manPen1	blat1b	17878	1	0
manPen1	blat1b	17879	0	1

hgcentralbeta
-------------


hgcentral
-------------


*** There are blatServers differences between dev and beta ***

*** The blatServers data on beta and rr is identical ***

Dev: Do a BLAT search: DNA

From BLAT tool on dev:

  1. Go to your browser and copy some DNA sequence
  2. Go to BLAT: Home > Tools > Blat
  3. Paste in sequence
  4. Change query type to DNA and press submit
  5. Click on various blat results to make sure they look as expected
  6. Make a custom track of blat results and then look at them in the browser.

Dev: Do a BLAT search: protein

From BLAT tool on dev:

  1. Go to your browser and copy some DNA sequence -> translate to amino acid sequence*
  2. Go to BLAT: Home > Tools > Blat
  3. Paste in sequence
  4. Change query type to "protein" (amino acid) and press submit
  5. Click on various blat results to make sure they look as expected
  6. Make a custom track of blat results and then look at them in the browser.

Dev: isPCR test

  • Go to dev's PCR Tool and test a PCR search for your assembly.

For example, on hg38:

  • You want to get DNA, about 20-23 bases, that "book end" the region of DNA that will be amplified.
  • For example, here's a 70bp region in hg38: chr1:11,131,574-11,131,643
  • Go to this region on hg38
  • hgTracks, View > DNA (v + d keyboard shortcut)
  • Click "get DNA" with the default selections.
  • Copy the DNA to your clipboard:
>hg38_dna range=chr1:11131574-11131643 5'pad=0 3'pad=0 strand=+ repeatMasking=none
CCTGGTCCCAACACCTAGCCCACGGCCTGACAGAGAACCAGTGCTCAATGCTTGAAGGAAGAACCGCTGG


Go to isPCR for hg38 (Tools >InSilico PCR)
Genome: Human
Assembly: hg38
Target: genome assembly
Forward Primer: The first 20(ish) bp of the region, e.g., CCTGGTCCCAACACCTAGCC (in green below)
CCTGGTCCCAACACCTAGCCCACGGCCTGACAGAGAACCAGTGCTCAATGCTTGAAGGAAGAACCGCTGG
Reverse Primer: The last 20(ish) bp of your region, e.g., GCTTGAAGGAAGAACCGCTGG (in red below)
CCTGGTCCCAACACCTAGCCCACGGCCTGACAGAGAACCAGTGCTCAATGCTTGAAGGAAGAACCGCTGG
Check "Flip reverse primer". This will change the region in red to the reverse complement and also flip it 180 degrees.
The idea is that you are finding the DNA between the green and the red chunks to amplify.
Click submit.

For the reverse primer in red, you could instead have output the "-" strand DNA (the reverse complement) in "Tools > View DNA" by selecting the radio button for the reverse complement. If you paste that sequence into the "Reverse Primer" field in isPCR, then you do not have to select "Flip Reverse Primer."
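
The primers and the reverse complement described above can also be worked out on the command line. This is only a minimal sketch, assuming the standard `rev`, `tr`, and `cut` utilities are available; the function name `revComp` is made up for this example and is not a kent utility.

```shell
# Hypothetical helper (not part of the kent tools): print the reverse
# complement of the DNA sequence given as the first argument.
revComp() {
    printf '%s\n' "$1" | rev | tr 'ACGTacgt' 'TGCAtgca'
}

# The 70 bp hg38 region from the example above.
region=CCTGGTCCCAACACCTAGCCCACGGCCTGACAGAGAACCAGTGCTCAATGCTTGAAGGAAGAACCGCTGG

# Forward primer: the first 20 bases of the region.
forward=$(printf '%s' "$region" | cut -c1-20)

# Reverse primer as it appears on the + strand: the last 21 bases.
reverse=$(printf '%s' "$region" | cut -c50-70)

echo "forward primer:            $forward"
echo "reverse primer (+ strand): $reverse"
echo "reverse primer (flipped):  $(revComp "$reverse")"
```

If you paste the flipped sequence into the "Reverse Primer" field, leave "Flip reverse primer" unchecked; if you paste the + strand sequence, check it.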

Dev: Compare chrom sizes

Skip this if your assembly is the first for a species (hosted by UCSC); there will be no chrom sizes to compare to!
For a new assembly version, compare the chrom sizes from the last assembly to this new assembly version. For some assemblies, chrom names were changed; be aware of this when comparing. You are not checking annotations on the reference sequence; you are just checking the number of base pairs per chrom/contig and making sure that nothing has changed drastically (i.e., millions of base pairs different). Also look for general differences, such as chrom labels or the number of chroms/contigs.
There are two ways to compare chromosomes:
1. Navigate to http://hgwdev.gi.ucsc.edu/cgi-bin/hgGateway, find your assembly, and click the "View Sequences" button. Bring up two windows side by side to view both the old and new assemblies and compare the chromosome sizes.

or

2. Open a terminal window, output the chrom sizes of each assembly into a sorted file, and compare the sorted files with the following commands:

hgsql -Ne "select chrom, size from chromInfo" $prevDb | sort > oldChromSizes
hgsql -Ne "select chrom, size from chromInfo" $db | sort > newChromSizes
sdiff -s oldChromSizes newChromSizes

You may want to preview each file first (e.g., "head oldChromSizes") to check the output in both the old and new chromSizes files; we are only concerned with the "chr# ####" lines.
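
When there are many contigs, the sdiff output can be long. The following is a rough sketch of one way to list only the drastic changes. The function name `bigSizeChanges` and the default threshold of 1,000,000 bases are assumptions made for this example; it expects the two "chrom size" files written by the hgsql commands above.

```shell
# Hypothetical helper: list chroms present in both files whose size differs
# by more than a threshold (assumed default: 1,000,000 bases).
# $1 = old chrom sizes, $2 = new chrom sizes, $3 = optional threshold.
# Each input line is "chrom<TAB>size".
bigSizeChanges() {
    awk -v t="${3:-1000000}" '
        NR == FNR { old[$1] = $2; next }   # first file, remember old sizes
        $1 in old {
            d = $2 - old[$1]
            if (d < 0) d = -d
            if (d > t) print $1, old[$1], $2, d
        }
    ' "$1" "$2"
}

# Example:
# bigSizeChanges oldChromSizes newChromSizes
```

Each output line shows the chrom name, old size, new size, and absolute difference. Chroms that exist in only one of the two assemblies are not reported here, so still look at the sdiff output for renamed or missing chroms.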

Dev: Gateway: Check the tree

On hgGateway, make sure your db appears in the tree.

  1. Type the first few letters of your assembly name in the search field above the "Represented Species" tree, "m-a-n-P-e..." and the rest should populate.
  2. Your assembly should now be highlighted in the tree, and the tree position should have moved so that you are now centered on the tree position for your org.
  3. Hover over the name of your org within the tree, you should see the scientific name.
  4. Hover over the horizontal branch leading to your org, you should see the genus - family - order.
  5. Hover over the vertical branch leading to your org, you should see the superorder.
  6. Go to a different organism on hgGateway. Then scroll down the tree and find your organism. Click on the name of your organism in the tree and you should go to the default assembly for your organism.

Dev: Gateway: Check default position

  1. Go to gateway page
  2. Reset all user settings (Home > Genome Browser > Reset All User Settings)
  3. Select the assembly you're testing
  4. Press "Go" on hgGateway
  5. You will be taken to the default position for your assembly.
  6. Make sure that the resulting area is scientifically interesting and aesthetically pleasing!
  7. You can edit the default location here: hgcentraltest.dbDb.defaultPos:
hgsql -e "update dbDb set defaultPos='chr6:43426669-43433274' where name='danRer11'" hgcentraltest

On an unrelated dbDb note, setting the field hgPbOk=1 sets the base pairs shown on hgTracks from T's to U's. This also affects zoomed in MAF files, but shouldn't matter unless we're displaying an RNA genome like SARS-CoV-2. This field was left over from the protein browser and was repurposed, so it should be 0 for all DNA genomes.

Dev: Gateway: Check default tracks

  • Each assembly has certain tracks that are hidden or visible by default.
  • You can edit the default tracks here: /kent/src/hg/makeDb/trackDb/$db/trackDb.ra.

Below is an example for turning on a default gene track that was off when the developer released the assembly to dev.

Resource: https://genome.ucsc.edu/goldenpath/help/trackDb/trackDbDoc.html

  • manPen1 has no gene tracks on by default.
  • I want to turn on the augustus track (on by default, pack visibility).
  • Looking at ~/kent/src/hg/makeDb/trackDb/$db/trackDb.ra, I see that there is no stanza for the augustus track because it inherits the parent *.ra file's configuration, which makes it hidden.
  • I need to override the parent config in the manPen1 .ra file.

Steps:

  • go to dev, Genome Browser > Reset All User Settings
  • note which track you would like to turn on, see if you want it in 'pack' or 'full' etc.
  • vim ~/kent/src/hg/makeDb/trackDb/pangolin/manPen1/trackDb.ra
  • Add something like this:
#Local declaration so that augustus genes is picked up.
track augustusGene override
visibility pack
  • cd ~/kent/src/hg/makeDb/trackDb
  • make alpha DBS=manPen1
  • refresh your dev hgTracks browser and see that your track is now on with the visibility you set in the override (pack, in this case).
  • if all looks good, add, commit, push your .ra file.

If your assembly is already public on the RR, then continue the push:

  • make beta DBS=manPen1
  • make public DBS=manPen1
  • Push request to admins: Make trackDb & friends for manPen1
  • Check the rr/euro/asia for your newly visible track.

Dev: Gateway: Check trackDb priority

  • Each assembly has certain tracks that are hidden or visible by default.
    • Our standard is to have the visible tracks at the beginning of each group.
  • You can edit the default tracks here: /kent/src/hg/makeDb/trackDb/$db/trackDb.ra.


Tracks that are on by default should be the first tracks within a group; for example, GENCODE v29 is first in the "Genes and Gene Predictions" group for hg38. All other tracks, which are hidden by default, follow the visible tracks in alphabetical order. The only exception to this rule is for the chain/net tracks inside of the "Comparative Genomics" group. These chain/net tracks are in phylogenetic order and should not be in alphabetical order.

To change the order of the tracks on hgTracks, you can use the priority trackDb setting:

priority 1

hgTracks will display the tracks with the lowest priority value first, followed by any tracks without a priority setting in alphabetical order. The priority value can be a floating point number, so a track with a priority value of 1.1 will be displayed after a track with a priority value of 1.

Dev: Gateway: Organism image check

The image on hgGateway is referenced from trackDb directory's description.html file.

From your file.list from Redmine, make sure a scientificName.jpg image is listed, and check that it exists on dev.

The image file that appears on the gateway page should reside in the kent source tree in:

~/kent/src/hg/htdocs/images/

and a copy should exist at:

hgwdev > /usr/local/apache/htdocs/images/

If the image is not showing up on genome-test, cd to kent/src/hg/htdocs, ensure the image in the images directory is committed, then run make alpha.

Dev: Gateway: Accession ID check


Assemblies/sequences, from various organizations, are submitted to the mother ship GenBank.
Those assemblies might be included in RefSeq if criteria are met.

The QA check should be to go to NCBI and double check that the accessionID is correct, possibly by searching the Accession ID in https://www.ncbi.nlm.nih.gov/assembly/.

RefSeq assemblies:
  • use accession ID: GCF_000002315.4 (e.g., galGal5)
  • are delivered with chrMt (if it exists)
  • are delivered with NCBI gene predictions
GenBank assemblies:
  • use accession ID: GCA_000001305.2
  • are delivered without a chrMt
  • do not have gene predictions

For the UCSC Genome Browser, it is preferable to use RefSeq assemblies (in part due to 'more data'). This is a "learn as we go" direction; historically GenBank was preferred.
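
As a quick sanity check while reviewing the Gateway page, the accession prefix tells you which kind of assembly you are looking at. The snippet below is a small sketch of that rule only; the function name `accessionSource` is made up for this example, and it does not replace looking the accession up at NCBI.

```shell
# Hypothetical helper: report which NCBI collection an assembly accession
# belongs to, based only on its prefix (GCF_ = RefSeq, GCA_ = GenBank).
accessionSource() {
    case "$1" in
        GCF_*) echo "RefSeq" ;;
        GCA_*) echo "GenBank" ;;
        *)     echo "unknown" ;;
    esac
}

# Example:
# accessionSource GCF_000002315.4
```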

Helpful article: Nature, 2012 A beginner's guide to eukaryotic genome annotation

Dev: Gateway: Check the NCBI assembly version link

Check that there is an NCBI link to the exact assembly version, either by clicking the link on the Gateway or searching http://www.ncbi.nlm.nih.gov/assembly/organism/

Dev: Verify make doc for all tracks

  • The makefile/s or initialbuild.txt file for your assembly describes the browser build.
  • Location should be here: ~/kent/src/hg/makeDb/doc/$db/*

Cath asked Hiram about tables that should be mentioned in the makedoc Nov 2017. Below is an example from xenLae2. The makedoc correctly lists all of the necessary tracks.

Mentioned in the makedoc

  1. augustusGene
  2. chromAlias
  3. cpgIslandExt
  4. cpgIslandExtUnmasked
  5. cytoBandIdeo
  6. gap
  7. genscan
  8. gold
  9. microsat
  10. rmsk
  11. simpleRepeat
  12. trackDb
  13. ucscToINSDC
  14. ucscToRefSeq
  15. windowmaskerSdust
  16. (Sometimes) ensGene

Not mentioned in the makedoc, and it is ok that they are not mentioned:

  • Constructed by genbank processes:
  1. all_est
  2. all_mrna
  3. intronEst
  4. refFlat
  5. refGene
  6. refSeqAli
  7. xenoRefFlat
  8. xenoRefGene
  9. xenoRefSeqAli
  10. estOrientInfo
  11. mrnaOrientInfo
  • Constructed by the doBlastzChainNet.pl script:
  1. chainHg38
  2. chainHg38Link
  3. chainMm10
  4. chainMm10Link
  5. chainXenTro9
  6. chainXenTro9Link
  7. netHg38
  8. netMm10
  9. netXenTro9
  • Constructed by the makeGenome.pl script:
  1. chromInfo
  2. gc5BaseBw
  3. grp
  • Constructed by 'make' in trackDb hierarchy:
  1. hgFindSpec
  • Added to by many loader commands:
  1. history
  • Constructed by doRepeatMasker.pl script:
  1. nestedRepeats
  • Constructed by cron job every night:
  1. tableDescriptions

Grep tips: Use the list of tables you'll push (see BETA STEPS) as the grep search string list, and look in the makedoc to see which tables are NOT mentioned. In step 2, comm -13 prints the lines found only in the second input, i.e., the tables that are in your list but not in the makedoc.

  • 1. grep -of allTables.xenLae2 ~/kent/src/hg/makeDb/doc/xenLae2/initialBuild.txt | sort -u > tablesListed.makedoc
  • 2. sort -u allTables.xenLae2 | comm -13 tablesListed.makedoc -

This is also a helpful grep: cat ~/kent/src/hg/makeDb/doc/xenLae2/initialBuild.txt | grep "DONE"
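
Because grep -o also matches a table name that is only part of a longer word, a whole-word check can give a cleaner answer. The following is a sketch under that assumption; the function name `unmentionedTables` is hypothetical, and the file names in the example are the ones used above.

```shell
# Hypothetical helper: print every table from a list that is never mentioned
# as a whole word in the makedoc.
# $1 = file with one table name per line, $2 = path to the makedoc.
unmentionedTables() {
    while IFS= read -r table; do
        [ -n "$table" ] || continue      # skip blank lines
        grep -qwF -- "$table" "$2" || printf '%s\n' "$table"
    done < "$1"
}

# Example:
# unmentionedTables allTables.xenLae2 ~/kent/src/hg/makeDb/doc/xenLae2/initialBuild.txt
```

Compare the output with the lists above of tables that are fine to leave out of the makedoc.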

Dev: Review downloads dir

View the contents of the downloads directory.

ls -R /usr/local/apache/htdocs-hgdownload/goldenPath/$db/

LiftOver files and vs* directories are for the chain/net tracks; and the multiz*way, phastCons*way and phyloP*way directories are for conservation tracks.

Note that $db/database dir will be empty except for README.txt. This directory will contain a dump of the database on the RR, but will always remain empty on hgwdev.

Also note that these files:

est.fa.gz      mrna.fa.gz      refMrna.fa.gz      xenoMrna.fa.gz
est.fa.gz.md5  mrna.fa.gz.md5  refMrna.fa.gz.md5  xenoMrna.fa.gz.md5

will not be present on hgwdev. They are generated automatically and rsync'ed to hgdownload after an assembly is added to hgwbeta.dbs and "make etc-update-server" is run in the kent/src/hg/makeDb/genbank/ directory on hgwbeta.

Dev: Run dbCheck

Run the following command to check that all MySQL tables are in good repair:

sudo dbCheck.sh $db

Dev: Alignment files are to valid assemblies

In Redmine for your assembly, the engineer should have provided a path to redmine.$db.file.list, e.g., /hive/data/genomes/manPen1/redmine5515/redmine.manPen1.file.list

From hive, copy the file list to your assembly dir:

cp /hive/data/genomes/manPen1/redmine5515/redmine.manPen1.file.list /hive/users/userName/assemblies/manPen1/

Take a look at the alignment "To" and "From" files, and make sure they are to valid assemblies on the RR.

LiftOver Files
  • A liftOver file is a chain file; it is a subset of all chains used in creating the net file.
  • hg38ToAnoCar2.over.chain.gz: These files are required for the liftOver utility.
  • The file names reflect the assembly conversion data contained within, in the format <db1>To<Db2>.over.chain.gz. For example, a file named hg38ToAnoCar2.over.chain.gz contains the liftOver data needed to convert hg38 coordinates to the anoCar2 assembly.
Chain Files
  • Chain files contain all possible chains generated by lastz, before a subset of "best" chains is filtered into the liftOver file.
  • hg38.anoCar2.all.chain.gz: chained blastz alignments.
  • The chain format is described on the chain help page.
Net Files
  • hg38.anoCar2.net.gz: "net" file.
  • This file describes rearrangements between the species and the best Lizard match to any part of the Human genome. The net format is described on the net help page.
Axt Files
  • hg38.anoCar2.net.axt.gz: chained and netted alignments,
  • i.e., the best chains in the Human genome, with gaps in the best chains filled in by next-best chains where possible. The axt format is described on the axt help page.

Dev: liftOver exists: old to new, new to old

Skip this if your assembly is the first version for the organism. Otherwise, check that there is a liftOver file from the new version to the previous assembly version, and a reciprocal file from the previous version to the new one:

/gbdb/[your Db]/liftOver/[your Db]To[the older version of your org].over.chain.gz
/gbdb/[the older version of your org]/liftOver/[the older version of your org]To[your Db].over.chain.gz

Dev: liftOver exists: other orgs

Your assembly will probably also have liftOver files to/from other major orgs, such as the newer human and mouse assemblies. Check that liftOver files exist in BOTH directories,

/gbdb/[your database]/liftOver/
/gbdb/[some other org database]/liftOver

For example, if your assembly is manPen1, see what liftOver files are there. These should also match what is in your filelist from Redmine.

 ★  /gbdb/manPen1/liftOver
ls
manPen1ToHg38.over.chain.gz  manPen1ToMm10.over.chain.gz

Note that there are liftOver files to TWO other orgs, human and mouse. If this assembly was not the first, it should also have liftOver files to the previous assembly version.

Let's go look at liftOver files for hg38:

 ★  /gbdb/hg38/liftOver
ls | grep ManPen
hg38ToManPen1.over.chain.gz

and then we'll check mm10:

 ★  /gbdb/mm10/liftOver
ls | grep ManPen
mm10ToManPen1.over.chain.gz
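
The same check can be scripted for each pair of databases. This is only a sketch: the function names `capFirst` and `checkLiftOverPair` are made up for this example, and the third argument (the gbdb root, assumed to default to /gbdb) exists only so the function can be tried against a scratch directory.

```shell
# Hypothetical helpers, not kent utilities.

# Capitalize the first letter of a database name: manPen1 -> ManPen1.
capFirst() {
    first=$(printf '%s' "$1" | cut -c1 | tr '[:lower:]' '[:upper:]')
    rest=$(printf '%s' "$1" | cut -c2-)
    printf '%s%s\n' "$first" "$rest"
}

# Report whether the liftOver files exist in both directions.
# $1 = your db, $2 = other db, $3 = gbdb root (assumed default: /gbdb).
# Returns non-zero if either file is missing.
checkLiftOverPair() {
    root=${3:-/gbdb}
    missing=0
    for f in "$root/$1/liftOver/$1To$(capFirst "$2").over.chain.gz" \
             "$root/$2/liftOver/$2To$(capFirst "$1").over.chain.gz"; do
        if [ -e "$f" ]; then
            echo "ok       $f"
        else
            echo "MISSING  $f"
            missing=1
        fi
    done
    return $missing
}

# Example:
# checkLiftOverPair manPen1 hg38
# checkLiftOverPair manPen1 mm10
```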

Dev: Check Tools: LiftOver

  • Go to dev's LiftOver Tool and test lifts to & from other assembly versions and other organisms that you have liftOver files for.

Dev: Review notes and make temp dir for md5sum checks

There is a way to check all md5sums at once using one command. This should save you lots of time and typing. You'll need two directories in your home folder, temp and temp2.

First go to your test directory:

cd /usr/local/apache/htdocs-hgdownload/goldenPath/$db/

Then run the following loop to compare freshly computed md5sums against the md5sum.txt files that were generated when the files were uploaded. If the only differences reported are for README.txt and md5sum.txt, that's great: nothing has changed. If something else comes up, ask your developer.

for dir in *; do cd $dir; md5sum * | sort > ~/temp/$dir; sort md5sum.txt > ~/temp2/$dir; echo $dir; diff ~/temp/$dir ~/temp2/$dir; cd ..; done
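
A possible alternative that needs no temp directories is to let md5sum verify each md5sum.txt directly. This is a sketch, assuming GNU md5sum (for the --quiet option) and assuming each md5sum.txt is in the standard "hash  filename" format that md5sum itself writes; the function name `checkMd5Dirs` is made up for this example.

```shell
# Hypothetical helper: verify every subdirectory of a download directory
# against its own md5sum.txt. Prints only files that fail or are missing.
# $1 = download directory, e.g.
#      /usr/local/apache/htdocs-hgdownload/goldenPath/$db
# Returns non-zero if any directory had a failure.
checkMd5Dirs() {
    status=0
    for dir in "$1"/*; do
        [ -f "$dir/md5sum.txt" ] || continue   # skip dirs without a checksum file
        echo "== $dir"
        ( cd "$dir" && md5sum -c --quiet md5sum.txt ) || status=1
    done
    return $status
}

# Example:
# checkMd5Dirs /usr/local/apache/htdocs-hgdownload/goldenPath/$db
```

Unlike the diff approach, this only checks the files listed in md5sum.txt, so README.txt and md5sum.txt themselves do not show up in the output. It also will not tell you about files that are in the directory but missing from md5sum.txt.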

OUTDATED BELOW, use above command

First, make a dir "temp" in your home directory. You'll use this in the steps below. The remainder of the text below explains how the md5sum checks will work.

Review the following section, which is a guide to verify that the download files exist and are not corrupt in the following directory and sub dirs:

hgwdev > /usr/local/apache/htdocs-hgdownload/goldenPath/$db/

We will use the md5sum program to generate MD5 hashes and verify the integrity of the files, since any change to a file will cause its MD5 hash to change. The MD5 hash for each file was generated and stored in the md5sum.txt file.

An easy way to compare the MD5 hashes of each file is to do a diff. This can be easily automated by running the following commands.

The first command is to run md5sum for all files in your current directory (these will be listed in the steps below), sort them, and then redirect the output to a file.

md5sum * | sort > ~/temp/filename_1

The second command sorts the md5sum.txt file and redirects the output to a different file.

sort md5sum.txt > ~/temp/filename_2

The final command compares the two files created and displays the lines that differ between the two files.

diff ~/temp/filename_1 ~/temp/filename_2

Note that the md5sum.txt file obviously does not contain md5sum.txt and it was created before there was a README.txt file, so your diff will show md5sum.txt and README.txt in the results. If everything is ok, those should be the only results. In the vs* directories, the XXXX.net.axt.gz file will show up as axtNet/XXXX.net.axt.gz.

Continue on to the next steps to begin running these checks in the following directories.

Dev: bigZips: check md5sum

Change your directory to:

cd /usr/local/apache/htdocs-hgdownload/goldenPath/$db/bigZips

then run the following command:

md5sum * | sort > ~/temp/filename_1; sort md5sum.txt > ~/temp/filename_2; diff ~/temp/filename_1 ~/temp/filename_2

Dev: bigZips: check README

These README checks can be automated so you don't have to run any of the commands below. You still have to read or skim the README.txt files that are output. Here is the command to display all README files for the whole directory:

cd  /usr/local/apache/htdocs-hgdownload/goldenPath/$db/
for dir in *; do cd $dir; echo $dir; cat README.txt; cd ..; done
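
To spot a missing README.txt without scrolling through all of that output, something like the following sketch could be run first. The function name `missingReadmes` is hypothetical; it only looks one level down, so subdirectories such as reciprocalBest still need to be checked separately.

```shell
# Hypothetical helper: list subdirectories that lack a non-empty README.txt.
# $1 = download directory for the assembly.
missingReadmes() {
    for dir in "$1"/*; do
        [ -d "$dir" ] || continue
        [ -s "$dir/README.txt" ] || echo "no README.txt: $dir"
    done
}

# Example:
# missingReadmes /usr/local/apache/htdocs-hgdownload/goldenPath/$db
```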

REDUNDANT BELOW, above step does all README.txt prints in this assembly directory



/usr/local/apache/htdocs-hgdownload/goldenPath/$db/bigZips
  • Verify that the README.txt exists
  • cat the file and read it, check the contents (such as urls listed, etc.)

Dev: bigZips: check for corruption

Change your working directory to:

cd /usr/local/apache/htdocs-hgdownload/goldenPath/$db/bigZips

Run the following in the directory:

for file in *; do zcat $file | head; zcat $file | tail; done

Scroll the output and make sure all the text is ASCII.
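
In addition to eyeballing the head and tail of each file, gzip can test each compressed file from start to end. This is a sketch of that extra check, not a replacement for reading the output above; the function name `checkGzipFiles` is made up for this example.

```shell
# Hypothetical helper: test the integrity of every .gz file in a directory.
# $1 = directory to check (defaults to the current directory).
# Prints one line per corrupt file and returns non-zero if any were found.
checkGzipFiles() {
    bad=0
    for f in "${1:-.}"/*.gz; do
        [ -e "$f" ] || continue          # no .gz files matched
        gzip -t "$f" 2>/dev/null || { echo "CORRUPT: $f"; bad=1; }
    done
    return $bad
}

# Example:
# checkGzipFiles /usr/local/apache/htdocs-hgdownload/goldenPath/$db/bigZips
```

No output means every .gz file decompressed cleanly. Files that are not gzip-compressed, such as the .2bit file, are not covered.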

Dev: database: check README

cat /usr/local/apache/htdocs-hgdownload/goldenPath/$db/database/README.txt
  • Verify that the README.txt exists
  • cat the file and read it, check the contents (such as urls listed, etc.)

Dev: liftOver: check md5sum

Change your directory to:

cd /usr/local/apache/htdocs-hgdownload/goldenPath/$db/liftOver

then run this command:

md5sum * | sort > ~/temp/filename_1; sort md5sum.txt > ~/temp/filename_2; diff ~/temp/filename_1 ~/temp/filename_2

Dev: liftOver: check README

/usr/local/apache/htdocs-hgdownload/goldenPath/$db/liftOver
  • Verify that the README.txt exists
  • cat the file and read it, check the contents (such as urls listed, etc.)

Dev: liftOver: corruption

Run the following in each directory and check the output:

for file in *; do zcat $file | head; zcat $file | tail; done

Scroll the output and make sure all the text is ASCII.

Dev: vsXXX: check md5sum

This section is only relevant if your assembly has chain/net files to another organism.

Note: there may be multiple organisms that your assembly has alignment files to, check them all.

Change your directory to:

cd /usr/local/apache/htdocs-hgdownload/goldenPath/$db/vsXXX

then run the following command:

md5sum * | sort > ~/temp/filename_1; sort md5sum.txt > ~/temp/filename_2; diff ~/temp/filename_1 ~/temp/filename_2

REPEAT this process for subdirectories:

  • reciprocalBest
  • reciprocalBest/axtRBestNet

Dev: vsXXX: check README

cd /usr/local/apache/htdocs-hgdownload/goldenPath/$db/vsMm10
cd /usr/local/apache/htdocs-hgdownload/goldenPath/$db/vsHg38


  • Verify that the README.txt exists
  • cat the file and read it, check the contents (such as urls listed, etc.)

Dev: vsXXX: corruption

Change your directory to the other assemblies' chains to your assembly:

cd /usr/local/apache/htdocs-hgdownload/goldenPath/hg38/vs${db^}
cd /usr/local/apache/htdocs-hgdownload/goldenPath/mm10/vs${db^}

Run the following in each directory and check the output:

for file in *; do zcat $file | head; zcat $file | tail; done

Scroll the output and make sure all the text is ASCII.

REPEAT this process for subdirectories:

  • reciprocalBest
  • reciprocalBest/axtRBestNet

Dev: for :queryDb:vsYourDb: check README

/usr/local/apache/htdocs-hgdownload/goldenPath/queryDb/vsYourDb

Check the README.txt files for any other organisms that your assembly has alignments (chain/net/liftover/etc) to:

  • Verify that the README.txt exists
  • cat the file and read it, check the contents (such as urls listed, etc.)

REPEAT this process for subdirectory:

  • reciprocalBest (this readme covers the subdir, axtRBestNet).

Dev: for :queryDb:vsYourDb: check md5sum

Note: there may be multiple organisms that your assembly has alignment files to, check them all.

Change your directory to:

hgwdev > /usr/local/apache/htdocs-hgdownload/goldenPath/queryDb/vsYourDb

then run the following command:

md5sum * | sort > ~/temp/filename_1; sort md5sum.txt > ~/temp/filename_2; diff ~/temp/filename_1 ~/temp/filename_2

REPEAT this process for subdirectories:

  • reciprocalBest
  • reciprocalBest/axtRBestNet

Dev: for :queryDb:vsYourDb: check corruption

/usr/local/apache/htdocs-hgdownload/goldenPath/queryDb/vsYourDb

Just do a zcat for otherOrg -> yourDb:

zcat $file | head

REPEAT this process for subdirectories:

  • reciprocalBest
  • reciprocalBest/axtRBestNet

Dev: md5sum check with 2bitCompare

2bitCompare $db

The .2bit files contain the new assembly sequence in a compact, binary format. The .2bit files are located at:

  • /scratch/$db (on the blat server)
  • /usr/local/apache/htdocs-hgdownload/goldenPath/$db/bigZips/ (on hgwdev)
  • /gbdb/$db/ (on hgwdev)
  • /gbdb/$db/ (on hgwbeta)

Check to make sure that the .2bit files are identical by running the 2bitCompare script. In particular, if the assembly has been part of a multiz track without a Browser, the file may already exist on beta and the RR and may not have been masked.

Below is some sample output:

hgwdev> 2bitCompare allMis1

  Checking md5sums.  This could take a few minutes.  Please be patient...

        blat4a md5sum: 134e740c05eedadc24de3a96775a25d6 /scratch/allMis1/allMis1.2bit
      download md5sum: 134e740c05eedadc24de3a96775a25d6 /usr/local/apache/htdocs-hgdownload/goldenPath/allMis1/bigZips/allMis1.2bit
   hgwdev gbdb md5sum: 134e740c05eedadc24de3a96775a25d6 /gbdb/allMis1/allMis1.2bit
  hgwbeta gbdb md5sum: 134e740c05eedadc24de3a96775a25d6 /gbdb/allMis1/allMis1.2bit

        blat4a date,size: Jun 19 11:03 569794406
      download date,size: Jul 3 10:55 53
   hgwdev gbdb date,size: Jun 7 13:34 39
  hgwbeta gbdb date,size: Jun 7 13:33 569794406

The first part of the script output lists the md5sums of all four .2bit files. These should be identical.

The second part of the script output lists the timestamps and filesizes.

  • The download and hgwdev gbdb files should be symlinks, as evidenced by a small filesize.
  • The blat and hgwbeta gbdb files should be the actual files, as evidenced by a large filesize.
  • The two symlink filesizes will likely be different, but the filesize of the two actual files should be identical.

If the blat .2bit is not the same as the other .2bit files, ask the pushers to restart the assembly and to pull the newest .2bit file from /gbdb.

Note: If no gbdb data has been copied to beta yet (that is part of the data push process), the script may instead report:

 hgwbeta/rr gbdb md5sum: The $db directory does not exist in /gbdb on hgwbeta
 hgwbeta/rr gbdb date,size: N/A

Dev: Permissions check: downloads dir

The developer may need to update permissions to the download directory to be at least 664.

hgwdev > /usr/local/apache/htdocs-hgdownload/goldenPath/$db/

ls -lLR *
-rw-rw-r--

This output can be thousands of lines if you have lots of alignments. To shorten it, you can filter out the lines that match that permission, the vs label, the total bytes line, the directory permissions, and blank lines. If anything with more restrictive permissions remains, investigate thoroughly.

ls -lLR * | grep -ve '-rw-rw-r--\|vs\|total\|drwxrwxr-x\|^$'
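
Another way to get at the same information is to ask find for files that are missing any of the 664 permission bits. This is a sketch rather than the established procedure; the function name `tooRestrictive` is made up for this example.

```shell
# Hypothetical helper: list regular files that are missing any of the
# permission bits in 664 (rw-rw-r--). Symlinks are followed, like ls -L.
# $1 = directory to check (defaults to the current directory).
tooRestrictive() {
    find -L "${1:-.}" -type f ! -perm -664
}

# Example:
# tooRestrictive /usr/local/apache/htdocs-hgdownload/goldenPath/$db
```

No output means every file has at least 664 permissions. Directory permissions are not checked by this sketch.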

Dev: Ensure your dbs is defined in trackDb makefile

cat ~/kent/src/hg/makeDb/trackDb/makefile | grep $db


🔵 Done with DEV steps? Go to Assembly QA Part 2: Track Steps