ENCODE QA

From Genecats
==Getting Started==
===Claim track (pushQ & redmine)===
# Select the top-most track from the [http://hgwbeta.cse.ucsc.edu/cgi-bin/qaPushQ?org=encodePushQ&month=current encodePushQ], which is a sub-pushQ accessed from the pushQ Gateway.
# Claim the track's [http://redmine.soe.ucsc.edu/ Redmine] Issue & change its status from "Approved" to "Reviewing"
#* Add yourself as a watcher to the redmine ticket (so if you assign it back to the developer you will get updates)
#* Make a question in the redmine ticket for Kate:
#*# let her know you claimed the track
#*# ask her to make a determination about the composite and subtrack labels (may not be necessary for subsequent releases)
#* Make a question for the wrangler asking them to update the ENCODE status to "Reviewing"
#* Use the % Done on the Redmine Issue to estimate your QA progress. Kate uses this to check status.
===Review the "notes" file===
* Check out the developer's [[#Notes file]] to get a feel for what the track consists of.
** the path to the notes file will be in the "Notes" section of the pushQ entry
** trust the notes file over the pushQ entry table/files information
* If this is a subsequent release, see [[#Subsequent Release of Data (e.g. Release 2)]] first.
===Create table list===
* Copy the list of tables (new tables and, if applicable, updated tables) from the "notes" file to a new file (for running [[#Run qaEncodeTracks.csh & check output|qaEncodeTracks]] and [[#Staging on hgwbeta|staging on hgwbeta]])


==Run qaEncodeTracks.csh & check output==
Run qaEncodeTracks.csh on the table list you created [[#Create table list|above]], which does:
* countPerChrom
* check for entry in tableDescriptions table
* check that positional tables are sorted
* checkTableCoords (checks for any illegal coordinates)
Also, run genePredCheck or pslCheck if applicable (i.e. if your track is a gene prediction or PSL alignment track).
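The sorting and coordinate checks listed above can be pictured with the minimal Python sketch below. It is not qaEncodeTracks.csh or checkTableCoords, and the function names are hypothetical. It assumes the rows have already been exported from the table as (chrom, chromStart, chromEnd) tuples with 0-based, half-open coordinates, that "sorted" means each chrom forms one contiguous block with ascending chromStart, and that chromosome sizes are available (e.g. from a chrom.sizes file).

```python
from typing import Dict, Iterable, List, Tuple

# (chrom, chromStart, chromEnd) with 0-based, half-open coordinates
Row = Tuple[str, int, int]


def check_sorted(rows: Iterable[Row]) -> List[str]:
    """Flag rows whose chrom is split into several blocks or whose chromStart goes backwards."""
    problems: List[str] = []
    seen_chroms = set()
    prev_chrom, prev_start = None, 0
    for i, (chrom, start, _end) in enumerate(rows):
        if chrom != prev_chrom:
            if chrom in seen_chroms:
                problems.append(f"row {i}: {chrom} appears in more than one block")
            seen_chroms.add(chrom)
        elif start < prev_start:
            problems.append(f"row {i}: {chrom}:{start} comes after {prev_start}")
        prev_chrom, prev_start = chrom, start
    return problems


def check_coords(rows: Iterable[Row], chrom_sizes: Dict[str, int]) -> List[str]:
    """Flag rows on unknown chroms or with start/end outside 0..chromSize."""
    problems: List[str] = []
    for i, (chrom, start, end) in enumerate(rows):
        size = chrom_sizes.get(chrom)
        if size is None:
            problems.append(f"row {i}: unknown chrom {chrom}")
        elif not 0 <= start <= end <= size:
            problems.append(f"row {i}: illegal coordinates {chrom}:{start}-{end}")
    return problems
```

An empty result from both functions is not a substitute for the real scripts; the sketch only shows what kind of problem each check is looking for.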


==Staging on hgwbeta==
===Push /gbdb files===
Push new and, if applicable, updated /gbdb files (e.g. .wib, .bb, etc.) from hgwdev -> hgnfs1.
===Push tables to mysqlbeta===
Use bigPush.csh with the table list you created [[#Create table list|above]].
* If this is a subsequent release (an update to an existing track), you may want to hold off on this push so that you can compare the old and new tracks on hgwdev and hgwbeta. Open the track on hgwbeta before staging it to make sure that the update won't cause a cart clash for users currently looking at the track (as evidenced by a completely blank screen, for instance). If you need to do a cartReset to get the track to show up correctly, something is wrong.
===Prepare trackDb (release tags and metaDb)===
# '''release tags''': see the [http://genomewiki.ucsc.edu/index.php/ThreeStateTrackDb Three State TrackDb] page for more info on release tags and our three-state trackDb
#* In /cluster/home/$usr/trackDb/$species/$db/trackDb.wgEncode.ra, find the include statement for your track's .ra file and change the 'alpha' tag to an 'alpha,beta' tag and, if applicable (releaseN), change 'beta,public' to 'public'.<br />If a new track is in a super-track, make sure there are release tags! See [http://genomewiki.ucsc.edu/index.php/ThreeStateTrackDb#Existing_super-tracks_MUST_use_release_tags explanation].
# '''metaDb''': starting from /cluster/home/$usr/trackDb/$species/$db/metaDb
## copy the metaDb .ra file from ~metaDb/alpha -> metaDb/beta
## add the .ra file name to the makefile in ~metaDb/beta
# commit changes
# On hgwbeta, make beta DBS=$db from /cluster/home/$usr/trackDb/


==hgTrackUi==
===Functionality (track controls)===
====Display Modes====
* by default, composite overall display mode should be set to dense (super-track, if applicable, should be set to hide)
* changing display mode of views should affect the subtrack list & hgTracks
====Config Settings of Views====
* settings function correctly
* settings of different views are independent
* Signals, by default, should have the following settings (unless lab has requested otherwise or other good reason):
** Data view scaling: use vertical viewing range (rather than auto-scale)
*** in dense, default fixed range should result in meaningful banding at full chromosome (not all gray)
** Windowing function: mean + whiskers
====Matrix====
* By default, matrix boxes should be fully checked or fully unchecked (not grayed); if not, this is a trackDb setting issue that the wrangler should fix.
* Matrix headers:
** For human, Tier 1 and Tier 2 cell lines:
*** should be listed first (Tier 1 in alphabetical order, followed by Tier 2 in alphabetical order)
*** should be labeled as Tier 1 or 2 with the tier following the cell line in parentheses, bold, no hyperlink, no italics, e.g. '''<u>cellLineA</u>''' '''(Tier 1)'''
** +/- buttons function correctly
** selections in matrix result in appropriate selection changes in subtrack list
====Subtrack list====
* adjusts according to matrix & view (hide -> non-hide) selections
* 'only selected/visible' and 'all' radio selections function
* sorting functions (clicking on column headings)
* schema links work
====MetaData====
* make sure metaData is present by clicking on ellipsis (...)
* check a few to make sure they have somewhat consistent fields
* spot check a few fields to make sure they make sense
====Links====
* check that all links work, and where applicable, are [[#Relative Links|relative]]
===Content (.html description page)===
====Labels====
* Check labels adhere to Kate's instructions
** Other resources: the [http://encodewiki.ucsc.edu/EncodeDCC/index.php/ENCODE_track_settings_style_guide Style Guide] and the label spreadsheets in the SOE Google Docs.
====Sections====
* Make sure all sections are present
* Check grammar, spelling, readability, completeness, correctness
=====Description=====
* Brief overall summary of the track.
=====Display Conventions and Configuration=====
* Contains info about each view in track
* No description is needed for views that are only available as downloads
* link to [http://hgwdev.cse.ucsc.edu/goldenPath/help/multiView.html multi-view instructions]
* Tracks with Bam alignments should have a link to the [http://samtools.sourceforge.net/SAM1.pdf Sam Format Specification] and should explain any non-standard tags (those starting with X, Y, or Z, or not listed in the [http://samtools.sourceforge.net/SAM1.pdf#page=6 tag section])
=====Methods=====
* Make sure it is detailed enough.
* style should be consistent with the rest of the site
** references to "data" are plural
** values and units have a space between them (e.g. 50 bp rather than 50bp)
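To spot check the two style rules above in a long Methods section, a rough Python sketch like the following can flag candidates. The unit list and the singular "data" phrasings are assumptions chosen for illustration, and the function name is hypothetical; it only finds suspects, and every hit still needs a human read.

```python
import re
from typing import List

# Assumed list of units that should be separated from their value by a space.
UNITS = r"(?:bp|kb|Mb|nt|ng|ug|ul|ml|mM|uM|nM|min|h)"
GLUED_UNIT = re.compile(r"\b\d+(?:\.\d+)?" + UNITS + r"\b")
# Assumed phrasings that treat "data" as singular.
SINGULAR_DATA = re.compile(r"\bdata (?:is|was|has)\b", re.IGNORECASE)


def style_problems(text: str) -> List[str]:
    """Return candidate style issues in description text: glued units first, then singular 'data'."""
    problems = [f"missing space in '{m.group(0)}'" for m in GLUED_UNIT.finditer(text)]
    problems += [f"singular 'data' in '{m.group(0)}'" for m in SINGULAR_DATA.finditer(text)]
    return problems
```

For example, `style_problems(open("wgEncodeYourTrack.html").read())` on a description page lists each suspicious spot in the order found.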
=====Release Notes=====
* Optional for first release
* Required for subsequent releases
* Should start with "Release # (Month Year) of this track...."
* Provides a description of the changes of this particular release.
=====Credits=====
* Must have contact person
* Name is a hyperlink to email
* Email must be sanitized (using encodeEmail.pl script)
=====References=====
* Correct format, see [[CBSE citation format]]
* Alphabetical order
=====Data Release Policy=====
*Standard language:
**Data users may freely use ENCODE data, but may not, without prior consent, submit publications that use an unpublished ENCODE dataset until nine months following the release of the dataset. This date is listed in the Restricted Until column on the track configuration page and the download page. The full data release policy for ENCODE is available [http://hgwdev.cse.ucsc.edu/ENCODE/terms.html here].
* make sure "here" is a link to the [http://hgwdev.cse.ucsc.edu/ENCODE/terms.html data release policy]
====Links====
*Check standard links are present, and, where applicable, are [[#Relative Links|relative]]:
** ENCODE Data policy (in Data Release Policy section)
** help for multi-view (Next to "Select views" in track Control Section in Display Conventions and Configuration section)
** contact email (see [[#Credits]] for more info)
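The "relative" part of the links checks can be roughed out in Python: the sketch below lists absolute links to UCSC browser hosts in a description page so you can decide which ones should be relative. The host list and function names are assumptions for illustration, and the sketch does not test whether any link actually works; clicking through is still needed for that.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple
from urllib.parse import urlparse

# Assumed set of browser hosts whose links should usually be relative.
INTERNAL_HOSTS = {"genome.ucsc.edu", "hgwdev.cse.ucsc.edu", "hgwbeta.cse.ucsc.edu"}


class LinkCollector(HTMLParser):
    """Collect href values from <a> tags (HTMLParser lowercases tag and attribute names)."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: List[str] = []

    def handle_starttag(self, tag: str, attrs: List[Tuple[str, Optional[str]]]) -> None:
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def absolute_internal_links(html: str) -> List[str]:
    """Return hrefs that point at a browser host with an absolute URL."""
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    return [h for h in collector.hrefs if urlparse(h).hostname in INTERNAL_HOSTS]
```

Links to outside sites (such as the SAM specification) and mailto: links are deliberately left out of the result.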


==hgc details==
===Accuracy of details===
* details that are displayed correspond with the record in the table
===Makes sense===
* table values seem correct
===Useful===
* you understand what is being displayed
* internal, non-functioning fields are not displayed (e.g. if all values in a field have "-1" as a placeholder, we shouldn't display that field)
===Complete===
* all useful information is present (there's nothing important that is missing)
===Clear===
* details are presented and labeled clearly
* layout is user friendly
===Links===
* check that all links are relevant, work, and where applicable, are [[#Relative Links|relative]]


==hgTracks==
===Display===
====Views (zoomed in/out)====
* check the display of all Views in all display modes when zoomed in to the base pair level & zoomed out to 1 million bp
====Table coordinates + features====
* an item's coordinates and other display features (exons, etc.) display correctly based on the table
** a line from the table to compare against the display can be obtained from the schema page or the mysql db for regular tables
** for bam files, the schema will only give you a filePath. You will need to use SamTools to obtain a point to test.
**# add /hive/data/outside/samtools/svn_${MACHTYPE}/samtools to $PATH in your .bashrc
**# run samtools on the command line using the fileName found in the schema (see the following example). The output gives you the start position (4th column) and, later in the line, the read itself; add the read length to the start position to get the end position. This gives you a point to put into the browser for testing purposes.
  samtools view -x filePath chrx:xxxx-xxxxxx | head
  samtools view -x /data/NT/gbdb/hg18/neandertal/seqAlis/Feld1-hg18.sorted.bam chr1:2000000-3000000 | head
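The "start position plus read length" arithmetic above can be sketched in Python for a single line of samtools view output. This is a hypothetical helper, not part of samtools. It reads RNAME, POS, and CIGAR (columns 3, 4, and 6 of a SAM line) and counts only the CIGAR operations that consume reference bases, so spliced reads (N) and deletions (D) get their full span. Because POS is 1-based and the end here is inclusive, the result can differ by one base from simply adding the read length, which does not matter when picking a test point.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")  # CIGAR operations that advance along the reference


def browser_position(sam_line: str) -> str:
    """Convert one mapped `samtools view` output line into a chrom:start-end position."""
    fields = sam_line.rstrip("\n").split("\t")
    chrom, start, cigar = fields[2], int(fields[3]), fields[5]  # RNAME, POS (1-based), CIGAR
    if cigar == "*":
        raise ValueError("unmapped read: no CIGAR to measure")
    ref_len = sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op in REF_CONSUMING)
    end = start + ref_len - 1  # 1-based, inclusive end of the alignment
    return f"{chrom}:{start}-{end}"
```

Piping the first line of the samtools example into this function gives a position string that can be pasted straight into the browser's position box.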
:* for big* files, you can't get an individual record, but you can use bigWigInfo or bigBedInfo to get general stats; be sure bigWigs are version 4.
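If bigWigInfo is not at hand, the version check can be sketched directly from the file header. The sketch assumes the published bigWig/bigBed layout: a 4-byte magic number (0x888FFC26 for bigWig, 0x8789F2EB for bigBed) followed by a 2-byte version, written in either byte order. bigWigInfo and bigBedInfo remain the tools for general stats; the function names here are hypothetical.

```python
import struct
from typing import Tuple

BIGWIG_MAGIC = 0x888FFC26
BIGBED_MAGIC = 0x8789F2EB


def parse_bbi_header(header: bytes) -> Tuple[str, int]:
    """Return ("bigWig" or "bigBed", version) from the first 6 bytes of a file."""
    if len(header) < 6:
        raise ValueError("too short to be a bigWig/bigBed header")
    for order in ("<", ">"):  # files may be written little- or big-endian
        magic, version = struct.unpack(order + "IH", header[:6])
        if magic == BIGWIG_MAGIC:
            return "bigWig", version
        if magic == BIGBED_MAGIC:
            return "bigBed", version
    raise ValueError("not a bigWig or bigBed file")


def file_kind_and_version(path: str) -> Tuple[str, int]:
    """Read just the header of a .bw/.bb file and report its kind and version."""
    with open(path, "rb") as f:
        return parse_bbi_header(f.read(6))
```

Looping `file_kind_and_version` over the track's /gbdb .bw files and flagging any bigWig with a version below 4 covers the "make sure they are version 4" step without opening each file by hand.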
====Searchable====
* Are items searchable; should they be? Most likely not for ENCODE.
====Colors====
* For human, Tier 1 and Tier 2 cell lines should be displayed in a unique color (other than black)
* it is OK if other tracks are in color, but not necessary
====Defaults (composite/subtracks)====
* should this composite track be on by default? (For ENCODE, usually no)
* check which subtracks are selected by default, and make sure:
** there aren't too many
** important cell lines are on by default
* default Tier 1 and Tier 2 subtracks should display first
====Compare to hg18====
* If track is in hg19, compare a point on the hg19 browser of the track to the equivalent position in hg18.
*# use "convert" from hg19 position to see the equivalent position in hg18.
*# go back to your region in hg19, open new window and paste in hg18 equivalent position and compare hg19 to hg18.
* Note: Comparisons to hg18 should be very cursory. Note any differences in the redmine ticket, but don't necessarily investigate them unless a user also brings up an issue. The thinking behind this is that when there are differences, it is most likely an error with hg18, not hg19, and we don't want to hold up hg19 unnecessarily.
===Performance===
====Chrom 1 Test (signal & experiment)====
When position is set to all of chromosome 1, data of interest loads in less than 1 minute:
* '''signal''': check the time it takes to load the first signal subtrack
* '''experiment''': check the time it takes to load all views for one experiment (e.g. Pol2 in GM12878 cells)
====Defaults at Gene Sized Region Test====
Set position to a gene-sized region with your track's default subtracks on and the default browser tracks on (the easiest way is to reset the cart, then turn on the track)
* should display quickly and not be "too much" data
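To make the chr1 timing less of a stopwatch exercise, a Python sketch along these lines can build an hgTracks URL with the position set to chr1 and the track set to full, then time the server's response. It assumes the standard hgTracks URL parameters (db, position, and trackName=visibility); the helper names, server, and track name are placeholders. It measures only how long the CGI takes to return the page, not how long the browser takes to render it, so treat it as a rough check alongside looking at the page yourself.

```python
import time
from typing import Callable
from urllib.parse import urlencode
from urllib.request import urlopen


def hgtracks_url(server: str, db: str, track: str, position: str = "chr1", vis: str = "full") -> str:
    """Build an hgTracks URL that shows `track` at visibility `vis` over `position`."""
    query = urlencode({"db": db, "position": position, track: vis})
    return f"{server}/cgi-bin/hgTracks?{query}"


def _fetch(url: str) -> bytes:
    with urlopen(url, timeout=120) as resp:  # allow well past the 1-minute target
        return resp.read()


def seconds_to_load(url: str, fetch: Callable[[str], bytes] = _fetch) -> float:
    """Wall-clock seconds for one request to `url`."""
    start = time.monotonic()
    fetch(url)
    return time.monotonic() - start
```

For example, `seconds_to_load(hgtracks_url("http://hgwbeta.cse.ucsc.edu", "hg19", "wgEncodeYourTrack"))` should come in well under 60 for the chr1 test.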


==Data make sense==
===Compare subtracks within views===
* Do all the subtracks within a view somewhat correlate?
===Compare subtracks of related views===
* For example:
** Does the All Signal Raw Signal subtrack of an experiment really seem to comprise the data in the Plus Raw Signal & Minus Raw Signal subtracks?
** Do Peaks really represent the high Signal areas of the Signal View subtracks?
===Do the data make sense Biologically?===
*Turn on other tracks to compare.
** compare to the gene tracks
** compare to subtracks of similar tracks
** For example:
*** RNA-seq data should correlate with the exons in a genes track
*** TFBS tracks should correlate with the beginning of gene transcripts
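The subtrack comparisons above are judged by eye in the browser, but when one looks borderline a quick numeric check can help. This Python sketch assumes you already have the signal values for the subtracks over the same bins (however you export them); the function names and the 5% tolerance are arbitrary choices for illustration. If a Minus Raw Signal subtrack stores negative values, pass their absolute values as `minus`.

```python
from math import sqrt
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length, same-binned signal series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length series with at least 2 bins")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / sqrt(sxx * syy)


def all_matches_plus_minus(all_sig: Sequence[float], plus: Sequence[float],
                           minus: Sequence[float], rel_tol: float = 0.05) -> bool:
    """True if every bin of the All signal is within rel_tol of Plus + Minus."""
    if not len(all_sig) == len(plus) == len(minus):
        raise ValueError("series must cover the same bins")
    return all(
        abs(a - (p + m)) <= rel_tol * max(abs(a), abs(p + m), 1e-9)
        for a, p, m in zip(all_sig, plus, minus)
    )
```

A low correlation between replicates, or an All signal that does not track Plus + Minus, is worth a note in the redmine ticket rather than a verdict on its own.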


==Files==
===hgFileUi===
* 'Downloads' links on hgTrackUi should now go to hgFileUi
** if not, ask the wrangler to add "sortOrder" information to the trackDb entry
====file count====
* Check # of files displayed is correct (use "notes" file)
====download button====
* Make sure download button prompts a download (and doesn't take you to an error page)
====sort columns====
* Check the sorting of columns
====links====
* Check that the "Track Settings" link takes you back to the track's hgTrackUi page
===index===
* since there is no longer a link, go to <nowiki>http://hgdownload-test.cse.ucsc.edu/goldenPath/$db/encodeDCC/<trackName>/</nowiki>:
* index page in downloads dir is created from two files, preamble.html & index.html:
====preamble.html====
the top description part of the index page (not the list of files), should have:
* introductory paragraph with a very brief description
* releaseN: should also include the release number at end of description (e.g. "This is Release 2 of this track. Release notes are included in the track description.")
* if not complete, here are the directions to edit (or just ask wrangler to do it!):
*# edit the preamble.html file (note that it is not in git) in the downloads directory (/usr/local/apache/htdocs-hgdownload/goldenPath/$db/encodeDCC/) on hgwdev
*# regenerate index.html by running the script: encodeDownloadsPage.pl index.html (I think the preamble.html needs to be in the dir where you run the script)
*links (present and function):
** trackUi
** files.txt
** md5sum.txt files
====index.html====
list of files on the index page (not the description on top)
* check file count (should name each file being released, and only the name of those files in this release) against file count in the "notes" file
** if the list count isn't right (or doesn't seem right otherwise), ask wrangler
** or, run the encodeDownloadsPage.pl script (at the prompt type: encodeDownloadsPage.pl index.html) directly in the /releaseN directory (hgwdev: /kent/src/hg/htdocs/goldenPath/<db>/encodeDCC/<trackName>/releaseN) to generate a new index.html page that you know contains all the files in that directory. Then, copy the /releaseN/index.html to the main track directory (hgwdev: /kent/src/hg/htdocs/goldenPath/<db>/encodeDCC/<trackName>/) as that is the index.html used on the site. Don't forget to commit your changes.
*executable: make sure this file is executable (because of the way the links are created)
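Part of the downloads-directory checking can be scripted. The Python sketch below, with hypothetical names, verifies each entry in md5sum.txt against the files in a directory and flags an index.html that is not executable. It assumes md5sum.txt uses the usual md5sum output format ("<checksum>  <file name>") and checks only the owner execute bit; it does not check the file count or the preamble, and on a large track it can take a while because every file is read.

```python
import hashlib
import os
import stat
from typing import Dict, List


def parse_md5sum(text: str) -> Dict[str, str]:
    """Map file name -> expected checksum from md5sum-style lines ("<checksum>  <name>")."""
    expected: Dict[str, str] = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        digest, name = line.split(None, 1)
        expected[name.strip().lstrip("*")] = digest.lower()  # "*" marks binary mode
    return expected


def md5_of(path: str, chunk_size: int = 1 << 20) -> str:
    """MD5 of a file, read in chunks so large BAMs don't have to fit in memory."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk_size), b""):
            h.update(block)
    return h.hexdigest()


def check_download_dir(directory: str) -> List[str]:
    """List missing files, checksum mismatches, and a non-executable index.html."""
    problems: List[str] = []
    with open(os.path.join(directory, "md5sum.txt")) as f:
        expected = parse_md5sum(f.read())
    for name, digest in sorted(expected.items()):
        path = os.path.join(directory, name)
        if not os.path.isfile(path):
            problems.append(f"missing: {name}")
        elif md5_of(path) != digest:
            problems.append(f"checksum mismatch: {name}")
    index = os.path.join(directory, "index.html")
    if os.path.isfile(index) and not os.stat(index).st_mode & stat.S_IXUSR:
        problems.append("index.html is not executable")
    return problems
```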
 
==Release to RR==
Note: Cc the data wrangler for this track on all your pushes, and Cc encode@soe.ucsc.edu on your final push.
===Release log===
* Release log field in PushQ:
** should be the long label (or short if too long) and, if releaseN, release number in parentheses
** must contain ENCODE (or it won't show up on the ENCODE downloads page)
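The two release log rules above are easy to check mechanically. This Python sketch is hypothetical, and the exact "(Release N)" wording it looks for is an assumption based on the conventions on this page; adjust the pattern to match existing release log entries.

```python
import re
from typing import List, Optional


def release_log_problems(entry: str, release: Optional[int] = None) -> List[str]:
    """Check a pushQ release log entry against the rules above (hypothetical helper)."""
    problems: List[str] = []
    if "ENCODE" not in entry:
        problems.append("missing 'ENCODE'; the track won't show up on the ENCODE downloads page")
    needs_number = release is not None and release > 1
    if needs_number and not re.search(rf"\(\s*release\s*{release}\s*\)", entry, re.IGNORECASE):
        problems.append(f"missing release number in parentheses, e.g. '(Release {release})'")
    return problems
```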
===Push download files===
* Push download files, index.html, files.txt and md5sum.txt (from hgwdev to hgdownload)
** If this is a releaseN, even though there is a releaseN directory on hgwdev, do not create one on hgdownload (see [[#Download files]] for specifics)
* Note, this push can take hours
===Prepare trackDb (release tags and metaDb)===
# '''release tags''', in trackDb.wgEncode.ra:
## remove 'alpha,beta' (no release tag necessary)
## if releaseN, then: (see [http://genomewiki.ucsc.edu/index.php/ThreeStateTrackDb Three State TrackDb] page for more info)
### copy <trackName>.new.ra over <trackName>.ra
### delete <trackName>.new.ra public line
### copy <trackName>.new.html over <trackName>.html
### open <trackName>.ra and remove html line (pointed to <trackName>.new.html)
#'''metaDb''', from /cluster/home/$usr/trackDb/$species/$db/metaDb:
## double check alpha vs. beta to make sure you have the most up-to-date metadata (diff beta/<trackName>.ra alpha/<trackName>.ra)
##* if the diffs are due to the next release, don't copy to beta; if they are for the current release, copy to beta & double check in hgTrackUi, etc.
## copy the metaDb <trackName>.ra file from ~metaDb/beta -> metaDb/public
## add the <trackName>.ra file name to the makefile in ~metaDb/public
# commit/push changes
# On hgwbeta, make public DBS=$db from /cluster/home/$usr/trackDb/
#* announce it to browser-qa
===Check on public===
# If this is a subsequent release, check track on [http://hgwbeta-public.cse.ucsc.edu hgwbeta-public]
# Run comparePublic.csh to check differences between trackDb_public and RR and hgwbeta.
===Push tables===
*Push track tables from mysqlbeta -> mysqlrr (not trackDb_public yet)
===Drop tables (if necessary)===
# Drop tables from hgwbeta that need to be removed (being replaced by V# tables)
# Drop tables being removed from the RR
===Drop /gbdb files (if necessary)===
* Drop /gbdb files from hgnfs1
===Push trackDb+friends===
* Push trackDb+friends and tableDescriptions from mysqlbeta -> mysqlrr
** cc wrangler and encode@soe.ucsc.edu
===PushQ & check on RR===
# click "push requested" in the pushQ record
# once all pushes complete, check track on RR
# click "Done!" on pushQ record
# Transfer the pushQ entry from the L queue of the encodePushQ to the Main pushQ.
===ENCODE downloads===
* check the ENCODE downloads page ([http://hgwdev.cse.ucsc.edu/ENCODE/downloads.html human] | [http://hgwdev.cse.ucsc.edu/ENCODE/downloadsMouse.html mouse]); if your track isn't there, add it:
** edit /kent/src/hg/htdocs/ENCODE/downloads.html to add a line for your track and, if necessary, its super-track
*** super-track title should be a non-underlined link to the super-track hgTrackUi, for example:
<A style="text-decoration:none" HREF="http://genome.ucsc.edu/cgi-bin/hgTrackUi?db=hg19&g=wgEncodeRnaSeqSuper" TARGET=_blank>RNA-seq</A>
:* push the following from hgwbeta -> RR:
/usr/local/apache/htdocs-hgdownload/ENCODE/downloads.html


==Other info==
A note about finding the files: one of the most time-consuming things we do is tracking down items that should have been placed in the "Files" section of the pushQ entries but weren't. It takes us a long time to (a) figure out what's missing, and (b) find it. If developers can ensure that both the /gbdb and /goldenPath files are listed there, it would be a huge help!
===Subsequent Release of Data (e.g. Release 2)===
 
Periodically, released ENCODE tracks will be augmented with new data as labs complete experiments on new cell lines, etc. The new data will come in various formats: some will replace existing data, some will be brand new, some old data will be eliminated, etc.
 
====Notes file====
The data wrangler will create a notes file using the encodeMkChangeNotes script, check it into git, and place it here: kent/src/hg/makeDb/doc/encodeDcc$db/*.txt
 
This document should contain complete lists of each table and file and what its disposition is. The tables and files will fall into categories similar to this:
* A) Untouched - are on public browser and should remain
* B) Deprecated - are currently on RR but will no longer be needed and should not be referenced by the public site.
:: NOTE: NO FILES SHOULD BE REMOVED from the downloads directory on hgdownloads (RR).
::This list is provided for completeness. Any files marked here as in gbdb may be eliminated.
* C) New - are only currently on test but will need to be pushed to the RR.
* D) Additional items of note
 
This document may not match reality. Some of the tables/files may not exist at all, the names may be incorrect, they may not be present on the machine listed in the file, or they may not match the list in the pushQ. The first challenge in QAing a subsequent ENCODE release is to determine if/how the notes file diverges from reality. To do this, compare the file to the "snapshot" of what is included in the release (and what you should QA), which can be found in the "release2" list in the downloads directory (hgwdev: /data/apache/htdocs/goldenPath/<db>/<trackName>/release<x>/). If the file differs from the downloads directory, send that information to the data wrangler and pop the track into the B-queue while they sort it out. Otherwise, QA spends far too much time figuring out exactly what they are expected to QA.
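Once you have pulled the file names out of the notes file (by hand or with whatever parsing suits that particular file, since its layout is not fixed here), the comparison with the release directory is a pair of set differences. A minimal Python sketch with hypothetical names; expect index.html, files.txt, md5sum.txt, and preamble.html to show up as "only in release dir", and ignore them.

```python
import os
from typing import Dict, Iterable, List


def compare_file_lists(notes_files: Iterable[str], release_files: Iterable[str]) -> Dict[str, List[str]]:
    """Names listed in the notes file but not in the release directory, and vice versa."""
    notes, actual = set(notes_files), set(release_files)
    return {
        "only_in_notes": sorted(notes - actual),
        "only_in_release_dir": sorted(actual - notes),
    }


def release_dir_files(path: str) -> List[str]:
    """Regular files in a release<x> directory."""
    return sorted(n for n in os.listdir(path) if os.path.isfile(os.path.join(path, n)))
```

Anything in either list is what goes to the wrangler before the track moves forward.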
 
Once the list is finalized, proceed with the QA work as outlined above. Note the additional steps in the [[#Files]] section for how to handle the /releaseN directory.
 
====MetaDb changes====
There also may be some metadata changes to fix errors or add information such as the GEO Series and GEO Sample.
* Be sure to QA these metadata changes
* Check for related redmine issues (addition of GEO accessions should definitely have a related issue for the addition of this info)
* You can also do a diff to check for any other metadata changes. From kent/src/hg/makeDb/trackDb/<org>/<db>/metaDb, run:
 diff beta/wgEncodeYourTrack.ra alpha/wgEncodeYourTrack.ra
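To read the metaDb diff object by object rather than line by line, a Python sketch like this can help. It assumes the usual .ra layout (stanzas separated by blank lines, one "key value" pair per line, the first line naming the object); the function names and the field names in any example are hypothetical. The plain diff remains the reference; this only groups the changes per object so added GEO accessions and other edits are easy to match against their redmine issues.

```python
import re
from typing import Dict, List


def parse_ra(text: str) -> Dict[str, Dict[str, str]]:
    """Parse blank-line-separated .ra stanzas, keyed by the value on each stanza's first line."""
    stanzas: Dict[str, Dict[str, str]] = {}
    for block in re.split(r"\n\s*\n", text.strip()):
        pairs = []
        for line in block.splitlines():
            line = line.strip()
            if not line or line.startswith("#"):
                continue
            parts = line.split(None, 1)
            pairs.append((parts[0], parts[1] if len(parts) > 1 else ""))
        if pairs:
            stanzas[pairs[0][1]] = dict(pairs)
    return stanzas


def diff_ra(old_text: str, new_text: str) -> List[str]:
    """List stanzas removed, added, or with changed fields between two .ra files."""
    old, new = parse_ra(old_text), parse_ra(new_text)
    changes: List[str] = []
    for name in sorted(old.keys() | new.keys()):
        if name not in new:
            changes.append(f"- {name} removed")
        elif name not in old:
            changes.append(f"+ {name} added")
        else:
            for key in sorted(old[name].keys() | new[name].keys()):
                before, after = old[name].get(key), new[name].get(key)
                if before != after:
                    changes.append(f"~ {name} {key}: {before!r} -> {after!r}")
    return changes
```

Call it as `diff_ra(beta_text, alpha_text)` with the contents of beta/wgEncodeYourTrack.ra and alpha/wgEncodeYourTrack.ra, matching the direction of the diff command.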
# Check the hgFileUi page (link to Downloads on hgTrackUi should now go there for new tracks).
#* Check # of files is correct
#* Make sure download button prompts a download (and doesn't take you to an error page)
#* Check the sorting of columns
#* Check the "Track Settings" link takes you back to the track's hgTrackUi page
# Check the hgdownload dir, since there is no longer a link, go to http://hgdownload-test.cse.ucsc.edu/goldenPath/<db>/encodeDCC/<trackName>/:
##Check the '''Index page''' (e.g. [http://hgdownload-test.cse.ucsc.edu/goldenPath/hg18/encodeDCC/wgEncodeYaleChIPseq this page ] ), which created from two files, preamble.html & index.html) in the downloads directory:
##*''preamble.html'' - the top description part of the index page (not the list of files)
##**an introductory paragraph with a brief description, and link to the trackUi.
##**should include a link to the files.txt and md5sum.txt files
##**releaseN: should also include the release number at end of description (e.g. "This is Release 2 of this track. Release notes are included in the track description.")
##**If you need to edit the preamble.html (not the list of files), follow these steps:
##**#Edit the preamble.html file (note that it is not in git) in the downloads directory (/usr/local/apache/htdocs-hgdownload/goldenPath/$db/encodeDCC/) on hgwdev
##**#Regenerate index.html by running the script:  encodeDownloadsPage.pl index.html (I think the preamble.html needs to be in the dir where you run the script)
##**#Look at results on genome-test.  If necessary, go back to step 1.
##*''index.html'' - the list of files on the index page (not the description on top)
##**should contain the name of each file being released (and only the name of those files in this release). A good way to spot check this is to make sure the number of files at the bottom of the list is correct.
##***new track: use the PushQ file list (there shouldn't be files with V2, V3, etc.) and your best judgment to determine that all the appropriate files (and only those files) are listed.
##***releaseN: make sure the right # of files are there. Check some of the removed files to make sure they were in fact removed from the list. If the list doesn't seem right, run the encodeDownloadsPage.pl script (at the prompt type: encodeDownloadsPage.pl index.html) directly in the /releaseN directory (hgwdev: /kent/src/hg/htdocs/goldenPath/<db>/encodeDCC/<trackName>/releaseN) to generate a new index.html page that you know contains all the files in that directory. Then, copy the /releaseN/index.html to the main track directory (hgwdev: /kent/src/hg/htdocs/goldenPath/<db>/encodeDCC/<trackName>/) as that is the index.html used on the site. Don't forget to commit your changes.
##**make sure this file is executable (because of the way the links are created)
##Downloads directory also needs to contain the following:
##*files.txt - plaintext version of index.html; lists files with metadata
##*md5sum.txt - checksum of all files in download directory
#When you are ready to release, make sure your track is listed on the [http://genome.ucsc.edu/ENCODE/downloads.html downloads page]. If it isn't listed, edit /kent/src/hg/htdocs/ENCODE/downloads.html to add a line for your track, and push the following from hgwdev -> hgwbeta, RR:
  /usr/local/apache/htdocs-hgdownload/ENCODE/downloads.html
Also, if the new track is part of a super-track, when you add the super-track category, please make the title a non-underlined link to the super-track page, for example:
  <A style="text-decoration:none" HREF="http://genome.ucsc.edu/cgi-bin/hgTrackUi?db=hg19&g=wgEncodeRnaSeqSuper" TARGET=_blank>RNA-seq</A>
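Since md5sum.txt is meant to hold a checksum for every file in the downloads directory, a quick script can confirm it agrees with what is actually on disk. This is a minimal sketch, assuming md5sum.txt uses standard `md5sum` output (`<hex digest>  <file name>`); the function names and the set of non-data files skipped are illustrative choices, not an existing QA tool.

```python
import hashlib
from pathlib import Path

# Index/metadata files that live in the downloads directory but are not data files.
NON_DATA_FILES = {"md5sum.txt", "files.txt", "index.html", "preamble.html"}


def read_md5sum(md5_path):
    """Parse md5sum.txt, assuming standard md5sum output: '<hex digest>  <file name>'."""
    expected = {}
    for line in Path(md5_path).read_text().splitlines():
        if not line.strip():
            continue
        digest, name = line.split(None, 1)
        # md5sum marks binary-mode entries with a leading '*' on the name.
        expected[name.strip().lstrip("*")] = digest.lower()
    return expected


def file_md5(path, chunk_size=1 << 20):
    """MD5 of a file, read in chunks so large BAM/bigWig files don't fill memory."""
    h = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def check_download_dir(download_dir):
    """Return (missing, mismatched, unlisted) file names for one downloads directory."""
    d = Path(download_dir)
    expected = read_md5sum(d / "md5sum.txt")
    missing, mismatched = [], []
    for name, digest in sorted(expected.items()):
        path = d / name
        if not path.is_file():
            missing.append(name)
        elif file_md5(path) != digest:
            mismatched.append(name)
    # Data files on disk that md5sum.txt does not mention (subdirectories such as releaseN are skipped).
    unlisted = sorted(
        p.name for p in d.iterdir()
        if p.is_file() and p.name not in expected and p.name not in NON_DATA_FILES
    )
    return missing, mismatched, unlisted
```

All three lists should come back empty for a track that is ready to release; anything in them is worth raising with the wrangler.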


===Pushing Files===
Pushing the three main types of files involved in ENCODE tracks.

====gbdb files====
Files of this form get pushed hgwdev -> hgnfs1:
  /gbdb/hg18/wib/wgEncode*.wib

====Download files====
Download files for an original release get pushed from hgdownload-test (on hgwdev) -> hgdownload (list the entire file path as usual):
  /usr/local/apache/htdocs-hgdownload/goldenPath/hg18/encodeDCC/wgEncode*/index.html
  /usr/local/apache/htdocs-hgdownload/goldenPath/hg18/encodeDCC/wgEncode*/wgEncode*.[bed/wig].gz

When pushing download files for a subsequent release (e.g. release 2), push files as follows, but in your request list the from/to paths at the top followed by a list of the file names without the full path:
  from '''hgwdev''': /usr/local/apache/htdocs-hgdownload/goldenPath/<db>/encodeDCC/<trackName>/releaseN/
  to '''hgdownload''': /usr/local/apache/htdocs/goldenPath/<db>/encodeDCC/<trackName>/
  (''Note'': no releaseN directory on '''hgdownload''')

* After pushing files:
Once the files have been pushed, you can check whether the push was successful using this script:
  checkPushedFiles.csh


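The releaseN request format above (from/to directories first, then bare file names) is easy to get wrong by hand when there are many files. Below is a minimal sketch of a helper that builds that text; the function and template names are hypothetical, and the two directory templates are copied from the paths on this page.

```python
from pathlib import PurePosixPath

# Directory templates copied from this page; db, track and n are filled in per request.
DEV_TEMPLATE = "/usr/local/apache/htdocs-hgdownload/goldenPath/{db}/encodeDCC/{track}/release{n}/"
RR_TEMPLATE = "/usr/local/apache/htdocs/goldenPath/{db}/encodeDCC/{track}/"


def release_push_request(db, track, n, paths):
    """Build the text of a releaseN download push request (hypothetical helper)."""
    lines = [
        "from hgwdev: " + DEV_TEMPLATE.format(db=db, track=track, n=n),
        "to hgdownload: " + RR_TEMPLATE.format(db=db, track=track),
        "(no releaseN directory on hgdownload)",
        "",
    ]
    # Bare file names only, de-duplicated and sorted so the list is easy to review.
    lines += sorted({PurePosixPath(p).name for p in paths})
    return "\n".join(lines)
```

Paste the returned text into the push request; the file list can come straight from the notes file or from a directory listing of releaseN.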
====Other files====
Files of this form get pushed hgwbeta -> RR. They are needed if this is the first subtrack released for this cell line from this lab, and because they used to be omitted from the pushQ entry often, the directories containing these files are now pushed weekly by Katrina on Fridays. So QAers no longer have to worry about pushing these. They are not in source control, so they usually go out well ahead of the track.
  /usr/local/apache/htdocs/ENCODE/protocols/cell/*.pdf

===Relative Links===
In HTML on our site, you can create relative links (on dev, the link goes to the page on dev; on beta, it goes to beta; etc.) by writing the path relative to your file's location in the source tree and the location of the file or CGI you are linking to.
<br />For example, from ~trackDb/human/hg19, here is how you point to:
* a goldenPath file:
  <A HREF="../../goldenPath/help/multiView.html" TARGET=_BLANK>here</A>.
* CGI:
  <A TARGET=_BLANK HREF="/cgi-bin/hgEncodeVocab?type=cellType">cell lines</A>
* ENCODE protocols:
  <A HREF="../../ENCODE/protocols/cell">ENCODE cell culture protocols </A>.
* ENCODE portal:
  <A TARGET=_BLANK HREF="/ENCODE/index.html">Encyclopedia of DNA Elements (ENCODE) Project</A>
* ENCODE data release policy:
  <A TARGET=_BLANK HREF="../ENCODE/terms.html">here</A>

==Old info==
===File Validation===
* ''No longer run. Here are Tim's comments about QA running validateFiles:'' "To get these things through the pipeline, we run them through validateFiles, so I think your running them through again is one time too many. But if you are going to, then each lab and each file type may have negotiated limits (which may change between submissions). These limits are found in the relevant submission directory DAF files."
* Old validateFiles process:
Test a smattering of different file types using '''validateFiles''' (run the program without arguments to see the usage statement). If there are no errors, there is no output. For example, for files of type tagAlign, invoke the tool like this:
  validateFiles -type=tagAlign -genome=/gbdb/hg18/hg18.2bit /usr/local/apache/htdocs/goldenPath/hg18/encodeDCC/wgEncodeHudsonalphaChipSeq/wgEncodeHudsonalphaChipSeqAlignmentsRep1Gm12878Control.tagAlign.gz

For tagAligns there are several relevant validateFiles options:
  mismatches - frequently 2 but negotiated for each lab. Set this to 5 to be tolerant.
  matchFirst - negotiated. You should set this to 25, and even then you may need to adjust it.
  nMatch - negotiated, but you should always have this parameter set.

If you want to be exact, the metadata shown on the downloads page tells you which submission directory the file belongs to, and the most recent *.DAF (or *.daf) there will have a validationSettings line that includes the settings for each file type. For example:
  /hive/groups/encode/dcc/pipeline/encpipeline_prod/773/UtaChIPseqBOonlyV1.DAF
has the line:
  validationSettings allowReloads;validateFiles.tagAlign:mmCheckOnInN=100,mismatches=3,nMatch,matchFirst=25
This means that the tagAligns were validated with -mismatches=3 -nMatch -matchFirst=25


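Reading a validationSettings line by eye is error-prone when a DAF lists several file types. A minimal sketch that turns the line into validateFiles-style flags for one file type, assuming the `validateFiles.<type>:opt,opt=value` layout shown in the example above (the function name is hypothetical):

```python
def validate_files_flags(settings_line, file_type):
    """Turn a DAF validationSettings line into validateFiles-style flags for one file type."""
    key, _, value = settings_line.strip().partition(" ")
    if key != "validationSettings":
        raise ValueError("not a validationSettings line")
    prefix = f"validateFiles.{file_type}:"
    # Settings are ';'-separated; the per-type options are ','-separated after the prefix.
    for part in value.split(";"):
        part = part.strip()
        if part.startswith(prefix):
            opts = part[len(prefix):].split(",")
            return [f"-{opt.strip()}" for opt in opts if opt.strip()]
    return []  # no negotiated settings for this file type
```

For the example line this returns every option for tagAlign, including `-mmCheckOnInN=100`, which the summary sentence above does not mention; check the validateFiles usage statement before passing that one along.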
==Subsequent Release of Data (e.g. Release 2)==
Periodically, the existing ENCODE tracks will be augmented with new data as labs complete experiments on new cell lines, etc. The new data will come in various forms: some will replace existing data, some will be brand new, and some old data will be eliminated.

===Notes file===
The data wrangler will create a text document, check it into git, and place it here: kent/src/hg/makeDb/doc/encodeDccHg18/*.txt

This document should contain complete lists of each table and file and its disposition. The tables and files will fall into categories similar to these:
*A) Untouched - are on the public browser and should remain
*B) Deprecated - are currently on the RR but will no longer be needed and should not be referenced by the public site.
**NOTE: NO FILES SHOULD BE REMOVED from the downloads directory on hgdownload (RR).
**This list is provided for completeness. Any files marked here as in gbdb may be eliminated.
*C) New - are currently only on test but will need to be pushed to the RR.
*D) Additional items of note

This document may not match reality. Some of the tables/files may not exist at all, the names may be incorrect, they may not be present on the machine listed in the file, or they may not match the list in the pushQ. The first challenge in QAing a subsequent ENCODE release is to determine if/how the notes file diverges from reality. '''To do this, compare the file to the "snapshot" of what is included in the release (and what you should QA), which can be found in the "release2" list in the downloads directory (hgwdev: /data/apache/htdocs/goldenPath/<db>/<trackName>/release<x>/). If the file differs from the downloads directory, send that information to the data wrangler and pop the track into the B-queue while they sort it out.''' Otherwise, QA spends far too much time figuring out exactly what they are expected to QA.

Once the list is finalized, proceed with the QA work as outlined above. Note the additional steps in the [[#Files]] section for how to handle the /releaseN directory.


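Comparing the notes file against the releaseN snapshot is essentially a set difference. A minimal sketch, assuming every released file name starts with "wgEncode" and contains a dot (dot-free names are treated as tables and skipped); the function names are hypothetical, and you may need to pass only the relevant part of the notes (e.g. the New section) if the releaseN directory does not hold untouched files.

```python
import re
from pathlib import Path

# Assumption: released file names start with "wgEncode" and contain a dot
# (e.g. wgEncodeFoo.bam); dot-free names are treated as tables and ignored.
NAME_RE = re.compile(r"\bwgEncode[\w.]+")


def file_names_in_notes(notes_text):
    """Pull file-like wgEncode* names out of free-form notes text."""
    # Strip a sentence-ending period picked up by the regex.
    names = {m.rstrip(".") for m in NAME_RE.findall(notes_text)}
    return {n for n in names if "." in n}


def compare_notes_to_release(notes_text, release_dir):
    """Return (only_in_notes, only_in_release_dir), both sorted."""
    in_notes = file_names_in_notes(notes_text)
    on_disk = {
        p.name for p in Path(release_dir).iterdir()
        if p.is_file() and p.name.startswith("wgEncode")
    }
    return sorted(in_notes - on_disk), sorted(on_disk - in_notes)
```

Either list being non-empty is the signal to send the discrepancy to the wrangler and move the track to the B-queue.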
===MetaDb changes===
There may also be some metadata changes to fix errors or add information such as the GEO Series and GEO Sample.
* Be sure to QA these metadata changes.
* Check for related redmine issues (addition of GEO accessions should definitely have a related issue for the addition of this info).
* You can also do a diff to check for any other metadata changes. From kent/src/hg/makeDb/trackDb/<org>/<db>/metaDb, do the following diff:
  diff beta/wgEncodeYourTrack alpha/wgEncodeYourTrack


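A plain `diff` reports changed lines; when stanzas are reordered it can be easier to see changes per object and per field. A minimal sketch, assuming .ra stanzas are separated by blank lines, each line is `key value`, and the first line's value names the stanza; the function names are hypothetical and the field names in any example are illustrative only.

```python
import re


def parse_ra(text):
    """Parse .ra text into {stanza name: {key: value}}.

    Assumes stanzas are separated by blank lines, each line is 'key value',
    and the first line's value names the stanza.
    """
    stanzas = {}
    for block in re.split(r"\n\s*\n", text.strip()):
        fields = {}
        for line in block.splitlines():
            line = line.strip()
            if not line or line.startswith("#"):
                continue
            key, _, value = line.partition(" ")
            fields[key] = value.strip()
        if fields:
            stanzas[next(iter(fields.values()))] = fields
    return stanzas


def diff_ra(beta_text, alpha_text):
    """List stanza- and field-level differences going from beta to alpha."""
    beta, alpha = parse_ra(beta_text), parse_ra(alpha_text)
    changes = []
    for name in sorted(beta.keys() | alpha.keys()):
        if name not in beta:
            changes.append(f"added stanza {name}")
        elif name not in alpha:
            changes.append(f"removed stanza {name}")
        else:
            old, new = beta[name], alpha[name]
            for key in sorted(old.keys() | new.keys()):
                if old.get(key) != new.get(key):
                    changes.append(f"{name}: {key}: {old.get(key)!r} -> {new.get(key)!r}")
    return changes
```

Each reported change should map to a redmine issue (for example, added GEO accessions).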
==Releasing to RR==
Note: Cc the data wrangler for this track on all your pushes. Cc encode@soe.ucsc.edu on your final push.
# Check release log field in PushQ...needs to start with ENCODE
# If this is a first release, skip this step and go straight to Step#3. If this is a subsequent release, do the following:
#* Remove the 'release public' block (including sub-blocks) of your track from trackDb.wgEncode.ra.
#* Remove the 'release alpha,beta' lines from the release alpha blocks (including sub-blocks), and then on parent and view-in-the-middle blocks if applicable:
#**also note: removeAlphas script may not run if your table list file has lines that begin with a tab/space (remove these in vi with :%s/^ *//)
##git pull
##run removeAlphaBetas script (> to a file)
##diff between file & trackDb.wgEncode.ra (diff file1 file2)
##*double check # of release alpha,betas matches # of tables (diff file1 file2 | grep release | wc --lines)
##copy file over trackDb.wgEncode.ra
##git diff to check new copy against repository copy
##If necessary, remove release alphas from the parent block and "view-in-the-middle" sub-blocks and git diff again
##make alpha on db to make sure everything looks good on dev
##commit change
##git diff again to make sure they are the same
##remove file
##from trackDb on hgwbeta: make beta DBS=<db>
#Releasing the metaDb (most tracks will require this)
#* Starting from /cluster/home/$usr/trackDb/$species/$db/
## copy metaDb .ra file from ~metaDb/beta -> metaDb/public
## add .ra file name to the makefile in ~metaDb/public
# Do a make public DBS=<db> (from trackDb on hgwbeta) and announce it to QA
# If this is a subsequent release, check track on http://hgwbeta-public.cse.ucsc.edu
# Run comparePublic.csh to check differences between trackDb_public and RR and hgwbeta.
# Push track tables from mysqlbeta -> mysqlrr (not trackDb_public yet)
# Drop tables from hgwbeta that need to be removed (being replaced by V# tables)
# Drop tables being removed from the RR
# Push trackDb and friends ([[Pushing_trackDb]]) and tableDescriptions  from mysqlbeta -> mysqlrr
# Push download files, index.html, files.txt and md5sum.txt (from hgwdev to hgdownload)
#* If this is a releaseN, even though there is a releaseN directory on hgwdev, do not create one on hgdownload (see the Download Files section of [[#Pushing Files]] for specifics)
# Drop .wib files that need to be dropped (from hgnfs1)
# Check the [http://genome.ucsc.edu/ENCODE/downloads.html ENCODE/downloads.html] page to see if your track is listed. If not (mostly for first releases), edit and push ENCODE/downloads.html from hgwdev -> hgwbeta & RR (a special Encode download release log)
# This step is now done when you request a push of trackDb and friends so this step can be skipped: Push cv.ra file (only if there is a matrix) located here: /usr/local/apache/cgi-bin/encode
# Click "push requested" in the pushQ record and then click "done" after verification on the RR. Then transfer the pushQ entry from the L queue to the main pushQ.

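The "double check # of release alpha,betas matches # of tables" step above uses `diff file1 file2 | grep release | wc --lines`. The same count can be sketched in Python, assuming the table list has one table per line; leading tabs/spaces are stripped, which sidesteps the problem noted for the removeAlphas script. Function names are hypothetical.

```python
import difflib


def removed_release_lines(before_text, after_text):
    """Count 'release' lines present in before_text but dropped in after_text."""
    diff = difflib.ndiff(before_text.splitlines(), after_text.splitlines())
    # ndiff prefixes deleted lines with "- "; "?" hint lines never carry the text.
    return sum(1 for line in diff if line.startswith("- ") and "release" in line)


def check_against_table_list(before_text, after_text, table_list_text):
    """Return (counts_match, removed_release_lines, number_of_tables)."""
    tables = [t.strip() for t in table_list_text.splitlines() if t.strip()]
    removed = removed_release_lines(before_text, after_text)
    return removed == len(tables), removed, len(tables)
```

Pass the original trackDb.wgEncode.ra text as `before_text` and the removeAlphaBetas output as `after_text`.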

[[Category:Browser QA]]
[[Category:Browser QA ENCODE]]

Revision as of 23:27, 24 March 2011

Getting Started

Claim track (pushQ & redmine)

  1. Select the top-most track from the encodePushQ, which is a sub-pushQ accessed from pushQ Gateway.
  2. Claim the track's Redmine Issue & change status from "Approved" to "Reviewing"
    • Add yourself as a watcher to the redmine ticket (so if you assign it back to the developer you will get updates)
    • Make a question in the redmine ticket for Kate:
      1. let her know you claimed the track
      2. ask her to make a determination about the composite and subtrack labels (may not be necessary for subsequent releases)
    • Make a question for the wrangler asking them to update the ENCODE status to "Reviewing"
    • Use the % Done on Redmine Issue to estimate your QA progress. Kate uses this to check status.

Review the "notes" file

  • Check out the developer's #Notes file to get a feel for what the track consists of.
    • the path to the notes file will be in the "Notes" section of the pushQ entry
    • trust the notes file over the pushQ entry table/files information
  • If this is a subsequent release, see #Subsequent Release of Data (e.g. Release 2) first.

Create table list

Run qaEncodeTracks.csh & check output

which does:

  • countPerChrom
  • check for entry in tableDescriptions table
  • check that shortLabel does not exceed 17 characters
  • check that longLabel does not exceed 80 characters
  • check that there are no underscores in the table names
  • check for indices on the tables
  • check that positional tables are sorted
  • checkTableCoords (checks for any illegal coordinates)

Also, run genePredCheck/pslCheck if applicable (i.e. if your track is a gene prediction track).

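Several of the qaEncodeTracks.csh checks listed above are simple enough to reproduce when you want to re-check a fix quickly. A minimal sketch, using the 17/80 character limits from this page; it assumes "sorted" means each chromosome's rows are contiguous with non-decreasing chromStart, and it is not a replacement for the script. Function names are hypothetical.

```python
SHORT_LABEL_MAX = 17
LONG_LABEL_MAX = 80


def label_and_name_problems(table_names, short_label, long_label):
    """Return human-readable problems for the label-length and underscore checks."""
    problems = []
    if len(short_label) > SHORT_LABEL_MAX:
        problems.append(f"shortLabel has {len(short_label)} characters (max {SHORT_LABEL_MAX})")
    if len(long_label) > LONG_LABEL_MAX:
        problems.append(f"longLabel has {len(long_label)} characters (max {LONG_LABEL_MAX})")
    problems += [f"underscore in table name {t}" for t in table_names if "_" in t]
    return problems


def is_position_sorted(rows):
    """rows: (chrom, chromStart) pairs in table order."""
    seen, prev_chrom, prev_start = set(), None, None
    for chrom, start in rows:
        if chrom != prev_chrom:
            if chrom in seen:
                return False  # chromosome appears in two separate runs
            seen.add(chrom)
        elif start < prev_start:
            return False  # start went backwards within a chromosome
        prev_chrom, prev_start = chrom, start
    return True
```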
Staging on hgwbeta

Push /gbdb files

Push new and, if applicable, updated /gbdb files (e.g. .wib, .bb, etc.) from hgwdev -> hgnfs1.

Push tables to mysqlbeta

Use bigPush.csh using the table list you created above.

Prepare trackDb (release tags and metaDb)

  1. release tags: see the Three State TrackDb page for more info on release tags and our three-state trackDb
    • In /cluster/home/$usr/trackDb/$species/$db/trackDb.wgEncode.ra, find the include statement for your track's .ra file and change 'alpha' tag to an 'alpha,beta' tag and, if applicable (releaseN), change 'beta,public' to 'public' and then check in these changes.
      (If a new track is in a super-track, make sure there are release tags! See explanation.)
  2. metaDb: starting from /cluster/home/$usr/trackDb/$species/$db/metaDb
    1. copy metaDb .ra file from ~metaDb/alpha -> metaDb/beta
    2. add .ra file name to the makefile in ~metaDb/beta
  3. commit changes
  4. On hgwbeta, make beta DBS=__ from /cluster/home/$usr/trackDb/
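Changing the release tag on your track's include line can also be scripted so that only that one line changes. A minimal sketch, assuming include lines have the form `include <file>.ra <tags>` (check the Three State TrackDb page for the exact syntax); the function name is hypothetical.

```python
def retag_include(ra_text, track_file, old_tag="alpha", new_tag="alpha,beta"):
    """Change the release tag on the include line for track_file; return (new_text, n_changed)."""
    out, changed = [], 0
    for line in ra_text.splitlines():
        parts = line.split()
        # Only touch an exact 'include <track_file> <old_tag>' line.
        if (len(parts) == 3 and parts[0] == "include"
                and parts[1] == track_file and parts[2] == old_tag):
            indent = line[: len(line) - len(line.lstrip())]
            line = f"{indent}include {track_file} {new_tag}"
            changed += 1
        out.append(line)
    return "\n".join(out) + "\n", changed
```

For a releaseN track, the same helper with `old_tag="beta,public", new_tag="public"` handles the second change; `n_changed` should be exactly 1 in each case.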

hgTrackUi

Functionality (track controls)

Display Modes

  • by default, composite overall display mode should be set to dense (super-track, if applicable, should be set to hide)
  • changing display mode of views should affect the subtrack list & hgTracks

Config Settings of Views

  • settings function correctly
  • settings of different views are independent
  • Signals, by default, should have the following settings (unless lab has requested otherwise or other good reason):
    • Data view scaling: use vertical viewing range (rather than auto-scale)
      • in dense, default fixed range should result in meaningful banding at full chromosome (not all gray)
    • Windowing function: mean + whiskers

Matrix

  • By default, matrix boxes should be fully checked or fully unchecked (not grayed). If not, this is a trackDb setting issue that the wrangler should fix.
  • Matrix headers:
    • For human, Tier 1 and Tier 2 cell lines:
      • should be listed first (Tier 1 in alphabetical order followed by Tier 2 in alphabetical order)
      • should be labeled as Tier 1 or 2 with the tier following the cell line in parentheses, bold, no hyperlink, no italics, e.g. cellLineA (Tier 1)
    • +/- buttons function correctly
    • selections in matrix result in appropriate selection changes in subtrack list
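To spot-check the matrix header order, it can help to generate the expected sequence of labels from the cell lines present. A minimal sketch; the case-insensitive alphabetical order and the placement of non-tiered cell lines after Tier 2 are assumptions, and the function name is hypothetical. It only produces label text, not the bold/no-hyperlink formatting.

```python
def expected_matrix_headers(cell_tiers):
    """cell_tiers maps cell line -> 1, 2, or None. Returns header labels in the expected order."""
    def by_tier(tier):
        return sorted((c for c, t in cell_tiers.items() if t == tier), key=str.lower)

    headers = [f"{c} (Tier 1)" for c in by_tier(1)]
    headers += [f"{c} (Tier 2)" for c in by_tier(2)]
    # Assumption: cell lines outside Tier 1/2 follow, alphabetically and unlabeled.
    headers += by_tier(None)
    return headers
```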

Subtrack list

  • adjusts according to matrix & view (hide -> non-hide) selections
  • 'only selected/visible' and 'all' radio selections function
  • sorting functions (clicking on column headings)
  • schema links work

MetaData

  • make sure metaData is present by clicking on ellipsis (...)
  • check a few to make sure they have somewhat consistent fields
  • spot check a few fields to make sure they make sense

Links

  • check that all links work, and where applicable, are relative

Content (.html description page)

Labels

  • Check labels adhere to Kate's instructions
    • Other resources: Style Guide and the Label spreadsheets on the soe google docs.

Sections

  • Make sure all sections are present
  • Check grammar, spelling, readability, completeness, correctness
Description
  • Brief overall summary of track.
Display Conventions and Configuration
  • Contains info about each view in track
  • No description for views only available in downloads
  • link to multi-view instructions
  • Tracks with BAM alignments should have a link to the SAM Format Specification and should explain any non-standard tags (those starting with X, Y or Z, or that are not listed in the tag section)
Methods
  • Make sure it is detailed enough.
  • style should be consistent with the rest of the site
    • references to "data" are plural
    • value and units have space between them (e.g. 50 bp rather than 50bp)
Release Notes
  • Optional for first release
  • Required for subsequent releases
  • Should start with "Release # (Month Year) of this track...."
  • Provides a description of the changes of this particular release.
Credits
  • Must have contact person
  • Name is a hyperlink to email
  • Email must be sanitized (using encodeEmail.pl script)
References
Data Release Policy
  • Standard language:
    • Data users may freely use ENCODE data, but may not, without prior consent, submit publications that use an unpublished ENCODE dataset until nine months following the release of the dataset. This date is listed in the Restricted Until column on the track configuration page and the download page. The full data release policy for ENCODE is available here.
  • make sure "here" is a link to the data release policy

Links

  • Check standard links are present, and, where applicable, are relative:
    • ENCODE Data policy (in Data Release Policy section)
    • help for multi-view (Next to "Select views" in track Control Section in Display Conventions and Configuration section)
    • contact email (see #Credits for more info)

hgc details

Accuracy of details

  • details that are displayed correspond with the record in the table

Makes sense

  • table values seem correct

Useful

  • you understand what is being displayed
  • internal, non-functioning fields are not displayed (e.g. if all values in a field have "-1" as a placeholder, we shouldn't display that field)

Complete

  • all useful information is present (there's nothing important missing)

Clear

  • details are presented and labeled clearly
  • layout is user friendly

Links

  • check that all links are relevant, work, and where applicable, are relative

hgTracks

Display

Views (zoomed in/out)

  • check the display of all Views in all display modes when zoomed in to the base pair level & zoomed out to 1 million bp

Table coordinates + features

  • an item's coordinates and other display features (exons, etc.) display as expected/correctly based on the table
    • a line from the table for comparing against the display can be obtained from the schema or the mysql db for regular tables
    • for bam files, the schema will only give you a filePath. You will need to use samtools to obtain a point to test.
      1. add /hive/data/outside/samtools/svn_${MACHTYPE}/samtools to $PATH in your .bashrc
      2. run samtools on the command line using the fileName found in the schema (see following example). The output gives you the start position and the read length; add the read length to the start position to get the end position. This gives you the position to enter in the browser for testing.
 samtools view -x filePath chrx:xxxx-xxxxxx | head
 samtools view -x /data/NT/gbdb/hg18/neandertal/seqAlis/Feld1-hg18.sorted.bam chr1:2000000-3000000 | head
  • for big* files, you can't get individual records, but you can use bigWigInfo or bigBedInfo to get general stats; be sure bigWigs are version 4.

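Turning a `samtools view` line into a browser position can be scripted. This minimal sketch reads RNAME (column 3), POS (column 4, 1-based) and CIGAR (column 6) from a standard SAM text line, and takes the reference span from the CIGAR operations that consume reference bases (M, D, N, =, X). The span-minus-one end is a refinement of the "add the read length" rule above: it matches the browser's 1-based inclusive positions and also handles spliced reads. The function name is hypothetical.

```python
import re

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")  # CIGAR ops that advance along the reference


def sam_line_to_position(sam_line):
    """Return 'chrom:start-end' (1-based, inclusive) for one SAM alignment line."""
    fields = sam_line.rstrip("\n").split("\t")
    rname, pos, cigar, seq = fields[2], int(fields[3]), fields[5], fields[9]
    if cigar == "*":
        span = len(seq)  # no CIGAR: fall back to the read length
    else:
        span = sum(int(n) for n, op in CIGAR_RE.findall(cigar) if op in REF_CONSUMING)
    end = pos + span - 1
    return f"{rname}:{pos}-{end}"
```

For an ungapped 36 bp read at POS 2000001 this gives chr1:2000001-2000036; a spliced read such as `10M500N26M` spans 536 reference bases.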
Searchable

  • Are items searchable; should they be? Most likely not for ENCODE.

Colors

  • For human, Tier 1 and Tier 2 cell lines should be displayed in a unique color (other than black)
  • it is OK if other tracks are in color, but not necessary

Defaults (composite/subtracks)

  • should this composite track be on by default? (For ENCODE, usually no)
  • check which subtracks are set as default selections, and make sure:
    • there aren't too many
    • important cell lines are on by default
  • default Tier 1 and Tier 2 subtracks should display first

Compare to hg18

  • If track is in hg19, compare a point on the hg19 browser of the track to the equivalent position in hg18.
    1. use "convert" from hg19 position to see the equivalent position in hg18.
    2. go back to your region in hg19, open new window and paste in hg18 equivalent position and compare hg19 to hg18.
  • Note: Comparisons to hg18 should be very cursory. Any differences should be noted in the redmine ticket, but not necessarily investigated unless a user also brings up an issue. (The thinking behind this is that when there are differences, it is most likely an error with hg18, not hg19, and we would be unnecessarily holding up hg19.)

Performance

Chrom 1 Test (signal & experiment)

When position is set to all of chromosome 1, data of interest loads in less than 1 minute:

  • signal: check the time to load the first signal subtrack
  • experiment: check the time to load all views for one experiment (e.g. Pol2 in GM12878 cells)

Defaults at Gene Sized Region Test

Set position to a gene-size region with your track's default subtracks on and the default browser tracks on (easiest to reset cart, turn on track)

  • should display quickly and not be "too much" data

Data make sense

Compare subtracks within views

  • Do all the subtracks within a view somewhat correlate?

Compare subtracks of related views

  • For example:
    • Does the All Signal Raw Signal subtrack of an experiment really seem to comprise the data in the Plus Raw Signal & Minus Raw Signal subtracks?
    • Do Peaks really represent the high Signal areas of the Signal View subtracks?

Do the data make sense biologically?

  • Turn on other tracks to compare.
    • compare to the gene tracks
    • compare to subtracks of similar tracks
    • For example:
        • RNA-seq data should correlate with the exons in a genes track
        • TFBS tracks should correlate with the beginning of gene transcripts

Files

hgFileUi

  • 'Downloads' links on hgTrackUi should now go to hgFileUi
    • if not, ask wrangler to add "sortOrder" information to trackDb entry

file count

  • Check # of files displayed is correct (use "notes" file)

download button

  • Make sure download button prompts a download (and doesn't take you to an error page)

sort columns

  • Check the sorting of columns

links

  • Check that the "Track Settings" link takes you back to the track's hgTrackUi page

index

  • since there is no longer a link, go to http://hgdownload-test.cse.ucsc.edu/goldenPath/$db/encodeDCC/<trackName>/:
  • index page in downloads dir is created from two files, preamble.html & index.html:

preamble.html

the top description part of the index page (not the list of files), should have:

  • introductory paragraph with a very brief description
  • releaseN: should also include the release number at end of description (e.g. "This is Release 2 of this track. Release notes are included in the track description.")
  • if not complete, here are the directions to edit (or just ask wrangler to do it!):
    1. edit the preamble.html file (note that it is not in git) in the downloads directory (/usr/local/apache/htdocs-hgdownload/goldenPath/$db/encodeDCC/) on hgwdev
    2. regenerate index.html by running the script: encodeDownloadsPage.pl index.html (I think the preamble.html needs to be in the dir where you run the script)
  • links (present and function):
    • trackUi
    • files.txt
    • md5sum.txt files

index.html

list of files on the index page (not the description on top)

  • check file count (should name each file being released, and only the name of those files in this release) against file count in the "notes" file
    • if the list count isn't right (or doesn't seem right otherwise), ask wrangler
    • or, run the encodeDownloadsPage.pl script (at the prompt type: encodeDownloadsPage.pl index.html) directly in the /releaseN directory (hgwdev: /kent/src/hg/htdocs/goldenPath/<db>/encodeDCC/<trackName>/releaseN) to generate a new index.html page that you know contains all the files in that directory. Then, copy the /releaseN/index.html to the main track directory (hgwdev: /kent/src/hg/htdocs/goldenPath/<db>/encodeDCC/<trackName>/) as that is the index.html used on the site. Don't forget to commit your changes.
  • executable: make sure this file is executable (because of the way the links are created)
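The file-count and executable checks above can be combined in a short script. A minimal sketch using Python's standard `html.parser`; it assumes download file names start with "wgEncode" (so links such as files.txt or the trackUi link are not counted), and the function names are hypothetical.

```python
import os
import stat
from html.parser import HTMLParser


class _LinkCollector(HTMLParser):
    """Collect every href from <a> tags (HTMLParser lowercases tag and attribute names)."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.hrefs.extend(value for name, value in attrs if name == "href" and value)


def listed_download_files(index_html_text, prefix="wgEncode"):
    """Sorted, de-duplicated file names linked from index.html that start with prefix."""
    parser = _LinkCollector()
    parser.feed(index_html_text)
    names = {h.rsplit("/", 1)[-1] for h in parser.hrefs}
    return sorted(n for n in names if n.startswith(prefix))


def is_executable(path):
    """True if the owner-execute bit is set on path."""
    return bool(os.stat(path).st_mode & stat.S_IXUSR)
```

Compare `len(listed_download_files(...))` against the file count in the notes file.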

Release to RR

Note: Cc the data wrangler for this track on all your pushes. Cc encode@soe.ucsc.edu on your final push.

Release log

  • Release log field in PushQ:
    • should be the long label (or short if too long) and, if releaseN, release number in parentheses
    • must contain ENCODE (or it won't show up on the ENCODE downloads page)

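Because a release log without "ENCODE" silently drops off the ENCODE downloads page, a quick check before clicking "push requested" may be worthwhile. A minimal sketch; it accepts any parenthesized text containing the release number (e.g. "(Release 2)"), which is an assumption about the exact wording, and the function name is hypothetical.

```python
import re


def release_log_problems(release_log, release_n=None):
    """Check a pushQ release-log string against the conventions on this page."""
    problems = []
    if "ENCODE" not in release_log:
        problems.append("missing 'ENCODE'")
    if release_n and release_n > 1:
        # Any parenthesized text containing the number, e.g. "(Release 2)", is accepted.
        if not re.search(rf"\([^)]*\b{release_n}\b[^)]*\)", release_log):
            problems.append(f"missing release number {release_n} in parentheses")
    return problems
```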
Push download files

  • Push download files, index.html, files.txt and md5sum.txt (from hgwdev to hgdownload)
    • If this is a releaseN, even though there is a releaseN directory on hgwdev, do not create one on hgdownload (see #Download files for specifics)
  • Note, this push can take hours

Prepare trackDb (release tags and metaDb)

  1. release tags, in trackDb.wgEncode.ra:
    1. remove 'alpha,beta' (no release tag necessary)
    2. if releaseN, then: (see Three State TrackDb page for more info)
      1. copy <trackName>.new.ra over <trackName>.ra
      2. delete <trackName>.new.ra public line
      3. copy <trackName>.new.html over <trackName>.html
      4. open <trackName>.ra and remove html line (pointed to <trackName>.new.html)
  2. metaDb, from /cluster/home/$usr/trackDb/$species/$db/metaDb:
    1. double check alpha vs. beta to make sure you have the most up-to-date metadata (diff beta/<trackName>.ra alpha/<trackName>.ra)
      • if diffs are due to next release, don't copy to beta, if diffs are for current release, copy to beta & double check in hgTrackUi, etc.
    2. copy metaDb <trackName>.ra file from ~metaDb/beta -> metaDb/public
    3. add <trackName>.ra file name to the makefile in ~metaDb/public
  3. commit/push changes
  4. On hgwbeta, make public DBS=$db from /cluster/home/$usr/trackDb/
    • announce it to browser-qa

Check on public

  1. If this is a subsequent release, check track on hgwbeta-public
  2. Run comparePublic.csh to check differences between trackDb_public and RR and hgwbeta.

Push tables

  • Push track tables from mysqlbeta -> mysqlrr (not trackDb_public yet)

Drop tables (if necessary)

  1. Drop tables from hgwbeta that need to be removed (being replaced by V# tables)
  2. Drop tables being removed from the RR

Drop /gbdb files (if necessary)

  • Drop /gbdb files from hgnfs1

Push trackDb+friends

  • Push trackDb+friends and tableDescriptions from mysqlbeta -> mysqlrr
    • cc wrangler and encode@soe.ucsc.edu

PushQ & check on RR

  1. click "push requested" in the pushQ record
  2. once all pushes complete, check track on RR
  3. click "Done!" on pushQ record
  4. Transfer the pushQ entry from the L queue of encodePushQ to the Main pushQ.

ENCODE downloads

  • check the ENCODE downloads page (human | mouse); if your track isn't there, add it:
    • edit /kent/src/hg/htdocs/ENCODE/downloads.html to add a line for your track and, if necessary, its super-track
      • super-track title should be a non-underlined link to the super-track hgTrackUi, for example:
<A style="text-decoration:none" HREF="http://genome.ucsc.edu/cgi-bin/hgTrackUi?db=hg19&g=wgEncodeRnaSeqSuper" TARGET=_blank>RNA-seq</A>
  • push the following from hgwbeta -> RR:
/usr/local/apache/htdocs-hgdownload/ENCODE/downloads.html
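When adding a super-track title to downloads.html, the link can be generated instead of hand-edited so the style attribute and hgTrackUi URL stay consistent with the example above. A minimal sketch; the function name is hypothetical, and the URL is left unescaped to match the existing markup exactly.

```python
from html import escape
from urllib.parse import urlencode


def super_track_link(db, super_track, title):
    """Non-underlined link to a super-track's hgTrackUi page, matching the example markup."""
    url = "http://genome.ucsc.edu/cgi-bin/hgTrackUi?" + urlencode({"db": db, "g": super_track})
    return (f'<A style="text-decoration:none" HREF="{url}" '
            f'TARGET=_blank>{escape(title)}</A>')
```

`super_track_link("hg19", "wgEncodeRnaSeqSuper", "RNA-seq")` reproduces the RNA-seq example line exactly.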

Other info

Subsequent Release of Data (e.g. Release 2)

Periodically, released ENCODE tracks will be augmented with new data as labs complete experiments on new cell lines, etc. The new data will come in various formats: some will replace existing data, some will be brand new, some old data will be eliminated, etc.

Notes file

The data wrangler will create a notes file using the encodeMkChangeNotes script, check it into git, and place it here: kent/src/hg/makeDb/doc/encodeDcc$db/*.txt

This document should contain complete lists of each table and file and what its disposition is. The tables and files will fall into categories similar to this:

  • A) Untouched - are on the public browser and should remain
  • B) Deprecated - are currently on the RR but will no longer be needed and should not be referenced by the public site.
NOTE: NO FILES SHOULD BE REMOVED from the downloads directory on hgdownloads (RR).
This list is provided for completeness. Any files marked here as in gbdb may be eliminated.
  • C) New - are currently only on test but will need to be pushed to the RR.
  • D) Additional items of note

This document may not match reality. Some of the tables/files may not exist at all, their names may be incorrect, they may not be present on the machine listed in the file, or they may not match the list in the pushQ. The first challenge in QAing a subsequent ENCODE release is to determine if and how the notes file diverges from reality. To do this, compare the notes file to the "snapshot" of what is included in the release (and what you should QA), which can be found in the "release2" list in the downloads directory (hgwdev: /data/apache/htdocs/goldenPath/<db>/<trackName>/release<x>/). If the notes file differs from the downloads directory, send that information to the data wrangler and pop the track into the B-queue while they sort it out. Otherwise, QA spends far too much time figuring out exactly what it is expected to QA.
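The comparison between the notes file and the release snapshot can be done with standard sort and comm. A minimal bash sketch, assuming you have already reduced both sources to plain-text lists with one file or table name per line (for example by listing the release<x> directory into one file and copying the names out of the notes file into another); compare_release_lists is a hypothetical name.

```shell
# Hypothetical helper: compare two plain-text lists (one name per line),
# e.g. names taken from the notes file and from the release<x> snapshot.
# Prints the names that appear in only one of the two lists.
compare_release_lists() {
    local notes="$1" release="$2"
    echo "Only in notes file:"
    comm -23 <(sort -u "$notes") <(sort -u "$release")   # lines unique to the notes list
    echo "Only in release list:"
    comm -13 <(sort -u "$notes") <(sort -u "$release")   # lines unique to the release list
}
```

Anything printed under either heading is a discrepancy worth sending to the data wrangler.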

Once the list is finalized, proceed with the QA work as outlined above. Note the additional steps in the Files section for how to handle the /releaseN directory.

MetaDb changes

There also may be some metadata changes to fix errors or add information such as the GEO Series and GEO Sample.

  • Be sure to QA these metadata changes
  • Check for related redmine issues (addition of GEO accessions should definitely have a related issue for the addition of this info)
  • You can also diff the metaDb files to check for any other metadata changes. From kent/src/hg/makeDb/trackDb/<org>/<db>/metaDb, run:

diff beta/wgEncodeYourTrack alpha/wgEncodeYourTrack
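A plain diff exits with 1 both for "files differ" and, with status 2, for a missing file, which is easy to misread. A minimal bash wrapper that tells these cases apart, assuming it is run from the metaDb directory above; metadb_changes is a hypothetical name.

```shell
# Hypothetical wrapper around the diff above; run it from
# kent/src/hg/makeDb/trackDb/<org>/<db>/metaDb.
# diff exits 0 (identical), 1 (different) or 2 (trouble, e.g. missing file).
metadb_changes() {
    local track="$1" status=0
    diff "beta/$track" "alpha/$track" || status=$?
    case "$status" in
        0) echo "No metadata changes for $track" ;;
        1) echo "Metadata changed for $track; QA the differences above" ;;
        *) echo "diff failed for $track (missing file?)" >&2; return 2 ;;
    esac
}
```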

Pushing Files

This section covers how to push the three main types of files involved in ENCODE tracks.

gbdb files

Files of this form get pushed hgwdev -> hgnfs1

/gbdb/hg18/wib/wgEncode*.wib

Download files

Download files for an original release get pushed hgdownload-test on hgwdev -> hgdownload (list the entire file path as usual)

/usr/local/apache/htdocs-hgdownload/goldenPath/hg18/encodeDCC/wgEncode*/index.html
/usr/local/apache/htdocs-hgdownload/goldenPath/hg18/encodeDCC/wgEncode*/wgEncode*.[bed/wig].gz
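Because the request should list the entire file path, it can help to expand the wildcards on hgwdev before pasting them in. A minimal bash sketch; expand_push_list is a hypothetical name, and since [bed/wig] above is shorthand for "bed or wig", the sketch expects one pattern per extension. It assumes the paths contain no spaces.

```shell
# Hypothetical helper: expand each glob pattern into full paths, one per
# line, for pasting into a push request. Patterns matching nothing are
# reported on stderr. Assumes paths without spaces.
expand_push_list() {
    local pattern restore
    local -a matches
    restore=$(shopt -p nullglob) || true   # remember the caller's setting
    shopt -s nullglob                      # unmatched globs expand to nothing
    for pattern in "$@"; do
        matches=( $pattern )               # unquoted on purpose so the glob expands
        if (( ${#matches[@]} == 0 )); then
            echo "no files match: $pattern" >&2
        else
            printf '%s\n' "${matches[@]}"
        fi
    done
    eval "$restore"                        # put nullglob back the way it was
}
```

For example: expand_push_list '/usr/local/apache/htdocs-hgdownload/goldenPath/hg18/encodeDCC/wgEncodeYourTrack/wgEncode*.bed.gz' '/usr/local/apache/htdocs-hgdownload/goldenPath/hg18/encodeDCC/wgEncodeYourTrack/wgEncode*.wig.gz' (quote the patterns so the function, not your shell, expands them).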

When pushing download files for a subsequent release track (e.g. release 2), push the files as follows. In your request, list the from/to paths at the top, followed by a list of the file names without the full path:

from hgwdev: /usr/local/apache/htdocs-hgdownload/goldenPath/<db>/encodeDCC/<trackName>/releaseN/
to hgdownload: /usr/local/apache/htdocs/goldenPath/<db>/encodeDCC/<trackName>/
(Note: no releaseN directory on hgdownload)
  • Once the files have been pushed, you can check whether the push was successful using this script: checkPushedFiles.csh
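The request layout for a subsequent release (from/to paths first, then bare file names) can be produced mechanically. A minimal bash sketch; release_push_request is a hypothetical name, and the two paths are copied from the from/to lines above.

```shell
# Hypothetical helper: print the from/to header for a subsequent-release
# download push, followed by the file names without their paths.
# Usage: release_push_request <db> <trackName> <N> file...
release_push_request() {
    local db="$1" track="$2" n="$3" f
    shift 3
    echo "from hgwdev: /usr/local/apache/htdocs-hgdownload/goldenPath/$db/encodeDCC/$track/release$n/"
    echo "to hgdownload: /usr/local/apache/htdocs/goldenPath/$db/encodeDCC/$track/"
    for f in "$@"; do
        echo "${f##*/}"   # strip any leading directories
    done
}
```

Passing the releaseN directory contents as a glob (for example .../wgEncodeYourTrack/release2/*) lists every file in the snapshot; trim the list if only some files are being pushed.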

Other files

Files of this form get pushed hgwbeta -> RR. Because they were often omitted from pushQ entries, the directories containing these files are now pushed weekly by Katrina on Fridays, so QAers no longer have to push them. They are not in source control, so they usually go out well ahead of the track.

/usr/local/apache/htdocs/ENCODE/protocols/cell/*.pdf

Relative Links

In HTML on our site, you can create relative links (on dev, the link goes to the page on dev; on beta, it goes to beta; etc.) by building the path from your file's location in the source tree to the location of the file or CGI you are linking to.
For example, from ~trackDb/human/hg19, here is how you point to:

  • a golden path file:
<A HREF="../../goldenPath/help/multiView.html" TARGET=_BLANK>here</A>.
  • cgi:
<A TARGET=_BLANK HREF="/cgi-bin/hgEncodeVocab?type=cellType">cell lines</A>
  • ENCODE protocols:
<A HREF="../../ENCODE/protocols/cell">ENCODE cell culture protocols </A>.
  • ENCODE portal:
<A TARGET=_BLANK HREF="/ENCODE/index.html">Encyclopedia of DNA Elements (ENCODE) Project</A>
  • ENCODE data release policy:
<A TARGET=_BLANK HREF="../ENCODE/terms.html">here</A>

Old info

File Validation

  • This step is no longer run. Here are Tim's comments about QA running validateFiles: "To get these things through the pipeline, we run them through validateFiles, so I think your running them through again is one time too many. But if you are going to, then each lab and each file type may have negotiated limits (which may change between submissions). These limits are found in the relevant submission directory DAF files."
  • Old validateFiles process:

Test a smattering of different file types using this tool: validateFiles (type the program name without arguments to see the usage statement). If there are no errors, there will be no output. For example, for files of type tagAlign, invoke the tool like this:

validateFiles -type=tagAlign -genome=/gbdb/hg18/hg18.2bit /usr/local/apache/htdocs/goldenPath/hg18/encodeDCC/wgEncodeHudsonalphaChipSeq/wgEncodeHudsonalphaChipSeqAlignmentsRep1Gm12878Control.tagAlign.gz

For tagAligns there are several relevant validateFiles options:

 mismatches  - frequently 2 but negotiated for each lab.  Set this to 5 to be tolerant
 matchFirst - negotiated.  You should set this to 25 and even then you may need to adjust it
 nMatch - negotiated, but you should always have this parameter set.

If you want to be exact, then the metadata as seen on the downloads page tells which submission directory the file belongs to, and the most recent *.DAF (or *.daf) will have a validationSettings line in it which will include the settings that belong to each file type. Example:

 /hive/groups/encode/dcc/pipeline/encpipeline_prod/773/UtaChIPseqBOonlyV1.DAF

has the line:

 validationSettings allowReloads;validateFiles.tagAlign:mmCheckOnInN=100,mismatches=3,nMatch,matchFirst=25

This means that the tag aligns were validated with -mismatches=3 -nMatch -matchFirst=25
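Turning a validationSettings line into options can be sketched in bash as below. The function name daf_validate_opts is hypothetical, and the separators (";" between top-level settings, "," between options for one file type) are inferred from the single example line above. The sketch passes every setting through by prefixing it with "-", so for the example it also emits the mmCheckOnInN setting, which the summary sentence above does not list; decide case by case whether to keep it.

```shell
# Hypothetical helper: extract the validateFiles.<type> settings from a DAF
# validationSettings line and print them as "-setting" options.
# Separators are inferred from one example: ';' between top-level settings,
# ',' between the options for a single file type.
# Usage: daf_validate_opts <type> "<validationSettings line>"
daf_validate_opts() {
    local type="$1" line="$2" part setting
    local -a parts settings opts=()
    IFS=';' read -ra parts <<< "$line"
    for part in "${parts[@]}"; do
        if [[ "$part" == *"validateFiles.$type:"* ]]; then
            # keep only what follows "validateFiles.<type>:"
            IFS=',' read -ra settings <<< "${part#*"validateFiles.$type:"}"
            for setting in "${settings[@]}"; do
                opts+=("-$setting")
            done
        fi
    done
    printf '%s\n' "${opts[*]}"   # space-separated; empty line if type not found
}
```

For example, daf_validate_opts tagAlign "$(grep validationSettings /hive/groups/encode/dcc/pipeline/encpipeline_prod/773/UtaChIPseqBOonlyV1.DAF)" should print the options from the line shown above.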