ENCODE QA: Difference between revisions

From Genecats
==Getting Started==
* Choose a track from the encodePushQ (sub pushQ accessed from pushQ Gateway).
Note: see the [[Old ENCODE QA]] page for the pre-bootcamp ENCODE QA process
* Also claim the [http://redmine.soe.ucsc.edu Redmine] Issue of that track & change status from "Approved" to "Reviewing"
===Claim track (redmine)===
** Add yourself as a watcher to the redmine ticket (that way when you assign it back to the developer you will get updates)
# Look at the Redmine [http://redmine.soe.ucsc.edu/projects/encode-wrangling/issues?query_id=206 PushQ] query in the [http://redmine.soe.ucsc.edu/projects/encode-wrangling Redmine ENCODE project].
** Make a question in the redmine ticket asking Kate to make a determination about the composite and subtrack labels (may not be necessary for subsequent releases)
#*In general, select the track with the highest priority. If two tracks have the same priority, take the one with the soonest due date. Also take into account how many other tracks that wrangler currently has in "Reviewing" status (under active QA).
**email Kate when you claim a track (cc Katrina)
# Change Assignee to yourself
**email wrangler telling them to update the ENCODE status to "Reviewing"
# Change [http://redmine.soe.ucsc.edu/ Redmine] status from "Approved" to "Reviewing"
* Check out the developer's notes file to get a feel for what the track consists of.
# Add yourself (and the wrangler) as a watcher to the redmine ticket (so if you assign it back to the developer you will get updates)
** the path to the notes file will be in the "Notes" section of the pushQ entry
Note: QA no longer uses the '% Done' field in the Redmine Issue to estimate QA progress. This is now just for wrangler use.  Also, you may want to review the Redmine notes for the previous release, found under the "Related issues" section at the top of the current Redmine ticket.
** trust the notes file over the pushQ entry table/files information
* If this is a subsequent release, see [[#Subsequent Release of Data (e.g. Release 2)]] first.
* Use the % Done on Redmine Issue to track your QA progress. Kate uses this to check status.


==run qaEncodeTracks.csh==
===Run encodeQaInit on ''hgwdev''===
You will need to dump the list of tables (just the new tables if this is a releaseN) from the pushQ (or developer's notes file if this is a releaseN) to a file (i.e. tableList in the usage statement). Then run qaEncodeTracks.csh, which does:
Running this script '''''on hgwdev''''' makes preparations for QA (run the script without any arguments to see the usage):
 
'''The script''':
* runs qaEncodeTracks script
* performs a verification of the notes file
* sets the release status of all the subIds for the current release to "Reviewing" (no longer need to ask the wrangler to do this)
 
'''The script creates''':
* directory in /hive/groups/encode/encodeQa/hg19/ for that track and release (the subsequent items are in this directory)
*:e.g. /hive/groups/encode/encodeQa/hg19/wgEncodeRikenCage/release2
* allTables - list of all tables for this release
* beta.mdb.ra
* checkPushFilesList - lists files that should be pushed to hgdownload; use to verify all files were successfully pushed to hgdownload, see [[#Verify all downloads pushed (checkPushedFiles.csh) | Verify all downloads pushed]] section
* claimMail - email for Kate about claiming track
* downloads (sym link) - links to downloads directory of the track
* htmlDownloadSnippet - html for adding track to the appropriate ENCODE downloads page ([http://genome.ucsc.edu/ENCODE/downloads.html human] or [http://genome.ucsc.edu/ENCODE/downloadsMouse.html mouse])
* fullFilesListNoRevoked - list of all files downloadable through hgFileUi
* methods - created from this ENCODE QA wiki page index to be used as a checklist and notes file for QA
* newTables - list of new tables for this release
* notes.file (sym link) - links to the notes file for a track describing what this release consists of (created by encodeMkChangeNotes)
* pushFilesMail - email to send for pushing the download files (releasing)
* pushGbdbsMail - email to send for pushing the gbdb files (staging)
* pushTableMail - email to send for pushing the tables and trackDb & friends (releasing)
* release.sql - sql commands to create the releaseLog (after releasing the track)
* script.output - output from qaEncodeTracks script
* subIds - list of all the subIds involved in this release
 
'''Re-running encodeQaInit''':
 
If the release changes in a way that changes the notes.file (tables/files/gbdbs are added, removed, etc.), then the push*Mail, tableList, etc. will no longer be accurate. In some cases, it might be best to re-run encodeQaInit so that these files don't have to be fixed by hand.
 
Details:
* methods will not be replaced if it exists already
* release.sql will not be replaced, but it will be updated:
** lastdate = last time encodeQaInit was run (updates every time encodeQaInit is run)
** initdate = date encodeQaInit was first run (not ever overwritten)
* all other files will be replaced
* if you want to be able to refer back and see what tables and gbdbs you pushed, before running the script, rename newTables (e.g. tableListOld), or save the email files
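
A minimal sketch of the "save before re-running" step above. The function name <code>backup_qa_files</code> and the <code>backup.&lt;timestamp&gt;</code> directory are hypothetical (not part of encodeQaInit), and the sketch assumes an extra subdirectory inside the encodeQa directory is harmless to the script:

```shell
# Hypothetical helper: copy the files that encodeQaInit would replace
# into a timestamped backup directory, so you can still see which tables
# and gbdbs were pushed earlier. File names are taken from the list above.
backup_qa_files() {
  dir=$1
  stamp=$(date +%Y%m%d-%H%M%S)
  mkdir -p "$dir/backup.$stamp" || return 1
  for f in newTables allTables pushFilesMail pushGbdbsMail pushTableMail; do
    if [ -f "$dir/$f" ]; then
      # -p keeps the original modification time for later reference
      cp -p "$dir/$f" "$dir/backup.$stamp/" || return 1
    fi
  done
  # print the backup location for the caller
  echo "$dir/backup.$stamp"
}
```

Run it from the encodeQa directory before re-running encodeQaInit, e.g. <code>backup_qa_files .</code>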
 
===Create a symbolic link===
# From your home directory enter:
ln -s /hive/groups/encode/encodeQa/*database*/*wgEncodeTrackName*/*release* trackName
Example: ln -s /hive/groups/encode/encodeQa/hg19/wgEncodeUwDnase/release5 UwDnase
# Symbolic links allow you to jump quickly from your home directory. For example, by typing: cd UwDnase
# To delete a symbolic link, if you tab complete the name, be sure to delete the final "/" character. For example, "rm UwDnase" will work, while "rm UwDnase/" will not.
 
===email Cricket (claimMail)===
Send the claim email to Cricket and cc Kate. From the command line in the encodeQa directory:
mail -s "claiming trackName (Release #)" -c yourDevLogin -c kate cricket < claimMail
 
(good habit to ''read'' the email before sending)
 
===Review the notes file===
# Review the [[#Notes file]] to familiarize yourself with the components of the release. To find the notes file:
#* use the notes.file sym link in the encodeQa directory
#* or go to /kent/src/hg/makeDb/doc/encodeDcc/__<db>__/
# If this is a subsequent release, see [[#Subsequent Release of Data (e.g. Release 2)]] first.
# Compare the notes file to the hgTrackUi (to make sure it reflects the notes file) on dev (or on beta if dev already has the next release staged).
# If a Release N, compare the hgTrackUi on dev to the previous release's hgTrackUi on the RR to help verify notes file & new hgTrackUi is correct (e.g. make sure things aren't missing from the new release in comparison to the previous release that aren't accounted for in the notes file).
 
===Pre-QA===
Some tracks may have already gone through some preliminary QA, see [[Pre-QA]] for more information.
 
==Check script.output==
Output from qaEncodeTracks.csh which is run by encodeQaInit and does:
* countPerChrom
* check for entry in tableDescriptions table
* check that positional tables are sorted
* checkTableCoords (checks for any illegal coordinates)
Also, run genePredCheck/pslCheck if applicable. (i.e. if your track is a gene prediction track)


==Staging on hgwbeta==
#Make a list of all tables (new & updated that need to be pushed to beta)
===Push /gbdb files (pushGbdbsMail)===
#In trackDb, change 'release alpha' lines to 'release alpha,beta' lines and 'release beta,public' to 'release public' and then check in these changes. (Make sure there are release tags for tracks in super tracks! See http://genomewiki.ucsc.edu/index.php/ThreeStateTrackDb#Existing_super-tracks_MUST_use_release_tags )
Push new and, if applicable, updated /gbdb files (e.g. .wib, .bb, etc.) from hgwdev -> hgnfs1.
#*A quick way to replace these lines in vi is ":#,## s/release alpha/release alpha,beta/" where # = the starting line and ## = the ending line
* Review the pushGbdbsMail, then from the command line in the encodeQa directory:
#If this is a subsequent release (an update to an existing track), you may want to hold off on the next step so that you can compare old and new tracks on hgwdev and hgwbeta:
 
#* Open the track on hgwbeta before staging it to make sure that the update won't cause a cart clash for users currently looking at the track (as evidenced by a completely blank screen, for instance). If you need to do a cartReset to get the track to show up correctly, something is wrong.  
mail -s "push files to hgnfs1 for trackName (Release #)" -c yourDevLogin push-request < pushGbdbsMail
#Do bigPush.csh using list created above
===Open track on beta (if subsequent release)===
#Push any new /gbdb files (e.g. .wib or .bb files) from hgwdev to hgnfs1 if applicable
Open the track on hgwbeta in hgTracks before staging it.<br /> This way, when you [[#Check track on beta|check the track on beta]] (in the last staging step) you'll be able to tell if the update will cause a cart clash for users who happen to be using it when you release it to the RR (as evidenced by a completely blank screen).
#On hgwbeta in trackDb: make beta DBS=dbName
===Push tables to hgwbeta (bigPush.csh)===
Use bigPush.csh '''on hgwdev''' using the newTables file created by encodeQaInit. For example: bigPush.csh mm9 newTables
 
===Run encodeQaPrepareRelease (trackDb release tags and metaDb)===
# Run encodeQaPrepareRelease with "beta" for the stage (tip: run it from the encodeQa directory created by encodeQaInit, and the summary file of metadata changes will be saved there). Running encodeQaPrepareRelease:
## updates the trackDb release tags for the track's include statements appropriately:
##* In /cluster/home/$usr/trackDb/$species/$db/trackDb.wgEncode.ra, finds the include statement for your track's .ra file and changes the 'alpha' tag to an 'alpha,beta' tag and, if applicable (releaseN), changes 'beta,public' to 'public'
##* see the [http://genomewiki.ucsc.edu/index.php/ThreeStateTrackDb Three State TrackDb] page for more info on release tags and our three-state trackDb
## prepares the metadata:
##* from /cluster/home/$usr/trackDb/$species/$db/metaDb:
##* copies metaDb .ra file from ~metaDb/alpha -> metaDb/beta
##* adds .ra file name to the makefile in ~metaDb/beta
## creates a trackName.beta.metaDb.diff file which is a summary of the metadata changes which you can review.
# Do a git status to see what files changed; review changes
# Check-in the changes
##Add commit message such as: git commit -m "Staging Caltech RNA-Seq (Release 1) on beta (redmine #7777)"
# On hgwbeta, make beta DBS=__ from /cluster/home/$usr/trackDb/
 
===Check track on beta===
Check that the track looks good on beta.<br />
If this is a subsequent release, you already had the track open on beta from [[#Open track on beta (if subsequent release)]]. Refresh the page to see the changes.<br /> If you get a blank screen:
# Don't reset your cart (at least not until you've completed these steps!)
# Notify the track wrangler that there is likely a problem with conflicting cart variables when the new data is used with an old cart.
# Dump the cart variables (manipulate the URL to: http://hgwbeta.soe.ucsc.edu/cgi-bin/cartDump then hit enter) and save them in a file for people to look at.
 
==hgTrackUi==
===Functionality (track controls)===
* For more detailed information on how the track controls currently work, please see [[Subtrack_Configuration]].
====Display Modes====
* If in a super-track, by default, composite "Maximum display mode" should be set to dense. Super-track should be set to hide.
* If not in a super-track, by default composite "Maximum display mode" should be set to hide.
* If multiple views, Kate wants these settings by default:
** Peaks -> pack
** Alignments -> hide
** All else -> full
* NOTE: Check with the wrangler in case there are custom visibility settings
* Changing display mode of views should affect the subtrack list & hgTracks
 
====Config Settings of Views====
* settings function correctly
* settings of different views are independent
* Signals, by default, should have the following settings (unless lab has requested otherwise or other good reason):
** Data view scaling: use vertical viewing range (rather than auto-scale)
*** (Pre-QA skip) in dense, default fixed range should result in meaningful banding at full chromosome (not all gray)
** Windowing function: mean + whiskers
 
====Matrix====
* By default, matrix boxes should be fully checked or fully unchecked (not grayed), if not, this is trackDb setting issue that the wrangler should fix.
* Matrix headers:
** For human, Tier 1 and Tier 2 cell lines:
*** should be listed first (Tier 1 in alphabetical order followed by Tier 2 in alphabetical order)
*** should be labeled as Tier 1 or 2. The tier should follow the cell line name, in parentheses and bolded. No hyperlink, no italics, e.g. '''<u>cellLineA</u>''' '''(Tier 1)'''
** matrix headers are links to a working hgEncodeVocab page for the item (cell line, factor, etc)
** +/- buttons function correctly
** selections in matrix result in appropriate selection changes in subtrack list
 
====Subtrack list====
* adjusts according to matrix & view (hide -> non-hide) selections
* 'only selected/visible' and 'all' radio selections function
* sorting functions (clicking on column headings)
* schema links work and have a "description" column
** if the table is very large, there may not be an "info" column
** if the table is merely a reference to a big* filename, there will not be a "description" column
 
====MetaData====
* make sure metaData is present by clicking on the down arrow ('''v''')
* check a few to make sure they have somewhat consistent fields
* spot check a few fields to make sure they make sense
* make sure subtracks have dccAccession numbers (aka UCSC Accession), if none, may not be ready for QA; ask wrangler.
** expId & dccAccession should correspond, dccAccession = wgEncodeE<H or M><expId> (the E=experiment, H=human, M=mouse)
** these #numbers should be the same among subtracks of the same "experiment," even across assemblies of an organism (e.g. same number on hg18 & hg19)
*** NOTE: expID should only be displayed on hgwdev.
** can check to make sure a composite (or single track) has all its expIDs and dccAccessions using experimentify option on mdbPrint:
 
<pre>
mdbPrint <db> -composite=<compositeName> -experimentify
mdbPrint <db> -obj=<trackName> -experimentify
</pre>
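
The expId/dccAccession correspondence described above can also be spot-checked for a single pair. This is a sketch only: <code>accession_matches_expid</code> is a hypothetical helper, and ignoring leading zeros in the numeric part of the accession is an assumption, since the rule above does not say whether the expId is zero-padded:

```shell
# Hypothetical helper: succeed (exit 0) when the accession equals
# wgEncodeE<org><expId>, where org is H (human) or M (mouse).
# Assumption: leading zeros in the numeric part are ignored.
accession_matches_expid() {
  org=$1; exp_id=$2; accession=$3
  prefix="wgEncodeE$org"
  case $accession in
    "$prefix"*) digits=${accession#"$prefix"} ;;
    *) return 1 ;;                      # wrong prefix or wrong organism
  esac
  case $digits in ''|*[!0-9]*) return 1 ;; esac   # must be all digits
  case $exp_id in ''|*[!0-9]*) return 1 ;; esac
  a=$(printf '%s' "$digits" | sed 's/^0*//')
  b=$(printf '%s' "$exp_id" | sed 's/^0*//')
  [ "$a" = "$b" ]
}
```

For example, <code>accession_matches_expid H 123 wgEncodeEH000123 && echo OK</code> (the values here are made up for illustration).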
 
====Links====
* check that all links work, and (Pre-QA skip) where applicable, are [[#Relative Links|relative]]
 
===Content (.html description page)===
====Labels====
* Check labels adhere to Kate's instructions
**Other resources: [http://encodewiki.ucsc.edu/EncodeDCC/index.php/ENCODE_track_settings_style_guide Style Guide] and [http://www.ncbi.nlm.nih.gov/books/NBK988/ The NCBI Style Guide].


===Staging metaDb on hgwbeta===
====Sections====
Starting from /cluster/home/$usr/trackDb/$species/$db/
* Make sure all sections are present, in order, and have the correct headings (the list below has the correct headings and is in the correct order)
#copy metaDb .ra file from ~metaDb/alpha -> metaDb/beta
* Check grammar, spelling, readability, completeness, correctness
#add .ra file name to the makefile in ~metaDb/beta
* style should be consistent with the rest of the site
#From trackDb on hgwbeta: make beta DBS=<$db>
** Description should be in a passive 3rd person voice
** references to "data" are plural
** value and units have space between them (e.g. 50 bp rather than 50bp)
* links should be hyper linked text rather than just plain URLs
* Latin Terminology
** Latin or foreign words or phrases '''should not''' be italicized
** Genus/species names '''should''' be italicized
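
The value/units spacing rule in the list above can be spot-checked with grep before reading the page in full. A rough sketch: <code>find_unspaced_units</code> is a hypothetical name, and the units <code>kb</code> and <code>Mb</code> are additions for illustration (only <code>bp</code> appears in the rule above):

```shell
# Hypothetical helper: print lines (with line numbers) of a description
# .html file where a digit is immediately followed by a unit, e.g. "50bp".
# Matches are candidates only; review each one by eye.
find_unspaced_units() {
  grep -nE '[0-9](bp|kb|Mb)([^[:alnum:]]|$)' "$1"
}
```

An empty result (exit status 1) means no candidates were found.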
=====Description=====
* Brief overall summary of track.
=====Display Conventions and Configuration=====
* Contains info about each view in track
* No description for views only available in downloads
* link to [http://hgwdev.gi.ucsc.edu/goldenPath/help/multiView.html multi-view instructions] if there are multiple viewing options.
* Tracks with Bam alignments (in metadata, fileName will end with ".bam") should have a link to the [http://samtools.sourceforge.net/SAM1.pdf Sam Format Specification] and should explain any non-standard tags, those starting with X, Y or Z or that are not listed in the [http://samtools.sourceforge.net/SAM1.pdf#page=6 tag section]


==Other things to check by hand==
=====Methods=====
# For hg19 check that labels follow this new convention: http://encodewiki.ucsc.edu/EncodeDCC/index.php/ENCODE_track_settings_style_guide
* Make sure it is detailed enough.
# Run genePredCheck/pslCheck if applicable. (i.e. if your track is a gene prediction track)
# make sure there is a link to the help doc (in the config section: "Select views (help)")
# check that metadata is present by clicking on "..." link in tables list on details page, spot check a few to make sure they are correct
# read description page (if the track is part of a super track, make sure to QA the super track description)
#* is it detailed enough, especially Methods
#* Is the text consistent with the rest of the site:
#** Make sure that any references to "data" are plural
#** Make sure that all units have a space between the quantity and the unit, i.e. '''50 bp''' and not 50bp
#* are the citations in correct format ([[CBSE_citation_format]])
#* does the "Display Conventions and Configuration" section cover all track types
#** Tracks with Bam alignments should have a link to the [http://samtools.sourceforge.net/SAM1.pdf#page=4 Sam Format Specification]
#* test all hyper-links
#* releaseN tracks should contain a section called "Release Notes" which should state the release# and provide a description of the changes in that particular release. See [http://genome.ucsc.edu/cgi-bin/hgTrackUi?db=hg18&g=wgEncodeUwDnaseSeq this page] for an example.
#* Check for lab contact (sanitize email addresses using encodeEmail.pl script)
# Release log (look in PushQ): must contain "ENCODE"; usually it is just the shortlabel. If it is a subsequent release it should have "(release #)". If there is something weird about the data that needs to be noted, make sure it fits in nicely with the current release log entries. (url should be of the format: ../../cgi-bin/hgTrackUi?db=hg18&g=wgEncodeAffyRnaChip )
# configuration section (does it work?)
#* Check the views are working & that the settings work
#* Here are some additional specific guidelines when checking the Signal track default settings:
#** auto-scale shouldn't be used unless a lab insists; should be a fixed range
#** mean + whiskers should be the default setting unless there is a good reason
#** check signal in dense view for the whole chrom to make sure the fixed range allows for nice pattern of dark bands (we don't want to see all light gray across), wrangler should fix if need be
#** if there are multiple signal tracks in a track, their settings should be independent
# multi-view config: matrix etc.
#* By default, matrix boxes should be fully checked or fully unchecked (not grayed), if not, this is trackDb setting issue the wrangler should fix.
#* Tier 1 and Tier 2 cell lines should be labeled as such in bold in the matrix.
# check "reset to defaults" button (does it work)
# Make sure there's a link to the ENCODE Data Release Policy (at the bottom of the description page).
# Make sure the tracks in the Tier1 and Tier2 Cell Lines are properly colored (no black, and all tracks from one Cell Line have the same color).
# Tier 1 and 2 cell lines should be displaying first by default when viewed in the browser.
# For supertracks, by default the supertrack control should be on hide, and the tracks within should be on dense


===Testing in the Browser===
=====Verification=====
# test one point from table to view in GB (pick a point which can be obtained by clicking on "schema" from the track configuration page)
* optional
##If they are bam files, the schema will only give you a filePath. You will need to use SamTools to obtain a point to test.
### add /hive/data/outside/samtools/svn_${MACHTYPE}/samtools to $PATH in your .bashrc
### You can run the command line using the fileName found in the schema:
###:samtools view -x filePath chrx:xxxx-xxxxxx | head
###:samtools view -x /data/NT/gbdb/hg18/neandertal/seqAlis/Feld1-hg18.sorted.bam chr1:2000000-3000000 | head
### The output will give you the start position and then it gives you read length. Add the read length to the start position to get the end position. This will give the point needed to put into the browser for testing purposes.
##If they are bigWig files, you can use the utility: bigWigInfo file.bw - make sure they are V4!
##If they are bigBed files, you can use the utility: bigBedInfo file.bb
# If hg19, compare that same point to the equivalent position in hg18 by doing a "convert" from hg19 to see the equivalent position in hg18. In your hg19 window, go back to your region, open new window and paste in hg18 equivalent position and compare hg19 to hg18. (Note: Comparisons to hg18 should be very cursory. Any differences should be noted in the redmine ticket, but not necessarily investigated unless a user also brings up an issue. The thinking behind this is that when there are differences, it is most likely an error with hg18, not hg19, and we are unnecessarily holding up hg19.)
# zoom into base level (at different visibilities)                                                 
# zoom way out 1million bps (at different visibilities)
# searching: should items be searchable                                                             
# default visibility: should this track be on by default?


===Does the data make Sense?===
=====Release Notes=====
# Compare related subtracks of related Views to each other. For example:
* Optional for first release
#* Does the All Signal Raw Signal really seem to consist of the data in the Plus Raw Signal & Minus Raw Signal?
* Required for subsequent releases
#* Do Peaks really represent the high Signal areas of the Signal View subtracks?
* Should start with "Release # (Month Year) of this track...."
# Do the data make sense Biologically? Turn on other tracks to compare. For example:
* Provides a description of the changes of this particular release.
#* RNA-seq data should correlate with the exons in a genes track
=====Credits=====
#* TFBS tracks should correlate with the beginning of gene transcripts
* Must have contact person
* Name is a hyperlink to email
* Email must be sanitized (using encodeEmail.pl script). To check go into your track's .html file and make sure the 'mailto' address is encoded.


===Performance Tests===                                                                                      
=====References=====
# Does the first 'Signal' subtrack pass the chr1 test (chr1 loads in less than one minute)                        
* Correct format, see [[CBSE citation format]]
# Do all views for one experiment pass chr1 test (e.g. Pol2 in GM12878 cells)?                                     
* Alphabetical order
# A user-oriented test would be to test the performance in a gene-size region of the track with just the default-on subtracks (for the Yale track, and many other ENCODE tracks, default-on subtracks will be all experiments in the GM12878 cell lines, Signal view only -- this should be the configuration you see after a cart reset, then turning the overall track vis to full).
* Make sure URLs to references don't contain the rel='nofollow' attribute.
# Note that ENCODE tracks can have any number of subtracks, and will continue to grow with time. We should definitely assure that useful subsets can be displayed in user-friendly time.
* Hyperlink uses PMID
 
=====Publications=====
This is an optional listing of publications that reference or use ENCODE data from this track. Information for this section is provided to us by NHGRI.
* Correct format, see [[CBSE citation format]]
* Alphabetical order
* Hyperlink uses PMID
 
=====Data Release Policy=====
Note: GENCODE Genes tracks are exceptions since GENCODE Genes data are restriction free immediately; see below for the GENCODE Data Release Policy language.
* Standard language, '''Supertrack''' ''-refers to dates as being on track configuration and download page'':
::Data users may freely use ENCODE data, but may not, without prior consent, submit publications that use an unpublished ENCODE dataset until nine months following the release of the dataset. This date is listed in the Restricted Until column on the track configuration page and the download page. The full data release policy for ENCODE is available [http://hgwdev.gi.ucsc.edu/ENCODE/terms.html here].
* Standard language, '''Track''' ''-refers to dates as being above:''
::Data users may freely use ENCODE data, but may not, without prior consent, submit publications that use an unpublished ENCODE dataset until nine months following the release of the dataset. This date is listed in the Restricted Until column, above. The full data release policy for ENCODE is available [http://hgwdev.gi.ucsc.edu/ENCODE/terms.html here].
* '''GENCODE''' Genes language, Track or Supertrack:
::GENCODE data are available for use without restrictions. The full data release policy for ENCODE is available [http://hgwdev.gi.ucsc.edu/ENCODE/terms.html here].
* make sure "here" is a link to the [http://hgwdev.gi.ucsc.edu/ENCODE/terms.html data release policy]
 
====Links====
*Check standard links are present, and, where applicable, are [[#Relative Links|relative]]:
** ENCODE Data policy (in Data Release Policy section)
** help for multi-view (Next to "Select views" in track Control Section in Display Conventions and Configuration section)
** contact email (see [[#Credits]] for more info)
*If there is supplemental data, make sure there is a link and that it points to hgdownload and not hgdownload-test
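
Two of the link checks on the description page (links pointing at hgdownload-test, and rel='nofollow' on reference URLs) are easy to grep for in the track's .html file. A minimal sketch; <code>check_description_links</code> is a hypothetical name, and it does not replace clicking through the links:

```shell
# Hypothetical helper: print lines (with line numbers) of a description
# .html file that mention hgdownload-test or carry rel="nofollow" /
# rel='nofollow'. Exit status 0 means at least one problem line was found.
check_description_links() {
  grep -nE "hgdownload-test|rel=['\"]nofollow['\"]" "$1"
}
```

For example: <code>check_description_links wgEncodeTrackName.html || echo "no problem links found"</code> (the file name is a placeholder).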
 
==hgc details==
Check the following for each view:
===Accuracy of details===
* details that are displayed correspond with the record in the table
===Makes sense===
* table values seem correct
 
===Useful===
* you understand what is being displayed
* internal, non-functioning fields are not displayed (e.g. if all values in a field have "-1" as a placeholder, we shouldn't display that field)
===Complete===
* all useful information is present (there's nothing important that is missing)
===Clear===
* details are presented and labeled clearly
* layout is user friendly
===Links===
* check that all links are relevant, work, and where applicable, are [[#Relative Links|relative]]
* standard links: downloads, metadata, schema
 
==hgTracks==
===Display===
====Views (zoomed in/out)====
* check the display of all Views in all display modes when zoomed in to the base pair level & zoomed out to 1 million bp
====Table coordinates + features====
* an item's coordinates and other display features (exons, etc) display as expected/correctly based on table
** a line from the table for comparing against the display can be obtained from schema or mysql db for regular tables
** for bam files, the schema will only give you a filePath. You will need to use SamTools to obtain a point to test.
**#add /hive/data/outside/samtools/svn_${MACHTYPE}/samtools to $PATH in your .bashrc
**#run samtools on the command line using the fileName found in the schema (see following example). The output will give you the start position and then it gives you read length (in the CIGAR string); if the CIGAR string is simple, e.g. 76M, add the read length, 76, to the start position to get the end position. If the CIGAR string is complicated, e.g. 43S17M494510N16M, just use the actual sequence to determine the length (paste sequence into Word & get word count) and add this to the start position to get the end position. This will give the point needed to put into the browser for testing purposes.
  samtools view -x filePath chrx:xxxx-xxxxxx | head
  samtools view -x /gbdb/hg19/bbi/wgEncodeCshlLongRnaSeqHuvecCellPapAlnRep1.bam chr1:2000000-3000000 | head
:* for big* files, you can't get individual record, but use bigWigInfo or bigBedInfo to get general stats, be sure bigWigs are version 4.
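
As an alternative to counting bases by hand for the bam step above, the end position can be computed from the start position and the CIGAR string. This sketch follows the SAM specification, where the M, D, N, = and X operations consume reference bases, so for a spliced read such as 43S17M494510N16M it returns the span on the genome rather than the sequence length. The function names are hypothetical, and the end is reported as 1-based inclusive (start + span - 1):

```shell
# Hypothetical helper: number of reference bases covered by a CIGAR string.
# Only M, D, N, = and X consume the reference; S, H, I and P do not.
cigar_ref_len() {
  printf '%s\n' "$1" | awk '{
    total = 0; s = $0
    while (match(s, /^[0-9]+[MIDNSHP=X]/)) {
      n  = substr(s, 1, RLENGTH - 1) + 0   # operation length
      op = substr(s, RLENGTH, 1)           # operation letter
      if (op ~ /[MDN=X]/) total += n
      s = substr(s, RLENGTH + 1)           # move to the next operation
    }
    print total
  }'
}

# Hypothetical helper: 1-based inclusive end position of an alignment,
# given its SAM start position (POS) and CIGAR string.
read_end() {
  echo $(( $1 + $(cigar_ref_len "$2") - 1 ))
}
```

Take POS (column 4) and CIGAR (column 6) from a line of samtools view output, e.g. <code>read_end 2000001 76M</code>, then paste chrom:start-end into the browser.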
 
====Searchable====
* Are items searchable; should they be? Most likely not for ENCODE. (position/search box at the top of the browser image)
* Do a quick search of a subtrack in track search (button found at the bottom of the browser image) to make sure that it is interacting correctly.
 
====Colors====
=====Human=====
* Human Tier 1 cell lines should be displayed in a unique color (other than black)
* Human 'original' Tier 2 cell lines (HeLa-S3, HepG2, and HUVEC) should be displayed in a unique color (other than black), but newly promoted Tier 2 cell lines should be displayed in black until a unique color for each is determined by the consortium.
* it is OK if other tracks are in color, but not necessary
 
=====Mouse=====
* For mouse, tissues/cell lines are grouped based on organ systems, and each has a unique color:
*#Skeletal: Dark Purple (102,50,200)
*#Muscular: Brown (139,69,19)
*#Circulatory: Red (153,38,0)
*#Nervous: Grey (105,105,105)
*#Respiratory: Black (empty)
*#Digestive: Orange (230,159,0)
*#Excretory: Purple (204,121,167)
*#Endocrine: Pink (189,0,157)
*#Reproductive: Green (0,158,115)
*#Lymphatic/immune System: Blue (86,180,233)
*#Stem Cell: Dark Blue (65,105,225)
*Visualize the above colors in the browser [http://genome-test.gi.ucsc.edu/cgi-bin/hgTracks?hgS_doOtherUser=submit&hgS_otherUserName=Vsmalladi&hgS_otherUserSessionName=Mouse%20Cell%2FTissue%20Coloring Mouse color Session]
* If a cell line that should be colored is not, the appropriate color for that cell line is listed as a 'color' value in the cv.ra.
 
====Defaults (composite/subtracks)====
* should this composite track be on by default? (For ENCODE, usually no)
* check which subtracks are set as the default selection, make sure:
** there aren't too many
** important cell lines are on by default
* default Tier 1 and Tier 2 subtracks should display first
====Compare to hg18====
* If track is the first release in hg19, compare a point on the hg19 browser of the track to the equivalent position in hg18.
*# use "convert" from hg19 position to see the equivalent position in hg18.
*# go back to your region in hg19, open new window and paste in hg18 equivalent position and compare hg19 to hg18.
* Note: Comparisons to hg18 should be very cursory. Any differences should be noted in the redmine ticket, but not necessarily investigated unless a user also brings up an issue. The thinking behind this is that when there are differences, it is most likely an error with hg18, not hg19, and we are unnecessarily holding up hg19.
 
===Performance===
====Chrom 1 Test (signal & experiment)====
When position is set to all of chromosome 1, data of interest (in full mode) loads in less than 1 minute:
* '''signal''': check time of loading first signal subtrack
* '''experiment''': check time of loading all views for one experiment (e.g. Pol2 in GM12878 cells)?
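
For a rough, repeatable timing to go alongside the in-browser test above, curl can report how long the server takes to return an hgTracks page. This is a sketch under assumptions: <code>time_page</code> and <code>under_limit</code> are hypothetical helpers, the request uses a fresh default cart (so the subtrack of interest is not necessarily on or in full mode), and it measures only the time to fetch the page, so it does not replace loading the track in the browser:

```shell
# Hypothetical helper: print the total time in seconds curl needed to
# fetch a URL (page body is discarded).
time_page() {
  curl -s -o /dev/null -w '%{time_total}\n' "$1"
}

# Hypothetical helper: succeed (exit 0) when SECONDS is below LIMIT.
# Usage: under_limit SECONDS LIMIT
under_limit() {
  awk -v t="$1" -v lim="$2" 'BEGIN { exit !(t + 0 < lim + 0) }'
}

# Example (not run here): whole chr1 on hgwbeta against the 1-minute limit
#   t=$(time_page 'http://hgwbeta.soe.ucsc.edu/cgi-bin/hgTracks?db=hg19&position=chr1')
#   under_limit "$t" 60 && echo "PASS ($t s)" || echo "FAIL ($t s)"
```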
 
====Defaults at Gene Sized Region Test====
Set position to a gene-size region with your track's default subtracks on and the default browser tracks on (easiest to reset cart, turn on track)
* should display quickly and not be "too much" data
 
==Data make sense==
===Compare subtracks within views===
* Do all the subtracks within a view somewhat correlate?
===Compare subtracks of related views===
*For example:
** Does the All Signal Raw Signal subtrack of an experiment really seem to consist of the data in the Plus Raw Signal & Minus Raw Signal?
** Do Peaks really represent the high Signal areas of the Signal View subtracks?
===Do the data make sense Biologically?===
*Turn on other tracks to compare.
** compare to the gene tracks
** compare to subtracks of similar tracks
** For example:
*** RNA-seq data should correlate with the exons in a genes track
*** TFBS tracks should correlate with the beginning of gene transcripts


==Files==
First, a note about finding the files. One of the most time-consuming things we do is track down items that should have been placed in the "Files" section of the pushQ entries but weren't.  It takes us a long time to (a) figure out what's missing, and (b) find it. If developers can ensure that both the /gbdb and /goldenPath files are there, it would be a huge help!
===hgFileUi===
* 'Downloads' links on hgTrackUi should now go to hgFileUi
** if not, ask wrangler to add "fileSortOrder" information to trackDb entry
====file count (notes.file)====
* Check # of files displayed is correct (use notes.file). Pre-QA skip.


===Download File specifics===
*Note: If you are not seeing the correct number of files on beta, first try clearing the cache
#Check the '''Index page''' (e.g. [http://hgdownload-test.cse.ucsc.edu/goldenPath/hg18/encodeDCC/wgEncodeYaleChIPseq this page]), which is created from two files (preamble.html & index.html) in the downloads directory:
**Add the following to the end of the URL in your address bar: '''&clearCache=true'''
#*''preamble.html'' - the top description part of the index page (not the list of files)
 
#**an introductory paragraph with a brief description, and link to the trackUi.
====download button====
#**should include a link to the files.txt and md5sum.txt files
* Make sure download button prompts a download (and doesn't take you to an error page)
#**releaseN: should also include the release number at end of description (e.g. "This is Release 2 of this track. Release notes are included in the track description.")
====useful columns w/ good titles ====
#**If you need to edit the preamble.html (not the list of files), follow these steps:
* Columns are useful
#**#Edit the preamble.html file (note that it is not in git) in the downloads directory (/usr/local/apache/htdocs-hgdownload/goldenPath/$db/encodeDCC/) on hgwdev
* Column titles are correct and make sense (e.g. dccAccession title is "UCSC Accession")
#**#Regenerate index.html by running the script:  encodeDownloadsPage.pl index.html (I think the preamble.html needs to be in the dir where you run the script)
 
#**#Look at results on genome-test. If necessary, go back to step 1.
====sort columns====
#*''index.html'' - the list of files on the index page (not the description on top)
* Check the sorting of columns. Clicking on the title of the column should sort the table on that column.
#**should contain the name of each file being released (and only the name of those files in this release). A good way to spot check this is to make sure the number of files at the bottom of the list is correct.
 
#***new track: use the PushQ file list (shouldn't be files with V2 or V3, etc.) and your best judgment to determine that all the appropriate files (and only those files are listed).
====file filter====
#***releaseN: make sure the right # of files are there. Check some of the removed files to make sure they were in fact removed from the list. If the list doesn't seem right, run the encodeDownloadsPage.pl script (at the prompt type: encodeDownloadsPage.pl index.html) directly in the /releaseN directory (hgwdev: /kent/src/hg/htdocs/goldenPath/<db>/encodeDCC/<trackName>/releaseN) to generate a new index.html page that you know contains all the files in that directory. Then, copy the /releaseN/index.html to the main track directory (hgwdev: /kent/src/hg/htdocs/goldenPath/<db>/encodeDCC/<trackName>/) as that is the index.html used on the site. Don't forget to commit your changes.
* Check the filtering of files
#**make sure this file is executable (because of the way the links are created)
 
#Downloads directory also needs to contain the following:
====links====
#*files.txt - plaintext version of index.html; lists files with metadata
* Check that the "Track Settings" link takes you back to the track's hgTrackUi page
#*md5sum.txt - checksum of all files in download directory
* Check that the navigation, file filter title links, and other links work
#When you are ready to release make sure your track is listed on the [http://genome.ucsc.edu/ENCODE/downloads.html downloads page] - if it isn't listed, go /kent/src/hg/htdocs/ENCODE/downloads.html to add a line for your track, and push the following from hgwdev -> hgwbeta, RR:
* Make sure files.txt & md5Sum.txt links are present and function
  /usr/local/apache/htdocs-hgdownload/ENCODE/downloads.html
* Make sure the download server link goes to the download directory for that track; see [[#download server]] for more info.
Also, if the new track is part of a super-track, when you add the super-track category, please make the title a non-underlined link to the super-track page, for example:
 
  <A style="text-decoration:none" HREF="http://genome.ucsc.edu/cgi-bin/hgTrackUi?db=hg19&g=wgEncodeRnaSeqSuper" TARGET=_blank>RNA-seq</A>
===download server===
* linked from hgFileUi with the *download server* link
* have wrangler remove index.html or preamble.html files from the current release directories if they exist (it is OK in older directories, e.g. release1 if this is a release3).
====README====
* README should be displayed automatically followed by the list of files in the directory
* contains a URL to the track's hgFileUi (you can double check by copying link, pasting in a browser, and changing hgdownload to hgdownload-test).
* there may be more files/directories in here than seen in hgFileUi. This is OK. Because we are not dropping obsolete files, they will be present in this directory. Also, on hgdownload-test there will also be releaseN directories. These are part of the process of preparing a track and are OK. These, however, *won't* be pushed to the hgdownload upon release of the track.
 
==Release to RR==
Note: Cc the data wrangler for this track on all your pushes and Cc encode-staff@soe.ucsc.edu on your final push.
===Push downloads (pushFilesMail)===
* Push download files, index.html, files.txt and md5sum.txt (from hgwdev to hgdownload):
 
mail -s "push download files for trackName (Release #)" -c wranglerDevLogin push-request < pushFilesMail
 
* If track has supplemental data (files linked from the description pages in a /supplemental directory), make sure the pushFilesMail includes them (if they're in the notes.file, they should be in the pushFilesMail).
* Notice that even though there is a releaseN directory on hgwdev, it is not pushed to a releaseN directory on hgdownload (see [[#Download files]] for specifics, and note about '''ReleaseLatest''')
* Note, this push can take hours, so you may want to start the day before you want to release.
* NOTE: If you are manually requesting this push be sure '''not to push the releaseLatest directory''', only the files from there.  Add a useful sentence to tip off admin in your push-request.
 
===Verify all downloads pushed (encodeQaCheckHgdownloadFiles)===
#Run encodeQaCheckHgdownloadFiles to make sure all files got pushed to hgdownload
#* use 'checkPushFilesList' as the list of files (created by encodeQaInit and in the encodeQa directory for the track/release)
# Run encodeQaCheckHgdownloadFiles to make sure all files got pushed to the hgdownload San Diego (hgdownload-sd)
 
'''to check hgdownload''' (hgdownload is the default server, so it doesn't need to be specified):
  encodeQaCheckHgdownloadFiles db trackname checkPushFilesList
 
'''to check hgdownload-sd:'''
  encodeQaCheckHgdownloadFiles db trackname checkPushFilesList -s hgdownload-sd
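
If you ever need to compare the expected list against a directory listing by hand, the idea behind the check can be sketched as below. This is only an illustration and does not replace encodeQaCheckHgdownloadFiles: the function name missing_files is hypothetical, and it assumes both inputs are plain text files with one file name per line (the real format of checkPushFilesList may differ).

```shell
# Hypothetical helper: print entries of an expected file list that are
# absent from an actual listing. Both arguments are text files with one
# entry per line (an assumption about the list format).
missing_files() {
    local exp act
    exp=$(mktemp) && act=$(mktemp) || return 1
    sort -u "$1" > "$exp"
    sort -u "$2" > "$act"
    comm -23 "$exp" "$act"    # lines that appear only in the expected list
    rm -f "$exp" "$act"
}
```

An empty result means every expected entry was found in the listing.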
 
===Run encodeQaPrepareRelease (trackDb release tags & metaDb)===
# Check metaDb/alpha vs. beta to make sure you have most updated metadata (diff beta/<trackName.ra> alpha/<trackName>.ra) or try the qaRaDiff script (soon to be renamed raDiff).
#* if diffs are due to next release, don't copy to beta, if diffs are for current release, copy to beta & double check in hgTrackUi, etc.
# Run encodeQaPrepareRelease with "public" as the stage, which:
## updates the '''release tags''' in trackDb.wgEncode.ra (of the $db directory): removes any release tags from the current <trackName>.releaseN.ra include line (or can be <trackName>.ra if 1st release)
## updates the '''metaDb''', from /cluster/home/$usr/trackDb/$species/$db/metaDb:
### copies metaDb <trackName>.ra file from ~metaDb/beta -> metaDb/public
### adds <trackName>.ra file name to the make file in ~metaDb/public (if a first release)
# Review changes
# Check-in changes
# Announce intent to make public on db
# '''If you have changes to cv.ra''', you will need to push a copy from beta to public.  See [http://redmine.soe.ucsc.edu/issues/11601 RM#11601]
# On hgwbeta, make public DBS=$db from /cluster/home/$usr/trackDb/
 
===Check track on beta-public===
Check track on [http://hgwbeta-public.soe.ucsc.edu hgwbeta-public]
* hgwbeta-public uses hgwbeta for the tables, but uses the CGIs that are on the RR.
 
===Push tables + trackDb&friends (pushTableMail)===
*Push the tables from hgwbeta -> mysqlrr and trackDb & friends:
 
mail -s "push tables and trackDb & friends for trackName (Release #)" -c encode-staff push-request < pushTableMail
 
* no longer have to push tableDescriptions because it gets pushed out once a week by a designated QA'er
 
===Check track on RR===
Once all pushes complete, check track on RR
 
===Set subIds to Released on hgwdev===
Set the ENCODE status of the subIds (listed in the 'subIds' file of the encodeQa directory created by encodeQaInit) for this release to "Released":
* '''on hgwdev''' from the bash shell command line enter this 'for loop':
 
for i in $(cat filenamewithpath); do encodeStatus.pl $i released; done
 
or, if you need to change just a few subIds, you can enter the individual subIds with spaces in between them instead of doing the cat fileName:
 
for i in subIdsWithSpacesSeparating; do encodeStatus.pl $i released; done
 
For example:
 
for i in 2007 3113; do encodeStatus.pl $i released; done
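
If the subIds file might contain blank lines or a missing final newline, a slightly more defensive version of the loop can help. The sketch below is an assumption, not one of the QA scripts: set_released and STATUS_CMD are hypothetical names, and by default it only prints the commands it would run (a dry run).

```shell
# Hypothetical helper: run a status command once per subId listed in a file.
# STATUS_CMD defaults to a dry run that echoes the command line;
# set STATUS_CMD=encodeStatus.pl to actually apply the change.
set_released() {
    local file=$1 id
    # "|| [ -n "$id" ]" keeps a last line that lacks a trailing newline
    while read -r id _ || [ -n "$id" ]; do
        [ -n "$id" ] || continue    # skip blank lines
        ${STATUS_CMD:-echo encodeStatus.pl} "$id" released
    done < "$file"
}
```

Review the dry-run output first, then re-run with STATUS_CMD=encodeStatus.pl set_released filenamewithpath.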
 
===Close Redmine ticket===
* If there aren't any lingering issues, close the ticket by setting the status to "Released".
 
===Run encodeQaSqlRelease on hgwbeta===
On '''hgwbeta''', encodeQaSqlRelease creates a pushQ entry directly in the L queue of the Main pushQ so the ENCODE track will have an entry in the release log.
* The Release Log field of the entry created:
** should be the long label (or short if too long) and, if releaseN, release number in parentheses
** should contain ENCODE (or it won't show up on ENCODE downloads page)
* The Release Log URL:
** should be a relative link with the db specified (e.g. ../../cgi-bin/hgTrackUi?db=hg19&g=wgEncodeRikenCage)
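
The two rules above can be spot-checked from the command line. This is a rough sketch under stated assumptions: label_ok and url_ok are hypothetical names, "relative" is taken to mean "has no scheme such as http://", and the label in the usage note is made up for illustration.

```shell
# Hypothetical check: succeeds when the Release Log label mentions ENCODE.
label_ok() {
    case $1 in
        *ENCODE*) return 0 ;;
        *) return 1 ;;
    esac
}

# Hypothetical check: succeeds when the URL has no scheme (so it is
# relative) and carries a db= parameter.
url_ok() {
    case $1 in
        *://*) return 1 ;;                  # absolute URL with a scheme
        *'?db='?*|*'&db='?*) return 0 ;;    # relative and names a db
        *) return 1 ;;
    esac
}
```

For example, url_ok '../../cgi-bin/hgTrackUi?db=hg19&g=wgEncodeRikenCage' succeeds, while the same link written with a leading http://genome.ucsc.edu fails.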
 
===ENCODE downloads (htmlDownloadSnippet)===
Check ENCODE downloads page ([http://hgwdev.gi.ucsc.edu/ENCODE/downloads.html human] | [http://hgwdev.gi.ucsc.edu/ENCODE/downloadsMouse.html mouse]), if your track isn't there:
# add it by copying the text in the htmlDownloadSnippet file and adding it to /kent/src/hg/htdocs/ENCODE/downloads.html under the appropriate group in alphabetical order.  The snippet may be added to a different location, for example downloadsMouse.html instead of downloads.html, if you were working on mouse.
# if necessary, also add its super-track:
#* super-track title should be a non-underlined link to the super-track hgTrackUi, for example: <br /> <nowiki><A style="text-decoration:none" HREF="http://genome.ucsc.edu/cgi-bin/hgTrackUi?db=hg19&g=wgEncodeRnaSeqSuper" TARGET=_blank>RNA-seq</A></nowiki>
#test your changes in your sandbox and dev by doing a make update and make alpha in the appropriate location: /cluster/home/*yourhome*/kent/src/hg/htdocs
#*do a make beta from hgwbeta and ensure changes, check them in.
# request push of the appropriate file (depending on if your track is in mouse or human, for example with subject line: "please push ENCODE mouse downloads static page") from hgwbeta -> RR:
/usr/local/apache/htdocs/ENCODE/downloads.html
 
OR
 
/usr/local/apache/htdocs/ENCODE/downloadsMouse.html
 
==Other info==
===Subsequent Release of Data (e.g. Release 2)===
 
Periodically, released ENCODE tracks will be augmented with new data as labs complete experiments on new cell lines, etc. The new data will come in various formats: some will replace existing data, some will be brand new, some old data will be eliminated, etc.
 
====Notes file====
The data wrangler will create a notes file using the encodeMkChangeNotes script, check it into git, and place it here: kent/src/hg/makeDb/doc/encodeDcc$db/*.txt
 
This document should contain complete lists of each table and file and what its disposition is. The tables and files will fall into categories similar to this:
* A) Untouched - are on public browser and should remain
* B) Deprecated - are currently on RR but will no longer be needed and should not be referenced by the public site.
:: NOTE: NO FILES SHOULD BE REMOVED from the downloads directory on hgdownloads (RR).
::This list is provided for completeness.
* C) New - are only currently on test but will need to be pushed to the RR.
* D) Additional items of note
 
This document may not match reality. It may be the case that some of the tables/files do not exist at all, the names are incorrect, they are not present on the machine as listed in the file, they do not match the list that is in the pushQ. The first challenge in QAing a subsequent ENCODE release is to determine if/how the notes file diverges from reality. To do this, compare the file to the "snapshot" of what is included in the release (and what you should QA), which can be found in the "release2" list in the downloads directory (hgwdev: /data/apache/htdocs/goldenPath/<db>/<trackName>/release<x>/). If the file differs from the downloads directory, then send that information to the data wrangler and pop the track into the B-queue while they sort it out. Otherwise, QA spends far too much time figuring out exactly what they are expected to QA.
 
Once the list is finalized, proceed with the QA work as outlined above. Note the additional steps in the #Files section for how to handle the /releaseN directory.
 
====MetaDb changes====
There also may be some metadata changes to fix errors or add information such as the GEO Series and GEO Sample.
* Be sure to QA these metadata changes
* Check for related redmine issues (addition of GEO accessions should definitely have a related issue for the addition of this info)
* You can also do a diff to check for any other metadata changes. From kent/src/hg/makeDb/trackDb/<org>/<db>/metaDb, do the following diff:
 
diff beta/wgEncodeYourTrack alpha/wgEncodeYourTrack
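
If you run this diff often, a small wrapper that interprets diff's exit status can make the result harder to misread. The sketch below is an assumption rather than an existing QA script; metadb_diff is a hypothetical name and the messages are illustrative.

```shell
# Hypothetical helper: compare a track's metaDb file between two stages.
# diff exits 0 when the files match, 1 when they differ, >1 on error.
metadb_diff() {
    local beta=$1 alpha=$2 status=0
    diff -u "$beta" "$alpha" || status=$?
    case $status in
        0) echo "no metadata changes" ;;
        1) echo "metadata differs: review the changes above" ;;
        *) echo "diff failed (missing file?)" >&2 ;;
    esac
    return $status
}
```

From the metaDb directory it would be called as: metadb_diff beta/wgEncodeYourTrack alpha/wgEncodeYourTrack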


===Pushing Files===
Pushing the three main types of files involved in ENCODE tracks.


*gbdb Files
====gbdb files====
Files of this form get pushed hgwdev -> hgnfs1
Files of this form get pushed hgwdev -> hgnfs1
  /gbdb/hg18/wib/wgEncode*.wib


*Other Files
/gbdb/hg18/wib/wgEncode*.wib
Files of this form get pushed hgwdev -> hgwbeta & RR (they are quite often accidentally omitted from the pushQ entry -- you will need these types of protocol PDF files, if this is the first subtrack released for this cell line from this lab)
 
  /usr/local/apache/htdocs/ENCODE/protocols/cell/*.pdf
====Download files====
 
Download files for an original release get pushed hgdownload-test on hgwdev -> hgdownload (list the entire file path as usual)


*Download Files
/usr/local/apache/htdocs-hgdownload/goldenPath/hg18/encodeDCC/wgEncode*/index.html
Download files for an original release get pushed hgdownload-test -> hgdownload (list the entire file path as usual)
/usr/local/apache/htdocs-hgdownload/goldenPath/hg18/encodeDCC/wgEncode*/wgEncode*.[bed/wig].gz
  /usr/local/apache/htdocs-hgdownload/goldenPath/hg18/encodeDCC/wgEncode*/index.html
  /usr/local/apache/htdocs-hgdownload/goldenPath/hg18/encodeDCC/wgEncode*/wgEncode*.[bed/wig].gz


When pushing download files for a subsequent release track (e.g. release 2), push files as follows (but in your request, list the from/to paths at the top followed by a list of the file names without the full path)
When pushing download files for a subsequent release track (e.g. release 2), push files as follows (but in your request, list the from/to paths at the top followed by a list of the file names without the full path)


  from '''hgwdev''': /usr/local/apache/htdocs-hgdownload/goldenPath/<db>/encodeDCC/<trackName>/releaseN/
from hgwdev: /usr/local/apache/htdocs-hgdownload/goldenPath/<db>/encodeDCC/<trackName>/releaseN/
  to '''hgdownload''': /usr/local/apache/htdocs/goldenPath/<db>/encodeDCC/<trackName>/
to hgdownload: /usr/local/apache/htdocs/goldenPath/<db>/encodeDCC/<trackName>/
  (''Note'': no releaseN directory on '''hgdownload''')
(Note no releaseN directory on hgdownload.  FYI: releaseLatest is symlinked to the newest releaseN.  It should be OK to push from releaseLatest.)
 
* Once the files have been pushed you can check to see if the push was successful using this script: checkPushedFiles.csh
 
NOTE:  Although these files are logically in the htdocs-hgdownload directory in the apache document root, the physical location of the downloads directories is in the analysis/ftp path on the hive (to make pre-release data available to ENCODE analysts by FTP).  The symlink pattern is:
 
/usr/local/apache/htdocs-hgdownload/goldenPath/<rel>/encodeDCC ->
/hive/groups/encode/dcc/pipeline/downloads/<rel>


* After Pushing Files:
====Other files====
Once the files have been pushed you can check to see if the push was successful using this script:
checkPushedFiles.csh


===validateFiles===
Files of this form get pushed hgwbeta -> RR. Because they used to be omitted from the pushQ entry often, the directories containing these files are now pushed weekly by Katrina on Fridays, so QAers no longer have to worry about pushing these. They are not in source control, so they usually go out well ahead of the track.
*''No longer run, here are Tim's comments about QA running validateFiles:'' "To get these things through the pipeline, we run them through validateFiles, so I think your running them through again is one time too many. But if you are going to, then each lab and each file type may have negotiated limits (which may change between submissions).  These limits are found in the relevant submission directory DAF files."
 
* Old validateFiles process:
  /usr/local/apache/htdocs/ENCODE/protocols/cell/*.pdf
Test a smattering of different file types using this tool: '''validateFiles''' (type the program name without arguments to see the usage statement). If there are no errors, there will be no output.  For example, for files of type tagAlign, invoke the tool like this:
  validateFiles -type=tagAlign -genome=/gbdb/hg18/hg18.2bit /usr/local/apache/htdocs/goldenPath/hg18/encodeDCC/wgEncodeHudsonalphaChipSeq/wgEncodeHudsonalphaChipSeqAlignmentsRep1Gm12878Control.tagAlign.gz


For tagAligns there are several relevant validateFiles options:
===Relative Links===
  mismatches - frequently 2 but negotiated for each lab. Set this to 5 to be tolerant
In html on our site, you can create relative links (on dev, the link goes to the page on dev, on beta, it goes to beta, etc.) by using part of the path based on your file's location in the source tree relative to the location of the file or cgi you are linking to.
  matchFirst - negotiated. You should set this to 25 and even then you may need to adjust it
<br />For example from ~trackDb/human/hg19, here is how you point to:
  nMatch - negotiated, but you should always have this parameter set.
* a golden path file:
  <A HREF="../../goldenPath/help/multiView.html" TARGET=_BLANK>here</A>.
* cgi:
  <A TARGET=_BLANK HREF="/cgi-bin/hgEncodeVocab?type=cellType">cell lines</A>
* ENCODE protocols:
<A HREF="../../ENCODE/protocols/cell">ENCODE cell culture protocols </A>.
* ENCODE portal:
  <A TARGET=_BLANK HREF="/ENCODE/index.html">Encyclopedia of DNA Elements (ENCODE) Project</A>
* ENCODE data release policy:
<A TARGET=_BLANK HREF="../ENCODE/terms.html">here</A>


If you want to be exact, then the metadata as seen on the downloads page tells which submission directory the file belongs to, and the most recent *.DAF (or *.daf) will have a validationSettings line in it which will include the settings that belong to each file type. Example:
==Old info==
  /hive/groups/encode/dcc/pipeline/encpipeline_prod/773/UtaChIPseqBOonlyV1.DAF
===File Validation===
has the line:
* No longer run, here are Tim's comments about QA running validateFiles: "To get these things through the pipeline, we run them through validateFiles, so I think your running them through again is one time too many. But if you are going to, then each lab and each file type may have negotiated limits (which may change between submissions). These limits are found in the relevant submission directory DAF files."
  validationSettings allowReloads;validateFiles.tagAlign:mmCheckOnInN=100,mismatches=3,nMatch,matchFirst=25
* Old validateFiles process:


This means that the tag aligns were validated with -mismatches=3 -nMatch -matchFirst=25
Test a smattering of different file types using this tool: validateFiles (type the program name without arguments to see the usage statement). If there are no errors, there will be no output. For example, for files of type tagAlign, invoke the tool like this:


==Subsequent Release of Data (e.g. Release 2)==
validateFiles -type=tagAlign -genome=/gbdb/hg18/hg18.2bit /usr/local/apache/htdocs/goldenPath/hg18/encodeDCC/wgEncodeHudsonalphaChipSeq/wgEncodeHudsonalphaChipSeqAlignmentsRep1Gm12878Control.tagAlign.gz
Periodically, the existing ENCODE tracks will be augmented with new data as labs complete experiments on new cell lines, etc.  The new data will come in various formats: some will replace existing data, some will be brand new, some old data will be eliminated, etc.


===Notes file===
For tagAligns there are several relevant validateFiles options:


The data wrangler will create a text document, check it into git, and place it here: kent/src/hg/makeDb/doc/encodeDccHg18/*.txt
  mismatches  - frequently 2 but negotiated for each lab.  Set this to 5 to be tolerant
  matchFirst - negotiated.  You should set this to 25 and even then you may need to adjust it
  nMatch - negotiated, but you should always have this parameter set.


This document should contain complete lists of each table and file and what its disposition is.  The tables and files will fall into categories similar to this:
If you want to be exact, then the metadata as seen on the downloads page tells which submission directory the file belongs to, and the most recent *.DAF (or *.daf) will have a validationSettings line in it which will include the settings that belong to each file type. Example:
*A) Untouched - are on public browser and should remain
*B) Deprecated - are currently on RR but will no longer be needed and should not be referenced by the public site.
**NOTE: NO FILES SHOULD BE REMOVED from the downloads directory on hgdownloads (RR).
**This list is provided for completeness. Any files marked here as in gbdb may be eliminated.
*C) New - are only currently on test but will need to be pushed to the RR.
*D) Additional items of note


This document may not match reality.  It may be the case that some of the tables/files do not exist at all, the names are incorrect, they are not present on the machine as listed in the file, they do not match the list that is in the pushQ.  The first challenge in QAing a subsequent ENCODE release is to determine if/how the file diverges from reality.  '''To do this, compare the file to the "snapshot" of what is included in the release (and what you should QA), which can be found in the "release2" list in the downloads directory (hgwdev:  /data/apache/htdocs/goldenPath/<db>/<trackName>/release<x>/). If the file differs from the downloads directory, then send that information to the data wrangler and pop the track into the B-queue while they sort it out'''.  Otherwise, QA spends far too much time figuring out exactly what they are expected to QA.
  /hive/groups/encode/dcc/pipeline/encpipeline_prod/773/UtaChIPseqBOonlyV1.DAF


Once the list is finalized, proceed with the QA work as outlined above.  Note the additional steps in the [[#Files]] section for how to handle the /releaseN directory.
has the line:


===MetaDb changes===
  validationSettings allowReloads;validateFiles.tagAlign:mmCheckOnInN=100,mismatches=3,nMatch,matchFirst=25


There also may be some metadata changes to fix errors or add information such as the GEO Series and GEO Sample.
This means that the tag aligns were validated with -mismatches=3 -nMatch -matchFirst=25
* Be sure to QA these metadata changes
* Check for related redmine issues (addition of GEO accessions should definitely have a related issue for the addition of this info)
* You can also do a diff to check for any other metadata changes. From kent/src/hg/makeDb/trackDb/<org>/<db>/metaDb, do the following diff:
diff beta/wgEncodeYourTrack alpha/wgEncodeYourTrack


==Releasing to RR==
Note: Cc the data wrangler for this track on all your pushes and Cc encode@soe.ucsc.edu on your final push.
# Check release log field in PushQ...needs to start with ENCODE
# If this is a first release, skip this step and go straight to Step#3. If this is a subsequent release, do the following:
#* Remove the 'release public' block (including sub-blocks) of your track from trackDb.wgEncode.ra.
#* Remove the 'release alpha,beta' lines from the release alpha blocks (including sub-blocks), and then on parent and view-in-the-middle blocks if applicable:
#**also note: removeAlphas script may not run if your table list file has lines that begin with a tab/space (remove these in vi with :%s/^ *//)
##git pull
##run removeAlphaBetas script (> to a file)
##diff between file & trackDb.wgEncode.ra (diff file1 file2)
##*double check # of release alpha,betas matches # of tables (diff file1 file2 | grep release | wc --lines)
##copy file over trackDb.wgEncode.ra
##git diff to check new copy against repository copy
##If necessary, remove release alphas from the parent block and "view-in-the-middle" sub-blocks and git diff again
##make alpha on db to make sure everything looks good on dev
##commit change
##git diff again to make sure they are the same
##remove file
##from trackDb on hgwbeta: make beta DBS=<db>
#Releasing the metaDb (most tracks will require this)
#* Starting from /cluster/home/$usr/trackDb/$species/$db/
## copy metaDb .ra file from ~metaDb/beta -> metaDb/public
## add .ra file name to the makefile in ~metaDb/public
# Do a make public DBS=<db> (from trackDb on hgwbeta) and announce it to QA
# If this is a subsequent release, check track on http://hgwbeta-public.cse.ucsc.edu
# Run comparePublic.csh to check differences between trackDb_public and RR and hgwbeta.
# Push track tables from mysqlbeta -> mysqlrr (not trackDb_public yet)
# Drop tables from hgwbeta that need to be removed (being replaced by V# tables)
# Drop tables being removed from the RR
# Push trackDb and friends ([[Pushing_trackDb]]) and tableDescriptions  from mysqlbeta -> mysqlrr
# Push download files, index.html, files.txt and md5sum.txt (from hgwdev to hgdownload)
#* If this is a releaseN, even though there is a releaseN directory on hgwdev, do not create one on hgdownload (see the Download Files section of [[#Pushing Files]] for specifics)
# Drop .wib files that need to be dropped (from hgnfs1)
# Check the [http://genome.ucsc.edu/ENCODE/downloads.html ENCODE/downloads.html] page to see if your track is listed. If not (mostly for first releases), edit and push ENCODE/downloads.html from hgwdev -> hgwbeta & RR (a special Encode download release log)
# This step is now done when you request a push of trackDb and friends so this step can be skipped: Push cv.ra file (only if there is a matrix) located here: /usr/local/apache/cgi-bin/encode
# Click "push requested" in the pushQ record and then click "done" after verification on the RR. Then transfer pushQ entry from the L queue to the main pushQ.


[[Category:Browser QA]]
[[Category:Browser QA]]
[[Category:Browser QA ENCODE]]
[[Category:Browser QA ENCODE]]

Latest revision as of 23:46, 4 June 2019

Getting Started

Note: see the Old ENCODE QA page for the pre-bootcamp ENCODE QA process

Claim track (redmine)

  1. Look at the Redmine PushQ query in the Redmine ENCODE project.
    • In general, select the track with the highest priority. If two tracks have the same priority, take the one with the soonest due date. Also take into account how many other tracks that wrangler currently has in "Reviewing" status (under active QA).
  2. Change Assignee to yourself
  3. Change Redmine status from "Approved" to "Reviewing"
  4. Add yourself (and the wrangler) as a watcher to the redmine ticket (so if you assign it back to the developer you will get updates)

Note: QA no longer uses the '% Done' field in the Redmine Issue to estimate QA progress. This is now just for wrangler use. Also, you may want to review the Redmine notes for the previous release, found under the "Related issues" section at the top of the current Redmine ticket.

Run encodeQaInit on hgwdev

Running this script on hgwdev makes preparations for QA (run the script without any arguments to see the usage):

The script:

  • runs qaEncodeTracks script
  • performs a verification of the notes file
  • sets the release status of all the subIds for the current release to "Reviewing" (no longer need to ask the wrangler to do this)

The script creates:

  • directory in /hive/groups/encode/encodeQa/hg19/ for that track and release (the subsequent items are in this directory)
    e.g. /hive/groups/encode/encodeQa/hg19/wgEncodeRikenCage/release2
  • allTables - list of all tables for this release
  • beta.mdb.ra
  • checkPushFilesList - lists files that should be pushed to hgdownload; use to verify all files were successfully pushed to hgdownload, see Verify all downloads pushed section
  • claimMail - email for Kate about claiming track
  • downloads (sym link) - links to downloads directory of the track
  • htmlDownloadSnippet - html for adding track to the appropriate ENCODE downloads page (human or mouse)
  • fullFilesListNoRevoked - list of all files downloadable through hgFileUi
  • methods - created from this ENCODE QA wiki page index to be used as a checklist and notes file for QA
  • newTables - list of new tables for this release
  • notes.file (sym link) - links to the notes file for a track describing what this release consists of (created by encodeMkChangeNotes)
  • pushFilesMail - email to send for pushing the download files (releasing)
  • pushGbdbsMail - email to send for pushing the gbdb files (staging)
  • pushTableMail - email to send for pushing the tables and trackDb & friends (releasing)
  • release.sql - sql commands to create the releaseLog (after releasing the track)
  • script.output - output from qaEncodeTracks script
  • subIds - list of all the subIds involved in this release

Re-running encodeQaInit:

If the release changes in a way that changes the notes.file (tables, files, or gbdbs are added, removed, etc.), then the push*Mail, tableList, etc. will no longer be accurate. In some cases, it might be best to re-run encodeQaInit so that these files don't have to be fixed by hand.

Details:

  • methods will not be replaced if it exists already
  • release.sql will not be replaced, but it will be updated:
    • lastdate = last time encodeQaInit was run (updates every time encodeQaInit is run)
    • initdate = date encodeQaInit was first run (not ever overwritten)
  • all other files will be replaced
  • if you want to be able to refer back and see what tables and gbdbs you pushed, before running the script, rename newTables (e.g. tableListOld), or save the email files

Create a symbolic link

  1. From your home directory enter:
ln -s /hive/groups/encode/encodeQa/*database*/*wgEncodeTrackName*/*release* trackName

Example: ln -s /hive/groups/encode/encodeQa/hg19/wgEncodeUwDnase/release5 UwDnase

  1. Symbolic links allow you to quickly jump to the directory from your home directory. For example, by typing: cd UwDnase
  2. To delete a symbolic link, if you tab complete the name, be sure to delete the final "/" character. For example, "rm UwDnase" will work, while "rm UwDnase/" will not.
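
The link and unlink steps above can be wrapped in two small helpers. This is only a sketch: link_release, unlink_release and the QA_ROOT override are hypothetical names, and the default root is the path given in this section.

```shell
# Link a release's encodeQa directory into the current directory.
# link_release is a hypothetical helper; QA_ROOT defaults to the path
# used on this page and can be overridden (e.g. for testing).
link_release() {
    db=$1 track=$2 release=$3 name=$4
    root=${QA_ROOT:-/hive/groups/encode/encodeQa}
    target="$root/$db/$track/$release"
    if [ ! -d "$target" ]; then
        echo "no such directory: $target" >&2
        return 1
    fi
    ln -s "$target" "$name"
}

# Remove the link itself. Stripping a trailing "/" avoids the
# tab-completion problem described above.
unlink_release() {
    rm "${1%/}"
}
```

For example: link_release hg19 wgEncodeUwDnase release5 UwDnase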

email Cricket (claimMail)

Email Cricket and cc Kate. From the command line, in the encodeQa directory:

mail -s "claiming trackName (Release #)" -c yourDevLogin -c kate cricket < claimMail

(good habit to read the email before sending)

Review the notes file

  1. Review the #Notes file to familiarize yourself with the components of the release. To find the notes file:
    • use the notes.file sym link in the encodeQa directory
    • or go to /kent/src/hg/makeDb/doc/encodeDcc/__<db>__/
  2. If this is a subsequent release, see #Subsequent Release of Data (e.g. Release 2) first.
  3. Compare the notes file to the hgTrackUi (to make sure it reflects the notes file) on dev (or on beta if dev already has the next release staged).
  4. If this is a Release N, compare the hgTrackUi on dev to the previous release's hgTrackUi on the RR to help verify that the notes file & new hgTrackUi are correct (e.g. make sure nothing is missing from the new release, compared to the previous release, that isn't accounted for in the notes file).

Pre-QA

Some tracks may have already gone through some preliminary QA, see Pre-QA for more information.

Check script.output

Output from qaEncodeTracks.csh which is run by encodeQaInit and does:

  • countPerChrom
  • check for entry in tableDescriptions table
  • check that shortLabel does not exceed 17 characters
  • check that longLabel does not exceed 80 characters
  • check that there are no underscores in the table names
  • check for indices on the tables
  • check that positional tables are sorted
  • checkTableCoords (checks for any illegal coordinates)

Also, run genePredCheck/pslCheck if applicable (e.g. genePredCheck if your track is a gene prediction track).
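
The label-length and table-name limits listed above are easy to spot-check by hand when a wrangler proposes a new label. The following is a sketch only; check_label and check_table_name are hypothetical helpers and are not part of qaEncodeTracks.csh.

```shell
# Report a label that exceeds its limit (17 for shortLabel, 80 for
# longLabel, per the list above). Returns 1 when the label is too long.
check_label() {
    kind=$1 label=$2 max=$3
    len=${#label}
    if [ "$len" -gt "$max" ]; then
        echo "$kind too long ($len > $max): $label"
        return 1
    fi
}

# Report a table name that contains an underscore.
check_table_name() {
    case $1 in
        *_*) echo "underscore in table name: $1"; return 1 ;;
    esac
}
```

For example: check_label shortLabel "UW DNaseI HS" 17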

Staging on hgwbeta

Push /gbdb files (pushGbdbsMail)

Push new and, if applicable, updated /gbdb files (e.g. .wib, .bb, etc.) from hgwdev -> hgnfs1.

  • Review the pushGbdbsMail, then from the command line in the encodeQa directory:
mail -s "push files to hgnfs1 for trackName (Release #)" -c yourDevLogin push-request < pushGbdbsMail

Open track on beta (if subsequent release)

Open the track on hgwbeta in hgTracks before staging it.
This way, when you check the track on beta (in the last staging step) you'll be able to tell if the update will cause a cart clash for users who happen to be using it when you release it to the RR (as evidenced by a completely blank screen).

Push tables to hgwbeta (bigPush.csh)

Use bigPush.csh on hgwdev with the newTables file created by encodeQaInit. For example: bigPush.csh mm9 newTables

Run encodeQaPrepareRelease (trackDb release tags and metaDb)

  1. Run encodeQaPrepareRelease with "beta" for the stage (tip: run it from the encodeQa directory created by encodeQaInit, and the summary file of metadata changes will be saved there). Running encodeQaPrepareRelease:
    1. updates the trackDb release tags for the track's include statements appropriately:
      • In /cluster/home/$usr/trackDb/$species/$db/trackDb.wgEncode.ra, finds the include statement for your track's .ra file and changes the 'alpha' tag to 'alpha,beta' and, if applicable (releaseN), changes 'beta,public' to 'public'
      • see the Three State TrackDb page for more info on release tags and our three-state trackDb
    2. prepares the metadata:
      • from /cluster/home/$usr/trackDb/$species/$db/metaDb:
      • copies metaDb .ra file from ~metaDb/alpha -> metaDb/beta
      • adds .ra file name to the makefile in ~metaDb/beta
    3. creates a trackName.beta.metaDb.diff file which is a summary of the metadata changes which you can review.
  2. Do a git status to see what files changed; review changes
  3. Check-in the changes
    1. Add commit message such as: git commit -m "Staging Caltech RNA-Seq (Release 1) on beta (redmine #7777)"
  4. On hgwbeta, make beta DBS=__ from /cluster/home/$usr/trackDb/

Check track on beta

Check that the track looks good on beta.
If this is a subsequent release, you already had the track open on beta from #Open track on beta (if subsequent release). Refresh the page to see the changes.
If you get a blank screen:

  1. Don't reset your cart (at least not until you've completed these steps!)
  2. Notify the track wrangler that there is likely a problem with conflicting cart variables when the new data is used with an old cart.
  3. Dump the cart variables (manipulate the URL to: http://hgwbeta.soe.ucsc.edu/cgi-bin/cartDump then hit enter) and save them in a file for people to look at.

hgTrackUi

Functionality (track controls)

Display Modes

  • If in a super-track, by default, composite "Maximum display mode" should be set to dense. Super-track should be set to hide.
  • If not in a super-track, by default composite "Maximum display mode" should be set to hide.
  • If multiple views, Kate wants these settings by default:
    • Peaks -> pack
    • Alignments -> hide
    • All else -> full
  • NOTE: Check with the wrangler in case there are custom visibility settings
  • Changing display mode of views should affect the subtrack list & hgTracks

Config Settings of Views

  • settings function correctly
  • settings of different views are independent
  • Signals, by default, should have the following settings (unless lab has requested otherwise or other good reason):
    • Data view scaling: use vertical viewing range (rather than auto-scale)
      • (Pre-QA skip) in dense, default fixed range should result in meaningful banding at full chromosome (not all gray)
    • Windowing function: mean + whiskers

Matrix

  • By default, matrix boxes should be fully checked or fully unchecked (not grayed), if not, this is trackDb setting issue that the wrangler should fix.
  • Matrix headers:
    • For human, Tier 1 and Tier 2 cell lines:
      • should be listed first (Tier 1 in alphabetical order followed by Tier 2 in alphabetical order)
      • should be labeled as Tier 1 or 2. The tier should follow the cell line name, in parentheses and bolded. No hyperlink, no italics, e.g. cellLineA (Tier 1)
    • matrix headers are links to a working hgEncodeVocab page for the item (cell line, factor, etc)
    • +/- buttons function correctly
    • selections in matrix result in appropriate selection changes in subtrack list

Subtrack list

  • adjusts according to matrix & view (hide -> non-hide) selections
  • 'only selected/visible' and 'all' radio selections function
  • sorting functions (clicking on column headings)
  • schema links work and the schema has a "description" column
    • if the table is very large, there may not be an "info" column
    • if the table is merely a reference to a big* filename, there will not be a "description" column

MetaData

  • make sure metaData is present by clicking on the down arrow (v)
  • check a few to make sure they have somewhat consistent fields
  • spot check a few fields to make sure they make sense
  • make sure subtracks have dccAccession numbers (aka UCSC Accession), if none, may not be ready for QA; ask wrangler.
    • expId & dccAccession should correspond, dccAccession = wgEncodeE<H or M><expId> (the E=experiment, H=human, M=mouse)
    • these numbers should be the same among subtracks of the same "experiment," even across assemblies of an organism (e.g. same number on hg18 & hg19)
      • NOTE: expID should only be displayed on hgwdev.
    • can check to make sure a composite (or single track) has all its expIDs and dccAccessions using experimentify option on mdbPrint:
mdbPrint <db> -composite=<compositeName> -experimentify
mdbPrint <db> -obj=<trackName> -experimentify
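
The expId/dccAccession rule above can also be checked for a single pair of values. A minimal sketch follows; accession_matches is a hypothetical helper, and it assumes (this page does not say) that the accession may zero-pad the expId, so the numeric parts are compared after stripping leading zeros.

```shell
# Check that a dccAccession has the form wgEncodeE<H or M><expId>.
# Assumption: the accession may zero-pad the expId (e.g. 000123 for
# 123), so leading zeros are stripped before comparing.
accession_matches() {
    acc=$1 expId=$2
    case $acc in
        wgEncodeEH*|wgEncodeEM*) num=${acc#wgEncodeE?} ;;
        *) return 1 ;;
    esac
    # the remainder must be digits only
    case $num in
        ''|*[!0-9]*) return 1 ;;
    esac
    # strip leading zeros
    while [ "${num#0}" != "$num" ]; do
        num=${num#0}
    done
    [ "$num" = "$expId" ]
}
```

The function returns 0 when the two values correspond and 1 otherwise, so it can be used in an "if" or "&&" chain.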

Links

  • check that all links work, and (Pre-QA skip) where applicable, are relative

Content (.html description page)

Labels

Sections

  • Make sure all sections are present, in order, and have the correct headings (the list below has the correct headings and is in the correct order)
  • Check grammar, spelling, readability, completeness, correctness
  • style should be consistent with the rest of the site
    • Description should be in a passive 3rd person voice
    • references to "data" are plural
    • value and units have space between them (e.g. 50 bp rather than 50bp)
  • links should be hyper linked text rather than just plain URLs
  • Latin Terminology
    • Latin or foreign words or phrases should not be italicized
    • Genus/species names should be italicized
Description
  • Brief overall summary of track.
Display Conventions and Configuration
  • Contains info about each view in track
  • No description for views only available in downloads
  • link to multi-view instructions if there are multiple viewing options.
  • Tracks with Bam alignments (in metadata, fileName will end with ".bam") should have a link to the Sam Format Specification and should explain any non-standard tags, those starting with X, Y or Z or that are not listed in the tag section
Methods
  • Make sure it is detailed enough.
Verification
  • optional
Release Notes
  • Optional for first release
  • Required for subsequent releases
  • Should start with "Release # (Month Year) of this track...."
  • Provides a description of the changes of this particular release.
Credits
  • Must have contact person
  • Name is a hyperlink to email
  • Email must be sanitized (using encodeEmail.pl script). To check go into your track's .html file and make sure the 'mailto' address is encoded.
References
  • Correct format, see CBSE citation format
  • Alphabetical order
  • Make sure URLs to references don't contain the rel='nofollow' attribute.
  • Hyperlink uses PMID
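
The rel='nofollow' check in the References list above can be done with grep on the track's .html file. This is a sketch; find_nofollow is a hypothetical helper name, and it only catches quoted attribute values.

```shell
# Print the lines of a description page whose links carry a quoted
# rel="nofollow" (single or double quotes, any case).
# Returns 0 if any were found, 1 if the page is clean.
find_nofollow() {
    grep -n -i "rel=['\"]nofollow" "$1"
}
```

For example: find_nofollow wgEncodeYourTrack.html
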
Publications

This is an optional listing of publications that reference or use ENCODE data from this track. Information for this section is provided to us by NHGRI.

Data Release Policy

Note: GENCODE Genes tracks are exceptions since GENCODE Genes data are restriction free immediately; see below for the GENCODE Data Release Policy language.

  • Standard language, Supertrack -refers to dates as being on track configuration and download page:
Data users may freely use ENCODE data, but may not, without prior consent, submit publications that use an unpublished ENCODE dataset until nine months following the release of the dataset. This date is listed in the Restricted Until column on the track configuration page and the download page. The full data release policy for ENCODE is available here.
  • Standard language, Track -refers to dates as being above:
Data users may freely use ENCODE data, but may not, without prior consent, submit publications that use an unpublished ENCODE dataset until nine months following the release of the dataset. This date is listed in the Restricted Until column, above. The full data release policy for ENCODE is available here.
  • GENCODE Genes language, Track or Supertrack:
GENCODE data are available for use without restrictions. The full data release policy for ENCODE is available here.

Links

  • Check standard links are present, and, where applicable, are relative:
    • ENCODE Data policy (in Data Release Policy section)
    • help for multi-view (Next to "Select views" in track Control Section in Display Conventions and Configuration section)
    • contact email (see #Credits for more info)
  • If there is supplemental data, make sure there is a link and that it points to hgdownload and not hgdownload-test

hgc details

Check the following for each view:

Accuracy of details

  • details that are displayed correspond with the record in the table

Makes sense

  • table values seem correct

Useful

  • you understand what is being displayed
  • internal, non-functioning fields are not displayed (e.g. if all values in a field have "-1" as a placeholder, we shouldn't display that field)

Complete

  • all useful information is present (there's nothing important that is missing)

Clear

  • details are presented and labeled clearly
  • layout is user friendly

Links

  • check that all links are relevant, work, and where applicable, are relative
  • standard links: downloads, metadata, schema

hgTracks

Display

Views (zoomed in/out)

  • check the display of all Views in all display modes when zoomed in to the base pair level & zoomed out to 1 million bp

Table coordinates + features

  • an item's coordinates and other display features (exons, etc) display as expected/correctly based on the table
    • a line from the table for comparing against the display can be obtained from schema or mysql db for regular tables
    • for bam files, the schema will only give you a filePath. You will need to use SamTools to obtain a point to test.
      1. add /hive/data/outside/samtools/svn_${MACHTYPE}/samtools to $PATH in your .bashrc
      2. run samtools on the command line using the fileName found in the schema (see following example). The output will give you the start position and then it gives you read length (in the CIGAR string); if the CIGAR string is simple, e.g. 76M, add the read length, 76, to the start position to get the end position. If the CIGAR string is complicated, e.g. 43S17M494510N16M, just use the actual sequence to determine the length (paste sequence into Word & get word count) and add this to the start position to get the end position. This will give the point needed to put into the browser for testing purposes.
 samtools view -x filePath chrx:xxxx-xxxxxx | head
 samtools view -x /gbdb/hg19/bbi/wgEncodeCshlLongRnaSeqHuvecCellPapAlnRep1.bam chr1:2000000-3000000 | head
  • for big* files, you can't get individual records, but you can use bigWigInfo or bigBedInfo to get general stats. Be sure bigWigs are version 4.
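
As an alternative to counting bases by hand for a complicated CIGAR string, the genomic span can be computed from the CIGAR itself. In the SAM format only the M, D, N, = and X operations consume reference bases (soft clips, S, and insertions, I, do not), so the sketch below sums those. The helper names cigar_ref_len and sam_region are hypothetical, and the end coordinate is reported as 1-based inclusive (start + span - 1), which can differ by one base from simply adding the length to the start.

```shell
# Sum the reference-consuming CIGAR operations (M, D, N, =, X).
cigar_ref_len() {
    printf '%s\n' "$1" | awk '{
        total = 0
        # peel off one <count><op> pair at a time from the front
        while (match($0, /^[0-9]+[MIDNSHP=X]/)) {
            n = substr($0, 1, RLENGTH - 1) + 0
            op = substr($0, RLENGTH, 1)
            if (index("MDN=X", op) > 0) total += n
            $0 = substr($0, RLENGTH + 1)
        }
        print total
    }'
}

# Print a browser position (chrom:start-end) for an alignment, given
# the chromosome, start position and CIGAR fields from samtools view.
sam_region() {
    chrom=$1 pos=$2 cigar=$3
    len=$(cigar_ref_len "$cigar")
    echo "$chrom:$pos-$((pos + len - 1))"
}
```

For the two CIGAR strings used as examples above, cigar_ref_len gives 76 for 76M and 494543 for 43S17M494510N16M (17 + 494510 + 16; the 43 soft-clipped bases are not counted).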

Searchable

  • Are items searchable; should they be? Most likely not for ENCODE. (position/search box at the top of the browser image)
  • Do a quick search of a subtrack in track search (button found at the bottom of the browser image) to make sure that it is interacting correctly.

Colors

Human
  • Human Tier 1 cell lines should be displayed in a unique color (other than black)
  • Human 'original' Tier 2 cell lines (HeLa-S3, HepG2, and HUVEC) should be displayed in a unique color (other than black), but newly promoted Tier 2 cell lines should be displayed in black until a unique color for each is determined by the consortium.
  • it is OK if other tracks are in color, but not necessary
Mouse
  • For mouse, tissues/cell lines are grouped based on organ systems, and each has a unique color:
    1. Skeletal: Dark Purple (102,50,200)
    2. Muscular: Brown (139,69,19)
    3. Circulatory: Red (153,38,0)
    4. Nervous: Grey (105,105,105)
    5. Respiratory: Black (empty)
    6. Digestive: Orange (230,159,0)
    7. Excretory: Purple (204,121,167)
    8. Endocrine: Pink (189,0,157)
    9. Reproductive: Green (0,158,115)
    10. Lymphatic/immune System: Blue (86,180,233)
    11. Stem Cell: Dark Blue (65,105,225)
  • Visualize the above colors in the browser Mouse color Session
  • If a cell line is not colored that should be, the appropriate color for the cell lines is listed as a 'color' value in the cv.ra.

Defaults (composite/subtracks)

  • should this composite track be on by default? (For ENCODE, usually no)
  • check which subtracks are set as default selections, make sure:
    • there aren't too many
    • important cell lines are on by default
  • default Tier 1 and Tier 2 subtracks should display first

Compare to hg18

  • If track is the first release in hg19, compare a point on the hg19 browser of the track to the equivalent position in hg18.
    1. use "convert" from hg19 position to see the equivalent position in hg18.
    2. go back to your region in hg19, open new window and paste in hg18 equivalent position and compare hg19 to hg18.
  • Note: Comparisons to hg18 should be very cursory. Any differences should be noted in the redmine ticket, but not necessarily investigated unless a user also brings up an issue. The thinking is that when there are differences, the error is most likely in hg18, not hg19, and investigating would unnecessarily hold up hg19.

Performance

Chrom 1 Test (signal & experiment)

When position is set to all of chromosome 1, data of interest (in full mode) loads in less than 1 minute:

  • signal: check time of loading first signal subtrack
  • experiment: check time of loading all views for one experiment (e.g. Pol2 in GM12878 cells)

Defaults at Gene Sized Region Test

Set position to a gene-size region with your track's default subtracks on and the default browser tracks on (easiest to reset cart, turn on track)

  • should display quickly and not be "too much" data

Data make sense

Compare subtracks within views

  • Do all the subtracks within a view somewhat correlate?

Compare subtracks of related views

  • For example:
    • Does the All Signal Raw Signal subtrack of an experiment really seem to consist of the data in the Plus Raw Signal & Minus Raw Signal?
    • Do Peaks really represent the high Signal areas of the Signal View subtracks?

Do the data make sense Biologically?

  • Turn on other tracks to compare.
    • compare to the gene tracks
    • compare to subtracks of similar tracks
    • For example:
      • RNA-seq data should correlate with the exons in a genes track
      • TFBS tracks should correlate with the beginning of gene transcripts

Files

hgFileUi

  • 'Downloads' links on hgTrackUi should now go to hgFileUi
    • if not, ask wrangler to add "fileSortOrder" information to trackDb entry

file count (notes.file)

  • Check # of files displayed is correct (use notes.file). Pre-QA skip.
  • Note: If you are not seeing the correct number of files on beta, first try clearing the cache
    • Add the following to the end of the URL in your address bar: &clearCache=true

download button

  • Make sure download button prompts a download (and doesn't take you to an error page)

useful columns w/ good titles

  • Columns are useful
  • Column titles are correct and make sense (e.g. dccAccession title is "UCSC Accession")

sort columns

  • Check the sorting of columns. Clicking on the title of the column should sort the table on that column.

file filter

  • Check the filtering of files

links

  • Check that the "Track Settings" link takes you back to the track's hgTrackUi page
  • Check that the navigation, file filter title links, and other links work
  • Make sure files.txt & md5Sum.txt links are present and function
  • Make sure the download server link goes to the download directory for that track; see #download server for more info.

download server

  • linked from hgFileUi with the *download server* link
  • have wrangler remove index.html or preamble.html files from the current release directories if they exist (it is OK in older directories, e.g. release1 if this is a release3).

README

  • README should be displayed automatically followed by the list of files in the directory
  • contains a URL to the track's hgFileUi (you can double check by copying link, pasting in a browser, and changing hgdownload to hgdownload-test).
  • there may be more files/directories in here than seen in hgFileUi. This is OK. Because we are not dropping obsolete files, they will be present in this directory. Also, on hgdownload-test there will also be releaseN directories. These are part of the process of preparing a track and are OK. These, however, *won't* be pushed to the hgdownload upon release of the track.

Release to RR

Note: Cc the data wrangler for this track on all your pushes and Cc encode-staff@soe.ucsc.edu on your final push.

Push downloads (pushFilesMail)

  • Push download files, index.html, files.txt and md5sum.txt (from hgwdev to hgdownload):
mail -s "push download files for trackName (Release #)" -c wranglerDevLogin push-request < pushFilesMail
  • If track has supplemental data (files linked from the description pages in a /supplemental directory), make sure the pushFilesMail includes them (if they're in the notes.file, they should be in the pushFilesMail).
  • Notice that even though there is a releaseN directory on hgwdev, it is not pushed to a releaseN directory on hgdownload (see #Download files for specifics, and note about ReleaseLatest)
  • Note, this push can take hours, so you may want to start the day before you want to release.
  • NOTE: If you are manually requesting this push be sure not to push the releaseLatest directory, only the files from there. Add a useful sentence to tip off admin in your push-request.

Verify all downloads pushed (encodeQaCheckHgdownloadFiles)

  1. Run encodeQaCheckHgdownloadFiles to make sure all files got pushed to hgdownload
    • use 'checkPushFilesList' as the list of files (created by encodeQaInit and in the encodeQa directory for the track/release)
  2. Run encodeQaCheckHgdownloadFiles again to make sure all files got pushed to hgdownload San Diego (hgdownload-sd)

to check hgdownload (hgdownload is the default server, so it doesn't need to be specified):

 encodeQaCheckHgdownloadFiles db trackname checkPushFilesList

to check hgdownload-sd:

encodeQaCheckHgdownloadFiles db trackname checkPushFilesList -s hgdownload-sd

Run encodeQaPrepareRelease (trackDb release tags & metaDb)

  1. Check metaDb/alpha vs. beta to make sure you have the most updated metadata (diff beta/<trackName>.ra alpha/<trackName>.ra) or try the qaRaDiff script (soon to be renamed raDiff).
    • if diffs are due to next release, don't copy to beta, if diffs are for current release, copy to beta & double check in hgTrackUi, etc.
  2. Run encodeQaPrepareRelease with "public" as the stage, which:
    1. updates the release tags in trackDb.wgEncode.ra (of the $db directory): removes any release tags from the current <trackName>.releaseN.ra include line (or can be <trackName>.ra if 1st release)
    2. updates the metaDb, from /cluster/home/$usr/trackDb/$species/$db/metaDb:
      1. copies metaDb <trackName>.ra file from ~metaDb/beta -> metaDb/public
      2. adds <trackName>.ra file name to the make file in ~metaDb/public (if a first release)
  3. Review changes
  4. Check-in changes
  5. Announce intent to make public on db
  6. If you have changes to cv.ra, you will need to push a copy from beta to public. See RM#11601
  7. On hgwbeta, make public DBS=$db from /cluster/home/$usr/trackDb/

Check track on beta-public

Check track on hgwbeta-public

  • hgwbeta-public uses hgwbeta for the tables, but uses the CGIs that are on the RR.

Push tables + trackDb&friends (pushTableMail)

  • Push the tables from hgwbeta -> mysqlrr and trackDb & friends:
mail -s "push tables and trackDb & friends for trackName (Release #)" -c encode-staff push-request < pushTableMail
  • no longer have to push tableDescriptions because it gets pushed out once a week by a designated QA'er

Check track on RR

Once all pushes complete, check track on RR

Set subIds to Released on hgwdev

Set the ENCODE status of the subIds (listed in the 'subIds' file of the encodeQa directory created by encodeQaInit) for this release to "Released":

  • on hgwdev from the bash shell command line enter this 'for loop':
for i in $(cat filenamewithpath); do encodeStatus.pl $i released; done

or, if you need to change just a few subIds, you can enter the individual subIds with spaces in between them instead of doing the cat fileName:

for i in subIdsWithSpacesSeparating; do encodeStatus.pl $i released; done

For example:

for i in 2007 3113; do encodeStatus.pl $i released; done
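
Because the loop changes the status of every subId it is given, it can help to preview the commands first. A minimal sketch, assuming subIds are plain integers; print_release_cmds is a hypothetical helper that only prints the encodeStatus.pl commands shown above and does not run them.

```shell
# Print the encodeStatus.pl command for every numeric subId in a file
# (ids separated by spaces or newlines) without running anything.
# Non-numeric entries are reported on stderr and skipped.
print_release_cmds() {
    for id in $(cat "$1"); do
        case $id in
            ''|*[!0-9]*) echo "skipping non-numeric id: $id" >&2 ;;
            *) echo "encodeStatus.pl $id released" ;;
        esac
    done
}
```

Review the output of "print_release_cmds subIds", then pipe it to sh (print_release_cmds subIds | sh) to actually run the commands.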

Close Redmine ticket

  • If there aren't any lingering issues, close the ticket by setting the status to "Released".

Run encodeQaSqlRelease on hgwbeta

On hgwbeta, encodeQaSqlRelease creates a pushQ entry directly in the L queue of the Main pushQ so the ENCODE track will have an entry in the release log.

  • The Release Log field of the entry created:
    • should be the long label (or short if too long) and, if releaseN, release number in parentheses
    • should contain ENCODE (or it won't show up on the ENCODE downloads page)
  • The Release Log URL:
    • should be a relative link with the db specified (e.g. ../../cgi-bin/hgTrackUi?db=hg19&g=wgEncodeRikenCage)
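
The Release Log URL rule above can be checked mechanically. This sketch assumes that "relative" means the URL starts with "../" and that the db appears as a db= parameter, as in the example; check_release_log_url is a hypothetical helper name.

```shell
# Accept only URLs that start with "../" and carry a db= parameter.
check_release_log_url() {
    case $1 in
        ../*"?db="?*|../*"&db="?*) return 0 ;;
        *) echo "not a relative link with db specified: $1" >&2
           return 1 ;;
    esac
}
```

For example, check_release_log_url '../../cgi-bin/hgTrackUi?db=hg19&g=wgEncodeRikenCage' succeeds, while an absolute http://genome.ucsc.edu/... URL is rejected.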

ENCODE downloads (htmlDownloadSnippet)

Check ENCODE downloads page (human | mouse), if your track isn't there:

  1. add it by copying the text in the htmlDownloadSnippet file and adding it to /kent/src/hg/htdocs/ENCODE/downloads.html under the appropriate group in alphabetical order. The snippet may be added to a different location, for example downloadsMouse.html instead of downloads.html, if you were working on mouse.
  2. if necessary, also add its super-track:
    • super-track title should be a non-underlined link to the super-track hgTrackUi, for example:
      <A style="text-decoration:none" HREF="http://genome.ucsc.edu/cgi-bin/hgTrackUi?db=hg19&g=wgEncodeRnaSeqSuper" TARGET=_blank>RNA-seq</A>
  3. test your changes in your sandbox and dev by doing a make update and make alpha in the appropriate location: /cluster/home/*yourhome*/kent/src/hg/htdocs
    • do a make beta from hgwbeta, verify the changes, then check them in.
  4. request push of the appropriate file (depending on if your track is in mouse or human, for example with subject line: "please push ENCODE mouse downloads static page") from hgwbeta -> RR:
/usr/local/apache/htdocs/ENCODE/downloads.html

OR

/usr/local/apache/htdocs/ENCODE/downloadsMouse.html

Other info

Subsequent Release of Data (e.g. Release 2)

Periodically, released ENCODE tracks will be augmented with new data as labs complete experiments on new cell lines, etc. The new data will come in various formats: some will replace existing data, some will be brand new, some old data will be eliminated, etc.

Notes file

The data wrangler will create a notes file using the encodeMkChangeNotes script, check it into git, and place it here: kent/src/hg/makeDb/doc/encodeDcc$db/*.txt

This document should contain complete lists of each table and file and what its disposition is. The tables and files will fall into categories similar to this:

  • A) Untouched - are on public browser and should remain
  • B) Deprecated - are currently on RR but will no longer be needed and should not be referenced by the public site.
NOTE: NO FILES SHOULD BE REMOVED from the downloads directory on hgdownloads (RR).
This list is provided for completeness.
  • C) New - are only currently on test but will need to be pushed to the RR.
  • D) Additional items of note

This document may not match reality. It may be the case that some of the tables/files do not exist at all, the names are incorrect, they are not present on the machine as listed in the file, they do not match the list that is in the pushQ. The first challenge in QAing a subsequent ENCODE release is to determine if/how the notes file diverges from reality. To do this, compare the file to the "snapshot" of what is included in the release (and what you should QA), which can be found in the "release2" list in the downloads directory (hgwdev: /data/apache/htdocs/goldenPath/<db>/<trackName>/release<x>/). If the file differs from the downloads directory, then send that information to the data wrangler and pop the track into the B-queue while they sort it out. Otherwise, QA spends far too much time figuring out exactly what they are expected to QA.

Once the list is finalized, proceed with the QA work as outlined above. Note the additional steps in the #Files section for how to handle the /releaseN directory.

MetaDb changes

There also may be some metadata changes to fix errors or add information such as the GEO Series and GEO Sample.

  • Be sure to QA these metadata changes
  • Check for related redmine issues (addition of GEO accessions should definitely have a related issue for the addition of this info)
  • You can also do a diff to check for any other metadata changes. From kent/src/hg/makeDb/trackDb/<org>/<db>/metaDb, do the following diff:

diff beta/wgEncodeYourTrack alpha/wgEncodeYourTrack
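
The same diff can be wrapped so that a missing file is reported separately from a real metadata difference. A minimal sketch: metadb_diff is a hypothetical helper, and it assumes the metaDb files are named <trackName>.ra inside the alpha and beta directories.

```shell
# Compare the beta and alpha copies of a track's metaDb .ra file.
# $1 = metaDb directory, $2 = track name.
# Returns 0 if identical, 1 if they differ, 2 if a file is missing.
metadb_diff() {
    for stage in beta alpha; do
        if [ ! -f "$1/$stage/$2.ra" ]; then
            echo "missing: $1/$stage/$2.ra" >&2
            return 2
        fi
    done
    if diff "$1/beta/$2.ra" "$1/alpha/$2.ra"; then
        echo "no metadata differences for $2"
    else
        echo "metadata differs for $2 (see diff above)"
        return 1
    fi
}
```

From the metaDb directory this would be run as: metadb_diff . wgEncodeYourTrack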

Pushing Files

Pushing the three main types of files involved in ENCODE tracks.

gbdb files

Files of this form get pushed hgwdev -> hgnfs1

/gbdb/hg18/wib/wgEncode*.wib

Download files

Download files for an original release get pushed from hgdownload-test on hgwdev -> hgdownload (list the entire file path as usual)

/usr/local/apache/htdocs-hgdownload/goldenPath/hg18/encodeDCC/wgEncode*/index.html
/usr/local/apache/htdocs-hgdownload/goldenPath/hg18/encodeDCC/wgEncode*/wgEncode*.[bed/wig].gz

When pushing download files for a subsequent release track (e.g. release 2), push files as follows (but in your request, list the from/to paths at the top followed by a list of the file names without the full path)

from hgwdev: /usr/local/apache/htdocs-hgdownload/goldenPath/<db>/encodeDCC/<trackName>/releaseN/
to hgdownload: /usr/local/apache/htdocs/goldenPath/<db>/encodeDCC/<trackName>/
(Note no releaseN directory on hgdownload.  FYI: releaseLatest is symlinked to the newest releaseN.  It should be OK to push from releaseLatest.)
  • Once the files have been pushed you can check to see if the push was successful using this script: checkPushedFiles.csh
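
The request layout described above for a subsequent release (from/to paths at the top, then bare file names) can be generated from a list of full paths. This is only a sketch of the formatting; format_push_request is a hypothetical helper, and the two directory templates are the ones given in this section.

```shell
# Build the body of a push request for a subsequent release.
# $1 = db, $2 = track name, $3 = release directory (e.g. release2).
# Reads full file paths on stdin and prints only their base names.
format_push_request() {
    db=$1 track=$2 release=$3
    echo "from hgwdev: /usr/local/apache/htdocs-hgdownload/goldenPath/$db/encodeDCC/$track/$release/"
    echo "to hgdownload: /usr/local/apache/htdocs/goldenPath/$db/encodeDCC/$track/"
    while IFS= read -r path; do
        if [ -n "$path" ]; then
            echo "${path##*/}"
        fi
    done
    return 0
}
```

For example: format_push_request hg19 wgEncodeYourTrack release2 < listOfFullPaths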

NOTE: Although these files are logically in the htdocs-hgdownload directory in the apache document root, the physical location of the downloads directories is in the analysis/ftp path on the hive (to make pre-release data available to ENCODE analysts by FTP). The symlink pattern is:

/usr/local/apache/htdocs-hgdownload/goldenPath/<rel>/encodeDCC ->
/hive/groups/encode/dcc/pipeline/downloads/<rel>

Other files

Files of this form get pushed hgwbeta -> RR. Because they were often omitted from the pushQ entry, the directories containing these files are now pushed weekly by Katrina on Fridays, so QAers no longer have to worry about pushing them. They are not in source control, so they usually go out well ahead of the track.

/usr/local/apache/htdocs/ENCODE/protocols/cell/*.pdf

Relative Links

In html on our site, you can create relative links (on dev, the link goes to the page on dev, on beta, it goes to beta, etc.) by using part of the path based on your file's location in the source tree relative to the location of the file or cgi you are linking to.
For example from ~trackDb/human/hg19, here is how you point to:

  • a golden path file:
<A HREF="../../goldenPath/help/multiView.html" TARGET=_BLANK>here</A>.
  • cgi:
<A TARGET=_BLANK HREF="/cgi-bin/hgEncodeVocab?type=cellType">cell lines</A>
  • ENCODE protocols:
<A HREF="../../ENCODE/protocols/cell">ENCODE cell culture protocols </A>.
  • ENCODE portal:
<A TARGET=_BLANK HREF="/ENCODE/index.html">Encyclopedia of DNA Elements (ENCODE) Project</A>
  • ENCODE data release policy:
<A TARGET=_BLANK HREF="../ENCODE/terms.html">here</A>

Old info

File Validation

  • No longer run, here are Tim's comments about QA running validateFiles: "To get these things through the pipeline, we run them through validateFiles, so I think your running them through again is one time too many. But if you are going to, then each lab and each file type may have negotiated limits (which may change between submissions). These limits are found in the relevant submission directory DAF files."
  • Old validateFiles process:

Test a smattering of different file types using this tool: validateFiles (type the program name without arguments to see the usage statement). If there are no errors, there will be no output. For example, for files of type tagAlign, invoke the tool like this:

validateFiles -type=tagAlign -genome=/gbdb/hg18/hg18.2bit /usr/local/apache/htdocs/goldenPath/hg18/encodeDCC/wgEncodeHudsonalphaChipSeq/wgEncodeHudsonalphaChipSeqAlignmentsRep1Gm12878Control.tagAlign.gz

For tagAligns there are several relevant validateFiles options:

 mismatches  - frequently 2 but negotiated for each lab.  Set this to 5 to be tolerant
 matchFirst - negotiated.  You should set this to 25 and even then you may need to adjust it
 nMatch - negotiated, but you should always have this parameter set.

If you want to be exact, then the metadata as seen on the downloads page tells which submission directory the file belongs to, and the most recent *.DAF (or *.daf) will have a validationSettings line in it which will include the settings that belong to each file type. Example:

 /hive/groups/encode/dcc/pipeline/encpipeline_prod/773/UtaChIPseqBOonlyV1.DAF

has the line:

 validationSettings allowReloads;validateFiles.tagAlign:mmCheckOnInN=100,mismatches=3,nMatch,matchFirst=25

This means that the tag aligns were validated with -mismatches=3 -nMatch -matchFirst=25
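
To pull the settings for one file type out of a DAF, the validationSettings line can be split on ";" and ",". A minimal sketch, assuming the line has the layout shown in the example above (settings in a single whitespace-free field); daf_settings is a hypothetical helper and is not part of validateFiles.

```shell
# Print, one per line, the validateFiles settings recorded for a given
# file type in a DAF read from stdin.
# $1 = file type as it appears after "validateFiles." (e.g. tagAlign)
daf_settings() {
    awk -v type="$1" '
        $1 == "validationSettings" {
            prefix = "validateFiles." type ":"
            n = split($2, parts, ";")
            for (i = 1; i <= n; i++) {
                if (index(parts[i], prefix) == 1) {
                    m = split(substr(parts[i], length(prefix) + 1), opts, ",")
                    for (j = 1; j <= m; j++) print opts[j]
                }
            }
        }'
}
```

For the example DAF line above, "daf_settings tagAlign < UtaChIPseqBOonlyV1.DAF" lists mmCheckOnInN=100, mismatches=3, nMatch and matchFirst=25, which can then be compared with the options you pass to validateFiles.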