HgTablesTest details

From Genecats

hgTablesTest - what it is actually testing.

For each org/db/group/track/table, it does the following:

 - For each org/db, it takes a 5MB test region from the middle of the first chrom in the chromInfo table.
 - Can filter by -org= or -db=, or specify the number to check with -orgs=N -dbs=N.
 - Can filter by specifying a single -group= -track= -table=.
 - Can filter by specifying the number to check, -groups=N -tracks=N -tables=N.
   Defaults to all groups, the first 4 tracks, and the first 2 tables.
   Since always testing just the first 4 tracks does not give much coverage
   of the rest of the system, I have recently added the ability to
   shuffle the track and table lists.
   -seed=N - option for reproducibility and debugging.
   -noShuffle - do not shuffle tracks and tables lists.
 - Recursively selects the track/table in the hgTables drop-downs.
 - Checks with the htmlCheck library all pages fetched by the robot.
 - Presses the schema button to bring up the schema page.
   Because the schema page includes the track description at the bottom,
   it ends up checking the html description which is located under makeDb/trackDb/
   and which gets built into the trackDb.html field.  It turns out that
   it stops at the first error, so actually testing and fixing goes faster
   by just running the htmlCheck utility directly on the .html files under makeDb/trackDb/.
   You can quickly find out if the fix worked, and if there are any other errors,
   without waiting for a whole other build-cycle of 3 weeks.
 - Presses the summary/statistics button.
 - testAllFields - chooses "all fields from selected table", "get output".
   Counts the rows returned and keeps as expectedCount for further steps.
 - testOneField - chooses "Select Fields from primary and related tables", "get output".
   It automatically checks the first field found and submits, then compares the rows returned to the expected count.
 - If no BED output is available, this is a signal that it cannot limit the output to the 5MB test position,
   which means that the entire table will be scanned.  The table is skipped if over 500K rows,
   which it checks in the database.
 - If BED output is available (and output is limited to 5MB test region),
   it then proceeds to test these:
   - testOutSequence - chooses "sequence", "get output", fetches, compares output rowCount to expectedCount.
   - testOutBed - chooses "BED", "get output", fetches, compares output rowCount to expectedCount.
   - testOutHyperlink - chooses "hyperlinks", "get output", fetches, compares output <A> tags count to expectedCount.
   - testOutGff - chooses "GTF", "get output", fetches. No other checking. (Internally, the code calls everything GFF, not GTF.)
   - testOutCustomTrack - chooses "custom track", "get output", "CT in Table Browser". Checks that the group "user" now exists, which is the group where user-created subtracks go. Because previous tests may also create custom tracks, this check is not completely foolproof.
   "CDS FASTA from multiple alignments" output type is NOT tested.

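The test-region choice described above can be sketched as follows. This is only an illustration, assuming the 5MB window is centered on the chromosome midpoint; that assumption reproduces the hg38 position chr1:121978212-126978211 shown in the Ex2 screen output further down, given a chr1 size of 248,956,422 bases. The helper name test_region and the clamping for short chromosomes are hypothetical, not taken from the hgTablesTest source.

```shell
# Hypothetical helper: print a 5MB test position centered on a chromosome.
# Usage: test_region CHROM SIZE   (SIZE as stored in chromInfo)
test_region() {
    chrom=$1
    size=$2
    window=5000000
    mid=$(( size / 2 ))
    start=$(( mid - window / 2 ))   # 0-based start
    end=$(( mid + window / 2 ))
    # Assumption: clamp the window for chromosomes shorter than 5MB.
    if [ "$start" -lt 0 ]; then start=0; fi
    if [ "$end" -gt "$size" ]; then end=$size; fi
    # Print as a 1-based browser position.
    echo "${chrom}:$(( start + 1 ))-${end}"
}

test_region chr1 248956422
```
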
What it is NOT testing:

identifiers (names/accessions)
filter
intersection
correlation

However, at the end it runs these special tests, just once, on the uniProt db:

joining
filter
identifier

HGDB_PROF (or HGDB_CONF)

It joins uniProt.taxon and compares the number of rows returned to the table size, which is fetched from the database. If you are connected to the hgwdev db while hitting an hgwbeta URL, those two tables can be different, and hgTablesTest complains about it. You can address this by setting the environment variable HGDB_PROF=someprofile, where someprofile is defined in your .hg.conf file and points to the database you are testing against, which would be hgwbeta. Alternatively, you can point HGDB_CONF to .hg.conf.beta, which points db.* to hgwbeta.

Hard vs Soft Errors

In the output log, problems are broken down into hard and soft errors for reporting. Most of the errors are soft errors. A hard error occurs if there was an errAbort while fetching, or the page variable is null, or the page->status returned from the hgTables CGI is not 200 OK.
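
The rule above can be restated as a small sketch. The function name classify_problem and its yes/no arguments are hypothetical and only mirror the three conditions listed; this is not code from hgTablesTest itself.

```shell
# Hypothetical helper: classify a detected problem as hard or soft.
# Usage: classify_problem ABORTED PAGE_IS_NULL HTTP_STATUS
#   ABORTED, PAGE_IS_NULL: "yes" or "no"; HTTP_STATUS: numeric status code
classify_problem() {
    aborted=$1
    page_is_null=$2
    http_status=$3
    if [ "$aborted" = yes ] || [ "$page_is_null" = yes ] || [ "$http_status" != 200 ]; then
        echo hard
    else
        echo soft
    fi
}

classify_problem no no 200    # page fetched normally, so the problem is soft
classify_problem no no 500    # CGI returned a non-200 status
classify_problem yes no 200   # errAbort during the fetch
```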

Errors you can ignore

Ex error1

allFields n/a hg38 rep chainSelf chainSelfLink carefulAlloc: Allocated too much memory - more than 500,000,000 bytes (734,348,198)

This error just means the track has too many items in the test region for the Table Browser to handle. In this instance the issue is that this is the self-alignment track, and the region is in an area with a lot of repeats, near the centromere, so the track has a lot of items here.

chainSelf errors come in different forms and are often false positives, with no discoverable problem.

Ex error2

summaryStats Mouse mm10 rna intronEst est Error near line 169 of hgwbeta.cse.ucsc.edu/cgi-bin/hgTables:<li>Can\x27t\x20start\x20query\x3A\x3CBR\x3Eselect\x20tStart\x2CtEnd\x2CqName\x LI outside of any of DIR MENU OL UL

This is a known bug with the est table on mm10 where somebody forgot about split-chrom tables. The table 'mm10.est' doesn't exist, since it is split across each chromosome, so the real table names are chr1_est etc.

Ex error3

Error near line 163 of hgwbeta.cse.ucsc.edu/cgi-bin/hgTables: </BLOCKQUOTE></TD><TD><TT>varchar(255)</TT></TD> <TD><A HREF="/cgi-bin/hgTables </BLOCKQUOTE> without preceding <BLOCKQUOTE>

This error is actually a data bug -- the stray "</BLOCKQUOTE>" is in the intron column of the tRNAs table.


Ex error4

Example Running hgTablesTest

During the builds, the Build Meister runs a script called doRobots.csh.

If you see an error in the logs, it can be helpful to rerun the hgTablesTest on that specific item.

Ex1

The following runs a test against the hg38 database on the beta site, selecting the genes group and the knownGene track and table, and writes the output to a file called tempLog.

hgTablesTest hgwbeta.soe.ucsc.edu/cgi-bin/hgTables -db=hg38 -group=genes -track=knownGene -table=knownGene tempLog

While it runs, important errors should show up on screen, and you can look in the results for the grand total:

grand total
               Total:   31 tests,  0 soft errors,  0 hard errors, 19.07 seconds

Ex2

Here's another example from a real log. Hint: grep -v "0 hard errors" to make results easier to read.

cat /hive/groups/browser/newBuild/kent/src/utils/qa/weeklybld/logs/v407.preview2.hgTables.log | grep -v "0 hard errors" | less

type subtotals
           allFields:   62 tests,  0 soft errors,  1 hard errors, 60.15 seconds
              schema:   68 tests,  0 soft errors,  2 hard errors, 56.70 seconds
        summaryStats:   58 tests,  0 soft errors,  1 hard errors, 59.71 seconds

organism subtotals
                 n/a:  753 tests,  0 soft errors,  4 hard errors, 629.07 seconds

db subtotals
                hg38:  740 tests,  0 soft errors,  4 hard errors, 619.33 seconds

group subtotals
                 rep:   61 tests,  0 soft errors,  2 hard errors, 101.25 seconds
              varRep:   74 tests,  0 soft errors,  2 hard errors, 54.39 seconds

track subtotals
           chainSelf:   18 tests,  0 soft errors,  2 hard errors, 68.17 seconds
   dbSnp153Composite:   23 tests,  0 soft errors,  2 hard errors, 16.31 seconds

table subtotals
       chainSelfLink:    4 tests,  0 soft errors,  2 hard errors, 22.21 seconds
   dbSnp153BadCoords:   11 tests,  0 soft errors,  1 hard errors,  7.72 seconds
        dbSnp153Mult:   11 tests,  0 soft errors,  1 hard errors,  7.87 seconds

grand total
               Total:  753 tests,  0 soft errors,  4 hard errors, 629.07 seconds

From the above you can see that the db hg38 had 4 hard errors and that the track chainSelf in group rep was the source of two.
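
As an alternative to reading the subtotals by eye, the lines with a nonzero hard error count can be pulled out numerically. The sketch below assumes the subtotal line layout shown in the log above; hard_error_lines is a hypothetical helper name. Comparing the count as a number also avoids a quirk of grep -v "0 hard errors", which would hide a line reporting 10 or 20 hard errors as well.

```shell
# Hypothetical helper: print "name<TAB>count" for every subtotal line
# whose hard error count is greater than zero.
# Reads the named log files, or standard input if none are given.
hard_error_lines() {
    awk '
        match($0, /[0-9]+ hard errors/) {
            # "2 hard errors" + 0 converts the leading digits to a number
            count = substr($0, RSTART, RLENGTH) + 0
            if (count > 0) {
                name = $0
                sub(/:.*/, "", name)       # keep the text before the first colon
                sub(/^[ \t]+/, "", name)   # drop the right-alignment padding
                print name "\t" count
            }
        }' "$@"
}

# Demo on a few lines copied from the log above.
hard_error_lines <<'EOF'
track subtotals
           chainSelf:   18 tests,  0 soft errors,  2 hard errors, 68.17 seconds
   dbSnp153Composite:   23 tests,  0 soft errors,  2 hard errors, 16.31 seconds
grand total
               Total:  753 tests,  0 soft errors,  4 hard errors, 629.07 seconds
EOF
```

On a real log, the same function can be given the file name directly, for example hard_error_lines tempLog.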

Here we run hg38 rep chainSelf chainSelfLink to recreate the issue:

hgTablesTest hgwbeta.soe.ucsc.edu/cgi-bin/hgTables -db=hg38 -group=rep -track=chainSelf -table=chainSelfLink tempLog2

This gives this screen output:

Running on machine hgwdev
Testing URL hgwbeta.soe.ucsc.edu/cgi-bin/hgTables
Connecting as hgcat@localhost to database server Localhost via UNIX socket
Testing hg38 at position chr1:121978212-126978211
Testing n/a hg38 rep chainSelf chainSelfLink
carefulAlloc: Allocated too much memory - more than 500,000,000 bytes (604,788,212). Exiting.

That "Allocated too much memory - more than 500,000,000 bytes" message is Ex error1 under "Errors you can ignore" on this page.

Ex3

In Ex2 there are also 2 hard errors on the dbSnp153Composite track.

Here we run hg38 varRep dbSnp153Composite dbSnp153Mult to recreate the issue:

hgTablesTest hgwbeta.soe.ucsc.edu/cgi-bin/hgTables -db=hg38 -group=varRep -track=dbSnp153Composite -table=dbSnp153Mult tempLog3

Looking at tempLog3 we see the error is on the schema: cat tempLog3 | grep -v "0 hard errors"

schema n/a hg38 varRep dbSnp153Composite dbSnp153Mult Error near line 537 of hgwbeta.soe.ucsc.edu/cgi-bin/hgTables:  < 1%).</td>   </tr>   <tr>     <td>refIsSingleton</td>     <td class="number">3 Space not allowed between opening bracket < and tag name 

Ex4

The other hard error on the dbSnp153Composite track in Ex2 is on the dbSnp153BadCoords table.


Here we run hg38 varRep dbSnp153Composite dbSnp153BadCoords to recreate the issue:

hgTablesTest hgwbeta.soe.ucsc.edu/cgi-bin/hgTables -db=hg38 -group=varRep -track=dbSnp153Composite -table=dbSnp153BadCoords tempLog4

Looking at tempLog4 we see the error is on the schema: cat tempLog4 | grep -v "0 hard errors"

schema n/a hg38 varRep dbSnp153Composite dbSnp153BadCoords Error near line 537 of hgwbeta.soe.ucsc.edu/cgi-bin/hgTables:  < 1%).</td>   </tr>   <tr>     <td>refIsSingleton</td>     <td class="number">3 Space not allowed between opening bracket < and tag name 


For Ex3 and Ex4, the next step is to let the developer of the dbSnp153Composite track know about the errors on their track description page.