QA scripts

From Genecats
Revision as of 02:44, 27 September 2011 by Rhead (talk | contribs) (many more added!)

This is a list of the programs and scripts that QAers use most frequently. It does not include track-specific programs, such as chainNetTrio.csh. Some were devised by QA and live in the source tree at kent/src/utils/qa. Some are

Must know about!

bigPush.csh

Pushes tables in list to mysqlbeta and records size.

 Requires sudo access to mypush to run.
 
 Do not redirect output or run in the background,
 as the script will prompt you for your password.
 It will ask for your password again after large
 tables. If you take too long to retype it, the
 table the script stalled on might not get pushed.
 Double-check that all tables have been pushed!
 
 Will report total size of push and write two files:
 db.tables.push -> output for all tables from mypush
 db.tables.pushSize -> size of push

commTrio.csh

Sorts and compares two files.  
Counts unique and common records.
    usage:  leftFileName rightFileName [rm]
            optional [rm]: remove the three output files when finished
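
The comparison that commTrio.csh reports can be sketched in Python as follows. This is an illustration only, not the script's actual implementation: the function name comm_trio is hypothetical, and the sketch assumes duplicate lines are collapsed before counting, which the usage message does not confirm.

```python
def comm_trio(left_lines, right_lines):
    """Return sorted records unique to left, unique to right, and common.

    Assumption: each record is one line, and duplicates are collapsed.
    """
    left = {line.rstrip("\n") for line in left_lines}
    right = {line.rstrip("\n") for line in right_lines}
    return sorted(left - right), sorted(right - left), sorted(left & right)


# Example with in-memory lines; in practice these would be read from two files.
left_only, right_only, common = comm_trio(
    ["chr1\n", "chr2\n", "chr3\n"],
    ["chr2\n", "chr3\n", "chrX\n"],
)
print(len(left_only), "unique to left")
print(len(right_only), "unique to right")
print(len(common), "common")
```

The three returned lists correspond loosely to the three output files mentioned in the usage message.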

compareWholeColumn.csh

 gets a column from a table on dev and beta and checks diffs.
 reports numbers of rows unique to each and common.
 can compare to older database.
 writes files of everything.
   usage:  database table column [db2] 

compareWholeTable.csh

 gets an entire table from two machines and checks diffs.
 reports numbers of rows unique to each and common.
 writes files of everything.
 not real-time on RR -- uses genome-mysql.
   usage:  database table [machine1] [machine2]
     (defaults to dev and beta)

countPerChrom.csh

 check to see if there are annotations on all chroms.
 will check to see if chrom field is named tName or genoName.
   usage:  database1 table [database2] [RR] [histogram]
     checks database1 on dev
     database2 will be checked on beta by default
       if RR is specified, will use genome-mysql
     histogram option prints bar graph, not values
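
The per-chrom check can be pictured with a small Python sketch. It is not taken from countPerChrom.csh: the function names, the bar width and the in-memory inputs are all assumptions made for illustration. The real script queries the database tables directly.

```python
from collections import Counter


def count_per_chrom(chrom_names, row_chroms):
    """Count annotations per chrom, keeping chroms that have none (count 0)."""
    counts = Counter(row_chroms)
    return {chrom: counts.get(chrom, 0) for chrom in chrom_names}


def histogram(counts, width=40):
    """Render counts as text bars scaled so the largest count fills `width`."""
    biggest = max(counts.values(), default=0)
    lines = []
    for chrom, n in counts.items():
        bar = "*" * (n * width // biggest) if biggest else ""
        lines.append(f"{chrom:<8} {bar}")
    return lines


# Hypothetical data: chrM has no annotations and shows up with a count of 0.
counts = count_per_chrom(["chr1", "chr2", "chrM"], ["chr1", "chr1", "chr2"])
missing = [chrom for chrom, n in counts.items() if n == 0]
print("chroms without annotations:", missing)
print("\n".join(histogram(counts, width=10)))
```

Listing every chrom of the assembly, rather than only those present in the table, is what makes an empty chrom visible.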

featureBits

(no ".csh". Also see: getYield.csh)

featureBits - Correlate tables via bitmap projections. 
usage:
  featureBits database table(s)

(truncated for brevity)

findLevel

(no ".csh")

 searches trackDb hierarchy for your table and corresponding .html file
 also returns the value of the priority and visibility entries
 and the .ra file location for each
   usage:  database tableName

findOrg.csh

Finds the organism name given the assembly name
 usage: assemblyName [date]
        will accept name with or without digit
        use 'date' to also retrieve assembly date
        (e.g. 'ornAna2' or 'ornAna')

getAssemblies.csh

 gets the names of all databases that contain a given table.
 will accept the MySQL wildcard, %, but not on RR machines
 note: not real-time on RR.  uses nightly TABLE STATUS dump.
   usage:  tablename [machine] [verbose] - defaults to beta
             "verbose" prints list of assemblies checked

getTrackName.csh

Returns the short label and group of the track for this table.
In the case of a composite track, it returns the short label
for both the sub track and the parent track.
   usage:  database tableName

joinerCheck

(also see: runJoiner.csh)

joinerCheck - Parse and check joiner file
usage:
  joinerCheck file.joiner
options:
  -fields - Check fields in joiner file exist, faster with -fieldListIn
     -fieldListOut=file - List all fields in all databases to file.
     -fieldListIn=file - Get list of fields from file rather than mysql.
  -keys - Validate (foreign) keys.  Takes about an hour.
  -tableCoverage - Check that all tables are mentioned in joiner file
  -dbCoverage - Check that all databases are mentioned in joiner file
  -times - Check update times of tables are after tables they depend on
  -all - Do all tests: -fields -keys -tableCoverage -dbCoverage -times
  -identifier=name - Just validate given identifier.
                   Note only applies to keys and fields checks.
  -database=name - Just validate given database.
                   Note only applies to keys and times checks.
  -verbose=N - use verbose to diagnose difficulties. N = 2, 3 or 4 to
             - show increasing level of detail for some functions.

realTime.csh

(also see: updateTimes.csh)

 gets update times from all machines in real time for tables in list.
   usage:  database tablelist (will accept single table)

runBits.csh

 runs featureBits and checks for overlap with gaps.
   usage:  database trackname [checkUnbridged]
             where overlap with unbridged gaps can be turned on

updateTimes.csh

(also see: realTime.csh)

 gets update times for three machines for tables in list.
 if table is trackDb, trackDb_public will also be checked.
 warning:  not in real time for RR.  uses overnight dump.
   usage:  database tablelist 
           reports on dev, beta and RR
           tablelist will accept single table


Might also like!

compareTrackDbAll.csh

 checks all fields in trackDb
   usage: database [machine1] [machine2] [mode] 
     (defaults to hgw1 and hgwbeta)
     mode = (fast | verbose | fastVerbose) 
      - verbose is for html field - defaults to terse
      - fast = (genome-mysql) - defaults to realTime (WGET)

checkPushedFiles.csh

checks to see if files are in place, after a push
usage: website file(s)
website should include the path of the directory where
the files reside, such as:
  http://hgdownload.cse.ucsc.edu/goldenPath/hg19/liftOver/ 
file(s) is either a single name or a list of names, and can
include items with additional directory structure, like so:
  filename
  dir/filename
  dir/dir/dir/filename
any output other than '200 OK' indicates an error.
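
As a rough illustration of the check described above, the Python sketch below joins the website path with each file name and reports the HTTP status. It is an assumption-laden sketch, not the script itself: build_urls and check_status are hypothetical names, and the use of HEAD requests is a choice made here, not something the usage message states.

```python
import urllib.error
import urllib.request


def build_urls(website, files):
    """Join the directory URL with each file name (sub-directories allowed)."""
    base = website.rstrip("/")
    return [f"{base}/{name.lstrip('/')}" for name in files]


def check_status(url, timeout=10):
    """Return a status string such as '200 OK' for one URL (needs network)."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return f"{response.status} {response.reason}"
    except urllib.error.HTTPError as err:
        # 4xx and 5xx responses are raised as HTTPError; report them as text.
        return f"{err.code} {err.reason}"
    except urllib.error.URLError as err:
        return f"error: {err.reason}"


# Building the URLs needs no network access; the file names are placeholders.
for url in build_urls(
    "http://hgdownload.cse.ucsc.edu/goldenPath/hg19/liftOver/",
    ["filename", "dir/filename"],
):
    print(url)
```

Calling check_status on each URL and flagging anything that is not '200 OK' would mirror the rule stated above.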

compareTableToFile.csh

Ensures that a table correlates with its associated file.
Only prints results if there is a diff between table and file.
Works for these file types: narrowPeak, broadPeak, gappedPeak,
                            bedGraph, NRE, BiP, gcf
For wiggle files, you must specify [wig] parameter.
 usage:  database tableName fileName [wig] [verbose]
  fileName includes path of download file 
  e.g. /goldenPath/<db>/fileName.gz
  use verbose for more details

copyExtSeqRows.csh

Automatically copies appropriate rows from the extFile and seq tables
from hgwdev to hgwbeta.

(truncated for brevity)

countRows.csh

 gets the rowcount for a list of tables from dev, beta and RR.
   usage:  database tablelist [genome-mysql]
     tablelist can be just name of single table
   RR results not in real time, but from dumps
   genome-mysql option adds results from public mysql server

findBlatServer.csh

gets info about which blat server hosts which genome(s)
usage:  db|host|all  [db|host]  [machine]
  first parameter required: one specific db or host or all dbs
  second parameter optional: order by db or by host (blatServer)
    defaults to order by db
  third parameter optional: specify machine
    defaults to RR

findColumn.csh

 searches database for all tables containing a specified column name.
   usage:  database, field, [machine = hgwdev|hgwbeta] 
     (defaults to beta)


findPushQLocks.csh

 find all locks in the pushQ on hgwbeta
   usage: go|real
    run with 'go' to see a list of locks
    run with 'real' to unlock all the locks

getChainLines.csh

 Searches the README.txt files to find the correct parameters for the 
 $chainMinScore and $chainLinearGap variables. 
   usage:  fromDb toDb (these can be in either order)

getChromlist.csh

 prints the chrom names for an assembly.
   usage:  database [norandom]

getMatrixLines.csh

 Searches the README.txt files to find the correct parameters for the 
 $matrix variable.  This is the q-parameter from the blastz run. 
   usage:  fromDb toDb (these can be in either order)

getYield.csh

(also see: featureBits)

 uses featureBits to get yield and enrichment.
   usage:  database trackname [reference track]
             refTrack defaults to refGene

runJoiner.csh

 runs joinerCheck -keys, finding all identifiers for a table.
 runs joinerCheck -times (use "noTimes" to disable).
 set database to "all" for global.
 for chains/nets, use tablename format: chainDb.
   usage:  database table [all.joiner file to use] [noTimes]