QA scripts: Difference between revisions

Revision as of 03:04, 27 September 2011

This is a list of the most frequently-used programs and scripts that QAers use. It does not include track-specific programs, such as chainNetTrio.csh. Some are devised by QA and live in the source tree at kent/src/utils/qa (these mostly end in ".csh"). Some are programs that the engineers or system administrators have written.

Must know about!

bigPush.csh

(also see: mypush)

Pushes tables in list to mysqlbeta and records size.
  Requires sudo access to mypush to run.
  
  Do not redirect output or run in the background,
  as it will require you to type your password in.
  Program will ask you for your password again after
  large tables. If you take too long to re-type it,
  the table the script stalled on might not get
  pushed. Double-check that all tables have been
  pushed!
  
  Will report total size of push and write two files:
  db.tables.push -> output for all tables from mypush
  db.tables.pushSize -> size of push

commTrio.csh

 Sorts and compares two files.  
 Counts unique and common records.

     usage:  leftFileName rightFileName [rm]
             optional [rm]: remove the three output files when finished
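
The comparison described above can be approximated with standard tools. What follows is a minimal sketch in POSIX shell, not the commTrio.csh source: the function name comm_counts and the output labels are assumptions made for illustration, and the real script writes three output files rather than printing a single summary line.

```shell
#!/bin/sh
# Sketch only: count records unique to each file and common to both.
# comm_counts is a hypothetical helper, not part of kent/src/utils/qa.
comm_counts() {
    left_sorted=$(mktemp)
    right_sorted=$(mktemp)

    # comm requires both inputs to be sorted in the same collation order
    LC_ALL=C sort "$1" > "$left_sorted"
    LC_ALL=C sort "$2" > "$right_sorted"

    # comm -23: lines only in left; -13: only in right; -12: in both
    left_only=$(LC_ALL=C comm -23 "$left_sorted" "$right_sorted" | wc -l | tr -d ' ')
    right_only=$(LC_ALL=C comm -13 "$left_sorted" "$right_sorted" | wc -l | tr -d ' ')
    common=$(LC_ALL=C comm -12 "$left_sorted" "$right_sorted" | wc -l | tr -d ' ')

    rm -f "$left_sorted" "$right_sorted"
    echo "leftOnly=$left_only rightOnly=$right_only common=$common"
}
```

Calling comm_counts leftFileName rightFileName prints the three counts on one line. Sorting under LC_ALL=C keeps the ordering that sort produces consistent with the ordering that comm expects.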

compareWholeColumn.csh

  gets a column from a table on dev and beta and checks diffs.
  reports numbers of rows unique to each and common.
  can compare to older database.
  writes files of everything.

    usage:  database table column [db2] 

compareWholeTable.csh

  gets an entire table from two machines and checks diffs.
  reports numbers of rows unique to each and common.
  writes files of everything.
  not real-time on RR -- uses genome-mysql.

    usage:  database table [machine1] [machine2]
      (defaults to dev and beta)

countPerChrom.csh

  check to see if there are annotations on all chroms.
  will check to see if chrom field is named tName or genoName.

    usage:  database1 table [database2] [RR] [histogram]

      checks database1 on dev
      database2 will be checked on beta by default
        if RR is specified, will use genome-mysql
      histogram option prints bar graph, not values
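
The underlying check (does every chrom carry at least one annotation?) can be sketched on flat files. This is an illustration under stated assumptions, not the countPerChrom.csh source: the real script queries the database, whereas the hypothetical count_per_chrom helper below reads a file of chrom names and a tab-separated annotation file with the chrom in column 1.

```shell
#!/bin/sh
# Sketch only: report the number of annotations for every chrom,
# including chroms that have none.
# count_per_chrom is a hypothetical helper; the file layout is assumed.
#   $1: file with one chrom name per line
#   $2: tab-separated annotation file, chrom name in column 1
count_per_chrom() {
    awk -F '\t' '
        # first file on the command line: tally annotations per chrom
        FILENAME == ARGV[1] { n[$1]++; next }
        # second file: print every chrom with its tally (0 if never seen)
        { printf "%s\t%d\n", $1, n[$1] }
    ' "$2" "$1"
}
```

A chrom that prints with a count of 0 is one that has no annotations, which is the case the script is meant to surface.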

featureBits

(no ".csh". Also see: getYield.csh)

 featureBits - Correlate tables via bitmap projections. 
 usage:
   featureBits database table(s)

(truncated for brevity)

findLevel

(no ".csh")

  searches trackDb hierarchy for your table and corresponding .html file
  also returns the value of the priority and visibility entries
  and the .ra file location for each

    usage:  database tableName

findOrg.csh

 Finds the organism name given the assembly name
  usage: assemblyName [date]
         will accept name with or without digit
         use 'date' to also retrieve assembly date
         (e.g. 'ornAna2' or 'ornAna')

getAssemblies.csh

  gets the names of all databases that contain a given table.
  will accept the MySQL wildcard, %, but not on RR machines
  note: not real-time on RR.  uses nightly TABLE STATUS dump.

    usage:  tablename [machine] [verbose] - defaults to beta
              "verbose" prints list of assemblies checked

getTrackName.csh

 Returns the short label and group of the track for this table.
 In the case of a composite track, it returns the short label
 for both the sub track and the parent track.

    usage:  database tableName

joinerCheck

(also see: runJoiner.csh)

 joinerCheck - Parse and check joiner file
 usage:
   joinerCheck file.joiner
 options:
   -fields - Check fields in joiner file exist, faster with -fieldListIn
      -fieldListOut=file - List all fields in all databases to file.
      -fieldListIn=file - Get list of fields from file rather than mysql.
   -keys - Validate (foreign) keys.  Takes about an hour.
   -tableCoverage - Check that all tables are mentioned in joiner file
   -dbCoverage - Check that all databases are mentioned in joiner file
   -times - Check update times of tables are after tables they depend on
   -all - Do all tests: -fields -keys -tableCoverage -dbCoverage -times
   -identifier=name - Just validate given identifier.
                    Note only applies to keys and fields checks.
   -database=name - Just validate given database.
                    Note only applies to keys and times checks.
   -verbose=N - use verbose to diagnose difficulties. N = 2, 3 or 4 to
              - show increasing level of detail for some functions.

mypush

used with sudo, generally like so:

  sudo mypush db table mysqlbeta

Usage: mypush database table-pattern [hostlist]
 NOTE: use single quotes around table-pattern  
       if it contains shell special chars like * or ? 
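
The quoting note matters because the shell expands an unquoted pattern against file names in the current directory before the command ever sees it. The sketch below demonstrates this in POSIX shell without calling mypush. The helper show_args and the file names chainMm9 and chainRn4 are made up for the demonstration. csh performs filename expansion as well, so the same caution applies there.

```shell
#!/bin/sh
# Sketch only: show what a command receives with and without quotes.
# show_args is a hypothetical helper that prints each argument it is given.
show_args() {
    for arg in "$@"; do
        printf '[%s]\n' "$arg"
    done
}

demo_dir=$(mktemp -d)
touch "$demo_dir/chainMm9" "$demo_dir/chainRn4"   # files that match chain*

# Unquoted: the shell replaces the pattern with the matching file names.
unquoted=$(cd "$demo_dir" && show_args chain*)

# Single-quoted: the pattern reaches the command unchanged.
quoted=$(cd "$demo_dir" && show_args 'chain*')

printf 'unquoted:\n%s\nquoted:\n%s\n' "$unquoted" "$quoted"
rm -rf "$demo_dir"
```

With the quotes in place the command receives the literal pattern, which is what the usage statement asks for.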

realTime.csh

(also see: updateTimes.csh)

  gets update times from all machines in real time for tables in list.

    usage:  database tablelist (will accept single table)

runBits.csh

  runs featureBits and checks for overlap with gaps.

    usage:  database trackname [checkUnbridged]
              where overlap with unbridged gaps can be turned on

updateTimes.csh

(also see: realTime.csh)

  gets update times for three machines for tables in list.
  if table is trackDb, trackDb_public will also be checked.
  warning:  not in real time for RR.  uses overnight dump.

    usage:  database tablelist 

            reports on dev, beta and RR
            tablelist will accept single table

Might also like!

compareTrackDbAll.csh

  checks all fields in trackDb

    usage: database [machine1] [machine2] [mode] 

      (defaults to hgw1 and hgwbeta)
      mode = (fast | verbose | fastVerbose) 
       - verbose is for html field - defaults to terse
       - fast = (genome-mysql) - defaults to realTime (WGET)

checkPushedFiles.csh

 checks to see if files are in place, after a push

 usage: website file(s)

 website should include the path of the directory where
 the files reside, such as:
   http://hgdownload.cse.ucsc.edu/goldenPath/hg19/liftOver/ 

 file(s) is either a single name or a list of names, and can
 include items with additional directory structure, like so:
   filename
   dir/filename
   dir/dir/dir/filename

 any output other than '200 OK' indicates an error.
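
How the website and file(s) arguments combine into full URLs can be sketched as follows. This is an assumption about the composition step only, not the checkPushedFiles.csh source: build_urls is a hypothetical helper, and the HTTP request that yields the '200 OK' status is deliberately left out.

```shell
#!/bin/sh
# Sketch only: join a base URL with each file name, one URL per line.
# build_urls is a hypothetical helper, not part of kent/src/utils/qa.
#   $1: website, including the directory path
#   remaining arguments: file names, optionally with subdirectories
build_urls() {
    base=${1%/}        # drop one trailing slash, if present
    shift
    for f in "$@"; do
        # drop one leading slash from the file so the join has exactly one
        printf '%s/%s\n' "$base" "${f#/}"
    done
}
```

Each line of output is a URL whose response status could then be checked by whichever HTTP client is available.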

compareTableToFile.csh

 Ensures that a table correlates with its associated file.
 Only prints results if there is a diff between table and file.
 Works for these file types: narrowPeak, broadPeak, gappedPeak,
                             bedGraph, NRE, BiP, gcf
 For wiggle files, you must specify [wig] parameter.

  usage:  database tableName fileName [wig] [verbose]
   fileName includes path of download file 
   e.g. /goldenPath/<db>/fileName.gz

   use verbose for more details

copyExtSeqRows.csh

 Automatically copies appropriate rows from the extFile and seq tables
 from hgwdev to hgwbeta.

(truncated for brevity)

countRows.csh

  gets the rowcount for a list of tables from dev, beta and RR.

    usage:  database tablelist [genome-mysql]
      tablelist can be just name of single table

    RR results not in real time, but from dumps
    genome-mysql option adds results from public mysql server

findBlatServer.csh

 gets info about which blat server hosts which genome(s)

 usage:  db|host|all  [db|host]  [machine]
   first parameter required: one specific db or host or all dbs
   second parameter optional: order by db or by host (blatServer)
     defaults to order by db
   third parameter optional: specify machine
     defaults to RR

findColumn.csh

  searches database for all tables containing a specified column name.

    usage:  database, field, [machine = hgwdev|hgwbeta] 
      (defaults to beta)

findPushQLocks.csh

  find all locks in the pushQ on hgwbeta

    usage: go|real
     run with 'go' to see a list of locks
     run with 'real' to unlock all the locks

getChainLines.csh

(also see: getMatrixLines.csh)

  Searches the README.txt files to find the correct parameters for the 
  $chainMinScore and $chainLinearGap variables. 

    usage:  fromDb toDb (these can be in either order)

getChromlist.csh

  prints the chrom names for an assembly.

    usage:  database [norandom]

getMatrixLines.csh

(also see: getChainLines.csh)

  Searches the README.txt files to find the correct parameters for the 
  $matrix variable.  This is the q-parameter from the blastz run. 

    usage:  fromDb toDb (these can be in either order)

getYield.csh

(also see: featureBits)

  uses featureBits to get yield and enrichment.

    usage:  database trackname [reference track]
              refTrack defaults to refGene

runJoiner.csh

  runs joinerCheck -keys, finding all identifiers for a table.
  runs joinerCheck -times (use "noTimes" to disable).
  set database to "all" for global.
  for chains/nets, use tablename format: chainDb.

    usage:  database table [all.joiner file to use] [noTimes]