Automation

From genomewiki
Revision as of 23:27, 21 August 2006 by AngieHinrichs (Outline of documentation for genome db build automation.)

Why Automate?

You've seen one genome assembly, you've seen 'em all -- hardly! But there are some very predictable, repetitive things that developers need to do every time we build a genome annotation database on a new genome assembly. It is in our best interest to automate these steps when possible, for these reasons:

  • it saves time
  • it reduces copy-paste and didn't-see-that-error-message errors
  • it helps to enforce naming conventions, which helps us use each other's data
  • it can produce detailed and accurate documentation of the data
  • it keeps our eyes from glazing over

Of course, nothing is free. When something goes wrong in an automated process, we must work our way back from a usually cryptic error message, through an additional layer of code, to the source of the problem. (Or if it's GenBank automation, bug MarkD. ;) But the hope is that developers will spend more of their time on tasks that require critical thinking and less on boring, repetitive ones.

The 5/30/06 genecats meeting was devoted to discussing and planning build automation; Hiram transcribed the meeting's whiteboard notes on the High Throughput Genome Builds page.

Automation Scripting Infrastructure

Use of Perl: it is interpreted and has good built-in support for regular expressions, hashes, etc.

  • HgAutomate.pm
  • HgRemoteScript.pm
  • HgStepManager.pm

doTemplate.pl
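Multi-step build scripts need a way to resume after a failure, or to stop partway, without redoing earlier work. HgStepManager.pm appears to fill this role. The UCSC scripts accept options along the lines of -continue and -stop, but check each script's usage message. The Python below is only a hypothetical sketch of that idea, not a translation of the Perl module. The step names and the run_steps() helper are illustrative assumptions.

```python
# Hypothetical sketch of step management in the spirit of HgStepManager.pm;
# step names and helper names are illustrative, not the module's real API.
from typing import Callable, List, Optional, Tuple

Step = Tuple[str, Callable[[], None]]  # (step name, action to run)


def select_steps(steps: List[Step], start: Optional[str] = None,
                 stop: Optional[str] = None) -> List[Step]:
    """Return the contiguous run of steps from start through stop, inclusive."""
    names = [name for name, _ in steps]
    # list.index raises ValueError for an unknown step name.
    start_i = names.index(start) if start is not None else 0
    stop_i = names.index(stop) if stop is not None else len(names) - 1
    if start_i > stop_i:
        raise ValueError(f"start step {start!r} comes after stop step {stop!r}")
    return steps[start_i:stop_i + 1]


def run_steps(steps: List[Step], start: Optional[str] = None,
              stop: Optional[str] = None) -> List[str]:
    """Run the selected steps in order and return the names of those run."""
    ran = []
    for name, action in select_steps(steps, start, stop):
        action()  # any exception aborts the run, so a later start= can resume here
        ran.append(name)
    return ran
```

For example, if a hypothetical "chain" step failed, a developer could fix the problem and rerun with `run_steps(steps, start="chain")`. The earlier, already-completed steps would not be repeated.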

Existing Automation Scripts

  • makeGenomeDb.pl
  • doRepeatMasker.pl
  • makeDownloads.pl
  • doSameSpeciesLiftOver.pl
  • doBlastzChainNet.pl
  • doHgNearBlastp.pl
  • makePushQSql.pl

MarkD's GenBank scripts...

Automation Wish List

  • Repeat library generation (WindowMasker?)
  • Brian's chained protein alignments
  • CpG islands
  • multiz
  • phastCons
  • meta-automation of all blastz's, multiz, phastCons?
  • meta-automation of all scripts that we always run?

Automation Troubleshooting

  • fileserver/machines out of sync
  • cluster job dies
  • cluster job hangs
  • ssh hangs
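A hung ssh connection or remote command can stall an entire automated build. One possible guard is sketched below in Python. It runs ssh non-interactively and puts a wall-clock timeout on the local client. This is an assumption about how one might mitigate the problem, not how the existing scripts behave. BatchMode, ConnectTimeout and ServerAliveInterval are standard OpenSSH client options. The helper names and the timeout values are hypothetical. Note that killing the local ssh client does not necessarily stop the remote process, so a timed-out step may still need cleanup on the remote machine.

```python
# Hypothetical guard against hung ssh calls; helper names are illustrative.
import subprocess
from typing import List, Optional


def ssh_argv(host: str, remote_cmd: str, connect_timeout: int = 30) -> List[str]:
    """Build an ssh command line that fails fast instead of waiting forever."""
    return [
        "ssh", "-x",                                   # no X11 forwarding
        "-o", "BatchMode=yes",                         # never prompt for a password
        "-o", f"ConnectTimeout={connect_timeout}",     # bound the connection phase only
        "-o", "ServerAliveInterval=60",                # probe for a dead connection
        host, remote_cmd,
    ]


def run_with_timeout(argv: List[str], timeout_s: float) -> Optional[int]:
    """Return the command's exit status, or None if it ran longer than timeout_s.

    subprocess.run kills the local child process when the timeout expires.
    """
    try:
        return subprocess.run(argv, timeout=timeout_s).returncode
    except subprocess.TimeoutExpired:
        return None
```

A call such as `run_with_timeout(ssh_argv("fileserver.example", "df -h"), 600)` would check disk space on a hypothetical host. It returns None instead of hanging if the remote side stops responding for ten minutes.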