You've seen one genome assembly, you've seen 'em all -- hardly! But there are some very predictable, repetitive steps that we developers need to perform every time we build a genome annotation database on a new genome assembly. It is in our best interest to automate these steps when possible for these reasons:
- it saves time
- it reduces copy-paste and didn't-see-that-error-message errors
- it helps to enforce naming conventions, which helps us use each other's data
- it can produce detailed and accurate documentation of the data
- it keeps our eyes from glazing over
Of course, nothing is free. When something goes wrong in an automated process, we must work our way back from a usually cryptic error message, through an additional level of code, to the source of the problem. (Or if it's GenBank automation, bug MarkD. ;) But the hope is that developers will spend more of their time on tasks that require critical thinking and less on boring, repetitive ones.
Automation Scripting Infrastructure
Use of Perl... interpreted, nice support for regexes, hashes, etc.
Existing Automation Scripts
MarkD's GenBank scripts...
Automation Wish List
- Repeat library generation (WindowMasker?)
- Brian's chained protein alignments
- CpG islands
- meta-automation of all blastz, multiz, and phastCons runs?
- meta-automation of all scripts that we always run?
- fileserver/machines out of sync
- cluster job dies
- cluster job hangs
- ssh hangs
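The last few wish-list items (a job that dies, a job or ssh session that hangs) might be caught by a thin wrapper around each step. What follows is only a minimal sketch, written in Python for brevity rather than in the Perl mentioned above. The function name `run_step`, the status strings, and the timeout values are all hypothetical and do not come from any existing script. It uses the standard library's `subprocess.run`, which kills the child process if the timeout expires.

```python
import subprocess
import sys


def run_step(cmd, timeout_s):
    """Run one automated step and classify the outcome.

    Hypothetical helper: returns ("ok", stdout), ("failed", detail) when the
    command exits non-zero, or ("hung", detail) when it exceeds timeout_s.
    """
    try:
        proc = subprocess.run(
            cmd, capture_output=True, text=True, timeout=timeout_s
        )
    except subprocess.TimeoutExpired:
        # subprocess.run has already killed the child at this point.
        return ("hung", f"no exit after {timeout_s}s")
    if proc.returncode != 0:
        # Keep only the tail of stderr so the report stays readable.
        return ("failed", f"exit {proc.returncode}: {proc.stderr.strip()[-200:]}")
    return ("ok", proc.stdout)


if __name__ == "__main__":
    # Stand-in command; a real step would be an ssh or cluster submission.
    print(run_step([sys.executable, "-c", "print('done')"], timeout_s=10))
```

A wrapper like this at least distinguishes "died" from "hung" in one place, instead of leaving each script to notice on its own. It does nothing about the fileserver and machines being out of sync, which would need a separate check.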