High Throughput Genome Builds
From genomewiki
Input Requirements
AGP File, Sequence
Output Tracks
Gap, Gold, ChromInfo, GC5
Then Repeat Masking/WindowMasker and trfBig
Produces the rmsk/windowMask and simpleRepeats tracks
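As a rough illustration of how an AGP file feeds the Gap, Gold and ChromInfo outputs listed above, here is a minimal Python sketch. It relies only on the standard AGP column layout (gap lines have component type N or U). The function names and the dictionary field names are hypothetical, chosen for illustration; they are not the actual loader or table schema.

```python
from typing import Dict, Iterable, List, Tuple

GAP_TYPES = {"N", "U"}  # AGP component types that denote gaps


def split_agp(lines: Iterable[str]) -> Tuple[List[dict], List[dict]]:
    """Split AGP lines into component ("gold") rows and gap rows.

    Field names are illustrative assumptions, not a real table schema.
    """
    gold, gaps = [], []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment headers
        f = line.split("\t")
        row = {
            "chrom": f[0],
            "chromStart": int(f[1]) - 1,  # AGP is 1-based; store a 0-based start
            "chromEnd": int(f[2]),
            "ix": int(f[3]),
            "type": f[4],
        }
        if f[4] in GAP_TYPES:
            # Gap line: gap length, gap type, linkage
            row.update(size=int(f[5]), gapType=f[6], bridge=f[7])
            gaps.append(row)
        else:
            # Component line: component id, begin, end, orientation
            row.update(frag=f[5], fragStart=int(f[6]) - 1,
                       fragEnd=int(f[7]), strand=f[8])
            gold.append(row)
    return gold, gaps


def chrom_sizes(gold: List[dict], gaps: List[dict]) -> Dict[str, int]:
    """Derive per-object sizes (ChromInfo-like) from the largest end coordinate."""
    sizes: Dict[str, int] = {}
    for row in gold + gaps:
        sizes[row["chrom"]] = max(sizes.get(row["chrom"], 0), row["chromEnd"])
    return sizes
```

Deriving sizes from the AGP rather than from the sequence gives an independent number to compare against the sequence lengths later.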
Early Tracks
Genscan, Genbank Scripts, CpG Islands
Map closest protein set (mostly human)
Same Species LiftOvers
Comparative Genomics
human and mouse
Candidate Genomes
(A) platypus (new + update), (A) frog update, (A) stickleback (fish) (new + update), (A) medaka (fish), (B) rhesus update, (B) chicken update, (A) elephant, (B) rabbit (new + update), (A) pig (new + update), (B) cat (new + update), (A) fugu update, (B) chimp update, (C) mouse, (C) human
Tools
- xxxToAgp
- agpToDb builds up to RepeatMasking; include data staging? (SAN, scratch, iscratch)
Verifies AGP and sequence, splits unmasked sequence into 500K chunks for RM, stores sequence in a standard place.
- masker with or without repeat library
Work towards windowMask automation
- Masked sequence stager
- make goldenPath downloads
- makeXxxTrack
Work towards track by track automation
- doBlastzChainNet (how about saving run-time parameters in a metadata DB table to create README/.html information?)
chain/net .html file: $alignSettings keyword and library function to make the alignment parameter table
- Conservation (on all browsers?)
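The agpToDb item above mentions verifying AGP against sequence and splitting unmasked sequence into 500K chunks for RM. A minimal Python sketch of those two steps follows. The function names, the chunk naming scheme, and the in-memory string representation are assumptions for illustration only; a real pipeline would stream from files.

```python
from typing import Dict, Iterator, List, Tuple

CHUNK_SIZE = 500_000  # the "500K chunks" mentioned in the notes


def verify_lengths(expected: Dict[str, int],
                   sequences: Dict[str, str]) -> List[str]:
    """Compare sizes implied by the AGP with the actual sequence lengths.

    Returns a list of human-readable problems; an empty list means consistent.
    """
    problems = []
    for name, size in expected.items():
        if name not in sequences:
            problems.append(f"{name}: missing sequence")
        elif len(sequences[name]) != size:
            problems.append(
                f"{name}: AGP says {size}, sequence has {len(sequences[name])}")
    return problems


def chunk_sequence(name: str, seq: str,
                   size: int = CHUNK_SIZE) -> Iterator[Tuple[str, int, int, str]]:
    """Yield (chunk_name, start, end, bases) with 0-based, half-open coordinates."""
    for start in range(0, len(seq), size):
        end = min(start + size, len(seq))
        # Hypothetical naming scheme that keeps the offset recoverable
        yield f"{name}:{start}-{end}", start, end, seq[start:end]
```

Keeping the start offset in each chunk name is one way to let masking results be lifted back to full-sequence coordinates afterwards.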
Bugs
.html pages for tracks
QA Time
- Downloads
- Metadata, hgCentral, defaultDb, dbDb, all.joiner, READMEs
- liftOvers (between same species)
- HTML Pages
- Links and Clickthroughs
Metadata blastz parameter table
Organism | Assembly | matrix | gaps | abridged repeats | ...etc... |
---|---|---|---|---|---|
mouse | mm8 | medium | medium | yes | ...etc... |
human | hg19 | loose | loose | no | ...etc... |
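The idea of a library function that expands an $alignSettings keyword into an alignment parameter table could look roughly like the Python sketch below. The function names and the row representation (a list of dictionaries, as might be read from a metadata table) are hypothetical. Only the keyword name and the column headings come from the notes above.

```python
from html import escape
from typing import Dict, List


def alignment_table_html(rows: List[Dict[str, str]], columns: List[str]) -> str:
    """Render saved alignment parameters as a simple HTML table."""
    head = "".join(f"<th>{escape(c)}</th>" for c in columns)
    body = []
    for row in rows:
        # Missing values render as empty cells rather than raising
        cells = "".join(
            f"<td>{escape(str(row.get(c, '')))}</td>" for c in columns)
        body.append(f"<tr>{cells}</tr>")
    return "<table>\n<tr>" + head + "</tr>\n" + "\n".join(body) + "\n</table>"


def fill_template(template: str, rows: List[Dict[str, str]],
                  columns: List[str], keyword: str = "$alignSettings") -> str:
    """Replace the keyword in a track description page with the table."""
    return template.replace(keyword, alignment_table_html(rows, columns))


if __name__ == "__main__":
    columns = ["Organism", "Assembly", "matrix", "gaps", "abridged repeats"]
    rows = [
        {"Organism": "mouse", "Assembly": "mm8", "matrix": "medium",
         "gaps": "medium", "abridged repeats": "yes"},
        {"Organism": "human", "Assembly": "hg19", "matrix": "loose",
         "gaps": "loose", "abridged repeats": "no"},
    ]
    print(fill_template("<h2>Methods</h2>\n$alignSettings", rows, columns))
```

Generating the table from stored parameters, instead of editing each .html page by hand, would keep the description pages in step with what was actually run.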