Talk:Same species lift over construction

From genomewiki
Revision as of 07:00, 7 May 2015 by Anurag Priyam (Feedback on the approach described on this page.)

The scripts on this page are a bit misleading:

The splitting process is inefficient. Depending on the number of scaffolds in the assembly, the approach taken by this script can create millions of jobs, with run times of days even on fast 40-core machines. Simply splitting the query sequence into 5,000 bp chunks (and grouping them into 30-50 groups) is enough to run BLAT within minutes, even on a reasonably powerful desktop computer.
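
To illustrate the splitting suggested above, here is a minimal Python sketch, assuming the query sequences are already loaded into a dict mapping sequence name to sequence. It cuts each sequence into 5,000 bp chunks, distributes the chunks round-robin into a fixed number of groups (one BLAT job per group), and records one lift entry per chunk in the five-column liftUp layout (offset, oldName, oldSize, newName, newSize). The group count of 40 and the `name_index` chunk naming are hypothetical choices, and the function names are illustrative only; in practice faSplit does this for you.

```python
from typing import Dict, Iterator, List, Tuple

CHUNK_SIZE = 5000  # chunk length in bp, as suggested above
N_GROUPS = 40      # hypothetical; anything in the 30-50 range

LiftRecord = Tuple[int, str, int, str, int]  # (offset, oldName, oldSize, newName, newSize)


def split_into_chunks(seqs: Dict[str, str],
                      chunk_size: int = CHUNK_SIZE) -> Iterator[Tuple[str, str, LiftRecord]]:
    """Yield (chunk_name, chunk_sequence, lift_record) for every chunk."""
    for name, seq in seqs.items():
        for offset in range(0, len(seq), chunk_size):
            piece = seq[offset:offset + chunk_size]
            chunk_name = f"{name}_{offset // chunk_size}"  # hypothetical naming scheme
            yield chunk_name, piece, (offset, chunk_name, len(piece), name, len(seq))


def group_chunks(seqs: Dict[str, str], n_groups: int = N_GROUPS,
                 chunk_size: int = CHUNK_SIZE) -> Tuple[List[List[Tuple[str, str]]], List[LiftRecord]]:
    """Distribute chunks round-robin into n_groups groups and collect lift records."""
    groups: List[List[Tuple[str, str]]] = [[] for _ in range(n_groups)]
    lift: List[LiftRecord] = []
    for i, (chunk_name, piece, record) in enumerate(split_into_chunks(seqs, chunk_size)):
        groups[i % n_groups].append((chunk_name, piece))
        lift.append(record)
    return groups, lift


def to_fasta(group: List[Tuple[str, str]], width: int = 60) -> str:
    """Render one group of chunks as FASTA text (one file per BLAT job)."""
    lines: List[str] = []
    for name, piece in group:
        lines.append(f">{name}")
        lines.extend(piece[i:i + width] for i in range(0, len(piece), width))
    return "\n".join(lines) + "\n"
```

Because almost every chunk has the same length, round-robin assignment keeps the groups roughly equal in size, so no single BLAT job dominates the run time. Each lift record can be written as one tab-separated line, e.g. `"\t".join(map(str, record))`.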

The BlatJob.csh script reinvents the wheel, which invites bugs and slows things down dramatically: BlatJob.csh takes 10x longer to run than the BLAT job it launches! Lift files can also be created with the faSplit tool. I would rather use that than write lift files myself and risk introducing a bug into the overall procedure.
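
A lift file only has to shift coordinates reported on a chunk back onto the sequence the chunk was cut from. The following minimal sketch shows that idea, assuming tab-separated lift lines in the five-column layout (offset, oldName, oldSize, newName, newSize) and half-open intervals. The function names are hypothetical, and the sketch deliberately ignores strand handling and PSL-specific details that liftUp takes care of, which is exactly why it is safer to rely on the existing tools than to hand-roll this step.

```python
from typing import Dict, Iterable, Tuple

LiftRecord = Tuple[int, str, int, str, int]  # (offset, oldName, oldSize, newName, newSize)


def load_lift(lines: Iterable[str]) -> Dict[str, LiftRecord]:
    """Parse tab-separated lift lines into a table keyed by chunk (old) name."""
    table: Dict[str, LiftRecord] = {}
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        offset, old_name, old_size, new_name, new_size = line.rstrip("\n").split("\t")
        table[old_name] = (int(offset), old_name, int(old_size), new_name, int(new_size))
    return table


def lift_interval(table: Dict[str, LiftRecord], chunk_name: str,
                  start: int, end: int) -> Tuple[str, int, int]:
    """Map a plus-strand, half-open [start, end) interval on a chunk to its parent sequence."""
    offset, _, old_size, new_name, _ = table[chunk_name]
    if not (0 <= start <= end <= old_size):
        raise ValueError(f"interval {start}-{end} lies outside chunk {chunk_name}")
    return new_name, start + offset, end + offset
```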

The original lift over description at http://hgwdev.cse.ucsc.edu/~kent/src/unzipped/hg/doc/liftOver.txt and the steps at http://genomewiki.ucsc.edu/index.php/LiftOver_Howto are helpful. I recently did a same-species lift over (~350 Mb genome) in which ~94% of the annotations from the target assembly mapped to the query; of the annotations that mapped, ~90% are good. You can also check out my simple pipeline at https://github.com/yeban/flo (I am still working on the "how to use" documentation).
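
Mapping rates like the ones quoted above can be computed from the two files liftOver writes: one with the mapped records and one with the unmapped records, where the unmapped file interleaves `#` comment lines giving the reason a record failed. A minimal sketch, assuming one annotation per data line; the function names are hypothetical.

```python
from typing import Iterable, Tuple


def count_records(lines: Iterable[str]) -> int:
    """Count data records, skipping blank lines and '#' comment lines."""
    return sum(1 for line in lines if line.strip() and not line.startswith("#"))


def mapping_summary(mapped_lines: Iterable[str],
                    unmapped_lines: Iterable[str]) -> Tuple[int, int, float]:
    """Return (mapped, total, fraction mapped) for one liftOver run."""
    mapped = count_records(mapped_lines)
    unmapped = count_records(unmapped_lines)
    total = mapped + unmapped
    return mapped, total, (mapped / total if total else 0.0)
```

Taking the rounded figures at face value, ~94% mapped with ~90% of those good means roughly 0.94 × 0.90 ≈ 0.85, i.e. about 85% of all annotations carried over well.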