Talk:Same species lift over construction

From genomewiki

Revision as of 11:21, 7 May 2015

The scripts on this page are a bit misleading:

The splitting process is inefficient. Depending on how many scaffolds you have, the approach taken by this script can create millions of jobs, and the run can take days even on very fast 40-core machines. Simply splitting the query sequence into 5,000 bp chunks (and grouping them into 30-50 groups) is enough to run BLAT within minutes, even on a reasonably powerful desktop computer.
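A minimal Python sketch of that splitting strategy, for illustration only: it cuts every sequence into consecutive 5,000 bp pieces and spreads the pieces over a fixed number of batches (40 by default, within the suggested 30-50), so the number of jobs no longer grows with the number of scaffolds. The function names, the `<parent>_<index>` piece naming and the greedy size balancing are assumptions of this sketch, not part of the scripts on this page; faSplit can already do the cutting itself.

```python
from typing import Dict, List, NamedTuple


class Chunk(NamedTuple):
    """One fixed-size piece of a query sequence."""
    name: str    # hypothetical naming scheme: <parent>_<index>
    parent: str  # original scaffold/contig name
    offset: int  # 0-based start of the piece within its parent
    seq: str


def split_into_chunks(seqs: Dict[str, str], chunk_size: int = 5000) -> List[Chunk]:
    """Cut each sequence into consecutive, non-overlapping pieces.

    The last piece of a sequence may be shorter than chunk_size.
    """
    chunks: List[Chunk] = []
    for parent, seq in seqs.items():
        for index, start in enumerate(range(0, len(seq), chunk_size)):
            piece = seq[start:start + chunk_size]
            chunks.append(Chunk(f"{parent}_{index}", parent, start, piece))
    return chunks


def group_chunks(chunks: List[Chunk], n_groups: int = 40) -> List[List[Chunk]]:
    """Spread chunks over a fixed number of batches (one BLAT job each).

    Greedy balancing: every chunk goes to the batch holding the fewest
    bases so far, so the job count stays at n_groups however many
    scaffolds the assembly has.
    """
    groups: List[List[Chunk]] = [[] for _ in range(n_groups)]
    sizes = [0] * n_groups
    for chunk in chunks:
        target = sizes.index(min(sizes))
        groups[target].append(chunk)
        sizes[target] += len(chunk.seq)
    return groups


if __name__ == "__main__":
    # Toy input: a 12,000 bp scaffold and a 3,000 bp scaffold.
    toy = {"scaffold1": "A" * 12000, "scaffold2": "C" * 3000}
    pieces = split_into_chunks(toy, chunk_size=5000)
    print([c.name for c in pieces])
    # prints ['scaffold1_0', 'scaffold1_1', 'scaffold1_2', 'scaffold2_0']
    batches = group_chunks(pieces, n_groups=2)
    print([len(b) for b in batches])  # prints [2, 2]
```

Each batch would then be written to its own FASTA file and run as a single BLAT job, giving a few dozen jobs instead of one per scaffold.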

The BlatJob.csh script reinvents the wheel, which can introduce bugs and slows things down dramatically: BlatJob.csh takes 10x longer to run than the BLAT job it launches! Also, lift-up files can be created with the faSplit tool. I would rather use that than try to create lift-up files myself and risk a bug in the overall procedure.
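To make concrete what a lift file encodes, here is a minimal sketch that reads one and translates an interval on a piece back to coordinates on the parent scaffold. It assumes the five tab-separated columns offset, piece name, piece size, parent name and parent size; the piece names in the example are hypothetical and not necessarily what faSplit generates. It is meant only for spot-checking a lift file, not as a replacement for the one faSplit writes.

```python
from typing import Dict, Iterable, NamedTuple, Tuple


class LiftEntry(NamedTuple):
    offset: int       # 0-based start of the piece within its parent
    piece_size: int
    parent: str
    parent_size: int


def read_lift(lines: Iterable[str]) -> Dict[str, LiftEntry]:
    """Parse lift lines, assuming tab-separated columns:
    offset, pieceName, pieceSize, parentName, parentSize."""
    table: Dict[str, LiftEntry] = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        offset, piece, piece_size, parent, parent_size = line.split("\t")[:5]
        table[piece] = LiftEntry(int(offset), int(piece_size), parent, int(parent_size))
    return table


def lift_interval(table: Dict[str, LiftEntry], piece: str,
                  start: int, end: int) -> Tuple[str, int, int]:
    """Translate a half-open [start, end) interval on a piece to parent coordinates."""
    entry = table[piece]
    if not 0 <= start <= end <= entry.piece_size:
        raise ValueError(f"{start}-{end} lies outside piece {piece}")
    return entry.parent, entry.offset + start, entry.offset + end


if __name__ == "__main__":
    lft = [
        "0\tscaffold1_0\t5000\tscaffold1\t12000",
        "5000\tscaffold1_1\t5000\tscaffold1\t12000",
        "10000\tscaffold1_2\t2000\tscaffold1\t12000",
    ]
    table = read_lift(lft)
    print(lift_interval(table, "scaffold1_1", 100, 250))
    # prints ('scaffold1', 5100, 5250)
```

Checking a few known hits this way is a cheap sanity test that a generated lift file lines up with the original scaffolds before the BLAT results are lifted.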

The original lift over description at http://hgwdev.cse.ucsc.edu/~kent/src/unzipped/hg/doc/liftOver.txt and the steps at http://genomewiki.ucsc.edu/index.php/LiftOver_Howto are helpful. I just did a same species lift over (~350 Mb genome) with ~94% of the annotations from the target assembly mapped to the query. Of all the annotations mapped, ~90% are good. You can also check out my pipeline here: https://github.com/yeban/flo.
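Taking the two figures at face value, and assuming the ~90% refers only to the annotations that were mapped, the share of all target annotations that ended up well placed is roughly:

```latex
\[ 0.94 \times 0.90 = 0.846 \approx 85\% \]
```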