Talk:Same species lift over construction: Difference between revisions

From genomewiki
Latest revision as of 08:02, 1 September 2018

The scripts on this page are a bit misleading:

The splitting process is inefficient. The approach taken by this script can create millions of jobs, depending on the number of scaffolds you have, and can result in run times of days even on very fast 40-core machines. Simply splitting the query sequence into 5000 bp chunks (and grouping them into 30-50 groups) is enough to run BLAT within minutes, even on reasonably powerful desktop computers.
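
To make the chunk-and-group idea concrete, here is a minimal Python sketch. It is only an illustration, not what flo or the UCSC tools actually do: the function names, the `name_offset` chunk naming scheme and the round-robin grouping are all assumptions of mine.

```python
def chunk_sequence(name, seq, chunk_size=5000):
    """Yield (chunk_name, offset, chunk_seq) for consecutive fixed-size chunks.

    The chunk naming scheme "<name>_<offset>" is hypothetical.
    """
    for offset in range(0, len(seq), chunk_size):
        yield f"{name}_{offset}", offset, seq[offset:offset + chunk_size]


def group_chunks(chunks, n_groups=40):
    """Distribute chunks round-robin into at most n_groups non-empty lists."""
    groups = [[] for _ in range(n_groups)]
    for i, chunk in enumerate(chunks):
        groups[i % n_groups].append(chunk)
    # Drop empty groups when there are fewer chunks than groups.
    return [g for g in groups if g]


if __name__ == "__main__":
    # Toy 12 kb "scaffold"; a real run would read sequences from a FASTA file.
    chunks = list(chunk_sequence("scaf1", "A" * 12000))
    for group_id, group in enumerate(group_chunks(chunks, n_groups=2)):
        print(group_id, [(chunk_name, offset) for chunk_name, offset, _ in group])
```

Each group would then become one BLAT job, so the job count is fixed by the number of groups (30-50) and no longer depends on the number of scaffolds.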

The BlatJob.csh script reinvents the wheel, which can introduce bugs and slow things down dramatically. Indeed, BlatJob.csh takes 10x more time to run than the BLAT job it launches! Lift up files can instead be created with the faSplit tool. I would rather use that than try to create lift up files myself and risk a bug in the overall procedure.
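
For readers unsure what a lift up file encodes, the sketch below shows how one record maps chunk coordinates back to scaffold coordinates. It assumes the five tab-separated columns (offset, old name, old size, new name, new size) that I understand liftUp to expect; the chunk names are made up and only the forward strand is handled. It is meant to explain the bookkeeping, not to replace faSplit or liftUp.

```python
from typing import NamedTuple


class LiftRecord(NamedTuple):
    offset: int     # where the chunk starts within the parent sequence
    old_name: str   # chunk name (hypothetical naming scheme)
    old_size: int   # chunk length
    new_name: str   # parent scaffold name
    new_size: int   # parent scaffold length


def parse_lift_line(line):
    """Parse one tab-separated lift record (assumed five-column layout)."""
    offset, old_name, old_size, new_name, new_size = line.rstrip("\n").split("\t")[:5]
    return LiftRecord(int(offset), old_name, int(old_size), new_name, int(new_size))


def lift_interval(record, start, end):
    """Translate a chunk-relative interval into parent coordinates."""
    if not 0 <= start <= end <= record.old_size:
        raise ValueError("interval lies outside the chunk")
    return record.new_name, start + record.offset, end + record.offset


if __name__ == "__main__":
    rec = parse_lift_line("5000\tscaf1_5000\t5000\tscaf1\t12000")
    print(lift_interval(rec, 10, 110))
```

An off-by-one mistake in the offset here would silently shift every lifted annotation, which is the kind of bug that using the existing tools avoids.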

The original lift over description at http://hgwdev.soe.ucsc.edu/~kent/src/unzipped/hg/doc/liftOver.txt and the steps at http://genomewiki.ucsc.edu/index.php/LiftOver_Howto are helpful. I just did a same species lift over (~350 Mb genome) with ~94% of the annotations from the target assembly mapped to the query. Of all the annotations mapped, ~90% are good. You can also check out my pipeline here - https://github.com/yeban/flo.