LiftOver Howto
Latest revision as of 15:29, 26 April 2018
2018 UPDATE NOTE
This page is an interesting historical discussion and well worth the read.
However, please note that the UCSC tool chain commands DoSameSpeciesLiftOver.pl and DoBlastzChainNet.pl can now perform the entire sequence of steps described here, in your own environment and with your selected genome sequences.
Discussion
Creating a liftOver file is very similar to a whole-genome alignment. A liftOver file is a chain file, where for each region in the genome the alignments of the best/longest syntenic regions are used to translate features from one version of a genome to another.
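To make the idea of "translating features" concrete, here is a minimal sketch of how a single coordinate is carried through a chain. The function name map_pos and the tiny example chain are made up for illustration; the sketch only handles chains whose query strand is "+", and it is not how the liftOver tool itself is implemented.

```shell
# map_pos CHAINFILE TNAME POS
# Print "qName qPos" for a 0-based position on target sequence TNAME,
# or "unmapped" if the position lies in no aligned block.
# Assumption: only chains with query strand "+" are considered.
map_pos() {
  awk -v want="$2" -v pos="$3" '
    $1 == "chain" {
      # header fields: chain score tName tSize tStrand tStart tEnd
      #                qName qSize qStrand qStart qEnd id
      active = ($3 == want && $10 == "+")
      t = $6; q = $11; qname = $8
      next
    }
    active && NF >= 1 {
      # alignment line: size [dt dq]; the last line of a chain has size only
      if (pos >= t && pos < t + $1) { print qname, q + (pos - t); found = 1; exit }
      t += $1 + $2
      q += $1 + $3
    }
    END { if (!found) print "unmapped" }
  ' "$1"
}

# Hypothetical example: one chain with two aligned blocks (80 and 100 bases),
# separated by a gap of 20 bases in the target and 30 bases in the query.
demo=$(mktemp)
printf 'chain 1000 chr1 1000 + 100 300 chrA 900 + 50 260 1\n80 20 30\n100\n\n' > "$demo"
map_pos "$demo" chr1 120   # inside the first block
map_pos "$demo" chr1 190   # inside the target gap
map_pos "$demo" chr1 250   # inside the second block
rm -f "$demo"
```

With this example the three calls print "chrA 70", "unmapped" and "chrA 210": positions inside an aligned block are shifted by the block's offset, and positions in a gap have no counterpart in the other assembly.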
With competing assemblies becoming more common, a liftOver file is sometimes necessary for someone who sets up their own genome browser for a different assembly.
This page is based on Minimal_Steps_For_LiftOver, but is even more minimalistic.
Also see the pages Whole_genome_alignment_howto and Same_species_lift_over_construction for some background on tools and terminology. That page also links to a DoChainNetBlastz.pl script.
In particular, see the section Alignments_with_Blastz and make sure you understand the script RunLastzChain sh.txt.
Outline
- BLAT the new genome onto the old genome
- Sort/Chain/Merge/Split/Net
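The outline relies on several command-line tools from the UCSC tool set. As a small pre-flight sketch (the helper name missing_tools is made up here, and the tool list is simply the set of commands used further down this page), you can check that they are all on your PATH before starting:

```shell
# Print the name of every listed command that is not found on PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Tools used in the steps of this page.
missing=$(missing_tools blat axtChain chainMergeSort chainSplit chainSort \
  twoBitInfo chainNet netChainSubset)

if [ -n "$missing" ]; then
  echo "Missing tools:" >&2
  echo "$missing" >&2
else
  echo "All tools found."
fi
```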
Alignment
- My genome is rather small, so I don't do any splitting or lifting steps anywhere, which makes things simpler. If you need to split your genome, see Minimal_Steps_For_LiftOver: there, the old genome is split into chunks and a lifting file is generated. After the chunks are aligned, the resulting chains are lifted back to the original coordinates.
- I have repeatmasked my new genome assembly. The masked fa files are in ../ci3/rm/masked
- The old assembly is called ci2.2bit; the new assembly is in the directory ../ci3, and its repeatmasked sequences are in rm/masked below that directory.
- I want the alignment .psl files to be in the directory psl, so I did the alignment with:
mkdir psl
for i in ../ci3/rm/masked/*.masked; do blat ../ci2.2bit $i -tileSize=12 -fastMap -minIdentity=98 psl/`basename $i .fa.masked`.psl -noHead -minScore=100; done
- Translate psl files to chains in the directory chain:
mkdir chain
for i in psl/*.psl; do axtChain -linearGap=medium -psl $i ../ci2.2bit ../ci3/ci3.2bit chain/`basename $i .psl`.chain; done
- Merge the sorted chain files and split the result by target sequence into the directory chainMerge (-lump=50 limits the output to 50 files):
mkdir chainMerge
chainMergeSort chain/*.chain | chainSplit chainMerge stdin -lump=50
- Concatenate and sort the chains:
cat chainMerge/*.chain > all.chain
chainSort all.chain all.sorted.chain
- Netting needs the chromosome sizes of both assemblies:
twoBitInfo ../ci3/ci3.2bit ci3.chromInfo
twoBitInfo ../ci2.2bit ci2.chromInfo
- Netting: identify alignable regions from chains:
mkdir net
chainNet all.sorted.chain ci2.chromInfo ci3.chromInfo net/all.net /dev/null
- Finally, select the right alignable regions using the nets, creating a "liftOver" file:
netChainSubset net/all.net all.chain ci2ToCi3.liftOver
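Before using the result, it can be worth a quick look at what ended up in it. The following is only a rough sanity-check sketch: the helper name chain_summary is made up, and it just counts chain headers and sums the sizes of the ungapped blocks in a chain file.

```shell
# chain_summary CHAINFILE
# Print the number of chains and the total number of aligned (ungapped) bases.
chain_summary() {
  awk '
    $1 == "chain" { chains++; next }      # one header line per chain
    NF == 1 || NF == 3 { bases += $1 }    # block lines: "size dt dq" or "size"
    END { printf "chains=%d aligned_bases=%.0f\n", chains, bases }
  ' "$1"
}

# Summarize the liftOver file from the last step, if it exists here.
if [ -f ci2ToCi3.liftOver ]; then
  chain_summary ci2ToCi3.liftOver
fi
```

The file can then be passed as the chain argument of the liftOver tool, whose arguments are oldFile map.chain newFile unMapped, for example `liftOver genes.ci2.bed ci2ToCi3.liftOver genes.ci3.bed unmapped.bed` (the .bed file names are placeholders).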