LiftOver Howto

From genomewiki

2018 UPDATE NOTE

This page is an interesting historical discussion and well worth the read.

HOWEVER, please note that the UCSC automation scripts doSameSpeciesLiftOver.pl and doBlastzChainNet.pl can now perform this entire sequence of steps in your own environment, with the genome sequences you select.

Discussion

Creating a liftOver file is very similar to a whole-genome alignment. A liftOver file is a chain file, where for each region in the genome the alignments of the best/longest syntenic regions are used to translate features from one version of a genome to another.
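
To make the format a bit more concrete: each chain in such a file starts with a header line of the form "chain score tName tSize tStrand tStart tEnd qName qSize qStrand qStart qEnd id", followed by lines giving ungapped block sizes and the gaps between them. Here the target (t) is the old assembly and the query (q) is the new one. The sketch below prints the old and new ranges for each chain header. The file example.chain, its sequence names and its coordinates are made up for illustration, and describe_chains is just a helper name chosen here.

```shell
# Write a tiny made-up chain file (names and coordinates are invented).
cat > example.chain <<'EOF'
chain 4900 chr01 10000 + 100 600 scaffold_1 9000 + 50 560 1
200 10 20
290

EOF

# For every chain header line, print the target (old) range, the query (new)
# range, the query strand and the chain score.
describe_chains() {
    awk '$1 == "chain" {
        printf "%s:%d-%d -> %s:%d-%d (%s strand, score %d)\n", $3, $6, $7, $8, $11, $12, $10, $2
    }' "$1"
}

describe_chains example.chain
```

In the example, the two blocks (200 and 290 bases) plus the gap of 10 on the target side add up to the 500 bases between tStart and tEnd, which is a handy consistency check when reading chains by eye.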

With competing assemblies becoming more common, a liftOver file is sometimes necessary for anyone who sets up their own genome browser for a different assembly.

This page is based on Minimal_Steps_For_LiftOver, but is even more minimalistic.

Also see the pages Whole_genome_alignment_howto and Same_species_lift_over_construction for some background on tools and terminology. From there you will also find a link to a DoChainNetBlastz.pl script.


Specifically, note the page Alignments_with_Blastz and make sure you understand the script RunLastzChain_sh.txt.

Outline

  • BLAT the new genome onto the old genome
  • Sort/Chain/Merge/Split/Net
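
Before starting, it can save some time to check that the command-line tools used in the steps on this page are actually installed. This is only a small sketch: missing_tools is a helper name invented here, and the tool list is simply read off the commands shown on this page.

```shell
# Print the name of every given command that is not found on the PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Tools called in the alignment, chaining and netting steps of this page.
missing=$(missing_tools blat axtChain chainMergeSort chainSplit chainSort \
    twoBitInfo chainNet netChainSubset)

if [ -n "$missing" ]; then
    # $missing is left unquoted on purpose so the names print on one line.
    echo "Not found on PATH:" $missing
fi
```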

Alignment

  • My genome is rather small, so I don't do any splitting or lifting steps anywhere, which makes things simpler. If you need to split your genome, see Minimal_Steps_For_LiftOver: the old genome is split into chunks and a lift file is generated. After the chunks are aligned, the resulting chains are lifted back to the original coordinates.
  • I have repeatmasked my new genome assembly. The masked .fa files are in ../ci3/rm/masked
  • The old assembly is the file ../ci2.2bit; the new assembly is in the directory ../ci3 (as ../ci3/ci3.2bit in the commands below).
  • I want the alignment .psl files to be in the directory psl, so I did the alignment with:
 mkdir psl
 for i in ../ci3/rm/masked/*.masked; do blat ../ci2.2bit "$i" -tileSize=12 -fastMap -minIdentity=98 "psl/$(basename "$i" .fa.masked).psl" -noHead -minScore=100; done
  • Translate psl files to chains in the directory chain:
 mkdir chain
 for i in psl/*.psl; do axtChain -linearGap=medium -psl "$i" ../ci2.2bit ../ci3/ci3.2bit "chain/$(basename "$i" .psl).chain"; done
  • Merge the sorted chain files and split the result by target sequence into the directory chainMerge (-lump=50 limits the output to 50 files):
 mkdir chainMerge
 chainMergeSort chain/*.chain | chainSplit chainMerge stdin -lump=50
  • Concatenate and sort the chains:
 cat chainMerge/*.chain > all.chain
 chainSort all.chain all.sorted.chain
  • Netting needs the chromosome sizes of both assemblies:
 twoBitInfo ../ci3/ci3.2bit ci3.chromInfo
 twoBitInfo ../ci2.2bit ci2.chromInfo
  • Netting: identify alignable regions from chains:
 mkdir net
 chainNet all.sorted.chain ci2.chromInfo ci3.chromInfo net/all.net /dev/null
  • Finally, select the right alignable regions using the nets, creating a "liftOver" file:
 netChainSubset net/all.net all.chain ci2ToCi3.liftOver
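
A rough plausibility check on the result, sketched with awk: sum the ungapped block sizes per old-assembly sequence and compare the totals with the sizes in ci2.chromInfo. The helper name chain_aligned_bases is invented here, and the check only relies on the chain layout (a header line starting with "chain", followed by lines whose first field is a block size), so take the numbers as an estimate rather than an exact coverage figure.

```shell
# Sum the ungapped block sizes of all chains, grouped by target (old) sequence.
chain_aligned_bases() {
    awk '
        $1 == "chain" { target = $3; next }    # header line: remember the target name
        NF >= 1       { bases[target] += $1 }  # data line: first field is a block size
        END           { for (t in bases) printf "%s\t%d\n", t, bases[t] }
    ' "$1" | sort
}

# Only run on the real file if the steps above have produced it.
if [ -f ci2ToCi3.liftOver ]; then
    chain_aligned_bases ci2ToCi3.liftOver
fi
```

If a sequence that should be well conserved between the two assemblies shows very few aligned bases, the alignment or chaining parameters are worth a second look. The finished file is then used with the liftOver program, which takes the old features, the chain file, an output file and a file for unmapped features, for example: liftOver old.bed ci2ToCi3.liftOver new.bed unmapped.bed (the .bed file names are placeholders).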