LiftOver Howto

From genomewiki
Revision as of 11:23, 12 January 2011

Creating a liftOver file is very similar to creating a whole-genome alignment. A liftOver file is a chain file: for each region of the genome, the alignments of the best/longest syntenic regions are used to translate features from one version of a genome to another.

With competing assemblies becoming more common, a liftOver file is sometimes necessary for anyone who sets up their own genome browser for a different assembly.

This page is based on Minimal_Steps_For_LiftOver, but is even more minimalistic.

Also see the page Whole_genome_alignment_howto for some background on tools and terminology.

Outline

  • BLAT the new genome onto the old genome
  • Sort/Chain/Merge/Split/Net

Alignment

  • My genome is rather small, so I don't do any splitting or lifting steps anywhere, which keeps things simpler. If you need to split your genome, see Minimal_Steps_For_LiftOver: the old genome is split into chunks and a lift file is generated. After the chunks are aligned, the resulting chains are lifted back to the original coordinates.
  • The old assembly is called ci2.2bit; the new assembly is in the directory ../ci3.
  • I have repeatmasked my new genome assembly. The masked fa files are in ../ci3/rm/masked
  • I want the alignment .psl to be in the directory psl, so I did the alignment with
 mkdir psl
 for i in ../ci3/rm/masked/*.masked; do blat ../ci2.2bit "$i" -tileSize=12 -fastMap -minIdentity=98 "psl/$(basename "$i" .fa.masked).psl" -noHead -minScore=100; done
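The output name in the loop above comes from stripping the directory and the .fa.masked suffix with basename. A minimal sketch of just that naming step, which can serve as a dry run before starting a long BLAT job. The helper name psl_name and the file chr01.fa.masked are made up for illustration:

```shell
# Hypothetical helper: print the .psl path the loop would write for one masked file.
psl_name() {
    # basename drops the directory part and the trailing .fa.masked suffix
    printf 'psl/%s.psl\n' "$(basename "$1" .fa.masked)"
}

# Example with an assumed file name (not taken from the original setup)
psl_name ../ci3/rm/masked/chr01.fa.masked   # prints psl/chr01.psl
```

If the masked files use a different suffix, the second argument to basename has to be changed to match, otherwise the suffix ends up in the .psl file name.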
  • Translate psl files to chains in the directory chain:
 mkdir chain
 for i in psl/*.psl; do axtChain -linearGap=medium -psl "$i" ../ci2.2bit ../ci3/ci3.2bit "chain/$(basename "$i" .psl).chain"; done
  • Merge the sorted chain files and split the result into the directory chainMerge (lumped into 50 files):
 mkdir chainMerge
 chainMergeSort chain/*.chain | chainSplit chainMerge stdin -lump=50
  • Concatenate and sort the chains:
 cat chainMerge/*.chain > all.chain
 chainSort all.chain all.sorted.chain
  • The chromosome sizes are needed for netting:
 twoBitInfo ../ci3/ci3.2bit ci3.chromInfo
 twoBitInfo ../ci2.2bit ci2.chromInfo
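twoBitInfo writes one line per sequence, with the sequence name and its size separated by a tab. A quick check of such a file before netting might look like the sketch below. The function name check_sizes and the sample contents are assumptions for illustration, not part of the original recipe:

```shell
# Hypothetical helper: verify that a sizes file has "name<TAB>size" lines
# and print the number of sequences and the total length.
check_sizes() {
    awk -F'\t' '
        NF != 2 || $2 !~ /^[0-9]+$/ { printf "bad line %d: %s\n", NR, $0; bad = 1 }
        { total += $2 }
        END { if (bad) exit 1; printf "%d sequences, %d bases\n", NR, total }
    ' "$1"
}

# Made-up sample standing in for ci3.chromInfo
printf 'chr01\t1000\nchr02\t2500\n' > sample.chromInfo
check_sizes sample.chromInfo
```

On the real files this would be run as check_sizes ci2.chromInfo and check_sizes ci3.chromInfo; a non-zero exit status means at least one line did not have the expected two columns.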
  • Netting: identify alignable regions from chains:
 mkdir net
 chainNet all.sorted.chain ci2.chromInfo ci3.chromInfo net/all.net /dev/null
  • Finally, select the right alignable regions using the nets, creating a "liftOver" file:
 netChainSubset net/all.net all.chain ci2ToCi3.liftOver
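Before using the new file, it can help to see how much of the old assembly the chains span. Each chain starts with a header line of the form "chain score tName tSize tStrand tStart tEnd qName qSize qStrand qStart qEnd id", so a rough summary needs only awk. The sketch below is a sanity check under that assumption and is not part of the original recipe; chain_summary and the two-chain sample are invented for illustration:

```shell
# Hypothetical helper: count chains and sum the target span (tEnd - tStart)
# from the header lines of a chain file. The span includes gaps inside chains.
chain_summary() {
    awk '$1 == "chain" { n++; span += $7 - $6 }
         END { printf "%d chains, %d target bases spanned\n", n, span }' "$1"
}

# Made-up sample standing in for ci2ToCi3.liftOver
cat > sample.chain <<'EOF'
chain 5000 chr01 1000 + 0 400 chr01 1200 + 10 410 1
400

chain 3000 chr02 2500 + 100 350 chr02 2400 + 90 340 2
100 0 0
150

EOF
chain_summary sample.chain
```

The file itself is then passed to liftOver as the map file, for example "liftOver old.bed ci2ToCi3.liftOver new.bed unmapped.bed", where the .bed file names are placeholders.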