Minimal Steps For LiftOver
2018 UPDATE NOTE
This page is an interesting historical discussion and well worth the read.
HOWEVER, please note, the UCSC tool chain commands: DoSameSpeciesLiftOver.pl and DoBlastzChainNet.pl can now completely perform the sequence of events in your environment with your selected genome sequences.
Discussion
See also: Whole_genome_alignment_howto, LiftOver_Howto and Same_species_lift_over_construction.
This doc is a stripped down version of /kent/src/hg/doc/liftOver.txt.
The following describes the minimal steps needed to create a liftOver chain file that converts annotations between bacterial genome builds (genome size ~4Mb). Because they are the same species, I used BLAT to align the two builds. More divergent sequences should be aligned using blastz. Steps [2] to [4] can be modified to accept the blastz output file format.
[1] Split the query (NEW) genome build by FASTA record using faSplit:
$ faSplit sequence NEW.build 2 chr
NOTES: There are 2 fasta records in the NEW build, therefore, I used the argument '2' to split the build into two files. Each file contains one FASTA record, chr0 and chr1. Also need to create .lft files (for step [3]) that describe the two sequences. If breaking the sequences into chunks using the 'size' parameter, just use the -lift option. Otherwise, need to make your own .lft files. For example, chr0.lft contains:
0 chr 3061531 chr 3061531
where columns are:
start seq_name size seq_name size
(See also LiftUp format)
Step [6] requires chrom.sizes files that contain the sequence lengths of the FASTA records in the builds:
$ cd path_to_OLD_build
$ twoBitInfo OLD.2bit chrom.sizes
$ cd path_to_NEW_build
$ twoBitInfo NEW.2bit chrom.sizes
[2] BLAT query sequences from [1] against the OLD build:
$ blat path_to_OLD_build/OLD.2bit path_to_NEW_build/chr0.fa OLD.chr0.psl -tileSize=12 -minScore=100 -minIdentity=98 -fastMap
$ blat path_to_OLD_build/OLD.2bit path_to_NEW_build/chr1.fa OLD.chr1.psl -tileSize=12 -minScore=100 -minIdentity=98 -fastMap
NOTES: Not using ooc file because genome build is small. Probably a good idea to use a ooc file for larger genomes i.e. > X(?) Mb.
[3] Use liftUp to change the coordinate system. Requires .lft files created in step [1]:
$ liftUp -pslQ chr1.psl chr1.lft warn OLD.chr1.psl
$ liftUp -pslQ chr0.psl chr0.lft warn OLD.chr0.psl
[4] Chain together alignments from [3] using axtChain:
$ axtChain -linearGap=medium -psl chr0.psl path_to_OLD_build/OLD.2bit path_to_NEW_build/NEW.2bit chr0.chain
$ axtChain -linearGap=medium -psl chr1.psl path_to_OLD_build/OLD.2bit path_to_NEW_build/NEW.2bit chr1.chain
NOTES: Note the -psl argument. This allows axtChain to accept psl as input. I haven't tested this using blastz instead of BLAT. I figure you can convert the lav output from blastz using lavToAxt then use axtChain, ignoring the -psl option.
[5] Combine and sort chain files from [4]:
$ chainMergeSort *.chain | chainSplit chain stdin
NOTES: This creates a directory 'chain' which contains a chr.chain file. The OLD build use here only contained one chromosome. Not sure if axtChain will create one chain file for each target chromosome.
[6] Make alignment nets from chains in [5]:
$ cd chain
$ mkdir ../net
$ chainNet chr.chain path_to_OLD_build/chrom.sizes path_to_NEW_build/chrom.sizes ../net/chr.net /dev/null
NOTES: This step requires the chrom.sizes files from step [1].
[7] Create liftOver chain file:
$ netChainSubset ../net/chr.net chr.chain ../over/chr.chain
[8] Use your new liftOver chain file:
$ liftOver to_be_converted.bed ../over/chr.chain conversions.bed unmapped