Minimal Steps For LiftOver

From genomewiki
Jump to navigationJump to search

2018 UPDATE NOTE

This page is an interesting historical discussion and well worth the read.

HOWEVER, please note, the UCSC tool chain commands: DoSameSpeciesLiftOver.pl and DoBlastzChainNet.pl can now completely perform the sequence of events in your environment with your selected genome sequences.

Discussion

See also: Whole_genome_alignment_howto, LiftOver_Howto and Same_species_lift_over_construction.

This doc is a stripped down version of /kent/src/hg/doc/liftOver.txt.

The following describes the minimal steps needed to create a liftOver chain file that converts annotations between bacterial genome builds (genome size ~4Mb). Because they are the same species, I used BLAT to align the two builds. More divergent sequences should be aligned using blastz. Steps [2] to [4] can be modified to accept the blastz output file format.

[1] Split the query (NEW) genome build by FASTA record using faSplit:

$ faSplit sequence NEW.build 2 chr

NOTES: There are 2 fasta records in the NEW build, therefore, I used the argument '2' to split the build into two files. Each file contains one FASTA record, chr0 and chr1. Also need to create .lft files (for step [3]) that describe the two sequences. If breaking the sequences into chunks using the 'size' parameter, just use the -lift option. Otherwise, need to make your own .lft files. For example, chr0.lft contains:

0       chr 3061531 chr      3061531

where columns are:

start seq_name size seq_name size

(See also LiftUp format)

Step [6] requires chrom.sizes files that contain the sequence lengths of the FASTA records in the builds:

$ cd path_to_OLD_build
$ twoBitInfo OLD.2bit chrom.sizes
$ cd path_to_NEW_build
$ twoBitInfo NEW.2bit chrom.sizes

[2] BLAT query sequences from [1] against the OLD build:

$ blat path_to_OLD_build/OLD.2bit path_to_NEW_build/chr0.fa OLD.chr0.psl -tileSize=12 -minScore=100 -minIdentity=98 -fastMap
$ blat path_to_OLD_build/OLD.2bit path_to_NEW_build/chr1.fa OLD.chr1.psl -tileSize=12 -minScore=100 -minIdentity=98 -fastMap

NOTES: Not using ooc file because genome build is small. Probably a good idea to use a ooc file for larger genomes i.e. > X(?) Mb.

[3] Use liftUp to change the coordinate system. Requires .lft files created in step [1]:

$ liftUp -pslQ chr1.psl chr1.lft warn OLD.chr1.psl 
$ liftUp -pslQ chr0.psl chr0.lft warn OLD.chr0.psl 

[4] Chain together alignments from [3] using axtChain:

$ axtChain -linearGap=medium -psl chr0.psl path_to_OLD_build/OLD.2bit path_to_NEW_build/NEW.2bit chr0.chain
$ axtChain -linearGap=medium -psl chr1.psl path_to_OLD_build/OLD.2bit path_to_NEW_build/NEW.2bit chr1.chain

NOTES: Note the -psl argument. This allows axtChain to accept psl as input. I haven't tested this using blastz instead of BLAT. I figure you can convert the lav output from blastz using lavToAxt then use axtChain, ignoring the -psl option.

[5] Combine and sort chain files from [4]:

$ chainMergeSort *.chain | chainSplit chain stdin

NOTES: This creates a directory 'chain' which contains a chr.chain file. The OLD build use here only contained one chromosome. Not sure if axtChain will create one chain file for each target chromosome.

[6] Make alignment nets from chains in [5]:

$ cd chain
$ mkdir ../net
$ chainNet chr.chain path_to_OLD_build/chrom.sizes path_to_NEW_build/chrom.sizes ../net/chr.net /dev/null

NOTES: This step requires the chrom.sizes files from step [1].

[7] Create liftOver chain file:

$ netChainSubset ../net/chr.net chr.chain ../over/chr.chain

[8] Use your new liftOver chain file:

$ liftOver to_be_converted.bed ../over/chr.chain conversions.bed unmapped