Minimal Steps For LiftOver

From genomewiki
Revision as of 15:18, 26 April 2018 by Hiram (talk | contribs) (corrected doBlastzChainNet.pl)
(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)
Jump to navigationJump to search

2018 UPDATE NOTE

This page is an interesting historical discussion and well worth the read.

HOWEVER, please note, the UCSC tool chain commands: DoSameSpeciesLiftOver.pl and DoBlastzChainNet.pl can now completely perform the sequence of events in your environment with your selected genome sequences.

Discussion

See also: Whole_genome_alignment_howto, LiftOver_Howto and Same_species_lift_over_construction.

This doc is a stripped down version of /kent/src/hg/doc/liftOver.txt.

The following describes the minimal steps needed to create a liftOver chain file that converts annotations between bacterial genome builds (genome size ~4Mb). Because they are the same species, I used BLAT to align the two builds. More divergent sequences should be aligned using blastz. Steps [2] to [4] can be modified to accept the blastz output file format.

[1] Split the query (NEW) genome build by FASTA record using faSplit:

$ faSplit sequence NEW.build 2 chr

NOTES: There are 2 fasta records in the NEW build, therefore, I used the argument '2' to split the build into two files. Each file contains one FASTA record, chr0 and chr1. Also need to create .lft files (for step [3]) that describe the two sequences. If breaking the sequences into chunks using the 'size' parameter, just use the -lift option. Otherwise, need to make your own .lft files. For example, chr0.lft contains:

0       chr 3061531 chr      3061531

where columns are:

start seq_name size seq_name size

(See also LiftUp format)

Step [6] requires chrom.sizes files that contain the sequence lengths of the FASTA records in the builds:

$ cd path_to_OLD_build
$ twoBitInfo OLD.2bit chrom.sizes
$ cd path_to_NEW_build
$ twoBitInfo NEW.2bit chrom.sizes

[2] BLAT query sequences from [1] against the OLD build:

$ blat path_to_OLD_build/OLD.2bit path_to_NEW_build/chr0.fa OLD.chr0.psl -tileSize=12 -minScore=100 -minIdentity=98 -fastMap
$ blat path_to_OLD_build/OLD.2bit path_to_NEW_build/chr1.fa OLD.chr1.psl -tileSize=12 -minScore=100 -minIdentity=98 -fastMap

NOTES: Not using ooc file because genome build is small. Probably a good idea to use a ooc file for larger genomes i.e. > X(?) Mb.

[3] Use liftUp to change the coordinate system. Requires .lft files created in step [1]:

$ liftUp -pslQ chr1.psl chr1.lft warn OLD.chr1.psl 
$ liftUp -pslQ chr0.psl chr0.lft warn OLD.chr0.psl 

[4] Chain together alignments from [3] using axtChain:

$ axtChain -linearGap=medium -psl chr0.psl path_to_OLD_build/OLD.2bit path_to_NEW_build/NEW.2bit chr0.chain
$ axtChain -linearGap=medium -psl chr1.psl path_to_OLD_build/OLD.2bit path_to_NEW_build/NEW.2bit chr1.chain

NOTES: Note the -psl argument. This allows axtChain to accept psl as input. I haven't tested this using blastz instead of BLAT. I figure you can convert the lav output from blastz using lavToAxt then use axtChain, ignoring the -psl option.

[5] Combine and sort chain files from [4]:

$ chainMergeSort *.chain | chainSplit chain stdin

NOTES: This creates a directory 'chain' which contains a chr.chain file. The OLD build use here only contained one chromosome. Not sure if axtChain will create one chain file for each target chromosome.

[6] Make alignment nets from chains in [5]:

$ cd chain
$ mkdir ../net
$ chainNet chr.chain path_to_OLD_build/chrom.sizes path_to_NEW_build/chrom.sizes ../net/chr.net /dev/null

NOTES: This step requires the chrom.sizes files from step [1].

[7] Create liftOver chain file:

$ netChainSubset ../net/chr.net chr.chain ../over/chr.chain

[8] Use your new liftOver chain file:

$ liftOver to_be_converted.bed ../over/chr.chain conversions.bed unmapped