Minimal Steps For LiftOver

From genomewiki
Revision as of 18:05, 9 June 2007 by Vforget (talk | contribs)
(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)
Jump to navigationJump to search

The following outlines the minimal steps I undertook to make a liftOver chain file to convert annotations between bacterial genome builds (genome size ~4Mb).

[1] Split the query (NEW) genome build by FASTA record using faSplit:

$ faSplit sequence NEW.build 2 chr

NOTES: There are 2 fasta records in the NEW build. Therefore, I used the argument '2' to split the build into two files which each contain one FASTA record, chr0 and chr1. Also need to create .lft files that describe the two sequences. If breaking the sequences into chunks using the 'size' parameter, just use the -lift option. Otherwise, need to make your own .lft files, e.g. chr0.lft contains:

0       chr 3061531 chr      3061531

where columns are:

start seq_name size seq_name size

Also need to make chrom.sizes files what contain the sequence lengths of the FASTA records in the builds:

$ cd path_to_OLD_build
$ twoBitInfo OLD.2bit chrom.sizes
$ cd path_to_NEW_build
$ twoBitInfo NEW.2bit chrom.sizes

[2] BLAT query sequences from [1] against the OLD build:

$ blat path_to_OLD_build/OLD.2bit path_to_NEW_build/chr0.fa OLD.chr0.psl -tileSize=12 -minScore=100 -minIdentity=98 -fastMap
$ blat path_to_OLD_build/OLD.2bit path_to_NEW_build/chr1.fa OLD.chr1.psl -tileSize=12 -minScore=100 -minIdentity=98 -fastMap

NOTES: Not using ooc file becuase genome build is small. Probably a good idea to use a ooc file for larger genomes i.e. > X Mb (?).

[3] Use liftUp to change the coordinate system. Requires .lft files created in step [1]:

$ liftUp -pslQ chr1.psl chr1.lft warn OLD.chr1.psl 
$ liftUp -pslQ chr0.psl chr0.lft warn OLD.chr0.psl 

[4] Chain together alignments from [3] using axtChain:

$ axtChain -linearGap=medium -psl chr0.psl path_to_OLD_build/OLD.2bit path_to_NEW_build/NEW.2bit chr0.chain
$ axtChain -linearGap=medium -psl chr1.psl path_to_OLD_build/OLD.2bit path_to_NEW_build/NEW.2bit chr1.chain

NOTES: Note the -psl argument. This allows axtChain to accept psl as input. I haven't tested this using blastz instead of BLAT. I figure you can convert the lav output from blastz using lavToAxt then use axtChain, ignoring the -psl option.

[5] Combine and sort chain files from [4]:

$ chainMergeSort *.chain | chainSplit chain stdin

NOTES: This creates a directory 'chain' whicn contains a chr.chain file. The OLD build use here only contained one chromosome. Not sure if axtChain will create one chain file for each target chromosome.

[6] Make alignment nets from chains in [5]:

$ cd chain
$ mkdir ../net
$ chainNet chr.chain path_to_OLD_build/chrom.sizes path_to_NEW_build/chrom.sizes $ ../net/chr.net /dev/null

NOTES: This step requires the chrom.sizes files from step [1].

[7] Create liftOver chain file:

$ netChainSubset ../net/chr.net chr.chain ../over/chr.chain

[8] Use your new liftOver chain file:

$ liftOver to_be_converted.bed ../over/chr.chain conversions.bed unmapped