GenbankAlignments

From Genecats
Revision as of 20:06, 15 January 2019 by Braney

The Genbank Alignment Process

This is a description of the behind-the-scenes parts of the GenBank alignment process. If you want documentation on how to add a new species to the list of aligned assemblies, go here.

Overview

The GenBank alignment process aligns RNA and EST sequences from NCBI, as well as RefSeq mRNAs, to (almost) all the assemblies that UCSC supports. The process is divided into roughly five parts: download, process, align, database load, and dissemination. The first four parts and the beginning of the fifth happen on the genbank-101 machine. Dissemination then continues to hgwdev, hgwbeta, and finally our official mirrors (RR, euro, japan).
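As a rough illustration of the stages and machines listed above, the Python sketch below models them as plain data. Only the stage and machine names come from this page. STAGES, DISSEMINATION_HOPS and stage_hosts are hypothetical names and are not part of the genbank pipeline code. Grouping the three mirrors into a single final hop is also an assumption.

```python
from typing import List

# Hypothetical model of the pipeline overview. Stage and machine names come
# from the text. The structure itself is illustrative, not pipeline code.
STAGES: List[str] = ["download", "process", "align", "database load", "dissemination"]

# Hops that dissemination passes through, in order. The three official
# mirrors are grouped as one final hop (an assumption).
DISSEMINATION_HOPS: List[List[str]] = [
    ["genbank-101"],
    ["hgwdev"],
    ["hgwbeta"],
    ["RR", "euro", "japan"],
]


def stage_hosts(stage: str) -> List[str]:
    """Return the machines involved in a stage, as described in the overview."""
    if stage not in STAGES:
        raise ValueError(f"unknown stage: {stage!r}")
    if stage == "dissemination":
        # Flatten the hops into one ordered list of machines.
        return [host for hop in DISSEMINATION_HOPS for host in hop]
    # The first four stages run entirely on genbank-101.
    return ["genbank-101"]


if __name__ == "__main__":
    for stage in STAGES:
        print(f"{stage}: {', '.join(stage_hosts(stage))}")
```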

Realigning Tracks

It may be necessary to realign and reload tracks to change alignment parameters or other attributes. This is fairly straightforward when a genome database is initially being built. It is more complex if multiple systems have to be kept in sync.

If automated alignment or update has been enabled for the database, disable it by editing $gbRoot/etc/align.dbs. Make sure an automated alignment isn't currently running.

To trigger a realignment, one needs to remove the related files for some partition of the data, for all updates. These files live under either the genbank or the refseq alignment directories, for example:

    data/aligned/genbank.139.0/hg16/
    data/aligned/refseq.139.0/hg16/

To realign native RefSeq mRNAs for hg16, one would remove:

    data/aligned/refseq.139.0/hg16/*/mrna.native.*

To realign xeno GenBank ESTs for hg16, one would remove:

    data/aligned/genbank.139.0/hg16/*/est.*.xeno.*

Do an initial alignment as described above, restricting it with -srcDb and -type.

Reload the database with the partition of data that was realigned. The -srcDb and -type options restrict the subset. The organism category (native or xeno) isn't specified. Reloading of ESTs isn't supported; use -drop and -initialLoad instead. For example, to reload GenBank mRNAs:

    nice bin/gbDbLoadStep -reload -srcDb=genbank -type=mrna $db
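The realignment steps above can be sketched in Python as a dry run. The sketch only lists the files that would be removed and builds the reload command line, without running anything. It is a minimal sketch, not the pipeline's own tooling. The helper names realign_pattern, files_to_remove and reload_command are hypothetical. The file-name rule, which adds an extra name component for ESTs, is inferred from the two example paths, so verify the patterns against a real $gbRoot before removing anything.

```python
import glob
import os
from typing import List

# Hypothetical dry-run helpers for the manual realignment steps.
# The layout data/aligned/<srcDb>.<release>/<db>/<update>/<files> and the
# file-name patterns are inferred from the examples on this page.

SRC_DBS = ("genbank", "refseq")
SEQ_TYPES = ("mrna", "est")  # only the types mentioned on this page
ORG_CATS = ("native", "xeno")


def realign_pattern(gb_root: str, src_db: str, release: str, db: str,
                    seq_type: str, org_cat: str) -> str:
    """Build a glob matching one partition's alignment files for all updates."""
    if src_db not in SRC_DBS:
        raise ValueError(f"unknown srcDb: {src_db!r}")
    if seq_type not in SEQ_TYPES:
        raise ValueError(f"unknown type: {seq_type!r}")
    if org_cat not in ORG_CATS:
        raise ValueError(f"unknown organism category: {org_cat!r}")
    # EST files carry an extra name component (est.*.xeno.* in the text),
    # while mRNA files do not (mrna.native.*).
    if seq_type == "est":
        name = f"est.*.{org_cat}.*"
    else:
        name = f"{seq_type}.{org_cat}.*"
    # The "*" directory component matches every update directory.
    return os.path.join(gb_root, "data", "aligned", f"{src_db}.{release}", db, "*", name)


def files_to_remove(pattern: str) -> List[str]:
    """List (but do not delete) the files matched by a realignment pattern."""
    return sorted(glob.glob(pattern))


def reload_command(src_db: str, seq_type: str, db: str) -> List[str]:
    """Build the gbDbLoadStep -reload invocation shown in the text."""
    if seq_type == "est":
        # Per the text, EST reloads aren't supported; -drop and -initialLoad
        # are used instead, so no -reload command is built here.
        raise ValueError("EST reload not supported; use -drop and -initialLoad instead")
    # No organism category argument: -srcDb and -type select the subset.
    return ["nice", "bin/gbDbLoadStep", "-reload",
            f"-srcDb={src_db}", f"-type={seq_type}", db]
```

For example, realign_pattern(".", "refseq", "139.0", "hg16", "mrna", "native") yields ./data/aligned/refseq.139.0/hg16/*/mrna.native.* on a POSIX system. Listing before deleting is deliberate: once the output looks right, the same pattern can be handed to rm.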