GenbankAlignments

From Genecats

Revision as of 20:06, 15 January 2019

The Genbank Alignment Process

This page describes the behind-the-scenes parts of the GenBank alignment process. If you want documentation on how to add a new species to the list of aligned assemblies, go here.

Overview

The GenBank alignment process aligns RNA and EST sequences from NCBI, as well as the RefSeq mRNAs, to (almost) all the assemblies that UCSC supports. The process is divided into roughly five parts: download, process, align, database load, and dissemination. The first four parts and the beginning of the fifth happen on the genbank-101 machine; dissemination then continues through hgwdev and hgwbeta, and on to our official mirrors (RR, euro, japan).

Realigning Tracks

It may be necessary to realign and reload tracks to change alignment parameters or other attributes. This is fairly straightforward when a genome database is initially being built. It is more complex if one has to sync up multiple systems.

If automated alignment or update has been enabled for the database, disable it by editing $gbRoot/etc/align.dbs.

Make sure an automated alignment isn't currently running.

To trigger a realignment, one needs to remove the related files for some partition of the data, for all updates. These live under either the genbank or refseq alignment directories, for example:

  data/aligned/genbank.139.0/hg16/
  data/aligned/refseq.139.0/hg16/

To realign native RefSeq mRNAs for hg16, one would remove:

  data/aligned/refseq.139.0/hg16/*/mrna.native.*

To realign xeno GenBank ESTs for hg16, one would remove:

  data/aligned/genbank.139.0/hg16/*/est.*.xeno.*

Do an initial alignment, restricting it with -srcDb and -type.

Reload the database with the partition of data that was realigned. The -srcDb and -type options restrict the subset; the organism category (native or xeno) isn't specified. Reloading of ESTs isn't supported; use -drop and -initialLoad instead.

  nice bin/gbDbLoadStep -reload -srcDb=genbank -type=mrna $db
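
Because the realignment is triggered by deleting files, it can help to preview what a partition pattern matches before removing anything. The following is a minimal Python sketch of such a dry run, not part of the genbank tools. The function names are hypothetical, the directory layout is inferred from the hg16 examples above, and extending the two file-name patterns shown there to other type/category combinations is an assumption. The sketch only lists files and builds the reload command line; it deletes nothing and runs nothing.

```python
import glob


def alignment_glob(gb_root, src_db, release, db, seq_type, org_cat):
    """Return the glob for one partition of aligned data.

    Assumed layout, taken from the examples in the text:
    <gb_root>/data/aligned/<srcDb>.<release>/<db>/<update>/<files>
    """
    if seq_type == "mrna":
        name = f"mrna.{org_cat}.*"
    elif seq_type == "est":
        # In the text's example, EST file names have an extra component
        # between "est" and the organism category.
        name = f"est.*.{org_cat}.*"
    else:
        raise ValueError(f"unknown type: {seq_type!r}")
    return "/".join(
        [gb_root, "data", "aligned", f"{src_db}.{release}", db, "*", name]
    )


def files_to_remove(gb_root, src_db, release, db, seq_type, org_cat):
    """List (but do not delete) the files matching a partition, in all updates."""
    pattern = alignment_glob(gb_root, src_db, release, db, seq_type, org_cat)
    return sorted(glob.glob(pattern))


def reload_command(src_db, seq_type, db):
    """Build the reload command from the text as an argument list.

    Only the options shown in the text are used. The organism category is
    deliberately absent because the reload does not take one.
    """
    if seq_type == "est":
        raise ValueError(
            "reloading ESTs isn't supported; use -drop and -initialLoad instead"
        )
    return [
        "nice", "bin/gbDbLoadStep", "-reload",
        f"-srcDb={src_db}", f"-type={seq_type}", db,
    ]
```

For example, files_to_remove(gbRoot, "refseq", "139.0", "hg16", "mrna", "native") lists what the first removal pattern above would match. Review that list by hand before deleting anything.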