GenbankAlignments
Revision as of 20:06, 15 January 2019
The Genbank Alignment Process
This page describes the behind-the-scenes parts of the Genbank alignment process. If you want documentation on how to add a new species to the list of aligned assemblies, go here.
Overview
The genbank alignment process aligns RNA and EST sequences from NCBI, as well as the RefSeq mRNAs, to (almost) all the assemblies that UCSC supports. The process is divided into roughly five parts: download, process, align, database load, and dissemination. The first four parts and the beginning of the fifth happen on the genbank-101 machine. The dissemination part goes through hgwdev and hgwbeta, and then out to our official mirrors (RR, euro, japan).
Realigning Tracks
It may be necessary to realign and reload tracks to change alignment parameters or other attributes. This is fairly straightforward when a genome database is initially being built. It is more complex if one has to sync up multiple systems.
If automated alignment or update has been enabled for the database, disable it by editing $gbRoot/etc/align.dbs.
Make sure an automated alignment isn't currently running.
To trigger a realignment, one needs to remove the related files for some partition of the data, across all updates. These live under either the genbank or refseq alignment directories, for example:
data/aligned/genbank.139.0/hg16/
data/aligned/refseq.139.0/hg16/
To realign native RefSeq mRNAs for hg16, one would remove:
data/aligned/refseq.139.0/hg16/*/mrna.native.*
To realign xeno GenBank ESTs for hg16, one would remove:
data/aligned/genbank.139.0/hg16/*/est.*.xeno.*
Do an initial alignment as described above, restricting with -srcDb and -type.
Reload the database with the partition of data that was realigned. The -srcDb and -type options restrict the subset; the organism category (native or xeno) isn't specified. Reloading of ESTs isn't supported; use -drop and -initialLoad instead.
nice bin/gbDbLoadStep -reload -srcDb=genbank -type=mrna $db
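Since the removal step deletes alignment files across every update, it can help to list the matching files before removing anything. The following is a minimal sketch, not part of the genbank tools: list_partition is a hypothetical helper, and it assumes the current directory is $gbRoot so that the data/aligned/ paths above resolve.

```shell
#!/bin/sh
# Sketch only. list_partition is a hypothetical helper, not a genbank tool.
# Assumption: run from $gbRoot, where data/aligned/ lives.

# Print every file matching <aligned dir>/*/<pattern>, one per line.
# The "*" level corresponds to the per-update subdirectories.
list_partition() {
    aligned_dir=$1   # e.g. data/aligned/refseq.139.0/hg16
    pattern=$2       # e.g. 'mrna.native.*' (quoted by the caller)
    # $pattern is deliberately unquoted so the shell expands the glob here.
    for f in "$aligned_dir"/*/$pattern; do
        # An unmatched glob stays literal; skip anything that does not exist.
        if [ -e "$f" ]; then
            printf '%s\n' "$f"
        fi
    done
}

# Dry run: show what would be removed to realign native RefSeq mRNAs for hg16.
list_partition data/aligned/refseq.139.0/hg16 'mrna.native.*'

# Once the list looks right, the same files can be removed, for example:
# list_partition data/aligned/refseq.139.0/hg16 'mrna.native.*' | xargs rm -f
```

The removal line is left commented out on purpose, so that running the sketch as written only prints the candidate files.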