UCSC Genes Staging Process

The UCSC Gene set is created at UCSC for three vertebrate organisms: human, mouse, and rat. It is built once for the initial release of a new assembly, then updated sporadically after that. QAing, staging, and releasing an update to the UCSC Gene track involves enough steps that the process is documented here.

Data Involved

Databases
- Assembly Database (e.g. hg18) -- usually about 60 tables (more details on this below)
- UniProt Database (e.g. sp080707)
- Proteome Database (e.g. proteins080707)

Files
- Index files to speed searching (e.g. /gbdb/hg18/knownGene.ix and /gbdb/hg18/knownGene.ixx)
- Known Gene list for Google to index (e.g. /usr/local/apache/htdocs/knownGeneList/hg18/*)
- hgdownload files (e.g. /goldenPath/proteinDB/proteins080707/database/README.txt)

Tables in the Assembly Database

There are many tables involved in the UCSC Gene set. For a complete list of every table ever used to support a UCSC Gene set in any of the three organisms, see /cluster/bin/scripts/kgTables. If the assembly you are working with has tables supporting the UCSC Gene set that are not on this list, please add them to the list.
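
The check described above (finding supporting tables that are missing from the list) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it assumes the list can be read as plain text with one table name per line, which this page does not confirm, and the function names and the `exampleTable*` names are hypothetical. Only `knownGene` is a name taken from this page.

```python
def parse_table_list(text):
    """Return the set of table names found in a one-name-per-line listing.

    Assumption: blank lines and lines starting with '#' carry no table name.
    The real format of the kgTables list may differ.
    """
    names = set()
    for line in text.splitlines():
        name = line.strip()
        if name and not name.startswith("#"):
            names.add(name)
    return names


def unlisted_tables(assembly_tables, listed_tables):
    """Return, sorted, the tables in the assembly that the list does not mention."""
    return sorted(set(assembly_tables) - set(listed_tables))


if __name__ == "__main__":
    # Hypothetical listing text standing in for the contents of the list.
    listing = "knownGene\n# a comment line\nexampleTableA\n"
    # Hypothetical set of tables found to support the gene set in one assembly.
    supporting = ["knownGene", "exampleTableA", "exampleTableB"]
    for table in unlisted_tables(supporting, parse_table_list(listing)):
        print("not on the list:", table)
```

Any table reported this way is a candidate for adding to the list, after confirming that it really does support the UCSC Gene set.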

Details About UniProt and Proteome Databases

Staging on hgwbeta

As usual, the new databases and tables will be built on hgwdev. After QAing on hgwdev, the whole set should be staged on hgwbeta.
