UCSC Genes Staging Process: Difference between revisions

From genomewiki
(Replacing page with 'This page is no longer maintained.')
 
The UCSC Gene set is created at UCSC for three vertebrate organisms: human, mouse, and rat. It is built once for the initial release of a new assembly and updated sporadically after that. The process for QAing, staging, and releasing an update to the UCSC Gene track is complicated enough to warrant documentation.
 
== Data Involved ==
* Databases
** Assembly Database (e.g. hg18) -- usually about 60 tables (more details on this [[#Tables_in_the_Assembly_Database | below]])
** UniProt Database  (e.g. sp080707)
** Proteome Database (e.g. proteins080707)
 
* Files
** Index files to speed searching (e.g. /gbdb/hg18/knownGene.ix and /gbdb/hg18/knownGene.ixx)
** Known Gene list for Google to index (e.g. /usr/local/apache/htdocs/knownGeneList/hg18/*)
** hgdownload files (e.g. /goldenPath/proteinDB/proteins080707/database/README.txt)
 
 
== Tables in the Assembly Database ==
There are many tables involved in the UCSC Gene set.  For a complete list of all possible tables ever used to support any UCSC Genes set in any of the three organisms, see: /cluster/bin/scripts/kgTables.  If there are tables supporting the UCSC Gene set in the assembly you are working with that are not on this list, please add them to the list.  You might also consider checking the list of tables in the pushQ entry against the list of tables for the previous UCSC Gene set (sometimes developers forget to build all of the necessary tables). 
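The comparison against the previous table list can be sketched as a simple set difference. This is only an illustration: the two lists below are made-up samples (<code>kg2ToKg3</code> in particular is a hypothetical table name), and in practice the lists would be copied from the pushQ entries or from /cluster/bin/scripts/kgTables.

```python
def compare_table_lists(previous, current):
    """Return (missing, added): tables only in the previous list,
    and tables only in the current list, each sorted by name."""
    prev, cur = set(previous), set(current)
    return sorted(prev - cur), sorted(cur - prev)


# Hypothetical sample lists, for illustration only.
previous_tables = ["kgXref", "knownGene", "knownCanonical", "kg2ToKg3"]
current_tables = ["kgXref", "knownGene", "kg3ToKg4"]

missing, added = compare_table_lists(previous_tables, current_tables)
print("In previous set but not in current:", missing)
print("In current set but not in previous:", added)
```

A table that shows up as missing is not necessarily an error (version-mapping tables are expected to change names), but each one is worth asking the developer about.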
 
For the Summer 2008 update to the UCSC Gene set on hg18, the tables are:
 
affyHumanExonGs,
affyHumanExonGsMedian,
affyHumanExonGsRatio,
affyHumanExonGsRatioMedian,
bioCycMapDesc,
bioCycPathway,
ccdsKgMap,
ceBlastTab,
cgapAlias,
cgapBiocDesc,
cgapBiocPathway,
chromInfo,
dmBlastTab,
drBlastTab,
foldUtr3,
foldUtr5,
gnfAtlas2Distance,
gnfU95Distance,
humanHprdP2P,
humanVidalP2P,
humanWankerP2P,
keggMapDesc,
keggPathway,
kg3ToKg4,
kgAlias,
kgColor,
kgProtAlias,
kgProtMap2,
kgSpAlias,
kgTxInfo,
kgXref,
knownAlt,
knownBlastTab,
knownCanonical,
knownGene,
knownGeneMrna,
knownGenePep,
knownIsoforms,
knownToAllenBrain,
knownToCdsSnp,
knownToEnsembl,
knownToGnf1h,
knownToGnfAtlas2,
knownToHInv,
knownToHprd,
knownToLocusLink,
knownToPfam,
knownToRefSeq,
knownToSuper,
knownToU133,
knownToU133Plus2,
knownToU95,
knownToVisiGene,
mmBlastTab,
pbAnomLimit,
pbResAvgStd,
pbStamp,
pepCCntDist,
pepExonCntDist,
pepHydroDist,
pepIPCntDist,
pepMolWtDist,
pepMwAa,
pepPi,
pepPiDist,
pepResDist,
pfamDesc,
rnBlastTab,
scBlastTab,
scopDesc,
spMrna.
 
== Details About UniProt and Proteome Databases ==
 
Each UCSC Gene set is related to one UniProt database and one Proteome Database.  Each of these databases can support more than one UCSC Gene set (e.g. a single UniProt database might support the UCSC Genes on both hg18 and mm9). 
 
These databases are given a name based on the date they were created.  All UniProt databases are named using the following convention: spYYMMDD (e.g. sp080707).  All Proteome databases are named using the following convention: proteinsYYMMDD (e.g. proteins080707).
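As a small sketch of the naming convention above, the following Python helpers build and parse the two database names. The function names are illustrative and not part of any UCSC tooling. Parsing assumes Python's standard two-digit-year rule, under which 08 is read as 2008.

```python
import re
from datetime import date, datetime

# Matches spYYMMDD (UniProt) or proteinsYYMMDD (Proteome).
_DB_NAME = re.compile(r"^(sp|proteins)(\d{6})$")


def uniprot_db_name(created):
    """UniProt database name for a creation date, e.g. sp080707."""
    return "sp" + created.strftime("%y%m%d")


def proteome_db_name(created):
    """Proteome database name for a creation date, e.g. proteins080707."""
    return "proteins" + created.strftime("%y%m%d")


def parse_db_name(name):
    """Split a database name into its prefix and creation date."""
    match = _DB_NAME.match(name)
    if match is None:
        raise ValueError(f"not a spYYMMDD or proteinsYYMMDD name: {name!r}")
    created = datetime.strptime(match.group(2), "%y%m%d").date()
    return match.group(1), created


print(uniprot_db_name(date(2008, 7, 7)))
print(proteome_db_name(date(2008, 7, 7)))
print(parse_db_name("proteins080707"))
```

Parsing the names back into dates is a quick way to confirm that the UniProt and Proteome databases being pushed together come from the same build date.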
 
To make this transparent to the users, symbolic links are used; users see "uniProt" (but are actually using spYYMMDD).  Once you push these two databases to hgwbeta, ask the cluster-admin to update the symbolic links for uniProt and proteome in the /var/lib/mysql directory so that they point to the newly-pushed databases.  Do the same after the push from hgwbeta to the public website.
 
Additionally, as you set up the new databases on hgwbeta (then on the public website) you will need to edit hgcentralbeta.gdbPdb (then hgcentral.gdbPdb) to point to the correct databases.  For example, the hg18 entry in hgcentraltest looks like this:
 
mysql> select * from hgcentraltest.gdbPdb where genomeDb = 'hg18'\G
 
genomeDb:  hg18<BR>
proteomeDb: proteins080707
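The check and the edit can be sketched as a pair of SQL statements generated per central database. This is a hedged illustration, not an established procedure: the <code>update</code> statement is an assumption about how the edit might be made, based only on the two columns shown above (genomeDb and proteomeDb), and the helper name is hypothetical. Inputs are validated before being placed into the SQL text.

```python
import re

# Central databases named on this page.
CENTRAL_DBS = ("hgcentraltest", "hgcentralbeta", "hgcentral")


def gdbpdb_statements(central_db, genome_db, proteome_db):
    """Build (check, update) SQL strings for one gdbPdb row.

    The update statement is an assumed form of the edit; review it
    before running it against any real database.
    """
    if central_db not in CENTRAL_DBS:
        raise ValueError(f"unknown central database: {central_db!r}")
    if not re.fullmatch(r"[A-Za-z0-9]+", genome_db):
        raise ValueError(f"unexpected assembly name: {genome_db!r}")
    if not re.fullmatch(r"proteins\d{6}", proteome_db):
        raise ValueError(f"unexpected proteome database: {proteome_db!r}")

    table = f"{central_db}.gdbPdb"
    check = f"select * from {table} where genomeDb = '{genome_db}'\\G"
    update = (
        f"update {table} set proteomeDb = '{proteome_db}' "
        f"where genomeDb = '{genome_db}';"
    )
    return check, update


check, update = gdbpdb_statements("hgcentralbeta", "hg18", "proteins080707")
print(check)
print(update)
```

Running the check statement before and after the update makes it easy to confirm that only the intended assembly's row changed.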
 
== Staging on hgwbeta ==
As usual, the new databases and tables will be built on hgwdev.  After QAing on hgwdev, the whole set should be staged on hgwbeta. 
 
== Releasing to the public website ==
 
== Post-Release ==
We typically like to announce the release of a new UCSC Gene set.  Ask Donna to prepare an announcement for the website and genome-announce.  Also see this page: [[Post-Release-Checklist]].
 
[[Category:Browser_QA]]

Latest revision as of 19:26, 10 March 2011

This page is no longer maintained.