QAing UCSC Genes
hgGene page source information (see the link below):
Gene Sorter Column Sources:
Name | Description | Source |
# | Item Number in Displayed List/Select Gene | n/a |
Name | Gene Name/Select Gene | kgXref.geneSymbol |
UCSC ID | UCSC Transcript ID | knownGene.name |
UniProtKB | UniProtKB Protein Display ID | kgXref.spDisplayID or kgXref.spID_organism |
UniProtKB Acc | UniProtKB Protein Accession | kgXref.spID |
RefSeq | NCBI RefSeq Gene Accession | kgXref.refseq |
Entrez Gene | NCBI Entrez Gene/LocusLink ID | knownToLocusLink |
GenBank | GenBank mRNA Accession | kgXref.refseq or kgXref.mRNA |
Ensembl | Ensembl Transcript ID | knownToEnsembl |
GNF Atlas 2 ID | ID of Associated GNF Atlas 2 Expression Data | knownToGnfAtlas2 |
Gene Category | High Level Gene Category - Coding, Antisense, etc. | kgTxInfo.category |
CDS Score | Coding potential score from txCdsPredict | kgTxInfo.cdsScore |
VisiGene | UCSC VisiGene In Situ Image Browser | knownToVisiGene |
Allen Brain | Allen Brain Atlas In Situ Images of Adult Mouse Brains | knownToAllenBrain & allenBrainUrl |
U133 ID | ID of Associated Affymetrix U133 Expression Data | knownToU133 |
GNF Atlas 2 | GNF Expression Atlas 2 Data from U133A and GNF1H Chips | gnfAtlas2 |
Max GNF Atlas 2 | Maximum Expression Value of GNF Expression Atlas 2 | calculated? |
GNF Atlas 2 Delta | Normalized Difference in GNF Expression Atlas 2 from Selected Gene | gnfAtlas2Distance |
BLASTP | NCBI BLASTP Bit Score | knownBlastTab.bitScore |
BLASTP | NCBI BLASTP E-Value | knownBlastTab.evalue |
%ID | NCBI BLASTP Percent Identity | knownBlastTab.identity |
5' UTR Fold | 5' UTR Fold Energy (Estimated kcal/mol) | foldUtr5.energy |
3' UTR Fold | 3' UTR Fold Energy (Estimated kcal/mol) | foldUtr3.energy |
Exon Count | Number of Exons (Including Non-Coding) | knownGene.exonCount |
Intron Size | Size of biggest (or optionally smallest) intron | knownGene exonStarts - exonEnds |
Genome Position | Genome Position/Link to Genome Browser | (knownGene.txStart + txEnd)/2 |
Mouse | Mouse Ortholog (Best Blastp Hit to UCSC Known Genes) | mmBlastTab |
Rat | Rat Ortholog (Best Blastp Hit to UCSC Known Genes) | rnBlastTab |
Zebrafish | Danio rerio Ortholog (Best Blastp Hit to Ensembl) | drBlastTab |
Drosophila | D. melanogaster Ortholog (Best Blastp Hit to FlyBase Proteins) | dmBlastTab |
C. elegans | C. elegans Ortholog (Best Blastp Hit to WormPep) | ceBlastTab |
Yeast | Saccharomyces cerevisiae Ortholog (Best Blastp Hit to RefSeq) | scBlastTab |
Pfam Domains | Protein Family Domain Structure | knownToPfam → pfamDesc |
Superfamily | Protein Superfamily Assignments | ucscScop & scopDesc |
PDB | Protein Data Bank | kgProtMap2 & sp###### database |
Gene Ontology | Gene Ontology (GO) Terms Associated with Gene | kgProtMap2 & sp###### database |
M. Vidal P2P | Human Protein-Protein Interaction Network from Marc Vidal | humanVidalP2P |
E. Wanker P2P | Human Protein-Protein Interaction Network from Erich Wanker | humanWankerP2P |
HPRD P2P | Human Protein-Protein Interaction Network from the Human Reference Protein Database | humanHprdP2P |
Description | Short Description Line/Link to Details Page | kgXref.description |
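The Intron Size and Genome Position sources above are calculations rather than plain field lookups, so a small sketch may help when spot-checking values by hand. It assumes the usual genePred layout for knownGene, where exonStarts and exonEnds are comma-separated lists with a trailing comma. The function names, the integer division for the midpoint, and the choice to report 0 for single-exon transcripts are assumptions made for illustration, not the Gene Sorter's actual code.

```python
def parse_coords(blob):
    """Parse a comma-separated coordinate list such as '100,300,700,'."""
    return [int(x) for x in blob.split(",") if x.strip()]


def intron_sizes(exon_starts, exon_ends):
    """Return each intron size: the next exon's start minus this exon's end."""
    starts = parse_coords(exon_starts)
    ends = parse_coords(exon_ends)
    if len(starts) != len(ends):
        raise ValueError("exonStarts and exonEnds must have the same length")
    return [starts[i + 1] - ends[i] for i in range(len(starts) - 1)]


def pick_intron(exon_starts, exon_ends, smallest=False):
    """Return the biggest intron size, or the smallest when smallest=True."""
    sizes = intron_sizes(exon_starts, exon_ends)
    if not sizes:
        return 0  # assumption: a single-exon transcript reports 0
    return min(sizes) if smallest else max(sizes)


def genome_midpoint(tx_start, tx_end):
    """Return (txStart + txEnd)/2; integer division is an assumption."""
    return (tx_start + tx_end) // 2
```

For a made-up transcript with exonStarts `100,300,700,` and exonEnds `200,500,900,`, the two introns are 100 and 200 bases long, so the default column value would be 200 and the optional "smallest" value would be 100.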
Table Descriptions
This section attempts to describe how each table used in or related to UCSC Genes is used.
UCSC Genes and Gene Sorter (GS) Table Descriptions
allenBrainGene - "Human Cortex Gene Expression" link in the Sequence and Links to Tools and Databases section
allenBrainUrl - with knownToAllenBrain, creates the GS column "Allen Brain"
bioCycMapDesc - BioCyc description name in Biochemical and Signaling Pathways
bioCycPathway - BioCyc pathway name in Biochemical and Signaling Pathways
ccdsKgMap - determines the CCDS in the "Other Names for This Gene" section
ceBlastTab - other species: C. elegans
cgapAlias - links cgapID with kgXref.geneSymbol to pull info for the gene
cgapBiocDesc - BioCarta description in Biochemical and Signaling Pathways
cgapBiocPathway - BioCarta pathway name in Biochemical and Signaling Pathways
dmBlastTab - other species: D. melanogaster; leave as an open issue for now
drBlastTab - other species: zebrafish
foldUtr3 - mRNA Secondary Structure section
foldUtr5 - mRNA Secondary Structure section
gnfAtlas2 - own track; QA'd with that track, GS, and the Microarray Expression Data section
gnfAtlas2Distance - GS sort by "Expression (GNF Atlas2)" and GS column "GNF Atlas 2 Delta"
humanHprdP2P - GS column "HPRD P2P" and "sort by"
humanVidalP2P - GS column "M. Vidal Protein-to-Protein" and "sort by"
humanWankerP2P - GS column "E. Wanker Protein-to-Protein" and "sort by"
keggMapDesc - KEGG pathway description in Biochemical and Signaling Pathways
keggPathway - KEGG pathway name in Biochemical and Signaling Pathways
kg4ToKg5 - allows searching for an old ID from the previous gene set in the new gene set; users can also check the kg3ToKg4 table directly to find corresponding gene IDs
kgAlias - populates "Alternate Gene Symbols" in the Other Names section
kgColor - colors the gene in the browser
kgProtAlias - intermediate table?
kgProtMap2 - needed for SCOP Domains in Protein Domain and Structure Information and for Protein Data Bank in GS; also involved with the Proteome Browser (not released with hg19; being phased out)
kgSpAlias - duplicate of kgAlias with an extra field, spID, that is blank in all records
kgTxInfo - provides "Gene Model Information"
kgXref - provides the "other names for the gene"
knownAlt - separate track "Alt Events"; see tracks/altEvents/hg19/methods
knownBlastTab - Gene Sorter (GS "%ID" = identity, GS "BLASTP E-Value" = eValue, GS "BLASTP Bits" = bitScore)
knownCanonical - best transcript from each clusterId (splice variants are not displayed)
knownGene - primary table
knownGeneMrna - "mRNA" link in the Sequence and Links to Tools and Databases section
knownGenePep - "protein" link in the Sequence and Links to Tools and Databases section
knownIsoforms - groups transcripts into clusterId
knownToAllenBrain - with allenBrainUrl, creates the GS "Allen Brain" column/link
knownToCdsSnp - not pushing; Coding SNP column in Gene Sorter
knownToEnsembl - used in link to Ensembl
knownToGnf1h - dropped and didn't see changes on hgGene or Gene Sorter; gnfAtlas1?
knownToGnfAtlas2 - "Microarray..." section and Microarray link; GS "GNF Atlas 2 ID"
knownToHprd - creates the "HPRD" link in the Sequence and Links to Tools and Databases section
knownToLocusLink - used in link to Entrez Gene; see issues below
knownToPfam - gives the Pfam Domains section of Protein Domain and Structure Information, and GS
knownToRefSeq - used in link to RefSeq (Other Names)
knownToSuper - contains SCOP domain info with gene name and start/end
knownToTreefam - used for link to the TreeFam website in the Sequence and Links to Tools and Databases section
knownToU133 - Gene Sorter column "U133 ID"
knownToVisiGene - used in link to VisiGene
mmBlastTab - other species: mouse
pfamDesc - gives the Pfam description in the Pfam Domains section (step in GS)
rnBlastTab - other species: rat
scBlastTab - other species: S. cerevisiae
scopDesc - prints acc and description in "SCOP Domains" of Protein Domain and Structure Information
spMrna - intermediate table? Doesn't seem to directly affect hgGene or GS
ucscScop - from ucscID gets SCOP domainName
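Because most of the tables above key off the UCSC transcript ID, one check that may be useful is looking for IDs in a knownTo* table that have no matching knownGene.name. The sketch below illustrates the idea against an in-memory SQLite stand-in rather than the real MySQL databases. The `name`/`value` column layout, the helper names `find_orphans` and `build_toy_db`, and every ID in the toy data are assumptions for illustration only.

```python
import sqlite3


def find_orphans(conn, link_table, key_column="name"):
    """Return keys in link_table that have no matching knownGene.name."""
    # Table and column names are interpolated directly, so pass trusted names only.
    query = (
        f"SELECT t.{key_column} FROM {link_table} AS t "
        f"LEFT JOIN knownGene AS kg ON kg.name = t.{key_column} "
        f"WHERE kg.name IS NULL ORDER BY t.{key_column}"
    )
    return [row[0] for row in conn.execute(query)]


def build_toy_db():
    """Build a tiny stand-in database; all rows here are invented."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE knownGene (name TEXT);
        CREATE TABLE knownToEnsembl (name TEXT, value TEXT);
        CREATE TABLE knownToPfam (name TEXT, value TEXT);
        INSERT INTO knownGene VALUES ('uc001aaa.1'), ('uc001aab.1');
        INSERT INTO knownToEnsembl VALUES
            ('uc001aaa.1', 'ENST_A'), ('uc009zzz.1', 'ENST_B');
        INSERT INTO knownToPfam VALUES ('uc001aab.1', 'PF_A');
        """
    )
    return conn


conn = build_toy_db()
# Map each link table to the IDs that are missing from knownGene.
report = {t: find_orphans(conn, t) for t in ("knownToEnsembl", "knownToPfam")}
```

In the toy data only knownToEnsembl contains an ID that is absent from knownGene, so it is the only table with a non-empty entry in `report`. The same query shape should carry over to MySQL, though that has not been verified here.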
Tables Related to UCSC Genes That Are Separate Tracks
affyU133
allenBrainAli
exoniphy - created by Adam Siepel of Cornell for each assembly (2nd choice is to lift from previous assembly)
gnfAtlas2
nibbImageProbes
omimGene
omimGeneMap
omimMorbidMap
omimToKnownCanonical
vgAllProbes
Tables No Longer Part of UCSC Genes
knownToCdsSnp - dropping on all assemblies; found too many issues. Populated the Cds Snp column in Gene Sorter.
knownToGnf1h - part of GNF Atlas 1, which is not on hg19
Proteome Browser Tables (no longer releasing)
pbAnomLimit
pbResAvgStd
pepCCntDist
pepExonCntDist
pepHydroDist
pepIPCntDist
pepMolWtDist
pepPi
pepPiDist
pepResDist
pepMwAa
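One way to use the two retired lists above is to confirm that none of those tables appear in a table listing for a new assembly. The sketch below assumes the listing has already been collected (for example from `SHOW TABLES`) as a list of names. The function name, the grouping keys, and the idea of running this as a release check are assumptions, not an established QA step.

```python
# Tables dropped from UCSC Genes (from the notes above).
DROPPED_TABLES = {"knownToCdsSnp", "knownToGnf1h"}

# Proteome Browser tables that are no longer being released.
PROTEOME_BROWSER_TABLES = {
    "pbAnomLimit", "pbResAvgStd", "pepCCntDist", "pepExonCntDist",
    "pepHydroDist", "pepIPCntDist", "pepMolWtDist", "pepPi",
    "pepPiDist", "pepResDist", "pepMwAa",
}


def retired_tables_present(tables):
    """Return retired table names found in the listing, grouped by reason."""
    present = set(tables)
    return {
        "dropped": sorted(present & DROPPED_TABLES),
        "proteome_browser": sorted(present & PROTEOME_BROWSER_TABLES),
    }
```

Calling `retired_tables_present(["knownGene", "pepPi", "knownToGnf1h"])` would flag knownToGnf1h under `dropped` and pepPi under `proteome_browser`, while knownGene passes through unreported.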