BoG2013VariationPoster: Difference between revisions

From genomewiki
(→‎Common Gene Haplotype Alleles: reworded some gene haplotype allele points with suggestions from Tim)
(→‎Common Gene Haplotype Alleles: tried to make methods more clear, added notes about scoring and the reference variant, added scoring section)
Line 20: Line 20:
See the [http://hgwdev-demo3.cse.ucsc.edu/cgi-bin/hgTracks?hgS_doOtherUser=submit&hgS_otherUserName=Rhead&hgS_otherUserSessionName=BoG2013VariationPoster development version]. Click on any protein-coding gene in the '''UCSC Genes''' track and scroll to the '''Common Gene Haplotype Alleles''' section.  (The feature is currently implemented only on GRCh37/hg19 protein-coding genes.)
See the [http://hgwdev-demo3.cse.ucsc.edu/cgi-bin/hgTracks?hgS_doOtherUser=submit&hgS_otherUserName=Rhead&hgS_otherUserSessionName=BoG2013VariationPoster development version]. Click on any protein-coding gene in the '''UCSC Genes''' track and scroll to the '''Common Gene Haplotype Alleles''' section.  (The feature is currently implemented only on GRCh37/hg19 protein-coding genes.)


For each protein-coding gene in the UCSC Genes track, the 2,184 phased chromosomes in the [http://www.1000genomes.org/ 1000 Genomes Project] have been distilled into distinct haplotype alleles.
For each protein-coding gene in the UCSC Genes track, variant data from the 2,184 phased chromosomes in the [http://www.1000genomes.org/ 1000 Genomes Project] have been distilled into distinct haplotype alleles. Each haplotype allele is generated from GRCh37/hg19 reference DNA, with 1000 Genomes Project DNA variants spliced in, then translated into amino acids.
 
===Usage tips===


* By default, only non-synonymous, common (occurring in at least 1% of haplotype alleles) variants are displayed.  Including all variants in the display will generate the list of all haplotypes found in 1000 Genomes participants, though many of these haplotypes may have no protein-coding effect.  Including all variants will also update haplotype and homozygous frequency calculations.
* By default, only non-synonymous, common (occurring in at least 1% of haplotype alleles) variants are displayed.  Including all variants in the display will generate the list of all haplotypes found in 1000 Genomes participants, though many of these haplotypes may have no protein-coding effect.  Including all variants will also update haplotype and homozygous frequency calculations.
Line 26: Line 28:
* By default, only common (occurring with a frequency of more than 1%) haplotype alleles are displayed.
* By default, only common (occurring with a frequency of more than 1%) haplotype alleles are displayed.


* When the full sequence is displayed, columns with variants are highlighted by a green vertical line.  The effects of variants are highlighted by bolded red letters.  Synonymous changes are only evident when DNA bases are displayed.
* If the reference variant is present among the haplotype alleles generated from the 1000 Genomes data, it will be labeled as such in the "Reference  Variants" column.
 
* When the full sequence is displayed, columns with variants are highlighted by green vertical lines.  The effects of variants are highlighted by bolded red letters.  Synonymous changes are only evident when DNA bases are displayed.


* All columns are sortable.
* All columns are sortable.
Line 36: Line 40:
* Clicking on variants in the summary section takes you to the corresponding track details pages of the [http://genome.ucsc.edu/cgi-bin/hgTrackUi?db=hg19&g=tgpPhase1 1000G Ph1 Vars] track.
* Clicking on variants in the summary section takes you to the corresponding track details pages of the [http://genome.ucsc.edu/cgi-bin/hgTrackUi?db=hg19&g=tgpPhase1 1000G Ph1 Vars] track.


* See the distribution of each haplotype allele among major population groups by clicking the "Display distribution" button.  Optionally display the distribution of each allele among the [http://www.1000genomes.org/about#ProjectSamples groups defined by the 1000 Genomes Project].
* Clicking the "Display distribution" button will show the distribution of each haplotype allele among major population groups.  Optionally display the distribution of each allele among the [http://www.1000genomes.org/about#ProjectSamples groups defined by the 1000 Genomes Project].
 
* By default, scoring is hidden.  Three types of scores are provided to help users find haplotype alleles that occur more or less frequently than expected or that have unusual distributions in populations.  See definitions below.
 
===Scoring definitions===
 
* '''Hap score''':
 
* '''Hom score''':
 
* '''Pop score''' (only visible when population distributions are displayed):


==How to get help==
==How to get help==

Revision as of 23:35, 26 April 2013

This page contains links related to the UCSC Genome Browser poster presented by Brooke Rhead at Biology of Genomes 2013 [1].

Poster: New variation resources at the UCSC Genome Browser

This poster presents a first look at two new UCSC Genome Browser features for assessing variation. Both features will be released to the public website in the coming months.

Variant Annotation Integrator

See the development version.

In order to assist researchers in annotating and prioritizing thousands of variant calls from sequencing projects, we are developing the Variant Annotation Integrator (VAI) and anticipate a first public release by the end of June 2013. There are several existing tools that can annotate variant calls with predicted functional effects on protein-coding genes and regulatory regions, for example Ensembl's Variant Effect Predictor (VEP). However, these tools are usually restricted to one or two sources of gene annotations and a limited set of additional annotation sources. The VAI will offer much broader choices from the full UCSC database and user-provided custom tracks.

The first release of the VAI will include a simple user interface for selecting variants to annotate as well as the most commonly used annotation sources: protein-coding genes, regulatory regions, predictions from tools such as SIFT and PolyPhen2 provided by the Database of Non-Synonymous Functional Predictions (dbNSFP), and already-discovered variants from dbSNP. The simple user interface will also provide several options for filtering variants based on annotations. A link to an advanced user interface will enable sophisticated users to add annotation sources from the full database.

Common Gene Haplotype Alleles

See the development version. Click on any protein-coding gene in the UCSC Genes track and scroll to the Common Gene Haplotype Alleles section. (The feature is currently implemented only on GRCh37/hg19 protein-coding genes.)

For each protein-coding gene in the UCSC Genes track, variant data from the 2,184 phased chromosomes in the 1000 Genomes Project have been distilled into distinct haplotype alleles. Each haplotype allele is generated from GRCh37/hg19 reference DNA, with 1000 Genomes Project DNA variants spliced in, then translated into amino acids.
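
The "splice in, then translate" step described above can be illustrated with a minimal Python sketch. It is not the Genome Browser's implementation: the function names, the toy coding sequence, and the restriction to single-base substitutions at 0-based coding positions are all assumptions made for illustration.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def splice_variants(reference_cds, variants):
    """Return the coding sequence with single-base variants substituted in.

    `variants` maps a 0-based coding position to the alternate base
    (hypothetical representation; indels are not handled).
    """
    seq = list(reference_cds)
    for pos, alt in variants.items():
        seq[pos] = alt
    return "".join(seq)


def translate(cds):
    """Translate a coding sequence into one-letter amino acids ('*' = stop)."""
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, usable, 3))


# Toy reference coding sequence (not a real gene): ATG GCC AAA TGA
reference_cds = "ATGGCCAAATGA"

# Non-synonymous change: GCC -> GTC (Ala -> Val)
nonsyn = splice_variants(reference_cds, {4: "T"})
# Synonymous change: GCC -> GCT (Ala -> Ala), visible only at the DNA level
syn = splice_variants(reference_cds, {5: "T"})

print(translate(reference_cds), translate(nonsyn), translate(syn))
```

In this toy case the synonymous variant produces the same protein sequence as the reference, which is why such changes are only evident when DNA bases are displayed.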

Usage tips

  • By default, only non-synonymous, common (occurring in at least 1% of haplotype alleles) variants are displayed. Including all variants in the display will generate the list of all haplotypes found in 1000 Genomes participants, though many of these haplotypes may have no protein-coding effect. Including all variants will also update haplotype and homozygous frequency calculations.
  • By default, only common (occurring with a frequency of more than 1%) haplotype alleles are displayed.
  • If the reference variant is present among the haplotype alleles generated from the 1000 Genomes data, it will be labeled as such in the "Reference Variants" column.
  • When the full sequence is displayed, columns with variants are highlighted by green vertical lines. The effects of variants are highlighted by bolded red letters. Synonymous changes are only evident when DNA bases are displayed.
  • All columns are sortable.
  • Hovering your mouse over numbers in the "haplotype frequency" and "homozygous frequency" columns will show you the actual count of alleles (e.g., "370 of 2184").
  • Hovering your mouse over some buttons displays hints.
  • Clicking on variants in the summary section takes you to the corresponding track details pages of the 1000G Ph1 Vars track.
  • Clicking the "Display distribution" button will show the distribution of each haplotype allele among major population groups. Optionally display the distribution of each allele among the groups defined by the 1000 Genomes Project.
  • By default, scoring is hidden. Three types of scores are provided to help users find haplotype alleles that occur more or less frequently than expected or that have unusual distributions in populations. See definitions below.
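
The haplotype frequency and the "common" filter mentioned in the tips above can be sketched as follows. This is a rough illustration only: the function names and the toy counts are hypothetical, and the sketch does not attempt the homozygous frequency or the scores, whose definitions are not given on this page.

```python
from collections import Counter


def haplotype_counts(chromosomes):
    """Count how many phased chromosomes carry each distinct haplotype allele."""
    return Counter(chromosomes)


def common_haplotypes(chromosomes, min_freq=0.01):
    """Keep haplotype alleles whose frequency is more than `min_freq`."""
    total = len(chromosomes)
    counts = haplotype_counts(chromosomes)
    return {hap: n / total for hap, n in counts.items() if n / total > min_freq}


def hover_label(count, total):
    """Format a count the way the hover text example reads, e.g. '370 of 2184'."""
    return f"{count} of {total}"


# Toy data: one translated haplotype allele per phased chromosome (made-up counts).
chromosomes = ["MVK*"] * 1800 + ["MAK*"] * 370 + ["MAR*"] * 14
total = len(chromosomes)  # 2184 phased chromosomes

counts = haplotype_counts(chromosomes)
common = common_haplotypes(chromosomes)

for hap in sorted(common, key=common.get, reverse=True):
    print(hap, f"{common[hap]:.1%}", hover_label(counts[hap], total))
```

With these made-up counts, the allele seen on 14 of 2,184 chromosomes (about 0.6%) falls below the 1% threshold and is excluded from the default display.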

Scoring definitions

  • Hap score:
  • Hom score:
  • Pop score (only visible when population distributions are displayed):

How to get help

Other posters about the UCSC Genome Browser

  • Using the UCSC Genome Browser to evaluate putative genetic variants. Hinrichs AS et al. Biology of Genomes, 2012. genomewiki page .pptx, PDF
  • Visually integrating genomic data in the UCSC Genome Browser. Hinrichs AS et al. HGV 2011. genomewiki page .pptx, PDF
  • UCSC Genome Browser Data Hubs. Zweig AS et al. Biology of Genomes, 2011. PDF
  • Genome-wide ENCODE Data at UCSC. Rosenbloom KR et al. ASHG, 2010. PPT
  • UCSC Genome Browser Tool Suite. Hinrichs AS et al. Genomics of Common Disease, 2008. .ppt, PDF
  • More Presentations