This page contains links related to the UCSC Genome Browser poster presented by Brooke Rhead at Biology of Genomes 2013 [1].

Poster: New variation resources at the UCSC Genome Browser

This poster presents a first look at two new UCSC Genome Browser features for assessing variation. Both features will be released to the public website in the coming months.

Files: .pptx, PDF
Abstract: .txt

Variant Annotation Integrator

See the development version. The sample data from our Personal Genome SNP format page has been loaded as a custom track.

Overview

In order to assist researchers in annotating and prioritizing thousands of variant calls from sequencing projects, we are developing the Variant Annotation Integrator (VAI) and anticipate a first public release by the end of June 2013. There are several existing tools that can annotate variant calls with predicted functional effects on protein-coding genes and regulatory regions, for example Ensembl's Variant Effect Predictor (VEP). However, these tools are usually restricted to one or two sources of gene annotations and a limited set of additional annotation sources. The VAI will offer much broader choices from the full UCSC database and user-provided custom tracks.

The first release of the VAI will include a simple user interface for selecting variants to annotate as well as the most commonly used annotation sources: protein-coding genes, regulatory regions, predictions from tools such as SIFT and PolyPhen2 provided by the Database of Non-Synonymous Functional Predictions (dbNSFP), and already-discovered variants from dbSNP. The simple user interface will also provide several options for filtering variants based on annotations. A link to an advanced user interface will enable sophisticated users to add annotation sources from the full database.

Usage tips

This section still needs to be elaborated. Points to cover:

What to expect to work and not work in the mockup.

The VAI needs a custom track in VCF or SNP (Personal Genome SNP) format to do anything. Eventually it will allow starting with native SNPs and will have a place to paste in input data.
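To give a feel for the kind of input a VCF custom track carries, here is a minimal sketch that reads the first five standard VCF columns (CHROM, POS, ID, REF, ALT) from tab-separated lines. The sample records and the function names are hypothetical and chosen only for illustration; this is not how the VAI itself parses input.

```python
def parse_vcf_line(line):
    """Split one VCF data line into its first five standard columns."""
    chrom, pos, var_id, ref, alt = line.rstrip("\n").split("\t")[:5]
    return {
        "chrom": chrom,
        "pos": int(pos),        # VCF positions are 1-based
        "id": var_id,
        "ref": ref,
        "alts": alt.split(","),  # ALT may list several alleles
    }


def read_variants(lines):
    """Skip blank lines and '#' header lines, parse the rest."""
    return [
        parse_vcf_line(line)
        for line in lines
        if line.strip() and not line.startswith("#")
    ]


# Hypothetical records, not taken from any real call set.
sample_lines = [
    "##fileformat=VCFv4.1",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t12345\t.\tA\tG\t.\tPASS\t.",
    "chr2\t67890\t.\tC\tT,G\t.\tPASS\t.",
]
variants = read_variants(sample_lines)
```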

Common Gene Haplotype Alleles

See the development version. Click on any protein-coding gene in the UCSC Genes track and scroll to the Common Gene Haplotype Alleles section. (The feature is currently implemented only on GRCh37/hg19 protein-coding genes.)

Phase 1 of the 1000 Genomes Project included 1,092 individual genomes. For each protein-coding gene in the UCSC Genes track, variant data from the 2,184 phased chromosomes (two per individual, for autosomes) have been distilled into distinct haplotype alleles, that is, distinct sets of variants found on at least one of the 1000 Genomes subject chromosomes.
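The distillation step can be pictured as grouping chromosomes by the exact set of variants each one carries. The sketch below is only an illustration of that idea, not the browser's implementation; the variant names and the toy input are hypothetical.

```python
from collections import Counter


def distill_haplotypes(phased_chromosomes):
    """Count how many chromosomes carry each distinct set of variants."""
    return Counter(frozenset(variants) for variants in phased_chromosomes)


# Hypothetical toy input: 3 individuals, 2 phased chromosomes each.
toy_chromosomes = [
    {"varA", "varB"}, set(),
    {"varA", "varB"}, {"varC"},
    set(), {"varA", "varB"},
]

haplotypes = distill_haplotypes(toy_chromosomes)
for variants, count in haplotypes.most_common():
    # An empty variant set corresponds to the all-reference haplotype.
    label = ",".join(sorted(variants)) or "reference"
    print(f"{label}: N={count} of {len(toy_chromosomes)}")
```

With the real data the same grouping would run over 2 × 1,092 = 2,184 chromosomes per autosomal gene.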

Usage tips

By default, only non-synonymous, common variants are displayed. Common variants occur in at least 1% of 1000 Genomes subject chromosomes. Including all variants in the display will generate the list of all haplotypes found in 1000 Genomes participants, though many of these haplotypes may have no protein-coding effect. Note that haplotype and homozygous frequency calculations depend upon which variants are included.

By default, only common haplotype alleles are displayed. Common haplotypes occur in at least 1% of 1000 Genomes subject chromosomes.
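As a rough worked example of the 1% cutoff: with 2,184 subject chromosomes, 1% is 21.84, so a variant or haplotype would need to appear on at least 22 chromosomes to count as common. The function name and default arguments below are assumptions made for illustration.

```python
TOTAL_CHROMOSOMES = 2 * 1092  # two phased chromosomes per Phase 1 individual


def is_common(count, total=TOTAL_CHROMOSOMES, threshold=0.01):
    """True if the allele is seen on at least `threshold` of all chromosomes."""
    return count / total >= threshold


# Smallest chromosome count that passes the 1% threshold.
min_common_count = next(
    n for n in range(TOTAL_CHROMOSOMES + 1) if is_common(n)
)
```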

There may be no "reference haplotype" (made of entirely reference variants) represented in the 1000 Genomes data. If there is, it will be marked as "reference" in the table of haplotypes.

When the full sequence is displayed, columns with variants are highlighted by green vertical lines. The effects of variants are highlighted by bolded red letters. Synonymous changes are only evident when DNA bases are displayed. Each haplotype allele sequence is generated from GRCh37/hg19 reference DNA, with 1000 Genomes Project DNA variants spliced in, then translated into amino acids.

All columns are sortable. Sorting on a variant while the full sequence is displayed will highlight that variant with a vertical blue line.

Hovering your mouse over numbers in the "haplotype frequency" and "homozygous frequency" columns will show you the actual count of alleles (e.g., "N=370 of 2184").

Hovering your mouse over some buttons displays hints.

Clicking on non-reference variants in the summary section takes you to the corresponding track details pages of the 1000G Ph1 Vars track.

Clicking the "Display population" button will show the distribution of each haplotype allele among major population groups. You can optionally display the distribution of each allele among the groups defined by the 1000 Genomes Project.

By default, scoring is hidden. Three types of scores are provided to help users find haplotype alleles that occur more or less frequently than expected or that have unusual distributions in populations. See definitions below.

Scoring definitions

Hap score: The haplotype score is the normalized (-log10) probability of finding exactly N subject chromosomes with this haplotype, given the proportions of individual variants. The score is normalized by dividing by the total number of variants. Normalization allows comparing the scores between genes with many variants and those with few. The score will be positive if the haplotype is more frequent than expected by chance and negative if less frequent.
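The following sketch shows one way such a score could be computed. It rests on several assumptions that the definition above does not spell out: the expected haplotype frequency is taken as the product of the individual variant frequencies (treating variants as independent), the probability of exactly N chromosomes is taken from a binomial distribution, and the sign is applied separately by comparing N with the expected count, since -log10 of a probability is never negative on its own. All function and parameter names are hypothetical.

```python
import math


def log10_binomial_pmf(k, n, p):
    """log10 of P(X = k) for X ~ Binomial(n, p), computed in log space."""
    if p <= 0.0:
        return 0.0 if k == 0 else -math.inf
    if p >= 1.0:
        return 0.0 if k == n else -math.inf
    log_comb = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    log_pmf = log_comb + k * math.log(p) + (n - k) * math.log1p(-p)
    return log_pmf / math.log(10)


def hap_score(observed, total, variant_freqs, carried):
    """Signed, per-variant -log10 probability of seeing exactly `observed`.

    variant_freqs: non-reference allele frequency of each variant in the gene.
    carried: for each variant, whether this haplotype carries it.
    Assumes at least one variant (otherwise the normalization is undefined).
    """
    expected_freq = 1.0
    for freq, has_variant in zip(variant_freqs, carried):
        expected_freq *= freq if has_variant else 1.0 - freq
    surprise = -log10_binomial_pmf(observed, total, expected_freq)
    # Assumed sign convention: positive when seen more often than expected.
    sign = 1.0 if observed >= expected_freq * total else -1.0
    return sign * surprise / len(variant_freqs)


# Hypothetical gene with two variants at 20% and 30% frequency.
more_than_expected = hap_score(370, 2184, [0.2, 0.3], [True, True])
less_than_expected = hap_score(20, 2184, [0.2, 0.3], [True, True])
```

In this toy case the expected count is 0.2 × 0.3 × 2,184 ≈ 131, so 370 observations give a positive score and 20 give a negative one.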

Hom score: The homozygous score is the (-log10) probability of finding exactly N individuals with this haplotype on both chromosomes, given the actual frequency of the haplotype in subject chromosomes. The score will be positive if the haplotype is found homozygous in more individuals than expected and negative if it is found in fewer. Negative values might suggest that the haplotype is deleterious when homozygous.
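A comparable sketch for the homozygous score follows. It assumes that chromosomes pair at random, so that an individual is homozygous for a haplotype of frequency f with probability f², and that the number of homozygous individuals follows a binomial distribution over the 1,092 subjects. The sign is again applied separately by comparing the observed count with the expected one. These modelling choices and all names are assumptions, not the browser's documented method.

```python
import math


def log10_binomial_pmf(k, n, p):
    """log10 of P(X = k) for X ~ Binomial(n, p), with 0 < p < 1."""
    log_comb = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return (log_comb + k * math.log(p) + (n - k) * math.log1p(-p)) / math.log(10)


def hom_score(observed_homozygous, individuals, haplotype_freq):
    """Signed -log10 probability of exactly `observed_homozygous` homozygotes."""
    p_hom = haplotype_freq ** 2  # assumed: random pairing of chromosomes
    surprise = -log10_binomial_pmf(observed_homozygous, individuals, p_hom)
    expected = p_hom * individuals
    sign = 1.0 if observed_homozygous >= expected else -1.0
    return sign * surprise


# Haplotype seen on 370 of 2184 chromosomes: about 31 homozygotes expected.
freq = 370 / 2184
excess_homozygotes = hom_score(50, 1092, freq)
deficit_homozygotes = hom_score(10, 1092, freq)
```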

Pop score (only visible when population distributions are displayed): The population skew score is the variance between population groups divided by N, the number of occurrences of the haplotype. The most frequently occurring haplotypes will potentially have larger scores, but if N is small, a skew in population distribution is not unexpected.
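One possible reading of this definition is sketched below: take the number of occurrences of the haplotype in each population group, compute the population variance of those counts, and divide by the total N. Whether the real score uses raw counts or per-group proportions is not stated above, so the use of raw counts is an assumption, and the group labels are placeholders.

```python
from statistics import pvariance


def pop_score(counts_by_group):
    """Variance of per-group haplotype counts divided by total occurrences."""
    counts = list(counts_by_group.values())
    total = sum(counts)
    if total == 0:
        return 0.0
    return pvariance(counts) / total


# Placeholder group labels with hypothetical counts.
even_counts = {"group1": 10, "group2": 10, "group3": 10, "group4": 10}
skewed_counts = {"group1": 40, "group2": 0, "group3": 0, "group4": 0}

even_score = pop_score(even_counts)
skewed_score = pop_score(skewed_counts)
```

Under this reading, a haplotype spread evenly across groups scores zero, and one concentrated in a single group scores higher.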

How to get help

Search for answers in our mail list archives: http://genome.ucsc.edu/contacts.html
Email a new question to our actively monitored list: genome@soe.ucsc.edu
OpenHelix's free training materials: http://www.openhelix.com/downloads/ucsc/ucsc_home.shtml
