29mammals: Difference between revisions

Latest revision as of 21:26, 18 May 2011

A high-resolution map of evolutionary constraint in the human genome based on 29 eutherian mammals

This page contains links for viewing the data sets from the above paper in the UCSC Genome Browser. All tracks can be loaded into the hg18 genome browser at once using the 29mammals custom tracks link (http://genome.ucsc.edu/cgi-bin/hgTracks?db=hg18&hgt.customText=http://genomewiki.ucsc.edu/images/b/b0/29mammals_track.txt), or individual tracks can be loaded separately from the list below. The data for these tracks have been slightly modified from the original data to conform to UCSC Genome Browser formats. Primary data are available for download from the Broad Institute's site: https://www.broadinstitute.org/scientific-community/science/projects/mammals-models/29-mammals-project-supplementary-info
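The track links on this page all follow one pattern: the hgTracks CGI receives the assembly in `db` and the address of a custom-track file in `hgt.customText`. A minimal sketch of building such a link is shown here; the function name `custom_track_url` is hypothetical, and only the two query parameters visible in the links on this page are assumed.

```python
from urllib.parse import urlencode

# Base address of the browser CGI, as used by the links on this page.
HGTRACKS = "http://genome.ucsc.edu/cgi-bin/hgTracks"


def custom_track_url(track_file_url, db="hg18"):
    """Return an hgTracks URL that loads one custom-track file for assembly `db`.

    Hypothetical helper: it only reproduces the `db` and `hgt.customText`
    parameters seen in the links on this page.
    """
    # safe=":/" leaves the nested URL unescaped, matching the page's links.
    query = urlencode({"db": db, "hgt.customText": track_file_url}, safe=":/")
    return f"{HGTRACKS}?{query}"


all_tracks = custom_track_url(
    "http://genomewiki.ucsc.edu/images/b/b0/29mammals_track.txt"
)
print(all_tracks)
```

Passing a different track file address yields the link for an individual track.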


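Constrained Elements

Custom track: http://genome.ucsc.edu/cgi-bin/hgTracks?db=hg18&hgt.customText=http://genomewiki.ucsc.edu/images/2/28/29mammalsConstrainedElements_track.txt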

Summary:  Lists of constrained elements.
For each 12-mer in the human genome a measure of constraint was scored using SiPhy (see reference below), both as a rate-based score (omega), and a measure that includes biased substitution patterns (pi). Those falling in annotated Ancestral Repeats were used as a background. An empirical cutoff score was set corresponding to 10% FDR, and all 12-mers above this score were considered significant. Overlapping significant 12-mers were clustered to yield larger elements.
There are four tracks in this set:
Constraint Elements pi lods score, data: mean: 7.214417, min: 1.711200, max: 16.237301, std: 1.364699
Constraint Elements pi branch length score, data: mean: 4.173642, min: -0.049714, max: 7.351400, std: 0.755658
Constraint Elements omega lods score, data: mean: 7.667489, min: 1.676100, max: 16.680599, std: 1.378213
Constraint Elements omega branch length score, data: mean: 4.255876, min: 0.301830, max: 7.376400, std: 0.731477
Data provided by Or Zuk at broad.mit.edu and Manuel Garber at broadinstitute.org
Garber, M. et al. Identifying novel constrained elements by exploiting biased substitution patterns.
Bioinformatics 25, i54-62, doi:10.1093/bioinformatics/btp190 (2009). http://bioinformatics.oxfordjournals.org/content/25/12/i54.short
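
The last two steps of the summary above (keep every 12-mer scoring above the cutoff, then cluster overlapping significant 12-mers) can be sketched as follows. This is an illustration only, not the SiPhy implementation: the function name, the input layout (one score per 12-mer start position), and the use of half-open coordinates are assumptions, and the cutoff is taken as given rather than derived from the Ancestral Repeat background.

```python
def cluster_significant_kmers(scores, cutoff, k=12):
    """Merge overlapping significant k-mers into half-open (start, end) elements.

    scores[i] is assumed to be the constraint score of the k-mer that
    starts at position i; a k-mer is significant when its score is
    strictly above `cutoff`.
    """
    elements = []
    for start, score in enumerate(scores):
        if score <= cutoff:
            continue  # not significant
        end = start + k
        if elements and start < elements[-1][1]:
            # Overlaps the previous element: extend that element.
            elements[-1] = (elements[-1][0], max(elements[-1][1], end))
        else:
            elements.append((start, end))
    return elements


# Made-up scores: significant 12-mers start at positions 1, 2 and 20.
toy_scores = [0.0] * 40
toy_scores[1] = toy_scores[2] = 5.0
toy_scores[20] = 5.0
print(cluster_significant_kmers(toy_scores, cutoff=3.0))
```

In this sketch, 12-mers that merely touch end to start are not merged; whether the original analysis merged such abutting 12-mers is not stated on this page.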

Base Level Measure of Constraint

Summary:  Base-level measure of constraint scored using SiPhy (see reference above), 
both as a rate-based score (omega) and a measure that includes biased substitution patterns (pi).
The data are not available on the UCSC Genome Browser because no suitable display format exists.
Contact: Manuel Garber <mgarber@broadinstitute.org>

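Novel Exons

Custom track: http://genome.ucsc.edu/cgi-bin/hgTracks?db=hg18&hgt.customText=http://genomewiki.ucsc.edu/images/e/ed/29mammalsNovelExons_track.txt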

Summary: A list of identified novel conserved exons.
Exons were identified using a version of CONGO (previously developed for
the Drosophila genomes, see reference below) enhanced to handle mammalian
exon prediction.  The enhancements include a semi-Markov feature to model
the short length distribution of mammalian exons, a synteny feature for
recognizing duplicated regions, and an alternative training function to improve
accuracy when performing an unbalanced prediction task
(only ~1.5% of the human genome is protein-coding).
Data provided by Mike Lin at mit.edu
Lin, M. F. et al.
Revisiting the protein-coding gene catalog of Drosophila melanogaster using 12 fly genomes.
Genome Res 17, 1823-1836, doi:10.1101/gr.6679507 (2007). http://genome.cshlp.org/content/17/12/1823.abstract

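Synonymous Constraint Elements

Custom track: http://genome.ucsc.edu/cgi-bin/hgTracks?db=hg18&hgt.customText=http://genomewiki.ucsc.edu/images/4/47/29mammalsExcessConstraint_track.txt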

Summary: Identified coding regions with a very low synonymous substitution
rate, indicating additional sequence constraints beyond the amino acid level.
The Synonymous Constraint Elements (SCEs) are defined at three different 
resolutions (9-, 15-, and 30-codon). There is also a bedGraph track for 
the local estimate of the synonymous substitution rate (lambda_s).  
Also available at: http://compbio.mit.edu/SCE/
Contact: Mike Lin <mikelin@mit.edu>
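
For readers who download the lambda_s track rather than view it in the browser, the sketch below reads bedGraph text (four whitespace-separated columns: chromosome, 0-based start, end, value) and computes a length-weighted mean. The function names and the sample lines are invented for illustration; they are not taken from the actual SCE data files.

```python
def parse_bedgraph(lines):
    """Yield (chrom, start, end, value) tuples from bedGraph text lines."""
    for line in lines:
        line = line.strip()
        # Skip blank lines, comments, and track/browser header lines.
        if not line or line.startswith(("track", "browser", "#")):
            continue
        chrom, start, end, value = line.split()[:4]
        yield chrom, int(start), int(end), float(value)


def weighted_mean(records):
    """Mean value over all records, weighted by interval length."""
    total = sum((end - start) * value for _, start, end, value in records)
    length = sum(end - start for _, start, end, _ in records)
    return total / length


# Invented sample lines, not real lambda_s values.
sample = [
    "track type=bedGraph name=example",
    "chr1\t100\t103\t0.5",
    "chr1\t103\t106\t1.5",
]
records = list(parse_bedgraph(sample))
print(weighted_mean(records))
```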

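RNA Structures

Custom track: http://genome.ucsc.edu/cgi-bin/hgTracks?db=hg18&hgt.customText=http://genomewiki.ucsc.edu/images/8/8b/29mammalsRNAStruct_track.txt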

Summary: The list of candidate predictions for structural RNA families. EvoFold structural predictions 
were based on a 31-way subset of the genome-wide 44-way multiZ alignment (consisting of 28 of the 29 
eutherian mammals, together with opossum, chicken, and tetraodon as outgroups) and clustered into 
candidate families using the novel EvoFam algorithm.  These data, as well as the complete set of
structure predictions from the EvoFold screen, can be downloaded in bulk or browsed through a
UCSC Genome Browser mirror at the following web site:  http://moma.ki.au.dk/~jsp/mammals/
In addition, individual families are listed and annotated in the following reference and its supplement.
Contacts: Brian Parker <bparker@binf.ku.dk>, Stefan Washietl <wash@csail.mit.edu>
Reference: Parker, B. J. et al. New families of human regulatory RNA structures identified by comparative analysis of vertebrate genomes. Genome Research (2011)

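Constraint Structure in Promoters

Custom track: http://genome.ucsc.edu/cgi-bin/hgTracks?db=hg18&hgt.customText=http://genomewiki.ucsc.edu/images/a/ac/29mammalsConstraintStructure_track.txt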

Summary:  A list of local maxima identified from the smoothed pi-scores in the core promoters of genes.
Data provided by Evan Mauceli at broadinstitute.org
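
The idea of finding local maxima in smoothed pi-scores can be illustrated with the sketch below. It assumes a centered moving average as the smoothing step and a strict comparison with both neighbors as the peak test; the smoothing method actually used for this track is not described on this page, and the scores shown are made up.

```python
def moving_average(values, window=3):
    """Centered moving average; windows are truncated at the sequence ends."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        chunk = values[max(0, i - half): i + half + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


def local_maxima(values):
    """Indices whose value is strictly greater than both neighbors."""
    return [
        i
        for i in range(1, len(values) - 1)
        if values[i - 1] < values[i] > values[i + 1]
    ]


# Made-up per-base scores across a short promoter window.
toy_pi = [0, 1, 4, 1, 0, 0, 2, 8, 2, 0, 0]
peaks = local_maxima(moving_average(toy_pi, window=3))
print(peaks)
```

With a strict comparison, flat plateaus in the smoothed signal produce no peak; a real analysis would need an explicit rule for ties.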

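Identified Regulatory Motifs

Custom track: http://genome.ucsc.edu/cgi-bin/hgTracks?db=hg18&hgt.customText=http://genomewiki.ucsc.edu/images/f/fd/29mammalsMotifInstances_track.txt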

Summary:  A list of instances of identified regulatory motifs.
A motif catalog was built from TRANSFAC, Jaspar, and Protein
Binding Microarrays using a method similar to that described in
the reference below, with extensions for position frequency matrices.
Motif instances were identified genome-wide using an FDR of 60%.
Data provided by Pouya Kheradpour at mit.edu
Kheradpour, P., Stark, A., Roy, S. & Kellis, M.
Reliable prediction of regulator targets using 12 Drosophila genomes.
Genome Res 17, 1919-1931, doi:10.1101/gr.7090407 (2007).


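Positively Selected Codons

Custom track: http://genome.ucsc.edu/cgi-bin/hgTracks?db=hg18&hgt.customText=http://genomewiki.ucsc.edu/images/a/a3/29mammalsPosSelCodons_track.txt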

Summary: Main data files and backing data for the analysis identifying
positively selected codons.  These data and any updates are available for
download from:  http://www.ebi.ac.uk/~greg/mammals/
Contact: Gregory Jordan <greg@ebi.ac.uk>

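Exaptations of Mobile Elements

Custom track: http://genome.ucsc.edu/cgi-bin/hgTracks?db=hg18&hgt.customText=http://genomewiki.ucsc.edu/images/5/5e/29mammalsExaptedElements_track.txt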

Summary: List of exapted elements, identified as described in the reference below.
Data provided by Craig Lowe at stanford.edu
Lowe, C. B. & Haussler, D. 29 mammalian genomes reveal novel
exaptations of mobile elements for likely regulatory functions
in the human genome. In preparation (2011).

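Human and Primate Accelerated Regions

Custom track: http://genome.ucsc.edu/cgi-bin/hgTracks?db=hg18&hgt.customText=http://genomewiki.ucsc.edu/images/a/a7/29mammals2xARs_track.txt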

Summary:  Lists of human accelerated regions (HARs) and primate accelerated regions (PARs).
Regions with accelerated substitution rates in either lineage were
identified by first defining candidate elements using the phastCons
program (not including the lineage of interest) and then scoring those
elements for accelerated substitution rates in the
subtree (human or primate) of interest.
Data provided by Katherine Pollard at ucsf.edu