Conservation Track

From genomewiki
(added my notes from Kate's talk)
Revision as of 19:15, 1 August 2006

Conservation Track Implementation Notes

1) Track Components: Tables

   multizNway: scored ref, index into maf files (via extFile)
   multizNwaySummary: added to improve performance when the display is > 1 million bases
   multizNwayFrames: Mark D's codon frames, Brian R's gap annotation
   phastConsNway: wiggle, one score per base in genome. Provides index into wib file. Scores are probabilities in the range 0..1


2) Track Components: Files

 * Display:
       /gbdb/<db>/multizNway/*.maf  (multiz table uses this file)
       /gbdb/<db>/phastConsNway/*.wib  (phastCons table uses this file)
 * Downloads:
       goldenPath/<db>/multizNway/chr*.maf
       goldenPath/<db>/multizNway/upstream*.maf
       goldenPath/<db>/phastConsNway/*  (compressed, per chrom)
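
The "Nway" and "<db>" placeholders in the table and file names above can be expanded mechanically. The sketch below is only an illustration of that naming pattern: the function name `conservation_names` and the example values ("hg18", 17) are assumptions chosen for the example, not something taken from the talk.

```python
def conservation_names(db: str, n_way: int) -> dict:
    """Expand the 'Nway' and '<db>' placeholders used in these notes."""
    multiz = f"multiz{n_way}way"
    phast = f"phastCons{n_way}way"
    return {
        # Tables from section 1
        "tables": [multiz, f"{multiz}Summary", f"{multiz}Frames", phast],
        # Display files from section 2
        "display_files": [
            f"/gbdb/{db}/{multiz}/*.maf",
            f"/gbdb/{db}/{phast}/*.wib",
        ],
        # Download files from section 2
        "download_files": [
            f"goldenPath/{db}/{multiz}/chr*.maf",
            f"goldenPath/{db}/{multiz}/upstream*.maf",
            f"goldenPath/{db}/{phast}/*",
        ],
    }


# Example values only; substitute the assembly and species count in use.
names = conservation_names("hg18", 17)
for kind, entries in names.items():
    print(kind)
    for entry in entries:
        print("   ", entry)
```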

3) Track Components: TrackDb

 * Required:
       type wigMaf  (track type)
       wiggle  (wiggle table)
 * Optional:
       speciesOrder (the order in which the species appear on the track control page and in the browser; should be in phylo order)
       speciesGroups (the groups into which the species are split, e.g. vertebrate, mammals)
       summary (points to multizNwaySummary table)
       frames (points to multizNwayFrames table)
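
To make the required/optional split concrete, here is a minimal sketch that renders a stanza from the settings listed above and refuses to do so when a required one is missing. The helper `render_stanza`, the track name, and every value in the example (species names, group names, table names) are hypothetical; the value layout for speciesGroups in particular is an assumption, since the notes do not spell it out.

```python
def render_stanza(track: str, settings: dict) -> str:
    """Render 'key value' lines for one track, checking the required settings."""
    required = ("type", "wiggle")  # the two settings the notes mark as required
    missing = [key for key in required if key not in settings]
    if missing:
        raise ValueError(f"missing required settings: {missing}")
    lines = [f"track {track}"]
    lines += [f"{key} {value}" for key, value in settings.items()]
    return "\n".join(lines)


# All values below are illustrative placeholders.
stanza = render_stanza("multiz17way", {
    "type": "wigMaf",
    "wiggle": "phastCons17way",
    "speciesOrder": "chimp rhesus mouse rat dog",
    "speciesGroups": "primate mammal",
    "summary": "multiz17waySummary",
    "frames": "multiz17wayFrames",
})
print(stanza)
```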

4) Most Conserved Track

 * Table:
       phastConsNwayElements (BED of scored elements)
 * Files:
       NONE

5) Track Construction: Overview

 1. Create single-coverage pairwise alignments (axtNet)
 2. Create multiple alignment
 3. Generate conservation scores and conserved elements (phastCons)
 4. Add gap annotation to multiple alignment (Brian R's gap annotation software)
 5. Create multiple alignment summary
 6. Create frame tables for multiple alignment


6) Pairwise Alignments: Procedure

 1. Blastz Alignment (blastz, lavToPsl). This generates a set of alignments in psl; these are close enough to symmetric that you can swap species1 <-> species2.
 2. Chaining (axtChain, chainMergeSort, chainAntiRepeat)
 3. Netting (chainNet, netFilter)
 4. Extraction of single-coverage alignments from the net (netToAxt). The net chooses the single best chain for Level 1. You can't simply swap nets the way you can chains. The resulting axtNet is fed into multiz.
 *  All automated by doBlastzChainNet.pl
  (Thanks, Angie!!)
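
"Single coverage" means each base of the reference ends up in at most one alignment. The toy below illustrates that idea only: it visits chains best-first and keeps just the parts of each chain that no better chain has already claimed. It is a flat simplification written for these notes, not the algorithm in chainNet (real nets are hierarchical, with levels), and the `Chain` type and the example chains are made up.

```python
from typing import List, NamedTuple


class Chain(NamedTuple):
    name: str
    score: int
    start: int  # reference start, 0-based
    end: int    # reference end, exclusive


def single_coverage(chains: List[Chain]) -> List[Chain]:
    """Greedy toy: best-scoring chains claim reference bases first."""
    covered = []  # disjoint (start, end) intervals already claimed
    kept = []
    for chain in sorted(chains, key=lambda c: c.score, reverse=True):
        pos = chain.start
        pieces = []
        for c_start, c_end in sorted(covered):
            if c_end <= pos:
                continue            # claimed interval lies before our position
            if c_start >= chain.end:
                break               # claimed interval lies after this chain
            if c_start > pos:
                pieces.append((pos, c_start))  # unclaimed gap before it
            pos = max(pos, c_end)
        if pos < chain.end:
            pieces.append((pos, chain.end))    # unclaimed tail
        covered.extend(pieces)
        kept.extend(chain._replace(start=s, end=e) for s, e in pieces)
    return sorted(kept, key=lambda c: c.start)


chains = [
    Chain("A", 900, 0, 100),
    Chain("B", 500, 50, 150),
    Chain("C", 100, 60, 90),
]
result = single_coverage(chains)
for piece in result:
    print(piece.name, piece.start, piece.end)
```

In this example chain C disappears entirely because better chains already cover its whole span, and chain B keeps only the part that A does not reach.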


7) Pairwise Alignments: Parameters

   Blastz scoring matrix (this is the $matrix that shows up on the chain description page)
   Blastz gap penalties, misc
   Lineage-specific repeat abridging (give BLASTZ masked sequence; BLASTZ avoids starting in a repeat, but will continue through one)
   Chaining min score, linear gap


8) Multiple Alignment

 * Inputs:
       1. Single-coverage pairwise alignments
       2. Species tree (phastCons "make tree")
 * Aligner:
       multiz (with autoMZ driver) (feed it the tree, and it does the multiple alignment)
       or
       TBA (Threaded Blockset Aligner) (ENCODE uses this)


9) Conservation Scoring with PhastCons (Adam S's phylogenetic HMM)

 * Inputs:
       Multiple alignment
       Species tree with branch lengths
        (optionally two trees)
 * Parameters:  rho, expected-len, target-coverage
 * Output:
       Per-base probability
       Conserved elements

(our goal is to get 5% of the genome in conserved elements -- the params are tweaked until we hit this)
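
Checking the 5% target amounts to measuring what fraction of the genome the conserved elements cover after each parameter change. A minimal sketch of that measurement follows, assuming the elements are available as BED-style (chrom, start, end) tuples with half-open coordinates; the function name, the toy intervals, and the toy genome size are all invented for the example.

```python
def covered_fraction(elements, genome_size):
    """Fraction of the genome covered by elements, counting overlaps once."""
    total = 0
    prev_chrom, cur_start, cur_end = None, 0, 0
    for chrom, start, end in sorted(elements):
        if chrom != prev_chrom or start > cur_end:
            # Close the interval being merged and start a new one.
            total += cur_end - cur_start
            prev_chrom, cur_start, cur_end = chrom, start, end
        else:
            # Overlapping or adjacent element: extend the current interval.
            cur_end = max(cur_end, end)
    total += cur_end - cur_start
    return total / genome_size


TARGET = 0.05  # the 5% goal mentioned in the notes

# Toy data: two overlapping elements on chr1 and one on chr2.
elements = [("chr1", 0, 30), ("chr1", 20, 50), ("chr2", 10, 20)]
fraction = covered_fraction(elements, genome_size=1000)
verdict = "above" if fraction > TARGET else "at or below"
print(f"coverage {fraction:.3f} is {verdict} the {TARGET:.2f} target")
```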

10) Multiple Alignment Summary and Annotations

   Gap Annotation (mafAddIRows)
   Summary table (hgLoadMafSummary)
   Coding frames (getFrames, etc.)