Mm9 multiple alignment

From genomewiki

Revision as of 20:21, 21 August 2007

Mouse Mm9 multiple alignment/conservation track

To avoid artifacts in downstream processing of the UCSC multiple alignments, the parameters used in the blastz processing pipeline must be chosen carefully. The pipeline has a number of steps, and each involves a variety of tunable parameters. This page tracks the parameters used in each alignment as the work proceeds toward a completed multiple alignment and conservation track on the mm9 mouse (NCBI Build 37) assembly.

The chrom count (*) in the table below does not include haplotypes, chr*_random, chrUn or chrM, unless chrUn or scaffolds are the only sequences for that assembly.

The genome size uses the same exclusions as the chrom count, so randoms are not included.
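The sketch below shows how the chrom count and genome size columns could be derived from a list of sequence names and sizes (for example, a chrom.sizes-style listing) under the exclusion rules above. The regular expression encodes assumed UCSC naming patterns (`_random`, `chrUn`, `chrM`, `_hap`). It also takes 1 Mb as 1,000,000 bases. This page confirms neither choice, and the helper names are hypothetical.

```python
import re
from typing import Dict, Tuple

# Assumed naming patterns for excluded sequences; an illustration only,
# not the script that produced the table.
EXCLUDED = re.compile(r"_random$|^chrUn|^chrM$|_hap")


def counted_sequences(chrom_sizes: Dict[str, int]) -> Dict[str, int]:
    """Keep the sequences that count toward 'chrom count' and 'genome size'."""
    kept = {name: size for name, size in chrom_sizes.items()
            if not EXCLUDED.search(name)}
    if not kept:
        # chrUn-only assembly: chrUn is then counted, but chrM is still dropped
        kept = {name: size for name, size in chrom_sizes.items() if name != "chrM"}
    return kept


def summarize(chrom_sizes: Dict[str, int]) -> Tuple[int, int]:
    """Return (chrom count, genome size in Mb), assuming 1 Mb = 1,000,000 bases."""
    kept = counted_sequences(chrom_sizes)
    return len(kept), round(sum(kept.values()) / 1_000_000)


# Toy sizes, not real assembly data.
toy = {"chr1": 150_000_000, "chr2": 100_000_000, "chr1_random": 2_000_000,
       "chrUn_random": 3_000_000, "chr6_cox_hap1": 4_000_000, "chrM": 16_000}
print(summarize(toy))
```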

Tree distances are from the hg18 28-way measurements, with ponAbe1 and calJac1 manually inserted into the tree.

I believe we use "syntenic" nets for the organisms whose assemblies are built into chromosomes.

30-way phylogenetic tree

axtChain parameters and end results

name           db       chrom     genome   tree      axtChain  axtChain   % of mm9  % of other      done
                        count (*) size     distance  minScore  linearGap  matched   matched by mm9
Mouse          mm9      21        2654 Mb  0.0
Rat            rn4      21        2718 Mb  0.160657  3000      medium     68.357    69.541          16 August
Human          hg18     24        3080 Mb  0.452619  3000      medium     38.499    35.201          16 August
Rhesus         rheMac2  22        2864 Mb  0.452745  3000      medium     xx.123    xx.456          tbd
Orangutan      ponAbe1  79553     3240 Mb  0.453809  3000      medium     xx.123    xx.456          tbd
Marmoset       calJac1  49724     3029 Mb  0.454272  3000      medium     xx.123    xx.456          tbd
Chimp          panTro2  25        3175 Mb  0.454514  3000      medium     xx.123    xx.456          tbd
GuineaPig      cavPor2  295514    3403 Mb  0.479871  3000      medium     xx.123    xx.456          tbd
Horse          equCab1  32        2056 Mb  0.479871  3000      medium     xx.123    xx.456          tbd
TreeShrew      tupBel1  150851    3660 Mb  0.494934  3000      medium     xx.123    xx.456          tbd
Bushbaby       otoGar1  120882    3420 Mb  0.498957  3000      medium     xx.123    xx.456          tbd
Armadillo      dasNov1  304391    3856 Mb  0.517360  3000      medium     xx.123    xx.456          tbd
Rabbit         oryCun1  215471    3464 Mb  0.519779  3000      medium     xx.123    xx.456          tbd
Cat            felCat3  217790    4045 Mb  0.530610  3000      medium     xx.123    xx.456          tbd
Dog            canFam2  39        2445 Mb  0.533544  3000      medium     xx.123    xx.456          tbd
Elephant       loxAfr1  233134    3707 Mb  0.536627  3000      medium     xx.123    xx.456          tbd
Cow            bosTau3  30        2434 Mb  0.540852  3000      medium     xx.123    xx.456          tbd
Hedgehog       eriEur1  379801    3367 Mb  0.632457  3000      medium     xx.123    xx.456          tbd
Shrew          sorAra1  262057    2936 Mb  0.658734  3000      medium     xx.123    xx.456          tbd
Tenrec         echTel1  325491    3823 Mb  0.666303  3000      medium     xx.123    xx.456          tbd
Opossum        monDom4  9         3431 Mb  0.909852  5000      loose      xx.123    xx.456          tbd
Platypus       ornAna1  201522    1996 Mb  1.165888  5000      loose      xx.123    xx.456          tbd
Chicken        galGal3  33        1032 Mb  1.285399  5000      loose      xx.123    xx.456          tbd
Lizard         anoCar1  7233      1781 Mb  1.404225  5000      loose      xx.123    xx.456          tbd
X. tropicalis  xenTro2  19759     1513 Mb  1.726205  5000      loose      xx.123    xx.456          tbd
Stickleback    gasAcu1  21        400 Mb   2.012649  5000      loose      xx.123    xx.456          tbd
Zebrafish      danRer4  25        1547 Mb  2.027153  5000      loose      xx.123    xx.456          tbd
Tetraodon      tetNig1  21        217 Mb   2.051015  5000      loose      xx.123    xx.456          tbd
Fugu           fr2      1         400 Mb   2.086669  5000      loose      xx.123    xx.456          tbd
Medaka         oryLat1  24        724 Mb   2.200402  5000      loose      xx.123    xx.456          tbd
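The table splits cleanly by parameter set. Every species closer than the opossum was run with minScore 3000 and the medium gap matrix. Everything from the opossum outward was run with minScore 5000 and the loose matrix. The snippet below only recovers the tree-distance range covered by each setting from the rows above. It does not imply any documented cutoff between 0.666303 and 0.909852, and the helper name is hypothetical.

```python
from collections import defaultdict

# (db, tree distance, axtChain minScore, axtChain linearGap), copied from the table above
ROWS = [
    ("rn4", 0.160657, 3000, "medium"), ("hg18", 0.452619, 3000, "medium"),
    ("rheMac2", 0.452745, 3000, "medium"), ("ponAbe1", 0.453809, 3000, "medium"),
    ("calJac1", 0.454272, 3000, "medium"), ("panTro2", 0.454514, 3000, "medium"),
    ("cavPor2", 0.479871, 3000, "medium"), ("equCab1", 0.479871, 3000, "medium"),
    ("tupBel1", 0.494934, 3000, "medium"), ("otoGar1", 0.498957, 3000, "medium"),
    ("dasNov1", 0.517360, 3000, "medium"), ("oryCun1", 0.519779, 3000, "medium"),
    ("felCat3", 0.530610, 3000, "medium"), ("canFam2", 0.533544, 3000, "medium"),
    ("loxAfr1", 0.536627, 3000, "medium"), ("bosTau3", 0.540852, 3000, "medium"),
    ("eriEur1", 0.632457, 3000, "medium"), ("sorAra1", 0.658734, 3000, "medium"),
    ("echTel1", 0.666303, 3000, "medium"),
    ("monDom4", 0.909852, 5000, "loose"), ("ornAna1", 1.165888, 5000, "loose"),
    ("galGal3", 1.285399, 5000, "loose"), ("anoCar1", 1.404225, 5000, "loose"),
    ("xenTro2", 1.726205, 5000, "loose"), ("gasAcu1", 2.012649, 5000, "loose"),
    ("danRer4", 2.027153, 5000, "loose"), ("tetNig1", 2.051015, 5000, "loose"),
    ("fr2", 2.086669, 5000, "loose"), ("oryLat1", 2.200402, 5000, "loose"),
]


def distance_ranges(rows):
    """Map each (minScore, linearGap) pair to the (min, max) tree distance it covers."""
    groups = defaultdict(list)
    for _db, distance, min_score, linear_gap in rows:
        groups[(min_score, linear_gap)].append(distance)
    return {key: (min(d), max(d)) for key, d in groups.items()}


for (min_score, linear_gap), (low, high) in sorted(distance_ranges(ROWS).items()):
    print(f"minScore={min_score} linearGap={linear_gap}: tree distance {low} to {high}")
```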



blastz alignment parameter details

query                 abridged  M    K     L     Q        Y
                      repeats
Rat rn4               yes       40M  3K    3K    default  9400
Human hg18            yes       40M  3K    3K    default  9400
Rhesus rheMac2        no        40M  3K    3K    default  9400
Orangutan ponAbe1     no        50   3K    3K    default  9400
Marmoset calJac1      no        50   3K    3K    default  9400
Chimp panTro2         yes       40M  3K    3K    default  9400
GuineaPig cavPor2     no        50   3K    3K    default  9400
Horse equCab1         no        40M  3K    3K    default  9400
TreeShrew tupBel1     no        50   3K    3K    default  9400
Bushbaby otoGar1      no        50   3K    3K    default  9400
Armadillo dasNov1     no        50   3K    3K    default  9400
Rabbit oryCun1        no        50   3K    3K    default  9400
Cat felCat3           no        50   3K    3K    default  9400
Dog canFam2           yes       40M  3K    3K    default  9400
Elephant loxAfr1      no        50   3K    3K    default  9400
Cow bosTau3           no        40M  3K    3K    default  9400
Hedgehog eriEur1      no        50   3K    3K    default  9400
Shrew sorAra1         no        50   3K    3K    default  9400
Tenrec echTel1        no        50   3K    3K    default  9400
Opossum monDom4       no        50   2200  6000  HoxD55   3400
Platypus ornAna1      no        50   2200  6000  HoxD55   3400
Chicken galGal3       yes       40M  2200  6000  HoxD55   3400
Lizard anoCar1        no        50   2200  6000  HoxD55   3400
X_tropicalis xenTro2  no        50   2200  6000  HoxD55   3400
Stickleback gasAcu1   no        40M  2200  6000  HoxD55   3400
Zebrafish danRer4     yes       40M  2200  6000  HoxD55   3400
Tetraodon tetNig1     no        40M  2200  6000  HoxD55   3400
Fugu fr2              no        40M  2200  6000  HoxD55   3400
Medaka oryLat1        no        40M  2200  6000  HoxD55   3400
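Each row of this table corresponds to blastz's letter=value options, as listed in the usage message below. The sketch that follows builds those options from a row. It assumes that "3K" means 3000, and that a non-default Q column means the matrix is loaded from a file through Q=. The matrix file name `HoxD55.q` is hypothetical. The M and abridged-repeats columns are left out, and the function names are illustrative only.

```python
from typing import List, Optional


def expand(value: str) -> int:
    """Turn the table's shorthand ('3K') into an integer; plain numbers pass through."""
    return int(value[:-1]) * 1000 if value.upper().endswith("K") else int(value)


def blastz_options(k: str, l: str, q: str, y: str,
                   matrix_file: Optional[str] = None) -> List[str]:
    """Build blastz 'letter=value' arguments from one row of the table.

    q is 'default' (built-in matrix, so no Q= argument) or a matrix name such
    as 'HoxD55', which then needs a (hypothetical) file path for Q=.
    """
    opts = [f"K={expand(k)}", f"L={expand(l)}", f"Y={expand(y)}"]
    if q != "default":
        if matrix_file is None:
            raise ValueError(f"Q={q} needs a scoring-matrix file")
        opts.append(f"Q={matrix_file}")
    return opts


print(" ".join(blastz_options("3K", "3K", "default", "9400")))
print(" ".join(blastz_options("2200", "6000", "HoxD55", "3400", "HoxD55.q")))
```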

default blastz parameters

m=80  v=0  B=2  C=0  E=30  G=0  H=0  K=3000 L=K
M=0 O=400 P=1 R=0 T=1 W=8 X=10*(A-to-A match score)
Y=O+300*E Z=1
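Three of these defaults are defined in terms of other parameters. With O=400 and E=30, Y resolves to 400 + 300*30 = 9400. That is the Y value listed above for the species run with default settings. X uses the A-to-A entry of the scoring matrix, which is 91 in both matrices on this page. The small sketch below simply resolves those formulas; the function name is hypothetical.

```python
def blastz_defaults(a_to_a: int, gap_open: int = 400, gap_extend: int = 30,
                    k: int = 3000) -> dict:
    """Resolve the blastz defaults that depend on other parameters."""
    return {
        "K": k,
        "L": k,                              # L defaults to K
        "X": 10 * a_to_a,                    # X = 10 * (A-to-A match score)
        "Y": gap_open + 300 * gap_extend,    # Y = O + 300*E
    }


print(blastz_defaults(a_to_a=91))
```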

From the blastz usage message:

Default values are given in parentheses.
  m(80M) bytes of space for trace-back information
  v(0) 0: quiet; 1: verbose progress reports to stderr
  B(2) 0: single strand; >0: both strands
  C(0) 0: no chaining; 1: just output chain; 2: chain and extend;
       3: just output HSPs
  E(30) gap-extension penalty.
  G(0) diagonal chaining penalty.
  H(0) interpolate between alignments at threshold K = argument.
  K(3000) threshold for MSPs
  L(K) threshold for gapped alignments
  M(0) mask any base in seq1 hit this many times; 0 = no dynamic masking
  O(400) gap-open penalty.
  P(1) 0: entropy not used; 1: entropy used; >1 entropy with feedback.
  Q load the scoring matrix from a file.
  R(0) antidiagonal chaining penalty.
  T(1) 0: W-bp words;  1: 12of19;  2: 12of19 without transitions.
                       3: 14of22;  4: 14of22 without transitions.
  W(8) word size (unused unless T=0)
  X(10*(A-to-A match score)) X-drop parameter for ungapped extension.
  Y(O+300E) X-drop parameter for gapped extension.
  Z(1) increment between successive words in sequence 1.
The default scoring matrix is:

      A     C     G     T
A    91  -114   -31  -123
C  -114   100  -125   -31
G   -31  -125   100  -114
T  -123   -31  -114    91

The HoxD55 scoring matrix is:

      A     C     G     T
A    91   -90   -25  -100
C   -90   100  -100   -25
G   -25  -100   100   -90
T  -100   -25   -90    91
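The example below shows how a substitution matrix enters an alignment score, using the simplest case of two gap-free sequences of equal length. Gap penalties are ignored here. Both matrices score identical bases the same way, but HoxD55 subtracts less for every mismatch. For example, ACGT against GCGC scores 138 with the default matrix and 150 with HoxD55. This is an illustration of the arithmetic, not blastz's scoring code.

```python
BASES = "ACGT"

# Rows and columns in A, C, G, T order, copied from the two matrices above.
DEFAULT = [
    [91, -114, -31, -123],
    [-114, 100, -125, -31],
    [-31, -125, 100, -114],
    [-123, -31, -114, 91],
]
HOXD55 = [
    [91, -90, -25, -100],
    [-90, 100, -100, -25],
    [-25, -100, 100, -90],
    [-100, -25, -90, 91],
]


def ungapped_score(seq1: str, seq2: str, matrix) -> int:
    """Sum the substitution scores of two equal-length, gap-free sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("an ungapped alignment needs equal-length sequences")
    index = {base: i for i, base in enumerate(BASES)}
    # Upper-casing is only for this demo; it says nothing about how blastz
    # treats lower-case (repeat-masked) bases.
    return sum(matrix[index[a]][index[b]]
               for a, b in zip(seq1.upper(), seq2.upper()))


for name, matrix in [("default", DEFAULT), ("HoxD55", HOXD55)]:
    print(name, ungapped_score("ACGT", "ACGT", matrix),
          ungapped_score("ACGT", "GCGC", matrix))
```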

matrix parameters

The "medium" gap score matrix, tuned for the mouse-human distance, is:

tableSize    11
smallSize   111
position  1   2   3   11  111  2111  12111  32111   72111  152111  252111
qGap    350 425 450  600  900  2900  22900  57900  117900  217900  317900
tGap    350 425 450  600  900  2900  22900  57900  117900  217900  317900
bothGap 750 825 850 1000 1300  3300  23300  58300  118300  218300  318300

The "loose" gap score matrix, tuned for the chicken-human distance, is:

tablesize    11
smallSize   111
position  1   2   3   11  111  2111  12111  32111  72111  152111  252111
qGap    325 360 400  450  600  1100   3600   7600  15600   31600   56600
tGap    325 360 400  450  600  1100   3600   7600  15600   31600   56600
bothGap 625 660 700  750  900  1400   4000   8000  16000   32000   57000
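These tables give gap costs only at selected lengths. The sketch below looks up a cost for any gap length. It assumes costs are interpolated linearly between the listed positions, and that the last segment's slope continues past position 252111. That is our reading of the format, not a transcription of axtChain's code. Only the qGap rows are used. A 1000-base gap costs 1789 under medium but only 822.25 under loose, which shows how much more lenient the loose matrix is toward long gaps.

```python
from bisect import bisect_left

# position and qGap rows copied from the "medium" and "loose" tables above
POSITIONS = [1, 2, 3, 11, 111, 2111, 12111, 32111, 72111, 152111, 252111]
MEDIUM_QGAP = [350, 425, 450, 600, 900, 2900, 22900, 57900, 117900, 217900, 317900]
LOOSE_QGAP = [325, 360, 400, 450, 600, 1100, 3600, 7600, 15600, 31600, 56600]


def gap_cost(length: int, positions, costs) -> float:
    """Piecewise-linear gap cost under the interpolation assumption stated above."""
    if length < positions[0]:
        raise ValueError("gap length must be at least 1")
    i = bisect_left(positions, length)
    if i < len(positions) and positions[i] == length:
        return float(costs[i])          # length listed exactly in the table
    if i == len(positions):             # longer than the last position:
        i -= 1                          # extend the final segment's slope
    x0, x1 = positions[i - 1], positions[i]
    y0, y1 = costs[i - 1], costs[i]
    return y0 + (y1 - y0) * (length - x0) / (x1 - x0)


for n in (7, 1000, 100000):
    print(n, gap_cost(n, POSITIONS, MEDIUM_QGAP), gap_cost(n, POSITIONS, LOOSE_QGAP))
```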

the tree diagram

((((((((

 (((Mouse_mm9:0.076274,Rat_rn4:0.084383):0.200607,
    GuineaPig_cavPor2:0.202990):0.034350,
        Rabbit_oryCun1:0.208548):0.014587,

((((((Human_hg18:0.005873,Chimp_panTro2:0.007668):0.013037,
   Orangutan_ponAbe1:0.02):0.013037,Rhesus_rheMac2:0.031973):0.0365,
        Marmoset_calJac1:0.07):0.0365,Bushbaby_otoGar1:0.151185):0.015682,
           TreeShrew_tupBel1:0.162844):0.006272):0.019763,

 ((Shrew_sorAra1:0.248532,Hedgehog_eriEur1:0.222255):0.045693,

 (((Dog_canFam2:0.101137,Cat_felCat3:0.098203):0.048213,
    Horse_equCab1:0.099323):0.007287,
        Cow_bosTau3:0.163945):0.012398):0.018928):0.030081,

 (Armadillo_dasNov1:0.133274,(Elephant_loxAfr1:0.103030,
        Tenrec_echTel1:0.232706):0.049511):0.008424):0.213469,

 Opossum_monDom4:0.320721):0.088647,
    Platypus_ornAna1:0.488110):0.118797,
        (Chicken_galGal3:0.395136,Lizard_anoCar1:0.513962):0.093688):0.151358,
            Frog_xenTro2:0.778272):0.174596,

 (((Tetraodon_tetNig1:0.203933,Fugu_fr2:0.239587):0.203949,
    (Stickleback_gasAcu1:0.314162,Medaka_oryLat1:0.501915):0.055354):0.346008,
Zebrafish_danRer4:0.730028):0.174596);
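The tree distance column in the axtChain table appears to be the sum of branch lengths along the path from mouse to each species in this tree. For example, rat is 0.076274 + 0.084383 = 0.160657. The minimal sketch below parses the tree and sums the path lengths, reproducing values such as 0.160657 for rat and 2.200402 for medaka. The Newick reader is hand-rolled and assumes no internal node labels; it is not a UCSC tool.

```python
TREE = """
((((((((
(((Mouse_mm9:0.076274,Rat_rn4:0.084383):0.200607,
GuineaPig_cavPor2:0.202990):0.034350,
Rabbit_oryCun1:0.208548):0.014587,
((((((Human_hg18:0.005873,Chimp_panTro2:0.007668):0.013037,
Orangutan_ponAbe1:0.02):0.013037,Rhesus_rheMac2:0.031973):0.0365,
Marmoset_calJac1:0.07):0.0365,Bushbaby_otoGar1:0.151185):0.015682,
TreeShrew_tupBel1:0.162844):0.006272):0.019763,
((Shrew_sorAra1:0.248532,Hedgehog_eriEur1:0.222255):0.045693,
(((Dog_canFam2:0.101137,Cat_felCat3:0.098203):0.048213,
Horse_equCab1:0.099323):0.007287,
Cow_bosTau3:0.163945):0.012398):0.018928):0.030081,
(Armadillo_dasNov1:0.133274,(Elephant_loxAfr1:0.103030,
Tenrec_echTel1:0.232706):0.049511):0.008424):0.213469,
Opossum_monDom4:0.320721):0.088647,
Platypus_ornAna1:0.488110):0.118797,
(Chicken_galGal3:0.395136,Lizard_anoCar1:0.513962):0.093688):0.151358,
Frog_xenTro2:0.778272):0.174596,
(((Tetraodon_tetNig1:0.203933,Fugu_fr2:0.239587):0.203949,
(Stickleback_gasAcu1:0.314162,Medaka_oryLat1:0.501915):0.055354):0.346008,
Zebrafish_danRer4:0.730028):0.174596);
"""


def parse_newick(text):
    """Return (parent, length, leaves) for a Newick string without internal labels.

    Node 0 is a virtual root above the whole tree.
    """
    s = "".join(text.split())              # layout whitespace is not significant
    parent, length, leaves = {}, {}, {}
    stack, next_id, last, i = [0], 1, None, 0
    while i < len(s):
        c = s[i]
        if c == "(":                       # open a new internal node
            parent[next_id] = stack[-1]
            stack.append(next_id)
            next_id += 1
            i += 1
        elif c == ")":                     # close it; a ':length' may follow
            last = stack.pop()
            i += 1
        elif c == ",":
            i += 1
        elif c == ";":
            break
        elif c == ":":                     # branch length of the node just finished
            j = i + 1
            while s[j] not in ",);":
                j += 1
            length[last] = float(s[i + 1:j])
            i = j
        else:                              # leaf label
            j = i
            while s[j] not in ":,);":
                j += 1
            parent[next_id] = stack[-1]
            leaves[s[i:j]] = last = next_id
            next_id += 1
            i = j
    return parent, length, leaves


def path_distance(tree, a, b):
    """Sum of branch lengths on the path between leaves a and b."""
    parent, length, leaves = tree
    up_from_a, node, dist = {}, leaves[a], 0.0
    while True:                            # distance from a to each of its ancestors
        up_from_a[node] = dist
        if node == 0:
            break
        dist += length.get(node, 0.0)
        node = parent[node]
    node, dist = leaves[b], 0.0
    while node not in up_from_a:           # climb from b until the paths meet
        dist += length.get(node, 0.0)
        node = parent[node]
    return up_from_a[node] + dist


tree = parse_newick(TREE)
for other in ("Rat_rn4", "Chimp_panTro2", "Dog_canFam2", "Medaka_oryLat1"):
    print(other, round(path_distance(tree, "Mouse_mm9", other), 6))
```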