Mm9 multiple alignment: Difference between revisions

Line 19: Line 19:
   <TH>rat rn4</TH>
   <TH>rat rn4</TH>
   <TD>21</TD>
   <TD>21</TD>
   <TD>2,702 Mb</TD>
   <TD>2702 Mb</TD>
   <TD>0.1587</TD>  
   <TD>0.160657</TD>  
   <TD>3000</TD>
   <TD>3000</TD>
   <TD>medium</TD>
   <TD>medium</TD>
Line 31: Line 31:
   <TH>human hg18</TH>
   <TH>human hg18</TH>
   <TD>24</TD>
   <TD>24</TD>
   <TD>2,963 Mb</TD>
   <TD>2963 Mb</TD>
   <TD>0.4667</TD>  
   <TD>0.452619</TD>  
   <TD>3000</TD>
   <TD>3000</TD>
   <TD>medium</TD>
   <TD>medium</TD>
Line 41: Line 41:


<TR>
<TR>
   <TD>Chimp panTro2</TD>
   <TH>Rhesus rheMac2</TH>
   <TD>25</TD>
   <TD>22</TD>
   <TD>3028 Mb</TD>
   <TD>2731 Mb</TD>
   <TD>0.4686</TD>
   <TD>0.452745</TD>
   <TD>3000</TD>
   <TD>3000</TD>
   <TD>medium</TD>
   <TD>medium</TD>
Line 53: Line 53:


<TR>
<TR>
   <TD>tbd ponAbe1</TD>
   <TH>Orangutan ponAbe1</TH>
   <TD>79553</TD>
   <TD>79553</TD>
   <TD>3090 Mb</TD>
   <TD>3090 Mb</TD>
   <TD>X.xxxx</TD>
   <TD>0.453809</TD>
   <TD>3000</TD>
   <TD>3000</TD>
   <TD>medium</TD>
   <TD>medium</TD>
Line 65: Line 65:


<TR>
<TR>
   <TD>Rhesus rheMac2</TD>
   <TH>Marmoset calJac1</TH>
   <TD>22</TD>
   <TD>49724</TD>
   <TD>2731 Mb</TD>
   <TD>2889 Mb</TD>
   <TD>0.4906</TD>
   <TD>0.454272</TD>
   <TD>3000</TD>
   <TD>3000</TD>
   <TD>medium</TD>
   <TD>medium</TD>
Line 77: Line 77:


<TR>
<TR>
   <TD>Bushbaby otoGar1</TD>
   <TH>Chimp panTro2</TH>
   <TD>120882</TD>
   <TD>25</TD>
   <TD>3261 Mb</TD>
   <TD>3028 Mb</TD>
   <TD>X.xxxx</TD>
   <TD>0.454514</TD>
   <TD>3000</TD>
   <TD>3000</TD>
   <TD>medium</TD>
   <TD>medium</TD>
Line 89: Line 89:


<TR>
<TR>
   <TD>tbd calJac1</TD>
   <TH>GuineaPig cavPor2</TH>
   <TD>49724</TD>
   <TD>295514</TD>
   <TD>2889 Mb</TD>
   <TD>3246 Mb</TD>
   <TD>X.xxxx</TD>
   <TD>0.479871</TD>
   <TD>3000</TD>
   <TD>3000</TD>
   <TD>medium</TD>
   <TD>medium</TD>
Line 101: Line 101:


<TR>
<TR>
   <TD>TreeShrew tupBel1</TD>
   <TH>Horse equCab1</TH>
   <TD>150851</TD>
   <TD>32</TD>
   <TD>3491 Mb</TD>
   <TD>1961 Mb</TD>
   <TD>X.xxxx</TD>
   <TD>0.479871</TD>
   <TD>3000</TD>
   <TD>3000</TD>
   <TD>medium</TD>
   <TD>medium</TD>
Line 113: Line 113:


<TR>
<TR>
   <TD>GuineaPig cavPor2</TD>
   <TH>TreeShrew tupBel1</TH>
   <TD>295514</TD>
   <TD>150851</TD>
   <TD>3246 Mb</TD>
   <TD>3491 Mb</TD>
   <TD>X.xxxx</TD>
   <TD>0.494934</TD>
   <TD>3000</TD>
   <TD>3000</TD>
   <TD>medium</TD>
   <TD>medium</TD>
Line 125: Line 125:


<TR>
<TR>
   <TD>Rabbit oryCun1</TD>
   <TH>Bushbaby otoGar1</TH>
   <TD>215471</TD>
   <TD>120882</TD>
   <TD>3303 Mb</TD>
   <TD>3261 Mb</TD>
   <TD>0.5131</TD>
   <TD>0.498957</TD>
   <TD>3000</TD>
   <TD>3000</TD>
   <TD>medium</TD>
   <TD>medium</TD>
Line 137: Line 137:


<TR>
<TR>
   <TD>Shrew sorAra1</TD>
   <TH>Armadillo dasNov1</TH>
   <TD>262057</TD>
   <TD>304391</TD>
   <TD>2800 Mb</TD>
   <TD>3678 Mb</TD>
   <TD>X.xxxx</TD>
   <TD>0.517360</TD>
   <TD>3000</TD>
   <TD>3000</TD>
   <TD>medium</TD>
   <TD>medium</TD>
Line 149: Line 149:


<TR>
<TR>
   <TD>Hedgehog eriEur1</TD>
   <TH>Rabbit oryCun1</TH>
   <TD>379801</TD>
   <TD>215471</TD>
   <TD>3211 Mb</TD>
   <TD>3303 Mb</TD>
   <TD>X.xxxx</TD>
   <TD>0.519779</TD>
   <TD>3000</TD>
   <TD>3000</TD>
   <TD>medium</TD>
   <TD>medium</TD>
Line 161: Line 161:


<TR>
<TR>
   <TD>Dog canFam2</TD>
   <TH>Cat felCat3</TH>
   <TD>39</TD>
   <TD>217790</TD>
   <TD>2331 Mb</TD>
   <TD>3858 Mb</TD>
   <TD>0.6230</TD>
   <TD>0.530610</TD>
   <TD>3000</TD>
   <TD>3000</TD>
   <TD>medium</TD>
   <TD>medium</TD>
Line 173: Line 173:


<TR>
<TR>
   <TD>Cat felCat3</TD>
   <TH>Dog canFam2</TH>
   <TD>217790</TD>
   <TD>39</TD>
   <TD>3858 Mb</TD>
   <TD>2331 Mb</TD>
   <TD>X.xxxx</TD>
   <TD>0.533544</TD>
   <TD>3000</TD>
   <TD>3000</TD>
   <TD>medium</TD>
   <TD>medium</TD>
Line 185: Line 185:


<TR>
<TR>
   <TD>Horse equCab1</TD>
   <TH>Elephant loxAfr1</TH>
   <TD>32</TD>
   <TD>233134</TD>
   <TD>1961 Mb</TD>
   <TD>3535 Mb</TD>
   <TD>X.xxxx</TD>
   <TD>0.536627</TD>
   <TD>3000</TD>
   <TD>3000</TD>
   <TD>medium</TD>
   <TD>medium</TD>
Line 197: Line 197:


<TR>
<TR>
   <TD>Cow bosTau3</TD>
   <TH>Cow bosTau3</TH>
   <TD>30</TD>
   <TD>30</TD>
   <TD>2321 Mb</TD>
   <TD>2321 Mb</TD>
   <TD>0.6344</TD>
   <TD>0.540852</TD>
   <TD>3000</TD>
   <TD>3000</TD>
   <TD>medium</TD>
   <TD>medium</TD>
Line 209: Line 209:


<TR>
<TR>
   <TD>Armadillo dasNov1</TD>
   <TH>Hedgehog eriEur1</TH>
   <TD>304391</TD>
   <TD>379801</TD>
   <TD>3678 Mb</TD>
   <TD>3211 Mb</TD>
   <TD>X.xxxx</TD>
   <TD>0.632457</TD>
   <TD>5000</TD>
   <TD>3000</TD>
   <TD>loose</TD>
   <TD>medium</TD>
   <TD>xx.123</TD>
   <TD>xx.123</TD>
   <TD>xx.456</TD>
   <TD>xx.456</TD>
Line 221: Line 221:


<TR>
<TR>
   <TD>Elephant loxAfr1</TD>
   <TH>Shrew sorAra1</TH>
   <TD>233134</TD>
   <TD>262057</TD>
   <TD>3535 Mb</TD>
   <TD>2800 Mb</TD>
   <TD>0.6256</TD>
   <TD>0.658734</TD>
   <TD>5000</TD>
   <TD>3000</TD>
   <TD>loose</TD>
   <TD>medium</TD>
   <TD>xx.123</TD>
   <TD>xx.123</TD>
   <TD>xx.456</TD>
   <TD>xx.456</TD>
Line 233: Line 233:


<TR>
<TR>
   <TD>Tenrec echTel1</TD>
   <TH>Tenrec echTel1</TH>
   <TD>325491</TD>
   <TD>325491</TD>
   <TD>3646 Mb</TD>
   <TD>3646 Mb</TD>
   <TD>0.7805</TD>
   <TD>0.666303</TD>
   <TD>5000</TD>
   <TD>3000</TD>
   <TD>loose</TD>
   <TD>medium</TD>
   <TD>xx.123</TD>
   <TD>xx.123</TD>
   <TD>xx.456</TD>
   <TD>xx.456</TD>
Line 245: Line 245:


<TR>
<TR>
   <TD>Opossum monDom4</TD>
   <TH>Opossum monDom4</TH>
   <TD>9</TD>
   <TD>9</TD>
   <TD>3272 Mb</TD>
   <TD>3272 Mb</TD>
   <TD>1.0698</TD>
   <TD>0.909852</TD>
   <TD>5000</TD>
   <TD>5000</TD>
   <TD>loose</TD>
   <TD>loose</TD>
Line 257: Line 257:


<TR>
<TR>
   <TD>Platypus ornAna1</TD>
   <TH>Platypus ornAna1</TH>
   <TD>201522</TD>
   <TD>201522</TD>
   <TD>1904 Mb</TD>
   <TD>1904 Mb</TD>
   <TD>X.xxxx</TD>
   <TD>1.165888</TD>
   <TD>5000</TD>
   <TD>5000</TD>
   <TD>loose</TD>
   <TD>loose</TD>
Line 269: Line 269:


<TR>
<TR>
   <TD>Chicken galGal3</TD>
   <TH>Chicken galGal3</TH>
   <TD>33</TD>
   <TD>33</TD>
   <TD>984 Mb</TD>
   <TD>984 Mb</TD>
   <TD>1.3425</TD>
   <TD>1.285399</TD>
   <TD>5000</TD>
   <TD>5000</TD>
   <TD>loose</TD>
   <TD>loose</TD>
Line 281: Line 281:


<TR>
<TR>
   <TD>Lizard anoCar1</TD>
   <TH>Lizard anoCar1</TH>
   <TD>7233</TD>
   <TD>7233</TD>
   <TD>1699 Mb</TD>
   <TD>1699 Mb</TD>
   <TD>X.xxxx</TD>
   <TD>1.404225x</TD>
   <TD>5000</TD>
   <TD>5000</TD>
   <TD>loose</TD>
   <TD>loose</TD>
Line 293: Line 293:


<TR>
<TR>
   <TD>X. tropicalis xenTro2</TD>
   <TH>X. tropicalis xenTro2</TH>
   <TD>19759</TD>
   <TD>19759</TD>
   <TD>1443 Mb</TD>
   <TD>1443 Mb</TD>
   <TD>1.7936</TD>
   <TD>1.726205</TD>
   <TD>5000</TD>
   <TD>5000</TD>
   <TD>loose</TD>
   <TD>loose</TD>
Line 305: Line 305:


<TR>
<TR>
   <TD>Tetraodon tetNig1</TD>
   <TH>Stickleback gasAcu1</TH>
   <TD>21</TD>
   <TD>21</TD>
   <TD>207 Mb</TD>
   <TD>382 Mb</TD>
   <TD>2.0157</TD>
   <TD>2.012649</TD>
   <TD>5000</TD>
   <TD>5000</TD>
   <TD>loose</TD>
   <TD>loose</TD>
Line 317: Line 317:


<TR>
<TR>
   <TD>Fugu fr2</TD>
   <TH>Zebrafish danRer4</TH>
   <TD>1</TD>
   <TD>25</TD>
   <TD>381 Mb</TD>
   <TD>1475 Mb</TD>
   <TD>2.0562</TD>
   <TD>2.027153</TD>
   <TD>5000</TD>
   <TD>5000</TD>
   <TD>loose</TD>
   <TD>loose</TD>
Line 329: Line 329:


<TR>
<TR>
   <TD>Stickleback gasAcu1</TD>
   <TH>Tetraodon tetNig1</TH>
   <TD>21</TD>
   <TD>21</TD>
   <TD>382 Mb</TD>
   <TD>207 Mb</TD>
   <TD>X.xxxx</TD>
   <TD>2.051015</TD>
   <TD>5000</TD>
   <TD>5000</TD>
   <TD>loose</TD>
   <TD>loose</TD>
Line 341: Line 341:


<TR>
<TR>
   <TD>Medaka oryLat1</TD>
   <TH>Fugu fr2</TH>
   <TD>24</TD>
   <TD>1</TD>
   <TD>690 Mb</TD>
   <TD>381 Mb</TD>
   <TD>X.xxxx</TD>
   <TD>2.086669</TD>
   <TD>5000</TD>
   <TD>5000</TD>
   <TD>loose</TD>
   <TD>loose</TD>
Line 353: Line 353:


<TR>
<TR>
   <TD>Zebrafish danRer4</TD>
   <TH>Medaka oryLat1</TH>
   <TD>25</TD>
   <TD>24</TD>
   <TD>1475 Mb</TD>
   <TD>690 Mb</TD>
   <TD>2.1059</TD>
   <TD>2.200402</TD>
   <TD>5000</TD>
   <TD>5000</TD>
   <TD>loose</TD>
   <TD>loose</TD>
Line 370: Line 370:


chrom size has the same limitation as the chrom count, no randoms.
chrom size has the same limitation as the chrom count, no randoms.
Tree distances are from the hg18 28-way measurements, with ponAbe1 and calJac1 manually inserted into the tree.


==blastz alignment parameters details==
==blastz alignment parameters details==

Revision as of 18:21, 20 August 2007

To avoid artifacts in downstream processing of the UCSC multiple alignments, the parameters used in the blastz processing pipeline must be chosen with care. The pipeline has a number of steps and a variety of tunable parameters. This page tracks the parameters used in the alignments as they proceed toward completion of a multiple alignment conservation track on the mm9 mouse (NCBI build 37) assembly.

axtChain parameters and end results

name db                chrom      genome    tree       axtChain  axtChain   % of mm9  % of other      done
                       count (*)  size      distance   minScore  linearGap  matched   matched by mm9
rat rn4                21         2702 Mb   0.160657   3000      medium     68.357    69.541          16 August
human hg18             24         2963 Mb   0.452619   3000      medium     38.499    35.201          16 August
Rhesus rheMac2         22         2731 Mb   0.452745   3000      medium     xx.123    xx.456          tbd
Orangutan ponAbe1      79553      3090 Mb   0.453809   3000      medium     xx.123    xx.456          tbd
Marmoset calJac1       49724      2889 Mb   0.454272   3000      medium     xx.123    xx.456          tbd
Chimp panTro2          25         3028 Mb   0.454514   3000      medium     xx.123    xx.456          tbd
GuineaPig cavPor2      295514     3246 Mb   0.479871   3000      medium     xx.123    xx.456          tbd
Horse equCab1          32         1961 Mb   0.479871   3000      medium     xx.123    xx.456          tbd
TreeShrew tupBel1      150851     3491 Mb   0.494934   3000      medium     xx.123    xx.456          tbd
Bushbaby otoGar1       120882     3261 Mb   0.498957   3000      medium     xx.123    xx.456          tbd
Armadillo dasNov1      304391     3678 Mb   0.517360   3000      medium     xx.123    xx.456          tbd
Rabbit oryCun1         215471     3303 Mb   0.519779   3000      medium     xx.123    xx.456          tbd
Cat felCat3            217790     3858 Mb   0.530610   3000      medium     xx.123    xx.456          tbd
Dog canFam2            39         2331 Mb   0.533544   3000      medium     xx.123    xx.456          tbd
Elephant loxAfr1       233134     3535 Mb   0.536627   3000      medium     xx.123    xx.456          tbd
Cow bosTau3            30         2321 Mb   0.540852   3000      medium     xx.123    xx.456          tbd
Hedgehog eriEur1       379801     3211 Mb   0.632457   3000      medium     xx.123    xx.456          tbd
Shrew sorAra1          262057     2800 Mb   0.658734   3000      medium     xx.123    xx.456          tbd
Tenrec echTel1         325491     3646 Mb   0.666303   3000      medium     xx.123    xx.456          tbd
Opossum monDom4        9          3272 Mb   0.909852   5000      loose      xx.123    xx.456          tbd
Platypus ornAna1       201522     1904 Mb   1.165888   5000      loose      xx.123    xx.456          tbd
Chicken galGal3        33         984 Mb    1.285399   5000      loose      xx.123    xx.456          tbd
Lizard anoCar1         7233       1699 Mb   1.404225x  5000      loose      xx.123    xx.456          tbd
X. tropicalis xenTro2  19759      1443 Mb   1.726205   5000      loose      xx.123    xx.456          tbd
Stickleback gasAcu1    21         382 Mb    2.012649   5000      loose      xx.123    xx.456          tbd
Zebrafish danRer4      25         1475 Mb   2.027153   5000      loose      xx.123    xx.456          tbd
Tetraodon tetNig1      21         207 Mb    2.051015   5000      loose      xx.123    xx.456          tbd
Fugu fr2               1          381 Mb    2.086669   5000      loose      xx.123    xx.456          tbd
Medaka oryLat1         24         690 Mb    2.200402   5000      loose      xx.123    xx.456          tbd


(*) chrom count does not include haplotypes, chr*_random, chrUn or chrM unless chrUn or scaffolds are the only sequences for that assembly.
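
The counting rule in the footnote above can be sketched in a few lines of Python. This is only an illustration: the name patterns (`_random`, `_hap<N>`, `chrUn`, `chrM`) are assumptions about how the excluded sequences are named, and `count_chroms` is a hypothetical helper, not part of any UCSC tool.

```python
import re

# Assumed naming patterns for the sequences the footnote excludes:
# chr*_random, haplotypes (e.g. chr6_cox_hap1), chrUn and chrM.
EXCLUDED = re.compile(r"(_random$|_hap\d+$|^chrUn|^chrM$)")

def count_chroms(names):
    """Count sequences, skipping randoms, haplotypes, chrUn and chrM."""
    kept = [n for n in names if not EXCLUDED.search(n)]
    # If only excluded sequences exist (e.g. a chrUn-only assembly),
    # fall back to counting every sequence.
    return len(kept) if kept else len(names)

names = ["chr1", "chr2", "chrX", "chrM",
         "chr1_random", "chrUn_random", "chr6_cox_hap1"]
print(count_chroms(names))
```

Scaffold-only assemblies pass through unchanged, because scaffold names match none of the excluded patterns.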

Genome size has the same limitation as the chrom count: no randoms.

Tree distances are from the hg18 28-way measurements, with ponAbe1 and calJac1 manually inserted into the tree.
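
As a rough illustration of how a row of the table maps onto an axtChain invocation, the sketch below builds an argument list from the minScore and linearGap columns. The `-minScore` and `-linearGap` options are axtChain's own; the helper name `axt_chain_args` and all file and directory names are hypothetical placeholders, and the actual pipeline may pass further options.

```python
def axt_chain_args(min_score, linear_gap, in_axt, t_nib_dir, q_nib_dir, out_chain):
    """Build an axtChain argument list (file names are placeholders)."""
    return [
        "axtChain",
        f"-minScore={min_score}",
        f"-linearGap={linear_gap}",   # "medium", "loose", or a matrix file
        in_axt, t_nib_dir, q_nib_dir, out_chain,
    ]

# rat rn4 row: minScore 3000, linearGap medium
args = axt_chain_args(3000, "medium",
                      "chr1.axt", "mm9Nib", "rn4Nib", "chr1.chain")
print(" ".join(args))
```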

blastz alignment parameters details

target      query    abridged   target size  query size  H     M
                     repeats    (overlap)    (overlap)
mm9         rat rn4  yes B=0    10M (10K)    10M (0)     2000  40M
human hg18  mm9      yes B=0    10M (0)      10M (10K)   2000  40M
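
To make the "size (overlap)" columns concrete, here is a minimal sketch of cutting a sequence into chunks. It assumes that windows start every `chunk` bases and that each window is extended by `overlap` bases into the next one; this convention and the function name `partition` are assumptions for illustration, and the production pipeline may partition differently.

```python
def partition(seq_len, chunk, overlap):
    """Return 0-based half-open (start, end) windows covering seq_len.

    Assumed convention: windows start every `chunk` bases and each is
    extended by `overlap` bases, clipped at the end of the sequence.
    """
    windows = []
    start = 0
    while start < seq_len:
        end = min(start + chunk + overlap, seq_len)
        windows.append((start, end))
        start += chunk
    return windows

# A 25 Mb sequence with 10M chunks and 10K overlap, as in "10M (10K)".
for w in partition(25_000_000, 10_000_000, 10_000):
    print(w)
```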


default blastz parameters

m=80  v=0  B=2  C=0  E=30  G=0  H=0  K=3000 L=K
M=0 O=400 P=1 R=0 T=1 W=8 X=10*(A-to-A match score)
Y=O+300*E Z=1

From the blastz usage message:

Default values are given in parentheses.
  m(80M) bytes of space for trace-back information
  v(0) 0: quiet; 1: verbose progress reports to stderr
  B(2) 0: single strand; >0: both strands
  C(0) 0: no chaining; 1: just output chain; 2: chain and extend;
       3: just output HSPs
  E(30) gap-extension penalty.
  G(0) diagonal chaining penalty.
  H(0) interpolate between alignments at threshold K = argument.
  K(3000) threshold for MSPs
  L(K) threshold for gapped alignments
  M(0) mask any base in seq1 hit this many times; 0 = no dynamic masking
  O(400) gap-open penalty.
  P(1) 0: entropy not used; 1: entropy used; >1 entropy with feedback.
  Q load the scoring matrix from a file.
  R(0) antidiagonal chaining penalty.
  T(1) 0: W-bp words;  1: 12of19;  2: 12of19 without transitions.
                       3: 14of22;  4: 14of22 without transitions.
  W(8) word size (unused unless T=0)
  X(10*(A-to-A match score)) X-drop parameter for ungapped extension.
  Y(O+300E) X-drop parameter for gapped extension.
  Z(1) increment between successive words in sequence 1.
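
Several defaults above are defined in terms of other parameters (L=K, X=10*(A-to-A match score), Y=O+300*E). The sketch below resolves them after applying overrides such as the B=0 and H=2000 settings listed in the parameter table. The A-to-A score of 91 is an assumption about the default scoring matrix, and `resolve_blastz_defaults` is a hypothetical helper rather than anything blastz provides.

```python
def resolve_blastz_defaults(overrides=None, a_to_a_score=91):
    """Return blastz parameters with derived defaults filled in.

    a_to_a_score=91 is an assumed value for the default scoring matrix.
    """
    params = {"B": 2, "C": 0, "E": 30, "G": 0, "H": 0, "K": 3000,
              "M": 0, "O": 400, "P": 1, "R": 0, "T": 1, "W": 8, "Z": 1}
    params.update(overrides or {})
    # Derived defaults, used only when not set explicitly.
    params.setdefault("L", params["K"])                      # L = K
    params.setdefault("X", 10 * a_to_a_score)                # X = 10 * (A-to-A)
    params.setdefault("Y", params["O"] + 300 * params["E"])  # Y = O + 300*E
    return params

p = resolve_blastz_defaults({"B": 0, "H": 2000})
print(p["L"], p["X"], p["Y"])
```

Because the derived values are filled in after the overrides, changing K, O or E also changes L and Y unless those are set explicitly.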

matrix parameters

The "medium" gap score matrix, tuned for the mouse-human distance, is:

tableSize    11
smallSize   111
position  1   2   3   11  111  2111  12111  32111   72111  152111  252111
qGap    350 425 450  600  900  2900  22900  57900  117900  217900  317900
tGap    350 425 450  600  900  2900  22900  57900  117900  217900  317900
bothGap 750 825 850 1000 1300  3300  23300  58300  118300  218300  318300

The "loose" gap score matrix, tuned for the chicken-human distance, is:

tablesize    11
smallSize   111
position  1   2   3   11  111  2111  12111  32111  72111  152111  252111
qGap    325 360 400  450  600  1100   3600   7600  15600   31600   56600
tGap    325 360 400  450  600  1100   3600   7600  15600   31600   56600
bothGap 625 660 700  750  900  1400   4000   8000  16000   32000   57000
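
One way to read these matrices is as sample points of a gap cost curve: each `position` is a gap length and the row below gives its cost. The sketch that follows assumes piecewise-linear interpolation between the tabulated positions, with the last segment extended for longer gaps; this is an assumption for illustration and may not match axtChain's internal calculation exactly. The function name `gap_cost` is hypothetical.

```python
from bisect import bisect_right

POSITIONS = [1, 2, 3, 11, 111, 2111, 12111, 32111, 72111, 152111, 252111]
MEDIUM_QGAP = [350, 425, 450, 600, 900, 2900, 22900, 57900,
               117900, 217900, 317900]
LOOSE_QGAP = [325, 360, 400, 450, 600, 1100, 3600, 7600,
              15600, 31600, 56600]

def gap_cost(size, positions, costs):
    """Gap cost by linear interpolation between tabulated positions."""
    if size <= positions[0]:
        return float(costs[0])
    i = bisect_right(positions, size)
    if i >= len(positions):
        i = len(positions) - 1  # beyond the table: extend the last segment
    x0, x1 = positions[i - 1], positions[i]
    y0, y1 = costs[i - 1], costs[i]
    return y0 + (y1 - y0) * (size - x0) / (x1 - x0)

# A 61-base gap lies halfway between positions 11 and 111.
print(gap_cost(61, POSITIONS, MEDIUM_QGAP))
print(gap_cost(61, POSITIONS, LOOSE_QGAP))
```

Under this reading the loose matrix charges far less for long gaps, which fits its use for the more distant species in the table.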