Mm9 multiple alignment
Revision as of 18:21, 20 August 2007
To avoid artifacts in downstream processing of the UCSC multiple alignments, the parameters used in the blastz processing pipeline must be chosen with care. The pipeline has several steps and a variety of tunable parameters. This page tracks the parameters used in the alignments as they proceed toward completion of a multiple alignment conservation track on the mm9 mouse (NCBI build 37) assembly.
axtChain parameters and end results
name db | chrom count (*) | genome size | tree distance | axtChain minScore | axtChain linearGap | % of mm9 matched | % of other matched by mm9 | done |
---|---|---|---|---|---|---|---|---|
rat rn4 | 21 | 2702 Mb | 0.160657 | 3000 | medium | 68.357 | 69.541 | 16 August |
human hg18 | 24 | 2963 Mb | 0.452619 | 3000 | medium | 38.499 | 35.201 | 16 August |
Rhesus rheMac2 | 22 | 2731 Mb | 0.452745 | 3000 | medium | xx.123 | xx.456 | tbd |
Orangutan ponAbe1 | 79553 | 3090 Mb | 0.453809 | 3000 | medium | xx.123 | xx.456 | tbd |
Marmoset calJac1 | 49724 | 2889 Mb | 0.454272 | 3000 | medium | xx.123 | xx.456 | tbd |
Chimp panTro2 | 25 | 3028 Mb | 0.454514 | 3000 | medium | xx.123 | xx.456 | tbd |
GuineaPig cavPor2 | 295514 | 3246 Mb | 0.479871 | 3000 | medium | xx.123 | xx.456 | tbd |
Horse equCab1 | 32 | 1961 Mb | 0.479871 | 3000 | medium | xx.123 | xx.456 | tbd |
TreeShrew tupBel1 | 150851 | 3491 Mb | 0.494934 | 3000 | medium | xx.123 | xx.456 | tbd |
Bushbaby otoGar1 | 120882 | 3261 Mb | 0.498957 | 3000 | medium | xx.123 | xx.456 | tbd |
Armadillo dasNov1 | 304391 | 3678 Mb | 0.517360 | 3000 | medium | xx.123 | xx.456 | tbd |
Rabbit oryCun1 | 215471 | 3303 Mb | 0.519779 | 3000 | medium | xx.123 | xx.456 | tbd |
Cat felCat3 | 217790 | 3858 Mb | 0.530610 | 3000 | medium | xx.123 | xx.456 | tbd |
Dog canFam2 | 39 | 2331 Mb | 0.533544 | 3000 | medium | xx.123 | xx.456 | tbd |
Elephant loxAfr1 | 233134 | 3535 Mb | 0.536627 | 3000 | medium | xx.123 | xx.456 | tbd |
Cow bosTau3 | 30 | 2321 Mb | 0.540852 | 3000 | medium | xx.123 | xx.456 | tbd |
Hedgehog eriEur1 | 379801 | 3211 Mb | 0.632457 | 3000 | medium | xx.123 | xx.456 | tbd |
Shrew sorAra1 | 262057 | 2800 Mb | 0.658734 | 3000 | medium | xx.123 | xx.456 | tbd |
Tenrec echTel1 | 325491 | 3646 Mb | 0.666303 | 3000 | medium | xx.123 | xx.456 | tbd |
Opossum monDom4 | 9 | 3272 Mb | 0.909852 | 5000 | loose | xx.123 | xx.456 | tbd |
Platypus ornAna1 | 201522 | 1904 Mb | 1.165888 | 5000 | loose | xx.123 | xx.456 | tbd |
Chicken galGal3 | 33 | 984 Mb | 1.285399 | 5000 | loose | xx.123 | xx.456 | tbd |
Lizard anoCar1 | 7233 | 1699 Mb | 1.404225 | 5000 | loose | xx.123 | xx.456 | tbd |
X. tropicalis xenTro2 | 19759 | 1443 Mb | 1.726205 | 5000 | loose | xx.123 | xx.456 | tbd |
Stickleback gasAcu1 | 21 | 382 Mb | 2.012649 | 5000 | loose | xx.123 | xx.456 | tbd |
Zebrafish danRer4 | 25 | 1475 Mb | 2.027153 | 5000 | loose | xx.123 | xx.456 | tbd |
Tetraodon tetNig1 | 21 | 207 Mb | 2.051015 | 5000 | loose | xx.123 | xx.456 | tbd |
Fugu fr2 | 1 | 381 Mb | 2.086669 | 5000 | loose | xx.123 | xx.456 | tbd |
Medaka oryLat1 | 24 | 690 Mb | 2.200402 | 5000 | loose | xx.123 | xx.456 | tbd |
(*) chrom count does not include haplotypes, chr*_random, chrUn or chrM unless chrUn or scaffolds are the only sequences for that assembly.
Genome size has the same limitation as the chrom count: randoms are not included.
Tree distances are from the hg18 28-way measurements, with ponAbe1 and calJac1 manually inserted into the tree.
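The table suggests a simple rule: species at tree distance up to 0.666303 use minScore 3000 with the "medium" linearGap, and species at 0.909852 and beyond use minScore 5000 with "loose". A minimal Python sketch of that rule follows. The cutoff value of 0.8 is an assumption (any value between the two observed distances fits the table), and the function names and file names are hypothetical placeholders, not part of the UCSC pipeline.

```python
import shlex

# Assumed cutoff: the table switches from 3000/medium to 5000/loose somewhere
# between tree distances 0.666303 (Tenrec) and 0.909852 (Opossum).
LOOSE_CUTOFF = 0.8


def chain_params(tree_distance):
    """Return (minScore, linearGap) for a species at the given tree distance."""
    if tree_distance < LOOSE_CUTOFF:
        return 3000, "medium"
    return 5000, "loose"


def axt_chain_command(tree_distance, in_axt, target_seq, query_seq, out_chain):
    """Build an axtChain command line; all file names are placeholders."""
    min_score, linear_gap = chain_params(tree_distance)
    args = [
        "axtChain",
        f"-minScore={min_score}",
        f"-linearGap={linear_gap}",
        in_axt, target_seq, query_seq, out_chain,
    ]
    return " ".join(shlex.quote(a) for a in args)


if __name__ == "__main__":
    # Chicken galGal3, tree distance 1.285399 in the table above.
    print(axt_chain_command(1.285399, "in.axt", "mm9.2bit", "galGal3.2bit", "out.chain"))
```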
blastz alignment parameters details
target | query | abridged repeats | target size (overlap) | query size (overlap) | H | M |
---|---|---|---|---|---|---|
mm9 | rat rn4 | yes B=0 | 10M (10K) | 10M (0) | 2000 | 40M |
human hg18 | mm9 | yes B=0 | 10M (0) | 10M (10K) | 2000 | 40M |
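The size and overlap columns describe how each sequence is cut into pieces before alignment, for example 10M chunks with a 10K overlap. The sketch below shows one plausible reading of those numbers: windows start every `chunk` bases and each window extends `overlap` bases into the next one. This interpretation and the function name `partition` are assumptions for illustration; the actual pipeline scripts may split sequences differently.

```python
def partition(seq_size, chunk, overlap):
    """Split [0, seq_size) into windows of `chunk` bases plus `overlap` extra.

    Windows start every `chunk` bases; each one is extended by `overlap`
    bases (clipped at the sequence end) so neighbours share a margin.
    Returns a list of (start, end) half-open intervals.
    """
    if chunk <= 0 or overlap < 0:
        raise ValueError("chunk must be positive and overlap non-negative")
    windows = []
    start = 0
    while start < seq_size:
        end = min(start + chunk + overlap, seq_size)
        windows.append((start, end))
        start += chunk
    return windows


if __name__ == "__main__":
    # A hypothetical 25 Mb chromosome with the "10M (10K)" setting.
    for window in partition(25_000_000, 10_000_000, 10_000):
        print(window)
```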
default blastz parameters
m=80 v=0 B=2 C=0 E=30 G=0 H=0 K=3000 L=K M=0 O=400 P=1 R=0 T=1 W=8 X=10*(A-to-A match score) Y=O+300*E Z=1
From the blastz usage message: Default values are given in parentheses.
  m(80M) bytes of space for trace-back information
  v(0) 0: quiet; 1: verbose progress reports to stderr
  B(2) 0: single strand; >0: both strands
  C(0) 0: no chaining; 1: just output chain; 2: chain and extend; 3: just output HSPs
  E(30) gap-extension penalty.
  G(0) diagonal chaining penalty.
  H(0) interpolate between alignments at threshold K = argument.
  K(3000) threshold for MSPs
  L(K) threshold for gapped alignments
  M(0) mask any base in seq1 hit this many times; 0 = no dynamic masking
  O(400) gap-open penalty.
  P(1) 0: entropy not used; 1: entropy used; >1 entropy with feedback.
  Q load the scoring matrix from a file.
  R(0) antidiagonal chaining penalty.
  T(1) 0: W-bp words; 1: 12of19; 2: 12of19 without transitions. 3: 14of22; 4: 14of22 without transitions.
  W(8) word size (unused unless T=0)
  X(10*(A-to-A match score)) X-drop parameter for ungapped extension.
  Y(O+300E) X-drop parameter for gapped extension.
  Z(1) increment between successive words in sequence 1.
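Several of the defaults above are defined in terms of other parameters (L=K, X=10*(A-to-A match score), Y=O+300*E). As a worked example, the sketch below resolves them to numbers. The helper name `resolve_defaults` is hypothetical, and the A-to-A match score of 91 used in the usage example is an assumption about the scoring matrix in use; substitute the value from the matrix actually loaded.

```python
def resolve_defaults(match_score, K=3000, O=400, E=30):
    """Resolve the derived blastz defaults from the listed formulas.

    match_score is the A-to-A entry of the scoring matrix in use.
    """
    return {
        "K": K,
        "L": K,                 # L defaults to K
        "O": O,
        "E": E,
        "X": 10 * match_score,  # X = 10 * (A-to-A match score)
        "Y": O + 300 * E,       # Y = O + 300*E
    }


if __name__ == "__main__":
    # 91 is an assumed A-to-A score, used here only for illustration.
    print(resolve_defaults(91))
```

With the listed defaults O=400 and E=30, Y works out to 400 + 300*30 = 9400.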
matrix parameters
The "medium" gap score matrix, tuned for the mouse-human distance, is:
tableSize 11
smallSize 111
position 1 2 3 11 111 2111 12111 32111 72111 152111 252111
qGap 350 425 450 600 900 2900 22900 57900 117900 217900 317900
tGap 350 425 450 600 900 2900 22900 57900 117900 217900 317900
bothGap 750 825 850 1000 1300 3300 23300 58300 118300 218300 318300
The "loose" gap score matrix, tuned for the chicken-human distance, is:
tablesize 11
smallSize 111
position 1 2 3 11 111 2111 12111 32111 72111 152111 252111
qGap 325 360 400 450 600 1100 3600 7600 15600 31600 56600
tGap 325 360 400 450 600 1100 3600 7600 15600 31600 56600
bothGap 625 660 700 750 900 1400 4000 8000 16000 32000 57000
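Each matrix gives a gap cost only at the tabulated positions. To illustrate how a cost for an arbitrary gap size could be read from such a table, the sketch below interpolates linearly between neighbouring positions and extends the last segment's slope beyond 252111. Both choices are assumptions suggested by the option name linearGap; the actual axtChain implementation may differ in detail, and the function name `gap_cost` is hypothetical. Only the qGap rows are shown.

```python
from bisect import bisect_right

POSITIONS = [1, 2, 3, 11, 111, 2111, 12111, 32111, 72111, 152111, 252111]
MEDIUM_QGAP = [350, 425, 450, 600, 900, 2900, 22900, 57900, 117900, 217900, 317900]
LOOSE_QGAP = [325, 360, 400, 450, 600, 1100, 3600, 7600, 15600, 31600, 56600]


def gap_cost(size, positions, costs):
    """Gap cost for `size` bases by linear interpolation over the table.

    Sizes beyond the last position are extrapolated with the slope of the
    final segment (an assumption, not confirmed by the source).
    """
    if size < positions[0]:
        raise ValueError("gap size must be at least the first table position")
    # Index of the segment whose left endpoint is <= size.
    i = bisect_right(positions, size) - 1
    i = min(i, len(positions) - 2)
    x0, x1 = positions[i], positions[i + 1]
    y0, y1 = costs[i], costs[i + 1]
    return y0 + (y1 - y0) * (size - x0) / (x1 - x0)


if __name__ == "__main__":
    for size in (1, 11, 61, 2111):
        print(size,
              gap_cost(size, POSITIONS, MEDIUM_QGAP),
              gap_cost(size, POSITIONS, LOOSE_QGAP))
```

For a 61-base gap, halfway between positions 11 and 111, this gives 750 under "medium" and 525 under "loose", which shows how much more cheaply the loose matrix treats long gaps.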