Human/hg19/GRCh37 46-way multiple alignment
Latest revision as of 22:07, 6 January 2010
The 46-species multiple alignment on human/hg19/GRCh37 is an unusually large piece of work. This page discusses the phylogenetic trees used in the alignment.
Errata
The initial release of this track included a phylogenetic tree that had two small errors in it:
46way.nh (http://hgdownload.cse.ucsc.edu/goldenPath/hg19/multiz46way/46way.nh)
Namely, Baboon (papHam1) was specified as an outgroup to [Rhesus (rheMac2) + apes] instead of, correctly, as the sister species of Rhesus. The same problem was present for Wallaby (macEug1) and Opossum (monDom5), which the corrected tree places as sister species. The discussion below includes a corrected phylogenetic tree.
Multiple Trees
Several trees were used for the phyloP/phastCons calculations.
There is a set of trees with branch lengths calculated based only on the ordinary chromosomes without chrX, and a set of trees calculated based only on chrX.
Within those two categories, there are three trees with branch lengths calculated from subsets of the 46 species:
- primate subset only
- placental mammal subset only
- all 46 vertebrates
Thus, there are six different phylogenetic trees.
Corrected Tree
1. all 46 species:
(((((((((((((((((hg19,panTro2),gorGor1), ponAbe2), (rheMac2,papHam1)), calJac1),tarSyr1), (micMur1,otoGar1)), tupBel1),(((((mm9,rn4), dipOrd1),cavPor3), speTri1), (oryCun2,ochPri2))), (((vicPac1,(turTru1,bosTau4)), ((equCab2,(felCat3,canFam2)), (myoLuc1,pteVam1))), (eriEur1,sorAra1))), (((loxAfr3,proCap1),echTel1), (dasNov2,choHof1))), (monDom5,macEug1)),ornAna1), ((galGal3,taeGut1), anoCar1)),xenTro2), (((tetNig2,fr2), (gasAcu1,oryLat2)), danRer6)),petMar1)
2. placental only subset:
(((((((((((hg19,panTro2),gorGor1),ponAbe2),(rheMac2,papHam1)),calJac1), tarSyr1),(micMur1,otoGar1)),tupBel1),(((((mm9,rn4),dipOrd1), cavPor3),speTri1),(oryCun2,ochPri2))),(((vicPac1,(turTru1,bosTau4)), ((equCab2,(felCat3,canFam2)),(myoLuc1,pteVam1))),(eriEur1,sorAra1))), (((loxAfr3,proCap1),echTel1),(dasNov2,choHof1)))
3. primate only subset:
(((((((hg19,panTro2),gorGor1),ponAbe2),(rheMac2,papHam1)),calJac1),tarSyr1),(micMur1,otoGar1))
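A quick way to confirm that a topology contains the expected number of species (46 for the full tree, 10 for the primate subset) is to count its leaves. The following is a minimal sketch; newick_leaves is a hypothetical helper name and is not part of the UCSC or PHAST tools.

```shell
# Count the leaves (species) in a Newick string, with or without branch lengths.
# newick_leaves is a hypothetical helper, not a UCSC or PHAST command.
newick_leaves() {
    printf '%s\n' "$1" \
        | sed 's/:[0-9.]*//g' \
        | tr -d '(); \t' \
        | tr ',' '\n' \
        | grep -c .
}

primates='(((((((hg19,panTro2),gorGor1),ponAbe2),(rheMac2,papHam1)),calJac1),tarSyr1),(micMur1,otoGar1))'
newick_leaves "$primates"   # prints 10
```

The sed step strips any ":length" annotations, so the same helper can be applied to the trees with branch lengths.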
As an experiment, I reran the entire multiple alignment with the corrected tree and recalculated the phastCons and phyloP tracks. The resulting multiple alignment differs slightly, but not significantly. It may be interesting to construct a difference track for the places where the phastCons and phyloP tracks do differ.
4D sites branch length calculations
Branch lengths were estimated by taking a subset of the refSeq track:
 hgsql hg19 -Ne \
   "select * from refGene,refSeqStatus where refGene.name=refSeqStatus.mrnaAcc
    and refSeqStatus.status='Reviewed' and mol='mRNA'" \
   | cut -f 2-20 | egrep -E -v "chrM|chrUn|random|_hap|chrX" \
   | genePredSingleCover stdin stdout | sort > refSeqReviewedNR.gp
This covers the following chromosomes only:
chr1 chr2 chr3 chr4 chr5 chr6 chr7 chr8 chr9 chr10 chr11 chr12 chr13 chr14 chr15 chr16 chr17 chr18 chr19 chr20 chr21 chr22 chrY
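The effect of the exclusion pattern can be checked on a handful of sequence names. This is only an illustration: the sample names below are chosen for the example, keep_chroms is a hypothetical helper, and only the pattern itself is taken from the command above.

```shell
# Apply the same exclusion pattern used in the refSeq subset command.
# keep_chroms is a hypothetical helper name for this illustration.
keep_chroms() {
    grep -E -v "chrM|chrUn|random|_hap|chrX"
}

# Sample sequence names, one per line, chosen only to exercise each alternative.
printf '%s\n' chr1 chr22 chrX chrY chrM chrUn_gl000211 chr1_gl000191_random chr6_apd_hap1 \
    | keep_chroms
# prints chr1, chr22 and chrY, one per line
```

Note that in the pipeline the pattern is matched against whole genePred lines, not just the chromosome column, so any line containing one of these strings anywhere would also be dropped.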
The 2009-10-21 version of the PHAST package (http://compgen.bscb.cornell.edu/phast/index.php) was used.
For each chromosome c in the above list, running msa_view on each $c.maf from the multiz alignment:
 awk -v C=$c '$2 == C {print}' refSeqReviewedNR.gp > $c.gp
 msa_view --4d --features $c.gp --do-cats 3 -i MAF $c.maf -o SS > $c.ss
 msa_view -i SS --tuple-size 1 $c.ss > mfa/chr${c}.mfa
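The per-chromosome steps can be wrapped in a loop. The sketch below is a dry run that only prints the commands instead of executing them, so it can be inspected without PHAST installed. It assumes that $c holds a full sequence name such as chr1 and that the $c.maf files are in the current directory; under that assumption the output file is written here as mfa/$c.mfa. The function name print_4d_steps is hypothetical.

```shell
# Dry run: print, rather than execute, the 4D-site extraction steps per chromosome.
# Assumption: c is a full name such as chr1, matching column 2 of the genePred file.
chroms="chr1 chr2 chr3 chr4 chr5 chr6 chr7 chr8 chr9 chr10 chr11 chr12 chr13 chr14 chr15 chr16 chr17 chr18 chr19 chr20 chr21 chr22 chrY"

print_4d_steps() {
    for c in $chroms; do
        # select the gene predictions on this chromosome
        echo "awk -v C=$c '\$2 == C {print}' refSeqReviewedNR.gp > $c.gp"
        # extract 4D sites into sufficient-statistics format
        echo "msa_view --4d --features $c.gp --do-cats 3 -i MAF $c.maf -o SS > $c.ss"
        # convert to tuple size 1 (output format left at the msa_view default)
        echo "msa_view -i SS --tuple-size 1 $c.ss > mfa/$c.mfa"
    done
}

print_4d_steps
```

Piping the output to sh would run the commands; printing them first makes it easy to review file names before committing to a long run.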
Then, putting all the chr*.mfa files together:
msa_view --aggregate `cat species.lst` mfa/*.mfa | sed s/"> "/">"/ > all.mfa
Using phyloFit to construct a tree model:
phyloFit --EM --precision MED --msa-format FASTA --subst-mod REV --tree tree.all.nh all.mfa
Adjust the frequencies back to a genome-wide GC percent of 0.41 with:
modFreqs phyloFit.mod 0.41 > vertebrate.mod
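To make the target frequencies concrete, the arithmetic can be sketched as follows. This assumes the GC fraction is split evenly between C and G and the remainder evenly between A and T; that split is an assumption made for illustration, and gc_to_freqs is a hypothetical helper, not a PHAST command.

```shell
# Arithmetic sketch only: base frequencies implied by a GC fraction,
# assuming C = G = gc/2 and A = T = (1 - gc)/2.
gc_to_freqs() {
    awk -v gc="$1" 'BEGIN {
        printf "A=%.3f C=%.3f G=%.3f T=%.3f\n", (1 - gc) / 2, gc / 2, gc / 2, (1 - gc) / 2
    }'
}

gc_to_freqs 0.41   # prints A=0.295 C=0.205 G=0.205 T=0.295
```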
The resulting tree is for all 46 species. The same procedure was run for the primate and placental subsets, and all three were then repeated with just chrX. The resulting six trees are listed below.
Trees without chrX
1. Primate only subset
(((((((hg19:0.006036,panTro2:0.006817):0.002265,gorGor1:0.008230):0.008764, ponAbe2:0.016911):0.012657,(rheMac2:0.006959,papHam1:0.006378):0.027277):0.019382, calJac1:0.064230):0.054029,tarSyr1:0.132418):0.018980, (micMur1:0.087093,otoGar1:0.129282):0.018980);
2. Placental only subset
(((((((((((hg19:0.006024,panTro2:0.006789):0.002262,gorGor1:0.008208):0.008765, ponAbe2:0.016837):0.012535,(rheMac2:0.006872, papHam1:0.006439):0.027303):0.019859,calJac1:0.063414):0.055057, tarSyr1:0.129930):0.009838,(micMur1:0.085989, otoGar1:0.128778):0.033658):0.012238,tupBel1:0.194945):0.004481, (((((mm9:0.086189,rn4:0.089247):0.195157,dipOrd1:0.208591):0.023521, cavPor3:0.217198):0.010584,speTri1:0.151978):0.024663,(oryCun2:0.114630, ochPri2:0.198507):0.094820):0.012067):0.019462,(((vicPac1:0.107714, (turTru1:0.063700,bosTau4:0.114890):0.023250):0.035925, ((equCab2:0.104886,(felCat3:0.103989,canFam2:0.108561):0.053212):0.005425, (myoLuc1:0.163463,pteVam1:0.131116):0.042793):0.004204):0.010077, (eriEur1:0.219045,sorAra1:0.286631):0.054487):0.020551):0.011417, (((loxAfr3:0.076308,proCap1:0.143785):0.026717,echTel1:0.221431):0.042789, (dasNov2:0.110105,choHof1:0.085867):0.045250):0.011417);
3. all 46 species
(((((((((((((((((hg19:0.006700,panTro2:0.006667):0.002250, gorGor1:0.008825):0.009680,ponAbe2:0.018318):0.014340, (rheMac2:0.007853,papHam1:0.007637):0.029618):0.021965, calJac1:0.066131):0.057590,tarSyr1:0.137823):0.011062,(micMur1:0.092749, otoGar1:0.129725):0.035463):0.015494,tupBel1:0.186203):0.004937, (((((mm9:0.084509,rn4:0.091589):0.197773,dipOrd1:0.211609):0.022992, cavPor3:0.225629):0.010150,speTri1:0.148468):0.025746,(oryCun2:0.114227, ochPri2:0.201069):0.101463):0.015313):0.020593,(((vicPac1:0.107275, (turTru1:0.064688,bosTau4:0.123592):0.025153):0.040335,((equCab2:0.109397, (felCat3:0.098612,canFam2:0.102458):0.049845):0.006219,(myoLuc1:0.142540, pteVam1:0.113399):0.033706):0.004508):0.011671,(eriEur1:0.221785, sorAra1:0.269562):0.056393):0.021227):0.023664,(((loxAfr3:0.082242, proCap1:0.155358):0.026990,echTel1:0.245936):0.049697, (dasNov2:0.116664,choHof1:0.096357):0.053145):0.006717):0.234728, (monDom5:0.125686,macEug1:0.122008):0.215100):0.071664, ornAna1:0.456592):0.109504,((galGal3:0.165536,taeGut1:0.171542):0.199223, anoCar1:0.489241):0.105143):0.172371,xenTro2:0.855573):0.311354, (((tetNig2:0.224159,fr2:0.203847):0.195181,(gasAcu1:0.316413, oryLat2:0.481970):0.059150):0.325640,danRer6:0.730752):0.147949):0.526688, petMar1:0.526688);
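Long hand-wrapped Newick strings are easy to damage when copied. A rough sanity check, sketched below, compares the counts of opening and closing parentheses and looks for a subtree that opens directly after a branch length or a closing parenthesis with no comma in between. newick_check is a hypothetical helper and covers only these two cases, not full Newick validation.

```shell
# Rough Newick sanity check (hypothetical helper, not a PHAST tool).
# Detects unbalanced parentheses and a missing comma before an opening "(".
newick_check() {
    open=$(printf '%s' "$1" | tr -cd '(' | wc -c | tr -d ' ')
    close=$(printf '%s' "$1" | tr -cd ')' | wc -c | tr -d ' ')
    if [ "$open" -ne "$close" ]; then
        echo "unbalanced: $open '(' vs $close ')'"
        return 1
    fi
    # a digit or ")" followed (after optional spaces) by "(" means a comma was lost
    if printf '%s\n' "$1" | grep -q '[0-9)] *('; then
        echo "missing comma before '('"
        return 1
    fi
    echo "ok"
}

newick_check '((a:0.1,b:0.2):0.3,c:0.4);'   # prints ok
```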
Trees on chrX only
1. primate only subset
(((((((hg19:0.003917,panTro2:0.005184):0.002146,gorGor1:0.008108):0.007057, ponAbe2:0.015569):0.013208,(rheMac2:0.004711, papHam1:0.004180):0.023970):0.018430,calJac1:0.058028):0.053927, tarSyr1:0.096237):0.019719,(micMur1:0.074162,otoGar1:0.118457):0.019719);
2. placental only subset
(((((((((((hg19:0.003913,panTro2:0.005197):0.002196,gorGor1:0.008068):0.006904, ponAbe2:0.015655):0.013435,(rheMac2:0.004704, papHam1:0.004228):0.023895):0.019027,calJac1:0.057101):0.054600, tarSyr1:0.096642):0.013924,(micMur1:0.074221, otoGar1:0.117211):0.029832):0.013436,tupBel1:0.153211):0.002109, (((((mm9:0.063891,rn4:0.066094):0.167668,dipOrd1:0.175669):0.023604, cavPor3:0.171594):0.005607,speTri1:0.125382):0.026739,(oryCun2:0.083723, ochPri2:0.168135):0.075730):0.008263):0.019786,(((vicPac1:0.081343, (turTru1:0.056118,bosTau4:0.102627):0.021578):0.029857,((equCab2:0.087934, (felCat3:0.097379,canFam2:0.091434):0.043427):0.006482,(myoLuc1:0.117475, pteVam1:0.106041):0.027660):0.003592):0.010126,(eriEur1:0.212782, sorAra1:0.234802):0.043917):0.021099):0.013238,(((loxAfr3:0.066021, proCap1:0.128897):0.023867,echTel1:0.212319):0.046738, (dasNov2:0.101972,choHof1:0.102076):0.046533):0.013238);
3. all 46 species
(((((((((((((((((hg19:0.003795,panTro2:0.005196):0.002735, gorGor1:0.008038):0.006805,ponAbe2:0.015616):0.013581, (rheMac2:0.004670,papHam1:0.004195):0.023755):0.019288, calJac1:0.056549):0.053796,tarSyr1:0.095597):0.014122, (micMur1:0.074107,otoGar1:0.115630):0.029705):0.012941, tupBel1:0.151971):0.002729,(((((mm9:0.063333,rn4:0.065446):0.163518, dipOrd1:0.172634):0.023201,cavPor3:0.169031):0.005188, speTri1:0.123749):0.026667,(oryCun2:0.083602, ochPri2:0.165502):0.074856):0.008279):0.018611, (((vicPac1:0.080894,(turTru1:0.056486,bosTau4:0.101570):0.021353):0.030026, ((equCab2:0.087517,(felCat3:0.097110,canFam2:0.090575):0.043182):0.006555, (myoLuc1:0.116436,pteVam1:0.105246):0.027654):0.003743):0.010755, (eriEur1:0.209272,sorAra1:0.229678):0.042690):0.021348):0.019835, (((loxAfr3:0.066090,proCap1:0.127321):0.023966,echTel1:0.208813):0.045208, (dasNov2:0.100868,choHof1:0.100549):0.045734):0.008854):0.264749, (monDom5:0.131898,macEug1:0.143442):0.220509):0.085972, ornAna1:0.483509):0.097702,((galGal3:0.177448,taeGut1:0.162788):0.246554, anoCar1:0.557249):0.117029):0.144329,xenTro2:0.880060):0.301475, (((tetNig2:0.229555,fr2:0.206494):0.159973,(gasAcu1:0.307725, oryLat2:0.476387):0.072242):0.286820,danRer6:0.792960):0.189983):0.438700, petMar1:0.438700);
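One simple way to compare a chrX tree with its counterpart calculated without chrX is the total tree length, that is, the sum of all branch lengths. The sketch below uses a hypothetical helper, tree_length, and is shown on a toy tree; any of the six trees above can be passed in as a quoted string.

```shell
# Sum all branch lengths in a Newick string (hypothetical helper).
tree_length() {
    printf '%s\n' "$1" \
        | grep -o ':[0-9.]*' \
        | tr -d ':' \
        | awk '{ s += $1 } END { printf "%.6f\n", s }'
}

tree_length '((a:0.1,b:0.2):0.3,c:0.4);'   # prints 1.000000
```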