Human/hg19/GRCh37 46-way multiple alignment
Revision as of 18:49, 7 December 2009
The 46-species multiple alignment on human/hg19/GRCh37 is an unusually large piece of work. This page discusses the phylogenetic trees used in the alignment.
Errata
The initial release of this track included a phylogenetic tree with two small errors.
46way.nh
Namely, Baboon (papHam1) and Rhesus (rheMac2) were specified as separate nodes instead
of, correctly, as sister species. The same problem is present for Wallaby (macEug1) and
Opossum (monDom5). The discussion below includes a corrected phylogenetic tree.
Multiple Trees
For the phyloP/phastCons calculations, there are a number of trees that were used.
There is a set of trees with branch lengths calculated based only on the ordinary chromosomes without chrX, and a set of trees calculated based only on chrX.
Within those two categories, there are three trees with branch lengths calculated from subsets of the 46 species:
- primate subset only
- placental mammal subset only
- all 46 vertebrates
Thus, there are six different phylogenetic trees.
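The six combinations can be enumerated with a small loop. This is only a sketch: the labels `nonX`, `chrX`, `primates`, `placental` and `all46` are invented here for illustration and are not file names from the original work.

```shell
# Enumerate the 2 chromosome sets x 3 species subsets described above.
# All labels are hypothetical and used only for illustration.
combos=""
for chromSet in nonX chrX; do
  for subset in primates placental all46; do
    combos="$combos $subset.$chromSet"
  done
done

# Print one combination per line.
printf '%s\n' $combos
```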
Corrected Tree
1. all 46 species:
(((((((((((((((((hg19,panTro2),gorGor1), ponAbe2), (rheMac2,papHam1)), calJac1),tarSyr1), (micMur1,otoGar1)), tupBel1),(((((mm9,rn4), dipOrd1),cavPor3), speTri1), (oryCun2,ochPri2))), (((vicPac1,(turTru1,bosTau4)), ((equCab2,(felCat3,canFam2)), (myoLuc1,pteVam1))), (eriEur1,sorAra1))), (((loxAfr3,proCap1),echTel1), (dasNov2,choHof1))), (monDom5,macEug1)),ornAna1), ((galGal3,taeGut1), anoCar1)),xenTro2), (((tetNig2,fr2), (gasAcu1,oryLat2)), danRer6)),petMar1)
2. placental only subset:
(((((((((((hg19,panTro2),gorGor1),ponAbe2),(rheMac2,papHam1)),calJac1), tarSyr1),(micMur1,otoGar1)),tupBel1),(((((mm9,rn4),dipOrd1), cavPor3),speTri1),(oryCun2,ochPri2))),(((vicPac1,(turTru1,bosTau4)), ((equCab2,(felCat3,canFam2)),(myoLuc1,pteVam1))),(eriEur1,sorAra1))), (((loxAfr3,proCap1),echTel1),(dasNov2,choHof1)))
3. primate only subset:
(((((((hg19,panTro2),gorGor1),ponAbe2),(rheMac2,papHam1)),calJac1),tarSyr1),(micMur1,otoGar1))
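As a quick check on a topology such as the ones above, the species names can be pulled out of the Newick string with standard text tools. The following is a minimal sketch that only works for trees without branch lengths; the variable names are illustrative.

```shell
# Primate-only topology copied from the list above.
tree='(((((((hg19,panTro2),gorGor1),ponAbe2),(rheMac2,papHam1)),calJac1),tarSyr1),(micMur1,otoGar1))'

# Strip parentheses, semicolons and spaces, then put one name per line.
species=$(printf '%s\n' "$tree" | tr -d '(); ' | tr ',' '\n')

# Count the leaves and build a comma-separated list of the names.
count=$(printf '%s\n' "$species" | wc -l | tr -d ' ')
list=$(printf '%s\n' "$species" | paste -s -d ',' -)
echo "$count species: $list"
```

The same pipeline applied to the other two topologies gives a simple way to confirm that each subset contains the expected species.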
I ran an experiment to rerun the entire multiple alignment with the corrected tree as well as recalculate phastCons and phyloP tracks. There are slight differences in the resulting multiple alignment, but nothing significant. It may be interesting to construct a difference track for points of interest on the phastCons and phyloP tracks that do have some differences.
4D sites branch length calculations
Branch lengths were estimated by taking a subset of the refSeq track:
 hgsql hg19 -Ne \
   "select * from refGene,refSeqStatus where refGene.name=refSeqStatus.mrnaAcc and refSeqStatus.status='Reviewed' and mol='mRNA'" \
   | cut -f 2-20 | egrep -E -v "chrM|chrUn|random|_hap|chrX" \
   | genePredSingleCover stdin stdout | sort > refSeqReviewedNR.gp
This restricts the gene set to the following chromosomes:
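The effect of the exclusion pattern in the command above can be seen on a handful of chromosome names. This is a sketch on sample input only; the list of names is chosen for illustration, and `grep -E` is used in place of `egrep`, which behaves the same way.

```shell
# Sample chromosome names standing in for column 2 of the genePred output.
chroms='chr1
chrX
chrY
chrM
chrUn_gl000211
chr6_apd_hap1
chr17_gl000205_random
chr22'

# Same exclusion pattern as the command above.
kept=$(printf '%s\n' "$chroms" | grep -E -v "chrM|chrUn|random|_hap|chrX")
echo "$kept"
```

Only `chr1`, `chrY` and `chr22` survive the filter in this sample.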
chr1 chr2 chr3 chr4 chr5 chr6 chr7 chr8 chr9 chr10 chr11 chr12 chr13 chr14 chr15 chr16 chr17 chr18 chr19 chr20 chr21 chr22 chrY
The calculations used the 2009-10-21 version of the PHAST package.
For each chromosome c in the above list, msa_view was run on $c.maf from the multiz alignment:
 awk -v C=$c '$2 == C {print}' refSeqReviewedNR.gp > $c.gp
 msa_view --4d --features $c.gp --do-cats 3 -i MAF $c.maf -o SS > $c.ss
 msa_view -i SS --tuple-size 1 $c.ss > mfa/chr${c}.mfa
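The per-chromosome split performed by the awk command can be tried without PHAST installed. The sketch below uses made-up rows (the accessions and coordinates are invented, and real genePred rows have more columns) and writes into a temporary directory.

```shell
# Three made-up genePred-style rows: name, chrom, strand, txStart, txEnd.
# Only column 2 (the chromosome) matters for the split.
workDir=$(mktemp -d)
printf 'NM_000001\tchr1\t+\t100\t200\nNM_000002\tchr2\t-\t300\t400\nNM_000003\tchr1\t+\t500\t600\n' \
  > "$workDir/refSeqReviewedNR.gp"

# Write one .gp file per chromosome, using the same awk test as above.
for c in chr1 chr2; do
  awk -v C="$c" '$2 == C {print}' "$workDir/refSeqReviewedNR.gp" > "$workDir/$c.gp"
done

# Show how many genes ended up in each per-chromosome file.
wc -l "$workDir"/chr*.gp
```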
Then, putting all the chr*.mfa files together:
msa_view --aggregate `cat species.lst` mfa/*.mfa | sed s/"> "/">"/ > all.mfa
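The sed step in the command above removes the space that follows `>` in the FASTA header lines. A minimal sketch on an invented two-sequence FASTA shows the effect; the sequences are placeholders, not real alignment data.

```shell
# Hypothetical FASTA with a space after ">" in each header line.
fasta='> hg19
ACGT
> panTro2
ACGA'

# Same substitution as above: drop the space that follows ">".
cleaned=$(printf '%s\n' "$fasta" | sed 's/> />/')
echo "$cleaned"
```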
Using phyloFit to construct a tree model:
phyloFit --EM --precision MED --msa-format FASTA --subst-mod REV --tree tree.all.nh all.mfa
Adjust the frequencies back to a genome-wide GC percent of 0.41 with:
modFreqs phyloFit.mod 0.41 > vertebrate.mod
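One way to confirm the adjustment is to add up the C and G background frequencies in the resulting model. The sketch below assumes the model stores them on a `BACKGROUND:` line in A C G T order; the sample line and its values are invented for illustration.

```shell
# Hypothetical BACKGROUND line, assumed to be in A C G T order;
# the values are invented so that C + G = 0.41.
modLine='BACKGROUND: 0.295000 0.205000 0.205000 0.295000'

# Sum the C and G frequencies (fields 3 and 4, counting the label as field 1).
gc=$(printf '%s\n' "$modLine" | awk '$1 == "BACKGROUND:" {printf "%.2f", $3 + $4}')
echo "GC fraction: $gc"
```

Against a real model, the same awk program could be pointed at `vertebrate.mod` instead of the sample line.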
The resulting tree covers all 46 species. The same procedure was run for the primate and placental subsets, and then repeated for all three species sets using only chrX. The resulting six trees are listed below.
Trees without chrX
1. Primate only subset
(((((((hg19:0.006036,panTro2:0.006817):0.002265,gorGor1:0.008230):0.008764, ponAbe2:0.016911):0.012657,(rheMac2:0.006959,papHam1:0.006378):0.027277):0.019382, calJac1:0.064230):0.054029,tarSyr1:0.132418):0.018980, (micMur1:0.087093,otoGar1:0.129282):0.018980);
2. Placental only subset
(((((((((((hg19:0.006024,panTro2:0.006789):0.002262,gorGor1:0.008208):0.008765, ponAbe2:0.016837):0.012535,(rheMac2:0.006872, papHam1:0.006439):0.027303):0.019859,calJac1:0.063414):0.055057, tarSyr1:0.129930):0.009838,(micMur1:0.085989, otoGar1:0.128778):0.033658):0.012238,tupBel1:0.194945):0.004481, (((((mm9:0.086189,rn4:0.089247):0.195157,dipOrd1:0.208591):0.023521, cavPor3:0.217198):0.010584,speTri1:0.151978):0.024663,(oryCun2:0.114630, ochPri2:0.198507):0.094820):0.012067):0.019462,(((vicPac1:0.107714, (turTru1:0.063700,bosTau4:0.114890):0.023250):0.035925, ((equCab2:0.104886,(felCat3:0.103989,canFam2:0.108561):0.053212):0.005425, (myoLuc1:0.163463,pteVam1:0.131116):0.042793):0.004204):0.010077, (eriEur1:0.219045,sorAra1:0.286631):0.054487):0.020551):0.011417, (((loxAfr3:0.076308,proCap1:0.143785):0.026717,echTel1:0.221431):0.042789, (dasNov2:0.110105,choHof1:0.085867):0.045250):0.011417);
3. all 46 species
(((((((((((((((((hg19:0.006700,panTro2:0.006667):0.002250, gorGor1:0.008825):0.009680,ponAbe2:0.018318):0.014340, (rheMac2:0.007853,papHam1:0.007637):0.029618):0.021965, calJac1:0.066131):0.057590,tarSyr1:0.137823):0.011062,(micMur1:0.092749, otoGar1:0.129725):0.035463):0.015494,tupBel1:0.186203):0.004937, (((((mm9:0.084509,rn4:0.091589):0.197773,dipOrd1:0.211609):0.022992, cavPor3:0.225629):0.010150,speTri1:0.148468):0.025746,(oryCun2:0.114227, ochPri2:0.201069):0.101463):0.015313):0.020593,(((vicPac1:0.107275, (turTru1:0.064688,bosTau4:0.123592):0.025153):0.040335,((equCab2:0.109397, (felCat3:0.098612,canFam2:0.102458):0.049845):0.006219,(myoLuc1:0.142540, pteVam1:0.113399):0.033706):0.004508):0.011671,(eriEur1:0.221785, sorAra1:0.269562):0.056393):0.021227):0.023664,(((loxAfr3:0.082242, proCap1:0.155358):0.026990,echTel1:0.245936):0.049697, (dasNov2:0.116664,choHof1:0.096357):0.053145):0.006717):0.234728, (monDom5:0.125686,macEug1:0.122008):0.215100):0.071664, ornAna1:0.456592):0.109504,((galGal3:0.165536,taeGut1:0.171542):0.199223, anoCar1:0.489241):0.105143):0.172371,xenTro2:0.855573):0.311354, (((tetNig2:0.224159,fr2:0.203847):0.195181,(gasAcu1:0.316413, oryLat2:0.481970):0.059150):0.325640,danRer6:0.730752):0.147949):0.526688, petMar1:0.526688);
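Long Newick strings like these are easy to damage when copied by hand. The following sketch checks two common problems, unbalanced parentheses and a branch length that runs straight into an opening parenthesis without a comma. The function name `check_newick` and the toy trees are invented for illustration, and the check is far from a full Newick parser.

```shell
# Report unbalanced parentheses or a digit followed directly by "(".
check_newick() {
  opens=$(printf '%s' "$1" | tr -cd '(' | wc -c | tr -d ' ')
  closes=$(printf '%s' "$1" | tr -cd ')' | wc -c | tr -d ' ')
  if [ "$opens" -ne "$closes" ]; then
    echo "unbalanced"
  elif printf '%s' "$1" | grep -q '[0-9]('; then
    echo "missing comma"
  else
    echo "ok"
  fi
}

# Toy trees with invented names and lengths.
check_newick '((a:0.1,b:0.2):0.05,(c:0.3,d:0.4):0.05);'
check_newick '((a:0.1,b:0.2):0.05(c:0.3,d:0.4):0.05);'
```

The first call reports `ok` and the second reports `missing comma`.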
Trees on chrX only
1. primate only subset
(((((((hg19:0.003917,panTro2:0.005184):0.002146,gorGor1:0.008108):0.007057, ponAbe2:0.015569):0.013208,(rheMac2:0.004711, papHam1:0.004180):0.023970):0.018430,calJac1:0.058028):0.053927, tarSyr1:0.096237):0.019719,(micMur1:0.074162,otoGar1:0.118457):0.019719);
2. placental only subset
(((((((((((hg19:0.003913,panTro2:0.005197):0.002196,gorGor1:0.008068):0.006904, ponAbe2:0.015655):0.013435,(rheMac2:0.004704, papHam1:0.004228):0.023895):0.019027,calJac1:0.057101):0.054600, tarSyr1:0.096642):0.013924,(micMur1:0.074221, otoGar1:0.117211):0.029832):0.013436,tupBel1:0.153211):0.002109, (((((mm9:0.063891,rn4:0.066094):0.167668,dipOrd1:0.175669):0.023604, cavPor3:0.171594):0.005607,speTri1:0.125382):0.026739,(oryCun2:0.083723, ochPri2:0.168135):0.075730):0.008263):0.019786,(((vicPac1:0.081343, (turTru1:0.056118,bosTau4:0.102627):0.021578):0.029857,((equCab2:0.087934, (felCat3:0.097379,canFam2:0.091434):0.043427):0.006482,(myoLuc1:0.117475, pteVam1:0.106041):0.027660):0.003592):0.010126,(eriEur1:0.212782, sorAra1:0.234802):0.043917):0.021099):0.013238,(((loxAfr3:0.066021, proCap1:0.128897):0.023867,echTel1:0.212319):0.046738, (dasNov2:0.101972,choHof1:0.102076):0.046533):0.013238);
3. all 46 species
(((((((((((((((((hg19:0.003795,panTro2:0.005196):0.002735, gorGor1:0.008038):0.006805,ponAbe2:0.015616):0.013581, (rheMac2:0.004670,papHam1:0.004195):0.023755):0.019288, calJac1:0.056549):0.053796,tarSyr1:0.095597):0.014122, (micMur1:0.074107,otoGar1:0.115630):0.029705):0.012941, tupBel1:0.151971):0.002729,(((((mm9:0.063333,rn4:0.065446):0.163518, dipOrd1:0.172634):0.023201,cavPor3:0.169031):0.005188, speTri1:0.123749):0.026667,(oryCun2:0.083602, ochPri2:0.165502):0.074856):0.008279):0.018611, (((vicPac1:0.080894,(turTru1:0.056486,bosTau4:0.101570):0.021353):0.030026, ((equCab2:0.087517,(felCat3:0.097110,canFam2:0.090575):0.043182):0.006555, (myoLuc1:0.116436,pteVam1:0.105246):0.027654):0.003743):0.010755, (eriEur1:0.209272,sorAra1:0.229678):0.042690):0.021348):0.019835, (((loxAfr3:0.066090,proCap1:0.127321):0.023966,echTel1:0.208813):0.045208, (dasNov2:0.100868,choHof1:0.100549):0.045734):0.008854):0.264749, (monDom5:0.131898,macEug1:0.143442):0.220509):0.085972, ornAna1:0.483509):0.097702,((galGal3:0.177448,taeGut1:0.162788):0.246554, anoCar1:0.557249):0.117029):0.144329,xenTro2:0.880060):0.301475, (((tetNig2:0.229555,fr2:0.206494):0.159973,(gasAcu1:0.307725, oryLat2:0.476387):0.072242):0.286820,danRer6:0.792960):0.189983):0.438700, petMar1:0.438700);
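To compare terminal branch lengths between trees, the leaf entries can be extracted with grep. This sketch uses the chrX primate-only tree from above with the spaces removed; it relies on `grep -o`, which is available in GNU and BSD grep, and the variable names are illustrative.

```shell
# chrX primate-only tree copied from above (spaces removed).
tree='(((((((hg19:0.003917,panTro2:0.005184):0.002146,gorGor1:0.008108):0.007057,ponAbe2:0.015569):0.013208,(rheMac2:0.004711,papHam1:0.004180):0.023970):0.018430,calJac1:0.058028):0.053927,tarSyr1:0.096237):0.019719,(micMur1:0.074162,otoGar1:0.118457):0.019719);'

# Leaf entries look like name:length. Internal branch lengths follow ")"
# and so do not match a pattern that must start with a letter.
leaves=$(printf '%s\n' "$tree" | grep -oE '[A-Za-z][A-Za-z0-9]*:[0-9.]+' | tr ':' '\t')

# Rank the leaves from longest to shortest terminal branch.
ranked=$(printf '%s\n' "$leaves" | sort -k2,2nr)
printf '%s\n' "$ranked"
```

Running the same extraction on the matching tree without chrX would allow a leaf-by-leaf comparison of the two sets of branch lengths.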