Opsin evolution: ancestral introns

From genomewiki
Revision as of 17:40, 3 January 2008 by Tomemerald (talk | contribs)

Introns within coding regions of opsin genes can potentially provide an independent (or supplemental) means of organizing known opsins and classifying new ones. This becomes especially important as the universe of opsins has expanded to include rhabdomeric opsins within deuterostomes, ciliary opsins within protostomes, and novel opsins from cnidarians, which might otherwise be difficult to separate from rhodopsin-superfamily non-opsins and generic GPCRs.

Changes in intron pattern constitute a category of rare genetic event (RGE). Other RGEs relevant to opsins include coding indels (insertion or deletion of amino acids) and gene order rearrangements along a chromosome. RGEs are characters that can be used in gene tree analyses and reconstruction of ancestral states. Each type of RGE has its own intrinsic time scale, which makes it informative about aspects of opsin evolution over comparable time frames. Intron patterns are extremely conserved: they stay the same, and so are uninformative, over mammalian and even vertebrate time scales, but they are appropriate to Eumetazoan ones. Indels are quite conserved but informative within opsins over shorter periods. Gene order is only moderately conserved within Bilateria; more commonly it is completely washed out. All RGEs are subject to some degree of homoplasy (multiple independent origins).

The intron pattern consists of two parameters, location and phase (codon splitting across two exons):

Location is easy to specify homologically because opsins contain numerous invariant or near-invariant residues sprinkled along their length that unambiguously anchor alignments. The main potential difficulty occurs near an indel (insertion or deletion). However indels are very rarely fixed in the core region of opsins because the transmembrane helices (3.4 residues per turn) do not tolerate that disruption of their bundle associations or retinal tuning or membrane spanning lengths and because the cytoplasmic and extracellular loop regions are generally too short or otherwise significantly engaged in conserved interactions with other signaling molecules. Indels in the amino and carboxy termini, which in some opsin classes are extended and poorly conserved, are exceptions to this.

It's quite possible for more or less the same intron location to arise repeatedly (convergent evolution), especially when 'same' is slightly muddled by indel ambiguity. However phase determination can often disambiguate this issue. Here we must briefly review MolBio 101 because many opsin papers exhibit total unawareness of the phase concept:

Three possibilities exist for intron phase. In phase 00, the splice donor (GT in all known opsins) follows immediately after the last triplet codon of an exon and the splice acceptor (AG in all known opsins) immediately precedes the first codon of the next exon. In phase 12, one extra base pair follows the last complete triplet codon and precedes the start of the splice donor; two extra base pairs (which fill out the split codon and preserve reading frame) follow the splice acceptor but precede the next complete codon. In phase 21 introns, the overhang is 2 bp at the donor complemented by a 1 bp overhang at the acceptor, which together form a new 3 bp triplet codon.
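The three phase types can be illustrated with a minimal Python sketch that derives the phase of each intron from the coding lengths of consecutive exons. The function name and the exon lengths are hypothetical and chosen only for illustration; the 00, 12, 21 labels follow the notation used on this page.

```python
def intron_phases(exon_cds_lengths):
    """Return the phase label ('00', '12' or '21') of each intron.

    exon_cds_lengths: coding lengths in bp of consecutive exons (UTRs excluded).
    """
    labels = {0: "00", 1: "12", 2: "21"}
    phases = []
    coding_bp = 0
    for length in exon_cds_lengths[:-1]:   # no intron follows the last exon
        coding_bp += length
        donor_overhang = coding_bp % 3     # bases left over after the last full codon
        phases.append(labels[donor_overhang])
    return phases


# Hypothetical gene with coding exons of 144, 149, 133 and 174 bp
print(intron_phases([144, 149, 133, 174]))   # ['00', '21', '00']
```

The donor overhang alone fixes the label, because the acceptor overhang must supply whatever is needed to complete the split codon.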

Opsins phaseTypes.png

>MEL1_homSap Homo sapiens (human) Gq  483 NM_033282 melanopsin OPN4                                               
0 MNPPSGPRVPPSPTQEPSCMATPAPPSWWDSSQSSISSLGRLPSISPT 0 
0 APGTWAAAWVPLPTVDVPDHAHYTLGTVILLVGLTGMLGNLTVIYTFCR 2
1 SRSLRTPANMFIINLAVSDFLMSFTQAPVFFTSSLYKQWLFGET 1
...

It's useful to indicate phase information within the fasta representation of a sequence. That's done here by line breaks between exons with associated phase overhangs shown by numbers. These numbers are ignored by the vast majority of web software tools, so the extra characters do not need to be purged before, say, a blast query. By convention, the initial methionine is preceded by a 0 even though it is almost always part of a larger 5' UTR. Similarly the stop codon asterisk is followed by a 0 even though it is almost always part of a longer 3' UTR. One last convention: the 'extra' amino acid formed by 12 or 21 introns is assigned to the 2 side of the exon break. It's often given incorrectly in Blast output because that tool is not aware of exon breaks and often extends alignments past them into a translated intron.
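For tools that do not tolerate the extra characters, the phase-annotated format can be read back with a few lines of Python. This is only a sketch: the function names are hypothetical, and the record is the truncated melanopsin example shown above (first three exons only).

```python
import re

# One exon per line: left overhang, peptide, right overhang
EXON_LINE = re.compile(r"^\s*([012])\s+([A-Za-z*]+)\s+([012])\s*$")

RECORD = """>MEL1_homSap Homo sapiens (human) Gq  483 NM_033282 melanopsin OPN4
0 MNPPSGPRVPPSPTQEPSCMATPAPPSWWDSSQSSISSLGRLPSISPT 0
0 APGTWAAAWVPLPTVDVPDHAHYTLGTVILLVGLTGMLGNLTVIYTFCR 2
1 SRSLRTPANMFIINLAVSDFLMSFTQAPVFFTSSLYKQWLFGET 1
...
"""


def parse_phased_fasta(text):
    """Return (header, [(left_overhang, peptide, right_overhang), ...])."""
    header, exons = None, []
    for line in text.splitlines():
        if line.startswith(">"):
            header = line[1:].strip()
            continue
        match = EXON_LINE.match(line)
        if match:                      # '...' and blank lines are skipped
            left, peptide, right = match.groups()
            exons.append((int(left), peptide, int(right)))
    return header, exons


def junctions_consistent(exons):
    """Overhangs flanking an intron must sum to 0 (phase 00) or 3 (12 or 21)."""
    return all(a[2] + b[0] in (0, 3) for a, b in zip(exons, exons[1:]))


def plain_protein(exons):
    """Concatenate exon peptides, dropping the phase numbers."""
    return "".join(peptide for _, peptide, _ in exons)


header, exons = parse_phased_fasta(RECORD)
print(header.split()[0], len(exons), junctions_consistent(exons))
```

The junction check catches a common annotation slip, where the overhang on one side of an exon break does not complement the overhang on the other.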

Normally, phase is determined by blastn alignment of a transcript (already processed by the cell to remove introns) against genomic sequence. If genomic sequence is not available, the transcript can be intronated by comparison to a phylogenetically close orthologous gene.

For example, full length transcripts were sequenced for various opsins from the amphioxus Branchiostoma belcheri. However the genome project covers not that species but Branchiostoma floridae. So the B. belcheri proteins need to be placed within the B. floridae genome, which is most conveniently done using Blat on the UCSC genome browser. Exon boundaries are then read off from the alignment details page, using 3-frame translation in a second web browser tab at Expasy to ensure smooth reading frame joins and uBlastx against the opsin collection in a third to monitor alignment. That process accurately intronates the presumptive ortholog in B. floridae. Finally B. belcheri is back-intronated by alignment. That amounts to a testable prediction of intron pattern in that species.

This won't be accurate if B. belcheri has gained or lost introns since the two species diverged. However, outside of certain rogue species, introns typically have a "half-life" of perhaps 5 billion years of branch length, many multiples of the divergence time here. (They're much more conserved than amino acid sequence.) Consequently the inferred gene model will be correct 99% of the time. However every sequence in the Opsin Classifier that originated in a genome project was independently intronated within that project, never by homology. And some species without genome projects (like Platynereis) have the occasional large contig containing an opsin.
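To get a feel for these numbers, the "half-life" can be read as simple exponential decay. That reading is an assumption made here for illustration, not something established above; it also ignores intron gain and treats each intron independently. The function names are hypothetical.

```python
import math


def retention_probability(branch_length_myr, half_life_myr=5000.0):
    """Chance that one intron survives a branch length, under exponential decay."""
    return 0.5 ** (branch_length_myr / half_life_myr)


def branch_length_for(retention, half_life_myr=5000.0):
    """Branch length (Myr) at which retention falls to the given probability."""
    return half_life_myr * math.log2(1.0 / retention)


# Branch length at which a single intron is still retained with 99% probability
print(round(branch_length_for(0.99), 1))   # 72.5
```

Under these assumptions a per-intron retention of 99% corresponds to roughly 72 million years of branch length; genes with several introns, or pairs of species separated by more branch length, would fare proportionally worse.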

In practise, it is easy to make small mistakes in assigning phases to genes, especially when the gene has low percent identity to the alignment query. That's because GT and AG are common dinucleotides and sometimes multiple options seem viable (that is, they preserve reading frame). Of course, there's much more to splice sites than just two dinucleotides, so sometimes those additional properties are used to sort out the possibilities (gene prediction software tools carry sophisticated versions of these rules). Usually just one possibility works consistently across the comparative genomics spectrum within a given orthology class of opsins.
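The ambiguity can be made concrete with a toy example. The sketch below lists every way of cutting a single intron out of a genomic fragment so that the remainder reproduces a known transcript, then flags the cuts that obey GT-AG. The sequences and the function name are invented for illustration, one intron is assumed, and the fragment is assumed to start on the first base of a codon so that phase can be read from the donor offset.

```python
PHASE_LABELS = {0: "00", 1: "12", 2: "21"}

# Toy data: exon ATGGCAG, intron GTAAGTTTTCCAG, exon GTTCTAA
GENOMIC = "ATGGCAG" + "GTAAGTTTTCCAG" + "GTTCTAA"
CDNA = "ATGGCAG" + "GTTCTAA"


def candidate_introns(genomic, cdna):
    """List (donor, acceptor, phase, is_gt_ag) cuts of genomic that yield cdna.

    The intron is genomic[donor:acceptor]; exactly one intron is assumed.
    """
    intron_len = len(genomic) - len(cdna)
    hits = []
    if intron_len <= 0:
        return hits
    for donor in range(len(cdna) + 1):
        acceptor = donor + intron_len
        if genomic[:donor] + genomic[acceptor:] == cdna:
            intron = genomic[donor:acceptor]
            gt_ag = intron.startswith("GT") and intron.endswith("AG")
            hits.append((donor, acceptor, PHASE_LABELS[donor % 3], gt_ag))
    return hits


for hit in candidate_introns(GENOMIC, CDNA):
    print(hit)
```

Here six different cuts reproduce the transcript, with three different phases among them, but only one starts with GT and ends with AG. In real genes more than one cut can satisfy GT-AG, which is where comparison across an orthology class becomes useful.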

It turns out that in post-lamprey deuterostomes not a single case of intron gain or loss can be documented in any of the 14 gene trees (recall each sequence set is maximally phylogenetically dispersed). Since each tree contains several billions of years of branch length, the overall event rate for opsins in this clade is lower than 1 per 50 billion years of evolutionary clock time. That's not atypical -- in fact intron gain and loss is known to be very infrequent (but not zero) across the entire vertebrate proteome.

However other issues must be considered, such as alternative splicing, intron sliding, NAGNAG ambiguity, asymmetric relative frequencies of phase types, mechanisms and relative frequency of gains and losses, hotspots of predisposition, likelihood estimates of convergent evolution, migration out of GT-AG to minor splice forms and so forth. Alternative splicing is irrelevant to opsins because of their intolerant structure -- alternative transcripts in all likelihood are just the usual transcriptome noise. Intron sliding is a nutty literature concept repeatedly debunked. Acceptor ambiguity is real but not seen yet in opsins. Phase 0 is disproportionate genomewide, often half of all coding phases.

There is a general predisposition to enhanced intron loss distally (3') due to recombination with processed mRNA; mechanisms of intron gain are a bit cryptic and -- like everything else -- cannot be assumed the same across all of Metazoa. General trends need not be applicable to this specific gene family. Hotspots may have relevance to opsins but apply to inaccessible ancestral sequence rather than contemporary forms.

Keeping all this in mind, I intronated nearly 200 phylogenetically dispersed opsins in the Opsin Classifier using direct genomic comparison when possible and homology annotation transfer when not. The error rate is not zero; anomalies needing revisiting are concentrated in 12 and 21 introns and in opsins without close homologs. When a gene model is fragmentary, only half the splice site may be available so that validation is lacking. In some assemblies, there seem to be sequencing errors that don't allow introns where they are required to avoid premature stop codons.

The literature on intron antiquity is hopelessly muddled due to intemperate speculation in the pre-informatics era of the previous century. Today we know that the vast majority of (say) human introns are very old, dating back to early unicellular eukaryotes (eg introns in human SUMF1 are shared with diatom). Consequently they were well entrenched at the time of Eumetazoan emergence and experienced little gain or loss in any but the rogue lineages (notably sea urchins, tunicates, nematodes, fruit flies). We expect most deuterostome introns to be present in at least some species of ecdysozoa, lophotrochozoa, and cnidaria; this has been validated recently in the case of Nematostella.

Let's consider first the intron situation in ciliary opsins and whether we can unambiguously determine the ancestral intron pattern at the time of Urbilatera or even Urmetazoa. We'll use common sense parsimony rather than maximum likelihood methods because these simply bury their subjectivity within rarely discussed model assumptions that aren't likely to consistently hold across this vast time and clade scale -- higher taxa sampling density on long branches is the best way to test and improve ancestral intron prediction.

Two detailed examples in the Annotation Tricks section explain how the Opsin Classifier sequence collection can be used in conjunction with uBlast to determine whether exon breaks of a given opsin agree with another. Let's look now at the full set of all known ciliary opsins, knowing in advance that those in protostomes will ultimately arbitrate the deuterostome situation as outgroup until such time as definitive, relevantly intronated cnidarian opsins are located.

It's apparent that all 109 deuterostome ciliary opsins in our current 10 orthology class collection (namely RHO1 RHO2 SWS2 SWS1 LWS PIN VAOP PPIN PARIE ENCEPH, with 4 classes RGR PER NEUR and MEL held out as intron pattern specificity controls) exhibit a single conserved intronation pattern across the human to echinoderm time scale. There are some significant events, all of which predate lamprey divergence, such as the extra intron in LWS opsins. Here the gene tree structure can provide outgroups capable of distinguishing intron gain from intron loss (LWS experienced a gain). This gene tree is reliably enough known from many publications or can be quickly generated with common software such as ClustalW from the vastly expanded collection in the Opsin Classifier.

In Lophotrochozoa, the situation is much more limited with just two Platynereis opsins, of which one is well-characterized by experiment and the other is directly intronatable. Perhaps with targeted sequencing effort, new cDNA, additional bioinformatics, or more complete genomes, homologs will emerge in Capitella, Helobdella, Aplysia, Lottia, Schmidtea, or Schistosoma. These species already provide additional intronated opsins of melanopsin class.

In Ecdysozoa, 3 ciliary opsins had been previously established in Anopheles and Apis whereas they had been completely ruled out in the (truly finished) Drosophila genome. The list of genomic species with the 'same' ciliary opsin can be readily extended to Culex, Aedes, Tribolium, Bombyx, and Daphnia. However gene loss seems to have happened repeatedly (or current coverage is insufficient); the gene cannot be found in Nasonia, Ixodes, and others.

To proceed with the actual work of ancestral intron determination, it's helpful to first reduce the number of sequences to a smaller set of proxy sequences that retain all the information but less of the clutter. That is, it's nice to know that introns in RHO1 are exactly conserved in location and phase in the phylogenetically diverse set of 14 sequences spanning human to lamprey, but once that has been determined and the task shifts to comparing RHO1 introns with other opsins, a single representative RHO1 sequence suffices. (The set of 14 RHO1 in the reference collection is already an immense reduction of the hundreds available in over-sampled fish and placental mammals.) That proxy sequence can also carry coding indel and synteny information.
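The reduction to proxies might be sketched as follows. The function name is hypothetical, the class of a sequence is taken to be the part of its name before the underscore (an assumption about naming), and the few entries shown are copied from the ciliary opsin table at the end of this page.

```python
RHO_PATTERN = ((88, "12"), (152, "21"), (206, "00"), (307, "00"))
PIN_PATTERN = ((88, "12"), (155, "12"), (206, "00"), (307, "00"))

ENTRIES = [
    ("RHO1_homSa", RHO_PATTERN),
    ("RHO1_monDo", RHO_PATTERN),
    ("RHO1_petMa", RHO_PATTERN),
    ("PIN_galGal", PIN_PATTERN),
    ("PIN_xenTro", PIN_PATTERN),
]


def collapse_to_proxies(entries):
    """Keep the first sequence seen for each (class, intron pattern) pair."""
    proxies = {}
    for name, pattern in entries:
        key = (name.split("_")[0], tuple(pattern))
        record = proxies.setdefault(key, {"proxy": name, "members": 0})
        record["members"] += 1     # remember how many sequences the proxy stands for
    return proxies


for (opsin_class, pattern), record in collapse_to_proxies(ENTRIES).items():
    print(opsin_class, record["proxy"], record["members"], pattern)
```

Keying on the pattern as well as the class means that a class with any internal variation would yield more than one proxy, so nothing is silently discarded.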

The proxy sequence can be somewhat optimized to allow more reliable homological comparisons of intron positions to other opsins, including remotely related proteins. Various options exist, such as ancestral reconstruction, consensus sequence, profile sequence, basal diverging species sequence (lamprey), or single-species-consistent set (frog would work). However many ciliary opsin homology classes first surface at lamprey, meaning no earlier ancestral sequence could be reconstructed. The ultimate accuracy of ancestral sequences is largely not validatable; errors can arise in ectopic (co-evolving but non-adjacent) amino acids.

Arthropod rhabdomeric opsins are another arena where 'too many' sequences have been determined in insects without genomes. Here too the gene tree is well established and a reliable enough ancestral proxy sequence feasible. That's not the case for ciliary opsins in lophotrochozoa, cnidaria, and early diverging deuterostomes; here the full sparse set of individual sequences must be retained. This mixes noisy contemporary opsin sequences with filtered ancestral sequences -- not necessarily a problem but something to keep in view at the interpretative stage.

However it turns out that ciliary, rhabdomeric, and 'retinal isomerase' opsins in Bilateria may have completely disjoint sets of introns -- any common ground is far more deeply ancestral. Hence they can be considered separately. Further, no significant variation whatsoever occurs in intron pattern within any orthology class (these were established independently), though intron pattern alone is not sufficient to distinguish all orthology classes.

Opsin ancient cil introns.png

Ciliary opsins, concentrated in deuterostomes but with two homology classes sometimes represented in protostomes, are a favorable situation because gain and loss of introns is exceedingly rare per billion years of branch length (which minimizes homoplasy issues). It is most parsimonious to assume the ancestral Urbilateran ciliary opsin had exactly 4 exons, separated by 3 introns. These introns are denoted in the summary table as 88 12 FATLG, 206 00 FTVKE, and 307 00 MMNKQ. (The notation gives the position coordinate downstream of the conserved asparagine at position 55 of bovine rhodopsin in a large gapped alignment of 230 opsins of all classes, with intron phase given as 12 etc. and five residues provided for orientation.)

Note the ciliary opsin from lophotrochozoan Platynereis plays a key role in establishing the third intron which was evidently lost in the stem of ecdysozoa (insects plus crustacean). However, the story is not so simple because most opsin classes also contain one or two 'sporadic' introns (of very limited gene tree distribution and so specific to later developments in that orthology class). The only exception here is 152 21 which cuts across all rod and cone opsin classes. There is some positional uncertainty in regions of heavy gapping (eg 155 12 and 159 12 difference could be a gapping effect) and coincidental convergences are not impossible (the 152 21 intron in the heavily altered PPINa_ciona may be one).

In this view, it takes 10 intron gains and 1 loss to account for the data, rather at odds with received wisdom about those relative rates. That number could be reduced by admitting a degree of positional or phase drift in the interior region that would lump events; however that mechanism lacks support outside this gene family. Alternately, a hotspot existed here in an ancestral stem that was active during the timespan of gene duplication. A stem species may have had a higher rate of intron gain and loss than we see in most species today (rogue species such as drosophila and ciona are hyperactive in this way). The exact sequence of events may never be resolved because of an insufficient number of extant species from the amphioxus-lamprey stem.

The intron data in ciliary opsins, as in so many genes, conflicts with 1R and 2R whole genome duplication scenarios in early deuterostomes. That cannot have played any role in any post-encephalopsin, post-amphioxus ciliary opsins which instead are simply sequential nestings of (intron-preserving) segmental duplications. The data also conflict with sweeping theories of ectopic propagation of established visual systems.

We're now in a far better position to analyze putative ciliary opsins in cnidaria and sponges (which might be seriously diverged in primary sequence). Nematostella in particular is quite conservative overall in terms of ancestral intron retention (minimal gain and loss relative to human). Opsins could be an exceptional gene family in that regard, but that primarily makes sense for gene copies derived initially by retropositioning (as in olfactory rhodopsins), with all old introns lost and perhaps 1-2 new ones subsequently gained. A weak blast match to authentic opsins and proven expression in a photoreception cell is insufficient to establish a given candidate gene as an opsin: a slow-evolving generic GPCR might also give similar alignment quality and other signaling processes take place even within a specialized cell type. Without diagnostic residues, appropriate introns, and informative indels, the case could be very circumstantial.

Ancestral Introns in Ciliary Opsins: all data

                     * ** **  * *** **         * *** *  * *** *
RHO1_homSa    . . .  1 88 12  2 152 21  . . .  3 206 0  4 307 0
RHO1_monDo    . . .  1 88 12  2 152 21  . . .  3 206 0  4 307 0
RHO1_bosTa    . . .  1 88 12  2 152 21  . . .  3 206 0  4 307 0
RHO1_ornAn    . . .  1 88 12  2 152 21  . . .  3 206 0  4 307 0
RHO1_galGa    . . .  1 88 12  2 152 21  . . .  3 206 0  4 307 0
RHO1_xenTr    . . .  1 88 12  2 152 21  . . .  3 206 0  4 307 0
RHO1_neoFo    . . .  1 88 12  2 152 21  . . .  3 206 0  4 307 0
RHO1_latCh    . . .  1 88 12  2 152 21  . . .  3 206 0  4 307 0
RHO1_anoCa    . . .  1 88 12  2 152 21  . . .  3 206 0  4 307 0
RHO1_petMa    . . .  1 88 12  2 152 21  . . .  3 206 0  4 307 0
RHO1_letJa    . . .  1 88 12  2 152 21  . . .  3 206 0  4 307 0
RHO1_geoAu    . . .  1 88 12  2 152 21  . . .  3 206 0  4 307 0
RHO1_leuEr    . . .  1 88 12  2 152 21  . . .  3 206 0  4 307 0
RHO1_calMi    . . .  1 88 12  2 152 21  . . .  3 206 0  4 307 0
RHO1_takRu    . . .  1 88 12  2 152 21  . . .  3 206 0  4 307 0
RHO2_galGa    . . .  1 88 12  2 152 21  . . .  3 206 0  4 307 0
RHO2_anoCa    . . .  1 88 12  2 152 21  . . .  3 206 0  4 307 0
RHO2_neoFo    . . .  1 88 12  2 152 21  . . .  3 206 0  4 307 0
RHO2_latCh    . . .  1 88 12  2 152 21  . . .  3 206 0  4 307 0
RHO2_gekGe    . . .  1 88 12  2 152 21  . . .  3 206 0  4 307 0
RHO2_geoAu    . . .  1 88 12  2 152 21  . . .  3 206 0  4 307 0
SWS2_ornAn    . . .  1 88 12  2 152 21  . . .  3 206 0  4 307 0
SWS2_utaSt    . . .  1 88 12  2 152 21  . . .  3 206 0  4 307 0
SWS2_taeGu    . . .  1 88 12  2 152 21  . . .  3 206 0  4 307 0
SWS2_neoFo    . . .  1 88 12  2 152 21  . . .  3 206 0  4 307 0
SWS2_galGa    . . .  1 88 12  2 152 21  . . .  3 206 0  4 307 0
SWS2_xenTr    . . .  1 88 12  2 152 21  . . .  3 206 0  4 307 0
SWS2_geoAu    . . .  1 88 12  2 152 21  . . .  3 206 0  4 307 0
SWS2_takRu    . . .  1 88 12  2 152 21  . . .  3 206 0  4 307 0
SWS2_gasAc    . . .  1 88 12  2 152 21  . . .  3 206 0  4 307 0
SWS1_homSa    . . .  1 88 12  2 152 21  . . .  3 206 0  4 307 0
SWS1_monDo    . . .  1 88 12  2 152 21  . . .  3 206 0  4 307 0
SWS1_anoCa    . . .  1 88 12  2 152 21  . . .  3 206 0  4 307 0
SWS1_utaSt    . . .  1 88 12  2 152 21  . . .  3 206 0  4 307 0
SWS1_taeGu    . . .  1 88 12  2 152 21  . . .  3 206 0  4 307 0
SWS1_galGa    . . .  1 88 12  2 152 21  . . .  3 206 0  4 307 0
SWS1_neoFo    . . .  1 88 12  2 152 21  . . .  3 206 0  4 307 0
SWS1_xenLa    . . .  1 88 12  2 152 21  . . .  3 206 0  4 307 0
SWS1_geoAu    . . .  1 88 12  2 152 21  . . .  3 206 0  4 307 0
SWS1_danRe    . . .  1 88 12  2 152 21  . . .  3 206 0  4 307 0
SWS1_oryLa    . . .  1 88 12  2 152 21  . . .  3 206 0  4 307 0
LWS_homSap 1 -15 12  2 88 12  3 152 21  . . .  4 206 0  5 307 0
LWS_monDom 1 -15 12  2 88 12  3 152 21  . . .  4 206 0  5 307 0
LWS_ornAna 1 -15 12  2 88 12  3 152 21  . . .  4 206 0  5 307 0
LWS_galGal 1 -15 12  2 88 12  3 152 21  . . .  4 206 0  5 307 0
LWS_anoCar 1 -15 12  2 88 12  3 152 21  . . .  4 206 0  5 307 0
LWS_xenTro 1 -15 12  2 88 12  3 152 21  . . .  4 206 0  5 307 0
LWS_takRub 1 -15 12  2 88 12  3 152 21  . . .  4 206 0  5 307 0
LWS_gasAcu 1 -15 12  2 88 12  3 152 21  . . .  4 206 0  5 307 0
LWS_petMar 1 -15 12  2 88 12  3 152 21  . . .  4 206 0  5 307 0
LWS_letJap 1 -15 12  2 88 12  3 152 21  . . .  4 206 0  5 307 0
LWS_geoAus 1 -15 12  2 88 12  3 152 21  . . .  4 206 0  5 307 0
LWS_neoFor 1 -15 12  2 88 12  3 152 21  . . .  4 206 0  5 307 0
PIN_galGal    . . .  1 88 12  . . .   2 155 12 3 206 0  4 307 0
PIN_utaSta    . . .  1 88 12  . . .   2 155 12 3 206 0  4 307 0
PIN_podSic    . . .  1 88 12  . . .   2 155 12 3 206 0  4 307 0
PIN_pheMad    . . .  1 88 12  . . .   2 155 12 3 206 0  4 307 0
PIN_xenTro    . . .  1 88 12  . . .   2 155 12 3 206 0  4 307 0
PIN_bufJap    . . .  1 88 12  . . .   2 155 12 3 206 0  4 307 0
VAOP_galGa    . . .  1 88 12  . . .   2 165 21 3 206 0  4 307 0
VAOP_anoCa    . . .  1 88 12  . . .   2 165 21 3 206 0  4 307 0
VAOP_xenTr    . . .  1 88 12  . . .   2 165 21 3 206 0  4 307 0
VAOP_danRe    . . .  1 88 12  . . .   2 165 21 3 206 0  4 307 0
VAOP_rutRu    . . .  1 88 12  . . .   2 165 21 3 206 0  4 307 0
VAOP_takRu    . . .  1 88 12  . . .   2 165 21 3 206 0  4 307 0
VAOP_petMa    . . .  1 88 12  . . .   2 165 21 3 206 0  4 307 0
PPIN_anoCa    . . .  1 88 12  . . .     . . .  2 206 0  3 307 0
PPIN_xenTr    . . .  1 88 12  . . .     . . .  2 206 0  3 307 0
PPIN_petMa    . . .  1 88 12  . . .     . . .  2 206 0  3 307 0
PPIN_letJa    . . .  1 88 12  . . .     . . .  2 206 0  3 307 0
PPIN_ictPu    . . .  1 88 12  . . .     . . .  2 206 0  3 307 0
PPIN_oncMy    . . .  1 88 12  . . .     . . .  2 206 0  3 307 0
PPIN_danRe    . . .  1 88 12  . . .     . . .  2 206 0  3 307 0
PPINa_cioI 1 31 0    2 88 12  3 152 21  . . .  5 206 0  7 307 0
PPINa_cioS 1 31 0    2 88 12  3 152 21  . . .  5 206 0  7 307 0
PPINb_cioI 1 31 0    2 88 12  3 152 21  . . .  5 206 0  7 307 0
PPINb_cioS 1 31 0    2 88 12  3 152 21  . . .  5 206 0  7 307 0
PARIE_utaS    . . .  1 88 12  . . .     . . .  2 206 0  3 307 0
PARIE_anoC    . . .  1 88 12  . . .     . . .  2 206 0  3 307 0
PARIE_xenT    . . .  1 88 12  . . .     . . .  2 206 0  3 307 0
PARIE_takR    . . .  1 88 12  . . .     . . .  2 206 0  3 307 0
PARIE_gasA    . . .  1 88 12  . . .     . . .  2 206 0  3 307 0
PARIE_danR    . . .  1 88 12  . . .     . . .  2 206 0  3 307 0
CILI2_plaD    . . .  1 88 12  2 128 12  . . .  3 206 0  4 307 0
CILI1_plaD    . . .  1 88 12  2 128 12  . . .  3 206 0  4 307 0
PIN_stoPur    . . .  1 88 12  . . .   2 159 12 3 206 0  4 307 0
ENCEPH_hom    . . .  1 88 12  . . .     . . .  2 206 0  3 307 0
ENCEPH_mon    . . .  1 88 12  . . .     . . .  2 206 0  3 307 0
ENCEPH_gal    . . .  1 88 12  . . .     . . .  2 206 0  3 307 0
ENCEPH_ano    . . .  1 88 12  . . .     . . .  2 206 0  3 307 0
ENCEPH_gas    . . .  1 88 12  . . .     . . .  2 206 0  3 307 0
ENCEPH_xen    . . .  1 88 12  . . .     . . .  2 206 0  3 307 0
ENCEPH4a_t    . . .  1 88 12  . . .     . . .  2 206 0  3 307 0
ENCEPH4b_t    . . .  1 88 12  . . .     . . .  2 206 0  3 307 0
ENCEPH4_br    . . .  1 88 12  . . .     . . .  2 206 0  3 307 0
ENCEPH5_br    . . .  1 88 12  . . .     . . .  2 206 0  3 307 0
ENCEPH1_an 1 32 0    2 88 12  . . .   3 160 21 4 206 0  . . .
ENCEPH2_an 1 32 0    2 88 12  . . .   3 160 21 4 206 0  . . .
ENCEPH_aed 1 32 0    2 88 12  . . .   3 160 21 4 206 0  . . .
ENCEPH_cul 1 32 0    2 88 12  . . .   3 160 21 4 206 0  . . .
ENCEPH_tri 1 32 0    2 88 12  . . .   3 160 21 4 206 0  . . .
ENCEPH_bom 1 32 0    2 88 12  . . .   3 160 21 4 206 0  . . .
ENCEPH_dap 1 32 0    2 88 12  . . .   3 160 21 4 206 0  . . .
ENCEPH_api 1 47 0    2 80 21  3 122 21         4 233 12 5 272 0
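Rows of this table can be read mechanically. The following Python sketch parses a row into (position, phase) pairs and intersects the introns of several rows; the function names are hypothetical, the sample rows are copied from the table, and the table's single-digit phase 0 is normalized to the 00 used in the text.

```python
ROWS = [
    "RHO1_homSa    . . .  1 88 12  2 152 21  . . .  3 206 0  4 307 0",
    "LWS_homSap 1 -15 12  2 88 12  3 152 21  . . .  4 206 0  5 307 0",
    "PIN_galGal    . . .  1 88 12  . . .   2 155 12 3 206 0  4 307 0",
    "CILI1_plaD    . . .  1 88 12  2 128 12  . . .  3 206 0  4 307 0",
    "ENCEPH1_an 1 32 0    2 88 12  . . .   3 160 21 4 206 0  . . .",
]


def parse_intron_row(row):
    """Parse one table row into (name, [(position, phase), ...])."""
    tokens = row.split()
    name, fields = tokens[0], tokens[1:]
    introns = []
    for i in range(0, len(fields), 3):     # every column is a group of 3 tokens
        first, position, phase = fields[i:i + 3]
        if first == ".":                   # '. . .' marks an absent intron
            continue
        introns.append((int(position), "00" if phase == "0" else phase))
    return name, introns


def shared_introns(rows):
    """Introns (position and phase) present in every given row."""
    parsed = [set(parse_intron_row(row)[1]) for row in rows]
    return sorted(set.intersection(*parsed))


print(shared_introns(ROWS))        # [(88, '12'), (206, '00')]
print(shared_introns(ROWS[:-1]))   # [(88, '12'), (206, '00'), (307, '00')]
```

Dropping the insect encephalopsin row restores 307 00 to the shared set, consistent with the loss of that intron in the stem of ecdysozoa described above.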


(to be continued after finishing the ancestral gene reconstructions)