Opsin evolution: ancestral sequences: Difference between revisions

From genomewiki
Revision as of 15:37, 20 December 2007

Reconstruction of ancestral genes (indeed of whole genomes) is useful in a variety of contexts. It is widely done in opsins to reconstruct historical spectral sensitivities, but our purpose here is primarily to reduce the excessive number of available opsin sequences to representative ones that still carry all the information of the opsin class without the idiosyncrasies that might have developed in particular clades. An ancestral ciliary opsin sequence at the agnathan divergence node takes away the subsequent 500 million years of sequence divergence. Suppose the same is done for rhabdomeric opsins. Comparing these to an uncharacterized extant (contemporary) lophotrochozoan or cnidarian opsin greatly sharpens the alignment, which previously involved a billion years of round-trip evolutionary divergence. It further facilitates comparison of diagnostic signature residues and patches and of rare genetic events such as intron gain or loss.

These considerations are very important in opsins, which are embedded in the largest and most complex of all gene families, the GPCRs, because the critical events in the evolutionary origin of the eye are quite old, certainly predating the Cambrian. Opsin sequences are well conserved (less so than histones or ribosomal proteins, but far more than the median protein), yet over the time scales involved the percent identity has dropped into the unreliable Blast twilight zone (below 30%), where a faster-evolving opsin might be confused with a slower-evolving GPCR not involved in photoreception. Ancestral sequences thus greatly improve the placement of opsins within their correct homology class.
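The twilight-zone criterion can be made concrete with a small percent-identity helper. This is only a sketch: the function names are hypothetical, the input is assumed to be two rows of an existing pairwise alignment with `-` as the gap character, and identity is counted over ungapped columns only (other conventions exist and give somewhat different numbers).

```python
TWILIGHT_THRESHOLD = 30.0  # percent identity; the "below 30%" figure quoted in the text


def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # skip gapped columns (an assumption of this sketch)
        compared += 1
        if a.upper() == b.upper():
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def in_twilight_zone(aligned_a: str, aligned_b: str) -> bool:
    """True when the pair falls below the 30% identity threshold."""
    return percent_identity(aligned_a, aligned_b) < TWILIGHT_THRESHOLD
```

A pair that scores under the threshold is not thereby shown to be unrelated; it only means that the alignment by itself is weak evidence of homology class.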

However the utility really depends on the accuracy of ancestral sequence reconstruction. We hear that this or that maximum likelihood or Bayesian methodology "should" work, but are these assertions really testable or just self-serving bioinformatic blather? Ancient DNA has two problems: the sequenceable component is never that ancient, and the fossil that it came from is never exactly from the divergence node. On the protein side, even if collagens and other structural proteins can be sequenced from dinosaur femur, that won't help with soft-tissue membrane-bound opsins. Fossil eyes in trilobites are much studied but equally uninformative at the molecular level. So direct tests of reconstructed opsins are not imminent.

Another dimension to testing the accuracy of reconstructed sequences involves physical construction of the gene and its expression in a contemporary host. If the gene product were an enzyme, we might gain confidence by looking at binding constants, catalytic efficiency, and substrate specificity. For opsins we have covalent binding of retinal, spectral sensitivity, 7-transmembrane topology, and signaling capability. Unfortunately we don't know what the ancestral lambda max should be. These functionalities won't prove stringent enough, because even the sloppiest reconstruction will get invariant and near-invariant residues correct. These may be quite adequate to produce a satisfactorily functioning opsin that bears little relationship to the true ancestral sequence.

It could be argued that the ancestral residues with the most variation are the least important, so it doesn't really matter if the reconstruction gets them right. It's abundantly clear from a quick alignment that the amino and carboxy termini are under very relaxed constraints, sometimes even within a single orthology class (but sometimes not), making reconstruction outside the core opsin essentially hopeless. There's also markedly less conservation in some of the extracellular and cytoplasmic connecting loops. Note though that a highly organized portion of the extracellular region, including a conserved disulfide bridge, actually guides the arrangement of the seven-helix transmembrane motif.

Thus the focus of the restoration effort lies between the hopeless and the slam-dunk invariant residues. Here we don't want to use reconstruction methods developed and vetted for cytoplasmic proteins. The rules, such as what constitutes a conservative substitution, are very different for integral membrane proteins such as opsins, where the exterior of the alpha helices is exposed to a hydrophobic rather than a hydrophilic environment except at the cap residues. We know from the determined 3D structure of bovine RHO1 (intradiskal loop 1 and loop 2 have been studied separately) that opsins will have significant co-evolution, that is, ectopic (non-adjacent) residue pairs that shift in a coordinated manner according to their own reduced alphabet despite the disparity in linear position. The best known case is the retinal-bearing lysine and its negative counterion glutamate (where the reduced alphabet contains aspartate at the counterion but not arginine or histidine at the Schiff base, though opsins are known where the counterion position has shifted). These issues are not considered in residue-by-residue and local patch reconstruction methods.

It's proven extremely difficult to determine the 3D structure of additional GPCRs despite an immense research effort, but finally in August 2007 that of a construct based on human beta2 adrenergic receptor (intronless gene ADRB2) was obtained. Beyond other intronless adrenergic receptors, it's most closely related within the human genome to dopamine and serotonin receptors (DRD1 and HTR4 respectively). The latter has 8 coding introns, which we'll consider later as a control on specificity. Ominously, ADRB2 has its best blastp match to our putative new Nematostella melanopsin nemVec1 and to one from annelid, MEL2_capCap, and otherwise resembles melanopsins at the 30% identity level!

There's nothing specifically known about rhabdomeric opsins, but the rule of thumb in crystallography of soluble proteins is that an unknown sequence can be reliably fitted to a known 3D structure if homology exceeds the 30% identity level, with the big picture retained even at much lower levels. This may apply to membrane opsins because the 7-transmembrane topology (deduced from hydrophobic periodicity plots) is a very deeply conserved feature. However subtleties of ectopic interactions may not emerge from structural fitting despite the many constraints provided by invariant residues, though residue covariance can sometimes be inferred from direct statistical study of the sequences themselves. The authors of the beta adrenergic study are sceptical of fitting.

We'll take a heuristic approach here to ancestral sequence reconstruction, because not all possible evolutionary nuances have tangible sequelae for our central focus of disentangling very ancient gene duplications and divergences for the purpose of photoreceptor functional homologization. It is very likely that a pragmatic hand-curational approach informed by expert opinion and tailored to the particular circumstances of opsin structure/function produces a better product than blind application of statistical web software, whose appealing 'objectivity' only masks massive internal subjectivity in parameter choices and mutational processes. However with opsins the outcomes may scarcely differ in the early rounds of reconstruction at the determinable positions, for example the opsin portfolio at the lamprey node.

We can expect as ancillary benefits (1) a tenfold reduction in the number of sequences under management, (2) a small set of proxy sequences that retains all of the information (including intron and indel rare genomic events) but none of the idiosyncrasies, (3) a blast query that significantly outperforms any of its constituent sequences on outside opsins because it has taken off 500 million years of divergence time, and (4) a sequence less likely to be fooled by non-opsin rhodopsin superfamily members or generic GPCRs.

It's best to proceed in stages with the actual work of ancestral sequence reconstruction, as determined by phylogenetically dispersed sampling density. That is, lophotrochozoan ciliary opsins are known in too low numbers in too few species, whereas an excessive number of insect rhabdomeric and teleost cone opsins are available. After a bioinformatic push on new genomes, the resultant data set allows ancestral sequence reconstructions at the common ancestor with lamprey for all classes of deuterostome opsins and at the ancestral arthropod for rhabdomeric imaging opsins. For ciliary opsins in lophotrochozoa, cnidaria, and early diverging deuterostomes, the sparse set of individual sequences must initially be retained. This will unavoidably mix filtered ancestral sequences with noisy contemporary species-level opsin sequences at the interpretative stage. That's the usual state of affairs in bioinformatics and hardly a show-stopper.

In actual ancestral opsin reconstruction, we won't use a consensus sequence except as a heuristic, because that doesn't exploit the known gene tree and species tree. There's a potential benefit to single-species consistency (species such as Xenopus have nearly a full set) because an actual sequence preserves subtle co-evolving residue pairs. Profile sequences (which retain the dispersion over the 20 possible amino acids at each reconstructed position) are powerful but unwieldy: the most useful output is a logos graphic, which however requires trimming sequences to fixed length and loses text character. Most of the benefit can be skimmed by use of a reduced alphabet at positions where this is necessary. That is, at some residues the ancestral value is truly undeterminable, being at any given time a polymorphic mix of more or less equally acceptable alternatives (eg asparagine/glutamine waffling). Special symbols used to indicate reduced alphabets can be unrecognizable to blast-type tools, which expect the standard 20.
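One pragmatic way around the last point is to mask anything outside the standard 20 before handing a proxy sequence to a search tool. The sketch below is an assumption about workflow, not a procedure described on this page: the function name is hypothetical, every non-standard symbol is replaced by `X`, and lowercase (less certain) residues are simply uppercased, which discards the confidence information they carried.

```python
STANDARD_AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def mask_for_blast(sequence: str, unknown: str = "X") -> str:
    """Uppercase standard residues and replace every other symbol with `unknown`.

    Whitespace (e.g. line breaks inside a wrapped FASTA record) is dropped.
    """
    masked = []
    for ch in sequence:
        if ch.isspace():
            continue
        upper = ch.upper()
        masked.append(upper if upper in STANDARD_AMINO_ACIDS else unknown)
    return "".join(masked)


# Fragment of a proxy sequence containing one reduced-alphabet symbol.
print(mask_for_blast("HP$CVfEG"))
```

Masking with `X` keeps the sequence length and coordinates intact, so hits can still be mapped back to positions in the reduced-alphabet original.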

Outgroups have an important role in arbitrating ancestral residue choice where two sister clades disagree. Here simple parsimony drives the decision. If say threonine is used in one clade of cone opsins and serine in another, while pinopsin and the others use threonine, then the ancestral residue is threonine and not the reduced alphabet threonine/serine, ie the serine is taken as a clade-specific change on that stem. This extends to an 8-row parsimony decision table that covers all the combinatorial possibilities.
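The threonine/serine example can be written as a tiny decision rule. This is a minimal sketch of simple three-way parsimony under stated assumptions, not the 8-row table referred to above (which is not reproduced here): the function name is hypothetical, each clade is summarized by a single residue, and an unresolved position is reported as a slash-separated reduced alphabet.

```python
def ancestral_residue(clade_a: str, clade_b: str, outgroup: str) -> str:
    """Pick the residue at the common ancestor of two sister clades.

    - Sister clades agree: that residue is ancestral.
    - They disagree and the outgroup matches one of them: the outgroup
      residue wins, and the other is a clade-specific change.
    - All three differ: the position stays ambiguous (reduced alphabet).
    """
    if clade_a == clade_b:
        return clade_a
    if outgroup == clade_a:
        return clade_a
    if outgroup == clade_b:
        return clade_b
    return "/".join(sorted({clade_a, clade_b}))


# Threonine in one cone opsin clade, serine in its sister, threonine in pinopsin.
print(ancestral_residue("T", "S", "T"))
```

In practice each clade would itself be summarized from many sequences, so the inputs here stand for already-reconstructed clade ancestors.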

Before going there, let's take a quick overview using stratified invariance in post-lamprey post-encephalopsin ciliary opsins. That's quickly done by aligning the opsins and taking the consensus line at incrementally declining percent identity requirements, that is 100% invariant, 95%, 90%, etc. We need to know which residues are conserved at what depth and why. Opsins are hair-trigger, fail-safe by structural design: human RHO1 can detect as few as 5 photons, and each photoactivated opsin can in turn activate hundreds of molecules of the G-protein transducin. Very rarely does an unactivated opsin cause G-protein signaling. Tolerated mutational change must work around these constraints.

Stratified Invariance in Ciliary Opsins
column height = conservation depth; rho1 = human rhodopsin RHO1
opsn line = conserved all opsins 90% caps/50% lower
special symbols = reduced alphabets

100% ..............y.................N..................................................G...C..#.......G............eR..V!..P..................W..........................
 95% ..............f.................N............L....N....n.......................%%..G...C..#.%.....G............eR..V!C.P..................W.........P...W..%........C
 90% ..............y......M..........N............LR...N....N.......................%F..G...C..#G%.....G............ER..V!C.P..................W.........P..GW..%........C
 85% ..............f......M..........N......T.....LR.P$N....Nv...#.................GYF..G...C..EG%.....G...$.S....A.ER..V!C.P.g........A.......W........PP..GW..Y...G...SC
 80% ..............y......M..........N......T.....LR.PLN.!..NL...#.................GYF..G...C..EGF.....G...L.SL...A.ER..V!C.P.G........A.......W........PP..GW..Y...G...SC
 75% ..............f......M..........N......T...K.LR.PLN%!LVNL..A#......g..........GYF..G...C..EGF.....G...LWSL.!.A.ERy.V!CKP.G...F....A..G....W........PP..GWS.Y.PEG...SC
 70% ..............y......M..........N......T...K.LR.PLNYILVNLA!A#L.....G..........GYF..G...C..EGF.....G...LWSL.!.A.ERy.V!CKP.G#..F...HA..G....W........PP..GWS.Y.PEG...SC
 65% P...Pq........f......M..........N.vV..!T...KKLR.PLNYILVNLA!ADL.....G..........GYF..G...C..EGF.....G.!.LWSL.!$A.ERy.V!CKP.G#..F...HA..G!.%.W........PPL.GWS.Y.PEG...SC
 60% P...Pq...A....y......M..........N.LV..VT.k.KKLR.PLNYILVNLA!ADL.....G..........GYF..G...C..EGF.....G.!.LWSLa!LA.ERY.V!CKP.GN..F...HA..G!.F.W!.......PPLFGWS.Y.PEG$..SC
 55% Pf..Pq...A..w.f...A..M..........N.LV..VT.KfKKLR.PLNYILVNLA!ADL.....G.......#..GYF.$G...C..EGF.v...G!V.LWSLAVLA.ERY.V!CKP.GN..F...HA..G!.F.W!....W..PPLFGWSRY.PEG$.TSC
 50% PF..PQ...A.PW.Y..LA..M..v.......N.LV!.VT.KfKKLR.PLNYILVNLAVADL.....G.T!....#..GYF.LG...C..EGF.V...G!V.LWSLAVLA%ERY.VVCKP$GNF.F...HA..G!.FTW!....W..PPLFGWSRY.PEG$.TSC
opsn:................................N..v.......k.lr.p.n....nla..d...................f..g...C...gf.....g..s...l..la..Ry.vi..p..........a.......W.....w...pl.gw..y.peg..tsC
rho1:PFEYPQYYLAEPWQFSMLAAYMFLLIVLGFPINFLTLYVTVQHKKLRTPLNYILLNLAVADLFMVLGGFTSTLYTSLHGYFVFGPTGCNLEGFFATLGGEIALWSLVVLAIERYVVVCKPMSNFRFGENHAIMGVAFTWVMALACAAPPLAGWSRYIPEGLQCSC

100: ...........................................................#......!..M!..%.....PY......................P....K....%NP.IY...N..F...................................
 95: ...............%...........P...I...Y.......................#.....MV..M!..%.....PY......................P....K....YNP.IY..$N.#F...................................
 90: ...............%...........P...I...Y.......................E..V..MV..M!..%.....PY......................P....K....YNP.IY..$N.QFR..................................
 85: ..#.%..........%........F..P...I...Y.......................E.#V..MV!.M!..F..CW.PY......................P.%F.K...!YNP!IY..$N.QFR.C.......G........................
 80: ..#.%..........%........F..P...I...Y.......................E.#V.RMV!.M!..F..CW.PY...A..................P.%F.K...!YNP!IY!.$N.QFR.C.......G........................
 75: .P#.%.........S%....F..CF..P..!I...Y.............#.....t..AE.#V.RMV!.M!..F..CW.PY...A..................P.%F.K...!YNP!IY!.$N.QFR.C.......G.......#................
 70: .P#.%.........S%....F..CF..P..!I...Y..L.......A.#..#...T..AE.#V.RMV!!M!..F..CW.PYA..A..................P.%F.K...!YNP!IY!.MNKQFR.C......cG.......#................
 65: .P#WY.........SY!...F..CF..P..!I...Y..L...$...A.QQ.#...T.KAE.EV.RMV!!MV..F$.CW.PYA..A..................P.%F.K...!YNP!IY!%MNKQFR.C......CG.......#...t.S.V....
 60: GP#WY.......#.SY!...F..CF..PL.!I.%.Y..L...$..!A.QQ.Es..TQKAE.EV.RMV!!MV.AFL!CW.PYA.fA..!..#......P....!P.%F.K...!YNPIIY!FMNKQFR.C......CG.......#...T#S.VS...!.P.
 55: GPDWY....#..#.SY!!.$F..CF.!PL.!I.F.Y..L$..LR.VA.QQ.ES..TQKAE.EV.RMV!VMV.AFL!CW.PYA.FA..!..N.....#P..A.!PA%F.K.S.VYNPIIY!FMNKQFR#C......CG.....#.#...T#SSVS...V.P.
 50: GPDWY....#..#.SY!!.$F..CF.!PL.!I.FSY..LL..LR.VA.QQ.ES..TQKAE.EVTRMVVVMV.AFL!CW.PYA.FA$.!..N.....#P..AT!PA%F.KSStVYNPIIY!FMNKQFR#C.$....CGK.p..#.#.S.T#SSVS.s.V.P.
opsn:..d...........sy........f..Pl..i...Y.......................e.....m...m...f...w.PYa...............p.....p..f.k.s..ynpiiy...n..fr..................................
rho1:GIDYYTLKPEVNNESFVIYMFVVHFTIPMIIIFFCYGQLVFTVKEAAAQQQESATTQKAEKEVTRMVIIMVIAFLICWVPYASVAFYIFTHQGSNFGPIFMTIPAFFAKSAAIYNPVIYIMMNKQFRNCMLTTICCGKNPLGDDEASATVSKTETSQVAPA
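The stratified consensus lines above can be approximated with a few lines of code. This is a simplified sketch under stated assumptions: the function name is hypothetical, the input is a list of equal-length aligned sequences with `-` for gaps, and only the single most common residue per column is considered. It does not reproduce the reduced-alphabet symbols or the alternating-case convention used in the display above.

```python
from collections import Counter
from typing import Dict, List, Sequence


def stratified_consensus(
    aligned: List[str], thresholds: Sequence[int] = (100, 95, 90)
) -> Dict[int, str]:
    """Return one consensus line per percent-identity threshold.

    A column shows its most common residue when that residue occurs in at
    least `threshold` percent of the sequences; otherwise it shows '.'.
    """
    if not aligned:
        raise ValueError("need at least one sequence")
    if len({len(seq) for seq in aligned}) != 1:
        raise ValueError("all aligned sequences must have the same length")

    n = len(aligned)
    columns = [Counter(col) for col in zip(*aligned)]
    lines: Dict[int, str] = {}
    for threshold in thresholds:
        chars = []
        for counts in columns:
            residue, count = counts.most_common(1)[0]
            # Integer comparison avoids floating-point edge cases at the cutoff.
            conserved = residue != "-" and count * 100 >= threshold * n
            chars.append(residue if conserved else ".")
        lines[threshold] = "".join(chars)
    return lines


toy_alignment = ["NGC", "NGC", "NAC", "NGS"]
for level, line in stratified_consensus(toy_alignment, (100, 75)).items():
    print(f"{level:>3}% {line}")
```

Stacking the lines from the strictest to the loosest threshold gives the column-height effect: a residue that appears at 100% persists in every line below it.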

There's no conservation depth outside the opsin core, which begins at a classical waffle residue (Y or F) not much sooner than a very deeply conserved asparagine, position Asn55 (in the FSMLAAYMFLLIVLGFPIN region, human RHO1 terminology). Asn55 is known to be conserved within GPCRs far outside of opsins, making it diagnostically useless. The reason for the prodigious conservation of this particular amino acid, which likely exceeds many trillion years of branch length considering all the family members and all the species, is apparently structural (see http://www.sciencemag.org/cgi/content/full/289/5480/739/F5). Its side chain makes two interhelical hydrogen bonds, to Asp83 in TMH2 and to the peptide carbonyl of Ala299 of TMH7. Asp83 is in turn connected via a water molecule to the peptide carbonyl of Gly120 in TMH3 (ie this contact is not side-chain specific). Nearby Asn78, also in TMH2, likewise constrains three helices via hydrogen bonds to the hydroxyl groups of Ser127 of TMH3 and of Thr160 and Trp161 of TMH4. Of course glutamine could furnish the same bond donors, but its extra CH2 group would push the bonds forward; no coevolutionary change in acceptor can accommodate this given the palette of 20 amino acids, so it is never seen.

Opsins asn55 dry134.png

The ERY motif (which can accommodate D in first position and W in third) is another huge source of confusion in the opsin literature. It too is not at all specific to opsins within GPCRs; consequently it cannot be used to argue, say, that distant Nematostella or Hydra blast matches are indeed opsins rather than some other class of GPCR. The ERY motif is also structural. Glu134 forms a salt bridge with the guanidinium of adjacent Arg135, which in turn is hydrogen-bonded to Glu247 and Thr251 in TMH6, a relation possibly critical to keeping rhodopsin in the inactive conformation. Movement in TMH3 during the photoreception cycle changes the environment of the ERY motif, causing its reorientation.

The NPxxY motif is a third conserved patch, lying in TMH7, specific to the rhodopsin superfamily but not to opsins. However the stratified alignment shows that a slightly larger patch might be diagnostic for ciliary opsins and distinguish them from rhabdomeric, say VYNPVIYI (with specifically reduced alphabets for the hydrophobic residues). This type of exercise requires a massive opsin reference collection to be sure the full range of natural variation is seen. Here the structural and functional significance of this motif is murky. The two polar residues Asn302 and Tyr306 are internal, with the former possibly hydrogen bonding via a water bridge to Asp83 and the latter's hydroxyl close to Asn73 (highly conserved among generic GPCRs).
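Locating the two motifs in a candidate sequence is a simple pattern search, which also illustrates why a hit proves so little. The sketch below assumes the patterns `[DE]R[YW]` (from the D and W substitutions mentioned above) and `NP..Y`; the function name is hypothetical and positions are 0-based offsets within the supplied string, not RHO1 residue numbers.

```python
import re
from typing import Dict, List, Tuple

ERY_LIKE = re.compile(r"[DE]R[YW]")  # E or D, then R, then Y or W
NPXXY = re.compile(r"NP..Y")         # N, P, any two residues, Y


def find_motifs(sequence: str) -> Dict[str, List[Tuple[int, str]]]:
    """Return (offset, matched text) pairs for each motif in the sequence."""
    seq = sequence.upper()
    return {
        "ERY-like": [(m.start(), m.group()) for m in ERY_LIKE.finditer(seq)],
        "NPxxY": [(m.start(), m.group()) for m in NPXXY.finditer(seq)],
    }


# Fragments taken from the rho1 lines of the alignment above.
print(find_motifs("LAIERYVVVCKP"))
print(find_motifs("AIYNPVIYIMMNKQFR"))
```

Because both patterns are short and shared across the rhodopsin superfamily, a match only says the sequence is GPCR-like; it cannot separate an opsin from other receptors.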

Broad conservation ends at RNCMLTTICCG, well before the stop codon (position locatable by web browser search in the sequence collection). That's not to say there's not good information earlier about evolution strictly within cone opsins (such as the 1-residue deletion after PFEYPQY uniting RHO1 through LWS, and the 2-residue insert uniting RHO1 through PIN), but for now we're looking at a very much deeper time scale, that of all Metazoa. Note from the 'opsn' line that opsins very broadly considered (cnidarian, protostome, deuterostome; Go, Gt, Gq) share considerable conservation at 70 positions of 288. That's to say roughly 25% identity is the approximate floor (lower bound) for a blast search. These residues may be so fundamental to the GPCR and rhodopsin superfamily that the floor for non-opsins won't be that different. Therefore sequence alignment alone cannot be used to show that remote sequences in sponges and cnidarians are truly opsins.

Opsin bovRHO1.png

Landmarks in Bovine Rhodopsin RHO1 sequence explain residue conservation:

194 residues in seven transmembrane helices
 35 to  64 for TMH1 Asn55 hydrogen bonded to Asp83 TMH2 and Ala299 TMH7
 71 to 100 for TMH2 Gly90 night blindness
107 to 139 for TMH3 Cys110 half-disulfide | Glu113 salt bridge | ERY motif hydrogen bonds Arg135 and Glu247 Thr251 in TMH6
151 to 173 for TMH4
200 to 225 for TMH5
247 to 277 for TMH6
286 to 306 for TMH7 Lys296 11-cis-retinal NPviY motif 302-306

74 residues extracellular in 3 loops and tail
  1 to  34 for nTER Asn2 oligosaccharide|Gly3-Pro12 beta sheet parallel membrane | Asn15 oligosaccharide | Pro23  Gln28  retinitis pigmentosa maintain  orientation between EXC1 and nTER 
101 to 106 for EXC1
174 to 199 for EXC2 Cys187 half disulfide 
278 to 285 for EXC3

80 residues cytoplasmic in 3 loops and tail
 65 to  70 for CYT1
140 to 150 for CYT2
226 to 246 for CYT3
307 to 348 for cTER Cys322 Cys323 covalent palmitate tails | last 15 amino acids unstructured
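The segment coordinates in the landmark table can be tallied mechanically, which is a useful check that the segments tile the protein without gaps or overlaps. This is a small sketch: the coordinates are copied from the table, while the dictionary layout and function names are choices made for this example.

```python
from typing import Dict, List, Tuple

Segment = Tuple[int, int]  # inclusive (start, end) residue numbers

SEGMENTS: Dict[str, List[Segment]] = {
    "transmembrane": [(35, 64), (71, 100), (107, 139), (151, 173),
                      (200, 225), (247, 277), (286, 306)],
    "extracellular": [(1, 34), (101, 106), (174, 199), (278, 285)],
    "cytoplasmic": [(65, 70), (140, 150), (226, 246), (307, 348)],
}


def residues_per_compartment(segments: Dict[str, List[Segment]]) -> Dict[str, int]:
    """Sum inclusive segment lengths for each compartment."""
    return {name: sum(end - start + 1 for start, end in spans)
            for name, spans in segments.items()}


def tiles_without_gaps(segments: Dict[str, List[Segment]]) -> bool:
    """True when the sorted segments run contiguously from residue 1."""
    ordered = sorted(span for spans in segments.values() for span in spans)
    expected_start = 1
    for start, end in ordered:
        if start != expected_start:
            return False
        expected_start = end + 1
    return True


totals = residues_per_compartment(SEGMENTS)
print(totals, "total:", sum(totals.values()))
print("contiguous:", tiles_without_gaps(SEGMENTS))
```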


provisional trimmed ancestral proxy sequences for vertebrate ciliary opsins
>ANC_RHO1_14
MFfLIlvgFPvNFLTLfVTvqHKKLRtPLNYILLNLAvAnLFMVlfGFtvTmYTsmnGYFvfGptgCniEGFFATLGGEiaLWsLVVLAiERYvViCKPMsNFRFGntHAImGVaFTWiMALaCAaPPLvGWSRYIPEGmQCSCGvDYYTlkPeiNNESFVIYMFvVHFtIPfivIF
FCYGrLlcTVKeAAAqQQESasTQkAEkEVTRMVvlMVIaFLvCWVPYASVAfYIFthQGsdFGptFMTvPAFFAKSsalYNPvIYIlmNKQFRNCMITTlCCG

>ANC_RHO1_06
mFfLIitGlPiNiLTLlVTFkHKKLRQPLNYILVNLAvAdLfmvcfGFTVTFytawngYFvfGPiGCAiEGFfATlGGqVALWSLVVLAIERYIVvCKPMGNFRFsatHaimGIaFTWfmAlsCAaPPLfGWSRyiPEGlQCSCGPDYYTlNPDfHNESyViYmFvVHFliPvviIF
fsYGRLiCKVrEAAAQQQESAsTQKAEkEVTRMVILMVlGFllAWtPYAsvAfWIFtNkGAeFsaTlMtvPAFFSKSSslyNPIIYVL$NKQFRNCMiTTiCCG

>ANC_SWS2_09
MfflvilGfpiNvLTifCTikyKKLRSHLNYILVNLAvaNLlVvcvGStTAFySFsqmYFalGplaCKiEGFaATLGGMvSLWSLAVvAFERfLVICKPlGNFtFrgtHAvlgCvaTWvfglaaSaPPLfGWSRYIPEGLQCSCGPDWYTTnNKwNNESYVlFLFgFC
FgvPlaiIlFsYgrLLltLravAkqQeqAsTQKAEREVTrMVVvMVlGFLVCWlPYaSFALWvVtnRGepFDLrlAsIPsVFSKaStVYNPvIYvfmNKQFRSCMmKmffcG

>ANC_SWS1_11
MGfVFfaGTPLNaiVLvvTikYKKLRQPLNYILVNIsaaGFvfcvFSvftVFvaSsqGYfffGktvCalEafvGslaGLVTGWSLAfLAFERYiVICKPFGnFrFsSkHAlaaVvaTWiiGvgvsiPPFFGWSRYIPEGLqCSCGPDWYTvgtkYkSEyY
TwFLfifCFivPlsiIiFSYsQLLgALRAVAAQQqESAtTQKAEREVSRMvivMVgSFclCYvPYAalAmYmvnnrdhglDlRLVTIPAFFSKSscVYNPiIYcFMNKQFraCIMEtVcG

>ANC_LWS_13
MifVVaaSvFTNGLVLVATaKFKKLRHPLNWILVNliAiADLGETvfASTiSVcNQvfGYFILGHP$CVfEGytVSyCGItaLWSLtIIsWERWvVVCKPFGNiKFDgKwAtaGI!FSWVWsavWcaPPiFGWSRyWPHGLKTSCGPDVFSGssd
pGvqSyMivLMiTCCfiPLaiIilCYlqVwlaIraVAkQQKESEsTQKAEkEVSRMVVVMilAycfCWGPYtfFACFaAaNPGYAFHPLaAalPAYFAKSATIYNPIIYVFMNRQFRNCImQLFG

>ANC_PIN_06
MGmVVisAffVNGLVIVVSlkyKKLRSPLNYILVNLAiADLLVTfFGStiSFvNNivGFFvfGktmCEfEGFMVSLTGIVGLWSLAILAFERYlVICKPvGDFrFQqRHAVlGCaFTWgWsliWTsPPLfGWsSYVPEGLrTSCGPNWY
tGGsnNNSYImaLFvTCFamPLstIlFSYaNLLltLRAVAAQQKEsETTQRAErEVTRMVIaMVlAFLbCWLPYAsFAmVVAthKdlvIqPqLASLPSYFSKTATVYNPIIYVFMNKQFRsCLltlmcCG

>ANC_VAOP_07
MfvvTaLSLaENFaVilVTfkFkQLRQPLNYiiVNLsvADfLVSliGGsiSFlTNykGYFfLGkwACVLEGFAVTfFGiVALWSLAlLAFERyfVICRPLGNmRLrgKHAaLGlafVWtFSfiwTvPPvlGWSSYtv
SkIGTTCEPNWYSGnfhDHTfIitFFsTCFIfPLgVIfvsYGKLirKLrKvSnTqgrLgntRkpErQVTRMVVVMIlAFmvcWtPYAaFSIlvTAhPtIhLDPrLAAiPAFFSKTAtVYNPiIYvFMNKQFRkClvQlfsc

>ANC_PPIN_07
mavfsvsgvLNstVIiVTlryrQLRqPlNysLVNLAvADLGcavfGGlltveTNAvGYFnLGRVGCVlEGFAVAFFGIAaLCtiAVIAvDRyvVVCkPlGtvmFttrhAlaG!awSWlWSfvWNTPPLFGWGselLEGVrTSCAPnWYsrD
PaNvSYIvcYFafCFAiPFsvIvvSYgrLlwTLhQVaKLgvlesGSTakaEaQVsRMVvVMvmAFLlcWLPYAaFAltVildPnlyInPvIATvPMYLtKsSTVyNPIIYIFMNrQFRDcavPfLLCG

(to be continued)