Opsin evolution: origins of opsins

From genomewiki
Revision as of 15:35, 18 December 2009 by Tomemerald

Introduction: the origin of opsins

OpsinOrigins.jpg

The origin of the first opsins is intriguing. Opsins are operationally defined for our purposes as 7-transmembrane GPCRs containing a Schiff base lysine at the position homologous to K296 of the bovine rhodopsin reference sequence.
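
The operational definition above can be sketched as a test on a pairwise alignment: map the reference residue number to an alignment column, then ask whether the query has lysine there. This is a minimal illustration, not the procedure actually used for this page. The function names are hypothetical and the toy sequences are invented, with position 5 standing in for K296.

```python
def alignment_column(ref_aligned, ref_position):
    """Return the 0-based alignment column that holds the ref_position-th
    (1-based) ungapped residue of the reference sequence."""
    seen = 0
    for col, ch in enumerate(ref_aligned):
        if ch != "-":
            seen += 1
            if seen == ref_position:
                return col
    raise ValueError("reference is shorter than the requested position")


def has_schiff_base_lysine(ref_aligned, query_aligned, ref_position=296):
    """True if the query carries K in the column homologous to ref_position."""
    if len(ref_aligned) != len(query_aligned):
        raise ValueError("both sequences must come from the same alignment")
    return query_aligned[alignment_column(ref_aligned, ref_position)] == "K"


# Toy alignment (invented): reference residue 5 plays the role of K296.
REF = "MN-GAK-SF"
OPSIN_LIKE = "MNQGAKTSF"   # lysine in the homologous column
NON_OPSIN = "MN-GAI-SF"    # isoleucine instead of lysine

print(has_schiff_base_lysine(REF, OPSIN_LIKE, ref_position=5))
print(has_schiff_base_lysine(REF, NON_OPSIN, ref_position=5))
```

With a real alignment the default `ref_position=296` would be used against the aligned bovine rhodopsin sequence. The seven-transmembrane GPCR requirement has to be established separately.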

This section moves forward in time from the parental gene content of the immediate ancestral genome (greatly facilitated by the new Trichoplax assembly) that gave rise to the first opsins via gene duplication and neofunctionalization of one copy for photoreception. Subsequent sections go backwards in time, first coalescing separate gene trees of ciliary, melanopsic and other opsins to their respective ur-opsins and ultimately deducing properties of the crown group opsin.

Events 600 million years ago seem hopelessly inaccessible, and indeed many uncertainties will remain even after every relevant genome has been sequenced. However, comparative genomics has already provided substantial insights into certain aspects of opsin evolution:

  • The first opsins were not associated with gross morphological structures (such as stalked eyes) that could possibly leave a fossil record -- key events took place strictly at the molecular level. Genomes of extant species (some more than others) are not exactly living fossils because the evolutionary accrual of mutations never ceases. Opsins have demonstrably been obliterated both by gradual pseudogenization and large scale deletions, confusing the record. Yet opsin genes and even their regulatory regions, when compared across the entire metazoan tree, can furnish reliable reconstructions of opsin content and even sequence at ancestral species divergence nodes.
  • Opsins are definitely not the 'original' GPCR because these were already widely deployed at much earlier divergence nodes (yeast, protozoa, choanoflagellates, trichoplax have GPCR but lack opsins). Nor are opsins the prototype for the 'rhodopsin class' R of the GRAFS classification of GPCR (which again was established far earlier). Indeed, even the Ralpha subgroup of rhodopsin class GPCR was well-established prior to the first metazoan opsin.
  • Opsins are thus latecomers, not pioneers, to a rapidly expanding paralogous gene clade within already full-featured GPCR. Judging by their closest extant blastp relatives among tens of thousands of GPCR at GenBank, opsins specifically arose as a gene duplication within the peptide receptor subgroup PEP. Indeed, certain of these proteins exhibit opsins among their top ten best back-blast matches (ie have better matches than almost all non-opsin GPCR).
  • Note an 'intermediate' GPCR can never be found: either lysine is present at K296 or it isn't. Reconstructing ancestral states from the best contemporary set of GPCR proteins lacking K296 cannot produce a lysine there by any methodology. The 20 encoded amino acids can be clustered into subgroups (eg by polarity) but ultimately form an unorderable discrete set not providing continuum transitional states.
  • Most likely the parental gene had several introns and the original opsins inherited this pattern (ie the duplication was segmental rather than retroprocessional as in some later cnidarian opsins). The history of introns within opsins is already complex and becomes quite problematic within the enveloping gene family. Opsins lack the ubiquitous phase 21 intron breaking the DRY motif arginine (with the exception of a fragmentary sea urchin melanopsin).
  • Intracellular targeting of early opsins was likely to cytoplasmic or endoplasmic reticulum membranes as isolated monomers (rather than in derived microvillar or ciliary structural specializations later with melanin backing providing directionality). They were the first eyes to the world only in the sense of indicating the intensity of sunlight striking the cell via already elaborated GPCR second messenger signal transduction.
  • Opsins did not arise from flavin-based cryptochromes, mechanistically different photoreceptors that evolved much earlier to establish circadian rhythm and eventually magneto-sensing. Cryptochromes are homologous to DNA photolyase repair enzymes, not GPCR.
  • Opsin creation does not imply saltatory evolution because the basics had been established far earlier -- the 7-transmembrane helical structure with fixed topology, the TM1-TM2 salt bridge N55-D83 that could serve as initial counterion, the DRY ionic lock, the NPxxY terminal helix, the conformational shift upon binding of ligand that could trigger signaling, the Galpha protein binding site needed for the signaling cascade, and an arrestin-type mechanism for signaling termination. Opsins contained and continued all of these features from the get-go, gradually adapting them to various photoreceptive functions.
  • In principle, GPCRs could continue to spawn new clades of opsins from time to time. However, they did not in bilaterans. That is, no gene tree of a bilateran opsin coalesces with a GPCR gene later than the bilateran common ancestor. All bilateran opsins are descended from one of five opsin classes present in the ur-bilateran. The gene tree comprising all opsins excludes all other GPCR, consistent with a unique K296 origination event. However, it remains possible that some cnidarian or ctenophoran opsins arose from a second wing of GPCR but no representative of this opsin survived in bilaterans.
  • Two genes in separate species are by definition orthologous only when descended vertically from a single gene in their last common ancestor. It appears that all bilateran opsins -- after accounting for later clade-specific expansions and losses -- are orthologous to either a cilopsin, melanopsin, peropsin, rgropsin, or neuropsin at the bilateran common ancestor.
  • These 5 opsin classes appear not fully coalesced even at the last common ancestor of bilaterans with cnidarians -- while sequence data is woefully limited today, it seems that both the melanopsin and cilopsin classes existed in this ancestor, perhaps in addition to other opsin classes no longer represented in bilaterans. Conversely, peropsins have been retained in lophotrochozoan, ecdysozoan, and deuterostome lineages but not in any cnidarian sequence to date. Neuropsins have been retained solely in chordates, whereas rgropsins are even more restricted (to vertebrates, even though they could not have originated there). These latter genes are the analogs of cnidarian-only opsin classes.
  • All opsins are homologous so any given pair is orthologous at some earlier common ancestor -- but which one? The species tree itself is confused here on sistering vs independent nodes. The single ctenophore opsin available -- unfortunately only a distal fragment -- is difficult to classify. The fact that its best blast matches cluster about equally well with melanopsins and cilopsins (to the exclusion of other bilateran classes) suggests that their merger is not far off.
  • The gene tree can be worked out on its own and coupled later to the species tree. Despite many efforts at this, the deeper topology remains problematic. It appears from sequence clustering, indel analysis (not possible yet in cnidopsins), and especially intron conservation that ((peropsin, rgropsin),neuropsin) is the correct grouping. Further, this assemblage associates more closely with cilopsins, leaving a final topology which must be superimposed on the phylogenetic tree:
gene tree    ((((cilopsin,((peropsin,rgropsin),neuropsin)),melanopsin),cnidopsin),GPCRpep);
species tree (((((((((echinoderm,acornworm),amphioxus),tunicate),vertebrate),((chelicerate,(crustacean,insect)),(mollusc,annelid))),cnidaria),ctenophore),trichoplax),sponge);
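
Both trees are written in Newick notation, so their groupings can be checked mechanically. The following is a minimal sketch of a parser for this restricted form (unnamed internal nodes, no branch lengths) plus a helper that asks whether a set of leaves forms a clade. The function names are illustrative only.

```python
GENE_TREE = "((((cilopsin,((peropsin,rgropsin),neuropsin)),melanopsin),cnidopsin),GPCRpep);"
SPECIES_TREE = ("(((((((((echinoderm,acornworm),amphioxus),tunicate),vertebrate),"
                "((chelicerate,(crustacean,insect)),(mollusc,annelid))),"
                "cnidaria),ctenophore),trichoplax),sponge);")


def parse_newick(text):
    """Parse a simple Newick string into nested tuples; leaves are strings."""
    s = "".join(text.split())  # drop any whitespace
    pos = 0

    def node():
        nonlocal pos
        if s[pos] == "(":
            pos += 1                      # consume "("
            children = [node()]
            while s[pos] == ",":
                pos += 1                  # consume ","
                children.append(node())
            if s[pos] != ")":
                raise ValueError(f"expected ')' at offset {pos}")
            pos += 1                      # consume ")"
            return tuple(children)
        start = pos
        while pos < len(s) and s[pos] not in "(),;":
            pos += 1
        return s[start:pos]               # leaf name

    tree = node()
    if s[pos:] != ";":
        raise ValueError("expected ';' at end of tree")
    return tree


def leaves(tree):
    """List leaf names in left-to-right order."""
    if isinstance(tree, str):
        return [tree]
    return [leaf for child in tree for leaf in leaves(child)]


def is_clade(tree, names):
    """True if some subtree has exactly `names` as its leaf set."""
    target = set(names)
    if set(leaves(tree)) == target:
        return True
    if isinstance(tree, str):
        return False
    return any(is_clade(child, target) for child in tree)


gene = parse_newick(GENE_TREE)
print(is_clade(gene, {"peropsin", "rgropsin", "neuropsin"}))
print(is_clade(gene, {"cilopsin", "melanopsin"}))
```

In the gene tree as written, peropsin, rgropsin and neuropsin form a clade, whereas cilopsin and melanopsin alone do not, matching the grouping argued for in the text.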

OpsinOutgroup.jpg


Note opsins are unique among GPCR in several respects: they catalyze a mild in-situ enzymatic reaction -- cis-trans photoisomerization -- that furnishes the signaling agonist. (This reaction also occurs thermally without enzyme, but so does carbon dioxide dissolution in water, yet humans have 15 carbonic anhydrases.) Cis-retinal, being lipid soluble, does not diffuse through the extracellular milieu to reach its receptor binding site as ligands do in all other GPCR. Instead it is covalently bound to a lysine deeply internal to TM7, again unprecedented among GPCR (though other internal charged amino acids can occur, notably the D83 aspartate of the salt bridge and K90 of ultraviolet opsins).

Conceivably forerunners of opsins bound a related chromophore non-covalently, perhaps an all-trans retinoid in the manner of peropsins. Retinoic acid is sometimes proposed as the ancestral ligand, but retinoic acid receptors (RAR and RXR) are non-GPCR nuclear hormone receptors that bind all-trans-RA or 9-cis-RA but not 13-cis-RA. Furthermore, the GPCR receptors inducible by retinoic acid -- RAIG1 proteins (GPRC5C etc) -- belong elsewhere in the GRAFS classification, have no particular affiliation with opsins and again do not bind retinoids themselves.

Finally, although literature searches turn up scattered assertions about 'opsins' in species such as Chlamydomonas ('chlamyopsin' Z48968) and 'volvoxopsin', not to mention bacterial 'rhodopsins', these amount to abusive terminological metaphors, unwelcome additions to an already complex gene family. These proteins do not have seven transmembrane helices in the same arrangement as GPCR nor possess the slightest sequence homology at deeply conserved GPCR residues, so represent independent evolution of photobiology (along the lines of bat and butterfly wings representing independent origins of flying). The fact that the chromophores can be similar retinoids may be coincidence arising from the ubiquity of metabolic carotenoids (availability) and the restricted number of biochemicals (isoprenoids but not amino acids) with tunable absorption in the visual range (suitability).

Origin of contemporary opsin classes

Traceback of opsins can begin by selecting certain 'index sequences'. It ultimately does not matter which or how many, but for historical reasons bovine rhodopsin, frog melanopsin, human peropsin, mouse neuropsin and so forth might be used.

Each index sequence is then built out to a larger class of orthologs in nearby species using flanking gene synteny to confirm best-blast. Lineage-specific gene duplications with close affinities (eg from recent clade-specific paralogous expansions such as teleost fish whole genome duplications) are added. Eventually the set collides with an expanding set of another index sequence and all bilateran opsin sequences are uniquely placed in one of five clusters.
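
The build-out step can be sketched as a best-blast test confirmed by flanking-gene synteny. This is a hedged illustration only. It assumes the best-blast is checked reciprocally and that flanking genes have already been reduced to shared family labels; all gene names, scores and labels below are invented.

```python
def best_hit(scores, query):
    """Return the highest-scoring subject for `query`, or None if no hits."""
    hits = scores.get(query, {})
    return max(hits, key=hits.get) if hits else None


def confirmed_orthologs(a_vs_b, b_vs_a, flanks_a, flanks_b, min_shared_flanks=1):
    """Pairs that are reciprocal best hits AND share flanking gene families."""
    confirmed = []
    for gene_a in a_vs_b:
        gene_b = best_hit(a_vs_b, gene_a)
        if gene_b is None or best_hit(b_vs_a, gene_b) != gene_a:
            continue  # not a reciprocal best hit
        shared = set(flanks_a.get(gene_a, ())) & set(flanks_b.get(gene_b, ()))
        if len(shared) >= min_shared_flanks:
            confirmed.append((gene_a, gene_b))
    return confirmed


# Invented bit scores between opsins of two hypothetical species A and B.
A_VS_B = {"opsA1": {"opsB1": 300, "opsB2": 180},
          "opsA2": {"opsB2": 250, "opsB1": 170}}
B_VS_A = {"opsB1": {"opsA1": 310, "opsA2": 160},
          "opsB2": {"opsA2": 240, "opsA1": 175}}
# Invented flanking-gene family labels for each locus.
FLANKS_A = {"opsA1": ["famX", "famY"], "opsA2": ["famP"]}
FLANKS_B = {"opsB1": ["famY", "famZ"], "opsB2": ["famQ"]}

print(confirmed_orthologs(A_VS_B, B_VS_A, FLANKS_A, FLANKS_B))
```

Here only the first pair passes both tests. The second pair is a reciprocal best hit without synteny support, so it would need other evidence before being added to the class.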

Ciliary opsins (generated from RHO1) form a cohesive gene clade that does not coalesce with melanopsins, peropsins, neuropsins, or rgropsins within vertebrates, deuterostomes, or even bilatera. The index gene picks up rod and cone imaging opsins, pinopsin, parapinopsin, parietopsin, very ancient opsin, encephalopsin, teleost multiple tissue, and certain ciliary opsins from protostomes.

Hardly a vertebrate innovation, ciliary opsins occur in early deuterostomes lacking imaging eyes, both branches of protostomes (initially bee and ragworm), pre-bilateran cnidarians and possibly ctenophores. Sponges are still uncertain (because of a 5 year wait on the assembly) but the earliest-diverging genomes sampled (Monosiga and Trichoplax) definitely lack ciliary (and all other) opsins. If those genomes are representative, then ciliary opsins emerged on the post-Trichoplax stem. Certain cnidarian opsins -- but not all -- already exhibit some sequence specializations of ciliary opsins.

Ciliary opsins have been totally lost on numerous occasions in numerous lineages, notably 'model' organisms like drosophila and worse, nematodes, which have lost all types of opsins. Hemichordates and non-annelid lophotrochozoans lost ciliary opsins independently. Other explanations (such as multiple re-emergences of ciliary-like opsins) are manifestly impossible.

The earliest deuterostome ciliary ur-opsin is best represented by the TMT class of opsins, in particular by the TMT1 subgroup that has retained important ancestral characteristics in the diagnostic TM2 region. Sequential expansion of TMT1 gave rise to all the other ciliary opsins found in vertebrates, including all rod and cone opsins. This fundamental gene, though retained through amphibian and amniote, curiously was eventually lost in birds and mammals. Transcripts are often annotated as coming from testis libraries, suggesting a function in gamete release timing. Its immediate descendant gene TMT2, whose subfunctionalization is unknown, is retained in monotremes and marsupials but lost in all placentals. The best experimental organism for studying TMT1 is probably Xenopus.

Melanopsins, discovered in 1998 in frog lateral line dermal melanophores (as well as hypothalamus, iris, and retinal horizontal cells), form another ancient opsin class. Melanopsins include rhabdomeric arthropod opsins (which have an unnecessary dual nomenclature -- they're melanopsins by multiple independent criteria) and lophotrochozoan melanopsins (which other than scallop, squid and octopus sit undocumented within genome projects). One cnidarian opsin from coral classifies as a melanopsin yet closely shares other properties with cnidarian opsins that don't.

OpsinLoss.jpg

Peropsins are a third major class of opsins in the sense of broad but not universal retention. They are expanded in deuterostomes but rarely occur in arthropods yet are of major importance in lophotrochozoa. They are the only opsin class retained in hemichordates. There is nothing resembling them in cnidaria, though that could very well reflect gene loss in the two genomes available, as no coalescence with other opsin classes is evident.

Neuropsins are a much expanded but little studied group of opsins restricted to living deuterostomes though they did not originate there (unless divergence from another opsin class was exceedingly abrupt and then immensely slowed). The neuropsin expansion to 4 genes in the lamprey stem continued unchanged to the amniote ancestor but subsequently contracted to 2 in monotremes and only 1 in marsupials and placentals.

Rgropsins are another little-studied group that is represented today only from the tunicate-vertebrate last common ancestor onward. Again these opsins must have originated far earlier in pre-bilaterans because their ancestral reconstructed sequence is still far from coalescence with other ancestral opsin classes.

Rgropsins and neuropsins are conceivably retained in other bilatera but diverged to the point of unrecognizability. This scenario has to be discarded because analysis of complete genomes is adequately sensitive to locate all K296-containing GPCR. This reasoning is applicable to peropsins as well -- they have been definitely lost in all insect and molluscan genomes as well as the chelicerate Ixodes though fortunately retained in arachnids. This illustrates the incredible interpretive importance of a taxonomically broad sequencing program.

Peropsins, neuropsins and rgropsins are unified by their intronation, sharing 3 ancestral introns despite numerous differences. This indicates -- given the slow rate of intron gain and loss in most deuterostomes -- that they share deep roots in pre-bilaterans, implying near total loss of neuropsins and rgropsins in invertebrates. None of these introns are shared with cilopsins or melanopsins or for that matter known GPCR.

In hindsight, large scale loss of opsin classes should not come as a surprise -- humans lost 12 of 20 opsin loci present in the amniote ancestor. This is characteristic of GPCR evolution as well: collapse or near-collapse of a large gene clade, followed by later massive expansion but retention only in scattered lineages. This can result in two species having similar numbers of GPCR genes but very poor correspondences among them. This is a very different history from genes such as ribosomal proteins or catabolic enzymes (eg homogentisate oxygenase) that are retained in all species as single copies. Other genes like globins exhibit moderate expansion to several copies accompanying a trend toward organismal specialization and complexity, with little evidence of contraction cycles.

Once over this conceptual hurdle, these cycles of expansion and contraction can be repeatedly invoked on various branches of the phylogenetic tree to explain many aspects of opsin classification. Vertebrates have a remarkable history of largely terminal expansion in the lamprey stem, followed by retention in most lineages, clade-specific expansions in teleost fish, and major attrition in mammals but with some recovery in cone genes in primates.

Cnidopsins are a taxonomically based collection of opsins that largely do not classify unambiguously within bilateran opsins. Much more intensive sampling is needed here because neither of the two genome projects to date was favorable for opsins. Ctenophores currently have a single unexpected opsin gene obtained as an accidental byproduct of another project -- obviously much greater sequencing effort is needed given their currently basal position among opsin-containing species.

Nearest neighbors of opsins among GPCR

The immediate outgroup of opsins lies among a vast number of GPCR. The set of 29 non-opsin GPCRs in the reference gene collection was constructed by taking best-blasts separately for each opsin class to all human GPCR, then collating the lists while winnowing out repeated entries and too-recent gene family expansions. TACR2 (tachykinin receptor) and SSTR1 (somatostatin receptor) are the best single representatives. To these are added the 3 non-opsin GPCR with determined 3D structures, the closest two non-opsins in an early diverging eukaryote (Trichoplax), and a few nearest neighbors in a May 2009 update of the GRAFS classification tree.
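
The collation step described above can be sketched as an ordered merge with de-duplication and a cap on members of recently expanded families. This is an assumed reading of the procedure, not the script actually used. Which receptor is hit by which opsin class is invented here, and `FAMX_1`/`FAMX_2` are hypothetical stand-ins for a too-recent expansion.

```python
def collate_outgroup(best_hits_by_class, recent_expansions, keep_per_family=1):
    """Merge per-class best-blast lists into one ordered outgroup list.

    Repeated entries are dropped, and at most `keep_per_family` members of
    any family listed in `recent_expansions` (gene -> family) are kept.
    """
    collated = []
    seen = set()
    family_counts = {}
    for hits in best_hits_by_class.values():
        for gene in hits:
            if gene in seen:
                continue                  # repeated entry
            seen.add(gene)
            family = recent_expansions.get(gene)
            if family is not None:
                family_counts[family] = family_counts.get(family, 0) + 1
                if family_counts[family] > keep_per_family:
                    continue              # winnow too-recent expansion
            collated.append(gene)
    return collated


# Invented hit lists; only the gene symbols themselves come from the text.
BEST_HITS = {
    "cilopsin": ["SSTR1", "TACR2", "FAMX_1"],
    "melanopsin": ["TACR2", "FAMX_2", "GALR1"],
    "peropsin": ["SSTR1", "MTNR1A"],
}
RECENT = {"FAMX_1": "FAMX", "FAMX_2": "FAMX"}

print(collate_outgroup(BEST_HITS, RECENT))
```

Keeping the first-seen order means the result depends on the order in which opsin classes are queried. A real pipeline would more likely rank the merged list by score.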

The aligned sequences below have been trimmed at both ends to the earliest indications of conservation. Highly conserved residues are shown in red and less conserved residues in blue. The Schiff base lysine (position -16 relative to the FR end of TM7) does not occur outside of opsins. Note many of the conserved patches in these GPCR are very similar to those of opsins, implying those residues have no utility in distinguishing opsins from non-opsins. These shared conserved residues describe commonalities needed for generic GPCR structure and signaling but not especially for photobiology. Departures in certain opsin classes might indicate they are constitutively signaling or even no longer signaling.

This expansion of the Ralpha class had largely taken place by the divergence of early-branching lineages such as Monosiga and Trichoplax, species which do not contain opsins, implying the ancestral metazoan lacked them as well. The orphan receptors GPR21 and GPR52 form the immediate outgroup (within the 800 human GPCR) in an oft-cited 2003 study. They have isoleucine at the position homologous to K296; their ligands still remain unknown as of Dec 2009. Conservation is high throughout deuterostomes; blast matches within GenBank nr are restricted (within opsins) to molluscan melanopsins, suggesting Gq signaling.

The melatonin receptor MTNR1A emerges as a close relative to opsins. Curiously it plays a key role in circadian rhythms and so could coordinate with an opsin photosensor (ie one arising from a gene duplication and divergence). N-acetyl-5-methoxytryptamine, the ligand, bears no obvious relationship to cis-retinal however and K296 is lacking, making an immediate parent gene relationship problematic.

Another clue to the origin of opsins might be provided by examining GPCR intron positions and phases to see if they are shared with ancient introns in opsins. Many non-olfactory GPCR with sequence similarity to opsins have no introns or just one, suggesting the genes duplicated by retroprocessing and perhaps acquired an intron at an unrelated position later. UROPS2 has an intron but it does not seem to correspond to one in any opsin. Cnidarian opsins are either intronless (Nematostella) or undetermined (just known from processed transcripts).
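
Comparing intron positions and phases can be sketched as follows: each intron is reduced to a pair (alignment column of the codon it precedes or interrupts, phase), and two genes share an intron when the pairs coincide. Phase here follows the usual 0/1/2 convention (number of codon bases lying before the intron), which is an assumption about notation. The exon lengths and aligned sequences are invented toy values.

```python
def residue_columns(aligned_protein):
    """Alignment column of each residue, skipping gap characters."""
    return [col for col, ch in enumerate(aligned_protein) if ch != "-"]


def intron_sites(exon_coding_lengths, aligned_protein):
    """Set of (alignment column, phase) for every intron of one gene.

    exon_coding_lengths: coding bases contributed by each exon, in order.
    aligned_protein: that gene's protein sequence as it appears in a
    multiple alignment shared with the genes it is compared against.
    """
    cols = residue_columns(aligned_protein)
    sites = set()
    offset = 0
    for length in exon_coding_lengths[:-1]:   # an intron follows each exon but the last
        offset += length
        codon_index = offset // 3             # codon preceded or interrupted
        phase = offset % 3                    # 0, 1 or 2 bases before the intron
        sites.add((cols[codon_index], phase))
    return sites


# Toy genes (invented): 7 residues plus a stop codon, 24 coding bases each.
SITES_A = intron_sites([7, 8, 9], "MKT-LLVA")
SITES_B = intron_sites([7, 11, 6], "MKTGLL-A")

print(sorted(SITES_A & SITES_B))  # introns shared in position and phase
```

Requiring identical column and phase is strict. Intron sliding or alignment uncertainty near gaps would call for a tolerance of a few columns, which is omitted from this sketch.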

Closeness in the GRAFS tree does not fully accord with closeness of opsin blastp match, suggesting (unsurprisingly) that its topology is slightly wrong at some internal nodes. By average rank in blastp top scores (or by average of the 5 best blast expectation values), when representatives of all opsin classes are aligned with the GPCR below, the highest scoring ones by far are the two Trichoplax opsin-like receptors, followed by various peptide receptors:

Rank  Gene          Exp  Exons  Receptor      Ligand

4.2   UROPS2_triAd  e-29   2    orphan        histamine? (HRH2:  best non-opsin blast human)
5.4   UROPS1_triAd  e-28   1    orphan        peptide?   (SSTR1: best non-opsin blast human)
5.6   SSTR1_homSap  e-26   1    somatostatin  peptide
7.2   TACR2_homSap  e-25   5    tachykinin    peptide
8.1   GALR1_homSap  e-24   3    galanin       peptide
8.9   MTNR1A_homSa  e-23   2    melatonin     N-acetyl-5-methoxytryptamine 
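
One way a 'Rank' column like the one above could be computed is sketched here: each opsin class contributes a ranked list of its non-opsin hits, and each receptor's ranks are averaged across classes. This is an assumed reading of the method. The ranked lists are invented for illustration and do not reproduce the values in the table.

```python
from statistics import mean


def average_ranks(ranked_hits_by_class, missing_rank=None):
    """Average 1-based rank of each gene across per-class ranked hit lists.

    Genes absent from a class's list are skipped for that class unless
    `missing_rank` is given, in which case that penalty rank is used.
    Returns (gene, average rank) pairs sorted from best to worst.
    """
    genes = {gene for hits in ranked_hits_by_class.values() for gene in hits}
    averages = {}
    for gene in genes:
        ranks = []
        for hits in ranked_hits_by_class.values():
            if gene in hits:
                ranks.append(hits.index(gene) + 1)
            elif missing_rank is not None:
                ranks.append(missing_rank)
        averages[gene] = mean(ranks)
    return sorted(averages.items(), key=lambda item: (item[1], item[0]))


# Invented rankings, best hit first, for three opsin-class queries.
RANKED = {
    "cilopsin": ["UROPS2", "UROPS1", "SSTR1"],
    "melanopsin": ["UROPS1", "UROPS2", "SSTR1"],
    "peropsin": ["UROPS2", "SSTR1", "UROPS1"],
}

for gene, avg in average_ranks(RANKED):
    print(f"{avg:4.1f}  {gene}")
```

Averaging ranks rather than raw scores keeps one unusually long or well-conserved query from dominating the ordering.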


Trichoplax has two very curious 7-transmembrane proteins that emerge as its best genomic matches to opsin queries. While lacking K296 for a Schiff base, their best back-blast to GenBank nr returns almost entirely opsins (rather than other GPCR). While Trichoplax is 600+ million years removed from a common ancestor, these genes could still offer clues about the immediate GPCR ancestor to opsins. They are not plausibly descended from an opsin gene expansion followed by loss of K296 because Trichoplax ancestors all lack opsins.

In summary, the parent GPCR gene for opsins can be localized to the PEP subgroup of R class GPCR within GRAFS but no particular gene there stands out as a pre-opsin. The time span is immense and the gene class has experienced much churning through expansion and contraction cycles.