Opsin evolution: key critters (cnidaria)

From genomewiki

Cnidaria .. 1 opsin established

Biologists have belatedly realized that many molecular and morphological innovations attributed to chordates (or grudgingly to bilatera) actually track back much earlier to the common ancestor with cnidaria (Eumetazoa) if not earlier still to placozoa, sponges, and choanoflagellates. That's certainly true of photoreception. Two cnidarian genome projects have been more or less finished (Nematostella and Hydra) but that selection needs to be seriously expanded. Hydra especially repeats the whole mistake of sequencing a hugely derived genome with relatively little applicability to Bilatera based on a shallow 'model organism' approach (Hydra has only 66 publications in the last 30 years).

A scientifically neutral definition of eye needs to embrace the full variety of photoreceptors, including those with fewer "features" than the most complex. Probably the cutoff should be based on use of bona fide opsins classifying to the root of the encephalopsin, melanopsin, and RGR families covalently binding retinal and variants as agonist. Some purposes for an eye can be fully met by just perceiving and acting upon one "pixel", that is a simple photoreceptor eye with no pigment cup (that could provide directionality, two pixels resolution). Far too much emphasis has been given to the distinctness of lensed vision whereas such systems are evidently easily evolved and only a part of a broader photoreceptor continuum.

Opsin cnid overview.png

We don't say humans lack eyes just because a redtailed hawk has more pixels; we don't say humans lack color vision just because a turtle sees richer, sharper colors. When a simpler photostructure already suffices to distinguish day from night for gamete release, up from down for settlement, towards or away for predator evasion, cornea, lens, retina, and centralized nervous system are just baggage that can't be developed or maintained under darwinian selection. Cnidarian eyes exemplify this full range of possibilities.

Sponges and cnidarians have operated for immense timescales under selective pressure on huge population numbers on a steady body plan. Rather than frozen in time at some primitive condition as often portrayed (biobigotry), quite the opposite, they are fast evolvers that have had eons to perfect their genes and expression systems even as mammals played evolutionary catchup (eg human knee or defective LWS opsin duplication).

No living animal represents a long-gone ancestral node -- evolution never stops at the DNA level even if outward morphology seems constant. All extant species have proven equally successful at survival. Evolution is not a story book progressing to human -- if cnidarians are so dumb and their vision so bad, how then are they able to chase, catch, kill, and eat advanced vertebrates?

Cubozoa: Tripedalia cystophora (jellyfish) .. 1 opsin

A landmark paper by Kozmik et al in the 24 June 2008 PNAS has found the first convincing camera-eye imaging opsin sequence in pre-Bilatera, accompanied by homologous genes needed for the signaling cascade and melanin formation. That opsin classifies deeply within deuterostome TMT- and encephalopsin-class ciliary opsins, just as expected (should such an opsin even exist in cnidarian photoreception) because pineal and retinal opsins descended via gene duplications from this ancient opsin class (which also has non-imaging representatives in protostomes) prior to lamprey divergence.

This new gene has the expected 7-transmembrane topology, conserved disulfide, ERY domain (as ERF), and conserved lysine for covalent chromophore attachment (with counterion predicted here at the ancestral 'E181' position EGV). Only the lysine and counterion are specific properties of opsins relative to generic rhodopsin-class GPCR. Other conserved residues specific to ciliary opsins serve to distinguish it from rhabdomeric opsins and contribute to its unequivocal blastp clustering within vertebrate ciliary opsins. These latter are slow-evolving and consequently the distance is less than to protostome ciliary opsins.

The authors suggest the NKQ motif (NRS in Tripedalia) at the start of the last cytoplasmic region is a reliable signature for ciliary transducin interaction. This was established experimentally as a contributing factor, but only for bovine RHO1 and its co-evolving transducin Gt. Comparative genomics of 300 phylogenetically dispersed opsins shows it cannot hold in general, as TMT opsins lack the basic middle residue and encephalopsins do not conserve the motif at all. Note rod and cone transducins cannot be tracked back orthologously even to tunicates. Melanopsins are not known from pre-Bilatera; at some point they coalesce with ciliary opsins, perhaps with this motif coalescing to the latter's pattern. Opsins can also trigger multiple signaling systems.

OpsinActivation.png


Tripedalia and its still-hypothetical Gt transducin have been co-evolving on a separate trajectory from the bovine pair for perhaps 1.2 billion years. The histories of gene duplication in the heterotrimeric G proteins may differ, along with the constraints in vertebrates of transducin having to simultaneously interact with multiple ciliary opsin genes. However as initial gene duplicates, Gt and Gq would have had the same binding site; even as the proteins differentiated to serve distinct opsin sets, the binding site (at the juncture of last transmembrane and last cytoplasmic sections) would likely remain conserved because of homologic inertia of the binding pocket and the implausibility of creating a new transduction mechanism.

It is easy enough to identify candidates for the photocascade alpha subunit of heterotrimeric G protein in cnidaria -- simple Blast of human GNAT2 protein (cone transducin) calls up two strong 64% identity matches in Nematostella, Hydra, and others (meaning the binding site could be accurately modelled). These could be close relatives of the implied Tripedalia ciliary opsin transducin for which there is no data, but exclusive staining data would be needed to make the case.

Actual ciliary opsin conservation in this boundary region looks like yNP.IY..mNkqFr.c (YNPVIYCLLNRSFRKM in Tripedalia) with the serine of NRS not observed in other species; the logos graphic shown below makes this quantitative. Note while this region looks very different in melanopsins (typically HPK in HNPIIYAITHPKYRM), there was never any potential for confusion given the much worse alignment there. The cubomedusan opsin also appears oddly truncated at the carboxy terminus; it appears to lack the cysteines needed for palmitoylation and the serines/threonines for kinase activation.
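The comparison in this paragraph can be made explicit with a few lines of Python. This is only a sketch: the consensus string and both peptides are copied from this page (the Xenopus peptide is read off the xenTro line of the alignment below), while the function name and the scoring rule (a dot means unconstrained, letter case is ignored) are my assumptions rather than anything taken from the paper.

```python
# Consensus line and peptides are copied from the page text and alignment.
# The scoring rule is an assumption: '.' = unconstrained, case is ignored.
CONSENSUS = "yNP.IY..mNkqFr.c"
TRIPEDALIA = "YNPVIYCLLNRSFRKM"   # triCys boundary region
XENOPUS = "YNPIIYVFMNKQFRNC"      # xenTro boundary region


def score_against_consensus(consensus, peptide):
    """Return (matches, constrained_positions, mismatches).

    mismatches is a list of (1-based position, consensus residue, observed).
    """
    if len(consensus) != len(peptide):
        raise ValueError("consensus and peptide must have equal length")
    matches = 0
    constrained = 0
    mismatches = []
    for pos, (c, p) in enumerate(zip(consensus, peptide), start=1):
        if c == ".":
            continue  # unconstrained column, nothing to score
        constrained += 1
        if c.upper() == p.upper():
            matches += 1
        else:
            mismatches.append((pos, c, p))
    return matches, constrained, mismatches


tri_result = score_against_consensus(CONSENSUS, TRIPEDALIA)
xen_result = score_against_consensus(CONSENSUS, XENOPUS)
print("Tripedalia:", tri_result)
print("Xenopus:   ", xen_result)
```

Tripedalia agrees at 8 of the 12 constrained positions, with all of k and q of the NKQ motif among the misses, whereas the Xenopus peptide agrees at all 12. The melanopsin fragment quoted above is one residue shorter, so it would need to be aligned before it could be scored this way.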

CubozoanProt.png
Alignment of new cnidarian opsin with best Blastp in reference collection, compared to consensus sequence of all known ciliary opsins.

triCys 47 SFLLNGLVIAVLIKYIRTITNTNIIVLSMSCANILIPLLGSPLSATSSLMRKWQFGNGGCTWYGFINTLSGISGIYHLTFLSFERFITIVLPLKRDTILSTKNIYIGLGILWVAAIGVAGAPVFGWCEYIKEGVRTSCSVAW
bestBl    +F +NGLVI V +KY +  +  N I+++++ AN+L+ + GS +S +++++  +  G   C + GF+ +L+GI G++ L   +FER++ I  P+  D     K+  +G    WV +      P+FGWC Y+ EG+RTSC       
consen        n............Lr.p.n..l.n...........g..........g....g...C...gf.....G......l.....eR......p...........a.......W........pPl.GW..y..eg....C....
xenTro 43 AFFVNGLVIVVTLKYKKLRSPLNYILVNLAIANLLVTIFGSSVSFSNNVVGYFFMGKTMCEFEGFMVSLTGIVGLWSLAILAFERYLVICKPMG-DFRFQQKHAILGCSFTWVWSFIWTSPPLFGWCSYVPEGLRTSCGPNW

triCys -SSKENMNVFSYNLFMIFTVFLLPMLVIIYCNYRFIKEVSIMSTRARGLQGGDSEMTASASKAEKQLTIMVITMIIAFNIAWLPYTVVSMVFLTGYGDVVGPMGASVPSVFAKTSVIYNPVIYCLLNRSFRKML-----CGNS 325
bestBl  +   N N  SY + +  T F++P+  II+ +Y  +    +M+ RA   Q  DSE T    +AEK++T MVI M++AF I WLPY   ++V+         P  AS+PS F+KT+ +YNP+IY  +N+ FR  L     CG S
consen ..........sy.....f.....P...i....Y..............................E..v..Mv..Mv..f...w.PY......................P....K....yNP.IY..mn.qfr........CG..
xenTro YTGGTNNN--SYIMALFLTCFIMPLSTIIF-SYSNL----LMALRAVAAQQKDSETT---QRAEKEVTRMVIAMVLAFLICWLPYASFAVVVAVNKDVVIEPTVASLPSYFSKTATVYNPIIYVFMNKQFRNCLMTLLCCGRS 316

The low overall percent identity (37%) of its best matches to any known Bilateran opsin, attributable to the great evolutionary time spans involved, disappointingly does not open any new doors. The sequence here does not have striking homology to cnidarian opsins recently proposed by several other groups, does not elicit dramatic new ones in sequenced cnidarian genomes or transcript programs, nor does it serve to locate opsins in the sponge genome.

Note the other phototransduction cascade proteins co-located in the rhopalia by the authors have much more striking percent identity to both Nematostella and human homologs than the opsin, most astonishingly PDE6D at 80%. In terms of human gene names, function, and associated disease:

OCA2   49% EU310502 melanocyte membrane transporter of melanin precursor tyrosine (ocular and cutaneous albinism)
MITF   48% EU310499 basic helix-loop-helix and leucine zipper transcription factor (auditory+pigmentary syndromes)
PDE5A  48% EU310500 cGMP-specific phosphodiesterase 5A
PDE6D  80% EU310501 cGMP-specific phosphodiesterase subunit delta recognizing prenylation
GUCY2F 48% EU310503 rod outer membrane guanyl cyclase resynthesis of cGMP for recovery of the dark state

Most curiously, the authors observe that the major Tripedalia lens crystallin J1 protein is also strongly expressed in the nominally lensless slit and pit eyes. This raises the question of whether our concept of lens is too anthropomorphic and whether other anatomical configurations of high refractive index proteins can accomplish the same ends, possibly requiring an 'upgrade' for slit and pit eye functional assessment. For example, cone megamitochondria in treeshrews (refractive index 1.4) may have some lensing or waveguide function.

CoboTax.png

The opsins of slit and pit and larval eyes (implied by their photoreceptor cell structures) need to be determined with high priority. What's needed here too is a massive ortholog sequencing effort in the 19 extant species of cubozoan to break up the isolated long opsin branch, show how such sequences are evolving, allow estimation of when they were recruited to imaging vision, and reconstruct the earliest such ancestor. If a rhabdomeric opsin also occurs in larval or other eyes, that too needs comparative genomics.

It's unclear whether the last common eumetazoan ancestor of Tripedalia and Bilatera had imaging vision (note the hydrozoan Cladonema radiatum has eyes and sponge larvae have photoreceptor structures), yet clearly that ancestor contained one or more ciliary-type rhodopsin-class 7TM GPCR from which ciliary opsins descended in both clades, not always to be recruited for imaging. The fossil record for cubozoan cnidaria predates the Cambrian, though when eyes and statocysts (both regulated by PAXB) first appeared is unclear. Note further that planula larvae of Tripedalia have a rhabdomeric photoreceptor, suggesting melanopsin photoreception is also very ancient.

It's worth noting that the phylogenetic tree for early metazoans has entered a state of turmoil. Sponges may be secondarily simplified in the adult stage; ctenophores may be basal, and so forth. It's even been suggested that ancestral sponge larvae represent the central object from which complex metazoans are descended.

These considerations suggest a very early evolutionary origin for the basic genes of photoreception and their regulation, with a great many lineage-specific subsequent upgrades and downgrades of the details, deuterostomes being the last to get on board with imaging vision. Thus the question Darwin asked, how many times did vision originate, requires a more nuanced answer than just a number. Most likely, the basic package of photoreceptor genes and their developmental regulation of expression arose just once, with all subsequent systems descended from that. However that package was subjected to numerous gene duplications and morphological variations in deployment, and outside recruitment in the case of crystallins and pigments.

Protostomes recruited melanopsin-class opsins for their imaging vision (despite available retained ciliary opsins), whereas early deuterostomes lacked imaging vision per se but retained ciliary opsins in related photoreceptory roles. Later post-amphioxus, post-tunicate deuterostomes independently recruited a descendant ciliary opsin (despite an available retained melanopsin-class opsin), moving from pineal to bilateral imaging eyes in the third and latest invention of imaging vision.

The spectral sensitivity of neritic (near-shore) lens eyes of a box jellyfish, Tripedalia cystophora, previously considered by M Coates et al, was interpreted as a single vitamin A-1 based opsin with peak sensitivity near 500 nm (blue-green). However nothing was sequenced. This species was most helpfully reviewed earlier by Piatigorsky and Kozmik who note Eakin already commented on seemingly ciliary photoreceptors in 1962. However, 45 years later we still didn't know if opsins in cnidarians would classify with vertebrate ciliary opsins. They could even share conserved intron positions though that cannot be determined from transcript data.

Opsin cnid larva.png

Furthermore, as noted by Nordstrom et al, planula larvae of Tripedalia have a series of single-cell pigment cup rhabdomeric-like photoreceptors directly connected to motor cilia. These lack neural connections in line with Gehring's notion of the eye preceding the brain in evolution, rather than being a later add-on. So cnidaria might actually retain descendants of both types of ancestral opsins. No sequence is available yet for larvae.

Opsins cubomedusae.png

Cubozoa: Carybdea rastonii (box jellyfish) .. 1 opsin

Cnidarians are the earliest diverging invertebrates with multicellular light-detecting organs. Photodetectors include simple eyespots, pigment cups, complex pigment cups with lenses, and camera-type eyes with a cornea, lens, and retina. These remarkable eyes are located on sensory clubs called rhopalia with four lining the bell. Each houses six eyes: a pair of pit ocelli, a pair of slit ocelli, and two unpaired lens eyes with counterparts to cornea, cellular lens and retina of ciliated photoreceptors.

Anatomically, the ocelli have bipolar sensory photoreceptor cells interspersed among nonsensory pigment cells, with the apical end making the light-receptor and the basal end forming an axon that synapses with second-order neurons to form what amounts to ocular nerves. Vision has roles in the reproduction and feeding of cubomedusae which can find each other and chase, catch, and eat teleost fish. A patch of Pelagia noctiluca 10 square miles in extent and 35 feet deep recently destroyed a salmon farm off Northern Ireland.

One of the most striking jellyfish from the perspective of complex eyes is Carybdea marsupialis, as reviewed by VJ Martin. Antibody studies based on vertebrate cone/rod opsins are not sufficient because of possible cross reactivity to generic GPCR proteins or non-imaging photoisomerases; no opsins have been sequenced yet. Provided the retroposon and base composition are not unwieldy, Carybdea could be an instructive genome to sequence. Nematostella and Hydra, whatever their other genomic merits, sit in the Anthozoa and Hydrozoa, clades of cnidarian lacking elaborate visual systems.

Carybdea rastonii has a green-sensitive visual pigment in its ciliary-type lens eyes utilizing Gs cAMP phototransduction cascade (that is, not Gt, Go or Gq). A complete opsin-like sequence AB435549 satisfies various opsin sequence signature requirements but does not classify clearly among known ciliary opsins. Instead, its affinities lie with opsin-like sequences from Hydra and Nematostella -- species that have 'too many' opsins for their meagre photoreceptive anatomy and photobehavioral capacities.

The second problem is the lack of blastp affinity of Carybdea protein to the new validated opsin from Tripedalia cystophora, which classifies as expected within bilateran ciliary opsins. This implies the last common ancestor of box jellyfish with bilaterans possessed a conventional ciliary opsin. What then is the need for other classes of opsin-like sequences? Other interpretations of opsin-like sequences need to be considered:

First, AB435549 may function more along the lines of peropsin/RGR/neuropsin (even though it does not cluster with them) as an auxiliary, possibly signaling or replenishing photoisomerase but not the primary imaging photoreceptor in Carybdea. In this scenario, AB435549 could hybridize more or less correctly in situ as would a better missing opsin -- remaining to be recovered -- more closely related to the Tripedalia opsin.

This still does not explain the observed clustering of AB435549 with opsin-like proteins, especially of intronless genes of Hydra and Nematostella with dubious connection to any kind of vision. Possibly the function here instead has to do with sensing or digestion of dietary carotenoids or photo-rearrangement of double bonds for biosynthetic, energetic or regulatory purposes (eg retinoic acid metabolism). In its wildest form, this hypothesis envisions metabolic photoreception as the core ancestral property that was later co-opted during the evolution of vision. Alternately, metabolic photoreception was a later spinoff of light sensing.

A third scenario just places AB435549 on another 'track' from bilateran opsins. Here it either arose from a conventional ciliary opsin after species divergence from last common ancestor or it is older and bilaterans subsequently lost all members of its gene tree class. The latter seems more plausible because AB435549 has no particular affinities to ciliary opsins relative to melanopsin or peropsin-type opsins. In this view box jellyfish have retained two systems with different retention patterns in different clades, in analogy to protostomes emphasizing melanopsins and deuterostomes ciliary in their imaging opsins. In support of this, the timing of box jellyfish divergences could be equally as old, even if they 'all look the same' from the human perspective. Here the Tripedalia group is then more relevant to the evolution of deuterostome vision whereas neither Tripedalia nor Carybdea is helpful in understanding protostome vision.

It would be quite practical with 2009 technology to sequence a substantial number of complete box jellyfish genomes. This has the great advantage of allowing bioinformatic recovery of complete K-rhodopsin portfolios. This would settle the question of whether Carybdea possesses a ciliary opsin clustering with that of Tripedalia. Genomes also provide homologs of all auxiliary genes such as Galpha and RPE65. With a large set of proteins and rRNA, the timing of divergences within box jellyfish could be better estimated. It remains conceivable that cnidarians are paraphyletic, ie box jellyfish share a later divergence node with bilaterans, perhaps explaining common ground in eye structures not seen in anthozoa etc.

>CUBOP_carRas Carybdea rastonii sea_wasp Cubomedusae AB435549 cubop PUBMED 18832159
MGANITEILSGFLACVVFLSISLNMIVLITFYRLRHKLAFKDALMASMAFSDVVQAIVGYPLEVFTVVDGKWTFGMELCQVAGFFITALGQVSIAHLTALAL
DRYFTVCRPFVATAIHGSMRNAGMVIFVCWFYASFWAVLPLVGWSNYDVEGDGMRCSINWADDSPKSYSYRVCLFVFIYLIPVLLMVATYVLVQGEMKNMRGRAAQLFGSESEAAL
KNIKAEKRHTRLVFVMILSFIVAWTPYTFVAMWVSFFTKQLGPIPLYVDTLAAMLAKSSAMFNPIIYCFLHKQFRRAVLRGVCGRIVGGNAIAPSSTAVEPGQTLASGTAES*

(to be continued)

Cubozoa: Chiropsella bronzie (box jellyfish) .. 0 opsins

This box jellyfish was featured recently in a comprehensive optical and micro-anatomical study of all four eye types. The genus Chiropsella is not currently known to GenBank taxonomy, meaning no sequence data at all is available (unless some synonym has been used). Its enveloping family Chirodropidae has barely 17 sequence entries, none relevant to vision. Chiropsella bronzie was first named in 2006; it occurs in knee-deep water feeding on shrimp along sandy beaches in North Queensland, Australia.

RhopalCB.jpg
RhopalCB2.jpg

The picture of eye functions that emerges here is rather surprising. Both upper and lower lens eyes are severely under-focused (much more so than Tripedalia cystophora) with the retina so close to the lens that only blurred vision can result. And these are its best eyes. A novel long pigment cell has dark pigment moving within a white pigmented tube during light/dark adaptation for unknown advantages.

Since rhopalia eyes have seemingly had several hundred million years to evolve deeper vitreal space (which seems simple enough), an eye tuned to detect large structures at short range (spatial low-pass filter) evidently suits Chiropsella. The primary function may be visual avoidance of obstacles or detection of prey within range. Higher spatial resolution and rapid refreshing entail a concomitant expansion of the nervous system and higher ongoing energetic costs to adaptively process massive extra information.

The skyward pointing upper lens eye has various peculiar features from our perspective. The ellipsoid lens lacks focusing refractive power, has a cataract-like inclusion casting a shadow on the retina, a hole in connection with the pigment layer exposing the retina to direct sunlight, balloon cells partly covering the lens aperture, gastric cells contacting the posterior lens side, and less of a pupillary response. Although little is known on the biological side, the upper lens-eye shadow line capability could be suited for detecting sun or moon position.

The two smaller pit and slit eye types have an epithelium/cornea covering but do not contain a lens. The photoreceptors are pigmented and organised into ciliary, pigment and neural layers. These eyes are capable at best of monitoring ambient light intensity, perhaps guiding overall phototactic behaviour or orientation. Yet this proposed function could seemingly be accomplished as a byproduct of lens eye functionality and would not require paired pit and slit eyes, much less four sets of them around the full quadrant of rhopalia.

While vertebrates too have anatomically separated photoreceptors for diverse functions, there are meagre prospects for homologizing here to melanocyte, ganglial, pineal or other deep brain structures. In fact even the main lower lens eye may not descend from an ancestral structure that, in another clade, became a bilateran eye because apparent common ground can originate multiple times just from convergent considerations of optical physics. Thus a 'cornea' is merely a protective epithelial layer and a lens just a thickening filled with overproduced protein providing refractive index.

Even if the old mystery of the origin of the eye can be pushed back through homologization to the mystery of the origin of the rhopalium, no 'intermediate states' are likely to be found in the Cambrian jellyfish fossil record and seemingly not in other living species of box jellyfish (all of which have the (2+2+1+1)x4=24 eye pattern).

It is not clear whether distinct opsin genes are utilized in these various eyes and if so, what their gene tree might look like, eg ((pit,slit),(upper,lower)) vs (((pit,slit),upper),lower). We are left wondering too about the origin of rhabdomeric melanopsins if ultrastructure in prebilaterans is always specialized to modified cilia.
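To see exactly where the two candidate gene trees disagree, the parenthesized strings can be broken into their clades and compared. The parser below is a minimal sketch that assumes plain nested parentheses with no branch lengths or internal labels; the function name `clades` is mine and it is not meant as a general Newick reader.

```python
def clades(tree):
    """Return the set of clades (frozensets of leaf names) in a simple
    parenthesized tree such as ((pit,slit),(upper,lower))."""
    text = tree.strip().rstrip(";")
    found = set()

    def parse(pos):
        # Returns (leaf names under the subtree starting at pos, next position).
        if text[pos] == "(":
            leaves = set()
            pos += 1  # step past "("
            while True:
                child, pos = parse(pos)
                leaves |= child
                if text[pos] == ",":
                    pos += 1
                elif text[pos] == ")":
                    pos += 1
                    break
                else:
                    raise ValueError("unexpected character at %d" % pos)
            found.add(frozenset(leaves))
            return leaves, pos
        # Otherwise a leaf: read up to the next delimiter.
        end = pos
        while end < len(text) and text[end] not in ",()":
            end += 1
        return {text[pos:end].strip()}, end

    parse(0)
    return found


paired = clades("((pit,slit),(upper,lower))")
ladder = clades("(((pit,slit),upper),lower)")

shared = paired & ladder
only_paired = paired - ladder
only_ladder = ladder - paired
print("shared:", sorted(sorted(c) for c in shared))
print("only in first tree:", sorted(sorted(c) for c in only_paired))
print("only in second tree:", sorted(sorted(c) for c in only_ladder))
```

Both topologies keep pit and slit opsins as sisters. They differ only in whether the upper lens eye opsin groups with the lower lens eye opsin or with the pit plus slit pair, so that is the one relationship sequence data would have to resolve.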


Anthozoa: Nematostella vectensis (sea anemone) .. 3 opsins

The Nematostella genome has been released along with major papers and an upgrade to Stellabase. Not all 6.1 million traces were used up by the assembly, so any gene missing from the assembly should be sought directly in the trace archives.

The sea anemone, an anthozoan within Cnidaria having epithelial cells, neurons, stem cells, complex extra-cellular matrix, muscle fibers, and symmetry axis, is emerging as a high-profile evo-devo model species to elucidate the emergence and deployment of genes that determine animal body plans. However those plans don't seem to include eyes or overt photoreceptor structures such as pigment cells -- for that cubomedusae would be far better. PAX6 and RX are especially relevant to photoreceptor structures; their expression has been thoroughly studied in Nematostella without uncovering any sensory system though they contribute to patterning specific components of the ectodermal nerve net.

The JGI annotation pipeline produced a number of extensively annotated gene models for Nematostella opsins. These are available simply by keyword lookup or by tblastn of various queries, the best of which turn out to be -- unsurprisingly -- an encephalopsin subclass from Branchiostoma. It is important to credit the JGI staff for providing the relevant bioinformatic track computations because they were first to characterize and release these opsins into the public domain (eg GenBank NR and Entrez Gene). It does not constitute independent "discovery" to perform keyword lookup and copy out other peoples' work. Without proper citation, that's plagiarism.

I extended improperly truncated JGI gene models (ie those lacking iMet and stop codon), validated that the extensions still lacked introns (GT-AG splice junctions missing at positions expected from closest homologs), placed the best 3 (of a half dozen) in the Opsin Classifier with fasta headers, noted their best matches below, and validated lysine and counterion glutamate in the expected positions. All this is consistent with (but does not prove) a role for ciliary Gt opsins in pre-Bilateran photoreception.
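The first of those checks, whether a single-exon model is complete at both ends, is easy to automate. The sketch below is not the procedure actually used here, just an illustration; the function name is hypothetical and the short DNA strings are made-up toy inputs, not Nematostella sequence. It does not look for GT-AG splice junctions, which would need the genomic context and the homolog intron positions.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def check_single_exon_model(cds):
    """List basic completeness problems in a candidate intronless coding
    sequence (DNA, 5' to 3'). An empty list means none were found."""
    seq = cds.upper()
    problems = []
    if len(seq) % 3 != 0:
        problems.append("length not a multiple of 3")
    # Split into whole codons, ignoring any trailing partial codon.
    whole = len(seq) - len(seq) % 3
    codons = [seq[i:i + 3] for i in range(0, whole, 3)]
    if not codons or codons[0] != "ATG":
        problems.append("missing initiator ATG")
    if not codons or codons[-1] not in STOP_CODONS:
        problems.append("missing terminal stop codon")
    if any(codon in STOP_CODONS for codon in codons[:-1]):
        problems.append("internal stop codon")
    return problems


# Toy inputs, invented purely for illustration.
print(check_single_exon_model("ATGGCTTAA"))     # complete model
print(check_single_exon_model("GCTGCTGCT"))     # truncated at both ends
print(check_single_exon_model("ATGTAAGCTTGA"))  # premature stop
```

A model flagged as truncated would then be extended in the genomic contig until an in-frame ATG and stop codon are reached, and rechecked.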

We expect cnidarians (maybe not this particular anthozoan) to have both melanopsins and encephalopsins. Our tendency is to think that imaging eye opsins, whether insect rhabdomeric or vertebrate ciliary, are the main attraction, with the other opsins playing out obscure roles in secondary functions like timing of gamete release. That's quite wrong-headed. Deeper gene family trees show that the melanopsins and encephalopsins constitute the primary photoreceptors. Over vast evolutionary time scales, they gave rise to various spin-offs in various clades at various times through gene duplication and subsequent neofunctionalization. At even greater phylogenetic depth, melanopsin and encephalopsin are themselves related by gene duplication of an ur-opsin, which itself arose as a duplication of an established non-opsin GPCR. As noted by Arendt, that exploited prior gene duplication within the alpha subunit of heterotrimeric G protein and profound diversification in signalling system second messaging.

The odd thing about all these cnidarian encephalopsins is their lack of introns (three ancestrals are expected). That's very unlikely to be the Eumetazoan ancestral state for encephalopsin because Nematostella is no rogue organism when it comes to intron conservation. A common explanation for this within eukaryotic bioinformatics is gene duplication of a master gene via fully processed retrogenes (rather than through tandem, segmental, chromosomal, or whole genome duplications -- all of which preserve introns). Mixed mechanisms are also common (as in olfactory receptors): an initial intronless retrogene is duplicated tandemly etc. These paralogs can even displace the master gene by taking over its function, causing it subsequently to be displaced or even lost. That scenario played out within zebrafish opsins.

If so, we might expect Nematostella encephalopsins to be more closely related to each other than to any known opsin from any species. Indeed ENCEPHa_nemVec is 90% identical to ENCEPHb_nemVec and 52% with ENCEPHc_nemVec, whereas only 39% identical to the best bilateran opsin, ENCEPH4_braFlo of amphioxus. Those are profound differences -- mammalian proteins typically take 100 myr to lose 10% of their percent identity. Here though we know next to nothing about clade-specific rates and have very long branches indeed. Of course, a seven-transmembrane protein has very different evolutionary constraints from the generic globular cytoplasmic protein to which off-the-shelf phylogenetic software is tuned, so no purpose is served applying that.
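For reference, percent identity figures like these depend on how gapped columns are counted. The sketch below uses one common convention, which is an assumption on my part and not necessarily what blastp reports: identical residues divided by the columns where neither sequence has a gap. The Nematostella sequences are not reproduced on this page, so the demonstration uses the two 16-residue boundary peptides from the Tripedalia and Xenopus lines of the cubozoan alignment above.

```python
def percent_identity(a, b, gap="-"):
    """Percent identity between two aligned sequences of equal length,
    counted over columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for x, y in zip(a, b):
        if x == gap or y == gap:
            continue  # skip gapped columns
        compared += 1
        if x.upper() == y.upper():
            identical += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * identical / compared


# Boundary-region peptides copied from the triCys and xenTro alignment lines.
tri_vs_xen = percent_identity("YNPVIYCLLNRSFRKM", "YNPIIYVFMNKQFRNC")
print(tri_vs_xen)  # 50.0
```

Even this most conserved stretch is only half identical between jellyfish and frog, which gives some feel for what a whole-protein figure of 39% means.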

It appears the three Nematostella proteins may share a distinctive rare genetic event, an indel in a loop region. That would favor a common history. It will prove difficult to resolve indels as to insertion or deletion for lack of suitable outgroup.

Given a finished genome, the mode of gene amplification can be explored by looking at flanking genes. Perhaps ENCEPHa_nemVec and ENCEPHb_nemVec are adjacent (ie tandem duplication) or perhaps their flanking genes are paralogous (syntenic segmental duplication). However the Nematostella genome is currently unfinished and the (gapless) contigs containing the encephalopsins run about 10 kbp. Depending on gene density that can be too small to establish synteny. These contigs, separated by strings of N's of unknown length, are further assembled into larger scaffolds (ample for synteny), a process usually trustworthy at highly experienced JGI but sometimes confounded by issues such as repeats, compositional simplicity, very recent duplicative regions, and clonability.

The most convenient approach here is tblastn of ENCEPHa_nemVec against the wgs menu item at NCBI Blast, specifying Nematostella. The three genes here are on different scaffolds altogether, ruling out tandem position. The nearest flanking genes can be extracted by blastx of the enveloping contig (or whole scaffold) against GenBank protein. JGI has in effect already done this, as could be seen by expanding out from the initial browser view. Comparing 3 browser views is complicated by the fact that flanking paralogs might be named differently, but that is readily overcome by collecting sequences (noting strand orientation) into a mini-database and comparing within uBlast.
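The scaffold assignments are already recorded in the fasta headers at the end of this section, so the tandem-duplication question can be checked mechanically. A small sketch follows; the header strings are copied from this page, while the helper names and the regular expression are my own assumptions about how to read them.

```python
import re
from itertools import combinations

HEADERS = [
    ">ENCEPHa_nemVec Nematostella vectensis (anemone) no cdna complete 1 exon 306 aa best:ENCEPH4_braFlo scaffold_465_Cont27987 alt: Nemve1:219988 Nem1",
    ">ENCEPHb_nemVec Nematostella vectensis (anemone) NC-extended 1 exon 275 aa best:ENCEPH4_braFlo scaffold_273_Cont21871 alt: Nemve1:130042 Nem3",
    ">ENCEPHc_nemVec Nematostella vectensis (anemone) C-extended 1 exon 289 aa best:ENCEPH5_braFlo scaffold_11_Cont2404 alt: Nemve1:85309 Nem2",
]

LOCATION = re.compile(r"scaffold_(\d+)_Cont(\d+)")


def locations(headers):
    """Map gene name -> (scaffold number, contig number) from fasta headers."""
    result = {}
    for header in headers:
        gene = header.lstrip(">").split()[0]
        match = LOCATION.search(header)
        if match is None:
            raise ValueError("no scaffold field in header for " + gene)
        result[gene] = (int(match.group(1)), int(match.group(2)))
    return result


def same_scaffold_pairs(loc):
    """Gene pairs sharing a scaffold, ie candidates for tandem duplication."""
    return [(g1, g2) for g1, g2 in combinations(sorted(loc), 2)
            if loc[g1][0] == loc[g2][0]]


loc = locations(HEADERS)
print(loc)
print(same_scaffold_pairs(loc))  # empty list: no two genes share a scaffold
```

An empty list only rules out tandem position within the current assembly; scaffolds that are joined in a later assembly could still bring two of these genes together.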

Notice the Opsin Classifier collection already contains the outcome of this process as a fasta header field (for deuterostome opsins). It is conceivable that orthology of a Nematostella opsin to say a Branchiostoma opsin could be established in this way (synteny). However gene order in both genomes has been independently scrambled over immense time scales and orthology would have been to the Nematostella master gene (with introns) that appears lost. It's better to build out from a local synteny chain but that requires data from additional cnidaria. Note the irony here in that the farther removed genomes are from human, the more densely they must be sampled.

It's evident from a casual ClustalW alignment, after marking up columns for membrane-spanning sections and considering hydrophobicity, that Nematostella opsins conform to the standard central pattern. That's unsurprising since proteins retain 3D structure at far lower percent identity and the pattern here cuts much deeper, into the overall rhodopsin superfamily and beyond to generic GPCR. However encephalopsins can have very considerable extensions at their amino and especially carboxy termini that need separate consideration.

For now, sequences can be trimmed to whatever is alignable across the full spectrum of ciliary opsins. Recall that by design the Opsin Classifier collection seeks maximal phylogenetic dispersion to mitigate over-weighting by over-studied species that might introduce clade-specific interpretive bias. That could also be done by distilling the dataset down to ancestral sequences at lamprey divergence, the risk there being co-evolution of non-adjacent residues (eg different alpha helices) can be lost in residue-by-residue ancestral reconstructions.

As noted, Nematostella opsins are at best 39% identical. These had better be strongly concentrated at invariant and near-invariant ciliary opsin positions rather than randomly distributed. Blastp of course doesn't know the difference. We know at the outset this strong association will occur for any GPCR to the extent that it is reliably alignable, so the question really is whether conservation is concentrated at the conserved positions specific to ciliary opsins (ie conservation not shared with Go and Gq opsins). This has all been studied before but not nearly at the phylogenetic depth made possible by comparative genomics. There is always a need in remote opsins for independent support (here stratified signature residues) of candidates suggested by blast searches.

For that, it is most convenient to cut conservation tranches with Corpet's Multalin because user-specifiable line width can set breaks after structurally meaningful locations. Here the cutoff for invariant is set variously at 100%, 95%, 90%,... (with Nematostella omitted) and the stack of consensus lines retrieved. That results in a nuanced version of invariance that can be set off against the Nematostella sequence at those positions. For "controls" rhabdomeric opsins, rhodopsin superfamily, and generic GPCR generate their own stacks. (Alternatives such as logos or the misnamed evolutionary trace would give similar outcomes. None of the methods makes use of the known phylogenetic tree relating the sequences.) The bottom line here will be that these new cnidarian opsins will have conserved residue signatures specific to a conventionally functioning ciliary opsin, though ultimately that can only be tested by experiment.
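The idea of a stack of consensus lines at successively looser cutoffs can be sketched as follows. This is not the Multalin algorithm or its output notation; it is a minimal Python stand-in, assuming a pre-computed alignment supplied as equal-length strings, with hypothetical function names and a toy four-sequence input.

```python
from collections import Counter


def consensus_line(aligned, threshold):
    """Majority residue per column where its frequency meets the threshold, else '.'.

    Gap characters ('-') never count as a consensus residue.
    """
    width = len(aligned[0])
    if any(len(row) != width for row in aligned):
        raise ValueError("all rows must have the same aligned length")
    total = len(aligned)
    line = []
    for col in range(width):
        residue, count = Counter(row[col] for row in aligned).most_common(1)[0]
        keep = residue != '-' and count / total >= threshold
        line.append(residue if keep else '.')
    return ''.join(line)


def consensus_stack(aligned, thresholds=(1.0, 0.95, 0.90)):
    """One consensus line per cutoff, strictest first (cutoffs follow the text)."""
    return {t: consensus_line(aligned, t) for t in thresholds}


# Toy alignment (hypothetical), candidate sequence deliberately left out
toy = ["NPVIY", "NPIIY", "NPVLY", "NAVIY"]
strict = consensus_line(toy, 1.0)
relaxed = consensus_line(toy, 0.75)
```

The candidate sequence is then read against each line of the stack in turn, so that agreement at 100% positions is weighed separately from agreement at 90% positions.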

>ENCEPHa_nemVec Nematostella vectensis (anemone) no cdna complete 1 exon 306 aa best:ENCEPH4_braFlo scaffold_465_Cont27987 alt: Nemve1:219988 Nem1
>ENCEPHb_nemVec Nematostella vectensis (anemone) NC-extended 1 exon 275 aa best:ENCEPH4_braFlo scaffold_273_Cont21871 alt:Nemve1:130042 Nem3 
>ENCEPHc_nemVec Nematostella vectensis (anemone) C-extended 1 exon 289 aa best:ENCEPH5_braFlo scaffold_11_Cont2404 alt:Nemve1:85309 Nem2

ENCEPH4_braFlo   Branchiostoma floridae (amphioxus) Gt 0....   470  7.0e-48 39% identity to ENCEPH4_braFlo 
ENCEPH4_braBel   Branchiostoma belcheri (amphioxus) Gt 0....   449  1.2e-45
PER_xenTro       Xenopus tropicalis (frog) ?? 0.2.0.2.1.0...   438  1.7e-44
ENCEPH4a_takRub  Takifugu rubripes (teleost) Gt 0...2...0...   435  3.6e-44
PER_homSap       Homo sapiens (human) ?? 0.2.0.2.1.0.1 in...   426  3.2e-43
ENCEPH4b_takRub  Takifugu rubripes (teleost) Gt 0...2...0...   418  2.3e-42
ENCEPH5_braFlo   Branchiostoma floridae (amphioxus) Gt 0....   418  2.3e-42
ENCEPH_gasAcu    Gasterosteus aculeatus (stickleback) Gt ...   415  4.7e-42
PER_monDom       Monodelphis domestica (opossum) ?? 0.2.0...   411  1.2e-41
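A hit list like the one above can be summarized by gene family to see how consistently a candidate associates with one class. The Python sketch below assumes that the family is the part of each identifier before the underscore with any trailing paralog number and letter removed (so ENCEPH4a_takRub counts as ENCEPH); that parsing rule and the function names are assumptions made for illustration.

```python
import re
from collections import Counter


def family_of(identifier):
    """Family prefix of an identifier such as 'ENCEPH4a_takRub' -> 'ENCEPH'."""
    prefix = identifier.split('_')[0]
    return re.sub(r'\d+[a-z]?$', '', prefix)


def family_counts(hit_identifiers):
    """Tally hits by family, preserving nothing but the counts."""
    return Counter(family_of(h) for h in hit_identifiers)


# Identifiers copied from the hit list above, in rank order
hits = [
    "ENCEPH4_braFlo", "ENCEPH4_braBel", "PER_xenTro", "ENCEPH4a_takRub",
    "PER_homSap", "ENCEPH4b_takRub", "ENCEPH5_braFlo", "ENCEPH_gasAcu",
    "PER_monDom",
]
counts = family_counts(hits)
top_family, top_count = counts.most_common(1)[0]
```

Here six of the nine listed hits fall in the encephalopsin family and three in peropsin, which is the kind of mixed picture that calls for the signature-residue analysis described earlier rather than reliance on blast rank alone.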

Four putative opsins have been proposed by Plachetzki et al. Accessions of the supporting gene models are given in the JGI protein ID system (non-GenBank) as Nematostella1 219988, Nematostella2 85309, Nematostella3 130042, and Nematostella4 108738 (or fragments in the alignment graphic allow recovery of the respective cdnas by tblastn of GenBank WGS). As noted in the Hydra section, multiple lines of evidence are necessary to establish the first bona fide opsins in cnidarians.

There appear to be 2 Nematostella opsin-like cdnas at GenBank that cannot be found in the genome assembly or trace archives, DV091537 and DV087469. While genes can be missing from first assemblies, it is bizarre for both to be missing considering coverage is 6x. Upon back-blast to GenBank nr or the Opsin Classifier, very strong matches are seen consistently within crustacea. Thus it appears that these Sars Institute products are contaminants from another species, possibly a brine shrimp widely used in aquarium food. It is not unusual to see transcript (at issue here) and genome projects contaminated with dna from other species such as commensals, parasites, and food source -- this is reminiscent of Xenoturbella being confused with a mollusc in its diet.

New Nematostella transcripts continue to be posted by JGI into mid-Dec 2007. Using proxies for all possible queries, I located a possible melanopsin and possible rhabdomeric LWS counterpart. The former had two coding exons but not at a melanopsin position; the latter had but one. These are fairly weak matches and further characterization is needed. They're stored in the Opsin Classifier as MEL_nemVec and LWS_nemVec2.

A third group has taken a serious look at photoreception in Nematostella. No paper or dissertation has emerged as yet; no cnidarian opsins have been posted to GenBank.

The claim of orthology will prove exceedingly difficult to establish in a 700 million year long branch. It is not a property of a gene tree per se. By definition, two genes in species A and species B are orthologous if and only if they have descended vertically from the same single parental gene in their last common ancestor. The last component is exceedingly important because all opsins -- indeed all GPCR -- are ultimately descended from a single gene. However that single gene was not to be found in the common ancestor of cnidarians and bilaterans because sponges already appear to have classical opsins and perhaps hundreds of GPCR.

Most ancestral introns in human genes were established in unicellular eukaryotes well prior to fungal and green plant divergence. For example the distinct introns in close paralogs SUMF1 and SUMF2 were in place before human/diatom separation. It's very difficult to imagine how the introns in neuropsins, rgropsins, peropsins, melanopsins, encephalopsins, pteropsins, and ciliary opsins could have descended from a single gene in Eumetazoa.

Evolution of photoreception: the eyeless anthozoan 
Nematostella vectensis  as a model
Poster talk March 22-23, 2007
Heather Q. Marlow,  Daniel I. Speiser, David Q. Matus and Mark Q. Martindale (Email: marlow@hawaii.edu)
 
"Eyes have evolved numerous times within the animals, yet there has been surprising convergence in
the morphology, function and molecular basis of development in these structures. Although these diverse
eye types have arisen independently, many taxa utilize similar cassettes of genes to specify them. These
developmental genes include members of the SIX class of homeodomain proteins (sine oculis and optix), eyes
absent, dachshund and famously, the Pax genes (Pax6). Additionally, all animals in which photoreception has
been investigated use the opsin family, a class of seven transmembrane receptors, to detect light. Cnidarians
are an early branching lineage that are likely to have diverged from the rest of the animals before the
evolution of discrete eye structures. 

The ancestral cnidarian did not possess eyes, however like the extant anthozoan cnidarians (sea anemones, 
corals, and sea pens), it was likely to have had photoreceptive cells. In order
to determine the level at which cnidarian photoreceptive cells may share homology with bilaterian eyes, we
have examined the expression of these “eye” genes during development in the anthozoan cnidarian model
Nematostella vectensis through in situ hybridization. 

We have also identified, cloned, and studied the expression of many members of the visual opsin class of 
receptors in N. vectensis. Our data indicate that N. vectensis possesses putative photoreceptive cells which
express several orthologs to the visual opsins, that the organization of photoreceptor cells differs between 
different life history stages of the animal, and that presumptive photoreceptor cells express many of the 
same developmental molecules that specify eye development in bilaterian animals. These findings support the 
hypothesis that eyes may share homology only at the level of the photoreceptor, and that additional “eye” genes 
may have been co-opted into the eye specification pathway from more general neural roles in bilaterians." 

A fourth group published a 19 Dec 07 paper on putative anthozoan and hydrozoan opsins, releasing 54 full length sequences to GenBank. These include 31 full length intronated predicted genes for Nematostella, 21 mRNA for lens-eyed Cladonema radiatum, and 2 for eyeless Podocoryne carnea. The latter two species are hydrozoa without genome projects meaning the transcripts cannot be intronated. Many of the 54 proteins have best-blastp below 30% identity within the 230 validated opsins of the phylogenetically comprehensive reference collection. This is worse than some generic non-opsin GPCR, so almost all of the residue matching can be accounted for by conservation common to any GPCR rather than specific to opsins. All have lysine in homologous position, so potentially covalently bound retinal (though that was not established chemically). The counterion situation does not work out at either E113 or E181. Five are missing the universally conserved early asparagine and two others are truncated.
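Checking a candidate at reference positions such as 113, 181, or the Schiff lysine (296 in bovine rhodopsin numbering) requires translating reference residue numbers into alignment columns. A minimal Python sketch follows; the function names and the short toy alignment are hypothetical, and the toy rows are not real rhodopsin sequence.

```python
def column_of(ref_aligned, ref_position, ref_start=1):
    """Alignment column holding reference residue number ref_position.

    ref_aligned is the reference row of the alignment ('-' for gaps) and
    ref_start is the residue number of its first non-gap character.
    """
    number = ref_start - 1
    for col, ch in enumerate(ref_aligned):
        if ch != '-':
            number += 1
            if number == ref_position:
                return col
    raise ValueError(f"reference position {ref_position} not in alignment")


def residues_at(ref_aligned, cand_aligned, positions, ref_start=1):
    """Candidate residue (or '-') at each requested reference position."""
    return {p: cand_aligned[column_of(ref_aligned, p, ref_start)] for p in positions}


# Toy rows (hypothetical): reference starts at residue 10 and has a 2-column gap
reference_row = "AK--TSE"
candidate_row = "AKGGTQ-"
found = residues_at(reference_row, candidate_row, [11, 13, 14], ref_start=10)
```

With a real alignment, asking for positions 113, 181 and 296 against the bovine rhodopsin row would report directly whether a candidate carries glutamate at either counterion site and lysine at the Schiff base site.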

Conserved residues in putative cnidarian opsins relative to bovine rhodopsin and consensus sequences for ciliary, melanopsins, pteropsins, peropsins, and all validated opsins.
     ..........................................................*...................................................................*..................................................................................................................*..................
rho1 NFLTLYVTVQHKKLRTPLNYILLNLAVADLFMVFGGFTTTLYTSLHGYFVFGPTGCNLEGFFATLGGEIALWSLVVLAIERYVVVCKPMSNFRFGENHAIMGVAFTWVMALACAAPPLVGWSRYIPEGMQCSCGIDYYTPHEETNNESFVIYMFVVHFIIPLIVIFFCYGQLVFTVKEAAAQQQESATTQKAEKEVTRMVIIMVIAFLICWLPYAGVAFYIFTHQGSDFGPIFMTIPAFFAKTSAVYNPVIYIMMNKQFR
cili N.lv...t.k.k..LrPlN.ilvNla.a#l.....g..........gyfG.....C..eG%...l.G.v.lwsl.vla.dRy.v!ckp.g..f.a.g.........f.....W..pPl.GWs.Y.peg...sC...w.....s%..f.c...........Pl.i....Y..l.........aE..v.rM!..M!..%l...........cW.PYaa........p..P...........faKss.%NPi.IY.f$Nk#fr
cnid N..vi..................s.a..d..........................C...gf........si.hl.....ery........................W.....w...Pl.GW..y..e.....C...w.....sY............l..%P.................m....i..%......................aWtPYa..............l.........fAK.s..nP...%......fr
vali N..V.......k..LRP.N...vNLA..Dl...................g.....C..yg%.....G..s...$..ia.dRY.v!..P......a...........W.....w...Pl.GW..Y.pEg..tsC..#w.....s%.............f.%Pl!I.%..Y..i..........E.....m...m!..F.............W.PYa.........p..P...........fAK.s.%NP!.IY......%R
mela N.lv...f...ksLrtp.N.fIiNLA.sDf.ms....P....s.....W.fG...C.lYaF.g.lfG..S..t$..Ia.DRY.v!t.Pl..s.r.i.v........W.ysl.Ws.pP.fGwg.YvpEG..tsCt.D%.t..r.%.$.f.FPl.i..cY..if.a!r....#.k.ak.........%.......................sW.PYa.!.lG..ltpy.P............AKSai.NPi.iYa..hpkfR
pter Ng.V!.!F..tKsLRTPsN$lV!NLA.sDf.MM..m.Ppm.nc%.t..w.lG...C#.Ya..Gsl.Gc.siwtm..Ia.DRYnvIvkg.p$t..Ali.........W.....W...P.fgwnRYVPEGn$TaCgtDYLt.srs%.ys.vYP$.I!%.Y.fIv.aV.aHEkE.rlAK.vAl.t.sLwf......................aWTPY..!n.G...tPl.ti............k.a...p..vy.ishp.yr
pero N..v...f.k........#....nLA..D.g!s..g.p....S.....W.%G.G.Cq.ygf.gf.fg..Si...t.!a.DRY..iC......$.............W...afWa..Pl.Gwg.YEP.g..t.Ctl#w......%...............%P.!m....Y..!..K.k.....tk............%l...........aW.PYa!..w..f..p.ip.$..........AKs...NP..!Y...#..fr
rho1.NFLTLYVTVQHKKLRTPLNYILLNLAVADLFMVFGGFTTTLYTSLHGYFVFGPTGCNLEGFFATLGGEIALWSLVVLAIERYVVVCKPMSNFRFGENHAIMGVAFTWVMALACAAPPLVGWSRYIPEGMQCSCGIDYYTPHEETNNESFVIYMFVVHFIIPLIVIFFCYGQLVFTVKEAAAQQQESATTQKAEKEVTRMVIIMVIAFLICWLPYAGVAFYIFTHQGSDFGPIFMTIPAFFAKTSAVYNPVIYIMMNKQFR

Anthozoa: Acropora millepora (stony coral) .. 1 opsin

A new cnidarian opsin appeared at GenBank on 13 May 2009 based on the 454 transcriptome survey SRA003728 of 5-day-old planula larvae in Acropora millepora. The entry EZ013658 generates 244 alignable residues (after a few N's are manually corrected using genetic code redundancy) of a melanopsin-type protein; unfortunately it terminates 20 residues short of the expected Schiff lysine (so 38 residues short of a full motif).
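Correcting N's by genetic code redundancy amounts to asking, for each codon containing an N, whether every possible base at that position yields the same amino acid. The Python sketch below uses the standard genetic code; the function names are illustrative, and the manual procedure actually applied to EZ013658 may have differed (for instance by also consulting homologs).

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...)
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    ''.join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def translate_codon(codon):
    """Amino acid for a codon; 'N' bases are resolved only if unambiguous.

    Returns 'X' when the possible completions encode different residues.
    Only A, C, G, T and N are handled in this sketch.
    """
    choices = [BASES if base == 'N' else base for base in codon.upper()]
    residues = {CODON_TABLE[''.join(c)] for c in product(*choices)}
    return residues.pop() if len(residues) == 1 else 'X'


def translate(dna):
    """Translate in frame 0, ignoring any trailing partial codon."""
    usable = len(dna) - len(dna) % 3
    return ''.join(translate_codon(dna[i:i + 3]) for i in range(0, usable, 3))


protein = translate("GCNAANCTN")
```

In the example, GCN and CTN resolve to alanine and leucine whatever the third base is, while AAN could be asparagine or lysine and so stays as X for resolution by other evidence.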

Extending this sequence to the end of the seventh transmembrane segment is a high priority. That would likely raise the percent identity (excluding tails) over 40%, far beyond agreement of opsins to generic GPCRs. Even as it stands, the sequence conserves early signature residues of melanopsins within opsins and opsins within GPCR and cleanly clusters with melanopsins at the opsin blastp classifier. With additional orthologs from other cnidaria (Nematostella genome lacks one), a better ancestral sequence could be worked out at the divergence node with bilatera. This sequence in turn may be as close as we get to the origin of melanopsic photoreception.

On 29 May 2009, blastn of the 454 Short Read Archive became available at NCBI. While EZ013658 matched 3 reads quite well, it was not fully tiled. These reads allowed various errors to be corrected but best of all, 56 extra amino acids to be added C-terminally. These included a standard Schiff motif KTASVYNPIIYFFSYKSFR. It remains a bit mysterious as to where the central region of EZ013658 came from in its assembly.

The best matches within cnidaria are to Nematostella K-rhodopsins classifying to TMT/ENCEPH, even while being more distant to Nematostella opsins classifying as melanopsins (eg BR000662), suggesting these opsin homolog classes converge in cnidaria. This means their signaling partners cannot be safely inferred from sequence alone; indeed Galpha programs themselves have complicated lineage-specific expansions.

A sub-sequence of EZ013658 was correctly identified as melanopsin-related, though not linked to circadian rhythm, in a brief April 2009 study of 24 circadian rhythm genes and the extraordinarily light-driven reproductive timing of broadcast-spawning corals (which may utilize cryptochrome photoreceptors rather than opsins).

Special care must be taken in cnidarians to maintain a rigorous definition of opsin photoreceptor candidates and not digress to deeply diverged non-opsin GPCRs lacking ability to bind chromophore and -- as far as we know -- any relevance to photoreception. Even with K-rhodopsins, it remains quite possible that some are merely involved in light-driven catabolism or rearrangement of dietary beta-carotenoids.

Alignment of Acropora 454 transcript with human melanopsin  
Blue shows the first two cytoplasmic domains; red invariant disulfide and trigger motif; magenta Schiff lysine end motif.

acrMil   1 HHTISFLYFLLALFSFSLNSVVILTFLLDRSLLFPANLIILSIAISDWLMSVVPNIMGGVANASNDLPFTDWSCTVFAFVATLLGLSNMLHHAAFALDRYMVITRPMRANH--SMTRILA 118
           H+T+  +  L+ L     N  VI TF   RSL  PAN+ I+++A+SD+LMS     +   ++      F +  C  +AF   L G+S+M+   A ALDRY+VITRP+      S  R
homSap  70 HYTLGTVILLVGLTGMLGNLTVIYTFCRSRSLRTPANMFIINLAVSDFLMSFTQAPVFFTSSLYKQWLFGETGCEFYAFCGALFGISSMITLTAIALDRYLVITRPLATFGVASKRRAAF 189

acrMil 119 VIAFLWCFALTWSLFPLVGWSAYVREAGDIACSVNWQSDNPSDSSYMVCLFFFFYFVPLAIIVYCYVFMIRSVRFMTKNAQKIWGV-----RSAAALETVQATWKMAKIGLIMVLRFFVAWTPYAVVSFIIAF 244
           V+  +W +AL WSL P  GWSAYV E    +CS ++ S  P  R+Y + L  F +F+PLII+ YCY+F+ R++R  T  A + +G       S    + +Q+  KMAKI L+++L F ++W PY+ V+ ++AF
homSap 190 VLLGVWLYALAWSLPPFFGWSAYVPEGLLTSCSWDYMSFTPAVRAYTMLLCCFVFFLPLLIIIYCYIFIFRAIR-ETGRALQTFGACKGNGESLWQRQRLQSECKMAKIMLLVILLFVLSWAPYSAVA-LVAF

acrMil DSVKDIPT-IAEIVPSMFAKTASVYNPIIYFFSYKSFRESLVK 288
            + T     VP++ AK ++++NPIIY  ++  +R ++ +
homSap AGYAHVLTPYMSSVPAVIAKASAIHNPIIYAITHPKYRVAIAQ 363

>MEL1_acrMil Acropora millepora (stony coral) 454 transcriptome shotgun assembly EZ013658 + 454 blastn, frag 40%/63% ENCEPHc_nemVec; 35%/57% MEL1_homSap
HHTISFLYFLLALFSFSLNSVVILTFLLDRSLLFPANLIILSIAISDWLMSVVPNIMGGVANASNDLPFTDWSCTVFAFVATLLGLSNMLHHAAFALDRYMVITRPMRANHSMTRILAVIAFLWCFALTWSLFPLVGWSAYVREAGDVACSVNW  
QSDNPSDTSYMVCLFFFFYFVPLAIIVYCYVFMIRSVRFMTKNAQKIWGVRSAAALETVQATWKMAKIGLIMVVGFFVAWTPYAVVSFIIAFDSVKDIPTIAEIVPSMFAKTASVYNPIIYFFSYKSFRESLVKSWRRYRNRNNVWPL

Hydrozoa: Hydra magnipapillata (hydra) .. 0 opsins

Because opsin photoreception is quite ancient, clearly pre-Bilaterans have a major role to play in illuminating the origins of photoreception systems. What's not so clear is that the two cnidarians chosen so far for genome projects are optimal in this regard.

Hydra does not have overt photoreceptive structures or cells obviously specialized for light detection yet it exhibits marked behavioral photosensitivity (noted by Trembley in 1744). Studies beginning in 2000 flagged the ectoderm (using antibodies to squid rhodopsin), known to contain epidermal sensory neurons, as responsible for extraocular photoreception. Musio and coworkers sought to recover opsins using degenerate primers, targeting melanopsin and peropsin as the most plausible in Hydra because the latter opsin seems not to require advanced relationships with neighboring cells or auxiliary enzymes (ie, acts as photosensor and its own photoisomerase).

The Hydra cdna CB073527 was proposed as a peropsin based on best-blast to mouse peropsin. However comparison against a much larger collection of demonstrably orthologous chordate peropsins in the Opsin Classifier conflicts with this interpretation: the putative cnidarian gene needs to consistently associate with this gene family (equivalently, have best match to it among all reconstructed ancestral opsins) but does not. Furthermore the best match is very weak at 31%. This is the signature of generic non-opsin rhodopsin superfamily members (which we expect any eumetazoan to have by the hundreds).

With the imminent availability of the Hydra genome assembly, the 161 amino acids of the fragmentary transcript can be extended for example with trace 1121878952 to apparent full length (309 aa) and its introns determined (none). This does not improve its best-blast score nor family coherence. It does not cluster consistently with ciliary opsins -- what would the signaling partner be when the matches are scattered between Gt, Go, and Gq opsins? The blast probability of 1.6e-34 does not mean much under these circumstances.

Opsin hydra doubtful.png

Two putative hydra opsin fragments can be extracted from Fig.1 of an Oct 2007 paper, AKSSTIINPTISCIIYKE and AKLSAVLNALVNCYINKS. These too fail to extend to convincing opsins. Expression centers around the hydropore, a better fit to GPCR chemoreceptor localization. There is meagre behavioral evidence for photoreception near the hydropore and no possibility of a pigment-backed eye. An ultrastructure study is needed to demonstrate that the putative opsin is expressed in specialized photoreceptor cells. It is critical to include non-opsin GPCR as alignment controls -- all GPCR proteins bind heterotrimeric G proteins and many have lysine without binding of retinal. The cdna accession numbers are Hydra1 CN554949 and Hydra2 CV15164.
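The need for non-opsin controls can be made concrete with a simple pattern scan. The Python sketch below searches for a lysine followed five residues later by NP or NA; that pattern is an ad hoc assumption read off the two fragments above, not a published opsin signature, and the control string is the GPCR176 end motif quoted elsewhere on this page.

```python
import re

# Hypothetical pattern: lysine, any five residues, then NP or NA.
# The lookahead lets overlapping candidate sites all be reported.
SCHIFF_LIKE = re.compile(r"(?=K.{5}N[PA])")


def schiff_like_sites(protein):
    """0-based positions of every lysine that starts a Schiff-like pattern."""
    return [match.start() for match in SCHIFF_LIKE.finditer(protein)]


hydra1_sites = schiff_like_sites("AKSSTIINPTISCIIYKE")
hydra2_sites = schiff_like_sites("AKLSAVLNALVNCYINKS")
control_sites = schiff_like_sites("KVSLLANPVLFLTVNKSVR")  # GPCR176, a non-opsin
```

Both hydra fragments match at their second residue, but so does the non-opsin control at its first, which is exactly why a pattern of this kind cannot establish a candidate as an opsin by itself.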

These papers highlight the special difficulties in working with cnidarian opsin candidates. We know from the outset that they will be quite diverged from bilateran opsins. Multiple forms of supporting data are needed, preferably in the form of diagnostic introns, alignments demonstrating conservation of critical residues and structures, in situ hybridization to anatomically plausible neuronal photoreceptors, and specific loss of photobehavior upon knockdown.

A higher standard of proof is needed for the first cnidarian opsins because validated ones will surely be used to pull in further homologs via annotation transfer. There is a definite risk in admitting inadequately documented opsins to the Opsin Classifier because once that database is tainted, it could draw in even more non-opsins from the GPCR world.

CnidBase provides a blast service to cnidarians including hydra but this appears restricted to ESTs and so only duplicates GenBank. GenBank carries contigs of the genome assembly on 25 May 2009 in the wgs division. Some 10.2 million Hydra traces have been provided by JCVI, ample for the 1290 Mbp estimated genome size. However the hydra genome project was dropped from that website. The draft genome expected in Dec 2005 has not surfaced, possibly because high AT content complicated assembly. A 'hydrazome' blast server and funky browser for assembly v2.0.4 have surfaced along with Gnomon gene predictions at NCBI nr.
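Whether 10.2 million traces are ample for a 1290 Mbp genome depends on read length, which is not stated here. As a rough worked example in Python, assuming a mean usable read length of 700 bp (an assumption typical of Sanger-era traces, not a figure from the project):

```python
def fold_coverage(read_count, mean_read_length_bp, genome_size_bp):
    """Expected average depth: total sequenced bases divided by genome size."""
    return read_count * mean_read_length_bp / genome_size_bp


# 10.2 million traces and 1290 Mbp are from the text; 700 bp is assumed
coverage = fold_coverage(10.2e6, 700, 1290e6)
```

Under that assumption the depth comes to roughly 5.5x, comparable to the 6x quoted for Nematostella; a shorter or longer mean read length scales the figure proportionally.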

The best current opsin search strategy simply uses tblastn of the usual three NCBI databases (nr, est_others and wgs) restricted to Hydra. After genes are recovered, their best-blast to all of nr must be considered. The outcome first lists other cnidarian 'opsins' and then mixed bona fide bilateran opsins. That's encouraging because generic GPCR come later but does not prove by any means that any of the Hydra genes are true opsins, even those with lysine in Schiff base position and having other signature residues.

The basic problem is Hydra has far too many opsin-like genes (some two dozen) for its complete lack of anatomical photoreceptive structures and minimalist phototactic behavior. It is not plausible that such a simple organism could have a larger opsin repertoire than an amniote with four color imaging vision and numerous pineal and deep brain photoreception. All the Hydra genes are intronless, suggesting they have arisen, much like the olfactory gene expansion in mammals, as processed retrogenes.

If not opsins, if their function is chemoreception, why then is the Schiff lysine conserved? Note first that it is not currently known whether a retinal species occupies any of the Schiff base sites nor whether any cis-trans photoisomerization with ligand release takes place. These opsin-like genes have retained many key residues for signalling so very likely follow standard GPCR mechanisms (though with unknown Galpha partnering) and are certainly not pseudogenes.

Among the many possible scenarios: the opsin-like genes have other primary agonists. The genes were perhaps derived long ago from an opsin and retain the Schiff lysine and (undetermined) counterion through evolutionary inertia. That is, mutational loss of the lysine + counterion is a two-step process. Initial loss of either leaves a defective salt bridge in the extreme hydrophobic milieu -- negative selection would inactivate the gene before the second component could be favorably mutated. Here though it needs to be recognized that generic GPCR do not have these charged internal residues (ie transitioning is possible since all GPCR presumably coalesce to a single ancestral gene). This argues for non-retinal agonist binding -- after all, what is so special about the terminal aldehyde in a retinal? Indeed assimilation of beta-carotene and its variants generates a large number of them.

A few Hydra opsin-like intronless genes with apparent disulfide, DRY motif, Schiff lysine and switch motif:

>XM_002163291_hydMag Hydra magnipapillata no introns
MASLFIIVLLSCLCGLSVTLNVTAVIAIISTKKNKDVCDVILMSLAVSDGVECIFGFSVELYGYATKGKTLQNETLCKINGFIVMYLALTSISHLVCLCLYRYILIV
HSLKAQRYLTNVKQSALYFIIPSWIYGLFWSIAAISGWNEIIREKVDTHRCTINMSPDDELKRSYLYSLTVFCFLVPVVIIIYCSLKVHLKLQNMWKLCVQISGEYAAITK
ATYKLERKHFIFLGLIIGSFFVVWTPYALCVFFLALQIKLPRVLLTYSALFAKSSTIINPTISCIIYKEYFQILRIKVQKLFRNNIVSPANL*
 
>XM_002157121_hydMag Hydra magnipapillata no introns
MDGVKFTLITIISVAIFTNSVSLYFLFKKQKKNNYVILCINLSLSDLLQSIAGYIPALFIDAKLQKATMLCKLSAFFIAFPSFTTIAMLTSMALSRMVLLSTCFHCNQ
INYKILFRKIGFISWIYGFIWAVFPLFGFSSYTLEGTHSRCSIDFSPKTIADKVYLIMIVAFGFLIPVMSILISCIYTAKVMRSKYKFFYVTYGKENVETKRYKEKEKKA
FSSFVLMVTSFIICWSPYATIGCLSAFTLTRIPKWLLHSAAFLGKLSAVVNPFIYYWKDGLFKKRFTSTTWNASYFISKSQQNIEDNQKNSRGALRNYMNVFCG*

>ABRM01010711_hydMag Hydra magnipapillata no introns 72% XM_002157121
MDAVKTTLLIITSIAVLTNSVSLYFLYKKQKKNNYVILCINLSASDLLQSIAGYIPALFFDANMQKATTLCKLSAFFIAFPSFTTIAMLSAMAISRMVLLGTCFHSTQINYKKLFIRIGI
ASWIYGFIWAVLPLLGFSSYTVENTRSRCSINFSPKTSVEKLYLIILMAFGFFIPIISILASCLFTARVINTKYKYFCVTYGKENVETKRYKKKEKKTFLSFIIMVLSFVVCWTPYATVG
CFSAFTSLKIPKWLLHVAAFFGKLSALVNPFIYYWKDGLFKKCFLNIRFKTTKLLIRKSQENSQK

>ABRM01010885_hydMag Hydra magnipapillata no introns
MDPVKITLVIIITIAIFTNTISLYFLYKKRKSNYVILCINLSFADLLQSIAGYIPALFLDTNIQKATTLCKLTAFFVAFPSFSAIAMLTGMALSRMVLLSSCFSSNLINYKKLFIKIGI
FSWIYGLFWAFLPLIGFSSYTVEATHSRCSMNFSPKNIIEKAYLILIFAFGFCIPVTIIITSCLFTAHVIVTKYNYFYVTYGKENVETKLFKEKEKKAISSFLLMVLSFIVCWTPYATIG
CFTAFTSVKTPNWLLHVAAFFAKLSALVNPVIYYWKDGLFKKRSVRKKSITKSLLIIKSEIN

>ABRM01031940_hydMag Hydra magnipapillata no introns 
MNAVKTALLVISSTATITNAISLYFLYRKHKNNYVILCINLSFSDLLRSIAGYIPALFLETNLNRASTLCKLSAFFIAFSSFTTIAMLTAIALSRMVLLSTCFLHSQVNYKTLFIKIGL
LSWIYGFTWATMPFLGFSSYTLENTNSRCSIDFFPKTKTEKVYLILLIAFGFLIPIITIITSCLYTANVMRSQYNYFYMIYGKNNVETKKYKVKEKKAFSSFLLMVLSFIVCWTPYATVGC
FSAFTSLVIPNWLLHFAAFFGKCSALVNPVIYYWKDSLLNHYFKS

>ABRM01023697_hydMag Hydra magnipapillata no introns  
MDAEQILLTFIALMAIIGNIAALYCLAKRRVNNYIMLCVNLSVSAEIQSIFGFLPTLFLEKGSKKTTLLCRLAAFFSVFPSFTSISILTAVAISRMFLLEKPFLSNHGCYRSLFYKIGM
ASWVYSFVWASLPLLGFSPYTLEATGSRCSINWTPKtVSDKVFLILLLVFMFFLPLIIILVSCFYTARVIHLKLTYFSSTYGKDNIETKRFKKKENKAVLSLLIMVLSFLLCWTPYATIALL
SAFTSITTPILLLKVAALFAKVSAVINPIIYCTKENVFYNITIAFKLRNASLITRSRNMENKNNAL

...

Hydrozoa: Cladonema radiatum (jellyfish) .. 0 opsins

Opsin cladonema.png

A Dec 2007 paper reports 20 mRNA opsin candidates for the lens-eyed hydrozoan jellyfish, Cladonema radiatum. These generally classify as ciliary opsins and the ones tested are expressed somewhat appropriately. However even the best alignment to validated opsins is very weak, no better in percent identity than many non-opsin GPCR (here 13 were used as outgroup without rationalization). On the other hand, back-blastp to GenBank nr shows no non-opsin GPCR among the best matches.

The authors did not establish the existence of retinal in this species nor show 11-cis retinal covalently bound to any candidate. They note Schiff base lysine occurs in correct homologous position but do not comment on the absence of counterion at traditional positions 113 or 181 (bovine RHO1 numbering). That lysine is necessary for an opsin but not sufficient (it might arise here from primer bias). Non-opsins such as GPCR176 can have lysine at this position as well, making it only semi-diagnostic:

CropN1      KFSVVSNPIVYVIFYKDFR
            K S+++NP++++   K  R
GPCR176     KVSLLANPVLFLTVNKSVR NP_009154

CropN1      KFSVVSNPIVYVIFYKDFR
            K + + NP++YV   + FR  
LWS_homSap  KSATIYNPVIYVFMNRQFR
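The two comparisons above can be reduced to identity counts over the 19-residue end motif. A minimal Python sketch, using the three strings exactly as printed above (the function name is illustrative):

```python
def count_identities(a, b):
    """Number of columns where two equal-length aligned strings agree."""
    if len(a) != len(b):
        raise ValueError("strings must be the same aligned length")
    return sum(1 for x, y in zip(a, b) if x == y and x != '-')


CROPN1 = "KFSVVSNPIVYVIFYKDFR"      # Cladonema candidate
GPCR176 = "KVSLLANPVLFLTVNKSVR"     # non-opsin GPCR (NP_009154)
LWS_HOMSAP = "KSATIYNPVIYVFMNRQFR"  # validated human opsin

to_non_opsin = count_identities(CROPN1, GPCR176)
to_opsin = count_identities(CROPN1, LWS_HOMSAP)
```

The candidate shares 6 of 19 positions with the non-opsin and 7 of 19 with the validated opsin, a difference too small to carry any diagnostic weight.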

The number of 'opsins' is excessive given numbers of validated opsins that occur in complete bilatera genomes, even allowing for eyespot developmental stage variations and auxiliary functions such as gonad photoreceptive gamete release. Recent gene family expansions might make more sense for chemosensory or chemokine receptors than opsins.

In this view, 1-2 bona fide opsins might lurk among the collection -- but which ones? Opsins evidently experienced various gene duplications which appear subsequently co-opted to (unknown) non-opsin, non-photoreceptive roles. This causes them to nest confusingly within the opsins even though they are no longer photoreceptors. There may have been selection to maintain the buried lysine because it worked structurally at the time of duplication (offset perhaps by chloride ion) or is used for a new but chemically related signalling agonist. The same phenomenon (gene duplication and neofunctionalization) may have occurred within Nematostella and Daphnia, which both have 'too many' opsins for their imaging needs. The Daphnia non-opsin opsin gene family expansion is not even broadly shared within the Crustacean clade.

The confusion really arises from the notion of "terminally diverged" opsins first studied within Bilatera. That is, within deuterostomes, we observe a sequence of gene duplication and divergence say encephalopsin --> pinopsin --> LWS --> other cone opsins but that stays within opsins. Even that sequence had largely terminated 500 million years ago at time of lamprey divergence (primate color vision recovery is an exception). An analogous sequence is familiar within Arthropoda for example melanopsin --> MWS --> UVS. We don't observe the sequence encephalopsin --> pinopsin --> LWS --> bradykinin receptor in human nor melanopsin --> MWS --> glycoprotein hormone receptor in drosophila. Nor do we expect any such 500 million years in the future.

In Bilatera, it seems once an opsin, always an opsin. Gene duplication still occurs but opsins are apparently so deeply dug into their hole of specialization of function and tissue expression that a gene duplicate cannot be retained unless it can carve out a niche for itself as a limited variant of photoreceptor opsin, say new color sensitivity or polarization detector. Lens crystallins prove that genes often have pre-existing multiple disjoint functions (glycolytic enzyme, refractive index supplier) and so at duplication the niche is already there, merely awaiting partitioning of expression after which sequences can optimize to their respective niches or just drift. Opsins in contrast are single-purpose.

Opsins may not have been so committed in early diverging ancestral cnidaria with less elaborated photoreceptive systems and metazoan cell type specializations. Not so terminally diverged, opsin gene duplicates may have retained overall GPCR signaling capacity but for some other agonist than cis-retinal. After all, shifts in outside molecular trigger happened frequently in generic GPCR evolution, accounting for their vast diversity of functionality despite minimal departures from the universal hepta-transmembrane structure. Ready shifts in agonist are not a design flaw but rather a design feature. Variation in agonist may be tolerated, especially in ancestral GPCR, and GPCR gene duplication and divergence may be coupled to that of a peptide agonist.

Opsins seem unique in that cis-retinal is covalently attached, whereas other GPCR agonists diffuse in transiently to their binding site. This makes it difficult to see what the 'next' agonist could be in a duplicated non-opsin opsin (other than something very similar like vitamin A variants seen in some teleosts). However it's been argued that the photoisomerization product, trans-retinal, is really the agonist. That's non-covalently bound similarly to other GPCR effectors.

If so, that would make agonist shift in a duplicated ancestral metazoan opsin no different from other shifts taking place in other duplicated GPCR. Indeed, opsins arose from other GPCR; the ur-GPCR was not necessarily a retinal binder. Still, we wonder what the new agonists could be in this cloud of cnidarian opsin-like proteins and what signalling is accomplished where. Hybridization in Cladonema suggests the site of signalling has not moved appreciably.

It's worth re-examining the notion of "once an opsin, always an opsin" even in Bilateran history. First we wonder about the cloud of lineage-specific opsin-like duplications in the crustacean genome of Daphnia. Using bovine RHO1 as query against human genome turns up a dozen non-opsin GPCR exhibiting better matches than the bona fide opsin RGR. Very likely these would nest within opsins with respect to RGR. This suggests that certain non-opsins occur inside the broader photo-opsin family. However here it is not so certain that RGR is a degenerate photoreceptor opsin, today 'merely' a retinal photoisomerase in boreoeutherian placentals retaining the Schiff base lysine but not covalently binding cis-retinal. A similar question arises with neuropsin and peropsin.

                                               RHO1_bosTau        KTSAVYNPVIYIMMNQKFR query
NP_001044 somatostatin receptor 5 [Homo sapiens]            1e-26 NSCA--NPVLYGFLSDNFR non-opsin GPCR
NP_001048 tachykinin receptor 2 [Homo sapiens]              2e-25 MSSTMYNPIIYCCLNDRFR non-opsin GPCR 
NP_000900 neuropeptide Y receptor Y1 [Homo sapiens]         1e-23 MISTCVNPIFYGFLNKNFQ non-opsin GPCR
NP_000721 cholecystokinin A receptor [Homo sapiens]         1e-23 YTSSCVNPIIYCFMNKRFR non-opsin GPCR
NP_000903 opioid receptor, mu 1 isoform MOR-1 [Homo sapie   3e-22 YTNSCLNPVLYAFLDENFK non-opsin GPCR
NP_004212 G protein-coupled receptor 50 [Homo sapiens]      1e-21 YFNSCLNAVIYGLLNENFR non-opsin GPCR
NP_006047 neuromedin U receptor 1 [Homo sapiens]            1e-20 LGSAA-NPVLYSLMSSRFR non-opsin GPCR
NP_001471 galanin receptor 1 [Homo sapiens]                 1e-20 YSNSSVNPIIYAFLSENFR non-opsin GPCR
NP_005949 melatonin receptor 1A [Homo sapiens]              1e-20 YFNSCLNAIIYGLLNQNFR non-opsin GPCR
NP_000614 bradykinin receptor B2 [Homo sapiens]             4e-19 YSNSCLNPLVYVIVGKRFR non-opsin GPCR
NP_002912 retinal G-protein coupled receptor RGR_homsap     4e-18 KMVPTINAINYALGNEMVC opsin RGR_homsap
NP_003292 thyrotropin-releasing hormone receptor [Homo      3e-18 YLNSAINPVIYNLMSQKFR non-opsin GPCR
NP_005152 angiotensin II receptor-like 1 [Homo sapiens]     7e-18 YVNSCLNPFLYAFFDPRFR non-opsin GPCR
NP_000901 neuropeptide Y receptor Y2 [Homo sapiens]         9e-18 MCSTFANPLLYGWMNSNYR non-opsin GPCR
NP_000570 chemokine (C-C motif) receptor 5 [Homo sapie      2e-17 MTHCCINPIIYAFVGEKFR non-opsin GPCR
NP_000670 alpha-1B-adrenergic receptor [Homo sapiens]       3e-16 YFNSCLNPIIYPCSSKEFK non-opsin GPCR
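
The table can be tallied directly rather than by eye. The sketch below is only an illustration: it copies the accession, E-value and aligned segment of each row, then counts the hits scoring better than RGR and lists the hits with lysine in the first alignment column. Treating that first column as the Schiff base lysine position of the query is our assumption here, and the function names are hypothetical.

```python
# Rows copied from the table above: (accession, E-value, aligned segment).
HITS = [
    ("NP_001044", 1e-26, "NSCA--NPVLYGFLSDNFR"),
    ("NP_001048", 2e-25, "MSSTMYNPIIYCCLNDRFR"),
    ("NP_000900", 1e-23, "MISTCVNPIFYGFLNKNFQ"),
    ("NP_000721", 1e-23, "YTSSCVNPIIYCFMNKRFR"),
    ("NP_000903", 3e-22, "YTNSCLNPVLYAFLDENFK"),
    ("NP_004212", 1e-21, "YFNSCLNAVIYGLLNENFR"),
    ("NP_006047", 1e-20, "LGSAA-NPVLYSLMSSRFR"),
    ("NP_001471", 1e-20, "YSNSSVNPIIYAFLSENFR"),
    ("NP_005949", 1e-20, "YFNSCLNAIIYGLLNQNFR"),
    ("NP_000614", 4e-19, "YSNSCLNPLVYVIVGKRFR"),
    ("NP_002912", 4e-18, "KMVPTINAINYALGNEMVC"),  # RGR_homsap
    ("NP_003292", 3e-18, "YLNSAINPVIYNLMSQKFR"),
    ("NP_005152", 7e-18, "YVNSCLNPFLYAFFDPRFR"),
    ("NP_000901", 9e-18, "MCSTFANPLLYGWMNSNYR"),
    ("NP_000570", 2e-17, "MTHCCINPIIYAFVGEKFR"),
    ("NP_000670", 3e-16, "YFNSCLNPIIYPCSSKEFK"),
]
RGR = "NP_002912"


def hits_better_than(hits, accession):
    """Accessions whose E-value is strictly lower (better) than the named hit's."""
    cutoff = next(e for acc, e, _ in hits if acc == accession)
    return [acc for acc, e, _ in hits if e < cutoff]


def hits_with_lysine_first(hits):
    """Accessions whose aligned segment starts with K (assumed query-lysine column)."""
    return [acc for acc, _, seg in hits if seg.startswith("K")]


if __name__ == "__main__":
    print(len(hits_better_than(HITS, RGR)), "hits score better than RGR")
    print("lysine in first column:", hits_with_lysine_first(HITS))
```

On the rows shown, eleven non-opsin receptors score better than RGR, and RGR is the only hit with lysine in the first column.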

No genome project is planned so the mRNA cannot be examined for diagnostic intronation; this could be done indirectly if candidates help locate orthologs in other cnidaria. However they are already very diverged from Nematostella and Hydra opsins. Introns would not be informative anyway in discriminating opsins from co-opted non-opsin, non-photoreceptors arising as segmental duplications. Indels would be similar. This presents a very difficult bioinformatic problem because truly diagnostic residues of functioning opsins could be quite subtle.

Similarly on the experimental side, antibodies would likely cross-react. The derived non-opsins likely still signal with a transducin-type G-protein and are quenched by the same arrestin, so no help there. Knockdowns might have unforeseen consequences in these non-opsins, indirectly disrupting photosensitive behaviors. Thus the best way forward in identifying the true opsins within the collection is probably in vitro expression, reconstitution with cis-retinal, and demonstration of photoisomerization.

Placozoa: Trichoplax adhaerens .. 0 opsins

The sequenced genome of Trichoplax adhaerens appears in the 21 August 08 Nature and associated 98 pages of supplemental material, an excellent treatment as first assembly articles go but still insufficient detail to replicate various bioinformatic assertions. The genome and browser are hosted at JGI but it is equally convenient for most purposes to simply use NCBI tblastn targeted to the WGS contig division restricted to Trichoplax.

The authors identify 4 genes as photoreceptive opsins, based on GPCR blast character, Schiff base lysine in expected position (though that position was not described), and unpublished accounts of phototactic behavior. However notation used in the document (scaffold8_1356180_1396180 = Opsin 6, scaffold13_1543556_1583556 = Opsin 26, scaffold8_1308974_1348974 = Opsin 30, scaffold8_1467859_1507859 = Opsin 31) does not match that provided at GenBank nor at the JGI genome. Gaps in assembly are indicated by N's which may or may not be counted in coordinate numbers. Opsins with these scaffold numberings do not surface by tBlastn of GenBank WGS where the Trichoplax contigs reside. The authors have not responded to an email request to furnish the fasta sequences.

In light of similar claims made for Hydra, Cladonema and Nematostella opsins that didn't bear up to scrutiny, given the lack of photoreceptor structures and cell-specific expression staining in Trichoplax, the burden of proof for opsins has not yet been met. Nothing in the Trichoplax genome meets the best-blastp threshold (exceeding the 25-35 percent identity range of generic GPCR matches) of the opsin classifier. Bona fide opsins also require a counterion E at ancestral position, a DRY motif, a binding motif for alpha subunit of heteromeric G protein, and about 150 semi-invariant residues that distinguish photoreceptors by ortholog class (eg ciliary vs rhabdomeric) and separate them from non-photoreceptive 'EK-rhodopsin-class' generic GPCR.

The only validated pre-Bilateran opsin to date occurs in the imaging eye of the cubozoan Tripedalia cystophora where a single ciliary opsin sequence meets all expression and classification tests. This opsin has no noteworthy matches in Trichoplax. Cnidarian larvae also have a rhabdomeric structure implying that class of opsin as well.

Examples exist of non-opsin non-photoreceptive GPCR that contain lysine at the retinal position, ie this lysine is necessary for an opsin but far from sufficient. They are easy to collect using blast of consensus opsin sequence ending somewhat past that lysine. Cnidaria are especially rich in these lysine-containing non-photoreceptors. Indeed the 'opsins' reported for Hydra, Cladonema and Nematostella greatly exceed the (non-observed) photoreceptive structures and organismal needs -- which surely aren't twice those of the most advanced bilatera.

Nematostella: no photoreceptor structures but 16+ GPCR genes with lysine in 'retinal position':

>tpd|BR000664     ID: 44/121 36 FLFAWTPYAVVSLWTTFGDTHRIPALLGVLPSLFA K LSSCYNPIIYFFMYTKFR 39% ENCEPH4_braFlo  *
>tpd|BR000663     ID: 48/121 39 FLFAWVPYAVVSLYASFGGVTTIPKLMSTLPAMLA K TSACYNPIIYFFMYSKFR 35% TMT5_braFlo     *
>ref|XM_001637473 ID: 35/115 30 FLACWLPYVIVSTCMSLGRRPQISLLTLEITLLVA K SGVIYNPFIYAALNLRFR 33% MEL_schMed      *
>tpd|BR000666     ID: 32/81  39 FLIAWLPYAIVSLYSAITGE.RVSPEAATIPGMLA K SASCYNPLIYVFLYSRFR 33% TMT_ornAna      *
>tpd|BR000667     ID: 38/121 31 FLVAWSPYAITSLYISWTGIQTIPDIARILPPMFA K AYSCYNPVIYYGMSRKFR 29% MEL2_xenLae     *
>tpd|BR000671     ID: 28/66  42 FVISWSPYCVVSVMAMVHRAPSLPRGLAEIPELMA K ASVIYNPLVFTVMNIEFR 32% VAOP_takRub
>tpd|BR000685     ID: 30/75  40 FTLSWSPYAIVCLKSMVVGEQKLAPFTSEVTALMA K ASAVYNPIVYVVLSKRFR 35% MEL1_schMan
>ref|XM_001634921 ID: 37/107 34 FLVAWTPYAVISFYSITGHARDLSFISVVLPAIFA K TSAFYNPIITLVFWFCLR 37% RHO1_letJap     frag
>tpd|BR000662     ID: 36/107 33 FVFAWTPYAVVSIYS.ALLKPKLPLIAGILPPLFA K TSTLYNPLLCFIGSPPIR 31% ENCEPH4_braFlo 
>tpd|BR000672     ID: 25/75  33 FVISWSPYCVVSVIAMVQRAPALSQGFAEIPELMA K ASVVYNPLVFTVMNRGFR 30% MEL_TMT
>ref|XM_001627232 ID: 37/115 32 FFISWSPYCIVSLIESAKGEVVLSPGVSMIPELMA K ASVMYNPVVYTLMNARFR 31% MEL_TMT
>ref|XM_001639117 ID: 37/118 31 FTLSWSPDAILSVI....SMVTGRPIIHVVPSLMA K SSVIYTPIVYIAFSRSFR 34% VAOP_danRer
>tpd|BR000661     ID: 38/121 31 FLIAWTPYAVVSFYYSLRGPTGVPLMAAMLPSLFA K ASSLFNPIIYFAMSREFR 31% MEL2_galGal
>tpd|BR000686     ID: 35/114 30 FAITWLPYAVYVLIAAFGGSHLFDAVMSVVPAMVA K MSILYNPFVYALVNPRFR 31% TMT_triCas
>tpd|BR000674     ID: 36/121 29 FLACWLPYVIVSTCMSLGRRPQISLLTLEITLLVA K SGVIYNPFIYAALNLRFR 35% MEL2_strPur
>tpd|BR000669     ID: 35/121 28 FSM.WVPYVAVSLIQAFTADSIITPTASHITVLVA K SCVIYNVLIYVVLNRKLK 34% TMT5_braFlo


KopsinTree.jpg

The tree at left (which would benefit from a few dozen outgroup genes such as beta adrenergic receptor) shows how some related sequences clustered in a recent evolution study by Plachetzki et al who sought but did not identify opsins in sponge, Trichoplax or Monosiga. It's not known if any of their 'cnidops' bind retinal or have a photoreceptive role. It is possible that retinal is bound in Schiff base position but requires a secondary agonist to signal. Alternatively, signaling compounds related to but distinct from retinal assume the binding site.

A third scenario conserves the lysine and its counterion through evolutionary inertia (the salt bridge avoids unfavorable burying of charged residues in hydrophobic membrane and maintains the trigger) without Schiff base, with the diversity of genes perhaps serving chemoreception. Such a class of GPCR in early cnidaria could have been the foundational source for later recruitment to opsins in cubomedusa.

An analogous situation in sea urchin (rapid expansion of an intronless opsin-like class of genes) was described by Raible and coworkers, who denoted them specific rapidly expanded lineages of GPCRs (surreal-GPCRs) and noted a parallel to olfactory gene expansion in amniotes. In other words, prior to dead-end specialization as imaging opsins, earlier gene duplicates could find novel but useful roles. That's not so different from even earlier diverging 'rhodopsin-class' GPCR taking on roles such as beta adrenergic receptors.

In this scenario, photoreceptive opsins have a basal outgroup originating in cnidaria (or Trichoplax) of EK-rhodopsin class non-opsins and a distinct class of EK-rhodopsin class non-opsins paraphyletically nested within them in sea urchin and possibly amphioxus GPCR (where genes are still conservatively intronated). These latter two species also have 'too many' opsins versus a paucity of photoreceptor cell morphologies especially in urchin.

Gene duplication (and loss) of opsins continues as an active process to the present day in many species. We're not surprised to see LWS duplicate to MWS in primate cones. However we don't expect LWS to give rise to processed retrogenes serving olfaction or non-retinal signaling. Those days could be behind it. LWS may be terminally differentiated and too many mutational steps away from more versatile deployment. If so, evolutionary processes do not remain constant; some genes mature over time into locked-down roles with less potential for novelty in their duplicates than their ancestral predecessors. We see from the context of Ciona that GPCR have a very long history of acquiring new agonists for novel expanded gene family branches:

RhodCiona.jpg

To recapitulate, photoreceptive opsins did not originate out of thin air by miraculous multiple mutations providing overnight EK, signature motifs and retinal binding pocket. More plausibly, opsins specialized from a pre-existing gene expansion pool of EK-class GPCR that itself evolved from generic group 1 GPCR. That class persists today in pre-bilaterans with unknown agonists and function though it apparently winked out in bilatera. Genuine opsins may nest paraphyletically (recruited by independent events, eg melanopsin, encephalopsin, neuropsin) within EK-rhodopsin class GPCR; a diversity of cnidarian opsin sequences are needed.

The unpublished observation of phototaxis in Trichoplax is only weak supportive evidence for opsins because demosponge larvae also exhibit phototaxis (shadow seeking under coral rubble) whereas the action spectrum there fits a flavin or carotenoid chromophore better than retinal. Larval photoreceptors don't reside in a rhabdomeric or ciliary morphological setting but rather in a ring of columnar monociliated epithelial cells. No expression staining has been conducted for the putative opsins -- Trichoplax has only 4 cell types, none specialized in appearance to photoreceptor.

Cryptochromes are another well-known alternate photoreceptive system in bilatera with no homology at the protein level to opsins. In all kingdoms of life, only seven known families of proteins transduce light into signal (opsins, cryptochromes, phytochromes, xanthopsins, phototropins, blue flavin and lite1); Trichoplax would have to be evaluated for each of these too. For example, nematode C. elegans lacks opsins and cryptochromes but has a shorter wavelength photoresponse bypassing cAMP and diacylglycerol (DAG) downstream signalling that neurons commonly use to control behavior. The primary photoreceptor LITE1 (NP_509043) is an 8-transmembrane member of the invertebrate gustatory receptor family (non-GPCR) without known chromophore.

Trichoplax could have an unobserved larva and metamorphosis in view of the proposed post-sponge position. Alternatively, it may be a direct developer or have lost the ability to reproduce other than by fission (which is contradicted by meiosis genes, oocytes and population genetics despite non-observation of male gametes). Trichoplax, being too small, has never been observed directly in the wild -- instead it is collected over weeks on submerged alga-coated microscope slides (or aquarium walls). Whole aspects of its life cycle may go undetected in laboratory culture. The current situation (assuming post-sponge divergence) -- highly conserved genome yet highly derived morphology -- is an odd but not impossible mix.

Even if this phylum lacks opsins, the genome will still prove important in determining the state of signaling systems at its ancestral node. The 98 Mbp genome is 3% the size of a mammalian genome but the estimated 11,514 coding genes are not that different from Drosophila and roughly 58% of the 20,000 genes of human. Pseudogenes, which outnumber genes in mammals, are difficult to detect in isolated genomes such as Trichoplax.

This large gene set, if assumed approximately ancestral, is not consistent with subsequent 2R whole genome duplications (two doublings of 11,514 would give 46,056 genes in human) without massive gain in Trichoplax and/or loss in human. In other words, even if 2R occurred, it was an inconsequential mechanism in the overall scheme of gene duplication and retention; the complement of genes was in large measure already established in early metazoa. That has special relevance to understanding expansion and specialization of opsins and associated G protein alpha subunits which primarily exhibit tandem duplication followed by translocation.
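
The arithmetic behind this argument is simple enough to spell out. The sketch below is only a back-of-envelope check using the figures quoted in the text; the 3,000 Mbp mammalian genome size is a round-number assumption of ours, and the function name is hypothetical.

```python
# Figures quoted in the text, except where marked as an assumption.
GENOME_MBP_TRICHOPLAX = 98
GENOME_MBP_MAMMAL = 3000      # assumption: round figure for a mammalian genome
GENES_TRICHOPLAX = 11_514
GENES_HUMAN = 20_000


def genes_after_rounds(genes, rounds=2):
    """Gene count after whole genome duplications with no loss at all."""
    return genes * 2 ** rounds


if __name__ == "__main__":
    expected_2r = genes_after_rounds(GENES_TRICHOPLAX)
    print(f"genome size ratio: {GENOME_MBP_TRICHOPLAX / GENOME_MBP_MAMMAL:.1%}")
    print(f"gene count ratio:  {GENES_TRICHOPLAX / GENES_HUMAN:.1%}")
    print(f"2R without loss:   {expected_2r} genes")
    print(f"human retains:     {GENES_HUMAN / expected_2r:.1%} of that")
```

Under these assumptions fewer than half of the genes predicted by lossless 2R would survive in human, which is the sense in which 2R looks inconsequential for the final gene complement.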

TrichoplaxSEM.jpg

Trichoplax is an odd amoeboid-shaped animal without overt symmetry axes, body plan, recognizable organs, or internal digestive gut but with clear dorsal/ventral sides. Only a single species has received a Linnean name to date; it is assigned a whole phylum, Placozoa. However the taxonomy section of GenBank lists 87 other placozoan isolates; these have ~200 deposited sequences and previous publications have established quite diverged mitochondrial genes, compatible with a whole order (even as external morphologies remain indistinguishable). Thus the 600 myr long branch of the Red Sea isolate could readily be subdivided by 454 sequencing to provide supplementing glimpses of ancestral characters and better reconstruction of ancestral proteins.

The authors place the divergence node between sponge and cnidaria, surprising given the advanced sponge larva and its similarity to later-diverging metazoans. Post-sponge divergence (but sistered with Cnidaria) was proposed earlier in a 1993 Science article based on small subunit ribosomal RNA sequences. The place of Ctenophores still cannot be resolved for lack of an available genome, though just about every topology has received support. Thus the number of early metazoan nodes discussed in "The Ancestor's Tale" is still uncertain.

In a 2006 PNAS article, four of the same authors made an equally convincing argument for Trichoplax basal to sponge using both mitochondrial protein analysis and odd ancestral features such as retained introns and 5 retained ORFs exceeding 100aa:

"Our analysis shows that the Trichoplax mitochondrion possesses the largest known metazoan mtDNA genome, at 43,079 bp, more than twice the size of the typical metazoan mtDNA. Its large size is due not to secondary expansion but to features shared with metazoan outgroups, such as intragenic spacers, several introns [cox1 introns share identical positions with choanoflagellate Monosiga and fungus], ORFs of unknown function, and protein-coding regions that are generally larger than that found in animals. The large Trichoplax mtDNA is the least derived mitochondrial genome of any animal. Moreover, the Trichoplax mitochondrion shares unique derived features with other lower metazoans, notably the loss of all ribosomal protein genes. These structural features of the Trichoplax mitochondrial genome, along with Bayesian and maximum-likelihood (ML) analyses of mitochondrial proteins from metazoans and outgroups, provide robust support for the phylogenetic placement of the phylum Placozoa at the root of the Metazoa.... the basal phylogenetic position of Placozoa within the lower metazoans is robust, with P values between 0.924 and 1.000 for the various statistical tests."

It appears that an unpublished, unreleased assembly of proteins of the sponge Amphimedon was used (yet again). This genome project is 100% publicly funded yet its assembly seems to have been privatized to a shortlisted clique of insiders. Lottia gene models are also used without bibliographic citation. Both were evidently analyzed by unknown parties as unpublished SNAP ab initio predictions (Table S7.1). This raises some questions because neither reviewers nor other scientists can independently evaluate the evidence supporting the unexpected phylogenetic relationships. It is reminiscent of the recent uproar over non-release of dinosaur collagen mass spectroscopy.

The JGI site shows sponge still in draft assembly on 25 Aug 08 whereas this paper was submitted nine months earlier, on 4 Dec 07. Trace reads were completed and made available in June 2005 but these cannot be used for genes or proteins without assembly as individual reads typically cover but single exons. Collagen researchers have assembled genes using tiling from mate pairs and ESTs. By custom, assembly and first publication is normally reserved for the sequencing laboratory but that was never intended to apply to holding back data for 3-4 years.

Trichoplax appears to retain a goodly share of ancestral characters, including some chromosomal gene associations and a large number of apparent 1:1 orthologs to human and anemone. However it must be noted that the authors have significantly altered the traditional definition of synteny in making their dot plot (Oxford grid) by discarding strand orientation altogether, allowing weak partial length blastp hits (perhaps just to common domains or pseudogenes) to count as orthologous matches, and accepting departures of 10 genes from adjacency (which can amount to several million bp in mammalian genomes).

The assumption here is that small and very localized intra-chromosomal inversions have been more frequent than inter-chromosomal rearrangements over the vast 1.2 billion year roundtrip span of evolutionary time (but see gibbon, fly, tunicate etc). As described a decade ago, dot plots rendered in Photoshop better retain orientation and quality of fit using grayscale and tint, rather than the all-or-none as here.

The 11,514 coding gene models arise largely from an informatics pipeline tool fGenesh. Like all gene predictors, even when parameterized with transcripts, this tool makes numerous errors in human (where it can be thoroughly evaluated by comparative genomics) and is no longer included in the 35 predictive gene tracks at UCSC (though still valued for nematode and other species). The tool acted after RepeatMasker masking of 665 transposons; if that library were incomplete on low copy elements, the gene count could be too high.

Manual curation is mentioned briefly but could not have been extensive given the number of genes. Gene models missing start or stop codons were simply extended out from available sequence (though in some cases transcript extension was feasible). While a downstream stop codon can always be found, an initial methionine candidate does not always occur before an upstream stop codon is encountered. This is not a good annotation practice as it assumes that partial first and last exons are at hand; better to do this only when accurate gene models from other species warrant these two assumptions. These extensions will seldom have homological support even when correct unless the termini are important (eg signal peptides; G protein C-terminal opsin interaction).

Only 58 ESTs are provided by GenBank (from an unpublished immunology study by different authors); in the nucleotide division, it appears that GenBank does not distinguish experimental mRNAs from those simply inferred from genome models. Transcripts have proven very helpful in quantitating gene prediction accuracy. In assessing completeness of the genome, the authors state 85% of 14,571 T. adhaerens ESTs (later described as 2,506 when clustered, at 21% coverage) have quality matches (a more nuanced account appears in Supplemental). If 15% did not make the cut -- assuming them to be representative of missing or partial genes or absent from transcripts -- that would bring the total coding gene count to 13,241.
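
The 13,241 figure follows from scaling the gene models up by the unmatched 15%. The sketch below reproduces that step and, for comparison only, an alternative that treats the models as 85% of the true total; the alternative and the function names are our assumptions, not anything stated in the article. Reading the 21% coverage as clusters divided by gene models is likewise an assumption.

```python
GENE_MODELS = 11_514          # coding gene models reported for Trichoplax
EST_MATCH_FRACTION = 0.85     # share of ESTs with quality matches
EST_CLUSTERS = 2_506          # ESTs after clustering


def scaled_up(models, matched):
    """Add the unmatched share on top of the existing models (text's figure)."""
    return round(models * (1 + (1 - matched)))


def completeness_corrected(models, matched):
    """Alternative assumption: the models are only `matched` of the true total."""
    return round(models / matched)


if __name__ == "__main__":
    print("scaled up by 15%:        ", scaled_up(GENE_MODELS, EST_MATCH_FRACTION))
    print("if models are 85% of all:", completeness_corrected(GENE_MODELS, EST_MATCH_FRACTION))
    print(f"clusters per gene model:  {EST_CLUSTERS / GENE_MODELS:.1%}")
```

The two extrapolations differ by about 300 genes, small against the other uncertainties in the gene count discussed here.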

These 2,506 assembled transcripts could be downloaded at the indicated JGI url and tested to see how and if they are stored at GenBank as this affects queries of the average user. The article does not discuss how discrepancies between transcripts and pipeline gene models were resolved. These can involve spliced-out exons with or without comparative genomics support and non-support of called exon boundaries (here the transcripts will have it right).

On a species so distant and little studied biochemically as Trichoplax (27 articles since 1974, mainly field work or rDNA sequencing), when protein alignment dips below the 40% protein-level identity on partial matches (notably GPCR putative opsins), homological transfer of annotation from functionally characterized proteins won't be reliable beyond rough domain-level concepts. Best-blastp percent identity to known genes is not noted at GenBank model entries.

Here signalP, InterPro, and TMHMM were used to predict signal peptides, domains, and transmembrane peptides; while not validated with pre-bilateran metazoan protein chemistry to any extent, these tools provide a good start to localized functional elements. KEGG and KOG are not sufficiently developed to be worthwhile; indeed, their development appears abandoned (2004-05 versions were used). EC numbers and GO nonsense add little value. Biosynthetic pathways were not analyzed and so essential amino acids, cofactors, and intermediary metabolism were not determined.

What is needed here is an ability to filter or grade annotation transfer quality vis-a-vis match quality, match extent, domain structure, and gene family extent along the lines of GeneSorter at UCSC. For example, Trichoplax/human matches can be very high as in the 77% identity agreement seen in G protein alpha subunits (below) or very low 29% in best matches of some opsins to Trichoplax GPCR.

Conversely, Trichoplax and the secret sponge annotation will now propagate all over. This can become a self-reinforcing paradigm for error propagation, eg as later authors see corroborating annotation in their top homology matches.

None of this has to affect opsin analysis (other than missing assembly) because the genome sequence can be directly searched by tblastn without reference to JGI gene models or annotation. Any assertion of opsins must establish the location and phasing of introns in a gene model as these are exceedingly conserved in opsins and diagnostic of orthology classes (and so especially important when protein sequences are so diverged as to blur into generic GPCR).

At this late date, it might not be expected that a metazoan would contain many authentic coding genes with no blast matches in all of GenBank. These may prove greatly enriched for artefacts, just as the Ensembl human gene set used contains nearly 2,000 pseudogenes, related debris and mispredictions not supported by the 28way comparative genomics track to any phylogenetic depth. It's not clear what fraction had supporting transcripts. Dropping these will roughly offset valid genes missing from the assembly or missed by gene prediction.

Similarly, introns will not be located correctly if this is homologically forced in regions of poor identity (self-fulfilling prophecy); phase agreement can help but not so much if based simply on GT-AG which occur every 16 bp approximately, nor even with nuanced splice rules if merely assumed from remotely related species. The practical impact is that exceptions and imperfect matches are missed -- the authors report excellent overall conservation of (high quality flanking) introns with respect to human and Nematostella, which fits the modern picture of intronation being deeply ancestral and profoundly conserved outside a few rogue species (such as fly, nematode, and tunicate) with high turnover. Some 150 known splice sites that use rare alternatives to GT-AG may also be as deeply conserved though this has not yet been investigated.
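
The "every 16 bp" figure is the chance expectation for any fixed dinucleotide: 1 in 4 x 4 positions. The sketch below checks it by simulation, under the simplifying assumptions of equal base frequencies and independent positions, which real genomes do not obey. The function name is hypothetical.

```python
import random


def mean_spacing(motif, length=200_000, seed=0):
    """Average distance between occurrences of `motif` in a random sequence."""
    rng = random.Random(seed)
    seq = "".join(rng.choice("ACGT") for _ in range(length))
    width = len(motif)
    count = sum(1 for i in range(length - width + 1) if seq[i:i + width] == motif)
    return length / count


if __name__ == "__main__":
    print("analytic expectation:", 4 ** 2, "bp")
    for dinucleotide in ("GT", "AG"):
        print(dinucleotide, f"simulated spacing: {mean_spacing(dinucleotide):.1f} bp")
```

With candidate donor and acceptor dinucleotides this dense, a GT-AG pair can be found near almost any desired boundary, which is why dinucleotide agreement alone gives little support to a forced intron placement.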

Manual ab initio curation here at genomeWiki of selected genes of signal transduction (alpha subunits of Trichoplax heteromeric G proteins) shows that cGMP Gt transducin-type alpha subunits (presumably still hyperpolarizing) share the identical 8-exon structure and phasing of human and other bilatera, including the anomalously short 15 residue exon 2 that distinguishes this class of genes from the otherwise identically intronated 7-exon Gq alpha subunits that utilize phospholipase C hydrolysis of PIP2 and IP3 signaling. This strongly suggests that, though the requisite gene duplication and divergence took place much earlier, the distinctive exon 2 emergence (or fusion) predates metazoa. The presence of Gt and Gq subunits does not imply the existence of melanopsin or encephalopsin homologs because these alpha subunits must service hundreds of other unrelated GPCR.

>GNAi1_triAdh Trichoplax adhaerens (placozoa) Gt XM_002115978 77% GNAi1_homSap 8 exons
0 MGCAASAGDKVAAAKSKEIDKKIKSDAEKAAREVKLLLL 1
2 GAGESGKSTIVKQMR 21 IIHESGFSEEDRAQYKPVVFSNTMQSMAAIIRAMGVLRIEFGDKTSLV 0
0 GDARRLFEIMDAPGVQEFTPEIVSLLKRLWSDHGVQQCFSRSREYQLNDSAPY 2
1 YLNSIDRLGKPEYIPSEQDVLRTRVKTTGIVETHFTFKDLHFK 2
1 MFDVGGQRSERKKWIHCFEGVTAIIFCVSLSAYDLVLAEDEEM 0
0 NRMMESMKLFDSICNNKWFTETSIILFLNKKDLFQEKILKSPLTICFPEYT 1
2 GANTYEEASAYIQMKFEDLNKMKDQKEIYTHFTCATDTNNIQFVFDAVTDVIIKNNLKDCGLF* 0

>GNAQ_triAdh Trichoplax adhaerens (placozoa) Gq XM_002116172 76% GNAQ_homSap 7 exons
0 MACCLSDEAREQRRINREIEKELKKHKRDAKRELKLLLL 1
2 GTGESGKSTFIKQMRIIHGKGYTDNDRAEFTQLVFQNIFTAIQALIKAMETLNITYEHQSN 0
0 RQRVDVVRTVDPETVGSLSKEHVEAIDSIWNDSGVQECYDRRREYQLSDSAKY 2
1 YLTDLHRLAEPNYLPTQQDILRVRAPTTGIIEYDFNLDTVMFR 2
1 MVDVGGQRSERRKWIHCFENVTSIMFLVALSEYDQILAEADSQ 0
0 NRMEESKALFKTIITYPWFQNSSIILFLNKKDILEEKVQKSNIADYFPEYD 1
2 GPPRDAQAGREFILKMFVDLNPDSEKIIYSHFTCATDTENIRFVFAAVKDTILQFNLREYNLV* 0
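
The exon listings above can be checked mechanically. The sketch below assumes that the digits flanking each exon are codon phases, that a fused pair such as "21" means end phase 2 followed by start phase 1, and that consecutive exons should have end and start phases summing to a multiple of 3. That reading is inferred from the listings, not documented anywhere, and the function names are hypothetical.

```python
# GNAi1_triAdh exon listing copied from above.
GNAI1_TRIADH = """\
0 MGCAASAGDKVAAAKSKEIDKKIKSDAEKAAREVKLLLL 1
2 GAGESGKSTIVKQMR 21 IIHESGFSEEDRAQYKPVVFSNTMQSMAAIIRAMGVLRIEFGDKTSLV 0
0 GDARRLFEIMDAPGVQEFTPEIVSLLKRLWSDHGVQQCFSRSREYQLNDSAPY 2
1 YLNSIDRLGKPEYIPSEQDVLRTRVKTTGIVETHFTFKDLHFK 2
1 MFDVGGQRSERKKWIHCFEGVTAIIFCVSLSAYDLVLAEDEEM 0
0 NRMMESMKLFDSICNNKWFTETSIILFLNKKDLFQEKILKSPLTICFPEYT 1
2 GANTYEEASAYIQMKFEDLNKMKDQKEIYTHFTCATDTNNIQFVFDAVTDVIIKNNLKDCGLF* 0
"""


def parse_exons(text):
    """Return (start_phase, residues, end_phase) for each exon in a listing."""
    exons = []
    for line in text.splitlines():
        tokens = line.split()
        # Tokens alternate phase, residues, phase; odd positions hold residues.
        for i in range(1, len(tokens) - 1, 2):
            start = int(tokens[i - 1][-1])   # last digit of the preceding token
            end = int(tokens[i + 1][0])      # first digit of the following token
            exons.append((start, tokens[i].rstrip("*"), end))
    return exons


def phases_consistent(exons):
    """True if each exon's end phase complements the next exon's start phase."""
    return all((a[2] + b[0]) % 3 == 0 for a, b in zip(exons, exons[1:]))


if __name__ == "__main__":
    exons = parse_exons(GNAI1_TRIADH)
    for number, (start, residues, end) in enumerate(exons, 1):
        print(f"exon {number}: phase {start}-{end}, {len(residues)} residues")
    print("phases consistent:", phases_consistent(exons))
```

Run on the first listing, this recovers 8 exons with the 15-residue exon 2 noted in the text; the same functions can be applied to the GNAQ_triAdh listing to confirm its 7 exons.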

Porifera: Amphimedon queenslandica (sponge) .. 0 opsins

One marine demosponge genome is available as unassembled traces. Sponges lie at the base of multicellular animals and are not notable for a nervous system. However demosponge larvae do exhibit phototaxis (shadow seeking under coral rubble) but the action spectrum is supposedly a better fit to a flavin or carotenoid chromophore. Sponges also can respond to gravity, current, and chemical cues.

The basis for sponge responsiveness to light has been carefully studied from an ultrastructural perspective -- for an animal lacking nerves and cell junctions, the parenchymella larvae are quite capable of responding effectively to light and other stimuli. Larval photoreceptors may lie in a posterior ring of columnar monociliated epithelial cells. A pigment cell occurs but the pigment itself has not been chemically characterized -- the issue is whether it is homologously derived (melanin via tyrosine hydroxylase) or novel.

Negative larval phototaxis there has been attributed to pigment-filled protrusions in a posterior ring of columnar monociliated epithelial cells. This species may prove insufficient to explain the full range of photoresponses in sponge larvae such as circadian rhythm and hexactinellid photoreception (notably the role of stalk spicules).

Jacobs et al have proposed a far more sweeping view of early evolution of sensory (and other!) organs in sponges. A related view, that of Gehring, proposes that the eye (and other sensory systems) came before the brain, indeed that the nervous system arose later to coordinate a response to all these inputs. There is support for that in simple photoreceptor cells controlling their own cilia. Consequently we should not be too quick to dismiss sponge photoreception for lack of neurons.

In an article entitled "Six major steps in animal evolution: are we derived sponge larvae?" C. Nielsen writes:

A scenario for the early evolution of the metazoans. The metazoan ancestor "choanoblastaea" was a pelagic sphere consisting of choanocytes. The evolution of multicellularity enabled division of labor between cells and an "advanced choanoblastaea" consisting of choanocytes and nonfeeding cells. Polarity became established, and an adult, sessile stage developed. Choanocytes of the upper side became arranged in a groove with the cilia pumping water along the groove. Cells overarched the groove so that a choanocyte chamber was formed, establishing the body plan of an adult sponge; the pelagic larval stage was retained but became lecithotrophic [yolk-supplied]. The sponges radiated into monophyletic Silicea, Calcarea, and Homoscleromorpha. Homoscleromorph larvae show cell layers resembling true, sealed epithelia.

A homoscleromorph-like larva developed an archenteron, and the sealed epithelium made extracellular digestion possible in this isolated space. This larva became sexually mature, and the adult sponge-stage was abandoned in an extreme progenesis. This eumetazoan ancestor, "gastraea," corresponds to Haeckel's gastraea.

Trichoplax represents this stage, but with the blastopore spread out so that the endoderm has become the underside of the creeping animal. Another lineage developed a nervous system; this "neurogastraea" is the ancestor of the Neuralia. Cnidarians have retained this organization, whereas the Triploblastica (Ctenophora+Bilateria), have developed the mesoderm. The bilaterians developed bilaterality in a primitive form in the Acoelomorpha and in an advanced form with tubular gut and long Hox cluster in the Eubilateria (Protostomia+Deuterostomia).... The evolution of the eumetazoan ancestor from a progenetic homoscleromorph larva implies that we, as well as all the other eumetazoans, are derived sponge larvae.

Opsin sponge.png

The resulting picture of Reniera larva -- numerous differentiated and pluripotential cell types arranged in stereotypic patterns along central-lateral and anterior-posterior axes -- is not one typically conjured up of a parazoan ("almost metazoan") in the view of Leys and Degnan. Indeed the common ancestor humans shared with sponge may have been rather advanced.

The concept here is that a photoreceptor cell can control its associated cilium without the baggage of a CNS, either as a passive rudder or more actively directing phototactic motion. In effect the single photocell is a self-sufficient brain that processes external environmental inputs, assesses them and acts appropriately. Chemoreception, a very similar GPCR signaling system, might work the same way. In this view, the nervous system evolved as a secondary system to coordinate these stand-alone sensory effectors.

The genome of Amphimedon queenslandica has been available since June 2005 at the Trace Archives but never assembled. Consequently tblastn of contigs is not available without do-it-yourself assembly of the 2.9 million traces, an inefficient but increasingly utilized option. The species was formerly called Reniera spp. and it is still carried under that name at JGI Genome. It's also been placed in Haliclona and Adocia. Voucher specimens, here QM G315611, are needed to have everyone on the same genomics page.

A futile search for sponge opsins turned up only non-opsin, rhodopsin-class GPCR genes from Amphimedon. That needs to be revisited with tblastn after assembly with melanopsins and encephalopsins reconstructed back to the eumetazoan common ancestor. Similarly, no opsins were located in even earlier diverging placozoan Trichoplax, choanoflagellate Monosiga, and fungal genomes. This fits a picture of photoreceptor opsins first appearing subsequent to sponge in eumetazoa cnidarians. However these were hardly de novo genetic innovations but rather evolved out of the already-rich cauldron of GPCR gene copies in the sponge ancestor.

Some later diverging species such as the model organism C. elegans lost all of their opsin genes, making them useless in Urbilateran ancestor reconstruction. This argues for much more intensive genomic sampling of sponges and cnidarians so as to sidestep inference misled by gene loss in model organisms chosen for historic reasons.

Choanoflagellates: Monosiga brevicollis .. 0 opsins

MonoBrev.jpg

The genome sequence of Monosiga brevicollis appears in the 14 Feb 08 issue of Nature. It contains a reported 9,200 genes in 42 Mbp. These are densely intronated but reflect net loss since common ancestor with metazoans. Domain orders are markedly shuffled relative to eumetazoan counterparts and some key proteins such as Notch have only partial-length matches.

The 14 extant genera of choanoflagellates have been minimally studied overall with only 59 non-Monosiga sequences at GenBank so additional representative genomes are needed to fully understand ancestral characters (ie, retained by at least one species of choanoflagellate).

Separate studies have considered the emergence of the three collagen clades, cell adhesion via cadherins, the greatly elaborated tyrosine kinase signaling network with 128 tyrosine kinases, 38 tyrosine phosphatases, signalling origins, and 123 phosphotyrosine-binding SH2 proteins, and even transcriptionally active LTR retrotransposons.

Opsins, though not really expected from known behavior or morphology, have been sought directly without success in Monosiga. Indeed, no GPCR with lysine in retinal position can be detected either. Monosiga has either lost this class of genes or more likely never had them. This puts the spotlight on sponge, yet blastn of traces is not sufficiently sensitive.

Alpha heterotrimeric G protein subunits that could bind opsin-like receptors have not been directly studied. Here we see the first alpha protein below has two introns located and phased identically to those of human Gi class proteins but lacks the short exon and downstream introns. The second Gq class subunit has altogether different intronation despite 55% identity to the apparent human ortholog. These gene structures imply active intron gain/loss in the era of the ancestor. Encephalopsin and melanopsin could have been supported had they been present but G alpha signalling there is a specialization of a much earlier evolved process.


>GNAi_monBre Monosiga brevicollis 3 exons XM_001747738 55% GNAi_homSap heteromeric G protein alpha subunit Gi
0 MGICMSAEQKAQQARTAAVEAQLERDAQLASRTIKLLLL 1
2 GAGESGKSTLVKQMKIIHGDGFSNEELKSYKPTICDNLVHSMRAVLEAMGPLVIDIGDQVRPP 0
0 HAKVVLSYIELGTSGGLTPELTEALKALWADSGVQECFRRSNEYQLNDSAEYFFNNIDRIAQSNYLPTQEDVLRARVRTTGV
IETTFRYKDLIYRMFDVGGQRSERRKWIHCFNDVTAVLFVAALSGYDMKLFEDQETNRIHESLTLFDAICNNSFFINTAIILFL
NKTDLFSQKIARTPLKDYFPEYDGPPNNASEAKKFIAGMFKRLNKNPNKPVYEHFVCATETQNIRYVFDAVK* 0+

>GNAQ_monBre Monosiga brevicollis 7 exon form XM_001745795 55% GNAQ_homSap heteromeric G protein alpha subunit Gq
0 MPCGPPDETRRRSLAIDRQLRKERMSKQREYKILLL GTGESGKSTIIKQMRIIYGQGFNESDRLAYKPLVYRNIITSMKRMLDALDQLSLQLADSSLEEDAYDK
LDVDVNTVDAIEPYYPLLKKLWNDNGIQQVFQRRNEYQLSDSTAYYYNRLDAVAAADYIPTVDDVLRSRQATTGIHEFEFDLDSVVFRMMDVGGQRSERRKWIHSFE 0
0 GVTSIIFIAACNEYDQVLAEDTNVNRMQESLALFGQIIQYHW 2
1 FANSSFILFLNKQDLLEEKVKTHPIKPFFPDYTGQE 0
0 GDYENIKKFIETMYRSRKPAGKDLYTHFTMATDTSNIQFVFNAVRSTLLRIHLKDYNLF* 0