Opsin evolution: informative indels

Revision as of 21:20, 12 December 2009

Introduction to indels

Insertions and deletions of amino acids (together called coding indels) are a class of genetic event rarely fixed in conserved protein sequence regions. It is not immediately clear whether a given indel represents an insertion or a deletion. The process of deciding is called indel resolution; it requires a phylogenetic tree that allows the ancestral length to be determined. If outgroups are consistently short, then by parsimony the longer ingroup clade experienced an insertion. Indels are unresolvable when outgroup data are not available. Two or more consistent outgroup nodes establish a period of length stability.
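The parsimony rule can be written down in a few lines. The following is a minimal sketch, not a substitute for inspecting the tree: the function name is hypothetical, lengths are residue counts for one delimited region, and requiring at least two agreeing outgroup nodes is an assumption taken from the length-stability remark above.

```python
def resolve_indel(ingroup_len, outgroup_lens, min_outgroups=2):
    """Classify a length change in an ingroup clade by outgroup parsimony.

    ingroup_len   -- region length (residues) shared by the ingroup clade
    outgroup_lens -- region lengths observed at successive outgroup nodes
    min_outgroups -- assumed minimum number of agreeing outgroup nodes
    Returns 'insertion', 'deletion', 'no change' or 'unresolvable'.
    """
    if len(outgroup_lens) < min_outgroups:
        return "unresolvable"      # too little outgroup data
    if len(set(outgroup_lens)) != 1:
        return "unresolvable"      # outgroups disagree on the ancestral length
    ancestral = outgroup_lens[0]
    if ingroup_len > ancestral:
        return "insertion"
    if ingroup_len < ancestral:
        return "deletion"
    return "no change"


# An ingroup of length 9 against two outgroups of length 8 reads as an insertion;
# an ingroup of length 7 against the same outgroups reads as a deletion.
print(resolve_indel(9, [8, 8]))
print(resolve_indel(7, [8, 8]))
```

With a single outgroup, or with outgroups of differing length, the sketch declines to call the event rather than guess.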

It is implausible -- rarity cubed -- that multiple outgroups plus an ingroup experienced independent deletions of the same length at the same site (though the exact site can be difficult to evaluate if flanking residues were also affected by the original genetic event or subsequently by accelerated compensatory mutation). Advanced statistical methods can provide only illusory gains over simple parsimony because the underlying required models of indel formation are entirely speculative.

Nonetheless, examples of homoplasy are easy to come by, especially in repetitive nucleotide regions encoding runs of compositionally simple amino acids subject to the mutational mechanism of replication slippage. Homoplasy at longer time scales manifests itself by incoherent distribution over a known phylogenetic tree. Convergent evolution can also be driven by selective advantage for altered length.
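One way to make "incoherent distribution over a known tree" concrete is to ask whether the taxa carrying the derived length form a single clade. The sketch below assumes a rooted tree given as nested tuples of leaf names; the taxon names and the toy topology are illustrative only.

```python
def leaves(tree):
    """Return the set of leaf names under a node of a nested-tuple tree."""
    if isinstance(tree, str):
        return {tree}
    result = set()
    for child in tree:
        result |= leaves(child)
    return result


def is_coherent(tree, carriers):
    """True if the taxa carrying the derived state form exactly one clade."""
    carriers = set(carriers)
    if leaves(tree) == carriers:
        return True
    if isinstance(tree, str):
        return False
    return any(is_coherent(child, carriers) for child in tree)


# Toy rooted tree (hypothetical sampling).
toy_tree = ((("squid", "limpet"), ("fly", "bee")), ("human", "chicken"))

print(is_coherent(toy_tree, {"fly", "bee"}))      # carriers form one clade
print(is_coherent(toy_tree, {"squid", "fly"}))    # scattered carriers suggest homoplasy
```

The set passed in should be the carriers of the derived state; a scattered result is a prompt to look for independent events, not proof of them.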

Indels occur very unevenly across the length of a given protein homology class. The rate might be high in terminal regions if the amino or carboxy termini are unimportant to the fold or function of the matured protein. Within folded regions of soluble proteins, indels are greatly concentrated in loop regions of the 3D structure where a change in length can be accommodated without structural disruption. The distribution of indels even allows prediction of loop regions.

For integral membrane proteins such as GPCR, deletions are very rarely fixed in the transmembrane helical regions because a shortened length would no longer span the membrane at the same angle, thus pulling in inappropriate non-hydrophobic residues from soluble loops. Insertions too are rare because they push hydrophobic and boundary turn residues out into soluble compartments and distort connecting loops, perhaps altering insertion angles of adjacent transmembrane regions. Such mutations arise frequently enough but are rarely fixed at the population level or hang on as balanced alleles over timescales commensurate with ordinal speciations.

In massively expanded gene families such as GPCR, a coherently fixed indel in one descendent clade of the gene tree suggests adaptive sub- or neo-functionalisation: if the indel were merely tolerated as near-neutral change, over geological timescales homoplasy at that site would occur. A remarkable site in transmembrane helix 2 was proposed in May 2009:

'Class A GPCR constitute a large family of transmembrane receptors. Helical distortions play a major role in the overall fold of these receptors. Most are related to conserved proline residues. However, in transmembrane helix 2, the proline pattern is not conserved, and when present, proline may be located at position TM 2.58, 2.59, or 2.60 yielding a bulged structure in P2.59 and P2.60 receptors or a more typical proline kink in P2.58 receptors. The proline pattern of helix 2 can be used as an evolutionary marker of molecular divergence of class A GPCRs.

At this site, two independent indel events occurred. One [unresolvable] indel arose very early in GPCR evolution in a bilaterian ancestor before protostome-deuterostome divergence. This indel led to the split between the P2.58 somatostatin/opioid receptors and peptide receptors with the P2.59 pattern. Subfamilies with proline at position 2.59 or no proline expanded earlier, whereas P2.60 receptors remained marginal throughout evolution. P2.58 receptors underwent later rapid expansion in vertebrates with the development of the chemokine and purinergic receptor subfamilies from somatostatin/opioid-related ancestors. A second indel, resolvable as a deletion, occurred in insect melanopsins.'

This result refines the classification of Class A GPCR, which might be quite indecisive at certain gene tree nodes from sequence alignment alone. Timing of the insect deletion can be done better (below) because the SwissProt collection used by the authors carries only 20% of the melanopsins actually available. Note the structural significance of length and bulge changes can be examined in available 3D determinations. The functional effect of this shift in TM2 remains obscure but must be important.

Class   Gene            PDB             Protein                     PubMed    Best human opsin  Next best          Signaling
T.60.1  RHO1_bosTau     1JFP 3C9M 2J4Y  bovine rod rhodopsin        17825322  RHO1_homSap 93%   SWS1_homSap   45%  Gt GNAT1 lowers cGMP
P.60.0  MEL1_todPac     2Z73 2ZIY       squid melanopsin            18480818  MEL1_homSap 43%   PER1_homSap   30%  Gq GNAQ? inositol trisphosphate
P.59.3  ADORA2A_homSap  3EML            adenosine receptor 2A       18832607  MEL1_homSap 27%   ENCEPH_homSap 27%  Gs GNAS raises cAMP
P.59.1  ADRB1_melGal    2VT4            beta 1 adrenergic receptor  18594507  MEL1_homSap 29%   ENCEPH_homSap 25%  Gs GNAS raises cAMP
P.59.1  ADRB2_homSap    2R4R            beta 2 adrenergic receptor  17962520  MEL1_homSap 28%   PER1_homSap   29%  Gs GNAS raises cAMP

Thus indels in opsins -- when they occur in a conserved region -- are potentially very informative as rare genetic events not appreciably subject to homoplasy in defining orthology classes and higher order clusterings of them, hopefully corroborating or even refining trees derived from sequence clustering by alignment. While precious, such data is limited because physiological and structural constraints have prevented most regions of opsins from ever accommodating an indel.

Indels in ciliary opsins

The tertiary structural integrity requirements of a 7-transmembrane opsin, along with tuned binding of retinal, isomerization cycle conformational shifts and binding to secondary protein contributors to the photoreception cycle, conspire to greatly constrain admissible locations for ciliary opsin indels. Indeed this varies greatly by region, with indels never seen in the transmembrane regions themselves (despite tens of billions of branch length years) and restricted in connecting cytoplasmic and extracellular loops to EC2 and IC3 and IC7. Indel incidence is much higher in amino and carboxy terminal tails but not useful because of gapping ambiguity issues.

The distribution of fixed indels is quite peculiar: almost all occur in gene family stems (ie shortly after gene duplication in one branch) and hardly any occur mid-history. For vertebrate imaging opsins, this means prior to lamprey divergence. In other words, not only had all the classes of imaging opsins emerged post-tunicate/amphioxus pre-lamprey but (neglecting tails) also all their indels. No further indels arose in the subsequent 500 million years in any of these opsins, as if these opsins were already optimized from the length perspective.

Consequently the rate of indel occurrence per billion years of branch length -- and so the frequency of multiple independent events near a given site -- is highly correlated to region, ie each region has a characteristic time scale over which it can be informative: too long and the risk of homoplasy (convergent evolution) is too high. That risk is exacerbated by uncertainty in gap placement within an alignment, which first requires delimitation by flanking invariant residues. Gap length per se is ambiguous: an indel of 3 residues shared by two extant species might have arisen once as a single event in the first species or as two events (one and two residues successively) in the other. Thus any phylogenetic interpretation of indels must be tempered by knowledge of the regional indel susceptibilities and the assumption these remain fairly constant across lineages and time.

Informative indels show up as readily apparent columns of gaps in large-scale alignments. If present across a single opsin orthology class, that merely validates prior blast clustering and other rare genomic events in establishing those classes in the first place. Sporadic indels, defined here as indels found within a single opsin gene, may arise from sequencing errors but, if not, might represent an adaptive specialization. It's very rare to see a ciliary opsin indel restricted to a phylogenetic subclade but examples exist, such as the post-marsupial loss of 5 residues of RHO1 in the distal arrestin binding region.
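Scanning an alignment for gap columns, and separating those seen in one sequence from those shared by several, can be sketched as follows. The function names are hypothetical; the five D-to-P segments are excerpted from the melanopsin alignment on this page, and equal-length, pre-aligned rows are assumed.

```python
def gap_columns(alignment, gap="-"):
    """Map each column index containing a gap to the set of gapped sequence names."""
    columns = {}
    for name, seq in alignment.items():
        for index, residue in enumerate(seq):
            if residue == gap:
                columns.setdefault(index, set()).add(name)
    return columns


def shared_columns(columns, min_sequences=2):
    """Column indices whose gap is shared by at least `min_sequences` sequences."""
    return sorted(i for i, names in columns.items() if len(names) >= min_sequences)


# Conserved D through conserved P, excerpted from rows on this page.
segments = {
    "MEL1_todPac": "DFTFSLVNGFP",
    "MEL1_homSap": "DFLMS-FTQAP",
    "LMS1_droMel": "DFGIM-ITNTP",
    "UV7_droMel":  "DFLM--LIKCP",
    "UV5_apiMel":  "DFFM--MIKTP",
}

columns = gap_columns(segments)
for index in shared_columns(columns):
    print(index, sorted(columns[index]))
```

A column gapped in only one sequence would be the sporadic case and deserves a check against the underlying reads before any interpretation.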

We're concerned here primarily with non-sporadic indels that span two or more orthology classes and speak to unresolved dating and topological issues in the gene tree. Significant individual indels are visible on the alignment page. These give rise to a table sortable by position along the opsin sequence, indel length, region (eg 3rd cytoplasmic loop), higher taxonomic clade, and phylogenetic depth. Specific goals are dating indel events, characterizing remote opsins in pre-vertebrate deuterostomes, correctly placing cnidarian opsins, disambiguating opsins from non-opsin GPCR, and establishing ancestral lengths.

For deuterostome ciliary opsins, the story is fairly simple up to encephalopsin. None of the transmembrane helices have indels. That holds also for the first two cytoplasmic loops and first and last extracellular loops. Structural constraints can be too rigid to accommodate a length change, as illustrated by the well-known hydrogen bond chain of extremely conserved residues that holds the transmembrane helices in a fixed relative position: N55 in TM1 hydrogen bonded to D83 in TM2, in turn bonded to peptide A299 in TM7. Indels that altered the position of these residues within the respective helical wheels would cause the whole arrangement to become unglued. The asparagine and aspartate are deeply invariant not only in opsins but also in GPCR.

The second extracellular loop has a two residue insert in all rod and cone opsins in a region so far not attributed functional significance; this may have been a near-neutral event in the ancestral stem protein (ie in a gene duplicate of pinopsin). The cytoplasmic side has all the protein-protein interactions but length of the extracellular loops can still be important in tensioning of transmembrane helices that sets their angles of insertion and relative orientation.

The third cytoplasmic loop has variable length distally. Length is constant within orthology classes with parietopsin having full length, parapinopsin one residue shorter, and all others two residues fewer. This is a region of high B-factor in bovine rhodopsin crystals, ie it has too much movement to be assigned a conformation. Unsurprisingly no function has been assigned. While the indel pattern supports the conventional gene tree, evidently this indel hotspot has fixed at least three separate events. While that hasn't resulted in overt homoplasy in terms of length, additional events could be masked. This weakens interpretive certainty of indels in this region.

The amino terminus has 4 informative indels, all deletions. The first unites RHO1 and RHO2 to the exclusion of all other opsins (as does the short highly conserved N-terminus with two glycosylation sites). No indel or intron distinguishes them. RHO2 has an odd phylogenetic distribution -- it seems to occur in one species of lamprey but not in genomic lamprey (despite 19 million traces) nor in cartilaginous nor ray-finned fish, but seemingly arises again in lungfish, coelacanth, lizards, and chicken but not frog nor any mammal. Possibly the lamprey RHO2 is a lineage-specific duplication of lamprey RHO1. A later independent duplication in lobe-finned fish persisted until the mammalian nocturnal loss era. It may be missing in frog because of an incomplete genome.

Indels in melanopsins: TM2 region

The mid-transmembrane helix region preceding the proline in TM2 -- the only opsin transmembrane helix ever to experience an indel in 100 billion years of branch length evolution -- exhibits various independent insertions and deletions. That would seem to undercut efforts to make the length a definitive fundamental classifying tool among GPCR. The situation can be compounded by separate indels following the proline that, depending on gap placement, might affect the extracellular loop connecting TM2 and TM3.

However with care, the homoplasy is manageable, making the locus quite informative for opsins (though a detailed analysis is necessary to fully exploit it).

An 'iron triangle' provides a fixed upstream frame of reference critical to reliable gapping of indels in this region. This consists of a very conserved Asn55 in TM1 hydrogen bonded to the almost universal charged residue Asp83 internal to TM2 which is further hydrogen bonded via internal H2O to N of the terminal NPXXY motif and a peptide amide Ala299 in TM7 (bovine rhodopsin numbering). The iron triangle is central to the proper associative bundling and relative orientation of the seven transmembrane helices in the vicinity of the Schiff base K296. No indels occur in any opsin or GPCR between this N and D (meaning cytoplasmic loop CL1 is of fixed length, namely 12 aa). Note from the full alignment that D83 has been replaced by G in all teleost fish RHO2 and all SWS1; it is mixed with N83 in some RHO1, RHO2 and entirely N83 in SWS2 but ancestrally strictly D in basal ciliary opsins.
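The fixed N-to-D spacing gives a quick sanity check to run before any gaps are placed downstream. This is a minimal sketch under stated assumptions: each segment begins one residue before the conserved TM1 asparagine (as the rows on this page do) and is ungapped upstream of the D; the function and constant names are illustrative.

```python
# Spacing between the conserved TM1 asparagine and the TM2 aspartate,
# N55 to D83 in bovine rhodopsin numbering.
N_TO_D = 83 - 55  # 28 positions


def upstream_anchors_present(segment):
    """Check that N and D sit at the expected fixed spacing.

    Assumes `segment` starts one residue before the conserved N (eg 'GN...')
    and carries no gaps between the N and the D.
    """
    n_index = 1
    d_index = n_index + N_TO_D
    if len(segment) <= d_index:
        return False
    return segment[n_index] == "N" and segment[d_index] == "D"


# Rows copied from the melanopsin alignment on this page.
mel1_todpac = "GNGIVIYLFTKTKSLQTPANMFIINLAFSDFTFSLVNGFPLMTISCFLKKWIFGFAAC"
mel1_homsap = "GNLTVIYTFCRSRSLRTPANMFIINLAVSDFLMS-FTQAPVFFTSSLYKQWLFGETGC"
mel2_lotgig = "GNSIVIWAHVRIKSLSTTSNMLILNLCVGCLIMC-IVDFPLYATSSFLQKWIFGHKVC"

for name, seq in [("MEL1_todPac", mel1_todpac),
                  ("MEL1_homSap", mel1_homsap),
                  ("MEL2_lotGig", mel2_lotgig)]:
    print(name, upstream_anchors_present(seq))
```

A row that fails the check is not necessarily misaligned: MEL2_lotGig carries a cysteine at the D position in the alignment, so the sketch flags it for inspection rather than rejecting it.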

Downstream, the reference frame is augmented by the first cysteine C110 of the universal GPCR disulfide linking TM3 to EC2. This is preceded by an easily recognized ancient motif WIFG (squid melanopsin; human G106R causes retinitis pigmentosa), which forces all gaps to be placed between the iron triangle D, the proline P and WIFGFAAC (FVFGPTGC in bovine rhodopsin). Thus post-proline gapping is quite constrained by reliable anchors.

Proline, as an imino acid incapable of alpha helix participation, plays a special role in GPCR transmembrane helices, kinking or bulging them. Shifting the position of a proline one residue forward or back relative to the 3.6 residues per turn helical wheel (view down axis) alters both the angle of resumption of the helix and its membrane-exiting residue position, perhaps somewhat torquing the connection to the following transmembrane helix TM3.
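The size of the rotation is easy to compute for an idealized helix. The sketch below assumes the canonical 3.6 residues per turn of an alpha helix, ie 100 degrees per residue, and ignores the kink or bulge that the proline itself introduces; the function name is illustrative.

```python
RESIDUES_PER_TURN = 3.6                       # ideal alpha helix
DEGREES_PER_RESIDUE = 360.0 / RESIDUES_PER_TURN


def wheel_angle(offset):
    """Angular position (degrees) on the helical wheel of a residue lying
    `offset` positions after a reference residue placed at 0 degrees."""
    return (offset * DEGREES_PER_RESIDUE) % 360.0


# Proline 8, 9 or 10 positions after a reference residue in the same helix.
for offset in (8, 9, 10):
    print(offset, round(wheel_angle(offset)))
```

Each one-residue shift moves the proline about 100 degrees around the helix axis, so neighbouring proline positions face quite different sides of the helix even though the displacement along the axis is small.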

The effect is not dramatic in terms of angstroms of shift (as can be seen from a recent 3D alignment of helix TM2 that compares bovine and squid opsins), yet it follows from comparative genomics that the consequences for absorption spectrum and/or regulation of signaling must be substantial. In other words, gene clade specific retention of proline or specific substituents observed in the massive alignment below holding for billions of years of branch length is only feasible when adaptive.

The 185 ciliary opsins (which include 5 basal cnidarian opsins) in the reference sequence collection are all of the same length in this region (except for odd Apis and Platynereis sequences), as are 65 peropsins, RGR and neuropsins, many melanopsins, and the vast majority of near-opsin GPCR. Consequently this length, denoted P.59.2 (for proline in position 59 bovine rhodopsin numbering and 2 residues shorter in the proline-cysteine region than the longest opsins), is ancestral for melanopsins, which themselves vary in length.

Deuterostome melanopsins are all of P.59.2 type, as are LMS and BCR arthropod melanopsins, a subclass of lophotrochozoan melanopsins, and the one known cnidarian melanopsin. The remaining dozen known lophotrochozoan melanopsins are all type P.60.2. This class -- which fortunately includes the structurally determined squid melanopsin -- thus has a one residue insertion whose location appears to be 5 residues after the D and 4 before the P.
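For melanopsin rows, the class label can be recovered mechanically from the spaced row layout used in the TM2 alignment on this page, in which the conserved N, D, P and C occupy their own columns. The sketch below is an illustration only: the function name is hypothetical, the constant 51 is inferred from rows where 8 residues between D and P correspond to P.59, and the rule assumes any length change lies between the D and the P, so it is not expected to reproduce labels for classes with changes upstream of the D.

```python
def tm2_class(row):
    """Derive a P.xx.y label from one spaced alignment row.

    Expected whitespace-separated fields: name, G, N, TM1/CL1 block, D,
    D-to-P block, P, three P-to-C blocks, C, and optionally the page's label.
    """
    fields = row.split()
    d_to_p = fields[5]                  # block between the conserved D and P
    p_to_c = "".join(fields[7:10])      # blocks between P and the final C
    residues = len(d_to_p.replace("-", ""))
    proline = 51 + residues             # 8 residues between D and P gives 59
    shortfall = p_to_c.count("-")       # gap columns relative to the longest opsins
    return f"P.{proline}.{shortfall}"


squid_row = ("MEL1_todPa  G N GIVIYLFTKTKSLQTPANMFIINLAFS D FTFSLVNGF P "
             "LMTISCFL-- KKWIFGF AA C  P.602")
human_row = ("MEL1_homSa  G N LTVIYTFCRSRSLRTPANMFIINLAVS D FLMS-FTQA P "
             "VFFTSSLY-- KQWLFGE TG C  P.592")

print(tm2_class(squid_row))
print(tm2_class(human_row))
```

The two rows give P.60.2 and P.59.2, matching the labels P.602 and P.592 printed at the end of those rows.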

Thus lophotrochozoan melanopsins had ancestral length up to a gene duplication which subsequently acquired this stem insertion in a descendent copy. A single other human GPCR, namely thyrotropin-releasing hormone receptor TRHR, is also P.60.2, demonstrating homoplasy. However given the rarity of transmembrane indel events, the history here can be reliably disambiguated assuming parsimony.

The three classes of ecdysozoan ultraviolet melanopsins (represented by 44 genes) all share a one residue deletion in this same region, approximately at the 4th post-D residue, making them P.58.2 class, homoplasic to within gap placement to moderately abundant GPCR (eg somatostatin receptor). This event, affecting insects, crustaceans and chelicerates, occurred deep within the stem lineage of ecdysozoa. More data from early diverging arthropods is needed to refine the timing. Recall these opsins have a peculiar lysine K90 (sometimes E90) that tunes their absorption into the ultraviolet. The extra residue loss may be required to correctly position the K90 for its blueshift.

The three molluscan melanopsins of ancestral length share a striking signature aspartate residue two positions preceding the proline, ie at this same K90 position. (Recall G90D and T94I in human RHO1 constitutively activate transducin in absence of chromophore and cause night blindness.) Consequently these three opsins may also have their absorption shifted towards the UV since otherwise G90 is present in lophotrochozoan melanopsins. They should be renamed (ie reclassified) to reflect probable parental character, with P.60.2 lophotrochozoan opsins renamed to MEL2.

The post-proline pre-cysteine region has length variations that represent insertions in various homology classes. They are difficult to gap reliably other than occurring at the distal end of TM2 before the conserved block of extracellular loop EL1. As TM2 (by definition) just reaches the surface, these extra residues can be attributed to lengthened EL1. It emerges that indels outside the D to P region are only moderately informative. They may suffice to define narrow classes of opsins where blast clustering is ambiguous. While pseudo-homoplasic, that is readily resolvable given the sequence cluster isolation:

  • Three amphioxus melanopsins (eg MEL6_braFlo) have a 1 residue distal deletion but MELmop_braFlo does not. This event constitutes an isolated class of sequences.
  • Nine melanopsins from Branchiopoda have a 1 residue distal insertion. Three other melanopsins from this group have a further 1 residue insertion. This group of melanopsins has other odd properties; these could possibly have deeper ancestral roots but data is lacking from earlier branching arthropods.
  • RGR opsins all have a 1 residue distal deletion; however two Ciona opsins have seemingly regained a residue. Five Ciona RGR have a deletion preceding the D. However because the proline anchor is lacking, placement is otherwise uncertain in this isolated opsin class. These same five opsins are unique in having tyrosine in place of the conserved asparagine N in TM1 (that bonds to D).
  • Five peropsins have an inserted residue preceding the D. This appears to define PER2 opsins which are currently restricted to amphioxus and sea urchin. Hemichordates have a peropsin of type PER1 lacking the insert. Lophotrochozoan peropsins also lack it. Thus it appears to be a very restricted early gene expansion that did not persist in vertebrates.
  • NEUR4 neuropsins have a large distal insertion of 4 residues. This class of opsins is quite obscure and lacks the proline.

Departures from the conserved N D P C format are uncommon. RGR is Y/N D VMITAL C and NEUR4 neuropsins are consistently N D S C. Ciliary opsins are the only major group departing from this pattern. Most provocatively, the very earliest TMT opsins from deuterostomes, ecdysozoa and cnidaria have the standard pattern, establishing it as unquestionably ancestral for ciliary opsins.

These opsins should be renamed to reflect this classificatory principle because they provide the ciliary ur-opsin form and quite possibly function. They cannot be successfully modeled in TM2 using bovine rhodopsin structure because it lacks the proline and its induced kink.

Using known fish ciliary ur-opsins as probes and the N D P C (especially P) as extra criterion, it emerges that both frog and lizard have a ciliary ur-opsin in syntenic location. However chicken, platypus, marsupial, and placental mammal do not. Gene order is preserved in chicken but no pseudogene remains at this site. This is a familiar story in opsins ... an old gene fades out mid-amniote but otherwise continues on for 310 million years (Wall hypothesis).

No transcript data or reference gene information is available for frog or lizard ciliary ur-opsin, meaning nothing is known about site of expression. However this opsin has been specifically studied in fish, amphioxus and sea urchin. Testis is one site of expression.

Alignment of TM2 proline region in lophotrochozoan melanopsins with included representative outgroup sequences.
Numbers in parentheses indicate total number of reference sequences represented by the proxy sequence:

MEL1_todPac GNGIVIYLFTKTKSLQTPANMFIINLAFSDFTFSLVNGFPLMTISCFLKKWIFGFAAC P.60.2   (1)
MEL1_sepOff GNGIVIYLFTKTKSLQTPANMFIINLAFSDFTFSLVNGFPLMTISCFIKKWVFGMAAC P.60.2   (1)
MEL1_entDof GNGVVIYLFSKTKSLQTPANMFIINLAMSDLSFSAINGFPLKTISAFMKKWIFGKVAC P.60.2   (1)
MEL1_patYes GNTTVVYIFSNTKSLRSPSNLFVVNLAVSDLIFSAVNGFPLLTVSSFHQKWIFGSLFC P.60.2   (1)
MEL1_lotGit GNFVVIYTFSRTKSLRTASNMFVVNLALSDLTFSAVNGFPLFSLSSFSHKWIFGRVAC P.60.2   (1)
MEL1_plaDum GNLLVVWTFLKTKSLRTAPNMLLVNLAIGDMAFSAINGFPLLTISSINKRWVWGKLWc P.60.2   (1)
MEL1_schMed GNLLVLYIFARAKSLRTPPNMFIMSLAIGDLTFSAVNGFPLLTISSFNTRWAWGKLTC P.60.2   (1)
MEL1_capCap GNLVVITLFIKTRSLRTPPNMFIINLALSDMGFCATNGFPLMTVASFQKLWRWGPVAC P.60.2   (1)
MEL1_schMan GNSLVITLFLLCKQLRTPPNMLIVSLAISDFSFALINGFPLKTIAAFNHRWGWGKLAC P.60.2   (1)
MEL2_schMan LNLLVIVFFTMFKSLRTPSNILVVNLAISDFGFSAVIGFPLKTMAAFNNFWPWGKLAC P.60.2   (1)
MEL3_schMan TNLLVIFVFLTPKSSISLQCALIINLAISDFGFSAVIGFPLKTIAAFNQYWPWGSVAC P.60.2   (1)
MEL1_helRob GNIIVVWVFSRTPSLRTPSNVLVINLAICDILFSALIGFPMSALSCFQRHWIWGNFYC P.60.2   (1)
MEL2_helRob            TPILRTHANVLIINLALCDLIFSSLIGFPMTALSCFKRHWIWGDLGC P.60.2   (1)
MEL1_aplCal GNSLVIITCIRFKDLRTRSNILIINLAVGDLLMC-LIDFPLLAAASFYGEWPYGRQVC P.59.2   (1)
MEL2_lotGig GNSIVIWAHVRIKSLSTTSNMLILNLCVGCLIMC-IVDFPLYATSSFLQKWIFGHKVC P.59.2   (1)
MEL2_aplCal           RHSSLRTSSNLLVVNLTVADLVMS-SLDFPILAISSYKGCWVMGFLGC P.59.2   (1)
LMS1_droMel GNGVVIYIFATTKSLRTPANLLVINLAISDFGIM-ITNTPMMGINLYFETWVLGPMMC P.59.2  (23)
MEL1_homSap GNLTVIYTFCRSRSLRTPANMFIINLAVSDFLMS-FTQAPVFFTSSLYKQWLFGETGC P.59.2  (20)
MEL2_strPur GNSLVIYTFLRFKKLHSPINLLIVNLSASDLLVA-TTGTPLSMVSSFYGRWLFGTNAC P.59.2  (10)
TMTa1_danRe NNLLVLVLFGRYKVLRSPINFLLVNICLSDLLVC-VLGTPFSFAASTQGRWLIGDTGC P.59.2 (185)
PER1_homSap SNIIVLGIFIKYKELRTPTNAIIINLAVTDIGVS-SIGYPMSAASDLYGSWKFGYAGC P.59.2  (33)
NEUR1_homSa GNGYVLYMSSRRKKKLRPAEIMTINLAVCDLGIS-VVGKPFTIISCFCHRWVFGWIGC P.59.2  (30)
UV7_droMel  GNAFVIFMFANRKSLRTPANILVMNLAICDFLM--LIKCPIAIYNNIKEGPALGDIAC P.58.2  (14)
UV5_apiMel  GNGLVIWIFCAAKSLRTPSNMFVVNLAICDFFM--MIKTPIFIYNSFNTGFALGNLGC P.58.2  (20)
UVB_nasVit  GNGCVVWIFSTSKVLRTPSNLFIINLALFDLVM--ALEIPMLIINSFIERMIGWGLGC P.58.2   (8)
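Because each row stands as proxy for a number of reference sequences, class totals are weighted sums instead of row counts. A minimal sketch of that tally follows, assuming rows keep the layout above (name, sequence, class label, count in parentheses); the function name and the regular expression are illustrative.

```python
import re
from collections import Counter

# name, aligned sequence, class label such as P.59.2, proxy count in parentheses
ROW = re.compile(r"^(\S+)\s+(\S+)\s+(P\.\d+\.\d+)\s+\((\d+)\)\s*$")


def tally_by_class(rows):
    """Sum the proxy counts of alignment rows per class label."""
    totals = Counter()
    for row in rows:
        match = ROW.match(row.strip())
        if match is None:
            continue                      # skip rows that do not fit the layout
        _name, _seq, label, count = match.groups()
        totals[label] += int(count)
    return totals


sample_rows = [
    "MEL1_todPac GNGIVIYLFTKTKSLQTPANMFIINLAFSDFTFSLVNGFPLMTISCFLKKWIFGFAAC P.60.2   (1)",
    "LMS1_droMel GNGVVIYIFATTKSLRTPANLLVINLAISDFGIM-ITNTPMMGINLYFETWVLGPMMC P.59.2  (23)",
    "MEL1_homSap GNLTVIYTFCRSRSLRTPANMFIINLAVSDFLMS-FTQAPVFFTSSLYKQWLFGETGC P.59.2  (20)",
    "UV7_droMel  GNAFVIFMFANRKSLRTPANILVMNLAICDFLM--LIKCPIAIYNNIKEGPALGDIAC P.58.2  (14)",
    "UV5_apiMel  GNGLVIWIFCAAKSLRTPSNMFVVNLAICDFFM--MIKTPIFIYNSFNTGFALGNLGC P.58.2  (20)",
]

for label, total in sorted(tally_by_class(sample_rows).items()):
    print(label, total)
```

For the five sample rows this gives 34 sequences for P.58.2, 43 for P.59.2 and 1 for P.60.2; feeding in every row of the alignment gives the totals for the full reference collection.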

[Figure: Opsinh2o.jpg]

Alignment in TM2 region: 411 opsins

Colored blocks show useful opsin gene tree synapomorphies -- derived states relative to last common ancestor. These genes received the indel (position colored magenta) -- ie the indel was resolvable. Note the consistency with gene names, which were derived independently via blast clustering without consideration of introns, indels, and other rare genomic events.

Some phyloSNPs at key amino acids are also shown in red, notably the K90 of insect ultraviolet melanopsins and the sole class of ciliary opsins with ancestral proline surviving at position 59, which distinguishes the ciliary ur-opsin class TMTa from later gene duplications that became other TMT homologs and encephalopsins. SWS1 opsins all have an asparagine in place of the key aspartate; RHO2 in teleost fish all have a glycine; NEUR4 all have a serine here as well as a 4 residue insert.

            --TM1-----><---CL1---><---------------TM2------------><--EC1---><-TM3
MEL1_todPa  G N GIVIYLFTKTKSLQTPANMFIINLAFS D FTFSLVNGF P LMTISCFL-- KKWIFGF AA C  P.602
MEL1_sepOf  G N GIVIYLFTKTKSLQTPANMFIINLAFS D FTFSLVNGF P LMTISCFI-- KKWVFGM AA C  P.602
MEL1_entDo  G N GVVIYLFSKTKSLQTPANMFIINLAMS D LSFSAINGF P LKTISAFM-- KKWIFGK VA C  P.602
MEL1_patYe  G N TTVVYIFSNTKSLRSPSNLFVVNLAVS D LIFSAVNGF P LLTVSSFH-- QKWIFGS LF C  P.602
MEL1_lotGi  G N FVVIYTFSRTKSLRTASNMFVVNLALS D LTFSAVNGF P LFSLSSFS-- HKWIFGR VA C  P.602
MEL1_plaDu  G N LLVVWTFLKTKSLRTAPNMLLVNLAIG D MAFSAINGF P LLTISSIN-- KRWVWGK LW c  P.602
MEL1_schMe  G N LLVLYIFARAKSLRTPPNMFIMSLAIG D LTFSAVNGF P LLTISSFN-- TRWAWGK LT C  P.602
MEL1_capCa  G N LVVITLFIKTRSLRTPPNMFIINLALS D MGFCATNGF P LMTVASFQ-- KLWRWGP VA C  P.602
MEL1_schMa  G N SLVITLFLLCKQLRTPPNMLIVSLAIS D FSFALINGF P LKTIAAFN-- HRWGWGK LA C  P.602
MEL2_schMa  L N LLVIVFFTMFKSLRTPSNILVVNLAIS D FGFSAVIGF P LKTMAAFN-- NFWPWGK LA C  P.602
MEL3_schMa  T N LLVIFVFLTPKSSISLQCALIINLAIS D FGFSAVIGF P LKTIAAFN-- QYWPWGS VA C  P.602
MEL1_helRo  G N IIVVWVFSRTPSLRTPSNVLVINLAIC D ILFSALIGF P MSALSCFQ-- RHWIWGN FY C  P.602
MEL2_helRo  . . .........TPILRTHANVLIINLALC D LIFSSLIGF P MTALSCFK-- RHWIWGD LG C  P.602
MEL1_aplCa  G N SLVIITCIRFKDLRTRSNILIINLAVG D LLMC-LIDF P LLAAASFY-- GEWPYGR QV C  P.592
MEL2_lotGi  G N SIVIWAHVRIKSLSTTSNMLILNLCVG C LIMC-IVDF P LYATSSFL-- QKWIFGH KV C  P.592
MEL2_aplCa  . . ........RHSSLRTSSNLLVVNLTVA D LVMS-SLDF P ILAISSYK-- GCWVMGF LG C  P.592
MEL1_homSa  G N LTVIYTFCRSRSLRTPANMFIINLAVS D FLMS-FTQA P VFFTSSLY-- KQWLFGE TG C  P.592
MEL1_felCa  G N LMVIYTFCRSRGLRTPANMFIINLAVS D FFMS-FTQA P VFFASSLH-- KRWLFGE AG C  P.592
MEL1_canFa  G N LMVIYTFCRTRGLRTPSNMFIINLAVS D FFMS-FTQA P VFFASSLH-- KRWLFGE AG C  P.592
MEL1_myoLu  G N LTVIYTFCRSRGLRTPANMFIINLAVS D FLMC-FTQA P VVFASSIY-- KRWLFGE AG C  P.592
MEL1_pteVa  G N LTVIYTFCRSRGLRTPANMFIINLAVS D FLMS-FTQA P VVFISSLY-- KRWLFGQ AG C  P.592
MEL1_smiCr  G N LLVIYTFCRSRSLRTPANMFIINLAIS D FFMS-FTQA P VFFASSLY-- ERWIFGE KG C  P.592
MEL1_monDo  G N FLVIYTFCRSHSLRTPANMFIINLAIS D FFMS-FTQA P VFFASSMY-- KRWIFGE KA C  P.592
MEL1_loxAf  G N LMVIYIFFRSRGLRTPANMFIINLAVS D FLMS-FTQA P VFFASSLY-- KRWLFGE AG C  P.592
MEL1_taeGu  G N FLVFYAFCRSRSLQTPANILIINLAIS D FLMS-ITQS P VFFTSSLY-- KHWIFGE KG C  P.592
MEL1_galGa  G N FLVIYAFCRSRTLQKPANIFIINLAVS D FLMS-ITQS P VFFTNSLH-- KRWIFGE KG C  P.592
MEL1_xenTr  G N FLVIYAFCRSRSLRSPANMFIINLAIT D FLMS-VTQA P VFFATSLH-- KRWIFGE KG C  P.592
MEL1_danRe  G N FLVIYAFSRSRTLRTPANLFIINLAIT D FLMC-ATQA P IFFTTSMH-- KRWIFGE KG C  P.592
MEL1_takRu  G N FLVIYAFCRSRSLRTPANMFIINLAVT D LLMC-VTQT P IFFTTSMY-- KRWIFGE KG C  P.592
MEL1_gasAc  G N VLVIYAFSKSRSLRTPANMFIINLAIT D LLMC-VTQA P IFFTTSMH-- KRWIFGE KG C  P.592
MEL1_oryLa  G N FLVIYAFSRSRSLRTPANMFIINLAIT D LLMC-VTQS P IFFTTSMH-- KRWIFGE KG C  P.592
MEL1_calMi  G N FLVIYAFLRSRSLRTPANTFIINLAAT D FLMS-VTQS P IFFITSIH-- KRWIFGE KG C  P.592
MEL1_petMa  G N VLVIYAFSKSKSLRSPANIFIINLAFA D FFMS-ITQT P IFFVTSLH-- KRWIFGE KG C  P.592
MELmop_bra  G N AVVVYSFIKSKGLRTPANFFIINLALS D FLMN-LTNM P IFAVNSAF-- QRWLLSD FA C  P.592
MELmop_bra  G N AVVVYSFIKSKGLRTPANFFIINLALS D FLMN-LTNM P IFAVNSAF-- QRWLLSD FA C  P.592
MEL1_strPu  . . ........WTKSLRTPPNMLIVNLAIS D FGMV-ITNF P LMFASTIY-- NRWLFGD AG C  P.592
MEL2_strPu  G N SLVIYTFLRFKKLHSPINLLIVNLSAS D LLVA-TTGT P LSMVSSFY-- GRWLFGT NA C  P.592
MEL2_galGa  G N LLVLYAFYSNKKLRTPQNFFIMNLAVS D FLMS-ASQA P ICFVNSLH-- REWILGD IG C  P.592
MEL2_xenLa  G N MLVLYAFYRNKKLRTAPNYFIINLAIS D FLMS-ATQA P VCFLSSLH-- REWILGD IG C  P.592
MEL2_anoCa  G N LLVLYAFYSNKRLRTPPNYFIMNLAVS D FLMS-ATQA P ICFLNSMH-- KEWVLGD IG C  P.592
MEL2_tetNi  G N VLVIFAFYSNKKLRSLPNYFIVNLAVS D LLMA-STQS P IFFIN-LY-- KEWMFGE TA C  P.592
MEL2_danRe  G N ALVMFAFYRNKKLRSLPNYFIMNLAVS D FLMA-ITQS P IFFINCLY-- KEWMFGE LG C  P.592
MEL2_gasAc  G N ALVMLAVYSNKKLRNLPNYFIMNLAVS D FLMA-FTQS P IFFINCLY-- KEWAFGE TG C  P.592
MEL6_braFl  G N AVALYAFCRSRSLRRPKNYLIANLCLT D MVVC-LVYS P IIVTRSL--- SHGLPSK ES C  P.593
MEL6_braBe  G N VVALYAFCRTRSLRRPKNYVVANLCLT D MFVC-LVYC P IVVSRSF--- SHGFPSK ES C  P.593
MELx_braFl  G N AVALYAFCSTRKLRRPKNYVVANLCLT D LIMC-IVYC P VIVISSF--- SGRIPTD GA C  P.593

LMS1_droMe  G N GVVIYIFATTKSLRTPANLLVINLAIS D FGIM-ITNT P MMGINLYF-- ETWVLGP MM C  P.592
LMS2_droMe  G N GVVVYIFGGTKSLRTPANLLVLNLAFS D FCMM-ASQS P VMIINFYY-- ETWVLGP LW C  P.592
LMS6_droMe  G N FIVMYIFTSSKGLRTPSNMFVVNLAFS D FMMM-FTMF P PVVLNGFY-- GTWIMGP FL C  P.592
LMS_anoGam  G N GMVIYIFSTAKSLRTPSNLFIVNLALS D FLMM-GTNA P TMVYNCWF-- ETWSLGL LM C  P.592
LMS_rhoPro  G N GMVIFIFSSTKTLRTPSNLLVVNLAFS D FLMM-FTMS P PMVINCYN-- ETWVLGP LM C  P.592
LMS_schGre  G N GMVIYIFSTTKSLRTPSNLLVVNLAFS D FLMM-FTMS A PMGINCYY-- ETWVLGP FM C  P.592
LMS_lucCru  G N GMVIYIFSTTKSLRSPSNLLVVNLAFS D FLMM-FTMA P PMVINCYN-- ETWVWGP LF C  P.592
LMS_triCas  G N GMVIYIFSSTKALRTPSNLLVVNLAFS D FLMM-LCMS P AMVINCYN-- ETWVLGP LV C  P.592
LMS_manSex  G N GMVIYIFMSTKSLKTPSNLLVVNLAFS D FLMM-CAMS P AMVVNCYY-- ETWVWGP FA C  P.592
LMS_papXut  G N GMVVYIFTSTKSLKTPSNLLVVNLAFS D FLMM-LCMA P PMLINCYY-- ETWVFGP LA C  P.592
LMS_homCoa  G N GMVVYIFSCTKALRTPSNLLVVNLAFS D FLMM-FTMA P PMVLNCYY-- ETWVLGP FM C  P.592
LMSa_nasVi  G N GMVVYIFASTKSLRTPSNLLVINLAFS D FCMM-FTMS P PMVINCYY-- ETWVFGP LM C  P.592
LMSb_apiMe  G N GMVVYIFLSTKSLRTPSNLFVINLAIS D FLMM-FCMS P PMVINCYY-- ETWVLGP LF C  P.592
LMS_acyPis  G N GMVIYIFTCTKNLRTPSNLLIVNLAFS D FCLM-FTMC P AMVWNCFY-- ETWMFGP FA C  P.592
LMSb_nasVi  G N GMVVYIFLVTPSLRTPSNLLVINLAFS D FVMM-IIMS P PMVVNCWY-- ETWILGP LM C  P.592
LMSa_apiMe  G N GVVVYVFIMTPSLRTPSNLLVVNLAFS D FIMM-GFMC P PMVICCFY-- ETWVLGS LM C  P.592
LMS_meoOer  G N FVVIWVFMNTKALRSPANTLVVSLAVS D FIMM-ACMF P PLVLNCYW-- GTWIFGP LF C  P.592
LMS_limPol  G N GMVIYLMMTTKSLRTPTNLLVVNLAFS D FCMM-AFMM P TMTSNCFA-- ETWILGP FM C  P.592
LMS2_plePa  G N GMVMYLMNTTKSLKTPTNMLIVNLAFS D FCMM-AFMM P TMAANCFA-- ETWILGP FM C  P.592
LMS2_hasAd  G N GMVIYLMSTTKSLKTPTNMLIVNLAFS D FCMM-AFMM P TMAANCFA-- ETWILGP LM C  P.592
LMS_ixoSca  G N SMVIYIMTTSKSLRSPTNMLVVNLAFS D WCMM-AFMM P TMAANCFA-- ETWILGP FM C  P.592
LMS1_plePa  G N SIVIYLMLSVKSLRTPANFLVTSLAVS D GGML-AFMA P TMPINCFA-- QTWVLGP FM C  P.592
LMS1_hasAd  G N GVVMYLMMTVKNLRTPGNFLVLNLALS D FGML-FFMM P TMSINCFA-- ETWVIGP FM C  P.592
BCRa_hemSa  G N GLVIYLYMKSQALKTPANMLIVNLALS D LIML-TTNF P PFCYNCFGS- GRWMFSG TY C  P.591
BCRb_hemSa  G N GLVIYLFNKSAALRTPANILVVNLALS D LIML-TTNV P FFTYNCFGS- GVWMFSP QY C  P.591
BCR_porPel  G N GMVIYLFAKCQALRTPANILVVNLALS D LIML-TTNV P FFTYNCFGN- GVWMFSA TY C  P.591
BCR_triGra  G N SLVISLFTKTKELRTPANMFVVNLAFS D LCMM-ITQF P MFVYNCFGN- GMWLFGP FL C  P.591
BCR2_triLo  G N SLVISLFTKTKELRTPANMFVVNLAFS D LCMM-ITQF P MFVYNCFGN- GMWLFGP FL C  P.591
BCR_limPol  G Q SVVLYLFAKTKPLRTPANMLIVNLAFS D FMMM-ITQF P VFIINCLGG- GAWQLGP LL C  P.591
BCR2_braKu  G N GLVIWIFLKTKSLRTPSNMLIVNLAIA D FFMM-LTQS P LYIISAFST- RWWIWGH FW C  P.591
BCR3_braKu  G N GLVIKIFLKTKSLRTPSNMLIVNLAIA D FFMM-LTQS P LFIISAFSS- RWWIWGH FW C  P.591
BCR1_triGr  G N YLVLRIFTKFQELRRPSNVLVINLALS D MLLM-LTLF P ECVYNFLGS- GPWRFGD LG C  P.591
BCR2_triGr  G N VLVLHIFGKHKNLRSPTNTLLMNLAFC D LMIF-IGLY P EMLGNIFMND GTWMWSGD VA C  P.590
BCR1_triLo  G N VLVLHIFGKHKNLRSPTNTLLMNLAFC D LMIF-IGLY P EMLGNIFMND GTWMWGD IA C  P.590
BCR3_triGr  G N VLVLYIFGKYKSLRSPTNVLVMNLAFC D LGLF-VGLY P ELLGNIFINN GPWMWGD VA C  P.590
MEL1_dapPu  A N STILYVFSRFKRLRTPANVFIINLTIC D FLA--CCLH P LAVYSAFR-- GRWSFGQ TG C  P.582
UV7a_acyPi  G N SLVIFMYFKCRSLQTPANMLIINLAVS D FIM--LAKA S VFIYNSYY-- LGPALGK LG C  P.582
UV7b_acyPi  G N SLVIFMYIKCKSLQTPANVLIMNLAVS D FIM--LAKT P VFIYNSFY-- QGPTLGK LG C  P.582
UV7_rhoPro  G N LLVIFMILRFRTLRTSSNILILNLAVS D FLM--VAKM P VFIYNSFY-- FGPVLGE MG C  P.582
UV7_anoGam  G N ALVVFMFYRYRSLRTPANYLVINLAVA D FII--MMEA P MFIYNSIH-- QGPALGS IG C  P.582
UV7_aedAeg  G N LLVILMFFRFKSLRTPANYLVINLAIA D FII--MLEA P LFVYNSYH-- QGPATGN VW C  P.582
UV7_culQui  G N VLVIFMFFKFKSLRTPANYLVINLAVA D FLI--MLEA P IFVYNSYH-- LGPAFGN TL C  P.582
UV7_droMel  G N AFVIFMFANRKSLRTPANILVMNLAIC D FLM--LIKC P IAIYNNIK-- EGPALGD IA C  P.582
UV7_droYak  G N AFVIFMFANRKSLRTPANILVMNLAIC D FLM--LIKC P IAIYNNIK-- EGPALGD IA C  P.582
UV7_droAna  G N AFVIFMFANRKSLRTPANILVMNLAIC D FLM--LVKC P IAIYNNIK-- EGPALGD VA C  P.582
UV7_droPse  G N AFVIFMFANRKSLRTPANILVMNLAIC D FLM--LVKC P IAIYNNIK-- EGPALGD AA C  P.582
UV7_droWil  G N AFVIFMFSNRKSLRTPANILVMNLAIC D FLM--LVKC P IAIYNNIK-- EGPALGD IA C  P.582
UV7_droMoj  G N AFVIFMFGSRKSLRTPANILVMNLAIC D FLM--LVKC P IAIYNNIQ-- EGPALGD AA C  P.582
UV7_pedHum  G N FLIIYLFLRKRSLRTPSNVFIFNLAVS D SLL--LLKM P VFIINSFY-- LGPALGN LG C  P.582
UV7_ixoSca  . . .........RRRIRSQANLLVFNLALS D LLM--VLEI P LLVYNSLK-- LRPALGV WG C  P.582
UV5_plePay  G N AIVMYIFFSAKTLRTPTNMFVIGLAMA D LLM--MSKT P VFIYNCFH-- LGPVFGQ IG C  P.582
UV5_hasAda  G N AIVIYIFSVSKSLRTPTNMFVIGLAMA D LLM--MSKT P VFIYNCFH-- LGPVFGQ LG C  P.582
UV5_braKug  G N GVVIWVFASAKSLRTPSNLFVINLAVL D FLM--MLKT P VFIVNSFN-- EGPIWGK TG C  P.582
UV5_triLon  G N GVVIWIFSSAKSLRTPSNMFVINLAVL D FIM--MMKT P VFIVNSFN-- EGPIWGK FG C  P.582
UV5_triGra  G N GVVIWIFSSAKSLRTPSNMFVINLAVL D FIM--MMKT P VFIVNSFN-- EGPIWGK FG C  P.582
UV5a_dapPu  G N GVVIWIFTNCKSLRTPSNMLVVNLAIL D MLM--MLKS P VMIINSYN-- EGPIWGK LG C  P.582
UV5b_dapPu  G N GIVIYIFSTTKELKTPSNILILNLAIC D FIM--MIKT P IFIVNSFN-- EGPVFGR LG C  P.582
UV5_papXut  G N GLVIFIFSASKSLRTPSNLLVVQLAVL D FLM--MLKA P IFIYNSIK-- RGFASGV IG C  P.582
UV5_manSex  G N GMVIFIFSTTKSLRTSSNFLVLNLAIL D FIM--MAKA P -FIYNSAM-- RGFAVGT VG C  P.582
UV5_apiMel  G N GLVIWIFCAAKSLRTPSNMFVVNLAIC D FFM--MIKT P IFIYNSFN-- TGFALGN LG C  P.582
UV5_nasVit  G N GLVIWIFCAAKSLRTPSNMFVVNLAIC D FMM--MLKT P IFIYNSFH-- TGFALGN LG C  P.582
UV5_diaNig  G N GLVIWVFSSAKTLRTPSNIFVINLALY D FIM--MLKT P IFIYNSFN-- LGFGLGQ LG C  P.582
UV5_lucCru  G N GLVLWIFSTSKSLKTASNMFVVNLAFC D FIM--MMKM P IFVYNSFN-- RGYALGH IG C  P.582
UV5_triCas  G N GLVIWIFSTSKSLRTASNMFVVNLAIC D FAM--MIKT P IFIYNSFY-- RGFALGH LG C  P.582
UV5_anoGam  G N GLVIWIFIAAKSLRTPSNVFVINLAIC D FFM--MAKT P IFIYNSFT-- KGFTLGN LG C  P.582
UV4_droMel  G N GMVIWIFSTSKSLRTPSNMFVLNLAVF D LIM--CLKA P IFIYNSFH-- RGFALGN TW C  P.582
UV3_droMel  G N GLVIWVFSAAKSLRTPSNILVINLAFC D FMM--MVKT P IFIYNSFH-- QGYALGH LG C  P.582
UV5_rhoPro  G N GLVIWIFSTAKTLRTPSNIFVVNLAIC D FLM--MSKT P IFIYNSFK-- LGYALGH RA C  P.582
UV5_pedHum  G N GIVIWIFTTSKNLRTASNVFVVNLAIF D FIM--MAKT P IMIYNSMN-- LGFECGF VW C  P.582
UV5_acyPis  G N GLVIWVFCVAKPLRTPSNIFVINLALC D FVM--MAKA P IFILGSIN-- RGY-QGH FL C  P.582
UVB_anoGam  G N GIVLWIFGTSKSLRNGSNMFIINLAIF D LLM--MCEM P MFLVNSFS-- ERLVGYG VG C  P.582
UVB_diaNig  G N GIVLWIFATTKSLRTPSNMFVVNQALL D LLM--MIEM P MFVLNSLYF- QRPIGWE MG C  P.581
UVB_manSex  G N GIVIWIFSTSKSLRSASNMFVINLAVF D LMM--MLEM P LLIMNSFY-- QRLVGYQ LG C  P.582
UVB_apiMel  G N CCVIWIFSTSKSLRTPSNMFIVSLAIF D IIM--AFEM P MLVISSFM-- ERMIGWE IG C  P.582
UVB_nasVit  G N GCVVWIFSTSKVLRTPSNLFIINLALF D LVM--ALEI P MLIINSFI-- ERMIGWG LG C  P.582
UV5B_droMe  G N GLVIWIFSTSKSLRTPSNLLILNLAIF D LFM--CTNM P HYLINATV-- GYIVGGD LG C  P.582
UVB_acyPis  G N GLVLWIFCVSKPLRTPSNLFVLNLALC D FSM--VLVL P ILIYDSID-- HKY-PGH LQ C  P.582
UVB_megVic  G N GLVLWIFCVSKPLRTPSNLFVLNLALC D FSM--VLVL P ILIYDSID-- HKY-PGH LQ C  P.582

RHO1_bosTa  I N FLTLYVTVQHKKLRTPLNYILLNLAVA D LFMV-FGGF T TTLYTSLH-- GYFVFGP TG C  P.592
RHO1_homSa  I N FLTLYVTVQHKKLRTPLNYILLNLAVA D LFMV-LGGF T STLYTSLH-- GYFVFGP TG C  P.592
RHO1_monDo  I N FLTLYVTIQHKKLRTPLNYILLNLAIA D LFMV-FGGF T MTLYTSLH-- GYFVFGP TG C  P.592
RHO1_ornAn  I N FLTLYVTIQHKKLRTPLNYILLNLAFA N HFMV-LGGF T TTLYTSLH-- GYFVFGP TG C  P.592
RHO1_galGa  V N FLTLYVTIQHKKLRTPLNYILLNLVVA D LFMV-FGGF T TTMYTSMN-- GYFVFGV TG C  P.592
RHO1_anoCa  I N FLTLFVTIQHKKLRTPLNYILLNLAVA N LFMV-LMGF T TTMYTSMN-- GYFIFGT VG C  P.592
RHO1_xenTr  I N FMTLYVTIQHKKLRTPLNYILLNLVFA N HFMV-LCGF T VTMYTSMH-- GYFIFGQ TG C  P.592
RHO1_neoFo  I N FLTLYVTVQHKKLRTPLNYILLNLAVA D LFMV-FGGF T TTMYTAMN-- GYFVFGV VG C  P.592
RHO1_latCh  I N FLTLFVTIQHKKLRTPLNYILLDLAVA D LCMV-FGGF F VTMYSSMN-- GYFVLGP TG C  P.592
RHO1_angAn  V N FLTLYVTIEHKKLRTPLNYILLNLAVA N LFMV-FGGF T TTVYTSMH-- GYFVFGE TG C  P.592
RHO1_conMy  I N FLTLYVTIEHKKLRTPLNYILLNLAVA D LFMV-FGGF T TTMYTSMH-- GYFVFGP TG C  P.592
RHO1_takRu  V N FLTLFVTVKHKKLRTPLNYVLLNLAVA D LFMV-IGGF T VTLYTALH-- AYFVLGV TG C  P.592
RHO1_leuEr  V N FLTLFVTIQHKKLRQPLNYILLNLAVS D LFMV-FGGF T TTIITSMN-- GYFIFGP AG C  P.592
RHO1_calMi  V N FLTLYVTFEHKKLRQPLNFILLNLAVA D LFMV-FGGF F ITVYTSLH-- GYFVFGV TG C  P.592
RHO1_petMa  V N FLTLFVTVQHKKLRTPLNYILLNLAVA N LFMV-LFGF T LTMYSSMN-- GYFVFGP TM C  P.592
RHO1_letJa  V N FLTLFVTVQHKKLRTPLNYILLNLAMA N LFMV-LFGF T VTMYTSMN-- GYFVFGP TM C  P.592
RHO1_geoAu  V N FLTLFVTVQHKKLRTPLNYILLNLAVS N LFMI-LFGF T TTMYTSMN-- GYFVFGP TM C  P.592
RHO2_calMi  I N GLTLLVTVKHKKLRQPLNFILLNLAVA D LFMV-FGGF F ITVYTSLH-- GYFVFGV TG C  P.592
RHO2_galGa  I N LLTLLVTFKHKKLRQPLNYILVNLAVA D LFMA-CFGF T VTFYTAWN-- GYFVFGP VG C  P.592
RHO2_taeGu  I N FLTLLVTFKHKKLRQPLNYILVNLAVA D LCMA-CFGF T VTFYTAWN-- GYFVFGP IG C  P.592
RHO2_podSi  I N LLTLLVTFKHKKLRQPLNYILVNLAVA D LFMA-CFGF T VTFYTAWN-- GYFIFGP IG C  P.592
RHO2_anoCa  I N ILTLLVTFKHKKLRQPLNYILVNLAVA D LFMA-CFGF T VTFYTAWN-- GYFIFGP IG C  P.592
RHO2_neoFo  I N LLTLVVTFKHKKLRQPLNYILVNLAVA D LFMV-CFGF T VTFSTAIN-- GYFIFGP RG C  P.592
RHO2_latCh  I N FLTLLVTFKHKKLRQPLNYILVNLAVA S LFMV-VFGF T VTFYSSLN-- GYFVLGP MG C  P.592
RHO2_gekGe  L N GLTLFVTFQHKKLRQPLNYILVNLAAA N LVTV-CCGF T VTFYASWY-- AYFVFGP IG C  P.592
RHO2_pheMa  L N GLTLFVTFQHKKLRQPLNYILVNLAVA N LLMV-ICGF T VTFYTSWY-- GYFVFGP MG C  P.592
RHO2_geoAu  V N FMTLFVTFKLKKLRQPLNFILVNLCVA D LLMI-MFGF T TTFYTAMN-- GYFVFGP TG C  P.592
RHO2_danRe  I N GLTLLVTAQHKKLRQPLNFILVNLAVA G TIMV-CFGF T VTFYSAIN-- GYFVLGP TG C  P.592
RHO2d_danR  I N GLTLLVTAQHKKLRQPLNFILVNLAVA G TIMV-CFGF T VTFYTAIN-- GYFVLGP TG C  P.592
RHO2c_danR  I N GLTLVVTAQHKKLRQPLNFILVNLAVA G TIMV-CFGF T VTFYTAIN-- GYFVLGP TG C  P.592
RHO2a_danR  I N VLTLVVTAQHKKLRQPLNYILVNLAFA G TIMV-IFGF T VSFYCSLV-- GYMALGP LG C  P.592
RHO2b_danR  I N VLTLLVTAQHKKLRQPLNYILVNLAFA G TIMA-FFGF T VTFYCSIN-- GYMALGP TG C  P.592
RHO2_takRu  I N GLTLLVTAQNKKLRQPLNYILVNLAVA G LIMC-AFGF T ITITSAVN-- GYFILGA TA C  P.592
RHO2_gasAc  I N GLTLLVTAQNKKLRQPLNYILVNLAVA G LIMC-AFGF T ITITSAVN-- GYFILGA TA C  P.592
RHO2_oreNi  I N GLTLFVTAQNKKLRQPLNYILVNLAVA G LIMC-CFGF T ITITSAIN-- GYFVLGT TF C  P.592
RHO2_hipHi  I N GLTLFVTAQNKKLRQPLNYILVNLAVA G LIMC-CFGF T ITITSAFN-- GYFILGA TF C  P.592
RHO2_mulSu  I N GLTLLVTFQNKKLQQPLNYILVNLAVV G LIMC-AFGF T ITITSALN-- GYFILGP TF C  P.592
RHO2_pomMi  I N ALTLLVTFQNKKLRQPLNFILVNLAVA G LIMC-AFGF T ITITSALN-- GYFILGA TF C  P.592
RHO2_oryLa  I N ALTLVVTAQNKKLRQPLNFILVNLAVA G LIMV-CFGF T VCIYSCMV-- GYFSLGP LG C  P.592
SWS2_ornAn  I N LLTVICTIKYKKLRSHLNYILVNLAVS N MLVV-CVGS A TAFYSFAH-- MYFVLGP TA C  P.592
SWS2_anoCa  I N VLTIFCTFKYKKLRSHLNYILVNLSVS N LLVV-CVGS T TAFYSFSN-- MYFSLGP TA C  P.592
SWS2_utaSt  I N VLTIFCTFKYKKLRSHLNYILVNLAVS N LLVV-CIGS T TAFYSFAQ-- MYFSLGP TA C  P.592
SWS2_taeGu  I N ALTVLCTAKYKKLRSHLNYILVNLAVA N LLVV-CVGS T TAFYSFSQ-- MYFALGP LA C  P.592
SWS2_neoFo  I N VLTIICTFKYKKLRSHLNYILVNLAVA N LIVV-GFGS T TAFYSFSQ-- MYFAWGP LA C  P.592
SWS2_galGa  I N TLTIFCTARFRKLRSHLNYILVNLALA N LLVI-LVGS T TACYSFSQ-- MYFALGP TA C  P.592
SWS2_xenTr  L N LLTIICTVKYKKLRSHLNYILVNLAVA N LIVI-CFGS T TAFYSFSQ-- MYFSLGT LA C  P.592
SWS2_geoAu  L N FLTVFVTIKYKKLRSHLNYILVNLAIA N LIVV-CCGS T LAFYSFMH-- KYFILGP LF C  P.592
SWS2_takRu  I N VLTIACTIQYKKLRSHLNYILVNLAFS N LLVT-TVGS F TCFCCFFV-- RYMIVGP LG C  P.592
SWS2_gasAc  I N ALTVACTVQNKKLRSHLNYILVNLAVS N LLVS-GVGA F TAFLSFAA-- RYFVLGT LA C  P.592
SWS1_homSa  L N AMVLVATLRYKKLRQPLNYILVNVSFG G FLLC-IFSV F PVFVASCN-- GYFVFGR HV C  P.592
SWS1_monDo  L N AVVLVATLRYKKLRQPLNYILVNVSLC G FIFC-IFAV F TVFISSSQ-- GYFIFGR HV C  P.592
SWS1_smiCr  L N GVVLIATLRYKKLRQPLNYILVNISLA G FIFC-VFSV F TVFVSSSQ-- GYFVFGR HV C  P.592
SWS1_tarRo  L N AVVLIATLRYKKLRQPLNYILVNISLA G FIFC-VISV F TVFISSSQ-- GYFIFGR HV C  P.592
SWS1_taeGu  L N AIVLIVTIKYKKLRQPLNYILVNISVS G LMCC-VFCI F TVFIASSQ-- GYFVFGK HM C  P.592
SWS1_anoCa  L N AIILIVTVKYKKLRQPLNYILVNISFA G FLFC-TFSV F TVFMASSQ-- GYFFFGR HV C  P.592
SWS1_utaSt  L N AIILIVTVKYKKLRQPLNYILVNISFA G FLFC-VFSV F TVFLASSQ-- GYFFFGR HI C  P.592
SWS1_neoFo  L N AIVLFVTIKYKKLQQPLNYILVNISLA G FIFC-FFGV F AVFIASCQ-- GYFIFGK TV C  P.592
SWS1_galGa  L N AVVLWVTVRYKRLRQPLNYILVNISAS G FVSC-VLSV F VVFVASAR-- GYFVFGK RV C  P.592
SWS1_xenLa  L N FIVLLVTIKYKKLRQPLNYILVNITVG G FLMC-IFSI F PVFVSSSQ-- GYFFFGR IA C  P.592
SWS1_petMa  L N AIVLIVTVKCKKLRQPLTYMLVNISAA G LVFC-LFSI S TVFLFSTQ-- GYFVFGP TV C  P.592
SWS1_geoAu  L N AIVLVVTIKYKKLRQPLNYILVNISAA G LVFC-LFSI S TVFVASMQ-- GYFFLGP TI C  P.592
SWS1_danRe  M N GIVLFVTMKYKKLRQPLNYILVNISLA G FIFD-TFSV S QVSVCAAR-- GYYSLGY TL C  P.592
SWS1_oryLa  L N FVVLLATAKYKKLRVPLNYILVNITFA G FIFV-TFSV S QVFLASVR-- GYYFFGQ TL C  P.592
LWS_homSap  T N GLVLAATMKFKKLRHPLNWILVNLAVA D LAET-VIAS T ISVVNQVY-- GYFVLGH PM C  P.592
LWS_monDom  T N GLVLVATMKFKKLRHPLNWILVNLAVA D LGET-VIAS T ISVINQIY-- GYFILGH PL C  P.592
LWS_macEug  T N GLVLVATMKFKKLRHPLNWILVNLAVA D LGET-LIAS T ISVINQIY-- GYFILGH PM C  P.592
LWS_smiCra  T N GLVLVATMKFKKLRHPLNWILVNLAVA D LGET-IIAS T ISVINQIY-- GYFILGH PM C  P.592
LWS_ornAna  T N GLVLVATMKFKKLRHPLNWILVNLAVA D LGET-LIAS T ISVINQIF-- GYFILGH PM C  P.592
LWS_galGal  T N GLVLVATWKFKKLRHPLNWILVNLAVA D LGET-VIAS T ISVINQIS-- GYFILGH PM C  P.592
LWS_anoCar  T N GLVLVATAKFKKLRHPLNWILVNLAIA D LGET-VIAS T ISVINQIS-- GYFILGH PM C  P.592
LWS_xenTro  T N GLVLVATLKFKKLRHPLNWILVNMAIA D LGET-VIAS T ISVCNQIF-- GYFVLGH PM C  P.592
LWS_petMar  S N GLVLVATVKFKKLRHPLNWIIVNLAIA D ILET-IFAS T ISVCNQVY-- GYFILGH PM C  P.592
LWS_letJap  T N GLVLVATMKFKKLRHPLNWILVNLAIA D ILET-IFAS T ISVCNQVF-- GYFILGH PM C  P.592
LWS_geoAus  T N GLVLVATLKFKKLRHPLNWILVNLAIA D IGET-IFAS T VSVVNQIF-- GYFILGH PL C  P.592
LWS_neoFor  T N GLVLMATYKFKKLRHPLNWILVNLAIA D LGET-LIAS T ISVTNQIF-- GYFILGH PM C  P.592
LWS_takRub  T N GLVLVATAKFKKLRHPLNWILVNLAIA D LGET-VFAS T ISVCNQFF-- GYFILGH PM C  P.592
LWS_gasAcu  T N GLVLVATAKFKKLQHPLNWILVNLAIA D LGET-VFAS T ISVCNQFF-- GYFILGH PM C  P.592
LWS1_calMi  T N GLVLVATVRFKKLRHPLNWILVNMALA D LGET-VLAS T VSVANQFF-- GYFILGH PL C  P.592
LWS2_calMi  T N GLVLVATWKFKKLRHPLNWILVNLAIA D LGET-LFAS T ISICNQVF-- GYFILGH PM C  P.592
PIN_galGal  V N GLVIVVSICYKKLRSPLNYILVNLAVA D LLVT-LCGS S VSLSNNIN-- GFFVFGR RM C  P.592
PIN_colLiv  V N GLVIVVSIRYKKLRSPLNYILVNLAMA D LLVT-LCGS S VSFSNNIN-- GFFVFGK RL C  P.592
PIN_taeGut  L N GLVIVVSVRHKRLRSPLNYILLNLAVA N LLVT-LCGS S VSLSNNIS-- GFFVFGE RL C  P.592
PIN_utaSta  V N GLVIVVSIQYKKLRSPLNYILVNLAIA D LLVT-SFGS T LSFANNIY-- GFFVLGQ TA C  P.592
PIN_podSic  V N GLVIVVSVQFKKLRSPLNYVLVNLAVA D LLVT-FFGS T ISFVNNAQ-- GFFIFGQ AT C  P.592
PIN_pheMad  A N GLVIAVSVRFKRLRSPLNYILVNLATA D LLVT-FFGS I ISFVNNAV-- GFFVFGK TA C  P.592
PIN_xenTro  V N GLVIVVTLKYKKLRSPLNYILVNLAIA N LLVT-IFGS S VSFSNNVV-- GYFFMGK TM C  P.592
PIN_bufJap  V N GMVIVVSLKYKKLRSPLNYILVNLAVA D ILVT-MFGS T VSFHNNIF-- GFFTLGK LV C  P.592
VAOP_galGa  E N LAVILVTFKFKQLRQPVNYVIVNLSVA D FLVS-LTGG T ISFLANLK-- GYFYMGH WA C  P.592
VAOP_taeGu  E N LAVILVTFKFKQLRQPINYIIVNLSVA D FLVS-LTGG T ISFLTNLK-- GYFFMGY WA C  P.592
VAOP_anoCa  E N FTVILVTIKFKQLRQPLNYVIVNLSVA D FLVS-LIGG T ISFSTNLK-- GYFYMGH WA C  P.592
VAOP_xenTr  E N FIVILVTAKFKQLRQPLNYIIVNLSVA D FLVS-VIGG T ISIATNSR-- GYFYLGS WA C  P.592
VAOP_danRe  E N FTVMLVTFRFQQLRQPLNYIIVNLSLA D FLVS-LTGG S ISFLTNYH-- GYFFLGK WA C  P.592
VAOP_rutRu  E N FAVMLVTFRFTQLRKPLNYIIVNLSLA D FLVS-LTGG T ISFLTNYH-- GYFFLGK WA C  P.592
VAOP_takRu  E N FLVMFITFKFKQLRQPLNYIIVNLAIA D FLVS-LTGG L ISFLTNAR-- GYFFLGR WA C  P.592
VAOP_petMa  E N FAVIVVTARFRQLRQPLNYVLVNLAAA D LLVS-AIGG S VSFFTNIK-- GYFFLGV HA C  P.592
PPIN_anoCa  L N TAVIAITIKYRQLRQPINYSLVNLAIA D LGAA-LLGG S LNVETNAV-- GYYNLGR VG C  P.592
PPIN_xenTr  L N VTVIVVTFKYRQLRHPINYSLVNLAIA D LGVT-VLGG A LTVETNAV-- GYFNLGR VG C  P.592
PPINa_petM  L N STVIIVTLRHRQLRHPLNFSLVNLAVA D LGVT-VFGA S LVVETNAV-- GYFNLGR VG C  P.592
PPIN_letJa  L N STVVIVTLRHRQLRHPLNFSLVNLAVA D LGVT-VFGA S LVVETNAV-- GYFNLGR VG C  P.592
PPIN_danRe  L N VTVITVTLKYKQLRQPLNFALVNLAVA D LGCA-VFGG L PTVVTNAM-- GYFSLGR VG C  P.592
PPIN_ictPu  L N MVVIIVTVRYKQLRQPLNYALVNLAVA D LGCP-VFGG L LTAVTNAM-- GYFSLGR VG C  P.592
PPIN_oncMy  M N VLVIMVTMRHRKLRQPLNYALVNLAVA D LGCA-LFGG L PTMVTNAM-- GYFSMGR LG C  P.592
PPINb_takR  L N VLVIVVTMKHRQLRQPLSYALVNLAIC D LGCA-LFGG I PTTITSAM-- GYFSLGR VG C  P.592
PPINb_tetN  L N VLVIVVTLKHRQLRQPLNYALVNLAIC D LGCA-LFGG I PTTVTSAM-- GYFSLGR LG C  P.592
PPINb_gasA  L N ALVIVVTARHRQLRQPLSYALVNLAVC D LGCA-ACGG L PTTVTSAM-- GYFSLGR AG C  P.592
PPINa_gasA  L N ATVIIVTLMHKQLRQPLNYALVNMALA D LGTA-MTGG V LSVVNNAQ-- GYFSLGR SG C  P.592
PPINa_takR  L N ATVIIVSLMHKQLRQPLNYALVNMAVA D LGTA-MTGG L LSVVNNAQ-- GYFSLGR TG C  P.592
PPINa_tetN  L N ATVIIVSLMHKQLRQPLNYALVNMAAA D LGTA-VSGG L LSVVNNAQ-- GHFSLGR TG C  P.592
PPINa_cioI  L N ILVIVATLKNKVLRQPLNYIIVNLAVV D LLSG-FVGG F ISIAANGA-- GYFFWGK TM C  P.592
PPINa_cioS  L N ILVITATLKNKVLRQPLNYIIVNLAVV D LLSG-LVGG V ISIFANGA-- GYFFWGK FM C  P.592
PPINb_cioI  L N GFVIIATMKNKKLRQPLNYIIINLSIA D FLSG-LVGG F IGMISNSA-- GYFYFGK TV C  P.592
PPINb_cioS  L N LLVIVATYKNKDLRRPINYIIVNLAVA D LTCS-VVGG L LGVLNNGA-- GYYFLGK SV C  P.592
PARIE_utaS  N N SLVIAVTLKNPQLRNPINIFILNLSFS D LMMS-LCGT T IVIATNYY-- GYFYLGR KF C  P.592
PARIE_anoC  N N FLVIAVTLKNPQLRNPINIFILNLSFS D LMMS-ICGT T IVIATNYH-- GYFYLGR RF C  P.592
PARIE_xenT  N N AIVILVTLKHPQLRNPINIFILNLSFS D LMMA-LCGT T IVVSTNYH-- GYFYLGK QF C  P.592
PARIE_takR  N N SLAIAVMLKNPSLLQPINIFILSLAVS D LMIG-LCGS L VVTITNYH-- GSFFIGH TA C  P.592
PARIE_tetN  N N GLAITVMLKNPALLQPINIFILSLAVS D LMIG-LCGS L VVTITNYQ-- GSFFIGH TA C  P.592
PARIE_gasA  N N VLVITVLVRNPSLLQPMNVFILSLAVS D LMIG-LCGS L VVTITNYH-- GSFFIGH TA C  P.592
PARIE_danR  N N VLVIAVMVKNLHFLNAMTVIIFSLAVS D LLIA-TCGS A IVTVTNYE-- GSFFLGD AF C  P.592
ENCEPH_hom  N N LLVLVLYYKFQRLRTPTHLLLVNISLS D LLVS-LFGV T FTFVSCLR-- NGWVWDT VG C  P.592
ENCEPH_oto  N N LLVLVLYYKFPRLRTPTHLFLVNISLS D LLVS-LFGV T FTFVSCLR-- NGWVWDT VG C  P.592
ENCEPH_lox  N N LLVLVLYYKFQRLRTPTHLFLVNISLS D LLVS-LFGV T FTFVSCLR-- NGWVWDT VG C  P.592
ENCEPH_pte  N N LLVLVFYYKFQQVRTPFYLFLVNISFS D LLVS-FFGV T FTFVSCLR-- NGWVWDT VG C  P.592
ENCEPH_mus  G N LLVLLLYSKFPRLRTPTHLFLVNLSLG D LLVS-LFGV T FTFASCLR-- NGWVWDA VG C  P.592
ENCEPH_can  C H FCPQKGFLEFQRLRTPTHLLLVNLSLS D LLVS-LFGV T FTFVSCLR-- NGWVWDS VG C  P.592
ENCEPH_mon  N N LLVLVLYYKFQRLRTPTHLFLVNISFN D LLVS-LFGV T FTFVSCLR-- SGWVWDS VG C  P.592
ENCEPH_ano  N N LLVLVLYAKFKRLRTPTHLFLVNISLS D LLVS-LFGV S FTFGSCLR-- HRWVWDA AG C  P.592
ENCEPH_gal  N N LLVLVLYYKFKRLRTPTNLFLVNISLS D LLVS-VCGV S LTFMSCLR-- SRWVWDA AG C  P.592
ENCEPH_dan  N N IIVIILYSRYKRLRTPTNLLIVNISVS D LLVS-LTGV N FTFVSCVK-- RRWVFNS AT C  P.592
ENCEPH_tak  N N FVVLALYCRFKRLRTPTNLLLVNISLS D LLVS-LFGI N FTFAACVQ-- GRWTWTQ AT C  P.592
ENCEPH_gas  N N VVVIVLYCKFKRLRTPTNLLVVNISLS D LLVS-VIGI N FTFVSCIR-- GGWTWSR AT C  P.592
ENCEPH_ory  N N LLVILLYCKFKRLRTPTSLLLVNISLS D LLVS-VVGI N FTLASCVK-- GRWMWSQ AT C  P.592
ENCEPH_xen  N N LLVLILYCKFKRLQTPTNLLFFNTSLC H FVFS-LLAI T FTFMSCVR-- GSWAFSV EM C  P.592
ENCEPH4_br  N N FVVILLIGCHRQLRTPFNLLLLNMSVA D LLVS-VCGN T LSFASAVR-- HRWLWGR PG C  P.592
ENCEPH4_br  N N FVVILLIGCHRQLRTPFNLLLLNVSVA D LLVS-VCGN T LSFASAVQ-- HRWLWGR PG C  P.592
ENCEPH_cal  N N ILVLLLYYKFKRLRTPTNLLLVNISVS D LLVS-VFGL S FTFVSCTQ-- GRWGWDS AA C  P.592
ENCEPH_squ  N N LLMLVLYCKFKRLRTPTNLFLVNISIS D LLLS-VFGV I FTFVSCVK-- GRWVWDS AA C  P.592
ENCEPH_pet  N N LLLVALFVGFKRLQTPTNLLLVNISLS D LLVS-VFGN T LTLVSCVR-- RRWVWGN GG C  P.592
TMT5_braFl  S N GAVVLLFLKFRQLRTPFNMLLLNMSVA D LLVS-VCGN T LSFASAVR-- HRWLWGR PG C  P.592
TMT5_braBe  S N GAVVVLFLKFPQLRTPFNLLLLNMAVA D LLVS-VCGN T LSFASAVR-- HRWLWGR PG C  P.592
TMT_monDom  S N FIVLVLFCKFKVLRNPVNMLLLNISIS D MLVC-LSGT T LSFASSIQ-- GRWIGGK HG C  P.592
TMT_macEug  N N FIVLVLFCKFKVLRNPVNMLLLNISIS D MLVC-LTGT T LSFASSIR-- GRWIAGY HG C  P.592
TMT_galGal  N N LIVLILFCKFKTLRNPVNMLLLNISIS D MLVC-ISGT T LSFASNIH-- GKWIGGE HG C  P.592
TMT_taeGut  N N LIVLILFCKFKTLRNPVNMLLLNISVS D MLVC-ISGT T LSFASNIR-- GKWIGGD HA C  P.592
TMT_anoCar  N N LVVLILFCKFKTLRNPVNMLLLNISAS D MLVC-ISGT T LSFVSNIY-- GRWIGGE HG C  P.592
TMT_xenTro  N N FVVLILFCKFKTLRTPVNMMLLNISAS D MLVC-VSGT T LSFTSSIK-- GKWIGGE YG C  P.592
TMT_ornAna  N N LIVLILFCKFKALRNPVNMIMLNISAS D MLVC-VSGT T LSFASNIS-- GRWIGGD PG C  P.592
TMT_danRer  N N LVVLVLFCKFKTLRTPVNMLLLNISIS D MLVC-MFGT T LSFASSVR-- GRWLLGR HG C  P.592
TMT_tetNig  N N FIVLLLFCKFKKLRTPVNVLLLNISVS D MLVC-LFGT T LSFASSLR-- GRWLLGR SG C  P.592
TMT_takRub  N N FVVLLLFCKFKKLRTPVNMLLLNISVS D MLVC-LFGT T LSFASSIR-- GRWLLGR IG C  P.592
TMT_gasAcu  N N LVVLLLFCKFKKLRTPVNMLLLNISVS D MLVC-LFGT T LSFASSLR-- GKWLLGR SG C  P.592
TMT_oryLat  N N FVVLILFCKFKKLRTPVNMLLLNISVS D MLVC-LFGT T LSFASSIR-- GRWLLGR GG C  P.592
TMTa1_anoC  N N LLVLVLFCRNKVLRSPINLLLMNISLS D LMIC-IVGT P FSFAASTQ-- GKWLIGP AG C  P.592
TMTa1_xenT  N N LVVLILFCQYKVLRSPINMLLMNISLS D LMVC-ILGT P FSFAASTQ-- GHWLIGE IG C  P.592
TMTa1_danR  N N LLVLVLFGRYKVLRSPINFLLVNICLS D LLVC-VLGT P FSFAASTQ-- GRWLIGD TG C  P.592
TMTb_danRe  N N TLVLVLFCRYKVLRSPMNCLLISISVS D LLVC-VLGT P FSFAASTQ-- GRWLIGR AG C  P.592
TMTa_gasAc  N N LLVLVLFCRYKMLRSPINLLLINISIS D LLVC-VLGT P FSFAASTQ-- GRWLIGE GG C  P.592
TMTb_gasAc  S N FLVLALFCRYRALRTPMNLLLVSISAS D LLVS-MVGT P FSFAASTQ-- GRWLIGR AG C  P.592
TMTa_oryLa  N N LLVLVLFCRYKILRSPINLLLINISIS D LLVC-VLGT P FSFAASTQ-- GRWLIGE GG C  P.592
TMTb_oryLa  S N LLVLALFCRYRALRTPMNLLLVSISVS D LLVS-VLGT P FSFAASTQ-- GRWLIGR AG C  P.592
TMTa_pimPr  N N TLVLILFCRYKVLRSPMNYLLVSIAVS D LLVC-VLGT P FSFAASTQ-- GRWLIGR AG C  P.592
TMTa_takRu  N N LLVLVLFCRYKMLRSPINLLLMNISIS D LLVC-VLGT P FSFAASTQ-- GRWLIGE AG C  P.592
TMTb_takRu  S N FLVLALFCRYRALRTPMNLMLVSISAS D LLVS-VLGT P FSFAASTQ-- GRWLIGR AG C  P.592
TMTa_tetNi  S N LLVLVLFCRFKVLRSPINLLLVNISVS D LLVC-VLGT P FSFAASTQ-- GRWLIGA AG C  P.592
TMTb_tetNi  S N LLVLALFCRFRALRTPMNLMLVSISAS D LLVS-VLGT P FSFAASTQ-- GRWLLGR AG C  P.592
TMTa_oncMy  S N LFVLLVFARFQVLRTPINLILLNISVS D MLVC-IFGT P FSFAASLY-- GRWLIGA HG C  P.592
TMTa1_calM  N N LLVLVLFCKYKVLRSPMNMLLLNISVS D MLVC-ICGT P FSFAASVQ-- GRWLVGE QG C  P.592
TMTa2_calM  N N LLVLLLFVCFKEIRTPLNMILLNISLS D LSVC-VFGT P FSFAASIY-- RRWLIGH KG C  P.592
TMTx_braFl  N N STTLYLVGRYKQLRTPFNILMVNLSVS D LLMC-VLGT P FSFVSSLH-- GRWMFGH SG C  P.592
TMTPIN_str  N N GIVMILFARFPSLRHPINSFLFNVSLS D LIIS-CLAS P FTFASNFA-- GRWLFGD LG C  P.592
TMTy_braFl  T N LLTVLVFWCFKSLRTPFHLYLGGIALS D LLVA-ALGS P FAVASAVG-- ERWLFGR AV C  P.592
ENCEPH_str  G N SVVLFLFAWDRHLRTPTNMFLLSLTIS D WLVT-VVGI P FVTASIYA-- HRWLFAH VG C  P.592
TMT_triCys  L N GLVIAVLIKYIRTITNTNIIVLSMSCA N ILIP-LLGS P LSATSSLM-- RKWQFGN GG C  P.592
CUBOP_carR  L N MIVLITFYRLRHKLAFKDALMASMAFS D VVQA-IVGY P LEVFTVVD-- GKWTFGM EL C  P.592
TMT_apiMel  A N LLVAIVIVKDAQLWTPVNVILFNLVFG D FLVS-IFGN P VAMVSAAT-- GGWYWGY KM C  P.592
TMT1_anoGa  L N IFVIALMYKDVQLWTPMNIILFNLVCS D FSVS-IIGN P LTLTSAIS-- HRWLYGK SI C  P.592
TMT2_anoGa  L N LFVIALMCKDMQLWTPMNIILFNLVCS D FSVS-IIGN P LTLTSAIS-- HRWIFGR TL C  P.592
TMT_aedAeg  L N LFVIALMCKDVQLWTPINIILFNLVCS D FSVS-IIGN P FTLTSAIS-- RHWIFGR TV C  P.592
TMT_culPip  L N LFVIALMCKEVQLWTPMNIILLNLVCS D FSVS-IVGN P FTLSSAIS-- HRWLFGR KL C  P.592
TMT_triCas  L N LTVIIFMLKERQLWSPLNIILFNLVVS D FLVS-VLGN P WTFFSAIN-- YGWIFGE TG C  P.592
TMT_bomMor  . . ............LWTPLNIILFNLVCS D FSVS-VLGN P FTLISALF-- HRWIFGH TM C  P.592
TMT_rhoPro  G N LIVIIIMCRDKNLWTPVNFILFNVIVS D FSVA-ALGN P FTLASAIA-- KRWFFGQ SM C  P.592
TMTa_dapPu  M N IVVVVIILNDSQKMTPLNWMLLNLACS D GAIA-GFGT P ISAAAALK-- FTWPFSH EL C  P.592
TMTb_dapPu  M N VVVVIVILNDSQRMTPLNWMLLNLACS D GAIA-GFGT P ISTAAALE-- FGWPFSQ EL C  P.592
ENCEPHa_ne  T N TIVVIIFISSQRLHTTPNLILFSMSVC D WLMA-TMAK S VGIYGNAR-- YWPTVGK VT C  P.592
ENCEPHb_ne  T N TIVVITFIFSKRLHTTPNLILFSMSVC D WLMA-AMAK S VGIYGNAR-- YWPTVGK VT C  P.592
ENCEPHc_ne  L N GIVLIIFLATRSLRTIPNMILLSMAWA D WLMA-CLAD A VGAYANAN-- NWPSMVG GL C  P.592
TMT1_plaDu  S N GVIMYLYFKDKSLRSPMNLLFVNLAMS D FTVA-FFGA M FQFGLTCTR- KYMSPGM AL C  P.591
TMT2_plaDu  L N VLVLVLFIKDRKLRSPNNFLYVSLALG D LLVA-VFGT A FKFIITARK- TLLREED GF C  P.591

RGR1_homSa  L N TLTIFSFCKTPELRTPCHLLVLSLALA D SGIS-LNAL V AATSSLL--- RRWPYGS DG C  P.593
RGR1_ornAn  L N GLTIASFRKIKELRTPSNLLVVSLALA D SGIC-LNAL M AALSSFL--- RHWPYGA EG C  P.593
RGR1_galGa  L N GLTIISFRKIKELRTPSNLLVLSIALA D CGIC-INAF I AAFSSFL--- RYWPYGS EG C  P.593
RGR1_xenTr  L N GLTLLSFYKIRELRTPSNLFIISLAVA D TGLC-LNAF V AAFSSFL--- RYWPYGS EG C  P.593
RGR1_gasAc  L N AVTIAAFLKVRELRTPSNFLVFSLAVA D IGIS-MNAT I AAFSSFL--- RYWPYGS DG C  P.593
RGR2_danRe  L N AISVLAFLRVREMQTPNNFFIFNLAVA D LSLN-INGL V AAYACYL--- RHWPFGS EG C  P.593
RGR2_pimPr  L N LISVLAFLRVREIQTPNNFFIFNLAVA D LSLN-INGL V AAYASYL--- RYWPFGS EG C  P.593
RGR2_tetNi  L N AISIVSFLTVKEMRNPSNFFVFNLALA D ISLN-VNGL I AAYASYL--- RYWPFGQ DG C  P.593
RGR2_gasAc  L N AISIASFLRVKEMWNPSNFFVFNLAVA D ICLN-VNGL T AAYASYL--- RYWPFGQ DG C  P.593
RGR2_oryLa  L N AISILAFLRVKEMRSPSSFLVFNLALA D ISLN-INGL T AAYASYL--- RYWPFGQ EG C  P.593
RGR1_calMi  L N GLTLLAFYKIKELRTPSNLLITSLALS D FGIS-MNAF I AAFSSFL--- RYWPYGS EG C  P.593
RGRa_cioIn  G Y SLLFVIFAKRPDLKK-KNKFLLSLATS D LLIT-VHVF A STIAAFA--- PQWPFGD LG C  P.593
RGRa_cioSa  G Y GLLFVIFAKSPDLKK-KNRFLFSLAVS D LLIT-IHVV A SVVASFQ--- SEWPFGS IG C  P.593
RGRb1_cioI  G Y AVYFGAIWRSKTLQT-RHIWLTSLACG D IIMM-VHLI L ESLSSLGM-- GHRPRQN FE C  P.593
RGRb2_cioI  G Y SVYILAIWSSKKLQT-KHIWLTSLACA D LLMM-VHLF M DGLSSFHQ-- GRRPKGI FE C  P.593
RGRb2_cioS  G Y SIYLRAIWSSRKLQT-RHIWLTSLACA D LIMM-VHLF M DGLSSFHQ-- GRRPKGN FE C  P.593

PER1_homSa  S N IIVLGIFIKYKELRTPTNAIIINLAVT D IGVS-SIGY P MSAASDLY-- GSWKFGY AG C  P.592
PER1_ornAn  S N VIVLGIFVKFEELRTATNAIIINLAVT D IGVS-GIGY P MSAASDLH-- GSWKFGH AG C  P.592
PER1_monDo  S N VIVLGIFVKYKALRTATNTIIINLAVT D IGVS-SIGY P MSAASDLY-- GSWKFGY DG C  P.592
PER1_xenTr  S N IIVLGIFVKYKELRTATNAIIINLAFT D IGVS-GIGY P MSAASDLH-- GSWKFGY VG C  P.592
PER1_gasAc  S N IVVLLMFWKFKELRTATNFIIINLAFT D IGVA-GIGY P MSAASDIH-- GSWKFGY AG C  P.592
PER1a_sacK  L S SVNFRMLLSNPDYCSKAGNFFLSLAVT D LCVC-IFET P FSAFSHHA-- GFWIFGD TA C  P.592
PER1_lotGi  L S LLVALTFIREKGLFKYGRAWLHISLAI A NVGV-VGAF P FSGSSSFS-- GRWLYGS GM C  P.592
PER1_aplCa  L N LLTALTFYKDTKLTKGSQPWLHILLAL A NVGV-VAPS P FPASSSFS-- GRWLYGS TM C  P.592
PER1_todPa  L C GMCIIFLARQSPKPRRKYAILIHVLIT A MAV--NGGD P AHASSSIV-- GRWLYGS VG C  P.582
PER1b_sacK  G N SVVLEMFRRYKELLSPSAILLISLALA D LGLT-IFGM S LSCVSSFA-- GRWLFGK FG C  P.592
PER1_braFl  G N IFAIIVFLTEKEFRKKEHNSFALNLAIA D LSVCVFAY P SSTISGYA-- GEWMLGD VG C  P.602
PER1_braBe  G N VITITVFLTEKEFRKKQQNGFVLNLAIA D LSVCVFAY P SSAIAGYA-- GRWVLGD VG C  P.602
PER2_braFl  G N ATVVLMFMLKWRQLCRKANLLIINLAAV D LCISVFGY P FSASSGFA-- NQWLFSD AI C  P.602
PER2_braBe  G N ATVVLMFIMKWRQLCRKANLLVINLAAA N LCITIFGY P FSASSGYA-- HQWLFPD AI C  P.602
PER2a_strP  G N ITVICVLCRYRTFRKRSINLLLINMAAS D LGVSVAGY P LTTVSGYW-- GRWLFGD VG C  P.602
PER2b_strP  G N ITVLCVLCRYGTFRKRSVNILLMNMAVS D LGVSVAGY P LTAISGYR-- GRWVFAD IG C  P.602
PER2_patYe  G N LLIIIVFAKRRSVRRPINFFVLNLAVS D LIVA-LLGY P MTAASAFS-- NRWIFDN IG C  P.592
PER3_braFl  E N GITLATFTKFRSLRSPTTMLLVHLAIA D LGIC-IFGY P FSGASSLR-- SHWLFGG VG C  P.592
PER3_braBe  E N GITLATFSKFRSLRSPTTMLLVHLAIA D LGIC-IFGY P FSGASSLR-- SHWLFGG VG C  P.592
PER3_hadAd  G N GLVLVTFLRFRVLVTPTTLLLVNLAVS D LGLI-LFGF P FSASSSLS-- AKWIFGE GG C  P.592

NEUR_strPu  G N ISVIVISLRKREKLKPIDLLTINLAIA D FLIC-VVSY P LPMISAFR-- HRWSFGK FG C  P.592
NEUR1_homS  G N GYVLYMSSRRKKKLRPAEIMTINLAVC D LGIS-VVGK P FTIISCFC-- HRWVFGW IG C  P.592
NEUR1_calJ  G N GYVLYMSSRRKKKLRPAEIMTINLAVC D LGIS-VVGK P FTIISCFC-- HRWVFGW IG C  P.592
NEUR1_dasN  G N GYVLYMSSKRKKKLRPAEIMTINLAVC D LGIS-VVGK P FTIISCFC-- HRWVFGW IG C  P.592
NEUR1_canF  G N GYVLYMSSRRKKKLRPAEIMTINLAIC D LGIS-VVGK P FTIISCFC-- HRWVFGW IG C  P.592
NEUR1_bosT  G N GYVLYMSSRRKKKLRPAEIMTVNLAIC D LGIS-VVGK P FTIISCFC-- HRWVFGW IG C  P.592
NEUR1_musM  G N GYVLYMSSRRKKKLRPAEIMTINLAVC D LGIS-VVGK P FTIISCFC-- HRWVFGW FG C  P.592
NEUR1_loxA  G N GYVLYMSCRRKKKLRPAEIMTINLAVC D LGIS-VVGK P FVIISCFC-- HRWVFGW IG C  P.592
NEUR1_ochP  G N GYVLYMSSRRKKKLRPAEIMTINLAVC D LGIS-VVGK P FTIISCFC-- HRWVFGW IG C  P.592
NEUR1_monD  G N GYVIYMSSKRKKKLRPAEIMTVNLAVC D LGIS-VVGK P FTIISCFS-- HRWVFGW VG C  P.592
NEUR1_ornA  G N GYVIYMSSRRKKKLRPAEIMTVNLAVC D LGIS-VVGK P FTIVSCFC-- HRWVFGW MG C  P.592
NEUR1_galG  G N GYVIFMSSKRKKKLRPAEIMTVNLAVC D LGIS-VVGK P FSIISFFS-- HRWIFGW MG C  P.592
NEUR1_xenT  G N GYVIYMACSRKKKLRPAEIMTINLAVC D LGIS-VTGK P FAIVSCFS-- HRWVFGW NA C  P.592
NEUR1_danR  G N GYVMYMTFKRKTKLKPPEIMTLNLAIF D FGIS-VSGK P FFIVSSFS-- HRWLFGW QG C  P.592
NEUR1_calM  G N GYVIYLSITQKRKLKPPEILITNLAIS D FGMS-VGGQ P FLIISCFS-- HRWIFGW VG C  P.592
NEUR1a_bra  G N GRVLWLSYRCRARLRPVEMFVVSLAVA D VGLS-LVGH P FAAASSLM-- GRWSFGS AG C  P.592
NEUR1b_bra  G N GRVLWLSYRNWAKLRPVELFVVSLAVT D VGIS-VFGY P FAASSSLL-- GRWSFGS AG C  P.592
NEUR2_galG  G N SILLYISYKKKHLLKPAEYFIINLAIS D LAMT-LTLY P LAVTSSLS-- HRWLYGK HI C  P.592
NEUR2_anoC  G N SILLYVSYKKKNLLKPAEYFMINLAIS D LGMT-LTLY P LAVTSSLA-- HRWLFGQ QV C  P.592
NEUR2_xenT  G N SMLLLVAYRKRSILKPAEFFIVNLSIS D LGMT-GTLF P LAIPSLFA-- HRWLFDK VT C  P.592
NEUR2_danR  G N GMLLFVAYRKRSSLKPAEFFVVNLSVS D LGMT-LSLF P LAIPSALA-- HRWLFGE IT C  P.592
NEUR2_calM  G N SVLLFVAYRKRQILKPAEYFVANLAVS D ISMT-VTLL P LAISSNFS-- HRWLFVS KP C  P.592
NEUR3_galG  G N SAVLATAVKRSSLLKSPELLTVNLAVA D IGMA-ISMY P LAIASAWN-- HAWLGGD AS C  P.592
NEUR3_taeG  G N SAVLATAVKRSSLLKPPELLTVNLAVA D IGMA-LSMY P LAIASAWS-- HAWLGGD AS C  P.592
NEUR3_xenT  G N CAVLATAVKCSSHLKAPDLLSINLAVA D LGMA-ISMY P LAIASAWN-- HAWLGGD AS C  P.592
NEUR3_anoC  G N SMVLAVAVKRSSCLRSPELLTVNLAAT D LGMG-LSMY P LAIASAWN-- HAWLGGE AT C  P.592
NEUR3b_dan  G N LMVLVMAYKRSNHMKPPELLSVNLAVT D LGAA-VTMY P LAVASAWN-- HHWIGGD VS C  P.592
NEUR3a_dan  G N AAVLLTAAWRHSVLKAPELLTVNLAVT D IGMA-LSMY P LSIASAFN-- HAWIGGD PS C  P.592
NEUR3a_tet  G N ASVLFSASRRLTPLKAPELLTVNLAVT D IGMA-LSMY P LSIASAFN-- HAWMGGD TA C  P.592
NEUR3_petM  G N GAVLGVAARRWAKLKAPELLSVNLALT D LGIA-ASIY P LAVASAWN-- HRWLGGQ PV C  P.592
NEUR4_ornA  G N SMVIFILHRQRGILNPTDYLTFNLAVS D ASVS-VFGY S RGIIEIFNVF RDDGFSI WT C  P.620
NEUR4_galG  G N SVVIFVLYKQRHLLQPTDYLTFNLAVS D ASIS-VFGY S RGIIEIFNVF RDDGFSI WT C  P.620
NEUR4_taeG  G N SIVIFVLYKQRHVLQPTDYLTFNLAVS D ASIS-VFGY S RGIIEIFNVF RDDGFSI WT C  P.620
NEUR4_anoC  G N SIVIFVLYRQRAGLQPTDYLTFNLAVS D ASVS-VFGY S RGIIEIFNVF RDDGFSI WT C  P.620
NEUR4_xenT  G N SIVIFVLYKQRANLLPTDYLTFNLAVS D ASTS-VFGY S RGIIEIFNVF RDDGFSI WT C  P.620
NEUR4_danR  G N SIVIFVLFRQRSTLQPTDYLTLNLAVS D ASIS-VFGY S RGILEIFNIF KDSGYSV WT C  P.620
NEUR4_tetN  G N TVVLFVLVRQRSSLQPTDLLTFNLAVS D ASIS-VFGY S RGIIQIFNVF QDSGFSI WT C  P.620
NEUR4_gasA  G N SLVMFVLYRQRASLQSTDFLTLNLAIS D ASIS-IFGY S RGILEIFNIF NDDGTWI WT C  P.620
NEUR4_calM  G N SIVIFILYRQRLSLQPPDYLTLNLAVS D ASIS-IFGY S RGIIEIFNVF RDDGFSI WT C  P.620

GPR17_homS  G N TLALWLFIRDHKSGTPANVFLMHLAVA D LSCV--LVL P TRLVYHFSG- NHWPFGE IA C  P.581
CYSLTR1_ho  G N GFVLYVLIKTYHKKSAFQVYMINLAVA D LLCV--CTL P LRVVYYVHK- GIWLFGD FL C  P.581
P2RY8_homS  G N LFSLWVLCRRMGPRSPSVIFMINLSVT D LMLA--SVL P FQIYYHCNR- HHWVFGV LL C  P.581
BDKRB2_hom  E N IFVLSVFCLHKSSCTVAEIYLGNLAAA D LILA--CGL P FWAITISNN- FDWLFGE TL C  P.581
SSTR1_homS  G N SMVIYVILRYAKMKTATNIYILNLAIA D ELLM--LSV P FLVTSTLL-- RHWPFGA LL C  P.582
OPRL1_homS  G N CLVMYVILRHTKMKTATNIYIFNLALA D TLVL--LTL P FQGTDILL-- GFWPFGN AL C  P.582
OPRM1_homS  G N FLVMYVIVRYTKMKTATNIYIFNLALA D ALAT--STL P FQSVNYLM-- GTWPFGT IL C  P.582
CCR4_homSa  G N SVVVLVLFKYKRLRSMTDVYLLNLAIS D LLFV--FSL P FWGYYAA--- DQWVFGL GL C  P.583
TACR2_homS  G N AIVIWIILAHRRMRTVTNYFIVNLALA D LCMA-AFNA A FNFVYASH-- NIWYFGR AF C  P.592
GALR1_homS  G N SLVITVLARSKKPRSTTNLFILNLSIA D LAYL-LFCI P FQATVYAL-- PTWVLGA FI C  P.592
QRFPR_homS  G N ALVFYVVTRSKAMRTVTNIFICSLALS D LLIT-FFCI P VTMLQNIS-- DNWLGGA FI C  P.592
PPYR1_homS  G N LCLMCVTVRQKEKANVTNLLIANLAFS D FLMC-LLCQ P LTAVYTIM-- DYWIFGE TL C  P.592
NPY1R_homS  G N LALIIIILKQKEMRNVTNILIVNLSFS D LLVA-IMCL P FTFVYTLM-- DHWVFGE AM C  P.592
GPR19_homS  G N SLVCLVIHRSRRTQSTTNYFVVSMACA D LLIS-VAST P FVLLQFTT-- GRWTLGS AT C  P.592
HCRTR1_hom  G N TLVCLAVWRNHHMRTVTNYFIVNLSLA D VLVT-AICL P ASLLVDIT-- ESWLFGH AL C  P.592
GPR161_hom  G N LVIVVTLYKKSYLLTLSNKFVFSLTLS N FLLS-VLVL P FVVTSSIR-- REWIFGV VW C  P.592
ADRA1D_hom  G N LLVILSVACNRHLQTVTNYFIVNLAVA D LLLS-ATVL P FSATMEVL-- GFWAFGR AF C  P.592
ADRB2_homS  G N VLVITAIAKFERLQTVTNYFITSLACA D LVMG-LAVV P FGAAHILM-- KMWTFGN FW C  P.592
ADRB1_melG  G N VLVIAAIGSTQRLQTLTNLFITSLACA D LVVG-LLVV P FGATLVVR-- GTWLWGS FL C  P.592
PRLHR_homS  G N CLLVLVIARVRRLHNVTNFLIGNLALS D VLMC-TACV P LTLAYAFEP- RGWVFGG GL C  P.591
NMUR2_homS  G N VLVCLVILQHQAMKTPTNYYLFSLAVS D LLVL-LLGM P LEVYEMWRN- YPFLFGP VG C  P.591
ADORA2A_ho  G N VLVCWAVWLNSNLQNVTNYFVVSLAAA D IAVG-VLAI P FAITIS---- TGFCAAC HG C  P.594
TRHR_homSa  G N IMVVLVVMRTKHMRTPTNCYLVSLAVA D LMVLVAAGL P NITDSIY--- GSWVYGY VG C  P.603
NPY2R_homS  G N SLVIHVVIKFKSMRTVTNFFIANLAVA D LLVN-TLCL P FTLTYTLM-- GEWKMGP VL C  P.592
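
The rows above can be sorted by their gap pattern so that sequences sharing an indel state fall together. The following is only a minimal Python sketch: it assumes each row splits on whitespace into twelve fields, with the two gapped blocks in the sixth and eighth fields, which is read off the rows shown here rather than documented anywhere. The helper names gap_pattern and group_by_gaps are hypothetical, and the trailing label field is ignored. A shared gap count does not by itself show that two gaps are homologous; that still has to be judged from the alignment.

```python
from collections import defaultdict

# Assumed layout (inferred from the rows above, not a documented format):
# name, G, N, 27-mer, D, block A, P/S, block B, 7-mer, 2-mer, C, label
BLOCK_A, BLOCK_B = 5, 7  # zero-based indices of the two gapped blocks


def gap_pattern(row):
    """Return (name, (gaps in block A, gaps in block B)) for one alignment row."""
    fields = row.split()
    if len(fields) != 12:
        raise ValueError(f"expected 12 fields, got {len(fields)}: {row!r}")
    return fields[0], (fields[BLOCK_A].count("-"), fields[BLOCK_B].count("-"))


def group_by_gaps(rows):
    """Group sequence names by their gap-count pattern, skipping blank lines."""
    groups = defaultdict(list)
    for row in rows:
        if not row.strip():
            continue
        name, pattern = gap_pattern(row)
        groups[pattern].append(name)
    return dict(groups)


# A few rows copied from the alignment above.
rows = [
    "NEUR1_homS  G N GYVLYMSSRRKKKLRPAEIMTINLAVC D LGIS-VVGK P FTIISCFC-- HRWVFGW IG C  P.592",
    "NEUR4_galG  G N SVVIFVLYKQRHLLQPTDYLTFNLAVS D ASIS-VFGY S RGIIEIFNVF RDDGFSI WT C  P.620",
    "NEUR4_danR  G N SIVIFVLFRQRSTLQPTDYLTLNLAVS D ASIS-VFGY S RGILEIFNIF KDSGYSV WT C  P.620",
    "GPR17_homS  G N TLALWLFIRDHKSGTPANVFLMHLAVA D LSCV--LVL P TRLVYHFSG- NHWPFGE IA C  P.581",
]

groups = group_by_gaps(rows)
for pattern, names in sorted(groups.items()):
    print(pattern, names)
```

Rows that still carry wiki markup such as font tags would need the tags stripped before splitting, since the field count would otherwise be wrong.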

Indels in other opsins

Informative indels would be very helpful in this class of opsins because their sequence relationships to ciliary opsins and melanopsins are too weak to resolve branching order on their own. Note that intron patterns, another class of even rarer genetic event and so even better suited to deep time scales, have already illuminated branching relationships to a certain extent.

(to be continued)