Opsin evolution: Cytoplasmic face: Difference between revisions

From genomewiki

Revision as of 11:23, 30 January 2009

Comparative genomics of the cytoplasmic face of GPCR proteins

The cytoplasmic face of an opsin (or any GPCR) comprises three disjoint connecting loops and the carboxy terminus. It is presumably responsible for all interactions with downstream signal-relaying partners, because these are cytoplasmic proteins with no physical access to the extracellular loops or transmembrane segments. Photoisomerization and retinal release from the Schiff base deep within the transmembrane region must therefore drive a significant conformational change in the cytoplasmic face, one that differentiates its inactive from its active state.

For bioinformatic purposes, it is convenient to 'reorganize' each linear protein sequence into its intracellular, membrane and outer regions for separate consideration. This is done below for the cytoplasmic face of 500 curated opsins drawn from the 20 vertebrate opsin genetic loci, using multiple representatives for each phylogenetic node and intense bracketing at eras of functional transition (e.g. between DRY and GRY opsins of the RGR class). A range of non-opsin GPCRs is included to define properties common to all members of this large gene family (not specific to opsins).
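The reorganization described above can be sketched as follows, assuming the transmembrane boundaries of a sequence are already known from some annotation. The function name `split_by_topology`, the toy sequence and its coordinates are all hypothetical and serve only to show the bookkeeping; they are not taken from the curated opsin set.

```python
def split_by_topology(seq, tm_segments, n_term_outside=True):
    """Split a protein into outer, membrane and cytoplasmic pieces.

    tm_segments: list of (start, end) 0-based half-open spans, sorted by
    position. The side of the membrane flips after every transmembrane
    helix, starting outside for a GPCR (extracellular N-terminus).
    """
    regions = {"outer": [], "membrane": [], "cytoplasmic": []}
    outside = n_term_outside
    prev_end = 0
    for start, end in tm_segments:
        regions["outer" if outside else "cytoplasmic"].append(seq[prev_end:start])
        regions["membrane"].append(seq[start:end])
        outside = not outside
        prev_end = end
    # Whatever follows the last helix is the terminal tail.
    regions["outer" if outside else "cytoplasmic"].append(seq[prev_end:])
    return regions


# Hypothetical toy "protein": o = outer, M = membrane, c = cytoplasmic.
toy = "ooMMMccMMMooMMMccMMMooMMMccMMMooMMMccc"
toy_tm = [(2, 5), (7, 10), (12, 15), (17, 20), (22, 25), (27, 30), (32, 35)]
regions = split_by_topology(toy, toy_tm)
print(regions["cytoplasmic"])  # three loops followed by the carboxy tail
```

With seven helices and an extracellular N-terminus, the cytoplasmic list always holds four pieces: loops C1, C2, C3 and the carboxy-terminal tail.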

The two critical goals in GPCR research are finding the natural ligands, notably for orphan receptors (which largely concerns the extracellular and transmembrane regions), and determining the specific Galpha signaling partner among the 17 such paralogs in the vertebrate genome. For vertebrate opsins, the ligand is known (11-cis retinal or related) but the signaling partner generally is not. For example, does RGR opsin signal at all, to what regulatory effect, and what is the meaning of the abrupt shift of the DRY motif to GRY at the boreoeuthere divergence?

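In the table below, Le is the aligned length of loop C2, and 7 and 9 are the residues at loop positions 7 and 9 counted from the start of the DRY motif.

           DRY loop motif       transmemb Le 7 9 signaling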
ENCEPH_hom ERYIRVVHARVINFSW     AWRAITYIW 16 V A G?
RGR_homSap GRYHHYCTRSQLAWNS     AVSLVLFVW 16 C R G?
RGR2_gasAc DRYHQYCTRQKLFWST     TLTMSAIIW 16 C R G?
RHO1_homSa ERYVVVCKPMSNFRFGENH  AIMGVAFTW 19 C P GNAT1
RHO2_galGa ERYIVVCKPMGNFRFSATH  AMMGIAFTW 19 C P GNAT2
SWS2_ornAn ERFLVICKPLGNLSFRGTH  AIFGCAATW 19 C P GNAT2
PIN_galGal ERYVVVCRPLGDFQFQRRH  AVSGCAFTW 19 C P G?
SWS1_homSa ERYIVICKPFGNFRFSSKH  ALTVVLATW 19 C P GNAT2
LWS_homSap ERWMVVCKPFGNVRFDAKL  AIVGIAFSW 19 C P GNAT2
VAOP_galGa ERYIVICRPVGNMRLRGKH  AAQGIAFVW 19 C P Gt
PARIE_utaS ERYNVVCQPLGTLQMSTKR  GYQLLGFIW 19 C P Gd+Go
PPIN_xenTr DRVFVVCKPMGTLTFTPKQ  ALAGIAASW 19 C P Gt
PER_homSap DRYLTICLPDVGRRMTTNT  YIGLILGAW 19 C P Go
NEUR1_homS DRYLKICYLSYGVWLKRKH  AYICLAAIW 19 C L G?
NEUR2_galG VCCLKICFPAYGNRFRRKH  GQILIACAW 19 C P G?
NEUR3_galG IRFLVTNSSKSNSNKISKNT VHILITFIW 20 N S G?
NEUR4_ornA TRYIKGCHPHRGHFINTAN  ISVALILIW 19 C P G?
TMT_monDom ERYRTL-TLCPGQGADYQK  ALLAVAGSW 19 - L G?
MEL1_homSa DRYLVITRPLATFGVASKRR AAFVLLGVW 20 T P Gq
MEL2_anoCa DRYCVITKPLQSIKRTSKKR TCIIIVFVW 20 T P Gq
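The Le, 7 and 9 columns of the table above can be recomputed directly from the loop string. The sketch below is a minimal illustration with a hypothetical function name, `loop_summary`. It assumes that length is counted in alignment columns, so a gap character is included, which is what the TMT row appears to do.

```python
def loop_summary(loop):
    """Return (aligned length, residue at position 7, residue at position 9).

    Positions are 1-based and counted from the first residue of the
    DRY-type motif. Gap characters count as columns (an assumption that
    matches the TMT row of the table).
    """
    if len(loop) < 9:
        raise ValueError("loop is shorter than 9 columns")
    return len(loop), loop[6], loop[8]


# Loop strings copied from the table above.
examples = {
    "ENCEPH_hom": "ERYIRVVHARVINFSW",
    "RHO1_homSa": "ERYVVVCKPMSNFRFGENH",
    "TMT_monDom": "ERYRTL-TLCPGQGADYQK",
    "MEL1_homSa": "DRYLVITRPLATFGVASKRR",
}
for name, loop in examples.items():
    length, pos7, pos9 = loop_summary(loop)
    print(f"{name} {loop:<20} {length} {pos7} {pos9}")
```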

While it might seem straightforward to thread any opsin onto its best fit among the five newly available crystallographic structures, this fails for distantly related paralogs beyond the universal 7-transmembrane feature. Their loop regions can differ considerably in length and have diverged so greatly in amino acid sequence that they lack discernible alignability, even though they are all ultimately homologous.

While these structures entail various compromises (such as replacement of C3 by lysozyme and deletion of the carboxy tail to enable stable crystallization), they are hugely important to annotation transfer of sequence/function relationships via comparative genomics. Yet most of the 18 vertebrate opsin orthology classes have only remote models to date, and even these can be indeterminate for mid-loop C2 residues (indicative of flexible conformation).

Gene           PDB            Protein                     PubMed    Best human opsin  Next best          Signaling

RHO1_bosTau    1JFP 3C9M 2J4Y bovine rod rhodopsin        17825322  RHO1_homSap 93%   SWS1_homSap   45%  Gt GNAT1 lowers cGMP
MEL1_todPac    2Z73 2ZIY      squid melanopsin            18480818  MEL1_homSap 43%   PER1_homSap   30%  Gq GNAQ? inositol trisphosphate
ADORA2A_homSap 3EML           adenosine receptor 2A       18832607  MEL1_homSap 27%   ENCEPH_homSap 27%  Gs GNAS raises cAMP
ADRB1_melGal   2VT4           beta 1 adrenergic receptor  18594507  MEL1_homSap 29%   ENCEPH_homSap 25%  Gs GNAS raises cAMP
ADRB2_homSap   2R4R           beta 2 adrenergic receptor  17962520  MEL1_homSap 28%   PER1_homSap   29%  Gs GNAS raises cAMP
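The identity percentages in the table were presumably produced by an alignment program over full-length proteins, and the method is not stated here. As a rough illustration of the measure itself, identity over two pre-aligned segments of equal length can be computed as below. The function name `percent_identity` is hypothetical, and the two strings are C2 loops from the DRY-loop table earlier on this page, not the full proteins behind the percentages above.

```python
def percent_identity(a, b):
    """Percent identical residues over columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to equal length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    matches = sum(x == y for x, y in pairs)
    return 100.0 * matches / len(pairs)


rho1_homSa = "ERYVVVCKPMSNFRFGENH"  # human RHO1 loop C2
rho2_galGa = "ERYIVVCKPMGNFRFSATH"  # chicken RHO2 loop C2
identity = percent_identity(rho1_homSa, rho2_galGa)
print(f"{identity:.1f}% identity over {len(rho1_homSa)} columns")
```

Identity over a short loop says little about the whole protein; it is useful here only for comparing the same loop across orthology classes.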

It has not proven feasible to predict loop conformations ab initio or from peptide libraries; it is folly to consider individual loop structure in isolation (rather than the cytoplasmic face in its entirety) or to fail to specify the activation state being computed. Any predicted structure, and any special role proposed for an individual residue, must be consistent with the comparative genomics of close and even distant orthologs, because binding relationships to Galpha and other proteins do not change rapidly in evolutionary time (as seen from heterologous substitution experiments). Even when a cytoplasmic loop seems to lack a definable structure, individual residues can be conserved over vast branch lengths. That conservation must ultimately be explained.

OpsinCyto3D.jpg

Two new high-resolution structures of squid melanopsin establish that the cytoplasmic face is not structurally homologous as a whole across paralogous opsin classes. This was already known from comparative genomics alone, but not specifically why. The X-ray structure exhibits unprecedented rigid extensions of transmembrane helices 5 and 6, on the order of 25 angstroms out into the cytoplasm, greatly constraining the intermediate residues of cytoplasmic loop C3. The proximal carboxy terminus also contributes importantly to the overall structure here.

The squid melanopsin structure, used at SwissModel, could readily predict the structure of the cytoplasmic face of all opsins of the melanopsin class, for which 48 vertebrate, 9 lophotrochozoan, 43 arthropod, and 1 cnidarian sequences are available here. Gq is expected to be the signaling partner throughout these melanopsins, yet which features of the cytoplasmic face the Galpha protein specifically recognizes remains obscure. It cannot really be the helical extensions per se, because the Gq protein is still structurally homologous to its 15 paralogs (in vertebrates) of different signaling types.

The second cytoplasmic loop

In squid melanopsin, the first six residues of cytoplasmic loop C2 also form an extensional helix, beginning with the DRY motif and surprisingly terminating three residues before the deeply conserved proline (normally a helix breaker, as in adrenergic receptors). This proline alone cannot define the two states through its cis and trans configurations, because glycine or leucine can also characterize whole opsin orthology classes at this position. The last 3 residues of loop C2, HRR of basic character, also preface a transmembrane helix, as RAR do in the turkey receptor.

Cytoplasmic loop C2 has a conserved length of 16-20 in all opsins, with much more rigid constraint within individual opsin classes (e.g. all vertebrate imaging opsins have length 19). The structure of the C2 loop of over 100 melanopsins can readily be modelled on its closest match among the determined structures, currently squid melanopsin or bovine rhodopsin, with adenosine and adrenergic receptors serving as a 'structural outgroup'.

On the basis of length (19 matching rhodopsin, 20 matching melanopsin), all the opsins have a structural model except encephalopsin and RGR (both 16 residues) and TMT (18 residues subsequent to a deletion in the amniote stem). This model is further constrained by predictable helical extensions of transmembrane helices into the cytoplasm, leaving only the mid-loop region to be predicted. It is not clear whether the observed residue conservation, both within and across orthology classes, derives from structural importance or instead from Galpha binding specificity requirements.

The adenosine and adrenergic receptor structures, however useful they might be for annotation transfer to the other 350 non-odorant human GPCRs, ultimately will not prove helpful in modeling the second cytoplasmic loop of opsins (squid melanopsin does that better already). Note that C2 in these three structures is consistently stabilized by a mid-loop hydrogen bond to the DRY residues. This constraint is not observed in squid melanopsins or other metazoan opsin classes; indeed it is not feasible, because no hydrogen bond-capable residue consistently occurs there (in the comparative genomics sense of a conserved residue). This mid-loop bridge might be a derived feature that arose fairly early in the stem of non-opsin GPCRs.

OpsinCyto2Five.jpg

MelSecStr.jpg


(to be continued)


The carboxy-terminal tail

This distinctive region has quite baffling length variation across, and sometimes within, opsin classes. The extent of conservation also differs greatly, with no truly universally conserved residues past the end of the seventh transmembrane helix. The terminal conservation pattern observed for a given opsin must be indicative of its functional importance, even though it is insufficiently explained today by arrestin phosphoserine or cysteine palmitoylation sites, opsin dimerization or other membrane macro-organization, or interaction with Galpha proteins. Some interactions would seem to require commonality across all orthology classes (or larger assemblages such as ciliary opsins) while others do not.

The first hand-gapped alignment below illustrates these issues using RGR from 53 species. The alignment begins inside the last transmembrane segment with the Schiff base lysine K and continues past the NAxxY motif at a deeply invariant length (totaling 19 residues) to the "YR" motif found in almost all GPCRs. This marks the beginning of the carboxy-terminal cytoplasmic tail, which in RGR is fairly fixed at 23 residues. These residues remain alignable and may extend the transmembrane helix, but they bear no resemblance to any other opsin or GPCR.

The degree of conservation establishes that selection is at work. It appears that RGR must terminate in several charged (characteristically basic) residues regardless of length indels. These could possibly associate electrostatically with membrane phospholipid or be important to the initial establishment of topology. Mammals have in effect lost the YR motif, though most have an R one residue later. This does not quite coincide with the advent of ERY or GRY in cytoplasmic loop C2 of mammals.

Conservation of G.WQ.L..Q has persisted over tens of billions of years of cumulative branch length and cannot be explained by helix or beta sheet per se; possibly it is constrained by interaction with other parts of the cytoplasmic face. It appears that arrestin could recognize phosphoserine or phosphothreonine in almost all species, but palmitoylation cannot be widespread. A few species, such as guinea pig, microbat and armadillo, may be exhibiting early stages of pseudogenization or at least partial loss of function.

Absent any experimental information, relevant 3D structure, or capacity for annotation transfer from homologous regions, the specifics of individual residue and residue-patch conservation will remain difficult to explain.

             K..PT.NA..YaLG.E.yr .G.Wq.L..q..........k.K    
>RGR_homSap  KMVPTINAINYALGNEMVC RGIWQCLSPQKRE-----KDRTK   RGR_homSap  KMVPTINAINYALGNEMVC RGIWQCLSPQKREKDRTK      
>RGR_panTro  KMVPTINAINYALGNEMVC RGIWQCLSPQKSE-----KDRTK   RGR_panTro  ................... ...........S......      
>RGR_gorGor  KMVPTINAINYALGNEMVC RGIWQCLSPQKSK-----KDRTK   RGR_ponPyg  ................... ...........S......      
>RGR_ponPyg  KMVPTINAINYALGNEMVC RGIWQCLSPQKSE-----KDRTK   RGR_gorGor  ................... ...........SK.....      
>RGR_nomLeu  KMVPTINAVNYALGNEMVC RGIWQCLSPQKSE-----KDRAK   RGR_nomLeu  ........V.......... ...........S....A.      
>RGR_macMul  KMVPTINAINYALGNEMVC RGIWQCLSPQKSE-----KDRAK   RGR_macMul  ................... ...........S....A.      
>RGR_papHam  KMVPTINAINYALGNEMVC RGIWQCLSPQKSE-----KDRAK   RGR_papHam  ................... ...........S....A.      
>RGR_calJac  KMVPTIDAINYALGNEMIC RGIWQCLSPQKSE-----KDRTK   RGR_calJac  ......D..........I. ...........S......      
>RGR_tarSyr  KTVPTINAYHYALGSEMVC RGIWQCLSPHSSE-----.....   RGR_tarSyr  .T......YH....S.... .........HSS.           
>RGR_otoGar  KTVPTINAVNYALGSEMVC RGIWQCLSLQRSK-----QDGAK   RGR_otoGar  .T......V.....S.... ........L.RSKQ.GA.      
>RGR_micMur  KTVPTINAINYALGSETVC RGIWQCLSPQRSE-----QDRAK   RGR_micMur  .T............S.T.. ..........RS.Q..A.      
>RGR_tupBel  KMVPTVNAVNYALGSETIC RGIWGCLSP-KRE-----RDRAR   RGR_tupBel  .....V..V.....S.TI. ....G....KR-.R..AR      
>RGR_musMus  KTMPTINAINYALHREMVC RGTWQCLSPQKSK-----KDRTQ   RGR_musMus  .TM..........HR.... ..T........SK....Q      
>RGR_ratNor  KTMPTINAINYALRSEMVC RGTWQCRSAQKSK-----QDRTQ   RGR_ratNor  .TM..........RS.... ..T...R.A..SKQ...Q      
>RGR_cavPor  KTVPTINAINYSLG----- RGPWQSLEMQRSK-----QD      RGR_cavPor  .T.........S..R---- -.P..S.EM.RSKQ.         
>RGR_dipOrd  KMVPTVNAINYALCNELLC GGFSLGLLPQKGK-----QDRTQ   RGR_dipOrd  .....V.......C..LL. G.FSLG.L...GKQ...Q      
>RGR_oryCun  KTVPTVNAVNYALGSEVIR RGIWQCLLPQRSV-----RGRAQ   RGR_oryCun  .T...V..V.....S.VIR .......L..RSVRG.AQ      
>RGR_ochPri  KAVPTVNAINYALGSEVIR RGIWQCLLPQRSV-----RDRAQ   RGR_ochPri  .A...V........S.VIR .......L..RSVR..AQ      
>RGR_bosTau  KAVPTVNAMNYALGSEMVH RGIWQCLSPQRRE-----HSREQ   RGR_bosTau  .A...V..M.....S...H ..........R..HS.EQ      
>RGR_susScr  KMVPTVNAINYALGGEMVH RGIWQCLSPQRRE-----RDREQ   RGR_susScr  .....V........G...H ..........R..R..EQ      
>RGR_canFam  KAAPTINAIHYALGGDMVH GGLWQCLSPQRSQ-----PDRAR   RGR_canFam  .AA......H....GD..H G.L.......RSQP..AR      
>RGR_felCat  kaVPTINAINYALGSEMVH RGIWQCLSPQGSG-----LDRAR   RGR_felCat  .A............S...H ..........GSGL..AR      
>RGR_equCab  KTVPTINAVNYALGSEMLH RGIWQCLSPQKSE-----RDRAQ   RGR_equCab  .T......V.....S..LH ...........S.R..AQ      
>RGR_myoLuc  KMVPTVNAVNYALGS---- -GIWQRLSLQ.............   RGR_myoLuc  .....V..V.....S---- -....R..L.              
>RGR_pteVam  KMAPTINAVNYALGSEMVQ RGIWQCLSPQRSE-----RDHAQ   RGR_pteVam  ..A.....V.....S...Q ..........RS.R.HAQ      
>RGR_sorAra  KTVPTVNALHYGLGSGMVQ NGFRKGLWLQRRE-----RERAL   RGR_sorAra  .T...V..LH.G..SG..Q N.FRKG.WL.R..RE.AL      
>RGR_eriEur  ktVPTVNAVHYVLGSEKVH KGFWQCFSPQRSE-----QDRAR   RGR_eriEur  .T...V..VH.V..S.K.H K.F...F...RS.Q..AR      
>RGR_loxAfr  KAVPVINACHYALGSEVVR GGIWQYLSRQRGESPLRARDRTH   RGR_loxAfr  .A..V...CH....S.V.R G....Y..R.RG.SPLRAR DRTH
>RGR_proCap  KAVPIVNACHYALGSETVH RGIWQCLSRQRGESPPRTRDRTQ   RGR_proCap  .A..IV..CH....S.T.H ........R.RG.SPPRTR DRTQ
>RGR_echTel  KAVPIVNACHYALGSETVH RGIWQCLSRQRGESPPRTRDRTQ   RGR_echTel  .A..IV..CH....S.T.H ........R.RG.SPPRTR DRTQ
>RGR_choHof  KTMPTINAFQYALGSETVC RDIWQCLPRLRSMGRSSGHD      RGR_choHof  .TM.....FQ....S.T.. .D.....PRLRSMGRSSGH D   
>RGR_dasNov  KTMPTVNALYYALGRESVH RNA                       RGR_dasNov  .TM..V..LY....R.S.H .NA                      
>RGR_ornAna  KTVPVIDAFTYALRNEDYR GGIWQFLTGQKIERV-EVENKIK   RGR_ornAna  .T..V.D.FT...R..DYR G....F.TG..I.RVEVEN KIK
>RGR_xenTro  KTSPAVNAYVYGLGNENYR GGIWQYLTGQKLEKA-ETDNKTK   RGR_xenTro  .TS.AV..YV.G....NYR G....Y.TG..L..AE.DN KTK
>RGR_xenLae  KISPAVNAYVYGLGNENYR GGIWLYLTGQKLEKA-ETDSRTK   RGR_xenLae  .IS.AV..YV.G....NYR G...LY.TG..L..AE.DS RTK
>RGR1_danRer KTSPTFNVFVYALGNENYR GGIWQLLTGQKIESP-AIENKSK   RGR1_danRe  .TS..F.VFV......NYR G....L.TG..I.SPAIEN KSK
>RGR1_takRub KTCPTINVFLYALGNENYR GGIWQFLTGEKIEAP-QIENKSK   RGR1_gasAc  .TS..F.VFL......NYR G....L.TGE.IDVPQIEN KSK
>RGR1_tetNig KTCPTVNVFLYALGNENYR GGIWQFLTGEKIETP-QLENKTK   RGR1_gadMo  .TA..F.VFL......NYR G....L.TGE.I.VPQIEN KSK
>RGR1_gasAcu KTSPTFNVFLYALGNENYR GGIWQLLTGEKIDVP-QIENKSK   RGR1_takRu  .TC....VFL......NYR G....F.TGE.I.APQIEN KSK
>RGR1_oryLat KTSPTFNPLLYALGNENYR GGIWQFLTGEKIHVP-QDDNKSK   RGR1_tetNi  .TC..V.VFL......NYR G....F.TGE.I.TPQLEN KTK
>RGR1_gadMor KTAPTFNVFLYALGNENYR GGIWQLLTGEKIEVP-QIENKSK   RGR1_oryLa  .TS..F.PLL......NYR G....F.TGE.IHVPQDDN KSK
>RGR2_danRer KTSPIFHAVLYAYGNEFYR GGVWQFLTGQK-----SAD-KKK   RGR2_danRe  .TS.IFH.VL..Y...FYR G.V..F.TG..SADKKK
>RGR2_pimPro KTSPIFHAAMYAYGNEFYR GGIWQFLTGQK-----PAD-KKK   RGR2_pimPr  .TS.IFH.AM..Y...FYR G....F.TG..PADKKK
>RGR2_tetNig KTNPIFNALLYTFGNEFYR GGVWHFLTGHKIVDP-VLK-KSK   RGR2_tetNi  .TN.IF..LL.TF...FYR G.V.HF.TGH.IVDPVL.K SK
>RGR2_gasAcu kTNPIFNALLYSFGNEFYR GGVWHFLTGQKMVDP-VVK-KSK   RGR2_gasAc  .TN.IF..LL.SF...FYR G.V.HF.TG..MVDPVV.K SK
>RGR2_oryLat KTNPFFNALLYSFGNEFYR GGVWNFLTGQKIVEP-DVK-KSKQK RGR2_hipHi  .TN.IF..LL.SF...FYR G.V.HF.TG..IVDPVV.K SK
>RGR2_oncMyk KTNPISNAWLYSFGNEFYR GGVWQFLTGQKFTEP-VVV-KLKGR RGR2_oryLa  .TN.FF..LL.SF...FYR G.V.NF.TG..IVEPDV.K SKQK
>RGR2_espLuc KMNPIFNALLYSFGNEFYR GGVWQFLTGQKFTEL-VVV-KLKGR RGR2_poeRe  .TN.IF..FL.SF...FYR G.V.NF.TG..IVEPDV.K SK
>RGR2_gadMor KTNPISNALLYSFGNESYR SGVWHFLTGQKFVEP-SFK-KIK   RGR2_oncMy  .TN.IS..WL.SF...FYR G.V..F.TG..FTEPVVVK LKGR
>RGR2_poeRet KTNPIFNAFLYSFGNEFYR GGVWNFLTGQKIVEP-DVK-KSK   RGR2_espLu  ..N.IF..LL.SF...FYR G.V..F.TG..FTELVVVK LKGR
>RGR2_hipHip KTNPIFNALLYSFGNEFYR GGVWHFLTGQKIVDP-VVK-KSK   RGR2_gadMo  .TN.IS..LL.SF...SYR S.V.HF.TG..FVEPSF.K IK  
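The right-hand column of the alignment above is a difference alignment: residues identical to the reference (human) are shown as dots so that substitutions stand out. A minimal sketch of that display follows. The function name `difference_line` is hypothetical, and unlike the hand-made version above, which drops gap columns, this sketch keeps gaps and spaces in place.

```python
def difference_line(reference, sequence):
    """Replace residues identical to the reference with '.'.

    Both strings must be aligned to the same length. Spaces and gap
    characters are copied through unchanged; comparison ignores case,
    since some rows above are partly in lower case.
    """
    if len(reference) != len(sequence):
        raise ValueError("sequences must be aligned to equal length")
    out = []
    for ref_char, char in zip(reference, sequence):
        if char in " -":
            out.append(char)
        elif char.upper() == ref_char.upper():
            out.append(".")
        else:
            out.append(char)
    return "".join(out)


# Rows copied from the left-hand alignment above.
rgr_homSap = "KMVPTINAINYALGNEMVC RGIWQCLSPQKRE-----KDRTK"
rgr_panTro = "KMVPTINAINYALGNEMVC RGIWQCLSPQKSE-----KDRTK"
rgr_nomLeu = "KMVPTINAVNYALGNEMVC RGIWQCLSPQKSE-----KDRAK"
print(difference_line(rgr_homSap, rgr_panTro))
print(difference_line(rgr_homSap, rgr_nomLeu))
```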

Peropsin exhibits greater conservation than RGR both in its post-K helix and in its cytoplasmic tail. The FR motif is perfectly conserved throughout vertebrates. Tail length, ancestrally 32 residues, went through an era of variability in amniotes but then settled at a fixed 35 residues in mammals. The difference alignment shows that a central motif EITISN conserved in early vertebrates changed character completely (to TMPVTS) in mammals, though a faded form of the earlier motif still appears in platypus. A cysteine conserved back to invertebrates might be palmitoylated; conserved serines and threonines offer potential phosphorylation sites.

The cytoplasmic tail of peropsin is completely unalignable to RGR. Unlike RGR, a tblastn search of the peropsin tail against the whole human genome elicits matches to imaging opsins and to another GPCR (neuropeptide Y receptor). While these matches are weak and largely driven by the last transmembrane section alone, 3 early tail residues (*) emerge as possibly conserved. Whether or not homologically valid, this suggests modeling the first 9 residues of the peropsin tail on the known bovine rhodopsin structure.

                                  *  * *
peropsin   KSSTFYNPCIYVVANKKFR RAMLAMFKC
           KS+T YNP IYV  N++FR   +L +F C
LWS opsin  KSATIYNPVIYVFMNRQFR NCILQLF  
RHO opsin  KSAAIYNPVIYIMMNKQFR NCMLTTICC
NPY2R GPCR ..STFANPLLYGWMNSNYR KAFLSAFRC

Conserved   ksstfynpciyv.ankkFR rAm.aMfkCqthq.mpvts.lpm.vsq.pl.sgr.  
PER_homSap  KSSTFYNPCIYVVANKKFR RAMLAMFKCQTHQTMPVTSILPMDVSQNPLASGRI       PER_homSap  KSSTFYNPCIYVVANKKFR RAMLAMFKCQTHQTMPVTS ILPMDVSQNPLASGRI      
PER_panTro  KSSTFYNPCIYVVANKKFR RAMLAMFKCQTHQTMPVTSILPMDVSQNPLASGRI       PER_panTro  ................... ................... ................      
PER_gorGor  ksstfynpciyvvankKFR RAMLAMFKCQTHQTMPVTSILPMDVSQNPLASGRI       PER_gorGor  ................... ................... ................      
PER_ponPyg  KSSTFYNPCIYVVANKKFR RAMLAMFKCQTHQTMPVTSILPMDVSQNPLASGRI       PER_ponPyg  ................... ................... ................      
PER_nomLeu  KSSTFYNPCIYVVANKKFR KAMLAMFKWPNHQTMPGTSILPMDVSQNPLTSGKI       PER_nomLeu  ................... K.......WPN.....G.. ...........T..K.      
PER_macMul  KSSTFYNPCIYVVANKKFR RAMLAMFKCQTHQTMPVTSILPMDVSQNPLASGRI       PER_macMul  ................... ................... ................      
PER_papHam  KSSTFYNPCIYMVANKKFR RAMLAMFKCQTHQTMPVTSILPMDVSQNPLASGRI       PER_papHam  ...........M....... ................... ................      
PER_calJac  KSSTFYNPCIYVVANKKFR RAMLAMLKCQTHQTMPVTSVLPMDISQNPLASGRI       PER_calJac  ................... ......L............ V....I..........      
PER_tarSyr  ksstfynpciyvvankKFR RAMFAMLKCQTYQAMPATSSLPMNVSQNPLTSGKN       PER_tarSyr  ................... ...F..L....Y.A..A.. S...N......T..KN      
PER_otoGar  KSSTFYNPCIYVVANKKFR RAMFAMFKCQTHQAMAVTSILPMDISQNPLASRRI       PER_otoGar  ................... ...F.........A.A... .....I.......R..      
PER_micMur  KSSTFYNPCIYVIANKKFR RAMFAMFKCQTHQAMPVTSIFPMGVSQNPLPSGRT       PER_micMur  ............I...... ...F.........A..... .F..G......P...T      
PER_tupBel  KSSTFYNPCIYVLANKKFR KAMCAMFKCQTHQAMSVTSVLPMASSPRPLAPARV       PER_tupBel  ............L...... K..C.........A.S... V...AS.PR...PA.V      
PER_musMus  KSSTFYNPCIYVAAHKKFR KAMLAMFKCQPHLAVPEPSTLPMDMPQSSLAPVRI       PER_musMus  ............A.H.... K.........P.LAV.EP. T....MP.SS..PV..      
PER_ratNor  KSSTFYNPCIYVAANKKFR KAMFAMLKCQPHQAMPEPSTLAMGVPHSPLAPARI       PER_ratNor  ............A...... K..F..L...P..A..EP. T.A.G.PHS...PA..      
PER_ochPri  KSSTFYNPCIYVAANKRSR RAMFAMFKCQIPQAKPVTSLSPRDVSQSPLSSGRT       PER_cavPor  ............I...... ...F...Q.....AV..A. .....A..S.......      
PER_cavPor  KSSTFYNPCIYVIANKKFR RAMFAMFQCQTHQAVPVASILPMDASQSPLASGRI       PER_dipOrd  ................... ......L......A..... ................      
PER_speTri  KSSTFYNPCIYVAANKRFR RAMFAMFKCQTHQAMPVTSVLPMDVSQSPRASGRI       PER_speTri  ............A...R.. ...F.........A..... V.......S.R.....      
PER_oryCun  KSSTFYNPCIYVAANKRFR RAMFAMFKCQTHQAMPVTSVLPMDVSQNPLPSGII       PER_ochPri  ............A...RS. ...F......IP.AK.... LS.R....S..S...T      
PER_dipOrd  KSSTFYNPCIYVVANKKFR RAMLAMLKCQTHQAMPVTSILPMDVSQNPLASGRI       PER_oryCun  ............A...R.. ...F.........A..... V..........P..I.      
PER_bosTau  KSSTFYNPCIYVIANKKFR RAMLAMFKCQTTQAMPVTSVLPMDVPQNPLTSGKV       PER_bosTau  ............I...... ...........T.A..... V.....P....T..KV      
PER_turTru  KSSTFYNPCIYVIANKKFR RAMLAMFKCQTHQAMPMESILPMDVPQNPLTSGKV       PER_turTru  ............I...... .............A..ME. ......P....T..KV      
PER_susScr  KSSTFYNPCIYVIANKKFR RAMLAMFKCQTHQAMPLESTLPMDVPQNPLASGRV       PER_vicVic  ............I...... .............A..M.. ......P....T...L      
PER_vicVic  KSSTFYNPCIYVIANKKFR RAMLAMFKCQTHQAMPMTSILPMDVPQNPLTSGRL       PER_susScr  ............I...... .............A..LE. T.....P........V      
PER_canFam  KSSTFYNPCIYVVANKKFR KAIFAMFKCQTHQAMPGTSILPMDVSQNPLASGRN       PER_canFam  ................... K.IF.........A..G.. ...............N      
PER_felCat  ksstfynpciyvvankKFR KAMFAMFKCENRQPMPVTSILPMDVSQNPLTSGRK       PER_felCat  ................... K..F.....ENR.P..... ...........T...K      
PER_equCab  KSSTFYNPCIYVVANKKFR RAMFAMFKCQTHRAMPVTSILPMDVPQNQLASGRI       PER_equCab  ................... ...F........RA..... ......P..Q......      
PER_myoLuc  KSSTFYNPCIYVVANKKFR RAMFAMFKCQTHQTMTTMSFLPMDVPQNPLTSGRI       PER_myoLuc  ................... ...F...........TTM. F.....P....T....      
PER_pteVam  KSSTFYNPCIYVVANKKFR RAMFAMFKCQDHQSMPVTSVLPMDVPQNPLTSGRI       PER_pteVam  ................... ...F......D..S..... V.....P....T....      
PER_eriEur  KSSTFYNPCIYVLANKKFR RAMFAMFKCQTHQAMPVTNTLPMDIPQK-LDSRRN       PER_eriEur  ............L...... ...F.........A....N T....IP.K-.D.R.N      
PER_sorAra  KSSTFYNPCIYVVANKKFR RAMSAMLTCRAQGAMPAASTLPMDAAHSPQASGRN       PER_sorAra  ................... ...S..LT.RAQGA..AA. T....AAHS.Q....N 
PER_loxAfr  KSSTFYNPCIYVVANKKFR RAMFAMFKCQTHQAEPVTCILPMNVSQNPLAAGRI       PER_loxAfr  ................... ...F.........AE...C ....N.......A...      
PER_echTel  ksstfynpciyvvankKFR RAMFALLQCQPQEARRVTSILPMNVSQNPMASGRL       PER_echTel  ................... ...F.LLQ..PQEARR... ....N.....M....L      
PER_proCap  KSSTFYNPCIYVVANKKFR RAMLAMFKCQTHQAVPVTNILPMTVSQNSSASGRI       PER_proCap  ................... .............AV...N ....T....SS.....      
PER_choHof  KSSTFYNPCIYVVANKKFR TIMFAMLKCQTHQAVPVTSILPMNVSENPLASGRI       PER_choHof  ................... TI.F..L......AV.... ....N..E........      
PER_dasNov  KSSTFYNPCIYVVANKKFR RAIFAMLKCQTHQAMPVMSILPMNVSENPLASGRI       PER_dasNov  ................... ..IF..L......A...M. ....N..E........      
PER_monDom  KSSTFYNPCIYVAANKKFR RAISAMIRCQTHQSMPISNALPMN                  PER_monDom  ............A...... ..IS..IR.....S..ISN A...N       
PER_macEug  KSSTFYNPCIYVAANKKFR RAISAMMRCETHQSMPVSNALPLNLT                PER_macEug  ............A...... ..IS..MR.E...S...SN A..LNLT     
PER_ornAna  KSSTFYNPCIYVVANKKFR RAMLSMVQCQTHREITITDVLPMNRSRSPLTL          PER_ornAna  ................... ....S.VQ....REITI.D V...NR.RS..TL    
PER_galGal  KSSTFYNPCIYVIANKKFR RAILAMVRCQTRQEITISNALPMTVSLSALTS          PER_galGal  ............I...... ..I...VR...R.EITISN A...T..LSA.T.    
PER_taeGut  KSSTFYNPCIYVIANKKFR RAILAMVRCQTRQEITINNALPMSVSQSALTSQNSSHLPA  PER_taeGut  ............I...... ..I...VR...R.EITINN A...S...SA.T.QNSSHL PA
PER_anoCar  KSSTFYNPCIYVIANKRFR RAILAMIRCQTRQEITINNVLPMSVSQSTIA           PER_anoCar  ............I...R.. ..I...IR...R.EITINN V...S...STI.     
PER_xenTro  KSSTFYNPCIYVIANKKFR RAILSMVQCKSRQEVTLDNHFPMNVSQSTLTT          PER_xenTro  ............I...... ..I.S.VQ.KSR.EVTLDN HF..N...ST.TT    
PER_danRer  KSSTFYNPCIYVIANKKFR RAIIGMIRCQTRQRVTINNQLPMMASSVPLNP          PER_danRer  ............I...... ..IIG.IR...R.RVTINN Q...MA.SV..NP    
PER_gasAcu  KSSTFYNPCIYVIANKKFR RAIIGMVRCQTRQRITINSQVPMTTSQQPLTQ          PER_gasAcu  ............I...... ..IIG.VR...R.RITIN. QV..TT..Q..TQ    
PER_oryLat  KSSTFYNPCIYVIANKKFR RAIIGMIRCQTRQRITISTQVPMTISQQPLTQ          PER_oryLat  ............I...... ..IIG.IR...R.RITIST QV..TI..Q..TQ    
PER_takRub  KSSTFYNPCIYVIANKKFR RAIIGMIRCQTRQQMTINTEIPMTTSQQTATQ          PER_takRub  ............I...... ..IIG.IR...R.Q.TINT EI..TT..QTATQ    
PER_tetNig  KSSTFYNPCIYVITNKKFR QAIIGMIRCQTRQQITINTDIPMTASQQTLTQ          PER_tetNig  ............IT..... Q.IIG.IR...R.QITINT DI..TA..QT.TQ    
PER_calMil  KSSTFYNPCIYVIANKKFR KAIMAMICCQNRQEITINHTLPMTISRVPLTE          PER_calMil  ............I...... K.IM..IC..NR.EITINH T...TI.RV..TE    
PER1b_sacK  KIPAVFNPVIYVALNPEFR KYFGKTIGCRRKRKKPIAVRLNGSEQNVENTI          PER1b_sacK  .IPAVF..V...AL.PE.. KYFGKTIG.RRKRKK.IAV R.NGSEQNVENTI

Reference sequence collection

Cytoplasmic loop C2 from 101 melanopsins

species    helix bridge area  hel transmemb Le 7 9
MEL1_homSa DRYLV ITRPLATFGVAS KRR AAFVLLGVW 20 T P
MEL1_panTr DRYLV ITRPLATFGVAS KRR AAFVLLGVW 20 T P
MEL1_gorGo DRYLV ITRPLATFGVAS KRR AAFVLLGVW 20 T P
MEL1_ponAb DRYLV ITRPLATIGVAS KRR AAFVLLGVW 20 T P
MEL1_rheMa DRYLV ITRPLATIGVAS KRR AAFVLLGVW 20 T P
MEL1_calJa DRYLV ITRPLATIGVAS TKR AAFVLLGVW 20 T P
MEL1_micMu DRYLV ITRPLASVGTAS KRR AGLVLLGVW 20 T P
MEL1_otoGa DRYLV ITRPLTTVGVAS KRR AALVLLGVW 20 T P
MEL1_musMu DRYLV ITRPLATIGRGS KRR TALVLLGVW 20 T P
MEL1_ratNo DRYLV ITRPLATIGMRS KRR TALVLLGVW 20 T P
MEL1_nanEh DRYLV ITRPLATIGVAS KRR TALVLLGVW 20 T P
MEL1_phoSu DRYLV ITRPLATIGMGS KRR TALVLLGIW 20 T P
MEL1_dipOr DRYLV ITRPLATIGVTS KRR TAFVLLGVW 20 T P
MEL1_cavPo DRYLV ITRPLATIGVAS KRQ AALVLLGVW 20 T P
MEL1_speTr DRYLV ITRPLATIGMAS KKR AAFFLLGVW 20 T P
MEL1_oryCu DRYLV ITRPLAAVGMVS KKR AGLVLLGVW 20 T P
MEL1_ochPr DRYLV ITRPLAAVGMVS KRR TGLVLLGVW 20 T P
MEL1_bosTa DRYLV ITRPLATVGMVS KRR AALVLLGVW 20 T P
MEL1_turTr DRYLV ITRPLATVGMVS KRR AALVLLGVW 20 T P
MEL1_susSc DRYLV ITHPLATVGMVS KRR AALVLLGVW 20 T P
MEL1_equCa DRYLV ITRPLATVGVVS KRW AALVLLGIW 20 T P
MEL1_felCa DRYLV ITHPLATIGVVS KRR AALVLLGVW 20 T P
MEL1_canFa DRYLV ITHPLAAVGVVS KRR AALVLLGVW 20 T P
MEL1_myoLu DRYLV ITRPLA-IGVVS KRR AALVLLGVW 19 T P
MEL1_pteVa DRYLV ITRPLAAIGVVS KRR AALVLLGVW 20 T P
MEL1_eriEu DRYLV ITRPLATIGVVS KRR VALVLLGVW 20 T P
MEL1_loxAf DRYLV ITRPLATIGVVS KRR AALVLLGIW 20 T P
MEL1_proCa DRYLV ITRPLATIGVVS KRR TALVLLGTW 20 T P
MEL1_echTe DRYLV ITRPLATIGVVS KRR AALVLLVIW 20 T P
MEL1_smiCr DRYFV ITRPLASIGMIS KKK TGLILLGVW 20 T P
MEL1_monDo DRYFV ITRPLASIGVIS KKK TGFILLGVW 20 T P
MEL1_ornAn DRYFV ITRPLASIGVIS KKR ALLILTGVW 20 T P
MEL1_anoCa DRYFV ITRPLASIGAMS TKK ALLILSGVW 20 T P
MEL1_taeGu DRYFV ITKPLASVGVTS KKK ALIILVGVW 20 T P
MEL1_galGa DRYFV ITKPLASVRVMS KKK ALIILVGVW 20 T P
MEL1_xenTr DRYFV ITRPLTSIGVMS KKR AVLILSGVW 20 T P
MEL1_danRe DRYFV ITRPLASIGVLS QKR ALLILLVAW 20 T P
MEL1_danRe DRYFV ITRPLASIGVMS RKR ALLILSAAW 20 T P
MEL1_takRu DRYFV ITRPLTSIGVLS RKR AFVILMTVW 20 T P
MEL1_gasAc DRYFV ITRPLTSIGMMS RRR ALLILMGAW 20 T P
MEL1_oryLa DRYFV ITRPLTSIGVLS RKR ALLILSAAW 20 T P
MEL1_calMi DRYFV ITRPLASIGVLS HRR AGLIILSLW 20 T P
MEL1_petMa DRYLV LTRPLASIGAMS KRR AMYITAAVW 20 T P
MEL2_galGa DRYLV ITKPLRSIQWTS KKR TIQIIAAVW 20 T P
MEL2_anoCa DRYCV ITKPLQSIKRTS KKR TCIIIVFVW 20 T P
MEL2_xenLa NRYIV ITKPLQSIQWSS KKR TSQIIVLVW 20 T P
MEL2_danRe DRYLV ITKPLQTIQWNS KRR TGLAILCIW 20 T P
MEL2_tetNi DRYVV ITKPLQTIRRSS KRR TALAILMVW 20 T P
MEL2_gasAc DRYLV ITKPLQAIHWGS KRR TTLAILLVW 20 T P
MEL1_plaDu DRFYV ITNPLGAAQTMT KKR AFIILTIIW 20 T P
MEL1_capCa DRYMV IAKPFYAMKHVS HKR SLIQIILAW 20 A P
MEL1_helRo DRYLV VGQPLAMLNQSH FRR SFYHVLIIW 20 G P
MEL1_todPa DRYNV IGRPMAASKKMS HRR AFIMIIFVW 20 G P
MEL1_schMe DRYFV IAQPFQTMKSLT IKR AIIMLVFVW 20 A P
MEL2_schMa DRYLV IATPFESVFQTT PRR TLLLMLFLW 20 A P
MEL1_lotGi DRYLV ITSPFTAMRNMT HKR AFLMIVGVW 20 T P
MEL1_sepOf DRYNV IGRPMAASKKMS HRR AFLMIIFVW 20 G P
MEL1_entDo DRYNV IGRPMAASKKMS HRR AFLMIIFVW 20 G P
UVV_camAb  DRYST IARPLDGKLS   RGQ VLLLIMLIW 18 A P
UVV_catBo  DRYST IARPLDGKLS   RGQ VILLIALIW 18 A P
UVV_apiMe  DRYST IARPLDGKLS   RGQ VILFIVLIW 18 A P
BLU_apiMe  DRYRT ISCPIDGRLN   SKQ AAVIIAFTW 18 S P
BLU_droMe  DRYKT ISNPIDGRLS   YGQ IVLLILFTW 18 S P
BLU_manSe  DRYKT ISSPLDGRIN   TVQ AGLLIAFTW 18 S P
UVV1_droMe DRYNV ITKPMNRNMT   FTK AVIMNIIIW 18 T P
UVV1_pedHu DRCET ITNPL-QKSG   KKK AFLLAAFTW 18 T P
UVV_manSe  DRHST ITRPLDGRLS   EGK VLLMVAFVW 18 T P
UVV_papXu  DRHST ITRPLDGRLS   RGK VLLMMVCVW 18 T P
UVV2_droMe DRFNV ITRPMEGKMT   HGK AIAMIIFIY 18 T P
UVV2_pedHu DRYQV IVHPLER-KT   KAA VYFQILLIW 18 V P
LWS_nemVe  DRYIV IVHPMKKIMT   RKK AALMIVGVW 18 V P
LWS_pedHu  DRYNV IVKGLSAKPMT  IKM ALLNILFVW 19 V G
LWS_vanCa  DRYNV IVKGIAAKPLT  ING AMLRVLGIW 19 V G
LWS_papXu  DRYNV IVKGIAAKPMT  ING ALLRILGIW 19 V G
LWS_helSa  DRYNV IVKGIAAKPMT  ING ALLRVFGIW 19 V G
LWS_pieRa  DRYNV IVKGIAAKPMT  INS ALLRILGVW 19 V G
LWS_manSe  DRYNV IVKGIAAKPMT  SNG ALLRILGIW 19 V G
MWS2_droMe DRYNV IVKGINGTPMT  IKT SIMKILFIW 19 V G
LWS_rhoPr  DRYNV IVKGISAKPMT  NKT AMLRILLVW 19 V G
LWS_meoOe  DRYNV IVKGISGTPLS  QKN TTLQVLFVW 19 V G
LWS_catBo  DRYNV IVKGLSAKPMT  ING ALLRILGIW 19 V G
LWS_schGr  DRYNV IVKGLSAKPMT  NKT AMLRILFIW 19 V G
LWS_triCa  DRYNV IVKGLSAQPLT  KKG AMLRILIIW 19 V G
LWS2_apiMe DRYNV IVKGLSGKPLS  ING ALIRIIAIW 19 V G
LWS_bomTe  DRYNV IVKGLSGKPLT  ING ALLRILGIW 19 V G
MWS_calEr  DRYNV IVKGMAGQPMT  IKL AIMKIALIW 19 V G
MWS1_droMe DRYQV IVKGMAGRPMT  IPL ALGKIAYIW 19 V G
LWS_droMe  DRYCV IVKGMARKPLT  ATA AVLRLMVVW 19 V G
LWS_arcGr  DRYNV IVKGVAAEPLT  SKG ASIRILFVW 19 V G
LWS_eupSu  DRYNV IVKGVAATPLT  NKG AFARNIFSW 19 V G
LWS_camLu  DRYNV IVKGVAGEPLS  TKK ASLWILTVW 19 V G
LWS_proMi  DRYNV IVKGVAGEPLS  TKK ASLWILIVW 19 V G
LWS_holCo  DRYNV IVKGVSAEPLT  SGG AMMRIAGTW 19 V G
LWS_homGa  DRYNV IVKGVSATPLT  TNG AMLRNLFSW 19 V G
LWS_neoAm  DRYNV IVKGVSGEPLT  NSG AMTRIAGTW 19 V G
LWS_neoOe  DRYNV IVKGVSGKPLS  QKN ATLQVLFVW 19 V G
LWS_mysDi  ERYNV IVKGVSSKPLS  VKG AITRIVLTW 19 V G
LWS1_apiMe DRYNV IVKGMSGTPLT  IKR AMLQILGIW 19 V G
LWS_limPo  DRYNV IVRGMAAAPLT  HKK ATLLLLFVW 19 V G
LWS_limPo  DRYNV IVRGMAAAPLT  HKK ATLLLLFVW 19 V G
LWS_ixoSc  DRYNV IVRGVAAAPLT  HKR AALMIFFVW 19 V G
ADRB2_homS DRYFA ITSPFKYQSLLT KNK ARVIILMVW 20 T P
ADRA2A_hom DRYWS ITQAIEYNLKRT PRR IKAIIITVW 20 T A
ADRA2C_hom DRYWS VTQAVEYNLKRT PRR VKATIVAVW 20 T A
HTR1A_homS DRYWA ITDPIDYVNKRT PRR AAALISLTW 20 T P
CHRM1_homS DRYFS VTRPLSYRAKRT PRR AALMIGLAW 20 T P
DRD2_homSa DRYTA VAMPMLYNTRYS KRR VTVMISIVW 21 A P
TAAR9_homS DRYIA VTDPLTYPTKFT VSV SGICIVLSW 20 T P
ADRA2B_hom DRYWA VSRALEYNSKRT PRR IKCIILTVW 20 S A

Reference collection of 352 cytoplasmic loop sequences from all opsins

The second column contains the C2 loop sequences. The third column shows the continuation into transmembrane helix 4. The end of the loop region is determined by countback from the invariant tryptophan at position 160 in squid melanopsin as well as from crystallography and transmembrane prediction tools. Other columns show loop length and values at potentially informative positions 7 and 9 (which are generally characteristic of orthology class).
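
The summary columns described above can be recomputed from the sequence columns. The following is a minimal sketch, not the method used to build the table: the function names are hypothetical, and the nine-residue countback is an assumption inferred from the width of the third column rather than a value stated in the text.

```python
def summarize_c2_loop(row):
    """Recompute the summary columns for one whitespace-separated table row."""
    name, loop, tm4_start = row.split()[:3]
    return {
        "name": name,
        "loop": loop,
        "tm4_start": tm4_start,
        "length": len(loop),  # alignment gaps ('-') are counted as positions
        "pos7": loop[6],      # 1-based position 7
        "pos9": loop[8],      # 1-based position 9
    }


def split_at_tryptophan(segment, tm_len=9):
    """Split a loop+helix segment by counting back from its last tryptophan.

    tm_len is an assumed width for the helix 4 continuation (nine residues
    ending in W, matching the third column of the table).
    """
    end = segment.rindex("W") + 1
    return segment[:end - tm_len], segment[end - tm_len:end]


row = "RHO1_homSa\tERYVVVCKPMSNFRFGENH\tAIMGVAFTW\t19\tC\tP"
info = summarize_c2_loop(row)
loop, tm4 = split_at_tryptophan("ERYVVVCKPMSNFRFGENHAIMGVAFTW")
print(info["length"], info["pos7"], info["pos9"], loop, tm4)
```

Counting gaps as positions keeps positions 7 and 9 comparable across rows that carry an alignment gap, such as the tetrapod TMT entries.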

RHO1_homSa	ERYVVVCKPMSNFRFGENH	AIMGVAFTW	19	C	P
RHO1_bosTa	ERYVVVCKPMSNFRFGENH	AIMGVAFTW	19	C	P
RHO1_ornAn	ERYIVVCKPMSNFRFGENH	AIMGVAFTW	19	C	P
RHO1_monDo	ERYVVVCKPMSNFRFGENH	AIIGVAFTW	19	C	P
RHO1_galGa	ERYVVVCKPMSNFRFGENH	AIMGVAFSW	19	C	P
RHO1_calMi	ERYVVVCKPMSNFRFGTNH	AIMGVAFTW	19	C	P
RHO1_xenTr	ERYVVVCKPMANFRFGENH	AIMGVVFTW	19	C	P
RHO1_latCh	ERYVVVCKPMSNFRFGENH	AIMGVIFTW	19	C	P
RHO1_neoFo	ERYIVVCKPISNFRFGENH	AIMGVVFTW	19	C	P
RHO1_angAn	ERWVVVCKPMSNFRFGENH	AIMGLAFTW	19	C	P
RHO1_takRu	ERYIVVCKPMTNFRFGEKH	AIAGLVFTW	19	C	P
RHO1_leuEr	ERYMVVCKPMANFRFGSQH	AIIGVVFTW	19	C	P
RHO1_petMa	ERYIVICKPMGNFRFGSTH	AYMGVAFTW	19	C	P
RHO1_letJa	ERYIVICKPMGNFRFGNTH	AIMGVAFTW	19	C	P
RHO1_geoAu	ERYIVICKPMGNFRFGNTH	AIMGVALTW	19	C	P
RHO2_galGa	ERYIVVCKPMGNFRFSATH	AMMGIAFTW	19	C	P
RHO2_gekGe	ERYIVICKPMGNFRFSATH	AIMGIAFTW	19	C	P
RHO2_anoCa	ERYIVVCKPMGNFRFSATH	ALMGISFTW	19	C	P
RHO2_taeGu	ERYIVICKPMGNFRFSASH	ALMGIAFTW	19	C	P
RHO2_podSi	ERYIVVCKPMGNFRFSSSH	ALMGIAFTW	19	C	P
RHO2_pheMa	ERYIVICKPMGNFRFSSSH	AMMGISFTW	19	C	P
RHO2_latCh	ERYIVVCKPMGNFRFASSH	AIMGIAFTW	19	C	P
RHO2_geoAu	ERYIVVCKPMGNFRFATTH	AALGVVFTW	19	C	P
RHO2_neoFo	ERYIVVCKPMGNFRFSNNH	SIIGIVFTW	19	C	P
RHO1_anoCa	ERYVVICKPMSNFRFGETH	ALIGVSCTW	19	C	P
RHO1_conMy	ERWMVVCKPVTNFRFGESH	AIMGVMVTW	19	C	P
RHO2_ancDa	ERYIVVCKPMGSFKFSSSH	AMAGIAFTW	19	C	P
RHO2a_danR	ERYIVVCKPMGSFKFSANH	AMAGIAFTW	19	C	P
RHO2b_danR	ERYIVVCKPMGSFKFSSNH	AMAGIAFTW	19	C	P
RHO2c_danR	ERYIVVCKPMGSFKFSSNH	AFAGIGFTW	19	C	P
RHO2d_danR	ERYIVVCKPMGSFKFSASH	AFAGCAFTW	19	C	P
RHO2_oryLa	ERYIVVCKPMGSFKFTATH	SAAGCAFTW	19	C	P
RHO2_takRu	ERYVVVCKPMGSFKFTGTH	AAVGVAFTW	19	C	P
RHO2_gasAc	ERYIVVCKPMGSFKFSGTH	AGAGVLFTW	19	C	P
RHO2_hipHi	ERYIVVCKPMGSFKFSGTH	AGIGVLFTW	19	C	P
RHO2_mulSu	ERYIVVCKPMGSFKFSGTH	AGAGVAFTW	19	C	P
RHO2_oreNi	ERYIVVCKPMGSFKFTGAH	AGAGVLFTW	19	C	P
RHO2_pomMi	ERYIVVCKPMGSFKFSGAH	AGAGVALTW	19	C	P
SWS2_ornAn	ERFLVICKPLGNLSFRGTH	AIFGCAATW	19	C	P
SWS2_anoCa	ERYLVICKPLGNFTFRGTH	AIIGCAVTW	19	C	P
SWS2_utaSt	ERFLVICKPLGNFSFRGTH	AIIGCIITW	19	C	P
SWS2_taeGu	ERFLVICKPLGNFTFRGSH	AVLGCAITW	19	C	P
SWS2_galGa	ERFLVICKPLGNFTFRGSH	AVLGCVATW	19	C	P
SWS2_neoFo	ERFLVICKPLGNFTFRSTH	AIIGCVATW	19	C	P
SWS2_xenTr	ERFLVICKPMGNFTFRESH	AVLGCILTW	19	C	P
PIN_galGal	ERYVVVCRPLGDFQFQRRH	AVSGCAFTW	19	C	P
PIN_pheMad	ERYLVICKPVGDFQFQRRH	AVIGCLYTW	19	C	P
PIN_utaSta	ERYLVICKPVGDFRFQQRH	AVFGCVFTW	19	C	P
PIN_xenTro	ERYLVICKPMGDFRFQQKH	AILGCSFTW	19	C	P
PIN_bufJap	ERYIVICKPMGDFRFQQRH	AVMGCAFTW	19	C	P
PIN_podSic	ERYLVICKPVGDFRFPARH	AVLGCAFTW	19	C	P
PIN_calMil	ERYIVICKPMGDFRFQQKH	AVWGCLFTW	19	C	P
SWS1_homSa	ERYIVICKPFGNFRFSSKH	ALTVVLATW	19	C	P
SWS1_monDo	ERFIVICKPFGNFRFNSKH	AMMVVLATW	19	C	P
SWS1_smiCr	ERFIVICKPFGNFRFNSKH	AMMVVLATW	19	C	P
SWS1_tarRo	ERFIVICKPFGNFRFSSKH	AMMVVLATW	19	C	P
SWS1_taeGu	ERYIVICKPFGNFRFNSRH	ALLVVAATW	19	C	P
SWS1_anoCa	ERYIVICKPFGNFRFNSRH	ALLVVAATW	19	C	P
SWS1_utaSt	ERYIVICKPFGNFRFNSKH	ALLVVAATW	19	C	P
SWS1_galGa	ERYIVICKPFGNFRFSSRH	ALLVVVATW	19	C	P
SWS1_geoAu	ERYIVICKPFGNFRFGSKH	ALVAVGLTW	19	C	P
SWS1_neoFo	ERYLVICKPIGNFRFGSKH	SMIAVVAAW	19	C	P
SWS1_xenLa	ERYIVICKPMGNFNFSSSH	ALAVVICTW	19	C	P
SWS1_petMa	ERYIVICKPFGNFRFGSIH	SLFAFCLTW	19	C	P
SWS1_danRe	ERYVVICKPFGSFKFGQGQ	AVGAVVFTW	19	C	P
SWS1_oryLa	ERYLVICKPFGAFKFGSNH	ALAAVIFTW	19	C	P
SWS2_geoAu	ERCLVICKPFGNIAFRGTH	ALIRCGFAW	19	C	P
SWS2_takRu	ERWLVVCKPLGNFIFKPDH	AIVCCIFTW	19	C	P
SWS2_gasAc	ERWLVICKPLGNFIFKPDH	ALVCCAFTW	19	C	P
LWS_homSap	ERWMVVCKPFGNVRFDAKL	AIVGIAFSW	19	C	P
LWS_monDom	ERWVVVCKPFGNVKFDAKL	AMVGIIFSW	19	C	P
LWS_ornAna	ERWIVVCKPFGNVKFDAKL	AMVGIVFSW	19	C	P
LWS_anoCar	ERWVVVCKPFGNVKFDAKL	AVAGIVFSW	19	C	P
LWS_galGal	ERWFVVCKPFGNIKFDGKL	AVAGILFSW	19	C	P
LWS_xenTro	ERWFVVCKPFGNIKFDGKL	AATGIIFSW	19	C	P
LWS_neoFor	ERWVVVCKPFGNIKFDGKW	AAGGIIFSW	19	C	P
LWS_calMil	ERWVVVCKPFGNVKFDGKW	AAFGIIFSW	19	C	P
LWS_takRub	ERWVVVCKPFGNVKFDAKW	ATGGIVFSW	19	C	P
LWS_gasAcu	ERWIVVCKPFGNVKFDAKW	ATAGIVFSW	19	C	P
LWS_petMar	ERWMVVCKPFGNIKFDGKI	ATILIVFSW	19	C	P
LWS_letJap	ERWMVVCKPFGNIKFDGKI	AIILIVFSW	19	C	P
LWS_geoAus	ERWMVVCKPFGNLKFDGKV	AIVLIIFSW	19	C	P
VAOP_galGa	ERYIVICRPVGNMRLRGKH	AAQGIAFVW	19	C	P
VAOP_anoCa	ERYVVICRPLGNMRLNGKH	AALGVAFVW	19	C	P
VAOP_xenTr	ERYIVICRPLGNLRLQGKH	SALAIIFVW	19	C	P
VAOP_danRe	ERFFVICRPLGNIRLRGKH	AALGLVFVW	19	C	P
VAOP_rutRu	ERFFVICRPLGNIRLRGKH	AALGLLFVW	19	C	P
VAOP_takRu	ERFFVICRPLGNMRLQAKH	AAIGLLFVW	19	C	P
VAOP_petMa	ERYFVICRPLGNFRLQSKH	AVLGLAVVW	19	C	P
PPIN_anoCa	DRAIVIAKPMGTITFTTRK	AMIGVAVSW	19	A	P
PPIN_xenTr	DRVFVVCKPMGTLTFTPKQ	ALAGIAASW	19	C	P
PPIN_ictPu	DRYMVVCRPLGAVMFQTKH	ALAGVVFSW	19	C	P
PPIN_oncMy	DRYVVVCRPMGAVMFQTRH	AVGGVVLSW	19	C	P
PPIN_danRe	ERCMVVCRPVGSISFQTRH	AVFGVAVSW	19	C	P
PPIN_petMa	DRFVVVCKPLGTLMFTRRH	ALLGITWAW	19	C	P
PPIN_letJa	DRFVVVCKPLGTLMFTRRH	ALLGIAWAW	19	C	P
PPIN2_petM	ERYVVVCKPLGGVHFGTQH	GLCGVAISW	19	C	P
PARIE_utaS	ERYNVVCQPLGTLQMSTKR	GYQLLGFIW	19	C	P
PARIE_anoC	ERYNVVCQPLGTLQMSTQR	AYQLLGFIW	19	C	P
PARIE_xenT	ERYNVVCEPIGALKLSTKR	GYQGLVFIW	19	C	P
PARIE_takR	ERYNVVCKPRAGLKLTMRR	SIIGLLFVW	19	C	P
PARIE_gasA	ERYNVVCRPRNALKLSMRR	SIHGLLIVW	19	C	P
PARIE_danR	ERYNVVCKPMAGFKLNVGR	SCQGLLLVW	19	C	P
PER_homSap	DRYLTICLPDVGRRMTTNT	YIGLILGAW	19	C	P
PER_panTro	DRYLTICLPDVGRRMTTNT	YIGLILGAW	19	C	P
PER_nomLeu	DRYLTICLPDVGRRMTTNT	YIGLILGAW	19	C	P
PER_gorGor	DRYLTICLPDVGRRMTTNT	YIGLILGAW	19	C	P
PER_ponPyg	DRYLTICLPDIGRRMTTNT	YIGLILGAW	19	C	P
PER_macMul	DRYLTICLPDIGRRMTTNT	YIGMILGAW	19	C	P
PER_papHam	DRYLTICLPDIGRRMTTNT	YIGMILGAW	19	C	P
PER_otoGar	DRYLTICRPDIGRRMTTNS	YIGMILGAW	19	C	P
PER_tarSyr	DRYLTICRPDIGRRMTTNT	YVGMILGAW	19	C	P
PER_micMur	DRYLTICRPDIGRRMTTHT	YVGMILGAW	19	C	P
PER_cavPor	DRYLTICRPDIGRRMTSHS	YVGMILGAW	19	C	P
PER_ochPri	DRYLTICQPDIGRRMTTHT	YFGMILGAW	19	C	P
PER_oryCun	DRYLTICHPDVGRRMTTRT	YLGLILGAW	19	C	P
PER_calJac	DRYLTICLPDIGRRMTTST	YIIMILGAW	19	C	P
PER_canFam	DRYLTICSPDTGRRMTTNT	YISMILGAW	19	C	P
PER_felCat	DRYLTICSPNSGRRMTTNT	YISMILGAW	19	C	P
PER_susScr	DRYLTICRPEAGRRMTTNT	YISMILGAW	19	C	P
PER_vicVic	DRYLTICRPDAGRRMTTNT	YISMILGAW	19	C	P
PER_turTru	DRYLTICCPGAGRRMTTNT	YISMILGAW	19	C	P
PER_bosTau	DRYLTICHPDAGRRMTANT	YISMILGAW	19	C	P
PER_choHof	DRYLTICHPDVGRRMTINT	YISMILGAW	19	C	P
PER_dasNov	DRYLTICRPDTGRRMTINT	YISMILGAW	19	C	P
PER_echTel	DRYLTICHPDRGRRMTSNT	YVGMILGAW	19	C	P
PER_loxAfr	DRYLTICHPHIGRRMTSNT	YVSMILGAW	19	C	P
PER_sorAra	DRYLTLCRPDAGRSMTTNS	YVGLILGAW	19	C	P
PER_equCab	DRYLTTCRPDAGRRMTTST	YTSMILGAW	19	C	P
PER_dipOrd	DRYLTICHPDIGRGMTTRT	YVTMILGAW	19	C	P
PER_musMus	DRYLTISCPDVGRRMTTNT	YLSMILGAW	19	S	P
PER_ratNor	DRYLTISCPDVGRRMTGNT	YLSMVLGAW	19	S	P
PER_eriEur	DRYLTICRPHTGRSMSANS	YIAMILGAW	19	C	P
PER_tupBel	DRYLTLCRPAVGRRMGSST	YAAMILGAW	19	C	P
PER_monDom	DRYLTICQPDLGGRMTSYN	YTLMILTAW	19	C	P
PER_ornAna	DRYLTICRPAIGRKMTRSN	YTAMILAAW	19	C	P
PER_xenTro	DRYLTICRPDIGRRISGRH	YTAMILAAW	19	C	P
PER_galGal	DRYLTICRPDIGRRMTTRN	YAALILAAW	19	C	P
PER_anoCar	DRYLTICKPHIGSRLTATN	YTTLILAAW	19	C	P
PER_taeGut	DRYLTICRPDIGRRMTTRS	YATLILAAW	19	C	P
PER1_gasAc	DRYLTICRPDIGQKMTMQS	YNLLILAAW	19	C	P
PER_gasAcu	DRYLTICRPDIGQKMTMQS	YNLLILAAW	19	C	P
PER_oryLat	DRYLTICRPDLGQKMTMQS	YNLLILAAW	19	C	P
PER_takRub	DRYITICRPDIGRKMTVQS	YNLLILAAW	19	C	P
PER_tetNig	DRYLTICRPDIGRKMTVQS	YNLLIAAAW	19	C	P
PER_danRer	DRYLTICRPDIGQKLTTRS	YTLLIVAAW	19	C	P
PER1a_sacK	DRYWATCSPVEVMELKSKY	YTRMTALGW	19	C	P
NEUR1_homS	DRYLKICYLSYGVWLKRKH	AYICLAAIW	19	C	L
NEUR1_nomL	DRYLKICYLSYGVWLKRKH	AYICLAAIW	19	C	L
NEUR1_panT	DRYLKICYLSYGVWLKRKH	AYICLAAIW	19	C	L
NEUR1_ponP	DRYLKICYLSYGVWLKRKH	AYICLAAIW	19	C	L
NEUR1_macM	DRYLKICYLSYGVWLKRKH	AYICLAAIW	19	C	L
NEUR1_papH	DRYLKICYLSYGVWLKRKH	AYICLAAIW	19	C	L
NEUR1_calJ	DRYLKICYLSYGVWLKRKH	AYICLAAIW	19	C	L
NEUR1_tarS	DRYLKICYLSYGVWLKRKH	AYICLAAIW	19	C	L
NEUR1_cavP	DRYLKICYLSYGVWLKRKH	AYICLAAIW	19	C	L
NEUR1_dasN	DRYLKICYLSYGVWLKRKH	AYICLAVIW	19	C	L
NEUR1_equC	DRYLKICYLSYGVWLKRKH	AYICLAVIW	19	C	L
NEUR1_canF	DRYLKICYLSYGVWLKRKH	AYICLAVIW	19	C	L
NEUR1_susS	DRYLKICYLSYGVWLKRKH	AYICLAVIW	19	C	L
NEUR1_pteV	DRYLKICYLSYGVWLKRKH	AYICLAVIW	19	C	L
NEUR1_choH	DRYLKICYLSYGVWLKRKH	AYICLAVIW	19	C	L
NEUR1_musM	DRYLKICYLSYGVWLKRKH	AYICLAVIW	19	C	L
NEUR1_ratN	DRYLKICYLSYGVWLKRKH	AYICLAVIW	19	C	L
NEUR1_loxA	DRYLKICYLSYGVWLKRKH	AYICLAVIW	19	C	L
NEUR1_felC	DRYLKICYLSYGVWLKRKH	AYICLAVIW	19	C	L
NEUR1_turT	DRYLKICYLSYGVWLKRKH	AYICLAVIW	19	C	L
NEUR1_tupB	DRYLKICYLSYGVWLKRKH	AYICLAVIW	19	C	L
NEUR1_echT	DRYLKICYLSYGVWLKRKH	AYICLAVIW	19	C	L
NEUR1_dipO	DRYLKICYLSYGVWLKRKH	AYICLAVIW	19	C	L
NEUR1_bosT	DRYLKICYLSYGIWLKRKH	AYICLAVIW	19	C	L
NEUR1_eriE	DRYLKICYLSYGVWLKRKH	AYLCLAVIW	19	C	L
NEUR1_sorA	DRYLKICYLSYGVWLKRKH	AYICLVVIW	19	C	L
NEUR1_speT	DRYLKICYLSYGVWLKRKH	AFICLAVIW	19	C	L
NEUR1_oryC	DRYLKICYLSYGVWLKRRH	AYICLALIW	19	C	L
NEUR1_myoL	DRYLKICYLSYGVWLKRKH	TYICLAFIW	19	C	L
NEUR1_monD	DRYLKICHLSYGTWLKRHH	AFICLALIW	19	C	L
NEUR1_taeG	DRYLKICHLSYGTWLKRHH	AFICLAIIW	19	C	L
NEUR1_galG	DRYLKICHLAYGTWLKRHH	AFICLALIW	19	C	L
NEUR1_ornA	DRYLKICHLSYGTWLKRHH	AYICLAIIW	19	C	L
NEUR1_macE	DRYLKICHLSYGTWLKRHH	AYICLVIIW	19	C	L
NEUR1_gasA	DRYLKICHLRYGTWLKRHH	AFVCLALVW	19	C	L
NEUR1_anoC	DRYFKICHLSYGTWLKRHH	VFICLGIIW	19	C	L
NEUR1_tetN	DRYLKICHLRYGAWLKRHH	AFLCLASVW	19	C	L
NEUR1_xenT	DRYLKICHLRYGTWLKRRH	AFIALAVIW	19	C	L
NEUR1_takR	DRYLKICHLRYGTWFKRHH	AFLCLVFTW	19	C	L
NEUR1_oryL	DRYLKICHLRYGTWLKRQH	AFLCLVFVW	19	C	L
NEUR1_pimP	DRYLKICHLRYGTWLKRQH	IFLCLVFVW	19	C	L
NEUR1_danR	DRYLKICHLRYGTWLKRHH	AFLSVVFIW	19	C	L
NEUR1_calM	DRYLKICHLQYGSWLQRRH	VFMSLAFIW	19	C	L
NEUR2_galG	VCCLKICFPAYGNRFRRKH	GQILIACAW	19	C	P
NEUR2_anoC	VCCLKICFPVYGNRFRPGH	GWILIACAW	19	C	P
NEUR2_oncM	VCFVKVCYPLYGNRFNAVH	GRLLIACAW	19	C	P
NEUR2_xenT	VCCLKVCYPAYGNKFSTAH	SRILLLGIW	19	C	P
NEUR2_danR	VCCLKVCFPNYGNKFSSSH	ACVMVIGVW	19	C	P
NEUR2_pimP	VCCLKVCCPNYGNKFSSNH	ACVMVIGVW	19	C	P
NEUR2_tetN	VCCLKVCLPNLGSKFSSSH	ARLLVAGVW	19	C	P
NEUR2_takR	VCCLKVCFPNHGSRFSSSH	ARLLVVGVW	19	C	P
NEUR2_gasA	VCCLKVCFPNHGNRFSSSH	ARLLVVAVW	19	C	P
NEUR2_oryL	VCCLKVCFPNHGNKFSFSH	ARLLVAGVW	19	C	P
TMT_monDom	ERYRTL-TLCPGQGADYQK	ALLAVAGSW	19	-	L
TMT_macEug	ERYRTL-TLCPRQGTDYHK	ALLAVAGSW	19	-	L
TMT_ornAna	ERYRTL-TLHPKQSTDYQK	AVLAVGASW	19	-	L
TMT_galGal	ERYSTL-TLCNKRSDDYRK	ALLAVGGSW	19	-	L
TMT_taeGut	ERYNTL-TLCHKRSDDFRK	ALLAVAGSW	19	-	L
TMT_anoCar	ERYSTL-TQTNKRGSDYQK	ALLGVGGSW	19	-	Q
TMT_xenTro	ERYSTL-TLYNKGGPNFKK	ALLAVASSW	19	-	L
TMT_danRer	ERYCTMMGSTEADATNYKK	VIGGVLMSW	19	M	S
TMT_pimPro	ERYCTMMGATQADSTNYKK	VAMGIAFSW	19	M	A
TMTa_takRu	ERYSTMMTPTEADPSNYCK	VCLGITLSW	19	M	P
TMT_tetNig	ERYSTMMTPTEADSSNYCK	VCLGIGLSW	19	M	P
TMT_gasAcu	ERYSTMVAPTEADSSNYHK	ISLGITLSW	19	V	P
TMT_oryLat	ERYSTMMTPAEADSSNYRK	ISLGIILSW	19	M	P
TMTb_takRu	ERYCTMVSSTIASNRDYRP	VLGGICFSW	19	V	S
TMTa_calMi	DRYITITGTTEADITNYNK	TIVGIALSW	19	T	T
TMT1_plaDu	ERYLAVVRPFDVGNLTNRR	VIAGGVFVW	19	V	P
TMT2_anoGa	ERYCLISRPFSSRNLTRRG	AFLAIFFIW	19	S	P
TMT_triCas	ERYLLIARPFRNNALNFHS	AALSVFSIW	19	A	P
TMT_bomMor	ERYLMVTRPLTSRHLSSKG	AVLSIMFIW	19	T	P
ENCEPH_hom	ERYIRVVHARVINFSW	AWRAITYIW	16	V	A
TMT_aedAe	ERFCLISHPFSSRSLSRRG	AVFAILFIW	19	S	P
TMT_culPi	ERFYLISRPFSSRSLSRRG	ALGAVLLIW	19	S	P
ENCEPH_lox	ERYIRVVHARVINFSW	AWRAITYIW	16	V	A
TMT1_anoGa	ERFCLISRPFAAQNRSKQG	ACLAVLFIW	19	S	P
ENCEPH_can	ERYIRVVHARVINFSW	AWRAITYIW	16	V	A
TMT_triCa	ERYLLIARPFRNNALNFHS	AALSVFSIW	19	A	P
ENCEPH_oto	ERYIRVVHARVINFSW	AWRAITYIW	16	V	A
ENCEPH_mus	ERYIRVVHARVINFSW	AWRAITYIW	16	V	A
ENCEPH_ano	ERYIRVVHARVIDFSW	SWRAITYIW	16	V	A
ENCEPH_gal	ERYIRVVHAKVIDFSW	SWRAITYIW	16	V	A
ENCEPH_mon	ERYNRIVHAKVINFSW	AWRAITYIW	16	V	A
ENCEPH_pte	ERYIRVVQARAIDFSW	AWRTITYIW	16	V	A
ENCEPH_squ	ERYIRVVNATAIDFSW	AWRAITYIW	16	V	A
ENCEPH_xen	ERYARVVYGKYVNSSW	SKRSITFVW	16	V	G
ENCEPH_dan	ERYIRVVHAKVVDFPW	AWRAITHIW	16	V	A
ENCEPH_tak	ERYIRVVHAQVVDFPW	AWRAIGHIW	16	V	A
ENCEPH_gas	ERYIRVVHAQVVDFPW	AWRAIGHIW	16	V	A
ENCEPH_ory	ERYIRVVHAQVVDFPW	AWRAIGHIW	16	V	A
ENCEPH_cal	ERYIRVVNAKATNFPW	AWRAITYTW	16	V	A
ENCEPH_squ	ERYIRVVNATAIDFSW	AWRAITYIW	16	V	A
ENCEPH_pet	ERYARLIKAQVLDFSW	AWRAVTYTW	16	I	A
RGR_homSap	GRYHHYCTRSQLAWNS	AVSLVLFVW	16	C	R
RGR_panTro	GRYHHYCTRSQLAWNS	AISLVLFVW	16	C	R
RGR_gorGor	GRYHHYCTGSTLACKS	AVSLVLSGR	16	C	G
RGR_macMul	GRYHHYCTRSQLAWNS	AISLVLFVW	16	C	R
RGR_ponPyg	GRYHHYCTGSQLAWNS	AISLVLFVW	16	C	G
RGR_calJac	GRYHHYCTGSQLAWNS	AISLVLFVW	16	C	G
RGR_nomLeu	GRYHHYCTGSQLAWNS	AISLVLFVW	16	C	G
RGR_tarSyr	GRYHHYCTGSQLAWNT	AISLVLFVW	16	C	G
RGR_pteVam	GRYHHYCTGSRLAWNT	AVSLVLFVW	16	C	G
RGR_oryCun	GRYHHYCTGSQLAWNT	AVLLVLFVW	16	C	G
RGR_ochPri	GRYHHYCTGSQLAWNT	AVLLVLFVW	16	C	G
RGR_otoGar	GRYHHYCTGRPLAWST	AISLVLFVW	16	C	G
RGR_micMur	GRYHHYCTGSPLAWST	AISLVLFVW	16	C	G
RGR_musMus	GRYHHYCTGRQLAWDT	AIPLVLFVW	16	C	G
RGR_ratNor	GRYHHYCTGRQLAWDT	AIPLVLFVW	16	C	G
RGR_cavPor	GRHQQCCTRGRLTWST	AVPLVLFVW	16	C	R
RGR_speTri	GRYHHYCTGSQLAWNT	AIPLVLFVW	16	C	G
RGR_sorAra	GRYHHYCTGRQLAWDV	AIALVIFVW	16	C	G
RGR_myoLuc	GRYHHYCTGSRLAWRT	AASLVLFVW	16	C	G
RGR_canFam	GRYHHYCTRGQLAWNT	AISLVLCVW	16	C	R
RGR_felCat	GRYHHYCSGSQLAWNT	AISLVICVW	16	C	G
RGR_bosTau	GRYHHFCTGSRLDWNT	AVSLVFFVW	16	C	G
RGR_turTru	GRYHHYCTGSRLDWNT	AVSLVFFVW	16	C	G
RGR_susScr	GRYHHYCTRSRLDWNT	AVSLVFFVW	16	C	R
RGR_equCab	GRYHHYCTRSRLAWNT	AVFLVFFVW	16	C	R
RGR_eriEur	GRYHHHCTRSRLAWNT	AVFLVFFVW	16	C	R
RGR_dipOrd	GRCHHHCTGSLLGWDT	AVSLVIFVW	16	C	G
RGR_loxAfr	ERYHHYCTRSRLAWSS	ASALVLFVW	16	C	R
RGR_proCap	ERYHHYCTGSKLAWSS	AGALVLFMW	16	C	G
RGR_echTel	ERYHHYCTGSQFTWSS	ASTLVLFMW	16	C	G
RGR_dasNov	ERCHRHCIGRRLAWST	AGCLVLCLW	16	C	G
RGR_choHof	ERYRHHCTGSQLSWST	AGSLVLCVW	16	C	G
RGR_ornAna	DRYLRHCSRSKPQWGT	AVSTVLFAW	16	C	R
RGR_anoCar	DRHHQYCTGNKLQWGS	VIPMTIFLW	16	C	G
RGR_galGal	DRYHHYCTRSKLQWST	AISMMVFAW	16	C	R
RGR_taeGut	DRYHHYCTRSRLQWST	AVSMMVFAW	16	C	R
RGR_xenTro	DRYHQYCTRSKLHWST	AVSVVFFIW	16	C	R
RGR_xenLae	DRYHQYCTRSKLHWGT	AVSMVLFVW	16	C	R
RGR1_gasAc	DRYHQYCTRTKLQWSS	AITLAVFVW	16	C	R
RGR1_takRu	DRYHQYCTRTKLQWSS	AITLAVFIW	16	C	R
RGR1_tetNi	DRYHQYCTRTKLQWSS	AITLAVFIW	16	C	R
RGR1_pimPr	DRYHQYCTRTKLQWSS	AITLVIFIW	16	C	R
RGR1_osmMo	DRYHQYCTRTKLQWSS	AITLVMFIW	16	C	R
RGR1_gadMo	DRYHQYCTRTELQWSS	AVTLSVFIW	16	C	R
RGR1_danRe	DRYHQYCTRTKLQWSS	AITLVLFTW	16	C	R
RGR1_oryLa	DRYHQYCTRTKLQWST	AITLAVLVW	16	C	R
RGR_calMil	DRYHQNCSRSRLQWSS	AITVTVFIW	16	C	R
RGR2_gasAc	DRYHQYCTRQKLFWST	TLTMSAIIW	16	C	R
RGR2_tetNi	DRYHQYCTRQKLFWST	TLTMSSIIW	16	C	R
RGR2_oryLa	DRYHQYCTRQKLFWST	SITISLIIW	16	C	R
RGR2_danRe	DRYHQYCTKQKMFWST	SITISCLIW	16	C	K
RGR2_pimPr	DRYHLYCTKQKMFWST	SGTISALIW	16	C	K
RGR2_gadMo	DRYHQYCTRQKLFWST	TVTMCCIVW	16	C	R
RGR2_hipHi	DRYHQYCTRQKLFWST	TLTMSGIIW	16	C	R
RGR2_oncMy	DRYHQYVTNQKLFWST	AWTISIIIW	16	V	N
RGR2_esoLu	DRYHQYVTNQKLFWST	AWTFSIIIW	16	V	N
RGR2_poeRe	DRYHQYCTRQKLFWST	TLTMSGIIW	16	C	R
MEL1_homSa	DRYLVITRPLATFGVASKRR	AAFVLLGVW	20	T	P
MEL1_panTr	DRYLVITRPLATFGVASKRR	AAFVLLGVW	20	T	P
MEL1_gorGo	DRYLVITRPLATFGVASKRR	AAFVLLGVW	20	T	P
MEL1_ponAb	DRYLVITRPLATIGVASKRR	AAFVLLGVW	20	T	P
MEL1_rheMa	DRYLVITRPLATIGVASKRR	AAFVLLGVW	20	T	P
MEL1_calJa	DRYLVITRPLATIGVASTKR	AAFVLLGVW	20	T	P
MEL1_micMu	DRYLVITRPLASVGTASKRR	AGLVLLGVW	20	T	P
MEL1_otoGa	DRYLVITRPLTTVGVASKRR	AALVLLGVW	20	T	P
MEL1_musMu	DRYLVITRPLATIGRGSKRR	TALVLLGVW	20	T	P
MEL1_ratNo	DRYLVITRPLATIGMRSKRR	TALVLLGVW	20	T	P
MEL1_nanEh	DRYLVITRPLATIGVASKRR	TALVLLGVW	20	T	P
MEL1_phoSu	DRYLVITRPLATIGMGSKRR	TALVLLGIW	20	T	P
MEL1_dipOr	DRYLVITRPLATIGVTSKRR	TAFVLLGVW	20	T	P
MEL1_cavPo	DRYLVITRPLATIGVASKRQ	AALVLLGVW	20	T	P
MEL1_speTr	DRYLVITRPLATIGMASKKR	AAFFLLGVW	20	T	P
MEL1_oryCu	DRYLVITRPLAAVGMVSKKR	AGLVLLGVW	20	T	P
MEL1_ochPr	DRYLVITRPLAAVGMVSKRR	TGLVLLGVW	20	T	P
MEL1_bosTa	DRYLVITRPLATVGMVSKRR	AALVLLGVW	20	T	P
MEL1_turTr	DRYLVITRPLATVGMVSKRR	AALVLLGVW	20	T	P
MEL1_susSc	DRYLVITHPLATVGMVSKRR	AALVLLGVW	20	T	P
MEL1_equCa	DRYLVITRPLATVGVVSKRW	AALVLLGIW	20	T	P
MEL1_felCa	DRYLVITHPLATIGVVSKRR	AALVLLGVW	20	T	P
MEL1_canFa	DRYLVITHPLAAVGVVSKRR	AALVLLGVW	20	T	P
MEL1_myoLu	DRYLVITRPLA-IGVVSKRR	AALVLLGVW	20	T	P
MEL1_pteVa	DRYLVITRPLAAIGVVSKRR	AALVLLGVW	20	T	P
MEL1_eriEu	DRYLVITRPLATIGVVSKRR	VALVLLGVW	20	T	P
MEL1_loxAf	DRYLVITRPLATIGVVSKRR	AALVLLGIW	20	T	P
MEL1_proCa	DRYLVITRPLATIGVVSKRR	TALVLLGTW	20	T	P
MEL1_echTe	DRYLVITRPLATIGVVSKRR	AALVLLVIW	20	T	P
MEL1_smiCr	DRYFVITRPLASIGMISKKK	TGLILLGVW	20	T	P
MEL1_monDo	DRYFVITRPLASIGVISKKK	TGFILLGVW	20	T	P
MEL1_ornAn	DRYFVITRPLASIGVISKKR	ALLILTGVW	20	T	P
MEL1_anoCa	DRYFVITRPLASIGAMSTKK	ALLILSGVW	20	T	P
MEL1_taeGu	DRYFVITKPLASVGVTSKKK	ALIILVGVW	20	T	P
MEL1_galGa	DRYFVITKPLASVRVMSKKK	ALIILVGVW	20	T	P
MEL1_xenTr	DRYFVITRPLTSIGVMSKKR	AVLILSGVW	20	T	P
MEL1_danRe	DRYFVITRPLASIGVLSQKR	ALLILLVAW	20	T	P
MEL1_danRe	DRYFVITRPLASIGVMSRKR	ALLILSAAW	20	T	P
MEL1_takRu	DRYFVITRPLTSIGVLSRKR	AFVILMTVW	20	T	P
MEL1_gasAc	DRYFVITRPLTSIGMMSRRR	ALLILMGAW	20	T	P
MEL1_oryLa	DRYFVITRPLTSIGVLSRKR	ALLILSAAW	20	T	P
MEL1_calMi	DRYFVITRPLASIGVLSHRR	AGLIILSLW	20	T	P
MEL1_petMa	DRYLVLTRPLASIGAMSKRR	AMYITAAVW	20	T	P
MEL2_galGa	DRYLVITKPLRSIQWTSKKR	TIQIIAAVW	20	T	P
MEL2_anoCa	DRYCVITKPLQSIKRTSKKR	TCIIIVFVW	20	T	P
MEL2_xenLa	NRYIVITKPLQSIQWSSKKR	TSQIIVLVW	20	T	P
MEL2_danRe	DRYLVITKPLQTIQWNSKRR	TGLAILCIW	20	T	P
MEL2_tetNi	DRYVVITKPLQTIRRSSKRR	TALAILMVW	20	T	P
MEL2_gasAc	DRYLVITKPLQAIHWGSKRR	TTLAILLVW	20	T	P
MEL1_plaDu	DRFYVITNPLGAAQTMTKKR	AFIILTIIW	20	T	P
MEL1_capCa	DRYMVIAKPFYAMKHVSHKR	SLIQIILAW	20	A	P
MEL1_helRo	DRYLVVGQPLAMLNQSHFRR	SFYHVLIIW	20	G	P
MEL1_todPa	DRYNVIGRPMAASKKMSHRR	AFIMIIFVW	20	G	P
TMT_triCys	ERFITIVLPLKRDTILSTKN	IYIGLGILW	20	V	P

Reference collection of structurally determined GPCR

>RHO1_bosTau cow rod rhodopsin
MNGTEGPNFYVPFSNKTGVVRSPFEAPQYYLAEPWQFSMLAAYMFLLIMLGFPINFLTLYVTVQHKKLRTPLNYILLNLAVADLFMVFGGFTTTLYTSLHGYFVFGPTGCNLEGFFATLGGEIALWSLVVLAI
ERYVVVCKPMSNFRFGENHAIMGVAFTWVMALACAAPPLVGWSRYIPEGMQCSCGIDYYTPHEETNNESFVIYMFVVHFIIPLIVIFFCYGQLVFTVKEAAAQQQESATTQKAEKEVTRMVIIMVIAFLICWL
PYAGVAFYIFTHQGSDFGPIFMTIPAFFAKTSAVYNPVIYIMMNKQFRNCMVTTLCCGKNPLGDDEASTTVSKTETSQVAPA*

>MEL1_todPac Todarodes pacificus (squid) Gq X70498 480 11106382 Mollusca 'squid rhodopsin' 3D: May 2008 Cys 337 palmitoylated
MGRDLRDNETWWYNPSIVVHPHWREFDQVPDAVYYSLGIFIGICGIIGCGGNGIVIYLFTKTKSLQTPANMFIINLAFSDFTFSLVNGFPLMTISCFLKKWIFGFAACKVYGFIGGIFGFMSIMTMAMISI
DRYNVIGRPMAASKKMSHRRAFIMIIFVWLWSVLWAIGPIFGWGAYTLEGVLCNCSFDYISRDSTTRSNILCMFILGFFGPILIIFFCYFNIVMSVSNHEKEMAAMAKRLNAKELRKAQAGANAEMRLAKI
SIVIVSQFLLSWSPYAVVALLAQFGPLEWVTPYAAQLPVMFAKASAIHNPMIYSVSHPKFREAISQTFPWVLTCCQFDDKETEDDKDAETEIPAGESSDAAPSADAAQMKEMMAMMQKMQQQQAAYPPQGY
APPPQGYPPQGYPPQGYPPQGYPPQGYPPPPQGAPPQGAPPAAPPQGVDNQAYQA*

>ADRB1_melGal turkey beta 1 adrenergic receptor with stabilising mutations and bound cyanopindolol
MGAELLSQQWEAGMSLLMALVVLLIVAGNVLVIAAIGSTQRLQTLTNLFITSLACADLVVGLLVVPFGATLVVRGTWLWGSFLCELWTSLDVLCVTASIETLCVIAI
DRYLAITSPFRYQSLMTRARAKVIICTVWAISALVSFLPIMMHWWRDEDPQALKCYQDPGCCDFVTNRAYAIASSIISFYIPLLIMIFVALRVYREA
KEQIRKIDRASKRKRVMLMREHKALKTLGIIMGVFTLCWLPFFLVNIVNVFNRDLVPDWLFVAFNWLGYANSAMNPIIYCRSPDFRKAFKRLLAFPRKADRRLHHHHHH*

>ADRB2_homSap beta 2 adrenergic receptor 365 aa  
MGQPGNGSAFLLAPNRSHAPDHDVTQQRDEVWVVGMGIVMSLIVLAIVFGNVLVITAIAKFERLQTVTNYFITSLACADLVMGLAVVPFGAAHILMKMWTFGNFWCEFWTSIDVLCVTASIETLCVIAV
DRYFAITSPFKYQSLLTKNKARVIILMVWIVSGLTSFLPIQMHWYRATHQEAINCYANETCCDFFTNQAYAIASSIVSFYVPLVIMVFVYSRVFQEAKRQLQKIDKSEGRFHVQNLSQVEQDGRTGHGL
RRSSKFCLKEHKALKTLGIIMGTFTLCWLPFFIVNIVHVIQDNLIRKEVYILLNWIGYVNSGFNPLIYCRSPDFRIAFQELLCLRRSSLKAYGNGYSSNGNTGEQSG*

>ADORA2A_homSap adenosine A2A receptor
MPIMGSSVYITVELAIAVLAILGNVLVCWAVWLNSNLQNVTNYFVVSLAAADIAVGVLAIPFAITISTGFCAACHGCLFIACFVLVLTQSSIFSLLAIAI
DRYIAIRIPLRYNGLVTGTRAKGIIAICWVLSFAIGLTPMLGWNNCGQPKEGKNHSQGCGEGQVACLFEDVVPMNYMVYFNFFACVLVPLLLMLGVYLRI
FLAARRQLKQMESQPLPGERARSTLQKEVHAAKSLAIIVGLFALCWLPLHIINCFTFFCPDCSHAPLWLMYLAIVLSHTNSVVNPFIYAYRIREFRQTFR
KIIRSHVLRQQEPFKAAGTSARVLAAHGSDGEQVSLRLNGHPPGVWANGSAPHPERRPNGYALGLVSGGSAQESQGNTGLPDVELLSHELKGVCPEPPGLDDPLAQDGAGVS*

The C2 loop is highly conserved within each orthology class for GPCR with determined structure:

        RHO1 in vertebrates                  MEL1 in vertebrates                    ADRB1 in vertebrates                   ADRB2 orthologs in tetrapods           ADORA2A in tetrapods and teleosts
homSap  ERYVVVCKPMSNFRFGENHAIMGVAFTW  homSa  DRYLVITRPLATFGVASKRRAAFVLLGVW  homSap  DRYLAITSPFRYQSLLTRARARGLVCTVW  homSap  DRYFAITSPFKYQSLLTKNKARVIILMVW  homSap  DRYIAIRIPLRYNGLVTGTRAKGIIAICW
panTro  ERYVVVCKPMSNFRFGENHAIMGVAFTW  panTr  DRYLVITRPLATFGVASKRRAAFVLLGVW  panTro  DRYLAITSPFRYQSLLTRARARGLVCTVW  panTro  DRYFAITSPFKYQSLLTKNKARVIILMVW  panTro  DRYIAIRIPLRYNGLVTGTRAKGIIAICW
gorGor  ERYVVVCKPMSNFRFGENHAIMGVAFTW  gorGo  DRYLVITRPLATFGVASKRRAAFVLLGVW  ponAbe  DRYLAITSPFRYQSLLTRARARGLVCTVW  gorGor  DRYFAITSPFKYQSLLTKNKARVIILMVW  gorGor  DRYIAIRIPLRYNGLVTGTRAKGIIAICW
ponAbe  ERYVVVCKPMSNFRFGENHAIMGVAFTW  ponAb  DRYLVITRPLATIGVASKRRAAFVLLGVW  rheMac  DRYLAITSPFRYQSLLTRARARGLVCTVW  ponAbe  DRYFAITSPFKYQSLLTKNKARVIILMVW  ponAbe  DRYIAIRIPLRYNGLVTGTRAKGIIAICW
rheMac  ERYVVVCKPMSNFRFGENHAIMGVAFTW  rheMa  DRYLVITRPLATIGVASKRRAAFVLLGVW  calJac  DRYLAITSPFRYQSLLTRARARGLVCTVW  rheMac  DRYFAITSPFKYQSLLTKNKARVIILMVW  rheMac  DRYIAIRIPLRYNGLVTGTRAKGIIAICW
calJac  ERYVVVCKPMSNFRFGENHAIMGVAFTW  calJa  DRYLVITRPLATIGVASTKRAAFVLLGVW  micMur  DRYLAITSPFRYQSLLTRARARALVCTVW  calJac  DRYFAITSPFKYQSLLTKNKARVIILMVW  calJac  DRYIAIRIPLRYNGLVTGTRAKGIIAICW
micMur  ERYVVVCKPMSNFRFGENHAIMGVVFTW  micMu  DRYLVITRPLASVGTASKRRAGLVLLGVW  otoGar  DRYLAITSPFRYQSLLTRARARPLVCTVW  micMur  DRYFAITSPFKYQSLLTKNKARVVILMVW  micMur  DRYIAIRIPLRYNGLVTGTRAKGIIAICW
musMus  ERYVVVCKPMSNFRFGENHAIMGVVFTW  otoGa  DRYLVITRPLTTVGVASKRRAALVLLGVW  musMus  DRYLAITSPFRYQSLLTRARARALVCTVW  otoGar  DRYFAITSPFKYQSLLTKNKARVVILMVW  musMus  DRYIAIRIPLRYNGLVTGMRAKGIIAICW
ratNor  ERYVVVCKPMSNFRFGENHAIMGVAFTW  musMu  DRYLVITRPLATIGRGSKRRTALVLLGVW  ratNor  DRYLAITSPFRYQSLLTRARARALVCTVW  tupBel  DRYFAITSPFKYQSLLTKNKARVVILMVW  ratNor  DRYIAIRIPLRYNGLVTGVRAKGIIAICW
cavPor  ERYVVVCKPMSNFRFGENHAIMGVVFTW  ratNo  DRYLVITRPLATIGMRSKRRTALVLLGVW  cavPor  DRYLAITSPFRYQSLLTRARARVLVCTVW  dipOrd  DRYFAITSPFKYQSLLTKNKARVVILMVW  dipOrd  DRYIAIRIPLRYNSLVTCTRAKGIIAICW
speTri  ERYMVVCKPMSNFRFGENHAIMGVIFTW  dipOr  DRYLVITRPLATIGVTSKRRTAFVLLGVW  oryCun  DRYLAITSPFRYQSLLTRARARALVCTVW  cavPor  DRYFAITSPFKYQSLLTKNKARVVILMVW  cavPor  DRYIAIRIPLRYNGLVTCTRAKGIIAICW
oryCun  ERYVVVCKPMSNFRFGENHAIMGVAFTW  cavPo  DRYLVITRPLATIGVASKRQAALVLLGVW  ochPri  DRYLAITSPFRYQSLLTRARARALVCTVW  oryCun  DRYFAITSPFKYQSLLTKNKARVVILMVW  speTri  DRYIAIRIPLRYNGLVTGMRAKGIIAICW
ochPri  ERYVVVCKPMSNFRFGENHAIMGVAFTW  speTr  DRYLVITRPLATIGMASKKRAAFFLLGVW  bosTau  DRYLAITSPFRYQSLLTRARARALVCTVW  ochPri  DRYFAITSPFKYQSLLTKNKARVVVLMVW  oryCun  DRYIAIRIPLRYNGLVTGTRAKGIIAICW
bosTau  ERYVVVCKPMSNFRFGENHAIMGVAFTW  oryCu  DRYLVITRPLAAVGMVSKKRAGLVLLGVW  equCab  DRYLAITSPFRYQSLLTRARARALVCTVW  equCab  DRYFAITSPFKYQSLLTKNKARVVILMVW  ochPri  DRYIAIRIPLRYNGLVTGSRAKGIIAICW
equCab  ERYVVVCKPMSNFRFGENHAIMGVAFTW  ochPr  DRYLVITRPLAAVGMVSKRRTGLVLLGVW  felCat  DRYLAITSPFRYQSLLTRARARALVCTVW  felCat  DRYFAITSPFKYQSLLTKNKARVVILMVW  turTru  DRYIAIRIPLRYNGLVTGTRAKGIIAVCW
felCat  ERYVVVCKPMSNFRFGENHAIMGVAFTW  bosTa  DRYLVITRPLATVGMVSKRRAALVLLGVW  canFam  DRYLAITAPFRYQSLLTRARARALVCTVW  canFam  DRYFAITSPFKYQSLLTKNKARVVILMVW  bosTau  DRYIAIRIPLRYNGLVTGTRAKGIIAVCW
canFam  ERYVVVCKPMSNFRFGENHAIMGVAFTW  turTr  DRYLVITRPLATVGMVSKRRAALVLLGVW  myoLuc  DRYLAITSPFRYQSLLTRARARALVCTVW  myoLuc  DRYFAITSPFKYQSLLTKNKARVVILLVW  canFam  DRYIAIRIPLRYNGLVTGTRAKGIIAVCW
myoLuc  ERYVVVCKPMSNFRFGENHAIMGLAFTW  equCa  DRYLVITRPLATVGVVSKRWAALVLLGIW  pteVam  DRYLAITSPFRYQSLLTRARARALVCTVW  pteVam  DRYFAITSPFKYQSLLTKNKARVVILMVW  myoLuc  DRYIAIRIPLRYNGLVTGARAKGIIAICW
pteVam  ERYVVVCKPMSNFRFGENHAIMGLALTW  felCa  DRYLVITHPLATIGVVSKRRAALVLLGVW  echTel  DRYLAITSPFRYQSLLTRARARVLVCTVW  eriEur  DRYFAITSPFKYQSLLTKNKARVVILMVW  eriEur  DRYIAIRIPLRYNGLVTGQRAKGIIAVCW
eriEur  ERYVVVCKPMSNFRFGENHAIMGVAFTW  canFa  DRYLVITHPLAAVGVVSKRRAALVLLGVW  choHof  DRYLAITSPFRYQSLLTRARARALVCTVW  sorAra  DRYFAITSPFKYQSLLTKNKARGVILMVW  loxAfr  DRYIAIRIPLRYNGLVTGTRAKGIIAVCW
dasNov  ERYVVVCKPMSNFRFGENHAVMGVAFTW  myoLu  DRYLVITRPLA-IGVVSKRRAALVLLGVW  monDom  DRYIAITSPFRYQSLLTRARARALVCTVW  proCap  DRYFAITSPFKYQSLLTKNKARVVILMVW  proCap  DRYIAIRIPLRYNGLVTGTRAKGIIAVCW
monDom  ERYVVVCKPMSNFRFGENHAIIGVAFTW  pteVa  DRYLVITRPLAAIGVVSKRRAALVLLGVW  ornAna  DRYIAITSPFRYRSLLTRARARGLVCGVW  echTel  DRYFAITSPFKYQSLLTKNKARVVILMVW  galGal  DRIIAIRIPLRYNGLVTGSRAKGIIAICW
ornAna  ERYIVVCKPMSNFRFGENHAIMGVAFTW  eriEu  DRYLVITRPLATIGVVSKRRVALVLLGVW  galGal  DRYLAITSPFRYQSLMTRARAKGIICTVW  dasNov  DRYFAITSPFKYQSLLTKNKARVVILMVW  taeGut  DRIIAIRIPLRYNGLVTGSRAKGIIAICW
galGal  ERYVVVCKPMSNFRFGENHAIMGVAFSW  loxAf  DRYLVITRPLATIGVVSKRRAALVLLGIW  taeGut  DRYLAITSPFRYQSLMTKGRAKGIICTVW  monDom  DRYFAITAPFRYQSMLTKGKARVVILVVW  xenTro  DRYIAIRIPLRYNSLVTSRRANAIIAVCW
taeGut  ERYVVVCKPMSNFRFGENHAIMGVAFSW  proCa  DRYLVITRPLATIGVVSKRRTALVLLGTW  anoCar  DRYLAITSPFRYQSLMTKKRAKIIVCVVW  galGal  DRYFAITSPFKYQSLLTKSKARVVILVVW  tetNig  DRYIAIKLPLRYNGLVTGQRAQAIIAICW
anoCar  ERYVVICKPMSNFRFGETHALIGVSCTW  echTe  DRYLVITRPLATIGVVSKRRAALVLLVIW  xenTro  DRYIAITSPLKYEMLVTKVRARLTVCLVW  taeGut  DRYFAITSPFKYQSLLTKGKARVVILVVW  fugRub  DRYIAIKLPLRYNSLVTGKRAQGIIAICW
xenTro  ERYVVVCKPMANFRFGENHAIMGVVFTW  monDo  DRYFVITRPLASIGVISKKKTGFILLGVW  tetNig  DRYVAITSPFRYQSLLTKARARAMVCAVW  anoCar  DRYFAITSPFKYQSHLTKNKARVIILLVW  gasAcu  DRYIAIKIPLRYNGLVTGQRAQGIIAICW
tetNig  ERYIVVCKPVTNFRFGEKHAIAGLAFTW  ornAn  DRYFVITRPLASIGVISKKRALLILTGVW  fugRub  DRYVAITSPFRYQSLLTKARAKAMVCAVW  xenTro  DRYFAITSPFRYQSLLTKCKARIVILLVW  oryLat  DRYIAIKIPLRYNSLVTSQRARGIIAICW
fugRub  ERYIVVCKPMTNFRFGEKHAIAGLVFTW  anoCa  DRYFVITRPLASIGAMSTKKALLILSGVW  gasAcu  DRYVAITSPFRYQSLLTKARARTVVCVVW                                         danRer  DRYIAIKIPLRYNSLVTGQRARGIIAICW
gasAcu  ERYVVVCKPMSNFRFGEKHAIAGLLFTW  galGa  DRYFVITKPLASVRVMSKKKALIILVGVW  oryLat  DRYVAITSPFRYQSLLTKSRAKAVVCVVW    
oryLat  ERYVVVCKPMTNFRFEEKHAIAGLAFSW  xenTr  DRYFVITRPLTSIGVMSKKRAVLILSGVW  danRer  DRYIAIISPFRYQSLLTKARAKVVVCAVW    
danRer  ERWMVVCKPVSNFRFGENHAIMGVAFTW  danRe  DRYFVITRPLASIGVLSQKRALLILLVAW  petMar  DRYIAVARPLRYETLMNKRRARFIIVAVW    
petMar  ERYIVICKPMGNFRFGSTHAYMGVAFTW  takRu  DRYFVITRPLTSIGVLSRKRAFVILMTVW      
                                      gasAc  DRYFVITRPLTSIGMMSRRRALLILMGAW      
                                      oryLa  DRYFVITRPLTSIGVLSRKRALLILSAAW      
                                      calMi  DRYFVITRPLASIGVLSHRRAGLIILSLW      
                                      petMa  DRYLVLTRPLASIGAMSKRRAMYITAAVW
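
The degree of conservation within an orthology class can be quantified column by column. The following is a minimal sketch under stated assumptions: the sequences are taken as already aligned and of equal length, the helper name column_conservation is hypothetical, and only four RHO1 rows from the table above are used for illustration.

```python
from collections import Counter


def column_conservation(seqs):
    """Return (most common residue, its frequency) for each alignment column."""
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("sequences must be pre-aligned to equal length")
    profile = []
    for column in zip(*seqs):
        residue, count = Counter(column).most_common(1)[0]
        profile.append((residue, count / len(seqs)))
    return profile


# Four RHO1 rows copied from the table above (homSap, ornAna, danRer, petMar).
rho1 = [
    "ERYVVVCKPMSNFRFGENHAIMGVAFTW",
    "ERYIVVCKPMSNFRFGENHAIMGVAFTW",
    "ERWMVVCKPVSNFRFGENHAIMGVAFTW",
    "ERYIVICKPMGNFRFGSTHAYMGVAFTW",
]
profile = column_conservation(rho1)
invariant = sum(1 for _, frac in profile if frac == 1.0)
print(f"{invariant} of {len(profile)} columns are invariant")
```

Running the same function on each column of the table separately gives a per-class profile; comparing profiles between classes is left to the reader, since gapped rows would first need a consistent gap treatment.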