Opsin evolution: Cytoplasmic face: Difference between revisions

KIIRSHVLRQQEPFKAAGTSARVLAAHGSDGEQVSLRLNGHPPGVWANGSAPHPERRPNGYALGLVSGGSAQESQGNTGLPDVELLSHELKGVCPEPPGLDDPLAQDGAGVS*


        ADRB2 orthologs in tetrapods            ADORA2A in teleosts
homSap  DRYFAITSPFKYQSLLTKNKARVIILMVW   homSap  DRYIAIRIPLRYNGLVTGTRAKGIIAICW
panTro  DRYFAITSPFKYQSLLTKNKARVIILMVW   panTro  DRYIAIRIPLRYNGLVTGTRAKGIIAICW
gorGor  DRYFAITSPFKYQSLLTKNKARVIILMVW   gorGor  DRYIAIRIPLRYNGLVTGTRAKGIIAICW
ponAbe  DRYFAITSPFKYQSLLTKNKARVIILMVW   ponAbe  DRYIAIRIPLRYNGLVTGTRAKGIIAICW
rheMac  DRYFAITSPFKYQSLLTKNKARVIILMVW   rheMac  DRYIAIRIPLRYNGLVTGTRAKGIIAICW
calJac  DRYFAITSPFKYQSLLTKNKARVIILMVW   calJac  DRYIAIRIPLRYNGLVTGTRAKGIIAICW
micMur  DRYFAITSPFKYQSLLTKNKARVVILMVW   micMur  DRYIAIRIPLRYNGLVTGTRAKGIIAICW
otoGar  DRYFAITSPFKYQSLLTKNKARVVILMVW   musMus  DRYIAIRIPLRYNGLVTGMRAKGIIAICW
tupBel  DRYFAITSPFKYQSLLTKNKARVVILMVW   ratNor  DRYIAIRIPLRYNGLVTGVRAKGIIAICW
dipOrd  DRYFAITSPFKYQSLLTKNKARVVILMVW   dipOrd  DRYIAIRIPLRYNSLVTCTRAKGIIAICW
cavPor  DRYFAITSPFKYQSLLTKNKARVVILMVW   cavPor  DRYIAIRIPLRYNGLVTCTRAKGIIAICW
oryCun  DRYFAITSPFKYQSLLTKNKARVVILMVW   speTri  DRYIAIRIPLRYNGLVTGMRAKGIIAICW
ochPri  DRYFAITSPFKYQSLLTKNKARVVVLMVW   oryCun  DRYIAIRIPLRYNGLVTGTRAKGIIAICW
equCab  DRYFAITSPFKYQSLLTKNKARVVILMVW   ochPri  DRYIAIRIPLRYNGLVTGSRAKGIIAICW
felCat  DRYFAITSPFKYQSLLTKNKARVVILMVW   turTru  DRYIAIRIPLRYNGLVTGTRAKGIIAVCW
canFam  DRYFAITSPFKYQSLLTKNKARVVILMVW   bosTau  DRYIAIRIPLRYNGLVTGTRAKGIIAVCW
myoLuc  DRYFAITSPFKYQSLLTKNKARVVILLVW   canFam  DRYIAIRIPLRYNGLVTGTRAKGIIAVCW
pteVam  DRYFAITSPFKYQSLLTKNKARVVILMVW   myoLuc  DRYIAIRIPLRYNGLVTGARAKGIIAICW
eriEur  DRYFAITSPFKYQSLLTKNKARVVILMVW   eriEur  DRYIAIRIPLRYNGLVTGQRAKGIIAVCW
sorAra  DRYFAITSPFKYQSLLTKNKARGVILMVW   loxAfr  DRYIAIRIPLRYNGLVTGTRAKGIIAVCW
proCap  DRYFAITSPFKYQSLLTKNKARVVILMVW   proCap  DRYIAIRIPLRYNGLVTGTRAKGIIAVCW
echTel  DRYFAITSPFKYQSLLTKNKARVVILMVW   galGal  DRIIAIRIPLRYNGLVTGSRAKGIIAICW
dasNov  DRYFAITSPFKYQSLLTKNKARVVILMVW   taeGut  DRIIAIRIPLRYNGLVTGSRAKGIIAICW
monDom  DRYFAITAPFRYQSMLTKGKARVVILVVW   xenTro  DRYIAIRIPLRYNSLVTSRRANAIIAVCW
galGal  DRYFAITSPFKYQSLLTKSKARVVILVVW   tetNig  DRYIAIKLPLRYNGLVTGQRAQAIIAICW
taeGut  DRYFAITSPFKYQSLLTKGKARVVILVVW   tetRub  DRYIAIKLPLRYNSLVTGKRAQGIIAICW
anoCar  DRYFAITSPFKYQSHLTKNKARVIILLVW   gasAcu  DRYIAIKIPLRYNGLVTGQRAQGIIAICW
xenTro  DRYFAITSPFRYQSLLTKCKARIVILLVW   oryLat  DRYIAIKIPLRYNSLVTSQRARGIIAICW
                                        danRer  DRYIAIKIPLRYNSLVTGQRARGIIAICW
</pre>


[[Category:Comparative Genomics]]

Revision as of 12:11, 27 January 2009

Comparative genomics of the cytoplasmic face of GPCR proteins

The cytoplasmic 'face' of an opsin (or any GPCR) is presumably responsible for all interactions with downstream signal-relaying partners, because these partners are cytoplasmic proteins with no access to the extracellular loops or transmembrane segments. Photoisomerization of the ligand, and its release from the Schiff base deep within the transmembrane region, must therefore drive a significant conformational change in the cytoplasmic face, one that differentiates the inactive from the active state.

The cytoplasmic face comprises three loops and the carboxy terminus. For bioinformatic purposes, it is convenient to 'reorganize' each linear protein sequence into its intracellular, membrane and extracellular regions for separate consideration. This is done below for the cytoplasmic face of 500 curated opsins covering each of the 18 vertebrate opsin orthology classes, using multiple representatives for each phylogenetic node and intense bracketing at eras of change (e.g. between the DRY and GRY opsins of the RGR class). A range of non-opsin GPCR is included to define properties common to all members of this large gene family (that is, not specific to opsins).
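The 'reorganization' described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `cytoplasmic_face`, the toy sequence and its segment boundaries are all hypothetical, and real boundaries would have to come from crystallography or a transmembrane prediction tool. The sketch assumes an extracellular amino terminus, so that the loops following helices 1, 3 and 5 are the cytoplasmic ones.

```python
from typing import Dict, List, Tuple


def cytoplasmic_face(seq: str, tm_segments: List[Tuple[int, int]]) -> Dict[str, str]:
    """Slice a GPCR sequence into loops C1, C2, C3 and the carboxy tail.

    tm_segments holds seven (start, end) pairs, 0-based and end-exclusive.
    Assumption: the amino terminus is extracellular, so the loops after
    helices 1, 3 and 5 face the cytoplasm.
    """
    if len(tm_segments) != 7:
        raise ValueError("expected seven transmembrane segments")
    return {
        "C1": seq[tm_segments[0][1]:tm_segments[1][0]],
        "C2": seq[tm_segments[2][1]:tm_segments[3][0]],
        "C3": seq[tm_segments[4][1]:tm_segments[5][0]],
        "Cterm": seq[tm_segments[6][1]:],
    }


# Hypothetical toy protein: N-term, TM1, C1, TM2, E1, TM3, C2, TM4, E2,
# TM5, C3, TM6, E3, TM7, C-term. Helices sit at the odd indices.
parts = ["MNG", "AAAAA", "KK", "LLLLL", "EE", "VVVVV", "DRY", "IIIII",
         "TT", "FFFFF", "RRR", "GGGGG", "SS", "WWWWW", "QQQQ"]
toy_seq = "".join(parts)

tm, pos = [], 0
for i, part in enumerate(parts):
    if i % 2 == 1:  # odd indices are the seven helices
        tm.append((pos, pos + len(part)))
    pos += len(part)

face = cytoplasmic_face(toy_seq, tm)
print(face)
```

Keeping the boundaries as explicit input, rather than predicting them inside the function, makes it easy to compare boundaries taken from different sources for the same sequence.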

The two critical goals in GPCR research are to determine the natural ligands, notably for orphan receptors (which largely concerns the extracellular and transmembrane regions), and to determine the specific Galpha signaling partner of each receptor among the 16 such paralogs in the vertebrate genome. For the 18 orthology classes of vertebrate opsins, the ligand is already known (11-cis retinal or a related compound) but the signaling partner generally is not. As an example, does RGR opsin signal at all, and if so to what purpose? And what is the meaning of the abrupt shift of its DRY motif to GRY in boreoeutheres?

           DRY loop motif       transmemb L  7 9 signaling
ENCEPH_hom ERYIRVVHARVINFSW     AWRAITYIW 16 V A G?
RGR_homSap GRYHHYCTRSQLAWNS     AVSLVLFVW 16 C R G?
RGR2_gasAc DRYHQYCTRQKLFWST     TLTMSAIIW 16 C R G?
RHO1_homSa ERYVVVCKPMSNFRFGENH  AIMGVAFTW 19 C P GNAT1
RHO2_galGa ERYIVVCKPMGNFRFSATH  AMMGIAFTW 19 C P GNAT2
SWS2_ornAn ERFLVICKPLGNLSFRGTH  AIFGCAATW 19 C P GNAT2
PIN_galGal ERYVVVCRPLGDFQFQRRH  AVSGCAFTW 19 C P G?
SWS1_homSa ERYIVICKPFGNFRFSSKH  ALTVVLATW 19 C P GNAT2
LWS_homSap ERWMVVCKPFGNVRFDAKL  AIVGIAFSW 19 C P GNAT2
VAOP_galGa ERYIVICRPVGNMRLRGKH  AAQGIAFVW 19 C P Gt
PARIE_utaS ERYNVVCQPLGTLQMSTKR  GYQLLGFIW 19 C P Gd+Go
PPIN_xenTr DRVFVVCKPMGTLTFTPKQ  ALAGIAASW 19 C P Gt
PER_homSap DRYLTICLPDVGRRMTTNT  YIGLILGAW 19 C P Go
NEUR1_homS DRYLKICYLSYGVWLKRKH  AYICLAAIW 19 C L G?
NEUR2_galG VCCLKICFPAYGNRFRRKH  GQILIACAW 19 C P G?
TMT_monDom ERYRTL-TLCPGQGADYQK  ALLAVAGSW 19 - L G?
MEL1_homSa DRYLVITRPLATFGVASKRR AAFVLLGVW 20 T P Gq
MEL2_anoCa DRYCVITKPLQSIKRTSKKR TCIIIVFVW 20 T P Gq
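As a small illustration of how the motif column of this table can be summarized, the sketch below tallies the first three loop residues (the DRY position) for a handful of rows copied from the table. The dictionary `rows` and the helper `motif` are hypothetical names introduced here, not part of any existing tool.

```python
from collections import Counter

# C2 loop sequences copied from the table above (a subset of its rows).
rows = {
    "ENCEPH_hom": "ERYIRVVHARVINFSW",
    "RGR_homSap": "GRYHHYCTRSQLAWNS",
    "RGR2_gasAc": "DRYHQYCTRQKLFWST",
    "RHO1_homSa": "ERYVVVCKPMSNFRFGENH",
    "PER_homSap": "DRYLTICLPDVGRRMTTNT",
    "NEUR2_galG": "VCCLKICFPAYGNRFRRKH",
    "MEL1_homSa": "DRYLVITRPLATFGVASKRR",
}


def motif(loop: str) -> str:
    """Return the three residues occupying the DRY position of a C2 loop."""
    return loop[:3]


counts = Counter(motif(loop) for loop in rows.values())
for variant, n in counts.most_common():
    print(variant, n)
```

On the full collection the same tally, grouped by orthology class, would show directly where a motif shift such as DRY to GRY occurs.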

While it might seem straightforward to thread any opsin onto its best fit among the five newly available crystallographic structures, this does not work for distantly related paralogs beyond the universal 7-transmembrane feature: loop regions can differ greatly in length and have diverged so far in amino acid sequence that they lack discernible alignability (even though they are all ultimately homologous).

While these structures entail various compromises (such as replacement of C3 by lysozyme and deletion of the carboxy tail to enable stable crystallization), they are hugely important for annotation transfer of sequence/function relationships via comparative genomics. Yet most of the 18 vertebrate opsin orthology classes have only remote models to date, and even these can be indeterminate for mid-loop C2 residues (indicative of a flexible conformation).

Gene           PDB            Protein                     PubMed      Best human opsin   Next Best         Signaling

RHO1_bosTau    1JFP 3C9M 2J4Y bovine rod rhodopsin        17825322  RHO1_homSap 93%   SWS1_homSap   45%  Gt GNAT1 lowers cGMP
MEL1_todPac    2Z73 2ZIY      squid melanopsin            18480818  MEL1_homSap 43%   PER1_homSap   30%  Gq GNAQ? inositol trisphosphate
ADORA2A_homSap 3EML           adenosine receptor 2A       18832607  MEL1_homSap 27%   ENCEPH_homSap 27%  Gs GNAS  raises cAMP
ADRB1_melGal   2VT4           beta 1 adrenergic receptor  18594507  MEL1_homSap 29%   ENCEPH_homSap 25%  Gs GNAS  raises cAMP
ADRB2_homSap   2R4R           beta 2 adrenergic receptor  17962520  MEL1_homSap 28%   PER1_homSap   29%  Gs GNAS  raises cAMP

It has not proven feasible to predict loop conformations ab initio or from peptide libraries. It is folly to consider an individual loop structure in isolation (rather than the cytoplasmic face in its entirety) or to fail to specify the activation state being computed. Any predicted structure, and any special role proposed for an individual residue, must be consistent with the comparative genomics of close and even distant orthologs, because binding relationships to Galpha and other proteins do not change rapidly in evolutionary time (as seen from heterologous substitution experiments). Even when a cytoplasmic loop seems to lack a definable structure, individual residues can be conserved over vast branch lengths. That conservation must ultimately be explained.

OpsinCyto3D.jpg

Two new high-resolution structures of squid melanopsin establish that the cytoplasmic face is not structurally homologous as a whole across paralogous opsin classes. We knew this already from comparative genomics alone, but not specifically why. The X-ray structure exhibits unprecedented rigid extensions of transmembrane helices 5 and 6, of order 25 angstroms, out into the cytoplasm, greatly constraining the intermediate residues of cytoplasmic loop C3. The proximal carboxy terminus also contributes importantly to the overall structure here.

The squid melanopsin structure, used as a template at SwissModel, could readily predict the structure of the cytoplasmic face of all opsins of the melanopsin class, of which 48 vertebrate, 9 lophotrochozoan, 43 arthropod, and 1 cnidarian sequences are available here. Gq is expected to be the signaling partner throughout these melanopsins, yet which features of the cytoplasmic face the Galpha protein specifically recognizes remains obscure. It cannot really be the helical extensions per se, because the Gq protein is still structurally homologous to its 15 paralogs (in vertebrates) of different signaling types.

The second cytoplasmic loop

In squid melanopsin, the first six residues of cytoplasmic loop C2 also form an extensional helix, beginning with the DRY motif and, surprisingly, terminating three residues before the deeply conserved proline (normally a helix breaker, as in adrenergic receptors). This proline alone cannot define the two states through its cis and trans configurations, because glycine or leucine can also characterize whole opsin orthology classes at this position. The last three residues of loop C2 (HRR, of basic character) also preface a transmembrane helix, as RAR does in the turkey adrenergic receptor.

Cytoplasmic loop C2 has a conserved length of 16-20 residues in all opsins, with much more rigid constraint within individual opsin classes (e.g. all vertebrate imaging opsins have length 19). The structure of the C2 loop of over 100 melanopsins can readily be modeled on its closest match among the determined structures, currently squid melanopsin or bovine rhodopsin, with the adenosine and adrenergic receptors serving as a 'structural outgroup'.

On the basis of length (19 matching rhodopsin, 20 matching melanopsin), all the opsins except encephalopsin and RGR (both 16 residues) and TMT (18 residues subsequent to a deletion in the amniote stem) have a structural model. This model is further constrained by predictable helical extensions of transmembrane helices into the cytoplasm, leaving only the mid-loop region to be predicted. It is not clear whether the observed residue conservation, both within and across orthology classes, derives from structural importance or instead from Galpha binding specificity requirements.
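The length rule just described can be written down as a minimal sketch. The function name `c2_template` is hypothetical, and counting only ungapped residues is an assumption made here so that the TMT loop with its deletion counts as 18; length alone is of course a crude criterion and says nothing about mid-loop conformation.

```python
from typing import Optional

# Template structures named in the text: bovine rhodopsin for loops of
# length 19, squid melanopsin for loops of length 20.
TEMPLATE_BY_LENGTH = {19: "RHO1_bosTau", 20: "MEL1_todPac"}


def c2_template(loop: str) -> Optional[str]:
    """Pick a structural template for a C2 loop from its ungapped length.

    Returns None when no determined structure has a matching loop length.
    Assumption: alignment gaps ('-') are not counted as residues.
    """
    ungapped = loop.replace("-", "")
    return TEMPLATE_BY_LENGTH.get(len(ungapped))


examples = {
    "RHO1_homSa": "ERYVVVCKPMSNFRFGENH",
    "MEL1_homSa": "DRYLVITRPLATFGVASKRR",
    "ENCEPH_hom": "ERYIRVVHARVINFSW",
    "RGR_homSap": "GRYHHYCTRSQLAWNS",
    "TMT_monDom": "ERYRTL-TLCPGQGADYQK",
}
assignments = {name: c2_template(loop) for name, loop in examples.items()}
for name, template in assignments.items():
    print(name, template or "no model")
```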

The adenosine and adrenergic receptor structures, however useful they might be for annotation transfer to the other 350 non-odorant human GPCR, ultimately will not prove helpful in modeling the second cytoplasmic loop of opsins (squid melanopsin does that better already). Note that C2 in these three structures is consistently stabilized by a mid-loop hydrogen bond to the DRY residues. This constraint is not observed in squid melanopsin or in other metazoan opsin classes; indeed it is not feasible there, because no hydrogen bond-capable residue consistently occurs at that position (in the comparative genomics sense of a conserved residue). This mid-loop bridge might therefore be a derived feature that arose fairly early in the stem of non-opsin GPCR.

OpsinCyto2Five.jpg

MelSecStr.jpg


(to be continued)


Reference sequence collection

Cytoplasmic loop C2 from 101 melanopsins

species    helix bridge area  hel transmemb Le 7 9
MEL1_homSa DRYLV ITRPLATFGVAS KRR AAFVLLGVW 20 T P
MEL1_panTr DRYLV ITRPLATFGVAS KRR AAFVLLGVW 20 T P
MEL1_gorGo DRYLV ITRPLATFGVAS KRR AAFVLLGVW 20 T P
MEL1_ponAb DRYLV ITRPLATIGVAS KRR AAFVLLGVW 20 T P
MEL1_rheMa DRYLV ITRPLATIGVAS KRR AAFVLLGVW 20 T P
MEL1_calJa DRYLV ITRPLATIGVAS TKR AAFVLLGVW 20 T P
MEL1_micMu DRYLV ITRPLASVGTAS KRR AGLVLLGVW 20 T P
MEL1_otoGa DRYLV ITRPLTTVGVAS KRR AALVLLGVW 20 T P
MEL1_musMu DRYLV ITRPLATIGRGS KRR TALVLLGVW 20 T P
MEL1_ratNo DRYLV ITRPLATIGMRS KRR TALVLLGVW 20 T P
MEL1_nanEh DRYLV ITRPLATIGVAS KRR TALVLLGVW 20 T P
MEL1_phoSu DRYLV ITRPLATIGMGS KRR TALVLLGIW 20 T P
MEL1_dipOr DRYLV ITRPLATIGVTS KRR TAFVLLGVW 20 T P
MEL1_cavPo DRYLV ITRPLATIGVAS KRQ AALVLLGVW 20 T P
MEL1_speTr DRYLV ITRPLATIGMAS KKR AAFFLLGVW 20 T P
MEL1_oryCu DRYLV ITRPLAAVGMVS KKR AGLVLLGVW 20 T P
MEL1_ochPr DRYLV ITRPLAAVGMVS KRR TGLVLLGVW 20 T P
MEL1_bosTa DRYLV ITRPLATVGMVS KRR AALVLLGVW 20 T P
MEL1_turTr DRYLV ITRPLATVGMVS KRR AALVLLGVW 20 T P
MEL1_susSc DRYLV ITHPLATVGMVS KRR AALVLLGVW 20 T P
MEL1_equCa DRYLV ITRPLATVGVVS KRW AALVLLGIW 20 T P
MEL1_felCa DRYLV ITHPLATIGVVS KRR AALVLLGVW 20 T P
MEL1_canFa DRYLV ITHPLAAVGVVS KRR AALVLLGVW 20 T P
MEL1_myoLu DRYLV ITRPLA-IGVVS KRR AALVLLGVW 19 T P
MEL1_pteVa DRYLV ITRPLAAIGVVS KRR AALVLLGVW 20 T P
MEL1_eriEu DRYLV ITRPLATIGVVS KRR VALVLLGVW 20 T P
MEL1_loxAf DRYLV ITRPLATIGVVS KRR AALVLLGIW 20 T P
MEL1_proCa DRYLV ITRPLATIGVVS KRR TALVLLGTW 20 T P
MEL1_echTe DRYLV ITRPLATIGVVS KRR AALVLLVIW 20 T P
MEL1_smiCr DRYFV ITRPLASIGMIS KKK TGLILLGVW 20 T P
MEL1_monDo DRYFV ITRPLASIGVIS KKK TGFILLGVW 20 T P
MEL1_ornAn DRYFV ITRPLASIGVIS KKR ALLILTGVW 20 T P
MEL1_anoCa DRYFV ITRPLASIGAMS TKK ALLILSGVW 20 T P
MEL1_taeGu DRYFV ITKPLASVGVTS KKK ALIILVGVW 20 T P
MEL1_galGa DRYFV ITKPLASVRVMS KKK ALIILVGVW 20 T P
MEL1_xenTr DRYFV ITRPLTSIGVMS KKR AVLILSGVW 20 T P
MEL1_danRe DRYFV ITRPLASIGVLS QKR ALLILLVAW 20 T P
MEL1_danRe DRYFV ITRPLASIGVMS RKR ALLILSAAW 20 T P
MEL1_takRu DRYFV ITRPLTSIGVLS RKR AFVILMTVW 20 T P
MEL1_gasAc DRYFV ITRPLTSIGMMS RRR ALLILMGAW 20 T P
MEL1_oryLa DRYFV ITRPLTSIGVLS RKR ALLILSAAW 20 T P
MEL1_calMi DRYFV ITRPLASIGVLS HRR AGLIILSLW 20 T P
MEL1_petMa DRYLV LTRPLASIGAMS KRR AMYITAAVW 20 T P
MEL2_galGa DRYLV ITKPLRSIQWTS KKR TIQIIAAVW 20 T P
MEL2_anoCa DRYCV ITKPLQSIKRTS KKR TCIIIVFVW 20 T P
MEL2_xenLa NRYIV ITKPLQSIQWSS KKR TSQIIVLVW 20 T P
MEL2_danRe DRYLV ITKPLQTIQWNS KRR TGLAILCIW 20 T P
MEL2_tetNi DRYVV ITKPLQTIRRSS KRR TALAILMVW 20 T P
MEL2_gasAc DRYLV ITKPLQAIHWGS KRR TTLAILLVW 20 T P
MEL1_plaDu DRFYV ITNPLGAAQTMT KKR AFIILTIIW 20 T P
MEL1_capCa DRYMV IAKPFYAMKHVS HKR SLIQIILAW 20 A P
MEL1_helRo DRYLV VGQPLAMLNQSH FRR SFYHVLIIW 20 G P
MEL1_todPa DRYNV IGRPMAASKKMS HRR AFIMIIFVW 20 G P
MEL1_schMe DRYFV IAQPFQTMKSLT IKR AIIMLVFVW 20 A P
MEL2_schMa DRYLV IATPFESVFQTT PRR TLLLMLFLW 20 A P
MEL1_lotGi DRYLV ITSPFTAMRNMT HKR AFLMIVGVW 20 T P
MEL1_sepOf DRYNV IGRPMAASKKMS HRR AFLMIIFVW 20 G P
MEL1_entDo DRYNV IGRPMAASKKMS HRR AFLMIIFVW 20 G P
UVV_camAb  DRYST IARPLDGKLS   RGQ VLLLIMLIW 18 A P
UVV_catBo  DRYST IARPLDGKLS   RGQ VILLIALIW 18 A P
UVV_apiMe  DRYST IARPLDGKLS   RGQ VILFIVLIW 18 A P
BLU_apiMe  DRYRT ISCPIDGRLN   SKQ AAVIIAFTW 18 S P
BLU_droMe  DRYKT ISNPIDGRLS   YGQ IVLLILFTW 18 S P
BLU_manSe  DRYKT ISSPLDGRIN   TVQ AGLLIAFTW 18 S P
UVV1_droMe DRYNV ITKPMNRNMT   FTK AVIMNIIIW 18 T P
UVV1_pedHu DRCET ITNPL-QKSG   KKK AFLLAAFTW 18 T P
UVV_manSe  DRHST ITRPLDGRLS   EGK VLLMVAFVW 18 T P
UVV_papXu  DRHST ITRPLDGRLS   RGK VLLMMVCVW 18 T P
UVV2_droMe DRFNV ITRPMEGKMT   HGK AIAMIIFIY 18 T P
UVV2_pedHu DRYQV IVHPLER-KT   KAA VYFQILLIW 18 V P
LWS_nemVe  DRYIV IVHPMKKIMT   RKK AALMIVGVW 18 V P
LWS_pedHu  DRYNV IVKGLSAKPMT  IKM ALLNILFVW 19 V G
LWS_vanCa  DRYNV IVKGIAAKPLT  ING AMLRVLGIW 19 V G
LWS_papXu  DRYNV IVKGIAAKPMT  ING ALLRILGIW 19 V G
LWS_helSa  DRYNV IVKGIAAKPMT  ING ALLRVFGIW 19 V G
LWS_pieRa  DRYNV IVKGIAAKPMT  INS ALLRILGVW 19 V G
LWS_manSe  DRYNV IVKGIAAKPMT  SNG ALLRILGIW 19 V G
MWS2_droMe DRYNV IVKGINGTPMT  IKT SIMKILFIW 19 V G
LWS_rhoPr  DRYNV IVKGISAKPMT  NKT AMLRILLVW 19 V G
LWS_meoOe  DRYNV IVKGISGTPLS  QKN TTLQVLFVW 19 V G
LWS_catBo  DRYNV IVKGLSAKPMT  ING ALLRILGIW 19 V G
LWS_schGr  DRYNV IVKGLSAKPMT  NKT AMLRILFIW 19 V G
LWS_triCa  DRYNV IVKGLSAQPLT  KKG AMLRILIIW 19 V G
LWS2_apiMe DRYNV IVKGLSGKPLS  ING ALIRIIAIW 19 V G
LWS_bomTe  DRYNV IVKGLSGKPLT  ING ALLRILGIW 19 V G
MWS_calEr  DRYNV IVKGMAGQPMT  IKL AIMKIALIW 19 V G
MWS1_droMe DRYQV IVKGMAGRPMT  IPL ALGKIAYIW 19 V G
LWS_droMe  DRYCV IVKGMARKPLT  ATA AVLRLMVVW 19 V G
LWS_arcGr  DRYNV IVKGVAAEPLT  SKG ASIRILFVW 19 V G
LWS_eupSu  DRYNV IVKGVAATPLT  NKG AFARNIFSW 19 V G
LWS_camLu  DRYNV IVKGVAGEPLS  TKK ASLWILTVW 19 V G
LWS_proMi  DRYNV IVKGVAGEPLS  TKK ASLWILIVW 19 V G
LWS_holCo  DRYNV IVKGVSAEPLT  SGG AMMRIAGTW 19 V G
LWS_homGa  DRYNV IVKGVSATPLT  TNG AMLRNLFSW 19 V G
LWS_neoAm  DRYNV IVKGVSGEPLT  NSG AMTRIAGTW 19 V G
LWS_neoOe  DRYNV IVKGVSGKPLS  QKN ATLQVLFVW 19 V G
LWS_mysDi  ERYNV IVKGVSSKPLS  VKG AITRIVLTW 19 V G
LWS1_apiMe DRYNV IVKGMSGTPLT  IKR AMLQILGIW 19 V G
LWS_limPo  DRYNV IVRGMAAAPLT  HKK ATLLLLFVW 19 V G
LWS_limPo  DRYNV IVRGMAAAPLT  HKK ATLLLLFVW 19 V G
LWS_ixoSc  DRYNV IVRGVAAAPLT  HKR AALMIFFVW 19 V G
ADRB2_homS DRYFA ITSPFKYQSLLT KNK ARVIILMVW 20 T P
ADRA2A_hom DRYWS ITQAIEYNLKRT PRR IKAIIITVW 20 T A
ADRA2C_hom DRYWS VTQAVEYNLKRT PRR VKATIVAVW 20 T A
HTR1A_homS DRYWA ITDPIDYVNKRT PRR AAALISLTW 20 T P
CHRM1_homS DRYFS VTRPLSYRAKRT PRR AALMIGLAW 20 T P
DRD2_homSa DRYTA VAMPMLYNTRYS KRR VTVMISIVW 21 A P
TAAR9_homS DRYIA VTDPLTYPTKFT VSV SGICIVLSW 20 T P
ADRA2B_hom DRYWA VSRALEYNSKRT PRR IKCIILTVW 20 S A
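One way to make the notion of residue conservation in this alignment concrete is a per-column tally. The sketch below uses five of the length-20 melanopsin rows above with their spaces removed; `column_conservation` is a hypothetical helper, and the majority-residue fraction is only one of several possible conservation measures.

```python
from collections import Counter
from typing import List, Tuple


def column_conservation(aligned: List[str]) -> List[Tuple[str, float]]:
    """For each alignment column, return (most common residue, its fraction)."""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("all sequences must have the same aligned length")
    profile = []
    for column in zip(*aligned):
        residue, count = Counter(column).most_common(1)[0]
        profile.append((residue, count / len(aligned)))
    return profile


# Five C2 loops copied from the table above, spaces removed.
melanopsins = [
    "DRYLVITRPLATFGVASKRR",  # MEL1_homSa
    "DRYLVITRPLATIGRGSKRR",  # MEL1_musMu
    "DRYFVITKPLASVRVMSKKK",  # MEL1_galGa
    "DRYFVITRPLASIGVLSQKR",  # MEL1_danRe (first of the two rows)
    "DRYNVIGRPMAASKKMSHRR",  # MEL1_todPa
]
profile = column_conservation(melanopsins)

# 1-based positions where every sequence carries the same residue.
invariant = [i + 1 for i, (_, frac) in enumerate(profile) if frac == 1.0]
print(invariant)
```

With only five sequences the result is illustrative; run over the whole collection, the same tally would separate positions conserved within an orthology class from those conserved across classes.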

Reference collection of 352 cytoplasmic loop sequences from all opsins

The second column contains the C2 loop sequences. The third column shows the continuation into transmembrane helix 4. The end of the loop region is determined by countback from the invariant tryptophan at position 160 in squid melanopsin as well as from crystallography and transmembrane prediction tools. Other columns show loop length and values at potentially informative positions 7 and 9 (which are generally characteristic of orthology class).
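The column layout just described lends itself to a simple consistency check, sketched below. The helper names are hypothetical, and the countback width of 9 is an assumption taken from the width of the third column rather than from any stated rule. Three rows from the collection are used as input, written with explicit tab separators.

```python
from typing import Dict, List, Tuple

TM_WIDTH = 9  # assumption: number of residues shown in the third column


def split_at_tryptophan(fragment: str, tm_width: int = TM_WIDTH) -> Tuple[str, str]:
    """Split a fragment that ends at the invariant tryptophan into
    (C2 loop, start of transmembrane helix 4) by counting back tm_width."""
    if not fragment.endswith("W"):
        raise ValueError("fragment must end at the invariant tryptophan")
    return fragment[:-tm_width], fragment[-tm_width:]


def parse_row(line: str) -> Dict[str, object]:
    """Parse one tab-separated row of the reference collection."""
    name, loop, tm, length, pos7, pos9 = line.rstrip("\n").split("\t")
    return {"name": name, "loop": loop, "tm": tm,
            "length": int(length), "pos7": pos7, "pos9": pos9}


def row_problems(row: Dict[str, object]) -> List[str]:
    """List the ways in which a row disagrees with its own annotation."""
    problems = []
    loop = row["loop"]
    if len(loop) != row["length"]:
        problems.append("length")
    if loop[6] != row["pos7"]:
        problems.append("pos7")
    if loop[8] != row["pos9"]:
        problems.append("pos9")
    if not row["tm"].endswith("W"):
        problems.append("no terminal W")
    return problems


lines = [
    "RHO1_homSa\tERYVVVCKPMSNFRFGENH\tAIMGVAFTW\t19\tC\tP",
    "TMT_monDom\tERYRTL-TLCPGQGADYQK\tALLAVAGSW\t19\t-\tL",
    "RGR_gorGor\tGRYHHYCTGSTLACKS\tAVSLVLSGR\t16\tC\tG",
]
report = {row["name"]: row_problems(row) for row in map(parse_row, lines)}
print(report)
print(split_at_tryptophan("ERYVVVCKPMSNFRFGENHAIMGVAFTW"))
```

A row flagged for lacking the terminal tryptophan is a candidate for re-inspection of the underlying sequence, not necessarily an error in the table.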

RHO1_homSa	ERYVVVCKPMSNFRFGENH	AIMGVAFTW	19	C	P
RHO1_bosTa	ERYVVVCKPMSNFRFGENH	AIMGVAFTW	19	C	P
RHO1_ornAn	ERYIVVCKPMSNFRFGENH	AIMGVAFTW	19	C	P
RHO1_monDo	ERYVVVCKPMSNFRFGENH	AIIGVAFTW	19	C	P
RHO1_galGa	ERYVVVCKPMSNFRFGENH	AIMGVAFSW	19	C	P
RHO1_calMi	ERYVVVCKPMSNFRFGTNH	AIMGVAFTW	19	C	P
RHO1_xenTr	ERYVVVCKPMANFRFGENH	AIMGVVFTW	19	C	P
RHO1_latCh	ERYVVVCKPMSNFRFGENH	AIMGVIFTW	19	C	P
RHO1_neoFo	ERYIVVCKPISNFRFGENH	AIMGVVFTW	19	C	P
RHO1_angAn	ERWVVVCKPMSNFRFGENH	AIMGLAFTW	19	C	P
RHO1_takRu	ERYIVVCKPMTNFRFGEKH	AIAGLVFTW	19	C	P
RHO1_leuEr	ERYMVVCKPMANFRFGSQH	AIIGVVFTW	19	C	P
RHO1_petMa	ERYIVICKPMGNFRFGSTH	AYMGVAFTW	19	C	P
RHO1_letJa	ERYIVICKPMGNFRFGNTH	AIMGVAFTW	19	C	P
RHO1_geoAu	ERYIVICKPMGNFRFGNTH	AIMGVALTW	19	C	P
RHO2_galGa	ERYIVVCKPMGNFRFSATH	AMMGIAFTW	19	C	P
RHO2_gekGe	ERYIVICKPMGNFRFSATH	AIMGIAFTW	19	C	P
RHO2_anoCa	ERYIVVCKPMGNFRFSATH	ALMGISFTW	19	C	P
RHO2_taeGu	ERYIVICKPMGNFRFSASH	ALMGIAFTW	19	C	P
RHO2_podSi	ERYIVVCKPMGNFRFSSSH	ALMGIAFTW	19	C	P
RHO2_pheMa	ERYIVICKPMGNFRFSSSH	AMMGISFTW	19	C	P
RHO2_latCh	ERYIVVCKPMGNFRFASSH	AIMGIAFTW	19	C	P
RHO2_geoAu	ERYIVVCKPMGNFRFATTH	AALGVVFTW	19	C	P
RHO2_neoFo	ERYIVVCKPMGNFRFSNNH	SIIGIVFTW	19	C	P
RHO1_anoCa	ERYVVICKPMSNFRFGETH	ALIGVSCTW	19	C	P
RHO1_conMy	ERWMVVCKPVTNFRFGESH	AIMGVMVTW	19	C	P
RHO2_ancDa	ERYIVVCKPMGSFKFSSSH	AMAGIAFTW	19	C	P
RHO2a_danR	ERYIVVCKPMGSFKFSANH	AMAGIAFTW	19	C	P
RHO2b_danR	ERYIVVCKPMGSFKFSSNH	AMAGIAFTW	19	C	P
RHO2c_danR	ERYIVVCKPMGSFKFSSNH	AFAGIGFTW	19	C	P
RHO2d_danR	ERYIVVCKPMGSFKFSASH	AFAGCAFTW	19	C	P
RHO2_oryLa	ERYIVVCKPMGSFKFTATH	SAAGCAFTW	19	C	P
RHO2_takRu	ERYVVVCKPMGSFKFTGTH	AAVGVAFTW	19	C	P
RHO2_gasAc	ERYIVVCKPMGSFKFSGTH	AGAGVLFTW	19	C	P
RHO2_hipHi	ERYIVVCKPMGSFKFSGTH	AGIGVLFTW	19	C	P
RHO2_mulSu	ERYIVVCKPMGSFKFSGTH	AGAGVAFTW	19	C	P
RHO2_oreNi	ERYIVVCKPMGSFKFTGAH	AGAGVLFTW	19	C	P
RHO2_pomMi	ERYIVVCKPMGSFKFSGAH	AGAGVALTW	19	C	P
SWS2_ornAn	ERFLVICKPLGNLSFRGTH	AIFGCAATW	19	C	P
SWS2_anoCa	ERYLVICKPLGNFTFRGTH	AIIGCAVTW	19	C	P
SWS2_utaSt	ERFLVICKPLGNFSFRGTH	AIIGCIITW	19	C	P
SWS2_taeGu	ERFLVICKPLGNFTFRGSH	AVLGCAITW	19	C	P
SWS2_galGa	ERFLVICKPLGNFTFRGSH	AVLGCVATW	19	C	P
SWS2_neoFo	ERFLVICKPLGNFTFRSTH	AIIGCVATW	19	C	P
SWS2_xenTr	ERFLVICKPMGNFTFRESH	AVLGCILTW	19	C	P
PIN_galGal	ERYVVVCRPLGDFQFQRRH	AVSGCAFTW	19	C	P
PIN_pheMad	ERYLVICKPVGDFQFQRRH	AVIGCLYTW	19	C	P
PIN_utaSta	ERYLVICKPVGDFRFQQRH	AVFGCVFTW	19	C	P
PIN_xenTro	ERYLVICKPMGDFRFQQKH	AILGCSFTW	19	C	P
PIN_bufJap	ERYIVICKPMGDFRFQQRH	AVMGCAFTW	19	C	P
PIN_podSic	ERYLVICKPVGDFRFPARH	AVLGCAFTW	19	C	P
PIN_calMil	ERYIVICKPMGDFRFQQKH	AVWGCLFTW	19	C	P
SWS1_homSa	ERYIVICKPFGNFRFSSKH	ALTVVLATW	19	C	P
SWS1_monDo	ERFIVICKPFGNFRFNSKH	AMMVVLATW	19	C	P
SWS1_smiCr	ERFIVICKPFGNFRFNSKH	AMMVVLATW	19	C	P
SWS1_tarRo	ERFIVICKPFGNFRFSSKH	AMMVVLATW	19	C	P
SWS1_taeGu	ERYIVICKPFGNFRFNSRH	ALLVVAATW	19	C	P
SWS1_anoCa	ERYIVICKPFGNFRFNSRH	ALLVVAATW	19	C	P
SWS1_utaSt	ERYIVICKPFGNFRFNSKH	ALLVVAATW	19	C	P
SWS1_galGa	ERYIVICKPFGNFRFSSRH	ALLVVVATW	19	C	P
SWS1_geoAu	ERYIVICKPFGNFRFGSKH	ALVAVGLTW	19	C	P
SWS1_neoFo	ERYLVICKPIGNFRFGSKH	SMIAVVAAW	19	C	P
SWS1_xenLa	ERYIVICKPMGNFNFSSSH	ALAVVICTW	19	C	P
SWS1_petMa	ERYIVICKPFGNFRFGSIH	SLFAFCLTW	19	C	P
SWS1_danRe	ERYVVICKPFGSFKFGQGQ	AVGAVVFTW	19	C	P
SWS1_oryLa	ERYLVICKPFGAFKFGSNH	ALAAVIFTW	19	C	P
SWS2_geoAu	ERCLVICKPFGNIAFRGTH	ALIRCGFAW	19	C	P
SWS2_takRu	ERWLVVCKPLGNFIFKPDH	AIVCCIFTW	19	C	P
SWS2_gasAc	ERWLVICKPLGNFIFKPDH	ALVCCAFTW	19	C	P
LWS_homSap	ERWMVVCKPFGNVRFDAKL	AIVGIAFSW	19	C	P
LWS_monDom	ERWVVVCKPFGNVKFDAKL	AMVGIIFSW	19	C	P
LWS_ornAna	ERWIVVCKPFGNVKFDAKL	AMVGIVFSW	19	C	P
LWS_anoCar	ERWVVVCKPFGNVKFDAKL	AVAGIVFSW	19	C	P
LWS_galGal	ERWFVVCKPFGNIKFDGKL	AVAGILFSW	19	C	P
LWS_xenTro	ERWFVVCKPFGNIKFDGKL	AATGIIFSW	19	C	P
LWS_neoFor	ERWVVVCKPFGNIKFDGKW	AAGGIIFSW	19	C	P
LWS_calMil	ERWVVVCKPFGNVKFDGKW	AAFGIIFSW	19	C	P
LWS_takRub	ERWVVVCKPFGNVKFDAKW	ATGGIVFSW	19	C	P
LWS_gasAcu	ERWIVVCKPFGNVKFDAKW	ATAGIVFSW	19	C	P
LWS_petMar	ERWMVVCKPFGNIKFDGKI	ATILIVFSW	19	C	P
LWS_letJap	ERWMVVCKPFGNIKFDGKI	AIILIVFSW	19	C	P
LWS_geoAus	ERWMVVCKPFGNLKFDGKV	AIVLIIFSW	19	C	P
VAOP_galGa	ERYIVICRPVGNMRLRGKH	AAQGIAFVW	19	C	P
VAOP_anoCa	ERYVVICRPLGNMRLNGKH	AALGVAFVW	19	C	P
VAOP_xenTr	ERYIVICRPLGNLRLQGKH	SALAIIFVW	19	C	P
VAOP_danRe	ERFFVICRPLGNIRLRGKH	AALGLVFVW	19	C	P
VAOP_rutRu	ERFFVICRPLGNIRLRGKH	AALGLLFVW	19	C	P
VAOP_takRu	ERFFVICRPLGNMRLQAKH	AAIGLLFVW	19	C	P
VAOP_petMa	ERYFVICRPLGNFRLQSKH	AVLGLAVVW	19	C	P
PPIN_anoCa	DRAIVIAKPMGTITFTTRK	AMIGVAVSW	19	A	P
PPIN_xenTr	DRVFVVCKPMGTLTFTPKQ	ALAGIAASW	19	C	P
PPIN_ictPu	DRYMVVCRPLGAVMFQTKH	ALAGVVFSW	19	C	P
PPIN_oncMy	DRYVVVCRPMGAVMFQTRH	AVGGVVLSW	19	C	P
PPIN_danRe	ERCMVVCRPVGSISFQTRH	AVFGVAVSW	19	C	P
PPIN_petMa	DRFVVVCKPLGTLMFTRRH	ALLGITWAW	19	C	P
PPIN_letJa	DRFVVVCKPLGTLMFTRRH	ALLGIAWAW	19	C	P
PPIN2_petM	ERYVVVCKPLGGVHFGTQH	GLCGVAISW	19	C	P
PARIE_utaS	ERYNVVCQPLGTLQMSTKR	GYQLLGFIW	19	C	P
PARIE_anoC	ERYNVVCQPLGTLQMSTQR	AYQLLGFIW	19	C	P
PARIE_xenT	ERYNVVCEPIGALKLSTKR	GYQGLVFIW	19	C	P
PARIE_takR	ERYNVVCKPRAGLKLTMRR	SIIGLLFVW	19	C	P
PARIE_gasA	ERYNVVCRPRNALKLSMRR	SIHGLLIVW	19	C	P
PARIE_danR	ERYNVVCKPMAGFKLNVGR	SCQGLLLVW	19	C	P
PER_homSap	DRYLTICLPDVGRRMTTNT	YIGLILGAW	19	C	P
PER_panTro	DRYLTICLPDVGRRMTTNT	YIGLILGAW	19	C	P
PER_nomLeu	DRYLTICLPDVGRRMTTNT	YIGLILGAW	19	C	P
PER_gorGor	DRYLTICLPDVGRRMTTNT	YIGLILGAW	19	C	P
PER_ponPyg	DRYLTICLPDIGRRMTTNT	YIGLILGAW	19	C	P
PER_macMul	DRYLTICLPDIGRRMTTNT	YIGMILGAW	19	C	P
PER_papHam	DRYLTICLPDIGRRMTTNT	YIGMILGAW	19	C	P
PER_otoGar	DRYLTICRPDIGRRMTTNS	YIGMILGAW	19	C	P
PER_tarSyr	DRYLTICRPDIGRRMTTNT	YVGMILGAW	19	C	P
PER_micMur	DRYLTICRPDIGRRMTTHT	YVGMILGAW	19	C	P
PER_cavPor	DRYLTICRPDIGRRMTSHS	YVGMILGAW	19	C	P
PER_ochPri	DRYLTICQPDIGRRMTTHT	YFGMILGAW	19	C	P
PER_oryCun	DRYLTICHPDVGRRMTTRT	YLGLILGAW	19	C	P
PER_calJac	DRYLTICLPDIGRRMTTST	YIIMILGAW	19	C	P
PER_canFam	DRYLTICSPDTGRRMTTNT	YISMILGAW	19	C	P
PER_felCat	DRYLTICSPNSGRRMTTNT	YISMILGAW	19	C	P
PER_susScr	DRYLTICRPEAGRRMTTNT	YISMILGAW	19	C	P
PER_vicVic	DRYLTICRPDAGRRMTTNT	YISMILGAW	19	C	P
PER_turTru	DRYLTICCPGAGRRMTTNT	YISMILGAW	19	C	P
PER_bosTau	DRYLTICHPDAGRRMTANT	YISMILGAW	19	C	P
PER_choHof	DRYLTICHPDVGRRMTINT	YISMILGAW	19	C	P
PER_dasNov	DRYLTICRPDTGRRMTINT	YISMILGAW	19	C	P
PER_echTel	DRYLTICHPDRGRRMTSNT	YVGMILGAW	19	C	P
PER_loxAfr	DRYLTICHPHIGRRMTSNT	YVSMILGAW	19	C	P
PER_sorAra	DRYLTLCRPDAGRSMTTNS	YVGLILGAW	19	C	P
PER_equCab	DRYLTTCRPDAGRRMTTST	YTSMILGAW	19	C	P
PER_dipOrd	DRYLTICHPDIGRGMTTRT	YVTMILGAW	19	C	P
PER_musMus	DRYLTISCPDVGRRMTTNT	YLSMILGAW	19	S	P
PER_ratNor	DRYLTISCPDVGRRMTGNT	YLSMVLGAW	19	S	P
PER_eriEur	DRYLTICRPHTGRSMSANS	YIAMILGAW	19	C	P
PER_tupBel	DRYLTLCRPAVGRRMGSST	YAAMILGAW	19	C	P
PER_monDom	DRYLTICQPDLGGRMTSYN	YTLMILTAW	19	C	P
PER_ornAna	DRYLTICRPAIGRKMTRSN	YTAMILAAW	19	C	P
PER_xenTro	DRYLTICRPDIGRRISGRH	YTAMILAAW	19	C	P
PER_galGal	DRYLTICRPDIGRRMTTRN	YAALILAAW	19	C	P
PER_anoCar	DRYLTICKPHIGSRLTATN	YTTLILAAW	19	C	P
PER_taeGut	DRYLTICRPDIGRRMTTRS	YATLILAAW	19	C	P
PER1_gasAc	DRYLTICRPDIGQKMTMQS	YNLLILAAW	19	C	P
PER_gasAcu	DRYLTICRPDIGQKMTMQS	YNLLILAAW	19	C	P
PER_oryLat	DRYLTICRPDLGQKMTMQS	YNLLILAAW	19	C	P
PER_takRub	DRYITICRPDIGRKMTVQS	YNLLILAAW	19	C	P
PER_tetNig	DRYLTICRPDIGRKMTVQS	YNLLIAAAW	19	C	P
PER_danRer	DRYLTICRPDIGQKLTTRS	YTLLIVAAW	19	C	P
PER1a_sacK	DRYWATCSPVEVMELKSKY	YTRMTALGW	19	C	P
NEUR1_homS	DRYLKICYLSYGVWLKRKH	AYICLAAIW	19	C	L
NEUR1_nomL	DRYLKICYLSYGVWLKRKH	AYICLAAIW	19	C	L
NEUR1_panT	DRYLKICYLSYGVWLKRKH	AYICLAAIW	19	C	L
NEUR1_ponP	DRYLKICYLSYGVWLKRKH	AYICLAAIW	19	C	L
NEUR1_macM	DRYLKICYLSYGVWLKRKH	AYICLAAIW	19	C	L
NEUR1_papH	DRYLKICYLSYGVWLKRKH	AYICLAAIW	19	C	L
NEUR1_calJ	DRYLKICYLSYGVWLKRKH	AYICLAAIW	19	C	L
NEUR1_tarS	DRYLKICYLSYGVWLKRKH	AYICLAAIW	19	C	L
NEUR1_cavP	DRYLKICYLSYGVWLKRKH	AYICLAAIW	19	C	L
NEUR1_dasN	DRYLKICYLSYGVWLKRKH	AYICLAVIW	19	C	L
NEUR1_equC	DRYLKICYLSYGVWLKRKH	AYICLAVIW	19	C	L
NEUR1_canF	DRYLKICYLSYGVWLKRKH	AYICLAVIW	19	C	L
NEUR1_susS	DRYLKICYLSYGVWLKRKH	AYICLAVIW	19	C	L
NEUR1_pteV	DRYLKICYLSYGVWLKRKH	AYICLAVIW	19	C	L
NEUR1_choH	DRYLKICYLSYGVWLKRKH	AYICLAVIW	19	C	L
NEUR1_musM	DRYLKICYLSYGVWLKRKH	AYICLAVIW	19	C	L
NEUR1_ratN	DRYLKICYLSYGVWLKRKH	AYICLAVIW	19	C	L
NEUR1_loxA	DRYLKICYLSYGVWLKRKH	AYICLAVIW	19	C	L
NEUR1_felC	DRYLKICYLSYGVWLKRKH	AYICLAVIW	19	C	L
NEUR1_turT	DRYLKICYLSYGVWLKRKH	AYICLAVIW	19	C	L
NEUR1_tupB	DRYLKICYLSYGVWLKRKH	AYICLAVIW	19	C	L
NEUR1_echT	DRYLKICYLSYGVWLKRKH	AYICLAVIW	19	C	L
NEUR1_dipO	DRYLKICYLSYGVWLKRKH	AYICLAVIW	19	C	L
NEUR1_bosT	DRYLKICYLSYGIWLKRKH	AYICLAVIW	19	C	L
NEUR1_eriE	DRYLKICYLSYGVWLKRKH	AYLCLAVIW	19	C	L
NEUR1_sorA	DRYLKICYLSYGVWLKRKH	AYICLVVIW	19	C	L
NEUR1_speT	DRYLKICYLSYGVWLKRKH	AFICLAVIW	19	C	L
NEUR1_oryC	DRYLKICYLSYGVWLKRRH	AYICLALIW	19	C	L
NEUR1_myoL	DRYLKICYLSYGVWLKRKH	TYICLAFIW	19	C	L
NEUR1_monD	DRYLKICHLSYGTWLKRHH	AFICLALIW	19	C	L
NEUR1_taeG	DRYLKICHLSYGTWLKRHH	AFICLAIIW	19	C	L
NEUR1_galG	DRYLKICHLAYGTWLKRHH	AFICLALIW	19	C	L
NEUR1_ornA	DRYLKICHLSYGTWLKRHH	AYICLAIIW	19	C	L
NEUR1_macE	DRYLKICHLSYGTWLKRHH	AYICLVIIW	19	C	L
NEUR1_gasA	DRYLKICHLRYGTWLKRHH	AFVCLALVW	19	C	L
NEUR1_anoC	DRYFKICHLSYGTWLKRHH	VFICLGIIW	19	C	L
NEUR1_tetN	DRYLKICHLRYGAWLKRHH	AFLCLASVW	19	C	L
NEUR1_xenT	DRYLKICHLRYGTWLKRRH	AFIALAVIW	19	C	L
NEUR1_takR	DRYLKICHLRYGTWFKRHH	AFLCLVFTW	19	C	L
NEUR1_oryL	DRYLKICHLRYGTWLKRQH	AFLCLVFVW	19	C	L
NEUR1_pimP	DRYLKICHLRYGTWLKRQH	IFLCLVFVW	19	C	L
NEUR1_danR	DRYLKICHLRYGTWLKRHH	AFLSVVFIW	19	C	L
NEUR1_calM	DRYLKICHLQYGSWLQRRH	VFMSLAFIW	19	C	L
NEUR2_galG	VCCLKICFPAYGNRFRRKH	GQILIACAW	19	C	P
NEUR2_anoC	VCCLKICFPVYGNRFRPGH	GWILIACAW	19	C	P
NEUR2_oncM	VCFVKVCYPLYGNRFNAVH	GRLLIACAW	19	C	P
NEUR2_xenT	VCCLKVCYPAYGNKFSTAH	SRILLLGIW	19	C	P
NEUR2_danR	VCCLKVCFPNYGNKFSSSH	ACVMVIGVW	19	C	P
NEUR2_pimP	VCCLKVCCPNYGNKFSSNH	ACVMVIGVW	19	C	P
NEUR2_tetN	VCCLKVCLPNLGSKFSSSH	ARLLVAGVW	19	C	P
NEUR2_takR	VCCLKVCFPNHGSRFSSSH	ARLLVVGVW	19	C	P
NEUR2_gasA	VCCLKVCFPNHGNRFSSSH	ARLLVVAVW	19	C	P
NEUR2_oryL	VCCLKVCFPNHGNKFSFSH	ARLLVAGVW	19	C	P
TMT_monDom	ERYRTL-TLCPGQGADYQK	ALLAVAGSW	19	-	L
TMT_macEug	ERYRTL-TLCPRQGTDYHK	ALLAVAGSW	19	-	L
TMT_ornAna	ERYRTL-TLHPKQSTDYQK	AVLAVGASW	19	-	L
TMT_galGal	ERYSTL-TLCNKRSDDYRK	ALLAVGGSW	19	-	L
TMT_taeGut	ERYNTL-TLCHKRSDDFRK	ALLAVAGSW	19	-	L
TMT_anoCar	ERYSTL-TQTNKRGSDYQK	ALLGVGGSW	19	-	Q
TMT_xenTro	ERYSTL-TLYNKGGPNFKK	ALLAVASSW	19	-	L
TMT_danRer	ERYCTMMGSTEADATNYKK	VIGGVLMSW	19	M	S
TMT_pimPro	ERYCTMMGATQADSTNYKK	VAMGIAFSW	19	M	A
TMTa_takRu	ERYSTMMTPTEADPSNYCK	VCLGITLSW	19	M	P
TMT_tetNig	ERYSTMMTPTEADSSNYCK	VCLGIGLSW	19	M	P
TMT_gasAcu	ERYSTMVAPTEADSSNYHK	ISLGITLSW	19	V	P
TMT_oryLat	ERYSTMMTPAEADSSNYRK	ISLGIILSW	19	M	P
TMTb_takRu	ERYCTMVSSTIASNRDYRP	VLGGICFSW	19	V	S
TMTa_calMi	DRYITITGTTEADITNYNK	TIVGIALSW	19	T	T
TMT1_plaDu	ERYLAVVRPFDVGNLTNRR	VIAGGVFVW	19	V	P
TMT2_anoGa	ERYCLISRPFSSRNLTRRG	AFLAIFFIW	19	S	P
TMT_triCas	ERYLLIARPFRNNALNFHS	AALSVFSIW	19	A	P
TMT_bomMor	ERYLMVTRPLTSRHLSSKG	AVLSIMFIW	19	T	P
ENCEPH_hom	ERYIRVVHARVINFSW	AWRAITYIW	16	V	A
TMT_aedAe	ERFCLISHPFSSRSLSRRG	AVFAILFIW	19	S	P
TMT_culPi	ERFYLISRPFSSRSLSRRG	ALGAVLLIW	19	S	P
ENCEPH_lox	ERYIRVVHARVINFSW	AWRAITYIW	16	V	A
TMT1_anoGa	ERFCLISRPFAAQNRSKQG	ACLAVLFIW	19	S	P
ENCEPH_can	ERYIRVVHARVINFSW	AWRAITYIW	16	V	A
TMT_triCa	ERYLLIARPFRNNALNFHS	AALSVFSIW	19	A	P
ENCEPH_oto	ERYIRVVHARVINFSW	AWRAITYIW	16	V	A
ENCEPH_mus	ERYIRVVHARVINFSW	AWRAITYIW	16	V	A
ENCEPH_ano	ERYIRVVHARVIDFSW	SWRAITYIW	16	V	A
ENCEPH_gal	ERYIRVVHAKVIDFSW	SWRAITYIW	16	V	A
ENCEPH_mon	ERYNRIVHAKVINFSW	AWRAITYIW	16	V	A
ENCEPH_pte	ERYIRVVQARAIDFSW	AWRTITYIW	16	V	A
ENCEPH_squ	ERYIRVVNATAIDFSW	AWRAITYIW	16	V	A
ENCEPH_xen	ERYARVVYGKYVNSSW	SKRSITFVW	16	V	G
ENCEPH_dan	ERYIRVVHAKVVDFPW	AWRAITHIW	16	V	A
ENCEPH_tak	ERYIRVVHAQVVDFPW	AWRAIGHIW	16	V	A
ENCEPH_gas	ERYIRVVHAQVVDFPW	AWRAIGHIW	16	V	A
ENCEPH_ory	ERYIRVVHAQVVDFPW	AWRAIGHIW	16	V	A
ENCEPH_cal	ERYIRVVNAKATNFPW	AWRAITYTW	16	V	A
ENCEPH_squ	ERYIRVVNATAIDFSW	AWRAITYIW	16	V	A
ENCEPH_pet	ERYARLIKAQVLDFSW	AWRAVTYTW	16	I	A
RGR_homSap	GRYHHYCTRSQLAWNS	AVSLVLFVW	16	C	R
RGR_panTro	GRYHHYCTRSQLAWNS	AISLVLFVW	16	C	R
RGR_gorGor	GRYHHYCTGSTLACKS	AVSLVLSGR	16	C	G
RGR_macMul	GRYHHYCTRSQLAWNS	AISLVLFVW	16	C	R
RGR_ponPyg	GRYHHYCTGSQLAWNS	AISLVLFVW	16	C	G
RGR_calJac	GRYHHYCTGSQLAWNS	AISLVLFVW	16	C	G
RGR_nomLeu	GRYHHYCTGSQLAWNS	AISLVLFVW	16	C	G
RGR_tarSyr	GRYHHYCTGSQLAWNT	AISLVLFVW	16	C	G
RGR_pteVam	GRYHHYCTGSRLAWNT	AVSLVLFVW	16	C	G
RGR_oryCun	GRYHHYCTGSQLAWNT	AVLLVLFVW	16	C	G
RGR_ochPri	GRYHHYCTGSQLAWNT	AVLLVLFVW	16	C	G
RGR_otoGar	GRYHHYCTGRPLAWST	AISLVLFVW	16	C	G
RGR_micMur	GRYHHYCTGSPLAWST	AISLVLFVW	16	C	G
RGR_musMus	GRYHHYCTGRQLAWDT	AIPLVLFVW	16	C	G
RGR_ratNor	GRYHHYCTGRQLAWDT	AIPLVLFVW	16	C	G
RGR_cavPor	GRHQQCCTRGRLTWST	AVPLVLFVW	16	C	R
RGR_speTri	GRYHHYCTGSQLAWNT	AIPLVLFVW	16	C	G
RGR_sorAra	GRYHHYCTGRQLAWDV	AIALVIFVW	16	C	G
RGR_myoLuc	GRYHHYCTGSRLAWRT	AASLVLFVW	16	C	G
RGR_canFam	GRYHHYCTRGQLAWNT	AISLVLCVW	16	C	R
RGR_felCat	GRYHHYCSGSQLAWNT	AISLVICVW	16	C	G
RGR_bosTau	GRYHHFCTGSRLDWNT	AVSLVFFVW	16	C	G
RGR_turTru	GRYHHYCTGSRLDWNT	AVSLVFFVW	16	C	G
RGR_susScr	GRYHHYCTRSRLDWNT	AVSLVFFVW	16	C	R
RGR_equCab	GRYHHYCTRSRLAWNT	AVFLVFFVW	16	C	R
RGR_eriEur	GRYHHHCTRSRLAWNT	AVFLVFFVW	16	C	R
RGR_dipOrd	GRCHHHCTGSLLGWDT	AVSLVIFVW	16	C	G
RGR_loxAfr	ERYHHYCTRSRLAWSS	ASALVLFVW	16	C	R
RGR_proCap	ERYHHYCTGSKLAWSS	AGALVLFMW	16	C	G
RGR_echTel	ERYHHYCTGSQFTWSS	ASTLVLFMW	16	C	G
RGR_dasNov	ERCHRHCIGRRLAWST	AGCLVLCLW	16	C	G
RGR_choHof	ERYRHHCTGSQLSWST	AGSLVLCVW	16	C	G
RGR_ornAna	DRYLRHCSRSKPQWGT	AVSTVLFAW	16	C	R
RGR_anoCar	DRHHQYCTGNKLQWGS	VIPMTIFLW	16	C	G
RGR_galGal	DRYHHYCTRSKLQWST	AISMMVFAW	16	C	R
RGR_taeGut	DRYHHYCTRSRLQWST	AVSMMVFAW	16	C	R
RGR_xenTro	DRYHQYCTRSKLHWST	AVSVVFFIW	16	C	R
RGR_xenLae	DRYHQYCTRSKLHWGT	AVSMVLFVW	16	C	R
RGR1_gasAc	DRYHQYCTRTKLQWSS	AITLAVFVW	16	C	R
RGR1_takRu	DRYHQYCTRTKLQWSS	AITLAVFIW	16	C	R
RGR1_tetNi	DRYHQYCTRTKLQWSS	AITLAVFIW	16	C	R
RGR1_pimPr	DRYHQYCTRTKLQWSS	AITLVIFIW	16	C	R
RGR1_osmMo	DRYHQYCTRTKLQWSS	AITLVMFIW	16	C	R
RGR1_gadMo	DRYHQYCTRTELQWSS	AVTLSVFIW	16	C	R
RGR1_danRe	DRYHQYCTRTKLQWSS	AITLVLFTW	16	C	R
RGR1_oryLa	DRYHQYCTRTKLQWST	AITLAVLVW	16	C	R
RGR_calMil	DRYHQNCSRSRLQWSS	AITVTVFIW	16	C	R
RGR2_gasAc	DRYHQYCTRQKLFWST	TLTMSAIIW	16	C	R
RGR2_tetNi	DRYHQYCTRQKLFWST	TLTMSSIIW	16	C	R
RGR2_oryLa	DRYHQYCTRQKLFWST	SITISLIIW	16	C	R
RGR2_danRe	DRYHQYCTKQKMFWST	SITISCLIW	16	C	K
RGR2_pimPr	DRYHLYCTKQKMFWST	SGTISALIW	16	C	K
RGR2_gadMo	DRYHQYCTRQKLFWST	TVTMCCIVW	16	C	R
RGR2_hipHi	DRYHQYCTRQKLFWST	TLTMSGIIW	16	C	R
RGR2_oncMy	DRYHQYVTNQKLFWST	AWTISIIIW	16	V	N
RGR2_esoLu	DRYHQYVTNQKLFWST	AWTFSIIIW	16	V	N
RGR2_poeRe	DRYHQYCTRQKLFWST	TLTMSGIIW	16	C	R
MEL1_homSa	DRYLVITRPLATFGVASKRR	AAFVLLGVW	20	T	P
MEL1_panTr	DRYLVITRPLATFGVASKRR	AAFVLLGVW	20	T	P
MEL1_gorGo	DRYLVITRPLATFGVASKRR	AAFVLLGVW	20	T	P
MEL1_ponAb	DRYLVITRPLATIGVASKRR	AAFVLLGVW	20	T	P
MEL1_rheMa	DRYLVITRPLATIGVASKRR	AAFVLLGVW	20	T	P
MEL1_calJa	DRYLVITRPLATIGVASTKR	AAFVLLGVW	20	T	P
MEL1_micMu	DRYLVITRPLASVGTASKRR	AGLVLLGVW	20	T	P
MEL1_otoGa	DRYLVITRPLTTVGVASKRR	AALVLLGVW	20	T	P
MEL1_musMu	DRYLVITRPLATIGRGSKRR	TALVLLGVW	20	T	P
MEL1_ratNo	DRYLVITRPLATIGMRSKRR	TALVLLGVW	20	T	P
MEL1_nanEh	DRYLVITRPLATIGVASKRR	TALVLLGVW	20	T	P
MEL1_phoSu	DRYLVITRPLATIGMGSKRR	TALVLLGIW	20	T	P
MEL1_dipOr	DRYLVITRPLATIGVTSKRR	TAFVLLGVW	20	T	P
MEL1_cavPo	DRYLVITRPLATIGVASKRQ	AALVLLGVW	20	T	P
MEL1_speTr	DRYLVITRPLATIGMASKKR	AAFFLLGVW	20	T	P
MEL1_oryCu	DRYLVITRPLAAVGMVSKKR	AGLVLLGVW	20	T	P
MEL1_ochPr	DRYLVITRPLAAVGMVSKRR	TGLVLLGVW	20	T	P
MEL1_bosTa	DRYLVITRPLATVGMVSKRR	AALVLLGVW	20	T	P
MEL1_turTr	DRYLVITRPLATVGMVSKRR	AALVLLGVW	20	T	P
MEL1_susSc	DRYLVITHPLATVGMVSKRR	AALVLLGVW	20	T	P
MEL1_equCa	DRYLVITRPLATVGVVSKRW	AALVLLGIW	20	T	P
MEL1_felCa	DRYLVITHPLATIGVVSKRR	AALVLLGVW	20	T	P
MEL1_canFa	DRYLVITHPLAAVGVVSKRR	AALVLLGVW	20	T	P
MEL1_myoLu	DRYLVITRPLA-IGVVSKRR	AALVLLGVW	20	T	P
MEL1_pteVa	DRYLVITRPLAAIGVVSKRR	AALVLLGVW	20	T	P
MEL1_eriEu	DRYLVITRPLATIGVVSKRR	VALVLLGVW	20	T	P
MEL1_loxAf	DRYLVITRPLATIGVVSKRR	AALVLLGIW	20	T	P
MEL1_proCa	DRYLVITRPLATIGVVSKRR	TALVLLGTW	20	T	P
MEL1_echTe	DRYLVITRPLATIGVVSKRR	AALVLLVIW	20	T	P
MEL1_smiCr	DRYFVITRPLASIGMISKKK	TGLILLGVW	20	T	P
MEL1_monDo	DRYFVITRPLASIGVISKKK	TGFILLGVW	20	T	P
MEL1_ornAn	DRYFVITRPLASIGVISKKR	ALLILTGVW	20	T	P
MEL1_anoCa	DRYFVITRPLASIGAMSTKK	ALLILSGVW	20	T	P
MEL1_taeGu	DRYFVITKPLASVGVTSKKK	ALIILVGVW	20	T	P
MEL1_galGa	DRYFVITKPLASVRVMSKKK	ALIILVGVW	20	T	P
MEL1_xenTr	DRYFVITRPLTSIGVMSKKR	AVLILSGVW	20	T	P
MEL1_danRe	DRYFVITRPLASIGVLSQKR	ALLILLVAW	20	T	P
MEL1_danRe	DRYFVITRPLASIGVMSRKR	ALLILSAAW	20	T	P
MEL1_takRu	DRYFVITRPLTSIGVLSRKR	AFVILMTVW	20	T	P
MEL1_gasAc	DRYFVITRPLTSIGMMSRRR	ALLILMGAW	20	T	P
MEL1_oryLa	DRYFVITRPLTSIGVLSRKR	ALLILSAAW	20	T	P
MEL1_calMi	DRYFVITRPLASIGVLSHRR	AGLIILSLW	20	T	P
MEL1_petMa	DRYLVLTRPLASIGAMSKRR	AMYITAAVW	20	T	P
MEL2_galGa	DRYLVITKPLRSIQWTSKKR	TIQIIAAVW	20	T	P
MEL2_anoCa	DRYCVITKPLQSIKRTSKKR	TCIIIVFVW	20	T	P
MEL2_xenLa	NRYIVITKPLQSIQWSSKKR	TSQIIVLVW	20	T	P
MEL2_danRe	DRYLVITKPLQTIQWNSKRR	TGLAILCIW	20	T	P
MEL2_tetNi	DRYVVITKPLQTIRRSSKRR	TALAILMVW	20	T	P
MEL2_gasAc	DRYLVITKPLQAIHWGSKRR	TTLAILLVW	20	T	P
MEL1_plaDu	DRFYVITNPLGAAQTMTKKR	AFIILTIIW	20	T	P
MEL1_capCa	DRYMVIAKPFYAMKHVSHKR	SLIQIILAW	20	A	P
MEL1_helRo	DRYLVVGQPLAMLNQSHFRR	SFYHVLIIW	20	G	P
MEL1_todPa	DRYNVIGRPMAASKKMSHRR	AFIMIIFVW	20	G	P
TMT_triCys	ERFITIVLPLKRDTILSTKN	IYIGLGILW	20	V	P
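
The three trailing columns of the table above appear to be derived from the first sequence column: its length, its 7th residue, and its 9th residue. That reading is inferred from the rows themselves and is not stated in the source. A minimal sketch for re-checking those columns follows; the function name `check_row` and the field names `seg1` and `seg2` are hypothetical labels introduced here for illustration.

```python
def check_row(line):
    """Recompute the derived columns of one tab-separated table row.

    Assumed layout (inferred, not documented in the source):
    name, first segment, second segment, length, 7th residue, 9th residue.
    """
    name, seg1, seg2, length, res7, res9 = line.rstrip("\n").split("\t")
    return {
        "name": name,
        # Gap characters such as '-' are counted toward the length.
        "length_ok": len(seg1) == int(length),
        "res7_ok": seg1[6] == res7,   # 7th residue, 0-based index 6
        "res9_ok": seg1[8] == res9,   # 9th residue, 0-based index 8
    }


# Three rows copied from the table, one per opsin class shown.
rows = [
    "ENCEPH_xen\tERYARVVYGKYVNSSW\tSKRSITFVW\t16\tV\tG",
    "RGR_homSap\tGRYHHYCTRSQLAWNS\tAVSLVLFVW\t16\tC\tR",
    "MEL1_homSa\tDRYLVITRPLATFGVASKRR\tAAFVLLGVW\t20\tT\tP",
]

for row in rows:
    print(check_row(row))
```

Running the same check over every row would flag any entry whose annotated residues drift from its sequence, for example after a manual edit to the alignment.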

Reference collection of structurally determined GPCRs

>RHO1_bosTau cow rod rhodopsin
MNGTEGPNFYVPFSNKTGVVRSPFEAPQYYLAEPWQFSMLAAYMFLLIMLGFPINFLTLYVTVQHKKLRTPLNYILLNLAVADLFMVFGGFTTTLYTSLHGYFVFGPTGCNLEGFFATLGGEIALWSLVVLAI
ERYVVVCKPMSNFRFGENHAIMGVAFTWVMALACAAPPLVGWSRYIPEGMQCSCGIDYYTPHEETNNESFVIYMFVVHFIIPLIVIFFCYGQLVFTVKEAAAQQQESATTQKAEKEVTRMVIIMVIAFLICWL
PYAGVAFYIFTHQGSDFGPIFMTIPAFFAKTSAVYNPVIYIMMNKQFRNCMVTTLCCGKNPLGDDEASTTVSKTETSQVAPA*

>MEL1_todPac Todarodes pacificus (squid) Gq X70498 480 11106382 Mollusca 'squid rhodopsin' 3D: May 2008 Cys 337 palmitoylated
MGRDLRDNETWWYNPSIVVHPHWREFDQVPDAVYYSLGIFIGICGIIGCGGNGIVIYLFTKTKSLQTPANMFIINLAFSDFTFSLVNGFPLMTISCFLKKWIFGFAACKVYGFIGGIFGFMSIMTMAMISI
DRYNVIGRPMAASKKMSHRRAFIMIIFVWLWSVLWAIGPIFGWGAYTLEGVLCNCSFDYISRDSTTRSNILCMFILGFFGPILIIFFCYFNIVMSVSNHEKEMAAMAKRLNAKELRKAQAGANAEMRLAKI
SIVIVSQFLLSWSPYAVVALLAQFGPLEWVTPYAAQLPVMFAKASAIHNPMIYSVSHPKFREAISQTFPWVLTCCQFDDKETEDDKDAETEIPAGESSDAAPSADAAQMKEMMAMMQKMQQQQAAYPPQGY
APPPQGYPPQGYPPQGYPPQGYPPQGYPPPPQGAPPQGAPPAAPPQGVDNQAYQA*

>ADRB1_melGal turkey beta 1 adrenergic receptor with stabilising mutations and bound cyanopindolol
MGAELLSQQWEAGMSLLMALVVLLIVAGNVLVIAAIGSTQRLQTLTNLFITSLACADLVVGLLVVPFGATLVVRGTWLWGSFLCELWTSLDVLCVTASIETLCVIAI
DRYLAITSPFRYQSLMTRARAKVIICTVWAISALVSFLPIMMHWWRDEDPQALKCYQDPGCCDFVTNRAYAIASSIISFYIPLLIMIFVALRVYREA
KEQIRKIDRASKRKRVMLMREHKALKTLGIIMGVFTLCWLPFFLVNIVNVFNRDLVPDWLFVAFNWLGYANSAMNPIIYCRSPDFRKAFKRLLAFPRKADRRLHHHHHH*

>ADRB2_homSap beta 2 adrenergic receptor 365 aa  
MGQPGNGSAFLLAPNRSHAPDHDVTQQRDEVWVVGMGIVMSLIVLAIVFGNVLVITAIAKFERLQTVTNYFITSLACADLVMGLAVVPFGAAHILMKMWTFGNFWCEFWTSIDVLCVTASIETLCVIAV
DRYFAITSPFKYQSLLTKNKARVIILMVWIVSGLTSFLPIQMHWYRATHQEAINCYANETCCDFFTNQAYAIASSIVSFYVPLVIMVFVYSRVFQEAKRQLQKIDKSEGRFHVQNLSQVEQDGRTGHGL
RRSSKFCLKEHKALKTLGIIMGTFTLCWLPFFIVNIVHVIQDNLIRKEVYILLNWIGYVNSGFNPLIYCRSPDFRIAFQELLCLRRSSLKAYGNGYSSNGNTGEQSG*

>ADORA2A_homSap adenosine A2a receptor
MPIMGSSVYITVELAIAVLAILGNVLVCWAVWLNSNLQNVTNYFVVSLAAADIAVGVLAIPFAITISTGFCAACHGCLFIACFVLVLTQSSIFSLLAIAI
DRYIAIRIPLRYNGLVTGTRAKGIIAICWVLSFAIGLTPMLGWNNCGQPKEGKNHSQGCGEGQVACLFEDVVPMNYMVYFNFFACVLVPLLLMLGVYLRI
FLAARRQLKQMESQPLPGERARSTLQKEVHAAKSLAIIVGLFALCWLPLHIINCFTFFCPDCSHAPLWLMYLAIVLSHTNSVVNPFIYAYRIREFRQTFR
KIIRSHVLRQQEPFKAAGTSARVLAAHGSDGEQVSLRLNGHPPGVWANGSAPHPERRPNGYALGLVSGGSAQESQGNTGLPDVELLSHELKGVCPEPPGLDDPLAQDGAGVS*

        ADRB2 orthologs in tetrapods             ADORA2A orthologs in tetrapods and teleosts
homSap  DRYFAITSPFKYQSLLTKNKARVIILMVW    homSap  DRYIAIRIPLRYNGLVTGTRAKGIIAICW
panTro  DRYFAITSPFKYQSLLTKNKARVIILMVW    panTro  DRYIAIRIPLRYNGLVTGTRAKGIIAICW
gorGor  DRYFAITSPFKYQSLLTKNKARVIILMVW    gorGor  DRYIAIRIPLRYNGLVTGTRAKGIIAICW
ponAbe  DRYFAITSPFKYQSLLTKNKARVIILMVW    ponAbe  DRYIAIRIPLRYNGLVTGTRAKGIIAICW
rheMac  DRYFAITSPFKYQSLLTKNKARVIILMVW    rheMac  DRYIAIRIPLRYNGLVTGTRAKGIIAICW
calJac  DRYFAITSPFKYQSLLTKNKARVIILMVW    calJac  DRYIAIRIPLRYNGLVTGTRAKGIIAICW
micMur  DRYFAITSPFKYQSLLTKNKARVVILMVW    micMur  DRYIAIRIPLRYNGLVTGTRAKGIIAICW
otoGar  DRYFAITSPFKYQSLLTKNKARVVILMVW    musMus  DRYIAIRIPLRYNGLVTGMRAKGIIAICW
tupBel  DRYFAITSPFKYQSLLTKNKARVVILMVW    ratNor  DRYIAIRIPLRYNGLVTGVRAKGIIAICW
dipOrd  DRYFAITSPFKYQSLLTKNKARVVILMVW    dipOrd  DRYIAIRIPLRYNSLVTCTRAKGIIAICW
cavPor  DRYFAITSPFKYQSLLTKNKARVVILMVW    cavPor  DRYIAIRIPLRYNGLVTCTRAKGIIAICW
oryCun  DRYFAITSPFKYQSLLTKNKARVVILMVW    speTri  DRYIAIRIPLRYNGLVTGMRAKGIIAICW
ochPri  DRYFAITSPFKYQSLLTKNKARVVVLMVW    oryCun  DRYIAIRIPLRYNGLVTGTRAKGIIAICW
equCab  DRYFAITSPFKYQSLLTKNKARVVILMVW    ochPri  DRYIAIRIPLRYNGLVTGSRAKGIIAICW
felCat  DRYFAITSPFKYQSLLTKNKARVVILMVW    turTru  DRYIAIRIPLRYNGLVTGTRAKGIIAVCW
canFam  DRYFAITSPFKYQSLLTKNKARVVILMVW    bosTau  DRYIAIRIPLRYNGLVTGTRAKGIIAVCW
myoLuc  DRYFAITSPFKYQSLLTKNKARVVILLVW    canFam  DRYIAIRIPLRYNGLVTGTRAKGIIAVCW
pteVam  DRYFAITSPFKYQSLLTKNKARVVILMVW    myoLuc  DRYIAIRIPLRYNGLVTGARAKGIIAICW
eriEur  DRYFAITSPFKYQSLLTKNKARVVILMVW    eriEur  DRYIAIRIPLRYNGLVTGQRAKGIIAVCW
sorAra  DRYFAITSPFKYQSLLTKNKARGVILMVW    loxAfr  DRYIAIRIPLRYNGLVTGTRAKGIIAVCW
proCap  DRYFAITSPFKYQSLLTKNKARVVILMVW    proCap  DRYIAIRIPLRYNGLVTGTRAKGIIAVCW
echTel  DRYFAITSPFKYQSLLTKNKARVVILMVW    galGal  DRIIAIRIPLRYNGLVTGSRAKGIIAICW
dasNov  DRYFAITSPFKYQSLLTKNKARVVILMVW    taeGut  DRIIAIRIPLRYNGLVTGSRAKGIIAICW
monDom  DRYFAITAPFRYQSMLTKGKARVVILVVW    xenTro  DRYIAIRIPLRYNSLVTSRRANAIIAVCW
galGal  DRYFAITSPFKYQSLLTKSKARVVILVVW    tetNig  DRYIAIKLPLRYNGLVTGQRAQAIIAICW
taeGut  DRYFAITSPFKYQSLLTKGKARVVILVVW    tetRub  DRYIAIKLPLRYNSLVTGKRAQGIIAICW
anoCar  DRYFAITSPFKYQSHLTKNKARVIILLVW    gasAcu  DRYIAIKIPLRYNGLVTGQRAQGIIAICW
xenTro  DRYFAITSPFRYQSLLTKCKARIVILLVW    oryLat  DRYIAIKIPLRYNSLVTSQRARGIIAICW
                                         danRer  DRYIAIKIPLRYNSLVTGQRARGIIAICW