Opsin evolution: RBP3 (IRBP): Difference between revisions
Tomemerald (talk | contribs) |
(15 intermediate revisions by the same user not shown) | |||
'''See also:''' [[Opsin_evolution|Curated Sequences]] | [[Opsin_evolution:_RPE65|RPE65]] | [[Opsin_evolution:_transducins|Transducins]] | [[USH2A_SNPs|Usher: USH2A]] | [[CDH23_SNPs|Usher: CDH23]] | [[LOXHD1_SNPs|LOXHD1]] | [[Opsin_evolution:_update_blog|Update Blog]]
== RBP3 (IRBP): introduction ==
Interstitial retinol-binding protein, inexplicably named RBP3 by [http://www.genenames.org/ HGNC] despite a lack of paralogs RBP1 or RBP2, likely confusion with ribosomal binding protein genes, and widespread prior use of the protein name IRBP, is a 4-exon, 1247-residue glycoprotein thought to shuttle retinoids interstitially between the photoreceptor cells and the retinal pigment epithelium (RPE). This role would only make sense for ciliary opsin systems that are unable to regenerate cis-retinal without an auxiliary pathway in an anatomically separate tissue (here the RPE). Consequently -- since nearly all protein folds are extremely ancient -- RBP3 must have been co-opted from some other role.
The protein's size results from four ancient internal tandem duplications that became established prior to intronation (that is, the gene structure does not reflect the repeat structure; the repeats happened first, and introns were inserted randomly later within the fourth repeat). Any given repeat module clusters markedly better with the same-numbered repeat in other species than with any of the other internal repeats, establishing that the repeats had already arisen and diverged prior to speciation rather than arising independently in descendant lineages (like RHO1 and RHO2 in lamprey).
It was initially expected that IRBP, as a self-contained homotetramer, would load four molecules of trans-retinol to accomplish its passive shuttling efficiently. However, experiments decisively show this to be false -- the subunits are quite inequivalent and only one molecule is transported. The structure of module 2 of frog was determined in 2002 (PDB: 1J7X); it consists of two sub-domains and a cleft. The fold matches obscure proteases and hydrolases found in little-studied bacteria; this might represent convergent evolution, as seen in TIM beta barrels, rather than provide valid clues to ancestral function. A structural determination of the entire molecule is reportedly underway.
Fragments of the protein have been sequenced from an immense number of species for phylogenetic purposes. That is because it evolves fast enough to provide a large number of seemingly informative sites, and these can be obtained conveniently by exploiting the large size of the first reading frame. While there is minimal risk of accidentally cross-matching different modules, the internal repeat structure implies that residues might not be evolving independently. While the modules might seem too diverged for cross-module gene conversion to still be operative, they retain patches of near identity.
Some very odd aspects of marsupial RBP3 are investigated below because of their implications for a gene so widely used in taxonomy. Teleost fish further raise the question of inhomogeneous recombination events arising from module mix-up. In fish, the ancestral four-module gene has [http://www.molvis.org/molvis/v12/a180/ given rise] to a two-module gene (M1 and M4) accompanied by an intronless upstream tandem fragment duplicate of M1-M3 without any indication of M4 in the residual 10 residues. Both genes are transcribed and apparently functional, since the establishment event occurred prior to zebrafish and fugu divergence, though the upstream gene has been lost in many lineages including tetraodon, medaka, and stickleback but not cichlids.
The upstream fragment suggests a truncated version of the first large exon. The lack of introns does not suggest retroprocessing, in view of their absence from the corresponding region of the parental gene and the adjacent location, but rather standard recombinational or flanking retroposon-driven tandem duplication. The loss of M2 and M3 in the downstream gene but retention of standard introns again suggests recombinant loss, either as part of the initial tandem event or subsequent to it as a consequence of the new potential for exact misalignment. Note that otherwise the four modules would have been quite diverged from each other prior to the emergence of ray-finned fish.
=== IRBP module alignment, conservation and structural aspects ===
The image below aligns the four module types from human to lamprey and amphioxus, with yellow showing exceptionally conserved residues and cyan moderately conserved ones. The number of species shown could be greatly expanded, especially for module 1, but for clarity only a phylogenetically representative set is displayed. Module boundaries vary slightly according to the bioinformatics tool used to define the domain (e.g. crystallographic homology transfer, SuperFamily, Pfam, SwissProt, blastp of reference domains) but can be standardized by counting back a fixed number of residues N-terminally from the universally conserved tyrosine and forward from the C-terminal alanines, because conserved residues must occupy near-identical positions within the tertiary structure.
It can be seen that M2 has two variable-length insertional regions, no doubt relatively disordered loops. Amphioxus additionally has inserted residues in three regions that are suppressed in the figure -- observe that although its overall identity is low, identity is high in conserved residues. The modules each have distinctive class indels and characteristic residues that allow them to be readily distinguished from each other. These features were most likely established prior to lamprey divergence rather than later by convergence. Only two conserved patches are observed, GNvGLYRvD and RAivVGErT in human; other conserved residues are dispersed (at least within the linear structure).
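The per-column conservation that the yellow and cyan shading summarizes can be approximated with a simple majority-residue score. The following is a minimal sketch, not the method used to color the figure: the function names and the 1.0 and 0.75 thresholds are assumptions chosen for illustration, and the four short strings are the conserved-patch region copied from M1_macEug, M1_xenLae, M2_galGal and M2_homSap in the alignment on this page.

```python
from collections import Counter


def column_conservation(alignment):
    """Per column, return the fraction of non-gap residues that equal
    the most common residue in that column (0.0 for all-gap columns)."""
    if not alignment:
        return []
    width = len(alignment[0])
    if any(len(row) != width for row in alignment):
        raise ValueError("all aligned sequences must have the same length")
    scores = []
    for column in zip(*alignment):
        residues = [c for c in column if c != "-"]
        if not residues:
            scores.append(0.0)
            continue
        top_count = Counter(residues).most_common(1)[0][1]
        scores.append(top_count / len(residues))
    return scores


def classify(scores, high=1.0, moderate=0.75):
    """Label each column; the thresholds are illustrative assumptions."""
    labels = []
    for s in scores:
        if s >= high:
            labels.append("high")
        elif s >= moderate:
            labels.append("moderate")
        else:
            labels.append("variable")
    return labels


# Conserved-patch region from four sequences in the alignment on this page.
patch = ["GNVGYLRVD", "GNIGYLRID", "GNVGYMRFD", "GNVGYLRFD"]
patch_scores = column_conservation(patch)
patch_labels = classify(patch_scores)
```

A majority-residue score ignores chemical similarity between residues, so a substitution-matrix-based score would likely be a better match for a published figure.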
The sole [http://tinyurl.com/cetu3a structural determination] to date, 1J7X_A, the single module M2 of Xenopus laevis to 1.8 Å, describes two domains separated by a hydrophobic binding cleft (based on analogy to very similar protease and hydrolase folds). The smaller domain A is an amino-terminal three-helix bundle that extends 8 residues past the M2 insertion region (to PGDSIQAEN in xenLae). M2 was an unfortunate choice because it is the most diverged module and bears the least resemblance to early-diverging or ancestral sequence.
The observed fold similarities could not plausibly have arisen by chance. Crotonase has a binding site for hydrophobic chains of fatty acids and isomerizes them, with possible relevance in RBP3 to cis and trans retinoid stabilization and transport; however, IRBP does not appear to be an enzyme or isomerase itself, and RPE65 carries out the regenerative isomerization reaction.
M2 may have two distinct binding sites, one in the cleft between domains A and B and the other solely inside domain B. The assignment of conserved tryptophans to these sites and other conserved residues to the surface (suggesting a protein binding partner, possibly intra-module binding) has to be revisited in view of the much deeper phylogenetic array of sequences now available.
Despite subunits quite diverged in sequence, a more likely arrangement is an anti-parallel dimer of M1-M2-M3-M4 to M4-M3-M2-M1 (as seen in [[Personal_genomics:_ACTN3#R577x_and_co-evolution_of_actinin_spectrin_repeats|actinin spectrin repeats]]). This arrangement allows each module to dimerize with quasi-dihedral symmetry, leaving no docking sites vacant. Because the binding patches are in paralogous modules, this explains why observed surface residue conservation in frog M2 carries over to the other modules.
If true, this implies co-evolution of amino acids in these binding patches on top of what already occurs internally to an individual module. All phylogenetic software assumes -- ridiculously -- independent evolution of sites. However, like morphological characters, molecular characters can have dependencies that amount to couplings and coordination between individual reduced alphabets. Here the much-studied M1 module has surface residues co-evolving with counterparts on M4.
IRBP has only a single paralogous site with glycosylation potential (in [http://www.ncbi.nlm.nih.gov/sites/entrez?cmd=Retrieve&db=protein&dopt=GenPept&RID=V21ZK4W8012&log%24=protalign&blast_rank=1&list_uids=20150355 beta strand 8] of the hinge region of domain B, at residues 204 and 515, which are paralogous under module alignment). Although the NxT motif occurs with excellent phylogenetic conservation depth (human to lamprey, also in fish fragmentary genes), only modules M1 and M2 exhibit it. Given the interstitial location and experimental support for glycoprotein nature, IRBP is very likely glycosylated. Bulky carbohydrate chains need to be out of the way of the putative dimerization patches.
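Candidate N-linked glycosylation sites of this kind can be located by scanning for the standard sequon N-x-S/T, where x is any residue except proline. The sketch below assumes an ungapped protein sequence as input; the function name and the test string are illustrative and are not taken from IRBP. A match indicates only glycosylation potential, not that the site is actually modified.

```python
import re

# N, then any residue except P, then S or T. The lookahead lets
# overlapping sequons (e.g. "NNTT") each be reported.
SEQUON = re.compile(r"(?=N[^P][ST])")


def find_sequons(protein):
    """Return 1-based positions of the asparagine in each N-x-S/T sequon.

    Assumes an ungapped sequence; strip alignment gaps ("-") first.
    """
    return [m.start() + 1 for m in SEQUON.finditer(protein.upper())]


# Illustrative string only: NGT matches, NPS is excluded, NAS matches.
example_sites = find_sequons("AANGTAANPSANAS")
```

To restrict the scan to the stricter NxT form discussed above, the final character class can be narrowed from `[ST]` to `T`.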
[[Image:IRBPglycosyl.png]]
The situation with cysteines is not so clear. Here M1 has a cysteine at position 19 that is conserved back to lamprey (also in fish fragmentary genes) but is A/T/S in other modules. A second conserved cysteine is found only in modules M2 and M3 at position 163 (human M1 numbering, where the residue is I). M4 has no conserved cysteines. A few other cysteines occur elsewhere but only sporadically in the comparative genomics sense. The cysteines in M2 and M3 could potentially form a disulfide in the antiparallel dimer model discussed above. Inter-modular disulfides are not feasible if modules are arranged linearly. No explanation is currently available for the conserved cysteine in M1.
An initial study of knockout mice disastrously used a strain carrying a [http://www.ncbi.nlm.nih.gov/pubmed/19074801 severe mutation] at L450 in [[Opsin_evolution:_RPE65|RPE65]], the enzyme catalyzing the rate-limiting isomerization step in the visual regeneration cycle. Lack of IRBP alone delays transfer of both bleached chromophore from photoreceptor cells and newly regenerated chromophore from RPE to photoreceptors.
A single known [http://www.ncbi.nlm.nih.gov/pubmed/19074801 disease mutation], D1080N, causes a very rare recessive retinitis pigmentosa. Loss of this aspartate may disrupt an internal salt bridge according to the Xenopus M2 structure (and homology transfer to other modules). It lies in M4 near the end of the second exon. This residue is conserved back to lamprey and amphioxus, indeed to crotonase and other weakly aligning homologs.
M1_macEug IFQPSLVLDMAKILLDNYCFPENLMGMQEAIEQAIKSGEILDISDPQMLASVLTAGVQGSLNDPRLVISYEPSP-----------AEAPQQSPKLTSLTQEELLTLLQQMIKYQVLDGNVGYLRVDYIPGQEVVEKVGEFLVNDIWKKLMGTSSLVLDL
M1_sacHar IFQPTLVLDMAKILLDNYCFPENLMGMQEAIEQAIKSGEILDISDPQMLANVLTAGVQGSLNDPRLVISYEPST-----------SEAPQHDPKFANATQEELLALFQKIIKYQVLEDNVGYLRVDYIPGRDMIEEVGEFLVNDIWKKVMETSSLVLDL
M1_ornAna VSQPSMVLDVAKILLDNYCYPENLMGMQEAIEEAIQRGEILDIADPKRLASVLTAGVQGSLNDPRLVISYEPAP-----------VAVSQQPPEPASLPAEQPLERLRPAVGSEVLEGNVGYLRVDRLPGREEIERVGAVLGRDIWEKLLGTSALVLDL
M1_galGal IFQPTLVLDMAKVLLDNYCYPENLVGMQEAIEQAIKSGEILDISDPKMLANVLTAGVQGALNDPRLVISYEPSL-----------HAAPKQEAE-TYPTREQLLSLIEHVVIYDKLEGNVGYLRIDYIIGQEVVEKVGAFLVDKVWKTLINTSALVIDL
M1_taeGut IFQPTLVLDMAKVLLDNYCYPENLVGMQEAIEQAIKSGEILDISDPKMLANVLTAGVQGALNDPRLVISYEPLP-----------HSGPKQEAE-GSPTREQLLSLIEHVIMYDKLEGNVGYLRIDYIIGEEVVQKVGAFLVDKVWKTLIETSALVIDL
M1_anoCar VLQSTLVLDMAKLLLDNYCLPENLVGMREAIEQAIKNGEVLDISDPKLLATVLTAGVQGALNDPRLVISYEPTA-----------PAAPKQRME-TSLTPEQLLSLIQHTVKYEVLDDNVGYLRIDYIMGQDIVQKIGSFLVEKVWKTLLGTSALILDL
M1_xenLae LFQPSLVMDMAKVLLDNYCFPENLVGMQETIEQAVKGGEILHISDPDTLANVFTSGVQGYLNDPRLVVSYEPN-------------YSGPQTEQSLELTPEQLKFLINHSVKYDILPGNIGYLRIDFIIGQDVVQKVGPHLVNNIWKKLMPTSALILDL
M1_xenTro VFQPSLVMDMAKVLLDNYCFPENLVGMQETIEQAMKSGEILHISDPETLANVFTSGVQGFLNDPRLVVSYEPN-------------YSGPRKEQSPEPTLEQLKFLLDHSVTYDLLPGNIGYLRIDFIIGQDVVQKVGPLLVNNIWKKLMPSSALILDL
<font color="red">M2_homSap SALPGVVHCLQEVLKDYYTLVDRVPTLLQHLASM----DFSTVVSEEDLVTKLNAGLQAASEDPRLLVRAIGPTETPSWPAPDAAAEDSPGVAPELPEDEAIRQALVDSVFQVSVLPGNVGYLRFDSFADASVLGVLAPYVLRQVWEPLQDTEHLIMDL
M2_bosTau RALPGVIQRLQEALREYYTLVDRVPALLSHLAAM----DLSSVVSEDDLVTKLNAGLQAVSEDPRLQVQVVRPKEASS--GPEEEAEEPPEAVPEVPEDEAVRRALVDSVFQVSVLPGNVGYLRFDSFADASVLEVLGPYILHQVWEPLQDTEHLIMDL
M2_monDom RARPGAIQRLMEVLQNYYTLVDRVPALLHHLTAI----DYSSVLTEEDLAAKLNAGLQAVSEDPRLLVRVLRPEEATM-GSEASEEEDATPAANSLPEDESQRQALVDSVFQVSVLPGNVGYLRFDEFADSSVLGTLAPYVIRQVWEPLQDTNHLIMDL
M2_sacHar RARPGAIQRLMEILQKYYTLVDRVPALLHHLTAI----DYSSVLTEEDLAAKLNAMLQAVSEDPRLLVRVLRPEEATV--EAEPGEESATPASVSLPESDAERQALIDSVFQVSVLPGNVGYLRFDEFADNSVLGTLAPYVLRQVWEPLQDTDHLIMDL
M2_macEug RARPAAIQRLMEVLQNYYTLVDRVPALLHHLTAI----DYSSVLTEEDLAAKLNAGLQAVSEDPRLLVRVLRPEEATA--GAESREEAATAAPVPLPDGESQRQALVNSVFQVSVLPGNVGYLRFDEFADSSVLGALAPYVLQQVWEPLQDTDHLIMDL
M2_ornAna GAVPGAVAHLADLLRDYYALVDRVPALLRHLAAL----DLSSVLSEEDLTSRLNAGLQAASEDPRLLVRRLEPEEAERGPPRKEEEQKEEEEEDQPSPGASILPGDGSSLFRVSVLPGNVGYLCFDEFPEASALERLGPLLGRRVWEPLEATDHLMVDL
M2_galGal RAVPGTLSRLTDILKDYYSLVERVPVLLRHLTTS----DFSSVQSAEDLATKLNTEMQTLSEDPRLLVRTMMPGEA-----AAPPAEMPIAMAANLPDNEQLLHALVDTVFKVSVLPGNVGYMRFDEFADASVLVKLGPYIVKKVWEPLQNTENLIMDL
M2_taeGut RAVPGTISHLKNILKDYYSLVERVPALLRRLTTS----DFSSVQSSEDLATKLNTELQALSDDPRLMVRVMMPGEA-----ADSPAEKPVGMAADLPDNEQLLHALVDTVFKVSVLPGNVGYMRFDEFADASVLVKLGPYLVHKVWEPLQNTENLIMDL
M2_anoCar KAIPNSMSYLVDIIKNNYSMLEQVPVLLQHLSTF----DYSSVLSVKDLASKLNAELQTISEDPRLFLRVPASDEA-----VTSQTDEKVAMASDLPNNEQLMKALVMTVFKVSVLPGNVGYMRFDEFGDATVLVKLGPYLLQHVWEPLQATDYLIIDL
M2_xenTro SSITHILLQLSEILVNNYAFSERIPTLLQHLPNL----DYSSVISEEDITAKLNYELQSLTEDPRLVLKSKTDSLV---------MPEDSTQVENLPDDEATLQALVNTVFKVSILPGNIGYLRFDEFADVSVLAKLGPYIVNTVWDPITVTENLIIDL
M2_xenLae SSVTHVLHQLCDILANNYAFSERIPTLLQHLPNL----DYSTVISEEDIAAKLNYELQSLTEDPRLVLKSKTDTLV---------MPGDSIQAENIPEDEAMLQALVNTVFKVSILPGNIGYLRFDQFADVSVIAKLAPFIVNTVWEPITITENLIIDL
=== Evolutionary origin of RBP3 (IRBP) ===
Lamprey sequence was recovered by a French group in 2008 from unassembled genome project contigs but not explicitly provided in the [http://www.ncbi.nlm.nih.gov/pubmed/18499481 article] nor posted to GenBank. They reported four modules despite uncertainty as to whether these all resided in the same gene. Below, a nearly full-length gene (exons 3 and 4 are missing) has been independently recovered from the initial lamprey assembly and parsed into its modules using the Superfamily HMM. It indeed has four modules classifying to the expected types and ordering, proving the modern version of the gene was fully established prior to the last common ancestor of mammals and lamprey. As the lamprey ancestor already had fully modern ciliary color vision (hence a need for retinol shuttling), the role of RBP3 (IRBP) in extant species may have already been established 500 myr ago.
It cannot yet be verified that lamprey has the third intron, though otherwise intron positions and phases are identical to other vertebrates. While no full-length chondrichthyes sequence is available, close study of the elephant shark contig containing exon 3 establishes that the final intron occurs with the same phase 0 and position seen in later-diverging tetrapods. While intron loss in lamprey is possible, the simpler explanation is intron gain in the stem for reasons provided below.
Callorhinchus milii contig AAVX01012059 establishes that exon 3 has standard flanking phase 1 introns on both sides:
LIIETNSL-RDHR<font color="blue">YNIGGPTSSIPILCSYFFDDDKTVLLDTVYSRPTDTISEMKAIPQVAGNGSTESSVHSYI</font>CEDLHHCKYGLII
Previous efforts to trace the gene back to earlier deuterostomes (or metazoa) proved futile. That remains the case in March 2009 for tunicate (despite a third assembly and massive transcript set) and the much-studied sea urchin, establishing that in all likelihood homologs have been lost. However, it is possible but implausible that sequences have diverged to the point of unrecognizability. It is equally implausible that a complex fold matching known bacterial folds arose de novo in Cambrian deuterostomes. This creates the unresolved dilemma of ghost gene retention over an immense time frame in ancestral eyeless species yet loss in almost all extant clades.
With the advent of the second assembly of the cephalochordate Branchiostoma floridae (which has far fewer polymorphism-related assembly stutters than the first release), it is straightforward to recover a homolog in this species (after discarding an unmotivated fusion to the adjacent sulfotransferase in a JGI gene model). The smaller gene here consists of a single module that clusters best with M3 and M4. This suggests that the module number expansion -- like so many key events in vertebrate gene evolution -- took place between cephalochordate and agnathan divergences.
The shocking aspect of the Branchiostoma RBP3 (IRBP) gene is its 9 exons. These range in size from 30 to 66 amino acids, quite typical of the average vertebrate gene. Splice junctions are all standard GT-AG. The anomaly here is really in the immense size of the first exon of the vertebrate four-module gene, which extends for 1018 residues in human. That can be placed in perspective using the UCSC [http://genome.cse.ucsc.edu/cgi-bin/hgTables Table Browser] to determine the overall size distribution of the 190,000-odd human coding exons. The average protein has 450 residues and 8-9 exons.
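Once a list of coding exon sizes has been exported, placing the 1018-residue exon in that distribution is a matter of simple arithmetic. The sketch below is illustrative only: the function name is hypothetical, the short list of exon sizes is made up (apart from the 30, 66 and 1018 values quoted above), and it does not parse any actual Table Browser output.

```python
def fraction_shorter(exon_sizes_aa, query_aa):
    """Fraction of exons strictly shorter than the query exon (sizes in residues)."""
    if not exon_sizes_aa:
        raise ValueError("need at least one exon size")
    shorter = sum(1 for size in exon_sizes_aa if size < query_aa)
    return shorter / len(exon_sizes_aa)


# Made-up sample standing in for a real genome-wide list of coding exon sizes.
sample_sizes = [30, 45, 52, 66, 1018]
rank_of_first_exon = fraction_shorter(sample_sizes, 1018)

# Typical exon size implied by the figures in the text:
# about 450 residues spread over 8-9 exons.
typical_exon_aa = 450 / 8.5
```

With 450 residues spread over 8 to 9 exons, a typical coding exon is on the order of 50 residues, roughly twentyfold smaller than the first exon of human RBP3.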
The second anomaly is that the placement and phasing of the Branchiostoma introns do not correspond at all (via blastp registration) to those of module 4, the only vertebrate module with internal exons (3 of them). The great majority of introns are immensely conserved -- from human to cnidarian -- so the notion of massive erasure followed by de novo intronation in either the Branchiostoma or vertebrate gene can be discarded. However, some explanation of descent is required because the genes are strongly homologous (chance expectation e-30) though at best 31% identical and gappy in alignment.
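Intron placement and phase follow directly from the coding lengths of the exons: under the usual convention, a phase 0 intron falls between codons, a phase 1 intron after the first base of a codon, and a phase 2 intron after the second base. The following minimal sketch uses a hypothetical function name and made-up exon lengths rather than the actual Branchiostoma or human gene models.

```python
def intron_positions(cds_exon_lengths_nt):
    """Return (complete_codons_before_intron, phase) for each intron.

    Input is the coding length in nucleotides of each exon, in
    transcript order; the last exon is followed by no intron.
    """
    positions = []
    coding_so_far = 0
    for length in cds_exon_lengths_nt[:-1]:
        coding_so_far += length
        positions.append((coding_so_far // 3, coding_so_far % 3))
    return positions


# Made-up three-exon gene: introns after 100 nt and after 150 nt of coding sequence.
example_introns = intron_positions([100, 50, 30])
```

Two genes share an intron only when both the aligned residue position and the phase agree, which is why positions computed this way must first be mapped onto a common protein alignment before comparison.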
The Branchiostoma gene can be assumed orthologous to vertebrate IRBP for lack of a better candidate, but nearby genes in the assembly (EARS2 DNAJC19 UGT2B4 CD79B) bear no obvious synteny to those flanking the human RBP3 gene (RBP3 ZNF488 GDF2 GDF10 ANXA8L1) and only broken chromosomal correspondence is seen in whole genome alignment to human (net track).
It follows that searches for earlier diverging species that still carry a homolog should probably be carried out with a single-domain protein as query. The amphioxus protein is the only one currently available and may be highly diverged from its ancestral form. In any event, it does not expose any cryptic homologs in tunicate or echinoderm, much less lophotrochozoa, arthropods or cnidaria known to have ciliary opsin systems.
Under the twin assumptions of orthology and an approximately ancestral intronation pattern in Branchiostoma, how do we get from a one-module gene with 8 introns to a four-module gene with 3 unrelated introns by lamprey divergence? This in some ways may have been the critical step in the evolution of imaging eyes, with their massive requirement for rapid retinol recycling that likely vastly exceeded that of the ancestral cephalochordate (note RPE65 is [http://www.ncbi.nlm.nih.gov/pubmed/19193895 rate-limiting]). IRBP co-evolved its interstitial location and shuttling function with the newly developed supporting role of the retinal pigmented epithelium, which together were essential for effective vertebrate imaging vision.
One hypothetical scenario is formation first of a retroprocessed intronless gene. This may have displaced the nine-exon parental gene because of selection for | One hypothetical scenario is formation first of a retroprocessed intronless gene with one module. This may have displaced the nine-exon parental gene because of selection for higher throughput translation. The protein may have been initially a non-cyclic tetramer of discrete subunits. The gene dosage perhaps doubled via a tandem copy, again to meet the need of increased retinol cycling. Next the intervening untranslated region experienced deletions fusing the previously independent modules, leading to a dimer of the internal repeat dimer at the protein level. | ||
While this scenario is | The gene complex then doubled again, perhaps through recombinational module mismatch, once again to meet the newly evolving need for rapid retinol recycling -- at this point all four modules may have bound retinol. New introns were then gained by an unexplained process that favored the 3' end of the gene. Finally, a need for allosteric regulation or interaction with other proteins allowed subfunctionalization of other domains to non-retinol shuttling roles, despite the apparent loss of the previous selection for efficiency because RPE65 had become rate-limiting. | ||
While this scenario is speculative and non-unique, elements of it are not uncommon in other genes, even in opsins. For example, a retroprocessed fish RHO1 [http://www.ncbi.nlm.nih.gov/pubmed/18466202 displaced] the parental 5-exon gene into the pineal, and LWS repeatedly spawned secondary cone opsin genes that were initially tandem. A great many human proteins have internally repeated domains, and some, like titin, have quite large numbers of them. There may not exist sufficient surviving members of the cephalochordate, hagfish and lamprey clades to specifically illuminate what really happened here. The highest priority would be to study localization and function of RBP3_braFlo within the visual system of that organism. | |||
>RBP3_braFlo Branchiostoma floridae Region: 9 exons; 1 domain: 83-381 | >RBP3_braFlo Branchiostoma floridae Region: 9 exons; 1 domain: 83-381 | ||
Line 223: | Line 227: | ||
0 VPGGIRFPDMPLYLLTSNRTSREAEEFAYAMQVVNRTTIIGETT 1 | 0 VPGGIRFPDMPLYLLTSNRTSREAEEFAYAMQVVNRTTIIGETT 1 | ||
2 AGEEFTGMWFPIDQTDVHLLTRTNVVRNPITQDSWSGK 1 | 2 AGEEFTGMWFPIDQTDVHLLTRTNVVRNPITQDSWSGK 1 | ||
2 GVTPDIIVPSEKALTVALRKIQ</font>GSEDTKMAASSGNIEPPRWTVYLVFICTSIAILTYPTFM* | 2 GVTPDIIVPSEKALTVALRKIQ</font>GSEDTKMAASSGNIEPPRWTVYLVFICTSIAILTYPTFM* | ||
=== RBP3 (IRBP) use in marsupial phylogeny === | === RBP3 (IRBP) use in marsupial phylogeny === | ||
Line 231: | Line 235: | ||
The use of IRBP in phylogenetic trees requires a high level of sequence accuracy, particularly since 'informative' characters are precisely the differences. However the PCR survey sequences exhibit various discrepancies in comparison to very high coverage genomic sequences. Some differences could arise from valid polymorphisms in individual animals used as dna source or represent an acceptable (low) level of PCR sequencing error, but other discrepancies appear to be gross errors in GenBank submissions (all from the same research group). | The use of IRBP in phylogenetic trees requires a high level of sequence accuracy, particularly since 'informative' characters are precisely the differences. However the PCR survey sequences exhibit various discrepancies in comparison to very high coverage genomic sequences. Some differences could arise from valid polymorphisms in individual animals used as dna source or represent an acceptable (low) level of PCR sequencing error, but other discrepancies appear to be gross errors in GenBank submissions (all from the same research group). | ||
None of the | The Sarcophilus entry AY532685 has 6 amino acids inconsistent with genomic data (too many for polymorphic variation) as well as a gross error affecting a 17 residue block. It seems likely that, in the rush to obtain enough 'informative' characters, a completely unworkable error rate has been tolerated. All marsupial phylogenetic studies based in part on analysis of IRBP thus come into question. | ||
None of the fragmentary marsupial GenBank entries provide a numbering system relative to a full length marsupial IRBP gene (eg Monodelphis) or its modules; the 4x module repeat structure of the gene is conducive to erroneous module cross-alignment. cDNA sequences typically start at position 73 of M1 (beginning of first hyper-variable indel region) and continue half way through M2 (to position 143), for example EF028750 of Myrmecobius fasciatus (numbat). Thus the region commonly sequenced is poorly coordinated with internal modular structure or existing 3D data. | |||
The closest matches to the thylacine IRBP are shown in the difference alignment of the first 60 residues below. These species all lie with the Dasyuromorphia. The indicated E-->K may be one of several phyloSNPs breaking this group into <font color="blue">blue</font> and <font color="green">green</font> subclades. The numbat <font color="red">Myrmecobius</font> fits implausibly (its amino terminal sequence [http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nuccore&id=124054062 EF028750] needs verification) -- its affinities seem to lie with the <font color="brown">Didelphimorphia</font> in view of the shared Q.VV.K motif. Thylacinus is not basal within Dasyuromorphia relative to Myrmecobius using IRBP. This is not a case of mis-comparison of modules. | The closest matches to the thylacine IRBP are shown in the difference alignment of the first 60 residues below. These species all lie with the Dasyuromorphia. The indicated E-->K may be one of several phyloSNPs breaking this group into <font color="blue">blue</font> and <font color="green">green</font> subclades. The numbat <font color="red">Myrmecobius</font> fits implausibly (its amino terminal sequence [http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nuccore&id=124054062 EF028750] needs verification) -- its affinities seem to lie with the <font color="brown">Didelphimorphia</font> in view of the shared Q.VV.K motif. Thylacinus is not basal within Dasyuromorphia relative to Myrmecobius using IRBP. This is not a case of mis-comparison of modules. | ||
Line 351: | Line 357: | ||
It emerges from direct tblastn that the Sarcophilus individual sequenced was female. That is, ATRX is well represented but not ATRY (though the situation is somewhat confused due to additional paralogs). Marsupial XY are [http://www.ncbi.nlm.nih.gov/pubmed/11173870,12508115,16209912,17333539,18185981,9215558 quite different] from placentals: | It emerges from direct tblastn that the Sarcophilus individual sequenced was female. That is, ATRX is well represented but not ATRY (though the situation is somewhat confused due to additional paralogs). Marsupial XY are [http://www.ncbi.nlm.nih.gov/pubmed/11173870,12508115,16209912,17333539,18185981,9215558 quite different] from placentals: | ||
<blockquote> | <blockquote> | ||
"Many or most genes on the mammal Y chromosome evolved a testis-specific function after diverging from an X-borne copy with a general function in both sexes. In marsupial but not eutherian mammals, a testis-specific | "Many or most genes on the mammal Y chromosome evolved a testis-specific function after diverging from an X-borne copy with a general function in both sexes. In marsupial but not eutherian mammals, a testis-specific ortholog (ATRY) of the widely expressed X-borne ATRX gene lies on the Y chromosome. Since mutations in human ATRX cause sex reversal, it is possible that one function of ATRY in marsupials is testicular differentiation. We report here the isolation and sequencing of the tammar wallaby (Macropus eugenii) ATRY cDNA, and comparison of its sequence with that of tammar ATRX. The evolution of a testis-specific function for the ATRY protein distinct from the general role of ATRX in both sexes has been accompanied by sequence changes in many protein domains that would alter protein binding partners. A large open reading frame encodes a 1771 amino acid ATRY protein that has diverged extensively from ATRX. The conservation and loss of particular motifs identify those required for testicular function (ATRY) and function in other tissues (ATRX)."</blockquote> | ||
== Reference sequences == | == Reference sequences == | ||
Line 393: | Line 399: | ||
NFSPTLIADMAKIFMDNYCSPEKLTGMEEAIDAASSNTEILSISDPTMLANVLTDGVKKTISDSRVKVTYEPDLILAAPPAMPDIPLEHLAAMIKGTVKVEILEGNIGYLKIQHIIGEEMAQKVGPLLLEYIWDKILPTSAMILDFRSTVTGELSGIPYIVSYFTDPEPLIHIDSVYDRTADLTIELWSMPTLLGKRYGTSKPLIILTSKDTLGIAEDVAYCLKNLKRATIVGENTAGGTVKMSKMKVGDTDFYVTVPVAKSINPITGKSWEINGVAPDVDVAAEDALDAAIAII | NFSPTLIADMAKIFMDNYCSPEKLTGMEEAIDAASSNTEILSISDPTMLANVLTDGVKKTISDSRVKVTYEPDLILAAPPAMPDIPLEHLAAMIKGTVKVEILEGNIGYLKIQHIIGEEMAQKVGPLLLEYIWDKILPTSAMILDFRSTVTGELSGIPYIVSYFTDPEPLIHIDSVYDRTADLTIELWSMPTLLGKRYGTSKPLIILTSKDTLGIAEDVAYCLKNLKRATIVGENTAGGTVKMSKMKVGDTDFYVTVPVAKSINPITGKSWEINGVAPDVDVAAEDALDAAIAII | ||
>M1_petMar | >M1_petMar | ||
KFDTAVVLHLAKVLLDNYCIPENLVGMDEAIQRAVDNGELLGVSDPESAASALTEGIQAALNDPRIAVSYVPDVDDDGDREEGDAEGWDAGEQHRPTTFEELLATIPQKTSFAVLDGNVGYLRADEIISEATIKKLGPVIVQRIWNRLVDTDTFVLDLRYNSHGDITGLPYLVSCFCEPRPVVHLDTVYYRPTNESKEIWSLPDLQGARFAKHKDVFVLVSANTEGVAENVAYVLKHLHRATVIGEQTAGGSLEVERFRLGDSRFFVTVPTARSQSPLTGRSWELTGVFPCVSAPSERALDKALEIL | |||
>M1x_takRu | >M1x_takRu | ||
FYQHTLVLEMAKLLLENYCIPENLVGMQEAIQRAIKSREILQISDRKTLATVLTVGVQGALNDPRLSVSYEPSFSPLPLQALSSLPVEQQLRLLRNSIKLDILDSDVGYLRIDRIIDEETLLKFGPLLRENVWDKAAQTSSLILDLRFSTAGGWSGIPSIVSYFTEPHSLVHIDTVYDRPSNTTTELWTMSSVRGKTFGGKKDMIVLIGRRTAGAAEAVAYTLKHLNRAIVVGERSAGGSLKVRKFRIAESDFYITMPVARSVSPITGKSWEVSGISPTVNVAAREALAKAQTFL | FYQHTLVLEMAKLLLENYCIPENLVGMQEAIQRAIKSREILQISDRKTLATVLTVGVQGALNDPRLSVSYEPSFSPLPLQALSSLPVEQQLRLLRNSIKLDILDSDVGYLRIDRIIDEETLLKFGPLLRENVWDKAAQTSSLILDLRFSTAGGWSGIPSIVSYFTEPHSLVHIDTVYDRPSNTTTELWTMSSVRGKTFGGKKDMIVLIGRRTAGAAEAVAYTLKHLNRAIVVGERSAGGSLKVRKFRIAESDFYITMPVARSVSPITGKSWEVSGISPTVNVAAREALAKAQTFL | ||
Line 404: | Line 411: | ||
RALPGVIQRLQEALREYYTLVDRVPALLSHLAAMDLSSVVSEDDLVTKLNAGLQAVSEDPRLQVQVVRPKEASSGPEEEAEEPPEAVPEVPEDEAVRRALVDSVFQVSVLPGNVGYLRFDSFADASVLEVLGPYILHQVWEPLQDTEHLIMDLRQNPGGPSSAVPLLLSYFQSPDASPVRLFSTYDRRTNITREHFSQTELLGRPYGTQRGVYLLTSHRTATAAEELAFLMQSLGWATLVGEITAGSLLHTHTVSLLETPEGGLALTVPVLTFIDNHGECWLGGGVVPDAIVLAEEALDRAQEVL | RALPGVIQRLQEALREYYTLVDRVPALLSHLAAMDLSSVVSEDDLVTKLNAGLQAVSEDPRLQVQVVRPKEASSGPEEEAEEPPEAVPEVPEDEAVRRALVDSVFQVSVLPGNVGYLRFDSFADASVLEVLGPYILHQVWEPLQDTEHLIMDLRQNPGGPSSAVPLLLSYFQSPDASPVRLFSTYDRRTNITREHFSQTELLGRPYGTQRGVYLLTSHRTATAAEELAFLMQSLGWATLVGEITAGSLLHTHTVSLLETPEGGLALTVPVLTFIDNHGECWLGGGVVPDAIVLAEEALDRAQEVL | ||
>M2_monDom | >M2_monDom | ||
RARPGAIQRLMEVLQNYYTLVDRVPALLHHLTAIDYSSVLTEEDLAAKLNAGLQAVSEDPRLLVRVLRPEEATMGSEASEEEDATPAANSLPEDESQRQALVDSVFQVSVLPGNVGYLRFDEFADSSVLGTLAPYVIRQVWEPLQDTNHLIMDLRYNPGGPSSAVPLLLSYFQDPAAGPIRLFTTYDRQTNQTQEHLSRAELLGKPYGAQRGVYLLTSHHTATAAEEFAFLMQSLGRATLVGEITAGSLMHTRTFPLLQPPNGNLVLTVPILTFIDNNGECWLGGGVVPDAIVLAEEALDKAKEVL | |||
>M2_macEug | >M2_macEug | ||
RARPAAIQRLMEVLQNYYTLVDRVPALLHHLTAIDYSSVLTEEDLAAKLNAGLQAVSEDPRLLVRVLRPEEATAGAESREEAATAAPVPLPDGESQRQALVNSVFQVSVLPGNVGYLRFDEFADSSVLGALAPYVLQQVWEPLQDTDHLIMDLRYNPGGPSSAVPLLLSYFQDPAAGPVRLFATYDRQTNQTREYRSQAELLGKPYGAQRGVYLLTSHHTATAAEEFAFLMQSLGRATLVGEITAGNLMHTRTFSLLQPPDGSLVLTVPILTFIDNHGECWLGGGVVPDAIVLAEEALDKAKEVI | RARPAAIQRLMEVLQNYYTLVDRVPALLHHLTAIDYSSVLTEEDLAAKLNAGLQAVSEDPRLLVRVLRPEEATAGAESREEAATAAPVPLPDGESQRQALVNSVFQVSVLPGNVGYLRFDEFADSSVLGALAPYVLQQVWEPLQDTDHLIMDLRYNPGGPSSAVPLLLSYFQDPAAGPVRLFATYDRQTNQTREYRSQAELLGKPYGAQRGVYLLTSHHTATAAEEFAFLMQSLGRATLVGEITAGNLMHTRTFSLLQPPDGSLVLTVPILTFIDNHGECWLGGGVVPDAIVLAEEALDKAKEVI | ||
>M2_sacHar | >M2_sacHar | ||
RARPGAIQRLMEILQKYYTLVDRVPALLHHLTAIDYSSVLTEEDLAAKLNAMLQAVSEDPRLLVRVLRPEEATVEAEPGEESATPASVSLPESDAERQALIDSVFQVSVLPGNVGYLRFDEFADNSVLGTLAPYVLRQVWEPLQDTDHLIMDLRYNPGGPSSAVPLLLSYFQDPSAGPVRLFATYDRQTNQTQEYRSRAELLGKPYGAERGVYLLTSYHTATAAEEFAFLMQSLGRATLVGEITAGSLMHTRTFPLLQPPNGSLVLTVPTLTFIDNHGECWLGGGVVPDAIVLAEEALDKAKEVL >M2_ornAna | RARPGAIQRLMEILQKYYTLVDRVPALLHHLTAIDYSSVLTEEDLAAKLNAMLQAVSEDPRLLVRVLRPEEATVEAEPGEESATPASVSLPESDAERQALIDSVFQVSVLPGNVGYLRFDEFADNSVLGTLAPYVLRQVWEPLQDTDHLIMDLRYNPGGPSSAVPLLLSYFQDPSAGPVRLFATYDRQTNQTQEYRSRAELLGKPYGAERGVYLLTSYHTATAAEEFAFLMQSLGRATLVGEITAGSLMHTRTFPLLQPPNGSLVLTVPTLTFIDNHGECWLGGGVVPDAIVLAEEALDKAKEVL | ||
>M2_ornAna | |||
GAVPGAVAHLADLLRDYYALVDRVPALLRHLAALDLSSVLSEEDLTSRLNAGLQAASEDPRLLVRRLEPEEAERGPPRKEEEQKEEEEEDQPSPGASILPGDGSSLFRVSVLPGNVGYLCFDEFPEASALERLGPLLGRRVWEPLEATDHLMVDLRNNPGGPSSAVPLLLSYFQDPAAGPIRLFTTYNRPADVTREYASRAGALEKPYGARRGVYLLTSHRTATAAEEFAYLMQALGRATLVGEITAGRLLHSRTFPLLRPPWEGLVLTVPFLTLFDPHGEGWLGGGVVPDAIVLAEEALEKAGEVL | GAVPGAVAHLADLLRDYYALVDRVPALLRHLAALDLSSVLSEEDLTSRLNAGLQAASEDPRLLVRRLEPEEAERGPPRKEEEQKEEEEEDQPSPGASILPGDGSSLFRVSVLPGNVGYLCFDEFPEASALERLGPLLGRRVWEPLEATDHLMVDLRNNPGGPSSAVPLLLSYFQDPAAGPIRLFTTYNRPADVTREYASRAGALEKPYGARRGVYLLTSHRTATAAEEFAYLMQALGRATLVGEITAGRLLHSRTFPLLRPPWEGLVLTVPFLTLFDPHGEGWLGGGVVPDAIVLAEEALEKAGEVL | ||
>M2_galGal | >M2_galGal | ||
Line 421: | Line 429: | ||
SSVTHVLHQLCDILANNYAFSERIPTLLQHLPNLDYSTVISEEDIAAKLNYELQSLTEDPRLVLKSKTDTLVMPGDSIQAENIPEDEAMLQALVNTVFKVSILPGNIGYLRFDQFADVSVIAKLAPFIVNTVWEPITITENLIIDLRYNVGGSSTAVPLLLSYFLDPETKIHLFTLHNRQQNSTDEVYSHPKVLGKPYGSKKGVYVLTSHQTATAAEEFAYLMQSLSRATIIGEITSGNLMHSKVFPFDGTQLSVTVPIINFIDSNGDYWLGGGVVPDAIVLADEALDKAKEII | SSVTHVLHQLCDILANNYAFSERIPTLLQHLPNLDYSTVISEEDIAAKLNYELQSLTEDPRLVLKSKTDTLVMPGDSIQAENIPEDEAMLQALVNTVFKVSILPGNIGYLRFDQFADVSVIAKLAPFIVNTVWEPITITENLIIDLRYNVGGSSTAVPLLLSYFLDPETKIHLFTLHNRQQNSTDEVYSHPKVLGKPYGSKKGVYVLTSHQTATAAEEFAYLMQSLSRATIIGEITSGNLMHSKVFPFDGTQLSVTVPIINFIDSNGDYWLGGGVVPDAIVLADEALDKAKEII | ||
>M2_petMar | >M2_petMar | ||
GVARKAVEAAGELLLSSYTFVERASAIADHLSWSEYGSVVSVEDLTSKLTQDLQSVAEDPRLVVSNREPEWPPLAQPIPPGPPAPLPDDEQMLEAIVDSAFKVEVLEGNIGYLRFDEFGDASAVMKLRKQLVSKVWERIHPTDDVIIDLRYNLGGSSTAIPIVLSYFQDASPPVHFYTVYDRLRNVTAEFHTVSNLTSQLYGSKKGVYLLTSQHTATAAEEFTYLMQSLNRATIVGEITSGRLAHSLAFRLSDTGLYMTVPIVNFIDNNDEYWLGGGVVPDAIVLAENALDAAKEII | |||
>M2x_takRu | >M2x_takRu | ||
Line 457: | Line 465: | ||
>M4_homSap | >M4_homSap | ||
AKVPTVLQTAGKLVADNYASAELGAKMATKLSGLQSRYSRVTSEVALAEILGADLQMLSGDPHLKAAHIPENAKDRIPGIVPMQIPSPEVFEELIKFSFHTNVLEDNIGYLRFDMFGDGELLTQVSRLLVEHIWKKIMHTDAMIIDMRFNIGGPTSSIPILCSYFFDEGPPVLLDKIYSRPDDSVSELWTHAQVVGERYGSKKSMVILTSSVTAGTAEEFTYIMKRLGRALVIGEVTSGGCQPPQTYHVDDTNLYLTIPTARSVGASDGSSWEGVGVTPHVVVPAEEALARAKEML | |||
>M4_bosTau | >M4_bosTau | ||
AKVPTVLQTAGKLVADNYASPELGVKMAAELSGLQSRYARVTSEAALAELLQADLQVLSGDPHLKTAHIPEDAKDRIPGIVPMQIPSPEVFEDLIKFSFHTNVLEGNVGYLRFDMFGDCELLTQVSELLVEHVWKKIVHTDALIVDMRFNIGGPTSSISALCSYFFDEGPPILLDKIYNRPNNSVSELWTLSQLEGERYGSKKSMVILTSTLTAGAAEEFTYIMKRLGRALVIGEVTSGGCQPPQTYHVDDTDLYLTIPTARSVGAADGSSWEGVGVVPDVAVPAEAALTRAQEML | AKVPTVLQTAGKLVADNYASPELGVKMAAELSGLQSRYARVTSEAALAELLQADLQVLSGDPHLKTAHIPEDAKDRIPGIVPMQIPSPEVFEDLIKFSFHTNVLEGNVGYLRFDMFGDCELLTQVSELLVEHVWKKIVHTDALIVDMRFNIGGPTSSISALCSYFFDEGPPILLDKIYNRPNNSVSELWTLSQLEGERYGSKKSMVILTSTLTAGAAEEFTYIMKRLGRALVIGEVTSGGCQPPQTYHVDDTDLYLTIPTARSVGAADGSSWEGVGVVPDVAVPAEAALTRAQEML | ||
Line 489: | Line 497: | ||
AEIPALAQAAATLIADNYAFPSIGEHVAEKLEAVVAGGEYNLISTKEDLEERLSEDLLKLSEDKCLKTTSNIPALPPMNPTPEMFIALIKSSFQTDVFENNIGYLRFDMFGDFEHVATIAQIIVEHVWNKVVDTDALIIDLRNNIGGHASSIAGFCSYFFDADKQIVLDHIYDRPSNTTRDLQTLEQLTGRRYGSKKSVVILTSGVTAGAAEEFVFIMKRLGRAMIIGETTHGGCQPPETFAVGESDIFLSIPISHSTAQGPSWEGAGIAPHIPVPAGAALDTAKGML | AEIPALAQAAATLIADNYAFPSIGEHVAEKLEAVVAGGEYNLISTKEDLEERLSEDLLKLSEDKCLKTTSNIPALPPMNPTPEMFIALIKSSFQTDVFENNIGYLRFDMFGDFEHVATIAQIIVEHVWNKVVDTDALIIDLRNNIGGHASSIAGFCSYFFDADKQIVLDHIYDRPSNTTRDLQTLEQLTGRRYGSKKSVVILTSGVTAGAAEEFVFIMKRLGRAMIIGETTHGGCQPPETFAVGESDIFLSIPISHSTAQGPSWEGAGIAPHIPVPAGAALDTAKGML | ||
>M4_petMar | >M4_petMar | ||
ADAPSILRTVGKLVADGYSRAEAALGVPSKLAALLEAGEYGALRSEEELAFKLTVHLQLITGDRHLKAVCVPEHATDRMPGIVPMQMPPTESFEDLIKFSFITDVLEGNIGYLRFDLFSDLEALEHVAHLLVEHVWKKICDTEILIIDLR | |||
>Mn_braFlo RALNDQSL SKAIILD LNEL insertions omitted | >Mn_braFlo RALNDQSL SKAIILD LNEL insertions omitted | ||
Line 744: | Line 752: | ||
2 * 0 | 2 * 0 | ||
>RBP3_petMar lamprey exon3/4 | >RBP3_petMar lamprey fragment exon3/4 missing, fixed genomic frameshift; four domains: 34-312,327-615,625-914,916-1217 | ||
0 | 0 MAGSREQRTAFSTRLLLLLLLPLATCPSQAPYKFDTAVVLHLAKVLLDNYCIPENLVGMDEAIQRAVDNGELLGVSDPESAASALTEGIQAALNDPRIAVSYVPDVDDDGDREEGDAEGW | ||
DAGEQHRPTTFEELLATIPQKTSFAVLDGNVGYLRADEIISEATIKKLGPVIVQRIWNRLVDTDTFVLDLRYNSHGDITGLPYLVSCFCEPRPVVHLDTVYYRPTNESKEIWSLPDLQGA | |||
RFAKHKDVFVLVSANTEGVAENVAYVLKHLHRATVIGEQTAGGSLEVERFRLGDSRFFVTVPTARSQSPLTGRSWELTGVFPCVSAPSERALDKALEILNARGVARKAVEAAGELLLSSY | |||
TFVERASAIADHLSWSEYGSVVSVEDLTSKLTQDLQSVAEDPRLVVSNREPEWPPLAQPIPPGPPAPLPDDEQMLEAIVDSAFKVEVLEGNIGYLRFDEFGDASAVMKLRKQLVSKVWER | |||
IHPTDDVIIDLRYNLGGSSTAIPIVLSYFQDASPPVHFYTVYDRLRNVTAEFHTVSNLTSQLYGSKKGVYLLTSQHTATAAEEFTYLMQSLNRATIVGEITSGRLAHSLAFRLSDTGLYMT | |||
VPIVNFIDNNDEYWLGGGVVPDAIVLAENALDAAKEIIEFHAKMASLLELAGALVEGYYAMLSDGENATAEILLKYREGWYRSVVDYEALASQLTSDLHEIWGDHRLHAFYSDLQIERMD | |||
EDKTPSVPSPEELSVLIDTVFKVDILANNVGYLRFDMMTDAEVLKHVGPQLVEKVWNKISSTRSLVIDVRYNMGGYSTSIPILCSYFFDASPPRHLYTVFDRPSRSSTQVFTVPRVLGQR | |||
YGASKDVYILTSHMTGSAGEILTRVMSDLKRATVIGEPTAGGSLSTGTYRIGDSRLYVFIPNQAGVSPSGGRTWSVAGVEPHVQTKASEALQSALRMVALRADAPSILRTVGKLVADGYS | |||
RAEAALGVPSKLAALLEAGEYGALRSEEELAFKLTVHLQLITGDRHLKAVCVPEHATDRMPGIVPMQ 0 | |||
0 MPPTESFEDLIKFSFITDVLEGNIGYLRFDLFSDLEALEHVAHLLVEHVWKKICDTEILIIDLR 2 | 0 MPPTESFEDLIKFSFITDVLEGNIGYLRFDLFSDLEALEHVAHLLVEHVWKKICDTEILIIDLR 2 | ||
>RBP3_braFlo Branchiostoma floridae Region: 9 exons 1 domain: 83-381 ClpP/crotonase e-38 419-630; misfused to PAPS sulfotransferase | >RBP3_braFlo Branchiostoma floridae Region: 9 exons 1 domain: 83-381 ClpP/crotonase e-38 419-630; misfused to PAPS sulfotransferase | ||
Line 769: | Line 775: | ||
2 GVTPDIIVPSEKALTVALRKIQGSEDTKMAASSGNIEPPRWTVYLVFICTSIAILTYPTFM* 0 | 2 GVTPDIIVPSEKALTVALRKIQGSEDTKMAASSGNIEPPRWTVYLVFICTSIAILTYPTFM* 0 | ||
</pre> | </pre> | ||
'''See also:''' [[Opsin_evolution|Curated Sequences]] | [[Opsin_evolution:_RPE65|RPE65]] | [[Opsin_evolution:_transducins|Transducins]] | [[USH2A_SNPs|Usher: USH2A]] | [[CDH23_SNPs|Usher: CDH23]] | [[LOXHD1_SNPs|LOXHD1]] | [[Opsin_evolution:_update_blog|Update Blog]] | |||
[[Category:Comparative Genomics]] | [[Category:Comparative Genomics]] |
Latest revision as of 11:47, 23 March 2010
See also: Curated Sequences | RPE65 | Transducins | Usher: USH2A | Usher: CDH23 | LOXHD1 | Update Blog
RBP3 (IRBP): introduction
Interstitial retinol-binding protein is a 4 exon, 1247 residue glycoprotein thought to shuttle retinoids interstitially between the photoreceptor cells and the retinal pigment epithelium (RPE). It was inexplicably named RBP3 by HGNC despite a lack of paralogs RBP1 or RBP2, likely confusion with ribosomal binding protein genes, and widespread prior use of the protein name IRBP. The shuttling role would only make sense for ciliary opsin systems that are unable to regenerate cis-retinal without an auxiliary pathway in an anatomically separate tissue (here RPE). Consequently -- since nearly all protein folds are extremely ancient -- RBP3 must have been co-opted from some other role.
The protein's size results from four ancient internal tandem duplications that became established prior to intronation (that is, the gene structure does not reflect the repeat structure; the repeats happened first, introns were inserted randomly later within the fourth repeat). Any given repeat module clusters markedly better to the same-numbered repeat in other species than to any of the internal repeats, establishing that repeats had arisen and diverged already prior to speciation rather than arising independently in descendent lineages (like RHO1 and RHO2 in lamprey).
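The clustering argument above can be illustrated with a pairwise identity count over aligned columns. The following is a minimal Python sketch, assuming the sequences are already aligned to equal length; the function name `percent_identity` is illustrative, and the three 20-residue strings are simply the first 20 alignment columns of human M1, cow M1 and human M2 as they appear in the module alignment on this page.

```python
def percent_identity(a: str, b: str) -> float:
    """Fraction of identical residues over columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    # Skip columns in which either sequence carries a gap character.
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return sum(x == y for x, y in pairs) / len(pairs)


# First 20 alignment columns only (an illustrative window, not full modules).
M1_HOMSAP = "LFQPSLVLDMAKVLLDNYCF"
M1_BOSTAU = "LFQPSLVLEMAQVLLDNYCF"
M2_HOMSAP = "SALPGVVHCLQEVLKDYYTL"

same_module = percent_identity(M1_HOMSAP, M1_BOSTAU)
cross_module = percent_identity(M1_HOMSAP, M2_HOMSAP)
print(f"M1 human vs M1 cow:   {same_module:.2f}")
print(f"M1 human vs M2 human: {cross_module:.2f}")
```

On this short window the same-numbered module in another species scores 0.90, against 0.30 for a different module within the same species. A real test of the claim would use full-length modules and many more species.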
It was initially expected that IRBP, as a self-contained homotetramer, would load four molecules of trans-retinol to accomplish its passive shuttling efficiently. However, experiments decisively show this to be false -- the subunits are quite inequivalent and only one molecule is transported. The structure of module 2 of frog was determined in 2002 (PDB: 1J7X); it consists of two sub-domains and a cleft. The fold matches obscure proteases and hydrolases found in little-studied bacteria; this might represent convergent evolution as seen in TIM beta barrels rather than provide valid clues to ancestral function. A structural determination of the entire molecule is reportedly underway.
Fragments of the protein have been sequenced from an immense number of species for phylogenetic purposes. That's because it evolves fast enough to provide a large number of seemingly informative sites and these can be obtained conveniently exploiting the large size of the first reading frame. While there is minimal risk of accidentally cross-matching different modules, the internal repeat structure implies that residues might not be evolving independently. While the modules might seem too diverged for cross-module gene conversion to still be operative, they retain patches of near identity.
Some very odd aspects of marsupial RBP3 are investigated below because of the implications in a gene so widely used in taxonomy. Teleost fish further raise the question of inhomogeneous recombination events arising from module mix-up. In fish, the ancestral four-module gene has given rise to a two-module gene (M1 and M4) accompanied by an intronless upstream tandem fragment duplicate of M1-M3 without any indication of M4 in the residual 10 residues. Both genes are transcribed and apparently functional since the establishment event occurred prior to zebrafish and fugu divergence, though the upstream gene has been lost in many lineages including tetraodon, medaka, and stickleback but not cichlids.
The upstream fragment suggests a truncated version of the first large exon. The lack of introns does not suggest retroprocessing, in view of their lack in the parental gene and the locational adjacency, but rather standard recombinational or flanking retroposon-driven tandem duplication. The loss of M2 and M3 in the downstream gene but retention of standard introns again suggests recombinant loss, either as part of the initial tandem event or subsequent to it as a consequence of the new potential for exact misalignment. Note that otherwise the four modules would have been quite diverged from each other prior to the emergence of ray-finned fish.
Conceivably whole genome or segmental duplication in fish set the stage for inhomogeneous recombination (as modelled by Nickerson), though a simple local tandem duplication event mediated by retroposons during replication is another option. A great many tandem loci exist in genomes such as human for which wholesale duplication is inapplicable. This could possibly be resolved by sequencing basal teleost fish that diverged prior to the putative genome duplication to determine if the odd arrangement of genes was already established.
IRBP module alignment, conservation and structural aspects
The image below aligns the four module types from human to lamprey and amphioxus, with yellow showing exceptionally conserved residues and cyan moderately conserved. The number of species shown could be greatly expanded especially for module 1 but for clarity only a phylogenetically representative set is displayed. Module boundaries vary slightly according to the bioinformatics tool used to define the domain (eg crystallographic homology transfer, SuperFamily, Pfam, SwissProt, blastp of reference domains) but can be standardized by counting back a fixed number of residues N-terminally from the universally conserved tyrosine and forward from the C-terminal alanines because conserved residues must occupy near-identical positions within the tertiary structure.
It can be seen that M2 has two variable length insertional regions, no doubt relatively disordered loops. Amphioxus additionally has inserted residues in three regions that are suppressed in the figure -- observe that although its overall identity is low, identity is high in conserved residues. The modules each have distinctive class indels and characteristic residues that allow them to be readily distinguished from each other. These features were most likely established prior to lamprey divergence rather than later by convergence. Only two conserved patches are observed, GNvGYLRvD and RAivVGErT in human; other conserved residues are dispersed (at least within the linear structure).
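The yellow/cyan conservation classes described above can be approximated by scoring each alignment column for the frequency of its most common residue. The sketch below is a hedged illustration in Python: the thresholds (1.0 for "exceptionally" and 0.7 for "moderately" conserved) and the function names are assumptions, not the criteria used for the figure. The four 10-residue windows are taken from the first conserved patch of human M1, M2, M3 and zebrafish M1 in the module alignment on this page.

```python
from collections import Counter


def column_conservation(aligned: list[str]) -> list[float]:
    """Per column: frequency of the most common non-gap residue, over all sequences."""
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("sequences must be aligned to the same length")
    scores = []
    for column in zip(*aligned):
        counts = Counter(res for res in column if res != "-")
        top = counts.most_common(1)
        scores.append(top[0][1] / len(column) if top else 0.0)
    return scores


def classify(scores: list[float], high: float = 1.0, moderate: float = 0.7) -> str:
    """Hypothetical thresholds: 'H' exceptionally conserved, 'm' moderately, '.' variable."""
    return "".join("H" if s >= high else "m" if s >= moderate else "." for s in scores)


# Windows around the first conserved patch (illustrative subset of the alignment).
WINDOWS = [
    "EGNVGYLRVD",  # M1_homSap
    "PGNVGYLRFD",  # M2_homSap
    "PGQLGYLRFD",  # M3_homSap
    "EGNIGYLKIQ",  # M1_danRer
]

scores = column_conservation(WINDOWS)
print(classify(scores))
```

With only four sequences the classes are coarse. Scoring all rows of the alignment, and grouping chemically similar residues, would be closer to what the figure shows.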
The sole structural determination to date, 1J7X_A, the single module M2 of Xenopus laevis to 1.8 Å, describes two domains separated by a hydrophobic binding cleft (based on analogy to very similar protease and hydrolase folds). The smaller domain A is an amino-terminal three-helix bundle that extends 8 residues past the M2 insertion region (to PGDSIQAEN in xenLae). M2 was an unfortunate choice because it is the most diverged module and bears the least resemblance to early-diverging or ancestral sequence.
The observed fold similarities could not plausibly have arisen by chance. Crotonase has a binding site for hydrophobic chains of fatty acids and isomerizes them, with possible relevance in RBP3 to cis and trans retinoid stabilization and transport; however IRBP does not appear to be an enzyme or isomerase itself and RPE65 carries out the regenerative isomerization reaction.
M2 may have two distinct binding sites, one in the cleft between domains A and B and the other solely inside domain B. The assignment of conserved tryptophans to these sites and other conserved residues to the surface (suggesting a protein binding partner, possibly intra-module binding) has to be revisited in view of the much deeper phylogenetic array of sequences now available.
A peculiar feature of the full four-module protein (after discounting a couple dozen residues for signal peptide and carboxy terminal wander) is the cheek-to-jowl adjacency of consecutive modules -- in human at most 3 amino acids separate the modules from each other.
This is far too short a bridge to allow many arrangement options for module-module quasi-homodimerization (or allosteric regulation of the binding subunit). Instead the modules must be arranged more as beads on a string. Linear donor-receptor docking in a M1-M2-M3-M4 arrangement risks endless polymerization unless the donor in M4 docks to the receptor in M1. However the bridge sequences are too short to permit closure into a tetrameric cycle.
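Bridge lengths of this kind follow directly from domain coordinates. As a small worked example in Python, the sketch below computes the residues separating consecutive modules from a coordinate string; the function name `linker_lengths` is illustrative, and the coordinates are the four lamprey domains quoted in the RBP3_petMar header on this page (34-312, 327-615, 625-914, 916-1217). Since module boundaries depend on the domain-calling tool, and the lamprey sequence is a repaired fragment, these numbers are illustrative and are not a check of the human figure.

```python
def linker_lengths(domains: str) -> list[int]:
    """Residues strictly between consecutive domains given as 'start-end,start-end,...'."""
    spans = [tuple(int(n) for n in part.split("-")) for part in domains.split(",")]
    return [
        next_start - prev_end - 1
        for (_, prev_end), (next_start, _) in zip(spans, spans[1:])
    ]


# Domain coordinates as annotated for lamprey (RBP3_petMar) on this page.
PETMAR_DOMAINS = "34-312,327-615,625-914,916-1217"

gaps = linker_lengths(PETMAR_DOMAINS)
print(gaps)  # bridges M1-M2, M2-M3, M3-M4
```

Applying the same function to human coordinates from a single consistent domain definition would be the way to reproduce the "at most 3 amino acids" observation.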
Despite subunits quite diverged in sequence, a more likely arrangement is an anti-parallel dimer of M1-M2-M3-M4 to M4-M3-M2-M1 (as seen in actinin spectrin repeats). This arrangement allows each module to dimerize with the quasi dihedral symmetry, leaving no docking sites vacant. Because the binding patches are in paralogous modules, this explains why observed surface residue conservation in frog M2 carries over to the other modules.
If true, this implies co-evolution of amino acids in these binding patches on top of what already occurs internally to an individual module. All phylogenetic software assumes -- ridiculously -- independent evolution of sites. However like morphological characters, molecular characters can have dependencies that amount to couplings and coordination between individual reduced alphabets. Here the much studied M1 module has surface residues co-evolving with counterparts on M4.
IRBP has only a single site with glycosylation potential (in beta strand 8 of the hinge region of domain B at residues 204 and 515 which are paralogous under module alignment). Although the NxT motif occurs with excellent phylogenetic conservation depth (human to lamprey, also in fish fragmentary genes), only modules M1 and M2 exhibit it. Given the interstitial location and experimental support for glycoprotein nature, IRBP is very likely glycosylated. Bulky carbohydrate chains need to be out of the way of the putative dimerization patches.
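Candidate N-linked glycosylation sites such as the NxT motif are conventionally located by scanning for the sequon N-x-S/T, where x is any residue except proline. The Python sketch below is a minimal illustration of that scan. The function name is hypothetical, and the test string is one exon of RBP3_braFlo copied from the sequence listing on this page, so the reported positions are relative to that fragment and not to full-length protein numbering. A match indicates only sequence potential, not actual glycosylation.

```python
import re

# N, followed by any residue except proline, followed by S or T.
# The lookahead keeps overlapping sequons from being skipped.
SEQUON = re.compile(r"N(?=[^P][ST])")


def find_sequons(seq: str) -> list[int]:
    """Return 1-based positions of the asparagine in each N-x-[ST] sequon (x != P)."""
    return [match.start() + 1 for match in SEQUON.finditer(seq)]


# One exon of RBP3_braFlo as listed on this page (fragment-relative numbering).
BRAFLO_EXON = "VPGGIRFPDMPLYLLTSNRTSREAEEFAYAMQVVNRTTIIGETT"

sites = find_sequons(BRAFLO_EXON)
print(sites)
```

Running the same scan over each module of each species would show which modules carry the motif and how deeply it is conserved.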
The situation with cysteines is not so clear. Here M1 has a cysteine at position 19 that is conserved back to lamprey (also in fish fragmentary genes) but A/T/S in other modules. A second conserved cysteine is found only in modules M2 and M3 at position 163 (human M1 numbering, residue is I). M4 has no conserved cysteines. A few other cysteines occur elsewhere but only sporadically in the comparative genomics sense. The cysteines at M2 and M3 could potentially form a disulfide in the antiparallel dimer model discussed above. Inter-modular disulfides are not feasible if modules are arranged linearly. No explanation is currently available for the conserved cysteine in M1.
An initial study of knockout mice disastrously used a strain carrying a severe mutation at L450 in RPE65, the enzyme catalyzing the rate-limiting isomerization step in the visual regeneration cycle. Lack of IRBP alone delays transfer of both bleached chromophore from photoreceptor cells to the RPE and newly regenerated chromophore from the RPE to photoreceptors.
A single known disease mutation, D1080N, causes a very rare recessive retinitis pigmentosa. Loss of this aspartate may disrupt an internal salt bridge, according to the Xenopus M2 structure (and homology transfer to other modules). The residue lies in M4 near the end of the second exon. It is conserved back to lamprey and amphioxus, indeed to crotonase and other weakly aligning homologs.
    *
AIILDLRYNLGGDREGVVHWASFFF RBP3_braFlo
A+I+D+R+N+GG    +    S+FF
AMIIDMRFNIGGPTSSIPILCSYFF RBP3_homSap
 +IID+R+N+GG ++SIPILCSYFF
ILIIDLRYNMGGYSTSIPILCSYFF RBP3_petMar
M1_homSap LFQPSLVLDMAKVLLDNYCFPENLLGMQEAIQQAIKSHEILSISDPQTLASVLTAGVQSSLNDPRLVISYEPST-----------PEPPPQVPALTSLSEEELLAWLQRGLRHEVLEGNVGYLRVDSVPGQEVLSMMGEFLVAHVWGNLMGTSALVLDL
M1_bosTau LFQPSLVLEMAQVLLDNYCFPENLMGMQGAIEQAIKSQEILSISDPQTLAHVLTAGVQSSLNDPRLVISYEPST-----------LEAPPRAPAVTNLTLEEIIAGLQDGLRHEILEGNVGYLRVDDIPGQEVMSKLRSFLVANVWRKLVNTSALVLDL
M1_monDom IFQPSLVRDMAKILLDNYCFPENLMGMQEVIEQAIKSGEILDISDPQMLASVLTAGVQGALNDPRLVISFEPSI-----------PETPQHVPKLANVTQEELLILLQQMIKYQVLEGNVGYLRVDYIPGQEVVEKVGEFLVNNIWKKLMGTSSLVLDL
M1_macEug IFQPSLVLDMAKILLDNYCFPENLMGMQEAIEQAIKSGEILDISDPQMLASVLTAGVQGSLNDPRLVISYEPSP-----------AEAPQQSPKLTSLTQEELLTLLQQMIKYQVLDGNVGYLRVDYIPGQEVVEKVGEFLVNDIWKKLMGTSSLVLDL
M1_sacHar IFQPTLVLDMAKILLDNYCFPENLMGMQEAIEQAIKSGEILDISDPQMLANVLTAGVQGSLNDPRLVISYEPST-----------SEAPQHDPKFANATQEELLALFQKIIKYQVLEDNVGYLRVDYIPGRDMIEEVGEFLVNDIWKKVMETSSLVLDL
M1_ornAna VSQPSMVLDVAKILLDNYCYPENLMGMQEAIEEAIQRGEILDIADPKRLASVLTAGVQGSLNDPRLVISYEPAP-----------VAVSQQPPEPASLPAEQPLERLRPAVGSEVLEGNVGYLRVDRLPGREEIERVGAVLGRDIWEKLLGTSALVLDL
M1_galGal IFQPTLVLDMAKVLLDNYCYPENLVGMQEAIEQAIKSGEILDISDPKMLANVLTAGVQGALNDPRLVISYEPSL-----------HAAPKQEAE-TYPTREQLLSLIEHVVIYDKLEGNVGYLRIDYIIGQEVVEKVGAFLVDKVWKTLINTSALVIDL
M1_taeGut IFQPTLVLDMAKVLLDNYCYPENLVGMQEAIEQAIKSGEILDISDPKMLANVLTAGVQGALNDPRLVISYEPLP-----------HSGPKQEAE-GSPTREQLLSLIEHVIMYDKLEGNVGYLRIDYIIGEEVVQKVGAFLVDKVWKTLIETSALVIDL
M1_anoCar VLQSTLVLDMAKLLLDNYCLPENLVGMREAIEQAIKNGEVLDISDPKLLATVLTAGVQGALNDPRLVISYEPTA-----------PAAPKQRME-TSLTPEQLLSLIQHTVKYEVLDDNVGYLRIDYIMGQDIVQKIGSFLVEKVWKTLLGTSALILDL
M1_xenLae LFQPSLVMDMAKVLLDNYCFPENLVGMQETIEQAVKGGEILHISDPDTLANVFTSGVQGYLNDPRLVVSYEPN-------------YSGPQTEQSLELTPEQLKFLINHSVKYDILPGNIGYLRIDFIIGQDVVQKVGPHLVNNIWKKLMPTSALILDL
M1_xenTro VFQPSLVMDMAKVLLDNYCFPENLVGMQETIEQAMKSGEILHISDPETLANVFTSGVQGFLNDPRLVVSYEPN-------------YSGPRKEQSPEPTLEQLKFLLDHSVTYDLLPGNIGYLRIDFIIGQDVVQKVGPLLVNNIWKKLMPSSALILDL
M1_tetNig AFPPSLIADMAKIVLDNYCSPEKLAGMKEAIKAAGTNTEVLNIPDGESLARVLSAGVQGTVSDPRLMVSFQPN-------------YVPAGPHKMPPLPPEHLVAVLQTSVKLDILEGNTGYLRIDHILGEEVADKVGPALIDLIWNKILPTSALIFDL
M1_takRub AFPPSLITDMAKIVLDNYCSPEKLAGMKEAIEAAGTNTEVLNIPDGESLARVLSAGVQGTVSDSRLMVSYQPD-------------YVPAVPPKMPPLPPEHLVAVLQTSIKLDLLEGNTGYLRIDHIIGEDVAEKVGPSLIDLIWNKILPTSALIFDL
M1_gasAcu GFAPNVIIDMAKIVIDNYCSPEKLAGMKEAIEAAGSNTEVLSIPDAETLANVLSAGVQTTVSDPRLMISYEPN-------------YVPVVPPKMPPLPPDQVIAVLQTSIKLDILEGNIGYLRIDHILGEDVAEKVGPLLLDLVWNKILPTSALIFDL
M1_oryLat SFPPSLITDLAKIVMDNYCSPEKLSGMKEDIATAGANTDVLNIPDGEALAKVLTDGVQTTVSDPRLRVSYEPN-------------YVPVVP---PQLPPEQLIAVLQTSIKLDILEGNIGYLRIDSIIGEEVAEKVGPLLLELVWSKILPTSALIFDL
M1_danRer NFSPTLIADMAKIFMDNYCSPEKLTGMEEAIDAASSNTEILSISDPTMLANVLTDGVKKTISDSRVKVTYEPD-------------LILAAPPAMPDIPLEHLAAMIKGTVKVEILEGNIGYLKIQHIIGEEMAQKVGPLLLEYIWDKILPTSAMILDF
M1_petMar KFDTAVVLHLAKVLLDNYCIPENLVGMDEAIQRAVDNGELLGVSDPESAASALTEGIQAALNDPRIAVSYVA-----------------------PPHTFEELLATIPQKTSFAVLDGNVGYLRADEIISEATIKKLGPVIVQRIWNRLVDTDTFVLDL
M1x_takRu FYQHTLVLEMAKLLLENYCIPENLVGMQEAIQRAIKSREILQISDRKTLATVLTVGVQGALNDPRLSVSYEPS-------------FSPLPLQALSSLPVEQQLRLLRNSIKLDILDSDVGYLRIDRIIDEETLLKFGPLLRENVWDKAAQTSSLILDL
M1x_danRe SFQSALVLDMAKILLDNYCFPENLIGMQEAIQQAINSGEILHISDRKTLASVLTAGVQGALNDPRLTVSYEPN-------------YTLITPPALHSLPTEQLIRLIRSTVKLEVMDNNIGYLRIDRIIGQETVVKLGRLLHNNIWKKVAHTSAMIFDL
M2_homSap SALPGVVHCLQEVLKDYYTLVDRVPTLLQHLASM----DFSTVVSEEDLVTKLNAGLQAASEDPRLLVRAIGPTETPSWPAPDAAAEDSPGVAPELPEDEAIRQALVDSVFQVSVLPGNVGYLRFDSFADASVLGVLAPYVLRQVWEPLQDTEHLIMDL
M2_bosTau RALPGVIQRLQEALREYYTLVDRVPALLSHLAAM----DLSSVVSEDDLVTKLNAGLQAVSEDPRLQVQVVRPKEASS--GPEEEAEEPPEAVPEVPEDEAVRRALVDSVFQVSVLPGNVGYLRFDSFADASVLEVLGPYILHQVWEPLQDTEHLIMDL
M2_monDom RARPGAIQRLMEVLQNYYTLVDRVPALLHHLTAI----DYSSVLTEEDLAAKLNAGLQAVSEDPRLLVRVLRPEEATM-GSEASEEEDATPAANSLPEDESQRQALVDSVFQVSVLPGNVGYLRFDEFADSSVLGTLAPYVIRQVWEPLQDTNHLIMDL
M2_sacHar RARPGAIQRLMEILQKYYTLVDRVPALLHHLTAI----DYSSVLTEEDLAAKLNAMLQAVSEDPRLLVRVLRPEEATV--EAEPGEESATPASVSLPESDAERQALIDSVFQVSVLPGNVGYLRFDEFADNSVLGTLAPYVLRQVWEPLQDTDHLIMDL
M2_macEug RARPAAIQRLMEVLQNYYTLVDRVPALLHHLTAI----DYSSVLTEEDLAAKLNAGLQAVSEDPRLLVRVLRPEEATA--GAESREEAATAAPVPLPDGESQRQALVNSVFQVSVLPGNVGYLRFDEFADSSVLGALAPYVLQQVWEPLQDTDHLIMDL
M2_ornAna GAVPGAVAHLADLLRDYYALVDRVPALLRHLAAL----DLSSVLSEEDLTSRLNAGLQAASEDPRLLVRRLEPEEAERGPPRKEEEQKEEEEEDQPSPGASILPGDGSSLFRVSVLPGNVGYLCFDEFPEASALERLGPLLGRRVWEPLEATDHLMVDL
M2_galGal RAVPGTLSRLTDILKDYYSLVERVPVLLRHLTTS----DFSSVQSAEDLATKLNTEMQTLSEDPRLLVRTMMPGEA-----AAPPAEMPIAMAANLPDNEQLLHALVDTVFKVSVLPGNVGYMRFDEFADASVLVKLGPYIVKKVWEPLQNTENLIMDL
M2_taeGut RAVPGTISHLKNILKDYYSLVERVPALLRRLTTS----DFSSVQSSEDLATKLNTELQALSDDPRLMVRVMMPGEA-----ADSPAEKPVGMAADLPDNEQLLHALVDTVFKVSVLPGNVGYMRFDEFADASVLVKLGPYLVHKVWEPLQNTENLIMDL
M2_anoCar KAIPNSMSYLVDIIKNNYSMLEQVPVLLQHLSTF----DYSSVLSVKDLASKLNAELQTISEDPRLFLRVPASDEA-----VTSQTDEKVAMASDLPNNEQLMKALVMTVFKVSVLPGNVGYMRFDEFGDATVLVKLGPYLLQHVWEPLQATDYLIIDL
M2_xenTro SSITHILLQLSEILVNNYAFSERIPTLLQHLPNL----DYSSVISEEDITAKLNYELQSLTEDPRLVLKSKTDSLV---------MPEDSTQVENLPDDEATLQALVNTVFKVSILPGNIGYLRFDEFADVSVLAKLGPYIVNTVWDPITVTENLIIDL
M2_xenLae SSVTHVLHQLCDILANNYAFSERIPTLLQHLPNL----DYSTVISEEDIAAKLNYELQSLTEDPRLVLKSKTDTLV---------MPGDSIQAENIPEDEAMLQALVNTVFKVSILPGNIGYLRFDQFADVSVIAKLAPFIVNTVWEPITITENLIIDL
M2_petMar GVARKAVEAAGELLLSSYTFVERASAIADHLSWS----EYGSVVSVEDLTSKLTQDLQSVAEDPRLVVSNREPEWV--------GAADPPGPPAPLPDDEQMLEAIVDSAFKVEVLEGNIGYLRFDEFGDASAVMKLRKQLVSKVWERIHPTDDVIIDL
M2x_takRu SRIPKVLQIVLDIIGRFYAFADRVQALLQQLESA----DLFSVVSEEDLAARLNHDLQTASEDPRLIIRHKRDNI------------PRAEEEPELHAANDHDGELVEG-FTVQVLPHNTGYLRLDRFVRCSEGDKLEEIVAEKVWGPLKDTQNLIIDL
M2x_danRe KTIPKAVRRVSDIIKRYYSFKDKIPALLNQLAKA----DYFTVVSEEDLAGKLNHEMQSVFEDPRLLIKATQVLT------------DDASSE-DRSSSDDLTDPL----FKLEMISGNNGYLRFDRFPTPEVLLRLEDHIKKKIWQPVQETENLVIDL
M3_homSap QSLGALVEGTGHLLEAHYARPEVVGQTSALLRAKLAQGAYRTAVDLESLASQLTADLQEVSGDHRLLVFHSPGELV---------VEEAP-PPPPAVPSPEELTYLIEALFKTEVLPGQLGYLRFDAMAELETVKAVGPQLVRLVWQQLVDTAALVIDL
M3_bosTau RSLGELVEGTGRLLEAHYARPEVVGQMGALLRAKLAQGAYRTAVDLESLASQLTADLQEMSGDHRLLVFHSPGEMV---------AEEAP-PPPPVVPSPEELSYLIEALFKTEVLPGQLGYLRFDAMAELETVKAVGPQLVQLVWQKLVDTAALVVDL
M3_monDom QRLGALVEGTGHLLEAHYALPEVVGQASALLKAKLEHGTYRTAVDFESLASQLTSDLQEVSGDHRLHVFHSPGEPV---------SEELT-PPQKGVPSPEELTYLIEALFKTEVLPGQLGYLRFDMMAEAETVRAIAPQLVELVWEKLVHTEALVVDL
M3_macEug QRLGALVEDAGHLLEAHYALPEVVGQASALLRARLVHGTYRTAVDFESLASQLTSDLQEVSGDHRVHVFHSPGELI---------PEELS-PPQNVVPSPEELTYLIEALFKTEVLPGQLGYLRFDMMAEAETVRAIGPQLIELVWEKLVNTEALVVDL
M3_sacHar QRLGALVEGTGHLVEAHYALPEVVGQASAFLRATLAHGTYRTAVDFESLASQLTSDLQEVSGDHRLHVFHSPGEPV---------PEESS-PPHKGVPSPEELTYLIEALFKTDVLP-QLGYLRFDMMAEVETVRAIGPQLVELVWEKLVNTEALVVDL
M3_taeGut KNMGVLLEGTGQLLEDHYAIPEVAAKASAMLSTKRAQGGYRSAIDSETLASQLTSDLQEASGDHRLHVFHSHVEPT---------PEEQL-PNV--IPSPEELSYIIEALFKIEVLPGNLGYLRFDMMAEAETVKAIGPQLLQMVWNKLVDTDAMIIDM
M3_galGal RKMGILLESTGQLLEAHYAIPEVAEKASVMLSTKRVQGGYRSAVDFETLASQLTSDLQEASGDHRLHVFHSHVEPT---------PEEQL-PNM--IPSPEELSYIIEALFKIEVLPGNLGYLRFDMMAEAETVKAIGPQLVQMVWNKLVDTDAMIIDM
M3_anoCar KGMGSLIERVGQLLEAHYAIPEMARRVSSMLNSKLAQGGYRTAVDFETLASQLTNDLQETSGDHQLHVFHSHVEPS---------LEEQS-PFK--TLTPEELNFIIEALFKVDVLPGNVGYLRFDMMAEFESVKTIEPQILHMVWEKLVETSAMIVDM
M3_xenTro PSVFALVEGTGHLLEVHYAIPEVAYKVSSVLQNKWSEGGYRSVVDLESLASQLTSEMQENSGDHRLHVFYSDTEPE---------ILEDQ-PPK--IPSAEELNYIIDALFKIEVLQGNVGYLRFDMMADTEIIKAIGPQLVSLVWNKLVETNSLIIDM
M3_xenLae PSIFPLVKGTGHLLEVHYAIPEVAYKVSSVLQNKWSEGGYRSVVDLESLASLLTSEMQENSGDHRLHVFYSDTEPE---------ILEDQ-PPK--IPSPEELNYIIDALFKIEVLPGNVGYLRFDMMADTEIIKAIGPQLVSLVWNKLVETNSLIIDM
M3_petMar AKMASLLELAGALVEGYYAMLSDGENATAEILLKYREGWYRSVVDYEALASQLTSDLHEIWGDHRLHAF-YSDLQI---------ERMDE-DKTPSVPSPEELSVLIDTVFKVDILANNVGYLRFDMMTDAEVLKHVGPQLVEKVWNKISSTRSLVIDV
M3x_takRu 
QGLRSLIGRTGELLEKHYAIQEVAQKVGEVLLSKWAEGLYRSVVDLESLASQLTADLQEASGDHRLHVFRCDVELE---------SLHGV-PK---IAAVEEAGFVIDALFKSELLPRNVGYLRFDTMADIEAAKGAAPRLVKSVWNKLVDTDSLIIDM M3x_danRr KNIQGLVQEAGDLLEKHYSVPEVAAKVSRLLQSKLTEGLYRSVVDYESLASQLTSDLQETSGDQRLHIFYCETEPE---------TLHDT-PK---IPSPEEAGFIVEALFKVDVMSGNIGYLRFDMMEDIKVLQAINPEFLKVVWNKLVNTDMLIIDV M4_petMar ADAPSILRTVGKLVADGYSRAEAALGVPSKLAALLEAGEYGALRSEEELAFKLTVHLQLITGDRHLKAVCVPEHAT---------DRMPG-IVPMQMPPTESFEDLIKFSFITDVLEGNIGYLRFDLFSDLEALEHVAHLLVEHVWKKICDTEILIIDL M4_homSap AKVPTVLQTAGKLVADNYASAELGAKMATKLSGL--QSRYSRVTSEVALAEILGADLQMLSGDPHLKAAHIPENAK---------DRIPGIVPMQ-IPSPEVFEELIKFSFHTNVLEDNIGYLRFDMFGDGELLTQVSRLLVEHIWKKIMHTDAMRIID M4_bosTau AKVPTVLQTAGKLVADNYASPELGVKMAAELSGL--QSRYARVTSEAALAELLQADLQVLSGDPHLKTAHIPEDAK---------DRIPGIVPMQ-IPSPEVFEDLIKFSFHTNVLEGNVGYLRFDMFGDCELLTQVSELLVEHVWKKIVHTDALIVDM M4_monDom AKVPTILQTAGKLVADNYASLEVGSRVASKLAKL--QTQYRQVTSEGELADMLGADLQTLSGDRHLKTAHIPEDAK---------DRIPGIVPMQ-LPSPEAFEDLIKFSFHTNVFEGNIGYLRFDMFGDCELLTQVSDLLVEHVWKKVVHTDGMIIDM M4_macEug SKVPTILQTAGKLVADNYASPEVGSRVAAKLARL--QTQYRQVTSEGELADMLGADLQTLSGDSHLKTAHIPEDSK---------DRIPGIVPMQ-LPSPEAFEDLIKFSFHTNVFEGNIGYLRFDMFGDCELLTQVSDLLVEHVWKKVVHTDGMIIDM M4_sacHar AKVPTILQTAGKLVADNYASPEVGSRVAAKLASL--QIQYGKVTSEGELADMLGADLQTLSGDRHLKTAHIPEDAK---------DRIPGIVPMQ-LPSPEAFEDLIKFSFHTNVFEGNIGYLRFDMFGDCELLIQVSDLLVEHVWKKVMHTDGMIIDM M4_ornAna SKVPTVLRTAAKLVADNYAFRETGAGVAAQMGGL--QARCGRVTSEGALAEVLGAHLRALSGDPHLQMVYIPEDAK---------DRIPGVVPMQ-IPSAETFEDLIKFSFHTSVMEGNIGYLRFDMFGDCELLTQVSELMVEHVWKKIVHTDGLIIDM M4_xenLae TKIPTVIQTAAKLVADNYAFADTGANVASKFIALVDKIDYKMIKSEVELAEKINDDLQSLSKDFHLKAVYIPENSK---------DRIPGVVPMQ-IPSPELFEELIKFSFHTDVFEKNIGYIRFDMFADSDLLNQVSDLLVEHVWKKVVDQDALIIDM M4_xenTro TKIPSVIQTAGKLVADNYAFADTGADVASKLIALVDKINYKMIKSEVELAEKLNYDLQSLSKDVHLKAVYIPENSK---------DRIPGVVPMQ-IPSPEMFEDLIKFSFHTDVFEKNLGYIRFDMFADSDLLNQVSDLLVEHVWKKVVNQDALIIDM M4_taeGut 
AQVPQILQTVGKLVADNYAFVNTGTVIASNLTKNIHKDNYKRINTEEDLAGKVTAILQALSDDKHLKLLYIPEHAK---------DSIPGIMPKQ-IPPPEVFEDLIKFSFHTNVFENNIGYLRFDMFGDSELLTQLSDLMIEHVWKKIFHTDALIIDL M4_galGal TQVPQIVQTVGKLVAENYAFVDIGTDIASNLTKSVNKENYKRINSEKELARKLTAILQALSDDEHLKILYIPEHAK---------DSIPGILPKQ-IPSPEVFEDLIKFSFHTNVFENNIGYLRFDMFGDCELLTQVSDLLVEHVWKKIVHTDALIIDM M4_anoCar TKLPSVLNTIGKLVADNYAFADIGATVAAKFADYAKKGTYRKINSEIELSGKLAADLKALSGDRHLMISHIPERSK---------GRILGLVPMQQIPPPEILEDLIKFSLHTNVFENNIGYLRFDMFGDCELMSQVSELLVQHVWNKIVNTDALIIDM M4_tetNig AQIPAIIEGTAALVANNYAFEATGADVAKELRELQANGQYSSVVSKESLEAALSADLQRLSGDKSLKTT----------------PNTPVLPPMD--YTPEMYIELIKVSFHTDVFENNIGYLRFDMFGDFEEVKAIAQIIVEHVWNKVVNTDALILDL M4_takRub AQIPAIIEGAATLIAKNYAFEATGADVATKLRELLAKGQYNSVVSSESLEVALSADLQRLSGDKSLKAT----------------QNAPVLPPMD--YSPEMYIELIKVSFHTDVFENNIGYLRFDMFGDFEEVKAIAQIIVEHVWNKVVNTDALILDL M4_gasAcu NRVPAIIEGSATLIADNYAFEDIGAAVAEKLKGLLANGEYSKVVSKDSLEMKLSADLRTLSGDKSLKTT----------------SNVPALPPMN--YSPEMYIELIKVSFHTDVFEDNIGYLRFDMFGDFEEVKAIAQIIVEHVWNKVVNTDAMIVDL M4_oryLat LQVPAIIEESATLVANNYAFESTAADVAEKLKGHLANGDYNMVVSKESLEAKLSADLQSLSGDKSLTVS----------------SNTGAPPPME--YTPEMYIELIKISFHTDVFENNIGYLRFDMFGDFEEVKAIAQVIVEHVWNKVLHTDAMIIDL M4_danRer AEIPALAQAAATLIADNYAFPSIGEHVAEKLEAVVAGGEYNLISTKEDLEERLSEDLLKLSEDKCLKTT----------------SNIPALPPMN--PTPEMFIALIKSSFQTDVFENNIGYLRFDMFGDFEHVATIAQIIVEHVWNKVVDTDALIIDL M1_homSap RHCTGGQVS-GIPYIISYLHPGN-TILHVDTIYNRPSNTTTEIWTLPQVLGERYGADKDVVVLTSSQTRGVAEDIAHILKQMRRAIVVGERTGGGALDLRKLRIGES------DFFFTVPVSRSLGPLGGGSQTWEGSGVLPCVGTPAEQALEKALAIL M1_bosTau RHCTGGHVS-GIPYVISYLHPGS-TVSHVDTVYDRPSNTTTEIWTLPEALGEKYSADKDVVVLTSSRTGGVAEDIAYILKQMRRAIVVGERTVGGALNLQKLRVGQS------DFFLTVPVSRSLGPLGEGSQTWEGSGVLPCVGTPAEQALEKALAVL M1_monDom QHSSGGEIS-GIPFVISYLHQGD-ILLHVDTVYDRPSNTTTEIWTLPQVLGERYGGEKDMVVLTSHRTVGVAEDIAYILKKLRRAIVVGEQTLGGALDLRKLRIGQS------DFFITVPVSRSLSPLGGGSQTWEGSGVLPCVGIPAEQALGKALAIL M1_macEug 
QHSTEGEVS-GIPFVISYLHEGD-ILLHVDTVYDRPSNTTTEIWTLPQVLGERYSGEKDLVILTSHRTVGVAEDIAYILKKMRRAIVVGEQTLGGALDLRKLRIGQS------DFFITVPVSRSLSPLGGGSQTWEGSGVLPCVGIPAEQALEKALAIL M1_sacHar QHSRGGEVS-GIPFVISYLHQGD-ILLHVDTIYDRPSNTTTEIWTLPQVLGERYSGEKDIVVLTSHHTVGVAEDIAYILKKMRRAIVVGEQTQGGSLDLRKLRIGQS------DFFITVPVARSLSPLGGGSQTWEGSGVVPCVGIPAEQALEKALAIL M1_galGal RYSTGGQIS-GIPFIISYLHEAD-KMLHVETVYNRPSNTTTEIWTLPKVLGERYSKDKDVIVLISHHTTGVAEDVAYILKHMNRAITLGEKTAGGSLDIQKLRIGPS------NFYMMVPVSRSVSPLSGGGQSWEVSGVMPCVASEAEQALKKSLDIL M1_taeGut RHSTGGQIS-GLPFIISYLHEQD-KILHVETVYNRPSNTTTEIWTLPKVLGERYSKDKDVIVLISHHTTGVAEDVAYILKHMNRAITVGEKTAGGSLDIQKLRIGPS------NFYMMVPVSRSVSPLSGGGQSWEVSGVMPCVATEAEQALQKSLDIL M1_anoCar RYTTGGDVS-GIPFIISYLYNGD-KVLHVDTVYNRPSNTTVEILTLPKVLGVRYSKDKDVILLISKYTTGVAENVAYILKHMHRTIIVGEKSAGGSLDTQKMQIGNS------QFYMTVPLSCSVSPLSGSGQSWEISGVTPCVVISAEQALDKALAIL M1_ornAna RHSTGGHVS-GIPFFISYFYPEG-PALHVDTVYDRPSNATRQLWTLPRVLGARYAADKDVVVLTSRLTAGVAEDVAYILQQMRRAIVVGERTAGGPLVFRKLRVGLS------DFFITVPVACSLGPLGGGGRSWEGSGVLPCVAVPADRALDEALDIL M1_xenLae RYSTQGEVS-GIPFVVSYL--CD-SEIHIDSIYNRPSNTTTDLWTLPELMGERYGKVKDVVVLTSKYTKGVAEDASYILKHMNRAIVVGEKTAGGSLDTQKIKIGQS------DFYITVPVSRSLSPLTG--QSWEVSGVSPCVVVNAKDALDKAQAIL M1_xenTro RYSTQGKVS-GIPFVVSYL--TD-PQIHIDSIYNRPSNTTTDLWTLSELMGERYGKDKDVVVLTSKYTEGIAEGAAYILKHMSRAIVVGEKTAGGSLDIQKIKIGQS------EFYITVPVSRSISPLTG--QSWEVAGVFPCVVVNANNALNKAQGIL M1_tetNig RYTSSGDIS-GIPYIVSYFTQAE-PVVHIDSVYDRPSNTTTKLLSLPNLLGQRYGVSKPLIVLTSKNTKGIAEDVAYCLKNLKRATIVGEKTAGGSLKLDTFKVGDT------DFYITVPTAKSINPITG--SSWEIRGVTPHVEVNAEDALATAIKIV M1_takRub RYTSSGEIS-GIPYIVSYFTQAE-PVVHIDSVYDRPSNTTTKLFSLSNLLGERYGITKPLIILTSKNTKGIAEDVAYCLKNLKRATIVGERTAGGSVKLDNFKVGST------DFYITVPTAKSINPVTG--SSWEITGVKPDVEVNAEDALATAIKIV M1_gasAcu RYTSSGDIS-GIPYIVSYFTEAG-TPIHIDSIYDRPSNTTTKLFSMSTLLGERYSTSKPLIILTSKNTKGIAEDVAYCLQNLKRATIVGEKTAGGSVKVDKIQVRDT------GFYVTVPTAKSVNPITG--STWEVTGVTPNVEVNAEDALATAIKIV M1_oryLat 
RYTSSGDIT-GIPYIISYLTDAK-SEIHIDTIYDRPLNTTTKLLSMQSTLGQTYGGTKPLLVLTSKNTKDIAEDVAYCLKNLKRATIVGEKTAGGSAKIKKFRVGDT------DFYVTLPTAKSINPITG--SSWEVTGVKPNVEVNAEEALATALKII M1_danRer RSTVTGELS-GIPYIVSYFTDPE-PLIHIDSVYDRTADLTIELWSMPTLLGKRYGTSKPLIILTSKDTLGIAEDVAYCLKNLKRATIVGENTAGGTVKMSKMKVGDT------DFYVTVPVAKSINPITG--KSWEINGVAPDVDVAAEDALDAAIAII M1_petMar RYNSHGDIT-GLPYLVSCFCEPR-PVVHLDTVYYRPTNESKEIWSLPDLQGARFAKHKDVFVLVSANTEGVAENVAYVLKHLHRATVIGEQTAGGSLEVERFRLGDS------RFFVTVPTARS-EPAD---RSW---GVFPCVSAPSERALDKALEIL M1x_takRu RFSTAGGWS-GIPSIVSYFTEPH-SLVHIDTVYDRPSNTTTELWTMSSVRGKTFGGKKDMIVLIGRRTAGAAEAVAYTLKHLNRAIVVGERSAGGSLKVRKFRIAES------DFYITMPVARSVSPITG--KSWEVSGISPTVNVAAREALAKAQTFL M1x_danRe RFSTAGELS-GLPYIVSYFSDSD-PLLHIDTIYERPTNITRELWTLPTLLGERFGKRKDLIVLISKRTIGAAEGVAYILKHLKRAVIIGERSAGGSVRVDKLKIGDS------GFYITVPVARSVNPVTG--QSWEVSGVAPSVTVNPKESIAKAKSLI M2_homSap RHNPGGPSS-AVPLLLSYFQGPEAGPVHLFTTYDRRTNITQEHFSHMELPGPRYSTQRGVYLLTSHRTATAAEEFAFLMQSLGWATLVGEITAGNLLHTRTVPLLDTPEG---SLALTVPVLTFIDNHG---EAWLGGGVVPDAIVLAEEALDKAQEVL M2_bosTau RQNPGGPSS-AVPLLLSYFQSPDASPVRLFSTYDRRTNITREHFSQTELLGRPYGTQRGVYLLTSHRTATAAEELAFLMQSLGWATLVGEITAGSLLHTHTVSLLETPEG---GLALTVPVLTFIDNHG---ECWLGGGVVPDAIVLAEEALDRAQEVL M2_monDom RYNPGGPSS-AVPLLLSYFQDPAAGPIRLFTTYDRQTNQTQEHLSRAELLGKPYGAQRGVYLLTSHHTATAAEEFAFLMQSLGRATLVGEITAGSLMHTRTFPLLQPPNG---NLVLTVPILTFIDNNG---ECWLGGGVVPDAIVLAEEALDKAKEVL M2_sacHar RYNPGGPSS-AVPLLLSYFQDPSAGPVRLFATYDRQTNQTQEYRSRAELLGKPYGAERGVYLLTSYHTATAAEEFAFLMQSLGRATLVGEITAGSLMHTRTFPLLQPPNG---SLVLTVPTLTFIDNHG---ECWLGGGVVPDAIVLAEEALDKAKEVL M2_macEug RYNPGGPSS-AVPLLLSYFQDPAAGPVRLFATYDRQTNQTREYRSQAELLGKPYGAQRGVYLLTSHHTATAAEEFAFLMQSLGRATLVGEITAGNLMHTRTFSLLQPPDG---SLVLTVPILTFIDNHG---ECWLGGGVVPDAIVLAEEALDKAKEVI M2_galGal RYNPGGPSSSAVPMLISYFQDPTAGPVHLFTTYDRRTNHTQEHNSQAELLAQPYGAQRGIYVLTSRHTATAAEEFAYLMQSLGRATLIGEITAGSLSHTCTFPLVQPEQGITRGLTITVPVITFIDNHG---ESWMGGGVVPDAIVLAEDALEKAEEVL M2_taeGut 
RYNLGGPSSSAVPVLLSYFQDPAAGPVHLFTTYDRRTNHTQEHNSQAELLGQSYGAKRGVYLLTSHHTATAAEEFAYLMQSLGRATLIGEITAGSLSHTRTFPLLQPGPGITRGLTITVPVITFIDNHG---ESWMGGGVVPDAIVLAEDALEKAEEVL M2_anoCar RYNIGGPSSSAVPVLLSYFQDPSAGPVHFFTTYNRLTNQTQAYSSSAEMVGKPYGARRGVYLLTSHNTATAAEEFAYLMQTLGRATLVGEITAGSLSHTHTFCILELGGGC--GLLINVPVITLIDNHG---EYWLGGGVVPDSIVLADEALEKAREVL M2_ornAna RNNPGGPSS-AVPLLLSYFQDPAAGPIRLFTTYNRPADVTREYASRAGALEKPYGARRGVYLLTSHRTATAAEEFAYLMQALGRATLVGEITAGRLLHSRTFPLLRPPWE---GLVLTVPFLTLFDPHG---EGWLGGGVVPDAIVLAEEALEKAGEVL M2_xenTro RYNIGG-SSTSIPLLLSYFQEPE-NRIHLFTIYNRQQNSTNEVYSLPKVLGKPYGSKKGVYVLTSHETATAAEEFAYLMQSLSRATIIGEITSGNLMHSKAFPLDGTR------LSVTVPIMNFIDNNG---DYWLGGGVVPDAIVLADEALDKAKEII M2_xenLae RYNVGG-SSTAVPLLLSYFLDPE-TKIHLFTLHNRQQNSTDEVYSHPKVLGKPYGSKKGVYVLTSHQTATAAEEFAYLMQSLSRATIIGEITSGNLMHSKVFPFDGTQ------LSVTVPIINFIDSNG---DYWLGGGVVPDAIVLADEALDKAKEII M2_petMar RYNLGG-SSTAIPIVLSYFQDVA--PVHFYTVYDRLRNVTAEFHTVSNLTSQLYGSKKGVYLLTSQHTATAAEEFTYLMQSLNRATIVGEITSGRLAHSLAFRLSDT------GLYMTVPIVNFIDNND---EYWLGGGVVPDAIVLAENALDAAKEII M2x_takRu RHNTGG-SSTSVALLLSYLRDPL-PKRHFFTIYDSVQNTTTEYGSRPHIPGPSYGSERGVYVLTSHYTAGAAEEFAYLIQSLHFGTVVGEITSGTLMHSKTFQVEGT------DIFITVPFINFLDNNG---EYWLGGGVVPDAIVLAEEALEHVNRTA M2x_danRe RFNTGG-STEALPILLSYMFDTS-SSTYLFSIYDSIKNTTFDFHTLNNISGPSYGSTKGVYVLTSYYTAEAGEEFAYLMQSLHRGTVIGEITSGMLLHSKTFQIEQT------SLAITVPIINFIDVNG---ECWLGGGVVPDAIVLAEEALERAHEII M3_homSap RYNPGSYST-AIPLLCSYFFEAE-PRQHLYSVFDRATSKVTEVWTLPQVAGQRYGSHKDLYILMSHTSGSAAEAFAHTMQDLQRATVIGEPTAGGALSVGIYQVGSS------PLYASMPTQMAMSATTG--KAWDLAGVEPDITVPMSEALSIAQDIV M3_bosTau RYNPGSYST-AVPLLCSYFFEAE-PRRHLYSVFDRATSRVTEVWTLPHVTGQRYGSHKDLYVLVSHTSGSAAEAFAHTMQDLQRATIIGEPTAGGALSVGIYQVGSS------ALYASMPTQMAMSASTG--EAWDLAGVEPDITVPMSVALSTARDIV M3_monDom RYNPGGYST-AVPLLCSYFFEAE-PRRHLYTIFDRAASQLTEVWTLPQVAGERYGSQKDLYILISHTSGSAAEAFVHTMKDQHRATVIGEPTGGGALSVGIYQVENS------PLYASMPTQVAISPVTG--KAWDMAGVEPDVSVLSSEALMTTQGIV M3_macEug 
RYNPGSYST-SVPLLCSYFFEAE-PRKHLYTIFDRAASQATEVWTLPQVAGERYGSQKDLYILISHTSGSAAEAFAHAMKDLRRATVIGEPTAGGALSVGIYQVSNS------PLYASMPTQVAISPVTG--KAWDIAGVEPDVSVPAREALITTQGIV M3_sacHar RYNPGSYST-TVPLLCSYFFEAE-PRKHLYTVFDRATSQFTEVWTLPQVTGERYGSQKDLYILISHTSGSAAEAFAHIMKDLQRATVIGEPTAGGALSVGIYQVGDS------PLYVSMPTQVALSPVTG--KAWDMAGVEPDVSVLANEALITAQGIV M3_taeGut RYNTGGYST-AIPILCSYFFDPE-PRKHLYTVFDRSTSRSTEVWTLPQLAGKRYGSLKDIYILTSHMSGSAAEAFTRSMKDLHRATVVGEPTVGGSLSVGIYRVGNS------SLYASIPSQVVLSPVTG--KVWSVSGVEPHITIQASEAMAAAQHIA M3_galGal RYNTGGYST-AVPILCSYFFEPE-PRQHLYTVFDRSTSRSTEVWTLPKVTGKRYGSLKDIYILTSHMSGSAAEAFTRSMKDLHRATVIGEPTVGGSLSVGIYRVGNS------SLYRSIPSQVVLSPVTG--KVWSVSGAEPHITIQASEALAAAKHIA M3_anoCar RYNTGSYST-AVPMFCSYFFDAE-PQQHLYTIIDRSTSQSTEVWTSSQVSGKRYGSTKDLYILISHASGSAAEAFTRSLKDLHRATVIGEPTVGGSLSASIYNIGST------PLYASIPSQIVLSPVSG--KVWSLSGIQPHVTTQSNEALASAQNII M3_xenTro RYNTGGYST-AIPIFCSYFFDPE-PLQHLYTVYDRSTSSGTDIWTLPEVVGERYGSTKDIYILTSHMTGSAAEVFTRSMKELNRATIIGEPTSGVSLSVGMYKVGES------NLYVSIPNQVVISSVTG--KVWSVSGVEPHVIAQASEAMNVAHHII M3_xenLae RYNTGGYST-AIPIFCSYFFDPE-PLQHLYTVYDRSTSTGKDIWTLPEVFGERYGSTKDIYILTSHMTGSAAEVFTRSLKDLNRATLIGEPTSGVSLSVGMYKVGDS------NLYVTIPNQVVISSVTG--KVWSVSGVEPHVIIQANEAMNIAHRII M3_petMar RYNMGGYST-SIPILCSYFFDAS-PPRHLYTVFDRPSRSSTQVFTVPRVLGQRYGASKDVYILTSHMTGSAGEILTRVMSDLKRATVIGEPTAGGSLSTGTYRIGDS------RLYVFIPNQAGVSPSGG--RTWSVAGVEPHVQTKASEALQSALRMV M3x_takRu RYNAGGSST-AVPLWCSYFVDGE-PLQHLYTVYDRTTKTRVEVMTLPEVSGQRYDPGKDVYILTSHMTGSAAEAFVRAMRDLNRVTIVGEPTAGGSLSSATYQIGES------VLYASIPNQVVTSAATG--KLWSISGVEPDVFAQARDALPVAQRII M3x_danRr RYNTGGYST-AIPLLCTYFFDAQ-PLTHIYTLFDRSTATVTKVTTLPDVLGQKYSSQKDVYILTSHITGSAAEAFTRTMKDLKRATVIGEPTIGGALSSGTYQIGNS------ILYASIPNQAVLNAVTG--KPWSISGVEPHIVAQASDALIVAQKII M4_petMar RYNMGGYST-SIPILCSYFFDAS-PPRHLYTVFDRPSRSSTQVFTVPRVLGQRYGASKDVYILTSHMTGSAGEILTRVMSDLKRATVIGEPTAGGSLSTGTYRIGDS------RLYVFIPNQAGVSPSGG--RTWSVAGVEPHVQTKASEALQSALRMV M4_homSap 
MFNIGGPTS-SIPILCSYFFDEG-PPVLLDKIYSRPDDSVSELWTHAQVVGERYGSKKSMVILTSSVTAGTAEEFTYIMKRLGRALVIGEVTSGGCQPPQTYHVDDT------NLYLTIPTARSVGASDG--SSWEGVGVTPHVVVPAEEALARAKEML M4_bosTau RFNIGGPTS-SISALCSYFFDEG-PPILLDKIYNRPNNSVSELWTLSQLEGERYGSKKSMVILTSTLTAGAAEEFTYIMKRLGRALVIGEVTSGGCQPPQTYHVDDT------DLYLTIPTARSVGAADG--SSWEGVGVVPDVAVPAEAALTRAQEML M4_monDom RFNIGGPTS-SISALCSYFFDEG-QEVLLDQIYNRPNDSISEIWTQSQVAGERYGSKKSVIILTSSMTAGAAEEFVYVMQRLGRALVIGEVTSGGCQPPQTYHVDDT------DLYITIPTARSVGSGDK--PSWEGVGVAPHVEVPADQALSKAKEMF M4_macEug RFNIGGPTS-SISALCSYFFDEG-QKVLLDRIYNRPNDSIVEIWTQPHVTGERYGSKKSVIILTSSTTAGAAEEFVYIMQGLGRALVIGEVTSGGCQPPQTYHVDDT------DLYITIPTAQSVGSGDR--PSWEGIGVTPHVEVPADQALSKAKEMF M4_sacHar RFNIGGPTS-SISAMCSYFFDEG-QGVLLDRIYNRPNDSISEIWTQPQVIGERYGSKKTVVILTSSMTAGAAEEFVYIMQRLGRALVIGEVTSGGCQPPQTYHVDDT------DLYITIPTTRSVVSGDK--SSWEGVGVVPHMEVPADQALSKAKEMF M4_ornAna R-NIGGPTS-SISALCSYFFDED-HPVLLDKIYNRPNDSISEIWTHSHIAGERYGSRKSVVILTSNMTAGAAEEFVSIMKRLGRALVVGEVTGGGCHPPQTYHVDDT------HLYITIPTSRSVGSEDG--SSWEGVGVTPHLVVPADVALSRAKDLF M4_xenLae RFNIGGPTS-SIPIFCSYFFDEG-TPVLLDKIYSRTSNAMTDIWTLPDLVGKTFGSKKPLIILTSSLTEGAAEEFVYIMKRLGRAYVVGEVTSGGCHPPQTYHVDDT------HLYLTIPTSRSASAEPG--ESWEGKGVLPDLEISSETALLKAKEIL M4_xenTro RFNIGGPTS-SIPTFCSYFFDEG-TPVLLDKIYSRTTNAITDVWTLPHLVGNAFGSKKPVIILTSSLTEGAAEEFVYIMKRLGRAYVIGEVTSGGCHPPQTYHVDDT------HLYLTIPTSRSASAKPG--ESWEGKGVLPDLEITSETALMKAKEIL M4_taeGut RYNIGGSTT-PIAILCSYFFDEG-HPVLLDRVYDRPSDSVKEIWTQPQLKGERYGSQKGLVILTSAVTAGAAEEFVYIMKRLSRALIIGEQTSGGCHSPQTYQVDET------NFYVVIPTSRSVTSADS--TSWEGKGVSPHIETPAETALIKAKEML M4_galGal RYNIGGYTN-SIPILCSYFFDEG-HQVLLDKVYDRPSDSVKEIWTQPQLRGERYGSQKGLIILTSAVTAGAAEEFVFIMKRLGRALIIGEQTSGGSHSPQTYQVDDT------NFYIIIPTARSVISAES--ASWEGKGVPPHMETPAVTALIKAKEVL M4_anoCar RYNVGGPAC-SVPLLCSYFFDEG-HPILLDKVYNRPNDTTSNIWTVSKLAGKRYGLNKGLIILTSSVTSGAAEEFAHIMKRLGRAFIIGQKTSGGCHPPQTFHVDGT------NLYITTPVSRSVFSVN---DSWEGVGVSPHLDVSTDVALIKAKEML M4_tetNig 
RNNVGGPTT-AIAGFCSYFFDAD-KQNRVGQAVRQASGTTTELLTLSELTGVRYGSKKSLIILTSGATAGAAEEFVYIMKKLGRAMIVGETTAGASHPPQTFRVGET------DVFLLIPTVHSDTGA-G--PAWEGAGIAPHIPASAEAALGTARAIL M4_takRub RNNVGGPTT-AIAGFCSYFFDAD-KLIVLDKLHDRPSGTTTELLTLPELTGVRYGSKKSLIILTSGATAGAAEEFVYIMKKLGRAMIVGETTAGASHPPQVFSVGEI------GIFLSIPTVHSDTAA-G--PAWEGTGITPHIPVSAEAALGTAKGIL M4_gasAcu RNNIGGPTT-AIAGFCSYFFDSD-KQIVLDRLYDRPSGTTTELRTLPELTGTRYGSKKSLVMLTSRATAGAAEEFVYIMKKLGRAMIVGETTAGTSHPPKTFRVGET------DIFLSIPTVHSDTAA-G--PAWEGAGVAPHIPVPADAALETAKGIF M4_oryLat RNNVGGPTT-AIAGFCSYFFDGD-KQILLDKLYDRSTGTTTDLLTLGELTGERYGSKKSLIILASRATAGAAEEFVYIMKRLGRAMIVGETTAGASHPPKVFQVGES------DIFLSIPTVHSDTSA-G--PGWEGAGVAPHIPVAAGAALETAKAIL M4_danRer RNNIGGHAS-SIAGFCSYFFDAD-KQIVLDHIYDRPSNTTRDLQTLEQLTGRRYGSKKSVVILTSGVTAGAAEEFVFIMKRLGRAMIIGETTHGGCQPPETFAVGES------DIFLSIPISHS-TAQ-G--PSWEGAGIAPHIPVPAGAALDTAKGML
Evolutionary origin of RBP3 (IRBP)
The lamprey sequence was recovered by a French group in 2008 from unassembled genome project contigs, but it was neither explicitly provided in the article nor posted to GenBank. They reported four modules despite uncertainty over whether these all resided in the same gene. Below, a nearly full-length gene (exons 3 and 4 are missing) has been independently recovered from the initial lamprey assembly and parsed into its modules using Superfamily HMM. It indeed has four modules classifying to the expected types and ordering, establishing that the modern version of the gene was fully in place prior to the last common ancestor of mammals and lamprey. As the lamprey ancestor already had fully modern ciliary color vision (and so a need for retinol shuttling), the role of RBP3 (IRBP) in extant species may have already been established 500 myr ago.
It cannot yet be verified that lamprey has the third intron, though otherwise intron positions and phases are identical to those of other vertebrates. While no full-length chondrichthyan sequence is available, close study of the elephant shark contig containing exon 3 establishes that the final intron occurs with the same phase 0 and position seen in later-diverging tetrapods. While intron loss in lamprey is possible, the simpler explanation is intron gain in the stem for reasons provided below.
Callorhinchus milii contig AAVX01012059 establishes that exon 3 has standard flanking phase 1 introns on both sides:
VIASTSSLIVDLRYNIGGPTSSIPILCSYFFDDDKTVLLDTVYSRPTDTISEMKAIPQVAGNGSTESSVHSYIGERYGSKKSMVIL
+I T+SL D RYNIGGPTSSIPILCSYFFDDDKTVLLDTVYSRPTDTISEMKAIPQVAGNGSTESSVHSYI E K +I+
LIIETNSL-RDHRYNIGGPTSSIPILCSYFFDDDKTVLLDTVYSRPTDTISEMKAIPQVAGNGSTESSVHSYICEDLHHCKYGLII
Previous efforts to trace the gene back to earlier deuterostomes (or metazoa) proved futile. That remains the case as of March 2009 for tunicate (despite a third assembly and massive transcript set) and the much-studied sea urchin, establishing that in all likelihood homologs have been lost. It is possible, though implausible, that sequences have diverged to the point of unrecognizability. It is equally implausible that a complex fold matching known bacterial folds arose de novo in Cambrian deuterostomes. This creates the unresolved dilemma of ghost gene retention over an immense time frame in ancestral eyeless species yet loss in almost all extant clades.
With the advent of the second assembly of the cephalochordate Branchiostoma floridae (which has far fewer polymorphism-related assembly stutters than the first release), it is straightforward to recover a homolog in this species (after rubbishing an unmotivated fusion to the adjacent sulfotransferase in a JGI gene model). The smaller gene here consists of a single module that clusters best with M3 and M4. This suggests that the module number expansion -- like so many key events in vertebrate gene evolution -- took place between cephalochordate and agnathan divergences.
The shocking aspect of the Branchiostoma RBP3 (IRBP) gene is its 9 exons. These range in size from 30 to 66 amino acids, quite typical of the average vertebrate gene. Splice junctions are all standard GT-AG. The anomaly here is really in the immense size of the first exon of the vertebrate four-module gene, which extends for 1018 residues in human. That can be placed in perspective using the UCSC Table Browser to determine the overall size distribution of the 190,000-odd human coding exons. The average protein has 450 residues and 8-9 exons.
The second anomaly is that the placement and phasing of the Branchiostoma introns do not correspond at all (via blastp registration) to those of module 4, the only vertebrate module with internal exons (3 of them). A great majority of introns are immensely conserved -- from human to cnidarian -- so the notion of massive erasure followed by de novo intronation in either the Branchiostoma or vertebrate gene can be discarded. However some explanation of descent is required because the genes are strongly homologous (chance expectation e-30) though at best 31% identical and gappy in alignment.
The Branchiostoma gene can be assumed orthologous to vertebrate IRBP for lack of a better candidate, but nearby genes in the assembly (EARS2 DNAJC19 UGT2B4 CD79B) bear no obvious synteny to those flanking the human RBP3 gene (RBP3 ZNF488 GDF2 GDF10 ANXA8L1) and only broken chromosomal correspondence is seen in whole genome alignment to human (net track).
It follows that searches for earlier diverging species that still carry a homolog should probably be carried out with a single-domain protein as query. The amphioxus protein is the only one currently available and may be highly diverged from its ancestral form. In any event, it does not expose any cryptic homologs in tunicate or echinoderm, much less lophotrochozoa, arthropods or cnidaria known to have ciliary opsin systems.
Under the twin assumptions of orthology and approximately ancestral intronation pattern in Branchiostoma, how do we get from a one module gene with 8 introns to a four module gene with 3 unrelated introns by lamprey divergence? This in some ways may have been the critical step in the evolution of imaging eyes with their massive requirement for rapid retinol recycling that likely vastly exceeded that of the ancestral cephalochordate (note RPE65 is rate-limiting). IRBP co-evolved its interstitial location and shuttling function with the newly developed supporting role of the retinal pigmented epithelium, which together were essential for effective vertebrate imaging vision.
One hypothetical scenario is formation first of a retroprocessed intronless gene with one module. This may have displaced the nine-exon parental gene because of selection for higher throughput translation. The protein may have been initially a non-cyclic tetramer of discrete subunits. The gene dosage perhaps doubled via a tandem copy, again to meet the need of increased retinol cycling. Next the intervening untranslated region experienced deletions fusing the previously independent modules, leading to a dimer of the internal repeat dimer at the protein level.
The gene complex then doubled again perhaps through recombinational module mismatch, once again to meet the newly evolving need for rapid retinol recycling -- at this point all four modules may have bound retinol. New introns were gained by an unexplained process that favored the 3' end of the gene. Finally a need for allosteric regulation or interaction with other proteins allowed subfunctionalization of other domains to non-retinol shuttling roles despite the apparent loss of the previous selection for efficiency because RPE65 had become rate-limiting.
While this scenario is speculative and non-unique, elements of it are not uncommon in other genes, even in opsins. For example, a retroprocessed fish RHO1 displaced the parental 5-exon gene into the pineal, and LWS repeatedly spawned secondary cone opsin genes that were initially tandem. A great many human proteins have internally repeated domains, and some, like titin, have quite large numbers of them. There may not exist sufficient surviving members of the cephalochordate, hagfish and lamprey clades to specifically illuminate what really happened here. The highest priority would be to study localization and function of RBP3_braFlo within the visual system of that organism.
>RBP3_braFlo Branchiostoma floridae Region: 9 exons; 1 domain: 83-381
0 MTRPSKVDIVFPIKPFTIPTAHEQVKGEGPVDINKNALCKSADEGHTHP 1
2 VSIAMAPTAYIVFVALVPTVLSVDWLDVVMGIGDVMADHYLDQDLRALNDQSLLQRWNRTLVHRFQ 0
0 SWSQDDMSDSLRMEEGLTSELRNITGDETIK 0
0 VWDFGVYENTTQEPVPREFYNFSTFVDNFK 2
1 KNREKHINVTMLEGNVGYVSIRSMSHIVDIILPDPEMTEFFLSKMAALNESK 0
0 AIILDLRYNLGGDREGVVHWASFFFNATPSVPLSDVYYRDGVNQYWTLLE 0
0 VPGGIRFPDMPLYLLTSNRTSREAEEFAYAMQVVNRTTIIGETT 1
2 AGEEFTGMWFPIDQTDVHLLTRTNVVRNPITQDSWSGK 1
2 GVTPDIIVPSEKALTVALRKIQGSEDTKMAASSGNIEPPRWTVYLVFICTSIAILTYPTFM*
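The phase-annotated exon record above lends itself to a quick mechanical check. The following is a minimal Python sketch, not the procedure used to build the gene model. It assumes (an inference from the record itself, where adjacent exons carry 1/2, 0/0 or 2/1 pairs) that the number before each exon and the number after it count the codon bases carried across the flanking splice junctions, so that the two numbers at a junction should sum to a multiple of 3. The function names `parse_phased_exons` and `junctions_consistent` are hypothetical, and only the exon body (not the header line) should be passed in.

```python
import re

# One exon: leading phase digit, residues, optional stop '*', optional trailing phase.
EXON = re.compile(r"(\d)\s+([A-Z]+)\*?(?:\s+(\d))?")

def parse_phased_exons(body):
    """Return a list of (leading_phase, residues, trailing_phase_or_None)."""
    exons = []
    for m in EXON.finditer(body):
        lead, seq, trail = m.groups()
        exons.append((int(lead), seq, None if trail is None else int(trail)))
    return exons

def junctions_consistent(exons):
    """Assumed rule: trailing phase + next leading phase is a multiple of 3."""
    return all(
        a[2] is not None and (a[2] + b[0]) % 3 == 0
        for a, b in zip(exons, exons[1:])
    )

# Exons 3 and 4 of RBP3_braFlo, copied from the record above.
sample = ("0 SWSQDDMSDSLRMEEGLTSELRNITGDETIK 0 "
          "0 VWDFGVYENTTQEPVPREFYNFSTFVDNFK 2")
exons = parse_phased_exons(sample)
lengths = [len(seq) for _, seq, _ in exons]
print(lengths, junctions_consistent(exons))
```

Run over all nine exons, the same functions would give the exon size list referred to in the text and flag any junction whose two phase numbers disagree under the assumed rule.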
RBP3 (IRBP) use in marsupial phylogeny
The first three homology domains and part of the fourth are all encoded by the first large exon of 1090 amino acids. Sequenced fragments of this exon (often starting internally within module M1) have been much used in marsupial phylogeny (along with the first intron of transthyretin). Indeed the 96 marsupial species in 51 genera having partially determined IRBP sequences at GenBank include a Dec 2008 partial sequence for the extinct species Thylacinus cynocephalus, as well as for Sarcophilus harrisii (replaced here by a full length sequence derived from the genome project).
The use of IRBP in phylogenetic trees requires a high level of sequence accuracy, particularly since 'informative' characters are precisely the differences. However the PCR survey sequences exhibit various discrepancies in comparison to very high coverage genomic sequences. Some differences could arise from valid polymorphisms in the individual animals used as DNA source or represent an acceptable (low) level of PCR sequencing error, but other discrepancies appear to be gross errors in GenBank submissions (all from the same research group).
The Sarcophilus entry AY532685 has 6 amino acids inconsistent with genomic data (too many for polymorphic variation) as well as a gross error affecting a 17 residue block. It seems likely that, in the rush to obtain enough 'informative' characters, a completely unworkable error rate has been tolerated. All marsupial phylogenetic studies based in part on analysis of IRBP thus come into question.
None of the fragmentary marsupial GenBank entries provide a numbering system relative to a full length marsupial IRBP gene (eg Monodelphis) or its modules; the 4x module repeat structure of the gene is conducive to erroneous module cross-alignment. cDNA sequences typically start at position 73 of M1 (beginning of first hyper-variable indel region) and continue half way through M2 (to position 143), for example EF028750 of Myrmecobius fasciatus (numbat). Thus the region commonly sequenced is poorly coordinated with internal modular structure or existing 3D data.
The closest matches to the thylacine IRBP are shown in the difference alignment of the first 60 residues below. These species all lie within the Dasyuromorphia. The indicated E-->K may be one of several phyloSNPs breaking this group into blue and green subclades. The numbat Myrmecobius fits implausibly (its amino terminal sequence EF028750 needs verification) -- its affinities seem to lie with the Didelphimorphia in view of the shared Q.VV.K motif. Thylacinus is not basal within Dasyuromorphia relative to Myrmecobius using IRBP. This is not a case of mis-comparison of modules.
* * * STSKAPQHDSKFTNATQEELLALFQQIIKYQVLEGNVGYLRVDYIPGREMIEEVGEFLVN EU091365 0 Thylacinus cynocephalus .........P..A..................I............................ AY532676 3 Myoictis wallacei ........NP..A............................................... AY532687 3 Neophascogale lorentzii ........NP..A........T...................................... AY532686 4 Phascolosorex dorsalis .........P..V............................................... AY532670 2 Parantechinus apicalis ....V....P..A..................I.....................L...... AY532675 5 Myoictis melas .........P..A...................................D........... AY532679 3 Dasyurus hallucatus ...E.....P..A............K........D.............D........... AY532685 6 Sarcophilus harrisii ...E.......RA..........L............................Q..K.... EF028748 6 Sminthopsis crassicaudata .......R.P.LA.........SL.......................Q....Q....... EF028749 8 Planigale ingrami ..A......P.LA.V.....................................K....... EF028736 6 Antechinus stuartii ..A......P.L..V.....................................K....... EF028743 5 Micromurexia habbema ..A......P.LA.V.....................................K....... EF028744 6 Murexchinus melanurus ..A......P.L..V....V................................K....... EF028746 6 Paramurexia rothschildi ..A......P.LA.V.....................................K....... EF028747 6 Phascogale calura ..A......P.LA.V.....................................K....... EF028745 6 Phascomurexia naso .SA......P.LA.V.....................................K....... AY532667 7 Murexia longicaudata ......K..PNLA........T.L..R....................Q.VV.K....... EF028750 12 Myrmecobius fasciatus ..PET...VP..A.V........L..M....................Q.VV.K....... AY233765 13 Caluromys philander ..PET...VP.LA.V.......QL..M....................Q.VV.K....... AF257675 15 Caluromysiops irrupta ..PET...VP.LA.V......T.L..M....................Q.VV.K....... 
AF257688 15 Glironia venusta .IPET...VP..A.V.R....T.L..M....................Q.VV.K....... AF257683 16 Didelphis albiventris .IPE....VP.LA.I......T.L..M....................Q.VV.K....... AF257686 15 Gracilinanus microtarsus .IPET...VP..A.V......T.L..M....................Q.VV.K....... AF257676 15 Marmosops noctivagus .IPET...VP.LA.V........L..M....................Q.VV.K....... AY233788 15 Philander opossum .IPET...VP.LA.I......T.L..M....................Q.VV.K....... AF257689 16 Thylamys pallidior
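The difference alignment above uses a simple convention: a dot means the residue matches the thylacine reference row, a letter marks a substitution, and the number after each accession appears to be the count of non-dot columns. A minimal Python sketch of that convention follows. The function names are hypothetical, and the two strings are only the first ten columns of the Thylacinus and Myoictis wallacei rows.

```python
def to_difference_row(reference, sequence):
    """Replace residues identical to the reference with '.'."""
    if len(reference) != len(sequence):
        raise ValueError("sequences must be pre-aligned to equal length")
    return "".join("." if r == s else s for r, s in zip(reference, sequence))

def from_difference_row(reference, row):
    """Expand a dotted row back into a full sequence."""
    if len(reference) != len(row):
        raise ValueError("row must have the same length as the reference")
    return "".join(r if c == "." else c for r, c in zip(reference, row))

def count_differences(row):
    """Number of columns that differ from the reference."""
    return sum(1 for c in row if c != ".")

reference = "STSKAPQHDS"   # first ten residues of the thylacine row
row = ".........P"         # first ten columns of the Myoictis wallacei row
full = from_difference_row(reference, row)
print(full, count_differences(row))
```

Expanding rows back to full sequences before any tree building makes it harder for a dotted row to be read against the wrong reference.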
Using Sarcophilus as probe in a different region, 721-900, we find a very peculiar outcome for numerous GenBank marsupial entries: the 17 residue region orthologous to 38-54 in M2 (VLTEEDLAAKLNAMLQA in M2_sacHar) has been replaced by a non-homologous 17 residue section copied over from 143-158 of M1 (SLVLDLQHSRGGEVSG in M2_myrFas), explaining non-alignment in the latter region (blue shows M2):
myrFas: 61 DIWKKVMETSSLVLDLQHSRGGEVSGIPFVISYLHQGDILLHVDTIYDRPSNTTTEIWTL 120
DIWKKVMETSSLVLDLQHSRGGEVSGIPFVISYLHQGDILLHVDTIYDRPSNTTTEIWTL
sacHar: 154 DIWKKVMETSSLVLDLQHSRGGEVSGIPFVISYLHQGDILLHVDTIYDRPSNTTTEIWTL 213

myrFas: 121 PQVLGERYSGEKDIVVLTSHHTVGVAEDIAYILKEMRRAIVVGEQTQGGVLDLRKLRIGQ 180
PQVLGERYSGEKDIVVLTSHHTVGVAEDIAYILK+MRRAIVVGEQTQGG LDLRKLRIGQ
sacHar: 214 PQVLGERYSGEKDIVVLTSHHTVGVAEDIAYILKKMRRAIVVGEQTQGGSLDLRKLRIGQ 273

myrFas: 181 SDFFITVPVARSLSPLGGGSQTLEGSGVVPCVGIPAEKALEKTLAILTLRRARPGAIQRL 240
SDFFITVPVARSLSPLGGGSQT EGSGVVPCVGIPAE+ALEK LAILTLRRARPGAIQRL
sacHar: 274 SDFFITVPVARSLSPLGGGSQTWEGSGVVPCVGIPAEQALEKALAILTLRRARPGAIQRL 333

myrFas: 241 MEILQKYYTLVDRVPALLHHLTAIDYSSSLVLDLQHSRGGEVSGTVSEDPRLLVRVLRSE 300
MEILQKYYTLVDRVPALLHHLTAIDYSS L + ++ + VSEDPRLLVRVLR E
sacHar: 334 MEILQKYYTLVDRVPALLHHLTAIDYSSVLTEEDLAAKLNAMLQAVSEDPRLLVRVLRPE 393
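The copied block can be located mechanically by asking for the longest stretch shared between the M1 part and the M2 part of the same GenBank entry. The sketch below uses the longest-match routine from Python's standard `difflib` module. It illustrates the idea on two short excerpts taken from the myrFas rows above and is not the method by which the error was originally found; the helper name `longest_shared_block` is hypothetical.

```python
from difflib import SequenceMatcher

def longest_shared_block(a, b):
    """Return (block, start_in_a, start_in_b) for the longest common substring."""
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    m = matcher.find_longest_match(0, len(a), 0, len(b))
    return a[m.a:m.a + m.size], m.a, m.b

# Excerpts from the myrFas rows of the alignment above.
m1_region = "KVMETSSLVLDLQHSRGGEVSGIPFVISYL"   # within myrFas 61-120 (module M1)
m2_region = "LTAIDYSSSLVLDLQHSRGGEVSGTVSEDP"   # within myrFas 241-300 (module M2)

block, i, j = longest_shared_block(m1_region, m2_region)
print(block, len(block), i, j)
```

On these excerpts the shared block is SSLVLDLQHSRGGEVSG, one residue longer than the string quoted in the text because a serine precedes it in both places. Modules of a clean entry are diverged enough that an exact shared run of this length would not be expected, so the same test could be used to screen other fragmentary entries.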
The source of this error cannot plausibly be explained by XY difference, pseudogene, balanced polymorphism, nonhomologous recombination, frameshifts, internal inversion, or systemic experimental error (eg Dasyurus maculatus AY532680 is identical to AY243439 outside the 15 amino acid block). Genomic reads from Sarcophilus, Macropus and Monodelphis show no sign of a novel substitution gene despite excellent coverage. All Didelphimorphia and Diprotodontia entries at GenBank are normal; sauropsids, platypus and dozens of placentals establish this as ancestral. For some marsupials, independent sequencing finds the normal version of the gene.
These peculiar sequences were all submitted by C. Krajewski on 23-JAN-2004 and utilized in a paper entitled "Molecular systematics of the enigmatic phascolosoricine marsupials of New Guinea" published in the Aust. J. Zool. 52, 389-415 (2004). Two similarly flawed sequences submitted the same day (AY532677, AY532679) were subsequently corrected on 25-SEP-2006 by the same individual resulting in GenBank entries being fixed. However many subsequent studies of IRBP may have used the 17 flawed sequences that were never corrected. The same authors resequenced species such as Pseudantechinus roryi (EU086689) obtaining the correct sequence in this region on 07-AUG-2007 but still did not fix their earlier entry AY532673.
AY532685 MEILQKYYTLVDRVPALLHHLTAIDYSSSLVLDLQHSRGGEVSGTVSEDPRLLVRVLRSE Sarcophilus harrisii AY532684 ....E................................S....................P. Dasyurus geoffroii AY532681 ....E................................S....................P. Dasyurus albopunctatus AY532683 ....E................................S....................P. Dasyurus viverrinus AY532682 ....E........................P.......SE...................P. Dasyurus spartacus AY532680 ....E..............R.................SR...................P. Dasyurus maculatus AY532678 ..V..................................S....................P. Dasycercus cristicauda AY532669 ..V..................................S....................P. Dasykaluta rosamondae AY532676 ..V..................S...............S....................P. Myoictis wallacei AY532675 ..V..................S...............S....................P. Myoictis melas AY532687 ..V........N.L.......................S....................P. Neophascogale lorentzii AY532671 ..V..................................S....................P. Parantechinus bilarni AY532670 ..V.................................TS.........RG.........P. Parantechinus apicalis AY532686 ..V..................................S........P...........p. Phascolosorex dorsalis AY532674 ..V.......................................................P. Pseudantechinus ningbing AY532672 ..V..................................S....................P. Pseudantechinus woolleyae AY532673 ..V........N..R......................S...................SP. Pseudantechinus roryi 454 read MEILQKYYTLVDRVPALLHHLTAIDYSSVLTEEDLAAKLNAMLQAVSEDP Sarcophilus harrisii EF028739 ............................V.TEEDLAAKLNAMLQA.............P. Antechinus minimus AY243439 ....E..............R........V.TEEDLAAKLNAMLQA.............P. Dasyurus maculatus EF028750 ....K................KT.....I.TEEDLAAKLNAILQA.............P. Myrmecobius fasciatus EF028737 ..V.........................V.TEEDLAAKINAMLQA.............P. 
Antechinus flavipes EF028748 ..V.........................V.TEEDLAAKLNA.LQA.............P. Sminthopsis crassicaudata AY243438 ..V.........................V.TEEDLAAKLNA.LQA.............P. Planigale sp. EF028749 ..V.........................V.TEEDLAAKLNA.LQA.............P. Planigale ingrami AY532679 ..V.........................V.TEEDLAAKLNAMLQA............... Dasyurus hallucatus AF025382 ..V.........................V.TEEDLAAKLNAMLQA.............P. Phascogale tapoatafa EF028741 ..V.........................V.TEEDLAAKLNAMLQA.............P. Antechinus godmani AY532666 ..V.........................V.TEEDLAAKLNAMLQA.............P. Antechinus swainsonii EF028736 ..V.........................V.TEEDLAAKLNAMLQA.............P. Antechinus stuartii EF028742 ..V.........................V.TEEDLAAKLNAMLQA.............P. Antechinus agilis EF028738 ..V.........................V.TEEDLAAKLNAMLQA.............P. Antechinus bellus EF028740 ..V.........................V.TEEDLAAKLNAMLQA.............P. Antechinus leo EF028747 ..V.........................V.TEEDLAAKLNAMLQA.............P. Phascogale calura EF028744 ..V.........................V.TEEDLAAKLNAMLQA.............P. Murexchinus melanurus EF028743 ..V.........................V.TEEDLAAKLNAMLQA.............P. Micromurexia habbema EU086688 ..V.........................V.TEEDLAAKLNAMLQA.............P. Pseudantechinus macdonnellensis EU086689 ..V.........................V.TEEDLAAKLNAMLQA.............P. Pseudantechinus roryi EU086686 ..V.........................V.TEEDLAAKLNAMLQA............SP. Pseudantechinus macdonnellensis EU086687 ..V.........................V.TEEDLAAKLNAMLQA..........G..P. Pseudantechinus mimulus AY532667 ..V.........................V.TEEDLAAKLNAMLQA.............P. Murexia longicaudata EF028746 ..V.........................V.TEEDLAAKLNAMLQA.............P. Paramurexia rothschildi AY532677 ..V.........................V.TEEDLAAKLNAMLQA.............P. 
Dasyuroides byrnei EF028745 ..V..........I..............V.TEEDLAAKLNAMLQA.............P. Phascomurexia naso Macropus eugenii assembly sacHar MEILQKYYTLVDRVPALLHHLTAIDYSSSLVLDLQHSRGGEVSGTVSEDPRLLVRVLRSE ME+LQ YYTLVDRVPALLHHLTAIDYSS L + ++ VSEDPRLLVRVLR E macEug MEVLQNYYTLVDRVPALLHHLTAIDYSSVLTEEDLAAKLNAGLQAVSEDPRLLVRVLRPE Monodelphis domestica assembly TSSLVLDLQHSSGGEISG sacHar MEILQKYYTLVDRVPALLHHLTAIDYSSSLVLDLQHSRGGEVSGTVSEDPRLLVRVLRSE ME+LQ YYTLVDRVPALLHHLTAIDYSS L + ++ VSEDPRLLVRVLR E monDom MEVLQNYYTLVDRVPALLHHLTAIDYSSVLTEEDLAAKLNAGLQAVSEDPRLLVRVLRPE Ornithorhynchus anatinus assembly sacHar EILQKYYTLVDRVPALLHHLTAIDYSSSLVLDLQHSRGGEVSGTVSEDPRLLVRVLRSE ++L+ YY LVDRVPALL HL A+D SS L + SR SEDPRLLVR L E ornAna DLLRDYYALVDRVPALLRHLAALDLSSVLSEEDLTSRLNAGLQAASEDPRLLVRRLEPE Equus caballus assembly sacHar EILQKYYTLVDRVPALLHHLTAIDYSSSLVLDLQHSRGGEVSGTVSEDPRLLVRVLRSE E LQ YYTLVDRVPALLHHL ++D+SS + D ++ VSEDPRLLV V+RS+ equCab EALQDYYTLVDRVPALLHHLASMDFSSVVSEDDLVAKLNAGLQAVSEDPRLLVWVVRSK
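The GenBank rows above use the usual dot notation, in which a "." means "same residue as the reference row". A minimal Python sketch for expanding such a row back into a full sequence follows. The function name is an illustrative assumption, and the AY532684 row is written with run lengths ("." * 32) as counted from the row above, so those counts should be checked against the original alignment before reuse.

```python
from typing import List, Tuple


def expand_dot_row(reference: str, row: str) -> str:
    """Replace each '.' in row with the residue at the same column of reference."""
    if len(reference) != len(row):
        raise ValueError("row and reference must have the same number of columns")
    return "".join(ref if ch == "." else ch for ref, ch in zip(reference, row))


def differences(reference: str, expanded: str) -> List[Tuple[int, str, str]]:
    """List (1-based column, reference residue, row residue) for each mismatch."""
    return [
        (col, ref, res)
        for col, (ref, res) in enumerate(zip(reference, expanded), start=1)
        if ref != res
    ]


# Reference row AY532685 (Sarcophilus harrisii) from the alignment above.
REFERENCE = "MEILQKYYTLVDRVPALLHHLTAIDYSSSLVLDLQHSRGGEVSGTVSEDPRLLVRVLRSE"
# Row AY532684 (Dasyurus geoffroii); dot runs written as counts (assumed, see lead-in).
ROW = "....E" + "." * 32 + "S" + "." * 20 + "P."

expanded = expand_dot_row(REFERENCE, ROW)
print(expanded)
print(differences(REFERENCE, expanded))
```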
It emerges from direct tblastn that the Sarcophilus individual sequenced was female: ATRX is well represented but ATRY is not (though the situation is somewhat confused by additional paralogs). Marsupial X and Y chromosomes are quite different from those of placentals:
"Many or most genes on the mammal Y chromosome evolved a testis-specific function after diverging from an X-borne copy with a general function in both sexes. In marsupial but not eutherian mammals, a testis-specific ortholog (ATRY) of the widely expressed X-borne ATRX gene lies on the Y chromosome. Since mutations in human ATRX cause sex reversal, it is possible that one function of ATRY in marsupials is testicular differentiation. We report here the isolation and sequencing of the tammar wallaby (Macropus eugenii) ATRY cDNA, and comparison of its sequence with that of tammar ATRX. The evolution of a testis-specific function for the ATRY protein distinct from the general role of ATRX in both sexes has been accompanied by sequence changes in many protein domains that would alter protein binding partners. A large open reading frame encodes a 1771 amino acid ATRY protein that has diverged extensively from ATRX. The conservation and loss of particular motifs identify those required for testicular function (ATRY) and function in other tissues (ATRX)."
Reference sequences
These are organized in two ways: parsed into functional modules (signal peptide and spacer residues are dropped), as determined by global multiple alignment consistency, and as intronated genes showing the position and phase of intron breaks. Both sets are in phylogenetic order with respect to the canonical deuterostome tree.
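For readers who want to process the intronated records programmatically, the following Python sketch splits one record body into exons with their flanking phases. It rests on an assumption inferred from the records on this page rather than stated by them: each exon is written as a start phase, its residues, and an end phase, with the two digits across each intron summing to 0 or 3. The names Exon and parse_phased_record are hypothetical, and the demo string is the human record with exon 1 abbreviated to its last ten residues and exon 4 to its first ten.

```python
from typing import List, NamedTuple


class Exon(NamedTuple):
    start_phase: int
    residues: str
    end_phase: int


def parse_phased_record(body: str) -> List[Exon]:
    """Split 'phase SEQ phase phase SEQ phase ...' into exons.

    Whitespace inside an exon (line wraps) is ignored; the tokens
    '0', '1' and '2' are read as phases, everything else as residues.
    """
    exons: List[Exon] = []
    start = None            # start phase of the exon being read, if any
    chunks: List[str] = []  # residue pieces of the exon being read
    for token in body.split():
        if token in ("0", "1", "2"):
            phase = int(token)
            if start is None:
                start = phase
            elif not chunks:
                raise ValueError("two phases in a row without residues")
            else:
                exons.append(Exon(start, "".join(chunks), phase))
                start, chunks = None, []
        else:
            if start is None:
                raise ValueError("residues found before a start phase")
            chunks.append(token)
    if start is not None:
        raise ValueError("last exon has no end phase")
    return exons


def phases_consistent(exons: List[Exon]) -> bool:
    """Check that phases across each intron sum to 0 or 3 (assumed convention)."""
    return all(
        (left.end_phase + right.start_phase) % 3 == 0
        for left, right in zip(exons, exons[1:])
    )


# Human record, exons 1 and 4 abbreviated for brevity (see lead-in).
DEMO = (
    "0 DRIPGIVPMQ 0 "
    "0 IPSPEVFEELIKFSFHTNVLEDNIGYLRFDMFGDGELLTQVSRLLVEHIWKKIMHTDAMIIDMR 2 "
    "1 FNIGGPTSSIPILCSYFFDEGPPVLLDKIYSRPDDSVSELWTHAQVV 1 "
    "2 GERYGSKKSM 0"
)

demo_exons = parse_phased_record(DEMO)
for number, exon in enumerate(demo_exons, start=1):
    print(number, exon.start_phase, exon.end_phase, len(exon.residues))
print("phases consistent:", phases_consistent(demo_exons))
```

Joining whitespace-separated pieces inside an exon is deliberate, since several records on this page are wrapped across lines in the middle of exon 1.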
RBP3 (IRBP) sequences parsed into structural modules
>M1_homSap LFQPSLVLDMAKVLLDNYCFPENLLGMQEAIQQAIKSHEILSISDPQTLASVLTAGVQSSLNDPRLVISYEPSTPEPPPQVPALTSLSEEELLAWLQRGLRHEVLEGNVGYLRVDSVPGQEVLSMMGEFLVAHVWGNLMGTSALVLDLRHCTGGQVSGIPYIISYLHPGNTILHVDTIYNRPSNTTTEIWTLPQVLGERYGADKDVVVLTSSQTRGVAEDIAHILKQMRRAIVVGERTGGGALDLRKLRIGESDFFFTVPVSRSLGPLGGGSQTWEGSGVLPCVGTPAEQALEKALAIL >M1_bosTau LFQPSLVLEMAQVLLDNYCFPENLMGMQGAIEQAIKSQEILSISDPQTLAHVLTAGVQSSLNDPRLVISYEPSTLEAPPRAPAVTNLTLEEIIAGLQDGLRHEILEGNVGYLRVDDIPGQEVMSKLRSFLVANVWRKLVNTSALVLDLRHCTGGHVSGIPYVISYLHPGSTVSHVDTVYDRPSNTTTEIWTLPEALGEKYSADKDVVVLTSSRTGGVAEDIAYILKQMRRAIVVGERTVGGALNLQKLRVGQSDFFLTVPVSRSLGPLGEGSQTWEGSGVLPCVGTPAEQALEKALAVL >M1_monDom IFQPSLVRDMAKILLDNYCFPENLMGMQEVIEQAIKSGEILDISDPQMLASVLTAGVQGALNDPRLVISFEPSIPETPQHVPKLANVTQEELLILLQQMIKYQVLEGNVGYLRVDYIPGQEVVEKVGEFLVNNIWKKLMGTSSLVLDLQHSSGGEISGIPFVISYLHQGDILLHVDTVYDRPSNTTTEIWTLPQVLGERYGGEKDMVVLTSHRTVGVAEDIAYILKKLRRAIVVGEQTLGGALDLRKLRIGQSDFFITVPVSRSLSPLGGGSQTWEGSGVLPCVGIPAEQALGKALAIL >M1_macEug IFQPSLVLDMAKILLDNYCFPENLMGMQEAIEQAIKSGEILDISDPQMLASVLTAGVQGSLNDPRLVISYEPSPAEAPQQSPKLTSLTQEELLTLLQQMIKYQVLDGNVGYLRVDYIPGQEVVEKVGEFLVNDIWKKLMGTSSLVLDLQHSTEGEVSGIPFVISYLHEGDILLHVDTVYDRPSNTTTEIWTLPQVLGERYSGEKDLVILTSHRTVGVAEDIAYILKKMRRAIVVGEQTLGGALDLRKLRIGQSDFFITVPVSRSLSPLGGGSQTWEGSGVLPCVGIPAEQALEKALAIL >M1_sacHar IFQPTLVLDMAKILLDNYCFPENLMGMQEAIEQAIKSGEILDISDPQMLANVLTAGVQGSLNDPRLVISYEPSTSEAPQHDPKFANATQEELLALFQKIIKYQVLEDNVGYLRVDYIPGRDMIEEVGEFLVNDIWKKVMETSSLVLDLQHSRGGEVSGIPFVISYLHQGDILLHVDTIYDRPSNTTTEIWTLPQVLGERYSGEKDIVVLTSHHTVGVAEDIAYILKKMRRAIVVGEQTQGGSLDLRKLRIGQSDFFITVPVARSLSPLGGGSQTWEGSGVVPCVGIPAEQALEKALAIL >M1_ornAna VSQPSMVLDVAKILLDNYCYPENLMGMQEAIEEAIQRGEILDIADPKRLASVLTAGVQGSLNDPRLVISYEPAPVAVSQQPPEPASLPAEQPLERLRPAVGSEVLEGNVGYLRVDRLPGREEIERVGAVLGRDIWEKLLGTSALVLDLRHSTGGHVSGIPFFISYFYPEGPALHVDTVYDRPSNATRQLWTLPRVLGARYAADKDVVVLTSRLTAGVAEDVAYILQQMRRAIVVGERTAGGPLVFRKLRVGLSDFFITVPVACSLGPLGGGGRSWEGSGVLPCVAVPADRALDEALDIL >M1_galGal 
IFQPTLVLDMAKVLLDNYCYPENLVGMQEAIEQAIKSGEILDISDPKMLANVLTAGVQGALNDPRLVISYEPSLHAAPKQEAETYPTREQLLSLIEHVVIYDKLEGNVGYLRIDYIIGQEVVEKVGAFLVDKVWKTLINTSALVIDLRYSTGGQISGIPFIISYLHEADKMLHVETVYNRPSNTTTEIWTLPKVLGERYSKDKDVIVLISHHTTGVAEDVAYILKHMNRAITLGEKTAGGSLDIQKLRIGPSNFYMMVPVSRSVSPLSGGGQSWEVSGVMPCVASEAEQALKKSLDIL >M1_taeGut IFQPTLVLDMAKVLLDNYCYPENLVGMQEAIEQAIKSGEILDISDPKMLANVLTAGVQGALNDPRLVISYEPLPHSGPKQEAEGSPTREQLLSLIEHVIMYDKLEGNVGYLRIDYIIGEEVVQKVGAFLVDKVWKTLIETSALVIDLRHSTGGQISGLPFIISYLHEQDKILHVETVYNRPSNTTTEIWTLPKVLGERYSKDKDVIVLISHHTTGVAEDVAYILKHMNRAITVGEKTAGGSLDIQKLRIGPSNFYMMVPVSRSVSPLSGGGQSWEVSGVMPCVATEAEQALQKSLDIL >M1_anoCar VLQSTLVLDMAKLLLDNYCLPENLVGMREAIEQAIKNGEVLDISDPKLLATVLTAGVQGALNDPRLVISYEPTAPAAPKQRMETSLTPEQLLSLIQHTVKYEVLDDNVGYLRIDYIMGQDIVQKIGSFLVEKVWKTLLGTSALILDLRYTTGGDVSGIPFIISYLYNGDKVLHVDTVYNRPSNTTVEILTLPKVLGVRYSKDKDVILLISKYTTGVAENVAYILKHMHRTIIVGEKSAGGSLDTQKMQIGNSQFYMTVPLSCSVSPLSGSGQSWEISGVTPCVVISAEQALDKALAIL >M1_xenLae LFQPSLVMDMAKVLLDNYCFPENLVGMQETIEQAVKGGEILHISDPDTLANVFTSGVQGYLNDPRLVVSYEPNYSGPQTEQSLELTPEQLKFLINHSVKYDILPGNIGYLRIDFIIGQDVVQKVGPHLVNNIWKKLMPTSALILDLRYSTQGEVSGIPFVVSYLCDSEIHIDSIYNRPSNTTTDLWTLPELMGERYGKVKDVVVLTSKYTKGVAEDASYILKHMNRAIVVGEKTAGGSLDTQKIKIGQSDFYITVPVSRSLSPLTGQSWEVSGVSPCVVVNAKDALDKAQAIL >M1_xenTro VFQPSLVMDMAKVLLDNYCFPENLVGMQETIEQAMKSGEILHISDPETLANVFTSGVQGFLNDPRLVVSYEPNYSGPRKEQSPEPTLEQLKFLLDHSVTYDLLPGNIGYLRIDFIIGQDVVQKVGPLLVNNIWKKLMPSSALILDLRYSTQGKVSGIPFVVSYLTDPQIHIDSIYNRPSNTTTDLWTLSELMGERYGKDKDVVVLTSKYTEGIAEGAAYILKHMSRAIVVGEKTAGGSLDIQKIKIGQSEFYITVPVSRSISPLTGQSWEVAGVFPCVVVNANNALNKAQGIL >M1_tetNig AFPPSLIADMAKIVLDNYCSPEKLAGMKEAIKAAGTNTEVLNIPDGESLARVLSAGVQGTVSDPRLMVSFQPNYVPAGPHKMPPLPPEHLVAVLQTSVKLDILEGNTGYLRIDHILGEEVADKVGPALIDLIWNKILPTSALIFDLRYTSSGDISGIPYIVSYFTQAEPVVHIDSVYDRPSNTTTKLLSLPNLLGQRYGVSKPLIVLTSKNTKGIAEDVAYCLKNLKRATIVGEKTAGGSLKLDTFKVGDTDFYITVPTAKSINPITGSSWEIRGVTPHVEVNAEDALATAIKIV >M1_takRub 
AFPPSLITDMAKIVLDNYCSPEKLAGMKEAIEAAGTNTEVLNIPDGESLARVLSAGVQGTVSDSRLMVSYQPDYVPAVPPKMPPLPPEHLVAVLQTSIKLDLLEGNTGYLRIDHIIGEDVAEKVGPSLIDLIWNKILPTSALIFDLRYTSSGEISGIPYIVSYFTQAEPVVHIDSVYDRPSNTTTKLFSLSNLLGERYGITKPLIILTSKNTKGIAEDVAYCLKNLKRATIVGERTAGGSVKLDNFKVGSTDFYITVPTAKSINPVTGSSWEITGVKPDVEVNAEDALATAIKIV >M1_gasAcu GFAPNVIIDMAKIVIDNYCSPEKLAGMKEAIEAAGSNTEVLSIPDAETLANVLSAGVQTTVSDPRLMISYEPNYVPVVPPKMPPLPPDQVIAVLQTSIKLDILEGNIGYLRIDHILGEDVAEKVGPLLLDLVWNKILPTSALIFDLRYTSSGDISGIPYIVSYFTEAGTPIHIDSIYDRPSNTTTKLFSMSTLLGERYSTSKPLIILTSKNTKGIAEDVAYCLQNLKRATIVGEKTAGGSVKVDKIQVRDTGFYVTVPTAKSVNPITGSTWEVTGVTPNVEVNAEDALATAIKIV >M1_oryLat SFPPSLITDLAKIVMDNYCSPEKLSGMKEDIATAGANTDVLNIPDGEALAKVLTDGVQTTVSDPRLRVSYEPNYVPVVPPQLPPEQLIAVLQTSIKLDILEGNIGYLRIDSIIGEEVAEKVGPLLLELVWSKILPTSALIFDLRYTSSGDITGIPYIISYLTDAKSEIHIDTIYDRPLNTTTKLLSMQSTLGQTYGGTKPLLVLTSKNTKDIAEDVAYCLKNLKRATIVGEKTAGGSAKIKKFRVGDTDFYVTLPTAKSINPITGSSWEVTGVKPNVEVNAEEALATALKII >M1_danRer NFSPTLIADMAKIFMDNYCSPEKLTGMEEAIDAASSNTEILSISDPTMLANVLTDGVKKTISDSRVKVTYEPDLILAAPPAMPDIPLEHLAAMIKGTVKVEILEGNIGYLKIQHIIGEEMAQKVGPLLLEYIWDKILPTSAMILDFRSTVTGELSGIPYIVSYFTDPEPLIHIDSVYDRTADLTIELWSMPTLLGKRYGTSKPLIILTSKDTLGIAEDVAYCLKNLKRATIVGENTAGGTVKMSKMKVGDTDFYVTVPVAKSINPITGKSWEINGVAPDVDVAAEDALDAAIAII >M1_petMar KFDTAVVLHLAKVLLDNYCIPENLVGMDEAIQRAVDNGELLGVSDPESAASALTEGIQAALNDPRIAVSYVPDVDDDGDREEGDAEGWDAGEQHRPTTFEELLATIPQKTSFAVLDGNVGYLRADEIISEATIKKLGPVIVQRIWNRLVDTDTFVLDLRYNSHGDITGLPYLVSCFCEPRPVVHLDTVYYRPTNESKEIWSLPDLQGARFAKHKDVFVLVSANTEGVAENVAYVLKHLHRATVIGEQTAGGSLEVERFRLGDSRFFVTVPTARSQSPLTGRSWELTGVFPCVSAPSERALDKALEIL >M1x_takRu FYQHTLVLEMAKLLLENYCIPENLVGMQEAIQRAIKSREILQISDRKTLATVLTVGVQGALNDPRLSVSYEPSFSPLPLQALSSLPVEQQLRLLRNSIKLDILDSDVGYLRIDRIIDEETLLKFGPLLRENVWDKAAQTSSLILDLRFSTAGGWSGIPSIVSYFTEPHSLVHIDTVYDRPSNTTTELWTMSSVRGKTFGGKKDMIVLIGRRTAGAAEAVAYTLKHLNRAIVVGERSAGGSLKVRKFRIAESDFYITMPVARSVSPITGKSWEVSGISPTVNVAAREALAKAQTFL >M1x_danRe 
SFQSALVLDMAKILLDNYCFPENLIGMQEAIQQAINSGEILHISDRKTLASVLTAGVQGALNDPRLTVSYEPNYTLITPPALHSLPTEQLIRLIRSTVKLEVMDNNIGYLRIDRIIGQETVVKLGRLLHNNIWKKVAHTSAMIFDLRFSTAGELSGLPYIVSYFSDSDPLLHIDTIYERPTNITRELWTLPTLLGERFGKRKDLIVLISKRTIGAAEGVAYILKHLKRAVIIGERSAGGSVRVDKLKIGDSGFYITVPVARSVNPVTGQSWEVSGVAPSVTVNPKESIAKAKSLI >M2_homSap SALPGVVHCLQEVLKDYYTLVDRVPTLLQHLASMDFSTVVSEEDLVTKLNAGLQAASEDPRLLVRAIGPTETPSWPAPDAAAEDSPGVAPELPEDEAIRQALVDSVFQVSVLPGNVGYLRFDSFADASVLGVLAPYVLRQVWEPLQDTEHLIMDLRHNPGGPSSAVPLLLSYFQGPEAGPVHLFTTYDRRTNITQEHFSHMELPGPRYSTQRGVYLLTSHRTATAAEEFAFLMQSLGWATLVGEITAGNLLHTRTVPLLDTPEGSLALTVPVLTFIDNHGEAWLGGGVVPDAIVLAEEALDKAQEVL >M2_bosTau RALPGVIQRLQEALREYYTLVDRVPALLSHLAAMDLSSVVSEDDLVTKLNAGLQAVSEDPRLQVQVVRPKEASSGPEEEAEEPPEAVPEVPEDEAVRRALVDSVFQVSVLPGNVGYLRFDSFADASVLEVLGPYILHQVWEPLQDTEHLIMDLRQNPGGPSSAVPLLLSYFQSPDASPVRLFSTYDRRTNITREHFSQTELLGRPYGTQRGVYLLTSHRTATAAEELAFLMQSLGWATLVGEITAGSLLHTHTVSLLETPEGGLALTVPVLTFIDNHGECWLGGGVVPDAIVLAEEALDRAQEVL >M2_monDom RARPGAIQRLMEVLQNYYTLVDRVPALLHHLTAIDYSSVLTEEDLAAKLNAGLQAVSEDPRLLVRVLRPEEATMGSEASEEEDATPAANSLPEDESQRQALVDSVFQVSVLPGNVGYLRFDEFADSSVLGTLAPYVIRQVWEPLQDTNHLIMDLRYNPGGPSSAVPLLLSYFQDPAAGPIRLFTTYDRQTNQTQEHLSRAELLGKPYGAQRGVYLLTSHHTATAAEEFAFLMQSLGRATLVGEITAGSLMHTRTFPLLQPPNGNLVLTVPILTFIDNNGECWLGGGVVPDAIVLAEEALDKAKEVL >M2_macEug RARPAAIQRLMEVLQNYYTLVDRVPALLHHLTAIDYSSVLTEEDLAAKLNAGLQAVSEDPRLLVRVLRPEEATAGAESREEAATAAPVPLPDGESQRQALVNSVFQVSVLPGNVGYLRFDEFADSSVLGALAPYVLQQVWEPLQDTDHLIMDLRYNPGGPSSAVPLLLSYFQDPAAGPVRLFATYDRQTNQTREYRSQAELLGKPYGAQRGVYLLTSHHTATAAEEFAFLMQSLGRATLVGEITAGNLMHTRTFSLLQPPDGSLVLTVPILTFIDNHGECWLGGGVVPDAIVLAEEALDKAKEVI >M2_sacHar RARPGAIQRLMEILQKYYTLVDRVPALLHHLTAIDYSSVLTEEDLAAKLNAMLQAVSEDPRLLVRVLRPEEATVEAEPGEESATPASVSLPESDAERQALIDSVFQVSVLPGNVGYLRFDEFADNSVLGTLAPYVLRQVWEPLQDTDHLIMDLRYNPGGPSSAVPLLLSYFQDPSAGPVRLFATYDRQTNQTQEYRSRAELLGKPYGAERGVYLLTSYHTATAAEEFAFLMQSLGRATLVGEITAGSLMHTRTFPLLQPPNGSLVLTVPTLTFIDNHGECWLGGGVVPDAIVLAEEALDKAKEVL >M2_ornAna 
GAVPGAVAHLADLLRDYYALVDRVPALLRHLAALDLSSVLSEEDLTSRLNAGLQAASEDPRLLVRRLEPEEAERGPPRKEEEQKEEEEEDQPSPGASILPGDGSSLFRVSVLPGNVGYLCFDEFPEASALERLGPLLGRRVWEPLEATDHLMVDLRNNPGGPSSAVPLLLSYFQDPAAGPIRLFTTYNRPADVTREYASRAGALEKPYGARRGVYLLTSHRTATAAEEFAYLMQALGRATLVGEITAGRLLHSRTFPLLRPPWEGLVLTVPFLTLFDPHGEGWLGGGVVPDAIVLAEEALEKAGEVL >M2_galGal RAVPGTLSRLTDILKDYYSLVERVPVLLRHLTTSDFSSVQSAEDLATKLNTEMQTLSEDPRLLVRTMMPGEAAAPPAEMPIAMAANLPDNEQLLHALVDTVFKVSVLPGNVGYMRFDEFADASVLVKLGPYIVKKVWEPLQNTENLIMDLRYNPGGPSSSAVPMLISYFQDPTAGPVHLFTTYDRRTNHTQEHNSQAELLAQPYGAQRGIYVLTSRHTATAAEEFAYLMQSLGRATLIGEITAGSLSHTCTFPLVQPEQGITRGLTITVPVITFIDNHGESWMGGGVVPDAIVLAEDALEKAEEVL >M2_taeGut RAVPGTISHLKNILKDYYSLVERVPALLRRLTTSDFSSVQSSEDLATKLNTELQALSDDPRLMVRVMMPGEAADSPAEKPVGMAADLPDNEQLLHALVDTVFKVSVLPGNVGYMRFDEFADASVLVKLGPYLVHKVWEPLQNTENLIMDLRYNLGGPSSSAVPVLLSYFQDPAAGPVHLFTTYDRRTNHTQEHNSQAELLGQSYGAKRGVYLLTSHHTATAAEEFAYLMQSLGRATLIGEITAGSLSHTRTFPLLQPGPGITRGLTITVPVITFIDNHGESWMGGGVVPDAIVLAEDALEKAEEVL >M2_anoCar KAIPNSMSYLVDIIKNNYSMLEQVPVLLQHLSTFDYSSVLSVKDLASKLNAELQTISEDPRLFLRVPASDEAVTSQTDEKVAMASDLPNNEQLMKALVMTVFKVSVLPGNVGYMRFDEFGDATVLVKLGPYLLQHVWEPLQATDYLIIDLRYNIGGPSSSAVPVLLSYFQDPSAGPVHFFTTYNRLTNQTQAYSSSAEMVGKPYGARRGVYLLTSHNTATAAEEFAYLMQTLGRATLVGEITAGSLSHTHTFCILELGGGCGLLINVPVITLIDNHGEYWLGGGVVPDSIVLADEALEKAREVL >M2_xenTro SSITHILLQLSEILVNNYAFSERIPTLLQHLPNLDYSSVISEEDITAKLNYELQSLTEDPRLVLKSKTDSLVMPEDSTQVENLPDDEATLQALVNTVFKVSILPGNIGYLRFDEFADVSVLAKLGPYIVNTVWDPITVTENLIIDLRYNIGGSSTSIPLLLSYFQEPENRIHLFTIYNRQQNSTNEVYSLPKVLGKPYGSKKGVYVLTSHETATAAEEFAYLMQSLSRATIIGEITSGNLMHSKAFPLDGTRLSVTVPIMNFIDNNGDYWLGGGVVPDAIVLADEALDKAKEII >M2_xenLae SSVTHVLHQLCDILANNYAFSERIPTLLQHLPNLDYSTVISEEDIAAKLNYELQSLTEDPRLVLKSKTDTLVMPGDSIQAENIPEDEAMLQALVNTVFKVSILPGNIGYLRFDQFADVSVIAKLAPFIVNTVWEPITITENLIIDLRYNVGGSSTAVPLLLSYFLDPETKIHLFTLHNRQQNSTDEVYSHPKVLGKPYGSKKGVYVLTSHQTATAAEEFAYLMQSLSRATIIGEITSGNLMHSKVFPFDGTQLSVTVPIINFIDSNGDYWLGGGVVPDAIVLADEALDKAKEII >M2_petMar 
GVARKAVEAAGELLLSSYTFVERASAIADHLSWSEYGSVVSVEDLTSKLTQDLQSVAEDPRLVVSNREPEWPPLAQPIPPGPPAPLPDDEQMLEAIVDSAFKVEVLEGNIGYLRFDEFGDASAVMKLRKQLVSKVWERIHPTDDVIIDLRYNLGGSSTAIPIVLSYFQDASPPVHFYTVYDRLRNVTAEFHTVSNLTSQLYGSKKGVYLLTSQHTATAAEEFTYLMQSLNRATIVGEITSGRLAHSLAFRLSDTGLYMTVPIVNFIDNNDEYWLGGGVVPDAIVLAENALDAAKEII >M2x_takRu SRIPKVLQIVLDIIGRFYAFADRVQALLQQLESADLFSVVSEEDLAARLNHDLQTASEDPRLIIRHKRDNIPRAEEEPELHAANDHDGELVEGFTVQVLPHNTGYLRLDRFVRCSEGDKLEEIVAEKVWGPLKDTQNLIIDLRHNTGGSSTSVALLLSYLRDPLPKRHFFTIYDSVQNTTTEYGSRPHIPGPSYGSERGVYVLTSHYTAGAAEEFAYLIQSLHFGTVVGEITSGTLMHSKTFQVEGTDIFITVPFINFLDNNGEYWLGGGVVPDAIVLAEEALEHVNRTA >M2x_danRe KTIPKAVRRVSDIIKRYYSFKDKIPALLNQLAKADYFTVVSEEDLAGKLNHEMQSVFEDPRLLIKATQVLTDDASSEDRSSSDDLTDPLFKLEMISGNNGYLRFDRFPTPEVLLRLEDHIKKKIWQPVQETENLVIDLRFNTGGSTEALPILLSYMFDTSSSTYLFSIYDSIKNTTFDFHTLNNISGPSYGSTKGVYVLTSYYTAEAGEEFAYLMQSLHRGTVIGEITSGMLLHSKTFQIEQTSLAITVPIINFIDVNGECWLGGGVVPDAIVLAEEALERAHEII >M3_homSap QSLGALVEGTGHLLEAHYARPEVVGQTSALLRAKLAQGAYRTAVDLESLASQLTADLQEVSGDHRLLVFHSPGELVVEEAPPPPPAVPSPEELTYLIEALFKTEVLPGQLGYLRFDAMAELETVKAVGPQLVRLVWQQLVDTAALVIDLRYNPGSYSTAIPLLCSYFFEAEPRQHLYSVFDRATSKVTEVWTLPQVAGQRYGSHKDLYILMSHTSGSAAEAFAHTMQDLQRATVIGEPTAGGALSVGIYQVGSSPLYASMPTQMAMSATTGKAWDLAGVEPDITVPMSEALSIAQDIV >M3_bosTau RSLGELVEGTGRLLEAHYARPEVVGQMGALLRAKLAQGAYRTAVDLESLASQLTADLQEMSGDHRLLVFHSPGEMVAEEAPPPPPVVPSPEELSYLIEALFKTEVLPGQLGYLRFDAMAELETVKAVGPQLVQLVWQKLVDTAALVVDLRYNPGSYSTAVPLLCSYFFEAEPRRHLYSVFDRATSRVTEVWTLPHVTGQRYGSHKDLYVLVSHTSGSAAEAFAHTMQDLQRATIIGEPTAGGALSVGIYQVGSSALYASMPTQMAMSASTGEAWDLAGVEPDITVPMSVALSTARDIV >M3_monDom QRLGALVEGTGHLLEAHYALPEVVGQASALLKAKLEHGTYRTAVDFESLASQLTSDLQEVSGDHRLHVFHSPGEPVSEELTPPQKGVPSPEELTYLIEALFKTEVLPGQLGYLRFDMMAEAETVRAIAPQLVELVWEKLVHTEALVVDLRYNPGGYSTAVPLLCSYFFEAEPRRHLYTIFDRAASQLTEVWTLPQVAGERYGSQKDLYILISHTSGSAAEAFVHTMKDQHRATVIGEPTGGGALSVGIYQVENSPLYASMPTQVAISPVTGKAWDMAGVEPDVSVLSSEALMTTQGIV >M3_macEug 
QRLGALVEDAGHLLEAHYALPEVVGQASALLRARLVHGTYRTAVDFESLASQLTSDLQEVSGDHRVHVFHSPGELIPEELSPPQNVVPSPEELTYLIEALFKTEVLPGQLGYLRFDMMAEAETVRAIGPQLIELVWEKLVNTEALVVDLRYNPGSYSTSVPLLCSYFFEAEPRKHLYTIFDRAASQATEVWTLPQVAGERYGSQKDLYILISHTSGSAAEAFAHAMKDLRRATVIGEPTAGGALSVGIYQVSNSPLYASMPTQVAISPVTGKAWDIAGVEPDVSVPAREALITTQGIV >M3_sacHar QRLGALVEGTGHLVEAHYALPEVVGQASAFLRATLAHGTYRTAVDFESLASQLTSDLQEVSGDHRLHVFHSPGEPVPEESSPPHKGVPSPEELTYLIEALFKTDVLPQLGYLRFDMMAEVETVRAIGPQLVELVWEKLVNTEALVVDLRYNPGSYSTTVPLLCSYFFEAEPRKHLYTVFDRATSQFTEVWTLPQVTGERYGSQKDLYILISHTSGSAAEAFAHIMKDLQRATVIGEPTAGGALSVGIYQVGDSPLYVSMPTQVALSPVTGKAWDMAGVEPDVSVLANEALITAQGIV >M3_taeGut KNMGVLLEGTGQLLEDHYAIPEVAAKASAMLSTKRAQGGYRSAIDSETLASQLTSDLQEASGDHRLHVFHSHVEPTPEEQLPNVIPSPEELSYIIEALFKIEVLPGNLGYLRFDMMAEAETVKAIGPQLLQMVWNKLVDTDAMIIDMRYNTGGYSTAIPILCSYFFDPEPRKHLYTVFDRSTSRSTEVWTLPQLAGKRYGSLKDIYILTSHMSGSAAEAFTRSMKDLHRATVVGEPTVGGSLSVGIYRVGNSSLYASIPSQVVLSPVTGKVWSVSGVEPHITIQASEAMAAAQHIA >M3_galGal RKMGILLESTGQLLEAHYAIPEVAEKASVMLSTKRVQGGYRSAVDFETLASQLTSDLQEASGDHRLHVFHSHVEPTPEEQLPNMIPSPEELSYIIEALFKIEVLPGNLGYLRFDMMAEAETVKAIGPQLVQMVWNKLVDTDAMIIDMRYNTGGYSTAVPILCSYFFEPEPRQHLYTVFDRSTSRSTEVWTLPKVTGKRYGSLKDIYILTSHMSGSAAEAFTRSMKDLHRATVIGEPTVGGSLSVGIYRVGNSSLYRSIPSQVVLSPVTGKVWSVSGAEPHITIQASEALAAAKHIA >M3_anoCar KGMGSLIERVGQLLEAHYAIPEMARRVSSMLNSKLAQGGYRTAVDFETLASQLTNDLQETSGDHQLHVFHSHVEPSLEEQSPFKTLTPEELNFIIEALFKVDVLPGNVGYLRFDMMAEFESVKTIEPQILHMVWEKLVETSAMIVDMRYNTGSYSTAVPMFCSYFFDAEPQQHLYTIIDRSTSQSTEVWTSSQVSGKRYGSTKDLYILISHASGSAAEAFTRSLKDLHRATVIGEPTVGGSLSASIYNIGSTPLYASIPSQIVLSPVSGKVWSLSGIQPHVTTQSNEALASAQNII >M3_xenTro PSVFALVEGTGHLLEVHYAIPEVAYKVSSVLQNKWSEGGYRSVVDLESLASQLTSEMQENSGDHRLHVFYSDTEPEILEDQPPKIPSAEELNYIIDALFKIEVLQGNVGYLRFDMMADTEIIKAIGPQLVSLVWNKLVETNSLIIDMRYNTGGYSTAIPIFCSYFFDPEPLQHLYTVYDRSTSSGTDIWTLPEVVGERYGSTKDIYILTSHMTGSAAEVFTRSMKELNRATIIGEPTSGVSLSVGMYKVGESNLYVSIPNQVVISSVTGKVWSVSGVEPHVIAQASEAMNVAHHII >M3_xenLae 
PSIFPLVKGTGHLLEVHYAIPEVAYKVSSVLQNKWSEGGYRSVVDLESLASLLTSEMQENSGDHRLHVFYSDTEPEILEDQPPKIPSPEELNYIIDALFKIEVLPGNVGYLRFDMMADTEIIKAIGPQLVSLVWNKLVETNSLIIDMRYNTGGYSTAIPIFCSYFFDPEPLQHLYTVYDRSTSTGKDIWTLPEVFGERYGSTKDIYILTSHMTGSAAEVFTRSLKDLNRATLIGEPTSGVSLSVGMYKVGDSNLYVTIPNQVVISSVTGKVWSVSGVEPHVIIQANEAMNIAHRII >M3_petMar AKMASLLELAGALVEGYYAMLSDGENATAEILLKYREGWYRSVVDYEALASQLTSDLHEIWGDHRLHAFYSDLQIERMDEDKTPSVPSPEELSVLIDTVFKVDILANNVGYLRFDMMTDAEVLKHVGPQLVEKVWNKISSTRSLVIDVRYNMGGYSTSIPILCSYFFDASPPRHLYTVFDRPSRSSTQVFTVPRVLGQRYGASKDVYILTSHMTGSAGEILTRVMSDLKRATVIGEPTAGGSLSTGTYRIGDSRLYVFIPNQAGVSPSGGRTWSVAGVEPHVQTKASEALQSALRMV >M3x_takRu QGLRSLIGRTGELLEKHYAIQEVAQKVGEVLLSKWAEGLYRSVVDLESLASQLTADLQEASGDHRLHVFRCDVELESLHGVPKIAAVEEAGFVIDALFKSELLPRNVGYLRFDTMADIEAAKGAAPRLVKSVWNKLVDTDSLIIDMRYNAGGSSTAVPLWCSYFVDGEPLQHLYTVYDRTTKTRVEVMTLPEVSGQRYDPGKDVYILTSHMTGSAAEAFVRAMRDLNRVTIVGEPTAGGSLSSATYQIGESVLYASIPNQVVTSAATGKLWSISGVEPDVFAQARDALPVAQRII >M3x_danRr KNIQGLVQEAGDLLEKHYSVPEVAAKVSRLLQSKLTEGLYRSVVDYESLASQLTSDLQETSGDQRLHIFYCETEPETLHDTPKIPSPEEAGFIVEALFKVDVMSGNIGYLRFDMMEDIKVLQAINPEFLKVVWNKLVNTDMLIIDVRYNTGGYSTAIPLLCTYFFDAQPLTHIYTLFDRSTATVTKVTTLPDVLGQKYSSQKDVYILTSHITGSAAEAFTRTMKDLKRATVIGEPTIGGALSSGTYQIGNSILYASIPNQAVLNAVTGKPWSISGVEPHIVAQASDALIVAQKII >M4_homSap AKVPTVLQTAGKLVADNYASAELGAKMATKLSGLQSRYSRVTSEVALAEILGADLQMLSGDPHLKAAHIPENAKDRIPGIVPMQIPSPEVFEELIKFSFHTNVLEDNIGYLRFDMFGDGELLTQVSRLLVEHIWKKIMHTDAMIIDMRFNIGGPTSSIPILCSYFFDEGPPVLLDKIYSRPDDSVSELWTHAQVVGERYGSKKSMVILTSSVTAGTAEEFTYIMKRLGRALVIGEVTSGGCQPPQTYHVDDTNLYLTIPTARSVGASDGSSWEGVGVTPHVVVPAEEALARAKEML >M4_bosTau AKVPTVLQTAGKLVADNYASPELGVKMAAELSGLQSRYARVTSEAALAELLQADLQVLSGDPHLKTAHIPEDAKDRIPGIVPMQIPSPEVFEDLIKFSFHTNVLEGNVGYLRFDMFGDCELLTQVSELLVEHVWKKIVHTDALIVDMRFNIGGPTSSISALCSYFFDEGPPILLDKIYNRPNNSVSELWTLSQLEGERYGSKKSMVILTSTLTAGAAEEFTYIMKRLGRALVIGEVTSGGCQPPQTYHVDDTDLYLTIPTARSVGAADGSSWEGVGVVPDVAVPAEAALTRAQEML >M4_monDom 
AKVPTILQTAGKLVADNYASLEVGSRVASKLAKLQTQYRQVTSEGELADMLGADLQTLSGDRHLKTAHIPEDAKDRIPGIVPMQLPSPEAFEDLIKFSFHTNVFEGNIGYLRFDMFGDCELLTQVSDLLVEHVWKKVVHTDGMIIDMRFNIGGPTSSISALCSYFFDEGQEVLLDQIYNRPNDSISEIWTQSQVAGERYGSKKSVIILTSSMTAGAAEEFVYVMQRLGRALVIGEVTSGGCQPPQTYHVDDTDLYITIPTARSVGSGDKPSWEGVGVAPHVEVPADQALSKAKEMF >M4_macEug lower case: sacHar fix SKVPTILQTAGKLVADNYASPEVGSRVAAKLARLQTQYRQVTSEGELADMLGADLQTLSGDSHLKTAHIPEDSKDRIPGIVPMQlpspeafedlikfsFHTNVFEGNIGYLRFDMFGDCELLTQVSDLLVEHVWKKVVHTDGMIIDMRFNIGGPTSSISALCSYFFDEGQKVLLDRIYNRPNDSIVEIWTQPHVTGERYGSKKSVIILTSSTTAGAAEEFVYIMQGLGRALVIGEVTSGGCQPPQTYHVDDTDLYITIPTAQSVGSGDRPSWEGIGVTPHVEVPADQALSKAKEMF >M4_sacHar AKVPTILQTAGKLVADNYASPEVGSRVAAKLASLQIQYGKVTSEGELADMLGADLQTLSGDRHLKTAHIPEDAKDRIPGIVPMQLPSPEAFEDLIKFSFHTNVFEGNIGYLRFDMFGDCELLIQVSDLLVEHVWKKVMHTDGMIIDMRFNIGGPTSSISAMCSYFFDEGQGVLLDRIYNRPNDSISEIWTQPQVIGERYGSKKTVVILTSSMTAGAAEEFVYIMQRLGRALVIGEVTSGGCQPPQTYHVDDTDLYITIPTTRSVVSGDKSSWEGVGVVPHMEVPADQALSKAKEMF >M4_ornAna SKVPTVLRTAAKLVADNYAFRETGAGVAAQMGGLQARCGRVTSEGALAEVLGAHLRALSGDPHLQMVYIPEDAKDRIPGVVPMQIPSAETFEDLIKFSFHTSVMEGNIGYLRFDMFGDCELLTQVSELMVEHVWKKIVHTDGLIIDMRNIGGPTSSISALCSYFFDEDHPVLLDKIYNRPNDSISEIWTHSHIAGERYGSRKSVVILTSNMTAGAAEEFVSIMKRLGRALVVGEVTGGGCHPPQTYHVDDTHLYITIPTSRSVGSEDGSSWEGVGVTPHLVVPADVALSRAKDLF >M4_taeGut AQVPQILQTVGKLVADNYAFVNTGTVIASNLTKNIHKDNYKRINTEEDLAGKVTAILQALSDDKHLKLLYIPEHAKDSIPGIMPKQIPPPEVFEDLIKFSFHTNVFENNIGYLRFDMFGDSELLTQLSDLMIEHVWKKIFHTDALIIDLRYNIGGSTTPIAILCSYFFDEGHPVLLDRVYDRPSDSVKEIWTQPQLKGERYGSQKGLVILTSAVTAGAAEEFVYIMKRLSRALIIGEQTSGGCHSPQTYQVDETNFYVVIPTSRSVTSADSTSWEGKGVSPHIETPAETALIKAKEML >M4_galGal TQVPQIVQTVGKLVAENYAFVDIGTDIASNLTKSVNKENYKRINSEKELARKLTAILQALSDDEHLKILYIPEHAKDSIPGILPKQIPSPEVFEDLIKFSFHTNVFENNIGYLRFDMFGDCELLTQVSDLLVEHVWKKIVHTDALIIDMRYNIGGYTNSIPILCSYFFDEGHQVLLDKVYDRPSDSVKEIWTQPQLRGERYGSQKGLIILTSAVTAGAAEEFVFIMKRLGRALIIGEQTSGGSHSPQTYQVDDTNFYIIIPTARSVISAESASWEGKGVPPHMETPAVTALIKAKEVL >M4_anoCar 
TKLPSVLNTIGKLVADNYAFADIGATVAAKFADYAKKGTYRKINSEIELSGKLAADLKALSGDRHLMISHIPERSKGRILGLVPMQQIPPPEILEDLIKFSLHTNVFENNIGYLRFDMFGDCELMSQVSELLVQHVWNKIVNTDALIIDMRYNVGGPACSVPLLCSYFFDEGHPILLDKVYNRPNDTTSNIWTVSKLAGKRYGLNKGLIILTSSVTSGAAEEFAHIMKRLGRAFIIGQKTSGGCHPPQTFHVDGTNLYITTPVSRSVFSVNDSWEGVGVSPHLDVSTDVALIKAKEML >M4_xenLae TKIPTVIQTAAKLVADNYAFADTGANVASKFIALVDKIDYKMIKSEVELAEKINDDLQSLSKDFHLKAVYIPENSKDRIPGVVPMQIPSPELFEELIKFSFHTDVFEKNIGYIRFDMFADSDLLNQVSDLLVEHVWKKVVDQDALIIDMRFNIGGPTSSIPIFCSYFFDEGTPVLLDKIYSRTSNAMTDIWTLPDLVGKTFGSKKPLIILTSSLTEGAAEEFVYIMKRLGRAYVVGEVTSGGCHPPQTYHVDDTHLYLTIPTSRSASAEPGESWEGKGVLPDLEISSETALLKAKEIL >M4_xenTro TKIPSVIQTAGKLVADNYAFADTGADVASKLIALVDKINYKMIKSEVELAEKLNYDLQSLSKDVHLKAVYIPENSKDRIPGVVPMQIPSPEMFEDLIKFSFHTDVFEKNLGYIRFDMFADSDLLNQVSDLLVEHVWKKVVNQDALIIDMRFNIGGPTSSIPTFCSYFFDEGTPVLLDKIYSRTTNAITDVWTLPHLVGNAFGSKKPVIILTSSLTEGAAEEFVYIMKRLGRAYVIGEVTSGGCHPPQTYHVDDTHLYLTIPTSRSASAKPGESWEGKGVLPDLEITSETALMKAKEIL >M4_tetNig AQIPAIIEGTAALVANNYAFEATGADVAKELRELQANGQYSSVVSKESLEAALSADLQRLSGDKSLKTTPNTPVLPPMDYTPEMYIELIKVSFHTDVFENNIGYLRFDMFGDFEEVKAIAQIIVEHVWNKVVNTDALILDLRNNVGGPTTAIAGFCSYFFDADKQNRVGQAVRQASGTTTELLTLSELTGVRYGSKKSLIILTSGATAGAAEEFVYIMKKLGRAMIVGETTAGASHPPQTFRVGETDVFLLIPTVHSDTGAGPAWEGAGIAPHIPASAEAALGTARAIL >M4_takRub AQIPAIIEGAATLIAKNYAFEATGADVATKLRELLAKGQYNSVVSSESLEVALSADLQRLSGDKSLKATQNAPVLPPMDYSPEMYIELIKVSFHTDVFENNIGYLRFDMFGDFEEVKAIAQIIVEHVWNKVVNTDALILDLRNNVGGPTTAIAGFCSYFFDADKLIVLDKLHDRPSGTTTELLTLPELTGVRYGSKKSLIILTSGATAGAAEEFVYIMKKLGRAMIVGETTAGASHPPQVFSVGEIGIFLSIPTVHSDTAAGPAWEGTGITPHIPVSAEAALGTAKGIL >M4_gasAcu NRVPAIIEGSATLIADNYAFEDIGAAVAEKLKGLLANGEYSKVVSKDSLEMKLSADLRTLSGDKSLKTTSNVPALPPMNYSPEMYIELIKVSFHTDVFEDNIGYLRFDMFGDFEEVKAIAQIIVEHVWNKVVNTDAMIVDLRNNIGGPTTAIAGFCSYFFDSDKQIVLDRLYDRPSGTTTELRTLPELTGTRYGSKKSLVMLTSRATAGAAEEFVYIMKKLGRAMIVGETTAGTSHPPKTFRVGETDIFLSIPTVHSDTAAGPAWEGAGVAPHIPVPADAALETAKGIF >M4_oryLat 
LQVPAIIEESATLVANNYAFESTAADVAEKLKGHLANGDYNMVVSKESLEAKLSADLQSLSGDKSLTVSSNTGAPPPMEYTPEMYIELIKISFHTDVFENNIGYLRFDMFGDFEEVKAIAQVIVEHVWNKVLHTDAMIIDLRNNVGGPTTAIAGFCSYFFDGDKQILLDKLYDRSTGTTTDLLTLGELTGERYGSKKSLIILASRATAGAAEEFVYIMKRLGRAMIVGETTAGASHPPKVFQVGESDIFLSIPTVHSDTSAGPGWEGAGVAPHIPVAAGAALETAKAIL >M4_danRer AEIPALAQAAATLIADNYAFPSIGEHVAEKLEAVVAGGEYNLISTKEDLEERLSEDLLKLSEDKCLKTTSNIPALPPMNPTPEMFIALIKSSFQTDVFENNIGYLRFDMFGDFEHVATIAQIIVEHVWNKVVDTDALIIDLRNNIGGHASSIAGFCSYFFDADKQIVLDHIYDRPSNTTRDLQTLEQLTGRRYGSKKSVVILTSGVTAGAAEEFVFIMKRLGRAMIIGETTHGGCQPPETFAVGESDIFLSIPISHSTAQGPSWEGAGIAPHIPVPAGAALDTAKGML >M4_petMar ADAPSILRTVGKLVADGYSRAEAALGVPSKLAALLEAGEYGALRSEEELAFKLTVHLQLITGDRHLKAVCVPEHATDRMPGIVPMQMPPTESFEDLIKFSFITDVLEGNIGYLRFDLFSDLEALEHVAHLLVEHVWKKICDTEILIIDLR >Mn_braFlo RALNDQSL SKAIILD LNEL insertions omitted VDWLDVVMGIGDVMADHYLDQDLLQRWNRTLVHRFQSWSQDDMSDSLRMEEGLTSELRNITGDETIKVWDFGVYENTTQEPVPREFYNFSTFVDNFKKNINVTMLEGNVGYVSIRSMSHIVDIILPDPEMTEFFLSKMAASKAIILDLRYNLGGDREGVVHWASFFFNATPSVPLSDVYYRDGVNQYWTLLEVPGGIRFPDMPLYLLTSNRTSREAEEFAYAMQVVNRTTIIGETTAGEEFTGMWFPIDQTDVHLLTRTNVVRNPITQDSWSGKGVTPDIIVPSEKALTVALRKI
RBP3 (IRBP) sequences from human to amphioxus
>RPB3_homSap human 0 MMREWVLLMSVLLCGLAGPTHLFQPSLVLDMAKVLLDNYCFPENLLGMQEAIQQAIKSHEILSISDPQTLASVLTAGVQSSLNDPRLVISYEPSTPEPPPQV PALTSLSEEELLAWLQRGLRHEVLEGNVGYLRVDSVPGQEVLSMMGEFLVAHVWGNLMGTSALVLDLRHCTGGQVSGIPYIISYLHPGNTILHVDTIYNRPSNTTTEIWTLPQVLG ERYGADKDVVVLTSSQTRGVAEDIAHILKQMRRAIVVGERTGGGALDLRKLRIGESDFFFTVPVSRSLGPLGGGSQTWEGSGVLPCVGTPAEQALEKALAILTLRSALPGVVHCLQ EVLKDYYTLVDRVPTLLQHLASMDFSTVVSEEDLVTKLNAGLQAASEDPRLLVRAIGPTETPSWPAPDAAAEDSPGVAPELPEDEAIRQALVDSVFQVSVLPGNVGYLRFDSFADA SVLGVLAPYVLRQVWEPLQDTEHLIMDLRHNPGGPSSAVPLLLSYFQGPEAGPVHLFTTYDRRTNITQEHFSHMELPGPRYSTQRGVYLLTSHRTATAAEEFAFLMQSLGWATLVG EITAGNLLHTRTVPLLDTPEGSLALTVPVLTFIDNHGEAWLGGGVVPDAIVLAEEALDKAQEVLEFHQSLGALVEGTGHLLEAHYARPEVVGQTSALLRAKLAQGAYRTAVDLESL ASQLTADLQEVSGDHRLLVFHSPGELVVEEAPPPPPAVPSPEELTYLIEALFKTEVLPGQLGYLRFDAMAELETVKAVGPQLVRLVWQQLVDTAALVIDLRYNPGSYSTAIPLLCS YFFEAEPRQHLYSVFDRATSKVTEVWTLPQVAGQRYGSHKDLYILMSHTSGSAAEAFAHTMQDLQRATVIGEPTAGGALSVGIYQVGSSPLYASMPTQMAMSATTGKAWDLAGVEP DITVPMSEALSIAQDIVALRAKVPTVLQTAGKLVADNYASAELGAKMATKLSGLQSRYSRVTSEVALAEILGADLQMLSGDPHLKAAHIPENAKDRIPGIVPMQ 0 0 IPSPEVFEELIKFSFHTNVLEDNIGYLRFDMFGDGELLTQVSRLLVEHIWKKIMHTDAMIIDMR 2 1 FNIGGPTSSIPILCSYFFDEGPPVLLDKIYSRPDDSVSELWTHAQVV 1 2 GERYGSKKSMVILTSSVTAGTAEEFTYIMKRLGRALVIGEVTSGGCQPPQTYHVDDTNLYLTIPTARSVGASDGSSWEGVGVTPHVVVPAEEALARAKEMLQHNQLRVKRSPGLQDHL* 0 >RBP3_bosTau cow run-on terminal exon 0 MVRKWALLLPMLLCGLTGPAHLFQPSLVLEMAQVLLDNYCFPENLMGMQGAIEQAIKSQEILSISDPQTLAHVLTAGVQSSLNDPRLVISYEPSTLEAPP RAPAVTNLTLEEIIAGLQDGLRHEILEGNVGYLRVDDIPGQEVMSKLRSFLVANVWRKLVNTSALVLDLRHCTGGHVSGIPYVISYLHPGSTVSHVDTVY DRPSNTTTEIWTLPEALGEKYSADKDVVVLTSSRTGGVAEDIAYILKQMRRAIVVGERTVGGALNLQKLRVGQSDFFLTVPVSRSLGPLGEGSQTWEGSG VLPCVGTPAEQALEKALAVLMLRRALPGVIQRLQEALREYYTLVDRVPALLSHLAAMDLSSVVSEDDLVTKLNAGLQAVSEDPRLQVQVVRPKEASSGPE EEAEEPPEAVPEVPEDEAVRRALVDSVFQVSVLPGNVGYLRFDSFADASVLEVLGPYILHQVWEPLQDTEHLIMDLRQNPGGPSSAVPLLLSYFQSPDAS PVRLFSTYDRRTNITREHFSQTELLGRPYGTQRGVYLLTSHRTATAAEELAFLMQSLGWATLVGEITAGSLLHTHTVSLLETPEGGLALTVPVLTFIDNH 
GECWLGGGVVPDAIVLAEEALDRAQEVLEFHRSLGELVEGTGRLLEAHYARPEVVGQMGALLRAKLAQGAYRTAVDLESLASQLTADLQEMSGDHRLLVF HSPGEMVAEEAPPPPPVVPSPEELSYLIEALFKTEVLPGQLGYLRFDAMAELETVKAVGPQLVQLVWQKLVDTAALVVDLRYNPGSYSTAVPLLCSYFFE AEPRRHLYSVFDRATSRVTEVWTLPHVTGQRYGSHKDLYVLVSHTSGSAAEAFAHTMQDLQRATIIGEPTAGGALSVGIYQVGSSALYASMPTQMAMSAS TGEAWDLAGVEPDITVPMSVALSTARDIVTLRAKVPTVLQTAGKLVADNYASPELGVKMAAELSGLQSRYARVTSEAALAELLQADLQVLSGDPHLKTAH IPEDAKDRIPGIVPMQ 0 0 IPSPEVFEDLIKFSFHTNVLEGNVGYLRFDMFGDCELLTQVSELLVEHVWKKIVHTDALIVDMR 2 1 FNIGGPTSSISALCSYFFDEGPPILLDKIYNRPNDSVSELWTLSQLE 1 2 GERYGSKKSMVILTSTLTAGAAEEFTYIMKRLGRALVIGEVTSGGCQPPQTYHVDDTDLYLTIPTARSVGAADGSSWEGVGVVPDVAVPAEAALTRAQEMLQHTPLRARRSPRLHGRRKGHHRQSQGRAGSLGRNQGVgRPEVLTEAPSGQKRGLLQCG* 0 >RBP3_monDom opossum 0 MTSQCLLLFSALLFSLAHAEQIFQPSLVRDMAKILLDNYCFPENLMGMQEVIEQAIKSGEILDISDPQMLASVLTAGVQGALNDPRLVISFEPSIPETPQ HVPKLANVTQEELLILLQQMIKYQVLEGNVGYLRVDYIPGQEVVEKVGEFLVNNIWKKLMGTSSLVLDLQHSSGGEISGIPFVISYLHQGDILLHVDTVY DRPSNTTTEIWTLPQVLGERYGGEKDMVVLTSHRTVGVAEDIAYILKKLRRAIVVGEQTLGGALDLRKLRIGQSDFFITVPVSRSLSPLGGGSQTWEGSG VLPCVGIPAEQALGKALAILTLRRARPGAIQRLMEVLQNYYTLVDRVPALLHHLTAIDYSSVLTEEDLAAKLNAGLQAVSEDPRLLVRVLRPEEATMGSEASEEEDATPAANS LPEDESQRQALVDSVFQVSVLPGNVGYLRFDEFADSSVLGTLAPYVIRQVWEPLQDTNHLIMDLRYNPGGPSSAVPLLLSYFQDPAAGP IRLFTTYDRQTNQTQEHLSRAELLGKPYGAQRGVYLLTSHHTATAAEEFAFLMQSLGRATLVGEITAGSLMHTRTFPLLQPPNGNLVLTVPILTFIDNNG ECWLGGGVVPDAIVLAEEALDKAKEVLEFHQRLGALVEGTGHLLEAHYALPEVVGQASALLKAKLEHGTYRTAVDFESLASQLTSDLQEVSGDHRLHVFH SPGEPVSEELTPPQKGVPSPEELTYLIEALFKTEVLPGQLGYLRFDMMAEAETVRAIAPQLVELVWEKLVHTEALVVDLRYNPGGYSTAVPLLCSYFFEA EPRRHLYTIFDRAASQLTEVWTLPQVAGERYGSQKDLYILISHTSGSAAEAFVHTMKDQHRATVIGEPTGGGALSVGIYQVENSPLYASMPTQVAISPVT GKAWDMAGVEPDVSVLSSEALMTTQGIVALRAKVPTILQTAGKLVADNYASLEVGSRVASKLAKLQTQYRQVTSEGELADMLGADLQTLSGDRHLKTAHI PEDAKDRIPGIVPMQ 0 0 LPSPEAFEDLIKFSFHTNVFEGNIGYLRFDMFGDCELLTQVSDLLVEHVWKKVVHTDGMIIDMR 2 1 FNIGGPTSSISALCSYFFDEGQEVLLDQIYNRPNDSISEIWTQSQVA 1 2 GERYGSKKSVIILTSSMTAGAAEEFVYVMQRLGRALVIGEVTSGGCQPPQTYHVDDTDLYITIPTARSVGSGDKPSWEGVGVAPHVEVPADQALSKAKEMFNHHLQRAK* 0 
>RBP3_sacHar Sarcophilus harrisii 0 MTSQCLLLFSALLFSLAHGEEIFQPTLVLDMAKILLDNYCFPENLMGMQEAIEQAIKSGEILDISDPQMLANVLTAGVQGSLNDPRLVISYEPSTSEAPQ HDPKFANATQEELLALFQKIIKYQVLEDNVGYLRVDYIPGRDMIEEVGEFLVNDIWKKVMETSSLVLDLQHSR GGEVSGIPFVISYLHQGDILLHVDTIYDRPSNTTTEIWTLPQVLGERYSGEKDIVVLTSHHTVGVAEDIAYILKKMRRAIVVGEQTQGGSLDLRKLRIGQSDFFITVPVARSLSPLGGGSQTWEGSG VVPCVGIPAEQALEKALAILTLRRARPGAIQRLMEILQKYYTLVDRVPALLHHLTAIDYSSVLTEEDLAAKLNAMLQAVSEDPRLLVRVLRPEEATVEAEPGEESATPASVSLPESDAERQALI DSVFQVSVLPGNVGYLRFDEFADNSVLGTLAPYVLRQVWEPLQDTDHLIMDLRYNPGGPSSAVPLLLSYFQDPSAGPVRLFATYDRQTNQ TQEYRSRAELLGKPYGAERGVYLLTSYHTATAAEEFAFLMQSLGRATLVGEITAGSLMHTRTFPLLQPPNGSLVLTVPTLTFIDNHGECWLGGGVVPDAIVLAEEALDKAKEVLEFHQRL GALVEGTGHLVEAHYALPEVVGQASAFLRATLAHGTYRTAVDFESLASQLTSDLQEVSGDHRLHVFHSPGEPVPEESSPPHKGVPSPEELTYLIEALFKTDVLP QLGYLRFDMMAEVETVRAIGPQLVELVWEKLVNTEALVVDLRYNPGSYSTTVPLLCSYFFEAEPRKHLYTVFDRATSQFTEVWTLPQVTGERYGSQKDLYILISHTSGSA AEAFAHIMKDLQRATVIGEPTAGGALSVGIYQVGDSPLYVSMPTQVALSPVTGKAWDMAGVEPDVSVLANEALITAQGIVALRAKVPTILQTAGKLVADNYA SPEVGSRVAAKLASLQIQYGKVTSEGELADMLGADLQTLSGDRHLKTAHIPEDAKDRIPGIVPMQ 0 0 LPSPEAFEDLIKFSFHTNVFEGNIGYLRFDMFGDCELLIQVSDLLVEHVWKKVMHTDGMIIDMR 2 1 FNIGGPTSSISAMCSYFFDEGQGVLLDRIYNRPNDSISEIWTQPQVI 1 2 GERYGSKKTVVILTSSMTAGAAEEFVYIMQRLGRALVIGEVTSGGCQPPQTYHVDDTDLYITIPTTRSVVSGDKSSWEGVGVVPHMEVPADQALSKAKEMFTHQLQKTK* 0 >RBP3_macEug wallaby 3 alleles to FJ603206 Macropus eugenii frag exon 2 lower case: sacHar fix 0 MTSQCLLLFSALLLSLAHAEQIFQPSLVLDMAKILLDNYCFPENLMGMQEAIEQAIKSGEILDISDPQMLASVLTAGVQGSLNDPRLVISYEPSPAEAPQQSPKLTSLTQEELLTLLQQM IKYQVLDGNVGYLRVDYIPGQEVVEKVGEFLVNDIWKKLMGTSSLVLDLQHSTEGEVSGIPFVISYLHEGDILLHVDTVYDRPSNTTTEIWTLPQVLGERYSGEKDLVILTSHRTVGVAE DIAYILKKMRRAIVVGEQTLGGALDLRKLRIGQSDFFITVPVSRSLSPLGGGSQTWEGSGVLPCVGIPAEQALEKALAILTLRRARPAAIQRLMEVLQNYYTLVDRVPALLHHLTAIDYS SVLTEEDLAAKLNAGLQAVSEDPRLLVRVLRPEEATAGAESREEAATAAPVPLPDGESQRQALVNSVFQVSVLPGNVGYLRFDEFADSSVLGALAPYVLQQVWEPLQDTDHLIMDLRYNP 
GGPSSAVPLLLSYFQDPAAGPVRLFATYDRQTNQTREYRSQAELLGKPYGAQRGVYLLTSHHTATAAEEFAFLMQSLGRATLVGEITAGNLMHTRTFSLLQPPDGSLVLTVPILTFIDNH GECWLGGGVVPDAIVLAEEALDKAKEVIEFHQRLGALVEDAGHLLEAHYALPEVVGQASALLRARLVHGTYRTAVDFESLASQLTSDLQEVSGDHRVHVFHSPGELIPEELSPPQNVVPS PEELTYLIEALFKTEVLPGQLGYLRFDMMAEAETVRAIGPQLIELVWEKLVNTEALVVDLRYNPGSYSTSVPLLCSYFFEAEPRKHLYTIFDRAASQATEVWTLPQVAGERYGSQKDLYI LISHTSGSAAEAFAHAMKDLRRATVIGEPTAGGALSVGIYQVSNSPLYASMPTQVAISPVTGKAWDIAGVEPDVSVPAREALITTQGIVTLRSKVPTILQTAGKLVADNYASPEVGSRVA AKLARLQTQYRQVTSEGELADMLGADLQTLSGDSHLKTAHIPEDSKDRIPGIVPMQ 0 0 lpspeafedlikfsFHTNVFEGNIGYLRFDMFGDCELLTQVSDLLVEHVWKKVVHTDGMIIDMR 2 1 FNIGGPTSSISALCSYFFDEGQKVLLDRIYNRPNDSIVEIWTQPHVT 1 2 GERYGSKKSVIILTSSTTAGAAEEFVYIMQGLGRALVIGEVTSGGCQPPQTYHVDDTDLYITIPTAQSVGSGDRPSWEGIGVTPHVEVPADQALSKAKEMFIHHLQRAD* 0 >RBP3_ornAna platypus genome rife with frameshifts, dels, misassembly frag 0 MGVCLPLLLVAQFSLTGHVEPVSQPSMVLDVAKILLDNYCYPENLMGMQEAIEEAIQRGEILDIADPKRLASVLTAGVQGSLNDPRLVISYEPAPVAVSQ QPPEPASLPAEQPLERLRPAVGSEVLEGNVGYLRVDRLPGREEIERVGAVLGRDIWEKLLGTSALVLDLRHSTGGHVSGIPFFISYFYPEGPALHVDTVY DRPSNATRQLWTLPRVLGARYAADKDVVVLTSRLTAGVAEDVAYILQQMRRAIVVGERTAGGPLVFRKLRVGLSDFFITVPVACSLGPLGGGGRSWEGSG VLPCVAVPADRALDEALDILALRGAVPGAVAHLADLLRDYYALVDRVPALLRHLAALDLSSVLSEEDLTSRLNAGLQAASEDPRLLVRRLEPEEAERGPP RKEEEQKEEEEEDQPSPGASILPGDGSSLFRVSVLPGNVGYLCFDEFPEASALERLGPLLGRRVWEPLEATDHLMVDLRNNPGGPSSAVPLLLSYF QDPAAGPIRLFTTYNRPADVTREYASRAGALEKPYGARRGVYLLTSHRTATAAEEFAYLMQALGRATLVGEITAGRLLHSRTFPLLRPPWEGLVLTVPFL TLFDPHGEGWLGGGVVPDAIVLAEEALEKAGEVLAFHQTLEALVETTGHLLEAHYCFPAGARRAGAQPWPVAGVEPDVMAQAAEALAVAQGIAALRSKVP TVLRTAAKLVADNYAFRETGAGVAAQMGGLQARCGRVTSEGALAEVLGAHLRALSGDPHLQMVYIPEDAKDRIPGVVPMQ 0 0 IPSAETFEDLIKFSFHTSVMEGNIGYLRFDMFGDCELLTQVSELMVEHVWKKIVHTDGLIIDMR 2 1 NIGGPTSSISALCSYFFDEDHPVLLDKIYNRPNDSISEIWTHSHIA 1 2 GERYGSRKSVVILTSNMTAGAAEEFVSIMKRLGRALVVGEVTGGGCHPPQTYHVDDTHLYITIPTSRSVGSEDGSSWEGVGVTPHLVVPADVALSRAKDLFRAHLEHRD* 0 >RBP3_taeGut Taeniopygia guttata 0 
MIRTHFLLLSALIMCSIPAEEIFQPTLVLDMAKVLLDNYCYPENLVGMQEAIEQAIKSGEILDISDPKMLANVLTAGVQGALNDPRLVISYEPLPHSGPK QEAEGSPTREQLLSLIEHVIMYDKLEGNVGYLRIDYIIGEEVVQKVGAFLVDKVWKTLIETSALVIDLRHSTGGQISGLPFIISYLHEQDKILHVETVYN RPSNTTTEIWTLPKVLGERYSKDKDVIVLISHHTTGVAEDVAYILKHMNRAITVGEKTAGGSLDIQKLRIGPSNFYMMVPVSRSVSPLSGGGQSWEVSGV MPCVATEAEQALQKSLDILAVRRAVPGTISHLKNILKDYYSLVERVPALLRRLTTSDFSSVQSSEDLATKLNTELQALSDDPRLMVRVMMPGEAADSPAE KPVGMAADLPDNEQLLHALVDTVFKVSVLPGNVGYMRFDEFADASVLVKLGPYLVHKVWEPLQNTENLIMDLRYNLGGPSSSAVPVLLSYFQDPAAGPVH LFTTYDRRTNHTQEHNSQAELLGQSYGAKRGVYLLTSHHTATAAEEFAYLMQSLGRATLIGEITAGSLSHTRTFPLLQPGPGITRGLTITVPVITFIDNH GESWMGGGVVPDAIVLAEDALEKAEEVLAFHKNMGVLLEGTGQLLEDHYAIPEVAAKASAMLSTKRAQGGYRSAIDSETLASQLTSDLQEASGDHRLHVF HSHVEPTPEEQLPNVIPSPEELSYIIEALFKIEVLPGNLGYLRFDMMAEAETVKAIGPQLLQMVWNKLVDTDAMIIDMRYNTGGYSTAIPILCSYFFDPE PRKHLYTVFDRSTSRSTEVWTLPQLAGKRYGSLKDIYILTSHMSGSAAEAFTRSMKDLHRATVVGEPTVGGSLSVGIYRVGNSSLYASIPSQVVLSPVTG KVWSVSGVEPHITIQASEAMAAAQHIANLRAQVPQILQTVGKLVADNYAFVNTGTVIASNLTKNIHKDNYKRINTEEDLAGKVTAILQALSDDKHLKLLY IPEHAKDSIPGIMPK 0 0 QIPPPEVFEDLIKFSFHTNVFENNIGYLRFDMFGDSELLTQLSDLMIEHVWKKIFHTDALIIDLR 2 1 YNIGGSTTPIAILCSYFFDEGHPVLLDRVYDRPSDSVKEIWTQPQLK 1 2 GERYGSQKGLVILTSAVTAGAAEEFVYIMKRLSRALIIGEQTSGGCHSPQTYQVDETNFYVVIPTSRSVTSADSTSWEGKGVSPHIETPAETALIKAKEMLNAHLHSSR* 0 >RBP3_galGal Gallus gallus 1236 aa N-terminal 21 aa signal peptide 5 glyc (3 unique) two W per repeat 0 MRTYFFLFSVLIVCSISAEEIFQPTLVLDMAKVLLDNYCYPENLVGMQEAIEQAIKSGEILDISDPKMLANVLTAGVQGALNDPRLVISYEPSLHAAPKQ EAETYPTREQLLSLIEHVVIYDKLEGNVGYLRIDYIIGQEVVEKVGAFLVDKVWKTLINTSALVIDLRYSTGGQISGIPFIISYLHEADKMLHVETVYNR PSNTTTEIWTLPKVLGERYSKDKDVIVLISHHTTGVAEDVAYILKHMNRAITLGEKTAGGSLDIQKLRIGPSNFYMMVPVSRSVSPLSGGGQSWEVSGVM PCVASEAEQALKKSLDILAVRRAVPGTLSRLTDILKDYYSLVERVPVLLRHLTTSDFSSVQSAEDLATKLNTEMQTLSEDPRLLVRTMMPGEAAAPPAEM PIAMAANLPDNEQLLHALVDTVFKVSVLPGNVGYMRFDEFADASVLVKLGPYIVKKVWEPLQNTENLIMDLRYNPGGPSSSAVPMLISYFQDPTAGPVHL FTTYDRRTNHTQEHNSQAELLAQPYGAQRGIYVLTSRHTATAAEEFAYLMQSLGRATLIGEITAGSLSHTCTFPLVQPEQGITRGLTITVPVITFIDNHG 
ESWMGGGVVPDAIVLAEDALEKAEEVLTFHRKMGILLESTGQLLEAHYAIPEVAEKASVMLSTKRVQGGYRSAVDFETLASQLTSDLQEASGDHRLHVFH SHVEPTPEEQLPNMIPSPEELSYIIEALFKIEVLPGNLGYLRFDMMAEAETVKAIGPQLVQMVWNKLVDTDAMIIDMRYNTGGYSTAVPILCSYFFEPEP RQHLYTVFDRSTSRSTEVWTLPKVTGKRYGSLKDIYILTSHMSGSAAEAFTRSMKDLHRATVIGEPTVGGSLSVGIYRVGNSSLYRSIPSQVVLSPVTGK VWSVSGAEPHITIQASEALAAAKHIASLRTQVPQIVQTVGKLVAENYAFVDIGTDIASNLTKSVNKENYKRINSEKELARKLTAILQALSDDEHLKILYI PEHAKDSIPGILPK 0 0 QIPSPEVFEDLIKFSFHTNVFENNIGYLRFDMFGDCELLTQVSDLLVEHVWKKIVHTDALIIDMR 2 1 YNIGGYTNSIPILCSYFFDEGHQVLLDKVYDRPSDSVKEIWTQPQLR 1 2 GERYGSQKGLIILTSAVTAGAAEEFVFIMKRLGRALIIGEQTSGGSHSPQTYQVDDTNFYIIIPTARSVISAESASWEGKGVPPHMETPAVTALIKAKEVLSAHLHSSR* 0 >RBP3_anoCar lizard 0 MLRKCLWLSIVLVCCSSYADSVLQSTLVLDMAKLLLDNYCLPENLVGMREAIEQAIKNGEVLDISDPKLLATVLTAGVQGALNDPRLVISYEPTAPAAPK QRMETSLTPEQLLSLIQHTVKYEVLDDNVGYLRIDYIMGQDIVQKIGSFLVEKVWKTLLGTSALILDLRYTTGGDVSGIPFIISYLYNGDKVLHVDTVYN RPSNTTVEILTLPKVLGVRYSKDKDVILLISKYTTGVAENVAYILKHMHRTIIVGEKSAGGSLDTQKMQIGNSQFYMTVPLSCSVSPLSGSGQSWEISGV TPCVVISAEQALDKALAILSLRKAIPNSMSYLVDIIKNNYSMLEQVPVLLQHLSTFDYSSVLSVKDLASKLNAELQTISEDPRLFLRVPASDEAVTSQTD EKVAMASDLPNNEQLMKALVMTVFKVSVLPGNVGYMRFDEFGDATVLVKLGPYLLQHVWEPLQATDYLIIDLRYNIGGPSSSAVPVLLSYFQDPSAGPVH FFTTYNRLTNQTQAYSSSAEMVGKPYGARRGVYLLTSHNTATAAEEFAYLMQTLGRATLVGEITAGSLSHTHTFCILELGGGCGLLINVPVITLIDNHGE YWLGGGVVPDSIVLADEALEKAREVLEFHKGMGSLIERVGQLLEAHYAIPEMARRVSSMLNSKLAQGGYRTAVDFETLASQLTNDLQETSGDHQLHVFHS HVEPSLEEQSPFKTLTPEELNFIIEALFKVDVLPGNVGYLRFDMMAEFESVKTIEPQILHMVWEKLVETSAMIVDMRYNTGSYSTAVPMFCSYFFDAEPQ QHLYTIIDRSTSQSTEVWTSSQVSGKRYGSTKDLYILISHASGSAAEAFTRSLKDLHRATVIGEPTVGGSLSASIYNIGSTPLYASIPSQIVLSPVSGKV WSLSGIQPHVTTQSNEALASAQNIILFRTKLPSVLNTIGKLVADNYAFADIGATVAAKFADYAKKGTYRKINSEIELSGKLAADLKALSGDRHLMISHIP ERSKGRILGLVPM 0 0 QIPPPEILEDLIKFSLHTNVFENNIGYLRFDMFGDCELMSQVSELLVQHVWNKIVNTDALIIDMR 2 1 YNVGGPACSVPLLCSYFFDEGHPILLDKVYNRPNDTTSNIWTVSKLA 1 2 GKRYGLNKGLIILTSSVTSGAAEEFAHIMKRLGRAFIIGQKTSGGCHPPQTFHVDGTNLYITTPVSRSVFSVNDSWEGVGVSPHLDVSTDVALIKAKEMLKAHLH* 0 >RBP3_xenLae Xenopus laevis 0 
MPPLFQALTTALFFCGIASNPLFQPSLVMDMAKVLLDNYCFPENLVGMQETIEQAVKGGEILHISDPDTLANVFTSGVQGYLNDPRLVVSYEPNYSGPQT EQSLELTPEQLKFLINHSVKYDILPGNIGYLRIDFIIGQDVVQKVGPHLVNNIWKKLMPTSALILDLRYSTQGEVSGIPFVVSYLCDSEIHIDSIYNRPS NTTTDLWTLPELMGERYGKVKDVVVLTSKYTKGVAEDASYILKHMNRAIVVGEKTAGGSLDTQKIKIGQSDFYITVPVSRSLSPLTGQSWEVSGVSPCVV VNAKDALDKAQAILAVRSSVTHVLHQLCDILANNYAFSERIPTLLQHLPNLDYSTVISEEDIAAKLNYELQSLTEDPRLVLKSKTDTLVMPGDSIQAENI PEDEAMLQALVNTVFKVSILPGNIGYLRFDQFADVSVIAKLAPFIVNTVWEPITITENLIIDLRYNVGGSSTAVPLLLSYFLDPETKIHLFTLHNRQQNS TDEVYSHPKVLGKPYGSKKGVYVLTSHQTATAAEEFAYLMQSLSRATIIGEITSGNLMHSKVFPFDGTQLSVTVPIINFIDSNGDYWLGGGVVPDAIVLA DEALDKAKEIIAFHPSIFPLVKGTGHLLEVHYAIPEVAYKVSSVLQNKWSEGGYRSVVDLESLASLLTSEMQENSGDHRLHVFYSDTEPEILEDQPPKIP SPEELNYIIDALFKIEVLPGNVGYLRFDMMADTEIIKAIGPQLVSLVWNKLVETNSLIIDMRYNTGGYSTAIPIFCSYFFDPEPLQHLYTVYDRSTSTGK DIWTLPEVFGERYGSTKDIYILTSHMTGSAAEVFTRSLKDLNRATLIGEPTSGVSLSVGMYKVGDSNLYVTIPNQVVISSVTGKVWSVSGVEPHVIIQAN EAMNIAHRIIKLRTKIPTVIQTAAKLVADNYAFADTGANVASKFIALVDKIDYKMIKSEVELAEKINDDLQSLSKDFHLKAVYIPENSKDRIPGVVPM 0 0 QIPSPELFEELIKFSFHTDVFEKNIGYIRFDMFADSDLLNQVSDLLVEHVWKKVVDQDALIIDMR 2 1 FNIGGPTSSIPIFCSYFFDEGTPVLLDKIYSRTSNAMTDIWTLPDLV 1 2 GKTFGSKKPLIILTSSLTEGAAEEFVYIMKRLGRAYVVGEVTSGGCHPPQTYHVDDTHLYLTIPTSRSASAEPGESWEGKGVLPDLEISSETALLKAKEILESQLEGRR* 0 >RBP3_xenTro Xenopus tropicalis 89% xenLae 0 MSPLFKALTTVLFFCIVASNPVFQPSLVMDMAKVLLDNYCFPENLVGMQETIEQAMKSGEILHISDPETLANVFTSGVQGFLNDPRLVVSYEPNYSGPRK EQSPEPTLEQLKFLLDHSVTYDLLPGNIGYLRIDFIIGQDVVQKVGPLLVNNIWKKLMPSSALILDLRYSTQGKVSGIPFVVSYLTDPQIHIDSIYNRPS NTTTDLWTLSELMGERYGKDKDVVVLTSKYTEGIAEGAAYILKHMSRAIVVGEKTAGGSLDIQKIKIGQSEFYITVPVSRSISPLTGQSWEVAGVFPCVV VNANNALNKAQGILAVRSSITHILLQLSEILVNNYAFSERIPTLLQHLPNLDYSSVISEEDITAKLNYELQSLTEDPRLVLKSKTDSLVMPEDSTQVENL PDDEATLQALVNTVFKVSILPGNIGYLRFDEFADVSVLAKLGPYIVNTVWDPITVTENLIIDLRYNIGGSSTSIPLLLSYFQEPENRIHLFTIYNRQQNS TNEVYSLPKVLGKPYGSKKGVYVLTSHETATAAEEFAYLMQSLSRATIIGEITSGNLMHSKAFPLDGTRLSVTVPIMNFIDNNGDYWLGGGVVPDAIVLA DEALDKAKEIIAFHPSVFALVEGTGHLLEVHYAIPEVAYKVSSVLQNKWSEGGYRSVVDLESLASQLTSEMQENSGDHRLHVFYSDTEPEILEDQPPKIP 
SAEELNYIIDALFKIEVLQGNVGYLRFDMMADTEIIKAIGPQLVSLVWNKLVETNSLIIDMRYNTGGYSTAIPIFCSYFFDPEPLQHLYTVYDRSTSSGT DIWTLPEVVGERYGSTKDIYILTSHMTGSAAEVFTRSMKELNRATIIGEPTSGVSLSVGMYKVGESNLYVSIPNQVVISSVTGKVWSVSGVEPHVIAQAS EAMNVAHHIIKLRTKIPSVIQTAGKLVADNYAFADTGADVASKLIALVDKINYKMIKSEVELAEKLNYDLQSLSKDVHLKAVYIPENSKDRIPGVVPMQ 0 0 IPSPEMFEDLIKFSFHTDVFEKNLGYIRFDMFADSDLLNQVSDLLVEHVWKKVVNQDALIIDMr 2 1 FNIGGPTSSIPTFCSYFFDEGTPVLLDKIYSRTTNAITDVWTLPHLV 1 2 GNAFGSKKPVIILTSSLTEGAAEEFVYIMKRLGRAYVIGEVTSGGCHPPQTYHVDDTHLYLTIPTSRSASAKPGESWEGKGVLPDLEITSETALMKAKEILVSQLEGR* 0 >RBP3_tetNig frameshifts in genome two domains: 23-324,326-612 no upstream dup 0 MAKALFTVASLLLLANGFFVGAAFPPSLIADMAKIVLDNYCSPEKLAGMKEAIKAAGTNTEVLNIPDGESLARVLSAGVQGTVSDPRLMVSFQPNYVPAG PHKMPPLPPEHLVAVLQTSVKLDILEGNTGYLRIDHILGEEVADKVGPALIDLIWNKILPTSALIFDLRYTSSGDISGIPYIVSYFTQAEPVVHIDSVYD RPSNTTTKLLSLPNLLGQRYGVSKPLIVLTSKNTKGIAEDVAYCLKNLKRATIVGEKTAGGSLKLDTFKVGDTDFYITVPTAKSINPITGSSWEIRGVTP HVEVNAEDALATAIKIVNLRAQIPAIIEGTAALVANNYAFEATGADVAKELRELQANGQYSSVVSKESLEAALSADLQRLSGDKSLKTTPNTPVLPPM 0 0 DYTPEMYIELIKVSFHTDVFENNIGYLRFDMFGDFEEVKAIAQIIVEHVWNKVVNTDALILDLr 2 1 NNVGGPTTAIAGFCSYFFDADKQNRVGQAVRQASGTTTELLTLSELT 1 2 GVRYGSKKSLIILTSGATAGAAEEFVYIMKKLGRAMIVGETTAGASHPPQTFRVGETDVFLLIPTVHSDTGAGPAWEGAGIAPHIPASAEAALGTARAILNKHFAGQK* 0 >RBP3_takRub fugu two domains: 23-324,326-612 plus upstream dup 0 MAKALFLVASLLLLANDVLVRAAFPPSLITDMAKIVLDNYCSPEKLAGMKEAIEAAGTNTEVLNIPDGESLARVLSAGVQGTVSDSRLMVSYQPDYVPAV PPKMPPLPPEHLVAVLQTSIKLDLLEGNTGYLRIDHIIGEDVAEKVGPSLIDLIWNKILPTSALIFDLRYTSSGEISGIPYIVSYFTQAEPVVHIDSVYD RPSNTTTKLFSLSNLLGERYGITKPLIILTSKNTKGIAEDVAYCLKNLKRATIVGERTAGGSVKLDNFKVGSTDFYITVPTAKSINPVTGSSWEITGVKP DVEVNAEDALATAIKIVSLRAQIPAIIEGAATLIAKNYAFEATGADVATKLRELLAKGQYNSVVSSESLEVALSADLQRLSGDKSLKATQNAPVLPPM 0 0 DYSPEMYIELIKVSFHTDVFENNIGYLRFDMFGDFEEVKAIAQIIVEHVWNKVVNTDALILDLR 2 1 NNVGGPTTAIAGFCSYFFDADKLIVLDKLHDRPSGTTTELLTLPELT 1 2 GVRYGSKKSLIILTSGATAGAAEEFVYIMKKLGRAMIVGETTAGASHPPQVFSVGEIGIFLSIPTVHSDTAAGPAWEGTGITPHIPVSAEAALGTAKGILNKHFGGQK* 0 >RBP3_gasAcu stickleback two 
domains: 27-317,323-612 no upstream dup 0 MAKLIFLVAPLLVLGNIAFIHAGFAPNVIIDMAKIVIDNYCSPEKLAGMKEAIEAAGSNTEVLSIPDAETLANVLSAGVQTTVSDPRLMISYEPNYVPVV PPKMPPLPPDQVIAVLQTSIKLDILEGNIGYLRIDHILGEDVAEKVGPLLLDLVWNKILPTSALIFDLRYTSSGDISGIPYIVSYFTEAGTPIHIDSIYD RPSNTTTKLFSMSTLLGERYSTSKPLIILTSKNTKGIAEDVAYCLQNLKRATIVGEKTAGGSVKVDKIQVRDTGFYVTVPTAKSVNPITGSTWEVTGVTP NVEVNAEDALATAIKIVTLLNRVPAIIEGSATLIADNYAFEDIGAAVAEKLKGLLANGEYSKVVSKDSLEMKLSADLRTLSGDKSLKTTSNVPALPPM 0 0 NYSPEMYIELIKVSFHTDVFEDNIGYLRFDMFGDFEEVKAIAQIIVEHVWNKVVNTDAMIVDLR 2 1 NNIGGPTTAIAGFCSYFFDSDKQIVLDRLYDRPSGTTTELRTLPELT 1 2 GTRYGSKKSLVMLTSRATAGAAEEFVYIMKKLGRAMIVGETTAGTSHPPKTFRVGETDIFLSIPTVHSDTAAGPAWEGAGVAPHIPVPADAALETAKGIFKKHFAGQK* 0 >RBP3_oryLat medaka two domains: 28-314,320-605 no upstream dup 0 MAKTLFLVASLLVLGNVVFLHASFPPSLITDLAKIVMDNYCSPEKLSGMKEDIATAGANTDVLNIPDGEALAKVLTDGVQTTVSDPRLRVSYEPNYVPVV PPQLPPEQLIAVLQTSIKLDILEGNIGYLRIDSIIGEEVAEKVGPLLLELVWSKILPTSALIFDLRYTSSGDITGIPYIISYLTDAKSEIHIDTIYDRPL NTTTKLLSMQSTLGQTYGGTKPLLVLTSKNTKDIAEDVAYCLKNLKRATIVGEKTAGGSAKIKKFRVGDTDFYVTLPTAKSINPITGSSWEVTGVKPNVE VNAEEALATALKIINLRLQVPAIIEESATLVANNYAFESTAADVAEKLKGHLANGDYNMVVSKESLEAKLSADLQSLSGDKSLTVSSNTGAPPPM 0 0 EYTPEMYIELIKISFHTDVFENNIGYLRFDMFGDFEEVKAIAQVIVEHVWNKVLHTDAMIIDLR 2 1 NNVGGPTTAIAGFCSYFFDGDKQILLDKLYDRSTGTTTDLLTLGELT 1 2 GERYGSKKSLIILASRATAGAAEEFVYIMKRLGRAMIVGETTAGASHPPKVFQVGESDIFLSIPTVHSDTSAGPGWEGAGVAPHIPVAAGAALETAKAILNKHIGGQQHAAS* 0 >RBP3_danRer zebrafish upstream frag as well two domains: 22-322,324-609 0 MAQALVLLVSLLFFSNVAHCNFSPTLIADMAKIFMDNYCSPEKLTGMEEAIDAASSNTEILSISDPTMLANVLTDGVKKTISDSRVKVTYEPDLILAAPP AMPDIPLEHLAAMIKGTVKVEILEGNIGYLKIQHIIGEEMAQKVGPLLLEYIWDKILPTSAMILDFRSTVTGELSGIPYIVSYFTDPEPLIHIDSVYDRT ADLTIELWSMPTLLGKRYGTSKPLIILTSKDTLGIAEDVAYCLKNLKRATIVGENTAGGTVKMSKMKVGDTDFYVTVPVAKSINPITGKSWEINGVAPDV DVAAEDALDAAIAIIKLRAEIPALAQAAATLIADNYAFPSIGEHVAEKLEAVVAGGEYNLISTKEDLEERLSEDLLKLSEDKCLKTTSNIPALPPM 0 0 NPTPEMFIALIKSSFQTDVFENNIGYLRFDMFGDFEHVATIAQIIVEHVWNKVVDTDALIIDLr 2 1 NNIGGHASSIAGFCSYFFDADKQIVLDHIYDRPSNTTRDLQTLEQLT 1 2 
GRRYGSKKSVVILTSGVTAGAAEEFVFIMKRLGRAMIIGETTHGGCQPPETFAVGESDIFLSIPISHSTAQGPSWEGAGIAPHIPVPAGAALDTAKGMLNKHFSGQK* 0 >RBP3x_takRub fugu single upstream exon 42% frameshift no transcripts three domains: 23-323,325-615,618-907 MAPRTPVLLLVLLFCALPVRSFYQHTLVLEMAKLLLENYCIPENLVGMQEAIQRAIKSREILQISDRKTLATVLTVGVQGALNDPRLSVSYEPSFSPLPLQALSSLPVEQQLRLLRN SIKLDILDSDVGYLRIDRIIDEETLLKFGPLLRENVWDKAAQTSSLILDLRFSTAGGWSGIPSIVSYFTEPHSLVHIDTVYDRPSNTTTELWTMSSVRGK TFGGKKDMIVLIGRRTAGAAEAVAYTLKHLNRAIVVGERSAGGSLKVRKFRIAESDFYITMPVARSVSPITGKSWEVSGISPTVNVAAREALAKAQTFLA VRSRIPKVLQIVLDIIGRFYAFADRVQALLQQLESADLFSVVSEEDLAARLNHDLQTASEDPRLIIRHKRDNIPRAEEEPELHAANDHDGELVEGFTVQV LPHNTGYLRLDRFVRCSEGDKLEEIVAEKVWGPLKDTQNLIIDLRHNTGGSSTSVALLLSYLRDPLPKRHFFTIYDSVQNTTTEYGSRPHIPGPSYGSER GVYVLTSHYTAGAAEEFAYLIQSLHFGTVVGEITSGTLMHSKTFQVEGTDIFITVPFINFLDNNGEYWLGGGVVPDAIVLAEEALEHVNRTATFHQGLRSLIGRTGELLEKHYAIQEVAQKVGEV LLSKWAEGLYRSVVDLESLASQLTADLQEASGDHRLHVFRCDVELESLHGVPKIAAVEEAGFVIDALFKSELLPRNVGYLRFDTMADIEAAKGAAPRLVKSVWNKLVDTDSLIIDMRYNA GGSSTAVPLWCSYFVDGEPLQHLYTVYDRTTKTRVEVMTLPEVSGQRYDPGKDVYILTSHMTGSAAEAFVRAMRDLNRVTIVGEPTAGGSLSSATYQIGESVLYASIPNQVVTSAATGKL WSISGVEPDVFAQARDALPVAQRIISARLLKREKGR* 0 >RBP3x_danRer zebrafish single upstream exon 55%/41% transcript DN857398 3 domains: 21-321,324-609,612-901 expressed: inner nuclear layer and ganglion cell layer MAGVFVFILVTYRVLLVNASFQSALVLDMAKILLDNYCFPENLIGMQEAIQQAINSGEILHISDRKTLASVLTAGVQGALNDPRLTVSYEPNYTLITPPA LHSLPTEQLIRLIRSTVKLEVMDNNIGYLRIDRIIGQETVVKLGRLLHNNIWKKVAHTSAMIFDLRFSTAGELSGLPYIVSYFSDSDPLLHIDTIYERPT NITRELWTLPTLLGERFGKRKDLIVLISKRTIGAAEGVAYILKHLKRAVIIGERSAGGSVRVDKLKIGDSGFYITVPVARSVNPVTGQSWEVSGVAPSVT VNPKESIAKAKSLISVRKTIPKAVRRVSDIIKRYYSFKDKIPALLNQLAKADYFTVVSEEDLAGKLNHEMQSVFEDPRLLIKATQVLTDDASSEDRSSSD DLTDPLFKLEMISGNNGYLRFDRFPTPEVLLRLEDHIKKKIWQPVQETENLVIDLRFNTGGSTEALPILLSYMFDTSSSTYLFSIYDSIKNTTFDFHTLN NISGPSYGSTKGVYVLTSYYTAEAGEEFAYLMQSLHRGTVIGEITSGMLLHSKTFQIEQTSLAITVPIINFIDVNGECWLGGGVVPDAIVLAEEALERAH 
EIIAFHKNIQGLVQEAGDLLEKHYSVPEVAAKVSRLLQSKLTEGLYRSVVDYESLASQLTSDLQETSGDQRLHIFYCETEPETLHDTPKIPSPEEAGFIV EALFKVDVMSGNIGYLRFDMMEDIKVLQAINPEFLKVVWNKLVNTDMLIIDVRYNTGGYSTAIPLLCTYFFDAQPLTHIYTLFDRSTATVTKVTTLPDVL GQKYSSQKDVYILTSHITGSAAEAFTRTMKDLKRATVIGEPTIGGALSSGTYQIGNSILYASIPNQAVLNAVTGKPWSISGVEPHIVAQASDALIVAQKI IATKQQKKNSGK* 0 >RBP3x_salSal Salmo salar transcript frag DY725143 EETAAKLGPLLRENIWTKVTHASSLIFDLRYSTAGELSGVPFIISYFSDPEPLIHIDTVFDRPSNTTKELWTMSSIMGERYGKRKDLIVLTSKRTMGAAEAIAYTLKHLNRAIIVGERSA GGSVKVQKIRIGDSGFYITVPVARSVNPITGQSWEVSGVSPSVNINAKEAVANAKNLLAVRSAIPNAVQSVSDIIRQYYSFTDRVPALLQHLESTDFFSVISEEDLANKFNNELQSVSEDPRLMIKL >RBP3_calMil elephantfish frag 2 domains 6-243,334-531 PPVTRESSPTSDKLPEDPTFLQALVDTVFKVSVLPDNTGYFRFDEFPEISVMSKLVQYIIEKVWLPVKDTDRLIVDLRHNVGGHSSVVPLLLSYFYDPEP PVGLFTVYNRLTNTTSHTTLPGVGQHVYGSRKDIYVLTSHRTATAAEELAYLLQSLNRATIVGEITSGSLLHSRSFQIPSTHLVITIPFINFMDNHGECW LGGGVVPDSIVLAEDTLERTKEIIGFHAQVAELVESTGKLLAVHYAIPEVAAEVSAVLSAKLTQGLYRSVVDWESLASRLTVDLQETSVWSVSGAEPHVI VQANEAMTVALGIINLRAKIPSIFQAAGKLVADNYAFAQTGAGVAETIADLIEGTGYGMINTEGKLAEVLSDTLQQLSGDKHLKAVHIPGDSKHQTPGIAMIQ 0 0 QMPPPEILEDLVKFSYQTKVLENNVGYLRFDMFGDNEMITQVSELMAKHVWNVIASTSSLIVDLR 2 1 YNIGGPTSSIPILCSYFFDDDKTVLLDTVYSRPTDTISEMKAIPQVAGNGSTESSVHSYI 1 2 * 0 >RBP3_petMar lamprey fragment exon3/4 missing, fixed genomic frameshift; four domains: 34-312,327-615,625-914,916-1217 0 MAGSREQRTAFSTRLLLLLLLPLATCPSQAPYKFDTAVVLHLAKVLLDNYCIPENLVGMDEAIQRAVDNGELLGVSDPESAASALTEGIQAALNDPRIAVSYVPDVDDDGDREEGDAEGW DAGEQHRPTTFEELLATIPQKTSFAVLDGNVGYLRADEIISEATIKKLGPVIVQRIWNRLVDTDTFVLDLRYNSHGDITGLPYLVSCFCEPRPVVHLDTVYYRPTNESKEIWSLPDLQGA RFAKHKDVFVLVSANTEGVAENVAYVLKHLHRATVIGEQTAGGSLEVERFRLGDSRFFVTVPTARSQSPLTGRSWELTGVFPCVSAPSERALDKALEILNARGVARKAVEAAGELLLSSY TFVERASAIADHLSWSEYGSVVSVEDLTSKLTQDLQSVAEDPRLVVSNREPEWPPLAQPIPPGPPAPLPDDEQMLEAIVDSAFKVEVLEGNIGYLRFDEFGDASAVMKLRKQLVSKVWER IHPTDDVIIDLRYNLGGSSTAIPIVLSYFQDASPPVHFYTVYDRLRNVTAEFHTVSNLTSQLYGSKKGVYLLTSQHTATAAEEFTYLMQSLNRATIVGEITSGRLAHSLAFRLSDTGLYMT 
VPIVNFIDNNDEYWLGGGVVPDAIVLAENALDAAKEIIEFHAKMASLLELAGALVEGYYAMLSDGENATAEILLKYREGWYRSVVDYEALASQLTSDLHEIWGDHRLHAFYSDLQIERMD EDKTPSVPSPEELSVLIDTVFKVDILANNVGYLRFDMMTDAEVLKHVGPQLVEKVWNKISSTRSLVIDVRYNMGGYSTSIPILCSYFFDASPPRHLYTVFDRPSRSSTQVFTVPRVLGQR YGASKDVYILTSHMTGSAGEILTRVMSDLKRATVIGEPTAGGSLSTGTYRIGDSRLYVFIPNQAGVSPSGGRTWSVAGVEPHVQTKASEALQSALRMVALRADAPSILRTVGKLVADGYS RAEAALGVPSKLAALLEAGEYGALRSEEELAFKLTVHLQLITGDRHLKAVCVPEHATDRMPGIVPMQ 0 0 MPPTESFEDLIKFSFITDVLEGNIGYLRFDLFSDLEALEHVAHLLVEHVWKKICDTEILIIDLR 2 >RBP3_braFlo Branchiostoma floridae Region: 9 exons 1 domain: 83-381 ClpP/crotonase e-38 419-630; misfused to PAPS sulfotransferase 0 MTRPSKVDIVFPIKPFTIPTAHEQVKGEGPVDINKNALCKSADEGHTHP 1 2 VSIAMAPTAYIVFVALVPTVLSVDWLDVVMGIGDVMADHYLDQDLRALNDQSLLQRWNRTLVHRFQ 0 0 SWSQDDMSDSLRMEEGLTSELRNITGDETIK 0 0 VWDFGVYENTTQEPVPREFYNFSTFVDNFK 2 1 KNREKHINVTMLEGNVGYVSIRSMSHIVDIILPDPEMTEFFLSKMAALNESK 0 0 AIILDLRYNLGGDREGVVHWASFFFNATPSVPLSDVYYRDGVNQYWTLLE 0 0 VPGGIRFPDMPLYLLTSNRTSREAEEFAYAMQVVNRTTIIGETT 1 2 AGEEFTGMWFPIDQTDVHLLTRTNVVRNPITQDSWSGK 1 2 GVTPDIIVPSEKALTVALRKIQGSEDTKMAASSGNIEPPRWTVYLVFICTSIAILTYPTFM* 0
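The records above run together on long lines, so a small parser can help when reusing them. The following is a minimal sketch, not part of the original curation, and rests on several assumptions: each record starts at `>`, the first word after it is the gene_species name, the single digits 0, 1 and 2 appear to be intron phases flanking each exon, and the residues begin at the first unbroken run of 20 or more letters. That last rule is a heuristic chosen here to separate header notes from sequence; it is not a convention stated on this page. The function names and the toy sequences in `SAMPLE` are hypothetical and are not real RBP3 data.

```python
import re

# Heuristic (an assumption): the first whitespace-separated token made of
# 20 or more letters is taken as the start of the residues; shorter words
# before it are treated as free-text header notes.
LONG_RUN = re.compile(r"^[A-Za-z]{20,}\*?$")
PHASES = {"0", "1", "2"}


def new_exon():
    return {"start_phase": None, "seq": "", "end_phase": None}


def parse_records(text):
    """Parse flattened '>name notes 0 SEQ 0 0 SEQ 2 1 SEQ* 0' text."""
    records = []
    # Text before the first '>' is the tail of a partial record; skip it.
    for block in text.split(">")[1:]:
        tokens = block.split()
        if not tokens:
            continue
        start = next((i for i, t in enumerate(tokens) if LONG_RUN.match(t)),
                     len(tokens))
        # A phase digit directly before the first residues belongs to the
        # body rather than to the header notes.
        if 1 < start < len(tokens) and tokens[start - 1] in PHASES:
            start -= 1
        exons, cur = [], new_exon()
        for tok in tokens[start:]:
            if tok in PHASES and cur["seq"]:
                # A phase after residues closes the current exon.
                cur["end_phase"] = int(tok)
                exons.append(cur)
                cur = new_exon()
            elif tok in PHASES:
                # A phase before any residues opens the next exon.
                cur["start_phase"] = int(tok)
            else:
                # Residue chunks split by spaces are rejoined.
                cur["seq"] += tok
        if cur["seq"]:
            exons.append(cur)
        records.append({"name": tokens[0],
                        "note": " ".join(tokens[1:start]),
                        "exons": exons})
    return records


def protein(record):
    """Concatenate exon residues and drop the trailing stop marker."""
    return "".join(e["seq"] for e in record["exons"]).rstrip("*")


# Toy input only: the names and residues below are invented.
SAMPLE = ("PARTIALTAIL* 0 "
          ">TOY_one demo species 12 aa note 0 MACDEFGHIKLMNPQRSTVWYA CDEF 0 0 "
          "GHIKLMNPQR 2 1 STVWY* 0 "
          ">TOY_two frag ACDEFGHIKLMNPQRSTVWYACDEF*")

if __name__ == "__main__":
    for rec in parse_records(SAMPLE):
        phases = [(e["start_phase"], e["end_phase"]) for e in rec["exons"]]
        print(rec["name"], len(protein(rec)), phases)
```

Records that carry no phase digits, such as single-exon fragments, come back as one exon whose phases are `None`. Header notes containing a letter run of 20 or more characters would defeat the heuristic, so the output should be checked against the source before use.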