Opsin evolution: Encephalopsin gene loss

'''See also:''' [[Opsin_evolution|Curated Sequences]] | [[Opsin_evolution:_LWS_PhyloSNPs|LWS]] | [[Opsin_evolution:_Melanopsin_gene_loss|Melanopsins]] | [[Opsin_evolution:_Neuropsin_phyloSNPs|Neuropsin]] | [[Opsin_evolution:_Peropsin_phyloSNPs|Peropsin]] | [[Opsin_evolution:_RGR_phyloSNPs|RGR phyloSNPs]] | [[Opsin_evolution:_update_blog|Update Blog]]


== Introduction to Encephalopsin Evolution ==


Encephalopsin is a pivotal ciliary opsin class, basal in evolutionary position to the imaging opsin gene expansion and well represented in early diverging deuterostomes, notably the cephalochordate Branchiostoma. Despite this, encephalopsin has been the subject of just one [http://www.jneurosci.org/cgi/pmidlookup?view=long&pmid=10234000 substantive publication], dating to 1999. That article primarily emphasized its non-retinal distribution in mouse brain and testis, but low-power microscopy could not determine its ultrastructural context (presumably it lies purely in the cytoplasmic membrane and is not associated with a ciliary stack):


<blockquote>
"Encephalopsin is highly expressed in the preoptic area and paraventricular nucleus of the hypothalamus ... in selected regions of the cerebral cortex, cerebellar Purkinje cells, a subset of striatal neurons, selected thalamic nuclei, and a subset of interneurons in the ventral horn of the spinal cord. Rostrocaudal gradients of encephalopsin expression are present in the cortex, cerebellum, and striatum. Radial stripes of encephalopsin expression are seen in the cerebellum."</blockquote>


The gene is well represented in chondrichthyes and teleost fish, with multiple independent copies, but occurs as a single copy in amphibians, birds, lizard, marsupials, primates and rodents, with excellent amino acid conservation (over 60% between teleost fish and human). Surprisingly, upon more intensive taxonomic sampling, functional encephalopsin turns out to be completely absent in a variety of other mammalian clades. While interesting in its own right, this complicates the standard application of comparative genomics to the evolution of encephalopsin structure and function.
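
Conservation figures such as "over 60% between teleost fish and human" depend on how alignment columns are counted. The following is a minimal sketch, assuming two pre-aligned strings of equal length with '-' as the gap character and identity taken over columns where neither sequence is gapped. The function name and the counting convention are illustrative assumptions, not necessarily the method behind the percentages quoted on this page.

```python
def percent_identity(a: str, b: str, gap: str = "-") -> float:
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == gap or y == gap:
            continue  # skip columns with a gap in either sequence
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Two short toy fragments, for illustration only.
print(round(percent_identity("LRCVEDLQTIQ", "LRSIQDLQTVQ"), 1))
```

Counting gapped columns as mismatches instead would lower the figure, so the convention should be stated alongside any percentage.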


=== Multiple events of encephalopsin loss in mammals ===
 
The table lists 11 species that today have unprocessed pseudogenes in place of a once-functional encephalopsin gene. The observed phylogenetic distribution of loss (which is irreversible) requires a minimum of 7 independent events, for example in some bats but not others. At the level of superordinal clades, no gene loss is observed in Euarchontoglires, but Laurasiatheres (especially artiodactyls) are heavily affected, as are the Xenarthra within Atlantogenata and at least platypus within monotremes.
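
The reasoning behind a "minimum number of independent events" can be sketched as follows: if the gene was present at the root and loss is irreversible, each maximal clade whose members all lack the gene needs exactly one loss on its stem. The tree below is a toy topology with eight species taken from the table; both the topology and the function name are assumptions for illustration, so the result is not the count quoted above.

```python
# Toy rooted tree as nested tuples; leaves are species codes from the table.
# The topology is a deliberately simplified assumption, not a phylogenetic claim.
TOY_TREE = (
    ("homSap", "musMus"),
    (("bosTau", "turTru"), ("equCab", ("myoLuc", "pteVam"))),
    "dasNov",
)

# True = functional gene ('ok' in the table), False = pseudogene ('ps').
HAS_GENE = {
    "homSap": True, "musMus": True, "bosTau": False, "turTru": False,
    "equCab": True, "myoLuc": False, "pteVam": True, "dasNov": False,
}


def min_losses(tree, has_gene):
    """Return (clade_entirely_lost, minimum_loss_events) under irreversible loss."""
    if isinstance(tree, str):
        lost = not has_gene[tree]
        return lost, (1 if lost else 0)
    results = [min_losses(child, has_gene) for child in tree]
    if all(lost for lost, _ in results):
        return True, 1  # a single loss on the stem explains the whole clade
    return False, sum(n for _, n in results)


_, events = min_losses(TOY_TREE, HAS_GENE)
print(events)  # 3 for this toy tree: cow+dolphin stem, microbat, armadillo
```

Species with poor coverage would need a third "unknown" state, which this sketch does not model.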
 
The odd feature of these pseudogenes is that they must all have arisen fairly recently on the mammalian time scale to remain observable today as decayed relics. However, they survive at variable degrees of degeneration, which suggests that the losses are not coordinated with a single unifying global environmental event. Pseudogenization in frog becomes apparent only upon alignment: not only is the Schiff base lysine now isoleucine, but various problems occur at conserved residues in the carboxy terminus.
 
The first exon presents special problems in that it is exceedingly GC rich. This leads to translational simplicity (high GLPA content) and to masking by filters such as seg that recognize compositional simplicity. The result is trouble for alignment algorithms, leading to non-assembly (as in the horse genome) and/or non-recognition of sequences. Blastn at the trace archive still seems to work effectively.
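
Compositional simplicity of this kind can be made visible with a simple sliding-window entropy scan. The sketch below is a stand-in for illustration only: it is not the seg algorithm, and the window width, cutoff and function names are assumptions. The sequence is the horse exon 1 translation shown in this section.

```python
import math
from collections import Counter

# Horse exon 1 translation, copied from this page.
HORSE_EXON1 = (
    "MTAGTRAGGQGSWEGGGAAGAEGPGPAGPLSPAPLFSPGTYERLALLLGCLGLLGVGNNLLVLVLYSKF"
    "PRLRTPTHLLLVNISLSDLLVALFGVTFTFVSCLRNGWVWDAVGCAWDGFSSSLC"
)


def glpa_fraction(protein: str) -> float:
    """Fraction of residues that are G, L, P or A."""
    return sum(protein.count(r) for r in "GLPA") / len(protein)


def window_entropy(seq: str, width: int = 12):
    """Shannon entropy (bits) of residue composition in each sliding window."""
    values = []
    for i in range(len(seq) - width + 1):
        counts = Counter(seq[i:i + width])
        values.append(-sum(c / width * math.log2(c / width) for c in counts.values()))
    return values


def mask_low_complexity(seq: str, width: int = 12, cutoff: float = 2.2, mask_char: str = "."):
    """Replace every window whose entropy is at or below the cutoff with mask_char."""
    masked = list(seq)
    for i, h in enumerate(window_entropy(seq, width)):
        if h <= cutoff:
            masked[i:i + width] = mask_char * width
    return "".join(masked)


print(f"GLPA fraction: {glpa_fraction(HORSE_EXON1):.2f}")
print(mask_low_complexity(HORSE_EXON1))
```

The masked regions will not coincide exactly with the seg output shown on this page, since seg uses its own complexity measure and trigger/extension thresholds.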
 
However, the artifacts may not be limited to bioinformatics: the composition of the gene may predispose it to replication slippage and CpG hotspot mutations, and so to pseudogenization in the absence of strong selection. For example, horse encephalopsin has a remarkable 39 CpG sites in its 373 bp exon 1.
 
MTAGTRAGGQGSWEGGGAAGAEGPGPAGPLSPAPLFSPGTYERLALLLGCLGLLGVGNNLLVLVLYSKFPRLRTPTHLLLVNISLSDLLVALFGVTFTFVSCLRNGWVWDAVGCAWDGFSSSLC horse exon 1
MTAGTR<font color="red">.................................</font>TYER<font color="red">......................</font>YSKFPRLRTPTH<font color="red">.............</font>ALFGVTFTFVSCLRNGWVWDAVGCAWDGFSSSLC seg filter
atgaccgcgggaacccga<font color="red">gcggggggccagggttcctgggagggcggcggggcggcgggcgctgagggcccggggccggcgggcccgctgagccccgcgccgctcttcagcccgggc</font>acc
  M  T  A  G  T  R  A  G  G  Q  G  S  W  E  G  G  G  A  A  G  A  E  G  P  G  P  A  G  P  L  S  P  A  P  L  F  S  P  G  T
tacgagcgcctggcgctgctgctcggctgcctcgggctgctgggcgtgggcaacaacctgctggtgctcgtcctctactccaagttcccgcggctccgcacgcccacccacctcctgctg
  Y  E  R  L  A  L  L  L  G  C  L  G  L  L  G  V  G  N  N  L  L  V  L  V  L  Y  S  K  F  P  R  L  R  T  P  T  H  L  L  L
gtcaacatcagcctcagcgacctgctggtggccctcttcggggtcacctttaccttcgtgtcctgcctgcggaacggctgggtgtgggacgccgtgggctgcgcgtgggacggctttagcagcagcctctgcg
  V  N  I  S  L  S  D  L  L  V  A  L  F  G  V  T  F  T  F  V  S  C  L  R  N  G  W  V  W  D  A  V  G  C  A  W  D  G  F  S  S  S  L  C   
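
The CpG content of the exon can be checked directly from the nucleotide lines above. This is a minimal sketch, assuming the three lines are joined into one string (with the font markup removed) so that dinucleotides spanning a line break are also counted; the variable and function names are illustrative. The printed values should be read against the figures quoted in the text rather than taken as authoritative.

```python
# Horse exon 1 nucleotide sequence, copied from the three lines above
# with the <font> markup removed.
HORSE_EXON1_DNA = (
    "atgaccgcgggaacccga"
    "gcggggggccagggttcctgggagggcggcggggcggcgggcgctgagggcccggggccggcgggcccgctgagccccgcgccgctcttcagcccgggc"
    "acc"
    "tacgagcgcctggcgctgctgctcggctgcctcgggctgctgggcgtgggcaacaacctgctggtgctcgtcctctactccaagttcccgcggctccgcacgcccacccacctcctgctg"
    "gtcaacatcagcctcagcgacctgctggtggccctcttcggggtcacctttaccttcgtgtcctgcctgcggaacggctgggtgtgggacgccgtgggctgcgcgtgggacggctttagcagcagcctctgcg"
)


def count_cpg(dna: str) -> int:
    """Number of CG dinucleotides (CpG sites) on the given strand."""
    return dna.lower().count("cg")


def gc_fraction(dna: str) -> float:
    """Fraction of bases that are G or C."""
    s = dna.lower()
    return (s.count("g") + s.count("c")) / len(s)


print("length:", len(HORSE_EXON1_DNA))
print("CpG sites:", count_cpg(HORSE_EXON1_DNA))
print(f"GC fraction: {gc_fraction(HORSE_EXON1_DNA):.2f}")
```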
 
As the function of encephalopsin is obscure, the consequences of gene loss are even more so. Encephalopsin may have lost its supporting selected function; no increase in opsin gene number has compensated for this (from homology searches of complete genomes).
 
The clade-incoherent gene loss of encephalopsin is reminiscent of loss of GULO, the terminal enzyme for L-ascorbic acid biosynthesis, with relic gene fragments still detectable in guinea pigs (gene estimated lost 20 myr ago) and primates (lost on stem between lemurs and tarsier about 60 myr ago). At least 5 independent losses have been documented in some passerine birds, all tested species of bats, and teleost fish from zebrafish on (but not including sturgeon, chondrichthyes, or lamprey). A 2007 [http://www.pubmedcentral.nih.gov/articlerender.fcgi?tool=pubmed&pubmedid=17849013 study] of loss of GUCYD (GPCR signaling guanyl cyclase D) exemplifies modern pseudogene dating methods.
 
In the table, 'ok' indicates presence of a gene with conserved sequence characteristics appropriate to functioning opsins and to the ancestral encephalopsin class; 'ps' indicates definite pseudogenization of the 4-exon gene (reading frame shifts, internal stop codons, loss of sequence conservation in expected regions); '+-' indicates partial gene availability with some uncertainty (traces commonly have errors such as short indels and quality departure at their ends; repetitive base composition in exon 1 may have led to technical problems with trace reads); '--' indicates that coverage is too poor to make a call.
ok >ENCEPH_homSap Homo sapiens (human) NM_014322 OPN3 full
ok >ENCEPH_panTro Pan troglodytes full
-- >ENCEPH_gorGor Gorilla gorilla only exon 3 available
ok >ENCEPH_ponAbe Pongo abelii full
ok >ENCEPH_nomLeu Nomascus leucogenys exon 1 ok frag
ok >ENCEPH_macMul Macaca mulatta (rhesus) XP_001094239 full
ok >ENCEPH_papHam Papio hamadryas 1st exon problematic 1x ok frag
ok >ENCEPH_calJac Callithrix jacchus full
ok >ENCEPH_tarSyr Tarsius syrichta exon 1 fragmentary
ok >ENCEPH_micMur Microcebus murinus full
ok >ENCEPH_otoGar Otolemur garnettii full
ok >ENCEPH_tupBel Tupaia belangeri fragments
ok >ENCEPH_musMus Mus musculus Opn3 Panopsin NM_010098 2aa del full
ok >ENCEPH_ratNor Rattus norvegicus XP_573517 predicted 2aa del full
ok >ENCEPH_speTri Spermophilus tridecemlineatus full
ok >ENCEPH_dipOrd Dipodomys ordii full
ok >ENCEPH_cavPor Cavia porcellus 3 aa del full
ok >ENCEPH_oryCun Oryctolagus cuniculus full
ok >ENCEPH_ochPri Ochotona princeps 5aa del ok frag
ps >ENCEPH_bosTau pseudo frag
ps >ENCEPH_turTru Tursiops truncatus pseudo frag
ps >ENCEPH_vicVic pseudo pseudo frag
ok >ENCEPH_canFam Canis familiaris (dog) DN422921 transcript correct, XP_854433 genbank error
ok >ENCEPH_felCat Felis catus full
ok >ENCEPH_equCab Equus caballus full
ps >ENCEPH_myoLuc Myotis lucifugus weak frag
ok >ENCEPH_pteVam Pteropus vampyrus 86%=homSap full
-- >ENCEPH_sorAra poor coverage
-- >ENCEPH_eriEur poor coverage
ok >ENCEPH_loxAfr Loxodonta africana 2 exons in browser, 1 2x full
-- >ENCEPH_echTel Echinops telfairi no coverage
ok >ENCEPH_proCap Procavia capensis ok frag
ps >ENCEPH_dasNov Dasypus novemcinctus pseudo
+- >ENCEPH_choHof Choloepus hoffmanni incomplete coverage
ok >ENCEPH_monDom Monodelphis domestica (opossum) encephalopsin OPN3 75%=homSap full
ok >ENCEPH_macEug Macropus eugenii ok frag
ps >ENCEPH_ornAna Ornithorhynchus anatinus pseudo
ps >ENCEPH_xenTro Xenopus tropicalis (frog) 45%=homSap full lost Schiff K
ok >ENCEPH_galGal Gallus gallus (chicken) 71%=homSap encephalopsin OPN3 full
ok >ENCEPH_taeGut Taeniopygia guttata mrna CK301424 70%=homSap full
ok >ENCEPH_anoCar Anolis carolinensis (lizard) 70%=homSap OPN3 full
ok >ENCEPH_danRer Danio rerio (zebrafish) NM_001111164 mrna 61%=homSap full
ok >ENCEPH_gasAcu Gasterosteus aculeatus (stickleback) 58%=homSap full
ok >ENCEPH_oryLat Oryzias latipes  58%=homSap full
ok >ENCEPH_takRub Takifugu rubripes (pufferfish) homSap=61% full
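
The status column of the table above lends itself to a quick tally. The sketch below assumes each row has the form "status >ENCEPH_code free text" as shown; the regular expression and function name are illustrative assumptions, and only four sample rows are included here.

```python
import re
from collections import Counter

# A few rows copied from the table above; paste the full table to tally everything.
SAMPLE_ROWS = """\
ok >ENCEPH_homSap Homo sapiens (human) NM_014322 OPN3 full
ps >ENCEPH_bosTau pseudo frag
-- >ENCEPH_sorAra poor coverage
ps >ENCEPH_ornAna Ornithorhynchus anatinus pseudo
"""

ROW = re.compile(r"^(ok|ps|\+-|--)\s+>ENCEPH_(\w+)\s*(.*)$")


def parse_status(text: str):
    """Return a list of (species_code, status, note) tuples from table rows."""
    records = []
    for line in text.splitlines():
        m = ROW.match(line.strip())
        if m:
            records.append((m.group(2), m.group(1), m.group(3)))
    return records


records = parse_status(SAMPLE_ROWS)
print(Counter(status for _, status, _ in records))
print([code for code, status, _ in records if status == "ps"])
```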
 
=== Insertion of retrogene CHML into intron 1 of encephalopsin ===
 
The story is a common one in gene duplication processes, in this case apparently involving gene dosage compensation. In the amniote ancestor, and in birds and lizards today, there is but one copy of the multi-exonic gene CHM. In mammals, as the sex chromosomes underwent upheaval, the ortholog ended up on chrX. Dosage compensation then favored retention of a subsequently retroprocessed copy on an autosome, an event timed below to stem placentals (i.e. present in elephant, sloth and so on, but not opossum).
 
The arrangement can be observed at the current quality of assemblies in Homo sapiens, Pan troglodytes, Pongo abelii, Macaca mulatta, Otolemur garnettii, Tupaia belangeri, Mus musculus, Rattus norvegicus, Cavia porcellus, Oryctolagus cuniculus, Canis familiaris, Equus caballus, Bos taurus, Choloepus hoffmanni, Procavia capensis and Loxodonta africana but not in marsupial, platypus, chicken or lizard assemblies.
 
[[Image:EncCHML.jpg|left]]
 
The intronless retrogene CHML landed in ancestral chromosome 1 (human numbering) within the first intron of encephalopsin, sharing the same direction of transcription. This raises the question of whether this copy of CHML has separate initiation and termination of transcription or is translated from encephalopsin transcripts by seizing some ribosomal starts. Presumably the normal splice junctions of encephalopsin continue to excise all of the expanded intron (resulting in normal protein -- it could not function with a large inappropriate inserted domain).
 
This event correlates (imperfectly) with lineage-specific losses of encephalopsin within tetrapods as CHML may have selectively out-competed ENCEPH in some lineages and been causative for pseudogenization.
 
The region annotated below corresponds only to a known domain (GDI: GDP dissociation inhibitor, pfam00996) but suffices as a probe for encephalopsin intron 1. The 3D structure of this region is available for the exonic parent gene in rat, PDB 1LTX, allowing determination of the structural location of an interesting single-residue deletion in carnivores + bats + perissodactyls.
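
Lineage-specific deletions such as this one can be located mechanically in a fixed-width alignment. The sketch below uses five rows copied from the CHML alignment that follows and reports every column in which some, but not all, sequences carry a gap; the function name is an illustrative assumption.

```python
# Five rows copied from the CHML alignment on this page.
CHML = {
    "homSap": "AFRQCSFSEYLKTKKLTPNLQHFVLHSIAMTSESSCTTIDGLNATKNFLQCLGRFGNTPF",
    "canFam": "AFVHCSFSDYLKTKKLTPNLQHFVLHSIAMT-ESSCTTIDGLKATKNFLRCLGRFGNTPF",
    "equCab": "AFMQCSFSEYLKTKKLTPNLQHFVLHSIAMT-ESSCTTIDGLKATKNFLQCLGRFGNTPF",
    "myoLuc": "DFTQRPFSEYLKTQKLTPNLQHFILHSIAMT-EPSCLTVDGLKATKHFLQCLGRYGNTPF",
    "bosTau": "AFTQCSFSEYLKTKNLTPSLQHFILHSIAMMSESSCTTVDGLKATKTFLQCLGRFGNTPF",
}


def gapped_columns(alignment, gap="-"):
    """Map 0-based column index to the names of sequences gapped at that column."""
    names = list(alignment)
    hits = {}
    for col, residues in enumerate(zip(*alignment.values())):
        gapped = [name for name, r in zip(names, residues) if r == gap]
        if gapped and len(gapped) < len(names):
            hits[col] = gapped
    return hits


for col, names in gapped_columns(CHML).items():
    print(f"column {col + 1}: gap in {', '.join(names)}")
```

With these five rows the only gapped column is column 32 (1-based), shared by dog, horse and microbat.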


<pre>
CHML chr1:239,864,567-239,864,746                              genSpp
AFRQCSFSEYLKTKKLTPNLQHFVLHSIAMTSESSCTTIDGLNATKNFLQCLGRFGNTPF  CHML_homSap Homo sapiens (human)
AFRQCSFSEYLKTKKLTPNLQHFVLHSIAMTSESSCTTIDGLNATKNFLQCLGRFGNTPF  CHML_panTro Pan troglodytes (chimp)
AFRQCSFSEYLKTKKLTPNLQHFVLHSIAMTSESSCTTIDGLNATKNFLQCLGRFGNTPF  CHML_ponPyg Pongo pygmaeus (orang_sumatran)
AFRQCSFSEYLKTKKLTPNLQHFVLHSIAMTSESSCTTIDGLNAIKNFLQCLGRFGNTPF  CHML_macMul Macaca mulatta (rhesus)
AFRQCSFSEYLKTKKLTPNLQHFVLHSIAMTSESSCTTIDGLNATKNFLQCLGRFGNTPF  CHML_calJac Callithrix jacchus (marmoset)
AFEQCLFSEYLKTKKLTPNLQHFILHSIAMTSESSCTTIDGLKATKNFLQCLGRFGNTPF  CHML_otoGar Otolemur garnettii (bushbaby)
AFKQCSFSEYLKTKKLTPNLQHFVLHSIAMTSESSCTTVDGLKATKNFLQCLGRFGDTPF  CHML_micMur Microcebus murinus (mouse_lemur)
AFEQCLFSEYLKTKKLTPNLRHFILHSIAMTSESSCSTLDGLKATKTFLQCLGRFGNTPF  CHML_tupBel Tupaia belangeri (tree_shrew)
DFKQCSFSDYLKTKKLTPNLQHFILHSIAMTSESSCTTLDGLQATKTFLQCLGRFGNTPF  CHML_musMus Mus musculus (mouse)
DFKQCSFSDYLKTKKLTPNLQHFILHSIAMSSDSSCTTLDGLQATKNFLRCLGRFGNTPF  CHML_ratNor Rattus norvegicus (rat)
DFQQCLFSEYLKTKRLTPNLQHFILHSIAMTSESSCTTLDGLKATKNFLQCLGRFGNTPF  CHML_cavPor Cavia porcellus (guinea_pig)
DFKQCSFSEYLKAKKLTPNLQHFVLHSIAMTSETSCTTLDGLKATKIFLQCLGRFGNTPF  CHML_oryCun Oryctolagus cuniculus (rabbit)
DFKQCSFSEYLKTKKLTPNLQHFILHSIAMTSESSCTTLDGLRATKNFLQCLGRFGNTPF  CHML_ochPri Ochotona princeps (pika)
AFVHCSFSDYLKTKKLTPNLQHFVLHSIAMT-ESSCTTIDGLKATKNFLRCLGRFGNTPF  CHML_canFam Canis familiaris (dog)
AFMQCSFSEYLKTKKLTPNLQHFVLHSIAMT-ESSCTTIDGLKATKNFLQCLGRFGNTPF  CHML_felCat Felis catus (cat)
AFMQCSFSEYLKTKKLTPNLQHFVLHSIAMT-ESSCTTIDGLKATKNFLQCLGRFGNTPF  CHML_equCab Equus caballus (horse)
DFTQRPFSEYLKTQKLTPNLQHFILHSIAMT-EPSCLTVDGLKATKHFLQCLGRYGNTPF  CHML_myoLuc Myotis lucifugus (microbat)
DFTQCSFSEYLKTKKLTPNLQHFVLYSIAMT-ESSCTTVDGLKAAKNFLRCLGRFGNTPF  CHML_pteVam Pteropus vampyrus (macrobat)
AFTQCSFSEYLKTKNLTPSLQHFILHSIAMMSESSCTTVDGLKATKTFLQCLGRFGNTPF  CHML_bosTau Bos taurus (cow)
AFTQCSFSEYLKTKKLTPSLQHFVLHSIAMMSESSCTTIEGLKATKNFLQCLGKFGNTPF  CHML_turTru Tursiops truncatus (dolphin)
DFMQCSFSEYLKAKKLTPSLQHFVLHSIAMTSESSCTTIDGLKATKNFLQCLGRFGNTPF  CHML_susScr Sus scrofa (pig)
AFVQSSFSEYLKTKKLTPNLQHFVLHSIAMMSESPCTTIDGLKATKNFLQCLGRFGNTPF  CHML_vicVic Vicugna vicugna (vicugna)
AFVQSSFSEYLKTKKLTPNLQHYILHSISMTSESSCTTLDGLKATKKFLQCLGRFGNTPF  CHML_eriEur Erinaceus europaeus (hedgehog)
AFIQCSFSDYLKTKKLTPNLQHFILHSIAMTPEASCSTVDGLKATKIFLQCLGRFGNTPF  CHML_sorAra Sorex araneus (shrew)
TFKQCSFSEYLKTKRLTPNIHHFVLHSIAITSQSSCTIIDGLKATKTFLWCLGWFSKNPF  CHML_dasNov Dasypus novemcinctus (armadillo)
AFEQCSFSEYLKTKKLTPNLQHFILHSIAMTSQSSCTTLDGLKATKNFLQCLGRFGNTPF  CHML_choHof Choloepus hoffmanni (sloth)
AFKQCSFSEYLKTKKLTPNLQHFVLHSIAMTSESSCTTIDGLKATKNFLQCLGRFGNTPF  CHML_loxAfr Loxodonta africana (elephant)
AFKHCSFSEYLKTKKLTPNLQHFVLHSIAMTSESSCTTIDGLKATKTFLQCLGRFGNTPF  CHML_proCap Procavia capensis (hyrax)
...                                                            CHML_echTel Echinops telfairi (tenrec)
...                                                            CHML_monDom Monodelphis domestica (opossum)
...                                                            CHML_ornAna Ornithorhynchus anatinus (platypus)
...                                                            CHML_anoCar Anolis carolinensis (lizard)
...                                                            CHML_galGal Gallus gallus (chicken)</pre>

=== Post-marsupial loss of TMT ===

TMT (for teleost multiple tissue) is another fairly obscure opsin, again the subject of only a [http://www.ncbi.nlm.nih.gov/pubmed/12670711 single 2003 publication], which established its expression in "a variety of neural and non-neural tissues, including a zebrafish embryonic cell line that exhibits a light entrainable clock", suggesting that TMT regulates peripheral clocks in teleost fish. The index sequence for TMT is defined by AF349947 (respectively AF402774 for fugu).

TMT is actually a family of 3 paralogs in teleost fish. The one studied has a partially syntenic copy on another chromosome, likely resulting from whole genome duplication. A third paralog, called TMT here, is the only one with a syntenic counterpart in frog, lizard, birds, and marsupials but not placentals (even though the gene order is not disturbed).

The gene is curiously intertwined with the 4th and final coding exon of an apparently unrelated gene on the opposite strand, ST6GAL2 sialyltransferase, from fish to opossum (that is, in all species in which it is found). This would not create transcriptional or translational issues for either gene.

Curiously, ST6GAL1 in the same orientation is adjacent to an older TMT gene on another chromosome but without the overlap, suggesting that this latter gene pair is parental and that the exons were intertwined in the fish stem after a segmental duplication, perhaps mediated by recombination driven by intronic retroposons. This older gene is found intact from fish to lizard and as barely recognizable pseudogenes (blast of the syntenic region) in chicken, finch and platypus but not in marsupials or placentals. This requires three independent losses: one in the chicken/finch stem, another approximately synchronous one on the platypus branch (the pseudogene is old but far younger than the platypus divergence), and a final earlier event in the marsupial-placental stem that left no traces. GenBank has no transcripts for any tetrapod TMT as of Dec 2009.

>TMTa_taeGut Taeniopygia guttata (finch) pseudogene frag from ST6GAL1 synteny
FHSTSSSCVRADFTSVRFR GITSLISLAVLSYERCCTM RTAEPDTTNSRKAWTGIILSWTYSLLWTVPPLLG SSYGPEGPGITCSVNWHSQDANNASYIICLFIFCLVIPFAIIVYSYGKLLCAVRQV

>TMTa_galGal Gallus gallus (chicken) pseudogene frag from ST6GAL1 synteny
VRLIEFAIYILTFFFGAIFNVLALWVFFCKIKKWTETKVYAINLVFADFFVICILPFMAYLIWKNSVRDELCQFIEAMYFINMVVSIYVISFIFIDRYLGIKHPLKAR
AFRSPSKAALLCGLLWLAVTTGTILNFQQRYAHFCFQYDTSKPTALILLSFFIFTLPLATLTFCSIEIIRNLKKQMKTNALEEKSIQKALYIIYANLVVFLLCFLPSHLIVIARLVT
 
>TMTa_ornAnaPS Ornithorhynchus anatinus (platypus) pseudogene frags adjacent to syntenic GPR35, weak assembly from ST6GAL1 side
2 WAYASFWATMPLVGLGNYAPEPFGTSCTLDWWLAQASVAGQAFILNILFFCLLLPTAVIV 0
0 SKGVSKGMEKIGEQ*VQLTVFVVVICFLFCWLPYGTMASISTCGKPGLITPT SSVFPLVLG KNSTVLNPVIYGFLNvk 0
  0 FYRCFHALMSF KDFTSSISEVSPIPFDFSCVTPRIQNNH-FPSASEGRP
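
One of the pseudogene criteria used on this page, an internal stop codon, is visible in the platypus fragment above as an asterisk in mid-sequence. A minimal sketch of that check, assuming translations in which '*' marks a stop and a single trailing '*' is the legitimate terminator (the function name is illustrative):

```python
def internal_stops(protein: str):
    """Return 0-based positions of stop codons ('*') that are not the final residue."""
    body = protein[:-1] if protein.endswith("*") else protein
    return [i for i, aa in enumerate(body) if aa == "*"]


# Start of the second platypus fragment shown above.
fragment = "SKGVSKGMEKIGEQ*VQLTVFVV"
print(internal_stops(fragment))  # [14]
```

Frameshifts and loss of conserved residues, the other criteria named in the table legend, need an alignment against intact orthologs and are not covered by this check.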
 
Elephant shark has two apparent paralogs but these are difficult to assign from single exon fragments; counterparts cannot be found in lamprey or Ciona, yet amphioxus and possibly sea urchin have related genes.
 
TMTs are encephalopsin-class opsins, clustering most closely with the encephalopsins. Although initially found in fish, TMT is not a newly arisen product of whole genome duplication there, because synteny to encephalopsin is lacking, the two genes are quite diverged, and species diverging earlier and later (lacking the duplication) also have both copies.
 
Thus TMT likely arose in the lamprey ancestor if not far earlier. Indeed, certain encephalopsins of cephalochordates cluster better with TMT than with classical encephalopsins, as do the two encephalopsin-class opsins of the lophotrochozoan Platynereis dumerilii, and 8 in insects and crustacea. The sea urchin gene associates equally with the two encephalopsin classes. There are additional unexpected associations with pinopsins, a class that may need to be broken into two groups.
 
TMT is thus likely the ancestral gene, with encephalopsin an ancient but later derived form that has received more emphasis historically because, unlike TMT, it persisted into some placentals, notably human. Alternate scenarios involving loss of encephalopsin from early diverging bilaterians cannot be ruled out. It is noteworthy that Branchiostoma, a late deuterostome without imaging eyes, has an expanded repertoire of both TMT and encephalopsin, just as it did with neuropsins.


== Reference set of 54 curated encephalopsins (including pseudogenes) ==
<pre>
>ENCEPH_homSap Homo sapiens (human) NM_014322 OPN3 full
...
0 LRCVEDLQTIQVIKILRYEKKVAKMCFFMIFTFLICWMPYIVIRFLVVNGYGRLVTPTISIVSYLFCKSSTVYNPVIYIFMIRK 0


>ENCEPH_canFam Canis familiaris (dog) DN422921 transcript correct, genbank XP_854433 errs
0 MYSGNRSGGQGHWEGGGAAGAEGPGPAGTLSPAPLFSPGTYERLALLLGSVGLLGVGNNLLVLVLYSKFQRLRTPTHLLLVNLSLSDLLVSLFGVTFTFVSCLRNGWVWDSVGCVWDGFSSSLF 1
2 GIVSITTLTVLAYERYIRVVHARVINFSWAWRAITYIWLYSLAWSGAPLLGWNRYILDVHGLGCTVDWKSKDANDSFFVLFLFLGCLVVPMGVIVHCYGHILYSIRM 0
0 LRCVEDLQTIQVIKILRYEKKVAKMCFLMIFIFLIFWMPYIVICFLVVNGYGHLVTPTVSIVSYLFAKSSTVYNPVIYIIMIRK 0
...
>ENCEPH_proCap Procavia capensis ok frag
2 GIASITSLTVLAYERYIRVVHARVINFSWAWRAITYIWLYSLAWSGAPLLGWNRYILDTHGLACTVDWKSNNTNDSSFVLFLFLGCLVVPVGVIVHCYGHILYSIRM 0
0 LRCVEDLQTIQVIKILRYEKKVAKMCLFMILTFLICWMPYIVICFLMVNDYGYLVTPTISIVSYLIAKSSTVYNPVIYTFMIRK 0
0 FRRSLFQLLCFRLLRCQRPAKNKPEVGSEMQIRPIVMSQKDGDRPKKKVTFNSSSIIFIITSDESLSVNDTDKINGSKADVIQVRPL* 0


...
0 LRCVEELQTIQVIKILRYEKKVAKMCFLMIAIFLFCWMPYAVICLLVANGYGSLVTPTVAIIASLFAKSSTAYNPIIYIFMSRK 0
0 FRRCLLQLLCFRLLKFQQPKKDRPVIRTEKQIRPIVMSQKVGDRPKKKVTFSSSSIIFIITSDETQMIDENDKNSGTKVNVIQVRPL* 0
>ENCEPH_sacHar Sarcophilus harrisii (tasmanian_devil) 94% identity monDom
0 MYSGNSSDDAGGGYWGSGGTGGAGGTGVAGEPAPEGSPRPAPLFSPGTYELLALLIATIGLLGLCNNLLVLVLYYKFQRLRTPTHLFLVNISFSDLLVSLFGVTFTFVSCLRSGWVWDSVGCAWDGFSNTLF 1
2 GIVSIMTLTVLAYERYNRIVHAKVINFSWAWRAITYIWLYSLIWTGAPLLGWNRYTLEIHGLGCSVDWKSKDPNDSSFVLFLFLGCLVLPVGVMAYCYGHILYAIRM 0
0 FRCVEELQTLQVIKILRYEKKVAKMCFLMIATFLFCWMPHAVICFLVANGYGSLVTPTVAIIPSLFAKSSTAYNPIIYIFMSRK 0
0 FRRCLLQLLcFRQLKFQQPKKDRAIIRTEKQIRPIVMSQKVGDRPKKKVTFSSSSIIFIITSDETQMIDDNDKNSETKVNVIQVRPL* 0


>ENCEPH_macEug Macropus eugenii ok frag
...
0 FRRCLVQLFCVQFLRFKRTLKEQPAIESNKPIRPIVMSQKVGDRPKKKVTFSSSSIIFIITSDDTEQIDVSTKCSDTKINVIQVKPL* 0


>ENCEPH_xenTro Xenopus tropicalis (frog) 45%=homSap recent pseudogene, I for Schiff K, loss of C-terminal conserved residues
0 MPVTNGSHNNSISWLHSKDMFTEDTYHFLALIVATVGFLGLVNNLLVLILYCKFKRLQTPTNLLFFNTSLCHFVFSLLAITFTFMSCVRGSWAFSVEMCVFHGFSKNLL 1
2 GIVSFGTLTVVAYERYARVVYGKYVNSSWSKRSITFVWVYSLAWTGFPLIGWNLYTFETHKLDCSFEWTATDPKDTAFVLLFFLACITLPLSIMAYCYGYILYEIQK 0
0 LRSVKNIQNFQEITILDYEIKMAKMCLLMMLTFLIGWMPYTILSLLVTSGYSKFITPTITVMPSLLAIASAAYNPVIHIFTIKK 0
0 FRQCLVQLLFHNFWRLLKNLNGRLAMKKVKPVLGKGRSHNRPEKKVFSSSDFFTRTTSDTGTHGITESTKGKRTNVRLIQVHPLYP* 0


>ENCEPH_danRer Danio rerio (zebrafish) NM_001111164 mrna 61%=homSap full
...
0 LRSIQDLQTVQTIKILRYEKKVAVMFLMMISCFLVCWTPYAVVSMLEAFGKKSVVSPTVAIIPSLFAKSSTAYNPVIYAFMSRK 0
0 FRRCMLQMLCSRLTSLQHTIKDRPLSRIEHPIRPIVMSQSRTDRPKKRVTFSSSSIVFIIASHDTHPLDITSKCNDEPDINVIQVRPL* 0
>ENCEPH_tetNig Tetraodon nigroviridis (pufferfish) homSap=61% full
0 MSSADDSRSARSGEPSLFAVHTYRLLAAAIGAIGVLGFCNNLAVAALYWRFRRLRTPTNLLLLNISLSDLLVSLLGVNFTFAACVQGRWTWNQATCVWDGFSNSLF 1
2 GIVSIMTLAALAYERYIRVVHAQVVDFPWAWRAIGHIWLYSLAWTGAPLLGWNRYTLEIHRLGCSLDWASKDPNDASFILLFLLACFFVPVGIMIYCYGNILYAVHM 0
0 IRSIQDLQTVQIIKILRYEKKVSVMFFLMISCFLLCWTPYAVVSMMVAFGRKSMVSPTVAIIPSFFAKSSTAYNPVIYVFMSRK 0
0 FRRCLLQLLCSRLSWLQRGLKERPLAPVQRPIRPIVVSRPCGKGTRPKKKVTFSSSSIVFIITSDDFRQLDVTSRAGDSADVNAIQVRPL* 0


>ENCEPH_takRub Takifugu rubripes (pufferfish) homSap=61% full
...
0 CWSPYAVASLFVASGFEHLVSPPVSIVPSLLAKSNAVCNPLLFLLMSGN 0


>ENCEPH4_braFlo Branchiostoma floridae (amphioxus) Gt -ZFYVE1 +RTF1 -CES1 -POMT2 402 aa 12435605 AB050608 Amphiop4 new exon 12 and 34
0 MALYNNTSSPSQDLLWDAPYSQGHIWDNSSASNSSEDVMDQGKVELQDFSDAGYTAIATCLALI 1
2 GFVGFTNNFVVILLIGCHRQLRTPFNLLLLNMSVADLLVSVCGNTLSFASAVRHRWLWGRPGCVWYGFANSLF 1
...
0 FREFLLARLQRVCCRQQAVPRVTPMDDNVHVRLGGEGPSQSQQFLPAGENVENVDMLEYVQENCKPKADSLSTISE* 0


>ENCEPH4_braBel Branchiostoma belcheri (amphioxus) AB050608 full Amphiop4 introns from braFlo PUBMED 12435605
0 MPLYNTSSGPTQGLPWDTPYSQDPIWNDSSPSNSSEDAVVDQGRGELQDFSDAGYTAIATGLALI 1
2 GLVGSMNNFVVILLIGCHRQLRTPFNLLLLNVSVADLLVSVCGNTLSFASAVQHRWLWGRPGCVWYGFANSLF 1
...
0 FREFLLARLRTFCCRQPRMLRVTPMDDNAHARLVGEGPSHAQQVIPSEENGENVEMRKVQGNQLKADSLSTISE* 0


>ENCEPH5_braFlo Branchiostoma floridae (amphioxus) no_ref genome encephalopsin extra 0 intron 
0 MLGMHNVMNATDYDNNNATFAAWNFQRNGTTEEEVEFSGFDTVAVVIAAIGIAGFLSNGAVVLLFLKFRQLRTPFNMLLLNMSVADLLVSVCGNTLSFASAVRHRWLWGRPGCVWYGFANHLF 1
2 GLVSLISLAVISYERYRMVVKPKGPGSSYLTYNKVGLAIIFIYLYCLLWTTLPIVGWSSYQLE 0
0 GPKISCSVAWEEHSLSNTSYIVAIFIMCLLLPLLIIIYSYCRLWYKVKK 0
0 GSQNLPPAIRKSSQKEQKIARMVVVMITCFLVCWLPYGAMALVVSFGGESLISPTAAVVPSLLAKSSTCYNPLVYFAMNNQ 0
0 FRRYFQDLLCCGRRLFDASASVNTCNTSAMPRHSPVFQKPDSDQYNGIQKSREPQMRTTGQNAPYRQWIEMQTIAVVVKADEVNNKFGEVKT* 0
>ENCEPH5_braBel Branchiostoma belcheri (amphioxus)  AB050609 encephalopsin Amphiop5 extra Nfrag in mrna 
0 MLGIYNVVNATEYGNNTTFAAWDFKRNGTGGEEEVEFFGYDAVAGVIAIIGVVGFVSNGAVVVLFLKFPQLRTPFNLLLLNMAVADLLVSVCGNTLSFASAVRHRWLWGRPGCVWYGFANHLF 1
2 GLVSLISLAVISFLRYRMVVKPKGPGSSYLTYTKVGLAILFIYLYCLLWTTLPIAGWSSYQLE 0
0 GPKIGCSVAWEEHSWSNTSYIVVLFITCLFAPLLIIVYSYYRLWHKVKQ 0
0 GSRNLPAAMRKSSQKEQKIAMMVIVMITCFMVCWLPYGAMALVVTFGGERLISHTAAVVPSLLAKSSTCYNPVVYFAMNSQ 0
0 FRRYFQDLLCCGRRLFDVSQSVVTGNTAMPRNNSQGFRKDDSDQKQDNGLPKQSEGPMCDHSSNESQMEGSRHNTAASQQWIEMQTIAVVVKAVEVDTSAANEP* 0
>ENCEPH6_braFlo browser duplicated frag
0      VAAILALIGVLGIVNNSTTLYLVGRYKQLRTPFNILMVNLSVSDLLMCVLGTPFSFVSSLHGRWMFGHSGCEWYGFICNF 1
2 GIVSLITLTVISYERYLLMKRLPNERILSYRAVALAVVFIWCYSLLWTAPPLVGWSSYGPEGYGISCSVNWESRTANDTSYIVAYFVGCLVFPVAIIVISYTRLLILYMRQ 0
0 APSAPMQMLVRREKRVTKMVVVMIMGFTICWTPYTIVALIVTCGGEGIITPAAATVPALFAKSSVVYNAAIYVAMNNQ 0
0 FRKCFLRSLNCRSQPRDPSSQQYTLKTNQVGMSTSGSQAARTADRIKTVHVATANPQDHRSSSGQAVEDNGGFRKSLTHSLPLNSISTLLEAEK* 0


>ENCEPH_strPur Strongylocentrotus purpuratus GLEAN3_03451 modified terminal exon by extending penultimate to stop codon
1 EQKLLKTLIAIAIAFLVAWSPYAITSMIVVFGGSELLSLTATTLPSLFAKSSVMINPIIYAVTSRVFRKSLKK 0
0 MLTSFFPGCMTYIMTDKSPPSSSRPIQLGLCKYHFLY* 0
</pre>


>ENCEPH4a_takRub Takifugu rubripes (pufferfish) 12670711 AF402774 encephalopsin TMT 40%=homSap full
== Reference set of 31 TMT genes from amphioxus to marsupial ==
 
<pre>
>TMT_monDom Monodelphis domestica shortened final exon DFPEVSEKQLCLLS PEVWPQP +NCK2 -UXS1 +TMT -ST6GAL2 (overlap) -RALY
0 MSNNLTTNLSLEALLSASEDKQRNGLSRTGHTIVAVFLGIILIFGSISNFIVLVLFCKFKVLRNPVNMLLLNISISDMLVCLSGTTLSFASSIQGRWIGGKHGCRWYGFANSCF 1
2 GIVSLISLAILSYERYRTLTLCPGQGADYQKALLAVAGSWLYSLVWTVPPLIGWSSYGTEGAGTSCSVHWTSKSVESVSYIMCLFIFCLVIPILVMVYFYGRLLYAVKQ 0
0 VGKIRKTAARKREYHVLFMVVTAVICYLICWVPYGMIALLATFGPPGVVSPVANVVPSILAKSSTVCNPIIYVLMNKQ 0
0 FYKCFLILFHCQPAQSGPDVSLCPSNVTVIQLGQRKNKDAPGSI*
 
>TMT_macEug Macropus eugenii frag
0 MSINLTANLSFGTLLPDSEEKQRSGLSRTGHTVTAVFLGLILILGVINNFIVLVLFCKFKVLRNPVNMLLLNISISDMLVCLTGTTLSFASSIRGRWIAGYHGCRWYGFANSCF 1
2 GIVSLISLAVLSYERYRTLTLCPRQGTDYHKALLAVAGSWLYSLIWTVPPLIGWSSYGTEGAGTSCSVHWTSKSVESVSYIMCLFIFCLVIPILFMVYFYGRLLYTVKQ 0
0 VGKIRKSAARKREYHVLFMVVTAVICYLICWVPYGMIALLATFGPPGVVSPVANVVPSILAKSSTVCNPIIYILMNKQ 0
0 FYKCFLILFHCQPASSASDASLCPSKMTVIQLGQRKDKEVPCAIQDLPEVSKKQLCLLSPESNVAPSSGHPQEKMEEKPLSE*  0
 
>TMT_ornAna Ornithorhynchus anatinus frag
0                        GLSRTGHTMVAVFLGIILVFGFMNNLIVLILFCKFKALRNPVNMIMLNISASDMLVCVSGTTLSFASNISGRWIGGDPGCRWYGFVNSCL 1
2 GIVSLISLAVLSYERYRTLTLHPKQSTDYQKAVLAVGASWIYSLIWTIPPLLGWSSYGTEGAGTSCSVHWSSKSPVSVSYIVCLFIFCLVIPVLVMIYCYGRLLYAVKQ 0
0 IGKARKTAARKREYHVLFMVITTVICYLVCWMPYGVTALLATFGQPGTVSPEASVIPSILAKSSTVCNPIIYILMNKQ 0
0 FYKCFLILFHCQPPRAADAPSTYPSQVMVIQLNQRRSRETAGAPQVLLEMKHQTLHLLGPQLHETPSWERSTPVHPE* 0
 
>TMT_galGal Gallus gallus XM_001234388 mRNA multiple tissue opsin full +NCK2 -UXS1 +TMT -ST6GAL2 (overlap) +SLC5A7 +SULT1C4
0 MNHTWTYNLSFGAPTDPVEPRAGLSRNGHTVVAVFLGFILFFGFLNNLIVLILFCKFKTLRNPVNMLLLNISISDMLVCISGTTLSFASNIHGKWIGGEHGCRWYGFVNSCF 1
2 GIVSLISLAVLSYERYSTLTLCNKRSDDYRKALLAVGGSWVYSLLWTVPPLLGWSSYGIEGAGTSCSVRWSSETAESTSYIICLFIFCLVIPVMVMMYCYGRLLYAVKQ 0
0 VGKIHKNTARKREYHVLFMVITTVICYLVCWIPYGVIALLATFGKPGVVTPVASIIPSILAKSSTVCNPIIYILMNKQ 0
0 FYKCFRQLFHCQPPSSTDGEPTCHSKVTVIQLNQKTDGGKLCNNKPRPETDNKVTSLLHPEPGLEPAAKTVPPM*  0
 
>TMT_taeGut Taeniopygia guttata
0 MNHTWMYNLSFGAPAHPVEPRAGLSRSGHTVVAVFLGLILFFGFLNNLIVLILFCKFKTLRNPVNMLLLNISVSDMLVCISGTTLSFASNIRGKWIGGDHACRWYGFVNSCF 1
2 GVVSLISLAVLSYERYNTLTLCHKRSDDFRKALLAVAGSWIYSLVWTVPPLLGWSSYGVEGAGTSCSVRWSSESAESTSYIICLFVFCLVVPVMVMMYCYGRLLYAVKQ 0
0 VGKIHKNAARKREYHVLFMVIPTVICYLVCWIPYGVIALLATFGKPGAVTPITSIIPSILAKSSTVCNPIIYILMNKQ 0
0 FYKCFRQLFHCQPPSSTDGEPTCHSKVTVIQLDQRADGGNMCNNEPHPETDSKMTSLLCPETTSKATPPTS* 0
 
>TMT_anoCar full +TMT -ST6GAL2 (overlap) +SLC5A7
0 MSELSSNLTFNMSTSIEEPGSGLSRMGHNIVAVFLGLILVFGFLNNLVVLILFCKFKTLRNPVNMLLLNISASDMLVCISGTTLSFVSNIYGRWIGGEHGCRWYGFVNSCF 1
2 GIVSLISLAILSYERYSTLTQTNKRGSDYQKALLGVGGSWLYSLIWTVPPLIGWSSYGLEGAGTSCSVRWTSETLESVTYIICLFIFCLAIPVLVMIYCYARLFYAVKQ 0
0 VGKLRKTSARKREFHVLFMIITTIICYLICWMPYGVIALLATFGRPGLVSPVASVIPSILAKSSTVFNPIIYILMNKQ 0
0 FYKCFLMLLHCQPSSVADGETICQSKVMAIHQNQKAQGGVILKSQVVPQMDEKAICLLSPESSLDPVLESTPQLSKENSFL* 0
 
>TMT_xenTro full -UXS1 +TMT -ST6GAL2 (overlap) +SLC5A7
0 MSTIKNWTTNISVENSMSYIENDLSLPTEAVLSRTGHTVVAIFLGFILIFGFLNNFVVLILFCKFKTLRTPVNMMLLNISASDMLVCVSGTTLSFTSSIKGKWIGGEYGCQWYGFVNSCF 1
2 GIVSLISLAILSYERYSTLTLYNKGGPNFKKALLAVASSWLYSLVWTVPPLLGWSSYGREGAGTSCSVRWTSESVESVSYIICLFIFCLALPVFVMLYCYGRLLYAVKQ 0
0 VGKIRKIAARKREYHVLFMVITTVICYLLCWLPYGVVALLATFGRPGVISPVASVVPSILAKSSTVFNPIIYILMNKQ 0
0 FYKCFLILFHCHPTSSADGKSICQSNYTVIQLNQKLNNIVAIPGQTQIPESVDKMPCIHRQNNESPSDQMPQSTTEHLISGT* 0
 
>TMT_danRer Danio rerio  -UXS1 +TMT -ST6GAL2 (overlap) +GPR89A -pdzk1l
0 MFFEQADLNYSFNMSEEDRLTLLDEDWSDSPMETLSRAGFIALSVFLGFIMTFGFFNNLVVLVLFCKFKTLRTPVNMLLLNISISDMLVCMFGTTLSFASSVRGRWLLGRHGCMWYGFINSCF 1
2 GIVSLISLVVLSYDRYSTLTVYHKRAPDYRKPLLAVGGSWLYSLIWTVPPLLGWSSYGLEGAGTSCSVSWTQRTAESHAYIICLFVFCLGLPVLVMVYCYGRLLYAVKQ 0
0 VGKIRKTAARKREYHVLFMVITTVVCYLLCWMPYGVVAMMATFGRPGIISPVASVVPSLLAKSSTVINPLIYILMNKQ 0
0 FYRCFRILFCCQRSLLQNGHSSMPSKTTVIQLNRRVNSNAVACTAQISTGTHNHDCSTHVTERSNPPEVIP* 0
 
>TMT_tetNig Tetraodon nigroviridis  -UXS1 +TMT -ST6GAL2 frameshifted assembly
0 MFSGQAGLNSSFNLSDGRGLEDAPAGRGRLSPTGFVVLSVVLGFIITFGFLNNFIVLLLFCKFKKLRTPVNVLLLNISVSDMLVCLFGTTLSFASSLRGRWLLGRSGCNWYGFINSCF 1
2 GIVSLISLVILSHDRYSTLTVYNKQGINYRKPLLAVGGTWLYSLLWTVPPLLGWSSYGIEGAGTSCSVSLDGADGPVPRLHHLPLHLLPGVTGAGDGLLLQQAAVGRQ 0
0 VGKIRKTSARKREYHILFMVVTTAACYLVCWMPYGVVAMMATFGPPNIISPVASVVPSLLAKSSTVINPLIYILMNKQ 0
0 FYKCFLILFHCSHWSADNGTTSVPSKITVIQLNRRAYSNTVACADPLSTDALKQCCSAKNASTIEVKLS* 0
 
>TMT_takRub Takifugu rubripes (pufferfish)
0 MFSGQAGLNYSFNLSDDRELLDAPAGRAKLSPTGFVVLSVVLGFIMTFGFLNNFVVLLLFCKFKKLRTPVNMLLLNISVSDMLVCLFGTTLSFASSIRGRWLLGRIGCSWYGFINSCF 1
2 GIVSLISLVILSYDRYSTLTVYNKQGINYRKPLLAVGGTWLYSLFWTVPPLLGWSSYGIEGAGTSCSVSWTVQTAQSHAYIICLFTFCLGIPILVMIYCYSRLLWAVKQ 0
0 VGRIRKTAARKREYHILFMVVTTAACYLVCWMPYGVVAMMATFGPPNIISPVASVVPSLLAKSSTVINPLIYILMNKQ 0
0 FYKCFLILFHCGHWSADNGNTSMPSKTTAIQLNRRVYSNTVACADQLSTDALKQCCSANTISTKNTSTVEGKLS* 0
 
>TMT_gasAcu
0 MVFGQAGLNHSFNLSDDRELLDTSAGRAKLSPTGFVVLSVMLGFIMTFGFVNNLVVLLLFCKFKKLRTPVNMLLLNISVSDMLVCLFGTTLSFASSLRGKWLLGRSGCSWYGFINSCF 1
2 GIVSLISLVILSYDRYSTLTVYNKAGPDYRKPLLAIGGSWLYSLFWTVPPLLGWSSYGIEGAGTSCSVSWTVQTAQSHAYIICLFTFCLGLPMLVMIYCYSRLLLAVKQ 0
0 VGRIRKTAARRREYHILFMVLTTAACYMLCWMPYGVVAMMATFGPPNIISPVASVVPSLLAKSSTVINPLIYILMNKQ 0
0 FYRCFLILFHCKHWSAENHNTSMPSKTTVIHLNRRVCSNTLPCTAQASTDAANHFCSTSATKHTSPPLQGHGLSLNVLNMIRQENHSHDEAAKNQLDCLT* 0
 
>TMT_oryLat Oryzias latipes (medaka)
0 MFSGQTGLNFSFNQSDDRELEDTPAGSAKLSQAGFVVLSVVLGFIMTFGFLNNFVVLILFCKFKKLRTPVNMLLLNISVSDMLVCLFGTTLSFASSIRGRWLLGRGGCSWYGFINSCF 1
2 GIVSLISLVILSYDRYSTLTVYNKGGLNYRKPLLAVGGSWLYSLFWTVPPLLGWSSYGLEGAGTSCSVSWTANTAQSHAYIICLFIFCLGLPILVMIYCYSRLLLAVKQ 0
0 VGKIRKTAARKREYHILFMVLTTAACYLLCWMPYGVVAMMATFGPPNIISPVASVVPSLLAKSSTVINPLIYILMNKQ 0
0 FYRCFLILFHCDHWSSENGNTSVPSKTTVIPLNRRIYTNTVAQISTDNAN* 0
 
>TMT_ictPun Ictalurus punctatus (catfish) transcript from whole fry
0                          WLLGRTGCMWYGFINSCF 1
2 GIVSLISLMILSYERYSTMTVYNNQGPNYRKHLLAVGGSWLYSLIWTVPPLLGWSSYGLEGAGTSCSVSWTDHSPKSHAYIICLFIFCLGLPVLLMVYSYGRLLYAVKQ 0
0 LGKIHKTARRRDYHLLFMITTTVVCYLLCWTPYSVVALMASFGRPGIITPVASIIPSLLAKSSTVINPVIYIFMNKQ 0
0 FYRCFRTLLGYKERSAVPDDHSLMATKNTAIQLKCIMHNNPVPSPAHTPPPFF
 
>TMT_oncMyk Oncorhynchus mykiss frag 2 white blood cell transcripts
2  GIFCLGLPVLVMVYCYGRLLYAVKQ 0
0 VGRIRKSAARRREFHILFMVITTVVCYLLCWMPYGVIAMMATFGHPGLITPIVTVVPSIMAKSSTVINPLIYILMNKQ 0
0 FYRCFLILFHCKRPSSENGVSSMPSKTTVIQLNRRGHSNNVALTPQLSTGANHHNHNHTVECSTNNREVTTPIGLPHSGWL* 0
 
>TMTa1_danRer Danio rerio NM_001118899 full +PBX3 +TNK2 +TMTa1 -PAP2D +LPPR4
0 MIVSNLSVLSCRRNSALCLGAVEGHLEASSSYRTLSPTGHILVAVSLGFIGTFGFLNNLLVLVLFGRYKVLRSPINFLLVNICLSDLLVCVLGTPFSFAASTQGRWLIGDTGCVWYGFANSLL 1
2 GIVSLISLAVLSYERYCTMMGSTEADATNYKKVIGGVLMSWIYSLIWTLPPLFGWSRYGPEGPGTTCSVDWTTKTANNISYIICLFIFCLIVPFLVIIFCYGKLLHAIKQ 0
0 VSSVNTSVSRKREHRVLLMVITMVVFYLLCWLPYGIMALLATFGAPGLVTAEASIVPSILAKSSTVINPVIYIFMNKQ 0
0 FYRCFRALLNCDKPQRGSSLKSSSKTKPFRPGRRTDNFTFMVASVGPNQTNPVEDGPPSADNTKPAVLSLVAHYNG* 0
 
>TMTa_takRub Takifugu rubripes (pufferfish) -CALD1 +TNK2 -RAB18 +ABI1 12670711 AF402774 full
0 MIVSNVSLSGCAGVNGAVCAAEGHQAGGSDRSTLTPTGNLVVSVFLGFIGTFGLVNNLLVLVLFCRYKMLRSPINLLLMNISISDLLVCVLGTPFSFAASTQGRWLIGEAGCVWYGFANSLF 1
2 GVVSLISLAVLSFERYSTMMTPTEADPSNYCKVCLGITLSWVYSLVWTVPPLFGWSSYGPEGPGTTCSVNWTAKTTNSISYIICLFVFCLIVPFLVIVFCYGKLLCAIRQ 0
0 FYRCFLALLCCQDPRSGSSMKSSSKVATKAKGVTPTGQRRTDFLYMVASLGRPAATIPQLGPSFDATNDFTKPPSSDTIKPVVVSLAAHCDG*


>ENCEPH4b_takRub Takifugu rubripes (teleost) no_ref genome encephalopsin full
>TMTa_tetNig Tetraodon nigroviridis full
0 MIASNASVSGCAGVHGAACAADAPPAGGSHRSSSSLTPTGNLVVSVFLGLIGTSGLVSNLLVLVLFCRFKVLRSPINLLLVNISVSDLLVCVLGTPFSFAASTQGRWLIGAAGCVWYGFVNSLFG 1
2 GIVSLISLAVLSFERYSTMMTPTEADSSNYCKVCLGIGLSWVYSLLWTVPPLLGWSSYGPEGPGTTCSVNWTAKTANSVSYIICLFVFCLILPFLVIVFCYGKLLCAIRQ 0
0 VSGVNASMSRRREQRVLFMVVVMVICYLLCWLPYGVVALLATFGPPGLVTPAASIIPSILAKSSTVINPVIYVFMNKQ 0
0 FSRCFLSLLCCEDPRSSTSLRSSSRVTTKAVRGGTLTGQRRTNHLLYMVAALGRPVATAMPQLGPSFDATYDITKAPSSDNHQPVVVSLEAHG* 0
 
>TMTa_gasAcu Gasterosteus aculeatus (stickleback)  +TNK2 +ENC full
0 MIVSNLSLSGCAGVSSALCAAAGEGHLSGGSHRNTLTPTGHLVVAVCLGFIGTLGLMNNLLVLVLFCRYKMLRSPINLLLINISISDLLVCVLGTPFSFAASTQGRWLIGEGGCVWYGFANSLFG 1
2 GIVSLISLAVLSYERYSTMVAPTEADSSNYHKISLGITLSWVYSLIWTAPPLFGWSHYGPEGPGTTCSVDWTARTANSISYIICLFVFCLIVPFLVIVFCYGKLLCAIRQV 0
0 VSGINASLSRKREQRVLFMVVIMVVCYLLCWLPYGIMALMATFGPPGLITPVASIIPSVLAKTSTVINPVIYVFMNKQ 0
0 FYRCFKALLRCEAPRPSSSLKSSSKVPTKAMRGAAVTGPRHTNNFLFVVASLGRPVATIPQLGPSVEPTIDVTGGPSSDNNKPVIVSLVAQCDG* 0
 
>TMTa_oryLat Oryzias latipes (medaka) genome SLC12A3 two frags
0 MLVSNVSLGGCAEFNSALCAGAGEEHLGGGSYRTTLTPTGHLIVAVCLGFIGTFGLVNNLLVLVLFCRYKILRSPINLLLINISISDLLVCVLGTPFSFAASTQGRWLIGEGGCVWYGFANSLCG 1
2 GIVSLISLAVLSYERYSTMMTPAEADSSNYRKISLGIILSWGYSLLWTLPPLFGWSHYGPEGPGTTCSVDWTAKTANNISYIICLFVFCLIVPFMVIVFCYGKLLYAIKQV 0
0 VSGINVSVSRKREQRVLFMVVIMVICYLLCWLPYGIMALLATFGPPDLVTPEASIIPSVLAKTSTAINPVIYVFMNKQ 0
0 * 0
 
>TMTa_pimPro Pimephales promelas frag
  GHLVVAVCLGFIGTGFLNNTLVLILFCRYKVLRSPMNYLLVSIAVSDLLVCVLGTPFSFAASTQGRWLIGRAGCVWYGFINSCL 1
2 GVVSLISLAVLSYERYCTMMGATQADSTNYKKVAMGIAFSWIYSMVWTLPPLFGWSCYGPEGPGTTCSVNWAARTANNVSYIICLFFFCLILPFIVIVYSYGRLLQAITQ 0
0 VSRINTVVSRKREQRVLFMVITMVVCYLLCWLPYGIMALLAAFGRPGLVTPAASIVPSVLAKTSTVINPIIYIFMNKQ 0
0 FCRCFHALIMCTTPQRGSSFKNSSKVTKTLRTVRRANGQNVTFAVASAGHPTICAPH
 
>TMTb_danRer +TNK1 +TMTb -MYEOV2
0 MIESNVSRSCEWCAGGGEGTGAHLDENHSDHSLSPTGHLVVAVCLGFIGTFGFLNNTLVLVLFCRYKVLRSPMNCLLISISVSDLLVCVLGTPFSFAASTQGRWLIGRAGCVWYGFINSFLG 1
2 GVVSLISLAVLSYERYCTMMGSTQADSTNYRKVVIGIAFSWIYSMVWTLPPLFGWSCYGPEGPGTTCSVNWAARTPNNVSYIVCLFVFCLILPFIVIVYSYGRLLQAITQ 0
0 VSRINTVVSRKREQRVLFMVVTMVVCYLLCWLPYGIMALLATFGHPGLVTPAASIVPSLLAKSSTVINPIIYIFMNKQ
0 FCRCFHALIMCTTPERGSSFKNSSKVTKTLRTVRRANGQNVTFAVASAVHRTPYSDRQKSSSEGEKLPPATGQGTSKPVVSLVAYYNG* 0
>TMTb_takRub Takifugu rubripes (pufferfish) +TFRC +TMTb +CHES1 -MYEOV2 -ARHGAP21 full
0 MIVCNVSLSCAHCPGEGTAANDAYAQASGSLATPTLSQRGHLVVAVCLGFIGTVGFLSNFLVLALFCRYRALRTPMNLMLVSISASDLLVSVLGTPFSFAASTQGRWLIGRAGCVWYGFVNACL 1
2 GIVSLISLAVLSYERYCTMVSSTIASNRDYRPVLGGICFSWFYSLAWTVPPLLGWSRYGPEGPGTTCSVDWRTQTPNNISYIVCLFTFCLLLPFFVILYSYGKLLHTIRQ 0  
0 FYRCFRAFLNCSTPKRDSTVRTFTRISLRALRQDQQQKGSALAPSSARPTPNSIHESSLKGSHSTPSNGGAAAAKSPAANRSKPKLILVAHYRE* 0


>ENCEPH4a_calMil Callorhinchus milii (elephantfish) wgs frag  
>TMTb_tetNig Tetraodon nigroviridis
0 MIVCNLSLSCAHCPGGGAAATDAYAEAPGSLAPPTLSQRGHLVVAVCLGAIGTVGFLSNLLVLALFCRFRALRTPMNLMLVSISASDLLVSVLGTPFSFAASTQGRWLLGRAGCVWYGFVNACL 1
2 GIVSLISLAVLSYERYCTMMASTMASNRDYRPVLLGICFSWFYSLAWTVPPLLGWSRYGPEGPGTTCSVDWRTQTPNNISYIVCLFAFCLLLPFCVILYSYGKLLHTIRQ 0
0 VSSVSSAVTRRREHRVLVMVVAMVVCYLICWLPYGVTALLATFGPPNLLTPEATITPSLLAKFSTVINPFIYIFMNKQ 0
0 FYRCFRAFLSCSSPERGSTVRTFTRISLRAVCQRKQQRVSAPAASSACPTPNSIHHSSRKGSHSASSNSGTAAAAKTPAANSSKPKLILVVHYRE* 0
 
>TMTb_gasAcu Gasterosteus aculeatus (stickleback) full
0 MIVCNVSLSCVHCPGGGAGGTAATATGAYEEVSDSLPAPSLSPKGHLVVAVCLGFIGTFGFLSNFLVLALFCRYRALRTPMNLLLVSISASDLLVSMVGTPFSFAASTQGRWLIGRAGCVWYGFVNACL 1
2 GIVSLISLAVLSFERYSTMVKPTVADGRDFRPALGGIAFSWLYSVAWTVPPLLGWSEYGPEGPGTTCSVDWKTQTANNISYIVCLFVFCLVLPFCVILYSYSRLLQAIRQ 0
0 IPQVSVVSSVVTRHREQRVLAMVVVMVACYLVCWLPYGVAALLATFGPRDLLSPEASITPSLLAKFSTVVNPFIYIFMNKQ 0
0 FYRCFRAFLSCSTPERGSTLKTFSRPTKTLRAGRHEKGRRVSAAAPSTAQPTRNSAPRSSQGANHASATPPPSPADGRCAAAGAAKPKRTLVAHYRE* 0
 
>TMTb_oryLat
0 MIVPNASLSCAHCDGDAAEQDAPGSAAAPSLSPTGHLVVAVCLGLIGTCGFLSNLLVLALFCRYRALRTPMNLLLVSISVSDLLVSVLGTPFSFAASTQGRWLIGRAGCVWYGFINACL 1
2 GIVSLISLAVLSYERYSTVMTPNMADGRDFRPALGGICFSWLYSVAWTVPPLLGWSRYGPEGPGTTCSVDWKTQTPNNISYIICLFTFCLLLPFGVIVYSYGKMLRVIRQ 0
0 VSQVRSMSSVVTRRREQRVLVMVVTMVVCYLVCWLPYGIAALLATFGPRDLLTPAASITPSLLAKFSTVINPLIYIFMNKQ 0
0 FYRCFWAFFCCSTPEQVSTLRTFSRVTKTIRTFRQERELHVSAPAPSSGLPTPNSIQKGNNHVDPSSINQACAASDSPDSRKPKVVLVAHYQE* 0
 
>TMTa1_calMil Callorhinchus milii (elephantfish) wgs frag  
0 MLNSSPNSSPSLPLSQVGWTGLSRTGLTVVAVCLGIIMVLGFLNNLLVLVLFCKYKVLRSPMNMLLLNISVSDMLVCICGTPFSFAASVQGRWLVGEQGCKWYGFANSLF 1
2 GIVSLMSLTILSYDRYITITGTTEADITNYNKTIVGIALSWIYSLMWTLPPLFGWSNYGPEGPGTTCSVNWQSKEVSSKSYIICLFIFCLLMPFLVIVYCYGKLVLAVRK 0
0     AQTREHRILLMVISMVTFYLLCWLPYGTVALIGTFGNADLITPTCSVIPSILAKSSTVINPVIYVIMNKQ 0
0       AQTREHRILLMVISMVTFYLLCWLPYGTVALIGTFGNADLITPTCSVIPSILAKSSTVINPVIYVIMNKQ 0
 
>TMTa2_calMil Callorhinchus milii (elephantfish) wgs frag
0 VSANNSMGRTRENKLLIMVTFMIICFLLCWLPYGIVALLATFGSPGLITPTASIIPSVLAKTSTVYNPIIYIFMNKQ 0
 
>TMTx_braFlo Branchiostoma floridae (amphioxus) XM_002207814 frag with assembly duplication, no N-term even in v2.0 47% TMT5_braFlo + insect TMTs
0 43 VAAILALIGVLGIVNNSTTLYLVGRYKQLRTPFNILMVNLSVSDLLMCVLGTPFSFVSSLHGRWMFGHSGCEWYGFICNFL 1
2 GIVSLITLTVISYERYLLMKRLPNERILSYRAVALAVVFIWCYSLLWTAPPLVGWSSYGPE 0
0 GYGISCSVNWESRTANDTSYIVAYFVGCLVFPVAIIVISYTRLILYMRQ 0
0 QAPSAPMQMLVRREKRVTKMVVVMIMGFTICWTPYTIVALIVTCGGEGIITPAAATVPALFAKSSVVYNAAIYVAMNNQ 0
0 FRKCFLRSLNCRSQPRDPSSQQYTLKTNQVGMSTSGSQAARTADRIKTVHVATANPQDHRSSSGQAVEDNGGFRKSLTHSLPLNSISTLLEAEK* 0
 
>TMT5_braFlo Branchiostoma floridae (amphioxus) extra 00 intron Amphiop5
0 MLGMHNVMNATDYDNNNATFAAWNFQRNGTTEEEVEFSGFDTVAVVIAAIGIAGFLSNGAVVLLFLKFRQLRTPFNMLLLNMSVADLLVSVCGNTLSFASAVRHRWLWGRPGCVWYGFANHLF 1
2 GLVSLISLAVISYERYRMVVKPKGPGSSYLTYNKVGLAIIFIYLYCLLWTTLPIVGWSSYQLE 0
0 GPKISCSVAWEEHSLSNTSYIVAIFIMCLLLPLLIIIYSYCRLWYKVKK 0
0 GSQNLPPAIRKSSQKEQKIARMVVVMITCFLVCWLPYGAMALVVSFGGESLISPTAAVVPSLLAKSSTCYNPLVYFAMNNQ 0
0 FRRYFQDLLCCGRRLFDASASVNTCNTSAMPRHSPVFQKPDSDQYNGIQKSREPQMRTTGQNAPYRQWIEMQTIAVVVKADEVNNKFGEVKT* 0
 
>TMT5_braBel Branchiostoma belcheri (amphioxus) AB050609 full introns from braFlo Amphiop5 extra Nfrag in mrna
0 MLGIYNVVNATEYGNNTTFAAWDFKRNGTGGEEEVEFFGYDAVAGVIAIIGVVGFVSNGAVVVLFLKFPQLRTPFNLLLLNMAVADLLVSVCGNTLSFASAVRHRWLWGRPGCVWYGFANHLF 1
2 GLVSLISLAVISFLRYRMVVKPKGPGSSYLTYTKVGLAILFIYLYCLLWTTLPIAGWSSYQLE 0
0 GPKIGCSVAWEEHSWSNTSYIVVLFITCLFAPLLIIVYSYYRLWHKVKQ 0
0 GSRNLPAAMRKSSQKEQKIAMMVIVMITCFMVCWLPYGAMALVVTFGGERLISHTAAVVPSLLAKSSTCYNPVVYFAMNSQ 0
0 FRRYFQDLLCCGRRLFDVSQSVVTGNTAMPRNNSQGFRKDDSDQKQDNGLPKQSEGPMCDHSSNESQMEGSRHNTAASQQWIEMQTIAVVVKAVEVDTSAANEP* 0


>ENCEPH4b_calMil Callorhinchus milii (elephantfish) wgs frag
>TMTy_braFlo Branchiostoma floridae (amphioxus) FE572481 (to other allele) gastrula XM_002222645 flawed, allele dup, 39% ENCEPH4_braFlo new
0 VSANNSMGRTRENKLLIMVTFMIICSCFAGCLRNSSSFGHFGSPGLITPTASIIPSVLAKTSTVYNPIIYIFMNKQ 0
0 MASAGQNVTFPAIDTMAPTPEALTSDPTTPAYFTTEQHLLMAVWLGFIGSFGFVTNLLTVLVFWCFKSLRTPFHLYLGGIALSDLLVAALGSPFAVASAVGERWLFGRAVCVWYAFVNYFL 1
2 SVSIVTMATMSFSRYWVIIRPQSAPRLDTVYGACVVNALAWCYSFFWTIMPVLGWSRFTQ 0
0 VAAMTVCSLDWDHHTPLSKSYIPVAFLTCLFLPLGVIIFSVFKTTMHLRR 0
0 AAEVEDEVPNEVRAGRKTTRITLVMAGCWLVAWLPYACMALVIAAGGRVSPTVEVLATKFAKTSYIVNTIIYLVMEKE 0
0 FRKSLVLLLFCGRDPFDIQIEQPAYEKADVYVERLVTAEPMVEMEAVNVRPAQQEPARAPFGTPL* 0
 
>TMTPIN_stoPur Strongylocentrotus purpuratus GLEAN3_05569 0.2.2.0.0 16311335 opsin1 PIN-type introns no cdna no sacKow
0 MSNLMTGLVTNVNALSGIGNETPTTIGLSSLVVPVSRTTYNYLTVYTGFLTIFGILNNGIVMILFARFPSLRHPINSFLFNVSLSDLIISCLASPFTFASNFAGRWLFGDLGCTLYAFLVFVA 1
2 GTEQIVILAALSIQRCMLVVRPFTAQKMTHRWALFFISLTWIYSLIICVPPLFGWNRYTYEGPGT 1
2 ACSVAWNSPSPGDTSYIIFIFVLVLVIPFGIIIFCYGLLVYAVKK 0
0 ISRTQAALSSEAKADRKVSKMIFIMILFFLIAWTPYTGFSLYVTFGKNVVITPLAGTFPPFFAKLCTIHNPIIYFLLNKQ 0
0 FKDALIQLFCCGENPFDRDESEHEGRGGRHRHRTAPSATAHIGGRGRASSLPTATSMLDIPQAASTAASSSGKTQNKESLEKGPSTSETTNKRVFELSSKIQKFEISEKNNTPSSSELPGASSLSGALMPPRRAMKNQVGCLPPVDN* 0
</pre>
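The records above share a simple layout: a '>' header followed by one line per coding exon, each flanked by single digits. A minimal Python sketch for sanity-checking such records follows. It assumes (an inference from the records themselves, not a documented specification) that the flanking digits are intron phases, so that the phase closing one exon and the phase opening the next sum to 0 or 3, and that '*' marks a stop codon. The names parse_exons and check_record are hypothetical, and lines without a leading digit are simply skipped.

```python
def parse_exons(lines):
    """Turn 'phase SEQUENCE phase' lines into (start_phase, sequence, end_phase).

    Header lines ('>') and lines without a leading digit are skipped.
    A missing trailing phase is stored as None.
    """
    exons = []
    for line in lines:
        tokens = line.split()
        if len(tokens) < 2 or not tokens[0].isdigit():
            continue
        start = int(tokens[0])
        rest = tokens[1:]
        end = None
        if len(rest) >= 2 and rest[-1].isdigit():
            end = int(rest[-1])
            rest = rest[:-1]
        exons.append((start, "".join(rest), end))
    return exons


def check_record(exons):
    """Return a list of problems: phase mismatches and internal stops."""
    problems = []
    for i in range(len(exons) - 1):
        end_phase = exons[i][2]
        next_start = exons[i + 1][0]
        # Assumed convention: 0 follows 0, and 2 follows 1 (sum of 0 or 3).
        if end_phase is not None and (end_phase + next_start) % 3 != 0:
            problems.append(f"phase mismatch between exon {i + 1} and exon {i + 2}")
    protein = "".join(seq for _, seq, _ in exons)
    # A '*' anywhere except the final position is treated as an internal stop.
    if "*" in protein[:-1]:
        problems.append("internal stop codon")
    return problems


# Toy record in the same layout (not a real opsin).
toy = [">TOY_example", "0 MSNNL 1", "2 GIVSL 0", "0 FYKCF*"]
print(check_record(parse_exons(toy)))
```

Fragmentary entries (marked frag) will naturally lack a start codon or stop, so a check like this is only a first filter before manual inspection.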


 
'''See also:''' [[Opsin_evolution|Curated Sequences]] | [[Opsin_evolution:_LWS_PhyloSNPs|LWS]] | [[Opsin_evolution:_Melanopsin_gene_loss|Melanopsins]] | [[Opsin_evolution:_Neuropsin_phyloSNPs|Neuropsins]] | [[Opsin_evolution:_Peropsin_phyloSNPs|Peropsins]] | [[Opsin_evolution:_RGR_phyloSNPs|RGR phyloSNPs]] | [[Opsin_evolution:_update_blog|Update Blog]]
[[Category:Comparative Genomics]]

Latest revision as of 13:14, 7 April 2010

See also: Curated Sequences | LWS | Melanopsins | Neuropsin | Peropsin | RGR phyloSNPs | Update Blog

Introduction to Encephalopsin Evolution

Encephalopsin is a pivotal class of ciliary opsin: it sits basal to the imaging opsin gene expansion and is well represented in early-diverging deuterostomes, notably the cephalochordate Branchiostoma. Despite this, encephalopsin has been the subject of just one substantive publication, dating to 1999. That article primarily emphasized its non-retinal distribution in mouse brain and testis, but low-power microscopy could not determine its ultrastructural context -- presumably purely in the cytoplasmic membrane and not associated with a ciliary stack:

"Encephalopsin is highly expressed in the preoptic area and paraventricular nucleus of the hypothalamus ... in selected regions of the cerebral cortex, cerebellar Purkinje cells, a subset of striatal neurons, selected thalamic nuclei, and a subset of interneurons in the ventral horn of the spinal cord. Rostrocaudal gradients of encephalopsin expression are present in the cortex, cerebellum, and striatum. Radial stripes of encephalopsin expression are seen in the cerebellum."

The gene is well represented in chondrichthyes and teleost fish, with multiple independent copies, but occurs as a single copy in amphibians, birds, lizard, marsupials, primates and rodents, with excellent amino acid conservation (over 60% between teleost fish and human). Surprisingly, upon more intensive taxonomic sampling, functional encephalopsin turns out to be completely absent in a variety of other mammalian clades. While interesting in its own right, this complicates the standard application of comparative genomics to the evolution of encephalopsin structure and function.

Multiple events of encephalopsin loss in mammals

The table lists 11 species that today have unprocessed pseudogenes in place of a once-functional encephalopsin gene. The observed phylogenetic distribution of loss (which is irreversible) requires a minimum of 7 independent events, for example in some bats but not others. At the level of superordinal clades, no gene loss is observed in Euarchontoglires but Laurasiatheres (especially artiodactyls) are heavily affected as well as the Xenarthra within Atlantogenata and at least platypus within monotremes.
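The counting argument can be sketched in Python. The example assumes the gene was present at the root and that loss is irreversible, so each maximal clade containing pseudogenes but no intact copy costs exactly one event. The function name count_losses and the toy tree topology are illustrative assumptions (only the ok/ps/-- calls are taken from the table); species with too little coverage are treated as uninformative.

```python
def count_losses(tree, status):
    """Minimum number of irreversible losses on a rooted tree.

    tree   : nested tuples whose leaves are species codes (strings)
    status : species code -> 'ok' (intact), 'ps' (pseudogene); anything
             else is treated as unknown and never forces an event.
    Assumes the gene was present at the root and cannot be regained.
    """
    def summarize(node):
        # Returns (state, losses_inside) for the clade rooted at node.
        if isinstance(node, str):
            state = status.get(node, "unknown")
            if state not in ("ok", "ps"):
                state = "unknown"
            return state, 0
        results = [summarize(child) for child in node]
        states = [state for state, _ in results]
        if "ok" in states:
            # The gene survives below this node, so every child clade that
            # holds only pseudogenes needs one loss on its own stem.
            losses = sum(n for state, n in results if state == "ok")
            losses += sum(1 for state in states if state == "ps")
            return "ok", losses
        if "ps" in states:
            # Entirely lost clade: counted once, higher up.
            return "ps", 0
        return "unknown", 0

    state, losses = summarize(tree)
    return losses + 1 if state == "ps" else losses


# Simplified, assumed topology for part of Laurasiatheria.
toy_tree = (
    (("bosTau", "turTru"), "vicVic"),
    (("canFam", "felCat"), "equCab"),
    ("myoLuc", "pteVam"),
    ("sorAra", "eriEur"),
)
calls = {
    "bosTau": "ps", "turTru": "ps", "vicVic": "ps",
    "canFam": "ok", "felCat": "ok", "equCab": "ok",
    "myoLuc": "ps", "pteVam": "ok",
    "sorAra": "--", "eriEur": "--",
}
print(count_losses(toy_tree, calls))  # 2 on this toy tree
```

On this toy tree the cow, dolphin and vicugna pseudogenes collapse into a single event while the microbat loss counts separately, which is the sense in which bats contribute "some but not others".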

The odd feature of these pseudogenes is that they must all have arisen fairly recently on the mammalian time scale to remain observable today as decayed relics. However, they survive at variable degrees of degeneration, which suggests that the losses are not coordinated with a single unifying global environmental event. Pseudogenization in frog becomes apparent only upon alignment: not only is the Schiff base lysine now isoleucine, but various problems occur at conserved residues in the carboxy terminus.

The first exon presents special problems in that it is exceedingly GC rich. This leads to translational simplicity (high GLPA) and blockage by filters such as seg that recognize compositional simplicity. This results in problems for alignment algorithms, leading to non-assembly (as in horse genome) and/or non-recognition of sequences. Blastn at the trace archive still seems to work effectively.

However the artifacts may not be limited to bioinformatics -- the composition of the gene may predispose it to replication slippage and CpG hotspot mutations and so to pseudogenization in the absence of strong selection. For example horse encephalopsin has an astronomic 39 CpG sites in its 373 bp exon 1.

MTAGTRAGGQGSWEGGGAAGAEGPGPAGPLSPAPLFSPGTYERLALLLGCLGLLGVGNNLLVLVLYSKFPRLRTPTHLLLVNISLSDLLVALFGVTFTFVSCLRNGWVWDAVGCAWDGFSSSLC horse exon 1
MTAGTR.................................TYER......................YSKFPRLRTPTH.............ALFGVTFTFVSCLRNGWVWDAVGCAWDGFSSSLC seg filter
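The kind of masking shown in the two lines above can be approximated with a sliding-window Shannon entropy filter. The Python sketch below is a simplified stand-in, not the seg algorithm itself; the window length and entropy threshold are arbitrary assumptions, so it will not reproduce the seg output exactly.

```python
import math
from collections import Counter


def shannon_entropy(segment):
    """Shannon entropy (bits) of the residue composition of a string."""
    counts = Counter(segment)
    total = len(segment)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def mask_low_complexity(protein, window=12, threshold=2.2):
    """Replace residues in every low-entropy window with '.'.

    window and threshold are illustrative values, not seg defaults.
    """
    masked = list(protein)
    for start in range(len(protein) - window + 1):
        if shannon_entropy(protein[start:start + window]) < threshold:
            for i in range(start, start + window):
                masked[i] = "."
    return "".join(masked)


# A run dominated by G, A and P (as in exon 1) is masked; a varied one is not.
print(mask_low_complexity("GGGAAGAGGPGG"))
print(mask_low_complexity("ACDEFGHIKLMN"))
```

Because a GC-rich exon translates to runs dominated by G, L, P and A, most windows across it fall below any reasonable threshold, which is why so little of the exon survives filtering.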

atgaccgcgggaacccgagcggggggccagggttcctgggagggcggcggggcggcgggcgctgagggcccggggccggcgggcccgctgagccccgcgccgctcttcagcccgggcacc
 M  T  A  G  T  R  A  G  G  Q  G  S  W  E  G  G  G  A  A  G  A  E  G  P  G  P  A  G  P  L  S  P  A  P  L  F  S  P  G  T 
tacgagcgcctggcgctgctgctcggctgcctcgggctgctgggcgtgggcaacaacctgctggtgctcgtcctctactccaagttcccgcggctccgcacgcccacccacctcctgctg
 Y  E  R  L  A  L  L  L  G  C  L  G  L  L  G  V  G  N  N  L  L  V  L  V  L  Y  S  K  F  P  R  L  R  T  P  T  H  L  L  L 
gtcaacatcagcctcagcgacctgctggtggccctcttcggggtcacctttaccttcgtgtcctgcctgcggaacggctgggtgtgggacgccgtgggctgcgcgtgggacggctttagcagcagcctctgcg
 V  N  I  S  L  S  D  L  L  V  A  L  F  G  V  T  F  T  F  V  S  C  L  R  N  G  W  V  W  D  A  V  G  C  A  W  D  G  F  S  S  S  L  C     
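CpG density in a sequence such as the exon above can be tallied with a few lines of Python. This is a minimal sketch with hypothetical function names; the observed/expected ratio uses the common definition (CpG count times length, divided by the product of C and G counts), which is an addition here rather than something computed in the text.

```python
def _clean(dna):
    """Lower-case the sequence and drop whitespace and line breaks."""
    return "".join(dna.split()).lower()


def count_cpg(dna):
    """Number of CG dinucleotides on the given strand."""
    seq = _clean(dna)
    return sum(1 for i in range(len(seq) - 1) if seq[i:i + 2] == "cg")


def gc_fraction(dna):
    """Fraction of bases that are G or C."""
    seq = _clean(dna)
    return (seq.count("g") + seq.count("c")) / len(seq) if seq else 0.0


def cpg_obs_exp(dna):
    """Observed/expected CpG ratio: (#CpG * N) / (#C * #G)."""
    seq = _clean(dna)
    c, g = seq.count("c"), seq.count("g")
    if c == 0 or g == 0:
        return 0.0
    return count_cpg(seq) * len(seq) / (c * g)


# First 12 bases of the horse exon shown above.
print(count_cpg("atgaccgcggga"), gc_fraction("atgaccgcggga"))
```

Pasting the three nucleotide lines of the exon into count_cpg gives the tally for the whole 373 bp region.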

As the function of encephalopsin is obscure, the consequences of gene loss are even more so. Encephalopsin may have lost the selected function that supported its retention; homology searches of complete genomes show no compensating increase in opsin gene number.

The clade-incoherent gene loss of encephalopsin is reminiscent of loss of GULO, the terminal enzyme for L-ascorbic acid biosynthesis, with relic gene fragments still detectable in guinea pigs (gene estimated lost 20 myr ago) and primates (lost on stem between lemurs and tarsier about 60 myr ago). At least 5 independent losses have been documented in some passerine birds, all tested species of bats, and teleost fish from zebrafish on (but not including sturgeon, chondrichthyes, or lamprey). A 2007 study of loss of GUCYD (GPCR signaling guanyl cyclase D) exemplifies modern pseudogene dating methods.

In the table, 'ok' indicates presence of a gene with conserved sequence characteristics appropriate to functioning opsins and to the ancestral encephalopsin class; 'ps' indicates definite pseudogenization of the 4-exon gene (reading frame shifts, internal stop codons, loss of sequence conservation in expected regions); '+-' indicates partial gene availability with some uncertainty (traces commonly have errors such as short indels and quality departure at their ends; repetitive base composition in exon 1 may have led to technical problems with trace reads); '--' marks entries whose sequence coverage is too poor to make a call.

ok	>ENCEPH_homSap Homo sapiens (human) NM_014322 OPN3 full
ok	>ENCEPH_panTro Pan troglodytes full
--	>ENCEPH_gorGor Gorilla gorilla only exon 3 available
ok	>ENCEPH_ponAbe Pongo abelii full
ok	>ENCEPH_nomLeu Nomascus leucogenys exon 1 ok frag
ok	>ENCEPH_macMul Macaca mulatta (rhesus) XP_001094239 full
ok	>ENCEPH_papHam Papio hamadryas 1st exon problematic 1x ok frag
ok	>ENCEPH_calJac Callithrix jacchus full
ok	>ENCEPH_tarSyr Tarsius syrichta exon 1 fragmentary
ok	>ENCEPH_micMur Microcebus murinus full
ok	>ENCEPH_otoGar Otolemur garnettii full
ok	>ENCEPH_tupBel Tupaia belangeri fragments
	
ok	>ENCEPH_musMus Mus musculus Opn3 Panopsin NM_010098 2aa del full
ok	>ENCEPH_ratNor Rattus norvegicus XP_573517 predicted 2aa del full
ok	>ENCEPH_speTri Spermophilus tridecemlineatus full
ok	>ENCEPH_dipOrd Dipodomys ordii full
ok	>ENCEPH_cavPor Cavia porcellus 3 aa del full
ok	>ENCEPH_oryCun Oryctolagus cuniculus full
ok	>ENCEPH_ochPri Ochotona princeps 5aa del ok frag
	
ps	>ENCEPH_bosTau pseudo frag
ps	>ENCEPH_turTru Tursiops truncatus pseudo frag
ps	>ENCEPH_vicVic pseudo frag
ok	>ENCEPH_canFam Canis familiaris (dog) DN422921 transcript correct, XP_854433 genbank error
ok	>ENCEPH_felCat Felis catus full
ok	>ENCEPH_equCab Equus caballus full
ps	>ENCEPH_myoLuc Myotis lucifugus weak frag
ok	>ENCEPH_pteVam Pteropus vampyrus 86%=homSap full
--	>ENCEPH_sorAra poor coverage
--	>ENCEPH_eriEur poor coverage
	
ok	>ENCEPH_loxAfr Loxodonta africana 2 exons in browser, 1 2x full
--	>ENCEPH_echTel Echinops telfairi no coverage
ok	>ENCEPH_proCap Procavia capensis ok frag
ps	>ENCEPH_dasNov Dasypus novemcinctus pseudo
+-	>ENCEPH_choHof Choloepus hoffmanni incomplete coverage
ok	>ENCEPH_monDom Monodelphis domestica (opossum) encephalopsin OPN3 75%=homSap full
ok	>ENCEPH_macEug Macropus eugenii ok frag
ps	>ENCEPH_ornAna Ornithorhynchus anatinus pseudo
	
ps	>ENCEPH_xenTro Xenopus tropicalis (frog) 45%=homSap full lost Schiff K
ok	>ENCEPH_galGal Gallus gallus (chicken) 71%=homSap encephalopsin OPN3 full
ok	>ENCEPH_taeGut Taeniopygia guttata mrna CK301424 70%=homSap full
ok	>ENCEPH_anoCar Anolis carolinensis (lizard) 70%=homSap OPN3 full
	
ok	>ENCEPH_danRer Danio rerio (zebrafish) NM_001111164 mrna 61%=homSap full
ok	>ENCEPH_gasAcu Gasterosteus aculeatus (stickleback) 58%=homSap full
ok	>ENCEPH_oryLat Oryzias latipes  58%=homSap full
ok	>ENCEPH_takRub Takifugu rubripes (pufferfish) 61%=homSap full

Insertion of retrogene CHML into intron 1 of encephalopsin

The story here is a common one in gene duplication processes, in this case apparently involving gene dosage compensation. In the amniote ancestor, and in birds and lizards today, there is but one copy of the multi-exonic gene CHM. In mammals, as the sex chromosomes underwent upheaval, the ortholog ended up on chrX. Dosage compensation then favored retention of a subsequently retroprocessed copy on an autosome, an event timed below to stem placentals (i.e. present in elephant, sloth, etc. but not opossum).

The arrangement can be observed at the current quality of assemblies in Homo sapiens, Pan troglodytes, Pongo abelii, Macaca mulatta, Otolemur garnettii, Tupaia belangeri, Mus musculus, Rattus norvegicus, Cavia porcellus, Oryctolagus cuniculus, Canis familiaris, Equus caballus, Bos taurus, Choloepus hoffmanni, Procavia capensis and Loxodonta africana but not in marsupial, platypus, chicken or lizard assemblies.

EncCHML.jpg

The intronless retrogene CHML landed in ancestral chromosome 1 (human numbering) within the first intron of encephalopsin, sharing the same direction of transcription. This raises the question of whether this copy of CHML has separate initiation and termination of transcription or is translated from encephalopsin transcripts by seizing some ribosomal starts. Presumably the normal splice junctions of encephalopsin continue to excise all of the expanded intron (resulting in normal protein -- it could not function with a large inappropriate inserted domain).

This event correlates (imperfectly) with lineage-specific losses of encephalopsin within tetrapods as CHML may have selectively out-competed ENCEPH in some lineages and been causative for pseudogenization.

The region annotated below corresponds only to a known domain (GDI: GDP dissociation inhibitor, pfam00996) but suffices as a probe to encephalopsin intron 1. The 3D structure of this region is available for the exonic parent gene in rat, PDB 1LTX, allowing determination of the structural location of an interesting single-residue deletion in carnivores + bats + perissodactyls.

CHML chr1:239,864,567-239,864,746                              genSpp
AFRQCSFSEYLKTKKLTPNLQHFVLHSIAMTSESSCTTIDGLNATKNFLQCLGRFGNTPF   CHML_homSap Homo sapiens (human)
AFRQCSFSEYLKTKKLTPNLQHFVLHSIAMTSESSCTTIDGLNATKNFLQCLGRFGNTPF   CHML_panTro Pan troglodytes (chimp)
AFRQCSFSEYLKTKKLTPNLQHFVLHSIAMTSESSCTTIDGLNATKNFLQCLGRFGNTPF   CHML_ponPyg Pongo pygmaeus (orang_sumatran)
AFRQCSFSEYLKTKKLTPNLQHFVLHSIAMTSESSCTTIDGLNAIKNFLQCLGRFGNTPF   CHML_macMul Macaca mulatta (rhesus)
AFRQCSFSEYLKTKKLTPNLQHFVLHSIAMTSESSCTTIDGLNATKNFLQCLGRFGNTPF   CHML_calJac Callithrix jacchus (marmoset)
AFEQCLFSEYLKTKKLTPNLQHFILHSIAMTSESSCTTIDGLKATKNFLQCLGRFGNTPF   CHML_otoGar Otolemur garnettii (bushbaby)
AFKQCSFSEYLKTKKLTPNLQHFVLHSIAMTSESSCTTVDGLKATKNFLQCLGRFGDTPF   CHML_micMur Microcebus murinus (mouse_lemur)
AFEQCLFSEYLKTKKLTPNLRHFILHSIAMTSESSCSTLDGLKATKTFLQCLGRFGNTPF   CHML_tupBel Tupaia belangeri (tree_shrew)
DFKQCSFSDYLKTKKLTPNLQHFILHSIAMTSESSCTTLDGLQATKTFLQCLGRFGNTPF   CHML_musMus Mus musculus (mouse)
DFKQCSFSDYLKTKKLTPNLQHFILHSIAMSSDSSCTTLDGLQATKNFLRCLGRFGNTPF   CHML_ratNor Rattus norvegicus (rat)
DFQQCLFSEYLKTKRLTPNLQHFILHSIAMTSESSCTTLDGLKATKNFLQCLGRFGNTPF   CHML_cavPor Cavia porcellus (guinea_pig)
DFKQCSFSEYLKAKKLTPNLQHFVLHSIAMTSETSCTTLDGLKATKIFLQCLGRFGNTPF   CHML_oryCun Oryctolagus cuniculus (rabbit)
DFKQCSFSEYLKTKKLTPNLQHFILHSIAMTSESSCTTLDGLRATKNFLQCLGRFGNTPF   CHML_ochPri Ochotona princeps (pika)
AFVHCSFSDYLKTKKLTPNLQHFVLHSIAMT-ESSCTTIDGLKATKNFLRCLGRFGNTPF   CHML_canFam Canis familiaris (dog)
AFMQCSFSEYLKTKKLTPNLQHFVLHSIAMT-ESSCTTIDGLKATKNFLQCLGRFGNTPF   CHML_felCat Felis catus (cat)
AFMQCSFSEYLKTKKLTPNLQHFVLHSIAMT-ESSCTTIDGLKATKNFLQCLGRFGNTPF   CHML_equCab Equus caballus (horse)
DFTQRPFSEYLKTQKLTPNLQHFILHSIAMT-EPSCLTVDGLKATKHFLQCLGRYGNTPF   CHML_myoLuc Myotis lucifugus (microbat)
DFTQCSFSEYLKTKKLTPNLQHFVLYSIAMT-ESSCTTVDGLKAAKNFLRCLGRFGNTPF   CHML_pteVam Pteropus vampyrus (macrobat)
AFTQCSFSEYLKTKNLTPSLQHFILHSIAMMSESSCTTVDGLKATKTFLQCLGRFGNTPF   CHML_bosTau Bos taurus (cow)
AFTQCSFSEYLKTKKLTPSLQHFVLHSIAMMSESSCTTIEGLKATKNFLQCLGKFGNTPF   CHML_turTru Tursiops truncatus (dolphin)
DFMQCSFSEYLKAKKLTPSLQHFVLHSIAMTSESSCTTIDGLKATKNFLQCLGRFGNTPF   CHML_susScr Sus scrofa (pig)
AFVQSSFSEYLKTKKLTPNLQHFVLHSIAMMSESPCTTIDGLKATKNFLQCLGRFGNTPF   CHML_vicVic Vicugna vicugna (vicugna)
AFVQSSFSEYLKTKKLTPNLQHYILHSISMTSESSCTTLDGLKATKKFLQCLGRFGNTPF   CHML_eriEur Erinaceus europaeus (hedgehog)
AFIQCSFSDYLKTKKLTPNLQHFILHSIAMTPEASCSTVDGLKATKIFLQCLGRFGNTPF   CHML_sorAra Sorex araneus (shrew)
TFKQCSFSEYLKTKRLTPNIHHFVLHSIAITSQSSCTIIDGLKATKTFLWCLGWFSKNPF   CHML_dasNov Dasypus novemcinctus (armadillo)
AFEQCSFSEYLKTKKLTPNLQHFILHSIAMTSQSSCTTLDGLKATKNFLQCLGRFGNTPF   CHML_choHof Choloepus hoffmanni (sloth)
AFKQCSFSEYLKTKKLTPNLQHFVLHSIAMTSESSCTTIDGLKATKNFLQCLGRFGNTPF   CHML_loxAfr Loxodonta africana (elephant)
AFKHCSFSEYLKTKKLTPNLQHFVLHSIAMTSESSCTTIDGLKATKTFLQCLGRFGNTPF   CHML_proCap Procavia capensis (hyrax)
...                                                            CHML_echTel Echinops telfairi (tenrec)
...                                                            CHML_monDom Monodelphis domestica (opossum)
...                                                            CHML_ornAna Ornithorhynchus anatinus (platypus)
...                                                            CHML_anoCar Anolis carolinensis (lizard)
...                                                            CHML_galGal Gallus gallus (chicken)
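The single-residue deletion can be located programmatically from the alignment rows. The Python sketch below copies three rows from the block above (human, dog, cow); gap_columns and percent_identity are hypothetical helper names, and identity is computed over ungapped column pairs only, which is one of several possible conventions.

```python
ROWS = {
    "CHML_homSap": "AFRQCSFSEYLKTKKLTPNLQHFVLHSIAMTSESSCTTIDGLNATKNFLQCLGRFGNTPF",
    "CHML_canFam": "AFVHCSFSDYLKTKKLTPNLQHFVLHSIAMT-ESSCTTIDGLKATKNFLRCLGRFGNTPF",
    "CHML_bosTau": "AFTQCSFSEYLKTKNLTPSLQHFILHSIAMMSESSCTTVDGLKATKTFLQCLGRFGNTPF",
}


def gap_columns(rows):
    """Map each row name to the 1-based alignment columns holding '-'."""
    return {
        name: [i + 1 for i, aa in enumerate(seq) if aa == "-"]
        for name, seq in rows.items()
    }


def percent_identity(a, b):
    """Percent identical residues over columns where neither row is gapped."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    same = sum(1 for x, y in pairs if x == y)
    return 100.0 * same / len(pairs)


print(gap_columns(ROWS))
print(percent_identity(ROWS["CHML_homSap"], ROWS["CHML_canFam"]))
```

Running gap_columns over all rows of the block is a quick way to confirm that the gap falls in the same column in dog, cat, horse and both bats.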

Post-marsupial loss of TMT

TMT (for teleost multiple tissue) is another fairly obscure opsin, again subject of but a single 2003 publication that established its expression in "a variety of neural and non-neural tissues, including a zebrafish embryonic cell line that exhibits a light entrainable clock" suggesting TMT regulates peripheral clocks in teleost fish. The index sequence for TMT is defined by AF349947 (respectively AF402774 for fugu).

TMT is actually a family of 3 paralogs in teleost fish. The one studied has a partially syntenic copy on another chromosome, likely resulting from whole genome duplication. A third paralog -- called TMT here -- is actually the only one with a syntenic counterpart in frog, lizard, birds, and marsupials but not placentals (even though the gene order is not disturbed).

The gene is curiously intertwined with the 4th and final coding exon of an apparently unrelated gene on the opposite strand, ST6GAL2 sialyltransferase, from fish to opossum (that is all species in which it is found). This would not create transcriptional or translational issues for either gene.

Curiously, ST6GAL1 in the same orientation is adjacent to an older TMT gene on another chromosome but without the overlap. This suggests that the latter gene pair is parental and that the exons became intertwined in the fish stem after a segmental duplication, perhaps mediated by recombination between intronic retroposons. This older gene is found intact from fish to lizard and as barely recognizable pseudogenes (recovered by blast of the syntenic region) in chicken, finch and platypus, but not in marsupials or placentals. This requires three independent losses: one in the chicken/finch stem, another approximately synchronous one on the platypus branch (the pseudogene is old but much younger than the platypus divergence), and a final earlier event in the marsupial-placental stem that left no traces. GenBank has no transcripts for any tetrapod TMT as of Dec 2009.

>TMTa_taeGut Taeniopygia guttata (finch) pseudogene frag from ST6GAL1 synteny
FHSTSSSCVRADFTSVRFR GITSLISLAVLSYERCCTM RTAEPDTTNSRKAWTGIILSWTYSLLWTVPPLLG SSYGPEGPGITCSVNWHSQDANNASYIICLFIFCLVIPFAIIVYSYGKLLCAVRQV

>TMTa_galGal Gallus gallus (chicken) pseudogene frag from ST6GAL1 synteny
VRLIEFAIYILTFFFGAIFNVLALWVFFCKIKKWTETKVYAINLVFADFFVICILPFMAYLIWKNSVRDELCQFIEAMYFINMVVSIYVISFIFIDRYLGIKHPLKAR
AFRSPSKAALLCGLLWLAVTTGTILNFQQRYAHFCFQYDTSKPTALILLSFFIFTLPLATLTFCSIEIIRNLKKQMKTNALEEKSIQKALYIIYANLVVFLLCFLPSHLIVIARLVT
 
>TMTa_ornAnaPS Ornithorhynchus anatinus (platypus) pseudogene frags adjacent to syntenic GPR35, weak assembly from ST6GAL1 side
2 WAYASFWATMPLVGLGNYAPEPFGTSCTLDWWLAQASVAGQAFILNILFFCLLLPTAVIV 0
0 SKGVSKGMEKIGEQ*VQLTVFVVVICFLFCWLPYGTMASISTCGKPGLITPT SSVFPLVLG KNSTVLNPVIYGFLNvk 0
0 FYRCFHALMSF KDFTSSISEVSPIPFDFSCVTPRIQNNH-FPSASEGRP 
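
The count of three independent losses of the older TMT gene can be reproduced with a small Dollo-style tally: assume the gene was present at the root and charge one loss to the stem of every maximal clade in which no leaf retains it. The sketch below is an illustration under stated assumptions, not the method used to curate this page. The tree is the conventional vertebrate topology, the leaf labels and state codes are hypothetical, and "recent_pseudo" encodes the statement that the platypus pseudogene is younger than the platypus divergence (so the gene was still intact at the mammalian root).

```python
def count_losses(tree, state):
    """Return (retained, losses) for a nested-tuple tree.

    retained: True if the gene was still intact at the top of this subtree.
    losses:   number of independent loss events inferred inside the subtree.
    """
    if isinstance(tree, str):                      # leaf
        if state[tree] == "intact":
            return True, 0
        if state[tree] == "recent_pseudo":         # lost on the terminal branch
            return True, 1
        return False, 0                            # absent or ancient pseudogene
    results = [count_losses(child, state) for child in tree]
    if not any(retained for retained, _ in results):
        return False, 0                            # charged once, higher up the tree
    losses = sum(n for _, n in results)
    losses += sum(1 for retained, _ in results if not retained)
    return True, losses


# Assumed topology: fish, frog, (lizard, birds), (platypus, (marsupial, placental)).
TREE = ("fish", ("frog", (("lizard", ("chicken", "finch")),
                          ("platypus", ("marsupial", "placental")))))

# Assumed coding of the paragraph above; old bird pseudogenes count as absent.
STATE = {
    "fish": "intact", "frog": "intact", "lizard": "intact",
    "chicken": "absent", "finch": "absent",
    "platypus": "recent_pseudo",
    "marsupial": "absent", "placental": "absent",
}

print(count_losses(TREE, STATE)[1])   # 3
```

If platypus is instead coded as plain "absent", the tally drops to two, because a single loss in the mammalian stem would then suffice. The third loss therefore rests on the relative dating of the platypus pseudogene rather than on presence/absence alone.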

Elephant shark has two apparent paralogs but these are difficult to assign from single exon fragments; counterparts cannot be found in lamprey or Ciona, yet amphioxus and possibly sea urchin have related genes.

TMTs are encephalopsin-class opsins, clustering most closely with encephalopsins. Although initially found in fish, TMT is not a newly arisen product of whole genome duplication there, because synteny to encephalopsin is lacking, the two encephalopsin-class genes are quite diverged, and species diverging earlier and later (lacking the duplication) also have both copies.

Thus TMT likely arose in the lamprey ancestor if not far earlier. Indeed, certain encephalopsins of cephalochordates cluster better with TMT than with classical encephalopsins, as do the two encephalopsin-class opsins of the lophotrochozoan Platynereis dumerilii and 8 in insects and crustacea. The sea urchin gene associates equally with the two encephalopsin classes. There are additional unexpected associations with pinopsins -- that class may need to be broken into two groups.

TMT is thus likely the ancestral gene, with encephalopsin an ancient but later-derived form that has received more emphasis historically because it persisted into some placentals, notably human, unlike TMT. Alternate scenarios involving loss of encephalopsin from early diverging bilaterians cannot be ruled out. It is noteworthy that Branchiostoma, a late deuterostome without imaging eyes, has an expanded repertoire of both TMT and encephalopsin, just as it does with neuropsins.

Reference set of 54 curated encephalopsins (including pseudogenes)

>ENCEPH_homSap Homo sapiens (human) NM_014322 OPN3 full
0 MYSGNRSGGHGYWDGGGAAGAEGPAPAGTLSPAPLFSPGTYERLALLLGSIGLLGVGNNLLVLVLYYKFQRLRTPTHLLLVNISLSDLLVSLFGVTFTFVSCLRNGWVWDTVGCVWDGFSGSLF 1
2 GIVSIATLTVLAYERYIRVVHARVINFSWAWRAITYIWLYSLAWAGAPLLGWNRYILDVHGLGCTVDWKSKDANDSSFVLFLFLGCLVVPLGVIAHCYGHILYSIRM 0
0 LRCVEDLQTIQVIKILKYEKKLAKMCFLMIFTFLVCWMPYIVICFLVVNGHGHLVTPTISIVSYLFAKSNTVYNPVIYVFMIRK 0
0 FRRSLLQLLCLRLLRCQRPAKDLPAAGSEMQIRPIVMSQKDGDRPKKKVTFNSSSIIFIITSDESLSVDDSDKTNGSKVDVIQVRPL* 0

>ENCEPH_panTro Pan troglodytes full
0 MYSGNRSGGQGYWDGGGAAGAEGPAPAGTLSPAPLFSPGTYERLALLLGSIGLLGVGNNLLVLVLYYKFQRLRTPTHLLLVNISLSDLLVSLFGVTFTFVSCLRNGWVWDTVGCVWDGFSGSLF 1
2 GIVSIATLTVLAYERYIRVVHARVINFSWAWRAITYIWLYSLAWAGAPLLGWNRYILDVHGLGCTVDWKSKDANDSSFVLFLFLGCLVVPLGVIAHCYGHILYSIRM 0
0 LRCVEDLQTIQVIKILKYEKKLAKMCFLMIFTFLVCWMPYIVICFLVVNGHGHLVTPTISIVSYLFAKSNTVYNPVIYVFMIRK 0
0 FRRSLLQLLCLRLLRCQRPAKDLPAAGSEMQIRPIVMSQKDGDRPKKKVTFNSSSIIFIITSDESLSVDDSDKTNGSKVDVIQVRPL* 0

>ENCEPH_gorGor Gorilla gorilla ok frag
2 GIVSIATLTVLAYERYIRVVHARVINFSWAWRAITYIWLYSLAWAGAPLLGWNRYILDVHGLGCTVDWKSKDANDSSFVLFLFLGCLVVPLGVIAHCYGHILYSIRM 0
0 LRCVEDLQTIQVIKILKYEKKLAKMCFLMIFTFLVCWMPYIVICFLVVNGHGHLVTPTISIVSYLFAKSNTVYNPVIYVFMIRK 0

>ENCEPH_ponAbe Pongo abelii full
0 MYSGNRSGGQGYWDGGGAAGAEGPAPAGTLSPAPLFSPGTYERLALLLGSIGLLGVGNNLLVLVLYYKFQRLRTPTHLLLVNISLSDLMVSLFGVTFTFVSCLRNGWVWDTVGCVWDGFSGSLF 1
2 GIVAIATLTVLAYERYIRVVHARVINFSWAWRAITYIWLYSLAWAGAPLLGWNRYILDVHGLGCTVDWKSKDANDSSFVLFLFLGCLVVPLGVIAHCYGHILYSIRM 0
0 LRCVEDLQTIQVIKILKYEKKLAKMCFLMIFTFLVCWMPYIVICFLVVNGHGRLVTPTISIVSYLFAKSNTVYNPVIYVFMIRK 0
0 FRRSLLQLLCLRLLRCQRPAKDLPAAGSEMQIRPIVMSQKDGDRPKKKVTFNSSSIIFIITSDESLSVDDSDKTNGSKVDVIQVRPL* 0

>ENCEPH_nomLeu Nomascus leucogenys exon 1 ok frag
0 MYSGNRSGGQGYWDGGGAAGAEGPAPAGTLSPAPLFSPGTYERLALLLGSIGLLGVGN  1
2 GIVSIATLTVLAYERYIRVVHARVINFSWAWRAITYIWLYSLAWAGAPLLGWNRYILDVHGLGCTVDWKSKDANDSSFVLFLFLGCLVVPLGVIAHCYGHILYSIRM 0
0 LRCVEDLQTIQVIKILKYEKKLAKMCFLMIFTFLVCWMPYIVICFLVVNGHGHLVTPTISIVSYLFAKSNTVYNPVIYVLMIRK 0
0 FRRSLLQLLCLRLLRCQRPAKDLPAAGSEMQIRPIVMSQKDGDRPKKKVTFNSSSIIFIITSDESLSVDDSDKTNGSKVDVIQVRPL* 0

>ENCEPH_macMul Macaca mulatta (rhesus) XP_001094239 full
0 MYSGNRSGGQGYWDGGGAAGAEGPAPAGTLSPAPLFSPGTYERLALLLGSIGLLGVGNNLLVLVLYYKFQRLRTPTHLLLVNISLSDLLVSLFGVTFTFVSCLRNGWVWDTVGCVWDGFSGSLF 1
2 GIVSIATLTVLAYERYIRVVHARVINFSWAWRAITYIWLYSLAWAGAPLLGWNRYILDVHGLGCTVDWKSKDANDSSFVLFLFLGCLVVPLGVIAHCYGHILYSIRM 0
0 LRCVEDLQTIQVIKILKYEKKLAKMCFLMIFTFLVCWMPYIVICFLVVNGHGHLVTPTISIVSYLFAKSNTVYNPVIYVFMIRK 0
0 FRRSLLQLLCLRLLRCQRPAKDLPAAGSEMQIRPIVMSQKDGDRPKKKVTFNSSSIIFIITSDESLSVDDSDKTNGSKVDVIQVRPL* 0

>ENCEPH_papHam Papio hamadryas 1st exon problematic 1x ok frag
0 MYSGNRSGGQGYWDGGGAAGTEGPALVGTLIPAPLFSPGTYERLALLLGSIGLLGVGNNLLVLVLYYKFQRLRTPTHLL LLVSLFGVTFTFVSCLRNGWVWDTVGCVWDGFSGSLFG 1
2 gIVSIATLTVLAYERYIRVVHARVINFSWAWRAITYIWLYSLAWAGAPLLGWNRYILDVHGLGCTVDWKSKDANDSSFVLFLFLGCLVVPLGVIAHCYGHILYSIRM 0
0 LRCVEDLQTIQVIKILKYEKKLAKMCFLMIFTFLVCWMPYIVICFLVVNGHGHLVTPTISIVSYLFAKSNTVYNPVIYVFMIRK 0
0 FRRSLLQLLCLRLLRCQRPAKDLPAAGSEMQIRPIVMSQKDGDRPKKKVTFNSSSIIFIITSDESLSVDDSDKTNGSKVDVIQVRPL* 0

>ENCEPH_calJac Callithrix jacchus full
0 MYSGNRSGGQGYWDGGEAAGAEGPAPAGTLSPAPLFSPGTYERLALLLGSIGLLGVGNNLLVLVLYYKFQRLRTPTHLLLVNISLSNLLVSLFGVTFTFVSCLRNGWVWDTVGCVWDGFSGSLF 1
2 GIVSIATLTVLAYERYIRVVHARVINFSWAWRAITYIWLYSLAWAGAPLLGWNRYILDVHGLGCTVDWKSKDANDSSFVLFLFLGCLVVPLGVIAHCYGHILYSIRM 0
0 LRCVEDLQTIQVIKILKYEKKLAKMCFLMIFTFLVCWMPYIVICFLVVNGHGHLVTPTISIVSYLFAKSNTVYNPVIYVFMIRK 0
0 FRRSLLQLLCLRMLRCQQPAKDLSAAGSEMQIRPIVMSQKDGDRPKKKVTFNSSSIIFIITSDESLSVDDSDKTNGSKVDVIQVRPL* 0

>ENCEPH_tarSyr Tarsius syrichta weak full
0 MYSGNRSGGQGSWEGGGAAGAEGPAAAGIPAPIISRGTYERLALVLLGSIGLLGVGNNLLVLVLYYKFPRLRTPTHLLLANISLSDLLVSLFGVTFTFVSCLRNGWVWDTVDCMGYVLTIDLF 1
2 GIVSIATLTVLAYERYIRVVHARVINFSWAWRAITYIWLYSLAWAGAPLLGWNRYILDVHGLGCTVDWKSKDANDTSFVLFLFLGCLVVPMGVISYCYGHILYSIRe 0
0 LRCVEDLQTIQVIKILKYEKKVAKMCFFMIFTFLICWMPYIVICFLVVNSQGHLVTPTISVVSYLFAKSNTVYNPVIYIFMIRK 0
0 FRRSLLQFLCLRLLRCQQPAKDLPAAENEMQIRPIVMSQKDGDRPKKKVTFNSSSIIFIITSDESLSVDDSDKTSGSKVDVIQVRPL* 0

>ENCEPH_otoGar Otolemur garnettii full
0 MYSGNRSGGQGFWEGGGAAGAEEPTPEGTLSPAPLFSPSAYERLALLLGSIGLLGVANNLLVLVLYYKFPRLRTPTHLFLVNISLSDLLVSLFGVTFTFVSCLRNGWVWDTVGCVWDGFSGSLF 1
2 GIVSIATLTVLAYERYIRVVHARVINFSWAWRAITYIWLYSLAWAGAPLLGWNRYILDVHGLGCTVDWKSKDANDSSFVLFLFLGCLVVPVGVVAHCYGHILYSIRM 0
0 LRCVEDLQTTQVIKILKYEKKVAKMCFFMIFTFLVCWMPLIVICFLVVNGQGHLVTPTVSIVSYLLAKSNTVYNPVIYIFMLRK 0
0 FRRSLLQLLCFRLLRCQRPAKDLPAAESEMQIRPIVMSQKDGDRPKKKVTFNSSSIIFIITSDESLSVDNSDKTNGSKVDVIQVRPL* 0

>ENCEPH_micMur Microcebus murinus full
0 MYSGNRSGGQWFWEGGGAAGAEGPAPAGTLSPAPLFSPGTYERLALLLGSIGLLGVGNNLLVLVLYYKFPRLRTPTHLLLVNISLSDLLVSLFGVTFTFVSCLRNGWVWDTVGCVWDGFSSSLF 1
2 GIVSIATLTVLAYERYIRVVHARVINFSWAWRAITYIWLYSLAWAGAPLLGWNRYILDVHGLGCTVDWKSKDANDSSFVLFLFLGCLVVPVGVMVHCYGHILYSVRM 0
0 LRCVEDLQTIQVIKILKYEKKLAKMCFLMIFTFLVCWMPYIVICFLVVNGQRHLVTPTVSIVSYLFAKSNTVYNPIIYIFMIRK 0
0 FRRSLLQLLCFRLLRCQRPAKDLPASESEMQIRPIVMSQKDGDRPKKKVTFNSSSIIFIITSDESLSVDNSDKTSGSKVDVIQVRPL* 0

>ENCEPH_tupBel Tupaia belangeri so-so frag
0   SGNRRGGQGLLEGGGAVGVEGLAPTGSQSPAPLFSRGTYERLALLLGSIGLLGVGHNLLVLVLYYKFPRLRTPTHLLLLNISLGDLLVSVFGVTFTFVTCLRNGWVWDTVSCAWDGFSSSLF 1
2 GIVSITTLTVLAYERYIRVVHARVINFPWAWRAITYIWLYSLAWAGAPLLGWNRYMLDVHGLGCTVDWKSK
0           MINILRYKKKVAKMCFLMILTFLICWMPYIVIRFLVVNGGYGHLITPTVSIVSFLFAKSSTVYNPVIYIFMIRK 0
0 FRRSLLQLLCFRLLRYQRPAKDLPAAGSEMQIRPIVMSQKDGDKPKKKVTFNSSSIIFIITSDESLSVDDSDKTSGSKVDVIQVRPL* 0

>ENCEPH_musMus Mus musculus Opn3 Panopsin NM_010098 2aa del full
0 MYSGNRSGDQGYWEDGAGAEGAAPAGTRSPAPLFSPTAYERLALLLGCLALLGVGGNLLVLLLYSKFPRLRTPTHLFLVNLSLGDLLVSLFGVTFTFASCLRNGWVWDAVGCAWDGFSGSLF 1
2 GFVSITTLTVLAYERYIRVVHARVINFSWAWRAITYIWLYSLAWAGAPLLGWNRYILDIHGLGCTVDWRSKDANDSSFVLFLFLGCLVVPVGIIAHCYGHILYSVRM 0
0 LRCVEDLQTIQVIKMLRYEKKVAKMCFLMAFVFLTCWMPYIVTRFLVVNGYGHLVTPTVSIVSYLFAKSSTVYNPVIYIFMNRK 0
0 FRRSLLQLLCFRLLRCQRPAKNLPAAESEMHIRPIVMSQKDGDRPKKKVTFNSSSIIFIITSDESLSVEDSDRSSASKVDVIQVRPL* 0

>ENCEPH_ratNor Rattus norvegicus XP_573517 predicted 2aa del full
0 MYSGNRSGGQGYWEDGAGAEGAAPAGTRSPAPLFSPTAYERLALLLGCLALLGVGGNLLVLLLYSKFPRLRTPTHLFLVNLSLGDLLVSLFGVTFTFASCLRNGWVWDAVGCAWDGFSGSLF 1
2 GFVSITTLTVLAYERYIRVVHARVINFSWAWRAITYIWLYSLAWAGAPLLGWNRYILDVHGLGCTVDWKSKDANDSSFVLFLFLGCLVVPMGIIAHCYGHILYSVRM 0
0 LRCVEDLQTIQVIKMLRYEKKVAKMCFLMAFVFLTCWMPYVVTRFLVVNGYGHLVTPTVSIVSYLFAKSSTVYNPVIYIFMIRK 0
0 FRRSLLQLLCFRLLRCQRPAKNLPAAESEMQIRPIVMSQKDGDRPKKKVTFNSSSIIFIITSDESLSVEDSDRSSASKVDVIQVRPL* 0

>ENCEPH_speTri Spermophilus tridecemlineatus full
0 MYSGNRSGSQGSWEGDGSAGAEGSAPEGTLSPTPLFSPGTNERLALLFRSVGLLGAGSNLLVLVLYYKFQGSAHPLTFFLVNISLGDLLMSLFGVTFTFVSCLRNRWVWDTVACVWDGFSSSLF 1
2 GIVSITTLTVLAYERYIRVVHARVINFSWAWRAITYIWLYSLAWAGAPLLGWNRYILDVHGLGCTVEWKSKDANDSSFVLFLFLGCLVVPVGVIAHCYGHILYSIRM 0
0 LRCVEDLQIFQVIKILRYEKKLAKMCFVMVFTFLICWMPYIVVCFLVANGYGQRVTPTVSIVSNLFAKSSTVYNPVIYIFMIRK 0
0 FRRSLLQLLCSRLLRCQQPAKDLPAVGNEMQIRPIVISQKDGERPKKKVTFNSSSIVFIITSDESLSVDDSNRTSGSKADVIQVRPL* 0

>ENCEPH_dipOrd Dipodomys ordii full
0 MYSGNRSGGQEYWEDGGAAGSEGPAPAGTLSPAPLFSAGAYERLALLLGSAGLLGVGNNLLVLVLYYKFQRLRTPTHLLLVNISLSDLLVSLFGVTFTFVSCLRNGWVWDTVGCVWDGFSRSLF 1
2 GIVSITTLTVLAYERYIRVVHARVINFTWAWRAITYIWLYSLAWAGAPLLGWNRYILDIHGLGCTVDWKAKDANDSSFVLFLFIGCLVVPVGIIAHCYGHILYSIRM 0
0 LRCVEDLQTIQIIKILQYEKKLAKMCFLMALTFLMCWMPYIVTCFLVVNSHGHLVTPTISIVSHLLAKSSTIYNPVIYIFMIRK 0
0 FRRSLLQLLCFRLLRCQRPAKDLPAAGSEMQIRPIVMSQKDGDRPKKKVTFNSSSIIFIITSDESLSVDDSVRSSGSKADVIQVRPL* 0

>ENCEPH_cavPor Cavia porcellus 3 aa del full
0 MYSGNRSSGQGYWEGGGPEDPAPAGTLSPAPLFSPGAYERLALLLGSLGLLGVGNNLLVLVLYYKFQRLRSPTHLFLANISLSDLLGSLFGVTFTFVSCLKNGWVWDAVGCVWDGFSRSLF 1
2 GIVSITTLTVLAYERYIRVVHARVINFSWAWRAITYIWLYSLAWAGAPLLGWNRYILDIHGLSCTVDWKSKDANDSSFVLFLFLGCLVVPVGVIVHCYGHILYSIRM 0
0 LRGVEDLQTIQVMKILRSENKVAIMCFLMVFIFLVCWMPYIVICFLLVNGYRHRVTPTVSIVSYLFTKSSTVYNPVIYVLMIRK 0
0 FRRSLLQLHCLRLLRCQQPAKDLPAVEREMHIRPIVMSQKDGDRPKKKVTFNSSSIIFIITSDESLSVDDSDRTSGSKVDTIQVRPL* 0

>ENCEPH_oryCun Oryctolagus cuniculus full
0 MYSGNRSGEQGYWEGGGAAGAEGPGPAGTLSPAPLFSPSTYERLALLLGSIGLLGVGSNLLVLVLYYKFQRLRTPTLLFLVNISLSDLLVSVFGVTFTFVSCLRNGWVWDTVGCVWDGFSSSLF 1
2 GIVSITTLTVLAYERYIRVVHARVINFSWAWRAITYIWLYSLAWAGAPLLGWNRYILDIHGLGCTVDWKSKNANDSSFVLFLFLGCLVVPVGVIAHCYGHILYSVRM 0
0 LRCVEDLQTIQVIKILRYEKKVAKMCFFMVFTFLICWMPYVVICFLVVNGYGHLVTPTLSIVSYLFCKSSTAYNPIIYIFMIRK 0
0 FRRSLLQLLCFQPLRCQQPPKDLPTVGSEMQIRPIVMSQKDGDRPKKKVTFNSSSIIFIIASDESLAVDDNEKASGPKVDVIQVRPL* 0

>ENCEPH_ochPri Ochotona princeps 5aa del ok frag
0 MYSGNRSSGQGHWEDAEESEPAGTVSPAPLFSTNTYERLALLFGSLGLLGVGNNLLVLVLYYKFQRLRTPTHLFLVNLSLSDLLVSLFGVTFTLVSCLRNGWVWDTVGCVWDGFSSSLF 1
2 GIVSITTLTVLAYERYIRVVHARVINYSWAWRAITYIWLYSLAWAGAPLLGWNRYMLDIHGLGCTVDWKSKNANDSSFVLFLFLGCLVVPVGVIAHCYGHILYSVRM 0
0 LRCVEDLQTIQVIKILRYEKKVAKMCFFMIFTFLICWMPYIVIRFLVVNGYGRLVTPTISIVSYLFCKSSTVYNPVIYIFMIRK 0

>ENCEPH_canFam Canis familiaris (dog) DN422921 transcript correct, genbank XP_854433 errs
0 MYSGNRSGGQGHWEGGGAAGAEGPGPAGTLSPAPLFSPGTYERLALLLGSVGLLGVGNNLLVLVLYSKFQRLRTPTHLLLVNLSLSDLLVSLFGVTFTFVSCLRNGWVWDSVGCVWDGFSSSLF 1
2 GIVSITTLTVLAYERYIRVVHARVINFSWAWRAITYIWLYSLAWSGAPLLGWNRYILDVHGLGCTVDWKSKDANDSFFVLFLFLGCLVVPMGVIVHCYGHILYSIRM 0
0 LRCVEDLQTIQVIKILRYEKKVAKMCFLMIFIFLIFWMPYIVICFLVVNGYGHLVTPTVSIVSYLFAKSSTVYNPVIYIIMIRK 0
0 FRRSLLQLLCFRPLRCQRPAKDLPANGSEMQIRPIVMSQKDGDRPKKKVTFNSSSIIFIITSDESVSIDDSDKTSVSKVDVIQVRPL* 0

>ENCEPH_felCat Felis catus full
0 MYSRNRSGGQGHWEGGGAAGAERQGPAGTLSPAPLFSPGTYERLAMLLGSIGLLGVGNNLLVLVLYYKFQRLRTPTHLLLVNISLSDLLVSLFGVTFTFVSCLRNGWVWDTVGCVWDGFSSSLF 1
2 GTVSITTLTVLAYERYIRVVHARVINFSWAWRAITYIWLYSLAWSGAPLLGWNRYILDVHGLGCSVDWKSKDANDSSFVLFLFLGCLVVPVGVIAHCYGHILYSVRM 0
0 LRCVEDLQTIQVIKILRYEKKVAKMCFLMISTFLIFWMPYIVICFLVVNGYGHLVTPTVSIVSYLFAKSSTVYNPVIYIFMIRK 0
0 FRRSLLQLLCFRLLRCQRPAKDLPTNGSEMQIRPIVMSQKDGDRPKKKVTFNSSSIIFIITSDESLSVEDSDKTSVSKVDVIQVRPL* 0

>ENCEPH_bosTau pseudo frag
0 P-----SASG---RRGG----A-----G*SSLTLPVSEGAYERVV-LLGSVGLPGVGSNLLVLVIYLKLPRLRSPARLLLLHVSLGDLLPSVLQAALAFAFPLRGSRVGGTTIGEWDGFSSSL* 1
2 GIVSIITLTRLAYECYICVIHARVINFPQAWRTIPCIWLSSTVWSGASLLGWNNHILDMHGPGCTGDWPSKDTSHSSFVLFLFLGCL    GVIAHCHGHILFSIQ 
0 F*RALLQLLCF*LLRCQIPAKDLSAVGREMRLTSSVKSQKDRDVTERQG-QAKEKRTFNSSSIIFIITNDESLSVGDSDRTNGSKVDVIQVHPL

>ENCEPH_turTru Tursiops truncatus pseudo frag
0 NCGGGGAGWEGG-----EGLWPGQPSLTQPV-SQGAYELLVLLLGSVGLLGVGSSLLVLVLYLKFPRLRSPSRLFLLHVGLGNLLPSVLRAALAFAFRPRGGVVGGATNCVWDGFSNSLW 1
2 GIFSIITLTTLACERYIGMIHNRVISFSWAWRAITYIWLYSLVRSGSPLLG*HRYILDVHVLGCAVDWKSKDTSDSSFVLFLFLDCMVVPVGVIAHCYGHILYSIRK 0
0 FRRALLQLLCFRLQRCQ*PAKDLPAVGSEMQIQLIVMPQKDRDRPKKKLTFNSSSIIFVITNDESLSVD-GERTSGS*VDVIQVCPL* 0

>ENCEPH_susScro no coverage

>ENCEPH_vicVic pseudo frag
2 GIVSIISLTVLAYECYIHVVHARMITFSWTWRAVTYIWLYTLVWSGVPLMG*NRYIL-FHGLGCAVDWKSKDANDFCFVLFLFLGSLVVPVGVIAHCYGHILYSIEL 0
0 RGVEDLQTIKVIRILRYENKLARMCFCMTFTFMILWMPYVVICFLMFSDGGHLVTLTVFIVS*PFTNSSTVYDAAFYIFMIRK 0
0 FQRALLHLLCFRLLRYQQPAKDLPTYQS*MQIRPIEMSQKVRDRPKKKVIFNSSPIIFIITHDGSLSVDDKD

>ENCEPH_myoLuc Myotis lucifugus weak frag
0 MFSGNRNG--GQFRGGQLGLGHRGVGASGTLGPRAFLKNIYFYSFERRR

>ENCEPH_pteVam Pteropus vampyrus 86%=homSap full
0 MHSGNRSGGLDSWEGGGAAGAEGPGLAGTLSPGSVFNPSTYERLALLLGSIGLLGVANNLLVLVFYYKFQQVRTPFYLFLVNISFSDLLVSFFGVTFTFVSCLRNGWVWDTVGCVWDGFSSSLF 1
2 GTVSMTTLTVLAYERYIRVVQARAIDFSWAWRTITYIWLYSLGWSGAPLLGWNRYILDVHGLGCAVDWKSKDANDSSFVLFLFLGCLVVPVVVIAHCYGHILYSVQM 0
0 LRCVEDLQTIQVIKILRYEKKMAKMCFLMIFTFLISWMPYIVICFLVVNGYGHLVTPTVSIVSYLFAKSSTVYNPVIYIFMIRK 0
0 FRRFVLQLLCFRPLRCRRPATDLPAGGSEMQIRPIVMSQKDGDRPKKKVTFNSSSIIFVITSDESLSVDDSDKINGSKADGIQVRPL* 0

>ENCEPH_equCab Equus caballus full
0 MTAGTRAGGQGSWEGGGAAGAEGPGPAGPLSPAPLFSPGTYERLALLLGCLGLLGVGNNLLVLVLYSKFPRLRTPTHLLLVNISLSDLLVALFGVTFTFVSCLRNGWVWDAVGCAWDGFSSSLC 1
2 GIVSITTLTVLAYERYIRVVHARVINFSWAWRALTYIWLYSLAWSGAPLLGWNRYILDIHGLGCAVDWKSKDANDSTFVLFLFLGCLVVPMGVIAHCYGHILYSIRM 0
0 LRCVEDLQTIQVMKILRYEKKLAKMCFFMIFTFLIFWMPYIVICFLVANGYGHLVTPTVSIVSYLFAKSSTIYNPIIYIFTIRK 0
0 FRRSLSQLLCFRLLRCQRPAKDQPPVGSEMQIRPIVMSQKDGDRPKKKVTFNSSSIIFIITSDESLSVHDSDKINGSKVEVIQVRPL* 0

>ENCEPH_eriEur no coverage

>ENCEPH_sorAra pseudo frag
0 RSVHTSRRGLDAGDRGGAPGATEPGRADDAVLSAALLLGAGRGTLLVLILHQKCRRPLTSPLAQLGPVNVSRGKLLVSLFGITFVFFLRNCWVWETEGRGAFSCSVL
                              

>ENCEPH_loxAfr Loxodonta africana 2 exons in browser, 1 2x full
0 MYSGNRSGGQDLWEGGGGSGGAGPAGTLSPAPVFRSGTYERLALLVGSIGLLGVGNNLLVLVLYYKFQRLRTPTHLFLVNISLSDLLVSLFGVTFTFVSCLRNGWVWDTVGCVWDGFSSSLF 1
2 GIASITTLTVLAYERYIRVVHARVINFSWAWRAITYIWLYSLAWSGAPLLGWNRYILDTHGLACTVDWKSNNSSDSSFVLFLFLGCLVVPVGVIAHCYGHILYSIRM 0
0 LRCVEDLQTIQVIKILRHEKKLAKMCLFMIFTFLICWMPYIVICFLVVNGYGHLVTPTISIVSYLFAKSSTVYNPVIYTFMIRK 0
0 FRRSLLQLLCFRLLRCQRPAKDLPVVGSEMQIRPIVMSQKDGDRPKKKVTFNSSSIIFIITSDESLSVNNIDKTNGSKADVIQIRPL* 0

>ENCEPH_echTel Echinops telfairi no coverage

>ENCEPH_proCap Procavia capensis ok frag
2 GIASITSLTVLAYERYIRVVHARVINFSWAWRAITYIWLYSLAWSGAPLLGWNRYILDTHGLACTVDWKSNNTNDSSFVLFLFLGCLVVPVGVIVHCYGHILYSIRM 0
0 LRCVEDLQTIQVIKILRYEKKVAKMCLFMILTFLICWMPYIVICFLMVNDYGYLVTPTISIVSYLIAKSSTVYNPVIYTFMIRK 0
0 FRRSLFQLLCFRLLRCQRPAKNKPEVGSEMQIRPIVMSQKDGDRPKKKVTFNSSSIIFIITSDESLSVNDTDKINGSKADVIQVRPL* 0

>ENCEPH_dasNov Dasypus novemcinctus  pseudo
0 MYSGNRSSGHES---GGPT--------GTLGSAAFFSPRTYERLALLAGAGGLLGAGGHLLVLALRCALPQLRSPPRRLLVTASLGDPLVSVFGVAFTCAACLRSG-VWDPAGCVGDGFGSGLC 1
2 GIVSTSSLTGLASEHSIRVVHASLISFSWAWAWLYSLAWSGVPLLGWDRYVLDVHRRGCTLNLRARDSSASSRVLFLFLGCVAVPVGVTVHCHGHILHSIRM 0
0 FLCVEGLQTVQVIKILKYEKKAATMCLVVVASFLMGWMPYIAIHFSVVNGYEHLVTPVVSTVSRLFAKSSPVYNPVIYIIMIRK 0
0 FHRSFL*LLFLQLLRCQRPAQDLPVVESEMQVRPTVMSQKDRHRPKKKVTFNSSSIIFIITSDESVSVNGSDKTNGSKFDVI     * 0
                                                                                                                                                                                                                                                                       
>ENCEPH_choHof Choloepus hoffmanni so-so frag
0 MYSGNRSGGRDYWEGGGGAGAEGPGPTGTLSPALVFSPGTYERLAGLIGSIGLLGAGNNVLVLILYYKFQRLRTPTHLFLVNISFSDLLVSLFGVTFTFISCLRNGWVWDTVGCVWDGFSSSLF 1
2 GIVSITTLTVLAYERYIRVVHARVVNFSWTWRAITYIWLYSLACSGASLLGWNRYTLDIHGLACSVNWKSPDSSDSSFVLFLLLDCLTGPVAVIAHCY 
0 LRCVEDLQTVQVIKILRYEKKVAKMCFVMIATFLMCWMPYIVICFLVVNGYGHLVTPTVSIVSHLFVKSSTVYNLVIYIFMLRK 0
0 FRRSLLQLLCFRLLRCQRPAKDLPVVGCEMQIRPIVMSQKEGHRPKKKVTFNSSSIIFIITSDESISVDGSDKTNGPKVDVIQVRPL* 0

>ENCEPH_monDom Monodelphis domestica (opossum) encephalopsin OPN3 75%=homSap full
0 MYSDNSSDDGGGGYWGSGRAGGASGTGVTGEPGPEGSPRQAPLFSPGTYELLALLIATIGLLGLCNNLLVLVLYYKFQRLRTPTHLFLVNISFNDLLVSLFGVTFTFVSCLRSGWVWDSVGCAWDGFSNTLF 1
2 GIVSIMTLTVLAYERYNRIVHAKVINFSWAWRAITYIWLYSLVWTGAPLLGWNRYTLEIHGLGCSVDWKSKDPNDSSFVIFLFFGCLMLPVGVMAYCYGHILYAIRM
0 LRCVEELQTIQVIKILRYEKKVAKMCFLMIAIFLFCWMPYAVICLLVANGYGSLVTPTVAIIASLFAKSSTAYNPIIYIFMSRK 0
0 FRRCLLQLLCFRLLKFQQPKKDRPVIRTEKQIRPIVMSQKVGDRPKKKVTFSSSSIIFIITSDETQMIDENDKNSGTKVNVIQVRPL* 0

>ENCEPH_sacHar Sarcophilus harrisii (tasmanian_devil) 94% identity monDom
0 MYSGNSSDDAGGGYWGSGGTGGAGGTGVAGEPAPEGSPRPAPLFSPGTYELLALLIATIGLLGLCNNLLVLVLYYKFQRLRTPTHLFLVNISFSDLLVSLFGVTFTFVSCLRSGWVWDSVGCAWDGFSNTLF 1
2 GIVSIMTLTVLAYERYNRIVHAKVINFSWAWRAITYIWLYSLIWTGAPLLGWNRYTLEIHGLGCSVDWKSKDPNDSSFVLFLFLGCLVLPVGVMAYCYGHILYAIRM 0
0 FRCVEELQTLQVIKILRYEKKVAKMCFLMIATFLFCWMPHAVICFLVANGYGSLVTPTVAIIPSLFAKSSTAYNPIIYIFMSRK 0
0 FRRCLLQLLcFRQLKFQQPKKDRAIIRTEKQIRPIVMSQKVGDRPKKKVTFSSSSIIFIITSDETQMIDDNDKNSETKVNVIQVRPL* 0

>ENCEPH_macEug Macropus eugenii ok frag
0                         GALGCREPGQREPSSSAPFSPGTYELLALLIATIGLLGLCNNLLVLVLYYKFQRLRTPTHLLLVNISFSDLLVSLFGVTFTFVSCLRSGWVWHTVGCAWDGFSNSLF 1
2 GIVSIMTLTVLAYERYHRIVHAKVINFSWTWRAITYIWLYSLVWTGAPLLGWNRYTLEIHGLGCSVDWKSKDPNDSSFVLFLFLGCLVLPVGVMAYCYGHILYAIRM 0
0 0
0 FRRCLLQLLCFRQLKFQQPKKDRPVIRTEKQIRPIVMSQKVGDRPKKKVTFSSSSIIFIITSDETQMIDDNDKNNGTKVNVIQVRPL* 0

>ENCEPH_ornAna Ornithorhynchus anatinus pseudo
0 MVPWNGS-GRHLGAVR---GPE--SLPATPGAARPSRPGAGDGRL--LGLF-P-GVGGNLLVLLL--ALPGPPTTTDLYLASVAVSDLL--LL---LPFVYRLWRSRPWVFVCRLLGE-GGSLA 1
2 GIVSLISLAVLSYERYTLTLHPKQSNYQKAVLAVGASWIYSLIWTIPPLLGWSSYGTEGAGTSCSVHWSSKSVC-SYIVCLFI--CLVIPVLVMIYCYGRLLYAVKQ 0
0 LHCVKELQNIQVIGSLRYER*VTEMYFFTIAQFLVCQSPSALVSYPAAH-----VSPVVAKISPVFANSSFVYNPVISIFVRRK 0
0 KASR*KVNVIQVQPPS* 0

>ENCEPH_galGal Gallus gallus (chicken) 71%=homSap encephalopsin OPN3 full
0 MHSGNGTGATSRPQLAAAGHEVPGERPLFSAGTYELLALLIATIGTLGVCNNLLVLVLYYKFKRLRTPTNLFLVNISLSDLLVSVCGVSLTFMSCLRSRWVWDAAGCVWDGFSNSLF 1
2 GIVSIMTLTVLAYERYIRVVHAKVIDFSWSWRAITYIWLYSLAWTGAPLLGWNRYTLEIHGLGCSMDWKSKDPNDTSFVLLFFLGCLVAPVVIMAYCYGHILYAVRM 0
0 LRCVEDFQTSQVIKLLKYEKKVAKMCFLMISTFLICWMPYAVVSLLVTYGYSNLVTPTVAIIPSFFAKSSTAYNPVIYIFMSRK 0
0 FRQCLLQLLCFRLMRFQRIMKEPSGAGNVKPIRPIVMSQKVGDRPKKKVTFSSSSIIFIIASDDTQQIDDNSKHNGTKVNVIQVKPL* 0

>ENCEPH_taeGut Taeniopygia guttata mrna CK301424 70%=homSap full
0 MPAGNGTGTSGRPAPAAPEQEVPGERPLFSAGTYELLALLVATIGMLGLCNNLLVLVLYYKFKRLRTPTNLFLVNISLSDLLVSVFGVSLTFMSCLRSRWVWDAAGCVWDGFSSSLF 1
2 GIVSIMTLTALAYERYIRVVHAKVIDFSWSWRAITYIWLYSLAWTGAPLLGWNRYTLEIHGLGCSVDWKSKDPNDTSFVLLFFLGCLVAPVGIMAYCYGHILHAVRM 0
0 LRCVEDFQTVQVIKLLRYEKKVAKMCFLMISTFLICWMPYAVVSLLITYGYSNLVTPTVAIIPSFFAKSSTAYNPVIYIFMSRK 0
0 FRRCLLQLLCFRLMRFQRTMRETPATGSDKPIRPIVLSQKAGDRPKKKVTFSSSSVIFIITSDDAEQIEDSSKHNETKVNAIQVKPL* 0

>ENCEPH_anoCar Anolis carolinensis (lizard) 70%=homSap OPN3 full
0 MFSANGTRSGAGSDLEPGPGQQQQQREASEEEERGAGLSPFSAGTYELLALLVAAIGLLGLCNNLLVLVLYAKFKRLRTPTHLFLVNISLSDLLVSLFGVSFTFGSCLRHRWVWDAAGCVWDGFSNSLF 1
2 GIVSIMTLTVLAYERYIRVVHARVIDFSWSWRAITYIWLYSLAWTGAPLLGWNHYTLEIHGLGCSVDWQSKEPSDSSFVLFFFLGCLAAPVGIMAYCYGHILHAIRM 0
0 LRCVEDLQSIQVIKILRYEKKVAKMCFLMVTTFLICWMPYAVVSLLIAYGYGHLITPTVAIIPSFFAKSSTAYNPVIYIFMSRK 0
0 FRRCLVQLFCVQFLRFKRTLKEQPAIESNKPIRPIVMSQKVGDRPKKKVTFSSSSIIFIITSDDTEQIDVSTKCSDTKINVIQVKPL* 0

>ENCEPH_xenTro Xenopus tropicalis (frog) 45%=homSap recent pseudogene, I for Schiff K, loss of C-terminal conserved residues
0 MPVTNGSHNNSISWLHSKDMFTEDTYHFLALIVATVGFLGLVNNLLVLILYCKFKRLQTPTNLLFFNTSLCHFVFSLLAITFTFMSCVRGSWAFSVEMCVFHGFSKNLL 1
2 GIVSFGTLTVVAYERYARVVYGKYVNSSWSKRSITFVWVYSLAWTGFPLIGWNLYTFETHKLDCSFEWTATDPKDTAFVLLFFLACITLPLSIMAYCYGYILYEIQK 0
0 LRSVKNIQNFQEITILDYEIKMAKMCLLMMLTFLIGWMPYTILSLLVTSGYSKFITPTITVMPSLLAIASAAYNPVIHIFTIKK 0
0 FRQCLVQLLFHNFWRLLKNLNGRLAMKKVKPVLGKGRSHNRPEKKVFSSSDFFTRTTSDTGTHGITESTKGKRTNVRLIQVHPLYP* 0

>ENCEPH_danRer Danio rerio (zebrafish) NM_001111164 mrna 61%=homSap full
0 MNSFNETPTEAHLENYNYIFADETYKLLTFTIGSIGVLGFCNNIIVIILYSRYKRLRTPTNLLIVNISVSDLLVSLTGVNFTFVSCVKRRWVFNSATCVWDGFSNSLF 1
2 GIVSIMTLSGLAYERYIRVVHAKVVDFPWAWRAITHIWLYSLAWTGAPLLGWNRYTLEVHQLGCSLDWASKDPNDASFILFFLLGCFFVPVGVMVYCYGNILYTVKM 0
0 LRSIQDLQTVQTIKILRYEKKVAVMFLMMISCFLVCWTPYAVVSMLEAFGKKSVVSPTVAIIPSLFAKSSTAYNPVIYAFMSRK 0
0 FRRCMLQMLCSRLTSLQHTIKDRPLSRIEHPIRPIVMSQSRTDRPKKRVTFSSSSIVFIIASHDTHPLDITSKCNDEPDINVIQVRPL* 0

>ENCEPH_tetNig Tetraodon nigroviridis (pufferfish) homSap=61% full
0 MSSADDSRSARSGEPSLFAVHTYRLLAAAIGAIGVLGFCNNLAVAALYWRFRRLRTPTNLLLLNISLSDLLVSLLGVNFTFAACVQGRWTWNQATCVWDGFSNSLF 1
2 GIVSIMTLAALAYERYIRVVHAQVVDFPWAWRAIGHIWLYSLAWTGAPLLGWNRYTLEIHRLGCSLDWASKDPNDASFILLFLLACFFVPVGIMIYCYGNILYAVHM 0
0 IRSIQDLQTVQIIKILRYEKKVSVMFFLMISCFLLCWTPYAVVSMMVAFGRKSMVSPTVAIIPSFFAKSSTAYNPVIYVFMSRK 0
0 FRRCLLQLLCSRLSWLQRGLKERPLAPVQRPIRPIVVSRPCGKGTRPKKKVTFSSSSIVFIITSDDFRQLDVTSRAGDSADVNAIQVRPL* 0

>ENCEPH_takRub Takifugu rubripes (pufferfish) homSap=61% full
0 MNPANGSRSERSAEQLLFSGDTYRVLAFTIGTIGAFGFCNNFVVLALYCRFKRLRTPTNLLLVNISLSDLLVSLFGINFTFAACVQGRWTWTQATCVWDGFSNSLF 1
2 GIVSIMTLAALAYERYIRVVHAQVVDFPWAWRAIGHIWLYALAWTGAPLLGWNRYTLEIHRLGCSLDWASKDPNDASFILLFLLACFFVPVGIMIYCYGNILYAVQM 0
0 IRSIQDLQTVQIIKILRYEKKVSVMFFLMISCFLLCWTPYAVVSMMVAFGRRSMVSPTMAIIPSFFAKSSTAYNPLIYVFMSRK 0
0 FRHCLLQLLCSRLSWLQRSLKERPLAPVQRPIRPIVMSRPCGKGNRPKKKVTFSSSSIVFIITSDDFGQLDVTSKSGDSADVNAIQVRPL* 0

>ENCEPH_gasAcu Gasterosteus aculeatus (stickleback) 58%=homSap full
0 MNPDNGTREERSTDHSIFAVGTYKLLAFAIGTIGVFGFCNNVVVIVLYCKFKRLRTPTNLLVVNISLSDLLVSVIGINFTFVSCIRGGWTWSRATCIWDGFSNSLF 1
2 GIVSIMTLASLAYERYIRVVHAQVVDFPWAWRAIGHIWLYSLVWTGAPLLGWNRYTLEIHRLGCSLDWASKDPNDASFILLFLLACFFVPVGIMIYCYGNILYAVQM 0
0 LRSIQDLQTVQIIKILRYEKKVAVMFLLMISCFLLCWTPYAVVSMMEAFGRKNMVSPTVAIIPSFFAKSSTAYNPLICVFMSRK 0
0 FRRCLMQLLCSRVTCLQCNLKERPLAPVQRPIRPIVVSAACGGGRVRPKKRVTFSSSSIVFIITRNDIRHTDVTSNTRESSEANVFQVRPL* 0

>ENCEPH_oryLat Oryzias latipes  58%=homSap full
0 MNPANESRAGRHEERSVFAVGTYKLLTVIIGTIGVFGFCNNLLVILLYCKFKRLRTPTSLLLVNISLSDLLVSVVGINFTLASCVKGRWMWSQATCVWDGFSNSLF 1
2 GIVSIMTLAALAYERYIRVVHAQVVDFPWAWRAIGHIWLYSLAWTGAPLLGWNRYTLEIHQLGCSLDWASKDPNDAAFILLFLLGCFFVPVGIMIYCYGNILYAVRM 0
0 LRSIEDLQTVQIIKILRYEKKVAAMFLLMISCFLVCWTPYAVVSMMEAFGKKSMVSPTVAIVPSFFAKSSTAYNPLIYVFMNRK 0
0 FRRCFLQLLGSRLCSKISWLQCTLKEHPLTPVERPIRPIVASTSCGSRHRPKKRVTFNSSSIVFMITGDEFQQLDVTSKSRNSSEANVFHVRPL* 0

>ENCEPH_calMil Callorhinchus milii (elephantfish) wgs frag  
0 MNPTNSTEPQEEHLFSPNTYKLLAVIIGTIGIVGFCNNILVLLLYYKFKRLRTPTNLLLVNISVSDLLVSVFGLSFTFVSCTQGRWGWDSAACVWDGSHSLF 1
2 GTVSIVTLTVLAYERYIRVVNAKATNFPWAWRAITYTWFYSLAWSGAPLV
0 0
0 YRRCLSQLFCSHLMSLQWSIKDPSSKARNDMPVKPIVLSQKGDRPKKRVTFSSSSIVFIITSDDTQELGSIAGSNATQISIVQVQPL* 0

>ENCEPH_squAca Squalus acanthias (dogfish) Gt 0...2...0.0 indel x x x x 202 aa 000 nm no_ref genome fragment   
0 MNAANSTDTREESLFSPGTYQVLAVIIGTIGVVGFCNNLLMLVLYCKFKRLRTPTNLFLVNISISDLLLSVFGVIFTFVSCVKGRWVWDSAACVWDGFSNCLF 1
2 GISSIMSLTVLAYERYIRVVNATAIDFSWAWRAITYIWLYSLAWTGAPLIGWNSYTLELHRLGCSVNWDSRNPSDTSFVLFLFLGCLLCPIGVIAYCYG

>ENCEPH_petMar Petromyzon marinus (lamprey) no_ref genome fragment   
0 MQSPKQDSLHYAGDTGAKAAPDSAQGNASALGSNFLLHGGDLGEGSTAFSAATFRLLAGVVGTIGVAGFLNNLLLVALFVGFKRLQTPTNLLLVNISLSDLLVSVFGNTLTLVSCVRRRWVWGNGGCVWDGFSNSLF 1
2 GIVSISTLTALSYERYARLIKAQVLDFSWAWRAVTYTWLYSAAWTGAPLLGWSRYVLEKHGLGCSIDWASSNPPDAAFVLFFFLGCLAAPLLVMGFCFGRIALAITQ 0
0 CWSPYAVASLFVASGFEHLVSPPVSIVPSLLAKSNAVCNPLLFLLMSGN 0

>ENCEPH4_braFlo Branchiostoma floridae (amphioxus) Gt -ZFYVE1 +RTF1 -CES1 -POMT2 402 aa 12435605 AB050608 Amphiop4 new exon 12 and 34
0 MALYNNTSSPSQDLLWDAPYSQGHIWDNSSASNSSEDVMDQGKVELQDFSDAGYTAIATCLALI 1
2 GFVGFTNNFVVILLIGCHRQLRTPFNLLLLNMSVADLLVSVCGNTLSFASAVRHRWLWGRPGCVWYGFANSLF 1
2 GIVSLVTLSALAFERYCVVVRSSDMLTYKSSLVVITFIWLYSLLWTSLPLLGWSSYQFEGHN 0
0 VGCSVNWVQHNPDNVSYIVTLMVTCFFVPMVVVCWSYAWIWRTVRM 0
0 SSEAKPECGNSQNAGRLVTTMVVVMIICFLVCWTPYAVMALIVTFGADHLVTPTASVIPSLVAKSSTAYNPIIYVLMNNQ 0
0 FREFLLARLQRVCCRQQAVPRVTPMDDNVHVRLGGEGPSQSQQFLPAGENVENVDMLEYVQENCKPKADSLSTISE* 0

>ENCEPH4_braBel Branchiostoma belcheri (amphioxus) AB050608 full Amphiop4 introns from braFlo PUBMED 12435605 
0 MPLYNTSSGPTQGLPWDTPYSQDPIWNDSSPSNSSEDAVVDQGRGELQDFSDAGYTAIATGLALI 1
2 GLVGSMNNFVVILLIGCHRQLRTPFNLLLLNVSVADLLVSVCGNTLSFASAVQHRWLWGRPGCVWYGFANSLF 1
2 GIVSLVTLSALAFERYCVVVRSSEMLTYKSSLGMIAFIWMYSLLWTSLPLLGWSSYQFEGHS 0
0 VGCSVNWVKHNVNNVSYIITLMVTCFFVPMVVVCWSYACIWRTVRM 0
0 SAEMKSEFGNPQNTGRLVTTMVVVMIVCFLVCWTPYTVMALIVTFGADHLVTPTASVIPSLVAKSSTAYNPIIYVLMNNQ 0
0 FREFLLARLRTFCCRQPRMLRVTPMDDNAHARLVGEGPSHAQQVIPSEENGENVEMRKVQGNQLKADSLSTISE* 0


>ENCEPH_strPur Strongylocentrotus purpuratus GLEAN3_03451 modified terminal exon by extending penultimate to stop codon
0 MSLATKKHFIRNAVEEGGHLLEKWDKGG 2
1 YAFIMTFLGLNSLMSHAVIAVDRYLVITKPHF 1
2 GIVVTYPKAFLMISIPWVFSFAWAVFPLAGWGEFTYEGTGAWCSVRWDSDQPQIMSYVLAMMFLTFISSIVIMMYCYICIFLTTRRMPRWATSNSIKTHERNRRRR 2
1 EQKLLKTLIAIAIAFLVAWSPYAITSMIVVFGGSELLSLTATTLPSLFAKSSVMINPIIYAVTSRVFRKSLKK 0
0 MLTSFFPGCMTYIMTDKSPPSSSRPIQLGLCKYHFLY* 0
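
The records in this reference set follow a FASTA-like layout: a ">" header line, then one line per exon with the exon's peptide flanked by intron phase digits. Judging from the records themselves, the trailing digit of one exon and the leading digit of the next add up to a multiple of three (0 then 0, 1 then 2, 2 then 1), and internal "*" characters mark stop codons in pseudogenes. The Python sketch below parses that layout under those inferred conventions. The function names are illustrative, and lines lacking a phase digit (common in fragments) are kept with the phase recorded as None.

```python
def parse_records(text):
    """Parse '>name description' headers followed by '<phase> PEPTIDE <phase>' exon lines."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name, _, desc = line[1:].partition(" ")
            records.append({"name": name, "desc": desc.strip(), "exons": []})
            continue
        if not records:
            continue                       # ignore sequence text before any header
        tokens = line.split()
        lead = trail = None
        if tokens and tokens[0] in ("0", "1", "2"):
            lead = int(tokens.pop(0))
        if tokens and tokens[-1] in ("0", "1", "2"):
            trail = int(tokens.pop())
        # Remaining tokens are peptide pieces; embedded spaces are closed up.
        records[-1]["exons"].append((lead, "".join(tokens), trail))
    return records


def phase_breaks(exons):
    """Indices i where exon i and exon i+1 carry phases that do not pair up."""
    bad = []
    for i in range(len(exons) - 1):
        trail, lead = exons[i][2], exons[i + 1][0]
        if trail is None or lead is None:
            continue                       # unannotated boundary, nothing to check
        if (trail + lead) % 3 != 0:
            bad.append(i)
    return bad


def protein(exons):
    """Concatenate exon peptides in order."""
    return "".join(seq for _, seq, _ in exons)


def internal_stops(seq):
    """Count '*' characters other than a terminal stop."""
    return seq.rstrip("*").count("*")
```

A typical use would be to run parse_records over the pasted reference set, flag any record where phase_breaks is non-empty as a likely transcription error or missing exon, and use internal_stops on protein(...) as a quick pseudogene indicator. Gap characters and lowercase residues are passed through unchanged.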

Reference set of 31 TMT genes from amphioxus to marsupial

>TMT_monDom Monodelphis domestica shortened final exon DFPEVSEKQLCLLS PEVWPQP +NCK2 -UXS1 +TMT -ST6GAL2 (overlap) -RALY
0 MSNNLTTNLSLEALLSASEDKQRNGLSRTGHTIVAVFLGIILIFGSISNFIVLVLFCKFKVLRNPVNMLLLNISISDMLVCLSGTTLSFASSIQGRWIGGKHGCRWYGFANSCF 1
2 GIVSLISLAILSYERYRTLTLCPGQGADYQKALLAVAGSWLYSLVWTVPPLIGWSSYGTEGAGTSCSVHWTSKSVESVSYIMCLFIFCLVIPILVMVYFYGRLLYAVKQ 0
0 VGKIRKTAARKREYHVLFMVVTAVICYLICWVPYGMIALLATFGPPGVVSPVANVVPSILAKSSTVCNPIIYVLMNKQ 0
0 FYKCFLILFHCQPAQSGPDVSLCPSNVTVIQLGQRKNKDAPGSI*

>TMT_macEug Macropus eugenii frag
0 MSINLTANLSFGTLLPDSEEKQRSGLSRTGHTVTAVFLGLILILGVINNFIVLVLFCKFKVLRNPVNMLLLNISISDMLVCLTGTTLSFASSIRGRWIAGYHGCRWYGFANSCF 1
2 GIVSLISLAVLSYERYRTLTLCPRQGTDYHKALLAVAGSWLYSLIWTVPPLIGWSSYGTEGAGTSCSVHWTSKSVESVSYIMCLFIFCLVIPILFMVYFYGRLLYTVKQ 0
0 VGKIRKSAARKREYHVLFMVVTAVICYLICWVPYGMIALLATFGPPGVVSPVANVVPSILAKSSTVCNPIIYILMNKQ 0
0 FYKCFLILFHCQPASSASDASLCPSKMTVIQLGQRKDKEVPCAIQDLPEVSKKQLCLLSPESNVAPSSGHPQEKMEEKPLSE*  0

>TMT_ornAna Ornithorhynchus anatinus frag
0                        GLSRTGHTMVAVFLGIILVFGFMNNLIVLILFCKFKALRNPVNMIMLNISASDMLVCVSGTTLSFASNISGRWIGGDPGCRWYGFVNSCL 1
2 GIVSLISLAVLSYERYRTLTLHPKQSTDYQKAVLAVGASWIYSLIWTIPPLLGWSSYGTEGAGTSCSVHWSSKSPVSVSYIVCLFIFCLVIPVLVMIYCYGRLLYAVKQ 0
0 IGKARKTAARKREYHVLFMVITTVICYLVCWMPYGVTALLATFGQPGTVSPEASVIPSILAKSSTVCNPIIYILMNKQ 0
0 FYKCFLILFHCQPPRAADAPSTYPSQVMVIQLNQRRSRETAGAPQVLLEMKHQTLHLLGPQLHETPSWERSTPVHPE* 0

>TMT_galGal Gallus gallus XM_001234388 mRNA multiple tissue opsin full +NCK2 -UXS1 +TMT -ST6GAL2 (overlap) +SLC5A7 +SULT1C4
0 MNHTWTYNLSFGAPTDPVEPRAGLSRNGHTVVAVFLGFILFFGFLNNLIVLILFCKFKTLRNPVNMLLLNISISDMLVCISGTTLSFASNIHGKWIGGEHGCRWYGFVNSCF 1
2 GIVSLISLAVLSYERYSTLTLCNKRSDDYRKALLAVGGSWVYSLLWTVPPLLGWSSYGIEGAGTSCSVRWSSETAESTSYIICLFIFCLVIPVMVMMYCYGRLLYAVKQ 0
0 VGKIHKNTARKREYHVLFMVITTVICYLVCWIPYGVIALLATFGKPGVVTPVASIIPSILAKSSTVCNPIIYILMNKQ 0
0 FYKCFRQLFHCQPPSSTDGEPTCHSKVTVIQLNQKTDGGKLCNNKPRPETDNKVTSLLHPEPGLEPAAKTVPPM*  0

>TMT_taeGut Taeniopygia guttata 
0 MNHTWMYNLSFGAPAHPVEPRAGLSRSGHTVVAVFLGLILFFGFLNNLIVLILFCKFKTLRNPVNMLLLNISVSDMLVCISGTTLSFASNIRGKWIGGDHACRWYGFVNSCF 1
2 GVVSLISLAVLSYERYNTLTLCHKRSDDFRKALLAVAGSWIYSLVWTVPPLLGWSSYGVEGAGTSCSVRWSSESAESTSYIICLFVFCLVVPVMVMMYCYGRLLYAVKQ 0
0 VGKIHKNAARKREYHVLFMVIPTVICYLVCWIPYGVIALLATFGKPGAVTPITSIIPSILAKSSTVCNPIIYILMNKQ 0
0 FYKCFRQLFHCQPPSSTDGEPTCHSKVTVIQLDQRADGGNMCNNEPHPETDSKMTSLLCPETTSKATPPTS* 0

>TMT_anoCar full +TMT -ST6GAL2 (overlap) +SLC5A7
0 MSELSSNLTFNMSTSIEEPGSGLSRMGHNIVAVFLGLILVFGFLNNLVVLILFCKFKTLRNPVNMLLLNISASDMLVCISGTTLSFVSNIYGRWIGGEHGCRWYGFVNSCF 1
2 GIVSLISLAILSYERYSTLTQTNKRGSDYQKALLGVGGSWLYSLIWTVPPLIGWSSYGLEGAGTSCSVRWTSETLESVTYIICLFIFCLAIPVLVMIYCYARLFYAVKQ 0
0 VGKLRKTSARKREFHVLFMIITTIICYLICWMPYGVIALLATFGRPGLVSPVASVIPSILAKSSTVFNPIIYILMNKQ 0
0 FYKCFLMLLHCQPSSVADGETICQSKVMAIHQNQKAQGGVILKSQVVPQMDEKAICLLSPESSLDPVLESTPQLSKENSFL* 0

>TMT_xenTro full -UXS1 +TMT -ST6GAL2 (overlap) +SLC5A7
0 MSTIKNWTTNISVENSMSYIENDLSLPTEAVLSRTGHTVVAIFLGFILIFGFLNNFVVLILFCKFKTLRTPVNMMLLNISASDMLVCVSGTTLSFTSSIKGKWIGGEYGCQWYGFVNSCF 1
2 GIVSLISLAILSYERYSTLTLYNKGGPNFKKALLAVASSWLYSLVWTVPPLLGWSSYGREGAGTSCSVRWTSESVESVSYIICLFIFCLALPVFVMLYCYGRLLYAVKQ 0
0 VGKIRKIAARKREYHVLFMVITTVICYLLCWLPYGVVALLATFGRPGVISPVASVVPSILAKSSTVFNPIIYILMNKQ 0
0 FYKCFLILFHCHPTSSADGKSICQSNYTVIQLNQKLNNIVAIPGQTQIPESVDKMPCIHRQNNESPSDQMPQSTTEHLISGT* 0

>TMT_danRer Danio rerio  -UXS1 +TMT -ST6GAL2 (overlap) +GPR89A -pdzk1l
0 MFFEQADLNYSFNMSEEDRLTLLDEDWSDSPMETLSRAGFIALSVFLGFIMTFGFFNNLVVLVLFCKFKTLRTPVNMLLLNISISDMLVCMFGTTLSFASSVRGRWLLGRHGCMWYGFINSCF 1
2 GIVSLISLVVLSYDRYSTLTVYHKRAPDYRKPLLAVGGSWLYSLIWTVPPLLGWSSYGLEGAGTSCSVSWTQRTAESHAYIICLFVFCLGLPVLVMVYCYGRLLYAVKQ 0
0 VGKIRKTAARKREYHVLFMVITTVVCYLLCWMPYGVVAMMATFGRPGIISPVASVVPSLLAKSSTVINPLIYILMNKQ 0
0 FYRCFRILFCCQRSLLQNGHSSMPSKTTVIQLNRRVNSNAVACTAQISTGTHNHDCSTHVTERSNPPEVIP* 0

>TMT_tetNig Tetraodon nigroviridis  -UXS1 +TMT -ST6GAL2 frameshifted assembly
0 MFSGQAGLNSSFNLSDGRGLEDAPAGRGRLSPTGFVVLSVVLGFIITFGFLNNFIVLLLFCKFKKLRTPVNVLLLNISVSDMLVCLFGTTLSFASSLRGRWLLGRSGCNWYGFINSCF 1
2 GIVSLISLVILSHDRYSTLTVYNKQGINYRKPLLAVGGTWLYSLLWTVPPLLGWSSYGIEGAGTSCSVSLDGADGPVPRLHHLPLHLLPGVTGAGDGLLLQQAAVGRQ 0
0 VGKIRKTSARKREYHILFMVVTTAACYLVCWMPYGVVAMMATFGPPNIISPVASVVPSLLAKSSTVINPLIYILMNKQ 0
0 FYKCFLILFHCSHWSADNGTTSVPSKITVIQLNRRAYSNTVACADPLSTDALKQCCSAKNASTIEVKLS* 0

>TMT_takRub Takifugu rubripes (pufferfish)
0 MFSGQAGLNYSFNLSDDRELLDAPAGRAKLSPTGFVVLSVVLGFIMTFGFLNNFVVLLLFCKFKKLRTPVNMLLLNISVSDMLVCLFGTTLSFASSIRGRWLLGRIGCSWYGFINSCF 1
2 GIVSLISLVILSYDRYSTLTVYNKQGINYRKPLLAVGGTWLYSLFWTVPPLLGWSSYGIEGAGTSCSVSWTVQTAQSHAYIICLFTFCLGIPILVMIYCYSRLLWAVKQ 0
0 VGRIRKTAARKREYHILFMVVTTAACYLVCWMPYGVVAMMATFGPPNIISPVASVVPSLLAKSSTVINPLIYILMNKQ 0
0 FYKCFLILFHCGHWSADNGNTSMPSKTTAIQLNRRVYSNTVACADQLSTDALKQCCSANTISTKNTSTVEGKLS* 0

>TMT_gasAcu
0 MVFGQAGLNHSFNLSDDRELLDTSAGRAKLSPTGFVVLSVMLGFIMTFGFVNNLVVLLLFCKFKKLRTPVNMLLLNISVSDMLVCLFGTTLSFASSLRGKWLLGRSGCSWYGFINSCF 1
2 GIVSLISLVILSYDRYSTLTVYNKAGPDYRKPLLAIGGSWLYSLFWTVPPLLGWSSYGIEGAGTSCSVSWTVQTAQSHAYIICLFTFCLGLPMLVMIYCYSRLLLAVKQ 0
0 VGRIRKTAARRREYHILFMVLTTAACYMLCWMPYGVVAMMATFGPPNIISPVASVVPSLLAKSSTVINPLIYILMNKQ 0
0 FYRCFLILFHCKHWSAENHNTSMPSKTTVIHLNRRVCSNTLPCTAQASTDAANHFCSTSATKHTSPPLQGHGLSLNVLNMIRQENHSHDEAAKNQLDCLT* 0

>TMT_oryLat Oryzias latipes (medaka)
0 MFSGQTGLNFSFNQSDDRELEDTPAGSAKLSQAGFVVLSVVLGFIMTFGFLNNFVVLILFCKFKKLRTPVNMLLLNISVSDMLVCLFGTTLSFASSIRGRWLLGRGGCSWYGFINSCF 1
2 GIVSLISLVILSYDRYSTLTVYNKGGLNYRKPLLAVGGSWLYSLFWTVPPLLGWSSYGLEGAGTSCSVSWTANTAQSHAYIICLFIFCLGLPILVMIYCYSRLLLAVKQ 0
0 VGKIRKTAARKREYHILFMVLTTAACYLLCWMPYGVVAMMATFGPPNIISPVASVVPSLLAKSSTVINPLIYILMNKQ 0
0 FYRCFLILFHCDHWSSENGNTSVPSKTTVIPLNRRIYTNTVAQISTDNAN* 0

>TMT_ictPun Ictalurus punctatus (catfish) transcript from whole fry
0                          WLLGRTGCMWYGFINSCF 1
2 GIVSLISLMILSYERYSTMTVYNNQGPNYRKHLLAVGGSWLYSLIWTVPPLLGWSSYGLEGAGTSCSVSWTDHSPKSHAYIICLFIFCLGLPVLLMVYSYGRLLYAVKQ 0
0 LGKIHKTARRRDYHLLFMITTTVVCYLLCWTPYSVVALMASFGRPGIITPVASIIPSLLAKSSTVINPVIYIFMNKQ 0
0 FYRCFRTLLGYKERSAVPDDHSLMATKNTAIQLKCIMHNNPVPSPAHTPPPFF

>TMT_oncMyk Oncorhynchus mykiss frag 2 white blood cell transcripts
2   GIFCLGLPVLVMVYCYGRLLYAVKQ 0
0 VGRIRKSAARRREFHILFMVITTVVCYLLCWMPYGVIAMMATFGHPGLITPIVTVVPSIMAKSSTVINPLIYILMNKQ 0
0 FYRCFLILFHCKRPSSENGVSSMPSKTTVIQLNRRGHSNNVALTPQLSTGANHHNHNHTVECSTNNREVTTPIGLPHSGWL* 0

>TMTa1_danRer Danio rerio NM_001118899 full +PBX3 +TNK2 +TMTa1 -PAP2D +LPPR4
0 MIVSNLSVLSCRRNSALCLGAVEGHLEASSSYRTLSPTGHILVAVSLGFIGTFGFLNNLLVLVLFGRYKVLRSPINFLLVNICLSDLLVCVLGTPFSFAASTQGRWLIGDTGCVWYGFANSLL 1
2 GIVSLISLAVLSYERYCTMMGSTEADATNYKKVIGGVLMSWIYSLIWTLPPLFGWSRYGPEGPGTTCSVDWTTKTANNISYIICLFIFCLIVPFLVIIFCYGKLLHAIKQ 0
0 VSSVNTSVSRKREHRVLLMVITMVVFYLLCWLPYGIMALLATFGAPGLVTAEASIVPSILAKSSTVINPVIYIFMNKQ 0
0 FYRCFRALLNCDKPQRGSSLKSSSKTKPFRPGRRTDNFTFMVASVGPNQTNPVEDGPPSADNTKPAVLSLVAHYNG* 0

>TMTa_takRub Takifugu rubripes (pufferfish) -CALD1 +TNK2 -RAB18 +ABI1 12670711 AF402774 full
0 MIVSNVSLSGCAGVNGAVCAAEGHQAGGSDRSTLTPTGNLVVSVFLGFIGTFGLVNNLLVLVLFCRYKMLRSPINLLLMNISISDLLVCVLGTPFSFAASTQGRWLIGEAGCVWYGFANSLF 1
2 GVVSLISLAVLSFERYSTMMTPTEADPSNYCKVCLGITLSWVYSLVWTVPPLFGWSSYGPEGPGTTCSVNWTAKTTNSISYIICLFVFCLIVPFLVIVFCYGKLLCAIRQ 0
0 VSGINASTSRKREQRVLCMVVIMVICYLLCWLPYGVVALLATFGPPDLVTPEASIIPSVLAKSSTVINPIIYVFMNKQ 0
0 FYRCFLALLCCQDPRSGSSMKSSSKVATKAKGVTPTGQRRTDFLYMVASLGRPAATIPQLGPSFDATNDFTKPPSSDTIKPVVVSLAAHCDG*

>TMTa_tetNig Tetraodon nigroviridis full
0 MIASNASVSGCAGVHGAACAADAPPAGGSHRSSSSLTPTGNLVVSVFLGLIGTSGLVSNLLVLVLFCRFKVLRSPINLLLVNISVSDLLVCVLGTPFSFAASTQGRWLIGAAGCVWYGFVNSLFG 1
2 GIVSLISLAVLSFERYSTMMTPTEADSSNYCKVCLGIGLSWVYSLLWTVPPLLGWSSYGPEGPGTTCSVNWTAKTANSVSYIICLFVFCLILPFLVIVFCYGKLLCAIRQ 0
0 VSGVNASMSRRREQRVLFMVVVMVICYLLCWLPYGVVALLATFGPPGLVTPAASIIPSILAKSSTVINPVIYVFMNKQ 0
0 FSRCFLSLLCCEDPRSSTSLRSSSRVTTKAVRGGTLTGQRRTNHLLYMVAALGRPVATAMPQLGPSFDATYDITKAPSSDNHQPVVVSLEAHG* 0

>TMTa_gasAcu Gasterosteus aculeatus (stickleback)   +TNK2 +ENC full
0 MIVSNLSLSGCAGVSSALCAAAGEGHLSGGSHRNTLTPTGHLVVAVCLGFIGTLGLMNNLLVLVLFCRYKMLRSPINLLLINISISDLLVCVLGTPFSFAASTQGRWLIGEGGCVWYGFANSLFG 1
2 GIVSLISLAVLSYERYSTMVAPTEADSSNYHKISLGITLSWVYSLIWTAPPLFGWSHYGPEGPGTTCSVDWTARTANSISYIICLFVFCLIVPFLVIVFCYGKLLCAIRQV 0
0 VSGINASLSRKREQRVLFMVVIMVVCYLLCWLPYGIMALMATFGPPGLITPVASIIPSVLAKTSTVINPVIYVFMNKQ 0
0 FYRCFKALLRCEAPRPSSSLKSSSKVPTKAMRGAAVTGPRHTNNFLFVVASLGRPVATIPQLGPSVEPTIDVTGGPSSDNNKPVIVSLVAQCDG* 0

>TMTa_oryLat Oryzias latipes (medaka) genome SLC12A3 two frags
0 MLVSNVSLGGCAEFNSALCAGAGEEHLGGGSYRTTLTPTGHLIVAVCLGFIGTFGLVNNLLVLVLFCRYKILRSPINLLLINISISDLLVCVLGTPFSFAASTQGRWLIGEGGCVWYGFANSLCG 1
2 GIVSLISLAVLSYERYSTMMTPAEADSSNYRKISLGIILSWGYSLLWTLPPLFGWSHYGPEGPGTTCSVDWTAKTANNISYIICLFVFCLIVPFMVIVFCYGKLLYAIKQV 0
0 VSGINVSVSRKREQRVLFMVVIMVICYLLCWLPYGIMALLATFGPPDLVTPEASIIPSVLAKTSTAINPVIYVFMNKQ 0
0 * 0

>TMTa_pimPro Pimephales promelas frag
   GHLVVAVCLGFIGTGFLNNTLVLILFCRYKVLRSPMNYLLVSIAVSDLLVCVLGTPFSFAASTQGRWLIGRAGCVWYGFINSCL 1
2 GVVSLISLAVLSYERYCTMMGATQADSTNYKKVAMGIAFSWIYSMVWTLPPLFGWSCYGPEGPGTTCSVNWAARTANNVSYIICLFFFCLILPFIVIVYSYGRLLQAITQ 0
0 VSRINTVVSRKREQRVLFMVITMVVCYLLCWLPYGIMALLAAFGRPGLVTPAASIVPSVLAKTSTVINPIIYIFMNKQ 0
0 FCRCFHALIMCTTPQRGSSFKNSSKVTKTLRTVRRANGQNVTFAVASAGHPTICAPH

>TMTb_danRer Danio rerio +TNK1 +TMTb -MYEOV2
0 MIESNVSRSCEWCAGGGEGTGAHLDENHSDHSLSPTGHLVVAVCLGFIGTFGFLNNTLVLVLFCRYKVLRSPMNCLLISISVSDLLVCVLGTPFSFAASTQGRWLIGRAGCVWYGFINSFLG 1
2 GVVSLISLAVLSYERYCTMMGSTQADSTNYRKVVIGIAFSWIYSMVWTLPPLFGWSCYGPEGPGTTCSVNWAARTPNNVSYIVCLFVFCLILPFIVIVYSYGRLLQAITQ 0 
0 VSRINTVVSRKREQRVLFMVVTMVVCYLLCWLPYGIMALLATFGHPGLVTPAASIVPSLLAKSSTVINPIIYIFMNKQ
0 FCRCFHALIMCTTPERGSSFKNSSKVTKTLRTVRRANGQNVTFAVASAVHRTPYSDRQKSSSEGEKLPPATGQGTSKPVVSLVAYYNG* 0
 
>TMTb_takRub Takifugu rubripes (pufferfish) +TFRC +TMTb +CHES1 -MYEOV2 -ARHGAP21 full
0 MIVCNVSLSCAHCPGEGTAANDAYAQASGSLATPTLSQRGHLVVAVCLGFIGTVGFLSNFLVLALFCRYRALRTPMNLMLVSISASDLLVSVLGTPFSFAASTQGRWLIGRAGCVWYGFVNACL 1
2 GIVSLISLAVLSYERYCTMVSSTIASNRDYRPVLGGICFSWFYSLAWTVPPLLGWSRYGPEGPGTTCSVDWRTQTPNNISYIVCLFTFCLLLPFFVILYSYGKLLHTIRQ 0 
0 VRRVSSTVTRRREHRVLVMVVAMVVCYLICWLPYGVTALLATFGPPNLLTPEATITPSLLAKFSTVINPFIYIFMNKQ 0
0 FYRCFRAFLNCSTPKRDSTVRTFTRISLRALRQDQQQKGSALAPSSARPTPNSIHESSLKGSHSTPSNGGAAAAKSPAANRSKPKLILVAHYRE* 0

>TMTb_tetNig Tetraodon nigroviridis
0 MIVCNLSLSCAHCPGGGAAATDAYAEAPGSLAPPTLSQRGHLVVAVCLGAIGTVGFLSNLLVLALFCRFRALRTPMNLMLVSISASDLLVSVLGTPFSFAASTQGRWLLGRAGCVWYGFVNACL 1
2 GIVSLISLAVLSYERYCTMMASTMASNRDYRPVLLGICFSWFYSLAWTVPPLLGWSRYGPEGPGTTCSVDWRTQTPNNISYIVCLFAFCLLLPFCVILYSYGKLLHTIRQ 0
0 VSSVSSAVTRRREHRVLVMVVAMVVCYLICWLPYGVTALLATFGPPNLLTPEATITPSLLAKFSTVINPFIYIFMNKQ 0
0 FYRCFRAFLSCSSPERGSTVRTFTRISLRAVCQRKQQRVSAPAASSACPTPNSIHHSSRKGSHSASSNSGTAAAAKTPAANSSKPKLILVVHYRE* 0

>TMTb_gasAcu Gasterosteus aculeatus (stickleback) full
0 MIVCNVSLSCVHCPGGGAGGTAATATGAYEEVSDSLPAPSLSPKGHLVVAVCLGFIGTFGFLSNFLVLALFCRYRALRTPMNLLLVSISASDLLVSMVGTPFSFAASTQGRWLIGRAGCVWYGFVNACL 1
2 GIVSLISLAVLSFERYSTMVKPTVADGRDFRPALGGIAFSWLYSVAWTVPPLLGWSEYGPEGPGTTCSVDWKTQTANNISYIVCLFVFCLVLPFCVILYSYSRLLQAIRQ 0
0 IPQVSVVSSVVTRHREQRVLAMVVVMVACYLVCWLPYGVAALLATFGPRDLLSPEASITPSLLAKFSTVVNPFIYIFMNKQ 0
0 FYRCFRAFLSCSTPERGSTLKTFSRPTKTLRAGRHEKGRRVSAAAPSTAQPTRNSAPRSSQGANHASATPPPSPADGRCAAAGAAKPKRTLVAHYRE* 0

>TMTb_oryLat Oryzias latipes (medaka)
0 MIVPNASLSCAHCDGDAAEQDAPGSAAAPSLSPTGHLVVAVCLGLIGTCGFLSNLLVLALFCRYRALRTPMNLLLVSISVSDLLVSVLGTPFSFAASTQGRWLIGRAGCVWYGFINACL 1
2 GIVSLISLAVLSYERYSTVMTPNMADGRDFRPALGGICFSWLYSVAWTVPPLLGWSRYGPEGPGTTCSVDWKTQTPNNISYIICLFTFCLLLPFGVIVYSYGKMLRVIRQ 0
0 VSQVRSMSSVVTRRREQRVLVMVVTMVVCYLVCWLPYGIAALLATFGPRDLLTPAASITPSLLAKFSTVINPLIYIFMNKQ 0
0 FYRCFWAFFCCSTPEQVSTLRTFSRVTKTIRTFRQERELHVSAPAPSSGLPTPNSIQKGNNHVDPSSINQACAASDSPDSRKPKVVLVAHYQE* 0

>TMTa1_calMil Callorhinchus milii (elephantfish) wgs frag 
0 MLNSSPNSSPSLPLSQVGWTGLSRTGLTVVAVCLGIIMVLGFLNNLLVLVLFCKYKVLRSPMNMLLLNISVSDMLVCICGTPFSFAASVQGRWLVGEQGCKWYGFANSLF 1
2 GIVSLMSLTILSYDRYITITGTTEADITNYNKTIVGIALSWIYSLMWTLPPLFGWSNYGPEGPGTTCSVNWQSKEVSSKSYIICLFIFCLLMPFLVIVYCYGKLVLAVRK 0
0        AQTREHRILLMVISMVTFYLLCWLPYGTVALIGTFGNADLITPTCSVIPSILAKSSTVINPVIYVIMNKQ 0

>TMTa2_calMil Callorhinchus milii (elephantfish) wgs frag 
0 VSANNSMGRTRENKLLIMVTFMIICFLLCWLPYGIVALLATFGSPGLITPTASIIPSVLAKTSTVYNPIIYIFMNKQ 0

>TMTx_braFlo Branchiostoma floridae (amphioxus) XM_002207814 frag with assembly duplication, no N-term even in v2.0 47% TMT5_braFlo + insect TMTs 
0 43 VAAILALIGVLGIVNNSTTLYLVGRYKQLRTPFNILMVNLSVSDLLMCVLGTPFSFVSSLHGRWMFGHSGCEWYGFICNFL 1
2 GIVSLITLTVISYERYLLMKRLPNERILSYRAVALAVVFIWCYSLLWTAPPLVGWSSYGPE 0
0 GYGISCSVNWESRTANDTSYIVAYFVGCLVFPVAIIVISYTRLILYMRQ 0
0 QAPSAPMQMLVRREKRVTKMVVVMIMGFTICWTPYTIVALIVTCGGEGIITPAAATVPALFAKSSVVYNAAIYVAMNNQ 0
0 FRKCFLRSLNCRSQPRDPSSQQYTLKTNQVGMSTSGSQAARTADRIKTVHVATANPQDHRSSSGQAVEDNGGFRKSLTHSLPLNSISTLLEAEK* 0

>TMT5_braFlo Branchiostoma floridae (amphioxus) extra 00 intron Amphiop5
0 MLGMHNVMNATDYDNNNATFAAWNFQRNGTTEEEVEFSGFDTVAVVIAAIGIAGFLSNGAVVLLFLKFRQLRTPFNMLLLNMSVADLLVSVCGNTLSFASAVRHRWLWGRPGCVWYGFANHLF 1
2 GLVSLISLAVISYERYRMVVKPKGPGSSYLTYNKVGLAIIFIYLYCLLWTTLPIVGWSSYQLE 0
0 GPKISCSVAWEEHSLSNTSYIVAIFIMCLLLPLLIIIYSYCRLWYKVKK 0
0 GSQNLPPAIRKSSQKEQKIARMVVVMITCFLVCWLPYGAMALVVSFGGESLISPTAAVVPSLLAKSSTCYNPLVYFAMNNQ 0
0 FRRYFQDLLCCGRRLFDASASVNTCNTSAMPRHSPVFQKPDSDQYNGIQKSREPQMRTTGQNAPYRQWIEMQTIAVVVKADEVNNKFGEVKT* 0

>TMT5_braBel Branchiostoma belcheri (amphioxus) AB050609 full introns from braFlo Amphiop5 extra Nfrag in mrna 
0 MLGIYNVVNATEYGNNTTFAAWDFKRNGTGGEEEVEFFGYDAVAGVIAIIGVVGFVSNGAVVVLFLKFPQLRTPFNLLLLNMAVADLLVSVCGNTLSFASAVRHRWLWGRPGCVWYGFANHLF 1
2 GLVSLISLAVISFLRYRMVVKPKGPGSSYLTYTKVGLAILFIYLYCLLWTTLPIAGWSSYQLE 0
0 GPKIGCSVAWEEHSWSNTSYIVVLFITCLFAPLLIIVYSYYRLWHKVKQ 0
0 GSRNLPAAMRKSSQKEQKIAMMVIVMITCFMVCWLPYGAMALVVTFGGERLISHTAAVVPSLLAKSSTCYNPVVYFAMNSQ 0
0 FRRYFQDLLCCGRRLFDVSQSVVTGNTAMPRNNSQGFRKDDSDQKQDNGLPKQSEGPMCDHSSNESQMEGSRHNTAASQQWIEMQTIAVVVKAVEVDTSAANEP* 0

>TMTy_braFlo Branchiostoma floridae (amphioxus) FE572481 (to other allele) gastrula XM_002222645 flawed, allele dup, 39% ENCEPH4_braFlo new
0 MASAGQNVTFPAIDTMAPTPEALTSDPTTPAYFTTEQHLLMAVWLGFIGSFGFVTNLLTVLVFWCFKSLRTPFHLYLGGIALSDLLVAALGSPFAVASAVGERWLFGRAVCVWYAFVNYFL 1
2 SVSIVTMATMSFSRYWVIIRPQSAPRLDTVYGACVVNALAWCYSFFWTIMPVLGWSRFTQ 0
0 VAAMTVCSLDWDHHTPLSKSYIPVAFLTCLFLPLGVIIFSVFKTTMHLRR 0
0 AAEVEDEVPNEVRAGRKTTRITLVMAGCWLVAWLPYACMALVIAAGGRVSPTVEVLATKFAKTSYIVNTIIYLVMEKE 0
0 FRKSLVLLLFCGRDPFDIQIEQPAYEKADVYVERLVTAEPMVEMEAVNVRPAQQEPARAPFGTPL* 0

>TMTPIN_stoPur Strongylocentrotus purpuratus GLEAN3_05569 0.2.2.0.0 16311335 opsin1 PIN-type introns no cdna no sacKow
0 MSNLMTGLVTNVNALSGIGNETPTTIGLSSLVVPVSRTTYNYLTVYTGFLTIFGILNNGIVMILFARFPSLRHPINSFLFNVSLSDLIISCLASPFTFASNFAGRWLFGDLGCTLYAFLVFVA 1
2 GTEQIVILAALSIQRCMLVVRPFTAQKMTHRWALFFISLTWIYSLIICVPPLFGWNRYTYEGPGT 1
2 ACSVAWNSPSPGDTSYIIFIFVLVLVIPFGIIIFCYGLLVYAVKK 0
0 ISRTQAALSSEAKADRKVSKMIFIMILFFLIAWTPYTGFSLYVTFGKNVVITPLAGTFPPFFAKLCTIHNPIIYFLLNKQ 0
0 FKDALIQLFCCGENPFDRDESEHEGRGGRHRHRTAPSATAHIGGRGRASSLPTATSMLDIPQAASTAASSSGKTQNKESLEKGPSTSETTNKRVFELSSKIQKFEISEKNNTPSSSELPGASSLSGALMPPRRAMKNQVGCLPPVDN* 0
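
The records above can be read programmatically. The following is a minimal sketch rather than a tool used to build this page. It assumes each record is a ">" header followed by one line per exon, and that the digits flanking each exon are intron phases, with any extra leading number (as in TMTx_braFlo) being a residue offset. That reading is consistent with the records here, where a trailing 1 is always followed by a leading 2 and a trailing 0 by a leading 0. The function names are illustrative only.

```python
def parse_exon_records(text):
    """Return {name: [(start_phase, exon_sequence, end_phase), ...]}.

    Assumed layout: '>name description' header, then one exon per line,
    optionally flanked by numeric tokens (phase or None when absent).
    """
    records = {}
    name = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            parts = line[1:].split()
            name = parts[0] if parts else "unnamed"
            records[name] = []
            continue
        if name is None:
            continue  # sequence text before any header is ignored
        tokens = line.split()
        # Purely numeric tokens are annotations, everything else is sequence.
        seq_tokens = [t for t in tokens if not t.isdigit()]
        if not seq_tokens:
            continue
        start = int(tokens[0]) if tokens[0].isdigit() else None
        end = int(tokens[-1]) if tokens[-1].isdigit() else None
        records[name].append((start, "".join(seq_tokens), end))
    return records


def protein(exons):
    """Concatenate exon translations and drop a terminal stop marker."""
    return "".join(seq for _, seq, _ in exons).rstrip("*")


def phase_mismatches(exons):
    """Indices i where exon i's end phase and exon i+1's start phase disagree.

    Assumes matching phases sum to 0 or 3; missing phases are skipped.
    """
    bad = []
    for i in range(len(exons) - 1):
        end, start = exons[i][2], exons[i + 1][0]
        if end is not None and start is not None and (end + start) % 3 != 0:
            bad.append(i)
    return bad
```

Passing the text of this section to parse_exon_records gives one exon list per sequence name. phase_mismatches can then flag records whose exon boundaries deserve a second look, and protein yields the concatenated peptide for alignment.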
