Iron sulfur clusters
Introduction
A surprisingly large number of nuclear proteins contain 4Fe-4S clusters, among them polymerases, helicases, primases, telomerases and photolyases, although no need has been explained in these enzymes for a cofactor otherwise associated with oxidation and reduction. Each cluster is inserted into its apoprotein in the cytoplasm, during the final stage of an assembly process that begins within mitochondria.
These 4Fe-4S clusters do not associate spontaneously with their target proteins, because they are too vulnerable to unwanted oxidation to exist free in solution. Instead, nascent clusters are passed along a series of mediating proteins, carrier scaffolds and conformational chaperones throughout a complex maturation process. That process and the gene products involved, which are conserved from yeast to human, have recently been reviewed in depth, and new results (1,2) have clarified the roles of the four main protein components that collaborate in the final stage of cytoplasmic assembly.
Not all extra-mitochondrial 4Fe-4S cluster proteins are assembled by this pathway, but the molecular basis for this specificity has not yet been determined. Indeed, a surprising number of proteins, some studied for decades, were only recognized as iron sulfur proteins in 2011-12.
The list of such proteins is still incomplete, because many unrelated homology classes carry Fe-S clusters, so no single diagnostic pattern can be used to scan the entire proteome. Often four conserved cysteines coordinate the cubane cluster, but their spacing within the primary sequence is not uniform and is difficult to distinguish from cysteine patterns that bind intrinsic zinc. Further confusing matters, zinc, seemingly an artifact, can replace bona fide 4Fe-4S clusters in proteins purified for crystallography in the presence of oxygen (1,2,3).
The early and middle stages of intra-mitochondrial iron sulfur cluster assembly are carried out by gene products of bacterial origin, relics of the alphaproteobacterial endosymbiosis that were transferred long ago to the nuclear genome. Not all components of the final cytoplasmic assembly have such a clear origin, however, whereas most targeted apoproteins (such as the primase large subunit PRIM2) clearly derive from the archaeal parent. The final stage of assembly thus presents two worlds in collision: bacterial proteins assembling iron sulfur clusters in unfamiliar archaeal proteins.
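To make the spacing problem concrete, the sketch below enumerates every set of four cysteines whose successive gaps fall within a chosen range. The gap limits (1 to 40 residues) are arbitrary assumptions and the function name `cys_quads` is hypothetical. The point is that any window permissive enough to catch real 4Fe-4S ligands will also catch zinc sites, such as the paired CxxC knuckles of a zinc ribbon, so cysteine counting alone cannot classify a protein.

```python
from itertools import combinations


def cys_quads(seq: str, min_gap: int = 1, max_gap: int = 40) -> list[tuple[int, ...]]:
    """Return 1-based positions of every C-x(n1)-C-x(n2)-C-x(n3)-C arrangement.

    Each gap n1..n3 (residues between consecutive cysteines) must lie in
    [min_gap, max_gap]. These limits are illustrative assumptions, not a
    validated Fe-S signature.
    """
    cys = [i + 1 for i, aa in enumerate(seq.upper()) if aa == "C"]
    quads = []
    # combinations() preserves input order, so each tuple is already sorted.
    for quad in combinations(cys, 4):
        gaps = [b - a - 1 for a, b in zip(quad, quad[1:])]
        if all(min_gap <= g <= max_gap for g in gaps):
            quads.append(quad)
    return quads


if __name__ == "__main__":
    toy = "AACAACAAAAACAACAA"  # hypothetical CxxC...CxxC toy sequence
    print(cys_quads(toy))             # [(3, 6, 12, 15)]
    print(cys_quads(toy, max_gap=4))  # [] (the 5-residue middle gap is now too long)
```

Enumerating all four-cysteine combinations grows quickly with cysteine count, but stays cheap for ordinary proteins; tightening the limits only trades false positives for missed real sites.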
Bioinformatics, while a poor substitute for experimentation, is fast and easy, so it is best to exhaust the possibilities there first. Nothing is proven but sometimes it can suggest interesting directions.
MMS19: a large all-scaffold protein
MMS19 is a large protein involved in cytoplasmic iron sulfur assembly, first studied with bioinformatic tools 12 years ago (1,2). Revisited with modern comparative genomics methods, MMS19 emerges as a modular scaffolding protein over its entire length, conserved in its features (though not particularly in amino acid sequence) from the earliest diverging eukaryotes to human.
The C-terminus of MMS19 was initially classified as HEAT repeats. Today we know these do not act as individual units but together form a long twisted spiral of consecutive modules called an ARM domain. An individual HEAT unit is a small helical hairpin (two antiparallel helices), a generic super-secondary structure analogous to a beta-alpha-beta Rossmann fold unit. Most occurrences of HEAT in the eukaryotic proteome are therefore not truly homologous despite their structural similarity; they instead represent convergent evolution, much as Rossmann-like units recur in many unrelated beta propellers and TIM barrels.
Since these domains are catalytically inert and lack conserved cysteines or other conserved motifs, MMS19 can serve as an organizing scaffold for the cytoplasmic iron-sulfur assembly (CIA) complex (and other nuclear complexes) but cannot itself carry out the actual business of forming 4Fe-4S clusters on target apoproteins.
The size of MMS19, over a thousand residues, makes it a difficult target for structure determination. As of June 2012, no structure deposited at PDB provides a template onto which MMS19 can be threaded. In the interim, MMS19 might be modeled on the known beta-catenin structure: despite its lack of authentic homology, beta-catenin too consists almost entirely of repeating helical units.
The number of HEAT repeats in an ARM domain is subject to expansion and contraction over evolutionary time. The individual units often align poorly with each other and, despite initial reports, generally lack conserved residue signatures, yet this level of variation does not necessarily affect the overall fold. This lack of diagnostic features does, however, make remote homologs of HEAT repeats difficult to identify reliably, because the primary sequences can be diverged beyond recognition and homology-based alignments go out of register when the number of repeats differs.
An accurate count of the individual HEAT units in MMS19 from a given species is also difficult, because some units are matched better by the HMMer profile than others. This gives different counts for mouse and human at UniProt despite 90% sequence identity over their entire length. Indeed, the five species with manually reviewed UniProt entries all conflict, with the nominal number of HEAT units ranging from 7 in human to 18 in slime mold, despite similar lengths and an overall alignment implying the same actual domain structure.
MMS19 is a single-copy gene, without paralogs, in all eukaryotes, implying simple orthology (no retained duplications or losses). Below, 20 full-length FASTA sequences from GenBank were chosen for uniform distribution over the eukaryotic phylogenetic tree. Superfamily proved to be the most consistent, sensitive and selective online tool for ARM domain detection. The figure at bottom establishes that MMS19 consists entirely of HEAT units and spacers, which in effect form a single ARM domain.
Alignment with MultAlin shows that conservation is mediocre overall, except for one previously noted region of exceptional conservation containing two blocks of invariant residues from human to yeast to amoeba. This region must already have been established in the last common ancestor of all eukaryotes, and must still play a very special role in MMS19 today to account for its conservation over trillions of years of cumulative branch length. That role, however, remains a complete mystery. Sequence conservation in MMS19 is otherwise not exceptional: typically 27-34% identity relative to human, of which some portion is accidental.
Ultra-conserved region in MMS19:
human: 182 DGEKDPRNLLVAFRIVHDLISRDYSLGPFVEELFEVTSCYFPIDFTPPPNDPHGIQREDL 241
yeast: 184 NGEKDPRNLLLSFALNKSITSSLQNVENFKEDLFDVLFCYFPITFKPPKHDPYKISNQDL 24
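The human and yeast segments of the ultra-conserved region are the same length (60 residues) and are listed without gaps, so, assuming that gap-free alignment is correct, their identity and invariant blocks can be recovered by a simple column-by-column comparison. The 5-residue minimum block length and the function names are choices made for this sketch only.

```python
HUMAN = "DGEKDPRNLLVAFRIVHDLISRDYSLGPFVEELFEVTSCYFPIDFTPPPNDPHGIQREDL"  # residues 182-241
YEAST = "NGEKDPRNLLLSFALNKSITSSLQNVENFKEDLFDVLFCYFPITFKPPKHDPYKISNQDL"  # from residue 184
HUMAN_START = 182


def identity(a: str, b: str) -> tuple[int, int]:
    """(identical positions, aligned length) for two gap-free aligned segments."""
    if len(a) != len(b):
        raise ValueError("segments must be the same length")
    return sum(x == y for x, y in zip(a, b)), len(a)


def invariant_blocks(a: str, b: str, start: int, min_len: int = 5) -> list[tuple[int, str]]:
    """Runs of >= min_len identical residues, as (first residue number in `a`, run)."""
    if len(a) != len(b):
        raise ValueError("segments must be the same length")
    blocks, i = [], 0
    while i < len(a):
        if a[i] != b[i]:
            i += 1
            continue
        j = i
        while j < len(a) and a[j] == b[j]:
            j += 1
        if j - i >= min_len:
            blocks.append((start + i, a[i:j]))
        i = j
    return blocks


if __name__ == "__main__":
    same, n = identity(HUMAN, YEAST)
    print(f"{same}/{n} identical ({100 * same / n:.1f}%)")  # 29/60 identical (48.3%)
    print(invariant_blocks(HUMAN, YEAST, HUMAN_START))      # [(183, 'GEKDPRNLL'), (220, 'CYFPI')]
```

For the human-yeast pair this recovers exactly two invariant blocks, GEKDPRNLL and CYFPI, and about 48% identity within the window, well above the 27-34% typical of whole-protein MMS19 comparisons.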
Regardless of the Blast query used (full-length MMS19, this ultra-conserved region, or a reconstructed ancestral sequence), no counterpart to MMS19 occurs among 2,500 complete bacterial and archaeal genomes, even though unambiguous orthologs of human MMS19 are readily found in the earliest diverging eukaryotes. MMS19 may thus represent a eukaryotic innovation needed to organize a more complex cytoplasmic iron sulfur assembly, or it may be too simplified and diverged (or simply lost) in prokaryotes. As a method of last resort, prokaryotic operons containing other cytoplasmic iron sulfur assembly proteins could be scanned for adjacent HEAT-like domains or comparable scaffolding proteins.
There are no matches to MMS19 at PDB using Blastp. Since the fold is widespread and generic, structural matches in DALI do not imply homology. On the other hand, this allows the crystallographic structure of a non-homologous ARM protein (beta-catenin, PDB 1LUJ) to serve as a provisional structural template. Structures of beta-catenin bound to E-cadherin, ICAT and XTCF3 have also been determined, which may suggest a binding mode for cytoplasmic iron sulfur and helicase-type proteins on the HEAT repeats of MMS19.
MMS19 could determine selectivity among the overall set of iron sulfur apoproteins if only those interacting with DNA (or a comparable nucleotide) have a binding propensity for HEAT units. This specificity would then vary by organism, as would the effects of knock-in replacement. Only those apoproteins that align along the linear scaffold in close enough proximity to CIA effector proteins would receive an iron sulfur cluster. Not every protein arising in this context need bind a HEAT domain directly: it could instead bind another protein with that capacity.
This scaffolding scenario requires multiple non-homologous proteins to have HEAT binding sites, which seemingly requires convergent evolution on a significant scale, since a shared mobile binding domain can be ruled out. If MMS19 is truly a eukaryotic innovation, then the CIA complex must initially have functioned without the scaffold until MMS19 and these binding sites evolved.
That seems implausible, so some other common ground must account for HEAT binding. One option, based on the super-helical configuration and major groove of the overall ARM domain, supposes that MMS19 mimics a DNA helix or nucleotide base in shape and charge (along the lines of W536 in CRY1B photolyase). This would explain most of the specificity: each archaeal apoprotein of DNA metabolism that needs an iron sulfur cluster already has a DNA binding site, and therefore already has a suitable MMS19 HEAT binding site.
In early endosymbiosis, the retained bacterial cluster assembly machinery collided with nuclear-encoded archaeal iron-sulfur protein motifs previously matured by a different system, a conflict that had to be resolved seamlessly, without any gap in continued functionality.
CIAO1: a WD40 multi-protein scaffold
The bioinformatic analysis of CIAO1 is straightforward: it consists entirely of a WD40 domain. There is no CIAO2; this is a single-copy gene in all eukaryotes. These ubiquitous seven-bladed propeller domains can arise in non-homologous proteins as a common supersecondary structure rather than by spread of a mobile domain; the human genome encodes some 257 proteins with a WD40 repeat. However, a degree of coincidental sequence alignment can arise from common constraints (such as a conserved glycine/histidine and the tryptophan/aspartate pairs).
WD40 domains are not catalytic, so CIAO1, like MMS19, is not involved mechanistically in Fe-S formation, transfer or repair. CIAO1 is thus likely a structural scaffold coordinating larger multi-protein complexes, so its acronym (Chinese for bridge) is appropriate. Crystallographic structures have been determined for both yeast (PDB 2HES) and human (PDB 3FM0) CIAO1.
Strongly conserved surface residues, which likely mediate oligomeric interactions, mostly lie on the top and one side of CIAO1. Despite this, of four substitutions tested, only R127E (purple) affected in vivo function, as assayed by plasmid rescue of CIAO1-depleted cells and by levels of the assembled Fe-S protein isopropylmalate isomerase. This does not explain the observed conservation of other surface residues, such as K16, R34, E54, E197 and R251, which are unlikely to play a role in internal structure or stability.
No journal article ever accompanied the human structure determination, but differences from yeast are likely minor in view of the demonstrated functional replacement capability. Since human CIAO1 forms a hetero-oligomer with FAM96A, and likely with FAM96B as well, yeast CIAO1 probably forms a similar oligomer with its yeast FAM96B counterpart, consistent with the late role of CIAO1 in cytosolic FeS assembly.
It is thus feasible to determine conserved residues separately in FAM96A and FAM96B using ConSurf. Residues conserved in both would then include the binding site for CIAO1, which on the CIAO1 side presumably involves either its conserved top or its conserved side. Since FAM96B forms a domain-swapped dimer and CIAO1 binds stoichiometrically, a symmetric heterotetramer can be expected.
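ConSurf scores conservation with a phylogeny-aware method (Rate4Site), which is beyond a short example. As a much cruder stand-in, the sketch below scores each column of a jointly aligned set of sequences by Shannon entropy, finds the low-entropy columns separately for a FAM96A group and a FAM96B group, and keeps the columns conserved in both with the same residue, ie candidate "common ground". The 0.5-bit cutoff, the function names and the toy sequences are assumptions for illustration.

```python
from collections import Counter
from math import log2


def column_entropy(column: str) -> float:
    """Shannon entropy (bits) of one alignment column, ignoring gaps."""
    residues = [c for c in column if c != "-"]
    if not residues:
        return float("inf")  # all-gap columns are never called conserved
    n = len(residues)
    return -sum((k / n) * log2(k / n) for k in Counter(residues).values())


def majority(column: str) -> str:
    return Counter(c for c in column if c != "-").most_common(1)[0][0]


def conserved_columns(alignment: list[str], max_entropy: float = 0.5) -> dict[int, str]:
    """Map 0-based column index -> majority residue for columns at or below max_entropy."""
    length = len(alignment[0])
    if any(len(s) != length for s in alignment):
        raise ValueError("sequences must be aligned to a common length")
    result = {}
    for i in range(length):
        column = "".join(s[i] for s in alignment)
        if column_entropy(column) <= max_entropy:
            result[i] = majority(column)
    return result


def shared_conserved(group_a: list[str], group_b: list[str], max_entropy: float = 0.5) -> list[int]:
    """Columns conserved in both groups with the same majority residue."""
    if len(group_a[0]) != len(group_b[0]):
        raise ValueError("both groups must come from one joint alignment")
    a = conserved_columns(group_a, max_entropy)
    b = conserved_columns(group_b, max_entropy)
    return sorted(i for i in a.keys() & b.keys() if a[i] == b[i])


if __name__ == "__main__":
    fam96a_toy = ["MKCLA", "MKCLS", "MKCIA"]  # hypothetical sequences
    fam96b_toy = ["MRCLA", "MKCLG", "MQCLA"]
    print(shared_conserved(fam96a_toy, fam96b_toy))  # [0, 2]
```

Entropy ignores how closely related the sequences are, so a few near-identical mammals can make a column look conserved; that is the main reason tree-based scoring such as ConSurf's is preferable for real data.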
However nothing is accomplished in terms of coordinated docking unless CIAO1 has a second binding site for another component of cytosolic FeS assembly. CFD1 is a possibility here in view of the CFD1-CIAO1 fusion protein in S. pombe, as are NBP35, ERCC2 and ANT2.
CIAO1 has weak Blast matches in both bacteria and archaea, but these are not associated with any of the three iron sulfur cluster assembly system operons (ISC, SUF, NIF) and may simply represent convergence among WD40 proteins. Matches in early-diverging eukaryotes (a half dozen are provided below) are much more persuasive, because reciprocal Blastp against human uniquely recovers CIAO1. These show extensive conserved regions considering the immense phylogenetic span and rapid evolution of some clades. Narrowly lineage-specific indels, presumably in loop regions, can be removed to create an idealized alignment that better reflects conserved residues.
FAM96B is homologous to bacterial SufT
FAM96B is remarkably conserved throughout eukaryotes. It duplicated in the earliest metazoan ancestor, giving rise to FAM96A after the divergence of choanoflagellates but before those of sponge, trichoplax, ctenophore or cnidarian. Both FAM96B and FAM96A have been retained in all metazoan lineages. In vertebrates, but not in earlier diverging deuterostomes, FAM96A acquired an unmistakable signal peptide, meaning it was no longer targeted to the cytoplasm. The species with the signal peptide are exactly those with an extra pair of invariant cysteines, suggesting a disulfide suited to an oxidizing subcellular compartment such as the endoplasmic reticulum.
However, these new cysteines are not in the proper crystallographic position to form a disulfide with each other, though the new Cys99 and the long-conserved near-terminal Cys155 are within range and might form a disulfide under certain conditions. A Cys99-Cys155 disulfide, while physically plausible given the 3UX2 structure and the signal peptide, is nonetheless not evolutionarily plausible, because the counterpart of Cys155 in FAM96B is a long-conserved surface residue for another reason.
The two Cys90 residues, one from each monomer, may also be positioned to form a disulfide, a 2Fe-2S cluster or a zinc site, which might be detected if the protein is purified anaerobically or reconstituted and tested in some sort of activity assay. The phylogenetic conservation of cysteines is explored in the alignment below.
The two encoded proteins both bind CIAO1. However, they must have distinct functions in vivo to account for their retention in so many lineages for so long. The usual explanations (specialized timing of expression during development, or expression in differentiated cell populations) are not generally applicable from single-celled organisms to human. Since the signal peptide in FAM96A arose fairly late, it too cannot explain retention in earlier diverging species. (Note, however, that tools that recognize signal peptides are not adequately trained on these species.) Since the duplication is restricted to metazoans (ie animals), it could possibly be associated with dietary, rather than diffusive, acquisition of iron. Secondary duplications of FAM96B occurred in various lineages (eg insects, slime molds).
The placement and phase of introns in FAM96B and FAM96A are largely conserved, implying that FAM96B already carried most of its introns before the gene duplication. Although the two genes went their separate ways during the subsequent 600 million years (with both gains and losses of introns), their patterns remain very closely related today, even comparing human to sponge. Most remarkably, the first intron was already present in early diverging stramenopiles (eg Phytophthora infestans), and the last exon in the last common ancestor of human and amoebozoa (eg Dictyostelium discoideum). Convergent evolution is implausible given 450 possibilities (three possible phases at each of 150 sites).
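One way to look for group-specific cysteines, such as the extra invariant pair in the signal-peptide-bearing vertebrate FAM96A sequences, is to scan a joint alignment for columns where cysteine is present in every sequence of one group and absent from every sequence of the other. This is a minimal sketch assuming equal-length aligned strings; the function name, the group split and the toy sequences are hypothetical, not taken from the alignments in this article.

```python
def group_specific_cys(group_in: list[str], group_out: list[str]) -> list[int]:
    """1-based alignment columns with C in every group_in sequence and in no group_out sequence."""
    length = len(group_in[0])
    if any(len(s) != length for s in group_in + group_out):
        raise ValueError("all sequences must share one alignment length")
    return [
        i + 1
        for i in range(length)
        if all(s[i] == "C" for s in group_in) and all(s[i] != "C" for s in group_out)
    ]


if __name__ == "__main__":
    with_signal_peptide = ["MCAKCL", "MCSKCL"]     # hypothetical group 1
    without_signal_peptide = ["MAAKCL", "MSSKCV"]  # hypothetical group 2
    # Column 5 is C everywhere, so only column 2 is specific to group 1.
    print(group_specific_cys(with_signal_peptide, without_signal_peptide))  # [2]
```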
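The arithmetic behind the 450 figure, and the usual definition of intron phase, can be written out as follows. Phase is the number of coding nucleotides upstream of the intron modulo 3 (phase 0 falls between codons, phase 1 after the first codon position, phase 2 after the second). The probability function assumes, as a simplification, that each intron is equally likely to land at any of the 450 site-and-phase combinations and that introns land independently. Real insertion is not uniform (phase 0 introns are the most common), which makes chance coincidence somewhat likelier than this naive estimate. Function names are this sketch's own.

```python
def intron_slot(coding_nt_upstream: int) -> tuple[int, int]:
    """Return (0-based codon index, phase) for an intron preceded by this many coding nucleotides."""
    return divmod(coding_nt_upstream, 3)


def chance_coincidence(n_sites: int = 150, n_phases: int = 3, n_introns: int = 1) -> float:
    """Naive probability that n_introns independently placed introns all match by chance."""
    slots = n_sites * n_phases  # 150 sites x 3 phases = 450 possibilities
    return (1 / slots) ** n_introns


if __name__ == "__main__":
    print(intron_slot(100))          # (33, 1): after the first nucleotide of 0-based codon 33
    print(chance_coincidence())      # about 0.0022 for a single shared intron
    print(chance_coincidence(n_introns=3))
```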
Intron position and phasing (00 12 21) in FAM96B and its early metazoan duplicate FAM96A:
FAM96A_homSap WLSGLSEPGAARQPRIMEEKALEVYDLIRTIRDPEKPNTLEELEVVSESCVEVQEINEEEYLVIIRFTPTVPHCSLATLIGLCLRVKLQRCLPFKHKLEIYISEGTHSTEEDINKQINDKERVAAAMENPNLREIVEQCVLEPD
FAM96A_sacKow MNKDETLRIKGAEDDQEKELAEEIYDIIRTIRDPEKPQTLEDLDVVYEDGVLVNHRGTDEFLVNVEFTPTVPHCTLATLIGLCIRVKLQRTLPHSYKLDIFIKKGTHSTEDEINKQINDKERIAAAMENPNLKDLVDNCVVDLE
FAM96A_triCas EFEDSPRKEVKEVSEDDSELKYTVYDLIRTIKDPEKPNTLEELNVVYEEGEVKERTSGNVSVVRVEFNPTVPHCSLATLIGLCIRIKLERCIPYRIKLDIYIKAGAHTTEHEINKQINDKERIAAAMENPNLREMVENCIVEED
FAM96A_nemVec PSFGASRIDNDVNSQSNRNLALDVYDLIKDIKDPEKPQTLEDLKVVYESCVEVQKVAGQDHITIT FTPTVPHCSLATLIGLCIRVKLEKSLPEKFKLDIYLKKGTHSTENEINKQINDKERIAAAMENPNLRKIVENCIDEDN
FAM96A_acrPal MSENKILSTAADSSFDNLVLVQEVFDIVKDIRDPELPQTLEELHVIEEEFIKIDKIENDEYIIKIEFTPTVPHCSLATLIGLCLRVKLERSLPYKFKLDIFLSRGTHSTENEINKQINDKERIAAAMENPNLKKIVEECILDAN
FAM96A_triAdh ELIRDIKDPELPQTLEELNVVTEDEIFVRNMKQGEACIRINFTPTVPHCSLATLIGLCIRVKLQRCLDQDYKLDIYVTKGSHDTEDGVNKQINDKERVAAAIENPNVKKLVEECLQEVQ
FAM96B_homSap SGERPVTAGEEDEQVPDSIDAREIFDLIRSINDPEHPLTLEELNVVEQVRVQVSDPESTVAVAFTPTIPHCSMATLIGLSIKVKLLRSLPQRFKMDVHITPGTHASEHAVNKQLADKERVAAALENTHLLEVVNQCLSARS
FAM96B_braFlo TSEREVTPEELNEDVEDAIDAREIFDILSSINDPEHPLTLEELNVIEQSRITVDEDNNHVSVEFTPTIPHCSMATLIGLSIRVKLLRALPTRFKVDVHITPGTHQSEHAVNKQLADKERVAAALENQHLLEVVNQCLSTRN
FAM96B_droMel IKERVLTANEEDENVPDPFDKREIFDLIRNINDPEHPLTLEELHVVQEDLIRINDSQNSVHISFTPTIPHCSMATLIGLSIRVKLLRSLPPRFKVTVEITPGTHASELAVNKQLADKERVAAALENNHLAEVINQCIAKG
FAM96B_nemVec LKERVVLAEEEDDNIVDKIDDREIFDMIRSINDPEHPLTLEELNVVEQALIDVSDDESYVKVQFTPTIPHCSMATLIGLAIRVRLLRSLPDRFKVDVKITPGTHQSEIAVNKQLADKERVAAALENNHLLDVIDQCLVSKK
FAM96B_triAdh SKEREILPEELDDNIVDKIDEREIFDIIRSINDPEHPLTLEELNVVEECKIDVDDDNNFVKVHFTPTIPHCSMATLIGLCIRVRLIRSLPERFKVDITVTPGSHSSEIAVNKQLADKERVAAAMENSNLLKVVNQCLAMD
FAM96B_ampQue AKQRPVTVEEEKDDVYDEIDAREVFDLIRHINDPEHPLTLEELNVVQEDLICINNKENFVSVHFTPTIPHCSMATLIGLSIRVCLLRSLPNRFKIDVIITPGSHMSEQAVNKQLADKERIAAAIENSHLLNVVHQCLNTKR
FAM96B_monBre VQERVTLDNDFDDDVIDPFDSREIFDLVRHINDPEHPLTLEELNVVRLDQILVDDAQNYVRVQFTPTIPHCSMASLIGLCLRVRLLRALPPRFKVDVEIFPGTHATEASINKQLADKERVAAALENPNLKTVVNECLQLDD
FAM96B and FAM96A essentially consist of a single domain of unknown function, DUF59, a by-product of automated primary sequence clustering. DUF59 is not a common building block in proteins: FAM96B and FAM96A are the only human proteins in an 18,500-member proteome to contain it. It is therefore remarkable that the top (if weak) Blastp matches of FAM96B in prokaryotes are to a DUF59 domain in the bacterial SufT gene. SufT is part of a large operon encoding an alternative system for forming 4Fe-4S proteins. Further, a SufT-related domain occurs in Q6STH5 of Arabidopsis thaliana (in chloroplasts, proteins of cyanobacterial endosymbiont origin assemble Fe-S clusters), fused N-terminally to a homolog of NUBP2/CFD1, which participates in cytosolic Fe-S cluster assembly. Recall that CFD1 is fused to CIAO1 in S. pombe, consistent with the three proteins forming a heteromeric complex.
Is DUF59, defined by a subtle primary sequence profile, a valid domain? Four independent crystallographic structures all have the same fold (low RMSD in DALI comparisons), as do template-threaded structures. Additionally, SufT lies within a SUF operon (SufABCDSUTR) in at least three bacterial genera: Ralstonia, Cupriavidus and Pseudogulbenkiania. This cannot be coincidence, given the rarity of the DUF59 domain and the involvement of both FAM96B and SufT in iron sulfur cluster formation. Thus, although human FAM96B and bacterial SufT have only borderline alignable primary sequences, five lines of evidence support bona fide homology. This is a surprising result, because it mixes (eg within human) a component of the 'backup' bacterial SUF system of iron sulfur cluster formation with elements descended from the main bacterial ISC system.
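DALI compares structures through their intramolecular C-alpha distance matrices rather than by direct superposition. The sketch below is not DALI's scoring (which uses an elastic similarity score and reports Z-scores); it computes a simpler, related quantity, the distance RMSD between two already-matched sets of C-alpha coordinates, which likewise needs no superposition. The function names and coordinates are illustrative assumptions.

```python
from math import dist, sqrt

Coord = tuple[float, float, float]


def distance_matrix(coords: list[Coord]) -> list[list[float]]:
    """All pairwise Euclidean distances within one structure."""
    return [[dist(p, q) for q in coords] for p in coords]


def drmsd(coords_a: list[Coord], coords_b: list[Coord]) -> float:
    """Distance RMSD over all pairs i < j of two equal-length, pre-matched coordinate lists."""
    n = len(coords_a)
    if n != len(coords_b) or n < 2:
        raise ValueError("need two matched lists of at least two coordinates")
    da, db = distance_matrix(coords_a), distance_matrix(coords_b)
    diffs = [(da[i][j] - db[i][j]) ** 2 for i in range(n) for j in range(i + 1, n)]
    return sqrt(sum(diffs) / len(diffs))


if __name__ == "__main__":
    square = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0)]
    # Same square rotated 90 degrees about z and translated: internal distances unchanged.
    moved = [(5, 5, 5), (5, 6, 5), (4, 6, 5), (4, 5, 5)]
    print(drmsd(square, moved))  # 0.0
    doubled = [(2 * x, 2 * y, 2 * z) for x, y, z in square]
    print(drmsd(square, doubled))  # nonzero: the shape itself changed
```

Because only internal distances enter the score, rigid rotations and translations cost nothing, which is why distance-matrix methods can compare folds without first superimposing them.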
Gene | PDB | UniProt | PubMed | Species | Comment
FAM96A | 3UX2 | Q9H5X1 | 22683786, 2261886 | Homo sapiens | cytosolic 4Fe-4S cluster formation
FAM96B | '3UX2' | Q9Y3D0 | 22678362, 22678361 | Homo sapiens | from FAM96A utilizing >50% sequence identity
DUF59 | 1WCJ | Q9WYV7 | 16199668, 15213465 | Thermotoga maritima | 216 homologs; TM0487; 1UWD
DUF59 | 3CQ1 | Q53W28 | - | Thermus thermophilus | dTDP-4-keto-L-rhamnose reductase-related; TTHB138, aka 2CU6
DUF59 | 3LNO | Q81XF6 | - | Bacillus anthracis | article never published
SufT | - | AM260480 | - | Ralstonia eutropha | iron sulfur cluster assembly protein
PaaD | '3CQ1' | G8RCQ5 | 16199668 | Staphylococcus aureus | aromatic ring hydroxylating enzyme Fe-S assembly; YITW
HCF101 | - | Q6STH5 | 14690502, 19817716 | Arabidopsis thaliana | chloroplast 4Fe-4S cluster formation
If more were known about the specific role of SufT in iron sulfur cluster metabolism, that knowledge would likely transfer directly to human FAM96B and FAM96A. Although little is known today, SufT may be easier to study in bacterial genetic systems than FAM96B. Not all bacterial DUF59 domains reside in Suf operons; while their correspondence to FAM96B is less clear, some of these domains are exceedingly well studied. Insights into FAM96B/SufT function may thus come from the phenylacetic acid degradation pathway, a seemingly unrelated topic but one that involves a DUF59 domain.
Here, a colossal annotation error has propagated to thousands of GenBank and UniProt entries: the DUF59 domain of the phenylacetate catabolic pathway actually resides in PaaD, not PaaJ (a non-homologous protein with only thiolase domains). Using E. coli accession P76080 as a valid starting point for PaaD, 85% of the Blastp matches are mislabeled PaaJ. The PaaD protein is obviously not the product of the PaaJ gene, as many entries state. This situation, seemingly impossible to fix, illustrates the dangers of unattended computer annotation.
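A figure like the 85% above is a tally over Blastp hit descriptions. A minimal sketch of such a tally, assuming the hit descriptions have already been exported as plain strings (the example strings and the function name are invented placeholders, not real GenBank records):

```python
import re


def label_fraction(descriptions: list[str], label: str = "PaaJ") -> float:
    """Fraction of hit descriptions mentioning `label` as a whole word (case-insensitive)."""
    if not descriptions:
        return 0.0
    pattern = re.compile(rf"\b{re.escape(label)}\b", re.IGNORECASE)
    return sum(1 for d in descriptions if pattern.search(d)) / len(descriptions)


if __name__ == "__main__":
    hits = [  # hypothetical hit descriptions
        "phenylacetic acid degradation protein PaaJ",
        "phenylacetate-CoA oxygenase, PaaJ subunit",
        "phenylacetic acid degradation protein PaaD",
        "hypothetical protein",
    ]
    print(label_fraction(hits))  # 0.5
```

A real audit would also need to confirm that each hit actually carries the DUF59 domain, since a genuine thiolase PaaJ legitimately carries that label.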
>PaaD_escCol Escherichia coli P76080 ydbQ (DUF59; 1 conserved DUF59 cysteine, 6 other conserved cysteines; unacceptable synonym: PaaJ)
MQRLATIAPPQVHEIWALLSQIPDPEIPVLTITDLGMVRNVTQMGEGWVIGFTPTYSGCP
ATEHLIGAIREAMTTNGFTPVQVVLQLDPAWTTDWMTPDARERLREYGISPPAGHSCHAH
LPPEVRCPRCASVHTTLISEFGSTACKALYRCDSCREPFDYFKCI*
The initial aromatic ring oxygenation (using O2, not water) is carried out by a multi-subunit complex requiring PaaA, PaaB, PaaC and PaaE for in vitro reconstitution of catalytic activity. PaaD is not part of this complex but has an essential role in vivo, as established by mutation and supported by its position within the Paa operon in many species.
Together, these observations suggest that the DUF59 domain of PaaD establishes and/or repairs the 2Fe-2S cluster of the PaaE reductase, which might be especially vulnerable to reactive oxygen species produced in ring cleavage. This presumed role for PaaD is thus not a great departure from what its homologs FAM96A and FAM96B do in eukaryotes; the difference is that DUF59 acts on a broad class of iron sulfur apo-proteins when it resides in a SufT-containing operon, but narrowly on a specific phenylacetate catabolic reductase when located in the Paa operon. Note that PaaA itself has a possibly relevant di-iron center, though its regulatory paralog PaaC does not.
The PaaD protein contains an N-terminal DUF59 domain, at best 38% identical and 55% similar to FAM96B, sharing the single universally conserved central cysteine and a few, but not all, additional motifs. Following a non-conserved spacer, a strongly conserved C-terminal region of 41 residues containing 6 invariant cysteines emerges. While this region is not a known domain and is not found outside bacteria, it suggests an iron-sulfur carrier function. Outside of bacteria, PaaD has weak Blast matches to assorted DUF59 domains. Thus, since its divergence from a common ancestral protein with FAM96B billions of years ago, PaaD has become restricted to, and highly adapted for, the bacterial aromatic catabolic pathway.
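The cysteine counts for the E. coli PaaD sequence given above can be checked directly. This sketch assumes that "the C-terminal region of 41 residues" means literally the last 41 residues of the E. coli sequence (P76080); the helper names are hypothetical.

```python
PAAD_ECOLI = (  # E. coli PaaD (P76080) as in the FASTA record above, stop codon removed
    "MQRLATIAPPQVHEIWALLSQIPDPEIPVLTITDLGMVRNVTQMGEGWVIGFTPTYSGCP"
    "ATEHLIGAIREAMTTNGFTPVQVVLQLDPAWTTDWMTPDARERLREYGISPPAGHSCHAH"
    "LPPEVRCPRCASVHTTLISEFGSTACKALYRCDSCREPFDYFKCI"
)


def cysteine_positions(seq: str) -> list[int]:
    """1-based positions of every cysteine."""
    return [i + 1 for i, aa in enumerate(seq) if aa == "C"]


def c_terminal_cysteines(seq: str, window: int = 41) -> list[int]:
    """Cysteine positions falling within the last `window` residues."""
    first = len(seq) - window + 1
    return [p for p in cysteine_positions(seq) if p >= first]


if __name__ == "__main__":
    print(len(PAAD_ECOLI))                        # 165
    print(cysteine_positions(PAAD_ECOLI))         # [59, 117, 127, 130, 146, 152, 155, 164]
    print(len(c_terminal_cysteines(PAAD_ECOLI)))  # 6
```

On this 165-residue sequence the scan finds C59 in the DUF59 region (within GFTPTYSGC, a context comparable to FTPTIPHC in FAM96B), C117 in the spacer, and six cysteines (C127, C130, C146, C152, C155, C164) in the last 41 residues, matching the counts in the text and in the sequence header.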
The DUF59 domain appears in still other annotation contexts in bacteria, notably YitW, N-6 adenine-specific DNA methylase, and rhamnose reductase-related protein. A peculiar practice in structural genomics programs (selecting targets at random, posting structures to PDB, never publishing an article) results in orphaned, uncharacterized entries such as 3LNO (Bacillus anthracis) and 3CQ1 (Thermus thermophilus) in the case of DUF59. In addition, neither GenBank nor UniProt annotation tracks propagation: for example, it is utterly unclear who first designated a protein in some unknown species as YitW, or why, or where.
While partly attributable to unsatisfactory computer annotation and multiple competing names, these varied contexts may also reflect other specific targeting systems among the homologs. However, when a DUF59 domain does not consistently reside in an overtly relevant operon, it is difficult to distinguish specific from general targeting. Starting from a given DUF59, it is far from clear where its natural Blastp cluster begins and ends, given potentially rapid rates of sequence divergence in prokaryotes.
These alternatives are additionally difficult to resolve because potential Fe-S apo-protein targets of DUF59 may not be annotated as such, Fe-S clusters having proven notoriously difficult to detect in DNA helicases, polymerases and primases. The problem arises from cluster loss during aerobic protein purification (perhaps with replacement by Zn) and from the lack of a reliable bioinformatic signature for iron-sulfur clusters: notably an 'insufficient' number of conserved liganding cysteines (due to cooperating dimeric sites, or to glutathione contributing an unsuspected cysteine, PubMed 19290777), a lack of consistent spacing between the cysteines, and ambiguity relative to dedicated zinc-binding sites.
Beginning with the Thermus thermophilus DUF59 domain of known structure (3CQ1, Q53W28), tBlastn against complete genomes proves the most instructive approach. Here Thermus scotoductus has an unsurprising 83% identical match to the query, but also a 42% match elsewhere to a second 'metal-sulfur cluster biosynthetic enzyme' (ie SufT), a 34% match within the Paa operon (ie PaaD) and a 39% hit to 'ATP-binding Mrp/Nbp35', suggestive of eukaryotic iron-sulfur assembly paralogs.
These matches retain the ultra-conserved motifs of SufT; however, the 'Nbp35' match is to a fused N-terminal DUF59 domain, very reminiscent of a similar fused protein in Arabidopsis thaliana (Q6STH5). The bacterial Nbp35 domain aligns to NUBPL, NUBP1 and NUBP2 in human, which are not fused to a DUF59 domain (ie FAM96B). Nbp35 does not have functionally related gene neighbors in either T. thermophilus or E. coli.
Beginning with the Bacillus anthracis DUF59 domain of known structure (3LNO, Q81XF6), tBlastn against complete bacterial genomes may turn up instructive operon membership in at least some species. The gene, often called YitW, is adjacent to the presumably related entities yitV and yitU, but these are ordinary hydrolases unrelated to iron sulfur clusters. In Bacillus selenitireducens, it is adjacent to a P-loop NTPase helicase, mildly suggestive of an association. In Halobacillus halophilus, it is adjacent to, and transcribed in the same direction as, the molybdopterin biosynthetic genes MobB, moaE and moaD, an association often observed with DUF59 domains and one that might make sense in terms of Fe-S clusters (moaD is the sulfur carrier protein).
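tBlastn compares a protein query against a nucleotide database translated in all six reading frames. The sketch below shows only the translation step, using the standard genetic code, plus an exact peptide lookup as a crude stand-in for the gapped, scored local alignment that tBlastn actually performs. Function names and the frame labels are this sketch's own conventions.

```python
BASES = "TCAG"
# Standard genetic code (NCBI table 1); codons ordered by first, second, third base in TCAG order.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(dna: str) -> str:
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna: str) -> str:
    """Translate complete codons only; codons with non-ACGT characters become 'X'."""
    dna = dna.upper()
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3))


def six_frames(dna: str) -> dict[str, str]:
    """Frames +1..+3 start at offsets 0..2 of the given strand; -1..-3 likewise on its reverse complement."""
    rc = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


def find_peptide(dna: str, peptide: str) -> dict[str, int]:
    """Exact-match lookup (0-based residue index per frame); no scoring, no gaps."""
    return {name: prot.find(peptide) for name, prot in six_frames(dna).items() if peptide in prot}


if __name__ == "__main__":
    print(six_frames("ATGGCCTAA"))
    print(find_peptide("ATGGCCTAA", "GH"))  # {'-1': 1}
```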
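The gene-neighborhood reasoning used here (adjacent genes transcribed in the same direction suggesting a shared operon) can be sketched as a walk outward from a query gene that stops at the first strand change or at an intergenic gap above a cutoff. The 300 bp cutoff, the `Gene` record, the function name and the toy coordinates are hypothetical; real operon prediction also weighs conservation of gene order across genomes.

```python
from dataclasses import dataclass


@dataclass
class Gene:
    name: str
    start: int   # 1-based; start < end regardless of strand
    end: int
    strand: str  # "+" or "-"


def co_directional_cluster(genes: list[Gene], query: str, max_gap: int = 300) -> list[str]:
    """Names of genes in the uninterrupted same-strand run containing `query`.

    The run stops at the first strand change or intergenic gap larger than max_gap.
    """
    ordered = sorted(genes, key=lambda g: g.start)
    idx = next((i for i, g in enumerate(ordered) if g.name == query), None)
    if idx is None:
        raise ValueError(f"gene {query!r} not found")
    strand = ordered[idx].strand

    def linked(left: Gene, right: Gene) -> bool:
        return left.strand == right.strand == strand and right.start - left.end - 1 <= max_gap

    lo = idx
    while lo > 0 and linked(ordered[lo - 1], ordered[lo]):
        lo -= 1
    hi = idx
    while hi < len(ordered) - 1 and linked(ordered[hi], ordered[hi + 1]):
        hi += 1
    return [g.name for g in ordered[lo:hi + 1]]


if __name__ == "__main__":
    toy = [  # hypothetical locus, not real coordinates
        Gene("orf1", 100, 400, "-"),
        Gene("orf2", 1000, 1500, "+"),
        Gene("orf3", 1520, 1950, "+"),
        Gene("duf59", 1960, 2300, "+"),
        Gene("orf4", 2350, 2900, "-"),
    ]
    print(co_directional_cluster(toy, "duf59"))  # ['orf2', 'orf3', 'duf59']
```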
Crystallographic structures from the euryarchaeote Archaeoglobus fulgidus (3KB1 or 2PH1) can be used as structural templates for HCF101 (and related sequences, notably human NUBP1). A percent identity of 39% is low but adequate for some purposes. The seemingly biologically illiterate Structural Genomics Consortium picked family representatives lacking the DUF59 and DUF971 domains. No article or commentary accompanies the 2008 submissions. The dimer has a zinc ion at its center, held by a deeply conserved pair of CPNC motifs that are, however, lacking in higher eukaryotes such as plants.
The Arabidopsis protein HCF101 is a triple fusion of a DUF59 domain, a P-loop ATPase resembling human NUBPL somewhat more than NUBP1 (NBP35) or NUBP2 (CFD1), and a DUF971 domain (a free-standing protein of unknown function in bacteria). The triple fusion did not arise in plants; it is already found in early diverging eukaryotes (stramenopiles and alveolates). Since both bacteria and archaea (euryarchaeotes and thaumarchaeotes, but not crenarchaeotes, which have only a standalone NUBP1, eg YP_004781840) have the DUF59-NUBPL fusion, the initial double fusion is ancient. Fungi and metazoans have no DUF59-NUBPL fusion proteins, though DUF971 is not altogether lost (eg the two Fe dioxygenases BBOX1 and TMLHE in human). Frog and hydra have odd fusion accessions, possibly processed pseudogene fusions: XP_002945459, XP_002161498.
Despite some evidence to the contrary, the HCF101 gene family cannot bind a 4Fe-4S cluster via four conserved cysteines, as no such cysteines exist in the broader set of orthologs available today. One conserved cysteine is taken up by the DUF59 domain and another by the P-loop itself; the only other candidate, a C..C pair seen in prokaryotes and some early diverging eukaryotes, does not extend to plants.
The best structural template for the triple fusion protein DUF59-NBP35-DUF971 (Arabidopsis HCF101), or for unfused human NUBP1, is the structure 3KB1 from the euryarchaeote Archaeoglobus fulgidus. Percent identity is low at 40% but adequate for some purposes. Note that DUF971 is a free-standing protein of unknown function in bacteria.
The triple fusion of HCF101 did not arise in plants: it is already present in early diverging eukaryotes (stramenopiles and alveolates) which lack chloroplasts (the intracellular location of HCF101). The double fusion DUF59-NUBPL is older still, occurring in both bacteria and archaea. Fungi and metazoans lack DUF59-NUBPL and NUBPL-DUF971 fusion proteins, though DUF971 is not altogether lost (eg the human Fe dioxygenases BBOX1 and TMLHE).
The Archaeoglobus protein is also a dimer, with a central zinc ion held by a deeply conserved pair of CpnC motifs in the NUBPL domain. These motifs are not available in Arabidopsis HCF101, which has one conserved cysteine taken up by the DUF59 active site and another by the ATP-binding P-loop, and so lacks conserved cysteines or histidines at positions homologous to CpnC.
However, alignment of the archaeal protein with the three human paralogs (NUBP1, NUBP2, NUBPL) shows remarkable conservation in this region, so the dimer may well bind a 4Fe-4S cluster, as that paper concluded. Plants, including Arabidopsis, do have a second, standalone NUBPL counterpart with the CpnC motif, so that is probably the proper comparison here, with HCF101 representing a loss in certain early diverging eukaryotes though not all (eg the Paramecium triple fusion).
NUBP2_homSap ENMSGFtCPHCtECtsvF (consensus of 46 NUBP1_homSap ENMSGFiCPKCKKESQIF NUBPL_homSap QNMSVFQCPKCKHKTHIF archaeal 3KB1 ENMAYFECPNCGERTYLF Arabidopsis ENMCHFDAD--GKRYYPF HCF101 Arabidopsis ENMSCFVCPHCNEPSFIF NP_193689 chromosome partitioning ENMSCFKCSKCGEKSYIF Zea mays NP_001150831 ENMSCFKCPKCGEKSYIF Oryza sativa EEC75781 ENMSHHTCSKCGHVERIF Volvox carteri XP_002954172 ENMSYFKCPNCGERSHIF Physcomitrella patens XP_001755735 ENMSCFKCPNCGHPSYIF Vitis vinifera XP_002282449 ENMAYHRCGKCGHVEHIF Chlamydomonas reinhardtii XP_001702721 ENMSFFKCPHCGEPSFIF Populus trichocarpa XP_002307635
The metazoan DUF59 domains FAM96B and FAM96A do not particularly resemble the DUF59 domain found in these fusion proteins, whose consensus sequence best matches the unfused YitW type, then SufT. FAM96B occurs separately from DUF59-NUBPL-DUF971 in early eukaryotes and plants, so FAM96B did not arise from a breakup of the fusion protein after plant divergence, though NUBPL may have. If so, the duplication giving rise to FAM96A occurred at about the same time that free-standing NUBPL first arose, and FAM96A conceivably compensated for the loss of the fused DUF59 domain.
Finally, the remaining DUF59 domain of interest for annotation transfer is that of Thermotoga maritima (Q9WYV7, 1WCJ), which, despite two published articles, provides no gene-association clues to its function or target apo-proteins. However, the STRING association tool provides some interesting suggestions:
Alignment of FAM96A and FAM96B in phylogenetic order:
FAM96A_homSap --EKALEVYDLIRTIRDPEKPNTLEELEVVSESCVEVQEINEEEYLVIIRFTPTVPHCSLATLIGLCLRVKLQRCLPFKHKLEIYISEGTHSTEEDINKQINDKERVAAAMENPNLREIVEQCVLEPD
FAM96A_calJac --EKALEVYDLIRTIRDPEKPSTLEELEVVSESCVEVQEINEEEYLVIIRFTPTVPHCSLATLIGLCLRVKLQRCLPFKHKLEIYISEGTHSTEEDINKQINDKERVAAAMENPNLREIVEQCVLEPD
FAM96A_otoGar --EKALEIYDLIRTIRDPEKPNTLEELEVVTESCVEVQEINEEDYLVIIKFTPTVPHCSLATLIGLCLRVKLQRCFPFKHKLEIYISEGTHSTEEDINKQINDKERVAAAMENPNLREIVEQCVLEPD
FAM96A_tupBel --EKALEVYDLIRTIRDPEKPNTLEELDVVTESCVEVQEINEDDYLVIIRFTPTVPHCSLATLIGLCLRVKLQRCLPFKHKLEIYISEGTHSTEEDINKQINDKERVAAAMENPNLREIVEQCVLEPD
FAM96A_musMus --EKALEVYDLIRTIRDPEKPNTLEELEVVTESCVEVQEINEDDYLVIIKFTPTVPHCSLATLIGLCLRVKLQRCLPFKHKLEIYISEGTHSTEEDINKQINDKERVAAAMENPNLREIVEQCVLEPD
FAM96A_oryCun --EKALEVYDLIRTIRDPEKPNTLEELEVVTESCVEVQEINEDDYLVVIRFTPTVPHCSLATLIGLCLRVKLQRCLPFKHKLEIYISEGTHSTEEDINKQINDKERVAAAMENPNLREIVEQCVLEPD
FAM96A_bosTau --EKALEVYDLIRTIRDPEKPNTLEELEVVTESCVEVQEINEDDYLVIIRFTPTVPHCSLATLIGLCLRVKLQRCLPFKHKLEIYISEGTHSTEEDINKQINDKERVAAAMENPNLREIVEQCVLEPD
FAM96A_canFam --EKALEVYDLIRTIRDPEKPNTLEELEVVTESSVEVQEINEEDYLVIIRFTPTVPHCSLATLIGLCLRVKLQRCLPFKHKLEIYISEGTHSTEEDINKQINDKERVAAAMENPNLREIVEQCVLEPD
FAM96A_loxAfr --EKALEVYDLIRNIRDPEKPNTLEELEVVTESCVEVQEINEDDYLVIIRFTPTVPHCSLATLIGLCLRVKLQRCLPFKHKLEIYISEGTHSTEEDINKQINDKERVAAAMENPNLREIVEQCVLEPD
FAM96A_choHof --EKALEVYDLIKIIQDPEKPNTLEEPEVATESCVEVQEINEEDYLVII-FTPTVPHCCLATLIGLCLRVKLQRCLPFKHNLEIYISEGTHSTEEDINKQINDKERVAAAMENPNLREIVEQCVLEPD
FAM96A_monDom --EKALEVYDIIRTIRDPEKPNTLEELEVVTESCVEVKEIHEEDYLVIIRFTPTVPHCSLATLIGLCLRVKLQRCLPFKHKLEIYISEGTHSTEEDINKQINDKERVAAAMENPNLREIVEQCVIEPD
FAM96A_ornAna --DKALEVYDLIRTIRDPEKPNTLEELEVVTESCVKVKEVDEDDYLVIIRFTPTVPHCSLATLIGLCLRVKLQRCLPFKHKLEIYISEGTHSTEEDINKQINDKERVAAAMENPNLREIVEQCVLEPD
FAM96A_taeGut --DRAIEVYDIIRTIRDPEKPNTLEELEVVTENCVQVQEIGEDEYLVIIRFTPTVPHCSLATLIGLCLRIKLQRCLPFRHKLEIYISEGTHSTEEDINKQINDKERVAAAMENPNLREIVEQCVTEPD
FAM96A_galGal --DKALEVYDIIRTIRDPEKPNTLEELDVVTESCVQVDEIGEEEYLVVIRFTPTVPHCSLATLIGLCLRIKLQRCLPFRHKLEIYISEGTHSTEEDINKQINDKERVAAAMENPNLREIVEQCVTEPD
FAM96A_chrPic --DRALEVYDIIRTIRDPEKPNTLEELEVVTESCVEVHEIGEDEYLVIIRFTPTVPHCSLATLIGLCLRIKLQRCLPFKHKLEIYISEGAHSTEEDVNKQINDKERVAAAMENPNLREIVEQCVTEPD
FAM96A_anoCar --ERALEVYDIIRTIRDPEKPNTLEELDVVTESCVEVHETSEDEYLVTIRFTPTVPHCSLATLIGLCLRIKLQRCLPFKHKLEIFISEGAHSIEEDINKQINDKERVAAAMENPNLREIVEQCVLEPD
FAM96A_xenTro --ERALEVYDIIRNIRDPEKPNTLEDLDVVSESCVSVQELDEECYLVVIRFTPTVPHCSLATLIGLCLRVKLQRCLSFKHKLEIYISEGTHSTEEDINKQINDKERVSAAMENPNLREIVEQCVTEPD
FAM96A_danRer --EKALEVYDVIRTIRDPEKPNTLEELDVVTEKCVEVQELGDDEYLIVIKFSPTVPHCSLATLIGLCLQVKLQRCLPFKHKLEIYITEGTHSIEEDINKQINDKERVAAAMENPNLREIVEQCVTEPD
FAM96A_oreNil --EKALEVYDVIKSIRDPEKPNTLEELEVVTEKCVEVQELGEDEYLIIIRFSPTVPHCSLATLIGLCLQVKLQRCLPFKHKLEIYISEGTHSTEEDINKQINDKERVAAAMENPNLREIVEQCVTEPD
FAM96A_oryLat --EKALEVYDVIRSIRDPEKPNTLEELEVVTEKCVEVQDLGEDEYLIIIKFSPTVPHCSLATLIGLCLQVKLQRCLPFKHKLEIYLSEGTHSTEEDINKQINDKERVAAAMENPNLREIVEQCVTEPD
FAM96A_cioInt MEDYEGTIYDIIRTIKDPEKPGSLEDLDVVYEEGVSVKTSENHRCNVEVKFRPTIKHCSLATLIGLCLHVKLQRTLPTTHKIRVFVKEGSHNTEDEVNKQINDKERIAAAMENPNIRKMVENCIKEPD
FAM96A_braFlo LDDLSDIVYDLIRDIRDPEKDNTLEELDVVYESGVHVEPWGEDKFHISIEFTPTVPHCSLATLIGLCLRVKLENNLPQHYKLDITVKEGTHSTGPEINKQINDKERIAAAMENPDLRAVVNKCVQDPE
FAM96A_sacKow EKELAEEIYDIIRTIRDPEKPQTLEDLDVVYEDGVLVNHRGTDEFLVNVEFTPTVPHCTLATLIGLCIRVKLQRTLPHSYKLDIFIKKGTHSTEDEINKQINDKERIAAAMENPNLKDLVDNCVVDLE
FAM96A_strPur LNGMAGDIYDIIRDIQDPEKPNTLEDLEVVYEEGVTVAALETEEQLINIEFTPTVPHCSLATLIGLCLRVRLERSLPNKHKLDIIVKKGTHATEDDINKQINDKERIAAAMENPNLRKLVEHCVSIED
FAM96A_triCas DSELKYTVYDLIRTIKDPEKPNTLEELNVVYEEGVEVKERTSGNVSVVVEFNPTVPHCSLATLIGLCIRIKLERCIPYRIKLDIYIKAGAHTTEHEINKQINDKERIAAAMENPNLREMVENCIVEED
FAM96A_nemVec NRNLALDVYDLIKDIKDPEKPQTLEDLKVVYESCVEVQKVAGQDHIT-ITFTPTVPHCSLATLIGLCIRVKLEKSLPEKFKLDIYLKKGTHSTENEINKQINDKERIAAAMENPNLRKIVENCIDEDN
FAM96A_acrPal NLVLVQEVFDIVKDIRDPELPQTLEELHVIEEEFIKIDKIENDEYIIKIEFTPTVPHCSLATLIGLCLRVKLERSLPYKFKLDIFLSRGTHSTENEINKQINDKERIAAAMENPNLKKIVEECILDAN
FAM96A_triAdh NQKLCSQIFELIRDIKDPELPQTLEELNVVTEDEIFVRNMKQGEACIRINFTPTVPHCSLATLIGLCIRVKLQRCLDQDYKLDIYVTKGSHDTEDGVNKQINDKERVAAAIENPNVKKLVEECLQEVQ
FAM96B_homSap DSIDAREIFDLIRSINDPEHPLTLEELNVVEQVRVQV---SDPESTVAVAFTPTIPHCSMATLIGLSIKVKLLRSLPQRFKMDVHITPGTHASEHAVNKQLADKERVAAALENTHLLEVVNQCLSARS
FAM96B_papHam DSIDAREIFDLIRSINDPEHPLTLEELNVVEQVRVQV---SDPESTVAVAFTPTIPHCSMATLIGLSIKVKLLRSLPQRFKMDVHITPGTHASEHAVNKQLADKERVAAALENTHLLEVVNQCLSARP
FAM96B_micMur DSIDAREIFDLIRSINDPEHPLTLEELNVVEQVRVQV---SDPESTVAVAFTPTIPHCSMATLIGLSIKVKLLRSLPQRFKMDVHITPGTHASEHAVNKQLADKERVAAALENTHLLEVVNQCLSARS
FAM96B_tupBel DSIDAREIFDLIRSINDPEHPLTLEELNVVEQVRVQV---SDPESTVAVAFTPTIPHCSMATLIGLSIKVKLLRSLPQRFKMDVHITPGTHASEHAVNKQLADKERVAAALENTHLLEVVNQCLSARS
FAM96B_musMus DSIDAREIFDLIRSINDPEHPLTLEELNVVEQVRIQV---SDPESTVAVAFTPTIPHCSMATLIGLSIKVKLLRSLPQRFKMDVHITPGTHASEHAVNKQLADKERVAAALENTHLLEVVNQCLSars
FAM96B_oryCun DSIDAREIFDLIRSINDPEHPLTLEELNVVEQVRVQV---SDPESTVAVAFTPTIPHCSMATLIGLSIKVKLLRSLPQRFKMDVHITPGTHASEHAVNKQLADKERVAAALENTHLLEVVNQCLSARS
FAM96B_bosTau DSIDAREIFDLIRSINDPEHPLTLEELNVVEQVRVQV---SDPESTVAVAFTPTIPHCSMATLIGLSIKVKLLRSLPQRFKMDVHITPGTHASEHAVNKQLADKERVAAALENTHLLEVVNQCLSARS
FAM96B_canFam DSIDAREIFDLIRSINDPEHPLTLEELNVVEQVRVQV---SDPESTVAVAFTPTIPHCSMATLIGLSIKVKLLRSLPQRFKMDVHITPGTHASEHAGKKQLADKERVAPPLENTHLLEVVNQCLSARS
FAM96B_loxAfr DSIDAREIFDLIRSINDPEHPLTLEELNVVEQVRVQV---SDPESTVAVAFTPTIPHCSMATLIGLSIKVKLLRSLPQRFKMDVHITPGTHASEHAVNKQLADKERVAAALENTHLLEVVNQCLSARS
FAM96B_choHof DSIDAREIFDLIRSINDPEHPLTLEELNVVEQVRVQV---SDPESTVAVAFTPTIPHCSMATLIGLSIKVKLLRSLPQRFKMDVHITPGTHASEHAVNKQLADKERVAAALENTHLLEVVNQCLSARS
FAM96B_macEug DSIDDREIFGLIRSINDPEHPLTLEELNVVEQVRVKV---NDRESTVAVEFTPTIPHCSMATLIGLSIKVKLIRSLPERFKMDVHITPGTHASEHAVNKQLADKERVAAALENSHLLEVVNQCLSARS
FAM96B_monDom DSIDDREIFVLIRSINDPEHPLTLEELNVVEQVRVKV---NDRESTVAVEFTPTIPHCSMATLIGLSIKVKLIRSLPERFKMDVHITPGTHASEHAVNKQLADKERVAAALENSHLLEVVNQCLSARS
FAM96B_galGal DSIDDREIFDLIRSINDPEHPLTLEELNVVEQVRVKV---NDAESTVAVEFTPTIPHCSMATLIGLSIKVKLIRSLPERFKMDVHITPGTHASEHAVNKQLADKERVAAALENSHLLEVVNQCLSARS
FAM96B_xenTro DRIDDREIFDLIRCINDPEHPLTLEELNVVEEIRVKV---SDEESTVSVEFTPTIPHCSMATLIGLSIKVKLLRSLPERFKVDVHITPGTHASEHAVNKQLADKERVAAALENSHLLEVVNQCLSGRS
FAM96B_tetNig DPIDDREIFDLIRTINDPEHPLSLEELNVVEQVRVKV---NDAESTVDVEFTPTIPHCSMATLIGLSIKVKLLRCLPNRFKIDVHITPGTHASEEAVNKQLADKERVAAALENSSLLEVVNQCLSSRG
FAM96B_gasAcu DPIDDREIFDLIRAINDPEHPLSLEELNVVEQVRVQV---NDEESIVGIEFTPTIPHCSMATLIGLSIKVKLLRSLPDRFKIDVHITPGTHASEEAVNKQLADKERVAAALENSSLLEVVNQLTPTRG
FAM96B_danRer DPIDVREIFDLIRSINDPEHPLSLEELNVVEQVRVNV---NDEESTVSVEFTPTIPHCSMATLIGLSIKVKLLRSLPDRFKIDVHITPGTHASEDAVNKQLADKERVAAALENSQLLEVVNQCLSSRG
FAM96B_petMar DEIDSREVFDLIRGINDPEHPLTLEELKVVEEAYVSV---TDAESMVVVAFTPTIPHCSMATLIGLAIRVQLLRCLPDRFKVDVHIAPGMHASEHAVNKQLADKERVAAALENSHLLGVVNQCLGGRK
FAM96B_cioInt DPFDRREIFDLIRDINDPEHPLTLEDLRVVSENDIEV---DDEKSFIKVSFTPTIPHCSMATLIGLAIRVRLLRSLPPRFKVEVEISPGSHQSEKAVNKQLGDKERVAAALENNHLLNVVNQCLTgrk
FAM96B_braFlo DAIDAREIFDILSSINDPEHPLTLEELNVIEQSRITV---DEDNNHVSVEFTPTIPHCSMATLIGLSIRVKLLRALPTRFKVDVHITPGTHQSEHAVNKQLADKERVAAALENQHLLEVVNQCLSTRN
FAM96B_strPur DAIDTREVFDLIRNINDPEHPLTLEELNVVQQAEVEV---DDPGNVVKVTFTPTIPHCSMATLIGLAIRVKLIRSLPSRFKVDINIKPGTHVSENAVNKQLADKERVAAALENNHLLEVVNQCLTQRD
FAM96B_monFav DKIDEREVFDLIRSINDPEHPLTLEQLNVVEQSLVEV---DDTNNYVKIQFTPTIPHCSMATLIGLAIRVQLLRSLPDRFKVDISITPGTHASEDAVNKQLADKERVAAALENTHLLEVVNQCLAVRH
FAM96B_nemVec DKIDDREIFDMIRSINDPEHPLTLEELNVVEQALIDV---SDDESYVKVQFTPTIPHCSMATLIGLAIRVRLLRSLPDRFKVDVKITPGTHQSEIAVNKQLADKERVAAALENNHLLDVIDQCLVSKK
FAM96B_plePil dkiddreivdmirsINDPEHPNSLEELSVVQLDLITC---NDTDNYVDVKFTPTIPHCSMATLIGLSLKVKLLRSLASRFKVDVRITPGSHSTEEAINKQLADKERVAAALENPQLVNMVNQCIYGkk
FAM96B_ampQue DEIDAREVFDLIRHINDPEHPLTLEELNVVQEDLICI---NNKENFVSVHFTPTIPHCSMATLIGLSIRVCLLRSLPNRFKIDVIITPGSHMSEQAINKQLADKERIAAAIENSHLLNVVHQCLNTKR
FAM96B_subDom DEIDAREVFDLVKNINDPEHPLTLEQLNVVQLGHIDV---SDVDSSVTVYFTPTIPHCSMATLIGLSIRVRLLRALPARFKVDVMISPGTHASEVAVNKQLADKERIAAALENNHLLDVVNSCLTGVr
FAM96B_triAdh DKIDEREIFDIIRSINDPEHPLTLEELNVVEECKIDV---DDDNNFVKVHFTPTIPHCSMATLIGLCIRVRLIRSLPERFKVDITVTPGSHSSEIAVNKQLADKERVAAAMENSNLLKVVNQCLAMDr
FAM96B_monBre DPFDSREIFDLVRHINDPEHPLTLEELNVVRLDQILV---DDAQNYVRVQFTPTIPHCSMASLIGLCLRVRLLRALPPRFKVDVEIFPGTHATEASINKQLADKERVAAALENPNLKTVVNECLQLDD
FAM96B_sacCer DLIDAQEIYDLIAHISDPEHPLSLGQLSVVNLEDIDVHDSGNQNAEVVIKITPTITHCSLATLIGLGIRVRLERSLPPRFRITILLKKGTHDSENQVNKQLNDKERVAAACENEQLLGVVSKMLVTCK
FAM96B_ajeDer EPIDEQEIYDLIATIADPEHPISLGALAVVSLPDISIKPPDSPLRTVSVLITPTITHCSLATVIGLGVRVRLEQSLPPRFRVDVRIKEGTHSTADEVNKQLADKERVAAALENGTLMGVIGKMLETCQ
FAM96B_ashGos DPVDPQEIYDLIAHISDPEHPLTLGQLAVVNLPDIEVRDSGDPHAEVVVRITPTITHCSLATLIGLGIRVRLERSLTPRFRITVLLKKGSHQSENQVNKQLNDKERVAAACENEQLVEVVSKMLSTCK
FAM96B_canAlb DPIDEQEIFDLIATISDPEHPLTLAQLAVVNLSDIKITNDGGGGSEVLIKITPTITHCSLATLIGLGIRVRLDRSLPSRYRIKILIKEGTHQSENQVNKQLNDKERVAAACENDQLLNVISQMLSTCK
FAM96B_dekBru EPIDAQEIYDLTASISDPEHPLTLGQLAVXNLNDIEVKNASDKSGEILLRITPTISQCSLATLIGLGIRVRLDRCLPKRFRITILLKEGTHQTEKQVNKQLNDKERVSAAAENPQLLKVISNMLSSCE
FAM96B_kluLac DPIDAQEIYDLIAHISDPEHPLTLGQLAVVNLADIEVHDTNGKDAEVIVRITPTITHCSLATLIGLGIRVRLERSLSPRFRITILLKKGTHQSENQVNKQLNDKERVAAACENDQLLGVVSKMLSTCK
FAM96B_komPas ESVDALEIYDLISSISDPEHPLTLGQLAVVNLEDIQLDDSGNPNAEVIIKITPTITHCSLATLIGLGIRVRLERCLPPRYRIIIKVKEKTHQSENQVNKQLNDKERVSAACENDQLLKVISQMLSSCK
FAM96B_parBra EPIDEQEIFDLISTIADPEHPISLGSLAVVSLPDISIRPPDSPLRTVTVLITPTITHCSLATVIGLGVRVRLEQSLPHRFRVDVRIKEGTHSTADEVNKQLADKERVAAALENGTLMGVIGRMLETCQ
FAM96B_schPom DPIDPQEIYDLLAKINDPEHPLTLAQLSVVKLEDIEVVDNVEGDSYITVHITPTIPHCSMCTLIGLCIRVRLERCLPPRFHVDVKVKKGTHASESQVNKQLNDKERVAAACENEQLLSVLNGMMATCV
FAM96B_triAtr EAIDEQEIYDLISNITDPEHPVSLGQLSVINLPDIHITPVPSPNVQVTVELTPTVTHCSLATVLGLGVRVRLEQVLPPNYRVEVICKENSHSQDDQVNKQLSDKERVAAALENDSLKSVLDKMLESCI
FAM96B_yarLip EPIDSQEIYDLIATISDPEHPLTLGQLAVVKLEDIWVHDTGDKNAEIVVKITPTITHCSLATLIGLGIRVRLERALPPRFRFTITVKEGTHQSENQVNKQLNDKERVAAACENEQLLGVISGMLATCQ
FAM96B_micSpp DPVDAIEVFYHIKNINDPEHPYSLEQLDIVSVENIRV---HSEAQFIQVYFTPTVPHCSMATLIGLAIRRKLQESLAGRFKTEVLVFPGSHSSESAVNKQLNDKERVAAALENTNLLEKVNLCLRGNL
FAM96B_ostLuc DAVDALEIFDHVRDINDPEHPYSLERLNVVGASAIEC---DDARNRVRVEFTPTVPHCSMATLIGLSIRVKLLRTLPRRFKVDVVIAPGTHASERAVNKQLNDKERVAAALENGNLLEKVDLCLSGKT
FAM96B_araTha EPIDQLEIFDHIRDIKDPEHPNTLEDLRVVTEDSVEV---DDENSYVRVTFTPTVEHCSMATVIGLCVRVKLLRSLPSRYKIDIRVAPGSHATEDALNKQLNDKERVAAALENPNLVEMVDECLPSEE
FAM96B_vitVin EPVDQQEIFDHIRDIKDPEHPYSLEELKVITEDAIEV---DDKRSYVRVTFTPTVEHCSMATVIGLCLRVKLLRSLPSRYKVDIKVAPGTHATEAAVNKQLNDKERVAAALENPNLLDMVDECLAPSY
FAM96B_popTri EPIDQLEVFDHIRDIKDPEHPYSLEELKVITEDAIEV---DDNHSYVRVTFTPTVEHCSMATVIGLCLRVKLMRSLPQRYKVDIRVAPGTHATESAVNKQLNDKERVAAALENPNLVDMVDECLAPSY
FAM96B_zeaMay EPIDQLEIFDHIRDIKDPEHPYSLEQLNVVTEDSIEL---NDESNYVRVTFTPTVEHCSMATIIGLCIRVKLVRSLPPRYKVDIRVAPGSHATEAAVNKQLNDKERVAAALENPNLLDMVEECLSPTF
FAM96B_braDis EPIDQLEIFDHIRDIKDPEHPYSLEELNVVTEESVEI---NDKLSHVRVTFTPTVEHCSMATVIGLCVRVKLIRSLPPRYKVDIRVAPGSHATETAVNKQLNDKERVAAALENPNLLDIVEECLAPTF
FAM96B_dicDis DEFDEQEIFDLVRSITDPEHPLTLEQLNVVRIENVNI---NLENSYILLYFTPTVPHCSMANLIGLSIKEKLARSLPKRFKVDVIVTPGSHSSESSVNKQLNDKERVSAALDSSSILTIVNECIKQN-
FAM96B_polPal DDFDVYEIFDLVRDINDPEHPLTLEQLNVVRHENIKI---DISNNIIRLYFTPTVPHCSMANIIGLSIKEKLSRSLPQRFKVDVKVTPGSHSSEQSVNKQLNDKERVSAALDSSSILNVVNECIKLPI
FAM96B_entDis EDIDQLEIYEHIRRIKDPEHPVTLEQLKVISPDLINV---DDKGNHIIVKFTPTVDNCTMATLIGLTIRTKLMRILPPRIKLDIYLTKGTHQTEEDVNKQLNDKERIAAALEKQTLLQLVNKCLIlpi
FAM96B_phyInf DPFEPDEVFEILRHINDPEHPLTLEQLKVMSLENVHV---DDVNSRVKIFFTPTIPHCSMATLIGLCLRVKLLRSLPSRFKVDILITPGTHSSEAAVNKQLNDKERVAAALENSHLLTVVNKCIAHTD
FAM96B_thaPse DAITVNEIFDIVRNIQDPEHPLTLEQLNVVRLELIKV---VDSFSTVHVQFTPTIPHCSMATLIGLSLRVKLLRSLPPRFKVVVEIESGTHASEHAVNKQLADKERVRAALENEHLLGVVNKCIAGVA
FAM96B_phaTri DMVDADEVFEIIRNIQDPEHPLTLEQLGVVSKRQIDV---HDSYSTLDVRFTPTIPHCSMATHIGLCLRVKLDRSLPPRFKVKVRIEPGSHSSETAINKQLADKERVCAALENKHLLGIVNRCIIDGM
FAM96B_naeGru DEFDALEVYDLIRNINDPEHPLSLEQLKVTQHDLITV---DNKNNLIVIYFTPTITHCSMATLIGLSIRVKLLRSLPKRFKVDIFITPGTHQSEDQVNKQLNDKERVAAALENERLLSVVNRCIAQS-
FAM96B_triVag EAIDSLELYNYIRLIKDPEHPFSLEQLHIVSPDDIKV---DDKEGRVNLVFTPTVPNCSLPAVLGLCIRERLLQVLPQRFKIFITVARGKHIQEDSINRQLRDKERCLAALERRNIRTMIDNCIACDD
FAM96B_cryMur SEITPMDIFEIIRRIKDPEYPLTLEQLNVVELKNISV---DNNANRVIVYFTPTITSCSQASLIGLSILFKLTFTLPSRFKVIIKVTPGSYDSEEALNKQMRDKERVRAALENMQIFKAITRGIVNSD
FAM96B_babBov DEFEVTEIFNIIRNIKDPEYSYTLESLKIVEPENIDI---DQENAIVTVKFTPTVPHCSQATIIGLMIYVKLQQSLPLHFKIDVQITEGTHNTEDAINKQLLDKERVAAALENPVLLDMINDGIYNTV
FAM96B_tetThe DEIDQLEIFDLIRHIDDPEHPLTLEQLNVLQPENIKV---NIDHKLVTVLFTPTIPHCSLAQIIGLMIKVKLIRSLPRDYKVDVYITPGTHVQELSVNKQINDKERVMAAIENPSILRVVNKGVSNSD
FAM96B_theAnn ESFDEEEIFDIIRTIKDPEYSYSLEDLNVVSKDNIFI---DEDTSTISVFFTPTVPHCTQASIIGLMIFVKLYQSLPPYFKIDVQISKGTHNTEEMINKQLLDKERISAALEYPPILKMINKGILFLQ
FAM96B_tryCru DPIDSLEVFHHIRSIRDPEHPNTLEELKVVEPELIRV---DEVKQTVRVQFTPTVPHCSMTTLIGLCISLKLQRSLPRGTKVDVYVTPGSHEQEEQVNKQLNDKERVAAALENKNLLNVVESCLNEFE
FAM96B_leiInf DPIDAWEVFEIIRRIRDPEHPNSLEQLKVVEPSLITV---DWKKRHIRVLFTPTVPHCSLTTLIGLSIRLQLERSLPEYTKVDIYVTPGTHEQEAQVNKQLNDKERVAAALENCNLLNVVESCINEFD
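The alignment above lends itself to a simple scan for paralog-diagnostic positions, that is, columns where every FAM96A row carries one residue and every FAM96B row carries a different one. A minimal Python sketch, assuming each row is a whitespace-separated `NAME SEQUENCE` line as laid out above and that the two paralogs can be told apart by the name prefixes `FAM96A` and `FAM96B`; the function names `parse_alignment` and `diagnostic_columns` are hypothetical, chosen for illustration:

```python
def parse_alignment(lines):
    """Map each row name to its aligned sequence (upper-cased).

    Assumes 'NAME SEQUENCE [optional annotation]' rows; anything after
    the second field (e.g. a species name) is ignored.
    """
    rows = {}
    for line in lines:
        fields = line.split()
        if len(fields) >= 2:
            rows[fields[0]] = fields[1].upper()
    return rows


def diagnostic_columns(rows, prefix_a, prefix_b):
    """Return (1-based column, residue_a, residue_b) for columns that are
    invariant within each group but differ between the two groups."""
    group_a = [seq for name, seq in rows.items() if name.startswith(prefix_a)]
    group_b = [seq for name, seq in rows.items() if name.startswith(prefix_b)]
    if not group_a or not group_b:
        return []
    width = min(len(seq) for seq in group_a + group_b)
    hits = []
    for col in range(width):
        res_a = {seq[col] for seq in group_a}
        res_b = {seq[col] for seq in group_b}
        # Require one ungapped residue per group, different between groups.
        if (len(res_a) == 1 and len(res_b) == 1 and res_a != res_b
                and "-" not in (res_a | res_b)):
            hits.append((col + 1, next(iter(res_a)), next(iter(res_b))))
    return hits


# Toy rows in the same layout (illustrative, not real data):
toy = parse_alignment([
    "FAM96A_aaaAaa ACDE",
    "FAM96A_bbbBbb ACDF",
    "FAM96B_aaaAaa AGDE",
    "FAM96B_bbbBbb AGDE",
])
print(diagnostic_columns(toy, "FAM96A", "FAM96B"))  # [(2, 'C', 'G')]
```

Because a single divergent row disqualifies a column under strict invariance, a frequency threshold per group may be more informative on the full alignment.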
Alignment of bacterial SufT proteins:
*
SufT_ralEut -EPDSLEGRVIAALRTVYDPEIPVNIYDLGLIYQLSVDEASGKVGIRMTLTAPGCPVAQTFPGVVESAVMEASGVDAVEVELVWDPPWSRERMSEAARLELGLL  Ralstonia eutropha
SufT_burTer -TTERLEVRVIDALRTVFDPEIPVNIYDLGLVYGLDVDQAEGRVEIRMTLTAPGCPVAQTFPATVEDAVFCVCGVNEVHVELVWDPPWSRERMSDAARLQLGML  Burkholderia terrae
SufT_azoSpp -EGEGLRDEVVAVLRLIYDPEIPVNIYDLGLIYGLDIDEAKGKVDIRMTLTAPACPVAESLVSSVKEAVASVDGVEDAAVELVWDPPWTQDRMSEAARLDLGLL  Azoarcus spp.
SufT_pseFer -PVSELQQRVIEALRTVYDPEIPVNIYDLGLVYALDVDDAEGKVRIDLTLTAPGCPVAQTFPALVAEAVERVDGVHEAEVELVWEPPWSQDMMSEAARLELGLL  Pseudogulbenkiania ferrooxidans
SufT_polSpp -PVTGLQARVIEALRGVFDPEIPVNIYDLGLVYGLDVDEALGKVHIRLTLTAPGCPVAQTFPEVVGSTVDGVPGVNEVEVELVWQPPWSKGMMSEAARLQLGLL  Polaromonas spp.
SufT_legLon -DSELLKEAIINALRGVYDPEIPVNIYDLGLIYDVSIDD-NAHVLIQMTLTTPGCPVAQTFPGTVEQAVNQVEGVSDCTVELVWEPPWSQERMTEAARLELGIF  Legionella longbeachae
SufT_metSpp -TAEDLKEDVIEMLKTIYDPEIPVNIYELGLIYQIDVSD-SGNVVIQMTLTAPGCPVAQTFPGDVENKIRSIDGVNKVHVELVWDPPWTRDQMSEAAQLQLGMF  Methylophaga spp.
SufT_fluDum -DAELLKEAIVNALREIYDPEIPVNIYDLGLIYDISIDD-ESHVTIQMTLTTPGCPVAQTFPGTVEQAVNKVEGVCDCTVELVWEPPWSQERMTEAARLELGMF  Fluoribacter dumoffii
SufT_ricGry -NEATLKNSIVNTLKHIYDPEIPVNIYDLGLIYHIFIDV-PGHVTIQMTLTTPGCPVAQTFPSMVENAVNAIDGVHETQVELVWDPPWTSAKMSEAAKLQLGML  Rickettsiella grylli
SufT_metMob -SREALLVRVKEMLQTIYDPELPVNIYDLGLVYKLEATE-SGQVSIEMTLTTPNCPVAQTFPDTVREKLLCVPGVSSVGVTLVWDPPWGRDSMSEAAKLQLGML  Methylotenera mobilis
SufT_pedHep -DKEALKQKVIDCLQTIYDPEIPVNIYELGLIYETEILPPLNNVQIVMTLTAPGCPAAQSIPLEVEQKVKEIDGINEVTVEVTWDPPWNRDMMSETARLELGMM  Pedobacter heparinus
SufT_allVin MDPEELREPIIASLRGVHDPEIPVNIYDLGLIYRIDIAG-NGDVSVDMTLTAPGCPVAGMMPLMVKSAVERVEGVGQVSVQLVWDPPWSADNMSDEARLQLGLM  Allochromatium vinosum
SufT_marPur VDVEALRESIVTALRGVHDPEIPVNIYDLGLIYTIDIAA-DGTVAVEMTLTAPGCPVAGMMPLMVKQAVARVEGVGEVDVALVWDPPWTQERMSDEARLQLGLM  Marichromatium purpuratum
SufT_nitMob VGGERLREAVVEALQCVYDPEIPINIYALGLIYELDVND-EGFVDVVMTLTSPSCPVAGQMPGMVKSAVEQVAGVRAAEVELTWDPPWSSDRVSEAGKLQLGLI  Nitrococcus mobilis
SufT_tisMob DDPGDLEAAVIDVLRSCYDPEIPVNIYDLGLIYEVRIDA-GDQAYIRMTLTSPMCPVAESLPVEIETKIRDIAGISDVTVEVVWDPPWTPEMMSEDARLELNMF  Tistrella mobilis
SufT_thiVio MEAEELREPIIAALRGVHDPEIPVNIYDLGLIYSVDIAP-NGDVAIDMTLTAPACPVAGMMPIMVKDAVSRVDGVGEVRVELVWDPPWGQENMSDEARLQLGLM  Thiocystis violascens
SufT_ignAlb ISKQELEEKIIQALKTCYDPEIPVDIFELGLIYEVAI-DDNNNVKIKMTLTSPMCPAAQSLPLEVEGKVKSIPQVNDVKVEVVWNPPWNKDMMSEVAKLELGFL  Ignavibacterium album
SufT_anaThe KEFLNIESDIVKVLKTVYDPEIPVNIYDLGLIYEIEVRDE-GKVKIVMTLTSPNCPVAESLPEEVYEKVLAVDGVNDVELHLTFDPPWSKDMLSEEAMLELGLL  Anaerophaga thermohalophila
SufT_zymMob PDKEKLKAEIIETLRDIYDPEIPVNIYDLGLIYDIEIGD-DNHVVIKMTLTTPNCPVAGSMPAEIELRVGQIKGVGAVEVELVWDPPWGMDRISDEAKLELGLL  Zymomonas mobilis
SufT_thiDre VDAEALREPIIAALRMVHDPEIPINIYDLGLIYRIDIAG-DGDVKVDMTLTAPACPVAGMMPLMVRDAVARVEGVGEVQVELVWDPPWNQNNMSDEARLQLGLM  Thiorhodococcus drewsii
SufT_erySpp GAGSDLQQAVIDALKEIYDPEIPVNIYDLGLIYGVEVDD-EADATITMTLTTPHCPVAETMPGEVELRAASVPGIRDAEVELVWDPPWSPEKMSDEARLELGML  Erythrobacter spp.
SufT_rosNub STDHPLYDGVVEACRTVYDPEIPVNIYDLGLIYTIDIDD-ESAVKIIMTLTAPGCPVAGEMPGWVAEAIEPMAGVKQVDVELTWEPPWGMEMMSDEARLELGFM  Roseovarius nubinhibens
SufT_rosSpp STDHPLYDQLVEACRTVYDPEIPVNIYDLGLIYTIEIDA-ENAVRVIMTLTAPGCPVAGEMPGWVAEAIEPVAGVKQVDVELVWDPPWGMDMMSDEARLELGFM  Roseobacter spp.
SufT_sphAla AVGGDLYEAVIAALKDIYDPEIPVNIYDLGLIYNVEID--EGHVMVTMTLTTPHCPVAESMPGEVELRVGAVPGVGDAEVNLVWDPPWSPANMSDEARLELGML  Sphingopyxis alaskensis
SufT_braSpp ADKDALVTEIVAALRTVHDPEIPVNIYDLGLIYRIEPKD-DGQVDIDMTLTAPGCPVAGEILTWVETAVRAIDGIAGVEVRLVFDPPWDSSRMSDDVKLELGLL  Bradyrhizobium spp.
SufT_pheZuc AELDRLTDQLIEKLKTVYDPEIPVDIYELGLIYKVDVSD-DKDVAIDMTLTAPGCPVAGEMPGWVEDAVMEIDDIKSCKVELVFDPPWDPSRMSDEAKLQLNMF  Phenylobacterium zucineum
SufT_thiSpp MDAEELREPIIAALRRVHDPEIPVNIYDLGLIYKIDIAS-NGNVDVDMTLTAAACPVAGMMPLMVKDAVQKVEGVGQVEVELVWDPPWSQDNMSEEALLQLGMM  Thiorhodovibrio spp.
SufT_rhoBac TTDHPLYENVVEACRSVYDPEIPVNIYDLGLIYTIEIDA-ESDVAIKMSLTAPGCPVAGEMPGWVAEAVEPLPGVKTVAVELVWEPPWGMEMMSDEARLELGFM  Rhodobacterales bacterium
SufT_salRub TPDDDLYQRVVESLREIYDPEIPVNIYDLGLIYHLDVGE-DSHVDVLMTLTAPNCPAAGVLPGQAEDAVRETEGVESVNLEMTFEPPFSPQMMSEEARLELGFM  Salinibacter ruber
SufT_azoAma DGDPALMAAVFEALRTVRDPEIPVNLVDLGLIYRVRVHR-DGLVHIDMTLTTPACPVATTLPGQVQNLVSLVPGVSVVLVDMVWDPPWTRDRMTESARLELGLV  Azospirillum amazonense
SufT_cycMar DDQEALKGKVINAIKDVYDPEIPVDVYELGLIYEITVYPV-NNVYILMTLTSPNCPAAESIPSEVKESVGNIEGVNNVEVELTFDPPYSQDMMSEVAKLELGFL  Cyclobacterium marinum
SufT_rhoMar LGDKELEQAIIEALKSVYDPEIPVNIYDLGLIYEIRIFE-DRTVYVKMTLTAPGCPVAGTLPGQVEMRLQEVPGVKDARVELTFDPPYTIERMSDEARLALGWM  Rhodothermus marinus
SufT_niaSol MEPDTIEEKVIKELQTVFDPELPVNIYELGLIYKVEVLND-NYVKILMTLTAPSCPAAQSLPVEVDQKIRAIEGVSDVDVTITWDPAWNKSMMSEAAQLELGFL  Niabella soli
SufT_xanAut EDEAALMEAIIAGLRTVTDPEIPVNIYDLGLIYRIELKD-DGVVEIDMTLTAPGCPVAGQMLGWVQQAVGVVEGVSDVKMKLVFDPPWDKSRMSDEVQLELGLI  Xanthobacter autotrophicus
SufT_cauCre DELNRLTDQLIEKLKTVYDPEIPVDIYELGLIYKVDVSD-SKDVAIDMTLTAPGCPVAGEMPGWVKDAVMEIPGLKSCTVELTFDPPWDASRMSDEAKLQLNMF  Caulobacter crescentus
SufT_cytHut IDQAELKNKALEAIQTVYDPEIPVNIFELGLIYEVSVFPV-NNIFVQMTLTSPNCPAAQSMPAEVENKIKAIEGVNEVTVEITFDPTWSQEMMSDAAKLELGFM  Cytophaga hutchinsonii
SufT_ruePom SVDHDLYEPVVEACRTVYDPEIPVNIFELGLIYTVEISD-ENEVRVIMTLTAPGCPVAGEMPGWVAAAVESVPGVKSVEVEMTWDPPWGMEMMSDEARLELGFM  Ruegeria pomeroyi
SufT_jooMar IDTTALGEKIVTVLKTIYDPEIPVDIYELGLIYDVFVNED-SEVKILMTLTTPNCPVAETLPVEVEEKIKTIDDVKDAEVEITFDPPWSQDLMSEEAKLELGLL  Joostella marina
SufT_flaBac SKDVALGEQIVGVIKTIYDPEIPVDIYELGLIYDVLVNED-NEVKILMTLTSPNCPVAETLPVEVEEKVKSIDAVKDAEVEITFDPPWTQELMSEAAKLELGML  Flavobacteria bacterium
SufT_citSpp STGHPLHEALVEACRTVYDPEIPVNIYDLGLIYTIAIDD-ENAVKVIMTLTAPGCPVAGEMPGWVQEATSTVPGVRDVDVEMVFDPPWGMDMMSDEARLELGFM  Citreicella spp.
SufT_celAlg IDTAELGEKIVGVLKTIYDPEIPVDIYELGLIYDVFVNED-NEVKILMTLTSPNCPVAESLPAEVEEKVKSLDAVKDAEVEITFDPPWTQELMSEEAKLELGML  Cellulophaga algicola
SufT_gluMor EGTVPDQDAIIASIATVYDPEIPVNIYELGLIYAIDLHD-DGRVKIEMTLTAPNCPSAQELPEMVKDAVSHVPHVKNVEVEIVWDPPWDMSRMSDDARLALNMF  Gluconobacter morbifer
SufT_dyaFer MSEEDLKEEVIRAIKTVYDPEIPVDVYELGLIYDLKVFPI-NNVFVSMTLTSPSCPSAGTLPGEVEQKIREVEGVNDVSVELTFDPPYSTEMMSEEAKLELGFM  Dyadobacter fermentans
Consensus   L !! al t!yDPEIPV#I%#LGL!Y V ! MTLTaP CPvA P V v ! g! v Velvw#PPw MS# ArL#Lg
Prim.cons.  sDDEALKEAVIEALRTVYDPEIPVNIYDLGLIYEIDIDDDEGhVKIDMTLTAPGCPVAGTMPGEVEEAVESVeGVKDVEVELVWDPPWSQDMMSEEARLELGML

FAM96_conSeq LIRSINDPEHPLTLEELNVVEESCVEVQEIDDDESLVIVRFTPTIPHCSMATLIGLCIRVKLLRSLPFRFKVDIYITPGTHSSEEAVNKQLNDKER  88 sequences
SufT_conSeq  ALRTVYDPEIPV-----NIYDLGLIYEIDIDDDEGHVKIDMTLTAPGCPVAGTMPGEVEEAVESVEGVK-DVEVELVWDPPWSQDMMSEEARLELG  42 sequences
PaaD_conSeq  LLSQVPDPEVPV----LSITDLGMVRDVEWEGD-GWVVVTFTPTYSGCPATELILGDIRQALTEA-GFT-PVHVEVQLSPAWTTDWMSEEGREKLR  21 sequences
Consensus    llr ! DPE Pv ln! #lg ! #i#dDeg V ! fTpT pgCp a li g !r al gf V !e p ws # mse# r klr
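The `Consensus` rows use the symbols `!`, `$`, `%` and `#`, which resemble the MultAlin convention (`!` = I or V, `$` = L or M, `%` = F or Y, `#` = N, D, Q, E, B or Z), with upper case for a strongly conserved column and lower case for a weakly conserved one. A minimal Python sketch of such a consensus line, assuming thresholds of 90% (high) and 50% (low) computed over all rows including gaps; the precedence rules and denominator of the tool that produced the rows above are not stated, so this approximates the idea rather than reproducing those rows. The names `GROUPS` and `consensus_line` are hypothetical:

```python
from collections import Counter

# Residue groups behind the MultAlin-style symbols (assumed convention).
GROUPS = {"!": set("IV"), "$": set("LM"), "%": set("FY"), "#": set("NDQEBZ")}


def consensus_line(seqs, high=0.9, low=0.5):
    """Build a consensus string from equal-length aligned sequences.

    Per column: a residue reaching `high` is shown upper case, otherwise a
    residue group reaching `high` is shown as its symbol; the same two tests
    are then repeated at `low`, with single residues shown lower case.
    Columns passing neither threshold become a space.
    """
    n = len(seqs)
    out = []
    for column in zip(*seqs):
        counts = Counter(c.upper() for c in column if c not in "-.")
        symbol = " "
        for threshold, is_high in ((high, True), (low, False)):
            if counts:
                residue, k = counts.most_common(1)[0]
                if k / n >= threshold:
                    symbol = residue if is_high else residue.lower()
                    break
            group = next((g for g, members in GROUPS.items()
                          if sum(counts[m] for m in members) / n >= threshold),
                         None)
            if group is not None:
                symbol = group
                break
        out.append(symbol)
    return "".join(out)


print(consensus_line(["IAC", "VAC", "IAD", "VAG"]))  # prints !Ac
```

In the toy example the first column is split evenly between I and V, so only the `!` group passes the high threshold; the third column reaches the low threshold with C alone and is therefore shown as `c`.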
Alignment of triple fusion DUF59-NUBPL-Duf971 (HCF101) proteins:
<---------------------------------------DUF59----*----------------------> < P-loop >
DUF59NUDuf971_araTha ------ASSSVGESVAQTSEKDVLKALSQIIDP-DFGTDIVSCGFVKDLGINEALGEVSFRLELTTPACPVKDMFENKANEVVAALPWVKKVNVTMSAQPAK--PIF-AGQLPFGLSRISNIIAVSSCKGGVGKSTVAVNLAYTLAGMGARVGIFDADVYGPSLPTMVNPESRILEMNPE-K-KTIIP
DUF59NUDuf971_medTru ----ASVEVGSSSISTGTAEDDVLKALSQIIDP-DFGTDIVTCGFVKDLQIDKALGEVSFRLELTTPACPIKDVFEKQANEVVAVLPWVKNVNVTMSAQPAK--PLF-AEQLPAGLQTISNIIAVSSCKGGVGKSTVAVNLAYTLADMGARVGIFDADIYGPSLPTMVSSENRILEMNPE--KKTIIP
DUF59NUDuf971_ostTau ------SMPAIYSAPDGSKESEVLSKLRRVIDP-DFGEDIVNCGFVKALVIDESAGSVLFAIELTTPACPVKAEFERQAKAFVEELDWVKRVSVTMTAQPAR--NDA-PETVE-GLRRVSHIIAVSSCKGGVGKSTTSVNLAYTLAMMGAKVGILDADVYGPSLPTMISPDVPVLEMDKE-T-GTIKP
DUF59NUDuf971_micSpp ------SSPSITSAPDGSREADVLNALRNVIDP-DFGEDIVNCGFVKDLRVSD-AGDVTFTLELTTPACPVKEEFDRLSKQYVTALEWAKSCNVNMTAQPVT--NDM-PDAVE-GLKGVRHIIAVSSCKGGVGKSTTSVNLAYTLRMMGAKVGIFDADVFGPSLPTMTSPEQAVLQMDKE-T-GSITP
DUF59NUDuf971_thaPse -------------------QSQILAALSVINDP-DLNADIVSLGFVQNLKIDESSNIVSLDLELTTPACPVKDLFVQQCQDIINGLAWTRGADVTLTSQPTA--AP--SDA-PLGMSQIGAVIAVSSCKGGVGKSTTAVNLAFALESLGAKVGIFDADVYGPSLPTMVTPEDDNVRFV-------IAP
DUF59NUDuf971_phaTri ---------------------EVLSTLKSVIDP-DLGSDIVTLGFVQNLKLDGRD--VSFDVELTTPACPVKEQFQLDCQQLVQDLPWTNNIQVTMTAQPSV--QE--T-A-TLGMSQVGAVIAVSSCKGGVGKSTTAVNLAFSLQRLGATVGIFDADVYGPSLPTMITPQDDTVRFV-------VAP
DUF59NUDuf971_ectSil YHVSMMTETTSDTPPGR--KDEVLAVLSAVMDP-DLSMDIVSLGFIKELEISGEDEVVTFDVELTTPACPVKAQFQQDCRDLVEALPWVDRAEVTMTAQPVR--DV--SDTVPTGLSKVATIIAVSSCKGGVGKSTTAVNLAFALDKQGAKVGILDADIYGPSLPTMVKPDREEVEFVGN-Q---IRP
DUF59NUDuf971_albLai LRVQSLPSCSYSTRQNALLEMDILSKLRQVPDQLGLKSDIVTLGRVKNVQLSLQEKSVYLTLEAPNGALDVAEQWKKDSMESLRELDWIQSLHIETARPKPK--NL--HAKRSSTLENVSEIVAVSSCKGGVGKSTVAVNLAYSLVQRGARVGILDADIYGPSLPTMINPEDRVVRPSPT-NKGFILP
DUF59NUDuf971_parTet SVPIVHKLYFTFS-VNEEYKIQILNRLKQIKHS-DSHKDIVSNGYVENLSID-QDGRVIIDLKLDQDYRKMKAL----CSDALKQFEWIKNLDIRMAPKKENVFTQA-NTQKRGNLQNVKKIIAVSSCKGGVGKSTIALNLTFSLQKLGFNVGIFDADVYGPSLPTLIGKEKQQLYAPED-KPKEILP
DUF59NUDuf971_tetThe IGQLTQKPTFHFSNAVQEGKAEITKKLKEITFE-DG-SNIIDNGSILTIDIE-SSGKVTVQLKLDQNYRKLKGL----CNAKLQEIPWIKEFEIKMAPKDQ----ET-SFKKRGQLENVKKIIAVSSCKGGVGKSTVAINLAFSLLKQGHKVGIFDADIYGPSIPTLINKENAILQAPED-RPKEILP
DUF59NUDuf971_perMar MARFGRRGFANETAAATEREKEILQQLSLIIDP-DLHKDIVTLGFVQNLTISD-EGVVVFDLKLTTPACPVRDQFIDACTRACSALPWVTDVKVTLSAKSRA--GGA-PEVKSENLSNVQNIVAVTSCKGGVGKSSVAVNLAYSIAKHGVKVGILDADIFGPSLPYLIPSTERAPADPQ--------P
DUF59NUDuf971_pauChr ---------------MVFTTNEALQVLAVIL-DEGSRRSVIELGWISRLRIQ--NSRIIFRLELPNFANKQRDEIVKKARASLLLLEGMKDVQIEIATAPIGQAGHG-AENGRQSISGVKHILAVSSGKGGVGKSTVAVNLACALARSGLKVGLLDADIYGPNVPTMLGVEDVKPEIAGTGNQQVLSP
DUF59NUDuf971_synSpp ----------------MTPVEQANKALQQVK-DAGSGKTALELGWIEQIRIT--PPRAVFRLSLPGFAQSQRDRIVAEARGALMALDGIEDVQIEISQGGIGQAGHG-QPAERQSIPGVRQVIAVSSGKGGVGKSTVAVNLACALAQTGLRVGLLDADIYGPNAPTMLGVADQTPEVQGSGDQQRIVP
Duf59NU_theSco       ---------------MALTEERVLEALRTVM-DPELGKDLVSLGMVGEVRLE--GGRVDLLINLTTPACPLKGQIEADIRRALHPL-GVEEVRVRFGGGVKA--------PEQYPIPGVKHVVAVGSGKGGVGKSTVAANLALALLQEGARVGLLDADLYGPSQAKMFGLEGERLKVD---QHRKILP
Duf59NU_marHyd       --------------MSRLSETDVLDALRGVN-DPELHKDLVSLGMVEQVVVD--GRKVAVKINLTTPACPLKGQIEGEVRAALERI-GAEHVEITFGASVRG--------PQQLPLPGVKNVVAVGSGKGGVGKSTVAVNLAIALSQEGARVGLLDGDIYGPSQARMLGLEGEKLRVN---EAKKIVP
Duf59NU_theRos       --------------MSELTRERVLEALRPVQ-DPELHRSLVDLGMIKEVTIE--GASVRVQVELTTPACPLRERIREDVERAVRALPGVQTVEVGFSSRVRAAGTGL---PDRQPIPGVKNTIAVASGKGGVGKSTVAVNLAVALAQEGATVGLLDADVYGPSIPLMLGAEEQ-PGLV---DNKII-P
Duf59NU_deiMar       -----------------MHQDDVLAALRTVN-DPELHRDLVSLGMVKGVRVD--GDRVDVHVELTTPACPLRGTIEADVRRAVEAA-GARDVRVEFSARVAP--------PAQPALPGVKHVLLVGSGKGGVGKSSVAANIAASLAADGARVGLMDADVYGPSIAHMMGASGEKVTAT---EDRKMRP
Duf59NU_lepFer       --------------MSTVSEELVWSALGRVI-EPDFKKDLVTLKMIENLKID--GGNLSFTIVLTTPACPLKDEMKNACLASLASVPGITNTEISFTARTTGGSF-----TGKTPIPGVKNVIAVSSGKGGVGKSTTSVNLAIALSQMGAKVGIMDADVYGPNIPMMLGITD--TPRQ---VDKKLFP
Duf59NU_myxFul       ---------------MSVTQADILSAMSKVM-DPELHVDLVKAGMVKDIRVS--GDTAKLKIELTTPACPMKGKIQADAEAALKAVPGLKSFDIEWGAQVRAAGGGM---PGGALLPKVKNIILVGAGKGGVGKSTVAVNLATALAQHGAKVGLLDADFYGPSVPLMTGLGDKRPVSP---DGKVLNP
Duf59NU_corCor       ---------------MSVSQADVMTAMSKVI-DPELHVDLVKAGMVKDVRVT--GDTVKLKIELTTPACPMKGKIQADAEAALKAVPGLKSFDIEWGAQVRGVAGGS----GGALLPGVKNILLVGAGKGGVGKSTVAVNLATSLAQHGAKVGLLDADFYGPSVPLMTGTADKRPVSP---NGKTLDP
Duf59NU_stiAur       ---------------MSVSERDILAAMSKVV-DPELHVDLVKAGMVKDIRIS--GDAVKLKIELTTPACPMKGKIQADTEAALKAVPGLKSFELEWGAQVRATGGGVGQGQGQALLPGVKNIILVGAGKGGVGKSTVAVNLATALARHGAKVGLLDADFYGPSVPLMTGITEK-PVSP---DGKTLTP
Duf59NU_anaThe       -------------MASAVTKEAVLQALSHVQ-EPELHKDLVTLGMVRDVEIE--AGKVRFRIVLTTPACPLKSRIENEARSAVLSLSGVQEVEVILDAQVPSDGR-----NRGVLSLPVRNVVAVASGKGGVGKSTVAVNLAVSLAQSGARVGLLDADIYGPNIPTMMGVQR--LPPQ---NGQKLIP
Duf59NU_anaSpp       ---------------MPLDPTTALDALRKVM-DPELRRDLVSLGMVKDVVVE--GDTVRLKVELTTPACPLKDTIGRDVKAALEGA-GFRSVELSWGAQVRAAPGAA----QGQLTPGVKNIILVGAGKGGVGKSTVAVNLAAGLARTGAKVGILDADIYGPSVPMLTGVTDR-PTSR---DGKKLEP
Duf59NU_natPha       -----------------MNEETVLDRLAAVENDPDLGDDIVSLGLVNDVNID--AETIHVDLALGAPYSPTETELAGTVRDALSE----LDREIDLTASVDT--GLS---ADEQILPDVENIIAVASGKGGVGKSTVAVNLAAGLSQLGARVGLFDADVYGPNVPRMVEADDQ-PKAT---EQETIIP
Duf59NU_halPau       -----------------MDETDVRAVLRTVE-DPDLGEDIVSLGLVNDVTVE--DETARISLALGAPYAPHESEIANRVREALND----EGIDTELSARVDT--QLS---PEEQVLPGVKNIIAVASGKGGVGKSTVAVNLAAGLAKLGARVGLFDADVYGPNVPRMVDANER-PRAT---EEQKLVP
Duf59NU_natPel       -----------------MDEAAVRDRLRTVE-DPELGDDIVSLGLVNDITVD--GDEVAIDLALGAPYSPSESDIAAEVRETLTA----EDLEPDLTASVPDRDDLT---SEDQVLPNVKNVIAVASGKGGVGKSTVAVNLAAGLSQLGARVGLFDADVYGPNVPRMVDADEP-PMAT---EDETLVP
Duf59NU_halLac       -----------------MNEADVRERLVDVR-DPDLGDDIVSLGLVNDVEVD--DDEIRISLALGAPFSPHESAIADDVRAALAD----TGLDVELSASIPD--DLE---PDEQVLPGVKNVIAVASGKGGVGKSTMAVNIAAGLSALGARVGLFDADVYGPNVPRMVSAEER-PQ-T---DGETIVP
Duf59NU_halVol       -----------------MDEADVRDRLRAVE-DPDLGDDVVSLGLVNAVEVD--GDTVRISLALGAPYSPAETDIGRRIREVLAE----DGLEVDLTAKVPT--DRD---PDEEVLPGVKNIIAVASGKGGVGKSTVAVNLAAGLSKLGARVGLFDADIYGPNVPRMVAAEEA-PQAT---QDQTIVP
Duf59NU_halXan       -----------------MDEDAVRDRLRTVE-DPELGDDIVSLGLVNDIAVD--GDEVSVDLALGAPYSPTESDIAAEAREVLIE----AGLEPDLSASVPDRDDLS---SDEQVLPGVKNVIAVASGKGGVGKSTVAVNLAAGLSQLGARVGLFDADVYGPNVPRMVDADEP-PMAT---EDETLMP
Duf59NU_halTia       -----------------MDEAELRELLASVE-DPDLEGDIVSLGLVNDVALE--NGSAHIDLALGAPFSPTETTIADRVREVIGD----AA--PDLAVELTASIDRD---TEGDVLPGVKNVVAVASGKGGVGKSTVAVNLAAGLADRGARVGLFDADIYGPNVPRMLDAHER-PEAT---EDDQIIP
Duf59NU_calSub       ---------------MLIDEQTVLNALRQVI-DPDLKIDIVTLGMIKNLVIK--DGDVSFTLELTTPACPYNKSIEDSARAAVESIPGVRSVDMRVTARVWSAKPM----AST--YPDVKNVVAVASGKGGVGKTTVAINLACSLALSGARVGLVDADIYGPTIPKIVKIVEPPRLRP---DKKVEPA
Duf59NU_cenSym       --IRLSLNNENTLTENMVGVDQVLESLGKVI-DPDLKKDIVSMGMIKDLELD--DGNLKFTLELTTPACPFNVEIEDDVRKVIGELDGIKNLNLNVTAKVMEGRSL----DEDAGMTTVKNIIGVASGKGGVGKSTVALNLALALGQTGAKVGLLDADIYGPSIPLMLGMKEAFMEVE---ANKLQPA
Duf59NU_nitSal       ----------------MVGIDQVLEKLSTVI-DPDLKKDIVSMGMIKDLELN--DGNLKFTLELTTPACPFNVEIEDDVRKAIGEISELKNFDMKVTAKVMEGRSL----EADTGMASVKNIIGVASGKGGVGKSTVSLNLALALAQTGAKVGLLDADIYGPSIPLMLGMKDGFMEVE---DNKLQPA
Consensus            e vl L ! dpdl di!slG ! v leLttpacp i r al a v lpgVkn!!aV sgKGGVGKSTvavNLA La Ga VGllDAD YGPs p $ P
Prim.cons.DUF59      EADVLqALRaVIDDPDLGKDIVSLGMVKDLRIDESGGTVSFDLELTTPACPlKDEIEADVRAAL

* * * * * * *
DUF59NUDuf971_araTha TEYMGVKLVSFGFA-GQG--RAIMRGPMVSGVINQLLTTTEWGELDYLVIDMPPGTGDIQLTLCQVAPLTAAVIVTTPQKLAFIDVAKGVRMFSKLKVPCVAVVENMCHFDAD--GKRYYPFGKGSGSEVVKQFGIPHLFDLPIRPTLSASGDSGTPEVVSDPLSDV-ARTFQDLGVCVVQQCAK
DUF59NUDuf971_medTru TEYMGVKLVSFGFA-GQG--RAIMRGPMVSGVTNQLLTTTEWGELDYLVIDMPPGTGDIQLTLCQIVPLTAAVIVTTPQKLSFIDVAKGVRMFSKLKVPCVAVVENMCHFDAD--GKRYYPFGRGSGSQVVQQFGIPHLFDLPIRPTLSASGDSGMPEVVADPQGEV-SKIFQNLGVCVVQQCAK
DUF59NUDuf971_ostTau VEYEGVKVVSFGFA-GQG--SAIMRGPMVSGLINQLLTTTDWGELDYLIIDMPPGTGDVQLTLCQVVPITAAVVVTTPQKLAFIDVEKGVRMFAKLAVPCVSVVENMSYFEVD--GVKHKPFGEGSGAKICEQYGVPNLLQMPIVPDLSACGDTGRPLVLRDPTCET-SSRYQEVAATVVREVAK
DUF59NUDuf971_micSpp TEYEGVGIVSFGFA-GQG--SAIMRGPMVSGLINQMLTTTAWGDLDYLIIDMPPGTGDVQLTICQVLPITAAVVVTTPQKLAFIDVEKGVRMFSKLRVPCVAVVENMSYFDGDD-GKRYKPFGEGSGQRICDDYGVPNLFQMPIVPDLSACGDTGRPLVLVDPAGDV-STIYGAVAAKVVQEVAK
DUF59NUDuf971_thaPse LRRGDVSLMSFGYV-NEG--SAIMRGPMVTQLLDQFLSLTNWGALDYLIMDMPPGTGDIQLTLSQRLNITAAVIVTTPQELSFVDVERGVEMFDTVNVPCIAVVENMAYLEREE-TEMIRIFGPGHKRRLSEQWGIEHTYSVPLMGQIAQNGDSGTPFILDNPKSPQ-ADIYRQLAKSVVSEVAK
DUF59NUDuf971_phaTri LQRNGVRLMSFGYV-NDG--SAVMRGPMVTQLLDQFLSVTHWGALDYLILDMPPGTGDIQLTLTQKLNITAAVIVTTPQELSFADVVRGVEMFDTVNVPCIAVVENMAYYESAD-PEKIQIFGAGHRDRLSQQWGIEHSFSIPLLNKIAANGDNGTPFVLEFPDSPP-AKIYQELASAVVSEVAK
DUF59NUDuf971_ectSil MTAHGVKLMSYGFV-NQG--AAIMRGPMVSQLLSQFVTLTSWGELDYLVIDMPPGTGDIQLTLCQVLNITAAVIVTTPQKLSFTDVVKGIDLFDTVNVPSVAVVENMAYYDAVD-QTVFKIFGKGHQMRLADMWGITNTIRMPLVADVASSGDSGIPFVVSKPDSDH-SESYSQLAEAVVREVAK
DUF59NUDuf971_albLai LEFQGVKLMSFGFV-NQG--AAVMRGPMVSKLIDQLILATQWGSLDFLIVDMPPGTGDIQMSLTQQMPISAAVIVTTPQRLSTIDVEKGIVMFQNLKVPSVAVVENMAFFDCIH-GTRHYPFGRSHMQELAEKYSIEHMFQLPITQESAYSADHGKPFVLGGNDPNT-VETYKHLAEAIAREVVK
DUF59NUDuf971_parTet IEFNGVKTMSYGYASGNQ--KAIIRGPMVSSIVVQLVQQTQWQNLDYLVVDMPPGTGDIQISLCQELNFDGAVIVTTPQRLSFIDVVKGIEMFDVLKVPTLSVVENMAEYVCPDCNHVHRPFGQGYMNMLQKQFGIATAVSIPLYGDISKYSDLGSPVVLTLPEDHTINNIYRQLANNVVHELSR
DUF59NUDuf971_tetThe IEYEGLKTMSYGFA--RK--KAIIRGPMVSAIVTQLAMQTQWGDLDYLIVDMPPGTGDIQITLCQEIKFDGAVVVTTPQKLAFVDVIKGIEMFDELKVPTLAVVENMCLFVCDGCGKEHHPFGPGYMNMLKNQFGIQSSVQIPIYDMIAKYSDYGRPVSITLPDEHTITKIYSSLAENVHQEILK
DUF59NUDuf971_perMar YYHNGVKLMSMGYIRPGE--SVAVRGPMVSGMIQQMLTMTDWGHLDYLIIDYPPGTGDVQLTIGQQAKVDAAVVVTTPQQLSLVDVEKGIELFDKLNIPSIAVVENMAYFKCPTCSDKHQVFGRAADSKLAEKYGIQSHVELPIDPDMARNVDSAFPFVCNEADGSEASKAFESLADDVIRGVSK
DUF59NUDuf971_pauChr IVCYGISMVSMGLLIDKNQ-PVIWRGPMLNGIIRQFLYQVEWENKDVLVVDLPPGTGDVQLSITQAIPLVGAVIVTTPQAVSLQDARRGLAMFIQMGVNILGVIENMSVFIPPDRPERYALFGNGGGSTLAEEADVELLTQLPMEILVQQGSDRGKPIVLSQS-TSTTGKAFIALAEKIKLKFLK
DUF59NUDuf971_synSpp IETCGIAMVSMGLLIDDHQ-PVIWRGPMLNGIIRQFLYQAEWGERDVLIVDLPPGTGDAQLSLAQAVPMAGVVIVTTPQQVSLQDARRGLAMFRQMGIPVLGVVENMSAFIPPDRPDRYALFGSGGGAQLASDYDVPLLAQIPMEMPVQEGGDTGRPIVINRS-DSASAAEFKGLAEAVLKAVTQ
Duf59NU_theSco       LEAFGLKVLSIANIVPPGQ-AMIWRGPILHGTIKQFLEEVNWGELDYLVVDLPPGTGDVQLSLAQLTKVSGGVIVTTPQEVALIDAERAADMFKKVQVPVLGVLENMSHFLCPHCGKPTPIFGEGGGKRLAERLKTRFLGEIPLTLPLRESGDRGRPILVESP-EGPEAEAFRRAARELAAALSV
Duf59NU_marHyd       LERYGIRVLSIANIAPPGQ-ALVWRGPILHGTIRQFLQDVDWGELDYLIVDLPPGTGDVQLSLSQLTQVTGGVIVTTPQDVARIDAERAADMFRKVQVPLLGVIENMAYYACPSCGERSYLFGQGGGRKLAESQNTAFLGEIPLSMPVRESGDAGTPITVAHP-DAPEAQAFRQVARQLAGQLSI
Duf59NU_theRos       GRAYGIAVMSVGYILDPEK-ALIWRGPLVSQLIRQFLSDVQWGDLDYLVIDLPPGTGDVQLTLVQTIPLSGAIIVTTPQDVALADAIKGLQMFREVKTPVLGIVENMSYFVCPHCGHVAEIFGSGGGERVANKYGVPLLGQIPIDPAVREGGDRGVPVVVGQP-GSSTAQAFREAARQAAARLSV
Duf59NU_deiMar       LERHGVKFISMGNLSPAGQ-ALVWRGPMLHSAVQQFLKDAAWGELDYLIVDLPPGTGDVQLSITQSVNVTGAVIVTTPQDVALIDAARAVDMFRKASVPILGVVENMSYFVAPDTGITYDIFGRGGARKLG---GLTVLGEVPLDTEVRQDADGGVPSVLAHP-QSAASVALRSVARTLAGRVSV
Duf59NU_lepFer       PSGHGITVMSMAFMVPPGT-PLIWRGPMLHGIIQQFCQDIAWGDLDYLVVDMPPGTGDAQLSLAQLVPLSGAIIVTTPQEVALSDSRRGLAMFQKVNVPILGIVENMSSFVCPHCHEETDIFSKGGGEKAAHELHVPFLGRIPIDLSIREGGDSGHPIAVAYP-ESPLTQSYRDIAGKLASAISI
Duf59NU_myxFul       LEAHGLKVMSIGFLVEADQ-ALIWRGPMLHGALMQLVRDVNWGELDYLVLDLPPGTGDVALTLSQSVRAAGAVLVTTPQDVALADVVRAKQMFDKVHIPVLGIVENMSQFVCPNCSHATPIFNHGGGQKAAQMFGIPFLGEIPLDLKVRVSGDSGVPVVVGAK-DSPEAKAFQDVARNIAGRVSA
Duf59NU_corCor       LVAHGLKIMSIGFLVEADQ-ALIWRGPMLHGALMQLVRDVNWGELDYLILDLPPGTGDVALTLSQSVRAAGAVLVTTPQDVALADVVRAKQMFDKVHIPVLGIVENMSQFVCPHCNKSTPIFNHGGGHKAAEMFGIPFLGEIPLDLKVREAGDSGVPVVVGHQ-DSPEAKAFQEVARAVAGRVST
Duf59NU_stiAur       MSKYGLKIMSIGFLVEPDQ-ALIWRGPMLHGALMQLVRDVNWGELDYLILDLPPGTGDVALSLSQSVRAAGAVLVTTPQDVALADVVRAKSMFDKVHIPVLGIVENMSQFICPNCSHATNIFHRGGGRKAAEMFSIPFLGEVPLELKVRESGDAGVPVVAGAP-DSREAQAFLEIARNVAGRVSA
Duf59NU_anaThe       AEAYGVQVMSIGFLVKPGQ-PLIWRGPMLHSAIRQFLADVAWNELDYMIVDLPPGTGDAQLSLAQSVPLSGGVIVTLPQRVSQEDAMRGLQMFRELNVPVLGVIENMSYLELPDGT-RMDIFGTGGGEDLAQAAEVPFLGAIPIDPGVRVGGDQGVPVVISAP-QSAPARALTAIAQKIAASLSV
Duf59NU_anaSpp       LHAHGMKVMSIGFLVDPDQ-ALIWRGPMVTGALIQLLRDVNWGDLDYLVLDLPPGTGDIPLTLAQNVRAAGVVLVSTPQDLALADVIRAKLMFDKVSIPVLGIVENMSAFVCPHCRSETAIFDKGGARTAAEKMGIRFLGDVPIDLAIREGGDKGVPVVVGQP-DSPQAAALLAVAKNVAGAVST
Duf59NU_natPha       PEKYGMKLMSMDFLVGEDD-PVIWRGPMVHKVLTQLWEDVEWGALDYMVVDLPPGTGDTQLTLLQSVPVSGAVIVTTPQKVALDDAEKGLQMFGEHDTPVLGIVENMSGFVCPDCGSEHDIFGSGGGESFADDVEMPFLGRIPLDPAVREGGDAGRPVVLDE--DDETGEALRSFTERTANMQGI
Duf59NU_halPau       PEKFGVKLMSMAFLTGKDD-PVIWRGPMVHKVLTQLWEDVEWGQLDYMVVDLPPGTGDTQLTLLQSVPVTGAVIVTTPQQVALDDANKGLQMFGKHDTPVLGIAENMSTFKCPDCGGEHDIFGHGGGAEFAEDHEMPFLGSIPLDPSVRSGGDEGEPIVLDD--ESDTGESFRTLTENVANNVGI
Duf59NU_natPel       PEKYGVKLMSMAFLTGEDD-PVIWRGPMVHKVITQLTEDVEWGHLDYLVVDLPPGTGDTQLTMLQTMPVTGAVIVTTPQDVALDDARKGLEMFAKHDTVVLGIAENMSTFACPDCGGEHDIFGSGGGEDFAEEHELPFLGSIPLDPAVREGGDGGKPTVLKD--DDTTSDALRTITENVANNTGI
Duf59NU_halLac       PERFGVKLMSMDFLTGEDD-PVIWRGPMVHKIITQLVEDVEWGELDYLVMDLPPGTGDTQLTILQTLPLTGAVIVTTPQEVALDDAVKGLRMFGKHDTNVLGIAENMAGFRCPDCGGFHEIFGSGGGKALAQEHDLPFLGGVPLDPAVRTGGDDGEPVVL-E--EGETADAFRVIVENVANNAGV
Duf59NU_halVol       PEKYGMKLMSMAFLVGDDD-PVIWRGPMVHQLLTQLVEDVEWGSLDYLVLDLPPGTGDTQLSILQTLPLTGAVIVTTPQNVALDDANKGLRMFGKHDTNVLGIVENMSTFRCPDCGNRHDIFGAGGGREFAASNDLPFLGALPLDPAVREGGDGGKPIVLED--DDETADAFRVMTENIADMVGI
Duf59NU_halXan       PEKYGVKLMSMAFLTGEDD-PVIWRGPMVHKVITQLTEDVEWGHLDYLVVDLPPGTGDTQLTMLQTMPVTGAVIVTTPQDVALDDARKGLEMFAQHDTVVLGIAENMSSFACPDCGSQHDIFGSGGGREFADEHDMPFLGSIPLDPSVREGGDSGKPTVLDD--DSEVGESFRTITENVANNTGI
Duf59NU_halTia       PEKHGMKLMSMDFLLGEDD-PVIWRGPMVHQTLTQLFEDVQWGDLDYLVVDLPPGTGDTQLTLLQTVPVTGAVIVTTPQGVALDDARKGLEMFGKHETPVLGIIENMSSFKCPDCGSEHAIFGEGGGREFADQVQMPFLGEIPLDPEIRERGDEGRPAVLAD--DLDVSGAFRNFVANTANNQGI
Duf59NU_calSub       KMMLGIKVMSLGLFVDEGT-AVIWRGPLVASAVKQLLTEAQWGELDYLIVDLPPGTGDASLTLAQTMPLTGVVIVTTPQQAASVIAAKALSMFRRLGVTIIGIVENMSYYVCPECGKESSLFGQSHTDKMAAELDVEVLGRIPMSPDVSVNHDQGVPIVLAAP-SSPAAKAFDEAAKKIAAKISI
Duf59NU_cenSym       EAS-GIKVVSFGFFAEQAHKAAIYRGPIISGILKQFLVDTNWSDLDYLIVDLPPGTGDIPLTLAQTIPITGILVVTTPQNVASNVAVKAVGMFEKLNVPIIGVVENMSGFVCNKCGEKHNVFGEGGAKRISEQFKIPLIGEIPLTAGIMAGSEEGRPIILTDP-DSPSSNAFRSSAKNIAAQCSI
Duf59NU_nitSal       DSH-GLKVVSFGFFADQSNQAAIYRGPIISGILKQFLVDTNWSDLDYLIVDLPPGTGDIPLTLAQTIPITGILVVTTPQDVASNVAVKAVSMFEKLNVPIIGVVENMSHFICPNCNEKHYIFGEGGAKKISEQFNMPFLGEIPLNSGIMAGSDLGKPIMITNP-DSPSAVAFRKSAKNIAAQCSI
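Each row of the first block above, from plant HCF101 to the archaeal fusions, carries the P-loop core that the consensus shows as `KGGVGKST`, with S or T at the last core position (for example `KGGVGKT` in Duf59NU_calSub). A small Python sketch that locates that core in a sequence, assuming the regular expression `KGGVGK[ST]` read off this alignment (an illustrative pattern, not a general Walker A definition); the name `find_p_loop` is hypothetical, and positions are relative to the ungapped fragment, not to the full-length protein:

```python
import re

# Core read off the P-loop consensus of this alignment (illustrative, not a
# general Walker A/P-loop definition): K G G V G K followed by S or T.
P_LOOP_CORE = re.compile(r"KGGVGK[ST]")


def find_p_loop(aligned_seq):
    """Return (1-based start in the ungapped sequence, matched core) or None."""
    ungapped = aligned_seq.replace("-", "").upper()
    match = P_LOOP_CORE.search(ungapped)
    if match is None:
        return None
    return match.start() + 1, match.group()


# Fragments copied from the araTha and calSub rows above.
print(find_p_loop("NIIAVSSCKGGVGKSTVAVNLAY"))  # (9, 'KGGVGKS')
print(find_p_loop("NVVAVASGKGGVGKTTVAIN"))     # (9, 'KGGVGKT')
```

Gaps are stripped before searching, so a gap inserted inside the motif in some row does not hide it.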
Alignment of bacterial PaaD proteins: <-----------------------------------------DUF59---------------------------------------------> <--------- unknown 6-Cys domain -------> * - * * * * * * conserved cysteines PaaD_escCol WALLSQIPDPEIPVLTITDLGMVRNVTQMGEGWVIG-FTPTYSGCPATEHLIGAIREAMTTNGFTPVQVVLQLDPAWTTDWMTPDARERLREYGISPPAG--HSC--HA--HLPPEVRCPRCASVHTTLISEFGSTACKALYRCDSCREPFDYFKCI Escherichia coli PaaD_entAsb WSLLSQIPDPEVPVLTITDLGMVRNVTALGEGWVIG-FTPTYSGCPATEHLLGAIRETMTAHGFTPVHIVLQLEPAWTTDWMTADARERLREYGISPPVG--HSC--HA--HAPAEVSCPRCASTDTSLISEFGSTACKALYRCNSCREPFDYFKCI Enterobacter asburiae PaaD_kleOxy WEVLSAIPDPEVPVLTITDLGMVRSVDRRGDGWVIG-FTPTYSGCPATEHLLGEIRAAMTENGYAPVHIVLQLDPPWTTDWMGPEARERLRQYGISPPQG--HAC--HA--HMPEEVVCPRCASRHTSLISEFGSTACKALYRCDSCREPFDYFKCI Klebsiella oxytoca PaaD_citFre WGLLSAISDPEVPVLTITDLGMVRSVERCGDGWVIG-FTPTYSGCPATEHLLGEIRAVMADHGYTPVHIVLQLDPPWTTDWMSPDARERLRQYGISPPQA--HAC--HA--EMPTDVQCPRCASTHTSLISEFGSTACKALYRCDSCREPFDYFKCI Citrobacter freundii PaaD_serPro WHCLQQISDPELPVLSITDLGMVRDVVADGGGWRIT-FTPTYSGCPATEFLLEAIEQQLTAAGFSPVKVDIRLSPAWTTDWMNADARERLREYGVAPPQG--HTC-DKP--QAHGPVPCPRCGSTHSEKISEFGSTACKALYRCCDCREPFDYFKCI Serratia proteamaculans PaaD_hafAlv WQCLHAISDPELPVLSITDLGMVRGVTPLKKGWLVT-FTPTYSGCPATEFLISAIQETLTEAGFSPVQVEICLTPAWTTDWMNVEAKNRLREYGVAPPQG--LIC-EKP--LSTETVQCPRCGSHDSQKVSEFGSTACKALYRCKQCLEPFDYFKCI Hafnia alvei PaaD_serOdo WHCLQQISDPELPVLSITDLGMVRSVEAEGTGWRVT-FTPTYSGCPATEFLLEAIERQLFEAGFSPVRVEVRLDPAWTTDWMNAEARARLRQYGVAPPQG--HSC-DRP--LSHGPVPCPRCGSEHTEKISEFGSTACKALYRCRACREPFDYFKCI Serratia odorifera PaaD_proStu WQQLHQIPDPELPALSITDLGMIRNVFATSRGWKVM-FTPTYSGCPATEFLINEIKNVLEHAGFPNVEIEVVLTPAWTTDWMNQDAKQRLREFGIAPPAG--KAC-EHP--EKSGPICCPRCDSQHTEKISEFGSTACKALYRCLECFEPFDYFKCI Providencia stuartii PaaD_psyIng WKLLSAIPDPEIPAITIAELGMLRAVDFENEQWVVT-FTPTYSGCPATEMLINDITQAMTSAGHTPVKVNVSLDPAWTTDWMSEASKKKLSDYGIAPPQG--KAC-FEG--SLPPSVNCPNCGGQSTQLISEFGSTACKAHYKCLTCYEPFDYFKCI Psychromonas ingrahamii PaaD_aciRad 
WETLKQVADPEIPVLSVVDLGMIRGVELNEEDQIIVRLTPTYSGCPATDLLKAEITQAFTVQGLVPVQVVVDLSEVWTTDWMSESGKQKLQQYGIAPPQGEAHSCGTHV--ALSDGIKCPRCHSQHTKLLSEFSSTACKALYRCQQCLEPFDYFKCI Acinetobacter radioresistens PaaD_psePut WAVLGQVMDPEVPVVSVVDLGIVRDVDW-RAGHLHLVVTPTYSGCPATEVIEGDIRQALEQAGFTAPDLERRLTPAWSTDWISELGRERLRAYGIAPPQGSASKRSLLG---EPPQVCCPQCGSAHTELLSQFGSTACKALYRCRECLEPFDYFKCI Pseudomonas putida PaaD_pseSpp WDAVCDVPDPEVPVLTIEDLGVLRDVHVQADGSVQVTITPTYTGCPAMSMFAFDIEAALLNAGFDKVEVKTVLNPAWTTDWLSEKAREKLRAYGIAPPNGKAS----RRALFGEEQVRCPKCNSANTSRISEFGSTACKALYRCNDCSEPFDYFKCI Pseudovibrio spp PaaD_fluTaf WSYLEEVPDPEVPVLSIIDLGIVRGVKVISDSEVHITITPTYSGCPAMNYIEKSIQEILTEKGFKTIHIDTILAPAWTTDWMSENGKKKLLEYGIAPPVNEVDKLVLFG---TAPTVKCPQCGSKDTKMLSQFGSTACKALYQCTSCLEPFDYFKCL Fluviicola taffensis PaaD_achXyl YAWLQEVPDPEIPVLSVVDLGVVRDVAW-DGDACVVVITPTYSGCPAMREITEDIRQVLARHGVGEVRVETRLSPAWTTDWMSEKGRAALKDYGIAAPAQQAIDISGISRRNAGPPIECPRCGSRDTRLVSNFGSTSCKALYRCVSCREPFDYFKTH Achromobacter xylosoxidans PaaD_sinMel WHWLSQVPDPEIPVISVTDLGIVRNVDW-DGETLVVTVTPTYSGCPATAVINLDIERRLTENGIESVRLERQLSPAWTTDWISAEGREKLESYGIAAPVDGTGAIEGLEG-GAALTVSCPRCGSTRTDKVSQFGSTPCKASYRCRDCLEPFDYFKCI Sinorhizobium meliloti PaaD_oceSpp WDLLDEVKDPEVPVLTIWDLGILRDIEREGD-SVIVTITPTYSGCPAMDNISTDVTQVLNDAGYADVKVKTSLSPAWSSEWMSPEGRRKLRNYGIAPPEDADL---DEDGLTPDAHAQCPHCSSRNTRRVSEFGSTACKALFQCNDCNEPFDYFKKI Oceanospirillum spp PaaD_polGil WGWLEEVPDPEIPVLSLVDLGVIRSIGW-DGGRLVVKVTPTYSGCPATSVINFEIEKALRDHGITDLVLERQLSPAWTTDWISESGREKLRAYGIAPPVAGT-AVCGSGG-QADPVVACPRCGSTDTAQVSRFGSTPCKAAYRCNACLEPFDYFKCI Polymorphum gilvum PaaD_marAlg WALLEEVKDPEVPAVSVVELGIVRAVRW-DGKELSIDVTPTYSGCPATELIEELIIEAMRAAGFRAPNINQVLTPAWTTDWITAEGKEKLRAFGIAPPEGSSSKLSLLG---EPDIIACPHCGSTDTEQVSEFGSTACKALYRCTECLEPFDYFKCI Marinobacter algicola PaaD_niaKor WSILEEVCDPEVPVLTIVDLGIVRDVKVNEE-AVEVVITPTYSGCPAMDVIRMNIRMALLQHEYKNVQITTVLSPAWTTDWMTEAGKEKLKAYGIAPPNVKQQVC-NTHLFAEDEAVQCPHCNSYNTRRISEFGSTACKSLYQCNDCQEPFDYFKCH Niastella koreensis PaaD_artPhe 
WDIAATVVDPEIPVLSIADLGILRDVEVAGD-HVKVTITPTYSGCPAMDAIRDDVKTAFEKEGYTDVEVDLVLSPAWTTDWMTEEGKAKLQEYGIAPPTGHSKAARHAGPIRLSMAVKCPQCASLNTKELTRFGSTSCKALYVCQDCKEPFDYFKVL Arthrobacter phenanthrenivorans PaaD_burBac WALLAEVMDPEVPVISVLDLGLVREVLD-DGQTLDVVLTPTYSGCPATEIIEASVRDALERGGLGPLRVAMRRAPAWTTDWISEAGREKLRAYGIAPP-GAVDPQAAQTIHLVSRRVPCPRCGDTHTERLSAFGATACKALYRCLACREPFEHFKPI Burkholderiales bacterium Consensus W l ! DPE Pvls! #LG vR ! v TPTYSGCPAt i ! al G v v L PaWtTDWmse grekLr YG!apP g ! CPrCgs T SeFGSTaCKAlYrC C EPFDYFKci Prim.cons. WALLSQVPDPEVPVLSITDLGMVRDVEWEGdGWVvVTFTPTYSGCPATEliLGDIRQALTEAGFTPVhVEvQLSPAWTTDWMSEEGREKLREYGIAPPQG AHACS HGGL APPPVQCPRCGSTHTELISEFGSTACKALYRCNsCREPFDYFKCI PaaD_conSeq LLSQVPDPEVPVLSITDLGMVRDVEWEGDGWVVVTFTPTYSGCPATELILGDIRQALTEAGFTP-VHVEVQL consensus PaaD sequence LL+++ DPE P L++ L +V+D+E EGD ++ V TPT C LI IR L E P HV+V++ 38% identical, 55% similar FAM96B_schPom LLAKINDPEHP-LTLAQLSVVKDIEVEGDSYITVHITPTIPHCSMCTLIGLCIRVRL-ERCLPPRFHVDVKV Schizosaccharomyces pombe NP_594677
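The asterisks over the PaaD alignment flag conserved cysteine columns. The minimal Python sketch below shows how such columns can be found automatically. It assumes the alignment is held as equal-length strings with '-' for gaps, and it counts a column as "conserved" only when every row has a cysteine there. That is stricter than marking columns by eye. The function name is hypothetical.

```python
def conserved_cysteine_columns(alignment):
    """Return 0-based alignment columns in which every sequence has a cysteine.

    `alignment` maps a sequence name to its aligned sequence; all rows must
    have the same length, with '-' marking gaps.
    """
    rows = list(alignment.values())
    if not rows:
        return []
    width = len(rows[0])
    if any(len(row) != width for row in rows):
        raise ValueError("aligned sequences must all have the same length")
    # Keep only columns where no row deviates from cysteine.
    return [col for col in range(width) if all(row[col] == "C" for row in rows)]


# Ungapped tail fragments copied from the PaaD alignment above.
fragments = {
    "PaaD_escCol": "CPRCASVHTTLISEFGSTACKALYRCDSCREPFDYFKCI",
    "PaaD_entAsb": "CPRCASTDTSLISEFGSTACKALYRCNSCREPFDYFKCI",
    "PaaD_psePut": "CPQCGSAHTELLSQFGSTACKALYRCRECLEPFDYFKCI",
}
print(conserved_cysteine_columns(fragments))  # prints [0, 3, 19, 25, 28, 37]
```

A single divergent sequence removes a column under this rule. A more tolerant variant would use a frequency threshold instead of `all`.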
Other DUF59 domains of interest: >DUF59a_theSco Thermus scotoductus YP_004202903 uninformative genomic context 3CQ1 Q53W28 rhamnose-related MSETNPLETQALALLENVYDPELGLDVVNLGLIYELRVEPPLAYVRMTLTTPGCPLHDSMGDAVRQALSRIPGVEEVQVELTFDPPWTPARLSEKARRALGWG >DUF59b_theSco Thermus scotoductus CP001962 REGION: 574428..574829 apparently SufT but uninformative genomic context MDERHREGLPEGQAGLQEGNANQAGGKEGLPTKEQVLEALKVVYDPEIPVNIVDLGLVYDVEIHENGVVDLTMTLTAIGCPAQDMVKADAEMAVMRLPGVQGVNVEFVWTPPWTPARMTEEGKRMMRMFGFNV >PaaD_theSco Thermus scotoductus CP001962 REGION: 1148207..1168662 Paa operon: PaaG PaaB PaaI PaaJ PaaN MVERYWEALKGVKDPEIPVLNIVEMGMVLGVEAEGEKVRVRFRPTFSGCPALRLIREEIEKALREAGAKEVEVVEARTPWSTEDMAEEARRKLLGYGVAPPLPLPLAGKDPPCPRCGSREVVLKNTFGATLCKMLYQCAACGEVFEAFKTV >Duf59c.NBP35_theSco Thermus scotoductus CP001962 REGION: 1432113..1433165 uninformative genomic context MALTEERVLEALRTVMDPELGKDLVSLGMVGEVRLEGGRVDLLINLTTPACPLKGQIEADIRRALHPLGVEEVRVRFGGGVKAPEQYPIPGVKHVVAVGS GKGGVGKSTVAANLALALLQEGARVGLLDADLYGPSQAKMFGLEGERLKVDQHRKILPLEAFGLKVLSIANIVPPGQAMIWRGPILHGTIKQFLEEVNWG ELDYLVVDLPPGTGDVQLSLAQLTKVSGGVIVTTPQEVALIDAERAADMFKKVQVPVLGVLENMSHFLCPHCGKPTPIFGEGGGKRLAERLKTRFLGEIP LTLPLRESGDRGRPILVESPEGPEAEAFRRAARELAAALSVQAFIALPMA >Duf59.NBP35_araTha Arabidopsis thaliana Q6STH5 MPLLHPQSLRHPSFEIQTQRRSNSTTRLLLSHKFLHSQASIISISRTRILKRVSQNLSVAKAASAQASSSVGESVAQTSEKDVLKALSQIIDPDFGTDIV SCGFVKDLGINEALGEVSFRLELTTPACPVKDMFENKANEVVAALPWVKKVNVTMSAQPAKPIFAGQLPFGLSRISNIIAVSSCKGGVGKSTVAVNLAYT LAGMGARVGIFDADVYGPSLPTMVNPESRILEMNPEKKTIIPTEYMGVKLVSFGFAGQGRAIMRGPMVSGVINQLLTTTEWGELDYLVIDMPPGTGDIQL TLCQVAPLTAAVIVTTPQKLAFIDVAKGVRMFSKLKVPCVAVVENMCHFDADGKRYYPFGKGSGSEVVKQFGIPHLFDLPIRPTLSASGDSGTPEVVSDP LSDVARTFQDLGVCVVQQCAKIRQQVSTAVTYDKYLKAIRVKVPNSDEEFLLHPATVRRNDRSAQSVDEWTGEQKVLYGDVAEDIEPEDIRPMGNYAVSI TWPDGFSQIAPYDQLEEIERLVDVPPLSPVEV >DUF59_bacAnt Bacillus anthracis Q81XF6 3LNO MSQEAFENKLYANLEAVIDPELGVDIVNLGLVYDVTADENNNAVITMTMTSIGCPMAGQIVSDVKKVLSTNVPEVNEIEVNVVWNPPWSKERMSRMAKIALGIRD >DUF59_halHal Halobacillus halophilus HE717023 
10001..10312 gene order: yitW MobB moaE moaD (molybdopterin synthase sulfur carrier) MSTATEENALGALENVIDPELGIDIVNLGLVYGVDIDPDGNATVTMTLTAMGCPLAAHIEQDVKGCLADLPEINQVAVNIVWNPPWTKDRMSRYAKIALGIPD >YITW_bacSub Bacillus subtilis P70949 MEEALKENIMGALEQVVDPELGVDIVNLGLVYDVDMDEDGLTHITMTLTSMGCPLAPIIVDEVKKALADLPEVKDTEVHIVWNPPWTRDKMSRYAKIALGIQ >YitW_strGal Streptococcus gallolyticus F0VS24 N6adenine-specific DNA methylase MSEQKYTEEEVAKIKDRILEALEMVIDPELGIDIVNLGLVYEIRFEQNGHTEIDMTLTTMGCPLADLLTDQIHDVMREIPEVTNTEVKLVWYPAWTVDKMSRYARIALGIR >Duf59_theThe Thermus thermophilus Q53W28 TTHB138 dDTP-4-Keto-L-Rhamnose reductase-related dimer MTARNPLEAQAWALLEAVYDPELGLDVVNLGLIYDLVVEPPRAYVRMTLTTPGCPLHDSLGEAVRQALSRLPGVEEVEVEVTFEPPWTLARLSEKARRLLGWG >Duf59_theMar Thermotoga maritima Q9WYV7 1WCJ TM_0487 MPMSKKVTKEDVLNALKNVIDFELGLDVVSLGLVYDIQIDDQNNVKVLMTMTTPMCPLAGMILSDAEEAIKKIEGVNNVEVELTFDPPWTPERMSPELREKFGV >DUF59.NUBPL.Duf971_araTha Arabidopsis thaliana Q6STH5 HCF101 Viridiplantae MPLLHPQSLRHPSFEIQTQRRSNSTTRLLLSHKFLHSQASIISISRTRILKRVSQNLSVAKAASAQASSSVGESVAQTSEKDVLKALSQIIDPDFGTDIV SCGFVKDLGINEALGEVSFRLELTTPACPVKDMFENKANEVVAALPWVKKVNVTMSAQPAKPIFAGQLPFGLSRISNIIAVSSCKGGVGKSTVAVNLAYT LAGMGARVGIFDADVYGPSLPTMVNPESRILEMNPEKKTIIPTEYMGVKLVSFGFAGQGRAIMRGPMVSGVINQLLTTTEWGELDYLVIDMPPGTGDIQL TLCQVAPLTAAVIVTTPQKLAFIDVAKGVRMFSKLKVPCVAVVENMCHFDADGKRYYPFGKGSGSEVVKQFGIPHLFDLPIRPTLSASGDSGTPEVVSDP LSDVARTFQDLGVCVVQQCAKIRQQVSTAVTYDKYLKAIRVKVPNSDEEFLLHPATVRRNDRSAQSVDEWTGEQKVLYGDVAEDIEPEDIRPMGNYAVSI TWPDGFSQIAPYDQLEEIERLVDVPPLSPVEV >DUF59.NUBPL.Duf971_medTru Medicago truncatula BT138281 MQAVQASSSPHFSIHSSKPPHSSTCSLVTSSVNVKCSGFSLREQSSLWTSYNKRVILKSSFSAKAASVEVGSSSISTGTAEDDVLKALSQIIDPDFGTDI VTCGFVKDLQIDKALGEVSFRLELTTPACPIKDVFEKQANEVVAVLPWVKNVNVTMSAQPAKPLFAEQLPAGLQTISNIIAVSSCKGGVGKSTVAVNLAY TLADMGARVGIFDADIYGPSLPTMVSSENRILEMNPEKKTIIPTEYMGVKLVSFGFAGQGRAIMRGPMVSGVTNQLLTTTEWGELDYLVIDMPPGTGDIQ LTLCQIVPLTAAVIVTTPQKLSFIDVAKGVRMFSKLKVPCVAVVENMCHFDADGKRYYPFGRGSGSQVVQQFGIPHLFDLPIRPTLSASGDSGMPEVVAD 
PQGEVSKIFQNLGVCVVQQCAKIRQQVSTAVTYDKSVKAIRVKVPDSDEEFFLHPATVRRNDRSAQSVDEWTGEQKLQYTDIPDYIEPEEIRPMGNYAVS ITWPDGFSQIAPYDQLQTMERLVGVS >DUF59.NUBPL.Duf971_ostTau Ostreococcus tauri XP_003083908 Viridiplantae MKTKNSSITRTKSPTLERLNDADASMIDANAPYRVELAESDLIAAESVRRAIEEPKGHLTRCLAKRLGGDALRDALKETLRIERLEGGSRYEEGFKGSGV WRERTRAGLFLVVIKKKHDAKLVSAALAEIDVVYASFRDWKTGHKHFTSRRALHASSTHKTRLRDRTRNVGHVRLVSERSRISTGERGEFQFSDKSRRIN SNRVVRARHAPSTSTEGDLMDTTSMPAIYSAPDGSKESEVLSKLRRVIDPDFGEDIVNCGFVKALVIDESAGSVLFAIELTTPACPVKAEFERQAKAFVE ELDWVKRVSVTMTAQPARNDAPETVEGLRRVSHIIAVSSCKGGVGKSTTSVNLAYTLAMMGAKVGILDADVYGPSLPTMISPDVPVLEMDKETGTIKPVE YEGVKVVSFGFAGQGSAIMRGPMVSGLINQLLTTTDWGELDYLIIDMPPGTGDVQLTLCQVVPITAAVVVTTPQKLAFIDVEKGVRMFAKLAVPCVSVVE NMSYFEVDGVKHKPFGEGSGAKICEQYGVPNLLQMPIVPDLSACGDTGRPLVLRDPTCETSSRYQEVAATVVREVAKLNNGKKPRVDIDPGYDGAFRVEI PGENNDKAFWITAKNVRLSDESARVKGSDESPDRLLNGAPIPDDIAPVEMSVIGNYAMSITWPDGLSQVAAFSTLAKLERLPARAS >DUF59.NUBPL.Duf971_micSpp Micromonas sp. XP_002502646 Viridiplantae MATLAAIRPSDSRLGARLKVGGTGGRIARKTVTAGAGRRTTSTRRGPVSIRAHASSSSPSITSAPDGSREADVLNALRNVIDPDFGEDIVNCGFVKDLRV SDAGDVTFTLELTTPACPVKEEFDRLSKQYVTALEWAKSCNVNMTAQPVTNDMPDAVEGLKGVRHIIAVSSCKGGVGKSTTSVNLAYTLRMMGAKVGIFD ADVFGPSLPTMTSPEQAVLQMDKETGSITPTEYEGVGIVSFGFAGQGSAIMRGPMVSGLINQMLTTTAWGDLDYLIIDMPPGTGDVQLTICQVLPITAAV VVTTPQKLAFIDVEKGVRMFSKLRVPCVAVVENMSYFDGDDGKRYKPFGEGSGQRICDDYGVPNLFQMPIVPDLSACGDTGRPLVLVDPAGDVSTIYGAV AAKVVQEVAKLQAGPKGSLALDTEGVAGVDGALRVQLADEGGMPFYVRGCDVRRSDKSATADGESKKADFLMDGVTPVPDDIAPVEAHVVGNYAVQISWP DGFSQVATFAQIQALSRLPAGAKVGA >DUF59.NUBPL.Duf971_thaPse Thalassiosira pseudonana XP_002293925 frag stramenopiles QSQILAALSVINDPDLNADIVSLGFVQNLKIDESSNIVSLDLELTTPACPVKDLFVQQCQDIINGLAWTRGADVTLTSQPTAAPSDAPLGMSQIGAVIAV SSCKGGVGKSTTAVNLAFALESLGAKVGIFDADVYGPSLPTMVTPEDDNVRFVGRQIAPLRRGDVSLMSFGYVNEGSAIMRGPMVTQLLDQFLSLTNWGA LDYLIMDMPPGTGDIQLTLSQRLNITAAVIVTTPQELSFVDVERGVEMFDTVNVPCIAVVENMAYLEREETEMIRIFGPGHKRRLSEQWGIEHTYSVPLM GQIAQNGDSGTPFILDNPKSPQADIYRQLAKSVVSEVAKIKFCTGKGGRPSVSYDVEKSILRVDDGDIQNATISPAELRRGCRCAACVEELTGKQILNPA 
SISESVKPLNMSPTGNYALSVDWSDGHRSLYPYRQIRSM >DUF59.NUBPL.Duf971_phaTri Phaeodactylum tricornutum XP_002186426 frag stramenopiles EVLSTLKSVIDPDLGSDIVTLGFVQNLKLDGRDVSFDVELTTPACPVKEQFQLDCQQLVQDLPWTNNIQVTMTAQPSVQETATLGMSQVGAVIAVSSCKG GVGKSTTAVNLAFSLQRLGATVGIFDADVYGPSLPTMITPQDDTVRFVGRQVAPLQRNGVRLMSFGYVNDGSAVMRGPMVTQLLDQFLSVTHWGALDYLI LDMPPGTGDIQLTLTQKLNITAAVIVTTPQELSFADVVRGVEMFDTVNVPCIAVVENMAYYESADPEKIQIFGAGHRDRLSQQWGIEHSFSIPLLNKIAA NGDNGTPFVLEFPDSPPAKIYQELASAVVSEVAKTKFAKSMRPSVQYDAESHLLQVSQNGVGSTDEEHVATLPPAELRRACRCAACVEELTGRQILVPSS VSDKIAPRNMVPTGNYALSVDWSDGHRSLYPYRQIRAL >DUF59.NUBPL.Duf971_ectSil Ectocarpus siliculosus CBJ25905 stramenopiles MHMQLKRPCVVAAAVASLQCLCSKAFVQPFSQHRAKQGIWPHQYSCKSTSLPVQQRARSYHVSMMTETTSDTPPGRKDEVLAVLSAVMDPDLSMDIVSLG FIKELEISGEDEGRQVVTFDVELTTPACPVKAQFQQDCRDLVEALPWVDRAEVTMTAQPVRDVSDTVPTGLSKVATIIAVSSCKGGVGKSTTAVNLAFAL DKQGAKVGILDADIYGPSLPTMVKPDREEVEFVGNQIRPMTAHGVKLMSYGFVNQGAAIMRGPMVSQLLSQFVTLTSWGELDYLVIDMPPGTGDIQLTLC QVLNITAAVIVTTPQKLSFTDVVKGIDLFDTVNVPSVAVVENMAYYDAVDQTVFKTGLESNIEDLLTLDGDELSAAAAREGLASPPQDETALRAAVIAEV MKKRASTKQREYIFGKGHQMRLADMWGITNTIRMPLVADVASSGDSGIPFVVSKPDSDHSESYSQLAEAVVREVAKLKFSDNDRPMLSFQPAEGTVTIET SGGKQVMAAADLRRQCRCALCVEEFSGKPLLDPASVPENIVPTEFAPIGNYAVSVKWDDGHSSLYPYKNFAMGYKPRTKRPELSPV >DUF59.NUBPL.Duf971_albLai Albugo laibachii CCA21133 stramenopiles MGNKPSRIQSSDHATDLNPEDAFDGDSVTKCTIGLTPDLINHLNDLEKANKSPHSSDASPNERLIQIDENMYRKQLENAYRKGEEDGRLKIGREIQELQS STSHQAKEAKKLEVEANARIRELVDEISQKKYNAPIKQVQCTEERLACLKCYQENPETVLRCKEVADAFIQCGQEATDYNVFAIMKIMRHLPPVRLSRWT FSSSKHLLYGIKSSQLLRVQSLPSCSYSTRQNALLEMDILSKLRQVPDQLGLKSDIVTLGRVKNVQLSLQEKSVYLTLEAPNGALLDVAEQWKKDSMESL RELDWIQSLHIETARPKPKNLHAKRSSTLENVSEIVAVSSCKGGVGKSTVAVNLAYSLVQRGARVGILDADIYGPSLPTMINPEDRVVRPSPTNKGFILP LEFQGVKLMSFGFVNQKAAPGAGGVGAAVMRGPMVSKLIDQLILATQWGSLDFLIVDMPPGTGDIQMSLTQQMPISAAVIVTTPQRLSTIDVEKGIVMFQ NLKVPSVAVVENMAFFDCIHGTRHYPFGRSHMQELAEKYSIEHMFQLPITQESAYSADHGKPFVLGGNDPNTVETYKHLAEAIAREVVKLRHKALLAPEF 
LFNPARGILLRSYTSTHAKEITISAAQLRAQCRCAQCVDEFTGKQLLDITKISTEIVPTTIQRKGNYAYAVAWSDGHSASLYTDQHLQQLISEKSSS >DUF59.NUBPL.Duf971_parTet Paramecium tetraurelia XP_001429812 Alveolata MLRRLTASVPIVHKLYFTFSVNEEYKIQILNRLKQIKHSDSHKDIVSNGYVENLSIDQDGRVIIDLKLDQDYRKMKALCSDALKQFEWIKNLDIRMAPKK ENVFTQANTQKRGNLQNVKKIIAVSSCKGGVGKSTIALNLTFSLQKLGFNVGIFDADVYGPSLPTLIGKEKQQLYAPEDKPKEILPIEFNGVKTMSYGYA SGNQKAIIRGPMVSSIVVQLVQQTQWQNLDYLVVDMPPGTGDIQISLCQELNFDGAVIVTTPQRLSFIDVVKGIEMFDVLKVPTLSVVENMAEYVCPDCN HVHRPFGQGYMNMLQKQFGIATAVSIPLYGDISKYSDLGSPVVLTLPEDHTINNIYRQLANNVVHELSRSDLTKTPTVRYDTGKRVIIIRDFDGKEKPIK SVELRSKCNCALCVDEFTGRRLNQNQQLDQEVYPYKIEPKGNYAVAIVWSDGHRSSIYPYKRLWSDEIIEHKS >DUF59.NUBPL.Duf971_tetThe Tetrahymena thermophila XP_001007903 Alveolata MIRNTVQLFKQCARSATILKKLNLNSQSFSNTIKIGQLTQKPTFHFSNAVQEGKAEITKKLKEITFEDGSNIIDNGSILTIDIESSGKVTVQLKLDQNYR KLKGLCNAKLQEIPWIKEFEIKMAPKDQETSFKKRGQLENVKKIIAVSSCKGGVGKSTVAINLAFSLLKQGHKVGIFDADIYGPSIPTLINKENAILQAP EDRPKEILPIEYEGLKTMSYGFARKKAIIRGPMVSAIVTQLAMQTQWGDLDYLIVDMPPGTGDIQITLCQEIKFDGAVVVTTPQKLAFVDVIKGIEMFDE LKVPTLAVVENMCLFVCDGCGKEHHPFGPGYMNMLKNQFGIQSSVQIPIYDMIAKYSDYGRPVSITLPDEHTITKIYSSLAENVHQEILKLQNGNNEPPI VRYQTGNSLVIVEKNNGEIKKMKADVLRKHCNCALCVDEFTGKRLIKDDTIDNEVYPYKIEPKGNYAVAIIWSDGHRSSIYPYDTLFSDKIPAYEEPEKK KSACSTSK >DUF59.NUBPL_perMar Perkinsus marinus XP_002785765 Alveolata MARFGRRGFANETAAATEREKEILQQLSLIIDPDLHKDIVTLGFVQNLTISDEGVVVFDLKLTTPACPVRDQFIDACTRACSALPWVTDVKVTLSAKSRA GGAPEVKSENLSNVQNIVAVTSCKGGVGKSSVAVNLAYSIAKHGVKVGILDADIFGPSLPYLIPSTERAPADPQPYYHNGVKLMSMGYIRPGESVAVRGP MVSGMIQQMLTMTDWGHLDYLIIDYPPGTGDVQLTIGQQAKVDAAVVVTTPQQLSLVDVEKGIELFDKLNIPSIAVVENMAYFKCPTCSDKHQVFGRAAD SKHLAEKYGIQSHVELPIDPDMARNVDDVKASAFPFVCNEAFDGSEASKAFESLADDVIRGVSKVL >DUF59.NUBPL_pauChr Paulinella chromatophora YP_002049254 Rhizaria no Duf971 MVFTTNEALQVLAVILDEGSRRSVIELGWISRLRIQNSRIIFRLELPNFANKQRDEIVKKARASLLLLEGMKDVQIEIGSTVPATAPIGQAGHGAENGRQ SISGVKHILAVSSGKGGVGKSTVAVNLACALARSGLKVGLLDADIYGPNVPTMLGVEDVKPEIAGTGNQQVLSPIVCYGISMVSMGLLIDKNQPVIWRGP 
MLNGIIRQFLYQVEWENKDVLVVDLPPGTGDVQLSITQAIPLVGAVIVTTPQAVSLQDARRGLAMFIQMGVNILGVIENMSVFIPPDRPEQRYALFGNGG GSTLAEEADVELLTQLPMEILVQQGSDRGKPIVLSQSTSTTGKAFIALAEKIKLKFLKVE >Duf59.NBP35_theSco Thermus scotoductus CP001962 REGION: 1432113..1433165 uninformative genomic context MALTEERVLEALRTVMDPELGKDLVSLGMVGEVRLEGGRVDLLINLTTPACPLKGQIEADIRRALHPLGVEEVRVRFGGGVKAPEQYPIPGVKHVVAVGS GKGGVGKSTVAANLALALLQEGARVGLLDADLYGPSQAKMFGLEGERLKVDQHRKILPLEAFGLKVLSIANIVPPGQAMIWRGPILHGTIKQFLEEVNWG ELDYLVVDLPPGTGDVQLSLAQLTKVSGGVIVTTPQEVALIDAERAADMFKKVQVPVLGVLENMSHFLCPHCGKPTPIFGEGGGKRLAERLKTRFLGEIP LTLPLRESGDRGRPILVESPEGPEAEAFRRAARELAAALSVQAFIALPMA >Duf59.NBP35_marHyd Marinithermus hydrothermalis YP_004367725 MSRLSETDVLDALRGVNDPELHKDLVSLGMVEQVVVDGRKVAVKINLTTPACPLKGQIEGEVRAALERIGAEHVEITFGASVRGPQQLPLPGVKNVVAVGSGKGGVGKSTVAVNLAIALSQEGARVGLLDGDIYGPSQAR MLGLEGEKLRVNEAKKIVPLERYGIRVLSIANIAPPGQALVWRGPILHGTIRQFLQDVDWGELDYLIVDLPPGTGDVQLSLSQLTQVTGGVIVTTPQDVARIDAERAADMFRKVQVPLLGVIENMAYYACPSCGERSYLF GQGGGRKLAESQNTAFLGEIPLSMPVRESGDAGTPITVAHPDAPEAQAFRQVARQLAGQLSIRNLSTLPMV >Duf59.NBP35_theRos Thermomicrobium roseum YP_002521772 MSELTRERVLEALRPVQDPELHRSLVDLGMIKEVTIEGASVRVQVELTTPACPLRERIREDVERAVRALPGVQTVEVGFSSRVRAAGTGLPDRQPIPGVKNTIAVASGKGGVGKSTVAVNLAVALAQEGATVGLLDADVY GPSIPLMLGAEEQPGLVDNKIIPGRAYGIAVMSVGYILDPEKALIWRGPLVSQLIRQFLSDVQWGDLDYL VIDLPPGTGDVQLTLVQTIPLSGAIIVTTPQDVALADAIKGLQMFREVKTPVLGIVENMSYFVCPHCGHVAEIFGSGGGERVANKYGVPLLGQIPIDPAVREGGDRGVPVVVGQPGSSTAQAFREAARQAAARLSVEAVKKPRKPVMMLQPKR >Duf59.NBP35_deiMar Deinococcus maricopensis YP_004170951 ParA/MinD-like MHQDDVLAALRTVNDPELHRDLVSLGMVKGVRVDGDRVDVHVELTTPACPLRGTIEADVRRAVEAAGARDVRVEFSARVAPPAQPALPGVKHVLLVGSGKGGVGKSSVAANIAASLAADGARVGLMDADVYGPSIAHMMG ASGEKVTATEDRKMRPLERHGVKFISMGNLSPAGQALVWRGPMLHSAVQQFLKDAAWGELDYLIVDLPPGTGDVQLSITQSVNVTGAVIVTTPQDVALIDAARAVDMFRKASVPILGVVENMSYFVAPDTGITYDIFGRG GARKLGGLTVLGEVPLDTEVRQDADGGVPSVLAHPQSAASVALRSVARTLAGRVSVQALDVLPMV >Duf59.NBP35_lepFer Leptospirillum ferrooxidans YP_005469663 
MSTVSEELVWSALGRVIEPDFKKDLVTLKMIENLKIDGGNLSFTIVLTTPACPLKDEMKNACLASLASVPGITNTEISFTARTTGGSFTGKTPIPGVKNVIAVSSGKGGVGKSTTSVNLAIALSQMGAKVGIMDADVYGP NIPMMLGITDTPRQVDKKLFPPSGHGITVMSMAFMVPPGTPLIWRGPMLHGIIQQFCQDIAWGDLDYLVVDMPPGTGDAQLSLAQLVPLSGAIIVTTPQEVALSDSRRGLAMFQKVNVPILGIVENMSSFVCPHCHEETD IFSKGGGEKAAHELHVPFLGRIPIDLSIREGGDSGHPIAVAYPESPLTQSYRDIAGKLASAISISNAGAIPIQIGNFGG >Duf59.NBP35_myxFul Myxococcus fulvus YP_004668407 MSVTQADILSAMSKVMDPELHVDLVKAGMVKDIRVSGDTAKLKIELTTPACPMKGKIQADAEAALKAVPGLKSFDIEWGAQVRAAGGGMPGGALLPKVKNIILVGAGKGGVGKSTVAVNLATALAQHGAKVGLLDADFYG PSVPLMTGLGDKRPVSPDGKVLNPLEAHGLKVMSIGFLVEADQALIWRGPMLHGALMQLVRDVNWGELDY LVLDLPPGTGDVALTLSQSVRAAGAVLVTTPQDVALADVVRAKQMFDKVHIPVLGIVENMSQFVCPNCSHATPIFNHGGGQKAAQMFGIPFLGEIPLDLKVRVSGDSGVPVVVGAKDSPEAKAFQDVARNIAGRVSAQSAKSIPLPVMQAR >Duf59.NBP35_corCor Corallococcus coralloides YP_005370154 MSVSQADVMTAMSKVIDPELHVDLVKAGMVKDVRVTGDTVKLKIELTTPACPMKGKIQADAEAALKAVPGLKSFDIEWGAQVRGVAGGSGGALLPGVKNILLVGAGKGGVGKSTVAVNLATSLAQHGAKVGLLDADFYGP SVPLMTGTADKRPVSPNGKTLDPLVAHGLKIMSIGFLVEADQALIWRGPMLHGALMQLVRDVNWGELDYLILDLPPGTGDVALTLSQSVRAAGAVLVTTPQDVALADVVRAKQMFDKVHIPVLGIVENMSQFVCPHCNKS TPIFNHGGGHKAAEMFGIPFLGEIPLDLKVREAGDSGVPVVVGHQDSPEAKAFQEVARAVAGRVSTQSMKSVPLPVMQAR >Duf59.NBP35_stiAur Stigmatella aurantiaca YP_003953960 MSVSERDILAAMSKVVDPELHVDLVKAGMVKDIRISGDAVKLKIELTTPACPMKGKIQADTEAALKAVPGLKSFELEWGAQVRATGGGVGQGQGQALLPGVKNIILVGAGKGGVGKSTVAVNLATALARHGAKVGLLDAD FYGPSVPLMTGITEKPVSPDGKTLTPMSKYGLKIMSIGFLVEPDQALIWRGPMLHGALMQLVRDVNWGELDYLILDLPPGTGDVALSLSQSVRAAGAVLVTTPQDVALADVVRAKSMFDKVHIPVLGIVENMSQFICPNC SHATNIFHRGGGRKAAEMFSIPFLGEVPLELKVRESGDAGVPVVAGAPDSREAQAFLEIARNVAGRVSAQSVRSIPLPVMQAR >Duf59.NBP35_anaThe Anaerolinea thermophila YP_004174446 MASAVTKEAVLQALSHVQEPELHKDLVTLGMVRDVEIEAGKVRFRIVLTTPACPLKSRIENEARSAVLSLSGVQEVEVILDAQVPSDGRNRGVLSLPVRNVVAVASGKGGVGKSTVAVNLAVSLAQSGARVGLLDADIYG PNIPTMMGVQRLPPQNGQKLIPAEAYGVQVMSIGFLVKPGQPLIWRGPMLHSAIRQFLADVAWNELDYMIVDLPPGTGDAQLSLAQSVPLSGGVIVTLPQRVSQEDAMRGLQMFRELNVPVLGVIENMSYLELPDGTRMD 
IFGTGGGEDLAQAAEVPFLGAIPIDPGVRVGGDQGVPVVISAPQSAPARALTAIAQKIAASLSVAALSAPGTLSINVIE >Duf59.NBP35_anaSpp Anaeromyxobacter sp. YP_001379317 MPLDPTTALDALRKVMDPELRRDLVSLGMVKDVVVEGDTVRLKVELTTPACPLKDTIGRDVKAALEGAGFRSVELSWGAQVRAAPGAAQGQLTPGVKNIILVGAGKGGVGKSTVAVNLAAGLARTGAKVGILDADIYGPS VPMLTGVTDRPTSRDGKKLEPLHAHGMKVMSIGFLVDPDQALIWRGPMVTGALIQLLRDVNWGDLDYLVLDLPPGTGDIPLTLAQNVRAAGVVLVSTPQDLALADVIRAKLMFDKVSIPVLGIVENMSAFVCPHCRSETA IFDKGGARTAAEKMGIRFLGDVPIDLAIREGGDKGVPVVVGQPDSPQAAALLAVAKNVAGAVSTQVLKAPRLPVIGAQPRA >DUF59.NUBPL.Duf971_synSpp Synechococcus sp. YP_380733 Cyanobacteria MTPVEQANKALQQVKDAGSGKTALELGWIEQIRITPPRAVFRLSLPGFAQSQRDRIVAEARGALMALDGIEDVQIEIGQPPSQGGIGQAGHGQPAERQSI PGVRQVIAVSSGKGGVGKSTVAVNLACALAQTGLRVGLLDADIYGPNAPTMLGVADQTPEVQGSGDQQRIVPIETCGIAMVSMGLLIDDHQPVIWRGPML NGIIRQFLYQAEWGERDVLIVDLPPGTGDAQLSLAQAVPMAGVVIVTTPQQVSLQDARRGLAMFRQMGIPVLGVVENMSAFIPPDRPDCRYALFGSGGGA QLASDYDVPLLAQIPMEMPVQEGGDTGRPIVINRSDSASAAEFKGLAEAVLKAVTQTV >Duf59.NBP35_natPha Natronomonas pharaonis YP_327062 euryarchaeota MNEETVLDRLAAVEnDPDLGDDIVSLGLVNDVNIDAETIHVDLALGAPYSPTETELAGTVRDALSELDREI DLTASVDTGLSADEQILPDVENIIAVASGKGGVGKSTVAVNLAAGLSQLGARVGLFDADVYGPNVPRMVE ADDQPKATEQETIIPPEKYGMKLMSMDFLVGEDDPVIWRGPMVHKVLTQLWEDVEWGALDYMVVDLPPGT GDTQLTLLQSVPVSGAVIVTTPQKVALDDAEKGLQMFGEHDTPVLGIVENMSGFVCPDCGSEHDIFGSGG GESFADDVEMPFLGRIPLDPAVREGGDAGRPVVLDEDDETGEALRSFTERTANMQGIVRRRQVSAADR >Duf59.NBP35_halPau Haladaptatus paucihalophilus ZP_08044106 euryarchaeota MDETDVRAVLRTVEDPDLGEDIVSLGLVNDVTVEDETARISLALGAPYAPHESEIANRVREALNDEGIDT ELSARVDTQLSPEEQVLPGVKNIIAVASGKGGVGKSTVAVNLAAGLAKLGARVGLFDADVYGPNVPRMVD ANERPRATEEQKLVPPEKFGVKLMSMAFLTGKDDPVIWRGPMVHKVLTQLWEDVEWGQLDYMVVDLPPGT GDTQLTLLQSVPVTGAVIVTTPQQVALDDANKGLQMFGKHDTPVLGIAENMSTFKCPDCGGEHDIFGHGG GAEFAEDHEMPFLGSIPLDPSVRSGGDEGEPIVLDDESDTGESFRTLTENVANNVGIINRRRQQQE >Duf59.NBP35_natPel Natrinema pellirubrum ZP_08964079 euryarchaeota MDEAAVRDRLRTVEDPELGDDIVSLGLVNDITVDGDEVAIDLALGAPYSPSESDIAAEVRETLTAEDLEP DLTASVPDRDDLTSEDQVLPNVKNVIAVASGKGGVGKSTVAVNLAAGLSQLGARVGLFDADVYGPNVPRM 
VDADEPPMATEDETLVPPEKYGVKLMSMAFLTGEDDPVIWRGPMVHKVITQLTEDVEWGHLDYLVVDLPP GTGDTQLTMLQTMPVTGAVIVTTPQDVALDDARKGLEMFAKHDTVVLGIAENMSTFACPDCGGEHDIFGS GGGEDFAEEHELPFLGSIPLDPAVREGGDGGKPTVLKDDDTTSDALRTITENVANNTGIVHRQAISQSRR NEAASPDR >Duf59.NBP35_halLac Halorubrum lacusprofundi YP_002566480 euryarchaeota MNEADVRERLVDVRDPDLGDDIVSLGLVNDVEVDDDEIRISLALGAPFSPHESAIADDVRAALADTGLDV ELSASIPDDLEPDEQVLPGVKNVIAVASGKGGVGKSTMAVNIAAGLSALGARVGLFDADVYGPNVPRMVS AEERPQTDGETIVPPERFGVKLMSMDFLTGEDDPVIWRGPMVHKIITQLVEDVEWGELDYLVMDLPPGTG DTQLTILQTLPLTGAVIVTTPQEVALDDAVKGLRMFGKHDTNVLGIAENMAGFRCPDCGGFHEIFGSGGG KALAQEHDLPFLGGVPLDPAVRTGGDDGEPVVLEEGETADAFRVIVENVANNAGVVRRRGVSEGR >Duf59.NBP35_halVol Haloferax volcanii YP_003536803 euryarchaeota MDEADVRDRLRAVEDPDLGDDVVSLGLVNAVEVDGDTVRISLALGAPYSPAETDIGRRIREVLAEDGLEV DLTAKVPTDRDPDEEVLPGVKNIIAVASGKGGVGKSTVAVNLAAGLSKLGARVGLFDADIYGPNVPRMVA AEEAPQATQDQTIVPPEKYGMKLMSMAFLVGDDDPVIWRGPMVHQLLTQLVEDVEWGSLDYLVLDLPPGT GDTQLSILQTLPLTGAVIVTTPQNVALDDANKGLRMFGKHDTNVLGIVENMSTFRCPDCGNRHDIFGAGG GREFAASNDLPFLGALPLDPAVREGGDGGKPIVLEDDDETADAFRVMTENIADMVGIVQRRSVSER >Duf59.NBP35_halXan Halopiger xanaduensis YP_004597659 euryarchaeota MDEDAVRDRLRTVEDPELGDDIVSLGLVNDIAVDGDEVSVDLALGAPYSPTESDIAAEAREVLIEAGLEP DLSASVPDRDDLSSDEQVLPGVKNVIAVASGKGGVGKSTVAVNLAAGLSQLGARVGLFDADVYGPNVPRM VDADEPPMATEDETLMPPEKYGVKLMSMAFLTGEDDPVIWRGPMVHKVITQLTEDVEWGHLDYLVVDLPP GTGDTQLTMLQTMPVTGAVIVTTPQDVALDDARKGLEMFAQHDTVVLGIAENMSSFACPDCGSQHDIFGS GGGREFADEHDMPFLGSIPLDPSVREGGDSGKPTVLDDDSEVGESFRTITENVANNTGIVHRRSVSQSQR AASRSDD >Duf59.NBP35_halTia Halorhabdus tiamatea ZP_08558390 euryarchaeota Cobyrinic acid MDEAELRELLASVEDPDLEGDIVSLGLVNDVALENGSAHIDLALGAPFSPTETTIADRVREVIGDAAPDL AVELTASIDRDTEGDVLPGVKNVVAVASGKGGVGKSTVAVNLAAGLADRGARVGLFDADIYGPNVPRMLD AHERPEATEDDQIIPPEKHGMKLMSMDFLLGEDDPVIWRGPMVHQTLTQLFEDVQWGDLDYLVVDLPPGT GDTQLTLLQTVPVTGAVIVTTPQGVALDDARKGLEMFGKHETPVLGIIENMSSFKCPDCGSEHAIFGEGG GREFADQVQMPFLGEIPLDPEIRERGDEGRPAVLADDLDVSGAFRNFVANTANNQGIVHRNRLSEQSE >Duf59.NBP35_calSub Caldiarchaeum subterraneum BAJ47606 thaumarchaeota 
MLIDEQTVLNALRQVIDPDLKIDIVTLGMIKNLVIKDGDVSFTLELTTPACPYNKSIEDSARAAVESIPG VRSVDMRVTARVWSAKPMASTYPDVKNVVAVASGKGGVGKTTVAINLACSLALSGARVGLVDADIYGPTI PKIVKIVEPPRLRPDKKVEPAKMMLGIKVMSLGLFVDEGTAVIWRGPLVASAVKQLLTEAQWGELDYLIV DLPPGTGDASLTLAQTMPLTGVVIVTTPQQAASVIAAKALSMFRRLGVTIIGIVENMSYYVCPECGKESS LFGQSHTDKMAAELDVEVLGRIPMSPDVSVNHDQGVPIVLAAPSSPAAKAFDEAAKKIAAKISILAHAAG AAKAGAK >Duf59.NBP35_cenSym Cenarchaeum symbiosum YP_875901 thaumarchaeota chromosome partitioning MCGTGAHMQAPPAGPSRHGWQGFRAGSFSRIRLSLNNENTLTENMVGVDQVLESLGKVIDPDLKKDIVSMGMIKDLELDDGNLKFTLELTTPACPFNVEI EDDVRKVIGELDGIKNLNLNVTAKVMEGRSLDEDAGMTTVKNIIGVASGKGGVGKSTVALNLALALGQTGAKVGLLDADIYGPSIPLMLGMKEAFMEVEA NKLQPAEASGIKVVSFGFFAEQAHKAAIYRGPIISGILKQFLVDTNWSDLDYLIVDLPPGTGDIPLTLAQTIPITGILVVTTPQNVASNVAVKAVGMFEK LNVPIIGVVENMSGFVCNKCGEKHNVFGEGGAKRISEQFKIPLIGEIPLTAGIMAGSEEGRPIILTDPDSPSSNAFRSSAKNIAAQCSIIAAKLQEEMAA EAAASPDAGGAEGQSPPQQDAGKPLGVASAAEKDGGL >Duf59.NBP35_nitSal Nitrosopumilus salaria ZP_10117149 thaumarchaeota MVGIDQVLEKLSTVIDPDLKKDIVSMGMIKDLELNDGNLKFTLELTTPACPFNVEIEDDVRKAIGEISELKNFDMKVTAKVMEGRSLEADTGMASVKNII GVASGKGGVGKSTVSLNLALALAQTGAKVGLLDADIYGPSIPLMLGMKDGFMEVEDNKLQPADSHGLKVVSFGFFADQSNQAAIYRGPIISGILKQFLVD TNWSDLDYLIVDLPPGTGDIPLTLAQTIPITGILVVTTPQDVASNVAVKAVSMFEKLNVPIIGVVENMSHFICPNCNEKHYIFGEGGAKKISEQFNMPFL GEIPLNSGIMAGSDLGKPIMITNPDSPSAVAFRKSAKNIAAQCSILAAKLQDEMAAESSEPTPEPTN >NBP35_pyrFum Pyrolobus fumarii YP_004781840 no Duf59_NUBPL in crenarchaeotes, most like NUBP1 MRVLVVSDALVKQRAEIMKKVIEQQRMIAEKMKKVRYKLVILSGKGGVGKSFVTASLAMALAMKGRRVAVFDADVHGPSIPKMLGVHGKRMYALPDGRLLPVEGPLGVKVVSIDFLLESEEQAVIWRGPLKTAAIRELLA YTDWGELDYLLVDLPPGTGDEQLTIAQLIPGLSGTIIVTIPSDVARIVVKKAITFAKRLNIPIVGVIENMSYFECPDGSRHYIFGKGAGRRIAEEMGVPFLGEIPIDPRISKANDEGKPFFVEYPDSSAAKAFLEIAERVIERVEGKGEGRSESQAKE >NUBPa_arcFul Archaeoglobus fulgidus NP_071094 3KB1 MQKRVTDEDIKERLDKIGFRIAVMSGKGGVGKSTVTALLAVHYAKQGKKVGILDADFLGPSIPHLFGLEKGKVAVSDEGLEPVLTQRLGIKVMSIQFLLP KRETPVIWRGPLIAGMIREFLGRVAWGELDYLLIDLPPGTGDAPLTVMQDAKPNGAVIVSTPQELTAAVVEKAITMAEQTKTAVLGIVENMAYFECPNCG 
ERTYLFGEGKASELARKYKIEFITEIPIDSDLLKLSDLGRVEEYEPDWFEFFPY >NUBPb_arcFul Archaeoglobus fulgidus NP_071205 2PH1 MQKRVTDEEIKERLGKIKSRIAVMSGKGGVGKSTVTALLAVHYARQGKKVGILDADFLGPSIPILFGLRNARIAVSAEGLEPVLTQKYGIKVMSMQFLLP KENTPVIWRGPLIAGMIREFLGRVAWGELDHLLIDLPPGTGDAPLTVMQDAKPTGVVVVSTPQELTAVIVEKAINMAEETNTSVLGLVENMSYFVCPNCG HKSYIFGEGKGESLAKKYNIGFFTSIPIEEELIKLADSGRIEEYEKDWFESAPF
NARFL (IOP1)
(coming shortly)
ERCC2 (XPD)
(coming shortly)
Curated reference sequences
It serves no current purpose to collect all possible full-length MMS19 sequences from GenBank, so only a sample of about 20, distributed uniformly across the eukaryotic phylogenetic tree, is provided here. MMS19 presents no real homology complications, being present as a single-copy gene. Genes in early-diverging eukaryotes are assumed to be single-exon, i.e., each is taken as the largest open reading frame enveloping the match to the ultra-conserved region. MMS19 has been studied experimentally only in yeast and human.
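The single-exon rule above ("the largest open reading frame enveloping the match") can be sketched in a few lines of Python. The sketch makes three assumptions. First, "largest" is read as stop-to-stop in the frame of the match. Second, only the forward strand is scanned; a reverse-strand hit would be reverse-complemented first. Third, trimming to the first in-frame ATG is a separate choice that the page does not specify. The function name is hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def enveloping_orf(dna, match_start, match_end):
    """Return (start, end) of the stop-to-stop ORF that contains a match.

    Coordinates are 0-based and half-open on the forward strand. The match
    [match_start, match_end) must be a whole number of codons in one frame.
    The returned end includes the terminating stop codon when one is found.
    Returns None if the match itself contains an in-frame stop codon.
    """
    dna = dna.upper()
    if (match_end - match_start) % 3 != 0:
        raise ValueError("match length must be a multiple of 3")
    # A match that spans an in-frame stop cannot lie inside a single ORF.
    for i in range(match_start, match_end, 3):
        if dna[i:i + 3] in STOP_CODONS:
            return None
    # Walk upstream codon by codon to the previous in-frame stop (or sequence start).
    start = match_start
    while start >= 3 and dna[start - 3:start] not in STOP_CODONS:
        start -= 3
    # Walk downstream through the match to the next in-frame stop (or sequence end).
    end = match_start
    while end + 3 <= len(dna):
        codon = dna[end:end + 3]
        end += 3
        if codon in STOP_CODONS:
            break
    return start, end


# Toy sequence: upstream TAA, then ATG GCT TGT TGC GGG and a TAG stop.
toy = "CCCTAAATGGCTTGTTGCGGGTAGCC"
print(enveloping_orf(toy, 12, 18))  # prints (6, 24), i.e. ATGGCTTGTTGCGGGTAG
```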
The apparent absence of MMS19 in Giardia and various obligate parasites could be attributable to a reduced genome, extreme sequence divergence relative to available probes, or incomplete assembly; it is inconceivable that these species lack the core iron sulfur proteins of DNA metabolism. Indeed, the conserved cysteine pattern of the primase large subunit is readily located in these species. It remains conceivable, however, that the very earliest diverging eukaryotes retain components of the archaeal iron sulfur cluster formation system.
The yeast gene, sometimes called MET18 in the yeast literature, is unsurprisingly single-exon (only 283 of roughly 6000 yeast genes contain introns) and is not located in a yeast-type operon. While some immediate neighbors are involved in DNA processes, none is homologous to an iron sulfur cluster assembly component or has a recognized 4Fe-4S cofactor itself.
Gene   Position       Description
MET18  chrIX:113806   DNA repair, TFIIH regulator, nucleotide excision repair, RNA polymerase II, telomere maintenance
RRT14  chrIX:117024   rDNA transcription; localizes to nucleolus; involved in ribosome biogenesis
STH1   chrIX:117992   ATPase component in chromatin remodeling; expression of early meiotic genes; helicase-related protein homologous to Snf2p
KGD1   chrIX:122689   mitochondrial alpha-ketoglutarate dehydrogenase
ASG1   chrIX:102782   zinc cluster transcriptional regulator; stress response
CSM2   chrIX:99860    homologous recombination repair; accurate chromosome segregation during meiosis
SIM1   chrIX:128151   may participate in DNA replication
The human gene has 31 coding exons. These do not correspond to natural breaks in the tertiary structure (e.g., HEAT units), and the ultra-conserved region is spread across parts of three exons (exons 7 to 9). Thus, despite its modular structure, MMS19 had already completed its internal expansion of domain units before the main era of exon formation. It could not expand further today by exon duplication, because duplicated exons would raise problems of compatible intron phasing and would not correspond cleanly to structural units.
Exon structure of human MMS19. Columns: exon number, size (amino acids), intron phasing (donor bp overhang), and primary sequence. The ultra-conserved region lies in exons 7 to 9.
Exon  Size  Phase  Sequence
1     37    1      MAAAAAVEAAAPMGALWGLVHDFVVGQQEGPADQVAA
2     17    2      DVKSGNYTVLQVVEALG
3     33    1      SSLENPEPRTRARAIQLLSQVLLHCHTLLLEKE
4     29    0      VVHLILFYENRLKDHHLVIPSVLQGLKAL
5     25    0      SLCVALPPGLAVSVLKAIFQEVHVQ
6     23    1      SLPQVDRHTVYNIITNFMRTREE
7     43    1      ELKSLGADFTFGFIQVMDGEKDPRNLLVAFRIVHDLISRDYSL
8     21    0      GPFVEELFEVTSCYFPIDFTP
9     29    0      PPNDPHGIQREDLILSLRAVLASTPRFAE
10    25    0      FLLPLLIEKVDSEVLSAKLDSLQTL
11    26    0      NACCAVYGQKELKDFLPSLWASIRRE
12    46    1      VFQTASERVEAEGLAALHSLTACLSRSVLRADAEDLLDSFLSNILQ
13    52    0      DCRHHLCEPDMKLVWPSAKLLQAAAGASARACDSVTSNVLPLLLEQFHKHSQ
14    26    1      SSQRRTILEMLLGFLKLQQKWSYEDK
15    42    1      DQRPLNGFKDQLCSLVFMALTDPSTQLQLVGIRTLTVLGAQP
16    28    2      DLLSYEDLELAVGHLYRLSFLKEDSQSC
17    33    1      RVAALEASGTLAALYPVAFSSHLVPKLAEELRV
18    50    1      GESNLTNGDEPTQCSRHLCCLQALSAVSTHPSIVKETLPLLLQHLWQVNR
19    52    1      GNMVAQSSDVIAVCQSLRQMAEKCQQDPESCWYFHQTAIPCLLALAVQASMP
20    34    2      EKEPSVLRKVLLEDEVLAAMVSVIGTATTHLSPE
21    34    0      LAAQSVTHIVPLFLDGNVSFLPENSFPSRFQPFQ
22    23    0      DGSSGQRRLIALLMAFVCSLPRN
23    42    1      VEIPQLNQLMRELLELSCCHSCPFSSTAAAKCFAGLLNKHPA
24    34    0      GQQLDEFLQLAVDKVEAGLGSGPCRSQAFTLLLW
25    19    0      VTKALVLRYHPLSSCLTAR
26    62    1      LMGLLSDPELGPAAADGFSLLMSDCTDVLTRAGHAEVRIMFRQRFFTDNVPALVQGFHAAPQ
27    28    0      DVKPNYLKGLSHVLNRLPKPVLLPELPT
28    55    0      LLSLLLEALSCPDCVVQLSTLSCLQPLLLEAPQVMSLHVDTLVTKFLNLSSSPSM
29    20    0      AVRIAALQCMHALTRLPTPV
30    34    2      LLPYKPQVIRALAKPLDDKKRLVRKEAVSARGEW
31    9     0      FLLGSPGS*
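The phasing argument can be checked directly against this table. The minimal Python sketch below makes two assumptions. It reads the "donor bp overhang" column as the conventional intron phase (0, 1 or 2) of the intron following each exon, and it takes exon 1 to begin in phase 0. An internal exon is "symmetric" when the introns on both sides share a phase. Only such an exon can be duplicated in tandem without shifting the downstream reading frame. The function names are hypothetical.

```python
# (exon number, length in amino acids, phase of the intron that follows),
# transcribed from the human MMS19 exon table above. Exon 31 is the last
# exon, so its trailing phase is only a placeholder.
EXONS = [
    (1, 37, 1), (2, 17, 2), (3, 33, 1), (4, 29, 0), (5, 25, 0), (6, 23, 1),
    (7, 43, 1), (8, 21, 0), (9, 29, 0), (10, 25, 0), (11, 26, 0), (12, 46, 1),
    (13, 52, 0), (14, 26, 1), (15, 42, 1), (16, 28, 2), (17, 33, 1), (18, 50, 1),
    (19, 52, 1), (20, 34, 2), (21, 34, 0), (22, 23, 0), (23, 42, 1), (24, 34, 0),
    (25, 19, 0), (26, 62, 1), (27, 28, 0), (28, 55, 0), (29, 20, 0), (30, 34, 2),
    (31, 9, 0),
]


def exon_phases(exons):
    """Yield (exon, entry_phase, exit_phase) for each exon in order.

    An exon's entry phase is the phase of the intron before it; the first
    coding exon is assumed to begin in phase 0.
    """
    entry = 0
    for number, _length, exit_phase in exons:
        yield number, entry, exit_phase
        entry = exit_phase


def symmetric_internal_exons(exons):
    """Return internal exons whose flanking introns have the same phase."""
    internal = list(exon_phases(exons))[1:-1]  # first/last exons lack an intron on one side
    return [number for number, entry, exit_phase in internal if entry == exit_phase]


print(sum(length for _, length, _ in EXONS))  # prints 1031: 1030 residues plus the stop in exon 31
print(symmetric_internal_exons(EXONS))
```

Phase symmetry is a necessary condition only. The argument above also depends on whether exon boundaries coincide with structural units, and this check does not address that question.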
Alignment of ultra-conserved region:
MMS19_homSap FTFGFIQVMDGEKDPRNLLVAFRIVHDLI-------SRDYSLGPFVEELFEVTSCYFPIDFTPPPNDPHG-IQREDLILSLR
MMS19_musMus FTFGFIQVMDGEKDPRNLLLAFRIVHDLI-------SKDYSLGPFVEELFEVTSCYFPIDFTPPPNDPYG-IQREDLILSLR
MMS19_cioInt FLYQYIQVIDGEQDPRNLLTIFQLTKNLI-------ESSFPLFDLVEELFDVSSCYFPIDFNPAAAGKKSTITNLDLVSSLR
MMS19_braFlo FVWGFIQAMDGEKDPRNLIIAFSIAR-IV-------AQAFPIGTFTEELFEVISCYFPIDFTPPADDPHG-VTREDLVLGLR
MMS19_strPur FVLGLLHAMDGEKDPRNLILLFNILP-TV-------INNFKIDMFIEETFEVVACYYPVDFHPPPNDPYG-ISREKLALSLK
MMS19_sacKow FVFGYIQCMDGEKDPRNLTMIFRCVP-II-------IHNFPIDVFIEELFEVVSCYFPIDFTPPPNDPYK-VTQEELVLGLR
MMS19_dapPul FVFGLIQLADQERDPRNLLILFSIFPVVA--------RYFRFEPFTEEFFEVFSCYFPIDFTPPANDPYA-VTKEQLCDGLR
MMS19_droMel FVYGLINSIDGERDPRNLDIIFSFMPEFL--------STYPLLHLAEEMFEIFACYFPIDFNPSKQDPAA-ITRDELSKKLT
MMS19_nemVec LVFGFLQAMDGEKDPRNLVVAFKLAR-II-------IKNFPIGLFAEDLFEVTSCYFPIDFTP-------------------
MMS19_triAdh FVFGYIQVMDGEKDPRNLLLALKIAKFIV--------QNFNIDLFLDDFFEIISCYFPIDFTPPPNLP----SNENVTK---
MMS19_sacCer FIETFLHVANGEKDPRNLLLSFALNKSIT-------SSLQNVENFKEDLFDVLFCYFPITFKPPKHDPYK-ISNQDLKTALR
MMS19_schPom FFSGICSTFAGEKDPRNLMLVFSMLK-KI-------LSTFPIDGFEQQFFDITYCYFPITFRAPPDATNLAITSDDLKIALR
MMS19_araTha LVYAMCEAIDGEKDPQCLMIVFHLVELLAPLFP---SPSGPLASDASDLFEVIGCYFPLHFTHTKDDEAN-IRREDLSRGLL
MMS19_dicDis FMVGYLQFIDNEKDPRNLIFSFKLLPKVIYNIP---EHKHFLES----LFEIISCYFPISFNPKGNDPNS-ITKDDLSNSLL
MMS19_pytUlt FAQAFLNAMEGEKDPRNLLLCLQIARELLAKLE---VVFDRHDAVLQQYFDVVSCYFPITFTPPPNDPYG-ITSEELILSLR
MMS19_sapPar LMDGFLRAMSGEKDPRNLLFCLRFAAELLTTYA---NVVDAD--VAKGFFDATSCYFPITFRPPPNDPYG-ITSEDLVLALR
MMS19_polPal FMAGFLQFIDGEKDPRNIIYTFRLIPRVILYIP---EYKNFADS----LFEILSCYFPISFNPKPGDPNS-ITKDDLVSSLL
MMS19_entHis -IDTCVQLIELERDPECLKEVFDLIKLVS-------QKNEIDADSAPLLFDCASAYFPILYPPKGDEA----LRIDLTNKIL
MMS19_naeGru FLNGFIQSLEGERDPGNLLYCFNLIPKVIAIFDDSELSSKILSAVSDDLFDITSCYFPITYTPPANDTRG-ITREDLSRSLK
MMS19_phyInf LAQTFLSAMEGEKDPRNLLLCMQVARTLLSKLE---PVFSRSDTLLQQYFDVVSCYFPIIFTPPPNDPYG-ITSEGLILSLR
MMS19_albLai FIRSFLNAMTGEKDPRNLKHCFQIAQTMMQKLE---MVFQEAE-LSEQYFRVISCYFPITFTPPPNDPYG-VTTEELIRSLR
Consensus f g q dgEkDPrnL f lF#v sCYFPI ftpp ndp it edl lr
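The consensus line uses what look like MultAlin-style symbols. Uppercase marks a residue present in nearly all rows and lowercase one present in at least half. '!' stands for I or V, '$' for L or M, '%' for F or Y, and '#' for N, D, Q, E, B or Z. The Python sketch below builds such a line. The 90% and 50% thresholds, the counting of gaps against a residue, and the order in which groups are tried are all assumptions, so its output need not reproduce the consensus above exactly.

```python
from collections import Counter

# Group symbols in the order they are tried (assumed MultAlin-like convention).
GROUP_SYMBOLS = [
    ("!", set("IV")),
    ("$", set("LM")),
    ("%", set("FY")),
    ("#", set("NDQEBZ")),
]


def consensus_line(rows, high=0.9, low=0.5):
    """Build a one-character-per-column consensus for aligned rows.

    Frequencies are taken over all rows, so gaps count against a residue.
    Uppercase: top residue in >= `high` of rows; lowercase: >= `low`;
    otherwise a group symbol if that group reaches `low`; else a space.
    """
    if len({len(row) for row in rows}) != 1:
        raise ValueError("need at least one row, all of the same length")
    n = len(rows)
    symbols = []
    for column in zip(*rows):
        counts = Counter(c.upper() for c in column if c != "-")
        symbol = " "
        if counts:
            residue, top = counts.most_common(1)[0]
            if top / n >= high:
                symbol = residue
            elif top / n >= low:
                symbol = residue.lower()
            else:
                for mark, members in GROUP_SYMBOLS:
                    if sum(counts[r] for r in members) / n >= low:
                        symbol = mark
                        break
        symbols.append(symbol)
    return "".join(symbols)


print(repr(consensus_line(["ACDI", "ACEV", "ACNI", "ACQV", "GCDL"])))  # prints 'aC#!'
```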
Summary of MMS19 reference sequences (name, species, group, accession, length, HEAT repeat count where known, percentage relative to human MMS19):
MMS19_homSap  Homo sapiens  mammal  Q96T76  1030 aa  7 HEAT  100%
MMS19_musMus  Mus musculus  mammal  Q9D071  1031 aa  9 HEAT  89%
MMS19_cioInt  Ciona intestinalis  urochordate  XP_002128657  1026 aa  x HEAT  32%
MMS19_braFlo  Branchiostoma floridae  cephalochordate  XP_002588594  1027 aa  x HEAT  42%
MMS19_strPur  Strongylocentrotus purpuratus  echinoderm  XP_001194909  975 aa  x HEAT  36%
MMS19_sacKow  Saccoglossus kowalevskii  hemichordate  XP_002735310  1007 aa  x HEAT  40%
MMS19_dapPul  Daphnia pulex  crustacean  EFX86854  961 aa  x HEAT  38%
MMS19_droMel  Drosophila melanogaster  insect  NP_649519  959 aa  x HEAT  30%
MMS19_nemVec  Nematostella vectensis  cnidarian  XM_001629116  897 aa  x HEAT  32%
MMS19_triAdh  Trichoplax adhaerens  placozoan  XP_002114595  959 aa  x HEAT  39%
MMS19_sacCer  Saccharomyces cerevisiae  budding yeast  P40469 (MET18)  1032 aa  13 HEAT  29%
MMS19_schPom  Schizosaccharomyces pombe  fission yeast  Q9UTR1  1018 aa  14 HEAT  25%
MMS19_araTha  Arabidopsis thaliana  plant  NM_124186  1134 aa  x HEAT  28%  armadillo/beta-catenin-like
MMS19_dicDis  Dictyostelium discoideum  slime mold  Q54J88  1115 aa  18 HEAT  31%
MMS19_pytUlt  Pythium ultimum  stramenopiles  ADOS01001616  957 aa  31%
MMS19_sapPar  Saprolegnia parasitica  stramenopiles  ADCG01000470  804 aa  31%
MMS19_polPal  Polysphondylium pallidum  amoeba  EFA86574  994 aa  x HEAT  28%
MMS19_entHis  Entamoeba histolytica  amoeba  XP_651925  868 aa  x HEAT  25%
MMS19_naeGru  Naegleria gruberi  early eukaryote: heterolobosea  XP_002678884  1070 aa  x HEAT  27%
MMS19_phyInf  Phytophthora infestans  early eukaryote: stramenopiles  1114 aa  x HEAT  33%
MMS19_albLai  Albugo laibachii  early eukaryote: stramenopiles  1077 aa  x HEAT  27%
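The last column gives a percentage relative to human MMS19, but the page does not say how it was computed. One common definition counts identical residues and divides by the alignment columns in which both sequences have a residue. The minimal Python sketch below uses that definition as an assumption. Dividing instead by the shorter sequence or by the full alignment length gives different numbers. The function name is hypothetical.

```python
def percent_identity(a, b):
    """Percent identity of two aligned sequences over columns where neither has a gap."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    # Gap-containing columns are excluded from both numerator and denominator.
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper()) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    identical = sum(1 for x, y in pairs if x == y)
    return 100.0 * identical / len(pairs)


print(percent_identity("ACD-EF", "ACEGEF"))  # prints 80.0 (4 of 5 gap-free columns match)
```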
>MMS19_homSap Homo sapiens mammal Q96T76 1030 aa 7 HEAT 100% MAAAAAVEAAAPMGALWGLVHDFVVGQQEGPADQVAADVKSGNYTVLQVVEALGSSLENPEPRTRARAIQLLSQVLLHCHTLLLEKEVVHLILFYENRLK DHHLVIPSVLQGLKALSLCVALPPGLAVSVLKAIFQEVHVQSLPQVDRHTVYNIITNFMRTREEELKSLGADFTFGFIQVMDGEKDPRNLLVAFRIVHDL ISRDYSLGPFVEELFEVTSCYFPIDFTPPPNDPHGIQREDLILSLRAVLASTPRFAEFLLPLLIEKVDSEVLSAKLDSLQTLNACCAVYGQKELKDFLPS LWASIRREVFQTASERVEAEGLAALHSLTACLSRSVLRADAEDLLDSFLSNILQDCRHHLCEPDMKLVWPSAKLLQAAAGASARACDSVTSNVLPLLLEQ FHKHSQSSQRRTILEMLLGFLKLQQKWSYEDKDQRPLNGFKDQLCSLVFMALTDPSTQLQLVGIRTLTVLGAQPDLLSYEDLELAVGHLYRLSFLKEDSQ SCRVAALEASGTLAALYPVAFSSHLVPKLAEELRVGESNLTNGDEPTQCSRHLCCLQALSAVSTHPSIVKETLPLLLQHLWQVNRGNMVAQSSDVIAVCQ SLRQMAEKCQQDPESCWYFHQTAIPCLLALAVQASMPEKEPSVLRKVLLEDEVLAAMVSVIGTATTHLSPELAAQSVTHIVPLFLDGNVSFLPENSFPSR FQPFQDGSSGQRRLIALLMAFVCSLPRNVEIPQLNQLMRELLELSCCHSCPFSSTAAAKCFAGLLNKHPAGQQLDEFLQLAVDKVEAGLGSGPCRSQAFT LLLWVTKALVLRYHPLSSCLTARLMGLLSDPELGPAAADGFSLLMSDCTDVLTRAGHAEVRIMFRQRFFTDNVPALVQGFHAAPQDVKPNYLKGLSHVLN RLPKPVLLPELPTLLSLLLEALSCPDCVVQLSTLSCLQPLLLEAPQVMSLHVDTLVTKFLNLSSSPSMAVRIAALQCMHALTRLPTPVLLPYKPQVIRAL AKPLDDKKRLVRKEAVSARGEWFLLGSPGS* >MMS19_musMus Mus musculus mammal Q9D071 1031 aa 9 HEAT 89% MAAATGLEEAVAPMGALCGLVQDFVMGQQEGPADQVAADVKSGGYTVLQVVEALGSSLENAEPRTRARGAQLLSQVLLQCHSLLSEKEVVHLILFYENRL KDHHLVVPSVLQGLRALSMSVALPPGLAVSVLKAIFQEVHVQSLLQVDRHTVFSIITNFMRSREEELKGLGADFTFGFIQVMDGEKDPRNLLLAFRIVHD LISKDYSLGPFVEELFEVTSCYFPIDFTPPPNDPYGIQREDLILSLRAVLASTPRFAEFLLPLLIEKVDSEILSAKLDSLQTLNACCAVYGQKELKDFLP SLWASIRREVFQTASERVEAEGLAALHSLTACLSCSVLRADAEDLLGSFLSNILQDCRHHLCEPDMKLVWPSAKLLQAAAGASARACEHLTSNVLPLLLE QFHKHSQSNQRRTILEMILGFLKLQQKWSYEDRDERPLSSFKDQLCSLVFMALTDPSTQLQLVGIRTLTVLGAQPGLLSAEDLELAVGHLYRLTFLEEDS QSCRVAALEASGTLATLYPGAFSRHLLPKLAEELHKGESDVARADGPTKCSRHFRCLQALSAVSTHPSIVKETLPLLLQHLCQANKGNMVTESSEVVAVC QSLQQVAEKCQQDPESYWYFHKTAVPCLFALAVQASMPEKESSVLRKVLLEDEVLAALASVIGTATTHLSPELAAQSVTCIVPLFLDGNTSFLPENSFPD QFQPFQDGSSGQRRLVALLTAFVCSLPRNVEIPQLNRLMRELLKQSCGHSCPFSSTAATKCFAGLLNKQPPGQQLEEFLQLAVGTVEAGLASESSRDQAF 
TLLLWVTKALVLRYHPLSACLTTRLMGLLSDPELGCAAADGFSLLMSDCTDVLTRAGHADVRIMFRQRFFTDNVPALVQGFHAAPQDVKPNYLKGLSHVL NRLPKPVLLPELPTLLSLLLEALSCPDSVVQLSTLSCLQPLLLEAPQIMSLHVDTLVTKFLNLSSSYSMAVRIAALQCMHALTRLPTSVLLPYKSQVIRA LAKPLDDKKRLVRKEAVSARGEWFLLGSPGS* >MMS19_cioInt Ciona intestinalis urochordate XP_002128657 1026 aa x HEAT 32% MEKVKFEMEEMIQLWLRDKNDDHKILKCAQQIENREQTIGDLVTALGPHLTNKDTKIRIDACTLLSNVIHKLPKDCLNQGELESLVQFLCSRLEDHYTLQ PVALSLLLQLSSADNLTGENACSIITSVFKEVHIQTCMQHDRLKIFQILGTLLDIHTKDVITMGRDFLYQYIQVIDGEQDPRNLLTIFQLTKNLIESSFP LFDLVEELFDVSSCYFPIDFNPAAAGKKSTITNLDLVSSLRGVLASTKQFAQYCIPLMLEKLESDVESAKIDSLETLTACLGCYGKQELEKYLSSLWSDV KREINQSSSEQIEKCCLTFLTSLLSNLSSWPVDQKSEKATDLKSFLDDVLEDCVPRLQAQSDDRSKWMAGHVVLACAKSSKKACSQIVTTVLPILLQNAQ SKSASTTLAGQSVQQSALDNLVKLTAVCGQFNFENHPVLKKKEEFFTILNELALKSEIEEQLKCIAVAGFASLLKLEILSNVELTEIASLLLKMIKLKPE SHLRGEVLSVAGYLSSQHPDVAKSHLIPCVMRRMEEGDDSCFDVLASVCTHFDVLKLVLGFIMERIVNTQVDETSEPLLHACLESLQKMTSSSWVGNTEI EYMALNLVLPLLKRCIEVTLELSVPEQCCANCHIFEDVSKECASLPILKSAAIVIRNVCQKLKPGKSTDLVIQLIASLYNNSKLSSLDIKSDVHFTPFHP KASPLQTRTLCFLPATICALHPNIEIPELAELETKLLNTCLHCTDQPSYVFAAKALSGLVNKYKKPSIPILEKLKSHFDTDPNWSLKSEEEKMMILTLLI WICKALVLSNHPDSLIFIKNLLYWMGDDSVGEVAAAGFDIILRESNEVLSPSSHSTIRLMHKQRFFLLIIPEIVSSFKTSENKTQQTNILTALSHLIGHL PKQVLMQHFTELLPLLTQALHTDNTQLLKSVLSTLFCFIQDTTEAMTAHLENLMKHFLRLSKFKQDIDVRVKAVQCIGVVTLLPPIVILPFKNDIVRHLV SVLDDRKRDVRTEASKARSEWFLVGT* >MMS19_braFlo Branchiostoma floridae cephalochordate XP_002588594 1027 aa x HEAT 42% MAALSGNVQENVLEFVQGQQDSALQSVAKAVFDGETSLLQLVESLGSSLTSTEVTTRARATQLLAEVLHRSPSNRLTEKEAEVLSAFFCDRLLDHHSVQP HVLHGLLALSAAPQLPQGEEVKIVQVIFKEVYVQSLVQTDRRAIYNILANFLDTRLEALQALGADFVWGFIQAMDGEKDPRNLIIAFSIARIVAQAFPIG TFTEELFEVISCYFPIDFTPPADDPHGVTREDLVLGLRQVLAATSKFAQFCVPMLLEKLSSDVTSAKLDSLHTLAACAEVYGADSMKSFLDLLLSAISKE VYSSIHQDVENAALAGLTAVVATLSHAVTETRSVFSLHHFLDSLLKGCKHHLCEPELKLMWPSAKLLQAAARASDPACVHVLDTAVPLLVEQFQVHPYPQ HRHTILEVTIAFIHVAHASTSGTDAPNPVVPHSDNFLTLFYSVLEDADAGLRSSGVGGMAAMIGITDVVKGKHLDLCANHLGRLVLHDADPTVQRRSTEA 
LAAMATAHPDVVREEVLSKLLQVLENNNPNAMDTNQSEQVCAKHVTNQYVLNTLAAVSTHPTIVRCTIPKLLSHLQALIESCPDQATQEAIATLDCVYKV VEKTVINDANVEYFVDTIVLNLMSMALSAAVNTSDENLLHDTSLLEIVAKVLRAVARSLPNSTGKGIVNSTVQAFLQGNLAAISLNTSASFEPLDVSSPW QQTQTVQLLSAIVCSVARNVDIPSISELAQKLLTLSCASDHEPTSLAAAKSLSGVVNKWDQGEQLQTFLQETRDCLEQILSKTEDEKARCRAVAVLVWLT KALVIRGHPSGSQFTKTLMALFEDEAIGRRAAEGFYVILSDSPDVLSKESHANIRLMYKQRFFMENLPALVDGFNQADDGRKQSFLCAVSHLLTFIPRQV LLGALPPLVPLLVQSLLGEDPSLQVSTLEMFSSLVQEAPQVISKNIDALIPQLLELSKNGPTMKVRMAALKSVGSMTSLPHAVVYPYRNRVVRELAVAVG DKKRMVRKEAVAARGEWFLLGSPGGK* >MMS19_strPur Strongylocentrotus purpuratus echinoderm XP_001194909 975 aa x HEAT 36% MGTYLMSTETRIRAKGVELMSEVLTSLPRAFLNQQQIQVLIEFLCARLLDHHSITQHTLKGLLAMSSQSSFPPSSSVQVMTAIFKEVQVQTMLQVDRRTV YNIVVNLLRISLTELQGMGSEFVLGLLHAMDGEKDPRNLILLFNILPTVINNFKIDMFIEETFEVVACYYPVDFHPPPNDPYGISREKLALSLKTCLSST PKFAQFCLPMVMEKLSSDLQTARLDSYQLLQACAPVYSQGDMMSYIEAIWSYCRKELMVGASVELDQEAVKTLGAVVKAVSTGIQSTSGGGGGDGDLNSF LRNILTECRQHLCKPDHRMIHPCSKLLVAVATASYPACIAILKYSVPILLDQFHIFDQTRERMVLLDIIQRLLHSGHDHIKEADDWRAIYAHHLDTVVTT VLSTLAKDQGVPDLRMAGLGTLGELVQVPVLMDQSRLELVGQELTRILLEEANEDVRCECIETVSCFASRHTEFVKSTILTTLWTTVQKGESGYQRIVVD MATIVTDTSDDLSRSLTSELMVEIAKTELNSEQHLVYLATLNTLTAHLSSAPSNLESLLSSVVHPLMKMVVSATLQSSVEAGNNPHCCGEVLVAMAEVFR TVIPKLDSSMGSKLCQCAVDVFLHGNLTSLELTNPNTSVPFSPLDPRAPVHQTQLVTVLQPIVCSLRRDIHIPSSKQLMSSLLHIAAHSRAWLASTWAAK GLAGGVNKHPAGSDLDEVLVEAESLLGQAMSSGQEGSVKQQALMAWVLLTKALVMRSHPKATAFLTTLLRLLEDAELGAQVPQTLGMLLEDMRDVLSEGL HADVKIMYKQRVFLQALPFVMALFNKDDLRTKAITALCHLLPSIPRPVLLAELPPIIPRLVQSLRVTDPRPTLPILDILESLLEETLPSLVDQADTLLPT LLELSAYQASMKVRIASLKCVGAITSFPHHLVYPHQETVVRSLAPRLDDKKRLVRQEAGKARTKWILLQQDTKG* >MMS19_sacKow Saccoglossus kowalevskii hemichordate XP_002735310 1007 aa x HEAT 40% MATSMCIEIVENYVRGEDESAIHAAEIKILELVENLGTYLTSTEKNIRCRGTRLLSEVLNRLPKNFLSSDEVRALVIFYCDRLSDHYSVTPHTLLGMLAL STYDNLPKGCEVQLVQAIYKEVHVQSMIQVDRRSVYAILSNLLDTRIKDLQSLGRDFVFGYIQCMDGEKDPRNLTMIFRCVPIIIHNFPIDVFIEELFEV VSCYFPIDFTPPPNDPYKVTQEELVLGLRKCLAATPKFAEHCLPLLMEKLSSDVQRAKIDSFLTLAECCEVYGEDDLMEFLPAMWSTIRREVFQAFSHEV 
EKSALTCLCSIVKTLSNAVSNANKAAGGLDEFLDLVLKDCSKHLRDPGLRLMLPTSKLLQSAASASDPACYKIISAVVPILLEQFHKCKQVNERVSLLHA ALDFIKVCKSFTFGDDTPSPVIPFKDSLASLFLSLLSDHSSQLRCIGITGLVGLMSLNAIMNINEKKLAAMHFTNIVLTDQDNKVCSEAVTALAFMSMEF SLLVKEEVLPQLIKELDSRATGTRHRFIVNTLAGISMHSDIVLTTIPVMLQHLGTLSEDNTAESLETAVNTIQSIDIVVNSNISDEQCLDFFHSKLLPQL LRITVDQALQVNNYILCKEDVVSSIATVCRNIAKVLDDRVASNLVSNTISLFLDGNLENIGLKQSSQHFRPLEISSPWQHTQLVSLLTSIICSMKTFELS SQCLELMEKLLKLSLSSEHHLTCVSAAKCYAGLVNKHKQGTDLDSSLETVVESTCRMLQDEISDQNIYNRQKALTLWLWVTKALVLRAHFKSTQFTTKLI SLFEDHQLSQMAADGFYIILSESQDVLNKDMHCDIKLMYRQRLFMQTLPRILAGFEKANEDKKQYYLSALSHLLQFIPKQVLLSELPPLMPMLVQSLYCQ DVGLYVSTSDTLSMLIQDAPTVISLYVDTLLPQLLTLSTYQQSMKVRIAALKCIGLFVTLPTHVIYPRQKEVVRRLASVLDDRKRLVRQQAVTARGLWFL LEAPKK* >MMS19_dapPul Daphnia pulex crustacean EFX86854 961 aa x HEAT 38% MAISTAIQKLRDSFNSEESANESIRCISQSIASKELTILKLVEDLQPDLLNQQNTHRCKAVSTIGTILEQLGPELKGLNEKEVELVTEFFCSKLKDHHSI LPAALQGLHALSTAPKLSPGLARLISQSIFQDVHCQSQLQHDRRAIYKTLKNLLAFHLKELQDLGQDFVFGLIQLADQERDPRNLLILFSIFPVVARYFR FEPFTEEFFEVFSCYFPIDFTPPANDPYAVTKEQLCDGLRQCLAGSPHFAEYCLPLLQEKLESDLVSAKVEALKTLELCCQTYQAGQLEKWVDSFWTGIR REVLINVNTDDLEHASLDALAALSRAFTTDGEFNSPAFTKLLKNVLTECQGHLCEPERRLMTPSSYILLAICSGSAPACALIVSQVIPLLMDQYRIRPQS NPRQFILNSLNKMVHAGLYGFTEENVAQSGLASLIPKLLELYLEVLKEDDAVLRNLSLQGLSHLIGTCLNHQDLEKVNGTLLDLLQKSTATDSVIAEIGH FFCKSAEKNENLFLEQVLVKLLDIAVSGSIPTDGCARTIRPGITTGSTQSFDSFRTKGRTRNIPAIAPLGIENIGRRRSSGVTPSHLCLIEKSGFQFIFR VILLDQNVGRNVFVTFSALYRKATEFINEQTEQYVSQHLARSPWTLSIMEATLGSLDATPSGHSLERLVNTLEPLTVCHPKADVRLSACRLMAALVNKLP EGHELEAILDSLRRKWQDPSTDRCNSVCLFVWITKALLMRSYSKLNQYIQELVDSLNDPTHGYQVAEGFKTILCDTEECLNFNCHANIRLMYRQRFFQEV VPRLLKLYRESESCNKAACFAAIANQLAFIPEGVLIAHITTLIPLLIQCLSTDQPAQLIISTINAFMGLMSDNVSAIEEYISSLVPRLLTLAKDGITMDV RRLALQCLSELRKAQSIVLLPLRSEVILRLVPCLSDKKRLVRREAALARQKWIMLGQPGCN* >MMS19_droMel Drosophila melanogaster insect NP_649519 959 aa x HEAT 30% MTTPTRATLEKALKSDQKLVNSATQIAKDLTAKAYDISALAEALGFALSSPDMEERVAGTNLLSAVLIALPQDLLQERQLEFLSTFYMDRLRDHHNVMPA 
IIDGIDALVHMKALPRAQIPQILQSFFEHTTCQSQTRSDRTKLFHIFQYLTENFQDELQAMAGDFVYGLINSIDGERDPRNLDIIFSFMPEFLSTYPLLH LAEEMFEIFACYFPIDFNPSKQDPAAITRDELSKKLTNCLVANNEFAEGTVVLAIEKLESELLVAKLDSIELLHQAAVKFPPSVLEPHFDQIWQALKTET FPGNDNEEILKASLKALSALLERAAHIPDISHSYQSSILGVILPHLSDVNQRLFHPATGIALVCVSGDAPYAADKILNSFLLKLQAADASSEQRIKIYYI VSQVYKLSALRGSLQKLDTTIRESVQDDVIASLRLIEQEEFDAKKEDLELQKAALSVLNESAPLLNEKQRALIYKALVQLVSHPSIDIDFTTLTVSLGAL QPVEVQSNFIDVCVRNFEIFSTFVKRKIYTNLLPLMPQIAFTQRILDLVMTQTFNDTTAEPVRLLALEALNKLLLLADQRFIVDVQQESNLLHKLIELGQ KTEGLSMQSLEQIAGALSRITQQLPLSEQSAIVSEYLPGLNLSQSADLYITKGLLGYLHKDITLDDHFERLLTDLTQLSLNSDNEQLRVIAHHLLCSMVN KMESNPANLRKVKKITEQLKVAIKKGDVRAVEILAWVGKGLVVAGFDEAADVVGDLSDLLKHPSLSTAAALGFDIIAAEYPELDLPVVKFLYKQKLFHTI MGKMGSKLANYCVHHLKAFVYVLKATPQAVIKLNIEQLGPLLFKSLEEHNEAQSLCIALGICEKFVAQQDTYFQGHLAHLIPSCLELSKYKAQHTMQVRI AALQLLYDVTKYPTFVLLPHKVDVTLALAAALDDPKRLVRNTAVKARNAWYLVGAPSPN* >MMS19_nemVec Nematostella vectensis cnidarian XM_001629116 897 aa x HEAT 32% MAALGQEEYPSLATLLQDVYQRRKNLLQVVELLGPSLTSTDTDKRCSAVQLLSSLLQKLVNYKLTDREDLKPVGSDLVFGFLQAMDGEKDPRNLVVAFKL ARIIIKNFPIGLFAEDLFEVTSCYFPIDFTPFCLPLLMEKLSSDVINAKIDSLLTLVFQTVSSELEDAAFKALSSIIKNLESSSPGQEPFLSRIFINFYT ISCYVTQCHPDVVEFKTPFLDCVIKECCANIEGADLRKVKPSGQLLQAAFVTDTTYNEITSTAVPLILSKYNDEATQGLVKKLLLDVLLGLLTASKPYYK RKGSVLASHTSALVDVLFSALVSDSPSLCRAAIAGLVSMVTLPGLLLEQKVGMFVEHLTSFVLNTKDLTVRQESNAALAFLAMEFPELIKTKLVSVLAEQ LQKEDGSAMDEENISHLQSDKSHPQYDQMLNTLSAVCTEEGVVRHVVPIILDHGEYLVTGKDLERGVLHGKISETLKCLNSIVKGTLQSSTVEPNYYTEV VIYRIIDLCTQSALQESPDCPMATPEALALVCSIVRQVISHLAVNEAEDVLHIIVSNFIEGKTPLSARAEQKFAPLEPSSPWQQSQLVTVLMAAVCSARR EVRIPRQKELVPRLQVLASGCNHRKTTVAASKCLGGIINKMAQGDDLTADLHSLKGQLQNHMDGNEEQRWRAVITWLWLTRALVTRSHPMAQEFVQKVLH LLDDVSVGRVAADGFYVIVSDCDDVMNQAMHADIKMMYKQRFFMETLPLLLKGFHDTRPECKYLYLCALSHLLQWIPKQVLLTEIPTLMPMLIQALSRDE PSLLLSTLQTLYSLVFDAPEVISRQVTSLIPNFLELAKCKASMKVRMEAIKCLGAMTTLEHHVVYVYKARVIKELACTLDDPKRLVRAEAVKCRNEW* >MMS19_triAdh Trichoplax adhaerens single-celled metazoan XP_002114595 959 aa x HEAT 39% 
MEKDSSAKSLQQLMDEFILGNSSAINEIIKGIYDGHIKLSTIVELLGPYLTSVEHEKRLQGMKLLSEVLQMLSMYKMQATEVQLLVAFYSDRLQDHFSIL PETLRGILALVQHQIISEEDAVTIVKGIFKEVQNQALLQADRNKVYAILAGLLDKHYEGIKIMDADFVFGYIQVMDGEKDPRNLLLALKIAKFIVQNFNI DLFLDDFFEIISCYFPIDFTPPPNLPSNENVTKEDLIIVLRESLTSTRKFAGISCAKIYTATDFQEYLQPIWTAIRQEVFLSMDDQVQELSLEALKHVVV TISSNSLQQPDQDPLNDFINMIVTETQQYLQDPELKLANPCGNVLNAVASASDRSCYSILTPIIPRLVNLYSTDKTVIFRCKVLDILIKLLNAAANCQLS EQFIAPMDWHEIVKLLQLAMDTSEEDIRLRVTASFSILIQIKDALPADEIERISNDILKRALEDPSSIVRHGSISTLATIASVLPDVIITTVIPYIRTSV TNLQLLLQCLANVKNRIENCLYLYHYLFDDILWLCVYNSLEESINSFEFKTIKIIASIGQLIYLNLDESSQKKFIDNLLELFMNGQVSVLKPMTVIDELP LKQFYPLNVASSQRQVQLIEILCKILGAIKFRDGILSPNDMITNLLDISCKSVHQPSATSAAQLLSSIINKMEEGDQLENYIKSITNTICNVLYSKNVET EMKNAVNTWIWMFAILCRYSCSLFHYSNFDIKTSFDASFQLMKALIMRSHPYSNEALIQVLKFFKLPNVGHVASAGFKIIIGDEENILCESTNAIVKFMY KNRFFMMASEKLMENYRIASKGIKHHYLTALSHLLNGVPKQMLLNHLQMLMPLLVESVSCDEESLRLSSLQTLRPLITEAPDIISNYVASMLPELLKLCN FPSSMKIRISALQSVNDLASLPIHLVVPYKSKVINELGNTVNDKKRLVFTVINPKKQQ* >MMS19_sacCer Saccharomyces cerevisiae budding yeast P40469 MET18 1032 aa 13 HEAT 29% MTPDELNSAVVTFMANLNIDDSKANETASTVTDSIVHRSIKLLEVVVALKDYFLSENEVERKKALTCLTTILAKTPKDHLSKNECSVIFQFYQSKLDDQA LAKEVLEGFAALAPMKYVSINEIAQLLRLLLDNYQQGQHLASTRLWPFKILRKIFDRFFVNGSSTEQVKRINDLFIETFLHVANGEKDPRNLLLSFALNK SITSSLQNVENFKEDLFDVLFCYFPITFKPPKHDPYKISNQDLKTALRSAITATPLFAEDAYSNLLDKLTASSPVVKNDTLLTLLECVRKFGGSSILENW TLLWNALKFEIMQNSEGNENTLLNPYNKDQQSDDVGQYTNYDACLKIINLMALQLYNFDKVSFEKFFTHVLDELKPNFKYEKDLKQTCQILSAIGSGNVE IFNKVISSTFPLFLINTSEVAKLKLLIMNFSFFVDSYIDLFGRTSKESLGTPVPNNKMAEYKDEIIMILSMALTRSSKAEVTIRTLSVIQFTKMIKMKGF LTPEEVSLIIQYFTEEILTDNNKNIYYACLEGLKTISEIYEDLVFEISLKKLLDLLPDCFEEKIRVNDEENIHIETILKIILDFTTSRHILVKESITFLA TKLNRVAKISKSREYCFLLISTIYSLFNNNNQNENVLNEEDALALKNAIEPKLFEIITQESAIVSDNYNLTLLSNVLFFTNLKIPQAAHQEELDRYNELF ISEGKIRILDTPNVLAISYAKILSALNKNCQFPQKFTVLFGTVQLLKKHAPRMTETEKLGYLELLLVLSNKFVSEKDVIGLFDWKDLSVINLEVMVWLTK GLIMQNSLESSEIAKKFIDLLSNEEIGSLVSKLFEVFVMDISSLKKFKGISWNNNVKILYKQKFFGDIFQTLVSNYKNTVDMTIKCNYLTALSLVLKHTP 
SQSVGPFINDLFPLLLQALDMPDPEVRVSALETLKDTTDKHHTLITEHVSTIVPLLLSLSLPHKYNSVSVRLIALQLLEMITTVVPLNYCLSYQDDVLSA LIPVLSDKKRIIRKQCVDTRQVYYELGQIPFE* >MMS19_schPom Schizosaccharomyces pombe fission yeast Q9UTR1 1018 aa 14 HEAT 25% MSSNLVALYLFSIDRSQDEANDVVDRIVEEIVTDRMGIVDLVTSIGEYLTDNNISVRAKAVLLLSQTLGELPKDRLPAKHVSVLLQFYLSRLDDEVTMKE NALGIGALLNMQNFPAQKIVDVCKALFSSTDMPKYAQATRLNILKVFETIIDNYLFFISSQTRDAFFSGICSTFAGEKDPRNLMLVFSMLKKILSTFPID GFEQQFFDITYCYFPITFRAPPDATNLAITSDDLKIALRETLVANDAFSKLLLPALFERLKASTVRIKIDALNIYIEACKTWRVGAYLWSAKDFWESIKQ EILNSTDAELQNLALGALNTLASKFYKEEGFSSSFTEFVDMILIQLSQRLLEDVNVKSCGSCAAVFASLASISVETFNYCSCNFLPSVLDLPMVNEPLEK QKGMLVFLEYVYKCLVLLYGKWRSKNQADIDNPLLVYKDKQLSFVSGSLMGTAKDETEIRMLALKVIFLMASIKNFLTESELTMVLQFLDDIAFDFSDPI KKKATECLKDLGLLKPDFLLLTSFPFAFSKLTDDVTAKSSSEETFKQYLSVLVSISEERSLFKALVIRLVEMLKDQFKSKEMSVDLVESIVQSLSVAFKE RNDRNEQEIPFFFEELLKQLFTLCFANCESMNVRCLIYVSQTINEIVRVNHFEFQEKFVGQLWKLYMENSNSDLIETEGCEKAAERFTLAASLSDQKFLN LVVLLQGGLNGLSKKLHFIEKLNIELLNLLINVVFVTESPGVKISALRLISSLINKCEKDEDISSFISSKGVTSLWDKVYTGTPKESEAALDVLAWVDKA LVSRKHSEGIPLAFKLLDTLNLQNVGDSSVKALSIIIKDDPALSKENSYVEKLLYKQRFYASVSPKILEHISTATGGEKSLYLMLLSNVIGNVPKEIVIP DMPSILPLLLQCLSLSDISVKLSTLNVIHTSVKELTSLLTEYLDTLIPSLLAIPKDMNNPTVVRLLALKCLGSLPEFTPTTNLQLFRDKVIRGLIPCLDD PKRVVRTEASRTRHKWYI* >MMS19_araTha Arabidopsis thaliana plant NM_124186 1134 aa x HEAT 28% armadillo/beta-catenin-like MMVEPNQLVQHLETFVDTNRSSSQQDDSLKAIASSLENDSLSITQLVREMEMYLTTTDNLVRARGILLLAEILDCLKAKPLNDTIVHTLVGFFSEKLADW RAMCGALVGCLALLKRKDVAGVVTDIDVQAMAKSMIQNVQVQALALHERKLAFELLECLLQQHSEAILTMGDLLVYAMCEAIDGEKDPQCLMIVFHLVEL LAPLFPSPSGPLASDASDLFEVIGCYFPLHFTHTKDDEANIRREDLSRGLLLAISSTPFFEPYAIPLLLEKLSSSLPVAKVDSLKCLKDCALKYGVDRMK KHYGALWSALKDTFYSSTGTHLSFAIESLTSPGFEMNEIHRDAVSLLQRLVKQDISFLGFVVDDTRINTVFDTIYRYPQYKEMPDPSKLEVLVISQILSV SAKASVQSCNIIFEAIFFRLMNTLGIVEKTSTGDVVQNGNSTVSTRLYHGGLHLCIELLAASKDLILGFEECSPTSGCANSGCSMVKSFSVPLIQVFTSA VCRSNDDSVVDVYLGVKGLLTMGMFRGGSSPVSRTEFENILVTLTSIITAKSGKTVVWELALKALVCIGSFIDRYHESDKAMSYMSIVVDNLVSLACSSH 
CGLPYQMILEATSEVCSTGPKYVEKMVQGLEEAFCSSLSDFYVNGNFESIDNCSQLLKCLTNKLLPRVAEIDGLEQLLVHFAISMWKQIEFCGVFSCDFN GREFVEAAMTTMRQVVGIALVDSQNSIIQKAYSVVSSCTLPAMESIPLTFVALEGLQRDLSSRDELILSLFASVIIAASPSASIPDAKSLIHLLLVTLLK GYIPAAQALGSMVNKLGSGSGGTNTSRDCSLEEACAIIFHADFASGKKISSNGSAKIIVGSETTMSKICLGYCGSLDLQTRAITGLAWIGKGLLMRGNER VNEIALVLVECLKSNNCSGHALHPSAMKHAADAFSIIMSDSEVCLNRKFHAVIRPLYKQRCFSTIVPILESLIMNSQTSLSRTMLHVALAHVISNVPVTV ILDNTKKLQPLILEGLSVLSLDSVEKETLFSLLLVLSGTLTDTKGQQSASDNAHIIIECLIKLTSYPHLMVVRETSIQCLVALLELPHRRIYPFRREVLQ AIEKSLDDPKRKVREEAIRCRQAWASITSGSNIF* >MMS19_dicDis Dictyostelium discoideum slime mold Q54J88 1115 aa 18 HEAT 31% MTSNITELNKWIEGYVNPQSEESVKTNAINMVLLYMKSNKIDLQDVVQGLGDYLKSNDSILRARGTLLLSEVLCRLPDLPLNQDQVHFLAMFYCDRLQDY ACSSEVVKGITGLITNHTPDYPDNQKLLRNIFSEVHPTSLTQAHRKMVLQVIDIMFNKCLSEIQELKNDFMVGYLQFIDNEKDPRNLIFSFKLLPKVIYN IPEHKHFLESLFEIISCYFPISFNPKGNDPNSITKDDLSNSLLNCFSCTPLLAEHSIPFLIDKICSNLIETKIEALQTLVYCCDRYGGFAVQPFLEEIWS TLRTLILTHKNTTVIEESKKTIFYLTRSFTKERKVLESFLSIMIKECLHHIKSSQDSKIAIYCASILYQSVSASLLSSKIILIHIFPNLFNFLSELQKQD TVQKVNEQNSVIALFNDLLKANSIAFEMYSNENKEPNPLEPFVDQLFKLFSDLLLLNSSSSIRSNSIECLSNLYISKKVHTTEQDDDDSEQITNEFLLDL EKRQFIIKSLVSLLNSSDNTLRHKSLDSLFTIASNEDPSVLNLYVIPTLLQMINHSSCNINTTNNKINNNNNNNNIVIKNNKCQDEHCNEDHSNKNENNN NSNENSNGNSTSGSDDDLKHYLEAFTKLCTHQPLLESVIPQIQVLLQHNIKETYQSNEDFEKSILILQSISFILEKSTNIKSMTICSKSILFPLIKGLYK QELISSSNDNNNNNNNNSNRFNQILTPTLKMIHSIFENISIESQKPLLEKLIKLFLNGDTLVINYQLPTTTTTIIKPFEKSSPYKYLIPIFTTIISQSKL DLSENNELKQSLYQMSLDVNVDDSIAISCSKAYSSIINKQQQQQQQDQINFNFFNDNLLKVINDTTTPLPLKIRHLDLFTWCTKALLTNGNSINIKLGSC LADIISNENVELSYHASKSFGILLSETDVLNEKSGSIIKILFEQKFFTLMFPILLESFKVSKNKELQTISSHYLIAISNLLKHVPKEILLAELNEILPIV MQSLKSSDNNDQVQLLDSSLQTLTMLINETPSSFISYLDSLIPSLIKISTKSTKYNLKRSALEILTLLSKSIPFVNLFPYKTQVVTDIIPCLDDKKRIVR REAQKCRNSWYILQK* >MMS19_pytUlt Pythium ultimum stramenopiles ADOS01001616 957 aa 4 ARM units 31% MFSLDAPLAPAIDAFVNPENDDNVHKTSLNTVVMQVHRKVSMEALIQALGLHLTSTDDKVRARAMQLLAEVLSRLPELPLTPNAVQLLVDFFADRLADYP 
SASACLQALLALESNHAKKIASPTVTIILIQKMVKVLHVPQLGQAMRKQCFELMQLALGQKVVVDVLVTAPESSSIDHGLLFAQAFLNAMEGEKDPRNLL LCLQIARELLAKLEVVFDRHDAVLQQYFDVVSCYFPITFTPPPNDPYGITSEELILSLRKAFAASDLLAKHVLPFLLEKLSSTVVEAKLDSLQTLVFCCE AYSINVALLHMLSIANALYHEVVKGEKKEVIEASLRAISRFSSVIGLAKTKAAGGAAYAWNKFVVELTTRAMSDLTGHATDSLVSVSAGQVLAALGKDSV LGFTHVLETSVPLLIQQFNESSTSTESKCEASLARLLLIVNTIDREVDQSASAQPMRPHALVLIDALVAFLSNNEALSTPTAKCSAIEALSHLVTYPPSP IVEIAQVKALVELFINFLLFDASPEVRRECLQSLRAISTIKQKATVKNYASLVMEIALTQLMDAVQLSAQNTKVAAVLASSGRDHPEFFNDVLDSITQLS QEASLFQATIVRLVDFCVVENQDSNKITFVANSSANGTQAHVDGILNAVAKIVELNADDKASMEFCVTSGGDNSIVFRLLKAVTTTAADAAAQNALLDDA KLASCARIFRTPMQNVSTETQQLLANAAISAFLTTQSTGASASHPAYLQLVPLFSAVINSANRNLNLPETSRVINTLLELAQSSTAVYHTTASTQQIEQI SSEAALSAAKSLASIVNKMSDGEEFDALIVLLLDQKLSQIIANEQKDVSVRVAALQIYVWIAKALVIRGHREHAPACLFFLCKFLTPETSDARSQIAMHV AKSFKLLVTEFPDVLNRKCGAFITVRQHKKKYAGILGNADLTFYFVCVVPVPSANV* >MMS19_sapPar Saprolegnia parasitica stramenopiles ADCG01000470 804 aa 1 large ARM repeat 31% MFSLDAPLQPAIDGFVDPENGEQQHTTHLNNVVMSVHRKTPIEQVIQGLGAHLTHVQDKRRARATLLLAEVLTRLPDLRLSSDTAHLLLTFFLERLKDGP SMAACLKALVALISLHAALLPANDAWTVCATCHAWCERAVVETLLNLPTPIASLSQSMRKQSFELLQLIVRRGALGDHEGRVLMDGFLRAMSGEKDPRNL LFCLRFAAELLTTYANVVDADVAKGFFDATSCYFPITFRPPPNDPYGITSEDLVLALRSVFVGHDSLAKHVLPMVLDKLSRTTVVEMTKDILETLAFCCA KYPLNRLLLHFTPVAAAVYHHVLHGDNTAVIAVAIDALKTITRAVSPPSKLPGMQALAWNKCIVYLVNQAVEDLAHQAPDSMVSTGAGHVLCAIASVGVA GFSHVLSSALALLLEQCAAQAGSPAEAATARLVQLLGCIDAEVDHSAPPLVPYVSAIQTTLVHGLETATSSRQQKLCLQGLRCLVLRPPSPLLDDASLEV LLQGWTSTVLSNPFPDVRDEATSTLQAIALKSPGLAQIVLTRCVPSFLQVLEQPAVLFFASWCGDMDDGLGQCSVWAVDRGHGARHPRGPHAALARPRHL SAPPAAVPDQLDAPVCDRDDGGRGRHCPRQQGLGRVHGLRHPRHSHFIVGALPPRRRARRARPDAVDRRPSDRAGDGQHERACHAVRVPPGANNAAHVDL AAGLDVGHFALAWPRPATATIERSTLLVRRGVDGARTRRAAAPLAAAVYARRAEQRRRREHSQGACGALQRPPRGHAAVRCVPKVDHVARCPPRRHGVAG GRHS* >MMS19_polPal Polysphondylium pallidum amoeba EFA86574 994 aa x HEAT 28% MSKANIDSYININNNDQTKQTSLNILLLEINANKLSIHQLVEYLGDYLQNTDSILRARGTLLLSEVLCRLPDLKLNEAQVEFLAAFYHDRLQDYACASEV 
VKGVYGLCVNHKVPYPHNQKMIRAIFQEVHPSTLVQTHRKMVLQLIEHLLEHNLTEIQELKGDFMAGFLQFIDGEKDPRNIIYTFRLIPRVILYIPEYKN FADSLFEILSCYFPISFNPKPGDPNSITKDDLVSSLLNCFGASTYFAEHCIPFLIDKICSNVVDTKIESLKTLLFCCSKYGPVALRPHLDDIWGTLRTQI LTQKSATVIDESKKTMFYLTRVLAADQETLQSFLSMVDKECLHHIKTSQDSKLAVSCASILFQTVSASVKSSRIVLSHILPTIIDFFKELSLHLSDDPIH KANEQLSIIGLFNDLLKANNISFQYNNENIDKEINPLEEYKDKLYDLFIGLLSNSSALVRTLAVDCLANLYVTRHIKTSVPITFVLDQEKRQSIIKDLGV WLLIQIFRNKSLEALMSITKLEQVEQMNLFAIPTLLQMINANQSKNVSESKHYLEAFSQLCTHQPLLQSVIPQIKTLLEHSIKKKYINNDEFENSLLVLQ SLENTFSNSIDEQTMTICYREILLPLVKELFEQVFSLDVNSQEQKDQVLGIMKPAISMIHSNNKKEAIELFINIYLNGDLSALQINKEFKPFSSDATEQA KLLIPIFTSVISQSKFELSTNKLLKEMLMSRALDSNVEESISNACAICYGSIINKQTDQTDLPLDHLEQLISSSSTNKTQALNLLIWIEKGLVTNGNPQS IKVGELLAQLITSENTEISQKAAKSFYILLSDHDTFDHKSGAIVKRQKNETVSSQFLVAITNLLRNVPKEVLLGELQEVLPIVLHSLHSNQRDLLNSSLQ TLMMLVDEASTSISSHLDSLIPTLIKISVNGESLTFRQSSLEILTRISRAIPYPKIYPFRNQVINGIVPALDDKKRLVRREATKCRNSWFILQ* >MMS19_entHis Entamoeba histolytica amoeba XP_651925 868 aa x HEAT 25% MSTPAQQLNEFIESPKVIKEGYEIIDQLMKNNYNVNSLVTDLGDTLPSEDERIRFRATSLLTYCLIKYPIKEESKDVFVDYLASRLVDAVCLEPILTALL QLVTKKPSDEIINEIAMAYSCMRTQLYTKEVRILVYQFYKVFINYYQATEVIDTCVQLIELERDPECLKEVFDLIKLVSQKNEIDADSAPLLFDCASAYF PILYPPKGDEALRIDLTNKILDAFVSAPIYAQFALPFLLDKLDADLSSIKLEALKAIYFCIQRFELKYVYAYFTHIWESIEQNISTVGVVEVNEFAFAIA SYFCSLDDFHSKNLMESIKMFCLRMMSETDEIIINFVNGLLEELTKKSEKFFKVFVPVFIQCFHDQLQDADDRPKEQFERELFIVRLIYQRIIEGMPLLD CVKGQVAWDLHRLATPLHPCFVSLLDIDVSLALLNLLGEQRMVPFQNAIELSENKCHDAIPILQRLYEKEEDVMISLLPANKIITNLELVSGIALHSPKL FEQLLKLIPTLQSNEYVPVFQSILSDALPFNCLDVYVNHCIPVFIVITNGVLSPLFNTLMNRLSRLHSILSSKKISELTEGVLSQLKEHSRLLLILPSLL QFYQPENLIVYLNEVQEVDKDTIAIYSLLISKLTNIIPHVLEQNKEYFNGYKTIQELDSHESNKQATPIFIEELCRMNNKAIECLKEMIVFDSINKKNEL HWKEELFNLVYERFIESHQVTTVEESHIMILLFSLLPTEKLLTYESTVLKIFNIICVPTSHLNEIDSVVVLLFNILPTVSQYPMSLIESELDSIITKLFN VLYINGTTIKYRCDIIDLLTRIRVVYGIDAIRPYQKNVIKKLLVPLDDNKRLVRRSAAICRNIWETTA* >MMS19_naeGru Naegleria gruberi early eukaryote: heterolobosea XP_002678884 1070 aa x HEAT 27% 
MQTSSNSNGEQELISLIDSLCNPTLPNTNKESLKSKLIEFVVNGTLTINEGIKLLGEYLNHATDDRIRGAAYAVLDLILENIPNNVGSETDETKQTTQLK LVASLLRFIGDRFYDFDCLATLLPCLFSLFKKWSSYISSEQAINVVLQFFENVNIQSIGSTSGVAHATKTRSLCFEWFSLLADRFPSIVRTIDFLNGFIQ SLEGERDPGNLLYCFNLIPKVIAIFDDSELSSKILSAVSDDLFDITSCYFPITYTPPANDTRGITREDLSRSLKLCFGCNKFFAPTLFPFLLEKLSSDLV DTKLETLDYLCYCIEKFGEVNSREYLTEIWSYIKAESVKTNSMDVMKKCYESITKIARIVIIPNDPSNKPFDIPNIEAILRTALLELKSKEPKFAAQYAR MIYACAVPTFEISMMVFNRVMPELVATLSESDTKDKLYGSLLMITQLLQAVAEQKGENQLPEVVFNLISQVQTVFLSIYEEEFSKNDKEMILVMVETISR IAIFRIPSTLLRDIYVSRILLKSYGEENKSFSLLHATSLEEYKERVIKDIAWIYKYAPDIVSEDILVPLFGALYGCENKSEHINRILSNISAVGKVCPSM TPSITHRLFERIESIPISESHYEHERVKVFETITSLDVSLIPAHDKVSYIQRIVKMSVTDSSSQMVDDSDTMDCSDSECAHVHHNQGNFSFLTLLLGRSL ENELQQVVLDSVLQYANSVPSTGLKNFISVLSAIVIACRPTVGMGNLITMTDSLLQMALKGEQPSQVTKCIAQLVGSVLNKLPLDSTEFQQLITICNATV FDAFSQMLTVYNGDSESAERYIEMVSWILKGLVMRGAYVPHADRYSSLLCGSLVFEYNSSKVNKKVAEGFLIAIGEDETSIHKENHAIIQVLYKQRFFAT NVRKLMDSINTVTQPHIIGSILLALSNLIHNVPTKVILSEVKNIFPIVLKFLEMRQILIEQDNNSEDLLYAAIKTTLTLLSDAKEEMSVHLSSIVPILLD TCKFKKSQAVRILSVEALLELTNGYKYYEIYPLKKDIIKGLEACLDDKKRKVRKAAVKCRNSYFVLSNNQ* >MMS19_phyInf Phytophthora infestans early eukaryote: stramenopiles 1114 aa x HEAT 33% MVSYEQLGSLPQKGSQNPVVNQKLEAIAMFSLDAPLAPAIDAFVDPENDDAAQKTGLNTVVMQVHRHVSMEALIQALGAYLTNGDDKVRSRATLLLAEVL TRLPELQLTPSAVQLLMTFFADRLADFPSASACLRALLALETHHAAQVQSPRTTVALIPKLGKTLHIPQLGQAMRKMCFDLMQLALMQSTVVELLLDSVP ASKDAQDASVDDAEQSEDLGRQLAQTFLSAMEGEKDPRNLLLCMQVARTLLSKLEPVFSRSDTLLQQYFDVVSCYFPIIFTPPPNDPYGITSEGLILSLR HAFAASDLLAPLVLPFLLKKLASTVVEAKLDAIQTLVFCGERYSVNALLLQMHAVATALYDEVLDGEKQEVIAEARQAISRFSGVVARAKAQDTPGAAYA WSKFVVDMTARAAGELRENAADSMVSVSAGQVLAALGRESSMSFAHVLKIAVPLLVEQLNNESSGSDSVPSKCEAALARILLLIDTIDREIDQSGQGQPM RPHAAALIDALVNFLSSDHDNQTKPGSSPTARCVAVEALCHLLTFPPSPIVAPAQVKALINLFTRMLLLDPVAEVRTACLQSLKEISTVSTASEGSTNSG EHPVTGGYAAFVVEISLARLMAAVSEGSDQEDDDDEEGTGVAAVLTASNRNFDSFFEEALLAITELCRESSIFQATIFLLIDLCVEKGDGKQSAIGFCEA EGDATRQRHVDCILDAVAKIVEINAGDRTSMEFCVKASSSASIIFRLLTAVETLAARATASSGYKSGLVDEVKLSACVRIFRAVMQNVSSATQQQLVDAV 
VPAFLRTNTSEPASLQFVPLFAAVINSAARDVALPDSSLVINRLLELAQSGATAVSESPPRQLQLVYTDAALSAAKSLASIVNKMSDGAEFDALIDLLLS RKLAVVISNSAESFTVRVAALQIYAWIAKALVIRGHKVHAPVCLRFLCSFLTPDGDVNMEQEGDDQHAAALRMEVAKTFKLLVSEYLDVLNRKCGAFITF LYRQRLFDLVFPVLLEYIRARIDEESSVAALVAFAQVIAHSPKAIYLPHLAQIFPLMVQALNTDDRELGSAAIQTFKPLLLESVESAKPFLKDVFPGLLK QAQFGYVVSCSDS* >MMS19_albLai Albugo laibachii early eukaryote: stramenopiles 1077 aa x HEAT 27% MFQLDAPLSPAIKKFIDSGASNDEETGQKTSLNAVVMHTHRIGSIETLIQELEPYLTDDCNDFARARATLLIAEVLTRLPDLPLSGNRIQVLNNFFCARL DDPPSIPASFQALLALQKHHSTEIPDSENMELVIRISDTLHVPQLNQPMRKRYYELVYLVIQQERMQKALSRSQQAQVFIRSFLNAMTGEKDPRNLKHCF QIAQTMMQKLEMVFQEAELSEQYFRVISCYFPITFTPPPNDPYGVTTEELIRSLRNVLTASDVLIHQMVPFLLEKLSGSMSEEAKVDALDTLGHCVETFS LKNLLLHIRSIGQVFYHEILNGERARVIETASNVLSRVSSVIGRAKVQGSSGSGFAWNAVVVTITNQAVEKLHENSVDSMSSASAGKVLASMSRESLVVS THVLNTSMPLLIEQVKHSFEASSSQCEAALDRLMLFVDTIDEEVEQISTIHPIHSHASPILEALVKFLEEDTPTSTPNAKRLSIRIISHLVIYPSTPVVR PSDVERIVRLFTRGFLSDASKHVRSEFLSSLKALSGAIKTPSTLQSVHCKREKTLQLYGTLLKEHCIAQLLALVQDGKSPEAETFQKSSCRTRKDFEQDT LAAITELSHDPVIFKEAVVHLLQSCFIDQDGLLIFRSFEVEHTLQFFQAVATIIELNASNASNMEFCASIDDQNGIAFKLLDAFVSMAMSNGQSKEQKFL PPNAIAFSTRILRTIMQNICFDTQQKLLDRAISRFHPILQTEESTPSQHLYQIVSAFSTVINSANRSLAFPKAYCVIDSLMAVSRSITTESHGYTNEIVL LISQSIGSILNKVRDKHFEAKVESLLTGLSQSIHNDQEQAQWHTSIEVYIWITKGLLLCGHPKYSSQSVAFLTQLLIHHSDKGVRGQVAEGVRVILTEFP NVLNRKCGASCNMLFRQRLFELVGPNLLAFISKHSEETTEALTGFCYIVAFSPKAAFISLISTIMPLVLRGLSSDHVELGAAAIKAYKIVSDTSIEHVKP FLKDVFHGLLQQAQHSANALDRKDALECIGMLTTLPYELIHSYKDRVLRQLLFCLDDRKRFVRYTAVRVRNKWSVL*
>CIAO1_homSap Homo sapiens O76071 length=339 3FM0_A PDB: 3FM0 MKDSLVLLGRVPAHPDSRCWFLAWNPAGTLLASCGGDRRIRIWGTEGDSWICKSVLSEGHQRTVRKVAWSPCGNYLASASFDATTCIWKKNQDDFECVTT LEGHENEVKSVAWAPSGNLLATCSRDKSVWVWEVDEEDEYECVSVLNSHTQDVKHVVWHPSQELLASASYDDTVKLYREEEDDWVCCATLEGHESTVWSL AFDPSGQRLASCSDDRTVRIWRQYLPGNEQGVACSGSDPSWKCICTLSGFHSRTIYDIAWCQLTGALATACGDDAIRVFQEDPNSDPQQPTFSLTAHLHQ AHSQDVNCVAWNPKEPGLLASCSDDGEVAFWKYQRPEGL* >CIAO1_droMel Drosophila melanogaster Q7K1Y4 MGRLILEHTLQGHKGRIWGVAWHPKGNVFASCGEDKAIRIWSLTGNTWSTKTILSDGHKRTIREIRWSPCGQYLASASFDATTAIWSKSSGEFECNATLE GHENEVKSVSWSRSGGLLATCSRDKSVWIWEVAGDDEFECAAVLNPHTQDVKRVVWHPTKDILASASYDNTIKMFAEEPIDNDWDCTATLTSHTSTVWGI DFDADGERLVSCSDDTTIKIWRAYHPGNTAGVATPDQQTVWKCVCTVSGQHSRAIYDVSWCKLTGLIATACGDDGIRIFKESSDSKPDEPTFEQITAEEG AHDQDVNSVQWNPVVAGQLISCSDDGTIKIWKVTE* >CIAO1_triAdh Trichoplax adhaerens B3RNR8 MTTVAFLPLSFPFLIFNFGQYIRIWAKNSDSDQWTCKSILTEGHTRTIRSVAWSPCGNYLASCSFDATICIWSKKDGDFECMATLEGHENEVKCVNWSSS GVYLASCSRDKSAWIWEFIEEDEEYECASVLTDHSQDVKHVVWSPKENALVSASYDNTIKIYKEVDDDWECSHTLIGHESTVWSLSFHSSGELFVSCGDD KVLKIWKCLKSGPSDVKWISICTIAGYHNRPIYDVDWSKLNNKIATACGDDAIRIFSIVRITISISNQLLFIAYYQAHNHDVNVVRWHPKVDNILASGSD DNCIKIWKVHSNN* >CIAO1_nemVec Nematostella vectensis A7RWD2 cnidarian MTGHEDRVWSVAWSPNGFVLASCGGDKTIRIWGKEGDKWICKTILEDGHQRTIRSLGWSPCGTFLASASFDATTCIWDQKSGEFECNATLEGHENEVKSV DWSVSGSLLATCGRDKSVWIWEVQEDDEYECASVIHSHTQDVKKVVWHPTKEILASCSYDDTIKLYKEDEDDWSCCDTLEGHESTVWSISFDGSGDRIVS CSDDKTVRIWKSYPPGNQEGVVVSGKHTKWKCVCVLSGYHDRTIYDVHWSKVSGLIATASGDDCIRIFKEDTNSDRNQPSFQLVATQRKAHSMDVNSICW HPKDENILATCSDDGTVKLWRFTPAEE* >CIAO1_sacCer Saccharomyces cerevisiae A6ZYM0 length=330 Cia1 YDR267C PDB: 2HES MASINLIKSLKLYKEKIWSFDFSQGILATGSTDRKIKLVSVKYDDFTLIDVLDETAHKKAIRSVAWRPHTSLLAAGSFDSTVSIWAKEESADRTFEMDLL AIIEGHENEVKGVAWSNDGYYLATCSRDKSVWIWETDESGEEYECISVLQEHSQDVKHVIWHPSEALLASSSYDDTVRIWKDYDDDWECVAVLNGHEGTV WSSDFDKTEGVFRLCSGSDDSTVRVWKYMGDDEDDQQEWVCEAILPDVHKRQVYNVAWGFNGLIASVGADGVLAVYEEVDGEWKVFAKRALCHGVYEINV VKWLELNGKTILATGGDDGIVNFWSLEKAA* >CIAO1_chlRei Chlamydomonas reinhardtii A8IZG4 
MDPFTLEPIGALSGHDDRVWNVAWSPQGDMLASCSGDKTVRIWSRRQPRPSEQWYCSAILDQCHTRTIRSVAWSPTGRALATASFDATVAVWELSSGVWE QVAELEGHENEVKCVAWNPDGRLIATCGRDRSVWIWESMPGREFECVDVKQGHSQDVKAVTWHPSGELLVSAGYDDTIKLWTYDGDEWGCAQTLGGTGTG HESTVWDVCWDPVSRARLASCSDDLTLRLWESRAAPTSTPASAPAGAAAAGFVPSRPDLRCAVTLSGHHRRTVFSLDWAPTGLIATGDGDDSILAEEEAS GLLTQPGGQWGCWARVAKAHGADVNCVRWNPAEPRLLASCSDDGLIRLWWLR* >CIAO1_dicDis Dictyostelium discoideum XP_646229 amoebozoa MTDTTKNDKYNLKLIDSMQKEAPYDKVWNLAWHPNGEILATCANDKYIQIWSKDTNGKWGLVQSLEGHEKTVRRVAWSPCGRFLAGASFDASTSIWEKSK DELEFTHVSSLEGHTYEVKSVAWDSTGTLLATCSRDKSIWIWQMEDDNDFECLSINSGHGQDIKCVLWHPNEELLASSSYDDTIKFWKDIDGDWECINTL TGHESSIWDLAFNKDGDKLVSCGEDKLVLFWKFDKENEKWINIFKFKNENSRPIYSIDWSSLTNTIVTGSADDSIIFYEQESDDTPDKYKIILKKKNAHD SDVNCTKWNPKFKNILASCGDDGFIKIWELQDK* >CIAO1_polPal Polysphondylium pallidum EFA75350 amoebozoa MSLNEISVLSYDQPSKIWNIEWSPDGKLLASCGDDKTIHIWMEESENKWVVLQKLEAHEKTVRRIAWSPDGKYLAAASFDASTSIWEVNNGEFNHISTLE GHSFEVKSVAWDASGQLLATCSRDKSIWIWQMEDDQDFECISINNGHSQDVKCVRWHPSLEILASASYDDTIKMWQDTDGDWECIDTLSAHESTIWDIQF NASGNRLVSCSDDRSVCFWRLDSTTGRWKLLSRLESVHSRPIFSVDWSHNQELSPTEQLICTGGGDDSIIIYHQKQQQQQQQSDSSSSSSTTPNEIEQYE ILYKHEKAHKSDINSIRWNPKKPNILASSGDDSTIKIWSFVC* >CIAO1_tetThe Tetrahymena thermophila XP_001017221 alveolata MIEEKMEEQKEFVKCIGQLNGHTDKIWSVSWHPTLDIFATCSSDKTIKIWGLKENSENQYELKQTISDTHERTIRTLAFSPDGMMLACGSFDSTISIYAL NNGSFEFVSKLEGHEHEVKCVAWDSEGKFLASCSRDKTVWVWDYENGFDFSCYSVIDAHTQDVKHVKWIPGTNNLASTSFDDKLKLWEQEDDDWKCSATY SNHSATVWCVEFSKTGQYMASCGDDKQIKVYKKNENGAFSSPYIVETTIKNAHARTIYSLSFSEDATFLASVGADNTLNVYQKNMYVTTFEGQDNNLYEL LEKKVNCHFADINCVAFHPSKDILVTVSDDRQIKLWSVEINL* >CIAO1_phySoj Phytophthora sojae EGZ17716 stramenopiles fragment LGVRGHPRGRAGPHHPRLRYRARCRSPDGRYLASVSFDATTVIWEKQGSSYEVISSLEGHESEVKSVAWSPSGSYLATCSRDKSVWIWEADADTDFECIS VLHGHTQDVKFVAWHPTEDLLVSASYDDTVRIWAENDDDWYCKETLAGHTSTVWGVALNPQGTQMASVSDDTDVVVWQRDVNSKEVNEDGSPKEWKQAFT VSGCHERTIFSVDWSKHGDLLVTGAADNAIRVFQGQPTDSPSSFELAVQQKEAHASDINCVRWSPQLQEDGGKKALLLASASDDGLVRIWKLQLP* >CIAO1_thaPse Thalassiosira pseudonana XP_002294332 stramenopiles 
fragment EWKLIATIREGHSRTIRSVAFAPTSSTLGVPILASASFDGKVLIWEHFADEENHGTFEPIAQLEGHENEIKHLAWNQTGSLLASCGRDKTIWIWECFLPG TVGGSASGGGGDDEGEFECLAVLQGHEGDVKSIAFALSHGQWGEGDEILLSASYDNSIKVWAEESGDWYCAATLAVHTSTVWCLGINPGGVRFLSGSEDG SMAIWKMYTATERKRLFPREHAVSSTDGLWNSGHGRIASGGGDNCIQIYREETGGSGAGSSSDAPKFAIEAMAINAHDGDVNCVKWYPRDGTSLVSCGDD GAVRIWKYSQAG* >CIAO1_naeGru Naegleria gruberi XP_002680935 heterolobosea MTTNDDLAALTTQEIISAVEGNVSDHEESVWSIAWHPKYSNLLATCSSDKTVRLYYVRVLSPSGRLFAKCIDVLENQHNRTIRRVDWSLPSGNALACASF DGTSSIWILLQNKLQNHLQALEEESQNSKESSPTTSANLGLLKCVSTLEGHENEVKSVAWNYKSASLMDQSDDHDGEDGDCGLLATCGRDKTVWVWEAID KVGFSDFDCNSVCSGHTQDVKFVAWHPLTRSNMPSLLYSASYDNTIRIWKEGGFEEDHDRQSDEWKCVGILRGHTSTVWGLAFEPQLSSDDPEYPQYMVS VGDDKSLILWREDVVGNYIDMNVTQVQTISDVHTRTIYYVDWCVYKHPSTGQSISLVATAGGDNTIAIYQFDTTTRQLKLLTKIANAHDSDINCCIWNKN EFGLLSSCSDDGAVKFWKLKM*
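The records above are flattened: several FASTA entries share a line, sequences are broken into space-separated chunks, and some entries wrap mid-sequence. Below is a minimal Python sketch for re-splitting them and checking each residue count against the "NNNN aa" or "length=NNNN" stated in its header. The helper names (`parse_flattened_fasta`, `check_lengths`) are hypothetical, and the split rests on an assumption: the sequence begins at the first all-capital token of 20 or more characters, so short capitalized words such as HEAT or ARM stay in the description. The percentages in the headers are not recomputed.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Assumption: the first all-caps token (residues, optional "*" stop) at least
# this long starts the sequence; shorter words like "HEAT" or "ARM" do not.
MIN_FIRST_CHUNK = 20
RESIDUE_TOKEN = re.compile(r"^[A-Z*]+$")


@dataclass
class Record:
    name: str          # e.g. "MMS19_homSap"
    description: str   # free text after the name
    sequence: str      # residues with whitespace and trailing "*" removed

    @property
    def stated_length(self) -> Optional[int]:
        """Length written in the header as 'NNNN aa' or 'length=NNNN', if any."""
        m = re.search(r"\b(\d+)\s*aa\b", self.description) or re.search(
            r"length=(\d+)", self.description
        )
        return int(m.group(1)) if m else None


def parse_flattened_fasta(text: str) -> List[Record]:
    """Split text where several '>' records share a line and sequences are chunked."""
    records = []
    for block in text.split(">")[1:]:  # text before the first '>' is ignored
        tokens = block.split()  # spaces and line breaks are treated alike
        if not tokens:
            continue
        desc: List[str] = []
        seq: List[str] = []
        for tok in tokens[1:]:
            starts_seq = RESIDUE_TOKEN.match(tok) and len(tok) >= MIN_FIRST_CHUNK
            if not seq and not starts_seq:
                desc.append(tok)
            else:
                seq.append(tok)  # once the sequence starts, every token belongs to it
        records.append(Record(tokens[0], " ".join(desc), "".join(seq).rstrip("*")))
    return records


def check_lengths(records: List[Record]) -> List[Tuple[str, int, Optional[int], bool]]:
    """Return (name, counted length, stated length, agrees?) for each record."""
    return [
        (r.name, len(r.sequence), r.stated_length, r.stated_length == len(r.sequence))
        for r in records
    ]


if __name__ == "__main__":
    # Toy input in the same flattened layout (not real proteins).
    demo = (
        ">TOY_a Toy protein 25 aa x HEAT 10% MKTAYIAKQRQISFVKSHFSRQLEE* "
        ">TOY_b Toy protein length=24 MKTAYIAKQRQISFVKSHFSRQLE*"
    )
    for row in check_lengths(parse_flattened_fasta(demo)):
        print(*row, sep="\t")
```

To check the records on this page, pass the text from the first ">" onward to `parse_flattened_fasta`. A False in the last column flags either a header length that disagrees with the residue count or a header that states no length at all (as for several CIAO1 entries), so the stated-length column should be read alongside it.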