PRDM9: meiosis and recombination

From genomewiki

Introduction

PRDM9 is a gene on human chromosome 5 with a very peculiar history. Its primary function -- after many false starts -- has only recently become clear: it scans the genome with its terminal zinc finger array to locate recombination hotspots, marks them with its histone methyltransferase domain, and uses its transcription factor domain to direct additional proteins there to initiate the double-strand breaks needed for meiosis. Some level of recombination between homologous chromosomes is essential for their proper alignment and segregation into daughter cells, as well as for bringing favorable alleles onto the same haplotype during adaptive evolution.

Such a mission-critical protein is typically highly conserved. However, that is not the case here at all. Indeed, it proves exceedingly difficult to find a comprehensive set of PRDM9 orthologs even among the 39 placental mammal genomes sequenced as of 15 July 2011, with immense confusion in the literature over paralogs, lost copies, pseudogenes, and composite-domain proteins of only very distant homology. PRDM9 and its parent gene PRDM7 have no full-length counterpart in marsupials, monotremes, birds or earlier-diverging vertebrates -- perhaps unusual in the context of the whole proteome, but not for zinc finger proteins.

Rapid evolution of this gene subfamily occurs at the amino acid level, especially in the number of zinc fingers and in the four residues of each finger responsible for recognizing a specific DNA trinucleotide. This is no coincidence given the role in meiosis: recombination tends to destroy its own hotspots through biased gene conversion. Since recombination is essential, new hotspots must emerge. The race is then on for PRDM7, or the PRDM9 copies spun off from it, to evolve rapidly and define new sites of histone marking.
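To make the "four residues" concrete, here is a minimal Python sketch that extracts the residues at helix positions -1, 2, 3 and 6 of each finger, the positions conventionally credited with base contacts in C2H2 zinc fingers. It assumes canonical C-X2-4-C-X12-H-X3-5-H spacing and the usual helix numbering (which skips 0 and places the first zinc-binding His at position 7); fingers with irregular spacing are silently skipped. The name `contact_residues` is hypothetical, and the example fingers are taken from the zebrafish array listed later on this page.

```python
import re

# Canonical C2H2 spacing: C-X(2-4)-C-X12-H-X(3-5)-H.
# Group 1 captures the 12 residues between the second Cys and the first His.
C2H2 = re.compile(r"C.{2,4}C(.{12})H.{3,5}H")

# Index within group 1 of helix positions -1, 2, 3 and 6.
# The first His is helix position 7, so these sit 7, 5, 4 and 1 residues before it.
CONTACT_INDEX = {-1: 5, 2: 7, 3: 8, 6: 11}


def contact_residues(protein: str) -> list[str]:
    """Return the -1/2/3/6 residues of each canonical C2H2 finger, N- to C-terminal."""
    codes = []
    for match in C2H2.finditer(protein.upper()):
        linker_helix = match.group(1)
        codes.append("".join(linker_helix[CONTACT_INDEX[p]] for p in (-1, 2, 3, 6)))
    return codes


if __name__ == "__main__":
    # Two consecutive fingers from the zebrafish array shown on this page.
    fingers = "YLCPDCGKAFSWFNSLKQHQRIHTGEKP" "YTCSQCGKSFVHSGQLNVHLRTHTGEKP"
    print(contact_residues(fingers))  # ['WNSQ', 'HGQV']
```

Comparing these four-letter codes finger by finger across alleles or species is one simple way to see where an array is changing its predicted DNA specificity.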

This rapid evolution could cause breeding incompatibility between populations in the F1 generation (meiotic arrest for lack of crossovers, notably between chrX and chrY) and thus be central to speciation. However, the evolution of the hotspot-defining gene takes very different forms in different mammalian lineages. In effect each major clade of placentals is evolving a qualitatively different mating system, taking its most extreme form in pecoran ruminants, with six PRDM7/9 loci in cattle. This differentiation follows upon the very different structure and gene content of sex chromosomes in monotremes, marsupials and placentals, which in turn differ greatly from those of the amniote ancestor.

Syntenic relationships can help resolve gene duplication events during mammalian evolution. Here the chromosomal gene order TUBB3+ AFG3L1+ GAS8+ has existed stably since the stem amniote emerged 310 million years ago. The arrangement TUBB3+ AFG3L1+ GAS8+ PRDM7- qTer arose in placental mammals prior to the divergence of Afrotheria (i.e., between 125 and 102 million years ago) and has been maintained since, over billions of years of observable branch length. PRDM9, however, is found in many syntenic contexts, depending on the clade and on the particular segmental duplication that gave rise to each secondary copy.

From the perspective of comparative genomics, PRDM7 is the fundamental gene, not the disparate collection of genes lumped under PRDM9. At different times in different placental clades, PRDM7 spun off segmental duplications of itself to sites on other chromosomes, probably because of its vulnerable location at the extreme end of the q arm of an autosome. Because PRDM7 has stayed at its site adjacent to GAS8, it is possible to say unambiguously which of two initially identical copies is the parent gene. Because of this history, the 'PRDM9' genes do not form a distinct subtree within the overall two-gene tree under phylogenetic algorithms but instead associate more closely with their PRDM7 parents.

These paralogous copies -- despite all being called PRDM9 -- are not orthologous outside their species clade of origin. Orthology requires (by long-standing definition) vertical descent from a common gene in the last common ancestor of two species. Here the primate PRDM9 genes descend from a common gene (namely the recent duplicate of PRDM7 in the stem lineage preceding their speciation), but 'PRDM9' genes in other clades arose from different duplications at different times during placental mammal evolution and so are not orthologous to primate PRDM9 (they are not vertically descended from a common PRDM9 in their last common ancestor).

Such copies are sometimes called in-paralogs within a species and co-orthologs across species. However, these terms are topologically unstable (they depend on which species are included in the gene tree), unlike the terms ortholog, paralog and homolog, which are well-defined. Composite domain proteins such as PRDM7 give rise to whole new levels of terminological confusion, as each domain has a long, complex and separate history of duplication and shuffling.

Comparative genomics in placental mammals

In Euarchontoglires, a segmental duplication of PRDM7 occurred in a stem catarrhine primate and descended through speciation events to contemporary Old World monkeys and great apes. This second copy (PRDM9) relocated to, and stayed within, a cadherin gene complex on a different chromosome. PRDM7 persisted at its original ancestral location but became an overt pseudogene in some lineages (rhesus, gibbon, gorilla, chimp and human), though not so clearly in others (orangutan). Earlier-diverging primates such as lemurs, tarsier and New World monkeys have a single PRDM7 gene adjacent to GAS8. Tree shrew has unsatisfactory coverage in this region (six exons spread over two contigs and three unassembled traces, a string of Ns in the terminal zinc finger domain, and undetermined synteny).

Although an obvious pseudogene, human PRDM7 is sometimes treated as a functional gene with 'isoforms'. However, exon 9 of the hg18 reference sequence contains an internal direct tandem repeat of 88 nucleotides (not a multiple of three) that throws off the reading frame and the subsequent splice to exon 10, which itself has a frameshift (GGGG to GGG) in the second of its three zinc fingers. The protein is incorrectly described at NCBI, SwissProt and UCSC -- zinc fingers translated in the wrong reading frame cannot possibly form a stable fold, much less recognize a nucleotide sequence. Given the common comparative genomics pattern of duplication followed by pseudogenization (of either parent or duplicate), this locus is unquestionably a pseudogene whether or not it is still transcribed. Pseudogenization likely predated the divergence of the Bushman and Neanderthal lineages, and apparently occurred independently of the corresponding events in other primates.

Rodents and lagomorphs have no counterpart to PRDM9, though the situation is confused by later chromosomal rearrangements (there is no supporting homolog, or even debris, adjacent to GAS8 or cadherin). The mouse gene is therefore orthologous to primate PRDM7, not PRDM9. The rat gene occurs in the same syntenic context as mouse; other rodent genomes are too incomplete for synteny to be assessed. Rabbit has two apparent PRDM7 genes, called here PRDM7a and PRDM7b; neither copy is syntenic to mouse/rat or any other mammal. The pika genome is too incomplete to determine whether this duplication predated the rabbit-pika divergence. Overall the data are consistent with a single PRDM7 locus in the last common ancestor of primates and rodents. It would be vastly more useful to complete genomes already begun than to embark on incomplete sequencing of an additional 10k vertebrate genomes.

Laurasiatheres have a quite different history of gene duplication. Most species simply retain the ancestral condition of a single PRDM7 gene adjacent to GAS8. The megabat Pteropus vampyrus (but not the little brown bat) has an additional segmental duplication to a novel location that is today a pseudogene. Dog inexplicably has a PRDM7 pseudogene but no PRDM9 despite a rather complete assembly, even as other carnivores (cat, panda, ferret), insectivores, perissodactyls and early-diverging artiodactyls (alpaca, pig, dolphin) have a conventional single PRDM7 gene (though some of these have too few zinc fingers to recognize DNA motifs long enough to delimit hotspots).
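The point about too few zinc fingers can be illustrated with back-of-envelope arithmetic. The Python sketch below assumes, purely for illustration, that every finger contacts exactly 3 bp, that all fingers engage the DNA, and that the genome is a random sequence of equal base composition searched on both strands; real binding tolerates mismatches, so actual site counts would be higher. The 3 Gb genome size and the function names are hypothetical choices, not values from this page.

```python
def expected_sites(genome_bp: float, n_fingers: int, bp_per_finger: int = 3) -> float:
    """Expected chance matches to one exact motif, counting both strands."""
    motif_len = n_fingers * bp_per_finger
    return 2 * genome_bp / 4 ** motif_len


def min_fingers_for_unique(genome_bp: float, bp_per_finger: int = 3) -> int:
    """Smallest finger count whose exact motif is expected less than once by chance."""
    n = 1
    while expected_sites(genome_bp, n, bp_per_finger) >= 1:
        n += 1
    return n


if __name__ == "__main__":
    genome = 3e9  # assumed mammalian genome size in bp
    for n in (2, 3, 6, 9):
        print(n, "fingers:", round(expected_sites(genome, n)), "expected chance sites")
    print("fingers needed for < 1 chance site:", min_fingers_for_unique(genome))
```

Under these idealized assumptions a two-finger array (6 bp) would match roughly 1.5 million sites in a 3 Gb genome, and about six fingers (18 bp) are needed before chance matches drop below one.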

Carnivores -- but not bats or horses -- have an intervening cadherin gene between GAS8 and PRDM7. This rare genomic event is not the ancestral state but is unfortunately too restricted in distribution to resolve the status of Pegasoferae:

geneSpp         id  chr          strand     start       stop   span
PRDM7_ailMel  100%  GL193502         +-     628987    644235  15249
CAD1_homSap    73%  GL193502         +-     620344    624223   3880
GAS8_homSap    91%  GL193502         ++     594843    609901  15059

PRDM7_canFam   82%  chr5             ++   66560684  66567275   6592
CAD1_homSap    75%  chr5             ++   66571832  66581008   9177
GAS8_homSap    93%  chr5             +-   66587321  66604940  17620

PRDM7_felCat  100%  Un_ACBE01450414  +-      10493     13105   2613
CAD1_homSap    75%  Un_ACBE01450414  +-       3902      4280    379

PRDM7_equCab  100%  chr3             +-   36378853  36387224   8372
GAS8_homSap    93%  chr3             ++   36348528  36361906  13379
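Whether a cadherin hit really intervenes between GAS8 and PRDM7 can be read directly off the coordinates above. The Python sketch below hard-codes the panda, dog and horse rows (dropping the identity and strand columns), confirms that the span column equals stop - start + 1, and lists any other hits lying between two named genes on the same scaffold. The `Hit` type and `genes_between` helper are hypothetical; an empty result only means that none of the searched genes lies in the interval, not that the region is gene-free.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    gene: str
    scaffold: str
    start: int
    stop: int

    @property
    def span(self) -> int:
        # Inclusive coordinates, matching the span column of the table
        return self.stop - self.start + 1


def genes_between(hits: list[Hit], a: str, b: str) -> list[str]:
    """Names of hits lying strictly between genes a and b on one scaffold."""
    by_name = {h.gene: h for h in hits}
    left, right = sorted((by_name[a], by_name[b]), key=lambda h: h.start)
    if left.scaffold != right.scaffold:
        raise ValueError("genes on different scaffolds: adjacency undetermined")
    return [
        h.gene
        for h in sorted(hits, key=lambda h: h.start)
        if h.scaffold == left.scaffold and h.start > left.stop and h.stop < right.start
    ]


PANDA = [Hit("PRDM7", "GL193502", 628987, 644235),
         Hit("CAD1", "GL193502", 620344, 624223),
         Hit("GAS8", "GL193502", 594843, 609901)]
DOG = [Hit("PRDM7", "chr5", 66560684, 66567275),
       Hit("CAD1", "chr5", 66571832, 66581008),
       Hit("GAS8", "chr5", 66587321, 66604940)]
HORSE = [Hit("PRDM7", "chr3", 36378853, 36387224),
         Hit("GAS8", "chr3", 36348528, 36361906)]

if __name__ == "__main__":
    for name, hits in (("panda", PANDA), ("dog", DOG), ("horse", HORSE)):
        print(name, genes_between(hits, "GAS8", "PRDM7"), [h.span for h in hits])
```

The cat contig is left out because it carries no GAS8 hit, so adjacency cannot be tested there.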

Pecoran ruminants (cow, sheep, muntjac) present a vastly more complicated situation. Cows, even in the revised assembly, have a PRDM7 pseudogene adjacent to GAS8 accompanied by five PRDM9 copies at other locations (all distinct from the primate cadherin secondary site). This is neither a recent development nor an artifact of domestication, because a similar expansion is seen in provisional assemblies of sheep and muntjac (a wild deer) but not dolphin, pig or alpaca, dating the expansion to the stem pecoran ruminant. It is not clear which, if any, of these gene copies play a role in recombination -- the primate paradigm for meiotic markup is not immediately applicable to these species.

Atlantogenata (Afrotheria + Xenarthra) have yet another history. Elephant (best of five available assemblies) has three loci: an old PRDM7 pseudogene in the GAS8 syntenic position, a seemingly functional PRDM9a with 12 terminal zinc fingers at a novel syntenic location, and a fairly recent pseudogene, PRDM9b. A DNA assembly from fossil mammoth shows the same three genes with the same pseudogenization pattern. Although the sequences diverged separately after speciation, three identical inactivating mutations occur in both mammoth and elephant but not hyrax, thus dating the gene loss relative to their speciation. This is shown for exon 9 below:

1 YVNCARDDEEQNLVAFQYHRQIFYRTCRVIRPGCELLVWYGDEYGQELGIKWGSKWKKELTAGR 1 PRDM9_conSeq  wildtype consensus reference
1 YVNCIQD*KEQNLVAFQYHRQIFHWTCCTIRPGCELLVWYGDNYSQELGIKWGSR*KKELTSGT 1 PRDM9b_loxAfr gg bad acceptor, early stop codon,   internal stop codon
1 YVNCTRDKEEQNLVAFQYHRQIFYWTCHTIQPGCelLVWYGDNYGQELGIKWGSR*KKELTSGT 1 PRDM9b_mamPri gg bad acceptor, two 1 bp deletions, internal stop codon
1 YVRRARDTEERNLVAFQYHRQIFYRTCCTVRPGCELLVWRGAEDSQALG    SRRTMELTSQK 1 PRDM9b_proCap pseudogene with 4aa deletion
1 YVNCARDEEEQNLVAFQYHRQIFYRTCRTIQPDCELLVWYGDEYGQELGIKWGSRWKKELTSGT 1 PRDM9a_loxAfr wildtype
1 YVNCARDEEEQNLVAFQYHRQIFYRT                                       1 PRDM9a_mamPri fragmentary coverage
1 YVNCARDEDEQNLVAFQYHGQIFYRTCRPVQPGCELLVWYGDEYGQELGIQRGSRQMKALSSQT 1 PRDM9a_proCap 17 zinc fingers
1 YVNGTQDEKEQNLVFFQYHRQIFYQTCYAVWPGCQLLVWYRDECGQELGIKWDNRGKKEFTVGT 1 PRDM7_loxAfr  bad acceptor, bad donor
1 YVNGTQDEKEQNLVFFQYHRQIFYQTCYAVWPGCQLLVWYRDECGQELGIKWDNRGKKEFTVGT 1 PRDM7_mamPri  bad acceptor, bad donor, 1 synon bp difference
1 YVNCARDDEEQNLVAFQYHRQIFYRTCRAIRPGCELLVWYGDEYGQELGIKWGSKWKKELTAEK 1 PRDM7_choHof  wildtype
1 YVNCAWDDKEQNLVAFQYHRQIFYRTCRTIRPGCELLVWYGDEYGQELGIKWGSKWKKEFMTGT 1 PRDM7_dasNov  wildtype
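Shared inactivating changes like these can be located mechanically once the rows are aligned. This Python sketch treats each row as already aligned column-for-column (as displayed, with spaces as gaps and `*` as a stop codon), reports the columns where every supplied sequence carries a stop, and computes pairwise identity over ungapped columns. The function names are hypothetical, and the sketch only finds a shared stop; it does not by itself date the mutation.

```python
def stop_columns(seq: str) -> set[int]:
    """Alignment columns holding a stop codon ('*')."""
    return {i for i, aa in enumerate(seq) if aa == "*"}


def shared_stops(*aligned: str) -> list[int]:
    """Columns (0-based) where every aligned sequence has a stop."""
    return sorted(set.intersection(*(stop_columns(s) for s in aligned)))


def identity(a: str, b: str) -> float:
    """Fraction of identical residues over columns where neither row has a gap (space)."""
    pairs = [(x.upper(), y.upper()) for x, y in zip(a, b) if x != " " and y != " "]
    return sum(x == y for x, y in pairs) / len(pairs)


# Exon 9 rows copied from the alignment above (phase numbers removed).
PRDM9b_loxAfr = "YVNCIQD*KEQNLVAFQYHRQIFHWTCCTIRPGCELLVWYGDNYSQELGIKWGSR*KKELTSGT"
PRDM9b_mamPri = "YVNCTRDKEEQNLVAFQYHRQIFYWTCHTIQPGCelLVWYGDNYSQELGIKWGSR*KKELTSGT"
PRDM9b_proCap = "YVRRARDTEERNLVAFQYHRQIFYRTCCTVRPGCELLVWRGAEDSQALG    SRRTMELTSQK"

if __name__ == "__main__":
    print(shared_stops(PRDM9b_loxAfr, PRDM9b_mamPri))  # [55]
    print(shared_stops(PRDM9b_loxAfr, PRDM9b_mamPri, PRDM9b_proCap))  # []
    print(round(identity(PRDM9b_loxAfr, PRDM9b_mamPri), 3))
```

Column 55 is the internal stop codon annotated for both elephant and mammoth PRDM9b; hyrax PRDM9b has no stop there.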

Marsupials and platypus: the mystery of exon 5

Tracking PRDM7 back to marsupials and beyond presents significant uncertainties. The three available marsupial assemblies are seriously incomplete, causing gene prediction problems when exons are spread over multiple small contigs that, moreover, provide no syntenic validation. Domain linker regions have weak amino acid conservation and so fail to give blast matches to placental queries, a problem exacerbated for short exons and pseudogenes (opossum). No expression data exist to bridge uncertain regions, so missing exons cannot be located, nor can exons on different contigs be definitively connected. Because the domains here occur widely in other combinations in other proteins, a full-length marsupial sequence is critical to testing whether the domain shuffle resulting in PRDM7 and PRDM9 was a placental innovation.

The most favorable situation occurs in the Monodelphis domestica assembly. Here eight of the ten expected exons (1 and 5 are missing) are readily located in a single assembly region of length 33,449 bp with a single gap (estimated at 270 bp). It is not surprising that exon 1 cannot be located because it has no known domain or reason for fixed length and is diverging rapidly in placentals. However locating exon 5 is important for distinguishing between two adjacent small genes evolving into a single fused gene only in the placental branch versus a full length gene already present in the last common ancestor.

Unless exon 5 lies within the assembly gap, it should be locatable in the 25,548 bp separating exons 4 and 6 (of which 8,263 bp remain after RepeatMasker is applied). However, blastx against a panel of 54 exon 5 sequences from placental mammals gives no suggestion of a match, despite plausibly adequate length (all placental exon 5 sequences have 52 amino acids).

Gene prediction tools such as GenScan, NScan, Ensembl and Gnomon give useless results because they neglect comparative genomics: a few exons are correctly predicted but are otherwise embedded in time-wasting rubbish. The poor reliability of these tools does not justify GenBank clutter (e.g., XM_001369137) for their predictions. The 46-species whole genome alignment at UCSC (starting with the PRDM7/9 'ProteinFasta' link at the description page) is a better starting point.

Here it should be noted that exon 5 has not diverged especially rapidly since the last common ancestor of placentals. Aligned to human, the full range of sequences has an overall identity of 69%. Exon 5 has a number of invariant and semi-invariant residues, which is only possible over this time span if they are maintained by selective pressure. Thus it has some function even though it contains no known Pfam domain and has no crystallographic structure match. Because exon 4 has a splice donor of phase 0 and exon 6 a splice acceptor of phase 2, exon 5 in marsupials must take the form 0 xxxxxxxx 1 to conserve the reading frame. This rules out the scenario in which marsupials stopped using exon 5 (through alternative splicing) and it then decayed beyond recognition, since splicing exon 4 directly to exon 6 would shift the reading frame.
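The phase argument can be checked mechanically. We read the numbers flanking each exon in this page's sequence listings (for example the zebrafish model further down) as the count of nucleotides completing a split codon at the exon start and the count left over at the exon end; this is our interpretation of the notation, not something the page states. Under that reading, adjacent exons stay in frame when the trailing count of one and the leading count of the next sum to a multiple of three. The Python helpers below are hypothetical names for that rule.

```python
def in_frame(trailing: int, leading: int) -> bool:
    """Do an exon's leftover nucleotides and the next exon's leading ones make whole codons?"""
    return (trailing + leading) % 3 == 0


def missing_exon_form(prev_trailing: int, next_leading: int) -> tuple[int, int]:
    """(leading, trailing) counts a missing middle exon needs to keep the frame."""
    return (3 - prev_trailing) % 3, (3 - next_leading) % 3


def chain_in_frame(exons: list[tuple[int, int]]) -> bool:
    """True if every consecutive pair of (leading, trailing) exons splices in frame."""
    return all(in_frame(trail, lead) for (_, trail), (lead, _) in zip(exons, exons[1:]))


if __name__ == "__main__":
    # Exon 4 ends with 0 leftover nucleotides; exon 6 starts with 2 (phase labels from the text).
    print(missing_exon_form(0, 2))  # (0, 1), i.e. the form 0 xxxxxxxx 1
    print(in_frame(0, 2))           # False: skipping exon 5 would shift the frame
    # Flanking numbers of the zebrafish exons as listed on this page.
    print(chain_in_frame([(0, 1), (2, 1), (2, 0), (0, 2), (1, 1), (2, 0)]))  # True
```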

The opossum gene is peculiar in that 7 of the 8 available exons are quite conventional in sequence, but the terminal zinc finger exon is completely broken up by frameshifts and stop codons and is barely recognizable. The other exons return only PRDM7/9 as significant matches when back-blasted against the human genome, establishing that they have not been confused with the many hundreds of partial homologs with KRAB, SSXRD, PR (SET) or C2H2 domains.

The Sarcophilus harrisii (Tasmanian devil) assembly is missing the same two exons but has a conventional terminal exon with an intact zinc finger region of seven repeats (albeit with two distal frameshifts). Here exon 2 occurs in contig AFEY01202902 and exons 3-4 in AFEY01156721, with 1,436 bp left over to host exon 5; exons 6-10 are found in a third contig, AFEY01386448, with 8,331 bp available upstream for exon 5. It is not known whether these contigs would be adjacent in a more complete assembly. The six exons comparable between Tasmanian devil and opossum are 82% identical to each other as proteins and 67% identical to those of human, not indicative of anomalous or especially rapid evolution in the context of proteome-wide rates.

The Macropus eugenii (wallaby) assembly is the least complete, with no contig containing more than a single exon. Here exons 1, 4, 5 and 8 are missing altogether, but the terminal zinc finger exon is intact with 7 C2H2 domains. It is worth noting that exon 10 is so long and distinctive, with its phase 2 reading frame and early zinc finger, that there is no possibility of confusing it with those of homologs (HKR1, ZNF133, ZNF169, ZNF343, ZNF589 in human).

If marsupials had a markedly (or even totally) different exon 5 of the form 0 xxxxxxxx 1, it should emerge from a tblastx comparison of the regions between exons 4 and 6. However, no plausible candidate emerges. This implies orthology despite the assembly gaps and missing exon 5, i.e., the last common ancestor of marsupials and placentals had a full-length PRDM7-type gene. It is uncertain whether these should be connected up into a single gene with the later exons -- the whole issue here is the timing of the final gene shuffle.

The situation in platypus is curious. Only distal exons 6-10 can be reliably recognized in the current assembly, i.e., KRAB, SSXRD and exon 5 are missing, but the knuckle, PR and zinc finger domains are present, with 3-4 repeat units. However, the early zinc finger in the last exon is not present. Yet the best back-blast match to human is still PRDM7/9. These exons occur in two tandem copies on the same strand but differ significantly from each other and so do not represent mis-assembly duplications. The intervening area is gapless, so the missing exons should be locatable if present.

However, they are not. Upon blastx of the repeat-masked sequence against GenBank tetrapod sequences, no matches occur other than three worthless platypus gene models (XP_001507240, XP_001509482, XP_001509433) that predict earlier exons wholly lacking support in any other species. Thus it appears that the gapless region does not contain any counterpart to exons 1-5 of therian mammals. Either this region has been lost in platypus or the gene is a stand-alone, shorter distal version of PRDM7/9. The first identifiable exon begins with the expected phase 2 reading frame in both tandem copies and does not contain an in-frame methionine upstream before a stop codon. Hence there must be at least one earlier exon. However, tblastx of the appropriate regions of repeat-masked marsupial and platypus sequence again does not identify noteworthy peptide candidates.

Perhaps the corresponding ancestral region was shuffled together with a gene providing the proximal regions in the therian branch only, giving rise to the full-length gene there. However, tblastn queries of the platypus assembly, while locating numerous appropriate KRAB_A domains with the correct 0 xxxxxxxx 1 reading frame that back-blast to other human proteins, do not find counterparts of the exon 1-5 region beyond exon 2. Hence there is no obvious donor for the proximal half of PRDM7/9.

Given that the PRDM and zinc finger families are greatly expanded with extensive domain shuffling in mammals with difficulties already tracing back PRDM7/9 to marsupials and monotremes, it comes as no surprise that bird, lizard and frog genomes shed no further light on the evolution of this gene. The situation in non-placental mammals could theoretically be resolved by sequencing transcripts, but these are exceedingly rare for PRDM7/9 even in placentals and so will not emerge unless explicitly sought.

Conservation of exon 5 within placentals (dots denote identity to human PRDM9; invariant residues were highlighted in red)

PRDM9_homSap    GMPKASFSNESSLKELSRTANLLNASGSEQAQKPVSPSGEASTSGQHSRLKL
PRDM9_panTro    .......N.........GMP....T............P..............
PRDM9_gorGor    .....................................P..............
PRDM9_ponAbe    .......N.........G.Q....T............P..........T..I
PRDM9_nomLeu    .................GA..................P..............
PRDM9_macMul    .......N.......V.GM.....T............P...R..........
PRDM9_papHam    E...T............G.P...ST.........A..P..............
PRDM7_calJac    .......G......K..G...V..T..P.........P..............
PRDM7_micMur    ...R.PL.DG.......G......T.....P......PR..........R..
PRDM7_otoGar    ...R.PL.DG.......GP.S.P.I.....H..HM.SPR.........GR.S
PRDM7_tarSyr    ...R.PL.IV.......EM.....T.D....W......R.....E....K..
PRDM7_oryCun    ...RLPVN.........GI.....TT...ED...SF.PK.TR......TR..
PRDM7_ratNor    ET.RMPL.DK..V..VFGIE....T....H.....CSPE.GN.....FGK..
PRDM7_musMus    ESSRMP..G..NV..G.GIE....T....HV.....SLE.GN......GK..
PRDM7_speTri    LK.EVLL..........G......T.....V......LR...A.R....R..
PRDM9e_bosTau   ..SR.PL.K.......PGA.K..KT..CK....L.P.PRK.R.PE..P.Q.V
PRDM9c_oviAri   ..S..LV..K.....MPGASK..KTR.PK...I..PAPR.P...E..P.Q.V
PRDM9a_munMun   ..SR.PLIK.......LGA.K.MKT...K...N..PHPRK.R.P...P.Q.V
PRDM7_turTru    AV.PVPL.......K.PGA.Q.QK...PA...S.AP.P.A....AW.T.Q..
PRDM7_lamPac    ...RGPL..Q.......G..KP.KT...G.....FP.L.......R...Q..
PRDM7_susScr    SDSRVPL..K......LT..EVPET.......E....P......RRR.GQE.
PRDM7_canFam    .I.RVPL..K.......E..K...T.SP..G..S..LP.K.....H.T.Q..
PRDM7_felCat    .THRVPL.K.....DF.E..K...T.....G.....LP.......H...R..
PRDM7_ailMel    .I.R.PLR.........E..K...T....LG.....LP.......HD.LQ..
PRDM7_musPut    .V.R.PL..........E..K...T....HD.....HP.......H..LR..
PRDM7_pteVam    A..RVPL...P......VI....K......D....F.P.K..A.R....Q..
PRDM7_myoLuc    AKSR.PL..........G.....TT.....T..T.P.P.........P.S..
PRDM7_equCab    R.RT.PL....R.....G..K..KT.S...V......L....S.E....R..
PRDM7_sorAra    .RSRTPI.....S....G.RT...TKCTK.....LF.P.......HY.KP..
PRDM9a_loxAfr   .T...LLG.......V.G..I...TT..........SP......D.P..W..
PRDM7_echTel    ...GV.LR...N..V..G..I..T.AEP..PH-.G..P...T..HE.L.Q.V
PRDM7a_proCap   .T...LLG.......V.G..I...TT..........SP......D.P..W..
Consensus       GMPRAPLSNESSLKELSGTANLLNTSGSEQAQKPVSPPGEASTSGQHSRQKL
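Since the red highlighting does not survive in this plain-text view, the invariant columns can be recomputed from the dot notation. This Python sketch assumes that '.' means identity to the first (human) row and '-' a gap, expands each row, and reports 0-based columns where all rows agree; it also gives pooled identity to the reference over ungapped positions. The page does not say how its 69% overall identity figure was computed, so this pooled measure is only one plausible definition, and the function names are hypothetical.

```python
def expand(reference: str, row: str) -> str:
    """Replace '.' in a dot-alignment row with the reference residue at that column."""
    return "".join(ref if aa == "." else aa for ref, aa in zip(reference, row))


def invariant_columns(reference: str, rows: list[str]) -> list[int]:
    """0-based columns where the reference and every expanded row agree (gaps disagree)."""
    seqs = [reference] + [expand(reference, r) for r in rows]
    return [i for i in range(len(reference)) if len({s[i] for s in seqs}) == 1]


def pooled_identity(rows: list[str]) -> float:
    """Identical ('.') positions over all non-gap positions, pooled across rows."""
    same = sum(row.count(".") for row in rows)
    aligned = sum(len(row) - row.count("-") for row in rows)
    return same / aligned


if __name__ == "__main__":
    human = "GMPKASFSNESSLKELSRTANLLNASGSEQAQKPVSPSGEASTSGQHSRLKL"
    rows = [
        ".......N.........GMP....T............P..............",  # chimp
        "...R.PL.DG.......G......T.....P......PR..........R..",  # mouse lemur
    ]
    print(invariant_columns(human, rows))
    print(round(pooled_identity(rows), 3))
```

Adding the remaining rows of the table to `rows` would reproduce the full invariant set.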

Distal domain combination already formed by fish ancestor

Various additional sequences are relevant to understanding the curated placental mammal PRDM7/9 set. For example, the Neanderthal genome, despite being very far from satisfactory coverage, can provide a PRDM9 sequence derived from the human reference sequence using the non-synonymous SNPs reported in the corresponding UCSC browser track. The changes reported in the zinc finger domain (R HDL S R) may be enough to have created a partial species barrier, though this involves comparing a fossil sequence to a contemporary human reference (and contemporary humans are themselves quite variable). Similarly, the Bushman genome sequence might yield an intermediate outgroup, though that assembly (like so many others) remains elusive.

Terminal sequences for 9 additional species of murid rodents have been determined, but these have limited value for comparative genomics because they do not even cover the entire terminal exon and their syntenic contexts (and thus homological relationships) were not established. The single individual sequenced per species may not be representative of the overall population in the zinc finger region (based on the extensive diversity observed in human), diminishing their utility for predicting species barriers. These genes are most likely PRDM7 orthologs only secondarily related to the catarrhine primate PRDM9 set, i.e., descended from the unique locus present in stem Euarchontoglires, whereas the latter duplicated from a stem catarrhine PRDM7. It is worth noting that the reported sequences are very orderly and lack the chaos of frameshifts and stop codons so often seen in this gene family. The protein accessions are here.

The proposal that a zebrafish protein is an ortholog of placental mammal PRDM9 seems implausible given that birds, lizard and frog lack notable homologs. The protein lacks close counterparts in other fish species with determined genomes and is not syntenic to mammalian gene locations. Thus it might represent an independent gene shuffle that resulted in a similar concatenation of domains (parallel evolution). However, both piecewise and whole back-blasts to mammals call up only PRDM7/9 and the closely related PRDM11, suggesting orthology of the parts. The protein lacks the KRAB and SSXRD domains but contains a standard knuckle, PR (SET) domain, early zinc finger and zinc finger repeat domain (all in exons phased identically to human). The repeat region is fairly chaotic, with only moderate resemblance in its details to mammalian zinc fingers. Related genes are found in salmon, trout, catfish and minnow but not stickleback, fugu, tetraodon or medaka. Transcripts are exceedingly common, in contrast to mammals. The missing KRAB and SSXRD domains are believed critical for recruiting other essential proteins to the hotspot in the only systems with experimental data (mouse and human).

Reported PRDM9 orthologs in distantly related invertebrates such as Lottia, Capitella and Nematostella can be dismissed as independent occurrences of common ancient domains. None of these domains are mammalian innovations -- PR(SET) traces back to bacterial methylases and zinc fingers also have a long and complex history. Without conservation of all mammalian domains, exon phasing, syntenic chromosomal location and demonstration of descent from a single gene in the last common ancestor, there is no basis for calling such genes orthologous, nor for assuming they function similarly in meiosis or illuminate mammalian PRDM7/9 evolution in any way. Widespread expression in testes is actually not supportive, as it conflicts with the mammalian expression pattern. How could such a fundamental capacity be lost (and replaced by a non-homologous system) so many times in so many other lineages, all of which have obligatory meiosis?

It thus appears that while the core distal domains (terminal five exons) of PRDM7/9 came together long ago, the function may have been inessential because numerous lineages subsequently lost any counterpart. This core domain persisted however into the common ancestor with monotremes, with the full length gene only coming together by marsupial divergence (or with more certainty, in stem placental).

>PRDM7_danRer Danio rerio (zebrafish) Q6P2A1 transcript BC064665 no KRAB SSXRD or exon 5 but knuckle SET early ZNf C2H2 array
0 MSLSP 1
2 DLPPSEEQNLEIQGSATNCYSVVIIEEQDDTFNDQPF 1
2 YCEMCQQHFIDQCETHGPPSFTCDSPAALGTPQRALLTLPQGLVIGRSSISHAGLGVFNQGQTVPLGMHFGPFDGEEISEEKALDSANSWV 0
0 ICRGNNQYSYIDAEKDTHSNWMK 2
1 FVVCSRSETEQNLVAFQQNGRILFRCCRPISPGQEFRVWYAEEYAQGLGAIWDKIWDNKCISQ 1
2 GSTEEQATQNCPCPFCHYSFPTLVYLHAHVKRTHPNEYAQFTQTHPLESEAHTPITEVEQCLVASDEALSTQTQPVTESPQEQISTQNGQPIHQTENSDEPDASDIYTAAGEISDEI
HACVDCGRSFLRSCHLKRHQRTIHSKEKP
YCCSQCKKCFSQATGLKRHQHTHQEQEKNIESPDRPSDI
YPCTKCTLSFVAKINLHQHLKRHHHGEYLRLVESGSLTAETEEDHT
EVCFDKQDPNYEPPSRGRKSTKNSLKGRGCPKKVAVGRPRGRPPKNKNLEVEVQKIS
PICTNCEQSFSDLETLKTHQCPRRDDEGDNVEHPQEASQ
YICGECIRAFSNLDLLKAHECIQQGEGS
YCCPHCDLYFNRMCNLRRHERTIHSKEKP
YCCTVCLKSFTQSSGLKRHQQSHLRRKSHRQSSALFTAAI
FPCAYCPFSFTDERYLYKHIRRHHPEMSLKYLSFQEGGVLSVEKP
HSCSQCCKSFSTIKGFKNHSCFKQGEKV
YLCPDCGKAFSWFNSLKQHQRIHTGEKP
YTCSQCGKSFVHSGQLNVHLRTHTGEKP
FLCSQCGESFRQSGDLRRHEQKHSGVRP
CQCPDCGKSFSRPQSLKAHQQLHVGTKL
FPCTQCGKSFTRRYHLTRHHQKMHS* 0

Comparative genomics: sequence availability

As of mid-April 2011, some 61 PRDM7 and PRDM9 genes from 36 species can be recovered from placental mammal genome projects. The encoded proteins are compiled here as tab-delimited pdf text that will paste cleanly into rows and columns of a spreadsheet such as excel, and as exon-by-exon gene models in the Curated reference sequences section below.

Of these 61 genes, 18 are pseudogenes in various states of degeneration. There has been no gain or loss of introns: all share the same ten exons with identically placed introns. No retroprocessed genes occur despite transcription in germline tissues. Because many genomes are incomplete, 83 of the 610 expected exons (61 genes × 10 exons) are apparently located in coverage gaps or are too short and diverged to be recognizable.

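The table below gives the number of zinc fingers in the second column, the phylogenetic clade in the third (prim = primates, glir = Glires, laur = Laurasiatheria), gene status (gene or pseudogene) in the fourth and the adjacent gene (synteny; noDet = not determined) in the fifth, followed by the species name, common name and accession.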

The number of zinc fingers is quite variable in human and likely so in all species; the table gives the count for the individual selected for the genome project, which may not be representative of the species. These zinc finger arrays have been corrected in low-coverage genomes for common sequencing errors -- frameshifts and premature stop codons arising from nucleotide run-length mis-calls (e.g., ggggg interpreted as gggg).
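The run-length correction described above can be sketched as a search over single-base edits. The Python sketch below is a simplified illustration, not the procedure actually used for this table: it assumes the sequence starts in frame 0, uses the standard genetic code, and tries adding or removing one base in each homopolymer run of at least four, keeping only variants whose translation has no stop codon before the final codon. All function names and the toy sequence are hypothetical.

```python
from itertools import groupby

BASES = "TCAG"
# Standard genetic code (NCBI table 1), codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"


def translate(nt: str) -> str:
    """Translate reading frame 0; a partial trailing codon is ignored."""
    nt = nt.upper()
    protein = []
    for i in range(0, len(nt) - 2, 3):
        codon = nt[i:i + 3]
        if all(b in BASES for b in codon):
            b1, b2, b3 = (BASES.index(b) for b in codon)
            protein.append(AMINO[16 * b1 + 4 * b2 + b3])
        else:
            protein.append("X")  # ambiguous base such as N
    return "".join(protein)


def has_premature_stop(nt: str) -> bool:
    """True if a stop codon appears before the last translated codon."""
    return "*" in translate(nt)[:-1]


def homopolymer_runs(nt: str, min_len: int = 4):
    """Yield (start, length, base) for single-base runs at least min_len long."""
    pos = 0
    for base, group in groupby(nt):
        n = len(list(group))
        if n >= min_len:
            yield pos, n, base
        pos += n


def run_length_fixes(nt: str, min_len: int = 4) -> list[tuple[str, str]]:
    """Single-base run-length edits that remove every premature stop codon."""
    nt = nt.upper()
    fixes = []
    for start, n, base in homopolymer_runs(nt, min_len):
        end = start + n
        candidates = (
            (f"+{base} at {start}", nt[:end] + base + nt[end:]),  # run was under-called
            (f"-{base} at {start}", nt[:start] + nt[start + 1:]),  # run was over-called
        )
        fixes.extend(c for c in candidates if not has_premature_stop(c[1]))
    return fixes


if __name__ == "__main__":
    miscalled = "ATGGGGTAGCAAAA"  # toy sequence: a GGGGG run read as GGGG
    print(translate(miscalled))  # MG*Q
    for label, variant in run_length_fixes(miscalled):
        print(label, variant, translate(variant))
```

More than one edit can rescue the frame, so any candidate would still need support from the conservation profile before being accepted.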

Pseudogenes are sometimes obvious (large deletions, reading frame errors at multiple locations, stop codons in early exons, amino acid substitutions not corresponding to the conservation profile) but otherwise can be difficult to distinguish from assembly error or from a bad allele of a usually intact gene in the population (possibly a balanced polymorphism that reduces copy number). A pseudogene can continue being transcribed for tens of millions of years after losing all functionality at the protein level. That is moot here because PRDM7 and PRDM9 are barely represented among the tens of millions of mammalian transcripts at GenBank.

The PRDM7 genes are all orthologous in the classical sense (as can be seen by adjacency to the unrelated gene GAS8) but the PRDM9 genes arose as different lineage-specific segmental duplications so are orthologous only when shared within a well-defined phylogenetic clade.

  • PRDM7: genes with ancestral location GAS8 synteny
  • PRDM9: lineage-specific segmental duplications of PRDM7
  • Pseudogenes: multiple disabling frameshifts and stop codons in parental gene (not a retrogene)
  >PRDM9_homSap   13  prim  gene  CDH12  Homo         sapiens      (human)         NM_020227
  >PRDM9_panTro   19  prim  gene  CDH12  Pan          troglodytes  (chimp)         GU166820
  >PRDM9_gorGor    -  prim  gene  cdh12  Gorilla      gorilla      (gorilla)       CABD02290264
  >PRDM9_ponAbe   10  prim  gene  CDH12  Pongo        abelii       (orangutan)     XR_093432
  >PRDM9_nomLeu   10  prim  gene  cdh12  Nomascus     leucogenys   (gibbon)        ADFV01015315
  >PRDM9_macMul    9  prim  gene  CDH12  Macaca       mulatta      (rhesus)        XM_001083675
  >PRDM9_papHam   11  prim  gene  cdh12  Papio        hamadryas    (baboon)        genome
  >PRDM7_homSap    3  prim  gene  GAS8+  Homo         sapiens      (human)         genome
  >PRDM7_panTro    2  prim  pseu  GAS8+  Pan          troglodytes  (chimp)         genome
  >PRDM7_gorGor    3  prim  pseu  GAS8+  Gorilla      gorilla      (gorilla)       genome
  >PRDM7_ponAbe    4  prim  gene  GAS8+  Pongo        abelii       (orangutan)     genome
  >PRDM7_nomLeu    5  prim  pseu  gas8+  Nomascus     leucogenys   (gibbon)        ADFV01125891
  >PRDM7_macMul    2  prim  pseu  GAS8+  Macaca       mulatta      (rhesus)        genome
  >PRDM7_papHam    2  prim  pseu  gas8+  Papio        hamadryas    (baboon)        genome
  >PRDM7_calJac   12  prim  gene  GAS8+  Callithrix   jacchus      (marmoset)      XR_090591
  >PRDM7_tarSyr    -  prim  pseu  gas8+  Tarsius      syrichta     (tarsier)       ABRT011082008
  >PRDM7_micMur    8  prim  gene  gas8+  Microcebus   murinus      (lemur)         ABDC01433247
  >PRDM7_otoGar    7  prim  gene  GAS8+  Otolemur     garnettii    (galago)        genome
  >PRDM7_tupBel    9  prim  gene  noDet  Tupaia       belangeri    (tree_shrew)    genome
  >PRDM9_oryCun    8  glir  gene  other  Oryctolagus  cuniculus    (rabbit)        genome
  >PRDM7_oryCun    4  glir  gene  other  Oryctolagus  cuniculus    (rabbit)        genome
  >PRDM7_ochPri    -  glir  gene  noDet  Ochotona     princeps     (pika)          AAYZ01312269
  >PRDM7_ratNor   10  glir  gene  PDCD2  Rattus       norvegicus   (rat)           NM_001108903
  >PRDM7_musMus   12  glir  gene  PDCD2  Mus          musculus     (mouse)         NM_144809
  >PRDM7_musMol   11  glir  gene  noDet  Mus          molossinus   (wild_mouse)    GU216230
  >PRDM7_dipOrd    -  glir  gene  noDet  Dipodomys    ordii        (kangaroo_rat)  genome
  >PRDM7_speTri    -  glir  gene  noDet  Spermophil   tridecemlin  (squirrel)      AAQQ01308561
  >PRDM9a_bosTau   7  laur  gene  noDet  Bos          taurus       (cattle)        NW_003053109
  >PRDM9b_bosTau   5  laur  gene  noDet  Bos          taurus       (cattle)        DAAA02065087
  >PRDM9c_bosTau   -  laur  gene  noDet  Bos          taurus       (cattle)        XM_002699750
  >PRDM9d_bosTau   9  laur  gene  noDet  Bos          taurus       (cattle)        genome
  >PRDM9e_bosTau   9  laur  gene  noDet  Bos          taurus       (cattle)        genome
  >PRDM9e_oviAri   -  laur  pseu  noDet  Ovis         aries        (sheep)         genome
  >PRDM9d_oviAri   -  laur  gene  noDet  Ovis         aries        (sheep)         genome
  >PRDM9c_oviAri   4  laur  pseu  noDet  Ovis         aries        (sheep)         genome
  >PRDM9b_oviAri   2  laur  pseu  noDet  Ovis         aries        (sheep)         genome
  >PRDM9a_oviAri   9  laur  gene  noDet  Ovis         aries        (sheep)         genome
  >PRDM9d_munMun   4  laur  gene  noDet  Muntiacus    muntjak      (muntjac)       AC216498
  >PRDM9c_munMun  15  laur  gene  noDet  Muntiacus    muntjak      (muntjac)       AC154919
  >PRDM9b_munMun  13  laur  gene  noDet  Muntiacus    muntjak      (muntjac)       AC218859
  >PRDM9a_munMun   7  laur  gene  noDet  Muntiacus    muntjak      (muntjac)       AC225653
  >PRDM7_bosTau    -  laur  pseu  GAS8+  Bos          taurus       (cattle)        genome
  >PRDM7_turTru    9  laur  gene  gas8+  Tursiops     truncatus    (dolphin)       ABRN01441536
  >PRDM7_lamPac    2  laur  gene  noDet  Lama         pacos        (llama)         scaffolds 
  >PRDM7_susScr    9  laur  gene  GAS8+  Sus          scrofa       (pig)           FP476134
  >PRDM7_canFam    5  laur  pseu  GAS8+  Canis        familiaris   (dog)           genome
  >PRDM7_felCat   11  laur  gene  GAS8+  Felis        catus        (cat)           genome
  >PRDM7_ailMel    6  laur  gene  GAS8+  Ailuropoda   melanoleuca  (panda)         GL193502
  >PRDM7_musPut    3  laur  gene  noDet  Mustela      putorius     (ferret)        AEYP01035077
  >PRDM9_pteVam   15  laur  pseu  noDet  Pteropus     vampyrus     (bat)           ABRP01232219
  >PRDM7_pteVam    7  laur  gene  GAS8+  Pteropus     vampyrus     (bat)           ABRP01250178
  >PRDM7_myoLuc    6  laur  gene  gas8+  Myotis       lucifugus    (bat)           AAPE02062260
  >PRDM7_equCab    4  laur  gene  GAS8+  Equus        caballus     (horse)         genome
  >PRDM7_sorAra    8  laur  gene  noDet  Sorex        araneus      (shrew)         AALT01000095
  >PRDM9a_loxAfr  12  afro  gene  noDet  Loxodonta    africana     (elephant)      genome
  >PRDM9b_loxAfr   3  afro  pseu  noDet  Loxodonta    africana     (elephant)      genome
  >PRDM7_loxAfr    5  afro  pseu  GAS8+  Loxodonta    africana     (elephant)      genome
  >PRDM7_echTel    5  afro  pseu  noDet  Echinops     telfairi     (tenrec)        genome
  >PRDM7a_proCap  17  afro  pseu  noDet  Procavia     capensis     (hyrax)         ABRQ01392668
  >PRDM7b_proCap  13  afro  pseu  noDet  Procavia     capensis     (hyrax)         ABRQ01227339
  >PRDM7_dasNov    9  xena  pseu  noDet  Dasypus      novemcinctus (armadillo)     AAGV020462211
  >PRDM7_choHof    2  xena  pseu  noDet  Choloepus    hoffmanni    (sloth)         ABVD01893961

Comparative genomics of PRDM9 and PRDM7

PRDMcompBio.jpg

PRDM9 is one of many human proteins sharing a set of common domains, along with varying numbers of the C2H2 zinc finger domain. The diagram at left is an attempt to organize these proteins into a phylogenetic tree based on structural considerations of the SET domain they all share.

The traditional SET domain is too small to account for an enzyme with distinctive substrates, so flanking sequence must be included despite its lack of apparent conservation. Using S-adenosyl methionine as methyl donor, PRDM9 adds the third methyl group only to lysine 4 of mature histone H3 (which is actually position 5 before removal of the initiator methionine: MARTKQTARK...). It is one of many such epigenetic methylases in the human genome. The histone residue targeted by these methylases correlates poorly with their evolutionary grouping by SET domain (figure).

The upper left corner of the figure shows the variability in domain structure. While PRDM9 and PRDM7 share the same domains (an upstream KRAB domain is not shown), among the PR-class homologs PRDM11 shares only the SET domain despite nesting deep within the PRDM9 subtree. PRDM4 has both the SET and C2H2 domains, possibly sharing the early C2H2 domain in an exon beginning with a phase 2 splice acceptor (as shown in the reference sequence section). Overall, however, PRDM9 and PRDM7 have no full-length homologs with matching exon structure. Even the SET domain is intronated differently within PR-class proteins (with the sole exception of PRDM11), suggesting either ancient divergence or unusual evolution. These incongruities may have arisen through domain shuffling, gain and loss.

The human PRDM9 sequence below is annotated in color for domains relative to exon breaks. The protein is best understood as a concatenation of domains, not all of which need be present in antecedent and descendant homologs. The first two domains, KRAB and SSXRD, interact with transcription factors.

Each C2H2 domain (so named for the two cysteines and two histidines that ligand a structural zinc ion) recognizes a specific trinucleotide, more or less, so that a long concatenated array recognizes specific binding sites along the genome. However, tolerance of nucleotide variability and synergistic effects between adjacent units make it difficult to read out these sites precisely, despite immense effort.
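The finger-count arithmetic used later in this article (twelve fingers specifying 36 bp, seven fingers recognizing at most 21 bp) follows from the idealized one-finger-per-trinucleotide reading. A minimal sketch, assuming that idealization; the function name is hypothetical, and the result is only an upper bound on the footprint because real arrays tolerate mismatches and neighbouring fingers influence each other.

```python
def max_site_length(n_fingers: int, bp_per_finger: int = 3) -> int:
    """Idealized dna footprint (bp) of a C2H2 array: one trinucleotide per finger.

    Mismatch tolerance and context effects between adjacent fingers are ignored,
    so this is an upper bound on the recognized length, not a prediction.
    """
    if n_fingers < 0:
        raise ValueError("finger count cannot be negative")
    return n_fingers * bp_per_finger


if __name__ == "__main__":
    # Finger counts quoted in this article.
    for label, n in [("PRDM7_calJac", 12), ("PRDM9a_bosTau", 7)]:
        print(f"{label}: {n} fingers -> at most {max_site_length(n)} bp")
```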

PRDM7dot.gif

The concatenated C2H2 domains are conserved at the amino acid level and therefore necessarily similar at the dna level, which makes them prone to replication slippage. This process can introduce point mutations and also produces a peaked distribution of repeat numbers rather than a single fixed number. Many unrelated genes with internal repeats (such as the octapeptide region of the prion gene PRNP) are also affected by replication slippage. Such protein regions are conveniently identified genome-wide by mRNA dot plots.
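A dot plot of a sequence against itself shows internal repeats as off-diagonal lines whose spacing gives the repeat period. The sketch below is a bare-bones word-match version, not any particular published tool; the word length, hit threshold and function names are assumptions. The test string is a simplified run of PRNP-style octapeptide repeats written as amino acids for readability; an mRNA string would be handled identically.

```python
def dot_plot(seq: str, word: int = 3) -> list[list[bool]]:
    """Cell (i, j) is True when the length-`word` windows at i and j are identical."""
    n = len(seq) - word + 1
    return [[seq[i:i + word] == seq[j:j + word] for j in range(n)] for i in range(n)]


def repeat_offsets(seq: str, word: int = 3, min_hits: int = 2) -> list[int]:
    """Offsets d > 0 whose off-diagonal (i, i + d) carries at least `min_hits` matches.

    A tandem repeat of period p produces strong diagonals at d = p, 2p, ...
    """
    grid = dot_plot(seq, word)
    n = len(grid)
    offsets = []
    for d in range(1, n):
        hits = sum(grid[i][i + d] for i in range(n - d))
        if hits >= min_hits:
            offsets.append(d)
    return offsets


def render(seq: str, word: int = 3) -> str:
    """Text rendering of the dot plot: '*' for a word match, '.' otherwise."""
    return "\n".join("".join("*" if c else "." for c in row) for row in dot_plot(seq, word))


if __name__ == "__main__":
    octarepeats = "PHGGGWGQ" * 3
    print(repeat_offsets(octarepeats))   # [8, 16]: multiples of the period
    print(render(octarepeats))
```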

The C2H2 domains generally reside in a long distinctive terminal exon of splicing phase 2 that has been shuffled over mammalian evolutionary time into various contexts. Concepts such as paralogy and orthology need piecewise definitions in these composite proteins. Synteny (gene adjacency) plays a major role in reliably deconstructing events in specific lineages.

Here the unrelated, conserved, single-copy gene GAS8 plays an important role. PRDM7 lies immediately distal to it on the negative strand, so the two genes are convergently transcribed. PRDM7 is otherwise the last gene on the q arm of its chromosome in many species, which may predispose it to copy number dispersal events. PRDM9 is not consistently located within placental mammals, suggesting independent relocation events.

Both PRDM9 and PRDM7 contain a seldom-mentioned C2H2 domain early in the terminal exon; it is annotated by SwissProt and readily found by online domain tools regardless of species. This domain conserves the four critical residues needed for zinc binding (and so the associated fold) but lacks the terminal cap TGEKP, which otherwise serves to lock down a C2H2 zinc finger once it has scanned along genomic dna to an appropriate trinucleotide. The function of this early domain and of the following 112 residues is unknown, and no homologous 3D structure has ever been determined.

The first C2H2 of the main repeat region is proximally degenerate, beginning with VKY in all species (where the other fingers begin with YVC). The tyrosine cannot plausibly replace the usual cysteine for zinc binding, though the other three needed residues are present. This domain ends in a typical cap region TGEKP. Humans are the exception here: in the reference human genome the conserved helix-ending proline is replaced by leucine (TGEKL), with unknown functional consequences.
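The ligand and cap logic of the last two paragraphs can be checked mechanically. A minimal sketch, assuming the textbook C2H2 spacing C-x(2-4)-C-x12-H-x(3-5)-H (an assumption, not a rule stated in this article) and a TGEK-plus-any-residue cap; the segments are copied from the human PRDM9 reference sequence in this section, and the function names are hypothetical.

```python
import re

# Textbook C2H2 spacing C-x(2-4)-C-x12-H-x(3-5)-H (assumed).
C2H2_LIGANDS = re.compile(r"C.{2,4}C.{12}H.{3,5}H")
# TGEKP-type cap; the last residue is left open so that human TGEKL is caught too.
CAP = re.compile(r"TGEK.$")


def has_c2h2_ligands(segment: str) -> bool:
    return C2H2_LIGANDS.search(segment) is not None


def ends_in_cap(segment: str) -> bool:
    return CAP.search(segment) is not None


# Segments copied from the human PRDM9 reference sequence in this section.
SEGMENTS = {
    "early C2H2": "HPCPSCCLAFSSQKFLSQHVERNH",
    "repeat finger 1": "VKYGECGQGFSVKSDVITHQRTHTGEKL",
    "repeat finger 2": "YVCRECGRGFSWKSHLLIHQRIHTGEKP",
}

if __name__ == "__main__":
    for name, seg in SEGMENTS.items():
        print(f"{name:16} ligands: {has_c2h2_ligands(seg)!s:5}  cap: {ends_in_cap(seg)}")
```

The early domain passes the ligand test but has no cap, while the degenerate first repeat finger fails the ligand test (only one cysteine) yet still ends in a cap, matching the description above.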


>PRDM9_homSap Homo sapiens (human) Q9NQV7 10 exons chr5:23,509,579 span 18,301 bp KRAB SSXRD zinc knuckle SET early ZNF C2H2 cap
0 MSPEKSQEESPEEDTERTERKPM 0
0 VKDAFKDISIYFTKEEWAEMGDWEKTRYRNVKRNYNALITI 1
2 GLRATRPAFMCHRRQAIKLQVDDTEDSDEEWTPRQQ 1
2 VKPPWMALRVEQRKHQK 0
0 GMPKASFSNESSLKELSRTANLLNASGSEQAQKPVSPSGEASTSGQHSRLKL 1
2 ELRKKETERKMYSLRERKGHAYKEVSEPQDDDYL 1
2 YCEMCQNFFIDSCAAHGPPTFVKDSAVDKGHPNRSALSLPPGLRIGPSGIPQAGLGVWNEASDLPLGLHFGPYEGRITEDEEAANNGYSWL 0
0 ITKGRNCYEYVDGKDKSWANWMR 2
1 YVNCARDDEEQNLVAFQYHRQIFYRTCRVIRPGCELLVWYGDEYGQELGIKWGSKWKKELMAGR 1
2 EPKPEIHPCPSCCLAFSSQKFLSQHVERNHSSQNFPGPSARKLLQPENPCPGDQNQEQQYPDPHSRNDKTKGQEIKERSKLLNKRTWQREISRAFSSPPKGQMGSCRVGKRIMEEESRTGQKVNPGNTGKLFVGVGISRIAK
VKYGECGQGFSVKSDVITHQRTHTGEKL
YVCRECGRGFSWKSHLLIHQRIHTGEKP
YVCRECGRGFSWQSVLLTHQRTHTGEKP
YVCRECGRGFSRQSVLLTHQRRHTGEKP
YVCRECGRGFSRQSVLLTHQRRHTGEKP
YVCRECGRGFSWQSVLLTHQRTHTGEKP
YVCRECGRGFSWQSVLLTHQRTHTGEKP
YVCRECGRGFSNKSHLLRHQRTHTGEKP
YVCRECGRGFRDKSHLLRHQRTHTGEKP
YVCRECGRGFRDKSNLLSHQRTHTGEKP
YVCRECGRGFSNKSHLLRHQRTHTGEKP
YVCRECGRGFRNKSHLLRHQRTHTGEKP
YVCRECGRGFSDRSSLCYHQRTHTGEKP YVCREDE* 0
          -1 23  6           traditional numbering of dna-recognizing amino acids
HPCPSCCLAFSSQKFLSQHVERNH     alignment of early C2H2 domain
  *  *            *    *     zinc-liganding positions
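The numbering line above can be applied programmatically. A minimal sketch, assuming the textbook ligand spacing and the traditional helix numbering in which the first ligand histidine is position +7 and there is no position 0 (so -1 lies seven residues before that histidine); the function and constant names are hypothetical. It returns the residues at -1, 2, 3 and 6 and gives None for the degenerate first finger, whose ligand set is incomplete.

```python
import re
from typing import Optional

# Canonical ligand spacing (assumed); group 1 captures the first zinc-binding His.
LIGANDS = re.compile(r"C.{2,4}C.{12}(H).{3,5}H")
# Offsets from that His for helix positions -1, 2, 3 and 6 (His = +7, no position 0).
OFFSET_FROM_HIS = {-1: -7, 2: -5, 3: -4, 6: -1}


def contact_residues(finger: str) -> Optional[str]:
    """Residues at helix positions -1, 2, 3 and 6, or None for a degenerate finger."""
    m = LIGANDS.search(finger)
    if m is None:
        return None
    his = m.start(1)
    return "".join(finger[his + OFFSET_FROM_HIS[p]] for p in (-1, 2, 3, 6))


if __name__ == "__main__":
    human_fingers = [
        "VKYGECGQGFSVKSDVITHQRTHTGEKL",
        "YVCRECGRGFSWKSHLLIHQRIHTGEKP",
        "YVCRECGRGFSWQSVLLTHQRTHTGEKP",
        "YVCRECGRGFSRQSVLLTHQRRHTGEKP",
        "YVCRECGRGFRDKSHLLRHQRTHTGEKP",
    ]
    for f in human_fingers:
        print(f, contact_residues(f))
```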

Different segmental duplications relate PRDM9 and PRDM7

PRDM7segDup.gif

In humans, PRDM9 and PRDM7 are related by a 26 kbp segmental duplication that begins about 8 kbp upstream of the start codon and continues through most of the 3' UTR. Since the retroposon patterns are nearly identical, the duplication must be fairly recent. The overall percent identity of the non-coding dna is about 93%, again inconsistent with either early divergence (within stem placentals) or late divergence (post-chimpanzee). The duplication contains a potentially diagnostic 1845 bp retroposon-free region upstream of the first coding exon.

Note that PRDM7 is situated at the extreme tip of chromosome 16q, which may predispose it to chromosomal copy number rearrangements. The syntenic context is TUBB3+ DEFB+ AFG3L1+ DBNDD1- GAS8+ PRDM7- qTel, meaning that PRDM7 is transcribed convergently with GAS8, a non-homologous, highly conserved single-copy gene that is often detectable, even in low-coverage genomes, on the small contig containing PRDM7. This association has been extremely stable over boreoeutherian evolutionary time and so serves to reliably define PRDM7 orthologs and their spin-off copies. Elephants also have a gene pair similar to human PRDM9 and PRDM7. The former is at a syntenically novel site, while the latter is an old pseudogene that is still detectably adjacent to GAS8 in opposite orientation. It follows that 'PRDM9' in elephant is an independent, earlier spin-off of its conventional PRDM7 gene. This is consistent with telomeric susceptibility to repeated rearrangements.
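Orientation calls such as "transcribed convergently" can be read directly off a context string like the one above. A small sketch, assuming the genes are listed in chromosome order and that '+' means transcription in the listing direction; tokens without a strand suffix (here qTel) are skipped, and the function names are hypothetical.

```python
def parse_context(context: str) -> list[tuple[str, str]]:
    """'GAS8+ PRDM7- qTel' -> [('GAS8', '+'), ('PRDM7', '-')]; unsigned tokens are dropped."""
    return [(tok[:-1], tok[-1]) for tok in context.split() if tok[-1] in "+-"]


def relative_orientation(left: str, right: str) -> str:
    """Orientation of two neighbouring genes, given their strands left to right."""
    if left == right:
        return "tandem"
    # '+' then '-' means the two genes point toward each other (tail to tail).
    return "convergent" if (left, right) == ("+", "-") else "divergent"


if __name__ == "__main__":
    genes = parse_context("TUBB3+ DEFB+ AFG3L1+ DBNDD1- GAS8+ PRDM7- qTel")
    for (g1, s1), (g2, s2) in zip(genes, genes[1:]):
        print(f"{g1}{s1} {g2}{s2}: {relative_orientation(s1, s2)}")
```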

Recall here the actual definition of gene orthology: two genes in two species are orthologous if they are vertically descended from the same gene in their last common ancestor. Here the last common ancestor (LCA) of human and elephant is the ur-placental mammal, which had PRDM7 but no PRDM9. The two PRDM9 genes are thus not descended from a common ancestral PRDM9 gene but from parallel duplications of a common PRDM7 gene, at different times in different clades, during the course of mammalian speciation. Such genes are called in-paralogs within a given species and co-orthologs across species.
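The definition can be made operational with the standard gene-tree criterion (assumed here, since the text does not spell it out): two genes are orthologs when their last common ancestral node is a speciation and paralogs when it is a duplication. The tree below is a deliberately simplified, hypothetical encoding of the human/elephant scenario with intermediate speciation nodes collapsed; node and function names are illustrative only.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    name: str
    event: Optional[str] = None          # "speciation" or "duplication"; None for leaves
    children: List["Node"] = field(default_factory=list)


def path_to(node: Node, leaf: str) -> Optional[List[Node]]:
    """Nodes from `node` down to the named leaf, or None if it is absent."""
    if node.name == leaf:
        return [node]
    for child in node.children:
        sub = path_to(child, leaf)
        if sub is not None:
            return [node] + sub
    return None


def relation(tree: Node, a: str, b: str) -> str:
    """Classify two genes by the event at their last common ancestral node."""
    pa, pb = path_to(tree, a), path_to(tree, b)
    if pa is None or pb is None:
        raise KeyError("gene not in tree")
    lca = pa[0]
    for x, y in zip(pa, pb):
        if x is not y:
            break
        lca = x
    return "orthologs" if lca.event == "speciation" else "paralogs"


# Hypothetical, simplified tree: ur-placental PRDM7 splits at speciation, then
# each lineage duplicates PRDM7 independently.
TREE = Node("placental_PRDM7", "speciation", [
    Node("afrotherian_PRDM7", "duplication", [Node("loxAfr_PRDM7"), Node("loxAfr_PRDM9")]),
    Node("primate_PRDM7", "duplication", [Node("homSap_PRDM7"), Node("homSap_PRDM9")]),
])
```

Human PRDM9 comes out orthologous to elephant PRDM9 and to elephant PRDM7 alike, which is the many-to-many co-orthology described above, while the two human genes come out as (in-)paralogs.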

The syntenic context of PRDM9 is quite variable, supporting the scenario of multiple origins. This context can be used to count the number of distinct segmental duplications of PRDM7. For example, in humans PRDM9 lies in a retroposon-rich gene desert but is flanked, at the much larger scale of 7 Mbp, by two pairs of cadherin genes. In rhesus these same genes are seen (with some minor rearrangements), establishing that this PRDM9 segmental duplication preceded the divergence of Old World monkeys from apes.

Marmoset has a seemingly functional PRDM7 in the usual position facing GAS8, still at the extreme end of chromosome 20. The cadherin cluster is intact on chr2:178,954,165-180,696,523. However, Blastx of the intervening dna (which is similar in size to that of rhesus and human, so does not suggest large deletions) shows not even a hint of an old PRDM9 pseudogene. The assembly is gapless here, and Blastx is sensitive enough to detect very old pseudogenes provided they decayed by small indels and nucleotide substitutions. Thus it appears that PRDM7 never duplicated in the marmoset lineage, placing that event in the stem lineage leading to Old World monkeys and apes (or prior to tarsier divergence; the tarsier assembly has poor coverage). Note that the marmoset PRDM7 has a respectable terminal zinc finger array of twelve units, enough to specify 36 bp.

Gene  Strand Protein      Start     Species
CDH18    -   cadherin 18  19981287  homSap  ponAbe  macMul
CDH12    -   cadherin 12  22853731  homSap  ponAbe  macMul  calJac
PRDM9    +   human PRDM9  23528704  homSap  ponAbe  macMul  calJac
CDH10    -   cadherin 10  24644911  homSap  ponAbe  macMul  calJac
CDH9     -   cadherin 9   27038689  homSap  ponAbe  macMul

Strepsirrhine primates present a new complication. The Otolemur (galago) assembly has two distinct and seemingly functional PRDM7 copies (each with seven zinc fingers), each accompanied by GAS8 end-sequence in the expected opposite orientation. One of the GAS8 copies appears to be a pseudogene. This represents a new type of lineage-specific segmental duplication. There is no sign of PRDM9. The other strepsirrhine with an assembly, the mouse lemur Microcebus murinus, has only a single copy, again with seven zinc fingers. The only relevant contigs (ABDC01433247 and ABDC01371462) contain no coding syntenic information, so this gene cannot be assigned to PRDM7 with certainty.

The tree shrew assembly, like that of tarsier, has low coverage and yields only blast matches to zinc finger arrays that cannot be assigned to the PRDM family. This cannot be attributed entirely to low coverage, because many ordinary genes are satisfactorily represented in these species. Other factors such as telomeric position, gene copy number (mobility), pseudogenization, deletional loss, chimerization and individual heterozygosity must be affecting recovery of PRDM9 gene models in these species.

Moving on to laurasiatheres, Bos taurus presents a much more complicated situation. First, the GAS8 locus on chr18 contains the first two exons of a PRDM7 pseudogene in the expected orientation, but distal regions of the gene are completely deleted. The cadherin locus on chr20 is also intact, but the 2.6 Mbp region between CDH12 and CDH10 contains no indication of PRDM9, consistent with that segmental duplication being primate-specific and with PRDM7 occupying the older parental location. This holds in the Baylor 4.0 assembly carried at UCSC, in the Baylor 4.2 assembly, and in UMD3.1, an alternative assembly of the same data. The latter two can be queried with the genomic blast server at NCBI.

A third locus, on chr 1, hosts an unreviewed GenBank pipeline entry called PRDM9, derived as NW_003053109 from the alternative bovine assembly UMD3.1. Staff corrected an unspecified frameshift to fix the reading frame, a dangerous practice in a gene family so prone to pseudogenization. The gene, called PRDM9a here, resides at the extreme end of chromosome 1 and differs from the Baylor 4.0 assembly at two amino acids outside the zinc finger region. The syntenic context here is novel: EFHB- RAB5A+ PCAF+ ZNF596- PRDM9a-, which corresponds overall to human chr 3. The juxtaposition of two zinc finger proteins on the same strand causes PRDM9 alignments to extend spuriously into the 12 zinc fingers of ZNF596, jumping over its 5 earlier coding exons.

ZNF596 contains a KRAB domain but no SET methylase. Humans encode a best-blast protein of the same assigned name on chr 8 (77% identity). The early exons of ZNF596 can be appended to the end of PRDM9a to form an artificial probe for this association in other species, though the two genes are separated by a 43,400 bp spacer in cow, which is large relative to contig size in low-coverage assemblies. The sole fragmentary transcript, from yak testis (EF432551), is nearly identical to this PRDM9a, suggesting that the gene (and perhaps its syntenic location) became established prior to yak-cow divergence and is still functional. However, its array of seven zinc fingers could recognize a region of at most 21 bp.

ZNF596 did not arise from a PRDM9-like gene through loss of the SET domain, even though PRDM9 is one of its better matches within the large zinc finger family. Excluding the zinc finger domain, ZNF343, ZNF133 and ZNF169 give much higher blastp scores with ZNF596, as they also do when only the zinc finger arrays are compared. The juxtaposition of ZNF596 and PRDM9a is therefore likely coincidental rather than a consequence of inhomogeneous recombination between zinc fingers bringing PRDM9 to this site.

The fourth PRDM9 locus of interest, called PRDM9b here, is still not mapped to any bovine chromosome. It resides in contig DAAA02065087 in the UMD3.1 assembly and is temporarily assigned to chr Un.004.649 in the Baylor assembly. Here the reading frame in exon two can be restored if a run of 5 A's is corrected to 6 A's. That correction is made in the reference sequences here because a homopolymer length discrepancy of this kind is typically just sequencing error. The protein has a full set of domains (KRAB, SSXRD, SET, C2H2) with a moderate zinc finger array of five. Synteny cannot be determined for chr Un features, which can simply pool unrelated unplaceable contigs into a manageable unit. Flanking dna in DAAA02065087 maps to several places in the cow genome, suggesting that this feature has copy number attributes, perhaps of the telomeric repeat type. PRDM9b is not a recent feature, because it differs at a considerable number of amino acids from the other PRDM9 copies in the cow genome. These substitutions avoid highly conserved residues, which is not consistent with early pseudogenization. PRDM9b is capable of histone marking, but it is not clear whether that has functional significance for meiosis.

Yet another locus in the Baylor 4.0 assembly, called PRDM9c here, could not initially be placed on a cow chromosome. While such features are often assembly artefacts, this one is supported by a transcript from 4-cell embryos (GO353654), consistent with a role in or after meiosis. In UMD3.1, this gene has been placed on chr X. Despite a very large contig, no zinc fingers occur in any reading frame, suggesting that the gene was transferred here without its last exon (or that the exon was subsequently deleted). In any event, the penultimate exon does not have a phase 1 splice donor in the expected position and so terminates at the next downstream stop codon. The protein retains the KRAB, SSXRD and SET domains but lacks the ability to scan or bind dna. It has accrued various amino acid substitutions relative to the other bovine copies that rule out recent establishment.

Finally, two additional genes, denoted PRDM9d and PRDM9e here, are located as a parallel tandem pair in a higher-quality region of bovine chr X. These are 96% identical as proteins, consistent with one having been derived fairly recently from the other. Synteny here will not be informative until other ruminant genomes become available.

Overall, the situation in cow is very different from that in primates and rodents. Results there about the function of single-copy autosomal PRDM9 genes in meiotic markup can scarcely be carried over to a species with five seemingly intact genes, three of which are on chr X (which, intriguingly, can cross over with chr Y only within the very limited pseudoautosomal region).

The cow situation cannot be specific to the Hereford breed used for the genome project, because the PRDM9 copies are too diverged from one another outside the zinc finger region. Indeed, there is some suggestion from a non-NCBI sheep genome that it too has many of these copies. However, other cetartiodactyl genomes (dolphin, pig and alpaca) and other laurasiatheres (panda, dog, cat, shrew, bats) do not show these copies, suggesting that this complexity could be limited to pecoran ruminants. All-vs-all blastp percent identities are consistent with this, though rates of evolution in this gene family are hardly typical. This cannot be resolved with the cow genome alone, as no good candidate remains for the parent gene of all these copies. These results are summarized in the table below, where the last six columns give blastp percent identity to the named protein:

Gene    #ZNF  Status  Chr  Synteny   cDNA  Accession      9a_bosTau  9b_bosTau  9e_bosTau  9a_oviAri  9a_turTru  7_ailMel

PRDM7   -     pseudo  18   GAS8      no    none           --         --         --         --         --         --
PRDM9a  7     ok      1    ZNF596    yes   NW_003053109   100%       85%        81%        82%        76%        72%
PRDM9b  5     ok      ?    not det   no    DAAA02065087   81%        100%       78%        79%        72%        68%
PRDM9c  0     ok      X    not det   yes   XM_002699750   80%        80%        82%        83%        74%        73%
PRDM9d  9     ok      X    ---       no    none           80%        78%        96%        93%        73%        67%
PRDM9e  9     ok      X    ---       no    none           81%        78%        100%       93%        73%        68%

The role of CpG mutations

Human PRDM9 has 39 CpG sites in its coding exons, each a potential mutational hotspot. After attempted dna repair, these usually resolve to CpA or TpG. If not at a synonymous site, these changes alter the encoded amino acid. Some 28 of the CpG sites are at arginine CGn codons (of which the protein has 90 overall). These always result in a substitution: G -> A gives histidine from CGT and CGC and glutamine from CGG and CGA, while C -> T gives cysteine from CGT and CGC, tryptophan from CGG and a stop codon from CGA. These changes are in fact seen in many of the reference sequences. The display below shows wildtype human PRDM9 in the top line of each block and the effects of G -> A (CA) and C -> T (TG) in the next two lines; the zinc finger array is shown separately at the end. Note that position -1 is sensitive to the CpG hotspot effect, at least in human PRDM9 as it stands. However, the rapid evolution reported for the four dna-recognizing residues cannot be primarily attributed to the CpG effect. The terminal partial finger YVCREDE* is commonly altered to YICREDE*, but this is likely insufficient for loss of function.

PRDM9_homSapWT   MSPEKSQEESPEEDTERTERKPMVKDAFKDISIYFTKEEWAEMGDWEKTRYRNVKRNYNALITIGLRATRPAFMCHRRQAIKLQVDDTEDSDEEWTPRQQVKPPWMALRVEQRKHQKGMPKASFSNESSLKELSRTANLLNASGSEQAQKPVSPSGEASTSGQHSRLKLELRKKETERKM
PRDM9_homSapCA   ...................Q.............................H...................Q......Q...................................H...................................................................
PRDM9_homSapTG   ...................W.............................C...................*......*...................................C........V..........................................................

PRDM9_homSapWT   YSLRERKGHAYKEVSEPQDDDYLYCEMCQNFFIDSCAAHGPPTFVKDSAVDKGHPNRSALSLPPGLRIGPSGIPQAGLGVWNEASDLPLGLHFGPYEGRITEDEEAANNGYSWLITKGRNCYEYVDGKDKSWANWMRYVNCARDDEEQNLVAFQYHRQIFYRTCRVIRPGCELLVWYGDE
PRDM9_homSapCA   ...Q...........K........................................H.........................................Q....K......................................Q.....................Q...............
PRDM9_homSapTG   ...*............L.......................................C..............................L..........*...........................................W.....................*...............
                                                    
PRDM9_homSapWT   YGQELGIKWGSKWKKELMAGREPKPEIHPCPSCCLAFSSQKFLSQHVERNHSSQNFPGPSARKLLQPENPCPGDQNQEQQYPDPHSRNDKTKGQEIKERSKLLNKRTWQREISRAFSSPPKGQMGSCRVGKRIMEEESRTGQKVNPGNTGKLFVGVGISRIAK
PRDM9_homSapCA   .S..............................................H.....................................H............................................................................
PRDM9_homSapTG   ................................................C.....................................C............................................................................

..........-1.23..6..........   ..........-1.23..6..........   ..........-1.23..6..........   ..........-1.23..6..........
VKYGECGQGFSVKSDVITHQRTHTGEKL   YVCRECGRGFSRQSVLLTHQRRHTGEKP   YVCRECGRGFRDKSHLLRHQRTHTGEKP
............................   .......Q...Q................   .......Q..HN................
............................   .......W...W................   .......W..C.................
YVCRECGRGFSWKSHLLIHQRIHTGEKP   YVCRECGRGFSWQSVLLTHQRTHTGEKP   YVCRECGRGFRDKSNLLSHQRTHTGEKP
.I.....Q....................   .......Q....................   .......Q....................
.......W....................   .......W....................   .......W....................
YVCRECGRGFSWQSVLLTHQRTHTGEKP   YVCRECGRGFSWQSVLLTHQRTHTGEKP   YVCRECGRGFSNKSHLLRHQRTHTGEKP
.......Q....................   .......Q....................   .......Q....................
.......W....................   .......W....................   .......W....................
YVCRECGRGFSRQSVLLTHQRRHTGEKP   YVCRECGRGFSNKSHLLRHQRTHTGEKP   YVCRECGRGFRNKSHLLRHQRTHTGEKP   YVCRECGRGFSDRSSLCYHQRTHTGEKP   YVCREDE
.......Q...Q................   .......Q....................   .......Q..H.................   .I.....Q...N................   .I.....
.......W...W................   .......W....................   .......W..C.................   .......W....................   .......
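The arginine rules given at the start of this section can be checked against the genetic code. A minimal sketch, assuming the standard code and considering only the CpG formed by the first two positions of a CGN codon; the function name is hypothetical.

```python
# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}


def cpg_outcomes(codon: str) -> dict[str, str]:
    """Amino acid before and after each CpG transition in a CGN (arginine) codon.

    'G->A' turns the CpG into CpA; 'C->T' turns it into TpG ('*' = stop).
    """
    if codon[:2] != "CG":
        raise ValueError("expected a CGN codon")
    return {
        "wildtype": CODE[codon],
        "G->A": CODE["CA" + codon[2]],
        "C->T": CODE["TG" + codon[2]],
    }


if __name__ == "__main__":
    for codon in ("CGT", "CGC", "CGA", "CGG"):
        print(codon, cpg_outcomes(codon))
```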

Excluding pseudogenes, a weblogo from an alignment of the remaining placental PRDM7 and PRDM9 genes illustrates the location of potential CpG mutations relative to conserved residues. These would be expected to arise as relatively high-frequency disease alleles. In the initial KRAB domain, the potentially affected arginines are not especially well conserved. However, at the first site neither histidine nor cysteine belongs to the reduced alphabet (the set of residues observed at that position), so these changes are unlikely to be tolerated. At the second and third sites, glutamine does occur as a secondary residue: at the second site in some ruminants (cow, sheep and muntjac) and at the third in murid rodents. These changes are thus borderline for adverse effects on functionality.

KRAB9logo.png
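The tolerance argument above, that a substitution is unlikely to be tolerated when the new residue is absent from the reduced alphabet of its position, can be sketched as below. The alignment column is a made-up placeholder, not data from the PRDM7/PRDM9 alignment, and the height measure is the plain log2(20) minus Shannon entropy used by sequence logos, without the small-sample correction that logo software normally applies. Function names are hypothetical.

```python
from collections import Counter
from math import log2


def reduced_alphabet(column: str) -> set[str]:
    """Residues observed in one alignment column, gaps excluded."""
    return {aa for aa in column if aa != "-"}


def plausibly_tolerated(column: str, new_residue: str) -> bool:
    """Heuristic: a substitution is plausible only if the new residue is already
    seen at this position in some (non-pseudogene) homolog."""
    return new_residue in reduced_alphabet(column)


def column_bits(column: str) -> float:
    """Logo-style column height: log2(20) minus the Shannon entropy of the column."""
    counts = Counter(aa for aa in column if aa != "-")
    total = sum(counts.values())
    entropy = -sum(n / total * log2(n / total) for n in counts.values())
    return log2(20) - entropy


if __name__ == "__main__":
    column = "RRRRKRRQRR-R"          # hypothetical column at a CpG arginine
    for new in ("H", "C", "Q"):
        print(new, plausibly_tolerated(column, new))
    print(round(column_bits(column), 2))
```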

In terms of potentially protective upstream CpG islands, PRDM9 has none. Three occur somewhat near the start of PRDM7 but do not extend into the coding region and may not be associated with this gene at all. Thus the CpG cytosines in both coding regions are presumably methylated, rendering them susceptible to hotspot mutation. The composite snapshot below from the UCSC human genome browser shows CpG islands relative to the two genes.

CpGislandsPR.gif
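Whether a stretch of sequence counts as a CpG island is usually judged from its length, GC content and observed/expected CpG ratio. A minimal sketch using the classic Gardiner-Garden and Frommer thresholds (at least 200 bp, GC above 50%, observed/expected above 0.6); browser tracks such as UCSC's apply their own refinements, so this will not reproduce them exactly. Function names are hypothetical.

```python
def cpg_stats(seq: str) -> dict[str, float]:
    """Length, CpG count, GC fraction and observed/expected CpG ratio."""
    s = seq.upper()
    n = len(s)
    c, g = s.count("C"), s.count("G")
    cpg = s.count("CG")                       # CG occurrences cannot overlap
    gc = (c + g) / n if n else 0.0
    obs_exp = cpg * n / (c * g) if c and g else 0.0
    return {"length": n, "cpg": cpg, "gc": gc, "obs_exp": obs_exp}


def is_cpg_island(seq: str) -> bool:
    """Gardiner-Garden and Frommer style test with the thresholds given above."""
    st = cpg_stats(seq)
    return st["length"] >= 200 and st["gc"] > 0.5 and st["obs_exp"] > 0.6


if __name__ == "__main__":
    print(cpg_stats("ACGTTGCA"))
    print(is_cpg_island("CG" * 150), is_cpg_island("AT" * 150))
```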


Structural considerations in C2H2 zinc fingers

High resolution structures of C2H2 zinc finger domains have been available for decades. As the name suggests, the divalent zinc ion locks the two cysteines and two histidines into a rigid geometry, providing a core conformation that a small peptide of 28 residues could not otherwise stably assume. Note that in the unbound state the finger tips must retain flexibility while the domain ensemble scans the genome for dna sequences appropriate to its function. Each finger binds a trinucleotide, in effect making a zinc finger the protein counterpart of a tRNA anticodon. However, overall binding is not a simple read-off code, because adjacent fingers alter each other's specificities in subtle ways.

The linker region TGEKP plays a key role when the correct DNA sequence is encountered, snap-locking its finger down onto its target by capping the C-terminus of its alpha helix. A hydrogen bond between the first threonine and middle glutamate is key to this binding-induced conformational shift. From comparative genomics, it appears that a serine in first position can also form this hydrogen bond. The role of the glycine is to stay out of the way; the lysine counterbalances the negative charge of the glutamate; the proline terminates any helical propensity, allowing a fresh start in the adjacent finger.

While this motif is immensely conserved within the C2H2 zinc fingers of PRDM9 homologs, exceptions do occur. It is important to understand these, because loss of dna lock-down could loosen or even eliminate trinucleotide binding specificity. Such steps might represent initial stages of pseudogenization. However, many exceptions occur within the first or last fingers. It is also common for fragmentary and imperfect motifs to end the protein, sometimes continuing in another reading frame past the current stop codon.

When aligning zinc finger motifs, the breaks should always be put at the end of the linker region. It is completely illogical to break at the first cysteine, as some authors do, because capping by the linker region is specific to its own zinc finger, not the following one.
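The convention recommended above, cutting after the linker rather than before the first cysteine, is easy to apply mechanically. A minimal sketch, assuming a linker pattern of T or S, then GEK, then any residue (serine per the hydrogen-bond note above, with an open last position so that human TGEKL is still recognized); function names are hypothetical. The input is the first three fingers and the terminal partial finger of the human PRDM9 array given earlier.

```python
import re

# Linker: T or S, then GEK, then any residue (assumed pattern, see lead-in).
LINKER = re.compile(r"[TS]GEK.")


def split_after_linkers(array: str) -> list[str]:
    """Cut a concatenated zinc finger array immediately after each linker."""
    pieces, start = [], 0
    for m in LINKER.finditer(array):
        pieces.append(array[start:m.end()])
        start = m.end()
    if start < len(array):
        pieces.append(array[start:])        # trailing partial finger, if any
    return pieces


def noncanonical_caps(pieces: list[str]) -> list[tuple[int, str]]:
    """1-based indices of pieces that end in a linker other than TGEKP."""
    out = []
    for idx, piece in enumerate(pieces, start=1):
        cap = piece[-5:]
        if LINKER.fullmatch(cap) and cap != "TGEKP":
            out.append((idx, cap))
    return out


if __name__ == "__main__":
    array = ("VKYGECGQGFSVKSDVITHQRTHTGEKL"
             "YVCRECGRGFSWKSHLLIHQRIHTGEKP"
             "YVCRECGRGFSWQSVLLTHQRTHTGEKP"
             "YVCREDE")
    pieces = split_after_linkers(array)
    for p in pieces:
        print(p)
    print(noncanonical_caps(pieces))        # [(1, 'TGEKL')]
```

Each piece then carries its own cap, so cap variants such as the human TGEKL are attributed to the finger they actually lock down.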

Predicting dna binding sites of zinc finger domains

PRDM9onDNA.jpg


The zinc knuckle preceding the PR (SET) domain

A 2011 crystallographic study establishes that a short motif YC..C.......C..HGP found in 6 members of the human PRDM gene family binds zinc via the 3 cysteines and the histidine. The fold most closely resembles the previously known RanBP2 zinc finger domain, which occurs in some 21 human proteins, notably the nucleoporins NUP153 and NUP358 as well as NPL4, EWS, TLS, RBP56, RBM5, RBM10, TEX13A, RANDB2 and ZRANB2. Not all these domains are necessarily homologous, because the fold is small and zinc fingers seem to have evolved numerous times. Such fingers can bind other proteins, ssRNA and likely DNA. Their function in PRDM genes is completely unknown, but the aromatic residue preceding the first cysteine may contribute to pi-stacking with guanine bases.

KnuckleSET.jpg

The domain begins at a phase 2 exon, meaning that the first codon letter is borrowed from the preceding exon's splice donor. A dozen residues from the end of that preceding exon are also included but do not exhibit any conservation outside their orthology class. In most cases the knuckle-domain exon also contains a downstream PR(SET) domain, but at variable intervening lengths (the distances shown are to the conserved FGP in the center of the PR(SET) domain). The function of these intervening residues is unknown.

exon 6     splice exon 7                     SET  gene name
IPLNQHTSDPNN 1 2 RCDMCADNRNGECPMHGPLHSLRRLVG .49. PRDM6_homSap
PDPPRPFDPHDL 1 2 WCEECNNAHASVCPKHGPLHPIPNRPV .16. PRDM10_homSap
MAEDGSEEIMFI 1 2 WCEDCSQYHDSECPELGPVVMVKDSFV .99. PRDM15_homSap
GSKENMATLFTI 1 2 WCTLCDRAYPSDCPEHGPVTFVPDTPI .36. PRDM4_homSap
IVPKSFQQVDFW 1 2 FCESCQEYFVDECPNHGPPVFVSDTPV .42. PRDM11_homSap
KEVSEPQDDDYL 1 2 YCEMCQNFFIDSCAAHGPPTFVKDSAV .42. PRDM9_homSap
KEISEPQDDDYL 1 2 YCEMCQNFFIDSCAAHGPPTFVKDSAV .42. PRDM7_homSap
QEIWDPQDDDYL 1 2 YCEECQTFFLETCAVHGPPKFVQDSVM .42. PRDM7_monDom
NENYRPEDDDYL 1 2 YCEICQTFFLEKCVLHGPPVFVQDLPV .42. PRDM7_ornAna
EEQDDTFNDQPF 1 2 YCEMCQQHFIDQCETHGPPSFTCDSPA .42. PRDM7_danRer
TEEEELRDEEYF 1 2 FCEECKSFFIEECELHGPPLFIPDTPA .42. PRDM7_salSal
IKEEEADVKDFL 1 2 YCEVCKSVFFSKCEVHGPALFIADSPV .42. PRDM7_ictPun
                YVCRECGRGFSWQSVLLTHQRTHTGEKP comparison to longer zinc finger in main array of PRDM7/9
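The knuckle spacing can be read off the exon 7 sequences tabulated above: in every row the three cysteines fall at C-x2-C-x7-C, and the fourth-ligand slot sits two residues later, immediately before GP. A minimal sketch that scans for this spacing (the pattern is inferred from the table, not taken from the crystallographic study) and reports the residue in that slot; the function name is hypothetical and only a subset of rows is included.

```python
import re

# Spacing inferred from the tabulated exon 7 sequences: C-x2-C-x7-C-x2-(slot)-G-P.
# The slot that normally holds the histidine ligand is captured.
KNUCKLE = re.compile(r"C..C.{7}C..(.)GP")

EXON7 = {
    "PRDM6_homSap":  "RCDMCADNRNGECPMHGPLHSLRRLVG",
    "PRDM10_homSap": "WCEECNNAHASVCPKHGPLHPIPNRPV",
    "PRDM15_homSap": "WCEDCSQYHDSECPELGPVVMVKDSFV",
    "PRDM9_homSap":  "YCEMCQNFFIDSCAAHGPPTFVKDSAV",
    "PRDM7_danRer":  "YCEMCQQHFIDQCETHGPPSFTCDSPA",
}


def histidine_slot(seq: str):
    """Residue in the fourth-ligand slot, or None if the spacing is not found."""
    m = KNUCKLE.search(seq)
    return m.group(1) if m else None


if __name__ == "__main__":
    for name, seq in EXON7.items():
        print(name, histidine_slot(seq))
```

Human PRDM15, with ELGP in place of HGP, is the one tabulated sequence whose slot is not histidine, while an ordinary main-array finger does not match the knuckle spacing at all.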

Structural alignment of all PRDM proteins

To determine the evolutionary relationships of the 16 human PRDM genes, it is useful (given the great divergence in primary sequence) to consider rare genomic events such as intron gain/loss and indels. Only 7 of the 16 contain the knuckle region. Of these, PRDM11 is the most closely related to PRDM9 (apart from PRDM7). This is fortunate because the 3D structure of PRDM11 was recently determined (PDB: 3RAY) from before the knuckle region on into the final exon, thus allowing threading of PRDM9 (whose structure has not been studied). The dozen-odd conserved patches in these widely diverged paralogs find their explanation in the atomic details of this structure.

The knuckle region apparently represents a one-time domain acquisition relative to a knuckle-less ancestral state. The date of this event relative to species phylogeny and the source of the domain are unclear (it is very unlikely to have evolved in situ). Similarly, the internal phase 00 intron is ancestral even though it breaks up a coherent structural domain. Note that the final 12 intron is also ancestral: the PR(SET) domain never occurs without it, even though zinc fingers are not always found in the next exon. However, the later 21 intron is a more recently acquired feature specific to PRDM9 and its closest associates, post-dating acquisition of the knuckle domain and pre-dating duplication and divergence of the PRDM7/9 group. This again follows from gene tree and parsimony considerations.

[Figure: PRDMs.gif, alignment of all human proteins with a PR(SET) domain]


Legend for alignment above of the full set of human proteins with a PR(SET) domain:
gapping: uncertain between conserved markers
underlining: magenta coloring shows non-informative idiosyncratic introns
knuckle: shortened zinc finger motif
C2H2: terminal zinc finger region following universal phase 12 intron
0: indel unifying PRDM9/7/11, cannot be resolved as insertion or deletion
1: arginine supporting PRDM6 as outgroup to the knuckle subgroup
2: near-universal motif SLP
3: near-universal motif GF
4: indel unifying PRDM9/7/11, resolvable as an insertion
5: near-universal motif FGP
6: near-universal motif WLI split by universal phase 00 intron
7: near-universal motif NWMrYV split by phase 21 intron gained by PRDM9/7/11/4
8: inexplicable repositioning of 6 residues to previous exon in PRDM4
9: near-universal motif EQNL
10: near-universal motif IFY
11: near-universal motif ELLVWY
12: possible synapomorphy grouping first 9 genes
PRDM3: inexplicably has official gene name MECOM
PRDM16: CVDANQAGAG insertion removed from ISEDLGSEKFCVDANQAGAGSWLKYIRVA
PRDM15: duplicated diverged exon removed 21 SWPASGHVHTQAGQGMRGYEDRDRADPQQLPEAVPAGLVRRLSGQQLPCRSTLTWGRLCHLVAQGR
iM: initial methionine, protein too short for further comparison
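Whether a motif is "near-universal" can be checked mechanically. A minimal sketch, assuming alignment gaps are written as '.' as in the trimmed sequences listed below; only three of those sequences are embedded here, and plain substring matching stands in for the column-by-column comparison an alignment would support (joining residues across removed gaps could, in principle, create a spurious match).

```python
from typing import Dict, List

# Near-universal motifs from the legend (legend item numbers in comments).
MOTIFS = ["SLP",     # 2
          "FGP",     # 5
          "WLI",     # 6
          "EQNL",    # 9
          "IFY",     # 10
          "ELLVWY"]  # 11

# Three of the trimmed sequences from the full-PRDM set below, copied verbatim.
TRIMMED = {
    "PRDM9": "SLPPGLRIGPSGI.PQAGLGVWNEASDLPLGLHFGPYEGRIT...EDEEAANNGYSWLITKGR.NCYEYVDGKDKSWANWMRYVNCARDDEEQNLVAFQYHRQIFYRTCRVIRPGCELLVWYGDEY",
    "PRDM16": "PIPADFELRESSIPGAGL.GVWAK.RKMEAGERLGPCVVVPR.....AAAKETDFGWEQILTDVEVSPQGCDLGSEKFSWLKYIRVACSCDDQNLTMCQISEQIYYKVIKDIEPGEELLVHVKEGV",
    "PRDM13": "iMHGAARAPATSVSAD.CCIPAGLRLGPVPGTFKLGKYLSDRREPGPKKKVRMVRGE...LVDESGGSPLEWIGLIRAARNSQEQTLEAIADLPQIFYRALRDVQPGEELTVWYSNSL",
}


def ungapped(seq: str) -> str:
    """Drop alignment gaps ('.') and spaces, and fold case so lowercase residues still match."""
    return seq.replace(".", "").replace(" ", "").upper()


def motif_table(seqs: Dict[str, str], motifs: List[str]) -> Dict[str, List[str]]:
    """Map each sequence name to the motifs found in its ungapped sequence, in motif order."""
    return {name: [m for m in motifs if m in ungapped(s)] for name, s in seqs.items()}


for name, found in motif_table(TRIMMED, MOTIFS).items():
    print(name, found)
```

Exact matching is strict: PRDM9 carries all six motifs, but PRDM13 carries only IFY, and PRDM16 none, even though PRDM16 has close variants such as DQNL and QIYY that an exact search misses.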

The alignment above can be restricted to just the alignable residues of the knuckle-containing PRDMs, which allows idiosyncratic insertions to be removed. This in turn removes much of the noise when the data are used to determine a gene tree. The second set of sequences provides a similarly trimmed and edited set for the full set of PRDMs.
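One simple way to turn trimmed, aligned sequences like these into input for a gene tree is to drop every column that contains a gap in any sequence and then compute pairwise p-distances (the fraction of differing residues). The sketch below does this for three sequences copied from the full-PRDM set below. Complete gap deletion and uncorrected p-distance are assumptions chosen for illustration, not the method used to build these alignments; a real analysis would use a substitution model and a tree-building program.

```python
from itertools import combinations
from typing import Dict, Optional

# PRDM9, PRDM7 and PRDM11 from the full-PRDM set below, in FASTA form.
FASTA = """\
>PRDM9
SLPPGLRIGPSGI.PQAGLGVWNEASDLPLGLHFGPYEGRIT...EDEEAANNGYSWLITKGR.NCYEYVDGKDKSWANWMRYVNCARDDEEQNLVAFQYHRQIFYRTCRVIRPGCELLVWYGDEY
>PRDM7
SLPPGLRIGPSGI.PQAGLGVWNEASDLPLGLHFGPYEGRIT...EDEEAANSGYSWLITKGR.NCYEYVDGKDKSSANWMRYVNCARDDEEQNLVAFQYHRQIFYRTCRVIRPGCELLVWYGDEY
>PRDM11
TIPQGMEVVKDTS...GESDVRCVNEVIPKGHIFGPYEGQIS....TQDKSAGFFSWLIVDKN.NRYKSIDGSDETKANWMRYVVISREEREQNLLAFQHSERIYFRACRDIRPGEWLRVWYSEDY
"""


def parse_fasta(text: str) -> Dict[str, str]:
    """Minimal FASTA reader: the name is the first word after '>'."""
    records: Dict[str, str] = {}
    name: Optional[str] = None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            name = line[1:].split()[0]
            records[name] = ""
        elif line and name is not None:
            records[name] += line.replace(" ", "")  # spaces mark exon boundaries, not residues
    return records


def drop_gap_columns(aln: Dict[str, str], gap: str = ".") -> Dict[str, str]:
    """Keep only alignment columns in which no sequence has a gap."""
    lengths = {len(s) for s in aln.values()}
    if len(lengths) != 1:
        raise ValueError("sequences are not aligned to a common length")
    width = lengths.pop()
    keep = [i for i in range(width) if all(s[i] != gap for s in aln.values())]
    return {name: "".join(s[i] for i in keep) for name, s in aln.items()}


def p_distance(a: str, b: str) -> float:
    """Fraction of aligned positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


aln = drop_gap_columns(parse_fasta(FASTA))
for x, y in combinations(aln, 2):
    print(f"{x}\t{y}\t{p_distance(aln[x], aln[y]):.3f}")
```

The resulting distance matrix could then be passed to a distance method such as neighbor joining; that step is left out to keep the sketch self-contained.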

>PRDM9  
YCEMCQNFFIDSCAAHGPPTFVKDSAVDKGHPNRSALSLPPGLRIGPSGIPQAGLGVWNEASDLPLGLHFGPYEGRITEDEEAANNGYSWL ITKGR.NCYEYVDGKDKSWANWMR YVNCARDDEEQNLVAFQYHRQIFYRTCRVIRPGCELLVWYGDEY
>PRDM7  
YCEMCQNFFIDSCAAHGPPTFVKDSAVDKGHPNRSALSLPPGLRIGPSGIPQAGLGVWNEASDLPLGLHFGPYEGRITEDEEAANSGYSWL ITKGR.NCYEYVDGKDKSSANWMR YVNCARDDEEQNLVAFQYHRQIFYRTCRVIRPGCELLVWYGDEY
>PRDM11 
FCESCQEYFVDECPNHGPPVFVSDTPVPVGIPDRAALTIPQGMEVVKDTS..GESDVRCVNEVIPKGHIFGPYEGQIS.TQDKSAGFFSWL IVDKN.NRYKSIDGSDETKANWMR YVVISREEREQNLLAFQHSERIYFRACRDIRPGEWLRVWYSEDY
>PRDM4  
WCTLCDRAYPSDCPEHGPVTFVPDTPIE....SRARLSLPKQLVLRQSIV.GAEVGVWTG.ETIPVRTCFGPLIGQQSAEWTDKAVNHIWK IYHNG.VLEFCIITTDENECNWMM FVRKARNREEQNLVAYPHDGKIFFCTSQDIPPENELLFYYSRDY
>PRDM10 
WCEECNNAHASVCPKHGPLHPIPNRPVL....TRARASLPLVLYIDRFLG.....GVFSK.RRIPKRTQFGPVEGPLVRGSELKDCYIHLK VSLDKGDRKERDLLSDETLCNWMM FVRPAQNHLEQNLVAYQYGHHVYYTTIKNVEPKQELKVWYAASY
>PRDM15 
WCEDCSQYHDSECPELGPVVMVKDSFVL....SRARSSLPPNLEIRRLED.GAE.GVFAI.TQLVKRTQFGPFESRRV.AKWEKESAFPLK VFQKDGHPVCF.DTSNEDDCNWMM LVRPAAEAEHQNLTAYQHGSDVYFTTSRDIPPGTELRVWYAAFY
>PRDM6  
RCDMCADNRNGECPMHGPLHSLRRLVGT...SSAALRDLPREVCLCTSTVPGLAYGICAA.QRIQQGTWIGPFQGVLLQAGAVRNTQHLWE IYDQDGTLQHFIDGGEPSKSSWMR YIRCARHCGEQNLTVVQYRSNIFYRACIDIPRGTELLVWYNDSY
>PRDM9
SLPPGLRIGPSGI.PQAGLGVWNEASDLPLGLHFGPYEGRIT...EDEEAANNGYSWLITKGR.NCYEYVDGKDKSWANWMRYVNCARDDEEQNLVAFQYHRQIFYRTCRVIRPGCELLVWYGDEY
>PRDM7
SLPPGLRIGPSGI.PQAGLGVWNEASDLPLGLHFGPYEGRIT...EDEEAANSGYSWLITKGR.NCYEYVDGKDKSSANWMRYVNCARDDEEQNLVAFQYHRQIFYRTCRVIRPGCELLVWYGDEY
>PRDM11
TIPQGMEVVKDTS...GESDVRCVNEVIPKGHIFGPYEGQIS....TQDKSAGFFSWLIVDKN.NRYKSIDGSDETKANWMRYVVISREEREQNLLAFQHSERIYFRACRDIRPGEWLRVWYSEDY 
>PRDM4
SLPKQLVLRQSIV..GAEVGVWTG.ETIPVRTCFGPLIGQQSMEVAEWTDKAVNHIWKIYHNG.VLEFCIITTDENECNWMMFVRKARNREEQNLVAYPHDGKIFFCTSQDIPPENELLFYYSRDY
>PRDM10
SLPLVLYIDRFLG......GVFSK.RRIPKRTQFGPVEGPLV...RGSELKDCYIHLKVSLDKGDRKERDLLSDETLCNWMMFVRPAQNHLEQNLVAYQYGHHVYYTTIKNVEPKQELKVWYAASY
>PRDM15
SLPPNLEIRRLED..GAE.GVFAI.TQLVKRTQFGPFESRRV....AKWEKESAFPLKVFQKDGHPVCF.DTSNEDDCNWMMLVRPAAEAEHQNLTAYQHGSDVYFTTSRDIPPGTELRVWYAAFY
>PRDM6
DLPREVCLCTSTVP.GLAYGICAA.QRIQQGTWIGPFQGVLLEKVQAGAVRNTQHLWEIYDQDGTLQHFIDGGEPSKSSWMRYIRCARHCGEQNLTVVQYRSNIFYRACIDIPRGTELLVWYNDSY
>PRDM14
QLPEGLCLMQTVFGEVPHFGVFCS.SFIAKGVRFGPFQGKVVNASEVKTYGDNSVMWEIFED.GHLSHFIDGK.GGTGNWMSYVNCARFPKEQNLVAVQCQGHIFYESCKEIHQNQELLVWYGDCY
>PRDM1
SLPRNLLFKYATN.SEEVIGVMSK.EYIPKGTRFGPLIGEIYTNDTVPKNANRKYFWRIYSR.GELHHFIDGFNEEKSNWMRYVNPAHSPREQNLAACQNGMNIYFYTIKPIPANQELLVWYCRDF
>PRDM2
GLPEEVR.LFPSAVDKTRIGVWAT.KPILKGKKFGPFVGDKK...KRSQVKNNVYMWEVYYP.NLGWMCIDATDPEKGNWLRYVNWACSGEEQNLFPLEINRAIYYKTLKPIAPGEELLVWYNGED
>PRDM12
VLPAEVIIAQSSIPGEGL.GIFSK.TWIKAGTEMGPFTGRVIAPEHVDICKNNNLMWEVFNEDGTVRYFIDASQEDHRSWMTYIKCARNEQEQNLEVVQIGTSIFYKAIEMIPPDQELLVWYGNSH
>PRDM5
GMYVPDRFSLKSSRVQDGMGLYTA.RRVRKGEKFGPFAGEKRMPEDLDENMDYRLMWEVRGSKGEVLYILDATNPRHSNWLRFVHEA.PSQEQKNLAAIQEGEIFYLAVEDIETDTELLIGYLDSD
>PRDM16
PIPADFELRESSIPGAGL.GVWAK.RKMEAGERLGPCVVVPR.....AAAKETDFGWEQILTDVEVSPQGCDLGSEKFSWLKYIRVACSCDDQNLTMCQISEQIYYKVIKDIEPGEELLVHVKEGV
>PRDM3
PIPAEFELRESNMPGAGL.GIWTK.RKIEVGEKFGPYVGEQRS.....NLKDPSYGWEILDEFYNVKFCIDASQPDVGSWLKYIRFAG.CYDQHNLVACQINQIFYRVVADIAPGEELLLFMKSED
>PRDM13
iMHGAARAPATSVSAD.CCIPAGLRLGPVPGTFKLGKYLSDRREPGPKKKVRMVRGE...LVDESGGSPLEWIGLIRAARNSQEQTLEAIADLPQIFYRALRDVQPGEELTVWYSNSL
>PRDM8
GIWDGDAKAVQQCLTDIFTSVYTT.CDIPENAIFGPCVLSHTIALKSTDKRTVPYIFRVDTSAANGSSEGL.......MWLRLVQSARDKEEQNLEAYIKNGQLFYRSLRRIAKDEELLVWYGKEL

Curated reference sequences

The sequences below have largely been compiled from genome projects; only rarely do validating transcripts exist at GenBank. Sequences with a single frameshift or other glitch have been edited to yield full-length proteins, on the theory that the error reflects either an atypical individual chosen for sequencing or a sequencing error in a low-coverage project within a difficult region. However, such sequences may instead reflect early stages of pseudogenization. Other sequences are clearly pseudogenes; for these, the recognizable exons have been collected to allow rough dating of the loss of function.
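Editing out a single frameshift amounts to finding a one-base indel that restores an open reading frame. A minimal sketch, using only the standard genetic code and a made-up 22-base toy sequence (not taken from any record): it tries every single-base deletion and keeps those whose translation starts with Met and has no internal stop. Several positions usually qualify, so in practice the choice has to be constrained by other evidence, such as a homopolymer run (as in the rabbit "ttt to tt" fix) or the sequencing traces.

```python
from typing import List

# Standard genetic code with codons enumerated in TCAG order (first, second, third base).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = {a + b + c: AMINO[16 * i + 4 * j + k]
          for i, a in enumerate(BASES)
          for j, b in enumerate(BASES)
          for k, c in enumerate(BASES)}


def translate(dna: str) -> str:
    """Translate complete codons in frame 0; stops become '*', unknown codons 'X'."""
    dna = dna.upper()
    return "".join(CODONS.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3))


def is_open(protein: str) -> bool:
    """True if the protein starts with Met and has no stop before its final residue."""
    return protein.startswith("M") and "*" not in protein[:-1]


def single_deletion_repairs(dna: str) -> List[int]:
    """Positions whose one-base deletion yields an open reading frame."""
    return [i for i in range(len(dna)) if is_open(translate(dna[:i] + dna[i + 1:]))]


# Toy ORF ATG AAA TTT GAC TGG AAA TAA with one extra T inserted in the poly-T run.
toy = "ATGAAATTTTGACTGGAAATAA"
print(translate(toy))  # MKF*LEI
print(single_deletion_repairs(toy))
```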

In more intensively studied species such as human, the number of C2H2 repeats varies widely among individuals. Only the reference-genome representative is shown here. This variation likely occurs in all species, and the individual animal chosen for sequencing does not necessarily carry the most common allele. Many clades have independent histories of gene amplification and gene loss, making both orthologous and functional comparisons problematic at substantial divergences.
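The DNA-contact residues of each finger can be pulled out of arrays like these. A minimal sketch, assuming the standard C2H2 numbering in which recognition-helix positions -1, +2, +3 and +6 lie 7, 5, 4 and 1 residues before the first zinc-binding histidine, and treating each 28-residue line of a record as one finger. Degenerate fingers, such as the first finger of the human array (which has only one cysteine) and truncated terminal fragments, are skipped, so the result covers canonical fingers only.

```python
import re
from typing import List, Optional

# Canonical C2H2 spacing C-x(2-4)-C-x(12)-H-x(3-5)-H; group 1 is the first zinc-binding His.
C2H2 = re.compile(r"C.{2,4}C.{12}(H).{3,5}H")

# Recognition-helix positions -1, +2, +3, +6 expressed as offsets from that His.
CONTACT_OFFSETS = (-7, -5, -4, -1)


def contact_residues(repeat: str) -> Optional[str]:
    """Return the -1/+2/+3/+6 residues of one finger, or None if it is not canonical."""
    seq = repeat.upper()
    match = C2H2.search(seq)
    if match is None:
        return None
    his = match.start(1)
    return "".join(seq[his + off] for off in CONTACT_OFFSETS)


def summarize(repeats: List[str]) -> List[str]:
    """Contact residues of the canonical fingers in an array, in order."""
    return [c for c in map(contact_residues, repeats) if c is not None]


# First four repeats and the terminal fragment of the human PRDM9 record below.
HUMAN_PRDM9 = [
    "VKYGECGQGFSVKSDVITHQRTHTGEKL",
    "YVCRECGRGFSWKSHLLIHQRIHTGEKP",
    "YVCRECGRGFSWQSVLLTHQRTHTGEKP",
    "YVCRECGRGFSRQSVLLTHQRRHTGEKP",
    "YVCREDE.....................",
]

fingers = summarize(HUMAN_PRDM9)
print(len(fingers), fingers)  # 3 ['WSHI', 'WSVT', 'RSVT']
```

Applied to whole arrays, the same function gives both the canonical finger count and the contact-residue string that differs between alleles and between species.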

The reference sequences below are also available here as tab-delimited text (PDF) that pastes cleanly into the rows and columns of a spreadsheet, where sorting conveniently selects data subsets.
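The exon lines of the records below can also be checked and tabulated programmatically. A minimal sketch, assuming each exon line reads phase_in SEQUENCE phase_out, and that, as in the legend's labels for the phase 00, 12 and 21 introns, the closing digit of one exon and the opening digit of the next are the two halves of the same intron and so sum to 0 or 3. The TSV column layout is a hypothetical choice, not necessarily that of the linked file.

```python
import csv
import io
from typing import List, Optional, Tuple

PHASES = {"0", "1", "2"}
Exon = Tuple[int, str, Optional[int]]  # (opening phase, sequence, closing phase or None)


def parse_exon(line: str) -> Optional[Exon]:
    """Parse 'phase_in SEQUENCE phase_out'; zinc finger repeat lines return None."""
    tokens = line.split()
    if not tokens or tokens[0] not in PHASES:
        return None
    body = tokens[1:]
    # The last exon of a record has no closing phase; missing exons have no sequence.
    phase_out = int(body.pop()) if body and body[-1] in PHASES else None
    return int(tokens[0]), "".join(body), phase_out


def phase_mismatches(exons: List[Exon]) -> List[int]:
    """Indices k where exon k's closing digit and exon k+1's opening digit don't sum to 0 mod 3."""
    bad = []
    for k in range(len(exons) - 1):
        closing, opening = exons[k][2], exons[k + 1][0]
        if closing is None or (closing + opening) % 3 != 0:
            bad.append(k)
    return bad


def to_tsv(record: str, exons: List[Exon]) -> str:
    """One tab-separated row per exon, preceded by a header row."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(["record", "exon", "phase_in", "phase_out", "length", "sequence"])
    for number, (p_in, seq, p_out) in enumerate(exons, start=1):
        writer.writerow([record, number, p_in, "" if p_out is None else p_out, len(seq), seq])
    return buf.getvalue()


# First nine exon lines of the human PRDM9 record below, plus one zinc finger line.
LINES = [
    "0 MSPEKSQEESPEEDTERTERKPM 0",
    "0 VKDAFKDISIYFTKEEWAEMGDWEKTRYRNVKRNYNALITI 1",
    "2 GLRATRPAFMCHRRQAIKLQVDDTEDSDEEWTPRQQ 1",
    "2 VKPPWMALRVEQRKHQK 0",
    "0 GMPKASFSNESSLKELSRTANLLNASGSEQAQKPVSPSGEASTSGQHSRLKL 1",
    "2 ELRKKETERKMYSLRERKGHAYKEVSEPQDDDYL 1",
    "2 YCEMCQNFFIDSCAAHGPPTFVKDSAVDKGHPNRSALSLPPGLRIGPSGIPQAGLGVWNEASDLPLGLHFGPYEGRITEDEEAANNGYSWL 0",
    "0 ITKGRNCYEYVDGKDKSWANWMR 2",
    "1 YVNCARDDEEQNLVAFQYHRQIFYRTCRVIRPGCELLVWYGDEYGQELGIKWGSKWKKELMAGR 1",
    "YVCRECGRGFSWKSHLLIHQRIHTGEKP",
]

exons = [e for e in map(parse_exon, LINES) if e is not None]
print(phase_mismatches(exons))  # []
print(to_tsv("PRDM9_homSap", exons), end="")
```

A nonempty mismatch list flags a junction worth re-examining, for example an exon boundary misplaced during assembly or curation.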

Other useful sequences, such as PRDM11, PRDM4 and zinc finger semi-homologs with similar exon and domain structures, are provided in the subsequent section along with syntenic markers such as GAS8.

>PRDM9_homSap Homo sapiens (human) genome Prim gene 13 CDH12 chr5 10 exon size 18,301 bp KRAB SSXRD SET C2H2
0 MSPEKSQEESPEEDTERTERKPM 0 
0 VKDAFKDISIYFTKEEWAEMGDWEKTRYRNVKRNYNALITI 1
2 GLRATRPAFMCHRRQAIKLQVDDTEDSDEEWTPRQQ 1
2 VKPPWMALRVEQRKHQK 0
0 GMPKASFSNESSLKELSRTANLLNASGSEQAQKPVSPSGEASTSGQHSRLKL 1
2 ELRKKETERKMYSLRERKGHAYKEVSEPQDDDYL 1
2 YCEMCQNFFIDSCAAHGPPTFVKDSAVDKGHPNRSALSLPPGLRIGPSGIPQAGLGVWNEASDLPLGLHFGPYEGRITEDEEAANNGYSWL 0
0 ITKGRNCYEYVDGKDKSWANWMR 2
1 YVNCARDDEEQNLVAFQYHRQIFYRTCRVIRPGCELLVWYGDEYGQELGIKWGSKWKKELMAGR 1
2 EPKPEIHPCPSCCLAFSSQKFLSQHVERNHSSQNFPGPSARKLLQPENPCPGDQNQEQQYPDPHSRNDKTKGQEIKERSKLLNKRTWQREISRAFSSPPKGQMGSCRVGKRIMEEESRTGQKVNPGNTGKLFVGVGISRIAK
VKYGECGQGFSVKSDVITHQRTHTGEKL
YVCRECGRGFSWKSHLLIHQRIHTGEKP
YVCRECGRGFSWQSVLLTHQRTHTGEKP
YVCRECGRGFSRQSVLLTHQRRHTGEKP
YVCRECGRGFSRQSVLLTHQRRHTGEKP
YVCRECGRGFSWQSVLLTHQRTHTGEKP
YVCRECGRGFSWQSVLLTHQRTHTGEKP
YVCRECGRGFSNKSHLLRHQRTHTGEKP
YVCRECGRGFRDKSHLLRHQRTHTGEKP
YVCRECGRGFRDKSNLLSHQRTHTGEKP
YVCRECGRGFSNKSHLLRHQRTHTGEKP
YVCRECGRGFRNKSHLLRHQRTHTGEKP
YVCRECGRGFSDRSSLCYHQRTHTGEKP
YVCREDE.....................

>PRDM9_panTro Pan troglodytes (chimp) genome Prim gene 19 CDH12 chr5 frag assembly glitch in mid C2H2
0 MSPERSQEESPEGDTERTERKPM 0 
0 VKDAFKDISIYFTKEEWAEMGDWgKTRYRiVKMNYNALITi 1
2 GLRATRPAFMCHRRQAIKLQVDDTEDSDEEWTPRQQ 1
2 VKPPWMAFRGEQSKHQK 0
0 GMPKASFNNESSLkELSGmPNLLNTSgSEQAQKPVSPPGEASTSGQHSRLKL 1
2 ELRRKETvGKMYSLRERKGHAYKEISEPQDDDYL 1
2 yCEMCQNFFIDSCAAHGPPTFVKDSAVDKGHPNRSALSLPPGLRIGPSGIPQAGLRVWNEASDPPLGLHSGPYEGQITEDEEAANSGYSWL 0
0 ITKGRNCYEYVDGKDKSwANWMR 2
1 YENCARDDEEQNLVSFQYHRQSFYRTCRVIRPGCELLVWYGDE GQELGIKWGSKWKKELMAGR 1
2 EPKPEIHPCPSCCLAFSSQKFLSQHVERNHSSQNFPGPSARKLLQPENPCPGDQNQEQQYPDPRSRNDKTKGQEIKERSKLLNKRTWQREISRAFSSPPKGQMGSCRVGKRIMEEESRTGQKVNPGNTAKLFVGVGISRIAK
VKYGECGQGFSDKSDVITHQRTHTGGKP
YVCRECGRGFSWKSHLLSHQRTHTGEKP
YVCRECGRGFSVKSSLLSHRTTHTGEKP
YVCRECGRGFSVKSSLLSHQRTHTGEKP
YVCRECGRGFSQQSNLLSHQRTHTGEKP
YVCRECGRGFSVKSSLLSHQRTHTGEKP
YVCRECGRGFSVKSSLLSHQRTHTGEKP
YVCRECGRGFSKQSHLLSHQRTHTGEKP
YVCRECGRGFSVQSNLLSHQRTHTGEKL
YVCRECGRGFSQQSHLLRHQRTHTGEKP
YVCRecgrgfsqqshLLSHQRTHTGEKP
YVCRECGRGFSVKSSLLSHQRTHTGEKP
YVCRECGRGFSKQSHLLSHQRTHTGEKP
YVCRECGRGFSQQSHLLSHQRTHTGEKP
YVCRECGRGFSQQSHLLRHQRTHTGEKP
YVCRECGRGFSVKSSLLSHQRTHTGEKP
YVCRECGRGFSVKSSLLSHQRTHTGEKP
YVCRECERGFSQQSHLLRHQRTHTGEKP
YVCRECGRGFSRQSALLIHQRTHTGEKP
VCREDE......................

>PRDM9_gorGor Gorilla gorilla (gorilla) CABD02290264 Prim gene -- cdh12 chr5 several contigs needed, most of ZNF domain missing
0 MSPERSQEESPEEDTERTERKPM 0 
0 VKDAFKDISIYFTKEEWAEMGDWEKTRYRNVKRNYNALITI 1
2 GLRATRPAFMCHRRQAIKLQVDDTEDSDEEWTPRQQ 1
2 VKPPCMALRVEQRKHQK 0
0 GMPKASFSNESSLKELSRTANLLNASGSEQAQKPVSPPGEASTSGQHSRLKL 1
2 ELRKKETEGKMYSLRERKGHAYKEVSEPQDDDYL 1
2 yCEMCQNFFIDSCAAHGPPIFVKDSAVDKGHPNRSALSLPPGLRIGPSGIPQAGLGVWNEASDLPLGLHFGPYKGRITEDEEAANNGYSWL 0
0 ITKGRNCYEYVDGKDKSWANWMR 2
1 YVNCARDDEEQNLVAFQYHRQIFYRTCRVIRPGCELLVWYGDEYGQELGIKWGSKWKKELMAGR 1
2 EPKPEIHPCPSCCLAFSSQKFLSQHVERNHSSQNFPGPSARTLLQPENPCPGDQNQEQQYPDPRSRNDKTKGQEIKERSKLLNKRTWQREISRAFSSPPKGQMGSCRVGKRIMEEESR TGQKVNPGNTGKLFVGVGISRIAK
VKYGECGQGFSVKSDVITHQRTHTGEKP
YVC.........................

>PRDM9_ponAbe Pongo abelii (orangutan) genome Prim gene 10 CDH12 chr5 frameshift extra a penultimate ZNF
0 MSPERSQEESPkGDTERTERKPM 0 
0 VKDAFKDISIYFTKEEWTEMGDWEKTRYRNVKRNYKTLITI 1
2 GLRATRPAFMCHRRQAIKLQVDDTEDSDEEWTPRQQ 1
2 VKPPWMAFRGEQSKHQK 0
0 GMPKASFNNESSLKELSGTQNLLNTSGSEQAQKPVSPPGEASTSGQHSTLKI 1
2 ELRRKETEGKTYSLRERKGHAYKEVSEPQDDDYL 1
2 YCEMCQNFFIDSCAAHGPPTFVKDSAVDKGHPNRSALSLPPGLRIGPSGIPQAGLGVWNEASDLPLGLHFGPYEGRITEDKEAANNGYSWL 0
0 ITKGRNCYEYVDGKDKSWANWMR 2
1 YVNCAWDDEEQNLVAFQYHRQIFYRTCRVIRPGCELLVWYGDEYGQELGIKWGSKWKKELMPGR 1
2 EPKPEIHPCPSCCLAFSSQKFLSQHVERNHSSQNFPGPSARKLLQPENPCPGDQNHEQQYSDPRSCNDKTKGQEIKERSKLLNKRTWQREISRAFSSPPKGQMGSCRVGKRIMEEESRTGQKVNPGNTGKLFVGVGISRIAK
VKYGECGQGFSDKSDVITHQRTHTGGRS
YVCRECGRGFSRQSVLLIHQRTHTGEKP
YVCRECGRGFSRRSVLLIHQRTHTGEKP
YVCRECGRGFSQQSVLLIHQRTHTGEKP
YVCRECGRGFSRRSVLLIHQRTHTGEKP
YVCRECGRGFSWKSVLLRHQRTHTGEKP
YVCRECGRGFSQQSVVFIHQRTHTGEKP
YVCRECGRGFSGKSVLFRHQRTHTGEKP
YVCRECGRGFSDKSGVCYHQRTHTRGEA
YVCRECGRGFSVKSNLLSHQRTHTEEKL
YVCREDE.....................

>PRDM9_nomLeu Nomascus leucogenys (gibbon) ADFV01015315 Prim gene 10 cdh12 ADFV01015317 ADFV01015319 no synteny CpG stop exon 6 in 6/6 traces
0 MSPERSQEESPEEDTERTEQKPT 0 
0 VKDAFKDISIYFTKEEWAEMGDWEKTRYRNVKRNYNALITI 1
2 GLRATRPAFMCHRRQAIKLQVDDTEDSDEEWTPRQQ 1
2 VKPPWMALRMEQRKHQK 0
0 GMPKASFSNESSLKELSGAANLLNASGSEQAQKPVSPPGEASTSGQHSRLKL 1
2 ELRRKETEGKMYSL*ERKGHAYKEVSEPQDDDYL 1
2 YCEMCQNFFTDSCAAHGPPTFIKDSTVGKGHPNRSALSLPPGLRIGPSGIPQAGLGVWNEASDLPLGLHFGPYEGRITEDEEAANNGYSWL 0
0 ITKGRNCYEYVDGKDKSWANWMR 2
1 YVNCARDDEEQNLVAFQYHRQIFYRTCQVIRPGCEPLVWYGDEYGQELGIKWGSKWKKELTAER 1
2 EPKAEIHPCPSCCLAFSSQKFLSQHVARHHSSQNFPGPSARKFLQPENPCPGDQNQEQQYSDPRSCNDKTKGQEIKERSKLLNKRTWQREISRAFSSSPKVQMGSCRVGKRIIEESRTGQKVNPGNTGQLFVGVGISRIAE
VKYGECGQGFSVKSDVITHQRTHTGEKL
YLCRECGRGFSVKSSLLSHQRTHTGEKP
YVCRECGRGFSKKSNLLSHQRTHTGEKP
YVCRECGRGFSDKSSLLRHQRTHTGEKP
YVCRECGRGFSQKSSLLSHQRTHTGEKP
YVCRECGRGFSQKSSLLSHQRTHTGEKP
YVCRECGRGFSDKSSLLRHQRTHTGEKP
YVCRECGRGFSQKSSLLSHQRTHTGEKP
YVCRECGRGFSVKSNLLSHQRTHTGEKP
YVCRECGRGFSDKSSLLRHQRTHTGEKP

>PRDM9_macMul Macaca mulatta (rhesus) genome Prim gene 9 CDH12 chr6 exon 4 lost to Ns
0 MSPERSQEESPEEDTERTERKPT 0 
0 VKDAFKDISIYFTKEEWAEMGDWEKTRYRNVKRNYNALITI 1
2 GLRATRPAFMCHRRQAIKLQVDDTEDSDEEWTPRQQ 1
2  0
0 GMPKASFNNESSLKEVSGMANLLNTSGSEQAQKPVSPPGEARTSGQHSRLKL 1
2 ELRRKETEGKMYSLRERKGHAYKEVSEPQDDDYL 1
2 YCEMCQNFFIDSCAAHGPPTFIKDSAVEKGHPNRSALSLPPGLRIGPSGIPQAGLGVWNEASDLPLGLHFGPYEGRITQDEEAANNGYSWL 0
0 ITKGRNCYEYVDGKDKSWANWMR 2
1 YVNCARDDEEQNLVAFQYHRQIFYRTCRVIRPGCELLVWYGDEYGQELGIKWGSKWKKELMAGR 1
2 EPKPEIHPCPSCCLAFSSQKFLSQHVERNHSTQNFPGPSARRLFQPENLCSGDQNQEQQYSDPRSCNDKTKGQEIKERSKLLNKRTWPKEISRAFSSPPKGQMGSSRVGERMMEEEYRTGQKVNPENTGKLFVGVGISRIAK
VKYGECGQGFSDKSDVIIHQRTHTGEKP
YLCRECGRGFSQKSSLRRHQRTHTGEKP
YLCRECGRGFRDNSSLRYHQRTHTGEKP
YLCRECGRGFSNNSGLCYHQRTHTGEKP
YLCRECGRGFSDNSSLHRHQRTHTGEKP
YLCRECGRGFSNNSGLRYHQRTHTGEKP
YLCRECGRGFSNNSGLRHHQRTHTGEKP
YLCRECGRGFSQKANLLRHQRTHTGEKP
YLCRECGRGFSQKADLLSHQRTHTGEKP
VCRKDE......................

>PRDM9_papHam Papio hamadryas (baboon) genome Prim gene 11 cdh12 contigs scattered
0  0 
0  1
2  1
2 VKPPWMAFRVEQSKHQK 0
0 EMPKTSFSNESSLKELSGTPNLLSTSGSEQAQKPASPPGEASTSGQHSRLKL 1
2 ELRRKEAEGKMYSLRERKGHAYKEVSELQDDDYL 1
2 ycEMCQNFFIDSCAAHGPPTFVKDSAVNKGHPNRSALSLPPGLRIGPSGIPQAGLGVWNEASDLPLGLHFGPYEGRITEDKEAANNGYSWL 0
0 ITKGRNCYEYVDGKDKSWANWMR 2
1  1
2 EPKPEIHPCPSCCLAFSSQKFLSQHVERNHSTQNFPGPSARRLLQPENLCSGDQNQEQQYSDPCSCNDKTKGQEIKERSKLLNKRTWQKEISRAFSSPPKGQMGSSRVGERMMEEESRTGQKVNPENIGKLFVEVGISRIAK
VKYGECGQGFSGKSDVITHQRTHTEGKP
YLCRECGRGFSQKSNLLRHQRTHTGEKP
YLCRECGRGFRDNSSLRCHQRTHTGEKP
YLCRECGRGFRDNSSLRCHQRTHTGEKP
YLCRECGRGFSDNSSLRYHQRTHTGEKP
YLCRECGRGFRDNSSLRYHQRTHTGEKP
YLCRECGRGFSVKSNLLSHQRTHTGEKP
YVCRECGRGFSDNSSLRCHQRTHTGEKP
YLCRECGRGFSQMSHLRCHQRTHTGEKP
YLCRECGRGFSVKSNLLSHQRTHTGEKP
YVCRECGRGFSRKANLLSHQRTHTGEKP

>PRDM7_homSap Homo sapiens (human) genome Prim gene 3 GAS8+ chr16 TUBB3+ DEFB+ AFG3L1+ DBNDD1- GAS8+ PRDM7- 92% id
0 MSPERSQEESPEGDTERTERKPM 0 
0 VKDAFKDISIYFTKEEWAEMGDWEKTRYRNVKMNYNALITV 1
2 GLRATRPAFMCHRRQAIKLQVDDTEDSDEEWTPRQQ 1
2 VKPPWMAFRGEQSKHQK 0
0 GMPKASFNNESSLRELSGTPNLLNTSDSEQAQKPVSPPGEASTSGQHSRLKL 1
2 ELRRKETEGKMYSLRERKGHAYKEISEPQDDDYL 1
2 YCEMCQNFFIDSCAAHGPPTFVKDSAVDKGHPNRSALSLPPGLRIGPSGIPQAGLGVWNEASDLPLGLHFGPYEGRITEDEEAANSGYSWL 0
0 ITKGRNCYEYVDGKDKSSANWMR 2
1 YVNCARDDEEQNLVAFQYHRQIFYRTCRVIRPGCELLVWYGDEYGQELGIKWGSKWKKELMAGR 1
2 EPKPEIHPCPSCCLAFSSQKFLSQHVERNHSSQNFPGPSARKLLQPENPCPGDQNQERQYSDPRCCNDKTKGQEIKERSKLLNKRTWQREISRAFSSPPKGQMGSSRVGERMMEEESRTGQKVNPGNTGKLFVGVGISRIAK
VKYGECGQGFSDKSDVITHQRTHTGGKP
YVCRECGRgFSRKSDLLSHQRTHTGEKP
YVCRECERGFSRKSVLLIHQRTHRGDAP
VCRKDE......................

>PRDM7_panTro Pan troglodytes (chimp) genome Prim pseu 2 GAS8+ chr16 
0 MSPERSQEESPEEDTERTERKPM 0 
0 VKDAFKDISIYFTKEEWAEMGDWEKTRYRNVKRNYNALITI 1
2 GLRATRPAFMCHRRQAIKLQVDDTEDSDEEWTPRQQ 1
2 VKPPLMALRVEQRKHQK 0
0 GMPKASFSNESSLKELSRTANLLNASGSEQAQKPVSPPGEASTSGQHSRLKL 1
2 ELKKKETEGKMYSLRERKGHAYKEVSEPQDDDYL 1
2 YCEMCQNFFIDSCAAHGPPTFVKDSAVDKGHPNRSALSLPPGLRIGPSGIPQAGLGVWNEASDLPLGLHFGPYKGRITEDEEAANNGYSWL 0
0 ITKGRNCYEYVDGKDKSWANWMR 2
1 YVNCARDDEEQNLVAFQYHRQIFYRTCRVIRPGCELLVWYGDEYGQELGIKWGSKWKKELMAGR 1
2 EPKPEIHPCPSCCLAFSSQKFLSQHVERNHSSQNFPGPSARKLQPENP PGDQNQERQYSDPRCCNDKTKGQEVKERSKLLNKWTWQREISRAFSSLPKGQMGSSRVGERMMEEESRTGQKVNPGNTGKLFVGVGISRIAK
VKYGECGQGFSVKSDVITHQRTHTGEKP
YVCRECGQGFSRKSVLLIHQRTHRGEKP
VCRKDE......................

>PRDM7_gorGor Gorilla gorilla (gorilla) genome Prim pseu 3 GAS8+ chr15730 numerous frameshifts in terminal ZNF domain
0 MSPERSQEESPEGDTERTERKPM 0 
0 VKDAFKDISIYFTKEEWAEMGDWEKTRYRNVKRNYNALITI 1
2 GLRATQPVFMCHRRQAIKLQVDDTEDSDEEWTPRQQ 1
2  0
0 GMPKASFNNESSLKELSGTPNLLNTSGSEQAQKPVSPPGEASTSGQHSRRKL 1
2 ELRRKETEGKMYSLRERKGHAYKEISKPQDDDYL 1
2 yCEMCQNFFIDSCAAHGPPTFVKDSAVDKRHPNRSALSLPPGLRIGPSGIPQAGLGVWNEASDLPLGLHFGPYEGRITEDKEAANSGYSWL 0
0 ITKGRNCYEYVDGKDKSWANWMR 2
1 YVNCARDDEEQNLVALQYHRQIFYRTCRVIRPGCELLVWYGDEYGQELGIKWGSKWKKELTAGR 1
2 EPKPEIHPCPSCCLAFSSQKFLSQHVERNHSSQNFPGPSARKLLQPENPCPGDQNQERQYSDPRCCNDKTKGQEIKERSKLLNKRTWQREISRAFSSPPKGQMGSSRVGERMMEEESRTGQKVNPGNTGKLFVGVGISRIAK
VKYGECGQGFSWKSNLLRHQRTHTGGKP
YVCRECGRGFSWKSDLLSHQRTHTGEKP
YVCRECGRGFSWKSNLLSHQRTHTGEKP

>PRDM7_ponAbe Pongo abelii (orangutan) genome Prim gene 4 GAS8+ chr16 
0 MSPERSQEESPEDDTERTERKPT 0 
0 VKDAFKDISIYFTKEEWAEMGDWEKTRYRNVKRNYNALITI 1
2 GLRATRPAFMCHRRQAIKLQVDDTEDSDEEWTPRQQ 1
2 VKPPWMALRVEQRKHQK 0
0 GMPKASFNNESSLKELSETANLLNASGSEQAQKPVSPPGEASTSGQHSRLKL 1
2 ELRSKETEGNTYSLRERKGHAYKEISEPQDDDYL 1
2 yCEMCQNFFIDSCAAHGPPTFVKDSAVDKGHPNRSALTLPPGLRIGPSGIPQAGLGVWNEASDLPLGLHFGPYEGRITKDEEAANNGYSWL 0
0 ITKGRNCYEYVDGKDKSWANWMR 2
1 YVNCARDDEEQNLVAFQYHRQIFYRTCRVIRPGCELLVWYGDEYGQELGIKWGSKWKKELMAGR 1
2 EPKPEIHPCPSCCLAFSSQKFLSQHVERNHSSQNFPGPSARHLLQAENPCPGDQNQEQQYSDPDCCNDKTKGQEIKERSKLLNKRTWQREISRAFSSSAKGQMGSSRVGERMMEEESGTGQKVNPGNTGKLFVGVGISRIAK
VKYGECGQGFSVKSDVITHQRTHTGEKP
YICRESGRGFTQKSGLLSHQRTHTGEKP
YVCRECGWGFSQKSNLLRHQRTHTGEKP
YVCRECGRGFSRKSVLLIHQRTHTGEKP
VCRKDE......................

>PRDM7_nomLeu Nomascus leucogenys (gibbon) ADFV01125891 Prim pseu 5 gas8+ synteny implied by non-coding
0  0 
0  1
2  1
2 IKSPWMAVRVEQSKHQK 0
0 GMPKASFNNESGLKELSGTQNLLNTSG EQARKPVSPPGEASTSGQHSRQKL 1
2 ELRRKETEGKMYSL ERKGHAYKEVSEPQDDDYL 1
2 yCEMCQNFFTDSCAAHGPPTFVKDSAVDKGHPNHSALSLPPGLRIGPSGIPQAGLGVWNEASDLPLGLHFGPYEGRITEDEEAANNGYSWL 0
0 ITKGRNCYEYVDGKDKS ANWMK 2
1 YVNCARDHEEQNLVAFQYHRQIFYRTCQVIRPGCEPLVWYGDEYGQELGIKWGSKWKKELTAER 1
2 EPKPEIHPCPSCCLVFTSQKFLSQHVECNHSSQNFPGPSARKLLQRENPCPGDQNQEQQYSDSRSCNDKTKGQEIKERSKL NKRIWQRKISRAFSSLPKGQMGSSRVGERMMEEESRTGQKVNPGNTGKLFVGVGISRIAK
VKYGECGQGFSDKSDVIAHQGTHTGGKS
.ICRECGWGFSQESHLLIHQRTHTGEKL
YVCRECGQGFSQKSDLLSHQRTHTGEKP
YVRRECGRGFSQKSNLLSHQRTHTEEKP
YVCRECGWGFSQKSHLLIHQRTHTGKKP
VCRKDE......................

>PRDM7_macMul Macaca mulatta (rhesus) genome Prim pseu 2 GAS8+ chr20 frameshifts exon 5 and 10, exon 10 a to aa restores frame
0  0 
0  1
2  1
2 VKPPWMAFRVEQSKHQK 0
0 EMPKTSFNNESSLKELSGTPNLLSTSDSE AQKPASPPGEASTSGQHSRLKL 1
2 ELRRKETEGKMYSLRERKRHAYKEASELQHDDYL 1
2 YCEMCQNFFIDSCAAHGPPTFVKDNAVNKGHPNRSALSLPPGLRIGPSGIPQAGLGVWNEASDLPLGLHFGPCEGRITEDKEAANSGYSWL 0
0 ITKGRNCYEYVDGKDKSWAKWMR 2
1  1
2 EPKPEIYPCPSCCLAFSSQKFLSQHVERNHSSQNFPGPSARKLLQSENPCPGDQNQEQQYSDPSSCNDKTKGQEIKERSKLLNKRTWQREILRAFTSPPKGQMGSSRVGERMMEEEFRTGQKANPGNTGKLFVGVEISRIAK
VKYGECGQGFSGKSDVITHQRTHTEGKP
YVCRGCGRRFSQKSSLLRHQRTHTGEKP
VCKKNE......................

>PRDM7_papHam Papio hamadryas (baboon) genome Prim pseu 2 gas8+ contigs scattered
0 MSPERSQEESPEEDTERTEWKPM 0 
0 VKDAFKDISIYFTKEEWAEMGDWEKTRYRNVKRNYNALITI 1
2 GLRATRPAFMCHRRQAIKLQVDDTEDSDEEWTPRQQ 1
2 VKPPWMAVRVEQSKHQK 0
0 GMPKASFNNESSLKEVSGMANLLNTSGSEQAQKPVSPPGEARTSGQHSRLKL 1
2 ELRRKETEGKMYSLRERKGHAYKEVSEPQDDDYL 1
2 YCEMCQNFFIDSCAAHGPPTFIKDSAVEKGHPNRSALSLPPGLRIGPSGIPQAGLGVWNEASDLPLGLHFGPYEGRITQDEEAANNGYSWL 0
0 ITKGRNCYEYVDGKDKSWANWMR 2
1 YVNCARDDEEQNLVAFQYHRQIFYRTCRVIRPGCELLVWYGDEYGQELGIKWGSKWKKELMAGR 1
2 EPKPEIYPCPSCCLAFSSQKFLSQHVERNHSSQNFPGPSARKLLQSENPCPGDQNQEQQYSDPSSCNDKTKGQEIKERSKLLNKRTRQRQILRAFTSPPKGQMGSSRVGERMMKEEFRTGQKANPGNTGKLFVGVEISRIAK
VKYGECGQGFSDKSDVVIHQRTHTREKP
YVYRgCGQGFSIKSNLLRHQRIHTGEKP

>PRDM7_calJac Callithrix jacchus (marmoset) genome Prim gene 12 GAS8+ chr20 one frameshift in repeat area chr20 terminus
0 MSPERSQEESPEGDTGRTEQKPM 0 
0 VKDAFKDISMYFSKEEWAEMGDWEKTRYRNMKRNYNALITI 1
2 GLRATRPAFMCHRRQAIKLQVDDTEDSDEEWTPRQQ 1
2 VKPPGMAFRVGQSKHQK 0
0 GMPKASFGNESSLKKLSGTANVLNTSGPEQAQKPVSPPGEASTSGQHSRLKL 1
2 ELRRKDTEEKMYSLRERKGLAYKEVSEPQDDDYL 1
2 yCEICQNFFIDSCAAHGPPTFVKDSAVDKGHPNHAALSLPPGLRIGPSGIPQAGLGVWNEASDLPLGLHFGPYEGRVTEDEEAASSGYSWL 0
0 ITKGRNCYEYVDGKDKSWANWMR 2
1 YVNCARDDEEQNLVAFQYHRQIFYRTCRVIRPGCELLVWYGDEYGQELGIKWGSKWKKELMAGR 1
2 ESKPEIHPCPSCCLAFSSQKFLSHHVERNHSSQNFPGTSTRKLLQPENPCPGKQKEEQQYFDPCNSNDKTKGQETKERSKLLNIRTWQREMARAFSNPPKGQMGSSRVEERMMEEESRTGQKVNPVDTGKLFVGVGISRIAK
AKYGECGQGFSDMSDVTGHQRTHTGEKP
YVCRECGRGFSQKSALLSHQRTHTGEKP
YVCRECGRGFSQKSHLLSHQRTHTGEKP
YVCTECGRGFSQKSVLLSHQRTHTGEKP
YVCTECGRGFSRKSNLLSHQRTHTGEKP
YVCRECGRGFSRKSALLSHQRTHTGEKP
YVCRKCGRGFSQKSNLLSHQGTHTGEKP
YVCTECGRGFSQKSHLLSHQRTHTGEKP
YVCRKCGRGFSQKSNLLSHQRTHTGEKP
YVCRECGRGFSFKSALLRHQRTHTGEKP
YVCRECGRGFSRKSHLLSHQGTHIGEKP
YVCRECGRGFSRKSNLLSHQRIHTGEKP
YVRREDE.....................

>PRDM7_micMur Microcebus murinus (lemur) ABDC01433247 Prim gene 8 gas8+ weak coverage
0 MSPEKSQEESPEEDTERTERKPM 0 
0 vKDAFKDISIYFTKEEWAEMGDWEKTRYRNVKRNYNALITI 1
2 GLRATRPAFMCHRRQAIKLQVDDTEDSDEEWTPRQQ 1
2 VKPPWMALRVEQRKHQK 0
0 GMPKASFSNESSLKELSRTANLLNASGSEQAQKPVSPSGEASTSGQHSRLKL 1
2 ELRKKETERKMYSLRERKGHAYKEVSEPQDDDYL 1
2 YCEKCQNFFIDSCAAHGPPTFVKDSAVDKGHPNRSALSLPPGLKIRPSGIPQAGLGVWNEASELPLGLHFGPYEGQVTEDEEAANSGYSWL 0
0 ITKGRNCYEYVDGKDDSWANWMR 2
1 YVNCARDEEEQNLVAFQYHRQIFYRTCQVIRPGCELLVWYGDEYGQELGIKWGSKWKEELTIRQ 1
2 EPKPEIHPCPSCSLAFSSQKFLSQHVKHTHSSQISPRTSGRKHLQPENPCPGDQNQEQQHSDPHSCNDKAKDQEVKERPKPFHKKTQQRGISRAFSSPPKGKMGSCREGKRIMEEEPRTGQKVGPGDTDKLCAAGGISRISR
VKYGDSGQSFSDKSNVIIHQRTHTGEKP
YVCRECGRGFSQKSDLLKHQRTHTGEKP
YVCRECGRGFSQKSHLLRHQRTHTGEKP
YVCRECGRGFSQKSDLLIHQRTHTGEKP
YVCRECGRGFSCKSHLLIHQRTHTGEKP
YVCRECGRGFSCKSSLLIHQRTHTGEKP
YVCRGVWGEALAESQTSSYTRGHTQGRS
PVFAGRVSKSLALNYISTATGGHLLTSH
LPTPALGGASKGSLLTLYISQECKETRN

>PRDM7_otoGar Otolemur garnettii (galago) genome Prim gene 7 GAS8+ good coverage
0 MSPEKSQEESPEEDTERTERKPM 0 
0 VKDAFKDISIYFTKEEWAEMGDWEKTRYRNVKRNYNALITI 1
2 GLRATRPAFMCHRRQAIKLQVDDTEDSDEEWTPRQQ 1
2 VKHPWMAFRMEQSKRQK 0
0 ILKKCMLSFNMHLKELSGPASLPNISGSEQHQKHMSSPREASTSGQHSGRKS 1
2 DLRIKEIEVRMYSLRERKGHAYKEVSEPQDDDYL 1
2 yCEKCQNFFIDNCAVHGPPTFVKDTAVEKGHPNRSVLSLPSGLGIRTSGIPQAGFGVWNEASDLQLGLHFGPYEGQVTEDEEAANSGYSWL 0
0 ITKGRNCYEYVDGKDESQGNWMR 2
1 YVNCARDEEEQNLVAFQYHRQIFYRTCRVIRPGCELLVWYGDEYGQELGIKWGSKWKKELTAGQ 1
2 EPKPEIHPCPSCSLAFSTQKFLSQHVERTHPSQISQGTSGRKNLRPQTPCPRDENQEQQHSDPNSRNDKTKGQEVKEMSKTSHKKTQQSRISRIFSCPPKGQMGSSREGERMIEEEPRPDQKVGPGDTEKFCVAIGISGIVK
VKNRECVQSFSNKS
NLRHQRTHTGEKP
YMCRDCGRGFSHKSSLFRHQRTHTGEKP
YVCRDCGRGFSLKANLLTHQRTHTGEKP
YVCRDCGQGFSQKAHLLRHQRTHTGEKP
YMCRDCGQGFSRKAYLLTHQRTHTGEKP
YVCRDCGQGFSQKAHLLTHQRTHTGEKP
YVCRDCGRGFSHKSSLFRHQRTHTGEKP
YICRDCG

>PRDM7_tarSyr Tarsius syrichta (tarsier) ABRT011082008 Prim pseu -- gas8+ double frameshift in exon 5, ABRT010499286
0  0 
0  1
2 GLRAPRPAFMCHRKRAIKPLVDDTEDSDEEWTPRQQ 1
2  0
0 GMPRAPLSIVSSLKELSEMANLLNTSDSEQAWKPVSPSREASTSEQHSRKKL 1
2 EFRKKEIEVNMYSLRERKDCAYKEVNEPQDDDYL 1
2 YCEQCQNFFIDSCATHGIPTFINDSAVDKGHPNRSALSLPPGLRIGPSGIPQAGLGVWNEASELPLGLHFGPYEGQITDDEEAANSGYSWL 0
0 ITKGRNCYEYVDGKDKSWANWMR 2
1 YVNCARDDEEQNLVAFQYHRQIFYRTCRIIRPGCELLVWYGDEYGQELGIKWGSKWKKELMAGR 1
2 

>PRDM7_tupBel Tupaia belangeri (tree_shrew) AAPY01316756 noDet
0 MRRYKSPEESPEGDAGRTEWKPT 0
0 VKDAFKDISVYFSKEEWAQMGEWEKIRYRNVKRNYTTLIAI 1
2 GLRAPRPAFMCHRKLAVKPHMDDAEDSDEEWTPRQQ 1
2  0
0  1
2   KMYSLRERKCGTYKEVHEPQDDDYL 1
2 yCEKCQNFFIDSCSAHGPPIFVKDSAVDKGSLNRSVLSLPPGLRIAPSGIPEAGLGVWNAATDLPLGLHFGPYEGQITEDEEAANSGYSWL 0
0 ITKGRNCYEYVDGKDESCANWMR 2
1 YVNCARDDEEQNLVAFQYHRQIFYRTCRVIRPGCELLVWYGEEYGQELGIKWGSKWKKSLWQGE 1
2 EPRPEIHPCLSCSLAFSSQKFLNQHVEHNHSCQRSLRTS
            QSSLIRHQRTHTGEKP
YLCGECGRGFSRQSHLIIHQRTHTGEKP
YVCRECGRGFSLQSNLIIHQRTHTGEKP
YGCRECGRGFSQQSSLIRHQRTHTGEKP
YVCRECGRGFSRHSSLIIHQRTHTGEKP
YLCGECGRGFSRQSHLIIHQRTHTGEKP
YVCRECGRGFSQQPQLIIHQRTHTGEKP
YVCRECGRGFRCQSHLIIHQRTHTGEKP
YVCRECGRGFSQQPHLIIHQRTHTGEKP*VCRKGE

>PRDM9_oryCun Oryctolagus cuniculus (rabbit) genome Glir gene 8 other Un0161 exon 2 ttt to tt restores frame; ZNF717+ DCAF4+ YAP1+ PRDM9- qTer 
0 MSAAAPAEPSPGADAGQARGKPE 0 
0 VQDAFRDISIYFSKEEWAEMGEWEKIRYRNVKRNYCALVAI 1
2 GLRAPRPAFMCHRRLAVRARADDTEDSDEEWTPRQQ 1
2 VKPPWMAFRTEHSKHQK 0
0 GMPRLPVNNESSLKELSGTANLLKTTGSEEDQKPSFPPKETRTSGQHSTRKL 1
2 GLRRKNIEVKMYSFRKRKSQAYKECSEPQDDDYL 1
2 YCEKCQNFFLDSCAVHGPPIFVKDSAVDKGHPNRSVLSLPPGLRIGPSGIPEAGLGVWNEASDLPLGLHFGPYEGQITEEEEAANSGYSWL 0
0 ITKGRNCYEYVDGKDRSWANWMR 2
1 YVNCARNDEEQNLVAFQYHKQIFYRTCQVIKPGCELLVWYGDEYGQELGIKWGSKWKEELTAGR 1
2 EPKPEIHPCPSCSLAFSSHKFLSQHMERSHSSQIFPGAPARNHLQPANPCPGKEHQKLSDPQSWNDKNEGQDVKEKSRFSSKRTRQKAISRSFSSLPKGQVETSREGERMIEEEPRIGQELNPEDTGKSSVGAGLSRIAG
VKYRDCRQGLSDKSHLINGQRAHTGEKP
YACRECERGFTVKSNLISHQRTHTGEKP
YACRECGRGFTVKSALTTHQRTHTGEKP
YACRECGRGFTVKSHLISHQRTHTGEKP
YACRECGRGFTVKSALITHQRTHTGEKP
YACRECGQGFTVKSNLISHQRTHTGEKP
YACRECGRGFTQKSHLINHLRAHTGEKP
YACRECGRGFTVKSDLISHQRTHTGEKP
YACRVDE.....................

>PRDM7_oryCun Oryctolagus cuniculus (rabbit) genome Glir gene 4 other synteny novel
0  0 
0  1
2 GLRAPRPAFMCHRRLAVRARADDTEDSDEEWTPRQQ 1
2 VKPPWMAFRTEHSKHQK 0
0 GMPRLPVNNESSLKELSGIANLLNTTGSEEDQKPSFPPKETRTSGQHSTRKL 1
2 GLRRKNIEVKMYSFRKRKSQAYKECSEPQDDDYL 1
2 YCEKCQNFFLDSCAVHGPPIFVKDSAVDKGHPNRSVLSLPPGLRIGPSGIPEAGLGVWNEASDLPLGLHFGPYEGQITEEEEAANSGYSWL 0
0 ITKGRNCYEYVDGKDRSWANWMR 2
1 YVNCARNDEEQNLVAFQYHKQIFYRTCQVIKPGCELLVWYGDEYGQELGIKWGSKWKEELTAGR 1
2 EPKPEIHPCPSCSLAFSSHKFLSQHMECSHSSQIFPGAPARNHLQPANPCPGKEHQKLSDPQSWNDKNEGQDVKEKSRFSSKRTRQKAISRSFSSLPKGQVETSREGERMIEEEPRIGQELNPEDTGKSSVGAGLSRIAG
VKYRDCRQGLSDKSHLINGQRAHTGEKP
YACRECGQSFTVKSNLISHQRTHTGEKP
YACRECGRGFTQKSHLIRHQRTHTGEKP
YACRECGQSFTWKSNLISHQRTHTGEKP
YACRVDE.....................

>PRDM7_ochPri Ochotona princeps (pika) AAYZ01312269 Glir gene -- noDet dubious fragment, no orthologous terminal exon
0  0 
0  1
2  1
2  0
0  1
2  1
2 yCEMCQNFFIESCAVHGSPTFVKD     GHPHRSVLSLPSGLRIGPSGIPEAGLGVWNETTDLPLGLHFGPYEGQVTEEEEATNSGYSWL 0
0 ITKGRNRYEYVDGKDPSQANWMR 2
1 YVNCARNDEEQNLVAFQYHRQIFYRTCRAVRQGCELLVWYGDEYGQELGIKWGSKWKEELTAGR 1
2 

>PRDM7_ratNor Rattus norvegicus (rat) P0C6Y7 Glir gene 10 PDCD2 chr1 FM103467 single transcript from body fat
0 MNTNKPEENSTEGDAGKLEWKPK 0 
0 VKDEFKDISIYFSKEEWAEMGEWEKIRYRNVKRNYKMLISI 1
2 GLRAPRPAFMCYQRQAIKPQINDNEDSDEEWTPKQQ 1
2 VSSPWVPFRVKHSKQQK 0
0 ETPRMPLSDKSSVKEVFGIENLLNTSGSEHAQKPVCSPEEGNTSGQHFGKKL 1
2 KLRRKNVEVNRYRLRERKDLAYEEVSEPQDDDYL 1
2 YCEKCQNFFIDSCPNHGPPVFVKDSVVDRGHPNHSVLSLPPGLRIGPSGIPEAGLGVWNEASDLPVGLHFGPYKGQITEDEEAANSGYSWL 0
0 ITKGRNCYEYVDGQDESQANWMR 2
1 YVNCARDDEEQNLVAFQYHRKIFYRTCRVIRPGRELLVWYGDEYGQELGIKWGSKMKKGFTAGR 1
2 ELRTEIHPCFLCSLAFSSQKFLTQHVEWNHRTEIFPGASARINPKPGDPCPDQLQEHFDSQNKNDKASNEVKRKSKPRHKWTRQRISTAFSSTLKEQMRSEESKRTVEEELRTGQTTNIEDTAKSFIASETS
RIERQCGQCFSDKSNVSEHQRTHTGEKP
YICRECGRGFSQKSDLIKHQRTHTEEKP
YICRECGRGFTQKSDLIKHQRTHTEEKP
YICRECGRGFTQKSDLIKHQRTHTGEKP
YICRECGRGFTQKSDLIKHQRTHTEEKP
YICRECGRGFTQKSSLIRHQRTHTGEKP
YICRECGLGFTQKSNLIRHLRTHTGEKP
YICRECGLGFTRKSNLIQHQRTHTGEKP
YICRECGQGLTWKSSLIQHQRTHTGEKP
YICRECGRGFTWKSSLIQHQRTHTVEK.

>PRDM7_musMus Mus musculus (mouse) Q96EQ9 Glir gene 12 PDCD2 chr17 CN723438 eight transcripts, four from retina
0 MNTNKLEENSPEEDTGKFEWKPK 0 
0 VKDEFKDISIYFSKEEWAEMGEWEKIRYRNVKRNYKMLISI 1
2 GLRAPRPAFMCYQRQAMKPQINDSEDSDEEWTPKQQ 1
2 VSPPWVPFRVKHSKQQK 0
0 ESSRMPFSGESNVKEGSGIENLLNTSGSEHVQKPVSSLEEGNTSGQHSGKKL 1
2 KLRKKNVEVKMYRLRERKGLAYEEVSEPQDDDYL 1
2 YCEKCQNFFIDSCPNHGPPLFVKDSMVDRGHPNHSVLSLPPGLRISPSGIPEAGLGVWNEASDLPVGLHFGPYEGQITEDEEAANSGYSWL 0
0 ITKGRNCYEYVDGQDESQANWMR 2
1 YVNCARDDEEQNLVAFQYHRKIFYRTCRVIRPGCELLVWYGDEYGQELGIKWGSKMKKGFTAGR 1
2 ELRTEIHPCLLCSLAFSSQKFLTQHMEWNHRTEIFPGTSARINPKPGDPCSDQLQEQHVDSQNKNDKASNEVKRKSKPRQRISTTFPSTLKEQMRSEESKRTVEELRTGQTTNTEDTVKSFIASEIS
SIERQCGQYFSDKSNVNEHQKTHTGEKP
YVCRECGRGFTQNSHLIQHQRTHTGEKP
YVCRECGRGFTQKSDLIKHQRTHTGEKP
YVCRECGRGFTQKSDLIKHQRTHTGEKP
YVCRECGRGFTQKSVLIKHQRTHTGEKP
YVCRECGRGFTQKSVLIKHQRTHTGEKP
YVCRECGRGFTAKSVLIQHQRTHTGEKP
YVCRECGRGFTAKSNLIQHQRTHTGEKP
YVCRECGRGFTAKSVLIQHQRTHTGEKP
YVCRECGRGFTAKSVLIQHQRTHTGEKP
YVCRECGRGFTQKSNLIKHQRTHTGEKP
YVCRECGWGFTQKSDLIQHQRTHTREK.

>PRDM7_musMol Mus molossinus (wild_mouse) GU216230 Glir gene 11 noDet full length deposit
0 MNTNKLEENSPEEDTGKFEWKPK 0 
0 VKDEFKDISIYFSKEEWAEMGEWEKIRYRNVKRNYKMLISI 1
2 GLRAPRPAFMCYQRQAMKPQINDSEDSDEEWTPKQQ 1
2 VSPPWVPFRVKHSKQQK 0
0 ESSRMPFSGESNVKEGSGIENLLNTSGSEHVQKPVSSLEEGNTSGQHSGKKL 1
2 KLRKKNVEVKMYRLRERKGLAYKEVSEPQDDDYL 1
2 YCEKCQNFFIDSCPNHGPPLFVKDSMVDRGHPNHSVLSLPPGLRISPSGIPEAGLGVWNEASDLPVGLHFGPYEGQITEDEEAANSGYSWL 0
0 ITKGRNCYEYVDGQDESQANWMR 2
1 YVNCARDDEEQNLVAFQYHRKIFYRTCRVIRPGCELLVWYGDEYGQELGIKWGSKMKKGFTAGR 1
2 ELRTEIHPCLLCSLAFSSQKFLTQHMEWNHRTEIFPGTSARINPKPGDPCSDQLQEQHVDSQNKNDKASNEVKRKSKPRQRISTTFPSTLKEQMRSEESKRTVEELRTGQTTNTEDTVKSFIASEIS
SIERQCGQYFSDKSNVNEHQKTHTGEKP
YVCRECGRGFTAKSNLIQHQRTHTGEKP
YVCRECGRGFTQKSVLIQHQRTHTGEKP
YVCRECGRGFTQKSDLIKHQRTHTGEKP
YVCRECGRGFTAKSNLIQHQRTHTGEKP
YVCRECGRGFTEKSSLIKHQRTHTGEKP
YVCRECGWGFTAKSNLIQHQRTHTGEKP
YVCRECGRGFTQKSSLIKHQRTHTGEKP
YVCRECGRGFTAKSNLIQHQRTHTGEKP
YVCRECGWGFTQKSNLIKHQRTHTGEKP
YVCRECGWGFTQKSDLIQHQRTHTR.EK

>PRDM7_dipOrd Dipodomys ordii (kangaroo_rat) genome Glir gene -- noDet dubious fragment, no orthologous terminal exon
0  0 
0  1
2 GLKAPRPVFMCHRRQAIKPQVDDTDDSDEEWTPGRQ 1
2  0
0  1
2 elRTKEVKMRMYSLRERKSYAYEEISEPQDDDYL 1
2 yCEQCQNFFINSCTVHGPPIFVRDNVVDKGHYDRSVLSLPPGLRIRQSSIPEAGLGVWNEESDLPLGLHFGPYEGQITEDEDAANSGYSWM 0
0 ITKGRNCYVYVDGKDKSQANWMR 2
1 YVNCARYDEEQNLVAFQYHRQIFYRTCRVIKAGCELLVWYGDEYGQELGIKWGSKWKRELTAgr 1
2 

>PRDM7_speTri Spermophilus tridecemlineatus (squirrel) AAQQ01308561 Glir gene -- noDet plus exon by exon traces
0  0 
0  1
2 GFRAPRPAFMCHQRQTIKLQMDDTEDSDEEWTPRQQ 1
2  0
0 LKPEVLLSNESSLKELSGTANLLNTSGSEQVQKPVSPLREASASRQHSRRKL 1
2 ELRTKEVEVKMYSLRERKGHAYKEVSEPQDDDYL 1
2 yCDKCQNFFMDSCPVHGPPTFIKDSVVNKDHSNHSTLSLPLGLRIGPSSIPEAGLGVWNEATDLPLGLHFGPYRGQITEDEEAANSGYSWL 0
0 ITKGRNCYEYVDGKDESQANWMR 2
1 YVNCARDDEEQNLVAFQYHRQIFYRTCRVIRPGCELLVWYGDEYGQELGIKWGSKWKKELSAGR 1
2 EPKPEIHPCPSCSLAFSSQKFLSQHVDRSHPSQIFPGTSMRKKLIPGDSSPRDQLQEQQHPDPHGWNDKARGQEVQGSLKPTHKGTRQRGISSPPKGQMGRSEESERMMEDDLKADQEINPEDTDKILVGVEMSRI
-

>PRDM9a_bosTau Bos taurus (cattle) NW_003053109 Laur gene 7 noDet chr1 
0 MSQNRSPEERTKGDAGRTEWKLT 0 
0 AKDAFKDISIYFSKEEWAEMGEWEKTGYRNVKRNYEVLIAI 1
2 GLRATQPAFMHHRRQVIKPQGDDTEDSDEEWTPQHQ 1
2 GKPSRKAFRMEHRKHQK 0
0 GKSRGPLSKVSSLKKLQGAAKLLNTSGSKWAQKPANPPRETRTLEQHSRQKV 1
2 ELRRKETDMKRYSLRERKGHVYQEVSEPQDDDYL 1
2 YCQECQNFFIDSCDAHGPPTFVKDSAVEKGHANRSVLTLPPGLSIKLSGIPEAGLGVWNEASHLPLGLHFGPYEGQITDDKEAINSGYSWL 0
0 ITKGRNSYEYVDGKDTSLaNWMR 2
1 YVNCARHYEEQNLVAFQYHGQIFYRTCQVVRPGCELLVWYGDEYGEKLGIKCESRGKSMFAAGr 1
2 ESKPKIHPCASCSLAFSSQKFLSQHVQHNHPSQTLLRPSARDYLQPEDPCPGSQNQQQRYSDPHSPSDKPEGREVKDRPQPLLKSIRLKRISRASSYSPRGQMGASGVHERITEEPSTSQKPNPEDTGKLFMGAGVSGIIK
VKYGECGQGSKDRSSLITNQRTHTGEKP
YVCGECGQSFNQKSTLITHQRTHTGEKP
YVCGECGRSFNQKSTLITHQRTHTGEKP
YVCGECGRSFSQKSTLIKHQRTHTGEKP
YVCGECGQSFNQKSTLITHQRTHTGEKP
YVCGECGQSFNQKSTLITHQRTHTGEKP
YVCGECGRSFSRKSTLITHQRTHRGEKL
CLQGV......................

>PRDM9b_bosTau Bos taurus (cattle) DAAA02065087 Laur gene 5 noDet chrU aaaaa fixed to aaaaaa in exon 2 KRAB SSXRD SET C2H2
0 MSPNRSPENSTEGDAGRTEWKPM 0 
0 AKDAFKDISIYFTKEEWAEMGEWEKIQYRNVKRNYEALIAI 1
2 GFRATQPGFMHHGRQVLKSQVDDTEDSDEEWTPRQQ 1
2 GKPSGMAFRGEPSKHPK 0
0 RLSRGPLNKVSSLKKLPGAAKLLKKSGSKQAQKPVPPPREARTPGKHPRHKV 1
2 ELRRKETEVKRYSVRERKGHVYQEVSEPQDDDYL 1
2 YCEECQNFFIDSCAAHGPPTFVKDSAVEKGHANRSALTLPPGLSIRPSGIPEAGLGVWNEASDLPLGLHFGPYEGQIIYNEEDSNSGYCWL 0
0 VTKGRNSYEYVDGKDTSLANWMR 2
1 YVNCARDDEEQNLVALQYHGQIFYRTCRVVRPGCELLVWYGDEYGEELGIKQDKRGKSKLSAQR 1
2 AKMHPCASCSLAFSSQKFLSQHVQRNHPSQTLLRPSARDHLQPEDPCPGNQNQQQRYSDPHSPSDKPEGRKAKDRPQPLLKSIKLKRISRASSYSPRGQVGRSGVHERITEEPSTSQKLNPEDTGKLFMGAGVSGIIK
VKYRECGQGSKDRSSLITHERTHRAEAL
CLRRVWAKLQSEVPLLVMHQRTHTGEKL
YVCGECGKSFSQKSPLIRHQRTHTGEKP
YVCGECGKSFSQKSPLIRHQRTHTGKKP
YVCRECGRSFSDKSH.HTPEYTHRGEAL
HLRGVWA.....................

>PRDM9c_bosTau Bos taurus (cattle) XM_002699750 Laur gene -- noDet chrX GO353654 4-cell embryo transcript no zinc downstream despite 43k bp
0 MSPNRSPENSTEGDAGRTEWKPM 0 
0 AKDAFKDISIYFTKEEWAEMGEWEKIRYRNVKRNYEALIAI 1
2 GFRATQPGFMHHRRQVLKPQVDDTEDSDEEWTPRQQ 1
2 GKPSGMAFRGERSKHQK 0
0 RLSRGPLNKVSSLKKLPGAAKLLKKSGSKQAQKPVPPPREARTPGKHPRHKV 1
2 ELRRKETKVKRYSVRERKGHVYQEVSEPQDDDYL 1
2 YCEECQNFFIDSCAAHGPPTFVKDSAVEKGHANRSALTLPPGLSIRPSGIPEAGLGVWNEASDLPLGLHFGPYEGQIIYNEEDSHSGYCWL 0
0 VTKGRNSYEYVDGKDTSLANWMR 2
1 YVNCARDDEEQNLVALQYHGQIFYRTCRVVRPGCELLVWYGDEYGEELGIKQDKRGKSKLSAQR 1
2 

>PRDM9d_bosTau Bos taurus (cattle) genome Laur gene 9 noDet chrX proximal tandem
0 MRPNTSPEESTERDAGRTEWKPT 0 
0 AKDAFKDISVYFSKEEWEEMGEWEKIRYRNVKRNYEALIAI 1
2 GFRATRPAFMHHRRQVIKLQADDTEDSDEEWTPRQQ 1
2 GKLSSMAFRVEHNKHQN 0
0 TMSRAPLSKEFSLKELPGAAKLLKTSGSKQAQKLVPPPGKARTPGQHPRQKV 1
2 ELRRKETEVKRYSLRERKGHVYQEVSEPQDDDYL 1
2 YCEECQSFFIDSCAAHGPPIFVKDCAVEKGHANRSALTLPPGLSIRESSIPEAGLGVWNEVSDLPLGLHFGPYEGQITDDEEAANSGYSWL 0
0 ITKRRNCYEYVDGKDTSLANWMR 2
1 YVNCARDDEEQNLVALQYHGQIFYRTCQVVRPGCELLVWYGDEYGQDLGIKRESSRKSELAGPR 1
2 EPKPKIYPCASCCLSFSSQKFLSQHVQRNHPSQILLRPSIGDHLQPEDPCPGSQNQQQRYSDPHSLSDKPEGREPKERPHPLLKGPKLCIRPKRISTASSYPPKGQMGGSEVHERMTEEPSTSQKLNPEDTGKLFMEAGVSGIVR
VNYGDHEQGSKDRSSLITHEKIHTGEKP
YVCKECGKSFNGRSDLTKHKRTHTGEKP
YACGECGRSFSFKKNLITHKRTHTREKP
YVCRECGRSFNEKSRLTIHKRTHTGEKP
YVCGDCGQSFSLKSVLITHQRTHTGEKP
YVCGECGRSFNEKSRLTIHKRTHTGEKP
YVCGDCGQSFSLKSVLITHQRTHTGEKP
YVCGECGQSFNEKSRLTIHKRTHTGEKP
YACGDCGQSFSLKSVLITHQRTHTGEKP
YVCMECE.....................

>PRDM9e_bosTau Bos taurus (cattle) genome Laur gene 9 noDet chrX distal tandem
0 MRPNRSPEESTEGDAGRTEWKPM 0 
0 AKDAFKDISIYFSKEEWEEMGEWEKIRYRNVKRNYEVLITI 1
2 GFRAARPAFMHHRRQVIKPQVNDIKDSDEEWTPRQQ 1
2 GKPFSMAFRVEHSKHQK 0
0 GMSRAPLSKESSLKELPGAAKLLKTSGCKQAQKLVPPPRKARTPEQHPRQKV 1
2 ERRRKETGVKRYSLREREGLVYQEVSEPLDDDYL 1
2 YCEECQSFFIDICAAHRPPTFVKDCAVEKGHANCSALTLPPGLSIRLSGIPEAGLGVWNEASDLPLGLHFGPYEGQITDDKEAAHSRYSWL 0
0 ITKGRNCYEYVDGKDTSLANWMR 2
1 YVNCARDDEEQNLVALQYQGQIFYRTCQVVRPGCELLVWYGDEYGWDLSIKQDSRGKNKLAAGR 1
2 EPKPKIYPCASCCLSFSSQKFLSQHVQRNHPSQILLRPSIGDHLQPEDPCPGSQNEQQRYSDPHSLSDKPEGREPKERPHPLLKGPKLCIRLKRISTASSYPPKGQMGGSEVHERMTEEPSTSQKLNPEDTGKLFMEAGVSGIVR
VKYGEHEQDSKDKSSLITHEKIHTGEKP
YVCTECGKSFNWKSDLTKHKRTHSEEKP
YACGECGRSFSFKKNLIIHQRTHTGEKP
YVCGECGRSFSEKSNLTKHKRTHTGEKP
YACGECGQSFSFKKNLITHQRTHTGEKP
YVCGECGRSFSEKSRLTTHKRTHTGEKP
YVCGDCGQSFSLKSVLITHQRTHTGEKP
YVCRECGRSFSVISNLIRHQRTHTGEKP
YVCRECEQSFREKSNLVRHQRTHTGEKP
YVCMECE.....................

>PRDM9e_oviAri Ovis aries (sheep) genome Laur pseu -- noDet chr 18 cow has PRDM7 pseudogene; sheep GAS8 is on sheep chr14
0  0 
0  1
2 GLRAP PPFMYHRRQVIKPQVDDIEDSDEEWTPRQQ 1
2  0
0  1
2 ELRRKETEMKIYSLQKRKGHMYQEVSDPQDDNYL 1
2 ycEKCQNF INSCAAHGPPTFVKDCVVEKGHASCSALtLSPGLSIRPSGIPEAGLRVWNEASDLPLGLHFGPYKGQITDDEEVANSRYFWL 0
0  2
1 YVNCAQDDEEQNLVAFQYHRQIFS TCWVVRPGCELLVWYRDEYGQELSIK GSRHKSELTVRR 1
2 

>PRDM9d_oviAri Ovis aries (sheep) genome Laur gene -- noDet chr1 near end chr1
0  0 
0  1
2 GLRATRLAFMHHCRQVIKPQVDDIEDSDEEWTPRQQ 1
2  0
0  1
2  1
2  0
0 ITKGRNCYEYVDGKDTSLANWMR 2
1 YVNCARDDEEQNLVALQYQGQIFYRTCQVVRPGCELLVWYGDEYGQDLGIKRDSSGKSELAAGR 1
2 

>PRDM9c_oviAri Ovis aries (sheep) genome Laur pseu 4 noDet chr5 middle of 108,514,869 bp
0  0 
0  1
2 GLRATRLAFMHHCRQVIKPQVDDIEDSDEEWTPRQQ 1
2  0
0 GMSKALVSNKSSLKEMPGASKLLKTRGPKQAQIPVPAPREPSTSEQHPRQKV 1
2  1
2                HGLPTLVKDCAVEKGHANHSALSLSPGSSIRPSGIPEAGLGVWNKVSDLLLGLHFGSYVGQITDDEEAAKSGYSWL 0
0  2
1 YVNGAQD KEQNLVAFLTHRQIFY TCRVVRPGCELLVWYRDTYSQELSIKCGSRWKSELTASR 1
2 PMCSCSLAFSSQKFLSQHVKCNHPSQILLKTSARDRLQPEDPCPGNPNQQQQYSDLHSWSDKPESRESKEKPQPLLKSIRLRRISRASSYSSRGQMGGFRVHKRMREEPSTGKEVSPEDAGKLFMGEGVSRIMR
VKYGDCG
GSKDRSSLMTHQRTHTGENP
YVCREYE.SFSEKSSLIKHQRTHTGEKP
YVCRECWQSFGRKSTLITHQRMHTREKP
CVCRECGRSFSKKSTLITHQRTHTGQKP

>PRDM9b_oviAri Ovis aries (sheep) genome Laur pseu 2 noDet chrX not tandem: 62 mbp separation
0 MSPNRSPENSTEGDAGRTEWKPM 0 
0 AKDAFKDISIYFTKEEWAEMGEWEKIRYRNVKRNYEALIAI 1
2 GFRATQPAFMHHHRQVIKPQVDDTEDSEEEWTPRQQ 1
2 GKPSGMAFRGERSKHQK 0
0 RLSRGPLNKVSSLKKLPGAAKLLKKTGSKQAQKPVPPPREARTPGQHPRHKV 1
2 ELRRKETEVKRYSLRERKGHVYQEVSELQDDDYL 1
2 yCEECQNFFIDSCAAHGPPTFVKDSAVEKGHANRSALTLPPGLSIRPSGIPEAGLGVWNEASDLPLGLHFGPYEGQVIYNEEASHSGYSWL 0
0 VTKGRNSYEYVDGKDTSLANWMR 2
1 YVNCARDDEEQNLVALQYHGQIFYRTCQVVRPGCELLVWYGDEYGEELGIKQDSRGKSKLSAQR 1
2 ELKPKIHPCASCSPAFSSQKFLSQYVQPNHPSQILLRPSARDHLQPEDPCPGNQNEQQ YSDPHSPSDKPEGCKAKERPPWLLKSMSVRISMASSYSPKGQMRGSETHYRMTEEPSTSQKLNPEDIGKLFMGTGVSGIIK
IKYEECGQVSKDRSSLITHEGTHTREQS
YVCRECGQSFSVKSSLIRLQRTHTGEKP
Y...........................

>PRDM9a_oviAri Ovis aries (sheep) genome Laur gene 9 noDet chrX not tandem
0 MSPNRSPENSTEGDAGRTEWKPM 0 
0 AKDAFKDISIYFTKEEWAEMGEWEKIRYRNVKRNYEALIAI 1
2 GFRATQPAFMHHHRQVIKPQVDDTEDSEEEWTPRQQ 1
2 GKPSGMAFRGERSKHQK 0
0 GMSRGPLSKVSSLKKLPGTTKLLKTSGSKQAQKPVPSSREARTSG HTRQKV 1
2 ELGRKETDMKRYSLRERKGHVYQEVSEPQDDDYL 1
2 yCQECQNFFINSCDAHGPPTFVKDSAVEKGHANRSALTLPPGLSIRLSGIPEAGLGVWNEASHLPLGLHFGPYEGQITDDKEAVNSGYSWL 0
0  2
1 YVNCARHYEEQNLVAFQYHGQIFYRTCQVVRPGCELLVWYGDEYGEKLGIRCESRGKSMLAAGR 1
2 EPKPKIHPCASCSLSFSSQKFLSQHVQRSHPSQILLRPSPRDHLQPEDPCPGKQNQQQRYSDPHSPSDKPEGQEPKERPHPLLKGPKLCIRLKRISTASSYTPKGQMGGSEVHEKMTEEPSTSQKLNPENTGKLFMEAGVSGIVR
VKYGEHEQGSKDKSSLITHERIHTGEKP
YVCKECGKSFNGRSNLTRHKRTHTGEKP
YVCRECGQSFSLKSILITHQRTHTGEKP
YVCGECGQSFSEKSNLTRHKRTHTGEKP
YVCRECGQSFSLKSILITHQRTHTGEKP
YVCRECGRSFSVKSNLTRHKMTHTGEKP
YVCGECGQSFSQKPHLIKHQRTHTGEKP
YVCRECGRSFSAMSNLIRHQRTHTGEKP
YVCRECGRSFSAMSNLIRHQRTHTGEKP
YVCREC......................

>PRDM9d_munMun Muntiacus muntjak (muntjac) AC216498 Laur gene 4 noDet frameshift exon 9 no syntenic loci; identities: 92%b 89%a 90%c
0 MRPNRSQEESTEGNAGRTERKPT 0 
0 GKDAFKDISVYFSKEEWEEMGEWEKIRYRNMKRNYEALIAI 1
2 GFRATQPTFMHHRRQVIKSQVDDTEDSDEEWTPRQQ 1
2 GKPSSMAFRVEHSKNQK 0
0 RMSRAPLSNESGLKELPGAAKSLKTSDSKQARNPVPHHRKARTPGQLPRQKV 1
2 ELRRKETGVKRYSLRERKGHVYQEVSEPQDDDYL 1
2 YCEECQNFFINSCAAHGPpTFVKDCAVEKGHANRSALTLPHGLSIRLSGIPDAGLGVWNKVSDLALGLHFGPYKGQITDNEEAANSGYAWL 0
0 ITKGRNCYEYVDGKDTSWANWMR 2
1 YVNCARDDEEQNLVAFQYHGQIFYRTCQVVRPGCELLVWYGDEYGQDFGIKRNSRGKSELAAGR 1
2 EPKPKIHPCASCSLTFSSQKFLSQHIQCSHPPQTLLRPSERDLLQPEDPCPGNQNQQQRYSDPHSPSDKPEGHEAKDRPQPLLKSIRLKRISRASSCSPRGQMGGSGVHERMTEEPSTSQKLNPGDTGTLLTGAGVSGIMK
VKYGECGQGSKDRSSLSTHERTHTGEKP
YVCRECGQSFSGKPVLIRHQRTHTGEKP
YVCMECGRSFSAKSVLMTHHRTHTGEKP
YICRECGQSFSQKIHLIRHQRIHTGE.P
SVFRECE.....................

>PRDM9c_munMun Muntiacus muntjak (muntjac) AC154919 Laur gene 15 noDet no syntenic loci AC204173 99% identical
0 MRPNRSPEESTEGDAGRTEQKPT 0 
0 AKDAFKDISVYFSKEEWEEMGDWEKIRYRNMKRNYEVLIAI 1
2 GFRATRPDFMHHRRQVIKPQVDDTEDSDEEWAPRQQ 1
2 GKPSSVAFRVEHSKHQK 0
0 RMSRAPLSNESGLKELPGAAKPLKTSGSKQAQNPVPHHRKARTPGQLPRQKV 1
2 ELRRKETGVKRYSLRERKGHVYQEVSKPQDDDYL 1
2 YCEKCQNFFIDSCAAHGPPTFVKDCAVEKGHANRSLLTLPPGLSIRLSGIPDAGLGVWNEASDLPLGLHFGPYEGQITDDEEAANSGYAWL 0
0 ITKGRDCYQYVDGKDTSWANWMR 2
1 YVNCARDDEEQNLVAFQYHGQIFYQTCQVVRPGCELLVWCGDEYGQDLGIKRNSRGKSELVAGR 1
2 EPKPKIHPCASCSLAFSSQKFLSQHIQRSHPSQTLLRPSERDLLQPEDPCPGNQNQRFSDPHRPSDRPQPLLKSIRLKRISRASSYSPRGQMGGSGVHELMTEEPSTSHKLNPEDTGTLLMGAGVSGIMR
VTYGECGQGSKDRSSLTTHERTYTGEKP
YVCGECGRSFCQKAHLITHQRTHTGEKP
YVCRECGQSFSRNSLLIRHQRIHTGEKP
YVCGECGRSFRDKSNLISHRRTHTGEKP
YVCGECGQSFSDKSNLIRHQRTHAGEKP
YVCGECGRSFNRKSHLITHQRTHTGEKP
YACRECGQSFSQKSILITHQRTHTGEKP
YACRECG.SFSQKSILITHQRTHTGEKP
YVCGECGRSFSQKSLLITHQRTHTGEKP
YVCMECGRSFSQKTHLITHQRTHTGEKP
YVCGECGRSFSQKSLLITHQRTHTGEKP
YVCGECGRSFSQKSLLITHQRTHTGEKP
YICMECGRSFSQKTHLITHQRTHTGEKP
YVCGKCGQSFSDKSNLISHKRTHTGEKP
YVCRECGRSFNRKSLLITHQRTHT.E.P
YVCRECE.....................

>PRDM9b_munMun Muntiacus muntjak (muntjac) AC218859 Laur gene 13 noDet no syntenic loci
0 MRPNTSPEESTEGDAGRTERKPT 0 
0 AKDAFKDISVYFSKEEWEEMGDWEKSRYRNMKRNYEVLIAI 1
2 GFRATRPDFMHHRRQVIKPQVDDTEDSDEEWAPRQQ 1
2 GKPSSMAFRVEHSKHQK 0
0 RMSRAPLSNESGLKELPGAAKPLKTSGSKQAQNPVPHHRKARTPGQLPRQKV 1
2 ELRRKETGVKRYSLRERKGHVYQEVSKPQDDDYL 1
2 YCEECQNFFIDSCAAHGPPTFVKDCAVEKGHANRSALTLPPGLSIRLSGIPDAGLGVWNETSDLPLGLHFGPYEGQITDDEEAANSGYAWL 0
0 ITKGRNCYQYVDGKDTSWANWMR 2
1 YVNCARDDEEQNLVAFQYHGQIFYRTCQVIRPGCELLVWYGDEYGQDLGIKRNSRGKSELATGR 1
2 EPKPKIHPCASCSLAFSSQKFLSQHIQRSHPSQTLLRPSERDLLQPEDPCPGSQNQRYSDPHSPSDKPEGQEAKDRPQQLLKSIRLKRISRASSYSPGGQMGGSGVHERMTEEPSTSQKLNPEDTGTLLTGAGVSGIMR
VTYGECWKGSKDRSSLTTHERTHTGEKP
YVCGECGQSFHHGSVLIRHQRTHTGEKP
YVCGECGRSFSQKSVLIRHQRTHTGEKP
YVCGECGRSFSQKSVLIRHQRTHTGEKP
YVCGECGRSFSQKAHLITHQRTHTGEKP
YVCGECGRSFSQKTHLISHKRTHTGEKP
YVCGECGRSFCQKSALIRHQRAHTGEKP
YVCGECGRSFIQKSDFIRHQRTHTGEKP
YVCRECGQSYSDKTVLITHERTHTGEKP
YVCGECGRSYSDKTVLITHERTHTGEKP
YVCGECGRSFLWKSALIRHQRTHTGEKP
YACGDCGRSFNQKSNFIRHQRTHTGEKP
YVCGECWRSFSQKSSSSDTRGHTQGRRP
VCRECG..SFSQKSHLISHQRTHTEEKP
YVCRECE.....................

>PRDM9a_munMun Muntiacus muntjak (muntjac) AC225653 Laur gene 7 noDet unordered contigs htgs; no synteny; TAG stop instead of AAG (K)
0 MRPNRSPEESTEGDAGRTEQKPT 0 
0 AKDAFKDISVYFSKEEWEEMGEWEKIRYRNVKRNYEALIAI 1
2 GFRATRPDFMHHCRQVIKPQVDDTEDSDEEWTPRQQ 1
2 GKPSSMAFRVKHSKHQK 0
0 GMSRAPLIKESSLKELLGAAKLMKTSGSKQAQNPVPHPRKARTPGQHPRQKV 1
2 ELTRKETGVKRYSLRERKGHVYQEVSEPQDDDYL 1
2 YCEECQNFFIDSCAAHGLPTFVKDCAVEKGHANRSALTLPPGLSIRLSGIPDAGLGVWNEESDLPLGLHFGPYEGQITDDEEAANSGYAWL 0
0 ITKGRNCYQYVDGKDTSWANWMR 2
1 YVNCARDDEEQNLVAFQYHGQIFYRTCQVIRPGCELLVWYGDEYGQDLGIKRNSRGKSELAAGR 1
2 EPKPKIHPCASCSLAFTSQKFLSQHIQRSHPAQTLLRPSERNLLQPEHPCPGSQNQRYSDPHSLSDKPEGQEAKDRPQPLLKSIRLKRISRASSYSPGGQMGGSGVHERMKDEPSTSQKLNPEDTGTLLTGAGVSGIMR
VTYGECGKGSKDRSSLTTHERTHTGEKP
YACRECGRSFRQKSDFITHQRTHTGEKP
YVCGQCGRSFGRKFALIRHQRIHTGEKP
YVCRECGQSFSQKTHLSSHQRTHTGEKP
YVCGECGRSFSQKSVLIRHQRTHTGEKP
YVCQECGRSFSDKSNLISHKRTHMGEKP
YVCRECGRSFIRKSVLIRHQRTHTGE.P
YVCRECE.....................

>PRDM7_bosTau Bos taurus (cattle) genome Laur pseu -- GAS8+ missing C2H2
0 MSPNRSPEESIEGDTGRTEWKPT 0 
0 AKDAFKDISIYFCKEEWAQMG WEKIRYRNVKRNYEALITL 1
2  1
2  0
0  1
2  1
2  0
0  2
1  1
2 

>PRDM7_turTru Tursiops truncatus (dolphin) ABRN01441536 Laur gene 9 gas8+ no useful synteny
0 MSTDRWPEDSTEGDAGRTAWKPT 0 
0 VKDAFKDISIYFSKEEWTEMGEWEKIRYRNVKKNYEALVTL 1
2 GLRAPRPAFMCHRRQAIKAQVGDPEDSDEEWTPRQQ 1
2 VKPSWVAFRVEHSKHQK 0
0 AVPPVPLSNESSLKKLPGAAQLQKASGPAQAQSPAPPPGAASTSAWHTRQKL 1
2 ERRAKQIEVKMYSLRERKGHVYQEVSEPQDDDYL 1
2 yCEKCQNFFIDSCAAHGAPTFVKDSAVEKGHPNRSALTLPPGLSIRPSGIPEAGLGVWNEASDLPLGLHFGPYEGQITEDEEAANSGYSWL 0
0 ITKGRNCYEYVDGKDTSWANWMR 2
1 YVNCARDEEEQNLVAFQYHRQIFYRTCRVVRPGCELLVWYGDEYSQELGIPWGSGWKSQLVaGR 1
2 DPKPKIQPCGSCSLAFSSQKILSQHVECSHPSQVLPRTSARDRVQPEDPCPGYQNRQQQYSDPHSWSNKPECQEVKERSKPLLKRIRLGRISRAFSSSPKGQMGSSRAHERMMEAGPSTGQKVNPEATGKLLIGAGVSRVVK
VKYRSSGQGSKDRSSLTKHQRTHTGEKP
YVCGECGRDFSLKSDLIRHQRTHTGEKP
YVCGECGRDFSLKSGLISHQRTHTGEKP
YVCGECGRDFSQKSGLIRHQRTHTGEKP
YVCGECGRDFSLKSGLISHQRTHTGEKP
YVCGECGRDFSQKSGLIRHQRTHTGEKP
YVCGECGRDFSLKSGLITHQRTHTGEKP
YVCGECGRDFSQKSNLITHQRTHTGEKP
YVCGECGRDFSRKSSYI...........

>PRDM7_lamPac Lama pacos (llama) scaffolds traces
0  0 
0 TFKDISIYFSKEEWTEMGEWEKIRYRNVKRNYEALITI 1
2 GLRAPRPAFMCHRRKAIKPQVDDTEDSDEEWTPRQQ 1
2  0
0 GMPRGPLSNQSSLKELSGTAKPLKTSGSGQAQKPFPPLGEASTSGRHSRQKL 1
2 ELRRKESQVKMYSLRERKGHAYQEVSEPQDDDYL 1
2  0
0 ITKGRKCYEYVDGKDKYWANWMR 2
1 YVNCARDDEEQNLVAFQYHRQIFYRTCRVIRPGCELLVWYGEEYGQELGIKWGSKWKKSLWQGE 1
2 EPKIYLCPSCSLAFSSQKFLSQHVKHNHPSQILPRTAAGRHLEPEDPCPGNQNEQQQHSDQHSWNDKPEGQEAKERSKPFLKRIRLRRISGAFSYSHKGQMGNSRVHDRMIEEEPSTGQKVNPKDTGKLFTWAGVSRTVE
VNYGEYGQGCKDTSHLTTHQRTHTGEKP
YVCRECGRGFTRKSNLTIHQREHTTGEK

>PRDM7_susScr Sus scrofa (pig) FP476134 Laur gene 9 GAS8+ unordered HTGS not wgs misassembly or inversion; not in genome browser
0 MRPDRRPEESPDPAAGSTERKAA 0 
0 ATDAFKDISIYFSKEEWTEMGEWEKIRYRNVKRNYEALTTI 1
2 GLRAPRPAFMCHRRQAIKPQVDDTEDSDEEWTPRQQ 1
2 VKPCRVAFRVEHNKHQK 0
0 SDSRVPLSNKSSLKELLTTAEVPETSGSEQAQEPVSPPGEASTSRRRSGQEL 1
2 ARRRKDTEARMYSLRERKGHAYQEVGEPQDDDYL 1
2 yCEKCQNFFIDSCAAHGPPTFVKDSAVDKGHPNRSALTLPPGLRIRPSGIPEAGLGVWNEAHDLPLGLHFGPYEGQVTEDEEAANSGYSWL 0
0 ITKGRNCYEYVDGKDKSWANWMR 2
1 YVNCARDDEEQNLVAFQYHRQIFYRTCRVVRPGCELLVWYGDEYGQELGIKWGSKWKKELTAGI 1
2 EPKPKIHPCPSCSLAFSSQRFLSQHVERSHPSQSLPRASARRGLQPEGPCPDNQQQQQPYPDPHSWDGTSESQDVKEGSKPFLERRRLRKTSRASSYAPEGQMRSSRVRERMTEEEPSAGQKVNPEDTGTLFTVAGES
GILRVENRGYGPDSGLTRHPRTHTGEKP
HVCSECGRGFSVKSHLIRHQRTHTGEKP
YVCRECGRGFSVKSHLIRHQRTHTGEKP
YVCRECGRGFSVKSSLITHQRTHTGEKP
YVCRECGRGFSVKSHLIRHQRTHTGEKP
YVCRECGRGFSEKSSLVTHQRTHTGEKP
FVCRECGRGFSVKSSLVTHQRTHTGEKP
YVCRECGRGFSVKSNFITHQRTHTGEKP
YVCRECGRGFSEKSSLVTHQRTHTGEKP
YVCREGE.....................

>PRDM7_canFam Canis familiaris (dog) genome Laur pseu 5 GAS8+ frameshift fixed to 6 ZNF; synteny MNS1 K1F1B intervening CDH3 oddity
0  0 
0  1
2  1
2 VKPSWVAFRMEQSKHQK 0
0 GIPRVPLSNKSSLKELSETAKLLNTSSPEQGQKSVSLPGKASTSGHHTRQKL 1
2 ELRRKDVEVKMYSLQERKGLAYQEVSEPQDDDYL 1
2 yCEK QTFFIDSCTVHGPPTFVKDSEVDKGQPNHSALTLPPGLRIRTSSIPQAGLGVWN ASDLPLGLHFGPYKGQITEDEEAANSGYSCL 0
0 ITKGRNCYEYVDGKDkSWANWMR 2
1 YMNCARDDEEQS LVAFQYHRQIFYRTPGHQASCELLVWYGDEYSQELGIKWGSKWKSELTAGK 1
2 EPNPEIHPCPSCSL AFSSQKFLSQHLEHNHPSQILPRISVREHFRPKDPCPGCQNQQQQQHSDPQRWNDRAKGQEGKERFKPLPKSIRQRRISRAFSTPCKGQTTCEGIVKEEPSAGSQKLNPEDTGKLFKGVGMTRIIR
VKYRGCGRGFNDRSHLSRHQRTHTGENP
YVCRECGRGFIHRTNLIIHQRTHTGEKP
YVCRECGtGFIQRSNLSIHQRTHTGEKP
YVCRECGRGFTQRSTLNEHQRTHTEEKP
YVCRECGRSFTRRSTLITHQRTHTGEKP
YVCRECGRSFT.................
KRSTWDPWVAQRFGACLWP.........

>PRDM7_felCat Felis catus (cat) genome Laur gene 11 GAS8+ two contigs GAS8 implied by downstream CAD1
0 MEPSPASESARGQPGGPGTTSPLRFPEQSAERGSRKARWKPT 0 
0 AKDAFKDISIYFSKEEWTEMGDWEKIRYRNVKRNYEALMTI 1
2 gLRAPRPAFMCHRRQAIKPQVDVTEDSDEEWTPRQQ 1
2 VKPSWVASRVDQNKQHK 0
0 GTHRVPLSKESSLKDFSETAKLLNTSGSEQGQKPVSLPGEASTSGHHSRRKL 1
2 ELRRKEIGVKMYSLRERKGFAYQEVSEPQDDDYL 1
2 yCEKCQNFFIDSCAVHGPPTFVKDNAVGKGHPNRSALTLPPGLRIRPSSIPEAGLGVWNEASDLPLGTHFGPYEGQITEDEEAANSGYSWL 0
0 ITKGRNCYEYVDGKDNSWANWMR 2
1 YVNCARDDEEQNLVAFQYHRQIFYRTCRVIRPGCELLVWYGDEYGQELGIKWGSKWKSELSTGK 1
2 EPQPDIHRCPSCSLAFSSQKFLSQHVECKHSSQSLPQISARKHFQPENPCPGDQNQQQQQHSDPHSWNDKAKCQEVKERSRPLLKSIKQRRISRAFSTPCKGQMGSSRVCEGMVEEGPSMGQNLNSEDTGKLFMGVGMSRIVR
IKNRGCEQGFNDRSHFSRHQRTHKEEKP
SVCNEFRRDFSHKSALITHQRTHTGEKP
YVCRECGRGFTQRSNLFRHQRTHTGEKP
YVCRECGRGFTQRSDLFTHQRTHTGEKP
YVCRECGRGFTRRSNLFTHQRTHTGEKP
YVCRECGRGFTRRSHLFTHQRTHTGEKP
YVCRECGRGFTQRSNLFTHQRTHTGEKP
YVCRECGRGFTQRSDLFRHQRTHTGEKP
YVCRECGRGFTQRSHLFTHQRTHTGEKP
YVCRECGRGFTQRSNLFRHQRTHTGEKP
YVCRECGRGFTWRSNLFTHQRTHTGEKP
YVCRKDGQGFTNKLHLSYQRT
NVATTHSIPQL

>PRDM7_ailMel Ailuropoda melanoleuca (panda) GL193502 Laur gene 6 GAS8+ first three exons from different contig ACTA01106867
0 MGPLPASESEQSLPGGPSTMSLNTSPEETPERDSGRTGWKPT 0 
0 AKDAFKDISIYFSKEEWTEMGDWEKIRYRNVKRNYEALITI 1
2 GLRAPRPAFMCHRRQAIKPQVDDTEDSDEEWTPRRQ 1
2 VRPSWVAFRMEQSKHQR 0
0 GIPRAPLRNESSLKELSETAKLLNTSGSELGQKPVSLPGEASTSGHDSLQKL 1
2 GFRRKDVEVKMYSLRERKSLAYQEVSEPQDDDYL 1
2 yCEKCQNFFIDSCAVHGPPTFVKDSAVDKGQPNRSALTLPPGLRIRPSGIPQAGLGVWNEASDLPLGLHFGPYEGQITEDEEAANSGYSWL 0
0 ITKGRNCYEYVDGKDNSWANWMR 2
1 YVNCARDEEEQNLVAFQYHRQIFYRTCRVIRPGCELLVWYGDEYGQELGIKWGSKWKSELAAGK 1
2 EPKPEIHPCPSCSLAFSSQKFLSQHLEHNHPSQILSRKSASEHFQQEDPCPGHQNQQQQQHSDPHRWNDKAKGQEVKERFKPLLKSIRQRRISRAFSSPCKGQTRSSTVCEGMVEEEPSAGQKLNPEETGKLFMGVGMSGIIR
VKYRGCGRDFSDRSHQSGHQRRHQKKP
SVCKKVKREFSHKSVLITHQRTHTGEKP
YVCRECGRGFTQRSNLIRHQRTHTGEKP
YVCRECGRGFTQRSNLIRHQRTHTGEKP
YVCRECGRGFTQRSSLIRHQRTHTGEKP
YVCRECGRGFTLRPNLIGHQRTHTEALP
INYISTTKEQM

>PRDM7_musPut Mustela putorius (ferret) AEYP01035076 AEYP01035077 terminates early in C2H2
0 MRPRTASESEQGLPGGPSTGSVSGPPEETPERDSGRTGRKPP 0 
0 AQDAFKDISVYFSKEEWTEMGDWEKIRYRNVKRNYEALITI 1
2 GLRAPRPAFMCHRRQATIPRVDDTEDSDEEWTPRQQ 1
2 VRPSWVAFKMEQSKHQK 0
0 GVPRAPLSNESSLKELSETAKLLNTSGSEHDQKPVSHPGEASTSGHHSLRKL 1
2 ELRRKDVEVKMYSLRERKSLAYQEVSEPQDDDYL 1
2 YCEKCQNFFIDSCAVHGPPTFVKDSAVDKGQPNRSALTLPPGLRIRPSGIPQAGLGVWNEASDLPLGLHFGPYEGQITEDEEAANSGYSWL 0 
0 ITKGRNCYEYVDGKDNSWANWMR 2
1 YVNCARDDEEQNLVAFQYRRQIFYRTCRVIRPGCELLVWYGDEYGQELGIKWGSKWKSELTAEK 1
2 EPKPEIHPCPSCTLAFSSQKFLSQHLERNHPSQILPRISAGEHFQPEDPCPGEQNHQQQQHSDPQNWNDKAKGQDVKESFKPLLESIRQRKNSRAFPIPCEGQTGYEGIVEEEPSTGQKLNPEETGKLFMGVGMSRIIR
VKYRGSGQGFDDRSHLSRHQRTHKEEKP
SVGKEPRREFIHKSVLVTHQRTHTGEKP
YVCRECGRGFTQRSHLIRHQR

>PRDM9_pteVam Pteropus vampyrus (bat) ABRP01232219 Laur pseu 15 noDet frameshift ttt to tttt fixed in last zinc finger; no blastx synteny
0  0 
0  1
2  1
2 vQPSWVAFGVEQSKHQK 0
0 AMPRVPLSNESSLKELSVIANPLKASGSEQNQQPVFPPGKASASRQHSRRKL 1
2 eLRRKGVEVKMDSLRERMGRVYQEVSEPQDDDYL 1
2 yCEKCQNFFIDSCAAHGSPIFVKDSEVDIGHPNHSALTLPPGLRIGPSGIPEAGLGVWNEASNLPLGLLFGPYEGQVTEDEEAANSKYSwM 0
0 spKGETAEYV DGKDESRANWMR 2
1 YVNCARDDEDQNLVAFQFRRQIFYRTCRVIMPGCELLVWYGDEYGQGLGIKWGSKWKREFTAGR 1
2 EPKPEIHPCPSCSLAFSSRKFLSQHMKRSHPSQSLPGISARKHLQSKEPHPEDQSQQQQQQQHTDPCSWNDKAEGQEVKERSKPMLERNGQRKISRAFSKPPKGQMGSPRECERMMEAEPSTSQKVNPENTGKSSVGVGASRIVR
VKYGGCGHGFDDGSHFIRHQRTHSGEKP
FVCRECERGFNEKSSLTMHQRTHSGEKP
FVCREC.EGFSVKSSLIRHQRTYSGEKP
FVCRECEQGFNEKSSLTMHQRTHSGEKP
FFCRECEGFSVK.SSLIRHQRTHSGQKP
FVCRECKRGFTQKSHLITHQRTHSGEKP
FCRECER.GFTQKSHLIKHQRTHSGEKP
FVCRECA.....................

>PRDM7_pteVam Pteropus vampyrus (bat) ABRP01250178 Laur gene 7 GAS8+ 4 distal exons of GAS8+-; unique F sweep in zinc finger; 15 ZNF dotplot no CAD1
0 MRPDRSPEEAPEGDTRRTGCKPK 0 
0 AKDAFKDISIYFSKEEWTEMGDWEKIRYRNVKRNYDALQAI 1
2 GLRAPRPAFMCRRRQAIKPQVDDSEDSDEEWTPRQQ 1
2  0
0 AMPRVPLSNEPSLKELSVIANLLKASGSEQDQKPVFPPGKASASRQHSRQKL 1
2 GLRRKGVEVKMYSLRERTGRVYQEVSEPQDDDYL 1
2 yCEKCQNFFIDSCAAHGSPIFVKDSEVDIRHPNRSALTLPPGLRIGPSGIPEAGLGVWNEASDLPLGLLFGPYEGQVTEDEEAANSGYSWL 0
0 QGKGRNCYEYVDGKDESRANWMR 2
1 YVNCARDDEEQNLVAFQYHRQIFYRTCRVIRPGCELLVWYGDEYGQELGIKWGSKWKRELTAGR 1
2 EPKPAIHPCPSCSLAFSGQKFLSQHMKRSHPSQSLPGISARKHLQSKEPHPEDQSQQQHNDPRSWNDKAEGQEVKERSKPLLERNRQRKIFRAFSKPPKGQMGSPREYERMMEAEPSTSQKVNPENTGKSSVGVGASRIVI
VKYGGCEHGFDDGSHLIMHQRTHSGEKP
FVCRECERGFSKKSNLITHQRTHSGEKP
FVCRECERGFTRKSSLITHQRTHSGEKP
FVCRECERGFTQKSHLITHQRTHSGEKP
FVCRECERGFSEKSSLIKHQRTHSGEKP
FVCRECERGFTRKSSLITHQRTHSGEKP
FVCRECERGFTQKSSLIKHQRTHSGEKP
FVCRECERGFTQKSSLIKHQRTHSGEKP
FVCRECERGFTQKSSLIKHQRTHSGEKP
FVCRECERGFTQKSSLITHQRTHSGEKP
FVCRECERGFTQKSHLITHQRTHSGEKP
FVCRECERGFSKKSNLITHQRTHSGEKP
FVCRECERGFTRKSLLITHQRTHSGEKP
FVFRECERGFTQKSSLITHQRTHSGEKP
FVCRECERGFTRKSYLITHQRTHSGEKP
FVGRECE.....................

>PRDM7_myoLuc Myotis lucifugus (bat) AAPE02062260 Laur gene 6 gas8+ TGA stop codon; CpG hotspot for R CGA; SSXRD implies missing KRAB no CAD1
0  0 
0  1
2  1
2  0
0 AKSRAPLSNESSLKELSGTANLLTTSGSEQTQKTVPPPGEASTSGQHPRSKL 1
2 dLRRKEIEVKMYSLRERKCRVYQEISEPQDDDYL 1
2 YCEKCQNFFIDSCAVHGPPTFVKDSAVDKGHANRSALTLPPGLRIGPSGIPEAGLGVWNEECDLPVGLHYGPYEGQITEDEAIANSGYSWL 0
0 ITKGRNCYEYVDGKDTSQANWMR 2
1 YVNCARDDEEQNLVAFQYHRQIFYRTCRVVRKGCELLVWYGEEYGQELGIKWGSKWKTEPVAGR 1
2 EPKPEIHPCPSCSVAFSSQTFLSQHGKRNHPSEILPGAPAGNHLQSEEPGPERQNQQQQQQTGPHGWNDKAEGQEVKGRSKPLLKRIRQRGTSRASFKPPNRHMGSSSERERIREEEPSTGQNVNHKNTGKLFVGVKRSKSVT
IKHGGCGQGFNDGSHIDTHQRTHSGEKP
YICRECGGFTHKSDL.IRHQRTHSQENP
YVCRECGRGFRDRSTLITHQRTHSGEKP
YVCRECGRGLTEKSTLITHQRTHSGEKP
YVCRECGRGFTRKSTLITHQRTHSGEKP

YVCRECGRGSRVKSNLIRHQRTHSGEK
SGVCIEGE....................

>PRDM7_equCab Equus caballus (horse) genome Laur gene 4 GAS8+ missing front exons, pre-terminal stop GAS8+- flanked right by EMR2-
0  0 
0  1
2  1
2 VKPSWVAFRVEQSKQQK 0
0 RMRTAPLSNESRLKELSGTAKLLKTSSSEQVQKPVSPLGEASSSEQHSRRKL 1
2 ELRRKEVGVKMYSLRERKGHAYQEVSEPQDDDYL 1
2 yCENCQNFFIDSCAAHGPPIFVKDSAVDKGHPNRSALTLPLGLRIRPSGIPEAGLGVWNEASDLPLGLHFGPYEGQITEDEEAANSGYSWL 0
0 ITKGRNCYEYVDGKDISWANWMR 2
1 YVNCARDDEEQNLVAFQYHRQIFYRTCRVVRPGCELLVWYGDEYGQELGIKWGSKWKRELTAGR 1
2 EPKLEIHPCPSCSLAFSSQKFLSQHVERNHPSQILPGTSARNHLQPEDPSPGDQNQQQQHSDPHSWKDKAHSQEVKERSKPLLKKIRQRRIPRAFSYPPKGQMENFRMRERIMEEKPSIGRKVNPEDTGKLFLEMRMSRNVR
VQYGGCGRGFNDRASLIKHQRTHTGEKP
YVCRECEQGFTQKSSLIAHQRTHTGEKP
YVCRECEQGFSEKSHLIRHQRTHTGEKP
YVCRECEQGFSVKSNLIRHQRTHTGEKL
.FCREGK.....................

>PRDM7_sorAra Sorex araneus (shrew) AALT01000095 Laur gene 8 noDet no useful synteny; upstream spectrin, IgG; GAS8 contig has no sign of pseudogene
0 MSLNRPAEMNTQGKARKLMLKPM 0 
0 SKDAFKDISMYFSKEEWAEMGDWEKIRHRNVKRNYEELISI 1
2 GLRAARPAFMSHRRQAIKTQLDDTEESDEEWTPNQQ 1
2 VKSLRVAFRAEQSKHQK 0
0 GRSRTPISNESSSKELSGTRTLLNTKCTKQAQKPLFPPGEASTSGHYSKPKL 1
2 ELRRKEPEVKMYSLRERKGRAYQEVSEPQDDDYL 1
2 YCENCQNFFINKCSAHGSPIFVKDNAVAKGHSNRSALTLPHGLRIGPSGIPEAGLGIWNEASDLPLGLHFGPYEGQITNDEEAANSGYSWL 0
0 ITKGRNCYEYVDGVDESLANWMR 2
1 YVNCARDYEEQNLVAFQYHRQIFYRTCRIIKPGCELLVWYGDEYGQELGIKWGSKWKSELTADK 1
2 EPKPEIYPCPCCSLAFSNQKFLSRHVEHSHPSLILPGTSARTHPKSVNFCPGDQNQWQQHSDACNDKPDEPWNDKLENHKSKGRSKPLPKRMGQKRISTAFPNLRSSKMGSSNKHETIMDKINTGQKENPKDTYRVFAGIGMPRIIR
DKHVTLRRSFTNRSSPLTHQRTHTGEKP
YVCRECGRGFSQKSHLLTHQRTHTGEKP
YVCRECGRGFTDRSSLLTHQRTHTGEKP
YVCRECGRGFSLKSSLLRHQRTHTGEKP
YVCRECGRGFSLKSSLLTHQRTHTGEKP
YVCRECGRGFTDRSSLLTHQRTHTGEKP
YVCRECGRGFSLKSSLLTHQRTHTGEKP
YVCRECGRGFSRKSSLLRHQRTHTGEKP
YVCES.......................

>PRDM9a_loxAfr Loxodonta africana (elephant) genome Afro gene 12 noDet chr 153 novel synteny THEG+ MIER2+ PPAP2C PRDM9- ZNF699-
0 MSPARAAKKNPRGDVGSAGRTPT 0 
0 aKDTFRDISIYFSKEEWAEMGEWEKFRYRNVKRNYEALVTI 1
2 GLRAPRPAFMCHRRQAIKAQVDNTEDSDEEWTPRQQ 1
2 VKPPSVASRAEQSRHQK 0
0 GTPKALLGNESSLKEVSGTAILLNTTGSEQAQKPVSSPGEASTSDQPSRWKL 1
2 EPRRNEVEVKMYNLRERKGLEYQEVSEPQDDDYL 1
2 yCEKCQNFFIDTCAVHGAPMFVKDSPVDRGHPNHSALTLPPGLRIGPSSIPKAGLGVWNEASELPLGLHFGPYEGQVTEDKEAANSGYSWL 0
0 ITKGKNCYEYVDGKDESWANWMR 2
1 YVNCARDEEEQNLVAFQYHRQIFYRTCRTIQPDCELLVWYGDEYGQELGIKWGSRWKKELTSGR 1
2 EPKPEIHPCPSCRLAFSSQKFLSQHMKHSHPSPPFPGTPERKYLQPEDPRPGGRRQQRSEQHMWSDKAEDPEAGDGSRLVFERTRRGCISKACSSLPKGQIGSSREGNRMMETKPSPGQKANPEDAEKLFLGVGTSRIAK
VRCGECGQGFSQKSVLIRHQKTHSGEKP
YVCGECGRGFSVKSVLIKHQRTHSGEKP
YVCGECGRGFSVKSVLITHQRTHSGEKP
YVCGECGRGFSVKSVLITHQRTHSGEKP
YVCGECGRGFSQKSDLIKHQRTHSGEKP
YSCRECGRGFSRKSVLITHQRTHSGEKP
YVCGECGRGFSQKSNLITHQRTHSGEKP
YVCGECGRGFSRKSVLITHQRTHSGEKP
YVCGECGRGFSQKSNLITHQRTHSGEKP
YVCGECGRGFSQKSDLITHQRTHSGEKP
YVCRECGRGFSRKSNLITHQRTHSGEKP
YVCRECRRGFSVKSALI...........
GHGRRKCSKSAEPLHFPRVSRDQK....

>PRDM9b_loxAfr Loxodonta africana (elephant) genome Afro pseu 3 noDet approx seq after frameshift correction
0  0 
0  1
2  1
2  0
0 GTPKVLLSNESSLKEVSGTAILLSTMGSEQAQKPVSSPGEASTSDQPSRRKQ 1
2 EPRRKEVEVNMYSLRERKGLVYQEVGEPQDDDYL 1
2 yCEKCQNFFIHTCAVHGAPMFVKDSHVDRGHLNHSALTLPPGLRIGPSSIPEAGLRVR EVSEQLLGLHIGPYEGQVTEDkEAAHSGYSWL 0
0 ITKGRNCYKYVDGKDDPWANRMR 2
1 YVNCIQD KEQNLVAFQYHRQIFHWTCCTIRPGCELLVWYGDNYSQELGIKWGSR KKEL     1
2 EPKPEIHPCPSCPLAISSQKFLDQHTKHSHPSPPFPGTPERKHLQPEDPHPGGRRQQHSEQHLNDKAEDPETGDGSKPVFERARLVGGGAGGVSKVCSSLPKGQMGSSREGNRMMETEGQKVNPEDTEKLFLGVGISRLAK
VRCGEYGQGFSQKSVLIRHQRTYSGEEH
YVCGECGRGFSWKSQLTRHQRSHSWEKP
YVCRECGGFSVKSTLI............
GTGEGNAATIHLHLPS............

>PRDM7_loxAfr Loxodonta africana (elephant) genome Afro pseu 5 GAS8+ scaffold_57 several frameshifts; ZNF540 opposite strand upstream of N-terminus
0  0 
0  1
2 GLRASHPAFTCHCMQAIKAQMDDTEDSNEEQTPRQq 1
2 VRPSWVAFRMEQSKHQR 0
0 GMLRVPRSNESSLKNLSGTSIMLSRAGSEQAQKLVLPPGKASTSDEHSRQKP 1
2 EHRRKGVEVKMYSF ERKGLVYQEIS PQDDDYL 1
2 YCEKCQNFFIDTCESHGVPTFVKNSTTDSGHPNHLALTPSSGLRTRPSSIPKAWLRLWNKAFELLLGLPFSPCEGQVIEDEAVDNSGYSWL 0
0  2
1 YVNGTQDEKEQNLVFFQYHRQIFYQTCYAVWPGCQLLVWYRDECGQELGIKWDNRGKKEFe    1
2 EPKPEAHPCPSCPLAFSSEKFLSQHMKHNHPSQSSPETPERKHLQPEDPHPGHQNQQQQQHSDPHRWNDKAEGQQTGDRSKPMFENIRQEVTSRAFSSLPKGQMVCSREGNRMMETEPSPGLKVNPEVTGKLFLGVESSRIAK
VKYRGCGRDFSDRSHQSGHQRRHQ
KKP
SVCKKVKREFSHKSVLITHQRTHSGEKS
YVCKESGRGFSAKSNLIRPRRTHTGEKP
YVCGERGG.FSVSGLII.HQRAHSPEKP
YVCREGRRGFGDKSSFIKHQRATLGEKS
YVCKESGRGFS.................
AKSNLIRPRRKKCRHDTTPHPQL.....

>PRDM7_echTel Echinops telfairi (tenrec) genome Afro pseu 5 noDet 2 frameshifts plus stop codon
0  0 
0  1
2 GLRAPRPAFMCHHRPAAKGQVEDSEDSDEEWTPRQR 1
2  0
0 GMPGVSLRNESNLKVLSGTAILLTAAEPEQPH PGSPPGEATTSHEHLRQKV 1
2 epELRRRAVMMNSLRERKNLMYQEVSTPCDDNCL 1
2 YGERCHNFFIDTHIAHGATTFVKDS    PMDRSNCSILPPGLRIGPSGIPEAGLGVWNEASELPLGLHFVPYEGQVTKDEAATNSGYSWM 0
0 ITKGRNCYEYVDGKDKSWANwMr 2
1  1
2 EPKPEVNPCPSCPLALSSQQLKHSHPFQSLPGTPAEKHLQAEDFHPRGQKLHHFEHHIRNERAEGLETGDGSKPMLERTRLGKMSKTTYNSPKGQTRSSGETNRIREADLNPGQGVNAEDTRNLFLGIGISRIAK
VRCRECGHGFSVKSSLITHQRIHTGEKP
YVCSECGQGFSQKSVLIRHQRIHTGEKP
YICRECDRGFSRKSHLIKHQRTHSGEKP
YVCRECGQGFSQKSVLITHHRTHSGEKP
YVCRECGRGFSQKSDLIKHERTHS....

>PRDM7a_proCap Procavia capensis (hyrax) ABRQ01227339 Afro pseu 17 noDet frameshift and two stop codons in exon 10 
0  0 
0 AKDAFRDISIYFSKEEWAEMGEWEKSRYRNVKRNYEALVAI 1
2 GLRAPRPAFMCHRRQAIKAQVDNTEDSDEEWTPRQQ 1
2 AKPRSVASREELRKPQK 0
0 GTPKALLGNESSLKEVSGTAILLNTTGSEQAQKPVSSPGEASTSDQPSRWKL 1
2 EPRRKEAEVKRYNLREGTNPAYQEVGDTQDDDYL 1
2 yCEKYQKFCTDVCPAHGALAFLKDLSVERGHPKHSALTLPPGLRIGASGIPEAGLGVWSEASELPPGLHFGPCERQVTKDNEAANRGYLWP 0
0 ITKGRSCSLYMDRKDESRANWMR 2
1 YVRHAGDKEEQNLVAFQYHRQIFYRTCRPVQPGCELLVWPGAEDGQELGLQRGSRWKKELASQT 1
2 EARPEIHPCPSCPLAFSTPKFLSHHVKHSHPCQPFPGTLARRPLQPEDPHPGDRRQQHSEQPNWNDKAEGPEIGHVSRPVFEKTRQEGFSEARSSLPKGQMGRSREAERTTETQNSPGQKVNPEDTEILFLRGGISEIAK
VKCGECGQGFSRKSHLIRHQRTHSGMKP
YVCRECRRGFGVKSLLTRHQRTCSGMKP
YVCRECGQGFRWKSHLIRHQRTHSGEKP
FVCSECGRGFSVRSHLFTHQRTHSGEKP
YVCKECGRGFSVKSYLTTHQRTHTGEKP
YVCKECGRGFSWKSHLITHQRTHSGEKP
YVCRQCGRGFSVQSHLIIHQRTHSGDKP
YICRECGRDFTEKSSLIRHRRTHSGEKP
YVCRDCG*GFTRKSLLITHQRTHSGEKP
YVYRECGRGFSCKSYLISHQKTHLGEKP
YVCSDCGRGFSVKSQLVSHKRTHSGEKP
FVCREC*RGFSVKSSLISHQRTHSGEKP
FVCRECGRGFSVKSSLIKHQRTHSGEKP
YVCKECGRGFSQKSSLITHQRTHSGEKP
YVCRECGRGFGLKSYLITHQRTHTGEKP
YICRECG*GFSVKSSLITDQRTHTGEKP
YVCRECGRAFSKKSSLISHHRTHPAEAV 
YVHRECG.....................

>PRDM7b_proCap Procavia capensis (hyrax) ABRQ01392668 Afro pseu 13 noDet CpG stop in ZNF1, 4aa insert exon 4, frameshift exon 5 c to cc, 4aa del exon 9 etc
0  0 
0 AKEYFRDISMFFS*ERWVEMSESEKFCYRNMKRNCETTGAG 1
2 GIRVFHPAFMIHPRKTIKAQMDDSEDSDEDWTARQQ 1
2 AKPPSVASREELRKPQK 0
0 GPSRAPLRIKSSLKRVSEPAIVWSTADSEQAQERVQKPVLSRREASASDQPLRRKV 1
2 EPRRHEAEDKRYSLRGGTGPACQEVGEPQDDDYL 1
2 yCEECRNFFIDTCVAHGTPVFIKDISVERGHPNRLALTLPTGLRIGPSSIPDAGLGVWNEASELPPGLHFGPCEGQVTEDEEAANSGYSWL 0
0 VTKGRSCFEYVDGKNEALANWMR 2
1 YVRRARDTEERNLVAFQYHRQIFYRTCCTVRPGCELLVWRGAEDSQALG----SRRTMELTSQK 1
2 EARPEIHPCPSCPLAFSTQKFLSYHVNHSHSSEPFPGTHARRHLPREDPRPGYERDQRSEQHNWNDSTGGPERDVSRP VIERTWEGEISEACSSLPRGHMGRSREGERMAETQSSPGLKVTLAK
VRWDEYGQGFGPKSHHITQQTKHSGKKP
CVCKECG*GFRVKSLLKSHQMTHSGEKP
YVCRECGRGFSVKSTLITHQRTHSGEKP
YVCRECGRGFSVKSFLISHQRTHSGEKP
YVCRECGRGFSWKSGLITHQRTHTGEKR
YVCRECGHGFNRPSRLIRHQRTHSGEQP
YVCRECGHGFNRRSQLIRHQRTHTGEQP
YVCRECGQGFSGKSGLNRHQRTHSGEKP
YVYKECGRGFSVKSTLIKHQRGHSGEKP
YVCKECGRGFSRNSGLITHQRTHSGEQP
YVCRECGRGFNQKSGVISHQRIHSGEKP
FVCGECGRRFSWQSNLITHQRTHSGEKP
FVCRECGRGFSAKTSLINHQRIH*GKKP YVCRDGG*

>PRDM7_dasNov Dasypus novemcinctus (armadillo) AAGV020462211 9 xena pseu TRAPP
0  0
0 AQDAFRDISTYFSREEWAEMGRWEKLRYRNVKRNYEALLAI 1
2 GLRAPRPAFMCHRKQSIKPQVDDAEDSDEEWTPRQQ 1
2  0
0  1
2 EPRRKGIDVKMYSLRERKGLAYEEVSEPQDDDYL 1
2 yCEKCQNFFIDSCTVHGPPIFVKDSAVDKGHPNRSALTLPSGLRIGPSGIPEAGLGIWNEASDLPLGLHFGPYEGQVTEDEEAANSGYSWL 0
0 ITKGRNYYEYEDGKDKSWANWMR 2
1 YVNCAWDDKEQNLVAFQYHRQIFYRTCRTIRPGCELLVWYGDEYGQELGIKWGSKWKKEFMTGT 1
2 ELKPEIHPCPSCPLAFSSEKFLSQHVRRHHPSQSFPAACAREHFQPQNPRPRGEEQQQHSDQCGWKDKAEGQETENRPKPLFERIKPMGSPRAFYNPPRGQMRSSREGKRMMEIQPSQDQKMNSE RGQLFLGVGIFKTEV
IKFGENRQDFSDKSDHTSHQRTHTGEKP
YVCRECGRGFSNNSHLTRHQRTHTGVKP
YVCRECGQGFSVKPALTKHQRTHTVEKP
yVCSECG GFSVKSTLITHQRTHTGEKP
CVCRECGRGFNNKPDLTKHQRTHTGEKS
YVCRECG GFSVKSTLIIHQRTHTGEKP
YVCRECGRGFSEKSNLTVHQRTHTGEKP
YVCRECGRGFSEKSNLTVHQRTHTGEKP
YVCRECGRSFSVKSTLITHQRTHTVEKP
YVCMKSEVVVSNKSHLNSHRRMKCGHRT PPPPQL

>PRDM7_choHof Choloepus hoffmanni (sloth) ABVD01893961 2 xena gene noDet
0  0
0  1
2  1
2  0
0  1
2  1
2 ycekcQNFFFENCAAHGPPTLLKDSAVGQGRPKHSALVLPPGLRLGPSGIPEAGLGVWNEASDLPLGLHFGPYEGQVTEDEEATNSGYSWL 0
0 ITKGRNCYEYVDGKDKSCANWMR 2
1 YVNCARDDEEQNLVAFQYHRQIFYRTCRAIRPGCELLVWYGDEYGQELGIKWGSKWKKELTAEK 1
2 GLKPEIHPCPSCPLAFSTEKFLSQHVQRNHPSQIFPVTYARKHLQPQDPRPGDQQQPQPHSDQCHCSDKAEDQETEKRSKPLFESTKQMGISRAYSSPPEGQMRSSREDKRTMEIEPSQDQKMNPEETRLFVGVGILKTAR
IKCGEYGQGFSVKPNLTTHQRTHTEEKP
YVCRECGRGFGQKPNLSRHQRTHTGEKP
YVCRECGRGFG.................

>PRDMx_monDom Monodelphis domestica (opossum) gene genome no GAS8 fragment KRAB SSXRD SET weak C2H2 domain
0 0  
0 GEDAFKDISTYFSKKQWVKLKEWEKVRLKNVKRNYEAMIKI 1 
2 GLSVPRPAFMCRGRQNKKVKVEESGDSDEEWIPKQL 1 
2 VKTLRFPSRAKQRTHPK 0  
0 1  
2 DCRRKDVEVHIYSLRERKYQVYQEMWDPQDDDYL 1 
2 yCEECQIFFLDSCPLHGPPTFVQDSAMVKGHPYCSAITLPPGLRIGLSGIPGAGLGVWNEASTLPLGLHFGPYKGKMTEDDEAANSGYSWM 0 
0 ITKGRNCYEYVDGKEESCSNWMR 2  
1 YVNCARDEEEQNLVAFQYHRQIFYRTCRVIRPGCELLVWYGDEYGQELGIKWGSKWKRPLPELTGE 1 
2 GKPGISLCPSTLWASPLIPSSINTRCSKQPP*VFLDSGTGKL*AGRSTAGPATSNRFQLLSDKETSPKEHPSSLWGKTKQVDRREKFSLPQSQQVRGKESSSGEDLSRIQGKSTRQTTMAFQERNR
KECE*GFTHQTNLVTHRWTHSGERP  
YVCV*GFTQKLGFSPYTWTL* 0  
 
>PRDMx_macEug Macropus eugenii (wallaby) ABQO010244377 ABQO010410412 ABQO011136158 ABQO010410657 
0 0  
0 GEDTYKDISMYFSKKQWMELREWEKIRLKNVKQNYEAMIKI 1  
2 gFSAPRPTFMCHGKQNKEAKVEESGDFDEEWIRKQP 1 
2 0  
0 1  
2 ECRRKEAEVHIYNLRERKYQVYQEIWDPQDDDYL 1  
2 FCEECQTFFLETCAVHGPPKFVQDSVMVKGHPYCSAITLPPGLRIGLSGIPGAGLGIWNEASNLPLGLHFGPYEGQMTEDDEAANSGYSWM 0 
0 2  
1 YVNCARDEEEQNLVAFQYHRKIFYRTCQIIRPGCELLVWYGDEYGQELGIKWGSKWKRPPITLT 1 
2 espGIHVCPFCPLGSPLMHSQSTYAAQTSPQICLDSRTRNNYEPDQLLPPSSSCVSDKVEISQKQRPSSLCGKTKQVNLVEMLSLPQSPQVSKKSSSMDWDVSRIQGKSAKQTTQGFQKGDKKGFGS
YKCGEYKQGFTSKSVLNRHRQKHSGKKP
YVCEECGRGFTQVSNLTTHRQTHSGEKP
YVCEECGRGFARKLNLTTHRRTHSGEKP
YVCEECGRGFTQGSSLITHRRTHSGEKP
YVCEECGRGFAWKLNLTTHRRTHSGEKP
YVCKECGRGFTQGSSLITHRRTHSGEKP
YVCKECGRGFTQGSNLTTHRRTHSGEKP
YVCKECGRGFAWKSNLTTHRRTHSGEKP
YVCKECGRGFTQVSNLIAHRRTHSGEKF
YVYGQEFTWKSDLSTCR* 0  

>PRDMx_sarHar Sarcophilus harrisii (tasmanian_devil) AFEY01386448 two distal frameshifts, syntenic -PSMC4
0 0  
0 EEDSFKDISMYFSKKQWMELRDWEKVRFKNVKRNYEAMIKI 1  
2 GLTASRPTFMCRGKQNRRAKVEESGDSDEEWMPKQL 1 
2 VKASRFSSRLKQKTHLR 0  
0 1  
2 eCRKKDAAVHIYNLRERKYPIYQEIWDPQDDDYL 1  
2 FCEECQTFFLETCAVHGPPKFVQDGAMIKGYPYCSAITLPPGLRIGLSGIPNAGLGVWNEGSNLPMGLHFGPYEGKSTEDDEAANSGYSWM 0 
0 ITKGRNSYEYVDGKEESCSNWMR 2  
1 YVNCAREEEEQNLVAFQYQRQIFYRTCRVIRPGCELLVWYGDEYGQELGIKWGRKWKRPLTGIT 1 
2 tspGIHLCPSCPSDFSTHAFLSQQVPKQPSQGFLDSTTGSHGLGNLHPDQLLPPGYSCVSDKAETSRKEHPSTLWEKIKKVDLEEPASLPQRQHVREEESNLGEWDLSRIQGESVKLTSLALQEESQEGLGQ 
YKCGEGKQRYSSKPGLIRHRQRTHSGEKC
YVCEECKRGFARRSYLNIHRRRHSGQKP
HVCEECKRGFADKSTLIRHRWTHSKEKP
YICEECKQGFTQKSYLIKHRWKHLGEKP
YVCKECKQRFTQRSYLNTHRWRHRQRS
LLCMRSAGEDLHRDHLIIHRWTHSGERP
YVCEECKGGFTQRSYLNTHTDGNVGKEEP
YVCEECR* 0

>PRDMxa_ornAna Ornithorhynchus anatinus (platypus) gene genome chrX5 fragment X5 +- 20577549 no iMet possible in first exon phase 2
0 0  
0 1    
2 1  
2 0  
0 1  
2 RIGKKPQVRDFNLRKQKRKIYNENYRPEDDDYL 1 
2 yCEICQTFFLEKCVLHGPPVFVQDLPVEKWRPNRSTITLPPGMQIKVSGIPNAGLGVWNQATSLPRGLHFGPYMGIRTKNEKESHSGYSWM 0 
2 IVRGKNYEYLDGKDKAFSNWMR 2  
1 YVNCARSEREQNLVAIQYQGEIYYRTCRVIPPGQELLVWYGLEYGRHLGILPNNNNPEP 1 
2 ERAKARVRKSERIEKAMARVRKSEQIERAKARVRTSERIERAMATV RKSERIERAKVTVKKSEQIERAMGRVRKSERIERAKDMGRKKALGGLPRPCRGGLSDETQQRKGGGHEQLGQKPGPSEA RAGPAEGSATPRR
HCCDVCRKAFKRLSHLRQHKRIHTGEKP  
LVCKVCRRTFSDPSNLNRHSRIHTGLRP  
YVCKLCRKAFADPSNLKRHVFSHTGHKP  
FVCEKCGKGFNRCDNLKDHSAKHSEDNSTPKP* 0 
 
>PRDMxb_ornAna Ornithorhynchus anatinus (platypus) gene genome chrX5 tandem fragment slight frameshift taa to ta YVN exon X5 +- 20605294 20611704 no iMet possible in first exon phase 2 gg as expected
0 0  
0 1  
2 1  
2 0  
0 1  
2 RSGKKPQVRDFNLRKQKRKMYTEESEPEDDDYL 1 
2 yCEDCQTFFLEKCSVHGPPVFVQDCEAKRCQQNRSEVTLPPGLLIKMSGIPNAGLGVWNQATSLPRGLYFGPFVGIRKNNVKDSLSGYSWA 0 
0 ILRGRNYEYLDGKNTSFSNWMR 2  
1 YVNCPRTKYEQNLVAIQYHREIYYRTTPCDSTRSRVAGVVWRRVRSYLGIFWKSETPKS 1 
2 ERPHSSGGSFAPSARSGGVKQRIWSKRRSAALQRTRERRNSTHDFPPKHEDTAARQDERQCPDRGRAKQRGVRKSEQIERAKAMGRKKALGGLSPPRRERLSDEAGQRKKSGHEQFWQKPGPSEAWAGPAEGSTIPRR
HCCDVCGKAFNRLSRLKQHKRVHTGEKP  
LVCKICKRAFSDPSNLNRHAKRHTGEKP  
FVCRVCGRSFNRSDNMNEHRWKHTSNNIIP NTGHMSATVVENASLCINRNYQIYKERATYL* 0

>PRDM7_danRer Danio rerio (zebrafish) Q6P2A1 transcript BC064665 no KRAB SSXRD or exon 5 but knuckle SET early ZNF C2H2 array
0 MSLSP 1
2 DLPPSEEQNLEIQGSATNCYSVVIIEEQDDTFNDQPF 1
2 YCEMCQQHFIDQCETHGPPSFTCDSPAALGTPQRALLTLPQGLVIGRSSISHAGLGVFNQGQTVPLGMHFGPFDGEEISEEKALDSANSWV 0
0 ICRGNNQYSYIDAEKDTHSNWMK 2
1 FVVCSRSETEQNLVAFQQNGRILFRCCRPISPGQEFRVWYAEEYAQGLGAIWDKIWDNKCISQ 1
2 GSTEEQATQNCPCPFCHYSFPTLVYLHAHVKRTHPNEYAQFTQTHPLESEAHTPITEVEQCLVASDEALSTQTQPVTESPQEQISTQNGQPIHQTENSDEPDASDIYTAAGEISDEI
HACVDCGRSFLRSCHLKRHQRTIHSKEKP
YCCSQCKKCFSQATGLKRHQHTHQEQEKNIESPDRPSDI
YPCTKCTLSFVAKINLHQHLKRHHHGEYLRLVESGSLTAETEEDHT
EVCFDKQDPNYEPPSRGRKSTKNSLKGRGCPKKVAVGRPRGRPPKNKNLEVEVQKIS
PICTNCEQSFSDLETLKTHQCPRRDDEGDNVEHPQEASQ
YICGECIRAFSNLDLLKAHECIQQGEGS
YCCPHCDLYFNRMCNLRRHERTIHSKEKP
YCCTVCLKSFTQSSGLKRHQQSHLRRKSHRQSSALFTAAI
FPCAYCPFSFTDERYLYKHIRRHHPEMSLKYLSFQEGGVLSVEKP
HSCSQCCKSFSTIKGFKNHSCFKQGEKV
YLCPDCGKAFSWFNSLKQHQRIHTGEKP
YTCSQCGKSFVHSGQLNVHLRTHTGEKP
FLCSQCGESFRQSGDLRRHEQKHSGVRP
CQCPDCGKSFSRPQSLKAHQQLHVGTKL
FPCTQCGKSFTRRYHLTRHHQKMHS* 0

>PRDM7_salSal Salmo salar NM_001173912
0 MESEWKSGGEEESGSEGERTPSSSHRDP 1
2 VCVSEQMKRAWLRQMNLRSRARVGYTEEEELRDEEYF 1
2 FCEECKSFFIEECELHGPPLFIPDTPAPLGAPDRARLTLPPGLEVRTSAIPGAGLGVFNHGHSVTQGTHYGPYEGELTDKELDMESGYSWV 0
0 IYKSKQRDEYIDGKRDTHSNWMR 2
1 YVNCARSEDEQNLVAFQYRGGILYRCCKPIAVGEELLVWYGEKYARDLGIVFDFLWDKKCSAR 1
2 GVNESSQSQIFSCSGCLFSFTAQTYLYKHIKRCHREECVRLPRSGGIRAETLAPPSGSQRCSTTPDRTPITLLTQKHRDTGKPAP
HHCSQCGKSFRRSGDLKVHQRTHTGERP
YHCSQCGKRFSVSGHLKTHQRTHTGERP
YHCSQCGKSFCRSGDLKVHQRTHTGERP
YHCSQCGKRFSVSRHLKRHQHIHTGERP
YHCSQCGKSFSASWSVKRHQITVHSVGRVSVSQEA* 0

>PRDM7_oncMyk Oncorhynchus mykiss testis FP324541 CR372724
0 mTPSSSHRDPVC 1
2 VSEQRKRAWLKQVNLCSRARVRVGYTEEEELREEDYF 1
2 FCEECKSFFIEECELHGPPLFIQDTPAPLGAPDRARLTLPPGLEVRTSAIPGAGLGVFNYGHSVTQGTHYGPYEGELTDTELAMESGYSWV 0
0 IYKSKQSDEYIDAKRETHSNWMR 2
1 YVNCARNEEEQNLVAFQYRGGILYRCCKPLAVGEELLVWYGEEYARDLGIIFDFLWDRKSSAR 1
2 GVNESSQSQIFSCSGCPFSFTAQIYLYKHTKRCHREEYVRLPRSGGIRSETLAPPSGSQRCSTTPDRTPITLLTQKHQDTGKPRP
HHCSQCGKSFHRSGDLKVHQRTHTGERP
YHCSQCGKRFSVSGNLKTHQRIHTGERP
YPCSQCGKSFHRSDLKVHQRTHTGEKP
YHCSQCGKRFSVSGNLKTHQRIHTGERL
YPCSQCGKSFHRSELKVQQRTRPGKKTISLFPVWE*

>PRDM7_ictPun Ictalurus punctatus FD367165 FD063496 C-terminus missing second gene present
0 MKTEAKDGGTEGI 2
1 VKKETLELSISNHGNSFHIIPEVVSIKEEEADVKDFL 1
2 YCEVCKSVFFSKCEVHGPALFIADSPVPMGVADRARQTLPPGLEIQKSGIPDAGLGVFNKGETVPVGAHFGPYQGELVDKEEAMNSVYSWV 0
0 IYMSRQCEKYIDAKREVHANWMR 2
1 YVNCAHSDGEQNLVAFQYRGGILYRCCRPINPGQELLVWYEEKYASDVGPIFAQLWNIKCSLSGKVHT

Tracing the early history of individual exons and concatenations

It is instructive to consider certain closely related placental KRAB, ZNF and PRDM genes that may bear on the origin of PRDM7 and PRDM9. Nomenclature in these gene families is very unsatisfactory: gene names often fail to correspond to intron structure (intronation), which is exceptionally well conserved across metazoa. For example, HKR1, a conventional ZNF family member, is egregiously misnamed. The methylase component is exceedingly old, with clear antecedents in bacteria. Evidently, gene duplications in an early intronless stem eukaryote were subsequently intronated randomly in different paralogs and shuffled into various larger proteins. Within PRDM*, the gene tree is (((PRDM7,PRDM11),(PRDM4,PRDM10)),PRDM6), with the remaining family members related to these only through the PR (SET) domain.
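Because the argument leans on conserved intronation, it can help to compare intron phases mechanically rather than by eye. The sketch below is illustrative only and rests on an assumption inferred from the listings, not stated in them: each exon line reads "start SEQUENCE end", where "end" is the phase of the downstream intron and "start" is the number of split-codon bases the exon contributes, so neighbouring exons should satisfy (end + start) % 3 == 0. The function names and the two phase tables (copied from the PRDM9d_bosTau and PRDM11_homSap entries above) are hypothetical helpers for this example.

```python
"""Compare intron-phase patterns between two exon listings (illustrative sketch).

Assumed convention (inferred, not documented): each exon is (start, end), where
end is the phase of the following intron and start is the number of bases of the
split codon this exon supplies, so adjacent exons obey (end + start) % 3 == 0.
"""

from typing import List, Optional, Tuple

Exon = Tuple[int, Optional[int]]  # (start phase, end phase or None if not given)


def junctions_ok(exons: List[Exon]) -> bool:
    """True if every exon-exon junction completes a codon under the assumed convention."""
    for (_, end), (start, _) in zip(exons, exons[1:]):
        if end is None or (end + start) % 3 != 0:
            return False
    return True


def intron_phases(exons: List[Exon]) -> List[Optional[int]]:
    """Phase of each intron, read from the end phase of the exon preceding it."""
    return [end for _, end in exons[:-1]]


def shared_suffix(a: List[Optional[int]], b: List[Optional[int]]) -> int:
    """Length of the longest run of identical intron phases at the 3' end of both genes."""
    n = 0
    while n < min(len(a), len(b)) and a[-1 - n] == b[-1 - n]:
        n += 1
    return n


# Phases from the PRDM9d_bosTau entry; the last exon runs into the zinc finger
# array without a trailing phase, so its end is recorded as None.
PRDM9D_BOSTAU: List[Exon] = [(0, 0), (0, 1), (2, 1), (2, 0), (0, 1),
                             (2, 1), (2, 0), (0, 2), (1, 1), (2, None)]

# Phases from the PRDM11_homSap entry.
PRDM11_HOMSAP: List[Exon] = [(0, 0), (0, 2), (1, 1), (2, 0), (0, 2),
                             (1, 1), (2, 0)]

if __name__ == "__main__":
    for name, exons in [("PRDM9d_bosTau", PRDM9D_BOSTAU),
                        ("PRDM11_homSap", PRDM11_HOMSAP)]:
        print(name, junctions_ok(exons), intron_phases(exons))
    n = shared_suffix(intron_phases(PRDM9D_BOSTAU), intron_phases(PRDM11_HOMSAP))
    print("shared 3' intron phases:", n)
```

Run as a script, this reports both listings as junction-consistent and finds that the last four intron phases (1, 0, 2, 1) agree. These are the introns bracketing the three exons that begin YCEEC/FCESC, ITKRR/IVDKN and YVNCA/YVVIS, which fits the picture of a shared SET-domain exon block despite the unhelpful gene names. Comparing suffixes is only one simple choice; an alignment of phase strings would tolerate gained or lost exons better.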

>PRDM11_homSap Homo sapiens (human) 511 aa 7 exons chr11:45115564 44% id PRDM9 SET
0 MLKMAEPIASLMIVECRACLRCSPLFLYQREK 0
0 DRMTENMKECLAQTNAAVGDMVTVVKTEVCSPLRDQEYGQPC 2
1 SRRPDSSAMEVEPKKLKGKRDLIVPKSFQQVDFW 1
2 FCESCQEYFVDECPNHGPPVFVSDTPVPVGIPDRAALTIPQGMEVVKDTSGESDVRCVNEVIPKGHIFGPYEGQISTQDKSAGFFSWL 0
0 IVDKNNRYKSIDGSDETKANWMR 2
1 YVVISREEREQNLLAFQHSERIYFRACRDIRPGEWLRVWYSEDYMKRLHSMSQETIHRNLAR 1
2 GEKRLQREKSEQVLDNPEDLRGPIHLSVLRQGKSPYKRGFDEGDVHPQAKKKKIDLIFKDVLEASLESAKVEAHQLALSTSLVIRKVPKYQDDAYSQCATTMTHGVQNIGQTQG
EGDWKVPQGVSKEPGQLEDEEEEPSSFKADSPAEASLASDPHELPTTSFCPNCIRLKKKVRELQAELDMLKSGKLPEPPVLPPQVLELPEFSDPAGKLVWMRLLSEGRVRSGLCGG* 0

>PRDM4_homSap Homo sapiens (human) 801 aa 11 exons chr12:108126644 3DB5:EHGPV..IGVPE SET + 1 + 6 C2H2 domains
0 MHHR 2
1 MNEMNLSPVGMEQLTSSSVSNALPVSGSHLGLAASPTHSAIPAP 1
2 GLPVAIPNLGPSLSSLPSALSLMLPMGIGDRGVMCGLPERNYTLPPPPYPHLESSYFRTILP 1
2 GILSYLADRPPPQYIHPNSINVDGNTALSITNNPSALDPYQSNGNVGLEPGIVSIDSRSVNTHGAQSLHPSDGHEVALDTAITMENVSRVTSPISTDGMAEELTMDGVAGEHSQIPNGSRSHEPLSVDSVSN
NLAADAVGHGGVIPMHGNGLELPVVMETDHIASRVNGMSDSALSDSIHTVAMSTNSVSVALSTSHNLASLESVSLHEVGLSLEPVAVSSITQEVAMGTGHVDVSSDSLSFVSPSLQMEDSNSNKENMATLFTI 1
2 WCTLCDRAYPSDCPEHGPVTFVPDTPIESRARLSLPKQLVLRQSIVGAEV 1
2 GVWTGETIPVRTCFGPLIGQQSHSMEVAEWTDKAVNHIWK 0
0 IYHNGVLEFCIITTDENECNWMMFVRKAR 2
1 NREEQNLVAYPHDGKIFFCTSQDIPPENELLFYYSRDYAQQI 1
2 GVPEHPDVHLCNCGKECNSYTEFKAHLTSHIHNHLPTQGHSGSHGPSHSKERKWKCSMCPQAFISPSKLHVHFMGHMGMKPHKCDFCSKAFSDPSNLRTHLKIHT 1
2 GQKNYRCTLCDKSFTQKAHLESHMVIHTGEKNLKCDYCDKLFMRRQDLKQHVLIHTQ 2
1 ERQIKCPKCDKLFLRTNHLKKHLNSHEGKRDYVCEKCTKAYLTKYHLTRHLKTCKGPTSSSSAPEEEEEDDSEEEDLADSVGTEDCRINSAVYSADESLSAHK* 0

>PRDM10_homSap length=1160
0 ASLPVHNQVLPSIESVDGSDPLATLQTPLGRLEAKEEEDEDEDEDTEEDEEEDGEDTDLDDWEPDPPRPFDPHDL 1
2 WCEECNNAHASVCPKHGPLHPIPNRPVLTRARASLPLVLYIDRFLGGVFSKRRIPKRTQFGPVEGPLVRGSELKDCYIHLK 0
0 VSLDKGDRKERDLHEDLWFELSDETLCNWMMFVRPAQNHLEQNLVAYQYGHHVYYTTIKNVEPKQELK 0
0 VWYAASYAEFVNQKIHDISEEERK 1
2 VLREQEKNWPCYECNRRFISSEQLQQHLNSHDEKLDVFSR 2
1 TRGRGRGRGKRRFGPGRRPGRPPKFIRLEITSENGEKSDDGTQ 0

>PRDM6_homSap length=595
0 MLKPGDPGGSAFLKVDPAYLQHWQQLFPHGGAGPLKGSGAAGLLSAPQPLQPPPPPPPPERAEPPPDSLRPRPASLSSASSTPASSSTSASSASSCAA
AAAAAALAGLSALPVSQLPVFAPLAAAAVAAEPLPPKELCLGATSGPGPVKCGGGGGGGGEGRGAPRFRCSAEELDYYLYGQQRMEIIPLNQHTSDPNN 1
2 RCDMCADNRNGECPMHGPLHSLRRLVGTSSAAAAAPPPELPEWLRDLPREVCLCTSTVPGLAYGICAAQRIQQGTWIGPFQGVLLPPEKVQAGAVRNTQHLWE 0
0 IYDQDGTLQHFIDGGEPSKSSWMRYIRCARHCGEQNLTVVQYR 2
1 SNIFYRACIDIPRGTELLVWYNDSYTSFFGIPLQCIAQDEN 1
2 LNVPSTVMEAMCRQDALQPFNKSSKLAPTTQQRSVVFPQTPCSRNFSLLDKSGPIESGFNQINVKNQRVLASPTSTSQLHSEFSDWHLWKCGQCFKTFTQRILLQMHVCTQNPDR 2
1 PYQCGHCSQSFSQPSELRNHVVTHSSDRPFKCGYCGRAFAGATTLNNHIRTHTGEKPFK 2
1 CERCERSFTQATQLSRHQRMPNECKPITESPESIEVD* 0

>ZNF133_homSap Homo sapiens (human) NP_001076799 KRAB Krueppel-associated box and zinc fingers
0 MAFRDVAVDFTQDEWRLLSPAQRTLYREVMLENYSNLVSL 1
2 GISFSKPELITQLEQGKETWREEKKCSPATCP 1
2 DPEPELYLDPFCPPGFSSQKFPMQHVLCNHPPWIFTCLCAEGNIQPGDPGPGDQ EKQQQASEGRPWSDQAEGPE GEGAMPLFGRTKKRTLG AFSRPPQRQPVSSRNGLRGVELEASPAQTGNPEETDKLLKRIEVLGFGT
VNCGECGLSFSKMTNLLSHQRIHSGEKP
YVCGVCEKGFSLKKSLARHQKAHSGEKP
IVCRECGRGFNRKSTLIIHERTHSGEKP
YMCSECGRGFSQKSNLIIHQRTHSGEKP
YVCRECGKGFSQKSAVVRHQRTHLEEKT
IVCSDCGLGFSDRSNLISHQRTHSGEKP
YACKECGRCFRQRTTLVNHQRTHSKEKP
YVCGVCGHSFSQNSTLISHRRTHTGEKP
YVCGVCGRGFSLKSHLNRHQNIHSGEKP
IVCKDCGRGFSQQSNLIRHQRTHSGEKP
MVCGECGRGFSQKSNLVAHQRTHSGERP
YVCRECGRGFSHQAGLIRHKRKHSREKP
YMCRQCGLGFGNKSALITHKRAHSEEKP
CVCRECGQGFLQKSHLTLHQMTHTGEKP
YVCKTCGRGFSLKSHLSRHRKTTSVHHR LPVQPDPEPCAGQPSDSLYSL* 0

>ZNF169_homSap Homo sapiens (human) KRAB Krueppel-associated box and zinc fingers
0 MSPGLLTTRKEALMAFRDVAVAFTQKEWKLLSSAQRTLYREVMLENYSHLVSL 1
2 GIAFSKPKLIEQLEQGDEPWREENEHLLDLCP 1
2 EPRTEFQPSFPHLVAFSSSQLLRQYALSGHPTQIFPSSSAGGDFQLEAPRCSSEKGESGETEGPDSSLRKRPSRISRTFFSPHQGDPVEWVEGNREGGTDLRLAQRMSLGGSDTMLKGADTSESGAVIRGNYRLGLSKKSSLFSHQKH
HVCPECGRGFCQRSDLIKHQRTHTGEKP
YLCPECGRRFSQKASLSIHQRKHSGEKP
YVCRECGRHFRYTSSLTNHKRIHSGERP
FVCQECGRGFRQKIALLLHQRTHLEEKP
FVCPECGRGFCQKASLLQHQSSHTGERP
FLCLECGRSFRQQSLLLSHQVTHSGEKP
YVCAECGHSFRQKVTLIRHQRTHTGEKP
YLCPQCGRGFSQKVTLIGHQRTHTGEKP
YLCPDCGRGFGQKVTLIRHQRTHTGEKP
YLCPKCGRAFGFKSLLTRHQRTHSEEEL
YVDRVCGQGLGQKSHLISDQRTHSGEKP
CICDECGRGFGFKSALIRHQRTHSGEKP
YVCRECGRGFSQKSHLHRHRRTKSGHQL LPQEVF* 0

>ZNF343_homSap Homo sapiens (human) KRAB Krueppel-associated box and zinc fingers
0 MMLPYPSALGDQYWEEILLPKNGENVETMKKLTQNHKAK 1
2 GLPSNDTDCPQKKEGKAQIV 0
0 VPVTFRDVTVIFTEAEWKRLSPEQRNLYKEVMLENYRNLLSL 1
2 AEPKPEIYTCSSCLLAFSCQQFLSQHVLQIFLGLCAENHFHPGNSSPGHWKQQGQQYSHVSCWFENAEGQERGGGSKPWSARTEERETSRAFPSPLQRQSASPRKGNMVVETEPSSAQRPNPVQLDKGLKELETLRFGA
INCREYEPDHNLESNFITNPRTLLGKKP
YICSDCGRSFKDRSTLIRHHRIHSMEKP
YVCSECGRGFSQKSNLSRHQRTHSEEKP
YLCRECGQSFRSKSILNRHQWTHSEEKP
YVCSECGRGFSEKSSFIRHQRTHSGEKP
YVCLECGRSFCDKSTLRKHQRIHSGEKP
YVCRECGRGFSQNSDLIKHQRTHLDEKP
YVCRECGRGFCDKSTLIIHERTHSGEKP
YVCGECGRGFSRKSLLLVHQRTHSGEKH
YVCRECRRGFSQKSNLIRHQRTHSNEKP
YICRECGRGFCDKSTLIVHERTHSGEKP
YVCSECGRGFSRKSLLLVHQRTHSGEKH
YVCRECGRGFSHKSNLIRHQRTH* 0
 
>ZNF589_homSap length=364
0 MWAPREQLLGWTAE 1
2 ALPAKDSAWPWEEKPRYL 0
0 GPVTFEDVAVLFTEAEWKRLSLEQRNLYKEVMLENLRNLVSL 1
2 AESKPEVHTCPSCPLAFGSQQFLSQDELHNHPIPGFHAGNQLHPGNPCPEDQPQSQHPSDKNHRGAEAEDQRVEGGVRPLFWSTNERGALVGFSSLFQRPPISSWG
GNRILEIQLSPAQNASSEEVDRISKRAETPGFGAVTFGECALAFNQKSNLFRQKAVTAEKSSDKRQSQVCRECGRGFSRKSQLIIHQRTHTGEKPYVCGECGRGFIVESVLRNHLSTHSG
EKPYVCSHCGRGFSCKPYLIRHQRTHTREKSFMCTVCGRGFREKSELIKHQRIHTGDKPYVCRD* 0

>ZNF596_homSap Homo sapiens (human) KRAB Krueppel-associated box and zinc fingers
0 MESQESVTFQDVAVDFTQEEWALLDTSQRTLFREVMLENISHLVSV 1
2 GNQLYKSDVISHLEQGEQLSREGLGFLQGQSPVISDREDDPKKQEMLSMQHICKKDAPLISAMQWSHTQEDPLECNNFREKFTEILPLTQYVIPQVGKKPFISQDVGKAISYLPSFNIQKQIHSRSKS
YECHQRRNTFIQSSAHRQHNNTQTGEKT
FECHVCRKAFSKSSNLRRHEMIHTGVKP
HGCHLCGKSFTHCSDLRKHERIHTGEKL
YGCHLCGKAFSKSYNLRRHEVIHTKEKP
NECHLCGKAFAHCSDLRKHERTHFGEKP
YGCHLCGKTFSKTSYLRQHERTHNGEKP
YGCHLCGKAFTHCSHLRKHERTHTGEKP
YECHLCGKAFTESSVLRRHERTHTGEKP
YECHLCWKAFTDSSVLKRHERTHTGEKP
YECHLCGKTFNHSSVLRRHERTHTGEKP
YECNICGKAFNRSYNFRLHKRIHTGEKP
YKCYLCGKAFSKYFNLRQHENSCYKGNK* 0

>HKR1_homSap Homo sapiens (human) KRAB Krueppel-associated box and zinc fingers
0 MRVNHTVSTMLPTCMVHRQTMSCSGAGGITAFVAFRDVAVYFTQEEWRLLSPAQRTLHREVMLETYNHLVSL 1
2 EIPSSKPKLIAQLERGEAPWREERKCPLDLCP 1
2 ESKPEIQLSPSCPLIFSSQQALSQHVWLSHLSQLFSSLWAGNPLHLGKHYPEDQ KQQQDPFCFSGKAEWIQE GEDSRLLFGRVSKNGTSKALSSPPEEQQPAQSKEDNTVVDIGSSPERRADLEETDKVLHGLEVSGFGE
IKYEEFGPGFIKESNLLSLQKTQTGETP
YMYTEWGDSFGSMSVLIKNPRTHSGGKP
YVCRECGRGFTWKSNLITHQRTHSGEKP
YVCKDCGRGFTWKSNLFTHQRTHSGLKP
YVCKECGQSFSLKSNLITHQRAHTGEKP
YVCRECGRGFRQHSHLVRHKRTHSGEKP
YICRECEQGFSQKSHLIRHLRTHTGEKP
YVCTECGRHFSWKSNLKTHQRTHSGVKP
YVCLECGQCFSLKSNLNKHQRSHTGEKP
FVCTECGRGFTRKSTLSTHQRTHSGEKP
FVCAECGRGFNDKSTLISHQRTHSGEKP
FMCRECGRRFRQKPNLFRHKRAHSGA
FVCRECGQGFCAKLTLIKHQRAHAGGKP
HVCRECGQGFSRQSHLIRHQRTHSGEKP
YICRKCGRGFSRKSNLIRHQRTHSG* 0

>GAS8_homSap Homo sapiens (human) synteny marker, right centromeric, positive strand; C16orf3- in second intron; growth arrest-specific, deleted in cancer
MAPKKKGKKGKAKGTPIVDGLAPEDMSKEQVEEHVSRIREELDREREERNYFQLERDKIHTFWEITRRQLEEKKAELRNKDREMEEAEERHQVEIKVYKQKVKHLLYEHQNNLTEMKAEG
TVVMKLAQKEHRIQESVLRKDMRALKVELKEQELASEVVVKNLRLKHTEEITRMRNDFERQVREIEAKYDKKMKMLRDELDLRRKTELHEVEERKNGQIHTLMQRHEEAFTDIKNYYNDI
TLNNLALINSLKEQMEDMRKKEDHLEREMAEVSGQNKRLADPLQKAREEMSEMQKQLANYERDKQILLCTKARLKVREKELKDLQWEHEVLEQRFTKVQQERDELYRKFTAAIQEVQQKT
GFKNLVLERKLQALSAAVEKKEVQFNEVLAASNLDPAALTLVSRKLEDVLESKNSTIKDLQYELAQVCKAHNDLLRTYEAKLLAFGIPLDNVGFKPLETAVIGQTLGQGPAGLVGTPT*

>CDH12_homSap Homo sapiens (human) synteny marker chr 5 794 aa
MLTRNCLSLLLWVLFDGGLLTPLQPQPQQTLATEPRENVIHLPGQRSHFQRVKRGWVWNQFFVLEEYVGSEPQYVGKLHSDLDKGEGTVKYTLSGDGAGTVFTIDETTGDIHAIRSLDRE
EKPFYTLRAQAVDIETRKPLEPESEFIIKVQDINDNEPKFLDGPYVATVPEMSPVGAYVLQVKATDADDPTYGNSARVVYSILQGQPYFSIDPKTGVIRTALPNMDREVKEQYQVLIQAK
DMGGQLGGLAGTTIVNITLTDVNDNPPRFPKSIFHLKVPESSPIGSAIGRIRAVDPDFGQNAEIEYNIVPGDGGNLFDIVTDEDTQEGVIKLKKPLDFETKKAYTFKVEASNLHLDHRFH
SAGPFKDTATVKISVLDVDEPPVFSKPLYTMEVYEDTPVGTIIGAVTAQDLDVGSSAVRYFIDWKSDGDSYFTIDGNEGTIATNELLDRESTAQYNFSIIASKVSNPLLTSKVNILINVL
DVNEFPPEISVPYETAVCENAKPGQIIQIVSAADRDLSPAGQQFSFRLSPEAAIKPNFTVRDFRNNTAGIETRRNGYSRRQQELYFLPVVIEDSSYPVQSSTNTMTIRVCRCDSDGTILS
CNVEAIFLPVGLSTGALIAILLCIVILLAIVVLYVALRRQKKKDTLMTSKEDIRDNVIHYDDEGGGEEDTQAFDIGALRNPKVIEENKIRRDIKPDSLCLPRQRPPMEDNTDIRDFIHQR
LQENDVDPTAPPYDSLATYAYEGSGSVAESLSSIDSLTTEADQDYDYLTDWGPRFKVLADMFGEEESYNPDKVT*

Online References

The reverse chronological list below covers 39 articles on PRDM9 and related issues, with free full text for individual articles where available (pmc, htm, pdf) and otherwise the abstract (abs):

abs 2011  Briknarova       The PR/SET domain in PRDM4 is preceded by a zinc knuckle.  Proteins. 2011 Jul;79(7):2341-5. doi: 10.1002/prot.23057.
pmc 2011  Fledel       Variation in human recombination rates and its genetic determinants.  PLoS One. 2011;6(6):e20321.
abs 2011  Neaves       Unisexual reproduction among vertebrates.  Trends Genet. 2011 Mar;27(3):81-8.
abs 2011  Ponting      What are the genomic drivers of the rapid evolution of PRDM9?  Trends Genet. 2011;1-7.
htm 2011  Yanover      Extensive protein and DNA backbone sampling improves structure-based specificity prediction for C2H2 zinc fingers.  Nucleic Acids Res. 2011 Feb 22
pdf 2011  Ubeda        Red Queen theory of recombination hotspots.  J Evol Biol. 2011 Mar;24(3):541-53.
abs 2010  Hochwagen    Meiosis: a PRDM9 guide to the hotspots of recombination.  Curr Biol. 2010 Mar 23;20(6):R271-4.
abs 2010  Klug         The discovery of zinc fingers and practical applications in gene regulation and genome manipulation.  Q Rev Biophys. 2010 Feb;43(1):1-21.
abs 2010  Berg         PRDM9 variation strongly influences recombination hot-spot activity and meiotic instability in humans.  Nat Genet. 2010 Oct;42(10):859-63.
abs 2010  McVean       PRDM9 marks the spot.  Nat Genet. 2010 Oct;42(10):821-2.
pdf 2010  Kong         Fine-scale recombination rate differences between sexes, populations and individuals.  Nature. 2010 Oct 28;467(7319):1099-103.
pmc 2010  Parvanov     Prdm9 controls activation of mammalian recombination hotspots.  Science. 2010 Feb 12;327(5967):835.
pmc 2010  Lorenz       The ancient mammalian KRAB zinc finger gene cluster on human chromosome 8q24.3.  BMC Genomics. 2010 Mar 26;11:206.
pmc 2010  Neale        PRDM9 points the zinc finger at meiotic recombination hotspots.  Genome Biol. 2010;11(2):104.
pmc 2010  Sandovici    PRDM9 sticks its zinc fingers into recombination hotspots and between species.  F1000 Biol Rep. 2010 May 24;2.
pmc 2010  Billings     Patterns of recombination activity on mouse chromosome 11 revealed by high resolution mapping.  PLoS One. 2010 Dec 8;5(12):e15340.
htm 2010  Cheung       Genetic control of hotspots.  Science. 2010 Feb 12;327(5967):791-2.
pdf 2005  Urnov        Highly efficient endogenous human gene correction using designed zinc-finger nucleases.  Nature. 2005 Jun 2;435(7042):646-51.
htm 2010  Zheng        Detecting sequence polymorphisms associated with meiotic recombination hotspots in the human genome.  Genome Biol. 2010;11(10):R103.
htm 2010  Baudat       PRDM9 is a major determinant of meiotic recombination hotspots in humans and mice.  Science. 2010 Feb 12;327(5967):836-40.
htm 2010  Myers        Drive against hotspot motifs in primates implicates the PRDM9 gene in meiotic recombination.  Science. 2010 Feb 12;327(5967):876-9.
pmc 2009  Berglund     Hotspots of biased nucleotide substitutions in human genes.  PLoS Biol. 2009 Jan 27;7(1):e26.
pmc 2009  Thomas       Evolution of C2H2-zinc finger genes revisited.  BMC Evol Biol. 2009 Mar 4;9:51.
pmc 2009  Oliver       Accelerated evolution of the Prdm9 speciation gene across diverse metazoan taxa.  PLoS Genet. 2009 Dec;5(12):e1000753.
pmc 2009  Thomas       Extraordinary molecular evolution in the PRDM9 fertility gene.  PLoS One. 2009 Dec 30;4(12):e8505.
htm 2009  Willis       Origin of species in overdrive.  Science. 2009 Jan 16;323(5912):350-1.
htm 2009  Irie         Single-nucleotide polymorphisms of the PRDM9 (MEISETZ) gene in patients with nonobstructive azoospermia.  J Androl. 2009 Jul-Aug;30(4):426-31.
htm 2009  Mihola       A mouse speciation gene encodes a meiotic histone H3 methyltransferase.  Science. 2009 Jan 16;323(5912):373-5.
abs 2008  Brayer       The protein-binding potential of C2H2 zinc finger domains.  Cell Biochem Biophys. 2008;51(1):9-19.
pmc 2008  Duret        The impact of recombination on nucleotide substitutions in the human genome.  PLoS Genet. 2008 May 9;4(5):e1000071.
pmc 2008  Miyamoto     Two single nucleotide polymorphisms in PRDM9 (MEISETZ) gene may be a genetic risk factor for Japanese patients with azoospermia by meiotic arrest.  J Assist Reprod Genet. 2008 Nov-Dec;25(11-12):553-7.
htm 2008  Cho          Prediction of DNA binding sites for zinc finger proteins.  Biochem Biophys Res Commun. 2008 May 9;369(3):845-8.
pmc 2007  Coop         Live hot, die young: transmission distortion in recombination hotspots.  PLoS Genet. 2007 Mar 9;3(3):e35.
pmc 2007  Fumasoni     Family expansion and gene rearrangements contributed to the functional specialization of PRDM genes in vertebrates.  BMC Evol Biol. 2007 Oct 4;7:187.
pdf 2006  Phillips     A family of zinc-finger proteins is required for chromosome-specific pairing and synapsis during meiosis.  Dev Cell. 2006 Dec;11(6):817-29.
htm 2006  Birtle       Meisetz and the birth of the KRAB motif.  Bioinformatics. 2006 Dec 1;22(23):2841-5. 
pdf 2006  Hayashi      Meisetz, a novel histone tri-methyltransferase, regulates meiosis-specific epigenesis.  Cell Cycle. 2006 Mar;5(6):615-20.
pdf 2005  Hayashi      A histone H3 methyltransferase controls epigenetic events required for meiotic prophase.  Nature. 2005 Nov 17;438(7066):374-8.
abs 2000  Laity        DNA-induced alpha-helix capping in conserved linker sequences is a determinant of binding affinity in Cys(2)-His(2) zinc fingers.  J Mol Biol. 2000 Jan 28;295(4):719-27.