Bison: nuclear genomics
Bison conservation genomics: introduction
The nuclear genome of bison, like the mitochondrial genome, raises significant conservation management issues because the consequences of nineteenth and twentieth century bottlenecks (and consequent inbreeding) are still with us today.
Most conservation herds are derived from tiny founding populations and have been maintained for many decades at far too low a population level, with surplus animals removed episodically without the slightest consideration of population genetic impacts. Other management practices, such as elimination of predators, winter feeding, sex-ratio imbalance, culling of unruly bulls, and trophy hunts, also interfere with natural selection (survival of the fittest).
The founding individuals of a given herd, even previously wild animals shaped by millennia of natural selection, still carry a substantial genetic load. Autosomal recessives form an important component of that load and are the primary focus here. These are gene mutations present in one of the two copies of a non-sex chromosome, whose effects are more or less masked by the properly functioning second copy.
When the founding population is small, the frequency of any autosomal recessive mutation it carries is necessarily high: a single heterozygous founder among N founders already gives an allele frequency of 1/(2N). As inbreeding is unavoidable, offspring can inherit two bad copies of the gene. In this homozygous state no compensation can occur and the disease associated with the mutation is fully manifested. Note that populations often harbor mutations at different sites in the same gene. Here an affected offspring can be a compound heterozygote: two bad copies, but with the defects at different sites in the same protein.
Looking just at same-site autosomal recessives, the two variables are the frequency q of the bad allele in the population and the coefficient of inbreeding f. The latter is the probability that the two alleles at a locus are identical by descent (autozygosity), equivalently the expected fraction of the genome that is autozygous. There is an assumption here, that the surviving parental animals were not already inbred, which would not be valid for the nineteenth century bottleneck of Yellowstone National Park (YNP) bison.
This can be translated into millions of DNA base pairs lacking heterozygosity, assuming a bison nuclear genome of roughly three billion base pairs (the figures here correspond to about 2.86 billion bp of autosomal sequence). For f = 1/4 there are then 716,000,000 base pairs of inbreeding-derived homozygosity, enough for 5,000 protein coding genes (a quarter of roughly 20,000). This DNA will be broken up into blocks by recombination. For f = 1/8 these numbers are 358,000,000 bp and 2,500 genes. The inbreeding coefficients of offspring from other matings are quickly computed:
- Father/daughter, mother/son or brother/sister → 25%
- Grandfather/granddaughter or grandmother/grandson → 12.5%
- Half-brother/half-sister → 12.5%
- Uncle/niece or aunt/nephew → 12.5%
- Great-grandfather/great-granddaughter or great-grandmother/great-grandson → 6.25%
- Half-uncle/niece or half-aunt/nephew → 6.25%
- First cousins → 6.25%
- First cousins once removed or half-first cousins → 3.1%
- Second cousins or first cousins twice removed → 1.6%
- Second cousins once removed or half-second cousins → 0.78%
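The percentages in this list follow from Wright's path-counting rule, f = sum of (1/2)^(n1 + n2 + 1) over the common ancestors of the two parents. Below is a minimal Python sketch. It assumes the common ancestors are not themselves inbred and takes about 2.864 billion bp of autosomal sequence, the length implied by the 716 Mbp figure. The function name, the `MATINGS` encoding and both constants are illustrative, not taken from any published tool:

```python
from fractions import Fraction

def inbreeding_coefficient(paths):
    """Wright's path-counting rule, assuming non-inbred common ancestors (F_A = 0).

    `paths` holds one (n1, n2) pair per common ancestor of the two parents:
    n1 and n2 are the generations from each parent back to that ancestor
    (0 if the parent *is* the ancestor). f = sum of (1/2)**(n1 + n2 + 1).
    """
    return sum((Fraction(1, 2) ** (n1 + n2 + 1) for n1, n2 in paths), Fraction(0))

# Hypothetical encoding of some of the matings listed above.
MATINGS = {
    "father/daughter": [(0, 1)],
    "brother/sister": [(1, 1), (1, 1)],        # two shared parents
    "grandfather/granddaughter": [(0, 2)],
    "half-brother/half-sister": [(1, 1)],
    "uncle/niece": [(1, 2), (1, 2)],           # uncle's parents = niece's grandparents
    "first cousins": [(2, 2), (2, 2)],
    "half-first cousins": [(2, 2)],
    "second cousins": [(3, 3), (3, 3)],
    "half-second cousins": [(3, 3)],
}

AUTOSOMAL_BP = 2_864_000_000  # assumption: length implied by 716 Mbp at f = 1/4
CODING_GENES = 20_000         # approximate protein-coding gene count

for name, paths in MATINGS.items():
    f = inbreeding_coefficient(paths)
    print(f"{name:26s} f = {str(f):6s} {float(f):7.2%}  "
          f"~{float(f * AUTOSOMAL_BP) / 1e6:4.0f} Mbp  ~{float(f * CODING_GENES):5.0f} genes")
```

Each (n1, n2) pair counts generations from each parent back to one shared ancestor. Full siblings, for example, share two ancestors one generation back on each side, so their two paths each contribute 1/8.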
The frequency of an autosomal recessive disorder in the offspring of a consanguineous mating is then qf + q²(1 - f). The first term counts offspring whose two copies are identical by descent, and the second counts those homozygous by chance. Inserting various coefficients of inbreeding and realistic deleterious allele frequencies, it quickly emerges that almost all autosomal recessive disease in bison arises from inbreeding. Very rarely does it arise in the offspring of remotely related animals.
Example: suppose a bison bull is dominant for a few years. If it breeds with a calf it previously sired, f is 1/4. The odds that a case of recessive disease did not result from inbreeding exceed 50-50 only when q exceeds 1/3. However, q = 0.1 is about the largest q known in human disease (hemochromatosis). Cystic fibrosis is another extreme case, but there q is only 0.02. Consider a bison disease allele at that frequency, with the disease observed in the offspring. The odds are overwhelming (94%) that it came from inbreeding, not from a mating of unrelated bison representative of the whole population. The odds are still high for the grandparent and first cousin situations (88% and 77%) and are higher still for the lower q values that are more typical. In summary, autosomal recessive disease in bison can be brought back to natural levels simply by avoiding inbreeding.
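The arithmetic behind these odds can be checked directly. Below is a short Python sketch of the formula above, with hypothetical function names. It takes the share of affected offspring attributable to inbreeding as qf divided by qf + q²(1 - f). The break-even allele frequency is the q at which that share equals one half, which gives q = f/(1 - f):

```python
def affected_frequency(q, f):
    """Expected frequency of homozygous-recessive offspring: q*f + q**2 * (1 - f)."""
    return q * f + q * q * (1 - f)

def share_from_inbreeding(q, f):
    """Fraction of affected offspring whose two copies are identical by descent."""
    return q * f / affected_frequency(q, f)

def break_even_q(f):
    """Allele frequency at which inbreeding explains exactly half the cases.

    Setting q*f equal to q**2 * (1 - f) gives q = f / (1 - f).
    """
    return f / (1 - f)

q = 0.02  # cystic-fibrosis-like allele frequency used in the text
for label, f in [("parent/offspring", 1 / 4),
                 ("grandparent/grandchild", 1 / 8),
                 ("first cousins", 1 / 16)]:
    print(f"{label:24s} f = {f:.4f}  from inbreeding: {share_from_inbreeding(q, f):.0%}"
          f"  break-even q = {break_even_q(f):.3f}")
```

For q = 0.02 this reproduces the 94%, 88% and 77% quoted in the example. It also gives a break-even q of 1/3 for f = 1/4.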
Extensive whole genome sequencing in humans has established the typical variation carried by each individual. Each person carries about 275 loss-of-function variants and 75 variants previously implicated in inherited disease, both classes typically heterozygous and differing from person to person. Each person also differs from the reference human proteome of 9,000,000 amino acids at 10,500 other sites (0.12%). The potentially deleterious variants include 200 in-frame indels, 90 premature stop codons, 45 splice-site-disrupting variants and 235 deletions shifting the reading frame.
In bison the overall genetic load is surely worse in view of extreme bottlenecks, a history of small herd sizes and unavoidable inbreeding. Offspring carrying deleterious nuclear genes in the homozygous state will be more abundant than in humans, who are also inbred but not nearly to the same extent.
Measuring inbreeding in bison
Little bison nuclear genome data is currently available, but that situation is changing rapidly. Whole genome sequencing projects are under way not only for bison but also for closely related species such as yak, water buffalo and domestic cow, and for fossil steppe and plains bison. Together these can help establish a baseline of normality for current conservation herd bison.
Humans, however, are already intensively studied. Incest studies in humans have implications that transfer to bison herds with a limited number of bulls, or with a single bull maintaining breeding dominance across generations. The graphic at left shows how a human SNP chip detected incest in a 3-year-old boy with multiple medical issues without access to parental DNA.
The green blocks show 668 million base pairs of DNA homozygosity out of the 716 Mbp expected for parent-child incest (coefficient of inbreeding 1/4, human genome size 3,000 Mbp). This represents about a quarter of the genes, approximately 5,000. Of these, 62 would be expected to carry a deleterious mutation in heterozygous form in the parent. Each autozygous block is a copy of just one of that parent's two haplotypes, so only half of these, some 31 on average, would now be homozygous deleterious in the child.
Humans exiting Africa experienced significant bottlenecks, both at that time and during subsequent glaciations, climate change and founding-population migrations. Inbreeding was unavoidable at times. Cousin marriage remains very common today in human populations, some of which have long been closed to outsiders and are now rife with autosomal recessive disease. Thus human data are considerably applicable to the bison situation.
The diagram below shows genealogical terminology relative to inbreeding. It is important to track sex because X-linked mutations readily manifest themselves in males, whose X chromosome is single copy. Additionally, the mitochondrial genome is maternally inherited, so the two genomes need to be co-managed. The Y chromosome is also of interest because its non-recombining portion in bison-cattle hybrid herds would still be intact (initial crosses always used a bison bull).
Incest is a crime in nearly all human societies, but management-driven incest in bison is not. The SNP chip here had 620,901 markers, 12x the resolution of the comparable cattle chip applied to bison. The cattle chip would thus still give clear results in bison, though without the same sharp resolution, because the median marker spacing would slip to 32.4 kbp and the average spacing to 56.4 kbp. Consider matings between bison related at the second degree (uncle-niece) or as double first cousins. Here the inbreeding coefficient is 1/8 and the expected level of homozygosity is 358 Mbp. Such a calf would carry roughly 15 homozygous deleterious mutations.
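The step from autozygous genes to homozygous deleterious variants can be written out as a rough calculation. The Python sketch below rests on two assumptions. First, the shared ancestor carries about 250 heterozygous deleterious variants spread evenly over the genome; the value 248 is chosen so that a quarter gives the 62 used for the human case. Second, deleterious alleles contributed by other ancestors are ignored. The names are hypothetical:

```python
def expected_homozygous_deleterious(het_deleterious, f):
    """Rough expected count of deleterious variants made homozygous in one offspring.

    Assumes the shared ancestor carries `het_deleterious` heterozygous deleterious
    variants spread evenly over the genome. A fraction f of them lies in
    autozygous blocks; each block copies one of the ancestor's two haplotypes,
    so only half of those end up homozygous for the deleterious allele.
    """
    in_blocks = het_deleterious * f
    return in_blocks, in_blocks / 2

HET_DELETERIOUS = 248  # assumption: about 250, chosen so that f = 1/4 gives 62
for label, f in [("parent/offspring", 1 / 4), ("uncle/niece", 1 / 8)]:
    in_blocks, homozygous = expected_homozygous_deleterious(HET_DELETERIOUS, f)
    print(f"{label:16s} f = {f:.3f}: ~{in_blocks:.0f} in autozygous blocks, "
          f"~{homozygous:.1f} homozygous deleterious")
```

Halving f halves the expectation, which is where the roughly 15 for f = 1/8 comes from (15.5 under these assumptions).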
Bison are routinely corralled and tested for previous exposure to brucellosis. The blood samples taken also serve for DNA sampling. A tiny volume placed on special filter paper (FTA cards) is stable for years at room temperature and provides DNA suitable for readout on the widely used Illumina bovine SNP BeadChip. Thus it is fast and cheap to determine the extent of inbreeding at Yellowstone National Park, even though cattle introgression (the original use of the chip in bison) is not the issue there. Inbreeding and long-ago introgression are simply opposite extremes.
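Chip genotypes are usually turned into an inbreeding estimate by scanning for long runs of homozygosity (ROH) and summing their length. The fraction of the autosomal genome in ROH (often written F_ROH) then estimates f. Below is a deliberately simplified Python sketch for one chromosome, assuming genotypes coded 0/1/2. The thresholds, function names and synthetic data are illustrative. Dedicated tools such as PLINK's `--homozyg` add safeguards for genotyping errors and marker gaps that are omitted here:

```python
def runs_of_homozygosity(markers, min_snps=50, min_length_bp=1_000_000):
    """Find runs of consecutive homozygous SNP calls on one chromosome.

    `markers` is an iterable of (position_bp, genotype) sorted by position,
    genotype coded 0/1/2 (copies of the alternate allele) or None if missing.
    A heterozygous call (1) ends a run; missing calls are skipped.
    Returns a list of (start_bp, end_bp) for runs passing both thresholds.
    """
    runs = []
    start = last = None
    count = 0
    # A trailing heterozygous sentinel flushes a run that reaches the chromosome end.
    for pos, gt in list(markers) + [(None, 1)]:
        if gt is None:
            continue
        if gt in (0, 2):                      # homozygous: start or extend a run
            if start is None:
                start = pos
            last = pos
            count += 1
        else:                                 # heterozygous: close the current run
            if start is not None and count >= min_snps and last - start >= min_length_bp:
                runs.append((start, last))
            start = last = None
            count = 0
    return runs


def f_roh(runs, autosomal_bp):
    """Inbreeding estimate: total length in ROH divided by the autosomal length."""
    return sum(end - start for start, end in runs) / autosomal_bp


# Synthetic chromosome (hypothetical): 2,000 markers 50 kb apart (100 Mbp).
# Every 4th marker is heterozygous except inside a planted autozygous block
# spanning 30-55 Mbp.
SPACING = 50_000
markers = []
for i in range(2000):
    pos = i * SPACING
    in_block = 30_000_000 <= pos <= 55_000_000
    markers.append((pos, 1 if (i % 4 == 0 and not in_block) else 0))

runs = runs_of_homozygosity(markers)
print(runs, f"F_ROH = {f_roh(runs, 2000 * SPACING):.3f}")
```

On the synthetic chromosome the only run reported is the planted block, extended slightly to the nearest flanking heterozygotes, giving F_ROH = 0.253. The short homozygous stretches between scattered heterozygotes fall below the 50-marker threshold.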
Actual opportunistic measurement of corralled, radio-collared, or naturally expired animals is vastly preferable to academic exercises in theoretical population modeling. The real interest lies not in maintaining neutral allele frequencies measured by microsatellites or junk-DNA SNPs. It lies in the frequencies of consequential deleterious alleles in specific genes that are the legacy of the initial bottleneck, and of adaptive alleles. So far the genes and alleles of interest have been determined only for bison mitochondria.
Real bison herds are impossibly complex. The females are strongly matrilocal. Distinct herds may persist for decades because of physical barriers between valleys, yet bulls and sometimes family groups of cows may wander between them. Herd sizes and composition fluctuate dramatically from year to year. They depend primarily on the severity of culls and their targeting of extended family groups, on winter snowfall, and on susceptibility to predation and disease.
One wonders what management purpose the obscure theoretical constructs of population ecology can serve in real world conservation genomics, given real bison herds have constantly shifting and essentially unmeasurable hereditary allele parameters in 20,000 genes, with two weakly related bison differing at more than 10,500 amino acid sites.
Microsatellites: obsolete markers for conservation genomics
Microsatellites have been used until quite recently to measure genetic diversity in bison herds and by implication genetic health (apparently defined as adequate diversity rather than quality of that diversity -- genetic load). This is sometimes combined with survival of calves (recruitment) -- another measure that does not consider the level of inherited disease in those calves.
It's worth stressing that ten years ago, proper sequencing was slow and expensive. Microsatellite and restriction fragment length polymorphism (RFLP) measurements, by contrast, were affordable at population-level scales and considered cutting edge. The analogy today would be to use a bovine SNP chip instead of a whole exome chip. The exome chip is itself a cheap substitute for low-coverage whole genome reads, which in turn are a cheap substitute for whole genome assembly. The goal today is not to manage neutral markers but to determine the adaptive and maladaptive alleles at the whole genome level.
What are microsatellite markers, how do they get chosen, and what justifies the frequent assumption that they are neutral markers?
Microsatellites are short tandem DNA repeats, usually with repeat units of 1-6 base pairs. They are chosen primarily because primers in the adjacent unique sequence amplify properly. This allows multiplexed PCR reads from both directions through the repeat, so that allele lengths can be scored. Bison microsatellites were adapted from known non-syntenic cow loci with large numbers of distinguishable alleles. The dinucleotide repeats used do not occur in coding sequence, where a change in length would shift the reading frame. At the time bison microsatellites were chosen, only chromosome number and centimorgan position were known.
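Perfect dinucleotide repeats of this kind can be located with a single backreference regular expression. Below is a minimal Python sketch with a hypothetical function name and an arbitrary five-unit minimum. It ignores interrupted and compound repeats, which real repeat annotators handle:

```python
import re

def find_dinucleotide_repeats(seq, min_units=5):
    """Locate perfect dinucleotide tandem repeats (e.g. GTGTGT...) in a sequence.

    Returns (start, unit, n_units) tuples with 0-based starts. Mononucleotide
    runs such as AAAAAA also match the pattern and are skipped.
    """
    pattern = re.compile(r"([ACGT]{2})\1{%d,}" % (min_units - 1))
    hits = []
    for m in pattern.finditer(seq.upper()):
        unit = m.group(1)
        if unit[0] != unit[1]:
            hits.append((m.start(), unit, len(m.group(0)) // 2))
    return hits

# Cow genome allele of BMS510 (left flank + repeat + right flank) as listed
# in the BMS510 comparison table further on.
cow = ("tgcatgattctcattccagtctagaaac"
       "gtgtgtgtgtgtgtgtgtgtgtgtgtg"
       "cattaatacattagcagcaga")
print(find_dinucleotide_repeats(cow))
```

Applied to that allele, it reports a single GT run beginning right after the 28-base left flank.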
It is instructive to take primers for microsatellites historically used in bison parentage testing and actually map them with BLAT onto the most recent assembly of the cow genome. This became feasible with the release of the Bos taurus genome on 27 Sept 2004. That early assembly has since been replaced by version 4.2, dated April 2009, and by a competing Maryland assembly.
Determining the genomic context of a few dozen microsatellites takes less than five minutes. Even so, it appears not to have been done in bison population ecology, not even to determine whether a microsatellite is autosomal or sex-linked. For the X-linked microsatellite BM6017, its position relative to the region that recombines with chr Y needs to be established before it can inform effective population size. BM6017 lies within the first million base pairs (chrX:786,830), so it likely recombines with Y (the cow reference genome came from a female).
The results raise various concerns. The features are all dinucleotide repeats, exceedingly prone to replication slippage and hence to frequent length homoplasy (slippage is well documented in repeat diseases such as Huntington disease). Worse, the common panel of microsatellites ranges from strong phylogenetic conservation over hundreds of millions of years to no conservation at all (candidates for selective neutrality).
Some microsatellite primer pairs lie within common retroposons (i.e., they were never screened with RepeatMasker or BLAST). Other pairs do not map to the same region of the 2009 bovine assembly. Still others have only one mappable primer. In some cases the microsatellites themselves are only portions of larger regions of compositional simplicity, making them even more prone to single-generation expansion and contraction. Yet other primer pairs map to regions that have experienced segmental duplication. These present scoring ambiguity, depending on which paralogous copy of the microsatellite gets amplified.
Four microsatellites with satisfactory genomic mapping lie within coding gene introns but seventeen others do not. While coding introns (and their embedded microsatellites) are not themselves translated into protein, they commonly influence splicing efficiency, alternative splice donors and acceptors, mRNA stability, gene regulation and so on, hence can deviate enormously from postulated neutrality.
The microsatellite database at left shows accession numbers in the first column and chromosomal position spanned by the primer pair in the second. The third shows genome browser screenshots of 200 bp width.
In each small graphic, the bottom line represents the standard phastCons measurement of phylogenetic conservation derived from global whole genome alignment. In the cow genome browser, this track compares cow to dog, human, mouse, and outgroup platypus. The upper band shows the primer pairs; the middle band shows the microsatellite itself.
Note that for about half the microsatellites, the repeat is strongly conserved relative to conservation observed genomewide. In others the conserved region is weaker or broader. A few microsatellites are not conserved at all. The final column comments on the extent of marker neutrality; it was used to provide sort order. Microsatellites could have been chosen consistently from the simple repeat track available for the cow genome browser from 2004 on. However they were not.
While microsatellites are obsolete in bison today because of the bovine SNP chip (which itself is far from ideal), the question remains what to do with legacy microsatellite data and wildlife management policies that were inadvertently misinformed by them.
Microsatellite data from Yellowstone clustered the bison into four subpopulations. Genepop 4.0.10 one-locus estimates of variance then yield the table below of pairwise fixation indices (Fst) for BMS510 (Gardipee pers. comm.). This compares microsatellite differences within and across subpopulations. Fst is a measure of genetic distance in junk DNA that does not consider functionally significant differences such as disease, balanced or adaptive alleles. A value of zero for classically defined Fst indicates panmixis (no subpopulation structure); a value of one means totally separate populations. Fst has largely been supplanted by the Jost D statistic. With large enough sample sizes, the X-linked microsatellite BM6017 might distinguish bull-driven mixing (bulls carry a single X) from mixing by cows, by comparison with autosomal markers such as BMS510.
Pairwise Fst for BMS510 in 4 Yellowstone bison subpopulations (NR06 stands for the northern range herd in 2006, CR08 for the central range herd in 2008, etc.):
         CR06     NR06     CR08
NR06    -0.022
CR08    -0.019   -0.018
NR08    -0.017   -0.019   -0.019
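For reference, the classical quantities can be computed from allele frequencies alone. Below is a minimal Python sketch of Nei's G_ST (an Fst analogue) and Jost's D for equally weighted subpopulations. The function names and the BMS510 allele frequencies are made up for illustration. It applies no sample-size correction, so unlike bias-corrected estimators such as Weir and Cockerham's it can never go negative. Slightly negative estimates like those in the table are conventionally read as zero, meaning no detectable structure at this locus:

```python
def expected_heterozygosity(freqs):
    """Gene diversity 1 - sum(p_i^2) for one set of allele frequencies."""
    return 1.0 - sum(p * p for p in freqs.values())

def gst_and_jost_d(subpops):
    """Nei's G_ST and Jost's D from per-subpopulation allele frequencies.

    `subpops` is a list of at least two dicts {allele: frequency}, equally
    weighted. No sample-size correction is applied.
    """
    n = len(subpops)
    if n < 2:
        raise ValueError("need at least two subpopulations")
    alleles = set().union(*subpops)
    pooled = {a: sum(sp.get(a, 0.0) for sp in subpops) / n for a in alleles}
    h_s = sum(expected_heterozygosity(sp) for sp in subpops) / n  # mean within
    h_t = expected_heterozygosity(pooled)                         # total
    gst = (h_t - h_s) / h_t if h_t > 0 else 0.0
    d = (h_t - h_s) / (1.0 - h_s) * n / (n - 1) if h_s < 1 else 0.0
    return gst, d

# Hypothetical BMS510 allele-length frequencies in two herds (not real data).
herd_a = {"bp138": 0.50, "bp140": 0.30, "bp142": 0.20}
herd_b = {"bp138": 0.45, "bp140": 0.35, "bp142": 0.20}
print(gst_and_jost_d([herd_a, herd_b]))
```

Both statistics are 0 for identical subpopulations and 1 when the subpopulations share no alleles and each is fixed for a single allele.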
The table below shows the comparative genetics of a widely used bison microsatellite, BMS510. It lies in a large intron between two coding exons of the gene CTNNA3, at cow chromosome position chr28:21880829-21880921. This microsatellite was initially characterized ten years ago in bison for determining cattle introgression. It has since been used extensively at Yellowstone to measure genetic diversity. An orthologous microsatellite in pig was found independently in a survey of 10,882 porcine microsatellites, 4,528 of which were mapped in silico onto the pig genome.
Like many bison microsatellites, BMS510 presents various issues complicating its use in genealogical associations:
- It is evolutionarily quite old, with orthologous regions easily detectable in a wide range of mammalian genomes. Consistent with its long persistence, this feature is not classified as selectively neutral by the whole genome alignment statistical tool phastCons.
- The length of the dinucleotide repeat is quite variable both across species and among individuals of a given species. This implies that expansions and contractions by replication slippage occur frequently. Thus accidental agreement of length can be expected for microsatellites with different histories. Length changes also arise very frequently within single generations, based on intensively surveyed human repeat diseases at 25 loci. As a result, length differences cannot be trusted to provide a reliable measure of overall genetic distance. Homoplasic markers are best avoided in comparative genomics.
- Several dozen microsatellites are commonly used together, so one bad microsatellite would not necessarily taint a study. However it appears from the genome browser screenshots above that BMS510 is by no means an anomaly. If so, adding other microsatellites with varying quality issues might not improve the signal to noise ratio.
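The homoplasy argument can be illustrated with a stepwise mutation model, in which each slippage event adds or removes one repeat unit. The Python sketch below uses illustrative assumptions: per-generation mutation rates of 1e-4 to 1e-2, 2,000 generations, lineages starting at 12 and 14 units, and Poisson arrival of mutations. The function names and parameters are hypothetical. It counts how often two lineages with different histories end up identical in length:

```python
import random

def smm_length(start_units, generations, mu, rng):
    """Repeat length (in units) after `generations` under a stepwise mutation model.

    Mutations arrive as a Poisson process at rate `mu` per generation; each one
    adds or removes a single repeat unit with equal probability (floor of 1 unit).
    """
    length = start_units
    t = rng.expovariate(mu)
    while t < generations:
        length = max(1, length + rng.choice((-1, 1)))
        t += rng.expovariate(mu)
    return length

def homoplasy_rate(generations=2_000, mu=1e-3, trials=20_000, seed=1):
    """Share of independent lineage pairs that start 2 units apart (different
    alleles, different histories) yet end at identical length."""
    rng = random.Random(seed)
    same = sum(
        smm_length(12, generations, mu, rng) == smm_length(14, generations, mu, rng)
        for _ in range(trials)
    )
    return same / trials

for mu in (1e-4, 1e-3, 1e-2):
    print(f"mu = {mu:g}: {homoplasy_rate(mu=mu):.1%} of pairs identical in state")
```

Any pair counted here would be scored as sharing an allele, although neither repeat descends from the other. The higher the slippage rate, the more such false matches.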
Comparative genomics of bison microsatellite BMS510 (columns: 5' flank, repeat, 3' flank):
Human        atgattcctttcccaatctacaaat  gtgtgtttttgtgtgtgtatgtgtgtgttgtgttgtgtgtgtgt  ataaatacattgag
Chimp        atgattcctttctcaatctacaaat  gtgtgtttttgtgtgtgtatgtgtgtgttgtgttgtgtgtgtgt  ataaatacattgag
Gorilla      atgattcctttcccaatctacaaat  gtgtgtttgtgtgtgtgtatgtgtgtgttgtgttgtgtgtgtgt  ataaatacattgag
Orangutan    gtgattcctttcccaatctacaaat  gtgtgtttatgtgtgtgtatgtgtgtgttgtgttgtgtgtgtgtgt  aaatacattgag
Rhesus       atgatttctttcccaatctacaaattt  gtgtttgtgtctgtatgtgtatgttgtgtcatgtgtgtgtgt  aaatacactgag
Marmoset     attatccctatcctaatctacac  gtgtgtgtgtgtgtgtgtgtgtgtgt  aaaaatgttgag
Mouse lemur  gtgattcttatcccaatcaagaaat  gtgtatatgtgtgtatatgt  aaatattttgga
Bushbaby     atgtttcttatcttaagaaat  gtgtgtgtgcaaatgtgtgtgtgtgtgt  aaatatgggttagt
Tree shrew   tcaagtcccatccaaatctagaa  gtgtgtgtatgtgtgtgtatgtgtgagt  acacacatgcacatg
Mouse        atgattttcatcccaatctatacatgcat  gtgtgtgtgtgtgtgtgtgtgtgtgtgtgtgtgtgtgtgtgtggtgt  aatatatcata
Rat          atgattttcattccaatctaaaaatgagcatga  gtgtgtgtgtgtgtgtgtgtgtgtgtgtgtgcgt  gcgcgcgccttataatacat
Guinea pig   ctaatttctatcttaatgaggaa  gtatgtgtgtg  aaacagaga
Squirrel     atgattttcaccccaacatacaacctaaggatat  gtgtgtgtgtgtgtgtgtgtgtgtgtgtgtgtgttt  aagtatcagg
Alpaca       atgattcccatcccagttgagaaatagg  gtgtgtgtgtctgtgtgcctaagt  acgtcagt
Cow genome   tgcatgattctcattccagtctagaaac  gtgtgtgtgtgtgtgtgtgtgtgtgtg  cattaatacattagcagcaga
Cow trace1   tgcatgattctcattccagtctagaaac  gtgtgtgtgtgtgtgtgtgtgtgtgtgtgtg  cattaatacattagcagcaga
Cow trace2   tgcatgattctcattccagtctagaagc  gtgtgtgtgtgtgtgtgtgtgtgtgtgtgtgtgtgtg  cattaatacattagcagcaga
Cow trace3   tgcatgattctcattccagtctagaaac  gtgtgtgtgtgtgtgtgtgtgtgtgtgtgtgtgtgtgtg  cattaatacattagcagcaga
Cow trace4   tgcatgattctcattccagtctagaaac  gtgtgtgtgtgtgtgtgtgtgtgtgtgtggtgtgtgtgtgtg  cattaatacattagcagcaga
Bison 92     tgcatgattctcattccagtctagaaacatgtat  gtgtgtgtgtgtgtgtgtgtg  cattaatacattagcagcaga
Bison 91     tgcatgattctcattccagtctagaaacatt  gtgtgtgtgtgtgtgtgtgtgtg  cattaatacattagcagcaga
Bison 94     tgcatgattctcattccagtctagaaacatgtat  gtgtgtgtgtgtgtgtgtgtgtg  cattaatacattagcagcaga
Bison 95     tgcatgattctcattccagtctagaaacatt  gtgtgtgtgtgtgtgtgtgtgtgtgtg  cattaatacattagcagcaga
Sheep        ttcataattctcatttcagtctagaaacatgtat  gtgtgtgtgtgtgtgtgtgtgtg  cattaatacattagca
Pig genome   atgattctaaccccagtctagaaatacactg  gtgtgtgtgtgtgtgtgtgtgtgtgt  gcgtgcacgcacacataaa
pig KVL2571  atgattctcaccccagtctagaaatacagtg  gtgtgtgtgtgtgtgtgtgtgtgtgt  gcgcgcgtgcacgcaca
Horse        atgatttccatcccaatctagaaatac  gtgtgtg  gggcatagatacat
Cat          atgattctcagcccaatctagaaattt  gtgtgtgtgtgcacatgtgtgtg  ctcatataagcata
Dog          atgattcccatcccaatctagaagttt  gtgtgtatttgtgtgcatgcatgtg  catgcatgtatgcc
Microbat     gtgattcccattccaatctagaaat  gtgtgtgcatgtatgtgtgtgt  aaatacatgagc
Megabat      atgattcctatcctaatctagaaat  gtgtgtttctgtgtgtgtg  agtatgtgtgtgag
Rock hyrax   aatgtttcataattgtgcatgtatgg  gtgtgtgt  atatgtatacat
Tenrec       atgattctcatcccaatctaggg  gtgtgtgtgtgtgtgtgtgtgt  aaaaggg

Maximal possible alignment between cow trace reads and bison microsatellite BMS510 variants:
bosTau 510486062  ACATTTTTAGATGCTGCATGATTCT-CATTCCAGTCTAGAAAC  GTGTGTGTGTGTGTGTGTGTGTGTGTGTGGTGTGTGTGTGTG  CATTAATACATTAGCAGCAGAGAACAGGGAACGGCT
bosTau 387503787  ACATTTTTAGATGCTGCATGATTCT-CATTCCAGTCTAGAAAC  GTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTG---  CATTAATACATTAGCAGCAGAGAACAGGAA--GGCT
bosTau 772917044  ACATTTTTAGATGCTGCATGATTCT-CATTCCAGTCTAGAAGC  GTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTG-----  CATTAATACATTAGCAGCAGAGAACAGGAA--GGCT
bosTau 564338658  ACATTTTTAGATGCTGCATGATTCT-CATTCCAGTCTAGAAAC  GTGTGTGTGTGTGTGTGTGTGTGTGTGTGTG-----------  CATTAATACATTAGCAGCAGAGAACAGGAA--GGCT
bosTau 606697323  ACATTTTTAGATGCTGCATGATTAGGCATTCCAGTCTAGAAAC  GTGTGTGTGTGTGTGTGTGTGAGTGTG---------------  CATTAATACATTAGCAGCAGAGAACAGGAA--GGCT
bisBis BMS510-92             TGCTGCATGATTCT-CATTCCAGTCTAGAAAC  ATGTATGTGTGTGTGTGTGTGTGTGTG---------------  CATTAATACATTAGCAGCAGAGAACAGGAA--GGCT
bisBis BMS510-94             TGCTGCATGATTCT-CATTCCAGTCTAGAAAC  ATGTATGTGTGTGTGTGTGTGTGTGTGTG-------------  CATTAATACATTAGCAGCAGAGAACAGGAA--GGCT
bisBis BMS510-91             TGCTGCATGATTCT-CATTCCAGTCTAGAAAC  AT-TGTGTGTGTGTGTGTGTGTGTGTG---------------  CATTAATACATTAGCAGCAGAGAACAGGAA--GGCT
bisBis BMS510-95             TGCTGCATGATTCT-CATTCCAGTCTAGAAAC  AT-TGTGTGTGTGTGTGTGTGTGTGTGTGTG-----------  CATTAATACATTAGCAGCAGAGAACAGGAA--GGCT
Results to date from the bovine SNP chip
(to be continued)
The bison prion gene marker M17T
The prion gene PRNP is one of the most intensively sequenced of all mammalian autosomal genes. The bison PRNP protein has three complete GenBank entries (none attributed to a herd) and two accompanying publications. The source of AY769958 is given as WGFD WBb0401, suggesting Wyoming Game and Fish Department, Wyoming Bison bison 0401. However, that sequence is not further discussed (or even used) in the full text.
The other two came from a remarkable sequencing survey of 301 bison from federal herds. These include Yellowstone, Grand Teton, Wind Cave, and Theodore Roosevelt national parks and the Henry Mountains on BLM land. M17T, the only coding variant observed in bison, distinguishes the two representative GenBank entries. The overall frequency of M17T, reported as a T:C ratio in DNA, is 69:31 (pers. comm. CM Seabury, JN Derr).
At Yellowstone, where the sample size is sufficient for statistics, the allele frequencies are in Hardy-Weinberg equilibrium even though the herds are not particularly well mixed. However, DNA sampling may have opportunistically coincided with brucellosis testing. Unlike a later Gardipee fecal sampling protocol, such testing would predominantly sample the northern herd. In any event, both the YNP PRNP and mitochondrial haplotype studies are snapshots, 4-8 years old, of a bison population with very high turnover because of large-scale culls and natural causes.
Genotype counts for PRNP M17T by herd (M and T are allele frequencies):
Herd     MM    MT    TT   bison    M      T      2MT
YNP      70   107    43    220   0.561  0.439  0.495
GTNP      4    10     1     15
WC       10     5     1     16
TR        3     6     1     10
HM       12     6     2     20
TNRVJ    14     5     1     20
totals  113   139    49    301
PRNP signal peptide M17T is a T to C transition in the second letter of codon 17 relative to the ancestral allele:
Met17 allele (DNA, then protein):
atggtgaaaagccacataggcagttggatcctggttctctttgtggccatgtggagtgacgtgggcctctgcaagaagcgaccaaaacctgga
M V K S H I G S W I L V L F V A M W S D V G L C K K R P K P G
Thr17 allele (DNA, then protein):
atggtgaaaagccacataggcagttggatcctggttctctttgtggccacgtggagtgacgtgggcctctgcaagaagcgaccaaaacctgga
M V K S H I G S W I L V L F V A T W S D V G L C K K R P K P G
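The summary figures for the table can be rechecked from the per-herd genotype counts. The Python sketch below computes column totals, the YNP allele frequencies, and a Pearson chi-square test of Hardy-Weinberg proportions (1 degree of freedom, 5% critical value 3.84). The function names are hypothetical. The chi-square is applied only to YNP; the other herds are too small for it and would need an exact test:

```python
# Genotype counts per herd from the table: (MM, MT, TT).
COUNTS = {
    "YNP": (70, 107, 43), "GTNP": (4, 10, 1), "WC": (10, 5, 1),
    "TR": (3, 6, 1), "HM": (12, 6, 2), "TNRVJ": (14, 5, 1),
}

def allele_frequencies(mm, mt, tt):
    """Return (p_M, p_T) by gene counting."""
    p_m = (2 * mm + mt) / (2 * (mm + mt + tt))
    return p_m, 1.0 - p_m

def hwe_chi_square(mm, mt, tt):
    """Pearson chi-square (1 d.f.) for departure from Hardy-Weinberg proportions."""
    n = mm + mt + tt
    p, q = allele_frequencies(mm, mt, tt)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    return sum((obs - exp) ** 2 / exp for obs, exp in zip((mm, mt, tt), expected))

totals = [sum(column) for column in zip(*COUNTS.values())]
print("totals MM, MT, TT:", totals, "bison:", sum(totals))
p_m, p_t = allele_frequencies(*COUNTS["YNP"])
chi2 = hwe_chi_square(*COUNTS["YNP"])
print(f"YNP: M = {p_m:.3f}, T = {p_t:.3f}, chi-square = {chi2:.3f} "
      f"({'consistent with' if chi2 < 3.84 else 'departs from'} HWE at 5%)")
```

The YNP frequencies come out at 0.561 and 0.439, and the chi-square is far below 3.84, in line with the equilibrium noted above.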
Although there is no conservation genomics to 'manage' in the case of M17T, the short shelf-life of sequence data illustrates the effort needed to collect real-time data and manage by it. Gardipee determined the mitochondrial haplotypes of an altogether different set of bison. The Derr group may or may not have used DNA samples from the prion study for microsatellite (genetic diversity) or haplotype studies. It would be quite complex to do three things at once: maintain genetic diversity, reduce M17T (were this a disease allele), and reduce the frequency of mitochondrial disease haplotype 6.
Methionine at position 17 of bison PRNP matches those of its immediate outgroups, yak, domestic cow, and water buffalo. All 5 available yak prion sequences have methionine -- Bos grunniens is the immediate sister species to bison. Two available Bison bonasus sequences are also M17, as are 4 Syncerus caffer, 6 Bubalus bubalis, 11 Tragelaphus, 2 Boselaphus and nearly a thousand Bos taurus. Indeed all available Bovinae sequences have M17. Beyond Bovinae, none of the 635 available Bovidae sequences in 8 other species (mostly sheep) have threonine at position 17. The situation is the same for 124 pecoran ruminants excluding Bovidae.
A bison-like threonine is first encountered in 18 PRNP sequences from more distant Cetartiodactyla, primarily whales and camels. At this level of divergence, alanine and cysteine are also encountered. Curiously, threonine is ancestral for placental mammals as a whole (see alignment below).
This data establishes that the mammalian reduced alphabet at position 17 consists of methionine and threonine. Although these may coexist in some clades as persistent polymorphisms, in pecoran ruminants, methionine appears to have completely displaced threonine (reduced its frequency below 0.1%). Thus it is highly implausible that M17T in bison resulted from lineage sorting favoring a low-lying allele present in ruminants all along.
On the other hand, the bison population went from 30,000,000 to 300 in a decade or so. The 30 survivors at YNP may have been inbred as well. Just as the reference human genome accidentally captured what we now know to be rare alleles quite unrepresentative of the U.S. human population -- for example the four octapeptide allele of PRNP, a 2% allele -- the surviving bison did not capture reduced alphabet frequencies of the 20,000 coding gene proteome of the nineteenth century large bison population.
This state of affairs can be illustrated with a WebLogo graphic nine million amino acids wide, covering the entire bison proteome. Here the height of the PostScript letters represents allele spectrum frequencies (optionally adjusted for entropy). Neglecting linkage (haploblock size taken as 3 bp) and inbreeding, the founding population at YNP amounts to sampling down each logo column 30 times. At each site, residues are drawn in proportion to letter height. Clade-specific trends could be represented by a series that gradually expands the outgroup. Too narrow a clade will in general have poor statistics. Here a high quality Bovini-only analysis can be made to accompany the all-mammal picture below.
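The logo picture, and the founder sampling it implies, can be sketched for a single column. This is a minimal Python sketch assuming a 20-letter amino acid alphabet and no small-sample correction. It uses a hypothetical pre-bottleneck column of 95% methionine and 5% threonine, and the function names are illustrative. Stack height is the column's information content (log2 20 minus the Shannon entropy). A residue at frequency p escapes all n draws with probability (1 - p)^n:

```python
import math
import random

def column_information(freqs, alphabet_size=20):
    """Information content in bits of one logo column: log2(alphabet) - entropy.

    In a sequence logo this is the height of the whole letter stack; each
    letter's share of the stack is proportional to its frequency.
    """
    entropy = -sum(p * math.log2(p) for p in freqs.values() if p > 0)
    return math.log2(alphabet_size) - entropy

def prob_residue_lost(freq, n_draws):
    """Probability that a residue at frequency `freq` is absent from n draws."""
    return (1.0 - freq) ** n_draws

def sample_founders(freqs, n_draws, rng):
    """Draw residues in proportion to letter height (haploid, unlinked sites)."""
    residues, weights = zip(*freqs.items())
    return rng.choices(residues, weights=weights, k=n_draws)

# Hypothetical pre-bottleneck column for one site (not measured data).
column = {"M": 0.95, "T": 0.05}
print(f"stack height: {column_information(column):.2f} bits")
for n in (30, 60):  # 30 survivors, or their 60 allele copies if counted as diploid
    print(f"n = {n}: P(T absent from founders) = {prob_residue_lost(column['T'], n):.3f}")
print("one founder draw:", "".join(sample_founders(column, 30, random.Random(7))))
```

Repeating the draw across every column is what the text means by sampling the proteome-wide logo.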
There is no support for a threonine component over tens of millions of years of pecoran ancestry. It is therefore more parsimonious that a de novo mutation re-introduced threonine after bison diverged from yak. Provided threonine remained part of the reduced alphabet, the change was acceptable (neutral). If it occurred in a prolific bull in conjunction with founding populations, the allele could quickly have attained the frequencies observed today. Given that M17T occurs in 6 herds, the original mutation probably preceded the main nineteenth century bison bottleneck. Past allele ratios could still be assessed using the hundreds of fossil bison and steppe bison DNA samples collected by Shapiro.
Note threonine has four codons and methionine but one. Intriguingly, the sole threonine codon that can change to methionine in one step is a CpG hotspot site, ACG. This more typically resolves to ACA, still threonine, but the pyrimidine transition to ATG methionine still occurs. And while the back mutation T to C is by no means uncommon, the rate asymmetry (CpG faster than ordinary T to C transition) affects interpretation. In humans, where only threonine is found in over a thousand PRNP sequences, the ACA codon is used, not ACG. Methionine could arise only from a two-stage process.
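The codon bookkeeping here can be verified mechanically. This Python sketch uses the standard genetic code, built from the usual TCAG-ordered table string, with hypothetical helper names. It lists the threonine codons one substitution away from ATG. It also translates the two signal peptide alleles shown earlier, to confirm that they differ only at residue 17:

```python
from itertools import product

# Standard genetic code: codons in TCAG order (first base slowest).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def translate(dna):
    """Translate an in-frame coding sequence (stop codons shown as '*')."""
    dna = dna.upper()
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))

def one_step_neighbours(codon):
    """All codons reachable from `codon` by a single base substitution."""
    return {codon[:i] + b + codon[i + 1:]
            for i in range(3) for b in "ACGT" if b != codon[i]}

thr_codons = sorted(c for c, aa in CODON_TABLE.items() if aa == "T")
print("Thr codons:", thr_codons)
print("Thr codons one step from ATG:", sorted(one_step_neighbours("ATG") & set(thr_codons)))

# Signal peptide excerpt (codons 1-31); codon 17 spans indices 48-50.
met_allele = ("atggtgaaaagccacataggcagttggatcctggttctctttgtggccatgtgg"
              "agtgacgtgggcctctgcaagaagcgaccaaaacctgga")
thr_allele = met_allele[:49] + "c" + met_allele[50:]  # T>C at the codon's 2nd base
diffs = [(i + 1, a, b) for i, (a, b) in
         enumerate(zip(translate(met_allele), translate(thr_allele))) if a != b]
print("protein differences:", diffs)
```

It reports ACG, the CpG-containing codon, as the only one-step threonine neighbour of ATG, and (17, 'M', 'T') as the only protein difference.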
The M17T residue lies in the middle of the signal peptide. The signal peptide is cleaved during maturation and so does not appear in the final GPI-anchored protein on the exterior of the cell membrane unless abnormally processed. M17T does not prevent proper maturation cleavage according to thoroughly vetted bioinformatic prediction tools such as SignalP. Consequently neither allele causes prion disease nor influences the species barrier (transmission from other species). Residue 17 is under strong selection like the rest of the signal peptide. Even so, M17T is neither a balanced polymorphism (such as E6V hemoglobin in malarial resistance) nor an adaptive shift to threonine -- M17T simply reflects bouncing around within the confines of a long-established reduced alphabet.
From the phylogenetic standpoint, methionine and threonine constitute the reduced alphabet at position 17. Alanine is less common but also tolerated. These amino acids are neighbors in the genetic code, related by single base transitions (with threonine central). Note, however, that amino acids such as the branched-chain aliphatics are also related by simple common mutations, yet are not observed. This raises the question of why the PRNP signal peptide is so conserved relative to other signal peptides, for example that of PLBD2.
Some 4,500 genes have signal peptides in mammals, all interfacing with the same signal recognition particle (SRP). Such many-to-one protein interactions cannot co-evolve (as claimed for speciation by Ernst Mayr). If the SRP changed to accommodate a change in PRNP, that change would throw off its adaptive fit to the other 4,499 proteins it must continue to recognize.
The PRNP-specific conservation suggests that the signal peptide of PRNP might not always be cleaved, or might influence protein processing in some other way. This could result in a mix of ultimate cellular destinations or different membrane topologies. Indeed, two papers provided evidence for alternative transmembrane forms, retained as single-pass membrane proteins, with either the C- or the N-terminus in the endoplasmic reticulum lumen.
However no significance to this was ever found and the research track has since been abandoned. Strong selection on the prion protein overall also remains a mystery in view of minimal impacts of knockout mutations and implausible compensation by an immensely diverged tandem paralog PRND whose signal peptide does not exhibit such striking conservation.
Regardless, M17T is unlikely to be functionally significant because threonine occurs so widely. Note that several other residues, including the adjacent position 16, have significant reduced alphabets. Indels in signal peptides are also unusual and seldom recurrent. The one at position 3 is a striking synapomorphy of Euarchontoglires but has no known functional significance. It represents a deletion in the common ancestor of rodents and primates; no such event took place in the bison lineage.
Thus M17T serves primarily as a neutral nuclear gene marker in bison. Unfortunately it is not represented on the bovine SNP chip, since the polymorphism has not been observed among a thousand bovine PRNP genes sequenced from several dozen widely varying breeds. PRNP is located on the bison counterpart of bovine chromosome 13. Its inheritance therefore cannot correlate with maternal inheritance of mitochondrial DNA in the manner of chr X or chr Y. One commonly used bison microsatellite, AGLA232, maps to cow chromosome 13 like PRNP, but not in particularly close proximity (position 77,616,098 vs 47,231,024 in the October 2007 cow assembly). A bison homozygous for threonine at position 17 therefore shows no cattle introgression, but only for the (limited) haploblock around PRNP.
The species barrier for prion disease is difficult to predict, but that of bison will be identical to that of cattle. The risk comes from two main sources: germline or somatic mutation in an individual, or transmission from deer, elk or moose affected with chronic wasting disease (scrapie that has previously crossed the cervid species barrier). Bison are not mixed with cattle because of brucellosis concerns, and sheep grazing allotments are rarely nearby, making transmission of mad cow disease or sheep scrapie on public lands implausible.
By far the greatest risk to bison comes from winter hay feeding at the National Elk Refuge near Jackson, Wyoming. CWD will explode in the next few years at such concentrated feeding sites. Given that sheep scrapie crossed the species barrier to mule deer at a Colorado Game & Fish facility, bison should not be put at risk at the National Elk Refuge.
Inherited prion disease is autosomal dominant with high penetrance, so unlike autosomal recessive disease it does not depend on inbreeding to bring rare alleles together (though the late onset of prion disease, often after the reproductive years, has allowed large affected human pedigrees to accumulate). It represents a toxic gain of function, not loss of the normal protein function, which remains unknown despite 12,051 scientific studies as of February 2011. No human disease is known to result from loss of PRNP function.
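The contrast with autosomal recessives can be made concrete with the allele frequency q and inbreeding coefficient f defined in the introduction. Under the standard inbreeding model, homozygotes for an allele of frequency q occur at q^2(1 - f) + qf, whereas a dominant allele affects anyone carrying at least one copy. A minimal sketch, assuming a single biallelic locus, complete penetrance, and illustrative values of q and f that are not bison estimates:

```python
def recessive_affected(q, f):
    """Frequency of aa homozygotes under inbreeding: q^2 (1 - f) + q f."""
    return q * q * (1.0 - f) + q * f


def dominant_affected(q, f):
    """Frequency of carriers of at least one copy: 1 - freq(AA)."""
    p = 1.0 - q
    return 1.0 - (p * p * (1.0 - f) + p * f)


# Illustrative values only (hypothetical, not measured in any herd).
q = 0.01
for f in (0.0, 0.25):
    print(f"f={f:.2f}  recessive={recessive_affected(q, f):.6f}  "
          f"dominant={dominant_affected(q, f):.6f}")
```

With these numbers, inbreeding at f = 0.25 raises the recessive disease frequency about 26-fold (0.0001 to 0.002575) but slightly lowers the dominant one (0.0199 to 0.017425), because inbreeding only shifts carriers into homozygotes.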
The greatest single risk for inherited prion disease in bison is expansion of the octapeptide repeat region PHGGGWGQ by replication slippage. Bison have six octapeptide repeats, within the normal range but nonetheless a substrate for disease-causing expansion. Note, however, that the number of bison in conservation herds is very small relative to the one-in-ten-million incidence of repeat expansion observed in humans. Domestic cattle may have four to seven repeats, with the seven-repeat allele in Brown Swiss a borderline concern, as it is in humans.
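The scale argument can be checked with a Poisson approximation: if each animal independently carries a new expansion with probability mu, the expected number in a population of N is N * mu and the chance of at least one is 1 - e^(-N * mu). A minimal sketch, assuming the human one-in-ten-million figure applies unchanged to bison as a per-animal probability; the population sizes are hypothetical round numbers, not census counts.

```python
import math

# Human incidence of octapeptide repeat expansion quoted in the text,
# treated here (an assumption) as a per-animal probability.
EXPANSION_RATE = 1e-7


def prob_at_least_one(n_animals, rate=EXPANSION_RATE):
    """Poisson approximation: P(at least one new expansion) = 1 - exp(-n * rate)."""
    return -math.expm1(-n_animals * rate)


# Hypothetical population sizes for illustration only.
for n in (20_000, 100_000_000):
    print(f"N={n:>11,}  expected={n * EXPANSION_RATE:.4f}  "
          f"P(>=1)={prob_at_least_one(n):.4f}")
```

Under these assumptions a conservation total of 20,000 animals gives only about a 0.2% chance of even one expansion, whereas a cattle-sized population of 100 million would expect about ten.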
CpG hotspot mutations dominate the point mutation spectrum in mammals. The most common outcome of a CpG mutation is the purine transition to CpA, from which any of 12 amino acid substitutions can arise in bison PRNP. When CpG resolves as the pyrimidine transition TpG, 8 non-synonymous outcomes are possible in addition to internal stop codons (which would not cause prion disease). These 20 point mutations are shown at bottom and need individual assessment for disease-causing potential.
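The bookkeeping behind such a list can be sketched in a few lines: find every CpG in an in-frame coding sequence, apply C to T (giving TpG) and G to A (giving CpA) in turn, and translate the affected codon before and after. This is a minimal sketch assuming the standard genetic code and a sequence that starts in frame; the bison PRNP nucleotide sequence is not reproduced on this page, so the short demo sequence is hypothetical, as are the helper names.

```python
# Standard genetic code (NCBI table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def classify(ref_aa, alt_aa):
    if ref_aa == alt_aa:
        return "synonymous"
    return "stop" if alt_aa == "*" else "missense"


def cpg_outcomes(cds):
    """List (cpg_pos, outcome, codon_number, ref_aa, alt_aa, effect) per CpG.

    cpg_pos is the 0-based index of the C; codon_number is 1-based.
    """
    cds = cds.upper()
    results = []
    for pos in range(len(cds) - 1):
        if cds[pos:pos + 2] != "CG":
            continue
        # TpG: the C mutates to T.  CpA: the G mutates to A.
        for outcome, site, new_base in (("TpG", pos, "T"), ("CpA", pos + 1, "A")):
            start = site - site % 3
            ref_codon = cds[start:start + 3]
            if len(ref_codon) < 3:
                continue  # incomplete trailing codon
            offset = site % 3
            alt_codon = ref_codon[:offset] + new_base + ref_codon[offset + 1:]
            ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
            results.append((pos, outcome, start // 3 + 1, ref_aa, alt_aa,
                            classify(ref_aa, alt_aa)))
    return results


# Hypothetical 3-codon example (Ala-Arg-Glu), not bison PRNP sequence.
for row in cpg_outcomes("GCGCGAGAA"):
    print(row)
```

In the toy sequence, the CpG inside the alanine codon GCG gives Ala to Val (TpG) or a silent change (CpA), while the CpG opening the arginine codon CGA gives Arg to stop (TpG) or Arg to Gln (CpA), the same pattern seen at arginines in the bison figure at bottom. A CpG straddling a codon boundary is also handled: `cpg_outcomes("CCCGAG")` reports Glu to Lys on the CpA side (GAG to AAG), the same kind of change as E211K.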
A known CpG point mutation in domestic cattle, E211K (homologous to the pathogenic human E200K allele), causes genetically based mad cow disease, as specifically predicted eleven years earlier by the present author. It constitutes a proven risk factor for bison, which have a CpG at the identical position (bottom).
Again, the number of bison in conservation herds is dwarfed by the 103 million cattle among which E211K has been observed (heterozygously) in a single source cow and its calf, and presumably in some of their ancestors, though prevalence is low in surveys. Although artificial insemination is practised on a massive scale, haplotyping has not yet determined whether E211K arose from the sire. The affected ten-year-old cow was initially said to be a Bos indicus x Bos taurus hybrid.
Without doubt, mutations capable of causing prion disease arose each year in the large pre-contact North American bison herd. However, the expected age of onset may have been a high multiple of typical lifespans, making actual disease or transmission very rare. The population of conservation herds today is too small for de novo transmissible spongiform encephalopathy to arise at any significant frequency.
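The contrast between the pre-contact herd and today's conservation herds comes down to expected counts: new mutations per year at candidate pathogenic positions scale as births x 2 gene copies x number of candidate mutations x per-copy mutation rate. A minimal sketch; every parameter value below (births per year, 20 candidate mutations, a rate of 1e-7 per copy per generation) is a hypothetical order-of-magnitude placeholder, not a bison measurement.

```python
def expected_new_mutations(births_per_year, n_mutations, rate_per_copy):
    """Expected de novo occurrences per year across all candidate point mutations.

    Each newborn carries two PRNP copies, each of which can independently
    acquire any of the n_mutations at rate_per_copy.
    """
    return births_per_year * 2 * n_mutations * rate_per_copy


# Hypothetical placeholders, not measured values.
N_CANDIDATE_MUTATIONS = 20   # e.g. the CpG-derived substitutions discussed above
RATE_PER_COPY = 1e-7         # assumed per-mutation, per-generation rate

for label, births in (("pre-contact herd", 5_000_000), ("conservation herds", 5_000)):
    lam = expected_new_mutations(births, N_CANDIDATE_MUTATIONS, RATE_PER_COPY)
    print(f"{label:>18}: {lam:.3g} new carriers per year")
```

Under these placeholders the large herd would produce on the order of twenty new carriers a year and the conservation herds about one every fifty years; whether any carrier ever develops disease still depends on age of onset relative to lifespan, which this count ignores.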
Prion protein N-terminal alignment (signal peptide region)

MVKSHIGSWILVLFVAMWSDVGLCKKRPKPG  Bison bison CR227All1 AY769958
MVKSHIGSWILVLFVATWSDVGLCKKRPKPG  Bison bison CR227All2
MVKSHIGSWILVLFVAMWSDVGLCKKRPKPG  Bison bonasus
MVKSHIGSWILVLFVAMWSDVGLCKKRPKPG  Bos taurus
MVKRHIGSWILVLFVVMWSDVGLCKKRPKPG  Bubalus bubalis
MVKSHIGSWILVLFVVMWSDVGLCKKRPKPG  Syncerus caffer
MVKSHIGSWILVLFVAMWSDVALCKKRPKPG  Tragelaphus strepsiceros
MVKSHIGSWILVLFVAMWSDVALCKKRPKPG  Oryx leucoryx
MVKSHIGSWILVLFVAMWSDVGLCKKRPKPG  Capreolus capreolus
MVKSHIGSWILVLFVAMWSDVGLCKKRPKPG  Kobus megaceros
MVKSHIGSWILVLFVAMWSDVGLCKKRPKPG  Connochaetes taurinus
MVKSHIGSWILVLFVAMWSDVGLCKKRPKPG  Ammotragus lervia
MVKSHIGSWILVLFVAMWSDVGLCKKRPKPG  Hippotragus niger
MVKSHIGSWILVLFVAMWSDVGLCKKRPKPG  Ovibos moschatus
MVKSHIGSWILVLFVAMWSDVGLCKKRPKPG  Ovis aries
MVKSHIGSWILVLFVAMWSDVGLCKKRPKPG  Ovis canadensis
MVKSHIGSWILVLFVAMWSDVGLCKKRPKPG  Capra hircus
MVKSHIGSWILVLFVAMWSDVGLCKKRPKPG  Cervus elaphus
MVKSHIGSWILVLFVAMWSDVGLCKKRPKPG  Cervus elaphus nelsoni
MVKSHIGSWILVLFVAMWSDVGLCKKRPKPG  Dama dama
MVKSHIGSWILVLFVAMWSDVGLCKKRPKPG  Odocoileus virginianus
MVKSHIGSWILVLFVAMWSDVGLCKKRPKPG  Rangifer tarandus
MVKSHIGSWILVLFVAMWSDVGLCKKRPKPG  Alces alces
MVKSHIANWILVLFVATWSDMGFCKKRPKPG  Tursiops truncatus
MVKSHIGGWILVLFVAAWSDIGLCKKRPKPG  Sus scrofa
MVKSHMGSWILVLFVVTWSDVGLCKKRPKPG  Camelus dromedarius
MVKSHMGSWILVLFVVTWSDMGLCKKRPKPG  Vicugna vicugna
MVKSLVGGWILLLFVATWSDVGLCKKRPKPG  Myotis lucifugus
MVKNYIGGWILVLFVATWSDVGLCKKRPKPG  Pteropus vampyrus
MVKSHIGGWILLLFVATWSDVGLCKKRPKPG  Canis familiaris
MVKSHIGSWILVLFVAMWSDVGLCKKRPKPG  Felis catus
MVKSHIGSWLLVLFVATWSDIGFCKKRPKPG  Mustela putorius
MVKSHIGSWLLVLFVATWSDIGFCKKRPKPG  Mustela vison
MVKSHIGSWILVLFVAMWSDVGLCKKRPKPG  Ailuropoda melanoleuca
MVKSHVGGWILVLFVATWSDVGLCKKRPKPG  Equus caballus
MVRSHVGGWILVLFVATWSDVGLCKKRPKPG  Diceros bicornis
MVKNHVGCWLLVLFVATWSEVGLCKKRPKPG  Erinaceus europaeus
MVTGHLGCWLLVLFMATWSDVGLCKKRPKPG  Sorex araneus
MA--NLGCWMLVLFVATWSDLGLCKKRPKPG  Homo sapiens
MA--NLGCWMLVLFVATWSDLGLCKKRPKPG  Pan troglodytes
MA--NLGCWMLVLFVATWSDLGLCKKRPKPG  Gorilla gorilla
MA--NLGCWMLVLFVATWSDLGLCKKRPKPG  Pongo pygmaeus
MA--NLGCWMLVLFVATWSDLGLCKKRPKPG  Nomascus leucogenys
MA--NLGCWMLVLFVATWSDLGLCKKRPKPG  Hylobates lar
MA--NLGCWMLVLFVATWSDLGLCKKRPKPG  Symphalangus syndactylus
MA--NLGCWMLVLFVATWSDLGLCKKRPKPG  Macaca arctoides
MA--NLGCWMLVLFVATWSDLGLCKKRPKPG  Macaca fascicularis
MA--NLGCWMLVLFVATWSDLGLCKKRPKPG  Macaca fuscata
MA--NLGCWMLVLFVATWSDLGLCKKRPKPG  Macaca mulatta
MA--NLGCWMLVLFVATWSDLGLCKKRPKPG  Macaca nemestrina
MA--NLGCWMLVLFVATWSDLGLCKKRPKPG  Papio hamadryas
MA--NLGCWMLFLFVATWSDLGLCKKRPKPG  Callithrix jacchus
MA--NLGCWMLVLFVATWSDLGLCKKRPKPG  Cebus apella
MA--NLGCWMLVVFVATWSDLGLCKKRPKPG  Cercopithecus aethiops
MA--NLGCWMLVVFVATWSDLGLCKKRPKPG  Cercopithecus dianae
MA--NLGCWMLVLFVATWSDLGLCKKRPKPG  Colobus guereza
MA--NLGCWMLVLFVATWSDLGLCKKRPKPG  Presbytis francoisi
MA--NLGCWMLVLFVATWSDLGLCKKRPKPG  Saimiri sciureus
MA--KLGYWLLVLFVATWSDVGLCKKRPKPG  Tarsius syrichta
MA--NLGCWMLVVFVATWSDVGLCKKRPKPG  Microcebus murinus
MA--RLGCWMLVLFVATWSDIGLCKKRPKPG  Otolemur garnettii
ME--NLGCWMLILFVATWSDIGLCKKRPKPG  Cynocephalus variegatus
MA--QLGCWLMVLFVATWSDVGLCKKRPKPG  Tupaia belangeri
MA--NLGYWLLALFVTMWTDVGLCKKRPKPG  Mus musculus
MA--NLGYWLLALFVTTCTDVGLCKKRPKPG  Rattus norvegicus
MA--NLGYWLLALFVTTCTDVGLCKKRPKPG  Rattus rattus
MA--NAGCWLLVLFVATWSDTGLCKKRPKPG  Cavia porcellus
MA--NLGYWLLALFVTTWTDVGLCKKRPKPG  Apodemus sylvaticus
MA--NLGCWLLVLFVATWSDLGLCKKRTKPG  Dipodomys ordii
MA--NLSYWLLAFFVTTWTDVGLCKKRPKPG  Clethrionomys glareolus
MA--NLSYWLLALFVATWTDVGLCKKRPKPG  Cricetulus griseus
MA--NLSYWLLALFVATWTDVGLCKKRPKPG  Cricetulus migratorius
MA--NLGYWLLALFVTMWTDVGLCKKRPKPG  Meriones unguiculatus
MA--NLSYWLLALFVAMWTDVGLCKKRPKPG  Mesocricetus auratus
MA--NLGYWLLALFVATWTDVGLCKKRPKPG  Sigmodon fulviventer
MA--NLGYWLLALFVATWTDVGLCKKRPKPG  Sigmodon hispidus
MV--NPGCWLLVLFVATLSDVGLCKKRPKPG  Spermophilus tridecemlineatus
MV--NPGYWLLVLFVATLSDVGLCKKRPKPG  Sciurus vulgaris
MA--HLGYWMLLLFVATWSDVGLCKKRPKPG  Oryctolagus cuniculus
MA--HLSYWLLVLFVAAWSDVGLCKKRPKPG  Ochotona princeps
MVKSHLGCWIMVLFVATWSEVGLCKKRPKPG  Cyclopes didactylus
MVRSRVGCWLLLLFVATWSELGLCKKRPKPG  Dasypus novemcinctus
MVKGTVSCWLLVLVVAACSDMGLCKKRPKPG  Echinops telfairi
MVKSSLGCWILVLFVATWSDMGLCKKRPKPG  Elephas maximus
MVKSSLGCWILVLFVATWSDMGLCKKRPKPG  Loxodonta africana
MVKSSLGCWMLVLFVATWSDVGLCKKRPKPG  Procavia capensis
MMKSGLGCWILVLFVATWSDVGLCKKRPKPG  Orycteropus afer
MVKSGLGCWILVLFVATWSDVGVCKKRPKPG  Trichechus manatus
MAKIQLGYWILALFIVTWSELGLCKKPKTRPG  Macropus eugenii
MGKIHLGYWFLALFIMTWSDLTLCKKPKPRPG  Monodelphis domestica
MGKIQLGYWILVLFIVTWSDLGLCKKPKPRPG  Trichosurus vulpecula

Effect of CpG hotspot mutation on bison prion protein

normal:    1 MVKSHIGSWILVLFVAMWSDVGLCKKRPKPGGGWNTGGSRYPGQGSPGGNRYPPQGGGGW 60
             MVKSHIGSWILVLFVAMWSD+GLCKK+PKPGGGWNTGGS+YPGQGSPGGN YPPQGGGGW
CpG CpA:   1 MVKSHIGSWILVLFVAMWSDMGLCKKQPKPGGGWNTGGSQYPGQGSPGGNHYPPQGGGGW 60

normal:   61 GQPHGGGWGQPHGGGWGQPHGGGWGQPHGGGWGQPHGGGGWGQGGTHGQWNKPSKPKTNM 120
             GQPHGGGWGQPHGGGWGQPHGGGWGQPHGGGWGQPHGGGGWGQGGTH QWNKPSKPKTNM
CpG CpA:  61 GQPHGGGWGQPHGGGWGQPHGGGWGQPHGGGWGQPHGGGGWGQGGTHSQWNKPSKPKTNM 120

normal:  121 KHVAGAAAAGAVVGGLGGYMLGSAMSRPLIHFGSDYEDRYYRENMHRYPNQVYYRPVDQY 180
             KHVAGAAAAGAVVGGLGGYMLGSAMSRPLIHFGSDYED YY ENMH YPNQVYYRPVDQY
CpG CpA: 121 KHVAGAAAAGAVVGGLGGYMLGSAMSRPLIHFGSDYEDHYYHENMHHYPNQVYYRPVDQY 180

normal:  181 SNQNNFVHDCVNITVKEHTVTTTTKGENFTETDIKMMERVVEQMCITQYQRESQAYYQRGASVIL 245
             SNQNNFVHDCVNITVKEHTVTTTTKGENFT+TDIKMME+VVEQMCITQYQRESQAYYQ+GASVIL
CpG CpA: 181 SNQNNFVHDCVNITVKEHTVTTTTKGENFTKTDIKMMEQVVEQMCITQYQRESQAYYQQGASVIL 245

normal:    1 MVKSHIGSWILVLFVAMWSDVGLCKKRPKPGGGWNTGGSRYPGQGSPGGNRYPPQGGGGW 60
             MVKSHIGSWILVLFVAMWSDVGLCKK PKPGGGWNTGGS YPGQGSPGGN YPPQGGGGW
CpG TpG:   1 MVKSHIGSWILVLFVAMWSDVGLCKK-PKPGGGWNTGGS-YPGQGSPGGNCYPPQGGGGW 58

normal:   61 GQPHGGGWGQPHGGGWGQPHGGGWGQPHGGGWGQPHGGGGWGQGGTHGQWNKPSKPKTNM 120
             GQPHGGGWGQPHGGGWGQPHGGGWGQPHGGGWGQPHGGGGWGQGGTHGQWNKPSKPKTNM
CpG TpG:  59 GQPHGGGWGQPHGGGWGQPHGGGWGQPHGGGWGQPHGGGGWGQGGTHGQWNKPSKPKTNM 118

normal:  121 KHVAGAAAAGAVVGGLGGYMLGSAMSRPLIHFGSDYEDRYYRENMHRYPNQVYYRPVDQY 180
             KHVAGAAAAGAVVGGLGGYMLGSAMSRPLIHFGSDYED YY ENMH YPNQVYYRPVDQY
CpG TpG: 119 KHVAGAAAAGAVVGGLGGYMLGSAMSRPLIHFGSDYEDCYYCENMHCYPNQVYYRPVDQY 178

normal:  181 SNQNNFVHDCVNITVKEHTVTTTTKGENFTETDIKMMERVVEQMCITQYQRESQAYYQRGASVIL 245
             SNQNNFVHDCVNITVKEHTVTTTTKGENFTETDIKMME VVEQMCITQYQRESQAYYQ GASVIL
CpG TpG: 179 SNQNNFVHDCVNITVKEHTVTTTTKGENFTETDIKMME-VVEQMCITQYQRESQAYYQ-GASVIL 245