Bison: mitochondrial genomics

Revision as of 15:56, 1 December 2010

Introduction to bison conservation genomics

(to be continued)

Phylogeny: bison and yak are sister groups

(to be continued)


Interpreting bison CYTB variation

(to be continued)

Interpreting yak CYTB variation

Although the mitochondrial genome encodes proteins built from the usual 20 amino acids, only a subset of physicochemically similar residues (the reduced alphabet) ever appears at a given position in a given protein. This subset describes the acceptable substitutions that do not significantly disrupt protein function. The reduced alphabet can be discovered with greater sensitivity when many species are available and each is represented by many individual sequences. For mitochondrial proteins, that sensitivity is 1 in 10,000 (0.01% occurrence frequency) for a given amino acid.

Interpretive certainty is never attained without experimentation but improves (up to a point) with more sequence data. Here it is important to check whether certain less common substitutions have persisted over evolutionary time in a phylogenetically coherent manner (i.e. within a sub-clade) or are novel adaptations, perhaps in conjunction with a co-evolving residue at another site (or in another protein, perhaps even a nuclear-encoded one). After these considerations, the remaining rare changes are either deleterious or sequencing errors. Polymorphism significance can be pursued at the X-ray structural level for only 3 of the 13 mitochondrial proteins (CYTB, COX2, COX1), and even this is complicated in the case of CYTB by its oligomeric association with 3 nuclear-encoded proteins.

Aligning CYTB from the 70 complete yak mitochondrial genomes available on 1 Dec 2010 shows variation at just 9 sites along the protein (i.e. 9 nsSNPs). These are quickly found when the web alignment tool (http://npsa-pbil.ibcp.fr/cgi-bin/npsa_automat.pl?page=/NPSA/npsa_multalin.html) retains input sequence order, displays residues identical to the top sequence as dots, gaps fragmentary data correctly, and allows a wide display permitting effective cross-species comparisons.
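The dot display described above is easy to reproduce offline. The following is a minimal sketch, assuming the sequences are already aligned to equal length and that '-' marks missing (fragmentary) data; the function names are illustrative and are not part of the web tool.

```python
def dot_display(seqs):
    """Return the alignment with residues identical to the top sequence shown as dots.

    Assumes all sequences are pre-aligned and of equal length; '-' (missing
    data) is passed through unchanged.
    """
    top = seqs[0]
    rows = [top]
    for seq in seqs[1:]:
        rows.append("".join(
            "-" if b == "-" else ("." if a == b else b)
            for a, b in zip(top, seq)
        ))
    return rows


def variable_sites(seqs):
    """Return 1-based columns where some sequence differs from the top one.

    Gap characters are treated as missing data, not as variation.
    """
    top = seqs[0]
    return [
        i + 1
        for i in range(len(top))
        if any(s[i] not in (top[i], "-") for s in seqs[1:])
    ]


if __name__ == "__main__":
    # First 20 residues of the yak reference, one variant, one fragment.
    alignment = [
        "MTNIRKSHPLMKIVNNAFID",
        "MTNIRKSHPLMKIVNNTFID",
        "----RKSHPLMKIVNNAFID",
    ]
    for row in dot_display(alignment):
        print(row)
    print(variable_sites(alignment))
```

With the top sequence fixed as the reference, the variable columns stand out immediately, which is the property the text asks of the alignment tool.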

Yak and bison -- despite being sister species -- share variation at only one site, position 98. Here yak is exclusively valine with the exception of a single deleterious occurrence (see below) of leucine, whereas bison have a mix of valine and alanine (which otherwise is very rare at this position in mammals), i.e. the ancestral residue was valine. Thus no lineage sorting occurred at any amino acid position in CYTB at the time these species diverged. Lineage sorting may, however, be important in the overall evolution of the Bovini (http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2694835/?tool=pubmed): 53 ancient polymorphisms (at the DNA level) are said to have persisted since Bos and Bison diverged from Bubalus 5–8 million years ago.

The summary table of yak CYTB amino acid polymorphisms below arises from alignment of 5000 full-length mammalian cytochrome b orthologs. Red indicates a deleterious mutation, green a possibly acceptable change but of restricted distribution, and blue a near-neutral substitution. It can be seen that the smallish yak population sampled (21 wild and 48 domestic added in Aug 2010 to the 3-4 previously available; http://onlinelibrary.wiley.com/doi/10.1111/j.1365-2699.2010.02379.x/full) already contains 5 deleterious alleles in CYTB, which represents only 10% of the mitochondrial proteome.

The changes can also be displayed in context by coloring the appropriate residues in a reference sequence relative to a composite sequence consolidating all the polymorphisms from distinct animals (no one animal has more than two of the 9; V195A + I348F occurs in two animals). This latter sequence is quite useful in comparing polymorphism sites across species as explained in the annotation tricks section.

>CYTB_bosGruR Bos grunniens cytochrome b ref seq taken as gi|147744503 
MTNIRKSHPLMKIVNNAFIDLPAPSNISSWWNFGSLLGVCLILQILTGLFLAMHYTSDTTTAFSSVAHICRDVNYGWIIRYMHANGASMFFICLYMHVGRGLYYGSYTFLETWNIGVILLLTVMATAFMGYVLPWGQMSF
WGATVITNLLSAIPYIGTNLVEWIWGGFSVDKATLTRFFAFHFILPFIITAIAMVHLLFLHETGSNNPTGISSDADKIPFHPYYTIKDILGALLLILALMLLVLFTPDLLGDPDNYTPANPLNTPPHIKPEWYFLFAYAI
LRSIPNKLGGVLALAFSILILALIPLLHTSKQRSMIFRPLSQCLFWTLVADLLTLTWIGGQPVEHPYIIIGQLASIMYFLLILVLMPTAGTIENKLLKW

>CYTB_bosGruP Bos grunniens composite polymorphisms: A017T A084T V098L I188T I192T V195A D214N V329M I348F
MTNIRKSHPLMKIVNNTFIDLPAPSNISSWWNFGSLLGVCLILQILTGLFLAMHYTSDTTTAFSSVAHICRDVNYGWIIRYMHTNGASMFFICLYMHLGRGLYYGSYTFLETWNIGVTLLLTVMATAFMGYVLPWGQMSF
WGATVITNLLSAIPYIGTNLVEWIWGGFSVDKATLTRFFAFHFILPFIITATAMAHLLFLHETGSNNPTGISSNADKIPFHPYYTIKDILGALLLILALMLLVLFTPDLLGDPDNYTPANPLNTPPHIKPEWYFLFAYAI
LRSIPNKLGGVLALAFSILILALIPLLHTSKQRSMIFRPLSQCLFWTLMADLLTLTWIGGQPVEHPYFIIGQLASIMYFLLILVLMPTAGTIENKLLKW
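
A composite sequence of this kind can be assembled mechanically from a reference and a list of changes written in the A017T style. The sketch below is one way to do it, under the assumption that positions are 1-based and refer to the ungapped reference; the function name is hypothetical. It refuses to apply a change whose stated original residue does not match the reference, which catches off-by-one numbering errors.

```python
import re


def apply_substitutions(reference, changes):
    """Apply substitutions such as 'A017T' to a reference protein sequence.

    Positions are 1-based. A ValueError is raised if a label is malformed,
    out of range, or names an original residue that the reference lacks.
    """
    seq = list(reference)
    for change in changes:
        m = re.fullmatch(r"([A-Z])(\d+)([A-Z])", change)
        if m is None:
            raise ValueError(f"cannot parse substitution {change!r}")
        old, pos, new = m.group(1), int(m.group(2)), m.group(3)
        if not 1 <= pos <= len(seq):
            raise ValueError(f"{change}: position outside the sequence")
        if seq[pos - 1] != old:
            raise ValueError(
                f"{change}: reference has {seq[pos - 1]} at position {pos}"
            )
        seq[pos - 1] = new
    return "".join(seq)


if __name__ == "__main__":
    # Only the first 20 residues of the yak reference are used here.
    print(apply_substitutions("MTNIRKSHPLMKIVNNAFID", ["A017T"]))
```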

     A017T      A084T      V098L      I188T      I192T      V195A      D214N      V329M      I348F
    927  A    4994  A    4522  V    4309  I      94  I    4528  V    4429  D    4610  V    4232  I
   4018  S       3  T     430  I     667  S    4353  L     427  I     512  N     188  T     651  V
     46  T       1  P      34  M      14  I     505  M      25  T      43  E     133  A      63  T
      3  L       1  V      11  A       1  T      31  T       4  G       8  S      44  I      45  M
      3  M                  1  L                  3  F       4  M       2  Y      22  M       4  N
      1  F                  1  N                  2  V       1  A       1  H       2  G       2  F
      1  P                                        1  A                             1  E       1  A
                                                  1  S

Deleterious (red): A084T, V098L, I188T, V195A, I348F. Restricted distribution (green): A017T, I192T, V329M. Near-neutral (blue): D214N.
(analysis to be continued) 
A017T
  927  A
 4018  S
   46  T
    3  L
    3  M
    1  F
    1  P

(analysis to be continued)
A084T  
  4994  A
     3  T
     1  P
     1  V

V098L: At position 98, the reduced alphabet consists of valine 90% of the time regardless of mammalian clade, with the similar (branched-chain aliphatic) isoleucine having substantial dispersed representation at nearly 9%. The 430 species in which isoleucine occurs are scattered incoherently within mammal clades, meaning that it has arisen independently many times. V098I may be slightly suboptimal, as there is an evident bias (at some level) against equal occurrence. It likely co-exists with valine in most non-bottlenecked populations of mammals and would be observed if enough individuals of a given species were sequenced.

However, leucine, the seemingly similar third aliphatic residue, occurs only once despite being only a single base change away from the dominant residue (a first-position change in the codon, G to C or G to T, which is a transversion rather than a transition). Were leucine a near-neutral substitution, its incidence would be vastly higher. Thus the change V098L reported for yak represents either a deleterious mutation, an unprecedented adaptation (e.g. to high altitude), or a sequencing error in GenBank entry ACU82101. The same can be said for the more overtly radical change V098N in lemur AAS00156.
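
Which residues lie one base change away from a given amino acid, and by what kind of change, can be checked directly against the vertebrate mitochondrial code. The sketch below builds the code from the standard 64-character table strings and enumerates single-base neighbours; the function names are illustrative only.

```python
# Vertebrate mitochondrial code in the 64-character table layout.
AAS   = "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSS**VVVVAAAADDEEGGGG"
BASE1 = "TTTTTTTTTTTTTTTTCCCCCCCCCCCCCCCCAAAAAAAAAAAAAAAAGGGGGGGGGGGGGGGG"
BASE2 = "TTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGG"
BASE3 = "TCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAG"

CODE = {b1 + b2 + b3: aa for aa, b1, b2, b3 in zip(AAS, BASE1, BASE2, BASE3)}
PURINES = {"A", "G"}


def change_type(a, b):
    """Purine<->purine or pyrimidine<->pyrimidine is a transition."""
    return "transition" if (a in PURINES) == (b in PURINES) else "transversion"


def single_base_neighbors(aa):
    """Map each residue one base change away from `aa` to the change types seen."""
    out = {}
    for codon, residue in CODE.items():
        if residue != aa:
            continue
        for i in range(3):
            for base in "TCAG":
                if base == codon[i]:
                    continue
                new = CODE[codon[:i] + base + codon[i + 1:]]
                if new != aa:
                    out.setdefault(new, set()).add(change_type(codon[i], base))
    return out


if __name__ == "__main__":
    for residue, kinds in sorted(single_base_neighbors("V").items()):
        print(residue, sorted(kinds))
```

Running it for valine lists every amino acid reachable from the four GTN codons, together with whether each requires a transition or a transversion.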

V098L
4522	V most common amino acid at position 98 of CYTB
 430	I
  34	M
  11	A bison
   1	L yak
   1	N lemur
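
The percentages quoted for position 98 follow from the counts above. A minimal sketch of the arithmetic, using the six counts as listed (which sum to 4999 sequences); the helper names are illustrative.

```python
# Residue counts at CYTB position 98, copied from the table above.
POS98 = {"V": 4522, "I": 430, "M": 34, "A": 11, "L": 1, "N": 1}


def residue_frequencies(counts):
    """Convert raw residue counts at one position into fractions of the total."""
    total = sum(counts.values())
    return {aa: n / total for aa, n in counts.items()}


def singletons(counts):
    """Residues seen exactly once: candidates for deleterious change or error."""
    return sorted(aa for aa, n in counts.items() if n == 1)


if __name__ == "__main__":
    for aa, f in residue_frequencies(POS98).items():
        print(f"{aa}  {100 * f:6.2f}%")
    print("singletons:", singletons(POS98))
```

Valine comes out at about 90.5% and isoleucine at about 8.6%, matching the "90%" and "nearly 9%" in the text; leucine and asparagine are the two singletons.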

(analysis to be continued)
V098L  
 4522  V
  430  I
   34  M
   11  A
    1  L
    1  N

(analysis to be continued)
I188T  
 4309  I
  667  S
   14  I
    1  T

(analysis to be continued)
I192T  
   94  I
 4353  L
  505  M
   31  T
    3  F
    2  V
    1  A
    1  S

(analysis to be continued)
V195A  
 4528  V
  427  I
   25  T
   11  X
    4  G
    4  M
    1  A

(analysis to be continued)
D214N  
 4429  D
  512  N
   43  E
    8  S
    4  X
    2  Y
    1  H

(analysis to be continued)
V329M  
 4610  V
  188  T
  133  A
   44  I
   22  M
    2  G
    1  E

(analysis to be continued)
I348F  
 4232  I
  651  V
   63  T
   45  M
    4  N
    2  F
    1  A

Kilo-sequence alignment tricks

New sequencing technologies have greatly increased the amount of mammalian mitochondrial genomic data available at GenBank. Five years ago, it was acceptable to publish population-level D-loop sequences accompanied by a few fragmentary coding reads; today, a publication might offer 60-70 entire mitochondrial genomes. This favors evolutionary study of mitochondrial proteins over comparative genomics of nuclear genome products because the latter is still restricted to around 50 species (Dec 2010), almost all incompletely sequenced.

Many long-standing issues such as introgression, historic bottlenecks, population mixing, accrual of deleterious coding variants, hard polytomies, and lineage sorting during speciation can now be approached and resolved, especially with the increasing sequencing of end-Pleistocene frozen DNA. This may allow more enlightened management of endangered species such as bison, where populations reached rock bottom -- recovering numbers is not enough if genomic integrity is still at risk.

However, the flood of data raises significant issues in extracting meaningful information: it is not instructive to align the tens of thousands of sequences available for each of the 13 mitochondrial proteins. That gives an intractable array of 3789 amino acids by 12500 sequences, enough to fill 20 x 100 = 2000 screens on the largest possible computer monitor. The data must be distilled down somehow to take-away information.

This section explains a practical desktop protocol for extracting the 'reduced phylogenetic alphabet' at each residue of the mitochondrial proteome. The method depends heavily on current capabilities of Blastp at NCBI (http://blast.ncbi.nlm.nih.gov/Blast.cgi) and so may not be completely stable to changes made there over time.
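
Independently of how the sequences are retrieved, the distillation step itself amounts to tallying residues column by column. The following is a minimal sketch, assuming the sequences have already been aligned to equal length and that '-' and 'X' denote missing or ambiguous data; it does not reproduce the Blastp-based protocol, and the function name is hypothetical.

```python
from collections import Counter


def column_alphabet(seqs, position, ignore="-X"):
    """Tally residues at a 1-based alignment column, most common first.

    Characters listed in `ignore` (gaps, ambiguous residues) are skipped.
    Returns a list of (residue, count) pairs.
    """
    counts = Counter(
        s[position - 1] for s in seqs if s[position - 1] not in ignore
    )
    return counts.most_common()


if __name__ == "__main__":
    toy_alignment = ["MTV", "MTI", "MSV", "M-V"]  # illustrative data only
    for pos in range(1, 4):
        print(pos, column_alphabet(toy_alignment, pos))
```

Applied to every column of a large alignment, this yields per-site tables of the same form as those shown for the yak polymorphisms.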

First note that tBlastn cannot be used against the nr or wgs nucleotide databases at NCBI (or with Blat at UCSC), since the significantly different genetic code of mammalian mitochondria (http://sourcedb.cas.cn/sourcedb_nwipb_cas/yw/rck/200906/t20090613_1042162.html) is no longer supported as a parameter option. Other oddities involve missing terminal nucleotides that are added before translation. However, mitochondrial DNA is usually translated sensibly in GenBank protein entries.

The vertebrate mitochondrial code (i marks codons that can serve as initiation codons):

TTT F Phe      TCT S Ser      TAT Y Tyr      TGT C Cys  
TTC F Phe      TCC S Ser      TAC Y Tyr      TGC C Cys  
TTA L Leu      TCA S Ser      TAA * Ter      TGA W Trp  
TTG L Leu      TCG S Ser      TAG * Ter      TGG W Trp  

CTT L Leu      CCT P Pro      CAT H His      CGT R Arg  
CTC L Leu      CCC P Pro      CAC H His      CGC R Arg  
CTA L Leu      CCA P Pro      CAA Q Gln      CGA R Arg  
CTG L Leu      CCG P Pro      CAG Q Gln      CGG R Arg  

ATT I Ile i    ACT T Thr      AAT N Asn      AGT S Ser  
ATC I Ile i    ACC T Thr      AAC N Asn      AGC S Ser  
ATA M Met i    ACA T Thr      AAA K Lys      AGA * Ter  Bos can use ATA as initiation codon
ATG M Met i    ACG T Thr      AAG K Lys      AGG * Ter  

GTT V Val      GCT A Ala      GAT D Asp      GGT G Gly  
GTC V Val      GCC A Ala      GAC D Asp      GGC G Gly  
GTA V Val      GCA A Ala      GAA E Glu      GGA G Gly  
GTG V Val i    GCG A Ala      GAG E Glu      GGG G Gly  

    AAs  = FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSS**VVVVAAAADDEEGGGG
  Start  = --------------------------------MMMM---------------M------------
  Base1  = TTTTTTTTTTTTTTTTCCCCCCCCCCCCCCCCAAAAAAAAAAAAAAAAGGGGGGGGGGGGGGGG
  Base2  = TTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGG
  Base3  = TCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAG
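
Because standard-code tools mistranslate mitochondrial genes, a local translation with the table above is often the simplest check. The sketch below builds a codon dictionary from the four table strings and translates an in-frame coding sequence up to the first stop; the function name and the example DNA are illustrative, and a trailing incomplete codon is simply ignored here rather than completed.

```python
AAS   = "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSS**VVVVAAAADDEEGGGG"
BASE1 = "TTTTTTTTTTTTTTTTCCCCCCCCCCCCCCCCAAAAAAAAAAAAAAAAGGGGGGGGGGGGGGGG"
BASE2 = "TTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGG"
BASE3 = "TCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAG"

# Codon -> one-letter amino acid ('*' = stop) for the vertebrate mitochondrial code.
MITO_CODE = {b1 + b2 + b3: aa for aa, b1, b2, b3 in zip(AAS, BASE1, BASE2, BASE3)}


def translate_mito(dna):
    """Translate an in-frame coding sequence with the vertebrate mitochondrial code.

    Translation stops at the first stop codon. Codons containing characters
    other than A, C, G, T become 'X'. A trailing partial codon is ignored.
    """
    dna = dna.upper().replace("U", "T")
    protein = []
    for i in range(0, len(dna) - len(dna) % 3, 3):
        aa = MITO_CODE.get(dna[i:i + 3], "X")
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


if __name__ == "__main__":
    # Made-up codons chosen to spell the first five residues of CYTB.
    print(translate_mito("ATGACCAACATTCGA"))
    # TGA reads as Trp and AGA as a stop, unlike the standard code.
    print(translate_mito("ATGTGAATAAGAGGG"))
```

The second example shows the three differences from the standard code that matter most in practice: TGA encodes tryptophan, ATA encodes methionine, and AGA terminates translation.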