There are 64 different codons (61 codons encoding amino acids and 3 stop codons) but only 20 different translated amino acids. This excess of codons allows many amino acids to be encoded by more than one codon. Because of this redundancy, the genetic code is said to be degenerate. Codon usage in different organisms is often biased towards one of the several codons that encode the same amino acid; that is, one codon is found at a greater frequency than expected by chance. How such biases arise is a much debated area of molecular evolution. Codon usage tables detailing genomic codon usage bias for organisms in GenBank and RefSeq can be found in the HIVE-Codon Usage Tables (HIVE-CUTs) project,[1] which contains two distinct databases, CoCoPUTs and TissueCoCoPUTs. Together, these two databases provide comprehensive, up-to-date codon, codon pair and dinucleotide usage statistics for all organisms with available sequence information and for 52 human tissues, respectively.[2][3]
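The codon counts above, and the idea of a codon being used more often than expected by chance, can be illustrated with a short Python sketch. It builds the standard genetic code table and computes relative synonymous codon usage (RSCU), a commonly used bias measure in which a value of 1 means a codon is used exactly as often as its synonyms. The function name `rscu` and the example sequence are illustrative assumptions, not part of any cited database or tool.

```python
from collections import Counter, defaultdict

# Standard genetic code, bases ordered T, C, A, G ('*' marks a stop codon).
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = dict(zip(
    (a + b + c for a in BASES for b in BASES for c in BASES), AMINO))

# Group synonymous codons by the amino acid they encode (stops excluded).
SYNONYMS = defaultdict(list)
for codon, aa in CODON_TABLE.items():
    if aa != "*":
        SYNONYMS[aa].append(codon)


def rscu(seq):
    """Return RSCU per codon for an in-frame coding sequence (hypothetical helper).

    RSCU = observed count / (total count for that amino acid / number of synonyms).
    Amino acids that do not occur in the sequence are omitted from the result.
    """
    seq = seq.upper().replace("U", "T")
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    counts = Counter(seq[i:i + 3] for i in range(0, usable, 3))
    result = {}
    for aa, codons in SYNONYMS.items():
        total = sum(counts[c] for c in codons)
        if total == 0:
            continue
        for c in codons:
            result[c] = counts[c] * len(codons) / total
    return result


n_stop = sum(1 for aa in CODON_TABLE.values() if aa == "*")
print(len(CODON_TABLE), len(CODON_TABLE) - n_stop, n_stop, len(SYNONYMS))

# Made-up example: four leucine codons, three of them CTG.
example = rscu("CTGCTGCTGCTT")
print(example["CTG"], example["CTT"])
```

In the made-up example, leucine has six synonymous codons, so unbiased usage would give each an RSCU of 1; values well above 1 indicate a preferred codon. Real analyses would use full coding sequences and typically the usage tables described above.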
It is generally acknowledged that codon biases reflect the contributions of three main factors: GC-biased gene conversion that favors GC-ending codons in diploid organisms, arrival biases reflecting mutational preferences (typically favoring AT-ending codons), and natural selection for codons that are favorable with regard to translation.[4][5][6] Optimal codons in fast-growing microorganisms, like Escherichia coli or Saccharomyces cerevisiae (baker's yeast), reflect the composition of their respective genomic transfer RNA (tRNA) pool.[7] It is thought that optimal codons help to achieve faster translation rates and higher accuracy. Because of these effects, translational selection is expected to be stronger in highly expressed genes, as is indeed the case for the above-mentioned organisms.[8][9] In other organisms that do not show high growth rates or that have small genomes, codon usage optimization is normally absent, and codon preferences are determined by the characteristic mutational biases seen in that particular genome. Examples of this are Homo sapiens (human) and Helicobacter pylori.[10][11] Organisms that show an intermediate level of codon usage optimization include Drosophila melanogaster (fruit fly), Caenorhabditis elegans (nematode worm), Strongylocentrotus purpuratus (sea urchin), and Arabidopsis thaliana (thale cress).[12] Several viral families (herpesvirus, lentivirus, papillomavirus, polyomavirus, adenovirus, and parvovirus) are known to encode structural proteins that display heavily skewed codon usage compared to the host cell. It has been suggested that these codon biases play a role in the temporal regulation of their late proteins.[13]
The nature of the codon usage-tRNA optimization has been fiercely debated. It is not clear whether codon usage drives tRNA evolution or vice versa. At least one mathematical model has been developed in which codon usage and tRNA expression co-evolve in feedback fashion (i.e., codons already present at high frequencies drive up the expression of their corresponding tRNAs, and tRNAs normally expressed at high levels drive up the frequency of their corresponding codons). However, this model does not yet seem to have experimental confirmation. Another problem is that the evolution of tRNA genes has received very little research attention.[citation needed]
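The feedback idea described above can be illustrated with a deliberately simplified toy simulation. This is not the published model, whose equations are not given here; it is only a sketch assuming two synonymous codons, A and B, each read by its own tRNA. The update rules, the parameter names `selection` and `tracking`, and all numeric values are assumptions chosen for illustration.

```python
def coevolve(codon_freq, trna_frac, steps=200, selection=1.0, tracking=0.1):
    """Toy positive-feedback loop between codon usage and tRNA abundance.

    codon_freq: fraction of sites using codon A (the rest use codon B).
    trna_frac:  fraction of the tRNA pool that reads codon A.
    All update rules and parameters are illustrative assumptions.
    """
    f, t = codon_freq, trna_frac
    for _ in range(steps):
        # Codon A spreads when its tRNA makes up more than half of the pool.
        new_f = f + selection * f * (1.0 - f) * (t - 0.5)
        # tRNA expression drifts toward the current codon demand.
        new_t = t + tracking * (f - t)
        f, t = new_f, new_t
    return f, t


# A slight initial excess of codon A is amplified by the feedback ...
high_f, high_t = coevolve(0.55, 0.5)
# ... while a slight initial deficit is amplified in the other direction.
low_f, low_t = coevolve(0.45, 0.5)
print(round(high_f, 3), round(low_f, 3))
```

Under these assumptions, a small initial imbalance in either direction is amplified until one codon and its tRNA dominate, which is the qualitative behavior that a mutual-reinforcement account predicts. The sketch says nothing about which of the two processes comes first in real genomes.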
Duret, Laurent; Galtier, Nicolas (2009). "Biased gene conversion and the evolution of mammalian genomic landscapes". Annu Rev Genomics Hum Genet. 10: 285–311. doi:10.1146/annurev-genom-082908-150001. PMID 19630562.
Sharp, Paul M.; Stenico, Michele; Peden, John F.; Lloyd, Andrew T. (1993). "Codon usage: mutational bias, translational selection, or both?". Biochem. Soc. Trans. 21 (4): 835–841. doi:10.1042/bst0210835. PMID 8132077. S2CID 8582630.
Kanaya, Shigehiko; Yamada, Yuko; Kudo, Yoshihiro; Ikemura, Toshimichi (1999). "Studies of codon usage and tRNA genes of 18 unicellular organisms and quantification of Bacillus subtilis tRNAs: gene expression level and species-specific diversity of codon usage based on multivariate analysis". Gene. 238 (1): 143–155. doi:10.1016/s0378-1119(99)00225-5. ISSN 0378-1119. PMID 10570992.
Duret, Laurent (2000). "tRNA gene number and codon usage in the C. elegans genome are co-adapted for optimal translation of highly expressed genes". Trends in Genetics. 16 (7): 287–289. doi:10.1016/s0168-9525(00)02041-2. ISSN 0168-9525. PMID 10858656.