To Codon Optimize or Not: That is the Question

By Alyssa Cecchetelli

There are 64 different codons: 61 encode the 20 amino acids and three are stop codons, so most amino acids can be encoded by more than one codon. Although the genetic code is nearly universal, organisms often prefer certain synonymous codons over others. This is termed codon usage bias. In fact, some species are known to avoid certain codons altogether.
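To make the redundancy concrete, here is a minimal Python sketch that builds the standard genetic code and counts how many codons encode each amino acid. The variable names are ours, chosen for illustration, and the table is the standard code only, so organisms with variant codes would need a different amino acid string.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
# Standard genetic code with codons ordered TTT, TTC, TTA, TTG, TCT, ... ('*' = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Map each of the 64 codons to its amino acid (one-letter code).
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Count how many synonymous codons each amino acid (or stop) has.
degeneracy = Counter(CODON_TABLE.values())

for aa, n in sorted(degeneracy.items(), key=lambda item: -item[1]):
    print(aa, n)
```

Leucine, serine, and arginine each have six codons, while methionine and tryptophan have only one, so codon choice only matters for the other 18 amino acids.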

So what does this mean for a molecular biologist who wants to express genes from one organism in another? Let’s take a look at codon usage and when you might want to optimize codons for expression in a particular organism.

Why is codon usage important?

Translation and protein synthesis rely on transfer RNAs (tRNAs). tRNAs bind to and deliver amino acids to the ribosome, where they are incorporated into the growing polypeptide chain. One part of the tRNA contains the anticodon, a three-nucleotide sequence complementary to a codon in the mRNA. The 3’ end of the tRNA binds the amino acid that corresponds to the anticodon sequence. With 61 amino acid-encoding codons, there are 61 possible tRNA anticodons. Cells, however, may not express all 61 of these tRNAs, and those tRNAs that are expressed may be found at very different levels (Mauro and Chappell, 2014). Due to this variation in tRNA expression, the codons that encode a protein may affect the rate of translation and thus protein expression. In fact, studies have shown that translational efficiency is correlated with codon bias across all endogenous genes in E. coli and S. cerevisiae (Tuller et al., 2010).

Illustration showing the addition of amino acids by tRNAs into a growing chain off of the P site on the ribosome. tRNAs charged with amino acids enter in the A site, and uncharged tRNAs exit the E site of the ribosome.
Figure 1: An overview of peptide synthesis. A ribosome interacts with mRNA and charged tRNAs to build a peptide. Image from Mariana Ruiz Villarreal.

By the late 1980s, scientists had even created a codon adaptation index (CAI), a measure based on the codon usage frequencies in a reference set of highly expressed genes (Sharp and Li, 1987). By 2000, the Ikemura lab's Codon Usage Tabulated from GenBank (CUTG) database provided an electronic dataset of codon usage for 257,468 genes across 8,792 organisms (Nakamura et al., 2000).
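As a rough illustration of how a CAI-style score works, the sketch below follows the general idea in Sharp and Li (1987): each codon gets a weight equal to its count divided by the count of the most used synonymous codon in the reference set, and a gene's score is the geometric mean of the weights of its codons. The function names, the toy reference sequence, and the 0.5 pseudocount for unobserved codons are assumptions made for this example. A real analysis would use a curated set of highly expressed genes from the host.

```python
import math
from collections import Counter, defaultdict
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Group codons by the amino acid they encode.
SYNONYMS = defaultdict(list)
for codon, aa in CODON_TABLE.items():
    SYNONYMS[aa].append(codon)


def split_codons(seq):
    """Split a coding sequence into codons, ignoring any trailing partial codon."""
    seq = seq.upper()
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]


def relative_adaptiveness(reference_genes, pseudocount=0.5):
    """Weight of each codon = its count / count of the most used synonymous codon."""
    counts = Counter(c for gene in reference_genes for c in split_codons(gene))
    weights = {}
    for aa, synonyms in SYNONYMS.items():
        if aa == "*" or len(synonyms) == 1:
            continue  # stops, Met, and Trp offer no codon choice, so they are skipped
        # Assumption: unobserved codons get a small pseudocount to avoid log(0).
        freq = {c: counts[c] or pseudocount for c in synonyms}
        best = max(freq.values())
        for c in synonyms:
            weights[c] = freq[c] / best
    return weights


def cai(gene, weights):
    """Geometric mean of the weights of all scored codons in the gene."""
    logs = [math.log(weights[c]) for c in split_codons(gene) if c in weights]
    if not logs:
        raise ValueError("gene contains no codons that can be scored")
    return math.exp(sum(logs) / len(logs))


# Toy reference "gene" (made up for illustration, not real data).
reference = ["ATGCTGCTGCTGTTAAAAAAAAAGTAA"]
w = relative_adaptiveness(reference)
print(round(cai("CTGAAA", w), 3))  # uses only the most frequent codons: 1.0
print(round(cai("TTAAAG", w), 3))  # uses the rarer synonyms: 0.408
```

A score near 1 means the gene mostly uses the codons favored in the reference set. Lower scores indicate heavier use of less common synonymous codons.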

What is codon optimization?

Codon optimization is an approach in gene engineering to improve gene expression by changing synonymous codons based on an organism's codon bias. The idea is that scientists can make synonymous changes throughout a gene of interest, guided by the host organism's codon usage bias, to increase translational efficiency and thus protein expression without altering the sequence of the protein. As DNA synthesis has become quite cost-effective, it seems easy for scientists to re-synthesize DNA with the most optimized sequence. Unfortunately, it is not always easy to identify the ideal codon for every amino acid across an entire polypeptide. Many researchers have created codon-optimization algorithms and resources (Athey et al., 2017), and many DNA synthesis companies, such as IDT, Genewiz, and GenScript, offer tools to help with these decisions.
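The simplest possible version of this idea is a "one amino acid, one codon" substitution: replace every codon with the synonymous codon the host uses most. The Python sketch below shows that naive strategy and checks that the protein sequence is unchanged. The function names and the usage values are placeholders invented for this example, not measured frequencies, and commercial tools weigh many more factors than this.

```python
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def split_codons(cds):
    """Split a coding sequence into codons; the length must be a multiple of 3."""
    cds = cds.upper()
    if len(cds) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    return [cds[i:i + 3] for i in range(0, len(cds), 3)]


def translate(cds):
    """Translate a coding sequence with the standard genetic code ('*' = stop)."""
    return "".join(CODON_TABLE[c] for c in split_codons(cds))


def optimize(cds, host_usage):
    """Swap each codon for the synonymous codon with the highest host usage.

    Codons whose amino acid has no entry in host_usage are left unchanged.
    """
    out = []
    for codon in split_codons(cds):
        aa = CODON_TABLE[codon]
        candidates = [c for c, a in CODON_TABLE.items() if a == aa and c in host_usage]
        out.append(max(candidates, key=host_usage.get) if candidates else codon)
    return "".join(out)


# HYPOTHETICAL host usage fractions (placeholders, not real measurements).
toy_usage = {"CTG": 0.5, "TTA": 0.1, "AAA": 0.7, "AAG": 0.3}

native = "ATGTTAAAGTTATAA"
optimized = optimize(native, toy_usage)
print(optimized)
# The encoded protein must be identical before and after optimization.
assert translate(native) == translate(optimized)
```

Always picking the single most frequent codon is only one of several strategies, and it ignores everything else that is encoded in the nucleotide sequence.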

So when do you want to codon optimize your gene of interest?

Codon optimization has become extremely useful for expressing functional proteins in hosts that do not naturally express that gene. The gene you want to express may contain codons that are rarely used in your host, may carry expression-limiting regulatory elements, or may come from an organism that uses a non-canonical genetic code (Gustafsson et al., 2004).
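One quick way to judge whether a gene is a candidate for optimization is to scan it for codons the host rarely uses. The sketch below does this in Python. The function name, the 10% threshold, and the usage fractions are assumptions for illustration only, so in practice you would load a real codon usage table for your host and choose a threshold that suits your system.

```python
def find_rare_codons(cds, host_usage, threshold=0.10):
    """Return (codon_index, codon) pairs for codons below the usage threshold.

    host_usage maps a codon to the fraction of times the host uses it for its
    amino acid. Codons missing from the table are not flagged.
    """
    cds = cds.upper()
    if len(cds) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    hits = []
    for i in range(0, len(cds), 3):
        codon = cds[i:i + 3]
        fraction = host_usage.get(codon)
        if fraction is not None and fraction < threshold:
            hits.append((i // 3, codon))  # 0-based codon position
    return hits


# HYPOTHETICAL host usage fractions (placeholders, not real measurements).
toy_host_usage = {"AGA": 0.04, "CGT": 0.38, "CTG": 0.50, "CTA": 0.04}

gene = "ATGAGACTGCTACGTTAA"
rare = find_rare_codons(gene, toy_host_usage)
print(rare)
```

A gene with only a few flagged codons may express well without any changes, while clusters of rare codons are a stronger hint that optimization could help.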

On the left is the codon wheel that depicts 64 codons that encode 20 amino acids. On the right is an example of codon usage among different organisms.
Figure 2: (A) The codon wheel depicts the 64 codons that encode the 20 amino acids and three stop codons. (B) Example of codon usage among different organisms. Several codons can encode the same amino acid, but each organism may preferentially use some of these codons while others are underrepresented. Image from Hiss et al., 2017.

Most commonly, scientists codon optimize genes from eukaryotic organisms for expression in prokaryotic systems or yeast (Lanza et al., 2014). For example, it may be worth codon optimizing a human gene for expression in E. coli. Genes are also optimized for expression in mammalian cells. For example, Henry Lester’s lab optimized the C. elegans GluCl ion channel genes for expression in mammalian cells. This was the first codon-optimized membrane protein expressed in mammalian cells. In this study, codon optimization resulted in a 6- to 9-fold increase in expression of these channels, providing scientists with the ability to selectively silence neurons in vivo (Slimko and Lester, 2003).

Addgene has a large selection of codon optimized genes, so check out our inventory before synthesizing a gene yourself. 

Caveats and concerns: When not to codon optimize?

Although codon optimization increases protein production in certain systems, synonymous changes to a gene sequence can have unexpected detrimental effects on the protein. Codon optimization could affect protein conformation, folding, and stability, change post-translational modification sites, and even affect protein function (Mauro and Chappell, 2014). Differences in decoding speed among tRNAs, including those that rely on wobble base-pairing (which lets one tRNA recognize multiple synonymous codons), may be critical for setting the local rate of translation. The ribosome may slow down or pause during elongation, and these pauses may be necessary for proper protein folding (Stadler and Fire, 2011). Codon optimization may therefore disrupt the fine-tuned timing of translation and, ultimately, protein function.

In addition, codon optimization could remove information encoded in the nucleotide sequence of a gene that affects processes such as translation initiation (Dresios et al., 2006) and mRNA stability (Hausser et al., 2013). Synonymous mutations that disrupt this kind of information have even been linked to some diseases. It has been reported that 5-10% of human genes contain a region where a synonymous mutation can be harmful (Sauna and Kimchi-Sarfaty, 2011).

Overall, codon optimization may not always be the optimal strategy for increasing protein production. If you are trying to express a protein in its native host, there is likely no need to codon optimize, as the gene should already reflect that organism's codon usage. Others suggest codon optimization should not be used in some in vivo experiments or in biotechnology and therapeutics (Mauro and Chappell, 2014).

Ultimately, it is important to take a close look at your experiment and the gene that you may want to codon optimize before you get started. There are many online resources available to help you both design and synthesize DNA sequences for optimal expression in your system of choice.

Check out Addgene’s codon-optimized genes!


References

Athey J, Alexaki A, Osipova E, Rostovtsev A, Santana-Quintero LV, Katneni U, Simonyan V, Kimchi-Sarfaty C (2017) A new and updated resource for codon usage tables. BMC Bioinformatics 18. https://doi.org/10.1186/s12859-017-1793-7

Dresios J, Chappell SA, Zhou W, Mauro VP (2006) An mRNA-rRNA base-pairing mechanism for translation initiation in eukaryotes. Nat Struct Mol Biol 13:30–34. https://doi.org/10.1038/nsmb1031

Gustafsson C, Govindarajan S, Minshull J (2004) Codon bias and heterologous protein expression. Trends in Biotechnology 22:346–353. https://doi.org/10.1016/j.tibtech.2004.04.006

Hausser J, Syed AP, Bilen B, Zavolan M (2013) Analysis of CDS-located miRNA target sites suggests that they can effectively inhibit translation. Genome Research 23:604–615. https://doi.org/10.1101/gr.139758.112

Mauro VP, Chappell SA (2014) A critical analysis of codon optimization in human therapeutics. Trends in Molecular Medicine 20:604–613. https://doi.org/10.1016/j.molmed.2014.09.003

Nakamura Y, Gojobori T, Ikemura T (2000) Codon usage tabulated from international DNA sequence databases: status for the year 2000. Nucleic Acids Research 28:292. https://doi.org/10.1093/nar/28.1.292

Sauna ZE, Kimchi-Sarfaty C (2011) Understanding the contribution of synonymous mutations to human disease. Nat Rev Genet 12:683–691. https://doi.org/10.1038/nrg3051

Sharp PM, Li W-H (1987) The codon adaptation index-a measure of directional synonymous codon usage bias, and its potential applications. Nucl Acids Res 15:1281–1295. https://doi.org/10.1093/nar/15.3.1281

Slimko EM, Lester HA (2003) Codon optimization of Caenorhabditis elegans GluCl ion channel genes for mammalian cells dramatically improves expression levels. Journal of Neuroscience Methods 124:75–81. https://doi.org/10.1016/s0165-0270(02)00362-x

Stadler M, Fire A (2011) Wobble base-pairing slows in vivo translation elongation in metazoans. RNA 17:2063–2073. https://doi.org/10.1261/rna.02890211

Tuller T, Waldman YY, Kupiec M, Ruppin E (2010) Translation efficiency is determined by both codon bias and folding energy. Proc Natl Acad Sci USA 107:3645–3650. https://doi.org/10.1073/pnas.0909910107

Varani G, McClain WH (2000) The G·U wobble base pair. EMBO Rep 1:18–23. https://doi.org/10.1093/embo-reports/kvd001
