CRISPR 101: Cytosine Transversion Editors

By Emily P. Bentley

The first base editors revolutionized CRISPR gene editing. Cytosine base editors (CBEs) and adenine base editors (ABEs) chemically modify target bases without breaking the DNA backbone, making them efficient and precise tools for altering DNA sequences.

These first base editors were both deaminases: they removed an —NH2 (amine) group from the target base and replaced it with a double-bonded oxygen. This simple enzymatic reaction enabled conversion between the purine bases (A and G) or between the pyrimidine bases (C and T). (Check out our previous post on these editors for more details on how they work!)

Chemical structures of nucleobases are shown. Conversions between purines or between pyrimidines are labeled as base transitions. Conversions between purines and pyrimidines are labeled as base transversions. One row encompasses the purines: adenine, guanine, and hypoxanthine (inosine). Guanine and hypoxanthine (inosine) are shown in the same color to represent their equivalent base pairing activity. A double-sided arrow, labeled “base transition,” shows conversion between adenine and guanine / hypoxanthine. The next row encompasses the pyrimidines: cytosine, thymine, and uracil. Thymine and uracil are shown in the same color to represent their equivalent base pairing activity. A double-sided arrow, labeled “base transition,” shows conversion between cytosine and thymine / uracil.
Figure 1: Base transitions are edits between the purine bases (adenine, guanine, and hypoxanthine) or between the pyrimidine bases (cytosine, thymine, and uracil). Base transversions are edits that convert a purine to a pyrimidine or vice-versa. Note: hypoxanthine (shown) is the nucleobase component of the nucleoside inosine. Created with BioRender.com.

Although it was a huge innovation, there was a limitation to this approach. CBEs and ABEs are base transition editors. They adjust the side groups of DNA bases and leave the core ring structure intact, allowing them to make 4 out of the possible 12 base-to-base conversions. The other 8 conversions are base transversions: they require switching between a two-ring purine and one-ring pyrimidine base. This is a far more complex chemical task, not a job for a single enzyme.

Base transitions        Base transversions

C → T                   C → G     C → A
G → A                   G → C     G → T

A → G                   A → C     A → T
T → C                   T → G     T → A
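
If you'd like to check the 4-versus-8 split for yourself, here is a minimal Python sketch that enumerates all 12 base-to-base conversions and sorts them by whether the two bases belong to the same chemical class. The function and variable names are our own, chosen purely for illustration.

```python
from itertools import permutations

PURINES = {"A", "G"}       # two-ring bases
PYRIMIDINES = {"C", "T"}   # one-ring bases


def classify(src, dst):
    """Return 'transition' if both bases share a chemical class, else 'transversion'."""
    pair = {src, dst}
    same_class = pair <= PURINES or pair <= PYRIMIDINES
    return "transition" if same_class else "transversion"


# Every ordered pair of distinct DNA bases: 4 x 3 = 12 conversions.
conversions = {f"{s} → {d}": classify(s, d) for s, d in permutations("ACGT", 2)}
transitions = [edit for edit, kind in conversions.items() if kind == "transition"]
transversions = [edit for edit, kind in conversions.items() if kind == "transversion"]

print(len(transitions), "transitions:", ", ".join(transitions))
print(len(transversions), "transversions:", ", ".join(transversions))
```

Running it should report 4 transitions and 8 transversions, matching the table above.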

So how have scientists solved this problem? By taking advantage of repair pathways within the cell — in some cases, the same repair pathways that were considered problematic in the development of CBEs and ABEs.

This post includes a lot of acronyms, so we’ve included a glossary at the end if you ever need to double-check your vocabulary!

A brief review of base editing tools

First, let’s briefly review how CRISPR-derived base editing works in general. Base editors use either catalytically dead Cas9 (dCas9) or a Cas9 nickase (nCas9) to avoid introducing double-strand breaks. The only things base editors really need from Cas9 are sequence specificity and the R-loop structure.

In order to bind DNA, Cas9 requires (1) a guide RNA, (2) a complementary DNA sequence, and (3) a protospacer-adjacent motif (PAM) on the opposite strand. Once it finds a PAM, Cas9 unwinds the adjacent DNA to create a region of separate strands called an R-loop (Figure 2). This allows the guide RNA to base pair with the DNA target strand, leaving the non-target strand unpaired and accessible on the outside of the enzyme.

A cartoon depiction of Cas9 + gRNA bound to DNA. The gRNA is base paired to the DNA target strand, separating the two strands of DNA. The non-target strand is left single-stranded. Cas9 binds directly to the PAM sequence, which is on the non-target strand immediately adjacent to the open loop of separated DNA strands.
Figure 2: CRISPR R-loop. Created with BioRender.com.

Base editors take advantage of this single-stranded DNA (ssDNA) region, using enzymes fused to Cas9 to alter the exposed ssDNA bases. Thus, base editors have an activity window restricted to the stretch of ssDNA created by Cas9 binding.
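
To make the idea of an activity window more concrete, here is a rough Python sketch that scans one DNA strand for candidate sites and lists the cytosines that fall inside a window. Several details are assumptions for illustration only and do not come from the studies discussed here: an SpCas9-style NGG PAM, a 20-nucleotide protospacer, and a window spanning positions 4 to 8 (counted from the PAM-distal end). Real editors have different windows and sequence preferences, and a real search would also scan the reverse-complement strand. The function name is hypothetical.

```python
import re


def find_editable_cytosines(seq, window=(4, 8), protospacer_len=20):
    """List candidate sites on the given strand as (start, protospacer, C positions).

    The given strand is treated as the PAM-containing (non-target, edited) strand.
    Positions are 1-based, counted from the PAM-distal end of the protospacer.
    The NGG PAM and the default window are illustrative assumptions.
    """
    seq = seq.upper()
    # A lookahead is used so that overlapping candidate sites are all reported.
    pattern = re.compile(rf"(?=([ACGT]{{{protospacer_len}}})[ACGT]GG)")
    first, last = window
    sites = []
    for match in pattern.finditer(seq):
        protospacer = match.group(1)
        hits = [pos for pos in range(first, last + 1) if protospacer[pos - 1] == "C"]
        sites.append((match.start(), protospacer, hits))
    return sites


# Toy sequence: a 20-nt protospacer with Cs at positions 4 and 5, followed by a TGG PAM.
toy = "AAACC" + "A" * 15 + "TGG"
for start, protospacer, hits in find_editable_cytosines(toy):
    print(start, protospacer, hits)
```

A sketch like this is only a first filter. The prediction tools mentioned later in this post model editing outcomes in far more detail.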

Pro tip! In CRISPR, the non-target strand is the DNA strand that does not base pair with the gRNA. Base editors target this “non-target strand.” To avoid confusion, we usually use different terminology when discussing base editors, like calling it the “edited strand.”

Uracil excision repair leads to the first base transversion editor

The story of base transversion editors starts with researchers in the Keith Joung lab, who were engineering an ABE to reduce its tendency to make off-target edits to RNA. Like other ABEs, the editor they were using consisted of Cas9 fused to an engineered adenosine deaminase capable of chemically converting adenosine (A) to inosine (I, treated as G by the cell). Surprisingly, the team noticed that this editor sometimes introduced C → G edits at certain positions within the editing activity window (Kurt et al., 2021).

They hypothesized that the adenosine deaminase could also perform cytidine (C) deamination, converting the C base to uracil (U). This is the same chemical reaction performed by CBEs. However, most CBEs include an inhibitor of base excision repair (BER), the cellular process that catches mutated Us in DNA and ideally converts them back to Cs. In this case, the team suspected that the BER process was excising the incorrect U base, creating an “abasic,” or empty, DNA site, and then filling that site with a G.

Pro tip! Abasic sites are also called apurinic or apyrimidinic sites and abbreviated as AP sites.

To explore this further, the team started engineering a CBE to be worse at its original job. They removed the uracil glycosylase inhibitor (UGI) components, which inhibit BER during cytosine-to-uracil editing, and instead introduced their functional opposite, a uracil DNA N-glycosylase (UNG) from E. coli. The resulting construct, eUNG-BE4max(R33A)ΔUGI, was given the pithier name CGBE1, for “C → G Base Editor 1,” and had a C → G editing efficiency of up to 68% at the most favorable target sites.

Differential cytosine editing between bacteria and mammalian cells

Researchers in the Xueli Zhang lab were making similar observations around the same time. (The papers were actually published in the same journal on the same day.) This team created a similar construct that they called a Glycosylase Base Editor, or GBE, for its reliance on the glycosylase UNG to create abasic sites (Zhao et al., 2021). Their GBE produced C → G edits in HEK293T cells with over 50% efficiency at 23 out of 30 target sites.

But the same strategy in E. coli produced different results. GBEs seemed to follow the same initial mechanism as in mammalian cells — by excising the freshly edited U bases — but E. coli instead filled those abasic sites with A's, producing C → A edits.

The nucleobases are shown in a similar layout to Figure 1, with arrows describing conversions between them. In step one, a cytidine deaminase converts cytosine to uracil; this is catalyzed directly by the base editor. Next, base excision of uracil (also by the base editor) is repaired in two different ways by the cell, shown by an arrow that splits into two outcomes. Repair in E. coli leads to adenine, while repair in mammalian cells leads to guanine.
Figure 3: Mechanism of cytosine transversion editors. Red arrows indicate a conversion directly catalyzed by the base editor. Black arrows indicate natural cellular processes. Created with BioRender.com.
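
The pathway in Figure 3 can be written out as a deliberately idealized Python sketch: deaminate the C, excise the resulting U, then fill the abasic site according to the cell type. Treat it as a cartoon. Real editing produces a mixture of outcomes at each site, and the single "repair outcome" per cell type below is a simplification of the results described above. All names are hypothetical.

```python
# Idealized dominant outcome of abasic-site repair, per the results described above.
REPAIR_OUTCOME = {"mammalian": "G", "e_coli": "A"}


def simulate_cytosine_transversion(strand, position, cell_type):
    """Return the edited strand after each idealized step (position is 1-based)."""
    index = position - 1
    if strand[index] != "C":
        raise ValueError("target base must be a cytosine")
    bases = list(strand)
    steps = {}
    bases[index] = "U"  # step 1: cytidine deaminase converts C to U
    steps["deaminated"] = "".join(bases)
    bases[index] = "-"  # step 2: UNG excises the U, leaving an abasic site
    steps["abasic"] = "".join(bases)
    bases[index] = REPAIR_OUTCOME[cell_type]  # step 3: the cell fills the gap
    steps["repaired"] = "".join(bases)
    return steps


for cell_type in REPAIR_OUTCOME:
    print(cell_type, simulate_cytosine_transversion("GACTA", 3, cell_type))
```

The first two steps are identical in both cell types. Only the final repair step differs, which is why the same editor yields C → G edits in mammalian cells and C → A edits in E. coli.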


Beyond UNG: Optimizing C → G editors

Now that several labs had shown that preferential C → G editing was possible, the race was on to develop more efficient and precise editors. Most researchers agreed on the first part of the likely editing mechanism: cytidine deaminase converts C to U, and UNG excises the new U to create an abasic site. But how that site is filled depends on cellular factors. Multiple labs tried to identify beneficial factors to include in even better cytosine transversion editors.

New DNA repair proteins fused to CGBEs

Instead of using UNG, the Wei Leong Chew Lab developed two fusions of a CBE with other BER components: the DNA repair protein rXRCC1, or the DNA binding and lyase domain of DNA polymerase β (PB) (Chen et al., 2021). Both of these fusion constructs improved editing in various cell types, although efficacy varied among target sequences.

The researchers hypothesized that the CGBE edit process occurs entirely while Cas9 is bound, keeping the two DNA strands separate and preventing repair based on the complementary strand. This could explain why the edited U base opposite the original G is not primarily converted back to a complementary C.

Optimized base editing with eOPTI-CGBE and cOPTI-CGBE

Many optimizations had already been proposed for other base editors, so the Yidi Sun and Erwei Zuo labs tested and implemented several of these advances for CGBEs (Yuan et al., 2021). In addition, they showed that UNG from both E. coli and C. elegans could be effective components of CGBEs. Their eOPTI-CGBE and cOPTI-CGBE had higher editing efficiency and produced fewer indels than first-generation editors.

This team also demonstrated that different deaminase and UNG enzymes had different sequence preferences, which they defined thoroughly. Plus, they expanded CGBE target scope by creating BE variants with other Cas9s that recognize different PAMs. Their computational tool for predicting CGBE editing outcomes and assisting sgRNA selection is available at http://www.sunlab.fun:3838/BE_SMART/.

Even more new DNA repair proteins fused to CGBEs

The David Liu lab also experimented with adding different proteins to the CGBE architecture. They conducted a CRISPRi screen to identify endogenous genes that impact CGBE efficiency. In addition to using a new UNG ortholog from Mycobacterium smegmatis, termed UdgX, they found three proteins whose fusion improved the CGBE: DNA polymerase delta subunit 2 (POLD2), exonuclease 1 (EXO1), and RNA binding motif protein X-linked (RBMX) (Koblan et al., 2021). Several of these proteins had additive effects when more than one was included in the CGBE. They also developed a machine learning model capable of predicting editing outcomes of their various CGBEs at different target sequences and in different cell types, available at www.crisprbehive.design.

CGBEs fused to pioneer factors to improve chromatin accessibility

One of the original labs that pioneered these editors (calling them GBEs), Xueli Zhang’s lab, returned with a unique approach to improving C → G editing. Instead of fusing DNA repair proteins to the base editor, they added the pioneer factor SOX2, which is capable of binding to and remodeling inaccessible chromatin (Yang et al., 2022). Specifically, they found that the SOX2 activation domain (SAD), which recruits histone acetyltransferase to open chromatin, improved editing efficiencies when fused to the N terminus of a GBE construct. This fusion improved GBE editing but not CBE editing, perhaps because repairing a GBE-derived abasic site requires more cellular factors binding to DNA. However, ABEs were improved by adding SOX2, and GBE editing was increased even in chromatin regions that were already accessible. It's not entirely clear why this is, but it might point to a more dynamic picture of chromatin than an accessible/inaccessible binary.

A word of caution: CGBEs and indels

Cytosine transversion editors rely on a multi-step cellular repair pathway that is known for sometimes causing double-strand breaks (DSBs). The Fei-Long Meng Lab did in-depth profiling to compare editing outcomes between CBEs and CGBEs. They found that C → G editing was often accompanied by small deletions in their target loci (between 4.0% and 29.2% of studied sequences) (Huang et al., 2024). BE4max, a CBE, generated similar deletions at a lower frequency. Their data suggest that DSBs are a common intermediate of CGBE activity and that these breaks are sometimes processed by error-prone non-homologous end joining (NHEJ). To mitigate this, the team introduced the enzyme HMCES, either as a fusion construct or by co-transfection. HMCES forms a chemical crosslink with DNA abasic sites, protecting them from cellular processes that would cause DSBs, and its inclusion significantly reduced indels generated by CGBEs.

What's next?

Base transversion editing is still advancing rapidly. If it follows a similar path to base transition editing, we can expect several generations of optimization before the field begins to converge on a set of standard tools. Additionally, aspiring genetic editors can now compare the advantages of base editing and prime editing, an alternative editing strategy that replaces a target region with a newly synthesized, templated sequence.

And the story still isn’t over! Keep an eye out for our next post about base editing, where we’ll tackle transversion editors for the three other DNA bases, bringing the dream of any-base editing closer than ever.

References and Resources

Glossary

We introduced a lot of terminology in this post! Here’s a quick refresher if you ever lose track of which acronym is which.

Acronym | Full name | Notes

CRISPR terms
gRNA/sgRNA | Guide RNA | RNA used by Cas9 to locate a matching DNA sequence.
- | Target strand | The DNA strand that base pairs with the guide RNA.
- | Non-target strand | The DNA strand that does not base pair with the guide RNA. Despite being called the "non-target strand" in CRISPR terms, this strand is edited by base editors!
PAM | Protospacer-Adjacent Motif | A short sequence that must be present in the non-target strand of DNA for Cas9 to bind.
- | R-loop | The structure formed by Cas9 binding to DNA and separating the two strands. The guide RNA base pairs to the target strand. The complementary region of the non-target strand is excluded as ssDNA.
DSB | Double-Strand Break | A cut to both strands of DNA.
Indel | Insertion/deletion | A type of unwanted DNA edit.
NHEJ | Non-Homologous End Joining | An error-prone cellular process to repair DSBs.

Base Editing terms
- | Base transition | Base changes between purines (A ↔ G) or between pyrimidines (C ↔ T).
- | Base transversion | Base changes from purine (A or G) to pyrimidine (C or T) or vice-versa.
CBE | Cytosine Base Editor | Converts C → T; the first type of base editor invented.
ABE | Adenine Base Editor | Converts A → G; the second type of base editor invented.
CGBE | C → G Base Editor | Converts C → G; also called a glycosylase base editor (GBE).
GBE | Glycosylase Base Editor | A base editor that relies on a DNA glycosylase to create an abasic site. In this post, it refers to the same approach as a CGBE.
BER | Base Excision Repair | A cellular process in which incorrect nucleobases are removed from the backbone, creating abasic sites for further repair.
- | Abasic site | A DNA site with the nucleobase removed; also called an apurinic / apyrimidinic (AP) site.
UNG / UDG | Uracil DNA N-Glycosylase | Removes uracil from DNA, creating an abasic site.
UGI | Uracil Glycosylase Inhibitor | Inhibits UNG to preserve uracil bases in DNA.


References

Chen, L., Park, J. E., Paa, P., Rajakumar, P. D., Prekop, H.-T., Chew, Y. T., Manivannan, S. N., & Chew, W. L. (2021). Programmable C:G to G:C genome editing with CRISPR-Cas9-directed base excision repair proteins. Nature Communications, 12(1), 1384. https://doi.org/10.1038/s41467-021-21559-9

Huang, M. E., Qin, Y., Shang, Y., Hao, Q., Zhan, C., Lian, C., Luo, S., Liu, L. D., Zhang, S., Zhang, Y., Wo, Y., Li, N., Wu, S., Gui, T., Wang, B., Luo, Y., Cai, Y., Liu, X., Xu, Z., … Meng, F.-L. (2024). C-to-G editing generates double-strand breaks causing deletion, transversion and translocation. Nature Cell Biology, 26(2), 294–304. https://doi.org/10.1038/s41556-023-01342-2

Koblan, L. W., Arbab, M., Shen, M. W., Hussmann, J. A., Anzalone, A. V., Doman, J. L., Newby, G. A., Yang, D., Mok, B., Replogle, J. M., Xu, A., Sisley, T. A., Weissman, J. S., Adamson, B., & Liu, D. R. (2021). Efficient C•G-to-G•C base editors developed using CRISPRi screens, target-library analysis, and machine learning. Nature Biotechnology, 39(11), 1414–1425. https://doi.org/10.1038/s41587-021-00938-z

Koblan, L. W., Doman, J. L., Wilson, C., Levy, J. M., Tay, T., Newby, G. A., Maianti, J. P., Raguram, A., & Liu, D. R. (2018). Improving cytidine and adenine base editors by expression optimization and ancestral reconstruction. Nature Biotechnology, 36(9), 843–846. https://doi.org/10.1038/nbt.4172

Kurt, I. C., Zhou, R., Iyer, S., Garcia, S. P., Miller, B. R., Langner, L. M., Grünewald, J., & Joung, J. K. (2021). CRISPR C-to-G base editors for inducing targeted DNA transversions in human cells. Nature Biotechnology, 39(1), 41–46. https://doi.org/10.1038/s41587-020-0609-x

Yang, C., Dong, X., Ma, Z., Li, B., Bi, C., & Zhang, X. (2022). Pioneer Factor Improves CRISPR-Based C-To-G and C-To-T Base Editing. Advanced Science, 9(26), e2202957. https://doi.org/10.1002/advs.202202957

Yuan, T., Yan, N., Fei, T., Zheng, J., Meng, J., Li, N., Liu, J., Zhang, H., Xie, L., Ying, W., Li, D., Shi, L., Sun, Y., Li, Y., Li, Y., Sun, Y., & Zuo, E. (2021). Optimization of C-to-G base editors with sequence context preference predictable by machine learning methods. Nature Communications, 12(1), 4902. https://doi.org/10.1038/s41467-021-25217-y

Zhao, D., Li, J., Li, S., Xin, X., Hu, M., Price, M. A., Rosser, S. J., Bi, C., & Zhang, X. (2021). Glycosylase base editors enable C-to-A and C-to-G base changes. Nature Biotechnology, 39(1), 35–40. https://doi.org/10.1038/s41587-020-0592-2
