A Needle in a Base-Stack: Cas9 Structural Biology

By Emily P. Bentley

Have you ever designed a CRISPR guide RNA and wondered why it is limited to only 20 bases, or why it’s so important to choose a target sequence with a nearby protospacer-adjacent motif (PAM)? Cas9 is becoming an ever more ubiquitous tool for genome engineering, and studying its structure can help us understand the parameters of CRISPR experimental design. Let’s dive into some structural biology!

Major features

First, we’ll cover the basic parts of the Cas9 enzyme.


 

Figure 1: A cartoon depiction of Cas9’s two major lobes, REC and NUC, and their subdomains. NUC includes the HNH and RuvC catalytic domains, as well as the CTD (also known as the PI), while REC includes most of the bridge helix. Created with BioRender.com.

 

Cas9 has two major lobes, called NUC (for nuclease) and REC (for recognition). The NUC lobe contains the DNA scissors: the HNH and RuvC domains, which are named for similar nuclease folds found in other proteins. The C-terminal domain (CTD) of the protein also belongs to the NUC lobe. Because its primary role is recognizing the PAM on DNA, the CTD is also sometimes called the PAM-interacting (PI) domain.

The guide RNA is bound primarily by the REC lobe. An arginine-rich bridge helix connects the two lobes and helps recognize the guide RNA. Unlike the cleavage domains, the recognition domains—including the CTD and the entire REC lobe—do not structurally resemble any other known proteins; they are completely unique to the CRISPR system (Nishimasu et al., 2014).

Table 1. Summary of Cas9 domains, the lobe each domain belongs to, and their roles.

Lobe | Domain | Role
NUC | HNH | Cut DNA target strand
NUC | RuvC | Cut DNA non-target strand
NUC | CTD (or PI) | Recognize PAM
REC | REC I, II, and III | Recognize guide RNA
REC | Bridge helix | Recognize guide RNA

 

Unbound Cas9

Now that we know what we’re looking at, let’s peek at unbound—“apo”—Cas9 from Streptococcus pyogenes.

Crystal structure of S. pyogenes Cas9 in the apo state (PDB ID 4CMP). The protein is shown in the open conformation, with the lobes stretched apart. The REC lobe in particular is not compact and exposes a lot of irregular surface area to solvent.

 

Figure 2: Crystal structure of S. pyogenes Cas9 in the apo state (PDB ID 4CMP) with domains colored as in Figure 1. HNH, red; RuvC, orange; CTD, yellow; REC lobe, light blue; bridge helix, purple.

 

Apo Cas9 changes conformation frequently (Osuka et al., 2018), but it’s most often shown in an “open” state, with the NUC and REC lobes lying open like a book (Figure 2).

The portion of the CTD that recognizes the PAM is so flexible in this state that it doesn’t appear in the crystal structure (Figure 2) at all. When this crucial recognition region is disordered, Cas9 binds DNA very weakly, most likely through unstructured electrostatic interactions. It also can’t recognize PAMs, so it associates with random DNA sequences (Sternberg et al., 2014). The nuclease domains are inactive in the absence of guide RNA, preventing Cas9 from making off-target cuts.

 

""Pro tip! X-ray crystallography cannot capture flexible or disordered regions, so crystal structures like those in Figures 2 and 3 don’t show these regions, even if they are present in the sample. Other techniques, like electron microscopy and FRET, can offer more information on these regions.

 

Cas9’s transition from weak, random binding to precision gene editing requires three major steps: RNA loading, DNA sequence recognition, and DNA cleavage.

RNA loading

Once supplied with a guide RNA — whether it’s an sgRNA or a crRNA:tracrRNA duplex — Cas9’s two lobes move toward each other into the “closed” conformation shown in Figure 3. Though Cas9 is not a static molecule in any binding state, RNA loading stabilizes the closed conformation.

 

Fun Fact! Even though the Cas9:RNA complex is well described in the closed conformation, recent electron microscopy studies have also found Cas9:RNA complexes in the open conformation (Cofsky et al., 2022), indicating that there is more to learn about this process!

 


 

Figure 3: Crystal structure of S. pyogenes Cas9 bound to guide RNA (PDB ID 4ZT0), shown opaque (left) and transparent (right). The RNA guide sequence is extended as a single unpaired strand in the center of the protein. The repeat:anti-repeat duplex pokes out from the REC lobe and doubles back on itself at the bottom of the image. Stem loops 1 and 2 extend horizontally across both lobes. The guide RNA in this structure does not include stem loop 3, and guide RNA nucleotides 11-20 are not resolved due to structural disorder. HNH, red; RuvC, orange; CTD, yellow; REC lobe, light blue; bridge helix, purple; sgRNA, dark blue.

 

The REC lobe binds tightly to the repeat:anti-repeat duplex of the guide RNA, confirming its sequence. This region of the protein varies between Cas9 enzymes from different species, explaining why they recognize different guide RNAs.

Deep in the REC lobe, the arginine-rich bridge helix pokes through the middle of the RNA duplex, separating it into two strands: the stem loops and the guide region. The stem loops bind along the outside of the protein, spanning both lobes. Stem loop 1, nestled between the REC and CTD domains, is required for Cas9 nuclease function. Stem loops 2 and 3 are not strictly necessary for in vitro cleavage, but they improve cutting efficiency and are important for robust activity in cells (Jiang & Doudna, 2017).

Within an interior cavity, Cas9 grips nucleotides 1-10 of the guide RNA, prearranged for base pairing with a complementary DNA strand. These 10 nucleotides are the “seed” region of the RNA guide, where exact matches to the DNA target are essential for cleavage. Nucleotides 11-20 are disordered and don’t appear in the Figure 3 crystal structure, reflecting the less stringent recognition mechanism that can tolerate mismatches in this region.

The crystal structure in Figure 3 shows the guide RNA stabilizing the DNA-binding portion of the CTD, ready to check for PAM sequences. This helps explain why Cas9 can’t locate PAM sites until after RNA loading: the structure needed to interrogate the sequence isn’t in place yet.

 

S. pyogenes Cas9 in the apo state (PDB ID 4CMP) compared to the guide RNA-bound state (PDB ID 4ZT0). The crystal structure of the apo state is in the open conformation and a large portion of the CTD is not shown due to disorder. This state is flexible, has a disordered CTD, binds DNA weakly, and is unable to recognize PAM sequences. The crystal structure of RNA-bound Cas9 is in the closed conformation, and the previously hidden region of the CTD is now structured. This state is prearranged to bind DNA and recognize PAM sequences.

 

Figure 4: S. pyogenes Cas9 in the apo state (PDB ID 4CMP, top) compared to the guide RNA-bound state (PDB ID 4ZT0, bottom), seen from the same angle. Box highlights the DNA-binding region of the CTD, which is disordered in the apo state and absent from that structure. HNH, red; RuvC, orange; CTD, yellow; REC lobe, light blue; bridge helix, purple; sgRNA, dark blue.

 

DNA sequence recognition

Once primed by RNA loading, the PAM-recognition residues in the CTD can get cozy with DNA. S. pyogenes Cas9 recognizes the PAM sequence 5’-NGG-3’, where N can be any base. In Figure 5, we can see why: two arginines make base-specific hydrogen bonds to the guanines, but the N (a thymine in this case) is contacted only at its phosphate backbone. These PAM bases must be present for Cas9 to unwind the DNA and check for sequence matches; a perfect guide RNA will do you no good if there is no adjacent PAM to anchor it. However, Cas9 only checks the non-target strand for the PAM sequence and doesn’t mind if the PAM-complementary strand is mismatched.

 

Figure 5: S. pyogenes Cas9 bound to guide RNA and target DNA (PDB ID 7S4X). Residues R1333 and R1335 from the CTD (yellow) make base-specific hydrogen bonds to guanines of the PAM in the non-target DNA strand (dark gray). Target DNA strand, light gray.
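The PAM requirement described above translates directly into guide design: a candidate target is a stretch of DNA sitting immediately 5’ of an NGG on the non-target strand. Here is a minimal sketch of that search. The function name `find_ngg_targets` and the example sequence are invented for illustration, the 20-base default simply follows the guide length discussed in this post, and only the strand you pass in is scanned (a real design tool would also check the reverse complement and score off-targets).

```python
import re


def find_ngg_targets(non_target_strand, spacer_length=20):
    """List candidate sites on one DNA strand, read 5'->3'.

    Returns (protospacer_start, protospacer, pam) tuples, where the
    protospacer is the `spacer_length` bases immediately 5' of an NGG PAM.
    """
    seq = non_target_strand.upper()
    hits = []
    # A lookahead keeps overlapping PAMs (e.g. within "GGGG") from being skipped.
    for match in re.finditer(r"(?=([ACGT]GG))", seq):
        pam_start = match.start()
        start = pam_start - spacer_length
        if start < 0:
            continue  # not enough sequence upstream for a full protospacer
        hits.append((start, seq[start:pam_start], match.group(1)))
    return hits


# Made-up sequence: a 20-base protospacer followed by a TGG PAM.
example = "ACGTACGTACGTACGTACGT" + "TGG" + "A"
print(find_ngg_targets(example))  # [(0, 'ACGTACGTACGTACGTACGT', 'TGG')]
```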

 

Once it has found a PAM site, Cas9 needs to check for complementarity with its guide RNA. But how to test DNA bases that are already snugly paired to each other? At this point, the protein can switch between the closed and open conformations to bend the DNA, destabilizing the base pairs closest to the PAM and flipping them out of the DNA duplex to pair with the guide RNA (Cofsky et al., 2022; Osuka et al., 2018). Each matching DNA:RNA pair gives just enough stability for the next DNA base pair to be unwound. A mismatch within the 10-base seed sequence will destabilize the nascent heteroduplex and abolish cleavage (Pattanayak et al., 2013). As mentioned above, however, Cas9 sometimes tolerates mismatches in the next 8-10 bases after perfect pairing in the seed sequence.
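To make the seed idea concrete, here is a small sketch that sorts mismatches between a guide and a candidate site into seed and PAM-distal positions. It assumes the numbering used in this post, where position 1 is the base next to the PAM, and that both sequences are written 5’ to 3’ with the PAM-proximal end on the right. The function name `classify_mismatches` and the example sequences are hypothetical, and the output only reports where mismatches fall; it does not predict whether Cas9 will actually cut.

```python
def classify_mismatches(guide, protospacer, seed_length=10):
    """Group mismatch positions into seed and PAM-distal regions.

    Both sequences are written 5'->3' with the PAM-proximal base last.
    Positions are counted from the PAM: 1 is the base adjacent to the PAM.
    """
    if len(guide) != len(protospacer):
        raise ValueError("guide and protospacer must be the same length")
    # Compare RNA and DNA in a shared alphabet by treating U as T.
    g = guide.upper().replace("U", "T")
    p = protospacer.upper().replace("U", "T")
    n = len(g)
    seed, distal = [], []
    for index, (g_base, p_base) in enumerate(zip(g, p)):
        if g_base != p_base:
            position = n - index  # convert 5' index to distance from PAM
            (seed if position <= seed_length else distal).append(position)
    return {"seed": sorted(seed), "distal": sorted(distal)}


# Made-up sequences with one mismatch at each end.
guide = "ACGUACGUACGUACGUACGU"
site = "TCGTACGTACGTACGTACGA"
print(classify_mismatches(guide, site))  # {'seed': [1], 'distal': [20]}
```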

 

Electron microscopy structure of S. pyogenes Cas9 bound to guide RNA and target DNA (PDB ID 7S4X). Double-stranded helical DNA is bound by the CTD. As the DNA enters the protein, the strands separate. The target strand base pairs with the guide RNA, forming a helical heteroduplex that rests between the lobes. The non-target strand primarily contacts the RuvC domain and partially rests on the outside of the protein, but some nucleotides are not resolved due to disorder. On the far side of the protein, the DNA strands rejoin to form a helical duplex.
Figure 6: Electron microscopy structure of S. pyogenes Cas9 bound to guide RNA and target DNA (PDB ID 7S4X), shown opaque (left) and transparent (right). Some nucleotides of the non-target DNA strand are not resolved due to disorder. All three stem loops and all 20 guide bases of the RNA are shown. HNH, red; RuvC, orange; CTD, yellow; REC lobe, light blue; bridge helix, purple; sgRNA, dark blue; target DNA strand, light gray; non-target DNA strand, dark gray.

 

The full DNA:RNA heteroduplex is cradled in the positively charged channel between the two lobes. For CRISPR to be programmable for any arbitrary sequence, it can’t bind to this duplex by its base pairs; those vary from target to target. Instead, both lobes bind to the DNA and RNA phosphate backbones, allowing Cas9 to verify the structure of the complementary heteroduplex. Base mismatches cause bulges in the duplex that may not fit properly in this channel, depending on their precise location.

The end of the heteroduplex rests against the RuvC nuclease domain, which stabilizes the last base pair with an end-capping loop (Nishimasu et al., 2014). This cap is where the DNA:RNA heteroduplex transitions back to DNA duplex, and it’s the reason guide sequences have a maximum length of 20 bases: nothing longer will fit. Stabilization by this region of Cas9 also allows the protein to tolerate RNA:DNA mismatches between base pairs 18 and 20.

 

""Pro tip! The Taylor lab mutated the residues in this loop to produce SuperFi-Cas9, a high-fidelity variant that still retains high cleavage efficiency (Bravo et al., 2022).

 


 

Figure 7: S. pyogenes Cas9 bound to guide RNA and target DNA (PDB ID 7S4X). The end of the DNA:RNA heteroduplex lies between the RuvC (orange) and HNH (red) domains, while the REC lobe has been hidden for visibility. A RuvC loop caps the heteroduplex and marks the transition back to a DNA duplex. sgRNA, dark blue; target DNA strand, light gray; non-target DNA strand, dark gray.

 

DNA cleavage

Now that the RNA is loaded and the DNA sequence is recognized, it’s finally time to cut!

The HNH domain, which cuts the target DNA strand, makes few contacts with the rest of Cas9 and is highly flexible; it can rotate about 140 degrees between inactive and active conformations (Bravo et al., 2022; Sternberg et al., 2015). The correct positioning of the non-target DNA strand is important for stabilizing the active conformation of the HNH domain, ensuring all binding partners are in place before cleavage can occur (Jiang & Doudna, 2017; Sternberg et al., 2015).

 


 

Cas9:RNA:DNA electron microscopy structure (PDB ID 7S4X). The HNH domain moves significantly between this and the crystal structures: in the RNA-only structure, it is rotated and distant from the position in the DNA-bound structure, in which it displaces part of the REC lobe to access the target DNA cleavage site.
Figure 8: Comparison of Cas9:RNA crystal structure (left, PDB ID 4ZT0) with Cas9:RNA:DNA electron microscopy structure (right, PDB ID 7S4X), viewed from the same angle to highlight conformational differences. HNH, red; RuvC, orange; CTD, yellow; REC lobe, light blue; bridge helix, purple; sgRNA, dark blue; target DNA strand, light gray; non-target DNA strand, dark gray.

 

The conformational state of the HNH domain also allosterically controls RuvC activity through a hinge region between the two (Jiang & Doudna, 2017). By linking the two nuclease domains, Cas9 ensures both are positioned correctly before HNH or RuvC can cleave DNA, which helps limit off-target activity.

What next? It’s not actually clear! In vitro, Cas9 holds tightly to DNA even after cleaving it, so many researchers think the enzyme needs cellular recycling factors to release its handiwork (Sternberg et al., 2014). New discoveries on this topic are still being made!

Conclusion

Finally, we can summarize the process from start to finish:

 


Figure 9: Cartoon summary of Cas9 activity. The apo enzyme is shown in the open conformation, but highly flexible, as in Figure 2. The guide RNA is bound primarily by the REC lobe, stabilizing the closed conformation, as in Figure 3. Upon DNA binding, the NUC lobe recognizes the PAM site in the non-target DNA strand, and Cas9 bends the DNA helix by transitioning between the open and closed conformations. This allows it to unwind the DNA and test for sequence complementarity with its guide RNA. If the sequence is recognized, Cas9 in the closed conformation cleaves both DNA strands via the NUC lobe, as in Figure 6. NUC lobe, green; REC lobe, light blue; guide RNA, dark blue; DNA, gray; PAM site, yellow. Created with BioRender.com.
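If it helps to think of the summary as a checklist, the steps can be sketched as a tiny state machine. This is a deliberately coarse cartoon, not a kinetic model: the state and event names are invented for this sketch, real Cas9 is flexible in every state, and sending a mismatched target back to the RNA-loaded "searching" state is a simplifying assumption. Events that arrive out of order are ignored, which mirrors the idea that apo Cas9 cannot recognize a PAM before RNA loading.

```python
from enum import Enum, auto


class State(Enum):
    APO = auto()                  # open, flexible, no guide RNA
    RNA_LOADED = auto()           # closed conformation stabilized by guide RNA
    PAM_BOUND = auto()            # CTD has recognized an NGG PAM
    HETERODUPLEX_FORMED = auto()  # target strand fully paired with guide
    CLEAVED = auto()              # HNH and RuvC have cut both strands


class Event(Enum):
    GUIDE_RNA_BOUND = auto()
    PAM_RECOGNIZED = auto()
    TARGET_MATCHED = auto()
    TARGET_MISMATCHED = auto()
    NUCLEASES_ACTIVATED = auto()


# Allowed transitions; anything not listed leaves the state unchanged.
TRANSITIONS = {
    (State.APO, Event.GUIDE_RNA_BOUND): State.RNA_LOADED,
    (State.RNA_LOADED, Event.PAM_RECOGNIZED): State.PAM_BOUND,
    (State.PAM_BOUND, Event.TARGET_MATCHED): State.HETERODUPLEX_FORMED,
    # Assumption for this sketch: a failed match returns Cas9 to searching.
    (State.PAM_BOUND, Event.TARGET_MISMATCHED): State.RNA_LOADED,
    (State.HETERODUPLEX_FORMED, Event.NUCLEASES_ACTIVATED): State.CLEAVED,
}


def run(events, state=State.APO):
    """Apply events in order and return the final state."""
    for event in events:
        state = TRANSITIONS.get((state, event), state)
    return state


full_path = [
    Event.GUIDE_RNA_BOUND,
    Event.PAM_RECOGNIZED,
    Event.TARGET_MATCHED,
    Event.NUCLEASES_ACTIVATED,
]
print(run(full_path).name)               # CLEAVED
print(run([Event.PAM_RECOGNIZED]).name)  # APO
```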

 

The exact conformational changes that make Cas9 such a valuable laboratory tool are still an area of active research. New details are still being uncovered, and different techniques all bring their own advantages—and blind spots—to these questions. Each new structural insight also spurs the next generation of engineered CRISPR enzymes with their own tradeoffs.

And this article only covered Cas9! Other CRISPR enzymes like Cas12 have different structures and different uses. In all cases, structural biology can help you visualize the system you’re working with and better understand how to manipulate it. Happy CRISPRing!

 

Molecular graphics and analyses performed with UCSF ChimeraX, developed by the Resource for Biocomputing, Visualization, and Informatics at the University of California, San Francisco, with support from National Institutes of Health R01-GM129325 and the Office of Cyber Infrastructure and Computational Biology, National Institute of Allergy and Infectious Diseases.


Resources and references

More resources on addgene.org 

Addgene's CRISPR Guide

Addgene's CRISPR 101 eBook 

Resources on the Addgene blog

Components of CRISPR 

xCas9: Engineering a CRISPR Variant with PAM Flexibility

Cas9 vs the Other Cases

Further reading

Wang, J. Y., Pausch, P., & Doudna, J. A. (2022). Structural biology of CRISPR–Cas immunity and genome editing enzymes. Nature Reviews Microbiology, 20(11), 641–656. https://doi.org/10.1038/s41579-022-00739-4

References

Bravo, J. P. K., Liu, M.-S., Hibshman, G. N., Dangerfield, T. L., Jung, K., McCool, R. S., Johnson, K. A., & Taylor, D. W. (2022). Structural basis for mismatch surveillance by CRISPR–Cas9. Nature, 603(7900), 343–347. https://doi.org/10.1038/s41586-022-04470-1

Cofsky, J. C., Soczek, K. M., Knott, G. J., Nogales, E., & Doudna, J. A. (2022). CRISPR-Cas9 bends and twists DNA to read its sequence. Nature Structural & Molecular Biology, 29(4), 395–402. https://doi.org/10.1038/s41594-022-00756-0

Jiang, F., & Doudna, J. A. (2017). CRISPR-Cas9 Structures and Mechanisms. Annual Review of Biophysics, 46, 505–529. https://doi.org/10.1146/annurev-biophys-062215-010822

Nishimasu, H., Ran, F. A., Hsu, P. D., Konermann, S., Shehata, S. I., Dohmae, N., Ishitani, R., Zhang, F., & Nureki, O. (2014). Crystal Structure of Cas9 in Complex with Guide RNA and Target DNA. Cell, 156(5), 935–949. https://doi.org/10.1016/j.cell.2014.02.001

Osuka, S., Isomura, K., Kajimoto, S., Komori, T., Nishimasu, H., Shima, T., Nureki, O., & Uemura, S. (2018). Real‐time observation of flexible domain movements in CRISPR–Cas9. The EMBO Journal, 37(10), e96941. https://doi.org/10.15252/embj.201796941

Pattanayak, V., Lin, S., Guilinger, J. P., Ma, E., Doudna, J. A., & Liu, D. R. (2013). High-throughput profiling of off-target DNA cleavage reveals RNA-programmed Cas9 nuclease specificity. Nature Biotechnology, 31(9), 839–843. https://doi.org/10.1038/nbt.2673

Sternberg, S. H., LaFrance, B., Kaplan, M., & Doudna, J. A. (2015). Conformational control of DNA target cleavage by CRISPR-Cas9. Nature, 527(7576), 110–113. https://doi.org/10.1038/nature15544

Sternberg, S. H., Redding, S., Jinek, M., Greene, E. C., & Doudna, J. A. (2014). DNA interrogation by the CRISPR RNA-guided endonuclease Cas9. Nature, 507(7490), 62–67. https://doi.org/10.1038/nature13011

 

Topics: CRISPR, CRISPR 101, Cas Proteins
