This post was contributed by guest blogger, Addgene Advisory Board member, and Associate Director of the Genetic Perturbation Platform at the Broad Institute, John Doench.
CRISPR technology has made it easier than ever both to make specific DNA sequence changes to the genome and to perform genome-wide functional screens to identify genes involved in a phenotype of interest. This blog post will discuss the differences between these approaches, as well as provide updates on how best to design gRNAs for your experiments. You can also find validated gRNAs for your next experiment in our Validated gRNA Sequence Datatable.
Important Considerations Before You Start an Experiment with CRISPR
When designing experiments using CRISPR, the first choice centers on the type of edit you want to introduce: a small insertion or deletion (indel) or a specific change (gene editing by homology directed repair).
- Using CRISPR to Create Indels
A small indel is most often used to disrupt the protein-coding capacity of a locus by introducing frameshift alleles that result from the error-prone nature of non-homologous end joining (NHEJ) following Cas9-mediated dsDNA breaks. For this, all that is needed is Cas9 and the gRNA. This technique can be quite robust, with near-100% efficiency possible, and many groups (e.g., Shalem et al., Wang et al., and Parnas et al.) have performed genome-wide loss-of-function screens with this approach. When using S. pyogenes Cas9, potential target sites are both [5’-20nt-NGG] and [5’-CCN-20nt], as it is equally efficacious to target the coding or non-coding strand of DNA. For disruption of protein-coding genes, potential target sites therefore occur about once every 8 nucleotides, so most genes have a large number of gRNAs from which to choose.
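To make the two target-site patterns concrete, here is a minimal Python sketch that lists every [5’-20nt-NGG] site on the given strand and every [5’-CCN-20nt] site (which corresponds to a 20nt-NGG site on the opposite strand). The function names and the tuple layout are illustrative choices of mine, not part of any published tool, and the sketch does no activity or specificity scoring.

```python
import re

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T, N only)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[base] for base in reversed(seq.upper()))

def find_spcas9_sites(seq):
    """List candidate S. pyogenes Cas9 sites as (start, strand, protospacer, pam).

    `start` is the index on the input sequence where the 23-nt site begins.
    Protospacer and PAM are always reported 5'->3' on the targeted strand.
    """
    seq = seq.upper()
    sites = []
    # Given strand: 20 nt followed by NGG. The lookahead keeps overlapping sites.
    for m in re.finditer(r"(?=([ACGT]{20})([ACGT]GG))", seq):
        sites.append((m.start(), "+", m.group(1), m.group(2)))
    # Opposite strand: appears on the given strand as CCN followed by 20 nt.
    for m in re.finditer(r"(?=(CC[ACGT])([ACGT]{20}))", seq):
        sites.append((m.start(), "-",
                      reverse_complement(m.group(2)),
                      reverse_complement(m.group(1))))
    return sorted(sites)
```

Running this over a coding exon gives the raw pool of candidates; choosing among them is the subject of the on-target and off-target sections later in this post.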
- Genome Editing via Homology Directed Repair
A specific change, such as the insertion of a fluorescent tag or the introduction of a specific mutation, relies on homology directed repair (HDR) to incorporate the new sequence into the DNA target site, and thus also requires an exogenous template. HDR, however, is still a very low-efficiency process and usually requires single-cell cloning and subsequent screening for successful edits. This is a very time-consuming process and should not be undertaken lightly! When targeting a dsDNA break for HDR, the choice of target site is far more constrained by the desired location of insertion; efficiency decreases dramatically when the cut site is >30 nt from the proximal ends of the repair template. This means that, for gene editing, there are often far fewer potential gRNAs.
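As a rough illustration of how quickly this distance constraint shrinks the candidate list, the sketch below keeps only sites whose expected cut falls within 30 nt of a desired edit position. Several things here are assumptions on my part: the edit position is used as a stand-in for the proximal end of the repair template, the cut is placed 3 bp upstream of the PAM (the commonly described position for S. pyogenes Cas9), and the function names and input format (start of the 23-nt site plus strand) are hypothetical.

```python
def cut_position(site_start, strand):
    """Index (on the given strand) of the base just to the right of the expected cut.

    Assumes a 23-nt site beginning at `site_start` and a cut 3 bp upstream
    of the PAM. "+" sites are 20nt-NGG; "-" sites appear as CCN-20nt.
    """
    return site_start + 17 if strand == "+" else site_start + 6

def sites_near_edit(sites, edit_position, max_distance=30):
    """Keep (site_start, strand) pairs whose cut lies within `max_distance` nt.

    Returns (site_start, strand, distance) tuples, closest cut first.
    """
    kept = []
    for site_start, strand in sites:
        distance = abs(cut_position(site_start, strand) - edit_position)
        if distance <= max_distance:
            kept.append((site_start, strand, distance))
    return sorted(kept, key=lambda site: site[2])
```

For example, with an edit at position 50, a "-" site starting at 40 (cut 4 nt away) is kept, while a "+" site starting at 0 (cut 33 nt away) is dropped.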
CRISPR Delivery Options
Once a target site has been identified, it is important to consider delivery options. For conducting genetic screens using pooled libraries, the use of an integrating virus (e.g., lentivirus) is critical to the entire process: at the end of the screen you need to be able to determine which mutations caused the phenotypic effect you screened for, and this is much easier if you can look for a genetic element that has stably integrated into your cells. However, for generating a cellular model, long-term expression of CRISPR components is not desirable, due to the potential for accumulation of off-target lesions. Transient expression options are the appropriate choice for creating a stable cell line with your desired edit. Possible transient expression methods include: transfection or electroporation of plasmid DNA, mRNA, or Cas9 protein pre-complexed to in vitro transcribed gRNA, or the use of non-integrating viruses such as AAV or adenovirus.
If performing HDR, the repair template must be co-delivered with Cas9 and the gRNA as either a long dsDNA (e.g., a plasmid) or a single-stranded oligonucleotide. The choice between the two templates is largely dictated by the size of the intended change, as small changes (< ~40 nucleotides) can be introduced with a synthesized oligonucleotide of ~100–200 nt in length. Large inserts, on the other hand, such as the introduction of GFP, require a template with much larger homology arms.
Predicting gRNA On-Target Activity
Whether one’s goal is gene disruption or gene editing, of one gene or genome-wide, being able to distinguish effective from ineffective gRNAs can greatly streamline an experiment and simplify the interpretation of results. Here at the Broad Institute, we examined sequence features that enhance on-target activity of gRNAs by creating all possible gRNAs for a panel of genes and assessing, by flow cytometry, which sequences led to complete protein knockout (4). By examining the nucleotide features of the most-active gRNAs from a set of 1,841 gRNAs, we derived scoring rules and built a website to help researchers design gRNAs against genes of interest based on these rules.
More recently, we have both expanded our dataset and improved our computational modeling to derive a more powerful set of rules to predict gRNA efficacy (Rule Set 2) (5). We measured the activity of more than 2,000 additional gRNAs to further strengthen the statistical power, and confirm the generalizability, of activity predictions. In collaboration with Nicolo Fusi and Jennifer Listgarten of Microsoft Research, we explored the use of more-powerful computational modeling approaches. While our initial model (Rule Set 1) was based on a fairly simple classification model, we found that the use of regression models in general, and gradient-boosted regression trees in particular, greatly improved the power of our predictions. Web-based implementation of Rule Set 2 is now available from both the Broad and Microsoft.
We tested the real-world impact of these rules by designing a new genome-wide library, which we named Avana (a grape used for making wine) for human and Asiago (a cheese) for mouse, and compared its performance to the GeCKO library, which was developed before these rules were available. For both positive and negative selection screens, we found that these new libraries were able to identify more hits with greater statistical confidence, due to the increased consistency of different gRNAs targeting a gene; that is, more of the gRNAs in the library were efficacious. This enables the use of smaller libraries, with fewer gRNAs per gene. While it is of course true that using more gRNAs per gene in a screen provides more information, this comes at the cost of screening and sequencing more cells, which puts some cellular models and experimental systems out of reach. Thus, for many researchers, a primary screen that uses a smaller, high-activity genome-wide library will be desirable. Towards this end, we have made new libraries, named Brunello for human (again, a wine-making grape… you can see where we’re going with this) and Brie for mouse, that take into account both our newest on-target designs and methods to avoid off-target sites, discussed below. The Brunello and Brie libraries are currently available from Addgene.
Decreasing Off-Target Effects
Avoiding off-target effects of Cas9, that is, preventing cuts at other, unintended sites in the genome, is an important step in designing gRNAs. Merely glancing through the literature shows that different groups have come to wildly different conclusions as to the specificity of gRNAs. To take two examples, compare these titles:
“Low Incidence of Off-Target Mutations in Individual CRISPR-Cas9 and TALEN Targeted Human Stem Cell Clones Detected by Whole-Genome Sequencing” (6)
“High-frequency off-target mutagenesis induced by CRISPR-Cas nucleases in human cells” (7)
It is reasonable to ask, well, which is right? As usual, the truth lies in the details, which is another way of saying that you can’t judge a journal article by its title! Indeed, both titles are correct within the confines of each study, but the generalizability is what matters most. For sure, some differences in these reports (and many others) likely relate to differences in experimental systems, but probably most importantly, both of these papers examined small numbers of gRNAs. Are there some really promiscuous gRNAs? For sure! Are there quite specific ones? You bet! Of course, the same could be said for essentially any targeting technology – there are both really specific and really non-specific TALENs, siRNAs, antibodies, and small molecules.
Generalizability, then, needs to come from sampling from large numbers, and indeed, rules governing off-target effects are beginning to be understood in more detail. First, direct physical detection of off-target sites through techniques like GUIDE-Seq, developed by the Joung lab, has shown that some gRNAs have dozens of detectable off-target sites, but that same study also found 1 gRNA, of 10 examined, that had zero off-target sites by their technique (8). Further, they showed that existing heuristics to find and score off-targets in fact miss many sites. They compared GUIDE-Seq results to two prediction sites, from Feng Zhang’s lab and Michael Boutros’s lab, and “discovered that neither program identified the vast majority of off-target sites found by GUIDE-seq.” Of course, at the time of launch, these servers were based on the best-available information, and the perfect should not be the enemy of the good.
More recently, we have examined off-target sites at much larger scale than previous studies and developed the CFD (Cutting Frequency Determination - see paper for details) score to predict off-target sites with better sensitivity and specificity than previous heuristics (5). In the course of this study, we also found that the search algorithm itself plays a perhaps-under-appreciated role in arriving at the right result. Because of its ease of implementation and speed, many have used Bowtie2 to perform scans of the genome to find off-target sites that contain small numbers of mismatches, but the Bowtie algorithm was not designed for quite this purpose, and in fact misses many potential off-target sites, especially sites with more than 1 mismatch. Thus, both the search metric and the scoring metric are critical for a comprehensive view of potential off-target sites.
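To illustrate the search half of that statement, here is a deliberately naive Python sketch that checks every position on both strands and therefore cannot skip a site with several mismatches, at the cost of speed. It is not the search used in our paper and it does not compute a CFD score; the function names, the default of three mismatches, and the restriction to NGG PAMs are all assumptions I have made to keep the example short.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T, N only)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[base] for base in reversed(seq.upper()))

def count_mismatches(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))

def find_candidate_off_targets(guide, genome, max_mismatches=3):
    """Exhaustively list sites resembling `guide` that sit next to an NGG PAM.

    Returns (strand, index, site, mismatches) tuples. For "-" hits the index
    refers to the reverse-complemented sequence, not the input coordinates.
    """
    guide = guide.upper()
    n = len(guide)
    hits = []
    for strand, seq in (("+", genome.upper()), ("-", reverse_complement(genome))):
        # Every window of guide length that still leaves room for a 3-nt PAM.
        for i in range(len(seq) - n - 2):
            site = seq[i:i + n]
            pam = seq[i + n:i + n + 3]
            if pam[1:] != "GG":
                continue  # assumption: only NGG PAMs are considered here
            mismatches = count_mismatches(guide, site)
            if mismatches <= max_mismatches:
                hits.append((strand, i, site, mismatches))
    return hits
```

A scan like this is far too slow for a whole genome in pure Python, but it is a useful reference against which to check whether a faster aligner-based search is dropping sites.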
Even as off-target activity becomes better understood, in the context of genetic screens, it is still critical to require that multiple perturbations targeting a gene show consistency in order to conclude that the observed phenotype is due to on-target activity. Indeed, all libraries build in this redundancy. For gene editing approaches, however, where the goal is to introduce a specific change at a specific site, the choice of gRNAs is often quite limiting. One method to decrease off-target effects with CRISPR technology is the use of two gRNAs in combination with a mutated “nickase” version of Cas9. This approach has the benefit of increased specificity and thus a reduced rate of off-target dsDNA breaks. One downside of this approach, though, is that the requirement for two target sites will mean some specific locations are not suitable for creating a dsDNA break. When possible, though, this is the preferred approach for gene editing (learn more about nickase and specificity here). The characterization of novel CRISPR systems, and the development of variants of Cas9 enzymes with alternative PAM requirements, promises to make more and more of the genome readily accessible to on-demand dsDNA breaks.
In sum, selection of gRNAs for an experiment needs to balance maximizing on-target activity while minimizing off-target activity, which sounds obvious but can often require difficult decisions. For example, would it be better to use a less-active gRNA that targets a truly unique site in the genome, or a more-active gRNA with one additional target site in a region of the genome with no known function? For the creation of stable cell models that are to be used for long-term study, the former may be the better choice. For a genome-wide library to conduct genetic screens, however, a library composed of the latter would likely be more effective, so long as care is taken in the interpretation of results by requiring multiple sequences targeting a gene to score in order to call that gene as a hit.
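The rule of requiring multiple sequences per gene to score can be written down in a few lines. The sketch below is only a toy version of hit calling: the input format, the score threshold, and the default of two scoring gRNAs per gene are hypothetical choices for illustration, and real screen analysis uses proper statistics rather than a fixed cutoff.

```python
from collections import defaultdict

def call_hits(guide_scores, score_threshold, min_guides=2):
    """Return genes for which at least `min_guides` distinct gRNAs score.

    `guide_scores` is an iterable of (gene, guide_id, score) tuples; a gRNA
    "scores" when its value is at or above `score_threshold`.
    """
    scoring_guides = defaultdict(set)
    for gene, guide_id, score in guide_scores:
        if score >= score_threshold:
            scoring_guides[gene].add(guide_id)
    return sorted(gene for gene, guides in scoring_guides.items()
                  if len(guides) >= min_guides)
```

With this rule, a gene whose only evidence is a single strongly scoring gRNA is not called, which is exactly the protection against a one-off off-target effect described above.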
This is an exciting time for functional genomics, with an ever-expanding list of tools to probe gene function. The best tools are only as good as the person using them, and the proper use of CRISPR technology will always depend on careful experimental design, execution, and analysis.
Many thanks to our Guest Blogger John Doench!
John Doench is Associate Director of the Genetic Perturbation Platform at the Broad Institute and has worked with many Addgenies to help improve the understanding, curation, and explanation of our CRISPR resources. He really likes small RNAs.
3. Parnas, Oren, et al. "A genome-wide CRISPR screen in primary immune cells to dissect regulatory networks." Cell 162.3 (2015): 675-686. PubMed PMID: 26189680. PubMed Central PMCID: PMC4522370.
5. Doench, J.G., et al. "Optimized gRNA design to maximize activity and minimize off-target effects for genetic screens with CRISPR-Cas9." Nature biotechnology. PubMed PMID: 26780180.
6. Veres, Adrian, et al. "Low incidence of off-target mutations in individual CRISPR-Cas9 and TALEN targeted human stem cell clones detected by whole-genome sequencing." Cell stem cell 15.1 (2014): 27-30. PubMed PMID: 24996167. PubMed Central PMCID: PMC4082799.
7. Fu, Yanfang, et al. "High-frequency off-target mutagenesis induced by CRISPR-Cas nucleases in human cells." Nature biotechnology 31.9 (2013): 822-826. PubMed PMID: 23792628. PubMed Central PMCID: PMC3773023.
8. Tsai, Shengdar Q., et al. "GUIDE-seq enables genome-wide profiling of off-target cleavage by CRISPR-Cas nucleases." Nature biotechnology 33.2 (2015): 187-197. PubMed PMID: 25513782. PubMed Central PMCID: PMC4320685.