This post was updated on May 3, 2017 with additional information and resources.
This post was contributed by guest blogger, Addgene Advisory Board member, and Associate Director of the Genetic Perturbation Platform at the Broad Institute, John Doench.
CRISPR technology has made it easier than ever both to engineer specific DNA edits and to perform functional screens to identify genes involved in a phenotype of interest. This blog post will discuss differences between these approaches, as well as provide updates on how best to design gRNAs. You can also find validated gRNAs for your next experiment in Addgene's Validated gRNA Sequence Datatable.
Important considerations before you start an experiment with CRISPR
The hammer, the jigsaw, and the wrench are all great tools, but which one you use, of course, depends on what you are trying to do – there’s no “best” tool among them. While this seems obvious, it is important to remember that the same is true when designing gRNAs for CRISPR experiments – the “best” gRNA depends an awful lot on what you are trying to do: gene knockout, a specific base edit, or modulation of gene expression.
The hammer: Gene knockout by NHEJ
Gene knockout with CRISPR technology is usually accomplished by Cas9-mediated dsDNA breaks: following a cut, the error-prone nature of non-homologous end joining (NHEJ) often leads to the generation of indels and thus frameshifts that disrupt the protein-coding capacity of a locus. When using S. pyogenes Cas9, potential target sites are both [5’-20nt-NGG] and [5’-CCN-20nt], as it is equally efficacious to target the coding or non-coding strand of DNA. As a rule of thumb, we avoid target sites that code for amino acids near the N terminus of the protein, in order to mitigate the ability of the cell to use an alternative ATG downstream of the annotated start codon. Likewise, we avoid target sites that code for amino acids close to the C terminus of the protein, to maximize the chances of creating a non-functional allele. For a 1 kilobase gene, since potential target sites occur roughly once every 8 nucleotides, restricting gRNAs to the 5–65% portion of the protein-coding region still leaves many dozens of gRNAs to choose from. With so many possibilities, picking a gRNA with an optimized sequence is of primary importance (more on this below).
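To make the site-finding step concrete, here is a minimal Python sketch that scans a coding sequence for both [5’-20nt-NGG] and [5’-CCN-20nt] sites and keeps those falling within 5–65% of its length. The function names are hypothetical, and the sketch assumes the commonly described SpCas9 cut position 3 bp upstream of the PAM, which is used here only to decide where a site sits within the gene. It does no activity scoring and ignores introns and exon boundaries.

```python
import re

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_spcas9_sites(cds, min_frac=0.05, max_frac=0.65):
    """List (strand, protospacer, cut_index) for NGG sites on both strands.

    cut_index is the 0-based index of the first base 3' of the cut on the
    given sequence, assuming (not stated in the post) that Cas9 cuts 3 bp
    upstream of the PAM. Only sites whose cut falls between min_frac and
    max_frac of the sequence length are returned.
    """
    cds = cds.upper()
    sites = []
    # Coding strand: 20 nt protospacer followed by NGG.
    # The lookahead lets overlapping sites be reported.
    for m in re.finditer(r"(?=([ACGT]{20})[ACGT]GG)", cds):
        sites.append(("+", m.group(1), m.start() + 17))
    # Opposite strand: CCN followed by 20 nt (reverse complement of the gRNA).
    for m in re.finditer(r"(?=CC[ACGT]([ACGT]{20}))", cds):
        sites.append(("-", reverse_complement(m.group(1)), m.start() + 6))
    lo, hi = min_frac * len(cds), max_frac * len(cds)
    return sorted((s for s in sites if lo <= s[2] <= hi), key=lambda s: s[2])


if __name__ == "__main__":
    toy_cds = "A" * 30 + "ACGT" * 5 + "TGG" + "A" * 47  # toy 100 nt sequence
    for site in find_spcas9_sites(toy_cds):
        print(site)
```

In practice the surviving candidates would then be ranked by a sequence-based activity score, as discussed later in the post.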
The jigsaw: Editing by HDR
For a specific edit, such as the insertion of a fluorescent tag or the introduction of a specific mutation, one generally relies on homology-directed repair (HDR) to incorporate new information into DNA. This also requires an exogenous DNA template. HDR, however, is a very low-efficiency process, and usually involves the need for single-cell cloning and subsequent screening for successful edits. This is a very time-consuming process and should not be undertaken lightly! Indeed, truly achieving the gold standard requires not one but two rounds of single-cell cloning – as a control, one should revert the edit back to the original in order to prove that the phenotype was really due to the intended edit rather than some passenger variant that came along with the single-cell clone.
When targeting a dsDNA break for HDR, the choice of target site is far more constrained by the desired location of edit; efficiency decreases dramatically when the cut site is >30nt from the proximal ends of the repair template (1). This means that, for gene editing, there are usually very few potential gRNAs. The same locational constraints are even more exquisite for the so-called Base Editor Cas9, which makes DNA changes in the absence of dsDNA breaks (2). Thus, for gene editing, location is the most critical design parameter.
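As a rough illustration of how location dominates gRNA choice for HDR, the sketch below filters candidate gRNAs by the distance between their cut site and the desired edit, then ranks the survivors by proximity. The function name, the input format, and the reading of the 30 nt rule as "distance from cut site to edit position" are assumptions made for this example, not a published procedure.

```python
def rank_guides_for_hdr(candidates, edit_pos, max_distance=30):
    """Keep guides whose cut lies within max_distance nt of the edit.

    candidates: list of (guide_name, cut_pos) tuples, where cut_pos is a
    coordinate on the same reference as edit_pos (hypothetical format).
    Returns guide names sorted from closest to farthest.
    """
    kept = [
        (abs(cut_pos - edit_pos), name)
        for name, cut_pos in candidates
        if abs(cut_pos - edit_pos) <= max_distance
    ]
    return [name for _, name in sorted(kept)]


if __name__ == "__main__":
    guides = [("g1", 100), ("g2", 140), ("g3", 125), ("g4", 200)]
    print(rank_guides_for_hdr(guides, edit_pos=120))
```

With so few candidates typically passing this filter, sequence-based activity scores become a tie-breaker rather than the primary criterion.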
The wrench: Gene activation and inhibition by CRISPRa and CRISPRi
Finally, for modulating gene expression at the level of transcription – CRISPRa (activation) and CRISPRi (inhibition) technologies – a nuclease-dead Cas9 (dCas9) is directed near the promoter of a target gene. Here, the target window is not quite as broad as for knockout via CRISPR cutting. For CRISPRa, it is most efficacious to target a ~100nt window upstream of the transcription start site (TSS), while for CRISPRi, a ~100nt window downstream of the TSS gives the most activity. Thus, a given gene will only have a dozen or so gRNAs to choose from in the optimal location. It is also important to have good information on the exact location of the TSS. Different databases annotate the TSS in different ways, and it was recently shown that the FANTOM database, which relies on CAGE-seq to directly capture the mRNA cap, provides the most accurate mapping (3). In this case, location and sequence are of approximately equal importance in design – an optimized sequence will do little if it is in the wrong place, but because the target window is narrower, there are fewer gRNAs to choose from, and thus an optimal sequence may not be available.
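The windows described above can be expressed as a simple strand-aware check. The sketch below is only illustrative: the function name is hypothetical, and placing the ~100nt windows immediately adjacent to the TSS is an assumption made for simplicity; the exact boundaries that work best will depend on the system and on the TSS annotation used.

```python
def classify_guide(guide_pos, tss, gene_strand, window=100):
    """Say whether a gRNA position suits CRISPRa, CRISPRi, or neither.

    guide_pos and tss are genomic coordinates on the same chromosome.
    gene_strand is "+" or "-"; for "-" genes, upstream of the TSS means
    larger genomic coordinates, so the sign is flipped.
    """
    sign = 1 if gene_strand == "+" else -1
    rel = (guide_pos - tss) * sign  # negative = upstream of the TSS
    if -window <= rel < 0:
        return "CRISPRa"
    if 0 <= rel <= window:
        return "CRISPRi"
    return None


if __name__ == "__main__":
    print(classify_guide(950, 1000, "+"))   # upstream on a + strand gene
    print(classify_guide(1050, 1000, "-"))  # also upstream, on a - strand gene
```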
Predicting gRNA on-target activity
Whether one’s goal is gene disruption or gene editing, of one gene or genome-wide, being able to distinguish effective from ineffective gRNAs can greatly streamline an experiment and simplify interpretation of results. Previously, we examined sequence features that enhance on-target activity of gRNAs by creating all possible gRNAs for a panel of genes and assessing, by flow cytometry, which sequences led to complete protein knockout (4). By examining the nucleotide features of the most-active gRNAs from a set of 1,841 gRNAs, we derived scoring rules and built a website implementation of these rules to design gRNAs against genes of interest. We then expanded our dataset and improved our computational modeling to derive Rule Set 2 for prediction of gRNA efficacy (5). We measured the activity of more than 2,000 additional gRNAs to further strengthen the statistical power, and confirm the generalizability, of activity predictions. In collaboration with Microsoft Research, we explored the use of more powerful computational modeling approaches. While our initial model (Rule Set 1) was based on a fairly simple classification model, we found that the use of regression models in general, and gradient-boosted regression trees in particular, greatly improved the power of our predictions. A web-based implementation of Rule Set 2 is now available from both the Broad and Microsoft, and independent publications have shown its predictive value (6).
Decreasing off-target effects
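Avoiding off-target effects of Cas9, that is, cutting at other, unintended sites in the genome, is an important step in designing gRNAs. Even a quick glance through the literature shows that different groups have come to wildly different conclusions as to the specificity of gRNAs. To take two examples, compare these titles:
- “Low incidence of off-target mutations in individual CRISPR-Cas9 and TALEN targeted human stem cell clones detected by whole-genome sequencing” (7)
- “High-frequency off-target mutagenesis induced by CRISPR-Cas nucleases in human cells” (8)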
It is reasonable to ask, well, which is right? As usual, the truth lies in the details, which is another way of saying that you can’t judge a journal article by its title! Indeed, both titles are correct within the confines of each study, but the generalizability is what matters most. For sure, some differences in these reports (and many others) likely relate to differences in experimental systems, but probably most importantly, both of these papers examined small numbers of gRNAs. Are there some really promiscuous gRNAs? For sure! Are there quite specific ones? You bet! Of course, the same could be said for essentially any targeting technology – there are both really specific and really non-specific TALENs, siRNAs, antibodies, and small molecules.
Generalizability, then, needs to come from sampling large numbers of gRNAs, and indeed, rules governing off-target effects are beginning to be understood in more detail. First, direct physical detection of off-target sites through techniques like GUIDE-seq has shown that some gRNAs have dozens of detectable off-target sites, but that same study also found 1 gRNA, of 10 examined, that had zero off-target sites by their technique (9). Further, they showed that existing heuristics to find and score off-targets in fact miss many sites. They compared GUIDE-seq results to two prediction algorithms from Feng Zhang’s lab and Michael Boutros’s lab and “discovered that neither program identified the vast majority of off-target sites found by GUIDE-seq.” Of course, these servers were based on the best information available at the time of their launch, and the perfect should not be the enemy of the good.
More recently, we have examined off-target sites at much larger scale than previous studies and developed the CFD score to predict off-target sites with better sensitivity and specificity than previous heuristics (5). In the course of this study, we also found that the search algorithm itself plays a perhaps-under-appreciated role in arriving at the right result. Because of its ease of implementation and speed, many have used bowtie2 to perform scans of the genome to find off-target sites that contain small numbers of mismatches, but the bowtie algorithm was not designed for quite this purpose, and in fact misses many potential off-target sites, especially sites with more than 1 mismatch. Thus, both the search metric and the scoring metric are critical for a comprehensive view of potential off-target sites (10).
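To illustrate what an exhaustive search looks like, as opposed to a read aligner, the sketch below slides a gRNA along both strands of a sequence and reports every NGG-adjacent site within a given number of mismatches. This is a deliberately naive example with hypothetical function names: it only considers NGG PAMs, counts mismatches without weighting them (so it is not the CFD score), and would be far too slow for a real genome, where a dedicated tool such as Cas-OFFinder (10) is the practical choice.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_off_targets(guide, genome, max_mismatches=3):
    """Return (strand, start, mismatches) for every candidate site.

    A candidate is any 20 nt stretch followed by an NGG PAM whose Hamming
    distance to the guide is at most max_mismatches. For "-" hits, start
    is the index on the reverse-complemented sequence.
    """
    n = len(guide)
    hits = []
    for strand, seq in (("+", genome), ("-", reverse_complement(genome))):
        for i in range(len(seq) - n - 3 + 1):
            pam = seq[i + n:i + n + 3]
            if pam[1:] != "GG":
                continue  # simplification: only NGG PAMs are considered
            mismatches = sum(a != b for a, b in zip(guide, seq[i:i + n]))
            if mismatches <= max_mismatches:
                hits.append((strand, i, mismatches))
    return hits


if __name__ == "__main__":
    guide = "ACGTACGTACGTACGTACGT"
    near_match = "CCGTACGTACTTACGTACGT"  # differs from guide at 2 positions
    toy_genome = "TTTT" + guide + "AGG" + "TTTT" + near_match + "TGG" + "TTTT"
    print(find_off_targets(guide, toy_genome, max_mismatches=2))
```

Because every position is checked, sites with several mismatches cannot be silently skipped; the cost is runtime, which is why the choice of search tool matters as much as the scoring scheme.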
For gene editing approaches, where the goal is to introduce a specific change at a specific site, the choice of gRNAs is often quite limited, and thus sometimes all of your gRNAs will have poor off-target properties. One method to decrease off-target effects with CRISPR technology is the use of two gRNAs in combination with a mutated “nickase” version of Cas9. This approach has the benefit of increased specificity and thus a reduced rate of off-target dsDNA breaks. One downside of this approach, though, is that the requirement for two target sites will mean some specific locations are not suitable for creating a dsDNA break. When possible, though, this is the preferred approach for gene editing. Another approach to decrease off-target effects is the use of Cas9 variants with engineered mutations that result in decreased binding energy between the protein, the RNA, and the DNA (11, 12). As a result, mismatched (i.e. off-target) sites can generally no longer serve as substrates for cutting.
Genome-wide pooled gRNA libraries
We have implemented our on- and off-target scoring rules to create genome-wide pooled libraries. Our first attempts were named Avana (a grape used for making wine) for human and Asiago (a cheese) for mouse, and we compared performance to the GeCKO library, which was developed before these rules were available. For both positive and negative selection screens, we found that these new libraries were able to identify more hits with greater statistical confidence, due to the increased consistency of different gRNAs targeting a gene, that is, more of the gRNAs in the library were efficacious.
While it is of course true that more gRNAs per gene provide more information, this comes at the cost of screening and sequencing more cells, which puts some cellular models and experimental systems out of reach. Thus for many researchers, a primary screen that uses a smaller, high-activity genome-wide library will be desirable. Towards this end, we have made new libraries, named Brunello for human (again, a wine-making grape… you can see where we’re going with this) and Brie for mouse, that take into account both our newest on-target designs and avoidance of off-target sites. These libraries are available from Addgene as both plasmid pools and ready-to-use lentivirus. Our (as-yet-unpublished... but also not-yet-rejected!) data show that the improvement from Avana to Brunello is approximately equal to the improvement we saw in going from GeCKOv2 to Avana. By one analysis approach, we see that the use of just a single gRNA in the Brunello library outperforms the use of all 6 gRNAs in the GeCKOv2 library. Additionally, we have designed libraries for CRISPRa (Calabrese and Caprano) and CRISPRi (Dolcetto and Dolomiti) using optimized design rules, which will soon be available via Addgene as well. A publication describing these libraries is likewise working its way through The System.
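The trade-off between gRNAs per gene and screen size is simple arithmetic, sketched below. All numbers other than the 6 gRNAs per gene for GeCKOv2 are illustrative assumptions: the gene count, the 4-gRNA comparison library, and the 500-cells-per-gRNA coverage are placeholders, and the function name is hypothetical.

```python
def cells_needed(n_genes, guides_per_gene, coverage):
    """Cells required to represent each gRNA `coverage` times on average."""
    return n_genes * guides_per_gene * coverage


if __name__ == "__main__":
    n_genes = 20_000   # assumed round number for a genome-wide library
    coverage = 500     # assumed cells per gRNA; choose per your screen
    for guides_per_gene in (6, 4):
        total = cells_needed(n_genes, guides_per_gene, coverage)
        print(f"{guides_per_gene} gRNAs/gene: {total:,} cells")
```

Under these assumptions, dropping from 6 to 4 gRNAs per gene cuts the cells to be maintained at every step of the screen by a third, which can be the difference between a feasible and an infeasible experiment in a hard-to-grow model.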
Once a target site has been identified, it is important to consider delivery options. For conducting genetic screens in pooled format, the use of an integrating virus (e.g. lentivirus) is critical to the entire process. However, for generating a cellular model, long-term expression of CRISPR components is not desirable, due to the potential for accumulation of off-target lesions. Transient expression options are the most appropriate choices for the creation of a stable cell line. These can include the transfection or electroporation of plasmid DNA, mRNA, or Cas9 protein pre-complexed to in vitro transcribed or synthesized gRNA, or the use of non-integrating viruses such as AAV or Adenovirus.
If performing HDR, the repair template can be either a long dsDNA (e.g. a plasmid) or a single-stranded oligonucleotide co-delivered with the Cas9 and gRNA. The choice between the two templates is largely dictated by the size of the intended change; small (< ~40 nt) changes can be introduced with synthesized oligonucleotides of ~100 – 200 nt in length. These can simply be purchased commercially. Large inserts, such as the introduction of GFP to tag a protein, require a template with much larger homology arms.
In sum, selection of gRNAs for an experiment needs to balance maximizing on-target activity against minimizing off-target activity, which sounds obvious but can often require difficult decisions. For example, would it be better to use a less-active gRNA that targets a truly unique site in the genome, or a more-active gRNA with one additional target site in a region of the genome with no known function? For the creation of stable cell models that are to be used for long-term study, the former may be the better choice. For a genome-wide library to conduct genetic screens, however, a library composed of the latter would likely be more effective, so long as care is taken in the interpretation of results by requiring multiple sequences targeting a gene to score in order to call that gene as a hit.
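The "multiple sequences must score" safeguard can be sketched in a few lines. This is a toy example, not a substitute for a proper statistical screen-analysis method: the function name, the per-gRNA score format, and the threshold are all assumptions chosen for illustration.

```python
from collections import defaultdict


def call_hits(guide_scores, guide_to_gene, threshold, min_guides=2):
    """Call a gene a hit only if at least min_guides of its gRNAs score.

    guide_scores: dict of gRNA name -> score (e.g. a log-fold change).
    guide_to_gene: dict of gRNA name -> target gene.
    A gRNA "scores" when its value is at or above threshold.
    """
    scoring = defaultdict(int)
    for guide, score in guide_scores.items():
        if score >= threshold:
            scoring[guide_to_gene[guide]] += 1
    return sorted(gene for gene, n in scoring.items() if n >= min_guides)


if __name__ == "__main__":
    scores = {"A_1": 3.1, "A_2": 2.8, "A_3": 0.2, "B_1": 4.0, "B_2": 0.1}
    genes = {"A_1": "GENE_A", "A_2": "GENE_A", "A_3": "GENE_A",
             "B_1": "GENE_B", "B_2": "GENE_B"}
    print(call_hits(scores, genes, threshold=2.0))
```

Here the hypothetical GENE_B is not called despite having the single strongest gRNA, since a lone scoring sequence is exactly what an off-target effect would look like.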
This is an exciting time for functional genomics, with an ever-expanding list of tools to probe gene function. The best tools are only as good as the person using them, and the proper use of CRISPR technology will always depend on careful experimental design, execution, and analysis.
Many thanks to our Guest Blogger John Doench!
John Doench is Associate Director of the Genetic Perturbation Platform at the Broad Institute and has worked with many Addgenies to help improve the understanding, curation, and explanation of our CRISPR resources. He really likes small RNAs.
2. Komor, A. C., Kim, Y. B., Packer, M. S., Zuris, J. A. & Liu, D. R. Programmable editing of a target base in genomic DNA without double-stranded DNA cleavage. Nature 533, 1–17 (2016). PubMed PMID: 27096365. PubMed Central PMCID: PMC4873371.
3. Radzisheuskaya, A., Shlyueva, D., Müller, I. & Helin, K. Optimizing sgRNA position markedly improves the efficiency of CRISPR/dCas9-mediated transcriptional repression. Nucleic Acids Research 44, e141–e141 (2016). PubMed PMID: 27353328. PubMed Central PMCID: PMC5062975.
5. Doench, J.G. et al. Optimized gRNA design to maximize activity and minimize off-target effects for genetic screens with CRISPR-Cas9. Nat Biotechnol. PubMed PMID: 26780180. PubMed Central PMCID: PMC4744125.
6. Haeussler, M. et al. Evaluation of off-target and on-target scoring algorithms and integration into the guide RNA selection tool CRISPOR. Genome Biol 17, 1–12 (2016). PubMed PMID: 27380939. PubMed Central PMCID: PMC4934014.
7. Veres, A. et al. Low incidence of off-target mutations in individual CRISPR-Cas9 and TALEN targeted human stem cell clones detected by whole-genome sequencing. Cell Stem Cell 15, 27–30 (2014). PubMed PMID: 24996167. PubMed Central PMCID: PMC4082799.
8. Fu, Y. et al. High-frequency off-target mutagenesis induced by CRISPR-Cas nucleases in human cells. Nat Biotechnol 31, 822–826 (2013). PubMed PMID: 23792628. PubMed Central PMCID: PMC3773023.
9. Tsai, S. Q. et al. GUIDE-seq enables genome-wide profiling of off-target cleavage by CRISPR-Cas nucleases. Nat Biotechnol 33, 187–197 (2015). PubMed PMID: 25513782. PubMed Central PMCID: PMC4320685.
10. Bae, S., Park, J., Park, J. & Kim, J. S. Cas-OFFinder: a fast and versatile algorithm that searches for potential off-target sites of Cas9 RNA-guided endonucleases. Bioinformatics 30, 1473–1475 (2014). PubMed PMID: 24463181. PubMed Central PMCID: PMC4016707.
12. Kleinstiver, B. P. et al. High-fidelity CRISPR–Cas9 nucleases with no detectable genome-wide off-target effects. Nature 529, 490–495 (2016). PubMed PMID: 26735016. PubMed Central PMCID: PMC4851738.
Resources at the Addgene Blog
- Listen to our Podcast with John Doench
- Learn How to Conduct Genome-Wide CRISPR Pooled Library Screens
- Read other CRISPR Blog Posts
Resources on Addgene.org
- Brush up on Your CRISPR Basics with Our CRISPR Guide
- Find CRISPR Plasmids for Your Research
- Find CRISPR Pooled Libraries for Your Research