All plasmids coming through Addgene’s doors are now verified by next-generation sequencing (NGS) to provide you with more data. For most plasmids, we are now able to confirm the full insert and backbone sequence. In order to manage and interpret these data, we’ve updated our quality control workflow. Here’s a peek at how our process has improved now that we’re backed by the power of NGS!
A Broader View of the Whole Plasmid
Sanger sequencing covers only a small region of a plasmid: a single primer produces a sequencing result between 500 and 700 base pairs in length. In contrast, next-generation sequencing looks at the entire plasmid by fragmenting it into smaller pieces that are then amplified and sequenced. The final step is to assemble the short sequencing reads into the full plasmid sequence.
Our sequencing partner seqWell provides us with assembled plasmid sequences and raw data containing the amplified and sequenced plasmid DNA fragments. The NGS results that we receive for most plasmids are provided as single, circular assemblies with sequence data for each plasmid backbone and its associated insert. Our new workflow for analyzing these assemblies consists of three steps:
- Aligning the NGS result to a reference sequence to confirm backbone elements.
- Confirming the gene/insert by aligning to an NCBI entry or using BLAST.
- Confirming tags and fusion proteins.
We’ll break down each of these steps below.
Aligning the Sequence
The first step we take when performing quality control of newly deposited plasmids is to align the NGS result to a reference sequence, such as a known backbone sequence. We’re not necessarily expecting a perfect match - we will often find a few mismatches in the origin of replication or other common backbone elements. Since we’ve successfully grown the plasmid in culture to prepare it for sequencing, we feel confident that these few minor mismatches usually don’t affect the function of the plasmid.
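To make the idea concrete, here is a minimal Python sketch of comparing a circular assembly to a reference. It is not our production pipeline: the function names, the anchor length `k`, and the assumption that the two sequences have the same length and differ only by substitutions are all ours, chosen for illustration. Because a circular assembly can start at any position, the sketch first rotates it to begin where the reference begins, then lists the mismatched positions.

```python
def rotate_to_anchor(assembly: str, reference: str, k: int = 12) -> str:
    """Rotate a circular assembly so that it starts where the reference starts.

    The first k bases of the reference are used as an anchor (k is an
    illustrative choice; the anchor should be unique in the plasmid).
    """
    anchor = reference[:k]
    doubled = assembly + assembly  # lets the anchor span the arbitrary breakpoint
    start = doubled.find(anchor)
    if start == -1:
        raise ValueError("anchor not found; try another k or the reverse complement")
    return doubled[start:start + len(assembly)]


def mismatches(assembly: str, reference: str) -> list[tuple[int, str, str]]:
    """Return (position, reference_base, assembly_base) for each substitution.

    Assumes equal-length sequences with no insertions or deletions; a real
    comparison would use a proper alignment tool instead.
    """
    if len(assembly) != len(reference):
        raise ValueError("sequences differ in length; a gapped alignment is needed")
    return [
        (i, ref_base, asm_base)
        for i, (ref_base, asm_base) in enumerate(zip(reference, assembly))
        if ref_base != asm_base
    ]
```

A short list of isolated substitutions in a backbone element is the kind of result described above; length differences or a missing anchor would call for a closer look with a gapped alignment.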
Confirming the Insert
We usually confirm the insert through BLAST or by direct alignment to an NCBI reference sequence or a sequence provided by the depositor. Depositing labs often provide insert sequences or annotated GenBank files that are useful for more complicated plasmids, like those that contain synthesized regions that have no publicly available reference sequence or plasmids containing genes with many modifications. We look for point mutations, truncations, and insertions that could compromise function. When we do find mutations, we check to see if they affect the translated amino acid. We also confirm that the species of the gene matches the data associated with the plasmid.
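Checking whether a mutation affects the translated amino acid can be sketched in a few lines of Python. This is an illustration rather than the tool we use: the function name is hypothetical, and it assumes the reference and observed coding sequences are in frame, the same length, and differ only by substitutions. It uses the standard genetic code.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_substitutions(ref_cds: str, obs_cds: str):
    """Compare two in-frame coding sequences codon by codon.

    Returns one tuple per changed codon:
    (codon_number, ref_codon, obs_codon, ref_aa, obs_aa, effect),
    where codon_number is 1-based.
    """
    ref_cds, obs_cds = ref_cds.upper(), obs_cds.upper()
    if len(ref_cds) != len(obs_cds) or len(ref_cds) % 3 != 0:
        raise ValueError("expected equal-length, in-frame coding sequences")
    changes = []
    for i in range(0, len(ref_cds), 3):
        ref_codon, obs_codon = ref_cds[i:i + 3], obs_cds[i:i + 3]
        if ref_codon == obs_codon:
            continue
        ref_aa, obs_aa = CODON_TABLE[ref_codon], CODON_TABLE[obs_codon]
        if ref_aa == obs_aa:
            effect = "silent"       # same amino acid, different codon
        elif obs_aa == "*":
            effect = "nonsense"     # premature stop codon
        elif ref_aa == "*":
            effect = "stop-lost"    # stop codon replaced by an amino acid
        else:
            effect = "missense"     # amino acid change
        changes.append((i // 3 + 1, ref_codon, obs_codon, ref_aa, obs_aa, effect))
    return changes
```

Silent changes leave the protein sequence intact, while missense and nonsense changes are the ones most likely to need follow-up with the depositing lab.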
Confirming Tags and Fusion Proteins
Finally, we confirm promoters, tags, fusion proteins, and selectable markers by detecting common features using SnapGene. If we find information that differs from what we would expect given the data provided by the depositing lab, we call these “quality control (QC) issues.” We then ask the depositing laboratory to review the discrepancies. If the depositing laboratory confirms that these differences are expected and do not affect plasmid function, we will update information on the plasmid’s page.
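For readers curious about what feature detection involves, here is a simplified Python sketch. It is not how SnapGene works internally; it only does an exact-match search for known feature sequences on both strands of a circular plasmid. The function names and the tiny feature library are assumptions for illustration, and the two tag sequences are just one possible DNA encoding of each peptide (6xHis and FLAG), so a plasmid using different codons would be missed.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Illustrative library: one possible DNA encoding per tag (codon usage varies).
EXAMPLE_FEATURES = {
    "6xHis": "CACCACCACCACCACCAC",        # encodes HHHHHH
    "FLAG": "GACTACAAAGACGATGACGACAAG",   # encodes DYKDDDDK
}


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def find_all(haystack: str, needle: str):
    """Yield every start index of needle in haystack, including overlaps."""
    start = haystack.find(needle)
    while start != -1:
        yield start
        start = haystack.find(needle, start + 1)


def find_features(plasmid: str, features: dict[str, str]):
    """Return (name, start, strand) for each exact match on a circular plasmid.

    start is a 0-based coordinate on the given (forward) sequence; strand "-"
    means the feature is encoded on the reverse strand.
    """
    plasmid = plasmid.upper()
    hits = []
    for name, feature in features.items():
        feature = feature.upper()
        if not feature or len(feature) > len(plasmid):
            continue
        # Append the first len(feature) - 1 bases so matches can wrap the origin.
        extended = plasmid + plasmid[:len(feature) - 1]
        for strand, query in (("+", feature), ("-", reverse_complement(feature))):
            for start in find_all(extended, query):
                hits.append((name, start, strand))
    return hits
```

Comparing the features found this way with what the depositor reported is the kind of check that can surface a QC issue, such as a tag that was expected but is absent.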
Sometimes We’re Missing a Piece of the Circle
As we noted in a previous post, there are regions of some plasmids that are particularly difficult to sequence and assemble, including GC-rich and repetitive regions. Due to these issues, NGS for some plasmids will not result in one complete, circular assembly. Most of the time, these plasmids are returned as one or two partial assemblies.
We’ve seen some plasmid elements that are consistently difficult to assemble due to their sequence. For example, the CAG promoter, a hybrid promoter consisting of a CMV enhancer, chicken beta-actin promoter, and rabbit beta-globin splice acceptor, contains sequence that is over 80% GC. Some IRES sequences also contain regions that are GC-rich. When we analyze the NGS results for plasmids containing these elements, we will trim out the affected region and present the sequence as a linear, partial sequencing result.
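If you want to see where a sequence might cause trouble, a sliding-window GC calculation is a simple place to start. The Python sketch below is an illustration, not our analysis pipeline: the function name and the 50 bp window are assumptions, and the 80% threshold is borrowed from the CAG promoter example above. It treats the sequence as linear, so a GC-rich stretch spanning the origin of a circular sequence would be split.

```python
def gc_rich_regions(seq: str, window: int = 50, threshold: float = 0.8):
    """Return merged (start, end) intervals whose windows meet the GC threshold.

    Coordinates are 0-based and end-exclusive. Every window of the given size
    with GC fraction >= threshold is flagged, and overlapping flagged windows
    are merged into one region.
    """
    seq = seq.upper()
    if window <= 0 or len(seq) < window:
        return []
    regions = []
    gc_count = sum(base in "GC" for base in seq[:window])
    for start in range(len(seq) - window + 1):
        if start > 0:
            # Slide the window: drop the base that left, add the base that entered.
            gc_count -= seq[start - 1] in "GC"
            gc_count += seq[start + window - 1] in "GC"
        if gc_count / window >= threshold:
            end = start + window
            if regions and start <= regions[-1][1]:
                regions[-1] = (regions[-1][0], end)  # extend the previous region
            else:
                regions.append((start, end))
    return regions
```

Regions reported by a scan like this are candidates for the difficult stretches described above, though repetitive sequence can cause assembly problems even when GC content is unremarkable.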
Sequencing at Addgene
Introducing NGS for all incoming plasmids was a big change for Addgene - we had used Sanger sequencing for quality control for over ten years! Transitioning to NGS required us to optimize our lab processes and learn how to manage significantly more data. Today, we’re reaping the rewards of NGS with a high sequencing success rate and an increased amount of information that we can share with the community. In addition to sequencing plasmids, we have also started using NGS to check the quality of other services that we provide, including viral preps and pooled libraries - look out for more information about those processes in future posts!