Abstract
Prime editing installs precise edits into the genome with minimal unwanted byproducts, but low and variable editing efficiencies have complicated application of the approach to high-throughput functional genomics. Here we assembled a prime editing platform capable of high-efficiency substitution editing suitable for functional interrogation of small genetic variants. We benchmarked this platform for pooled, loss-of-function screening using a library of ~240,000 engineered prime editing guide RNAs (epegRNAs) targeting ~17,000 codons with 1–3 bp substitutions. Comparing the abundance of these epegRNAs across screen samples identified negative selection phenotypes for 7,996 nonsense mutations targeted to 1,149 essential genes and for synonymous mutations that disrupted splice site motifs at 3′ exon boundaries. Rigorous evaluation of codon-matched controls demonstrated that these phenotypes were highly specific to the intended edit. Altogether, we established a prime editing approach for multiplexed, functional characterization of genetic variants with simple readouts.
Main
Large-scale sequencing efforts have cataloged millions of human genetic variants, including hundreds of thousands linked to human traits or diseases1,2,3,4. A central challenge now is to characterize the functional effects of such variants on molecular, cellular and physiological processes (for example, protein function or gene regulation). Technologies for variant screening have greatly enabled such work5,6,7,8,9,10,11,12,13,14,15,16, but existing approaches have limitations. Ectopic gene expression, for example, can be applied in large, pooled experiments to evaluate all possible variants across small, defined sequences8,9,10,11, but variants evaluated with this approach may not always phenocopy their endogenous counterparts because exogenous sequences do not retain genomic context. To overcome this limitation, an approach for saturation genome editing that uses homology-directed repair to install variant libraries into the genome at Cas9-induced DNA double-strand breaks was developed12,13. This approach allows nearly any sequence change to be introduced at endogenous loci; however, variant installation with homology-directed repair is often inefficient or imprecise and thus difficult to multiplex across targets17, restricting use of this platform to individual genomic regions. To further improve variant screening, base editing platforms were developed5,6,7,16. These platforms enable efficient, multiplexable variant installation across the genome but also introduce undesired bystander mutations alongside programmed edits and are restricted by mutation type (for example, cytosine base editors produce C>T or G>A edits)14,15,18,19,20,21. Individual base editing screens thus have limited variant scope22,23,24.
An ideal platform for high-throughput variant characterization would allow precise, efficient and multiplexable editing of any variant type across the genome. Approaching this ideal, prime editing can flexibly install all 12 single nucleotide substitutions, small insertions, and deletions into targeted genomic loci with minimal unintended editing25. Prime editing is also theoretically well-suited to pooled, multiplexed experiments because all information required for installation of any one variant is physically encoded on a single prime editing guide RNA (pegRNA). Prime editing should therefore be compatible with standard screening protocols (as used for other types of CRISPR-based screens)6,7,26,27,28,29,30, including parallel construction of large guide RNA libraries, pooled guide RNA delivery and phenotyping through quantification of guide RNA enrichment or dropout. However, despite some promising recent examples31,32,33,34,35,36,37,38, use of prime editing for genetic screens has been limited, with the primary challenge being typically low and variable editing efficiencies. Here we show that prime editing can achieve efficient variant installation when applied with stably expressed editing components and in the absence of DNA mismatch repair (MMR). Using these conditions, we then establish a screening platform capable of producing robust, high-specificity dropout effects across thousands of substitutions targeted to essential genes.
Results
Building a platform capable of high-efficiency prime editing
The simplest form of prime editing is a two-component system that uses an engineered Cas9 protein (Cas9 H840A nickase fused to a reverse transcriptase) and a pegRNA that specifies both the genomic target and intended edit. Together, these components bind the targeted locus, nick the noncomplementary DNA strand and, using the nicked DNA as a primer, reverse transcribe the intended edit into the genome, creating few unwanted edits in the process. We set out to determine whether this form of prime editing (‘PE2 approach’) could be optimized for screening. To begin, we obtained two K562 clonal cell lines that constitutively express different prime editor fusion proteins (PE2 (ref. 25) or PEmax39), with enhanced green fluorescent protein (EGFP) coexpressed to enable monitoring of editor expression (Fig. 1a,b). We call these cell lines PE2 and PEmax40, respectively. We transduced these cell lines with standard pegRNAs or engineered pegRNAs (epegRNAs) containing the tevopreQ1 motif on their 3′ ends41 and measured editing frequencies at two endogenous targets over 1 month. Because prime editing with the PE2 approach should primarily produce intended edits at the targeted loci or reinstate unedited sequence after target engagement, we reasoned that stable expression of editing components should result in accumulation of precise edits over time42 (Fig. 1c). Additionally, because PEmax and epegRNAs can outperform PE2 (ref. 39) and pegRNAs41, respectively, we expected higher editing efficiencies from those optimized components. In agreement with our expectations, we observed clearly higher editing in PEmax cells at one locus and with epegRNAs at both loci, and found that intended edits increased throughout the experiment with minimal co-occurrence of unwanted outcomes or ‘errors’ (Fig. 1d and Extended Data Fig. 1a).
a, Western blot analysis of K562 cells (parental) and clonal derivatives stably expressing indicated prime editor protein from the AAVS1 safe-harbor locus, either with (PEmaxKO) or without (PE2, PEmax) genetic disruption of MLH1. Analysis after 1 month of culture post transduction with pegRNA constructs (from same cell populations as in b, d and e). b, Percentages of cells with expression of marker for prime editor construct (EGFP driven by IRES2 from the same transcript as the prime editor protein). Analysis over 1 month of culture post transduction with (e)pegRNA constructs, data and error bars represent mean ± s.d. of n = 4 independent biological samples for each cell line. c, Schematic of prime editing over time, with intended edit shown in cyan. d, Percentages of sequencing reads containing HEK3 +1 T>A (left) or DNMT1 +6 G>C (right) and no errors recovered from the indicated cells edited with either a pegRNA or epegRNA over 1 month. Edits are specified such that +1 and +6 represent nucleotide positions downstream from the Cas9(H840A) nick, with the +6 edit targeting within the PAM. Day 0 represents the timepoint at which cells were transduced with (e)pegRNA constructs. e, Percentages of sequencing reads containing either only the HEK3 +1 T>A (left) or DNMT1 +6 G>C (right) substitution (green) or errors (gray) from cells sampled 28 days post transduction of epegRNA constructs. Data and error bars in d and e represent mean ± s.d. (n = 3 independent biological replicates). c, Created with BioRender.com.
We also tested the PE2 approach with stable expression of either pegRNAs or epegRNAs in an MMR-deficient, PEmax-derived cell line in which the MMR gene MLH1 was genetically disrupted (we call this cell line PEmaxKO; Fig. 1a,b and Extended Data Fig. 1b). Consistent with our previous observation that MMR can inhibit small prime edits39,43, editing in these cells showed remarkably high efficiencies (Fig. 1d); specifically, we observed 68.9% precise editing (that is, intended edit with no errors) of HEK3 +1 T>A and 81.1% of DNMT1 +6 G>C after only 7 days of editing using epegRNAs. For comparison, editing with the same conditions in PEmax cells produced far less precise editing for HEK3 +1 T>A (2.3%) and moderately less for DNMT1 +6 G>C (55.9%), with the difference between edits observed in PEmax cells attributable to our previous finding that G>C edits can evade MMR to some degree39,44,45,46,47. Additionally, precise editing in PEmaxKO cells continued to increase over time, ultimately reaching ~95% for both edits by day 28 using epegRNAs (Fig. 1d,e). These results demonstrate near perfect editing for two edits that did not reach higher than 30% precise editing with transient expression of PE2 and pegRNAs in our previous study, despite being evaluated in an MMR-deficient cell line39; however, we note that because neither target was predicted to be in an unfavorable chromatin context48, each of these edits may be particularly amenable to prime editing.
Prime editing with constitutive PEmax and epegRNA sensor libraries
We next evaluated prime editing with stable expression of PEmax and epegRNAs across many additional edits. For these experiments, we designed two self-targeting ‘sensor’ libraries that linked epegRNA expression cassettes to sensor target sequences (Fig. 2a). Such libraries allow editing to be analyzed across many guide RNA–target pairs and reasonably model editing at endogenous targets14,16,34,37,48,49,50,51,52,53,54, although because target sequences are transduced, editing outcomes and efficiencies do not account for all features of endogenous loci such as chromatin context. Using this approach, we evaluated thousands of epegRNA–target pairs in two sets: one with many target sequences and few edit types, the other with many edit types and few target sequences.
a, Schematic of self-targeting expression cassette. Regions indicated with purple varied coordinately across the library (as denoted by dots), with dark purple specifying variable epegRNA components and light purple specifying the corresponding target site. mU6, modified mouse U6 promoter. b, Schematic of workflow for sensor screens. c, Percentages of sequencing reads from sensor targets containing only the precise edit from two replicates of +5 G>H screens performed in PEmax (top) or PEmaxKO (bottom) cells, at day 28 post transduction. Each data point represents an individual epegRNA–target pair. Correlation between replicates (Pearson’s r) indicated. Density plots on top and side show data distribution for replicates 1 and 2, respectively. d, Replicate-averaged percentages of sequencing reads from sensor targets containing only the precise edit for experimental (noncontrol) epegRNA–target pairs from +5 G>H screen performed in PEmax cells, at indicated days post transduction (n = 635, 630 and 633 pairs per day for G>A, G>C and G>T edits, respectively). Median and interquartile range (IQR) of the full set of experimental epegRNA–target pairs installing specified substitution types on indicated days are shown. Whiskers extend 1.5× IQR past the upper and lower quartiles. e, As in d, but for +5 G>H screen performed in PEmaxKO cells. f, Replicate-averaged percentages of sequencing reads from sensor targets containing only the precise edit for experimental epegRNA–target pairs from +5 G>H screens performed in PEmax and PEmaxKO cells, at day 28 post transduction. Density plots on top and side show data distribution per substitution type for PEmax and PEmaxKO cells, respectively. g, Maximum precise editing frequencies (replicate-averaged) among multiple epegRNA designs for each edit in the Tiled edits screen performed in PEmaxKO cells, at day 10 post transduction. Each data point represents a unique edit (n = 195), binned by edit position relative to the Cas9(H840A) nick. 
Boxplots indicate the median and IQR for each group with whiskers extending 1.5× IQR past the upper and lower quartiles. h, As in g, but binned by substitution type. b, Created with BioRender.com.
To design our first library, we mined data from a previously published, self-targeting prime editing screen52. This screen evaluated editing of +5 G>C substitutions across 2,000 target sequences with transient editor expression and 48,000 stably expressed pegRNA-target pairs, including 24 pegRNAs per target with various reverse transcriptase template (RTT) and primer binding site (PBS) lengths. From those 2,000 target sites, we randomly selected 640. We then identified the most efficient pegRNA for each target (0.14–60.4% precise editing after 5 days from published data) and redesigned each as three epegRNAs with identical PBS sequences and nearly identical RTTs, with each RTT targeting one substitution (G>A, G>T or G>C) to the invariant +5 G of the corresponding protospacer adjacent motif (PAM). Our final ‘+5 G>H’ library consisted of 2,000 epegRNA–target pairs (Extended Data Fig. 2a and Supplementary Table 1) with 22 positive controls (epegRNAs specifying edits tested previously at endogenous loci)25,39 and 58 negative controls (epegRNAs specifying reference sequence or nontargeting epegRNAs).
To evaluate editing efficiencies across our +5 G>H edits, we transduced this first self-targeting library into our PEmax and PEmaxKO cells at a low multiplicity of infection (MOI of 0.7). We selected the transduced cells for cassette integration and grew the resulting population for approximately 1 month, sampling cells at 7, 14, 21 and 28 days post transduction (Fig. 2b). We sequenced the epegRNA–target pairs from each sample, determined editing outcomes and quantified three outcome categories for each pair: outcomes containing only the precise edit, those with at least one error and unedited sequence (Supplementary Table 1). For many epegRNA–target pairs, we observed high-efficiency precise editing, with 20.2% (388) and 75.5% (1,453) of edits reaching 75% or higher by day 28 in PEmax and PEmaxKO cells, respectively (Fig. 2c). Additionally, we observed generally low error frequencies (median errors on day 28, <4%) and reproducibility of editing results (Pearson’s r = 0.932–0.999 for precise edits, r = 0.663–0.975 for errors; Fig. 2c and Extended Data Fig. 2b,c).
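The three outcome categories described above can be illustrated with a minimal sketch. It assumes each sensor read has already been trimmed to the target window and relies on exact string matching; the function names and example sequences are hypothetical, and the actual analysis (see Methods) may align reads and define errors differently.

```python
from collections import Counter

CATEGORIES = ("precise", "error", "unedited")

def classify_read(read: str, reference: str, intended: str) -> str:
    """Assign one sensor-target read to an outcome category."""
    if read == reference:
        return "unedited"
    if read == intended:
        return "precise"
    # Anything else (indels, extra substitutions, partial edits) counts as an error.
    return "error"

def outcome_percentages(reads, reference, intended):
    """Percentage of reads in each category for one epegRNA-target pair."""
    counts = Counter(classify_read(r, reference, intended) for r in reads)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no reads for this epegRNA-target pair")
    return {c: 100.0 * counts.get(c, 0) / total for c in CATEGORIES}

# Hypothetical target window and a single-base substitution.
reference = "TTCAGGCTA"
intended = "TTCACGCTA"
reads = [intended, intended, intended, "TTCACGCT", reference]
result = outcome_percentages(reads, reference, intended)
```

With these five hypothetical reads, three match the intended sequence, one carries a deletion and one is unedited.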
Results from our +5 G>H screens produced four observations relevant to prime editing efficiencies. First, across comparable edits, prime editing was more efficient than observed previously with alternative conditions (that is, transient expression of PE2 in MMR-deficient HEK293T cells using pegRNAs52; Extended Data Fig. 2d). Second, +5 G>C edits outperformed other +5 substitutions in PEmax cells (median precise editing on day 28, 79.2% for G>C, 20.2% for G>A, 19.4% for G>T; Fig. 2d), consistent with widespread evasion of MMR for +5 G>C edits, but all three substitution types were installed to similarly high efficiencies in PEmaxKO cells (median precise editing on day 28, 83.0% for G>C, 81.8% for G>A, 83.0% for G>T; Fig. 2e). Third, edit installation occurred relatively quickly in PEmaxKO cells, with efficiencies reaching reasonable levels after 7 days and then continuing to improve over the following weeks (Fig. 2e). Fourth, similar to previous findings by others52,54, our longer RTT and PBS lengths generally installed edits with lower frequencies (Extended Data Fig. 2e,f). Altogether, these observations demonstrate proof-of-principle for high-efficiency editing across many epegRNA–target pairs in PEmaxKO cells and for select pairs in PEmax cells (Fig. 2f), providing promise for functional screening.
We next designed a sensor library to assess prime editing efficiencies in PEmaxKO cells across a wider range of edit types and without prescreened target sequences (Supplementary Table 2). We first chose five epegRNA spacer sequences from the mouse ZRS enhancer based on their proximity to transcription factor binding sites of potential interest55 and the presence of a 5′-NGG-3′ motif, with no additional criteria applied. We then designed epegRNA extensions to install 1 bp substitutions across the regions of interest (up to 20 designs per edit). Our final ‘Tiled edits’ library contained 3,745 epegRNA–target pairs specifying 195 different edits with all 12 substitution types and edits spanning +1–21 nucleotides (nt) downstream of the Cas9(H840A) nick and additional positive25,56 and negative controls. We screened this library in our PEmaxKO cells with similar procedures as our previous sensor screen (Fig. 2b) and sampled cells 5, 8 and 10 days post transduction to quantify editing outcomes (Methods). Editing measurements were well-correlated across replicates, indicating reproducible screen results (Pearson’s r = 0.982–0.995 for precise edits, r = 0.723–0.804 for errors; Extended Data Fig. 3a).
As expected from the diversity of epegRNA designs included in this library, editing rates varied for each edit across epegRNAs with different extension sequences, with examples of poorer performance observed for designs known to be suboptimal, such as those with a short PBS (for example, 6 nt) or RTT sequences terminating at the edited nucleotide (Extended Data Fig. 3b). To evaluate across edits, we selected the most efficient epegRNA per edit (Extended Data Fig. 3a). Associated results revealed that many edits were efficiently installed by day 10, with median precise editing of 58.0% across all edits or 56.0% after excluding A>G edits, which may be influenced by possible adenosine deaminase acting on RNA (ADAR) activity during lentiviral packaging53 (Extended Data Fig. 3c). Edits positioned +1–13 nt from the Cas9(H840A) nick were especially efficient (median precise editing of 50–80% and errors of 2.6–6.2% on day 10; Fig. 2g), and editing was generally comparable across many substitution types (Fig. 2h). Efficiencies also varied across the five target sequences, suggesting an influence of spacer activity (Extended Data Fig. 3d). Together, our sensor screens show that, when screening with constitutively expressed PEmax and epegRNAs, high-efficiency editing is achievable with the PE2 approach across multiple positions and substitution types in the absence of MMR and for G>C edits without MMR disruption.
Dropout screening with prime editing at massive scale
We next directly evaluated the feasibility of high-throughput variant screening with our optimized prime editing conditions by screening tens of thousands of edits with expected and easily measurable phenotypes, that is, edits that introduce premature nonsense codons in essential genes. For this experiment, we developed an epegRNA design pipeline (Fig. 3a). This pipeline identifies codons that can be changed to stop codons within candidate genes, selects those that can also be changed to a synonymous codon, designs epegRNA extensions specifying 1–3 bp substitutions within selected codons, and generates oligonucleotide sequences for library construction (Extended Data Fig. 4a). Using this pipeline, we designed 129,696 ‘stop’ epegRNAs targeting 24,240 nonsense edits to 16,991 codons in 1,231 commonly essential genes (defined by DepMap57; Extended Data Fig. 4b) with an average of ~14 codons targeted per gene (roughly two codons per protospacer; Fig. 3b,c). We also designed 94,245 spacer- and codon-matched ‘synonymous’ control epegRNAs specifying 17,857 edits that do not alter amino acid sequence and thus should not disrupt protein function (Fig. 3b,c). We included up to eight different epegRNA extensions for each edit (PBSs of 11 or 13 nt; RTTs of 10, 12, 15 or 20 nt) and constrained edits to within +20 nt downstream from the Cas9(H840A) nick. Additionally, we designed two sets of controls that should not alter genomic sequence, including 12,000 ‘no edit’ epegRNAs with extensions specifying the reference sequence and 3,000 ‘nontargeting’ epegRNAs58 (Fig. 3b,c). We call our final ~240,000 epegRNA library StopPR (stop codon prime editing; Supplementary Tables 3 and 4).
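The codon-selection logic of such a pipeline can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the published pipeline: it uses the standard genetic code, ignores protospacer and PAM placement and the +20 nt constraint, and the names `candidate_edits` and `EXTENSION_DESIGNS` are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
STOP_CODONS = [c for c, aa in CODON_TABLE.items() if aa == "*"]

def n_substitutions(a: str, b: str) -> int:
    """Number of mismatched bases between two codons (1-3 for distinct codons)."""
    return sum(x != y for x, y in zip(a, b))

def candidate_edits(codon: str):
    """Return stop and synonymous replacement codons with their edit sizes.

    Returns None for codons that are already stops or that have no
    synonymous alternative (ATG and TGG in the standard code).
    """
    aa = CODON_TABLE[codon]
    if aa == "*":
        return None
    synonymous = [c for c, a in CODON_TABLE.items() if a == aa and c != codon]
    if not synonymous:
        return None
    return {
        "stop": {s: n_substitutions(codon, s) for s in STOP_CODONS},
        "synonymous": {s: n_substitutions(codon, s) for s in synonymous},
    }

# Extension designs per edit: every (PBS length, RTT length) combination.
EXTENSION_DESIGNS = list(product((11, 13), (10, 12, 15, 20)))

edits = candidate_edits("CAG")  # glutamine codon
```

For the glutamine codon CAG, the sketch reports one stop reachable by a single substitution (TAG) and one synonymous alternative (CAA); combining two PBS lengths with four RTT lengths gives the eight extension designs mentioned in the text.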
a, Schematic of design pipeline for generating StopPR epegRNA library. CRISPick58 and gene annotations used to identify edits. ‘Inaccessible’ filters removed nonsense or spacer- and codon-matched synonymous edits that could not be made with prime editing (for example, if edit occurred upstream of the Cas9 H840A nick). b, Schematic illustrating the intended consequences of prime editing for each epegRNA category in StopPR. Intended edit is shown in red. c, StopPR composition, including numbers of genes and/or protospacers targeted, and numbers of stop epegRNAs with different substitutions, positions and codons, for each edit length. Multiple codons were often targeted near the same protospacer, with 7,785 protospacers targeting 16,991 codons. d, Growth phenotypes for epegRNAs from independent biological replicates of StopPR screen collected 14 days post transduction. Dotted lines denote phenotype cutoffs (Z < −2). Correlation (Pearson’s r) between replicates indicated for each epegRNA category. Density plots on top and side show data distribution per epegRNA category for replicates 1 and 2, respectively. e, Gene-level growth phenotypes from StopPR screen (calculated as average phenotype of the absolute strongest two stop epegRNAs per gene on day 14 post transduction) binned by CRISPRi phenotypes (as previously determined in K562 cells59, split into three equally sized bins with number of genes denoted as n). Individual P values from one-way ANOVA (F = 59.49) and two-sided Tukey post hoc denoted. Median and IQR of the full set of epegRNAs used in this analysis for each phenotype bin are indicated. Whiskers extend 1.5× IQR past the upper and lower quartiles. Dotted line denotes phenotype cutoff (Z < −2). f, Growth phenotypes from StopPR screen (y axis, replicate-averaged) and independent retest of individual epegRNAs (x axis, triplicate-averaged), for stop epegRNAs (ten), spacer- and codon-matched synonymous epegRNAs (ten) and negative controls (five). 
Measurements sampled from day 14 post transduction. g, Percentages of sequencing reads containing precise substitutions from cells sampled 7 days post transduction using same epegRNA constructs as in f. Matched stop (red) and synonymous (blue) epegRNAs targeting the same codon for each site. Exact edits denoted. Data and error bars represent mean ± s.d. (n = 3 independent biological replicates). b, Created with BioRender.com.
We screened StopPR in our PEmaxKO cell line and sampled cells at 7, 14 and 28 days post transduction (Extended Data Fig. 4c). To determine phenotypes, we sequenced the integrated epegRNAs from each sample (Extended Data Fig. 4d), removed those with low read counts and calculated log2 fold changes in relative abundance at days 14 and 28 compared to day 7, which we expressed as Z scores normalized to nontargeting controls (Supplementary Tables 3 and 4 and Methods). We recovered Z scores for 106,092 stop epegRNAs, 77,118 synonymous epegRNAs, 10,007 no edit controls and 2,312 nontargeting controls. Using Z < −2 to threshold these measurements revealed that 17.1% (18,187) of stop epegRNAs induced a negative growth phenotype by day 14 (Fig. 3d), which increased to 23.1% (24,510) by day 28 (Supplementary Discussion and Extended Data Fig. 4e–g). These stop phenotypes were reproducible, as demonstrated by correlation between replicates (Pearson’s r = 0.71 at day 14, r = 0.51 at day 28), and were highly specific when compared to control epegRNAs, which had very little activity overall (2.3% with Z < −2 across all sets of control epegRNAs by day 28) and low replicate correlation (Fig. 3d and Extended Data Fig. 4e). Collapsing Z scores from stop epegRNAs to gene-level phenotypes (Methods) successfully reported 80.1% (984 of 1,228) of targeted genes as required for cell growth by day 14, improving to 89.3% (1,097) by day 28. Additionally, comparison to results from a published CRISPR interference (CRISPRi) screen59 showed general agreement with phenotypic strength (Fig. 3e).
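As an illustration of this phenotype calculation, the following sketch computes log2 fold changes in relative abundance and expresses them as Z scores against nontargeting controls. The pseudocount, the read-count filter and the exact normalization (subtracting the control mean and dividing by the control standard deviation) are assumptions made for the example; the Methods define the procedure actually used. All identifiers and counts are hypothetical.

```python
import math
import statistics

def log2_fold_changes(counts_early, counts_late, pseudocount=1.0, min_reads=0):
    """log2 fold change in relative abundance (late vs. early) per epegRNA."""
    total_early = sum(counts_early.values())
    total_late = sum(counts_late.values())
    lfc = {}
    for guide, early in counts_early.items():
        if early < min_reads:
            continue  # drop epegRNAs with low read counts
        late = counts_late.get(guide, 0)
        frac_early = (early + pseudocount) / total_early
        frac_late = (late + pseudocount) / total_late
        lfc[guide] = math.log2(frac_late / frac_early)
    return lfc

def z_scores(lfc, nontargeting_ids):
    """Center and scale fold changes using the nontargeting control distribution."""
    controls = [lfc[g] for g in nontargeting_ids if g in lfc]
    mu = statistics.mean(controls)
    sd = statistics.stdev(controls)  # needs at least two controls
    return {g: (value - mu) / sd for g, value in lfc.items()}

# Hypothetical read counts at day 7 and day 14.
day7 = {"nt1": 100, "nt2": 100, "nt3": 100, "stop1": 100}
day14 = {"nt1": 110, "nt2": 100, "nt3": 90, "stop1": 10}
z = z_scores(log2_fold_changes(day7, day14), ["nt1", "nt2", "nt3"])
is_hit = z["stop1"] < -2  # phenotype cutoff used in the text
```

In this toy example the depleted epegRNA falls far below the Z < −2 cutoff, while the nontargeting controls center on zero by construction.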
Stop epegRNAs specifying the same edit (average of roughly five analyzed per edit) often varied in phenotype strength (for example, for edits with at least one stop epegRNA with Z < −2 at day 14, median range of Z scores across stop epegRNA designs was 3.7; Supplementary Table 4), indicating differences in epegRNA activity; however, 35.8% (7,996 of 22,315) of edits were associated with a negative phenotype (Z < −2) from one or more stop epegRNAs by day 14, increasing to 45.5% (10,160) by day 28. At the codon level, 40.6% (6,353 of 15,646) and 50.8% (7,948) were associated with a negative phenotype (Z < −2) from one or more stop epegRNAs by day 14 and day 28, respectively. Over half of our stop epegRNAs would therefore have produced a phenotype had the library included only the most active designs per codon. Our results thus demonstrate that, when applied at high throughput, prime editing can perturb genes with enough efficiency to generate reproducible phenotypes from epegRNA abundance measurements and without sequencing edited loci.
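The aggregation from epegRNA-level Z scores to edit- and gene-level calls can be sketched as below. The edit-level rule (a hit if one or more epegRNAs have Z < −2) follows the text, and the gene-level rule (mean of the two stop epegRNAs with the largest absolute phenotype) follows the Fig. 3e legend; the function names and input mappings are hypothetical.

```python
from collections import defaultdict

def edit_level_hits(z_by_epegrna, edit_of, cutoff=-2.0):
    """An edit is a hit if at least one of its epegRNAs falls below the cutoff."""
    strongest = defaultdict(lambda: float("inf"))
    for guide, z in z_by_epegrna.items():
        edit = edit_of[guide]
        strongest[edit] = min(strongest[edit], z)
    return {edit: z_min < cutoff for edit, z_min in strongest.items()}

def gene_level_phenotypes(z_by_epegrna, gene_of, top_n=2):
    """Average the top_n epegRNAs with the largest absolute Z score per gene."""
    by_gene = defaultdict(list)
    for guide, z in z_by_epegrna.items():
        by_gene[gene_of[guide]].append(z)
    phenotypes = {}
    for gene, zs in by_gene.items():
        top = sorted(zs, key=abs, reverse=True)[:top_n]
        phenotypes[gene] = sum(top) / len(top)
    return phenotypes

# Hypothetical Z scores for four stop epegRNAs covering two edits in one gene.
z_scores_example = {"a": -3.0, "b": -0.5, "c": 0.2, "d": -1.0}
edit_of = {"a": "edit1", "b": "edit1", "c": "edit2", "d": "edit2"}
gene_of = {g: "GENE1" for g in z_scores_example}

hits = edit_level_hits(z_scores_example, edit_of)
genes = gene_level_phenotypes(z_scores_example, gene_of)
```

Calling an edit from its best-performing epegRNA is what allows weakly active designs for the same edit to be tolerated in the library.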
To validate stop epegRNA phenotypes observed in our screen, we individually retested ten stop epegRNAs, ten spacer- and codon-matched synonymous controls, three no edit epegRNAs and two nontargeting epegRNAs in PEmaxKO cells. All stop epegRNAs reproduced growth phenotypes, while negative controls showed little to no effect (Fig. 3f and Extended Data Fig. 4h). Additionally, endogenous target sequencing after only 7 days of editing revealed a median of 66.3% precise editing for validated stop epegRNAs, a median of 64.4% precise editing for spacer- and codon-matched synonymous controls and low on-target errors for both types of epegRNAs (median 2.5%; Fig. 3g and Extended Data Fig. 4i). Thus, all 20 edit-specifying epegRNAs showed reasonable frequencies of edit installation, with the most efficient reaching 83.5% precise editing. We conclude that phenotypes observed for these stop epegRNAs arose from intended edits.
Influence of epegRNA design and targeted loci on phenotype
Multiple previous studies have examined (e)pegRNA design principles in the context of sensor screens and shown how different design choices (for example, PBS and RTT length or edit position) affect prime editing across transduced target sequences that do not preserve endogenous context34,37,48,52,53,54. Additional studies have profiled prime editing efficiencies at exogenous targets randomly integrated at many genomic locations without regulatory elements and demonstrated the influence of chromatin context on prime editing48,60. Results from our StopPR screen provided a unique opportunity to ask how features of epegRNA design, specified edit and genomic context impact the phenotypic consequences of prime editing across thousands of unaltered endogenous loci.
We first defined groups of stop epegRNAs based on several relevant features (for example, edit location in gene and stop codon installed) and compared group-averaged phenotypes in pairwise combinations (Fig. 4a and Extended Data Fig. 5a). For this analysis, we considered only the top two epegRNAs per gene with the absolute strongest phenotypes and excluded epegRNAs specifying PAM-disrupting edits (that is, those at position +5–6) from all groups except edit position type. These subsettings were intended to enrich for functional epegRNAs while also ensuring equal representation across genes in our library and preventing PAM-disrupting edits, which are known to be particularly efficient52,53,54, from influencing other features. Results from this analysis showed that most features had a mild effect, with the strongest effect from edit position relative to the Cas9(H840A) nick. PAM-disrupting edits resulted in stronger phenotypes than other positions, with previous positions (+1–4) remaining effective to a lesser degree, and further positions (+7–20) typically less effective, particularly beyond the +16 position (Fig. 4a,b). Additional features that contributed to phenotypic strength were edit location within the targeted gene body and orientation of epegRNA spacer sequence with respect to gene expression; specifically, edits targeting codons within the first 33% of genes and epegRNAs with antisense spacers were slightly more disruptive (Fig. 4a and Extended Data Fig. 5b). The former observation could be attributed to higher editing efficiency earlier in the gene60, but also may reflect the influence of nonsense-mediated mRNA decay before the last exon of a gene61. The latter observation could be due to higher editing efficiency on the template strand60, or a higher phenotypic impact of those edits from earlier disruption of the encoded protein (Extended Data Fig. 5c).
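For reference, the effect size used in these group comparisons can be computed as in the sketch below. It uses the common pooled-standard-deviation form of Cohen's d; whether the published analysis used exactly this variant is an assumption, and the example Z scores are hypothetical.

```python
import math
import statistics

def cohens_d(group_a, group_b):
    """Cohen's d with a pooled standard deviation (mean of a minus mean of b)."""
    n_a, n_b = len(group_a), len(group_b)
    var_a = statistics.variance(group_a)  # sample variance (n - 1 denominator)
    var_b = statistics.variance(group_b)
    pooled_sd = math.sqrt(((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2))
    return (statistics.mean(group_a) - statistics.mean(group_b)) / pooled_sd

# Hypothetical growth phenotypes (Z scores) for two feature groups.
pam_disrupting = [-4.0, -5.0, -6.0]
other_positions = [-2.0, -3.0, -4.0]
d = cohens_d(pam_disrupting, other_positions)
```

Here a negative d means that the first group has the more negative (stronger) average growth phenotype.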
a, Cohen’s d effect size for characteristics of epegRNA design and targeted genomic loci. All features except edit position type were evaluated without +5–6 edits. Top two stop epegRNAs with the absolute largest phenotypes at day 14 post transduction analyzed per gene. For edit location in gene, beginning, middle and end refer to editing within (0–33], (33–67] or (67–100]% of a gene, respectively. For extension length, other RTT lengths include 10, 12 and 20 nt. Cohen’s d measurements greater than 0.8 in magnitude are generally considered ‘large’. The center value denotes Cohen’s d statistic estimate and bar ranges indicate 95% confidence intervals, so that intervals including effect size 0 are not significant. All epegRNA counts (n) per feature group and P values are listed in Extended Data Fig. 5a. b, Top, schematic of epegRNA target sequence. Bottom, average growth phenotypes from StopPR screen sampled from day 14 post transduction (left) for stop epegRNAs with edits specified as indicated (right). Dark gray indicates the edited position; blue indicates positions +1–3 with respect to the Cas9(H840A) nick, light green indicates +4 (non-PAM-disrupting), dark green indicates +5–6 (PAM-disrupting) and peach indicates +7–20. Numbers of stop epegRNAs denoted. c, Replicate-averaged growth phenotypes for stop and spacer- and codon-matched synonymous epegRNAs from StopPR screen sampled from day 14 post transduction. Data points colored by density, indicated by number of neighbors. Dotted lines denote phenotype cutoffs (Z < −2). Green dots indicate strong negative growth phenotypes (Z < −5) associated with 69 synonymous epegRNAs. d, Growth phenotypes for synonymous epegRNAs (bottom plot) from StopPR screen sampled from day 14 post transduction, binned by edit position relative to exon boundaries (top schematic). Phenotypes were calculated as the average of 50 epegRNAs with the strongest negative phenotype at each position. 
Positions A+1 and A+2 were excluded, as fewer than 50 synonymous epegRNAs targeted those positions. Vertical lines indicate 95% confidence intervals generated for each average. The horizontal dotted line denotes phenotype cutoff (Z < −2). Splice site acceptor (AG) and donor (GT) motifs are indicated in the schematic.
We next evaluated the effect of many features simultaneously using a multiple linear regression model (Methods). Results from this model confirmed that each of the features previously interrogated by effect size contributed to growth phenotypes, with RTT length, edit position relative to the Cas9(H840A) nick, substitution type and edit location within the gene body having the strongest effects (Supplementary Table 5). Subsetting our screen results by the most important features (that is, RTT of 15 nt, editing positions +5–6 and codons within the first 33% of genes targeted with substitution types determined to have the strongest impact on phenotype) more than doubled our day 14 rate of phenotype induction (39.3% of stop epegRNAs with Z < −2), and expanding this subset to include edit positions +1–6 still nearly doubled phenotype induction (1.9×, to 32.7%). These results confirm the importance of epegRNA design for phenotype-based prime editing screens and identify features that can be used in the design of future screens.
Finally, we investigated the potential effects of chromatin context on phenotype induction using the recently released ePRIDICT tool48. Briefly, we determined ePRIDICT scores for nearly all codons targeted in StopPR (Methods) and used published score thresholds to identify those with favorable (‘high’ score, >50) or unfavorable (‘low’ score, <35) chromatin contexts. Of the 15,008 codons targeted by StopPR that we analyzed, 35.8% (5,378) were classified as favorable while just 0.9% (138) were unfavorable. Moreover, 74.0% (11,106) were in the highest 25% of all ePRIDICT scores genome-wide, indicating a better-than-average chromatin context for prime editing. This uneven distribution of scores likely reflects the pan-essentiality of genes targeted by our library, which we expect to be expressed and thus positioned in favorable chromatin contexts. Nevertheless, we observed an enrichment for phenotype induction among stop epegRNAs targeting codons with favorable scores (odds ratio (OR) = 1.87; 95% confidence interval (CI) = 1.81–1.94; Fisher’s exact test P = 2.36 × 10^−301) and mild depletion among those with unfavorable scores (OR = 0.81; 95% CI = 0.67–0.97; Fisher’s exact test P = 0.02), demonstrating that ePRIDICT has potential to aid epegRNA library design even when targeting generally favorable regions of the genome. Indeed, restricting StopPR to only targets with high ePRIDICT scores showed a 35% increase in our phenotype induction rate at day 14 (23.1% of stop epegRNAs with Z < −2).
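As a reading aid for the enrichment statistics above, the following sketch shows how an odds ratio and an approximate 95% confidence interval can be derived from a 2 × 2 table of chromatin context versus phenotype. The function name, the Wald (Woolf) interval on the log scale and the example counts are illustrative assumptions: the counts are hypothetical and are not taken from the screen, and the text does not specify how the reported intervals were computed.

```python
import math

def odds_ratio_with_ci(a, b, c, d, z=1.96):
    """Odds ratio and Wald (Woolf) confidence interval for a 2x2 table.

    a: favorable context, phenotype      b: favorable context, no phenotype
    c: other context, phenotype          d: other context, no phenotype
    All four cells must be nonzero for this approximation.
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the Woolf approximation.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper

# Hypothetical counts for illustration only (not values from the screen).
or_value, ci_low, ci_high = odds_ratio_with_ci(a=300, b=700, c=200, d=800)
print(f"OR = {or_value:.2f}, 95% CI = {ci_low:.2f}-{ci_high:.2f}")
```

An interval that excludes 1 corresponds to enrichment (OR > 1) or depletion (OR < 1); an exact P value would come from Fisher's exact test, which is not reimplemented here.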
Dropout phenotypes are highly specific and edit-dependent
For each of the 129,696 stop epegRNAs within StopPR, we included one spacer- and codon-matched control epegRNA specifying a synonymous edit at the same codon (with multiple stop epegRNAs sometimes matched to the same control and thus only 94,245 synonymous epegRNAs designed in the library). Similar to no edit and nontargeting controls, this subset of epegRNAs demonstrated very low activity (2.4% with Z < −2 at day 14) and associated results showed low correlations between replicates (Pearson’s r = 0.30 at day 14; Fig. 3d). Even at codons where nonsense mutations produced strong phenotypes (4,090 with Z < −5) or where synonymous controls specified +5–6 position edits (12,534), we observed few effects from synonymous epegRNAs (119 and 489 with Z < −2, respectively; Fig. 4c and Extended Data Fig. 6a). These results indicate high specificity for growth phenotypes attributed to stop epegRNAs. Moreover, this rate of unintended phenotypes compares favorably to other platforms developed for variant screening, where false positives were relatively more common, presumably due to unintended on-target edits or reproducible off-target effects. In particular, using the same cutoff (Z < −2), a screen using first-generation base editing technology showed negative growth phenotypes for 8.7–26.5% of single-guide RNAs (sgRNAs) designed to install silent edits in essential genes7 and a platform built with the more indel-prone PE3 approach to prime editing called 7.9–11.2% of no edit controls as notably depleted from growth screens32. The latter study also suggested that prime editing can indirectly induce phenotypes through disruption of transcription at targeted genes akin to CRISPRi32. Examining the possibility of such effects in our screen showed that edit-independent phenotypes were neither common nor pervasive (Supplementary Discussion, Extended Data Fig. 6b–d and Supplementary Table 6).
Unbiased identification of splice site variants
Evaluation of synonymous control epegRNAs revealed a small set that caused unexpectedly strong phenotypes, which, on further consideration, could be attributed to their specified edit (excluded from analysis in Extended Data Fig. 6b–d). Specifically, we observed 69 synonymous epegRNAs targeting 25 loci with Z < −5 (green dots in Fig. 4c), including 61 that induced a stronger phenotype than the corresponding spacer- and codon-matched stop epegRNA. Providing a plausible explanation for most of these phenotypes, most corresponding epegRNAs (65 targeting 23 loci) specified edits that would disrupt splice site motifs at 3′ exon boundaries. Given these results, we reevaluated the locations of all synonymous edits in StopPR and found that 2,637 targeted the last nucleotide at the 3′ end of an exon (D−1 position), with slight overrepresentation of that edit position due to the presence of a PAM within the canonical splice site donor motif (AGGT)62. This subset of epegRNAs was enriched for negative growth phenotypes (11% with Z < −2 compared to 2.4% for all synonymous epegRNAs), consistent with mutational intolerance at the D−1 position, which has also been observed through analysis of naturally occurring near-splice-site mutations63. Further examining synonymous epegRNAs by edit position relative to either exon boundary (within 10 bp) then revealed no strong effects (Z < −5) at adjacent exonic positions (Fig. 4d; see Supplementary Discussion for additional evaluation of a small subset of epegRNAs targeting the D+1 position). We subsequently validated several phenotypes associated with D−1 variants in PEmaxKO cells (Extended Data Fig. 6e). Altogether, these results demonstrate use of our platform for interrogation of an additional class of genetic variants.
Discussion
The development of prime editing has sparked wide interest in its potential use for high-throughput characterization of genetic variants. Reflecting that interest, several groups have recently reported or preprinted proof-of-principle prime editing screens (Supplementary Table 7)31,32,33,34,35,36,37,38. While each of these efforts represents an informative step forward, to circumvent low prime editing efficiencies, others have used positive selection phenotypes33, readouts filtered by or calculated from editing efficiencies measured with exogenous sensors or endogenous target sequencing31,34,35,36,37, or a more efficient but indel-prone version of prime editing (PE3)31,32. Screening platforms demonstrated by many other studies are thus limited: positive selection screens are restricted to specific phenotypes, screens that rely on sensors increase cost and experimental complexity, platforms that calculate phenotypes from endogenous target sequencing cannot be easily multiplexed across genomic loci and PE3 increases unintended editing at targeted sites, potentially confounding results.
We sought to develop an accurate and generalizable prime editing screening platform that could be used with standard screening protocols (for example, parallel construction of epegRNA libraries and phenotypes calculated from epegRNA abundance). To build this platform, we implemented prime editing with the PE2 approach and stable expression of PEmax and epegRNAs in MMR-deficient cells. We found that, when implemented with these features, prime editing can install precise variants with high efficiency and can generate reproducible, growth-based negative selection phenotypes in large, pooled experiments, achieving a reasonable rate of phenotype induction across codons in essential genes targeted with nonsense mutations (50.8% of codons by day 28 with at least one epegRNA out of multiple tested). Additionally, by benchmarking our approach with a highly complex prime editing library (comprising hundreds of thousands of epegRNAs), we identified growth phenotypes from splicing variants not intentionally designed into our library, which highlights strong potential for discovery-based applications of our approach.
Concurrent studies piloting alternative prime editing screening platforms have performed similar (albeit typically much smaller) benchmarking experiments using variants of known function. For example, a recent study applied the PE3 approach to examine growth phenotypes from 115 epegRNAs specifying stop codons in growth-related genes32. Results from this experiment demonstrated reproducible negative selection phenotypes, but correlated measurements from spacer-matched, no edit controls suggested that many of those effects may be nonspecific. Another recent effort leveraged the PE2 approach to quantify the effects of targeting nonsense mutations to one essential gene (RPL15) and found that variant phenotypes were categorized as depleting more often when endogenous target sites were sequenced and used to determine results than when phenotypes were calculated from epegRNA abundances alone (80% of nonsense variants called as detrimental with the former strategy and only 32% with the latter, across 25 total variants)35. When evaluating the same gene in our StopPR screen, we observed negative growth phenotypes (Z < −2) for 77.8% (seven out of nine) of targeted nonsense variants from at least one stop epegRNA. Results from our platform therefore compare favorably to contemporary platforms and demonstrate the ability to measure highly specific and reliable phenotypes without the need for endogenous sequencing.
A key challenge moving forward will be to increase the efficiencies of prime editing libraries overall, thus enabling screening with fewer epegRNAs per target. Our results highlight the importance of this goal, as only 23.1% of stop epegRNAs from our library induced negative growth phenotypes by day 28. While promising and comparable to rates observed using first-generation base editor technology—wherein 30.5% of sgRNAs designed to install nonsense codons in essential genes caused negative growth phenotypes7—this rate of phenotype induction will need to be improved. To aid construction of more active epegRNA libraries, we identified features of epegRNA design and targeted loci that contribute to activity (for example, edit location in the gene body). Additionally, our growth phenotypes and the tens of thousands of epegRNAs responsible for them should provide a useful resource for future efforts seeking to develop and test new prime editing tools, including experimental systems, computational pipelines and analytical approaches. Other advances may also be explored to further improve prime editing efficiencies, including alternative or additional methods of (e)pegRNA stabilization40,64.
Notably, features of our platform (that is, MMR deficiency and stable expression of editing components) impose a requirement for extensive cell engineering before screening, and thus would be expected to impede some applications. Screens in primary cells would presumably be relatively difficult to implement, although recent work has shown that base editor screens can be performed in primary human CD4+ T cells over the course of 9 days (ref. 5), suggesting that similar screens may be feasible using our 14-day platform, if editor delivery and knockout of MLH1 were highly optimized. Additionally, we found that +5 G>C substitutions are typically installed with high efficiency with and without MMR, suggesting that such edits may be suitable for applications where MMR cannot be inactivated by genetic disruption. Alternatively, the use of a dominant negative MLH1 protein39 or an inducible knockdown system43 may also be explored in some cases. Because MMR has well-known specificity for single base pair mismatches and small insertion–deletion loops up to 13 nt (refs. 39,65), we do not expect that loss of MMR would enable screening of larger insertion or deletion prime edits65.
In sum, we provide a proof-of-principle demonstration of precise, massively parallel dropout screening with prime editing using standard screening protocols and the highly specific PE2 approach.
Methods
Experimental model and subject details
Prime editing cell lines
All prime editor constructs contained an SpCas9(H840A) nickase fused to an MMLV RT (D200N, T306K, W313F, T330P, L603W). In addition, the PEmax editor construct contained a codon-optimized MMLV RT and the following additional mutations in the SpCas9 nickase: R221K and N394K. Construction of the PEmax cell line was described previously40. The PE2 cell line was constructed in the same manner as the PEmax cell line. To construct MLH1 knockout PEmax cells (PEmaxKO), 122 pmol of Alt-R S.p. Cas9 Nuclease V3 (IDT 1081058) and 200 pmol Alt-R CRISPR–Cas9 sgRNA targeting MLH1 (IDT Hs.Cas9.MLH1.1.AG, 5′-mC*mU*mU*rCrArCrUrGrArGrUrArGrUrUrUrGrCrArUrGrUrUrUrUrArGrArGrCrUrArGrArArArUrArGrCrArArGrUrUrArArArArUrArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrArArCrUrUrGrArArArArArGrUrGrGrCrArCrCrGrArGrUrCrGrGrUrGrCmU*mU*mU*rU) were complexed for 20 min at room temperature and were nucleofected into 5 × 10^5 PEmax cells using the SE Cell Line 4D-Nucleofector X Kit (Lonza V4XC-1032) and program FF-120, according to the manufacturer’s protocol. Five days post nucleofection, cells were sorted by BD FACSAria Fusion Flow Cytometer into 96-well plates at one cell per well with 150 μl conditioned culture medium. Single cells were grown and expanded for 2–3 weeks into clonal lines. Clones with a high percentage of cells with expression of EGFP according to AttuneNXT flow cytometry analysis (Attune Cytometric Software) were selected for further characterization.
General cell culture and selection conditions
Lenti-X 293T was purchased from Takara (632180) and K562 (CCL-243) was purchased from ATCC. K562 stable prime editing cell lines were maintained in RPMI 1640 medium (Gibco, 22400089) supplemented with 10% FBS (Corning, 35-010-CV) and penicillin/streptomycin (Gibco, 15140122; 100 U ml−1). The 293T cells were maintained in DMEM medium (Corning, 10-013-CV) supplemented with 10% FBS and penicillin/streptomycin. All cells were kept in a humidified incubator at 37 °C, 5% CO2. For all pooled screens, K562 cells were kept in a humidified incubator with agitation (Multitron) at 37 °C, 5% CO2, 52–76 rpm depending on total volume. AttuneNXT (Attune Cytometric Software) was used to quantify fluorescent protein expression.
General sequences and cloning
For endogenously tested HEK3 +1 T>A and DNMT1 +6 G>C substitutions, spacer and 3′ extension sequences were from a previous publication (HEK3_4a_1TtoA and DNMT1_ED5f_6GtoC, respectively)25, the modified scaffold sequence was 5′-GTTTAAGAGCTATGCTGGAAACAGCATAGCAAGTTTAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGC (ref. 66) and the RNA structural motif for epegRNAs was tevopreQ1 (5′-CGCGGTTCTATCTAGTTACGCGTTAAACCAACTAGAA)41. pegRNAs and epegRNAs used the pU6-sgRNA-EF1Alpha-puro-T2A-BFP (Addgene no. 60955)29 backbone. Cloning details for these guides were described previously40.
To create a backbone plasmid suitable for use in cloning our self-targeting epegRNA libraries, an intermediate backbone plasmid (pJY126) was first generated by removing BsmBI restriction sites on pU6-sgRNA-EF1Alpha-puro-T2A-BFP (Addgene no. 60955)29 through Golden Gate Assembly (NEB E1602S). Then, through restriction cloning, a DNA duplex annealed from DNA oligos (5′-TTGGGAGACGCCTGCAGGCTGCTAAGCTAGGCGCGCCCGTCTCATTTTTTTC, 5′-TCGAGAAAAAAATGAGACGGGCGCGCCTAGCTTAGCAGCCTGCAGGCGTCTCCCAACAAG) was inserted into pJY126 digested with BstXI (NEB R0113S) and XhoI (NEB R0146S). This intermediate backbone (pJY127) was then digested with BamHI (NEB R0136S) and NotI (NEB R0189S), and a DNA duplex annealed from DNA oligos (5′-GATCCAGATCGGAAGAGCACACGTCTGAACTCCAGTCACGC, 5′-GGCCGCGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCTG) was inserted through restriction cloning to produce the final pAC025 backbone plasmid.
To create a backbone plasmid suitable for use in cloning StopPR (lAC002) with a tevopreQ1 motif41, we first inserted a DNA duplex annealed from DNA oligos (5′-CGCGCCCGTCTCACGCGGTTCTATCTAGTTACGCGTTAAACCAACTAGAATTTTTTTC, 5′-TCGAGAAAAAAATTCTAGTTGGTTTAACGCGTAACTAGATAGAACCGCGTGAGACGGG) into pJY127 digested with AscI (NEB R0558S) and XhoI. This intermediate backbone (pJY128) was then digested with BamHI and NotI, and a DNA duplex annealed from DNA oligos (5′-GATCCAGATCGGAAGAGCACACGTCTGAACTCCAGTCACGC, 5′-GGCCGCGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCTG) was inserted through restriction cloning to produce the final pAC026 backbone plasmid.
For endogenously validating epegRNA hits from StopPR (lAC002), we used pAC025 as the backbone to construct the individual plasmids encoding ten selected pairs of stop and synonymous epegRNAs (pWY037-056), along with three no edit (pWY057-059) and two nontargeting epegRNAs (pWY060, pWY061) as negative controls. Gene fragments from Twist Bioscience were assembled into pAC025 digested with XbaI (NEB R0145S) and XhoI to produce pWY037-046 and pWY057-061, and gBlocks from IDT were assembled into pAC025 digested with BstXI and XhoI to produce pWY047-056 using NEB Hifi DNA assembly (E2621S).
Method details
Western blot for prime editor and MLH1
Cells were collected from cell culture (1 × 10^4 cells per µl) and lysed in 1× lysis buffer (1× NuPage LDS, 50 mM sample reducing agent). After resuspension via vortex, samples were incubated at 70 °C for 10 min. Temperature was raised to 85 °C for 3 min. After incubation, samples were moved to room temperature and Benzonase Mix (final concentration 5 mM MgCl2, 1.25 U µl−1 benzonase) was added. Samples were then incubated at 37 °C for 30 min and subsequently used for protein electrophoresis. Samples (1 × 10^5 cells) were loaded and run on 3–8% Tris-Acetate Gels (ThermoFisher EA0375BOX) in Running Buffer (1× NuPage tris-acetate running buffer, 2.5× NuPage antioxidant) at 180 V until completion. Proteins were then transferred to an ethanol-activated polyvinylidene difluoride membrane (BioRad 1620177) in transfer buffer (1× NuPage transfer buffer, 10% methanol, 2.5× NuPage antioxidant, 0.025% SDS) at 30 V for 1 h. Protein transfer and total protein content were assessed by Ponceau staining (Sigma Aldrich P7170-1L), and the membrane was carefully cut into three strips (<65, 65–120, >120 kDa) to separately blot for each protein of interest. Ponceau stain was washed out with 1× tris-buffered saline with Tween (TBST), and then these membranes were incubated in blocking buffer (1× TBST and 5% dry milk) for 1 h at room temperature. Membranes were then incubated overnight on a shaker at 4 °C in primary antibodies (β-actin CST3700S; MLH1 Invitrogen MA5-32041; Cas9 Takara 632607) diluted 1:1,000 in 1× TBST with 3% BSA, washed 3× in 1× TBST for 5 min and then incubated in secondary antibody (1× Li-COR intercept buffer, 1:20,000 IRDye secondary antibodies: goat antimouse LI-COR BioScience 926-32210 and goat antirabbit LI-COR BioScience 926-68071) for 1 h at room temperature in the dark. Before imaging on a Li-COR Odyssey Infrared Imaging system, membranes were washed 3× in 1× TBST for 5 min.
Oligonucleotide library designs
Self-targeting +5 G>H library (lDS004)
Here, 640 target sites in human protein-coding genes were randomly selected from ‘library 1’ in ref. 52 and the corresponding highest-efficiency RTT/PBS length combination was determined for each selected site. We then designed three epegRNAs per target site with the selected PBS and identical or nearly identical RTT sequence, each specifying a +5 G>A, G>T or G>C edit. With the addition of 22 positive control epegRNAs for sites tested endogenously in the literature, 51 nontargeting controls (with a scrambled target site sequence) and seven no edit controls (with epegRNAs specifying the reference sequence), the final library of 2,000 epegRNA–target pairs tests seven PBS lengths (7, 9, 11, 13, 14, 15 and 17 nt), nine RTT lengths (10, 11, 12, 13, 14, 15, 17, 20 and 22 nt) and all three G>H mutations at the +5 position (Supplementary Table 1).
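The library composition described above can be tallied directly. The short check below uses only the counts stated in the text; the variable names are illustrative.

```python
# Composition of the +5 G>H self-targeting library, using counts from the text.
target_sites = 640
edits_per_site = 3            # +5 G>A, G>T and G>C
positive_controls = 22
nontargeting_controls = 51
no_edit_controls = 7

library_size = (
    target_sites * edits_per_site
    + positive_controls
    + nontargeting_controls
    + no_edit_controls
)
print(library_size)  # 2000 epegRNA-target pairs
```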
epegRNA sequences and accompanying target sites were synthesized as 250 nt oligonucleotides by Twist Bioscience. Oligonucleotides were structured with adapter sequences on both ends for library amplification, specifically 5′-GTATCCCTTGGAGAACCACCT on the 5′ end and 5′-CAGACGTGTGCTCTTCCGAT on the 3′ end, with internal BstXI (5′-CCACCTTGTTGG) and BamHI (5′-GGATCC) restriction enzyme sites surrounding epegRNA components (19 nt sgRNA and 17–39 nt extension sequences, 37 nt tevopreQ1 (ref. 41) and 7 nt polyT), 17 nt barcodes unique to each epegRNA–target pair and 45 nt target sites, with reversed BsmBI restriction enzyme sites (5′-GTTTAGAGACGGCATGCCGTCTCGGTGC) splitting the sgRNA target sequence from the remainder of designed components to facilitate a two-step cloning process. Target sites were designed to include 4 nt upstream of the protospacer sequence in addition to the PAM and full RTT binding site.
Self-targeting Tiled edits library (lRM001)
Five target sites within the mouse ZRS enhancer were selected based solely on their proximity to transcription factor binding sites of interest55 and the presence of a 5′-NGG-3′ sequence. epegRNAs were designed to encode all possible single nucleotide variants within a specific positional range at these targets, which overall spanned the +1 position (relative to the nick site) to the +6 to +21 position depending on the target site. Twenty epegRNAs were designed for each unique edit (that is, target site, edit position, substitution type combination) with PBS lengths of 6, 9, 12 and 15 nt and ‘homology flap’ lengths (the region 5′ of the encoded edit within the RTT that facilitates incorporation of the 3′ flap after reverse transcription) of 0, 4, 8, 12 and 16 nt. epegRNAs encoding the unedited sequence with a PBS length of 7 nt and RTT length of 13 nt were included for each target site as negative controls. Additionally, previously validated epegRNA designs encoding single nucleotide variant edits in Rnf2 and Hoxd13 were included as positive controls25,56 for a total library size of 3,745 constructs (Supplementary Table 2). For each epegRNA design, the corresponding target site (forward direction, −1 nt 3′ of protospacer to +29 nt 5′ of protospacer, 50 nt total) was included in the oligonucleotide construct along with a unique 12 bp barcode and necessary cloning and PCR sites as in the +5 G>H self-targeting library (lDS004, above). epegRNA–target site constructs, ranging in size from 200–250 nt, were synthesized by Twist Bioscience.
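The twenty designs per unique edit correspond to a full factorial combination of the four PBS lengths and five homology flap lengths listed above. A minimal sketch of that enumeration follows; the variable names are hypothetical and the lengths are taken from the text.

```python
from itertools import product

pbs_lengths = [6, 9, 12, 15]                # nt, from the text
homology_flap_lengths = [0, 4, 8, 12, 16]   # nt, from the text

# Every (PBS length, homology flap length) pair tested for a single edit.
designs_per_edit = list(product(pbs_lengths, homology_flap_lengths))

print(len(designs_per_edit))  # 20
print(designs_per_edit[:3])   # [(6, 0), (6, 4), (6, 8)]
```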
StopPR (lAC002)
A set of 1,247 genes were nominated for inclusion in StopPR due to their determined status as common essential genes by DepMap57. CRISPick58 was used to design 35 sgRNAs targeting each gene using reference genome Human GRCh38 (GRCh38.p13; National Center for Biotechnology Information (NCBI) Refseq GCF_000001405.39) with CRISPRko and SpyoCas9 options, which were then filtered to 16,278 sgRNA target sequences with on-target efficacy scores >0.5. Ensembl Biomart67 was used to obtain exon coordinates, coding sequences and full genomic regions for each target gene. Codons accessible to each protospacer that could be mutated to stop codons with 1, 2 or 3 bp mutations were identified, then any edits that could not be targeted with prime editing were removed, which could occur if the edit occurred at a position upstream of the Cas9(H840A) nick. For each targeted codon, mutations inducing a synonymous amino acid change (such as mutating the codon ACA to ACG, both encoding threonine) were also identified, and codons where a synonymous mutation could not be introduced were filtered, including the removal of all tryptophan codons, as only one codon sequence produces it. For each edit, we designed accompanying PBS (11, 13 nt) and RTT (10, 12, 15, 20 nt) sequences, and filtered any combinations that would result in a too-long oligonucleotide for synthesis.
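The codon-level logic described above (which stop codons a codon can be converted to with 1–3 bp substitutions, and whether a synonymous alternative exists) can be sketched as follows. This is an illustration under simplifying assumptions and not the pipeline used for StopPR: the function names are hypothetical, and the sketch ignores protospacer accessibility and edit position relative to the nick, both of which the actual design also required.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
STOP_CODONS = [codon for codon, aa in CODON_TABLE.items() if aa == "*"]

def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))

def stop_edits(codon):
    """Map each stop codon to the number of substitutions needed to reach it."""
    return {stop: hamming(codon, stop) for stop in STOP_CODONS if stop != codon}

def synonymous_codons(codon):
    """Other codons encoding the same amino acid as `codon`."""
    aa = CODON_TABLE[codon]
    return [c for c, other in CODON_TABLE.items() if other == aa and c != codon]

def is_targetable(codon):
    """Sense codon with at least one synonymous alternative (simplified filter)."""
    return CODON_TABLE[codon] != "*" and bool(synonymous_codons(codon))

print(stop_edits("ACA"))         # {'TAA': 2, 'TAG': 3, 'TGA': 2}
print(synonymous_codons("ACA"))  # ['ACT', 'ACC', 'ACG']
print(is_targetable("TGG"))      # False: tryptophan has a single codon
```

Under this simplified filter, methionine (ATG) is excluded for the same reason as tryptophan, since it is also encoded by a single codon.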
epegRNA sequences were then designed into 120-nt oligonucleotides with flanking 5′ (5′-CACCAGAAGCCACCTTGTTG) and 3′ (5′-CTGTGTTGGTCTCCCGCG) amplification regions containing BstXI and BsaI restriction enzyme sites for synthesis by Twist Bioscience. sgRNA and extension sequences were split by reversed BsmBI restriction enzyme sites (5′-GTTTAGAGACGGCATGCCGTCTCGGTGC) to enable a two-step cloning process. Finally, oligonucleotides that contained incidental restriction enzyme sites or homopolymer T runs (5+) were removed. In addition, 12,000 epegRNAs designed to introduce no edits and 3,000 epegRNAs containing scrambled nontargeting spacer sequences were included to generate a library of ~240,000 epegRNAs (Supplementary Tables 3 and 4). Notably, during later analysis, an updated design filter identified a small number of epegRNAs with erroneous features (580 pairs of spacer- and codon-matched stop and synonymous epegRNAs for which either epegRNA was affected). These were removed before analysis (excluded epegRNAs indicated with ‘no’ in ‘included_in_analysis’ column in Supplementary Tables 3 and 4).
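The sequence filter described above (removal of designs with incidental restriction sites or runs of five or more T) could be implemented along the following lines. This is a sketch and not the authors' code: the function name is hypothetical, and it assumes the filter is applied to the variable spacer or extension sequence only, so that the intentionally designed cloning sites are not flagged. The recognition sequences are the standard ones for BsmBI/Esp3I (CGTCTC), BsaI (GGTCTC) and BstXI (CCAN6TGG).

```python
import re

# Recognition sequences on either strand for the enzymes used in cloning.
SITE_PATTERNS = [
    re.compile(r"CGTCTC|GAGACG"),    # BsmBI / Esp3I
    re.compile(r"GGTCTC|GAGACC"),    # BsaI
    re.compile(r"CCA[ACGT]{6}TGG"),  # BstXI (its own reverse complement)
]
POLY_T = re.compile(r"T{5,}")        # homopolymer T run of 5 or more

def passes_filters(variable_region):
    """Return True if the sequence lacks poly-T runs and incidental sites."""
    seq = variable_region.upper()
    if POLY_T.search(seq):
        return False
    return not any(pattern.search(seq) for pattern in SITE_PATTERNS)

print(passes_filters("ACGTACGTACGTACGTACGT"))  # True
print(passes_filters("ACGTTTTTACG"))           # False: TTTTT run
print(passes_filters("AACGTCTCAA"))            # False: BsmBI site
```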
Cloning of epegRNA libraries
Self-targeting +5 G>H library (lDS004)
A two-step cloning process was used. First, the Twist oligo pool was PCR amplified using Phusion Plus polymerase (ThermoFisher F630S), 0.5 μM forward primer (5′-GTATCCCTTGGAGAACCACCT), 0.5 μM reverse primer (5′-CAGACGTGTGCTCTTCCGAT) and 0.1 pmol resuspended oligo pool with the following conditions: one cycle of 1 min at 98 °C; 15 cycles of 15 s at 98 °C, followed by 15 s at 60 °C, followed by 45 s at 72 °C; one cycle of 10 min at 72 °C; 10 °C hold. PCR products were purified using Macherey-Nagel NucleoSpin Gel and PCR Clean-up kit (740609.50) as per the manufacturer’s protocol and quantified via Nanodrop. Vector backbone pAC025 was subjected to a BstXI-BamHI double restriction digest, followed by column clean-up. NEB Hifi DNA assembly was used to assemble the amplified library pool and digested vector in a 1:3 vector:insert ratio at 50 °C for 1 h. After SPRI purification, assembled products were transformed into electrocompetent cells (Endura, 60242-1) using a MicroPulser (BioRad). SOC media was added (for a total of 1.2 ml) and the transformation mixture was incubated at 37 °C for 1 h. The cells were then grown for 14 h at 37 °C in a 500 ml culture with Luria-Bertani medium and 100 μg ml−1 carbenicillin, and plasmids were extracted from the resulting cultures. To assess intermediate library coverage and quality, epegRNA cassettes and target regions were amplified for validation sequencing using flanking 5′ primer (5′-AATGATACGGCGACCACCGAGATCTACACGCACAAAAGGAAACTCACCCT) and 3′ indexing primer (5′-CAAGCAGAAGACGGCATACGAGATNNNNNNNNGTGACTGGAGTTCAGACGTGTGCTCTTC) with the following program: one cycle of 30 s at 98 °C; ten cycles of 10 s at 98 °C, followed by 20 s at 65 °C, followed by 20 s at 72 °C; one cycle of 2 min at 72 °C and 10 °C hold. Sequencing was performed on an Illumina MiSeq at 500× coverage (‘Sequencing’ section). 
Notably, sequencing revealed that epegRNA identities and their accompanying target regions with barcodes became uncoupled in ~15% of reads, which we hypothesize may be due to the substantial homologous portions within and between each oligo. These uncoupled epegRNA–target site pairs were filtered from downstream analysis (‘Analysis of prime editing efficiencies’ section).
To complete the cloning, the intermediate library was digested with Esp3I enzyme (NEB R0734S) at 37 °C for 6 h and gel purified. The epegRNA scaffold sequence (5′-GTTTAAGAGCTATGCTGGAAACAGCATAGCAAGTTTAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGC)66 was synthesized with flanking reversed Esp3I sites (5′-CGTCTCGGTTT and 5′-GTGCTGAGACG) as a gene fragment by IDT and amplified by PCR using Phusion polymerase, 0.5 μM forward primer (5′-TCACAACTACACCAGAAGCCAC), 0.5 μM reverse primer (5′-GCTGGCAACACTTTGACGAAGA) and 0.1 pmol resuspended gene fragment with the following program: one cycle of 30 s at 98 °C; 25 cycles of 10 s at 98 °C, followed by 10 s at 58 °C, followed by 15 s at 72 °C; one cycle of 5 min at 72 °C and 10 °C hold. The amplified scaffold was purified by column clean-up and digested with Esp3I at 37 °C for 6 h. After column clean-up, the purified scaffold insert (2 ng) was ligated with the digested initial plasmid library vector (200 ng) using T4 DNA Ligase (NEB M0202S) at 16 °C overnight. After SPRI purification, ligated products were transformed into Endura electrocompetent cells as above. Final library quality was assessed via sequencing as above, with 90% of library elements occurring within a 6.1× range and a Gini coefficient of 0.26 (Extended Data Fig. 2a).
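The two library quality metrics reported above can be computed from per-element read counts. The sketch below is illustrative: the function names are hypothetical, and it assumes that the fold range covering 90% of elements is the ratio of the 95th to the 5th percentile count, which the text does not state explicitly.

```python
def gini(counts):
    """Gini coefficient of read counts (0 = perfectly even representation)."""
    values = sorted(counts)
    n = len(values)
    total = sum(values)
    if n == 0 or total == 0:
        raise ValueError("need at least one nonzero count")
    # Rank-weighted sum over counts sorted in ascending order.
    weighted = sum(rank * value for rank, value in enumerate(values, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n

def _nearest_rank(sorted_values, percent):
    """Nearest-rank percentile using integer arithmetic (ceiling of percent*n/100)."""
    rank = max(1, -(-percent * len(sorted_values) // 100))
    return sorted_values[rank - 1]

def fold_range_90(counts):
    """Ratio of the 95th to the 5th percentile count (assumed definition)."""
    values = sorted(counts)
    low = _nearest_rank(values, 5)
    high = _nearest_rank(values, 95)
    if low == 0:
        raise ValueError("5th percentile count is zero; fold range undefined")
    return high / low

# Toy counts for illustration only.
print(gini([5, 5, 5, 5]))                  # 0.0 (even library)
print(gini([0, 0, 0, 10]))                 # 0.75 (all reads in one element)
print(fold_range_90(list(range(1, 101))))  # 19.0
```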
Self-targeting Tiled edits library (lRM001)
Amplification and cloning of lRM001 was performed as described above for lDS004. The final cloned library was assessed via sequencing and reported a Gini coefficient of 0.30.
StopPR (lAC002)
As with the construction of lDS004, we used a two-step cloning process. First, the Twist oligo pool was PCR amplified using Phusion HSII HF (ThermoFisher, F565S), 0.4 μM forward primer (5′-CACCAGAAGCCACCTTGTTG), 0.4 μM reverse primer (5′-CTGTGTTGGTCTCCCGCG) and 10 ng resuspended oligo pool with the following program: one cycle of 30 s at 98 °C; six cycles of 10 s at 98 °C, followed by 20 s at 65 °C, followed by 10 s at 72 °C; one cycle of 5 min at 72 °C; 10 °C hold. Products from multiple PCR reactions were aggregated and purified using SPRI. Vector backbone pAC026 was subjected to a BstXI-BlpI (NEB R0585S) double digest at 37 °C for 4 h followed by SPRI purification, BsmBI-v2 (NEB R0739S) digest at 55 °C for 6 h and final SPRI purification. Amplified oligo pool was double digested with BstXI and BsaI-v2 (NEB R3733S) restriction enzymes at 37 °C for 4 h and purified through column clean-up. Digested oligo pool and vector backbone were ligated using T4 DNA Ligase at room temperature for 45 min and purified using SPRI. Transformation using electrocompetent Endura cells proceeded as described above, and library quality was assessed via sequencing. epegRNA cassettes were amplified for validation sequencing using primers as above for lDS004. Sequencing was performed on an Illumina NovaSeq at 600× coverage (‘Sequencing’ section).
To complete the cloning, the intermediate library was digested with BsmBI-v2 enzyme at 55 °C for 4 h and SPRI purified. PCR amplification and purification of the epegRNA scaffold proceeded as above. Purified PCR product was digested with BsmBI-v2 at 55 °C overnight, followed by SPRI purification. The purified scaffold insert (2 ng) was ligated with the digested intermediate plasmid library vector (200 ng) using T4 DNA Ligase at room temperature for 45 min. After SPRI purification, ligated products were transformed into Endura electrocompetent cells and final library quality was assessed via sequencing as above. StopPR exhibited moderate skew resulting from missing elements (Gini coefficient of 0.35, with 90% of analyzed library elements present within a 57× range). After filtering lowly represented epegRNAs (‘Analysis of epegRNA phenotypes’ section), we retained 84% of originally designed epegRNAs with well-distributed representation (Gini coefficient of 0.26, 90% of analyzed library elements present within a 5× range).
Production of lentivirus
Lentivirus production was performed for each library using a similar process. Lenti-X 293T cells (14 × 106) were seeded in a 150-mm cell culture dish with DMEM. Plasmids pALD-Rev-A (1 μg, Aldevron), pALD-GagPol-A (1 μg, Aldevron), pALD-VSV-G-A (2 μg, Aldevron) and the transfer vector (15 μg) were mixed with Opti-MEM I Reduced Serum Medium (Gibco, 31985070) and TransIT-LT1 (Mirus MIR 2300) transfection reagent, and cotransfected into cells. At 12–14 h post transfection, 1× ViralBoost reagent (ALSTEM VB100) was added to cells, and at 48 h post transfection, lentivirus-containing supernatant was collected and stored at −80 °C. To determine viral titer, serial dilutions of virus (0–500 μl) were transduced into K562 cells with 8 mg ml−1 polybrene. Titer was calculated 48 h post transduction based on the percentage of cells expressing blue fluorescent protein (BFP).
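A titer estimate of the kind described above can be obtained from the fraction of BFP-positive cells. The sketch below applies a Poisson correction, a common convention that is assumed here and is not stated in the text, under which the fraction of transduced cells is 1 − exp(−MOI). The function names and example numbers are hypothetical.

```python
import math

def moi_from_fraction(fraction_bfp_positive):
    """Multiplicity of infection implied by the BFP+ fraction (Poisson model)."""
    if not 0 <= fraction_bfp_positive < 1:
        raise ValueError("fraction must be in [0, 1)")
    return -math.log(1 - fraction_bfp_positive)

def titer_tu_per_ml(fraction_bfp_positive, cells_at_transduction, virus_volume_ml):
    """Transducing units per ml of viral supernatant."""
    moi = moi_from_fraction(fraction_bfp_positive)
    return moi * cells_at_transduction / virus_volume_ml

# Hypothetical example: 30% BFP+ cells from 0.1 ml virus on 500,000 cells.
example_titer = titer_tu_per_ml(0.30, 500_000, 0.1)
print(f"{example_titer:.3g} TU per ml")
```

At low transduction rates the correction is small, and the MOI is close to the BFP-positive fraction itself.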
Endogenous site editing
HEK3 and DNMT1
The lentiviral pegRNAs and epegRNAs (tevopreQ1) targeting HEK3 and DNMT1 endogenous sites were transduced separately, each into a total of 0.6 × 10^6 cells for PE2, PEmax and PEmaxKO stable cell lines in triplicate, at an MOI of 0.7. Cells were spun at 1,000g for 2 h in the presence of 8 mg ml−1 polybrene (Santa Cruz Biotechnology, sc-134220) before incubating in a humidified incubator. Puromycin (Goldbio, P-600-100) was added 72 h post transduction to deplete untransduced cells. Cells were kept at a minimum of 2.5 × 10^7 cells per replicate, at a density of 0.5–1.0 × 10^6 cells per ml (splitting as necessary). Editing lasted for 28 days post transduction, with timepoint samples collected at days 7, 14, 21 and 28. Genomic DNA (gDNA) was extracted from collected K562 cells by first treating with lysis buffer (10 μM Tris-HCl, pH 7.5; 0.05% SDS; 25 μg ml−1 Proteinase K), then by incubating at 37 °C for 90 min followed by heat inactivation at 80 °C for 30 min.
Endogenous sites were amplified from gDNA using a two-step PCR. First, flanking 5′ and 3′ primers were used to amplify HEK3 and DNMT1 genomic sites. HEK3 was amplified with flanking 5′ primer (5′-CGCCCATGCAATTAGTCTATTTCTGC) and 3′ primer (5′-CTCTGGGTGCCCTGAGATCTTTT), with the following program: one cycle of 2 min at 98 °C; 32 cycles of 10 s at 98 °C, followed by 20 s at 69 °C, followed by 30 s at 72 °C; one cycle of 2 min at 72 °C; 10 °C hold. DNMT1 was amplified with flanking 5′ primer (5′-CACAACAGCTTCATGTCAGCCAAG) and 3′ primer (5′-CGTTTGAGGAGTGTTCAGTCTC), with the following program: one cycle of 2 min at 98 °C; 32 cycles of 10 s at 98 °C, followed by 20 s at 66 °C, followed by 30 s at 72 °C; one cycle of 2 min at 72 °C; 10 °C hold. Resulting PCR1 products were SPRI purified using 1.0× reactions. Then, 5′ (5′-AATGATACGGCGACCACCGAGATCTACACNNNNNNNNACACTCTTTCCCTACACGAC) and 3′ (5′-CAAGCAGAAGACGGCATACGAGATNNNNNNNNGTGACTGGAGTTCAGACGTGTGCTCTTC) indexing primers were used to amplify purified PCR1 products, with the following program: one cycle of 2 min at 98 °C; eight cycles of 10 s at 98 °C, followed by 20 s at 65 °C, followed by 30 s at 72 °C; one cycle of 2 min at 72 °C; 10 °C hold. Sequencing was performed on an Illumina MiSeq at 50,000× coverage (‘Sequencing’ section).
StopPR validation sites
To individually test negative growth phenotypes for hits from our StopPR (lAC002) screen, the lentiviral epegRNAs targeting the respective loci were each transduced separately into a total of 0.3 × 10^6 PEmaxKO cells in 24-well plates in triplicate, at an average MOI of 0.9. Cells were spun at 1,000g for 2 h in the presence of 8 μg ml−1 polybrene before incubating in a humidified incubator. The cells were cultured for 14 days (splitting as necessary). To validate editing efficiency at these same loci, we also transduced cells at an average MOI of 1.5. Cells were selected with 3 μg ml−1 puromycin at 72 h post transduction. At 7 days post transduction, gDNA was extracted from collected cells as above for HEK3 and DNMT1, with a heat inactivation for 45 min.
Endogenous sites were amplified from gDNA using a two-step PCR. First, ten sets of flanking 5′ and 3′ primers were used to amplify the targeted genomic sites using NEBNext Ultra II Q5 Master Mix (NEB, M0544) with the following program: one cycle of 30 s at 98 °C; 31 cycles of 10 s at 98 °C, followed by 20 s at 58 °C, followed by 40 s at 72 °C; one cycle of 2 min at 72 °C; 4 °C hold. Then, 5′ (5′-AATGATACGGCGACCACCGAGATCTACACNNNNNNNNACACTCTTTCCCTACACGAC) and 3′ (5′-CAAGCAGAAGACGGCATACGAGATNNNNNNNNGTGACTGGAGTTCAGACGTGTGCTCTTC) indexing primers were used to amplify PCR1 products with Phusion U Green Multiplex PCR Master Mix (Thermo Scientific, F564L), with the following program: one cycle of 2 min at 98 °C; nine cycles of 10 s at 98 °C, followed by 20 s at 61 °C, followed by 40 s at 72 °C; one cycle of 2 min at 72 °C; 4 °C hold. The resulting PCR2 products were pooled and gel purified. Sequencing was performed on an Illumina MiSeq (‘Sequencing’ section).
Pooled screening
Self-targeting +5 G>H screen (lDS004)
The lentiviral library was transduced into a total of 5 × 10^7 cells for both PEmax and PEmaxKO stable cell lines in replicate, at an MOI of 0.7 to achieve >10,000× coverage of the number of epegRNA–target pairs. Cells were spun at 1,000g for 2 h in the presence of 8 μg ml−1 polybrene before incubating in a humidified incubator with agitation (multitron). Then, 1 μg ml−1 puromycin was added 72 h post transduction to deplete untransduced cells. To maintain coverage, cells were kept at a minimum of 2.5 × 10^7 cells per replicate (>10,000× coverage), at a density of 0.5–1.0 × 10^6 cells per ml (splitting as necessary). Screening lasted for 28 days post transduction, with timepoint samples (12,500–25,000× representation) collected at days 7, 14, 21 and 28. gDNA was extracted from collected K562 cells using the NucleoSpin Blood XL kit (Macherey Nagel, 740950.50). Subsequently, gDNA was treated with RNase A and purified by ethanol precipitation. epegRNA–target cassettes were PCR amplified using 5′ flanking primer (5′-AATGATACGGCGACCACCGAGATCTACACGCACAAAAGGAAACTCACCCT) and 3′ indexing primer (5′-CAAGCAGAAGACGGCATACGAGATNNNNNNNNGTGACTGGAGTTCAGACGTGTGCTCTTC). Each 100 μl reaction contained 10 μg of gDNA, 1 μM primers and 50 μl of NEBNext Ultra II Q5 Master Mix, and was run with the following program: one cycle of 1 min at 98 °C; 22 cycles of 10 s at 98 °C, followed by 30 s at 67 °C, followed by 45 s at 72 °C; one cycle of 5 min at 72 °C; 10 °C hold. Resulting PCR products from each sample were pooled and SPRI purified using 0.85–0.56× double-sided reactions.
Self-targeting Tiled edits screen (lRM001)
For the self-targeting, Tiled edits epegRNA screen, the prepared lentiviral library was transduced into a total of 3 × 10^7 PEmaxKO cells at an MOI of 0.2 to achieve >1,000× coverage of the number of epegRNA–target pairs. Cells were spun at 1,000g for 2 h in the presence of 8 μg ml−1 polybrene, split into two separate 150 ml cultures and maintained in a humidified incubator with agitation (multitron) at 37 °C, 5% CO2 and 60 rpm. At 72 h post transduction, 2 μg ml−1 puromycin was introduced to deplete untransduced cells and replenished at each subsequent passage. To maintain >1,000× coverage, cells were kept at a density of 0.1–1.0 × 10^6 cells per ml in 150–200 ml cultures, splitting as necessary. Cell populations were sampled at days 5, 8 and 10 post transduction, collecting 50 × 10^6–75 × 10^6 cells per replicate at each timepoint (>1,000× representation). Nucleic acids were extracted from collected PEmaxKO cells using the NucleoSpin Blood XL kit, treated with RNase A and SPRI purified to obtain purified gDNA. epegRNA–target cassettes were PCR amplified and indexed as described for the initial self-targeting screen (lDS004).
StopPR screen (lAC002)
The lentiviral library was transduced into a total of 4.1 × 10^8 cells for the PEmaxKO stable cell line in replicate, at an MOI of 0.7 to achieve >500× coverage of the number of epegRNAs. Cells were spun at 1,000g for 2 h in the presence of 8 μg ml−1 polybrene before incubating in a humidified incubator with agitation (multitron). Next, 1 μg ml−1 puromycin was added 72 h post transduction to deplete untransduced cells. To maintain coverage, cells were kept at a minimum of 4.5 × 10^8 cells per replicate (>1,500× coverage), at a density of 0.5–1.0 × 10^6 cells per ml (splitting as necessary). Screening lasted for 28 days post transduction, with timepoint samples (1,250–2,000× representation) collected at days 7, 14 and 28. gDNA extraction and PCR amplification of epegRNA cassettes proceeded as above, under the following conditions: one cycle of 30 s at 98 °C; 22 cycles of 10 s at 98 °C, followed by 20 s at 65 °C, followed by 20 s at 72 °C; one cycle of 2 min at 72 °C; 10 °C hold. Resulting PCR products from each sample were pooled and SPRI purified using 0.85–0.56× double-sided reactions.
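The coverage figures for the StopPR screen can be checked with simple arithmetic. The sketch below assumes a library of roughly 240,000 epegRNAs (the approximate size given in the abstract) and a Poisson model in which the fraction of cells receiving at least one integration is 1 − e^(−MOI). The Poisson model and the function names are assumptions for illustration, not a description of how the stated coverage was derived.

```python
import math


def transduced_fraction(moi):
    """Fraction of cells with at least one integration (Poisson model)."""
    return 1.0 - math.exp(-moi)


def coverage(n_cells, library_size, moi=None):
    """Cells per library element; if moi is given, count transduced cells only."""
    frac = 1.0 if moi is None else transduced_fraction(moi)
    return n_cells * frac / library_size


LIBRARY_SIZE = 240_000  # approximate number of epegRNAs (assumed from abstract)

# Coverage at transduction: 4.1e8 cells at an MOI of 0.7.
at_transduction = coverage(4.1e8, LIBRARY_SIZE, moi=0.7)

# Coverage during maintenance: at least 4.5e8 selected cells per replicate.
during_maintenance = coverage(4.5e8, LIBRARY_SIZE)
```

Under these assumptions both values exceed the stated thresholds of >500× at transduction and >1,500× during maintenance.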
Sequencing
Endogenous sites
Sequencing was performed on an Illumina MiSeq with 10% phiX spike-in with single reads: I1 = 8 nt, i7 index read; I2 = 8 nt, i5 index read; R1 = 300 nt, endogenous sequence. Standard Illumina primers were used for all reads.
Self-targeting +5 G>H screen (lDS004)
Sequencing was performed on an Illumina MiSeq with 5% phiX spike-in with paired-end reads: I1 = 6 nt, i7 index read; I2 = 0 nt, i5 index read; R1 = 144 nt, epegRNA spacer and extension; R2 = 68 nt, target sequence and barcode. Custom primers were used for R1 (5′-GTGTGTTTTGAGACTATAAGTATCCCTTGGAGAACCACCTTGTTG), and standard Illumina primers were used for remaining reads.
Self-targeting Tiled edits screen (lRM001)
Sequencing was performed on an Illumina NovaSeq with 5% phiX spike-in with paired-end reads: I1 = 8 nt, i7 index read; I2 = 0 nt, i5 index read; R1 = 220 nt, epegRNA spacer and extension; R2 = 88 nt, target sequence and barcode. Custom primers were used for R1 as in sequencing of lDS004, and standard Illumina primers were used for remaining reads.
StopPR screen (lAC002)
Sequencing was performed on an Illumina NovaSeq with 25% phiX spike-in with paired-end reads: I1 = 8 nt, i7 index read; I2 = 0 nt, i5 index read; R1 = 28 nt, epegRNA spacer; R2 = 102 nt, epegRNA extension. Custom primers were used for R1 as in sequencing of lDS004, and standard Illumina primers were used for remaining reads.
All sequencing reads were demultiplexed through HTSEQ (Princeton University High Throughput Sequencing Database (https://htseq.princeton.edu/; v.13.13.15)).
Statistical analysis
Analysis of prime editing efficiencies
Endogenous sites
To analyze sequencing data, we first used CRISPRessoBatch68 to align reads to reference endogenous sequences (inputted as amplicon_seq) based on spacer sequences (inputted as guide_seq). Both min_average_read_quality and min_bp_quality_or_N arguments were set to 30; otherwise, default parameters were used. The CRISPRessoBatch quantification window was positioned to include 25 nt on both sides of the Cas9(H840A) nick site (50 nt total window size), which ensured analysis of at least 10 nt past the RTT length for all sites. Custom Python scripts were used to further process aligned reads from CRISPRessoBatch (contained in allele frequency tables). First, to account for the presence of inferred single-nucleotide polymorphisms at the endogenous targets in K562 cells, we allowed either A/G at the position 11 nt upstream of the nick site and either A/G at the position 9 nt downstream of the nick site for the HEK3 reference, and for the DNMT1 reference, we allowed either A/G at the position 3 nt upstream of the nick site. Second, we also considered nucleotides assigned to ‘N’ by CRISPRessoBatch, which likely arise due to sequencing errors, as reference nucleotides. We then collapsed reads into alignment bins accordingly. Reads were classified as either precise edit (only variant was the intended edit), no edit (same as reference sequence) or error (contained a variant that was not the intended edit). Reported efficiencies are the number of reads in each class divided by the number of reads that aligned to the amplicon, expressed as a percentage.
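The classification rule described above can be sketched for the simple case of substitution edits in equal-length aligned windows. This is an illustrative simplification, not the custom scripts used in the study: positions are 0-based indices into the window, tolerated polymorphisms are passed as a position-to-bases mapping, and any length difference is treated as an error. The function and argument names are hypothetical.

```python
from collections import Counter


def classify_read(read, reference, intended, tolerated=None):
    """Classify one aligned window as 'no edit', 'precise edit' or 'error'.

    intended  : dict mapping position -> edited base (the programmed edit)
    tolerated : dict mapping position -> set of bases accepted as reference
                (for example, known SNPs); 'N' is always treated as reference
    """
    tolerated = tolerated or {}
    if len(read) != len(reference):
        return "error"  # indels are not modelled in this sketch
    variants = {}
    for pos, (obs, ref) in enumerate(zip(read, reference)):
        if obs == ref or obs == "N":
            continue
        if obs in tolerated.get(pos, ()):
            continue
        variants[pos] = obs
    if not variants:
        return "no edit"
    return "precise edit" if variants == intended else "error"


def efficiencies(reads, reference, intended, tolerated=None):
    """Percentage of aligned reads in each class."""
    classes = Counter(
        classify_read(r, reference, intended, tolerated) for r in reads
    )
    total = sum(classes.values())
    if total == 0:
        raise ValueError("no aligned reads")
    labels = ("precise edit", "no edit", "error")
    return {label: 100.0 * classes[label] / total for label in labels}
```

Because the denominator is all aligned reads, the three percentages always sum to 100.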
Self-targeting +5 G>H screen (lDS004)
This self-targeting screen was analyzed using a three-stage pipeline:
In the first stage, each read was assigned to an epegRNA identity (unique to each epegRNA–target pair) by aligning components of the epegRNA (contained on read 1) and target (contained on read 2) to reference sequences (that is, spacer through the end of epegRNA extension for read 1, target sequence through the barcode for read 2) using bowtie2. Read pairs with low mapping quality (≤5) or with recombination between the two reads were removed, and remaining reads were assigned to groups based on their epegRNA identities to enable parallel processing.
In stage two, the 45 nt target sites for each epegRNA–target pair were extracted, collapsed and analyzed to determine observed editing outcomes. First, we extracted the part of the read that matched the reference target site with at least 60% of bases. As we have a 45 nt target site, outcomes with more than 18 nucleotide differences from the reference would have been discarded (defining an upper limit on observed indel lengths). Next, barcodes were extracted from reads by identifying the portion of reads that matched the expected barcode with no more than eight mismatches, then any reads with errors in the barcodes (three or more mismatched bases) were filtered to ensure that target sites matched epegRNA identities. Then, reads were collapsed to ‘outcomes’ by identifying all reads with the same sequence. Outcomes that occurred at very low frequencies (<0.1% or ten total reads, whichever was higher) were filtered. We reasoned that these low-frequency outcomes likely represented PCR or sequencing errors rather than edits introduced by prime editing. To handle other outcomes likely containing systematic errors from low sequencing quality, we developed and applied the following algorithm: for each outcome, the mean sequencing quality score was calculated at each base; if the mean quality was below 15 and the base did not match the reference sequence, it was corrected to the reference base. This process was used sparingly, correcting a median of 33 reads per epegRNA–target pair across all four time points. After base correction, outcomes were globally aligned to their reference target sites and variants (substitutions, insertions and deletions) were called for each outcome. Each outcome was associated with zero (reference, no edits made) or more variants and classified as no edit (same as reference), precise edit (only variant is the intended edit) or error (contains a variant that is not the intended edit).
In stage three, all outcomes associated with individual epegRNA identities across all time points were aggregated into one file and the resulting individual files were concatenated for analysis. Any pairs with fewer than 50 reads at any of the four collected time points were removed from analysis, with a unified set of epegRNA–target pairs analyzed for both cell lines.
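Two of the stage-two filters lend themselves to a short worked example: the identity threshold on the 45 nt target site and the low-frequency outcome filter. The sketch below reads "<0.1% or ten total reads, whichever was higher" as a minimum count of max(0.1% of reads, 10) for a given epegRNA–target pair. That reading, and the function names, are assumptions rather than a description of the published pipeline.

```python
from collections import Counter


def max_differences(site_length=45, min_identity_pct=60):
    """Largest number of mismatched bases still meeting the identity cutoff."""
    # Ceiling division in integers: minimum matching bases required.
    min_matches = (min_identity_pct * site_length + 99) // 100
    return site_length - min_matches


def collapse_and_filter(target_reads, min_fraction=0.001, min_reads=10):
    """Collapse identical target sequences and drop rare outcomes.

    An outcome is kept only if its read count reaches both the fractional
    cutoff and the absolute cutoff (that is, the higher of the two).
    """
    counts = Counter(target_reads)
    total = sum(counts.values())
    threshold = max(min_fraction * total, min_reads)
    return {seq: n for seq, n in counts.items() if n >= threshold}
```

For a 45 nt site and a 60% cutoff, `max_differences()` returns 18, so a read with up to 18 differing bases is still extracted. For pairs with fewer than 10,000 reads the absolute cutoff of ten reads is the binding one; above that depth the 0.1% cutoff takes over.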
Self-targeting Tiled edits screen (lRM001)
Our second self-targeting screen was analyzed using a different, independent analysis pipeline:
For each paired read, the 20 nt epegRNA spacer sequence was extracted as bases 2–21 of R1 while the 12 bp construct barcode was extracted from R2 using the polyT termination sequence as an anchor to account for the fact that errors introduced during prime editing could alter the barcode’s absolute position in the read but not its position relative to the polyT. Reads with a valid barcode (exact match to reference) and spacer sequence with a Hamming distance ≤3 compared to the expected spacer sequence, using the barcode as ground truth, were then assigned to a given construct or design. While a valid barcode was detected in approximately 90% of the total reads per library, only 50% of total reads per library had a correct barcode:spacer match, including in the library prepared directly from the cloned plasmid pool (prepackaging and transduction), possibly due to recombination during PCR. Once assigned to a given construct, the R1 sequence of a paired read was searched for the constant epegRNA scaffold and tevopreQ1 regions by local alignment to their reference sequences using the align.localds() function within the pairwise2 module of Biopython (v.1.78), with scores of 1, −1, −5, −1 and 0 for matches, mismatches, gap initiations, gap extensions and mismatches with ambiguous nucleotides (‘N’s), respectively.
The epegRNA 3′ extension (composed of the PBS and RTT) within the read was then extracted by taking the entire sequence between these two alignments. A paired read was only considered valid and used to assess editing rate if the extracted 3′ extension was an exact match (both in sequence and length) to the reference 3′ extension, again using the extracted read barcode as ground truth. Overall, 90% of assigned reads passed this last validation step, although this percentage varied between epegRNA designs with some constructs obtaining <20% valid reads. A minimum threshold of 200 valid reads detected per library was used to filter out poorly represented constructs, resulting in a final list of 3,662 epegRNA designs for which prime editing rates could be assessed. From the list of valid paired reads per construct, editing outcomes were determined by comparing the 50 nt region between bases 6–56 of the R2 sequence to unedited and precisely edited reference sequences. Exact matches (Hamming distance 0) dictated the assignment whereas reads matching neither sequence were classified as containing errors. Due to these stringent requirements and a low frequency of errors introduced during PCR and/or sequencing, a background rate of valid reads containing errors was observed for all constructs as established by sequencing epegRNA–target cassettes from the cloned plasmid pool (prepackaging and transduction): typically ranging between 1 and 10% of valid reads with an average error rate of 3.9%.
Analysis of epegRNA phenotypes
StopPR screen (lAC002)
To analyze deep sequencing data from StopPR, we used custom Python scripts to exactly match sequencing reads to epegRNA spacer and extension sequences. Excluded from reported library numbers and statistics throughout the paper were pairs of spacer- and codon-matched stop and synonymous epegRNAs that did not pass an updated design filter, including pairs for which either epegRNA converted a stop codon to a different stop codon or erroneously specified an edit in a noncoding region (found after updating validation code). These constituted a small minority of epegRNA pairs (580 total; associated epegRNAs are indicated as ‘no' in ‘included_in_analysis’ column in Supplementary Tables 3 and 4). Notably, this set of excluded epegRNAs included 68 epegRNAs (designed as synonymous) targeting the intronic base directly adjacent to 3′ exon boundaries; this small number of epegRNAs with unintended targets was used in the Supplementary Discussion, ‘Examining a small subset of noncoding epegRNA phenotypes’. Additionally, we filtered any pairs of spacer- and codon-matched stop and synonymous epegRNAs for which either epegRNA had fewer than 200 reads at day 7 (23,024). At day 14 and day 28, a pseudocount of ten was added to all read counts to account for epegRNAs that had fully dropped out of the population. Enrichment of each epegRNA both at t = day 14 and t = day 28 was calculated as follows, where t0 = day 7:
Enrichment was then normalized by subtracting the median enrichment of negative control epegRNAs (NC, nontargeting controls), resulting in our final growth phenotype measurement:
Phenotypes per epegRNA were averaged across replicates for both day 14 and day 28, and all epegRNA phenotypes were converted to Z scores by dividing them by the standard deviation of the nontargeting control epegRNA phenotypes. A phenotype induction cutoff was set at two standard deviations below the mean enrichment of nontargeting controls (that is, a score of Z < −2) based on previous literature7. To determine a per-gene (or gene-level) stop epegRNA growth phenotype, the two stop epegRNAs with the largest absolute phenotypes for each gene were averaged.
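The scoring steps described in this section can be sketched end to end. Because the enrichment formula is not reproduced in the text, the sketch assumes a conventional form: the log2 ratio of an epegRNA's read fraction at time t (with the pseudocount of ten) to its read fraction at day 7. That functional form, the normalization by total reads and all function names are assumptions. Replicate averaging is omitted for brevity.

```python
import math
from statistics import mean, median, stdev

PSEUDOCOUNT = 10  # added to day 14 and day 28 counts, as stated in the text


def enrichment(count_t, count_t0, total_t, total_t0):
    """Assumed form: log2 ratio of read fractions at t versus t0."""
    frac_t = (count_t + PSEUDOCOUNT) / total_t
    frac_t0 = count_t0 / total_t0
    return math.log2(frac_t / frac_t0)


def z_scores(enrich, control_ids):
    """Subtract the control median, then divide by the control SD."""
    center = median([enrich[i] for i in control_ids])
    phenotype = {k: v - center for k, v in enrich.items()}
    sd = stdev([phenotype[i] for i in control_ids])
    return {k: v / sd for k, v in phenotype.items()}


def gene_level(z_by_epegrna, epegrna_to_gene, top_n=2):
    """Average the top_n stop epegRNAs with the largest |Z| per gene."""
    by_gene = {}
    for guide, z in z_by_epegrna.items():
        by_gene.setdefault(epegrna_to_gene[guide], []).append(z)
    return {
        gene: mean(sorted(zs, key=abs, reverse=True)[:top_n])
        for gene, zs in by_gene.items()
    }
```

An epegRNA would then be called as inducing a phenotype when its Z score falls below −2.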
StopPR validation
Enrichment of each epegRNA at t = day 14 was calculated as follows, where t0 = day 7:
Phenotypes per epegRNA were averaged across triplicates for day 14.
Multiple linear regression model
To investigate the effects of different epegRNA design choices on phenotypic outcomes in a more informative way than simple feature grouping (which can be confounded by the fact that epegRNAs within any such group unavoidably represent multiple features), we built a multiple linear regression model. First, we restricted our analysis to all stop epegRNAs that targeted a codon where phenotype induction was observed by at least one epegRNA. Subsetting the data in this manner isolated edits for which we had reasonable evidence that edit installation could induce phenotype. We reasoned that, in these cases, features other than the edit itself would determine differences in phenotype induction. This set of 51,279 stop epegRNAs was used to create a multiple linear regression model with the following features to predict day 14 phenotypes: edit distance from cut site (1–20 bp), edit length (1, 2 or 3 bp), edit installed (174 possibilities as no epegRNA specifying a CCT>TAA edit induced a phenotype), starting codon (59 possibilities), stop codon installed (TAG, TGA, TAA), PBS (11, 13 nt) and RTT (10, 12, 15, 20 nt) length, spacer orientation relative to gene (sense or antisense), edit location within gene body (0–100%) and edit located within last exon of transcript (yes or no). Discrete features (starting codon, stop codon installed, substitution type, spacer orientation, last exon) were given numerical encodings through the use of tenfold target encoding that, together with the coefficients from the resulting model, enabled a ranking of the relative importance of each category within the different features. We opted to use a target encoding approach to keep the dimensionality of our model low, as it directly replaces categorical features with their phenotypic mean. RTT length and edit position were given additional quadratic terms in the model to account for the observed preference for 15 nt RTT length and edits within the PAM region (Fig. 4a and Extended Data Fig. 5a). After encoding, all features were centered and scaled by subtracting the mean and dividing by the standard deviation of each feature, and then the model was fit (Supplementary Table 5).
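Target encoding with cross-validation folds can be illustrated with a small, dependency-free sketch. It assumes that "tenfold target encoding" means each observation's category is replaced by the mean phenotype of that category computed from the other nine folds, which avoids an observation contributing to its own encoding. The fold assignment by index, the fallback to the global mean and the use of the population standard deviation for scaling are all assumptions.

```python
from statistics import mean, pstdev


def kfold_target_encode(categories, targets, k=10):
    """Out-of-fold target encoding of one categorical feature.

    Observation i belongs to fold i % k. Its encoding is the mean target of
    its category over all observations in the other folds, or the global
    mean if the category does not occur there.
    """
    n = len(categories)
    if n != len(targets):
        raise ValueError("categories and targets must have equal length")
    global_mean = mean(targets)
    encoded = [0.0] * n
    for fold in range(k):
        sums, counts = {}, {}
        for i in range(n):
            if i % k == fold:
                continue  # held-out observations do not inform the encoding
            c = categories[i]
            sums[c] = sums.get(c, 0.0) + targets[i]
            counts[c] = counts.get(c, 0) + 1
        for i in range(fold, n, k):
            c = categories[i]
            encoded[i] = sums[c] / counts[c] if c in counts else global_mean
    return encoded


def standardize(values):
    """Center and scale a feature to zero mean and unit standard deviation."""
    mu = mean(values)
    sd = pstdev(values)
    if sd == 0:
        raise ValueError("feature has zero variance")
    return [(v - mu) / sd for v in values]
```

Encoded and standardized features could then be passed, together with the quadratic terms, to any ordinary least-squares routine.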
ePRIDICT evaluation
We used ePRIDICT48 to generate chromatin favorability scores for prime editing for each stop epegRNA that survived filters in StopPR. For a small number of edits (639), ePRIDICT was missing needed chromatin features and thus did not generate scores, leaving a set of 101,857 stop epegRNAs targeting 15,008 codons for analysis. We defined a codon-level ePRIDICT score as the average ePRIDICT score from all targeted genomic positions within the same codon, and subsequently defined codons with score >50 as having a favorable chromatin context, and those with score <35 as having an unfavorable chromatin context, following thresholds for high and low scores defined in the original publication48.
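The codon-level scoring and thresholding described above amount to an average followed by a three-way classification. The sketch below takes the per-position ePRIDICT scores for one codon as a plain list. The function name and the "intermediate" label for scores between the two thresholds are assumptions.

```python
from statistics import mean


def codon_context(position_scores, high=50.0, low=35.0):
    """Average per-position scores for a codon and label its chromatin context."""
    score = mean(position_scores)
    if score > high:
        label = "favorable"
    elif score < low:
        label = "unfavorable"
    else:
        label = "intermediate"
    return score, label
```

Both thresholds are strict, so a codon scoring exactly 50 or exactly 35 falls into the intermediate group.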
TSS analysis
To compare phenotypes for stop and control epegRNAs in our StopPR screen with respect to the targeted Cas9(H840A) nick’s distance from the transcription start site (TSS), we used Ensembl to obtain TSS coordinates for all genes targeted in our library. In the case that a gene had more than one annotated TSS, the closest TSS to the targeted Cas9(H840A) nick was used. We removed the subset of synonymous control epegRNAs that were found to target the D−1 position at a canonical splice site (2,637 epegRNAs), as phenotypes that may result from edits specified by these epegRNAs could be attributed to their edit as opposed to a possible CRISPRi effect.
Statistical testing and reproducibility
To compare top two stop epegRNA Z scores between bins of K562 CRISPRi phenotypes, edit position type, edit length, edit location in gene and stop codon installed, we used a one-way analysis of variance (ANOVA) followed by two-sided Tukey’s post hoc test. To compare top two stop epegRNA enrichment values between binarized features including RTT and PBS lengths, spacer orientation relative to gene, and installation in the last exon, we used a two-sided two-sample t-test. When comparing all sense and antisense stop epegRNAs targeting the same substitution, we used a two-sided two-sample t-test. When comparing epegRNAs targeting binned positions from the TSS, we used a two-sided two-sample t-test (all P values reported in Supplementary Table 6). KEGG pathway analysis was performed using ShinyGO web application69,70,71. Effect size analysis was performed using the cohen.d function from the effsize R package with default parameters, with Cohen’s d measurements greater than 0.8 in magnitude generally considered large72. For odds ratio calculations in the ePRIDICT analysis, a total of 101,857 stop epegRNAs were included, with 17,445 inducing phenotype (Z < −2). There were 36,223 stop epegRNAs identified with favorable chromatin contexts, of which 8,373 induced a phenotype, while 1,002 stop epegRNAs targeted codons in unfavorable contexts, with 144 of those inducing phenotype. For all analyses, nonsignificant P ≥ 0.05, *P < 0.05, **P < 0.01, ***P < 0.001.
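The counts given above for the ePRIDICT analysis are sufficient to compute odds ratios. The text does not state which groups were contrasted, so the sketch below computes three plausible comparisons from the reported numbers. The choice of comparisons and the function name are assumptions.

```python
def odds_ratio(hits_a, total_a, hits_b, total_b):
    """Odds of phenotype induction in group A relative to group B."""
    odds_a = hits_a / (total_a - hits_a)
    odds_b = hits_b / (total_b - hits_b)
    return odds_a / odds_b


# Counts reported in the text (stop epegRNAs; hits are those with Z < -2).
ALL, ALL_HITS = 101_857, 17_445
FAV, FAV_HITS = 36_223, 8_373
UNFAV, UNFAV_HITS = 1_002, 144

# Favorable versus unfavorable chromatin context.
fav_vs_unfav = odds_ratio(FAV_HITS, FAV, UNFAV_HITS, UNFAV)

# Each context versus all remaining stop epegRNAs.
fav_vs_rest = odds_ratio(FAV_HITS, FAV, ALL_HITS - FAV_HITS, ALL - FAV)
unfav_vs_rest = odds_ratio(UNFAV_HITS, UNFAV, ALL_HITS - UNFAV_HITS, ALL - UNFAV)
```

With these counts, the favorable group has higher odds of phenotype induction than either comparison group, and the unfavorable group has lower odds than the remainder.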
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
Data availability
Processed data from all screens are available as Supplementary Tables 1–4. Raw sequencing data from all screens and endogenous target site experiments have been deposited into the NCBI Sequence Read Archive database under accession no. PRJNA1159206. epegRNA libraries lDS004 (Addgene no. 227707), lRM001 (Addgene no. 227708) and lAC002 (Addgene no. 227706; also referred to as StopPR) generated in this study have been deposited to Addgene, along with backbone plasmids pAC025 (Addgene no. 227597) and pAC026 (Addgene no. 227598). Cell lines are available on request. Source data are provided with this paper.
Code availability
Scripts used to process data from the +5 G>H self-targeting screen are available via GitHub at https://github.com/simpsondl/TSpeg. Scripts used to process data from the Tiled edits self-targeting screen are available via GitHub at https://github.com/badamsonlab/lRM001_pegRNA_screen. Scripts used to process data from the StopPR screen are available via GitHub at https://github.com/anncir1/StablePE.
References
Bick, A. G. et al. Genomic data in the All of Us Research Program. Nature https://doi.org/10.1038/s41586-023-06957-x (2024).
Stenson, P. D. et al. The Human Gene Mutation Database (HGMD®): optimizing its use in a clinical diagnostic or research setting. Hum. Genet. 139, 1197–1207 (2020).
Sahni, N. et al. Widespread macromolecular interaction perturbations in human genetic disorders. Cell 161, 647–660 (2015).
Stratton, M. R., Campbell, P. J. & Futreal, P. A. The cancer genome. Nature 458, 719–724 (2009).
Schmidt, R. et al. Base-editing mutagenesis maps alleles to tune human T cell functions. Nature https://doi.org/10.1038/s41586-023-06835-6 (2023).
Cuella-Martin, R. et al. Functional interrogation of DNA damage response variants with base editing screens. Cell 184, 1081–1097.e19 (2021).
Hanna, R. E. et al. Massively parallel assessment of human variants with base editor screens. Cell 184, 1064–1080.e20 (2021).
Cornu, T. I. & Cathomen, T. Targeted genome modifications using integrase‐deficient lentiviral vectors. Mol. Ther. 15, 2107–2113 (2007).
Woods, N. T. et al. Functional assays provide a robust tool for the clinical annotation of genetic variants of uncertain significance. NPJ Genomic Med. 1, 16001 (2016).
Giacomelli, A. O. et al. Mutational processes shape the landscape of TP53 mutations in human cancer. Nat. Genet. 50, 1381–1387 (2018).
Kotler, E. et al. A systematic p53 mutation library links differential functional impact to cancer mutation pattern and evolutionary conservation. Mol. Cell 71, 178–190.e8 (2018).
Findlay, G. M., Boyle, E. A., Hause, R. J., Klein, J. C. & Shendure, J. Saturation editing of genomic regions by multiplex homology-directed repair. Nature 513, 120–123 (2014).
Findlay, G. M. et al. Accurate classification of BRCA1 variants with saturation genome editing. Nature 562, 217–222 (2018).
Koblan, L. W. et al. Efficient C•G-to-G•C base editors developed using CRISPRi screens, target-library analysis, and machine learning. Nat. Biotechnol. 39, 1414–1425 (2021).
Chen, L. et al. Adenine transversion editors enable precise, efficient A•T-to-C•G base editing in mammalian cells and embryos. Nat. Biotechnol. https://doi.org/10.1038/s41587-023-01821-9 (2023).
Sánchez-Rivera, F. J. et al. Base editing sensor libraries for high-throughput engineering and functional analysis of cancer-associated single nucleotide variants. Nat. Biotechnol. 40, 862–873 (2022).
Chen, P. J. & Liu, D. R. Prime editing for precise and highly versatile genome manipulation. Nat. Rev. Genet. 24, 161–177 (2023).
Komor, A. C., Kim, Y. B., Packer, M. S., Zuris, J. A. & Liu, D. R. Programmable editing of a target base in genomic DNA without double-stranded DNA cleavage. Nature 533, 420–424 (2016).
Gaudelli, N. M. et al. Programmable base editing of A•T to G•C in genomic DNA without DNA cleavage. Nature 551, 464–471 (2017).
Kurt, I. C. et al. CRISPR C-to-G base editors for inducing targeted DNA transversions in human cells. Nat. Biotechnol. 39, 41–46 (2021).
Tong, H. et al. Programmable A-to-Y base editing by fusing an adenine base editor with an N-methylpurine DNA glycosylase. Nat. Biotechnol. 41, 1080–1084 (2023).
Wang, J. Y. & Doudna, J. A. CRISPR technology: a decade of genome editing is only the beginning. Science 379, eadd8643 (2023).
Rees, H. A. & Liu, D. R. Base editing: precision chemistry on the genome and transcriptome of living cells. Nat. Rev. Genet. 19, 770–788 (2018).
Anzalone, A. V., Koblan, L. W. & Liu, D. R. Genome editing with CRISPR-Cas nucleases, base editors, transposases and prime editors. Nat. Biotechnol. 38, 824–844 (2020).
Anzalone, A. V. et al. Search-and-replace genome editing without double-strand breaks or donor DNA. Nature 576, 149–157 (2019).
Wang, T., Wei, J. J., Sabatini, D. M. & Lander, E. S. Genetic screens in human cells using the CRISPR-Cas9 system. Science 343, 80–84 (2014).
Shalem, O. et al. Genome-scale CRISPR-Cas9 knockout screening in human cells. Science 343, 84–87 (2014).
Konermann, S. et al. Genome-scale transcriptional activation by an engineered CRISPR-Cas9 complex. Nature 517, 583–588 (2015).
Gilbert, L. A. et al. Genome-scale CRISPR-mediated control of gene repression and activation. Cell 159, 647–661 (2014).
Nuñez, J. K. et al. Genome-wide programmable transcriptional memory by CRISPR-based epigenome editing. Cell 184, 2503–2519.e17 (2021).
Erwood, S. et al. Saturation variant interpretation using CRISPR prime editing. Nat. Biotechnol. 40, 885–895 (2022).
Ren, X. et al. High-throughput PRIME-editing screens identify functional DNA variants in the human genome. Mol. Cell 83, 4633–4645.e9 (2023).
Chardon, F. M. et al. A multiplex, prime editing framework for identifying drug resistance variants at scale. Preprint at bioRxiv https://doi.org/10.1101/2023.07.27.550902 (2023).
Gould, S. I. et al. High-throughput evaluation of genetic variants with prime editing sensor libraries. Nat. Biotechnol. https://doi.org/10.1038/s41587-024-02172-9 (2024).
Kim, Y., Oh, H.-C., Lee, S. & Kim, H. H. Saturation resistance profiling of EGFR variants against tyrosine kinase inhibitors using prime editing. Preprint at bioRxiv https://doi.org/10.1101/2023.12.03.569825 (2023).
Martyn, G. E. Rewriting regulatory DNA to dissect and reprogram gene expression. Preprint at bioRxiv https://doi.org/10.1101/2023.12.20.572268 (2023).
Herger, M. et al. High-throughput screening of human genetic variants by pooled prime editing. Preprint at bioRxiv https://doi.org/10.1101/2024.04.01.587366 (2024).
Niu, X. et al. Prime editor-based high-throughput screening reveals functional synonymous mutations in the human genome. Preprint at bioRxiv https://doi.org/10.1101/2024.06.16.599253 (2024).
Chen, P. J. et al. Enhanced prime editing systems by manipulating cellular determinants of editing outcomes. Cell 184, 5635–5652.e29 (2021).
Yan, J. et al. Improving prime editing with an endogenous small RNA-binding protein. Nature 628, 639–647 (2024).
Nelson, J. W. et al. Engineered pegRNAs improve prime editing efficiency. Nat. Biotechnol. 40, 402–410 (2022).
Choi, J. et al. Precise genomic deletions using paired prime editing. Nat. Biotechnol. 40, 218–226 (2022).
Ferreira da Silva, J. et al. Prime editing efficiency and fidelity are enhanced in the absence of mismatch repair. Nat. Commun. 13, 760 (2022).
Lahue, R. S., Au, K. G. & Modrich, P. DNA mismatch correction in a defined system. Science 245, 160–164 (1989).
Su, S. S., Lahue, R. S., Au, K. G. & Modrich, P. Mispair specificity of methyl-directed DNA mismatch correction in vitro. J. Biol. Chem. 263, 6829–6835 (1988).
Shen, M. W. et al. Predictable and precise template-free CRISPR editing of pathogenic variants. Nature 563, 646–651 (2018).
Kim-Yip, R. P. et al. Efficient prime editing in two-cell mouse embryos using PEmbryo. Nat. Biotechnol. https://doi.org/10.1038/s41587-023-02106-x (2024).
Mathis, N. et al. Machine learning prediction of prime editing efficiency across diverse chromatin contexts. Nat. Biotechnol. https://doi.org/10.1038/s41587-024-02268-2 (2024).
Allen, F. et al. Predicting the mutations generated by repair of Cas9-induced double-strand breaks. Nat. Biotechnol. 37, 64–72 (2019).
Kim, H. K. et al. SpCas9 activity prediction by DeepSpCas9, a deep learning–based model with high generalization performance. Sci. Adv. 5, eaax9249 (2019).
Arbab, M. et al. Determinants of base editing outcomes from target library analysis and machine learning. Cell 182, 463–480.e30 (2020).
Kim, H. K. et al. Predicting the efficiency of prime editing guide RNAs in human cells. Nat. Biotechnol. 39, 198–206 (2021).
Mathis, N. et al. Predicting prime editing efficiency and product purity by deep learning. Nat. Biotechnol. https://doi.org/10.1038/s41587-022-01613-7 (2023).
Yu, G. et al. Prediction of efficiencies for diverse prime editing systems in multiple cell types. Cell 186, 2256–2272.e23 (2023).
Lim, F. et al. Affinity-optimizing enhancer variants disrupt development. Nature 626, 151–159 (2024).
Liu, Y. et al. Efficient generation of mouse models with the prime editing system. Cell Discov. 6, 27 (2020).
DepMap 22Q2 Public. Figshare https://doi.org/10.6084/m9.figshare.19700056.v2 (2022).
Doench, J. G. et al. Optimized sgRNA design to maximize activity and minimize off-target effects of CRISPR-Cas9. Nat. Biotechnol. 34, 184–191 (2016).
Horlbeck, M. A. et al. Compact and highly active next-generation libraries for CRISPR-mediated gene repression and activation. eLife 5, e19760 (2016).
Li, X. et al. Chromatin context-dependent regulation and epigenetic manipulation of prime editing. Cell 187, 2411–2427.e25 (2024).
Lykke-Andersen, S. & Jensen, T. H. Nonsense-mediated mRNA decay: an intricate machinery that shapes transcriptomes. Nat. Rev. Mol. Cell Biol. 16, 665–677 (2015).
Sibley, C. R., Blazquez, L. & Ule, J. Lessons from non-canonical splicing. Nat. Rev. Genet. 17, 407–421 (2016).
Zhang, S. et al. Base-specific mutational intolerance near splice sites clarifies the role of nonessential splice nucleotides. Genome Res. 28, 968–974 (2018).
Li, X. et al. Highly efficient prime editing by introducing same-sense mutations in pegRNA or stabilizing its structure. Nat. Commun. 13, 1669 (2022).
Koeppel, J. et al. Prediction of prime editing insertion efficiencies using sequence features and DNA repair determinants. Nat. Biotechnol. 41, 1446–1456 (2023).
Chen, B. et al. Dynamic imaging of genomic loci in living human cells by an optimized CRISPR/Cas system. Cell 155, 1479–1491 (2013).
Cunningham, F. et al. Ensembl 2022. Nucleic Acids Res. 50, D988–D995 (2022).
Clement, K. et al. Accurate and rapid analysis of genome editing data from nucleases and base editors with CRISPResso2. Nat. Biotechnol. 37, 224–226 (2019).
Ge, S. X., Jung, D. & Yao, R. ShinyGO: a graphical gene-set enrichment tool for animals and plants. Bioinformatics 36, 2628–2629 (2020).
Luo, W. & Brouwer, C. Pathview: an R/Bioconductor package for pathway-based data integration and visualization. Bioinformatics 29, 1830–1831 (2013).
Kanehisa, M., Furumichi, M., Sato, Y., Ishiguro-Watanabe, M. & Tanabe, M. KEGG: integrating viruses and cellular organisms. Nucleic Acids Res. 49, D545–D551 (2021).
Cohen, J. Statistical power analysis. Curr. Dir. Psychol. Sci. 1, 98–101 (1992).
Acknowledgements
We thank members of the Adamson laboratory for feedback and support throughout the course of this project. We thank former laboratory member K. Tam for helpful discussions. We thank W. Wang, J. Miller and J. A. Volmar (Genomics Core Facility of Princeton University), as well as T. DeCoste and K. Rittenbach (Flow Cytometry Core Facility of Princeton University). Research was supported by the National Institutes of Health under grant award numbers R35GM138167 (B.A.), RM1HG009490 (B.A.), R01-GM076275 (M.S.), P30CA072720 (Rutgers Cancer Institute of New Jersey via a National Institutes of Health Cancer Center Support Grant), F31HD113443 (S.C.S.), T32HG003284 (Princeton QCB training grant) and T32GM007388 (Princeton MOL training grant), as well as the New Jersey Commission on Cancer Research (NJCCR) under grant award number COCR24PRF021 (predoctoral fellowship, R.M.), the Eric and Wendy Schmidt Transformative Technology Fund (B.A.), NSF CAREER 2239957 (E.K.F.) and Princeton University (B.A.). A.C. was supported by the National Science Foundation Graduate Research Fellowship Program (grant no. DGE-2039656). J.Y. was supported by a fellowship provided by the China Scholarship Council (CSC), based on the April 2015 Memorandum of Understanding between the CSC and Princeton University.
Author information
Contributions
Generation of cell lines was done by P.R. and J.Y. lDS004 library design was carried out by A.C. lAC002 library design was done by D.S. lRM001 library design was done by R.M., F.L. and E.K.F. lDS004 and lAC002 library cloning was carried out by A.C., D.S., P.R. and S.C.S. lDS004 and lAC002 screening was done by A.C., P.R. and S.C.S. lDS004 and lAC002 screen analysis was done by A.C. and D.S. lRM001 cloning, screening and screen analysis were carried out by R.M. Validation experiments were done by W.Y. Western blotting was performed by S.C.S. The paper was written by A.C., D.S. and B.A., with contributions from all authors. M.S. and B.A. supervised the project. Project administration was carried out by B.A.
Ethics declarations
Competing interests
B.A. is an advisory board member with options for Arbor Biotechnologies and Tessera Therapeutics. B.A. holds equity in Celsius Therapeutics. J.Y. and B.A. have filed patent application(s) on prime editing technologies. The remaining authors declare no competing interests.
Peer review
Peer review information
Nature Methods thanks Francisco Sanchez-Rivera, Hao Yin and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Primary Handling Editor: Lei Tang, in collaboration with the Nature Methods team.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data
Extended Data Fig. 1 Continuous prime editing at two endogenous loci produces very low frequencies of on-target, unwanted editing (errors).
a, Percentages of sequencing reads with unintended, on-target edits (errors) from indicated conditions. Corresponding frequencies of precise editing presented in Fig. 1d. Data and error bars represent mean +/- s.d. (n = 3 independent biological replicates). b, Sequences and frequencies of alleles observed at the targeted MLH1 locus in our PEmaxKO cell line (characterization of n = 1 cell line). Figure made with CRISPResso2 (ref. 68).
Extended Data Fig. 2 Results from +5 G > H self-targeting sensor screens demonstrate precise editing with low errors over one month of continuous prime editing.
a, Count distribution of epegRNA-target reads from +5 G > H self-targeting sensor library (plasmid; characterization of n = 1 library). Gini coefficient indicates the degree of inequality on a scale from 0 (perfectly equal representation among all elements) to 1 (only one element represented). Typically, libraries with coefficients less than 0.30 are reasonably well distributed. Here, 90% of library elements occurred within a 6.1-fold range. b, Percentages of sequencing reads from sensor targets containing only the precise edit from two replicates of +5 G > H screens. Dots represent data from individual epegRNA-target pairs. Correlation between replicates (Pearson’s r) indicated. Density plots on top and side show data distribution for replicate 1 and 2, respectively. c, As in b, but for percentages of sequencing reads from sensor targets containing errors. d, Comparison between precise editing efficiencies observed in our +5 G > H screens (y-axis, replicate-averaged) and a previous report (x-axis). Screen performed in previous report used transiently expressed PE2 and stably expressed pegRNAs in HEK293T cells and edited for only five days post transduction (ref. 52). Only edits evaluated in both studies (+5 G > C) included. Measurements from +5 G > H screens performed in PEmax and PEmaxKO cells shown in brown and green, respectively, sampled at day 28 post transduction. e, Heatmap depicting median, replicate-averaged percentages of sequencing reads from sensor targets containing only the precise edit for different RTT and PBS lengths of experimental epegRNAs (percentages indicated). Data from cells collected on indicated days post transduction of sensor library from indicated +5 G > H screen. RTT and PBS combinations that were used to target <5 sensor sequences or were used in <10 epegRNA-target pairs were excluded. f, As in e, but only for non-G > C edits.
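The Gini coefficient mentioned in panel a can be illustrated with a short calculation. The following is a minimal sketch using the standard rank-based formula on a list of per-element read counts; it is not taken from the authors' analysis code, and the function name `gini` is hypothetical.

```python
def gini(counts):
    """Gini coefficient of non-negative read counts.

    0 means every library element has the same count. With n elements the
    maximum is (n - 1) / n, which approaches 1 for large libraries when a
    single element holds all reads.
    """
    xs = sorted(counts)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        raise ValueError("need at least one element and a non-zero total")
    # Rank-weighted sum over the sorted counts (ranks start at 1).
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


# A perfectly even library versus one dominated by a single element.
even = gini([5, 5, 5, 5])
skewed = gini([0, 0, 0, 10])
```

Under the rule of thumb quoted in the legend, a library whose coefficient falls below 0.30 would be considered reasonably well distributed.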
Extended Data Fig. 3 Results from Tiled edits self-targeting sensor screen demonstrate precise editing with low errors across 1 bp substitution edits.
a, Percentages of sequencing reads from sensor targets containing only the precise edit from two replicates of Tiled edits screen performed in PEmaxKO cells and collected at day 5 (left), day 8 (middle), and day 10 (right) post transduction of sensor library. Each data point represents an individual epegRNA-target pair, with best epegRNA design per edit indicated in green, and additional controls indicated in red (negative) and cyan (positive). Correlation between replicates (Pearson’s r) indicated. Inset plots show percentages of sequencing reads from sensor targets containing errors for replicate 1 (x-axis) and replicate 2 (y-axis) from respective timepoint. b, Replicate-averaged percentages of sequencing reads with only the precise edit (green), any error (red), or unedited sequence (blue) for each epegRNA design targeting the +10 T > C edit at target site 1 in Tiled edits screen. Results from day 10 post transduction of sensor library. Bars are stacked per epegRNA to show cumulative total of sequencing reads. Individual data points for replicates provided in Supplementary Table 2. c, Replicate-averaged percentages of sequencing reads containing only the precise edit or errors for epegRNA-target pairs from the Tiled edits screen for the best epegRNA per edit (maximum precise editing) on day 10 post transduction of sensor library. Boxplots indicate the median and interquartile range (IQR) for each group with whiskers extending 1.5 × IQR past the upper and lower quartiles. Left, all edits (n = 195 edits). Right, all edits except A > G substitutions, which may occur during ADAR-based editing of the epegRNA-target constructs during lentiviral packaging (ref. 53) (n = 167 edits). d, Maximum precise editing frequencies (replicate-averaged) across epegRNA designs for each edit in the Tiled edits screen, separated by target site. x-axis indicates the edit position relative to the nick. y-axis specifies the targeted edit. na indicates an unedited nucleotide at that position, which was not measured in the screen.
Extended Data Fig. 4 Growth phenotypes from stop epegRNAs accumulate but become noisier over time.
a, Schematic of epegRNA expression cassette. Purple regions varied coordinately across StopPR. mU6, modified mouse U6 promoter; RTT, reverse transcriptase template; PBS, primer binding site. b, Top 15 KEGG pathways enriched in full set of targeted genes in StopPR compared to all human genes. Image generated with ShinyGO (refs. 69–71). c, Schematic of StopPR screen. d, Normalized read counts (relative to total sequencing reads) for each epegRNA from independent biological replicates of StopPR screen, at day 14 post transduction of library. Data points colored by density, indicated by number of neighbors. Correlation between replicates (Pearson’s r) indicated. e, Growth phenotypes for epegRNAs from independent biological replicates of StopPR screen at 28 days post transduction of library. Correlation (Pearson’s r) indicated for each epegRNA category. Density plots on top and side show data distribution per epegRNA category for replicate 1 and 2, respectively. f, Growth phenotypes (replicate-averaged) from StopPR screen at 14 and 28 days post transduction of library. Density plots on top and side show data distribution per epegRNA category for day 14 and day 28, respectively. g, Same as in f, except presented as the log2 transformation of the epegRNA enrichment fold change (prior to converting to Z scores; Methods). h, Fold change of cells expressing epegRNAs over two weeks post transduction of epegRNAs (determined by loss of co-expressed BFP). Data from day 14 also depicted in Fig. 3f. Corresponding target site sequencing shown in i and Fig. 3g. i, Percentages of sequencing reads containing errors from endogenous sites (7 days post transduction of same epegRNA constructs as in h). Left, stop (red) and synonymous (blue) control epegRNAs targeting the same codon. Right, no edit control epegRNAs (gray). Dotted lines in e-g denote phenotype cutoffs (Z < -2). Data and error bars in h, i represent mean +/- s.d. (n = 3 independent biological replicates).
c, Created with BioRender.com.
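The relationship between read counts, log2 fold changes and Z-scored growth phenotypes referred to in panels d-g can be illustrated roughly as follows. This is a minimal sketch and not the authors' pipeline, which is specified in the Methods. It assumes a pseudocount of 1, normalization to total reads per sample, and Z scores computed against the mean and standard deviation of a set of control epegRNAs; all function names and the example identifiers are hypothetical.

```python
import math
import statistics


def log2_fold_change(count_end, count_start, total_end, total_start, pseudocount=1.0):
    """log2 ratio of read fractions at the end versus the start of the screen."""
    frac_end = (count_end + pseudocount) / total_end
    frac_start = (count_start + pseudocount) / total_start
    return math.log2(frac_end / frac_start)


def z_scores(lfc_by_guide, control_ids):
    """Center and scale every log2 fold change by the control distribution (assumed scheme)."""
    controls = [lfc_by_guide[g] for g in control_ids]
    mu = statistics.mean(controls)
    sd = statistics.stdev(controls)
    return {g: (v - mu) / sd for g, v in lfc_by_guide.items()}


def dropout_hits(z_by_guide, cutoff=-2.0):
    """Guides whose Z score falls below the phenotype cutoff (Z < -2 in the legend)."""
    return sorted(g for g, z in z_by_guide.items() if z < cutoff)


# Toy data: three controls scattered around zero and one depleted stop epegRNA.
lfc = {"ctrl1": -0.1, "ctrl2": 0.0, "ctrl3": 0.1, "stopA": -1.5}
z = z_scores(lfc, ["ctrl1", "ctrl2", "ctrl3"])
hits = dropout_hits(z)
```

In this toy example only `stopA` passes the cutoff; the controls sit within one standard deviation of the control mean by construction.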
Extended Data Fig. 5 epegRNA and endogenous target features vary in their impact on phenotype.
a, Growth phenotypes for stop epegRNAs (top two per gene) from StopPR screen sampled from day 14 post transduction, binned by indicated features. Feature categories and analyzed epegRNAs comprise those used for effect size analysis in Fig. 4a. For features with two categories, a two-sided, two-sample t-test was performed. For features with three categories, a one-way ANOVA followed by a two-sided Tukey post-hoc test was performed. Resulting p-values indicated and number of epegRNAs analyzed for each feature category denoted as n. F or t statistics are as follows (left to right, top to bottom): 115.9, 29.2, -7.73, -4.91, 5.27, 11.4, 2.61, and -2.42. Median and interquartile range (IQR) of the full set of epegRNAs used in this analysis per group are indicated. Whiskers extend 1.5 × IQR past the upper and lower quartiles. b, Average growth phenotypes for stop epegRNAs specifying individual substitution types targeted in StopPR with both sense (blue) and antisense (red) protospacers. Data from cells sampled from day 14 post transduction of library. Numbers of stop epegRNAs denoted as n and p-values from two-sided, two-sample t-test indicated. c, Schematic of sense and antisense protospacers relative to gene.
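For readers who want to see how a two-category comparison of this kind could be computed, the sketch below calculates a two-sample t statistic and Cohen's d from two groups of Z scores. It is illustrative only: the legend does not state whether equal variances were assumed, so the pooled-variance (Student) form used here is an assumption, the link to the effect sizes in Fig. 4a is likewise assumed, and the function names are hypothetical.

```python
import math
import statistics


def pooled_sd(a, b):
    """Pooled standard deviation of two samples (equal-variance assumption)."""
    na, nb = len(a), len(b)
    va, vb = statistics.variance(a), statistics.variance(b)
    return math.sqrt(((na - 1) * va + (nb - 1) * vb) / (na + nb - 2))


def cohens_d(a, b):
    """Standardized mean difference between groups a and b."""
    return (statistics.mean(a) - statistics.mean(b)) / pooled_sd(a, b)


def student_t(a, b):
    """Two-sample t statistic with pooled variance; df = len(a) + len(b) - 2."""
    na, nb = len(a), len(b)
    diff = statistics.mean(a) - statistics.mean(b)
    return diff / (pooled_sd(a, b) * math.sqrt(1 / na + 1 / nb))


# Toy Z scores for two hypothetical feature categories.
group_a = [-3.0, -2.0, -1.0]
group_b = [0.0, 1.0, 2.0]
d = cohens_d(group_a, group_b)
t = student_t(group_a, group_b)
```

A two-sided p-value would then come from the t distribution with the stated degrees of freedom, which requires a statistics library and is omitted here.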
Extended Data Fig. 6 epegRNAs do not show systemic CRISPRi effect.
a, Replicate-averaged growth phenotypes for stop and spacer- and codon-matched synonymous epegRNAs from StopPR screen at day 14 post transduction of library, for the subset in which the matched synonymous epegRNA targets a +5-6 position (PAM-disrupting). b, Replicate-averaged growth phenotypes for stop, spacer- and codon-matched synonymous, and no edit epegRNAs from StopPR screen at day 14 post transduction of library. epegRNAs grouped by distance to the closest transcription start site (TSS) annotated in the targeted gene. Number of epegRNAs in each group denoted above as n. Two-sided t-test p-values comparing phenotypes between relevant groups reported in Supplementary Table 6. Median and interquartile range (IQR) of the set of epegRNAs targeting indicated distances from the TSS are shown, excluding epegRNAs targeting the D-1 position (see section “Unbiased identification of splice site variants”). Whiskers extend 1.5 × IQR past the upper and lower quartiles. c, Replicate-averaged growth phenotypes for stop and spacer- and codon-matched synonymous epegRNAs (excluding D-1 epegRNAs) from StopPR screen at day 14 post transduction of library, for the subset in which the matched synonymous epegRNA targets a non-PAM-disrupting position within the indicated distance from the TSS. Non-PAM-disrupting synonymous epegRNAs target a non +5-6 position and could allow for continued target engagement after editing. epegRNAs targeting closest to the TSS are most expected to show CRISPRi activity, if present (ref. 29). d, As in c, but for the subset in which the matched synonymous epegRNA targets a PAM-disrupting position within the indicated distance from the TSS. e, Fold change of cells expressing epegRNAs over two weeks post transduction of epegRNAs (determined by loss of co-expressed BFP). Analysis for 6 synonymous epegRNAs disrupting D-1 splice sites and spacer-matched no edit control epegRNAs.
Data for n = 3 independent biological replicates shown, lines and error bars represent mean +/- s.d. Dotted lines in a-d denote phenotype cutoffs (Z < -2).
Supplementary information
Supplementary Information
Supplementary Discussion, Table Legends, Table 7 and Figs. 1–3.
Supplementary Tables 1–3
Supplementary Table 1. Self-targeting +5 G>H sensor library design and results, related to Fig. 2 and Extended Data Fig. 2. Information about the design of each epegRNA and its associated target site used in the self-targeting +5 G>H sensor screens, along with resulting editing and error rates at each timepoint. Specifically, 'Freq.EditOnly' (proportion of reads with only the intended edit) was used when reporting precise editing rates, and 'Freq.TotalErrors' (proportion of reads with at least one unintended edit) was used when reporting error rates. Additionally, 'Freq.TotalWildType' (proportion of reads that were unedited) is included. Average columns report the indicated frequencies averaged across replicates for the same epegRNA–target pair for the relevant screen and sampled timepoint.
Supplementary Table 2. Self-targeting Tiled edits sensor library design and results, related to Fig. 2 and Extended Data Fig. 3. Information about the design of each epegRNA and its associated target site used in the self-targeting Tiled edits sensor screen, along with resulting editing and error rates at each timepoint. Specifically, '% PE' (proportion of reads with only the intended edit) was used when reporting precise editing rates, '% error' (proportion of reads with at least one unintended edit) was used when reporting error rates and '% WT' (proportion of reads that exactly matched the unedited sequence) was used when reporting unedited rates.
Supplementary Table 3. StopPR library design and results for control epegRNAs, related to Figs. 3 and 4 and Extended Data Figs. 4–6. Information about the design of each control epegRNA (nontargeting, no edit and synonymous) used in the StopPR screen, along with resulting growth phenotypes (reported as Z scores) at each timepoint. Average columns report the indicated Z-score (either at day 14 or day 28 post transduction) averaged across replicates for the same epegRNA. Column titled 'included_in_analysis' is denoted as yes or no for each epegRNA if it is included or not included in StopPR screen analysis, respectively.
Supplementary Tables 4–6
Supplementary Table 4. StopPR library design and results for stop epegRNAs, related to Figs. 3 and 4 and Extended Data Figs. 4–6. Information about the design of each stop epegRNA used in the StopPR screen, along with resulting growth phenotypes (reported as Z scores) at each timepoint. Average columns report the indicated Z-score (either at day 14 or day 28 post transduction) averaged across replicates for the same epegRNA. Column titled 'included_in_analysis' is denoted as yes or no for each epegRNA if it is included or not included in StopPR screen analysis, respectively.
Supplementary Table 5. Multilinear model for impact of epegRNA and endogenous target features, related to the ‘Multiple linear regression model’ section in the Methods. Model coefficients and significance for the multilinear model discussed in the Results. Model used two-sided t-test. Model encodings used for categorical variables are also provided.
Supplementary Table 6. Two-sided t-test P values for pairwise comparisons of epegRNA phenotypes across distances from targeted TSS, related to Extended Data Fig. 6b. Two-sided t-test P values comparing growth phenotypes from StopPR screen for stop, synonymous and no edit control epegRNAs, when comparing each of the first 200 bp increment bins to each downstream bin in relation to the TSS. Data presented in Extended Data Fig. 6b.
Source data
Source Data Fig. 1
Full image of a western blot, with all three panels shown (presented in Fig. 1a and Supplementary Fig. 3).
Source Data Fig. 2
Full image of a western blot, Cas9 panel only, re-imaging of same blot in Source Data Fig. 1 (presented in Fig. 1a and Supplementary Fig. 3).
Source Data Fig. 3
Full image of a western blot, MLH1 panel only, re-imaging of same blot in Source Data Fig. 1 (presented in Fig. 1a and Supplementary Fig. 3).
Source Data Fig. 4
Full image of a western blot, Actin panel only, re-imaging of same blot in Source Data Fig. 1 (presented in Fig. 1a and Supplementary Fig. 3).
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Cirincione, A., Simpson, D., Yan, W. et al. A benchmarked, high-efficiency prime editing platform for multiplexed dropout screening. Nat Methods 22, 92–101 (2025). https://doi.org/10.1038/s41592-024-02502-4
DOI: https://doi.org/10.1038/s41592-024-02502-4