High-throughput evaluation of in vitro CRISPR activities enables optimized large-scale multiplex enrichment of rare variants

Abstract

Previous high-throughput evaluations of CRISPR activities for a large number of target and guide RNA sequences were based on measuring insertion–deletion frequencies rather than cleavage efficiencies. Here we develop two high-throughput in vitro methods, Cut-seq1 and Cut-seq2, to evaluate Cas9 cleavage efficiency for tens of thousands, or even hundreds of thousands, of guide RNA–target pairs. These methods reveal low correlations between in vitro cleavage efficiencies and insertion–deletion frequencies in cells, yet high concordances in protospacer adjacent motif compatibility. Using the resulting large datasets of in vitro cleavage efficiencies, we develop DeepCut, a set of deep learning models that can identify optimized single-guide RNAs that can selectively cleave specific sequences, even in the presence of similar noise sequences. Using these optimized single-guide RNAs, we develop a method, CLOVE-seq (which stands for cleavage for large-scale optimized variant enrichment sequencing), to enrich rare variants in a multiplexed manner by Cas9-mediated specific cleavage of noise or rare variant sequences. Our methods can enhance the understanding of CRISPR nuclease activities and could be used to detect a large number of rare variants in various biomedical contexts.
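
As a rough illustration of why selectively cleaving noise sequences enriches rare variants, the sketch below computes the expected variant fraction after cleavage. It is not taken from the paper's code: the function names `enriched_fraction` and `fold_enrichment` are hypothetical, and it assumes that cleaved molecules drop out of the sequenced pool entirely and that all noise molecules are cut with the same efficiency.

```python
def enriched_fraction(variant_fraction, noise_cleavage, variant_cleavage=0.0):
    """Expected variant fraction after Cas9 cleavage.

    Assumption (not from the paper): cleaved molecules are lost from the
    pool, so only uncut molecules contribute to the final fraction.
    All three arguments are fractions between 0 and 1.
    """
    for name, value in (
        ("variant_fraction", variant_fraction),
        ("noise_cleavage", noise_cleavage),
        ("variant_cleavage", variant_cleavage),
    ):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")

    # Molecules that survive cleavage in each population.
    surviving_variant = variant_fraction * (1.0 - variant_cleavage)
    surviving_noise = (1.0 - variant_fraction) * (1.0 - noise_cleavage)

    total = surviving_variant + surviving_noise
    if total == 0.0:
        raise ValueError("no molecules survive cleavage")
    return surviving_variant / total


def fold_enrichment(variant_fraction, noise_cleavage, variant_cleavage=0.0):
    """Ratio of the variant fraction after cleavage to the fraction before."""
    if variant_fraction == 0.0:
        raise ValueError("variant_fraction must be greater than 0")
    after = enriched_fraction(variant_fraction, noise_cleavage, variant_cleavage)
    return after / variant_fraction


if __name__ == "__main__":
    # A variant at 0.1% with 99% of noise molecules cleaved and the variant untouched.
    print(enriched_fraction(0.001, 0.99))
    print(fold_enrichment(0.001, 0.99))
```

Under these assumptions, a variant present at 0.1% rises to roughly 9% when 99% of the noise molecules are cleaved and the variant is left intact. Any unintended cleavage of the variant itself lowers that figure, which is why guide RNA selectivity matters as much as raw cleavage efficiency.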

Fig. 1: Development and validation of Cut-seq1 for high-throughput evaluation of Cas9 cleavage efficiency in vitro.
Fig. 2: Cas9 nuclease activities in vitro versus in cultured cells.
Fig. 3: The selectivity of divergent sgRNAs evaluated by Cut-seq1 and its advanced version, Cut-seq2.
Fig. 4: Development and performance of DeepCut-HF1, DeepCut-NRRH-HF1 and DeepCut-NRCH-HF1.
Fig. 5: Enrichment of rare variant sequences using optimized sgRNAs versus perfectly matched sgRNAs with HF1 by cleaving noise sequences.
Fig. 6: Enrichment of rare variant sequences by cleaving them with optimized versus perfectly matched sgRNAs.

Data availability

All data are available in the main text or the Supplementary Information. We have submitted the deep sequencing data from this study to the National Center for Biotechnology Information’s Sequence Read Archive under accession number PRJNA1290823. Source data are provided with this paper.

Code availability

The custom Python scripts used for data analysis are available on GitHub at https://github.com/JooHyeYeo/Cutseq_DeepCut (ref. 149). DeepCut can be accessed at https://edu.deepcrispr.info.

References

  1. Kim, H. K. et al. In vivo high-throughput profiling of CRISPR–Cpf1 activity. Nat. Methods 14, 153–159 (2017).

  2. Kim, H. K. et al. Deep learning improves prediction of CRISPR–Cpf1 guide RNA activity. Nat. Biotechnol. 36, 239–241 (2018).

  3. Kim, H. K. et al. SpCas9 activity prediction by DeepSpCas9, a deep learning-based model with high generalization performance. Sci. Adv. 5, eaax9249 (2019).

  4. Kim, H. K. et al. High-throughput analysis of the activities of xCas9, SpCas9-NG and SpCas9 at matched and mismatched target sequences in human cells. Nat. Biomed. Eng. 4, 111–124 (2020).

  5. Kim, N. et al. Prediction of the sequence-specific cleavage activity of Cas9 variants. Nat. Biotechnol. 38, 1328–1336 (2020).

  6. Kim, N. et al. Deep learning models to predict the editing efficiencies and outcomes of diverse base editors. Nat. Biotechnol. 42, 484–497 (2024).

  7. van Overbeek, M. et al. DNA repair profiling reveals nonrandom outcomes at Cas9-mediated breaks. Mol. Cell 63, 633–646 (2016).

  8. Song, B., Yang, S., Hwang, G. H., Yu, J. & Bae, S. Analysis of NHEJ-based DNA repair after CRISPR-mediated DNA cleavage. Int. J. Mol. Sci. 22, 6397 (2021).

  9. Wu, X. et al. Genome-wide binding of the CRISPR endonuclease Cas9 in mammalian cells. Nat. Biotechnol. 32, 670–676 (2014).

  10. Chen, X. et al. Probing the impact of chromatin conformation on genome editing tools. Nucleic Acids Res. 44, 6482–6492 (2016).

  11. Daer, R., Cutts, J., Brafman, D. & Haynes, K. The impact of chromatin dynamics on Cas9-mediated genome editing in human cells. ACS Synth. Biol. 6, 428–438 (2017).

  12. Jensen, K. T. et al. Chromatin accessibility and guide sequence secondary structure affect CRISPR–Cas9 gene editing efficiency. FEBS Lett. 591, 1892–1901 (2017).

  13. Kim, D. & Kim, J.-S. DIG-seq: a genome-wide CRISPR off-target profiling method using chromatin DNA. Genome Res. 28, 1894–1900 (2018).

  14. Uusi-Mäkelä, M. I. et al. Chromatin accessibility is associated with CRISPR–Cas9 efficiency in the zebrafish (Danio rerio). PLoS ONE 13, e0196238 (2018).

  15. Yarrington, R. M., Verma, S., Schwartz, S., Trautman, J. K. & Carroll, D. Nucleosomes inhibit target cleavage by CRISPR–Cas9 in vivo. Proc. Natl Acad. Sci. USA 115, 9351–9358 (2018).

  16. Liu, G., Yin, K., Zhang, Q., Gao, C. & Qiu, J.-L. Modulating chromatin accessibility by transactivation and targeting proximal dsgRNAs enhances Cas9 editing efficiency in vivo. Genome Biol. 20, 145 (2019).

  17. Chakrabarti, A. M. et al. Target-specific precision of CRISPR-mediated genome editing. Mol. Cell 73, 699–713.e696 (2019).

  18. Chechik, L., Martin, O. & Soutoglou, E. Genome editing fidelity in the context of DNA sequence and chromatin structure. Front. Cell Dev. Biol. 8, 319 (2020).

  19. Weiss, T. et al. Epigenetic features drastically impact CRISPR–Cas9 efficacy in plants. Plant Physiol. 190, 1153–1164 (2022).

  20. Pattanayak, V. et al. High-throughput profiling of off-target DNA cleavage reveals RNA-programmed Cas9 nuclease specificity. Nat. Biotechnol. 31, 839–843 (2013).

  21. Kim, D. et al. Digenome-seq: genome-wide profiling of CRISPR–Cas9 off-target effects in human cells. Nat. Methods 12, 237–243 (2015).

  22. Tsai, S. Q. et al. CIRCLE-seq: a highly sensitive in vitro screen for genome-wide CRISPR–Cas9 nuclease off-targets. Nat. Methods 14, 607–614 (2017).

  23. Lazzarotto, C. R. et al. CHANGE-seq reveals genetic and epigenetic effects on CRISPR–Cas9 genome-wide activity. Nat. Biotechnol. 38, 1317–1327 (2020).

  24. Cameron, P. et al. Mapping the genomic landscape of CRISPR–Cas9 cleavage. Nat. Methods 14, 600–606 (2017).

  25. Kaminski, M. M., Abudayyeh, O. O., Gootenberg, J. S., Zhang, F. & Collins, J. J. CRISPR-based diagnostics. Nat. Biomed. Eng. 5, 643–656 (2021).

  26. Gu, W. et al. Depletion of abundant sequences by hybridization (DASH): using Cas9 to remove unwanted high-abundance species in sequencing libraries and molecular counting applications. Genome Biol. 17, 41 (2016).

  27. Lee, S. H. et al. CUT-PCR: CRISPR-mediated, ultrasensitive detection of target DNA using PCR. Oncogene 36, 6823–6829 (2017).

  28. Jia, C. et al. New applications of CRISPR/Cas9 system on mutant DNA detection. Gene 641, 55–62 (2018).

  29. Wang, L. et al. Improved EGFR mutation detection sensitivity after enrichment by Cas9/sgRNA digestion and PCR amplification. Acta Biochim. Biophys. Sin. 52, 1316–1324 (2020).

  30. Lee, D., Lee, J. H. & Bang, D. Accurate detection of rare mutant alleles by target base-specific cleavage with the CRISPR/Cas9 system. ACS Synth. Biol. 10, 1451–1464 (2021).

  31. Chen, J. et al. Programmable endonuclease combined with isothermal polymerase amplification to selectively enrich for rare mutant allele fractions. Chin. Chem. Lett. 33, 4126–4132 (2022).

  32. Li, S.-Y. et al. CRISPR–Cas12a-assisted nucleic acid detection. Cell Discov. 4, 20 (2018).

  33. Li, S.-Y. et al. CRISPR–Cas12a has both cis- and trans-cleavage activities on single-stranded DNA. Cell Res. 28, 491–493 (2018).

  34. Chen, J. S. et al. CRISPR–Cas12a target binding unleashes indiscriminate single-stranded DNase activity. Science 360, 436–439 (2018).

  35. Gootenberg, J. S. et al. Multiplexed and portable nucleic acid detection platform with Cas13, Cas12a, and Csm6. Science 360, 439–444 (2018).

  36. Teng, F. et al. CDetection: CRISPR–Cas12b-based DNA detection with sub-attomolar sensitivity and single-base specificity. Genome Biol. 20, 132 (2019).

  37. Li, L. et al. HOLMESv2: a CRISPR-Cas12b-assisted platform for nucleic acid detection and DNA methylation quantitation. ACS Synth. Biol. 8, 2228–2237 (2019).

  38. Song, J. et al. Amplifying mutational profiling of extracellular vesicle mRNA with SCOPE. Nat. Biotechnol. 43, 1485–1495 (2025).

  39. Gootenberg, J. S. et al. Nucleic acid detection with CRISPR–Cas13a/C2c2. Science 356, 438–442 (2017).

  40. Kellner, M. J., Koob, J. G., Gootenberg, J. S., Abudayyeh, O. O. & Zhang, F. SHERLOCK: nucleic acid detection with CRISPR nucleases. Nat. Protoc. 14, 2986–3012 (2019).

  41. Arizti-Sanz, J. et al. Streamlined inactivation, amplification, and Cas13-based detection of SARS-CoV-2. Nat. Commun. 11, 5921 (2020).

  42. Ackerman, C. M. et al. Massively multiplexed nucleic acid detection with Cas13. Nature 582, 277–282 (2020).

  43. Liu, Q. et al. Argonaute integrated single-tube PCR system enables supersensitive detection of rare mutations. Nucleic Acids Res. 49, e75 (2021).

  44. Song, J. et al. Highly specific enrichment of rare nucleic acid fractions using Thermus thermophilus Argonaute with applications in cancer diagnostics. Nucleic Acids Res. 48, e19 (2020).

  45. Chen, W. et al. Detection of low-frequency mutations in clinical samples by increasing mutation abundance via the excision of wild-type sequences. Nat. Biomed. Eng. 7, 1602–1613 (2023).

  46. Quan, J. et al. FLASH: a next-generation CRISPR diagnostic for multiplexed detection of antimicrobial resistance sequences. Nucleic Acids Res. 47, e83 (2019).

  47. Lee, J. et al. CRISPR-Cap: multiplexed double-stranded DNA enrichment based on the CRISPR system. Nucleic Acids Res. 47, e1 (2019).

  48. Kim, J. M., Kim, D., Kim, S. & Kim, J. S. Genotyping with CRISPR–Cas-derived RNA-guided endonucleases. Nat. Commun. 5, 3157 (2014).

  49. Kim, H. K. et al. In vivo high-throughput profiling of CRISPR–Cpf1 activity. Nat. Methods 14, 153–159 (2017).

  50. Kim, H. K. et al. Predicting the efficiency of prime editing guide RNAs in human cells. Nat. Biotechnol. 39, 198–206 (2021).

  51. Seo, S.-Y. et al. Massively parallel evaluation and computational prediction of the activities and specificities of 17 small Cas9s. Nat. Methods 20, 999–1009 (2023).

  52. Yu, G. et al. Prediction of efficiencies for diverse prime editing systems in multiple cell types. Cell 186, 2256–2272.e2223 (2023).

  53. Cohen, J. D. et al. Detection and localization of surgically resectable cancers with a multi-analyte blood test. Science 359, 926–930 (2018).

  54. Hsu, P. D. et al. DNA targeting specificity of RNA-guided Cas9 nucleases. Nat. Biotechnol. 31, 827–832 (2013).

  55. Jost, M. et al. Titrating gene expression using libraries of systematically attenuated CRISPR guide RNAs. Nat. Biotechnol. 38, 355–364 (2020).

  56. Lin, Y. et al. CRISPR/Cas9 systems have off-target activity with insertions or deletions between target DNA and guide RNA sequences. Nucleic Acids Res. 42, 7473–7485 (2014).

  57. Tsai, S. Q., Nguyen, N., Zheng, Z. & Joung, J. K. High-fidelity CRISPR–Cas9 variants with undetectable genome-wide off-targets. Nature 529, 490–495 (2016).

  58. Bravo, J. P. K. et al. Structural basis for mismatch surveillance by CRISPR–Cas9. Nature 603, 343–347 (2022).

  59. Casini, A. et al. A highly specific SpCas9 variant is identified by in vivo screening in yeast. Nat. Biotechnol. 36, 265–271 (2018).

  60. Miller, S. M. et al. Continuous evolution of SpCas9 variants compatible with non-G PAMs. Nat. Biotechnol. 38, 471–481 (2020).

  61. Lundberg, S. M. & Lee, S.-I. A unified approach to interpreting model predictions. In NIPS'17: Proc. 31st International Conference on Neural Information Processing Systems 4768–4777 (Curran Associates, 2017).

  62. Lundberg, S. M. et al. From local explanations to global understanding with explainable AI for trees. Nat. Mach. Intell. 2, 56–67 (2020).

  63. Doench, J. G. et al. Optimized sgRNA design to maximize activity and minimize off-target effects of CRISPR–Cas9. Nat. Biotechnol. 34, 184–191 (2016).

  64. Song, M. et al. Sequence-specific prediction of the efficiencies of adenine and cytosine base editors. Nat. Biotechnol. 38, 1037–1043 (2020).

  65. Fu, R. et al. Systematic decomposition of sequence determinants governing CRISPR/Cas9 specificity. Nat. Commun. 13, 474 (2022).

  66. Wang, D. et al. Optimized CRISPR guide RNA design for two high-fidelity Cas9 variants by deep learning. Nat. Commun. 10, 4284 (2019).

  67. Wan, Y. & Jiang, Z. TransCrispr: transformer based hybrid model for predicting CRISPR/Cas9 single guide RNA cleavage efficiency. IEEE/ACM Trans. Comput. Biol. Bioinform. 20, 1518–1528 (2022).

  68. Lin, J., Zhang, Z., Zhang, S., Chen, J. & Wong, K. C. CRISPR-net: a recurrent convolutional network quantifies CRISPR off-target activities with mismatches and indels. Adv. Sci. 7, 1903562 (2020).

  69. Chuai, G. et al. DeepCRISPR: optimized CRISPR guide RNA design by deep learning. Genome Biol. 19, 80 (2018).

  70. Xiang, X. et al. Enhancing CRISPR–Cas9 gRNA efficiency prediction by data integration and deep learning. Nat. Commun. 12, 3238 (2021).

  71. Anthon, C., Corsi, G. I. & Gorodkin, J. CRISPRon/off: CRISPR/Cas9 on- and off-target gRNA design. Bioinformatics 38, 5437–5439 (2022).

  72. Song, P. et al. Selective multiplexed enrichment for the detection and quantitation of low-fraction DNA variants via low-depth sequencing. Nat. Biomed. Eng. 5, 690–701 (2021).

  73. Wu, L. R., Chen, S. X., Wu, Y., Patel, A. A. & Zhang, D. Y. Multiplexed enrichment of rare DNA variants via sequence-selective and temperature-robust amplification. Nat. Biomed. Eng. 1, 714–723 (2017).

  74. Chakravarty, D. et al. OncoKB: a precision oncology knowledge base. JCO Precis. Oncol. 2017, PO.17.00011 (2017).

  75. Lee, H. W. et al. Applications of molecular barcode sequencing for the detection of low-frequency variants in circulating tumour DNA from hepatocellular carcinoma. Liver Int. 42, 2317–2326 (2022).

  76. Newman, A. M. et al. An ultrasensitive method for quantitating circulating tumor DNA with broad patient coverage. Nat. Med. 20, 548–554 (2014).

  77. Freitas, C. et al. The role of liquid biopsy in early diagnosis of lung cancer. Front. Oncol. 11, 634316 (2021).

  78. Tie, J. et al. Circulating tumor DNA analysis guiding adjuvant therapy in stage II colon cancer. N. Engl. J. Med. 386, 2261–2272 (2022).

  79. Garcia-Murillas, I. et al. Mutation tracking in circulating tumor DNA predicts relapse in early breast cancer. Sci. Transl. Med. 7, 302ra133 (2015).

  80. Li, S. et al. Circulating tumor DNA predicts the response and prognosis in patients with early breast cancer receiving neoadjuvant chemotherapy. JCO Precis. Oncol. 4, PO.19.00292 (2020).

  81. Phallen, J. et al. Direct detection of early-stage cancers using circulating tumor DNA. Sci. Transl. Med. 9, eaan2415 (2017).

  82. Tomczak, K., Czerwińska, P. & Wiznerowicz, M. The Cancer Genome Atlas (TCGA): an immeasurable source of knowledge. Contemp. Oncol. 2015, A68–A77 (2015).

  83. Warnecke, P. M. et al. Detection and measurement of PCR bias in quantitative methylation analysis of bisulphite-treated DNA. Nucleic Acids Res. 25, 4422–4426 (1997).

  84. Kebschull, J. M. & Zador, A. M. Sources of PCR-induced distortions in high-throughput sequencing data sets. Nucleic Acids Res. 43, e143 (2015).

  85. Best, K., Oakes, T., Heather, J. M., Shawe-Taylor, J. & Chain, B. Computational analysis of stochastic heterogeneity in PCR amplification efficiency revealed by single molecule barcoding. Sci. Rep. 5, 14629 (2015).

  86. Weusten, J. & Herbergs, J. A stochastic model of the processes in PCR based amplification of STR DNA in forensic applications. Forensic Sci. Int. Genet. 6, 17–25 (2012).

  87. Patchsung, M. et al. Clinical validation of a Cas13-based assay for the detection of SARS-CoV-2 RNA. Nat. Biomed. Eng. 4, 1140–1149 (2020).

  88. Kaminski, M. M. et al. A CRISPR-based assay for the detection of opportunistic infections post-transplantation and for the monitoring of transplant rejection. Nat. Biomed. Eng. 4, 601–609 (2020).

  89. Bonner, E. R. et al. Circulating tumor DNA sequencing provides comprehensive mutation profiling for pediatric central nervous system tumors. npj Precis. Oncol. 6, 63 (2022).

  90. Ernst, S. M. et al. Clinical utility of circulating tumor DNA in patients with advanced KRAS G12C-mutated NSCLC treated with sotorasib. J. Thorac. Oncol. 19, 995–1006 (2024).

  91. Vanni, I. et al. Combining germline, tissue and liquid biopsy analysis by comprehensive genomic profiling to improve the yield of actionable variants in a real-world cancer cohort. J. Transl. Med. 22, 462 (2024).

  92. Harter, J. et al. Analytical performance evaluation of a 523-gene circulating tumor DNA assay for next-generation sequencing–based comprehensive tumor profiling in liquid biopsy samples. J. Mol. Diagn. 26, 61–72 (2024).

  93. Robbins, H. L. et al. Abstract A115: mTOR targeting in STK11 deficient non-small cell lung cancer (NSCLC): final results, pre-clinical rationale and biomarker analysis of a phase II trial of the mTORC1/2 inhibitor vistusertib in STK11 deficient lung adenocarcinoma (NLMT B2). Mol. Cancer Ther. 22, A115–A115 (2023).

  94. Nakamura, Y. et al. Targeted therapy guided by circulating tumor DNA analysis in advanced gastrointestinal tumors. Nat. Med. 31, 165–175 (2025).

  95. Assaf, Z. J. F. et al. A longitudinal circulating tumor DNA-based model associated with survival in metastatic non-small-cell lung cancer. Nat. Med. 29, 859–868 (2023).

  96. Black, J. R. et al. Ultrasensitive ctDNA detection for preoperative disease stratification in early-stage lung adenocarcinoma. Nat. Med. 31, 70–76 (2025).

  97. Ryoo, S.-B. et al. Personalised circulating tumour DNA assay with large-scale mutation coverage for sensitive minimal residual disease detection in colorectal cancer. Br. J. Cancer 129, 374–381 (2023).

  98. Schrock, A. B. et al. Hybrid capture–based genomic profiling of circulating tumor DNA from patients with advanced non–small cell lung cancer. J. Thorac. Oncol. 14, 255–264 (2019).

  99. Tyler, J., Kumer, L., Fisher, C., Casey, H. & Shike, H. Personalized chimerism test that uses selection of short tandem repeat or quantitative PCR depending on patient’s chimerism status. J. Mol. Diagn. 21, 483–490 (2019).

  100. Breuer, S. et al. Early recipient chimerism testing in the T- and NK-cell lineages for risk assessment of graft rejection in pediatric patients undergoing allogeneic stem cell transplantation. Leukemia 26, 509–519 (2012).

  101. Shimoni, A. & Nagler, A. Non-myeloablative stem cell transplantation (NST): chimerism testing as guidance for immune-therapeutic manipulations. Leukemia 15, 1967–1975 (2001).

  102. Bae, T. et al. Different mutational rates and mechanisms in human cells at pregastrulation and neurogenesis. Science 359, 550–555 (2018).

  103. Ju, Y. S. et al. Somatic mutations reveal asymmetric cellular dynamics in the early human embryo. Nature 543, 714–718 (2017).

  104. Martincorena, I. et al. Somatic mutant clones colonize the human esophagus with age. Science 362, 911–917 (2018).

  105. Vijg, J. & Dong, X. Pathogenic mechanisms of somatic mutation and genome mosaicism in aging. Cell 182, 12–23 (2020).

  106. Lim, E. T. et al. Rates, distribution and implications of postzygotic mosaic mutations in autism spectrum disorder. Nat. Neurosci. 20, 1217–1224 (2017).

  107. Martincorena, I. et al. High burden and pervasive positive selection of somatic mutations in normal human skin. Science 348, 880–886 (2015).

  108. Kim, J. H. et al. Analysis of low-level somatic mosaicism reveals stage and tissue-specific mutational features in human development. PLoS Genet. 18, e1010404 (2022).

  109. Wang, Y. et al. Comprehensive identification of somatic nucleotide variants in human brain tissue. Genome Biol. 22, 92 (2021).

  110. Uchimura, A. et al. Early embryonic mutations reveal dynamics of somatic and germ cell lineages in mice. Genome Res. 32, 945–955 (2022).

  111. Huyghe, J. R. et al. Discovery of common and rare genetic risk variants for colorectal cancer. Nat. Genet. 51, 76–87 (2019).

  112. Lu, C. et al. Patterns and functional implications of rare germline variants across 12 cancer types. Nat. Commun. 6, 10086 (2015).

  113. Kessler, M. D. et al. Common and rare variant associations with clonal haematopoiesis phenotypes. Nature 612, 301–309 (2022).

  114. Pizzo, L. et al. Rare variants in the genetic background modulate cognitive and developmental phenotypes in individuals carrying disease-associated variants. Genet. Med. 21, 816–825 (2019).

  115. Wang, Q. et al. Rare variant contribution to human disease in 281,104 UK Biobank exomes. Nature 597, 527–532 (2021).

  116. Parsons, H. A. et al. Sensitive detection of minimal residual disease in patients treated for early-stage breast cancer. Clin. Cancer Res. 26, 2556–2564 (2020).

  117. Klein, C. A. et al. Genetic heterogeneity of single disseminated tumour cells in minimal residual cancer. Lancet 360, 683–689 (2002).

  118. Partridge, M. et al. Detection of minimal residual cancer to investigate why oral tumors recur despite seemingly adequate treatment. Clin. Cancer Res. 6, 2718–2725 (2000).

  119. Szczepański, T., Orfão, A., van der Velden, V. H., San Miguel, J. F. & van Dongen, J. J. Minimal residual disease in leukaemia patients. Lancet Oncol. 2, 409–417 (2001).

  120. Ivey, A. et al. Assessment of minimal residual disease in standard-risk AML. N. Engl. J. Med. 374, 422–433 (2016).

  121. Jongen-Lavrencic, M. et al. Molecular minimal residual disease in acute myeloid leukemia. N. Engl. J. Med. 378, 1189–1199 (2018).

  122. Huang, M., Zhou, X., Wang, H. & Xing, D. Clustered regularly interspaced short palindromic repeats/Cas9 triggered isothermal amplification for site-specific nucleic acid detection. Anal. Chem. 90, 2193–2200 (2018).

  123. Pardee, K. et al. Rapid, low-cost detection of Zika virus using programmable biomolecular components. Cell 165, 1255–1266 (2016).

  124. Zhou, W. et al. A CRISPR–Cas9-triggered strand displacement amplification method for ultrasensitive DNA detection. Nat. Commun. 9, 5012 (2018).

  125. Dai, Y. et al. Exploring the trans-cleavage activity of CRISPR–Cas12a (cpf1) for the development of a universal electrochemical biosensor. Angew. Chem. 131, 17560–17566 (2019).

  126. Wang, T., Liu, Y., Sun, H. H., Yin, B. C. & Ye, B. C. An RNA-guided Cas9 nickase-based method for universal isothermal DNA amplification. Angew. Chem. Int. Ed. Engl. 58, 5382–5386 (2019).

  127. Hajian, R. et al. Detection of unamplified target genes via CRISPR–Cas9 immobilized on a graphene field-effect transistor. Nat. Biomed. Eng. 3, 427–437 (2019).

  128. English, M. A. et al. Programmable CRISPR-responsive smart materials. Science 365, 780–785 (2019).

  129. Gayet, R. V. et al. Creating CRISPR-responsive smart materials for diagnostics and programmable cargo release. Nat. Protoc. 15, 3030–3063 (2020).

  130. Bennett-Baker, P. E. & Mueller, J. L. CRISPR-mediated isolation of specific megabase segments of genomic DNA. Nucleic Acids Res. 45, e165 (2017).

  131. Nachmanson, D. et al. Targeted genome fragmentation with CRISPR/Cas9 enables fast and efficient enrichment of small genomic regions and ultra-accurate sequencing with low DNA input (CRISPR-DS). Genome Res. 28, 1589–1599 (2018).

  132. Schultzhaus, Z., Wang, Z. & Stenger, D. CRISPR-based enrichment strategies for targeted sequencing. Biotechnol. Adv. 46, 107672 (2021).

  133. Hung, K. L. et al. Targeted profiling of human extrachromosomal DNA by CRISPR-CATCH. Nat. Genet. 54, 1746–1754 (2022).

  134. Malekshoar, M. et al. CRISPR-Cas9 targeted enrichment and next-generation sequencing for mutation detection. J. Mol. Diagn. 25, 249–262 (2023).

  135. Kim, Y. et al. High-throughput functional evaluation of human cancer-associated mutations using base editors. Nat. Biotechnol. 40, 874–884 (2022).

  136. Smith, T., Heger, A. & Sudbery, I. UMI-tools: modeling sequencing errors in Unique Molecular Identifiers to improve quantification accuracy. Genome Res. 27, 491–499 (2017).

  137. Ke, G. et al. LightGBM: a highly efficient gradient boosting decision tree. In NIPS'17: Proc. 31st International Conference on Neural Information Processing Systems 3149–3157 (Curran Associates, 2017).

  138. Corsi, G. I. et al. CRISPR/Cas9 gRNA activity depends on free energy changes and on the target PAM context. Nat. Commun. 13, 3006 (2022).

  139. Lorenz, R. et al. ViennaRNA package 2.0. Algorithms Mol. Biol. 6, 26 (2011).

  140. Cock, P. J. et al. Biopython: freely available Python tools for computational molecular biology and bioinformatics. Bioinformatics 25, 1422–1423 (2009).

  141. Starks, E. R. et al. Assessing limit of detection in clinical sequencing. J. Mol. Diagn. 23, 455–466 (2021).

  142. Ahlmann-Eltze, C. & Huber, W. Comparison of transformations for single-cell RNA-seq data. Nat. Methods 20, 665–672 (2023).

  143. Zhang, Z. R. & Jiang, Z. R. Effective use of sequence information to predict CRISPR–Cas9 off-target. Comput. Struct. Biotechnol. J. 20, 650–661 (2022).

  144. Needleman, S. B. & Wunsch, C. D. A general method applicable to the search for similarities in the amino acid sequence of two proteins. J. Mol. Biol. 48, 443–453 (1970).

  145. Vaswani, A. et al. Attention is all you need. In NIPS'17: Proc. 31st International Conference on Neural Information Processing Systems 5999–6009 (Curran Associates, 2017).

  146. Devlin, J., Chang, M.-W., Lee, K. & Toutanova, K. BERT: pre-training of deep bidirectional transformers for language understanding. In NAACL-HLT'19: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers) 4171–4186 (Association for Computational Linguistics, 2019).

  147. Loshchilov, I. & Hutter, F. Decoupled weight decay regularization. In International Conference on Learning Representations (ICLR 2019) https://openreview.net/pdf/5963886abef941684ffc0cf670297e47fb1e5155.pdf (ICLR, 2019).

  148. Virtanen, P. et al. SciPy 1.0: fundamental algorithms for scientific computing in Python. Nat. Methods 17, 261–272 (2020).

  149. Yeo, J. H. et al. Cutseq_DeepCut. GitHub https://github.com/JooHyeYeo/Cutseq_DeepCut (2025).

Acknowledgements

We thank K. S. Lee, Y. Kim, S. Park and G. Baek for assisting with the experiments. We also thank T. DeFanti, F. Wuerthwein, L. Smarr, D. Mishin, J. Graham, J. Paden, I. Nealey, J. Moon and W. Kwon for their help. We thank Medical Illustration & Design (MID), a part of the Medical Research Support Services at Yonsei University College of Medicine, for their professional support with medical illustrations. This work was supported, in part, by the National Research Foundation of Korea grants funded by the Korean Government (MSIT) (RS-2022-NR070713 (H.H.K.), 2021R1I1A1A01047269 (J.H.Y.), RS-2025-02214844 (H.H.K.) and RS-2023-NR076625 (E.-J.N.)); the Bio and Medical Technology Development Program of the NRF funded by the Korean Government (MSIT) RS-2022-NR067326 (H.H.K.), RS-2022-NR067345 (H.H.K.) and RS-2023-00260968 (H.H.K.); the Deep Science Startup Promotion Program funded by the Ministry of Science and ICT (RS-2025-02633786 (J.H.Y.)); the Korea Drug Development Fund funded by the Ministry of Science and ICT, the Ministry of Trade, Industry and Energy; the Korea–US Collaborative Research Fund (KUCRF), funded by the Ministry of Science and ICT and Ministry of Health and Welfare, Republic of Korea (grant number RS-2024-00467177 (H.H.K.)); the Yonsei Signature Research Cluster Program of 2025-22-0015 (H.H.K.); the Brain Korea 21 FOUR Project for Medical Science (Yonsei University College of Medicine); the SNUH Kun-hee Lee Child Cancer and Rare Disease Project, Republic of Korea (22B-000-0101 (H.H.K.)); the Yonsei Fellow Program, funded by Lee Youn Jae; the “Regional Innovation System & Education (RISE)” through the Seoul RISE Center, funded by the Ministry of Education (MOE) and the Seoul Metropolitan Government (2025-RISE-01-022-05 (H.H.K.)); the Seok-San Yonsei Medical Scientist Training Program (MSTP) Song Yong-Sang Scholarship, College of Medicine, Yonsei University (S.K.); the MD-PhD/Medical Scientist Training Program (MSTP) through the Korea Health Industry 
Development Institute (KHIDI), funded by the Ministry of Health and Welfare, Republic of Korea (S.K.); a Student Research Bursary from the Song-Dang Institute for Cancer Research at Yonsei University College of Medicine (S.K.); a faculty research grant of Yonsei University College of Medicine (6-2019-0166 (E.-J.N.)); the National Research Platform (NRP) and the Cognitive Hardware and Software Ecosystem Community Infrastructure (CHASE-CI) at the University of California, San Diego, by funding from the National Science Foundation (NSF), with award numbers 1730158, 1540112, 1541349, 1826967, 2138811, 2112167, 2100237 and 2120019; and additional funding from community partners (S.K.) and infrastructure use from the Chameleon testbed, supported by NSF award numbers 1419152, 1743354 and 2027170 (S.K.). This work was supported by KREONET.

Author information

Contributions

J.H.Y. conducted all wet experiments and analysed data, including bioinformatics analysis. S.L. critically contributed to data analysis. S.K., H.-C.O. and J.H.Y. performed machine learning. J.-G.M. and R.G. contributed to conducting wet experiments. J.H.Y. designed all libraries. H.K.K. contributed to the initial design of sgRNA libraries. S.K. contributed to the design of a subset of libraries. E.-J.N. contributed to evaluating CLOVE-seq using samples from human patients with cancer. J.H.Y., S.K. and H.H.K. wrote the paper with input from all authors. J.H.Y. and H.H.K. designed the study. H.H.K. conceived and supervised the study. J.H.Y., S.K. and H.H.K. visualized the results. All authors approved the manuscript.

Corresponding author

Correspondence to Hyongbum Henry Kim.

Ethics declarations

Competing interests

Yonsei University has filed patent applications based on this work, in which J.H.Y., S.L., S.K., H.-C.O. and H.H.K. are listed as inventors. H.H.K. is the founder of cisionMed. The other authors declare no competing interests.

Peer review

Peer review information

Nature Biomedical Engineering thanks Zhenran Jiang and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Target sequence preferences and PAM compatibilities of Cas9 in vitro and in cultured cells.

a, Sequence preferences at each position in targets that are cleaved efficiently (top 20%) compared to inefficiently (bottom 20%) with matched sgRNAs, in vitro and in cultured cells (HEK293T, HeLa, Hepa 1-6, and B16-F10). The y-axis represents the log2 odds ratio of nucleotide frequencies between efficiently and inefficiently cleaved target sequences. A positive log odds ratio indicates a preference for the nucleotide in efficiently cleaved targets at that position, whereas a negative value signifies a preference for the nucleotide in inefficiently cleaved targets at the respective position. The number of target sequences (n) = 33,309 in vitro and 33,309 in HEK293T, 354 in HeLa, 352 in Hepa 1-6, and 348 in B16-F10 cells. b, Heat maps showing the average SpCas9-induced cleavage indices in vitro (left) and the average indel frequencies in HEK293T cells (right) for each 4-nt PAM sequence. The number of sgRNA and target sequence pairs per 4-nt PAM sequence (n) = 27 (in vitro) and 30 (HEK293T).
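
The log2 odds ratio plotted in panel a can be illustrated with a minimal sketch. The function name, the toy sequences and the pseudocount of 0.5 are assumptions made for illustration; the legend does not state whether the authors used a pseudocount or this exact definition of odds.

```python
import math
from collections import Counter


def log2_odds_ratios(efficient, inefficient, pseudocount=0.5):
    """Per-position log2 odds ratio of each nucleotide (efficient vs. inefficient targets).

    The pseudocount is an assumption added here to avoid division by zero.
    """
    length = len(efficient[0])
    result = []
    for pos in range(length):
        eff = Counter(seq[pos] for seq in efficient)
        ineff = Counter(seq[pos] for seq in inefficient)
        row = {}
        for nt in "ACGT":
            # odds = (sequences with nt at pos) / (sequences with another nucleotide at pos)
            odds_eff = (eff[nt] + pseudocount) / (len(efficient) - eff[nt] + pseudocount)
            odds_ineff = (ineff[nt] + pseudocount) / (len(inefficient) - ineff[nt] + pseudocount)
            row[nt] = math.log2(odds_eff / odds_ineff)
        result.append(row)
    return result


# Toy 4-nt sequences standing in for the top 20% and bottom 20% of targets (hypothetical data).
top = ["GACG", "GACT", "GTCG", "GACG"]
bottom = ["TACG", "TTCA", "GTCA", "TACA"]
scores = log2_odds_ratios(top, bottom)
```

In this toy example, G at the first position has a positive value (preferred in efficiently cleaved targets), T has a negative value, and C at the third position scores zero because it is equally frequent in both groups.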

Extended Data Fig. 2 Development of Cut-seq2.

a, The structures of vectors used for Cut-seq1 and Cut-seq2. EcoRV, EcoRV restriction enzyme site; Constant, a constant sequence for identifying reads uncut by Cas9; UMI, unique molecular identifier; T7, T7 promoter; T7-term, T7 terminator; EcoR1, EcoRI restriction enzyme site. b, Schematic representation of Cut-seq2. Illustrations in b created with BioRender.com.

Extended Data Fig. 3 In vitro cleavage activities of Cas9 measured using Cut-seq2.

a, Correlations between cleavage ratio indices of two biological replicates. The Pearson correlation coefficient (r) and Spearman correlation coefficient (R) are displayed. The number of sgRNA and target pairs (n) = 25,644 for HF1, 27,260 for NRRH-HF1, and 25,553 for NRCH-HF1. b, Heatmaps showing the effect of the number of mismatched nucleotides, DNA bulges, or RNA bulges in the sgRNA relative to the target on the mean cleavage ratio indices of Cas9 variants measured using Cut2_30k and Cut2_42k. HF1, SpCas9-HF1; NRRH-HF1, SpCas9-NRRH-HF1; NRCH-HF1, SpCas9-NRCH-HF1.

Extended Data Fig. 4 Selectivity of perfectly-matched and divergent sgRNAs and Cas9 variants evaluated using Cut-seq2.

a, Heatmaps showing the mean (left) and median (right) cleavage ratio index differences for perfectly-matched sgRNAs and sgRNAs with 1-nt mismatches, 2-nt mismatches, 1-nt DNA bulges, and 1-nt RNA bulges relative to the WT target for SpCas9 variants (HF1, NRRH-HF1, and NRCH-HF1). The number of sgRNA and target pairs for each Cas9 variant (n) = 18,155. b, The proportions of sgRNA types that showed the highest cleavage ratio index differences for 77, 68, and 68 target mutations (that is, the most discriminatory sgRNAs) for SpCas9-HF1, SpCas9-NRRH-HF1, and SpCas9-NRCH-HF1, respectively. The numbers of sgRNAs (n) are as follows: HF1, n = 77; NRRH-HF1, n = 68; NRCH-HF1, n = 68. c, Heatmaps showing the average cleavage ratio index differences induced by the SpCas9 variants (HF1, NRRH-HF1, or NRCH-HF1) for each 4-nt candidate PAM sequence. If a SpCas9 variant has the highest average cleavage ratio index difference among the three evaluated variants for a specific 4-nt PAM sequence, that PAM is marked with a blue border and a ‘v’ in the box.

Extended Data Fig. 5 Important features associated with cleavage ratio index differences.

The 15 most important features associated with cleavage ratio index differences for HF1 (a), NRRH-HF1 (b), or NRCH-HF1 (c) were determined by Tree SHAP after considering multicollinearity, using a threshold of 0.7 for Pearson correlation coefficients. A high SHAP value indicates that the feature is associated with a high cleavage ratio index difference. Red and blue dots respectively represent high and low values of the relevant feature. Tm, melting temperature; GN19, sgRNA spacer starts with G. The number of sgRNA and target sequence pairs (that is, the number of dots per feature in the summary plot) (n) = 62,397 for HF1, 64,461 for NRRH-HF1, and 63,599 for NRCH-HF1.
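
The multicollinearity step mentioned above (a Pearson correlation threshold of 0.7) could be sketched as a greedy filter applied before feature attribution. This is an illustrative assumption only: the legend does not describe the exact procedure, and the function names, feature names and values below are hypothetical. The Tree SHAP step itself is not shown.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a constant feature is treated as uncorrelated in this sketch
    return cov / (sx * sy)


def drop_collinear(features, threshold=0.7):
    """Keep a feature only if |r| with every already-kept feature is <= threshold."""
    kept = []
    for name, values in features.items():
        if all(abs(pearson(values, features[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical feature table: melting temperature is nearly linear in GC content.
features = {
    "gc_content": [0.40, 0.55, 0.60, 0.45, 0.70],
    "melting_temp": [52.0, 58.0, 60.5, 54.0, 64.0],
    "starts_with_g": [1, 0, 1, 0, 0],
}
kept = drop_collinear(features)
```

Here `melting_temp` is dropped because it correlates strongly with `gc_content`, while `starts_with_g` is retained. A greedy filter depends on feature order, so a different ordering could retain a different member of each correlated group.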

Extended Data Fig. 6 Validation of models that predict the cleavage ratio index.

a-c, Evaluation of the models for HF1 (a), NRRH-HF1 (b), and NRCH-HF1 (c), which were trained without additional features, using datasets of cleavage ratio indices that were never used for training. Scatter plots of measured and predicted cleavage ratio indices are shown. The Pearson correlation coefficient (r) and the Spearman correlation coefficient (R) are indicated. The number of sgRNA and target pairs in the test sets (n) = 5,831 for HF1, 4,402 for NRRH-HF1, and 5,570 for NRCH-HF1. d, Cleavage ratio index differences of sgRNAs suggested by DeepCut-HF1 (blue), of sgRNAs with 1-nt mismatches at positions 4–8 (red), or of sgRNAs with any mismatches at positions 4–8 (pink). Cleavage ratio index differences were measured using target mutations in the Cut2_30k test set, which was not used for training DeepCut-HF1. The boxes represent the 25th, 50th (median), and 75th percentiles; whiskers show the 10th and 90th percentiles. The P value, calculated using the Kruskal–Wallis test followed by Dunn’s post hoc test with Bonferroni correction, is shown. The number of target mutations (n) = 12. e, The recommended process for identifying optimized sgRNAs for CLOVE-seq. The process begins with the selection of a target mutation in the rare variant of interest. All possible PAM sites near the rare variant are identified, and perfectly-matched sgRNAs are designed, taking into consideration cases in which the rare variant appears in either the spacer or PAM sequence. For each perfectly-matched sgRNA, divergent sgRNAs are generated by introducing 1-nt mismatches, 2-nt mismatches, 1-nt insertions, or 1-nt deletions into the guide sequences of the perfectly-matched sgRNAs (an example is shown in f). The total number of divergent sgRNAs is shown in g. Each designed sgRNA is paired with both the noise sequence and the rare variant sequence, generating sgRNA-noise and sgRNA-rare variant pairs as input for the DeepCut model, which predicts cleavage ratio indices. Finally, based on the predicted cleavage ratio indices, optimized sgRNAs are selected to achieve the best discrimination between noise and rare variant sequences. MM, mismatch; DB, DNA bulge (1-nt deletion in sgRNA); RB, RNA bulge (1-nt insertion in sgRNA). f, Examples of divergent sgRNAs. The noise and rare variant sequences contain a C (bold blue) and a G (bold black), respectively. The protospacer (sgRNA binding site) is underlined, and the PAM (AGG) is italicized. A perfectly-matched sgRNA and two divergent sgRNAs are shown as examples. g, Number of possible divergent sgRNAs for a perfectly-matched sgRNA. A total of 1,691 (= 57 + 1,539 + 76 + 19) sgRNAs can be designed for a single perfectly-matched sgRNA.
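
The total in panel g can be reproduced with simple combinatorics. The sketch below assumes 19 editable guide positions, three alternative bases per mismatch, four possible bases at each of 19 insertion sites, and one deletion per position. These assumptions are inferred from the arithmetic in the legend and are not stated in it; the function name is hypothetical.

```python
from math import comb


def count_divergent_sgrnas(n_positions=19):
    """Count divergent sgRNAs per perfectly-matched sgRNA under the stated assumptions."""
    return {
        # one position changed to any of the 3 other bases
        "1-nt mismatch": n_positions * 3,
        # two distinct positions, each changed to any of the 3 other bases
        "2-nt mismatch": comb(n_positions, 2) * 3 * 3,
        # any of 4 bases inserted at each of n_positions sites (assumed, not deduplicated)
        "1-nt insertion": n_positions * 4,
        # one base removed at each position
        "1-nt deletion": n_positions,
    }


counts = count_divergent_sgrnas()
total = sum(counts.values())
print(counts, total)
```

Under these assumptions the four classes give 57, 1,539, 76 and 19 sgRNAs, which sum to the 1,691 reported in the legend.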

Extended Data Fig. 7 Cancer-related target library and optimized sgRNAs.

a, Schematic overview of the approach for enriching rare variants via Cas9-mediated cleavage of noise sequences. b, The number of cancer-relevant mutations in each of the 40 genes most frequently mutated in cancer among the genes containing the 2,612 mutations. These mutations were included in the panel of 2,612 target mutations. c, The proportions of types of optimized sgRNAs that showed the highest cleavage ratio index differences among sgRNAs with cleavage ratio indices that are less than 0 at mutant targets, for 2,612 target mutations for SpCas9-HF1 and SpCas9-NRRH-HF1. HF1, SpCas9-HF1; NRRH-HF1, SpCas9-NRRH-HF1. The number of optimized sgRNAs (n) = 2,612 for HF1 and 2,612 for NRRH-HF1. d,e, The proportions of mutation positions (within protospacers (that is, selectivity by sgRNAs) vs. within the PAM (that is, selectivity by PAM sequences)) for sgRNAs that showed the highest cleavage ratio index differences for 2,612 target mutations (that is, optimized sgRNAs) for HF1 (d) and NRRH-HF1 (e).

Extended Data Fig. 8 Optimal conditions for selective enrichment of rare variant sequences.

a-c, The effects of different HF1:sgRNA molar ratios (a), HF1 concentrations (b), and cleavage reaction times (c) on the fold enrichment (that is, VAF fold increases). Only a single cleavage reaction cycle was conducted using the optimized_HF1_2k sgRNA library and a 1,000:1 mixture of 7.8k_WT and 7.8k_MT. The boxes represent the 25th, 50th, and 75th percentiles; whiskers show the 10th and 90th percentiles. The number of target mutations for each condition (n) = 2,612.

Extended Data Fig. 9 Enrichment of rare variant sequences using optimized sgRNAs vs. perfectly-matched sgRNAs with NRRH-HF1.

a,b, Variant allele frequency (VAF) fold changes (a) and VAFs (b) before (without cleavage) or after the indicated number of treatment cycles of NRRH-HF1-induced cleavage using optimized_NRRH_2k (pink) or perfectly-matched_2k (grey) and PCR. 7.8k_WT and 7.8k_MT were mixed at a ratio of 1,000:1. Black or red dots represent the mean fold changes in the VAF (a) and mean VAFs (b). The boxes represent the 25th, 50th, and 75th percentiles; whiskers show the 10th and 90th percentiles. ‘V’ indicates that the median value is 0, which cannot be shown in the graph. The number of target mutations (n) = 2,612. c, The proportion of target mutations with VAFs higher than the reliable detection limit (VAFs > 0.2%) after the cleavage and PCR cycles using NRRH-HF1 and either optimized_NRRH_2k (pink) or perfectly-matched_2k (grey). The number of target mutations (n) = 2,612. d, Fold increases in the VAF for mutations in each of the 40 genes most frequently mutated in cancer among all genes containing the 2,612 mutations. Each dot represents a cancer-relevant mutation. Fold increases in the VAF were measured after 5 cycles of cleavage and PCR using optimized sgRNAs for NRRH-HF1 with a 1,000:1 mixture of 7.8k_WT and 7.8k_MT. The number of target mutations (n) = 346 for TP53, 78 for PIK3CA, 67 for FAT4, 58 for KMT2C, 55 for APC, 49 for KMT2D, 44 for RNF213, 43 for ARID1A, 43 for PTEN, 36 for KRAS, 36 for NFE2L2, 35 for FAT1, 33 for ERBB4, 29 for CDKN2A, 28 for FBXW7, 28 for CTNNB1, 27 for NF1, 26 for BCL7A, 24 for ATM, 24 for PTPRT, 23 for ZFHX3, 22 for SPEN, 20 for NTRK3, 20 for SF3B1, 19 for MET, 18 for EGFR, 18 for NRAS, 17 for TRRAP, 17 for RUNX1, 17 for TSPOAP1-AS1, 17 for KMT2A, 17 for MTOR, 16 for PDE4DIP, 16 for GNAS, 15 for BRAF, 13 for KIT, 13 for HRAS, 13 for ERBB2, 12 for DNMT3A, and 12 for SMARCA4. 
e, Histograms showing the number of target mutations with specified ranges of VAF fold change after one cycle of NRRH-HF1-induced cleavage with optimized (left) or perfectly-matched (right) sgRNAs and PCR. The percentage above each bar represents the proportion of target mutations within the corresponding VAF fold change range. The number of target mutations (n) = 2,612 for optimized sgRNAs and 2,612 for perfectly-matched sgRNAs. f, Fold increases in the VAF as a function of the DeepCut-NRRH-HF1-predicted cleavage ratio index difference. The fold increases in the VAF were measured before, or after 1, 3, or 5 cycles of cleavage and PCR using NRRH-HF1 and optimized_NRRH_2k. Black dots represent the mean values. The boxes represent the 25th, 50th, and 75th percentiles; whiskers show the 10th and 90th percentiles. The number of target mutations (n) = 2,612.
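
The quantities in this legend (VAF, VAF fold change and the proportion of mutations above the 0.2% detection limit) can be related with a short sketch. The read counts and function names are hypothetical, and the starting VAF of 1/1,001 assumes that the 1,000:1 WT:MT mixture is represented evenly across targets, which the legend does not state.

```python
def variant_allele_frequency(mutant_reads, total_reads):
    """VAF = mutant reads / all reads covering the target."""
    if total_reads == 0:
        raise ValueError("no reads cover this target")
    return mutant_reads / total_reads


def fraction_above_limit(vafs, limit=0.002):
    """Proportion of targets whose VAF exceeds the detection limit (0.2% in the legend)."""
    return sum(v > limit for v in vafs) / len(vafs)


# Expected starting VAF for a 1,000:1 WT:MT mixture (assumes even representation).
initial_vaf = 1 / (1000 + 1)

# Hypothetical (mutant_reads, total_reads) per target after cleavage and PCR.
after_reads = [(30, 10000), (12, 10000), (0, 10000)]
after_vafs = [variant_allele_frequency(m, t) for m, t in after_reads]

fold_changes = [v / initial_vaf for v in after_vafs]
detectable = fraction_above_limit(after_vafs)
```

A target with no mutant reads after treatment has a fold change of 0, which is the case the legend flags as a value that cannot be shown in the graph.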

Extended Data Fig. 10 The effects of the position of mutations within the target sequence and the deep sequencing depth on the HF1-based enrichment of rare variant sequences using optimized sgRNAs.

a,b, The effect of the position of mutations within the Cas9 target sequence on the expected (that is, the predicted cleavage ratio index difference, left) and experimentally measured (that is, VAF fold change, right) fold enrichment of MT sequences using HF1 (a) and NRRH-HF1 (b). c, The effects of deep sequencing read depth (that is, ~2,000x vs. ~10,000x) on the enrichment of MT sequences. a-c, The boxes represent the 25th, 50th (median), and 75th percentiles; whiskers show the 10th and 90th percentiles. The number of target mutations (n) = 2,612.

Supplementary information

Supplementary Information

Supplementary Text 1 and 2, Figs. 1–5, Tables 1 and 10, Legends for Tables 2–9 and 11–13, and Methods.

Reporting Summary

Supplementary Tables

Supplementary Tables 2–9 and 11–13.

Source data

Source Data Fig. 1

Unprocessed gel image for Fig. 1d–f.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Yeo, J.H., Lee, S., Kim, S. et al. High-throughput evaluation of in vitro CRISPR activities enables optimized large-scale multiplex enrichment of rare variants. Nat. Biomed. Eng (2025). https://doi.org/10.1038/s41551-025-01535-0
