Extended Data Fig. 6: Validation of models that predict the cleavage ratio index. | Nature Biomedical Engineering


From: High-throughput evaluation of in vitro CRISPR activities enables optimized large-scale multiplex enrichment of rare variants

Extended Data Fig. 6

a–c, Evaluation of the models for HF1 (a), NRRH-HF1 (b), and NRCH-HF1 (c), trained without additional features, on datasets of cleavage ratio indices that were never used for training. Scatter plots of measured versus predicted cleavage ratio indices are shown, with the Pearson correlation coefficient (r) and the Spearman correlation coefficient (R) indicated. The number of sgRNA-target pairs in the test sets (n) = 5,831 for HF1, 4,402 for NRRH-HF1, and 5,570 for NRCH-HF1. d, Cleavage ratio index differences of sgRNAs suggested by DeepCut-HF1 (blue), of sgRNAs with 1-nt mismatches at positions 4–8 (red), and of sgRNAs with any mismatches at positions 4–8 (pink). Cleavage ratio index differences were measured using target mutations in the Cut2_30k test set, which was not used for training DeepCut-HF1. The boxes represent the 25th, 50th (median), and 75th percentiles; whiskers show the 10th and 90th percentiles. The P value, calculated using the Kruskal–Wallis test followed by Dunn’s post hoc test with Bonferroni correction, is shown. The number of target mutations (n) = 12. e, The recommended process for identifying optimized sgRNAs for CLOVE-seq. The process begins with the selection of a target mutation in the rare variant of interest. All possible PAM sites near the rare variant are identified, and perfectly matched sgRNAs are designed, taking into account cases in which the rare variant lies in either the spacer or the PAM sequence. For each perfectly matched sgRNA, divergent sgRNAs are generated by introducing 1-nt mismatches, 2-nt mismatches, 1-nt insertions, or 1-nt deletions into its guide sequence (an example is shown in f, and the total number of divergent sgRNAs in g). Each designed sgRNA is paired with both the noise sequence and the rare variant sequence, generating sgRNA-noise and sgRNA-rare variant pairs as input for the DeepCut model, which predicts cleavage ratio indices.
Finally, based on the predicted cleavage ratio indices, optimized sgRNAs are selected to achieve the best discrimination between noise and rare variant sequences. MM, mismatch; DB, DNA bulge (1-nt deletion in sgRNA); RB, RNA bulge (1-nt insertion in sgRNA). f, Examples of divergent sgRNAs. The noise and rare variant sequences contain a C (bold blue) and a G (bold black), respectively. The protospacer (sgRNA binding site) is underlined, and the PAM (AGG) is italicized. A perfectly matched sgRNA and two divergent sgRNAs are shown. g, Number of possible divergent sgRNAs for a single perfectly matched sgRNA. In total, 1,691 (= 57 + 1,539 + 76 + 19) divergent sgRNAs can be designed from one perfectly matched sgRNA.
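The count in g can be reproduced with simple combinatorics. The sketch below is illustrative only and rests on assumptions the legend does not state: it takes 19 editable guide positions, the number that matches all four terms (57 = 19 × 3 single substitutions, 1,539 = C(19, 2) × 9 double substitutions, 76 = 19 × 4 insertions, 19 = 19 × 1 deletions), and it reads the terms in the order listed in e (1-nt mismatches, 2-nt mismatches, insertions, deletions). Which guide position is left out is not given; holding the 5′-most base fixed, the example guide, and the function name `divergent_sgrnas` are all hypothetical.

```python
from itertools import combinations

BASES = "ACGT"


def divergent_sgrnas(guide: str, editable: range) -> dict[str, list[str]]:
    """Enumerate position-based variants of a perfectly matched guide (hypothetical helper).

    Variants are NOT deduplicated by sequence: e.g. deleting either base of an
    "AA" run is counted twice, as a purely positional count would do.
    """
    out = {"MM1": [], "MM2": [], "RB": [], "DB": []}
    # 1-nt mismatches: each editable position changed to each of the 3 other bases
    for i in editable:
        for b in BASES:
            if b != guide[i]:
                out["MM1"].append(guide[:i] + b + guide[i + 1:])
    # 2-nt mismatches: every pair of editable positions, 3 x 3 substitutions per pair
    for i, j in combinations(editable, 2):
        for bi in BASES:
            for bj in BASES:
                if bi != guide[i] and bj != guide[j]:
                    s = list(guide)
                    s[i], s[j] = bi, bj
                    out["MM2"].append("".join(s))
    # RNA bulge (1-nt insertion in the sgRNA): each of 4 bases inserted before each editable position
    for i in editable:
        for b in BASES:
            out["RB"].append(guide[:i] + b + guide[i:])
    # DNA bulge (1-nt deletion in the sgRNA): drop each editable position
    for i in editable:
        out["DB"].append(guide[:i] + guide[i + 1:])
    return out


guide = "GACGCATAAAGATGAGACGC"                  # hypothetical 20-nt guide
variants = divergent_sgrnas(guide, range(1, 20))  # ASSUMPTION: 5'-most base held fixed
counts = {k: len(v) for k, v in variants.items()}
print(counts, sum(counts.values()))           # {'MM1': 57, 'MM2': 1539, 'RB': 76, 'DB': 19} 1691
```

Because g gives a single number for any perfectly matched sgRNA, the count is presumably positional; deduplicating by sequence would yield fewer variants whenever the guide contains runs of identical bases.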

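The design loop in e can be sketched in Python as follows. This is a minimal illustration under assumptions the legend does not state: only the NGG PAM (taken here to be the PAM recognized by HF1) is scanned, on both strands; perfectly matched sgRNAs are taken from the noise sequence; only 1-nt mismatch variants are generated, although e also lists 2-nt mismatches and bulges; and sgRNAs are ranked by the predicted index on the noise target minus that on the rare variant target, which presumes the aim is to cleave the noise while sparing the variant. `toy_predictor` is a deliberately crude, hypothetical stand-in for DeepCut (not its interface), and all function names and sequences are illustrative.

```python
SPACER_LEN = 20                                # assumed SpCas9-style spacer length
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def candidate_sites(seq: str, var_pos: int) -> set[tuple[str, int]]:
    """(strand, lo) for every NGG site whose 23-nt protospacer+PAM window
    seq[lo:lo + 23] covers var_pos (variant in the spacer or in the PAM)."""
    w = SPACER_LEN + 3
    sites = set()
    for lo in range(max(0, var_pos - w + 1), min(var_pos, len(seq) - w) + 1):
        window = seq[lo:lo + w]
        if window.endswith("GG"):      # + strand: protospacer, then NGG
            sites.add(("+", lo))
        if window.startswith("CC"):    # - strand: CCN here is NGG on the other strand
            sites.add(("-", lo))
    return sites


def target(seq: str, strand: str, lo: int) -> str:
    """Protospacer + PAM, written 5'->3' on the strand the spacer matches."""
    window = seq[lo:lo + SPACER_LEN + 3]
    return window if strand == "+" else revcomp(window)


def toy_predictor(guide: str, tgt: str) -> float:
    """HYPOTHETICAL stand-in for DeepCut: tolerates one mismatch, not two."""
    if not tgt.endswith("GG"):         # PAM lost: no cleavage
        return 0.0
    mismatches = sum(a != b for a, b in zip(guide, tgt[:SPACER_LEN]))
    return {0: 1.0, 1: 0.875}.get(mismatches, 0.125)


def one_nt_mismatches(guide: str) -> list[str]:
    return [guide[:i] + b + guide[i + 1:]
            for i in range(len(guide)) for b in "ACGT" if b != guide[i]]


def rank_sgrnas(noise: str, variant: str, var_pos: int) -> list[tuple[float, str, int, str]]:
    rows = []
    # Union of both alleles' sites, so PAMs created or destroyed by the variant count.
    for strand, lo in candidate_sites(noise, var_pos) | candidate_sites(variant, var_pos):
        t_noise, t_var = target(noise, strand, lo), target(variant, strand, lo)
        perfect = t_noise[:SPACER_LEN]  # ASSUMPTION: matched to the noise sequence
        for guide in [perfect] + one_nt_mismatches(perfect):
            # ASSUMPTION: best sgRNA cleaves the noise and spares the rare variant
            score = toy_predictor(guide, t_noise) - toy_predictor(guide, t_var)
            rows.append((score, strand, lo, guide))
    return sorted(rows, reverse=True)


protospacer = "GACGCATAAAGATGAGACGC"          # hypothetical sequence
noise = "TTTTT" + protospacer + "AGG" + "TTTTT"
var_pos = 5 + 16                               # variant at protospacer position 17
variant = noise[:var_pos] + "T" + noise[var_pos + 1:]
ranked = rank_sgrnas(noise, variant, var_pos)
best_score, strand, lo, best_guide = ranked[0]
print(best_score, best_guide)                  # 0.75 TACGCATAAAGATGAGACGC
```

Under this toy scorer the perfectly matched sgRNA discriminates poorly (score 0.125) because it still cleaves the variant through a single mismatch, whereas guides carrying one extra mismatch away from the variant site score 0.75; ties are broken by sequence order, which carries no biological meaning. A real design would substitute DeepCut predictions and the full set of divergent sgRNAs.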