Extended Data Fig. 2: Additional validation of the DeepPE model.
From: Predicting prime editing efficiency and product purity by deep learning

a, Predicted (PRIDICT) and measured intended editing efficiencies for G-to-C edits at position 5 of the RTT in the dataset from this study. Data from all five test sets (fivefold cross-validation) were combined for this visualization. n = 540. b, Evaluation of the attention-based bidirectional RNN (PRIDICT-AttnBiRNN; trained on the dataset from this study) by testing on pegRNAs from the Kim et al. 2021 high-throughput (HT) dataset (G-to-C edits at position 5 only). n = 4,457. c, Evaluation of the original DeepPE model (trained on the Kim et al. 2021 HT dataset) by testing on the dataset from this study (G-to-C edits at position 5 only). n = 540. d,e, SHAP analysis of XGBoost models trained and tested on the DeepPE dataset (n = 43,149) (d) or on G-to-C position 5 edits from library 1 (e). Feature descriptions are listed in Supplementary Table 1. f, Editing efficiency with different RTT overhang lengths (5, 7, 10, 15 bp) in the DeepPE (Kim et al.) dataset. n for each bar (left to right) = 10,746, 10,828, 10,921 and 10,654. g, Editing efficiency with different RTT overhang lengths (3, 7, 10, 15 bp) in G-to-C position 5 edits of library 1, for a direct comparison with identical edits in the DeepPE dataset. n for each bar (left to right) = 135, 135, 137 and 133. f,g, Error bars indicate mean ± s.d. h,i, Evaluation of the DeepPE model on 18 of the 45 endogenous edits from this study (n = 18) in HEK293T (h) and K562 (i) cells.
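For readers reproducing the panel a-c style evaluations, the sketch below pools the held-out predictions from all cross-validation folds and computes correlation between predicted and measured efficiencies. The file name and column names are hypothetical placeholders, not taken from the published code.

```python
import pandas as pd
from scipy.stats import pearsonr, spearmanr

# Hypothetical table: one row per pegRNA, with held-out predictions
# pooled across all five cross-validation test folds (as in panel a).
df = pd.read_csv("pridict_test_predictions.csv")

r, p = pearsonr(df["predicted_efficiency"], df["measured_efficiency"])
rho, _ = spearmanr(df["predicted_efficiency"], df["measured_efficiency"])
print(f"Pearson R = {r:.2f} (P = {p:.1e}); Spearman rho = {rho:.2f}")
```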
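Panels d and e rest on SHAP values computed for gradient-boosted trees. A minimal sketch of that workflow follows, using synthetic placeholder data in place of the actual feature matrix described in Supplementary Table 1; hyperparameters are illustrative, not the authors' settings.

```python
import numpy as np
import shap
import xgboost

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 20))  # placeholder for the pegRNA feature matrix
y = 2.0 * X[:, 0] - X[:, 1] + rng.normal(scale=0.5, size=2000)  # placeholder efficiencies

# Gradient-boosted regression trees, the model class analysed in d,e.
model = xgboost.XGBRegressor(n_estimators=300, max_depth=6, learning_rate=0.1)
model.fit(X, y)

# TreeExplainer yields exact SHAP values for tree ensembles; the summary
# plot ranks features by their contribution to predicted efficiency.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)
shap.summary_plot(shap_values, X)
```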
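The bar summaries in panels f and g reduce to a grouped mean ± s.d. computation over pegRNAs binned by RTT overhang length. A minimal sketch, assuming a hypothetical input table with one row per pegRNA:

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical input: each row holds one pegRNA's RTT overhang length
# and its measured editing efficiency.
df = pd.read_csv("library1_gtoc_pos5.csv")
stats = (df.groupby("rtt_overhang_length")["editing_efficiency"]
           .agg(["mean", "std", "count"]))

# Bars show the group mean; error bars show one standard deviation.
plt.bar(stats.index.astype(str), stats["mean"], yerr=stats["std"], capsize=3)
plt.xlabel("RTT overhang length (bp)")
plt.ylabel("Editing efficiency (%)")
plt.tight_layout()
plt.show()
```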