Fig. 4: Development of computational models to predict the activities of Cas12a variants.
From: Highly parallel profiling of the activities and specificities of Cas12a variants in human cells

a 5-fold cross-validation of seq-DeepCpf1variants models on training data sets. The Pearson correlation coefficients (left) and the Spearman correlation coefficients (right) are shown. Boxes represent the 25th, 50th, and 75th percentiles, and whiskers extend 1.5 times the interquartile range from the 25th and 75th percentiles. b Evaluation of seq-DeepCpf1variants computational models predicting the activities of Cas12a variants on data sets of indel frequencies that were never used for training. The Spearman correlation coefficient (ρ) and the Pearson correlation coefficient (r) are shown. c Evaluation of enseq-DeepCpf1 computational models predicting the activities of AsCas12a on the test data sets As HT and HT 1-2. The As HT test data set was split from the AsCas12a-induced high-throughput data set; the HT 1-2 test data set and the seq-DeepCpf1 computational models are derived entirely from Kim 2017 (ref. 57). The Spearman correlation coefficients are shown. Data are presented as mean values ± SEM. d Correlation between enseq-DeepCpf1 prediction scores and measured indel frequency ranks on an independent test data set, Kim 2017 (ref. 55) (n = 82 guide RNA-target sequence pairs, 293T). e Development and evaluation of enDeepCpf1 computational models after consideration of chromatin accessibility. Performance comparison of enDeepCpf1 with enseq-DeepCpf1 in HCT116 cells and HEK293T cells. The HEK-lenti training data set and the HCT plasmid and HEK plasmid test data sets are derived entirely from Kim 2017 (ref. 57). Data are presented as mean values ± SEM. f Correlation between enDeepCpf1 prediction scores and measured indel frequency ranks in three independent tests (Yin: n = 15, transfection, 293T; Chari 2017 (ref. 59): n = 38, transfection, 293T; Kim 2016 (ref. 60): n = 10, transfection, 293T). The Spearman correlation coefficients are shown. g, h Model comparison at integrated and endogenous sites on test data sets.
Pearson and Spearman correlation coefficients between different models and data sets of integrated target sites (g) and data sets of endogenous target sites (h). The test data sets are arranged vertically, whereas the prediction models are placed horizontally. Correlation coefficient values are listed in boxes. ****P < 0.0001 by two-tailed paired t-test. Source data are provided as a Source Data file.
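
For readers less familiar with the two correlation measures reported in these panels, the following is a minimal sketch of how a Pearson coefficient (r) and a Spearman coefficient (ρ) can be computed between model prediction scores and measured indel frequencies. It is not the authors' evaluation code: the function names and the numbers in `predicted` and `measured` are hypothetical and chosen only for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical prediction scores and measured indel frequencies (%).
predicted = [10.0, 20.0, 30.0, 40.0, 50.0]
measured = [1.0, 4.0, 9.0, 16.0, 100.0]

r = pearson(predicted, measured)
rho = spearman(predicted, measured)
print(f"Pearson r = {r:.3f}, Spearman rho = {rho:.3f}")
```

In this toy example the measured values increase monotonically but not linearly with the predictions, so ρ equals 1 (up to floating-point rounding) while r is lower. This is why the two coefficients can differ for the same model and data set.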