Fig. 1: Development of Rule Set 3 (Sequence + Target).

a Fraction of overlap between the sgRNAs used for training Rule Set 3 and those used for training other on-target models. Edge width and color are proportional to the fraction of overlap. Node size is proportional to the number of sgRNAs. b Schematic depicting Rule Set 3 (Sequence + Target) development. Nucleotide differences in tracrRNA sequences are colored. Models were trained only on the training set, as indicated by blue outlines. Italics indicate features for which information was obtained from existing databases. c SHAP feature importance for the 20 most important features in Rule Set 3 (Sequence). Each point represents one sgRNA from the training set. Descriptions of model features can be found in Supplementary Data 3. d Histograms of SHAP values for sgRNAs, colored by guanine status in the 20th sgRNA position and split by tracrRNA identity. e SHAP feature importance for the 20 most important features in Rule Set 3 (Target). Each point represents one sgRNA from the training set. Descriptions of model features can be found in Supplementary Data 3. Source data are provided as a Source Data file.