Fig. 7: Within human datasets, the PAM proximal sequence defines an upper bound of potential activity for a given gRNA.
From: Genome dependent Cas9/gRNA search time underlies sequence dependent gRNA activity

a Together, the two largest human datasets contain gRNAs that represent all possible 5-mers in the PAM proximal position. b We combined these datasets, grouped all gRNAs by the PAM proximal 5 bp, calculated an average activity for each group, and then used these group averages to predict gRNA activity in all human datasets. c We correlated this predicted activity with actual activity using Pearson correlation. The datasets that we used to generate the averages are highlighted in blue, while test datasets with normal or higher gRNA expression are highlighted in orange and gray, respectively. We then calculated the residuals (Activity − Predicted Activity) for d the two training datasets, e all of the wild-type Cas9 datasets, and f all other Cas9 variants. Datasets with normal gRNA expression are in orange and those with higher gRNA expression are in gray. g For each of the training datasets we compared predicted activity on the x-axis to actual activity on the y-axis. h The same comparison is shown for each of the other datasets. Numbers in the top left of each sub-plot refer to the datasets identified in c. Refer to Supplementary Fig. 5 for a similar analysis for E. coli datasets. i We applied the same prediction method to a dCas9 dataset from Horlbeck et al. 2016 (ref. 39) and base editing datasets from Marquart et al. 2020 (ref. 40). Shown are the Pearson correlations between Activity and Predicted Activity. We then j calculated the residuals and k plotted Predicted Activity against Actual Activity.
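The grouping, prediction, correlation, and residual steps described in panels b to d can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the function names, the toy spacer sequences, and the activity values are all invented, and it assumes each spacer is written 5'→3' so that the PAM proximal 5 bp are its last five bases.

```python
from collections import defaultdict
from math import sqrt


def pam_proximal_kmer(spacer, k=5):
    """Return the k bases nearest the PAM (assumed here: the 3' end of the spacer)."""
    return spacer[-k:].upper()


def fit_kmer_means(training, k=5):
    """Average measured activity over all training gRNAs that share a PAM proximal k-mer."""
    groups = defaultdict(list)
    for spacer, activity in training:
        groups[pam_proximal_kmer(spacer, k)].append(activity)
    return {kmer: sum(vals) / len(vals) for kmer, vals in groups.items()}


def predict(spacers, kmer_means, k=5):
    """Predicted activity is the training-group mean for each gRNA's k-mer.

    Raises KeyError for an unseen k-mer; this sketch relies on the training
    data covering every k-mer that appears in the test data.
    """
    return [kmer_means[pam_proximal_kmer(s, k)] for s in spacers]


def pearson(x, y):
    """Plain Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def residuals(actual, predicted):
    """Residual = Activity - Predicted Activity, as defined in the caption."""
    return [a - p for a, p in zip(actual, predicted)]


# Toy training data (hypothetical 20-nt spacers and activities).
training = [
    ("AAAAAAAAAAAAAAAGGGCC", 0.75),
    ("CCCCCCCCCCCCCCCGGGCC", 0.5),
    ("AAAAAAAAAAAAAAATTTTT", 0.25),
    ("GGGGGGGGGGGGGGGTTTTT", 0.5),
]

# Toy test data (hypothetical), scored with the training-group means.
test_spacers = [
    "TTTTTTTTTTTTTTTGGGCC",
    "TTTTTTTTTTTTTTTTTTTT",
    "ACACACACACACACAGGGCC",
]
test_activity = [0.5, 0.25, 0.75]

kmer_means = fit_kmer_means(training)
predicted = predict(test_spacers, kmer_means)
r = pearson(predicted, test_activity)
resid = residuals(test_activity, predicted)

print("group means:", kmer_means)
print("predicted:  ", predicted)
print("Pearson r:  ", round(r, 3))
print("residuals:  ", resid)
```

In this sketch a gRNA whose measured activity falls below its group mean gets a negative residual, which is the pattern expected if the PAM proximal sequence sets an upper bound on activity.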