Fig. 4: GNE-induced mutations are enriched in predicted deleterious effects.
From: Perturbing proteomes at single residue resolution using base editing

a SIFT score distributions for the most likely induced mutations of both GNEs (blue) and NSGs (red). The thresholds for the categories used in the enrichment calculations in b are shown as black dotted lines. SIFT scores represent the probability of a specific mutation being tolerated based on evolutionary information: the first threshold of 0.05 was set by the authors in the original manuscript (ref. 32) but might be permissive considering the number of mutations tested in our experiment (n = 571, 12,718, 457, 8,767, 430, 7,609, 343, 5,847). All GNE vs NSG score comparisons are significant (Welch’s t-test p-values: 1.64 × 10⁻²¹, 5.99 × 10⁻²⁰, 1.62 × 10⁻¹², 1.75 × 10⁻⁹). Boxplots represent the upper and lower quartiles of the data, with the median shown as a black bar. Whiskers extend to 1.5 times the interquartile range (Q3–Q1) at most. Outliers are shown in gray. The box cutoff is due to the large fraction of mutations for which the SIFT score is 0. b Enrichment folds of GNEs over NSGs for different variant effect prediction measurements. Envision score (Env.), SIFT score (SIFT), protein folding stability based on solved protein structures (Struct. ∆∆G), protein folding based on homology models (Model ∆∆G) and protein–protein interaction interface stability based on structure data (Inter. ∆∆G). The predictions based on conservation and experimental data are grouped under ‘Predictors’ and those based on the computational analysis of protein structures and complexes under ‘Structural’. Source data are available in the Source Data file.
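As an illustration of the two quantities named in the caption, the following is a minimal sketch, not the authors' pipeline. It assumes that the fold enrichment is the ratio between the fraction of GNE mutations and the fraction of NSG mutations falling at or below a score threshold, and it computes the Welch t statistic with its Welch-Satterthwaite degrees of freedom. The function names and the toy scores are hypothetical; the caption does not specify how the enrichment was calculated.

```python
from math import sqrt
from statistics import mean, variance

# First SIFT threshold named in the caption. Treating scores at or below
# it as "predicted deleterious" is an assumption of this sketch.
SIFT_THRESHOLD = 0.05


def fraction_at_or_below(scores, threshold):
    """Fraction of scores that are <= threshold."""
    return sum(s <= threshold for s in scores) / len(scores)


def fold_enrichment(gne_scores, nsg_scores, threshold=SIFT_THRESHOLD):
    """Ratio of the GNE fraction to the NSG fraction at or below threshold.

    Assumed definition; the caption only says 'enrichment folds of GNEs
    over NSGs'.
    """
    nsg_fraction = fraction_at_or_below(nsg_scores, threshold)
    if nsg_fraction == 0:
        raise ValueError("no NSG score at or below threshold; ratio undefined")
    return fraction_at_or_below(gne_scores, threshold) / nsg_fraction


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    va = variance(a) / len(a)  # squared standard error of mean(a)
    vb = variance(b) / len(b)  # squared standard error of mean(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Toy scores invented for illustration only; they are not data from the study.
toy_gne = [0.0, 0.01, 0.2, 0.6]
toy_nsg = [0.0, 0.04, 0.3, 0.5, 0.8, 1.0, 0.4, 0.7]

if __name__ == "__main__":
    print("fold enrichment:", fold_enrichment(toy_gne, toy_nsg))
    print("Welch t, df:", welch_t(toy_gne, toy_nsg))
```

Converting the t statistic into a p-value requires the Student t distribution, which the Python standard library does not provide; a statistics package would be needed for that step.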