Fig. 5: Escape mutations can be predicted by machine-learning approaches.
From: Comprehensive profiling of neutralizing polyclonal sera targeting coxsackievirus B3

a Overview of the machine-learning (ML) workflow: the dataset was preprocessed to reduce dimensionality by binning and randomly split into training (75%) and testing (25%) datasets. After 1,000 training-testing rounds, 27 outliers to which the ML classifier assigned a high probability of escape (>70%) were identified. The reliability of the ML classifier was re-evaluated after these outliers were reclassified from “no escape” to “escape”. b Receiver Operating Characteristic (ROC) curves for “no escape” (left) and “escape” (right) for training (blue), testing without reclassification (yellow), and testing with reclassification (pink). The area under the curve (AUC) is indicated in the insets. c Analysis of accuracy loss after feature shuffling. Features contributing more than 10% of the model’s accuracy are indicated in bold. d SHAP feature analysis. The colors indicate the relative value of numerical features, and the positive or negative impact of each feature on the class is measured on the horizontal axis. Relevant predictive features (in bold) are those with a large impact and a clear directionality in their numeric value. e The relative enrichment of individual amino acids in sites conferring escape versus surface-exposed residues where no escape is observed. f The relative enrichment of individual amino acids in mutations conferring escape versus mutations in the same residues that do not confer escape. neg: negatively charged; pos: positively charged; N: non-polar; P: polar. *p < 0.05, **p < 0.01, ***p < 0.005 by a two-sided Fisher’s exact test. Source data are provided as a Source Data file. See Figure S5 for characteristics of sites and mutations of escape and Tables S3 and S4 for a comparison of the different machine-learning algorithms and RF classifiers tested, respectively.
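
The significance marks in panels e and f come from a two-sided Fisher’s exact test on a 2×2 contingency table. The sketch below is an illustration only, not the authors’ analysis code: the function names `fisher_exact_two_sided` and `significance_stars` and the example counts are hypothetical, and the two-sided p-value is assumed to follow the common convention of summing the probabilities of all tables that are no more likely than the observed one. Only the star thresholds are taken from the legend.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows could be, for example, "escape" vs "no escape"; columns could be
    "residue belongs to the amino-acid class" vs "does not" (an assumed layout).
    """
    row1, row2, col1 = a + b, c + d, a + c
    n_tables = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of seeing x in the top-left cell
        # when all row and column totals are held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / n_tables

    p_observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # Sum over every table that is no more likely than the observed one;
    # the small relative tolerance guards against floating-point ties.
    p_value = sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-7)
    )
    return min(1.0, p_value)


def significance_stars(p: float) -> str:
    """Map a p-value to the legend's marks (*, **, *** at 0.05, 0.01, 0.005)."""
    if p < 0.005:
        return "***"
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    return ""


# Hypothetical counts, NOT taken from the paper's Source Data file.
p = fisher_exact_two_sided(8, 2, 1, 5)
print(f"p = {p:.4f} {significance_stars(p)}")
```

With these made-up counts the test gives p ≈ 0.035, which would be marked with a single asterisk under the thresholds stated in the legend.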