Fig. 5: Machine learning predictive models fitted using ten-by-ten-fold nested cross-validation for response to etanercept, tocilizumab and rituximab.

a Schematic of the machine learning pipeline. b Box plots of model performance for each of the three trial drugs. Multiple types of machine learning (ML) models were fitted to baseline synovial RNA-Seq gene expression data to predict response to each trial drug at the 16-week primary endpoint, with response defined as DAS28-ESR <3.2. Model types: gradient boosted machine (gbm), elastic net regression (glmnet), mixture discriminant analysis (mda), random forest (rf), support vector machine (svm) with polynomial (svmPoly) or radial (svmRadial) kernel, and extreme gradient boosting (xgboost) with tree booster (xgbTree) or linear booster (xgbLinear). Unbiased model performance was determined by 10 × 10-fold nested cross-validation (CV) with 25 repeats (each point shows one repeat), with the area under the receiver operating characteristic (ROC) curve as the performance metric for etanercept and tocilizumab. The coefficient of determination (R²) was used as the performance metric for rituximab models (see Methods), which were fitted to an ordinal (four-level) response outcome, as this led to improved final binary response prediction. Box plots show the median and the upper and lower quartiles, with whiskers denoting the maximum and minimum values within 1.5 × interquartile range (IQR). c ROC curves for the final best model for each drug, showing the nested CV ROC and the ROC calculated from inner CV folds. d Variable importance plots showing the stability of variables selected by the final ML model for each drug across nested CV. Error bars show the standard error of the mean variable importance; point size shows the frequency with which each gene/predictor was selected by models during nested CV. Point colour shows the direction of association with response: red for genes/predictors upregulated in non-response, blue for genes/predictors upregulated in response. e Validation of STRAP-trained tocilizumab and rituximab machine learning models in R4RA.
Models for tocilizumab and rituximab shown in c, d were applied to synovial RNA-Seq and data from patients randomised to treatment with tocilizumab (n = 65) or rituximab (n = 68) in the R4RA trial. Predicted outcome was compared to the real outcome, with response defined as DAS28-ESR <3.2 at the 16-week primary endpoint of the trial. Predictive model performance was assessed by ROC AUC.
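
The nested cross-validation scheme described in this caption can be illustrated with a minimal sketch. It is not the pipeline used in the study. It assumes synthetic data, a toy mean-difference classifier standing in for the listed model types, a hypothetical tuning grid over the number of selected features, and 5 repeats rather than 25 to keep the run short. All function names are hypothetical. The point it shows is that both tuning and feature selection see only the outer training rows, so the pooled outer-fold AUC is not inflated by the tuning step.

```python
import random
from statistics import mean

def kfold(rows, k, rng):
    """Shuffle row indices and split them into k roughly equal folds."""
    idx = list(rows)
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k)]

def auc(scores, labels):
    """ROC AUC: probability that a positive outranks a negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def fit(X, y, rows, n_feat):
    """Toy model (assumption): keep the n_feat features whose class means differ most."""
    diffs = []
    for j in range(len(X[0])):
        m1 = mean(X[i][j] for i in rows if y[i] == 1)
        m0 = mean(X[i][j] for i in rows if y[i] == 0)
        diffs.append((abs(m1 - m0), j, m1 - m0))
    return [(j, d) for _, j, d in sorted(diffs, reverse=True)[:n_feat]]

def predict(model, x):
    """Score is the mean of the selected features, each signed towards class 1."""
    return sum(x[j] if d > 0 else -x[j] for j, d in model) / len(model)

def pooled_cv_scores(X, y, rows, k, rng, choose):
    """Run k-fold CV on `rows`; `choose(train)` returns n_feat for each training set.
    Held-out scores from all folds are pooled so one AUC can be computed."""
    scores, labels = [], []
    for fold in kfold(rows, k, rng):
        held = set(fold)
        train = [i for i in rows if i not in held]
        model = fit(X, y, train, choose(train))  # selection uses training rows only
        scores += [predict(model, X[i]) for i in fold]
        labels += [y[i] for i in fold]
    return scores, labels

def nested_cv_auc(X, y, grid, k_outer=10, k_inner=10, seed=0):
    """One repeat of k_outer x k_inner nested CV; returns the pooled outer-fold AUC."""
    rng = random.Random(seed)

    def tune(train):
        # Inner CV on the outer training rows: pick the grid value with the best AUC.
        return max(grid, key=lambda g: auc(
            *pooled_cv_scores(X, y, train, k_inner, rng, lambda _: g)))

    return auc(*pooled_cv_scores(X, y, list(range(len(y))), k_outer, rng, tune))

def make_data(n=60, p=20, informative=3, shift=1.5, seed=1):
    """Synthetic stand-in for expression data: the first few features carry signal."""
    rng = random.Random(seed)
    y = [i % 2 for i in range(n)]
    X = [[rng.gauss(shift * y[i] if j < informative else 0.0, 1.0)
          for j in range(p)] for i in range(n)]
    return X, y

X, y = make_data()
# One AUC per repeat, analogous to one point per repeat in a box plot.
repeat_aucs = [nested_cv_auc(X, y, grid=[1, 3, 10], seed=r) for r in range(5)]
print([round(a, 2) for a in repeat_aucs])
```

Each repeat reshuffles the fold assignment, so the spread of `repeat_aucs` reflects how sensitive the estimate is to the particular split rather than to the tuning procedure.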