Figure 2 | Scientific Reports

Figure 2

From: An integrated pipeline for prediction of Clostridioides difficile infection

Associations among features and model performance in predicting CDI in MyCode and nonMyCode samples with (simulated) genotypes included. (a, b) Heatmaps showing the significant associations between the variables used in the prediction model in the training dataset. Data extraction and pre-processing details (z-scored index age, binary codes for other variables) have been described previously. Associations among variables (with index age further dummy coded) were assessed using a bivariate χ² test. (c, d) To examine the discriminative power of each modeling algorithm in the testing dataset, we estimated the area under the receiver operating characteristic curve (AUROC) using common clinical risk factors for CDI, with or without rs2227306, as predictors. The rs2227306 genotypes were simulated in the nonMyCode samples. (e) Summary of the AUROCs of the optimal modeling algorithms (gbm and xgbDART) versus glm using the simulated rs2227306 genotype. P values are from the DeLong test comparing AUROCs between models with and without (simulated) genetic data, with or without propensity score matching (PSM) for index age and sex.
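As a rough illustration of the AUROC comparison in panels (c) to (e), the sketch below computes AUROC as the probability that a randomly chosen case receives a higher predicted risk than a randomly chosen control, with ties counted as one half. It is not the authors' pipeline: the function name `auroc`, the labels, and both score vectors are hypothetical toy values. It reports the two AUROCs only and does not implement the DeLong test used for the P values.

```python
def auroc(labels, scores):
    """Rank-based AUROC: fraction of (case, control) pairs in which the
    case has the higher score; tied pairs contribute 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data (assumption, not taken from the study):
# 1 = CDI case, 0 = control.
labels = [1, 1, 1, 0, 0, 0, 0, 1]
# Predicted risks from a model using clinical risk factors only.
scores_clinical = [0.8, 0.4, 0.6, 0.5, 0.2, 0.3, 0.7, 0.9]
# Predicted risks from a model that also includes a (simulated) genotype.
scores_with_snp = [0.9, 0.6, 0.7, 0.5, 0.1, 0.3, 0.65, 0.9]

auc_clinical = auroc(labels, scores_clinical)
auc_with_snp = auroc(labels, scores_with_snp)
print(f"AUROC, clinical only: {auc_clinical:.4f}")
print(f"AUROC, with genotype: {auc_with_snp:.4f}")
```

Because both models are scored on the same test samples, their AUROCs are correlated, which is why a paired procedure such as the DeLong test is needed to judge whether the difference is significant.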