Extended Data Fig. 3: Random Forest Classification Model Schematic and Performance. | Nature Medicine

From: Multi-omic profiling a defined bacterial consortium for treatment of recurrent Clostridioides difficile infection

A, Schematic of the random forest model used to predict features of CDI recurrence and VE303 colonization. Participants were classified as recurrent or non-recurrent and as VE303 colonized or non-colonized using a combination of continuous and categorical independent variables, including metagenomic, immunological and metabolite datasets and patient-level metadata such as demographics and medical history. Random forest classification across different data modalities was used to predict: B, the colonization status of VE303 strains across all dosed participants (diamonds and error bars represent the mean area under the receiver operating characteristic curve (AUC) ± standard error (se) over n = 10 model iterations); C, on-study CDI recurrence status including all VE303 recipients (diamonds and error bars represent the mean AUC ± se over n = 100 model iterations); and D, on-study CDI recurrence status excluding participants who recurred later than Day 14 (diamonds and error bars represent the mean AUC ± se over n = 100 model iterations). Leave-one-out cross-validation was performed to assess model performance. AUC was computed for all models; an AUC of ≤0.5 (dashed red line) indicates random classification or unreliable model performance. Number labels on each graph indicate the total number of samples available per timepoint and dataset; brackets convey class sizes for (B) colonized/non-colonized and (C, D) recurrent/non-recurrent classes. Missing values indicate insufficient sample availability: fewer than 3 recurrent participants were available for analysis, making results from the corresponding model unreliable.
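The summary statistic plotted in panels B to D (mean AUC ± standard error over repeated model iterations, read against the 0.5 chance line) can be illustrated with a minimal sketch. This is not the authors' pipeline: the held-out scores are simulated by a hypothetical `simulate_iteration` helper rather than produced by a random forest under leave-one-out cross-validation, the class sizes are invented, and the AUC is computed with the standard rank-based (Mann-Whitney) definition.

```python
import math
import random
import statistics


def roc_auc(labels, scores):
    """AUC as the probability that a random positive outscores a random
    negative, with ties counted as 0.5 (Mann-Whitney formulation)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one sample from each class")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def mean_and_se(values):
    """Mean and standard error (sample SD / sqrt(n)) of per-iteration AUCs."""
    mean = statistics.mean(values)
    se = statistics.stdev(values) / math.sqrt(len(values))
    return mean, se


def simulate_iteration(rng, labels, signal):
    """Hypothetical stand-in for one model iteration's held-out scores.
    In a real analysis these would be the pooled leave-one-out predictions."""
    return [signal * y + rng.gauss(0.0, 1.0) for y in labels]


rng = random.Random(0)
labels = [1] * 8 + [0] * 22  # invented class sizes, not the study's
aucs = [
    roc_auc(labels, simulate_iteration(rng, labels, signal=1.5))
    for _ in range(100)  # n = 100 iterations, as in panels C and D
]
mean_auc, se_auc = mean_and_se(aucs)
print(f"mean AUC = {mean_auc:.3f} ± {se_auc:.3f} (n = {len(aucs)} iterations)")
```

A mean AUC whose error bar sits clearly above 0.5 suggests the model separates the two classes better than chance; with very small class sizes, such as fewer than 3 recurrent participants, the per-iteration AUC takes only a handful of distinct values and the summary becomes unreliable.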
