Fig. 4: Prognostic models for distinguishing between NYHA class IV patients and those in NYHA class II and III. | Nature Communications

From: Integrated plasma and vegetation proteomic characterization of infective endocarditis for early diagnosis and treatment

Fig. 4

A Schematic of the machine learning (ML) pipeline used to develop the prognostic models. B–G ML models built on CBMs (B–D) and on the plasma proteome (E–G). Bar plots show the weighted feature importance of the top 10 CBMs (B) and the top 20 DEPs (E) identified by the ensemble models. Error bars represent the SEM of the weighted feature importance calculated from five-fold cross-validation across six algorithms. Data are presented as mean ± SEM (n = 6). Scatter line plots (C, F) show recursive feature elimination with cross-validation (RFECV), used to find the feature combination with the highest model accuracy. ROC curves (D, G) show the AUC of each model. H Scatter line plot showing RFECV-based feature selection for the proteome + CBM model, starting from the 6 CBMs and 14 DEPs identified independently in the CBM-only and proteome-only models, to maximize model accuracy. I Boxplots and heatmap of the 15 features selected in the proteome + CBM model, which differ significantly between NYHA class IV patients (n = 60) and those in NYHA class II and III (n = 134). In the boxplots, the central line represents the median, the box indicates the IQR, and whiskers extend to 1.5 × IQR, with outliers shown as individual points. Differential analysis used the unpaired two-sided Wilcoxon rank-sum test. The pathogen identified by mNGS for each sample is annotated above the heatmap; blanks indicate missing records. J, K ROC curves (J) and confusion matrices (K) showing the performance of the proteome + CBM model on the hold-out test set of the discovery cohort (Cohort 1) and on the external validation cohort (Cohort 3). In C, F and H, error bands represent the 95% CI from five-fold cross-validation. Source data are provided as Source Data files.
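The caption relies on two closely linked statistics: the unpaired two-sided Wilcoxon rank-sum test (panel I) and the ROC AUC (panels D, G, J). The Mann–Whitney U statistic from the rank-sum test, divided by n1·n2, equals the AUC obtained when a single feature (or a model score) is used to separate the two groups. The plain-Python sketch below illustrates that relationship. It is not the authors' code: the function names `midranks` and `rank_sum_test` are hypothetical, the feature values are invented, and the p-value uses the tie-corrected normal approximation rather than an exact test. In practice, `scipy.stats.mannwhitneyu` and `sklearn.metrics.roc_auc_score` provide these computations.

```python
import math
from statistics import NormalDist


def midranks(values):
    """Assign 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def rank_sum_test(x, y):
    """Two-sided unpaired Wilcoxon rank-sum (Mann-Whitney U) test.

    Returns (U for group x, AUC, p-value). The p-value is the
    tie-corrected normal approximation (an assumption of this sketch),
    not the exact permutation distribution.
    """
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both groups need at least one observation")
    pooled = list(x) + list(y)
    ranks = midranks(pooled)
    r1 = sum(ranks[:n1])            # rank sum of group x
    u1 = r1 - n1 * (n1 + 1) / 2     # Mann-Whitney U for group x
    auc = u1 / (n1 * n2)            # P(X > Y) + 0.5 * P(X == Y)

    n = n1 + n2
    # Tie correction: sum of (t^3 - t) over each group of t tied values.
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in counts.values())
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u <= 0:
        return u1, auc, 1.0  # all values tied: no evidence of a difference
    z = (u1 - mean_u) / math.sqrt(var_u)
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return u1, auc, min(p, 1.0)


# Hypothetical feature values for illustration only, not study data.
class_iv = [5.1, 6.3, 7.2, 8.0]
class_ii_iii = [1.0, 2.2, 3.1, 4.4, 5.0]
u, auc, p = rank_sum_test(class_iv, class_ii_iii)
print(u, auc, round(p, 4))
```

Here every hypothetical class IV value exceeds every class II/III value, so U reaches its maximum n1·n2 and the AUC is 1.0. With real features, overlapping distributions pull the AUC toward 0.5 and the p-value toward 1.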
