Fig. 2: Validation of modeling strategy by nested cross-validation and determination of essential features by cross-validation.

a, d, g, and j ROC curves from nested cross-validation of random forest models for HY stages 1–3 (a), 1 (d), 2 (g), and 3 (j) at year 0. Red and blue solid lines represent models built from bacterial and clinical features, respectively. The optimal point by the Youden index is marked with a dot, with the specificity and sensitivity given in parentheses. b, e, h, and k AUROCs from leave-one-out cross-validation of random forest models for HY stages 1–3 (b), 1 (e), 2 (h), and 3 (k) at year 0, as features were recursively eliminated. An arrow marks the maximum AUROC and is labeled with the corresponding number of features. c, f, i, and l ROC curves from leave-one-out cross-validation of the random forest models at the maximum AUROC for HY stages 1–3 (c), 1 (f), 2 (i), and 3 (l) at year 0. The optimal point by the Youden index is marked with a dot, with the specificity and sensitivity given in parentheses. The green ROC curve in f represents a model that uses Fusicatenibacter alone as the predictor, and its AUROC is plotted in green in e.
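
As an illustration of how the dot on each ROC curve can be located, the following minimal sketch picks the threshold that maximizes the Youden index, J = sensitivity + specificity - 1. It is not the authors' code: the function name `youden_optimal_point` and the toy labels and scores are hypothetical, and ties are resolved in favor of the highest threshold, which is an assumption.

```python
def youden_optimal_point(labels, scores):
    """Return (threshold, specificity, sensitivity, J) maximizing the Youden index.

    labels: 0/1 class labels (1 = positive); scores: predicted probabilities.
    A sample is called positive when its score is >= threshold.
    """
    positives = sum(1 for y in labels if y == 1)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative sample")

    best = None
    # Every distinct score is a candidate threshold, scanned from high to low.
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
        tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < threshold)
        sensitivity = tp / positives
        specificity = tn / negatives
        j = sensitivity + specificity - 1
        # Strict ">" keeps the first (highest) threshold on ties (an assumption).
        if best is None or j > best[3]:
            best = (threshold, specificity, sensitivity, j)
    return best


# Hypothetical toy data, not taken from the study.
labels = [0, 0, 1, 0, 1, 1, 0, 1]
scores = [0.1, 0.2, 0.35, 0.4, 0.6, 0.7, 0.55, 0.9]
threshold, specificity, sensitivity, j = youden_optimal_point(labels, scores)
print(f"threshold={threshold}, (specificity, sensitivity)=({specificity}, {sensitivity})")
```

The pair printed in parentheses follows the same order as in the panels (specificity, then sensitivity).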