Fig. 7

Construction and evaluation of machine learning (ML) models for distinguishing TB clusters. (A–B) Residual distribution plots for the four ML models. The XGBoost (XGB) and Support Vector Machine (SVM) models show the smallest residuals, centered most closely on zero, indicating better fit and prediction accuracy. (C) Variable importance plots showing the top 15 predictive genes for each model, ranked by their contribution to the model’s performance. (D) Receiver operating characteristic (ROC) curves of the four models under 5-fold cross-validation. The Random Forest (RF), SVM, and XGB models achieve excellent performance (AUC > 0.96), significantly outperforming the Generalized Linear Model (GLM).
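For readers unfamiliar with the quantities in panels A–B and D, the following minimal Python sketch shows how residuals and AUC can be computed from predicted class probabilities, and how samples might be split into five folds. It is an illustration only, not the pipeline used for this figure: the function names (`residuals`, `roc_auc`, `k_fold_indices`) and the toy labels and probabilities are hypothetical, and the caption does not state how out-of-fold predictions were aggregated.

```python
from typing import List, Sequence


def residuals(y_true: Sequence[int], y_prob: Sequence[float]) -> List[float]:
    """Residual per sample: observed label (0 or 1) minus predicted probability."""
    return [y - p for y, p in zip(y_true, y_prob)]


def roc_auc(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair. This equals the area under the ROC curve.
    """
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def k_fold_indices(n_samples: int, k: int = 5) -> List[List[int]]:
    """Assign sample indices to k held-out folds in round-robin order (no shuffling)."""
    return [list(range(i, n_samples, k)) for i in range(k)]


# Hypothetical toy data: cluster labels and predicted probabilities of cluster 1.
toy_labels = [1, 1, 1, 0, 0, 0]
toy_probs = [0.9, 0.8, 0.4, 0.5, 0.2, 0.1]

toy_residuals = residuals(toy_labels, toy_probs)
toy_auc = roc_auc(toy_labels, toy_probs)
toy_folds = k_fold_indices(10, k=5)
```

In a cross-validated setting, each sample's probability would come from a model trained on the other four folds, so residuals near zero and an AUC near 1 reflect performance on held-out data.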