Fig. 1

Performance and interpretability of machine learning models applied to breathomics data. (A) Macro-averaged receiver operating characteristic (ROC) curves for the seven classification models, obtained from outer cross-validation. (B) SHAP summary plot showing the 10 volatile organic compounds (VOCs) that contribute most to classification across asthma, bronchiectasis, and COPD. Each horizontal bar represents one VOC, identified on the y-axis by its PubChem compound identifier (CID); bar length indicates the magnitude of the VOC's contribution, and color indicates its class-specific contribution. (Abbreviations: COPD = chronic obstructive pulmonary disease, kNN = k-nearest neighbors, LR = logistic regression, NB = naïve Bayes, DT = decision tree, SVM = support vector machine, RF = random forest, SHAP = SHapley Additive exPlanations, and XGBoost = extreme gradient boosting.)
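To illustrate what "macro-averaged" means for a three-class problem such as asthma vs. bronchiectasis vs. COPD, the following is a minimal sketch of one common construction. It computes a one-vs-rest ROC curve for each class, interpolates each curve onto a shared false-positive-rate grid, and averages the true-positive rates. It is not the study's pipeline. The helper names (`ovr_roc`, `macro_roc`) and the 1001-point grid are hypothetical, and so is the assumption that the scores are out-of-fold class probabilities pooled across the outer cross-validation folds.

```python
import numpy as np

# Hypothetical helpers for illustration only; not the study's code.


def ovr_roc(y_binary, scores):
    """One-vs-rest ROC points (FPR, TPR) for binary labels in {0, 1}.

    Assumes at least one positive and one negative example.
    """
    order = np.argsort(-scores, kind="mergesort")  # sort by descending score
    y_sorted = y_binary[order]
    s_sorted = scores[order]
    # Last index of each run of tied scores gives one threshold per distinct score.
    cut = np.r_[np.where(np.diff(s_sorted))[0], y_sorted.size - 1]
    tps = np.cumsum(y_sorted)[cut]  # true positives at each threshold
    fps = (cut + 1) - tps           # false positives at each threshold
    tpr = np.r_[0.0, tps / tps[-1]]
    fpr = np.r_[0.0, fps / fps[-1]]
    return fpr, tpr


def macro_roc(y, proba, n_classes, n_grid=1001):
    """Macro-average the per-class ROC curves on a shared FPR grid."""
    grid = np.linspace(0.0, 1.0, n_grid)
    tprs = []
    for k in range(n_classes):
        fpr, tpr = ovr_roc((y == k).astype(int), proba[:, k])
        tprs.append(np.interp(grid, fpr, tpr))  # TPR at each grid FPR
    mean_tpr = np.mean(tprs, axis=0)
    # Trapezoidal area under the averaged curve.
    auc = float(np.sum(np.diff(grid) * (mean_tpr[1:] + mean_tpr[:-1]) / 2))
    return grid, mean_tpr, auc


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    y = rng.integers(0, 3, size=300)  # hypothetical labels 0, 1, 2
    proba = rng.random((300, 3))
    proba /= proba.sum(axis=1, keepdims=True)
    _, _, auc = macro_roc(y, proba, n_classes=3)
    print(f"macro-averaged AUC of random scores: {auc:.3f}")
```

Random scores should give an area near 0.5, and a perfect classifier gives an area near 1. This grid-averaging approach mirrors the multiclass ROC example in the scikit-learn documentation. Another common macro AUC is the plain mean of the per-class AUCs, which can differ slightly from the area under the averaged curve.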