Fig. 2: Construction and validation of the serum metabolic classifier for discriminating between benign and malignant pulmonary nodules.

a Workflow for establishing the pulmonary nodule classifier: an optimal serum metabolite panel was selected in the discovery set by tenfold cross-validation using a binary logistic regression model, and its predictive performance was evaluated in the internal and external validation sets. b Cross-validation statistics of the LASSO regression model used to select metabolic biomarkers. Numbers above the plot indicate the average number of biomarkers selected at a given λ. The red dotted line indicates the mean AUC at the corresponding λ. The gray error band indicates the minimum and maximum AUC values. The dashed line marks the optimal model with 27 selected biomarkers. AUC, area under the receiver operating characteristic (ROC) curve. c Fold change of the 27 selected metabolites in the LA group compared with the BN group in the discovery set. Red columns, upregulated. Blue columns, downregulated. d–f ROC curves showing the performance of the discriminant model based on the combination of the 27 metabolites in the discovery, internal validation, and external validation sets. Source data are provided as a Source Data file.