Fig. 3: Diagnosis of early-stage LA by machine learning.
From: Machine learning of serum metabolic patterns encodes early-stage lung adenocarcinoma

a Schematic workflow for constructing the classification models, comprising an inner loop (machine-learning stage, orange) that tunes the hyperparameters of the classifier and an outer cross-validation loop (classifier-building stage, blue) that evaluates discriminant performance. b Receiver operating characteristic (ROC) curves for the classifier designed to distinguish early-stage LA patients from healthy controls. The blue curve is the average over 20 rounds of five-fold nested cross-validation (100 models in total), with a mean area under the curve (AUC) of 0.921 (95% confidence interval (CI): 0.891–0.953) at the optimized number of training subjects, 240 (120/120, LA/control). The red curve is from the double-blind test (23/35, LA/control), with an AUC of 0.915, a diagnostic sensitivity of 88.57% and a specificity of 91.30%. The grey area spans the specificity/sensitivity of the 100 individual ROC curves, with the best (asterisk) and worst (hash mark) classifiers marked. c Averaged ROC curves and their AUCs used to optimize the number of training subjects, which ranged from 20 (10/10, LA/control) to 280 (140/140, LA/control). Source data are provided as a Source Data file.
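
The nested cross-validation layout described in panels a and b can be sketched as index bookkeeping alone. This is a minimal illustration, not the authors' pipeline: the function names `k_fold_indices` and `nested_cv_splits` are hypothetical, the inner fold count of 5 is an assumption (the caption gives only the outer five-fold scheme), the cross-validation is assumed to run over the 240 optimized training subjects, and the folds are shuffled without stratifying by LA/control label. It shows only that hyperparameter tuning sees data from the outer training folds and that 20 rounds of five outer folds yield 100 models.

```python
import random


def k_fold_indices(indices, k, rng):
    """Shuffle `indices` and split them into k near-equal folds."""
    shuffled = list(indices)
    rng.shuffle(shuffled)
    return [shuffled[i::k] for i in range(k)]


def nested_cv_splits(n_subjects, n_rounds=20, n_outer=5, n_inner=5, seed=0):
    """Yield (round, outer_train, outer_test, inner_splits) for repeated nested CV.

    inner_splits holds (inner_train, inner_val) pairs drawn only from
    outer_train, so the outer test fold never influences tuning.
    n_inner=5 is an assumption; the caption does not state it.
    """
    rng = random.Random(seed)
    for r in range(n_rounds):
        outer_folds = k_fold_indices(range(n_subjects), n_outer, rng)
        for i, outer_test in enumerate(outer_folds):
            # Outer training set: every outer fold except the held-out one.
            outer_train = [idx for j, fold in enumerate(outer_folds)
                           if j != i for idx in fold]
            # Inner loop: re-split the outer training set for tuning.
            inner_folds = k_fold_indices(outer_train, n_inner, rng)
            inner_splits = []
            for m, inner_val in enumerate(inner_folds):
                inner_train = [idx for j, fold in enumerate(inner_folds)
                               if j != m for idx in fold]
                inner_splits.append((inner_train, inner_val))
            yield r, outer_train, outer_test, inner_splits


# 240 subjects, assumed to be the pool that the cross-validation runs over.
splits = list(nested_cv_splits(240))
print(len(splits))  # 100 outer models: 20 rounds x 5 folds
```

In a full pipeline, each `inner_splits` list would be used to pick hyperparameters, the chosen model would be refit on `outer_train`, and its ROC curve on `outer_test` would be one of the 100 curves that are averaged.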