Fig. 3 | Scientific Reports

Fig. 3

From: Interpretable machine learning analysis of clinicopathological and immunonutritional biomarkers for predicting lymph node metastasis in gastric cancer

XGBoost model performance evaluation and feature importance analysis. (A) Receiver operating characteristic (ROC) curves for the training and test datasets, with an optimal cutoff value of 0.44. (B) Confusion matrices showing classification performance on the training set (left) and test set (right). (C) Model performance metrics comparing sensitivity and specificity between the training and test datasets. (D) SHapley Additive exPlanations (SHAP) value analysis showing the importance of each feature and its contribution to lymph node metastasis (LNM) prediction; features are ranked by importance from top to bottom. (E) Calibration plots for the training and test sets. The blue and red solid lines represent the calibration curves of the model in the training and test sets, respectively, and the dashed line indicates the ideal reference line (perfect agreement between predicted and observed outcomes). Each dot represents the mean predicted probability and the observed incidence within one of 10 groups formed by dividing the samples according to predicted risk. In the training set, the 10 points were evenly distributed, with predicted probabilities ranging from 0.08 to 0.83 and observed incidences from 0.06 to 0.98; in the test set, predicted probabilities ranged from 0.08 to 0.82 and observed incidences from 0.03 to 0.97. (F) Precision–recall (PR) curve. The green solid line shows precision against recall across thresholds, the red dots mark individual threshold points, and the red horizontal line denotes the no-skill classifier baseline, whose precision equals the positive rate (62%). The PR-AUC was 0.8413, well above this baseline, demonstrating strong discriminative performance of the model under class imbalance. (G) Decision curve analysis (DCA). The x-axis represents the threshold probability and the y-axis the net benefit. The blue solid line indicates the net benefit of the model, and the purple and yellow dashed lines correspond to the “treat-all” and “treat-none” strategies, respectively.
MTD, maximum tumor diameter; SII, systemic immune-inflammation index; RAR, red blood cell distribution width-to-albumin ratio; PNI, prognostic nutritional index; PLR, platelet-to-lymphocyte ratio; PAR, platelet-to-albumin ratio; Lauren type, Lauren histological classification; HRR, hemoglobin-to-red blood cell distribution width ratio; FIB, fibrinogen; CEA, carcinoembryonic antigen; AISI, aggregate index of systemic inflammation; NLR, neutrophil-to-lymphocyte ratio; MLR, monocyte-to-lymphocyte ratio; SIRI, systemic inflammatory response index; dNLR, derived neutrophil-to-lymphocyte ratio; PT, prothrombin time; NPR, neutrophil-to-platelet ratio.
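The quantities plotted in panels (B)/(C), (E), (F), and (G) follow standard definitions. The following is a minimal, standard-library-only Python sketch of how they can be computed from binary outcomes `y` (1 = LNM) and predicted probabilities `p`; it is not the authors' code, and all function names are hypothetical. Assumptions: a case is called positive when `p >= cutoff`; the 10 calibration groups are equal-size groups after sorting by predicted risk (the caption does not state whether equal-size or equal-width bins were used); net benefit uses the usual decision-curve formula NB = TP/n - (FP/n) * pt / (1 - pt), with "treat-none" fixed at 0.

```python
from typing import List, Sequence, Tuple


def sensitivity_specificity(
    y: Sequence[int], p: Sequence[float], cutoff: float = 0.44
) -> Tuple[float, float]:
    """Sensitivity and specificity when a case is called positive if p >= cutoff.

    Assumes y contains both classes (1 = LNM, 0 = no LNM).
    """
    tp = sum(1 for yi, pi in zip(y, p) if yi == 1 and pi >= cutoff)
    fn = sum(1 for yi, pi in zip(y, p) if yi == 1 and pi < cutoff)
    tn = sum(1 for yi, pi in zip(y, p) if yi == 0 and pi < cutoff)
    fp = sum(1 for yi, pi in zip(y, p) if yi == 0 and pi >= cutoff)
    return tp / (tp + fn), tn / (tn + fp)


def calibration_points(
    y: Sequence[int], p: Sequence[float], n_bins: int = 10
) -> List[Tuple[float, float]]:
    """(mean predicted probability, observed incidence) for each risk group.

    Samples are sorted by predicted risk and split into n_bins groups of
    near-equal size (an assumption; equal-width bins are also common).
    Empty groups (possible when there are fewer samples than bins) are skipped.
    """
    order = sorted(range(len(p)), key=lambda i: p[i])
    n = len(order)
    points = []
    for b in range(n_bins):
        idx = order[b * n // n_bins:(b + 1) * n // n_bins]
        if not idx:
            continue
        mean_pred = sum(p[i] for i in idx) / len(idx)
        observed = sum(y[i] for i in idx) / len(idx)
        points.append((mean_pred, observed))
    return points


def no_skill_precision(y: Sequence[int]) -> float:
    """Precision of a no-skill classifier on a PR plot: the positive rate."""
    return sum(y) / len(y)


def net_benefit(y: Sequence[int], p: Sequence[float], pt: float) -> float:
    """Model net benefit at threshold probability pt (0 < pt < 1):
    NB = TP/n - FP/n * pt / (1 - pt)."""
    n = len(y)
    tp = sum(1 for yi, pi in zip(y, p) if yi == 1 and pi >= pt)
    fp = sum(1 for yi, pi in zip(y, p) if yi == 0 and pi >= pt)
    return tp / n - fp / n * pt / (1 - pt)


def net_benefit_treat_all(y: Sequence[int], pt: float) -> float:
    """'Treat-all' strategy: every case is called positive ('treat-none' is 0)."""
    prevalence = sum(y) / len(y)
    return prevalence - (1 - prevalence) * pt / (1 - pt)
```

Sweeping `pt` over a grid of threshold probabilities and plotting `net_benefit`, `net_benefit_treat_all`, and the constant 0 yields the three curves of a decision curve of the kind shown in panel (G); likewise, with a positive rate of 0.62, `no_skill_precision` returns the height of the horizontal baseline in panel (F).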