Fig. 3: Comparison of the diagnostic performance of six ML-based models for differentiating XGC and GBC across the training, internal validation, and external testing cohorts.

a–c ROC curves of each model across the three cohorts; d–f Calibration curves showing predicted vs. observed probabilities for each model; g–i DCA curves indicating net benefit and clinical utility for each model across the three cohorts. ML machine learning, XGC xanthogranulomatous cholecystitis, GBC gallbladder cancer, LR logistic regression, RF random forest, SVM support vector machine, XGB eXtreme gradient boosting, LGB light gradient boosting, MLP multilayer perceptron, ROC receiver operating characteristic, DCA decision curve analysis, CI confidence interval.
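
For readers unfamiliar with DCA, the net benefit plotted in panels g–i is conventionally defined at a threshold probability p_t as TP/n − (FP/n) × p_t/(1 − p_t). The sketch below illustrates that standard definition only; it is not the authors' code. The function name `net_benefit`, the toy labels and probabilities, and the coding of GBC as the positive class (1) are all assumptions made for illustration.

```python
import numpy as np


def net_benefit(y_true, y_prob, threshold):
    """Net benefit of intervening on cases with predicted probability >= threshold.

    Hypothetical helper: y_true uses 1 for the positive class (assumed here
    to be GBC) and 0 for the negative class (assumed to be XGC).
    """
    y_true = np.asarray(y_true)
    y_prob = np.asarray(y_prob)
    n = y_true.size
    treat = y_prob >= threshold                 # cases flagged as positive
    tp = np.sum(treat & (y_true == 1))          # true positives
    fp = np.sum(treat & (y_true == 0))          # false positives
    # False positives are weighted by the odds of the threshold probability.
    return tp / n - fp / n * threshold / (1.0 - threshold)


# Toy data, invented for illustration (not taken from the study).
y_true = [1, 1, 0, 0]
y_prob = [0.9, 0.4, 0.6, 0.1]

# A decision curve is this quantity evaluated over a grid of thresholds.
thresholds = np.linspace(0.05, 0.95, 19)
curve = [net_benefit(y_true, y_prob, t) for t in thresholds]
```

A model's curve is usually read against the "treat all" and "treat none" reference strategies: a model is clinically useful over the threshold range where its net benefit exceeds both.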