Machine learning model for differentiating xanthogranulomatous cholecystitis and gallbladder cancer in multicenter largescale study

Zhang, Ke; He, Jiajia; Ji, Weiyue; Pan, Qunyan; Xiong, Weilv; Wang, Liping; Sun, Weiqi; Xie, Liting; Jiang, Tianan

doi:10.1038/s41746-025-01991-7


Article
Open access
Published: 01 October 2025


Ke Zhang ORCID: orcid.org/0000-0001-5606-428X¹^na1,
Jiajia He²^na1,
Weiyue Ji³,
Qunyan Pan²,
Weilv Xiong⁴,
Liping Wang⁵,
Weiqi Sun⁶,
Liting Xie¹ &
Tianan Jiang¹

npj Digital Medicine volume 8, Article number: 590 (2025)


Abstract

Preoperative differentiation between xanthogranulomatous cholecystitis (XGC) and gallbladder cancer (GBC) remains challenging due to overlapping clinical and imaging features. This multicenter retrospective study developed a machine learning (ML) model, LIDGAX, using preoperative clinical, imaging, and laboratory data from 1246 patients (554 XGC, 692 GBC). Twelve variables were identified as independent predictors via multivariate logistic regression and least absolute shrinkage and selection operator analyses. LIDGAX achieved area under the curve (AUC) values of 0.94 (internal validation) and 0.88 (external testing), outperforming the other five ML models. Calibration and decision curve analyses demonstrated its superior clinical utility. Compared to six radiologists, LIDGAX improved sensitivity (1.2–8.5%), specificity (0.0–4.6%), and balanced accuracy (1.8–6.6%), while reducing average diagnostic time per patient by 30.44–35.76 s. LIDGAX was deployed on an open-source online platform, maintaining high performance (AUC 0.95, accuracy 0.92). This non-invasive tool shows strong potential for clinical translation in preoperative differentiation of XGC and GBC.


Introduction

In real-world clinical scenarios, gallbladder diseases are primarily categorized as benign cholecystitis or malignant gallbladder cancer. Xanthogranulomatous cholecystitis (XGC) is a rare chronic inflammatory gallbladder disease characterized histologically by focal or diffuse inflammatory infiltration of foamy cells, multinucleated giant cells, lymphocytes, and fibroblasts¹. In contrast, gallbladder cancer (GBC) is the most aggressive malignancy of the biliary tract², with most cases having a poor prognosis due to its aggressive nature and limited therapeutic options^2,3. Accurate differentiation between XGC and GBC has critical implications for treatment decisions. XGC, being a chronic inflammatory condition, is typically managed with laparoscopic cholecystectomy⁴. Conversely, GBC often necessitates more aggressive interventions depending on the stage, including liver resection, bile duct excision, lymph node dissection, and possibly neoadjuvant chemoradiotherapy before surgery⁵. Misdiagnosing GBC as XGC could result in undertreatment, such as lack of preoperative surgical planning, incomplete resection, or inadequate follow-up, potentially accelerating disease progression. Conversely, misidentifying XGC as GBC could lead to unnecessary surgical procedures such as liver resection or extensive lymph node dissection, as well as increased complications and resource usage. Therefore, distinguishing XGC from GBC preoperatively is crucial in clinical practice, as it could reduce the risk of intraoperative frozen section misdiagnosis and guide postoperative surveillance strategies. However, this differentiation is highly challenging because the two diseases share clinical characteristics and imaging features, such as abdominal pain, jaundice, gallbladder wall thickening, and invasion of adjacent organs^6,7. Additionally, misdiagnosis is common, with reported rates ranging from 10% to 30%^6,8,9,10.
Thus, non-invasive preoperative biomarkers are needed to improve differentiation between XGC and GBC and reduce overtreatment.

Preoperative diagnostic imaging for gallbladder diseases commonly includes ultrasound (US), contrast-enhanced computed tomography (CECT), and magnetic resonance imaging (MRI). Previous studies have explored imaging features on US, CT, and MRI to differentiate XGC from GBC. For instance, gallbladder mucosal line continuity, a low-density border surrounding the lesion, diffuse gallbladder wall thickening, hypo-attenuated or hypoechoic nodules in the thickened walls, and the presence of calculi were found to be strongly associated with XGC^11,12,13,14. Lee et al.¹⁵ compared the diagnostic performance of these imaging modalities and demonstrated that MRI had the highest accuracy, followed by US and CT. Each imaging technique has distinct advantages and limitations. US is frequently used for initial screening due to its high temporal resolution, ability to observe blood flow, convenience, absence of radiation, and low cost, despite its lower spatial resolution. CT is effective for visualizing liver and lymph node involvement but involves radiation exposure and lower soft tissue resolution. MRI provides superior soft tissue resolution and effectively shows gallbladder wall invasion but is time-consuming and susceptible to respiratory motion artifacts. Therefore, integrating the features from these imaging modalities may enhance the differential diagnosis of XGC and GBC.

Machine learning (ML) algorithms, as novel non-invasive approaches, can flexibly and efficiently analyze high-throughput data, enabling the discovery of complex relationships between variables¹⁶. Due to their advanced capabilities, various ML techniques are widely used to identify disease risk factors, predict treatment outcomes in patients with tumors, and support clinicians in real-world practice^17,18,19,20. However, a previous study has indicated that different ML methods can produce varying performance results²¹. Therefore, identifying the most effective ML techniques is essential for ensuring accurate and reliable predictions and classifications in clinical applications. To our knowledge, no current research has employed multiple ML methods on large-scale, multi-center datasets to differentiate XGC from GBC in real-world settings.

In this study, we developed distinct ML-based models to differentiate between XGC and GBC using preoperative clinical characteristics, imaging features, and laboratory tests. We compared the performance of these models and validated them on an independent, external, multi-center testing cohort to assess generalizability. We then evaluated the optimal model against results from a reader study involving six radiologists with varying levels of experience. Finally, we explored the application of the most effective model in real-world clinical settings, including outpatient, inpatient, and physical examination settings. Figure 1 illustrates the framework of the proposed ML-based model.

**Fig. 1: Overall design of the framework.**

Results

Baseline characteristics

The baseline characteristics of patients in the four cohorts are summarized in Table 1. Between January 2023 and February 2024, a total of 1246 patients were included in the analysis, comprising 554 patients diagnosed with XGC and 692 with GBC (Fig. 2). The median age across the overall dataset was 63.0 years, with 574 males (46.1%) and 672 females (53.9%). The training cohort consisted of 674 patients (326 with XGC and 348 with GBC), while the internal validation cohort included 169 patients (82 with XGC and 87 with GBC). The external testing cohort contained 279 patients, distributed as 90 XGC and 189 GBC cases. Detailed definitions of clinical data, laboratory tests, and imaging features are available in Supplementary Tables 1, 2. Further details on the distribution of XGC and GBC across all cohorts can be found in Supplementary Table 3.

**Fig. 2: Flowchart of the study population.**

Table 1 The baseline characteristics of the training, internal validation, external testing, and real-world cohorts


Construction of ML-based models

A total of 79 variables were collected, including clinical characteristics (n = 9), imaging features (n = 19), and laboratory tests (n = 51). To identify and retain only the most relevant indicators, univariate and multivariate logistic regression analyses were performed on all variables (Supplementary Table 4). The multivariate analysis identified 20 variables as independently associated with either XGC or GBC. Specifically, the variables independently associated with XGC included male sex, epigastric pain, hyperechoic findings on US, presence of gallbladder stones, regular gallbladder morphology, reduced gallbladder size, presence of intramural nodules, continuous mucosal line, elevated fibrinogen level, and higher total bilirubin level. In contrast, independent indicators for GBC included fever, smoking, other conditions (such as schistosomiasis or congenital biliary dilation/cyst), biliary duct dilation, intraluminal tumors, invasion of adjacent structures, enlarged peri-tumoral lymph nodes, hyperdense findings on CT, increased indirect bilirubin, higher CEA levels, and a higher CA199-to-total bilirubin (TB) ratio.

To further refine and determine the optimal number of features, LASSO analysis was employed for an in-depth selection of the 20 independent variables (Supplementary Fig. 1). This analysis ultimately selected 12 key variables for the construction of ML models, which included sex, other conditions, ultrasound echo, gallbladder stones, biliary duct dilation, gallbladder morphology, intramural nodules, intraluminal tumor, mucosal line, enlarged peri-tumoral lymph nodes, fibrinogen level, and indirect bilirubin level. Multicollinearity analysis confirmed that all 12 variables had VIF values below 1.50, indicating no significant collinearity issues among them (Supplementary Table 5).
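The collinearity check described above can be illustrated with a small sketch. The variance inflation factor of a variable is 1 / (1 − R²), where R² comes from regressing that variable on all the others. The code below is a plain-Python illustration on made-up toy columns, not the authors' implementation; the function names and data are assumptions, and only the 1.50 cut-off is taken from the text.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def r_squared(y, predictors):
    """R^2 of an ordinary least-squares fit of y on the predictors plus an intercept."""
    rows = [[1.0] + [p[i] for p in predictors] for i in range(len(y))]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = solve_linear_system(xtx, xty)
    fitted = [sum(bj * rj for bj, rj in zip(beta, r)) for r in rows]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def variance_inflation_factors(columns):
    """VIF_j = 1 / (1 - R_j^2), with R_j^2 from regressing column j on the rest."""
    vifs = []
    for j, col in enumerate(columns):
        others = [c for i, c in enumerate(columns) if i != j]
        vifs.append(1.0 / (1.0 - r_squared(col, others)))
    return vifs


# Toy data (hypothetical): two strongly related columns give large VIFs.
toy_columns = [[1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 5.0]]
toy_vifs = variance_inflation_factors(toy_columns)
print(toy_vifs, all(v < 1.5 for v in toy_vifs))  # the 1.50 cut-off follows the text
```

Mutually uncorrelated columns give a VIF of 1, so values below 1.50 indicate that each selected variable is only weakly explained by the others.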

Diagnostic performance of ML-based models

Using the 12 selected variables, we constructed six ML-based models: logistic regression (LR), random forest (RF), support vector machine (SVM), extreme gradient boosting (XGB), light gradient boosting machine (LGB), and multilayer perceptron (MLP). In the training, internal validation, and external testing cohorts, the AUCs ranged from 0.98 to 1.00 (95% CI: 0.97–1.00), from 0.92 to 0.94 (95% CI: 0.90–0.98), and from 0.86 to 0.88 (95% CI: 0.81–0.92), respectively (Fig. 3a–c). Notably, the LGB model consistently achieved the highest AUC values in both the internal validation and external testing cohorts, outperforming the other ML-based models; we refer to this model as LIDGAX. We also compared the differences in AUCs among the six models in the training and internal validation cohorts (Supplementary Table 6), as well as in different external testing cohorts (Supplementary Fig. 2). Additionally, Fig. 3 and Table 2 summarize the diagnostic metrics for each model across the cohorts, including AUC, accuracy, sensitivity, specificity, PPV, NPV, and recall. LIDGAX achieved an AUC of 0.88 (95% CI: 0.84–0.93), accuracy of 0.80 (95% CI: 0.74–0.84), sensitivity of 0.79 (95% CI: 0.73–0.85), and specificity of 0.80 (95% CI: 0.70–0.88) in the external testing cohort. Supplementary Fig. 3 presents the confusion matrices of all models. Calibration curves demonstrated that all models showed good alignment between predicted and observed probabilities for differentiating XGC and GBC in each cohort (Fig. 3d–f). The DCA illustrated the net clinical benefit of the six ML-based models across the three cohorts (Fig. 3g–i). These results suggest that the LGB model outperformed the other five models across the performance parameters examined. Additionally, the performance of LIDGAX was assessed using time-stratified five-fold cross-validation, demonstrating robust predictive capabilities across all folds (AUC: 0.97–0.98 and 0.94–0.98 in the training and internal validation cohorts; Supplementary Table 7 and Supplementary Fig. 4).
This temporal validation strategy effectively simulates real-world clinical deployment scenarios where model performance must remain stable despite temporal shifts in patient characteristics.
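For readers less familiar with the AUC values reported above, the following sketch shows one standard way to read them: the AUC equals the probability that a randomly chosen positive case (here GBC) receives a higher predicted score than a randomly chosen negative case (here XGC). The labels and scores are invented toy values, and the function name is an assumption; this is not the code used in the study.

```python
def auc_from_scores(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    labels: 1 for the positive class (GBC), 0 for the negative class (XGC).
    Tied scores count as half a correct ranking.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example (hypothetical scores): 8 of the 9 GBC/XGC pairs are ordered correctly.
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]
print(auc_from_scores(toy_labels, toy_scores))
```

The pairwise loop is quadratic in cohort size, which is acceptable for a few hundred patients; rank-based formulas give the same value more efficiently.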

Table 2 Comparison of the diagnostic performance of six ML-based models for differentiating XGC and GBC across the training, internal validation, and external testing cohorts


**Fig. 3: Comparison of the diagnostic performance of six ML-based models for differentiating XGC and GBC across the training, internal validation, and external testing cohorts.**

Three thresholding strategies demonstrated distinct performance trade-offs across cohorts (Supplementary Table 8). In the external testing cohort, the Youden Index balanced sensitivity (0.79) and specificity (0.80). Maximizing sensitivity achieved near-perfect GBC detection (0.97) but caused significant specificity drops (0.46), increasing false positives. Conversely, maximizing specificity minimized overtreatment risks (0.96) but sacrificed sensitivity (0.42), raising missed diagnosis concerns. Confusion matrices further revealed classification accuracy of three strategies (Supplementary Fig. 5).
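The trade-off between the three thresholding strategies can be sketched as follows. The Youden Index is J = sensitivity + specificity − 1, and the Youden threshold is the cut-off that maximizes J. How the study operationalized "maximizing sensitivity" and "maximizing specificity" is not stated in this section, so the two constrained helpers below are assumptions, as are the function names and the toy data.

```python
def sens_spec(labels, scores, threshold):
    """Sensitivity and specificity when scores >= threshold are called positive (GBC)."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def youden_threshold(labels, scores):
    """Candidate threshold that maximizes J = sensitivity + specificity - 1."""
    candidates = sorted(set(scores))
    return max(candidates, key=lambda t: sum(sens_spec(labels, scores, t)) - 1.0)


def threshold_for_min_sensitivity(labels, scores, min_sensitivity):
    """Assumed rule: highest threshold that still reaches the required sensitivity."""
    ok = [t for t in set(scores) if sens_spec(labels, scores, t)[0] >= min_sensitivity]
    if not ok:
        raise ValueError("no threshold reaches the requested sensitivity")
    return max(ok)


def threshold_for_min_specificity(labels, scores, min_specificity):
    """Assumed rule: lowest threshold that still reaches the required specificity."""
    ok = [t for t in set(scores) if sens_spec(labels, scores, t)[1] >= min_specificity]
    if not ok:
        raise ValueError("no threshold reaches the requested specificity")
    return min(ok)


# Toy data (hypothetical): four GBC cases (label 1) and four XGC cases (label 0).
toy_labels = [1, 1, 1, 1, 0, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.6, 0.35, 0.7, 0.3, 0.2, 0.1]
for name, t in [
    ("youden", youden_threshold(toy_labels, toy_scores)),
    ("sensitivity-first", threshold_for_min_sensitivity(toy_labels, toy_scores, 1.0)),
    ("specificity-first", threshold_for_min_specificity(toy_labels, toy_scores, 1.0)),
]:
    print(name, t, sens_spec(toy_labels, toy_scores, t))
```

On the toy data, the specificity-first threshold reaches full specificity only by halving sensitivity, which mirrors the direction of the trade-off reported for the external testing cohort.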

Interpretability of LIDGAX model

To enhance the explainability of LIDGAX, the SHAP explainer was utilized to interpret the diagnostic importance of features in the optimal LGB model for distinguishing XGC from GBC. The SHAP beeswarm plot (Fig. 4a) visualizes the 12 key variables, showing each variable’s contribution to model predictions. Variables were ranked by importance using average SHAP values and are displayed in descending order (Fig. 4b). SHAP values greater than zero correspond to predictions for the positive class, indicating a higher risk of GBC. For instance, features such as intraluminal tumors or enlarged peri-tumoral lymph nodes were associated with positive SHAP values, which drive predictions toward the “GBC” class. Additionally, Fig. 4c illustrates a case aligned with the “XGC” class, while Fig. 4d represents a case aligned with the “GBC” class according to LIDGAX predictions, with actual variable measurements displayed in each force plot.

**Fig. 4: Model interpretability using SHAP.**

Subgroup analyses

We compared the performance of four models built with clinical, imaging, laboratory, and combined variables, respectively (Supplementary Fig. 6). Ultimately, the clinical model was constructed using MLP, the imaging model using MLP, the laboratory model using SVM, and the combined model using LGB (Supplementary Fig. 7). In the external testing cohort, the combined model achieved a significantly higher AUC than the clinical model (0.88 vs. 0.68, adjusted P < 0.0001) and the laboratory model (0.88 vs. 0.62, adjusted P < 0.0001), whereas its AUC did not differ from that of the imaging model (0.88 vs. 0.88, adjusted P = 0.699). Furthermore, Supplementary Table 9 provides an overview of the AUC, accuracy, sensitivity, specificity, PPV, NPV, and recall of the four models across these cohorts. Supplementary Figs. 8–10 present the confusion matrices of six ML models based on clinical variables for differentiating XGC and GBC across all cohorts. The calibration and DCA curves for these models demonstrated that the combined model had satisfactory alignment and net clinical benefit (Supplementary Figs. 11, 12). Overall, the model using combined variables performed at least as well as those using a single variable group, with imaging variables making the most substantial contribution. Furthermore, the subgroup analyses demonstrated robust performance of LIDGAX, with AUCs consistently ranging from 0.85 to 0.91 across all subgroups (P = 0.079–0.682; Supplementary Fig. 13 and Supplementary Table 10), supporting its generalizability despite demographic, temporal, and institutional variations.

Reader study

To evaluate the performance of LIDGAX compared to six radiologists (including gallbladder specialists, general radiologists, and radiology residents) in distinguishing XGC from GBC, we measured diagnostic accuracy and time efficiency, both with and without LIDGAX assistance (Fig. 5 and Supplementary Table 11). The study involved 169 patients (82 XGC and 87 GBC) from the internal validation cohort. Results showed that all six radiologists performed less accurately than LIDGAX alone in differentiating XGC from GBC (Fig. 5a, d, e), particularly for radiology residents, with significant differences observed in specificity (P = 0.009–0.010) and sensitivity (P = 0.041). When unassisted, the radiologists demonstrated sensitivity rates between 74.4% and 85.4%, which improved to 82.8–89.0% when assisted by LIDGAX (Fig. 5b, f). Similarly, specificity increased from 78.2–86.2% to 82.8–87.4% (Fig. 5b, f). Balanced accuracy rose from 76.3–85.2% unassisted to 82.8–87.6% with LIDGAX assistance, though it remained slightly lower than LIDGAX’s performance of 88.2% (Fig. 5c, f). This improvement was most pronounced for radiology residents compared to gallbladder specialists and general radiologists, although not statistically significant (P = 0.121–0.606). Furthermore, the average time per assessment decreased significantly from 68.23–89.36 s without LIDGAX to 37.79–53.60 s with its assistance (all P-values < 0.0001, Fig. 5f and Supplementary Table 11), highlighting LIDGAX’s potential to improve both diagnostic accuracy and efficiency in differentiating XGC from GBC.
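Balanced accuracy, as used in the reader study, is conventionally the mean of sensitivity and specificity, which keeps the metric from being dominated by the larger class. The short sketch below applies that standard definition to the rounded internal validation figures for LIDGAX given in the Discussion (sensitivity 0.86, specificity 0.90); the function name is an assumption.

```python
def balanced_accuracy(sensitivity, specificity):
    """Mean of sensitivity and specificity (standard definition)."""
    return (sensitivity + specificity) / 2.0


# Rounded inputs give 0.88, in line with the 88.2% reported from unrounded values.
print(round(balanced_accuracy(0.86, 0.90), 2))
```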

**Fig. 5: Evaluation of LIDGAX versus six radiologists in differentiating XGC and GBC.**

Real-world clinical evaluation

To facilitate clinical translation in real-world settings, we developed an open-source online platform (Supplementary Fig. 14; Version 2.0; https://lidgaxmodel.streamlit.app) based on the LIDGAX model, making it convenient for physicians to use. The retrospective real-world cohort from Center A ultimately comprised 124 individuals, including 56 (45.2%) diagnosed with XGC and 68 (54.8%) with GBC (Fig. 6a). Supplementary Table 12 provides baseline characteristics for XGC and GBC patients in this cohort. LIDGAX achieved an AUC of 0.95 (95% CI: 0.91–0.99), an accuracy of 0.92 (95% CI: 0.86–0.96), a sensitivity of 0.94 (95% CI: 0.86–0.98), a specificity of 0.89 (95% CI: 0.78–0.96), a PPV of 0.91 (95% CI: 0.82–0.97), an NPV of 0.93 (95% CI: 0.82–0.98), and a recall of 0.94 (95% CI: 0.86–0.98) (Fig. 6).
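The relationship between the metrics listed above can be made concrete with a confusion matrix. The counts below are hypothetical: they are not reported in this section and were chosen only because, with 68 GBC and 56 XGC patients and GBC as the positive class, they reproduce the rounded point estimates in the text. The function name is likewise an assumption.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Standard diagnostic metrics from confusion-matrix counts (GBC = positive)."""
    return {
        "sensitivity": tp / (tp + fn),  # identical to recall
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
    }


# Hypothetical counts consistent with the reported figures, not taken from the study.
metrics = diagnostic_metrics(tp=64, fn=4, tn=50, fp=6)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

Because recall is defined identically to sensitivity, the two share the same value and confidence interval in the text.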

**Fig. 6: Evaluation of the real-world cohort.**

Discussion

In our multicenter real-world study, we present LIDGAX, an advanced ML-based model developed to differentiate between XGC and GBC using clinical, imaging, and laboratory variables. Accurate differentiation between these conditions remains a significant challenge for hepatobiliary surgeons and radiologists, often leading to misdiagnoses and unnecessary healthcare resource usage. By curating a large dataset of pathologically confirmed XGC and GBC cases, we collected relevant clinical, imaging, and laboratory data to construct the LIDGAX model, based on the LGB intelligent differentiator for XGC and GBC, utilizing 12 selected variables. LIDGAX demonstrated high sensitivity (0.86 and 0.79) and specificity (0.90 and 0.80) in distinguishing XGC from GBC in both the internal validation and independent external testing cohorts. The subgroup analyses demonstrated its generalizability despite demographic, temporal, and institutional variations. Its diagnostic accuracy significantly outperformed that of radiologists, particularly enhancing precision and efficiency for residents. Additionally, real-world validation via an online platform further underscored its clinical utility potential.

Artificial intelligence has been utilized to analyze high-throughput data, revealing intricate connections between features and leveraging advanced computational techniques for categorization, prediction, and evidence-based decision-making in novel ways²². To our knowledge, only five previous studies have investigated ML- or deep learning (DL)-based approaches for differentiating XGC from GBC. Fujita et al.²³ developed a CT-based DL model that attained high predictive accuracy, achieving an AUC of 0.989 with a dataset of 49 patients. Zhou et al. ²⁴ established an ML-based prediction model that achieved an AUC of 0.888 for the preoperative differentiation of XGC and GBC. Zhang et al.²⁵ developed a DL nomogram integrating CECT scans, reaching an accuracy of 0.89, a precision of 0.92, and an AUC of 0.92 across two affiliated hospitals, used as an external validation cohort. Gupta et al.²⁶ employed three DL models to differentiate XGC from GBC on US, demonstrating superior accuracy over radiologists. However, these studies were limited by single-center designs and small sample sizes, with findings unvalidated in independent external cohorts. Another study²⁷ constructed a predictive nomogram based on 436 patients from two centers, incorporating variables such as sex, Murphy’s sign, absolute neutrophil count, glutamyl transpeptidase levels, CEA levels, and imaging findings. Our study included the largest dataset to date—1246 patients from four centers—to differentiate between XGC and GBC. To assess generalizability, reliability, and effectiveness, we validated the LIDGAX model on independent external testing cohorts, achieving AUCs of 0.84 and 0.92 in Centers B and C, and Center D, respectively.

To enhance the clinical applicability of our model, we selected commonly relevant factors for diagnosing gallbladder disease, incorporating general clinical data, imaging features from US, CT, and MRI, and laboratory tests (including routine blood tests, biochemical tests, coagulation tests, and tumor markers). Following multivariate logistic regression for variable selection, 20 variables were identified as independently associated with XGC and GBC. Among these, factors such as sex, symptoms, gallbladder stones, biliary duct dilation, gallbladder morphology, gallbladder size, intramural nodules, intraluminal tumor, mucosal line, invasion of adjacent structures, enlarged peri-tumoral lymph nodes, and CEA were consistent with prior studies^23,24,25,27,28. Notably, our analysis revealed that fever was significantly associated with GBC. This association may be mechanistically explained by tumor necrosis and systemic inflammatory response, which trigger elevated pro-inflammatory cytokines (e.g., TNF-α, IL-1) and COX-2 expression^29,30,31. Additionally, gallstones or GBC-related biliary obstruction may predispose patients to bacterial infections, thereby further contributing to fever^32,33. In contrast, XGC, a chronic granulomatous inflammatory condition characterized by lipid-laden macrophage infiltration, typically lacks such pronounced systemic inflammatory responses³⁴. CT hyperdensity was another feature associated with GBC, likely reflecting desmoplastic stromal reactions with collagen deposition and fibroblast proliferation, whereas XGC typically exhibits hypodense regions from lipid-laden macrophages. Furthermore, we identified new key risk factors for differentiating XGC and GBC, namely smoking, fibrinogen, total bilirubin, indirect bilirubin, and the CA199-to-TB ratio, which have not previously been reported in this context.
Prior research has shown that preoperative serum fibrinogen and total bilirubin levels correlate with tumor progression and may independently predict GBC^35,36,37,38. However, their potential role in distinguishing XGC from GBC has not been explored until now. Using LASSO to minimize redundancy, we refined these 20 variables to 12 final input parameters for the six ML-based models.

In our study, the inclusion of only patients with complete US, CT, and MRI data may introduce selection bias, as this criterion excluded those typically managed with fewer imaging modalities in routine practice. However, the 12 key variables comprising LIDGAX (particularly key imaging features such as gallbladder stones, biliary duct dilation, gallbladder morphology, intramural nodules, intraluminal tumor, mucosal line, and enlarged peri-tumoral lymph nodes) are not modality-specific. These semantic features can be reliably identified across US, CT, or MRI. Consequently, LIDGAX remains applicable even when MRI is unavailable, provided the essential features are assessable through existing modalities. In clinical practice, US is often the first-line modality, followed by CT or MRI if needed for further characterization. For features requiring MRI confirmation (e.g., occult gallstones undetected by US/CT, or intramural nodules that appear isoechoic on US and isodense on CT), MRI is recommended as a supplementary modality to ensure input accuracy. When MRI is inaccessible, radiologists should flag such cases for multidisciplinary review. This protocol balances diagnostic accuracy with resource constraints. The retrospective requirement for multimodal imaging aimed to minimize feature omission during model development. While potentially introducing selection bias, this strategy ensured comprehensive data collection. In the future, prospective validations will specifically evaluate LIDGAX’s performance in settings with restricted imaging protocols. In addition, our study excluded seven cases of GBC complicated with XGC, potentially introducing selection bias. Previous studies reported that the incidence of XGC-GBC coexistence ranges from 3% to 12.5%^39,40,41. For such cases, histopathological examination remains the gold standard for definitive diagnosis.
LIDGAX was specifically designed not to replace histopathology but to improve preoperative differentiation between pure XGC and GBC, guiding clinical decision-making and surveillance planning. The exclusion of coexisting cases from model validation ensures alignment with its intended use scenario.

The six ML algorithms used in this study are widely applied in medical diagnostics¹⁶. Among these, the LGB model (AUC: 0.94 and 0.88) showed superior performance compared to the others (AUC: 0.92–0.94 and 0.86–0.87) in both the internal validation and external testing cohorts. Multicollinearity can contribute to overfitting; thus, we evaluated it using VIF, with all values under 1.50, indicating no significant multicollinearity among these variables. All six ML-based models exhibited slight overfitting, with significant differences in AUC between the training and internal validation cohorts (P = 0.002–0.040). However, the LGB model had the smallest AUC discrepancy and achieved the highest AUC in the independent external testing cohort, highlighting its robustness and generalizability. Calibration and DCA curves further showed that the LGB model offered the best alignment and net clinical benefit. This led us to select it as the optimal model for differentiating XGC from GBC, naming it LIDGAX. Furthermore, the interpretability of LIDGAX is crucial for clinicians and radiologists in decision-making processes. Therefore, we employed SHAP values to enhance interpretability, revealing the underlying relationships between features and outcomes⁴². The SHAP value analysis highlighted that intraluminal tumor and mucosal line contributed the most to distinguishing XGC from GBC.

The choice of thresholding strategy should align with clinical priorities. In high-risk populations, a sensitivity-prioritized threshold is optimal for screening, minimizing missed GBC diagnoses and enabling timely resection. Though this increases false positives, confirmatory biopsies or short-term imaging follow-up can mitigate overdiagnosis risks. Conversely, a specificity-prioritized threshold is critical for surgical decision-making, reducing unnecessary extended hepatectomies, neoadjuvant therapies, and lymph node dissections for benign XGC. For example, misclassifying XGC as GBC could expose patients to toxic chemotherapy or aggressive lymphadenectomy—procedures avoided by LIDGAX’s 0.96 specificity. However, its low sensitivity (0.42) necessitates cautious use, particularly in younger patients prioritizing cancer detection. The Youden Index (0.79 sensitivity, 0.80 specificity) balances accuracy and resource allocation, mirroring real-world trade-offs. Future integration of cost-benefit analyses could refine personalized threshold selection.

We also compared LIDGAX with six radiologists of varying experience levels in differentiating XGC from GBC, finding that LIDGAX demonstrated superior diagnostic accuracy. Two main factors contributed to this advantage. First, LIDGAX was trained using a combination of clinical, imaging, and laboratory variables within a supervised learning framework—an approach that provided systematic, digitized data integration across multiple sources, unlike the workflow radiologists typically experience. Second, ML algorithms, such as LIDGAX, are inherently more effective in feature selection and weighting than manual assessments⁴³, enabling our model to directly learn diagnostic patterns from detailed input data and apply them efficiently. Moreover, LIDGAX’s support enhanced radiologists’ performance across several metrics: sensitivity improved by 1.2–8.5%, specificity by 0.0–4.6%, and balanced accuracy by 1.8–6.6%. Additionally, the average diagnostic time per patient was reduced by 30.44–35.76 s, indicating that LIDGAX can substantially improve radiologists’ accuracy, lower misdiagnosis rates, and optimize time efficiency, particularly in high-demand clinical environments. ML still faces challenges in clinical translation⁴⁴. To support LIDGAX’s implementation, we developed an open-source online platform designed for convenient clinical use. This platform enables clinicians to input 12 key factors and instantly receive diagnostic predictions. In a retrospective real-world cohort of 124 patients from Center A, the platform achieved an impressive AUC of 0.95, with an accuracy of 0.92, sensitivity of 0.94, and specificity of 0.89. These results demonstrate that the platform is user-friendly for clinicians and radiologists and achieves robust performance in distinguishing XGC from GBC in real-world clinical settings.

This study has several limitations. First, LIDGAX was developed using data from Chinese populations, so its generalizability to broader, global populations remains uncertain and requires further validation with additional datasets, despite our dataset being the largest available to date. Second, although LIDGAX was built using the clinical, imaging, and laboratory variables we were able to collect, there may be other relevant variables not considered in this study that could potentially enhance model performance. Future iterations could enhance performance by incorporating emerging biomarkers and genomic data. Third, the retrospective design inherently introduces selection bias, a limitation common to all observational studies. To address this, we are planning prospective randomized controlled trials to validate LIDGAX’s clinical efficacy. Lastly, given the rapid advancements in DL for medical imaging, our future goal is to incorporate DL to extract complex, high-dimensional features from multimodal imaging, enabling more accurate and intelligent differentiation between XGC and GBC for clinical applications.

In conclusion, we developed the LIDGAX model, utilizing the LGB algorithm, to accurately differentiate XGC from GBC. The model demonstrated robust diagnostic performance across independent external testing cohorts, surpassing six expert radiologists in diagnostic accuracy. By employing SHAP values, we improved the interpretability of LIDGAX for clinical applications. Additionally, we constructed an open-source online platform to validate the clinical translation potential of LIDGAX. Given its high accuracy and reliability, LIDGAX holds promise as a valuable, non-invasive tool for effectively distinguishing XGC from GBC in clinical settings.

Methods

Patients

This multicenter, retrospective study included patients diagnosed with XGC or GBC between January 2023 and February 2024 from four Chinese hospitals: The First Affiliated Hospital, Zhejiang University School of Medicine (Center A); The Second Affiliated Hospital, Jiaxing University (Center B); Beilun District People’s Hospital (Center C); and Huzhou Central Hospital (Center D). Adult participants who underwent either simple or radical cholecystectomy and were pathologically confirmed to have XGC or GBC were included. The exclusion criteria were: (1) more than 5% incomplete clinical data and laboratory tests (Center A: n = 91; Centers B and C: n = 74; Center D: n = 54); (2) lack of preoperative US, CECT, and CEMRI (Center A: n = 145; Centers B and C: n = 86; Center D: n = 70); (3) pathologically confirmed metastatic gallbladder malignancy (Center A: n = 13; Centers B and C: n = 5; Center D: n = 3); and (4) GBC complicated with XGC (Center A: n = 4; Centers B and C: n = 1; Center D: n = 2). Details of the study population are illustrated in Fig. 2a.

In total, 1246 patients were included in the differential diagnosis task, comprising 554 XGC patients and 692 GBC patients. Patients from Center A (n = 843) were split chronologically into a training cohort and an internal validation cohort in a 4:1 ratio, while patients from Centers B, C, and D (n = 279) were assigned to independent external testing cohorts.
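
A chronological 4:1 split of this kind can be illustrated with a minimal Python sketch. The record fields (`patient_id`, `surgery_date`), the synthetic dates, and the rounding rule are assumptions made for illustration only; the study reports using R, and the exact cut-off date is not stated.

```python
from datetime import date, timedelta

def chronological_split(records, date_key, train_fraction=0.8):
    """Sort by date and cut once, so no validation case predates a training case."""
    ordered = sorted(records, key=lambda r: r[date_key])
    n_train = int(len(ordered) * train_fraction)  # rounding down is an assumption
    return ordered[:n_train], ordered[n_train:]

# Hypothetical stand-in for the 843 Center A patients (synthetic dates).
start = date(2000, 1, 1)
records = [
    {"patient_id": i, "surgery_date": start + timedelta(days=i % 400)}
    for i in range(843)
]

train, validation = chronological_split(records, "surgery_date")
print(len(train), len(validation))  # prints: 674 169
```

With 843 records, an 80% cut gives 674 training and 169 validation cases, which matches the cohort sizes reported in the Methods.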

Variable collection

Baseline variables included clinical data, laboratory tests, and imaging features, selected based on consultations with gallbladder specialists and a review of recent literature on risk factors relevant to XGC and GBC. Clinical data were collected from medical records, including sex, age, symptoms, complications, smoking status, diabetes, gallbladder adenomyomatosis, biliary tract infection, as well as other conditions like schistosomiasis and congenital biliary dilatation/cyst. Laboratory tests conducted within two weeks prior to cholecystectomy, including routine blood tests, biochemical analyses, coagulation profiles, and tumor markers, were systematically extracted from electronic medical records. Further details on the clinical data and laboratory tests are provided in Supplementary Table 1. Two abdominal radiologists, each with over 10 years of experience, independently reviewed each US, CECT, and CEMRI scan as a standard reference and assessed all imaging features by consensus. In cases of disagreement, a third radiologist with over 20 years of experience in gallbladder disease performed the final evaluation. All scans conducted within two weeks were used as references. Detailed definitions of imaging features are available in Supplementary Table 2.

Model development

The development of ML-based models followed a two-step approach: (1) selecting robust features related to the differentiation of XGC and GBC from collected clinical, imaging, and laboratory variables; and (2) constructing six ML models using the selected features. For the first step, we implemented a three-stage feature selection approach within the training cohort (n = 674) to identify robust clinical, imaging, and laboratory variables. First, a preliminary univariate binary logistic regression analysis was conducted to identify variables with significant differences between XGC and GBC patients. Next, multivariate logistic regression analysis was applied to variables identified in the univariate analysis. Finally, the least absolute shrinkage and selection operator (LASSO) regression method was used to select the most predictive features with non-zero coefficients, with penalty tuning through 10-fold cross-validation (Supplementary Fig. 1).
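
For reference, LASSO selection with a binary outcome is usually formulated as L1-penalized logistic regression. The notation below (n patients, p candidate variables, outcome y_i, covariate vector x_i, penalty λ) is introduced here for illustration and does not appear in the original text:

```latex
\hat{\beta} = \arg\min_{\beta_0,\,\beta}\;
  -\frac{1}{n}\sum_{i=1}^{n}\Bigl[\, y_i\,(\beta_0 + x_i^{\top}\beta)
  - \log\bigl(1 + e^{\beta_0 + x_i^{\top}\beta}\bigr) \Bigr]
  + \lambda \sum_{j=1}^{p} \lvert \beta_j \rvert
```

Under this formulation, λ is the penalty tuned by 10-fold cross-validation, and the variables retained are those with a non-zero coefficient at the chosen λ.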

For the second step, we developed six ML classification algorithms using the features selected by LASSO in the training cohort: logistic regression (LR), random forest (RF), support vector machine (SVM), eXtreme gradient boosting (XGB), light gradient boosting (LGB), and multilayer perceptron (MLP). Each algorithm was fine-tuned through grid search. After identifying optimal hyperparameters, each model was retrained on the full training subset with a set random seed, finalizing the weights and generating a locked model, which was then evaluated on the internal validation cohort.

Model evaluation

To systematically evaluate model performance, we compared the performance of six models across training, internal validation, and external testing cohorts. Evaluation metrics included the area under the curve (AUC), sensitivity, specificity, accuracy, positive predictive value (PPV), negative predictive value (NPV), recall, and confusion matrix. Calibration curves and decision curve analysis (DCA) were employed to assess model calibration and clinical utility across all cohorts⁴⁵,⁴⁶. Based on the integrated performance metrics, the LGB-based algorithm was identified as the optimal model and named LIDGAX (LGB Intelligent Differentiator for GBC and XGC). To confirm the robustness of the chronological split strategy, we performed time-stratified five-fold cross-validation on dataset A (n = 843) using LIDGAX. The model’s decision-making process was visualized using SHapley Additive exPlanations (SHAP)⁴⁷,⁴⁸, which quantified feature importance scores and elucidated the relationships between XGC, GBC, and selected features.
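
The listed metrics follow directly from the confusion matrix, and the AUC can be computed as a rank statistic. The sketch below is a plain-Python illustration, not the authors' implementation; coding GBC as the positive class (1) and the toy labels and scores are assumptions. Recall is numerically identical to sensitivity.

```python
def confusion_counts(y_true, y_pred):
    """Return (TP, FP, TN, FN) with the positive class coded as 1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn

def diagnostic_metrics(tp, fp, tn, fn):
    """Standard diagnostic metrics; assumes every denominator is non-zero."""
    sensitivity = tp / (tp + fn)  # identical to recall
    specificity = tn / (tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "balanced_accuracy": (sensitivity + specificity) / 2,
    }

def auc(y_true, scores):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy example (hypothetical labels and predicted probabilities).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

metrics = diagnostic_metrics(*confusion_counts(y_true, y_pred))
print(metrics["sensitivity"], metrics["specificity"], auc(y_true, scores))
# prints: 0.75 0.75 0.8125
```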

Our study implemented three thresholding strategies to address distinct clinical priorities: (1) Diagnostic balance: Optimized using the Youden Index (sensitivity + specificity − 1) to balance sensitivity and specificity; (2) Screening priority: Maximized sensitivity (>95%) to minimize missed diagnoses in gallbladder cancer (GBC) screening; (3) Treatment precision: Maximized specificity (>95%) to reduce overtreatment risks caused by false positives in therapeutic decision-making.
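
The three strategies can be sketched in Python as a search over candidate cut-offs. This is a hedged illustration: the function names, the use of observed scores as candidate thresholds, the tie-breaking, and the toy data are assumptions, and GBC is assumed to be the positive class.

```python
def sens_spec(y_true, scores, threshold):
    """Sensitivity and specificity when scores >= threshold are called positive."""
    pairs = list(zip(y_true, scores))
    tp = sum(1 for t, s in pairs if t == 1 and s >= threshold)
    fn = sum(1 for t, s in pairs if t == 1 and s < threshold)
    tn = sum(1 for t, s in pairs if t == 0 and s < threshold)
    fp = sum(1 for t, s in pairs if t == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)

def youden_threshold(y_true, scores):
    """Diagnostic balance: maximize J = sensitivity + specificity - 1."""
    candidates = sorted(set(scores))
    return max(candidates, key=lambda c: sum(sens_spec(y_true, scores, c)) - 1)

def screening_threshold(y_true, scores, target=0.95):
    """Screening priority: highest cut-off whose sensitivity exceeds the target."""
    ok = [c for c in set(scores) if sens_spec(y_true, scores, c)[0] > target]
    return max(ok, default=None)

def treatment_threshold(y_true, scores, target=0.95):
    """Treatment precision: lowest cut-off whose specificity exceeds the target."""
    ok = [c for c in set(scores) if sens_spec(y_true, scores, c)[1] > target]
    return min(ok, default=None)

# Toy example (hypothetical labels and predicted probabilities).
y_true = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.95, 0.85, 0.75, 0.65, 0.35, 0.55, 0.45, 0.25, 0.15, 0.05]

print(youden_threshold(y_true, scores))     # prints: 0.65
print(screening_threshold(y_true, scores))  # prints: 0.35
print(treatment_threshold(y_true, scores))  # prints: 0.65
```

In this toy data the screening cut-off (0.35) keeps every positive case at the cost of a specificity of 0.6, which illustrates the trade-off the three strategies are meant to manage.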

Subgroup analyses

To evaluate the effectiveness of the combined model, we constructed four separate models from the selected variables: a clinical model, an imaging model, a laboratory model, and a combined model. Each of these four models was developed using six different ML algorithms. To address potential variability and ensure the robustness of the findings, we conducted subgroup analyses in the external testing cohort, stratified by sex (female and male), age (<60 years and ≥60 years), time periods (2011–2015, 2016–2020, and 2021–2024), and centers (Centers B, C, and D).

Reader study

To assess the performance of radiologists in differentiating XGC from GBC, six radiologists independently diagnosed cases in the internal validation cohort (n = 169). Participants included two radiology residents (3–5 years of experience), two general radiologists (5–10 years of experience in abdominal imaging), and two gallbladder specialists (10–20 years of experience in gallbladder imaging). Prior to the study, a gallbladder specialist with extensive experience (over 3000 case reviews) conducted a training session for each radiologist, covering key imaging features identified in 40 representative cases from the training cohort.

The reader study comprised two steps. In the first step, we compared the diagnostic performance of LIDGAX with that of the radiologists. Each radiologist reviewed anonymized US, CECT, and CEMRI images in random order using the local picture archiving and communication system, without access to clinical or laboratory data, and determined whether each case represented XGC or GBC. In the second step, we assessed LIDGAX’s potential to support radiologists in diagnosis. Each radiologist received LIDGAX’s probability score for each case and then reanalyzed the same cases from the first step with this additional input. A minimum interval of one month separated the two steps to reduce recall bias.

Real-world study

For the real-world clinical evaluation, we deployed the LIDGAX model on an open-source online computing platform, allowing clinicians and radiologists to easily analyze cases through a user-friendly interface. This retrospective study included consecutive patients diagnosed with XGC or GBC between February 2023 and February 2024 from Center A. Exclusion criteria included: (1) patients with incomplete clinical and laboratory data (n = 3), and (2) patients without preoperative US, CECT, and CEMRI scans (n = 12). After applying these criteria, 124 patients were included in the real-world evaluation study (Fig. 2b).

Statistical analysis

All statistical analyses were conducted using R software (Version 4.2.2; https://www.r-project.org). To address incomplete clinical and laboratory data, multiple imputation was applied as part of data preprocessing⁴⁹. Categorical variables were compared between groups using either the chi-square test or Fisher’s exact test and are presented as numbers and frequencies. For continuous variables, the Kolmogorov–Smirnov test assessed normality. Variables following a normal distribution are expressed as mean ± standard deviation (SD) and were compared using the t-test, while non-normally distributed variables are reported as median (interquartile range, IQR) and analyzed with the Mann–Whitney U test. Non-normally distributed continuous data were normalized before model development. Collinearity among variables was evaluated through the variance inflation factor (VIF), where a VIF > 5 indicated notable collinearity and a VIF > 10 suggested significant collinearity⁵⁰. Univariate and multivariate binary logistic regression analyses identified variables associated with XGC and GBC. Variables found significant in univariate analysis were subsequently included in a stepwise multivariate analysis, using the Akaike information criterion for optimal variable selection⁵¹. The diagnostic performance of the ML-based models was evaluated by metrics including sensitivity, specificity, accuracy, PPV, NPV, recall, balanced accuracy, and confusion matrix. AUCs of the six algorithms were compared using the DeLong test. Confidence intervals (95% CIs) were obtained from 1000 bootstrap resamples. To compare sensitivity and specificity before and after LIDGAX assistance, McNemar’s test was applied to the paired categorical data. The Benjamini–Hochberg false discovery rate procedure (BH-FDR, q < 0.05) was used for multiple testing correction⁵². Two-tailed P-values < 0.05 were considered statistically significant.
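
Two of the procedures named above, McNemar’s test for paired readings and the Benjamini–Hochberg adjustment, are compact enough to sketch in plain Python. This is an illustration only: the study used R, whether an exact or chi-square form of McNemar’s test was applied is not stated (the exact binomial form is assumed here), and the counts and P-values are made up.

```python
from math import comb

def mcnemar_exact(b, c):
    """Two-sided exact McNemar P-value from the two discordant-pair counts.

    b: cases correct without assistance but wrong with it; c: the reverse.
    """
    n = b + c
    if n == 0:
        return 1.0
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)

def benjamini_hochberg(p_values):
    """BH-adjusted P-values (q-values), returned in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest P-value to the smallest
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical example: 2 vs 10 discordant pairs for one reader.
p = mcnemar_exact(2, 10)
print(round(p, 4))  # prints: 0.0386

# Hypothetical raw P-values from four comparisons.
q = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
print([round(v, 3) for v in q])  # prints: [0.02, 0.04, 0.04, 0.02]
```

A comparison would then be reported as significant when its q-value falls below 0.05.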

Data availability

The source data for all models, tables, and figures, along with supporting materials, are available from the corresponding author upon reasonable request.

Code availability

The code developed for this study is not publicly accessible to protect proprietary knowledge. However, qualified academic researchers may request access to the code (including preprocessing scripts, model architecture, and training protocols) for noncommercial research purposes. Requests should be submitted to the corresponding author at tiananjiang@zju.edu.cn and must include a detailed research proposal outlining the intended use. Access will be granted following approval by the institutional review committee, which evaluates proposals within 4–6 weeks. A formal code use agreement, prohibiting redistribution or commercial exploitation, will be required prior to release.

References

Rammohan, A., Cherukuri, S. D., Sathyanesan, J., Palaniappan, R. & Govindan, M. Xanthogranulomatous cholecystitis masquerading as gallbladder cancer: can it be diagnosed preoperatively?. Gastroenterol. Res. Pr. 2014, 253645 (2014).
Roa, J. C. et al. Gallbladder cancer. Nat. Rev. Dis. Prim. 8, 69 (2022).
Baiu, I. & Visser, B. Gallbladder cancer. Jama 320, 1294 (2018).
Güneş, Y., Bostancı, Ö., İlbar Tartar, R. & Battal, M. Xanthogranulomatous cholecystitis: is surgery difficult? Is laparoscopic surgery recommended?. J. Laparoendosc. Adv. Surg. Tech. A 31, 36–40 (2021).
Benson, A. B. et al. Hepatobiliary Cancers, Version 2.2021, NCCN Clinical Practice Guidelines in Oncology. J. Natl. Compr. Canc. Netw. 19, 541–565 (2021).
Feng, L., You, Z., Gou, J., Liao, E. & Chen, L. Xanthogranulomatous cholecystitis: experience in 100 cases. Ann. Transl. Med. 8, 1089 (2020).
Truant, S., Chater, C. & Pruvot, F. R. Greatly enlarged thickened gallbladder. Diagnosis: Xanthogranulomatous cholecystitis (XGC). JAMA Surg. 150, 267–268 (2015).
Deng, Y. L. et al. Xanthogranulomatous cholecystitis mimicking gallbladder carcinoma: An analysis of 42 cases. World J. Gastroenterol. 21, 12653–12659 (2015).
Spinelli, A. et al. Extended surgical resection for xanthogranulomatous cholecystitis mimicking advanced gallbladder carcinoma: a case report and review of literature. World J. Gastroenterol. 12, 2293–2296 (2006).
Huang, E. Y. et al. Distinguishing characteristics of xanthogranulomatous cholecystitis and gallbladder adenocarcinoma: a persistent diagnostic dilemma. Surg. Endosc. 38, 348–355 (2024).
Xiao, J., Zhou, R., Zhang, B. & Li, B. Noninvasive preoperative differential diagnosis of gallbladder carcinoma and xanthogranulomatous cholecystitis: a retrospective cohort study of 240 patients. Cancer Med. 11, 176–182 (2022).
Goshima, S. et al. Xanthogranulomatous cholecystitis: diagnostic performance of CT to differentiate from gallbladder cancer. Eur. J. Radio. 74, e79–e83 (2010).
Parra, J. A. et al. Xanthogranulomatous cholecystitis: clinical, sonographic, and CT findings in 26 patients. AJR Am. J. Roentgenol. 174, 979–983 (2000).
Ros, P. R. & Goodman, Z. D. Xanthogranulomatous cholecystitis versus gallbladder carcinoma. Radiology 203, 10–12 (1997).
Lee, E. S. et al. Xanthogranulomatous cholecystitis: diagnostic performance of US, CT, and MRI for differentiation from gallbladder carcinoma. Abdom. Imaging 40, 2281–2292 (2015).
Goecks, J., Jalili, V., Heiser, L. M. & Gray, J. W. How machine learning will transform biomedicine. Cell 181, 92–101 (2020).
Zhang, B. et al. Identifying behaviour-related and physiological risk factors for suicide attempts in the UK Biobank. Nat. Hum. Behav. 8, 1784–1797 (2024).
Zamanzadeh, D. et al. Data-driven prediction of continuous renal replacement therapy survival. Nat. Commun. 15, 5440 (2024).
Liu, R. et al. Development and prospective validation of postoperative pain prediction from preoperative EHR data using attention-based set embeddings. NPJ Digit Med. 6, 209 (2023).
Wagner, M. et al. Artificial intelligence for decision support in surgical oncology - a systematic review. Artif. Intell. Surg. 2, 159–172 (2022).
Teng, X. et al. Development and validation of an early diagnosis model for bone metastasis in non-small cell lung cancer based on serological characteristics of the bone metastasis mechanism. EClinicalMedicine 72, 102617 (2024).
Xu, Y. et al. Artificial intelligence: a powerful paradigm for scientific research. Innovation 2, 100179 (2021).
Fujita, H. et al. Differential diagnoses of gallbladder tumors using CT-based deep learning. Ann. Gastroenterol. Surg. 6, 823–832 (2022).
Zhou, Q. M. et al. Machine learning-based radiological features and diagnostic predictive model of xanthogranulomatous cholecystitis. Front. Oncol. 12, 792077 (2022).
Zhang, W. et al. Deep learning nomogram for preoperative distinction between Xanthogranulomatous cholecystitis and gallbladder carcinoma: a novel approach for surgical decision. Comput. Biol. Med. 168, 107786 (2024).
Gupta, P. et al. Deep-learning models for differentiation of xanthogranulomatous cholecystitis and gallbladder cancer on ultrasound. Indian J. Gastroenterol. 43, 805–812 (2024).
Fu, T. et al. Machine learning-based diagnostic model for preoperative differentiation between xanthogranulomatous cholecystitis and gallbladder carcinoma: a multicenter retrospective cohort study. Front. Oncol. 14, 1355927 (2024).
Ito, R. et al. A scoring system based on computed tomography for the correct diagnosis of xanthogranulomatous cholecystitis. Acta Radio. Open 9, 2058460120918237 (2020).
Yang, S. Q. et al. Prognostic significance of tumor necrosis in patients with gallbladder carcinoma undergoing curative-intent resection. Ann. Surg. Oncol. 31, 125–132 (2024).
Pérez-Moreno, P., Riquelme, I., García, P., Brebi, P. & Roa, J. C. Environmental and lifestyle risk factors in the carcinogenesis of gallbladder cancer. J. Pers. Med. 12, 234 (2022).
Balkwill, F. Tumour necrosis factor and cancer. Nat. Rev. Cancer 9, 361–371 (2009).
Elinav, E. et al. Inflammation-induced cancer: crosstalk between tumours, immune cells and microorganisms. Nat. Rev. Cancer 13, 759–771 (2013).
Espinoza, J. A. et al. The inflammatory inception of gallbladder cancer. Biochim. Biophys. Acta 1865, 245–254 (2016).
Azari, F. S. et al. A contemporary analysis of xanthogranulomatous cholecystitis in a Western cohort. Surgery 170, 1317–1324 (2021).
Yang, Z. et al. Preoperative serum fibrinogen as a valuable predictor in the nomogram predicting overall survival of postoperative patients with gallbladder cancer. J. Gastrointest. Oncol. 12, 1661–1672 (2021).
Zhang, L. et al. Exploring the diagnosis markers for gallbladder cancer based on clinical data. Front. Med. 9, 350–355 (2015).
Yang, S. Q. et al. Unraveling early recurrence of risk factors in gallbladder cancer: a systematic review and meta-analysis. Eur. J. Surg. Oncol. 50, 108372 (2024).
Liu, F. et al. The prognostic value of combined preoperative PLR and CA19-9 in patients with resectable gallbladder cancer. Updates Surg. 76, 1235–1245 (2024).
Bolukbasi, H. & Kara, Y. An important gallbladder pathology mimicking gallbladder carcinoma: xanthogranulomatous cholecystitis: a single tertiary center experience. Surg. Laparosc. Endosc. Percutan Tech. 30, 285–289 (2020).
Kwon, A. H. & Sakaida, N. Simultaneous presence of xanthogranulomatous cholecystitis and gallbladder cancer. J. Gastroenterol. 42, 703–704 (2007).
Pandey, A., Kumar, D., Masood, S., Chauhan, S. & Kumar, S. Is final histopathological examination the only diagnostic criteria for xanthogranulomatous cholecystitis?. Niger. J. Surg. 25, 177–182 (2019).
Nohara, Y., Matsumoto, K., Soejima, H. & Nakashima, N. Explanation of machine learning models using shapley additive explanation and application for real data in hospital. Comput. Methods Prog. Biomed. 214, 106584 (2022).
Wang, S. & Summers, R. M. Machine learning and radiology. Med. Image Anal. 16, 933–951 (2012).
Dinsdale, N. K. et al. Challenges for machine learning in clinical translation of big data imaging studies. Neuron 110, 3866–3881 (2022).
Vickers, A. J. & Elkin, E. B. Decision curve analysis: a novel method for evaluating prediction models. Med Decis. Mak. 26, 565–574 (2006).
Steyerberg, E. W. et al. Assessing the performance of prediction models: a framework for traditional and novel measures. Epidemiology 21, 128–138 (2010).
Lundberg, S. M. & Lee, S.-I. A unified approach to interpreting model predictions. In Advances in Neural Information Processing Systems 30 (NeurIPS, 2017).
Lundberg, S. M., Erion, G. G. & Lee, S.-I. Consistent individualized feature attribution for tree ensembles. Preprint at https://arxiv.org/abs/1802.03888 (2018).
Little, R. J. et al. The prevention and treatment of missing data in clinical trials. N. Engl. J. Med. 367, 1355–1360 (2012).
O’Brien, R. M. A caution regarding rules of thumb for variance inflation factors. Qual. Quant. 41, 673–690 (2007).
Akaike, H. A new look at the statistical model identification. IEEE Trans. Autom. Control 19, 716–723 (1974).
Benjamini, Y. & Hochberg, Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. R. Stat. Soc. Ser. B. 57, 289–300 (1995).

Acknowledgements

Funding was provided by the Development Project of National Major Scientific Research Instrument (82027803), the National Natural Science Foundation of China (82202151), the Key Research and Development Project of Zhejiang Province (2024C03092), the National Key R&D Program of China (2022YFC2405505), Zhejiang Provincial Natural Science Foundation of China (Y24H180007), and Beilun Health Technology Project (2024BLWSQN002).

Author information

These authors contributed equally: Ke Zhang, Jiajia He.

Authors and Affiliations

Department of Ultrasound Medicine, The First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, Zhejiang, China
Ke Zhang, Liting Xie & Tianan Jiang
Department of Ultrasound Medicine, Beilun District People’s Hospital, Ningbo, Zhejiang, China
Jiajia He & Qunyan Pan
Department of Radiology, Beilun District People’s Hospital, Ningbo, Zhejiang, China
Weiyue Ji
Department of Ultrasound Medicine, Huzhou Central Hospital, Huzhou, Zhejiang, China
Weilv Xiong
Department of Ultrasound Medicine, Qingchun Hospital of Zhejiang Province, Hangzhou, Zhejiang, China
Liping Wang
Department of Ultrasound Medicine, The Second Affiliated Hospital, Jiaxing University, Jiaxing, Zhejiang, China
Weiqi Sun

Contributions

Study concept and design: K.Z., J.H., L.X., and T.J.; Acquisition of data: K.Z., J.H., W.J., Q.P., W.X., L.W., and W.S.; Analysis and interpretation of data: K.Z. and J.H.; Drafting of the manuscript: K.Z., J.H., and L.X.; Critical revision of the manuscript: K.Z., J.H., L.X., and T.J.; Statistical analysis: K.Z. and J.H.; Study supervision: K.Z., J.H., W.J., Q.P., W.X., L.W., W.S., L.X., and T.J. All authors have read and approved this manuscript.

Corresponding authors

Correspondence to Liting Xie or Tianan Jiang.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Material.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Zhang, K., He, J., Ji, W. et al. Machine learning model for differentiating xanthogranulomatous cholecystitis and gallbladder cancer in multicenter largescale study. npj Digit. Med. 8, 590 (2025). https://doi.org/10.1038/s41746-025-01991-7

Received: 21 February 2025
Accepted: 01 September 2025
Published: 01 October 2025
DOI: https://doi.org/10.1038/s41746-025-01991-7
