Introduction

Hilar cholangiocarcinoma (hCCA) is a rare malignancy of the biliary system with a poor prognosis1. In recent years, the global incidence of hCCA has been increasing, with an estimated rate of 0.6–2 per 100,000 population2. It is widely accepted that radical resection is the therapeutic gold standard3. However, even with radical resection, positive margins occur in up to 50% of patients4. Previous studies have demonstrated that the 5-year overall survival (OS) rate following surgical resection is 11–40%, with a median survival of only 25 months5,6. Prognostic assessment of hCCA remains challenging because of the tumor’s aggressive nature, complex anatomical location and limited surgical resectability. Despite advances in surgical techniques and multimodal therapies, recurrence rates remain high and survival outcomes poor. Nevertheless, there is currently no effective and practical prognostic model for predicting survival after resection of hCCA. Therefore, it is essential to identify the independent prognostic risk factors of hCCA and to construct prediction models for accurate prognostic assessment.

Machine learning (ML), as an effective tool for screening feature variables and discerning intricate patterns, has been extensively applied to clinically relevant data, producing remarkable outcomes7,8. In recent years, ML-based nomograms have emerged as a prominent tool, offering advantages such as high model accuracy, risk visualization and flexible integration of relationships among variables. Numerous studies have conducted prognostic analyses using ML-based nomograms, demonstrating high model accuracy and robust reliability7,8,9,10,11,12,13,14. To date, however, no study has explored ML-based predictive models for hCCA.

In this study, we aimed to employ five ML algorithms to screen variables for patients with hCCA. Based on ML algorithms, a nomogram was developed to construct a prognostic predictive model. We aimed to identify the risk factors that significantly impact postoperative survival among patients with hCCA and to offer personalized prognosis prediction, thereby maximizing clinical benefit for patients.

Materials and methods

Patients and design

All patients who underwent curative-intent resection for hCCA at the First Affiliated Hospital of Xi’an Jiaotong University between January 2010 and June 2021 were eligible for inclusion. The inclusion criteria were as follows: (1) postoperative pathological confirmation of hCCA; (2) availability of serological, surgical and pathological data; (3) absence of other malignancies. The exclusion criteria were: (1) death within 30 days after surgery; (2) surgical margins classified as R2; (3) distant metastasis. Patients with missing clinical or pathological data (≤ 5% of the total) were also excluded. Ultimately, 340 cases with complete information were included.

The primary outcome measures were the 1-, 2- and 3-year OS rates. OS was measured from the first day after surgery until death or the final follow-up visit, and each OS rate was defined as the proportion of patients with hCCA still alive at the corresponding time point.
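As an illustration of how OS rates at fixed time points can be estimated when some patients are censored at their last follow-up, the following is a minimal Kaplan-Meier sketch. It is not the code used in this study (the analysis was run in R); the function names and the toy follow-up data are assumptions made only for this example.

```python
from typing import List, Tuple


def kaplan_meier(times: List[float], events: List[int]) -> List[Tuple[float, float]]:
    """Return the (time, survival probability) steps of a Kaplan-Meier curve.

    times  : follow-up in months from the first day after surgery
    events : 1 if the patient died at that time, 0 if censored
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    steps = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all patients who share the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths > 0:
            survival *= 1.0 - deaths / n_at_risk
            steps.append((t, survival))
        # Deaths and censored patients both leave the risk set.
        n_at_risk -= leaving
    return steps


def survival_at(steps: List[Tuple[float, float]], horizon: float) -> float:
    """Read the estimated survival probability at a given horizon."""
    prob = 1.0
    for t, s in steps:
        if t > horizon:
            break
        prob = s
    return prob


# Toy data (hypothetical, not from the study cohort).
toy_times = [6, 10, 14, 20, 26, 30, 40, 48]
toy_events = [1, 1, 0, 1, 1, 0, 1, 0]
toy_steps = kaplan_meier(toy_times, toy_events)
for months in (12, 24, 36):
    print(months, round(survival_at(toy_steps, months), 3))
```

With the toy data, the estimated 1-, 2- and 3-year OS rates are 0.75, 0.60 and 0.45; censored patients contribute to the risk set only until their last follow-up.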

Follow-up

After discharge, all patients included in the study underwent standardized follow-up evaluations in accordance with clinical guidelines, with assessments conducted every three months. For patients who did not return to our hospital for scheduled reviews, our clinicians obtained treatment updates and information on their living conditions by telephone. The final follow-up date was June 2024.

Study variables

For comprehensive evaluation, all patients were assessed according to the 8th edition of the American Joint Committee on Cancer (AJCC) staging system, the Bismuth-Corlette classification and the Child-Pugh scoring system. In addition, we incorporated demographic, serological, surgical and pathological variables (Table 1). Notably, positive margin was defined as R1 resection (microscopic tumor involvement at the surgical margin) per the AJCC 8th edition, and we did not differentiate further, for example between carcinoma in situ and invasive carcinoma or between ductal and radial margins.

Table 1 Baseline characteristics of hCCA patients in the training and testing sets.

Statistical analysis

SPSS version 25.0 (IBM Corp., Armonk, NY, USA) was used to perform the t-test, F-test and log-rank test. P < 0.05 was considered statistically significant. R software (version 4.4.0; http://www.Rproject.org) was employed for all other statistical analyses. ML-based variable selection was implemented with the “randomForestSRC”, “glmnet”, “survminer”, “Boruta” and “xgboost” packages. Using the “survival” and “rms” packages, we performed Cox regression analysis, calculated the concordance index (C-index), and plotted the nomogram, calibration plots and Kaplan-Meier survival curves. The “dcurves” and “timeROC” packages were used to plot decision curves and receiver operating characteristic (ROC) curves, respectively.

Variable selection

We employed five distinct ML algorithms: Least Absolute Shrinkage and Selection Operator (LASSO) Regression, Forward Stepwise Cox regression, Boruta feature selection, Random Forest and eXtreme Gradient Boosting (XGBoost), to identify the prognostic variables and construct a nomogram based on included variables. Excluding subjective variables such as abdominal pain, bloating and jaundice, all other variables were comprehensively analyzed utilizing the aforementioned ML algorithms, leading to the identification and screening of independent risk factors. In addition, we compared the predictive outcomes using Cox regression analysis.

The development, validation and application of nomogram

A training set (N = 237) and a testing set (N = 103) were randomly created from all included patients in a 7:3 ratio. We constructed the nomogram in R based on the independent risk factors and then validated it to assess the accuracy and reliability of its predictions. Each variable’s point value in the nomogram reflects its relative prognostic importance derived from the ML algorithms. A total score is obtained by summing the individual points and is then mapped to predicted 1-, 2- and 3-year OS probabilities. The C-index, calibration curves, ROC curves for 1-, 2- and 3-year OS, and decision curves were all derived. Additionally, we calculated the total score for each patient using the nomogram and divided the patients into high-, middle- and low-risk groups based on their total scores, using X-tile software (Yale University, New Haven, CT, USA).
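The point-summing step described above can be sketched as follows. This is only an illustration of the common convention in which the predictor with the largest effect is assigned 100 points and the others are scaled proportionally. The coefficient values are hypothetical placeholders rather than the fitted values of this study, all four predictors are treated as binary for simplicity, and the final mapping from total score to OS probability (which requires the fitted baseline survival) is omitted.

```python
from typing import Dict


def nomogram_points(coefficients: Dict[str, float]) -> Dict[str, float]:
    """Scale each binary predictor's effect so the largest one is worth 100 points."""
    max_effect = max(abs(beta) for beta in coefficients.values())
    return {name: 100.0 * abs(beta) / max_effect for name, beta in coefficients.items()}


def total_score(points: Dict[str, float], patient: Dict[str, bool]) -> float:
    """Sum the points of every risk factor that is present in the patient."""
    return sum(points[name] for name, present in patient.items() if present)


# Hypothetical log hazard ratios, chosen only to make the arithmetic visible.
example_coefficients = {
    "positive_margin": 0.8,
    "lymph_node_metastasis": 0.6,
    "low_tlnc": 0.4,
    "poor_differentiation": 0.5,
}
example_points = nomogram_points(example_coefficients)

# Hypothetical patient with a positive margin and a low TLNC.
example_patient = {
    "positive_margin": True,
    "lymph_node_metastasis": False,
    "low_tlnc": True,
    "poor_differentiation": False,
}
print(example_points)
print(total_score(example_points, example_patient))
```

With these placeholder coefficients the four factors are worth 100, 75, 50 and 62.5 points, and the example patient scores 150 in total.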

Results

Basic characteristics

A total of 340 patients who underwent curative-intent resection for hCCA met the inclusion criteria and were included in this retrospective study. The study population was divided into a training set of 237 patients and a testing set of 103 patients. Statistical analysis of the baseline characteristics revealed no significant difference between the training and testing sets (Table 1). The median survival times in the training and testing sets were 21.7 months and 22.1 months, respectively (p = 0.48).

The outcome of variable selection

LASSO regression identified four characteristic variables at minimum values: positive margin, lymph node metastasis, low total lymph node count (TLNC) and tumor differentiation (Fig. 1a,b). Forward Stepwise Cox regression confirmed that all four variables were statistically significant (p < 0.01, Fig. 1c). Random Forest, Boruta and XGBoost gave concordant results, ranking the same four variables as the most important among the features considered (Fig. 1d,e,f). The results of Cox regression corroborated this finding (Fig. 1g).

Fig. 1

Five machine learning algorithms and Cox regression for screening independent risk factors. (a) LASSO coefficient profiles of the 21 risk factors. (b) Twenty risk factors selected using LASSO regression analysis. (c–g) Forward Stepwise Cox, Boruta feature selection, Random Forest, XGBoost and Cox regression analysis to screen risk factors.

Development and performance of prediction model

Based on the prognostic factors identified through ML algorithms, a nomogram was constructed to predict 1-, 2- and 3-year OS (Fig. 2). The C-index of the nomogram was 0.731 (95% CI: 0.684–0.753), and the AUCs for predicting 1-, 2-, and 3-year survival using the nomogram were 0.780 (95% CI: 0.711–0.848), 0.783 (95% CI: 0.726–0.840) and 0.784 (95% CI: 0.724–0.844), respectively, in the training set (Fig. 3a). Moreover, similar results were obtained in the testing set: the C-index of the nomogram was 0.714 (95% CI: 0.661–0.775), and the AUCs (ROC curve) for 1-, 2-, and 3-year OS were 0.756 (95% CI: 0.654–0.858), 0.775 (95% CI: 0.678–0.872) and 0.770 (95% CI: 0.670–0.870), respectively (Fig. 3b).
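For readers unfamiliar with the C-index reported above, the following minimal sketch computes Harrell's concordance index for right-censored data. It is an illustration rather than the R code used in this study; the function name and toy values are assumptions, and tied survival times are simply treated as non-comparable.

```python
from typing import List


def harrell_c_index(times: List[float], events: List[int], risk_scores: List[float]) -> float:
    """Fraction of comparable patient pairs in which the higher risk score
    belongs to the patient who died earlier (ties in score count as 0.5)."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when patient i has an observed death
            # strictly before patient j's last known follow-up time.
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data (hypothetical): survival in months, death indicator, nomogram score.
toy_times = [5, 10, 15, 20]
toy_events = [1, 1, 0, 1]
toy_scores = [150, 90, 100, 40]
print(harrell_c_index(toy_times, toy_events, toy_scores))
```

In the toy data, four of the five comparable pairs are ordered correctly, giving a C-index of 0.8. A value of 0.5 corresponds to random ordering and 1.0 to perfect discrimination.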

Fig. 2

Nomogram for 1-, 2- and 3-year OS in patients with hCCA after curative-intent resection.

Fig. 3

ROC curves for 1-, 2- and 3-year OS of the included patients. The horizontal axis shows false positive rate and the vertical axis shows true positive rate. (a) ROC curves for 1-, 2- and 3-year OS in the training set of nomogram, respectively. (b) ROC curves for 1-, 2- and 3-year OS in the testing set of nomogram, respectively.

Comparison with traditional models

In both the training and testing sets, the C-index of the nomogram surpassed that of the traditional TNM staging system (0.621, 95% CI: 0.552–0.628; 0.612, 95% CI: 0.557–0.667) and the Bismuth-Corlette classification (0.531, 95% CI: 0.471–0.552; 0.487, 95% CI: 0.415–0.555), indicating better predictive accuracy.

Calibration curve and decision curve analysis

The calibration plots for 1-, 2- and 3-year OS probabilities showed good concordance between nomogram prediction and actual observation in the training and testing sets (Fig. 4). Furthermore, decision curve analysis showed that the nomogram outperformed both the TNM staging system and the Bismuth-Corlette classification in both the training and testing sets (Fig. 5).
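Decision curve analysis compares models by their net benefit across threshold probabilities. The sketch below shows the standard net benefit calculation for a binary outcome, such as death before a fixed time point. It is a simplified illustration that ignores censoring, which survival-specific implementations account for; the function names and toy values are assumptions.

```python
from typing import List


def net_benefit(predicted_risk: List[float], outcome: List[int], threshold: float) -> float:
    """Net benefit of treating every patient whose predicted risk is >= threshold."""
    n = len(outcome)
    true_pos = sum(1 for p, y in zip(predicted_risk, outcome) if p >= threshold and y == 1)
    false_pos = sum(1 for p, y in zip(predicted_risk, outcome) if p >= threshold and y == 0)
    # False positives are weighted by the odds of the threshold probability.
    return true_pos / n - (false_pos / n) * threshold / (1.0 - threshold)


def net_benefit_treat_all(outcome: List[int], threshold: float) -> float:
    """Reference strategy: treat every patient regardless of predicted risk."""
    prevalence = sum(outcome) / len(outcome)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


# Toy data (hypothetical): predicted risk of death and observed outcome.
toy_risk = [0.9, 0.8, 0.6, 0.4, 0.3, 0.1]
toy_outcome = [1, 1, 0, 1, 0, 0]
print(net_benefit(toy_risk, toy_outcome, 0.5))
print(net_benefit_treat_all(toy_outcome, 0.5))
```

A model is considered clinically useful at a given threshold when its net benefit exceeds both the treat-all strategy and the treat-none strategy, whose net benefit is zero.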

Fig. 4

Calibration curves of the nomogram in the training and testing sets. The x-axes show the survival probability predicted by the nomogram and the y-axes show the observed survival calculated by the Kaplan-Meier method. (a,c,e) 1-, 2- and 3-year OS in the training set of nomogram, respectively. (b,d,f) 1-, 2- and 3-year OS in the testing set of nomogram, respectively.

Fig. 5

Decision curve analysis for OS in the training and testing sets. (a) In the training set. (b) In the testing set.

Application of the nomogram model for risk stratification

Risk stratification cutoffs were determined using X-tile software with 1000-bootstrap resampling to ensure stability, categorizing patients into three groups with high, middle and low prognostic risk. A total score ≤ 80 was defined as low risk and a total score > 135 as high risk, while the remaining patients were classified as middle risk. Kaplan-Meier curves were plotted for the three groups, and the risk groups were compared as a three-level factor using the log-rank test, with a two-tailed P value < 0.05 considered statistically significant (Fig. 6). In the training set, there were 81, 84 and 72 cases in the low-risk, middle-risk and high-risk groups, respectively. The corresponding OS was (39.3 ± 1.7) months, (24.1 ± 3.2) months and (10.3 ± 1.4) months (p < 0.001). In the testing set, there were 33, 44 and 26 cases in the low-risk, middle-risk and high-risk groups, respectively. The corresponding OS was (43.4 ± 5.4) months, (20.9 ± 2.3) months and (10.3 ± 1.3) months (p < 0.001).
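The stratification rule stated above (low risk for a total score ≤ 80, high risk for a total score > 135, middle risk otherwise) can be expressed as a small helper. The cutoffs come from the text; the function name and the example scores are assumptions made for illustration.

```python
from collections import Counter
from typing import Iterable

LOW_RISK_MAX = 80     # total score <= 80 is low risk
HIGH_RISK_MIN = 135   # total score > 135 is high risk


def risk_group(total_score: float) -> str:
    """Assign a nomogram total score to the low-, middle- or high-risk group."""
    if total_score <= LOW_RISK_MAX:
        return "low"
    if total_score > HIGH_RISK_MIN:
        return "high"
    return "middle"


def count_groups(scores: Iterable[float]) -> Counter:
    """Count how many patients fall into each risk group."""
    return Counter(risk_group(score) for score in scores)


# Hypothetical total scores for six patients.
example_scores = [0, 62.5, 80, 112.5, 135, 150]
print(count_groups(example_scores))
```

Scores exactly equal to 80 fall in the low-risk group and scores exactly equal to 135 fall in the middle-risk group, following the inequalities given in the text.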

Fig. 6

Kaplan-Meier survival curves for different risk levels. (a) In the training set. (b) In the testing set.

Discussion

hCCA accounts for 50–60% of cholangiocarcinoma, characterized by poor prognosis, with surgery representing the optimal therapeutic approach15. Given the complexity of surgical procedures for hCCA and the current lack of accurate prognostic models, it is important to accurately predict the postoperative survival outcomes16. To address this need, we have conducted a retrospective study using ML algorithms to identify the prognostic risk factors among patients with hCCA after curative-intent resection. Furthermore, a predictive nomogram has been developed based on our findings.

In this study, analysis with five ML algorithms identified four statistically significant and most prominent independent risk factors: positive margin, lymph node metastasis, low TLNC and poor tumor differentiation. Additionally, we employed Cox regression to screen for prognostic risk factors. Previous studies have shown that ML outperforms Cox regression in extracting prognosis-related risk factors10,12,13. In our study, however, ML and Cox regression differed only minimally in the risk factors they identified, probably because the four key prognostic indicators were far more important than the other variables, leaving a pronounced importance gap. ML algorithms allow simultaneous screening of all variables and detection of nonlinear interactions, and they clearly present variable importance, which traditional Cox regression may overlook. Additionally, unlike classical Cox regression, ML algorithms provide robust feature ranking and cross-validation, helping to ensure that the nomogram includes the most impactful prognostic factors. In summary, ML offers advantages in visualization, variable screening, nonlinear interaction identification, feature ranking and cross-validation, which support the efficacy of the nomogram.

The identification of these four factors underscores the complex and multifaceted nature of prognosis in hCCA, emphasizing the critical role of surgical quality (status of resection margins and extent of lymphadenectomy) and tumor biology (tumor differentiation and lymph node involvement). Previous studies have demonstrated that lymph node invasion, tumor differentiation and margin status are significant prognostic indicators that influence the clinical outcomes of patients with hCCA17, which is consistent with our results.

Positive margin, which is predictive of reduced survival, underscores the need for comprehensive preoperative assessment and meticulous surgical technique to achieve complete tumor excision. Studies have found that positive margins in patients with biliary tract malignancies can be further classified into carcinoma in situ and invasive carcinoma, with the latter exhibiting significantly higher recurrence rates and lower survival rates than the former18. In addition, many studies have shown that classifying positive margins (R1 resection) into ductal margins and radial margins is helpful for more accurate prognostic stratification and patient selection for adjuvant therapy, although there is no significant difference in their impact on the survival rate19,20,21. Therefore, it is meaningful to classify the status of positive surgical margins in detail.

Notably, recent studies have demonstrated that the impact of positive margins on OS in postoperative patients with hCCA may be closely associated with lymph node metastasis status22,23. Specifically, in the presence of lymph node metastasis, margin status does not appear to influence survival outcomes. Another study has indicated that lymph node metastasis primarily determines prognosis regardless of margin status24. This contrasts with our findings, which might be attributed to the limitations of the single-center study population. However, Koca F. et al.25 found that patients with positive margins have worse prognoses after resection for hCCA, irrespective of lymph node metastasis. Thus, future research should further explore the effect of margin status on survival outcomes in patients with lymph node metastasis.

In addition, depending on the extent of tumor invasion, surgical resection for hCCA involves excision of the diseased bile ducts and major liver resection. A previous study indicated that the caudate lobe is involved in 40–98% of hCCA cases, suggesting that combined caudate lobe resection is needed to achieve radical resection26. Furthermore, numerous studies have consistently demonstrated that combined caudate lobe resection as a strategy for radical resection of hCCA does not significantly elevate postoperative morbidity and mortality27,28,29,30. Given the high incidence of caudate lobe involvement in hCCA and the association between positive margins and poor survival, our findings support including caudate lobe resection in curative surgery to achieve negative margins, when safely feasible.

The poor survival outcomes associated with lymph node metastasis and low TLNC underscore the importance of adequate lymphadenectomy and comprehensive staging in hCCA management. Previous studies have indicated that there is a significant difference in survival rates between hCCA patients with ≥ 13 and < 13 TLNC, with the latter group exhibiting a markedly worse prognosis31. A recent systematic analysis demonstrated that ≥ 7 TLNC is sufficient for prognostic staging, whereas ≥ 15 TLNC does not enhance the detection of lymph node-positive patients32. In our study, we observed that TLNC played an important role in the survival of hCCA patients. Patients with ≤ 6 TLNC exhibited significantly lower survival rates than those with > 6. Notably, the difference in prognostic impact diminished at ≥ 12 TLNC, suggesting that around 12 TLNC may be sufficient to achieve a satisfactory surgical outcome. Currently, routine lymph node dissection encompasses the lymph nodes in the hilar, pericholedochal, peripancreatic, periportal and common hepatic artery regions33.

In many malignant tumors, poor differentiation generally indicates a poorer prognosis8,34,35. This trend is also observed in hCCA. Recent studies have revealed that perineural invasion contributes to the poor prognosis of hCCA, with an incidence ranging from 56.0 to 88.0% in biliary tract tumors36. Interestingly, perineural invasion is more prevalent in moderately and poorly differentiated tumors37. This finding suggests that the poorer the differentiation status of hCCA, the higher the likelihood of perineural invasion. Poorly differentiated tumors, which exhibit a higher metastatic potential, further demonstrate their independent prognostic value. Although perineural invasion is also associated with a poor prognosis, ML algorithms prioritized tumor differentiation in risk factor extraction. This is likely because tumor differentiation reflects broader biological aggressiveness, including the propensity for perineural invasion, and thus has a stronger influence on prognosis. Future work should explore the association between tumor differentiation and perineural invasion in depth.

We further employed the ML-based nomogram to stratify patients into three risk groups, which provides a clinically actionable framework for postoperative management. In the training set, OS in the low-, middle- and high-risk groups was (39.3 ± 1.7) months, (24.1 ± 3.2) months and (10.3 ± 1.4) months, respectively (p < 0.001). Similarly, in the testing set, OS was (43.4 ± 5.4) months, (20.9 ± 2.3) months and (10.3 ± 1.3) months, respectively (p < 0.001). These findings underscore the utility of the nomogram for postoperative risk assessment, enabling tailored surveillance and intervention strategies. For patients in the low- and middle-risk groups, optimized postoperative care including enhanced follow-up imaging and symptom monitoring may suffice. For patients in the high-risk group, adjuvant therapy may also represent an important option. However, given the nomogram’s reliance on postoperative variables, its utility is confined to post-surgical risk stratification and it cannot guide preoperative decision-making. Future studies integrating preoperative imaging or molecular markers may bridge this gap.

Since its proposal in 1975, the Bismuth-Corlette classification has become the most widely adopted classification and surgical guide for hCCA16. The TNM staging system is the most commonly used traditional method for predicting the prognosis of hCCA patients. However, the TNM staging system has limited accuracy in prognostic assessment, and it is difficult to use for individualized evaluation of postoperative patients with hCCA38. We compared the performance of our ML-based nomogram with established prognostic tools, including the TNM staging system, the Bismuth-Corlette classification and previously reported nomograms. Our results showed that the proposed model outperformed the TNM staging system, the Bismuth-Corlette classification and previously established nomograms derived from similar studies (Table 2). The C-index values of our model support this finding. Compared with our study, these previous studies examined a narrower range of variables, relied on a single analytical method and had smaller sample sizes. Consequently, these findings indicate an improvement in the predictive accuracy and clinical utility of our model over existing methods.

Table 2 Comparison of traditional and nomogram models for hCCA.

However, as a retrospective study, this work has several limitations. Firstly, the sample size of patients with hCCA after curative-intent resection was limited and all patients came from a single center. A notable limitation is therefore the lack of external validation in an independent cohort, which is essential to confirm generalizability. Secondly, our study encompassed a long time span, and we did not include the year of surgery as a variable. During this period, pathological and molecular detection techniques, perioperative care technologies, adjuvant therapy strategies, follow-up methods and the social environment have been continuously evolving. This might have led to better prognoses for patients treated in recent years than for those treated earlier. Thirdly, deficiencies in the standardization and completeness of chemotherapy could also have biased the results. Considering these limitations, future large-sample, multicenter prospective studies are warranted. These studies should employ a wider range of ML algorithms, integrate multi-omics data, such as emerging biomarkers, preoperative imaging and pathomics, and refine the modalities of adjuvant therapy. Furthermore, it is essential to validate the performance of the nomogram across diverse populations and treatment regimens, thereby enhancing prognostic accuracy.

Conclusion

This study employed ML algorithms to develop and validate a nomogram for predicting the 1-, 2- and 3-year OS rates of patients with hCCA after curative-intent resection. Upon validation, the nomogram exhibited good performance. In summary, this ML-based nomogram highlights critical risk factors influencing the postoperative survival of hCCA patients, offering personalized prognostic prediction that may help to improve clinical benefit.