Introduction

Colon cancer, a gastrointestinal malignancy with a poor prognosis, has a rising incidence attributed to shifts in lifestyle and dietary patterns. According to a 2019 epidemiological survey1, colon cancer ranks third globally in malignant tumor incidence, following lung and breast cancer. Radical surgery remains the primary treatment for colon cancer, and complete mesocolic excision (CME), initially proposed by Hohenberger2,3, has proved effective in reducing morbidity and mortality. CME emphasizes complete excision of the tumor and colonic mesentery, clearance of the surrounding lymph nodes, and ligation of the main colonic vessels, embodying the tenets of radical colon cancer surgery.

Although the benefits of CME have been explored extensively, clinicians have observed that some patients experience postoperative ischemia-reperfusion injury and hemodynamic changes, which raise the risk of postoperative heart failure (HF). The incidence of these complications is expected to rise as the population undergoing the procedure ages. HF, the end stage of cardiovascular disease, poses a significant global public health challenge because of its high morbidity and mortality4. Beyond imaging, which plays a pivotal role in HF diagnosis, the worldwide multicenter prospective VISION study emphasized the significance of elevated postoperative troponin in predicting myocardial injury and HF after non-cardiac surgery5. Nevertheless, subjecting all patients to such tests would impose a considerable burden on healthcare systems globally. Consequently, it is imperative to identify risk factors for the development of postoperative HF.

Artificial intelligence (AI) is advancing rapidly in medicine6,7,8, and machine learning, a prominent branch of AI, is valued for its stable model construction and precise predictions. Machine learning has been adopted by clinicians for clinical prediction and in various other domains9,10,11. In this investigation, we analyzed clinical data from colon cancer patients and used machine learning algorithms to build a predictive model for postoperative HF. Such a model holds the promise of curbing healthcare costs and helping clinicians devise timely and accurate individualized treatment plans for their patients.

Materials and methods

Study subjects

This study leveraged data from the clinical databases of Wuxi People’s Hospital, affiliated with Nanjing Medical University.

The inclusion criteria were defined as follows: (a) patients who underwent open complete mesocolic excision (CME) or laparoscopic-assisted CME; (b) the surgical team comprised senior surgeons proficient in independently performing CME; (c) patients received a confirmed postoperative pathological diagnosis of colon cancer; and (d) patients received a confirmed postoperative diagnosis of heart failure (HF).

Exclusion criteria were as follows: (A) patients with other malignant tumors; (B) patients with distant metastasis of colon cancer confirmed through pathological examination or imaging; (C) patients with a preoperative diagnosis of acute HF, atrial fibrillation, or other severe cardiac dysfunction; (D) patients with a history of open-heart surgery; and (E) patients with missing records, incomplete clinical data, or no follow-up.

During the postoperative period, two surgeons conducted regular assessments every other month, including history taking, electrocardiogram (ECG), and echocardiography. All participants were followed up for a minimum of 5 years after surgery. The study adhered to the principles of the Declaration of Helsinki and was approved by the Ethics Committee of Wuxi People’s Hospital (approval number KY22085).

Study design and data collection

A comprehensive set of 37 variables was collected, encompassing preoperative variables (within 24 h of the surgery day), intraoperative variables, and postoperative variables. The preoperative variables included patient demographic characteristics (gender, age, smoking history, alcohol history, and body mass index), basic clinical characteristics (American Society of Anesthesiologists score, Nutritional Risk Screening 2002 score, history of surgery, duration of disease, adjuvant chemotherapy history, and adjuvant radiotherapy history), basic medical history (anemia, tachycardia, chronic kidney disease, diabetes, hypertension, chronic obstructive pulmonary disease, and hyperlipidemia), laboratory tests (albumin, carcinoembryonic antigen, carbohydrate antigen 19-9, procalcitonin, C-reactive protein, serum amyloid A, and neutrophil-to-lymphocyte ratio), and tumor characteristics (T-stage, N-stage, tumor size, and number of tumors). Intraoperative variables covered the surgical approach, type of surgery, duration of surgery, intraoperative bleeding, whether it was an emergency procedure, intraoperative tachycardia, and intraoperative blood transfusion. The outcome variable was postoperative heart failure (HF).

Diagnosis of HF and definition of associated factors

The diagnosis and evaluation of heart failure (HF) necessitate a comprehensive approach encompassing patient history, physical examination, laboratory investigations, cardiac imaging, and functional assessments. Initially, the likelihood of heart failure is gauged through clinical manifestations such as recent weight gain, jugular venous distension, and peripheral edema. Elevated jugular venous pressure and alterations in the location of the apical impulse are also hallmark indicators in the diagnostic process. Furthermore, in patients with suspected postoperative heart failure, an electrocardiogram (ECG) is performed to assess rhythm, heart rate, QRS morphology and duration, as well as to identify pathological changes, including atrial fibrillation or left ventricular hypertrophy. Chest imaging is conducted to identify pulmonary conditions or other potential etiologies of dyspnea, providing critical insights into pulmonary congestion, pulmonary edema, and cardiomegaly. However, normal findings on chest imaging do not definitively exclude the diagnosis of heart failure. Plasma levels of B-type natriuretic peptide (BNP) or N-terminal proBNP (NT-proBNP) are also measured to evaluate the severity and prognosis of the condition. Ultimately, cardiac ultrasound, regarded as the diagnostic gold standard, is employed to assess both the structure and function of the heart, with left ventricular ejection fraction (LVEF) quantified using the Simpson biplane method.

In this study, the diagnosis of HF was grounded in a unified, comprehensive evaluation encompassing patient history, physical examination, laboratory investigations, cardiac imaging, and functional assessments. In instances where the diagnosis remained uncertain, clinical manifestations such as recent weight gain, jugular venous distension, and peripheral edema were prioritized during the initial evaluation. All patients suspected of postoperative heart failure underwent echocardiography, with LVEF measured using the Simpson biplane method. Patients with an LVEF of less than 50% were classified as having heart failure with reduced ejection fraction (HFrEF), while those exhibiting a normal LVEF (≥ 50%) were diagnosed with heart failure with preserved ejection fraction (HFpEF), provided other clinical symptoms were present. In cases where echocardiographic data were unavailable or inconclusive, the diagnosis was augmented with clinical evaluations, including ECG findings and plasma levels of BNP or NT-proBNP.
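The echocardiographic rule described above can be written as a small decision function. The following is a minimal sketch, not the authors' code: the function name, the return labels, and the handling of a missing LVEF are illustrative assumptions, while the 50% cut-off is the one stated in this study.

```python
from typing import Optional

LVEF_CUTOFF = 50.0  # percent; the cut-off stated in this study


def classify_hf(lvef_percent: Optional[float], has_hf_symptoms: bool) -> str:
    """Label one patient suspected of postoperative HF (hypothetical helper)."""
    if lvef_percent is None:
        # Echocardiography unavailable or inconclusive: the text falls back on
        # ECG findings and BNP / NT-proBNP, which this sketch does not model.
        return "needs ECG/BNP review"
    if lvef_percent < LVEF_CUTOFF:
        return "HFrEF"
    if has_hf_symptoms:
        # Normal LVEF counts as HFpEF only when clinical symptoms are present.
        return "HFpEF"
    return "no HF"


print(classify_hf(42.0, True))   # reduced ejection fraction
print(classify_hf(58.0, True))   # preserved ejection fraction with symptoms
print(classify_hf(58.0, False))  # normal LVEF, no symptoms
print(classify_hf(None, True))   # no usable echocardiogram
```

Keeping the cut-off in a named constant makes it easy to compare this study's definition with other LVEF thresholds.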

Development and evaluation of predictive models for machine learning algorithms

The construction and assessment of clinical prediction models were carried out using SPSS software and R software, involving the following steps:

(a) Data Preprocessing:

- Colon cancer patients from January 2010 to December 2015 comprised the model establishment set, while patients from January 2016 to January 2018 formed the external validation set.
- The model establishment set was randomly divided into a training set (70%) and a test set (30%).

(b) Univariate and Multivariate Regression Analyses:

- Univariate and multivariate regression analyses were performed on the data from the model establishment set.
- Categorical variables underwent chi-square tests, t-tests were used for continuous variables following a normal distribution, and the rank sum test was applied for non-normally distributed continuous variables.
- Logistic regression analysis was conducted on significant variables from the univariate analysis to identify independent influences on postoperative HF.
- Four models (XGBoost, RF, SVM, and KNN) were used to rank the importance of each independent influence, with the top ten variables selected.

(c) Machine Learning Algorithm Prediction Models:

- The selected feature variables were integrated into four machine learning algorithm prediction models (SVM, RF, XGBoost, and KNN).
- Models were evaluated for discrimination, calibration, and clinical usefulness. The best model was chosen based on these evaluations.
- ROC curves were generated to determine the area under the curve (AUC) value, calibration curves assessed agreement between predicted and actual results, and decision curve analysis (DCA) gauged patient benefit post-intervention.

(d) External Validation:

- The best model underwent external validation using an external test set.
- ROC curves and calibration curves were plotted to assess generalizability and predictive efficiency.

(e) Model Interpretation:

- SHAP analysis (Shapley value) was employed to ascertain the contribution of each feature to the prediction.
- SHAP summary plots, ranking the importance of risk factors, and SHAP force plots, analyzing and interpreting individual sample prediction results, were constructed based on Shapley values.
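The data partition in step (a) can be sketched as follows. This is an illustration only, written in standard-library Python rather than the SPSS and R tools the study used. The record schema (`surgery_date`), the function name, and the random seed are assumptions, and the toy records are synthetic.

```python
import random
from datetime import date


def split_cohort(patients, cutoff=date(2016, 1, 1), train_frac=0.7, seed=42):
    """Temporal split into development/external sets, then a random 70/30 split.

    `patients` is a list of dicts with a 'surgery_date' key (hypothetical schema).
    """
    # Surgeries before the cutoff form the model establishment set;
    # later surgeries form the external validation set.
    development = [p for p in patients if p["surgery_date"] < cutoff]
    external = [p for p in patients if p["surgery_date"] >= cutoff]

    # Shuffle a copy with a fixed seed so the split is reproducible.
    shuffled = development[:]
    random.Random(seed).shuffle(shuffled)
    n_train = round(train_frac * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:], external


# Synthetic toy records: 10 surgeries in 2010-2015 and 4 in 2016-2017.
toy = [{"id": i, "surgery_date": date(2010 + i % 6, 3, 1)} for i in range(10)]
toy += [{"id": 10 + i, "surgery_date": date(2016 + i % 2, 6, 1)} for i in range(4)]

train, test, external = split_cohort(toy)
print(len(train), len(test), len(external))
```

Splitting by calendar period before any random shuffling keeps the external set strictly later in time than the data used for model building.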

Code availability

This work is supported by the Extreme Smart Analysis platform (https://www.xsmartanalysis.com/). In addition, to ensure that versions of custom code, software, or algorithms described in the publication are maintained, we will release them as uploads to GitHub (https://github.com/SongyunZhao123/zsy.git).

Results

Basic clinical information of the patients

The study comprised a total of 1158 patients, of whom 172 (14.85%) experienced postoperative heart failure (HF) (Fig. 1; Table 1). The original dataset is provided in the Supplementary Table.

Fig. 1 Flow diagram of patients included in the study.

Table 1 Preoperative and intraoperative information.

Analysis of risk factors for postoperative HF

Results from univariate and multivariate analyses identified independent influencing factors for postoperative HF, including age, body mass index (BMI), smoking history, hyperlipidemia, hypertension, diabetes, preoperative tachycardia, chronic obstructive pulmonary disease (COPD), N-stage, blood transfusion, intraoperative tachycardia, C-reactive protein (CRP) level, and neutrophil-to-lymphocyte ratio (NLR) (P < 0.05) (Table 2). Machine learning models (XGBoost, RF, SVM, and KNN) highlighted risk factors, such as advanced age, preoperative and intraoperative tachycardia, BMI ≥ 25 kg/m², smoking history, COPD history, hypertension history, NLR ≥ 3, and CRP level ≥ 10 mg/l (Fig. 2A-D).
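To make the univariate screening of a categorical variable concrete, the sketch below computes a Pearson chi-square test on a 2×2 table. It is an illustration under stated assumptions: the counts are invented and are not taken from Table 2, the function name is hypothetical, and no continuity correction is applied. Statistical packages may apply one by default, so their output can differ.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test for the table [[a, b], [c, d]].

    Rows: exposure present / absent. Columns: HF / no HF.
    Returns (statistic, p_value) for 1 degree of freedom,
    without continuity correction.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    stat = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # With 1 degree of freedom, P(chi2 > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Invented counts: 60 of 200 exposed patients and 40 of 400 unexposed
# patients develop HF.
stat, p = chi_square_2x2(60, 140, 40, 360)
print(f"chi-square = {stat:.1f}, p = {p:.2e}")
print("significant at 0.05" if p < 0.05 else "not significant at 0.05")
```

Variables passing this kind of screen would then enter the multivariate logistic regression described in the methods.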

Table 2 Univariate and multivariate analyses of variables related to HF.

Fig. 2 The variable influence factor ranking plots of the four models. (A) Variable importance ranking diagram of the XGBoost model. (B) Variable importance ranking diagram of the RF model. (C) Variable importance ranking diagram of the SVM model. (D) Variable importance ranking diagram of the KNN model.

Model building and evaluation

The ROC curves show that XGBoost achieved an AUC of 0.990 in the training set and 0.941 in the validation set, the best performance among the four models (Table 3). The calibration curves of the four models lie close to the ideal curve, indicating high consistency between predicted and actual results. The DCA curves show that all four models achieved a net clinical benefit relative to either the treat-all or the treat-none strategy (Fig. 3A-D). The k-fold cross-validation method was used to compare the generalization ability of the four models: 254 cases (30.09%) of the internal validation set were taken as the test set, and the remaining samples were used as the training set for 10-fold cross-validation. XGBoost achieved AUC = 0.9252 ± 0.0464 in the validation set, with AUC = 0.9272 and accuracy = 0.8819 in the test set (Fig. 4A-C); RF achieved AUC = 0.9071 ± 0.0533 in the validation set, with AUC = 0.9227 and accuracy = 0.8780 in the test set; SVM achieved AUC = 0.9188 ± 0.0601 in the validation set, with AUC = 0.9218 and accuracy = 0.8543 in the test set; and KNN achieved AUC = 0.8587 ± 0.1024 in the validation set, with AUC = 0.9032 and accuracy = 0.9173 in the test set. After a comprehensive comparison, the XGBoost algorithm was chosen to construct the model in this study.
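The sketch below shows one way a figure of the form "AUC = mean ± spread" can arise from k-fold cross-validation. It assumes the reported spread is the standard deviation across folds, which the text does not state. The labels and scores are invented, and in a real run the model would be refitted on the training part of each fold, whereas here the scores are fixed to keep the example self-contained.

```python
from statistics import mean, stdev


def roc_auc(labels, scores):
    """AUC as the probability that a positive outscores a negative (ties = 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes are required to compute an AUC")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def kfold_indices(n, k):
    """Yield (train_idx, val_idx) pairs for k contiguous folds over range(n)."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        val = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, val
        start += size


# Invented data: 12 patients, 3 folds of 4 (the study used 10 folds).
labels = [1, 0, 1, 0, 1, 0, 0, 1, 0, 1, 1, 0]
scores = [0.9, 0.2, 0.7, 0.4, 0.6, 0.7, 0.1, 0.8, 0.3, 0.9, 0.5, 0.6]

fold_aucs = []
for train_idx, val_idx in kfold_indices(len(labels), 3):
    # A real pipeline would fit the model on train_idx here.
    fold_aucs.append(roc_auc([labels[i] for i in val_idx],
                             [scores[i] for i in val_idx]))

print(fold_aucs)
print(f"AUC = {mean(fold_aucs):.4f} ± {stdev(fold_aucs):.4f}")
```

A wide spread across folds, as reported for KNN, suggests that a model's discrimination depends strongly on which patients land in each fold.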

Table 3 Evaluation of the performance of the four models.

Fig. 3 Evaluation of the four models for predicting heart failure. (A) ROC curves for the training set of the four models. (B) ROC curves for the validation set of the four models. (C) Calibration plots of the four models. The 45° dotted line on each graph represents the perfect match between the observed (y-axis) and predicted (x-axis) complication probabilities. The closer a model’s curve lies to this line, the better its calibration. (D) DCA curves of the four models. Patients can benefit over the range of threshold probabilities that starts at the intersection of the red curve with the All curve and ends at its intersection with the None curve.
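The quantity plotted on the y-axis of a decision curve is the net benefit at a given threshold probability. The following sketch uses the standard DCA definition, net benefit = TP/n − (FP/n) × pt/(1 − pt). The labels and predicted probabilities are invented, and the function names are illustrative.

```python
def net_benefit(labels, probs, threshold):
    """Net benefit of intervening on patients with predicted risk >= threshold."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(labels)
    tp = sum(1 for y, p in zip(labels, probs) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(labels, probs) if p >= threshold and y == 0)
    # False positives are weighted by the odds of the threshold probability.
    return tp / n - fp / n * threshold / (1.0 - threshold)


def net_benefit_treat_all(labels, threshold):
    """Net benefit of the 'All' reference strategy (every patient treated)."""
    prevalence = sum(labels) / len(labels)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


# Invented example: 10 patients, 2 of whom develop HF.
labels = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
probs = [0.9, 0.6, 0.7, 0.2, 0.1, 0.1, 0.1, 0.05, 0.05, 0.05]

nb_model = net_benefit(labels, probs, 0.5)
nb_all = net_benefit_treat_all(labels, 0.5)
nb_none = 0.0  # the 'None' strategy treats nobody, so its net benefit is zero

print(f"model: {nb_model:.2f}, treat all: {nb_all:.2f}, treat none: {nb_none:.2f}")
```

A model is clinically useful at a threshold when its net benefit exceeds both reference strategies, as it does in this toy example.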

Fig. 4 Internal validation of the XGBoost model. (A) ROC curve of the XGBoost model for the training set. (B) ROC curve of the XGBoost model for the validation set. (C) ROC curve of the XGBoost model for the test set. (D) External validation of the XGBoost model.

Model external validation

The ROC curve for the external validation set yielded an AUC of 0.93, indicating that the model retained good discrimination on external data (Fig. 4D).

Model explanation

The SHAP summary plot ranked the risk factors for HF after CME as follows: advanced age, history of COPD, history of smoking, intraoperative tachycardia, history of hypertension, CRP level ≥ 10 mg/l, NLR ≥ 3, preoperative tachycardia, and BMI ≥ 25 kg/m² (Fig. 5). The SHAP force plots show the model’s predictive analysis for four HF patients (Fig. 6A-D). For patient I, the model predicted an HF probability of 0.004, and the features that increased this probability were NLR ≥ 3, BMI ≥ 25 kg/m², and preoperative tachycardia. For patient II, the predicted probability was 0.012, and the features that increased it were NLR ≥ 3, BMI ≥ 25 kg/m², history of smoking, and intraoperative tachycardia. For patient III, the predicted probability was 0.958, and the features that increased it were NLR ≥ 3, CRP level ≥ 10 mg/l, history of COPD, history of hypertension, preoperative tachycardia, and intraoperative tachycardia. For patient IV, the predicted probability was 0.497, and the features that increased it were age ≥ 65, CRP level ≥ 10 mg/l, history of hypertension, preoperative tachycardia, and intraoperative tachycardia.
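To illustrate what a Shapley value attributes to each feature, the sketch below computes exact Shapley values for a toy risk function by averaging marginal contributions over all feature orderings, then ranks features by mean absolute value as a summary plot does. Everything here is an assumption made for illustration: the toy model and its coefficients are invented, a single all-zero baseline replaces a background dataset, and brute-force enumeration stands in for the optimized algorithms used in practice for tree models.

```python
from itertools import permutations


def toy_risk(x):
    """Invented risk score with one interaction term (not the study's model)."""
    return (0.05 + 0.30 * x["age_ge_65"] + 0.20 * x["copd"]
            + 0.10 * x["age_ge_65"] * x["copd"] + 0.15 * x["tachycardia"])


def shapley_values(model, instance, baseline):
    """Exact Shapley values: average marginal contribution over all orderings."""
    names = list(instance)
    totals = {name: 0.0 for name in names}
    orderings = list(permutations(names))
    for order in orderings:
        current = dict(baseline)
        previous = model(current)
        for name in order:
            current[name] = instance[name]  # switch this feature to its real value
            updated = model(current)
            totals[name] += updated - previous
            previous = updated
    return {name: total / len(orderings) for name, total in totals.items()}


baseline = {"age_ge_65": 0, "copd": 0, "tachycardia": 0}
patient_a = {"age_ge_65": 1, "copd": 1, "tachycardia": 0}
patient_b = {"age_ge_65": 0, "copd": 0, "tachycardia": 1}

phi_a = shapley_values(toy_risk, patient_a, baseline)
phi_b = shapley_values(toy_risk, patient_b, baseline)
print(phi_a)

# Global importance: mean absolute Shapley value across patients.
importance = {name: (abs(phi_a[name]) + abs(phi_b[name])) / 2 for name in baseline}
ranking = sorted(importance, key=importance.get, reverse=True)
print(ranking)
```

The values for one patient always sum to the difference between that patient's prediction and the baseline prediction, which is what a force plot visualizes.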

Fig. 5 SHAP summary plot. Risk factors are arranged along the y-axis based on their importance, which is given by the mean of their absolute Shapley values. The higher the risk factor is positioned in the plot, the more important it is for the model.

Fig. 6 SHAP force plot. The contributing variables are arranged in the horizontal line, sorted by the absolute value of their impact. Blue represents features that have a negative effect on disease prediction, with a decrease in SHAP values; red represents features that have a positive effect on disease prediction, with an increase in SHAP values. (A) Predictive analysis of patient I. (B) Predictive analysis of patient II. (C) Predictive analysis of patient III. (D) Predictive analysis of patient IV.

Discussion

Although HF is not the sole complication after procedures like CME, its potential occurrence remains a crucial concern. All major surgeries, including CME, elevate patients’ risk of various complications, HF included. Identifying specific risk factors for HF in this context can facilitate enhanced patient monitoring and management during the critical postoperative phase12. Although clinicians have various methods to identify major complications such as postoperative HF after CME, several factors justify the use of machine learning methods to study HF. Machine learning algorithms can analyze extensive and complex datasets, revealing subtle patterns and risk factors that might elude traditional statistical methods. This can result in more accurate predictions and personalized risk assessments. By precisely predicting the risk of HF, healthcare providers can implement targeted interventions and preventive measures, thereby enhancing patient outcomes, reducing mortality rates, and improving overall quality of care.

This study assesses the predictive models for heart failure (HF) risk constructed by four distinct machine learning algorithms. Notably, the XGBoost algorithm demonstrates exceptional accuracy and stands out for its efficiency, flexibility, and universal adaptability13,14. The AUC values of XGBoost in the training and validation sets, 0.990 and 0.941 respectively, underscore its exceptional discriminative power, significantly surpassing other models such as RF, SVM, and KNN. In 10-fold cross-validation, XGBoost achieved an AUC of 0.9252 ± 0.0464, reflecting its strong generalization capability. Although KNN exhibited slightly higher accuracy in the test set (0.9173), XGBoost demonstrated more stable and consistent performance, with superior AUC values and overall evaluation metrics. In contrast, other algorithms exhibited certain limitations. While Random Forest (RF) is proficient at managing high-dimensional data and mitigating overfitting, its AUC (0.9071 ± 0.0533) and accuracy (0.8780) were marginally lower than XGBoost’s. Moreover, RF is more sensitive to hyperparameter tuning, necessitating intricate adjustments to achieve optimal performance15. Support Vector Machine (SVM) achieved an AUC of 0.9188 ± 0.0601, indicating strong predictive capability. However, its computational complexity is notably high, particularly when handling large-scale datasets, leading to significantly increased training times. Additionally, SVM is more sensitive to noisy data, and its performance diminishes compared to XGBoost when addressing data imbalances. Meanwhile, although KNN demonstrated high accuracy (0.9173) in this study, its AUC of 0.8587 ± 0.1024 reveals a relatively weak discriminative ability. KNN is prone to the “curse of dimensionality” when handling high-dimensional data, with substantial computational overhead, especially in the presence of numerous feature variables. 
This renders its efficiency and effectiveness inferior to that of XGBoost, particularly when managing large-scale datasets16. Moreover, XGBoost’s calibration curve aligns closely with the ideal curve, indicating a high degree of agreement between predicted probabilities and actual outcomes, thereby enhancing the model’s reliability in clinical applications. XGBoost is also highly effective at managing high-dimensional data, utilizing its built-in feature importance ranking function to prioritize variables that most significantly influence the prediction of postoperative heart failure. This capability proves particularly advantageous in complex datasets with up to 37 feature variables. Additionally, XGBoost incorporates L1 and L2 regularization, which mitigates overfitting, allowing the model to maintain robust performance across both the training and validation phases.
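For reference, the regularization mentioned above enters the objective that gradient-boosted trees minimize. The expression below follows the form given in the XGBoost documentation, with the L1 term written out explicitly. The symbols are not from this study: ℓ is the training loss, f_k is the k-th tree, T is its number of leaves, and w_j are its leaf weights. The parameters γ, λ, and α correspond to the library's `gamma`, `lambda`, and `alpha` settings, whose values are not reported here.

```latex
\mathcal{L} \;=\; \sum_{i=1}^{n} \ell\!\left(y_i, \hat{y}_i\right) \;+\; \sum_{k=1}^{K} \Omega(f_k),
\qquad
\Omega(f) \;=\; \gamma T \;+\; \tfrac{1}{2}\,\lambda \sum_{j=1}^{T} w_j^{2} \;+\; \alpha \sum_{j=1}^{T} \lvert w_j \rvert
```

Larger values of λ and α shrink the leaf weights, and larger γ discourages additional leaves. This is the mechanism by which the penalty limits overfitting.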

Several studies17,18 have validated the efficacy of machine learning algorithms in clinical diagnosis and prognosis. Furthermore, machine learning techniques can predict adverse outcomes in disease progression more accurately than traditional diagnostic methods. In this study, machine learning algorithms played a pivotal role in developing the predictive model. The resulting model can help patients avoid unnecessary tests, alleviate the financial burden on their families, and reduce the side effects associated with diagnostic instruments. Most importantly, the model helps clinical decision-makers accurately identify high-risk patients, enabling timely intervention and improving patient prognosis. The SHAP analysis identified advanced age, preoperative and intraoperative tachycardia, BMI ≥ 25 kg/m², a history of smoking, a history of chronic obstructive pulmonary disease (COPD), a history of hypertension, neutrophil-to-lymphocyte ratio (NLR) ≥ 3, and C-reactive protein (CRP) level ≥ 10 mg/l as significant risk factors for postoperative HF following CME for colon cancer. For these patients, postoperative care can be enhanced by closely monitoring cardiac ultrasound, electrocardiogram, and serum levels of brain natriuretic peptide (BNP) or N-terminal pro-brain natriuretic peptide (NT-proBNP) to promptly detect abnormal cardiac function. Additionally, cardiac medications such as beta-blockers, ACE inhibitors, and diuretics can be initiated or adjusted for these patients to improve their short-term prognosis and survival.

As age increases, patients face a heightened likelihood of developing organic lesions in the heart. In older patients, the heart muscle undergoes various degrees of hypertrophy, fibrosis, and other pathological changes, significantly impacting the heart’s pumping function. Major procedures such as colorectal cancer surgery induce changes in the sympathetic and endocrine systems in older patients, increasing the volume and pressure load on the heart and elevating the risk of cardiovascular diseases, including HF. This susceptibility is also observed in patients with preoperative and intraoperative tachycardia. The study results emphasize that both these patient groups are at a higher risk of postoperative HF. The authors posit that surgery, as a specific type of trauma, exacerbates existing complications in patients. Those with tachycardia face a potential risk of HF, and patients with atypical clinical symptoms may develop acute HF in response to surgical triggers. Moreover, the procedure may lead to increased pressure in the patient’s left atrium, significantly heightening the risk of postoperative HF. Philippe et al. have similarly demonstrated a strong association between patients’ heart rate and postoperative HF, indicating a 10% increase in the risk of postoperative HF for every 10 beats/min increase in heart rate19,20,21. This further underscores the utility of a rapid heart rate as a novel risk factor for predicting postoperative HF.

The current investigation utilizes four distinct samples to elucidate the predictive capacity of the model concerning postoperative heart failure (HF) in patients. For instance, in the disease prediction analysis of samples I and II, elevated Body Mass Index (BMI) emerged as a notable risk factor. Such patients are predisposed to comorbid chronic conditions like hypertension and diabetes due to obesity and an unhealthy lifestyle22,23,24, significantly heightening the risk of HF25,26. Furthermore, in obese patients undergoing colorectal cancer surgery, whether via open or laparoscopic-assisted procedures, achieving local exposure of the surgical field becomes more challenging. Additionally, individuals with high BMI tend to possess more perivascular adipose tissue, increasing the likelihood of injury. Collectively, these factors contribute to heightened procedural complexity, necessitate advanced surgical techniques, and elevate the risk of intraoperative bleeding, thereby increasing the hemodynamic burden on the patient. Zhou’s work also substantiates that high BMI independently serves as a risk factor for HF, demonstrated through an obesity-induced animal model27. In predictive analyses of samples II and III, a history of smoking and chronic obstructive pulmonary disease (COPD) played pivotal roles in postoperative HF development. Long-term smoking and chronic inflammation elevate the cardiopulmonary burden, potentially leading to pulmonary stasis, compensatory myocardial hypertrophy, and compromised cardiac function. Importantly, these patients may exhibit altered airway structures due to chronic inflammation, reduced protective defenses, an increased likelihood of pulmonary infection, and heightened susceptibility to postoperative HF28. 
This underscores the importance of clinicians providing active health education, encouraging smoking cessation two to three weeks before surgery, promoting appropriate exercise to enhance cardiopulmonary function, and closely monitoring pulmonary signs and laboratory parameters. Prophylactic use of antimicrobial drugs may also mitigate the risk of pulmonary infections.

A 20-year follow-up investigation highlighted hypertension as an independent risk factor for heart failure (HF)29. Correspondingly, a recent study reported a twofold higher risk of HF in hypertensive patients compared to those with normal blood pressure30. Our study aligns with these findings, showing a heightened risk of postoperative HF in hypertensive individuals. Hypertensive patients exhibit increased resistance to cardiac ejection and increased oxygen consumption, which contribute to pathological changes such as cardiomyocyte hypertrophy and interstitial fibrosis and thereby elevate HF risk. Additionally, elevated blood pressure activates the renin-angiotensin-aldosterone system (RAAS) and the sympathetic nervous system, causing excessive contraction of the left ventricular wall during systole and resulting in myocardial hypertrophy and ventricular wall thickening. Giovanni’s experimental study, involving 2078 participants, further corroborates the strong association between left ventricular wall hypertrophy and the development of heart failure with preserved ejection fraction (HFpEF) and heart failure with reduced ejection fraction (HFrEF)31. Consequently, we consider a history of hypertension a crucial high-risk factor for predicting postoperative HF. Clinicians are urged to prioritize hypertensive patients preoperatively, emphasizing active health education and lifestyle modifications, such as smoking cessation, reduced salt and sugar intake, and appropriate exercise32. Clinical decision-makers can also consider thiazide-like diuretics, angiotensin-converting enzyme inhibitors, and angiotensin II receptor blockers for high-risk patients to lower blood pressure and limit adverse myocardial remodeling33.

Recently, some researchers have sought to detect heart failure using imaging methods that could replace echocardiography, positing that these alternatives have the potential to reduce global healthcare expenditures. However, this approach has been associated with an elevated misdiagnosis rate and an increased risk of patient mortality. Despite the successful construction of several heart failure (HF)-related prediction models in recent years, these studies have tended to focus primarily on the general clinical characteristics of patients34,35. The development of postoperative HF is intricately influenced by neurological and endocrine systems, where alterations in these systems can lead to the overexpression of proinflammatory cytokines and signaling molecules. In light of this, the present study incorporated relevant inflammatory indicators for discussion, revealing a higher incidence of HF in patients with elevated postoperative Neutrophil-to-Lymphocyte Ratio (NLR) and C-reactive protein (CRP). A meta-analysis of 14 studies further corroborated a strong association between patients’ inflammatory markers and HF development36. Similarly, Christian employed CRP to predict patients’ cardiovascular function, offering precise individualized treatment and thereby supporting the findings of the current study37,38,39.

The present study undertook a rigorous and comprehensive evaluation of the model, assessing its discrimination, calibration, and clinical utility. This analysis holds the potential to establish a paradigm for forecasting postoperative complications in complex abdominal surgeries exhibiting similar characteristics. However, several notable limitations warrant consideration. Although the study incorporated a wide array of variables, the primary emphasis was placed on demographics, preoperative evaluations, surgical type, and intraoperative data. Key postoperative factors related to heart failure, such as postoperative care, long-term medication adherence, and the patient’s mental health, were regrettably not incorporated, thus potentially restricting the model’s capacity to predict heart failure in a more holistic manner. Additionally, certain variables may be susceptible to measurement inaccuracies or inconsistencies (e.g., subjective interpretation of preoperative test results), introducing the possibility of bias into the analysis. Notably, the potential cardiotoxicity of certain chemotherapy agents, particularly anthracyclines, which are frequently employed in colon cancer treatment, was not addressed. Future investigations should prioritize exploring this concern in greater depth. While the study predominantly focused on patients with HFrEF and HFpEF, ensuring a broad representation of heart failure subtypes within the postoperative cohort, we acknowledge that different HF types may carry distinct prognostic implications, which constitutes a limitation of the current analysis. Future research would benefit from a more nuanced examination of specific HF subtypes. Furthermore, while the machine learning algorithm demonstrated enhanced accuracy, its model was more intricate and less interpretable. The entire computational and decision-making process of the model operates as a black box, lacking the intuitive clarity of logistic regression models40,41,42. 
The data utilized in this study were derived from a single-center cohort of colon cancer patients, which may not fully reflect patient populations in other regions or healthcare institutions. Notably, variations in the incidence of postoperative heart failure across different populations may result in the model performing less optimally in external cohorts than in the current study. Disparities in surgical techniques, standards of care, and the pathological characteristics of patients across hospitals may influence both the incidence and the risk factors for heart failure, thereby limiting the generalizability of the model. Multicenter prospective studies should be incorporated into future research to further enhance the reliability and robustness of the results. To enable clinical application of the model, various patient datasets can be input for external validation, providing an estimation of the probability of postoperative complications and assessing the model’s accuracy. However, the success of the model’s implementation relies on the precision and consistency of data input. Discrepancies in data formats and quality across hospitals, or even within different departments of the same institution at varying times, pose a significant challenge to its widespread adoption. To address this, hospital information systems must standardize data collection protocols, ensuring seamless integration of preoperative, intraoperative, and postoperative data to facilitate real-time automated predictions. Moving forward, we aim to incorporate the model into a clinical decision support system (CDSS) that will assist physicians in making critical preoperative and intraoperative decisions. This CDSS will feature a streamlined user interface and integrate other essential information, such as surgical risk scores and postoperative care plans, offering comprehensive guidance to clinicians.

Conclusion

This study employed the XGBoost machine learning algorithm to construct a predictive model for assessing the risk of heart failure (HF) following complete mesocolic excision (CME) for colon cancer. The model exhibited commendable accuracy and clinical utility, offering valuable support for timely diagnosis by surgeons. The findings underscore that HF remains a significant concern for patients after CME, with strong associations identified with advanced age, preoperative and intraoperative tachycardia, BMI ≥ 25 kg/m², a history of smoking, a history of chronic obstructive pulmonary disease (COPD), a history of hypertension, neutrophil-to-lymphocyte ratio (NLR) ≥ 3, and C-reactive protein (CRP) level ≥ 10 mg/l.