Introduction

Colorectal cancer (CRC), which originates from abnormal cell proliferation in the mucosa of the colon or rectum, is one of the most common malignant tumors of the digestive system. The incidence and mortality rates of CRC are continuously increasing, making it the third most common cancer globally and posing significant challenges to public health and healthcare systems1. According to the World Health Organization (WHO), there were approximately 1.93 million new cases of CRC and 935,000 CRC-related deaths worldwide in 2020, accounting for about 10% of all new cancer cases and 9.4% of all cancer deaths, respectively2. From 2000 to 2016, the incidence and mortality rates of CRC in China consistently increased annually, ranking second and fourth, respectively, among malignant tumors3. This trend exacerbates the burden on public health.

The Chinese Protocol of Diagnosis and Treatment of CRC (2023 edition) indicates that the current treatment for CRC primarily involves surgical resection, supplemented by preoperative neoadjuvant therapy, postoperative chemoradiotherapy, targeted therapy, and immunotherapy as part of a comprehensive treatment approach4. Standardized treatment is a crucial factor affecting the prognosis of CRC patients. Actively promoting standardized early diagnosis and treatment is essential for further improving the prognosis of CRC patients5. The specific treatment and follow-up plans need to be assessed by a clinician based on the patient’s individual situation and prognosis6,7. Therefore, accurately assessing patient prognosis risks is essential for formulating personalized treatment interventions. Ensuring adequate treatment while minimizing unnecessary iatrogenic harm and the economic burden of overintervention has become a key factor affecting the survival time and quality of life of CRC patients.

Currently, clinicians mainly assess patients’ prognostic risk based on staging by the American Joint Committee on Cancer (AJCC)8,9. However, due to tumor heterogeneity, even patients at the same stage may exhibit different survival outcomes10,11. To more accurately identify high-risk patients with adverse prognoses, various assessment tools have been developed, most of which are based on traditional logistic regression and Cox proportional hazards regression models12. Traditional statistical models struggle to handle nonlinear relationships and high-dimensional data; in contrast, machine learning (ML) models have fewer assumption constraints, enabling them to better capture complex patterns in the data, thus exhibiting superior predictive performance13. However, ML models also face challenges; when feature dimensions are too high, ML models suffer from issues such as high complexity, low generalization ability, and high computational costs14. Additionally, ML models often struggle to provide intuitive or easily understandable explanations, posing challenges for clinicians' decision-making15.

To address these issues, we collected clinically accessible data from hospitalized CRC patients and established a longitudinal cohort dataset through follow-up. For this follow-up dataset, we employed various models combined with the recursive feature elimination (RFE) method to construct and select the best-performing predictive model. Furthermore, we utilized the SHapley Additive exPlanations (SHAP) method to visualize the “black-box model” for decision-making. The RFE algorithm can eliminate redundant features, reduce model complexity, and improve model generalization ability. ML models can capture complex relationships within the data, leading to more accurate predictions. SHAP can explain the model’s prediction results, aiding in the understanding of the underlying principles of the model’s operation. By integrating RFE algorithms, ML models, and SHAP, a predictive model with high accuracy and strong interpretability can be constructed16,17,18. Therefore, this study employed the aforementioned methods for modelling, aiming to enhance the accuracy and interpretability of the CRC prognostic model, thus providing more reliable support for personalized treatment and clinical decision-making.

Materials and methods

Study design and data source

This study aimed to develop and validate a prognostic model for CRC based on a five-year follow-up dataset. CRC patients who underwent surgical treatment at Yixing Hospital were selected as the study subjects, and a total of 470 patients were included at baseline. The inclusion criteria were as follows: (1) patients who were hospitalized and underwent surgical treatment between January 2006 and December 2010; (2) patients who were diagnosed with primary CRC by histopathology. The exclusion criteria were as follows: (1) patients who had received chemotherapy, radiotherapy, or immunotherapy before surgery; (2) patients with malignant tumors of other organs or tissues or other serious systemic diseases (e.g., liver or kidney dysfunction, hematological diseases, infections, or inflammatory diseases); (3) patients who were lost to follow-up. The collected data included demographic information, clinical blood test indicators, histopathological results, and additional treatment-related information. All data were obtained from the medical records documented in the clinical electronic medical record system during the patients' hospitalization. The blood test indicators refer to the laboratory results of the last blood sample collected before surgery. Follow-up for survival time and status was completed before December 30, 2015. The endpoint was overall survival (OS), calculated from the date of diagnosis to the date of death or the last follow-up. For the analysis, patients were divided into two groups based on their five-year survival status: those who died within five years and those who survived for at least five years.

Data cleaning and splitting

Based on clinical availability and modeling requirements, a sample was deleted if more than 30% of its feature variables were missing, and a feature variable was deleted if it was missing in more than 30% of the samples. In data science and ML practice, a 30% threshold for exclusion is commonly used to retain a sufficient number of samples and features while avoiding the degradation of data quality caused by excessive missingness, thereby ensuring data reliability and model stability19,20. Eventually, a clinical follow-up dataset containing 411 patients and 73 feature variables was established. The specific variables collected are shown in Table S1. The proportion of missing data in the follow-up dataset is shown in Fig. S1 and Table S2. For feature variables with a missing value rate of less than 30%, Little's MCAR (Missing Completely At Random) test was used to assess the randomness of the missing data. The p-value was 0.922, greater than 0.05, indicating that the missingness was consistent with MCAR. For this missing pattern, multiple imputation was used to fill in the gaps21. Multiple imputation generated several complete datasets, each representing a different estimate of the missing values22. The Akaike Information Criterion (AIC) and Bayesian Information Criterion (BIC) were used as widely accepted model selection criteria that balance goodness of fit and model complexity23,24,25. Among the datasets generated by multiple imputation, the one with the smallest AIC and BIC values was selected as the modeling dataset for subsequent model construction. Meanwhile, additional datasets were generated using Random Forest imputation and K-Nearest Neighbors (KNN) imputation for subsequent sensitivity analysis. The modeling dataset was randomly split into training and validation sets at a ratio of 7:3. Feature selection and model training were conducted on the training set, while model evaluation was performed on the validation set to verify the performance of the models developed during the training phase.
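As an illustration of this preprocessing, the sketch below applies the 30% missingness thresholds and the 7:3 random split in Python with pandas and scikit-learn. The file name and the outcome column death_within_5y are placeholders, and the imputation itself (performed in R with the mice, missForest, and VIM packages) is not reproduced here.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Placeholder file and column names; the real dataset contains the variables in Table S1.
df = pd.read_csv("crc_followup.csv")

# Drop samples with more than 30% of feature values missing,
# then feature variables missing in more than 30% of the remaining samples.
df = df.loc[df.isna().mean(axis=1) <= 0.30]
df = df.loc[:, df.isna().mean(axis=0) <= 0.30]

# Multiple imputation was carried out in R (mice); the imputed modeling dataset
# is then randomly split into training and validation sets at a 7:3 ratio.
train_df, valid_df = train_test_split(df, test_size=0.3, random_state=42)
```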

Feature selection and model building

Based on the characteristics of the dataset, five models were selected to construct prognostic models: the Cox proportional hazards model, decision tree (DT), random forest (RF), support vector machine (SVM), and extreme gradient boosting (XGBoost). The Cox proportional hazards model served as the baseline for model comparison. Feature selection and model training proceeded as follows: (1) Each model was fitted using all feature variables, and its hyperparameters were optimized through a grid search. During the grid search, ten-fold cross-validation was employed, with accuracy as the evaluation metric for selecting the optimal hyperparameter combination. (2) After hyperparameter tuning, feature selection was performed separately for each model, so the final models included different feature variables. For the Cox proportional hazards model, the Least Absolute Shrinkage and Selection Operator (Lasso) was used for variable selection. The Lasso method introduces an L1-norm penalty that encourages sparsity in the model coefficients, thereby achieving both variable selection and regularization. For the other four ML models, RFE was used for feature selection. RFE is a wrapper-based feature selection method that iteratively identifies and removes the least important features according to the feature importance ranking provided by the model. Using the RFE algorithm with each model, and with ten-fold cross-validation accuracy as the evaluation metric, a different optimal feature subset was selected for each of the four ML models. (3) Each model was then refitted and trained on its optimal feature subset, with parameters adjusted as needed, to complete the training process. The specific process of model construction is shown in Fig. 1.
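For concreteness, the following scikit-learn/xgboost sketch walks through steps (1)–(3) for the XGBoost arm under simplifying assumptions: it reuses the train_df frame and the placeholder outcome column death_within_5y from the previous sketch, and the grid shown is only illustrative (the actual search ranges are listed in Table S4).

```python
from sklearn.feature_selection import RFECV
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from xgboost import XGBClassifier

X_train = train_df.drop(columns="death_within_5y")
y_train = train_df["death_within_5y"]
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)

# (1) Tune hyperparameters on the full feature set with a grid search
#     scored by ten-fold cross-validated accuracy (illustrative, reduced grid).
grid = {"n_estimators": [100, 300], "max_depth": [3, 5], "learning_rate": [0.05, 0.1]}
search = GridSearchCV(XGBClassifier(eval_metric="logloss"), grid,
                      scoring="accuracy", cv=cv)
search.fit(X_train, y_train)

# (2) Recursive feature elimination, again scored by ten-fold CV accuracy.
selector = RFECV(XGBClassifier(**search.best_params_, eval_metric="logloss"),
                 step=1, cv=cv, scoring="accuracy")
selector.fit(X_train, y_train)
selected = X_train.columns[selector.support_]

# (3) Refit the final model on the selected feature subset.
final_model = XGBClassifier(**search.best_params_, eval_metric="logloss")
final_model.fit(X_train[selected], y_train)
```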

Fig. 1
figure 1

The technical flowchart of this study, including data collection and cleaning, feature selection, model training, model evaluation, and model interpretation.

Model evaluation and interpretation

The models were comprehensively evaluated using multiple metrics to select the optimal classifier. The evaluation metrics included sensitivity, specificity, positive predictive value, negative predictive value, F1 score, Brier score, and accuracy. For each metric, 1000 bootstrap samples were used to calculate its 95% confidence interval. Model discrimination was assessed using receiver operating characteristic (ROC) curves, calibration was evaluated using calibration curves, and clinical utility was determined using decision curve analysis (DCA) curves. The selected optimal model was interpreted using the SHAP method. Based on Shapley value theory, the SHAP method improves model interpretability by assigning an importance value to each feature, revealing each feature's contribution to the model's output and prediction results26. This approach clearly demonstrates how each feature influences the model's predictions, thereby increasing the model's credibility. In this study, the decision process of the model was visualized by drawing variable importance plots, SHAP summary plots, and SHAP decision plots. Additionally, to evaluate the robustness of model performance to the handling of missing values, sensitivity analyses were conducted using the complete-case dataset (records with no missing values), a Random Forest-imputed dataset, and a KNN-imputed dataset. Furthermore, to assess the potential implications of missing treatment-specific information, model performance was stratified by AJCC stage, allowing a comprehensive evaluation of the model's predictive capability across different levels of disease progression. This stratified analysis provided insight into the model's reliability and generalizability across patient subgroups while accounting for potential gaps in treatment documentation.
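As a sketch of how these metrics and their bootstrap confidence intervals could be computed on the validation set, the snippet below continues from the previous sketches (final_model, selected, and valid_df are assumed from above, and the outcome column name remains a placeholder).

```python
import numpy as np
from sklearn.metrics import (accuracy_score, brier_score_loss, f1_score,
                             roc_auc_score)

X_valid = valid_df[list(selected)]
y_valid = valid_df["death_within_5y"].to_numpy()
prob = final_model.predict_proba(X_valid)[:, 1]   # predicted probability of death
pred = (prob >= 0.5).astype(int)

def bootstrap_ci(metric, y, yhat, n_boot=1000, seed=42):
    """Percentile bootstrap 95% CI for a metric computed on paired arrays.
    (In practice, resamples containing a single class should be skipped for AUC.)"""
    rng = np.random.default_rng(seed)
    stats = []
    for _ in range(n_boot):
        idx = rng.integers(0, len(y), len(y))
        stats.append(metric(y[idx], yhat[idx]))
    return np.percentile(stats, [2.5, 97.5])

print("accuracy:", accuracy_score(y_valid, pred), bootstrap_ci(accuracy_score, y_valid, pred))
print("F1 score:", f1_score(y_valid, pred), bootstrap_ci(f1_score, y_valid, pred))
print("AUC:", roc_auc_score(y_valid, prob), bootstrap_ci(roc_auc_score, y_valid, prob))
print("Brier:", brier_score_loss(y_valid, prob), bootstrap_ci(brier_score_loss, y_valid, prob))
```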

Statistical analysis

Categorical variables are described as percentages, and between-group comparisons were made using the chi-square test. Continuous variables are described using the median and interquartile range. For continuous variables following a normal distribution, differences were analyzed using the t-test; for skewed continuous variables, the Mann–Whitney U test was used. The significance level was set at a P value of less than 0.05. All analyses in this study were conducted using Python V3.11.4 and R V4.2.2. Statistical testing was performed using the “stats” package (version 4.2.2) and “coin” package (version 1.4.3) in R, and missing values were imputed using the “mice” package (version 3.16.0), “missForest” package (version 1.5), and “VIM” package (version 6.2.2). Several packages in Python were used for data modeling, including “pandas” (version 1.5.3), “numpy” (version 1.24.3), “scikit-learn” (version 1.2.2), “xgboost” (version 1.7.3), and “lifelines” (version 0.27.7). Additionally, the “matplotlib” (version 3.7.1) and “shap” (version 0.42.1) packages were used for model interpretation and data visualization.
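The tests themselves were run in R (stats and coin); for readers replicating the group comparisons in Python, a rough scipy-based equivalent might look like the following, where df and the column names are placeholders rather than part of the original workflow.

```python
import pandas as pd
from scipy import stats

# Placeholder frame and column names standing in for the modeling dataset.
dead = df[df["death_within_5y"] == 1]
alive = df[df["death_within_5y"] == 0]

# Chi-square test for a categorical variable such as sex.
chi2_res = stats.chi2_contingency(pd.crosstab(df["sex"], df["death_within_5y"]))
p_sex = chi2_res[1]

# t-test for a normally distributed continuous variable; Mann-Whitney U otherwise.
t_stat, p_age = stats.ttest_ind(dead["age"], alive["age"])
u_stat, p_cea = stats.mannwhitneyu(dead["CEA"], alive["CEA"])
```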

Ethics approval

This study was approved by the Ethics Committee of both Yixing Hospital and the Affiliated Hospital of Xuzhou Medical University in accordance with the Declaration of Helsinki (Grant No: IRB-2017-TECHNOLOGY-068). Informed consent was obtained from all the participants.

Results

Patient characteristics

This study ultimately included 411 CRC patients who underwent surgical treatment at our hospital and completed the 5-year follow-up. The differences in age, sex, and pathological stage between the included and excluded patients were not statistically significant, indicating that the included population was not biased towards low-risk or high-risk cases (Supplementary Table S3). Among the included patients, 166 were female (40.389%) and 245 were male (59.611%). The median age was 64 years. Colon cancer accounted for 46.473% of the cases and rectal cancer for 53.527%. Patients in stages I, II, III, and IV accounted for 21.410%, 38.200%, 36.740%, and 3.650%, respectively. The proportions of patients with a history of blood transfusion, poorly differentiated tumors, and advanced TNM and AJCC stages; the median age, admission weight, and perioperative transfusion quantity; and the levels of carcinoembryonic antigen (CEA), carbohydrate antigen 125 (CA125), carbohydrate antigen 19–9 (CA19-9), neutrophil percentage, red cell distribution width, prothrombin time, and fibrinogen were all greater in the deceased group than in the surviving group. Conversely, the red blood cell count, hemoglobin concentration, hematocrit, lymphocyte count, thrombin time, serum prealbumin, serum sodium, and serum albumin were all lower in the deceased group than in the surviving group (Supplementary Table S1).

Feature selection and model training

For the Cox proportional hazards model, a Lasso regression model with survival time as the dependent variable was established for feature selection. The model achieved the minimum mean squared error (MSE) and optimal predictive performance on the training set when 18 feature variables were selected (Supplementary Fig. S2). For the DT, RF, SVM, and XGBoost models, feature selection was performed using the RFE algorithm based on each model. The RFE algorithm iteratively selected features for these four models by evaluating accuracy with ten-fold cross-validation, ultimately choosing 5, 15, 34, and 16 feature variables, respectively. The results of the RFE iterations for each model are shown in Supplementary Fig. S3. During the model training process, grid search was used to optimize the hyperparameters. The search ranges and optimal values of the hyperparameters are shown in Table S4.
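For the Lasso step, a minimal scikit-learn sketch (assuming the training frame from the earlier sketches, with a placeholder survival_months column as the dependent variable) would be:

```python
import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Placeholder column names; survival time is the dependent variable in this step.
X = train_df.drop(columns=["death_within_5y", "survival_months"])
y = train_df["survival_months"]

# Lasso with the penalty strength chosen by cross-validation to minimize MSE;
# features with nonzero coefficients at the optimal alpha are retained.
lasso = make_pipeline(StandardScaler(), LassoCV(cv=10, random_state=42)).fit(X, y)
coef = lasso.named_steps["lassocv"].coef_
lasso_features = X.columns[np.abs(coef) > 0]
```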

Model performance evaluation

The performance of the five models was primarily evaluated by calculating their accuracy, F1 score, sensitivity, specificity, positive predictive value, negative predictive value, and Brier score on the validation set (Table 1). The results showed that the XGBoost model performed best in terms of accuracy (0.83 [95% CI 0.76–0.90]), F1 score (0.81 [95% CI 0.72–0.88]), sensitivity (0.75 [95% CI 0.64–0.86]), and negative predictive value (0.80 [95% CI 0.69–0.88]), surpassing the other four models. The positive predictive value of the XGBoost model was slightly lower than that of the RF model, and its specificity was slightly lower than that of the RF and DT models. However, the area under the receiver operating characteristic curve (AUC) of the XGBoost model (0.89 [95% CI 0.82–0.94]) was the highest among the five models, indicating good discrimination ability (Fig. 2A). Its Brier score (0.13 [95% CI 0.09–0.17]) was the lowest, indicating the closest agreement between predicted and observed probabilities and a more stable performance than the other models (Fig. 2B). Additionally, the DCA curve demonstrated that the XGBoost model had a greater net clinical benefit, indicating its potential for clinical application (Supplementary Fig. S4). The performance of each model on the training set is shown in Table 2. Given its excellent and stable performance on both the training and validation sets, the XGBoost model was selected as the optimal model for prognosis prediction in this study. Additionally, sensitivity analysis of the best model indicated that the effects of missing data and of the different imputation methods on model performance were within an acceptable range, further demonstrating the model's robustness (Supplementary Table S5). The stratified performance evaluation by AJCC stage, with hyperparameters held fixed, demonstrated stable model performance in stages II and III. A slight decline was observed in stage I, which may be attributable to class imbalance in the absence of stage-specific hyperparameter optimization. Stage IV was excluded from the stratified evaluation because it contained only positive samples. Detailed stratified performance metrics are presented in Supplementary Table S6.

Table 1 Performance evaluation of the five prediction models on the validation set. Values in square brackets indicate the 95% Confidence Interval (CI).
Fig. 2
figure 2

Comparison of the discrimination and calibration of the models. (A) The ROC curves of the 5 models on the validation set. (B) The calibration curves of the 5 models on the validation set. AUC: area under the receiver operating characteristic curve.

Table 2 Performance evaluation of the five prediction models on the training set. Values in square brackets indicate the 95% Confidence Interval (CI).

Global interpretation of the optimal model

To enhance the interpretability of ML models, this study employed the SHAP method to explain the factors influencing the prognosis of CRC patients. The results revealed that the top six factors most strongly associated with CRC prognosis risk were perioperative transfusion quantity, tumor AJCC stage, CEA level, CA19-9 level, age, and CA125 level (Fig. 3A). To further explore the actual impact of these risk factors on CRC prognosis, we used SHAP summary plots to illustrate the contribution of each feature to the overall model output (Fig. 3B). Higher values of perioperative transfusion quantity, AJCC stage, CEA, CA19-9, age, and CA125 were associated with higher SHAP values, indicating that individuals with higher values of these features had a greater risk of death; thus, they can be interpreted as risk factors for CRC prognosis.
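The global SHAP plots in Fig. 3 can be reproduced in outline with the shap package applied to the fitted model; the sketch below continues from the earlier ones (final_model and X_valid are assumed from above).

```python
import shap

# TreeExplainer computes SHAP values efficiently for tree ensembles such as XGBoost.
explainer = shap.TreeExplainer(final_model)
shap_values = explainer.shap_values(X_valid)

# Mean |SHAP| importance ranking (as in Fig. 3A) and beeswarm summary plot (as in Fig. 3B).
shap.summary_plot(shap_values, X_valid, plot_type="bar")
shap.summary_plot(shap_values, X_valid)
```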

Fig. 3
figure 3

SHAP representation of the XGBoost model. (A) Importance ranking of the features predicted by the model. An evaluation based on the average absolute SHAP values revealed the importance of 16 feature variables in the XGBoost model. (B) The influence of each feature on the model output. In each feature row, each point represents a sample, where the x-axis represents its SHAP value for the given feature, and the color represents the magnitude of its feature value, with red indicating higher feature values and blue indicating lower feature values. PTQ: perioperative transfusion quantity; CEA: carcinoembryonic antigen; CA199: carbohydrate antigen 19–9; CA125: carbohydrate antigen 125; MCV: mean corpuscular volume; LDH: lactate dehydrogenase; P: serum phosphorus; Na: serum sodium; AFU: α-L-fucosidase; bas percentage: basophil granulocyte percentage; PDW: platelet distribution width; PT: prothrombin time; APTT: activated partial thromboplastin time.

Personalized decision process of the optimal model

Furthermore, we visualized the model decision process for two independent samples using SHAP decision plots (Fig. 4). From Fig. 4A, it can be observed that an AJCC stage of 1 and a perioperative transfusion quantity of 0 move the prediction line lower than the average prediction, resulting in a negative effect on the prediction of a positive event (death), which can be considered protective factors. Conversely, an age of 76 years and an MCV of 92.1 move the prediction line higher than the average prediction, resulting in a positive effect on the prediction of a positive event (death), which can be considered risk factors. The final model, based on the cumulative output SHAP values of 16 feature variables, predicted a negative outcome (survival), with a predicted survival probability of 0.889. Similarly, Fig. 4B shows that an AJCC stage of 3 and a CA19-9 value of 29.3 were prognostic risk factors, while an age of 61 years was a protective factor. The final model’s cumulative output SHAP value predicted a positive outcome (death), with a predicted death probability of 0.784. The SHAP decision plots elucidate the influence of different feature values on the outcomes during the model decision and prediction process.
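An individual decision plot like those in Fig. 4 can be drawn from the same explainer (continuing the previous sketch; index 0 is chosen only for illustration, and the logit link maps the cumulative log-odds output to the probability scale).

```python
# Decision plot for a single validation-set patient, as in Fig. 4.
shap.decision_plot(explainer.expected_value, shap_values[0],
                   X_valid.iloc[0], link="logit")
```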

Fig. 4
figure 4

SHAP decision plot of the XGBoost model. (A and B) Individual predictions of two patients. In the graph, the gray vertical line indicates the baseline prediction of the model, while the colored prediction lines represent the impact of each feature of the sample on the final prediction. The movement of the colored prediction lines demonstrates how individual feature values move the output prediction above or below the average prediction, thereby identifying the magnitude and direction of the influence of feature values on the prediction results. The feature values are provided alongside the prediction lines for reference. PTQ: perioperative transfusion quantity; CEA: carcinoembryonic antigen; CA199: carbohydrate antigen 19–9; CA125: carbohydrate antigen 125; MCV: mean corpuscular volume; LDH: lactate dehydrogenase; P: serum phosphorus; Na: serum sodium; AFU: α-L-fucosidase; bas percentage: basophil granulocyte percentage; PDW: platelet distribution width; PT: prothrombin time; APTT: activated partial thromboplastin time.

Discussion

In this study, we employed feature selection combined with ML models and integrated the SHAP method to construct and validate a highly accurate and interpretable predictive model for postoperative survival status in CRC patients, aiming to guide personalized treatment and clinical decisions. Among the five prognostic models, the model constructed using XGBoost combined with the RFE algorithm exhibited the best predictive performance, with an accuracy of 0.83 (95% CI 0.76–0.90), an F1 score of 0.81 (95% CI 0.72–0.88), and an AUC of 0.89 (95% CI 0.82–0.94), outperforming the other four models; it also demonstrated good calibration and clinical applicability. Moreover, the SHAP-based analysis effectively highlighted the key features driving the ML model's predictions of CRC prognosis, and the XGBoost SHAP summary and decision plots intuitively help clinicians understand these features.

It is worth mentioning that our modeling is based on highly accessible data extracted from real-world clinical electronic medical record systems. These systems contain a wealth of obtainable data, which may have significant value for constructing disease prognosis models. Several studies have discussed the practical value of real-world data in disease diagnosis and prognosis prediction. For example, Yoshida et al. reported that the use of real-world data could more accurately predict the risk of developing cardiovascular diseases27. Margue et al. conducted research using prospective real-world data and ML to develop an individual postoperative disease-free survival (DFS) prediction model28. This model provided accurate individual DFS predictions, outperforming traditional prognostic scores and offering more reliable support for personalized treatment.

The optimal model constructed in this study is the RFE-XGBoost model, which has several advantages. XGBoost, as a powerful gradient-boosting tree algorithm, performs well in handling large-scale datasets and complex problems. It can effectively manage various types of features, including numerical and categorical ones, reducing the complexity of feature preprocessing and achieving more accurate predictive performance. Previous studies have demonstrated the successful application of the XGBoost model in disease prediction. For instance, Li et al. utilized the XGBoost model to more accurately predict the metastatic status of breast cancer29. Additionally, Tian and Yan found that the XGBoost model demonstrated excellent accuracy and robustness in predicting the prognosis of patients with chronic heart failure30.

However, ML models also face specific challenges, such as the tendency of the XGBoost model to overfit when the feature dimension is high, as well as computational complexity. To address these challenges, this study employed the RFE algorithm for feature selection. By combining RFE and XGBoost, RFE optimized the feature set by reducing the number of features, thus reducing the complexity of the model and the risk of overfitting. These selected key features provided a more accurate and effective feature set for the XGBoost model, allowing it to focus more on learning important features. This combination further improved the predictive accuracy and generalization ability of the model. Previous studies have demonstrated the successful application of the combination of RFE and the XGBoost model. For instance, Deng et al. utilized the RFE-XGBoost model in a multiclassification task of CRC-related causes of death and reported that the RFE-based feature selection method improved the model's performance in classifying CRC-related causes of death using clinical and omics data, or other data with highly redundant and imbalanced features31. Similarly, Zhang et al. employed this algorithm for spinal disease diagnosis, compared different combinations of feature selection methods and models, and found that the RFE-XGBoost model achieved the highest accuracy and F1 score32. These research examples indicate that the RFE-XGBoost combination model has broad application prospects in disease prediction, providing strong support for improving the accuracy and efficiency of disease diagnosis and prognosis prediction.

Furthermore, a major challenge for predictive models is how to present their decisions intuitively. To address this issue, this study applied the SHAP framework to the ‘black-box’ XGBoost tree ensemble to reveal the model's specific decision process. The SHAP method elucidates the magnitude and direction of each feature's influence on the model output, yielding clinically meaningful insights with practical implications. In this study, SHAP analysis showed that perioperative blood transfusion volume and AJCC stage were relatively important risk factors for predicting the prognosis of CRC patients. Studies have shown that perioperative blood transfusion has a significant negative impact on long-term prognosis after CRC surgery and increases the likelihood of short-term complications33. McSorley et al. also arrived at similar conclusions, indicating an association between perioperative blood transfusion and postoperative inflammation, complications, and poorer survival in CRC surgery patients34. Additionally, there may be a synergistic effect between perioperative blood transfusion and infectious complications, leading to proinflammatory activation conducive to tumor recurrence35. Finally, AJCC stage reflects the extent of tumor progression and therefore directly indicates the patient's prognosis8,9.

The optimal model ultimately included 16 feature variables encompassing various aspects of patient information, including age, histopathological results, tumor markers, routine blood tests, blood biochemistry, and coagulation function indicators. These features serve as crucial inputs for the predictive model and further guide patient prognosis. Previous studies have demonstrated the relevance of age, pathological results, and tumor markers to the prognosis of CRC, where older age, later stage, and elevated tumor markers typically indicate a poor prognosis36,37,38. Additionally, tumor location is associated with prognosis, with rectal cancer having a worse prognosis than colon cancer39. Furthermore, routine blood tests, blood biochemistry, and coagulation function tests also play a role in the prognostic assessment of CRC patients. These clinical laboratory indicators provide information about the overall health status and physiological function of patients. Such supplementary information can assist clinicians in better evaluating the overall condition of CRC patients and their tolerance to treatment, thereby enhancing prognosis prediction. Therefore, in the study of CRC prognosis, comprehensive consideration of clinical laboratory test indicators and other clinical examination information can provide a more thorough understanding of the overall condition of patients, achieving a more accurate individualized assessment.

In general, this study has significant practical implications. The establishment of the model and the identification of key features provide potential applications for personalized treatment and patient management. By accurately predicting patient prognosis, physicians can formulate personalized treatment plans more effectively, thereby improving treatment outcomes and patient survival rates. Moreover, the model, combined with the additional insights provided by the SHAP method, helps physicians to more fully consider patient conditions and risks in clinical decision-making, making the process more scientific and reliable.

Meanwhile, this study has several limitations that need to be acknowledged. First, the data are derived from a small, single-center cohort, which limits the external validity and generalizability of our findings, and the lack of external validation leaves uncertain how the models would perform in different populations and settings. Second, the model has not been tested or applied in clinical settings, so its practical utility and validity remain unverified. Third, a small proportion of the data were missing and were addressed using imputation; although sensitivity analyses indicated that the model's robustness was minimally affected, imputation may still have some influence on model performance. Additionally, owing to the limitations of the collected information, this study did not adequately account for the different treatment approaches used across AJCC stages. Because treatment regimens can substantially influence patient outcomes and often vary across disease stages, this gap may have constrained the model's ability to capture stage-specific survival patterns and prognostic heterogeneity within each stage. Future studies should therefore incorporate comprehensive, stage-specific treatment data, validate the findings in larger multicenter datasets, and further improve performance within each stratified stage, which would better reflect the stage-specific impact of treatment on prognosis and enhance the model's clinical utility.

Conclusion

In this study, we established an RFE-XGBoost model based on clinically available data to predict the postoperative survival status of CRC patients. Compared with the other four models, this model demonstrated superior predictive performance. Furthermore, SHAP elucidated the impact of key features on the prediction results, enhancing the credibility and practicality of the model; features such as perioperative transfusion quantity and AJCC stage were of particular importance to the prognosis of CRC patients. Finally, this study provides an effective modeling approach for predicting the prognosis of CRC patients and offers a valuable reference for future application in clinical practice. However, further validation and optimization are still needed before the model can be used to guide clinical practice and improve treatment outcomes and survival in CRC patients.