Abstract
Colon cancer represents a significant global health burden, accounting for a substantial proportion of cancer-related morbidity and mortality worldwide. Many studies have sought to predict survival outcomes; however, most have relied predominantly on conventional statistical methods. The aim of this study was to apply machine learning techniques to build models for survival prediction in patients with colon cancer. A retrospective review of 764 colon cancer patients treated over a 10-year period was used to construct a detailed dataset containing 44 predictor variables and one dependent variable, the survival status of the patients (alive or dead). The data were randomly split into training (80%) and testing (20%) sets. Prognostic features from the database were used to build machine learning models, including random forest, logistic regression, XGBoost, gradient boosting, categorical boosting (CatBoost), light gradient boosting machine (LightGBM), multilayer perceptron (MLP), and one-dimensional convolutional neural network (1D-CNN), to predict patient survival. Models were evaluated in terms of accuracy, sensitivity and specificity, and their predictive ability was assessed with receiver operating characteristic (ROC) curves and the area under the curve (AUC). Almost all algorithms produced similar accuracy and precision. On the test set, CatBoost achieved the highest accuracy (0.813), the random forest model the highest precision (0.727), and the logistic regression model the highest recall (0.658). The random forest algorithm also achieved the highest AUC (0.83), indicating the best overall balance between sensitivity and specificity across classification thresholds among the evaluated models.
In summary, this research highlights the potential of machine learning models to support personalized and timely interventions for colon cancer patients, with the ultimate aim of improving patient care and outcomes.
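The evaluation protocol summarized in the abstract can be sketched in plain Python. It consists of a random 80/20 split, followed by accuracy, precision, recall (sensitivity), specificity and ROC AUC computed on the held-out set. This is a minimal illustration, not the authors' code, and several details are assumptions:

* The function names are hypothetical.
* The split is a simple seeded shuffle. The abstract does not state whether the split was stratified.
* Label 1 is assumed to denote the positive class ("dead").
* A 0.5 probability threshold is assumed for the threshold-based metrics.

AUC is computed in its rank (Mann-Whitney) form, which equals the trapezoidal area under the empirical ROC curve.

```python
import random
from typing import Dict, List, Sequence, Tuple


def train_test_split_indices(
    n: int, test_fraction: float = 0.2, seed: int = 42
) -> Tuple[List[int], List[int]]:
    """Randomly partition row indices 0..n-1 into (train, test).

    Hypothetical helper: a plain (non-stratified) seeded shuffle.
    """
    rng = random.Random(seed)
    indices = list(range(n))
    rng.shuffle(indices)
    n_test = round(n * test_fraction)
    return indices[n_test:], indices[:n_test]


def binary_metrics(
    y_true: Sequence[int], y_score: Sequence[float], threshold: float = 0.5
) -> Dict[str, float]:
    """Threshold-based metrics; label 1 (assumed 'dead') is the positive class."""
    # Assumption: scores >= threshold are predicted positive.
    y_pred = [1 if s >= threshold else 0 for s in y_score]
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,  # recall == sensitivity
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
    }


def roc_auc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """AUC = P(random positive scores higher than random negative); ties count 1/2."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    train, test = train_test_split_indices(764)
    print(len(train), len(test))  # 611 153

    # Toy labels and predicted probabilities (illustrative, not study data).
    y_true = [1, 0, 1, 1, 0, 0, 1, 0]
    y_score = [0.9, 0.2, 0.4, 0.7, 0.6, 0.1, 0.8, 0.55]
    print(binary_metrics(y_true, y_score))
    # {'accuracy': 0.625, 'precision': 0.6, 'recall': 0.75, 'specificity': 0.5}
    print(roc_auc(y_true, y_score))  # 0.875
```

With this rounding rule, 764 patients yield 611 training and 153 test cases. The toy values are unrelated to the study's reported results. Precision and recall depend on the chosen threshold, whereas AUC summarizes ranking quality over all thresholds. This is why a model can lead on AUC without leading on every threshold-based metric.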
Data availability
Data are available from the corresponding author upon reasonable request.
Acknowledgements
This work was supported by the Vice-Chancellor for Research Affairs of Shiraz University of Medical Sciences (Grant No. 1402-29200).
Author information
Contributions
H.G., S.V.H. and A.R. conceived the experiment(s); S.A.N., A.M.B. and H.G. conducted the experiment(s); A.R.S., P.M., M.Z. and H.K. analyzed the results. All authors reviewed the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary Information
Electronic supplementary material accompanies the online version of this article.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Ghasemi, H., Hosseini, S.V., Rezaianzadeh, A. et al. Machine learning application in colon cancer treatment outcome prediction. Sci Rep (2026). https://doi.org/10.1038/s41598-026-36917-0
DOI: https://doi.org/10.1038/s41598-026-36917-0