Abstract
Cardiovascular disease (CVD) remains a leading global health threat, responsible for one in five deaths worldwide. Early detection is critical for reducing morbidity and mortality, yet traditional diagnostic methods often rely on reactive clinical assessments and miss opportunities for preventive intervention. In this study, a machine learning (ML) ecosystem is developed to enhance CVD diagnosis through two complementary approaches: (1) an early warning system that uses non-clinical, self-reported features for accessible risk stratification, and (2) specialized diagnostic models that integrate clinical and non-clinical data. The framework leverages advanced ML techniques, including tabular neural networks (TabNet, TabPFN) and ensemble methods (XGBoost, Random Forest), validated on multi-regional datasets. Shapley Additive Explanations (SHAP) analysis identified ECG-related features as the dominant predictors of CVD risk, with ST-segment slope (+0.93) and ST depression (+0.63) exhibiting the strongest effects. Counterfactual explanations from the non-clinical model further revealed actionable preventive measures: reducing exercise-induced angina and chest pain severity, alongside increasing exercise heart rate, could shift predictions from diseased to healthy, highlighting the model’s utility for lifestyle interventions. To address ethical and clinical trustworthiness, the framework incorporates interpretability tools (SHAP, counterfactuals), fairness mitigation (Fairlearn), and uncertainty quantification (Bayesian neural networks). Causal inference identified key predictors and estimated their average treatment effects (ATEs), for example exercise-induced angina (ATE: 0.36) and ST slope (ATE: 0.33); these estimates informed a hybrid ensemble model that achieved 89% accuracy with reduced dimensionality. The system aligns with the FDA’s Good Machine Learning Practice principles and the EU guidelines for trustworthy AI, offering a scalable solution for early detection and equitable diagnosis.
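To make concrete what a per-feature SHAP attribution such as the ST-slope value denotes, the following is a minimal sketch that computes exact Shapley values by enumerating every feature coalition. It is an illustration under stated assumptions, not the authors' pipeline: features outside a coalition are replaced by a single baseline vector (the SHAP library instead averages over background data or uses model-specific algorithms such as TreeSHAP), and the function `exact_shapley`, the toy linear score `toy_risk`, and its weights are hypothetical.

```python
from itertools import combinations
from math import factorial, isclose


def exact_shapley(f, x, baseline):
    """Exact Shapley values of model f at input x.

    Assumption: features absent from a coalition take their value from a
    single baseline vector. Cost grows as 2^n model calls, so keep n small.
    """
    n = len(x)

    def value(coalition):
        # Hybrid input: x for coalition members, baseline for everything else.
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return f(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions S without i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


# Hypothetical linear risk score over three hypothetical features.
def toy_risk(z):
    return 0.5 * z[0] + 2.0 * z[1] - 1.0 * z[2] + 0.1


x = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
phi = exact_shapley(toy_risk, x, baseline)
print([round(v, 6) for v in phi])  # [0.5, 4.0, -3.0]
# Efficiency property: the attributions sum to f(x) - f(baseline).
assert isclose(sum(phi), toy_risk(x) - toy_risk(baseline))
```

For a linear model with one baseline, each value reduces to w_i(x_i - b_i), and the values always sum to f(x) - f(baseline). The 2^n enumeration cost is why practical SHAP implementations approximate or exploit model structure.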
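Fairness mitigation starts from a measured disparity. The abstract names Fairlearn but not the criterion, so the following is only a sketch of one common choice, demographic parity, written in plain Python: it computes each group's selection rate (share of positive predictions) and reports their difference (max minus min) and ratio (min over max), the same quantities that Fairlearn's `demographic_parity_difference` and `demographic_parity_ratio` report. The function names, predictions, and grouping by sex are hypothetical. A ratio below 0.8 is the conventional four-fifths rule of thumb.

```python
from collections import defaultdict


def selection_rates(y_pred, groups):
    """Share of positive (diseased) predictions within each group."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for pred, group in zip(y_pred, groups):
        totals[group] += 1
        positives[group] += int(pred == 1)
    return {g: positives[g] / totals[g] for g in totals}


def demographic_parity(y_pred, groups):
    """Return (difference, ratio) of the largest and smallest selection rates.

    difference = max - min; ratio = min / max, defined here as 1.0 when
    no group receives any positive prediction.
    """
    rates = selection_rates(y_pred, groups)
    highest, lowest = max(rates.values()), min(rates.values())
    ratio = lowest / highest if highest > 0 else 1.0
    return highest - lowest, ratio


# Hypothetical predictions for eight patients split by sex (1 = male, 0 = female).
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
sex = [1, 1, 1, 1, 0, 0, 0, 0]
diff, ratio = demographic_parity(y_pred, sex)
print(round(diff, 3), round(ratio, 3))  # 0.5 0.333
```

Here males are flagged three times as often as females (0.75 versus 0.25), so a mitigation step such as threshold adjustment would aim to raise the ratio toward 1.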
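The abstract reports ATEs without naming the estimator. As a hedged illustration of what an ATE such as 0.36 for exercise-induced angina denotes, the following sketch uses a different, simpler technique than may have been used in the study: backdoor adjustment by stratification, ATE = sum over z of P(z) [E(Y | T = 1, z) - E(Y | T = 0, z)]. It assumes a single discrete variable Z blocks all confounding paths and that every stratum contains both treated and untreated records. The functions, the age-group confounder, and the toy counts are hypothetical.

```python
from collections import defaultdict


def backdoor_ate(rows):
    """Stratified (backdoor-adjusted) ATE from (z, t, y) triples.

    z: stratum label of an assumed sufficient confounder,
    t: binary treatment (0/1), y: binary outcome (0/1).
    """
    # counts[z][t] = [number of records, number with y = 1]
    counts = defaultdict(lambda: [[0, 0], [0, 0]])
    total = 0
    for z, t, y in rows:
        counts[z][t][0] += 1
        counts[z][t][1] += y
        total += 1
    ate = 0.0
    for z, ((n0, s0), (n1, s1)) in counts.items():
        if n0 == 0 or n1 == 0:
            raise ValueError(f"positivity violated in stratum {z!r}")
        # Weight each within-stratum risk difference by P(z).
        ate += (n0 + n1) / total * (s1 / n1 - s0 / n0)
    return ate


def naive_difference(rows):
    """Unadjusted difference in outcome rates between treated and untreated."""
    treated = [y for _, t, y in rows if t == 1]
    control = [y for _, t, y in rows if t == 0]
    return sum(treated) / len(treated) - sum(control) / len(control)


def make_rows(z, t, n, positives):
    """Build n hypothetical records of which `positives` have y = 1."""
    return [(z, t, 1 if i < positives else 0) for i in range(n)]


rows = (make_rows("young", 1, 2, 1) + make_rows("young", 0, 8, 2)
        + make_rows("old", 1, 8, 6) + make_rows("old", 0, 2, 1))
print(round(backdoor_ate(rows), 3), round(naive_difference(rows), 3))  # 0.25 0.4
```

In this toy data the treatment is concentrated in the older, higher-risk stratum, so the unadjusted difference (0.4) overstates the adjusted effect (0.25).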
Data availability
The data, materials, and reproducibility guidelines used in this research are available in the GitHub repository: https://github.com/SakibHasanSimanto/CVD-AI-Research. The data originate from the UCI Machine Learning Repository: https://archive.ics.uci.edu/dataset/45/heart+disease.
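For readers retrieving the original UCI data, the following is a minimal parsing sketch. It assumes the comma-separated layout of the UCI Heart Disease `processed.*.data` files (for example `processed.cleveland.data`), with the 14 commonly used attributes in the order listed and `?` marking missing values, and it collapses the 0 to 4 diagnosis code `num` into a binary label. These are assumptions based on the UCI documentation; the authors' own preprocessing is documented in their repository and may differ. The helper names and the sample record are hypothetical.

```python
from pathlib import Path

# The 14 attributes used in most studies of the UCI Heart Disease data (assumed order).
COLUMNS = ["age", "sex", "cp", "trestbps", "chol", "fbs", "restecg",
           "thalach", "exang", "oldpeak", "slope", "ca", "thal", "num"]


def parse_record(line):
    """Parse one comma-separated record; '?' marks a missing value."""
    values = line.strip().split(",")
    if len(values) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} fields, got {len(values)}")
    record = {name: (None if raw == "?" else float(raw))
              for name, raw in zip(COLUMNS, values)}
    # Collapse the 0-4 diagnosis code into a binary label: 0 = healthy, 1 = CVD.
    record["target"] = None if record["num"] is None else int(record["num"] > 0)
    return record


def load_records(path):
    """Parse every non-blank line of a processed.*.data file."""
    text = Path(path).read_text()
    return [parse_record(line) for line in text.splitlines() if line.strip()]


# Illustrative record in the assumed layout (not copied from the dataset).
example = parse_record("57.0,1.0,4.0,130.0,?,0.0,1.0,110.0,1.0,1.5,2.0,?,7.0,2")
print(example["chol"], example["oldpeak"], example["target"])  # None 1.5 1
```

Missing values are kept as `None` rather than imputed, so the choice of imputation remains an explicit downstream step.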
Author information
Contributions
K.S.H. formulated and conducted the research and wrote the manuscript. I.S.D. revised and edited the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Hasan, K.S., Dhrubo, I.S. Advancing cardiovascular disease diagnosis with an interpretable and responsible AI framework. Sci Rep (2026). https://doi.org/10.1038/s41598-026-35451-3
DOI: https://doi.org/10.1038/s41598-026-35451-3