Scientific Reports
Clustering-cum-regression based model and performance analysis for early prediction of heart disease
  • Article
  • Open access
  • Published: 18 February 2026

  • Manoj Tolani1,
  • Yazeed AlZahrani2,
  • Gaurav Suman3,
  • Pankaj Kumar3,
  • Arun Balodi4 &
  • Ambar Bajpai5

Scientific Reports (2026)

We are providing an unedited version of this manuscript to give early access to its findings. Before final publication, the manuscript will undergo further editing. Please note there may be errors present which affect the content, and all legal disclaimers apply.

Subjects

  • Biomedical engineering
  • Congenital heart defects

Abstract

In real-time health monitoring systems, Wireless Body Area Networks (WBANs) are widely used to collect disease-related parameters through sensors, and the collected data can support the early prediction of disease. To address the growing need for accurate and efficient heart disease prediction, we introduce a hybrid approach that combines K-Means clustering with regression techniques to analyze the factors monitored in heart health. This integrated method draws on the strengths of both unsupervised and supervised learning to improve predictive accuracy on the training and testing datasets. Our analysis focuses on 12 feature parameters, which are clustered using K-Means to uncover inherent patterns and relationships and then evaluated through multiple regression models to determine their predictive significance. By using K-Means to assess parameter relevance within defined ranges, the proposed framework supports feature selection and improves model interpretability. To validate its effectiveness, we benchmark the approach against widely used machine learning models, including Decision Tree Regression, K-Nearest Neighbor, Support Vector Machine (SVM), and Kernel SVM. Prediction accuracy and false-prediction metrics were analyzed to compare the proposed method with existing heart disease prediction models. Earlier approaches reported accuracies of up to 85%, with limited improvement in recall, specificity, and F1 score. In contrast, the proposed hybrid model, which integrates Random Forest regression with K-Means clustering, achieved a higher accuracy of 91%, along with improved recall (0.8864), specificity (0.9583), F1 score (0.8977), and ROC-AUC (0.9155). These gains were obtained without increasing model complexity and indicate that the proposed approach outperforms traditional prediction methods while offering a scalable solution for real-world healthcare applications.
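
The recall, specificity, and F1 values quoted in the abstract are standard confusion-matrix metrics. As a minimal sketch of how such values are derived, the Python snippet below computes them from raw prediction counts. The counts are hypothetical placeholders chosen for illustration and are not the study's results; the names `ConfusionCounts` and `classification_metrics` are likewise illustrative and do not come from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts from a binary heart-disease classifier (1 = disease present)."""
    tp: int  # diseased patients correctly flagged
    fn: int  # diseased patients missed
    tn: int  # healthy patients correctly cleared
    fp: int  # healthy patients wrongly flagged


def classification_metrics(c: ConfusionCounts) -> dict:
    """Return accuracy, recall, specificity, precision, and F1 score."""
    total = c.tp + c.fn + c.tn + c.fp
    recall = c.tp / (c.tp + c.fn)          # sensitivity: share of diseased found
    specificity = c.tn / (c.tn + c.fp)     # share of healthy correctly cleared
    precision = c.tp / (c.tp + c.fp)       # share of positive calls that are right
    f1 = 2 * precision * recall / (precision + recall)
    return {
        "accuracy": (c.tp + c.tn) / total,
        "recall": recall,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
    }


# Hypothetical counts for illustration only (NOT taken from the paper).
counts = ConfusionCounts(tp=40, fn=10, tn=45, fp=5)
metrics = classification_metrics(counts)

for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Recall and specificity are reported separately because accuracy alone can hide a high rate of missed diagnoses when the two classes are imbalanced. The sketch assumes every denominator is non-zero; a production version would need to guard against empty classes.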

Data availability

The datasets used in this study are publicly available from the University of California Irvine Machine Learning Repository at https://archive.ics.uci.edu/ml/datasets/Heart+Disease.

References

  1. Ciotti, M. et al. The COVID-19 pandemic. Crit. Rev. Clin. Lab. Sci. 57(6), 365–388 (2020).

  2. Indrakumari, R., Poongodi, T. & Jena, S. R. Heart disease prediction using exploratory data analysis. Procedia Comput. Sci. 173, 130–139. https://doi.org/10.1016/j.procs.2020.06.017 (2020).

  3. Ayub, K. & AlShawa, R. Revolutionizing healthcare with IoMT and WBAN: A comprehensive analysis. In: 2025 6th International Conference on Bio-engineering for Smart Technologies (BioSMART), 1–4 (2025). https://doi.org/10.1109/biosmart66413.2025.11046147.

  4. Tolani, M., Bajpai, A., Sunny, Singh, R.K., Wuttisittikulkij, L. & Kovintavewat, P. Energy efficient hybrid medium access control protocol for wireless sensor network. In: 2021 36th International Technical Conference on Circuits/Systems, Computers and Communications (ITC-CSCC), 1–4. IEEE (2021).

  5. Tolani, M., Sunny & Singh, R. K. Energy efficient adaptive bit-map-assisted medium access control protocol. Wireless Personal Communications 108(3), 1595–1610 (2019).

  6. Boulis, A. Castalia: A Simulator for Wireless Sensor Networks and Body Area Networks. (2011). User’s manual version 3.2, NICTA.

  7. Mohan, S., Thirumalai, C. & Srivastava, G. Effective heart disease prediction using hybrid machine learning techniques. IEEE Access 7, 81542–81554. https://doi.org/10.1109/ACCESS.2019.2923707 (2019).

  8. Rodriguez, M. Z. et al. Clustering algorithms: A comparative approach. PLoS One 14(1), e0210236. https://doi.org/10.1371/journal.pone.0210236 (2019).

  9. Damarla, R. Heart Disease Prediction. Available: https://www.kaggle.com/datasets/rishidamarla/heart-disease-prediction (2020).

  10. Yuan, X., Chen, J., Zhang, K., Wu, Y. & Yang, T. A stable AI-based binary and multiple class heart disease prediction model for IoMT. IEEE Trans. Ind. Inform. 18(3), 2032–2040. https://doi.org/10.1109/TII.2021.3098306 (2022).

  11. Fitriyani, N. L., Syafrudin, M., Alfian, G. & Rhee, J. HDPM: An effective heart disease prediction model for a clinical decision support system. IEEE Access 8, 133034–133050. https://doi.org/10.1109/ACCESS.2020.3010511 (2020).

  12. Ordonez, C. Association rule discovery with the train and test approach for heart disease prediction. IEEE Trans. Inf. Technol. Biomed. 10(2), 334–343. https://doi.org/10.1109/TITB.2006.864475 (2006).

  13. Pan, Y., Fu, M., Cheng, B., Tao, X. & Guo, J. Enhanced deep learning assisted convolutional neural network for heart disease prediction on the internet of medical things platform. IEEE Access 8, 189503–189512. https://doi.org/10.1109/ACCESS.2020.3026214 (2020).

  14. Rohan, D., Reddy, G. P., Kumar, Y. V. P., Prakash, K. P. & Reddy, C. P. An extensive experimental analysis for heart disease prediction using artificial intelligence techniques. Sci. Rep. 15, 6132. https://doi.org/10.1038/s41598-025-90530-1 (2025).

  15. Indrakumari, R., Poongodi, T. & Jena, S. R. Heart disease prediction using exploratory data analysis. Procedia Comput. Sci. 173, 130–139. https://doi.org/10.1016/j.procs.2020.06.017 (2020).

  16. Prakash, C.S., MadhuBala, M. & Rudra, A. Data science framework - heart disease predictions, variant models and visualizations. In: 2020 International Conference on Computer Science, Engineering and Applications (ICCSEA), 1–4. IEEE (2020). https://doi.org/10.1109/ICCSEA49143.2020.9132920.

  17. Kavitha, M., Gnaneswar, G., Dinesh, R., Sai, Y.R. & Suraj, R.S. Heart disease prediction using hybrid machine learning model. In: 2021 6th International Conference on Inventive Computation Technologies (ICICT), 1329–1333. IEEE (2021). https://doi.org/10.1109/ICICT50816.2021.9358597.

  18. Lakshmanarao, A., Srisaila, A. & Kiran, T.S.R. Heart disease prediction using feature selection and ensemble learning techniques. In: 2021 Third International Conference on Intelligent Communication Technologies and Virtual Mobile Networks (ICICV), 994–998. IEEE (2021). https://doi.org/10.1109/ICICV50876.2021.9388482.

  19. Alim, M.A., Habib, S., Farooq, Y. & Rafay, A. Robust heart disease prediction: A novel approach based on significant feature and ensemble learning. In: 3rd International Conference on Computing Mathematics and Engineering Technologies (iCoMET) (2020).

  20. Ismaeel, S., Miri, A. & Chourishi, D. Using the extreme learning machine technique for heart disease. In: IEEE Canada International Humanitarian Technology Conference (IHTC) (2020).

  21. Ahmed, R., Mahmud, S.M.H., Hossin, M.A., Jahan, H. & Noori, S.R.H. A cloud based four-tier architecture for early detection of heart disease with machine learning algorithms. In: 4th International Conference on Computer and Communications (2018).

  22. Kapila, R. & Saleti, S. Federated learning-based disease prediction: A fusion approach with feature selection and extraction. Biomed. Signal Process. Control 100, 106961. https://doi.org/10.1016/j.bspc.2024.106961 (2025).

  23. Khan, M. A. et al. Optimal feature selection for heart disease prediction using modified artificial bee colony (M-ABC) and k-nearest neighbors (KNN). Sci. Rep. https://doi.org/10.1038/s41598-024-78021-1 (2024).

  24. Gavhane, A., Kokkula, G., Pandya, P.I. & Devadkar, K. Prediction of heart disease using machine learning. In: Proceedings of the 2nd International Conference on Electronics Communication and Aerospace Technology (ICECA 2018).

  25. Atallah, R. & Al-Mousa, A. Heart disease detection using machine learning majority voting ensemble method. In: 2nd International Conference on New Trends in Computing Sciences (ICTCS) (2019).

  26. Rajdhan, A., Agarwal, A. & Ghuli, P. Heart disease prediction using machine learning. International Journal Of Engineering Research & Technology (IJERT) 9(4), (2020).

  27. Wijayaa, G.B.S. & Astuti, L.G. Analysis of the effect of hidden layer units on coronary heart prediction using the radial basis functions algorithm. JELIKU 9(2), (2020).

  28. Mienye, I. D., Sun, Y. & Wang, Z. Improved sparse autoencoder based artificial neural network approach for prediction of heart disease. Inf. Med. Unlocked https://doi.org/10.1016/j.imu.2020.100307 (2020).

  29. Balodi, A., Anand, R. S., Dewal, M. L. & Rawat, A. Severity analysis of mitral regurgitation using discrete wavelet transform. IETE J. Res. https://doi.org/10.1080/03772063.2020.1814880 (2020).

  30. Balodi, A., Anand, R. S., Dewal, M. L. & Rawat, A. Computer-aided classification of the mitral regurgitation using multiresolution local binary pattern. Neural Comput. Appl. 32(7), 2205–2215 (2020b).

  31. Bajpai, A. & Balodi, A. Role of 6G networks: Use cases and research directions. In: IEEE Bangalore Humanitarian Technology Conference (B-HTC), 1–5 (2020). https://doi.org/10.1109/B-HTC50970.2020.9298017.

  32. Repository, U.M.L. Heart Disease. Available: https://archive.ics.uci.edu/ml/datasets/Heart+Disease (2020).

  33. Devi, A. & Raj, T.N. PLMPFS: Predictive learning with polynomial features and SMOTETomek balancing based heart disease prediction. In: 2025 6th International Conference on Intelligent Communication Technologies and Virtual Mobile Networks (ICICV), 453–459 (2025). https://doi.org/10.1109/icicv64824.2025.11085773.

  34. Vibha, M. B., Sneha, S. R., Kiran, U. & Kirana, Y. Exploratory data analysis of heart disease prediction using machine learning techniques-rs algorithm. In: 2024 Second International Conference on Intelligent Cyber Physical Systems and Internet of Things (ICoICI), Coimbatore, India, 209–216 (2024). https://doi.org/10.1109/ICoICI62503.2024.10696414.

  35. Lakshmi, A. & Devi, R. Heart disease prediction using enhanced whale optimization algorithm based feature selection with machine learning techniques. In: 2023 12th International Conference on System Modeling & Advancement in Research Trends (SMART), Moradabad, India, 644–648 (2023). https://doi.org/10.1109/SMART59791.2023.10428617.

  36. Allgaier, J. & Pryss, R. Cross-validation visualized: A narrative guide to advanced methods. Mach. Learn. Knowl. Extr. 6(2), 1378–1388. https://doi.org/10.3390/make6020065 (2024).

  37. Smith, J. & Doe, J. Impact of high cholesterol on cardiovascular health. JAMA Cardiol. 7(4), 456–464. https://doi.org/10.1001/jamacardio.2022.0912 (2022).

  38. Doe, J. & Smith, J. ST depression and its prognostic significance in patients with coronary artery disease. J. Am. Coll. Cardiol. 75(10), 1234–1245. https://doi.org/10.1016/j.jacc.2022.01.045 (2022).

  39. Zeid, S. et al. Heart rate variability: Reference values and role for clinical profile and mortality in individuals with heart failure. Clin. Res. Cardiol. 113, 1317–1330. https://doi.org/10.1007/s00392-023-02248-7 (2024).

Acknowledgements

The data used in this work come from the University of California Irvine Machine Learning Repository at https://archive.ics.uci.edu/ml/datasets/Heart+Disease (refs. 9, 32).

Funding

Open access funding provided by Manipal Academy of Higher Education, Manipal. This research received no funding from any agency.

Author information

Authors and Affiliations

  1. Department of Electronics and Communication Engineering, Jaypee Institute of Information Technology, Noida, 201309, Uttar Pradesh, India

    Manoj Tolani

  2. Department of Computer Engineering and Information, College of Engineering in Wadi Addawasir, Prince Sattam bin Abdulaziz University, Wadi Addawasir, Saudi Arabia

    Yazeed AlZahrani

  3. Manipal Institute of Technology, Manipal Academy of Higher Education, Manipal, India

    Gaurav Suman & Pankaj Kumar

  4. Department of Electronics and Communication Engineering, Dayananda Sagar University, Bengaluru, Karnataka, India

    Arun Balodi

  5. Department of Electrical, Electronics and Communication Engineering, GITAM University, Bengaluru, Karnataka, India

    Ambar Bajpai

Contributions

Manoj Tolani and Yazeed AlZahrani contributed to the conceptualization, methodology, coding, and writing of the original draft. Gaurav Suman and Arun Balodi were responsible for validation, formal analysis, and investigation. Ambar Bajpai and Pankaj Kumar handled review and editing of the manuscript, as well as visualization.

Corresponding author

Correspondence to Pankaj Kumar.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Tolani, M., AlZahrani, Y., Suman, G. et al. Clustering-cum-regression based model and performance analysis for early prediction of heart disease. Sci Rep (2026). https://doi.org/10.1038/s41598-026-40626-z

  • Received: 15 June 2025

  • Accepted: 13 February 2026

  • Published: 18 February 2026

  • DOI: https://doi.org/10.1038/s41598-026-40626-z

Keywords

  • Wireless body area network
  • Prediction
  • Regression
  • Medium access control
  • K-Means Clustering

Scientific Reports (Sci Rep)

ISSN 2045-2322 (online)