Behaviour, Psychology and Sociology

Assessing online chat-based artificial intelligence models for weight loss recommendation appropriateness and bias in the presence of guideline incongruence

Abstract

Background and aim

Managing obesity requires a comprehensive approach that may involve therapeutic lifestyle changes, medications, or metabolic surgery. Many patients seek health information from online sources and artificial intelligence models such as ChatGPT, Google Gemini, and Microsoft Copilot before consulting health professionals. This study aims to evaluate the appropriateness of responses from Google Gemini and Microsoft Copilot to questions on the pharmacologic and surgical management of obesity, and to assess whether those responses are biased toward either the ADA or the AACE guidelines.

Methods

A set of ten questions was compiled and posed separately to the free editions of Google Gemini and Microsoft Copilot. The corresponding recommendations were extracted from the ADA and AACE websites, and reviewers graded each response for appropriateness, completeness, and bias toward either guideline.

Results

All 10 responses (100%) from Microsoft Copilot and 8/10 (80%) responses from Google Gemini were appropriate. Neither model gave an inappropriate response; Google Gemini declined to answer two questions and instead advised consulting a physician. Microsoft Copilot provided a higher proportion of complete responses (10/10; 100%) than Google Gemini (5/10; 50%). None of the eight responses from Google Gemini was biased toward either guideline, whereas two of the responses from Microsoft Copilot were biased.
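As a rough illustration of how the proportions above follow from the reported counts, the tally can be sketched in Python. The counts are those stated in the Results; the denominator of 10 for Microsoft Copilot's biased responses is inferred from its ten responses, and the names `REPORTED`, `percent`, and `summarize` are hypothetical, not taken from the study.

```python
from fractions import Fraction

# (count, total) pairs as reported in the Results.
# Assumption: Copilot's biased responses are counted out of its 10 responses;
# Gemini's are counted out of the 8 questions it actually answered.
REPORTED = {
    "Microsoft Copilot": {
        "appropriate": (10, 10),
        "complete": (10, 10),
        "biased": (2, 10),
    },
    "Google Gemini": {
        "appropriate": (8, 10),
        "complete": (5, 10),
        "biased": (0, 8),
    },
}


def percent(count, total):
    """Return count/total as a percentage, using exact rational arithmetic."""
    return float(Fraction(count, total) * 100)


def summarize(reported):
    """Map each model and metric to its percentage."""
    return {
        model: {metric: percent(c, t) for metric, (c, t) in metrics.items()}
        for model, metrics in reported.items()
    }


if __name__ == "__main__":
    for model, metrics in REPORTED.items():
        for metric, (c, t) in metrics.items():
            print(f"{model}: {metric} {c}/{t} ({percent(c, t):.0f}%)")
```

With only ten questions per model, each response shifts a proportion by ten percentage points, so these figures are best read as descriptive counts rather than precise rates.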

Conclusion

The study highlights the role of Microsoft Copilot and Google Gemini in weight loss management. Differences between their responses may be attributable to variation in the quality and scope of their training data and in their design.

Data availability

All data generated or analyzed during this study are included in this published article.

Author information

Contributions

Eugene Annor: conceptualization, methodology, drafting, and revision of manuscript. Joseph Atarere: conceptualization, methodology, drafting of the manuscript. Nneoma Ubah: drafting of the manuscript. Bryce Kunkle: drafting of the manuscript. Olachi Egbo: drafting of the manuscript. Oladoyin Jolaoye: drafting of the manuscript. Daniel K. Martin: drafting of the manuscript and reviewing for scientific content.

Corresponding author

Correspondence to Eugene Annor.

Ethics declarations

Competing interests

The authors declare no competing interests.

Ethics approval and consent to participate

This study does not contain identifying information about the patients. The study was performed in accordance with the ethical standards laid down in the 1964 Declaration of Helsinki and its subsequent amendments.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Annor, E., Atarere, J., Ubah, N. et al. Assessing online chat-based artificial intelligence models for weight loss recommendation appropriateness and bias in the presence of guideline incongruence. Int J Obes 49, 896–901 (2025). https://doi.org/10.1038/s41366-025-01717-5

  • DOI: https://doi.org/10.1038/s41366-025-01717-5