npj Digital Medicine
Integrating large language models for enhanced predictive analytics in healthcare
  • Article
  • Open access
  • Published: 02 April 2026

  • Yuli Wang1,2,3 na1,
  • Yuwei Dai1,4 na1,
  • Robin Wang5,
  • Tej Mehta2,
  • Premal Trivedi1,
  • Thao Vu6,
  • Cheng Ting Lin2,
  • Li Yang4,7 na2,
  • Zhicheng Jiao8,
  • Ihab Kamel1,
  • Jing Wu9 na2 &
  • Harrison Bai1 na2 

npj Digital Medicine (2026)

We are providing an unedited version of this manuscript to give early access to its findings. Before final publication, the manuscript will undergo further editing. Please note there may be errors present which affect the content, and all legal disclaimers apply.

Subjects

  • Computational biology and bioinformatics
  • Health care
  • Mathematics and computing
  • Medical research

Abstract

Physicians frequently face time-sensitive decisions under uncertainty and need reliable tools for forecasting clinical outcomes. Although clinical predictive models could assist in these critical decisions, their widespread adoption is hindered by complexities in data handling, model development, and integration into clinical workflows. This study introduces a novel framework (Hopkins-LLM) that leverages structured electronic health record (EHR) data to develop and deploy clinical large language models (LLMs) that act as multi-task predictive engines, supporting clinically constrained decision-support tasks with minimal barriers to implementation. Built on the LLaMA architecture with 7 billion parameters, our model was pre-trained on a comprehensive corpus and subsequently fine-tuned and tested on a dataset of 42,160 patients within Johns Hopkins Health System, addressing a spectrum of clinical and operational prediction tasks. We validated our model across three diverse external health systems, involving 1,329 patients, on four key prediction tasks: 30-day all-cause readmission, 90-day all-cause mortality, 30-day intensive care unit (ICU) admission, and treatment recommendation. The proposed Hopkins-LLM framework achieved a mean area under the receiver operating characteristic curve (ROC-AUC) of 0.84 [0.82, 0.88], a statistically significant improvement of 0.28 over zero-shot baseline LLMs (p < 0.05). These findings underscore the promise of LLMs as unified, user-friendly clinical prediction systems, adept at reasoning across diverse data sources to enhance decision-making at the point of care.
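To make the headline metric concrete, the following is a minimal sketch of how a ROC-AUC and an accompanying interval can be computed for a binary task such as 30-day readmission. The abstract does not state how its interval was obtained, so the percentile bootstrap used here is an assumption chosen for illustration and is not presented as the authors' procedure. The function names (`roc_auc`, `bootstrap_auc_interval`) and their parameters are hypothetical, and the code uses only the Python standard library; in practice a library routine such as scikit-learn's `roc_auc_score` would usually replace the hand-written AUC.

```python
import random
from typing import List, Sequence, Tuple


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC-AUC as the fraction of (positive, negative) pairs in which the
    positive case receives the higher score; ties count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_auc_interval(
    labels: Sequence[int],
    scores: Sequence[float],
    n_resamples: int = 1000,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[float, float]:
    """Percentile-bootstrap interval for ROC-AUC (illustrative assumption,
    not necessarily the method used in the study)."""
    if len(set(labels)) < 2:
        raise ValueError("both classes must be present")
    rng = random.Random(seed)
    n = len(labels)
    aucs: List[float] = []
    while len(aucs) < n_resamples:
        # Resample patients with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        y = [labels[i] for i in idx]
        if len(set(y)) < 2:
            continue  # AUC is undefined when a resample holds a single class
        aucs.append(roc_auc(y, [scores[i] for i in idx]))
    aucs.sort()
    lower = aucs[int((alpha / 2) * n_resamples)]
    upper = aucs[min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))]
    return lower, upper


if __name__ == "__main__":
    # Toy data: 1 = event occurred, scores = predicted risk (made up).
    y_true = [0, 0, 1, 1, 0, 1, 0, 1]
    y_score = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9, 0.5, 0.7]
    print("AUC:", roc_auc(y_true, y_score))
    print("Interval:", bootstrap_auc_interval(y_true, y_score))
```

The pairwise formulation is quadratic in the number of patients, which is acceptable for a cohort of roughly a thousand cases but would be replaced by a rank-based computation for larger datasets.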

Data availability

The multimodal imaging and EHR datasets used in this study were derived from the Johns Hopkins Health System under Institutional Review Board (IRB) approval and contain protected health information. Due to patient privacy considerations, the raw data cannot be publicly shared. De-identified subsets of the data may be made available upon reasonable request to the corresponding author, subject to completion of a Data Use Agreement (DUA) and approval by the Johns Hopkins IRB.

Code availability

All deep learning models were implemented in Python (version 3.10) using PyTorch (version 2.1.2). The following libraries were used for model development and evaluation: NumPy (1.26.4), pandas (2.2.1), transformers (4.36.1), vLLM (0.2.5), scikit-learn (1.2.1), matplotlib (3.7.1), and SciPy (1.11.3). Custom code modules were used for data input/output pipelines and distributed parallelization across computing nodes and GPUs. All source code supporting this study is available for scientific research and non-commercial use at: https://github.com/YuliWanghust/Hopkins_LLM.
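For readers trying to reproduce the environment, a small check like the one below can compare locally installed packages against the versions listed above. This is a sketch under stated assumptions: the PyPI distribution names (for example `torch` for PyTorch and `vllm` for vLLM) are assumed rather than taken from the statement, and the helper names `installed_version` and `find_mismatches` are hypothetical. The repository may ship its own requirements file, which should take precedence if it exists.

```python
from importlib import metadata
from typing import Callable, Dict, Optional, Tuple

# Versions copied from the Code availability statement.
# The distribution names on the left are assumptions (PyPI naming).
PINNED: Dict[str, str] = {
    "torch": "2.1.2",
    "numpy": "1.26.4",
    "pandas": "2.2.1",
    "transformers": "4.36.1",
    "vllm": "0.2.5",
    "scikit-learn": "1.2.1",
    "matplotlib": "3.7.1",
    "scipy": "1.11.3",
}


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(
    pinned: Dict[str, str],
    lookup: Callable[[str], Optional[str]] = installed_version,
) -> Dict[str, Tuple[str, Optional[str]]]:
    """Map each package whose installed version differs from the pinned one
    to a (wanted, found) pair. Local build suffixes such as '+cu121' are
    ignored when comparing."""
    mismatches: Dict[str, Tuple[str, Optional[str]]] = {}
    for name, wanted in pinned.items():
        found = lookup(name)
        base = found.split("+", 1)[0] if found else None
        if base != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    for name, (wanted, found) in find_mismatches(PINNED).items():
        print(f"{name}: expected {wanted}, found {found or 'not installed'}")
```

Passing the lookup function as a parameter keeps the comparison logic testable without installing any of the listed packages.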

Acknowledgements

This work was supported by the American Heart Association (Award #25IPA1454088), the National Institutes of Health (Award No. 1R03CA286693-01A1 and Award No. 1R01CA291826-01A1), the U.S. Department of Defense (Award No. HT94252510807), and the National Science Foundation (Award No. 2545071).

Author information

Author notes
  1. These authors contributed equally: Yuli Wang, Yuwei Dai.

  2. These authors jointly supervised this work: Li Yang, Jing Wu, Harrison Bai.

Authors and Affiliations

  1. Department of Radiology, University of Colorado Anschutz Medical Campus, Aurora, CO, USA

    Yuli Wang, Yuwei Dai, Premal Trivedi, Ihab Kamel & Harrison Bai

  2. Department of Radiology, Johns Hopkins University School of Medicine, Baltimore, MD, USA

    Yuli Wang, Tej Mehta & Cheng Ting Lin

  3. Department of Radiation Oncology, Icahn School of Medicine at Mount Sinai, New York, USA

    Yuli Wang

  4. Department of Neurology, Second Xiangya Hospital, Central South University, Changsha, Hunan, China

    Yuwei Dai & Li Yang

  5. Department of Radiology, Stanford University School of Medicine, Stanford, CA, USA

    Robin Wang

  6. Department of Biostatistics and Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO, USA

    Thao Vu

  7. Clinical Medical Research Center for Stroke Prevention and Treatment of Hunan Province, The Second Xiangya Hospital, Central South University, Changsha, China

    Li Yang

  8. Department of Diagnostic Imaging, Brown University Health, Providence, RI, USA

    Zhicheng Jiao

  9. Department of Radiology, Second Xiangya Hospital, Central South University, Changsha, Hunan, China

    Jing Wu

Contributions

Conceptualization: Y.W. and H.B. Methodology: Y.W. and Y.D. Investigation: R.W., T.M., P.T., and T.V. Visualization: C.L., Z.J., I.K., and J.W. Supervision: L.Y. and H.B. Writing, original draft: Y.W. and Y.D. Writing, review, and editing: H.B.

Corresponding authors

Correspondence to Li Yang, Jing Wu or Harrison Bai.

Ethics declarations

Competing interests

Harrison Bai, MD, serves as an Associate Editor of npj Digital Medicine. He was not involved in the peer-review process or editorial decision-making for this manuscript.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Wang, Y., Dai, Y., Wang, R. et al. Integrating large language models for enhanced predictive analytics in healthcare. npj Digit. Med. (2026). https://doi.org/10.1038/s41746-026-02572-y

  • Received: 15 December 2025

  • Accepted: 13 March 2026

  • Published: 02 April 2026

  • DOI: https://doi.org/10.1038/s41746-026-02572-y

Associated content

Collection

Artificial Intelligence in Emergency and Critical Care Medicine
