
  • Perspective

Scaling medical AI across clinical contexts

Abstract

Medical artificial intelligence (AI) tools, including clinical language models, vision–language models and multimodal health record models, are used to summarize clinical notes, answer questions and support decisions. Their adaptation to new populations, specialties or care settings often relies on fine-tuning, prompting or retrieval from external knowledge bases. These strategies can scale poorly and risk contextual errors—outputs that appear plausible but miss critical patient or situational information. We envision context switching as an emergent solution. Context switching adjusts model reasoning at inference, without retraining. Generative models can tailor outputs to patient biology, care setting or disease. Multimodal models can switch between notes, laboratory results, imaging and genomics, even when some data are missing or delayed. Agent models can coordinate tools and roles based on task and user context. In each case, context switching enables medical AI to adapt across specialties, populations and geographies. This approach requires advances in data design, model architectures and evaluation frameworks, and establishes a foundation for medical AI that scales to an infinite number of contexts, while remaining reliable and suited to real-world care.
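To make the abstract's central mechanism concrete, the following is a minimal sketch of inference-time context switching: a context descriptor selects specialty-specific instructions and declares missing or delayed modalities, so the model's reasoning is adjusted per request without any retraining. This is an illustrative sketch only, not from the paper; the names `ClinicalContext`, `TEMPLATES` and `build_context_block`, and the template wording, are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class ClinicalContext:
    """Inference-time context descriptor; model weights never change."""
    specialty: str
    care_setting: str
    available_modalities: set[str] = field(default_factory=set)

# Hypothetical instruction templates keyed by specialty (illustrative only).
TEMPLATES = {
    "oncology": "Prioritize staging, treatment lines and trial eligibility.",
    "primary_care": "Prioritize screening, chronic care and referral criteria.",
}
DEFAULT_TEMPLATE = "Apply general clinical reasoning."

def build_context_block(ctx: ClinicalContext) -> str:
    """Assemble the instructions injected at inference for this context.

    Modalities that are missing or delayed are listed explicitly so a
    downstream model can hedge rather than assume absent data.
    """
    expected = {"notes", "labs", "imaging", "genomics"}
    missing = sorted(expected - ctx.available_modalities)
    lines = [
        f"Care setting: {ctx.care_setting}.",
        TEMPLATES.get(ctx.specialty, DEFAULT_TEMPLATE),
    ]
    if missing:
        lines.append("Unavailable data (do not infer): " + ", ".join(missing) + ".")
    return "\n".join(lines)

# Switching context is a new descriptor, not a new model.
block = build_context_block(
    ClinicalContext("oncology", "inpatient", {"notes", "labs"})
)
print(block)
```

The design point is that adaptation lives entirely in the descriptor passed at inference; swapping `"oncology"` for `"primary_care"`, or changing the available modalities, retargets the same frozen model to a different clinical context.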


Fig. 1: From pretraining to contextual adaptation in medical AI.
Fig. 2: Context switching across medical specialties and diseases, geographies and populations, and healthcare roles.


Acknowledgements

M.M.L. is supported by The Ivan and Francesca Berkowitz Family Living Laboratory Collaboration at Harvard Medical School and Clalit Research Institute. We gratefully acknowledge the support of NIH R01-HD108794, NSF CAREER 2339524, US DoD FA8702-15-D-0001, ARPA-H BDF program, Chan Zuckerberg Initiative, Bill & Melinda Gates Foundation INV-079038, Amazon Faculty Research, Google Research Scholar Program, AstraZeneca Research, Roche Alliance with Distinguished Scientists, Sanofi iDEA-iTECH, Pfizer Research, GlaxoSmithKline Award, Boehringer Ingelheim Award, Merck Award, Optum AI Research Collaboration Award, John and Virginia Kaneb Fellowship at Harvard Medical School, Biswas Computational Biology Initiative in partnership with the Milken Institute, Harvard Medical School Dean’s Innovation Fund for the Use of Artificial Intelligence, and Kempner Institute for the Study of Natural and Artificial Intelligence at Harvard University. Any opinions, findings, conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the funders.

Author information


Contributions

All authors contributed to the writing of the paper and provided critical feedback. M.M.L. and M.Z. conceived the study and were responsible for overall direction and planning.

Corresponding authors

Correspondence to Michelle M. Li or Marinka Zitnik.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Medicine thanks Dokyoon Kim and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Primary Handling Editor: Karen O’Leary, in collaboration with the Nature Medicine team.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Li, M.M., Reis, B.Y., Rodman, A. et al. Scaling medical AI across clinical contexts. Nat Med (2026). https://doi.org/10.1038/s41591-025-04184-7

