Technical metrics used to evaluate medical artificial intelligence tools often fail to predict their clinical impact. We characterize this discordance and propose a framework of study designs to guide the translational process for clinical artificial intelligence tools, acknowledging their diversity and specific validation requirements.
Acknowledgements
F.R.K. receives support from the German Cancer Research Center (CoBot 2.0), the Joachim Herz Foundation (Add-On Fellowship for Interdisciplinary Life Science), the Central Indiana Corporate Partnership AnalytiXIN Initiative, the Evan and Sue Ann Werling Pancreatic Cancer Research Fund, and the Indiana Clinical and Translational Sciences Institute (EPAR4157), funded, in part, by grant number UM1TR004402 from the National Institutes of Health, the National Center for Advancing Translational Sciences, Clinical and Translational Sciences Award. J.N.K. is supported by the German Cancer Aid DKH (DECADE, 70115166), the German Federal Ministry of Research, Technology and Space BMFTR (PEARL, 01KD2104C; CAMINO, 01EO2101; TRANSFORM LIVER, 031L0312A; TANGERINE, 01KT2302 through ERA-NET Transcan; Come2Data, 16DKZ2044A; DEEP-HCC, 031L0315A; DECIPHER-M, 01KD2420A; NextBIG, 01ZU2402A), the German Research Foundation DFG (CRC/TR 412, 535081457; SFB 1709/1 2025, 533056198), the German Academic Exchange Service DAAD (SECAI, 57616814), the German Federal Joint Committee G-BA (TransplantKI, 01VSF21048), the European Union’s Horizon Europe Research and Innovation Programme (ODELIA, 101057091; GENIAL, 101096312), the European Research Council ERC (NADIR, 101114631), the National Institutes of Health NIH (EPICO, R01 CA263318) and the National Institute for Health and Care Research NIHR (Leeds Biomedical Research Centre, NIHR203331). The views expressed are those of the authors and not necessarily those of the National Institutes of Health, the NHS, the NIHR or the Department of Health and Social Care. This work was funded by the European Union. Views and opinions expressed are, however, those of the authors only and do not necessarily reflect those of the European Union. Neither the European Union nor the granting authority can be held responsible for them.
Author information
Contributions
Both authors developed the adaptive validation framework and wrote and revised the manuscript.
Ethics declarations
Competing interests
F.R.K. declares advisory roles for Radical Health AI and the Surgical Data Science Collective. J.N.K. declares ongoing consulting services for AstraZeneca, Panakeia and Bioptimus. Furthermore, J.N.K. holds shares in Stratifai, Synagen and Spira Labs, has received an institutional research grant from GlaxoSmithKline, and has received honoraria from AstraZeneca, Bayer, Daiichi Sankyo, Eisai, Janssen, Merck, MSD, BMS, Roche, Pfizer and Fresenius.
Peer review
Peer review information
Nature Computational Science thanks the anonymous reviewer(s) for their contribution to the peer review of this work.
About this article
Cite this article
Kolbinger, F.R., Kather, J.N. Adaptive validation strategies for real-world clinical artificial intelligence. Nat Comput Sci 5, 980–986 (2025). https://doi.org/10.1038/s43588-025-00901-x
DOI: https://doi.org/10.1038/s43588-025-00901-x