Collection

Evaluating the Real-World Clinical Performance of AI

Submission status: Open
Submission deadline: 03 June 2026

As artificial intelligence (AI) continues to transform healthcare, its shift from controlled research environments to real-world clinical settings presents both immense promise and critical challenges. This collection in npj Digital Medicine showcases cutting-edge research that rigorously examines how AI systems perform in diverse, dynamic, and often unpredictable clinical contexts.

We invite contributions that explore the following areas:

- Clinical utility: Demonstrating how AI tools enhance diagnostic accuracy, inform treatment decisions, and improve patient outcomes in prospective or retrospective evaluations
- Safety and reliability: Assessing risks, unintended consequences, and robustness of AI systems in real-world deployment
- Equity and fairness: Evaluating performance across diverse populations to uncover and mitigate algorithmic bias
- Scalability and generalizability: Understanding how AI systems adapt across institutions, specialities, and care settings
- Workflow integration: Investigating how AI fits into clinical processes, team dynamics, and decision-making, and identifying barriers and facilitators to AI adoption in clinical workflows
- Post-deployment monitoring and evaluation: Including model updating, performance decay, feedback loops, audits, prospective trials, and longitudinal studies of AI in practice over time
- Transparency and reproducibility: Promoting open methods, interpretable models, and replicable results to foster trust and accountability
- Patient-centered outcomes: Measuring the impact of AI on care quality, experience, and health equity
- Human factors and clinician-AI interaction: Including trust, interpretability, and decision support dynamics

By spotlighting real-world evidence, this collection aims to bridge the gap between algorithmic innovation and clinical impact, ensuring that AI technologies not only work in theory but also deliver value in clinical practice.

Editors

Wei-Qi Wei, MD, PhD
Vanderbilt University Medical Center, USA

Lequan Yu, PhD
The University of Hong Kong, HK, China

Dinh Nguyen, MD, MSHI
Kaiser Permanente, USA

Multidimensional evaluation of large language models in radiology report readability
- Yunhai Mao
- Chunyan Wang
- Mengchao Zhang
Article | Open Access | 1 Apr 2026 | npj Digital Medicine
Comparative performance of LLMs and machine learning in predicting complications after percutaneous kyphoplasty for osteoporotic vertebral compression fractures
- Tianyi Wang
- Ruiyuan Chen
- Lei Zang
Article | Open Access | 1 Apr 2026 | npj Digital Medicine
A unified deep learning framework for cross-platform harmonization of multi-tracer PET quantification in neurodegenerative disease
- Jing Wang
- Aocheng Zhong
- Chuantao Zuo
Article | Open Access | 30 Mar 2026 | npj Digital Medicine
Real-world unified denoising for multi-organ fast MRI: a large-scale prospective validation
- Yuchen Shao
- Hongyan Huang
- Yingwei Qiu
Article | Open Access | 19 Mar 2026 | npj Digital Medicine
Limited validity of an AI-powered app for dietary assessment in females with obesity
- Michele Serra
- Daniela Alceste
- Marco Bueter
Article | Open Access | 17 Mar 2026 | npj Digital Medicine
A multicenter multifunctional assessment of large language models in pure-tone audiogram interpretation for patients
- Jun Liang
- Mengyao Xing
- Jianbo Lei
Article | Open Access | 15 Mar 2026 | npj Digital Medicine
Automated interpretation of fetal cardiac function evaluation from the echocardiogram
- Caixin Huang
- Lihe Zhang
- Hongning Xie
Article | Open Access | 10 Mar 2026 | npj Digital Medicine
Comparing artificial intelligence and healthcare professional performance in surgical and interventional video analysis: a systematic review and meta-analysis
- Amir Rafati Fard
- Simon C. Williams
- Hani J. Marcus
Article | Open Access | 6 Mar 2026 | npj Digital Medicine
Performance of the 12-lead ECG in predicting short- and long-term risk of sudden cardiac death
- Jussi A. Hernesniemi
- Teemu Pukkila
- Juho Tynkkynen
Article | Open Access | 5 Mar 2026 | npj Digital Medicine
A test-time clinically adaptive framework for detecting multiple fundus diseases harnessing ophthalmic foundation models
- Hongyang Jiang
- Zirong Liu
- Carol Y. Cheung
Article | Open Access | 2 Mar 2026 | npj Digital Medicine
Ambient scribe in general practice: a multi-perspective before-after longitudinal mixed-methods study
- R. C. A. van Linschoten
- C. M. van Loon
- E. W. M. A. Bischoff
Article | Open Access | 2 Mar 2026 | npj Digital Medicine
Deep learning for fast screening and localization of spinal dural arteriovenous fistulas to enhance clinical workflow
- Fei Zheng
- Xuyang Cao
- Nan Hong
Article | Open Access | 27 Feb 2026 | npj Digital Medicine
Multi night digital assessment of sleep disordered breathing is associated with accelerated vascular aging
- Lucía Pinilla
- Kelly Sansom
- Danny J. Eckert
Article | Open Access | 26 Feb 2026 | npj Digital Medicine
A large-scale benchmark for evaluating large language models on medical question answering in Romanian
- Ana-Cristina Rogoz
- Radu Tudor Ionescu
- Andreea Iuliana Ionescu
Article | Open Access | 21 Feb 2026 | npj Digital Medicine
Hierarchical deep learning pipeline for robust cervical parameter measurement in radiographs with C7 obscuration
- Dong-Ho Kang
- Se-Jun Park
- Chong-Suh Lee
Article | Open Access | 19 Feb 2026 | npj Digital Medicine
A deep learning model integrating structured data and clinical text for predicting atrial fibrillation recurrence
- Sixiang Jia
- Yanping Yin
- Shudong Xia
Article | Open Access | 16 Feb 2026 | npj Digital Medicine
A weakly supervised transformer for rare disease diagnosis and subphenotyping from EHRs with pulmonary case studies
- Kimberly F. Greco
- Zongxin Yang
- Tianxi Cai
Article | Open Access | 6 Feb 2026 | npj Digital Medicine
Neck-to-knee dixon MRI thigh volume as a superior mass biomarker for Sarcopenia: evidence from the UK biobank
- Hyeon Su Kim
- Hyunwoo Park
- Jun-Il Yoo
Article | Open Access | 5 Feb 2026 | npj Digital Medicine
AI-driven low-cost rehabilitation exergame as a lightweight framework for stroke assessment
- Júlia Tannús
- Caroline Valentini
- Eduardo Naves
Article | Open Access | 28 Jan 2026 | npj Digital Medicine
Human–large language model collaboration in clinical medicine: a systematic review and meta-analysis
- Guoyong Wang
- Kaijun Zhang
- Xiaonan Yang
Article | Open Access | 28 Jan 2026 | npj Digital Medicine
Evaluation of large language models for diagnostic impression generation from brain MRI report findings: a multicenter benchmark and reader study
- Ming-Liang Wang
- Rui-Peng Zhang
- Yue-Hua Li
Article | Open Access | 22 Jan 2026 | npj Digital Medicine
Diagnostic and interpretive gains from reasoning over conclusions with a large reasoning model in radiology
- Ruixin Wang
- Jinghang Wang
- Jun Liu
Article | Open Access | 31 Dec 2025 | npj Digital Medicine
A deep learning based automated maxillary sinus segmentation and bone grafts analysis in CBCT images
- Fan Yang
- Xing Wu
- Linhong Wang
Article | Open Access | 31 Dec 2025 | npj Digital Medicine
Real-world performance evaluation of a commercial deep learning model for intracranial hemorrhage detection
- Mohammadreza Chavoshi
- Aawez Mansuri
- Hari Trivedi
Article | Open Access | 24 Dec 2025 | npj Digital Medicine
ARTEMIS: a pilot study comparing AI-based and expert therapeutic decisions in simulated clinical cases of neuroendocrine neoplasms
- Giuseppe Lamberti
- Francesco Panzuto
- Davide Campana
Article | Open Access | 23 Dec 2025 | npj Digital Medicine
Prospective evaluation of speech as a digital biomarker for covert hepatic encephalopathy
- Jakub Gazda
- Juan Carlos García-Pagán
- Peter Jarcuska
Article | Open Access | 22 Dec 2025 | npj Digital Medicine
Deep learning-based non-contrast MRI model for nasopharyngeal carcinoma diagnosis: an end-to-end gadolinium-free solution
- Zhen Li
- Yiqian Shi
- Xiaofeng Liu
Article | Open Access | 22 Dec 2025 | npj Digital Medicine
ECG sonification methods for robust and generalizable clinical decision support
- Mohamed Elgendi
- Azza Elkhalifa
- Rabab Ward
Article | Open Access | 16 Dec 2025 | npj Digital Medicine
