Multidimensional evaluation of large language models in radiology report readability

Mao, Yunhai; Wang, Chunyan; Li, Yuxin; Wang, Wei; Zhang, Mengchao

doi:10.1038/s41746-026-02589-3

Article
Open access
Published: 01 April 2026

npj Digital Medicine (2026)

We are providing an unedited version of this manuscript to give early access to its findings. Before final publication, the manuscript will undergo further editing. Please note there may be errors present which affect the content, and all legal disclaimers apply.

Abstract

This study systematically investigated the influence of demographic characteristics on the readability of patient-centered radiology reports and compared the performance of different large language models (LLMs) in generating such reports. Using a sequential two-stage design, the research first conducted a retrospective evaluation of 320 radiology reports, followed by a clinical-setting validation with 800 patients. Results suggested that all three LLMs significantly improved the readability of radiology reports (P < 0.05), with DeepSeek-R1 showing potentially superior performance within this specific cohort. Demographic analysis revealed significant interactive effects: higher education and older age (within consistent educational levels) were associated with better comprehension. Clinical-setting validation further indicated that reading simplified reports has the potential to significantly improve patients’ subjective and objective comprehension while significantly alleviating medical anxiety (P < 0.05). However, limitations persist, including inconsistent model outputs, missing anatomical details, and comprehension variances driven by demographic factors. Consequently, LLMs should be integrated as auxiliary communication tools for radiologists rather than standalone solutions, necessitating personalized interventions tailored to specific demographic profiles.

Data availability

The datasets generated or analyzed during the study are available from the corresponding author on reasonable request.

Acknowledgements

We gratefully acknowledge the Radiology Department of the Third Hospital of Jilin University for its support of this research, and Professor Mengchao Zhang of the research team.

Author information

These authors contributed equally: Yunhai Mao, Chunyan Wang.

Authors and Affiliations

Department of Radiology, the Third Hospital of Jilin University, Changchun, China
Yunhai Mao, Chunyan Wang, Yuxin Li, Wei Wang & Mengchao Zhang

Contributions

M.Z. conceptualized the study, performed formal analysis and investigation, and was responsible for project administration and supervision. Y.M. and C.W. (equal contributors) contributed to data curation, formal analysis, methodology, validation, and visualization; wrote the original draft; and revised the manuscript. Y.L. contributed to data curation, methodology, and visualization. W.W. contributed to methodology and writing the original draft. All authors have read and approved the manuscript.

Corresponding author

Correspondence to Mengchao Zhang.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Mao, Y., Wang, C., Li, Y. et al. Multidimensional evaluation of large language models in radiology report readability. npj Digit. Med. (2026). https://doi.org/10.1038/s41746-026-02589-3

Received: 15 December 2025
Accepted: 17 March 2026
Published: 01 April 2026
DOI: https://doi.org/10.1038/s41746-026-02589-3
