While physicians routinely consider uncertainty during patient diagnosis, large language models (LLMs) often fail to recognize that real-world clinical data can be too limited for a definitive diagnosis. Zhou et al. address this problem by training an LLM, ConfiDx, to recognize medical cases with limited clinical data. This approach improves the utility of LLMs in the clinic and enables physicians to more effectively recognize and explain uncertainty in their patient care.
Introduction
Physicians often need to make rapid diagnoses based on insufficient data to start treatment and help patients1. To prevent medical misdiagnoses and errors, it is important to consider a broad differential of possible disease etiologies, including both likely and “can’t miss” diagnoses. In the process of arriving at a definitive diagnosis, physicians discuss their diagnostic explanations and adhere to clinical guidelines2,3,4,5. However, since real-world medical cases often do not cleanly align with clinical guidelines, the probability of each diagnosis needs to be considered.
While large language models (LLMs) can generate a list of possible diagnoses, they often cannot accurately assess the likelihood of each diagnosis3. Additionally, when prompted to estimate likelihoods, LLMs often produce uncertainty estimates that are inaccurate and overconfident3,6,7. Zhou et al. reason that an explainable AI approach would increase the reliability and trustworthiness of LLMs, i.e., LLMs should be able to provide diagnostic explanations of how well the symptoms and data match diagnostic criteria proposed in clinical guidelines8.
Recognizing uncertainty may improve patient outcomes
Uncertainty is common in hospital settings, and unfortunately, the pressure on physicians to quickly arrive at a definitive diagnosis can lead to misdiagnosis and patient harm9. Most people have a psychological bias towards seeking definitive answers, but acknowledging uncertainty when appropriate increases the likelihood that physicians will continue to seek necessary information and adjust diagnoses based on new information10,11.
Recognizing what information is lacking requires knowing what you do not know12. Many current approaches to applying LLMs fail to assess the sufficiency or lack of medical data before making diagnoses6,13. This is because scientists typically train these LLMs to provide answers rather than to admit uncertainty, and training datasets are often curated to exclude cases with noisy or confusing data. As a result, LLMs can sometimes hallucinate reasoning or diagnoses. Explainability is critical for all AI systems, and in the medical setting in particular, false certainty and misdiagnosis can have dangerous consequences.
ConfiDx acknowledges uncertainty and improves trust
In contrast, the outputs from ConfiDx explain not only how well the diagnostic criteria match the patient case but also what data would be needed to arrive at a more certain diagnosis. This performance increase arises from training ConfiDx on both evidence-complete and evidence-incomplete clinical notes derived from the MIMIC-IV dataset, which provides de-identified electronic health records for nearly 300,000 patients treated for cardiovascular, endocrine, or hepatic issues at Beth Israel Deaconess Medical Center14. To generate evidence-incomplete notes, a portion of the relevant diagnostic evidence was masked, and ConfiDx was trained to recognize these cases as diagnostically uncertain.
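To make this training setup concrete, the short Python sketch below shows one way that paired evidence-complete and evidence-incomplete examples could be constructed by masking diagnostic evidence in a note. It is a minimal illustration assuming a simple span-masking scheme; the function name, note text, evidence spans, and labels are hypothetical and do not reproduce the authors' actual pipeline.

import random

# Minimal sketch (hypothetical, not the authors' code): build an
# evidence-incomplete note by removing part of the diagnostic evidence.
def make_evidence_incomplete(note: str, evidence_spans: list[str],
                             mask_fraction: float = 0.5, seed: int = 0) -> str:
    rng = random.Random(seed)
    n_to_mask = max(1, int(len(evidence_spans) * mask_fraction))
    # Replace each masked evidence span with a placeholder so the note
    # no longer contains all the evidence a guideline-based diagnosis needs.
    for span in rng.sample(evidence_spans, n_to_mask):
        note = note.replace(span, "[EVIDENCE REMOVED]")
    return note

# Hypothetical example: the diagnosis rests on two pieces of laboratory evidence.
note = ("65-year-old with fatigue. HbA1c 8.2%. "
        "Fasting glucose 162 mg/dL. Assessment: type 2 diabetes.")
evidence = ["HbA1c 8.2%.", "Fasting glucose 162 mg/dL."]

complete_example = {"note": note, "label": "certain diagnosis"}
incomplete_example = {"note": make_evidence_incomplete(note, evidence),
                      "label": "uncertain diagnosis",
                      "missing": "laboratory confirmation of hyperglycemia"}

A model trained on such pairs can learn both to state the diagnosis when the evidence is complete and to flag which evidence is missing when it is not.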
Zhou et al. show that, compared to off-the-shelf models such as GPT-4o, OpenAI-o1, Gemini-2.0, Claude-3.7, and DeepSeek-R1, ConfiDx substantially improves uncertainty-aware diagnostic performance when evaluated on a separate test dataset: it is better able to provide not just a disease diagnosis but also a diagnostic explanation, uncertainty recognition, and an uncertainty explanation8. Additionally, compared to unassisted medical experts, ConfiDx-assisted experts were 10.7% more accurate in uncertainty recognition, 14.6% more accurate in diagnostic explanation, and 26.3% better in uncertainty explanation8.
Improvements in recognizing uncertainty and missing information make it easier for physicians to trust ConfiDx, as it is less likely to be falsely confident compared to other LLMs. The explanations provided by ConfiDx allow physicians to follow the “reasoning” of how ConfiDx arrived at the conclusion and determine whether that reasoning makes sense. Importantly, physicians who were randomly selected to use ConfiDx were more likely to recognize uncertainty, provide accurate diagnostic explanations, and offer improved explanations of uncertainty compared to physicians who did not use ConfiDx or any other LLM. This could then help patients better process their diagnoses and prognoses.
Conclusion
In an era where quick answers are applauded, taking the time to recognize uncertainty is key to avoiding medical mishaps from misdiagnosis and false confidence. While uncertainty-aware models have been increasingly developed and used outside of medicine, ConfiDx paves the way as the first example of an LLM in medicine that honors uncertainty. Zhou et al. demonstrate that such LLMs can be beneficial in the clinic and may improve patient outcomes in the future.
Data availability
No datasets were generated or analyzed during the current study.
References
Smith, P. C. et al. Missing clinical information during primary care visits. JAMA 293, 565–571 (2005).
Kresevic, S. et al. Optimization of hepatological clinical guidelines interpretation by large language models: a retrieval augmented generation-based framework. npj Digit. Med. 7, 1–9 (2024).
Hager, P. et al. Evaluation and mitigation of the limitations of large language models in clinical decision-making. Nat. Med. 30, 2613–2622 (2024).
Ghorbani, A. et al. Deep learning interpretation of echocardiograms. npj Digit. Med. 3, 10 (2020).
McDuff, D. et al. Towards accurate differential diagnosis with large language models. Nature 642, 451–457 (2025).
Zhou, S. et al. Large language models for disease diagnosis: a scoping review. npj Artif. Intell. 1, 9 (2025).
Vazhentsev, A. et al. Uncertainty-aware abstention in medical diagnosis based on medical texts. Preprint at https://doi.org/10.48550/arXiv.2502.18050 (2025).
Zhou, S. et al. Uncertainty-aware large language models for explainable disease diagnosis. npj Digit. Med. 8, 690 (2025).
Meyer, A. N. D., Giardina, T. D., Khawaja, L. & Singh, H. Patient and clinician experiences of uncertainty in the diagnostic process: current understanding and future directions. Patient Educ. Couns. 104, 2606–2615 (2021).
Patel, B., Gheihman, G., Katz, J. T., Begin, A. S. & Solomon, S. R. Navigating uncertainty in clinical practice: a structured approach. J. Gen. Intern. Med. 39, 829 (2024).
McGrath, B. M. How doctors think. Can. Fam. Physician 55, 1113 (2009).
Yin, Z. et al. Do large language models know what they don’t know? Preprint at https://doi.org/10.48550/arXiv.2305.18153 (2023).
Bhasuran, B. et al. Preliminary analysis of the impact of lab results on large language model generated differential diagnoses. npj Digit. Med. 8, 1–15 (2025).
Johnson, A. E. W. et al. MIMIC-IV, a freely accessible electronic health record dataset. Sci. Data 10, 1 (2023).
Author information
Contributions
M.S. wrote and edited the main manuscript text. K.R., K.H., E.J.E., and J.C.K. edited and reviewed the manuscript.
Ethics declarations
Competing interests
Authors M.S., K.R., K.H., and E.J.E. declare no financial or non-financial competing interests. Author J.C.K. serves as Editor-in-Chief of this journal and had no role in the peer-review or decision to publish this manuscript. Author J.C.K. declares no financial competing interests.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Sui, M., Rosen, K., Heydari, K. et al. The value of doubt: training LLMs to consider diagnostic uncertainty may improve clinical utility. npj Digit. Med. 9, 141 (2026). https://doi.org/10.1038/s41746-025-02307-5