Introduction

Physicians often need to make rapid diagnoses from insufficient data so that treatment can begin and patients can be helped1. To prevent misdiagnoses and medical errors, it is important to consider a broad differential of possible disease etiologies, including both likely and “can’t miss” diagnoses. In the process of arriving at a definitive diagnosis, physicians discuss their diagnostic explanations and adhere to clinical guidelines2,3,4,5. However, because real-world cases often do not align cleanly with clinical guidelines, the probability of each candidate diagnosis must also be weighed.

While large language models (LLMs) can generate a list of possible diagnoses, they often cannot accurately assess the likelihood of each one3. Moreover, when prompted to estimate likelihoods, LLMs tend to produce inaccurate, overconfident estimates3,6,7. Zhou et al. reason that an explainable AI approach would increase the reliability and trustworthiness of LLMs, i.e., an LLM should be able to explain how well a patient’s symptoms and data match the diagnostic criteria proposed in clinical guidelines8.
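To make “overconfident” concrete, the minimal sketch below compares an LLM’s stated probability for its top diagnosis against how often that diagnosis is actually correct. The numbers and field layout are illustrative assumptions, not data from Zhou et al.; the point is only that a positive gap between stated confidence and observed accuracy is the overconfidence the cited studies describe.

```python
# Illustrative only: each record pairs the probability an LLM assigned to its
# top diagnosis with whether that diagnosis matched the reference label.
# The numbers are made up for the sake of the example.
predictions = [
    (0.95, True), (0.90, False), (0.85, False),
    (0.80, True), (0.75, False), (0.70, True),
]

def overconfidence_gap(preds):
    """Mean stated confidence minus observed accuracy.

    A well-calibrated model has a gap near zero; a positive gap means the
    model claims more certainty than its answers warrant.
    """
    mean_confidence = sum(prob for prob, _ in preds) / len(preds)
    accuracy = sum(correct for _, correct in preds) / len(preds)
    return mean_confidence - accuracy

print(f"stated confidence exceeds accuracy by {overconfidence_gap(predictions):.2f}")
```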

Recognizing uncertainty may improve patient outcomes

Uncertainty is common in hospital settings, and unfortunately, the pressure on physicians to arrive quickly at a definitive diagnosis can lead to misdiagnosis and patient harm9. Most people have a psychological bias towards seeking definitive answers, but acknowledging uncertainty when appropriate makes it more likely that physicians will keep seeking the information they need and revise their diagnoses as new findings emerge10,11.

Recognizing what information is lacking requires knowing what you do not know12. Many current approaches to applying LLMs fail to assess whether the available medical data are sufficient before committing to a diagnosis6,13. This is because these LLMs are typically trained to provide answers rather than to admit uncertainty, and training datasets are often curated to exclude cases with noisy or confusing data. As a result, LLMs can hallucinate reasoning or diagnoses. Explainability is critical for all AI systems, but in medicine in particular, false certainty and misdiagnosis can have dangerous consequences.

ConfiDx acknowledges uncertainty and improves trust

In contrast, the outputs from ConfiDx not only explain how well the diagnostic criteria match the patient’s case but also indicate what data would be needed to reach a more certain diagnosis. This improvement arises from training ConfiDx on both evidence-complete and evidence-incomplete notes derived from the MIMIC-IV dataset, which provides de-identified electronic health records for nearly 300,000 patients treated for cardiovascular, endocrine, or hepatic conditions at Beth Israel Deaconess Medical Center14. To generate the evidence-incomplete notes, a portion of the relevant diagnostic evidence was masked, and ConfiDx was trained to flag these cases as diagnostically uncertain.
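A minimal sketch of how such evidence-incomplete training cases could be produced is shown below. This is not the authors’ code: the note text, the list of evidence spans, the `[MASKED]` placeholder, and the field names are assumptions made for illustration. The idea is simply to remove a fraction of the sentences that carry diagnostic evidence and label the result as an uncertain case, keeping track of what was removed so the model can learn what data are missing.

```python
import random

def make_evidence_incomplete(note: str, evidence_spans: list[str],
                             mask_fraction: float = 0.5,
                             placeholder: str = "[MASKED]") -> dict:
    """Mask a fraction of the diagnostic-evidence spans in a clinical note.

    Returns a training example whose label marks the diagnosis as uncertain,
    along with the spans that were removed (the "missing data" the model
    should learn to ask for).
    """
    n_to_mask = max(1, int(len(evidence_spans) * mask_fraction))
    masked_spans = random.sample(evidence_spans, n_to_mask)
    incomplete_note = note
    for span in masked_spans:
        incomplete_note = incomplete_note.replace(span, placeholder)
    return {
        "note": incomplete_note,
        "label": "uncertain",          # evidence-incomplete case
        "missing_evidence": masked_spans,
    }

# Toy example (synthetic text, not MIMIC-IV data):
note = ("Patient reports polyuria and polydipsia. "
        "HbA1c measured at 9.1%. Fasting glucose 182 mg/dL.")
evidence = ["HbA1c measured at 9.1%.", "Fasting glucose 182 mg/dL."]
print(make_evidence_incomplete(note, evidence, mask_fraction=0.5))
```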

Zhou et al. show that, compared with off-the-shelf models such as GPT-4o, OpenAI-o1, Gemini-2.0, Claude-3.7, and DeepSeek-R1, ConfiDx substantially improves uncertainty-aware diagnostic performance on a separate test dataset: it is better not only at disease diagnosis but also at diagnostic explanation, uncertainty recognition, and uncertainty explanation8. In addition, compared with unassisted medical experts, ConfiDx-assisted experts were 10.7% more accurate at uncertainty recognition, 14.6% more accurate at diagnostic explanation, and 26.3% better at uncertainty explanation8.
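One way to read the “uncertainty recognition” component is as a simple binary agreement between the model’s verdict (certain vs. uncertain) and whether the case was actually evidence-complete. The sketch below scores it that way; it is an illustrative interpretation, not the paper’s evaluation code, and the field names are assumptions.

```python
def uncertainty_recognition_accuracy(cases: list[dict]) -> float:
    """Fraction of cases where the model's certainty verdict matches the
    ground truth: evidence-complete -> "certain", evidence-incomplete ->
    "uncertain". Field names here are illustrative assumptions."""
    correct = 0
    for case in cases:
        expected = "certain" if case["evidence_complete"] else "uncertain"
        correct += case["model_verdict"] == expected
    return correct / len(cases)

# Toy evaluation set:
cases = [
    {"evidence_complete": True,  "model_verdict": "certain"},
    {"evidence_complete": False, "model_verdict": "uncertain"},
    {"evidence_complete": False, "model_verdict": "certain"},   # missed uncertainty
]
print(f"uncertainty recognition accuracy: "
      f"{uncertainty_recognition_accuracy(cases):.2f}")
```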

Improvements in recognizing uncertainty and missing information make it easier for physicians to trust ConfiDx, because it is less likely to be falsely confident than other LLMs. Its explanations let physicians follow the “reasoning” by which ConfiDx arrived at a conclusion and judge whether that reasoning makes sense. Importantly, physicians who were randomly assigned to use ConfiDx were more likely to recognize uncertainty, provided more accurate diagnostic explanations, and offered better explanations of uncertainty than physicians who did not use ConfiDx or any other LLM. This, in turn, could help patients better understand their diagnoses and prognoses.

Conclusion

In an era in which quick answers are applauded, taking the time to recognize uncertainty is key to avoiding the medical mishaps that follow from misdiagnosis and false confidence. While uncertainty-aware models have increasingly been developed and used outside of medicine, ConfiDx paves the way as the first example of an LLM in medicine that honors uncertainty. Zhou et al. demonstrate that such LLMs can be beneficial in the clinic and may improve patient outcomes in the future.