Fig. 4: Analysis of hallucinations in the LLMs and classification of diagnostic errors by LLMs and physicians. | npj Digital Medicine

From: Multiple large language models versus experienced physicians in diagnosing challenging cases with gastrointestinal symptoms

a, b Scatter plots of the number of hallucinations against the coverage rate (left) and against accuracy (right), with one point for each round of responses by each of the seven LLMs. The number of hallucinations was negatively correlated with the coverage rate, but the correlation was not statistically significant (Pearson r = −0.356, p = 0.063); its negative correlation with accuracy was significant (Pearson r = −0.458, p = 0.014). c The first seven pie charts show the categorization of error types for each of the seven LLMs across four rounds of repeated testing. The last three pie charts show the error categorization for the five physicians: one for all cases in the dataset, one for the GI subgroup, and one for the non-GI subgroup.
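As a rough consistency check on the reported statistics, the sketch below converts a Pearson coefficient into a two-sided p-value using the standard t-test for a correlation. It assumes each scatter plot contains 28 points (seven LLMs times four rounds) and that the reported p-values are two-sided; neither is stated explicitly in the caption. The function names `t_pdf` and `pearson_p_value` are hypothetical helpers written for this illustration, not the authors' analysis code.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def pearson_p_value(r, n, steps=2000):
    """Two-sided p-value for a Pearson coefficient r computed from n points.

    Uses t = |r| * sqrt((n - 2) / (1 - r^2)) with n - 2 degrees of freedom,
    and integrates the t density from 0 to t with Simpson's rule
    (steps must be even).
    """
    df = n - 2
    t = abs(r) * math.sqrt(df / (1 - r * r))
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_half = total * h / 3  # probability mass between 0 and t
    return 1 - 2 * central_half


# Assumption: 7 LLMs x 4 rounds = 28 points per scatter plot.
n_points = 7 * 4
for label, r in (("coverage rate", -0.356), ("accuracy", -0.458)):
    p = pearson_p_value(r, n_points)
    print(f"hallucinations vs. {label}: r = {r}, two-sided p = {p:.3f}")
```

Under these assumptions, the computed values fall close to the p-values reported in the caption, which is consistent with the correlations having been computed over 28 response rounds.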
