Fig. 3: Distributions of self-reported confidence by model and correctness of response. | npj Gut and Liver

From: Across generations, sizes, and types, large language models poorly report self-confidence in gastroenterology clinical reasoning tasks

Left panel: overall distributions of self-reported confidence scores (0–10 scale) for each model, with star markers indicating mean response accuracy for that model. Right panel: distributions of self-reported confidence scores stratified by response accuracy, showing separate curves for correct and incorrect answers for each model.
