Fig. 3: Distributions of self-reported confidence by model and correctness of response.

Left panel: overall distributions of self-reported confidence scores (0–10 scale) for each model, with star markers indicating mean response accuracy for that model. Right panel: distributions of self-reported confidence scores stratified by response accuracy, showing separate curves for correct and incorrect answers for each model.