Fig. 6: GPT-4’s performance across two settings: five-choice vs. three-choice WEP options, for all four metrics. | npj Complexity

From: An evaluation of estimative uncertainty in large language models

Results are analyzed under both narrow (less uncertain) and wide (more uncertain) outcome ranges. Standard errors and significance are reported as in Fig. 5.

Back to article page