Fig. 6: Bayesian teaching generalizes to human users.
From: Bayesian teaching enables probabilistic reasoning in large language models

We show accuracy over rounds when the user is a human participant. The original (non-fine-tuned) LLMs achieve strong performance but show no learning behavior. In contrast, the fine-tuned LLMs (trained with either Bayesian or oracle teachers) improve over rounds, and the Bayesian-taught LLMs consistently outperform the oracle-taught LLMs. Error bars show standard errors across four random seeds (and three training runs); they are too small to be visible in the plot.
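The per-round means and standard errors described above can be sketched as follows. This is a minimal illustration, not the authors' analysis code: the array shape, round count, and synthetic accuracy values are all assumptions made for the example.

```python
import numpy as np

# Hypothetical data: rows are random seeds, columns are rounds.
# Shape (4, 10) mirrors the caption's four random seeds; the
# number of rounds and the values themselves are invented here.
rng = np.random.default_rng(0)
accuracy = np.clip(
    0.6 + 0.03 * np.arange(10) + rng.normal(0.0, 0.01, size=(4, 10)),
    0.0, 1.0,
)

# Mean accuracy per round, averaged across seeds.
mean_acc = accuracy.mean(axis=0)

# Standard error of the mean across seeds: sample std / sqrt(n).
sem = accuracy.std(axis=0, ddof=1) / np.sqrt(accuracy.shape[0])
```

With only four seeds and small between-seed variation, `sem` stays tiny relative to the accuracy scale, which is why such error bars can vanish behind the plot markers.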