Fig. 1: Evaluating and improving LLMs’ probabilistic belief updates.
From: Bayesian teaching enables probabilistic reasoning in large language models

The flight recommendation task (left) involves multi-round interactions between a user and a flight-booking assistant. In each round, the assistant is asked to recommend one of three available flight options to the user. The assistant is then shown the flight the user actually chose (based on the user’s reward function, which characterizes the user’s preferences). To make good recommendations, the assistant needs to infer the user’s preferences from the user’s choices. To teach the LLM to reason probabilistically, we fine-tune it on interactions between users and a Bayesian Assistant, which updates beliefs about the user’s preferences in the normative, Bayesian way. We then evaluate the fine-tuned model on the flight recommendation task as well as two new tasks (right).
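To make the Bayesian Assistant’s belief update concrete, the sketch below illustrates one way such an assistant could be implemented; it is not the paper’s implementation. It assumes a discrete hypothesis space of user reward functions (weight vectors over flight features), a softmax (Boltzmann-rational) choice likelihood, and illustrative parameters such as `beta`, `n_hypotheses`, and `n_features`, none of which are specified in the figure.

```python
import numpy as np

# Illustrative sketch of a Bayesian Assistant for the flight task (assumptions,
# not the paper's code): discrete reward-function hypotheses + softmax choices.

rng = np.random.default_rng(0)

n_hypotheses = 50   # candidate user reward functions (assumed)
n_features = 4      # flight features, e.g. price, duration (assumed)
beta = 5.0          # choice-rationality temperature (assumed)

# Each hypothesis is a weight vector over flight features.
hypotheses = rng.normal(size=(n_hypotheses, n_features))
posterior = np.full(n_hypotheses, 1.0 / n_hypotheses)  # uniform prior

def choice_likelihood(flights, chosen_idx):
    """P(user picks the chosen flight | each hypothesized reward function)."""
    utilities = hypotheses @ flights.T                 # (hypotheses, flights)
    probs = np.exp(beta * utilities)
    probs /= probs.sum(axis=1, keepdims=True)          # softmax over the 3 flights
    return probs[:, chosen_idx]

def update(flights, chosen_idx):
    """Bayes' rule: posterior ∝ prior × likelihood of the observed choice."""
    global posterior
    posterior = posterior * choice_likelihood(flights, chosen_idx)
    posterior /= posterior.sum()

def recommend(flights):
    """Recommend the flight with highest posterior-expected utility."""
    expected_utility = posterior @ (hypotheses @ flights.T)
    return int(np.argmax(expected_utility))

# One simulated round: three flights described by feature vectors.
flights = rng.normal(size=(3, n_features))
print("recommendation:", recommend(flights))
update(flights, chosen_idx=1)   # the user actually chose flight 1
print("recommendation after update:", recommend(flights))
```

In this sketch, each observed choice reweights the hypotheses about the user’s reward function, so later recommendations reflect what the earlier choices revealed about the user’s preferences.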