Fig. 3: Evaluation in different held-out settings. | Nature

From: A foundation model to predict and capture human cognition

a, Negative log-likelihoods averaged over responses (n = 9,702) for the two-step task with a modified cover story (ref. 23). b, Negative log-likelihoods averaged over responses (n = 510,154) for a three-armed bandit experiment (ref. 25). c, Negative log-likelihoods averaged over responses (n = 99,204) for an experiment probing logical reasoning (ref. 26) with items based on the Law School Admission Test (LSAT). Centaur outperforms both Llama and domain-specific cognitive models when faced with modified cover stories, problem structures and entirely new domains. N/A, not applicable. Error bars show the s.e.m. The image in a is reproduced from ref. 23, Springer Nature Limited. The image in c is reproduced from Wikipedia.org.
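
As a rough illustration of the quantities plotted in each panel, the sketch below computes a negative log-likelihood averaged over responses together with its s.e.m. It is not the authors' evaluation code: the function name `mean_nll_and_sem` and the toy probabilities are assumptions made for this example, and it assumes the model's probability for each observed response is already available.

```python
import math


def mean_nll_and_sem(probs):
    """Return (mean NLL, s.e.m.) over per-response probabilities.

    `probs` holds the probability a model assigned to each response that
    was actually observed. Hypothetical helper, for illustration only.
    """
    n = len(probs)
    if n < 2:
        raise ValueError("need at least two responses to estimate the s.e.m.")
    # Per-response negative log-likelihood (natural log).
    nlls = [-math.log(p) for p in probs]
    mean = sum(nlls) / n
    # Sample variance with the n - 1 correction.
    var = sum((x - mean) ** 2 for x in nlls) / (n - 1)
    # Standard error of the mean.
    sem = math.sqrt(var / n)
    return mean, sem


# Toy data, not taken from the paper.
toy_probs = [0.5, 0.25, 0.5, 0.25]
mean, sem = mean_nll_and_sem(toy_probs)
print(f"mean NLL = {mean:.4f}, s.e.m. = {sem:.4f}")
```

A lower averaged negative log-likelihood means the model assigned higher probability to the responses participants actually gave.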
