Fig. 2: Transformers and humans trade off in-context and in-weights learning depending on the training data distribution (Experiment 1).
From: Shared sensitivity to data distribution during learning in humans and transformer networks

a, Training and test performances for transformers (n = 30 per training data distribution). b, The same for human participants (Experiment 1; n = 30 per training data distribution). The small dots indicate data from individual transformers/humans; the large dots indicate group averages. c, Scatter plots of the in-context versus in-weights test performances for feed-forward networks (left), LSTM networks (middle left), transformers (middle right) and humans (right). Feed-forward and LSTM networks do not learn in-context. Transformers and human participants trade off in-context and in-weights learning. Each dot indicates data from an individual model/human.
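As a loose illustration (not the authors' analysis code), the data behind a panel-c-style scatter are simply paired per-individual scores: one in-weights accuracy and one in-context accuracy per model or participant, with the group average overlaid as a single large dot. The sketch below uses hypothetical accuracies drawn at random; the variable names and the assumed anticorrelation are illustrative only.

```python
import numpy as np

# Hypothetical per-individual test accuracies (n = 30, matching the caption).
# in_weights: accuracy on queries answerable from trained associations alone;
# in_context: accuracy on queries resolvable only from the prompt context.
rng = np.random.default_rng(0)
in_weights = rng.uniform(0.0, 1.0, size=30)
# Assumed trade-off for illustration: strong in-weights learners score
# lower in-context (plus a little noise), as in the transformer/human panels.
in_context = np.clip(1.0 - in_weights + rng.normal(0.0, 0.05, size=30), 0.0, 1.0)

# Small dots: one (in_context, in_weights) pair per individual.
points = np.column_stack([in_context, in_weights])
# Large dot: the group average.
group_avg = points.mean(axis=0)
```

A plotting library (e.g. matplotlib's `scatter`) would then draw `points` as small markers and `group_avg` as one large marker per panel.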