Extended Data Fig. 5: Performance of transformers trained on a wide range of composite distributions.
From: Shared sensitivity to data distribution during learning in humans and transformer networks

Scatter plots of in-context versus in-weights test performance of transformers trained with different values of Pc (the proportion of in-context trials during training) and αs (for the remaining trials). Each dot is an individual model.