Extended Data Fig. 3: Performance as a function of the image frequency during training.
From: Shared sensitivity to data distribution during learning in humans and transformer networks

a,b, Training and test performance for transformers (top, N = 30 per training data distribution) and human participants (bottom, Exp. 1, N = 30 per training data distribution) as a function of how often each image appeared during training. For each value of α, test items were grouped by how often they appeared during training. For example, for α = 2, ‘top 1’ corresponds to the image that was seen 92 times during training, ‘top 2–4’ to images that were seen ~13 times, and ‘top 5–10’ to images that were seen ~2 times. Large dots show group averages; error bars show s.e.m.
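
The grouping described in this legend can be sketched as follows. This is a minimal illustration, not the authors' analysis code: the function names, the per-image accuracy values and the assumption of exactly ten images are hypothetical, and the training counts are the rounded values quoted above for α = 2.

```python
from math import sqrt
from statistics import mean, stdev

# Rank bins named in the legend: (label, first rank, last rank), 1-indexed.
RANK_BINS = [("top 1", 1, 1), ("top 2-4", 2, 4), ("top 5-10", 5, 10)]


def bin_for_rank(rank):
    """Return the legend's bin label for a 1-indexed frequency rank."""
    for label, first, last in RANK_BINS:
        if first <= rank <= last:
            return label
    raise ValueError(f"rank {rank} is outside the bins used in the legend")


def group_by_frequency(train_counts, accuracy):
    """Average one learner's per-image accuracy within each frequency bin.

    train_counts[i] is how often image i was shown during training and
    accuracy[i] is that learner's accuracy on image i (both hypothetical
    inputs). Images are ranked from most to least frequent.
    """
    order = sorted(range(len(train_counts)), key=lambda i: -train_counts[i])
    grouped = {label: [] for label, _, _ in RANK_BINS}
    for rank, idx in enumerate(order, start=1):
        grouped[bin_for_rank(rank)].append(accuracy[idx])
    return {label: mean(vals) for label, vals in grouped.items() if vals}


def mean_and_sem(values):
    """Group average and standard error of the mean across learners."""
    return mean(values), stdev(values) / sqrt(len(values))


# Rounded training counts from the legend for alpha = 2 (ten images).
counts = [92, 13, 13, 13, 2, 2, 2, 2, 2, 2]
# Made-up accuracies for one learner, for illustration only.
one_learner = [1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0]
per_bin = group_by_frequency(counts, one_learner)
```

Applying `mean_and_sem` to the per-bin values collected across the learners in one condition would give the kind of group average and s.e.m. that the large dots and error bars represent.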