Fig. 1: A mapping from LLM embeddings captures visual responses to natural scenes.
From: High-level visual representations in the human brain are aligned with large language models

a, LLM-to-brain mapping methods. Each image in the NSD dataset is associated with captions written by different human observers to describe the scene. These captions are passed through an LLM to generate embeddings. We use two approaches to quantify the match between these embeddings and fMRI data: representational similarity analysis (RSA) and encoding models. Note that, for copyright reasons, we cannot show the real COCO image we used; hence, it has been replaced by a similar copyright-free image.
b, RSA reveals an extended network of brain regions where LLM representations correlate with brain activity. Searchlight map for the group-average Pearson correlation (not noise-ceiling corrected) between LLM embeddings (MPNet) and brain representations (significance threshold set by a two-tailed t-test across participants (N = 8) with Benjamini–Hochberg false discovery rate (FDR) correction; P = 0.05). See Supplementary Fig. 3 for individual participants.
c, A linear encoding model highlights a similar network of brain regions. We performed voxel-wise linear regression to predict voxel activity from LLM embeddings. Shown is the group-average Pearson correlation map (not noise-ceiling corrected) between the predicted and actual beta responses on the test set (significance threshold set by a two-tailed t-test across participants (N = 8) with Benjamini–Hochberg FDR correction; P = 0.05). See Supplementary Fig. 4 for individual participants.
d, Encoding model performance versus interparticipant agreement. Each dot in the scatter plot shows the encoding model performance for a given voxel versus the interparticipant agreement, computed as the mean Pearson correlation between each participant's (N = 8) voxel activity and the average voxel activity of the remaining seven participants on the test images. Our encoding model approaches the interparticipant agreement in all ROIs, indicating good performance. Values below the diagonal can arise because the model captures participant-specific variance that the mean of the other participants does not capture.
Calc, calcarine sulcus; CGS, cingulate sulcus; CoS, collateral sulcus; CS, central sulcus; IFRS, inferior frontal sulcus; IPS, intraparietal sulcus; LS, lateral sulcus; OTS, occipitotemporal sulcus; PoCS, post-central sulcus; PrCS, precentral sulcus; SFRS, superior frontal sulcus; STS, superior temporal sulcus.
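
The interparticipant agreement described for panel d can be illustrated with a minimal NumPy sketch. This is not the authors' code: the function names, the array layout (participants × test images × voxels) and the assumption that all participants' voxels are already aligned to a common space are hypothetical choices made for illustration only.

```python
import numpy as np


def pearson_columns(a, b):
    """Pearson r between matching columns of two (images x voxels) arrays."""
    a = a - a.mean(axis=0)
    b = b - b.mean(axis=0)
    denom = np.sqrt((a ** 2).sum(axis=0) * (b ** 2).sum(axis=0))
    return (a * b).sum(axis=0) / denom


def interparticipant_agreement(responses):
    """Leave-one-out agreement per voxel.

    responses: array of shape (n_participants, n_images, n_voxels).
    Assumes (hypothetically) that voxels are aligned across participants.
    For each participant, correlate their responses with the mean of the
    remaining participants, then average the correlations over participants.
    """
    n = responses.shape[0]
    total = responses.sum(axis=0)
    per_participant = []
    for i in range(n):
        mean_of_others = (total - responses[i]) / (n - 1)
        per_participant.append(pearson_columns(responses[i], mean_of_others))
    return np.mean(per_participant, axis=0)


# Synthetic example: a shared signal plus participant-specific noise.
rng = np.random.default_rng(0)
shared = rng.standard_normal((50, 4))                # 50 images, 4 voxels
responses = shared + 0.5 * rng.standard_normal((8, 50, 4))
agreement = interparticipant_agreement(responses)
print(agreement.shape)
```

Each value in `agreement` is one voxel's x-coordinate in a scatter plot of this kind; the y-coordinate would be the encoding model's test-set correlation for the same voxel.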