Fig. 4: LLM-trained deep recurrent convolutional networks outperform other models in predicting brain activity.
From: High-level visual representations in the human brain are aligned with large language models

a, RCNNs. Our RCNNs have ten recurrent convolutional layers with bottom-up (purple), lateral (green) and top-down (orange) connections, followed by a fully connected readout layer. The training objective is to minimize the cosine distance between the network’s output and the target LLM caption embeddings. Category-trained control networks are identical, except that they are trained to predict multi-hot category labels. Note that, for copyright reasons, we cannot show the real COCO images we used; hence, they have been replaced by similar copyright-free images. b, Category labels can be decoded from LLM-trained RCNN activities. After freezing the network weights, we tested how well category labels (respectively, LLM embeddings) can be decoded from activities in the pre-readout layer of the LLM-trained (respectively, category-trained) network. The plot shows test performance (averaged across N = 10 network instances; error bars represent the standard deviation), quantified as the cosine similarity between predicted and target vectors. Dashed horizontal bars show floor performance, operationalized as the performance obtained by always predicting the mean training target. c, LLM-trained RCNNs versus LLM embeddings. Searchlight RSA contrast between LLM-trained RCNN activities (last layer and timestep) and the LLM embeddings of scene captions. RCNN RDMs are averaged across ten network instances; correlations are averaged across eight participants; the significance threshold was set by a two-tailed t-test across participants with Benjamini–Hochberg FDR correction; P = 0.05. See Supplementary Fig. 15 for individual participants. Inset: brain–model correlation for LLM-trained RCNNs versus LLM embeddings at each searchlight location. d, LLM-trained versus category-trained RCNNs. Same plot as in c, but showing the contrast between LLM-trained and category-trained RCNNs (last layer and timestep). See Supplementary Fig. 17 for individual participants, Supplementary Fig. 16 for all other RCNN layers and timesteps, and Supplementary Fig. 18 for a reproduction of this effect using the ResNet50 architecture. e, ROI-wise comparison of LLM-trained RCNNs with other widely used ANNs. Noise-ceiling-corrected correlations between the pre-readout-layer RDMs of the various models and the ROI RDMs. Our RCNN model significantly outperforms all other models (except CORnet-S, which is not significantly worse in the parietal ROI; two-tailed t-test across participants with Benjamini–Hochberg FDR correction; P = 0.05). Benjamini–Hochberg FDR-corrected P values for all pairwise model comparisons are given in Supplementary Fig. 20.
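The connectivity scheme in a (bottom-up, lateral and top-down connections within each recurrent convolutional layer) can be illustrated with a generic PyTorch sketch. This is an assumption-laden illustration of the general scheme, not the authors' exact architecture; the layer class, channel arguments and ReLU nonlinearity are all hypothetical choices.

```python
import torch
import torch.nn as nn

class RecurrentConvLayer(nn.Module):
    """Illustrative recurrent convolutional layer combining bottom-up,
    lateral and top-down inputs (a generic sketch, not the paper's
    exact architecture)."""
    def __init__(self, in_ch: int, ch: int, top_ch: int):
        super().__init__()
        self.bottom_up = nn.Conv2d(in_ch, ch, kernel_size=3, padding=1)
        self.lateral = nn.Conv2d(ch, ch, kernel_size=3, padding=1)
        self.top_down = nn.Conv2d(top_ch, ch, kernel_size=3, padding=1)

    def forward(self, bottom, prev_state=None, top=None):
        # Bottom-up drive from the layer below at the current timestep.
        out = self.bottom_up(bottom)
        if prev_state is not None:
            # Lateral recurrence from this layer's previous timestep.
            out = out + self.lateral(prev_state)
        if top is not None:
            # Top-down feedback, upsampled to match spatial resolution.
            top = nn.functional.interpolate(top, size=out.shape[-2:])
            out = out + self.top_down(top)
        return torch.relu(out)
```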
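The two training objectives in a can be made concrete with a short sketch: a cosine-distance loss against LLM caption embeddings for the main networks, and a multi-hot objective for the category-trained controls. The binary cross-entropy choice for the control loss is a plausible assumption for multi-hot targets, not a detail stated in the caption.

```python
import torch
import torch.nn.functional as F

def llm_embedding_loss(output: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """Cosine-distance objective for the LLM-trained networks.

    output: (batch, d) activations of the fully connected readout layer.
    target: (batch, d) LLM embeddings of the corresponding scene captions.
    """
    # Cosine distance = 1 - cosine similarity, averaged over the batch.
    return (1.0 - F.cosine_similarity(output, target, dim=1)).mean()

def category_loss(logits: torch.Tensor, multi_hot: torch.Tensor) -> torch.Tensor:
    # Control objective: predict multi-hot category labels. Binary
    # cross-entropy is an assumed choice for multi-hot targets.
    return F.binary_cross_entropy_with_logits(logits, multi_hot)
```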
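The decoding analysis in b reduces to fitting a decoder on frozen pre-readout activities and scoring predictions by cosine similarity against the targets, with the floor baseline given by always predicting the mean training target. The caption does not specify the decoder; a ridge regression is assumed below purely for illustration.

```python
import numpy as np
from sklearn.linear_model import Ridge

def cosine_sim(pred: np.ndarray, target: np.ndarray) -> np.ndarray:
    # Row-wise cosine similarity between predicted and target vectors.
    pred = pred / (np.linalg.norm(pred, axis=1, keepdims=True) + 1e-12)
    target = target / (np.linalg.norm(target, axis=1, keepdims=True) + 1e-12)
    return np.sum(pred * target, axis=1)

def decoding_performance(X_train, Y_train, X_test, Y_test):
    """X_*: pre-readout activities of the frozen network.
    Y_*: targets (multi-hot labels or LLM embeddings, depending on the test).
    """
    decoder = Ridge(alpha=1.0).fit(X_train, Y_train)  # assumed decoder
    test_score = cosine_sim(decoder.predict(X_test), Y_test).mean()
    # Floor baseline: always predict the mean training target.
    floor = np.tile(Y_train.mean(axis=0), (len(Y_test), 1))
    floor_score = cosine_sim(floor, Y_test).mean()
    return test_score, floor_score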
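The searchlight statistics in c and d amount to correlating RDM upper triangles and testing the per-participant model contrast against zero with Benjamini–Hochberg FDR correction across searchlight locations. The sketch below assumes a Spearman correlation for the RSA step, which the caption does not specify.

```python
import numpy as np
from scipy.stats import spearmanr, ttest_1samp
from statsmodels.stats.multitest import multipletests

def rdm_correlation(brain_rdm: np.ndarray, model_rdm: np.ndarray) -> float:
    # Correlate the upper triangles of two RDMs (Spearman assumed here).
    iu = np.triu_indices_from(brain_rdm, k=1)
    return spearmanr(brain_rdm[iu], model_rdm[iu]).correlation

def searchlight_contrast_stats(contrast: np.ndarray, alpha: float = 0.05):
    """contrast: (n_participants, n_searchlights) array of per-participant
    differences, e.g. corr(brain, RCNN) - corr(brain, LLM embedding)."""
    # Two-tailed one-sample t-test against zero at each searchlight location,
    # followed by Benjamini-Hochberg FDR correction across locations.
    t, p = ttest_1samp(contrast, popmean=0.0, axis=0)
    reject, p_fdr, _, _ = multipletests(p, alpha=alpha, method='fdr_bh')
    return t, p_fdr, reject
```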
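Panel e reports noise-ceiling-corrected correlations. The caption does not describe the ceiling estimator; one common choice, sketched here as an assumption, is a leave-one-out lower bound computed from the participants' ROI RDMs, with the model score then expressed as a fraction of that ceiling.

```python
import numpy as np

def lower_noise_ceiling(subject_rdms: np.ndarray) -> float:
    """Leave-one-out lower bound on the noise ceiling (a common estimator;
    the paper's exact procedure may differ).

    subject_rdms: (n_subjects, n_pairs) vectorized upper-triangle ROI RDMs.
    """
    scores = []
    for i in range(len(subject_rdms)):
        # Correlate each participant's RDM with the mean of the others.
        others = np.delete(subject_rdms, i, axis=0).mean(axis=0)
        scores.append(np.corrcoef(subject_rdms[i], others)[0, 1])
    return float(np.mean(scores))

def noise_ceiling_corrected(raw_corr: float, ceiling: float) -> float:
    # Express the model-ROI correlation as a fraction of explainable variance.
    return raw_corr / ceiling
```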