Figure 6 | Scientific Reports

From: Cortical network responses map onto data-driven features that capture visual semantics of movie fragments

Gradual emergence of semantic information from low-level visual features. (a) Similarity of frame representations between each intermediate layer of the automatic visual object recognition model and the semantic components used to fit the brain data. We used all so-called pooling layers of the object recognition model as intermediate layers. The first layer (pool1) sits early in the model (it is preceded by only two convolutional layers), whereas the last layer (pool5) lies deep in the model (followed by two dense layers and a probability output layer). Similarity with the semantic components is measured as the Pearson correlation across all pairwise frame comparisons between each pooling layer and the semantic components (see “Methods” for details). The dark grey line shows the similarity between each pooling layer and all 50 semantic components; the light grey line shows the similarity between each pooling layer and only the top five semantic components. In both cases the shading represents the 95% confidence interval based on a bootstrapping procedure (sampling 1,000 frames 10,000 times). (b) Cortical map of the difference in prediction accuracy between the fit using features of the last pooling layer (pool5) and the semantic components. (c) Scatter plot showing the difference in prediction accuracy between the brain fit using pool5 (\({r}_{pool5}\)) and the semantic components (\({r}_{sem}\)). The results are shown for the models fitted at a 320 ms temporal shift of the brain data with respect to the stimulus onset. Each point represents the cross-validated accuracy of an individual electrode. Red points denote electrodes with \({r}_{sem}-{r}_{pool5}>0.1\); blue points denote electrodes with \({r}_{pool5}-{r}_{sem}>0.1\). The red line shows the median prediction accuracy over all electrodes with a significant fit for the semantic components; the blue line shows the same value for pool5.
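The layer-to-semantics similarity in panel (a) can be sketched as follows: correlate every pair of frames within each representation, then take the Pearson correlation between the two sets of pairwise similarities, with a frame-resampling bootstrap for the confidence interval. This is a minimal illustration, not the authors' code; the array shapes, function names, and the smaller bootstrap parameters are assumptions for the example.

```python
import numpy as np

def pairwise_similarity_correlation(layer_feats, semantic_feats):
    """Correlate the pairwise frame-similarity structure of two representations.

    layer_feats: (n_frames, d1) activations from one pooling layer (hypothetical shape)
    semantic_feats: (n_frames, d2) semantic component scores per frame (hypothetical shape)
    Returns the Pearson correlation between the two vectors of pairwise
    frame-frame correlations (upper triangle only, diagonal excluded).
    """
    def pairwise(x):
        c = np.corrcoef(x)                      # frame-by-frame correlation matrix
        iu = np.triu_indices_from(c, k=1)       # each frame pair counted once
        return c[iu]

    a, b = pairwise(layer_feats), pairwise(semantic_feats)
    return np.corrcoef(a, b)[0, 1]

def bootstrap_ci(layer_feats, semantic_feats, n_frames=1000, n_boot=10000,
                 alpha=0.05, rng=None):
    """95% bootstrap CI by resampling frames with replacement (per the caption)."""
    rng = np.random.default_rng(rng)
    stats = []
    for _ in range(n_boot):
        idx = rng.choice(layer_feats.shape[0], size=n_frames, replace=True)
        stats.append(pairwise_similarity_correlation(layer_feats[idx],
                                                     semantic_feats[idx]))
    lo, hi = np.quantile(stats, [alpha / 2, 1 - alpha / 2])
    return lo, hi
```

In the caption the bootstrap draws 1,000 frames 10,000 times; the defaults above mirror that, but far smaller values suffice for a quick check.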
(d) Significant difference in prediction accuracy for individual regions of interest (ROIs). Each bar shows the difference in median prediction accuracy (\({r}_{sem}-{r}_{pool5}\)) per ROI. Error bars represent the 95% confidence interval based on a bootstrapping procedure (resampling the electrode accuracy values per ROI 10,000 times). We show only the ROIs for which the difference in median prediction accuracy (\({r}_{sem}-{r}_{pool5}\)) was above zero (based on the confidence intervals of the bootstrap distributions): pOrb (pars orbitalis), Cu (cuneus), pCu (precuneus), rMFG (rostral middle frontal gyrus), pOpr (pars opercularis), ITG (inferior temporal gyrus), SPG (superior parietal gyrus), IPG (inferior parietal gyrus), SM (supramarginal gyrus), LOC (lateral occipital complex), STG (superior temporal gyrus), SFG (superior frontal gyrus), mOFC (medial orbitofrontal cortex), cMFG (caudal middle frontal gyrus), MTG (middle temporal gyrus), prCe (precentral gyrus). All ROIs defined in the Desikan–Killiany atlas (34 regions in total) were used in this analysis.
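The per-ROI test in panel (d) amounts to bootstrapping the median of the electrode-wise accuracy difference and keeping ROIs whose confidence interval excludes zero. A minimal sketch, assuming per-electrode accuracy arrays as input (variable and function names are hypothetical):

```python
import numpy as np

def roi_median_diff_ci(r_sem, r_pool5, n_boot=10000, alpha=0.05, rng=None):
    """Bootstrap CI for the median of r_sem - r_pool5 within one ROI.

    r_sem, r_pool5: 1-D arrays of cross-validated prediction accuracies,
    one value per electrode in the ROI.
    Returns (median difference, CI lower bound, CI upper bound).
    """
    rng = np.random.default_rng(rng)
    diff = np.asarray(r_sem) - np.asarray(r_pool5)
    n = diff.size
    # Resample electrodes with replacement n_boot times, take the median each time
    boot = np.median(diff[rng.integers(0, n, size=(n_boot, n))], axis=1)
    lo, hi = np.quantile(boot, [alpha / 2, 1 - alpha / 2])
    return np.median(diff), lo, hi
```

An ROI would then be retained for the figure when the lower bound of its interval is above zero.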
