Introduction

Higher visual functions, particularly object and face recognition, are crucial for interacting with and making sense of the environment. A hallmark of primate vision is our ability to rapidly and effortlessly identify and categorize objects across variations in the retinal input1. This remarkable capacity relies on the brain’s ability to extract relevant visual features from complex visual input and integrate them into a coherent percept.

The hierarchical organization of the ventral visual cortex has long been recognized as a key factor in transforming retinal input into useful perceptual representations2. This hierarchy progresses from early visual cortex (V1, V2) responding to low-level features such as local contrast and orientation, through intermediate areas (V4) encoding more complex features such as textures and shapes, to anterior regions exhibiting categorical distinctions in their responsiveness to faces, bodies, and objects3,4,5. In addition to feedforward pathways giving rise to increasingly complex representations, recurrent and feedback connections are thought to help ensure robust recognition in the face of challenges during natural visual experience, such as partial occlusion or varying light conditions6,7.

However, this cortico-centric view is incomplete. Recent evidence suggests that recurrent connectivity with the thalamus, particularly the pulvinar, may play a critical role in achieving robust perceptual and cognitive functions8,9,10. The pulvinar, with its extensive reciprocal connections to visual cortex, is well-positioned to influence visual processing at multiple levels. While much research has focused on the pulvinar’s involvement in early sensory processing11,12,13, visual attention14,15, and visuomotor behaviors16,17, its distinct anatomical connectivity with ventral visual cortex18 suggests a potential role in high-level object recognition processes.

Despite the presence of distinct anatomical connectivity between the pulvinar and cortical regions supporting high-level visual processes, our understanding of the pulvinar’s role in complex visual functions, such as object recognition, remains limited. Previous investigations of pulvinar function have predominantly employed simple stimuli, focusing on low-level visual features, such as orientation and image contrast11,19. Some evidence hints at the pulvinar’s involvement in higher-level visual representations: anatomical tracing in non-human primates reveals connections between the pulvinar and face-selective cortical regions20; electrophysiological recordings have identified pulvinar neurons sensitive to face-like shapes21 and facial emotion22; and fMRI studies in humans have identified functional substructure within the pulvinar with sensitivity to visual categories23,24. However, the specificity of these findings to object recognition processes remains unclear. For example, medial pulvinar damage can impair emotion recognition, while sparing other aspects of visual recognition25. Moreover, individual pulvinar regions receive convergent inputs from functionally diverse cortical regions, potentially resulting in broadly tuned representations. Finally, whereas prior work has demonstrated the existence of retinotopic maps in pulvinar for low-level visual features11,19,26, it is presently unknown whether such spatial coding is also present in areas of the pulvinar that may be responsive to higher-level visual features. Thus, how the pulvinar represents complex visual information, its functional topography, and its relationship to cortical processing of visually presented objects all remain open questions.

To investigate the pulvinar’s role in object recognition and the relationship of the pulvinar to cortical areas, we leveraged the Natural Scenes Dataset (NSD; http://naturalscenesdataset.org), a 7T fMRI dataset consisting of responses to tens of thousands of natural scenes. NSD constitutes an especially promising opportunity to study the pulvinar, given the use of ultra-high magnetic field strength (7T) to improve signal-to-noise ratio and achieve high spatial resolution (1.8 mm). We first developed encoding models to characterize spatial coding and responsiveness to different visual features present within natural scenes, including low-level image content and selectivity to specific visual categories. We then used model-free correlation analyses to identify co-activation patterns between the pulvinar and visual cortex during the processing of natural scenes. These analyses revealed a pulvinar region selectively responsive to faces and bodies, distinct from portions of the pulvinar sensitive to low-level visual content. Notably, both regions exhibited retinotopic organization, suggesting a common organizing principle across different levels of visual processing. The body- and face-selective pulvinar regions also showed specific co-activation patterns with corresponding body- and face-selective cortical areas. Furthermore, there was some evidence that these clusters were embedded within a broader gradient of cortical correlations across the pulvinar. These findings position the pulvinar as an important structure in high-level vision, potentially interfacing with cortex at multiple levels of visual processing.

Results

Using pRF models to characterize representations of features in natural scenes

We first characterized the pulvinar’s response to different visual features within natural scenes by building and testing encoding models for the Natural Scenes Dataset (NSD). We evaluated a wide range of image features, including low-level properties, such as local contrast, and high-level content, such as bodies and faces. To predict voxel-wise responses, we coupled features extracted from the NSD images with a population receptive field (pRF) model27 (Fig. 1). This approach characterizes each voxel as being jointly tuned for a specific stimulus feature and an area of visual space. We tested models corresponding to seven different features prevalent in real-world scenes: local contrast, salience, faces, bodies, words, foreground, and background (Supplementary Fig. 1; see Methods for details about how each feature was computed). For each feature, we obtained a full set of best-fitting pRF parameters, including pRF position, size, and gain for every voxel within each participant. This model-based approach allowed us to assess multiple levels of visual processing within the pulvinar. Importantly, when applied in cortex, these pRF models revealed selective responses in the ventral visual stream regions tuned for particular visual categories, including the body-selective FBA and the face-selective FFA (Supplementary Fig. 6), thereby validating the approach before probing subcortex.

Fig. 1: Identifying stimulus features encoded in voxel responses using population receptive field (pRF) models.

Each NSD image was converted into feature maps that indicated the presence of a given feature at each location in the image. Seven types of feature maps, corresponding to different low-, mid-, and high-level image properties (see Methods for details), were prepared (background was defined as the inverse of foreground). To predict the response to an image, a given feature map is weighted by a 2D isotropic Gaussian, summed over the visual field, and then a compressive static nonlinearity is applied (Kay, Winawer, et al., 2013). Model parameters are optimized to minimize error between observed BOLD amplitudes (gray circle markers) and predicted BOLD amplitudes (purple “X” markers). For each voxel, a separate model was fit for each feature type, attempting to account for the fMRI response amplitudes to the 9000–10,000 natural scenes seen by each NSD subject. Image examples are reproduced from the NSD stimulus set, derived from the Microsoft COCO image database57 licensed under CC-BY-4.0.
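The prediction step described in this caption can be sketched in a few lines. This is a minimal illustration rather than the authors' implementation: the power-law form of the compressive nonlinearity, the exponent value, the normalization of the Gaussian, the pixel-based coordinates, and all function and parameter names are assumptions made for the example.

```python
import numpy as np

def gaussian_prf(size, x0, y0, sigma):
    """2D isotropic Gaussian on a size x size grid (pixel units, assumed)."""
    ys, xs = np.mgrid[0:size, 0:size]
    return np.exp(-((xs - x0) ** 2 + (ys - y0) ** 2) / (2.0 * sigma ** 2))

def predict_response(feature_maps, x0, y0, sigma, gain, exponent=0.5):
    """Predict one response amplitude per image.

    feature_maps: array (n_images, size, size) of non-negative feature strength.
    exponent < 1 gives a compressive nonlinearity (assumed power-law form).
    """
    size = feature_maps.shape[-1]
    g = gaussian_prf(size, x0, y0, sigma)
    g = g / g.sum()  # assumed normalization: weighted sum becomes a weighted mean
    # Weight each feature map by the Gaussian and sum over the visual field.
    drive = np.tensordot(feature_maps, g, axes=([1, 2], [0, 1]))
    return gain * drive ** exponent

def variance_explained(observed, predicted):
    """Percent variance explained (coefficient of determination x 100)."""
    resid = np.sum((observed - predicted) ** 2)
    total = np.sum((observed - observed.mean()) ** 2)
    return 100.0 * (1.0 - resid / total)
```

In a fit, `x0`, `y0`, `sigma`, and `gain` would be adjusted per voxel and per feature type to minimize the error between observed and predicted amplitudes; the optimizer itself is omitted here.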

Distinct pulvinar regions responsive to low- and high-level features

Our analyses revealed distinct regions within the pulvinar responsive to different types of visual content within natural scenes. We observed a clear functional segregation between regions of the pulvinar that process low-level visual features and those responsive to high-level object categories (Fig. 2). The variance explained by the pRF models was relatively low in the pulvinar compared to cortex (Supplementary Fig. 6). This difference can be attributed to many factors that impair signal detection within the thalamus, including the relatively high distance from the radiofrequency coils, the intermixing of cell bodies and white matter, the close proximity to ventricles, and other intrinsic differences in noise sources between subcortical and cortical structures28. Despite this, the observed patterns of activation were consistent across individuals, underscoring the reliability and functional relevance of these distinctions. Across subjects, low-level feature processing was predominantly localized to the inferior-lateral portions of the pulvinar and the neighboring lateral geniculate nucleus (LGN). The local contrast and image salience models yielded highly correlated variance-explained maps (r = 0.8; Supplementary Fig. 2). We found that these models best accounted for BOLD responses in the LGN (green outline) and inferior-lateral portions (light blue) of the pulvinar (Fig. 2A, B; Supplementary Fig. 3). This finding aligns with previous human neuroimaging studies demonstrating these regions’ sensitivity to basic visual patterns, such as flickering checkerboards26,29,30. In contrast, high-level feature processing was localized to medial and posterior regions of the pulvinar across subjects. The body and face pRF models best accounted for BOLD responses in these regions (Fig. 2A, B; Supplementary Fig. 3). The high correlation between face and body variance-explained maps (r = 0.93) may reflect the frequent co-occurrences of these features in natural scenes.
The foreground pRF model showed a broader pattern of responsiveness, accounting for variance in both low-level and feature-responsive regions. Foreground maps were correlated with image contrast (r = 0.6) and salience (r = 0.65) as well as with face (r = 0.41) and body (r = 0.35) feature maps. As an important control comparison, not all features were linked to activations in the pulvinar. Background and word pRF models explained little variance in pulvinar activations (Supplementary Fig. 4), and we therefore do not examine these models further.

Fig. 2: Distinct regions of the pulvinar responsive to low- and high-level visual features.

A Group-average (n = 8) variance-explained maps for six pRF feature models shown in three adjacent coronal images covering the LGN and pulvinar. The first column shows colored contours of five functionally defined subcortical ROIs derived from independent data overlaid on corresponding MNI anatomical coronal slices (green LGN; light blue ventral pulvinar; dark blue dorsolateral pulvinar; dark purple dorsomedial pulvinar; magenta superior colliculus; see Methods for details; Arcaro et al. 2015; 2018). Remaining columns show variance-explained maps for each pRF model. Only voxels with >0.2% variance explained are shown. B Maximum intensity projection of variance-explained results along the anterior-posterior axis for each feature model. Dashed colored contours indicate the five ROIs. Variance-explained values from voxels outside the thalamus ROI were set to 0%. C Left, sagittal view of anatomy with red lines indicating which slices are plotted in subsequent panels. Right, coronal views of winner-take-all analysis, showing which feature model explained the most variance for each voxel within the posterior thalamus ROI. Only voxels with >0.2% variance explained by any feature model are included.

A winner-take-all analysis comparing the variance explained by different feature models further illustrates this feature segregation within the pulvinar. Activity in the inferior-lateral pulvinar was best explained by image contrast and salience, while activity in the posterior-medial pulvinar was best explained by bodies and faces (Fig. 2C). Notably, none of these models captured substantial variance in dorsal parts of the pulvinar associated with attentional filtering23, suggesting that object-recognition processes are specifically linked to the ventral pulvinar. The limited overlap between variance-explained maps for the contrast model versus the body and face models, combined with the lack of correlation in variance explained across models (Supplementary Fig. 2), demonstrates a clear functional substructure within the pulvinar, with anatomically distinct regions exhibiting selective responses to low- and high-level visual features.
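A winner-take-all assignment of this kind reduces to an argmax over models at each voxel, with a threshold to exclude voxels that no model explains. The sketch below assumes the variance-explained values are stored as a models-by-voxels array; the 0.2% threshold follows the Fig. 2C caption, while the array layout, the function name, and the use of -1 as a "no winner" label are illustrative choices.

```python
import numpy as np

def winner_take_all(var_explained, threshold=0.2):
    """Label each voxel with the model that explains the most variance.

    var_explained: array (n_models, n_voxels), percent variance explained.
    Returns an integer array (n_voxels,) of winning model indices, with -1
    where no model exceeds the threshold.
    """
    best = np.argmax(var_explained, axis=0)
    peak = np.max(var_explained, axis=0)
    return np.where(peak > threshold, best, -1)
```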

Organized spatial coding of low- and high-level visual features within the pulvinar

Having identified distinct pulvinar regions responsive to low- and high-level visual content in real-world scenes, we next examined their spatial coding. Each pRF feature model produced a set of parameter estimates characterizing voxel-wise spatial selectivity throughout the pulvinar (Supplementary Fig. 4). We focused on the image contrast and body pRF models, as these models explained the most variance in responses to the NSD stimuli within the lateral and medial pulvinar, respectively (Fig. 2C).

With respect to low-level features, the contrast model revealed a bilateral topographic organization of contralateral visual space within inferior-lateral portions of the pulvinar across subjects (Fig. 3B). The upper visual field (Fig. 3B, first row; red) was represented ventral-laterally, while the lower visual field (blue) was represented superior and medially. Between these vertical meridian representations, there was a qualitative gradual progression of preferred visual angle crossing the horizontal meridian (green). Quantitatively, pRF angle estimates could be predicted by a linear gradient oriented in 3D largely along the ventral-lateral to dorsal-medial axis (left hemisphere: R² = 0.16; right hemisphere: R² = 0.16; Supplementary Fig. 8A–D). These representations were predominantly within central visual space (Fig. 3B, second row) and exhibited strong lateralization to the contralateral visual field (Fig. 3B, third row). At the group-average level, 73% of voxels having at least 0.1% variance explained had pRFs tuned within 3° eccentricity (range at individual subject level: 49–79%). pRF eccentricity estimates could be predicted by a simple model consisting of foveal center surrounded by more peripheral representations in 3D (left hemisphere: R² = 0.26; right hemisphere: R² = 0.36; Supplementary Fig. 8E). pRF organization was consistent across individual subjects (Fig. 3D) and aligns with prior electrophysiological work in non-human primates11,19 and fMRI studies in humans26,30 showing an inverted retinotopic map in this region of the pulvinar. These results demonstrate that the spatial coding of the ventral pulvinar can be effectively probed by modeling responses to low-level feature content in real-world scenes.
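One simple way to quantify a linear spatial gradient is an ordinary least-squares fit of the pRF parameter on the three voxel coordinates, reporting the fitted direction and the coefficient of determination. The sketch below is an assumption about how such a fit could be set up, not a description of the procedure behind Supplementary Fig. 8. In particular, it treats polar angle as a linear rather than circular variable, and the function name and inputs are hypothetical.

```python
import numpy as np

def fit_linear_gradient(coords, values):
    """Fit values ~ a*x + b*y + c*z + d by ordinary least squares.

    coords: array (n_voxels, 3) of voxel positions.
    values: array (n_voxels,) of a pRF parameter (e.g. angle, treated linearly).
    Returns (unit gradient direction, R^2 of the fit).
    """
    X = np.column_stack([coords, np.ones(len(coords))])
    beta, *_ = np.linalg.lstsq(X, values, rcond=None)
    pred = X @ beta
    ss_res = np.sum((values - pred) ** 2)
    ss_tot = np.sum((values - values.mean()) ** 2)
    direction = beta[:3] / np.linalg.norm(beta[:3])
    return direction, 1.0 - ss_res / ss_tot
```

The returned direction is the axis along which the parameter changes fastest, which is what would be compared against an anatomical axis such as ventral-lateral to dorsal-medial.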

Fig. 3: Spatial coding of low- and high-level visual features in the pulvinar.

A An MNI anatomical reference illustrating the location and extent of the coronal slices used in panels (B and C). Colored contours indicate the five subcortical ROIs shown in Fig. 2. B Group-average (n = 8) results from the contrast pRF model. From top to bottom, polar angle, eccentricity, and visual-field laterality are shown for three coronal sections. Data were thresholded at 0.1% variance explained (by the contrast pRF model). C Group-average results from the body pRF model. Same format as panel (B) (with data thresholded based on body pRF model results). D Visual field coverage plots for the contrast pRF model. pRFs from the top 100 voxels in each hemisphere (in terms of variance explained) are plotted on the visual field. Dots indicate the centers of pRFs while circles indicate the extent of the pRFs (±1 standard deviation). The gray square indicates the size and location of the stimulus. Dashed lines indicate the vertical and horizontal meridians. The larger subpanel shows results from a group average (i.e., based on averaging pRF parameters across participants in each voxel), while the other subpanels show results from individual subjects. E Visual field coverage for the body pRF model, in the same format as panel (D).

Surprisingly, responses to high-level visual content also exhibited clear topographic organization. The body pRF model revealed bilateral representations of contralateral visual space in pulvinar regions posterior and medial to the contrast model activations (Fig. 3C, first row). Upper visual field representations were localized to dorsal medial portions, while horizontal meridian and lower visual field representations were ventral and lateral. The medial pulvinar contained a central visual field representation surrounded by peripheral representations (Fig. 3C, second row). Body pRFs were generally lateralized to the contralateral visual hemifield (Fig. 3C, third row) and covered the extent of the visual field stimulated by the NSD images (Fig. 3E). At the group-average level, only 19% of voxels having at least 0.1% variance explained had pRFs tuned within 3° eccentricity (range at individual level: 17–57%). Similar to the contrast maps, body pRF angle estimates (left hemisphere: R² = 0.07; right hemisphere: R² = 0.03; Supplementary Fig. 8F–I) and eccentricity estimates (left hemisphere: R² = 0.33; right hemisphere: R² = 0.43; Supplementary Fig. 8J) could be partially predicted by simple low-dimensional spatial models. These results demonstrate that the processing of high-level categorical content in the pulvinar is anchored to spatial coding of sensory input.

Comparison of visual field representations derived from the contrast and body models revealed two distinct, mirror-symmetric retinotopic zones. The organization of visual space along the polar angle dimension for the body model was inverted relative to that in the lateral pulvinar for the contrast model. The lower visual field representations of both models overlapped (Fig. 3B, C; Supplementary Fig. 5, dashed lines), whereas the upper visual field representation from the contrast model was located ventrolaterally and that of the body model dorsomedially. In the left hemisphere, this qualitative characterization of map orientations was reinforced by the quantitative analysis results (Supplementary Fig. 8). Together, these findings indicate that lateral and medial regions of the pulvinar each contain distinct retinotopic maps, and that their relative positioning suggests a continuity of retinotopic organization across the pulvinar. This organization parallels the layout of multiple retinotopic maps in visual cortex, where adjacent areas are arranged so that visual field representations progress smoothly from one map to the next.

Model-free analyses reveal similarity in representations between cortex and pulvinar

As a complement to our model-based analyses, we devised a model-free approach to relate cortical and subcortical representations of natural scenes. Prior research has shown that different regions of the pulvinar are anatomically connected31 and functionally coupled23 with different parts of visual cortex. The anatomical locations of the contrast- and body-selective subregions align with evidence for a gradient of pulvino-cortical connectivity26,31, suggesting that visually evoked activity in different parts of the pulvinar might resemble activity in functionally corresponding cortical regions. To test this hypothesis and the functional specificity of such connectivity, we conducted a correlational analysis between the pulvinar and cortex based on activations from the NSD dataset.

We analyzed single-trial BOLD response amplitudes in the pulvinar and cortex, derived using GLMsingle to capture the response of each voxel to individual image trials in NSD (Fig. 4A; see Methods for details). For each subject, we identified pulvinar voxels showing the highest variance explained by the contrast and body models. We then computed trial-by-trial correlations between the BOLD responses in these pulvinar voxels and all cortical voxels (Fig. 4B). This analysis revealed robust correlations between the pulvinar and extensive portions of the visual cortex for both the contrast-peak and body-peak pulvinar voxels, with weaker correlations in non-visual cortex (Fig. 4D). Contrast-peak voxels showed strong correlations with both early and anterior visual cortex, whereas body-peak voxels correlated more selectively with anterior visual cortex. Importantly, qualitatively similar maps were produced using different voxel seed selections (Supplementary Fig. 7), indicating the robustness of these findings. Together, these results demonstrate distinct patterns of cortical coupling for lateral and medial regions of the ventral pulvinar.
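The same-trial seed analysis amounts to a Pearson correlation between one seed time course and every cortical location, computed across trials. A minimal sketch follows, assuming the single-trial amplitudes have already been estimated and stored as arrays; the function name and array layout are illustrative, and thresholding (done by bootstrap in the paper) is not shown.

```python
import numpy as np

def seed_correlation_map(seed, cortex):
    """Pearson correlation between a seed voxel and every cortical vertex.

    seed:   array (n_trials,) of single-trial response amplitudes.
    cortex: array (n_trials, n_vertices) for the same trials, in the same order.
    Returns an array (n_vertices,) of correlation coefficients.
    Vertices with zero variance across trials yield NaN.
    """
    s = (seed - seed.mean()) / seed.std()
    c = (cortex - cortex.mean(axis=0)) / cortex.std(axis=0)
    # The mean product of z-scores equals the Pearson correlation.
    return (s[:, None] * c).mean(axis=0)
```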

Fig. 4: Contrast- and body-selective pulvinar subregions are co-active with distinct cortical regions.

A Schematic showing the same-trial correlation analysis, where correlation is computed between BOLD responses observed at the same time. Image examples are reproduced from the NSD stimulus set, derived from the Microsoft COCO image database57 licensed under CC-BY-4.0. B Schematic showing how subcortical-to-cortical correlations are computed. For a given seed voxel in the pulvinar, we correlated the activity in that seed voxel with activity in each vertex on the cortical surface, resulting in a map of correlations across the cortical surface. C Extents of the pRFs (±1 standard deviation) for seed voxels selected from each subject. Contrast seed voxels are shown in purple, body seed voxels in yellow. The gray square indicates the size and location of the stimulus. Dashed lines indicate the vertical and horizontal meridians. D Cortical correlations resulting from seeding the peak voxel in the pulvinar for the (left) contrast and (right) body models. Data are thresholded based on a bootstrap procedure (see Methods for details). Inset images show unthresholded maps.

To isolate correlations driven by stimulus-evoked responses from other potential sources of co-activation (such as arousal and attention), we recomputed correlations between the pulvinar and cortex using responses to the same image but from different trials (Fig. 5A). This approach yielded more spatially localized correlations. For the contrast peak, cortical correlations were predominantly confined to posterior occipital cortex, aligning with the extent of early visual retinotopic maps (V1-hV4) identified from a separate localizer experiment (Fig. 5B; see Methods for details). Body peak co-activations were primarily constrained to extrastriate visual cortical areas responsive to faces and bodies, including EBA, FBA, and FFA (Fig. 5C). Notably, the body peak showed no correlation with early visual cortex or other category-selective extrastriate areas, such as place (e.g., TOS, PPA) and word areas (e.g., OWFA, VWFA). These findings demonstrate functionally specific coupling between pulvinar and cortex in response to real-world stimulation, reflecting an alignment of processing between the thalamus and cortex.
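The logic of the different-trials analysis is that fluctuations shared within a trial (for example arousal) cannot contribute when the pulvinar response comes from one presentation of an image and the cortical response from another. The sketch below illustrates this under stated assumptions: every image has the same number of repeats, responses are arranged as images by repeats, and the correlation maps are averaged over all ordered pairs of distinct repeats. The pairing and averaging scheme and all names are assumptions, not the procedure given in Methods.

```python
import numpy as np
from itertools import permutations

def pearson_columns(x, Y):
    """Pearson r between vector x (n,) and each column of Y (n, m)."""
    zx = (x - x.mean()) / x.std()
    zy = (Y - Y.mean(axis=0)) / Y.std(axis=0)
    return (zx[:, None] * zy).mean(axis=0)

def different_trial_correlation(seed, cortex):
    """Correlate seed and cortex using different presentations of each image.

    seed:   array (n_images, n_reps).
    cortex: array (n_images, n_reps, n_vertices).
    For every ordered pair of distinct repeats (i, j), the seed response on
    repeat i is correlated across images with the cortical response on
    repeat j; the resulting maps are averaged.
    """
    n_reps = seed.shape[1]
    maps = [pearson_columns(seed[:, i], cortex[:, j, :])
            for i, j in permutations(range(n_reps), 2)]
    return np.mean(maps, axis=0)
```

With this arrangement, a vertex that shares only trial-specific noise with the seed yields a correlation near zero, whereas a vertex that shares stimulus-driven signal retains a positive correlation.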

Fig. 5: Different-trials analysis reveals highly specific pulvino-cortical correspondences.

A Correlation is computed between BOLD responses evoked by the same stimulus image but on different trials. Image examples are reproduced from the NSD stimulus set, derived from the Microsoft COCO image database57 licensed under CC-BY-4.0. B Cortical correlations corresponding to the peak pulvinar voxel for the contrast pRF model. Data are thresholded based on a bootstrap procedure (see Methods for details). Bottom image shows zoomed-in view of peak correlations within occipital cortex and correspondence to occipital retinotopic map ROIs (V1, V2, V3, hV4) defined based on separate localizer experiments (see Methods and Allen et al., 2022 for details). C Cortical correlations corresponding to the peak pulvinar voxel for the body pRF model. Same conventions as (B), except that ROI outlines instead indicate category-selective areas: face-selective (OFA, FFA) in green, body-selective (EBA, FBA) in dark blue, word-selective (OWFA, VWFA, mfs) in cyan, and scene-selective (TOS, PPA) in white.

To contextualize these results with respect to the entire pulvinar, we performed a third analysis correlating responses from various cortical areas with responses from each voxel within the pulvinar (Fig. 6A). As in Fig. 5, this analysis was performed using responses to the same images from different trials. This analysis revealed that V1 correlations were strongest within the LGN and ventrolateral areas of the pulvinar (Fig. 6B), consistent with the greater anatomical connectivity of those subcortical regions with early visual cortex31. In contrast, FFA and FBA correlations were biased toward superior and medial aspects of pulvinar (Fig. 6B). Notably, voxels best explained by the contrast pRF model (Fig. 6B; purple circles) aligned with the peak V1 correlation, while those best explained by the face (yellow circles) or body (green circles) pRF models aligned with the peak FFA and FBA correlations.

Fig. 6: Progression along ventral cortical hierarchy recapitulated in the pulvinar.

A Schematic showing how cortical-to-subcortical correlations were computed. For a given cortical surface ROI, we correlated the ROI-averaged cortical activity with activity in each subcortical voxel, resulting in a volumetric correlation map throughout the pulvinar. Correlations were computed on BOLD responses to the same stimulus across different trials. B Coronal slices showing pulvinar correlation maps corresponding to V1, FFA, and FBA cortical ROIs. At each subcortical voxel, we plot the number of subjects for whom correlation values passed a threshold of |r| > 0.01 (see Methods for details). We also label the peak model-based variance-explained value for the contrast, face, and body models for the depicted slice. Faint gray dashed lines in the coronal slice indicate the same location across maps. Different rows correspond to different coronal slices as in Fig. 2. C Comparison of correlations across the pulvinar for visual areas V1, V2, V3, hV4, OFA, and FFA. Colored lines indicate half-max contours of the maximum intensity projection of cortical-to-subcortical correlation results for six cortical ROIs. Tick marks indicate the position of the maximum cortical-to-subcortical correlation value for a given cortical ROI. Gray lines indicate contours of maximum intensity projections of the anatomical ROIs from Fig. 2B.
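Two of the summaries named in this caption are simple array reductions and can be sketched directly. The |r| > 0.01 threshold comes from the caption. Everything else is an assumption for illustration: the array layout, the index of the anterior-posterior axis, and the reading of "half-max" as half of the projection's own maximum. Drawing contour lines from the resulting mask is left to a plotting library.

```python
import numpy as np

def subject_count_map(corr_maps, threshold=0.01):
    """Count, per voxel, the subjects whose |r| exceeds the threshold.

    corr_maps: array (n_subjects, ...) of per-subject correlation volumes.
    """
    return np.sum(np.abs(corr_maps) > threshold, axis=0)

def half_max_mask(volume, axis=1):
    """Maximum intensity projection along one axis, thresholded at half max.

    volume: 3D array of correlation values.
    axis:   index of the projection axis (assumed anterior-posterior here).
    Returns a 2D boolean mask whose boundary is the half-max contour.
    """
    mip = volume.max(axis=axis)
    return mip >= 0.5 * mip.max()
```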

Visualizing cortical correlation maps with half-max contours revealed a progression across the pulvinar from anterior-inferior-lateral to posterior-superior-medial as the cortical seed area shifted from V1 in the posterior visual cortex to hV4 anteriorly (Fig. 6C; see Methods for details). While all early visual cortical areas showed correlations with the LGN and ventrolateral pulvinar, the superior-medial extent of correlation increased systematically across successive cortical seeds. Because fMRI responses from nearby cortical ROIs are themselves correlated, we tested whether unique contributions from individual areas could be isolated using multiple regression (Supplementary Fig. 9). This analysis yielded more focal correlations in the pulvinar for V1, V2, V3, hV4, and FBA. Notably, while the main analysis showed both V1 and V2 correlating with anterior coronal slices overlapping the LGN, multiple regression isolated focal LGN correlations for V1, with V2 peaking instead in adjacent pulvinar. Similarly, whereas the main analysis showed widespread MT correlations, the multiple regression revealed a more focal ventral pulvinar region just medial to the V1/V2 peaks, consistent with the location of pulvinar subregion PIm, which connects exclusively with MT32. These patterns mirror prior fingerprint functional connectivity results23, suggesting that multivariate approaches such as regression or pattern analysis can better separate shared variance between nearby cortical areas. Overall, these findings reinforce the conclusion of structured subcortical–cortical correlations.
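The reason multiple regression can sharpen these maps is that it assigns each pulvinar voxel's response to all cortical ROIs simultaneously, so an ROI receives weight only for variance that the other ROIs do not already explain. The following sketch uses standardized ordinary least squares as one way to do this; it is an illustration under assumed inputs (ROI-averaged responses and voxel responses over the same trials), and the function names are hypothetical.

```python
import numpy as np

def zscore(a):
    """Standardize each column to zero mean and unit variance."""
    return (a - a.mean(axis=0)) / a.std(axis=0)

def unique_roi_weights(roi_responses, voxel_responses):
    """Standardized regression weights of each cortical ROI on each voxel.

    roi_responses:   array (n_trials, n_rois), ROI-averaged cortical responses.
    voxel_responses: array (n_trials, n_voxels), pulvinar voxel responses.
    Returns an array (n_rois, n_voxels); entry [k, v] is the weight of ROI k
    for voxel v after accounting for all other ROIs.
    """
    beta, *_ = np.linalg.lstsq(zscore(roi_responses),
                               zscore(voxel_responses), rcond=None)
    return beta
```

For example, if a voxel follows V1 and V2 merely correlates with V1, the pairwise correlation with V2 is high but its regression weight is close to zero.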

Discussion

Our analysis of BOLD responses to tens of thousands of natural scenes revealed a complex functional architecture within the pulvinar. Using pRF models, we identified distinct regions responsive to low-level visual features and high-level object categories, particularly faces and bodies. These regions each contained orderly retinotopic maps and together formed a continuous organization across the pulvinar. Correlational analyses further demonstrated functional coupling between pulvinar subregions and cortical areas selective for low- and high-level visual features, matching the distinctions revealed by the pRF analysis. Together, these findings suggest that the pulvinar interacts with cortex at multiple levels of cortical processing, and that the pulvinar may play a more integral role in visual processing than previously appreciated.

High-level visual processing in the pulvinar

The pulvinar has long been associated with visual processing, with most studies emphasizing its role in low-level visual feature analysis11,12, attention14 or motor processes17. The present results confirm prior evidence for low-level feature sensitivity in the ventral lateral pulvinar11,12, consistent with its strong anatomical connections with early visual areas31. At the same time, our results provide clear evidence that the pulvinar also participates in high-level visual perception. We identified a focal region in the posterior medial pulvinar that responds selectively to images of faces and bodies. Although previous macaque electrophysiology has reported responses to face-like images21 and facial emotion22, and human fMRI studies have reported weak category preferences under artificial contexts23,24, our findings demonstrate robust category-specific responses under ecologically valid conditions with tens of thousands of natural scenes. A key contribution of the present work is demonstrating the specificity of these responses in relation to spatial coding, stimulus feature selectivity, anatomical localization within the pulvinar, and functional coupling with corresponding category-selective cortical regions.

Our results suggest that the pulvinar may play a more integral role in object recognition than previously thought, potentially contributing to the rapid and efficient processing of socially relevant stimuli like faces and bodies33. Converging with recent work showing face responses in the superior colliculus34, our results indicate a broader role for subcortex in both low- and high-level vision.

Retinotopy as an organizing principle of the visual pulvinar

A striking feature of our results is the prominence of retinotopic organization at multiple levels of visual responsiveness in the pulvinar. Prior fMRI studies demonstrated retinotopic maps in portions of the human pulvinar interconnected with posterior visual cortex and sensitive to low-level features26, consistent with single-unit physiology in non-human primates11,19. The present results demonstrate that retinotopy is a more general organizational principle of the pulvinar, with orderly visual field maps present even in regions selective for higher-order features such as faces and bodies. This preservation of retinotopy mirrors findings in inferotemporal cortex, where spatial sensitivity persists35 even in face- and object-selective areas36,37,38. Such consistent spatial coding across cortical and subcortical structures suggests that retinotopy provides a common framework for integrating diverse information, from basic feature detection to complex object representations.

These results also align with proposals that retinotopic organization shapes development of visual cortex, potentially serving as a scaffold for the emergence of complex functional specializations39. Pulvinar connectivity with cortex emerges early in gestation40, and is hierarchically and retinotopically organized at birth41, preceding the development of functional clusters in cortex such as face-42 and word-selective regions43. This developmental timeline suggests that the pulvinar may play a critical role in shaping the ventral visual cortical hierarchy44, potentially providing error-driven signals that support the emergence of categorical representations45.

The pulvinar within the visual hierarchy

How does the pulvinar integrate with the established cortical architecture of object vision to support recognition? The pulvinar is an associative thalamic nucleus with limited internal connectivity46, forming input and output relationships primarily with cerebral cortex. Prior studies have revealed a gradient of anatomical18,47,48 and functional23 connectivity across the pulvinar, progressing from posterior to anterior visual cortex. Rather than constituting a distinct, parallel processing pathway, the pulvinar’s topographically organized, but overlapping, cortical connection zones enable indirect transcortical communication31, effectively integrating with the ventral visual cortex at multiple hierarchical stages. Converging with these prior studies, our analysis of processing during natural scene viewing revealed a gradient of functional coactivation between the pulvinar and cortex that recapitulates the hierarchical organization of the ventral visual pathway. This hierarchical organization of thalamo-cortical connections suggests that the pulvinar functions as an integral part of the ventral visual cortical pathway, as we discuss below.

We speculate that these connectivity patterns reflect the pulvinar’s role in gating cortical visual processing and synchronizing cortico-cortical communication via cortico-thalamic loops. Studies in macaques have shown that pulvinar neurons can modulate the gain of visual responses in cortex13 and facilitate communication between distant cortical areas15. While most work pursuing this hypothesis has focused on the pulvinar’s role in cortical attention networks15,49,50, our results provide evidence that the pulvinar also plays an important role in regulating cortical networks dedicated to object vision. These influences occur at multiple stages of the cortical hierarchy10,51, potentially integrating diverse visual information and facilitating the binding of relatively basic visual features into coherent object representations. While our results emphasize the integration of processing within the ventral visual pathway, the extent to which the pulvinar might integrate information across multiple processing streams, and potentially even serve as a hub for multi-sensory integration52, remains an open question.

The connection between the pulvinar and object vision suggested by our results can be considered in light of theories that the pulvinar contributes to fast detection circuits that support rapid, evolutionarily important visual behaviors53. In macaques, some pulvinar neurons sensitive to face-like features respond with latencies shorter than 50 ms21, much earlier than typical responses in inferotemporal cortex. Such rapid responses could arise within the pulvinar or reflect input from the superior colliculus54, where similarly fast face-selective responses have been observed34. Importantly, not all pulvinar neurons respond this quickly, and population-level face responses accumulate more gradually21. Rather than being inconsistent with a role in ventral visual stream processing, these temporal differences may reflect complementary functions, with fast pulvinar responses providing early predictive signals that help shape slower, more detailed cortical analysis45.

Towards studying visual networks with higher ecological validity

Our study demonstrates the utility of using naturalistic stimuli to probe processing throughout the visual system. While naturalistic stimuli are often regarded as too uncontrolled for systematic investigation, we have shown that computational approaches can be used to quantify the contribution of various aspects of visual processing, from low- to high-level visual properties, and link this processing with specific neural circuits. Our findings isolating regions of the ventral lateral pulvinar selective to local image contrast content in real-world scenes converge with prior work using less naturalistic stimuli, serving as a validation of this approach and demonstrating a principled way to explore neural responses to basic visual properties within real-world contexts.

The use of naturalistic stimuli in conjunction with computational modeling approaches offers several advantages. First, it allows for the investigation of visual processing under conditions that more closely approximate real-world vision. Second, it enables the simultaneous examination of multiple levels of visual processing, from low-level features to high-level object categories. Third, it may reveal functional properties that are not easily observable with more constrained stimuli, as evidenced by our discovery of robust visual-category selectivity in the pulvinar. The success of this approach in uncovering such aspects of pulvinar function underscores its potential for embracing the complexity of natural visual input. This shift towards more ecologically valid paradigms promises to deepen our understanding of how the brain processes the rich visual world.

Future directions for study of the human pulvinar

The present work opens several avenues for advancing our understanding of the pulvinar’s role in perceptual and cognitive processing. While our results demonstrate clear involvement in both low- and high-level visual features, the full range of category selectivity in the pulvinar remains to be mapped. For example, we found limited evidence for regions specialized for scenes, text, or inanimate objects. Some of these gaps could be addressed by reanalysis of the NSD data with improved pRF models for additional features, such as tools24 and food, whereas others may require follow-up experiments with more controlled stimuli.

Our findings also generate specific experimental predictions about the pulvinar’s functional organization and development. Decoding analyses should reveal stronger face-versus-non-face classification in the posterior medial pulvinar compared to anterior lateral regions, with posterior responses approaching those of cortical face-selective areas and anterior responses better suited to low-level feature discrimination. Developmental studies could test whether face and body selectivity emerges earlier in the pulvinar than in cortex, consistent with its early-forming connectivity to cortical targets in neonates41 and supporting a role in scaffolding cortical organization. The overlapping representations of faces and bodies further suggest that the pulvinar may serve as an integration hub, a possibility that could be evaluated with representational similarity analyses to determine whether pulvinar responses reflect weighted averages of nearby cortical representations.

Beyond representational structure, future work could examine the pulvinar’s contribution to behavior55. Studies explicitly focused on sampling different visual tasks could test whether the pulvinar supports face detection versus identification of faces, while naturalistic paradigms could investigate whether it contributes to extracting contextual associations. Such integrative processes may be particularly important for learning statistical regularities in everyday environments and could help explain how advanced recognition abilities emerge before cortical circuitry fully matures.

Our study demonstrates that the pulvinar contains multiple retinotopic representations of visual scenes, including low-level features and high-level object categories. These representations are localized to distinct anatomical regions that align well with known patterns of pulvino-cortical anatomical connectivity. Our results support the hypothesis that the pulvinar plays an integral role in coordinating communication between visual cortical areas across multiple levels of processing.

These findings underscore the importance of considering subcortical contributions to object vision. Unraveling the complexities of cortical visual processing, such as in face and body recognition, may require a more comprehensive understanding of their interactions with subcortical structures like the pulvinar and superior colliculus34. Our work provides a foundation for future investigations into the computational roles of the pulvinar in object recognition and other high-level visual processes.

Methods

Participants

Eight participants (six females and two males; age range, 19–32 years) participated in the NSD study, which was approved by the Institutional Review Panel of the University of Minnesota. All participants provided informed consent and had normal or corrected-to-normal acuity. For detailed participant information, see ref. 56.

Dataset

The NSD comprises fMRI measurements from 8 participants viewing 9000–10,000 distinct natural color scenes (22,000–30,000 trials) over 30–40 scan sessions. Scanning was conducted using a 7T MRI scanner, with whole-brain gradient-echo EPI at 1.8-mm isotropic resolution and 1.6-s repetition time. Images were sourced from the Microsoft Common Objects in Context (COCO) database57, square cropped, and presented at 8.4° × 8.4° visual angle. Stimulus images reproduced here (Figs. 1, 4, and 5, and Supplementary Fig. 1) are thus modifications of the original images used in compliance with COCO database’s Creative Commons 4.0 license (https://creativecommons.org/licenses/by/4.0/). A set of 1000 images was shared across participants, with the remaining images unique to each participant. Images were presented for 3 s each, with 1-s gaps between them.

Data preprocessing included temporal interpolation for slice time correction and spatial interpolation for head motion correction and compensation for spatial distortion. Single-trial beta weights, representing BOLD response amplitudes, were estimated using a general linear model. The study utilized “Version 3” trial response estimates56, which incorporate voxel-specific hemodynamic response functions, denoising using GLMdenoise 1.458, and ridge regression for trial response estimation. For cortical surface data, we utilized the NSD-prepared subject-native surface data, registered and transferred to the fsaverage template using nearest neighbor interpolation. For subcortical data, we used the 1.0-mm volumetric preparation of the NSD data in subject-native space, mapped into the MNI template space using the provided T1-to-MNI transformation. To control for inter-session variability, all data were z-scored within each voxel (or vertex for surface preparations) on a session-by-session basis.
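The session-wise normalization described above can be sketched as follows. This is a minimal illustration, not the code used in the study: the function name, the array layout (trials × voxels), and the guard for constant voxels are assumptions made for the example.

```python
import numpy as np

def zscore_by_session(betas, session_ids):
    """Z-score each voxel's single-trial betas separately within each session.

    betas:       (n_trials, n_voxels) array of trial response estimates
    session_ids: (n_trials,) array of session labels (hypothetical layout)
    """
    betas = np.asarray(betas, dtype=float)
    session_ids = np.asarray(session_ids)
    out = np.empty_like(betas)
    for s in np.unique(session_ids):
        rows = session_ids == s
        block = betas[rows]
        sd = block.std(axis=0)
        sd[sd == 0] = 1.0  # assumption: constant voxels are left at zero
        out[rows] = (block - block.mean(axis=0)) / sd
    return out
```

After this step every voxel has zero mean and unit variance within each session, so session-to-session offsets in signal scale cannot drive correlations computed across sessions.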

Regions of interest (ROIs)

Subcortical analyses focused on a manually defined region of the posterior thalamus that included the pulvinar, lateral geniculate nucleus (LGN) and superior colliculus (SC). This region was identified on each participant’s native-space T1-weighted anatomical image, using the contrast between gray and white matter (for more details, see ref. 26). To ensure consistency across subjects, each subject’s subcortical ROI was then transformed to 1 mm MNI space using nearest-neighbor interpolation. A group average was constructed by selecting all voxels labeled as posterior thalamus in at least 3 out of 8 participants, then projected back to each participant’s native 0.8-mm anatomical space.
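The group-average ROI rule (a voxel is kept if it is labeled in at least 3 of 8 participants) amounts to thresholding a voxel-wise overlap count. The sketch below assumes the individual masks have already been resampled to a common MNI grid; the function name and argument names are illustrative.

```python
import numpy as np

def group_roi(subject_masks, min_subjects=3):
    """Keep voxels labeled in at least `min_subjects` participants.

    subject_masks: (n_subjects, X, Y, Z) boolean array in a shared space.
    """
    masks = np.asarray(subject_masks, dtype=bool)
    overlap = masks.sum(axis=0)  # number of participants labeling each voxel
    return overlap >= min_subjects
```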

Specific structures within this ROI were identified based on previously reported group average functional data26. These data were warped from 1-mm MNI space to each participant’s native space. The ventral pulvinar was defined by voxels in vPul1 or vPul2 retinotopic maps. Dorsomedial and dorsolateral pulvinar were defined by correlations with the precuneus and retinotopic portions of the frontoparietal cortex (IPS1-5, SPL1, FEF, and IFS), respectively. The LGN and SC were defined by their respective retinotopic maps. ROIs were manually edited to exclude voxels outside the anatomical extent of each structure and transformed back to 1 mm MNI space for group-average ROI construction.

Stimulus feature maps

Feature maps for image contrast, saliency, foreground, background, and specific image categories (faces, bodies, and words) were generated for fMRI analyses. Local image contrast maps were computed by converting images to greyscale, resizing to 800 × 800 pixels, squaring to approximate luminance response, and calculating local contrast within a 51 × 51 grid (grid element size: 0.168 × 0.168 degrees). The grid was padded with half a grid element on all sides to capture transitions between the image and the mean gray background (for more detail, see ref. 56). Saliency maps were generated using a pre-trained deep neural network model59 trained on the SALICON dataset, a fixation dataset based on the COCO image database60. As the NSD images were also sourced from the COCO database, this model was expected to generalize well to the NSD stimuli. Saliency maps were cropped and resampled to match the NSD stimuli from the COCO database56. All other feature maps comprised binary mask annotations. Face annotations were generated using the RetinaFace model (https://github.com/deepinsight/insightface/tree/master/RetinaFace), which identifies rectangular bounding boxes around faces. Word annotations were created using the EAST text detector (https://github.com/argman/EAST), which outputs bounding quadrangles or rectangles around text. Quadrangles were converted to enclosing rectangles for consistency with other annotations. Whole body annotations were sourced from the Microsoft COCO dataset (http://images.cocodataset.org/annotations/annotations_trainval2017.zip), using bounding boxes for human bodies (category ID 1) and animals (category IDs 16–25). Foreground annotations were derived from segmentations across all 80 object categories, and background annotations were defined as the inverse of the foreground annotations.
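The two kinds of feature map described above can be illustrated with a short sketch. Several details are assumptions made for the example rather than statements about the original pipeline: local contrast is taken to be the standard deviation of squared pixel values within each grid element, pixel values are assumed to lie in [0, 1] with a mean gray of 0.5, and bounding boxes are assumed to be COCO-style [x, y, width, height] in pixels. With an 800-pixel image and half an element of padding per side, each of the 51 grid elements spans 16 pixels, which matches the stated element size of 8.4°/50 = 0.168°.

```python
import numpy as np

def local_contrast_map(gray, n_grid=51, background=0.5):
    """Local contrast on a padded n_grid x n_grid grid.

    gray: (800, 800) greyscale image with values in [0, 1] (assumed range).
    Contrast is computed as the std of squared pixel values per grid
    element; this definition is an assumption of the sketch.
    """
    lum = np.asarray(gray, dtype=float) ** 2   # squaring approximates luminance
    cell = lum.shape[0] // (n_grid - 1)        # 800 / 50 = 16 pixels per element
    if cell * (n_grid - 1) != lum.shape[0] or cell % 2:
        raise ValueError("image size must fit the padded grid exactly")
    pad = cell // 2                            # half a grid element on each side
    padded = np.pad(lum, pad, mode="constant", constant_values=background ** 2)
    blocks = padded.reshape(n_grid, cell, n_grid, cell)
    return blocks.std(axis=(1, 3))

def bbox_mask(boxes, size=800):
    """Binary mask from [x, y, width, height] boxes (COCO-style, assumed)."""
    mask = np.zeros((size, size), dtype=bool)
    for x, y, w, h in boxes:
        x0, y0 = max(int(np.floor(x)), 0), max(int(np.floor(y)), 0)
        x1, y1 = min(int(np.ceil(x + w)), size), min(int(np.ceil(y + h)), size)
        mask[y0:y1, x0:x1] = True
    return mask
```

Because the padding is filled with the background value, grid elements that straddle the image border register the transition between image content and the surrounding gray as contrast.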

Population receptive field (pRF) analysis

A pRF encoding model with compressive spatial summation was employed to characterize spatially specific responses to stimulus feature maps within each subject’s subcortical ROI. The model was implemented using a modified version of AnalyzePRF (http://cvnlab.net/analyzePRF/), which estimates voxel-wise parameters for each feature map58, including spatial position, pRF size, gain, and variance explained. Separate models were fit for each feature map, with variance explained (R²) reflecting the percentage of variance in trial-averaged BOLD responses accounted for by the model. Only stimuli unique to each subject were included, excluding the common stimuli viewed by all participants. For non-contrast and non-saliency pRF models, a baseline term was added to account for responses to image regions covered by the annotated feature. For group-level analysis, voxel-wise parameter estimates were transformed to 1 mm MNI space using linear interpolation, and median values then were calculated across participants for all parameters.
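For readers unfamiliar with the compressive spatial summation model, its forward prediction can be sketched as below: each feature map is weighted by an isotropic 2D Gaussian, summed, and passed through a power-law nonlinearity. This is a simplified illustration and not the modified AnalyzePRF code; the function name, the coordinate convention (degrees, origin at the image center), and the sum-to-one normalization of the Gaussian are assumptions, and the baseline term used for annotation-based models is omitted.

```python
import numpy as np

def css_prf_response(feature_maps, x0, y0, sigma, gain, exponent, extent=8.4):
    """Predicted response of a compressive spatial summation pRF.

    feature_maps: (n_images, n, n) non-negative maps covering `extent` degrees
    x0, y0:       pRF center in degrees (positive y is up; assumed convention)
    sigma:        Gaussian standard deviation in degrees
    exponent:     power-law exponent (< 1 gives compressive summation)
    """
    maps = np.asarray(feature_maps, dtype=float)
    n = maps.shape[-1]
    axis = np.linspace(-extent / 2, extent / 2, n)
    xx, yy = np.meshgrid(axis, axis[::-1])          # first row = top of image
    gauss = np.exp(-((xx - x0) ** 2 + (yy - y0) ** 2) / (2 * sigma ** 2))
    gauss /= gauss.sum()                            # assumption: unit-sum weights
    drive = np.tensordot(maps, gauss, axes=([1, 2], [0, 1]))
    return gain * drive ** exponent
```

In the compressive spatial summation framework, pRF size is conventionally reported as sigma divided by the square root of the exponent, so a more compressive exponent corresponds to a larger effective pRF for the same Gaussian width.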

Winner-take-all pRF analysis

The performance of different features was evaluated with a winner-take-all analysis. Variance explained values for each feature model were averaged across participants in the shared 1 mm MNI space (see above). Then, each voxel within the thalamus ROI with at least 0.2% variance explained by any feature was labeled to indicate which feature model explained the most variance.
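The winner-take-all labeling reduces to an argmax across feature models with a variance-explained threshold. The sketch below assumes the participant-averaged values are stored as a features × voxels array in percent; the function name and the use of -1 for unlabeled voxels are choices made for the example.

```python
import numpy as np

def winner_take_all(r2_by_feature, threshold=0.2):
    """Label each voxel with the index of its best-fitting feature model.

    r2_by_feature: (n_features, n_voxels) participant-averaged variance
                   explained, in percent.
    Voxels whose best model explains less than `threshold` percent get -1.
    """
    r2 = np.asarray(r2_by_feature, dtype=float)
    best = r2.argmax(axis=0)
    return np.where(r2.max(axis=0) >= threshold, best, -1)
```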

pRF map characterization

To characterize the organization of retinotopic maps derived from the pRF analysis, follow-up analyses were carried out on the contrast and body pRF parameter estimates. To test whether pRF angle maps followed a consistent orientation, we extracted group-averaged pRF angle values that exceeded a 0.15% variance-explained criterion within the thalamus ROI. Voxel coordinates were then projected onto candidate map orientations, represented as oriented 3D unit vectors. Scalar projection values for each voxel were then correlated with the voxels’ pRF angle value (transformed to span 0° = lower vertical meridian to 180° = upper vertical meridian). A strong correlation coefficient indicated that voxel positions along the tested orientation predicted pRF angle values. This procedure was repeated across 60 candidate orientations evenly distributed in 3D space (azimuth values from 0 to 2π, elevation values from 0 to π), separately for each hemisphere. Resulting correlation coefficients were visualized on the surface of a unit sphere (Supplementary Fig. 8). To assess significance of the best-fit orientation, the analysis was repeated with shuffled voxel-to-pRF correspondences to generate a null distribution of peak correlation coefficients, against which the observed peak was compared.
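The orientation analysis above can be sketched as follows. The way the 60 candidate orientations were spread over the sphere is not specified here, so the example uses a Fibonacci lattice as a stand-in; that choice, the function names, and the number of permutations are assumptions of the sketch.

```python
import numpy as np

def fibonacci_directions(n=60):
    """Approximately even unit vectors on the sphere (assumed sampling scheme)."""
    i = np.arange(n)
    z = 1 - (2 * i + 1) / n
    r = np.sqrt(1 - z ** 2)
    phi = i * np.pi * (3 - np.sqrt(5))  # golden-angle increments
    return np.column_stack([r * np.cos(phi), r * np.sin(phi), z])

def best_orientation(coords, angles, directions):
    """Correlate voxel projections onto each direction with pRF angle."""
    proj = coords @ directions.T                   # (n_voxels, n_directions)
    proj_c = proj - proj.mean(axis=0)
    ang_c = angles - angles.mean()
    r = (proj_c * ang_c[:, None]).sum(axis=0) / (
        np.linalg.norm(proj_c, axis=0) * np.linalg.norm(ang_c))
    k = int(np.argmax(r))                          # directions are signed
    return directions[k], float(r[k])

def permutation_p_value(coords, angles, directions, n_perm=1000, seed=0):
    """Compare the observed peak correlation with a shuffled-label null."""
    rng = np.random.default_rng(seed)
    observed = best_orientation(coords, angles, directions)[1]
    null = np.array([
        best_orientation(coords, rng.permutation(angles), directions)[1]
        for _ in range(n_perm)])
    return (np.sum(null >= observed) + 1) / (n_perm + 1)
```

Taking the peak correlation within each shuffle, rather than the correlation at a fixed orientation, keeps the null distribution matched to the search over candidate orientations performed on the real data.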

To test whether pRF eccentricity values followed a systematic spatial arrangement, we fit an isotropic 3D Gaussian to the eccentricity parameter estimates. The model had six parameters: a 3D mean (x, y, z), a single standard deviation, a baseline term, and a gain term. Note that when the baseline term is high (e.g., 4° eccentricity) and the gain negative, the model predicts central visual field eccentricities near the Gaussian center and peripheral eccentricities further away. The model was fit using the nonlinear minimizer from SciPy61 with standard parameter settings and 1 − R² as the objective function, separately for the contrast and body maps in each hemisphere. Analyses were restricted to voxels within the thalamus ROI exceeding a 0.15% variance-explained threshold in the pRF analysis. For each fit, 50 random initializations of the Gaussian mean within the 3D bounding box were attempted, and the best-fitting model was retained.
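A minimal sketch of this fitting procedure is given below. It assumes that the nonlinear minimizer is scipy.optimize.minimize called with its default method; the starting values for the standard deviation, baseline, and gain are heuristics chosen for the example, and the function names are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize

def gaussian_ecc(params, coords):
    """Eccentricity predicted by baseline + gain * isotropic 3D Gaussian."""
    mu, sigma, baseline, gain = params[:3], params[3], params[4], params[5]
    d2 = ((coords - mu) ** 2).sum(axis=1)
    return baseline + gain * np.exp(-d2 / (2 * sigma ** 2 + 1e-12))

def one_minus_r2(params, coords, ecc):
    """Objective: residual sum of squares over total sum of squares."""
    resid = ecc - gaussian_ecc(params, coords)
    return (resid ** 2).sum() / ((ecc - ecc.mean()) ** 2).sum()

def fit_eccentricity_gaussian(coords, ecc, n_init=50, seed=0):
    """Fit from random starting means inside the bounding box; keep the best."""
    rng = np.random.default_rng(seed)
    lo, hi = coords.min(axis=0), coords.max(axis=0)
    best = None
    for _ in range(n_init):
        start = np.concatenate([
            rng.uniform(lo, hi),             # random Gaussian mean
            [(hi - lo).mean() / 4,           # heuristic starting sigma
             ecc.max(),                      # high baseline ...
             ecc.min() - ecc.max()]])        # ... with negative gain
        res = minimize(one_minus_r2, start, args=(coords, ecc))
        if np.isfinite(res.fun) and (best is None or res.fun < best.fun):
            best = res
    if best is None:
        raise RuntimeError("no initialization produced a finite fit")
    return best.x, 1.0 - best.fun
```

The function returns the six fitted parameters and the corresponding R². Starting from a high baseline and a negative gain matches the expected layout, with foveal eccentricities near the Gaussian center.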

Pulvino-cortical correlation analysis

Correlation analyses comprised two parts: (1) correlating single-voxel pulvinar responses with those across the entire cortical surface, and (2) correlating average responses for cortical ROIs with all pulvinar voxels.

For the first analysis, we identified voxels with the highest variance explained for the contrast model (contrast peaks) and for the body model (body peaks) in each participant’s native space. Trial response estimates from these voxels were correlated with response estimates from every vertex on the fsaverage cortical surface. This process was repeated 1000 times for each participant with trials randomly resampled with replacement (i.e., bootstrapped). For visualization of results on the cortical surface (Figs. 4 and 5), the correlation coefficient, averaged across bootstrap replicates and participants, was shown at a given vertex only if at least 95% of the bootstrapped correlation coefficients exceeded a value of 0 in a majority of participants.
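The bootstrap and thresholding steps can be sketched as follows. The sketch assumes trial responses are arranged as a vector for the seed voxel and a trials × vertices array for cortex; function names and the random seed are illustrative.

```python
import numpy as np

def column_correlations(x, Y):
    """Pearson correlation between vector x and each column of Y."""
    xc = x - x.mean()
    Yc = Y - Y.mean(axis=0)
    return (xc @ Yc) / (np.linalg.norm(xc) * np.linalg.norm(Yc, axis=0))

def bootstrap_seed_correlation(seed_resp, target_resp, n_boot=1000, rng_seed=0):
    """Resample trials with replacement and correlate seed with every target.

    Returns the mean correlation and the fraction of bootstrap replicates
    with a correlation above 0, both per target.
    """
    rng = np.random.default_rng(rng_seed)
    n = seed_resp.shape[0]
    boot = np.empty((n_boot, target_resp.shape[1]))
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # trial indices drawn with replacement
        boot[b] = column_correlations(seed_resp[idx], target_resp[idx])
    return boot.mean(axis=0), (boot > 0).mean(axis=0)

def majority_mask(frac_positive_by_subject, level=0.95):
    """Targets passing the bootstrap criterion in more than half of subjects."""
    passed = np.asarray(frac_positive_by_subject) >= level
    return passed.sum(axis=0) > passed.shape[0] / 2
```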

For the second analysis, we constructed a set of cortical ROIs covering most of the visual cortex. These included: (1) retinotopic areas (V1, V2, V3, and hV4) defined from the pRF localizer experiment included with NSD56, (2) additional retinotopic areas from the Wang probabilistic retinotopic atlas62, and (3) category-selective areas (OFA, FFA, aTL-faces, EBA, FBA, PPA, and VWFA) from the NSD functional category localizer experiment56. ROIs were pruned on an individual-participant basis to avoid overlap. Average responses were generated for each cortical ROI in fsaverage space and correlated with the trial response estimates from voxels within the subcortical ROI.

We performed analyses between the pulvinar and cortex using both same-trial and different-trial (for same image) pairings of response estimates. For same-trial correlations, each response in the subcortex was correlated with responses from the same trials in the cortex. For different-trial correlations, we leveraged the fact that each participant viewed each distinct image three times. Pairing the pulvinar response from one repeat with the cortical response from a different repeat yields six possible ordered pairs of image repeats (3 × 2), resulting in six response pairings for up to 10,000 images across different trials of the same image. We computed the correlation for each pairing, and averaged the resulting six correlation coefficients to derive a final correlation estimate between the pulvinar and cortex.
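The different-trial pairing can be made concrete with a short sketch. It assumes responses have been arranged as images × repeats arrays for one pulvinar voxel and one cortical ROI, and it treats the six pairings as ordered pairs of distinct repeats; the function name and data layout are illustrative.

```python
import numpy as np
from itertools import permutations

def different_trial_correlation(pulvinar, cortex):
    """Average correlation over all pairings of distinct repeats.

    pulvinar, cortex: (n_images, n_repeats) response arrays.
    With three repeats there are 3 x 2 = 6 ordered (pulvinar repeat,
    cortex repeat) pairs in which the two repeats differ.
    """
    n_rep = pulvinar.shape[1]
    rs = [np.corrcoef(pulvinar[:, i], cortex[:, j])[0, 1]
          for i, j in permutations(range(n_rep), 2)]
    return float(np.mean(rs))
```

Because the two responses in each pairing come from different trials, fluctuations shared within a trial (for example, arousal or head motion) cannot contribute, so the estimate reflects stimulus-driven covariation only.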

Programming languages and packages

Analyses were carried out and figures were produced using custom code written in MATLAB R2018a (data preprocessing, stimulus feature map generation, pRF analysis, correlation analysis, surface visualizations) or Python 3.13.9 (pRF map characterization, volume visualizations). The following packages were essential in this process, some of which were already mentioned in the text above: analyzePRF 1.2 (https://github.com/cvnlab/analyzePRF/), NumPy 2.2.4 (https://numpy.org/), SciPy 1.15.2 (https://scipy.org/), matplotlib 3.10.1 (https://matplotlib.org/), and nibabel 5.3.2 (https://nipy.org/nibabel/).

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.