Introduction

Prevailing theories on the functional organization of primate neocortex are dominated by large-scale areal parcellations, ranging from approximately 200 areas in humans to around 130 in macaques1,2,3. Early sensory-motor areas of primate cortex, however, exhibit a more fine-grained organization, with clusters of neurons sharing common functional properties and precise patterns of connectivity within and between neighboring areas4,5. Such groups of neighboring neurons with shared response properties can be referred to as mesoscale functional units (MFUs): functional sub-compartments within larger brain regions. Traditional cortical columns are one example of MFUs, but MFUs in general can vary considerably in size, distribution, and frequency of occurrence within a given area, which distinguishes them from traditional columns.

While the concept of MFUs is not new, their existence and characteristics in higher-order extrastriate visual cortex, particularly within specialized category-selective areas, remain poorly understood. These category-selective regions are known to encode complex information beyond their preferred category, with voxels in face-selective areas, such as the fusiform face area (FFA)6, also encoding significant information about non-face categories7,8,9,10,11,12,13,14,15. Notably, substantial functional heterogeneity has been observed within these areas, with individual voxels exhibiting different response properties16,17. However, evidence that this functional heterogeneity is organized in a spatially coherent manner at small scales, as would be expected of MFUs, remains limited and contested18,19.

Previous work by Gallant and colleagues revealed that voxels with similar response profiles are grouped within the FFA10 and scene-selective regions17, but the median cluster size was approximately 0.5 cm³ (voxels were 17.6 mm³), likely reflecting a macro-level rather than a mesoscale functional organization. A notable exception is the work of Tanaka and colleagues, which revealed a columnar-like organization for simple shape features in anterior inferotemporal cortex20,21,22. However, spatial clustering of neurons at the mesoscale level has yet to be shown in category-selective cortex of primates.

In this study, we used sub-millimeter whole-brain fMRI (0.6 mm isotropic, or 0.216 mm³ voxels)23 to investigate the existence and characteristics of MFUs in high-level category-selective areas. We demonstrate that face-, body-, and object-selective areas can be reliably subdivided into anatomically clustered MFUs. Single-cell recordings in a body-selective area confirm a functional clustering similar to that observed with fMRI. Additionally, resting-state fMRI-based functional connectivity reveals preferential interhemispheric connectivity between MFUs of the same type. Moreover, our analyses revealed remarkable similarities in MFU properties across subjects, sharp boundaries between MFUs, and a spatial fall-off of response similarity comparable to that previously observed in single-unit recordings. These findings indicate the presence of a mesoscale functional organization in high-level category-selective areas and suggest that these MFUs form long-range mesoscale functional networks.

Results

Functional and anatomical clustering of voxels in middle lateral face area ML

In a block-design fMRI experiment, monkeys viewed 200 images belonging to 10 visual categories: human and monkey faces, human and monkey bodies, two types of objects (human and monkey objects), animals, birds, fruits, and sculptures (see Fig. 1, “Methods”, and ref. 24). Whole-brain contrast-agent-enhanced25 fMRI data (0.6 mm isotropic voxels) were acquired with phased-array receive coils embedded in the headset of the animals23,26,27. We first identified face area ML using the conjunction of three contrasts (monkey faces versus monkey objects, monkey faces versus fruits, and monkey faces versus monkey bodies). Only voxels reaching p < 0.05 for each contrast on both even and odd scan days were selected, a conservative approach yielding voxels that belong exclusively to ML. We then performed a hierarchical cluster analysis on the fMRI tuning curves of these 0.216 mm³ voxels. This revealed three clusters within ML, indicating functional grouping (Fig. 2A–C; see Supplementary Fig. S7 for a visualization of these clusters in low-dimensional space). The three-cluster solution was supported by a significant decrease in the Trace Cov W index (Fig. 2E) and by the largest change in cumulative distribution function (CDF) area from consensus clustering (Supplementary Fig. S3A). Although all three clusters were face-selective, as expected from previous research12,28,29 and our selection criteria, each cluster showed a different functional profile (Fig. 2B, C). Voxels of the first cluster were the most face-specific of the three clusters. The second and third clusters were also activated by animals and birds, and the third cluster was additionally characterized by a stronger sensitivity to mammals and weaker responses to inanimate objects.
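The selection and clustering steps described above can be sketched in Python with NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: it assumes per-contrast p-value arrays for even and odd scan days, z-scoring of each voxel's tuning curve as the normalization, and average linkage on Euclidean distance (the linkage criterion and normalization are not specified in this section). All function names are hypothetical.

```python
import numpy as np


def select_conjunction(p_even, p_odd, alpha=0.05):
    """Keep voxels with p < alpha for every contrast on both even and odd days.

    p_even, p_odd: arrays of shape (n_contrasts, n_voxels).
    """
    return np.all(p_even < alpha, axis=0) & np.all(p_odd < alpha, axis=0)


def normalize_profiles(responses):
    """z-score each voxel's tuning curve across categories (assumed normalization)."""
    mu = responses.mean(axis=1, keepdims=True)
    sd = responses.std(axis=1, keepdims=True)
    return (responses - mu) / np.where(sd == 0, 1.0, sd)


def average_linkage(profiles, n_clusters):
    """Naive agglomerative clustering, average linkage on Euclidean distance.

    Returns one integer label in 0..n_clusters-1 per row of `profiles`.
    Cubic-time loop: fine for a sketch, too slow for thousands of voxels.
    """
    d = np.linalg.norm(profiles[:, None, :] - profiles[None, :, :], axis=-1)
    clusters = [[i] for i in range(len(profiles))]
    while len(clusters) > n_clusters:
        best, pair = np.inf, (0, 1)
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Average linkage: mean distance over all cross-cluster pairs.
                dist = d[np.ix_(clusters[i], clusters[j])].mean()
                if dist < best:
                    best, pair = dist, (i, j)
        i, j = pair
        clusters[i] = clusters[i] + clusters[j]  # merge the closest pair
        del clusters[j]
    labels = np.empty(len(profiles), dtype=int)
    for k, members in enumerate(clusters):
        labels[members] = k
    return labels
```

For real data, scipy.cluster.hierarchy.linkage (for example with method="average") followed by fcluster(..., criterion="maxclust") performs the same agglomeration far more efficiently.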

Fig. 1: Stimuli and experimental design.

A block design was used in which each block lasted 30 s. Each category contained 20 different images; each image was shown for 750 ms and repeated twice per block (20 images × 2 presentations × 750 ms = 30 s). The color of each block matches the outline color of the example stimuli shown in the upper panel.

Fig. 2: ML contains three segregated mesoscale functional units with different categorical selectivity profiles and interhemispheric functional connections in subject M1.

A Unsorted normalized activation profiles in each 0.216 mm³ voxel. B Normalized activation profiles in each voxel sorted by the hierarchical cluster analysis. Left panel: dendrogram of the hierarchical cluster tree. C Mean normalized activation profiles in each MFU of ML. Each stimulus class is indicated by an example image. D Functionally clustered voxels (the three clusters in B) back-projected onto anatomical images in coronal (upper panel) and sagittal (lower panel) planes. Voxels from different functional clusters are also anatomically clustered, as indicated by the same color code as in (B). The insets in the upper right corner correspond to the black rectangles on the slices. E The Trace Cov W index indicated an optimal number of clusters of 3 (red dot). Blue curve: Trace Cov W index calculated for different numbers of clusters. Orange curve: first difference of the Trace Cov W indices calculated from the real data. Gray dashed line and shading: mean and 99% confidence interval of the first differences of the Trace Cov W indices calculated from 10,000 permutations. F Violin plot of the per-run FC analysis. Each dot represents FC calculated from a single run. Box plots (grey box in the center of the violin) show the interquartile range (IQR; 25th–75th percentiles), with a white dot indicating the median within each box. Whiskers extend to the most extreme data points within 1.5 × IQR of the box edges, or to the minimum/maximum values. FC between MFUs of the same type (i.e., belonging to the same functional cluster) across the two hemispheres is significantly higher than FC between MFUs of different types (n = 28 runs, p = 0.0014, uncorrected, two-sided paired t-test). G A permutation test shows that the veridical “same vs. different type MFUs FC strength” was significantly stronger than that of virtual MFUs with randomly assigned voxels (p = 10⁻⁴, uncorrected). The median value for the veridical “same vs. different type MFUs FC strength” is indicated by the red vertical line, and the median value for the same comparison with voxels randomly shuffled across MFUs (10,000 permutations) is indicated by the black vertical line. Source data are provided on GitLab (https://gitlab.com/lzgitlab/share/mfus_of_face_body).

Next, we mapped the voxels from these three functionally defined clusters onto the brain. Interestingly, these voxels were not randomly interspersed in a salt-and-pepper-like pattern but were grouped in spatially segregated units (Fig. 2D). Moreover, voxels belonging to the same functional cluster were found at approximately similar locations along the medio-lateral and posterior-anterior axes of ML in both hemispheres (Fig. 2D) and in both monkeys (Fig. 2 (M1) and Supplementary Fig. S1 (M2)). Thus, functionally clustered voxels within face area ML are also spatially clustered, forming orderly organized MFUs.

To quantify the similarity in response patterns between clusters across animals, we calculated Pearson correlation coefficients on the normalized response patterns of the three clusters between M1 and M2. After subtracting the mean response pattern within each monkey, matching clusters exhibited high correlation coefficients ranging from 0.84 to 0.95 in ML (Supplementary Fig. S4A, left panel), while non-matching clusters exhibited low correlations ranging from −0.79 to −0.04.

To statistically validate these observations, we performed a permutation test (10,000 iterations) in which voxels from M2 were randomly assigned to one of the three clusters while preserving the original cluster sizes. For each iteration, we computed the correlation matrix between the normalized mean response patterns of the three clusters across animals and compared the average diagonal (matching clusters) correlation with the average off-diagonal (non-matching clusters) correlation. The real data exhibited a significantly higher matching-versus-nonmatching cluster correlation than the permuted data (p = 0.00039; Supplementary Fig. S4A, right panel). Hence, the three types of MFUs in ML are conserved across animals beyond what would be expected by chance.
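The cross-animal test can be sketched as below. Two interpretations are assumptions of this sketch rather than statements from the paper: "subtracting the mean response pattern within each monkey" is taken to mean removing the average of the three cluster profiles within each animal, and the test statistic is the mean diagonal correlation minus the mean off-diagonal correlation. The function names are hypothetical.

```python
import numpy as np


def cluster_means(profiles, labels, n_clusters):
    """Mean normalized profile of each cluster, shape (n_clusters, n_categories)."""
    return np.array([profiles[labels == k].mean(axis=0) for k in range(n_clusters)])


def match_score(means_a, means_b):
    """Mean diagonal minus mean off-diagonal Pearson r between two animals' clusters."""
    # Assumed interpretation: remove the across-cluster mean pattern per animal.
    a = means_a - means_a.mean(axis=0)
    b = means_b - means_b.mean(axis=0)
    n = len(a)
    r = np.corrcoef(a, b)[:n, n:]  # rows: animal A clusters, columns: animal B clusters
    off = r[~np.eye(n, dtype=bool)]
    return np.diag(r).mean() - off.mean()


def permutation_p(prof_a, lab_a, prof_b, lab_b, n_clusters=3, n_perm=10_000, seed=0):
    """Shuffle animal B's voxel labels (cluster sizes preserved) to build a null."""
    rng = np.random.default_rng(seed)
    ma = cluster_means(prof_a, lab_a, n_clusters)
    observed = match_score(ma, cluster_means(prof_b, lab_b, n_clusters))
    null = np.array([
        match_score(ma, cluster_means(prof_b, rng.permutation(lab_b), n_clusters))
        for _ in range(n_perm)
    ])
    # Add-one correction so the estimated p-value is never exactly zero.
    p = (1 + np.sum(null >= observed)) / (1 + n_perm)
    return observed, p
```

Shuffling the label vector with rng.permutation keeps each cluster's size fixed, which is the constraint described in the text.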

Mesoscale functional units of ML form segregated mesoscale functional networks

Previous research has shown that face areas are interconnected across hemispheres30,31. Here, we asked whether a more detailed functional connectivity pattern exists, not at the level of areas but at the level of MFUs. Specifically, we investigated whether different-type MFUs within ML exhibit distinct connectivity profiles. To this end, we conducted a functional connectivity (FC) analysis on independent high-resolution resting-state fMRI data from the same subjects (0.6 mm isotropic voxels), using the functionally defined MFUs as regions of interest. We tested whether MFUs belonging to the same cluster in the two hemispheres (same-type MFUs) are preferentially connected with each other. Across runs, FC strength between same-type MFUs in the left and right hemispheres was significantly stronger than between different-type MFUs (p-values ≤ 0.002) (Fig. 2F (M1) and Supplementary Fig. S1 (M2)). To control for biases introduced by differences in MFU size, we conducted a permutation test in which we randomly shuffled the ML voxels across MFUs and compared the veridical “same vs. different type MFUs” FC strength against that of virtual MFUs consisting of permuted voxels. Again, FC across same-type MFUs in the two hemispheres was significantly higher in both subjects than for MFUs with permuted voxels (p-values ≤ 10⁻⁴) (Fig. 2G (M1) and Supplementary Fig. S1 (M2)).
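A minimal sketch of the interhemispheric FC contrast and the voxel-shuffling control follows. It assumes preprocessed resting-state time series per hemisphere, MFU-averaged time courses, Pearson correlation followed by a Fisher z-transform, and a per-run statistic defined as mean same-type FC minus mean different-type FC. The z-transform and the exact statistic are assumptions, and the helper names are hypothetical.

```python
import numpy as np


def fc_contrast(ts_left, lab_left, ts_right, lab_right, n_types=3):
    """Same-type minus different-type interhemispheric FC for one run.

    ts_*: (n_timepoints, n_voxels) arrays; lab_*: MFU type (0..n_types-1) per voxel.
    """
    mean_l = np.column_stack([ts_left[:, lab_left == k].mean(axis=1) for k in range(n_types)])
    mean_r = np.column_stack([ts_right[:, lab_right == k].mean(axis=1) for k in range(n_types)])
    r = np.corrcoef(mean_l.T, mean_r.T)[:n_types, n_types:]  # left MFUs x right MFUs
    z = np.arctanh(np.clip(r, -0.999999, 0.999999))  # Fisher z (assumed)
    same = np.diag(z).mean()
    different = z[~np.eye(n_types, dtype=bool)].mean()
    return same - different


def paired_t(diffs):
    """One-sample t statistic on per-run differences (equivalent to a paired t-test)."""
    d = np.asarray(diffs, dtype=float)
    return d.mean() / (d.std(ddof=1) / np.sqrt(len(d)))


def label_permutation_p(runs, lab_left, lab_right, n_perm=1000, seed=0):
    """Shuffle voxel-to-MFU assignments (sizes kept) within each hemisphere.

    runs: list of (ts_left, ts_right) pairs; the statistic is the median over runs.
    """
    rng = np.random.default_rng(seed)

    def stat(lab_l, lab_r):
        return np.median([fc_contrast(tl, lab_l, tr, lab_r) for tl, tr in runs])

    observed = stat(lab_left, lab_right)
    null = np.array([stat(rng.permutation(lab_left), rng.permutation(lab_right))
                     for _ in range(n_perm)])
    return observed, (1 + np.sum(null >= observed)) / (1 + n_perm)
```

A p-value for the per-run differences could be obtained with scipy.stats.ttest_1samp(diffs, 0.0), which is equivalent to a paired t-test on same-type versus different-type FC.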

Middle body-selective area MSB also contains mesoscale functional units, which form interhemispheric mesoscale functional networks

To generalize our findings, we performed the same analysis on another stringently defined category-selective area, this time belonging to the body-processing network: the middle body-selective area MSB (Fig. 3). MSB could also be subdivided into three MFUs, whose voxels clustered according to their differential responses to the stimulus categories.

Fig. 3: Three types of mesoscale functional units in body area MSB in M2.

Body area MSB in M2 contains three different clusters of voxels with different categorical selectivity profiles (functional clustering) (A, B), which are also anatomically clustered (C). Hence, MSB also consists of mesoscale functional units. D Trace Cov W index indicated that the optimal number of clusters = 3 (red dot). Gray shading indicates the 99% confidence interval of the first differences of the Trace Cov W indices from 10,000 permutations (cf. Fig. 2E). E, F MFUs of the same type across the two hemispheres (belonging to the same functional cluster) show stronger functional connections compared to different type MFUs (E, two-sided paired t-test, p = 0.005, uncorrected, n = 37 runs; F, permutation test, p = 0.017, uncorrected, 10,000 permutations). Box plots (grey box in the center of the violin) in (E) show the interquartile range (IQR; 25th–75th percentiles), with a white dot indicating the median within each box. Whiskers extend to the most extreme data points within 1.5 × IQR of the box edges, or to the minimum/maximum values. Data are from subject M2. Results are computed and presented as in Fig. 2. Source data are provided on GitLab (https://gitlab.com/lzgitlab/share/mfus_of_face_body).

MFU1 and MFU2 in M2 show the strongest responses to monkey and human bodies, as well as to mammals and birds. Whereas faces do not elicit responses in MFU1 of M2’s MSB, face-evoked activity gradually increases in MFU2 and MFU3. MFU3 also responds less to human bodies than MFU1 and MFU2 do (Fig. 3). In M1’s MSB (Supplementary Fig. S2), MFU1 is particularly responsive to monkey bodies, with minimal or no response to human bodies, animals, or birds. In contrast, MFU2 also responds to birds and, to a lesser extent, to mammals, a pattern that is reversed in MFU3. Interestingly, MFU3 also responds to faces but does not respond to either monkey or human objects, unlike MFU1.

To quantify the similarity in response patterns between the two animals in the three MSB clusters, we performed the same correlation analysis as in ML (Supplementary Fig. S4B). The responses in matching MFUs show higher correlations (ranging from 0.48 to 0.73; diagonal) across animals than those in non-matching ones (ranging from −0.81 to 0.25; off-diagonal) (left panel). This finding was further validated using a permutation test (right panel), as described above for ML. The actual, non-permuted data revealed a significantly higher (p = 0.028) matching-versus-nonmatching cluster correlation than the permuted data (Supplementary Fig. S4B, right panel).

Finally, these functionally clustered voxels appear to be anatomically clustered within body area MSB, hence, they constitute MFUs within area MSB. Moreover, same-type MFUs of MSB also show higher interhemispheric functional connectivity, compared to different-type MFUs. These results were again confirmed in both monkeys (Fig. 3E, F (M2), and Supplementary Fig. S2E, F (M1)).

Correspondence between fMRI and single-unit defined mesoscale functional units in MSB

The discovery of MFUs in category-selective areas is based on hemodynamic signals. Therefore, to investigate the neuronal basis of this finding, we reanalyzed single-unit recordings obtained throughout the entire extent of fMRI-defined MSB in another animal (M3). MSB was first identified in low-resolution fMRI maps (1.25 mm instead of 0.6 mm isotropic voxels), and fMRI-guided single-unit recordings were performed using half of the stimuli that were also used in the high-resolution fMRI experiment (for details see “Methods”, and refs. 15,24). Hierarchical cluster analysis on the population of single-unit responses showed that body-selective cells (n = 98), selected from the entire visually driven population using the same criteria as the fMRI voxels (i.e., the conjunction of three body-selective contrasts: monkey bodies versus monkey objects, monkey bodies versus fruits, and monkey bodies versus monkey faces), could be clustered into three subdivisions (Fig. 4A). The three-cluster solution was supported by a significant decrease in the Trace Cov W index (Fig. 4C) and by the largest change in CDF area from consensus clustering, with or without averaging responses within each category (Supplementary Fig. S3D).
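The choice of three clusters rests on a drop in the Trace Cov W index that exceeds a permutation null. A hedged sketch of that logic follows. It substitutes the trace of the pooled within-cluster scatter matrix W for the Trace Cov W index, uses a small k-means as a stand-in for the hierarchical clustering, and builds the null by shuffling each category column independently across observations. All three choices, and the function names, are assumptions for illustration.

```python
import numpy as np


def trace_w(X, labels):
    """Trace of the pooled within-cluster scatter matrix W (sum of squared deviations)."""
    total = 0.0
    for k in np.unique(labels):
        Xk = X[labels == k]
        total += ((Xk - Xk.mean(axis=0)) ** 2).sum()
    return total


def kmeans(X, k, n_iter=50):
    """Small Lloyd's k-means with farthest-point initialization (stand-in clusterer)."""
    centers = [X[0]]
    for _ in range(1, k):
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d)])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        labels = np.argmin(((X[:, None, :] - centers[None]) ** 2).sum(axis=-1), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels


def elbow_permutation_test(X, k_max=6, n_perm=200, ci=99.0, seed=0):
    """First differences of trace(W) over k = 1..k_max versus a column-shuffled null.

    Returns the real first differences and the lower/upper CI bounds of the null.
    """
    rng = np.random.default_rng(seed)
    ks = range(1, k_max + 1)
    real_diff = np.diff([trace_w(X, kmeans(X, k)) for k in ks])
    null = np.empty((n_perm, k_max - 1))
    for p in range(n_perm):
        # Shuffle each category column independently to destroy joint structure.
        Xp = np.column_stack([rng.permutation(col) for col in X.T])
        null[p] = np.diff([trace_w(Xp, kmeans(Xp, k)) for k in ks])
    tail = (100.0 - ci) / 2.0
    lo, hi = np.percentile(null, [tail, 100.0 - tail], axis=0)
    return real_diff, lo, hi
```

Here a drop from k − 1 to k clusters would be flagged when real_diff[k − 2] falls below lo[k − 2]; the exact decision rule used for Fig. 2E and Fig. 4C is not specified in this section.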

Fig. 4: Comparison between MFUs based on fMRI and functional clusters based on single-cell recordings in body-selective area MSB.

A Three functional clusters of category-selective cells recorded throughout the entire extent of MSB (M3), obtained using single-unit recordings (98 neurons). Right, spiking activity matrix in which each row represents the normalized responses of a neuron to 100 stimuli (10 per category) in MSB. Left, dendrogram of the hierarchical cluster tree. B Hierarchical clustering conducted on averaged fMRI and single-cell cluster profiles. Right, averaged fMRI and single-cell cluster profiles. Each row of the average single-cell cluster profile represents the mean normalized response of a functional cluster in (A) across the 10 stimulus categories. Each stimulus class is indicated by an example image. Left, dendrogram of the hierarchical cluster tree. C The Trace Cov W index (calculated from averaged single-cell responses at the category level) indicated an optimal number of clusters of 3 (red dot). Gray shading indicates the 99% confidence interval of the first differences of the Trace Cov W indices from 10,000 permutations (cf. Fig. 2E). D Permutation test results. The red line represents the within-versus-between-cluster correlation coefficient of activation profiles between fMRI MFUs and single-cell functional clusters. The histogram represents the distribution of the same correlation coefficient calculated from 10,000 permutations. The results demonstrate a significantly higher (p = 0.0002, uncorrected) matching-versus-nonmatching cluster correlation of activation patterns between fMRI and single-neuron clusters when neurons with similar response profiles are grouped together than when they are arranged in a salt-and-pepper configuration. Source data are provided on GitLab (https://gitlab.com/lzgitlab/share/mfus_of_face_body).

The category-selective tuning profiles of these neuronal clusters were not simply face-, body-, or animal-selective. Instead, they were surprisingly similar to those of the fMRI-defined MFUs measured in different animals. To quantify this correspondence, we calculated nonparametric Spearman correlation coefficients between the average across-voxel activation profiles of the three fMRI-defined MFUs (using data of both subjects) and the average across-neuron spiking activity profiles of the three functionally defined clusters obtained in the electrophysiology experiment. A hierarchical cluster analysis showed that, instead of separating into distinct fMRI and single-cell clusters, each fMRI cluster was paired specifically with one single-cell cluster (Fig. 4B). This suggests that the category-selective activity patterns defining the three fMRI MFUs within MSB are also reflected in the single-unit responses recorded in the same area. Note, however, that unlike the fMRI data, the single-unit data by themselves could not provide evidence for anatomical clustering.

One might argue that the clusters observed in fMRI data are artificially induced by spatial smoothing, even if single neurons are organized in a salt-and-pepper pattern. To rule out this possibility, we conducted a permutation test by randomly assigning each single neuron to one of the 3 clusters 10,000 times, while preserving the original cluster sizes identified through hierarchical clustering analysis. For each permutation, we computed the mean activation pattern of each cluster and correlated them with those of the fMRI clusters. We then compared the matching-versus-nonmatching cluster correlation coefficient from the permuted data (represented by the histogram) to that obtained from our original clustering analysis (indicated by the dashed red vertical line in Fig. 4D). The observed correlation was significantly higher (p = 0.0002) for neurons grouped by similar response profiles as compared to when they were arranged in a salt-and-pepper configuration. This finding strongly supports the existence of spatially clustered neurons.
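The neuron-shuffling control can be sketched as follows, using Spearman correlations as in the correspondence analysis above. The sketch assumes that the fMRI MFUs and the single-unit clusters have already been put in correspondence (row i of the fMRI profiles matched to neuron cluster i) and that response profiles contain no ties; the function names are hypothetical.

```python
import numpy as np


def rank(x):
    """Ranks (0-based) along the last axis; assumes no ties."""
    return np.argsort(np.argsort(x, axis=-1), axis=-1).astype(float)


def spearman_matrix(a, b):
    """Spearman correlations between rows of a and rows of b (Pearson r on ranks)."""
    n = len(a)
    return np.corrcoef(rank(a), rank(b))[:n, n:]


def match_minus_nonmatch(r):
    """Mean diagonal minus mean off-diagonal entry of a square correlation matrix."""
    n = r.shape[0]
    return np.diag(r).mean() - r[~np.eye(n, dtype=bool)].mean()


def neuron_shuffle_test(fmri_means, neuron_resp, labels, n_perm=10_000, seed=0):
    """Compare the veridical grouping of neurons with random groupings of equal size.

    fmri_means: (k, n_categories) MFU profiles, row i matched to neuron cluster i.
    neuron_resp: (n_neurons, n_categories) normalized category-averaged responses.
    """
    rng = np.random.default_rng(seed)
    k = fmri_means.shape[0]

    def score(lab):
        means = np.array([neuron_resp[lab == c].mean(axis=0) for c in range(k)])
        return match_minus_nonmatch(spearman_matrix(fmri_means, means))

    observed = score(labels)
    # rng.permutation keeps the original cluster sizes, as in the text.
    null = np.array([score(rng.permutation(labels)) for _ in range(n_perm)])
    return observed, (1 + np.sum(null >= observed)) / (1 + n_perm)
```

If responses can tie, scipy.stats.spearmanr, which assigns average ranks to ties, would be a safer choice than the argsort-based ranking used here.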

Mesoscale functional units in object-responsive cortex

Finally, we asked whether clustering into MFUs is restricted to face and body areas. We performed the same analysis on the object-responsive region adjacent to ML and MSB, which also contained three MFUs (Supplementary Fig. S5). These MFUs predominantly preferred inanimate stimulus categories, with a notable emphasis on human and monkey objects. In both animals, MFU3 was weakly activated by human bodies and sculptures, but not by monkey faces and bodies. The main distinction between MFU1 and MFU2 was that MFU1 showed a slightly stronger preference for human faces and bodies than MFU2. Correlations between the response profiles of corresponding clusters in the two animals demonstrated high inter-subject reproducibility (see Supplementary Fig. S4C).

Additionally, as observed in the face (ML) and body-selective patches (MSB), interhemispheric functional connectivity was higher for same-type MFUs compared to different-type MFUs within the object-selective patch in the inferotemporal cortex of both animals (Supplementary Fig. S5E, F, K, L).

These findings suggest that the fine-grained organization, whereby large areas are subdivided into MFUs which form mesoscale functional networks, might be a general feature of the IT cortex.

Differences among these mesoscale functional units cannot be explained by gradual variations in eccentricity bias or face/body-selectivity

We observed stronger interhemispheric functional connectivity between same-type MFUs than between different-type MFUs within ML, MSB, and the object-selective patches. One might argue that this pattern instead reflects stronger functional connections between matching than between non-matching eccentricity representations, given that these patches are retinotopically organized32.

To determine whether the MFUs can be distinguished based on eccentricity differences, we conducted a separate high-resolution (0.6 mm isotropic voxels) retinotopic mapping experiment in the same subjects27. We then computed the correlation between distance matrices derived from eccentricities and those derived from full object response profiles, to assess how much of the variance among the three clusters could be attributed to eccentricity. To evaluate the reliability of these distance measures, we split the data from each experiment into independent halves and correlated the distances computed from the same response property across the splits. Test-retest correlations were high for both types of distance measures (eccentricity and full object response profiles), whereas the cross-correlations between the two were low (Fig. 5A). Thus, although eccentricity might account for some variance among the three clusters, its contribution is small: eccentricity explained only 4.03% and 8.97% of the variance in ML, and 1.21% and 8.46% in MSB, in the two subjects, respectively. Eccentricity therefore does not drive the clustering or the stronger interhemispheric functional connectivity between same-type MFUs.
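One way to compute the variance explained by eccentricity is sketched below. It is an assumption, not the paper's stated procedure, that distances are computed between individual voxels, that eccentricity distance is the absolute difference in eccentricity, and that variance explained is the squared Pearson correlation between the vectorized distance matrices. The function names are hypothetical.

```python
import numpy as np


def upper(d):
    """Vectorize the strict upper triangle of a square distance matrix."""
    i, j = np.triu_indices(d.shape[0], k=1)
    return d[i, j]


def profile_distances(profiles):
    """Euclidean distances between voxel response profiles (n_voxels x n_categories)."""
    return np.linalg.norm(profiles[:, None, :] - profiles[None, :, :], axis=-1)


def eccentricity_distances(ecc):
    """Absolute eccentricity differences between voxels (assumed distance measure)."""
    return np.abs(ecc[:, None] - ecc[None, :])


def variance_explained(ecc, profiles):
    """Squared Pearson r between eccentricity and response-profile distance vectors."""
    r = np.corrcoef(upper(eccentricity_distances(ecc)),
                    upper(profile_distances(profiles)))[0, 1]
    return r ** 2


def split_half_reliability(dist_half1, dist_half2):
    """Test-retest correlation of one distance measure across two data splits."""
    return np.corrcoef(upper(dist_half1), upper(dist_half2))[0, 1]
```

The same functions apply to the face/body selectivity control by passing per-voxel selectivity t-values in place of eccentricities.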

Fig. 5: MFU organization is independent of eccentricity and face/body selectivity, and exhibits sharp boundaries.

A, B Correlation analyses demonstrating the independence of MFU organization from eccentricity and face/body selectivity. Test-retest reliability (lines) for distance matrices derived from eccentricities (yellow in A) and face/body selectivity indices (yellow in B), alongside reliability for full object response profiles (orange). Cross-correlation between distance measurements based on categorical responses and those based on eccentricity (blue bars in A) or face/body selectivity (blue bars in B) is shown. Data are presented for both monkeys. C, D Sharp transitions at MFU boundaries. Comparison of Euclidean distances between object response profiles of neighboring voxels along the borders of MFUs in subject 1 (M1, C) and subject 2 (M2, D). Distances were calculated for voxel pairs within the same MFU (adjacent voxels within the same MFU along the boundary) and across different MFUs (adjacent voxels across a boundary). Box plots (grey box in the center of the violin) show the interquartile range (IQR; 25th–75th percentiles), with a white dot indicating the median within each box. Whiskers extend to the most extreme data points within 1.5 × IQR of the box edges, or to the minimum/maximum values. For each subject, two-sided Wilcoxon rank-sum tests revealed significant differences: M1: ML (across/within pairs, n = 201/452, p = 9.9 × 10⁻²⁸); MSB (n = 147/362, p = 5.8 × 10⁻²⁸). M2: ML (n = 280/612, p = 1.3 × 10⁻¹⁸); MSB (n = 123/253, p = 6.5 × 10⁻⁸). All p-values are uncorrected. Source data are provided on GitLab (https://gitlab.com/lzgitlab/share/mfus_of_face_body).

We performed a similar analysis to assess the contribution of face/body-selective responses to differences among the three MFU clusters. Specifically, we computed correlations between distance matrices derived from face (ML) and body (MSB) selectivity indices (calculated as t-values from the “faces versus bodies” contrast, or vice versa) and those derived from full object response profiles. Again, test-retest correlations were high for both types of distance measures, but the cross-correlations between them were low (Fig. 5B). Face/body selectivity explained only 0.87% and 1.47% of the variance in ML, and 1.36% and 18.28% in MSB, in the two subjects, respectively.

Smooth or sharp functional boundaries between adjacent mesoscale functional units

To determine whether object response profiles change gradually across neighboring MFUs or exhibit sharp boundaries, we compared the Euclidean distances between the object response profiles of neighboring voxels along MFU borders. Specifically, we calculated functional distances (1) between adjacent border voxels belonging to the same MFU (“within” condition) and (2) between adjacent border voxels belonging to different MFUs (“across” condition), and compared them using a Wilcoxon rank-sum test. In each region and subject (Fig. 5C, D and Supplementary Fig. S6), Euclidean distances between the response profiles of neighboring voxels were significantly larger across MFU borders than within the same MFU (all p < 10⁻⁷). Hence, functional properties change sharply at the boundaries between adjacent MFUs.
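The border analysis can be sketched on a labeled voxel grid. Assumptions of this sketch: MFU labels are stored in a 3-D integer array (−1 outside the region), adjacency means sharing a face, a border voxel is one with at least one face-neighbor in a different MFU, and the rank-sum test uses the large-sample normal approximation without tie correction (scipy.stats.ranksums implements the same approximation). The function names are hypothetical.

```python
import math

import numpy as np

OFFSETS = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]  # each face-adjacent pair counted once


def shift_pairs(labels, offset):
    """Coordinates (a, b) of face-adjacent voxel pairs that both lie inside the ROI."""
    a = np.argwhere(labels >= 0)
    b = a + np.array(offset)
    inside = np.all(b < np.array(labels.shape), axis=1)
    a, b = a[inside], b[inside]
    keep = labels[tuple(b.T)] >= 0
    return a[keep], b[keep]


def border_distances(labels, profiles):
    """Profile distances for within-MFU and across-MFU adjacent pairs at borders.

    profiles: array of shape labels.shape + (n_categories,).
    """
    pairs = [shift_pairs(labels, off) for off in OFFSETS]
    border = np.zeros(labels.shape, dtype=bool)
    for a, b in pairs:
        differ = labels[tuple(a.T)] != labels[tuple(b.T)]
        border[tuple(a[differ].T)] = True
        border[tuple(b[differ].T)] = True
    within, across = [], []
    for a, b in pairs:
        la, lb = labels[tuple(a.T)], labels[tuple(b.T)]
        d = np.linalg.norm(profiles[tuple(a.T)] - profiles[tuple(b.T)], axis=-1)
        across.extend(d[la != lb])
        both_border = border[tuple(a.T)] & border[tuple(b.T)]
        within.extend(d[(la == lb) & both_border])
    return np.array(within), np.array(across)


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test (normal approximation, no tie correction)."""
    values = np.concatenate([x, y])
    ranks = np.empty(len(values))
    ranks[np.argsort(values, kind="mergesort")] = np.arange(1, len(values) + 1)
    n1, n2 = len(x), len(y)
    w = ranks[:n1].sum()
    mu = n1 * (n1 + n2 + 1) / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mu) / sigma
    return z, math.erfc(abs(z) / math.sqrt(2))
```

A positive z from rank_sum_test(across, within) indicates larger functional distances across borders than within MFUs, the pattern reported above.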

Comparing spatial falloff patterns of response similarities between fMRI and single-unit recordings

Previous single-unit recordings in monkey IT cortex have shown that nearby cells exhibit similar stimulus and category selectivity, with response similarity (the correlation of response profiles) decreasing as the distance between cells increases13,33. This spatial profile of response similarity can be approximated by the simple rational function 1/(1 + x)33. We assessed whether our high-resolution fMRI data show a comparable spatial falloff in IT. We used 1 − Euclidean distance as an index of response similarity and pooled voxels from ML, MSB, and the object-responsive regions to match the broader IT sampling of previous single-unit studies13,33. Cortical distances between voxel pairs were measured as center-to-center geometric distances. The mean pairwise similarity exhibited a falloff pattern (black line in Fig. 6) similar to that observed in previous single-unit data33. Importantly, fitting a rational function a/(1 + bx) + c with three free parameters to the raw data points yielded a slope parameter b of 0.9532 (95% CI: 0.88–1.03), which is not significantly different from the slope fitted to the single-unit recording data (b = 1). A likelihood-ratio F-test confirmed that this three-parameter model did not provide a significantly better fit than an alternative model with b fixed at 1 (F(1, 223501) = 1.38, p = 0.24). Thus, the change in response properties as a function of distance along the cortical surface, measured with high-resolution fMRI, corresponds remarkably well to data obtained from dense electrophysiological recordings in IT cortex.
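The model comparison can be illustrated with a variable-projection fit: for a fixed slope b, the model a/(1 + bx) + c is linear in a and c, so b can be profiled over a grid while a and c are solved exactly by least squares. The grid range, its resolution, and the normal approximation to the F(1, df2) tail (reasonable when df2 is in the hundreds of thousands, as here) are assumptions of this sketch rather than details from the paper; the function names are hypothetical.

```python
import math

import numpy as np


def rss_for_b(x, y, b):
    """Least-squares a, c for y ~ a / (1 + b x) + c at fixed b; returns (rss, a, c)."""
    design = np.column_stack([1.0 / (1.0 + b * x), np.ones_like(x)])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    return float(resid @ resid), coef[0], coef[1]


def fit_rational(x, y, b_grid=None):
    """Profile the slope b over a grid (b = 1 included so the nested fit is covered)."""
    if b_grid is None:
        b_grid = np.union1d(np.geomspace(0.01, 100.0, 4001), [1.0])
    fits = [rss_for_b(x, y, b) for b in b_grid]
    best = int(np.argmin([f[0] for f in fits]))
    rss, a, c = fits[best]
    return b_grid[best], a, c, rss


def f_test_b_equals_1(x, y):
    """Nested-model F test: free b (3 parameters) versus b fixed at 1 (2 parameters)."""
    b_hat, _, _, rss_full = fit_rational(x, y)
    rss_restricted = rss_for_b(x, y, 1.0)[0]
    df2 = len(x) - 3
    f = max(rss_restricted - rss_full, 0.0) / (rss_full / df2)
    # With df1 = 1, F equals t**2; for large df2, t is approximately standard normal.
    p = math.erfc(math.sqrt(f) / math.sqrt(2))
    return b_hat, f, p
```

With F = 1.38, this normal approximation gives p ≈ 0.24, consistent with the value reported above.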

Fig. 6: fMRI response similarity spatial profile mirrors single-unit recordings in IT cortex.

Mean pairwise voxel similarity plotted against cortical distance for pooled fMRI data from ML, MSB, and object-responsive regions (data of M1 and M2 combined). The black line shows the observed decay pattern, with grey shading indicating the standard deviation at each pairwise distance. The green line represents the best-fit rational function, with a slope parameter (b = 0.9532) not significantly different from that obtained from the single-unit recordings. Source data are provided on GitLab (https://gitlab.com/lzgitlab/share/mfus_of_face_body).

Discussion

Compared with early visual cortex, the mesoscale functional organization of higher-level visual cortex, especially inferotemporal cortex, is still poorly understood. This is mainly due to a lack of high-spatial-resolution tools with a field of view large enough to cover difficult-to-reach cortex. Exquisite optical imaging and electrophysiological recordings revealed a columnar-like organization for simple shape features in accessible parts of anterior inferotemporal cortex20,21,22. Our sub-millimeter fMRI approach, however, mitigates these accessibility limitations and revealed orderly organized, anatomically segregated functional units within fMRI-defined category-selective areas of primate inferotemporal cortex that lie hidden in a sulcus. The combination of functionally and anatomically segregated clusters in face-selective area ML and body-selective area MSB shows that specialized category-selective areas in inferotemporal cortex can also be subdivided into MFUs, resembling the functional architecture found in early visual areas such as V1, V234,35,36,37, MT38,39,40, and V434,41,42,43.

Moreover, an independent high-resolution resting-state fMRI dataset revealed preferential functional connections between MFUs of the same type (i.e., those belonging to the same functional cluster) across hemispheres. This finding complements and refines previous results showing that face areas in the two hemispheres are interconnected30,31. Furthermore, our analyses demonstrated that the stronger functional connectivity between same-type than between different-type MFUs cannot be attributed to differences in eccentricity representations (Fig. 5A). Although it remains to be investigated whether these functional connectivity patterns also correspond to differences in anatomical connections, our results indicate that MFUs of the same type form mesoscale functional networks spanning large distances (in this case even across hemispheres, possibly spanning multiple synapses). The functional connectivity results also resemble those observed in early visual areas, where MFUs of the same type located in different areas are interconnected. For example, color-biased blobs in V1 are connected with color-biased thin stripes in area V2, and vice versa36,44,45.

Because it is difficult to systematically parametrize stimuli that activate different category-selective regions in the inferotemporal cortex, we are limited to qualitative comparisons of the functions of different MFUs. In ML, MFU2 and MFU3 exhibit more complex tuning curves than MFU1. Voxels in both MFU2 and MFU3 display a progressively higher sensitivity to animals and birds, which include bodies as well as heads and face-like features. Furthermore, MFU3 shows enhanced responses to monkey bodies and reduced responses to inanimate objects. Given these increased responses to non-face stimuli in MFU2 and MFU3, one may speculate that these MFUs play a role in contextualizing lower-level face information. For instance, they could integrate facial information with information from other body parts, thereby providing a more comprehensive representation of the entire body.

Strikingly, single-cell data recorded from the same area (MSB) revealed three functional clusters whose average population response patterns were surprisingly similar to those obtained with the high-resolution fMRI data. Notably, the single-unit data were guided by low-resolution fMRI and were acquired with single-contact electrodes across multiple acute recording sessions. Because of factors such as idiosyncratic bending of the electrodes, it is difficult to precisely determine the relative positions of the recorded neurons across days. As a result, spatial clustering cannot be firmly established from this electrophysiological dataset alone. Our permutation analysis presented in Fig. 4D, however, strongly suggests that the neurons belonging to a functional cluster are also spatially clustered. Moreover, our analyses showed an abrupt change in functional properties across borders between adjacent MFUs (Fig. 5C, D and Supplementary Fig. S6) and a change in response properties as a function of cortical distance similar to that observed in single-unit recordings of IT cortex (Fig. 6). Thus, notwithstanding the limitations of the spatial clustering analysis of the single units, our data suggest that when spatial resolution and sensitivity are sufficiently high, fMRI can effectively capture neuronal population responses across the entire cortex.

Previous low-resolution fMRI studies using non-parametric statistics have shown that category-selective areas are functionally diverse, with individual voxels encoding multiple types of information. For instance, the FFA processes various types of facial information and can also differentiate between many non-face object categories7,8,11. These findings are further supported by single-unit recordings in fMRI-targeted category-selective regions14,15,46,47,48,49,50,51. Interestingly, low-spatial-resolution human fMRI studies have also identified spatially distinct functional subdomains within larger category-selective regions, such as the FFA10, and within scene-selective areas such as the parahippocampal place area, retrosplenial complex, and occipital place area17. While the FFA subdivisions reported by Çukur et al. (2013) appear to resemble those we observed in monkey ML, the voxel size in those human studies10,17 was nearly two orders of magnitude larger than in the present study. It therefore remains an open question whether the subdivisions observed here with high-resolution data correspond to those identified in earlier low-resolution imaging studies and, if so, why such subdivisions would be detectable with lower-resolution techniques (see ref. 18 for further discussion).

In summary, our data indicate that category-selective areas contain functionally heterogeneous neurons and that cells with similar responses are spatially grouped. Since MFUs of the same type are functionally connected across large distances, even across hemispheres (see schematic overview in Fig. 7), it is tempting to speculate that mesoscale functional networks may be a fundamental architectural feature of the entire primate cortex. In addition, our findings demonstrate that ultra-high-resolution, high-sensitivity fMRI can capture population responses akin to those observed in electrophysiology, making it a powerful tool for uncovering this mesoscale neural architecture in both humans and non-human primates40,52,53,54,55.

Fig. 7: Schematic summary.

Macroscale face-selective patches (orange), body-selective patches (yellow), and object-selective patches are organized into mesoscale functional units (MFUs, represented by colored circles within each category-selective patch). MFUs of the same type are preferentially linked across hemispheres, forming long-range mesoscale functional networks (MFNs). Each category-selective patch examined contains three types of MFUs, and the same MFU type can appear multiple times within a given macroscale category-selective patch.

Methods

Subjects

Most procedures are the same as in ref. 26 and are summarized below.

Two rhesus monkeys (Macaca mulatta; 1 female; 5–9 kg), M1 and M2, were used for the high-resolution fMRI study. They had previously participated in several other studies26,27,56,57. Another male rhesus monkey, M3 (6.5 kg), was used for the electrophysiological experiments; this subject had also participated in previous studies15,24. Animal care and experimental procedures were performed in accordance with the National Institutes of Health's Guide for the Care and Use of Laboratory Animals and the European legislation (Directive 2010/63/EU), and were approved by the Animal Ethics Committee of KU Leuven. Weatherall reports were used as a reference for animal housing and handling. All animals were group-housed in cages of 16–32 m3, which encourage social interaction and locomotor behavior. The environment was enriched with foraging devices and toys. The animals were fed daily with standard primate chow supplemented with fruits, vegetables, bread, peanuts, cashew nuts, raisins, and dried apricots. The animals were exposed to natural light and additional artificial light for 12 h every day. On training and experimental days, the animals could obtain unlimited fluid through their performance during the experiments: using operant conditioning with positive reinforcers, they received a fluid reward for every correctly performed trial. On non-working days, they received water in their living quarters. Throughout the study, the animals' psychological and veterinary welfare was monitored daily by the veterinarians, the animal facility staff, and the lab's scientists, all specialized in working with non-human primates. The animals were healthy at the conclusion of our study. M1 and M2 are still participating in other studies.

To improve the sensitivity of sub-millimeter-resolution (f)MRI, 8-channel (M1) or 10-channel (M2) phased-array receive coils were embedded in an MRI-compatible headpost above the skull23. M3 was also implanted with a magnetic resonance (MR)-compatible headpost, but without an embedded phased-array receive coil, and was equipped with a recording chamber targeting the middle portion of the superior temporal sulcus (STS).

The monkeys were trained to maintain fixation within a 2° \(\times\) 2° virtual window in the center of the screen while sitting in a sphinx position inside a plastic primate chair, with their heads restrained by a plastic headpost. Fluid rewards were contingent upon fixation and upon keeping both hands on response keys within a response box in front of the chair (hand position was monitored using infrared beams). Before each fMRI scan, 8–11 mg/kg monocrystalline iron oxide nanoparticles (Molday ION, BioPAL) were injected via the femoral or saphenous vein to improve the contrast-to-noise ratio and to reduce the contribution of superficial draining veins25. To mitigate the risk of iron accumulation, 1 g/day deferoxamine mesylate (Desferal, Novartis; intramuscular injection) and 60 mg/kg/day deferiprone (Ferriprox, ApoPharma; oral administration) were administered immediately after the scan, and this iron chelation was continued for 4–20 days, until serum iron and ferritin levels had returned approximately to normal ranges.

(f)MRI acquisition

High resolution fMRI (M1 and M2)

A 3 T Siemens PrismaFit scanner was used at KU Leuven. For sub-millimeter (0.6 mm isotropic voxel size) functional measurements, we used a 2D simultaneous multi-slice gradient-echo echo-planar imaging (EPI) sequence with the implanted 8-channel (M1) or 10-channel (M2) receive coils and a custom-built local single-loop transmit coil to cover the whole brain [echo time (TE) = 22/21 ms; multiband (MB) factor = 2; acceleration factor = 3; in-plane field of view (FOV) 84 \(\times\) 84 mm; matrix size 140 \(\times\) 140; slices = 74 (M1)/80 (M2), adjusted according to individual brain size; flip angle (\(\alpha\)) = 90°]. The repetition time (TR) was 3000 ms in all experiments except the resting-state experiments (TR = 2850 ms).

During a separate session under ketamine-medetomidine anesthesia, we obtained template EPI images using the same sequences, coils, and subject positions as during the awake behaving fMRI sessions. Field maps acquired within the same session were used to correct for EPI distortions caused by magnetic field inhomogeneity [0.6 \(\times\) 0.6 \(\times\) 0.7 mm voxel size; TR = 917 ms; \(\mathrm{TE}_1\) = 6.48 ms, \(\mathrm{TE}_2\) = 8.94 ms; in-plane FOV 84 \(\times\) 84 mm; matrix size 140 \(\times\) 140; slices = 66 (M1)/70 (M2), adjusted according to individual brain size; \(\alpha\) = 55°]. To improve registration between the EPI template image and the anatomical reference images, we also acquired T1-weighted (T1-w) 3D magnetization-prepared rapid gradient echo (MPRAGE) images (0.6 mm isotropic voxel size; TR = 2700 ms; TE = 3.8 ms; in-plane FOV 154 \(\times\) 125 mm; matrix size 256 \(\times\) 208; slices = 144; \(\alpha\) = 9°; inversion time (TI) = 850 ms) in the same anesthetized session. These served as intermediate images for registering the images acquired in the awake fMRI sessions to the high-resolution T1-weighted images (acquired in a different session under anesthesia).

To visualize the functional results, we acquired high-resolution (0.4 mm isotropic voxel size) T1-w and T2-w images while the subjects were under ketamine-xylazine (M1) or ketamine-medetomidine (M2) anesthesia, using a custom-built local single-loop receive coil and the body transmit coil of the scanner. Thirteen (M1) and eleven (M2) T1-w 3D images were acquired using a 3D MPRAGE sequence [TR = 2700 ms; TE = 3.5 ms; in-plane FOV 104 \(\times\) 128 mm; matrix size 250 \(\times\) 320; slices = 208; \(\alpha\) = 9°; TI = 882 ms]. In addition, five (M1) and four (M2) T2-w 3D images were acquired with the same in-plane FOV and matrix size as the T1-w images, in order to correct field inhomogeneities in the anatomical images and to exclude dura and blood vessels from the pial surfaces58. The T2-w images were acquired using a sampling perfection with variable flip angle turbo spin-echo (SPACE) sequence (TR = 3200 ms; TE = 456 ms; total turbo factor = 131; echo spacing = 6 ms; no fat suppression).

Low-resolution fMRI for the electrophysiology monkey M3

Details of the fMRI procedure, data analysis, and results are provided in refs. 15,24 and are very similar to those described above for the high-resolution experiments in M1 and M2, except that a 3 T Siemens Trio scanner was used and the voxel volume was about 9 times larger (1.95 mm3 voxels) than in the high-resolution experiments (0.22 mm3 voxels). Specifically, we used a gradient-echo single-shot echo-planar imaging (EPI) sequence (TR = 2000 ms; TE = 17 ms; \(\alpha\) = 75°; matrix size 80 \(\times\) 80; slices = 40; 1.25 mm isotropic voxel size). The functional images were co-registered with a high-resolution (0.4 mm isotropic) T1-w 3D anatomical image of the monkey's individual brain (see above), which served as a template.
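The 9-fold difference refers to voxel volume and follows from cubing the isotropic edge lengths. A minimal arithmetic check in Python; the 17.6 mm3 value is the human-study voxel size quoted in the Introduction and is included only to illustrate the "nearly two orders of magnitude" comparison made in the Discussion.

```python
def voxel_volume_mm3(edge_mm: float) -> float:
    """Volume of an isotropic voxel with the given edge length (mm^3)."""
    return edge_mm ** 3

high_res = voxel_volume_mm3(0.6)    # M1, M2: 0.216 mm^3 (reported as 0.22 mm^3)
low_res = voxel_volume_mm3(1.25)    # M3: 1.953 mm^3 (reported as 1.95 mm^3)
ratio_low = low_res / high_res      # about 9.04, the "9 times" difference

# Voxel size of the earlier human studies (17.6 mm^3) relative to 0.6 mm voxels.
ratio_human = 17.6 / high_res       # about 81, i.e. close to two orders of magnitude

print(f"{high_res:.3f} {low_res:.3f} {ratio_low:.2f} {ratio_human:.1f}")
```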

Experimental design and stimuli

In the scanner (M1–M3), stimuli were projected with a Barco LCD projector (60 Hz refresh rate, 1400 \(\times\) 1050 resolution) onto a translucent screen located 57 cm from the subjects' eyes. Eye movements were monitored with an infrared corneal-reflection eye-tracking system (ISCAN, 120 Hz). Only runs in which the subjects fixated within a 2° \(\times\) 2° window centered on the middle of the screen for more than 90% of the run were retained for further analysis. In addition, the high-resolution fMRI monkeys M1 and M2 were required to keep both hands on two response keys within a response box in front of the chair, which helped to substantially reduce motion-induced susceptibility artifacts.
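The run-inclusion rule can be sketched as follows. This is an illustration under stated assumptions, not the lab's pipeline: it assumes eye position is already calibrated in degrees relative to the fixation point, and it counts missing samples (NaN, e.g. blinks) as outside the window. The function names are hypothetical.

```python
import numpy as np

def fixation_fraction(x_deg, y_deg, half_width_deg=1.0):
    """Fraction of eye samples inside a square window centred on fixation.

    A 2 deg x 2 deg window means |x| <= 1 deg and |y| <= 1 deg.
    NaN samples compare as False and are therefore counted as outside.
    """
    x = np.asarray(x_deg, dtype=float)
    y = np.asarray(y_deg, dtype=float)
    inside = (np.abs(x) <= half_width_deg) & (np.abs(y) <= half_width_deg)
    return inside.mean()

def keep_run(x_deg, y_deg, threshold=0.90):
    """Retain a run only if fixation was maintained for more than 90% of it."""
    return fixation_fraction(x_deg, y_deg) > threshold
```

At the 120 Hz sampling rate of the eye tracker, a 705 s localizer run yields 84,600 samples per eye coordinate.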

fMRI design

We used a block design (as in ref. 26). Each block lasted 30 s for M1 and M2 and 20 s for M3 (see Fig. 1). Each experiment used multiple run orders, counterbalanced across runs such that each condition occurred equally often in each serial position. Each run started and ended with a blank fixation block, in which a uniform gray background with the same luminance as the mean of the images shown in the other conditions was presented.
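The text does not say how the run orders were generated. One construction with the stated property (every condition equally often in every serial position) is a cyclic Latin square, sketched below as an assumption; note that it does not balance first-order carry-over effects, which a Williams design would. The helper names and the "fixation" label are hypothetical.

```python
def cyclic_run_orders(conditions):
    """Return len(conditions) block orders in which every condition occupies
    every serial position exactly once (a cyclic Latin square)."""
    n = len(conditions)
    return [[conditions[(r + p) % n] for p in range(n)] for r in range(n)]

def add_fixation(order, fixation="fixation"):
    """Frame a block order with the leading and trailing fixation blocks."""
    return [fixation] + list(order) + [fixation]

categories = ["monkey bodies", "human bodies", "monkey faces", "human faces",
              "mammals", "birds", "monkey objects", "human objects",
              "fruits/vegetables", "sculptures"]
orders = [add_fixation(o) for o in cyclic_run_orders(categories)]
```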

Category localizer

The same stimuli were used in the high-resolution fMRI experiments in M1 and M2 and in the low-resolution fMRI experiments in M3. There were ten classes of achromatic images: monkey and human bodies (excluding the head), monkey and human faces, four-legged mammals, birds, manmade objects (matched either to the monkey or to the human bodies), fruits/vegetables, and body-like sculptures (by the British artist Henry Moore). Each class consisted of 20 images, which had been used in previous studies15,24. Examples of the stimuli are shown in Fig. 1, and the full stimulus set, together with details about the stimuli, can be found in ref. 24.

We made every effort to equate low-level image characteristics, such as mean luminance, mean contrast, and aspect ratio, across the different stimulus classes. The mean aspect ratios of the monkey and human bodies differed because upright human bodies tend to be more elongated than monkey bodies. This was controlled for by using two classes of manmade objects, one matching the aspect ratio of the monkey bodies (monkey objects) and the other matching that of the human bodies (human objects). The images were resized so that the average area per class was matched across all classes except the human objects and human bodies, while still allowing some variation in area within each class (square root of the area ranging from 3.7° to 6.7°). The mean vertical and horizontal extents of the images were 8.3° and 6.7° of visual angle, respectively. The images were embedded in pink-noise backgrounds that had the same mean luminance as the images and filled the entire display (height \(\times\) width: 30° \(\times\) 40° of visual angle). Each image was presented on one of 9 different backgrounds, which varied randomly across stimulus presentations. The stimuli were gamma corrected.

Each category contained 20 images from multiple individuals. Within every block, each image was shown twice, for 750 ms (M1 and M2) or 500 ms (M3) per presentation. Each run lasted 705 s (M1 and M2) or 430 s (M3) and contained 235 (M1 and M2) or 215 (M3) functional volumes. In total, 30 runs were acquired for M1, 55 for M2, and 28 for M3.
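The timing parameters are mutually consistent, as a short calculation shows. This sketch assumes that presentations within a block follow one another without gaps (the text does not state an inter-stimulus interval for the fMRI blocks); it also includes the resting-state runs (658 s at TR = 2850 ms).

```python
def block_duration_s(n_images: int, repeats: int, duration_ms: float) -> float:
    """Block length if each image is shown `repeats` times back to back."""
    return n_images * repeats * duration_ms / 1000.0

def n_volumes(run_s: float, tr_s: float) -> int:
    """Number of functional volumes acquired in a run of length run_s."""
    return round(run_s / tr_s)

m12_block = block_duration_s(20, 2, 750)   # 30 s, the M1/M2 block length
m3_block = block_duration_s(20, 2, 500)    # 20 s, the M3 block length
m12_vols = n_volumes(705, 3.0)             # TR = 3000 ms
m3_vols = n_volumes(430, 2.0)              # TR = 2000 ms
rest_vols = n_volumes(658, 2.85)           # resting state, TR = 2850 ms
```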

Resting-state fMRI

The subjects were required to maintain fixation (within a 2° \(\times\) 2° window) on a small red fixation point on a uniform gray background throughout the run. Each run lasted 658 s and contained 231 functional volumes. In total, 28 and 37 runs were acquired for M1 and M2, respectively.

Electrophysiological recordings

Standard single-unit recordings were performed with epoxylite-insulated tungsten microelectrodes (FHC; in situ impedance between 1.3 and 1.6 MΩ) using techniques described previously59. Briefly, the electrode was lowered into the brain with a Narishige microdrive through a guide tube (stainless steel, or MR-compatible when a position-verification scan was performed after recording) that was fixed in a standard Crist grid positioned within the recording chamber. After amplification and filtering between 540 Hz and 6 kHz, single-unit spikes were isolated online using a custom amplitude- and time-based discriminator.

The recording grid locations were chosen so that the electrode targeted the left MSB body area in M3. Before the recordings started, we acquired a structural MRI in which long glass capillaries filled with MRI-opaque copper sulfate (CuSO4) were inserted into the recording chamber grid (down to the dura) at predetermined positions. The functional images of the monkey (contrast between monkey bodies and monkey objects) were then co-registered with its anatomical MRI using the co-registration toolbox of SPM8 (Wellcome Department of Cognitive Neurology, London, UK), and the registration was verified by visual inspection. Primary grid positions were selected for MSB recordings if the electrode would end in a voxel that was activated significantly more by monkey bodies than by monkey objects and was not activated by monkey faces compared to monkey objects. Only neurons that were body-selective within MSB of M3 were included in the analysis, because this was also the (stringent) criterion for selecting voxels in the high-resolution fMRI experiment in M1 and M2 (focusing on MSB; see above). During the recordings, we verified the recording locations with 10 additional anatomical MRI scans. Four of these scans were performed immediately after recording sessions that targeted the body area, using an MR-compatible (fused silica; Plastics One, Roanoke, VA, US) guide tube with the electrode left in the cortex during the scan. In all other scans, we visualized long glass capillaries filled with copper sulfate that were inserted into the grid at the recorded grid positions. The recording locations along the medio-lateral and antero-posterior dimensions were extrapolated from the trajectories of the imaged capillaries.
The validity of the latter method to verify recording locations is supported by 4 MRI scans in M3 in which the electrode was imaged directly and was indeed shown to be present at the predicted location in the anterior-posterior and medial-lateral dimensions. The ventral-dorsal location of the electrode tip was verified in each recording session using the transitions of white and gray matter and the silence marking the sulcus between the banks of the STS.

Since the entire stimulus set (200 images) was too large to be presented in single-cell recording sessions, neurons were searched for while presenting half of the images (100 images) of the main stimulus set in pseudo-random order. Stimuli were presented for 200 ms each, with an inter-stimulus interval (ISI) of approximately 400 ms, during passive fixation (fixation window 2° \(\times\) 2°). The pink-noise background was present throughout the task but was refreshed at each stimulus onset. Fixation was required from 100 ms before to 200 ms after stimulus onset, and a trial was aborted when the monkey broke fixation during this interval. In the pseudo-randomization procedure, all 100 stimuli were presented randomly interleaved in blocks of 100 unaborted trials. Aborted stimulus presentations were repeated within the same block in a subsequent, randomly chosen trial. ISIs within and between successive blocks were the same. Juice rewards were given at decreasing intervals (from 2000 ms to 1350 ms) as long as the monkey maintained fixation. All neurons were tested using this procedure, and testing was continued when the online peri-stimulus time histograms (PSTHs) showed a noticeable response to at least one of the stimuli.
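The block structure with re-inserted aborted trials can be sketched as below. This is an illustrative reconstruction, not the acquisition software: `fixation_held` is a hypothetical callback standing in for the real-time fixation check, and "a subsequent, randomly chosen trial" is modeled as inserting the aborted stimulus at a uniformly random later position in the remaining queue.

```python
import random

def run_block(stimuli, fixation_held, rng, max_trials=10_000):
    """Present every stimulus once per block of unaborted trials.

    fixation_held(stim) -> bool is a hypothetical callback returning True if
    fixation was kept from 100 ms before to 200 ms after stimulus onset.
    Returns the stimuli in the order in which they were completed.
    """
    queue = list(stimuli)
    rng.shuffle(queue)                    # random order within the block
    completed = []
    n_trials = 0
    while queue:
        n_trials += 1
        if n_trials > max_trials:         # guard against endless aborts
            raise RuntimeError("too many aborted trials in this block")
        stim = queue.pop(0)
        if fixation_held(stim):
            completed.append(stim)
        else:
            # Re-insert anywhere from the next trial to the end of the block.
            queue.insert(rng.randint(0, len(queue)), stim)
    return completed
```

Because aborted stimuli are only moved later within the same block, every block still ends with exactly one completed presentation of each of the 100 stimuli.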

Retinotopic mapping experiment

The same stimuli as in refs. 27,57,60 were used for retinotopic mapping. Eccentricity and polar angle were mapped using phase-encoded annuli and wedges containing both dynamic monkey faces and walking humans. To mitigate phase errors caused by hemodynamic response delays, both expanding and contracting annuli, as well as clockwise and counter-clockwise rotating wedges, were used. The stimuli covered eccentricities from 0.25° to 12.25° (radius). Each run included 4 stimulus cycles of 96 s each. Only one type of stimulus (e.g., a wedge or annulus moving in a specific direction) was displayed during a run. A central fixation point was shown throughout, and the subjects were required to maintain fixation on it. Only runs in which the subjects maintained fixation within a small virtual window (2° \(\times\) 2° around the fixation point) for more than 90% of the time, while keeping their hands in a rest position, were used for further analysis. The data were analyzed exactly as described in ref. 27.
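Why opposite stimulus directions cancel the hemodynamic delay can be illustrated with a generic Fourier-based sketch. This is not the analysis of ref. 27 (the Methods below mention a general linear model); it assumes the delay is shorter than half a stimulus cycle (48 s) and expresses eccentricity as a fraction of the cycle (0 = fovea, 1 = periphery). Function names are hypothetical.

```python
import numpy as np

def latency_phase(ts, n_cycles):
    """Phase (radians, in [0, 2*pi)) at which the response peaks within a
    cycle, from the Fourier component at the stimulus frequency."""
    ts = np.asarray(ts, dtype=float)
    ts = ts - ts.mean()
    t = np.arange(ts.size)
    z = np.sum(ts * np.exp(-2j * np.pi * n_cycles * t / ts.size))
    return (-np.angle(z)) % (2 * np.pi)

def eccentricity_fraction(ts_expanding, ts_contracting, n_cycles=4):
    """Combine expanding and contracting runs to cancel the response delay.

    Expanding:   th_exp = 2*pi*(e + d);  contracting: th_con = 2*pi*(1 - e + d),
    where e is eccentricity and d the delay, both as fractions of a cycle.
    Assuming d < 0.5, the sum gives d and the expanding phase then gives e.
    """
    th_exp = latency_phase(ts_expanding, n_cycles)
    th_con = latency_phase(ts_contracting, n_cycles)
    d = ((th_exp + th_con) % (2 * np.pi)) / (4 * np.pi)
    e = (th_exp / (2 * np.pi) - d) % 1.0
    return e, d
```

With 4 cycles of 96 s and TR = 3 s, each run contains 128 volumes; a delay of about 5 s corresponds to d of roughly 0.05.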

Data analysis

Stringent definition of face and body areas for the high-resolution fMRI experiment (M1 and M2)

Face and body areas were defined in each monkey's native space. Face-selective area ML61 was defined as the conjunction of the following 3 contrasts (each with p < 0.05): monkey faces versus monkey objects, monkey faces versus fruits, and monkey faces versus monkey bodies. Body-selective area MSB24,62,63 was defined analogously using 3 different contrasts (each with p < 0.05): monkey bodies versus monkey objects, monkey bodies versus fruits, and monkey bodies versus monkey faces. Because we aimed for a conservative definition of ML and MSB, we included in these ROIs only voxels showing highly reproducible face or body selectivity. To this end, we applied an inclusive mask that retained only voxels showing significant face or body selectivity for the 3 above-mentioned contrasts in both the odd-day and the even-day data.
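The conjunction and reproducibility criteria reduce to boolean masking. A minimal sketch, assuming one-sided voxelwise p-values are already available for each contrast (rows) on the full data and on the odd-day and even-day splits; how the splits enter the final mask is our reading of "inclusively masked", and the function names are hypothetical.

```python
import numpy as np

def conjunction_mask(p_values, alpha=0.05):
    """p_values: (n_contrasts, n_voxels). A voxel passes only if every
    contrast (e.g. faces > objects, faces > fruits, faces > bodies) has p < alpha."""
    return np.all(np.asarray(p_values) < alpha, axis=0)

def reproducible_roi(p_all, p_odd, p_even, alpha=0.05):
    """Voxels passing the conjunction on all data AND on both day splits."""
    return (conjunction_mask(p_all, alpha)
            & conjunction_mask(p_odd, alpha)
            & conjunction_mask(p_even, alpha))
```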

Mesoscale functional units (MFUs) in face and body areas and determining the optimal number of clusters

To estimate fMRI tuning in each voxel of the face and body areas, the effect sizes of each category versus fixation (in percent signal change) were normalized across the 10 conditions by converting them to z-scores. A dissimilarity matrix was calculated from the pairwise Euclidean distances among all voxels within a face- or body-selective area. We then performed hierarchical cluster analysis on the dissimilarity matrix using Ward's method. The number of clusters was estimated from the maximum difference in the trace of the pooled within-cluster covariance matrix (Trace Cov W)64.
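The clustering step can be sketched with SciPy. This is a minimal illustration, not the authors' code: it assumes a voxels-by-categories array of percent signal change and uses the sample standard deviation (ddof=1) for z-scoring, as MATLAB's `zscore` does. SciPy's Ward linkage on a condensed distance vector is valid here because the distances are Euclidean.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist

def cluster_voxels(psc, n_clusters):
    """psc: (n_voxels, 10) percent signal change of each category vs fixation.

    Returns (labels 1..n_clusters, z-scored profiles, linkage tree).
    """
    psc = np.asarray(psc, dtype=float)
    # z-score each voxel's tuning profile across the 10 conditions
    z = (psc - psc.mean(axis=1, keepdims=True)) / psc.std(axis=1, ddof=1, keepdims=True)
    d = pdist(z, metric="euclidean")      # condensed dissimilarity matrix
    tree = linkage(d, method="ward")      # Ward's minimum-variance criterion
    labels = fcluster(tree, t=n_clusters, criterion="maxclust")
    return labels, z, tree
```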

The Trace Cov W index quantifies the total within-cluster variance across all clusters, with lower values indicating more compact clusters. By construction, this index decreases as the number of clusters increases, because the data points are divided into more groups, yielding tighter clusters. The metric is used to determine the optimal number of clusters by identifying the point beyond which further increases in the number of clusters no longer produce a substantial reduction in within-cluster variance, that is, where the decrease between consecutive cluster counts is most pronounced.
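A sketch of the index and the "largest drop" rule, under an assumption: the pooled within-cluster covariance is taken as the within-cluster scatter matrix divided by n − k, so its trace is the within-cluster sum of squares over n − k. The implementation in ref. 64 may scale it differently, and `choose_k` is a hypothetical helper.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

def trace_cov_w(x, labels):
    """Trace of the pooled within-cluster covariance matrix:
    within-cluster sum of squared deviations divided by (n - k)."""
    x = np.asarray(x, dtype=float)
    ks = np.unique(labels)
    ss = 0.0
    for k in ks:
        members = x[labels == k]
        ss += np.sum((members - members.mean(axis=0)) ** 2)
    return ss / (x.shape[0] - ks.size)

def choose_k(x, tree, k_max=8):
    """Cut the tree at k = 1..k_max and return the k reached by the largest
    drop in Trace Cov W (first difference), plus the index values."""
    values = np.array([trace_cov_w(x, fcluster(tree, t=k, criterion="maxclust"))
                       for k in range(1, k_max + 1)])
    drops = values[:-1] - values[1:]      # drop from k to k + 1
    return int(np.argmax(drops)) + 2, values

# Usage: tree = linkage(z_profiles, method="ward"); k, vals = choose_k(z_profiles, tree)
```

The first-difference rule is sensitive to how far apart the clusters lie, so it is best read together with the permutation and consensus checks described next.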

To further validate the number of clusters, we performed a permutation analysis of the Trace Cov W index and a consensus clustering analysis. For the permutation test, we randomly shuffled the data 10,000 times and calculated the Trace Cov W indices for each permutation. We then compared them with the Trace Cov W index of the original data to determine the cluster count at which the drop in the index (1st difference) was significantly larger than the drops calculated from the randomly permuted data. For consensus clustering, we performed hierarchical clustering on 1000 resampled subsets of the data (75% of the data each time). We then assessed cluster stability and robustness for different values of K by calculating the proportion of iterations in which each pair of data points was clustered together. From the resulting consensus matrices (values ranging from 0, indicating that two data points were never clustered together, to 1, indicating that they were always clustered together), we evaluated the increase in clustering stability with increasing K by examining the change in the area under the cumulative distribution function (CDF) of the consensus values.
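The consensus-clustering step can be sketched as follows, assuming Ward clustering on each 75% subsample and defining co-clustering proportions over the iterations in which both items were drawn (the usual convention, assumed here). For values in [0, 1], the area under the empirical CDF equals 1 minus their mean, which the sketch uses directly; the relative-increase statistic follows the common consensus-clustering "delta area" definition. All names are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

def consensus_matrix(x, k, n_iter=1000, frac=0.75, seed=0):
    """Fraction of resamples (in which both items were drawn) that put
    each pair of items into the same Ward cluster."""
    rng = np.random.default_rng(seed)
    n = x.shape[0]
    m = int(round(frac * n))
    together = np.zeros((n, n))
    sampled = np.zeros((n, n))
    for _ in range(n_iter):
        idx = rng.choice(n, size=m, replace=False)
        labels = fcluster(linkage(x[idx], method="ward"), t=k, criterion="maxclust")
        together[np.ix_(idx, idx)] += labels[:, None] == labels[None, :]
        sampled[np.ix_(idx, idx)] += 1
    with np.errstate(invalid="ignore", divide="ignore"):
        return np.where(sampled > 0, together / sampled, 0.0)

def cdf_area(cm):
    """Area under the empirical CDF of the pairwise consensus values on [0, 1]."""
    vals = cm[np.triu_indices_from(cm, k=1)]
    return 1.0 - vals.mean()

def delta_area(areas):
    """areas: {K: A(K)}. Relative increase in CDF area from one K to the next."""
    ks = sorted(areas)
    out = {ks[0]: areas[ks[0]]}
    for prev, k in zip(ks, ks[1:]):
        out[k] = (areas[k] - areas[prev]) / areas[prev]
    return out
```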

FC analysis based on resting-state fMRI data

We used the procedures described in ref. 26. Preprocessing included slice-timing correction, and the signal was high-pass filtered at 0.0025 Hz and low-pass filtered at 0.05 Hz. Nuisance effects were regressed out using the signals from white-matter and ventricle ROIs, the motion-correction regressors, reward- and eye-movement-related regressors, and their first derivatives. No global mean regression was performed, to avoid discarding underlying neural components65. The representative time course of each MFU was obtained by averaging the signals across its voxels. Functional connectivity was measured as the Pearson correlation between all possible pairs of MFUs, and the correlations were converted to z-scores with Fisher's r-to-z transformation per run66 to improve normality.
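The per-run connectivity computation can be sketched as below, under several assumptions: slice-timing and motion correction are already done, a second-order zero-phase Butterworth band-pass stands in for the unspecified filters, and data and nuisance regressors are filtered identically before ordinary least-squares regression. The filter type, order, and function names are our choices, not the authors'.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

def bandpass(ts, tr_s, low_hz=0.0025, high_hz=0.05, order=2):
    """Zero-phase Butterworth band-pass along the time axis (axis 0)."""
    sos = butter(order, [low_hz, high_hz], btype="bandpass", fs=1.0 / tr_s, output="sos")
    return sosfiltfilt(sos, ts, axis=0)

def regress_out(ts, nuisance):
    """Remove nuisance regressors (plus an intercept) by least squares."""
    X = np.column_stack([np.ones(ts.shape[0]), nuisance])
    beta, *_ = np.linalg.lstsq(X, ts, rcond=None)
    return ts - X @ beta

def mfu_fc(voxel_ts, labels, nuisance, tr_s=2.85):
    """voxel_ts: (n_timepoints, n_voxels); labels: MFU id per voxel;
    nuisance: (n_timepoints, n_regressors), derivatives already included.
    Returns MFU ids and the Fisher z-transformed correlation matrix."""
    clean = regress_out(bandpass(voxel_ts, tr_s), bandpass(nuisance, tr_s))
    ids = np.unique(labels)
    mfu_ts = np.column_stack([clean[:, labels == i].mean(axis=1) for i in ids])
    r = np.corrcoef(mfu_ts, rowvar=False)
    np.fill_diagonal(r, np.nan)           # self-correlation is left undefined
    return ids, np.arctanh(r)             # Fisher r-to-z
```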

Quantitative tests comparing interhemispheric functional connectivity

Because resting-state signal fluctuations tend to be positively correlated between homologous cortical regions in the left and right hemispheres67, even in the absence of direct anatomical connections68, we predicted that the same holds for MFUs of the same type across the two hemispheres. We therefore tested whether FC between MFUs belonging to the same cluster (i.e., same-type MFUs) of a large and well-established face area (ML), but located in opposite hemispheres, is stronger than FC between MFUs (across hemispheres) belonging to different clusters of the same face area (i.e., different-type MFUs). To generalize this result, we performed the same analyses for MFUs of the body area MSB. To test inter-MFU functional connectivity quantitatively, we first calculated FC strengths between same- and different-type MFUs across hemispheres for each resting-state run. We then compared FC strengths between same- and different-type MFUs across all resting-state runs using a paired t-test. Note that (i) two MFUs can differ in size, (ii) smaller units tend to have lower SNR, and (iii) FC strengths can be biased depending on the size of these functional units. To control for this bias, we performed a permutation test (10,000 permutations) and calculated the probability that the observed median FC difference between same- and different-type MFUs was higher than the median obtained when the same voxels were randomly assigned to different MFUs.
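The paired t-test and the size-preserving permutation test can be sketched as follows. This is a simplified illustration, not the authors' code: it pools all voxels of one MFU type per hemisphere into a single time series, assumes MFU types are already matched across hemispheres (same integer = same cluster), and reports a permutation p-value with the common +1 correction. All names are hypothetical.

```python
import numpy as np
from scipy.stats import ttest_rel

def interhemispheric_fc(left_runs, right_runs, left_lab, right_lab):
    """Per-run mean Fisher-z FC for same-type and different-type MFU pairs.

    left_runs / right_runs: lists with one (n_timepoints, n_voxels) array per run.
    Returns an (n_runs, 2) array: column 0 same-type, column 1 different-type.
    """
    types = np.unique(left_lab)
    n = types.size
    off_diag = ~np.eye(n, dtype=bool)
    out = []
    for L, R in zip(left_runs, right_runs):
        lm = np.column_stack([L[:, left_lab == k].mean(axis=1) for k in types])
        rm = np.column_stack([R[:, right_lab == k].mean(axis=1) for k in types])
        z = np.arctanh(np.corrcoef(lm, rm, rowvar=False)[:n, n:])  # left x right
        out.append((np.diag(z).mean(), z[off_diag].mean()))
    return np.array(out)

def compare_same_vs_diff(left_runs, right_runs, left_lab, right_lab,
                         n_perm=10_000, seed=0):
    fc = interhemispheric_fc(left_runs, right_runs, left_lab, right_lab)
    t_stat, p_t = ttest_rel(fc[:, 0], fc[:, 1])        # paired across runs
    observed = np.median(fc[:, 0] - fc[:, 1])
    rng = np.random.default_rng(seed)
    null = np.empty(n_perm)
    for i in range(n_perm):
        # Reassign voxels to MFUs at random while keeping every MFU's size.
        perm = interhemispheric_fc(left_runs, right_runs,
                                   rng.permutation(left_lab),
                                   rng.permutation(right_lab))
        null[i] = np.median(perm[:, 0] - perm[:, 1])
    p_perm = (np.sum(null >= observed) + 1) / (n_perm + 1)
    return t_stat, p_t, observed, p_perm
```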

Comparison between MFUs based on fMRI and functional clusters based on single-cell recordings in body area MSB

To test whether similar sub-areal functional clusters appear at the single-cell level, we re-analyzed single-unit data15 recorded in the fMRI-defined MSB of another animal (M3). We included only neurons that showed body selectivity for the conjunction of the same contrasts as used in the high-resolution fMRI study (monkey bodies versus monkey objects, monkey bodies versus fruits, and monkey bodies versus monkey faces, each with p < 0.05). Each cell's average (net) spike responses to the 100 stimuli were normalized by converting them to z-scores, and a dissimilarity matrix was calculated from the pairwise Euclidean distances across all cells. We then performed hierarchical cluster analysis on the dissimilarity matrix using Ward's method. Because the number of data points (98 cells) is smaller than the number of features (100 stimulus images), the Trace Cov W index cannot be used to determine the number of clusters. We therefore chose the number of clusters based on the Trace Cov W index calculated from the single-cell responses averaged at the category level. Finally, nonparametric Spearman correlation coefficients were calculated between the average response profiles of each pair of fMRI and single-cell clusters. Since the single-cell and fMRI data were recorded from different subjects, we pooled the data from the 2 fMRI subjects to reduce individual biases: we first calculated the average profile of each cluster in each individual and then averaged the corresponding profiles across the two individuals. Spearman correlations were used to construct the distance matrix, and a hierarchical cluster analysis with complete linkage was conducted to objectively match corresponding clusters obtained with fMRI and single-unit recordings.
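The cluster-matching step might look as follows. The sketch assumes category-level (10-element) mean profiles for each fMRI and single-unit cluster, fMRI clusters already aligned across the two monkeys, and 1 − Spearman ρ as the distance; these choices and the function names are assumptions.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform
from scipy.stats import spearmanr

def pool_subjects(*per_subject_profiles):
    """Average (n_clusters, n_categories) profiles across subjects;
    clusters are assumed to be aligned across subjects."""
    return np.mean(np.stack(per_subject_profiles), axis=0)

def match_clusters(fmri_profiles, unit_profiles):
    """Return the Spearman rho matrix (fMRI x single-unit clusters) and the
    complete-linkage group of each fMRI and each single-unit cluster."""
    profiles = np.vstack([fmri_profiles, unit_profiles])
    rho, _ = spearmanr(profiles, axis=1)             # correlations between rows
    dist = np.clip(1.0 - rho, 0.0, None)             # Spearman distance
    np.fill_diagonal(dist, 0.0)
    tree = linkage(squareform(dist, checks=False), method="complete")
    n_f = fmri_profiles.shape[0]
    groups = fcluster(tree, t=n_f, criterion="maxclust")
    return rho[:n_f, n_f:], groups[:n_f], groups[n_f:]
```

Clusters that land in the same complete-linkage group are treated as corresponding.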

Evaluating the impact of eccentricity biases or face/body-selectivity on MFU clustering

To determine whether MFUs could be distinguished based on eccentricity differences, we computed correlations between Euclidean distance matrices derived from eccentricity maps and those derived from full object response profiles. This analysis quantified how much of the variance among the 3 clusters could be attributed to eccentricity. Eccentricity was estimated using a general linear model applied to the retinotopic mapping data. Phase values for the annuli served as proxies, with the fovea assigned a phase of zero and the most peripheral eccentricity a phase of 2π. These phase values were z-normalized across voxels before computing Euclidean distances. Only voxels exhibiting significant eccentricity biases (\(p < 10^{-3}\)) were included in the analysis. Object response profiles were derived as described above for the hierarchical cluster analysis, with effect sizes calculated as the percent signal change of each category relative to fixation. These values were z-normalized across conditions before computing Euclidean distance matrices. To assess the reliability of these distance measurements, we performed test-retest analyses: data acquired on different days from each experiment were split evenly into two independent datasets, and the consistency of the distance measurements from both eccentricity and object response profiles was evaluated using test-retest correlation analysis and the Spearman-Brown formula.
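A sketch of the distance-matrix comparison and the reliability estimate. Assumptions: Pearson correlation between the two sets of pairwise distances, r² read as the share of explained variance, and sample-SD z-scoring; the text specifies neither the correlation type nor how variance was computed. The Spearman-Brown step-up for a split-half correlation r is 2r/(1 + r). Function names are hypothetical.

```python
import numpy as np
from scipy.spatial.distance import pdist

def zscore(a, axis):
    """z-score along `axis` using the sample standard deviation (ddof=1)."""
    a = np.asarray(a, dtype=float)
    return (a - a.mean(axis=axis, keepdims=True)) / a.std(axis=axis, ddof=1, keepdims=True)

def rdm_correlation(ecc_phase, psc):
    """ecc_phase: (n_voxels,) annulus phase, 0 = fovea ... 2*pi = periphery.
    psc: (n_voxels, n_categories) percent signal change versus fixation.
    Returns Pearson r between the two sets of pairwise Euclidean distances
    and r**2 as the share of explained variance."""
    d_ecc = pdist(zscore(ecc_phase, axis=0)[:, None])   # z across voxels
    d_obj = pdist(zscore(psc, axis=1))                  # z across conditions
    r = np.corrcoef(d_ecc, d_obj)[0, 1]
    return r, r ** 2

def split_half_reliability(dist_half1, dist_half2):
    """Test-retest r between halves and its Spearman-Brown step-up."""
    r = np.corrcoef(dist_half1, dist_half2)[0, 1]
    return r, 2 * r / (1 + r)
```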

We conducted a similar analysis to quantify the contribution of face selectivity (for ML) and body selectivity (for MSB) to the variance among the 3 clusters. Face selectivity was estimated using t-values from an “all faces versus all bodies” contrast, and body selectivity from the reverse contrast. These t-values were z-normalized across voxels before computing Euclidean distances. To evaluate the variance explained by face and body selectivity, we computed correlations between the Euclidean distances derived from these selectivity measures and those derived from the full object response profiles. We also performed split-half analyses, following the same procedure as for eccentricity, to assess the reliability of these distance measurements.

Quantifying the sharpness of MFU boundaries

To quantify whether object response profiles change gradually or sharply across MFU borders, we compared Euclidean distances between response profiles of neighboring voxels. For each voxel located at an MFU border, we calculated the Euclidean distance between its response profile and those of its neighboring voxels within the same MFU (“within” condition), as well as those of its neighboring voxels belonging to a different MFU (“across” condition). The two sets of distances were compared using a Wilcoxon rank-sum test.
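The border comparison can be sketched as below. The neighborhood definition (for example, 26-connected voxels in the volume) is not stated in the text and is passed in as an assumption, and the function name is hypothetical. Distances from one border voxel are not independent, which the rank-sum test ignores.

```python
import numpy as np
from scipy.stats import ranksums

def border_distances(z_profiles, labels, neighbors):
    """z_profiles: (n_voxels, n_conditions) z-scored response profiles;
    labels: MFU label per voxel; neighbors: list of neighbour index lists.
    Returns within- and across-MFU Euclidean distances for border voxels."""
    within, across = [], []
    for v, nb in enumerate(neighbors):
        nb = np.asarray(nb, dtype=int)
        if nb.size == 0:
            continue
        other = labels[nb] != labels[v]
        if not other.any():              # not a border voxel
            continue
        d = np.linalg.norm(z_profiles[nb] - z_profiles[v], axis=1)
        within.extend(d[~other])
        across.extend(d[other])
    return np.array(within), np.array(across)

# Usage: w, a = border_distances(z, labels, neighbors); stat, p = ranksums(a, w)
```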

Quantifying the spatial fall-off of response similarity

Since MFUs were clustered based on Euclidean distances, we used 1 − Euclidean distance as an index of response similarity between voxels. Pairwise Euclidean distances between all possible voxel pairs were calculated separately for ML, MSB, and object-responsive regions in each hemisphere and then pooled across these regions after z-normalization within each region and hemisphere. Cortical distance between voxel pairs was measured as the center-to-center geometric distance. To quantify the slope of the spatial fall-off, we fitted a rational function with three free parameters, a/(1 + bx) + c, to the raw data using the lsqcurvefit function in MATLAB. A likelihood ratio F-test was conducted to test whether the slope parameter b differed significantly from 1.
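A Python sketch of the fit and the test on b, using SciPy's `curve_fit` as a stand-in for MATLAB's `lsqcurvefit` (both perform nonlinear least squares). The test is implemented as an extra-sum-of-squares F-test between the full model and a reduced model with b fixed at 1; this is our reading of the "likelihood ratio F-test", and the function names are hypothetical.

```python
import numpy as np
from scipy.optimize import curve_fit
from scipy.stats import f as f_dist

def rational(x, a, b, c):
    """Fall-off model: similarity = a / (1 + b * x) + c, x in mm."""
    return a / (1.0 + b * x) + c

def f_test_b(dist_mm, similarity, b0=1.0):
    """Compare the 3-parameter fit with a reduced fit where b = b0.

    Returns the full-model parameters, the F statistic (1 numerator df,
    n - 3 denominator df), and its p-value.
    """
    x = np.asarray(dist_mm, dtype=float)
    y = np.asarray(similarity, dtype=float)
    (a, b, c), _ = curve_fit(rational, x, y, p0=(1.0, b0, 0.0))
    rss_full = np.sum((y - rational(x, a, b, c)) ** 2)

    def reduced(xx, a_, c_):
        return rational(xx, a_, b0, c_)

    (a_r, c_r), _ = curve_fit(reduced, x, y, p0=(1.0, 0.0))
    rss_red = np.sum((y - reduced(x, a_r, c_r)) ** 2)
    dof = y.size - 3
    F = (rss_red - rss_full) / (rss_full / dof)
    return (a, b, c), F, f_dist.sf(F, 1, dof)
```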

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.