Introduction

Orientation processing has been a cornerstone of vision science since the discovery of orientation-selective neurons in the primary visual cortex (V1) by Hubel and Wiesel1,2. Building on this foundational work, increasingly sophisticated models have been developed to elucidate the mechanisms underlying early visual perception. Subsequent multiscale and spatial frequency models3,4,5,6,7,8 established orientation and scale as core representational dimensions, providing the basis for modern filter-based accounts of visual processing. These models, often implemented using Gabor filters, have been widely used as analogues of V1 receptive fields, representing orientation and spatial frequency information across scales9,10,11.

Over the years, various studies of orientation processing have used stimuli defined by either sharp edges (e.g., bars, contours) or stimuli generated using oriented filters (e.g., Gabor patches). These two types of stimuli have often been treated as equivalent, with research on ensemble perception12,13,14, visual attention15,16 and neural decoding17,18 using them interchangeably. However, edge-based and filter-based stimuli capture fundamentally different aspects of orientation information—discrete object boundaries versus distributed local gradients—raising the question of whether the visual system processes them in the same way.

In both vision science and computer vision, filter-based methods, such as the Steerable Pyramid19, have been widely used to describe orientation structure in real-world images and to predict neural responses20,21,22,23,24,25. These models compute orientation energy across spatial frequencies but combine multiple sources of information—including sharp edges, gradual luminance gradients, and fine surface patterns—without distinguishing them. In contrast, contour-based representations isolate extended boundaries that define object shape, providing information crucial for recognition and categorization of objects and scenes26,27,28,29,30,31. This distinction is theoretically important: whereas filters integrate local gradients across scales, contour models emphasize global geometric structure. While information from high spatial frequencies may capture sharp edges in the Steerable Pyramid, its standard implementation assigns equal weights to all orientations and spatial frequencies, producing a mixed representation of shape and surface textural features (small-scale intensity variations). Whether such filter-based representations adequately capture the orientation information that humans rely on in perception and that is reflected in neural selectivity remains an open question. Real-world images contain a mixture of textural and structural cues, but contour and boundary information often carry more perceptual weight. This raises the question of which form of orientation the human visual system prioritizes when processing complex natural scenes where multiple orientation cues are intertwined. Resolving this ambiguity is crucial for understanding how distinct orientation signals contribute to visual perception and neural representation.

To address these questions, this study investigates how humans perceive orientation in complex scenes and how different computational methods for extracting orientation impact our understanding of neural representations. In the current study, we adapted the steerable pyramid filter used in previous work (e.g., Roth et al.24,25), applying equal weighting across spatial frequency levels when combining orientation responses. This implementation reflects the conventional use of the steerable pyramid in recent computational neuroimaging studies, enabling direct comparison with prior work. However, this equal weighting does not reflect the spatial frequency-dependent tuning observed in human vision, where sensitivity typically peaks at mid-range frequencies and declines at both lower and higher ends of the spectrum5. Thus, our comparison was designed to assess whether the standard filter-based representation captures orientation information relevant to human perception as effectively as a contour-based model.

Study 1 examines how these two computational measures of orientation correspond to human judgments of average orientation in natural images. Study 2 builds on these findings by evaluating the impact of these types of orientation information on neural maps of orientation selectivity in the visual cortex. Together, these studies aim to uncover the nature of human visual processing of orientation by bridging the gap between technical methodologies and perceptual and neural insights.

Study 1. Human judgment of orientation in natural images

The human visual system efficiently summarizes visual information, as shown by ensemble studies demonstrating that people can quickly and accurately report summary statistics, such as the average orientation of a set of elements12,13,14. However, orientation averaging is not perfectly efficient. Solomon et al.32 showed that it relies on a combination of serial and parallel integration processes. This underscores the importance of testing how orientation judgments operate in complex, real-world scenes.

Determining orientation in real-world scenes is inherently complex because such scenes contain both fine-scale structures and extended contours that define object boundaries33. This raises the question of whether the visual system prioritizes certain types of orientation information over others when integrating signals from complex visual input. Prior work by Takebayashi and Saiki34, using simplified Gabor arrays, demonstrated that ensemble perception of orientations varies with spatial frequency, but it remains unclear whether these findings generalize to natural scenes where multiple cues interact.

To address this question, we compared two computational methods for extracting orientation: the Steerable Pyramid filter and the Contour-Based method (Fig. 1a; see detailed implementation in Methods, Orientation Analysis). The Steerable Pyramid filter captures multi-scale orientation information across spatial frequencies. We utilized eight orientations and seven spatial frequency levels, computing the local orientation energy as the squared magnitude of the complex filter response. The orientation energy was first averaged across spatial frequency levels with equal weighting to produce a per-orientation energy map. The mean orientation of each image region was then computed as the circular mean of orientation angles, weighted by the corresponding energy across orientation channels. In contrast, the Contour-Based method derives orientation geometrically from the tangent direction of contours extracted from edge maps, emphasizing shape-defining edges and boundaries. Participants judged the average orientation of image patches selected to either maximize or minimize the difference in orientation values computed by these methods (Fig. 1b). In Study 1a, participants indicated their responses by adjusting the orientation of a red bar overlaid on the image patch; to ensure that responses were not biased by the probe format, Study 1b added a condition in which participants rotated a grating probe with a lower spatial frequency. Comparing human responses to these computed orientations allowed us to evaluate which method aligns more closely with human perception.

Fig. 1
figure 1

(a) Example stimuli. Photograph: original image from the Natural Scenes Dataset35. Steerable Pyramid: Energy responses from eight orientations averaged across seven spatial frequency levels, mapped onto the photograph. Contour: Orientation map from the line drawing. The color represents orientation, where 0° corresponds to vertical and angles increase clockwise. Max Diff: Image patch from the original photograph with the largest difference in mean orientation computed using the Steerable Pyramid filter versus the Contour-based method. The blue bar indicates the mean orientation from the Steerable Pyramid filter; the pink bar indicates the mean contour orientation. Min Diff: Image patch where the mean orientation is the same for the two methods, indicated in yellow. (b) An example experimental sequence. Study 1a: When the red bar appeared above the image, participants were instructed to move their mouse to rotate the bar and click to confirm. Study 1b: Either a red bar or a grating appeared next to the image for response.

Results

Orientation response errors were determined by measuring the absolute difference between participants’ responses and the average orientation values derived from either the Steerable Pyramid filter or the Contour-based method.

Study 1a

We fit linear mixed-effects models using the lme4 package36 with the lmerTest extension37 to obtain p-values. Following recommendations by Barr et al.38, we tested models with varying random-effects structures. A maximal model with random intercepts and random slopes for both Method and Image Condition failed to converge, and a model with random intercepts and a random slope for Image Condition alone also failed to converge; therefore, we retained a model with random intercepts and a random slope for Method. Degrees of freedom for fixed effects were estimated using the Satterthwaite approximation. Effect coding (−1, +1) was applied to all predictors to simplify interpretation. Effect coding is a contrast-coding scheme for categorical variables in which the categories are coded as −1 and +1 rather than 0 and 1 (dummy coding). With effect coding, main effects can be interpreted as average differences across conditions rather than as comparisons to an arbitrary reference category, and interactions indicate how those differences vary across factors.
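
The analysis itself was run in R with lme4/lmerTest; as a rough, illustrative analogue only, the sketch below sets up the same model structure in Python with statsmodels (the file name and column names are hypothetical, and statsmodels does not apply the Satterthwaite approximation used here).

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical long-format trial data: one row per trial per method comparison
df = pd.read_csv("study1a_trials.csv")                                 # assumed file/columns
df["method_c"] = df["method"].map({"filter": 1, "contour": -1})        # effect coding (-1, +1)
df["image_c"] = df["image_condition"].map({"max_diff": 1, "min_diff": -1})

# Random intercepts plus a by-participant random slope for Method, mirroring the
# retained lme4 model: error ~ Method * ImageCondition + (Method | participant)
model = smf.mixedlm("error ~ method_c * image_c", data=df,
                    groups=df["participant"], re_formula="~method_c")
print(model.fit().summary())
```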

We found significant main effects of Image Condition (β = 5.85, SE = 0.21, t(26,250) = 28.44, p < 0.001) and Method (β = 3.97, SE = 0.21, t(27.9) = 13.09, p < 0.001) (Fig. 2a). Here, β refers to the fixed-effect regression coefficients from the linear mixed model. Response errors were generally larger in the maximum-difference condition (Mean (M) = 45.00°, SE = 0.30°) compared to the minimum-difference condition (M = 33.32°, SE = 0.29°). Overall, orientation judgments were more closely aligned with the predictions from the Contour model (M = 35.14°, SE = 0.29°) than with those from the filter-based model (M = 43.17°, SE = 0.31°). Importantly, there was also a significant interaction between Method and Image Condition (β = 4.01, SE = 0.21, t(26,250) = 19.52, p < 0.001), suggesting that the difference between the filter and Contour methods varied depending on the image condition.

Post-hoc comparisons further clarified this interaction. In the maximum-difference condition, response errors were significantly smaller relative to the Contour method (M = 37.0°, SE = 0.94°) than relative to the filter method (M = 53.0°, SE = 0.94°), estimate = 15.97, SE = 0.73, p < 0.001. Here, “estimate” refers to the contrast estimate (mean difference) between the two conditions derived from the mixed model. However, no significant difference was found between these methods in the minimum-difference condition, estimate = -0.08, SE = 0.73, p = 0.91.

A Kruskal–Wallis test revealed no significant differences in confidence ratings across image conditions, with maximum-difference (M = 3.18, SD = 1.37) and minimum-difference (M = 3.18, SD = 1.37), χ2(1) = 0.03, p = 0.853. Additionally, the magnitude of response errors did not vary with confidence level. When categorizing confidence ratings into low (1,2) and high (4,5), we initially attempted a linear mixed-effects model including by-participant random slopes for both Method and Confidence Category; however, this model failed to converge. We therefore selected a model with a random slope for Confidence Category only, as it converged successfully and had slightly lower AIC/BIC compared to the alternative model with a random slope for Method. In this model, there was no significant main effect of Confidence Category (β = 1.00, SE = 1.17, t(32.72) = 36.14, p = 0.281), nor a significant interaction between Method and Confidence Category (β = 1.63, SE = 0.99, t(19,616.42) = 1.66, p = 0.098). These results suggest that participants’ confidence judgments neither predicted their actual orientation judgments nor were influenced by the experimental conditions.

Study 1b

We analyzed the data in the same way as in Study 1a, but added Response Type (bar vs. grating) as an additional fixed factor (Fig. 2b). For the random-effects structure, we used the same random slope for Method as in Study 1a, since models with additional random slopes failed to converge, and this model also had the lowest AIC/BIC. We found significant main effects of Image Condition (β = 6.26, SE = 0.13, t(59,180) = 47.00, p < 0.001) and Method (β = 3.61, SE = 0.25, t(46.25) = 14.54, p < 0.001). Again, response errors were larger in the maximum-difference condition (M = 45.00°, SE = 0.20°) compared to the minimum-difference condition (M = 32.47°, SE = 0.19°). Furthermore, orientation judgments were more closely aligned with the predictions from the Contour model (M = 35.10°, SE = 0.19°) than with those from the filter-based model (M = 42.37°, SE = 0.20°). However, there was no main effect of Response Type (bar: M = 38.80°, SE = 0.20°; grating: M = 38.67°, SE = 0.20°). We again found a significant interaction between Method and Image Condition (β = 3.63, SE = 0.13, p < 0.001). However, there were no significant interactions with Response Type, suggesting that participants’ responses did not differ based on the type of response stimuli.

The mean confidence rating was 3.29 (SD = 1.20) across participants. A Kruskal–Wallis test revealed no significant differences in confidence ratings across conditions, including Image Condition (χ2(1) = 0.67, p = 0.412) and Response Type (χ2(1) = 0.00, p = 0.983). After categorizing confidence ratings into low (1,2) and high (4,5), a linear mixed model with Method (Filter vs. Contour), Confidence Category (low vs. high), Response Type (bar vs. grating), and by-participant random intercepts and random slopes for Method was conducted. This random-effects structure was chosen to match Study 1a; models including additional random slopes for other variables failed to converge, and this model also had the lowest AIC/BIC. Notably, there was no main effect of Confidence Category (β = 1.29, SE = 0.85, t(162.56) = 1.51, p = 0.133). However, there was an interaction between Method and Confidence Category (β = 3.65, SE = 0.96, t(42,200.63) = 3.80, p < 0.001). The difference in response error between high- and low-confidence trials was larger in the Contour condition (estimate = -4.55, SE = 0.70, p < 0.0001) than in the filter condition (estimate = -1.86, SE = 0.70, p = 0.008), suggesting that participants responded more accurately when they were confident, especially in the Contour condition.

The findings from Study 1a demonstrate that participants’ orientation judgments aligned more closely with the Contour-based method when the orientation values computed by the two methods diverged. The results from Study 1b further confirmed that participants prioritized contour information when summarizing orientation, regardless of whether they responded using a bar or a grating, demonstrating that the preference for Contour-based orientation is not merely a consequence of the response format.

Interestingly, participants showed overall higher errors in the maximum-difference condition than in the minimum-difference condition, suggesting that conflicting orientation signals within an image may have incurred perceptual costs. This pattern implies that multiple sources of orientation information contribute to perception, but some are weighted more heavily or provide more reliable cues than others.

Together, these findings suggest that the human visual system does not treat all orientation signals equally but instead prioritizes those that convey more stable and geometrically informative structure.

Study 2. Neural representation of orientation selectivity

Building on Study 1, Study 2 examined whether the perceptual emphasis on contour information is reflected in the visual cortex’s orientation selectivity. Using the framework introduced by Roth et al.24, who modeled orientation tuning in 7 T fMRI data from the Natural Scenes Dataset, we tested whether the type of extracted orientation information modulates neural responses in visual cortex.

Roth et al.24’s steerable-pyramid model revealed coarse-scale orientation maps and a radial bias in V1 when participants viewed real-world scenes. However, their model treated orientation as broadband energy equally weighted across spatial frequencies, without distinguishing whether signals originated from texture at multiple scales or from object contours. To address this limitation, we compared three image-computable models that differ in the type of orientation information they represent (Fig. 3; see Supplementary Figs. S1–S8 for examples of orientation responses from each model).

Fig. 2
figure 2

Study 1a: Response error by Image Condition and Method. Both main effects and their interaction were significant. Study 1b: Response error by Image Condition, Method and Response Type. Only main effects of Image Condition and Method and their interaction were significant. Note that the absolute value of the response error was used for the analyses.

The Photo-Steerable Pyramid Model replicated Roth et al.’s24 method. Orientation energy was extracted from grayscale real-world photographs across eight orientations (0°–157.5° in 22.5° increments) and seven spatial frequencies. For each orientation and frequency, responses were computed as the squared magnitude of quadrature-pair filter outputs to yield orientation energy maps.

The Line Drawing-Steerable Pyramid Model used the same filter bank on line drawings derived from the photographs, isolating boundary and shape information. This model allowed a direct test of whether differences arise from the filtering method or from the type of orientation information represented.

The Contour-Based Model derived orientation geometrically from the tangent direction of contour line segments extracted from line drawings. Each contour pixel was assigned an orientation value corresponding to its local tangent direction, producing a map that captures the distribution of contour orientations in the image.

To link these models to neural responses, we applied population receptive field (pRF) sampling: for each voxel, model outputs were sampled within the voxel’s pRF to estimate its orientation response. These model-derived responses were then entered into a multiple regression analysis to predict voxel-wise fMRI signals. If the nature of the extracted orientation information significantly shapes neural coding, then discrepancies would emerge between our results and those of Roth et al.24. In particular, if contour information is critical for orientation perception, models incorporating contour-derived features should better explain neural representations in visual cortex.

Results

Distribution of orientations

Orientation distributions in the images differed between computation methods (Fig. 4). For the Photo- and Line Drawing-Steerable Pyramid models, orientation energy was summed across different spatial frequencies. All methods showed more horizontal and vertical orientations than oblique ones, but the Contour-Based Model accentuated this cardinal dominance. Despite sharing the same line-drawing input as the Contour-Based Model, the Line Drawing-Steerable Pyramid produced an orientation distribution more similar to the Photo-Steerable Pyramid model, indicating that filtering, not input alone, influences the resulting orientation statistics.

Fig. 3
figure 3

Analysis pipeline for each condition. Photo steerable pyramid: Pipeline for the original model from Roth et al.24. Line drawing steerable pyramid: Pipeline for the Line Drawing – Pyramid Filter condition. Instead of photographs of real-world scenes, line drawings are used as inputs to the model. The line drawings were computer-generated using the MLV toolbox. Contour: Pipeline for the Contour Orientation condition. The same line drawing images were vectorized, and the orientations of those contours were computed. The model and the figures are adapted from Roth et al.24. Only four orientations are shown here for visualization purposes. The images presented here are solely for illustration purposes and were not part of the study’s dataset.

Fig.4
figure 4

Distribution of orientations in images. The x-axis depicts the eight different orientations, and the y-axis depicts the proportion of each orientation in the images.

Model fit

To evaluate how these models relate to neural responses, we analyzed high-resolution fMRI data from the Natural Scenes Dataset (NSD)35, focusing on early visual areas (V1–hV4). For each voxel, model outputs were sampled within its population receptive field (pRF) and entered into a multiple-regression analysis predicting voxel-wise responses. For R2, we cross-validated by splitting the data into halves and measuring the variance explained by each model on out-of-sample data. Cross-validated R2 values were averaged across participants and regions (Fig. 5a).

The Line Drawing-Steerable Pyramid Model achieved the highest mean R2, followed by the Contour-Based Model, with the Photo-Steerable Pyramid Model showing the lowest fit (Fig. 5a). A Kruskal–Wallis test performed on ROI-averaged R2 values confirmed significant differences among models, χ2(2) = 3376.2, p < 2.2e-16. As illustrated in Fig. 5a, this pattern was consistent across individual visual areas (V1–hV4), with all pairwise differences significant (all p < 1e–7) except between the Contour-Based and Line Drawing–Steerable Pyramid Models in hV4 (p = 0.032). We validated the goodness of fit of the models using both the AIC (Akaike Information Criterion) and BIC (Bayesian Information Criterion) (Table 1), which showed the same trend as R2.

Table 1 Summary of model fit indices. For all models, there were eight orientations and an intercept as predictors. All indices were averaged across participants and ROIs.
Fig.5
figure 5

(a) Mean cross-validated R2 for each model, averaged across participants and shown separately for each visual area (V1–hV4) and across all ROIs combined. Error bars indicate ± 1 SEM. All pairwise differences were significant except between the Contour-Based and Line Drawing–Steerable Pyramid Models in hV4. (b) Mean R2 by population receptive field eccentricity. Voxel eccentricities are binned up to 4.2° of visual angle, which was the extent of the image from the central fixation mark. (c) Left: Visual ROIs on an inflated surface map in fsaverage space. Right: R2 difference surface map in fsaverage space. R2 from the Photo-Steerable Pyramid Model was subtracted from that of the Contour Model; positive values indicate larger R2 values for the Contour Model than for the Photo-Steerable Pyramid Model.

Model performance also varied with pRF eccentricity (Fig. 5b; see Supplementary Fig. S9 for distribution of R2 values between the models, pRF R2, and pRF eccentricity). Both line drawing-based models outperformed the Photo-Steerable Pyramid Model across the visual field, including at higher eccentricities. This result suggests that contour structure contributes strongly to orientation selectivity beyond the fovea.

A surface map of voxel-level R2 difference (Contour – Photo-Steerable Pyramid; Fig. 5c) further shows that the Contour-Based Model explains more variance across most cortical surface locations, with exceptions concentrated in V1. Together, these analyses converge on the same conclusion: emphasizing contours yields better fits than a conventional filter-based model, both globally and across the visual field.

Mapping orientation preference

All models revealed coarse-scale patterns of orientation selectivity in V1–hV4 (Fig. 6a). The Line Drawing–Steerable Pyramid and Contour-Based models yielded highly similar maps and, relative to the Photo–Steerable Pyramid model, produced more vertices preferring near-vertical orientations.

Fig.6
figure 6

(a) Preferred orientation for voxels in visual cortex on the inflated surface in fsaverage space for each model. Hue encodes the preferred orientation. Unthresholded orientation maps include all vertices in V1–hV4. Thresholded maps show vertices with R2 > 0.014 (median R2 of the Photo-Steerable Pyramid model). (b) The orientation preferences of V1 are mapped onto visual space; each line is centered at the voxel’s pRF center, with length/width scaled by R2. The hue and orientation of the line denote the voxel’s preferred orientation. A solid square marks the extent of the image stimuli (± 4.2°). (c) A schematic illustrating three idealized orientation preference maps plotted in visual space (top panel). The deviations of preferred orientation from the ideal vertical (dotted line), cardinal (dashed line), and radial (solid line) orientation maps are shown as a function of pRF eccentricity (middle panel) and pRF angle (bottom panel). Note: Eccentricities are plotted out to 10° to match the convention used by Roth et al.24. Estimates beyond the stimulus extent (~ ± 4°) reflect model-based extrapolation.

To examine V1 in detail, we plotted orientation preferences of V1 voxels in visual space (Fig. 6b). We quantified the similarity of the orientation maps to the ideal vertical, cardinal, and radial maps by computing the deviation of each voxel’s orientation preference from the preference predicted by each ideal map (Fig. 6c; eccentricities are plotted up to 10° for comparability with Roth et al.24, although the stimuli extended to ± 4°; estimates beyond the stimulus extent are based on model extrapolation). The Photo-Steerable Pyramid Model reproduced the radial bias – the tendency for neurons in V1 to prefer orientations aligned with the radial axis extending from the center of the visual field – confirming the results of Roth et al.24 and Sasaki et al.39. By analyzing the residuals of a constrained model that did not consider orientation, Roth et al.24 confirmed that the radial bias does indeed arise from the orientation present in the image rather than from potential artifacts of vignetting.
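
As a minimal sketch of this comparison, the snippet below computes a voxel’s deviation from an idealized radial map given its pRF center; the mapping from screen coordinates to the paper’s orientation convention (0° = vertical, increasing clockwise) and the definition of the cardinal map are assumptions made for illustration only.

```python
import numpy as np

def angular_deviation(a_deg, b_deg):
    """Minimal difference between two orientations in the 0-180 deg range (max 90)."""
    d = np.abs(a_deg - b_deg) % 180.0
    return np.minimum(d, 180.0 - d)

def ideal_radial_orientation(x_deg, y_deg):
    """Orientation of the line joining fixation and the pRF centre, expressed in an
    assumed convention where 0 deg = vertical and angles increase clockwise."""
    return (90.0 - np.rad2deg(np.arctan2(y_deg, x_deg))) % 180.0

# Example: a voxel with pRF centre at (2.0, 1.0) deg and preferred orientation 63 deg
radial_pred = ideal_radial_orientation(2.0, 1.0)        # ~63.4 deg
vertical_pred = 0.0                                     # ideal vertical map
cardinal_pred = 0.0 if angular_deviation(radial_pred, 0.0) < 45 else 90.0  # assumed definition
print(angular_deviation(63.0, radial_pred), angular_deviation(63.0, vertical_pred))
```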

In contrast, the Line Drawing-Steerable Pyramid and Contour Models demonstrated a predominant vertical bias, particularly at lower eccentricities (< 5°). Similarly, Freeman et al.40 found that most V1 voxels exhibited near-vertical orientation preferences at low eccentricities (< 5°) and a radial bias at high eccentricities (> 5°). This divergence from studies that observed radial bias across all eccentricities24,39 likely reflects the greater weighting of extended edges in the line-drawing and contour representations.

To better understand orientation tuning across these three analyses, we performed a control analysis using residuals, similar to Roth et al.24 (Fig. 7). We asked how the orientation preference in V1 would look if we fit the Photo-Steerable Pyramid model to the data after the contour information had been regressed out. To this end, we fitted the Photo-Steerable Pyramid model to the residuals of the Line Drawing-Steerable Pyramid model. We found the same radial bias as with the Photo-Steerable Pyramid model, confirming that the radial bias arises from orientation information other than contours, including gradual luminance gradients and fine surface patterns.

Fig.7
figure 7

(a) Schematic Venn diagram illustrating the control analyses. For the first analysis (left), the residuals from the Line Drawing-Steerable Pyramid Model were fitted with the Photo-Steerable Pyramid Model. For the second analysis (right), the residuals of the Contour Model were fitted with the Line Drawing-Steerable Pyramid Model. (b) Orientation maps of V1 in visual space from the two analyses, in the same format as Fig. 6b.

To confirm that the vertical bias truly arises from contour information and not from artifacts of the analysis method, we performed a similar residual analysis for the two line drawing-based models (Fig. 7). Specifically, we analyzed the residuals of the Contour model with the Line Drawing-Steerable Pyramid model. The resulting orientation map showed no discernible pattern, confirming that both models capture the orientation tuning attributable to contour information.

These differences suggest that emphasizing structural boundaries alters the apparent organization of orientation selectivity in early visual cortex. The reduced radial bias and enhanced vertical preference observed in the contour-emphasizing models imply that boundary information engages distinct orientation encoding mechanisms in V1 compared to models that integrate richer, distributed gradient energy, providing one possible explanation for why previous studies have reported conflicting patterns of orientation selectivity. The finding that the Line Drawing–Steerable Pyramid model best explained the fMRI data supports this view: although it remains a filter-based implementation, it operates on stimuli in which contours are explicitly extracted, underscoring that the critical factor is the content of orientation information, not the computational tool itself. How orientation preferences are mapped in the brain remains an open question, and developing models that more accurately capture how humans perceive orientation in complex scenes—and which types of information are prioritized—will be essential for obtaining a clearer and more precise understanding of orientation selectivity.

While a nonlinear or segmentation-aware variant of the steerable pyramid could, in principle, approximate contour extraction by emphasizing strong, extended edges, this would depart from how the model is conventionally implemented (e.g., Roth et al.24,25). Our results therefore highlight that the standard linear, equal-weight version underrepresents boundary cues, treating surface and edge information equivalently. Models that incorporate grouping or contour-integration mechanisms may better capture how human vision encodes oriented structure.

Importantly, these discrepancies cannot simply be explained by differences in orientation distribution in the images (Fig. 4). Despite having nearly identical orientation distributions, the Photo- and Line Drawing–Steerable Pyramid models produced distinct selectivity maps, with the latter closely matching the Contour-Based results. This confirms that orientation selectivity depends on the type of information represented rather than on global orientation statistics.

Our results indicate that methodological choices in orientation computation can have substantial consequences for interpreting neural data. By incorporating models that emphasize contour and boundary structure, future studies could better capture the perceptual and neural underpinnings of orientation selectivity in real-world scenes.

General discussion

This research provides novel insights into how humans perceive and represent orientations in real-world scenes. Across two complementary studies, we show that orientation measures emphasizing extended contour structure align more closely with both perceptual judgments and neural selectivity, despite the fact that conventional filter-based measures capture richer orientation information across multiple scales.

Study 1 demonstrated that human orientation judgments are more consistent with Contour-Based computations, suggesting that people prioritize edge and boundary information when summarizing orientation in complex scenes. This finding aligns with previous research indicating that contour processing is fundamental to object and scene perception, as it allows the visual system to extract meaningful shapes and structures from the environment26,27,28,29.

Study 2 extended this finding to neural data, showing that models emphasizing contour information explain more variance in fMRI responses and yield distinct orientation-selective maps across visual areas. These findings emphasize the critical role of the type of orientation information in shaping our understanding of neural orientation selectivity.

The Steerable Pyramid remains a powerful and biologically inspired framework for analyzing image orientation, yet its standard linear implementation—assigning equal weights across spatial frequencies—treats boundary and surface information equivalently. Our results show that this conventional use may underrepresent the structural cues most relevant to human perception. Models incorporating nonlinear grouping or contour-integration mechanisms could better approximate how vision organizes oriented structure, bridging the gap between computational efficiency and perceptual realism.

These findings also connect classic spatial frequency research with naturalistic approaches. Burr and Wijesundra41 showed, using single-orientation gratings, that observers generally exhibit better orientation discriminability at high spatial frequencies. Our results extend this finding and demonstrate that similar orientation–frequency relationships also hold in the context of real-world scenes, validating those principles under ecologically realistic conditions. Real-world vision involves a hierarchy of overlapping contours and gradients, and the human visual system appears tuned to extract the contour information that most effectively conveys shape and layout.

Future research could test whether contour prioritization extends to other visual domains, such as depth and motion perception. Integrating contour-based methods into existing computational frameworks could bridge the gap between human and computer vision, leading to the development of more robust and human-like visual processing systems. For example, models that incorporate contour integration and symmetry have been shown to improve tasks such as scene categorization and image completion, resulting in more perceptually plausible outcomes42,43,44,45.

In conclusion, our study underscores the importance of methodological alignment in visual neuroscience. As the building blocks of the shape representations for both objects and scenes alike, contours play a pivotal role in visual perception. By recognizing the perceptual and neural significance of contour information, we can develop more accurate models of shape processing across the entire visual processing hierarchy, fostering a deeper understanding of visual perception and its underlying mechanisms.

Methods

Study 1. Human judgment of orientation in natural images

Subjects

All participants were undergraduate students from the University of Toronto who participated in the studies for course credit. All participants reported normal or corrected-to-normal vision. Informed consent was obtained prior to participation, and the study was approved by the Research Ethics Board at the University of Toronto (Protocol number 30999). All research procedures were performed in accordance with the relevant guidelines and regulations, including the Declaration of Helsinki. Thirty-one undergraduate students participated in Study 1a. The required sample size was determined using G*Power 3.146, assuming a medium effect size (f = 0.35), α = 0.05, power (1 − β) = 0.9, nonsphericity correction ε = 0.5 for a main effect within a repeated-measures ANOVA. Because comparable prior studies were limited, this conventional medium effect size was used as a reasonable benchmark. The resulting target sample size was 26; to account for potential exclusions, data from 31 participants were collected. Study 1b followed the same design as Study 1a, with the addition of a within-subject factor. Because each participant completed fewer trials per condition in this expanded design, we increased the total sample size to maintain comparable power at the condition level. Data were collected from 53 participants, of whom 48 remained after applying pre-registered exclusion criteria.

Although we initially analyzed the data using ANOVA, we ultimately used mixed-effects models, which provide a more flexible framework for handling subject-level variability. Here, we report the results from the mixed-effects models, since the pattern of results was consistent with that from the ANOVA.

Orientation analysis

The orientation information in the images was analyzed in two ways.

Steerable pyramid filter

We used a Steerable Pyramid decomposition adapted from Roth et al.24,25. The Steerable Pyramid filter is a multi-scale, multi-orientation image decomposition technique that breaks down images into sub-bands across various orientations and spatial frequencies19,47. It is termed “steerable” because the response of a filter at any arbitrary orientation can be expressed as a linear combination of a fixed set of basis filters, making it possible to “steer” the filter analytically to any angle without physically rotating the kernel, which enables efficient orientation analysis19. Each sub-band was generated by applying complex-valued quadrature-pair filters in the Fourier domain, where the frequency response was defined by the product of a radial bandpass envelope and an angular raised-cosine function.
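
A minimal sketch of one such frequency-domain sub-band is given below. It is a simplified stand-in for the actual Simoncelli–Freeman construction (the normalization, exact envelope shapes, and orientation convention all differ), intended only to illustrate the product of a radial band-pass and an angular raised-cosine and the resulting energy computation.

```python
import numpy as np

def oriented_bandpass(size, center_cpi, ori_deg, n_ori=8):
    """Simplified frequency-domain sub-band filter: a radial raised-cosine band-pass
    (roughly one-octave bandwidth) times an angular raised-cosine centred on one
    orientation, restricted to a half-plane so the sub-band is complex-valued."""
    fy, fx = np.meshgrid(np.fft.fftfreq(size), np.fft.fftfreq(size), indexing="ij")
    radius = np.hypot(fx, fy) * size                          # cycles per image
    theta = np.arctan2(fy, fx)                                 # convention is assumed

    log_rad = np.log2(np.maximum(radius, 1e-9) / center_cpi)  # raised cosine on log2 freq
    radial = np.cos(np.clip(log_rad, -1.0, 1.0) * np.pi / 2.0) ** 2
    radial[radius < 1e-9] = 0.0                                # zero out the DC term

    d = np.angle(np.exp(1j * (theta - np.deg2rad(ori_deg))))  # angular distance
    angular = np.where(np.abs(d) < np.pi / 2, np.cos(d) ** (n_ori - 1), 0.0)
    return radial * angular

# Example: orientation energy of a 512x512 image for a 32 cyc/img, 45 deg sub-band
img = np.random.rand(512, 512)
band = np.fft.ifft2(np.fft.fft2(img) * oriented_bandpass(512, 32, 45))
energy = np.abs(band) ** 2          # squared magnitude of the complex filter response
```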

Following Roth et al.24, we utilized eight orientation bands, equally distributed within 180 degrees, and seven spatial frequency levels corresponding to 128, 64, 32, 16, 8, 4, and 2 cycles per image. The spatial frequency bandwidth was set to one octave, resulting in tuning profiles that approximate those of individual V1 neurons48. This configuration captures orientation information across a wide range of spatial scales, providing a comprehensive representation of the image’s components. For each level and orientation, we computed the local orientation energy as the squared magnitude of the complex filter response. The orientation energy was first averaged across spatial frequency levels with an equal weight to produce a per-orientation energy map. Finally, the mean orientation of each image region was derived as the circular mean of orientation angles, weighted by their corresponding energy across orientation channels.
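
For concreteness, a sketch of the final averaging step (equal weighting across spatial-frequency levels, followed by an energy-weighted circular mean over orientation channels) might look as follows; the array shapes and channel ordering are assumptions.

```python
import numpy as np

def mean_orientation(energy, orientations_deg):
    """Energy-weighted circular mean orientation of an image region.

    energy: array (n_sf, n_ori, H, W) of orientation energy per spatial-frequency
    level and orientation channel (shapes assumed here). Orientations are axial
    (0-180 deg), so angles are doubled before averaging and halved afterwards."""
    per_ori = energy.mean(axis=0)                              # equal SF weighting
    weights = per_ori.reshape(len(orientations_deg), -1).sum(axis=1)
    doubled = np.deg2rad(2.0 * np.asarray(orientations_deg))
    resultant = np.sum(weights * np.exp(1j * doubled))
    return (np.rad2deg(np.angle(resultant)) / 2.0) % 180.0

# Example: seven spatial-frequency levels, eight orientation channels
oris = np.arange(0, 180, 22.5)
energy = np.random.rand(7, 8, 142, 142)        # hypothetical energy maps for one patch
print(mean_orientation(energy, oris))
```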

Contour orientation

The images were processed and analyzed using the Mid-Level Vision Toolbox49. First, edges were detected in the photographs using a structured learning framework applied to random decision forests50. Continuous contours were then traced from the resulting edge maps, producing binary contour maps indicating the presence or absence of a contour at each pixel. These were subsequently vectorized into line drawings, represented as a set of contours, each consisting of a sequence of connected straight line segments. For each line segment, an orientation value was computed from its direction vector and assigned to all pixels belonging to that segment, yielding an orientation map aligned with the underlying contour structure of the image.
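
The per-segment orientation assignment amounts to taking the angle of each segment’s direction vector modulo 180°; a minimal sketch follows (the coordinate and angle conventions are assumptions, not the toolbox’s exact implementation).

```python
import numpy as np

def segment_orientation(p0, p1):
    """Orientation (0-180 deg) of a straight contour segment, computed from its
    direction vector; every pixel on the segment inherits this value."""
    dx, dy = p1[0] - p0[0], p1[1] - p0[1]
    return np.rad2deg(np.arctan2(dy, dx)) % 180.0

# Example: a segment from (10, 10) to (30, 50) has orientation ~63.4 deg in this
# (assumed) x-rightward, y-upward, 0-deg-horizontal convention.
print(segment_orientation((10, 10), (30, 50)))
```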

Stimuli

The experiment utilized grayscale real-world images from the Natural Scenes Dataset35. Mean orientation was computed for each image using both the Steerable Pyramid filter and Contour orientation methods. For the Steerable Pyramid filter, orientation maps were averaged across the seven spatial frequencies using a weighted circular mean. Contour orientations were computed without consideration of spatial frequencies. From 453 images with the largest differences in mean orientation between the two methods, circular patches (142 pixels in diameter) representing regions of maximum and minimum difference were cropped and used as stimuli (Fig. 1a). The image patches were rotated randomly to prevent response biases, given the prevalence of horizontal and vertical orientations in real-world scenes20,21,51.

In Study 1a, a red bar was used as the response stimulus, whereas in Study 1b, participants responded using either a red bar or a red grating. The grating had a spatial frequency of 2.1 cycles per image and was presented within a circular aperture.

To ensure participants’ attention, sixteen grating stimuli with controlled spatial frequencies and contrasts were added as attention checks. Gratings were generated at four spatial frequencies (1, 3.29, 10.80, 35.50 cycles per image) and four contrast settings (0.25, 0.5, 0.75, 1). These gratings were cropped to the circular aperture, matching the size of the experimental stimuli, and rotated by a random angle.
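
A minimal sketch of how such an attention-check grating could be generated (the mid-gray background level, the phase, and the 142-pixel patch size are assumptions):

```python
import numpy as np

def grating_patch(size, cycles_per_image, contrast, ori_deg, bg=0.5):
    """Sinusoidal grating at a given spatial frequency, contrast, and orientation,
    shown inside a circular aperture on an assumed mid-gray background."""
    y, x = np.meshgrid(np.arange(size) - size / 2, np.arange(size) - size / 2,
                       indexing="ij")
    theta = np.deg2rad(ori_deg)
    phase = 2 * np.pi * cycles_per_image * (x * np.cos(theta) + y * np.sin(theta)) / size
    patch = bg + 0.5 * contrast * np.sin(phase)
    patch[np.hypot(x, y) > size / 2] = bg          # crop to the circular aperture
    return patch

check = grating_patch(142, 10.80, 0.75, 30.0)      # one of the 16 SF x contrast combinations
```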

Procedure

Before beginning the experiment, participants received instructions and provided consent to participate. To ensure consistent image presentation in the online setting, participants were instructed to adjust the size of a rectangle on their screen to match the physical size of a credit card.

In Study 1a, each trial began with a grayscale image patch, followed 200 ms later by a red bar overlaid on the image with a random initial orientation. Participants adjusted the bar’s orientation using a mouse to match the perceived average orientation of the image and confirmed their response with a mouse click. The stimulus remained on screen until a response was made, with a 20-s timeout. To discourage rapid guessing, participants saw a message—“Please respond as accurately as possible. Your participation is very important to this study”—if they responded faster than 800 ms. After each orientation judgment, participants rated their confidence on a scale from 1 (low) to 5 (high) (Fig. 1b).

In Study 1b, to examine whether the type of response probe influenced participants’ orientation judgments, trials presented a grayscale image patch next to either a red bar or a red grating. To control for potential spatial biases, the locations of response stimuli were counterbalanced across trials, appearing either on the left or right side of the screen. Participants adjusted the orientation of the response stimulus (bar or grating) to match the average orientation of the image, following the same response procedure as in Study 1a.

The main experiment consisted of eight blocks, with 58 trials per block in Study 1a and 82 trials per block in Study 1b, including two grating trials and equal numbers of maximum and minimum difference patches. The bar (B) or grating (G) conditions were blocked, and participants were randomly assigned to either a GBBGGBBG or BGGBBGGB sequence. Before beginning the main trials, participants completed 10 practice trials. The experiment lasted approximately 40 min for Study 1a and 50 min for Study 1b.

Data analysis

Based on a pilot experiment, it took at least 1000 ms for participants to adjust the probe bar and confirm their selection. Responses faster than 800 ms were therefore considered random and excluded. In grating trials, responses deviating more than 30 degrees from the correct orientation were considered incorrect. If both grating trials within a block were incorrect, or if more than half of the trials in a block were excluded, the entire block was excluded from the analyses. As a result, 9.14% of trials from Study 1a and 12.62% of trials from Study 1b were excluded from the analyses.

Orientations in our task were defined within the range of 0–180° (e.g., 190° was treated as equivalent to 10°). Orientation response errors were calculated as the absolute deviation of the observers’ responses from the average orientation values computed either with the steerable pyramid filter or the contour method. Specifically, for each response, we used the smaller of the two values: either the raw absolute difference or the difference subtracted from 180 degrees. This approach ensured that the orientation response reflects the minimal angular deviation from each method’s computed average orientation.
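
A minimal sketch of this error computation:

```python
import numpy as np

def orientation_error(response_deg, model_deg):
    """Minimal angular deviation between two orientations defined on 0-180 deg:
    the smaller of the raw absolute difference and 180 minus that difference."""
    d = np.abs(np.asarray(response_deg) - np.asarray(model_deg)) % 180.0
    return np.minimum(d, 180.0 - d)

print(orientation_error(175, 5))    # 10, not 170
print(orientation_error(190, 10))   # 0: 190 deg is treated as equivalent to 10 deg
```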

Study 2. Neural representation of orientation selectivity

fMRI Data

We used fMRI data from the Natural Scenes Dataset (NSD)35. This extensive dataset comprises high-resolution (1.8-mm isotropic, 1.6-s repetition time) whole-brain fMRI measurements collected at 7 Tesla from eight participants. Each participant viewed between 9,000 and 10,000 real-world scene photographs over 30 to 40 scanning sessions. The images, sourced from the Microsoft Common Objects in Context (COCO) database52, were displayed for 3 s with 1-s inter-stimulus intervals. Participants engaged in a continuous recognition task throughout the experiment. For our analyses, we employed the 1.8-mm volumetric data and utilized the ‘betas_fithrf_GLMdenoise_RR’ version of the single-trial betas, a generalized linear model (GLM) in which the hemodynamic response function (HRF) is estimated for each voxel, responses are denoised with GLMdenoise, and beta estimation is improved with ridge regression. Regions of interest (ROIs) included V1, V2, V3, and hV4, as defined by population receptive field (pRF) maps.

Stimuli

We processed the NSD images by downsampling them to 357 × 357 pixels and then adding a gray background to match the display conditions during fMRI acquisition, resulting in final image dimensions of 512 × 512 pixels (6.02° × 6.02° visual angle). The images were converted to grayscale by averaging the RGB channels. For cross-validation purposes, each subject’s set of images was randomly divided into two partitions.
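
A minimal sketch of this preprocessing (the gray level, placement of the image on the canvas, and resampling method are assumptions):

```python
import numpy as np
from PIL import Image

def preprocess_nsd_image(path, gray_level=127.0):
    """Downsample to 357x357, convert to grayscale by averaging the RGB channels,
    and embed in a 512x512 gray canvas (centering and gray level are assumed)."""
    rgb = np.asarray(Image.open(path).convert("RGB").resize((357, 357)), dtype=float)
    gray = rgb.mean(axis=2)
    canvas = np.full((512, 512), gray_level)
    off = (512 - 357) // 2
    canvas[off:off + 357, off:off + 357] = gray
    return canvas
```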

Data analysis

We compared three models to evaluate the impact of orientation computation methods on neural selectivity.

1. Photo-Steerable Pyramid Model: Replicating Roth et al.24, this model used the same steerable pyramid filter as Study 1 to compute orientation information across spatial frequency bands in grayscale real-world scene photographs from the NSD35. Eight orientations and seven spatial frequency levels were used for the steerable pyramid. Orientations were evenly divided into eight different filters from 0 to 180 degrees. Vertical orientation was defined as 0 degrees, with angles increasing clockwise. The spatial frequency levels were determined by the image size and a tuning bandwidth of one octave. This configuration resulted in spatial frequency levels of 21.85, 10.93, 5.46, 2.73, 1.37, 0.68, and 0.34 cycles per degree (cpd).

2. Line Drawing-Steerable Pyramid Model: Line drawings of real-world scenes, generated using the Mid-Level Vision Toolbox49, were processed with the steerable pyramid filter to isolate the role of image content. This design enabled us to examine whether observed differences stemmed from the nature of the image content or from the orientation computation method itself.

3. Contour-Based Model: As in Study 1, orientation information was computed directly from the vectorized contours of the line drawings, emphasizing edges and boundaries while excluding fine-grained surface information (Fig. 4). It is important to note that this method of analyzing orientation information works directly in image space and does not include any explicit representation of spatial frequencies.

For the Photo-Steerable Pyramid Model and Line Drawing-Steerable Pyramid Model, the orientation responses were first computed independently for each spatial frequency and orientation.

Orientation responses from each model were sampled using the population receptive field (pRF) of each voxel. The pRF was modeled as a 2D isotropic (circular) Gaussian, with the “size” parameter representing the Gaussian’s standard deviation. For each orientation, the sampled output for a voxel was calculated by taking the dot product between the pRF and the orientation response output. That is, the orientation response was sampled in proportion to the value of the pRF at each location in the visual field. After pRF sampling, responses were summed across all spatial frequency levels (with equal weights) within each orientation channel for the Photo-Steerable Pyramid Model and Line Drawing-Steerable Pyramid Model to yield a single pRF-weighted orientation response per voxel. This procedure ensured an equal number of predictors across all models for the multiple regression analysis, facilitating direct comparison between models. Therefore, each of the three models produced one sampled output per image and per orientation for each voxel.
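
A minimal sketch of this pRF sampling step (the grid extent, the normalization of the Gaussian, and the array shapes are assumptions):

```python
import numpy as np

def prf_gaussian(xgrid, ygrid, x0, y0, sigma):
    """2D isotropic Gaussian pRF; sigma is the pRF size (standard deviation).
    Normalization to unit sum is an assumption made here for convenience."""
    g = np.exp(-((xgrid - x0) ** 2 + (ygrid - y0) ** 2) / (2.0 * sigma ** 2))
    return g / g.sum()

def sample_orientation_responses(resp, prf):
    """Dot product of the pRF with each per-orientation response map.
    resp: array (n_orientations, H, W), already summed across spatial-frequency
    levels with equal weights for the steerable-pyramid models."""
    return (resp * prf[None]).reshape(resp.shape[0], -1).sum(axis=1)

# Example on a hypothetical 512-pixel grid spanning +/- 3.01 deg of visual angle
xs = np.linspace(-3.01, 3.01, 512)
X, Y = np.meshgrid(xs, xs)
prf = prf_gaussian(X, Y, x0=1.2, y0=-0.5, sigma=0.8)
resp = np.random.rand(8, 512, 512)                       # hypothetical orientation maps
predictors = sample_orientation_responses(resp, prf)     # one value per orientation
```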

Multiple regression was applied to fMRI response amplitudes observed for each voxel, treating the response as a linear combination of the sampled pRF responses and noise. Beta weights were calculated using ordinary least squares. Each voxel was assigned a distinct set of beta weights and predictors derived from its individual pRF. To evaluate the model’s accuracy, cross-validation was performed. Model parameters were estimated using one half of the data, and predictions were made for the other half. The residuals were calculated by subtracting the predicted responses from the actual responses, and the cross-validated R-squared value was computed by comparing the residual sum of squares with the total sum of squares. The resulting coefficients and R-squared values were averaged across the partitions. The regression process generated a set of model weights that were used to determine the preferred orientation of each voxel. To estimate each voxel’s preferred orientation, the weighted circular mean of the model weights was used.