Introduction

The visual inputs we receive in real life consist of vast arrays of features scattered across space and dynamically evolving through time. Yet, we phenomenologically experience the world in a spatiotemporally seamless manner. How does the brain integrate the complex and ever-changing inputs across space and time?

Information redundancy in natural inputs may play a critical role: Inputs are redundant across space, with predictable arrangements of both low-level features1 and high-level object content2,3. They are also redundant across time, with events unfolding in highly predictable sequences4,5. These redundancies enable the brain to efficiently predict how visual features need to be integrated across space and time.

Such predictions are carried by cortical feedback flows expressed in dedicated rhythmic channels6,7,8. In our previous study9, we investigated integrative processing by manipulating the spatiotemporal coherence of naturalistic videos, where either the same video or different videos were shown through two apertures in the left and right visual hemifields. Decoding analyses on frequency-resolved electroencephalography (EEG) data demonstrated that when natural inputs aligned across the visual field (and thus could be integrated), there was stimulus information in feedback-related alpha rhythms, whereas when the inputs did not match (and thus could not be readily integrated), there was stimulus information in gamma oscillations. Analytically combining the EEG data with functional magnetic resonance imaging (fMRI) data recorded during the same paradigm, we further showed that integration-related alpha dynamics are linked to representations in the early visual cortex, suggesting that integration is mediated by alpha-rhythmic feedback that traverses the hierarchy back to the early visual cortex.

If rhythmic feedback is indeed critical for visual integration, the degree of feedback should co-vary with the phenomenological experience of a coherent visual world: When a visual input is perceived as coherent, it should be represented more strongly in feedback-related alpha rhythms, while inputs perceived as incoherent should be represented more strongly in feedforward-related gamma rhythms.

Here, we put this prediction to the test. In an EEG study, we presented natural video segments across the two visual hemifields, either synchronously or asynchronously (with one segment relatively delayed in time), and asked participants to report whether they perceived the stimulation as spatiotemporally coherent or not. Critically, this paradigm allowed us to test whether asynchronously presented videos at the perceptual threshold (i.e., stimuli perceived as coherent 50% of the time) are coded differently across alpha and gamma rhythms, depending on the perceptual report.

Results

We manipulated the degree of spatiotemporal coherence by presenting two segments from the same video synchronously or asynchronously (Fig. 1A) through two square apertures left and right of the central fixation (Fig. 1B). Participants were tasked with reporting whether the whole video display appeared as coherent or incoherent to them. When temporal stimulus asynchrony was low, the videos should appear as coherent (i.e., stemming from one seamless movie), but with higher temporal asynchrony they should appear as incoherent (i.e., with noticeable offset). To investigate differences in neural processing at the perceptual threshold, we initially quantified the delays that led to coherent and incoherent perception with equal probability in a behavioral experiment (Fig. 1C).

Fig. 1: A rhythmic signature of phenomenological coherence.

A Five natural videos were presented through apertures left and right of fixation, either synchronously or asynchronously. B Participants (n = 26) were instructed to fixate centrally and judge whether the stimulation was perceptually coherent or incoherent. C In an initial behavioral experiment, we used adaptive staircases to determine participants’ threshold delay for each of the five videos. In the subsequent Electroencephalography (EEG) experiment, we presented the videos with no delay (coherent), the staircased delay (threshold), and twice the staircased delay (incoherent). We further separated the threshold trials into coherent and incoherent trials based on participants’ responses. We then decoded between the 5 videos within each condition using spectral power patterns in the alpha and gamma bands. D In the coherent and incoherent conditions, we found that coherent stimuli were decodable from alpha activity, suggesting prominent feedback propagation, whereas incoherent stimuli were decodable from gamma activity, suggesting dominant feedforward propagation. E The threshold condition replicated these results, showing that the representational balance across alpha and gamma rhythms tracks perceived coherence for identical visual inputs. Error bars represent standard errors. Dots represent individual participants. *P < 0.05, +P = 0.064.

In the subsequent EEG experiment, we presented the video stimuli in three conditions: no delay between the two video segments (coherent; 25% of trials), the delay at each participant’s subjective threshold (threshold; 50% of trials), and twice this subjective threshold (incoherent; 25% of trials). We subsequently split the threshold trials into threshold-coherent and threshold-incoherent trials, based on participants’ responses, allowing us to quantify neural representations for the same stimulus when participants perceived it as coherent or incoherent.

We hypothesized that the videos were coded more strongly in feedback-related alpha when perceived as coherent and more strongly in feedforward-related gamma when perceived as incoherent9. To test this prediction, we decoded the stimuli in each condition using spectral power patterns across parietal-occipital (PO) channels, separately for alpha (8–12 Hz) and gamma (31–70 Hz) frequency bands.

We found that stimuli in the coherent condition were decodable from alpha activity [t(25) = 5.409, P < 0.001], whereas stimuli in the incoherent condition were decodable from gamma activity [t(25) = 2.558, P = 0.017]. Coherent stimuli were decoded better than incoherent stimuli in the alpha frequency band [t(25) = 4.257, P < 0.001], and incoherent stimuli were decoded better than coherent stimuli in the gamma frequency band [t(25) = 2.203, P = 0.037; interaction: F(1, 25) = 30.282, P < 0.001] (Fig. 1D).

The threshold condition replicated this pattern of results: When the stimuli at threshold were perceived as coherent, they were decodable in the alpha band [t(25) = 6.415, P < 0.001], and when the stimuli were perceived as incoherent, they were decodable in the gamma band [t(25) = 2.948, P = 0.014]. Coherent stimuli were decoded better than incoherent stimuli in the alpha frequency band [t(25) = 5.385, P < 0.001]. Numerically, incoherent stimuli were decoded better than coherent stimuli in the gamma frequency band, though this difference was not significant [t(25) = 1.938, P = 0.064; interaction: F(1, 25) = 23.592, P < 0.001] (Fig. 1E).

No effects were found in the theta (4–7 Hz) or beta (13–30 Hz) frequency bands (see Supplementary Information Fig. S1), or in evoked broadband responses (see Supplementary Information Fig. S2).

We additionally performed the statistical analyses using permutation tests, which reproduced the overall pattern of results (see Supplementary Information Table S1).

Discussion

Our results show that the representational balance between alpha and gamma rhythms tracks the phenomenological experience of coherence: The same stimulus is coded in feedback-related alpha when it is perceived as coherent but in feedforward-related gamma when it is perceived as incoherent.

Integration-related alpha dynamics may carry spatiotemporally redundant – and thus predictable – stimulus information upstream10, guiding the adaptive integration of this information into meaningful unified percepts. We have previously demonstrated that spatiotemporally coherent stimulation, which readily allows for integration into a coherent percept, is linked to content coding in feedback-related alpha rhythms9,11. Our results show that these alpha-rhythmic codes indeed relate to the phenomenological experience of visual coherence.

Alpha rhythms have been associated with temporal integration before. The duration of the alpha cycle has been linked to the width of temporal integration windows12,13,14, and the phase and power of pre-stimulus alpha rhythms have been linked to integration versus segregation in subsequently presented stimuli13,15. Our findings demonstrate that, beyond that, alpha rhythms also fulfill a function in representing the contents of upstream flows in the cortex, suggesting an active involvement of alpha in binding visual stimuli across time (and space).

Our study probed the concurrent integration across space and time, and perceived incongruencies could originate from incongruent temporal patterns (e.g., motion trajectories) or spatial patterns (e.g., continuation of contours). Whether spatial or temporal properties drive integration to different extents needs to be explored further. Alternatively, integration across space or time may be governed by shared neural mechanisms16.

To conclude, our results suggest that representational shifts from bottom-up gamma to top-down alpha dynamics drive visual integration, highlighting the crucial role of cortical feedback in the construction of seamless perceptual experiences. More broadly, our results provide a rhythmic signature of the feedforward-to-feedback balance in the visual cortex, which can be employed to track subjective changes in perception, attention, or cognition as a function of top-down or bottom-up dominance17.

Methods

Participants

Twenty-six healthy adults (16 females; age = 22.2 ± 2.6 years) with normal or corrected-to-normal vision participated. A minimum sample size of 24 was determined with an effect size of 0.25 as derived from our previous study9, a significance level of 0.05, and a power of 0.8. All participants signed written informed consent and received either course credits or cash reimbursement. The study was approved by the ethical committee of the Department of Education and Psychology at Freie Universität Berlin and was conducted in accordance with the Declaration of Helsinki. All ethical regulations relevant to human research participants were followed.

Stimuli and paradigm

The stimulus set consisted of five short video clips (airplane takeoff, cyclist, roller coaster, ski jumper, and driving car). The videos were presented through two square apertures (6° visual angle) left and right of the central fixation (2.78° offset). The central fixation had a diameter of 0.44° visual angle. Videos were played either synchronously (in each frame, the two images shown through the apertures were from the same frame of the original video) or asynchronously (in each frame, the two images were from different frames of the original video; see Fig. 1A).
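The difference between synchronous and asynchronous playback can be illustrated with a minimal sketch in Python. This is not the presentation code used in the study (which ran in MATLAB with the Psychophysics Toolbox). The function name `aperture_frames`, the assumption that the clip advances one frame per 60-Hz display refresh, and the rounding of the delay to whole frames are all hypothetical choices made for illustration.

```python
def aperture_frames(n_frames, delay_ms, frame_rate_hz=60.0, left_leads=True):
    """Return (left, right) source-frame indices for each displayed frame.

    Assumptions (not stated in the text): the clip advances one frame per
    display refresh, and the delay is rounded to a whole number of frames.
    A delay of 0 ms yields synchronous playback.
    """
    lag_frames = round(delay_ms * frame_rate_hz / 1000.0)
    pairs = []
    for t in range(n_frames):
        leading_frame = t + lag_frames  # the leading aperture is ahead in time
        lagging_frame = t
        if left_leads:
            pairs.append((leading_frame, lagging_frame))
        else:
            pairs.append((lagging_frame, leading_frame))
    return pairs
```

Under these assumptions, a 100-ms delay corresponds to an offset of 6 frames between the two apertures, and the source clip must be at least `n_frames + lag_frames` frames long.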

We presented stimuli (at 60-Hz refresh rate) and recorded participants’ responses using MATLAB and the Psychophysics Toolbox18,19. We first presented a central fixation for 0.5 seconds, followed by the videos for 3 seconds. Participants were instructed to maintain central fixation and, after the video ended, judge whether the videos were perceptually coherent or incoherent. An example trial is shown in Fig. 1B.

Behavioral experiment

We first conducted a behavioral experiment to estimate subjective integration thresholds for scenes using the QUEST adaptive staircasing procedure20. We ran separate QUEST staircases for each video, initializing the delay between two video segments randomly between 100 and 400 ms in the first trial of each scene and adaptively adjusting the delay afterwards. Each staircase terminated after 80 trials. The staircases converged within this trial count. For each participant, we averaged the delay values in the last 5 trials for each video to obtain the threshold delays.
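The final step of the threshold estimation (averaging the delays of the last 5 staircase trials) can be sketched as follows. The QUEST update itself is not reproduced here; the function name `threshold_delay` and the input list of per-trial delays are hypothetical.

```python
from statistics import mean


def threshold_delay(staircase_delays_ms, n_last=5):
    """Average the delays tested in the last `n_last` trials of one staircase.

    `staircase_delays_ms` is a hypothetical list holding the delay (in ms)
    tested on each trial of a single video's staircase, in trial order.
    """
    if len(staircase_delays_ms) < n_last:
        raise ValueError("staircase has fewer trials than n_last")
    return mean(staircase_delays_ms[-n_last:])
```

This would be applied once per participant and video, yielding five threshold delays per participant.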

EEG experiment

In the EEG experiment, we presented stimuli in three conditions. In the coherent condition, stimuli were presented synchronously. In the threshold condition, we set the delay between video segments for each scene to the subjective threshold estimated in the behavioral experiment with the same participant. In the incoherent condition, we set the delay for each scene to twice the subjective threshold. Each coherent/incoherent stimulus was presented 30 times, and each threshold stimulus was shown 60 times, yielding 600 trials, which were presented in random order. For the conditions with a delay, the left segment temporally led in half of the trials, and the right segment led in the other half. After the experiment, we separated threshold trials based on each participant’s responses: if a trial was judged as coherent, we assigned it to the threshold-coherent condition; otherwise, we assigned the trial to the threshold-incoherent condition. On average, 93.4% of coherent trials were perceived as coherent; 94.4% of incoherent trials were perceived as incoherent; 50.6% of threshold trials were perceived as coherent, and 49.4% as incoherent.
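The trial structure described above (5 videos × [30 coherent + 60 threshold + 30 incoherent] = 600 trials) can be sketched as follows. This is an illustrative reconstruction, not the experiment code: the names `build_trials` and `split_threshold_trial`, the dictionary layout, and the alternation used to balance the leading side are assumptions.

```python
import random

VIDEOS = ["airplane", "cyclist", "roller_coaster", "ski_jumper", "car"]
REPEATS = {"coherent": 30, "threshold": 60, "incoherent": 30}
# Delay expressed as a multiple of the participant's threshold delay.
DELAY_FACTOR = {"coherent": 0.0, "threshold": 1.0, "incoherent": 2.0}


def build_trials(thresholds_ms, seed=0):
    """Build a shuffled trial list from per-video threshold delays (in ms)."""
    rng = random.Random(seed)
    trials = []
    for video in VIDEOS:
        for condition, n_repeats in REPEATS.items():
            delay_ms = DELAY_FACTOR[condition] * thresholds_ms[video]
            for i in range(n_repeats):
                if condition == "coherent":
                    leading = None  # no delay, so neither side leads
                else:
                    # Alternate so that each side leads in half of the trials.
                    leading = "left" if i % 2 == 0 else "right"
                trials.append({"video": video, "condition": condition,
                               "delay_ms": delay_ms, "leading": leading})
    rng.shuffle(trials)
    return trials


def split_threshold_trial(judged_coherent):
    """Assign a threshold trial to a condition based on the response."""
    return "threshold-coherent" if judged_coherent else "threshold-incoherent"
```

With a threshold dictionary such as `{video: 150.0 for video in VIDEOS}` (an arbitrary illustrative value), `build_trials` returns 600 trials, of which 300 are threshold trials.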

We recorded EEG and eye-tracking data while participants conducted the experiment. EEG data were acquired using a 10-10 EASYCAP 64-electrode system with a BrainVision actiCHamp amplifier at 1000 Hz. The data were online filtered at 0.03–100 Hz and referenced to FCz. Eye-tracking data were acquired using the Psychophysics and Eyelink Toolbox extensions21, with an Eyelink 1000 Tower Mount (SR Research Ltd., Canada). We recorded the movements of the right eye and conducted a standard 9-point calibration at the beginning of the experiment.

Eye-movement analysis

We used Fieldtrip22 to epoch the eye-tracking data from −0.5 to 3.5 s and downsampled the data to 200 Hz. To check participants’ fixation stability, we estimated the mean and standard deviation (SD) of horizontal eye movements during stimulus presentation (0–3 seconds). There were no significant between-condition differences in horizontal eye movements, either in the mean, F(3,75) = 1.295, P = 0.295, or in the standard deviation, F(3,75) = 0.334, P = 0.801.

EEG preprocessing

We preprocessed EEG data using Fieldtrip. We first epoched the data from −0.5 to 3.5 s relative to the stimulus onset. We then band-stop filtered the data to remove 50-Hz line noise, referenced the data to the average of all channels, and downsampled the data to 200 Hz. Next, we visually inspected the data and removed noisy trials (74.7 ± 12.6) and channels (2.4 ± 0.3). The removed channels were subsequently interpolated using neighboring channels. We performed independent component analysis (ICA) using the FastICA algorithm, followed by a visual inspection of the topographical and time-course properties of resulting components, to further remove blinks and eye movement artifacts (1.9 ± 0.1 components). Finally, the data were baseline-corrected by subtracting the mean of pre-stimulus signals.
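Two of the simpler steps in this pipeline, average referencing and baseline correction, can be sketched in plain Python. This is not the Fieldtrip implementation; the function names and the representation of an epoch as a list of channels (each a list of samples) are assumptions. With epochs starting at −0.5 s and a 200-Hz sampling rate, the first 100 samples form the pre-stimulus baseline.

```python
def average_reference(epoch):
    """Subtract the across-channel mean from every channel at each sample.

    `epoch` is a list of channels, each a list of samples (assumed layout).
    """
    n_channels = len(epoch)
    n_samples = len(epoch[0])
    reference = [sum(epoch[c][s] for c in range(n_channels)) / n_channels
                 for s in range(n_samples)]
    return [[epoch[c][s] - reference[s] for s in range(n_samples)]
            for c in range(n_channels)]


def baseline_correct(epoch, fs_hz=200.0, prestim_s=0.5):
    """Subtract each channel's mean pre-stimulus signal from that channel."""
    n_baseline = int(round(prestim_s * fs_hz))
    corrected = []
    for channel in epoch:
        baseline = sum(channel[:n_baseline]) / n_baseline
        corrected.append([value - baseline for value in channel])
    return corrected
```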

EEG spectral analysis

We performed spectral analysis on the preprocessed EEG data using Fieldtrip, replicating the analysis pipeline used previously9,11. For each trial, we conducted the fast Fourier transform (FFT) and estimated the power of each frequency from 4 to 70 Hz in each channel. FFT was performed on the whole stimulation period (0–3 s). We used a single taper with a Hanning window for the low-frequency bands: theta (4–7 Hz, in steps of 1 Hz), alpha (8–12 Hz, in steps of 1 Hz), and beta (13–30 Hz, in steps of 2 Hz). For the gamma band (31–70 Hz, in steps of 2 Hz), we used the discrete prolate spheroidal sequences (DPSS) multitaper method with ±8 Hz smoothing.
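The low-frequency power estimate can be illustrated with a minimal, standard-library sketch: a Hann-tapered Fourier power estimate for one channel and one trial, evaluated at selected frequencies. This is not the Fieldtrip routine used in the study. The function name `hann_power`, the direct evaluation of the Fourier sum (rather than an FFT), and the power normalization are assumptions, and the DPSS multitaper estimate used for gamma is not covered.

```python
import cmath
import math


def hann_power(signal, fs_hz, freqs_hz):
    """Return {frequency: power} for one Hann-tapered time series.

    The Fourier sum is evaluated directly at each requested frequency.
    Power is scaled by 1/N, an arbitrary normalization chosen here.
    """
    n = len(signal)
    window = [0.5 - 0.5 * math.cos(2.0 * math.pi * i / (n - 1)) for i in range(n)]
    tapered = [x * w for x, w in zip(signal, window)]
    power = {}
    for f in freqs_hz:
        coefficient = sum(x * cmath.exp(-2j * math.pi * f * i / fs_hz)
                          for i, x in enumerate(tapered))
        power[f] = abs(coefficient) ** 2 / n
    return power
```

For a 3-s segment sampled at 200 Hz (600 samples), calling `hann_power(segment, 200.0, [8, 9, 10, 11, 12])` returns one power value per alpha-band frequency; in the decoding analysis, such values across channels form the spectral power patterns.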

EEG decoding analysis

We performed multivariate decoding analysis to probe rhythmic representations of stimuli using CoSMoMVPA23 and LIBSVM24. For this analysis, we chose 17 parietal and occipital (PO) channels (Pz, P1, P2, P3, P4, P5, P6, P7, P8, POz, PO3, PO4, PO7, PO8, Oz, O1, O2) over visual cortex11,25 and extracted the spectral power patterns across these channels to differentiate between five stimuli within each of the four conditions (coherent, threshold-coherent, threshold-incoherent, and incoherent), separately for each frequency band (theta, alpha, beta, and gamma). The analysis was conducted using a linear support vector machine (SVM) and leave-one-trial-out cross-validation. The number of trials was always balanced across scenes. Additionally, to reduce the dimensionality of the data, we performed PCA on the training data and then projected the PCA solutions (99% variance explained of the training set) onto the testing data9,11,26.
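The leave-one-trial-out scheme can be sketched as follows. To keep the example dependency-free, a nearest-centroid classifier is swapped in for the linear SVM, and the PCA step is omitted, so this illustrates only the cross-validation logic and not the classifier used in the study. All function names are hypothetical. In the full pipeline, PCA would be fitted on the training trials inside each fold before being applied to the held-out trial.

```python
import math


def nearest_centroid_predict(train_x, train_y, test_x):
    """Predict the label whose training centroid is closest to `test_x`."""
    sums, counts = {}, {}
    for x, y in zip(train_x, train_y):
        if y not in sums:
            sums[y] = [0.0] * len(x)
            counts[y] = 0
        sums[y] = [s + v for s, v in zip(sums[y], x)]
        counts[y] += 1
    best_label, best_distance = None, math.inf
    for y, total in sums.items():
        centroid = [v / counts[y] for v in total]
        distance = math.dist(centroid, test_x)
        if distance < best_distance:
            best_label, best_distance = y, distance
    return best_label


def leave_one_trial_out_accuracy(patterns, labels):
    """Hold out each trial once, train on the rest, and return accuracy."""
    n_correct = 0
    for i in range(len(patterns)):
        train_x = patterns[:i] + patterns[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        if nearest_centroid_predict(train_x, train_y, patterns[i]) == labels[i]:
            n_correct += 1
    return n_correct / len(patterns)
```

With five equally frequent stimuli, the chance level of this accuracy is 1/5 = 20%.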

Statistics and reproducibility

We used one-sample t-tests (one-tailed) to compare the decoding accuracy against the chance level (20%) separately for each frequency band and each condition to detect frequency-specific representations of stimuli. To investigate whether the pattern of decoding accuracies across the 4 frequency bands differs for coherent and incoherent conditions, as hypothesized, we performed 2-condition × 4-frequency two-way ANOVAs. Two such ANOVAs were performed, one comparing the coherent and incoherent conditions, and one comparing the threshold-coherent and threshold-incoherent conditions. After that, we performed paired t-tests (two-tailed) to compare decoding accuracies between the coherent and incoherent conditions separately at each frequency band. Multiple comparisons were corrected using false discovery rate (FDR) correction (P < 0.05).
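Two building blocks of this procedure can be sketched in plain Python: the one-sample t statistic against the 20% chance level, and an FDR decision rule. The text does not name the FDR procedure, so the Benjamini-Hochberg step-up rule used here is an assumption, as are the function names. P values are not computed because that would require the t distribution, which is not in the standard library.

```python
import math
from statistics import mean, stdev


def one_sample_t(accuracies, chance=0.20):
    """Return (t, degrees of freedom) for accuracies tested against chance."""
    n = len(accuracies)
    standard_error = stdev(accuracies) / math.sqrt(n)
    return (mean(accuracies) - chance) / standard_error, n - 1


def benjamini_hochberg(p_values, q=0.05):
    """Return a list of booleans: True where the test survives FDR control.

    Assumes the Benjamini-Hochberg step-up rule: find the largest rank k
    with p_(k) <= q * k / m and reject all tests ranked at or below k.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= q * rank / m:
            cutoff_rank = rank
    rejected = [False] * m
    for rank, i in enumerate(order, start=1):
        rejected[i] = rank <= cutoff_rank
    return rejected
```

With 26 participants, `one_sample_t` returns 25 degrees of freedom, matching the t(25) values reported in the Results.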

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.