Introduction

Visual attention tunes sensory processing to optimize decision making and behavior. Professional sports players know that preparing for the onset of sudden visual signals in the peripheral visual field is necessary for fast and accurate motor control. Likewise, driving a car safely requires one to expect peripheral visual signals such as fast-approaching motorbikes and pedestrians. Both the timing and location of these upcoming signals are often unknown to the agent, all the more so in peripheral vision, which multiplies the number of possible locations for a stimulus. In such situations, optimal behavior necessitates two fundamental abilities to minimize spatiotemporal uncertainty. First, it requires using available cues, such as other players or traffic lights, to direct our attention in space and time. Spatial attention refers to the cognitive capacity to select the task-relevant locations at which targets will occur, while temporal attention represents the capacity to prioritize the task-relevant time points at which targets will occur. Second, optimal behavior often requires high visual temporal acuity, which reflects one’s ability to perceive the temporal characteristics of visual signals at the millisecond scale. This temporal acuity of vision enables structuring information across time and inferring motion direction and causality between events. In this sense, outperforming opponents in sports or avoiding accidents in life-threatening situations calls for effective temporal processing of peripheral visual signals at the millisecond level to adapt behavior and anticipate risks. Because such risks most often arise in peripheral vision, we presented stimuli in the peripheral visual field. To this day, it remains unclear which types of cues help temporal processing in peripheral vision under spatiotemporal uncertainty.

The present study investigates whether and how cue-based spatial and temporal attention influence temporal acuity in peripheral vision under uncertainty. Two decades of research have shown complex and somewhat contradictory effects of spatial attention cueing on visual temporal acuity. In central vision, some results suggest that spatial attention impedes visual temporal acuity1 and prolongs temporal integration2. Mounts and Edwards (2016) also showed trade-offs between the impact of spatial attention on spatial and temporal resolution3. Hein et al. (2006) replicated the deleterious effect of spatial cueing on temporal acuity, but only when they used an exogenous cue4. Critically, the effect was reversed in the case of an endogenous cue. Such positive effects of spatial cues might increase in peripheral vision, where attention orienting is critical. The effect of temporal attention in the case of spatiotemporal uncertainty is also unclear. Many studies have shown that temporal attention facilitates visual processing5 and simple detection tasks6,7, accompanied by electrophysiological correlates5,8,9,10,11,12,13. However, these explorations usually concern the effects of temporal cues on information whose spatial location is fixed. To the best of our knowledge, no study has explored the impact of temporal cues on visual temporal acuity under spatial uncertainty. This knowledge gap is all the more significant given that a recent framework of temporal attention14,15 emphasizes that the impact of temporal expectations on performance is task- and context-dependent. Do both endogenous spatial and temporal attention influence visual temporal acuity under sensory uncertainty in peripheral vision? If so, to what extent do spatial and temporal attention mechanisms jointly contribute to driving the temporal resolution of vision?

Some evidence supports an independence hypothesis of spatial and temporal attention mechanisms16,17,18. Indeed, given previous findings4,19,20, one would assume that anticipating specific locations across the visual field enhances visual temporal acuity, independently of expectations about the onset of the upcoming signal (independence hypothesis H1). This means that the reduction of spatial uncertainty alone could enhance the temporal acuity of vision. In contrast, other findings21,22 suggest that anticipating the onset of the signal enhances visual temporal acuity, independently of expectations about the location of that signal (independence hypothesis H2). In other words, the benefit of temporal attention for temporal acuity might be decoupled from spatial uncertainty regarding the visual signals. Whether the concomitant deployment of spatial and temporal attention, which helps reduce spatiotemporal uncertainty, provides additive benefits to temporal acuity remains elusive19,23,24,25. A reasonable hypothesis is that attending to both space and time may provide unrivaled benefits for processing the fine-grained temporal characteristics of visual signals19 (synergistic hypothesis H3). Yet, an alternative hypothesis is that concomitant orienting in space and time can only be suboptimal given our time-constrained attentional resources26, thus resulting in non-cumulative effects on task performance.

Here, we evaluate the independence and synergistic hypotheses of spatiotemporal attention for temporal discrimination in vision using a virtual reality scenario (Fig. 1a). To this aim, we developed an original spatiotemporal asynchrony detection task in immersive virtual reality (VR; Fig. 1b-c). Previous asynchrony detection tasks (also called simultaneity judgment tasks) have proven fruitful for evaluating visual temporal processing in both clinical27,28,29,30,31,32 and neurotypical33,34 populations (see refs. 35,36 for reviews). In our paradigm, participants have to detect whether the lighting of two spotlights (targets) located on a virtual stage occurs simultaneously (0 ms stimulus onset asynchrony; SOA) or asynchronously (33 ms, 55 ms, or 77 ms SOA). A cue covertly orients attention towards the locations of the spotlights, towards the timing of their lighting, or towards both. Signal detection theory is applied to the asynchrony judgments to distinguish perceptual sensitivity to the SOAs from decision bias. Because voluntary attention is deployed over time, the study capitalizes on electroencephalographic activity and pupillary dynamics as measures of attention to demonstrate the effectiveness of spatial and temporal orienting. In brief, this study investigates whether and how cue-based spatiotemporal orienting of attention drives temporal processing in peripheral vision.

Fig. 1: Depiction of the virtual environment, trial procedure, attentional conditions, and hypotheses.
figure 1

a In immersive virtual reality, participants impersonated a light engineer facing a stage, whose task was to detect delays between the onsets of pairs of spotlights (targets). b Participants provided non-speeded, two-alternative forced-choice simultaneity/asynchrony judgments about the targets appearing with various SOAs (0, 33, 55, or 77 ms). The targets appeared in various locations (top, bottom, left, or right target pairs) following a variable cue-target time interval (500, 750, 1000, or 1250 ms foreperiod). c Each trial started with a cue (100% validity) indicating either the location (spatial cue), the foreperiod (temporal cue), both location and foreperiod (spatiotemporal cue), or no information (neutral cue). d Independence and synergistic hypotheses were evaluated regarding the effects of spatial, temporal, and spatiotemporal attention on visual temporal acuity.

Methods

Participants

Sample size was estimated using G*Power (v. 3.1.9.6)37. Thirty-six participants were required to reach 95% statistical power at α = 0.05 for a medium effect size (η2p = 0.25) of the attentional cues on perceptual sensitivity. We increased this estimated sample to forty participants to account for the low signal-to-noise ratio typical of EEG recordings. Participants were adult volunteers (26 women and 14 men, including two left-handed, all between 19 and 35 years old) from the Université Libre de Bruxelles. Gender was self-reported by the participants. No data on race or ethnicity were collected. All participants reported having normal or corrected-to-normal visual acuity. All were naïve with respect to the goals of this study and were not compensated for their participation. Data from five participants were removed: one due to technical issues with the EEG equipment, three for not following task instructions, and one for color blindness to green and yellow colors. In addition, EEG data from four participants were discarded: one due to a lack of recording caused by technical issues, and three due to excessive EEG artifacts. Finally, two participants performed only two and three trial blocks out of four, respectively, due to time constraints. No participant reported any feeling of cybersickness during or after the experiment. All participants provided written informed consent to take part in the study. The local ethics committee of the Université Libre de Bruxelles approved the study. The study was not preregistered.

Experimental protocol

The study was developed with the Unity software (Unity Technologies, v. 2019.3.9f1) to create the virtual environment. The HTC Vive Pro Eye (HTC Corp.) headset and controllers immersed participants in VR. SRanipal (v. 1.1.0.1) enabled the monitoring of eye position and pupil diameter. Participants wore the EEG and VR headsets simultaneously while sitting on a chair. Participants were immersed in a virtual theater and played the role of a light engineer who must verify the latencies between projectors on stage. This scenario translates the experimental task (i.e., the asynchrony detection task) into an ecological situation, which potentiates participants’ engagement with the task. Immersive VR enables a more realistic visual experience than traditional 2D-screen psychophysical setups. Also, the eye-tracking system embedded in the VR headset allows estimating where participants are looking, that is, the 3D point of ocular focus, to control for potential uninstructed saccades during the task.

Participants were located behind a light engineer's desk, facing a stage in the virtual environment. The desk was made of two consoles, a central black screen, and two lateral black screens. The central black screen displayed task instructions, while the lateral screens reminded the participants of the mapping between the response buttons and the perceptual judgment (simultaneity vs. asynchrony). The stage was composed of red curtains and a set of spotlights in the background (Fig. 1a). A structure with 17 led-like lights formed a gaze fixation cross with a radius of 4.7° of visual angle. This fixation cross was displayed in the center of the visual field and was located at the back of the stage. It also served as the attention cue, displaying different patterns of yellow and green lights (Fig. 1b, c). The patterns provided four attention cues (neutral, spatial, temporal, or spatiotemporal). The luminance of the green and yellow lights was equalized (~90 cd/m²), as measured on-screen with a luminance meter. Inspired by our previous research29, four white spotlights in the foreground of the stage (two on the floor and two on the ceiling) surrounded the fixation cross. They served as placeholders for simultaneous/asynchronous visual stimulation. The spotlights (called ‘targets’ in the remainder of the article) were located in peripheral vision at 24.2° of visual angle from the center of the gaze fixation cross. The size of the glare produced by these spotlights was 1.8° of visual angle. The dim light within the theater ensured a comfortable perceptual contrast between the gaze fixation cross, the four targets, and the remaining scene. A video depicting the procedure is available online (https://www.youtube.com/watch?v=pQpi6Zw1baU). Trials containing saccades beyond 10° of visual angle from the fixation cross during the foreperiod were considered invalid and discarded (min = 1 trial, max = 33 trials, median = 3.5 trials). The number of these trials was equivalent across cueing conditions.
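As a rough sketch, the per-trial fixation check described above can be expressed as follows; the function name and the head-centered angular gaze coordinates are illustrative assumptions, not taken from the study's code:

```python
import numpy as np

def fixation_broken(gaze_yaw_deg, gaze_pitch_deg, limit_deg=10.0):
    """Return True if any foreperiod gaze sample deviates more than
    `limit_deg` of visual angle from the fixation cross, assumed here
    to sit at (0, 0) in head-centered angular coordinates."""
    eccentricity = np.hypot(np.asarray(gaze_yaw_deg), np.asarray(gaze_pitch_deg))
    return bool(np.any(eccentricity > limit_deg))

# A trial containing a saccade to ~12 degrees eccentricity would be discarded.
assert fixation_broken([0.0, 12.0], [0.0, 1.0])
# Small fixational jitter around the cross is kept.
assert not fixation_broken([0.5, -0.3], [0.2, 0.4])
```

Applying such a check to the foreperiod samples of every trial yields the per-participant counts of invalid trials reported above.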

Task

First, an EEG resting-state measurement of three minutes was recorded while participants fixated a white dot located on the central black screen on the desk (data not exploited in this article).

Second, participants fixated the center of the fixation cross while presented with different patterns of yellow and green lights. At this stage of the experiment, participants were unaware of the relevance of the patterns to the task. Consequently, this passive viewing allowed us to measure task-free pupillary responses to the yellow and green light patterns before participants were informed of their task relevance as cues. Such spontaneous pupillary responses were subsequently compared to task-induced responses. During this passive viewing, each type of cue was presented 16 times, resulting in 64 presentations. Each cue was on for one second and off for three seconds. In other words, three seconds elapsed between the offset of a cue and the onset of the next cue.

Finally, participants performed a delayed-response, two-alternative forced-choice asynchrony detection task. Participants were instructed to discriminate simultaneous from asynchronous onsets of two spotlights (i.e., the targets). During the task, participants continuously fixated the center of the fixation cross. At the start of each trial, the cue was presented for one second (Fig. 1b). Four possible foreperiods (i.e., 500 ms, 750 ms, 1000 ms, or 1250 ms) separated the offset of the cue from the onset of the first target. The targets could appear at four different paired locations (i.e., top, bottom, left, or right pairs of targets), each target appearing in a top-left, top-right, lower-right, or lower-left corner. Four foreperiods were implemented to equate the temporal and spatial uncertainty regarding the occurrence of the targets, and consequently the reduction of temporal and spatial uncertainty provided by the temporal and spatial cues, respectively. Targets were presented simultaneously (0 ms SOA) or asynchronously (33 ms, 55 ms, or 77 ms SOA). Multiple SOAs were used to ensure some detection of asynchronies by all participants while minimizing the probability of floor/ceiling effects in task performance. Participants responded after the presentation of a go-signal by pressing the trigger button or the pad button of the VR controller placed in their right hand. The offset of the central light of the fixation cross served as the go-signal and occurred two seconds after the offset of the cue (i.e., the sum of the cue offset-target delay and the target-go-signal delay). No response time limit was implemented. An inter-trial interval of 1500 ms separated the participant’s response from the onset of the subsequent cue. The matching of the controller buttons with the simultaneous and asynchronous judgments was randomly assigned for each participant.

The attentional cueing paradigm was inspired by previous research9,38,39. In the spatial attention condition, the cue predicted the location of the target pair. During the cue presentation interval, the branches of the fixation cross turned yellow, except for one branch that turned green and pointed towards the targets’ location40. In the temporal attention condition, the cue predicted when the targets would appear. In this case, during the cue presentation, the fixation cross turned yellow except for four led-like lights, which turned green. The four green lights were concentrically located around the center of the cross and predicted the foreperiods gradually with distance. The green lights closest to the center of the cross predicted the shortest foreperiod (500 ms). The green lights at the two intermediary distances from the center predicted the mid-range foreperiods (750 ms and 1000 ms). Finally, the green lights most distant from the center of the cross predicted the longest foreperiod (1250 ms). The spatiotemporal condition consisted of a combination of the spatial and temporal conditions. Here, during the cue presentation, the fixation cross turned yellow except for a single led-like light that turned green. This single green light indicated both the location of the targets and the foreperiod, following the logic used for the spatial and temporal conditions. These conditions were compared with a neutral condition, in which the fixation cross turned entirely green, thus providing neither spatial nor temporal information about the targets.

During an initial training phase, participants performed 16, 16, 8, and 16 trials with the neutral, temporal, spatial, and spatiotemporal cues, respectively. Fewer trials were used with the spatial cue during training given its readily understandable meaning. Alongside the displayed instructions, the experimenter commented on the patterns of green and yellow lights on the fixation cross to ensure adequate understanding of the information provided by the different types of cues. Furthermore, during the training phase only, click sounds were delivered to the participants every 250 ms during the foreperiod. In other words, participants heard one, two, three, or four click sounds in trials with the first, second, third, and fourth shortest foreperiods, respectively. This training protocol was adopted based on the results of a pilot study showing that the auditory clicks helped participants discriminate the possible foreperiods used in the task.

The main task consisted of four blocks of 256 trials, for a total of 1024 trials per participant. Each block was composed of four sub-blocks of 64 trials. The attention cue (neutral, spatial, temporal, or spatiotemporal) was kept constant within each sub-block so as to facilitate the orienting of attention41. Within each sub-block, each of the 64 trials instantiated a unique combination of the experimental parameters (four locations × four foreperiods × four SOAs). The order of these combinations was randomized for each sub-block, so that the probability of a specific location, foreperiod, and SOA was 25% for each parameter on each trial. The order of the sub-blocks, representing the four cueing conditions, was randomized within each trial block. Participants were reminded to use the cues to optimize their performance and to take their time before responding. Also, participants were invited to take a break and remove the VR headset at the end of each sub-block. At the end of the experiment, the percentage of correct responses for each block was shown to the participant on the lateral black screens of the virtual desk.
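A minimal sketch of this sub-block construction, with illustrative factor labels (the variable and function names are assumptions, not the study's code):

```python
import itertools
import random

LOCATIONS   = ["top", "bottom", "left", "right"]   # target-pair locations
FOREPERIODS = [500, 750, 1000, 1250]               # ms, cue offset to first target
SOAS        = [0, 33, 55, 77]                      # ms, stimulus onset asynchronies

def make_subblock(rng):
    """One sub-block: each of the 4 x 4 x 4 = 64 parameter combinations
    occurs exactly once, in random order, so each level of each factor
    appears on 25% of the trials."""
    trials = list(itertools.product(LOCATIONS, FOREPERIODS, SOAS))
    rng.shuffle(trials)
    return trials

subblock = make_subblock(random.Random(42))
assert len(subblock) == 64 and len(set(subblock)) == 64
assert sum(1 for loc, _, _ in subblock if loc == "top") == 16  # 25% of trials
```

Fully crossing the three factors within each sub-block, rather than sampling them independently, guarantees the flat 25% marginal probabilities described above.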

Data acquisition and analysis

Behavioral data

We applied signal detection theory to distinguish perceptual sensitivity (d’) to the SOAs (the ability to distinguish simultaneous from asynchronous stimuli) from the decision bias (criterion c, the tendency to report the presence or absence of an SOA independently of the SOA presented). To do so, we first computed hit rates (HR) and false alarm rates (FAR). Perceptual sensitivity and criterion are then computed through the following equations:

$$d^{\prime} =z\left({HR}\right)-z\left({FAR}\right)$$
(1)
$$c=\frac{1}{2}(z\left({HR}\right)+z\left({FAR}\right))$$
(2)

Importantly, perceptual sensitivity and criterion values are uncorrelated only under the assumption of equal-variance signal and noise distributions42. Because this distributional assumption cannot be verified in our asynchrony detection task and could be violated across attentional cueing conditions, distribution-independent measures of sensitivity (AROC) and criterion (βROC) were also estimated from the receiver operating characteristic (ROC) curve. Based on the area under the ROC curve, we computed the AROC and βROC values43 using the following equations:

$${A}_{{ROC}}={K}_{A}+{K}_{B}+0.5$$
(3)
$${B}_{{ROC}}={K}_{A}/{K}_{B}$$
(4)

given

$${K}_{A}=\frac{1}{4}\left({HR}-{FAR}\right)({HR}+{FAR}+\frac{{FAR}}{\left(1-{FAR}\right)})$$
(5)
$${K}_{B}=\frac{1}{4}\left({HR}-{FAR}\right)(2-{HR}+{FAR}+\frac{(1-{HR})}{{HR}})$$
(6)

These perceptual sensitivity (d’, AROC) and criterion (c, βROC) values were estimated for each participant, SOA, and cueing condition in the main analysis.
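The four indices can be computed directly from Eqs. (1)-(6); the sketch below (function name illustrative) assumes hit and false alarm rates strictly between 0 and 1, as extreme rates would otherwise require a standard correction:

```python
from scipy.stats import norm

def sdt_measures(hr, far):
    """Sensitivity and criterion (Eqs. 1-2) and the distribution-independent
    ROC-based indices (Eqs. 3-6), computed exactly as defined above."""
    d_prime = norm.ppf(hr) - norm.ppf(far)                      # Eq. (1)
    c = 0.5 * (norm.ppf(hr) + norm.ppf(far))                    # Eq. (2)
    k_a = 0.25 * (hr - far) * (hr + far + far / (1 - far))      # Eq. (5)
    k_b = 0.25 * (hr - far) * (2 - hr + far + (1 - hr) / hr)    # Eq. (6)
    a_roc = k_a + k_b + 0.5                                     # Eq. (3)
    b_roc = k_a / k_b                                           # Eq. (4)
    return d_prime, c, a_roc, b_roc

# For example, HR = 0.8 and FAR = 0.2 give d' of about 1.68 and c = 0.
d_prime, c, a_roc, b_roc = sdt_measures(0.8, 0.2)
```

In practice these values would be computed per participant, SOA, and cueing condition, as described above.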

Eye-tracking data

The eye-tracking system (Tobii Ltd.) integrated into the VR headset monitored binocular gaze position and pupil size at a sampling rate of 90 Hz. The spatial accuracy estimated by the manufacturer varies between 0.5° and 1.1°. The eye tracker was calibrated at the start of the experiment and after each removal of the headset. Measuring the gaze position enabled the subsequent detection of uninstructed saccades, while measuring the binocular pupil diameter allowed us to estimate pupil dynamics during the task.

Raw pupil diameter data were interpolated with a spline method to increase the temporal precision, including over the missing data corresponding to blinks identified with the HTC SRanipal development kit. This up-sampling strategy was implemented to limit the distortions (e.g., phase shifts) introduced by the interpolation of missing data given the low temporal resolution of the head-mounted eye-tracking system (as in refs. 33,44). Then, a low-pass, second-order IIR Butterworth filter was applied to the pupil data (6 Hz cutoff, as in refs. 45,46). Pupil traces were downsampled to 100 Hz to facilitate further calculations. Both tonic (non-baseline-corrected) and phasic (baseline-corrected) pupil states were computed, the baseline correction using the mean of the −500 ms to 0 ms interval before the onset of the cue presentation.
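As a sketch of the filtering and baseline-correction steps, applied here to an already-resampled 100 Hz trace (the function name, cue-onset timing, and argument layout are illustrative assumptions, not the study's implementation):

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

def preprocess_pupil(trace, fs=100.0, cutoff=6.0, cue_onset_s=1.0):
    """Low-pass the pupil trace with a second-order Butterworth filter
    (6 Hz cutoff, zero-phase), then subtract the mean of the 500 ms
    pre-cue baseline to obtain the phasic (baseline-corrected) state.
    The tonic (non-baseline-corrected) state is the filtered trace."""
    sos = butter(2, cutoff, btype="low", fs=fs, output="sos")
    tonic = sosfiltfilt(sos, trace)      # forward-backward, zero-phase
    i0 = int((cue_onset_s - 0.5) * fs)   # baseline start: -500 ms pre-cue
    i1 = int(cue_onset_s * fs)           # baseline end: cue onset
    phasic = tonic - tonic[i0:i1].mean()
    return tonic, phasic
```

Note that zero-phase (forward-backward) filtering avoids introducing the very phase shifts the up-sampling strategy is meant to limit.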

EEG data

EEG activity was continuously recorded using a Biosemi ActiveTwo 10–20 system with 64 active channels at a 1024 Hz sampling rate and the ActiView software. The electrode offset was kept below 20 mV. The offset values were the voltage difference between each electrode and the CMS-DRL reference channels to the left and right sides of POz. EEG analyses were performed with MNE-Python v.0.22.047,48. Raw data were filtered offline with a 0.1 Hz high-pass filter and a 30 Hz low-pass filter. Artifacts arising from eye blinks were identified via independent component analysis and regressed out of the data using the mne.preprocessing.ICA function. Then, EEG data were downsampled to 500 Hz to facilitate further computations. The Autoreject algorithm49 was then used to detect and repair, via topographic interpolation50, the remaining artifacts due to head movements and friction between the electrodes and the VR headset. Nevertheless, the procedure led to the rejection of a mean of 8.6% of trials (minimum = 0%, maximum = 35%), which helped minimize noise in further analyses.

Time-frequency EEG analysis

First, a surface Laplacian filter was applied (stiffness m = 4, λ = 10−5), resulting in a reference-free current source density (CSD), which increases the spatial resolution of the signal and reduces the signal deformation due to volume conduction51,52. Then, oscillatory activities were extracted using a wavelet approach53 on epochs starting 2300 ms before and ending 3000 ms after the cue onset. A family of Morlet wavelets (Gaussian-windowed complex sine waves) was built to perform the convolution via fast Fourier transform over each channel and each trial. The family of wavelets was parametrized to extract frequencies from 4 to 30 Hz in steps of 1 Hz. The number of cycles of the wavelets was linearly spaced, from 3 cycles for the lowest frequency to 8 cycles for the highest frequency. This precaution maintained a well-balanced trade-off between time and frequency resolution at each frequency. A baseline correction was applied to transform the signal power into dB, then into normalized z-scores, using the mean of the −500 ms to 0 ms interval before the onset of the cue presentation.
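A minimal numpy sketch of this wavelet decomposition for a single channel and trial (the FFT-convolution details are illustrative rather than the exact implementation used in the study):

```python
import numpy as np

freqs = np.arange(4, 31)                 # 4-30 Hz in 1 Hz steps
cycles = np.linspace(3, 8, len(freqs))   # 3 cycles at 4 Hz up to 8 cycles at 30 Hz

def morlet_power(signal, fs):
    """Time-frequency power from convolution with complex Morlet wavelets
    (Gaussian-windowed complex sine waves) via the fast Fourier transform."""
    n = len(signal)
    n_fft = 2 * n                        # zero-pad to avoid circular wrap-around
    sig_fft = np.fft.fft(signal, n_fft)
    t = np.arange(-(n // 2), n - n // 2) / fs
    power = np.empty((len(freqs), n))
    for i, (f, n_cyc) in enumerate(zip(freqs, cycles)):
        sigma = n_cyc / (2 * np.pi * f)  # Gaussian width in seconds
        wavelet = np.exp(2j * np.pi * f * t) * np.exp(-t**2 / (2 * sigma**2))
        wavelet /= np.linalg.norm(wavelet)
        conv = np.fft.ifft(sig_fft * np.fft.fft(wavelet, n_fft))
        power[i] = np.abs(conv[n // 2 : n // 2 + n]) ** 2  # recenter on the signal
    return power
```

Linearly increasing the number of cycles with frequency widens the Gaussian envelope (in cycles) for higher frequencies, which is what keeps the time-frequency trade-off balanced across the 4-30 Hz range.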

Preliminary visualization of the oscillatory activities revealed one temporo-spatial cluster located over the bilateral parieto-occipital areas (electrodes Pz-P1-P2-POz-PO3-PO4), most likely reflecting the effect of the attentional cues. A general approach in the spatial attention literature is to evaluate the lateralized alpha suppression depending on the cued location (left vs. right) of the stimuli. This approach is suboptimal here because half of the trials are not lateralized (i.e., stimuli presented in the top and bottom locations). We used the aforementioned cluster of electrodes for statistical analysis given that 1) our previous studies showed that alpha power recorded over this cluster of electrodes reflects the processing of visual asynchronies in a similar asynchrony detection task30,33, 2) activities related to spatiotemporal information processing are the most likely to be recorded over parietal areas (see ref. 54 for a review), and 3) alpha power suppression recorded over similar parieto-occipital electrodes has been widely associated with covert spatial orienting55,56,57. A control analysis of the instantaneous alpha frequency (IAF) is described in the Supplementary Methods—Instantaneous Alpha Frequency.

Statistical analysis

Before statistical analysis, all trials with responses provided before the go-signal were removed (mean = 4.3%, SD = 4.1%). R (v.4.3)58 and the rstatix (v. 0.6.0) package were used to perform two-sided repeated-measures analyses of variance (rANOVAs). All rANOVAs were performed with a Greenhouse-Geisser correction when the within-subject factor (Cue) violated the sphericity assumption. The Shapiro–Wilk test was used to evaluate the normality of the data. We performed contrast analyses to test our three hypotheses (Fig. 1d). Specifically, three planned contrasts were evaluated with Bayesian t-tests and with t-tests under False-Discovery Rate (FDR) p-value correction59, as follows:

$${Spatial\; effect}={d^{\prime} }_{{Spatial\; cue}}-{d^{\prime} }_{{Neutral\; cue}}$$
(7)
$${Temporal\; effect}={d^{\prime} }_{{Temporal\; cue}}-{d^{\prime} }_{{Neutral\; cue}}$$
(8)
$${Synergistic\; effect}={d^{\prime} }_{{Spatiotemporal\; cue}}-\frac{\left({d^{\prime} }_{{Spatial\; cue}}+{d^{\prime} }_{{Temporal\; cue}}\right)}{2}$$
(9)

Similarly, Eqs. (7), (8) and (9) were applied to c values, EEG activities (i.e., mean pre-target oscillatory power), and pupillary measurements (i.e., mean pre-target pupil diameter) to evaluate the effect of the cues. Bayesian t-tests were computed using the BayesFactor package60 for each contrast to report the strength of evidence for or against the null hypothesis. All Bayes Factors (BF) were calculated using a standard Cauchy prior (s = 0.707) for the alternative hypothesis. Reported BF10 values (along with their mean standard error; MSE) indicate how many times more evidence there is in favor of the alternative hypothesis compared with the null hypothesis. Conversely, BF01 values indicate how many times more evidence there is in favor of the null hypothesis compared with the alternative hypothesis.
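Under the assumption that per-participant d' values are arranged in columns ordered [neutral, spatial, temporal, spatiotemporal] (an illustrative layout, not the study's actual data structure), the three planned contrasts and their FDR-corrected t-tests can be sketched as follows; two-sided one-sample t-tests are used here for simplicity, whereas the study also used one-sided and Bayesian t-tests:

```python
import numpy as np
from scipy import stats

def planned_contrasts(dprime):
    """Compute the spatial, temporal, and synergistic contrasts of
    Eqs. (7)-(9) from an (n_participants x 4) array with columns
    [neutral, spatial, temporal, spatiotemporal], test each against
    zero, and apply a Benjamini-Hochberg FDR correction."""
    neutral, spatial, temporal, spatiotemporal = dprime.T
    contrasts = {
        "spatial": spatial - neutral,                              # Eq. (7)
        "temporal": temporal - neutral,                            # Eq. (8)
        "synergistic": spatiotemporal - (spatial + temporal) / 2,  # Eq. (9)
    }
    pvals = np.array([stats.ttest_1samp(v, 0.0).pvalue for v in contrasts.values()])
    # Benjamini-Hochberg step-up adjustment of the three p-values.
    order = np.argsort(pvals)
    m = len(pvals)
    ranked = pvals[order] * m / np.arange(1, m + 1)
    adjusted = np.minimum.accumulate(ranked[::-1])[::-1]  # enforce monotonicity
    fdr = np.empty(m)
    fdr[order] = np.clip(adjusted, 0.0, 1.0)
    return contrasts, dict(zip(contrasts, fdr))
```

Because every contrast is a within-participant difference, a one-sample test of the contrast against zero is equivalent to a paired comparison of the underlying conditions.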

Results

Spatial but not temporal attentional orienting enhances the temporal resolution of human vision

Here, participants performed a non-speeded, two-alternative forced-choice simultaneity/asynchrony judgment task. They were instructed to discriminate simultaneous from asynchronous onsets of two visual targets in a virtual reality scenario. The targets appeared in various locations following a variable cue-target foreperiod. Each trial started with a cue indicating either the location (spatial cue), the foreperiod (temporal cue), both location and foreperiod (spatiotemporal cue), or no information (neutral cue).

We first quantified the extent to which participants distinguished simultaneous from asynchronous presentations of the targets. A two-way rANOVA with the factors SOA and Cue applied to the perceptual sensitivity indexes (d’) revealed an effect of the SOA (F(2,68) = 283.2, p = 1.06 × 10−33, η2p = 0.893, CI95% = [0.85, 0.92]), such that sensitivity increased with the SOA (Fig. 2a). Further, d’ values were all statistically different from each other (one-sided t-tests, FDR-corrected; SOA 33 ms vs. SOA 55 ms: t(34) = 37.6, p < 2.2 × 10−16, BF10 > 1000, MSE < 0.01%; SOA 33 ms vs. SOA 77 ms: t(34) = 17.7, p < 2.2 × 10−16, BF10 > 1000, MSE < 0.01%; SOA 55 ms vs. SOA 77 ms: t(34) = 17.7, p < 2.2 × 10−16, BF10 > 29, MSE < 0.01%). This result confirms that task difficulty decreases as the SOA increases (see also Fig. S1). The analysis also indicated a significant effect of the Cue (F(3,102) = 4.79, p = 0.004, η2p = 0.124, CI95% = [0.02, 0.23]) and no interaction effect (F(6,204) = 0.86, p = 0.52).

Fig. 2: Covert spatial orienting of attention enhances the perceptual sensitivity to visual asynchronies.
figure 2

The perceptual sensitivity reflects the participants’ ability to discriminate simultaneous from asynchronous onsets of the two visual events. a Perceptual sensitivity increases proportionally with the SOA. b Cue-based spatial, but not temporal, orienting of attention enhances the perceptual sensitivity to asynchronies. Error bars represent one confidence interval from the mean. N = 35 participants.

The contrast analysis revealed moderate evidence for the independence hypothesis (H1), as spatial attention increased perceptual sensitivity (one-sided t-test; t(34) = 2.55, p = 0.008, Cohen’s d = 0.43, CI95% = [0.08, 0.8], BF10 = 2.9, MSE < 0.01%). In contrast, the analysis revealed moderate evidence against an effect of temporal attention (H2; one-sided t-test; t(34) = 0.52, p = 0.69, BF01 = 4.8, MSE = 0.04%) and anecdotal evidence against a synergistic effect of spatiotemporal attention (H3; one-sided t-test; t(34) = 1.45, p = 0.08, BF01 = 2.13, MSE = 0.03%) on perceptual sensitivity. Thus, only covert spatial orienting of attention enhanced participants’ ability to discriminate between the SOAs. The criterion analysis revealed no decision bias across cues (see Supplementary Results—No decision bias across cues). Finally, supplemental analyses showed no evidence for a differential effect of spatial or temporal orienting on sensitivity across foreperiods or trial blocks, supporting the idea that explicit (here, cue-based) temporal orienting does not enhance the temporal acuity of vision in our task. However, sensitivity increased with the duration of the foreperiod, suggesting an effect of implicit temporal orienting (i.e., hazard rate) on visual temporal acuity (Fig. S2; see Supplementary Results—Implicit temporal attention enhances sensitivity). Overall, these data support the hypothesis that covert spatial—but not temporal—orienting of attention improves the temporal acuity of vision (independence hypothesis H1).

Electrophysiological synergistic effects of spatial and temporal attentional orienting

Decades of research have highlighted the role of alpha-band (8–14 Hz) oscillations in accounting for attentional state, through top-down inhibitory and gating mechanisms61,62,63,64,65 implemented via inter-areal communication in the brain66. Parieto-occipital alpha-band power suppression reflects both spatial57,67,68,69,70,71 and temporal67,72,73 orienting of attention. In contrast, theta power has been found to increase during visual temporal orienting74 and implicit temporal expectations75. Some studies, nonetheless, suggest global power decreases during spatiotemporal orienting73. Thus, although the role of theta-band power in attentional orienting remains uncertain, parieto-occipital alpha-band power may mirror spatiotemporal orienting.

First, we evaluated whether the attentional cues significantly reduced the pre-target alpha power recorded over parieto-occipital areas (Fig. 3a–c). A one-way rANOVA with the factor Cue applied to the pre-target (−200 to 0 ms) alpha power revealed an effect of the Cue (F(3,90) = 6.39, p = 0.0006, η2p = 0.176, CI95% = [0.04, 0.30]). The contrast analysis revealed strong evidence for an effect of spatial attention (one-sided t-test; t(30) = 3.08, p = 0.002, Cohen’s d = 0.55, CI95% = [0.22, 0.93], BF10 = 9.0, MSE < 0.01%), anecdotal evidence for an effect of temporal attention (one-sided t-test; t(30) = 2.02, p = 0.026, Cohen’s d = 0.36, CI95% = [0.02, 0.77], BF10 = 1.14, MSE < 0.01%), and moderate evidence for a synergistic effect of spatiotemporal attention (H3; two-sided t-test; t(30) = 2.59, p = 0.014, Cohen’s d = 0.47, CI95% = [0.15, 0.83], BF10 = 3.27, MSE < 0.01%) on the pre-target alpha power.

Fig. 3: Synergistic effect of spatial and temporal orienting of attention in pre-target alpha-band and theta-band power suppression.
figure 3

a Time-frequency representations of the parieto-occipital oscillatory power suppression around the targets’ onset for each cueing condition. b Time series representing the alpha-band (8–14 Hz) power suppression through spatial, temporal, and spatiotemporal orienting of attention. c Means of spatial, temporal, and synergistic effect of cueing on alpha-band suppression (−200 to 0 ms pre-target interval). d Time series representing the (4–8 Hz) theta-band power suppression elicited by the spatial, temporal, and spatiotemporal orienting of attention. e Means of spatial, temporal, and synergistic effect of cueing on theta suppression (−200 to 0 ms pre-target interval). Topographic maps of the pre-target (−200 to 0 ms) (f) alpha-band and (g) theta-band power suppression over parieto-occipital areas induced by attentional orienting. The white circles indicate the locations of the electrodes used for statistical analysis. The analysis provides electrophysiological evidence for a synergistic effect of spatiotemporal attention. Error bars represent one confidence interval from the mean. Colored shaded areas represent one standard error of the mean. N = 31 participants.

Second, we evaluated the effects of the attentional cues on the parieto-occipital theta power (Fig. 3d–f). A one-way ANOVA with the factor Cue applied to the pre-target (−200 to 0 ms) theta (4–8 Hz) power revealed an effect of the Cue (F(3,90) = 8.58, p = 4.51 × 10−5; η2p = 0.222, CI95% = [0.07, 0.35]), such that the cues reduced the pre-target theta-band power. The contrast analysis revealed moderate evidence for an effect of spatial attention (one-sided t-test; t(31) = 2.89, p = 0.005, Cohen’s d = 0.50, CI95% = [0.02, 0.86], BF10 = 4.8, MSE < 0.01%), anecdotal evidence for an effect of temporal attention (one-sided t-test; t(31) = 2.13, p = 0.021, Cohen’s d = 0.38, CI95% = [0.03, 0.75], BF10 = 1.37, MSE < 0.01%), and strong evidence for a synergistic effect of spatiotemporal attention (H3; two-sided t-test; t(31) = 3.124, p = 0.004, Cohen’s d = 0.56, CI95% = [0.28, 0.85], BF10 = 9.90, MSE < 0.01%) on the pre-target theta power. Supplemental analyses indicate that attentional orienting in space, but not in time, affects pre-target beta-band (16–22 Hz) power suppression (Fig. S3, see Supplementary Results—Only spatial orienting reduces parieto-occipital beta power). Spearman correlations reveal no relationship between perceptual sensitivity and pre-target alpha (r Neutral cue = 0.17, p Neutral cue = 0.364; r Spatial cue = 0.20, p Spatial cue = 0.289; r Temporal cue = −0.03, p Temporal cue = 0.873; r Spatiotemporal cue = −0.01, p Spatiotemporal cue = 0.953), theta (r Neutral cue = −0.08, p Neutral cue = 0.679; r Spatial cue = 0.06, p Spatial cue = 0.769; r Temporal cue = −0.11, p Temporal cue = 0.579; r Spatiotemporal cue = −0.21, p Spatiotemporal cue = 0.269), or beta power (r Neutral cue = −0.02, p Neutral cue = 0.909; r Spatial cue = 0.13, p Spatial cue = 0.476; r Temporal cue = −0.15, p Temporal cue = 0.429; r Spatiotemporal cue = 0.07, p Spatiotemporal cue = 0.726).
Overall, this EEG analysis shows that participants benefited from all cues to orient their attention across space and time, and that spatial and temporal attention induce suppression of parieto-occipital oscillatory activities in a synergistic manner (synergistic hypothesis H3).
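The Spearman correlations used above are Pearson correlations computed on ranks. A minimal, stdlib-only sketch (handling ties by average ranks, and omitting the p-values that a statistics package would provide) might look like:

```python
from statistics import mean

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the
    ranks, with ties assigned their average rank. Illustrative
    sketch of the sensitivity-vs-power correlations above."""
    def ranks(v):
        order = sorted(range(len(v)), key=lambda i: v[i])
        r = [0.0] * len(v)
        i = 0
        while i < len(order):
            j = i
            # Extend j over a run of tied values.
            while j + 1 < len(order) and v[order[j + 1]] == v[order[i]]:
                j += 1
            avg = (i + j) / 2 + 1  # average rank, 1-based
            for k in range(i, j + 1):
                r[order[k]] = avg
            i = j + 1
        return r
    rx, ry = ranks(x), ranks(y)
    mx, my = mean(rx), mean(ry)
    num = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    den = (sum((a - mx) ** 2 for a in rx) *
           sum((b - my) ** 2 for b in ry)) ** 0.5
    return num / den

# Perfectly monotonic data gives r = 1.0:
r = spearman([1, 2, 3, 4], [10, 20, 25, 40])
```

Any strictly increasing monotonic relationship yields r = 1, and any strictly decreasing one yields r = −1, regardless of linearity.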

Non-synergistic effects of spatial and temporal attentional orienting on pupil dynamics

Pupillary dynamics reflect the deployment of selective attention76,77,78,79,80. Previous studies revealed modulation of pupil dilation during cue-based spatial orienting and implicit temporal orienting81,82. First, we evaluated whether pupil dynamics reflect the deployment of covert attention in our task. Then, we evaluated when this modulation starts. Here, accounting for the foreperiod is essential given the slow dynamics of pupil responses. A two-way ANOVA with the factors Foreperiod and Cue applied to the pre-target (−200 to 0 ms) pupil diameter revealed an effect of the Foreperiod (F(3,102) = 17.9, p = 1.95 × 10−4, η2p = 0.346, CI95% = [0.19, 0.46]), reflecting the basic observation that the longer the foreperiod, the more dilated the pupil following the cue offset. The analysis also revealed an effect of the Cue (F(3,102) = 7.59, p = 0.0001, η2p = 0.183, CI95% = [0.05, 0.30]; Fig. 4), but no interaction effect (F(9,306) = 0.8, p = 0.58). The contrast analysis revealed strong evidence for effects of spatial (one-sided t-test; t(34) = 3.37, p = 0.001, Cohen’s d = 0.57, CI95% = [0.43, 0.81], BF10 = 18.2, MSE < 0.01%) and temporal (one-sided t-test; t(34) = 3.43, p = 0.008, Cohen’s d = 0.58, CI95% = [0.39, 0.9], BF10 = 20.8, MSE < 0.01%) attention on pupil diameter, but no synergistic effect (one-sided t-test; t(34) = 0.29, p = 0.61, BF01 = 5.3, MSE = 0.04%). Spearman correlations reveal no relationship between perceptual sensitivity and pre-target pupil size (r Neutral cue = 0.15, p Neutral cue = 0.403; r Spatial cue = −0.07, p Spatial cue = 0.692; r Temporal cue = 0.06, p Temporal cue = 0.712; r Spatiotemporal cue = −0.09, p Spatiotemporal cue = 0.585).

Fig. 4: Spatial and temporal orienting of attention inhibit pupil constriction.
figure 4

a Pupil size around cue presentation for each cueing condition. b Pupil size around target onset for each cueing condition. Colored shaded areas represent one standard error of the mean. c Boxplot representing the increased pupil size during the pre-target time interval (−200 to 0 ms). Small data points represent individual data, while large data points represent the mean. Error bars represent one confidence interval from the mean. The analysis reveals that generating cue-based spatial and temporal expectations inhibits the pupillary light reflex. N = 35 participants.

Next, we evaluated pupil diameter around the cue presentation to characterize the emergence of the effect of attentional orienting on pupil dynamics (Fig. 4a). To do so, we first segmented pupil size during the cue presentation into bins of 100 ms. Then, we performed the contrast analysis for each bin and FDR-corrected the p-values for multiple testing. This exploratory analysis revealed that both spatial and temporal attention impede pupil constriction as early as 400 ms following the cue onset. A control analysis comparing task-induced and task-free pupil size indicates that the present effects cannot be accounted for by minimal differences in luminance across cues (Fig. S4). Also, an additional analysis suggests that these cue-based effects on pupil dynamics cannot be explained by distinct tonic pupil states (i.e., sustained states through trial blocks; Fig. S5, see Supplementary Results – Pupil diameter reflects attentional orienting beyond changes in luminance) possibly related to the block-wise presentation of the cues. Overall, this analysis provides evidence that the formation of spatiotemporal expectations from external cues transiently inhibits pupil constriction.
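The FDR correction applied to the per-bin p-values can be illustrated with a minimal Benjamini–Hochberg sketch; the function below is a hypothetical stand-in for the standard implementations shipped with statistics packages:

```python
def fdr_bh(pvals):
    """Benjamini-Hochberg adjusted p-values, as used to correct
    per-bin contrast tests across 100 ms bins. Returns the
    adjusted p-value for each input, in the original order.
    Minimal illustrative sketch."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    prev = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        prev = min(prev, pvals[i] * m / rank)
        adjusted[i] = prev
    return adjusted

# Hypothetical bins with p = 0.01, 0.04, 0.03, 0.50:
adj = fdr_bh([0.01, 0.04, 0.03, 0.50])
```

A bin is then flagged as significant when its adjusted p-value falls below the chosen alpha (e.g., 0.05).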

Discussion

The temporal acuity of vision reflects one’s capacity to detect short delays between events. The literature suggests that spatial attention can alter this temporal acuity in vision and, in turn, influence decision-making. However, existing evidence is inconclusive regarding how spatial and temporal attention drive visual temporal acuity under spatiotemporal uncertainty. Our results show that attentional cues inhibit pupil constriction prior to attentional orienting. Then, the orienting of attention in space and time modulates posterior electroencephalographic activities in a synergistic manner (hypothesis H3). However, only covert spatial attention enhances the temporal sensitivity to visual asynchronies (independency hypothesis H1). Altogether, these results highlight an integrated spatial-temporal mechanism of selective attention at the physiological level (EEG results) that has no apparent consequence for visual temporal acuity at the behavioral level. The present results provide evidence that explicit spatial but not temporal orienting of attention enhances the temporal acuity of vision, despite the effective deployment of spatiotemporal attentional mechanisms at the physiological level.

A recurrent debate in the vision and attention literatures concerns whether covert spatial attention improves4,20 or deteriorates the temporal resolution of vision1,3,4,83,84, also termed visual temporal acuity. At first sight, our results suggesting that spatial orienting enhances temporal acuity may appear to be at odds with the extant literature. In the following, we discuss alternative accounts and reconcile our findings with the literature. First, some previous results have been attributed to different speed-accuracy trade-offs85 and response strategies1,83. The delayed-response paradigm employed in our study limits the confound of varying speed-accuracy trade-offs across attentional states. Moreover, we found no evidence that attentional states influence response strategies (criterion) in our task. Hence, our results cannot be accounted for by either of these two arguments. However, a major difference between our investigation and the studies mentioned above is the stimulus location. Previous studies evaluated whether and how selective attention affects temporal acuity in central (i.e., foveal and parafoveal) vision. Here, stimuli were purposely presented beyond the locus of gaze fixation, that is, in the near peripheral visual field. Given that visual temporal acuity decreases with the eccentricity of stimuli86,87,88, one could assume that the benefits of endogenous attention are greater in peripheral than central vision. More specifically, recent evidence shows that the detection of asynchronies varies across the visual space89. While we did not compare perceptual sensitivity in peripheral versus central vision, the present data suggest that cue-based spatial (but not temporal) attention enhances temporal acuity in the periphery. This discrepancy between our results and the literature in terms of stimulus location may explain the differential effect of explicit attention orienting on visual temporal acuity. Yet, further explanations are also considered.

Indeed, previous experimental1,84,90 and computational91,92 evidence of a detrimental effect of spatial attention on the temporal acuity of vision could be explained by the nature of the attentional states (endogenous or exogenous) and the stimulus layout (or visual context). On the one hand, it has been proposed that covert spatial attention hinders the temporal acuity of vision through inhibitory interactions between the parvocellular system (main contributor to visual spatial acuity) and magnocellular system (main contributor to visual temporal acuity84,90,91,92). Arguably, transient spatial attention would favor parvocellular processing and thus inhibit magnocellular processing of visual information, resulting in a transient extension of the temporal windows of sensory integration and the impairment of one’s ability to discriminate stimuli in time. Given that alpha oscillatory power suppression is associated with enhanced information processing93, our EEG results rather fit with the idea of an increased activation of the dorsal magnocellular system, presumably responsible for the enhanced task performance. Importantly, the previously reported deleterious effect of spatial attention on temporal acuity was found in exogenous cueing paradigms (e.g., a transient flash involuntarily attracting our attention to a location) but not during endogenous spatial orienting (i.e., based on the voluntary orientation of attention to a location4). This difference is essential given that endogenous and exogenous attention rely on partially distinct neuronal systems94. The spatial cue in our task relies on a color difference, which arguably does not attract attention as automatically as a flash. In this sense, our results are consistent with the study by Hein et al., which showed that endogenous (but not exogenous) spatial attention facilitates visual temporal processing4. 
Another decisive parameter varying between studies concerns the spatial characteristics of the visual environment. Using a simultaneity/asynchrony judgment task with two distinct stimuli across space limits the potential effect of masking and could explain the disparity between previous findings and the current evidence that spatial orienting enhances temporal acuity (see details in Supplementary Note 1).

The role of attentional mechanisms in the parsing of visual signals into discrete events remains unclear95. Our main neurophysiological finding is that covert spatial and temporal attentional mechanisms synergistically interact through parieto-occipital alpha- and theta-band oscillations (see Supplementary Note 2 for a discussion). This synergy provides decisive support to the hypothesis that spatial and temporal attentional processes are interactive19,24,25 rather than independent6,16,17,96,97. Here, spatial and temporal attentional states seem to rely on inter-dependent nested oscillatory activities preparing, in a top-down manner, for the optimal temporal processing of visual targets. A neuronal gating mechanism61 reflected in the present alpha-band suppressions could depict such a top-down mechanism. However, our synergistic alpha suppression effect suggests that such gating mechanisms are at work not only when selective attention is directed across space70,98 but also when it is directed in time. In other words, alpha oscillations could reflect the deployment of location-specific and time-specific neuronal gating, potentially tapping into the parietal domain-general magnitude system99,100 to modulate the temporal acuity of vision where and when it is beneficial. The lack of behavioral benefit does not allow us to draw conclusions about this hypothesis, but further M/EEG studies using our task with varying cue validity would shed light on whether gating mechanisms induce perceptual trade-offs across both the spatial and temporal dimensions of vision. Moreover, the extent to which pupil dynamics truly reflect spatial and temporal attentional mechanisms remains to be clarified.

In this study, we found that generating spatial and temporal expectations from cues about forthcoming visual targets inhibits pupil constriction. Notably, the present cueing paradigm uses a neutral cue instead of an absence of cue to equate the level of arousal between attentional and non-attentional cueing conditions. While the discrepancy between task-induced and task-free pupil dynamics can be interpreted in terms of arousal, interpreting our cue-based effects in such terms seems questionable. Instead, it is most likely that the formation of spatiotemporal expectations inhibited the involuntary constriction of the pupils. In other words, preparing to attend to space or time interfered with the pupillary light reflex. The fact that attentional preparation influences pupil size is not surprising, but its effect has previously been found during pupil dilation101, not during the preceding pupillary light reflex. The difference is notable given that pupil dilation and constriction rely on different constituents of the autonomic nervous system, that is, the sympathetic and parasympathetic systems, respectively. Could spatiotemporal expectations act upon the parasympathetic system and, in turn, on pupil constriction? First, the prefrontal cortex could be critical in explaining our spatiotemporal expectation effect. Specifically, the frontal eye field (FEF) exerts a top-down effect through alpha oscillations on the intraparietal sulcus during spatial orienting102. Moreover, the FEF is involved in both attentional orienting103 and pupil control104. It has been proposed that the FEF could drive pupil constriction via its connections to sub-cortical circuits, including the superior colliculi105,106, the olivary pretectal nucleus, and the Edinger–Westphal nuclei107,108, which are part of the parasympathetic system. The formation of cue-based expectations may recruit this circuitry and, in turn, affect pupillary states.
As such, pupil dynamics may represent a readout of the formation of spatial and temporal expectations. Whether spatiotemporal expectations rely on the aforementioned cortico-subcortical circuits and whether implicit attention orienting leads to similar pupil inhibition remains to be investigated in further studies and could shed promising light upon the embodiment of attentional mechanisms.

Limitations

Finally, we should acknowledge some limitations of our study. First, the relatively short intervals separating our foreperiods could limit the generalization of our effects to other timescales. The field would gain from replicating the present results using longer foreperiods. Second, one may explain the current lack of effect of temporal orienting on temporal acuity by the close temporal proximity of the foreperiods. For instance, one could benefit more from a temporal cue when selective attention is sustained over several seconds rather than a second. However, the benefits of temporal orienting on visual performance occur on the timescale of a few hundred milliseconds7,26, that is, below the longest foreperiod used in our study. Also, while temporal attention cues are often less intuitive and explicit than spatial cues, our physiological measures attest to the effective orienting of attention in time during the task. Finally, the present results are limited to visual attention towards peripheral vision. Theories of visual attention would benefit from exploring whether the synergistic effects of spatiotemporal attention reported here generalize to temporal tasks performed in central vision.

Conclusion

Our results cast doubt on the consensual trade-off whereby spatial attention systematically enhances the spatial resolution of vision while impeding its temporal resolution. Not only does endogenous spatial orienting enhance the temporal acuity of vision, it can also synergistically interact with temporal orienting at the neurophysiological level. Furthermore, pupil dynamics appear to offer a readout of the formation of spatial and temporal expectations from cues. The lack of a synergistic effect at the behavioral level suggests a simple yet informative insight: orienting our attention towards task-relevant locations is not sufficient to benefit from temporal predictability and improve task performance. In other words, knowing “where” is not enough to benefit from knowing “when” about the visual targets and to increase temporal discrimination at the millisecond level. Altogether, we believe the present results call for the refinement of recent computational models of visual attention while providing further support to the recent proposal14,15 that the benefit of temporal attention on performance is task- and context-dependent.