Abstract
When we move through the environment, the direction of objects in the optic array changes, producing optic flow. To perceive world-relative object motion during self-motion, complex flow vectors are decomposed in a process called flow parsing. The real world and realistic VR environments contain abundant depth and distance cues, including size and binocular disparities. When targets move in various directions, these distance signals can potentially aid the flow parsing process. We designed two experiments in our wide-field stereoscopic environment. Participants observed target motions during visually simulated self-motion and indicated the direction of target motion with respect to a scene depicting a large room (Experiment 1) or a cluster of 3D objects (Experiment 2). Forward-backward and left-right target motions, as well as self-motions, were simulated. Optic flow and motion vectors were controlled across conditions to examine cues to target distance and motion in depth, such as binocular disparity and object size, and the change in these signals (e.g. looming, change in disparity, interocular velocity difference). During left-right locomotion through both environments, flow parsing gains were significantly lower for left-right than for forward-backward moving targets. Similarly, during forward-backward locomotion, left-right moving targets yielded significantly higher flow parsing gains than forward-backward moving targets. Overall, flow parsing gains were higher when self-motion and target motion were orthogonal to each other than when they were parallel. These findings provide evidence that depth and distance cues are integrated in perceiving world-relative object motion during self-motion. Availability of such signals improves the effectiveness of flow parsing.
Introduction
Optic flow refers to the global motion pattern in the optic array caused by the movement of the observer1. It is well known that optic flow is used in perception2,3,4, control5,6,7, and guidance8 of self-motion. Optic flow patterns have been shown to stimulate neurons in the MST region in visual systems of non-human primates9,10,11, and corresponding motion-sensitive areas have been identified in the human visual system12,13.
When perceiving the world-relative motion of an object during self-motion, the flow vector of the object is complex, as it is composed of the flow vectors of self-motion and object motion. How is this information processed to recover the world-relative object motion? The flow parsing hypothesis proposes that the scene-relative motion is obtained by subtracting the self-motion flow vector from the complex object motion flow vector14,15,16,17. Optic flow processing has been reported to play a primary role in the identification of scene-relative object movements18. Moreover, non-visual inputs, such as vestibular inputs, also contribute to flow parsing19,20,21. When visual and non-visual information were combined, discrimination thresholds improved19, and more of the self-motion component of the target motion was removed20. Flow parsing might not be exclusive to humans. Peltier et al.22 trained two macaque monkeys to perform direction discrimination with a saccade response and tested them on this task while presenting a global optic flow simulating forward self-motion. The results showed biases consistent with the subtraction of background optic flow predicted by the flow-parsing hypothesis, and vestibular inputs facilitated flow parsing in these two subjects22. Niehorster and Li16 proposed the flow parsing gain g, the ratio of the discounted flow vector to the true locomotion flow vector, as a measure of flow parsing completeness. In the same paper16, the authors found that the flow parsing gain g was roughly constant within a given scene, regardless of changes in self-motion speed or target motion speed.
An effective method for studying optic flow parsing is to present stimuli that include a target motion and a background optic flow, and to adjust the target motion to find the point of subjective equality (PSE), where the target is equally likely to be perceived as moving in either of two opposite directions. A PSE can also be called the point of subjective stationarity (PSS) when it is also the point at which the target appears not to move. Examples in which the PSE does not coincide with the PSS are provided by Niehorster and Li16 and Rushton et al.18. In their studies, a point probe moved obliquely upward, with its motion defined by horizontal and vertical motion components. The PSE was defined as the magnitude of the horizontal component at which the probe appeared to move vertically upward. Staircase methods can be used to find the PSE efficiently.
Psychophysical studies of flow parsing tend to use artificially generated stimuli to isolate variables and identify effects. In the context of optic flow and object motion perception, distance is a critical parameter, given that optic flow velocity is inversely proportional to egocentric distance. In the real world and in realistic-looking VR scenes, abundant distance and depth cues are available, including retinal image size, height in the visual field, perspective, binocular disparity, blur, vergence and accommodation23. Some of them, such as blur, motion parallax, and binocular disparity, are often considered cues to relative depth rather than absolute distance23. However, relative depth cues can be integrated over space or can change the perception of absolute distance24, so they should not be treated in isolation. Can distance and depth cues affect flow parsing? Empirical evidence indicates that binocular disparity contributes to the perception of scene-relative object motion. The experimental stimulus used by Rushton and Warren14 was manipulated such that binocular disparity was the only reliable cue to distance. Their results showed that participants were able to utilize binocular disparity to judge the distance order and to perceive the correct direction of the object motion. By comparing three viewing conditions, Guo and Allison25 found that binocular disparity improved the detection threshold and reduced the discrimination bias of target motions, possibly by segregating the object in depth from the background. Monocular depth cues, such as motion parallax, relative size, perspective, and occlusion, have also been found to aid flow parsing17.
Many of these distance and depth cues were included in the stimuli used in flow parsing studies, either as a variable or simply as part of the stimulus. Stereoscopic displays have been widely used to generate and present optic flow stimuli with binocular disparity14,15,16,21,25,26,27,28, and the effects of several monocular depth cues have also been examined17. However, flow parsing studies that use realistic-looking objects as targets are relatively rare. More commonly, point probes have been used as the target14,15,16,17,18. While not all studies provided exact details of the stimulus, the point probe target was generally described as a dot representing a spatial location. Thus, it is common and justified to present a small dot with a fixed size, regardless of the distance between the observer and the spatial location that the probe represents. It is also worth mentioning that some of these studies used definitively sized objects as surrounding scene objects (e.g.14,17). The likely reason for using a point probe is to avoid the ambiguity and complexity of an optic flow composed of background flow, flow caused by target motion, and flow caused by target looming, and thereby to better isolate the effect of interest. However, when other cues are unavailable, the observer has to rely on the cue provided (optic flow in this case). Even if an effect is found, it remains unclear whether, or to what extent, this cue is still relied upon in the real world. This is a common trade-off in psychophysics between isolating the effect of interest and validity in a cue-rich environment such as the real world; it is a specific instance of the general trade-off between internal and external validity.
Another major difference between the stimuli in flow parsing studies and the real world is that the real world is three-dimensional (3D). Three coordinates, that is, values on three axes, are required to define a point or a motion vector. A reasonable choice of the three axes is the canonical axes of the human body: the forward-backward axis, the left-right axis, and the vertical axis. The forward-backward axis is known as the anteroposterior (AP) axis, especially in biology and anatomy, or (less commonly) the sagittal axis. Here we call it the forward-backward axis, both because we refer to directions of self- and object motions and to avoid confusion with the sagittal plane.
Object motions that we perceive are compositions of motions on these three axes. The vast majority of studies have examined left-right movements, vertical movements, or some composition of both, as these motions can easily be represented on a monitor. Guo and Allison25 examined targets that moved forward and backward, but to the best of our knowledge, no flow parsing study has compared self- and target motions between multiple axes. As mentioned earlier, a point probe is a popular target choice in flow parsing studies. However, it is not the best choice for representing movement in depth, because when an object moves in depth its image size changes, whereas a point probe does not (and should not) change its size. While perspective-accurate targets have been used to examine flow parsing during obstacle avoidance28, visual and non-visual contributions21, and rotational motions26, these studies did not compute the flow parsing gain, nor did they compare multiple self- and object motion axes. This study aims to fill these gaps by examining flow parsing gains when the target and the observer move in forward-backward directions and frontoparallel (left-right) directions in scenes with abundant cues to distance and depth.
The present study uses visual stimulation to simulate self-motion through a 3D environment. As noted earlier, flow parsing processes are affected by vestibular and other non-visual self-motion inputs that contribute to building an accurate self-motion percept. Using visually simulated motion allows for investigation of the specific role of optic flow, with a controlled and repeatable visual stimulus and non-visual inputs held constant. Visually simulated motion is also a key stimulus in mediated perception, such as in immersive cinema and virtual reality. Virtual reality (VR) technology can create an immersive experience, but it can also produce side effects such as cybersickness29,30 and distorted spatial perception31. How movements affect interaction in VR is a key area that requires further research32, and one of the challenges is supporting natural behavior and perceptual judgements despite the limits on actual motion imposed on the user. This is particularly relevant for immersive training and serious game applications, where skill transfer to the real world is critical and actual travel must be visually simulated. Understanding users' abilities and limitations when parsing a moving scene into self- and object motion is critical to successful scenarios. The present study provides information to VR researchers and developers on user tolerances and perceptual biases that can help optimize perception when developing interactive VR content.
General methods
All methods were performed in accordance with relevant guidelines and regulations.
Apparatus
The stimuli for both Experiment 1 and Experiment 2 were displayed in our Wide-field Stereoscopic Environment (WISE, see Fig. 1). It consisted of eight Christie Mirage WU-L projectors, each projecting at 120 Hz (60 Hz per eye) in full HD. The projections were hardware-synced and blended. Participants wore active shutter glasses to view the scene. Three WorldViz PPT-N infrared cameras tracked head movements by capturing the IR markers (WorldViz PPT Eyes) attached to the active shutter glasses. The 3D head position was calculated by WorldViz PPT Studio and passed to the VR software WorldViz Vizard, which controlled the multi-projection mapping. The experimental program was developed in Python 3 with the WorldViz Vizard 7 VR toolkit and the PsychoPy package33.
The wide-field stereoscopic environment at York University. Participants sat in the seat to view the projective display. The seat was slid forward into the centre of the display, providing a full-field visual stimulus. During experiments, the lights in the lab were turned off to maintain high contrast and eliminate extraneous visual features.
Design and procedure
The task in both Experiment 1 and Experiment 2 was to discriminate the direction of the target’s motion during simulated self-motion. The possible directions of the target always lay on a given axis, which was announced to the participant at the beginning of each block of trials. Our design included four self-motion and target motion directions: forward, backward, left and right. These directions can be grouped by axis (forward-backward and left-right), or they can be grouped into away and toward (A/T) directions. Take self-motion as an example: there are two self-motion directions on each axis; one moves the observer away from the target, and the other brings the observer closer to (thus toward) the target. Thus, we adopted a 2 (self-motion axes) \(\times\) 2 (away/toward self-motion directions) \(\times\) 2 (target motion axes) \(\times\) 2 (eccentricity levels) \(\times\) 2 (sides of target) within-participant design. Note that only the A/T self-motion direction was listed as a factor, not the A/T target motion direction, because we set the PSE of the target motion as our dependent variable. The PSE on a given axis must be unique and includes both a speed and an A/T direction of motion.
The four combinations of self-motion and target motion axes were tested separately in four blocks of trials, and their order was counterbalanced among the participants. All other variables (A/T direction, eccentricity and side) were randomly interleaved across trials in all blocks. We chose the forward-backward axis and the left-right axis as our two self- and target motion axes (see Fig. 2) because they lie orthogonally in the horizontal plane, and humans live and interact with objects on a near-horizontal surface, the ground. We left out the less common vertical axis to keep the experiment within a reasonable time frame, avoiding fatigue and ensuring the validity of the data.
Top-down illustration of possible directions on the forward-backward axis and the left-right axis. In this illustration, the target is on the right of the observer. Both of our experiments included trials with a left target and trials with a right target.
To obtain the PSE efficiently34, we adopted the up-down staircase method. Our method of obtaining a PSE with adaptive staircases was inspired by the motion nulling method adopted in previous studies16,18,20, with some modifications to fit our study. The staircases governed the A/T direction and speed of the target motion on a given axis for a given trial, and the mean value of the final five trials was taken as the PSE. At the end of the staircase sequence, participants reported that the task was difficult because the target did not appear to move, or barely moved, in the direction being judged; this is typical in staircase procedures. The forward and rightward directions were defined as positive on their respective axes, for the purpose of preparing and rendering stimuli only. Each staircase started at a random speed between −1 m/s and 1 m/s, and the step size started at 1 m/s. After each reversal, the step size was halved. Each block contained eight staircases, one for each combination of self-motion direction, side of the target position (left/right), and eccentricity level. Each staircase ended after 30 trials, so each block contained a total of 240 trials. There were four blocks, so the total number of staircases was 32 and the total number of trials per observer was 960. At the beginning of each block, after acknowledging the target motion directions of the block (left-right or forward-backward), participants performed a practice run of 40 trials under the same conditions as the experimental block; practice data were discarded. Between blocks of trials, participants took a break of at least two minutes and were allowed to take longer breaks if necessary.
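To make the procedure concrete, the following is a minimal sketch of one such staircase (our own illustration, not the actual experiment code; the respond callback stands in for a participant's 2AFC report):

```python
# Minimal sketch of one up-down staircase as described above: the signed
# target speed on the judged axis starts at a random value in [-1, 1] m/s,
# each response nulls the reported motion by stepping the speed in the
# opposite direction, the step size halves at every reversal, and the PSE
# is the mean of the final five trial values.
import random

def run_staircase(respond, n_trials=30):
    """respond(speed) -> +1 or -1, the direction the participant reports."""
    speed = random.uniform(-1.0, 1.0)   # signed speed (m/s) on the judged axis
    step = 1.0                          # initial step size (m/s)
    last_response = None
    history = []
    for _ in range(n_trials):
        history.append(speed)
        response = respond(speed)       # +1: positive direction perceived
        if last_response is not None and response != last_response:
            step /= 2.0                 # halve the step after each reversal
        last_response = response
        speed -= response * step        # step against the reported direction
    return sum(history[-5:]) / 5.0      # PSE: mean of the final five speeds

# Usage with a simulated noiseless observer whose true PSE is +0.2 m/s:
pse = run_staircase(lambda s: 1 if s > 0.2 else -1)
```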
Data analysis
From the PSE, the flow parsing gain (g) for each condition was calculated to quantify participants’ performance; it is thus our dependent variable. According to the definition by Niehorster and Li16, g is the ratio of the discounted flow vector to the locomotion-defined flow vector. When \(g=0\), self-motion is ignored in the perception of object motion, and the perceived motion is determined only by changes in egocentric direction. When \(g=1\), the world-relative target motion is perceived perfectly. As g was initially proposed to measure the completeness of parsing a frontoparallel motion, we expand the definition of flow parsing gain to incorporate both forward-backward and left-right self- and target motions, enabling comparison across axes.
Figure 3 illustrates how g was calculated in this study. All targets were at eye level and moved horizontally (i.e. forward-backward or left-right). In both panels of Fig. 3, \(\alpha\) denotes the movement of the target (angular movement in terms of the change in egocentric direction) when the target remains stationary in the scene, and \(\beta\) denotes the movement when the target is perceived to be stationary in the scene (i.e., at PSE).
We can then obtain the average flow vectors by dividing the above motion vectors by the time \(t\):

$$v_{\alpha }=\frac{\alpha }{t}, \qquad v_{\beta }=\frac{\beta }{t}$$

where \(v_{\alpha }\) denotes the locomotion component, and \(v_{\beta }\) denotes the target motion component of the flow vector at PSE. With them, we obtain g as follows:

$$g=\frac{v_{\beta }}{v_{\alpha }}$$

which can be simplified to:

$$g=\frac{\beta }{\alpha }$$
When flow parsing is complete, the bias is 0, so \(\beta =\alpha\) and \(g=1\). When flow parsing does not occur, the bias equals the locomotion component, so \(\beta =0\) and \(g=0\).
Our definition resembles that of Niehorster and Li16, with modifications to make the direction of the motions explicit and to fit our experiment design. Their nulling component16 was the retinal motion component added to the point probe at PSE. Since we manipulated the linear motion of a 3D object in space, we obtained linear biases (CD in Fig. 3) and calculated the corresponding angular motion with respect to the observer (\(\alpha -\beta\)). The meaning of g was kept unchanged. Since the targets were always at eye level, the directions of \(\alpha\) and \(\beta\) were always horizontal, either inward or outward. Usually, \(\beta\) and \(\alpha\) are in the same direction (e.g., outward during forward self-motion) and \(|\beta |<|\alpha |\), which makes \(0<g<1\). It is worth noting that this analysis method is not restricted to any specific direction of self- or target motion (e.g. left-right, forward-backward, vertical). However, in the present experiments, only motions along the forward-backward axis or the left-right axis were examined.
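The computation can be summarized in a few lines. The sketch below (our own illustration, not the authors' analysis script) evaluates \(g=\beta /\alpha\) directly from the Fig. 3 geometry:

```python
# Sketch of the gain computation implied by Fig. 3. Positions are 2D (x, z)
# coordinates in the horizontal plane at eye level; egocentric direction is
# measured from straight ahead. A -> B is the observer's motion, C is the
# target's start position, and D is the target's end position at PSE.
import math

def egocentric_direction(observer, point):
    return math.atan2(point[0] - observer[0], point[1] - observer[1])

def flow_parsing_gain(A, B, C, D):
    phi0 = egocentric_direction(A, C)           # initial target direction
    alpha = egocentric_direction(B, C) - phi0   # scene-stationary target
    beta = egocentric_direction(B, D) - phi0    # target at PSE
    return beta / alpha

# Example: the observer advances 0.7 m toward a target 3 m away at 10 deg.
# A PSE target that is objectively scene-stationary (D == C) yields g = 1;
# one whose egocentric direction never changes would yield g = 0.
C = (3 * math.sin(math.radians(10)), 3 * math.cos(math.radians(10)))
print(flow_parsing_gain((0, 0), (0, 0.7), C, C))  # -> 1.0
```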
We expected that the self-motion axis and the target motion axis (left-right or forward-backward) might affect the flow parsing gain. Thus, these were our primary independent variables of interest. We also analyzed the effect of A/T self-motion direction, eccentricity, and side. The left/right A/T self-motion direction was corrected based on the side of the target in that trial.
The flow parsing gains were computed in Python, and the data were then fed into R for model fitting and statistical tests. We first performed an analysis of variance (ANOVA) of the effects of locomotion axis, target motion axis, locomotion direction, eccentricity, and side of the target on the flow parsing gain, using the anova_test function from the rstatix package; results were Greenhouse-Geisser corrected. We then fitted linear mixed-effects models to our data in R using the lme4 package and computed estimated marginal means (EMMs) for contrast comparisons with the emmeans package. The p-values were adjusted with Tukey’s method.
Illustrations of the variables in the flow parsing gain calculation. Left panel: parallel motion. Right panel: orthogonal motion. The participant moves from A to B, while the target starts at C and moves towards E. D is the point of subjective equality (PSE), i.e. when the target moves from C to D, it is perceived as stationary on the given axis. The flow parsing gain g is obtained by \(g=\frac{\beta }{\alpha }\). Only two examples of forward locomotion are shown in this figure.
Experiment 1
Participants
We recruited participants from students and lab members at York University. Eight participants (male: 5, female: 3) volunteered to participate. Their ages ranged between 18 and 40 (\(M=26.75, SD=7.05\)). All participants had experience with VR environments but were naive to the purpose of this study. All participants had normal or corrected-to-normal vision and stereoacuity better than 100 arcseconds, tested with the stereo fly test (https://www.stereooptical.com). Their IPDs were measured, and our apparatus was adjusted accordingly. This study was approved by the Office of Research Ethics at York University (Certificate No. e2024-279). All participants provided informed consent in accordance with the Declaration of Helsinki.
Stimuli
The stimulus is shown in Fig. 4. The scene was a room consisting of a white tiled ceiling, floor, and walls. The room was 6 m (width) \(\times\) 6 m (length) \(\times\) 3.4 m (height). Participants started at the south end of the room, facing the north wall. A fixation cross was placed at eye level 6 m ahead. Participants were instructed to keep their gaze on the fixation cross whenever it was visible.
Participants experienced visually simulated left-right or forward-backward self-motion and performed a two-alternative forced-choice (2AFC) task. In each trial, they discriminated between two opposite directions (e.g. forward vs backward, or left vs right) of scene-relative target motion. At the beginning of a trial, the room, a red fixation cross and the target were presented (see Fig. 4). The height of the viewpoint was 1.37 m, approximately the eye height while sitting on the provided seat (including the height of the seat above the platform on which it was located). The target, an orange ball with a diameter of 0.22 m, appeared three meters away from the observer at 10\(^\circ\) or 20\(^\circ\) eccentricity to the left or right. The target was placed at eye level, so that its motion vector in the observer’s optic array always projected onto the same horizontal line, regardless of its motion direction.
The target and the observer stayed stationary for 0.5 s before they started moving left-right or forward-backward. The self-motion and the target motion started simultaneously. The locomotion speed in the forward or backward direction was 1.4 m/s, regardless of the target eccentricity. Left-right locomotion speed was 0.26 m/s when the target started at 10\(^\circ\) and 0.54 m/s when the target started at 20\(^{\circ }\). The locomotion speeds were chosen carefully, such that the angular motion (in terms of change in egocentric direction) of a scene-stationary target was matched between the left-right and the forward-backward locomotion conditions. Note that to match the angular motion of a scene-stationary target, the locomotion speeds in 3D space had to differ in magnitude. This difference existed both between opposite directions on the same axis and across axes, but the difference between opposite directions was much smaller than the difference across axes. To maintain consistency within a locomotion axis, the locomotion speed was averaged across the two opposite directions of that axis. After 0.5 s of motion, the screen was replaced by a uniform gray, indicating that the participant should report the perceived direction of the scene-relative target motion by pressing the corresponding button on an Xbox controller.
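The following sketch (our own illustration, not the experiment code) verifies this matching under the stated geometry (target 3 m away at eye level, 0.5 s of motion):

```python
# Sketch of the speed matching: the angular displacement of a
# scene-stationary target, averaged over the two opposite directions on an
# axis, is approximately equal for 1.4 m/s forward-backward locomotion and
# for 0.26 m/s (10 deg) or 0.54 m/s (20 deg) left-right locomotion.
import math

def angular_change(ecc_deg, dx, dz, dist=3.0):
    """Change in egocentric direction after the observer moves by (dx, dz)."""
    tx = dist * math.sin(math.radians(ecc_deg))
    tz = dist * math.cos(math.radians(ecc_deg))
    before = math.atan2(tx, tz)
    after = math.atan2(tx - dx, tz - dz)
    return abs(math.degrees(after - before))

def mean_abs_change(ecc_deg, speed, axis):
    d = speed * 0.5  # displacement over 0.5 s of self-motion
    moves = [(0, d), (0, -d)] if axis == "fb" else [(d, 0), (-d, 0)]
    return sum(angular_change(ecc_deg, *m) for m in moves) / 2

print(mean_abs_change(10, 1.40, "fb"))  # ~2.4 deg
print(mean_abs_change(10, 0.26, "lr"))  # ~2.4 deg
print(mean_abs_change(20, 1.40, "fb"))  # ~4.8 deg
print(mean_abs_change(20, 0.54, "lr"))  # ~4.8 deg
```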
Example annotated stimulus of Experiment 1. This example shows a trial with simulated forward self-motion while the ball has a rightward motion vector. Note that this motion can be interpreted either as rightward motion or forward-backward motion in which the ball approaches the observer.
Results
The results are shown in Fig. 5. A repeated measures ANOVA found a significant effect of locomotion axis on flow parsing gain g (F(1,7) = 74.081, \(p<.001\), \(\eta _p^2\) = 0.914). We did not find a significant effect of target axis (F(1,7) = 2.782, \(p=.139\), \(\eta _p^2\) = 0.284), but there was a significant locomotion axis \(\times\) target motion axis interaction (F(1,7) = 37.973, \(p<.001\), \(\eta _p^2\) = 0.844).
Interestingly, the away/toward direction did not affect g significantly (F(1,7) = 0.711, \(p=.427\), \(\eta _p^2\) = 0.092). Moreover, neither side (F(1,7) = 3.557, \(p=.101\), \(\eta _p^2\) = 0.337) nor eccentricity (F(1,7) = 0.012, \(p=.916\), \(\eta _p^2\) = 0.002) of the target affected flow parsing gain significantly.
Contrast comparisons were performed to investigate the interaction between locomotion axis and target motion axis more closely. The results are presented in Fig. 5. During left-right locomotion, forward-backward moving targets had higher flow parsing gains (t(245) = −11.070, \(p<.001\)). During forward-backward locomotion, left-right moving targets had higher flow parsing gains (t(245) = 8.808, \(p<.001\)). Furthermore, for left-right moving targets, g did not differ significantly between left-right and forward-backward locomotion (t(245) = −1.679, \(p=.095\)). However, for forward-backward moving targets, g was significantly higher during left-right locomotion than during forward-backward locomotion (t(245) = 18.200, \(p<.001\)). In summary, the self-motion axis had a large effect on forward-backward moving targets but not on left-right moving targets.
Results of Experiment 1. Gains were calculated from the PSE for each condition. A gain of 1 corresponds to a PSE which is objectively stationary with respect to the scene, while a gain of 0 corresponds to a PSE in the same direction and amplitude as the observer self-motion. Data were collapsed across self-motion A/T direction, target side and eccentricity in computing the averages. Error bars represent standard error of the mean. *: \(p<.05\), **: \(p<.005\), ***: \(p<.0005\).
Experiment 2
The scene in Experiment 1, the room, was a fixed scene, and its walls were always located at a greater distance than the target. Observers may also have experienced similarly sized rooms, as the dimensions of the room were within a typical range. In Experiment 2, we adopted a more dynamic, laboratory-style scene composed of multiple floating objects, an approach previously adopted in some flow parsing experiments14,16. The results of Experiment 2 may therefore be more directly comparable to those studies. Furthermore, we hoped to bridge the gap between conventional VR scenes and laboratory scenes with conclusions drawn from the results of both Experiments 1 and 2.
Participants
We recruited participants in the same way as in Experiment 1. Eight participants (male: 2, female: 5) volunteered to participate. Their ages ranged between 19 and 34 (\(M=26.25, SD=4.58\)). Three of them had also participated in Experiment 1; however, none of them performed both experiments on the same day.
Stimuli
As shown in Fig. 6a, the background in Experiment 2 was a cluster of blue and green cubic boxes with a side length of 0.2 m (also called scene objects), instead of the room of Experiment 1. Compared to the walls of the room, the scene objects were closer to the target object, and some of the scene objects could be randomly placed closer to the observer than the target object. In each trial, 20 boxes were placed in front of the observer. Their x, y and z coordinates were randomly chosen inside a space of 2 m (length) \(\times\) 4 m (width) \(\times\) 2 m (height) to the left or right in front of the observer (illustrated in Fig. 6b). We selected this range of possible positions so that the scene objects surrounded the target but generally would not occlude or collide with it. However, because we adopted the adaptive staircase method, the ball's movement speed in each trial depended on the participant's previous responses. In addition, the step size was programmed to decrease as the number of reversals increased. Thus, although it was possible for the target to collide with or be occluded by the scene objects, in practice occlusion and collision were rare (typically no more than a few times across all trials) and limited to the beginning of the staircases, when the stimulus was far from the PSE. Moreover, the physics simulation was turned off, so the ball would pass through any object in the case of a collision.
Stimulus of Experiment 2. (a) An example of the actual stimulus. (b) Size and location of the space in which the scene-object cloud was randomly spawned in each trial of Experiment 2. The red sphere represents the observer’s viewpoint. The scene objects were randomly placed in the space bounded by the light blue cuboids. Grid spacing on the ground was 0.5 m; neither the grid nor the bounding cuboids were visible to the observer.
The starting positions and the motion profiles of both the target and the observer were the same as in Experiment 1.
Results
The results of Experiment 2, shown in Fig. 7, were generally consistent with the results of Experiment 1, with some differences. There was a similar significant interaction effect on g between locomotion axis and target axis (F(1,7) = 17.652, \(p = .004\), \(\eta _p^2\) = 0.716). Also as in Experiment 1, the ANOVA for Experiment 2 found no significant effect of eccentricity (F(1,7) = 0.099, \(p = .762\), \(\eta _p^2\) = 0.014), side of the target (F(1,7) = 1.11, \(p = .327\), \(\eta _p^2\) = 0.137), or whether the locomotion was away from or toward the target (F(1,7) = 0.372, \(p = .561\), \(\eta _p^2\) = 0.05). However, unlike in Experiment 1, neither locomotion axis (F(1,7) = 1.686, \(p = .235\), \(\eta _p^2\) = 0.194) nor target motion axis (F(1,7) = 5.537, \(p = .051\), \(\eta _p^2\) = 0.442) had a significant main effect in Experiment 2.
To further investigate the interaction effect, we performed a contrast comparison for locomotion axis and for target motion axis. For left-right moving targets, g was lower during left-right self-motion than during forward-backward self-motion (t(245) = -5.533, \(p<.001\)). Conversely, for forward-backward moving targets, g was higher during left-right self-motion than during forward-backward self-motion (t(245) = 7.184, \(p<.001\)). Furthermore, during left-right self-motion, g was lower for left-right moving targets than for forward-backward moving targets (t(245) = -3.206, \(p = .002\)). During forward-backward self-motion, g was higher for left-right moving targets than for forward-backward moving targets (t(245) = 9.510, \(p<.001\)).
The g for left-right moving targets during forward-backward self-motion was 1.06, an unusually high value inconsistent with the incomplete flow parsing (corresponding to \(g<1\)) that has been reported repeatedly16,20,35. We performed an additional one-sided t-test and did not find the result to be significantly greater than 1.0 (t(63) = 1.08, \(p = .141\)).
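For reference, a one-sided test of this kind can be run as sketched below (simulated stand-in data, since the per-staircase gains live in the data repository; in Experiment 2 each condition comprises 64 staircases, 8 per participant):

```python
# Sketch of a one-sided one-sample t-test against g = 1.0 using SciPy.
# The gains here are simulated stand-ins (mean ~1.06), not the real data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
gains = rng.normal(loc=1.06, scale=0.45, size=64)  # 64 simulated gains
res = stats.ttest_1samp(gains, popmean=1.0, alternative="greater")
print(res.statistic, res.pvalue)
```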
Results of Experiment 2. Gains were calculated from the PSE for each condition. A gain of 1 corresponds to a PSE which is objectively scene-stationary on the given axis, while a gain of 0 corresponds to a PSE in the same direction and amplitude as the observer self-motion. Error bars represent standard error of the mean. *: \(p<.05\), **: \(p<.005\), ***: \(p<.0005\).
Discussion
Flow parsing gains in both experiments were generally between 0 and 1. The only exception, out of eight combinations of conditions in two experiments, occurred in Experiment 2 when participants viewed left-right moving targets during forward-backward self-motion. A range \(0<g<1\) has been observed in multiple previous flow parsing studies16,25,36, and is consistent with an underestimated self-motion. Underestimations of self-motion speed in VR have been reported repeatedly37,38,39. However, it should be noted that there is a debate as to the role of self-motion perception in perceiving object motion during self-motion26. For example, one could argue that self-motion estimation is unnecessary for flow parsing, and that the process requires only vector calculation. Moreover, perceived background flow might not be fully utilized in flow parsing36, let alone perceived self-motion. Nevertheless, it is difficult to dispute that self-motion signals are involved in flow parsing, if optic flow is regarded as a self-motion signal.
The flow parsing gains in Experiment 2 were generally larger than those in Experiment 1. It is not too surprising that they differ, as g is known to change between different scenes, and it increases as the magnitude of optic flow increases16. It is worth noting that in Experiment 2, the gains in the orthogonal conditions were close to 1. In particular, g for left-right moving targets during forward-backward self-motion was greater than 1, although not significantly so. Rushton et al.40 noted a pop-out effect whereby a moving object was easier to identify among a cluster of objects. The pop-out effect might have helped the target in Experiment 2 capture attention more effectively, thereby indirectly improving performance.
Alternatively, the differences in performance might be due to different spatial relations to the surrounding features. In terms of distances, there are two differences between the two experiments: (i) the scene objects (Experiment 2) were closer to the observer than the walls (Experiment 1), and (ii) the scene objects (Experiment 2) were closer to the target than the walls (Experiment 1). The magnitude of optic flow scales inversely with distance, so the surrounding flow was faster in Experiment 2. However, the faster surrounding optic flow in Experiment 2 does not explain the higher g, as g has been found to be insensitive to changes in optic flow speed16. Instead, we believe that other differences in the scene made flow parsing more accurate in Experiment 2. Such differences may include the closer target-background distance in Experiment 2, the structural difference between the scenes, or differences in self-motion perception.
Interestingly, in both experiments we found interaction effects between target motion axis and locomotion axis on flow parsing gains. Specifically, for any motion (self-motion or target motion) on a given axis, the flow parsing gain was higher when the other motion (target motion or self-motion) was orthogonal to it. This pattern was fully observed in Experiment 2 and partially observed in Experiment 1: for left-right moving targets in Experiment 1, the flow parsing gain was larger during forward-backward locomotion, consistent with an orthogonal enhancement, but the difference was not significant. It should be noted that the orthogonality existed only in 3D space; the target motion vector always projected onto the same horizontal line in the observer’s optic array. This result contrasts with previous findings of flow parsing gain constancy16 within the same scene.
It is worth noting that the foci of expansion (FoE) of the optic flow generated by forward-backward self-motion and by left-right self-motion were 90\(^\circ\) apart, so objects with the same eccentricity in these two self-motion conditions were at different separations from the FoE. It was not possible to control for both eccentricity and FoE-relative location simultaneously: when FoE-relative location is controlled, eccentricity varies. However, the parallel and orthogonal comparisons within the same locomotion axis do not suffer from this complication. Indeed, the orthogonal enhancement was present for both self-motion axes in both experiments.
While eccentricity from the orthogonal FoEs necessarily varied, the target motion could be, and was, controlled. To explain, take a typical inward PSE with outward optic flow (forward-moving observer) as an example. In 3D space, this PSE is consistent with a forward-moving target (in the forward-backward condition) or an inward-moving target (in the left-right condition), and a g between 0 and 1 in both cases. For a fixed g, the target motion (in terms of change in egocentric direction) remains the same regardless of its axis of motion, as the bias in angular terms remains the same. However, the change in distance varies, and so do the distance and depth cues. Table 1 shows examples of the change in Euclidean distance (\(\Delta d\), in m and as a percentage of the initial target distance) under selected representative conditions when \(g = 0.5\). When the observer and target move on orthogonal axes, the \(\Delta d\)s are an order of magnitude greater than when the observer and target move on parallel axes.
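This geometry can be reproduced with a short computation. The sketch below (our own reconstruction, not the authors' script; exact values depend on the condition chosen) solves for the target's PSE endpoint at \(g = 0.5\) and reports \(\Delta d\) for one representative case, left-right self-motion toward a target at 10\(^\circ\):

```python
# Sketch of the Delta-d geometry: given g = 0.5, solve for the target's PSE
# endpoint D on each motion axis, then compare the change in observer-target
# distance for parallel vs orthogonal target motion. Condition from the
# Methods: left-right self-motion at 0.26 m/s, target 3 m away at 10 deg,
# 0.5 s of motion.
import math

A = (0.0, 0.0)                          # observer start (x, z)
B = (0.13, 0.0)                         # observer end: 0.26 m/s * 0.5 s rightward
C = (3 * math.sin(math.radians(10)),    # target start: 3 m away, 10 deg right
     3 * math.cos(math.radians(10)))

def direction(obs, pt):
    return math.atan2(pt[0] - obs[0], pt[1] - obs[1])

g = 0.5
phi0 = direction(A, C)
alpha = direction(B, C) - phi0          # angular motion of a scene-stationary target
phi_pse = phi0 + g * alpha              # egocentric direction of D as seen from B

# Parallel target motion (left-right axis): z is fixed, solve for x.
D_par = (B[0] + (C[1] - B[1]) * math.tan(phi_pse), C[1])
# Orthogonal target motion (forward-backward axis): x is fixed, solve for z.
D_orth = (C[0], B[1] + (C[0] - B[0]) / math.tan(phi_pse))

for name, D in [("parallel", D_par), ("orthogonal", D_orth)]:
    delta_d = math.dist(B, D) - math.dist(A, C)
    print(f"{name}: {delta_d:+.3f} m ({100 * delta_d / math.dist(A, C):+.1f}%)")
# parallel: ~ -0.01 m (-0.4%); orthogonal: ~ -0.43 m (-14.5%)
```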
Therefore, our result is evidence that, with a target that represents an actual object in space (rather than a point), perception of world-relative object motion does not rely solely on local optic flow vectors. The observers were able to use signals of the change in distance between themselves and the target to perceive the target motion more effectively, achieving higher flow parsing gains. This is consistent with previous conclusions that optic flow alone might be insufficient for accurate perception of scene-relative motion16, and that flow parsing is more accurate under ecologically valid conditions35.
In our task, the change in distance manifests itself in changes in target size (looming) and in binocular depth cues. The target was always rendered at eye level, so height in the visual field cannot account for the difference; neither can the shape of the target, as it was spherical. Binocular motion-in-depth cues include change-in-disparity (CD) and interocular velocity difference (IOVD)41,42. Vergence and accommodation were unlikely to have played a role, because we asked observers to keep their gaze on a stationary fixation cross positioned at a far distance. This conclusion agrees with previous studies14,25 showing that stereopsis also contributes to flow parsing.
The aforementioned pictorial and binocular depth cues are also known to contribute to the detection of collisions and the estimation of the time-to-contact (TTC)43,44. An eccentric target will not collide with the observer when the target and the observer move in parallel, but they could collide when they are moving orthogonally. In other words, TTC is not defined in the parallel scenario in our study, leaving it without a role in the estimation process. Our result that the perception of the scene-relative target motion was more accurate for orthogonal rather than parallel motions indicates that these two tasks (estimating TTC and flow parsing) likely share some common processes.
While we provide evidence for the role of depth cues, our results also confirm that optic flow plays a role in the task. If egocentric distance alone dictated perception, we would have observed \(g = 0\) when self-motion and target motion were parallel, which was not the case. This is because when both motions are parallel, the only way to keep a finite egocentric distance of the target unchanged is to also keep its egocentric direction unchanged (i.e. no relative movement at all between the target and the observer). Moreover, our findings that the flow parsing gain was not altered by changes in eccentricity, side of the target, or A/T locomotion direction add to the constancy properties reported previously16.
For the left-right locomotion conditions, we scaled the self-motion speed jointly with target eccentricity to keep the average angular motion of a scene-stationary target the same as in the forward-backward locomotion conditions. This was done to ensure that the proximal stimuli were comparable between the forward-backward and left-right locomotion conditions. In fact, had we kept the 1.4 m/s self-motion speed across all conditions, the target would have crossed the observer’s mid-sagittal plane in some cases. We do not believe that the different self-motion velocities affected the results, as the ANOVA did not support an effect of eccentricity in either Experiment 1 or Experiment 2, despite the velocity differing between the two eccentricity levels in the left-right locomotion conditions. In addition, it has been shown that the speed of self-motion does not affect g16.
In summary, we investigated scene-relative target motion perception across all four combinations of left-right and forward-backward self- and target motion conditions, while controlling the optic flow and target motion vectors. The accuracy of perceived scene-relative target motion, measured as flow parsing gain, was generally better when the axes of self- and target motion were orthogonal. Our results provide further evidence that distance signals are integrated in the flow parsing process along with optic flow. Based on this, we recommend that VR researchers and developers consider environments and layouts that provide this information when designing interactive content for tasks, such as driving, that require accurate object motion judgments during simulated self-motion.
Data availability
The datasets generated during and/or analysed during the current study are available in the Borealis repository, https://doi.org/10.5683/SP3/SHT1HE.
References
Gibson, J. J. The Perception of the Visual World. (Houghton Mifflin, 1950).
Lappe, M., Bremmer, F. & Van Den Berg, A. V. Perception of self-motion from visual flow. Trends Cogn. Sci. 3, 329–336. https://doi.org/10.1016/S1364-6613(99)01364-9 (1999).
Frenz, H. & Lappe, M. Absolute travel distance from optic flow. Vision. Res. 45, 1679–1692. https://doi.org/10.1016/j.visres.2004.12.019 (2005).
Lappe, M., Jenkin, M. & Harris, L. R. Travel distance estimation from visual motion by leaky path integration. Exp. Brain Res. 180, 35–48. https://doi.org/10.1007/s00221-006-0835-6 (2007).
Rogers, B. Optic Flow: Perceiving and Acting in a 3-D World. i-Perception 12, 1–25. https://doi.org/10.1177/2041669520987257 (2021).
Warren, R. & Wertheim, A. H. Perception and Control of Self-motion (Psychology Press, 1990).
Warren, W. H., Kay, B. A., Zosh, W. D., Duchon, A. P. & Sahuc, S. Optic flow is used to control human walking. Nat. Neurosci. 4, 213–216. https://doi.org/10.1038/84054 (2001).
Cheng, J. & Li, L. Perceiving path from optic flow. J. Vis. 11, 908–908. https://doi.org/10.1167/11.11.908 (2011).
Ilg, U. J., Schumann, S. & Thier, P. Posterior parietal cortex neurons encode target motion in world-centered coordinates. Neuron 43, 145–151. https://doi.org/10.1016/j.neuron.2004.06.006 (2004).
Duffy, C. J. & Wurtz, R. H. Response of monkey MST neurons to optic flow stimuli with shifted centers of motion. J. Neurosci. Off. J. Soc. Neurosci. 15, 5192–5208. https://doi.org/10.1523/JNEUROSCI.15-07-05192.1995 (1995).
Bradley, D. C., Maxwell, M., Andersen, R. A., Banks, M. S. & Shenoy, K. V. Mechanisms of heading perception in primate visual cortex. Science (New York, N.Y.) 273, 1544–1547. https://doi.org/10.1126/science.273.5281.1544 (1996).
Smith, A. T., Wall, M. B., Williams, A. L. & Singh, K. D. Sensitivity to optic flow in human cortical areas MT and MST. Eur. J. Neurosci. 23, 561–569. https://doi.org/10.1111/j.1460-9568.2005.04526.x (2006).
Pitzalis, S. et al. Selectivity to Translational Egomotion in Human Brain Motion Areas. PLoS ONE 8, 1–14. https://doi.org/10.1371/journal.pone.0060241 (2013).
Rushton, S. K. & Warren, P. A. Moving observers, relative retinal motion and the detection of object movement. Curr. Biol. 15, 542–543 (2005).
Warren, P. A. & Rushton, S. K. Perception of object trajectory: Parsing retinal motion into self and object movement components. J. Vis. 7, 1–11. https://doi.org/10.1167/7.11.2 (2007).
Niehorster, D. C. & Li, L. Accuracy and tuning of flow parsing for visual perception of object motion during self-motion. i-Perception 8, 1–18. https://doi.org/10.1177/2041669517708206 (2017).
Warren, P. A. & Rushton, S. K. Perception of scene-relative object movement: Optic flow parsing and the contribution of monocular depth cues. Vision. Res. 49, 1406–1419. https://doi.org/10.1016/j.visres.2009.01.016 (2009).
Rushton, S. K., Niehorster, D. C., Warren, P. A. & Li, L. The primary role of flow processing in the identification of scene-relative object movement. J. Neurosci. 38, 1737–1743. https://doi.org/10.1523/JNEUROSCI.3530-16.2017 (2018).
MacNeilage, P. R., Zhang, Z., DeAngelis, G. C. & Angelaki, D. E. Vestibular facilitation of optic flow parsing. PLoS ONE 7. https://doi.org/10.1371/journal.pone.0040264 (2012).
Xie, M., Niehorster, D. C., Lappe, M. & Li, L. Roles of visual and non-visual information in the perception of scene-relative object motion during walking. J. Vis. 20, 1–11. https://doi.org/10.1167/jov.20.10.15 (2020).
Fajen, B. R. & Matthis, J. S. Visual and Non-Visual Contributions to the Perception of Object Motion during Self-Motion. PLoS ONE 8. https://doi.org/10.1371/journal.pone.0055446 (2013).
Peltier, N. E., Angelaki, D. E. & DeAngelis, G. C. Optic flow parsing in the macaque monkey. J. Vis. 20, 1–27. https://doi.org/10.1167/jov.20.10.8 (2020).
Howard, I. P. & Rogers, B. J. Perceiving in depth, volume 2: Stereoscopic vision (OUP USA, 2012).
Gogel, W. C. & Tietz, J. D. Relative cues and absolute distance perception. Percept. Psychophys. 28, 321–328. https://doi.org/10.3758/BF03204391 (1980).
Guo, H. & Allison, R. S. Binocular contributions to motion detection and motion discrimination during locomotion. PLoS ONE 19, 1–21. https://doi.org/10.1371/journal.pone.0315392 (2024).
Dupin, L. & Wexler, M. Motion perception by a moving observer in a three-dimensional environment. J. Vis. 13, 15. https://doi.org/10.1167/13.2.15 (2013).
Matsumiya, K. & Ando, H. World-centered perception of 3D object motion during visually guided self-motion. J. Vis. 9, https://doi.org/10.1167/9.1.15 (2009).
Fajen, B. R., Parade, M. S. & Matthis, J. S. Humans Perceive Object Motion In World Coordinates During Obstacle Avoidance. J. Vis. 13, 1–13. https://doi.org/10.1167/13.8.25 (2013).
Ramaseri Chandra, A. N., El Jamiy, F. & Reza, H. A Systematic Survey on Cybersickness in Virtual Environments. Computers 11. https://doi.org/10.3390/computers11040051 (2022).
Luu, W., Zangerl, B., Kalloniatis, M. & Kim, J. Effects of stereopsis on vection, presence and cybersickness in head-mounted display (HMD) virtual reality. Sci. Rep. 11, 1–10. https://doi.org/10.1038/s41598-021-89751-x (2021).
Feldstein, I. T., Kölsch, F. M. & Konrad, R. Egocentric Distance Perception: A Comparative Study Investigating Differences Between Real and Virtual Environments. Perception 49, 940–967. https://doi.org/10.1177/0301006620951997 (2020).
Spittle, B., Frutos-Pascual, M., Creed, C. & Williams, I. A Review of Interaction Techniques for Immersive Environments. IEEE Trans. Visual Comput. Graphics 29, 3900–3921. https://doi.org/10.1109/TVCG.2022.3174805 (2023).
Peirce, J. et al. PsychoPy2: Experiments in behavior made easy. Behav. Res. Methods 51, 195–203. https://doi.org/10.3758/s13428-018-01193-y (2019).
Levitt, H. Transformed up-down methods in psychoacoustics. J. Acoust. Soc. Am. 49(Suppl 2), 467–477 (1971).
Layton, O. W., Parade, M. S. & Fajen, B. R. The accuracy of object motion perception during locomotion. Front. Psychol. 13, 1–15. https://doi.org/10.3389/fpsyg.2022.1068454 (2023).
Falconbridge, M., Stamps, R. L., Edwards, M. & Badcock, D. R. Target motion misjudgments reflect a misperception of the background; revealed using continuous psychophysics. i-Perception 14. https://doi.org/10.1177/20416695231214439 (2023).
Banton, T., Stefanucci, J., Durgin, F., Fass, A. & Proffitt, D. The perception of walking speed in a virtual environment. Presence 14, 394–406 (2005).
Durgin, F. H., Gigone, K. & Scott, R. Perception of visual speed while moving. J. Exp. Psychol. Hum. Percept. Perform. 31, 339 (2005).
Steinicke, F., Bruder, G., Jerald, J., Frenz, H. & Lappe, M. Estimation of detection thresholds for redirected walking techniques. IEEE Trans. Visual Comput. Graphics 16, 17–27 (2009).
Rushton, S. K., Bradshaw, M. F. & Warren, P. A. The pop out of scene-relative object movement against retinal motion due to self-movement. Cognition 105, 237–245. https://doi.org/10.1016/j.cognition.2006.09.004 (2007).
Allison, R. S. & Howard, I. P. Stereoscopic motion in depth. In Harris, L. R. & Jenkin, M. R. M. (eds.) Vision in 3D Environments, 416, 163–186, https://doi.org/10.1017/CBO9780511736261.008 (Cambridge University Press, Cambridge, 2011).
Sakano, Y., Allison, R. S. & Howard, I. P. Motion aftereffect in depth based on binocular information. J. Vis. 12, 1–15. https://doi.org/10.1167/12.1.11 (2012).
Hecht, H. & Savelsbergh, G. J. P. Chapter 1 Theories of time-to-contact judgment. In Hecht, H. & Savelsburgh, G. J. P. (eds.) Advances in Psychology, vol. 135 of Advances in Psychology, chap. 1, 1–11, https://doi.org/10.1016/S0166-4115(04)80003-7 (Elsevier B.V., Amsterdam, 2004).
Gray, R. & Regan, D. Chapter 13 The use of binocular time-to-contact information. In Hecht, H. & Savelsburgh, G. J. P. (eds.) Advances in Psychology, vol. 135, chap. 13, 303–325, https://doi.org/10.1016/S0166-4115(04)80003-7 (Elsevier B.V., Amsterdam, 2004).
Acknowledgements
This study was supported by the Natural Sciences and Engineering Research Council of Canada (Grant Nos. RGPIN-2020-06061 and RGPIN-2025-06223). The funder had no role in the design of the study, analysis, or manuscript preparation.
Author information
Contributions
H.G.: Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Project administration, Software, Validation, Visualization, Writing - original draft, and Writing - review & editing. R.S.A.: Conceptualization, Funding acquisition, Methodology, Resources, Supervision, Validation, and Writing - review & editing.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary Information
Supplementary Information 1.
Supplementary Information 2.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.