Background & Summary

Humans use coordinated head and eye movements to effectively survey, gather information from, and interact with their environment. Information about this head-eye coordination provides key insights into how a person interacts with the environment. It can also be used for designing controllers for neck exoskeletons1,2, realistic robotic actors3,4,5, or virtual characters6,7,8. The dataset presented in this paper provides information on eye-head coordination that can, for example, be analysed for key behavioral insights or used to supply the data needed to train machine learning models.

In general, head and eye movements are tightly coupled in gaze behaviors through neural mechanisms such as the vestibulo-ocular and optokinetic reflexes9,10,11,12. When responding to a visual stimulus that requires a large gaze shift (saccade), the head compensates for eye movement to recenter the gaze13,14,15. When tracking a moving target (pursuit), the head moves with the stimulus to keep the target centered within the field of vision11,12,16,17,18. However, due to technological limitations, much prior work studied gaze behaviors only in settings where the head was fixed9,11,19,20,21. Recently, with the help of head-mounted eye-tracking technology and portable motion sensors, studies have increasingly emphasized the role of head movements in human gaze behaviors. For example, Einhauser et al. underscored the dynamic nature of retinal inputs when developing sensory coding models by concurrently recording eye and head movements during natural exploration22. Kothari et al. measured head and eye movements during real-world tasks to develop algorithms that categorize gaze events12.

Despite this emphasis, the eye trackers and inertial measurement units used in these studies were highly specialized, which limits data reproducibility. These systems also suffer from technical limitations such as sensor drift and unwanted sensor-body motion (e.g., slippage)12. Additionally, the use of physical environments and tasks requires significant setup and modification costs, so data collection is limited to participants with access to the physical infrastructure. Hence, there is a need for an open dataset of paired eye and head movements during various visual-motor tasks, collected through accurate, easy-to-access, and standardized procedures.

To address this need, we present an open dataset of paired head-eye movements during tasks in virtual environments. The use of virtual environments allows for standard and accurate equipment, as well as rapid modification of the environments at relatively low cost. In this dataset, paired head and eye movement information was collected from 20 healthy young adults. Both tracking (following objects) and searching (looking for objects) tasks were included. The data collection procedure is easy to carry out, as only a head-mounted display with eye-tracking capability is required. The virtual environment can be easily shared, enabling highly scalable data collection from anyone with access to the same virtual reality system. The virtual reality system also allows fast reconfiguration of the virtual environment to study different head-eye behaviors as needed. This open head-eye coordination dataset provides a new resource for scientists to study head and eye movement behaviors and the underlying neural mechanisms (e.g., anticipatory gaze behaviors). It may also assist engineers in developing bio-inspired machines relevant to robotics and medical research, such as neck exoskeletons to restore head-neck motion in patients with neurological impairments.

Methods

Twenty-five participants with varying levels of virtual reality experience were enrolled in this data collection (Table 1). No participant had a known history of head-neck injuries. Six participants wore glasses, five of whom wore them during the experiment. The participant who opted not to wear glasses was able to perform the tasks as well as all other participants. All participants were able to understand and follow the task instructions. The protocol was approved by the University of Utah Institutional Review Board (#00145893). All participants provided written informed consent prior to the data collection.

Table 1 Characteristics of participants.

A virtual reality headset with eye-tracking capability (VIVE Pro Eye, HTC Corporation) was used in the data collection, and the virtual environments and tasks were developed in Unity 3D. Each participant was seated on a stationary chair throughout data collection. After fitting the headset, the eye tracking system was calibrated using a built-in procedure (see Technical Validation section). Each participant engaged with four interactive visual tracking and searching virtual reality environments, three times each, in increasing order of difficulty, where difficulty is defined by the number of targets, the complexity of the target trajectories, and the presence of distracting elements (Fig. 1). The experiment could be paused at any time, and the system could be recalibrated using the same built-in procedure when resumed.

Fig. 1

Data collection procedure. The participant put on the VIVE Pro Eye headset, which was then calibrated for eye and head measurement. To calibrate, a physical adjustment of the headset position and interpupillary distance is performed, followed by a software eye-tracking calibration. The experiment could be stopped at any time and the system recalibrated when resumed. Four tasks of increasing difficulty are completed three times each. The time, eye direction vectors relative to the head, and head direction vectors are all recorded.

The total duration of the data collection was approximately 18 minutes, resulting in 1080 seconds of paired head and eye movement per experiment. During these trials, participants were instructed to track moving objects in each virtual environment, with their eyes and head free to move. The headset recorded the paired head and eye directions at an average (standard deviation) rate of 120.66 (8.75) Hz. The head and eye directions are expressed as unit vectors: the left eye direction relative to the head frame, the right eye direction relative to the head frame, and the head direction relative to the virtual environment ground frame. The orientations of the frames used by the headset are shown in Fig. 2.

Fig. 2

(Left) Example target trajectories for each of the four tasks. (a) The target in the Linear Smooth Pursuit task moves in straight lines of random direction and distance. (b) The target in the Arc Smooth Pursuit task moves along circular arcs of random direction, distance, and curvature. Both pursuit tasks feature only a single target. (c) Each of the three targets in the Rapid Visual Search task starts some distance from the participant and moves towards the participant. (d) In the Rapid Visual Search Avoidance task there are three targets to fixate on (blue) and three blocking targets to avoid fixating on (yellow). Each target and blocking cube starts from a plane some distance from the participant and moves towards the participant. The dashed line represents the horizon. (Right) Orientations of the frames used by the VIVE Pro Eye headset to define head and eye directions. The head vector is defined with respect to the world frame, and the eye vectors are defined with respect to the head frame, which is attached to and follows the VR headset.

The first two tasks, Linear Smooth Pursuit and Arc Smooth Pursuit, were designed to measure the movement behavior of eyes and head tracking a single, slow-moving target at a fixed distance of 10 meters. From the participant’s perspective, in the Linear Smooth Pursuit task, the target moves on a linear trajectory, whereas the target moves on a circular trajectory in the Arc Smooth Pursuit task.

The latter two tasks, Rapid Visual Search and Rapid Visual Search Avoidance, were designed to measure the head and eye movement behavior when participants were asked to search for discrete targets and gaze at them for a short period of time. The difference between these two tasks is that in the Rapid Visual Search Avoidance task, additional objects are presented to distract the participant.

Overall, we collected data for tracking and pursuit tasks with disturbances caused by random direction changes, as well as searching and fixating tasks with distractors. For all tasks, targets (blocks) were constrained within a 140° by 140° conic range. When the gaze was fixed on a target, its color switched from blue to green. Each trial was 90 seconds long, and randomization was uniquely seeded for each participant. At the end of each trial, the participant's tracking performance was measured using a numerical score (explained below). Images of the VR environment are shown in Fig. 3.

Fig. 3

The VR environment used for capturing the data. (Left) The single target in a linear or arc pursuit task. The cube is green because the gaze is fixated on it. (Right) A blue gaze target with two yellow distracting targets in the searching avoidance task.

Task 1: Linear Smooth Pursuit

Participants were asked to follow a target that moved between uniformly random positions at a fixed speed of 5 meters per second (Fig. 2a). Throughout a trial, the participant was asked to visually track this target as it moved, with their head free to move. After each trial, the participant received a score equal to the number of frames in which the measured gaze was on the target.

Task 2: Arc Smooth Pursuit

Participants were asked to follow a target moving along circular trajectories at a fixed angular speed of one radian per second, as illustrated in Fig. 2b. The trajectories were randomly generated and updated similarly to the Linear Smooth Pursuit task. With the head free to move, the participant was asked to visually track the moving target during a trial. At the end of each trial, the participant received a score equal to the number of frames in which the measured gaze was on the target.

Task 3: Rapid Visual Search

Three targets were generated at uniformly random locations on a plane a fixed distance from the participant and moved towards the participant at a fixed speed (Fig. 2c). The participant was asked to eliminate these targets before they reached the participant by fixing their gaze on a target for 0.3 seconds. After a target was eliminated, a new target was generated at a random location on the same plane, not necessarily inside the field of vision, which required the participant to search for the new target. The participant received a score after each trial equal to the number of targets eliminated.

Task 4: Rapid Visual Search Avoidance

This task is an extension of the Rapid Visual Search task in which three distracting objects (yellow blocks, Fig. 2d) were added to the trial. The participant was instructed to avoid gazing at them. When a target was eliminated or reached the participant, it respawned at a random starting point on the same plane. The distracting objects turned red when the eyes fixated on them and did not respawn until they reached the participant. The participant received a score after each trial equal to the number of targets eliminated, with no penalty for gazing at the distracting objects beyond the time lost not eliminating targets.

In the first two tasks, randomly changing the target trajectories forces the participant to correct their gaze, thus capturing reactive in addition to anticipatory behavior. Randomizing the starting positions of targets in the last two tasks ensures the participant performs true searching behavior. The four tasks are presented in order of increasing difficulty; however, this order may easily be modified in the program.
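The per-trial scores were computed by the Unity program at runtime. For users who wish to reconstruct a comparable measure offline (e.g., from the target trajectory files available for participants 21–25), a minimal sketch is shown below; it assumes the gaze and target directions are unit vectors expressed in the same frame, and the angular threshold is a hypothetical stand-in for the targets' angular size rather than a value taken from the experiment code.

```python
import numpy as np

def frames_on_target(gaze_dirs, target_dirs, threshold_deg=2.0):
    """Count frames in which the gaze falls within a small angle of the target.

    Both inputs are (N x 3) unit direction vectors assumed to be expressed in
    the same frame; threshold_deg is a hypothetical angular tolerance.
    """
    g = np.asarray(gaze_dirs, dtype=float)
    t = np.asarray(target_dirs, dtype=float)
    cos_angle = np.clip(np.sum(g * t, axis=1), -1.0, 1.0)  # per-frame dot product
    angle_deg = np.degrees(np.arccos(cos_angle))
    return int(np.sum(angle_deg < threshold_deg))
```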

Data Records

All data as well as trial scores and demographics are made available on Figshare23. Each participant is identified by a unique numeric code from 1 to 25 (Table 1). The file structure consists of one directory per participant (labeled User[Participant ID]), in which that participant's trials are stored. Thus, there are 25 participant folders in the root directory and 12 trial data files in each participant folder. Since each task was performed three times by each participant, each of the 12 trial file names has the structure User[Participant ID]_[Trial Type]_[Trial Number].csv. The [Trial Type] is one of the four tasks used in this experiment (i.e., Linear Smooth Pursuit, Arc Smooth Pursuit, Rapid Visual Search, and Rapid Visual Search Avoidance). The [Trial Number] is the occurrence of a trial within each task (indexed from 0, i.e., 0, 1, and 2). For example, the second trial of the Arc Smooth Pursuit task for the sixth participant has the file name User6_ArcSmoothPursuit_1.csv. The folders for users 21 through 25 also include the trajectories of the targets, stored in files that follow the same Participant ID, Trial Type, and Trial Number naming pattern as the eye and head data (e.g., Object21_ArcSmoothPursuit_1.csv).
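For illustration, the sketch below builds trial file paths from this naming convention. The local root folder name and the exact [Trial Type] strings other than ArcSmoothPursuit (which appears in the example above) are assumptions; adjust them to match the downloaded Figshare archive.

```python
from pathlib import Path

# Assumed local root of the downloaded Figshare archive.
DATA_ROOT = Path("head_eye_dataset")

# Trial type strings assumed to mirror the task names without spaces,
# following the User6_ArcSmoothPursuit_1.csv example.
TRIAL_TYPES = ["LinearSmoothPursuit", "ArcSmoothPursuit",
               "RapidVisualSearch", "RapidVisualSearchAvoidance"]

def trial_path(participant_id: int, trial_type: str, trial_number: int) -> Path:
    """Path of one eye/head trial file, e.g. User6_ArcSmoothPursuit_1.csv."""
    return (DATA_ROOT / f"User{participant_id}"
            / f"User{participant_id}_{trial_type}_{trial_number}.csv")

# All 25 participants x 4 tasks x 3 repetitions = 300 trial files.
all_trials = [trial_path(pid, task, rep)
              for pid in range(1, 26)
              for task in TRIAL_TYPES
              for rep in range(3)]
```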

The trial files are in .csv format, with a single header line followed by comma-delimited data (Fig. 4). The first column is the timestamp. At each time step, the second through fourth columns contain the direction vector, in the order of x, y, and z components, of the left eye in the head frame, following the coordinate frames defined in Fig. 2. Similarly, the direction vectors of the right eye in the head frame and of the head in the world frame are in columns 5–7 and 8–10, respectively. The tracker records data at an average (standard deviation) sampling rate of 120.66 (8.75) Hz, and each trial was recorded for 90 seconds.
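A minimal loading sketch, assuming the column order described above; labels are assigned by position rather than relying on the exact header names in the files.

```python
import pandas as pd

# Column labels assigned by position (see Data Records); the header row in
# the file is skipped and replaced by these names.
COLUMNS = ["time",
           "left_eye_x", "left_eye_y", "left_eye_z",      # left eye, head frame
           "right_eye_x", "right_eye_y", "right_eye_z",   # right eye, head frame
           "head_x", "head_y", "head_z"]                  # head, world frame

def load_trial(path):
    return pd.read_csv(path, header=0, names=COLUMNS)

trial = load_trial("User6/User6_ArcSmoothPursuit_1.csv")
print(trial.head())
```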

Fig. 4

The data retrieval structure. The dataset is a folder with a separate directory for each user. Each user directory contains 12 .csv files, one per trial. Each trial file contains the time, the eye-in-head direction vectors, and the head direction vectors. The vectors are recorded as x, y, and z components.

To test the correctness of the data, the distributions of horizontal and vertical head and eye angles across all participants are plotted as histograms for each task (Fig. 5). Horizontal angles were calculated as the deviation from the sagittal plane, and vertical angles as the deviation from the transverse plane.
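As an illustration of this angle convention, the sketch below converts unit direction vectors into horizontal and vertical angles. The axis assignment (x rightward, y upward, z forward) is an assumption based on Unity's default convention and should be checked against the frames shown in Fig. 2.

```python
import numpy as np

def direction_to_angles(vectors):
    """Convert (N x 3) unit direction vectors to horizontal/vertical angles in degrees.

    Assumes x points rightward, y upward, and z forward (Unity convention);
    verify against Fig. 2 before use.
    """
    v = np.asarray(vectors, dtype=float)
    x, y, z = v[:, 0], v[:, 1], v[:, 2]
    horizontal = np.degrees(np.arctan2(x, z))                # deviation from the sagittal plane
    vertical = np.degrees(np.arcsin(np.clip(y, -1.0, 1.0)))  # deviation from the transverse plane
    return horizontal, vertical

# Example: one vector pointing straight ahead, one pitched 30 degrees upward.
demo = np.array([[0.0, 0.0, 1.0],
                 [0.0, 0.5, np.sqrt(3) / 2]])
print(direction_to_angles(demo))  # approximately ([0, 0], [0, 30])
```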

Fig. 5

Normalized histograms of horizontal and vertical left eye angles and head angles for each task, across all participants and trials. The eye angles are with respect to the head frame. The red curves correspond to the left eye angles and the blue curves to the head angles. Only left eye angles are included because the right eye angles were similar. After calculating each participant's histogram for each task, the mean histograms (solid lines) and first and third quartile histograms (filled bands) are produced.

The majority of eye angle data are centered at zero, while the head angles have a wider distribution than the eye angles. This is most likely due to the head rotating to keep the eyes centered on the target, which aligns with the behavior observed in the literature14,15. It has been suggested that such behavior reduces cognitive load13. For the searching tasks, there are two symmetric non-zero peaks in the head angles, an expected outcome given that the participants needed to search their environment for targets. Due to the symmetry of the task environments and the approximately uniform spread of target positions, we do not observe any skewness or concentration of head angles. To further illustrate the data, a five-second window of eye and head angles for a random subject is shown in Fig. 6. In the searching tasks (Tasks 3 and 4), the eye and head angles show more activity, whereas they are milder in the pursuit tasks. In the pursuit tasks, the eye angles remain nearly constant at zero, showing that the head is centering the gaze.

Fig. 6

A five-second window of left eye angles and head angles for each task, from a random trial of a random subject. The eye angles are with respect to the head frame. The red curves correspond to the left eye angles and the blue curves to the head angles. Only left eye angles are included because the right eye angles were similar.

Technical Validation

In addition to the headset, base stations need to be installed to best capture the headset motions. For this data collection, two base stations were installed at opposite corners of the capture volume, following the onscreen instructions of the virtual reality software and the manual. The capture volume was roughly 8 × 8 × 8 feet. The headset must be adjusted and calibrated for each participant. This was performed by following the software calibration procedure of the headset: the participant first fits and tightens the headset on their head, then uses a dial on the headset to adjust the interpupillary distance following the onscreen instructions, and lastly follows a target at five different positions with the head fixed to calibrate the eye tracking measurement. This process can be found in the headset manual. The literature suggests that, after proper calibration, this headset achieves a mean accuracy of 1.08° within a 60° by 60° conic window and 4.16° within a 100° by 100° conic window24,25. Instructions on how to run and modify the experiment with the given code (see Code Availability section) can be found in a README.txt file located in the root directory of the code. For example, to change the fixation time required to respawn targets in the search tasks, the code file EyeTrackingTest/Assets/Scripts/HighlightAtGaze.cs can be modified. To add a new task to the experiment pipeline, the code file EyeTrackingTest/Assets/Scripts/ModelSim.cs can be modified.

Blinks occur during data collection. In our data record, samples recorded during blinks are labelled as zeros for both eyes. Blink durations are short, on average (standard deviation) 0.101 (0.086) seconds. This creates discontinuities in the time trajectories of the eyes. However, these can be addressed in post-processing with suitable interpolation techniques (e.g., linear interpolation), as sketched below.
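A minimal post-processing sketch, assuming the positional column labels used in the loading example above: blink samples (all-zero eye vectors) are marked as missing and filled by linear interpolation over the frame index.

```python
import numpy as np
import pandas as pd

def interpolate_blinks(trial: pd.DataFrame, eye_cols) -> pd.DataFrame:
    """Replace all-zero blink samples in the given eye columns and interpolate."""
    out = trial.copy()
    blink = (out[eye_cols] == 0).all(axis=1)   # frames where the eye vector is all zeros
    out.loc[blink, eye_cols] = np.nan
    out[eye_cols] = out[eye_cols].interpolate(method="linear", limit_direction="both")
    return out

# Hypothetical column labels matching the loading sketch in Data Records.
trial = pd.read_csv("User6/User6_ArcSmoothPursuit_1.csv", header=0,
                    names=["time",
                           "left_eye_x", "left_eye_y", "left_eye_z",
                           "right_eye_x", "right_eye_y", "right_eye_z",
                           "head_x", "head_y", "head_z"])
trial_clean = interpolate_blinks(trial, ["left_eye_x", "left_eye_y", "left_eye_z"])
```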

A limitation of our dataset is that we did not include the 3D trajectories of the targets for most of our participants. This data, however, can be included in future data collections by modifying the code found in EyeTrackingTest/Assets/Scripts/GazeCollection2.cs, as we have done for the data collected from participants 21–25. The virtual reality system does not allow us to capture velocity data directly, but velocities may be approximated from the position and time data if desired; alternatively, an external IMU may be attached to the headset, at the cost of increased system complexity. Appropriate filtering should be applied when numerically computing velocities (see the sketch below). Another limitation of the dataset is the fixed environmental conditions (e.g., lighting) in which the data were collected. These conditions, however, can be modified by future users through the code, which we have made publicly available.
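As a hedged example of such numerical differentiation, the sketch below estimates angular velocity from sampled angles by smoothing with a Savitzky-Golay filter before taking finite differences; the window length and polynomial order are placeholder values to be tuned for the roughly 120 Hz sampling rate of the dataset.

```python
import numpy as np
from scipy.signal import savgol_filter

def angular_velocity(time_s, angle_deg, window_length=15, polyorder=3):
    """Estimate angular velocity (deg/s) from a sampled angle trace.

    The signal is smoothed with a Savitzky-Golay filter before numerical
    differentiation; window_length and polyorder are placeholder values.
    """
    smoothed = savgol_filter(np.asarray(angle_deg, dtype=float),
                             window_length=window_length, polyorder=polyorder)
    return np.gradient(smoothed, np.asarray(time_s, dtype=float))
```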