Performance metrics outperform physiological indicators in robotic teleoperation workload assessment

Odoh, Gift; Landowska, Aleksandra; Crowe, Emily M.; Benali, Khairidine; Cobb, Sue; Wilson, Max L.; Maior, Horia A.; Kucukyilmaz, Ayse

doi:10.1038/s41598-024-82112-4

Published: 28 December 2024

Scientific Reports volume 14, Article number: 30984 (2024)

Abstract

Robotics holds the potential to streamline the execution of repetitive and dangerous tasks, which are difficult or impossible for a human operator. However, in complex scenarios, such as nuclear waste management or disaster response, full automation often proves unfeasible due to the diverse and intricate nature of tasks, coupled with the unpredictable hazards, and is typically prevented by stringent regulatory frameworks. Consequently, the predominant approach to managing activities in such settings remains human teleoperation. Teleoperation can be demanding, especially in high-stress situations, and involves a complex blend of both cognitive and physical workload. We present an experiment to explore a range of physiological and performance-related metrics for workload assessment during robotic teleoperation. Thirty-five participants performed a teleoperation task, during which we manipulated cognitive and physical workload conditions. We recorded multiple metrics, including brain activity using functional Near-Infrared Spectroscopy, galvanic skin responses, cardiovascular responses, subjective workload ratings, and task and robot performance data. Our results suggest that robotic teleoperation performance may be the most robust metric for distinguishing between different levels of workload experienced during teleoperation, with most physiological measures failing to reach significance in distinguishing high cognitive workload.

Introduction

Teleoperation is widely used to control robots in environments that are dangerous or difficult for humans to access safely, such as nuclear waste decommissioning¹, space robotics², underwater robotics³, and surgery⁴. These systems frequently utilise leader-follower manipulators with joint-to-joint mapping to offer intuitive and convenient control interfaces for human operators⁵ (Fig. 1). However, teleoperation presents several challenges for human operators. These challenges include suboptimal display systems and limited views from static cameras, leading to blind operation; difficulty in controlling heavy objects against gravity; unergonomic postures due to extensive motion ranges of local (leader) manipulators; and the need to coordinate with other humans in the control room. These issues increase both the physical and cognitive effort required from operators, causing fatigue and poor operation^6,7. Operators are also challenged to adapt to a fundamentally different sensorimotor control loop, relying on limited visual feedback from cameras⁸.

The resulting high cognitive demand in teleoperation places significant strain on operators⁹, underscoring the need for system-based assistance to operators experiencing high levels of mental workload^10,11,12. While technological progress has facilitated the development of fully autonomous robotic systems in controlled settings, many teleoperation applications resist automation due to the essential requirement for human problem-solving abilities and dexterous manipulation skills⁸. Additionally, stringent regulatory requirements in safety-critical domains hinder the adoption of autonomous robotic approaches¹³. As a result, human-in-the-loop assistance mechanisms are preferred, in the form of shared control—merging operator and system inputs^14,15,16—or traded control—alternating control between operator and system^17,18,19,20. Both paradigms require trigger mechanisms to determine when robot autonomy should intervene. Although real-time workload monitoring can be used for the application of these assistance paradigms, existing workload assessment techniques typically address only a single dimension of workload: They focus on either cognitive load induced by tasks such as arithmetic^21,22, working memory²³, stress²⁴, or delays and interruptions in information flow²⁵; or on physical load, reflected by spatial teleoperation performance⁹ or latency in sensory feedback²⁶. On the other hand, in teleoperation, multitasking is expected under critical task conditions with little freedom to switch between the primary physical task (the teleoperation) and accompanying cognitive sub-tasks (decision making, coordination within the human team etc.). Consequently, it remains unclear which workload assessment techniques are suitable for accounting for both cognitive and physical workload elements and their interplay in teleoperation.

This article presents a controlled study to identify appropriate measures for real-time workload assessment in teleoperation with the ultimate goal of developing systematic robotic assistance. For this purpose, we designed an experiment, in which participants interact with a robotic teleoperation system to fulfill a physical ring-on-a-wire task. Meanwhile, they were presented with a secondary working memory task. We controlled the physical (teleoperation vs no teleoperation) and cognitive demand (as modulated by the secondary working memory task) to observe measurable differences in these two workload dimensions. We computed a variety of metrics using various physiological channels, including brain, cardiovascular and electrodermal activity, as well as robot movement and task performance data. Interestingly, we found that physiological measures can detect changes in either cognitive or physical workload individually, but not both simultaneously. Our findings suggest that robot performance data shows the most promise for developing a workload assessment system for teleoperation.

The article is organised as follows: The methodology is described in detail in Section Methods. Section Results presents the experimental findings, followed by a discussion of these results and the study’s limitations in Section Discussion. Finally, Section Conclusion offers key insights and conclusions.

Methods

Robotic teleoperation setup

The teleoperation architecture is described in detail by Singh et al.²⁷ and consists of two identical Franka Emika Panda robot arms (i.e. a local and a remote manipulator), arranged in a unilateral leader-follower setup as shown in Fig. 2. Each Panda arm is equipped with torque sensors integrated at joint actuators. A static Ethernet connection is utilised to link each robot to a dedicated workstation, minimising potential time delays. The Franka Control Interface (FCI) enables communication with the workstations through a low-level, bidirectional connection operating at 1 kHz. The open-source libfranka library is used to control the robots, providing various control options, including joint torques, positions and velocities, as well as Cartesian poses and velocities. Both the local and remote arms operate at a frequency of 1 kHz through the Simulation Lab (SL) robotics simulator and real-time control engine in real-time control mode. A control script running on the local system acts as a centralised point for controlling the robots. It also serves as a hub for ROS communication for integrating multiple sensors, such as the Octamon wireless fNIRS headset for brain activity monitoring and Shimmer GSR+ unit for GSR (Galvanic Skin Response) and PPG (Photoplethysmograph) measurements. It also incorporates an Arduino for real-time primary task performance monitoring, automatically tracking the contact between the ring and wire. This centralization simplifies operations and enhances coordinated data collection for accuracy, synchronization, and event tracking. The remote robot is connected to a dedicated controller and a separate remote PC with the capability of data exchange and coordination with the local robot workstation. This architecture is illustrated in Fig. 3. 
Custom Python scripts, using a wrapper-based approach and dedicated driver-based APIs, were implemented to ensure seamless integration of fNIRS and Shimmer sensors with the main ROS system, converting sensor data into ROS messages. The Arduino board is connected via rosserial, facilitating communication.

In our setup, the robotic arms are equipped with a zero torque controller for gravity compensation. This gravity compensation allows the operator to move the local arm effortlessly and intuitively through space as an extension of their movements without additional torque commands. The forward dynamics of an $n$-dof robot manipulator, where $n = 7$ in our setup, is given by (1), where $\mathbf{M}(\mathbf{q}) \in {\mathbb {R}}^{n \times n}$ represents the mass matrix for joint configuration, q, $\mathbf{C}(\mathbf{q}, \dot{\mathbf{q}}) \in {\mathbb {R}}^{n \times n}$ accounts for the Coriolis effect and $\mathbf{g}(\mathbf{q}) \in {\mathbb {R}}^n$ denotes the joint torques due to gravity. In our setup, we omit g(q) because the robot arms are already gravity-compensated. The Coriolis term, $C(q,\dot{q})$, is also negligible due to the low velocities of motion, hence omitted.

$$\begin{aligned} \tau = {M}(q) \ddot{q} + {C}(q, \dot{q}) \dot{q} + g(q) + \tau _{\text {ext}} \end{aligned}$$

(1)

where $q,\dot{q},\ddot{q} \in {\mathbb {R}}^n$ are the joint positions, velocities and accelerations, respectively and $\tau \in {\mathbb {R}}^n$ is the vector of joint actuation torques.

The remote robot acts on commands from the local robot and translates these instructions into physical actions by replicating the operator’s movements within the operational environment. The system uses a PD controller for unilateral position-position teleoperation to synchronise joint movements as formulated in (2):

$$\begin{aligned} \tau _{\text {cmd}} = K_p (q_r - q_l) - K_d \dot{q}_l , \end{aligned}$$

(2)

where $q_r$ and $q_l$ denote the remote and local joint angles, respectively. $K_p$ and $K_d$ are diagonal matrices that define the controller’s proportional and derivative gains, respectively, with set values: $K_p = diag([120,120,120,120,20,20,4])$ and $K_d = diag([10,10,10,10,6,5,3])$.
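As an illustration of Eq. (2), the following minimal sketch evaluates the commanded torque joint by joint, using the gain values stated above. The function name `pd_command_torque` and its argument names are hypothetical and are not taken from the authors' control script; the real controller runs inside the 1 kHz real-time loop, which this sketch does not attempt to reproduce.

```python
# Diagonal entries of K_p and K_d as stated in the text.
KP = [120.0, 120.0, 120.0, 120.0, 20.0, 20.0, 4.0]
KD = [10.0, 10.0, 10.0, 10.0, 6.0, 5.0, 3.0]


def pd_command_torque(q_r, q_l, dq_l, kp=KP, kd=KD):
    """Evaluate Eq. (2): tau_cmd = K_p (q_r - q_l) - K_d dq_l.

    q_r, q_l : remote and local joint angles (rad), one value per joint.
    dq_l     : local joint velocities (rad/s), one value per joint.
    Because K_p and K_d are diagonal, each joint is handled independently.
    """
    if not (len(q_r) == len(q_l) == len(dq_l) == len(kp) == len(kd)):
        raise ValueError("all inputs must have one entry per joint")
    return [
        kp_j * (qr_j - ql_j) - kd_j * dql_j
        for kp_j, kd_j, qr_j, ql_j, dql_j in zip(kp, kd, q_r, q_l, dq_l)
    ]
```

With the stated gains, the same angular error produces a much smaller torque at the seventh joint than at the first four, which reflects the lower proportional gain chosen for the wrist joints.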

Experimental tasks

Primary physical task: teleoperated ring-on-a-wire

The participants operated a local Franka Emika Panda arm to perform a ring-on-a-wire task with a kinematically identical remote arm, as illustrated in Fig. 2. The task required maneuvering a 12 cm diameter ring, attached to the gripper of the remote arm, along a straight wire spanning 75 cm between two designated points. Participants had 60 seconds to move the ring back and forth as many times as possible without touching the wire.

To prevent direct observation, the remote arm was concealed behind a curtain. Frontal and side views of the ring-on-wire setup were captured by two cameras (also shown in Fig. 2), and displayed side-by-side on a 65-inch screen, enabling participants to monitor and complete the remote task.

An Arduino board controlled both the buzzer and the LED strip. If the ring touched the wire at any point during the task, a circuit was completed that changed the voltage state of an Arduino’s input, activating a buzzer and an LED strip. Both the buzzer and the LED strip remained activated for as long as the contact was maintained. The Arduino communicated with the local arm computer via a serial port connection using the ROSSerial protocol, and published a ROS topic at 10 Hz to record the contact state between the ring and the wire.

Secondary cognitive task: auditory n-back working memory task

An auditory n-back task was implemented as a secondary task to modulate participants’ cognitive workload levels. The n-back task was chosen for its established efficacy in manipulating cognitive workload²⁸. In the auditory n-back task, participants listened to a stream of spoken stimuli. The auditory stimuli were pre-recorded spoken digits ranging from 0 to 9, played through audio speakers at 90 dB for 1 second each, with a sampling rate of 16,000 Hz. The objective of the task was to determine whether each current stimulus matched the one heard ‘n’ steps ago in the sequence, by verbally responding “yes” for a match (hit) or “no” for a non-match (miss). For example, if ‘n’ is 2, participants compared the current digit with the digit heard two positions earlier. Following each digit presentation, there was a 2-second interval for participants to provide their verbal response. In the zero-back condition, participants were instructed to respond with a “yes” to every digit, serving as a baseline to control for brain activity related to speech production. Responses were recorded through a microphone, and analysed post-experiment to compute the n-back task accuracy.

Each n-back task lasted 60 seconds to match the duration of the teleoperation task, comprising 20 trials. Within these trials, 30% of the digits were designated as targets (digits matching the n-back criterion) and 70% as non-targets (digits not matching the criterion). The sequence of digits was presented in a pseudo-random order across all trials; hence, the auditory stimuli were uniform across participants, although an illusion of randomness was created per individual.
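The trial structure described above (20 digits, 30% targets, one fixed pseudo-random sequence shared by all participants) can be sketched as follows. This is an illustrative generator under stated assumptions, not the authors' stimulus script: the function names, the fixed seed, and the rule that the first n digits are never targets are all hypothetical, and the sketch covers n ≥ 1 only, since the zero-back condition uses a different response rule.

```python
import random


def make_nback_sequence(n, n_trials=20, target_ratio=0.3, seed=0):
    """Return (digits, is_target) for one n-back block with n >= 1."""
    if n < 1:
        raise ValueError("this sketch covers n >= 1 only")
    rng = random.Random(seed)  # fixed seed: every participant hears the same sequence
    n_targets = round(n_trials * target_ratio)
    # Only positions that have a digit n steps earlier can be targets.
    target_positions = set(rng.sample(range(n, n_trials), n_targets))
    digits, is_target = [], []
    for i in range(n_trials):
        if i in target_positions:
            digit = digits[i - n]  # repeat the digit heard n steps ago
        else:
            choices = list(range(10))
            if i >= n:
                choices.remove(digits[i - n])  # force a non-match
            digit = rng.choice(choices)
        digits.append(digit)
        is_target.append(i in target_positions)
    return digits, is_target


def nback_accuracy(is_target, responses):
    """Percentage of trials whose "yes"/"no" response matches the target flag."""
    correct = sum(
        (resp == "yes") == target for target, resp in zip(is_target, responses)
    )
    return 100.0 * correct / len(is_target)
```

Scoring every trial as either correct or incorrect is one plausible reading of "percentage of correctly identified responses"; other scoring schemes (for example, hits only) would need a different `nback_accuracy`.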

Design and procedure

We controlled both the physical demand (Teleoperation [T] vs No Teleoperation [NT]) and cognitive demand (as varied by $n \in \{0,1,2,3\}$ in the auditory n-back task) to observe measurable differences in these workload dimensions. Participants engaged in two blocks: one in which they performed the teleoperation task and another, in which they stood with their hands on the robot workbench, maintaining an upright posture to prevent blood from rushing to the forehead. In each block, they completed three 60-second trials for each $n \in \{0,1,2,3\}$, with 10-15 second rest periods between trials. These trials coincided with the ring-on-a-wire task in the Teleoperation condition. During the rest periods between trials, participants verbally responded to three questions about perceived physical, cognitive, and overall workload, providing a subjective rating between 1 (lowest) and 5 (highest). The order of all conditions was counterbalanced using a Latin square method. Figure 4 illustrates the experimental blocks.
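The text does not specify which Latin square was used. As a hedged illustration only, a simple cyclic Latin square over the four n-back levels could be built as below; the function names and the mapping from participant index to row are assumptions, and the sketch does not cover how the T and NT blocks were ordered.

```python
def cyclic_latin_square(conditions):
    """Build a k x k cyclic Latin square: each condition appears once per row and column."""
    k = len(conditions)
    return [[conditions[(row + col) % k] for col in range(k)] for row in range(k)]


def order_for_participant(participant_index, conditions):
    """Assign rows of the square to participants in rotation (hypothetical scheme)."""
    square = cyclic_latin_square(conditions)
    return square[participant_index % len(conditions)]
```

A cyclic square balances the position of each condition but not which condition precedes which; a balanced (Williams) design would be needed to control first-order carry-over effects.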

The study used a $2 \times 4$ within-subjects design. The independent variables were teleoperation mode (teleoperation, no-teleoperation) and n-back condition (0-back, 1-back, 2-back, 3-back). Participants took part in the experiment over two days, completing a familiarization session, followed by the experimental sessions.

Familiarization session

Participants attended a familiarization session at least one day prior to their experimental session to acquaint themselves with the tasks, mitigate novelty effects and enable more reliable brain data collection by controlling for brain plasticity-related responses. During this session, they practiced with both the n-back task and teleoperation until they felt comfortable with the procedures. No data were recorded during this session.

Experimental session

Upon arrival at the lab, participants received instructions regarding the experimental session and had the opportunity to ask questions. They then completed a brief demographic questionnaire consisting of 9 items, which took approximately 3 minutes. Following this, participants were fitted with the Octamon fNIRS headset and Shimmer sensors, and the researcher adjusted each fNIRS channel to ensure stable signal acquisition. Participants were then shown the teleoperation system once more and given a brief reminder about the ring-on-a-wire and n-back tasks.

Prior to the start of the experimental sessions, a three-minute resting period was recorded to establish a baseline of each participant’s brain activity using fNIRS. Following this baseline recording, the main data collection phase commenced.

During the experimental sessions, participants received brief prompts about the experimental conditions via prerecorded messages played over speakers. Each of the four n-back levels was performed in either the Teleoperation (T) condition, prompted by the instruction “n-back, motion”, or the No-Teleoperation (NT) condition, prompted by the instruction “n-back, no motion”, where n is a value between 0 and 3 depending on the current condition.

In the T condition, participants were asked to actively observe the screen while moving the local arm to execute the teleoperation, and to simultaneously respond to the sequence of digits for the n-back task, by vocalising “yes” for a hit or “no” for a miss. At the end of the trial, participants heard a “Park” prompt, instructing them to guide the local arm back to the starting position on the wire, which was marked by red tape clearly visible in the camera feed. In the NT condition, participants rested their arms on the workbench instead of performing the ring-on-a-wire task.

After each trial, regardless of the teleoperation condition, participants heard a “Rest” prompt. In T mode, this prompt meant they should let go of the local arm. In both modes, during rest, participants were asked to relax, and fix their gaze on a visual marker (a yellow sticker) attached to the local arm, while maintaining their arms in a restful position on the workstation. In this period, the experimenter asked participants to verbally rate their cognitive, physical, and overall workload. These concepts were explained to them prior to the beginning of the experiments. The total duration of the experiment was approximately 1 hour.

Participants

Thirty-five healthy participants (22 males, 12 females, 1 non-binary, average age 25.9) took part in the experiment. Each participant signed an informed consent form, agreeing to the publication of identifying information and images in an online open-access publication. They received a £20 gift voucher as compensation for their time. Ethical approval was obtained from the School of Computer Science Ethics Committee of the University of Nottingham under reference CS-2021-R55 and all experiments were performed in accordance with relevant guidelines and regulations²⁹.

Metrics

In this study, we employ a variety of physiological and robotic performance measurements to evaluate physical and cognitive workload during robotic teleoperation. Additionally, we use a subjective workload assessment scale to gain insights into participants’ perceived physical, cognitive, and overall workload. This scale also serves to verify the effectiveness of the experimental conditions in altering the workload dimensions.

Performance measurements are indicators of robot teleoperation efficiency and effectiveness. The measured physiological data include electrodermal and cardiovascular measurements, as well as brain activity. Specifically, we utilise Galvanic Skin Response (GSR) and Photoplethysmography (PPG) to assess autonomic nervous system activity. Additionally, we measure functional Near-Infrared Spectroscopy (fNIRS) to monitor brain activity, providing valuable information on cerebral oxygenation and hemodynamics.

These measures, collectively, allow for a comprehensive analysis of both central and peripheral physiological responses, task performance, and subjective experiences, offering a holistic view of the body’s and mind’s reactions to changes in physical and cognitive workload levels.

Subjective perceptions of workload

We drew inspiration from the NASA Task Load Index (NASA-TLX) when selecting our subjective workload questions, which encompass both cognitive and physical aspects of workload. During piloting, we found that administering a NASA-TLX after each block was impractical for our methods: Participants had to leave the teleoperation workbench to complete paper or computer-based questionnaires, which would disrupt physiological data collection. Additionally, integrating NASA-TLX into the precise timing of tasks proved challenging, and overall participation duration became excessive. As a result, we opted for three single-dimensional rating scales:

Cognitive Workload: How much mental and perceptual activity was required? Was the task easy or demanding, simple or complex?
Physical Workload: How much physical activity was required? Was the task easy or demanding, slack or strenuous?
Overall Workload: How hard did you have to work (mentally and physically) to accomplish your level of performance?

Each of these components was rated on a 5-pt Likert scale. The participants verbally reported their ratings after each trial when prompted. We presented participants with guidelines for interpretation as shown in Table 1.

Table 1 Participant instructions for the subjective workload scale.

Teleoperation performance measures

N-back Accuracy: Performance on the n-back task was quantified as the percentage of correctly identified responses.

Number of Ring Touches: The count of instances when the ring touches the wire, triggering the buzzer.

Ring Contact Time: The average duration of contact between the ring and the wire.
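The two contact metrics above could be derived from the 10 Hz contact-state stream roughly as follows. This is a sketch under assumptions: the function name is hypothetical, the contact state is taken to be one boolean per sample, and "average duration of contact" is interpreted as the mean duration per contact episode, which the text does not state explicitly.

```python
def contact_metrics(contact_samples, sample_rate_hz=10.0):
    """Return (number_of_touches, mean_contact_time_s) from boolean contact samples.

    A "touch" is a run of consecutive in-contact samples; its duration is the
    run length divided by the sampling rate.
    """
    durations = []
    run_length = 0
    for in_contact in contact_samples:
        if in_contact:
            run_length += 1
        elif run_length:
            durations.append(run_length / sample_rate_hz)
            run_length = 0
    if run_length:  # contact still active when the trial ended
        durations.append(run_length / sample_rate_hz)
    n_touches = len(durations)
    mean_time = sum(durations) / n_touches if n_touches else 0.0
    return n_touches, mean_time
```

At a 10 Hz sampling rate, contacts shorter than about 0.1 s may fall between samples, so counts obtained this way are a lower bound on the true number of touches.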

Average Speed of the End Effector: This metric quantifies the average speed of the remote arm’s end-effector throughout a trial. It indicates how quickly a participant operates the remote robot to complete the ring-on-a-wire task. The speed is calculated by dividing the total trajectory length covered during the trial by the trial’s duration.

$$\begin{aligned} {\text {Average speed}} = \frac{\sum _{i=1}^{N-1} \sqrt{(x_{i+1} - x_i)^2 + (y_{i+1} - y_i)^2 + (z_{i+1} - z_i)^2}}{t_{\text {finish}}({\mathrm{trial}}) - t_{\text {start}}({\mathrm{trial}})} , \end{aligned}$$

(3)

where N is the number of data samples in a trial; $x_i$, $y_i$ and $z_i$ are respectively the x, y and z Cartesian coordinates of the remote arm’s end effector for the ith sample; and $t_{\mathrm {start}}(\text {trial})$ and $t_{\mathrm {finish}}(\text {trial})$ are respectively start and end times of the trial.
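Eq. (3) amounts to summing the straight-line distances between consecutive end-effector samples and dividing by the trial duration. A minimal sketch, with a hypothetical function name and positions given as (x, y, z) tuples:

```python
import math


def average_speed(positions, t_start, t_finish):
    """Eq. (3): total path length of the end effector divided by trial duration.

    positions : sequence of (x, y, z) Cartesian samples in metres.
    t_start, t_finish : trial start and end times in seconds.
    """
    path_length = sum(
        math.dist(p0, p1) for p0, p1 in zip(positions, positions[1:])
    )
    return path_length / (t_finish - t_start)
```

For example, three samples at (0, 0, 0), (3, 4, 0) and (3, 4, 12) give segment lengths of 5 and 12, so a 2-second trial yields an average speed of 8.5 units per second.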

Average Power Consumption: This metric represents the robot arm’s average power output, providing a measure of the participant’s efficiency in controlling the robot arm. The average power is calculated as:

$$\begin{aligned} \text {P} = \frac{1}{N} \sum _{i=1}^{N} \left( \sum _{j=1}^{7} |\tau _{i,j} \cdot \dot{\theta }_{i,j}| \right) , \end{aligned}$$

(4)

where $\tau _{i,j}$ is the torque applied at the jth joint at the ith time point. $\dot{\theta }_{i,j}$ is the angular velocity of the jth joint at the ith time point.

Total Energy Consumption: This metric quantifies the total energy expended by the robot arm during a trial, reflecting the overall energy demands placed on the robot and the participant controlling it. It takes into account both the intensity and duration of the interaction. Total energy consumed is computed as:

$$\begin{aligned} E = \sum _{i=1}^{N} \left( \sum _{j=1}^{7} |\tau _{i,j} \cdot \dot{\theta }_{i,j}| \right) \Delta t_i , \end{aligned}$$

(5)

where $\tau _{i,j}$ is the torque applied at the jth joint at the ith time point, $\dot{\theta }_{i,j}$ is the angular velocity of the jth joint at the ith time point, and $\Delta t_i$ is the duration between ith and $i-1$th samples.
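Eqs. (4) and (5) share the same per-sample quantity, the sum over joints of the absolute product of torque and angular velocity. The sketch below makes that shared term explicit; the function names are hypothetical, and each sample is assumed to be a list with one torque or velocity value per joint.

```python
def instantaneous_power(torques, velocities):
    """Sum over joints of |tau_j * dtheta_j| for one time sample."""
    return sum(abs(tau * dq) for tau, dq in zip(torques, velocities))


def average_power(torque_samples, velocity_samples):
    """Eq. (4): mean of the per-sample power over the N samples of a trial."""
    powers = [
        instantaneous_power(t, v) for t, v in zip(torque_samples, velocity_samples)
    ]
    return sum(powers) / len(powers)


def total_energy(torque_samples, velocity_samples, dts):
    """Eq. (5): per-sample power weighted by the sample interval dt_i, then summed."""
    return sum(
        instantaneous_power(t, v) * dt
        for t, v, dt in zip(torque_samples, velocity_samples, dts)
    )
```

Because of the absolute value, braking and accelerating motions both count as positive expenditure, so the two metrics measure mechanical effort rather than net work.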

Robot Motion Smoothness: Spectral arc length (SAL)³⁰ is computed using the local robot’s end-effector speed for each trial to quantify motion smoothness. SAL reflects the participant’s ability to execute fluid and controlled movements, which is crucial for precision and efficiency in robotic manipulation tasks. It is derived by computing the SAL of the frequency spectra of the speed profiles of the end-effector of the local arm during a trial. SAL is typically a negative metric when computed using speed profiles, and values closer to zero indicate smoother profiles.

Electrodermal and cardiovascular measurements

Galvanic skin response (GSR) is also known as skin conductance or electrodermal activity (EDA), and is measured using exosomatic methods that pass a small current, either AC or DC, through the skin to determine its resistance^31,32. GSR is significantly influenced by eccrine sweat glands³¹, which are regulated by the sympathetic nervous system. When a person is aroused, sweat gland activity increases, leading to higher skin conductance. These glands are linked to psychological responses and stimulus processing, unlike other glands primarily involved in temperature regulation³².

We used a Shimmer3 GSR+ device to record GSR data. This device uses reusable electrodes attached to two fingers on one hand to measure the electrical conductance. The GSR data were acquired at 10 Hz. A third-order low-pass filter with a cutoff frequency of 1 Hz was applied to the GSR signal to remove motion artefacts³³. Since GSR signals vary significantly between individuals, we normalised each participant’s data using mean normalisation³⁴.

We use the average normalised GSR as a workload measure. This metric is calculated by subtracting the mean GSR value of each trial from the raw time series data for each participant and then averaging the result:

$$\begin{aligned} \text {average GSR} = \frac{1}{N} \sum _{i=1}^{N} \left( \text {GSR}_i - \frac{1}{N} \sum _{k=1}^{N} \text {GSR}_k \right) , \end{aligned}$$

(6)

where N is the number of data samples in a trial.

Cardiovascular response is expected to be significantly impacted by increasing workload³⁵. Elevated workload typically results in heightened sympathetic activity and diminished parasympathetic activity³⁶. Under increased cognitive load, the sympathetic nervous system response is expected to lead to increased heart rate and decreased heart rate variability.

Shimmer3 GSR+ device was also used to measure Photoplethysmography (PPG) data by attaching an optical pulse probe to a finger. The AC component of the PPG signal indicates the heart’s rhythmic blood volume changes. PPG also efficiently measures Pulse Rate Variability (PRV) based on the time intervals between adjacent cardiac cycles^37,38. The PPG data were acquired at 10 Hz. We extracted the following metrics using the PPG data:

Pulse Rate: Pulse rate is the rate at which the arteries expand and contract in response to heartbeats due to blood volume changes in the microvascular bed of tissue. It is closely related to the heart rate and reflects the same physiological process^39,40. The pulse rate is computed as the ratio of 60 seconds to the time interval between consecutive PPG peaks. The average pulse rate is computed as the mean of these individual pulse rate values over a given period:

$$\begin{aligned} PR = \frac{1}{P-1} \sum _{i=1}^{P-1} \left( \frac{60}{\Delta t_i} \right) , \end{aligned}$$

(7)

where P is the number of PPG peaks in a trial, and $\Delta t_i$ is the time duration between two consecutive peaks in a trial, also known as the interbeat interval (IBI).

Pulse Rate Variability: Closely related to heart rate variability, pulse rate variability refers to the variation in time intervals between consecutive beats as measured by changes in blood volume. This metric indicates the heart’s adaptability and the overall balance of the autonomic nervous system. It is obtained by measuring the time intervals between consecutive peaks in PPG data during a trial. In this study, the average pulse rate variability is computed as:

$$\begin{aligned} PRV = \frac{1}{P-1} \sum _{i=1}^{P-1} \Delta t_{i} , \end{aligned}$$

(8)

where P is the number of PPG peaks in a trial, and $\Delta t_i$ is the time duration between two consecutive peaks in a trial.
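Given the timestamps of detected PPG peaks, Eqs. (7) and (8) reduce to simple averages over the interbeat intervals. The sketch below assumes peak detection has already been done; the function names are hypothetical. Note that Eq. (8), as written, is the mean interbeat interval, and the code follows the equation literally.

```python
def interbeat_intervals(peak_times):
    """Time differences (s) between consecutive PPG peaks: P peaks give P - 1 intervals."""
    return [t1 - t0 for t0, t1 in zip(peak_times, peak_times[1:])]


def pulse_rate(peak_times):
    """Eq. (7): mean of 60 / IBI over a trial, in beats per minute."""
    ibis = interbeat_intervals(peak_times)
    return sum(60.0 / ibi for ibi in ibis) / len(ibis)


def pulse_rate_variability(peak_times):
    """Eq. (8) as written: mean of the interbeat intervals over a trial, in seconds."""
    ibis = interbeat_intervals(peak_times)
    return sum(ibis) / len(ibis)
```

For peaks at 0.0 s, 0.5 s and 1.5 s, the intervals are 0.5 s and 1.0 s, giving a pulse rate of (120 + 60) / 2 = 90 beats per minute and a mean interval of 0.75 s.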

Brain activity measurements

The assessment of brain activity has become increasingly popular as a physiological measure of workload, based on the idea that task-related brain activity consumes specific mental resources, correlating with the cognitive complexity of a task⁴¹. Functional near-infrared spectroscopy (fNIRS) offers portable, movement-friendly imaging that effectively measures workload through variations in brain hemoglobin levels in response to varying working memory loads²³. Unlike fMRI and Positron Emission Tomography (PET), which rely on neurovascular coupling^42,43, or EEG and Magnetoencephalography (MEG), which are based on electromagnetic brain activity^44,45, fNIRS offers robust monitoring of cortical hemodynamics with minimal sensitivity to bodily movements⁴⁶. fNIRS is thus beneficial in investigating workplace tasks, providing insights into areas for improvement through enhanced interfaces or systematic assistance.

In this study, we measure brain activity using the Octamon wireless fNIRS system (Artinis Medical Systems, Elst) and the Oxysoft software (version 3.2.72). The fNIRS probe covers the frontal cortex (over the forehead) where mental workload is typically observed, as shown in Fig. 5. The Octamon has 8 infrared light emitters and 2 detectors, creating 8 distinct data channels, including two short-separation channels (S4 and S6 as shown in Fig. 5), which were all acquired at 10 Hz. In the regular data channels, emitters and detectors are 35 mm apart, so that these channels measure activity in the brain. The short-separation channels (S4 and S6), at 10 mm, measure blood oxygenation in the skin, so that this signal can be subtracted as noise. fNIRS emitters use 760 nm and 850 nm wavelengths, which reflect oxygenated and deoxygenated hemoglobin, respectively:

Oxygenated Hemoglobin (HbO) is the amount of blood cells containing oxygen, indicating activation in the brain.
Deoxygenated Hemoglobin (HbR) is the amount of cells where oxygen has been used. HbR should change inversely to HbO in order for HbO changes to be considered as workload changes.

Brain activation leads to an increase in HbO in order to function, and in tandem this should lead to a decrease in HbR. Therefore, only channels with a negatively correlated haemoglobin species (an increased HbO and decreased HbR at $q < 0.5$) are considered significant⁴⁷. An increase or decrease in both HbO and HbR at the same time, or a change in only one of the variables is considered a noise artefact.
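The channel-level decision rule described above can be written as a small classifier. This is a sketch under assumptions rather than the authors' analysis code: the function name and labels are hypothetical, the inputs are taken to be one GLM beta and one corrected q-value per haemoglobin species, and the significance threshold is passed in by the caller. The text does not say how a significant HbO decrease paired with an HbR increase is treated, so the sketch labels that case separately.

```python
def classify_channel(beta_hbo, q_hbo, beta_hbr, q_hbr, q_threshold):
    """Label one fNIRS channel from the HbO and HbR effects.

    Returns "activation" when HbO increases and HbR decreases (both significant),
    "artefact" when both change in the same direction or only one changes,
    "no change" when neither is significant, and "unclassified" otherwise.
    """
    hbo_sig = q_hbo < q_threshold
    hbr_sig = q_hbr < q_threshold
    if not hbo_sig and not hbr_sig:
        return "no change"
    if hbo_sig != hbr_sig:
        return "artefact"  # only one species changed
    if (beta_hbo > 0) == (beta_hbr > 0):
        return "artefact"  # both species moved in the same direction
    if beta_hbo > 0 and beta_hbr < 0:
        return "activation"
    return "unclassified"  # HbO down with HbR up is not covered by the stated rule
```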

The analysis of the fNIRS data was carried out using the NIRS Toolbox⁴⁸. Initially, the raw data were downsampled to a frequency of 4 Hz to improve the signal-to-noise ratio⁴⁹. The raw signals were then converted into optical density changes and subsequently into measures of oxygenated (HbO) and deoxygenated hemoglobin (HbR) using the Beer-Lambert law, incorporating a partial path length correction factor of 0.1 for both wavelengths⁵⁰. To mitigate the impact of motion artifacts, the Temporal Derivative Distribution Repair (TDDR) method was applied. Additionally, a short-separation channel was included as a regressor in the General Linear Model (GLM) to correct for physiological noise⁵¹. Beta coefficients, indicative of activations, were determined using the autoregressive iteratively reweighted least squares method⁵². Finally, the hemodynamic response was modeled using the BoxCar function.

Results

Subjective perceptions of workload

Figure 6 shows the subjective workload ratings. These Likert scale data were analysed using non-parametric tests due to the ordinal nature of the responses. Specifically, the Wilcoxon signed-rank paired test was used for comparisons between T and NT, and the Kruskal–Wallis test was used for discovering main effects of the n-back condition. In case of main effects, Dwass-Steel-Critchlow-Fligner (DSCF) non-parametric post-hoc tests were used for pairwise comparisons between n-back levels. As DSCF is designed to control the familywise error rate inherently, we do not perform additional alpha correction for multiple comparisons.

Perceived cognitive workload

A Wilcoxon signed-rank test revealed a significant difference in perceived cognitive workload between the T and NT conditions ($z = -8.263, p < 0.001$). The Ranks table showed that there were 13 negative ranks ($N = 13, {\text {Mean Rank}} = 28.00, \text {Sum of Ranks} = 364.00$), 101 positive ranks ($N = 101, {\text {Mean Rank}} = 61.30, \text {Sum of Ranks} = 6191.00$), and 26 ties ($N = 26$), demonstrating that the perceived cognitive workload ratings were generally higher under the teleoperation condition compared to the no teleoperation condition. This suggests that participants perceived teleoperation as imposing additional cognitive demands rather than solely physical ones.
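As a rough consistency check, the z statistic can be approximated from the two rank sums reported above. The sketch below uses the standard normal approximation without a tie correction; the function name is hypothetical, and the source does not say which software or correction produced the reported value.

```python
import math


def wilcoxon_z_from_rank_sums(sum_negative: float, sum_positive: float) -> float:
    """Normal-approximation z for the Wilcoxon signed-rank test (no tie correction).

    The number of non-zero differences n is recovered from the identity
    sum_negative + sum_positive = n * (n + 1) / 2.
    """
    total = sum_negative + sum_positive
    n = round((-1 + math.sqrt(1 + 8 * total)) / 2)
    if n * (n + 1) / 2 != total:
        raise ValueError("rank sums do not add up to n(n+1)/2 for an integer n")
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    w = min(sum_negative, sum_positive)  # the smaller rank sum is the test statistic
    return (w - mean) / sd


# Rank sums reported for perceived cognitive workload (T versus NT).
z = wilcoxon_z_from_rank_sums(364.0, 6191.0)
print(round(z, 2))
```

With these inputs the approximation gives roughly $-8.24$, close to the reported $z = -8.263$. A tie correction shrinks the variance and so increases the magnitude of z, which is the direction of the small gap seen here.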

Kruskal–Wallis tests revealed significant differences in perceived cognitive workload between the different levels of the n-back task under NT ($\chi ^2(3) = 107, p < 0.001, \varepsilon ^2 = 0.750$) and under T ($\chi ^2(3) = 80.0, p < 0.001, \varepsilon ^2 = 0.560$). Pairwise post-hoc comparisons indicated significant differences between all levels of the n-back task as summarised in Table 2, with lower n-back levels consistently corresponding to lower perceived cognitive workload.
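For readers who want to reproduce this kind of test, the following is a minimal pure-Python sketch of the tie-corrected Kruskal–Wallis H statistic. The effect size is taken as $\varepsilon^2 = H / (N - 1)$, which is one common definition and an assumption here, since the source does not state the formula it used. The function names are hypothetical.

```python
from collections import Counter


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis(groups):
    """Return (H, epsilon_squared) with the usual tie correction applied to H."""
    pooled = [x for g in groups for x in g]
    n_total = len(pooled)
    ranks = average_ranks(pooled)
    h = 0.0
    start = 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        h += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12 / (n_total * (n_total + 1)) * h - 3 * (n_total + 1)
    # Tie correction; undefined if every observation is identical.
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    h /= 1 - ties / (n_total ** 3 - n_total)
    return h, h / (n_total - 1)


# Illustrative Likert-style ratings, not data from the study.
h, eps2 = kruskal_wallis([[1, 1, 2], [2, 3, 3]])
print(round(h, 3), round(eps2, 3))
```

Likert ratings produce many ties, so the tie correction matters more here than it would for continuous data.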

These results confirm that the n-back task manipulated perceived cognitive workload as expected under the NT condition, and that the observed significant differences were maintained while performing teleoperation in T.

Table 2 Pairwise comparisons of perceived cognitive workload between n-back conditions under NT and T.


Perceived physical workload

A Wilcoxon signed-rank test revealed a significant difference in perceived physical workload between the T and NT conditions ($z = -9.988, p < 0.001$). The Ranks table showed that there were 0 negative ranks ($N = 0$), indicating that no perceived physical workload ratings under the teleoperation condition were lower than the perceived physical workload ratings under the no teleoperation condition. However, there were 132 positive ranks ($N = 132, {\text {Mean Rank}} = 66.50, \text {Sum of Ranks} = 8778.00$) and 8 ties ($N = 8$), indicating that perceived physical workload ratings were generally higher under the teleoperation condition compared to the no teleoperation condition. This suggests that physical workload was significantly greater during teleoperation, as expected.

The Kruskal–Wallis test revealed no significant differences in perceived physical workload under NT across the different levels of the n-back task ($\chi ^2(3) = 0.590, p = 0.899, \varepsilon ^2 = 0.0004$). This indicates that participants did not experience significantly different levels of physical workload during n-back tasks alone.

On the other hand, under T, the Kruskal–Wallis test revealed significant differences in physical workload across the different levels of the n-back task ($\chi ^2(3) = 11.2, p = 0.011, \varepsilon ^2 = 0.007$). Pairwise comparisons indicated that physical workload was perceived to be significantly lower in the 0-back condition compared to the 3-back condition ($W = 4.15, p = 0.018$), but other comparisons were not significant. The results are presented in Table 3.

Table 3 Pairwise comparisons of perceived physical workload between n-back conditions under NT and T.


Perceived overall workload

A Wilcoxon signed-rank test revealed a significant difference in perceived overall workload between the T and NT conditions ($z = -9.372, p < 0.001$). The Ranks table showed that there were 6 negative ranks ($N = 6, {\text {Mean Rank}} = 27.50, \text {Sum of Ranks} = 165.00$), 120 positive ranks ($N = 120, {\text {Mean Rank}} = 65.30, \text {Sum of Ranks} = 7836.00$), and 14 ties ($N = 14$), demonstrating that the perceived overall workload ratings were generally higher under the teleoperation condition compared to the no teleoperation condition. This suggests that the overall workload was significantly higher while performing teleoperation.

The Kruskal–Wallis test revealed significant differences in overall workload across the different levels of the n-back task under NT ($\chi ^2(3) = 85.2, p < 0.001, \varepsilon ^2 = 0.596$) and T ($\chi ^2(3) = 53.1, p < 0.001, \varepsilon ^2 = 0.372$). Pairwise comparisons indicated significant differences between all levels of the n-back task, except between 1-back and 2-back in T, as summarised in Table 4. Lower n-back levels consistently corresponded to lower perceived overall workload. These findings suggest that perceptions of overall workload increase consistently with the difficulty of the n-back tasks. This trend primarily echoes the results of the cognitive workload component rather than the experienced physical workload.

Table 4 Pairwise comparisons of perceived overall workload between n-back conditions under NT and T.


N-back task performance

Figure 7 shows the mean accuracy of the n-back task for all conditions, which is the percentage of correctly identified responses. A Shapiro–Wilk test revealed that the data were not normally distributed ($W = 0.683, p < 0.001$), so the n-back task performance was analysed using non-parametric tests. Specifically, the Wilcoxon signed-rank paired test was used for comparisons between T and NT, and the Kruskal–Wallis test was used for discovering main effects of the n-back condition. In case of main effects, DSCF was used for pairwise comparisons between n-back levels.

A Wilcoxon signed-rank test revealed a significant difference in average accuracy between the NT and T conditions ($z = -4.394, p < 0.001$). The Ranks table showed that there were 27 negative ranks ($N = 27, {\text {Mean Rank}} = 37.57, \text {Sum of Ranks} = 1014.50$), 65 positive ranks ($N = 65, {\text {Mean Rank}} = 50.21, \text {Sum of Ranks} = 3263.50$), and 13 ties ($N = 13$), demonstrating that n-back performance was generally higher under the no teleoperation condition compared to the teleoperation condition. The results indicate that performing a teleoperation task simultaneously with the n-back task reduced the n-back performance.

Consistent with findings in the literature for n-back tasks performed without additional stimuli⁵³,⁵⁴, Kruskal–Wallis tests revealed significant differences in accuracy across the different levels of the n-back task under both NT ($\chi ^2(3) = 106, p < 0.001, \varepsilon ^2 = 0.763$) and T conditions ($\chi ^2(3) = 89.3, p < 0.001, \varepsilon ^2 = 0.642$).

Pairwise comparisons showed that under NT, accuracy was significantly better in the 1-back condition compared to the 2-back ($W = -8.04, p < 0.001$) and the 3-back ($W = -8.17, p < 0.001$) conditions. Under T, accuracy was significantly higher in the 1-back condition compared to the 3-back condition ($W = -5.57, p < 0.001$) and higher in the 2-back condition compared to the 3-back condition ($W = -3.96, p = 0.026$).

Teleoperation performance

Several metrics are utilised to represent teleoperation performance, as detailed in “Section Metrics”. Since robot performance data exist only in the T condition, analyses were conducted using one-way ANOVAs, followed by Tukey post-hoc multiple comparison tests. ANOVA showed no significant differences in the number of ring touches or the ring contact time between the four n-back conditions. The means of the other performance measures are illustrated in Fig. 8.
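The F statistic behind a one-way ANOVA of this kind can be computed as in the minimal sketch below. The function name is hypothetical, and like any plain one-way ANOVA it treats all observations as independent. For reference, degrees of freedom of $(3, 416)$ correspond to four groups and 420 observations in total.

```python
def one_way_anova(groups):
    """Return (F, df_between, df_within) for a one-way between-groups ANOVA."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own group mean.
    ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n_total - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


# Illustrative values only, not data from the study.
f, df_b, df_w = one_way_anova([[1, 2, 3], [2, 3, 4], [3, 4, 5]])
print(f, df_b, df_w)
```

The p-value would then be read from the F distribution with these degrees of freedom, which is left out here to keep the sketch dependency-free.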

Average Speed of the End Effector. ANOVA indicated significant differences in average speed of the remote robot’s end effector ($F(3,416) = 2.761, p = 0.042$). Post-hoc analysis using the Tukey test demonstrated that the average speed in the 0-back condition ($0.0357 \pm 0.0149$ m/s) was significantly higher than in the 3-back condition ($0.0303 \pm 0.0149$ m/s, $p = 0.044$). No other comparisons between n-back conditions showed significant differences in average speed.

Average Power Consumption. ANOVA revealed significant differences between n-back conditions in average power consumption ($F(3,416) = 5.222, p = 0.002$). The Tukey post-hoc test indicated that the average power was significantly lower in the 2-back ($1.59 \pm 0.84$ W, $p = 0.041$) and 3-back conditions ($1.45 \pm 0.82$ W, $p = 0.001$) compared to the 0-back condition ($1.93 \pm 0.99$ W). There were no significant differences found in comparisons between other workload conditions.

Total Energy Consumption. ANOVA also showed significant differences in the total energy consumption between workload conditions ($F(3,416) = 3.622, p = 0.013$). Post-hoc Tukey tests revealed that energy consumed in the 0-back condition ($64.79 \pm 42.51$ J) was significantly higher than in the 3-back condition ($48.03 \pm 35.03$ J, $p = 0.009$). No significant differences in energy consumption were observed between the other workload conditions.

Robot Motion Smoothness. ANOVA for robot motion smoothness across different n-back conditions revealed no significant differences ($F(3,416) = 1.251, p = 0.291$).

Electrodermal and cardiovascular measurements

Galvanic skin response (GSR) and photoplethysmography (PPG) data were recorded from 34 of the 35 participants, as data recording failed for one individual. Mean normalised GSR, heart rate and heart rate variability were computed from the recorded data as described in “Section Electrodermal and cardiovascular measurements”. Figure 9 illustrates the means and confidence intervals for these physiological variables for the teleoperation conditions across all levels of the n-back task.

Galvanic skin response (GSR)

A two-way ANOVA revealed a significant main effect of the teleoperation condition on the mean-normalised GSR ($F(1, 800) = 3.993, p = 0.046$), with a very small effect size (partial $\eta ^2 = 0.005$). Specifically, GSR was higher in T, compared to NT. The main effect of the n-back conditions was not significant ($F(3, 800) = 1.148, p = 0.329$, partial $\eta ^2 = 0.004$). No interaction effects were found ($F(3, 800) = 1.484, p = 0.217$, partial $\eta ^2 = 0.006$).
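Partial $\eta^2$ can be recovered from an F ratio and its degrees of freedom through the standard identity $\eta_p^2 = F \cdot df_{\text{effect}} / (F \cdot df_{\text{effect}} + df_{\text{error}})$. The short sketch below applies it to the GSR statistics reported above; the function name is hypothetical.

```python
def partial_eta_squared(f_value: float, df_effect: int, df_error: int) -> float:
    """Partial eta squared recovered from an F ratio and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# F statistics for GSR as reported in the paragraph above.
reported = {
    "teleoperation": (3.993, 1, 800),
    "n-back": (1.148, 3, 800),
    "interaction": (1.484, 3, 800),
}
for name, (f_value, df_effect, df_error) in reported.items():
    print(name, round(partial_eta_squared(f_value, df_effect, df_error), 3))
```

Rounded to three decimals, this gives 0.005, 0.004 and 0.006, matching the reported effect sizes.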

Pulse rate (PR)

A two-way ANOVA revealed that the main effect of teleoperation was statistically significant ($F(1, 800) = 4.073, p = 0.044$, partial $\eta ^2 = 0.005$), suggesting that teleoperation has an effect on pulse rate, albeit with a small effect size. The main effect of the n-back condition on pulse rate was not significant ($F(3, 800) = 0.011, p = 0.998$, partial $\eta ^2 = 0.000$), suggesting that an increasing cognitive demand does not significantly influence pulse rate, although physical demand does. No significant interaction effects were found ($F(3, 800) = 0.063, p = 0.979$, partial $\eta ^2 = 0.000$).

Pulse rate variability (PRV)

A two-way ANOVA revealed that neither the main effect of teleoperation ($F(1, 800) = 1.656, p = 0.199$, partial $\eta ^2 = 0.002$) nor the main effect of the working memory conditions ($F(3, 800) = 0.090, p = 0.966$, partial $\eta ^2 < 0.001$) on PRV was statistically significant. This suggests that neither teleoperation nor working memory conditions alone impacted the PRV. No significant interaction effects were found ($F(3, 800) = 0.090, p = 0.965$, partial $\eta ^2 = 0.000$).

Brain activity measurements

To analyse fNIRS data, we employed a General Linear Model (GLM) approach. This method allows for the subtraction of data from the short-distance channels and considers the shape of the fNIRS signal rather than its mean values. After performing the GLM analysis, we applied a mixed-effects model to each channel independently to assess the impact of each experimental condition as fixed effects, with participants treated as random effects. This model helps identify significant changes in oxyhemoglobin (HbO) and deoxyhemoglobin (HbR) levels, attributing these changes to the independent variables for each fNIRS channel. Shifts in blood oxygenation indicate brain activation in response to mental workload. Specifically, an activation due to an increase in mental workload is characterised by an increase in HbO and a decrease in HbR. For multiple comparisons, we utilised the false discovery rate (FDR) correction, setting the significance threshold at 0.05 ($q \le 0.05$) as per Benjamini and Hochberg’s method⁵⁵. Finally, we conducted contrast analyses to evaluate differences between experimental conditions. For any significant differences found, we verified that the changes were present in opposite directions for both HbO and HbR to ensure that the observed changes were due to neural activation rather than artefacts, such as those caused by motion.
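The Benjamini–Hochberg step can be illustrated with a short pure-Python sketch that turns raw p-values into adjusted q-values. This is a generic implementation of the procedure rather than the toolbox's own code, and the function name and example p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values (q-values), in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that q-values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        q_values[idx] = running_min
    return q_values


# Illustrative per-channel p-values, not results from the study.
p = [0.01, 0.04, 0.03, 0.005]
q = benjamini_hochberg(p)
significant = [qi <= 0.05 for qi in q]
print([round(qi, 3) for qi in q], significant)
```

A channel is then retained when its q-value is at or below 0.05, after which the opposite-direction check on HbO and HbR is applied.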

We observed a significant HbR decrease in channel S5-D2, located over the left dorsolateral prefrontal cortex (DLPFC), in response to the teleoperation variable irrespective of the n-back condition ($\beta = 1.86, SE = 3.80, t = -7.09, q < 0.001, \text{power} = 0.96$). However, there was no corresponding significant increase in HbO. The mixed effects model identified a significant difference over channel S5-D2 between T and NT, while performing the 1-back task.

As shown in Fig. 10, we observed a significant increase in HbO levels ($\beta = 3.10, SE = 0.99, t = 3.10, q = 0.012, \text{power} = 0.90$) and a significant decrease in HbR levels ($\beta = -4.60, SE = 0.93, t = -4.90, q < 0.001, \text{power} = 0.96$). We also observed a significant increase in HbO in channel S8-D2, which is also over the left DLPFC, in the 0-back task under T condition compared to NT ($\beta = 6.39, SE = 2.12, t = 3.00, q = 0.035, \text{power} = 0.42$). However, contrary to normal expectations, this increase in HbO was not accompanied by a corresponding decrease in HbR, meaning that it cannot be confirmed as significant.

The mixed effects model did not identify a consistent significant effect of n-back conditions on brain activity. Under the 1-back condition, we observed a significant decrease in HbR in S8-D2 ($\beta = -1.62, SE = 0.47, t = -3.45, q = 0.23, \text{power} = 0.76$). Similarly, under the 3-back condition, a significant decrease in HbR was observed in S8-D2 ($\beta = -1.98, SE = 0.47, t = -4.21, q = 0.02, \text{power} = 0.76$). However, neither of these was associated with a significant increase in HbO, regardless of whether participants were performing teleoperation or not.

Under NT, comparing the 1-back task to the 0-back task, our analysis found a significant decrease in HbO ($\beta = -5.50, SE = 0.99, t = -5.51, q < 0.001, \text{power} = 0.89$) and a significant increase in HbR ($\beta = 4.53, SE = 0.93, t = 4.84, q < 0.001, \text{power} = 0.95$) in the left DLPFC, this time in channel S5-D2. We note here that this response is contrary to what is normally expected. More in line with what is expected, there was a significant increase in HbO ($\beta = 4.13, SE = 0.99, t = 4.16, q < 0.001, \text{power} = 0.90$) and a decrease in HbR ($\beta = -2.51, SE = 0.93, t = -2.69, q = 0.04, \text{power} = 0.96$) in the same channel, when comparing the 2-back task to the 1-back task. Similarly, we observed a significant increase in HbO ($\beta = 5.13, SE = 0.99, t = 5.15, q < 0.001, \text{power} = 0.90$) and a decrease in HbR ($\beta = -3.56, SE = 0.93, t = -3.80, q < 0.001, \text{power} = 0.95$) in S5-D2 during the 3-back task in contrast to the 1-back task.

Under T, the mixed effects model did not find the same significant differences. The only significant difference found was between the 0-back and 1-back conditions: there was a noticeable increase in HbO ($\beta = 2.50, SE = 0.99, t = 2.51, q = 0.007, \text{power} = 0.90$); however, there was no corresponding decrease in HbR, indicating potential noise. This implies that the significant differences observable by fNIRS when performing different n-back tasks alone were no longer observable for the workload induced by the teleoperation task.

Discussion

This study evaluated the effectiveness of various metrics in assessing workload as a combination of physical and cognitive demands during robotic teleoperation tasks. These metrics included teleoperation performance, physiological data, and subjective workload scores. Notably, the number of ring touches and the error duration did not vary significantly across conditions, suggesting that participants may have adapted their strategies to manage workload changes while maintaining accuracy in avoiding ring-wire contact. In the rest of this section, we discuss the results, highlighting key findings, study limitations, and directions for future work.

How workload was experienced during teleoperation

Performance measures and subjective ratings confirmed that workload was successfully manipulated by the primary teleoperation task and the secondary n-back task. Comparative analysis between Teleoperation (T) and No-Teleoperation (NT) conditions revealed that teleoperation adds a layer of workload, reducing n-back task accuracy while increasing perceived cognitive, physical, and overall workload. In NT, participants performed better on lower-complexity n-back tasks (1-back) compared to higher complexity ones (3-back). This performance difference persisted during teleoperation in the T condition, consistent with previous studies using the n-back task⁵³,⁵⁴.

Subjective workload perceptions varied significantly across cognitive, physical, and overall workload variables under the teleoperation and n-back conditions. Perceived cognitive workload increased with n-back task difficulty, and these effects were amplified under teleoperation, indicating an extra cognitive load beyond the n-back task alone. Perceived physical workload was also significantly higher in T, compared to NT, particularly in the most challenging n-back condition (3-back), suggesting that physical demands are heightened when cognitive workload is high. This perhaps indicates that physical workload during teleoperation is exacerbated by high cognitive workload. Overall workload trends were influenced by both cognitive and physical workload components, indicating that participants’ general workload experience during teleoperation is not determined solely by one factor.

These findings highlight that secondary task complexity and the existence of teleoperation significantly affect the perceived workload. The comparison between tasks done with and without teleoperation underscores the extra cognitive and physical effort required for teleoperation, emphasising the importance of understanding the impact of secondary tasks on operators.

Measuring workload during teleoperation for dynamic task demand

While subjective workload scores and task performance varied with n-back task difficulty during teleoperation, objective physiological measures did not show discernible differences across these levels. GSR and PR differed significantly between T and NT conditions, yet remained unaffected by the different cognitive workload levels imposed by the n-back tasks. In other words, although these measures responded to teleoperation, they did not track cognitive demands as straightforwardly as might be intuitively expected.

This suggests that physiological stress responses, as reflected in GSR and PR, are more closely tied to the demands of the teleoperation task itself than to the cognitive complexity of working memory tasks. This aligns with prior findings, which suggest that such variables, along with breathing rate, are better indicators of accumulated fatigue over time⁵⁶ than of cognitive workload fluctuations within a task.

fNIRS was able to detect differences between n-back tasks in NT; however, these differences were not observable during the high-workload primary teleoperation task in T. The analysis showed a clear pattern of brain activity in response to increasing mental workload from 0-back to 3-back tasks in NT, but not in T. Notably, moving from 0-back to 1-back tasks resulted in a decrease in HbO and an increase in HbR (the opposite of what is normally expected), indicating a significantly lower level of neural response to the 1-back task. The unexpected initial decrease in neural response from 0-back to 1-back might suggest that the 0-back condition engages participants differently, possibly due to its demand for continuous response (i.e., saying no to every stimulus) rather than a decisive response to stimuli. Anecdotally, we have found it hard in our past work to make sure participants do not mind-wander when doing near-zero effort repetitive tasks. This could lead to increased cognitive workload levels during 0-back, which then reduce when participants are engaged in a low workload 1-back task. As the tasks became more challenging (from 1-back to 3-back), the expected increases in HbO and decreases in HbR were observed, reflecting increased neural engagement and cognitive effort.

When comparing T and NT, only the 1-back task showed a significant increase in HbO and a decrease in HbR during teleoperation. This suggests that teleoperation introduces an additional layer of workload or requires a different cognitive strategy, as evidenced by the distinct neural activation patterns. Another explanation for this outcome could be that the 1-back task strikes a balance: it is sufficiently engaging cognitively, but not overly demanding. This moderate level of difficulty, combined with teleoperation, might be enough to see a significant increase in workload. However, the same significant difference was not observed between T and NT for other levels of the n-back task, implying again that fNIRS may not be sufficiently sensitive to distinguish between already high levels of cognitive workload.

These findings are relevant for ongoing research on using physiological data to adapt task demand in human-robot interaction. The lack of objective physiological measure differences between 1-back, 2-back, and 3-back conditions during teleoperation could be attributed to several factors. One possibility is the redistribution of cognitive resources, where individuals allocate cognitive capabilities in a way that task difficulty differences do not alter their measured cognitive workload. Another possibility is that fNIRS may not have the necessary sensitivity to detect subtle variations in cognitive workload during high workload tasks like teleoperation. Furthermore, teleoperation involves physical movements, which can introduce motion artefacts into fNIRS data, affecting brain signal quality and making it challenging to accurately measure brain activity related to teleoperation. Therefore, given the complexity of the observed neural responses, the potential for motion artefacts inherent in teleoperation tasks, and the limited sensitivity to discrete variations in cognitive workload, our analysis suggests that, while fNIRS may provide useful insights in certain contexts, it may not be ideal for measuring nuanced workload variations in teleoperation.

Implications for teleoperation and human-machine interface

Increased cognitive workload negatively impacted teleoperation effectiveness across several dimensions, including remote end-effector speed, power usage, and energy expenditure, while robot motion smoothness remained largely unaffected.

Significant differences in performance indicators, particularly between the least and most demanding cognitive conditions, underscore how elevated cognitive demands compromise teleoperation. Operators facing higher workload levels may prioritise precision over speed to reduce errors, a conservative strategy often observed in various domains under increased cognitive demands. Significant differences in power and energy consumption, with lower values in higher n-back conditions, suggest that operators adopt slower, more efficient maneuvers when under cognitive strain, further indicating a cautious approach. The consistency in robot motion smoothness across cognitive workload levels may reflect either resilience in basic motor skills required for teleoperation, or a limitation in the sensitivity of smoothness measures used in the study.

Our findings suggest that a combination of subjective reports and teleoperation performance indicators offers the most robust tools for measuring workload in teleoperation. These methods provide insights into both the operator’s subjective experience and the challenges faced during the task. These insights are essential for designing teleoperation systems and interfaces for human-machine interaction. They reinforce existing research emphasizing that teleoperation interface design critically influences workload and hence affects operator performance. Misalignment in interface design can increase workload, potentially compromising operational efficiency and increasing cognitive strain on operators⁵⁷. Therefore, interface design should aim to minimize workload and stress by observing, measuring and responding to how users interact with teleoperation tasks and adapting to users’ needs and responses. In addition, strategies such as incorporating partial automation and prioritizing interface consistency, reliability, and transparency could enhance operator satisfaction and efficiency by reducing the cognitive demands of robot teleoperation and facilitating behavior prediction.

Limitations of the study

This study presents findings that should be considered alongside certain limitations related to participant experience, setup complexity, and metric scope. The participant pool, primarily consisting of University of Nottingham students and staff with limited teleoperation experience, may not fully represent operators in real-world settings. Although participants received training and participated in practice sessions before the experimental trials, experienced teleoperators with substantial field experience may exhibit different operational patterns and physiological responses, having developed skills and strategies to manage cognitive and physical demands more effectively.

Additionally, the experimental setup was basic, involving only two Franka Emika robotic arms separated by a curtain. This simple arrangement does not reflect the complexity of real-world teleoperation control rooms, which often feature advanced information systems, sophisticated control interfaces, complex visual displays, and strict supervision protocols. These additional elements could significantly impact teleoperator performance and workload, aspects not captured in this study.

Finally, the study focused on a limited range of performance and physiological metrics. Examining a broader set of metrics could offer a more comprehensive understanding of teleoperation under different workload levels. Additional performance measures, such as path efficiency, along with further physiological metrics, such as gaze tracking, detailed cardiovascular responses, and muscle activity through electromyography (EMG), could provide deeper understanding around the impacts of workload on teleoperation.

Conclusion

In teleoperation, where multitasking is expected under critical conditions with little freedom to switch between primary tasks and accompanying sub-tasks, it becomes crucial to investigate how workload varies with primary and secondary task demands. Most existing studies in teleoperation have focused extensively on measuring cognitive workload and physical workload relating to feedback limitations, such as latency in sensory feedback. This study is a first to explore the cognitive and physical aspects of teleoperation separately in a controlled factorial design study, aiming to identify how high workload levels from various sources can be detected. An extended goal of this work is to investigate how we can dynamically manage task demands, offering tailored support when users experience high workload.

Our results clearly show that both the primary task (teleoperation) and the secondary task (working memory) impacted performance and subjective perceptions of workload. Physiological measures related to galvanic skin response and cardiovascular activity showed significant differences in different levels of the primary task but remained unaffected by the secondary task demand. Finally, fNIRS successfully detected changes in cognitive workload for different levels of the n-back working memory task, but not for workload variations during teleoperation.

Among the approaches investigated during this study, task performance measurements were the most robust in recognising variations in task demand during a high mental workload teleoperation task. Although a larger study sample might allow the physiological measures to reach significance, future research could explore additional objective physiological measures like breathing patterns, pupil dilation, and facial thermography, to see if changes in these are observable across different levels of workload during teleoperation.

Data availability

The datasets used and/or analysed during the current study are available from the corresponding author upon reasonable request.

References

Talha, M. et al. Towards robotic decommissioning of legacy nuclear plant: Results of human-factors experiments with tele-robotic manipulation, and a discussion of challenges and approaches for decommissioning. In 2016 IEEE International Symposium on Safety, Security, and Rescue Robotics (SSRR), 166–173 (IEEE, Lausanne, Switzerland, 2016).
Schmaus, P. et al. Knowledge driven orbit-to-ground teleoperation of a robot coworker. IEEE Robot. Autom. Lett. 5, 143–150 (2019).
Xia, P., Xu, F., Song, Z., Li, S. & Du, J. Sensory augmentation for subsea robot teleoperation. Comput. Ind. 145, 103836 (2023).
Alambeigi, F. et al. A curved-drilling approach in core decompression of the femoral head osteonecrosis using a continuum manipulator. IEEE Robot. Autom. Lett. 2, 1480–1487 (2017).
Gong, D., Zhao, J., Yu, J. & Zuo, G. Motion mapping of the heterogeneous master-slave system for intuitive telemanipulation. Int. J. Adv. Robot. Syst. 15, 1729881417748134. https://doi.org/10.1177/1729881417748134 (2018).
Ju, C. & Son, H. I. Human-centered evaluation of shared teleoperation system for maintenance and repair tasks in nuclear power plants. Int. J. Control Autom. Syst. 20, 3418–3432. https://doi.org/10.1007/s12555-021-0770-0 (2022).
Wang, J. et al. Control strategy of master-slave manipulator based on force feedback for decommissioning of nuclear facilities. Math. Probl. Eng. 2022, 1–9. https://doi.org/10.1155/2022/9945758 (2022).
Toet, A., Kuling, I. A., Krom, B. N. & Van Erp, J. B. Toward enhanced teleoperation through embodiment. Front. Robot. AI 7, 14 (2020).
Shao, S., Zhou, Q. & Liu, Z. Mental workload characteristics of manipulator teleoperators with different spatial cognitive abilities. Int. J. Adv. Robot. Syst. 16. https://doi.org/10.1177/1729881419888042 (2019).
Panzirsch, M., Balachandran, R., Weber, B., Ferre, M. & Artigas, J. Haptic augmentation for teleoperation through virtual grasping points. IEEE Trans. Haptics 11, 400–416. https://doi.org/10.1109/toh.2018.2809746 (2018).
Luo, J., He, W. & Yang, C. Combined perception, control, and learning for teleoperation: key technologies, applications, and challenges. Cognit. Comput. Syst. 2, 33–43. https://doi.org/10.1049/ccs.2020.0005 (2020).
Bandala, M., West, C., Monk, S., Montazeri, A. & Taylor, C. J. Vision-based assisted tele-operation of a dual-arm hydraulically actuated robot for pipe cutting and grasping in nuclear environments. Robotics 8. https://doi.org/10.3390/robotics8020042 (2019).
Royal College of Surgeons of England. Robotic assisted surgery—A pathway to the future (2023).
Wang, L., Li, Q., Lam, J., Wang, Z. & Zhang, Z. Intent inference in shared-control teleoperation system in consideration of user behavior. Complex Intell. Syst. 8, 2971–2981 (2021).
Kucukyilmaz, A., Oguz, S. O., Sezgin, T. M. & Basdogan, C. Improving human-computer cooperation through haptic role exchange and negotiation. In Immersive Multimodal Interactive Presence 229–254 (2012).
Al-Saadi, Z., Hamad, Y. M., Aydin, Y., Kucukyilmaz, A. & Basdogan, C. Resolving conflicts during human-robot co-manipulation. In Proceedings of the 2023 ACM/IEEE International Conference on Human-Robot Interaction, 243–251 (2023).
Oh, Y., Schafer, T., Ruther, B., Toussaint, M. & Mainprice, J. A system for traded control teleoperation of manipulation tasks using intent prediction from hand gestures. In 2021 30th IEEE International Conference on Robot & Human Interactive Communication (RO-MAN). https://doi.org/10.1109/ro-man50785.2021.9515440 (IEEE, 2021).
Ly, K. T., Poozhiyil, M., Pandya, H., Neumann, G. & Kucukyilmaz, A. Intent-aware predictive haptic guidance and its application to shared control teleoperation. In 2021 30th IEEE international conference on robot & human interactive communication (RO-MAN), 565–572 (IEEE, 2021).
Phillips-Grafflin, C. et al. From autonomy to cooperative traded control of humanoid manipulation tasks with unreliable communication: Applications to the valve-turning task of the DARPA robotics challenge and lessons learned. J. Intell. Robot. Syst. 82, 341–361 (2015).
Del Duchetto, F., Kucukyilmaz, A., Iocchi, L. & Hanheide, M. Do not make the same mistakes again and again: Learning local recovery policies for navigation from human demonstrations. IEEE Robot. Autom. Lett. 3, 4084–4091 (2018).
Nourbakhsh, N., Chen, F., Wang, Y. & Calvo, R. A. Detecting users’ cognitive load by galvanic skin response with affective interference. ACM Trans. Interactive Intell. Syst. 7, 1–20 (2017).
Engström, J., Johansson, E. & Östlund, J. Effects of visual and cognitive load in real and simulated motorway driving. Transport. Res. F Traffic Psychol. Behav. 8, 97–120 (2005).
Ayaz, H. et al. Optical brain monitoring for operator training and mental workload assessment. Neuroimage 59, 36–47 (2012).
Sam, Y. T., Natarajan, M. & Gombolay, M. C. Stress and performance in human-robot space teleoperation tasks (2020).
Fernandez Rojas, R. et al. Electroencephalographic workload indicators during teleoperation of an unmanned aerial vehicle shepherding a swarm of unmanned ground vehicles in contested environments. Front. Neurosci. 14 (2020).
Ye, Y., Zhou, T., Zhu, Q., Vann, W. & Du, J. Brain functional connectivity under teleoperation latency: a fNIRS study. https://doi.org/10.48550/ARXIV.2311.09062 (2023).
Singh, J., Srinivasan, A. R., Neumann, G. & Kucukyilmaz, A. Haptic-guided teleoperation of a 7-dof collaborative robot arm with an identical twin master. IEEE Trans. Haptics 13, 246–252 (2020).
Kirchner, W. K. Age differences in short-term retention of rapidly changing information. J. Exp. Psychol. 55, 352 (1958).
University of Nottingham. Ethics and integrity. https://www.nottingham.ac.uk/research/ethics-and-integrity/index.aspx. Last Accessed 22 August 2024.
Balasubramanian, S., Melendez-Calderon, A. & Burdet, E. A robust and sensitive metric for quantifying movement smoothness. IEEE Trans. Biomed. Eng. 59, 2126–2136 (2011).
Boucsein, W. Electrodermal Activity (Springer, 2012).
Stern, R. M., Ray, W. J. & Quigley, K. S. Psychophysiological Recording (Oxford University Press, 2001).
Topoglu, Y., Watson, J., Suri, R. & Ayaz, H. Electrodermal Activity in Ambulatory Settings: A Narrative Review of Literature, 91–102 (Springer International Publishing, 2019).
Chao, C.-J., Lin, C.-H. & Hsu, S.-H. An assessment of the effects of navigation maps on drivers’ mental workloads. Percept. Mot. Skills 118, 709–731 (2014).
Delliaux, S., Delaforge, A., Deharo, J.-C. & Chaumet, G. Mental workload alters heart rate variability, lowering non-linear dynamics. Front. Physiol. 10 (2019).
Moses, Z. B., Luecken, L. J. & Eason, J. C. Measuring task-related changes in heart rate variability. In 2007 29th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (IEEE, 2007).
Ghamari, M. A review on wearable photoplethysmography sensors and their potential future applications in health care. Int. J. Biosens. Bioelectron. 4 (2018).
Gonzalez, R. et al. A computer based photoplethysmographic vascular analyzer through derivatives. In 2008 Computers in Cardiology (IEEE, 2008).
Huveneers, S., Daemen, M. J. & Hordijk, P. L. Between rho(k) and a hard place: The relation between vessel wall stiffness, endothelial contractility, and cardiovascular disease. Circ. Res. 116, 895–908 (2015).
Hamilton, P., Lockhart, C., Quinn, C. & Mcveigh, G. Arterial stiffness: clinical relevance, measurement and treatment. Clin. Sci. 113, 157–170 (2007).
Causse, M., Chua, Z., Peysakhovich, V., Del Campo, N. & Matton, N. Mental workload and neural efficiency quantified in the prefrontal cortex using fNIRS. Sci. Rep. 7. https://doi.org/10.1038/s41598-017-05378-x (2017).
Ochsner, K. N., Bunge, S. A., Gross, J. J. & Gabrieli, J. D. E. Rethinking feelings: An fMRI study of the cognitive regulation of emotion. J. Cogn. Neurosci. 14, 1215–1229 (2002).
Andreasen, N. C. et al. Schizophrenia and cognitive dysmetria: A positron-emission tomography study of dysfunctional prefrontal-thalamic-cerebellar circuitry. Proc. Natl. Acad. Sci. 93, 9985–9990 (1996).
Ray, W. J. & Cole, H. W. EEG alpha activity reflects attentional demands, and beta activity reflects emotional and cognitive processes. Science 228, 750–752 (1985).
Halgren, E. Cognitive response profile of the human fusiform face area as determined by MEG. Cereb. Cortex 10, 69–81 (2000).
Pinti, P. et al. The present and future use of functional near-infrared spectroscopy (fNIRS) for cognitive neuroscience. Ann. N. Y. Acad. Sci. 1464, 5–29. https://doi.org/10.1111/nyas.13948 (2018).
Cui, X., Bray, S. & Reiss, A. L. Functional near infrared spectroscopy (NIRS) signal improvement based on negative correlation between oxygenated and deoxygenated hemoglobin dynamics. Neuroimage 49, 3039–3046 (2010).
Santosa, H., Zhai, X., Fishburn, F. & Huppert, T. The NIRS Brain AnalyzIR toolbox. Algorithms 11, 73 (2018).
Huppert, T. J. Commentary on the statistical properties of noise and its implication on general linear models in functional near-infrared spectroscopy. Neurophotonics 3, 010401–010401 (2016).
Strangman, G., Franceschini, M. A. & Boas, D. A. Factors affecting the accuracy of near-infrared spectroscopy concentration calculations for focal changes in oxygenation parameters. Neuroimage 18, 865–879 (2003).
Fishburn, F. A., Norr, M. E., Medvedev, A. V. & Vaidya, C. J. Sensitivity of fNIRS to cognitive state and load. Front. Hum. Neurosci. 8, 76 (2014).
Barker, J. W., Aarabi, A. & Huppert, T. J. Autoregressive model based algorithm for correcting motion and serially correlated errors in fNIRS. Biomed. Opt. Express 4, 1366–1379 (2013).
Dimitrakopoulos, G. N. et al. Cognitive reorganization due to mental workload: A functional connectivity analysis based on working memory paradigms. Appl. Sci. 13, 2129 (2023).
von Janczewski, N., Wittmann, J., Engeln, A., Baumann, M. & Krauß, L. A meta-analysis of the n-back task while driving and its effects on cognitive workload. Transport. Res. F: Traffic Psychol. Behav. 76, 269–285 (2021).
Benjamini, Y. & Hochberg, Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. R. Stat. Soc. Ser. B (Methodol.) 57, 289–300 (1995).
Argyle, E. M., Marinescu, A., Wilson, M. L., Lawson, G. & Sharples, S. Physiological indicators of task demand, fatigue, and cognition in future digital manufacturing environments. Int. J. Hum Comput. Stud. 145, 102522 (2021).
Rea, D. J. & Seo, S. H. Still not solved: A call for renewed focus on user-centered teleoperation interfaces. Front. Robot. AI 9 (2022).

Acknowledgements

This work was supported by EPSRC Energy Programme [Grant Number EP/W006839/1], Centre for Doctoral Training in Horizon [Grant Number EP/S023305/1], Connected Everything II Network+ [Grant Number EP/S036113/1], and UKRI Trustworthy Autonomous Systems Hub [Grant Number EP/V00784X/1]. The authors would like to thank Dr Pablo Lopez-Custodio for his contributions to the early-stage experiments and Dr Harun Tugal for his valuable comments on the manuscript and his insights on the nuclear teleoperation use case scenario.

Author information

Authors and Affiliations

School of Computer Science, University of Nottingham, Nottingham, UK
Gift Odoh, Aleksandra Landowska, Khairidine Benali, Max L. Wilson, Horia A. Maior & Ayse Kucukyilmaz
School of Psychology, University of Nottingham, Nottingham, UK
Emily M. Crowe
Faculty of Engineering, University of Nottingham, Nottingham, UK
Sue Cobb

Contributions

A.K. and H.M. conceptualized the study and designed the experiment in collaboration with E.M.C., S.C. and M.W. A.K. provided the robotic teleoperation setup and its human-machine interface. K.B. and G.O. implemented the human-robot data collection pipeline, while A.L. developed the brain data collection utilities. A.L., G.O., and K.B. conducted the experiments. G.O. and A.L. analysed the data. The majority of the text was written by G.O., A.L., A.K. and M.W. All authors attended meetings, interpreted the data and reviewed the manuscript.

Corresponding author

Correspondence to Ayse Kucukyilmaz.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Odoh, G., Landowska, A., Crowe, E.M. et al. Performance metrics outperform physiological indicators in robotic teleoperation workload assessment. Sci Rep 14, 30984 (2024). https://doi.org/10.1038/s41598-024-82112-4

Received: 23 August 2024
Accepted: 02 December 2024
Published: 28 December 2024
DOI: https://doi.org/10.1038/s41598-024-82112-4
