Measurement of stretch-evoked brainstem function using fMRI

Zonnino, Andrea; Farrens, Andria J.; Ress, David; Sergi, Fabrizio

doi:10.1038/s41598-021-91605-5


Article
Open access
Published: 15 June 2021


Scientific Reports volume 11, Article number: 12544 (2021)


Abstract

Knowledge on the organization of motor function in the reticulospinal tract (RST) is limited by the lack of methods for measuring RST function in humans. Behavioral studies suggest the involvement of the RST in long latency responses (LLRs). LLRs, elicited by precisely controlled perturbations, can therefore act as a viable paradigm to measure motor-related RST activity using functional Magnetic Resonance Imaging (fMRI). Here we present StretchfMRI, a novel technique developed to study RST function associated with LLRs. StretchfMRI combines robotic perturbations with electromyography and fMRI to simultaneously quantify muscular and neural activity during stretch-evoked LLRs without loss of reliability. Using StretchfMRI, we established the muscle-specific organization of LLR activity in the brainstem. The observed organization is partially consistent with animal models, with activity primarily in the ipsilateral medulla for flexors and in the contralateral pons for extensors, but also includes other areas, such as the midbrain and bilateral pontomedullary contributions.


Introduction

Multiple secondary pathways are known to participate, together with the corticospinal tract (CST), in controlling motor actions¹. Among these secondary pathways, the reticulospinal tract (RST) is especially important for its involvement in locomotion², maintenance of posture³, reaching⁴ and grasping⁵.

Anatomically, the RST is divided into two distinct pathways⁶: the medial tract, which originates in the pons, and the lateral tract, which originates in the medulla. In contrast to the CST, which has a lateralized organization, with 85% of the fibers crossing the midline to innervate contralateral muscles^1,7, the RST shows a bilateral organization. A single axon that originates on either side of the reticular formation (RF) can innervate muscles on both sides of the body^4,8,9, and stimulation of the RF has been observed to produce bilateral muscle activity^8,10,11. Moreover, while the CST is known to provide excitatory stimuli primarily to contralateral muscles¹², the most widely accepted model of motor organization in the RST indicates that the medial RST provides excitation to the ipsilateral extensors and inhibition to the contralateral flexors, and the lateral RST provides inhibition to the contralateral extensors and excitation to the ipsilateral flexors^5,8,10,13.

Although secondary to the corticospinal tract, the RST could assume considerable importance for studying the neural basis of motor impairment and recovery after corticospinal lesions. While previous research advanced that the RST could be a useful target for neuro-rehabilitative interventions that aim to strengthen neural drive to skeletal muscles^8,10,14,15, recent research also shows that in humans increased RST function or structural integrity is associated with post-stroke impairment, including loss of independent joint control and hyperexcitability of stretch reflex^16,17,18. However, given the limited capability of measuring directly and in-vivo function in the brainstem nuclei originating the RST, a complete understanding of the role of RST in neuromotor impairment and recovery is currently lacking. As such, a new technique capable of quantifying in-vivo function of the RF during motor tasks would be very useful for both basic and translational neuroscience.

Unfortunately, due to the small size of the brainstem and its location deep in the cranium, direct measurement of RF function during active motor tasks is not feasible, and the only viable approach is a non-invasive technique based on neuroimaging. However, the between-subject and between-task variability associated with experiments involving voluntary responses, combined with the relatively low signal-to-noise ratio of functional neuroimaging of the brainstem¹⁹, makes these investigations very challenging.

Several recent studies indicate that the RF might be actively involved in modulating the amplitude of long latency responses (LLRs), a stereotypical response evoked in muscles after a perturbation stretching a set of muscles²⁰, with response times for the upper extremity between 50 and 100 ms. LLRs are a fundamental component of motor control, as they gracefully blend the fast reaction time afforded by short latency reflexes (SLR) (delay shorter than 50 ms) with the flexible and skilled action of voluntary movement (responses occurring more than 100 ms after stimulus). We know that the LLR follows a transcortical pathway with a prominent role of the primary motor cortex^21,22, but the RF also processes some features of LLRs²⁰. Direct evidence on the involvement of the RF in LLRs comes from neuronal recordings in behaving cats, where a burst in RF activity was detected in response to a foot drop²³. In humans, a direct link between RF function and LLRs has not yet been established, primarily due to the lack of direct methods for measuring RF function in humans. It is currently established that LLRs are due to the temporal overlap of two responses, the task-dependent component and the automatic response^24,25, and it is possible that these responses may be produced by non-identical pathways. Using the StartReact paradigm, investigators have established that startle stimuli can elicit the discharge of a planned motor action of upper limb muscles within 70 ms of the stimulus, compared to the 100 ms response time measured in the absence of startle^26,27. The measured reduction in response time likely reflects the engagement of the RF in the StartReact paradigm, since those circuits are directly responsible for the startle response. Moreover, the LLR task-dependent component and StartReact have been observed to share striking similarities at the level of muscle recordings and modulation by experimental factors²⁸.

Because LLRs are “semi-reflexive” responses, they are less affected by confounds such as individual subject skill and task performance, resulting in smaller between-subject variability than is found for voluntary motor tasks. As such, precisely evoked LLRs may be a means to reliably stimulate the RST to enable the direct measurement of motor activity of the RF associated with rapid joint stretch using neuroimaging. Yet, while the approaches used in previous research, based on startle stimuli and/or inhibition of cortical areas, did support the conclusion that brainstem regions such as the RF might be involved in LLRs, the knowledge gained via those investigations had little spatial specificity. As such, details on which areas of the RF are associated with LLRs of specific muscles are still unknown.

Here, we present StretchfMRI, a novel technique that we have developed to study the brainstem correlates of LLRs in-vivo in humans. StretchfMRI combines robotic perturbations with electromyography (EMG) and functional Magnetic Resonance Imaging (fMRI) to provide simultaneous recording of neural and muscular activity associated with LLRs. In this paper, we demonstrate that StretchfMRI enables the reliable quantification of both EMG and fMRI associated with LLRs, and establish muscle-specific representation of LLR activity in the brainstem. Additionally, we present an exploratory analysis to determine cortical areas associated with LLR activity for flexors and extensors. Some of the development stages toward StretchfMRI (including the robotic perturbator and the use of fMRI sequences with silent windows) have been presented in preliminary form in an earlier conference paper²⁹. The novel components presented here for the first time include the methods for simultaneous measurement of EMG and fMRI, their validation, and a human-subject experiment to identify RF activity associated with LLRs of a flexor and an extensor muscle.

Materials and methods

StretchfMRI technique

The investigation of the neural substrates of LLRs via fMRI requires simultaneous measurement of muscular and neural activity during the application of velocity-controlled perturbations that condition LLRs in a set of muscles. To enable reliable measurement of stretch-evoked muscle responses during fMRI, we combined a newly-developed MRI-compatible robot with custom EMG acquisition and processing methods, and used a modified fMRI sequence including a 225 ms silent window after every acquisition volume, during which stretch-evoked responses were measured (Fig. 1). A one degree-of-freedom wrist robot, the MR-StretchWrist, shown in Fig. 1B and described in detail in a previous publication²⁹, was used to apply perturbations at different velocities to condition LLRs of two muscles (Flexor Carpi Radialis, FCR, and Extensor Carpi Ulnaris, ECU). Muscle responses were quantified using a custom electrode set that included co-located measurement and reference electrodes. Use of the custom electrode set allowed us to simultaneously record both the muscle signal and the motion artifact induced by the unavoidable movement of the electrodes in the scanner. The measured signals were filtered using a pipeline that included adaptive noise cancellation to estimate and remove the signal related to motion artifacts (non-linearly related to the measurement from the reference electrode), and obtain clean measurements of EMG. Details about the methods developed for this study, the protocols used for their validation, and the application of the methods to study the neural substrates of LLRs are provided in the sections below.

Simultaneous recording of fMRI and EMG data

Measuring EMG during fMRI protocols is challenging because of the artifacts introduced in the EMG recordings by the coupling of the time- and spatially-varying electromagnetic fields required for MRI—i.e. static magnetic field, gradient magnetic field, radio waves—with the undesired movement of the EMG electrodes. While the radio waves introduce noise at a frequency range that is distinct from the spectrum expected for physiological muscle contractions, meaning that noise could be removed using frequency domain filters, this is not the case for gradient and movement artifacts whose spectrum overlaps with the one expected for muscle contractions.

Because we were only interested in measuring EMG during brief periods of time (i.e. those where we expect to observe an LLR), we introduced in the MRI scanning protocol a 225 ms silent window after each acquisition volume. During silent windows, no fMRI excitation is generated, allowing us to avoid the effects of gradient and radio-frequency waves (Fig. 1A). This approach enables the analysis of associations between EMG and fMRI signals due to the intrinsic delay of the hemodynamic signal. Such a delay temporally decouples the measurement of muscle activity associated with an LLR, expected within 100 ms of a perturbation, from the measurement of the associated blood-oxygen level dependent (BOLD) signal, which completes its course several seconds after a reflex is elicited (Fig. 1A).

While this approach allowed us to remove artifacts associated with both radio waves and gradients, preliminary analyses have highlighted that sub-millimeter movements of the electrodes caused by the robotic perturbation, while tolerable in normal laboratory conditions, produce artifacts up to 5–10 times larger than the stretch-evoked EMG response when using off-the-shelf electrodes in the static field of the MRI scanner (Fig. S6 in the Supplementary Materials).

These artifacts are a consequence of the Maxwell-Faraday law that describes how a time-varying magnetic flux through the surface enclosed by a conductive loop induces current on the conductive loop. For our specific application, the time-varying flux is generated because the path of the electrode leads during wrist perturbation unpredictably moves and deforms, leading to a change in the area enclosed by the conductive loop. Moreover, because the MRI static magnetic field is non-homogeneous in space, the amplitude of the artifact is highly dependent on the position that the electrodes occupy in space, which cannot be guaranteed to be constant during an experiment.
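As a hedged illustration of the law invoked above (the notation is ours and does not appear in the original), the electromotive force induced in a lead loop that encloses a moving, deforming surface $S(t)$ in a magnetic field ${\mathbf {B}}({\mathbf {x}})$ can be written as

$$\begin{aligned} {\mathcal {E}}(t) = -\frac{d\Phi }{dt} = -\frac{d}{dt}\int _{S(t)} {\mathbf {B}}({\mathbf {x}})\cdot d{\mathbf {A}} \end{aligned}$$

In the simplified case of a uniform field $B_0$ normal to the loop, this reduces to ${\mathcal {E}} = -B_0\, dA/dt$. With purely hypothetical numbers, $B_0 = 3$ T and a loop area changing by 1 mm$^2$ over 10 ms give $|{\mathcal {E}}| = 3 \times 10^{-6} / 10^{-2}\ \text {V} = 0.3$ mV, which suggests why even sub-millimeter lead motion could plausibly produce artifacts that are not negligible for surface EMG.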

However, since motion-induced artifacts are ultimately a function of the position of electrodes and wire path over time, they could theoretically be removed by (1) co-locating reference electrodes (measuring only motion-induced artifacts) with measurement electrodes (measuring both motion-induced artifacts and muscle EMG), (2) matching the electrode-to-electrode and electrode-to-ground impedance of the two sets of electrodes, (3) routing the leads of the reference and measurement electrodes so that their path is the same.

We have developed a set of electrodes that embodies these principles to measure EMG signal of forearm muscles during fMRI (Fig. 1C). The set includes a bipolar reference electrode (REF) separated from a measurement electrode (EMG) by an insulating layer. Both pairs of electrodes are off-the-shelf Ag/AgCl bipolar electrodes (multitrode, Brain Products, Munich, Germany), with the terminals of the EMG electrode placed on the belly of the corresponding muscle and aligned with the direction of the muscle fiber. The REF electrode pair, co-located with the EMG pair, is then embedded in a conductive substrate (Squishy Circuits, Anoka, MN, USA) that approximates the electrode-to-electrode and electrode-to-ground impedance of superficial forearm muscles (2–5 M$\Omega $). To obtain the best matching wire path between REF and EMG electrode leads, all wires are routed through a StarQuad cable (VDC 268-026-000, Van Damme Cable). Proper coiling around a central core in a StarQuad cable ensures that differential signals of the two sets of bipolar electrodes carry the same motion-induced signal of a virtual conductor routed through the center of the core. The conductive substrate was then grounded to the amplifier ground using the shield terminal of the StarQuad cable, and connected to a fifth Ag/AgCl electrode placed on the lateral epicondyle of the elbow. A similar approach has been used to minimize fMRI-induced artifacts on EEG recordings³⁰.

Because perfect (i.e. whole spectrum) impedance matching cannot be guaranteed, a simple subtraction of the EMG and REF signals would not ensure sufficient compensation of motion artifacts. As such, we developed a novel processing scheme that uses adaptive noise cancellation (ANC) to estimate artifact-free EMG signal (Fig. 1D). ANC is an adaptive filtering technique based on a multi-layer neural network that can be used to estimate a signal x corrupted by an interference w³¹. To properly work, the ANC takes two inputs consisting of the corrupted signal $y = x + w$ and a measurement of one or more interference signals ${\mathbf {r}}$. Considering that the true interference is altered when passing through two different measurement channels, signals ${\mathbf {r}}$ and w may be non-linearly related, such that a simple subtraction between y and components of ${\mathbf {r}}$ usually causes the distortion of the estimated signal ${\hat{x}}$. To solve this problem, ANC attempts to learn the non-linear relationship $f({\mathbf {r}})$ between the interference measured by the two different channels, estimating the signal ${{\hat{w}}} = f({\mathbf {r}})$, so that the desired signal can be estimated as ${{\hat{x}}} = y - {{\hat{w}}} = x +w - {{\hat{w}}} $.
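The structure of the ANC estimate ${{\hat{x}}} = y - {{\hat{w}}}$ can be illustrated with a minimal sketch. Note that the sketch below deliberately swaps in a linear normalized least-mean-squares (NLMS) filter for the non-linear mapping $f({\mathbf {r}})$, so it is not the method used in this study; the function name `nlms_cancel`, the step size `mu`, and all synthetic signals are assumptions made for illustration only.

```python
import math


def nlms_cancel(y, refs, mu=0.1, eps=1e-8):
    """Estimate x from y = x + w with a linear NLMS canceller.

    y    : list of corrupted samples (signal plus interference)
    refs : list of reference vectors, one per sample
    Returns the estimate x_hat = y - w_hat and the final weights.
    NOTE: linear stand-in for the non-linear mapping f(r) in the text.
    """
    weights = [0.0] * len(refs[0])
    x_hat = []
    for y_k, r_k in zip(y, refs):
        # Current estimate of the interference from the reference channels
        w_hat = sum(w * r for w, r in zip(weights, r_k))
        e = y_k - w_hat  # the error doubles as the estimate of x
        x_hat.append(e)
        norm = eps + sum(r * r for r in r_k)
        weights = [w + mu * e * r / norm for w, r in zip(weights, r_k)]
    return x_hat, weights


def rms(values):
    return math.sqrt(sum(v * v for v in values) / len(values))


# Synthetic demo (all values hypothetical): a small fast "EMG-like" signal
# buried under a large, slow interference that is a scaled and offset
# copy of the reference.
n = 2000
x = [0.1 * math.sin(2 * math.pi * k / 7) for k in range(n)]
r = [math.sin(2 * math.pi * k / 200) for k in range(n)]
w = [4.0 * r_k + 1.0 for r_k in r]
y = [x_k + w_k for x_k, w_k in zip(x, w)]
refs = [[r_k, 1.0] for r_k in r]  # reference plus a constant bias term

x_hat, weights = nlms_cancel(y, refs)

half = n // 2  # evaluate after the filter has had time to converge
residual_rms = rms([a - b for a, b in zip(x_hat[half:], x[half:])])
interference_rms = rms(w[half:])
```

In this toy case the residual error after convergence is much smaller than the interference itself. A plain subtraction $y - r$ would leave most of the interference in place because the reference is scaled and offset relative to $w$, which is the motivation given above for learning the mapping rather than subtracting directly.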

For our analysis, we implemented the ANC scheme in MATLAB 2019a (MathWorks Inc., Natick, MA, USA) using an Adaptive Neuro-Fuzzy Inference System (ANFIS)³². After appropriate learning, ANFIS constructs the non-linear mapping for the input-output relationship (estimate x given ${\mathbf {r}}$ and y in our case) without requiring a-priori knowledge of the structure of $f({\mathbf {r}})$. Learning occurs by tuning a set of membership functions that refer to the different layers of the network. In our algorithm, we used a hybrid iterative optimization method consisting of back-propagation for the parameters associated with the input membership functions, and least squares estimation for the parameters associated with the output membership functions³². Since the model structure has a large number of parameters, there is a risk for the ANFIS to overfit the data. To avoid overfitting, the algorithm partitions at every iteration a random sample from the entire dataset (in our case, a single continuous random interval within the time-series of y and ${\mathbf {r}}$ collected during a given perturbation) and uses it to cross-validate the model. The idea is that the cross-validation error decreases until overfitting starts to occur. As such, the algorithm selects the set of parameters for the membership functions that refer to the solution with the minimum cross-validation error³².

In our implementation, we considered the interference signals ${\mathbf {r}}$ as a three-component vector, including the interference signal r measured by REF electrodes, its derivative $\dot{r}$, and a signal that quantifies the perturbation-related motion, as measured by the rotary encoder built into the MR-StretchWrist ($\theta $). The ANFIS was implemented using the genfis and anfis functions of the Fuzzy Logic Toolbox.
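For illustration, the three-component reference vector $[r, \dot{r}, \theta ]$ could be assembled as in the sketch below. The function name `build_reference`, the finite-difference scheme used for $\dot{r}$, and the sampling interval `dt` are assumptions; the text does not state how the derivative was computed.

```python
def build_reference(r, theta, dt):
    """Return one [r, r_dot, theta] vector per sample.

    r     : REF electrode samples
    theta : encoder angle samples, same length as r
    dt    : sampling interval (hypothetical; units set the units of r_dot)
    The derivative uses central differences in the interior and one-sided
    differences at the two ends (an assumed scheme, not from the source).
    """
    n = len(r)
    if n != len(theta):
        raise ValueError("r and theta must have the same length")
    if n < 2:
        raise ValueError("at least two samples are needed for a derivative")
    vectors = []
    for k in range(n):
        lo = max(k - 1, 0)
        hi = min(k + 1, n - 1)
        r_dot = (r[hi] - r[lo]) / ((hi - lo) * dt)
        vectors.append([r[k], r_dot, theta[k]])
    return vectors


# Tiny usage example with made-up samples: a linear ramp has constant slope.
example = build_reference([0.0, 1.0, 2.0, 3.0], [0.0, 0.0, 0.0, 0.0], dt=0.5)
```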

Procedures

Population

27 healthy individuals (16 males, 11 females; age range 19–38 years) volunteered to participate in one of the two experiments. Thirteen participants were involved in Experiment 1 and 14 in Experiment 2. All participants self-reported as right-handed and free from neurological disorders and from orthopedic or pain conditions affecting the right arm, and provided informed written consent prior to data collection. This study was approved by the Institutional Review Board of the University of Delaware (Protocol no. 1097082-5) and was conducted in accordance with the Declaration of Helsinki.

Of the 27 participants recruited for this study, 25 (12 for Experiment 1 and 13 for Experiment 2) completed the full experimental protocol. Two participants only completed one of the two fMRI sessions because of discomfort during imaging. For this reason, they were excluded from all imaging analyses. Due to technical issues affecting the quality of data collected during the experiments, other exclusions were required. Because of faulty EMG recordings (disconnected electrode wire) during fMRI, data collected from three participants (all involved in Experiment 2) were excluded from all the analyses. Moreover, because of errors in slice prescription, imaging data collected in three other participants (all involved in Experiment 1) were excluded from the fMRI analyses.

As a result, the EMG validation analysis was performed on data collected from $n=12$ individuals (all involved in Experiment 1), while the fMRI analyses were performed on $n=18$ participants (9 participants involved in Experiment 1, 9 participants involved in Experiment 2).

Experiment 1

Experiment 1 was composed of a total of four sessions, all performed during the same visit. The first and last sessions ($\text {OUT}_1$ and $\text {OUT}_2$) were performed in a mock scanner outside the MRI room, allowing the participant to maintain a supine posture similar to the one required for fMRI scanning. This was done in an attempt to replicate imaging conditions as closely as possible, while removing any source of interference induced by the MRI electromagnetic fields. The two middle sessions ($\text {IN}_1$ and $\text {IN}_2$) were performed inside the MRI scanner (Siemens Prisma 3T scanner using a 64 channel coil), during fMRI. Parameters used for the MRI sequence included: Multi-Band Accelerated EPI Pulse sequence; 2 × 2 × 2 mm³ voxel resolution with 0.3 mm slice spacing; 46 deg flip angle; 110 × 110 px per image, 60 slices; TR = 1225 ms; TE = 30 ms; pixel bandwidth = 1625 Hz/pixel; receiver gain: high; simultaneous multi slice acceleration factor: 4. After the two functional sessions, a high resolution structural scan (magnetization-prepared rapid acquisition with gradient echo (MPRAGE): 0.7 × 0.7 × 0.7 mm³ resolution, with TR = 2300 ms and TE = 3.2 ms; 160 slices with 256 × 256 px per image) was acquired for registration and normalization of the results to a common space. The full experimental protocol took about 140 min, divided as 45 min for the robot setup and the electrode placement, 20 min for each of the four experimental sessions, and 15 min for the structural scan.

Experiment 2

Experiment 2 was composed of only the sessions $\text {IN}_1$ and $\text {IN}_2$ performed in Experiment 1 (i.e. those performed during fMRI scanning). The full experimental protocol took about 85 min, divided as 30 min for the robot setup and the electrode placement, 20 min for each of the two experimental sessions, and 15 min for the structural scan.

Protocol

In all conditions, participants were exposed to a sequence of Ramp-and-Hold (RaH) perturbations in either flexion or extension at multiple velocities while EMG was recorded from the flexor carpi radialis (FCR) and the extensor carpi ulnaris (ECU). Before the perturbation onset, participants were visually cued to apply 200 mNm of background torque in the direction opposing the ensuing perturbation (e.g. if the perturbation was in the extension direction the participant was required to apply a flexion torque) to condition the muscles that would be stretched by the perturbation. The perturbation was then automatically triggered after the error between the target and measured torque was below 25 mNm for a given time $\text {T}_{\text {hold}}$. To avoid habituation to the time delay, for each perturbation the time $\text {T}_{\text {hold}}$ was randomly selected between 400 ms and 800 ms. Because the impact dynamics that characterize the end of a perturbation generate undesired oscillations in the EMG recording, leading to detection of false positive activity in the LLR window, perturbations were kept active for a fixed time of 200 ms, regardless of the magnitude of perturbation velocity. A rest period lasting for an interval randomly selected between 7 and 10 s was then included after each perturbation, after which the robot slowly moved the hand back to the neutral position with a return velocity set to 25 deg/s. An additional 4 s of rest was then allowed before the torque target for the following perturbation was displayed to the participant. Participants were instructed to yield to all perturbations.
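The triggering rule described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the controller used in the study: the function names, the uniform distribution used to draw $\text {T}_{\text {hold}}$, and the fixed sampling interval `dt_ms` are all hypothetical.

```python
import random

TARGET_MNM = 200.0             # background torque target, from the text
TOLERANCE_MNM = 25.0           # allowed torque error, from the text
HOLD_RANGE_MS = (400.0, 800.0) # range of T_hold, from the text


def draw_hold_time(rng):
    """Draw T_hold in ms (a uniform distribution is an assumption)."""
    return rng.uniform(*HOLD_RANGE_MS)


def trigger_index(torque_mnm, dt_ms, hold_ms,
                  target=TARGET_MNM, tol=TOLERANCE_MNM):
    """Return the first sample index at which the torque error has stayed
    below `tol` for at least `hold_ms` without interruption, else None."""
    held_ms = 0.0
    for k, tau in enumerate(torque_mnm):
        if abs(tau - target) < tol:
            held_ms += dt_ms
            if held_ms >= hold_ms:
                return k
        else:
            held_ms = 0.0  # leaving the tolerance band restarts the timer
    return None


# Made-up torque traces sampled every 1 ms
steady = [0.0] * 100 + [200.0] * 1000
interrupted = [200.0] * 300 + [100.0] + [200.0] * 600

idx_steady = trigger_index(steady, dt_ms=1.0, hold_ms=500.0)
idx_interrupted = trigger_index(interrupted, dt_ms=1.0, hold_ms=500.0)
t_hold = draw_hold_time(random.Random(0))
```

Resetting the timer whenever the torque leaves the tolerance band is one reading of "below 25 mNm for a given time"; a cumulative-time rule would be a different design and is not what this sketch implements.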

In each session, participants were exposed to a total of 60 perturbations, pseudo-randomly selected from a pool of six different perturbation velocities ([50 125 200] deg/s in flexion or extension) repeated 10 times each. Before the beginning of each session, participants were visually cued to apply and hold a set of ten interleaved flexion and extension isometric torques. In both directions, the magnitude of the desired torque was set to be 500 mNm, and the participants were asked to keep it constant with a maximum error of 25 mNm for 5 s. This set of contractions was used to normalize stretch-evoked responses and enable group analysis of collected EMG data.
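A session schedule of this kind (six velocities, ten repetitions each, 60 perturbations in total) could be generated as in the sketch below. The plain shuffle, the function name `make_schedule`, and the use of sign to distinguish the two perturbation directions are assumptions; the text does not state whether the pseudo-random order was subject to additional constraints.

```python
import random

SPEEDS_DEG_S = (50, 125, 200)  # perturbation speeds, from the text
REPETITIONS = 10               # repetitions per velocity, from the text


def make_schedule(seed=None):
    """Return a shuffled list of 60 signed velocities in deg/s.

    Opposite signs denote the two perturbation directions; which sign
    maps to flexion is left unspecified here.
    """
    rng = random.Random(seed)
    velocities = [sign * speed for speed in SPEEDS_DEG_S for sign in (-1, 1)]
    schedule = velocities * REPETITIONS
    rng.shuffle(schedule)
    return schedule


schedule = make_schedule(seed=1)
```

Passing a seed makes the order reproducible across sessions or participants, which may or may not match the original protocol.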

EMG acquisition and analysis

Surface EMG was recorded with the BrainVision Recorder software (Brain Products, Munich, Germany) using a 16-channel MR-compatible bipolar amplifier (ExG, Brain Products, Munich, Germany). Given our interest in studying the neural substrates of LLR for both flexors and extensors, EMG was recorded from two wrist muscles: Flexor Carpi Radialis (FCR) and Extensor Carpi Ulnaris (ECU).

For each muscle, after having carefully cleaned the skin with a 70% Isopropyl Alcohol solution, we placed the bottom layer of electrodes on the belly of the muscle oriented along the muscle fiber, and filled the central hole of each electrode with an abrasive Electrolyte-Gel (Abralyt HiCl, Rogue Resolutions, Cardiff, UK). Contact impedance for each electrode was measured using the BrainVision Recorder software and a cotton swab dipped in abrasive gel was swirled on the skin until the measured contact impedance was lower than $10\,\text {k}{\Omega }$, as described in the product technical specification. We then carefully co-located the reference electrodes on top of the measurement electrodes, using a layer of electrical tape applied on top of the measurement electrodes to avoid electrical contact. Finally, we placed the conductive substrate on top of the reference electrodes and connected it to ground. In order to minimize relative motion of the different components of the apparatus, we applied pre-wrap around the entire forearm.

EMG data have been processed using three different pipelines to compare the novel processing scheme presented in this paper with two standard methods. The first method (STD) relies on the assumption that MRI-related movement artifacts are negligible and so it implements the same standard pipeline used to process the data recorded outside the scanner (Fig. S7A). In this way, the estimate of the EMG signal ${{\hat{x}}}$ is considered to be ${{\hat{x}}} = y$. The second method (SUB) compensates for MR-related movement artifacts assuming perfect match between the interference r measured by the REF electrodes and the true interference w (Fig. S7B). As such, it quantifies the EMG signal as ${{\hat{x}}} = y - r$. Finally, the third method fully implements the pipeline described in “StretchfMRI technique” section, quantifying the EMG signal as ${{\hat{x}}} = y - {\hat{f}}({\mathbf {r}})$ (Fig. 1D).

The EMG signal was processed to quantify reflex responses using a standard pipeline³³ modified to include the ANC and SUB pipelines. Specifically, both REF and EMG signals were initially segmented to extract the subset of data points representing perturbation-related activity recorded during the 200 ms of the silent window that begin 25 ms after volume acquisition is completed, so that the first time point would correspond to the perturbation onset. The segmented signals were band-pass filtered using a 4$^{\text {th}}$ order Butterworth filter with lower and upper cut-off frequencies $ f_{LP} =20 \text { Hz}$ and $ f_{HP} =250 \text { Hz}$, and fed to the later components of the filtering pipeline (Fig. 1D, for ANC this is signal $y_f$). The estimate of the EMG activity returned by the ANC, SUB, and STD filters was finally rectified and low-pass filtered with a 4$^{\text {th}}$ order Butterworth filter with cut-off frequency $ f_{ENV} =60 \text { Hz}$. To allow between-subject comparison, after filtering, we normalized the stretch-evoked EMG activity by the average EMG ($\overline{EMG_c}|_j$) measured during the isometric contractions of the muscle j recorded prior to the beginning of each perturbation session. To determine $\overline{EMG_c}|_j$, we used only the central 3 s of activity recorded for the subset of contractions in which the given muscle was active (i.e. only the flexion torques for the FCR and only the extension torques for the ECU). The same constant was used to normalize EMG activity measured in response to perturbations that both stretched and shortened the muscle. Finally, to extract the magnitude of the long-latency response $H_{i,j}$ elicited by perturbation i on muscle j, we used the cumsum method³⁴, quantifying $H_{i,j}$ as the area underlying the processed EMG signal $EMG_{i,j}(t)$ in the time window [50, 100] ms after the perturbation onset:

$$\begin{aligned} H_{i,j} = \int _{50}^{100} EMG_{i,j}(t)\, dt \end{aligned}$$

(1)

where time is expressed in ms.
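As a hedged numerical illustration of Eq. (1), the sketch below integrates a sampled, already rectified and filtered EMG envelope over the [50, 100] ms window with the trapezoidal rule. The trapezoidal rule, the function name `llr_amplitude`, and the sampling interval are assumptions standing in for whatever numerical scheme was actually used; sample 0 is taken to coincide with perturbation onset, as in the segmentation described above.

```python
def llr_amplitude(emg, dt_ms, t_start_ms=50.0, t_end_ms=100.0, norm=1.0):
    """Area under the normalized EMG envelope between t_start and t_end.

    emg   : envelope samples, with sample 0 at perturbation onset
    dt_ms : sampling interval in ms (hypothetical value chosen by caller)
    norm  : normalization constant, e.g. the mean EMG recorded during
            the isometric contractions of the same muscle
    """
    if norm <= 0:
        raise ValueError("normalization constant must be positive")
    i0 = round(t_start_ms / dt_ms)
    i1 = round(t_end_ms / dt_ms)
    if i1 >= len(emg):
        raise ValueError("signal is shorter than the integration window")
    window = [v / norm for v in emg[i0:i1 + 1]]
    # Trapezoidal rule: full weight inside, half weight at the two ends
    return dt_ms * (sum(window) - 0.5 * (window[0] + window[-1]))


# Made-up envelopes sampled every 1 ms for 200 ms
flat = [2.0] * 200
ramp = [float(k) for k in range(200)]

h_flat = llr_amplitude(flat, dt_ms=1.0)
h_flat_normalized = llr_amplitude(flat, dt_ms=1.0, norm=2.0)
h_ramp = llr_amplitude(ramp, dt_ms=1.0)
```

With a normalized envelope, the result has units of ms, since the integrand is dimensionless and time is expressed in ms.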

Validation of the EMG measurements during fMRI

To validate the ability of StretchfMRI to reliably condition and quantify stretch-evoked LLR responses during fMRI, we used the EMG data recorded in the four sessions of Experiment 1. As no MR-related noise is expected in the sessions performed outside the MRI scanner, we considered the sessions $\text {OUT}_1$ and $\text {OUT}_2$ to act as a gold standard of measured stretch-evoked responses that is reflective of the true LLR responses, useful for comparison with the LLR-related activity measured during sessions $\text {IN}_1$ and $\text {IN}_2$.

EMG signals recorded during the two OUT sessions were processed using a standard pipeline that used all steps described above, but only used the STD filter (no expected signal from REF electrodes); while the EMG recorded during the sessions performed inside the MRI scanner was processed using the three different pipelines described in “EMG acquisition and analysis”.

Finally, for all sessions ($s = \left[ \text {OUT}_1, \text {IN}_1, \text {IN}_2, \text {OUT}_2 \right] $), filtering pipelines ($f = \left[ \text {STD}, \text {SUB}, \text {ANC} \right] $), and participants ($p = \left[ \text {P}01, \text {P}02, ..., \text {P}27 \right] $), we computed the amplitude of the stretch-evoked muscle activity $H_{i,j,v}|_{s,f,p}$ for all repetitions ($i = \left[ 1,2,...,10\right] $), both muscles ($j = \left[ \text {FCR}, \text {ECU}\right] $), and all perturbation velocities ($v = \left[ -200, -125, -50, 50, 125, 200\right] $ deg/s) using Eq. (1). In general, averages across repetitions are indicated as ${\bar{H}}_{j^{*},v^{*}}|_{s^{*},f^{*},p^{*}}$ for specific levels of all other factors. The bold notation ${\bar{\mathbf{H}}}$ will be used to refer to the set of ${\bar{H}}$ measured for all levels of one or more factors. When this operation is performed, the indices of the corresponding factors are removed (e.g. ${\bar{\mathbf{H}}}|_{IN_1, ANC}$ refers to the set of the mean LLR amplitudes measured during the session $\text {IN}_1$ in both muscles for all participants and all perturbation velocities, when the ANC filter is used).

Statistical analysis

We quantified the accuracy afforded by each filtering method in identifying the true stretch-evoked muscle responses during fMRI at both the group level (combining measurements at all levels of factors muscle, velocity, participant), and at the individual perturbation level (considering each level of factors muscle, velocity, and repetition separately). With the group level analysis, we sought to quantify the agreement between the average LLR amplitude (${\bar{\mathbf{H}}}$) measured inside and outside the MRI for each filtering method. With the perturbation-specific analysis, we sought to quantify the deviation between each stretch-evoked response measured during MRI and the distribution of responses measured in the $\text {OUT}_1$ session in the same subject, velocity, and muscle. The analysis presented below is specific to the fMRI session $\text {IN}_1$; the results obtained when the following methods are applied to the session $\text {IN}_2$ are reported in the Supplementary Materials.

Group level analysis

For the group level analysis, we used the paired Bland-Altman (BA) analysis³⁵, a statistical method to assess agreement between two measurement techniques when they are used to obtain two sets of data points in paired experimental conditions. Specifically, we compared the group level sets ${\bar{\mathbf{H}}}|_{IN_1,f}$ and ${\bar{\mathbf{H}}}|_{OUT_1}$ pairing the average LLR response measured for each perturbation velocity, each muscle, and for each participant. Due to the difference between responses obtained as a consequence of muscle stretch and shortening, we considered separately the subset of values measured in response to muscle stretch (${\tilde{\mathbf{H}}}|_f^{st}$) and shortening (${\tilde{\mathbf{H}}}|_f^{sh}$).

For each of the two stimulus direction conditions (d) (i.e. stretch and shortening), we used BA analysis to determine bias (B$|^{d}_{\text {IN},f}$) as the mean difference between paired measurements ${\tilde{\mathbf{H}}}|^d_{IN_1,f} - {\tilde{\mathbf{H}}}|^d_{OUT_1}$, for all conditions, and the 95% limits of agreement (LoA$|^{d}_{\text {IN},f}$), defined as the interval within which, given one measurement, the other is expected to fall in 95% of cases. Each metric is estimated with its own precision, expressed in terms of its 95% confidence interval. The magnitude of the bias and the width of the limits of agreement are both expected to decrease as agreement between the two techniques increases. However, when the paired measurements are not obtained simultaneously with two techniques, the metrics of test-retest reliability extracted by BA analysis will be affected by both measurement error and by intrinsic physiological variability of the measurand, in this case stretch-evoked responses²⁰. As such, bias or limits of agreement may be artificially inflated by physiological variability. To isolate the effects of measurement error from those of physiological variability, we thus defined our outcome measures as contrasts between the reliability measured via the $\text {IN}_1$ vs. $\text {OUT}_1$ comparison and the reliability measured via the $\text {OUT}_2$ vs. $\text {OUT}_1$ comparison.
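As a minimal sketch of the quantities described above, the following Python snippet computes the bias, the 95% limits of agreement, and a 95% confidence interval of the bias for a set of paired measurements. It assumes the common normal-approximation formulas (a 1.96 multiplier for both the limits and the confidence interval); the function name `bland_altman` and the example values are hypothetical and are not taken from the study.

```python
import math
import statistics


def bland_altman(test, reference):
    """Paired Bland-Altman summary for two equally long sequences.

    Returns the bias (mean paired difference), the 95% limits of
    agreement, and the 95% confidence interval of the bias.
    """
    if len(test) != len(reference):
        raise ValueError("paired analysis needs equally long inputs")
    diffs = [t - r for t, r in zip(test, reference)]
    n = len(diffs)
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample SD of the paired differences
    loa = (bias - 1.96 * sd, bias + 1.96 * sd)
    # Assumption: normal approximation for the precision of the bias.
    half_width = 1.96 * sd / math.sqrt(n)
    bias_ci = (bias - half_width, bias + half_width)
    return bias, loa, bias_ci


# Hypothetical mean LLR amplitudes (arbitrary units), one value per
# participant/muscle/velocity combination.
h_in = [1.0, 2.0, 3.0, 4.0]
h_out = [0.5, 2.5, 2.5, 4.5]
bias, loa, bias_ci = bland_altman(h_in, h_out)
print(bias, loa, bias_ci)
```

Running the same function on the $\text {OUT}_2$ vs. $\text {OUT}_1$ pairs would give the reference values against which the $\text {IN}_1$ vs. $\text {OUT}_1$ results are contrasted.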

We thus used the metrics obtained with the BA analysis to make inference on (i) whether filtering method affects test-retest reliability, and (ii) whether filtering method affords test-retest reliability comparable with the baseline variability of the physiological process being measured (obtained for the $\text {OUT}_2$ vs. $\text {OUT}_1$ comparison).

We tested the null hypothesis that the estimated bias does not change when using different filtering methods by calculating the 95% confidence interval of bias for the $\text {IN}_1$ vs. $\text {OUT}_1$ comparison for each filtering method (B$|^{d}_{\text {IN},f}$), and performing three pairwise comparisons of the resulting confidence intervals (one for each pair of methods), using the Bonferroni method to correct for multiple comparisons. Moreover, we tested the null hypothesis that the bias measured in the $\text {IN}_1$ vs. $\text {OUT}_1$ comparison was equal to the one measured in the REF comparison ($\text {OUT}_2$ vs. $\text {OUT}_1$) by establishing if any of the intervals (B$|^{d}_{\text {IN},f}$) overlapped with the confidence interval (B$|^{d}_{\text {REF}}$) measured for the $\text {OUT}_2$ vs. $\text {OUT}_1$ comparison.

We conducted a similar analysis for the LoA metrics to make inference on the test-retest reliability of a specific measurement. Because the LoA is defined as an interval, test-retest reliability inferences are based on the analysis of the Jaccard index, used to quantify the relative overlap of two intervals as the ratio between their intersection and union. In our case, the Jaccard index $J|_f^{d}$ is defined as a function of the LoA measured for the comparison $\text {IN}_1$ vs. $\text {OUT}_1$ (LoA$|^{d}_{\text {IN},f}$), and the LoA measured for the comparison $\text {OUT}_2$ vs. $\text {OUT}_1$ (LoA$|^{d}_{\text {REF}}$), as

$$\begin{aligned} \text {J}|^{d}_{f} = \frac{\text {LoA}|^{d}_{\text {IN},f} \cap \text {LoA}|^{d}_{\text {REF}}}{\text {LoA}|^{d}_{\text {IN},f} \cup \text {LoA}|^{d}_{\text {REF}}} \end{aligned}$$

(2)

With this procedure, the test-retest error afforded by each filtering method was scaled between 0 and 1, with a score of 1 representing a perfect match of the LoA in the two conditions. Because each LoA is measured with its own precision, we determined the 95% confidence intervals for the Jaccard index using bootstrapping to numerically compute the distribution of Jaccard indices when the LoA for the comparisons $\text {IN}_1$ vs. $\text {OUT}_1$ and $\text {OUT}_2$ vs. $\text {OUT}_1$ were randomly sampled from a normal distribution with mean and standard deviation calculated from the BA analysis coefficients.
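The overlap measure in Eq. (2) and the resampling step can be sketched as follows. This is an illustrative Python version, not the authors' implementation: it treats each LoA as a closed interval, takes interval lengths for the intersection and union, and draws each limit independently from a normal distribution. The names `interval_jaccard` and `bootstrap_jaccard`, the standard deviations, and the example limits are all hypothetical.

```python
import random
import statistics


def interval_jaccard(a, b):
    """Jaccard index of two closed intervals a=(lo, hi) and b=(lo, hi)."""
    intersection = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - intersection
    return intersection / union


def bootstrap_jaccard(loa_in, loa_ref, sd_in, sd_ref, n_draws=5000, seed=0):
    """Resample both LoA from normal distributions and summarize J.

    sd_in and sd_ref are assumed standard deviations of each limit.
    Returns the mean J and its 2.5th and 97.5th percentiles.
    """
    rng = random.Random(seed)
    draws = []
    for _ in range(n_draws):
        # Sorting keeps lower <= upper even for extreme draws.
        a = sorted(rng.gauss(limit, sd_in) for limit in loa_in)
        b = sorted(rng.gauss(limit, sd_ref) for limit in loa_ref)
        draws.append(interval_jaccard(a, b))
    draws.sort()
    lower = draws[int(0.025 * (n_draws - 1))]
    upper = draws[int(0.975 * (n_draws - 1))]
    return statistics.mean(draws), (lower, upper)


# Hypothetical limits of agreement (arbitrary units).
mean_j, ci_j = bootstrap_jaccard((-1.2, 1.0), (-1.0, 1.1), 0.05, 0.05)
print(mean_j, ci_j)
```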

Similarly to the estimated bias analysis, we used the estimated distributions of Jaccard coefficients to make inference on whether filtering method affects test-retest reliability. Specifically, we tested the null hypothesis that there is no difference in the overlap of the limits of agreement by establishing if the 95% confidence interval of Jaccard indices overlapped for different levels of filtering method (three comparisons).

Perturbation-specific analysis

The goal of the perturbation-specific analysis was to quantify the departure of each stretch-evoked response measured during fMRI from the baseline values measured during the $\text {OUT}_1$ session. To this aim, we computed the standardized z-score separately for each value in the set $\mathbf{H}|_{IN_1, f}$ as:

$$\begin{aligned} z_{i,j,v}|_{IN_1,f,p} = \frac{H_{i,j,v}|_{IN_1,f,p}-\mu _{j,v}|_{p} }{\sigma _{j,v}|_{p} } \end{aligned}$$

(3)

where i, j, v, f, and p are the perturbation, muscle, velocity, filtering method, and participant indices, respectively; while ($\mu _{j,v}|_{p}$) and ($\sigma _{j,v}|_{p}$) are the population mean and standard deviation of the set $\mathbf{H}_{j,v}|_{OUT_1,p}$. To simplify the interpretation of the results, for each stimulus direction condition d, we determined the standard deviation ($\sigma _{v}|^{d}_{IN_1,f}$) of the values in the set $\mathbf{z}_{v}|^{d}_{IN_1,f}$. Ideally, if the elements in the sets $\mathbf{H}_{v}|^{d}_{IN_1, f}$ and $\mathbf{H}_{v}|^{d}_{OUT_1}$ are sampled from the same normal distribution, the standard deviation $\sigma _{v}|^{d}_{IN_1,f}$ should have unitary magnitude. However, due to the physiological variability of the stretch-evoked muscle response, the distributions of the elements in the two sets might have different standard deviations. As such, to have a term of comparison, we also determined the standard deviation $\sigma _{v}|^{d}_{REF}$ of the set of standardized scores $\mathbf{z}_{v}|^{d}_{REF,f}$ obtained by applying Eq. (3) to the elements in the set $\mathbf{H}_{j,v}|_{OUT_2,p}$.

We then tested the null hypothesis that the variance of the sets $\mathbf{z}_{v}|^{d}_{IN_1,f}$ is the same for all filtering methods, performing three Bartlett tests, one for each pair of sets $\mathbf{z}_{v}|^{d}_{IN_1,f_i}$, $i=1,2,3$. Moreover, we performed three additional Bartlett tests to test the null hypothesis that the variance of the sets $\mathbf{z}_{v}|^{d}_{REF}$ and $\mathbf{z}_{v}|^{d}_{IN_1,f}$ is the same for each filtering method (one test per filtering method). The Bonferroni correction was used to control for multiple comparisons when multiple tests were applied to the same outcome measures.
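The standardization of Eq. (3) and a two-sample Bartlett test can be sketched in Python as follows. This is a hedged illustration using only the standard library: `z_scores` and `bartlett_two_groups` are hypothetical names, the two-group Bartlett statistic is written out by hand, and the p-value uses the closed form of the chi-square survival function for one degree of freedom. The example amplitudes are invented for illustration.

```python
import math
import statistics


def z_scores(responses, reference):
    """Standardize responses with the mean and SD of a reference set."""
    mu = statistics.mean(reference)
    sigma = statistics.pstdev(reference)  # population SD, as in Eq. (3)
    return [(h - mu) / sigma for h in responses]


def bartlett_two_groups(x, y):
    """Bartlett test for equal variances of two samples.

    Returns the chi-square statistic (1 degree of freedom) and p-value.
    """
    groups = [x, y]
    k = len(groups)
    sizes = [len(g) for g in groups]
    variances = [statistics.variance(g) for g in groups]  # sample variances
    dof = sum(sizes) - k
    pooled = sum((n - 1) * v for n, v in zip(sizes, variances)) / dof
    numerator = dof * math.log(pooled) - sum(
        (n - 1) * math.log(v) for n, v in zip(sizes, variances))
    correction = 1 + (sum(1 / (n - 1) for n in sizes) - 1 / dof) / (3 * (k - 1))
    statistic = max(0.0, numerator / correction)
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Hypothetical LLR amplitudes for one participant, muscle, and velocity.
h_out1 = [1.0, 2.0, 3.0, 4.0, 5.0]   # baseline session
h_in1 = [0.5, 2.5, 3.0, 3.5, 5.5]    # fMRI session
h_out2 = [1.5, 2.0, 3.0, 4.0, 4.5]   # repeated baseline (REF)
z_in = z_scores(h_in1, h_out1)
z_ref = z_scores(h_out2, h_out1)
stat, p = bartlett_two_groups(z_in, z_ref)
print(stat, p)
```

With three filtering methods, a Bonferroni correction would compare each p-value against the chosen significance level divided by the number of tests applied to the same outcome.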

Statistical analysis of imaging data

We performed the analysis of all fMRI datasets using SPM12 (Wellcome Department of Cognitive Neurology, London, UK, http://www.fil.ion.ucl.ac.uk/spm) running on Matlab 2019a (Mathworks, Inc., Natick, MA, USA, http://www.mathworks.com). Functional datasets were preprocessed using a standard pipeline composed of five steps: realignment to the mean image, co-registration to structural MPRAGE, normalization to a standard MNI space template, spatial smoothing, and high-pass filtering (cut off at 128 s). For spatial smoothing, we used a Gaussian kernel with FWHM = 4 mm for voxels in the brainstem for brainstem-specific analyses (both hypothesis-testing and reliability analyses), and FWHM = 8 mm for all other analyses. The 4 mm smoothing of the brainstem was selected to account for the different expected extent of functional activations of brainstem nuclei compared to the whole brain, and was used only for testing regionally specific (brainstem) hypotheses.

For each scanning session, we performed two first-level analyses to determine activation in the whole brain and in the brainstem, separately. While we used the same general linear model (GLM) for both analyses, we convolved it with two different hemodynamic response functions (HRF) to account for different hemodynamic response dynamics expected in different brain regions. A brainstem-specific HRF³⁶ (delay of the peak = 4.5 s, delay of the undershoot = 10 s, dispersion of response = 1 s, dispersion of undershoot = 1 s, ratio of peak to undershoot = 15, length of kernel = 32 s) was used to assess activation in the brainstem, while a standard HRF (delay of the peak = 6 s, delay of the undershoot = 16 s, dispersion of response = 1 s, dispersion of undershoot = 1 s, ratio of peak to undershoot = 6, length of kernel = 32 s) was used to assess activation in all other regions. The neural response was modeled as:

$$\begin{aligned} \mathbf {y} = \beta _{\text {FCR}} \mathbf {H}_{\mathbf {FCR}}^{\star } + \beta _{\text {ECU}} \mathbf {H}_{\mathbf {ECU}}^{\star } + \beta _0 + \varvec{\beta }_{\mathbf {R}} \mathbf {R}, \end{aligned}$$

(4)

where $\mathbf {H}_{\mathbf {FCR}}^{\star }$ and $\mathbf {H}_{\mathbf {ECU}}^{\star }$ are obtained by convolution of rectangular functions (duration: 50 ms, onset: 50 ms after perturbation start, amplitude: ${\tilde{\mathbf {H}}_{FCR}|_{ANC}^{st}}$ and ${\tilde{\mathbf {H}}_{ECU}|_{ANC}^{st}}$, respectively) with the appropriate HRF (Fig. S8 in the Supplementary Material) to account for BOLD signal associated with LLRs of specific muscles. As such, the regressors of interest accounted for the magnitude of LLR amplitude measured for each muscle under stretch. An additional set of 6 nuisance regressors ${\mathbf {R}}$ was included to account for variance in the measured signal associated with 3D head movements (translation and rotation).
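To make the difference between the two HRFs concrete, the sketch below evaluates both kernels with the parameter values listed above. It assumes the double-gamma form commonly used for SPM's canonical HRF (a response gamma density minus a scaled undershoot gamma density, with shape = delay / dispersion and rate = 1 / dispersion); whether this matches the exact implementation used in the study is an assumption, and the function names and the 0.1 s sampling step are hypothetical.

```python
import math


def gamma_pdf(t, shape, rate):
    """Gamma probability density, evaluated in log space for stability."""
    if t <= 0:
        return 0.0
    log_pdf = (shape * math.log(rate) + (shape - 1) * math.log(t)
               - rate * t - math.lgamma(shape))
    return math.exp(log_pdf)


def double_gamma_hrf(peak_delay, undershoot_delay, peak_disp,
                     undershoot_disp, ratio, length=32.0, dt=0.1):
    """Sample a double-gamma HRF on [0, length] and normalize its sum to 1."""
    n = int(round(length / dt)) + 1
    times = [i * dt for i in range(n)]
    values = [
        gamma_pdf(t, peak_delay / peak_disp, 1.0 / peak_disp)
        - gamma_pdf(t, undershoot_delay / undershoot_disp,
                    1.0 / undershoot_disp) / ratio
        for t in times
    ]
    total = sum(values)
    return times, [v / total for v in values]


def time_of_maximum(times, values):
    """Time at which the sampled kernel reaches its largest value."""
    return times[values.index(max(values))]


# Parameter values as listed in the text for the two kernels.
t_bs, hrf_bs = double_gamma_hrf(4.5, 10.0, 1.0, 1.0, 15.0)
t_std, hrf_std = double_gamma_hrf(6.0, 16.0, 1.0, 1.0, 6.0)
print(time_of_maximum(t_bs, hrf_bs), time_of_maximum(t_std, hrf_std))
```

Under this parameterization the "delay" values set the shape of each gamma density, so the maximum of the sampled kernel falls somewhat earlier than the nominal delay; the brainstem kernel peaks earlier than the standard one.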

To determine voxels whose BOLD signal was significantly associated with LLRs of specific muscles under stretch, we first calculated the statistical maps for the contrasts $\beta _{\text {FCR}} > 0$ and $\beta _{\text {ECU}} > 0$ for each subject, and then used the contrast images as input for the second level analysis used to determine a group level effect.

Because participants were asked to contract their muscles prior to each perturbation, there is an intrinsic statistical association between the muscle state before the perturbation and the perturbation itself. As such, given the low temporal resolution of fMRI, it is possible that some of the variance in the BOLD signal explained by LLR-related regressors might actually arise from the neural activity associated with the background contraction. To address this concern, we conducted two alternative analyses: one based on simulated data, and one based on the implementation of a different GLM to account for neural signal associated with background muscle activity. A detailed description of these analyses and the respective discussion are included in the Supplementary Materials.

Test-retest reliability of neural activations

Data collected in two consecutive fMRI sessions ($\text {IN}_1$ and $\text {IN}_2$) were used to establish the test-retest reliability of fMRI data in different brain regions. Reliability of the activation maps was quantified at both the individual subject and group levels. For both levels of analysis we quantified reliability for the whole brain and for one bilateral and eight unilateral anatomical Regions of Interest (ROIs) known to be associated with the execution of motor actions: primary motor cortex (M1), primary somatosensory cortex (S1), premotor cortex (PM), superior parietal lobule (SPL), intraparietal sulcus (IPS), cerebellum (Cr), thalamus (Th), putamen (Pt), and brainstem (Bs). For each unilateral ROI, we considered two separate ROIs that refer to the left and right hemispheres. The specific brain areas included in each ROI are reported in Table 1 and were obtained by thresholding the Juelich, Harvard Subcortical, or Cerebellar MNI152 brain atlases at a probability of 50%.

Table 1 Definition of ROIs for test-retest reliability analysis.

Full size table

To quantify spatial congruence between the thresholded t-maps (p < 0.001, uncorrected) obtained for the two sessions $\text {IN}_1$ and $\text {IN}_2$, we used the Sørensen-Dice index³⁷:

$$\begin{aligned} S = \dfrac{2 V_o}{V_1 + V_2} \end{aligned}$$

(5)

where $V_o$ represents the number of supra-threshold voxels contained in both $\text {IN}_1$ and $\text {IN}_2$, while $\text {V}_1$ and $\text {V}_2$ represent the number of supra-threshold voxels contained in $\text {IN}_1$ and $\text {IN}_2$, respectively. This index ranges between 0 and 1, with 0 meaning no overlap and 1 meaning perfect overlap. Activation maps resulting from the first- and second-level analyses have been used to calculate the Sørensen-Dice index at the individual subject ($\text {S}_{{p}}$) and at the group levels ($\text {S}_g$), respectively.
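Eq. (5) can be illustrated with a short Python function operating on two binary masks of supra-threshold voxels. The function name `dice_index`, the flat-list mask representation, and the example masks are hypothetical choices made for this sketch.

```python
def dice_index(mask_1, mask_2):
    """Sorensen-Dice index of two binary masks given as flat sequences.

    Each element is truthy when the voxel survives the threshold.
    Returns None when neither mask contains supra-threshold voxels.
    """
    if len(mask_1) != len(mask_2):
        raise ValueError("masks must cover the same voxels")
    v_1 = sum(1 for a in mask_1 if a)
    v_2 = sum(1 for b in mask_2 if b)
    v_o = sum(1 for a, b in zip(mask_1, mask_2) if a and b)
    if v_1 + v_2 == 0:
        return None  # index undefined without any active voxel
    return 2.0 * v_o / (v_1 + v_2)


# Hypothetical thresholded maps for sessions IN_1 and IN_2 (6 voxels).
mask_in1 = [1, 1, 0, 0, 1, 0]
mask_in2 = [1, 0, 1, 0, 1, 0]
print(dice_index(mask_in1, mask_in2))
```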

While the Sørensen-Dice index is commonly used to quantify test-retest reliability of pairs of fMRI-based activation maps, the thresholding required to calculate it discards information that could be relevant to quantifying test-retest reliability. To overcome this limitation, we complemented the Sørensen-Dice index with a second index, the intraclass correlation coefficient (ICC)³⁸,³⁹. We performed two different analyses to determine the ICC both at the individual participant ($\text {ICC}_p$) and at the group levels ($\text {ICC}_g$). For the participant-specific analysis, we calculated the ICC applying the definition of the ICC(3,1) proposed by Shrout and Fleiss⁴⁰ to the subset of voxels contained in a specific ROI (Table 1) that, in the case of only two repeated measurements, simplifies to:

$$\begin{aligned} \text {ICC}(3,1) = \dfrac{BMS-EMS}{BMS + EMS} \end{aligned}$$

(6)

where BMS is the Between voxel Mean Square variance and EMS is the Error Mean Square variance obtained using a two-way mixed effect ANOVA, with voxel as a random effect, and session as a fixed effect.
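A minimal Python sketch of Eq. (6) is given below. It computes the mean squares of a two-way ANOVA without replication for two sessions, treating each voxel (or subject) as a target; the function name `icc_3_1` and the example values are hypothetical, and the code is an illustration of the formula rather than the analysis pipeline used in the study.

```python
def icc_3_1(session_1, session_2):
    """ICC(3,1) for two repeated measurements of the same targets."""
    if len(session_1) != len(session_2):
        raise ValueError("both sessions must measure the same targets")
    n = len(session_1)  # number of targets (voxels or subjects)
    k = 2               # number of sessions
    rows = list(zip(session_1, session_2))
    grand = sum(a + b for a, b in rows) / (n * k)
    row_means = [(a + b) / k for a, b in rows]
    col_means = [sum(session_1) / n, sum(session_2) / n]
    ss_total = sum((v - grand) ** 2 for row in rows for v in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_error = ss_total - ss_rows - ss_cols
    bms = ss_rows / (n - 1)                # between-target mean square
    ems = ss_error / ((n - 1) * (k - 1))   # error mean square
    return (bms - ems) / (bms + (k - 1) * ems)


# Hypothetical t-values of four voxels in sessions IN_1 and IN_2.
print(icc_3_1([1.0, 2.0, 3.0, 4.0], [1.2, 1.9, 3.3, 3.8]))
```

Because session is modeled as a fixed effect, a constant offset between the two sessions does not lower this index: it measures consistency rather than absolute agreement.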

For the group level analysis, we calculated reliability maps applying Eq. (6) to each voxel in the brain assuming subject as a random effect and session as a fixed effect. To obtain the ROI-specific $\text {ICC}_g$, we computed the median of the ICC distributions within each region.

For each of the two subject-specific reliability indices ($\text {S}_p$ and $\text {ICC}_p$), we performed statistical inference to test two separate null hypotheses: $h_0|^1$) within each ROI, the means of the reliability indices obtained for the two regressors ($\mathbf {H}_{\mathbf {FCR}}^{\star }$ and $\mathbf {H}_{\mathbf {ECU}}^{\star }$) are equal; $h_0|^2$) for each regressor, the means of the reliability indices obtained for the unilateral ROIs are equal in the two hemispheres. To test $h_0|^1$, for each of the two indices, we performed 18 paired t-tests (one per ROI plus one for the whole brain). To test $h_0|^2$, for each of the two indices, we performed 16 paired t-tests (two per tested ROI, one for each regressor).

Neural correlates of LLRs

Brainstem-specific analysis

We conducted our primary fMRI analysis to test the hypothesis that BOLD signal in any voxel in the brainstem was significantly associated with LLRs for flexor and/or extensor muscles under stretch. For this analysis, we combined data measured in sessions $\text {IN}_1$ and $\text {IN}_2$ by concatenating time-series measured in the two sessions, and adding a separate regressor to model the factor session. We used FWHM = 4 mm and the brainstem-specific HRF for this analysis. Given the regionally specific nature of our primary hypothesis, we only included signal from brainstem voxels and applied a small-volume correction⁴¹ that uses random field theory⁴² to control for the Family-Wise Error rate (FWE) for voxel-specific t-tests at the $\alpha <0.05$ level in the region of interest (number of voxels at the 2-mm isotropic resolution: 3314). To identify the location of clusters of activation in the brainstem, we used the Harvard Ascending Arousal Network Atlas⁴³ and the Brainstem Connectome Atlas⁴⁴, in addition to non-MNI-based anatomical brainstem atlases⁴⁵,⁴⁶.

Whole-brain analysis

We conducted a secondary analysis to test the hypothesis that BOLD signal in any voxel of the brain was significantly associated with LLRs for flexor and/or extensor muscles. For this analysis, we used FWHM = 8 mm for all voxels and the standard HRF. Given the non-specific nature of this secondary hypothesis (number of voxels tested at the group level: 228,365), we controlled for Family-Wise Error rate (FWE) in the whole brain using random field theory⁴² to achieve $\alpha <0.05$ for all voxels in the brain and applied a cluster correction of $k=10$ (only clusters of activation larger than k voxels are considered).

Participant-specific analysis

We first extracted participant-specific t-scores from the first level analysis quantifying activation in the center of the two most significant clusters at the group level (MNI coordinates: [8, − 42, 40] mm for the right medulla, and [− 8, − 30, 26] mm for the left pons). We then selected the individuals with the greatest difference in t-scores for FCR-specific activation (right medulla) and ECU-specific activation (left pons), subjects 7 and 18, respectively, and the individuals with similar t-scores for the two regressors, subjects 9 (right medulla) and 8 (left pons) (Fig. 6). Next, we quantified the residual signal not explained by nuisance regressors (FCR residuals, nuisance regressors: ECU + head movements; ECU residuals, nuisance regressors: FCR + head movements) to evaluate muscle-specific responses of the BOLD signal. We finally segmented and averaged the residual BOLD signal corresponding to multiple perturbations stretching specific muscles. BOLD residuals are overlaid with the regressor of interest from model 1 (scaled by the estimated $\beta _i$ coefficient) in Fig. 6, and with scaled regressors from model 2 in Fig. S13. The background regressor in Fig. 6 has arbitrary amplitude and is only reported for visual comparison.

Results

Validation of StretchfMRI

Reliability of stretch-evoked EMG during fMRI

We validated the stretch-evoked EMG collected during fMRI by quantifying agreement between measurements collected inside and outside the MR scanner. We used two analyses: one based on the mean LLR responses averaged across multiple repetitions for each set of conditions (group-level analysis), and one based on the analysis of EMG measurements of individual perturbations (perturbation-specific analysis). Behavioral data from IN and OUT sessions are reported in the supplementary materials. For reference, the mean timeseries of the EMG recorded during Exp 1 and processed using three different filtering pipelines is shown in Fig. 2A. Muscle-specific EMG recordings are reported in Figs. S9, S10, respectively for FCR and ECU measured during Exp 1.

Group level results

Group-level analysis was performed using the Bland-Altman (BA) method³⁵, with plots shown in Fig. 2B for the comparisons $\text {OUT}_2$ vs. $\text {OUT}_1$ and $\text {IN}_1$ vs. $\text {OUT}_1$. The values of bias, positive and negative limits of agreement (LoA), with the respective confidence intervals are reported in Table S8 in the supplementary materials.

Based on the BA analysis, Adaptive Noise Cancellation (ANC) enables reliable estimation of LLR amplitude during fMRI sessions. In fact, the agreement between LLR amplitudes measured inside and outside the MRI is not worse than the agreement between two OUT sessions. Specifically, the bias estimated for the $\text {IN}_1$ vs. $\text {OUT}_1$ comparison was not significantly different from the one estimated for the $\text {OUT}_2$ vs. $\text {OUT}_1$ comparison ($z(72)=0.56$, $p = 0.57$ for stretch; $z(72)=1.70$, $p=0.09$ for shortening). Additionally, the 95% confidence intervals of the bias estimated for both comparisons, in both muscle stimulus directions, include zero, indicating that the measurement is unbiased (Fig. 3A). Finally, the overlap between the range defined by the LoA for two repeated OUT experiments and the range of LoA for IN vs. OUT experiment, quantified by the Jaccard coefficient, approached perfect overlap (mean ± s.e.m. $J=0.843 \pm 0.001$ for stretch, $J=0.783 \pm 0.001$ for shortening).

For reference, test-retest reliability was worse using standard filtering pipelines (STD and SUB). Specifically, the pairwise comparisons performed between the difference in measurements obtained when using different filtering pipelines rejected the null hypothesis for the comparisons STD vs. ANC and SUB vs. ANC, for both stretch (z(72) = 3.70, p < 0.001; z(72) = 4.13, p < 0.001, respectively) and shortening (z(72) = 4.75, p < 0.001; z(72) = 4.77, p < 0.001, respectively) (Fig. 3A). The measurements estimated using ANC were consistently smaller than the ones estimated using either STD or SUB (Table S8) in both shortening and stretch conditions. The ANC filter also afforded the highest overlap in LoA when quantifying LLRs during fMRI, for both muscle stretch and shortening. Jaccard index values for ANC were greater than those obtained with the SUB filter ($J^{st}_{SUB}=0.593 \pm 0.001$, $J^{sh}_{SUB}=0.459 \pm 0.001$), and the STD filter ($J^{st}_{STD}=0.489 \pm 0.001$, $J^{sh}_{STD}=0.324 \pm 0.001$). Pairwise comparisons of the Jaccard indices rejected the null hypothesis for all comparisons, showing a statistically significant difference between the test-retest error obtained using the different filtering pipelines, in both muscle stimulus conditions (p < 0.001 for all comparisons) (Table S8, Fig. 3B). The group level analysis has been repeated using measurements collected for the $\text {IN}_2$ session (Figs. S11, S12, and Table S10, supplementary materials), showing a perfect overlap of all statistically significant results.

Perturbation-specific results

With the perturbation-specific analysis, we sought to quantify the deviation between each stretch-evoked response during IN sessions and the distribution of responses measured in a matched reference (REF) condition ($\text {OUT}_1$ session), in terms of the z-score of each perturbation (Fig. 3C). A Bartlett test established that the variance in measurements collected during IN sessions using ANC was not significantly different from the one during OUT sessions in matched conditions for muscles undergoing stretch (50 deg/s: $\chi ^2 = 3.67$, p = 0.16; 125 deg/s: $\chi ^2 = 3.69$, p = 0.16; 200 deg/s: $\chi ^2 = 5.8$, p = 0.06), while the variance was greater during the IN condition for muscles undergoing shortening (50 deg/s: $\chi ^2 = 17.50$, p < 0.001; 125 deg/s: $\chi ^2 = 17.87$, p < 0.001; 200 deg/s: $\chi ^2 = 33.07$, p < 0.001).

For reference, the variance of z-scores measured using other filtering pipelines (STD and SUB) was greater than the one measured for both ANC and REF conditions ($p<0.001$ for all paired comparisons). Detailed figures and tables arising from this analysis are included in Table S9 of the supplementary materials. The perturbation-specific analysis is repeated using measurements collected for the $\text {IN}_2$ session (Figs. S11, S12, and Tabs. S10, S11 of the supplementary materials), showing a perfect overlap of all statistically significant results.

Test-retest reliability of neural activations

The analysis of the overlap between the group level activation maps showed a moderate to good overlap in the whole brain as well as for all contralateral cortical ROIs for both regressors, with a poor or absent overlap in the ipsilateral ROIs (Fig. 4A and Table S12). Good to excellent overlap was observed bilaterally in the cerebellar ROI, while moderate overlap was observed bilaterally in the Thalamus, in the left Putamen, and in the right Putamen for FCR-specific maps. Poor overlap was instead observed in the right Putamen for the ECU-specific map and in the brainstem (Fig. 4A and Table S12). The analysis of participant-specific activation maps showed a moderate overlap in the whole brain as well as for all selected cortical and cerebellar ROIs, while fair to poor overlap was observed for the subcortical ROIs (Fig. 4A and Table S12). Similar reliability was measured for activation associated with flexor and extensor LLRs (paired t-tests between subject-specific Sørensen-Dice indices $S_p$ failed to reject the null hypothesis $h_0|^1$ that they are the same for all selected ROIs, with the exception of the right Putamen).

Additionally, the paired t-tests between the $\text {S}_p$ measured in the two hemispheres rejected the null hypothesis $h_0|^2$ for M1 (FCR: t(34) = − 2.71, p = 0.011; ECU: t(34) = − 3.44, p = 0.003), PM (FCR: t(34) = − 2.73, p = 0.010; ECU: t(34) = − 2.45, p = 0.020), and S1 (FCR: t(34) = − 2.66, p = 0.010), showing a significantly higher overlap in the contralateral sensorimotor cortex for both regressors. No significant difference was observed in any of the cerebellar and subcortical ROIs.

The ICC showed fair to good agreement between the session-specific statistical maps for the cortical and cerebellar ROIs both at the individual subject and the group level (Fig. 4B and Table S12). In contrast, only poor to fair ICC was observed in the subcortical ROIs (Fig. 4B and Table S12). The paired t-tests failed to reject the null hypothesis $h_0|^1$ for all ROIs, showing no significant difference in test-retest reliability between the two regressors. Similarly to what was observed in the analysis of the overlap of the thresholded maps, the paired t-test rejected the null hypothesis $h_0|^2$ for M1 (FCR: t(34) = − 2.50, p = 0.018; ECU: t(34) = − 3.35, p = 0.002), PM (ECU: t(34) = − 2.39, p = 0.022), and S1 (FCR: t(30) = − 3.99, p < 0.001; ECU: t(34) = − 3.41, p = 0.002), showing a significantly higher ICC in the contralateral sensorimotor cortex for both regressors. No significant difference was observed in any of the cerebellar and subcortical ROIs.