Background & Summary

Disabling hearing loss is a leading cause of moderate to severe disability worldwide, affecting over 430 million individuals1. Cochlear implants (CIs) bypass the inner ear and directly stimulate the auditory nerve via electrical potentials, offering a solution to individuals who do not benefit from conventional hearing aids2,3. Despite their proven efficacy, CI outcomes in terms of speech understanding vary significantly due to factors that remain poorly understood4,5,6,7.

Emerging evidence highlights the critical role of brain activation patterns (i.e., cortical factors) in CI outcomes. Investigating these factors under clinically relevant conditions is essential for understanding individual variability, predicting outcomes, and optimizing therapeutic interventions4,5,8,9.

Functional near-infrared spectroscopy (fNIRS) is an ideal neuroimaging tool for investigating cortical activation patterns in CI patients. By measuring changes in oxygenated and deoxygenated hemoglobin concentrations (HbO and HbR), fNIRS provides a non-invasive method for assessing cortical hemodynamic activity. It is silent, radiation-free, and compatible with ferromagnetic implants, making it suitable for clinical populations, including patients with CIs. The optical measurement probes, affixed to the head via a cap, enable long-term recordings in diverse environments and are compatible with other imaging modalities, such as magnetic resonance imaging (MRI) and electroencephalography (EEG). fNIRS has been applied across numerous clinical domains, including attention deficit hyperactivity disorder (ADHD)10, stroke rehabilitation11, depression diagnostics12, neurodegeneration and cognitive decline13,14, hearing loss and speech outcome prediction15,16,17,18, neurorehabilitation19, and tinnitus assessment20,21.

Although fNIRS holds great potential, research on cortical factors in CI patients is complex, and existing studies suffer from methodological heterogeneity and a lack of standardized datasets22,23,24,25,26,27,28,29,30. The brain networks supporting speech understanding in CI listeners remain poorly understood, partly due to the challenges of obtaining functional neuroimaging data in this population31.

As a result, the number of available fNIRS datasets in adult CI populations is limited22,23,32. Shader et al. shared a dataset involving 12 newly implanted CI participants, measuring cortical responses above auditory and visual brain areas33,34. Sherafati et al. collected fNIRS data on 20 unilaterally implanted CI patients and normal hearing (NH) controls, focusing on brain regions supporting spoken word understanding31. Steinmetzger et al. shared an fNIRS-EEG dataset of 20 CI patients with single-sided deafness, focusing on voice pitch processing in temporal regions35. Anderson et al. shared a dataset including 17 CI patients and NH controls measured preoperatively in auditory related brain regions30.

These datasets represent important contributions to the field30,31,33,34,35. However, they focus on specific aspects of speech processing and therefore do not comprehensively address multimodal speech understanding. In particular, they do not cover all relevant cortical areas involved in audiovisual speech perception31,33. Moreover, these studies lack behavioral assessments during fNIRS acquisition and do not include spatial registration of probe positions30,31,33,34,35.

In our approach, we emphasize the importance of conducting research under clinically relevant conditions. We adapted a clinical speech audiometric test (German Matrix Sentence Test, OLSA) to assess cortical processing under different speech understanding scenarios, including speech-in-quiet, speech-in-noise, audiovisual speech, and visual speech (i.e., lipreading) conditions. This approach ensures comparability with clinical assessments and allows for a more comprehensive investigation of multimodal speech understanding. Our dataset covers key brain regions involved in audiovisual speech perception, including the prefrontal, temporal, and visual cortices. We incorporated objective (e.g., hearing tests) and behavioral parameters to improve the accuracy of data interpretation. Furthermore, we conducted spatial registration of the optical probe positions to correct for anatomical differences across participants, improving the reliability of group-level analyses.

The dataset presented in this data descriptor includes 46 CI patients and 26 NH controls. It is valuable for researchers in the field of hearing research seeking to explore the neural basis of prosthetic hearing rehabilitation with CIs.

Potential applications of this dataset include, but are not limited to: (i) exploration of neural mechanisms underlying speech perception in CI users across different listening modalities, (ii) comparative studies between CI and NH individuals to better understand cortical adaptations and neuroplasticity in prosthetic hearing, (iii) correlational analyses linking brain responses with objective or behavioral parameters (e.g., listening effort, speech-in-noise performance), (iv) development and validation of new signal processing pipelines for fNIRS data, especially in clinical populations, and (v) training of machine learning models to predict individual rehabilitation outcomes or listening effort from neural signals.

Methods

Ethics statement

This study adhered to the Declaration of Helsinki and received approval from the local ethics committee (KEK-Bern, BASED-ID 2020-02978). All participants provided written informed consent before their participation.

Participants

The dataset comprises 72 participants, including 46 CI users with varying levels of speech understanding and 26 NH controls. All participants had normal or corrected-to-normal vision and no history of neurological or psychiatric conditions, nor any brain injury.

As an inclusion criterion, CI users had pure-tone average (PTA) hearing thresholds exceeding 80 decibel hearing level (dB HL) in both ears, averaged across 500, 1000, 2000, and 4000 Hz, confirmed via pure-tone audiometry. During the study, CI users wore the audio processor on their better-hearing side, as determined by audiometric assessment or personal preference.

Study procedure

Each participant took part in a single study appointment, which lasted approximately 2 to 2.5 hours. The study was conducted in a sound-treated acoustic chamber. Participants completed questionnaires, hearing assessments, and performed a multimodal speech comprehension task while brain activity was measured by fNIRS (summarized in Table 1). The score range and interpretation of the collected metadata are summarized in Table 2.

The study procedure involved:

  • Step 1: Initial questionnaires to assess handedness, lipreading experience, and subjective hearing perception.

  • Step 2: Pure-tone and speech audiometry to assess hearing.

  • Steps 3 and 5: fNIRS measurements involving a multimodal speech comprehension task.

  • Steps 4 and 6: Behavioral assessments of listening effort, fatigue, and task engagement.

  • Step 7: Spatial registration of probe positions.

Table 1 Study Procedure.
Table 2 Summary of questionnaires, hearing tests, and behavioral assessments included in the metadata.

Questionnaires

Before the neuroimaging experiment, participants completed questionnaires on their handedness (Edinburgh Handedness Inventory36), subjective hearing experience (SSQ-1237), and lipreading skills38.

Hearing tests and etiology

All participants underwent speech audiometric testing. Speech recognition in quiet was assessed using the Freiburg monosyllabic word test at 65 dB sound pressure level (SPL), with word recognition scores (WRS) reported as percentages39. Speech recognition in noise was evaluated using the adaptive OLSA40,41,42,43,44, from which the speech reception threshold (SRT) was determined in dB signal-to-noise ratio (SNR). The maximum SNR was limited to 30 dB to avoid loudspeaker distortion.

For CI users, the patient history included information on age at onset of deafness, duration of deafness, age at implantation, implant experience, and cause of hearing loss. These variables are shared in an aggregated format to comply with the legal and ethical guidelines on human subject data.

fNIRS system and data acquisition

We used a continuous-wave fNIRS system (FOIRE-3000, Shimadzu, Kyoto, Japan) with 16 light sources and 16 detectors. The system emits laser light at wavelengths of 780 nm, 805 nm, and 830 nm, and a multi-alkali photomultiplier detector captures changes in light intensity.

To accommodate CI participants, the measurement probes were arranged to allow attachment of the external CI transmitter behind the ear. The cap design included measurement channels above temporal, visual, and prefrontal brain regions. Three channels had short source-detector separation (SDS) to capture extracerebral signals45,46,47. The sampling rate was 14 Hz.

During the fNIRS measurement, participants were seated in a chair behind a desk with a monitor and a loudspeaker. The loudspeaker was calibrated to 65 dB SPL for all stimuli. Following task instructions and familiarization, the fNIRS cap was fitted to the participant’s head. For CI users, any probes interfering with the external transmitter were removed.

fNIRS task design

During the fNIRS task, participants listened to sentences from the clinically established OLSA test48. Each 13-second-long stimulus49 consisted of a sentence repeated three times in one of four modalities: (1) speech-in-quiet (fixation cross only), (2) speech-in-noise (fixation cross only), (3) audiovisual speech (without noise), or (4) visual speech (i.e., lipreading). Each block included four stimuli (one per modality) and a comprehension question about the content. Breaks of 20–25 seconds followed each stimulus, shortened to 10 seconds before comprehension questions, with no time limit for answering (see the block design in Fig. 1).

Fig. 1
figure 1

The sentences were presented in one of the following conditions: speech-in-quiet, speech-in-noise, audiovisual speech, or visual speech. They were arranged in a counterbalanced block design, including a comprehension question about the content.

Overall, participants completed ten counterbalanced blocks, resulting in ten repetitions and two questions per modality. After the fifth block, participants had an extended break to report on behavioral parameters.
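To make the trial structure concrete, the sketch below lays out one possible block schedule under the parameters described above. The modality order, break durations, and question placement shown here are illustrative assumptions; the actual ordering for each participant is stored in the event annotations of the SNIRF files.

```python
import random

# Illustrative timing of one block (all names and the question placement are
# our assumptions): four 13-s stimuli, one per modality, separated by 20-25 s
# breaks; the break before the comprehension question is shortened to 10 s,
# and answering itself is self-paced.
MODALITIES = ["speech-in-quiet", "speech-in-noise", "audiovisual", "visual"]
STIM_DUR = 13.0
BREAK_RANGE = (20.0, 25.0)
PRE_QUESTION_BREAK = 10.0

def build_block(order):
    """Return (onset, label) pairs for one block with the question at the end."""
    t, events = 0.0, []
    for i, modality in enumerate(order):
        events.append((t, modality))
        t += STIM_DUR
        if i == len(order) - 1:
            events.append((t + PRE_QUESTION_BREAK, "comprehension question"))
        else:
            t += random.uniform(*BREAK_RANGE)
    return events

for onset, label in build_block(random.sample(MODALITIES, k=4)):
    print(f"{onset:6.1f} s  {label}")
```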

Behavioral assessments

We collected behavioral data throughout the experiment to assess task-related parameters, including listening effort according to the Adaptive Categorical Listening Effort Scaling (ACALES)50, fatigue measured using the Rating-of-Fatigue Scale51, and task-related attention or mind-wandering52,53.

Spatial registration

Following the fNIRS task, we performed spatial registration of the optical probe positions on the head using a Structure Sensor Pro 3D scanner (Occipital Inc., Boulder, United States) mounted on an iPad Pro 2020 (Apple Inc., Cupertino, California, United States; iOS 14.3)54.

Data Records

The data are available at Dryad55. The dataset is organized according to the Brain Imaging Data Structure v1.10.0 (BIDS)56 with the extension for NIRS data57.

The main data files are stored in the Shared Near Infrared Spectroscopy Format (SNIRF)58, a widely accepted standard for storing fNIRS data. The SNIRF files contain raw absorbance data from each channel at wavelengths of 780 nm, 805 nm, and 830 nm. They also include probe positions and associated metadata from questionnaires, hearing tests, behavioral assessments, and patient history.
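As a minimal sketch, a single recording can be inspected with MNE-Python's SNIRF reader; the file path below is a placeholder, not an actual file name from the dataset, and should be replaced with a path inside the downloaded BIDS folder.

```python
import mne

# Load one raw SNIRF recording (placeholder path) and inspect its contents.
raw = mne.io.read_raw_snirf("sub-01/nirs/sub-01_task-speech_nirs.snirf",
                            preload=True)
print(raw.info)          # channel names, wavelengths, sampling rate
print(raw.annotations)   # block, condition, and question triggers
```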

The temporal structure of each recording is consistent across participants and is illustrated in Fig. 2, which shows a representative example from a single participant. The fNIRS task begins with a start trigger, followed by ten counterbalanced blocks, with a break period at the halfway point.

Fig. 2
figure 2

Example timeline of the complete fNIRS measurement stored in the SNIRF files. The fNIRS task began with a start trigger, followed by ten counterbalanced blocks. These blocks consisted of four different listening conditions and two comprehension questions per condition. The questions had three possible outcomes: Right Answer, Wrong Answer, or Not Sure. After the fifth block, participants had a break, marked with a stop trigger.

Each participant has an individual SNIRF file, along with accompanying files defined by the BIDS specification. The BIDS fields are replicated from the SNIRF file, which allows relevant behavioral, event, or channel-related data to be parsed without using an SNIRF reader. The latest specification of the BIDS-NIRS format can be accessed online59.

Participant-level information is stored in the root directory of the dataset in a “participants.tsv” file, along with an accompanying “participants.json” sidecar that provides descriptions of each column, following BIDS formatting conventions60. These top-level files summarize relevant demographic and audiometric variables, enabling researchers to filter participant groups efficiently. To complement this, a more detailed Excel file (participants info.xlsx) includes all metadata, with accompanying MATLAB61 code provided in the “supplementary code” folder.
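For example, the participant table can be filtered with pandas. The column names used below are assumptions for illustration only; consult participants.json for the actual column definitions.

```python
import pandas as pd

# Read the top-level, tab-separated participant table (BIDS convention).
participants = pd.read_csv("participants.tsv", sep="\t")

# "group" and "CI" are assumed labels; see participants.json for the
# actual column names and allowed values.
ci_group = participants[participants["group"] == "CI"]
print(f"{len(participants)} participants, {len(ci_group)} with a CI")
```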

For Python, we provide a demonstration pipeline that includes loading, preprocessing, and visualization, implemented in a Jupyter Notebook (ds_main.ipynb) and a supporting script (ds_helpers.py). To run the demonstration, all dependencies listed in the “environment.yml” file need to be installed.

Technical Validation

The data descriptor includes three main groups of data: (1) metadata, (2) raw fNIRS data, and (3) spatial registration data.

We report the metadata (1) without processing. We share the raw fNIRS data (2) in a BIDS-compliant format and additionally provide scripts for processing and visualization. We processed and visualized the spatial registration data (3) according to Bálint et al.54.

fNIRS data processing

We used the MNE-NIRS toolbox for fNIRS data processing62,63. In the first step, we conducted short-separation regression on the raw absorbance data using the nearest short SDS channel46,64,65. The signal was then bandpass filtered in the range of 0.01-0.1 Hz66.
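A minimal sketch of the analogous steps in MNE-NIRS is shown below. Note that the library's short-channel regression operates on optical-density data, so the sketch differs slightly from the descriptor's regression on raw absorbance; the file path and the 1 cm short-channel threshold are assumptions.

```python
import mne
from mne.preprocessing.nirs import optical_density
from mne_nirs.signal_enhancement import short_channel_regression

# Sketch only: convert to optical density, regress out the nearest short
# source-detector separation (SDS) channel, then band-pass filter 0.01-0.1 Hz.
raw = mne.io.read_raw_snirf("sub-01/nirs/sub-01_task-speech_nirs.snirf",
                            preload=True)
raw_od = optical_density(raw)
raw_od = short_channel_regression(raw_od, max_dist=0.01)  # < 1 cm treated as short (assumed)
raw_od.filter(0.01, 0.1)  # band-pass edges in Hz
```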

Next, we converted the signals to changes in HbO and HbR concentrations according to the specifications of the fNIRS machine (resulting unit: µMolar·cm)67:

$$\delta HbO=-1.4887\cdot Abs^{780\,nm}+0.5970\cdot Abs^{805\,nm}+1.4878\cdot Abs^{830\,nm}$$
(1)
$$\delta HbR=1.8545\cdot Abs^{780\,nm}-0.2394\cdot Abs^{805\,nm}-1.0947\cdot Abs^{830\,nm}$$
(2)
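Equations (1) and (2) translate directly into a small helper function, for instance:

```python
import numpy as np

def absorbance_to_hemoglobin(abs780, abs805, abs830):
    """Apply Eqs. (1) and (2): absorbance arrays -> (dHbO, dHbR) in µMolar*cm."""
    d_hbo = -1.4887 * abs780 + 0.5970 * abs805 + 1.4878 * abs830
    d_hbr = 1.8545 * abs780 - 0.2394 * abs805 - 1.0947 * abs830
    return d_hbo, d_hbr

# Example with dummy data: three absorbance traces of equal length.
rng = np.random.default_rng(0)
abs780, abs805, abs830 = rng.normal(size=(3, 1000))
d_hbo, d_hbr = absorbance_to_hemoglobin(abs780, abs805, abs830)
```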

Motion artifacts were removed using temporal derivative distribution repair (TDDR)68.

Individual epochs from 0 to 24 seconds following the onset of stimulation were extracted and baseline corrected (−5 to 0 seconds). Epochs in which participants reported interference (e.g., movement during stimulus presentation, incorrectly selected answers) were removed, which affected 0.29% of all epochs. In addition, due to physical constraints from the CI coil and cap setup, a subset of channels (4.5% of all channels) was marked as NaN, meaning they were not used in the analysis.
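A sketch of the motion-correction and epoching steps with MNE-Python is given below. Here, raw_haemo stands for the HbO/HbR signals produced by the preceding steps, and the event labels depend on the annotations stored in each SNIRF file; the sketch is not necessarily identical to the descriptor's exact implementation.

```python
import mne
from mne.preprocessing.nirs import temporal_derivative_distribution_repair

# `raw_haemo` is assumed to be an mne.io.Raw object holding the HbO/HbR
# signals obtained from Eqs. (1) and (2) above.
raw_haemo = temporal_derivative_distribution_repair(raw_haemo)  # TDDR motion correction

events, event_id = mne.events_from_annotations(raw_haemo)
epochs = mne.Epochs(raw_haemo, events, event_id=event_id,
                    tmin=-5.0, tmax=24.0,      # 0-24 s analysis window plus baseline
                    baseline=(-5.0, 0.0),      # baseline correction from -5 to 0 s
                    preload=True)
```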

Spatial registration processing

On the 3D scans, we manually labeled the probe positions and anatomical landmarks (nasion, left and right preauricular points) using MeshLab (version 2022.02)69. These labeled probe positions were then aligned with the MNE-Python 10-05 template62,63 through rigid registration, using the anatomical landmarks as control points54. The resulting 3D positions are provided in the coordinate system of the MNE 10-05 template. For 14 NH and 9 CI participants (out of 72), the spatial registration was insufficient, and data from a comparable participant with a matching head circumference were used instead. On average, a matching head size could be found within an error of 0.39 cm (standard deviation = 0.30 cm, range = 0–1.25 cm). These participants are marked in the dataset with the variable “HeadSizeMatch”.
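The rigid alignment step can be reproduced in essence with a standard Kabsch fit of the three anatomical landmarks, as in the minimal sketch below; the coordinates are placeholders and this is our illustration, not the exact code used to produce the dataset.

```python
import numpy as np

def rigid_transform(src, dst):
    """Kabsch fit: rotation R and translation t such that R @ src[i] + t ~ dst[i]."""
    src_c, dst_c = src - src.mean(0), dst - dst.mean(0)
    U, _, Vt = np.linalg.svd(src_c.T @ dst_c)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:      # guard against a reflection
        Vt[-1] *= -1
        R = Vt.T @ U.T
    t = dst.mean(0) - R @ src.mean(0)
    return R, t

# Placeholder landmark coordinates (nasion, left/right preauricular points)
# in scanner space and template space, in meters.
scan = np.array([[0.00, 0.100, 0.020], [-0.080, 0.010, 0.0], [0.080, 0.010, 0.0]])
template = np.array([[0.00, 0.095, 0.015], [-0.082, 0.000, 0.0], [0.082, 0.000, 0.0]])

R, t = rigid_transform(scan, template)
# All labeled probe positions can then be mapped into template space:
# probes_template = probe_positions @ R.T + t
```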

Metadata results

Figure 3 shows two examples from the metadata. The top row shows the distribution of participants based on their SRT. For NH participants, the values are consistent with normative data70. Some CI participants had an SRT comparable to those of the NH group, while others had higher SRTs, indicating worse speech understanding in noise.

Fig. 3
figure 3

Example variables from the audiological and behavioral assessments between normal hearing (controls, NH) and cochlear implant (CI) participants: The top row is the participant distribution by speech reception threshold (SRT), and the bottom row is the reported listening effort based on the Adaptive Categorical Listening Effort Scaling (ACALES). The box plot shows the median, quartiles, data range, and any outliers.

The bottom row shows listening effort ratings across the different conditions during the fNIRS speech comprehension task. As expected, given their lower speech understanding (top row), CI participants reported higher listening effort in the speech-in-quiet, speech-in-noise, and audiovisual conditions compared to NH participants. In the visual speech (i.e., lipreading) condition, CI participants reported effort across a wide range, with a trend towards lower effort compared to NH participants.

fNIRS data results

The neurophysiological basis of fNIRS lies in the detection of local hemodynamic responses, which reflect neural activity. In an activated brain region, blood flow increases more than oxygen metabolism, leading to changes in hemoglobin concentration71. Typically, an increase in HbO and a corresponding decrease in HbR concentration indicate a canonical hemodynamic response. The fNIRS signal is inherently slow, peaking several seconds after stimulation72.

Figure 4 shows grand average evoked hemodynamic responses for three selected regions of interest (ROIs) across conditions in the NH group. Distinct activation patterns were observed: speech-in-quiet and speech-in-noise activated bilateral temporal ROIs, visual speech stimuli engaged mainly the occipital ROI, and audiovisual stimuli activated all ROIs. Channel positions contributing to each ROI are shown in the top right corner of the figure.
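As an example of how such ROI averages can be obtained with MNE-Python, the sketch below picks a hypothetical set of channels for one ROI and combines per-participant evoked responses; the channel and condition names are placeholders, not the labels used in the dataset.

```python
import mne

# `epochs` is assumed to be a per-participant Epochs object from the
# processing steps above; channel and condition names are placeholders.
roi_channels = ["S1_D1 hbo", "S2_D1 hbo"]          # hypothetical left temporal picks
evoked = epochs["speech-in-quiet"].average(picks=roi_channels)

# Repeat per participant, collect the evoked responses in a list, then:
# grand_avg = mne.grand_average(list_of_evokeds)
```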

Fig. 4
figure 4

Grand average evoked responses in three selected regions of interest (ROIs) for each condition. The red signal represents the concentration change in oxygenated hemoglobin (HbO) and the blue signal in deoxygenated hemoglobin (HbR). Error bars represent the 95% confidence intervals. The signals were measured in the normal hearing (NH) participants, who served as controls. The top right corner shows the channel positions included in the corresponding ROI, indicating the brain regions contributing to the grand average patterns. An increase in HbO and a decrease in HbR concentration indicate typical hemodynamic responses.

Spatial registration results

We visualized the registered probe positions in Fig. 5, showing lateral, caudal, and medial views. The yellow dots represent all registered positions. The red dots represent the light sources, while the blue dots represent the light detectors for a single participant. The cap design ensured coverage of key audiovisual speech regions, including the occipital cortex, the left and right temporal cortices, and prefrontal areas.
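A comparable 3D view can be produced with MNE-Python once the registered positions are attached to a recording's montage; the snippet below is a sketch that assumes `raw` already carries the registered optode positions and requires the fsaverage template files bundled with MNE.

```python
import os.path as op
import mne

# `raw` is assumed to be a loaded recording whose montage contains the
# spatially registered optode positions (MNE 10-05 / fsaverage space).
subjects_dir = op.dirname(str(mne.datasets.fetch_fsaverage()))
mne.viz.plot_alignment(raw.info, trans="fsaverage", subject="fsaverage",
                       subjects_dir=subjects_dir, surfaces=["head"],
                       coord_frame="head", fnirs=["sources", "detectors"])
```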

Fig. 5
figure 5

Visualization of spatially registered probe positions. The top left shows the lateral view, the top right the medial view, and the bottom middle the caudal view. Yellow markers represent all registered positions, while red dots denote light sources and blue dots denote detectors for a single participant.