Introduction

Mental imagery is an important component of human cognition and has value in many application domains. Mental imagery refers to an internal process in which perception-like representations of objects, scenes, events, or sensations emerge without direct external inputs1. Studies suggest that these internally generated representations activate neural substrates that also respond to actual perception, indicating that mental imagery functions as a neural simulation of real sensory experiences2,3.

Among the various types of mental imagery, visual imagery has been studied most extensively4. Previous research has shown that visual imagery activates early visual cortical areas (V1–V4) and is closely associated with spatial reasoning, visual memory, and visual creativity5,6. Auditory imagery engages the auditory cortex (A1)7, which plays an essential role in music memory, language comprehension, and speech learning processes. Motor imagery, associated with the motor cortex (M1), is frequently used in athletic training8, motor skill enhancement, and rehabilitation therapies to improve muscle coordination and precision of movement9. Beyond these widely studied types, olfactory imagery involves the olfactory cortex and related limbic structures, such as the piriform cortex and amygdala, and can trigger emotional and autobiographical memory activation10,11. Tactile imagery helps mentally reconstruct the sense of touch, including temperature, texture, and pressure12. Gustatory imagery activates the insular cortex and is closely related to emotional regulation and appetite control13. Emotional imagery involves emotional processing regions, including the amygdala and anterior cingulate cortex (ACC), and has been applied effectively in treatments for psychological disorders such as post-traumatic stress disorder (PTSD) and anxiety disorders14,15. Importantly, different types of mental imagery often co-occur in naturalistic settings. Neuroimaging findings suggest that these internally generated sensations partially recruit the same neural substrates that process real sensory input1,16. Mental imagery, in essence, is not merely symbolic or abstract representation; rather, on multiple levels, it simulates the brain’s response to actual perception. However, mental imagery does not occur exclusively in task-specific states; even in the absence of external stimuli, the brain remains highly active during resting states, particularly in the medial prefrontal cortex, the posterior cingulate cortex/precuneus, and the lateral cortical regions, which consistently exhibit coordinated activity during rest17,18. This network, known as the default mode network (DMN), is closely related to memory recall19, mind wandering and daydreaming20, memory retrieval, self-referential thought, and mental simulation21,22,23.

Researchers have applied neural representations of mental imagery in electroencephalography (EEG) decoding technologies. The neural mechanisms of mental imagery include sensory simulation, memory retrieval, and internal thought regulation, and these mechanisms illuminate how the brain reconstructs sensory experiences in the absence of external inputs4,24. Studies indicate that mental imagery produces measurable EEG patterns, such as alpha-wave changes and altered occipital gamma waves25,26. These neural markers suggest that EEG signals offer a promising means to recognize and classify mental imagery without relying on external stimuli. Visual imagery appears as alpha-wave modulation and changes in gamma-band activity in occipital regions25,26. Auditory imagery involves cortical areas linked to auditory processing. Motor imagery modulates sensorimotor rhythms and aids rehabilitation and motor control. Each imagery type reflects distinct neural patterns, and EEG signals can capture these patterns in real time, advancing interactive technologies that integrate mental imagery for communication or control. However, most studies focus on unimodal sensory imagery, such as purely visual or auditory forms, and research on multisensory integration remains less extensive27. Concurrent processing of visual, auditory, and tactile imagery requires a better understanding of the neural mechanisms that integrate these modalities. Mental imagery EEG datasets also face technical constraints: a low signal-to-noise ratio (SNR) and high interindividual variability limit the reproducibility of research findings28, and the lack of uniform methodological standards complicates generalization of decoding results29. Observing EEG signals during mixed imagery tasks may clarify how the brain coordinates and merges several internal simulations.

The YOTO (You Only Think Once) dataset holds strong potential for advancing research into the neural mechanisms of mental imagery and resting-state brain activity. It provides a rich collection of non-invasive EEG recordings during multimodal mental imagery tasks and spontaneous rest from a diverse group of participants. We anticipate a wide range of applications for this dataset. For instance, it can be used to develop and evaluate EEG-based decoding models of imagined sensory experiences, investigate and compare the dynamics of perceived and imagined sensory responses, and explore the integration of multisensory representations. In sum, YOTO complements existing EEG resources by providing a high-quality, systematically curated dataset tailored to the study of internal mental states and multimodal cognitive processes.

Methodology

Participants

Twenty healthy participants (14 males, 6 females) volunteered for the study, with a mean age of 23.3 years (median: 23 years; range: 20–36 years). All participants had normal or corrected-to-normal vision. Exclusion criteria included screen-induced dizziness, major diseases, irregular sleep patterns, poor sleep quality, disability, psychiatric disorders, or pregnancy. The study protocol was reviewed and approved by the Research Ethics Committee for Human Subject Protection, National Chiao Tung University, Taiwan (protocol No. NCTU-REC-108–128F; approval date: 31 March 2020; valid until 31 January 2023). All participants provided written informed consent for both participation and public sharing of anonymized EEG data.

EEG data acquisition

EEG signals were recorded in an electromagnetically shielded chamber using a high-fidelity electrophysiological recording system. The Polhemus 3SPACE FASTRAK system was used to position the Cz reference point before securing the EEG headset (Fig. 1). Thirty-two electrodes, including two reference electrodes at A1 and A2, were placed according to the 10–20 international system. Thirty-channel EEG signals were amplified and transmitted to a computer via a SynAmps RT 64-channel amplifier (Compumedics Neuroscan) and digitized at 1000 Hz, and event markers were transmitted via a parallel port to indicate the onset of the trial, stimulus presentation, resting phase, and imagery phases.

Fig. 1. EEG Data Acquisition System Architecture. (a) The EEG experimental setup; (b) the EEG cap with a 30-channel electrode configuration; (c) the 64-channel high-resolution EEG signal amplifier.

Experimental protocol

Each participant completed two separate sessions on different days, each consisting of four blocks of 48 trials interspersed with short breaks (Fig. 2). During the experiment, participants were instructed to maintain fixation on a central cross while performing the mental imagery tasks associated with the presented stimuli.

Fig. 2. Experimental Procedure and Trial Structure. Detailed presentation of the trial structure, illustrating the task phases: fixation, stimulus presentation, imagery, and subjective self-report.

Each trial followed a structured sequence, as shown in Table 1, beginning with a fixation period (2 s), during which participants focused on a central cross to clear their thoughts. This was followed by a stimulus presentation phase (2 s), in which a visual, auditory, or combined stimulus was randomly presented. Participants then entered the imagery phase (4 s), during which they mentally recreated the previously presented stimulus. Finally, during the self-report phase, participants rated the vividness of their mental imagery on a scale from 1 to 5; the duration of this phase varied across individuals. To assess baseline neural activity, resting-state EEG data were recorded both before and after each experimental session.
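For reference, the minimal Python sketch below illustrates how the 4-s imagery phase of each trial could be segmented with MNE-Python, which reads the BrainVision files in this dataset. The file name and the marker label "imagery_onset" are hypothetical placeholders; the actual annotation names are defined in the released .vmrk files.

```python
# Minimal sketch: segmenting the 4-s imagery phase of each trial with MNE-Python.
# The file name and the label "imagery_onset" are hypothetical placeholders.
import mne

raw = mne.io.read_raw_brainvision("sub-01_ses-01_task-imagery_eeg.vhdr", preload=True)
events, event_id = mne.events_from_annotations(raw)

# Phase durations follow the protocol: fixation 2 s, stimulus 2 s, imagery 4 s.
imagery_epochs = mne.Epochs(
    raw, events,
    event_id={"imagery": event_id["imagery_onset"]},  # placeholder marker name
    tmin=0.0, tmax=4.0, baseline=None, preload=True,
)
print(imagery_epochs)
```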

Table 1 Experimental procedure of a single trial consisting of four sequential phases.

Stimuli details

In the imagery task, visual and auditory stimuli were presented, individually or in combination, to ensure a complete examination of sensory processing.

The visual stimuli included a gray square (Fig. 3) and two neutral-expression facial images (one male, one female) taken from the Karolinska Directed Emotional Faces (KDEF) dataset30. The auditory stimuli included three short human vowels (/a/, /i/, /o/) and three piano tones (C: 261.63 Hz, D: 293.66 Hz, E: 329.63 Hz). These stimuli were systematically combined to create 27 unique stimulus conditions spanning three categories: visual-only, auditory-only, and multimodal (visual-auditory). To balance the distribution of stimuli across trials, a weighted randomization strategy was implemented (Table 2, Trials/Block); this prevented overrepresentation of specific conditions while maintaining sufficient exposure to all stimulus types.
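As an illustration of the stimulus space, the short Python sketch below enumerates the 27 conditions implied by the combinations above; the condition labels are ours and may not match the naming used in the dataset's event files.

```python
# Sketch: enumerating the 27 stimulus conditions (3 visual-only, 6 auditory-only,
# 18 multimodal). Labels are illustrative, not the dataset's official names.
import itertools

visual = ["square", "face_male", "face_female"]          # 3 visual stimuli
auditory = ["vowel_a", "vowel_i", "vowel_o",
            "piano_C", "piano_D", "piano_E"]              # 6 auditory stimuli

visual_only = [("V", v, None) for v in visual]            # 3 conditions
auditory_only = [("A", None, a) for a in auditory]        # 6 conditions
multimodal = [("VA", v, a) for v, a in itertools.product(visual, auditory)]  # 18

conditions = visual_only + auditory_only + multimodal
assert len(conditions) == 27
```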

Fig. 3. Example of Gray Square. Visual stimuli employed in the experiment, including an author-designed grayscale square.

Table 2 Stimuli types and their corresponding trials per block.

Data preprocessing pipeline

Preprocessing of the EEG data was carried out using EEGLAB v202231 to ensure signal integrity and mitigate artifacts. A causal FIR filter was used for band-pass filtering between 1 and 50 Hz, with a filter order of 500 and a buffer size of 30 seconds, effectively attenuating low-frequency drifts and high-frequency noise while preserving relevant neural oscillations. The data were subsequently resampled to 250 Hz to optimize computational efficiency while maintaining the fidelity of the recorded neural activity.
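The preprocessing itself was performed in EEGLAB; for readers working in Python, a rough MNE-Python analog of the filtering and resampling steps might look as follows. The exact EEGLAB filter design (order-500 kernel, 30-second buffer) is not reproduced here, and the file name is illustrative.

```python
# Rough MNE-Python analog of the EEGLAB filtering/resampling steps described above.
import mne

raw = mne.io.read_raw_brainvision("sub-01_ses-01_task-imagery_eeg.vhdr", preload=True)

# Causal (minimum-phase) FIR band-pass between 1 and 50 Hz.
raw.filter(l_freq=1.0, h_freq=50.0, method="fir", phase="minimum")

# Downsample from 1000 Hz to 250 Hz.
raw.resample(sfreq=250)
```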

To address non-neural artifacts, artifact subspace reconstruction (ASR)32 was applied with a threshold parameter as suggested in33; ASR adaptively detects and reconstructs segments exhibiting excessive deviation from the statistical distribution of clean EEG data. The choice of the ASR threshold parameter (k) is essential for balancing high-amplitude artifact removal against signal preservation, ensuring that transient high-amplitude artifacts (e.g., muscle activity, electrode displacement) are mitigated without excessively attenuating valid neural signals. Lower k values result in overly aggressive artifact rejection, potentially removing informative neural activity, while higher values may allow significant artifacts to persist.
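A hedged Python sketch of this step is given below, assuming the third-party asrpy package (a Python reimplementation of ASR); the cutoff value shown is only illustrative and is not the threshold used in this work.

```python
# Sketch: ASR-based artifact correction with the asrpy package (assumed API).
# `raw` is the band-pass-filtered, resampled recording from the previous sketch.
# The cutoff (k) value of 20 is illustrative, not the threshold used by the authors.
from asrpy import ASR

asr = ASR(sfreq=raw.info["sfreq"], cutoff=20)
asr.fit(raw)                   # estimate clean-data statistics from the recording
raw_asr = asr.transform(raw)   # reconstruct segments deviating from those statistics
```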

Following high-amplitude artifact correction, independent component analysis (ICA)34 with linear interpolation was performed to decompose the multichannel EEG signals into statistically independent sources, facilitating the isolation of neural components from non-neural artifacts. Finally, ICLabel35, a machine learning-based independent component classification tool, was utilized to automatically identify and remove artifacts related to ocular and muscular activities. Components classified as artifacts with a confidence probability greater than 0.8 were excluded from further analysis. This procedure effectively improved the SNR, ensuring that the retained components predominantly reflect neural activity.
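A corresponding Python sketch, using MNE-Python together with the mne-icalabel package as an analog of (not a reproduction of) the EEGLAB ICA/ICLabel steps described above, might look as follows; the 0.8 probability threshold follows the text, while the number of components and the artifact classes rejected are illustrative choices.

```python
# Sketch: ICA decomposition plus ICLabel-based artifact rejection.
# Assumes `raw` is the ASR-corrected recording from the previous step.
import mne
from mne_icalabel import label_components

ica = mne.preprocessing.ICA(n_components=20, method="infomax",
                            fit_params=dict(extended=True), random_state=0)
ica.fit(raw)

labels = label_components(raw, ica, method="iclabel")
ica.exclude = [
    idx
    for idx, (lab, prob) in enumerate(zip(labels["labels"], labels["y_pred_proba"]))
    if lab in ("eye blink", "muscle artifact") and prob > 0.8  # 0.8 threshold from text
]
raw_clean = ica.apply(raw.copy())
```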

Data Records

The dataset is publicly available on OpenNeuro36. The raw EEG recordings, stored in a BIDS-compliant structure, have been publicly released via https://doi.org/10.18112/openneuro.ds005815.v2.0.1. The data of each subject are organized in individual folders under /sub-<participant id>/eeg/, which contain raw EEG files in BrainVision format, including .vhdr, .eeg, and .vmrk files for each session.
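Because the recordings follow BIDS, one convenient way to load a single subject's raw data in Python is via mne-bids, as sketched below; the entity values (subject, session, and task labels) are illustrative and should be checked against the released dataset.

```python
# Sketch: loading one raw recording from the BIDS layout with mne-bids.
# Subject/session/task labels are illustrative placeholders.
from mne_bids import BIDSPath, read_raw_bids

bids_path = BIDSPath(root="ds005815", subject="01", session="01",
                     task="imagery", datatype="eeg")
raw = read_raw_bids(bids_path)
print(raw.info)
```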

A separate /derivatives/ directory includes the processed EEG data used in technical validation. These files were preprocessed using a pipeline comprising a 1–50 Hz bandpass filter, resampling to 250 Hz, artifact subspace reconstruction (ASR), and independent component analysis (ICA), in order to ensure data quality and reduce artifacts prior to further analysis.

Technical Validation

To ensure the reliability and validity of the dataset, we conducted both behavioral and neurophysiological analyses. Subjective vividness ratings were assessed to evaluate participants’ self-reported imagery experiences under different stimulus conditions. In parallel, neural responses were examined using event-related potentials (ERPs) to capture time-locked brain activity and power spectral density (PSD) analysis to characterize frequency-domain neural oscillations. These analyses were performed to visualize neural responses across experimental conditions.

Vividness ratings analysis

Figure 4 presents the distribution of vividness ratings across different stimulus conditions. A three-way repeated measures ANOVA was conducted to analyze the effects of subject, stimulus condition, and session. Significant main effects were observed for subject (F = 239.54, p < 0.001), stimulus condition (F = 4.04, p < 0.001), and session (F = 14.61, p < 0.001). The interactions between stimulus condition and subject (F = 3.51, p < 0.001) and between subject and session (F = 16.21, p < 0.001) were significant, while the interaction between stimulus condition and session (F = 0.67, p = 0.896) was not. A significant three-way interaction (F = 1.16, p = 0.009) was also observed. Self-report durations during the vividness rating phase were also recorded. Across all subject × session pairs, the minimum duration was 454.54 ms, the maximum was 5434.53 ms, the mean was 1515.24 ms, and the standard deviation was 1092.57 ms.
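One way to approximate this analysis in Python is a three-factor ANOVA with statsmodels, treating subject, stimulus condition, and session as categorical factors, as sketched below; the data file and its column names are hypothetical placeholders for the behavioral records.

```python
# Sketch: three-factor ANOVA on single-trial vividness ratings with statsmodels.
# "vividness_ratings.csv" and its columns ("vividness", "subject", "condition",
# "session") are hypothetical placeholders; one row per trial is assumed.
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

df = pd.read_csv("vividness_ratings.csv")

model = smf.ols("vividness ~ C(subject) * C(condition) * C(session)", data=df).fit()
print(sm.stats.anova_lm(model, typ=2))
```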

Fig. 4. Distribution of Subjective Imagery Vividness Ratings Across Stimulus Conditions. Violin plot depicting the distribution of participant-rated vividness of mental imagery across different stimulus conditions. The horizontal axis categorizes auditory-only, visual-only, and multimodal stimuli, while the vertical axis indicates vividness ratings from 1 to 5.

Event-related potentials (ERP) analysis

Figures 5–7 show the ERP waveforms across stimulus conditions.

Fig. 5. Event-Related Potentials (ERP) Waveforms for Visual Stimuli. ERP waveforms elicited by visual stimuli (male faces, female faces, and squares) at the FCz, Cz, and Pz electrode sites. Facial stimuli evoke stronger neural responses than non-facial shapes, reflecting specialized neural processing of face perception.

Fig. 6. Event-Related Potentials (ERP) Waveforms for Auditory Stimuli. ERP waveforms comparing responses to auditory stimuli (human vowels and musical tones) at different electrodes.

Fig. 7. Event-Related Potentials (ERP) Waveforms for Multimodal (Visual + Auditory) Stimuli. ERP waveforms highlighting neural responses to combined visual and auditory stimuli.

The ERPs elicited by the visual stimuli (male faces, female faces, and squares) are illustrated in Fig. 5. The ERP waveforms for face stimuli show higher amplitudes at the FCz, Cz, and Pz electrodes than those for non-facial stimuli.

The ERPs of the auditory stimuli, depicted in Fig. 6, compare neural responses to vocal and musical stimuli.

The ERPs of the multimodal stimuli, shown in Fig. 7, illustrate the waveform characteristics of combined visual and auditory stimuli.
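The ERP computation itself is standard epoch averaging; a minimal MNE-Python sketch is shown below, where the file name and the condition key are hypothetical placeholders for the dataset's actual event labels.

```python
# Sketch: stimulus-locked ERP for one condition at FCz, Cz, and Pz.
# The file name and the condition key "visual/face_male" are hypothetical.
import mne

raw = mne.io.read_raw_brainvision("sub-01_ses-01_task-imagery_eeg.vhdr", preload=True)
events, event_id = mne.events_from_annotations(raw)

epochs = mne.Epochs(raw, events, event_id=event_id, tmin=-0.2, tmax=0.8,
                    baseline=(None, 0), preload=True)
evoked = epochs["visual/face_male"].average()   # average over trials of one condition
evoked.plot(picks=["FCz", "Cz", "Pz"])
```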

When comparing the power spectral density (PSD) of brain waves between the various types of imagery and the baseline condition (fixation) (Fig. 8), significant regional spectral characteristics were observed for visual, auditory, and mixed audiovisual imagery.
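A hedged sketch of such a comparison, computing Welch PSDs for imagery and fixation epochs and taking band-wise differences in decibels, is given below; the epoch objects are assumed to have been created from the phase markers as in the earlier segmentation sketch, and the band boundaries are conventional choices rather than the exact values used here.

```python
# Sketch: band-wise PSD difference (imagery minus fixation) per channel, in dB.
# Assumes `imagery_epochs` and `fixation_epochs` were built from the phase markers.
import numpy as np

bands = {"Delta": (1, 4), "Theta": (4, 8), "Alpha": (8, 13), "Beta": (13, 30)}

psd_img = imagery_epochs.compute_psd(method="welch", fmin=1, fmax=50)
psd_fix = fixation_epochs.compute_psd(method="welch", fmin=1, fmax=50)
freqs = psd_img.freqs

img = psd_img.get_data().mean(axis=0)   # (n_channels, n_freqs), averaged over epochs
fix = psd_fix.get_data().mean(axis=0)

for name, (lo, hi) in bands.items():
    mask = (freqs >= lo) & (freqs < hi)
    diff_db = (10 * np.log10(img[:, mask].mean(axis=1))
               - 10 * np.log10(fix[:, mask].mean(axis=1)))
    print(name, np.round(diff_db, 2))
```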

Fig. 8. Power Spectral Density (PSD) Differences between Imagery Tasks and Baseline Fixation. Comparative PSD analysis illustrating EEG spectral changes between imagery conditions (visual, auditory, multimodal) and baseline fixation. Visual imagery significantly enhances Alpha oscillations in the occipital lobe; auditory imagery primarily boosts Theta activity centrally; multimodal imagery broadly elevates neural oscillations across frequency bands.

Visual imagery (Fig. 8a) primarily enhanced alpha-wave responses in the occipital lobe, while auditory imagery (Fig. 8b) showed a more pronounced enhancement of Theta waves in the central, frontal, and parietal regions. Mixed audiovisual imagery (Fig. 8c) showed significant changes in all frequency bands, with enhancements that markedly exceeded those of single-modality stimuli.

In Fig. 9a, it can be seen that Delta and Theta waves show significant reductions in the frontal region, with the reduction in Theta waves being more prominent. In contrast, the Alpha and Beta waves exhibit significant enhancements in the occipital lobe (O1, O2, Oz).

Fig. 9. Power Spectral Density (PSD) Differences Among Various Imagery Conditions. Comparative PSD analyses highlighting neural spectral differences between (a) visual versus auditory imagery, (b) multimodal versus visual imagery, and (c) multimodal versus auditory imagery. Results underscore distinctive neural activation patterns across brain regions and frequency bands, notably heightened Alpha and Beta activities in multimodal imagery.

In Fig. 9b, the Delta waves show a slight enhancement in the occipital lobe, while the Theta waves demonstrate a significant enhancement in the central (Cz) and parietal (Pz) regions. Alpha waves also show slight enhancement in the occipital lobe, while Beta waves exhibit enhancement in the frontal and central regions.

In Fig. 9c, the Delta waves show an enhancement in the occipital lobe, with negligible differences in the central and frontal regions. Theta waves demonstrate significant enhancement in the central region, with minimal differences in the frontal region. Alpha waves show significant enhancement in the occipital and parietal regions, while Beta waves exhibit significant enhancement in the frontal and central regions without any noticeable reductions.

The dataset includes EEG signals recorded from both resting-state and task phases, providing complete coverage for further analysis.