Abstract
Developing decoders for imagined speech is crucial for individuals unable to produce audible speech, yet this process is challenging due to the lack of a verifiable reference. This limitation highlights the need for innovative approaches to imagined speech decoding that can leverage the verifiable references available from other types of speech production. To address this challenge, it is essential to understand the relationships between overt, mimed, and imagined speech. In our study, we utilized electrocorticography (ECoG) to examine neural correlates and decoder performance across these speech types during speech preparation. We observed that high gamma activity (HGA) diverged earlier in the left hemisphere than in the right, suggesting that information processing for speech production begins at an earlier stage there. HGA in overt and mimed speech showed similarity in the precentral and postcentral gyri, peaking at the end of speech preparation. In contrast, mimed and imagined speech showed similar HGA in the superior temporal gyrus earlier in the preparation period. These findings indicate that mimed speech shares neural signals with both overt and imagined speech in distinct spatial and temporal aspects, facilitating the development of imagined speech decoders for speech-impaired individuals.
Introduction
Verbal communication is the most straightforward way to share thoughts. However, for patients with motor impairments caused by neurological conditions like Amyotrophic Lateral Sclerosis (ALS), the loss of speech is often regarded as one of the most devastating disabilities1. Speech Brain-Computer Interface (BCI) technology has emerged as a promising solution, enabling the restoration of communication by decoding neural signals associated with speech production. Previous speech BCIs have predominantly focused on precentral and postcentral gyri, including laryngeal motor cortices (LMC), to decode speech-related activity2,3,4,5.
To date, speech production decoding has primarily centered around overt speech production6,7,8,9. The performance of these decoders is evaluated by comparing the sounds produced by the subject with those generated by the decoder. However, considering that the primary beneficiaries of speech decoders are those who cannot produce audible output, there is an urgent need to develop technology that can decode mimed or imagined speech.
Decoding of non-audible speech has recently been actively pursued3,10,11,12,13,14,15,16. In our previous study, we demonstrated that mimed speech could be moderately reconstructed using data from the precentral gyrus collected during overt speech3. However, imagined speech could not be reconstructed from data collected during overt speech. Notably, in mimed speech it is possible to obtain a verifiable reference from articulatory movements, allowing articulatory movements, and thereby speech intent, to be predicted17. However, imagined speech requires a different approach.
Initially, the potential of imagined speech decoding was demonstrated through syllable decoding using EEG18,19. Recently, imagined words were decoded with high performance using single neuron firing data in the supramarginal gyrus20. However, due to the absence of the verifiable reference in imagined speech, directly decoding the neural signals into actual speech has remained a significant challenge in prior studies. This limitation highlights the need for a different approach to imagined speech decoding, one that leverages the available verifiable reference in other speech production types.
To address this challenge, it is crucial to understand the relationships between overt, mimed, and imagined speech. By characterizing the spatial, temporal, and spectral similarities and differences among these three speech types, we can refine existing decoders and bridge the gap between modalities with and without a verifiable reference. Leveraging the nested hierarchy revealed in previous studies, where mimed speech is situated between overt and imagined speech in neural representation21, offers a promising starting point for this exploration.
Focusing on the speech preparation stage is crucial for understanding the neural mechanisms underlying speech intention, as it reflects the cognitive and motor planning processes before articulation. This stage is particularly relevant when comparing overt, mimed, and imagined speech, as it allows us to bridge the neural representations between physically articulated and imagined speech production.
While the hierarchical relationship among these speech types has been established, the specific spatial and temporal dynamics underlying this hierarchy remain unclear. For instance, it is still uncertain whether mimed speech shares stronger neural alignment with overt speech due to articulatory activity, or with imagined speech due to the absence of acoustic output.
To address these gaps, this study aims to elucidate the spatial and temporal characteristics of neural activity during speech preparation across overt, mimed, and imagined speech. Using electrocorticography (ECoG), which provides high spatiotemporal resolution, we examine the shared and distinct neural mechanisms of these speech production types. By building on the existing nested hierarchy and further dissecting the cortical dynamics, we aim to uncover actionable insights that can enhance imagined speech decoding, ultimately improving BCI performance for individuals with severe speech impairments.
Results
Subjects were asked to read Korean words under three different production types (Fig. 1a-c): overt, mimed, and imagined speech. Each word was displayed for 3 s, followed by a fixation cross for 1 s. Each session (overt, mimed, and imagined speech) consisted of 108 unique words, with each word presented once per session.
High gamma activities associated with three speech production types from each cortical region. a-c. Subjects were asked to perform overt (a), mimed (b), and imagined speech (c) for the word displayed on a monitor. This was repeated for 108 words in each speech production type. d-i. High gamma activities during distinct speech production types in the inferior temporal gyrus (d), superior temporal gyrus (e), pars triangularis (f), rostral middle frontal gyrus (g), precentral gyrus (h), and postcentral gyrus (i). The figures in the left column show each region in the left hemisphere, while the figures in the right column show each region in the right hemisphere. Traces indicate the mean ± standard error of high gamma activities. The red line underneath indicates time points during which activities significantly differed (FDR-corrected p < 0.0001; one-way ANOVA).
In the present study, we assumed that the neural state for each specific speech production type is determined prior to speech onset. Since overt speech is unique in generating audible sounds and eliciting auditory feedback, we deliberately analyzed the neural signals during the speech preparation period from word presentation to the onset of overt speech, lasting approximately 950 ms on average (Supplementary Fig. 1). For the two speech production types that generate no audible sound, mimed and imagined speech, the speech preparation period was assumed to span the interval preceding the overt speech onset time (SOT) of the same word.
High gamma activity (HGA) was analyzed for each electrode, as it serves as the most important feature for speech decoding, given its critical role in representing speech-related neural activity during both silent and spoken speech production2,3,17.
Distinct high gamma patterns during overt, mimed, and imagined speech
First, we analyzed the high gamma activities in the three speech production tasks (overt, mimed, and imagined speech) to elucidate the temporal and spatial dynamics across all electrodes (n = 663) (Supplementary Fig. 2). Statistical significance of observed HGA differences was assessed using a one-way ANOVA across 20 time bins, each representing 5% of the trial duration (47.5 ms), with false discovery rate (FDR) correction (α = 0.0001), as detailed in the Methods.
During the early phase of speech processing at 5–10% of the period (approximately 95 ms after word presentation), we observed simultaneous activation across all three types of speech production (overt, mimed, and imagined) in the lateral occipital gyrus (Supplementary Fig. 3a). This may represent visual processing associated with word presentation.
At approximately 25–30% of the period (285 ms after word presentation), simultaneous activities were found in the left fusiform gyrus and inferior temporal gyrus (ITG) for all three speech production types (Supplementary Fig. 3b, Fig. 1d). This phase likely represents visual word-form processing following the initial visual processing.
After this initial time period, a less pronounced but discernible activity was observed in the superior temporal gyrus (STG) (Fig. 1e). Notably, the left STG showed higher activation for mimed and imagined speech compared to overt speech approximately from 30 to 35% of the period (332.5 ms after word presentation), which may reflect increased reliance on internal auditory simulation in the absence of external feedback during non-overt speech production. The right STG showed slightly later activation, approximately from 40 to 45% of the period (380 ms after word presentation). Interestingly, high gamma activation was observed in the STG of both hemispheres just before the SOT during overt speech, indicating a preparatory auditory process prior to sound production. This later activation pattern, particularly prominent in overt speech, may reflect the engagement of an internal forward model for error correction during real-time speech.
After the activation in the left STG, activities in the left pars triangularis for overt and mimed speech increased compared to imagined speech at 20–25% of the period (237.5 ms after word presentation) (Fig. 1f). In the right hemisphere, the activities began to diverge at 50–55% of the period (522.5 ms after word presentation), similar to the patterns observed in the right precentral and postcentral gyri (Fig. 1h, i).
A similar grouping of overt and mimed speech was also observed in the left rostral middle frontal gyrus from 25 to 30% of the period (280 ms after word presentation) (Fig. 1g). However, no distinct pattern was observed in the right rostral middle frontal gyrus. The consistent grouping of overt and mimed speech from the left pars triangularis to the left precentral and postcentral gyri may imply speech motor planning.
Interestingly, activation in the bilateral precentral (Fig. 1h) and postcentral gyri (Fig. 1i) was observed only during the overt and mimed speech tasks. This divergence occurred earlier in the left hemisphere than in the right. In the left hemisphere, the precentral and postcentral gyri began to diverge at 25–30% and 30–35% of the period (285 ms and 332.5 ms after word presentation), respectively. In the right hemisphere, the precentral and postcentral gyri began to diverge at 60–65% and 45–50% of the period (617.5 ms and 475 ms after word presentation), respectively. These results indicate that speech motor planning initiates earlier in the left hemisphere than in the right hemisphere.
During the three speech production tasks, high gamma activity showed complex and dynamic patterns across different cortical regions. Our findings suggest that each speech task has both shared and unique characteristics, spanning temporal and spatial domains in speech processing. Shared early visual and word processing stages were observed across all three speech production types. This was followed by task-specific patterns in speech motor planning and execution areas, with the left hemisphere showing earlier activation and a dominant role in speech production. Imagined speech exhibited less pronounced activity in motor-related areas compared to overt and mimed speech.
Evaluating statistical distances between speech production types
The distinct high gamma activity observed across different speech production types indicates specific functional characteristics for each cortical region. To assess the similarities and dissimilarities between the neural correlates of each speech production type, we first investigated the correlation of average HGA during the speech preparation period across all implanted electrodes between speech production types. We observed a significant positive correlation of the average HGA between overt and mimed speech (Fig. 2a; r = 0.56, p < 0.01), suggesting a moderate neural alignment between these two types of speech production. Similarly, there was a low but significant correlation between mimed and imagined speech (Fig. 2b; r = 0.18, p < 0.01), which indicates a lesser, yet present, neural correspondence between these conditions. In contrast, the average HGA for overt and imagined speech did not show a significant correlation (Fig. 2c; r = 0.10, p = 0.08), highlighting a distinct neural processing pattern for imagined speech as compared to overt speech production.
Correlations between each speech production type and temporal dynamics of distances between each pair of speech production types for each cortical region. a-c. Correlations of averaged HGA of each electrode across speech preparation time and trials between (a) overt and mimed speech, (b) mimed and imagined speech, and (c) overt and imagined speech. Each dot represents one electrode. d-g. Temporal dynamics of the Hellinger distances between two speech production types at the (d) superior temporal gyrus, (e) pars triangularis, (f) precentral gyrus, and (g) postcentral gyrus. The purple, magenta, and green lines depict the statistical distance between overt and mimed, between mimed and imagined, and between overt and imagined speech, respectively. The gray regions indicate the 99.7% confidence interval (± 3 SD) generated by permutation.
Furthermore, we quantified the distance between the neural signals of each pair of speech production types (Fig. 1d-i). The Hellinger distance was employed to define the statistical distance between two speech production types, with higher values indicating more distinct neural activation patterns and lower values suggesting more similar representations.
In both the left and right STG (Fig. 2d), the neural correlates of mimed and imagined speech remained similar throughout the entire speech preparation period. Conversely, distances for the other pairs increased until the actual production began, reflecting the generation of audible sounds in overt speech.
For the left pars triangularis, the three speech types showed high similarity except at 25–30% of the period (285 ms after word presentation). The distance between overt and mimed speech remained low for all time bins, implying the motor-related properties of this region mentioned earlier. In the right hemisphere, the distance between overt and mimed speech remained low but started diverging at 80–85% of the period (807.5 ms after word presentation) (Fig. 2e).
The left and right precentral gyri (Fig. 2f) showed significant differences in distances for all pairs from around 30% and 80% of the period (285 ms and 760 ms after word presentation) onward, respectively. This illustrates that the processing of speech motor information begins much later in the right precentral gyrus. The divergence between overt and mimed speech in the left precentral gyrus may imply functions related to the generation of audible sound, such as vocalization. The left and right postcentral gyri (Fig. 2g) demonstrated properties similar to those of the ipsilateral precentral gyri. However, in the left postcentral gyrus, the distance between overt and mimed speech remained low until 80–85% of the period (807.5 ms after word presentation).
Neural similarities through classification among speech production types
Next, we assessed the neural similarities between the three distinct types of speech. We employed a machine learning classifier trained to categorize averaged HGA during the speech preparation period into one of the three speech production classes (Supplementary Fig. 4). As HGA demonstrated the highest accuracy across different neural frequency bands for assessing neural similarities between speech production types (Supplementary Fig. 5), we assessed the similarities using this band. The result of the three-class classification is shown in Supplementary Fig. 6.
To assess neural similarities, we grouped the three speech production types into pairs, combining two types into one class and comparing this combined class against the remaining type. The underlying assumption was that if the difference between the combined class and the remaining class was significant, it would indicate that the two combined types were more similar to each other than to the remaining type. The first comparison combined overt and mimed speech into one class and compared it against imagined speech. The second comparison combined mimed and imagined speech and compared it against overt speech. The third comparison combined overt and imagined speech and compared it against mimed speech. The performance of these comparisons was evaluated using balanced accuracy. These evaluations enabled us to draw conclusions about the neural similarities between the different speech production types.
Analysis revealed that decoding accuracies for all three comparisons significantly surpassed chance levels (Fig. 3a). Notably, the highest accuracy was observed when overt and mimed speech were grouped against imagined speech (balanced accuracy: 72.1 ± 8.89%). This was followed by the performance when distinguishing overt speech as a separate class from mimed and imagined speech (balanced accuracy: 67.5 ± 6.20%). The lowest accuracy was observed in the comparison where overt and imagined speech were combined, with mimed speech treated as a separate category (balanced accuracy: 59.0 ± 5.03%). This result highlights a closer neural similarity between overt and mimed speech compared to imagined speech, and subsequently, a greater neural similarity between mimed and imagined speech compared to overt speech. These classification trends were consistent across other machine learning methods, including LDA, Logistic Regression, and KNN (Supplementary Fig. 7).
Neural similarity between three speech production types. a. Balanced accuracy of decoding performance using all implanted electrodes for each subject in three binary comparisons: (1) combined overt and mimed speech vs. imagined speech (purple), (2) combined mimed and imagined speech vs. overt speech (magenta), and (3) combined overt and imagined speech vs. mimed speech (green). Chance-level distributions, depicted in gray, are derived from randomly permuting the class labels. b. Confusion matrix summarizing the classification performance of the Random Forest model. The diagonal elements represent the proportion of correctly classified instances for each speech type, while off-diagonal elements represent misclassification types. c. Temporal dynamics of decoding performance, measured by balanced accuracy, over the speech preparation period. The purple line represents the first comparison (combined overt and mimed speech vs. imagined speech), the magenta line the second comparison (combined mimed and imagined speech vs. overt speech), and the green line the third comparison (combined overt and imagined speech vs. mimed speech). Each line traces the change in accuracy from the onset of word presentation (0%) to the SOT (100%). Shaded areas indicate the standard error across subjects.
Figure 3b presents the confusion matrix summarizing the classification performance. The diagonal elements show that imagined speech was the most accurately classified (65%), followed by overt speech (59%) and mimed speech (41%). Importantly, the off-diagonal elements reflect the relative proximity between speech production types. Overt speech was most frequently misclassified as mimed speech, and vice versa, indicating a closer neural similarity between these two types. Similarly, mimed speech showed moderate misclassification as imagined speech, suggesting a greater similarity between these two types compared to overt and imagined speech. These patterns reinforce the results observed in Fig. 3a, where overt and mimed speech exhibit the greatest similarity, followed by mimed and imagined speech.
Figure 3c illustrates the temporal dynamics of balanced accuracy for the three comparisons during the speech preparation period. The accuracy for distinguishing combined overt and mimed speech from imagined speech steadily increased over time, peaking just before the SOT. The second comparison, involving combined mimed and imagined speech versus overt speech, also showed an increase in accuracy over time, with the performance showing a dramatic increase after 75% of the period from word presentation to SOT (about 712.5 ms after word presentation). In contrast, the accuracy for differentiating combined overt and imagined speech from mimed speech remained relatively stable and closer to the chance level across the entire period.
Regional differences in temporal dynamics during overt, mimed, and imagined speech
Previous analyses delineated neural similarities across speech production types using a classifier trained on all implanted electrodes of each subject. To further elucidate the distinct neural patterns associated with each cortical region, we employed individual classifiers, each trained on a single electrode. These classifiers were tasked with distinguishing between the three speech production types, and their performance was quantified using balanced accuracy, akin to the approach detailed in Fig. 3 for the three distinct comparisons.
When averaging accuracies over channels, the first comparison (combined overt and mimed speech versus imagined speech) and the second comparison (combined mimed and imagined speech versus overt speech) displayed higher accuracies than the third comparison (combined overt and imagined speech versus mimed speech) (Fig. 4a). Regional differences in functional roles were analyzed by identifying electrodes achieving above 70% balanced accuracy as distinctive. Our findings demonstrate that the activation patterns differ among regions, reflecting diverse responses to different types of speech.
Regional differences in temporal dynamics for different speech production types. a. Balanced accuracy of decoding performance for individual classifiers, each trained on a single electrode, across three comparisons: (1) combined overt and mimed speech vs. imagined speech (magenta), (2) combined mimed and imagined speech vs. overt speech (red), and (3) combined overt and imagined speech vs. mimed speech (green). b. Feature importance over time for electrodes with a balanced accuracy exceeding 70% for each comparison of neural similarities between speech production types. The magenta line represents the combined overt and mimed speech vs. imagined speech comparison (Overt + Mimed vs. Imagined). The red line represents the combined mimed and imagined speech vs. overt speech comparison (Overt vs. Mimed + Imagined). The x-axis shows the period as a percentage of the speech preparation time, from word presentation (0%) to speech onset (100%). The y-axis shows the feature importance, indicating the contribution of each time window to the classifier’s decision. Shaded areas represent the standard error for the feature importance values. c-d. Electrodes with a balanced accuracy exceeding 70% on a brain template. (c) Electrodes where the combined overt and mimed speech vs. imagined speech comparison shows a balanced accuracy exceeding 70% (magenta). These electrodes are primarily located within the bilateral precentral and postcentral gyri, including the dorsal and ventral laryngeal motor cortices (dLMC: 4 electrodes, vLMC: 2 electrodes) and the primary somatosensory cortex (2 electrodes). (d) Electrodes where the combined mimed and imagined speech vs. overt speech comparison shows a balanced accuracy exceeding 70% (red). These electrodes are predominantly located in the bilateral superior temporal gyrus (STG: 12 electrodes). Regions associated with speech motor function exhibit high decoding accuracy when grouping overt and mimed speech (c), while those associated with auditory perception demonstrate high decoding accuracy when grouping mimed and imagined speech (d).
In the first comparison (combined overt and mimed speech versus imagined speech), distinctive electrodes were predominantly located within the bilateral precentral and postcentral gyri (Fig. 4c). The precentral gyrus, with its distinct somatotopic representation, includes electrodes situated near the dorsal and ventral laryngeal motor cortices (dLMC, 4 out of 10, and vLMC, 2 out of 10 electrodes, respectively) and the primary somatosensory cortex linked to the articulatory system (2 out of 10 electrodes) (Fig. 4c). These regions are recognized for their involvement in the motor control of speech articulation, a key feature distinguishing imagined speech from the other two types.
Conversely, in the second comparison (combined mimed and imagined speech versus overt speech), the bilateral STG (12 out of 14 electrodes) stands out significantly (Fig. 4d). This region is associated with auditory perception, which distinguishes overt speech from the other two types. Contemporary speech processing theories, such as the state feedback control (SFC) model, hypothesize an internal feedback loop that anticipates auditory responses22. The STG’s activation prior to auditory perception can thus be attributed to the efference copy.
Furthermore, we examined key time windows during the speech preparation period in which the different speech production types could be distinguished, based on the feature importance of the RF model (Fig. 4b). After 75% of the period from word presentation to SOT (about 712.5 ms after word presentation), both the first and second comparisons show a marked increase in feature importance. This suggests that HGA in this time window is particularly critical for distinguishing between the different speech production types.
In contrast, the third comparison (combined overt and imagined speech versus mimed speech) yielded no distinctive electrodes. This suggests that mimed speech is less distinct from the other speech production types.
Methods
Subjects
ECoG recordings were collected from thirteen patients (5 females; age = 35.3 ± 10.8), all native speakers of Korean and diagnosed with drug-resistant epilepsy, as part of routine clinical procedures at Seoul National University Hospital (SNUH). We excluded data from a subject with occipital lobe epilepsy because the electrodes were placed only on the occipital lobe, the function of which is outside the scope of our study. The placement of electrodes was guided entirely by therapeutic needs. Prior to participating in the study, all subjects provided written informed consent. The study protocol was approved by the Institutional Review Board of Seoul National University Hospital (Approval No. H-2011-087-1173). All methods were performed in accordance with the relevant guidelines and regulations, including the Declaration of Helsinki.
Experimental design
Subjects were directed to read words displayed on the screen under three distinct conditions: overt speech, mimed speech, and imagined speech. The experiment was conducted in three separate sessions, with each session corresponding to one of the speech production types. The order of the sessions was fixed for all subjects, starting with the overt speech session, followed by the mimed speech session, and concluding with the imagined speech session. This consistent sequence was chosen to maintain uniformity across subjects.
We set this order to encourage active speech imagination during the imagined speech session by placing the overt and mimed speech sessions beforehand, both of which involve articulatory activities. This structure helped participants actively imagine producing speech rather than passively perceiving it.
Within each session, a list of 108 Korean words (Supplementary Table 1) was presented. These words were selected based on the word list used in a previous speech decoding study (Meng et al., 2023)3. In that study, the vocabulary was chosen from the literature on children’s pain vocabulary23 and communication during mechanical ventilation in intensive care units24. The words were displayed one at a time in a pseudorandom order to minimize predictability. Each word appeared for 3 seconds, followed by a 1-second fixation slide for baseline correction.
Since each session focused on one speech production type, subjects produced each word once per session. Therefore, each session consisted of 108 word production trials.
In the mimed speech condition, subjects were instructed to move their lips as if speaking the word but without producing any sound. In the imagined speech condition, subjects were asked to neither vocalize nor move their lips but instead to imagine speaking the word as if they were producing it. These instructions were verbally communicated to the subjects before each session to ensure they understood the required speech production type, thereby establishing a clear distinction in motor and cognitive activity across the three speech production types. There were no exclusions based on task non-compliance, as all subjects were native Korean speakers without speech impairments, and the words used were basic and familiar.
The experiment’s framework was implemented in Python with the Pygame library; audio was recorded simultaneously and synchronized to the event triggers in the ECoG recordings.
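The structure of a single session can be illustrated with a minimal Pygame sketch. This is a simplified reconstruction under assumed parameters (placeholder words, screen size, and font); the actual stimulus code, event-trigger interface, and audio capture used in the experiment are not shown here.

```python
# Minimal sketch of one stimulus session: each word shown for 3 s, fixation cross for 1 s.
# Placeholder words and display settings are hypothetical; trigger and audio code are omitted.
import random
import pygame

WORD_MS, FIXATION_MS = 3000, 1000  # 3 s word presentation, 1 s fixation

def show_text(screen, font, text):
    screen.fill((0, 0, 0))
    surf = font.render(text, True, (255, 255, 255))
    screen.blit(surf, surf.get_rect(center=screen.get_rect().center))
    pygame.display.flip()

def run_session(words):
    pygame.init()
    screen = pygame.display.set_mode((1024, 768))
    font = pygame.font.SysFont(None, 96)
    for word in random.sample(words, len(words)):   # pseudorandom order, one trial per word
        show_text(screen, font, word)               # an ECoG event trigger would be sent here
        pygame.time.wait(WORD_MS)
        show_text(screen, font, "+")                # fixation period used for baseline correction
        pygame.time.wait(FIXATION_MS)
    pygame.quit()

if __name__ == "__main__":
    run_session(["water", "pain"])  # stand-ins for the 108 Korean words in Supplementary Table 1
```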
Audio data acquisition and determining speech production onset time
The acquired audio signals contain noise and artifacts, complicating the extraction of reliable, fine acoustic features. To mitigate such undesirable elements, the signals were processed using Online Audio Noise Reduction (https://audiodenoise.com/) with the AI speech enhancement option.
We subsequently defined the speech onset time (SOT) for overt speech as the moment when audible sounds first began. To assess the audibility, we calculated the Mel spectrum using the Librosa library, with a shift of 10 ms. This calculation utilized 80 Mel bands, spanning a frequency range from 55 Hz to 7500 Hz. For each frame, we determined the maximum envelope by identifying the highest value in the Mel spectrum. The SOT for each word was then marked at the first occurrence where the z-score exceeded a value of 2 (Supplementary Fig. 1a). Despite differences in the number of syllables, the speech onset time remains consistent across conditions (Supplementary Fig. 1b).
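The SOT detection described above can be sketched as follows. This is an illustrative reconstruction assuming one denoised mono audio file per trial (the file name is hypothetical); parameters follow the text (80 Mel bands, 55–7500 Hz, 10 ms shift, z-score threshold of 2).

```python
# Sketch of overt speech onset time (SOT) detection from the Mel spectrum envelope.
import librosa
import numpy as np
from scipy.stats import zscore

def detect_sot(audio_path, z_threshold=2.0):
    """Return the SOT in seconds: first frame whose maximum Mel envelope exceeds z = 2."""
    y, sr = librosa.load(audio_path, sr=None)
    hop = int(0.010 * sr)  # 10 ms frame shift
    mel = librosa.feature.melspectrogram(
        y=y, sr=sr, hop_length=hop, n_mels=80, fmin=55, fmax=7500
    )
    envelope = mel.max(axis=0)                 # maximum Mel value per frame
    z = zscore(envelope)                       # standardize across the trial
    onset_frame = np.argmax(z > z_threshold)   # first frame above threshold (0 if none)
    return onset_frame * hop / sr

sot = detect_sot("trial_001_denoised.wav")     # hypothetical file name
print(f"Speech onset at {sot:.3f} s")
```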
To ensure temporal consistency and remove outliers, we excluded trials in which the overt SOT exceeded three standard deviations above the mean—specifically, those longer than 1.55 s (Average, 0.87 s; Standard deviation, 0.22 s) after word presentation. To maintain consistency across conditions, the trials corresponding to the same word in the mimed and imagined speech sessions were also excluded from analysis.
For mimed and imagined speech, which do not produce audible sounds, we determined the SOTs for each word based on the SOTs of the corresponding words in overt speech.
Neural data acquisition and preprocessing
The ECoG signals were acquired with high-density subdural arrays (PMT, USA). The signals were acquired utilizing a 128-channel Neuvo system (Compumedics Neuroscan, Victoria, Australia), with a sampling rate of 2,000 Hz. We analyzed ECoG signals of each electrode, excluding any electrode presenting impedances above 30 kΩ during recording to ensure signal quality. The raw ECoG signals were bandpass filtered within a frequency range of 1 to 200 Hz and subsequently re-referenced to the common average reference. To remove electrical line noise, the re-referenced signals underwent notch filtering at 60, 120, and 180 Hz.
High gamma activity (HGA; 70–150 Hz) was defined as the amplitude of the signal envelope obtained by applying the Hilbert transform to the bandpass-filtered signal within this frequency range. Baseline correction was performed using the HGA during the fixation period, specifically from 0.5 to 0.95 s before word onset, to ensure consistency across trials. The corrected HGA was then used for further analyses.
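A sketch of this preprocessing and HGA extraction chain is given below, assuming a channels-by-samples array; the filter orders, notch quality factor, and the division-based baseline correction are assumptions not specified in the text.

```python
# Sketch of the preprocessing chain: bandpass, common average reference, notch filtering,
# and Hilbert-based high gamma amplitude (70-150 Hz). Shapes and baseline indexing assumed.
import numpy as np
from scipy.signal import butter, filtfilt, iirnotch, hilbert

FS = 2000  # sampling rate (Hz)

def bandpass(data, low, high, fs, order=4):
    b, a = butter(order, [low / (fs / 2), high / (fs / 2)], btype="band")
    return filtfilt(b, a, data, axis=-1)

def preprocess(ecog):                                  # ecog: (n_channels, n_samples)
    x = bandpass(ecog, 1, 200, FS)
    x = x - x.mean(axis=0, keepdims=True)              # common average reference
    for f0 in (60, 120, 180):                          # remove line noise harmonics
        b, a = iirnotch(f0, Q=30, fs=FS)
        x = filtfilt(b, a, x, axis=-1)
    return x

def high_gamma(x):
    hg = bandpass(x, 70, 150, FS)
    return np.abs(hilbert(hg, axis=-1))                # amplitude envelope of the 70-150 Hz band

def baseline_correct(hga, baseline_idx):
    """Correct by the fixation-period HGA (0.5-0.95 s before word onset); division is an assumption."""
    base = hga[:, baseline_idx].mean(axis=-1, keepdims=True)
    return hga / base

# usage (hypothetical data):
# hga = high_gamma(preprocess(np.load("subject01_ecog.npy")))
```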
Electrode localization
The ECoG electrodes were localized on each subject-specific brain by co-registering preoperative MR images with postoperative CT images utilizing CURRY 9.0 (Compumedics Neuroscan, Victoria, Australia). MR and CT images were obtained with a Magnetom Trio Tim 3.0 T scanner and a Somatom Sensation 64 eco (both Siemens, Erlangen, Germany), respectively. Electrode coordinates were initially referenced to the Talairach system and subsequently converted to the Montreal Neurological Institute (MNI) coordinate system. This conversion utilized the MATLAB script ‘tal2mni.m’ from the GingerALE software v. 3.0 (www.brainmap.org)25,26. Electrode positions were further refined by localizing each electrode to the nearest cortical point based on Euclidean distance. This spatial data was then integrated into surface maps delineating electrode coordinates relative to Brodmann areas in the MNI space, employing the MNE Python library version 1.6.027.
Statistical measurements between speech production type
We employed two measurements to assess the relationship between different speech production types using the SciPy library. High gamma activity (HGA) values were used as the primary measure, calculated by averaging data across time and trials for each speech production type. The analysis window was divided into 20 non-overlapping bins, each corresponding to 5% of the trial duration (47.5 ms). This bin size was chosen to match the temporal resolution of task-related high-gamma transients28. Given that the high gamma band spans 70–150 Hz, a 47.5 ms window encompasses approximately 7 cycles of oscillatory activity, ensuring statistically stable power estimation within each bin. To assess whether speech production type had a statistically significant effect on HGA values, a one-way analysis of variance (ANOVA) was performed at each of the 20 time bins. To control for multiple comparisons, false discovery rate (FDR) correction was applied using the Benjamini–Hochberg method (α = 0.0001) (Fig. 1, Supplementary Fig. 3).
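A sketch of the per-bin statistics is shown below, assuming single-trial HGA for one electrode as a trials-by-samples array per condition; statsmodels is used here for the Benjamini–Hochberg correction, which is an implementation choice rather than the paper's reported toolchain.

```python
# Sketch of per-time-bin one-way ANOVA across the three speech types with FDR correction.
import numpy as np
from scipy.stats import f_oneway
from statsmodels.stats.multitest import multipletests

def bin_hga(hga_trials, n_bins=20):
    """Average trial HGA (n_trials, n_samples) into n_bins equal bins (5% of the trial each)."""
    n_trials, n_samples = hga_trials.shape
    edges = np.linspace(0, n_samples, n_bins + 1, dtype=int)
    return np.stack([hga_trials[:, a:b].mean(axis=1)
                     for a, b in zip(edges[:-1], edges[1:])], axis=1)

def anova_per_bin(overt, mimed, imagined, alpha=1e-4):
    """Return an FDR-corrected significance mask over the 20 time bins."""
    pvals = [f_oneway(overt[:, k], mimed[:, k], imagined[:, k]).pvalue
             for k in range(overt.shape[1])]
    reject, _, _, _ = multipletests(pvals, alpha=alpha, method="fdr_bh")
    return reject

# usage with hypothetical per-condition trial data:
# sig = anova_per_bin(bin_hga(hga_overt), bin_hga(hga_mimed), bin_hga(hga_imagined))
```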
For Fig. 2a-c, electrodes were excluded if their HGA values were biologically implausible (e.g., less than 0) or demonstrated consistently low activity (e.g., all averaged values below 0.5). Outliers were then removed if they were below Q1–1.5 times the interquartile range (IQR) or above Q3 + 1.5 times the IQR, to minimize bias. Linear relationships between speech production types were assessed using Pearson’s correlation coefficient. This method was chosen for its sensitivity to linear dependencies and its interpretability in comparing the strength of neural correlations across conditions. Linear least-squares regression was employed to fit the data, providing a quantitative measure of the correspondence between average HGA values during the speech preparation period.
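A sketch of this correlation analysis is given below; the exclusion logic approximates the criteria described above (exact handling across the three conditions is an assumption), and the input arrays are hypothetical per-electrode average HGA values.

```python
# Sketch of electrode-wise filtering, IQR outlier removal, Pearson correlation, and linear fit.
import numpy as np
from scipy.stats import pearsonr, linregress

def iqr_mask(x):
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    return (x >= q1 - 1.5 * iqr) & (x <= q3 + 1.5 * iqr)

def correlate_conditions(hga_a, hga_b):
    """hga_a, hga_b: average HGA per electrode for two speech production types."""
    # exclude implausible (<= 0) or consistently low-activity (< 0.5 in both) electrodes
    keep = (hga_a > 0) & (hga_b > 0) & ~((hga_a < 0.5) & (hga_b < 0.5))
    a, b = hga_a[keep], hga_b[keep]
    inlier = iqr_mask(a) & iqr_mask(b)        # remove IQR outliers
    a, b = a[inlier], b[inlier]
    r, p = pearsonr(a, b)
    fit = linregress(a, b)                     # slope/intercept of the least-squares line
    return r, p, fit

# r, p, fit = correlate_conditions(avg_hga_overt, avg_hga_mimed)  # hypothetical arrays
```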
For Fig. 2d-g, the Hellinger distance was calculated using a combination of kernel density estimation (KDE) and permutation-based surrogate distributions to quantify the statistical distances. For each cortical region, distributions of HGA values were estimated for the overt, mimed, and imagined speech production types using Gaussian KDE, ensuring smooth and continuous probability density functions. The computation of the Hellinger distance involved generating points over a range defined by the minimum and maximum HGA values across all speech production types within each cortical region. The probability density functions (PDFs) for the compared conditions were evaluated at these points, and the Hellinger distance was calculated as:

\[H\left(f,g\right)=\sqrt{1-\int \sqrt{f\left(x\right)g\left(x\right)}\,dx}\]

where \(f\left(x\right)\) and \(g\left(x\right)\) are the PDFs of the two speech production types being compared. For efficiency, this integral was approximated numerically using 1,000 evenly spaced points.
To establish confidence intervals for the observed Hellinger distances, surrogate distributions were generated by permuting and reshuffling the combined HGA data across conditions. For each permutation, a Hellinger distance was recalculated, and this process was repeated 3,000 times to produce a distribution of surrogate distances. To enhance interpretability, all distance time series, including those from surrogate distributions, were smoothed using a Savitzky–Golay filter (window length = 10, polynomial order = 4). The observed distances were compared to these surrogate distributions, with 99.7% confidence intervals (± 3SD) derived from the surrogate data, providing robust statistical validation for the observed results.
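The distance computation and surrogate procedure can be sketched as follows, using SciPy's Gaussian KDE and a simple reshuffling permutation; the random seed, KDE bandwidth (SciPy's default), and the odd Savitzky–Golay window length noted in the comment are assumptions.

```python
# Sketch of the Hellinger distance between two HGA samples and its permutation-based band.
import numpy as np
from scipy.stats import gaussian_kde
from scipy.signal import savgol_filter

def hellinger(x, y, n_points=1000):
    """Hellinger distance between Gaussian-KDE densities of two HGA samples."""
    grid = np.linspace(min(x.min(), y.min()), max(x.max(), y.max()), n_points)
    f, g = gaussian_kde(x)(grid), gaussian_kde(y)(grid)
    dx = grid[1] - grid[0]
    bc = np.sum(np.sqrt(f * g)) * dx            # Bhattacharyya coefficient (numerical integral)
    return float(np.sqrt(max(1.0 - bc, 0.0)))

def surrogate_band(x, y, n_perm=3000, seed=0):
    """Permutation surrogates: reshuffle the pooled HGA values and recompute the distance."""
    rng = np.random.default_rng(seed)
    pooled = np.concatenate([x, y])
    dists = np.empty(n_perm)
    for i in range(n_perm):
        rng.shuffle(pooled)
        dists[i] = hellinger(pooled[:len(x)], pooled[len(x):])
    return dists.mean(), 3.0 * dists.std()       # center and +/- 3 SD (99.7%) band

# Distance time series are then smoothed, e.g. (the paper reports window length 10;
# an odd length of 11 is used here because some SciPy versions require it):
# smoothed = savgol_filter(distance_series, window_length=11, polyorder=4)
```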
Classification of speech production type
To quantitatively assess the degree of separation or overlap in neural activity patterns across the three speech production types—overt, mimed, and imagined speech—we employed a classification approach. Specifically, we implemented a Random Forest (RF) classifier to categorize the HGA into the three speech types. The classifier was trained and evaluated separately for each subject, ensuring subject-specific classification tasks. The RF classifier was selected as the primary method due to its ability to calculate feature importance, which enabled the identification of key neural features that differentiated between the speech production types (Fig. 4b).
To validate the robustness of the results, we additionally tested other well-established machine learning algorithms, including Linear Discriminant Analysis (LDA), Logistic Regression, and k-Nearest Neighbors (KNN). All these methods demonstrated similar classification trends, providing further confidence in the consistency of the findings (Fig. 3a and Supplementary Fig. 7).
Three distinct classification models were developed for each subject to evaluate the neural distinctions among the speech production types. The first model was trained using the average HGA calculated over the entire speech preparation period across all implanted electrodes (Fig. 3a). Classifier significance was assessed using permutation testing by randomly shuffling class labels and comparing the observed accuracy to the resulting null distribution. This approach provided a broad assessment of neural activity across the preparation phase. The second model utilized the average HGA from specific time windows within the speech preparation period, considering data from all electrodes (Fig. 3c). The final model was designed to investigate spatial specificity by training the classifier with the average HGA from a single electrode. This model incorporated HGA data across all time windows during the speech preparation period as multidimensional inputs, enabling the identification of cortical regions and time periods that were most informative for differentiating speech types (Fig. 4).
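A sketch of the subject-level classification and label-permutation test is shown below; the feature layout (trials by electrode-averaged HGA), cross-validation scheme, forest size, and number of permutations are assumptions.

```python
# Sketch of the three-class Random Forest model with a label permutation test and
# feature importances (used for the Fig. 4b-style analyses).
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

def classify_speech_types(X, y, n_perm=1000, seed=0):
    """X: (n_trials, n_features) average HGA; y: labels in {overt, mimed, imagined}."""
    rng = np.random.default_rng(seed)
    clf = RandomForestClassifier(n_estimators=500, random_state=0)
    observed = cross_val_score(clf, X, y, cv=5, scoring="balanced_accuracy").mean()

    # null distribution from randomly permuted labels
    null = np.array([
        cross_val_score(clf, X, rng.permutation(y), cv=5,
                        scoring="balanced_accuracy").mean()
        for _ in range(n_perm)
    ])
    p_value = (np.sum(null >= observed) + 1) / (n_perm + 1)

    importances = clf.fit(X, y).feature_importances_   # contribution of each feature/time window
    return observed, p_value, importances
```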
Evaluation metrics
To assess the neural similarities between the different pairs of speech production types, we defined three comparisons based on binary classification as evaluation metrics. These comparisons were designed to test specific hypotheses about the relationships between the speech production types by recoding the classifier’s output into binary classes (Supplementary Fig. 4).
The first comparison assumed that overt speech is more similar to mimed speech than to imagined speech. For this comparison, overt and mimed speech were considered as one class and imagined speech as another class. The classifier’s output was recoded accordingly: if the predicted result was either overt or mimed, it was labeled as one class; if the predicted result was imagined, it was labeled as the other class.
The second comparison assumed that mimed speech is more similar to imagined speech than to overt speech. For this comparison, mimed and imagined speech were considered as one class and overt speech as another class. The classifier’s output was recoded as follows: if the predicted result was either mimed or imagined, it was labeled as one class; if the predicted result was overt, it was labeled as the other class.
The third comparison assumed that overt speech is more similar to imagined speech than to mimed speech. For this comparison, overt and imagined speech were considered as one class and mimed speech as another class. The classifier’s output was recoded such that if the predicted result was either overt or imagined, it was labeled as one class; if the predicted result was mimed, it was labeled as the other class. The performances of these comparisons were evaluated using balanced accuracy, which accounts for potential class imbalances.
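The recoding of the three-class output into these three binary comparisons can be sketched as follows; the label strings and dictionary layout are illustrative.

```python
# Sketch of recoding three-class predictions into the three binary comparisons
# and scoring each with balanced accuracy.
import numpy as np
from sklearn.metrics import balanced_accuracy_score

COMPARISONS = {
    "overt+mimed vs imagined": {"overt": 0, "mimed": 0, "imagined": 1},
    "mimed+imagined vs overt": {"mimed": 0, "imagined": 0, "overt": 1},
    "overt+imagined vs mimed": {"overt": 0, "imagined": 0, "mimed": 1},
}

def pairwise_balanced_accuracies(y_true, y_pred):
    """Recode three-class true/predicted labels into each binary comparison and score it."""
    scores = {}
    for name, mapping in COMPARISONS.items():
        t = np.array([mapping[label] for label in y_true])
        p = np.array([mapping[label] for label in y_pred])
        scores[name] = balanced_accuracy_score(t, p)
    return scores

# scores = pairwise_balanced_accuracies(y_test, clf.predict(X_test))  # hypothetical data
```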
ECoG signal preprocessing was performed using custom-written scripts in MATLAB R2022b (MathWorks; www.mathworks.com) and Python version 3.10.9.
Discussion
In the present study, we explored the neural patterns of overt, mimed, and imagined speech during the speech preparation period to elucidate the shared and distinct neural mechanisms among these speech production tasks using HGA measured from ECoG. Our findings demonstrate unique patterns of neural activities and significant differences in the temporal dynamics and spatial distribution of brain regions involved in each speech task.
Shared and distinct neural activities
Our results highlight significant overlaps in neural activities among the three types of speech, particularly in the early phases of processing, as observed in the bilateral lateral occipital gyrus, left fusiform gyrus, and bilateral ITG (Supplementary Fig. 3, and Fig. 1d). These shared activations suggest a common visual and linguistic processing stage29,30,31,32, irrespective of the mode of speech production. This may imply that basic sensory processing and initial linguistic parsing are preserved across different forms of speech.
Additionally, the activation of the ITG, known for semantic processing, and the pars triangularis, involved in speech motor planning, occurs simultaneously during overt and mimed speech (Fig. 1d and f). This suggests that during a word reading task, semantic processing and speech motor planning are performed in parallel after visual processing.
However, distinct patterns emerged in later stages of processing, particularly with stronger and more sustained activation before the onset of overt speech in regions involved in motor planning and execution, such as the bilateral precentral and postcentral gyri (Figs. 1h and i and 2f and g). Notably, there is also activation in these regions during mimed speech, although the activation is lower than in overt speech. This indicates that while early perceptual functions might be shared, the execution processes remain distinct, supporting planning for articulatory movement in overt and mimed speech that is not engaged in imagined speech33,34,35.
Furthermore, activation in the left hemisphere was observed earlier than in the right hemisphere (Fig. 1e-i). The left hemisphere is known to be involved in both the ventral stream, which is more lexical, and the dorsal stream, which is more phonetic/phonological36. This earlier activation may reflect the dominant role of the left hemisphere in language processing, where lexical and phonetic/phonological processing are prioritized. In contrast, the right hemisphere, which is more involved in prosody36,37, showed later activation. This suggests that the left hemisphere’s more efficient and prioritized processing pathways for speech-related tasks contribute to the faster activation observed in our results. These findings enhance our understanding of the shared and distinct neural mechanisms underlying different speech production types.
One limitation of our experimental paradigm is the use of a fixed task order across all subjects, starting with the overt speech session, followed by the mimed speech session, and concluding with the imagined speech session. While this sequence was chosen to promote engagement in the imagined condition, it may introduce potential task-order effects, such as fatigue or habituation, that could influence neural responses. Future studies should consider counterbalancing task order to better isolate production type effects from sequential influences.
Validation of speech onset time (SOT) for mimed and imagined speech
In this study, the SOT for overt speech was defined as the moment when audible sounds first began, providing a verifiable reference. For mimed and imagined speech, SOTs were estimated based on the corresponding overt speech SOT for the same word, given the absence of audible sounds. This choice was made to minimize the burden on subjects, as collecting additional articulatory signals (e.g., EMG) would have required attaching more equipment, potentially causing discomfort to epilepsy patients.
Nevertheless, we validated this approach by analyzing neural activity patterns in key regions such as the fusiform gyrus and IFG. Specifically, HGA in these regions, notably within the lateral occipital gyrus, fusiform gyrus, and inferior temporal gyrus (ITG), exhibited peak timings that aligned closely across all three speech production types, including overt, mimed, and imagined speech (Supplementary Fig. 3 and Fig. 1d). This finding confirms temporal consistency in the initial stages of speech processing. However, we acknowledge that this validation, being rooted in early-stage processing, serves as an indirect proxy for the timing of subsequent motor planning. Therefore, these findings provide a reasonable, albeit indirect, basis for using the overt SOT as a reference across conditions, allowing for the analysis of preparatory neural activity.
Furthermore, to provide additional support for verifying participants’ compliance during imagined speech, we compared HGA peak timing between early and late overt SOT trials (Supplementary Fig. 8). Peak HGA occurred significantly later in late-onset trials across key regions, including the left postcentral gyrus, STG, and pars triangularis (p < 0.001, Bonferroni-corrected; one-sided Mann–Whitney U test). This timing shift suggests that neural activity during imagined speech preserved a temporal structure aligned with overt production, providing strong indirect evidence of participants’ task compliance.
Notably, in motor-related regions such as the precentral gyrus, postcentral gyrus, and pars triangularis, mimed speech did not show earlier activation compared to overt speech. This further supports that the observed neural alignment reflects preparatory processes, rather than prematurely executed speech.
Unique aspects of overt speech
A notable finding in our study was the significant activation pattern observed in the bilateral STG during overt speech, even before speech onset (Figs. 1e and 2d). This pattern was not pronounced in mimed or imagined speech. Given that activation in the bilateral precentral and postcentral gyri is highest before overt speech onset, the activation in the bilateral STG prior to speech onset in overt speech may reflect the involvement of the internal model of the SFC model of speech production, which is hypothesized to play a role in predicting the sensory outcomes of speech production38,39,40. The efference copy may be crucial for adjusting motor commands during speech, suggesting that auditory feedback mechanisms are actively engaged even before actual sound production begins41,42. This finding highlights the unique aspects of neural processing in overt speech that are essential for real-time speech modulation and error correction.
Functional roles of cortical regions by distances and decoding performance
The Hellinger distances between different types of speech production suggest specific functional roles for each cortical region. Regions not associated with specific speech functions, such as the bilateral lateral occipital gyrus, left fusiform gyrus, and bilateral ITG, exhibit similar HGA patterns across all speech types (Supplementary Fig. 3 and Fig. 1d). Conversely, regions involved in shared functional components show similar HGA patterns for the corresponding speech types. Notably, the bilateral precentral and postcentral gyri and the bilateral pars triangularis show shared HGA patterns for overt and mimed speech, although the activation intensity differed (Fig. 2f and g), while the bilateral STG displays similar HGA patterns for mimed and imagined speech (Fig. 2d and e). Decoder performance further corroborates these findings. A closer neural similarity between overt and mimed speech is observed in the bilateral precentral and postcentral gyri, with distinctive electrodes concentrated in the LMC (Fig. 4c), highlighting the prominence of articulatory movement features and the role of the posterior dorsal stream in both overt and mimed speech3,17,43. Additionally, a neural similarity between mimed and imagined speech is observed in the bilateral posterior STG (Fig. 4d), implying shared neural processes and a common functional role of the anterior ventral stream in auditory object identification44. These consistent results from both statistical and decoding analyses highlight the shared and distinct neural mechanisms underlying different speech production types.
Implications for BCI development
These results suggest that overt speech may not provide an appropriate verifiable reference for training decoders for imagined speech due to its distinct neural activation patterns. Conversely, the neural signals from mimed speech could potentially provide a more suitable verifiable reference for imagined speech. By leveraging the neural similarities found in the STG during mimed and imagined speech, decoders could be better tailored to imagined speech, enhancing the functionality and usability of BCIs.
This study extends prior work by demonstrating that mimed speech, rather than overt speech, provides a more neurally congruent reference for imagined speech decoding—an aspect largely overlooked in earlier BCI studies.
Notably, single-neuron recording studies have demonstrated that the human STG encodes diverse speech features, such as consonants, vowels, relative vocal pitch, and onsets45. This underscores the importance of the STG in developing speech BCIs. Furthermore, the observed similarities in neural activity between imagined speech and mimed speech in the STG suggest promising strategies for BCI training and development using mimed speech. These findings point to new approaches that could potentially improve the accuracy and efficiency of BCI systems for imagined speech decoding.
Future directions
In this study, we intentionally limited our analysis to the period before speech onset in order to exclude the effects of auditory feedback resulting from the audible sounds produced during overt speech. This approach helps isolate the neural dynamics associated purely with the preparatory processes of speech production. In contrast, previous studies have analyzed signals after speech onset, extracting features for speech decoding in BCIs.
Therefore, future research should consider analyzing neural activity at various time points, including during and after speech onset. Such an extensive temporal analysis would provide a more comprehensive understanding of the similarities and differences across speech production types.
In addition to temporal analysis, exploring inter-regional interactions during speech preparation could provide deeper insights into the dynamic coordination between auditory and motor areas. Incorporating connectivity or causal inference analyses would be particularly valuable. These analytical approaches could help to reveal the underlying mechanisms of inter-regional connectivity and causal relationships between key brain regions involved in speech production. Although we did not include these analyses in the current study, we acknowledge their potential importance and plan to address them in future research.
Furthermore, while participants reported that they actively imagined producing the words, the current study lacks a quantitative measure to assess the vividness or fidelity of imagined speech. To promote vivid speech imagery, the task was intentionally structured to follow overt and mimed speech sessions. However, self-reports alone may not sufficiently capture inter-individual variability in imagination strength. Future studies should therefore incorporate more objective and quantitative measures (e.g., rating scales). This would help validate participant engagement and enhance the interpretability of decoding results from imagined speech. Moreover, since imagined speech does not produce observable behavioral output, the exact timing of its internal execution cannot be directly verified. This remains a limitation when aligning neural activity across speech conditions using overt speech onset time as a reference.
Additionally, identifying brain signals with common characteristics across overt, mimed, and imagined speech could provide valuable insights into the neural substrates of core language processing. These shared neural patterns might represent fundamental aspects of language that persist regardless of the mode of speech production. Further investigation into whether these common signals can be decoded to reveal core linguistic elements would be particularly intriguing. Such research could potentially uncover the neural basis of language’s essential components and lead to the development of BCIs capable of decoding these fundamental linguistic features across different speech production modalities.
These findings could contribute to the development of more versatile and universally applicable BCIs that are capable of effective operation regardless of the user’s ability to produce audible speech. By focusing on the core neural processes underlying language, we may be able to create more robust and adaptable speech decoding systems that benefit a wider range of users, including those with various speech production capabilities.
Data availability
The data that support the findings of this study are available from the corresponding authors upon request.
References
Hecht, M. et al. Subjective experience and coping in ALS. Amyotroph. Lateral Scler. Other Motor Neuron Disord. 3, 225–231 (2002).
Metzger, S. L. et al. A high-performance neuroprosthesis for speech decoding and avatar control. Nature 620, 1037–1046 (2023).
Meng, K. et al. Continuous synthesis of artificial speech sounds from human cortical surface recordings during silent speech production. J. Neural Eng. 20, 046019 (2023).
Berezutskaya, J. et al. Direct speech reconstruction from sensorimotor brain activity with optimized deep learning models. J. Neural Eng. 20, 056010 (2023).
Stavisky, S. D. et al. In 2018 40th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC) 93–97 (IEEE, 2018).
Herff, C. et al. Generating natural, intelligible speech from brain activity in motor, premotor, and inferior frontal cortices. Front. Neurosci. 13, 1267 (2019).
Angrick, M. et al. Speech synthesis from ECoG using densely connected 3D convolutional neural networks. J. Neural Eng. 16, 036019 (2019).
Herff, C. et al. Brain-to-text: decoding spoken phrases from phone representations in the brain. Front. Neurosci. 9, 217 (2015).
Moses, D. A., Leonard, M. K., Makin, J. G. & Chang, E. F. Real-time decoding of question-and-answer speech dialogue using human cortical activity. Nat. Commun. 10, 3096 (2019).
Angrick, M. et al. Real-time synthesis of imagined speech processes from minimally invasive recordings of neural activity. Commun. Biol. 4, 1055 (2021).
Nguyen, C. H., Karavas, G. K. & Artemiadis, P. Inferring imagined speech using EEG signals: a new approach using Riemannian manifold features. J. Neural Eng. 15, 016002 (2017).
Cooney, C., Korik, A., Folli, R. & Coyle, D. Evaluation of hyperparameter optimization in machine and deep learning methods for decoding imagined speech EEG. Sensors 20, 4629 (2020).
Rezazadeh-Sereshkeh, A., Yousefi, R., Wong, A. T., Rudzicz, F. & Chau, T. Development of a ternary hybrid fNIRS-EEG brain–computer interface based on imagined speech. Brain-Comput. Interfaces 6, 128–140 (2019).
Dash, D., Ferrari, P. & Wang, J. Decoding imagined and spoken phrases from non-invasive neural (MEG) signals. Front. Neurosci. 14, 290 (2020).
Pei, X., Barbour, D. L., Leuthardt, E. C. & Schalk, G. Decoding vowels and consonants in spoken and imagined words using electrocorticographic signals in humans. J. Neural Eng. 8, 046028 (2011).
16. Proix, T. et al. Imagined speech can be decoded from low- and cross-frequency intracranial EEG features. Nat. Commun. 13, 48 (2022).
17. Anumanchipalli, G. K., Chartier, J. & Chang, E. F. Speech synthesis from neural decoding of spoken sentences. Nature 568, 493–498 (2019).
18. D’Zmura, M., Deng, S., Lappas, T., Thorpe, S. & Srinivasan, R. In Human-Computer Interaction. New Trends: 13th International Conference, HCI International 2009, San Diego, CA, USA, July 19–24, 2009, Proceedings, Part I 13 40–48 (Springer, 2009).
19. Ghane, P. & Hossain, G. Learning patterns in imaginary vowels for an intelligent brain computer interface (BCI) design. arXiv preprint arXiv:2010.12066 (2020).
20. Wandelt, S. K. et al. Representation of internal speech by single neurons in human supramarginal gyrus. Nat. Hum. Behav. 8, 1136–1149 (2024).
21. Soroush, P. Z. et al. The nested hierarchy of overt, mouthed, and imagined speech activity evident in intracranial recordings. Neuroimage 269, 119913 (2023).
22. Houde, J. F. & Nagarajan, S. S. Speech production as state feedback control. Front. Hum. Neurosci. 5, 82 (2011).
23. Johnson, E., Bornman, J. & Tönsing, K. M. An exploration of pain-related vocabulary: implications for AAC use with children. Augment. Altern. Commun. 32, 249–260 (2016).
24. Duffy, E. I. et al. A pilot study assessing the spiritual, emotional, physical/environmental, and physiological needs of mechanically ventilated surgical intensive care unit patients via eye tracking devices, head nodding, and communication boards. Trauma Surg. Acute Care Open 3, e000180 (2018).
25. Lancaster, J. L. et al. Bias between MNI and Talairach coordinates analyzed using the ICBM-152 brain template. Hum. Brain Mapp. 28, 1194–1205 (2007).
26. Laird, A. R. et al. Comparison of the disparity between Talairach and MNI coordinates in functional neuroimaging data: validation of the Lancaster transform. Neuroimage 51, 677–683 (2010).
27. Gramfort, A. et al. MEG and EEG data analysis with MNE-Python. Front. Neurosci. 7, 70133 (2013).
28. Gruenwald, J. et al. Characterization of high-gamma activity in electrocorticographic signals. Front. Neurosci. 17, 1206120 (2023).
29. Zhang, B., He, S. & Weng, X. Localization and functional characterization of an occipital visual word form sensitive area. Sci. Rep. 8, 6723 (2018).
30. Lochy, A. et al. Selective visual representation of letters and words in the left ventral occipito-temporal cortex with intracerebral recordings. Proc. Natl. Acad. Sci. 115, E7595–E7604 (2018).
31. Palejwala, A. H. et al. Anatomy and white matter connections of the fusiform gyrus. Sci. Rep. 10, 13489 (2020).
32. Debska, A., Wójcik, M., Chyl, K., Dziegiel-Fivet, G. & Jednoróg, K. Beyond the Visual Word Form Area–a cognitive characterization of the left ventral occipitotemporal cortex. Front. Hum. Neurosci. 17, 52 (2023).
33. Mock, J. R., Foundas, A. L. & Golob, E. J. Modulation of sensory and motor cortex activity during speech preparation. Eur. J. Neurosci. 33, 1001–1011 (2011).
34. Salmelin, R., Schnitzler, A., Schmitz, F. & Freund, H.-J. Single word reading in developmental stutterers and fluent speakers. Brain 123, 1184–1202 (2000).
35. Liljeström, M., Kujala, J., Stevenson, C. & Salmelin, R. Dynamic reconfiguration of the language network preceding onset of speech in picture naming. Hum. Brain Mapp. 36, 1202–1216 (2015).
36. Hertrich, I., Dietrich, S. & Ackermann, H. The margins of the language network in the brain. Front. Commun. 5, 519955 (2020).
37. Friederici, A. D. The brain basis of language processing: from structure to function. Physiol. Rev. 91, 1357–1392 (2011).
38. Hickok, G., Houde, J. & Rong, F. Sensorimotor integration in speech processing: computational basis and neural organization. Neuron 69, 407–422 (2011).
39. Hickok, G. & Poeppel, D. Dorsal and ventral streams: a framework for understanding aspects of the functional anatomy of language. Cognition 92, 67–99 (2004).
40. Walker, G. M. & Hickok, G. Empirical and computational findings converge in support of the Hierarchical State Feedback Control theory. Lang. Cogn. Neurosci. 31, 470–470 (2016).
41. Wolpert, D. M., Ghahramani, Z. & Jordan, M. I. An internal model for sensorimotor integration. Science 269, 1880–1882 (1995).
42. Niziolek, C. A., Nagarajan, S. S. & Houde, J. F. What does motor efference copy represent? Evidence from speech production. J. Neurosci. 33, 16110–16116 (2013).
43. Rauschecker, J. P. An expanded role for the dorsal auditory pathway in sensorimotor control and integration. Hear. Res. 271, 16–25 (2011).
44. Rauschecker, J. P. & Scott, S. K. Maps and streams in the auditory cortex: nonhuman primates illuminate human speech processing. Nat. Neurosci. 12, 718–724 (2009).
45. Leonard, M. K. et al. Large-scale single-neuron speech sound encoding across the depth of human cortex. Nature 626, 593–602 (2024).
Acknowledgements
This work was supported by the ‘Alchemist Project’ (Fully implantable closed loop Brain to X for voice communication) funded by the Ministry of Trade, Industry & Energy (MOTIE, Korea) under Grant 20012355 and NTIS 1415181023.
Author information
Authors and Affiliations
Contributions
J.K. and G.L. contributed equally to this work. J.K., G.L., and C.K.C. conceptualized the study. J.K. and G.L. developed the methodology, performed analysis and visualization, and wrote and edited the original draft of the manuscript. J.K., G.L., and Y.J.P. performed data curation. C.K.C. reviewed and edited the manuscript, supervised the project, and acquired funding. E.J.L. and C.K.C. performed the neurosurgical procedures.
Corresponding author
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary Information
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Kwon, J., Lee, G., Park, Y.J. et al. Mimed speech as an intermediary state between overt and imagined speech production in an electrocorticography study. Sci Rep 15, 33393 (2025). https://doi.org/10.1038/s41598-025-18537-2