Fig. 2: Seq2Seq models for phoneme neural decoding.

From: Transfer learning via distributed brain recordings enables reliable speech decoding

A Schematic representation of the sequence-to-sequence model: neural data with variable cortical coverage were processed by a temporal convolutional layer, a recurrent neural network, and a linear readout layer to output phoneme identity probabilities for each position in the phoneme sequence. Predicted phoneme sequences (an example predicted trial is depicted) were then compared against the reference sequences using an edit-distance metric to compute the phoneme error rate (PER). B Phoneme sequences were decoded from frames of neural activity during and prior to articulation using fixed-length and variable-length Seq2Seq models, as well as a linear model for comparison. PERs were computed across these conditions to evaluate the effects of time window and model architecture. C, D Cohort-level trial and channel statistics from controlled analyses of the factors driving decoding performance, and extrapolated values for the number of trials and channels needed for high decoding accuracy (1 − PER). E Regional electrode occlusion (REO) analysis applied to the decoding architectures for broad lobe-based and region-specific analyses, using a linear mixed-effects model with random effects for patients across different time windows preceding articulation. Box plots (center/bounds/whiskers): Linear 76%/70–82%/65–90% (articulatory), 80%/75–85%/70–92% (pre-articulatory); Fixed 27%/22–32%/18–42% (articulatory), 34%/29–39%/25–47% (pre-articulatory); Variable 44%/38–50%/32–58% (articulatory), 56%/50–62%/45–68% (pre-articulatory). Outliers beyond 1.5 × IQR are shown. Statistical significance by two-sided repeated measures ANOVA (*p < 0.05, **p < 0.01, n = 25 subjects). Created in BioRender. Singh, A. (2025) https://BioRender.com/22ktqfb.
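The panel A pipeline can be sketched in a few lines of PyTorch. The sketch below is illustrative only, assuming a GRU recurrent layer, a kernel width of 5, and arbitrary hidden sizes; the paper's exact architecture, hyperparameters, and training objective are not specified here. It also includes a Levenshtein edit-distance PER, the standard distance metric for comparing predicted and reference phoneme sequences.

```python
# Minimal sketch of the Seq2Seq decoding pipeline in panel A (assumptions:
# GRU recurrence, kernel size 5, hidden size 256; not the authors' exact model).
import torch
import torch.nn as nn

class PhonemeSeq2Seq(nn.Module):
    def __init__(self, n_channels: int, n_phonemes: int, hidden: int = 256):
        super().__init__()
        # Temporal convolution over the neural time series (channels x time).
        self.conv = nn.Conv1d(n_channels, hidden, kernel_size=5, padding=2)
        # Recurrent layer aggregates temporal context across frames.
        self.rnn = nn.GRU(hidden, hidden, batch_first=True, bidirectional=True)
        # Linear readout maps hidden states to phoneme identity logits.
        self.readout = nn.Linear(2 * hidden, n_phonemes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, time) -> logits: (batch, time, n_phonemes)
        h = torch.relu(self.conv(x)).transpose(1, 2)  # (batch, time, hidden)
        h, _ = self.rnn(h)
        return self.readout(h)

def phoneme_error_rate(pred: list, ref: list) -> float:
    """PER as Levenshtein edit distance between the predicted and reference
    phoneme sequences, normalized by the reference length."""
    d = [[0] * (len(ref) + 1) for _ in range(len(pred) + 1)]
    for i in range(len(pred) + 1):
        d[i][0] = i
    for j in range(len(ref) + 1):
        d[0][j] = j
    for i in range(1, len(pred) + 1):
        for j in range(1, len(ref) + 1):
            cost = 0 if pred[i - 1] == ref[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution
    return d[len(pred)][len(ref)] / max(len(ref), 1)
```

For example, `phoneme_error_rate(["HH", "AH", "L", "OW"], ["HH", "EH", "L", "OW"])` returns 0.25 (one substitution over four reference phonemes).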
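Panel E's statistics can likewise be illustrated with a linear mixed-effects model fit in statsmodels, with a random intercept per patient. The column names, factor levels, and synthetic data below are assumptions for demonstration only, not the authors' analysis code.

```python
# Hedged illustration of a mixed-effects fit for the panel E REO analysis:
# accuracy modeled with fixed effects for region and time window and a
# random intercept per patient. All names and data here are hypothetical.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "accuracy": rng.uniform(0.3, 0.9, n),
    "region": rng.choice(["frontal", "temporal", "parietal"], n),
    "time_window": rng.choice(["-500ms", "-250ms", "0ms"], n),
    "patient": rng.choice([f"P{i}" for i in range(25)], n),
})

model = smf.mixedlm("accuracy ~ region * time_window", df, groups=df["patient"])
result = model.fit()
print(result.summary())
```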
