Figure 2
From: Decoding of the speech envelope from EEG using the VLAAI deep neural network

Left: Comparison of the VLAAI network with the baseline models: a subject-independent linear model, and the FCNN and CNN models presented by Thornton et al.21. All models were trained on data from all subjects in the single-speaker stories dataset. Each point in the violin plot is the reconstruction score for a subject (80 subjects in total), averaged across stimuli. The FCNN, CNN21 and VLAAI networks significantly outperform the linear decoder baseline (\(p < 0.001\)). The CNN significantly outperforms the FCNN model (\(p = 0.02\)). The VLAAI network significantly outperforms all baseline models (\(p < 0.001\)), corresponding to a relative improvement of 52% over the linear decoder. (n.s.: \(p \ge 0.05\); *: \(0.01 \le p < 0.05\); **: \(0.001 \le p < 0.01\); ***: \(p < 0.001\)).

Right: A subject-independent VLAAI model is finetuned on the data of individual subjects, resulting in one subject-specific VLAAI model per subject, following the same finetuning procedure as in the Finetuning subsection. The training set of the single-speaker stories dataset was used to train and finetune the subject-independent model; the test set remains unseen during training/finetuning. Each point in the violin plot is the reconstruction score for a subject (80 subjects in total), averaged across stimuli.
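
As a minimal illustration of how the per-subject points and pairwise comparisons in such a figure could be produced, the sketch below computes one score per subject (here assumed to be the Pearson correlation between decoded and actual envelopes, averaged across stimuli) and runs a paired Wilcoxon signed-rank test between two decoders. The metric, the choice of test, and all names and values are assumptions made for this example; this is not the authors' code.

```python
# Illustrative sketch, not the authors' implementation.
import numpy as np
from scipy.stats import pearsonr, wilcoxon

def subject_score(decoded_envelopes, actual_envelopes):
    """Average the per-stimulus reconstruction scores (Pearson r) for one subject."""
    scores = [pearsonr(d, a)[0] for d, a in zip(decoded_envelopes, actual_envelopes)]
    return float(np.mean(scores))

# Hypothetical per-subject scores for two decoders (80 subjects each);
# placeholder values standing in for real reconstruction scores.
rng = np.random.default_rng(0)
linear_scores = rng.normal(0.12, 0.05, size=80)
vlaai_scores = rng.normal(0.19, 0.06, size=80)

# Paired test across subjects: does one decoder outperform the other?
stat, p_value = wilcoxon(vlaai_scores, linear_scores, alternative="greater")
rel_improvement = (vlaai_scores.mean() - linear_scores.mean()) / linear_scores.mean()
print(f"p = {p_value:.4f}, relative improvement = {rel_improvement:.0%}")
```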