Figure 3
From: Decoding of the speech envelope from EEG using the VLAAI deep neural network

(A) The small, larger and largest CNN models. The small CNN has two convolutional layers (M = 2), each with a kernel size of 20 and 256 filters. The larger CNN has five convolutional layers (M = 5), with 256 filters in the first three layers and 128 filters in the last two, all with a kernel size of 8. The largest CNN also has five convolutional layers (M = 5), all with 512 filters and a kernel size of 8. (B) The larger CNN with multiple blocks, each following the structure of the larger CNN model. The asterisk next to the linear layer indicates that this layer is not present in the last repetition of the block. For the experiment shown in (D), N = 4 and M = 5. (C) The larger CNN with multiple blocks and skip connections (step 5 in (D)). The asterisk next to the linear layer and the skip connection indicates that they are not present in the last repetition of the block. For the experiment shown in (D), N = 4 and M = 5. (D) Ablation study of the VLAAI network. Each point in the violin plot represents the reconstruction score (Pearson correlation) of one subject, averaged across stimuli. No significant difference was found between the larger and largest CNN (p = 0.68), nor between the larger CNN with multiple blocks and the larger CNN with multiple blocks and skip connections (p = 0.99). The largest increases in reconstruction score occur between the linear model and the small CNN (14% increase in median reconstruction score), between the larger CNN with \(N=1\) and \(N=4\) (10% increase in median reconstruction score), and when the output context layer is added to the penultimate model to obtain the VLAAI network (10% increase in median reconstruction score). (n.s.: p \(\ge\) 0.05, *: 0.01 \(\le\) p < 0.05, **: 0.001 \(\le\) p < 0.01, ***: p < 0.001).
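The layer configurations of panel (A) and the significance legend of panel (D) can be summarized in a short Python sketch. The `ConvStack` class, its field names and the `significance_label` helper are hypothetical illustrations, not part of the VLAAI code. The receptive-field calculation additionally assumes stride-1, undilated convolutions without pooling, which the caption does not state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConvStack:
    """Convolutional layers of one CNN variant from panel (A) (hypothetical helper)."""
    name: str
    kernel_size: int
    filters: tuple  # number of filters per layer, first to last

    @property
    def num_layers(self) -> int:
        # Corresponds to M in the caption.
        return len(self.filters)

    def receptive_field(self) -> int:
        # ASSUMPTION: stride 1, no dilation, no pooling, so each layer
        # widens the receptive field by (kernel_size - 1) samples.
        return 1 + self.num_layers * (self.kernel_size - 1)


SMALL = ConvStack("small", kernel_size=20, filters=(256, 256))
LARGER = ConvStack("larger", kernel_size=8, filters=(256, 256, 256, 128, 128))
LARGEST = ConvStack("largest", kernel_size=8, filters=(512,) * 5)


def significance_label(p: float) -> str:
    """Map a p-value to the star notation of the legend in panel (D)."""
    if p < 0.001:
        return "***"
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    return "n.s."


for stack in (SMALL, LARGER, LARGEST):
    print(stack.name, stack.num_layers, stack.receptive_field())
print(significance_label(0.68), significance_label(0.99))
```

Under these assumptions the small CNN covers 39 samples and both five-layer CNNs cover 36, and the two comparisons reported as non-significant (p = 0.68 and p = 0.99) both map to "n.s.".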