Table 2 Results

From: Decoding speech perception from non-invasive brain recordings

| Model | Brennan (EEG) | Broderick (EEG) | Gwilliams (MEG) | Schoffelen (MEG) |
|---|---|---|---|---|
| Random model | 5.3 ± 0.1 | 0.5 ± 0.1 | 0.7 ± 0.1 | 0.8 ± 0.1 |
| Base model | 6.0 ± 0.9 | 1.0 ± 0.3 | 12.4 ± 1.2 | 20.6 ± 1.8 |
| + Contrastive | 8.0 ± 4.8 | 9.7 ± 1.0 | 55.1 ± 0.7 | 55.1 ± 0.9 |
| + Deep Mel | 24.7 ± 3.2 | 15.4 ± 1.6 | 64.4 ± 0.8 | 61.2 ± 0.6 |
| + wav2vec 2.0 | **25.7 ± 2.9** | **17.7 ± 0.6** | **70.7 ± 0.1** | **67.5 ± 0.4** |

Top-10 segment-level accuracy (%) for: a random baseline that predicts a uniform distribution over the segments ('Random model'); a convolutional network trained to predict the Mel spectrogram with a regression loss ('Base model'); the same model trained with a contrastive CLIP loss ('+ Contrastive'); a deep speech representation trained from scratch with a contrastive loss ('+ Deep Mel'); and our model, trained to predict the features of wav2vec 2.0 with a contrastive loss ('+ wav2vec 2.0'). Values are mean ± s.d. across three random initializations of the model's weights. The best accuracy across methods is indicated in bold.
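For concreteness, below is a minimal PyTorch sketch of the two quantities the caption refers to: a CLIP-style contrastive loss over paired brain/speech segment latents, and the top-10 segment-level retrieval accuracy. All names, tensor shapes, and the temperature value are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn.functional as F


def clip_contrastive_loss(brain_latents: torch.Tensor,
                          speech_latents: torch.Tensor,
                          temperature: float = 0.1) -> torch.Tensor:
    """CLIP-style loss: each brain segment must identify its own speech segment.

    Both inputs are (B, D): one D-dimensional latent per segment.
    (Hypothetical sketch; the temperature value is an assumption.)
    """
    # L2-normalize so the dot product is a cosine similarity.
    b = F.normalize(brain_latents, dim=-1)
    s = F.normalize(speech_latents, dim=-1)
    logits = (b @ s.T) / temperature           # (B, B) similarity matrix
    # The i-th brain segment is paired with the i-th speech segment.
    targets = torch.arange(logits.size(0), device=logits.device)
    return F.cross_entropy(logits, targets)


def top_k_segment_accuracy(brain_latents: torch.Tensor,
                           speech_latents: torch.Tensor,
                           k: int = 10) -> float:
    """Fraction of brain segments whose true speech segment is among the
    k most similar candidates (the 'top-10 segment-level accuracy')."""
    b = F.normalize(brain_latents, dim=-1)
    s = F.normalize(speech_latents, dim=-1)
    sims = b @ s.T                              # (B, B) similarities
    topk = sims.topk(k, dim=-1).indices         # k best candidates per segment
    targets = torch.arange(sims.size(0), device=sims.device).unsqueeze(1)
    return (topk == targets).any(dim=-1).float().mean().item()


if __name__ == "__main__":
    # Hypothetical data: 1,000 segments with 256-dimensional latents.
    brain = torch.randn(1000, 256)
    speech = torch.randn(1000, 256)
    print(f"loss: {clip_contrastive_loss(brain, speech):.3f}")
    print(f"top-10 acc: {100 * top_k_segment_accuracy(brain, speech):.1f}%")
```

Note that with unrelated latents, top-10 accuracy sits near k/B (about 1% for 1,000 candidate segments), which is what the 'Random model' row quantifies given each dataset's own segment count.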