Table 2 Results

From: Decoding speech perception from non-invasive brain recordings

| Model | Brennan (EEG) | Broderick (EEG) | Gwilliams (MEG) | Schoffelen (MEG) |
|---|---|---|---|---|
| Random model | 5.3 ± 0.1 | 0.5 ± 0.1 | 0.7 ± 0.1 | 0.8 ± 0.1 |
| Base model | 6.0 ± 0.9 | 1.0 ± 0.3 | 12.4 ± 1.2 | 20.6 ± 1.8 |
| + Contrastive | 8.0 ± 4.8 | 9.7 ± 1.0 | 55.1 ± 0.7 | 55.1 ± 0.9 |
| + Deep Mel | 24.7 ± 3.2 | 15.4 ± 1.6 | 64.4 ± 0.8 | 61.2 ± 0.6 |
| + wav2vec 2.0 | **25.7 ± 2.9** | **17.7 ± 0.6** | **70.7 ± 0.1** | **67.5 ± 0.4** |

Top-10 segment-level accuracy (%) for: a random baseline that predicts a uniform distribution over the segments ('Random model'); a convolutional network trained to predict the Mel spectrogram with a regression loss ('Base model'); the same model trained with a contrastive CLIP loss ('+ Contrastive'); a deep speech representation trained from scratch with a contrastive loss ('+ Deep Mel'); and our model, trained to predict the features of wav2vec 2.0 with a contrastive loss ('+ wav2vec 2.0'). Values are mean ± s.d. across three random initializations of the model's weights. The best accuracy across methods is indicated in bold.
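For concreteness, below is a minimal PyTorch sketch of the two quantities the caption refers to: a CLIP-style contrastive loss over paired brain/speech segment latents, and the top-10 segment-level retrieval accuracy. All names, tensor shapes, and the temperature value are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn.functional as F


def clip_contrastive_loss(brain_latents: torch.Tensor,
                          speech_latents: torch.Tensor,
                          temperature: float = 0.1) -> torch.Tensor:
    """CLIP-style loss: each brain segment must identify its own speech segment.

    Both inputs are (B, D): one D-dimensional latent per segment.
    (Hypothetical sketch; the temperature value is an assumption.)
    """
    # L2-normalize so the dot product is a cosine similarity.
    b = F.normalize(brain_latents, dim=-1)
    s = F.normalize(speech_latents, dim=-1)
    logits = (b @ s.T) / temperature           # (B, B) similarity matrix
    # The i-th brain segment is paired with the i-th speech segment.
    targets = torch.arange(logits.size(0), device=logits.device)
    return F.cross_entropy(logits, targets)


def top_k_segment_accuracy(brain_latents: torch.Tensor,
                           speech_latents: torch.Tensor,
                           k: int = 10) -> float:
    """Fraction of brain segments whose true speech segment is among the
    k most similar candidates (the 'top-10 segment-level accuracy')."""
    b = F.normalize(brain_latents, dim=-1)
    s = F.normalize(speech_latents, dim=-1)
    sims = b @ s.T                              # (B, B) similarities
    topk = sims.topk(k, dim=-1).indices         # k best candidates per segment
    targets = torch.arange(sims.size(0), device=sims.device).unsqueeze(1)
    return (topk == targets).any(dim=-1).float().mean().item()


if __name__ == "__main__":
    # Hypothetical data: 1,000 segments with 256-dimensional latents.
    brain = torch.randn(1000, 256)
    speech = torch.randn(1000, 256)
    print(f"loss: {clip_contrastive_loss(brain, speech):.3f}")
    print(f"top-10 acc: {100 * top_k_segment_accuracy(brain, speech):.1f}%")
```

Note that with unrelated latents, top-10 accuracy sits near k/B (about 1% for 1,000 candidate segments), which is what the 'Random model' row quantifies given each dataset's own segment count.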