Extended Data Fig. 5: Comparing DNN encoding performance across different models.
From: Dissecting neural computations in the human auditory pathway using deep neural networks for speech

Distribution, across individual electrodes, of the normalized brain prediction score of the best-performing neural encoding model built on each single layer of the DNN (maximum over delay window lengths). a) Wav2Vec 2.0 unsupervised (SSL) model; b) Wav2Vec 2.0 with supervised fine-tuning (SSL + FT); c) HuBERT unsupervised (SSL) model; d) HuBERT purely supervised model. Each column corresponds to one area of the auditory pathway, from left to right: AN, IC, HG, STG. Magenta bars indicate CNN output layers; cyan bars indicate Transformer layers. A red star (*) marks the best model for each area; a black dot (.) marks other models that are not statistically different from the best model (p > 0.05, two-sided paired t-test). Box plots show the first and third quartiles across electrodes; the orange line indicates the median, the black line the mean, and the whiskers the 5th and 95th percentiles.
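
The per-area comparison described in this legend can be illustrated with a minimal Python sketch. It is not the authors' analysis code. It assumes that the maximum over delay window lengths is taken separately for each electrode and that the "best" layer is the one with the highest mean score across electrodes; the legend specifies neither. The function names `best_layer_and_ties` and `box_summary`, the array layout, and the toy scores are all hypothetical.

```python
import numpy as np
from scipy import stats


def best_layer_and_ties(scores, alpha=0.05):
    """Pick the best layer and the layers not significantly different from it.

    scores: array of shape (n_layers, n_delays, n_electrodes) holding
    normalized brain prediction scores for one area (hypothetical layout).
    """
    # Assumption: maximum over delay window lengths, taken per electrode.
    per_layer = scores.max(axis=1)  # shape (n_layers, n_electrodes)
    # Assumption: the best layer has the highest mean score across electrodes.
    best = int(np.argmax(per_layer.mean(axis=1)))
    ties = []
    for layer in range(per_layer.shape[0]):
        if layer == best:
            continue
        # Two-sided paired t-test across electrodes (scipy's default alternative).
        p = stats.ttest_rel(per_layer[best], per_layer[layer]).pvalue
        if p > alpha:
            ties.append(layer)  # would receive a black dot in the figure
    return best, ties


def box_summary(x):
    """Statistics drawn for one box: quartiles, median, mean, 5th/95th percentiles."""
    p5, q1, median, q3, p95 = np.percentile(x, [5, 25, 50, 75, 95])
    return {"p5": p5, "q1": q1, "median": median, "q3": q3,
            "p95": p95, "mean": float(np.mean(x))}


# Toy data only: 3 layers, 2 delay window lengths, 6 electrodes.
base = np.array([0.10, 0.20, 0.30, 0.40, 0.50, 0.60])
layer_a = base
layer_b = base + 0.30 + np.array([0.01, -0.01, 0.02, -0.02, 0.01, -0.01])
layer_c = layer_b + np.array([0.02, -0.03, 0.01, -0.02, 0.03, -0.02])
long_delay = np.stack([layer_a, layer_b, layer_c])   # (3, 6)
short_delay = long_delay - 0.05                      # uniformly worse scores
toy_scores = np.stack([short_delay, long_delay], axis=1)  # (3, 2, 6)

best, ties = best_layer_and_ties(toy_scores)
print("best layer:", best, "not different from best:", ties)
print(box_summary(toy_scores.max(axis=1)[best]))
```

In a matplotlib rendering, whiskers at the 5th and 95th percentiles correspond to passing `whis=(5, 95)` to `boxplot`, and the mean line to `showmeans=True, meanline=True`.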