
Extended Data Fig. 8: Analysis of attention in the HuBERT model.

From: Dissecting neural computations in the human auditory pathway using deep neural networks for speech


a) The averaged attention distance in each Transformer encoder layer of the HuBERT model (mean ± s.d., n = 499 independent sentences). The averaged attention distance is computed as the token distance weighted by attention weights, averaged across all attention heads and all tokens. The attention weights in each layer are iteratively aggregated over the previous layers using attention rollout. b) The AS-BPS correlation across layers in the random model versus the English-pretrained model for STG in native English speakers (Pearson's correlation, *p < 0.05, permutation test, one-sided). Each panel corresponds to one type of attention pattern (see also Fig. 4). c) The shifted AS-BPS correlation (with the attention matrix shuffled in blocks) across layers versus the unshifted original AS-BPS correlation in the English-pretrained model for STG in native English speakers (Pearson's correlation, *p < 0.05, permutation test, one-sided). Each panel corresponds to one type of attention pattern.
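For illustration, the following is a minimal sketch (not the authors' code) of how the averaged attention distance in panel a can be computed with attention rollout, assuming per-layer attention tensors of shape (n_heads, n_tokens, n_tokens) extracted from a HuBERT-style Transformer encoder; the function names and shapes are assumptions.

import numpy as np

def attention_rollout(layer_attentions):
    # Iteratively aggregate attention over layers (attention rollout):
    # average heads, add the residual (identity), renormalize, and
    # multiply into the running product from earlier layers.
    rolled, joint = [], None
    for attn in layer_attentions:  # each attn: (n_heads, n_tokens, n_tokens)
        a = attn.mean(axis=0)
        a = a + np.eye(a.shape[0])
        a = a / a.sum(axis=-1, keepdims=True)
        joint = a if joint is None else a @ joint
        rolled.append(joint)
    return rolled  # one rolled-out matrix per layer

def mean_attention_distance(attn_matrix):
    # Token distance |i - j| weighted by attention, averaged over query tokens.
    n = attn_matrix.shape[-1]
    dist = np.abs(np.arange(n)[:, None] - np.arange(n)[None, :])
    return float((attn_matrix * dist).sum(axis=-1).mean())

Averaging mean_attention_distance over sentences for each layer's rolled-out matrix would yield the per-layer values plotted in panel a.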
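The block-shuffle control in panel c could, under the assumption that the token-by-token attention matrix is partitioned into contiguous blocks whose order is randomly permuted along both axes, be sketched as follows; the block_size parameter and function name are hypothetical.

def block_shuffle_attention(attn_matrix, block_size, rng=None):
    # Permute contiguous token blocks of an attention matrix along both axes.
    rng = np.random.default_rng() if rng is None else rng
    n = attn_matrix.shape[0]
    starts = np.arange(0, n, block_size)
    order = rng.permutation(len(starts))
    idx = np.concatenate(
        [np.arange(starts[i], min(starts[i] + block_size, n)) for i in order]
    )
    return attn_matrix[np.ix_(idx, idx)]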

