Extended Data Fig. 7: Cross-model encoding comparisons reveal language-specific representation and computations aligned between DNN and STG. | Nature Neuroscience

From: Dissecting neural computations in the human auditory pathway using deep neural networks for speech

a) Schematic of the cross-model paradigm. Both English (lighter color) and Mandarin (darker color) speech were fed into models pretrained on English or Mandarin. The extracted representations were used to predict neural responses recorded in STG from native English speakers or native Mandarin speakers as they listened to speech in their own language (English speakers listened to English; Mandarin speakers listened to Mandarin).

b) Distribution of the normalized brain prediction score (BPS) of encoding models based on each single layer of the English-pretrained HuBERT model (light shaded bars) versus the Mandarin-pretrained model (dark shaded bars), in native English speakers listening to English speech. * p < 0.05, ** p < 0.01, *** p < 0.001, paired two-sided t-test; n = 57 electrodes in STG.

c) AS-BPS correlation across layers of the English-pretrained (light shaded bars) and Mandarin-pretrained (dark shaded bars) HuBERT models with STG in native English speakers (Pearson's correlation; * p < 0.05, one-sided permutation test). Each panel corresponds to one type of attention pattern (see also Fig. 4).

d-e) Same as b-c, but using STG recordings from native Mandarin speakers listening to Mandarin speech (n = 61 electrodes in STG). The performance of the English-pretrained (light shaded bars) and Mandarin-pretrained (dark shaded bars) HuBERT models is compared.

f-j) Same as a-e, but for native English speakers or native Mandarin speakers listening to speech in the other language (English speakers listened to Mandarin; Mandarin speakers listened to English).

Box plots show the first and third quartiles across electrodes; the orange line indicates the median, the gray line the mean, and the whiskers the 5th and 95th percentiles.
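The legend for panel c names a one-sided permutation test on a Pearson correlation but does not say how it was computed. The sketch below is one common way to run such a test, not the authors' procedure. It assumes the null distribution is built by shuffling one variable across layers and that the alternative hypothesis is a positive correlation. The function names, the number of permutations, and the toy values are all hypothetical.

```python
import random
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def permutation_pvalue(x, y, n_perm=2000, seed=0):
    """One-sided permutation p-value for the alternative corr(x, y) > 0.

    Assumption: the null is generated by shuffling y relative to x.
    Returns (observed_r, p_value).
    """
    rng = random.Random(seed)
    observed = pearson_r(x, y)
    shuffled = list(y)
    at_least_as_large = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if pearson_r(x, shuffled) >= observed:
            at_least_as_large += 1
    # Add-one correction keeps the estimated p-value strictly above zero.
    return observed, (at_least_as_large + 1) / (n_perm + 1)


# Toy per-layer values (hypothetical, not data from the figure).
attention_score = [0.1 * i for i in range(12)]
brain_prediction_score = [0.5 + 0.02 * i for i in range(12)]

r, p = permutation_pvalue(attention_score, brain_prediction_score)
print(f"r = {r:.3f}, one-sided p = {p:.4f}")
```

Because the test is one-sided, a strongly negative correlation yields a p-value near 1 rather than a small one, which is the main practical difference from the two-sided t-test used in panels b and d.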
