Fig. 5: Cross-language encoding comparisons reveal language-specific representations and computations aligned between the DNN and the STG. | Nature Neuroscience

From: Dissecting neural computations in the human auditory pathway using deep neural networks for speech

a, Schematic of the cross-language paradigm. Both English (darker shade) and Mandarin (lighter shade) speech samples were fed into models pretrained on English or Mandarin. The extracted representations were used to predict neural responses recorded in the STG of native English speakers or native Mandarin speakers when they listened to the corresponding speech. b, Distribution of the prediction R² values of the linear STRF model in STG electrode recordings from native English speakers using English or Mandarin speech. Two-sided paired t test. c, Averaged normalized BPS of the encoding model based on every single layer in the English-pretrained HuBERT model in native English speakers when they listened to English versus Mandarin speech. *P < 0.05, **P < 0.01, ***P < 0.001, paired two-sided t test; n = 57 electrodes in the STG (a subset of all participants who completed the relevant tasks). d, AS–BPS correlation across layers in the English-pretrained HuBERT model and the STG in native English speakers (Pearson’s correlation, *P < 0.05, permutation test, one-sided). Each panel corresponds to one type of attention pattern. Colored bars correspond to different contexts, as in Fig. 4. e–g, Same as b–d but using the Mandarin-pretrained HuBERT model and recordings from n = 61 STG electrodes in native Mandarin speakers. Box plot shows the first and third quartiles across electrodes (orange line indicates the median; gray line indicates the mean value; and whiskers indicate the 5th and 95th percentiles). Dashed horizontal gray line: the performance of the full acoustic-phonetic feature baseline model. CNN out, CNN output layer; CNN proj, CNN projection layer; NS, not significant.
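As an illustration of the statistic reported in panels d and g, the following is a minimal sketch of a one-sided permutation test on Pearson's correlation between two per-layer quantities. It is not the authors' analysis code: the function names, the number of permutations, the add-one correction, and the per-layer values are all assumptions made for this example, and the numbers are invented, not taken from the figure.

```python
import random
from math import sqrt


def pearson_r(x, y):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def one_sided_permutation_test(x, y, n_perm=2000, seed=0):
    """Test whether the observed correlation is larger than expected by chance.

    The null distribution is built by shuffling y relative to x, which breaks
    any layer-wise pairing. The p-value uses an add-one correction so it can
    never be exactly zero (an assumption of this sketch).
    """
    rng = random.Random(seed)
    observed = pearson_r(x, y)
    shuffled = list(y)
    at_least_as_large = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if pearson_r(x, shuffled) >= observed:
            at_least_as_large += 1
    p_value = (at_least_as_large + 1) / (n_perm + 1)
    return observed, p_value


# Hypothetical per-layer values (one entry per model layer); not real data.
as_per_layer = [0.10, 0.12, 0.15, 0.21, 0.26, 0.30,
                0.33, 0.41, 0.45, 0.52, 0.58, 0.60]
bps_per_layer = [0.35, 0.40, 0.38, 0.47, 0.52, 0.50,
                 0.61, 0.66, 0.64, 0.72, 0.78, 0.75]

r, p = one_sided_permutation_test(as_per_layer, bps_per_layer)
print(f"Pearson r = {r:.3f}, one-sided permutation P = {p:.4f}")
```

Shuffling one variable keeps each variable's own distribution across layers intact while removing the pairing, so the test asks only whether the layer-wise alignment exceeds chance.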
