Fig. 1: Overview of the deepfake voice synthesis, acoustic representation of identity-encoding features, and experimental tasks.
From: Cortical-striatal brain network distinguishes deepfake from real speaker identity

a The deepfake synthesis consisted of acoustic voice feature extraction from the natural target and source speakers, training of the Gaussian mixture model (GMM), and conversion of the synthesized idiosyncratic acoustic voice profile of the target speaker with the natural speech sound of the source speaker. b Distribution of identity-encoding voice features in natural and deepfake voices. We scaled acoustic values to facilitate visualization and model output comparisons. Linear mixed models (LMMs) assessed acoustic differences between natural and deepfake voices. Asterisks indicate *p < .05, Bonferroni-corrected for n = 7 models; ns, nonsignificant. Circles indicate sentence-specific acoustic values with speaker-specific color coding. Horizontal lines indicate the mean values. c Experimental design of the fMRI matching task, including an identity task and a speech task. d Accuracy in the fMRI matching task. Statistics are based on an LMM (n = 100 observations, n = 25 participants) with task and sound condition as fixed effects and participant as a random factor. Asterisks indicate **p < .0001, *p < .001. Circles indicate individual data, and horizontal lines indicate mean performance.