Fig. 4: Fusion-induced recalibration results from causal inference of multisensory speech. | Communications Psychology


From: Repeatedly experiencing the McGurk effect induces long-lasting changes in auditory speech perception


a The top row illustrates the model fit for a participant who experiences strong FIR. The model posits an internal representational space for speech tokens; for simplicity, a two-dimensional space with three tokens is shown. During the presentation of a McGurk stimulus consisting of auditory ba paired with visual ga, three internal representations are created. The first representation is auditory (A; green ellipse). The second representation is visual (V; yellow ellipse). The third representation is audiovisual (AV; purple ellipse). The audiovisual representation is the average of the auditory and visual representations, weighted by the likelihood that they arise from a common cause (the same talker). For a participant who places a high likelihood on a common cause, the integrated representation is midway between the auditory and visual representations, in the da region of representational space.

b During the presentation of McGurk stimuli in the exposure phase, sensory noise causes the encoded location of each representation (gray points) to fall at a different location within the 95% distribution ellipses shown in panel a (the encoded visual location, not shown, also varies). The integrated audiovisual representation always lies in the da region of representational space, leading mainly to McGurk fusion percepts. The difference between the encoded A and AV locations creates an error signal (red arrows). Two trials are shown (subscripts 1 and 2). Repeated errors induce a shift in the auditory representation from its original location (A, ellipse with green dashed line) to a new location that overlaps the integrated audiovisual representation (A’, ellipse with solid green line), eliminating the error signal.

c During the 24-h post-test, the shift in the auditory representation can be measured. The encoded auditory location varies from presentation to presentation due to sensory noise (each gray point represents the location in one presentation), but all encoded locations fall within the A’ ellipse, producing a preponderance of FIR da percepts (bar graph).

d The middle row illustrates the model fit for a participant who experiences moderate FIR, due to their tendency to estimate a moderate likelihood of a common cause for incongruent auditory and visual speech. For this type of participant, the integrated audiovisual representation is closer to the auditory representation, straddling the ba and da regions of representational space.

e Sensory noise causes the encoded location to vary from trial to trial, resulting in a mixture of percepts across repeated presentations of the same McGurk stimulus. In the first trial, the encoded audiovisual representation is in the ba region of space, resulting in a ba percept, while in the second trial, the encoded audiovisual representation is in the da region of space. The difference between the encoded A and AV locations creates an error signal (red arrows), inducing a partial shift in the auditory representation from its original location (A) to a new location that straddles the boundary between ba and da (A’).

f In the 24-h post-test, sensory noise causes the encoded location to vary from presentation to presentation within the A’ distribution, sometimes falling in the ba region of representational space and sometimes in the da region, producing a moderate number of FIR (da) percepts.

g The bottom row illustrates the model fit for participants who do not experience FIR. Because they estimate a low likelihood of a common cause for incongruent auditory and visual speech, the integrated audiovisual representation is similar to the auditory representation.

h Across repeated presentations of the McGurk stimulus, the integrated audiovisual representation always lies in the ba region of representational space, resulting in the absence of McGurk percepts and error signals. The auditory representation remains unchanged.

i In the 24-h post-test, the encoded auditory location always falls in the ba region of representational space, resulting in no FIR (da) percepts.
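
The mechanism described in panels a to i can be illustrated with a small simulation. The following is a minimal sketch under stated assumptions, not the authors' fitted model: the token coordinates, noise level, learning rate, trial counts, the nearest-prototype decision rule, and the exact weighting formula (AV = A + 0.5 × p_common × (V − A), with the AV target computed from the original auditory location) are all hypothetical choices made only to reproduce the qualitative pattern in the caption.

```python
import math
import random

# Hypothetical token prototypes in a 2-D representational space (assumed values).
TOKENS = {"ba": (0.0, 0.0), "da": (1.0, 0.0), "ga": (2.0, 0.0)}


def classify(point):
    """Return the token whose prototype is nearest to the encoded location."""
    return min(TOKENS, key=lambda name: math.dist(point, TOKENS[name]))


def noisy(rng, mean, sd):
    """Encode a location with independent Gaussian sensory noise per dimension."""
    return tuple(rng.gauss(m, sd) for m in mean)


def simulate(p_common, n_exposure=200, n_test=500, noise_sd=0.15, lr=0.05, seed=0):
    """Simulate exposure to auditory ba + visual ga, then an auditory-only post-test.

    p_common is the participant's assumed likelihood of a common cause (0 to 1).
    Returns the shifted auditory location A' and the fraction of da percepts.
    """
    rng = random.Random(seed)
    a_orig, v_mean = TOKENS["ba"], TOKENS["ga"]
    # Assumption: AV lies midway between A and V when p_common = 1,
    # and coincides with A when p_common = 0.
    av_mean = tuple(a + 0.5 * p_common * (v - a) for a, v in zip(a_orig, v_mean))
    a_mean = a_orig

    for _ in range(n_exposure):
        a_enc = noisy(rng, a_mean, noise_sd)    # encoded auditory location
        av_enc = noisy(rng, av_mean, noise_sd)  # encoded audiovisual location
        # Error signal: difference between encoded AV and encoded A.
        error = tuple(av - a for av, a in zip(av_enc, a_enc))
        # Each error nudges the auditory representation toward AV.
        a_mean = tuple(m + lr * e for m, e in zip(a_mean, error))

    # Post-test: auditory-only presentations drawn from the shifted distribution A'.
    percepts = [classify(noisy(rng, a_mean, noise_sd)) for _ in range(n_test)]
    return a_mean, percepts.count("da") / n_test


if __name__ == "__main__":
    for label, p in [("strong", 1.0), ("moderate", 0.5), ("none", 0.0)]:
        shifted, da_fraction = simulate(p)
        print(f"{label:>8} FIR: A' = ({shifted[0]:.2f}, {shifted[1]:.2f}), "
              f"da fraction = {da_fraction:.2f}")
```

Under these assumptions, a high common-cause likelihood moves A’ into the da region, a moderate likelihood leaves it near the boundary between ba and da, and a low likelihood leaves it in the ba region, mirroring the top, middle, and bottom rows of the figure.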
