Fig. 5: Mimicking human perception of dynamic audio-visual information.

From: Reconfigurable optoelectronic transistors for multimodal recognition

a Simplified diagram of how humans achieve high-level cognition through audio-visual integration. b Three-dimensional space mapping of the “Rolling hand toward left” gesture, a sample from the EgoGesture dataset. c Superimposed images at four time points of the intermediate states after the hand-movement frames were preprocessed by the reservoir. d Speech spectrogram of the “Rolling hand toward left” gesture, showing the frequency-domain distribution of the audio at different moments; the corresponding voiceovers for the gestures were generated with a text-to-speech engine. e Input vector after normalization of the time- and frequency-domain information, and output vector after reservoir processing. f Comparison of recognition accuracy under single-mode and multimodal information processing.
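To make the audio-visual pipeline of panels d–f concrete, the sketch below is a minimal Python illustration, not the authors' implementation: it assumes synthetic stand-ins for the reservoir-processed gesture and speech features, computes a spectrogram for a toy waveform, min-max normalizes each modality, concatenates them into a fused input vector, and compares single-mode versus multimodal accuracy with a simple linear readout.

```python
import numpy as np
from scipy.signal import spectrogram
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Panel d (illustrative): a spectrogram gives the frequency-domain
# distribution of the audio over time; a synthetic 1-s waveform stands in
# for the text-to-speech voiceover here.
fs = 16000
waveform = rng.normal(size=fs)
freqs, times, Sxx = spectrogram(waveform, fs=fs, nperseg=256)

# Panels c/e (illustrative): random features stand in for the reservoir-
# processed gesture and speech states; real values would come from the
# optoelectronic reservoir described in the article.
n_samples, n_classes = 300, 6
labels = rng.integers(0, n_classes, n_samples)
visual_feat = rng.normal(size=(n_samples, 64)) + 0.3 * labels[:, None]
audio_feat = rng.normal(size=(n_samples, 32)) + 0.3 * labels[:, None]

def normalize(x):
    """Min-max normalize each feature dimension to [0, 1] (panel e)."""
    lo, hi = x.min(axis=0), x.max(axis=0)
    return (x - lo) / (hi - lo + 1e-12)

X_visual = normalize(visual_feat)
X_audio = normalize(audio_feat)
X_fused = np.concatenate([X_visual, X_audio], axis=1)  # multimodal input vector

def readout_accuracy(X, y):
    """Train a linear readout and report held-out accuracy (panel f)."""
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
    return LogisticRegression(max_iter=1000).fit(X_tr, y_tr).score(X_te, y_te)

for name, X in [("visual only", X_visual), ("audio only", X_audio), ("audio-visual", X_fused)]:
    print(f"{name:12s} accuracy: {readout_accuracy(X, labels):.2f}")
```

With class-correlated features in both modalities, the fused vector typically yields the highest readout accuracy, mirroring the single- versus multimodal comparison shown in panel f.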