Fig. 3: RNPU analogue feature extraction for speech recognition. | Nature

From: Analogue speech recognition based on physical computing

a, t-distributed stochastic neighbour embedding (t-SNE) visualization of the female subset of the TI-46-Word spoken-digit dataset before RNPU preprocessing. b, Schematic of the RNPU (nonlinear function denoted f(·)), fed with an analogue time-dependent input x(t) (blue electrode), with the voltage measured at the orange electrode and constant control voltages applied at the black electrodes. Each set of control voltages produces a unique transformed signal (green and red curves shown as examples) and forms an output channel with a sample rate ten times lower than that of the raw input signal (Methods). c, t-SNE visualization after RNPU preprocessing (1 configuration out of 32 sets of randomly chosen control voltages). The output data show that RNPU preprocessing helps cluster utterances of the same digit, simplifying subsequent classification. d, Comparison of the TI-46-Word classification accuracy for linear and CNN classifier models without RNPU preprocessing (green, software model) and with it (blue, software model; orange, hardware-aware (HWA)-trained model) for 16, 32 and 64 RNPU channels (CHNL). The all-hardware result (RNPU with the AIMC classifier; orange) is presented as the mean ± 1 standard deviation over 10 inference measurements. e, Comparison of the Google Speech Commands (GSC) keyword-spotting (KWS) classification accuracy with and without RNPU preprocessing for four-layer, five-layer and six-layer CNNs. RNPU preprocessing achieves higher accuracies with classifiers that require less than half as many MAC operations. M, million.
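The channel construction described in panel b can be sketched in software: each randomly chosen set of control voltages defines one nonlinear transform of the input, read out at a ten-times-lower sample rate, and stacking 32 such channels yields the feature map passed to the classifier. The tanh mixing and the specific control-to-weight mapping below are purely illustrative assumptions, not the device physics of the RNPU.

```python
import numpy as np

rng = np.random.default_rng(0)

def rnpu_channel(x, controls, downsample=10):
    """Toy stand-in for one RNPU output channel.

    A control-voltage-dependent nonlinear transform f(x) of the input,
    average-pooled to a 10x lower output sample rate. The tanh form is an
    assumed surrogate nonlinearity, not the measured device response.
    """
    w = np.tanh(controls)                        # control-dependent weights (assumed)
    y = np.tanh(w[0] * x + w[1] * x**2 + w[2])   # transformed signal f(x)
    n = (len(y) // downsample) * downsample      # trim to a multiple of the pool size
    return y[:n].reshape(-1, downsample).mean(axis=1)

# 32 channels, each from a different randomly chosen set of control voltages.
x = np.sin(np.linspace(0, 20 * np.pi, 1000))     # stand-in for a spoken-digit waveform
channels = [rnpu_channel(x, rng.uniform(-1, 1, 3)) for _ in range(32)]
features = np.stack(channels)                    # shape (32, 100): one row per channel
```

Each row of `features` plays the role of one output channel in the figure; in the paper these channels feed the linear or CNN classifier, whereas here they only illustrate how a fixed random configuration maps the same input to a distinct transformed signal.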
