Fig. 4: LSTM for the keyword spotting (KWS) task.

a Architecture of the LSTM network for on-chip inference. b Mapping of the LSTM network onto the chip. The weights and nonlinearities (Sigmoid and Tanh) of the LSTM layer are programmed into crossbar arrays as conductances. Input and output (I/O) data of the LSTM layer are sent to and from the integrated chip through off-chip circuits. c Weight conductance distribution curves and the corresponding error. d Inference accuracy measured on the chip, compared with the software baseline using the ideal model and with simulation results using NL-ADC models of different bit precisions and hardware-measured weight noise. e Energy efficiency and area efficiency of our LSTM IC compared with the conventional ADC model and recently published LSTM ICs (refs. 23,25,26,27,28,31,67,68). Energy efficiency and throughput with 8-, 5-, 4- and 3-bit NL-ADCs are calculated for a 16 nm CMOS technology and a clock frequency of 1 GHz; detailed calculations are given in Supplementary Notes S3 and S4 and Supplementary Table S5. The area efficiencies of all works are normalized to a 1 GHz clock and a 16 nm CMOS process. f Energy efficiency of this work, the conventional ADC model and a speech-recognition chip using an LSTM model (ref. 31), compared at three levels: MAC array, NL-processing and full system. The full system includes the MAC array, NL-processing and the other modules that support them.
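Panel e normalizes the area efficiency of all works to a 1 GHz clock and a 16 nm process. The sketch below shows one way such a normalization can be computed. It assumes a common first-order convention, in which throughput scales linearly with clock frequency and area scales with the square of the feature size. This is not necessarily the exact rule used for this figure (see the Supplementary Notes for that). The function name `normalize_area_efficiency` and its parameters are hypothetical.

```python
# Hypothetical helper (illustrative only): rescale a reported area efficiency
# (TOPS/mm^2) to a target clock and CMOS node under first-order scaling
# assumptions, not the paper's documented procedure.

def normalize_area_efficiency(
    tops_per_mm2: float,
    clock_ghz: float,
    node_nm: float,
    target_clock_ghz: float = 1.0,
    target_node_nm: float = 16.0,
) -> float:
    """Return area efficiency rescaled to the target clock and node."""
    if clock_ghz <= 0 or node_nm <= 0 or target_clock_ghz <= 0 or target_node_nm <= 0:
        raise ValueError("clock frequencies and node sizes must be positive")
    # Assumption: throughput grows linearly with clock frequency.
    clock_factor = target_clock_ghz / clock_ghz
    # Assumption: area shrinks with (feature size)^2, so moving to a smaller
    # node raises TOPS/mm^2 by (node_nm / target_node_nm)^2.
    area_factor = (node_nm / target_node_nm) ** 2
    return tops_per_mm2 * clock_factor * area_factor


if __name__ == "__main__":
    # Example: a design reported at 28 nm and 0.5 GHz with 1.0 TOPS/mm^2.
    print(normalize_area_efficiency(1.0, clock_ghz=0.5, node_nm=28))
```

In this example the clock factor is 2 and the area factor is (28/16)² = 3.0625, so the normalized value is 6.125 TOPS/mm². More detailed scaling models also account for voltage and wire effects, which this first-order sketch ignores.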