Fig. 7: Event-based video classification with the DVSGesture128 dataset.

A sample event stream of a left-hand waving gesture is shown in a spatial view and b spatiotemporal view. The event video is down-sampled to 28 pixels in both spatial directions and time-binned into 32 frames. c Five evenly spaced frames are perceived by d the in-sensor RC system. e The feature map extracted from the in-sensor reservoir matrix. f The non-volatile organic memristor array, functioning as the readout map. g The classification result, a [0, 0, 1] vector in this case, representing the probabilities that the sample belongs to each of the 3 classes. h Visualization of the feature distribution via PCA, showing that the feature maps of different gestures extracted from the reservoir system are linearly separable even in two-dimensional space. Each point represents a sample, and its color represents its class. i The confusion matrix, dominated by the diagonal elements.
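
The preprocessing named in the caption (spatial down-sampling to 28 pixels per side, time-binning into 32 frames, and selecting five evenly spaced frames) can be sketched as follows. This is a minimal illustration only, not the authors' pipeline: the function names `events_to_frames` and `pick_evenly_spaced`, the 128-pixel sensor width, the use of plain event counts (polarity ignored), and the rounding rule for frame selection are all assumptions made for the example.

```python
import numpy as np


def events_to_frames(x, y, t, sensor_size=128, out_size=28, n_bins=32):
    """Accumulate events into an (n_bins, out_size, out_size) count tensor.

    Assumptions (not taken from the source): a square sensor of
    `sensor_size` pixels, event counts per cell, polarity ignored.
    """
    x = np.asarray(x, dtype=np.int64)
    y = np.asarray(y, dtype=np.int64)
    t = np.asarray(t, dtype=np.float64)

    # Map sensor coordinates onto the coarser out_size x out_size grid.
    xi = np.minimum((x * out_size) // sensor_size, out_size - 1)
    yi = np.minimum((y * out_size) // sensor_size, out_size - 1)

    # Map timestamps onto n_bins equal-width time bins.
    span = t.max() - t.min()
    if span > 0:
        ti = ((t - t.min()) / span * n_bins).astype(np.int64)
        ti = np.minimum(ti, n_bins - 1)  # the latest event falls in the last bin
    else:
        ti = np.zeros_like(xi)

    frames = np.zeros((n_bins, out_size, out_size), dtype=np.int64)
    # np.add.at accumulates correctly when several events share one cell.
    np.add.at(frames, (ti, yi, xi), 1)
    return frames


def pick_evenly_spaced(frames, k=5):
    """Return k frames at (approximately) evenly spaced bin indices."""
    idx = np.linspace(0, len(frames) - 1, k).round().astype(np.int64)
    return frames[idx], idx
```

Calling `events_to_frames(x, y, t)` on the coordinate and timestamp arrays of one recording yields a 32 × 28 × 28 tensor, and `pick_evenly_spaced(frames, 5)` then returns the five frames that would be presented to the sensor in this sketch.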