Figure 5
From: Micro-expression recognition model based on TV-L1 optical flow method and improved ShuffleNet

In this model, the convolutional layers of ShuffleNetV2 perform downsampling, and a modified ViT module is inserted before and after the last ShuffleNetV2 layer to extract global features.
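The data flow in this figure can be illustrated with a small shape trace. This is a minimal sketch, not the authors' implementation, and it rests on several assumptions the caption does not state. The input is taken to be 224×224. The channel widths are those of the standard ShuffleNetV2 1.0× backbone. "The last ShuffleNetV2 layer" is read as stage4. Each hypothetical ViT module treats every spatial position as one token and returns a feature map of the same shape. The function names `conv_out` and `trace` are illustrative only.

```python
# Hypothetical shape trace: ShuffleNetV2 downsampling with ViT modules
# placed before and after the last stage. Input size (224), channel widths
# (standard 1.0x: 24/116/232/464) and "last layer = stage4" are assumptions.

def conv_out(size: int, kernel: int = 3, stride: int = 2, padding: int = 1) -> int:
    """Spatial size after a conv/pool layer: floor((n + 2p - k) / s) + 1."""
    return (size + 2 * padding - kernel) // stride + 1

def trace(input_size: int = 224):
    """Return (step, channels, height = width, tokens or None) per step."""
    steps = []
    s = conv_out(input_size)              # conv1: 3x3, stride 2
    steps.append(("conv1", 24, s, None))
    s = conv_out(s)                       # max pool: 3x3, stride 2
    steps.append(("maxpool", 24, s, None))
    for name, ch in (("stage2", 116), ("stage3", 232)):
        s = conv_out(s)                   # each stage opens with a stride-2 unit
        steps.append((name, ch, s, None))
    # Assumed ViT module before the last stage: one token per spatial
    # position (H * W tokens), embedding dim = channels, shape preserved.
    steps.append(("vit_before", 232, s, s * s))
    s = conv_out(s)                       # stage4: final stride-2 downsampling
    steps.append(("stage4", 464, s, None))
    # Assumed ViT module after the last stage on the coarsest feature map.
    steps.append(("vit_after", 464, s, s * s))
    return steps

if __name__ == "__main__":
    for row in trace():
        print(row)
```

Under these assumptions, the spatial size halves at each stride-2 step (112, 56, 28, 14, 7). The first ViT module therefore attends over 14 × 14 = 196 tokens, and the second over 7 × 7 = 49 tokens. Placing global attention only at these late, low-resolution stages keeps the cost of self-attention small, because that cost grows with the square of the token count.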