Figure 4 | Scientific Reports

Figure 4

From: Micro-expression recognition model based on TV-L1 optical flow method and improved ShuffleNet

Figure 4

Each pixel sees every other pixel in our ViT block. In this example, the red pixel uses the transformer to focus on the blue pixel (the pixel in the corresponding position in the other blocks). Because the blue pixel already encodes the information of neighboring pixels using convolution, this allows the red pixel to encode the information of all pixels in the image. Here, each cell in the black and grey grid represents a block and a pixel, respectively41.

Back to article page