Fig. 2: Overall architecture of BSP DCA-ViViT. | npj Digital Medicine


From: Smartphone video-based early diagnosis of blepharospasm using dual cross-attention modeling enhanced by facial pose estimation


The architecture consists of four modules: (a) a pretrained FaceMesh detector for facial landmark extraction; (b) a ViViT model for spatial feature extraction; (c) the dual cross-attention model; and (d) the dual cross-attention layer. PE, positional embedding; DCAL, dual cross-attention layer; MLP, multilayer perceptron; SA, self-attention; LN, layer normalization; CA, cross-attention; FF, feed-forward layer.
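The caption names the sublayers inside each dual cross-attention layer (LN, SA, CA, FF) but not their exact order or how the two input streams are paired. The NumPy sketch below is one plausible reading, not the authors' implementation. It assumes two token streams of equal width (for example, ViViT video tokens and tokens derived from FaceMesh landmarks), single-head attention, pre-norm residual blocks, and random weights with no biases or learned normalization parameters. All function names (`dual_cross_attention_layer`, `init_weights`, and so on) are hypothetical.

```python
import numpy as np

# Hypothetical sketch of a dual cross-attention layer (DCAL).
# Assumptions: pre-norm residual order SA -> CA -> FF, single-head attention,
# equal embedding width in both streams, no biases or learned LN parameters.


def layer_norm(x, eps=1e-5):
    """Normalize each token (row) to zero mean and unit variance."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


def softmax(x):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)


def attention(queries, keys_values, w):
    """Single-head scaled dot-product attention.

    Self-attention when queries and keys_values are the same tokens,
    cross-attention when they come from different streams.
    """
    q = queries @ w["q"]
    k = keys_values @ w["k"]
    v = keys_values @ w["v"]
    scores = q @ k.T / np.sqrt(q.shape[-1])  # (T_q, T_kv)
    return softmax(scores) @ v @ w["o"]       # (T_q, d)


def feed_forward(x, w):
    """Two-layer position-wise MLP with ReLU (biases omitted for brevity)."""
    return np.maximum(0.0, x @ w["ff1"]) @ w["ff2"]


def init_weights(d, rng, expansion=4):
    """Hypothetical random weights for one stream of the layer."""
    def mat(rows, cols):
        return rng.normal(scale=1.0 / np.sqrt(rows), size=(rows, cols))
    return {
        "sa": {n: mat(d, d) for n in ("q", "k", "v", "o")},
        "ca": {n: mat(d, d) for n in ("q", "k", "v", "o")},
        "ff": {"ff1": mat(d, expansion * d), "ff2": mat(expansion * d, d)},
    }


def dual_cross_attention_layer(a, b, wa, wb):
    """One hypothetical DCAL step over token streams a (T_a, d) and b (T_b, d)."""
    # 1) Self-attention within each stream (pre-norm, residual).
    a = a + attention(layer_norm(a), layer_norm(a), wa["sa"])
    b = b + attention(layer_norm(b), layer_norm(b), wb["sa"])
    # 2) Cross-attention: each stream queries the other, using the same
    #    normalized snapshots so neither update sees the other's result.
    a_n, b_n = layer_norm(a), layer_norm(b)
    a, b = a + attention(a_n, b_n, wa["ca"]), b + attention(b_n, a_n, wb["ca"])
    # 3) Position-wise feed-forward in each stream (pre-norm, residual).
    a = a + feed_forward(layer_norm(a), wa["ff"])
    b = b + feed_forward(layer_norm(b), wb["ff"])
    return a, b


rng = np.random.default_rng(0)
d = 16
video_tokens = rng.normal(size=(8, d))     # stand-in for ViViT tokens
landmark_tokens = rng.normal(size=(5, d))  # stand-in for FaceMesh-derived tokens
wa, wb = init_weights(d, rng), init_weights(d, rng)
out_video, out_landmarks = dual_cross_attention_layer(video_tokens, landmark_tokens, wa, wb)
print(out_video.shape, out_landmarks.shape)  # (8, 16) (5, 16)
```

Both streams keep their own token counts and width, so layers of this form can be stacked; how the stacked outputs feed the MLP head named in the caption is not specified here.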
