Fig. 5
From: Collaborative positional attention for image to English question answering

Structure of the Intra-modal Self-Attention with Collaboration (IMSAC) unit. This unit refines single-modality features (Query, Key, Value) by incorporating inter-head collaboration to capture internal dependencies.
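
The caption does not say how the heads collaborate, so the following is only a minimal NumPy sketch of one plausible reading, not the authors' implementation. It assumes that Query, Key and Value are linear projections of the same single-modality features, and that "inter-head collaboration" means mixing the attention logits across heads with a learned matrix before the softmax. The names `collaborative_self_attention`, `head_mix`, `w_q`, `w_k` and `w_v` are hypothetical and chosen for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the maximum for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def collaborative_self_attention(x, w_q, w_k, w_v, head_mix, num_heads):
    """Self-attention over one modality with assumed cross-head logit mixing.

    x        : (n, d) features of a single modality
    w_q/k/v  : (d, d) projection matrices (hypothetical parameters)
    head_mix : (num_heads, num_heads) mixing matrix (assumed form of
               inter-head collaboration; the identity gives plain
               multi-head self-attention)
    """
    n, d = x.shape
    d_head = d // num_heads

    def split_heads(t):
        # (n, d) -> (num_heads, n, d_head)
        return t.reshape(n, num_heads, d_head).transpose(1, 0, 2)

    q = split_heads(x @ w_q)
    k = split_heads(x @ w_k)
    v = split_heads(x @ w_v)

    # Scaled dot-product logits per head: (num_heads, n, n)
    logits = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)

    # Assumed collaboration step: each head's logits become a weighted
    # combination of the logits of all heads.
    mixed = np.einsum("gh,hij->gij", head_mix, logits)

    weights = softmax(mixed, axis=-1)
    out = weights @ v  # (num_heads, n, d_head)

    # Merge heads back to (n, d)
    return out.transpose(1, 0, 2).reshape(n, d), weights


# Toy usage with random data (all values are illustrative only).
rng = np.random.default_rng(0)
n, d, h = 6, 8, 2
x = rng.normal(size=(n, d))
w_q, w_k, w_v = (0.1 * rng.normal(size=(d, d)) for _ in range(3))
head_mix = np.eye(h) + 0.1 * rng.normal(size=(h, h))
refined, attn = collaborative_self_attention(x, w_q, w_k, w_v, head_mix, h)
```

With `head_mix` set to the identity matrix, the sketch reduces to ordinary multi-head self-attention, which makes the assumed collaboration term easy to isolate.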