Fig. 8
From: Collaborative positional attention for image to English question answering

Ablation study on the impact of IHCM location. The chart compares accuracy (%) across four question categories: “All” (overall), “Other”, “Yes/No”, and “Number”. The lines represent the baseline model (Yu et al. [52]) and three variations of our method: adding IHCM only after normalization (&IHCMaf), only before normalization (&IHCMbf), and at both stages (&IHCMaf+bf). The results demonstrate that the combined approach (&IHCMaf+bf) yields the highest accuracy across all categories.