Fig. 12: Attention map of the cross-attention module (400 × 400) from the last block of 4DVarFormer.

The cross-attention map of each head is nearly full rank, indicating that 4DVarFormer learns a wide range of features and captures diverse relationships between the background fields and the gradients.
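As a minimal sketch of how such a "nearly full-rank" claim can be checked numerically, the snippet below estimates the rank of each head's attention map via its singular values. The tensor shape (8 heads × 400 × 400), the relative tolerance, and the helper name `attention_rank` are illustrative assumptions, not the authors' code.

```python
# Sketch: numerical rank of per-head cross-attention maps (assumed shapes).
import torch

def attention_rank(attn: torch.Tensor, rel_tol: float = 1e-3) -> torch.Tensor:
    """Count singular values above rel_tol * (largest singular value) per head.

    attn: (num_heads, N, N) row-stochastic attention weights.
    Returns a (num_heads,) tensor of numerical ranks.
    """
    s = torch.linalg.svdvals(attn)        # singular values, descending, (num_heads, N)
    thresh = rel_tol * s[:, :1]           # per-head threshold from the top singular value
    return (s > thresh).sum(dim=-1)

# Stand-in for a real 400 x 400 map extracted from the last cross-attention block.
attn = torch.softmax(torch.randn(8, 400, 400), dim=-1)
print(attention_rank(attn))  # values close to 400 indicate a nearly full-rank map
```

A rank close to the map's dimension (400) means the attention weights are not dominated by a few directions, which is the behaviour described above.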