Fig. 1 | Scientific Reports

Fig. 1

From: Dynamic adaptive synergistic attention network for visible-infrared person re-identification

Network Architecture Overview. During training, the framework processes paired RGB-IR images through two modality-specific ResNet-50 backbones, which extract stage-specific features \(F_v^{(l)}\) and \(F_r^{(l)}\) for \(l \in \{1,2,3,4\}\). DASF modules (containing AKSA mechanisms) generate fused features at each stage, and an FPN unifies them. The network is optimized via QBOL, \(\alpha L_{id} + \beta L_{tri} + \gamma L_{sup} + \delta L_{mmd}\), which supervises the modality-specific features, while the DASF-generated fused features serve as auxiliary gradient pathways during backpropagation. At inference, only the modality-specific branches are active; DASF and FPN are bypassed, so features are extracted from a single modality.
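The weighted objective in the caption can be illustrated with a minimal sketch. The caption gives only the form \(\alpha L_{id} + \beta L_{tri} + \gamma L_{sup} + \delta L_{mmd}\); the names `LossWeights` and `combined_loss` and the default weights of 1.0 are assumptions made for illustration, not values or code from the paper, and the four loss terms are taken as already-computed scalars.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LossWeights:
    """Hypothetical container for the four weights in the caption.

    The defaults of 1.0 are placeholders, not values reported by the authors.
    """
    alpha: float = 1.0  # weight on L_id
    beta: float = 1.0   # weight on L_tri
    gamma: float = 1.0  # weight on L_sup
    delta: float = 1.0  # weight on L_mmd


def combined_loss(l_id: float, l_tri: float, l_sup: float, l_mmd: float,
                  w: LossWeights = LossWeights()) -> float:
    """Return alpha*L_id + beta*L_tri + gamma*L_sup + delta*L_mmd.

    Each argument is an already-computed scalar loss; how the individual
    terms are computed is outside the scope of this sketch.
    """
    return (w.alpha * l_id
            + w.beta * l_tri
            + w.gamma * l_sup
            + w.delta * l_mmd)


if __name__ == "__main__":
    # Illustrative numbers only; they do not come from the paper.
    print(combined_loss(1.0, 2.0, 3.0, 4.0))
    print(combined_loss(1.0, 2.0, 3.0, 4.0, LossWeights(0.5, 0.25, 0.0, 2.0)))
```

Setting a weight to zero removes that term from the sum, which is one simple way to ablate a single component under these assumptions.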
