Fig. 3 | Scientific Reports

Fig. 3

From: Toward general object search in open reality

The architecture of the Siamese Exchanged Attention Network (SEA-Net) for General Object Search in Open Reality (GOSO). The model consists of three parallel branches. Two of them share weights and extract the gallery and query features, respectively; they are termed the G-branch (blue) and the Q-branch (green). The third branch (orange) progressively extracts the desired features, which carry richer semantic information, through multiple SEA layers stacked in stages. These features are then fused by the Hierarchical Feature Fusion (HFF) module, which is implemented as a weighted summation (\(\Sigma\)) described in Section “Siamese exchanged attention module”. Note that \({W}_{Q}\), \({W}_{K}\) and \({W}_{V}\) are shared by the G-branch and the Q-branch in each SEA layer. Finally, the features from all three branches are fed into a shared classification layer C (yellow) to compute the loss during training, or into the proposed Open Score Fusion (OSF) module (Section “Open score fusion module”) to produce matching scores during inference.
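The caption states only that \({W}_{Q}\), \({W}_{K}\) and \({W}_{V}\) are shared between the G-branch and Q-branch in each SEA layer, and that HFF is a weighted summation. The pure-Python sketch below illustrates those two points under loudly stated assumptions: "exchanged" attention is read here as each branch's queries attending to the *other* branch's keys and values, and multi-head splitting, residual connections, normalization, the orange branch and the way HFF weights are obtained are all omitted. The names `sea_layer` and `hff` are hypothetical and are not taken from the paper.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    # Plain matrix product: (n x k) @ (k x m) -> (n x m).
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a: Matrix) -> Matrix:
    return [list(row) for row in zip(*a)]


def softmax_rows(a: Matrix) -> Matrix:
    # Row-wise softmax with max-subtraction for numerical stability.
    out = []
    for row in a:
        m = max(row)
        exps = [math.exp(x - m) for x in row]
        s = sum(exps)
        out.append([e / s for e in exps])
    return out


def attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    # Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V.
    d = len(q[0])
    scores = matmul(q, transpose(k))
    scaled = [[s / math.sqrt(d) for s in row] for row in scores]
    return matmul(softmax_rows(scaled), v)


def sea_layer(x_g: Matrix, x_q: Matrix, w_q: Matrix, w_k: Matrix,
              w_v: Matrix) -> Tuple[Matrix, Matrix]:
    # Hypothetical SEA layer. The SAME w_q, w_k, w_v project both the
    # gallery tokens (x_g) and the query tokens (x_q): weight sharing.
    q_g, k_g, v_g = matmul(x_g, w_q), matmul(x_g, w_k), matmul(x_g, w_v)
    q_q, k_q, v_q = matmul(x_q, w_q), matmul(x_q, w_k), matmul(x_q, w_v)
    # ASSUMPTION: "exchanged" = each branch attends to the other's keys/values.
    out_g = attention(q_g, k_q, v_q)
    out_q = attention(q_q, k_g, v_g)
    return out_g, out_q


def hff(stage_features: List[List[float]], weights: List[float]) -> List[float]:
    # Hypothetical HFF: weighted summation of per-stage feature vectors.
    # How the weights are chosen (fixed vs. learned) is an assumption left open.
    if len(stage_features) != len(weights):
        raise ValueError("need exactly one weight per stage")
    dim = len(stage_features[0])
    return [sum(w * f[i] for w, f in zip(weights, stage_features))
            for i in range(dim)]


if __name__ == "__main__":
    identity = [[1, 0], [0, 1]]
    g_out, q_out = sea_layer([[1, 0]], [[0, 1], [0, 1]], identity, identity, identity)
    print(g_out, q_out)
    print(hff([[1, 2], [3, 4]], [0.25, 0.75]))
```

With identity projections, the gallery output depends only on the query tokens' values and vice versa, which makes the assumed cross-branch exchange easy to see. Keeping a single set of projection matrices for both branches is what makes the G- and Q-branches Siamese: both inputs are mapped into the same feature space before they are compared.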
