Fig. 2
From: Mutual contextual relation-guided dynamic graph networks for cross-modal image-text retrieval

A sample graph constructed with only ten attributes of ViT and BERT features.