Fig. 6: Cross-modal retrieval and visual question answering (VQA) results. | Nature Communications

From: Towards artificial general intelligence via a multimodal foundation model

a Cross-modal retrieval results (%) on the Chinese dataset AIC-ICC. b VQA results on Visual7W, with the dataset translated into Chinese. Overall accuracies (%) are reported along with the results for each question type. c VQA examples from our BriVL model with and without pre-training, intended to validate the strong imagination ability of the pre-trained BriVL. The highest results in (a) and (b) are highlighted in bold.
