Fig. 1: Overarching concept of our BriVL model with weak training data assumption.
From: Towards artificial general intelligence via a multimodal foundation model

a Comparison between the human brain and our multimodal foundation model BriVL (Bridging-Vision-and-Language) for coping with both vision and language information.
b Comparison between modeling weak semantic correlation data and modeling strong semantic correlation data.