Fig. 4: Performance comparison of proprietary and open-source MLLMs on general multimodal benchmarks.
From: Efficient GPT-4V level multimodal large language model for deployment on edge devices

MiniCPM-Llama3-V 2.5, with only 8 billion parameters, outperforms leading open-source MLLMs and surpasses proprietary models such as GPT-4V-1106 and Gemini Pro on the OpenCompass benchmark. In addition, MiniCPM-V 2.0, with 2 billion parameters, significantly outperforms other MLLMs with fewer than 4 billion parameters.