Fig. 5: Performance on OCR and multilingual benchmarks.
From: Efficient GPT-4V level multimodal large language model for deployment on edge devices

a, b, Results on OCR benchmarks, including OCRBench, the DocVQA test set and the TextVQA validation set, for (a) open-source MLLMs (>4B parameters) and (b) MLLMs (<4B parameters). c, Multilingual multimodal capabilities on the multilingual LLaVABench.