Fig. 1: Overview of our contributions.
From: Towards building multilingual language model for medicine

a The figure illustrates our proposed large-scale multilingual medical corpus (MMedC), containing 25.5B tokens, covering six main languages, collected from four data sources. b The figure shows the composition of our comprehensive multilingual medical benchmark (MMedBench), which is constructed by aggregating medical QA cases in different languages and prompting GPT-4 to provide rationale sentences. MMedBench enables the evaluation of both multi-choice accuracy and rationale generation ability for different LLMs under zero-shot or fine-tuning settings. c The line plot shows the final multi-choice accuracy of various LLMs on our MMedBench, where our final model MMed-Llama 3 demonstrates the best performance among all existing open-source LLMs. d The bar chart further details the gains in both multi-choice accuracy and rationale generation ability when comparing MMedLM 2 to InternLM 2, or MMed-Llama 3 to Llama 3. Considering that the main difference between our models and their base models lies in the auto-regressive training on MMedC, these comparisons highlight the importance of our contributed medical-specific multilingual language corpus. Source data are provided as a Source Data file.