Fig. 5: The pipeline of MMedBench construction.
From: Towards building multilingual language model for medicine

First, multiple-choice QA pairs in various languages are collected from five QA datasets. Then, a corresponding rationale is generated for each pair with the help of GPT-4. The rationales in the test set are further checked by humans to ensure their quality.
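The three stages in the caption (collect, generate rationales, human-check the test split) can be sketched as a small Python pipeline. This is a minimal illustration under stated assumptions, not the authors' implementation. The `QAItem` record layout, the `build_benchmark` function, and the `generate_rationale` and `human_review` callables are all hypothetical. No real GPT-4 API call is made; the generator is passed in as a stub.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class QAItem:
    # Hypothetical layout for one multiple-choice QA pair.
    language: str
    question: str
    options: dict[str, str]
    answer: str
    split: str  # assumption: "train" or "test"
    rationale: Optional[str] = None
    rationale_verified: bool = False


def build_benchmark(
    sources: list[list[QAItem]],
    generate_rationale: Callable[[QAItem], str],
    human_review: Callable[[QAItem], bool],
) -> list[QAItem]:
    """Hypothetical sketch of the collect / generate / review stages."""
    # Stage 1: pool QA pairs from every source dataset.
    items = [item for dataset in sources for item in dataset]
    for item in items:
        # Stage 2: attach a rationale (GPT-4 in the paper; any callable here).
        item.rationale = generate_rationale(item)
        # Stage 3: only test-set rationales are checked by a human reviewer.
        if item.split == "test":
            item.rationale_verified = human_review(item)
    return items
```

Passing the generator and reviewer as callables keeps the sketch independent of any particular LLM client or annotation tool. For example, `build_benchmark([[train_item], [test_item]], my_generator, my_reviewer)` returns all items with rationales attached and marks only the test items as reviewed.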