Fig. 3: Statistical results on MMedBench.
From: Towards building multilingual language model for medicine

a The bar plot shows basic statistics for the train and test sets of MMedBench. “Avg. tokens” is the mean token length per sample for each component: “Rationale” denotes the rationale sentences in the answer, “Option” denotes the option descriptions in the choice list, and “Question” denotes the question sentences. “Prop. of multi-option” denotes the proportion of questions with multiple correct options, and “Prop. of single-option” denotes the proportion of questions with exactly one correct option. “Number of QA pairs” denotes the number of QA pairs in the train or test split. b The histogram shows the topic distribution in the test split of MMedBench, which covers a wide range of medical areas, from general and specialized medicine to basic medical sciences. This allows MMedBench to comprehensively measure the performance of medical models. Source data are provided as a Source Data file.