Table 2 Statistics of our instruction-tuning datasets

From: Small language models learn enhanced reasoning skills from medical textbooks

| Target application | Dataset | # Examples |
| --- | --- | --- |
| Multiple-choice QA | MedQA-CoT^a [36] | 9,308 |
| | MedBooks-18-CoT^a | 77,660 |
| | MedMCQA [41] | 182,822 |
| Free-form/Single-turn QA | LiveQA [42] | 633 |
| | MedicationQA [43] | 689 |
| | ChatDoctor-cleaned^a [38] | 111,902 |
| Multi-turn QA | MedQA-dialog^a [36] | 4,818 |
| Clinical Note Generation | MTS-dialog [60] | 1,200 |
| Miscellaneous | MedInstruct-52K [44] | 52,002 |

  1. “# Examples” denotes the number of training examples for each dataset.
  2. ^a indicates that the dataset is newly constructed in this study. The total number of training examples is 441,034.
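
As a quick sanity check on the stated total, the per-dataset counts can be summed directly. The snippet below is only an illustrative sketch: the counts are copied from the table, while the variable names (`DATASETS`, `per_application`, `total`) are hypothetical and not part of the study's code.

```python
from collections import defaultdict

# (target application, dataset, number of training examples), copied from the table.
DATASETS = [
    ("Multiple-choice QA", "MedQA-CoT", 9308),
    ("Multiple-choice QA", "MedBooks-18-CoT", 77660),
    ("Multiple-choice QA", "MedMCQA", 182822),
    ("Free-form/Single-turn QA", "LiveQA", 633),
    ("Free-form/Single-turn QA", "MedicationQA", 689),
    ("Free-form/Single-turn QA", "ChatDoctor-cleaned", 111902),
    ("Multi-turn QA", "MedQA-dialog", 4818),
    ("Clinical Note Generation", "MTS-dialog", 1200),
    ("Miscellaneous", "MedInstruct-52K", 52002),
]

# Aggregate example counts by target application.
per_application = defaultdict(int)
for application, _dataset, n_examples in DATASETS:
    per_application[application] += n_examples

# Overall number of training examples across all datasets.
total = sum(n for _, _, n in DATASETS)

for application, count in per_application.items():
    print(f"{application}: {count:,} ({count / total:.1%})")
print(f"Total: {total:,}")
```

The computed sum agrees with the total of 441,034 given in the footnote, and the per-application breakdown shows that multiple-choice QA accounts for the majority of the training examples.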