npj Digital Medicine

Table 2 Statistics of our instruction-tuning datasets

From: Small language models learn enhanced reasoning skills from medical textbooks

Target application	Dataset	# Examples
Multiple-choice QA	MedQA-CoT^a³⁶	9308
	MedBooks-18-CoT^a	77,660
	MedMCQA⁴¹	182,822
Free-form/Single-turn QA	LiveQA⁴²	633
	MedicationQA⁴³	689
	ChatDoctor-cleaned^a³⁸	111,902
Multi-turn QA	MedQA-dialog^a³⁶	4818
Clinical Note Generation	MTS-dialog⁶⁰	1200
Miscellaneous	MedInstruct-52K⁴⁴	52,002

“# Examples” denotes the number of training examples for each dataset.
^a Indicates that the dataset is newly constructed in this study. The total number of training examples is 441,034.
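The stated total can be checked against the per-dataset counts. The sketch below uses counts transcribed from the table. It sums them per target application and overall, and also gives the share contributed by the datasets marked ^a. The names DATASETS, NEW_IN_THIS_STUDY, subtotals and total are hypothetical helpers for illustration only. They do not come from the study's released code.

```python
# Training-example counts transcribed from Table 2, grouped by target application.
DATASETS = {
    "Multiple-choice QA": {"MedQA-CoT": 9308, "MedBooks-18-CoT": 77660, "MedMCQA": 182822},
    "Free-form/Single-turn QA": {"LiveQA": 633, "MedicationQA": 689, "ChatDoctor-cleaned": 111902},
    "Multi-turn QA": {"MedQA-dialog": 4818},
    "Clinical Note Generation": {"MTS-dialog": 1200},
    "Miscellaneous": {"MedInstruct-52K": 52002},
}

# Datasets marked ^a in the table (newly constructed in this study).
NEW_IN_THIS_STUDY = {"MedQA-CoT", "MedBooks-18-CoT", "ChatDoctor-cleaned", "MedQA-dialog"}


def subtotals(datasets):
    """Return the number of training examples per target application."""
    return {app: sum(counts.values()) for app, counts in datasets.items()}


def total(datasets):
    """Return the number of training examples across all datasets."""
    return sum(subtotals(datasets).values())


def new_dataset_total(datasets, new_names):
    """Return the number of examples contributed by the newly constructed datasets."""
    return sum(
        n
        for counts in datasets.values()
        for name, n in counts.items()
        if name in new_names
    )


if __name__ == "__main__":
    grand = total(DATASETS)
    for app, n in subtotals(DATASETS).items():
        print(f"{app}: {n:,} ({n / grand:.1%} of training examples)")
    print(f"Newly constructed: {new_dataset_total(DATASETS, NEW_IN_THIS_STUDY):,}")
    print(f"Total: {grand:,}")  # matches the 441,034 stated in the footnote
```

Grouping the counts by target application makes the mixture explicit. The multiple-choice QA datasets account for 269,790 of the 441,034 training examples.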
