Fig. 2: Overview of MedS-Ins. | npj Digital Medicine

From: Towards evaluating and building versatile large language models for medicine

a The task collection pipeline. Each task is assigned a task category and a hand-written definition, giving 19 task categories in total. b We collect 58 existing public datasets. c We convert the different dataset formats into one unified medical instruction dataset, MedS-Ins. d The final data distribution of MedS-Ins. The Sankey diagram shows how the text domains (left), task categories (middle), and data sources (right) contribute to the final dataset. At the bottom left, two pie charts show the data distribution over text domains and task categories, respectively.