Table 1 The evolution of the semantic labels.

From: A BERT model generates diagnostically relevant semantic embeddings from pathology synopses with active learning

| Iteration | New labels | Label count | Sample count |
|---|---|---|---|
| 1 | Acute lymphoblastic leukemia, acute myeloid leukemia, inadequate, lymphoproliferative disorder, mastocytosis, metastatic, myelodysplastic syndrome, myeloproliferative neoplasm, normal, plasma cell neoplasm | 10 | 50 |
| 2 | Erythroid hyperplasia, iron deficiency | 12 | 83 |
| 3 | Acute leukemia, acute promyelocytic leukemia, chronic myeloid leukemia, hemophagocytosis, hypercellular, hypocellular | 18 | |
| 4 | Basophilia, eosinophilia | 20 | 282 |
| 5 | | 20 | 296 |
| 6 | Granulocytic hyperplasia | 21 | 344 |
| 7 | | 21 | 393 |
| 8 | | 21 | 408 |
| 9 | | 21 | 500 |

  1. In each iteration, new cases and/or new labels were added to the dataset. In some iterations, we reviewed the labeled cases and either added new labels to previously labeled cases or added a small number of new semantic labels.
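
The label counts in Table 1 are cumulative: each iteration's count is the previous count plus any newly introduced labels. A minimal sketch of that arithmetic follows, with the labels transcribed from the table. The names `new_labels`, `reported_label_count`, and `cumulative_label_counts` are illustrative and not taken from the paper.

```python
# New labels introduced in each iteration, transcribed from Table 1.
# Iterations with no new labels map to an empty list.
new_labels = {
    1: ["acute lymphoblastic leukemia", "acute myeloid leukemia", "inadequate",
        "lymphoproliferative disorder", "mastocytosis", "metastatic",
        "myelodysplastic syndrome", "myeloproliferative neoplasm", "normal",
        "plasma cell neoplasm"],
    2: ["erythroid hyperplasia", "iron deficiency"],
    3: ["acute leukemia", "acute promyelocytic leukemia",
        "chronic myeloid leukemia", "hemophagocytosis", "hypercellular",
        "hypocellular"],
    4: ["basophilia", "eosinophilia"],
    5: [],
    6: ["granulocytic hyperplasia"],
    7: [],
    8: [],
    9: [],
}

# "Label count" column as reported in Table 1.
reported_label_count = {1: 10, 2: 12, 3: 18, 4: 20, 5: 20,
                        6: 21, 7: 21, 8: 21, 9: 21}


def cumulative_label_counts(labels_by_iteration):
    """Return the running number of distinct labels after each iteration."""
    seen = set()
    counts = {}
    for iteration in sorted(labels_by_iteration):
        seen.update(labels_by_iteration[iteration])
        counts[iteration] = len(seen)
    return counts


counts = cumulative_label_counts(new_labels)
# The running totals should match the reported column in every iteration.
assert counts == reported_label_count
```

The check only covers the label column. The sample count for iteration 3 is missing from the table, so it is not modeled here.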