Table 8 Breakdown of labels for each dataset split

From: Detecting stigmatizing language in clinical notes with large language models for addiction care

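| Label | Train | Validation | Test | External Validation Full | External Validation Balanced |
|-------|-------|------------|------|--------------------------|------------------------------|
| 0 | 28,302 | 6617 | 6270 | 286,058 | 2072 |
| 1 | 25,665 | 4949 | 5316 | 2072 | 2072 |
| Total | 53,967 | 11,566 | 11,586 | 288,130 | 4144 |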

  1. Breakdown of labels for the train, validation, and test sets, drawn from post-processed and selected MIMIC-III [17] data, and for the external validation sets from University of Wisconsin-Madison Hospital. The external validation breakdown is given both at the native prevalence of the label (no selection; the true frequency of stigmatizing language is recorded) and for a randomly balanced set.
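
The difference between the native-prevalence and balanced external sets is easier to see as a proportion. The following is a minimal sketch, not part of the original analysis: it copies the counts from Table 8 and computes each split's total and the share of examples with label 1. The names `counts` and `label_share` are hypothetical, and reading label 1 as the stigmatizing-language class is an assumption inferred from the note above, not something the table states.

```python
# Counts copied from Table 8; keys are the split names used in the table.
# Assumption: label 1 marks stigmatizing language (inferred from the table note).
counts = {
    "Train": {0: 28302, 1: 25665},
    "Validation": {0: 6617, 1: 4949},
    "Test": {0: 6270, 1: 5316},
    "External Validation Full": {0: 286058, 1: 2072},
    "External Validation Balanced": {0: 2072, 1: 2072},
}


def label_share(split_counts, label=1):
    """Return the fraction of examples in one split that carry `label`."""
    total = sum(split_counts.values())
    return split_counts[label] / total


# Per-split totals (should match the "Total" row) and label 1 shares.
totals = {name: sum(c.values()) for name, c in counts.items()}
shares = {name: label_share(c) for name, c in counts.items()}

for name in counts:
    print(f"{name}: total={totals[name]}, label 1 share={shares[name]:.2%}")
```

Under these counts, label 1 makes up roughly 0.7% of the full external validation set and exactly half of the balanced one, while the MIMIC-III splits sit between about 43% and 48%.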