Table 8 Breakdown of labels for each dataset split

Label	Train	Validation	Test	External Validation Full	External Validation Balanced
0	28,302	6617	6270	286,058	2072
1	25,665	4949	5316	2072	2072
Total	53,967	11,566	11,586	288,130	4144

Label counts for the train, validation, and test sets, drawn from post-processed and selected MIMIC-III¹⁷ data, and for the external validation sets from University of Wisconsin-Madison Hospital. The external validation breakdown is given in two forms: at the native prevalence of the label (no selection; the true frequency of stigmatizing language is recorded as observed) and as a randomly balanced set.
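
As a rough illustration of the difference between the native-prevalence and balanced external sets, the sketch below recomputes the column totals and the share of label 1 in each split from the counts in Table 8. The helper names (`label1_share`, `balance_by_downsampling`) are hypothetical. Reading label 1 as the stigmatizing class, and producing the balanced set by randomly downsampling the majority label, are assumptions that are consistent with the counts but are not stated in the caption.

```python
import random

# Counts copied from Table 8 as (label 0, label 1) per split.
counts = {
    "Train": (28302, 25665),
    "Validation": (6617, 4949),
    "Test": (6270, 5316),
    "External Validation Full": (286058, 2072),
    "External Validation Balanced": (2072, 2072),
}

# Column totals, which should match the "Total" row of the table.
totals = {split: n0 + n1 for split, (n0, n1) in counts.items()}


def label1_share(split):
    """Fraction of examples in a split that carry label 1."""
    n0, n1 = counts[split]
    return n1 / (n0 + n1)


def balance_by_downsampling(labels, seed=0):
    """Return sorted indices of a randomly balanced subset.

    Assumed procedure (not confirmed by the source): keep as many
    examples of each label as the rarer label has, sampled at random.
    """
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    k = min(len(pos), len(neg))
    return sorted(rng.sample(pos, k) + rng.sample(neg, k))


if __name__ == "__main__":
    for split in counts:
        print(f"{split}: total={totals[split]}, "
              f"label 1 share={label1_share(split):.4f}")
```

Under these assumptions, label 1 makes up well under 1% of the full external set and exactly half of the balanced one, which is why metrics reported on the two sets are not directly comparable.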
