Table 1 Data Demographics.

From: Detecting stigmatizing language in clinical notes with large language models for addiction care

| Demographic Characteristic | Selected MIMIC Data (n = 77,104) | External Validation Full (n = 288,130) | External Validation Down-Sampled (n = 4,144) |
| --- | --- | --- | --- |
| Age (yrs), mean (SD) | 56.37 (41.49) | 59 (16) | 55 (15) |
| Sex, n (%) | | | |
| Male | 47,844 (62%) | 176,285 (61%) | 2,614 (63%) |
| Ethnicity, n (%) | | | |
| Black or African American | 7,298 (9.47%) | 1,799 (4.41%) | 208 (5.02%) |
| Pacific Islander or Hawaiian Native | 12 (0.017%) | 67 (0.16%) | 13 (0.31%) |
| White or Caucasian | 51,994 (67.43%) | 36,363 (89.04%) | 3,641 (87.86%) |
| American Indian or Alaska Native | 76 (0.099%) | 299 (0.73%) | 35 (0.84%) |
| Asian or Mideast Indian | 1,774 (2.30%) | 540 (1.32%) | 42 (1.01%) |
| Hispanic or Latino | 2,319 (3.01%) | 992 (2.43%) | 106 (2.56%) |
| Unknown or Declined | 7,868 (10.20%) | 678 (1.66%) | 88 (2.12%) |
| Non-Hispanic/Latino | 328 (0.46%) | 100 (0.24%) | 11 (0.27%) |

Demographics of post-selected and labeled MIMIC-III data [17] (train, test, and validation sets combined), External Validation Full (UW), and External Validation Down-Sampled/balanced (UW).