Table 2 Results of the best-performing models on the out-of-domain test datasets.
From: Large language models to identify social determinants of health in electronic health records
Any social determinant of health (SDoH)

| Dataset | Model | Macro-F1, mean (95% CI) | Delta F1 | P value | No SDoH (F1) | Employment (F1) | Housing (F1) | Parent (F1) | Relationship (F1) | Social support (F1) | Transportation (F1) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Immunotherapy | FlanXXL: Gold data only | 0.70 (0.63–0.76) |  |  | 0.99 | 0.83 | 0.55 | 0.69 | 0.93 | 0.46 | 0.46 |
| Immunotherapy | FlanXXL: Gold + synthetic data | 0.71 (0.64–0.76) | +0.01 | <0.01 | 0.99 | 0.79 | 0.55 | 0.68 | 0.91 | 0.63 | 0.40 |
| MIMIC-III | FlanXXL: Gold data only | 0.57 (0.49–0.63) |  |  | 0.98 | 0.65 | 0.00 | 0.63 | 0.91 | 0.32 | 0.50 |
| MIMIC-III | FlanXXL: Gold + synthetic data | 0.55 (0.49–0.61) | −0.02 | <0.01 | 0.98 | 0.69 | 0.24 | 0.44 | 0.91 | 0.33 | 0.24 |


Adverse social determinants of health (SDoH)

| Dataset | Model | Macro-F1, mean (95% CI)ᵃ | Delta F1ᵇ | P value | No SDoH (F1) | Employment (F1) | Housing (F1) | Parent (F1) | Relationship (F1) | Social support (F1) | Transportation (F1) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Immunotherapy | FlanXL: Gold data only | 0.63 (0.54–0.72) |  |  | 1.00 | 0.56 | 0.46 | 0.68 | 0.81 | 0.50 | 0.46 |
| Immunotherapy | FlanXL: Gold + synthetic data | 0.66 (0.58–0.72) | +0.03 | <0.01 | 1.00 | 0.60 | 0.63 | 0.60 | 0.81 | 0.59 | 0.40 |
| MIMIC-III | FlanXL: Gold data only | 0.53 (0.47–0.60) |  |  | 0.99 | 0.51 | 0.50 | 0.53 | 0.65 | 0.22 | 0.20 |
| MIMIC-III | FlanXL: Gold + synthetic data | 0.51 (0.43–0.59) | −0.02 | <0.01 | 0.99 | 0.55 | 0.35 | 0.54 | 0.68 | 0.43 | 0.20 |
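The Macro-F1 column can be read as the unweighted mean of the seven per-class F1 scores, and Delta F1 as the gold + synthetic macro-F1 minus the gold-only macro-F1. The sketch below checks both readings against the Immunotherapy rows of the Any SDoH section. It is a minimal illustration under that assumed definition, not code from the study: `macro_f1` and `delta_f1` are hypothetical helper names, and because the table reports a mean with a 95% CI, the reported macro-F1 need not equal the mean of the rounded per-class point estimates in every row.

```python
from statistics import fmean

# Per-class F1 scores copied from Table 2 (Any SDoH, Immunotherapy dataset, FlanXXL).
# Order: No SDoH, Employment, Housing, Parent, Relationship, Social support, Transportation.
GOLD_ONLY = [0.99, 0.83, 0.55, 0.69, 0.93, 0.46, 0.46]
GOLD_PLUS_SYNTHETIC = [0.99, 0.79, 0.55, 0.68, 0.91, 0.63, 0.40]


def macro_f1(per_class_f1: list[float]) -> float:
    """Hypothetical helper: unweighted mean of per-class F1 (each class counts equally)."""
    return fmean(per_class_f1)


def delta_f1(baseline: list[float], augmented: list[float]) -> float:
    """Hypothetical helper: macro-F1 change from adding synthetic data (augmented minus baseline)."""
    return macro_f1(augmented) - macro_f1(baseline)


if __name__ == "__main__":
    print(f"Gold only:        {macro_f1(GOLD_ONLY):.2f}")
    print(f"Gold + synthetic: {macro_f1(GOLD_PLUS_SYNTHETIC):.2f}")
    print(f"Delta F1:         {delta_f1(GOLD_ONLY, GOLD_PLUS_SYNTHETIC):+.2f}")
```

Running it prints 0.70, 0.71 and +0.01, which agree with the reported Immunotherapy values for FlanXXL. Because macro-F1 weights every category equally, a large swing in a single category (for example, Social support rising from 0.46 to 0.63) moves the overall score even when the No SDoH class is unchanged.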