Table 3 Model performance on synthetic test data.
From: Large language models to identify social determinants of health in electronic health records

Any social determinant of health (SDoH)

| Model | Parameters | Mean Macro-F1 (95% CI) | Employment (F1) | Housing (F1) | Parent (F1) | Relationship (F1) | Social support (F1) | Transportation (F1) |
|---|---|---|---|---|---|---|---|---|
| FT Flan-T5 XXL | 11B | 0.92 (0.62–0.95) | 0.92 | 0.91 | 0.63 | 0.95 | 0.77 | 0.93 |
| GPT3.5, zero-shot | 175B | 0.84 (0.48–0.95) | 0.94 | 0.87 | 0.85 | 0.82 | 0.49 | 0.84 |
| GPT3.5, 10-shot | 175B | 0.82 (0.60–0.90) | 0.89 | 0.89 | 0.76 | 0.79 | 0.61 | 0.85 |
| GPT4, zero-shot | Unknown | 0.85 (0.48–0.94) | 0.94 | 0.83 | 0.72 | 0.88 | 0.49 | 0.86 |
| GPT4, 10-shot | Unknown | 0.88 (0.58–0.93) | 0.91 | 0.90 | 0.96 | 0.82 | 0.59 | 0.91 |

Adverse social determinants of health (SDoH)

| Model | Parameters | Mean Macro-F1 (95% CI)<sup>a</sup> | Employment (F1) | Housing (F1) | Parent (F1) | Relationship (F1) | Social support (F1) | Transportation (F1) |
|---|---|---|---|---|---|---|---|---|
| FT Flan-T5 XL | 3B | 0.86 (0.65–0.98) | 0.86 | 0.86 | 0.65 | 0.98 | 0.84 | 0.86 |
| GPT3.5, zero-shot | 175B | 0.82 (0.51–0.95) | 0.77 | 0.93 | 0.87 | 0.72 | 0.52 | 0.94 |
| GPT3.5, 10-shot | 175B | 0.81 (0.50–0.94) | 0.93 | 0.83 | 0.78 | 0.70 | 0.50 | 0.93 |
| GPT4, zero-shot | Unknown | 0.84 (0.52–0.94) | 0.79 | 0.94 | 0.94 | 0.78 | 0.53 | 0.89 |
| GPT4, 10-shot | Unknown | 0.90 (0.71–0.96) | 0.92 | 0.91 | 0.90 | 0.73 | 0.73 | 0.96 |