npj Digital Medicine

Table 3 Model performance on synthetic test data.

From: Large language models to identify social determinants of health in electronic health records

Any social determinant of health (SDoH)
Model	Parameters	Mean Macro-F1 (95% CI)	Employment (F1)	Housing (F1)	Parent (F1)	Relationship (F1)	Social support (F1)	Transportation (F1)
FT Flan-T5 XXL	11B	0.92 (0.62–0.95)	0.92	0.91	0.63	0.95	0.77	0.93
GPT3.5	175B
Zero-shot		0.84 (0.48–0.95)	0.94	0.87	0.85	0.82	0.49	0.84
10-shot		0.82 (0.60–0.90)	0.89	0.89	0.76	0.79	0.61	0.85
GPT4	Unknown
Zero-shot		0.85 (0.48–0.94)	0.94	0.83	0.72	0.88	0.49	0.86
10-shot		0.88 (0.58–0.93)	0.91	0.90	0.96	0.82	0.59	0.91

Adverse social determinants of health (SDoH)
Model	Parameters	Mean Macro-F1 (95% CI)^a	Employment (F1)	Housing (F1)	Parent (F1)	Relationship (F1)	Social support (F1)	Transportation (F1)
FT Flan-T5 XL	3B	0.86 (0.65–0.98)	0.86	0.86	0.65	0.98	0.84	0.86
GPT3.5	175B
Zero-shot		0.82 (0.51–0.95)	0.77	0.93	0.87	0.72	0.52	0.94
10-shot		0.81 (0.50–0.94)	0.93	0.83	0.78	0.70	0.50	0.93
GPT4	Unknown
Zero-shot		0.84 (0.52–0.94)	0.79	0.94	0.94	0.78	0.53	0.89
10-shot		0.90 (0.71–0.96)	0.92	0.91	0.90	0.73	0.73	0.96

The 95% confidence interval (CI) for Macro-F1 is calculated by bootstrapping with replacement 10,000 times (to achieve a bootstrap SE < 0.01). The SE of the 95% CI limits is 0.0038, ascertained by repeating the 10,000-iteration bootstrap on three distinct samples. Bolded text indicates the best performance. FT fine-tuned, CI confidence interval, SE standard error.
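
The bootstrap procedure described in the footnote can be illustrated with a minimal sketch. It is not the authors' code. It assumes a single-label, multi-class setting and a percentile interval, neither of which the footnote specifies, and the names `macro_f1` and `bootstrap_ci` are hypothetical helpers introduced here for illustration only.

```python
import random
from typing import List, Sequence, Tuple


def macro_f1(y_true: Sequence[str], y_pred: Sequence[str], labels: Sequence[str]) -> float:
    """Unweighted mean of per-label F1 scores (F1 = 2TP / (2TP + FP + FN))."""
    scores: List[float] = []
    for label in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        denom = 2 * tp + fp + fn
        # Assumption: a label absent from both truth and prediction scores 0.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def bootstrap_ci(
    y_true: Sequence[str],
    y_pred: Sequence[str],
    labels: Sequence[str],
    n_boot: int = 10_000,  # matches the 10,000 resamples in the footnote
    alpha: float = 0.05,   # 95% interval
    seed: int = 0,
) -> Tuple[float, float]:
    """Percentile bootstrap CI for Macro-F1 (interval type is an assumption)."""
    rng = random.Random(seed)
    n = len(y_true)
    stats: List[float] = []
    for _ in range(n_boot):
        # Resample example indices with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(
            macro_f1([y_true[i] for i in idx], [y_pred[i] for i in idx], labels)
        )
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lower, upper


if __name__ == "__main__":
    # Toy labels for illustration only; these are not data from the table.
    truth = ["housing", "employment", "housing", "transportation", "employment", "housing"]
    preds = ["housing", "employment", "employment", "transportation", "employment", "housing"]
    classes = ["employment", "housing", "transportation"]
    print(macro_f1(truth, preds, classes))
    print(bootstrap_ci(truth, preds, classes, n_boot=1_000))
```

Under the same assumptions, the spread of interval limits across repeated runs on distinct samples, which the footnote reports as an SE of 0.0038, could be estimated by calling `bootstrap_ci` on each sample and computing the standard error of the resulting limits.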
