Table 3 Model performance on synthetic test data.

From: Large language models to identify social determinants of health in electronic health records

Any social determinant of health (SDoH)

| Model | Parameters | Mean Macro-F1 (95% CI) | Employment (F1) | Housing (F1) | Parent (F1) | Relationship (F1) | Social support (F1) | Transportation (F1) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| FT Flan-T5 XXL | 11B | 0.92 (0.62–0.95) | 0.92 | 0.91 | 0.63 | 0.95 | 0.77 | 0.93 |
| GPT3.5, zero-shot | 175B | 0.84 (0.48–0.95) | 0.94 | 0.87 | 0.85 | 0.82 | 0.49 | 0.84 |
| GPT3.5, 10-shot | 175B | 0.82 (0.60–0.90) | 0.89 | 0.89 | 0.76 | 0.79 | 0.61 | 0.85 |
| GPT4, zero-shot | Unknown | 0.85 (0.48–0.94) | 0.94 | 0.83 | 0.72 | 0.88 | 0.49 | 0.86 |
| GPT4, 10-shot | Unknown | 0.88 (0.58–0.93) | 0.91 | 0.90 | 0.96 | 0.82 | 0.59 | 0.91 |

Adverse social determinants of health (SDoH)

| Model | Parameters | Mean Macro-F1 (95% CI)^a | Employment (F1) | Housing (F1) | Parent (F1) | Relationship (F1) | Social support (F1) | Transportation (F1) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| FT Flan-T5 XL | 3B | 0.86 (0.65–0.98) | 0.86 | 0.86 | 0.65 | 0.98 | 0.84 | 0.86 |
| GPT3.5, zero-shot | 175B | 0.82 (0.51–0.95) | 0.77 | 0.93 | 0.87 | 0.72 | 0.52 | 0.94 |
| GPT3.5, 10-shot | 175B | 0.81 (0.50–0.94) | 0.93 | 0.83 | 0.78 | 0.70 | 0.50 | 0.93 |
| GPT4, zero-shot | Unknown | 0.84 (0.52–0.94) | 0.79 | 0.94 | 0.94 | 0.78 | 0.53 | 0.89 |
| GPT4, 10-shot | Unknown | 0.90 (0.71–0.96) | 0.92 | 0.91 | 0.90 | 0.73 | 0.73 | 0.96 |

a. The 95% confidence interval (CI) for Macro-F1 was calculated by bootstrapping with replacement 10,000 times (to achieve a bootstrap SE < 0.01). The SE of the 95% CI limits is 0.0038, ascertained by repeating the 10,000-iteration bootstrap on three distinct samples. Bolded text indicates the best performance. FT fine-tuned, CI confidence interval, SE standard error.
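
The bootstrap procedure described in the footnote can be illustrated with a minimal sketch. This is not the authors' code. It assumes a single-label setting for simplicity (the study's task may be multi-label), a percentile interval (the footnote does not say which interval type was used), and hypothetical function names `macro_f1` and `bootstrap_ci`. Only the resampling with replacement and the 10,000 iterations come from the footnote.

```python
import random
from statistics import mean


def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of the per-label F1 scores."""
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        # Assumed convention: F1 is 0 when a label never appears
        # in either the gold or the predicted labels of the sample.
        scores.append(2 * tp / denom if denom else 0.0)
    return mean(scores)


def bootstrap_ci(y_true, y_pred, labels, n_boot=10_000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for Macro-F1 (interval type is an assumption)."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    for _ in range(n_boot):
        # Resample example indices with replacement, keeping pairs aligned.
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(
            macro_f1([y_true[i] for i in idx], [y_pred[i] for i in idx], labels)
        )
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lower, upper


# Toy, made-up labels purely for illustration (not data from the study).
gold = ["housing", "employment", "housing", "none", "none", "employment"]
pred = ["housing", "employment", "none", "none", "none", "housing"]
interval = bootstrap_ci(gold, pred, ["housing", "employment", "none"], n_boot=1000)
```

Fixing the seed makes the interval reproducible. Repeating the whole procedure on several independent samples, as the footnote describes, is one way to estimate how much the interval limits themselves vary.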