Table 2 Performance of different fine-tuned models for seizure frequency attribute extraction on the test set
| Model | Precision (%) | Recall (%) | F1-score (%) |
|---|---|---|---|
| bert-large-cased | 87.19 ± 1.74 | 90.90 ± 1.35 | 89.00 ± 1.43 |
| biobert-large-cased | 87.45 ± 1.68 | 90.28 ± 1.34 | 88.84 ± 1.38 |
| Bio_ClinicalBERT | 83.98 ± 1.97 | 88.05 ± 1.56 | 85.96 ± 1.67 |
| Llama-2-70b-hf | 84.64 ± 2.68 | 85.83 ± 2.17 | 85.23 ± 2.33 |
| GPT-3.5 Turbo | 88.99 ± 1.62 | 90.23 ± 1.70 | 87.91 ± 1.61 |
| GPT-4 | 90.23 ± 1.70 | 93.51 ± 1.21 | 91.84 ± 1.36 |
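
For readers relating the columns of Table 2 to one another: F1 is conventionally the harmonic mean of precision and recall. The sketch below assumes that standard definition and applies it to the mean precision and recall of one row. The table does not state how the F1 column or the ± values were aggregated (for example, computed per run and then averaged), so the result need not match the reported entry exactly. The helper name `f1_score` is illustrative only.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same units as the inputs)."""
    if precision + recall == 0:
        # Avoid division by zero when both inputs are zero.
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Mean precision and recall for bert-large-cased, taken from Table 2 (in %).
bert_f1 = f1_score(87.19, 90.90)
print(f"bert-large-cased F1 from mean P and R: {bert_f1:.2f}%")
```

Under this assumption the value for bert-large-cased lands within about 0.01 percentage points of the tabulated F1-score.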