Table 1. Performance of various fine-tuned models for seizure frequency phrase extraction on the test set
| Model | Precision (%) | Recall (%) | F1-score (%) |
|---|---|---|---|
| bert-large-cased | 77.33 ± 4.24 | 71.95 ± 4.65 | 74.51 ± 4.2 |
| biobert-large-cased | 78.83 ± 4.06 | 75.43 ± 4.27 | 77.06 ± 3.86 |
| Bio_ClinicalBERT | 70.12 ± 4.79 | 65.8 ± 4.58 | 67.84 ± 4.29 |
| Llama-2-70b-hf | 80.72 ± 4.16 | 80.69 ± 3.65 | 80.68 ± 3.58 |
| GPT-3.5 Turbo | 84.53 ± 3.85 | 77.13 ± 4.15 | 80.64 ± 3.81 |
| GPT-4 | 86.61 ± 4.28 | 85.04 ± 3.51 | 85.79 ± 3.59 |
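
As a rough consistency check on the table, the sketch below recomputes F1 from the mean precision and recall, assuming the standard definition of F1 as their harmonic mean. The names `f1_score` and `table1` are illustrative and not taken from the study. The table does not state how the ± intervals or the mean F1-scores were obtained, so the recomputed values should only be expected to land close to the reported ones, not to match them exactly.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same units as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Mean precision and recall (%) copied from two rows of Table 1.
# Assumption: F1 is the usual harmonic mean; the averaging procedure
# behind the reported F1 column is not specified in the table.
table1 = {
    "bert-large-cased": (77.33, 71.95),
    "GPT-4": (86.61, 85.04),
}

for model, (p, r) in table1.items():
    print(f"{model}: recomputed F1 = {f1_score(p, r):.2f}%")
```

A small gap between the recomputed and reported F1 would be expected if, for example, F1 was averaged over repeated samples instead of being derived from the mean precision and recall.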