Table 2 Performance of different fine-tuned models for seizure frequency attribute extraction on the test set

Model	Precision (%)	Recall (%)	F1-score (%)
bert-large-cased	87.19 ± 1.74	90.90 ± 1.35	89.00 ± 1.43
biobert-large-cased	87.45 ± 1.68	90.28 ± 1.34	88.84 ± 1.38
Bio_ClinicalBERT	83.98 ± 1.97	88.05 ± 1.56	85.96 ± 1.67
Llama-2-70b-hf	84.64 ± 2.68	85.83 ± 2.17	85.23 ± 2.33
GPT-3.5 Turbo	88.99 ± 1.62	90.23 ± 1.70	87.91 ± 1.61
GPT-4	90.23 ± 1.70	93.51 ± 1.21	91.84 ± 1.36
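
As a reading aid, the F1-score column can be cross-checked against the precision and recall columns, since F1 is by definition the harmonic mean of the two. The sketch below is an illustration only and is not part of the original study: the names `TABLE_2`, `f1_from`, `inconsistent_rows`, and the 0.5-point tolerance are assumptions chosen for this example. Because the table reports averaged values with a spread, the reported F1 need not match the harmonic mean of the averaged precision and recall exactly, so a small deviation is expected.

```python
# Reported mean values (%) copied from Table 2: (precision, recall, F1).
# The "± spread" part of each cell is deliberately ignored in this sketch.
TABLE_2 = {
    "bert-large-cased": (87.19, 90.90, 89.00),
    "biobert-large-cased": (87.45, 90.28, 88.84),
    "Bio_ClinicalBERT": (83.98, 88.05, 85.96),
    "Llama-2-70b-hf": (84.64, 85.83, 85.23),
    "GPT-3.5 Turbo": (88.99, 90.23, 87.91),
    "GPT-4": (90.23, 93.51, 91.84),
}

# Hypothetical threshold in percentage points, chosen only for illustration.
TOLERANCE = 0.5


def f1_from(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall (same units as inputs)."""
    return 2 * precision * recall / (precision + recall)


def inconsistent_rows(table: dict, tol: float = TOLERANCE) -> list:
    """List models whose reported F1 differs from the recomputed value by more than tol."""
    return [
        model
        for model, (p, r, f1) in table.items()
        if abs(f1_from(p, r) - f1) > tol
    ]


if __name__ == "__main__":
    for model, (p, r, f1) in TABLE_2.items():
        print(f"{model:22s} recomputed F1 = {f1_from(p, r):.2f}, reported = {f1:.2f}")
    print("Rows outside tolerance:", inconsistent_rows(TABLE_2))
```

With the values as printed, only the GPT-3.5 Turbo row falls outside this tolerance (a recomputed F1 of about 89.61 against a reported 87.91). Whether that reflects a transcription error or the averaging procedure cannot be determined from the table alone.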
