Fig. 2: The performance of the seizure frequency attribute extraction models for different training set sizes.

This figure shows the F1-scores achieved by the seizure frequency attribute extraction models as a function of training set size. The GPT models follow a similar trend to one another, with a notable increase in F1-score when the training set size is increased from 170 to 270. The BERT models and Llama-2 also share a trend, with a notable increase in performance when the training set size is increased from 70 to 170.