Fig. 3

Performance comparison of classical machine learning (CML) models and fine-tuned large language models (LLMs) in COVID-19 mortality prediction across varying training sample sizes. F1 scores and accuracy are shown for seven CML models (logistic regression, support vector machines, decision trees, k-nearest neighbors, random forests, XGBoost, and neural networks) and a fine-tuned LLM. Training sample sizes range from 20 to 2047. XGBoost consistently outperforms the other models, with performance improving as the training sample size increases.