Fig. 5: Performance of different LLMs and LVLMs across different chapters and difficulty levels.
From: Evaluating the performance of large language & visual-language models in cervical cytology screening

a, b The performance of different models on QA (a) and VQA (b) datasets across different chapters. c, d The performance of different models on QA (c) and VQA (d) datasets across three difficulty levels: “easy”, “normal”, and “hard”. The titles of the chapters are as follows: non-neoplastic findings (Ch. 2), endometrial cells (Ch. 3), atypical squamous cells (Ch. 4), squamous epithelial cell abnormalities (Ch. 5), glandular epithelial cell abnormalities (Ch. 6), and other malignant neoplasms (Ch. 7).