Fig. 1: Construction pipeline of CCBench and its characteristics.
From: Evaluating the performance of large language & visual-language models in cervical cytology screening

a The GPT-4-based semi-automatic pipeline for dataset construction. Textual knowledge points and image-text pairs were extracted from the TBS textbook and its online atlas. GPT-4 was then used to generate closed- and open-ended QA pairs and VQA triplets from these data, followed by manual review to ensure their quality. b Examples of a QA pair (middle) and VQA triplets (right) generated from the TBS textbook (left). c The distribution of the first three words of open-ended questions in CCBench, with word order radiating outward from the center. d The proportion of knowledge points from different chapters and sections of the textbook. e, f Distribution of question length (e) and answer length (f) in the QA and VQA datasets. g The 50 most frequent medical terms in the questions and answers of CCBench.