Table 5 Comparison of evaluation datasets used for ophthalmology-related model benchmarking.

Attribute	Ophtimus-Eval-V1	MedMCQA (Ophth. Subset)	PubMedQA (Ophth. Subset)
Domain	Ophthalmology (19 subfields)	Medical QA (ophthalmology subset)	Biomedical literature (ophthalmology subset)
Source	Educational websites (e.g., academia.edu)	NEET-PG (India’s national medical exam)	PubMed
Size	2154 questions	6932 questions	297 questions
Format	MCQ (4 options)	MCQ (4 options, some with explanations)	NLI (Yes/No/Maybe)
Purpose	Domain-specific MCQA benchmark; fine-grained topic analysis	Assess domain transfer and robustness in ophthalmology QA	Evaluate clinical reasoning and inference from literature
Access	Restricted (available upon request)	Public	Public
