Table 5 Comparison of evaluation datasets used for ophthalmology-related model benchmarking.
From: Ophtimus-V2-Tx: a compact domain-specific LLM for ophthalmic diagnosis and treatment planning
| Attribute | Ophtimus-Eval-V1 | MedMCQA (Ophth. subset) | PubMedQA (Ophth. subset) |
|---|---|---|---|
| Domain | Ophthalmology (19 subfields) | Medical QA (ophthalmology subset) | Biomedical literature (ophthalmology subset) |
| Source | Educational websites (e.g., academia.edu) | NEET-PG (India’s national medical exam) | PubMed |
| Size | 2154 questions | 6932 questions | 297 questions |
| Format | MCQ (4 options) | MCQ (4 options, some with explanations) | NLI (Yes/No/Maybe) |
| Purpose | Domain-specific MCQA benchmark; fine-grained topic analysis | Assess domain transfer and robustness in ophthalmology QA | Evaluate clinical reasoning and inference from literature |
| Access | Restricted (available upon request) | Public | Public |
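The "Format" row implies two different label spaces: four lettered options for the MCQ sets and a three-way Yes/No/Maybe answer for the PubMedQA subset. The sketch below shows one way predictions from both formats could be scored with exact-match accuracy. It is illustrative only: the `Item` structure, the `normalize` and `accuracy` helpers, the option letters A to D, and the choice of accuracy as the metric are all assumptions, not the scoring procedure described by the source.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

# Hypothetical label spaces implied by the "Format" row of Table 5.
MCQ_LABELS = ("A", "B", "C", "D")    # 4-option multiple choice (assumed lettering)
YNM_LABELS = ("yes", "no", "maybe")  # PubMedQA-style three-way answer


@dataclass
class Item:
    """A hypothetical scored item: the gold label and the model's raw prediction."""
    gold: str
    pred: str


def normalize(raw: str, labels: Sequence[str]) -> Optional[str]:
    """Map a raw string to a valid label (case- and whitespace-insensitive), else None."""
    cleaned = raw.strip().lower()
    for candidate in labels:
        if cleaned == candidate.lower():
            return candidate
    return None


def accuracy(items: Sequence[Item], labels: Sequence[str]) -> float:
    """Exact-match accuracy; predictions outside the label space count as wrong."""
    if not items:
        raise ValueError("no items to score")
    correct = 0
    for it in items:
        pred = normalize(it.pred, labels)
        # A None prediction never matches, even if the gold label is also invalid.
        if pred is not None and pred == normalize(it.gold, labels):
            correct += 1
    return correct / len(items)


if __name__ == "__main__":
    mcq = [Item("A", "a"), Item("C", "B"), Item("D", " D ")]
    ynm = [Item("yes", "Yes"), Item("maybe", "no"), Item("no", "unsure")]
    print(accuracy(mcq, MCQ_LABELS))  # 2 of 3 correct
    print(accuracy(ynm, YNM_LABELS))  # 1 of 3 correct
```

Treating an unparseable answer (such as "unsure") as incorrect, rather than dropping it, keeps the denominator equal to the dataset size, so scores stay comparable with the question counts in the "Size" row.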