Table 5 Comparison of evaluation datasets used for ophthalmology-related model benchmarking.

From: Ophtimus-V2-Tx: a compact domain-specific LLM for ophthalmic diagnosis and treatment planning

| Attribute | Ophtimus-Eval-V1 | MedMCQA (Ophth. Subset) | PubMedQA (Ophth. Subset) |
| --- | --- | --- | --- |
| Domain | Ophthalmology (19 subfields) | Medical QA (Ophthalmology subset) | Biomedical Literature (Ophthalmology subset) |
| Source | Educational websites (e.g., academia.edu) | NEET-PG (India’s national medical exam) | PubMed |
| Size | 2154 questions | 6932 questions | 297 questions |
| Format | MCQ (4 options) | MCQ (4 options, some with explanations) | NLI (Yes/No/Maybe) |
| Purpose | Domain-specific MCQA benchmark; fine-grained topic analysis | Assess domain transfer and robustness in ophthalmology QA | Evaluate clinical reasoning and inference from literature |
| Access | Restricted (available upon request) | Public | Public |