Table 22 Pairwise inter-mapper agreement (Cohen’s \(\kappa\)/observed agreement).

From: Ophtimus-V2-Tx: a compact domain-specific LLM for ophthalmic diagnosis and treatment planning

| Pair | ATC (\(\kappa\)/%) | ICD-10-CM (\(\kappa\)/%) | ICD-10-PCS (\(\kappa\)/%) | Avg. \(\kappa\) |
|---|---|---|---|---|
| Claude–Perplexity | 0.877/89.4 | 0.757/75.9 | 0.736/73.7 | 0.790 |
| Gemini–Perplexity | 0.912/92.4 | 0.673/67.6 | 0.586/58.8 | 0.724 |
| Claude–Gemini | 0.869/88.7 | 0.622/62.4 | 0.594/59.6 | 0.695 |
| OpenAI–Perplexity | 0.898/91.2 | 0.469/47.3 | 0.468/47.0 | 0.611 |
| OpenAI–Claude | 0.913/92.5 | 0.412/41.5 | 0.428/43.0 | 0.585 |
| OpenAI–Gemini | 0.924/93.5 | 0.358/36.1 | 0.352/35.5 | 0.545 |

  1. Each cell uses only items for which both mappers in the pair emitted valid coarse labels (“X” excluded), so \(N_{\text{used}}\) differs by pair and coding system. Coarse granularity as in Table 21.
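As a minimal sketch (not the authors' code) of how such pairwise figures can be computed, the following uses scikit-learn's cohen_kappa_score and drops items where either mapper emitted the invalid marker “X”, so that \(N_{\text{used}}\) varies by pair and coding system as in the footnote. The function name, invalid-label handling, and toy labels are illustrative assumptions, not taken from the paper.

```python
# Sketch: pairwise Cohen's kappa and observed agreement between two mappers,
# excluding items where either mapper emitted the invalid marker "X".
from sklearn.metrics import cohen_kappa_score


def pairwise_agreement(labels_a, labels_b, invalid="X"):
    """Return (kappa, observed agreement %, N_used) for one coding system.

    labels_a, labels_b: coarse labels assigned by two mappers to the same items.
    Items where either mapper produced `invalid` are dropped, so N_used
    differs by pair and by coding system.
    """
    kept = [(a, b) for a, b in zip(labels_a, labels_b)
            if a != invalid and b != invalid]
    if not kept:
        return float("nan"), float("nan"), 0
    a_kept, b_kept = zip(*kept)
    kappa = cohen_kappa_score(a_kept, b_kept)
    observed_pct = 100.0 * sum(a == b for a, b in kept) / len(kept)
    return kappa, observed_pct, len(kept)


# Hypothetical usage with toy ATC coarse labels for two mappers:
claude = ["S01E", "S01A", "X", "S01B", "S01E"]
gemini = ["S01E", "S01A", "S01C", "S01B", "S01A"]
print(pairwise_agreement(claude, gemini))  # (kappa, observed %, N_used)
```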