Table 18 System-level bias metrics (lower is better).

From: Ophtimus-V2-Tx: a compact domain-specific LLM for ophthalmic diagnosis and treatment planning

| System | Mapper | Bias_score \(\downarrow\) | Abstain | Error dist | Close miss | Under-spec |
|---|---|---|---|---|---|---|
| ATC | OpenAI | 0.500 | 0.291 | 2.527 | 0.702 | 0.110 |
| ATC | Claude | 0.544 | 0.077 | 2.763 | 0.647 | 0.192 |
| ATC | Gemini | 0.498 | 0.043 | 2.859 | 0.597 | 0.192 |
| ATC | Perplexity | 0.442 | 0.036 | 4.048 | 0.288 | 0.173 |
| ICD-10-CM | OpenAI | 0.259 | 0.015 | 4.497 | 0.429 | 0.010 |
| ICD-10-CM | Claude | 0.524 | 0.000 | 3.782 | 0.789 | 0.015 |
| ICD-10-CM | Gemini | 0.521 | 0.005 | 4.249 | 0.656 | 0.015 |
| ICD-10-CM | Perplexity | 0.750 | 0.434 | 3.087 | 0.804 | 0.018 |
| ICD-10-PCS | OpenAI | 0.359 | 0.032 | 2.571 | 0.413 | n/a |
| ICD-10-PCS | Claude | 0.369 | 0.003 | 1.802 | 0.768 | n/a |
| ICD-10-PCS | Gemini | 0.320 | 0.029 | 2.015 | 0.604 | n/a |
| ICD-10-PCS | Perplexity | 0.652 | 0.395 | 1.710 | 0.752 | n/a |

  1. Error dist: hierarchical distance for ATC and ICD-10-CM; Hamming distance (0–7) for ICD-10-PCS. Close miss: ATC = L2 match; ICD-10-CM = 3-character category match; ICD-10-PCS = first-3-axes match. Under-spec: ATC = \(1-\text {L5 specificity}\); ICD-10-CM = \(1-\text {full specificity (7-char)}\); ICD-10-PCS = not applicable.
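
The footnote's per-code definitions can be illustrated with a minimal Python sketch. This is not the authors' implementation. The function names and example code strings are hypothetical, and two readings are assumed: the ATC L2 prefix is the first three characters of a code, and an exact match also counts as a close miss (the source does not say whether exact matches are excluded).

```python
def pcs_hamming_distance(pred: str, gold: str) -> int:
    """Count differing positions between two 7-character ICD-10-PCS codes (0-7)."""
    if len(pred) != 7 or len(gold) != 7:
        raise ValueError("ICD-10-PCS codes are expected to have 7 characters")
    return sum(p != g for p, g in zip(pred, gold))


def pcs_close_miss(pred: str, gold: str) -> bool:
    """First-3-axes match: the first three characters agree."""
    return pred[:3] == gold[:3]


def cm_close_miss(pred: str, gold: str) -> bool:
    """3-character category match for ICD-10-CM (the dot is ignored)."""
    return pred.replace(".", "")[:3] == gold.replace(".", "")[:3]


def atc_close_miss(pred: str, gold: str) -> bool:
    """ATC L2 match, assuming L2 corresponds to the first three characters."""
    return pred[:3] == gold[:3]


if __name__ == "__main__":
    # Illustrative code strings only; they are not taken from the paper's data.
    print(pcs_hamming_distance("08RJ3JZ", "08RK3JZ"))  # differs in one axis
    print(pcs_close_miss("08RJ3JZ", "08RK3JZ"))
    print(cm_close_miss("H25.11", "H25.12"))
    print(atc_close_miss("S01EE01", "S01ED51"))
```

Averaging such per-code values over all mapped items would give system-level figures of the kind tabulated above, although the exact aggregation used in the paper is not specified in this table.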