Table 4 Hierarchical F1 scores on CliBench for all tasks (primary diagnosis, secondary diagnosis, medication, and surgical procedure), with 95% confidence intervals (CI) in brackets.
From: Ophtimus-V2-Tx: a compact domain-specific LLM for ophthalmic diagnosis and treatment planning
| # | Model | Task | L1 | L2 | L3 | L4 | Full |
|---|---|---|---|---|---|---|---|
| 1 | OpenAI API (ChatGPT-4o) | Primary | 0.74 [0.72, 0.76] | 0.57 [0.55, 0.60] | 0.52 [0.49, 0.54] | 0.38 [0.36, 0.41] | 0.26 [0.24, 0.28] |
| | | Secondary | 0.56 [0.53, 0.59] | 0.31 [0.28, 0.34] | 0.25 [0.22, 0.28] | 0.17 [0.15, 0.19] | 0.12 [0.10, 0.14] |
| | | Medication | 0.66 [0.63, 0.69] | 0.63 [0.60, 0.66] | 0.51 [0.48, 0.54] | 0.44 [0.41, 0.47] | 0.32 [0.30, 0.35] |
| | | Surgical | 0.75 [0.72, 0.78] | 0.62 [0.58, 0.65] | 0.35 [0.31, 0.39] | – | 0.16 [0.12, 0.19] |
| 2 | LLaMA 3.1 8B Instruct | Primary | 0.63 [0.60, 0.65] | 0.45 [0.42, 0.47] | 0.38 [0.36, 0.41] | 0.27 [0.25, 0.29] | 0.18 [0.16, 0.20] |
| | | Secondary | 0.33 [0.31, 0.35] | 0.23 [0.21, 0.25] | 0.17 [0.15, 0.19] | 0.14 [0.12, 0.16] | 0.12 [0.10, 0.14] |
| | | Medication | 0.47 [0.45, 0.50] | 0.45 [0.42, 0.48] | 0.36 [0.33, 0.39] | 0.31 [0.29, 0.34] | 0.22 [0.19, 0.24] |
| | | Surgical | 0.58 [0.56, 0.61] | 0.29 [0.26, 0.31] | 0.13 [0.11, 0.14] | – | 0.06 [0.05, 0.07] |
| 3 | LLaMA 3.1 8B Ophthalmic Instruct | Primary | 0.72 [0.69, 0.74] | 0.56 [0.54, 0.59] | 0.50 [0.48, 0.52] | 0.40 [0.37, 0.42] | 0.24 [0.22, 0.26] |
| | | Secondary | 0.39 [0.37, 0.41] | 0.26 [0.24, 0.29] | 0.21 [0.19, 0.23] | 0.17 [0.16, 0.19] | 0.14 [0.13, 0.16] |
| | | Medication | 0.54 [0.52, 0.57] | 0.52 [0.49, 0.55] | 0.41 [0.39, 0.44] | 0.37 [0.35, 0.40] | 0.28 [0.26, 0.31] |
| | | Surgical | 0.66 [0.63, 0.68] | 0.40 [0.38, 0.42] | 0.20 [0.18, 0.22] | – | 0.09 [0.08, 0.10] |
| 4 | Ophtimus-V2-Inst | Primary | 0.67 [0.65, 0.70] | 0.50 [0.48, 0.53] | 0.42 [0.40, 0.45] | 0.31 [0.28, 0.33] | 0.17 [0.15, 0.19] |
| | | Secondary | 0.37 [0.34, 0.40] | 0.23 [0.21, 0.26] | 0.19 [0.16, 0.22] | 0.15 [0.13, 0.18] | 0.12 [0.09, 0.14] |
| | | Medication | 0.52 [0.49, 0.55] | 0.49 [0.46, 0.52] | 0.39 [0.36, 0.43] | 0.34 [0.31, 0.37] | 0.25 [0.22, 0.27] |
| | | Surgical | 0.62 [0.59, 0.64] | 0.39 [0.36, 0.41] | 0.20 [0.18, 0.22] | – | 0.10 [0.09, 0.12] |
| 5 | Ophtimus-V2-Tx | Primary | 0.73 [0.71, 0.75] | 0.58 [0.56, 0.61] | 0.51 [0.48, 0.53] | 0.40 [0.37, 0.42] | 0.23 [0.21, 0.25] |
| | | Secondary | 0.36 [0.33, 0.38] | 0.25 [0.23, 0.27] | 0.20 [0.18, 0.22] | 0.17 [0.15, 0.19] | 0.15 [0.13, 0.16] |
| | | Medication | 0.55 [0.52, 0.57] | 0.52 [0.49, 0.55] | 0.43 [0.40, 0.46] | 0.40 [0.37, 0.42] | 0.31 [0.28, 0.33] |
| | | Surgical | 0.75 [0.72, 0.77] | 0.42 [0.39, 0.46] | 0.25 [0.22, 0.28] | – | 0.13 [0.10, 0.15] |
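The caption reports hierarchical F1 with 95% CIs but does not describe how these are computed. The sketch below shows one plausible approach. It assumes that each case has a set of predicted codes and a set of gold codes, and that level k compares codes after truncating them to a coarser prefix. It also assumes that CIs come from a percentile bootstrap over cases. Several details are illustrative assumptions, not the CliBench or Ophtimus implementation:

- the hypothetical prefix-length mapping in `truncate`;
- averaging F1 per case rather than pooling counts;
- the resampling settings.

```python
import random
from statistics import mean


def truncate(code, level):
    """Hypothetical level mapping (an assumption, not CliBench's definition):
    keep the first `level` characters of the dot-free code; None = full code."""
    code = code.replace(".", "")
    return code if level is None else code[:level]


def set_f1(pred, gold):
    """F1 between two label sets; defined as 1.0 when both sets are empty."""
    if not pred and not gold:
        return 1.0
    overlap = len(pred & gold)
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(gold)
    return 2 * precision * recall / (precision + recall)


def hierarchical_f1(cases, level):
    """Per-case F1 after truncating every predicted and gold code to `level`."""
    scores = []
    for pred_codes, gold_codes in cases:
        pred = {truncate(c, level) for c in pred_codes}
        gold = {truncate(c, level) for c in gold_codes}
        scores.append(set_f1(pred, gold))
    return scores


def bootstrap_ci(scores, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the mean of per-case scores (assumed method)."""
    rng = random.Random(seed)
    n = len(scores)
    # Resample cases with replacement and record the mean of each resample.
    means = sorted(mean(rng.choices(scores, k=n)) for _ in range(n_resamples))
    lo = means[int((alpha / 2) * n_resamples)]
    hi = means[int((1 - alpha / 2) * n_resamples) - 1]
    return mean(scores), lo, hi


# Toy cases (not from the paper): (predicted codes, gold codes) per case.
cases = [
    (["H40.1131"], ["H40.1132"]),  # agree on the first six characters only
    (["H25.11"], ["H26.9"]),       # agree on the first two characters only
]
for level in (1, 2, 3, 4, None):
    print(level, mean(hierarchical_f1(cases, level)))
```

On these toy cases, the mean F1 is 1.0 at levels 1 and 2, 0.5 at levels 3 and 4, and 0.0 for full codes. This mirrors the coarse-to-fine decline visible across the L1 to Full columns of the table.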