Table 3 Accuracy and near-answer rates for ASCO-SNO-ASTRO guideline recommendation

ASCO guideline | SoR evaluation: Accuracy % | SoR evaluation: Near-answer % | QoE evaluation: Accuracy % | QoE evaluation: Near-answer % |
|---|---|---|---|---|
Nrs_a1 | 23.07 | 84.61 | 30.76 | 92.30 |
Nrs_a2 | 30.76 | 46.15 | 15.38 | 92.30 |
Nrs_r1 | 15.38 | 38.46 | 30.76 | 46.15 |
Nrs_r2 | 53.84 | 92.30 | 69.23 | 100 |
Rad_a1 | 23.07 | 53.84 | 23.07 | 61.53 |
Rad_a2 | 23.07 | 84.61 | 23.07 | 76.92 |
Rad_r1 | 30.76 | 61.53 | 15.38 | 61.53 |
Rad_r2 | 38.46 | 61.53 | 30.76 | 61.53 |
Medical experts mean/median | 29.80/26.91 | 65.37/61.53 | 29.80/26.91 | 65.37/69.22 |
GPT-4o | 30.76 | 100 | 23.07 | 69.23 |
Gemini | 30.76 | 76.92 | 38.46 | 84.61 |
Copilot | 7.69 | 92.30 | 7.69 | 84.61 |
Deepseek | 61.53 | 84.61 | 46.15 | 84.61 |
LLMs mean/median | 32.69/30.76 | 88.46/88.45 | 28.84/30.76 | 80.76/84.61 |
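
The mean/median summary rows can be recomputed from the per-rater rows. The sketch below does this for the medical experts' SoR accuracy column only. The dictionary name `sor_accuracy` and the helper `summarize` are illustrative, not from the source. The table does not state how values were rounded, so the comparison allows a 0.01 tolerance.

```python
from statistics import mean, median

# SoR accuracy (%) for the eight medical experts, copied from the table rows.
sor_accuracy = {
    "Nrs_a1": 23.07, "Nrs_a2": 30.76, "Nrs_r1": 15.38, "Nrs_r2": 53.84,
    "Rad_a1": 23.07, "Rad_a2": 23.07, "Rad_r1": 30.76, "Rad_r2": 38.46,
}


def summarize(values):
    """Return (mean, median) of a collection of percentages."""
    vals = list(values)
    return mean(vals), median(vals)


expert_mean, expert_median = summarize(sor_accuracy.values())

# Reported summary for this column: 29.80 / 26.91.
print(f"mean={expert_mean:.3f} median={expert_median:.3f}")
```

The same helper applies to any other column by swapping in that column's values.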