Table 4. Convergence (Cohen’s weighted kappa) with the reference truth for the ASCO-SNO-ASTRO guideline

From: Large language models standardize the interpretation of complex oncology guidelines for brain metastases

| ASCO-SNO-ASTRO guideline | SoR evaluation: κ-value | SoR evaluation: 95% CI | SoR evaluation: p-value | QoE evaluation: κ-value | QoE evaluation: 95% CI | QoE evaluation: p-value |
|---|---|---|---|---|---|---|
| Nrs_a1 | 0.060 | −0.267 to 0.387 | 0.710 | 0.286 | 0.049 to 0.583 | 0.037 |
| Nrs_a2 | −0.118 | −0.421 to 0.184 | 0.353 | −0.054 | −0.554 to 0.446 | 0.756 |
| Nrs_r1 | −0.321 | −0.728 to −0.086 | 0.040 | 0.071 | −0.073 to 0.215 | 0.443 |
| Nrs_r2 | 0.428 | 0.042 to 0.813 | 0.050 | 0.644 | 0.354 to 0.934 | 0.004 |
| Rad_a1 | −0.118 | −0.293 to −0.057 | 0.353 | 0.085 | −0.030 to 0.199 | 0.279 |
| Rad_a2 | 0.025 | −0.239 to 0.289 | 0.853 | 0.106 | −0.131 to 0.342 | 0.429 |
| Rad_r1 | 0.042 | −0.043 to 0.127 | 0.488 | −0.010 | −0.186 to 0.167 | 0.913 |
| Rad_r2 | 0.066 | −0.220 to 0.352 | 0.634 | 0.133 | −0.047 to 0.314 | 0.242 |
| GPT-4o | 0.291 | 0.098 to 0.484 | 0.026 | 0.117 | −0.023 to 0.256 | 0.188 |
| Gemini | 0.060 | −0.276 to 0.396 | 0.710 | 0.286 | 0.029 to 0.542 | 0.037 |
| Copilot | −0.090 | −0.353 to 0.173 | 0.506 | −0.022 | −0.213 to 0.186 | 0.835 |
| Deepseek | 0.428 | 0.041 to 0.814 | 0.069 | 0.264 | −0.153 to 0.681 | 0.187 |

  1. Values shown in bold in the table represent the highest values achieved by participants in the relevant section.
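Each κ-value in the table measures agreement between one rater (a clinician or an LLM) and the reference truth on an ordinal scale, with partial credit for near-misses. The table does not state which weighting scheme was used, so the sketch below is a minimal illustration that assumes either linear or quadratic disagreement weights; the function name `weighted_kappa` and the example ratings are hypothetical and not taken from the study. It computes only the point estimate, not the confidence intervals or p-values reported above.

```python
def weighted_kappa(rater_a, rater_b, categories, weighting="linear"):
    """Cohen's weighted kappa for two raters on an ordinal scale (illustrative sketch).

    Disagreement weights are w_ij = |i - j| (linear) or (i - j)^2 (quadratic), and
    kappa_w = 1 - sum(w * observed) / sum(w * expected), where "expected" is the
    product of the two raters' marginal proportions.
    """
    if weighting not in ("linear", "quadratic"):
        raise ValueError("weighting must be 'linear' or 'quadratic'")
    if not rater_a or len(rater_a) != len(rater_b):
        raise ValueError("ratings must be non-empty and of equal length")

    k = len(categories)
    index = {c: i for i, c in enumerate(categories)}  # ordinal position of each category
    n = len(rater_a)

    # Observed joint proportions: observed[i][j] = share of items rated i by A and j by B.
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        observed[index[a]][index[b]] += 1 / n

    # Marginal proportions for each rater.
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(k):
        for j in range(k):
            d = abs(i - j)
            w = d if weighting == "linear" else d * d
            weighted_observed += w * observed[i][j]
            weighted_expected += w * row[i] * col[j]

    if weighted_expected == 0:
        raise ValueError("kappa is undefined when expected disagreement is zero")
    return 1 - weighted_observed / weighted_expected


# Hypothetical ratings on a three-level ordinal scale (0 < 1 < 2).
reference = [0, 1, 2, 2]
rater = [0, 2, 2, 1]
print(round(weighted_kappa(reference, rater, [0, 1, 2]), 3))               # 0.429
print(round(weighted_kappa(reference, rater, [0, 1, 2], "quadratic"), 3))  # 0.636
```

Under these assumptions, κ = 1 means perfect agreement, κ = 0 means agreement no better than chance, and negative values (as for several raters above) mean agreement worse than chance. Quadratic weights penalize large disagreements more heavily than linear weights, which is why the same ratings yield a higher κ in the second call.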