Table 4 Convergence (Cohen’s weighted kappa) with the reference truth for the ASCO-SNO-ASTRO guideline
| Evaluator | SoR κ-value | SoR 95% CI | SoR p-value | QoE κ-value | QoE 95% CI | QoE p-value |
|---|---|---|---|---|---|---|
| Nrs_a1 | 0.060 | −0.267 to 0.387 | 0.710 | 0.286 | 0.049 to 0.583 | 0.037 |
| Nrs_a2 | −0.118 | −0.421 to 0.184 | 0.353 | −0.054 | −0.554 to 0.446 | 0.756 |
| Nrs_r1 | −0.321 | −0.728 to −0.086 | 0.040 | 0.071 | −0.073 to 0.215 | 0.443 |
| Nrs_r2 | 0.428 | 0.042 to 0.813 | 0.050 | 0.644 | 0.354 to 0.934 | 0.004 |
| Rad_a1 | −0.118 | −0.293 to −0.057 | 0.353 | 0.085 | −0.030 to 0.199 | 0.279 |
| Rad_a2 | 0.025 | −0.239 to 0.289 | 0.853 | 0.106 | −0.131 to 0.342 | 0.429 |
| Rad_r1 | 0.042 | −0.043 to 0.127 | 0.488 | −0.010 | −0.186 to 0.167 | 0.913 |
| Rad_r2 | 0.066 | −0.220 to 0.352 | 0.634 | 0.133 | −0.047 to 0.314 | 0.242 |
| GPT-4o | 0.291 | 0.098 to 0.484 | 0.026 | 0.117 | −0.023 to 0.256 | 0.188 |
| Gemini | 0.060 | −0.276 to 0.396 | 0.710 | 0.286 | 0.029 to 0.542 | 0.037 |
| Copilot | −0.090 | −0.353 to 0.173 | 0.506 | −0.022 | −0.213 to 0.186 | 0.835 |
| DeepSeek | 0.428 | 0.041 to 0.814 | 0.069 | 0.264 | −0.153 to 0.681 | 0.187 |
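Because the table reports only the resulting statistics, a minimal Python sketch of Cohen's weighted kappa may help readers interpret the κ-values. It assumes the ratings (e.g., SoR or QoE grades) are coded as ordered categories; the function name `weighted_kappa` and the choice between linear and quadratic weights are illustrative assumptions, since the table does not state which weighting scheme was used. The sketch computes only the point estimate, not the 95% CIs or p-values.

```python
def weighted_kappa(rater_a, rater_b, categories, weights="linear"):
    """Cohen's weighted kappa for two raters on an ordinal scale.

    Hypothetical helper for illustration. `categories` lists the ordinal
    levels in order; `weights` is "linear" or "quadratic" (assumed, since
    the weighting scheme is not reported in the table).
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and paired")
    if weights not in ("linear", "quadratic"):
        raise ValueError("weights must be 'linear' or 'quadratic'")
    k = len(categories)
    if k < 2:
        raise ValueError("need at least two ordinal categories")
    index = {c: i for i, c in enumerate(categories)}
    n = len(rater_a)

    # Observed joint proportions: observed[i][j] = share of items rated i by A and j by B
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        observed[index[a]][index[b]] += 1 / n

    # Marginal proportions of each rater (row = rater A, col = rater B)
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    def disagreement(i, j):
        # Distance between categories, scaled to [0, 1]
        d = abs(i - j) / (k - 1)
        return d if weights == "linear" else d ** 2

    # Weighted observed vs. chance-expected disagreement
    num = sum(disagreement(i, j) * observed[i][j] for i in range(k) for j in range(k))
    den = sum(disagreement(i, j) * row[i] * col[j] for i in range(k) for j in range(k))
    return 1 - num / den


# Usage: four items graded on a three-level ordinal scale
print(weighted_kappa([1, 2, 3, 3], [1, 3, 3, 2], [1, 2, 3]))               # ≈ 0.4286 (3/7)
print(weighted_kappa([1, 2, 3, 3], [1, 3, 3, 2], [1, 2, 3], "quadratic"))  # ≈ 0.6364 (7/11)
```

Kappa equals 1 for perfect agreement, 0 for chance-level agreement, and becomes negative when observed weighted disagreement exceeds what chance alone would produce, which is how the negative κ-values in the table should be read.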