Fig. 2: Performance on ASCO-SNO-ASTRO guideline. | Communications Medicine

From: Large language models standardize the interpretation of complex oncology guidelines for brain metastases

Comparative performance of medical experts (blue bars: Nrs_a1, Nrs_a2, Nrs_r1, Nrs_r2, Rad_a1, Rad_a2, Rad_r1, Rad_r2) and large language models (orange bars: GPT-4o, Gemini, Copilot, DeepSeek) in ASCO-SNO-ASTRO clinical practice guideline assessments. 2A Accuracy rates for Strength of Recommendation. 2B Cohen’s kappa values for Strength of Recommendation. 2C Accuracy rates for Quality of Evidence. 2D Cohen’s kappa values for Quality of Evidence. Participant identifiers: Nrs = Neurosurgery (a = attending, r = resident); Rad = Radiation oncology (a = attending, r = resident). Higher kappa values indicate closer agreement between a participant’s responses and the reference standard.
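
For readers unfamiliar with the two metrics in panels 2A to 2D, the following minimal Python sketch shows how accuracy and Cohen's kappa can be computed for one participant against a reference standard. It assumes the standard unweighted form of kappa, since the caption does not say whether a weighted variant was used. The function names and the "strong"/"weak" labels are hypothetical illustrations, not the study's data or code.

```python
from collections import Counter


def accuracy(rater, reference):
    """Fraction of items where the rater's label matches the reference label."""
    if len(rater) != len(reference) or not rater:
        raise ValueError("rater and reference must be non-empty and equal in length")
    return sum(a == b for a, b in zip(rater, reference)) / len(rater)


def cohen_kappa(rater, reference):
    """Unweighted Cohen's kappa: agreement corrected for chance agreement.

    Assumption: unweighted kappa; the figure caption does not specify the variant.
    """
    n = len(rater)
    p_observed = accuracy(rater, reference)
    rater_counts = Counter(rater)
    reference_counts = Counter(reference)
    # Chance agreement: probability that both pick the same label independently.
    p_expected = sum(
        rater_counts[label] * reference_counts[label] for label in rater_counts
    ) / (n * n)
    if p_expected == 1.0:
        # Both used one identical label throughout; kappa is undefined.
        return float("nan")
    return (p_observed - p_expected) / (1.0 - p_expected)


if __name__ == "__main__":
    # Hypothetical labels for six recommendations (illustration only).
    reference = ["strong", "strong", "weak", "weak", "strong", "weak"]
    rater = ["strong", "weak", "weak", "weak", "strong", "strong"]
    print(f"accuracy = {accuracy(rater, reference):.3f}")
    print(f"kappa    = {cohen_kappa(rater, reference):.3f}")
```

In this toy example the rater matches the reference on four of six items, but half of that agreement would be expected by chance, so kappa is noticeably lower than accuracy. This is why the figure reports both measures.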
