Communications Medicine

Table 3 Accuracy and near-answer rates for ASCO-SNO-ASTRO guideline recommendation

From: Large language models standardize the interpretation of complex oncology guidelines for brain metastases

ASCO guideline
Participant	SoR evaluation: Accuracy %	SoR evaluation: Near-answer %	QoE evaluation: Accuracy %	QoE evaluation: Near-answer %
Nrs_a1	23.07	84.61	30.76	92.30
Nrs_a2	30.76	46.15	15.38	92.30
Nrs_r1	15.38	38.46	30.76	46.15
Nrs_r2	53.84	92.30	69.23	100
Rad_a1	23.07	53.84	23.07	61.53
Rad_a2	23.07	84.61	23.07	76.92
Rad_r1	30.76	61.53	15.38	61.53
Rad_r2	38.46	61.53	30.76	61.53
Medical experts mean/median	29.80/26.91	65.37/61.53	29.80/26.91	65.37/69.22
GPT-4o	30.76	100	23.07	69.23
Gemini	30.76	76.92	38.46	84.61
Copilot	7.69	92.30	7.69	84.61
Deepseek	61.53	84.61	46.15	84.61
LLMs mean/median	32.69/30.76	88.46/88.45	28.84/30.76	80.76/84.61

The values shown in bold in the tables represent the highest values achieved by participants in the relevant section.
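
The mean/median summary rows can be recomputed from the per-participant values in the table. The following is a minimal sketch in Python for the SoR accuracy column only. The variable names and the `summarize` helper are illustrative assumptions and do not come from the article; the last digit may differ slightly from the table depending on whether values are rounded or truncated.

```python
from statistics import mean, median

# SoR accuracy (%) per participant, copied from the table rows above.
expert_sor_accuracy = {
    "Nrs_a1": 23.07, "Nrs_a2": 30.76, "Nrs_r1": 15.38, "Nrs_r2": 53.84,
    "Rad_a1": 23.07, "Rad_a2": 23.07, "Rad_r1": 30.76, "Rad_r2": 38.46,
}
llm_sor_accuracy = {
    "GPT-4o": 30.76, "Gemini": 30.76, "Copilot": 7.69, "Deepseek": 61.53,
}


def summarize(scores):
    """Return (mean, median) of the percentage values in a score mapping."""
    values = list(scores.values())
    return mean(values), median(values)


expert_mean, expert_median = summarize(expert_sor_accuracy)
llm_mean, llm_median = summarize(llm_sor_accuracy)

print(f"Medical experts mean/median: {expert_mean:.2f}/{expert_median:.2f}")
print(f"LLMs mean/median: {llm_mean:.2f}/{llm_median:.2f}")
```

The same helper can be applied to the other three columns by swapping in the corresponding values.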
