Table 3 Accuracy and near-answer rates for ASCO-SNO-ASTRO guideline recommendation

From: Large language models standardize the interpretation of complex oncology guidelines for brain metastases

 

| ASCO guideline | SoR evaluation: Accuracy % | SoR evaluation: Near-answer % | QoE evaluation: Accuracy % | QoE evaluation: Near-answer % |
|---|---|---|---|---|
| Nrs_a1 | 23.07 | 84.61 | 30.76 | 92.30 |
| Nrs_a2 | 30.76 | 46.15 | 15.38 | 92.30 |
| Nrs_r1 | 15.38 | 38.46 | 30.76 | 46.15 |
| Nrs_r2 | 53.84 | 92.30 | 69.23 | 100 |
| Rad_a1 | 23.07 | 53.84 | 23.07 | 61.53 |
| Rad_a2 | 23.07 | 84.61 | 23.07 | 76.92 |
| Rad_r1 | 30.76 | 61.53 | 15.38 | 61.53 |
| Rad_r2 | 38.46 | 61.53 | 30.76 | 61.53 |
| Medical experts mean/median | 29.80/26.91 | 65.37/61.53 | 29.80/26.91 | 65.37/69.22 |
| GPT-4o | 30.76 | 100 | 23.07 | 69.23 |
| Gemini | 30.76 | 76.92 | 38.46 | 84.61 |
| Copilot | 7.69 | 92.30 | 7.69 | 84.61 |
| Deepseek | 61.53 | 84.61 | 46.15 | 84.61 |
| LLMs mean/median | 32.69/30.76 | 88.46/88.45 | 28.84/30.76 | 80.76/84.61 |

  1. The values shown in bold in the tables represent the highest values achieved by participants in the relevant section.
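
The mean/median summary rows can be recomputed from the individual rows. The following is a minimal sketch, assuming the summary rows are the plain arithmetic mean and median of the listed percentages; the variable and function names are illustrative and do not come from the article. It uses the SoR accuracy column as the example.

```python
from statistics import mean, median

# SoR accuracy (%) per participant, copied from the table rows above.
expert_sor_accuracy = {
    "Nrs_a1": 23.07, "Nrs_a2": 30.76, "Nrs_r1": 15.38, "Nrs_r2": 53.84,
    "Rad_a1": 23.07, "Rad_a2": 23.07, "Rad_r1": 30.76, "Rad_r2": 38.46,
}
llm_sor_accuracy = {
    "GPT-4o": 30.76, "Gemini": 30.76, "Copilot": 7.69, "Deepseek": 61.53,
}

def summarize(values):
    """Return (mean, median) for a collection of percentages."""
    values = list(values)
    return mean(values), median(values)

# Assumption: the table's summary rows are plain mean/median of the column.
expert_mean, expert_median = summarize(expert_sor_accuracy.values())
llm_mean, llm_median = summarize(llm_sor_accuracy.values())

print(f"Medical experts mean/median: {expert_mean:.3f}/{expert_median:.3f}")
print(f"LLMs mean/median: {llm_mean:.3f}/{llm_median:.3f}")
```

Under that assumption the results agree with the reported 29.80/26.91 and 32.69/30.76 to within the last printed digit. The same function applies to the other three columns.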