Table 5 Performance comparison of the proposed CARE-AD method with baseline models at -10-year prediction

From: CARE-AD: a multi-agent large language model framework for Alzheimer’s disease prediction using longitudinal clinical notes

| Method | LLM calls | AD cases (P/R/F) | Controls (P/R/F) | Accuracy |
| --- | --- | --- | --- | --- |
| Zero-shot | 1 | 0.09 (0.08, 0.09)/0.25 (0.23, 0.28)/0.13 (0.11, 0.14) | 0.56 (0.54, 0.57)/0.26 (0.25, 0.27)/0.35 (0.34, 0.37) | 0.26 (0.25, 0.27) |
| Chain of thought (CoT) | 1 | 0.11 (0.10, 0.12)/0.27 (0.25, 0.29)/0.15 (0.13, 0.17) | 0.65 (0.63, 0.67)/0.37 (0.35, 0.39)/0.47 (0.45, 0.49) | 0.35 (0.33, 0.37) |
| Self-consistency | 6 reasoning paths | 0.13 (0.11, 0.15)/0.29 (0.26, 0.32)/0.18 (0.16, 0.20) | 0.70 (0.69, 0.71)/0.47 (0.45, 0.49)/0.56 (0.54, 0.58) | 0.43 (0.42, 0.44) |
| Self-refine | 6 refine rounds | 0.16 (0.14, 0.18)/0.36 (0.33, 0.39)/0.22 (0.19, 0.25) | 0.73 (0.72, 0.74)/0.47 (0.45, 0.49)/0.57 (0.55, 0.59) | 0.45 (0.44, 0.46) |
| AutoGen multi-agent (1 round) | 6 doctor agents (6 LLM calls) | 0.16 (0.15, 0.17)/0.36 (0.33, 0.39)/0.22 (0.20, 0.24) | 0.73 (0.72, 0.74)/0.48 (0.47, 0.50)/0.58 (0.57, 0.59) | 0.45 (0.44, 0.47) |
| AutoGen multi-agent (2 rounds) | 6 doctor agents (12 LLM calls) | 0.20 (0.18, 0.21)/0.38 (0.35, 0.41)/0.26 (0.23, 0.28) | 0.77 (0.76, 0.78)/0.58 (0.56, 0.59)/0.66 (0.65, 0.67) | 0.53 (0.52, 0.55) |
| AutoGen multi-agent (3 rounds) | 6 doctor agents (18 LLM calls) | 0.20 (0.18, 0.21)/0.38 (0.35, 0.41)/0.26 (0.24, 0.28) | 0.77 (0.76, 0.78)/0.58 (0.56, 0.59)/0.66 (0.64, 0.67) | 0.53 (0.52, 0.55) |
| CARE-AD | 6 doctor agents | 0.20 (0.18, 0.21)/0.38 (0.35, 0.41)/0.26 (0.24, 0.28) | 0.77 (0.76, 0.78)/0.57 (0.55, 0.59)/0.65 (0.64, 0.67) | 0.53 (0.51, 0.54) |
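
The P/R/F triples can be sanity-checked against each other. The sketch below assumes that P, R, and F stand for precision, recall, and the F1 score (the harmonic mean of precision and recall), which the table itself does not state. The function name `f1`, the `ROWS` list, and the tolerance are illustrative choices, not part of the original work. Because the published P and R values are rounded to two decimals, the recomputed F can differ from the reported F in the last digit.

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (assumed meaning of F)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (label, P, R, reported F) point estimates copied from the table above.
ROWS = [
    ("Zero-shot, AD cases", 0.09, 0.25, 0.13),
    ("Zero-shot, controls", 0.56, 0.26, 0.35),
    ("CARE-AD, AD cases", 0.20, 0.38, 0.26),
    ("CARE-AD, controls", 0.77, 0.57, 0.65),
]

# Loose tolerance to absorb rounding of the published inputs (an assumption).
TOLERANCE = 0.015

for label, p, r, reported in ROWS:
    recomputed = f1(p, r)
    status = "consistent" if abs(recomputed - reported) < TOLERANCE else "check"
    print(f"{label}: recomputed F = {recomputed:.3f}, reported F = {reported:.2f} ({status})")
```

The same check can be run on any other row by appending its point estimates to `ROWS`; the confidence intervals are not used here.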