Table 5 Performance comparison of the proposed CARE-AD method with baseline models for the −10-year prediction
Method | LLM calls | AD cases (precision/recall/F1) | Controls (precision/recall/F1) | Accuracy |
---|---|---|---|---|
Zero-shot | 1 | 0.09 (0.08, 0.09)/0.25 (0.23, 0.28)/0.13 (0.11, 0.14) | 0.56 (0.54, 0.57)/0.26 (0.25, 0.27)/0.35 (0.34, 0.37) | 0.26 (0.25, 0.27) |
Chain of thought (CoT) | 1 | 0.11 (0.10, 0.12)/0.27 (0.25, 0.29)/0.15 (0.13, 0.17) | 0.65 (0.63, 0.67)/0.37 (0.35, 0.39)/0.47 (0.45, 0.49) | 0.35 (0.33, 0.37) |
Self-consistency | 6 reasoning paths | 0.13 (0.11, 0.15)/0.29 (0.26, 0.32)/0.18 (0.16, 0.20) | 0.70 (0.69, 0.71)/0.47 (0.45, 0.49)/0.56 (0.54, 0.58) | 0.43 (0.42, 0.44) |
Self-refine | 6 refine rounds | 0.16 (0.14, 0.18)/0.36 (0.33, 0.39)/0.22 (0.19, 0.25) | 0.73 (0.72, 0.74)/0.47 (0.45, 0.49)/0.57 (0.55, 0.59) | 0.45 (0.44, 0.46) |
AutoGen multi-agent (1 round) | 6 doctor agents (6 LLM calls) | 0.16 (0.15, 0.17)/0.36 (0.33, 0.39)/0.22 (0.20, 0.24) | 0.73 (0.72, 0.74)/0.48 (0.47, 0.50)/0.58 (0.57, 0.59) | 0.45 (0.44, 0.47) |
AutoGen multi-agent (2 rounds) | 6 doctor agents (12 LLM calls) | 0.20 (0.18, 0.21)/0.38 (0.35, 0.41)/0.26 (0.23, 0.28) | 0.77 (0.76, 0.78)/0.58 (0.56, 0.59)/0.66 (0.65, 0.67) | 0.53 (0.52, 0.55) |
AutoGen multi-agent (3 rounds) | 6 doctor agents (18 LLM calls) | 0.20 (0.18, 0.21)/0.38 (0.35, 0.41)/0.26 (0.24, 0.28) | 0.77 (0.76, 0.78)/0.58 (0.56, 0.59)/0.66 (0.64, 0.67) | 0.53 (0.52, 0.55) |
CARE-AD | 6 doctor agents | 0.20 (0.18, 0.21)/0.38 (0.35, 0.41)/0.26 (0.24, 0.28) | 0.77 (0.76, 0.78)/0.57 (0.55, 0.59)/0.65 (0.64, 0.67) | 0.53 (0.51, 0.54) |
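
For readers who want to reproduce the per-class columns, the following is a minimal sketch of how such metrics are typically computed, assuming that P/R/F denotes precision, recall, and F1 and that each class (AD cases, controls) is scored in turn as the positive class. The function names and the toy labels are hypothetical illustrations, not the authors' evaluation code or data, and the parenthesized intervals in the table are not reproduced here.

```python
from typing import Dict, Sequence


def class_metrics(y_true: Sequence[str], y_pred: Sequence[str], positive: str) -> Dict[str, float]:
    """Precision, recall, and F1 when `positive` is treated as the target class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    # Guard against empty denominators (no predictions or no true members of the class).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


def accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Fraction of patients whose predicted label matches the true label."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


# Toy labels (hypothetical, for illustration only; not from the study).
y_true = ["AD", "AD", "control", "control", "control", "control"]
y_pred = ["AD", "control", "AD", "control", "control", "AD"]

ad = class_metrics(y_true, y_pred, positive="AD")
controls = class_metrics(y_true, y_pred, positive="control")
acc = accuracy(y_true, y_pred)

print(f"AD cases: {ad['precision']:.2f}/{ad['recall']:.2f}/{ad['f1']:.2f}")
print(f"Controls: {controls['precision']:.2f}/{controls['recall']:.2f}/{controls['f1']:.2f}")
print(f"Accuracy: {acc:.2f}")
```

Because the table reports rounded values, recomputing F1 from the rounded precision and recall in a given row may differ from the reported F1 in the last digit.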