Table 2: Mean balanced accuracy of models across age groups without (w/o) and with RAG; \(\Delta\)RAG = RAG − w/o RAG
Evaluated Models | Young: w/o RAG | Young: RAG | Young: \(\Delta\)RAG | Mid-Age/Pregeriatric: w/o RAG | Mid-Age/Pregeriatric: RAG | Mid-Age/Pregeriatric: \(\Delta\)RAG | Geriatric: w/o RAG | Geriatric: RAG | Geriatric: \(\Delta\)RAG |
|---|---|---|---|---|---|---|---|---|---|
Llama 3.2 3B | 0.33 ± 0.07 | 0.38 ± 0.04 | +0.05 | 0.36 ± 0.08 | 0.42 ± 0.05 | +0.06 | 0.47 ± 0.06 | 0.46 ± 0.04 | −0.01 |
Qwen 2.5 14B | 0.47 ± 0.06 | 0.57 ± 0.01 | +0.10 | 0.47 ± 0.03 | 0.48 ± 0.01 | +0.01 | 0.63 ± 0.04 | 0.66 ± 0.02 | +0.03 |
DSR Llama 70B | 0.47 ± 0.03 | 0.52 ± 0.01 | +0.05 | 0.51 ± 0.01 | 0.48 ± 0.01 | −0.03 | 0.70 ± 0.01 | 0.69 ± 0.02 | −0.01 |
GPT-4o | 0.67 ± 0.02 | 0.65 ± 0.03 | −0.02 | 0.63 ± 0.02 | 0.55 ± 0.01 | −0.08 | 0.78 ± 0.02 | 0.76 ± 0.01 | −0.02 |
GPT-4o mini | 0.48 ± 0.07 | 0.46 ± 0.07 | −0.02 | 0.49 ± 0.03 | 0.45 ± 0.04 | −0.04 | 0.68 ± 0.04 | 0.68 ± 0.04 | ±0 |
o3 mini | 0.57 ± 0.04 | 0.58 ± 0.04 | +0.01 | 0.54 ± 0.02 | 0.51 ± 0.01 | −0.03 | 0.72 ± 0.03 | 0.75 ± 0.02 | +0.03 |
Llama3 Med42 8B | 0.31 ± 0.06 | 0.30 ± 0.01 | −0.01 | 0.39 ± 0.06 | 0.33 ± 0.03 | −0.06 | 0.54 ± 0.04 | 0.45 ± 0.03 | −0.09 |
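Each \(\Delta\)RAG entry is the difference between the two reported means for that age group (RAG minus w/o RAG). The Python sketch below re-derives those differences from the table as a consistency check and counts the model/age-group cells in which RAG raised mean balanced accuracy. The dictionary layout and the function names (`delta_rag`, `mismatches`, `improved_cells`) are illustrative assumptions, not part of the original study; the ± spreads are ignored because \(\Delta\)RAG is defined on the means only.

```python
"""Recompute the ΔRAG columns of Table 2 from the reported means.

Each cell is (w/o RAG mean, RAG mean, reported ΔRAG). The ± spreads are
omitted, and the table's "±0" entry is stored as 0.0.
"""

GROUPS = ("Young", "Mid-Age/Pregeriatric", "Geriatric")

TABLE_2 = {
    "Llama 3.2 3B":    [(0.33, 0.38, +0.05), (0.36, 0.42, +0.06), (0.47, 0.46, -0.01)],
    "Qwen 2.5 14B":    [(0.47, 0.57, +0.10), (0.47, 0.48, +0.01), (0.63, 0.66, +0.03)],
    "DSR Llama 70B":   [(0.47, 0.52, +0.05), (0.51, 0.48, -0.03), (0.70, 0.69, -0.01)],
    "GPT-4o":          [(0.67, 0.65, -0.02), (0.63, 0.55, -0.08), (0.78, 0.76, -0.02)],
    "GPT-4o mini":     [(0.48, 0.46, -0.02), (0.49, 0.45, -0.04), (0.68, 0.68, 0.0)],
    "o3 mini":         [(0.57, 0.58, +0.01), (0.54, 0.51, -0.03), (0.72, 0.75, +0.03)],
    "Llama3 Med42 8B": [(0.31, 0.30, -0.01), (0.39, 0.33, -0.06), (0.54, 0.45, -0.09)],
}


def delta_rag(without_rag: float, with_rag: float) -> float:
    """ΔRAG = RAG - w/o RAG, rounded to the table's two decimals."""
    return round(with_rag - without_rag, 2)


def mismatches(table: dict) -> list:
    """Return (model, group, recomputed, reported) for every cell that disagrees."""
    bad = []
    for model, cells in table.items():
        for group, (without_rag, with_rag, reported) in zip(GROUPS, cells):
            recomputed = delta_rag(without_rag, with_rag)
            # Compare with a small tolerance rather than == to avoid float noise.
            if abs(recomputed - reported) > 1e-9:
                bad.append((model, group, recomputed, reported))
    return bad


def improved_cells(table: dict) -> list:
    """Return (model, group) pairs where RAG strictly raised the mean."""
    return [
        (model, group)
        for model, cells in table.items()
        for group, (without_rag, with_rag, _) in zip(GROUPS, cells)
        if delta_rag(without_rag, with_rag) > 0
    ]


if __name__ == "__main__":
    print("mismatches:", mismatches(TABLE_2))
    print("cells improved by RAG:", len(improved_cells(TABLE_2)), "of", 3 * len(TABLE_2))
```

Run against the values above, `mismatches` returns an empty list, meaning every reported \(\Delta\)RAG agrees with its recomputed difference, and `improved_cells` finds 8 of the 21 model/age-group cells with a positive \(\Delta\)RAG.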