Table 3. QA Accuracy Results<sup>a</sup> by Model<sup>b</sup>
| Model Type | Model Name | Overall | Symptom Group | Medical History | Family and Social History |
|---|---|---|---|---|---|
| Closed-Source | Claude-3-Haiku | 72.25% | 68.00% | 40.91% | 56.76% |
| Closed-Source | Claude-3.5-Sonnet | 83.82% | 88.00% | 86.36% | 48.65% |
| Closed-Source | Claude-4-Sonnet | 86.13% | 68.00% | 86.36% | 64.86% |
| Closed-Source | Claude-4-Opus | 90.80% | 88.00% | 86.36% | 78.38% |
| Closed-Source | GPT-3.5-Turbo<sup>c</sup> | 55.49% | 60.00% | 36.36% | 60.00% |
| Closed-Source | GPT-4o | 89.02% | 92.00% | 81.82% | 64.86% |
| Closed-Source | GPT-4-Turbo | 94.15%<sup>c</sup> | 91.20% | 87.10% | 85.56% |
| Open-Source | Deepseek-v3-671b | 80.81% | 63.64% | 79.31% | 62.50% |
| Open-Source | Llama-3-70b | 75.72% | 84.00% | 36.36% | 43.24% |
| Open-Source | Qwen-3-32b | 78.61% | 84.00% | 77.27% | 40.54% |