Table 1 Top-1 and top-10 accuracy of DDx lists produced with AMIE and Search assistance
From: Towards accurate differential diagnosis with large language models
Metrics | Model only: AMIE, Top-1↑ | Model only: AMIE, Top-10↑ | Human: before assistance, Top-1↑ | Human: before assistance, Top-10↑ | Human: after Search assistance, Top-1↑ | Human: after Search assistance, Top-10↑ | Human: after AMIE assistance, Top-1↑ | Human: after AMIE assistance, Top-10↑ |
---|---|---|---|---|---|---|---|---|
Full set (302 cases) | 29.2% | 59.1% | 15.9% | 33.6% | 24.3% | 44.5% | 25.2% | 51.8% |
Set with no overlap (56 cases) | 35.4% | 55.4% | 13.8% | 34.6% | 29.2% | 46.2% | 24.6% | 52.3% |
Difference compared to full set | +6.2% | –3.7% | –2.1% | +1.0% | +4.9% | +1.7% | –0.6% | +0.5% |
Set with partial overlap (249 cases) | 29.9% | 61.4% | 14.9% | 33.1% | 24.3% | 44.2% | 24.7% | 51.4% |
Difference compared to full set | +0.7% | +2.3% | –1.0% | –0.5% | 0% | –0.3% | –0.5% | –0.4% |
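
The "Difference compared to full set" rows appear to be the subset accuracy minus the full-set accuracy, column by column, which makes them differences in percentage points. A minimal sketch of that arithmetic follows. The values are copied from the table; the variable and function names are hypothetical and chosen only for illustration.

```python
# Accuracies in percent, copied from Table 1.
# Column order: AMIE only, human before assistance, human after Search
# assistance, human after AMIE assistance; each as (top-1, top-10).
full_set = [29.2, 59.1, 15.9, 33.6, 24.3, 44.5, 25.2, 51.8]
no_overlap = [35.4, 55.4, 13.8, 34.6, 29.2, 46.2, 24.6, 52.3]
partial_overlap = [29.9, 61.4, 14.9, 33.1, 24.3, 44.2, 24.7, 51.4]


def diff_vs_full(subset, full):
    """Return subset minus full, in percentage points, rounded to one decimal."""
    return [round(s - f, 1) for s, f in zip(subset, full)]


# Hypothetical names for the two "Difference compared to full set" rows.
no_overlap_diff = diff_vs_full(no_overlap, full_set)
partial_overlap_diff = diff_vs_full(partial_overlap, full_set)

print(no_overlap_diff)
print(partial_overlap_diff)
```

Under this reading, each printed list should match the corresponding difference row in the table, with the sign indicating whether the subset scored above or below the full set.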