Table 1 Top-1 and top-10 accuracy of DDx lists produced with AMIE and Search assistance

From: Towards accurate differential diagnosis with large language models

 

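| Case set | AMIE (model only), Top-1 | AMIE (model only), Top-10 | Human before assistance, Top-1 | Human before assistance, Top-10 | Human after Search assistance, Top-1 | Human after Search assistance, Top-10 | Human after AMIE assistance, Top-1 | Human after AMIE assistance, Top-10 |
|---|---|---|---|---|---|---|---|---|
| Full set (302 cases) | 29.2% | 59.1% | 15.9% | 33.6% | 24.3% | 44.5% | 25.2% | 51.8% |
| Set with no overlap (56 cases) | 35.4% | 55.4% | 13.8% | 34.6% | 29.2% | 46.2% | 24.6% | 52.3% |
| Difference compared to full set | +6.2% | -3.7% | -2.1% | +1.0% | +4.9% | +1.7% | -0.6% | +0.5% |
| Set with partial overlap (249 cases) | 29.9% | 61.4% | 14.9% | 33.1% | 24.3% | 44.2% | 24.7% | 51.4% |
| Difference compared to full set | +0.7% | +2.3% | -1.0% | -0.5% | 0% | -0.3% | -0.5% | -0.4% |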

  1. Each value is the percentage of DDx lists that contain the final diagnosis. The "Difference compared to full set" rows (bold in the original table) give the difference in percentage accuracy between the full case set and each partial case set.
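
The difference rows can be recomputed from the accuracy values by subtracting the full-set figure from the corresponding subset figure. The short Python sketch below does this with the numbers transcribed from the table. The variable and function names (`FULL`, `NO_OVERLAP`, `PARTIAL`, `difference_from_full`) are illustrative and do not come from the paper, and rounding to one decimal place is an assumption made to match the table's precision. A negative result means the subset scored lower than the full set.

```python
# Accuracy values (percent) transcribed from Table 1, in column order:
# AMIE top-1, AMIE top-10, human before assistance top-1/top-10,
# human after Search top-1/top-10, human after AMIE top-1/top-10.
FULL = [29.2, 59.1, 15.9, 33.6, 24.3, 44.5, 25.2, 51.8]
NO_OVERLAP = [35.4, 55.4, 13.8, 34.6, 29.2, 46.2, 24.6, 52.3]
PARTIAL = [29.9, 61.4, 14.9, 33.1, 24.3, 44.2, 24.7, 51.4]


def difference_from_full(subset, full=FULL):
    """Return subset minus full-set accuracy, rounded to one decimal.

    Hypothetical helper for illustration; the rounding is an assumption
    chosen to match the precision shown in the table.
    """
    return [round(s - f, 1) for s, f in zip(subset, full)]


if __name__ == "__main__":
    for label, subset in [("no overlap", NO_OVERLAP), ("partial overlap", PARTIAL)]:
        diffs = difference_from_full(subset)
        print(label, [f"{d:+.1f}" for d in diffs])
```

Differences computed this way are in percentage points, not relative change.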