Fig. 2: Top-n accuracy in DDx lists through human and automated evaluations.
From: Towards accurate differential diagnosis with large language models

Top-n accuracy is the percentage of DDx lists that include the final diagnosis within their top n entries, judged by human evaluation (left) or automated evaluation (right). Points show the mean, and shaded areas show ±1 s.d. from the mean across 10 trials.
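The sketch below shows how a curve like the ones in the figure could be computed. It first calculates top-n accuracy for each trial and then summarizes the 10 trials as a mean ± 1 s.d. band. Several parts are assumptions and do not come from the paper. The input format is hypothetical: for each case, the 1-based rank of the first DDx entry judged to match the final diagnosis, or `None` if no entry matched. The function names are also hypothetical. The caption does not say whether its s.d. is the sample or the population version, so the choice of the sample s.d. (`statistics.stdev`) is an assumption too.

```python
from statistics import mean, stdev

def top_n_accuracy(match_ranks, n):
    """Percentage of cases whose final diagnosis appears within the top n DDx entries.

    match_ranks (hypothetical format): for each case, the 1-based rank of the
    first DDx entry judged to match the final diagnosis, or None if none matched.
    """
    hits = sum(1 for r in match_ranks if r is not None and r <= n)
    return 100.0 * hits / len(match_ranks)

def summarize_trials(trials, n):
    """Mean and sample s.d. (an assumption) of top-n accuracy across repeated trials.

    Requires at least two trials, because statistics.stdev needs two data points.
    """
    accs = [top_n_accuracy(t, n) for t in trials]
    return mean(accs), stdev(accs)

# Illustrative toy data (not from the paper): 3 trials over the same 4 cases.
trials = [[1, 3, None, 2], [1, None, None, 5], [2, 1, 4, None]]
m, s = summarize_trials(trials, n=3)
# The shaded band would span m - s to m + s.
print(f"top-3 accuracy: {m:.1f}% (band {m - s:.1f}% to {m + s:.1f}%)")
```

On this toy data, the per-trial top-3 accuracies are 75%, 25% and 50%. That gives a mean of 50% and a sample s.d. of 25%. Repeating the calculation for n = 1, 2, …, 10 would trace one curve and its shaded band.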