Fig. 4: Study design. | npj Digital Medicine


From: Benchmarking the diagnostic performance of open source LLMs in 1933 Eurorad case reports


A total of 2894 cases were excluded because the true diagnosis was mentioned in the case description that would be provided as LLM input. Sixteen LLMs were prompted to output the three most likely differential diagnoses based on the Eurorad case reports. Llama-3-70B was used to automatically determine the percentage of correct responses, given the ground truth diagnosis. A subset of 140 LLM responses was additionally rated by radiologists to evaluate the judging accuracy of Llama-3-70B. DDx differential diagnoses.
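
The two quantities named in the caption, the percentage of responses the judge model marks as correct and the judge's agreement with radiologist ratings on a validation subset, can be sketched as follows. This is a minimal illustration and not the authors' code: the `JudgedResponse` record, the function names, and the toy values are all hypothetical, and the sketch assumes each response has already been reduced to a single correct/incorrect verdict.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class JudgedResponse:
    """One LLM response with its verdicts (hypothetical record layout)."""
    case_id: str
    judge_correct: bool                   # verdict from the LLM judge
    human_correct: Optional[bool] = None  # radiologist rating, if in the rated subset


def percent_correct(responses: List[JudgedResponse]) -> float:
    """Percentage of responses the judge marked as correct."""
    if not responses:
        raise ValueError("no responses to score")
    return 100.0 * sum(r.judge_correct for r in responses) / len(responses)


def judge_agreement(responses: List[JudgedResponse]) -> float:
    """Percentage of human-rated responses where the judge matches the human."""
    rated = [r for r in responses if r.human_correct is not None]
    if not rated:
        raise ValueError("no human-rated responses")
    matches = sum(r.judge_correct == r.human_correct for r in rated)
    return 100.0 * matches / len(rated)


# Toy data for illustration only; these values are not from the study.
toy = [
    JudgedResponse("case-a", judge_correct=True, human_correct=True),
    JudgedResponse("case-b", judge_correct=True, human_correct=False),
    JudgedResponse("case-c", judge_correct=False),
    JudgedResponse("case-d", judge_correct=True),
]

print(percent_correct(toy))   # share of judge-approved responses
print(judge_agreement(toy))   # judge vs. radiologist agreement on the rated subset
```

Keeping the human rating optional lets one list hold both the full set of judged responses and the smaller radiologist-rated subset, so the two metrics are computed from the same records.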
