Fig. 5: Scatter plot comparing expert-evaluation and automated-metric scores for each sample in the MedThink-Bench dataset.
From: Automating expert-level medical reasoning evaluation of large language models

The plot includes results from GPT-4o, Llama-3.3-70B, and MedGemma-27B; GPT-4o-mini serves as the judge model for the two LLM-based evaluation metrics. Each point represents an individual sample, and the dashed line marks where the expert and automated scores are equal. For LLM-w-Rationale, data points cluster closely around the dashed line, suggesting strong agreement with expert evaluations. In contrast, BLEURT, BERTScore, and LLM-w/o-Rationale diverge more widely from the dashed line, indicating weaker alignment with expert assessments.
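As a minimal sketch of how such an agreement plot can be produced, the Python example below draws a per-sample scatter of automated scores against expert scores with a dashed identity line. The synthetic data, metric names, and the common 0–1 score scale are assumptions for illustration only, not details taken from the paper's evaluation pipeline.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical per-sample scores on a common 0-1 scale; in practice these
# would be loaded from the benchmark's expert and automated evaluation outputs.
rng = np.random.default_rng(0)
expert = rng.uniform(0, 1, size=200)
metrics = {
    "LLM-w-Rationale": np.clip(expert + rng.normal(0, 0.05, 200), 0, 1),
    "BERTScore":       np.clip(expert + rng.normal(0, 0.25, 200), 0, 1),
}

fig, axes = plt.subplots(1, len(metrics), figsize=(4 * len(metrics), 4))
for ax, (name, automated) in zip(np.atleast_1d(axes), metrics.items()):
    ax.scatter(expert, automated, s=10, alpha=0.6)
    # Dashed identity line: a point on it means the automated score
    # equals the expert score for that sample.
    ax.plot([0, 1], [0, 1], linestyle="--", color="gray")
    ax.set_xlabel("Expert score")
    ax.set_ylabel(f"{name} score")
    ax.set_title(name)
plt.tight_layout()
plt.show()
```

Points tightly hugging the dashed line (as in the first panel) correspond to the strong expert agreement described for LLM-w-Rationale; the wider spread in the second panel mimics the weaker alignment reported for the embedding-based metrics.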