Fig. 4: Summary of pilot study on bedside consultation dataset. | Nature Medicine

From: Toward expert-level medical question answering with large language models

a, Three-way ranking results for model, generalist and specialist answers by plurality of raters. Top bars show specialist raters, and bottom bars show generalist raters (11× replication per question). Both groups of physicians preferred specialist answers the most, and both preferred model answers more often than generalist answers. b, Pairwise ranking results for model, generalist and specialist answers, averaged over raters. Top bars, generalist raters; bottom bars, specialist raters (11× replication per question). Both groups of physicians preferred specialist answers over model answers. Specialists preferred model answers over generalist answers, while generalists rated them about equally.
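The two aggregation schemes named in the caption (plurality of raters for the three-way ranking, and averaging over raters for the pairwise ranking) can be illustrated with a minimal sketch. The function names, the tie-handling rule and the example votes below are hypothetical and are not taken from the study; only the count of 11 raters per question comes from the caption.

```python
from collections import Counter


def plurality_top_choice(top_choices):
    """Return the answer source ranked first by the most raters.

    Assumption: an exact tie for first place is left unresolved (None).
    """
    if not top_choices:
        return None
    counts = Counter(top_choices).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return None
    return counts[0][0]


def pairwise_preference_rate(preferences, candidate):
    """Return the fraction of raters who preferred `candidate` in one pair."""
    return sum(1 for p in preferences if p == candidate) / len(preferences)


# Hypothetical votes from 11 raters on a single question (not study data).
three_way_votes = ["specialist"] * 6 + ["model"] * 3 + ["generalist"] * 2
pairwise_votes = ["model"] * 7 + ["generalist"] * 4

print(plurality_top_choice(three_way_votes))
print(pairwise_preference_rate(pairwise_votes, "model"))
```

Repeating these per-question results across all questions, separately for specialist and generalist raters, would give the kind of proportions that stacked bars like those in panels a and b display.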
