Extended Data Fig. 6: Contingency tables showing pairwise rating agreement between our best AutoRaters, their corresponding expert raters, and other experts. | Nature Medicine

From: A personal health large language model for sleep and fitness coaching

Counts are aggregated across all sections and principles of every case study for which at least one rating from the AutoEval training rater is available, shown separately for the sleep (a) and fitness (b) domains. Blue, the primary expert rater versus other raters. Green, the AutoEval model trained on primary expert ratings versus other raters. Yellow, the primary expert rater versus the corresponding AutoEval model.
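
As a rough illustration of how a pairwise contingency table of this kind can be tallied, the sketch below counts how often one rater assigned rating i while another assigned rating j to the same item. The five-point scale, the function names, and the example pairs are assumptions made for illustration; they are not taken from the article or its source data.

```python
from collections import Counter


def contingency_table(pairs, scale=(1, 2, 3, 4, 5)):
    """Tally paired ratings into a square table.

    Rows index rater A's rating, columns index rater B's rating.
    The 1-5 scale is an assumed default; pairs with a rating outside
    the scale are simply not counted.
    """
    counts = Counter(pairs)
    return [[counts[(a, b)] for b in scale] for a in scale]


def exact_agreement(table):
    """Fraction of paired ratings that fall on the table's diagonal."""
    total = sum(sum(row) for row in table)
    diagonal = sum(table[i][i] for i in range(len(table)))
    return diagonal / total if total else 0.0


# Hypothetical (expert, AutoEval model) rating pairs for the same items.
example_pairs = [(5, 5), (4, 5), (4, 4), (3, 4), (5, 5), (2, 2)]
table = contingency_table(example_pairs)
agreement = exact_agreement(table)
```

Aggregating across case studies, sections, and principles then amounts to pooling all rating pairs for a given rater pairing before tallying, with one table per domain.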
