Fig. 2: Chart-level accuracy comparisons between AI-alone, Human-alone, and Human+AI arms.

Chart-level accuracy for the AI-alone (blue), Human-alone (red), and Human+AI (purple) arms. N = 355 patient charts were independently reviewed in each arm. Each chart constitutes an independent observational unit, and no technical replicates were generated. Boxplots show the interquartile range (25th–75th percentiles) as the box, the median accuracy as the horizontal line inside each box, and the minimum and maximum observed values as the whiskers. The horizontal black bar with an asterisk indicates significant noninferiority (p < 0.001), tested with a one-sided, paired Wilcoxon rank sum test (α = 0.05).
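As a rough illustration of how a one-sided, paired noninferiority comparison of this kind could be set up, the sketch below shifts the per-chart accuracy differences by a noninferiority margin and applies SciPy's paired Wilcoxon routine, `scipy.stats.wilcoxon`. Several things here are assumptions rather than details from the study: SciPy's paired test is the signed-rank test, which may or may not match the exact procedure used; the margin of 0.05, the helper name `paired_noninferiority`, and the variable names are hypothetical; and the accuracy values are synthetic, generated only so the code runs.

```python
import numpy as np
from scipy.stats import wilcoxon


def paired_noninferiority(new, ref, margin, alpha=0.05):
    """One-sided paired Wilcoxon signed-rank test for noninferiority.

    H0: the paired differences (new - ref) are centered at or below -margin.
    H1: they are centered above -margin, i.e. `new` is noninferior to `ref`.
    Returns (p_value, is_noninferior).
    """
    new = np.asarray(new, dtype=float)
    ref = np.asarray(ref, dtype=float)
    # Shifting by the margin turns the noninferiority question into a
    # standard one-sided test of whether the shifted differences exceed zero.
    shifted = new - ref + margin
    result = wilcoxon(shifted, alternative="greater")
    return result.pvalue, bool(result.pvalue < alpha)


# Synthetic per-chart accuracies (NOT study data), one pair per chart.
rng = np.random.default_rng(0)
n_charts = 355  # number of charts per arm, as stated in the caption
human_alone = rng.beta(8, 2, size=n_charts)
human_ai = np.clip(human_alone + rng.normal(0.01, 0.05, size=n_charts), 0.0, 1.0)

MARGIN = 0.05  # hypothetical noninferiority margin; not given in the caption
p_value, noninferior = paired_noninferiority(human_ai, human_alone, MARGIN)
print(f"p = {p_value:.3g}, noninferior at alpha = 0.05: {noninferior}")
```

The margin carries most of the weight in such a design: a wider margin makes noninferiority easier to declare, so it would normally be fixed before the data are examined.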