Table 1 Criterion-level accuracy comparisons between the AI-Alone, Human-Alone, and Human+AI study arms

Criteria with significantly different mean accuracy between the Human+AI and Human-Alone arms are marked with an asterisk. These differences were assessed with two-sided, paired binomial exact tests with Bonferroni correction. For each criterion with a significant difference, superiority of the Human+AI arm for criterion-level abstraction was then assessed with an additional one-sided hypothesis test with an unspecified superiority margin; every such criterion showed significant superiority in favor of the Human+AI arm. Both sets of tests were conducted at an alpha level of 0.05 and adjusted for multiple comparisons. Cells shaded light grey indicate a difference greater than 5% between the Human-Alone and Human+AI arms, and cells shaded dark grey indicate a difference greater than 10%. Criteria for which the AI-Alone arm outperforms both the Human-Alone and Human+AI arms are shown in bold.
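
As an illustration of the two-sided test described above, the following is a minimal Python sketch, not the authors' analysis code. It assumes that "paired binomial exact test" refers to an exact McNemar-style test, in which only the discordant pairs (items abstracted correctly in exactly one arm) are tested against a binomial distribution with p = 0.5. The function names, the 0/1 encoding of correctness, and the example data are hypothetical.

```python
from math import comb


def exact_binomial_two_sided(k, n, p=0.5):
    """Two-sided exact binomial p-value.

    Sums the probabilities of all outcomes that are no more likely
    than the observed count k out of n trials.
    """
    if n == 0:
        return 1.0  # no discordant pairs: no evidence of a difference
    pmf = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    observed = pmf[k]
    # Small relative tolerance guards against floating-point ties.
    return min(1.0, sum(q for q in pmf if q <= observed * (1 + 1e-9)))


def paired_exact_test(human_alone, human_ai):
    """Exact test on paired 0/1 correctness indicators (assumed encoding).

    Returns (b, c, p_value), where b counts pairs correct only in the
    Human+AI arm and c counts pairs correct only in the Human-Alone arm.
    """
    pairs = list(zip(human_alone, human_ai))
    b = sum(1 for h, a in pairs if a and not h)
    c = sum(1 for h, a in pairs if h and not a)
    return b, c, exact_binomial_two_sided(b, b + c)


def bonferroni(p_values):
    """Bonferroni adjustment: multiply each p-value by the number of tests."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Hypothetical data for one criterion: 14 paired abstractions.
human_alone = [0] * 8 + [1] + [1] * 5
human_ai = [1] * 8 + [0] + [1] * 5
b, c, p_raw = paired_exact_test(human_alone, human_ai)

# Hypothetical raw p-values for three criteria, adjusted together.
adjusted = bonferroni([p_raw, 0.20, 0.01])
significant = [p < 0.05 for p in adjusted]
```

Concordant pairs do not contribute to the test statistic under this assumption, so a criterion on which both arms are nearly always correct can show a small accuracy difference without reaching significance after correction.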
