Fig. 1: Absolute performance difference between AI and non-AI models across c-index (AUC), accuracy, sensitivity and specificity (reported as %).

Positive absolute performance difference values indicate that the AI model outperformed the non-AI model on the given metric. Results are stratified by study-specific APPRAISE-AI score (low, moderate or high). A, B depict results for studies predicting mortality and functional outcome, respectively. Note: absolute performance differences are based on study-specific point estimates, not confidence intervals, which were inconsistently reported across included studies and models. Because the uncertainty around these estimates is not quantified, the listed comparisons between AI and non-AI models may overstate performance differences. For Pease 2022, accuracy, sensitivity and specificity results compare the AI model with the average of three human experts (neurosurgeons).