Fig. 4: Multi-reader multi-case evaluation and workflow impact.

A–F Per-reader performance without (black) and with (red) ProAI for: (A) AUC, (B) sensitivity, (C) specificity, (D) accuracy, (E) PPV and (F) NPV. Nine readers (R1–R9) each interpreted n = 250 de-identified cases per condition; points show per-reader estimates and lines link paired conditions. AUCs were computed from ordinal reader scores; all other metrics were calculated at the prespecified diagnostic threshold used in the reading study. Group-level comparisons were performed within the OR–MRMC framework (two-sided) and are reported in the text and Supplementary Tables. G Experience-stratified AUC for senior radiologists (>10 years of prostate MRI experience), general radiologists (<5 years of prostate MRI experience) and urologists. Bars show group means with 95% CIs; two-sided tests as specified in Methods. H Reading time per case without and with ProAI. Violin plots summarise n = 250 cases per reader per condition; central line = median; box = IQR (25th–75th percentiles); whiskers = non-outlier range; overlaid dots indicate mean ± SD. Within-reader differences were assessed with a two-sided paired t-test (no multiplicity adjustment). Mean reading time decreased from 72.7 ± 23.5 s without ProAI to 48.7 ± 10.0 s with ProAI. Timing reflects active interpretation only and excludes AI preprocessing, which runs in parallel. I Clinical integration: ProAI consultation counts per reader across 250 cases; overall ProAI consultation rate 91.13%. Usage patterns reflect case complexity and reader confidence. All statistical tests were two-sided, and P < 0.05 was considered significant unless otherwise stated. AUC area under the receiver operating characteristic (ROC) curve, PPV positive predictive value, NPV negative predictive value, OR–MRMC Obuchowski–Rockette multi-reader multi-case, CI confidence interval, IQR interquartile range, SD standard deviation, R reader, AI ProAI software. Source data are provided as a Source Data file.
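
As an illustration of how the per-reader points in panels A–F could be derived, the following minimal sketch computes the threshold-based metrics and an empirical AUC for a single reader. It is not the study's analysis code: the 1–5 ordinal scale, the threshold of 3, the function names and the toy data are all assumptions made for the example, and the group-level OR–MRMC comparison is not reproduced here.

```python
from typing import Dict, List


def threshold_metrics(scores: List[int], labels: List[int], threshold: int) -> Dict[str, float]:
    """Binary metrics for one reader, calling a case positive when score >= threshold.

    Assumes both classes are present and that at least one case falls on each
    side of the threshold (otherwise a denominator would be zero).
    """
    pairs = list(zip(scores, labels))
    tp = sum(1 for s, y in pairs if s >= threshold and y == 1)
    fp = sum(1 for s, y in pairs if s >= threshold and y == 0)
    fn = sum(1 for s, y in pairs if s < threshold and y == 1)
    tn = sum(1 for s, y in pairs if s < threshold and y == 0)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / len(pairs),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


def ordinal_auc(scores: List[int], labels: List[int]) -> float:
    """Empirical AUC from ordinal scores (Mann-Whitney form).

    Each positive/negative pair counts 1 if the positive case scored higher
    and 0.5 if the two scores are tied.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data for one hypothetical reader: ordinal scores (1-5) and reference labels.
scores = [1, 2, 3, 3, 4, 5, 2, 4]
labels = [0, 0, 0, 1, 1, 1, 1, 0]

metrics = threshold_metrics(scores, labels, threshold=3)  # threshold is hypothetical
auc = ordinal_auc(scores, labels)

for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
print(f"auc: {auc:.3f}")
```

Ties are frequent with ordinal scores, which is why the AUC sketch credits tied pairs with 0.5 rather than discarding them.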