Fig. 2: Mean classification performance of different model-implementation methods.

Panels compare each implementation method (ready-made, threshold adjustment, transfer learning) and baseline models (site-specific and multi-site), across four NHS Trusts: a OUH, b PUH, c UHB, and d BH (numerical results are shown in Supplementary Tables S4, S5). All models were optimized during training and validation to achieve sensitivities of 0.85 (optimal thresholds can be found in Supplementary Table S6). Error bars show 95% confidence intervals (CIs) based on standard error. CIs for AUROC are calculated using Hanley and McNeil’s method.