Fig. 2: Mean classification performance of different model-implementation methods. | npj Digital Medicine

From: Machine learning generalizability across healthcare settings: insights from multi-site COVID-19 screening

Panels compare each implementation method (ready-made, threshold adjustment, and transfer learning) with the baseline models (site-specific and multi-site) across four NHS Trusts: (a) OUH, (b) PUH, (c) UHB, and (d) BH. Numerical results are given in Supplementary Tables S4 and S5. All models were optimized during training and validation to achieve a sensitivity of 0.85; the optimal thresholds are listed in Supplementary Table S6. Error bars show 95% confidence intervals (CIs) based on the standard error. CIs for AUROC are calculated using Hanley and McNeil’s method.
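
For readers who want to reproduce error bars of this kind, the following is a minimal sketch of the two interval types the caption mentions. It assumes the standard Hanley and McNeil (1982) standard-error formula for AUROC and a normal-approximation (Wald) interval for proportion metrics such as sensitivity; the caption does not state which approximation was used for the non-AUROC metrics, so that part is an assumption. The function names and the sample sizes in the usage note are hypothetical and are not taken from the paper.

```python
import math


def hanley_mcneil_se(auc, n_pos, n_neg):
    """Standard error of an AUROC estimate (Hanley and McNeil, 1982).

    auc   : estimated area under the ROC curve, in [0, 1]
    n_pos : number of positive cases
    n_neg : number of negative cases
    """
    q1 = auc / (2.0 - auc)
    q2 = 2.0 * auc ** 2 / (1.0 + auc)
    variance = (
        auc * (1.0 - auc)
        + (n_pos - 1) * (q1 - auc ** 2)
        + (n_neg - 1) * (q2 - auc ** 2)
    ) / (n_pos * n_neg)
    # Guard against tiny negative values caused by floating-point rounding.
    return math.sqrt(max(variance, 0.0))


def auroc_ci(auc, n_pos, n_neg, z=1.96):
    """Approximate 95% CI for AUROC, clipped to [0, 1]."""
    se = hanley_mcneil_se(auc, n_pos, n_neg)
    return max(0.0, auc - z * se), min(1.0, auc + z * se)


def proportion_ci(p, n, z=1.96):
    """Approximate 95% CI for a proportion (assumed Wald interval).

    p : observed proportion, e.g. sensitivity
    n : number of cases the proportion is computed over
    """
    se = math.sqrt(p * (1.0 - p) / n)
    return max(0.0, p - z * se), min(1.0, p + z * se)
```

As a usage illustration with made-up numbers, `auroc_ci(0.90, 200, 1800)` gives an interval for an AUROC of 0.90 estimated from 200 positive and 1800 negative cases, and `proportion_ci(0.85, 200)` gives an interval for a sensitivity of 0.85 measured over 200 positive cases.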
