Fig. 4: Performance for the prospective observational deployment of the AI system.

a, The top segment (n = 45,653) shows the retrospective study cases from both screening services, included for comparison and reanalyzed using the prospective study's shorter 3-month ground truth. The middle segment (n = 3,678) shows cases read at the initial deployment operating point (OP1), which was used until the OP update was performed. The bottom segment (n = 5,588) shows cases read at the adjusted, more specific OP (OP2), which was used for the final 4 weeks. b, Cancer detection rate (CDR) versus recall rate for the AI and human readers, using a within-episode ground truth (that is, screen-detected cancers only, excluding future interval cancers and cancers detected at the next screening round), across the retrospective (n = 45,653), prospective initial-OP (n = 3,678) and prospective adjusted-OP (n = 5,588) cohorts. c, Recall rate and CDR by study week. Dotted lines show the estimated performance had the initial OP been continued for the remainder of the study. Error bars and shaded areas represent 95% CIs.
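The weekly quantities in panel c can be illustrated with a minimal sketch. The caption does not state how the AI's recall decisions, the CIs or the dotted counterfactual lines were computed, so this sketch makes several loudly labeled assumptions. An AI recall is a case whose score is at or above the OP threshold. A detected cancer is a recalled case with a within-episode cancer. The 95% CIs use the Wilson score interval. The `Case` record, the threshold values and the `update_week` parameter are all hypothetical. The "actual" series applies OP1 before the update and OP2 afterwards, while the counterfactual series applies OP1 in every week.

```python
import math
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Case:
    week: int        # study week in which the case was read (hypothetical field)
    ai_score: float  # hypothetical continuous AI suspicion score
    cancer: bool     # within-episode (screen-detected) cancer ground truth


def wilson_ci(k: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for the proportion k/n (about 95% with z = 1.96)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = k / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    # Clamp to [0, 1] to absorb floating-point rounding at k = 0 or k = n.
    return max(0.0, centre - half), min(1.0, centre + half)


def recall_and_cdr(cases: list[Case], threshold: float) -> dict:
    """AI recall rate (%) and CDR (per 1,000 screens) at one operating point.

    Assumption: a case is an AI recall if ai_score >= threshold, and a
    cancer counts as detected only if that case is also recalled.
    """
    n = len(cases)
    recalls = sum(c.ai_score >= threshold for c in cases)
    detected = sum(c.ai_score >= threshold and c.cancer for c in cases)
    rr_lo, rr_hi = wilson_ci(recalls, n)
    cdr_lo, cdr_hi = wilson_ci(detected, n)
    return {
        "n": n,
        "recall_rate_pct": 100 * recalls / n,
        "recall_rate_ci_pct": (100 * rr_lo, 100 * rr_hi),
        "cdr_per_1000": 1000 * detected / n,
        "cdr_ci_per_1000": (1000 * cdr_lo, 1000 * cdr_hi),
    }


def weekly_series(cases: list[Case], op1: float, op2: float, update_week: int):
    """Return (actual, counterfactual) metrics keyed by study week.

    actual: OP1 threshold before update_week, OP2 from update_week onwards.
    counterfactual: OP1 threshold in every week (the dotted lines).
    """
    by_week: dict[int, list[Case]] = defaultdict(list)
    for c in cases:
        by_week[c.week].append(c)
    actual, counterfactual = {}, {}
    for w in sorted(by_week):
        threshold = op1 if w < update_week else op2
        actual[w] = recall_and_cdr(by_week[w], threshold)
        counterfactual[w] = recall_and_cdr(by_week[w], op1)
    return actual, counterfactual
```

Because a more specific OP corresponds to a higher score threshold in this sketch, the counterfactual recall rate in any week after the update is always at least the actual one. Whether CDR also drops depends on whether any cancers score between the two thresholds.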