Fig. 2: Comparison of performance before and after arbitration.
From: Impact of using artificial intelligence as a second reader in breast screening including arbitration

a,b, Sensitivity versus false-positive rate (1 − specificity) of the AI read (ref. 10), shown as an ROC curve (line), together with the operating point used in the study (yellow), the first (blue) and second (turquoise) historical readers before arbitration, and the AI arm (blue and yellow) and human arm (blue and turquoise) after arbitration, for center 1 (a) (n = 23,019 women) and center 2 (b) (n = 22,583 women). Error bars for sensitivity and specificity are Wald CIs. c, Sensitivity by time of cancer detection and specificity, each before and after arbitration, for the AI and human arms at both sites combined. Numbers above the bars give the sensitivity or specificity for that bar; error bars are Wald CIs (n = 45,602 women). d, Outcome of comparing the regions of interest (ROIs) marked by the AI with the ground-truth location for the 84 positive cases that the AI correctly recalled at case level before arbitration but that the reader pair overruled at arbitration (n = 84 women).
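The quantities plotted in panels a to c (sensitivity, specificity, false-positive rate = 1 − specificity, and their Wald intervals) can be reproduced from confusion-matrix counts. The sketch below is illustrative only: the caption does not state the confidence level, so `z = 1.96` (a 95% interval) is an assumption, the function names are hypothetical, the clipping of bounds to [0, 1] is a display choice that may differ from the authors' analysis, and the counts in the usage example are made up rather than taken from the study.

```python
import math


def wald_ci(successes: int, total: int, z: float = 1.96) -> tuple[float, float, float]:
    """Return (proportion, lower, upper) using the Wald (normal-approximation) interval.

    Assumption: z = 1.96 corresponds to a 95% interval; the caption does not give the level.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    half_width = z * math.sqrt(p * (1 - p) / total)
    # Wald bounds can fall outside [0, 1]; clip them for display (hypothetical choice).
    return p, max(0.0, p - half_width), min(1.0, p + half_width)


def sensitivity_specificity(tp: int, fn: int, tn: int, fp: int, z: float = 1.96):
    """Hypothetical helper: sensitivity and specificity with Wald CIs, plus the FPR."""
    sens = wald_ci(tp, tp + fn, z)  # sensitivity = TP / (TP + FN)
    spec = wald_ci(tn, tn + fp, z)  # specificity = TN / (TN + FP)
    fpr = 1 - spec[0]               # false-positive rate, the x-axis in panels a and b
    return sens, spec, fpr


if __name__ == "__main__":
    # Made-up counts for illustration only (not study data).
    sens, spec, fpr = sensitivity_specificity(tp=80, fn=20, tn=900, fp=100)
    print(f"sensitivity {sens[0]:.3f} (CI {sens[1]:.3f}-{sens[2]:.3f})")
    print(f"specificity {spec[0]:.3f} (CI {spec[1]:.3f}-{spec[2]:.3f}), FPR {fpr:.3f}")
```

With 80 of 100 cancers detected, the Wald half-width is 1.96 × √(0.8 × 0.2 / 100) ≈ 0.078, giving roughly 0.72 to 0.88. The Wald interval is simple but can be poorly calibrated near 0 or 1 or at small n, which is why the clipping step is included.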