Table 3 Outcome metrics for standard double reading versus double reading plus the AI-assisted additional-reader workflow

From: Prospective implementation of AI-assisted screen reading to improve early detection of breast cancer

Variable	Double reading		Double reading plus the AI-assisted additional-reader workflow		Difference
Variable	Num/Denom	Value (95% CI)	Num/Denom	Value (95% CI)	Difference
Results of phase 1, pilot rollout (1 site, 1 additional arbitrator, additional arbitration cases were single read), n = 3,746 screens
CDR (per 1,000 cases)	48/3,746	12.8 (9.7–16.9)	54/3,746	14.4 (11.1–18.8)	1.6^a
RR (%)	250/3,746	6.7 (5.9–7.5)	256/3,746	6.8 (6.1–7.7)	0.2
Sen (%)	48/58	82.8 (71.7–90.4)	54/58	93.1 (83.6–97.3)	10.3^a
Spec (%)	3,486/3,688	94.5 (93.7–95.2)	3,486/3,688	94.5 (93.7–95.2)	0.0
PPV (%)	48/250	19.2 (14.8–24.5)	54/256	21.1 (16.5–26.5)	1.9
Arbitration rate (%)	114/3,746	3.0 (2.5–3.6)	510/3,746	13.6 (12.6–14.8)	10.6
Positive discordance rate (%)	–	–	396/3,746	10.6 (9.6–11.6)	–
RR of additional arbitration (%)	–	–	6/396	1.5 (0.7–3.3)	–
PPV of additional arbitration (%)	–	–	6/6	100 (61.0–100)	–
Results of phase 2, extended pilot (4 sites, 3 additional arbitrators, all additional arbitration cases were read by each additional reader), n = 9,112 screens
CDR (per 1,000 cases)	126/9,112	13.8 (11.6–16.4)	139/9,112	15.3 (12.9–18.0)	1.4^a
RR (%)	639/9,112	7.0 (6.5–7.6)	661/9,112	7.3 (6.7–7.8)	0.2
Sen (%)	126/145	86.9 (80.4–91.4)	139/145	95.9 (91.3–98.1)	9.0^a
Spec (%)	8,454/8,967	94.3 (93.8–94.7)	8,445/8,967	94.2 (93.7–94.6)	−0.1
PPV (%)	126/639	19.7 (16.8–23.0)	139/661	21.0 (18.1–24.3)	1.3
Arbitration rate (%)	270/9,112	3.0 (2.6–3.3)	1,294/9,112	14.2 (13.5–14.9)	11.2
Positive discordance rate (%)	–	–	1,024/9,112	11.2 (10.6–11.9)	–
RR of additional arbitration (%)	–	–	22/1,024	2.1 (1.4–3.2)	–
PPV of additional arbitration (%)	–	–	13/22	59.1 (38.7–76.7)	–
Results of phase 3, live use in standard clinical practice (4 sites, 3 additional arbitrators, additional arbitration cases were single read), n = 15,953 screens
CDR (per 1,000 cases)	238/15,953	14.9 (13.2–16.9)	249/15,953	15.6 (13.8–17.7)	0.7^a
RR (%)	1,228/15,953	7.7 (7.3–8.1)	1,276/15,953	8.0 (7.6–8.4)	0.3
Sen (%)	238/253	94.1 (90.4–96.4)	249/253	98.4 (96.0–99.4)	4.3^a
Spec (%)	14,710/15,700	93.7 (93.3–94.1)	14,673/15,700	93.5 (93.1–93.8)	−0.2
PPV (%)	238/1,228	19.4 (17.3–21.7)	249/1,276	19.5 (17.4–21.8)	0.1
Arbitration rate (%)	529/15,953	3.3 (3.0–3.6)	1,715/15,953	10.8 (10.3–11.2)	7.4
Positive discordance rate (%)	–	–	1,186/15,953	7.4 (7.0–7.9)	–
RR of additional arbitration (%)	–	–	48/1,186	4.0 (3.1–5.3)	–
PPV of additional arbitration (%)	–	–	11/48	22.9 (13.3–36.5)	–

Num, numerator; Denom, denominator; CI, confidence interval; Sen, sensitivity; Spec, specificity; RR, recall rate; see metric definitions in Methods.
^aThe two-sided McNemar test to assess CDR and Sen differences between double reading and double reading plus the AI-assisted additional-reader workflow resulted in P values of 0.031, 0.0002 and 0.001 for phases 1, 2 and 3, respectively. The McNemar test is based on the binomial distribution. Continuity correction was applied.
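
The interval and test arithmetic behind the table can be sketched in a few lines of Python. This is a minimal illustration, not the authors' analysis code. It rests on two assumptions. First, the 95% CIs are treated as Wilson score intervals: the table does not name the method, but Wilson intervals appear to reproduce the printed values (for example 48/3,746 and 6/6 in phase 1). Second, because the AI-assisted workflow only adds recalls on top of double reading, every discordant cancer is assumed to be one found by the workflow and missed by double reading, so the discordant counts are 54 − 48 = 6, 139 − 126 = 13 and 249 − 238 = 11. The function names `wilson_interval` and `mcnemar_exact` are hypothetical.

```python
from math import comb, sqrt


def wilson_interval(successes, total, z=1.959964):
    """Wilson score interval for a binomial proportion (assumed CI method)."""
    p = successes / total
    denom = 1 + z**2 / total
    centre = (p + z**2 / (2 * total)) / denom
    half = z * sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return centre - half, centre + half


def mcnemar_exact(b, c):
    """Two-sided exact (binomial) McNemar P value from discordant counts b and c."""
    n = b + c
    if n == 0:
        return 1.0
    k = min(b, c)
    # Probability of a split at least as extreme as (b, c) under p = 0.5, doubled.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2**n
    return min(1.0, 2 * tail)


# Cancers detected by each arm, taken from the CDR rows of the table.
detected = {
    "phase 1": (48, 54),
    "phase 2": (126, 139),
    "phase 3": (238, 249),
}

for phase, (double_reading, with_ai) in detected.items():
    # Assumption: no cancer is found by double reading alone, so c = 0.
    extra = with_ai - double_reading
    print(phase, "discordant pairs:", extra, "P =", mcnemar_exact(extra, 0))

# Phase 1 CDR per 1,000 screens with its interval.
low, high = wilson_interval(48, 3746)
print("CDR:", 1000 * 48 / 3746, "CI:", 1000 * low, 1000 * high)
```

The same `wilson_interval` call applies to every Num/Denom pair in the table; multiply by 1,000 for CDR and by 100 for the percentage metrics.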
