Table 2 Summary of the impact of validation data selection bias on overall AI performance (AUROC) and on the cancer detection rate (CDR) and false positive rate (FPR) estimates when the data are used for AI threshold calibration
| Characteristic | Representative population | Calibration bias | AUROC p-value | % change in CDR estimate | % change in FPR estimate |
|---|---|---|---|---|---|
| 1. Reference standard (follow-up time) | 3-year follow-up | 1-year follow-up | <0.001 | NA | NA |
| | | 2-year follow-up | <0.001 | NA | NA |
| | | 4-year follow-up | <0.001 | NA | NA |
| 2. Examination year | Representative selection for each calendar year (2008–2019) | Early cases (2008–2012) and late controls (2015–2019) | <0.001 | −2.7 to +1.1 | −13.3 to +5.9 |
| | | Early controls (2008–2012) and late cases (2015–2019) | <0.001 | +17.3 to +19.2 | +86.9 to +130 |
| 3. Population characteristics: age at examination | Representative age distribution (40–74) | Post-menopause cases (>55) and pre-menopause controls (≤55) | <0.04 | −0.2 to +3.8 | −0.2 to +18.8 |
| | | Pre-menopause cases (≤55) and post-menopause controls (>55) | <0.001 | +15.5 to +26.8 | +95 to +159.5 |
| 4. Population characteristics: mammographic density | Representative density distribution | Density A and B only | <0.02 | +1.1 to +6.9 | +4 to +35.7 |
| | | Density C and D only | <0.001 | +8.3 to +15.8 | +44.3 to +80.4 |
| | | Density A, B and C only | <0.5 | +6.2 to +6.8 | +26.9 to +35.1 |
| 5. Cancer characteristics | Representative distribution | Invasive only | <0.06 | +7.9 to +9.3 | +38.2 to +41.8 |
| | | In situ only | <0.08 | −8.9 to +11.6 | −8.9 to +11.6 |
| 6. Image acquisition (equipment manufacturer) | GE only | Philips only | <0.001 | −13.1 to −32.5 | −39.2 to −78.2 |
| | Philips only | GE only | <0.001 | +10.6 to +33.4 | +62.9 to +442 |
| | Representative GE and Philips | GE cases and Philips controls | <0.001 | +11.6 to +20.3 | +59.1 to +137.0 |
| | | Philips cases and GE controls | <0.001 | −3.8 to −18.4 | −8.1 to −59.8 |
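The "% change" columns compare CDR and FPR obtained with a threshold calibrated on biased data against those obtained with a threshold calibrated on representative data. The sketch below illustrates that comparison under stated assumptions: the threshold is chosen to meet a target FPR on the calibration controls, an exam is flagged when its AI score exceeds the threshold, and CDR is expressed per 1,000 screens. The function names, the target-FPR operating point, and the toy scores are hypothetical and are not taken from the study; the table does not specify the exact calibration procedure used.

```python
import math
from typing import Sequence, Tuple


def calibrate_threshold(control_scores: Sequence[float], target_fpr: float) -> float:
    """Return a threshold t such that the fraction of controls with score > t
    does not exceed target_fpr (assumed rule: flag an exam if score > t)."""
    ranked = sorted(control_scores, reverse=True)
    n = len(ranked)
    # Number of controls allowed above the threshold; the epsilon guards
    # against floating-point round-off in target_fpr * n.
    allowed = math.floor(target_fpr * n + 1e-9)
    if allowed >= n:
        return -math.inf  # every exam can be flagged without exceeding the target
    return ranked[allowed]


def cdr_and_fpr(case_scores: Sequence[float], control_scores: Sequence[float],
                threshold: float) -> Tuple[float, float]:
    """CDR per 1,000 screens and FPR (fraction of controls flagged) at a threshold."""
    detected = sum(s > threshold for s in case_scores)
    false_pos = sum(s > threshold for s in control_scores)
    screens = len(case_scores) + len(control_scores)
    return 1000 * detected / screens, false_pos / len(control_scores)


def pct_change(biased: float, reference: float) -> float:
    """Percentage change of the biased estimate relative to the reference."""
    return 100 * (biased - reference) / reference


if __name__ == "__main__":
    # Toy scores for illustration only; these are not data from the study.
    cases = [0.9, 0.8, 0.7, 0.6]
    controls = [0.05, 0.1, 0.2, 0.3, 0.4, 0.5, 0.55, 0.65, 0.75, 0.85]
    biased_controls = [0.05, 0.1, 0.2, 0.3, 0.4]  # only low-scoring controls

    t_rep = calibrate_threshold(controls, target_fpr=0.2)
    t_bias = calibrate_threshold(biased_controls, target_fpr=0.2)

    # Both thresholds are evaluated on the same representative population.
    cdr_rep, fpr_rep = cdr_and_fpr(cases, controls, t_rep)
    cdr_bias, fpr_bias = cdr_and_fpr(cases, controls, t_bias)
    print(f"% change in CDR: {pct_change(cdr_bias, cdr_rep):+.1f}")
    print(f"% change in FPR: {pct_change(fpr_bias, fpr_rep):+.1f}")
```

In this toy example the biased calibration set omits the higher-scoring controls, so the calibrated threshold drops from 0.65 to 0.3 and the script prints `+33.3` for CDR and `+200.0` for FPR: when that threshold is applied to the representative population, both rates are overestimated, with the FPR changing far more than the CDR.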