Table 2 Summary of the impact of validation data selection errors on overall AI performance (AUROC), and on CDR and FPR when the selected data are used for AI threshold calibration

From: Simulating mismatch between calibration and target population in AI for mammography: the retrospective VAIB study

| Characteristic | Representative population | Calibration bias | AUROC p-value | % change in CDR estimate | % change in FPR estimate |
|---|---|---|---|---|---|
| 1. Reference standard (follow-up time) | 3-year follow-up | 1-year follow-up | <0.001 | NA | NA |
| | | 2-year follow-up | <0.001 | NA | NA |
| | | 4-year follow-up | <0.001 | NA | NA |
| 2. Examination year | Representative selection for each calendar year (2008–2019) | Early cases (2008–2012) and late controls (2015–2019) | <0.001 | −2.7 to +1.1 | −13.3 to +5.9 |
| | | Early controls (2008–2012) and late cases (2015–2019) | <0.001 | +17.3 to +19.2 | +86.9 to +130 |
| 3. Population characteristics: Age at examination | Representative age distribution (40–74) | Post-menopause cases (>55) and pre-menopause controls (≤55) | <0.04 | −0.2 to +3.8 | −0.2 to +18.8 |
| | | Pre-menopause cases (≤55) and post-menopause controls (>55) | <0.001 | +15.5 to +26.8 | +95 to +159.5 |
| 4. Population characteristics: Mammographic density | Representative density distribution | Density A and B only | <0.02 | +1.1 to +6.9 | +4 to +35.7 |
| | | Density C and D only | <0.001 | +8.3 to +15.8 | +44.3 to +80.4 |
| | | Density A, B and C only | <0.5 | +6.2 to +6.8 | +26.9 to +35.1 |
| 5. Cancer characteristics | Representative distribution | Invasive only | <0.06 | +7.9 to +9.3 | +38.2 to +41.8 |
| | | In situ only | <0.08 | −8.9 to +11.6 | −8.9 to +11.6 |
| 6. Image acquisition (equipment manufacturer) | GE only | Philips only | <0.001 | −13.1 to −32.5 | −39.2 to −78.2 |
| | Philips only | GE only | <0.001 | +10.6 to +33.4 | +62.9 to +442 |
| | Representative GE and Philips | GE cases and Philips controls | <0.001 | +11.6 to +20.3 | +59.1 to +137.0 |
| | | Philips cases and GE controls | <0.001 | −3.8 to −18.4 | −8.1 to −59.8 |

  1. For CDR and FPR, the minimum-to-maximum percent change across all three AI systems is given; negative values indicate a percent reduction and positive values a percent increase.
  2. AUROC, area under the receiver operating characteristic curve; CDR, cancer detection rate; FPR, false positive rate.
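Footnote 1 reports each CDR and FPR entry as the range of percent changes across the three AI systems. The Python below is a minimal sketch of that arithmetic. It rests on two hypothetical assumptions: each system's threshold is calibrated to a fixed specificity on the calibration controls, and CDR is expressed per 1000 examinations. It shows how a threshold taken from a biased calibration set, compared with one taken from a representative set, would shift CDR and FPR on the same target population. The function names and the fixed-specificity rule are illustrative choices, not the study's published procedure.

```python
import math
from typing import Sequence, Tuple


def calibrate_threshold(control_scores: Sequence[float], specificity: float) -> float:
    """Hypothetical calibration rule (assumption, not the study's method):
    return the score threshold such that at least `specificity` of the
    calibration controls are NOT flagged, where an exam is flagged if score > threshold."""
    ordered = sorted(control_scores)
    n = len(ordered)
    # round() guards against floating-point error in specificity * n before ceil()
    k = min(n, max(1, math.ceil(round(specificity * n, 9))))
    return ordered[k - 1]


def screening_metrics(threshold: float,
                      case_scores: Sequence[float],
                      control_scores: Sequence[float]) -> Tuple[float, float]:
    """Return (CDR per 1000 examinations, FPR) for a target population."""
    flagged_cases = sum(s > threshold for s in case_scores)
    flagged_controls = sum(s > threshold for s in control_scores)
    n_exams = len(case_scores) + len(control_scores)
    cdr_per_1000 = 1000.0 * flagged_cases / n_exams
    fpr = flagged_controls / len(control_scores)
    return cdr_per_1000, fpr


def percent_change(biased: float, reference: float) -> float:
    """Percent change of an estimate relative to the representative-calibration estimate."""
    return 100.0 * (biased - reference) / reference


def change_range(biased_by_system: Sequence[float],
                 reference_by_system: Sequence[float]) -> Tuple[float, float]:
    """Minimum and maximum percent change across AI systems (as in footnote 1)."""
    changes = [percent_change(b, r)
               for b, r in zip(biased_by_system, reference_by_system)]
    return min(changes), max(changes)
```

In use, you would calibrate each system twice, once on the representative set and once on the biased set. You would then evaluate both thresholds on the same target population with `screening_metrics` and pass the per-system CDR (or FPR) values to `change_range` to get a "min to max" entry like those in the table.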