Table 3 Summary of the validation error categories along with the suggested guideline and potential erroneous data selection methods

From: Simulating mismatch between calibration and target population in AI for mammography: the retrospective VAIB study

| Validation error | Representative data | Non-representative data |
|---|---|---|
| 1. Follow-up period and reference standard | 3-year follow-up | 1-year follow-up; 2-year follow-up; 4-year follow-up |
| 3. Temporal selection | Balanced selection for each calendar year (2008–2019) | Early cases (2008–2012) and late controls (2015–2019); early controls (2008–2012) and late cases (2015–2019) |
| 5. Population characteristics: age | Representative age distribution (40–74 years) | Post-menopausal cases (age >55 years) and pre-menopausal controls (age ≤55 years); pre-menopausal cases (age ≤55 years) and post-menopausal controls (age >55 years) |
| 7. Population characteristics: mammographic density | Representative mammographic density distribution | Density A and B; density C and D; density A, B and C |
| 10. Cancer characteristics | Representative distribution | Invasive only; in situ only |
| 12. Image acquisition (equipment manufacturer) | GE data | Philips calibration |
| | Philips data | GE calibration |
| | Balanced multivendor data | GE cases and Philips controls; Philips cases and GE controls |
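Most of the non-representative selections in the table amount to applying one filter to cases and a different filter to controls. The sketch below illustrates that idea; it is not the VAIB study's code. The `Exam` record, its field names, the helpers `is_case` and `select`, and the `SCENARIOS` dictionary are all hypothetical. The sketch also assumes that an exam whose cancer is diagnosed outside the follow-up window is labeled a control. The year ranges, the 55-year age cut-off, the density groups and the vendor names are taken from the table.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

Filter = Callable[["Exam"], bool]


@dataclass
class Exam:
    """Hypothetical screening-exam record; field names are illustrative only."""
    year: int                         # calendar year of the screening exam
    age: int                          # age at screening, in years
    density: str                      # mammographic density category, "A".."D"
    vendor: str                       # equipment manufacturer, "GE" or "Philips"
    years_to_cancer: Optional[float]  # time to cancer diagnosis; None if none observed


def is_case(exam: Exam, follow_up_years: float) -> bool:
    """Assumed reference standard: a case if cancer is diagnosed within the follow-up window."""
    return exam.years_to_cancer is not None and exam.years_to_cancer <= follow_up_years


def select(exams: List[Exam], case_filter: Filter, control_filter: Filter,
           follow_up_years: float = 3.0) -> List[Exam]:
    """Keep cases that pass case_filter and controls that pass control_filter."""
    kept = []
    for exam in exams:
        keep = case_filter if is_case(exam, follow_up_years) else control_filter
        if keep(exam):
            kept.append(exam)
    return kept


def early(e: Exam) -> bool:
    return 2008 <= e.year <= 2012


def late(e: Exam) -> bool:
    return 2015 <= e.year <= 2019


# (case filter, control filter) pairs mirroring some non-representative rows of the table.
SCENARIOS: Dict[str, Tuple[Filter, Filter]] = {
    "early cases, late controls": (early, late),
    "early controls, late cases": (late, early),
    "older cases, younger controls": (lambda e: e.age > 55, lambda e: e.age <= 55),
    "younger cases, older controls": (lambda e: e.age <= 55, lambda e: e.age > 55),
    "density A and B": (lambda e: e.density in {"A", "B"},) * 2,
    "GE cases, Philips controls": (lambda e: e.vendor == "GE", lambda e: e.vendor == "Philips"),
    "Philips cases, GE controls": (lambda e: e.vendor == "Philips", lambda e: e.vendor == "GE"),
}

# Usage on a toy set of four exams (made-up values, for illustration only).
exams = [
    Exam(2009, 60, "B", "GE", 0.5),        # case under both 3- and 1-year follow-up
    Exam(2016, 50, "C", "Philips", None),  # control
    Exam(2017, 62, "A", "GE", 2.5),        # case with 3-year follow-up, control with 1-year
    Exam(2010, 45, "D", "Philips", None),  # control
]
case_f, control_f = SCENARIOS["early cases, late controls"]
subset = select(exams, case_f, control_f)                          # 3-year reference standard
subset_1y = select(exams, case_f, control_f, follow_up_years=1.0)  # shorter follow-up
```

Shortening `follow_up_years` turns the 2017 exam from a case into a control. This illustrates how a shorter follow-up period can silently relabel cancers as negatives. Applying the same filter to cases and controls, as in the density entry, restricts the whole population rather than making the case and control groups differ.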