Fig. 1: Schematic overview of sampling regimes for performance assessment in the entire target population of images or in specific subsets.

Overall performance assessment requires a representative sample along all dimensions of variability, whereas relevant subsets are typically limited along one dimension (e.g., age range or scanner type).