Fig. 1: Key challenges in machine learning for medical imaging.

a Data scarcity and b–d data mismatch. X represents images and Y, annotations (e.g. diagnosis labels). Ptr refers to the distribution of data available for training a predictive model, and Pte is the test distribution, i.e. data that will be encountered once the model is deployed. Dots represent data points with any label, while circles and crosses indicate images with different labels (e.g. cases vs. controls).