Fig. 3: ML models predict image swelling within the range of human performance.

a Per-image results from the labeling survey, demonstrating that both ML models consistently predict image swelling within the range generated by six human experts. The horizontal jitter of the Expert results (red triangles) is artificial and was added to prevent data points from overlapping. b Distributions of the predicted normalized swelling values from the human expert labeling survey and from both ML models on the Cavity Test set and Survey set. Image confidence filtering was applied to both ML models. Experimental-ML: n = 21; Synthetic-ML: n = 20; Human Experts: n = 30 (6 experts × 5 images each).