Extended Data Fig. 9: Inter-reader Dice score versus the Dice score of the best-performing algorithm (ensemble) on a subset of challenge samples.

Overall, the quantitative metrics suggest that the second reader performs similarly to the ensemble model. FP volume = false positive volume, FN volume = false negative volume, NSD = Normalized Surface Dice.
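The caption names these metrics without defining them. The block below is a minimal sketch that assumes binary masks are stored as sets of voxel indices with a known voxel spacing. It computes Dice, voxel-wise FP and FN volumes, and a simplified NSD. The caption does not give the challenge's exact definitions. FP/FN volumes may instead be computed per connected lesion, the NSD tolerance is left as a free parameter, and this NSD approximates each surface by its border voxels rather than by area-weighted surface elements. The `box` helper and the empty-mask conventions are hypothetical illustrations.

```python
import math
from typing import Set, Tuple

# Hypothetical representation: a binary mask is the set of its foreground voxel indices (z, y, x).
Voxel = Tuple[int, int, int]
Spacing = Tuple[float, float, float]  # physical voxel size along (z, y, x), e.g. in mm

# 6-connected neighbourhood offsets used to find border voxels.
_NEIGHBOURS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def dice(pred: Set[Voxel], gt: Set[Voxel]) -> float:
    """Dice = 2|P ∩ G| / (|P| + |G|); assumed to be 1.0 when both masks are empty."""
    denom = len(pred) + len(gt)
    return 1.0 if denom == 0 else 2.0 * len(pred & gt) / denom


def _voxel_volume(spacing: Spacing) -> float:
    return spacing[0] * spacing[1] * spacing[2]


# Assumption: voxel-wise FP/FN volumes (the challenge may define them per connected component).
def fp_volume(pred: Set[Voxel], gt: Set[Voxel], spacing: Spacing) -> float:
    """Volume of predicted voxels absent from the reference."""
    return len(pred - gt) * _voxel_volume(spacing)


def fn_volume(pred: Set[Voxel], gt: Set[Voxel], spacing: Spacing) -> float:
    """Volume of reference voxels missed by the prediction."""
    return len(gt - pred) * _voxel_volume(spacing)


def border(mask: Set[Voxel]) -> Set[Voxel]:
    """Foreground voxels with at least one 6-connected neighbour in the background."""
    return {
        v for v in mask
        if any((v[0] + d[0], v[1] + d[1], v[2] + d[2]) not in mask for d in _NEIGHBOURS)
    }


def _distance(a: Voxel, b: Voxel, spacing: Spacing) -> float:
    return math.sqrt(sum(((a[i] - b[i]) * spacing[i]) ** 2 for i in range(3)))


def nsd(pred: Set[Voxel], gt: Set[Voxel], spacing: Spacing, tolerance: float) -> float:
    """Simplified NSD: share of border voxels (of both masks) within `tolerance` of the other border."""
    bp, bg = border(pred), border(gt)
    if not bp and not bg:
        return 1.0
    if not bp or not bg:
        return 0.0
    # Brute-force nearest-border search; fine for small illustrative masks only.
    close_p = sum(1 for p in bp if min(_distance(p, g, spacing) for g in bg) <= tolerance)
    close_g = sum(1 for g in bg if min(_distance(g, p, spacing) for p in bp) <= tolerance)
    return (close_p + close_g) / (len(bp) + len(bg))


def box(origin: Voxel, size: int) -> Set[Voxel]:
    """Hypothetical helper: a solid cube of size**3 voxels starting at `origin`."""
    z0, y0, x0 = origin
    return {
        (z, y, x)
        for z in range(z0, z0 + size)
        for y in range(y0, y0 + size)
        for x in range(x0, x0 + size)
    }


if __name__ == "__main__":
    gt = box((0, 0, 0), 3)
    pred = box((0, 0, 1), 3)  # same cube shifted by one voxel along x
    unit = (1.0, 1.0, 1.0)
    print(dice(pred, gt))  # 0.666...
    print(fp_volume(pred, gt, unit), fn_volume(pred, gt, unit))  # 9.0 9.0
    print(nsd(pred, gt, unit, tolerance=1.0))  # 1.0
    print(nsd(pred, gt, unit, tolerance=0.5))  # 32 / 52, about 0.615
```

In the usage example, a one-voxel shift costs a third of the Dice score. NSD, however, is perfect at a 1-voxel tolerance and only drops below 1 once the tolerance is tighter than the shift. This illustrates why NSD complements Dice when boundaries are compared. Practical implementations usually replace the quadratic border search with distance transforms.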