Fig. 4: Comparative analysis of hematopathologist assessments of real and generated images.

Three hematopathologists were presented with 80 real or synthetic image tiles representing ET or prePMF and were asked to diagnose ET or prePMF, abstain, or propose an alternative diagnosis. The manually selected images consisted of tiles correctly predicted by the AI classifier. a, b Across the full set of surveyed images, hematopathologists diagnosed real images as ET or prePMF more often than generated ones (80.0% for real vs 55.0% for generated, p < 0.001). Among images for which pathologists rendered a diagnosis, accuracy was higher for real images (76.0% vs 53.0%, p < 0.01). c, d Within the subset of real images, the rate at which a diagnosis was assigned did not differ significantly between prePMF and ET (73.3% for prePMF vs 86.7% for ET, p = 0.1). When hematopathologists assigned a diagnosis to real images, accuracy did not differ significantly by condition (75.0% for prePMF vs 76.9% for ET, p = 1). e, f Within the subset of generated image tiles, a higher proportion of prePMF images than ET images received a diagnosis from hematopathologists (68.3% for prePMF vs 41.7% for ET, p < 0.01). Moreover, among generated images with an assigned diagnosis, accurate diagnoses were substantially more frequent for prePMF images (78.0% for prePMF vs 12.0% for ET, p < 0.001).
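
The pairwise comparisons above can be illustrated with a minimal sketch. The caption does not report raw counts or name the statistical test, so both are assumptions here: the counts below are inferred from the reported percentages under the assumption of 20 tiles per condition and source, each rated by all three hematopathologists (60 assessments per cell), and a two-sided Fisher's exact test is used purely for illustration. The names `COUNTS` and `fisher_exact_two_sided` are hypothetical, and the resulting p-values need not match those reported.

```python
from math import comb

# ASSUMED counts, inferred from the reported percentages (not given in the caption).
# Each entry: (assessments, diagnoses assigned, correct diagnoses).
COUNTS = {
    ("real", "prePMF"): (60, 44, 33),       # 73.3% diagnosed, 75.0% correct
    ("real", "ET"): (60, 52, 40),           # 86.7% diagnosed, 76.9% correct
    ("generated", "prePMF"): (60, 41, 32),  # 68.3% diagnosed, 78.0% correct
    ("generated", "ET"): (60, 25, 3),       # 41.7% diagnosed, 12.0% correct
}


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed table. Integer
    weights are compared exactly to avoid floating-point ties.
    """
    row1, row2, col1 = a + b, c + d, a + c
    lowest, highest = max(0, col1 - row2), min(row1, col1)

    def weight(x):
        # Unnormalised hypergeometric probability of x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x)

    observed = weight(a)
    tail = sum(
        weight(x) for x in range(lowest, highest + 1) if weight(x) <= observed
    )
    return tail / comb(row1 + row2, col1)


def pooled(source):
    """Pool the ET and prePMF counts for one image source."""
    cells = [v for (src, _), v in COUNTS.items() if src == source]
    return tuple(sum(values) for values in zip(*cells))


# Panel a: diagnosis assigned vs not assigned, real vs generated.
n_real, dx_real, ok_real = pooled("real")
n_gen, dx_gen, ok_gen = pooled("generated")
p_diagnosed = fisher_exact_two_sided(
    dx_real, n_real - dx_real, dx_gen, n_gen - dx_gen
)

# Panel f: correct vs incorrect among diagnosed generated tiles, prePMF vs ET.
_, dx_pmf, ok_pmf = COUNTS[("generated", "prePMF")]
_, dx_et, ok_et = COUNTS[("generated", "ET")]
p_generated_accuracy = fisher_exact_two_sided(
    ok_pmf, dx_pmf - ok_pmf, ok_et, dx_et - ok_et
)
```

Under these assumed counts, the pooled proportions reproduce the reported 80.0% vs 55.0% (diagnosis assigned) and 76.0% vs 53.0% (accuracy) figures, which is a useful consistency check on the inferred denominators, though it does not confirm them.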