Supplementary Figure 2: Individual player performance in Project Discovery

(a) Individual player accuracies (dots) for players with a minimum of 10 image evaluations show that player accuracy generally increases as players evaluate more samples (contour). Although ~10% of players perform worse than naively guessing the most common class (Cytoplasm, blue dots), the consensus accuracy (black line) remains markedly higher than the player average. Although a large number of poorly performing players drop off after roughly 100 samples, player performance shows remarkably little improvement with the number of samples analyzed. (b) Player performance vs. time spent per task (seconds) shows no discernible trend. This measure is confounded by time that players spent on other in-game actions while the interface was open.