Fig. 1: A statistically significant group mean does not imply that all, or even most, participants in the population show an effect.
From: Bayesian p-curve mixture models as a tool to dissociate effect size and effect prevalence

a For example, this histogram shows 10,000 simulated draws from a population in which 30% of participants show an effect with a group-level effect size of Cohen’s d = 4 and all other participants show no effect (Cohen’s d = 0). The two underlying sampling distributions are overlaid with their means, as is the mean (expected value) of the full population. A null hypothesis significance test on these data would rightly reject the null hypothesis that the population mean is equal to zero. b For illustration, a similar simulation is shown in which the prevalence of the effect is 70%. c,d Same as the top panels, but the effect size among participants who show the effect has been reduced to Cohen’s d = 2. Although this is still a very large effect by conventional standards, the histogram is no longer clearly bimodal. Nevertheless, the population mean differs from zero even when only a minority of participants show the effect. Because such a result is not a false positive (the true population mean does, in fact, differ from zero), collecting more data makes it more, not less, likely that a group-level null hypothesis test will reject the null hypothesis as a result of a low-prevalence effect.
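The mixture described above can be sketched in a few lines. This is a minimal illustration, not the authors' simulation code: it assumes that both components are normal with unit standard deviation, so that the effect size is expressed directly in Cohen's d units. The function names `simulate_mixture`, `one_sample_t`, and `approx_power` are hypothetical. With prevalence π and effect size d, the population mean is π·d and the mixture variance is 1 + π(1 − π)d². The power function uses a large-sample normal approximation rather than an exact t distribution, which is enough to show that power grows with sample size whenever the population mean is nonzero.

```python
import math
import random
import statistics
from statistics import NormalDist


def simulate_mixture(n, prevalence, effect_d, seed=0):
    """Hypothetical helper: draw n values from a two-component mixture.

    With probability `prevalence` a draw comes from N(effect_d, 1) (an
    "effect" participant); otherwise it comes from N(0, 1) (no effect).
    Unit within-component SD is an assumption of this sketch.
    """
    rng = random.Random(seed)
    return [rng.gauss(effect_d if rng.random() < prevalence else 0.0, 1.0)
            for _ in range(n)]


def one_sample_t(xs):
    """One-sample t statistic for H0: population mean == 0."""
    return statistics.fmean(xs) / (statistics.stdev(xs) / math.sqrt(len(xs)))


def approx_power(n, prevalence, effect_d, alpha=0.05):
    """Hypothetical helper: approximate two-sided power of a test of mean == 0.

    Uses the mixture's true mean (prevalence * d) and SD
    sqrt(1 + p(1 - p)d^2), plus a normal (z) approximation to the test.
    """
    mean = prevalence * effect_d
    sd = math.sqrt(1.0 + prevalence * (1.0 - prevalence) * effect_d ** 2)
    z_crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    shift = mean * math.sqrt(n) / sd  # expected standardized test statistic
    return NormalDist().cdf(shift - z_crit) + NormalDist().cdf(-shift - z_crit)


if __name__ == "__main__":
    # Panel a setting: 30% prevalence, d = 4; expected mean is 0.3 * 4 = 1.2.
    xs = simulate_mixture(10_000, prevalence=0.3, effect_d=4.0)
    print("sample mean:", statistics.fmean(xs))
    print("t statistic:", one_sample_t(xs))
    # Panel c setting (30% prevalence, d = 2): power rises with sample size.
    for n in (10, 20, 50, 100):
        print(n, round(approx_power(n, prevalence=0.3, effect_d=2.0), 3))
```

Because the low-prevalence population has a genuinely nonzero mean, the expected test statistic scales with √n, so the approximate power climbs toward 1 as n increases. This mirrors the caption's point that more data makes rejection more likely, not less.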