Supplementary Figure 6: Non-random sampling of genes in the 1,766 tested gene sets.

(a) Genes are ranked and plotted by the number of gene sets to which they belong. The top 1,000 genes were over-represented in gene sets from public databases, and genes ranked outside the top 5,000 were under-represented. (b) Distribution of overlap coefficients with the set of loss-of-function-intolerant genes. The overlap coefficient between each of the 1,766 discovery gene sets and the set of loss-of-function-intolerant genes was calculated. Overlap coefficients between randomly sampled gene sets and the loss-of-function-intolerant genes were computed in the same way. The two sets of values are displayed as density plots. The overlap coefficient is a similarity measure defined as overlap(X, Y) = |X ∩ Y| / min(|X|, |Y|), where X and Y are sets of genes.
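
The comparison in panel (b) can be illustrated with a minimal Python sketch. It uses the standard definition of the overlap coefficient, the size of the intersection divided by the size of the smaller set. The function names, the `background` gene list, and the choice to draw each random set at the same size as its matching discovery gene set are assumptions made for illustration; the caption does not specify how the random gene sets were sampled.

```python
import random


def overlap_coefficient(x, y):
    """Overlap coefficient: |X & Y| / min(|X|, |Y|)."""
    x, y = set(x), set(y)
    if not x or not y:
        # Undefined for empty sets; return 0.0 by convention (assumption).
        return 0.0
    return len(x & y) / min(len(x), len(y))


def observed_overlaps(gene_sets, reference):
    """Overlap of each discovery gene set with the reference set."""
    return [overlap_coefficient(gs, reference) for gs in gene_sets]


def random_overlaps(gene_sets, reference, background, seed=0):
    """Overlap of size-matched random gene sets with the reference set.

    Assumption: each random set is drawn without replacement from
    `background` and has the same size as the corresponding gene set,
    so every gene set must be no larger than the background.
    """
    rng = random.Random(seed)
    pool = sorted(set(background))  # sorted for reproducible sampling
    return [
        overlap_coefficient(rng.sample(pool, len(gs)), reference)
        for gs in gene_sets
    ]
```

The two returned lists correspond to the two distributions that would be drawn as density plots, for example with a kernel density estimate from a plotting library of choice.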