Fig. 2

Functional relevance of iCells. a For each of the four cancers and four control tissues, we created clusters of genes either by integrating all datasets (iCell, in pink) or by considering each dataset in isolation (PPI in red, COEX in blue, and GI in green), as detailed in Methods. For each clustering, the bars show the percentage of Reactome pathway (RP)-annotated genes that have at least one annotation enriched in their clusters. b The same as a, but averaged over all tissues and shown for each of the Reactome pathway (RP), KEGG pathway (KP), and Gene Ontology biological process (GO-BP) annotations. Error bars show the standard deviations across tissues. c Each line shows the fit of a network model (color coded: Erdős–Rényi (ER), generalized random (ER-DD), geometric (GEO), geometric with gene duplication (GEO-GD), scale-free (SF), scale-free with gene duplication and divergence (SF-GD), and stickiness-index-based (STICKY)) to the different types of real-world networks (x-axis). The error bars show the averages and standard deviations of the pairwise GCD distances between the real networks and 840 randomly generated networks of the same size as the real networks. All networks, apart from the GI networks (which are the sparsest), are structured, in the sense that their GCD distances from ER networks are larger than their distances from the other network models. d The distribution of distances between iCell and ER networks (in pink) and the distribution of distances between ER networks (in blue) differ significantly (Mann–Whitney U test, p value ≤ 0.05), indicating that the ER model does not fit the iCell, i.e., that the structure (topology) of the iCell is not random.
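The quantity plotted in panel a can be illustrated with a minimal sketch. It assumes that enrichment is assessed per cluster with a one-sided hypergeometric test at a fixed threshold; the exact procedure, including any multiple-testing correction, is the one given in Methods and is not reproduced here. The function names, the `alpha` parameter, and the toy gene and term labels are hypothetical.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) when drawing n genes from N, of which K carry the term."""
    total = comb(N, n)
    upper = min(K, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total


def percent_enriched_genes(clusters, annotations, alpha=0.05):
    """Percentage of annotated genes with at least one annotation
    enriched in their own cluster.

    clusters:    list of sets of gene names
    annotations: dict mapping gene name -> set of annotation terms
    alpha:       p-value threshold (assumption: no multiple-testing correction)
    """
    annotated = {g for g, terms in annotations.items() if terms}
    N = len(annotated)
    if N == 0:
        return 0.0

    # Number of annotated genes carrying each term (background counts).
    term_counts = {}
    for g in annotated:
        for t in annotations[g]:
            term_counts[t] = term_counts.get(t, 0) + 1

    covered = set()
    for cluster in clusters:
        members = cluster & annotated
        n = len(members)
        if n == 0:
            continue
        # Genes of this cluster grouped by the terms they carry.
        by_term = {}
        for g in members:
            for t in annotations[g]:
                by_term.setdefault(t, set()).add(g)
        for t, genes in by_term.items():
            if hypergeom_sf(len(genes), N, term_counts[t], n) <= alpha:
                covered |= genes  # these genes have an enriched annotation

    return 100.0 * len(covered) / N


# Toy data: two terms, perfectly separated by the clustering.
toy_annotations = {f"g{i}": {"A"} for i in range(1, 5)}
toy_annotations.update({f"g{i}": {"B"} for i in range(5, 11)})
toy_clusters = [{"g1", "g2", "g3", "g4"}, {"g5", "g6", "g7", "g8", "g9", "g10"}]
print(percent_enriched_genes(toy_clusters, toy_annotations))  # 100.0
```

In this toy case every cluster matches one term exactly, so all annotated genes are counted; a clustering that mixes the two terms would yield a lower percentage.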