Fig. 1

Deconvolution and estimation of histomorphological niches within bulk glioblastoma proteomes. (A) Schematic overview of our methodology for estimating niche proportions from reference microdissected proteomic profiles and classifying bulk tumor samples with a random forest algorithm. Hematoxylin and eosin (H&E) images show the anatomical niches within GBM: leading edge (LE), infiltrating tumor (IT), cellular tumor (CT), microvascular proliferations (MVP), and palisading cells around necrosis (PAN). (B) Multidimensional scaling of CPTAC samples based on all proteins, using principal component analysis, shows distinct grouping of TCGA subtypes (n = 110). (C) Gene Set Enrichment Analysis (GSEA) comparing each sample group against all other sample types highlights pathway similarities between the normal brain samples and the proneural subgroup. The normalized enrichment score (NES) is derived from the GSEA output and accounts for differences in gene set size and in correlations between gene sets and the expression dataset. (D) A random forest algorithm trained on a proteomic dataset of histomorphological features classifies CPTAC proteomic samples into niche-like signatures. Each case is assigned to the niche with the largest contribution, so the machine learning classification on the x-axis represents the most abundant feature. (E) A stacked bar chart shows the variability of decision tree probabilities across the tumor and normal brain samples (n = 108). Machine learning-classified proteomes are concordant with H&E slide images for (F) LE, (G) MVP, (H) CT, (I) IT, and (J) PAN-like signatures. These H&E images are representative sections, not whole-slide images. Source data are provided as a Source Data file.
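The classification logic of panels D and E can be sketched as follows, assuming scikit-learn's `RandomForestClassifier`. The authors' actual software, protein features and hyperparameters are not given in this legend, so `classify_niches`, `simulate`, `n_trees=500` and the synthetic data are hypothetical. With default settings the trees grow until their leaves are pure, so `predict_proba` is effectively the fraction of trees voting for each niche (the per-sample bars in E), and taking the argmax gives the major-niche call (D).

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Niche labels as abbreviated in the legend; alphabetical, so they match clf.classes_.
NICHES = ["CT", "IT", "LE", "MVP", "PAN"]


def classify_niches(ref_X, ref_labels, bulk_X, n_trees=500, seed=0):
    """Hypothetical helper: fit a random forest on reference (microdissected)
    profiles, then score bulk samples.

    Returns per-sample niche probabilities (rows sum to 1) and, for each bulk
    sample, the niche with the largest contribution.
    """
    clf = RandomForestClassifier(n_estimators=n_trees, random_state=seed)
    clf.fit(ref_X, ref_labels)
    # Mean of the per-tree class probabilities across the forest.
    proba = clf.predict_proba(bulk_X)
    # Assign each bulk sample to its major niche contribution.
    calls = clf.classes_[np.argmax(proba, axis=1)]
    return proba, calls


# Hypothetical synthetic data: 50 "proteins"; each niche up-regulates its own block of 10.
rng = np.random.default_rng(0)
n_per_niche, n_proteins, block = 10, 50, 10


def simulate(niche_idx, n):
    """Draw n noisy profiles whose block for niche_idx is shifted upward."""
    x = rng.normal(0.0, 1.0, size=(n, n_proteins))
    x[:, niche_idx * block:(niche_idx + 1) * block] += 5.0
    return x


ref_X = np.vstack([simulate(k, n_per_niche) for k in range(len(NICHES))])
ref_labels = np.repeat(NICHES, n_per_niche)
# One stand-in "bulk" sample per niche (real bulk proteomes would be mixtures).
bulk_X = np.vstack([simulate(k, 1) for k in range(len(NICHES))])

proba, calls = classify_niches(ref_X, ref_labels, bulk_X)
for sample, (p, call) in enumerate(zip(proba, calls)):
    print(f"sample {sample}: call={call}, " + ", ".join(f"{n}={v:.2f}" for n, v in zip(NICHES, p)))
```

The synthetic profiles are illustrative only and do not represent CPTAC or reference data. For real bulk tumors, which mix several niches, the spread of probabilities across niches (as in panel E) is likely more informative than the single call.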