Fig. 4: Proteome-grouping pattern associated with nodal metastasis and refinement of targets for ML analysis.
From: Connecting multiple microenvironment proteomes uncovers the biology in head and neck cancer

a Clustering of tissues and fluids based on the global proteomic profile (C1 and C2 clusters) (n = 59 patients). C1 and C2 groups were generated using complete linkage with Canberra distance (25 primary tumor – malignant samples: 2444 proteins), and Ward linkage with Chebyshev distance (27 primary tumor – non-malignant: 1984 proteins, 27 lymph node – non-malignant: 2137 proteins, 24 buffy coat: 2188 proteins, 24 saliva samples: 1154 proteins). b Association between lymph node metastasis and patient clustering for non-malignant cells from the lymph node (p = 0.046; two-sided Fisher’s exact test). *p ≤ 0.05. c PCA plots presenting clusters of samples based on the abundance of proteins that were differentially abundant between pN+ and pN0 (p ≤ 0.05; two-sided unpaired Student’s t-test or proteins detected exclusively in one group; 201, 110, 85, 80, and 54 proteins from non-malignant cells from lymph nodes, malignant cells from primary tumor, non-malignant cells from primary tumor, buffy coat samples, and saliva samples, respectively). d Volcano plot for the differential protein abundance between pN+ and pN0 non-malignant cells from lymph nodes. Differentially abundant proteins are presented as blue and orange dots (pN+ vs pN0; q ≤ 0.05; two-sided unpaired Student’s t-test followed by Benjamini-Hochberg correction or proteins detected exclusively in one group in at least 50% of samples). e Top-10 GO biological processes that were significantly enriched for the 13 proteins associated with lymph node metastasis from d (Enrichment FDR ≤ 0.05; hypergeometric test followed by FDR correction). f AUC distribution per classifier using ML analysis of SRSF1, SRSF2, SRSF3, SRSF5, TRA2A, and CD209 proteins (upper panel) and transcripts (lower panel). Details about the AUCs plotted for each classifier are available within Supplementary Data 5-2 and 5-3. Boxplots show the median (central line), the 25–75% interquartile range (IQR) (box limits), and the ±1.5×IQR (whiskers). Source data are provided as a Source Data file.
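
The q ≤ 0.05 threshold in panel d comes from adjusting per-protein t-test p-values with the Benjamini-Hochberg procedure. The sketch below illustrates that adjustment in plain Python. It is not the authors' pipeline: the function name `benjamini_hochberg` and the p-values in `toy_p` are assumptions made up for illustration, not data from the study.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values (q-values) in input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so each q-value is never
    # larger than the q-value of any less significant test.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q_values[i] = running_min
    return q_values


# Hypothetical p-values, invented for illustration only (not from the study).
toy_p = [0.001, 0.008, 0.039, 0.041, 0.20]
toy_q = benjamini_hochberg(toy_p)

# Proteins that would pass a q <= 0.05 cutoff like the one used in panel d.
significant = [q <= 0.05 for q in toy_q]
```

With these toy values, the third and fourth tests have raw p-values below 0.05 but adjusted q-values just above it, which is why a q-value cutoff (panel d) typically retains fewer proteins than a raw p-value cutoff (panel c).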