Fig. 6

Immune clusters are associated with EMT and proliferation, two mutually exclusive phenotypes in breast cancer. a Genes overexpressed in Cluster B were defined using Bonferroni-corrected differential expression analysis (Cluster B vs Cluster A and Cluster B vs Cluster C). Genes with significantly higher expression in Cluster B were used in a gene set enrichment analysis against the C2 (white histograms) and H (gray histograms) collections of MSigDB. The −log10 p values of the hypergeometric test are presented. The five most enriched processes in each collection are denoted. b Samples from each cohort (15 cohorts; 6101 samples) were scored using the GSVA Bioconductor package for enrichment in 12 pathways related to proliferation, EMT, and stem cells (Supplementary Data 3). Average enrichment scores were calculated per immune cluster and cohort. Unsupervised clustering using the maximum distance method and ward.D2 linkage shows that pathway enrichment scores recapitulate the immune clusters. The numbers of samples in each cohort and immune cluster are denoted. The immune cluster from which each median score originates is annotated. c Estimates from univariate logistic regression and their 95% confidence intervals (CI) are shown as a forest plot to assess which gene set signature scores calculated using GSVA are associated with the poor-prognosis cluster (Cluster B) vs Clusters A and C. Box size is inversely proportional to the width of the confidence interval. Asterisks denote FDR-corrected p value < 0.05. d Correlation plots represent all significant (FDR p value < 0.05) Spearman correlations between gene set signature scores and inferred immune infiltration at the tumor site, as calculated using the CIBERSORT algorithm. Dot color indicates positive (blue) or negative (red) correlations. Dot size is proportional to the Spearman rho value.
e Unsupervised clustering of 1318 Cluster B samples from 15 cohorts according to the gene set signature scores, using correlation distance and the ward.D method, separates Cluster B samples into those with an EMT phenotype, Cluster B1 (green), and those with a proliferative phenotype, Cluster B2 (orange). PAM50 subtypes and ER status are annotated at the top of the heatmap. f, g Kaplan–Meier survival curves for Cluster B1 (green) and Cluster B2 (orange) in all METABRIC (f) and TCGA (g) samples. p values are from log-rank tests. The Kaplan–Meier curves display breast cancer-specific survival for METABRIC and relapse-free survival for TCGA.
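
As an illustration of the enrichment statistic plotted in panel a, the following is a minimal sketch of an upper-tail hypergeometric test and its −log10 transform. It is not the authors' pipeline: the function name `hypergeom_enrichment_p` and all counts (overlap, gene set size, query size, gene universe size) are hypothetical values chosen only to show the calculation.

```python
from math import comb, log10


def hypergeom_enrichment_p(overlap, gene_set_size, query_size, universe_size):
    """Upper-tail hypergeometric p value, P(X >= overlap).

    X is the number of query genes that fall in a gene set when
    `query_size` genes are drawn without replacement from a universe of
    `universe_size` genes, of which `gene_set_size` belong to the set.
    """
    max_overlap = min(gene_set_size, query_size)
    total = comb(universe_size, query_size)
    tail = sum(
        comb(gene_set_size, k) * comb(universe_size - gene_set_size, query_size - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / total


# Hypothetical counts (assumptions, not values from the study):
# 150 genes overexpressed in Cluster B, a 200-gene pathway,
# 12 genes shared, and a universe of 20000 genes.
p = hypergeom_enrichment_p(
    overlap=12, gene_set_size=200, query_size=150, universe_size=20000
)
neg_log10_p = -log10(p)  # the quantity shown on the histogram axis
print(f"p = {p:.3g}, -log10(p) = {neg_log10_p:.2f}")
```

Exact integer arithmetic via `math.comb` keeps the sketch dependency-free. For very small p values the result can underflow to zero, in which case a log-scale implementation from a statistics library would be the safer choice.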