Fig. 2: Clustering of lncRNA into relevant pathways for breast cancer.
From: Subtype and cell type specific expression of lncRNAs provide insight into breast cancer

a Hierarchical clustering of lncRNA–mRNA Spearman correlation values (positive correlations in red, negative correlations in blue) from the co-expression analysis between lncRNAs (n = 4108) and protein-coding mRNAs (n = 17060). Only lncRNA–mRNA pairs with a significant correlation (Bonferroni-adjusted p-value < 0.05) and Spearman's rho < −0.4 or > 0.4 in both the TCGA (n = 1095) and SCAN-B (n = 3455) cohorts are used in the unsupervised clustering. In addition, only lncRNAs and mRNAs with a number of associations higher than the mean number of associations are plotted (Supplementary Fig. 4). Clusters are defined using cutree_rows = 3 and cutree_cols = 3. lncRNAs (x-axis) are annotated according to the differential expression analysis (Fig. 1). b–d Bar plots showing −log(FDR q-value) (y-axis) from a hypergeometric test for gene set enrichment analysis (GSEA) using the Hallmark pathways of the MSigDB database. Input genes for GSEA are the genes from mRNA cluster A (n = 2890) (b), mRNA cluster B (n = 1480) (c), and mRNA cluster C (n = 667) (d). e–g Boxplots of the coefficients from generalized linear models of lncRNA expression in the SCAN-B cohort. Each model includes three variables: ESR1 mRNA expression, reflecting estrogen signaling (e); fibroblast score, inferring the fibroblast content of the tumor (f); and lymphocyte score, inferring lymphocyte infiltration (g). Each dot represents the coefficient of one variable for one lncRNA in cluster 1 (n = 610), cluster 2 (n = 199), or cluster 3 (n = 110). Kruskal–Wallis test p-values are shown. The line within each box represents the median. The upper and lower edges of each box represent the 75th and 25th percentiles, respectively. The whiskers extend to the lowest datum still within 1.5 × (75th − 25th percentile) of the lower quartile and to the highest datum still within 1.5 × (75th − 25th percentile) of the upper quartile.
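The pair selection in panel a can be sketched as a filter over per-cohort test results. This is a minimal sketch and not the authors' code. It makes several assumptions:

- Spearman's rho and unadjusted p-values have already been computed for every lncRNA–mRNA pair in each cohort.
- The Bonferroni correction multiplies each p-value by the total number of pairs tested.
- The "mean number of associations" is taken over genes with at least one retained pair.

The `PairResult` type and all helper names are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass


# Hypothetical record for one lncRNA-mRNA test in one cohort.
@dataclass(frozen=True)
class PairResult:
    rho: float      # Spearman's rho
    p_value: float  # unadjusted p-value


def passes(res: PairResult, n_tests: int, rho_cut: float = 0.4, alpha: float = 0.05) -> bool:
    """Bonferroni-adjusted p < alpha and |rho| > rho_cut (rho < -0.4 or rho > 0.4)."""
    p_bonf = min(1.0, res.p_value * n_tests)
    return p_bonf < alpha and abs(res.rho) > rho_cut


def pairs_in_both(tcga: dict, scanb: dict, n_tests: int) -> set:
    """Keep (lncRNA, mRNA) keys that pass the filter in BOTH cohorts."""
    return {
        key
        for key, res in tcga.items()
        if key in scanb and passes(res, n_tests) and passes(scanb[key], n_tests)
    }


def above_mean_degree(pairs: set) -> tuple:
    """lncRNAs and mRNAs whose number of retained pairs exceeds the mean number.

    Assumption: the mean is taken over genes with at least one retained pair.
    """
    lnc_deg = Counter(lnc for lnc, _ in pairs)
    mrna_deg = Counter(mrna for _, mrna in pairs)
    lnc_mean = sum(lnc_deg.values()) / len(lnc_deg)
    mrna_mean = sum(mrna_deg.values()) / len(mrna_deg)
    return (
        {g for g, d in lnc_deg.items() if d > lnc_mean},
        {g for g, d in mrna_deg.items() if d > mrna_mean},
    )
```

If every pair is tested in each cohort, `n_tests` would be 4108 × 17060. That count is also an assumption, since the caption does not state the size of the correction.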
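For panels b–d, an over-representation p-value for one Hallmark gene set can be computed from the upper tail of the hypergeometric distribution. The p-values across gene sets are then adjusted to FDR q-values. This is a minimal sketch that assumes the following:

- The background universe is a fixed gene list, for example all 17060 protein-coding mRNAs.
- The FDR adjustment is the Benjamini–Hochberg procedure.
- The logarithm on the y-axis is base 10.

The function names are hypothetical.

```python
import math


def hypergeom_upper_tail(k: int, N: int, K: int, n: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(N background genes, K genes in the set, n input genes).

    k is the observed overlap between the input genes and the gene set.
    """
    hi = min(K, n)
    # math.comb returns 0 when the lower argument exceeds the upper one,
    # so impossible overlaps contribute nothing to the sum.
    tail = sum(math.comb(K, i) * math.comb(N - K, n - i) for i in range(k, hi + 1))
    return tail / math.comb(N, n)  # exact integer arithmetic, then one division


def bh_qvalues(pvals: list) -> list:
    """Benjamini-Hochberg step-up adjusted p-values (FDR q-values)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    q = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        q[i] = running_min
    return q


def neg_log10(q: float) -> float:
    """Bar height, assuming a base-10 logarithm."""
    return -math.log10(q)
```

Exact integer binomials avoid underflow for large gene lists, at some cost in speed. A library routine such as `scipy.stats.hypergeom.sf(k - 1, N, K, n)` gives the same tail probability.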
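Panels e–g compare the coefficient distributions of three lncRNA clusters with a Kruskal–Wallis test. The sketch below computes the H statistic with the usual tie correction. With three groups, the chi-square approximation has 2 degrees of freedom, and its survival function is exactly exp(−H/2).

This is a pure-Python illustration under those assumptions, not the authors' implementation. In practice a library routine such as `scipy.stats.kruskal` would normally be used.

```python
import math


def kruskal_wallis_3(g1, g2, g3):
    """Kruskal-Wallis H (tie-corrected) and chi-square p-value for three groups."""
    groups = [list(g1), list(g2), list(g3)]
    pooled = sorted((v, gi) for gi, g in enumerate(groups) for v in g)
    n = len(pooled)
    rank_sums = [0.0, 0.0, 0.0]
    tie_term = 0  # sum of (t^3 - t) over blocks of tied values
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        t = j - i + 1
        tie_term += t ** 3 - t
        for k in range(i, j + 1):
            rank_sums[pooled[k][1]] += avg_rank
        i = j + 1
    if tie_term == n ** 3 - n:
        raise ValueError("all values are identical")
    h = 12 / (n * (n + 1)) * sum(r * r / len(g) for r, g in zip(rank_sums, groups)) - 3 * (n + 1)
    h /= 1 - tie_term / (n ** 3 - n)
    # Chi-square with k - 1 = 2 degrees of freedom: survival function is exp(-x / 2).
    return h, math.exp(-h / 2)
```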
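The whisker rule stated for the boxplots is Tukey's 1.5 × IQR convention. Here is a small sketch, assuming linear-interpolation quartiles (`statistics.quantiles` with `method="inclusive"`). Plotting libraries may use a slightly different quartile definition, so edge cases can differ.

```python
import statistics


def tukey_box(values):
    """Quartiles, Tukey whiskers (1.5 x IQR rule) and points beyond the whiskers."""
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    lower_whisker = min(v for v in data if v >= low_fence)   # lowest datum within the fence
    upper_whisker = max(v for v in data if v <= high_fence)  # highest datum within the fence
    outliers = [v for v in data if v < low_fence or v > high_fence]
    return q1, median, q3, lower_whisker, upper_whisker, outliers
```

For example, `tukey_box([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])` gives quartiles 3.25 and 7.75, whiskers at 1 and 9, and 100 as the only point drawn beyond them.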