Abstract
Breast cancer is a complex disease characterized by altered functions of many genes. In this study, we aim to establish long non-coding RNAs (lncRNAs) as prognostic biomarkers, as they play vital roles in the prognosis and progression of the disease. For this purpose, we retrieved four datasets (GSE10810, GSE42568, GSE65194, and GSE45827) from the NCBI Gene Expression Omnibus. Using RStudio, we identified differentially expressed genes (DEGs), followed by functional enrichment analysis with DAVID. The DEGs were used to construct a protein–protein interaction map, from which we identified highly correlated genes through sub-network identification and centrality analysis in Cytoscape. Next, lncRNA–mRNA co-expression analysis was performed to determine the potential role of lncRNAs. We then assessed their role in survival using Kaplan–Meier plots. The results of these analyses establish the long non-coding RNAs EPB41L4A-AS1, LINC00667, MAGI2-AS3, and MALAT1 as prognostic biomarkers. This study provides a blueprint for identifying mRNA–lncRNA networks in any cancer, facilitating the use of lncRNAs as prognostic biomarkers and aiding in the identification of therapeutic targets. Future research may focus on validating these findings biologically to further substantiate the role of lncRNAs as prognostic biomarkers in breast cancer.
Introduction
As a prevalent form of malignant tumor, breast cancer (BC) is the second most frequent cause of cancer-related deaths on a global scale. It is a significant threat to the health and quality of life of women1. Challenges such as recurrence, treatment resistance, metastasis, and cancer heterogeneity2 remain barriers to effectively controlling the condition, despite recent therapeutic advancements. Projections from the American Cancer Society indicate that, in 2024, BC will be the most prevalent cancer type in women, accounting for 32% of female cancer cases3. Therefore, it is essential to explore more effective biomarkers to facilitate the assessment of breast cancer progression and recurrence.
Our understanding of cancer has been significantly expanded by advances in the genetic basis, assisting in diagnosis, prognosis, and treatment4. Whole genome analysis, expression profiling, and gene editing are three tools that have revolutionized cancer research. Numerous regulatory RNAs of different sizes have been identified5. According to genome-wide transcriptomics, only 2% of the human genome comprises protein-coding genes6. A large portion, approximately 78%, of the genome consists of non-coding RNA (ncRNA)7. The non-coding RNAs include structural RNAs like ribosomal, transfer, small nuclear, and small nucleolar RNAs. Most non-coding RNAs are regulatory, such as lncRNAs, microRNAs, piwi-interacting RNAs, and circular RNAs8.
Recently, it has emerged that long non-coding RNAs (lncRNAs) and breast cancer metastasis are closely related9. LncRNAs, non-coding RNAs longer than 200 nucleotides, account for a large proportion of non-coding RNAs10. LncRNAs were initially perceived as mere by-products of RNA polymerase II transcription with no biological significance. However, since the development of high-throughput sequencing technology, more and more lncRNAs have been annotated, and their biological functions in carcinogenesis and tumor progression have been gradually revealed11.
From another perspective, lncRNAs can be classified as nuclear or cytoplasmic based on their subcellular localization, which helps predict their functions12. There is significant evidence that cytoplasmic lncRNAs primarily regulate mRNA stability or translation and contribute to cell signal transmission. On the other hand, nuclear lncRNAs have a crucial role in chromatin interaction, transcriptional control, and RNA processing13.
Long non-coding RNAs are also essential in tumor immune microenvironment variation, genetic predisposition, and regulation of the cell cycle in breast cancer. Moreover, they act as either oncogenes or tumor suppressor genes by regulating signaling pathways along with cancer-related modulators14. Abnormal expression of lncRNAs may influence the tumor immune microenvironment, and such lncRNAs may serve as prognostic biomarkers and predictors of immunotherapy response in breast cancer patients15. In addition, lncRNAs can act as competing endogenous RNAs (ceRNAs) in microRNA regulation, thereby playing a role in tumor resistance in breast cancer16. These findings suggest promising new therapeutic strategies for BC treatment that target lncRNAs.
Approximately 1900 lncRNAs were found to be deregulated in breast cancer, and the expression levels of these deregulated lncRNAs may correlate with distinct clinical outcomes17. These lncRNAs, acting as oncogenes or tumor suppressors, may regulate the development of breast cancer pathophysiology. Therefore, further investigation is needed to understand their role in breast cancer biology and their overall utility as diagnostic and prognostic biomarkers18. Studies have shown that several lncRNAs, such as HOTAIR, BCAR4, and linc-ROR, are upregulated and promote the invasion and metastasis of breast cancer19. HOTAIR was overexpressed in both primary and metastatic breast tumors, leading to highly aggressive tumors and poor prognosis20. Another oncogenic lncRNA, MALAT1, promoted triple-negative breast cancer growth and metastasis by modulating cell cycle gene expression21. Suppression of MALAT1 has been shown to impede the cell cycle and inhibit apoptosis, consequently promoting cancer cell proliferation22.
The current study aims to elucidate the significance of lncRNAs in prognosis and patient treatment, and their potential use as biomarkers in breast cancer diagnosis.
Results
Screening of differentially expressed genes
Background correction and normalization were applied to the selected datasets, followed by differential gene expression analysis. A principal component analysis (PCA) plot was generated for outlier detection. PCA also gives valuable details about the structure of the analyzed datasets and may be employed to elucidate commonalities between samples. We found one sample from the tumor group and six from the normal group that differed markedly from the other samples in their groups, and we therefore eliminated these seven samples (Fig. S1A–D).
After background correction and normalization, 3483 DEGs, including 1873 upregulated and 1610 downregulated DEGs, were identified between breast cancer and normal samples from datasets GSE10810, GSE45827, GSE65194, and GSE42568, using |logFC| > 2 and adjusted P value (adjp) < 0.5 as cut-off criteria. The list of common DEGs is presented in Supplementary Table S1. A Venn diagram depicting the overlap among the four datasets was drawn (Fig. 1A). Additionally, a volcano plot was generated in R to show the overall expression levels of the DEGs by log2 FC score and log10 P value (Fig. 1B).
Enrichment analysis
To determine the function of the common DEGs in breast cancer, we performed Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway and Gene Ontology (GO) enrichment analyses in the DAVID database, using the intersection of DEGs across the datasets. The GO results were divided into biological processes (BP), cellular components (CC), and molecular functions (MF). The DEGs were enriched in various biological processes, including cell division, lipid storage, response to hydrogen peroxide, retinol metabolic processes, response to bacterium, jasmonic acid response, response to glucocorticoid, nitric oxide transport, inhibition of epithelial proliferation, and TNF response (Fig. 2A). In the cellular component group, the DEGs were specifically enriched in extracellular exosome, lipid particle, cell surface, extracellular region, extracellular matrix, extracellular space, endocytic vesicle lumen, perinuclear region of cytoplasm, caveola, and collagen trimer (Fig. 2A). The enriched molecular functions were extracellular matrix structural constituent, heparin binding, protein homodimerization activity, phenanthrene hydroxylase activity, androsterone dehydrogenase activity, ketosteroid monooxygenase activity, carboxylic acid binding, oxidoreductase activity, bile acid binding, and androstanol dehydrogenase activity (Fig. 2A). KEGG analysis demonstrated that the DEGs were enriched in the AMPK signaling pathway, adipocytokine signaling pathway, PPAR signaling pathway, proximal tubule bicarbonate reclamation, malaria, ECM-receptor interaction, oocyte meiosis, regulation of lipolysis in adipocytes, PI3K-Akt signaling pathway, and focal adhesion (Fig. 2B).
PPI network construction
To determine the interactions among the DEGs in a protein–protein interaction (PPI) network, 180 common DEGs were uploaded to the STRING online database, which returned 167 nodes and 642 edges. These DEGs were filtered at a combined score > 0.4, giving 138 nodes and 642 edges. Lastly, genes with a combined score > 0.9 were chosen as key DEGs and imported into Cytoscape (v3.10.2). The resulting map is shown in Fig. 3A. Following this, the MCODE plug-in was applied, and the top two clusters, with 24 nodes and 270 edges and with 16 nodes and 102 edges, respectively, were screened (Fig. 3B, C). In addition, two Cytoscape plug-ins, CytoHubba and CytoNCA, were applied for centrality analysis, which identifies the most significant nodes or edges in the network. CytoHubba was run with four calculation methods (EPC, MCC, MNC, and Stress) (Fig. S2A–D), and the top 40 nodes ranked by these methods were selected. Additionally, four algorithms from CytoNCA (Degree, Eigenvector, Betweenness, and Closeness) (Fig. 4A–D) were used, and the top 40 nodes based on these four calculations were obtained (Supplementary Table 2). To identify the hub genes shared among all groups, a list of 26 key genes was compiled (Supplementary Table 3). Moreover, a heatmap was plotted to elucidate the correlation among samples using the pheatmap package in R (Fig. 5).
Functional annotation of key genes and co-expression analysis
KEGG pathway and GO enrichment analysis of these 26 genes revealed six enriched pathways: cell cycle, PPAR signaling pathway, p53 signaling pathway, progesterone-mediated oocyte maturation, AMPK signaling pathway, and oocyte meiosis. We identified the top dysregulated pathways, as well as the cellular and molecular functions involved in breast cancer. The DEGs common to all these categories were shortlisted for further analysis. These genes, AURKA, BUB1B, CCNB1, CDK1, CCNB2, CENPF, and MELK, were strongly involved in cell division, protein phosphorylation, DNA replication, and apoptotic processes and are found in the centrosome, cytosol, cytoplasm, nucleus, and nucleoplasm. They also take part in protein kinase activity, protein homodimerization activity, ATP binding, and protein binding. The expression of these genes was then analyzed using the ENCORI database, which has expression data for 1104 breast cancer samples and 113 normal breast samples from the TCGA project. The expression analysis gave the following values: AURKA, fold change 3.07, adjusted P value 2.10E−104; BUB1B, fold change 7.2, adjusted P value 2.10E−104; CCNB1, fold change 5.63, adjusted P value 1.80E−111; CDK1, fold change 8.54, adjusted P value 5.30E−121; CCNB2, fold change 7.7, adjusted P value 5.30E−101; CENPF, fold change 8.99, adjusted P value 2.00E−114; MELK, fold change 11.71, adjusted P value 4.00E−114.
Moreover, the Pearson correlation coefficient (R) was calculated for these seven genes, and gene pairs with R ≥ 0.7 were shortlisted because they showed a strong correlation. BUB1B and CCNB2 had R = 0.82, MELK and CCNB2 had R = 0.78, and CCNB2 and AURKA had R = 0.7.
Identification of DE lncRNAs and co‑expression analysis
A list of lncRNA genes was downloaded from the HGNC database, and lncRNA gene symbols were extracted from GSE10810, GSE42568, GSE65194, and GSE45827. A total of 87 lncRNAs with |logFC| > 0.5 and adjusted P value < 0.01 were filtered, and finally 15 differentially expressed lncRNAs (DElncRNAs) were selected (Fig. 6). We calculated the Pearson correlation coefficient between the DElncRNAs and CCNB2, MELK, AURKA, and BUB1B based on their expression values. LncRNAs with a Pearson correlation coefficient ≥ 0.2 or ≤ −0.2 and P value < 0.05 were selected as key lncRNAs co-expressed with CCNB2, MELK, AURKA, and BUB1B. A total of four lncRNAs met these criteria (Table 1).
Survival analysis
The associations between the expression levels of candidate hub genes and relapse-free survival (RFS) and overall survival (OS) in breast cancer patients were assessed using the Kaplan–Meier (KM) method to evaluate their prognostic significance. The results indicate that low expression levels of EPB41L4A-AS1, LINC00667, MAGI2-AS3, and MALAT1 are significantly associated with higher RFS and OS rates, suggesting these genes could serve as potential markers for better prognosis (Table 2). Notably, EPB41L4A-AS1 and MAGI2-AS3 exhibited stronger associations with RFS, whereas LINC00667 and MALAT1 demonstrated significant predictive potential for both RFS and OS. Multivariate and univariate analyses consistently supported the prognostic value of these genes, highlighting their robust potential as markers for improved outcomes in breast cancer patients.
Discussion
A significant amount of clinical research has been conducted to identify the molecular basis of breast cancer, but the outcomes have been limited by the lack of stable biomarkers. The existing literature primarily focuses on singular genetic events or on findings derived from individual cohort studies. To overcome this limitation, four expression profile datasets were shortlisted from NCBI GEO and subjected to differential gene expression analysis. A total of 3483 DEGs, including 1873 upregulated and 1610 downregulated, were filtered between breast cancer and normal samples.
GO function and KEGG pathway analyses were performed for a more in-depth understanding of these DEGs. The results of the GO analysis were divided into biological processes (BP), cellular components (CC), and molecular functions (MF), and indicate that these genes are strongly involved in cell division, protein phosphorylation, DNA replication, and apoptotic processes and are found in the centrosome, cytosol, cytoplasm, nucleus, and nucleoplasm. These genes also take part in protein homodimerization activity, protein kinase activity, ATP binding, and protein binding. Another study identified similar pathways in ovarian cancer23. Furthermore, KEGG analysis indicates that the DEGs are enriched in the AMPK signaling pathway, PPAR signaling pathway, malaria, ECM-receptor interaction, adipocytokine signaling pathway, proximal tubule bicarbonate reclamation, oocyte meiosis, regulation of lipolysis in adipocytes, PI3K-Akt signaling pathway, and focal adhesion. Similar pathways were reported in gastric cancer research24. After that, the DEG PPI network complex was constructed, consisting of 167 nodes and 642 edges. The MCODE plug-in filtered the top two clusters from the PPI network complex, which had 24 nodes and 270 edges and 16 nodes and 102 edges, respectively. Subsequently, the ten DEGs with the greatest levels of interaction in the PPI network complex were filtered, and survival analysis revealed that patients in whom these DEGs were dysregulated had a worse prognosis. The filtered genes include AURKA, BUB1B, CCNB1, CDK1, CCNB2, CENPF, and MELK, which have been extensively associated with the development and recurrence of breast cancer.
A member of the serine/threonine kinase family, Aurora Kinase A (AURKA)25 exhibits markedly elevated expression in various cancer types, including breast, colorectal26, lung, prostate27, ovarian, and gastric cancer28. Notably, it plays a pivotal role in facilitating cell division through the regulation of mitosis29. AURKA overexpression is implicated in tumorigenesis, cancer cell proliferation, epithelial-mesenchymal transition (EMT), metastasis, apoptosis, and self-renewal of cancer stem cells, rendering it a potentially sensitive prognostic marker across diverse cancer types30. Its overexpression is associated with poor prognosis. CDK1, a key regulator of cell cycle progression, interacts with several signaling pathways linked to breast cancer, including the PI3K/AKT/mTOR and RAS/RAF/MEK/ERK pathways31. CDK1 overexpression has been linked to early breast cancer development, poorer survival, and chemoresistance. CENPF dysregulation has a substantial impact on breast cancer development, affecting several aspects of the disease. Overexpression of this gene is associated with enhanced bone metastasis in breast cancer through activation of the PI3K-AKT-mTORC1 signaling pathway32. The Maternal Embryonic Leucine-Zipper Kinase (MELK) gene is essential in breast cancer and is usually associated with aggressive types such as TNBC. MELK has been found to be overexpressed in breast cancer tissue, especially in TNBC compared with non-TNBC, and correlates with radioresistance in breast cancer cell lines33. CCNB2 is a major regulatory gene associated with lymphovascular invasion (LVI) in breast cancer, and it plays a critical role in cell migration, proliferation, and the suppression of the G2/M transition, all of which contribute to BC metastasis34.
Dysregulation of this gene contributes to aggressive tumor behavior and reduced survival time in BC patients because it is related to multiple carcinogenesis pathways. CDK1 has been shown to be overexpressed in various cancers, including BC35. BUB1B encodes a spindle checkpoint protein and thereby plays a role in breast cancer prognosis and treatment. Studies show that BUB1B dysregulation affects mitosis, sister chromatid separation, and cell cycle progression, and that excessive expression of the protein in breast cancer tissues is linked to a poor prognosis36.
On the basis of the Pearson correlation coefficient, we identified four lncRNAs that are co-expressed with CCNB2, MELK, AURKA, and BUB1B: EPB41L4A-AS1, LINC00667, MAGI2-AS3, and MALAT1. Each of these lncRNAs shows a specific expression pattern in breast cancer and is essential for the development and dissemination of the disease. The expression of EPB41L4A-AS1, a tumor suppressor, is significantly downregulated in breast cancer tissues compared with non-cancerous tissues37. This lncRNA inhibits breast cancer cell migration, invasion, and proliferation38. The present study indicates that low expression of EPB41L4A-AS1 is linked with improved recurrence-free survival (RFS). A previous study reported similar findings, supporting it as a prognostic biomarker38. Similarly, MAGI2-AS3 and LINC00667 have been linked to regulating the course of cancer39, and improved relapse-free survival rates are associated with their low expression40. The present study shows that low levels of MAGI2-AS3 and LINC00667 are associated with improved RFS and OS. It has previously been indicated that low levels of MAGI2-AS3 are linked with advanced tumor stage, making it a promising potential biomarker41. Moreover, another study exhibited similar results, indicating its potential as a prognostic biomarker42. In a similar manner, our study suggests that low levels of MALAT1 are associated with better RFS and OS. These results are in accordance with a previous study indicating that, in particular cases of breast cancer, low levels of MALAT1 are associated with higher RFS43. This study offers valuable insights for future research on the differential expression of genes in breast cancer. However, a notable limitation is the absence of experimental validation for the identified candidate genes.
Conclusion and future prospects
Using bioinformatic analysis of datasets containing profiles from several cohorts, we identified 475 DEG candidate genes and extracted 167 nodes and 642 edges from the DEG PPI network complex. Ten crucial genes with the highest interaction degrees were identified, and these genes were predominantly associated with a number of pathways related to biological processes, cellular components, and molecular functions. Survival analysis was also performed to determine the relationship between the filtered core genes and breast cancer.
Additionally, we included several findings from the literature on the part that the candidate genes play in the etiology of cancer. The lack of an experimental evaluation of the candidate genes limits the applicability of this work, even though it offers promising data for future differential expression investigations in breast cancer. Several hub genes and linked lncRNAs that may have a role in the etiology of breast cancer and the prognosis of patients were found using the in silico approach. As a result, these genes may have applications as therapeutic targets or biomarkers for this cancer. Future prospects of this research include validating the core genes as biomarkers for prognosis and treatment response in breast carcinoma, potentially guiding personalized therapeutic strategies and improving patient outcomes.
Methods
In this study, we employed a multi-omics approach to identify long non-coding RNAs as prognostic biomarkers by integrating gene expression data from four datasets, which encompass both normal and tumor breast tissues (GSE10810, GSE42568, GSE65194, and GSE45827). Our objective was to identify differentially expressed genes (DEGs) and lncRNAs, and to construct an mRNA–lncRNA network through co-expression analysis. Figure 7 shows all the steps included in the study.
Data collection
Only datasets that contained samples of both normal and tumor tissue were included in this study. Four gene expression datasets, GSE10810, GSE42568, GSE65194, and GSE45827, were downloaded from the NCBI Gene Expression Omnibus (GEO) database (https://www.ncbi.nlm.nih.gov/geo/). All datasets had been processed on the chip-based platform GPL570 (HG-U133_Plus_2), the Affymetrix Human Genome U133 Plus 2.0 Array.
A total of 58 samples were included in dataset GSE10810 (ref. 44), of which 27 were control samples and 31 were tumor samples. Dataset GSE42568 included 104 breast cancer biopsy samples and 17 normal biopsy samples45. Similarly, dataset GSE65194 contained 11 samples of normal tissue and 130 samples of breast cancer46, while dataset GSE45827 had 11 samples of normal tissue, 14 cell lines, and samples of primary invasive breast cancer (including 41 TN, 30 HER2, 30 Luminal B, and 29 Luminal A)47.
Identification of differentially expressed genes (DEGs)
Robust Multichip Average (RMA) was applied in RStudio (v4.3.3) (https://www.R-project.org) to perform background correction and quantile normalization on the raw data files of all datasets. RMA is available through Bioconductor and is suitable for profiling data of both mRNA and lncRNA. It provides reliable background correction and normalization to ensure precise analysis of gene expression.
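The quantile normalization step of RMA can be illustrated with a minimal Python sketch. This is not the Bioconductor implementation used in the study: it omits background correction and probe-set summarization, breaks ties naively, and the function name `quantile_normalize` and the toy intensities are assumptions made purely for illustration.

```python
def quantile_normalize(columns):
    """Quantile-normalize samples given as equal-length lists of intensities.

    Each inner list is one array (sample); positions are probes.
    Ties are broken by original position, which is a simplification.
    """
    n_probes = len(columns[0])
    sorted_cols = [sorted(col) for col in columns]
    # Reference distribution: mean of the k-th smallest value across samples.
    reference = [
        sum(col[k] for col in sorted_cols) / len(columns)
        for k in range(n_probes)
    ]
    normalized = []
    for col in columns:
        # Rank the probes within this sample, then substitute reference values.
        order = sorted(range(n_probes), key=lambda i: col[i])
        out = [0.0] * n_probes
        for rank, idx in enumerate(order):
            out[idx] = reference[rank]
        normalized.append(out)
    return normalized


# Toy intensities for two hypothetical arrays (not data from the study).
toy = [[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]
normalized_toy = quantile_normalize(toy)
```

After normalization every sample shares the same empirical distribution, which is the property that makes expression values comparable across arrays.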
For the analysis of differentially expressed genes, the limma package (version 3.60.2) from Bioconductor (https://www.bioconductor.org/) was used in R. DEGs were selected using an adjusted P value (adjp) < 0.01 and |log2FC| > 0.5 as cut-off criteria.
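The cut-off step can be sketched in Python as follows. The sketch assumes that the adjusted P values are Benjamini–Hochberg values (the text does not state the adjustment method) and that a table of log2 fold changes and raw P values is already available; the function names, gene labels, and numbers are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def select_degs(table, lfc_cutoff=0.5, adjp_cutoff=0.01):
    """Split genes into up- and downregulated DEGs.

    table maps gene -> (log2 fold change, raw p-value).
    Defaults mirror the cut-offs quoted in the text.
    """
    genes = list(table)
    adjp = benjamini_hochberg([table[g][1] for g in genes])
    up, down = [], []
    for gene, q in zip(genes, adjp):
        lfc = table[gene][0]
        if q < adjp_cutoff and abs(lfc) > lfc_cutoff:
            (up if lfc > 0 else down).append(gene)
    return up, down


# Hypothetical toy table, not results from the study.
toy_table = {
    "GENE_A": (1.2, 0.001),
    "GENE_B": (-0.9, 0.002),
    "GENE_C": (3.0, 0.5),
    "GENE_D": (0.3, 0.04),
}
up_genes, down_genes = select_degs(toy_table)
```

Keeping both cut-offs as parameters makes it straightforward to rerun the selection with stricter or more lenient criteria.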
Functional enrichment analysis
KEGG pathway and Gene Ontology (GO) functional enrichment analyses were carried out on the differentially expressed genes to examine the function of DEGs in breast cancer. GO functional enrichment analysis was carried out in three functional ontologies, molecular functions (MF), cellular components (CC), and biological processes (BP), using the DAVID Bioinformatics program (https://david.ncifcrf.gov/). An adjusted P value < 0.05 was considered statistically significant and was selected as the cutoff.
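The idea behind an enrichment P value can be sketched with a plain hypergeometric upper-tail test. This is only an illustration: DAVID reports a modified Fisher's exact P value (the EASE score), which is more conservative than the test below, and the function name and the toy counts are assumptions.

```python
from math import comb


def hypergeom_enrichment_p(k, n, K, N):
    """Probability of seeing at least k annotated genes in a list of n.

    N: genes in the background, K: background genes carrying the term,
    n: genes in the DEG list, k: DEGs carrying the term.
    """
    upper = min(n, K)
    total = comb(N, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total


# Toy counts (hypothetical): 3 of 3 listed genes carry a term that
# annotates 3 of 10 background genes.
p_toy = hypergeom_enrichment_p(k=3, n=3, K=3, N=10)
```

In practice the P values for all tested terms would then be corrected for multiple testing before applying the 0.05 cutoff.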
PPI network construction and analysis
The STRING (v12.0) database (https://string-db.org/)48 was used to predict the relationships among overlapping DEGs. To construct the PPI network, we applied a filter of combined score > 0.4. This network was then imported into Cytoscape (v3.10.2) (https://www.cytoscape.org/) to visualize the PPI network and identify hub genes49. The Molecular Complex Detection (MCODE) plug-in (version 2.0.3) was used to identify PPI subnetworks and highly connected clusters within the PPI network. The threshold parameters were set to maximum depth = 100, node score = 0.2, and K-core = 2. Two other plug-ins that offer multiple algorithms for identifying hub genes in the network, CytoHubba (version 0.1)50 and CytoNCA (version 2.1.6)51, were also used. Furthermore, the identified key genes were chosen for further expression analysis in The Encyclopedia of RNA Interactomes (ENCORI) database (https://starbase.sysu.edu.cn/panCancer.php), using 1104 cancer and 113 normal samples from the TCGA project.
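The logic of score filtering, centrality ranking, and selecting hub genes shared across ranking methods can be sketched in Python. The sketch uses degree centrality only and does not reproduce the MCODE, CytoHubba, or CytoNCA algorithms; the function names, node labels, and scores are hypothetical.

```python
from collections import defaultdict


def filter_edges(edges, min_score):
    """Keep interactions whose combined score exceeds min_score."""
    return [(a, b) for a, b, score in edges if score > min_score]


def degree_ranking(edge_list):
    """Rank nodes by degree (number of interaction partners), highest first."""
    degree = defaultdict(int)
    for a, b in edge_list:
        degree[a] += 1
        degree[b] += 1
    # Ties are broken alphabetically so the ranking is deterministic.
    return sorted(degree, key=lambda node: (-degree[node], node))


def consensus_hubs(rankings, top_n):
    """Nodes that appear in the top_n of every ranking."""
    top_sets = [set(ranking[:top_n]) for ranking in rankings]
    return set.intersection(*top_sets)


# Hypothetical toy network, not interactions from the study.
toy_edges = [
    ("G1", "G2", 0.95),
    ("G1", "G3", 0.50),
    ("G1", "G4", 0.45),
    ("G2", "G3", 0.90),
    ("G2", "G5", 0.20),
]
kept = filter_edges(toy_edges, min_score=0.4)
by_degree = degree_ranking(kept)
other_ranking = ["G2", "G1", "G4", "G3"]  # stand-in for a second method
hubs = consensus_hubs([by_degree, other_ranking], top_n=2)
```

Taking the intersection of several rankings, rather than trusting a single centrality measure, favors nodes that are consistently central.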
Prediction of lncRNAs function
To assess the potential roles of lncRNAs, a co-expression analysis of lncRNAs and mRNAs was conducted. The complete list of lncRNA genes with approved HUGO Gene Nomenclature Committee (HGNC) symbols was downloaded from https://www.genenames.org/. Gene symbols from our dataset were compared with the list of lncRNA gene names, and genes that overlapped were selected. Fifteen differentially expressed lncRNAs were chosen using the cutoff criteria of |logFC| > 0.5 and adjusted P value < 0.01. More lenient selection criteria were used because lncRNAs are expressed at lower levels than mRNAs. Then, using functional annotation and co-expression analysis data from the previous steps, the Pearson correlation coefficient between the two key protein-coding genes (MAD2L1 and CCNA2) in our sample and the differentially expressed lncRNAs was determined. LncRNAs with correlation coefficients ≥ 0.6 or ≤ −0.6 were considered co-expressed with MAD2L1 and CCNA2.
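A co-expression screen of this kind can be sketched in Python. The sketch computes the Pearson correlation coefficient for each lncRNA and mRNA pair and keeps pairs whose absolute coefficient reaches a threshold passed in by the caller; it does not compute P values, and the function names, labels, and expression values are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def coexpressed_pairs(lncrna_expr, mrna_expr, min_abs_r):
    """Return (lncRNA, mRNA, r) for pairs with |r| >= min_abs_r."""
    pairs = []
    for lnc, lnc_values in lncrna_expr.items():
        for gene, gene_values in mrna_expr.items():
            r = pearson_r(lnc_values, gene_values)
            if abs(r) >= min_abs_r:
                pairs.append((lnc, gene, round(r, 3)))
    return pairs


# Hypothetical expression values across four samples.
toy_lncrnas = {"LNC_X": [1.0, 2.0, 3.0, 4.0], "LNC_Y": [1.0, -1.0, -1.0, 1.0]}
toy_mrnas = {"GENE_P": [2.0, 4.0, 6.0, 8.0], "GENE_Q": [8.0, 6.0, 4.0, 2.0]}
toy_pairs = coexpressed_pairs(toy_lncrnas, toy_mrnas, min_abs_r=0.6)
```

Using the absolute value of the coefficient retains both positively and negatively co-expressed pairs.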
Survival analysis
These candidate hub genes were subjected to survival analysis to investigate their impact on survival in breast cancer. Based on expression data from 6234 breast cancer patients, Recurrence Free Survival (RFS) and Overall Survival (OS) analyses were conducted using the Kaplan Meier plotter (kmplot.com/), which evaluates the impact of gene expression on survival in 21 tumor types52. We classified the patients based on the optimal cutoff value automatically determined by the Kaplan Meier plotter. The follow-up threshold was set to 120 months in order to exclude patients with follow-up periods longer than this duration. Survival curves were compared using the log-rank test, and results are presented with HR, 95% CI, and P values.
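The Kaplan–Meier estimate underlying such survival curves can be sketched in Python. This is a generic product-limit estimator, not the Kaplan Meier plotter's implementation; it omits the cutoff selection, log-rank test, and hazard ratio, and the function name and follow-up data are hypothetical.

```python
def kaplan_meier(times, events):
    """Product-limit survival estimate.

    times: follow-up time for each patient (e.g. months).
    events: 1 if the event (relapse or death) was observed, 0 if censored.
    Returns a list of (time, survival probability) at each event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        same_time = [e for tt, e in data if tt == t]
        n_events = sum(same_time)
        if n_events:
            # Multiply by the conditional probability of surviving past t.
            survival *= 1 - n_events / at_risk
            curve.append((t, survival))
        # Both events and censored patients leave the risk set after t.
        at_risk -= len(same_time)
        i += len(same_time)
    return curve


# Hypothetical follow-up data for one expression group.
toy_curve = kaplan_meier(times=[5, 8, 8, 12, 15], events=[1, 1, 0, 1, 0])
```

To compare expression groups, the estimator would be applied separately to the patients above and below the chosen cutoff, giving one curve per group.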
Data availability
All data generated or analysed during this study are included in this manuscript and its supplementary information files.
References
Li, X. et al. A signature of autophagy-related long non-coding RNA to predict the prognosis of breast cancer. Front. Genet. 12, 569318 (2021).
Sideris, N., Dama, P., Bayraktar, S., Stiff, T. & Castellano, L. LncRNAs in breast cancer: A link to future approaches. Cancer Gene Ther. 29(12), 1866–1877 (2022).
Siegel, R. L., Giaquinto, A. N. & Jemal, A. Cancer statistics, 2024. CA A Cancer J. Clin. 74(1), 12–49 (2024).
Delgado-Martín, B. & Medina, M. Á. Advances in the knowledge of the molecular biology of glioblastoma and its impact in patient diagnosis, stratification, and treatment. Adv. Sci. 7(9), 1902971 (2020).
Grillone, K. et al. Non-coding RNAs in cancer: Platforms and strategies for investigating the genomic “dark matter”. J. Exp. Clin. Cancer Res. 39, 1–19 (2020).
Ponting, C. P. & Haerty, W. Genome-wide analysis of human long noncoding RNAs: a provocative review. Annu. Rev. Genomics Hum. Genet. 23, 153–172 (2022).
El-Helkan, B. et al. Long non-coding RNAs as novel prognostic biomarkers for breast cancer in Egyptian women. Sci. Rep. 12(1), 19498 (2022).
Sarraf, J. S. et al. Noncoding RNAs and colorectal cancer: A general overview. Microrna 9(5), 336–345 (2020).
Smolarz, B., Zadrożna-Nowak, A. & Romanowicz, H. The role of lncRNA in the development of tumors, including breast cancer. Int. J. Mol. Sci. 22(16), 8427 (2021).
Abolghasemi, M. et al. Critical roles of long noncoding RNAs in breast cancer. J. Cell. Physiol. 235(6), 5059–5071 (2020).
Mattick, J. S. et al. Long non-coding RNAs: Definitions, functions, challenges and recommendations. Nat. Rev. Mol. Cell Biol. 24(6), 430–447 (2023).
Quinn, J. J. & Chang, H. Y. Unique features of long non-coding RNA biogenesis and function. Nat. Rev. Genet. 17(1), 47–62 (2016).
Li, X. & Fu, X.-D. Chromatin-associated RNAs as facilitators of functional genomic interactions. Nat. Rev. Genet. 20(9), 503–519 (2019).
Wang, M.-Q., Zhu, W.-J. & Gao, P. New insights into long non-coding RNAs in breast cancer: Biological functions and therapeutic prospects. Exp. Mol. Pathol. 120, 104640 (2021).
Lv, W. et al. Landscape of prognosis and immunotherapy responsiveness under tumor glycosylation-related lncRNA patterns in breast cancer. Front. Immunol. 13, 989928 (2022).
Ahmadpour, S. T. et al. Breast cancer chemoresistance: insights into the regulatory role of lncRNA. Int. J. Mol. Sci. 24(21), 15897 (2023).
Lu, C., Wei, D., Zhang, Y., Wang, P. & Zhang, W. Long non-coding RNAs as potential diagnostic and prognostic biomarkers in breast cancer: progress and prospects. Front. Oncol. 11, 710538 (2021).
Thakur, K. K. et al. Long noncoding RNAs in triple-negative breast cancer: A new frontier in the regulation of tumorigenesis. J. Cell. Physiol. 236(12), 7938–7965 (2021).
Brown, J. M., Wasson, M. C. D. & Marcato, P. The missing Lnc: The potential of targeting triple-negative breast cancer and cancer stem cells by inhibiting long non-coding RNAs. Cells 9(3), 763 (2020).
Rajagopal, T., Talluri, S., Akshaya, R. & Dunna, N. R. HOTAIR LncRNA: A novel oncogenic propellant in human cancer. Clin. Chim. Acta 503, 1–18 (2020).
Adewunmi, O., Shen, Y., Zhang, X.H.-F. & Rosen, J. M. Targeted inhibition of lncRNA Malat1 alters the tumor immune microenvironment in preclinical syngeneic mouse models of triple-negative breast cancer. Cancer Immunol. Res. 11(11), 1462–1479 (2023).
Huang, Y. et al. lncRNA MALAT1 participates in metformin inhibiting the proliferation of breast cancer cell. J. Cell Mol. Med. 25(15), 7135–7145 (2021).
Rashid, H. et al. Identification of novel genes and pathways of Ovarian Cancer using a Comprehensive Bioinformatic Framework. Appl. Biochem. Biotechnol. 196(6), 3056–3075 (2024).
Xu, J. et al. Integrated bioinformatics analysis of noncoding RNAs with tumor immune microenvironment in gastric cancer. Sci. Rep. 13(1), 15006 (2023).
Du, R., Huang, C., Liu, K., Li, X. & Dong, Z. Targeting AURKA in Cancer: Molecular mechanisms and opportunities for Cancer therapy. Mol. Cancer 20, 1–27 (2021).
Tang, J. et al. ARID3A promotes the development of colorectal cancer by upregulating AURKA. Carcinogenesis 42(4), 578–586 (2021).
Chen, X. et al. CCNB1 and AURKA are critical genes for prostate cancer progression and castration-resistant prostate cancer resistant to vinblastine. Front. Endocrinol. 13, 1106175 (2022).
Mesic, A., Rogar, M., Hudler, P., Juvan, R. & Komel, R. Association of the AURKA and AURKC gene polymorphisms with an increased risk of gastric cancer. IUBMB Life 68(8), 634–644 (2016).
Marima, R., Hull, R., Penny, C. & Dlamini, Z. Mitotic syndicates Aurora Kinase B (AURKB) and mitotic arrest deficient 2 like 2 (MAD2L2) in cohorts of DNA damage response (DDR) and tumorigenesis. Mutat. Res./Rev. Mutat. Res. 787, 108376 (2021).
Liu, Y. et al. Function of AURKA protein kinase in the formation of vasculogenic mimicry in triple-negative breast cancer stem cells. OncoTargets Ther. 9, 3473–3484 (2016).
Mir, M. A. & Haq, B. U. CDK1 Dysregulation in Breast Cancer. Therapeutic Potential of Cell Cycle Kinases in Breast Cancer 195–210 (Springer, Berlin, 2023).
Sun, J. et al. Overexpression of CENPF correlates with poor prognosis and tumor bone metastasis in breast cancer. Cancer Cell Int. 19, 1–11 (2019).
Speers, C. et al. Maternal embryonic leucine zipper kinase (MELK) as a novel mediator and biomarker of radioresistance in human breast cancer. Clin. Cancer Res. 22(23), 5864–5875 (2016).
Aljohani, A. I. et al. Upregulation of Cyclin B2 (CCNB2) in breast cancer contributes to the development of lymphovascular invasion. Am. J. Cancer Res. 12(2), 469 (2022).
Ding, W.-N., Ree, R. H., Spicer, R. A. & Xing, Y.-W. Ancient orogenic and monsoon-driven assembly of the world’s richest temperate alpine flora. Science 369(6503), 578–581 (2020).
Zhang, P. et al. Prognostic values of spindle checkpoint protein BUB1B in triple negative breast cancer. Zhonghua Bing Li Xue Za Zhi (Chin. J. Pathol.) 50(6), 645–649 (2021).
Wang, M. et al. Integrated analysis of lncRNA-miRNA-mRNA ceRNA network identified lncRNA EPB41L4A-AS1 as a potential biomarker in non-small cell lung cancer. Front. Genet. 11, 511676 (2020).
Yang, F. & Lv, S. LncRNA EPB41L4A-AS1 regulates cell proliferation, apoptosis and metastasis in breast cancer. Ann. Clin. Lab. Sci. 52(1), 3–11 (2022).
Gong, J., Ma, L., Peng, C. & Liu, J. LncRNA MAGI2-AS3 acts as a tumor suppressor that attenuates non-small cell lung cancer progression by targeting the miR-629-5p/TXNIP axis. Ann. Transl. Med. 9(24), 1793 (2021).
Zhang, Y. et al. Identification of a new eight-long noncoding RNA molecular signature for breast cancer survival prediction. DNA Cell Biol. 38(12), 1529–1539 (2019).
Záveský, L. et al. Long non-coding RNAs PTENP1, GNG12-AS1, MAGI2-AS3 and MEG3 as tumor suppressors in breast cancer and their associations with clinicopathological parameters. Cancer Biomark. (Preprint), 1–18 (2024).
Zhu, M. et al. Identification of a four-long non-coding RNA signature in predicting breast cancer survival. Oncol. Lett. 19(1), 221–228 (2020).
Huang, N.-S. et al. Long non-coding RNA metastasis associated in lung adenocarcinoma transcript 1 (MALAT1) interacts with estrogen receptor and predicted poor survival in breast cancer. Oncotarget 7(25), 37957 (2016).
Velazquez-Caldelas, T. E., Alcalá-Corona, S. A., Espinal-Enríquez, J. & Hernandez-Lemus, E. Unveiling the link between inflammation and adaptive immunity in breast cancer. Front. Immunol. 10, 56 (2019).
Liu, R., Guo, C.-X. & Zhou, H.-H. Network-based approach to identify prognostic biomarkers for estrogen receptor–positive breast cancer treatment with tamoxifen. Cancer Biol. Ther. 16(2), 317–324 (2015).
Li, Z., Lim, S. K., Liang, X. & Lim, Y. P. The transcriptional coactivator WBP2 primes triple-negative breast cancer cells for responses to Wnt signaling via the JNK/Jun kinase pathway. J. Biol. Chem. 293(52), 20014–20028 (2018).
Gruosso, T. et al. Chronic oxidative stress promotes H2AX protein degradation and enhances chemosensitivity in breast cancer patients. EMBO Mol. Med. 8(5), 527–549 (2016).
Szklarczyk, D. et al. STRING v11: protein–protein association networks with increased coverage, supporting functional discovery in genome-wide experimental datasets. Nucleic Acids Res. 47(D1), D607–D613 (2019).
Jiang, W. et al. A mitochondrial EglN1-AMPKα axis drives breast cancer progression by enhancing metabolic adaptation to hypoxic stress. EMBO J. 42(20), e113743 (2023).
Chin, C.-H. et al. cytoHubba: Identifying hub objects and sub-networks from complex interactome. BMC Syst. Biol. 8, 1–7 (2014).
Tang, Y., Li, M., Wang, J., Pan, Y. & Wu, F.-X. CytoNCA: a cytoscape plugin for centrality analysis and evaluation of protein interaction networks. Biosystems 127, 67–72 (2015).
Győrffy, B. Survival analysis across the entire transcriptome identifies biomarkers with the highest prognostic power in breast cancer. Comput. Struct. Biotechnol. J. 19, 4101–4109 (2021).
Author information
Contributions
S.A. conceived the idea of the study. S.T., along with H.N., wrote the manuscript. S.T. performed the Gene Ontology and survival analyses, along with the identification of differentially expressed lncRNAs and the co-expression analysis. U.S. and M.A. performed the coding in RStudio and identified the differentially expressed genes, along with generating the volcano plots, Venn diagrams, heatmaps and PCA plots. A.M. performed PPI network construction using STRING and Cytoscape. S.A. proofread the article. All authors reviewed the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Tariq, S., Sajjad, U., Naveed, H. et al. Genomic data mining reveals hub genes and lncRNAs as prognostic biomarkers in breast cancer. Sci Rep 15, 35585 (2025). https://doi.org/10.1038/s41598-025-98204-8