Introduction

Nasopharyngeal carcinoma (NPC) arises from the nasopharyngeal mucosal lining and is an epithelial carcinoma frequently observed at the pharyngeal recess1. The geographical distribution of NPC is extremely unbalanced, with most new cases occurring in East and Southeast Asia2. The age-standardized rate of NPC is approximately 3.0 per 100,000 in China but only approximately 0.4 per 100,000 in populations that are mainly white2. The risk factors for NPC include Epstein–Barr virus (EBV) infection, host genetics, environmental factors, and dietary patterns, all of which contribute to the remarkable geographical distribution of NPC3. It has been reported that the incidence of NPC has gradually declined in some regions, such as North America and the Nordic countries4. However, the incidence of NPC has remained static over the past two decades in some southern provinces of mainland China, placing a burden on the medical system5.

The main subtypes of NPC include keratinizing squamous, nonkeratinizing squamous, and basaloid squamous cell carcinoma, among which nonkeratinizing squamous cell carcinoma is the most common subtype6. The current hypothesis concerning the pathogenesis of NPC is that nasopharyngeal epithelial cells are infected by EBV and express various viral oncogenes, leading to the transformation of cells to an invasive phenotype and NPC progression7. The upregulation of cyclin D1 (CCND1) and/or inactivation of tumor suppressor genes such as transforming growth factor beta receptor 2 (TGFBR2) results in persistent infection with EBV, which promotes unlimited cellular proliferation, resistance to apoptosis, immune dysregulation, inflammation, and genome instability8. A whole-genome sequencing study revealed that the upregulation of EBV-encoded latent membrane protein 1 (LMP1) activates the nuclear factor kappa B (NF-κB) signalling pathway, which is the key oncogenic driver of NPC9,10. In addition, the overexpression of EBV-encoded BNLF2a causes immune evasion and contributes to the progression of NPC11. Recent studies have also demonstrated that the transforming growth factor β (TGFβ) signalling pathway, phosphatidylinositol-3 kinase (PI3K) signalling pathway, and mitogen-activated protein kinase (MAPK) signalling pathway play important roles in the tumorigenesis of NPC12. Despite these findings, the exact mechanism underlying the tumorigenesis of NPC is not yet clear and needs further exploration.

Owing to the high sensitivity of NPC to ionizing radiation, radiotherapy is the key treatment for NPC13. With the development of technology, radiotherapy has progressed from traditional two-dimensional radiotherapy to three-dimensional conformal radiotherapy and then to intensity-modulated radiotherapy14. Intensity-modulated radiotherapy is currently the most widely used technique and can reduce the 5-year occurrence rate of patients with NPC15. Moreover, compared with two-dimensional and three-dimensional radiotherapy, intensity-modulated radiotherapy is significantly associated with better 5-year locoregional control and overall survival16. Currently, radiotherapy combined with chemotherapy is important for treating locoregionally advanced NPC17.

However, biomarkers that can accurately predict the outcome after treatment and the survival of patients with NPC are lacking3. In this study, the hub genes of NPC were identified using bioinformatics analysis and validated in clinical NPC samples. Next, using hub genes and clinical data, we constructed a prediction model for the survival of patients with NPC who underwent radiotherapy, and the performance of the model was evaluated.

Results

Identification of hub genes of NPC

The GSE61218 and GSE126683 data were normalized and merged after the batch effect was removed. The differentially expressed genes (DEGs) of NPC were screened using the thresholds of p < 0.05 and |log2 fold change| > 1. The results revealed 2080 DEGs between the normal and tumor groups; 736 DEGs were downregulated in the tumor group, whereas 1344 DEGs were upregulated in the tumor group (Fig. 1A). The results of the Gene Ontology (GO) enrichment analysis indicated that the DEGs were significantly associated with deoxyribonucleic acid (DNA) replication, mitotic nuclear division, and nuclear division (Fig. 1B). In addition, Kyoto Encyclopedia of Genes and Genomes (KEGG)18,19 enrichment analysis revealed that the DEGs were significantly related to signalling pathways, including the cell cycle, DNA replication, p53 signalling pathway, and mismatch repair (Fig. 1C). The hub genes were subsequently identified by applying several ranking methods to the protein–protein interaction network of the DEGs (Fig. 1D, E). AURKA, AURKB, BUB1, BUB1B, CCNA2, CCNB2, and CDK1 were identified as hub genes, all of which were significantly upregulated in the tumor group (Fig. 1F).
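The screening rule stated above (p < 0.05 and |log2 fold change| > 1) can be sketched as a simple filter. This is an illustrative Python sketch, not the authors' R/limma code; the `GeneStat` class, the `classify_deg` function, and the gene names and values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class GeneStat:
    gene: str
    log2_fold_change: float  # tumor vs. normal
    p_value: float


def classify_deg(stat, p_cutoff=0.05, lfc_cutoff=1.0):
    """Label a gene 'up', 'down', or 'ns' using the thresholds given in the text."""
    if stat.p_value < p_cutoff and abs(stat.log2_fold_change) > lfc_cutoff:
        return "up" if stat.log2_fold_change > 0 else "down"
    return "ns"


# Hypothetical values for illustration only; not taken from GSE61218 or GSE126683.
example_stats = [
    GeneStat("GENE_A", 2.3, 0.0004),
    GeneStat("GENE_B", -1.6, 0.0100),
    GeneStat("GENE_C", 0.7, 0.0010),  # significant p value, fold change too small
    GeneStat("GENE_D", 1.9, 0.2000),  # large fold change, not significant
]

labels = {s.gene: classify_deg(s) for s in example_stats}
print(labels)
```

Both conditions must hold at once, which is why a gene with a very small p value but a modest fold change is not counted as a DEG.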

Fig. 1

Hub genes of NPC. (A) Volcano plot of the DEGs. (B) Results of the GO enrichment analysis of the DEGs. (C) Results of the KEGG (www.kegg.jp/kegg/kegg1.html) enrichment analysis of the DEGs. (D) Flowchart of identification of hub genes. (E) Hub genes were identified by taking the intersection of the top 20 genes derived from different methods of cytoHubba. (F) The expression levels of the hub genes in GSE61218 and GSE126683. ***, p < 0.001 vs. the normal group.

High expression levels of hub genes were significantly associated with a low overall survival rate

To further explore the potential roles of these hub genes in predicting survival, we collected tumor samples from 120 patients with NPC. The baseline data of these patients are displayed in Table 1. The median follow-up time was 2669 days, and 26 patients died during follow-up (Death group); the patients who survived during follow-up formed the Control group. The number and size of tumors were significantly greater in the Death group. Similarly, the T, N, and M stages of NPC were significantly higher in the Death group. In addition, there was no significant difference in the pathological type between the two groups of patients. The expression levels of the hub genes were subsequently validated in clinical samples. Immunofluorescence staining revealed that the protein expression levels of AURKA, BUB1, and CDK1 were significantly upregulated in the NPC tissues of patients in the Death group, whereas the expression levels of CCNA2 and CCNB2 did not significantly differ between the NPC tissues from the Control group and the Death group (Fig. 2A–E). Thereafter, the relationships between these hub genes and the overall survival rate were determined. The results of the log-rank test suggested that the overall survival rate of patients with high expression levels of AURKA, BUB1, or CDK1 was significantly lower than that of patients with low expression levels of these genes, while the expression level of CCNA2 or CCNB2 was not significantly correlated with the overall survival rate (Fig. 2F–J). These results indicated that the expression levels of AURKA, BUB1, and CDK1 might be predictive factors for the survival of patients with NPC.

Table 1 Baseline data of patients.
Fig. 2

High expression levels of hub genes were significantly associated with a low overall survival rate. (A–E) Representative images of immunofluorescence staining for AURKA, BUB1, CDK1, CCNA2, and CCNB2 in NPC samples. (F–J) Kaplan–Meier curves of patients with different expression levels of AURKA, BUB1, CDK1, CCNA2, and CCNB2. Control: nasopharyngeal carcinoma tissues from patients who survived during follow-up. Death: nasopharyngeal carcinoma tissues from patients who died during follow-up. ns, not significant. **, p < 0.01 vs. Control group. ***, p < 0.001 vs. Control group.

Construction and evaluation of the survival prediction model based on the hub genes

We constructed a survival prediction model based on the hub genes. Age and gender were included in the prediction model as variables. The other variables were screened by univariate Cox regression analysis with a threshold of p < 0.15 (Supplementary Table 1). Variables that were inconsistent with clinical experience were excluded (Supplementary Table 1). Finally, gender, age, T stage, N stage, M stage, BUB1 expression, and AURKA expression were used to construct the survival prediction model (Model 1; Table 2). The receiver operating characteristic (ROC) curve and the area under the ROC curve (AUC) of Model 1 were calculated; the AUC values for predicting survival at 1500, 2000, and 3000 days were 0.832, 0.927, and 0.939, respectively, indicating good predictive ability (Fig. 3A). To further explore whether the inclusion of BUB1 and AURKA improved the performance of the prediction model, we constructed another prediction model (Model 2; Supplementary Table 2) that included only gender, age, T stage, N stage, and M stage as variables. The ROC curve and AUC of Model 2 were calculated and are displayed in Fig. 3B, and its AUC values were significantly lower than those of Model 1 (p < 0.001). The calibration plot indicated that Model 1 was better calibrated than Model 2 (Fig. 3C). The net reclassification index (NRI) and integrated discrimination improvement index (IDI) of Model 1 vs. Model 2 were then calculated, and the values of the NRI and IDI were 0.233 and 0.194, respectively, both of which indicated that the inclusion of BUB1 and AURKA significantly improved the performance of the prediction model (Table 3). Finally, the results of decision curve analysis (DCA) revealed that compared with Model 2, Model 1 resulted in greater clinical net benefits (Fig. 3D). In conclusion, the inclusion of BUB1 and AURKA significantly improved the discriminating ability, predictive ability, and clinical utility of the prediction model.
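The metrics used above to compare Model 1 with Model 2 can be illustrated with a minimal Python sketch. It assumes a binary outcome at a single fixed time point with no censoring, which is a simplification: the study reports time-dependent AUC values and computed the NRI and IDI with R packages designed for survival data. The function names and the toy probabilities are hypothetical, and the category-free form of the NRI is an assumption, since the text does not state which form was used.

```python
def auc(scores, events):
    """Probability that a random event case scores higher than a random non-event case."""
    pos = [s for s, e in zip(scores, events) if e == 1]
    neg = [s for s, e in zip(scores, events) if e == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def _mean(values):
    return sum(values) / len(values)


def idi(p_old, p_new, events):
    """Mean risk change in events minus mean risk change in non-events."""
    change_events = _mean([n - o for o, n, e in zip(p_old, p_new, events) if e == 1])
    change_nonevents = _mean([n - o for o, n, e in zip(p_old, p_new, events) if e == 0])
    return change_events - change_nonevents


def continuous_nri(p_old, p_new, events):
    """Category-free NRI: net upward moves in events plus net downward moves in non-events."""
    def net_up(flag):
        moves = [(n > o) - (n < o) for o, n, e in zip(p_old, p_new, events) if e == flag]
        return _mean(moves)
    return net_up(1) - net_up(0)


def net_benefit(probs, events, threshold):
    """Decision-curve net benefit of treating everyone with predicted risk >= threshold."""
    n = len(events)
    tp = sum(1 for p, e in zip(probs, events) if p >= threshold and e == 1)
    fp = sum(1 for p, e in zip(probs, events) if p >= threshold and e == 0)
    return tp / n - fp / n * threshold / (1 - threshold)


# Hypothetical predicted risks of death for eight patients (1 = died, 0 = survived).
events = [1, 1, 1, 0, 0, 0, 0, 0]
p_old = [0.6, 0.4, 0.3, 0.5, 0.2, 0.2, 0.1, 0.3]  # model without hub genes
p_new = [0.8, 0.6, 0.5, 0.3, 0.2, 0.1, 0.1, 0.2]  # model with hub genes

print("AUC old/new:", auc(p_old, events), auc(p_new, events))
print("IDI:", idi(p_old, p_new, events))
print("NRI:", continuous_nri(p_old, p_new, events))
print("Net benefit at 0.5 old/new:",
      net_benefit(p_old, events, 0.5), net_benefit(p_new, events, 0.5))
```

Positive IDI and NRI values mean the new model assigns higher risks to patients who died and lower risks to those who survived. A decision curve is obtained by evaluating `net_benefit` over a range of thresholds.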

Table 2 Cox survival model for patients with NPC.
Fig. 3

Evaluation of the prediction model. (A) ROC curve and AUC of the prediction model including hub genes as variables (Model 1). (B) ROC curve and AUC of the prediction model not including hub genes as variables (Model 2). (C) Calibration plots of Model 1 and Model 2. (D) The decision curves of Model 1 and Model 2.

Table 3 Discriminating and predictive ability of different models.

Discussion

NPC is endemic to Southeast Asia and North Africa2. The incidence of NPC has remained static in Southern China and poses a serious threat to people’s health5. Treatments for NPC have been developed in the past two decades7, but there is a lack of biomarkers that can accurately predict the treatment outcome and survival of patients with NPC. In this study, the DEGs of NPC were first analysed using Gene Expression Omnibus (GEO) gene expression datasets. AURKA, AURKB, BUB1, BUB1B, CCNA2, CCNB2, and CDK1 were subsequently identified as hub genes by applying different ranking methods to the protein‒protein interaction network. The expression levels of AURKA, BUB1, and CDK1 were significantly increased in samples from patients who died during follow-up. The overall survival rate of patients with high expression levels of AURKA, BUB1, or CDK1 was significantly lower than that of patients with low expression levels of these genes. Finally, a survival prediction model was constructed with baseline clinical data and hub gene expression levels. The performance of the model was evaluated, and the results revealed that the prediction model has good discriminating ability, predictive ability, and clinical utility.

Infection with EBV is the primary pathogenic factor of NPC20. EBV mainly exists as a latent infection in NPC and expresses viral proteins, including EBNA1, LMP1, and LMP2, which play crucial roles in the tumorigenesis and development of NPC20. Among these viral proteins, LMP1 is regarded as one of the most important oncogenic proteins21. LMP1 can mimic the function of tumor necrosis factor receptor (TNFR) and activate the NF-κB, ERK/MAPK, JNK, JAK-STAT, p38/MAPK, and PI3K/Akt pathways, all of which promote tumor cell survival and proliferation21. In addition, LMP1 can regulate the expression of proinflammatory factors such as interleukin-6 (IL-6), IL-8, and macrophage inflammatory protein 1-α22. These proinflammatory factors can recruit T cells and macrophages and significantly affect the tumor microenvironment22. Moreover, proinflammatory factors can also induce the growth, migration, and invasion of tumor cells22. LMP1 also participates in the reprogramming of glycolysis to provide enough energy for the proliferation of tumor cells23. Infection with EBV is also related to anoikis resistance and immune evasion11,23. EBNA1 is expressed in most EBV-associated tumors and significantly contributes to the maintenance, replication, and transcription of the viral genome11. In this study, the DEGs of NPC were found to be significantly enriched in biological processes such as DNA replication and the cell cycle. These results further support the hypothesis that EBV infection induces the expression of virus-related oncogenes, which lead to the progression and development of NPC.

Since the DEGs of NPC were significantly related to DNA replication and the cell cycle of tumor cells, we speculated that the hub genes of the DEGs might be potential prognostic biomarkers for patients with NPC. AURKA, AURKB, BUB1, BUB1B, CCNA2, CCNB2, and CDK1 were identified as hub genes, and all the hub genes were upregulated in the tumor group. AURKA and AURKB are members of the serine/threonine kinase family and share a highly conserved catalytic domain containing autophosphorylation sites24. Moreover, both AURKA and AURKB play crucial roles in the cell cycle24,25. According to the Cancer Genome Atlas (TCGA) UALCAN database, AURKA is expressed in many kinds of tumors, such as rectal adenocarcinoma24. AURKA expression is significantly upregulated in bladder urothelial carcinoma, invasive breast carcinoma, cholangiocarcinoma, and colon adenocarcinoma tissues compared with corresponding normal tissues24. AURKB is also significantly upregulated in a variety of tumors24. The function of AURKB is similar to that of AURKA24. Therefore, we focused only on AURKA for further exploration in this study. BUB1 and its paralogue BUB1B are members of the spindle assembly checkpoint (SAC) protein family, both of which can prevent premature mitotic chromosome segregation and reduce aneuploidy26. The interaction between BUB1 and BUB1B is mediated by a conserved N-terminal region. This interaction is important for the localization of the mitotic checkpoint kinetochore26. Mutations in the BUB1 and BUB1B genes have been identified in tumors27. The upregulation of BUB1 can induce the proliferation and invasion of gastric tumor cells via the Wnt/β-catenin signalling pathway, whereas the downregulation of BUB1 can lead to S-phase arrest in liver tumor cells28,29. Similarly, the upregulation of BUB1B was related to the proliferation of myeloma cells via the CDC20/CCNB signalling pathway30. Considering the similarity in the functions of BUB1 and BUB1B, we selected BUB1 for further evaluation. CCNA2 is a member of the Cyclin A family and participates in cell cycle regulation31. Studies have suggested that CCNA2 is involved in the occurrence and progression of many types of tumors through the induction of epithelial–mesenchymal transformation and metastasis32. CCNB2 is a member of the cell cycle protein family and primarily controls the G2/M phase transition33. Many studies have shown that CCNB2 is aberrantly expressed in a variety of tumors, including glioblastoma and non-small cell lung cancer34,35. In addition, the upregulation of CCNB2 was associated with an accelerated proliferation rate of tumor cells35. CDK1 is a member of the cyclin-dependent kinase family and is a serine/threonine kinase that forms a complex with cyclin proteins to regulate the cell cycle36. In addition, CDK1 is the only CDK in mammals that is necessary for the cell cycle and induces G2/M and G1/S transitions and G1 progression36. The dysregulation of CDK1 leads to unrestricted cell proliferation, which ultimately results in the occurrence of tumors36.

Since these hub genes are closely related to tumor progression, we speculated that they might have predictive value for the survival of patients with NPC. Therefore, we further validated the expression levels of the hub genes in tumor samples collected from 120 patients with NPC. The median follow-up time was 2669 days, and 26 patients died during follow-up. The results of multiplex immunofluorescence showed that the expression levels of AURKA, BUB1, and CDK1 were significantly upregulated in the Death group of patients with NPC. Afterwards, we constructed a survival prediction model based on hub genes and baseline clinical data. Gender, age, T stage, N stage, M stage, BUB1 expression, and AURKA expression were selected to construct the survival prediction model. As mentioned above, the upregulation of BUB1 contributes to the development of tumors. Zhang et al. reported that BUB1 expression is upregulated in endometrial cancer, is significantly related to the infiltration of T cells in the tumor microenvironment, and is correlated with the prognosis of patients with endometrial cancer37. In addition, Chen et al. reported that BUB1 was significantly correlated with the overall survival rate of patients with breast cancer and might be a prognostic biomarker for these patients38. Moreover, a bioinformatics study indicated that BUB1 has potential predictive value for the survival of patients with NPC39. Similarly, studies have indicated that AURKA expression can predict the outcome of patients with breast cancer40. Studies have also suggested that AURKA might be a potential prognostic biomarker for NPC41. In our study, we validated the predictive values of BUB1 and AURKA in a clinical cohort. Finally, the performance of the final prediction model was evaluated using ROC curves, AUC values, calibration plots, and DCA, and the results revealed that the prediction model has good discriminating ability, predictive ability, and clinical utility.

However, this study has several limitations. First, owing to the lack of healthy nasopharyngeal tissue samples, the protein expression levels of the hub genes were not compared between healthy nasopharyngeal tissue and NPC tissue. Second, the sample size of the study population was relatively small, and multicentre studies are needed in future research. Third, the functions of AURKA and BUB1 in NPC cell lines have not been explored.

Conclusion

In this study, the DEGs of NPC were identified and analysed, and they were found to be significantly enriched in biological processes such as DNA replication. Then, AURKA, AURKB, BUB1, BUB1B, CCNA2, CCNB2, and CDK1 were identified as hub genes. The overall survival rate of patients with high expression levels of AURKA, BUB1, or CDK1 was significantly reduced. Finally, a survival prediction model was constructed with hub genes and clinical data, which had good discriminating ability, predictive ability, and clinical utility.

Methods

Identifying hub genes

The Gene Expression Omnibus database (GEO database, https://www.ncbi.nlm.nih.gov/gds) was searched for gene expression datasets of NPC according to the following criteria: (1) Search term: Nasopharyngeal carcinoma, (2) Top Organisms: Homo sapiens, (3) Study type: Expression profiling by array, (4) Attribute name: Tissue, (5) Sample count: From 6 to 1000, and (6) datasets containing both NPC tissues and normal healthy nasopharyngeal tissues. GSE61218 and GSE126683 met all the criteria. The gene expression datasets of GSE61218 and GSE126683 were subsequently downloaded from the GEO database. The GSE61218 dataset contains six normal healthy nasopharyngeal tissue samples and ten NPC samples. The GSE126683 dataset contains three normal healthy nasopharyngeal tissue samples and three NPC samples. The raw data of these datasets were normalized using the “lumi” package in R software (R Core Team; R: A Language and Environment for Statistical Computing; R Foundation for Statistical Computing; Vienna, Austria; https://www.R-project.org/; version 4.1.2). Afterwards, the data were annotated using the “dplyr” and “limma” packages in R software. The batch effect between the datasets was removed using the “sva” package in R software, and the datasets were merged for further analysis.

The differentially expressed genes (DEGs) were identified using the “limma” package in R software with thresholds of p < 0.05 and |log2 fold change| > 1. A protein‒protein interaction network of the DEGs was constructed using the Search Tool for the Retrieval of Interacting Genes online tool (STRING, https://cn.string-db.org/). The minimum required interaction score applied in STRING was medium confidence (0.400). The full STRING network, which included both functional and physical associations, was used in this study. In addition, the active interaction sources included text mining, experiments, databases, coexpression, neighbourhood, gene fusion, and co-occurrence. The protein‒protein interaction network of the DEGs was then visualized using Cytoscape software (version 3.9.1)42. The score of each node in the network was calculated using cytoHubba (ver. 0.1), which is a plugin of Cytoscape software43. In this study, the top 20 DEGs were identified using the “degree”, “EPC”, “MCC”, and “MNC” methods via cytoHubba. The “degree” method takes the number of directly connected edges of a node as the core indicator to identify key nodes with the most direct connections in the network. The “EPC” method focuses on the anti-interference stability of the network and identifies nodes that can maintain the connectivity of network components. The “MCC” method captures key nodes with both connectivity and local network density advantages by evaluating the central position of nodes in the maximal cliques of the network. The “MNC” method measures the centrality of nodes in their maximum neighbourhood components to explore key nodes that play a leading role in local subnetworks. Finally, the intersection of the four sets of the top 20 genes was taken to obtain the hub genes.
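The final step, ranking nodes by several methods and intersecting the top-ranked sets, can be sketched in Python as follows. This is not cytoHubba's implementation: only the "degree" score is computed from an edge list, the second ranking is a hypothetical stand-in for a method such as EPC, MCC, or MNC, and the gene labels, edges, and function names are invented for illustration. The study used the top 20 genes from four methods; the toy example uses the top 3 from two.

```python
def degree_scores(edges):
    """Count the distinct neighbours of each node in an undirected network."""
    neighbours = {}
    for a, b in edges:
        if a == b:
            continue  # ignore self-loops
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    return {node: len(adj) for node, adj in neighbours.items()}


def top_n(scores, n):
    """Return the n highest-scoring nodes; ties are broken alphabetically."""
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return {node for node, _ in ranked[:n]}


def hub_genes(rankings, n):
    """Intersect the top-n sets obtained from every ranking method."""
    top_sets = [top_n(scores, n) for scores in rankings.values()]
    return set.intersection(*top_sets)


# Hypothetical toy network; not derived from the STRING network in the study.
edges = [("G1", "G2"), ("G1", "G3"), ("G1", "G4"),
         ("G2", "G3"), ("G2", "G4"), ("G3", "G5")]

rankings = {
    "degree": degree_scores(edges),
    # Hypothetical scores standing in for a second ranking method.
    "other_method": {"G1": 9.0, "G2": 7.5, "G4": 6.0, "G3": 2.0, "G5": 1.0},
}

hubs = hub_genes(rankings, n=3)
print(sorted(hubs))
```

Taking the intersection rather than the union keeps only genes that every method agrees on, which trades sensitivity for robustness to the choice of centrality measure.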

Study population

This study was approved by the Ethics Committee of Xiangya Hospital, Central South University, and was performed in accordance with the Declaration of Helsinki. Informed consent was obtained from all participants. Patients with NPC who underwent radiotherapy at Xiangya Hospital, Central South University, from 2008 to 2013 were included in this study. The exclusion criteria were as follows: age ≤ 18 years or ≥ 80 years, a diagnosis of other tumors, pregnancy, and missing data. NPC samples were obtained by biopsy before treatment. Baseline data and clinical data were recorded. Follow-up was conducted by telephone.

Multiplex immunofluorescence

Formalin-fixed, paraffin-embedded NPC slides were deparaffinized and hydrated. Multiplex immunofluorescence was conducted using an Opal™ 7-Color Manual IHC Kit according to the manufacturer’s recommended procedures. Briefly, the slides were heated in AR buffer using a microwave. After cooling, the slides were blocked with normal goat serum. The slides were then incubated with the primary antibody overnight. After being rinsed, the slides were incubated with the secondary Polymer HRP Ms + Rb and Opal Fluorophore Working Solution to generate a specific Opal signal for the target. After that, the slides were heated in AR buffer using a microwave to strip the primary–secondary–HRP complex, allowing the introduction of the next primary antibody. Then, the same procedure was repeated, starting with the blocking agent and followed by primary antibody incubation and secondary Polymer HRP Ms + Rb and Opal Fluorophore Working Solution incubation, until specific Opal signals had been generated for all targets. Finally, DAPI Working Solution was applied to the slides, and the slides were then mounted with mounting medium. Images were obtained via Vectra Quantitative Pathology Imaging Systems, and the fluorescence intensity was measured using ImageJ software. The primary antibodies used in this study were as follows: anti-Aurora A (1:200 dilution; #14475; Cell Signaling Technology), anti-BUB1 (1:200 dilution; #94244; Cell Signaling Technology), anti-CDK1 (1:200 dilution; #9116; Cell Signaling Technology), anti-CCNA2 (1:200 dilution; #67955; Cell Signaling Technology), and anti-CCNB2 (1:200 dilution; #12231; Cell Signaling Technology).

Construction and evaluation of prediction model

Age and gender were included in the prediction model as variables. Univariate Cox regression analysis was then used to screen the other variables for the prediction model with a threshold of p < 0.15. Multivariate Cox regression analysis was used to construct a prediction model with gender, age, T stage, N stage, M stage, BUB1 expression, and AURKA expression. All analyses in this section were performed in R software (version 4.1.2). The discriminating ability of the prediction model was first evaluated by the receiver operating characteristic (ROC) curve and the area under the ROC curve (AUC) using the “ROCR” package. In addition, the AUCs of the different models were compared by the DeLong method using the “pROC” package. Afterwards, a calibration plot with the “boot” method using 1000 replications was applied to evaluate the calibration of the models using the “rms” package. The net reclassification index (NRI) and integrated discrimination improvement index (IDI) were further utilized to evaluate the additional predictive ability of the model after the inclusion of the hub genes by using the “nricens” and “PredictABEL” packages. Finally, decision curve analysis (DCA) was applied to analyse the clinical utility of the models using the “rmda” package.
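For reference, the multivariate Cox model described above has the standard proportional hazards form shown below. The notation is introduced here for illustration and does not appear in the original text: x_i is the covariate vector of patient i (gender, age, T stage, N stage, M stage, BUB1 expression, and AURKA expression for Model 1), β is the coefficient vector, h_0(t) is the unspecified baseline hazard, δ_i equals 1 if patient i died during follow-up, and R(t_i) is the set of patients still at risk at time t_i. The partial likelihood is written under the assumption of no tied event times.

```latex
h(t \mid x_i) = h_0(t)\,\exp\!\left(\beta^{\top} x_i\right),
\qquad
L(\beta) = \prod_{i:\,\delta_i = 1}
  \frac{\exp\!\left(\beta^{\top} x_i\right)}
       {\sum_{j \in R(t_i)} \exp\!\left(\beta^{\top} x_j\right)}
```

Under this model, exp(β_k) is the hazard ratio associated with a one-unit increase in the k-th covariate, and the linear predictor β^T x_i serves as the risk score on which the ROC, calibration, and decision curve analyses are based.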

Statistical analysis

Statistical analysis was performed using SPSS version 19 (IBM Corporation, Armonk, NY, USA), R software (version 4.1.2), and GraphPad Prism (version 8.0). Continuous data are expressed as means ± standard deviations (SDs). Count data are expressed as frequencies (percentages). Student’s t test was used to compare continuous data with a normal distribution between different groups, and the Mann–Whitney U test was used to compare continuous data with a nonnormal distribution. For count data, the chi-square test was used to compare the difference in frequencies between groups. Cox regression analysis was used to construct the prediction models. The performance of the models was determined using ROC curves, AUC, NRI, IDI, and DCA. Kaplan–Meier curves with the log-rank test were used to compare the survival rates between different groups. A value of p < 0.05 was considered to indicate statistical significance.
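The Kaplan–Meier estimate and the two-group log-rank test mentioned above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the software used in the study: the function names are hypothetical, the follow-up times are invented, and the p value relies on the chi-square approximation with one degree of freedom.

```python
import math


def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time (event = 1, censored = 0)."""
    surv = 1.0
    curve = []
    for t in sorted({u for u, e in zip(times, events) if e == 1}):
        at_risk = sum(1 for u in times if u >= t)
        deaths = sum(1 for u, e in zip(times, events) if u == t and e == 1)
        surv *= 1 - deaths / at_risk
        curve.append((t, surv))
    return curve


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi-square statistic, p value)."""
    times = list(times_a) + list(times_b)
    events = list(events_a) + list(events_b)
    in_a = [True] * len(times_a) + [False] * len(times_b)
    observed_a = expected_a = variance = 0.0
    for t in sorted({u for u, e in zip(times, events) if e == 1}):
        n = sum(1 for u in times if u >= t)                       # at risk overall
        n_a = sum(1 for u, g in zip(times, in_a) if u >= t and g)  # at risk in group A
        d = sum(1 for u, e in zip(times, events) if u == t and e == 1)
        d_a = sum(1 for u, e, g in zip(times, events, in_a) if u == t and e == 1 and g)
        observed_a += d_a
        expected_a += d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    if variance == 0:
        return 0.0, 1.0
    chi2 = (observed_a - expected_a) ** 2 / variance
    # Survival function of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical follow-up times in days; not patient data from the study.
high_times, high_events = [300, 700, 1200, 2000, 2600], [1, 1, 1, 0, 1]
low_times, low_events = [900, 2200, 2700, 2800, 3000], [0, 1, 0, 0, 0]

print(kaplan_meier(high_times, high_events))
print(logrank_test(high_times, high_events, low_times, low_events))
```

Censored patients (event = 0) contribute to the number at risk until their last follow-up but never count as deaths, which is how both procedures accommodate patients who were still alive when follow-up ended.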