Introduction

Gastric cancer (GC), the fourth leading cause of cancer death, is a global disease with high mortality because many patients are diagnosed at a late stage1. Despite significant progress in the treatment of GC over the past few decades, such as the introduction of immune checkpoint inhibitors, the death toll related to GC remains high. GC is a markedly heterogeneous tumor, and this heterogeneity has an appreciable effect on the treatment and prognosis of patients2. Although traditional clinicopathological staging and classification, such as TNM stage, can provide accurate information for the prediction of prognosis, it can be improved by computational and bioinformatics tools based on transcriptomic features, as molecular changes in advanced cancer can regulate local and systemic body responses3.

Both the P53 pathway and the G2/M checkpoint play essential roles in the cell cycle. P53-mediated signaling is a crucial regulatory pathway that blocks cell replication in both the G1 and the G2 phase4,5,6. P53, the key protein and transcription factor in this pathway, can up-regulate the expression of critical molecules, such as p21, p38, or Bax, to regulate the cell cycle or induce apoptosis of abnormal cells5,7. The G2/M checkpoint is essential for precise cell division and cell survival. DNA and chromatin damage occurring in the G2 phase can also lead to cell cycle arrest in cancer cells8. Dysregulation of the cell cycle, driven by mutations, chromosomal aberrations, differential gene expression, and epigenetic alterations of key genes, is an important molecular feature of cancer and is closely associated with disease progression and patient prognosis9,10. In fact, loss of function of the G1/S checkpoint has been found in many types of cancer, and cancer cells therefore rely on a functional G2/M checkpoint for DNA repair during chemotherapy11. Targeting the G2/M checkpoint is thus a promising strategy for improving the prognosis of cancer patients. Moreover, some studies have found that cell-cycle-related genes in tumors are involved in the formation of an inflammatory immune microenvironment and are associated with the response to immunotherapy12,13,14.

In this context, we developed a molecular subtype based on prognostic genes of the G2/M checkpoint and the P53 pathway, constructed a corresponding prognosis prediction model in the TCGA-STAD cohort (stomach adenocarcinoma from The Cancer Genome Atlas database), and validated its reliability using multiple datasets from the Gene Expression Omnibus (GEO) database and patient data from a Chinese center.

Materials and methods

Acquisition of datasets and gene set

The workflow of this study is shown schematically in the flow chart (Fig. S1). Transcriptome-level gene expression profiles, somatic mutation data and clinical information of patients with stomach adenocarcinoma (STAD) were obtained from The Cancer Genome Atlas (TCGA) database (https://portal.gdc.cancer.gov/) and the Gene Expression Omnibus (GEO) database (https://www.ncbi.nlm.nih.gov/geo/). Data on 407 samples (375 tumor tissues and 32 paracancerous tissues) from 375 patients with GC, including gene expression profiles, genetic mutation information, demographic information, clinicopathological information, and survival information, were collected from the TCGA-STAD cohort. In addition, gene expression data and survival information of two GC cohorts (GSE13861 and GSE26901) were obtained from the GEO database.

Gene sets related to the G2/M checkpoint (HALLMARK_G2M_CHECKPOINT.v2022.1.Hs) and P53 pathway (HALLMARK_P53_PATHWAY.v2022.1.Hs) were acquired from The Molecular Signatures Database (MSigDB, http://www.gsea-msigdb.org/gsea/msigdb).

Prognostic analysis and differential expression analysis of G2/M checkpoint and P53-pathway related genes

Patients in the TCGA-STAD cohort (n = 375) were employed as the training set to determine the key differentially expressed genes (DEGs) with prognostic significance. The R packages “survival” and “survminer” were used to identify genes that reliably predict the overall survival (OS) of patients. Univariate Cox regression and the Mann–Whitney U test were used to screen prognostic DEGs (PDEGs). Specifically, genes of the G2/M checkpoint and P53 pathway with a P value < 0.05 in univariate Cox regression analysis and with |FC| > 2 and FDR < 0.05 in the Mann–Whitney U test (FC, fold change; FDR, false discovery rate) were considered PDEGs and selected for further analysis.
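The screening rule above can be illustrated with a minimal sketch. It is not the code used in this study: the `GeneStats` record and the function names are hypothetical, the Benjamini-Hochberg procedure is assumed as the FDR correction (the text does not name one), and |FC| > 2 is read as a two-fold change in either direction. In practice the FDR would be computed over all genes tested, not only the hallmark genes.

```python
from dataclasses import dataclass


@dataclass
class GeneStats:
    gene: str
    cox_p: float        # P value from univariate Cox regression
    fold_change: float  # tumor / normal expression ratio
    mw_p: float         # raw P value from the Mann-Whitney U test


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def select_pdegs(stats, cox_alpha=0.05, fc_cutoff=2.0, fdr_alpha=0.05):
    """Keep genes that pass the Cox, fold-change and FDR thresholds."""
    fdr = benjamini_hochberg([s.mw_p for s in stats])
    selected = []
    for s, q in zip(stats, fdr):
        # Assumption: |FC| > 2 means up- or down-regulation by a factor of 2.
        changed = s.fold_change > fc_cutoff or s.fold_change < 1.0 / fc_cutoff
        if s.cox_p < cox_alpha and changed and q < fdr_alpha:
            selected.append(s.gene)
    return selected
```

A gene is retained only when all three conditions hold, which corresponds to taking the intersection of the prognostic gene set and the DEG set.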

Analysis of somatic mutation

Somatic mutation data, including copy number variation (CNV), single nucleotide variants (SNV) and insertions/deletions (InDel), were downloaded from the TCGA-STAD cohort via the R package “TCGAbiolinks”. Thereafter, the R package “maftools” was used to analyse the mutation frequency and mutation type and to draw oncoplots (waterfall plots) of the PDEGs. In addition, we calculated the tumor mutation burden (TMB) of each patient for further analysis.

Multivariable Cox regression analysis and PDEGs score

The “BiocGenerics” package and “stats” package were used to implement multivariable Cox regression analysis. The Cox proportional hazards model with the stepwise method was used to identify appropriate PDEGs and determine the best formula for the PDEGs score. Then, we applied the selected PDEGs to calculate the PDEGs risk score by the “rms” package:

$$PDEGs\,score = h_{0}\left( t \right) \times \exp \left( \sum_{i} Exp_{i} \times coef_{i} \right).$$

Patients of the TCGA-STAD cohort were divided into two groups according to their PDEGs score: High score group (with PDEGs score > median) and Low score group (with PDEGs score < median). Clinicopathological data between different PDEGs score subgroups (OS, TNM stages, and pathological type) were also subjected to further statistical analysis.
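As a minimal sketch of the scoring and grouping steps, assuming that the baseline hazard h0(t) can be dropped because it is shared by all patients at a given time and does not change their ranking: the gene names, coefficients, and function names below are hypothetical placeholders, and the handling of a score exactly equal to the median is left open because the text does not specify it.

```python
import math
from statistics import median


def pdegs_score(expression, coefficients):
    """exp(sum_i Exp_i * coef_i), omitting the shared baseline hazard h0(t)."""
    linear_predictor = sum(coefficients[g] * expression[g] for g in coefficients)
    return math.exp(linear_predictor)


def split_by_median(scores):
    """Assign each patient to the high or low score group by the cohort median."""
    cutoff = median(scores.values())
    groups = {}
    for patient_id, score in scores.items():
        if score > cutoff:
            groups[patient_id] = "high"
        elif score < cutoff:
            groups[patient_id] = "low"
        else:
            # The rule for a score equal to the median is not stated in the text.
            groups[patient_id] = "tie"
    return cutoff, groups


# Hypothetical two-gene example; these are not the fitted coefficients.
example_coefficients = {"GENE_A": 0.5, "GENE_B": -0.25}
example_expression = {"GENE_A": 2.0, "GENE_B": 4.0}
example_score = pdegs_score(example_expression, example_coefficients)
```

Because each cohort is split at its own median, the cutoff returned by `split_by_median` depends on the cohort to which it is applied.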

The establishment of prognostic prediction model and nomogram

Molecular subtypes, together with demographic features, TNM stage, and pathology subtype, were included in the multivariate Cox regression analysis with stepwise regression. Factors with a P value < 0.05 were then integrated to construct molecular-subtype-based risk models. The R package “rms” was used to draw nomograms corresponding to the molecular-subtype-based prediction model. We assessed the prognostic value of the model and nomogram by time‐dependent ROC curve analyses for the 2-, 3-, and 5-year survival rates using the R packages “timeROC” and “ggplot2”. Calibration curves of 2-, 3-, and 5-year OS were also drawn to evaluate the agreement between the predicted and observed survival rates.

Assessment of the infiltration of immunocytes

CIBERSORTx (https://cibersortx.stanford.edu/) was used to calculate the proportion of 22 immune cell types based on the expression profile in different risk groups of TCGA patients15.

Prediction of immunotherapy responses

The tumor immune dysfunction and exclusion (TIDE) score is calculated from the transcriptome of patients and is associated with the dysfunction and infiltration of tumor infiltrating lymphocytes (TILs, T cells that are resident within tumors). It has the potential to predict the capacity for immune evasion and the curative effect of immunotherapies. Here, we obtained the TIDE scores of patients online (http://tide.dfci.harvard.edu/) and compared them between risk groups to evaluate the guiding role of cell-cycle-related genes in immunotherapy16.

Prediction of chemotherapy sensitivity

The package “oncoPredict” was used to predict the sensitivity of STAD patients to various chemotherapeutic drugs based on the gene expression matrix of the TCGA cohort17. The half-inhibitory concentrations (IC50) of chemotherapy drugs commonly used in GC as first-line or second-line treatment, including paclitaxel, oxaliplatin, irinotecan, docetaxel, 5-fluorouracil, and cisplatin, were compared between the PDEGs score subgroups by the Wilcoxon rank-sum test.

Validation in GSE13861 and GSE26901

In this section, we calculated the PDEGs score of each patient in GSE13861 and GSE26901 by applying the PDEGs score formula obtained in the training set. Patients in each validation cohort were divided into high and low PDEGs score groups according to the median of that cohort. The OS of the two score subgroups was then compared using the R packages “survival” and “survminer”.

Paired tissues of patients with GC

Paired samples, including GC tissues and corresponding paracancerous normal tissues (3 cm beyond lesion) from 12 patients with surgical resection at the Seventh Affiliated Hospital, Sun Yat-sen University, were used to validate the expression pattern of specific genes.

Meanwhile, 97 patients with GC undergoing surgery from the Seventh Affiliated Hospital, Sun Yat-sen University between 2021 and 2024 were enrolled for this study. Paraffin sections and clinical pathological information of these patients were collected for analysis.

Cell lines and cell culture

Human GC cell lines, including SGC-7901, MKN-28, HGC-27, MGC-803 and SNU-719, and the human gastric epithelial cell line GES-1 were acquired from the Cell Bank of the Chinese Academy of Sciences (Shanghai, China) and cultured in RPMI 1640 medium (Gibco, #1640-C11875500BT-1) supplemented with 10% fetal bovine serum (FBS; Gibco, #C0232) at 37 °C in a 5% CO2 incubator. The oxaliplatin-resistant GC cell line MGC-803R was a gift from the Seventh Affiliated Hospital of Sun Yat-sen University (Shenzhen, China) and was maintained in 1640 medium containing oxaliplatin (4 μM). All cell lines tested negative for Mycoplasma and were authenticated by STR profiling.

Quantitative real time polymerase chain reaction (qRT-PCR)

Total RNA was prepared from GC cell lines or tissues using the FastPure Cell/Tissue Total RNA Isolation Kit V2 (Vazyme, #RC112-01). cDNA was synthesized by reverse transcription using the Evo M-MLV RT Premix for qPCR kit (Accurate Biology, #AG11601). The cDNA products were used as templates for qPCR analysis with the SYBR Green Pro Taq HS qPCR kit (Accurate Biology, #AG11702). Following the manufacturer's instructions, cDNA, primers, SYBR Green and RNase-free water were mixed to form a 10 μL reaction system. The CT value of each gene was measured with the Touch Real-time PCR Detection System (Bio-Rad, USA). The β-actin gene was used as the reference gene in the quantitative analysis of mRNA expression of target genes. The primer sequences used in this study are shown in Supplementary Table 1.
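The text gives the reference gene and the CT readout but does not name the quantification formula. A minimal sketch, assuming the widely used 2^-ΔΔCT method with β-actin as the reference and a control sample (for example GES-1 or adjacent normal tissue) as the calibrator, is given below; the function name and arguments are hypothetical.

```python
def relative_expression(ct_target, ct_reference, ct_target_control, ct_reference_control):
    """Relative mRNA level by the 2^-ddCT method (assumed; not stated in the text)."""
    # Normalize the target gene to the reference gene within each sample.
    delta_ct_sample = ct_target - ct_reference
    delta_ct_control = ct_target_control - ct_reference_control
    # Compare the sample with the calibrator (control) sample.
    delta_delta_ct = delta_ct_sample - delta_ct_control
    return 2.0 ** (-delta_delta_ct)
```

A value above 1 indicates higher expression in the sample than in the calibrator, and a value below 1 indicates lower expression.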

Silencing of F2R

Lipofectamine 3000 (Invitrogen, #L3000001) was used to silence the expression of F2R in cell lines, according to the manufacturer's instructions.

Cytotoxicity of oxaliplatin in vitro

Cells were seeded in 96-well plates at 8000 cells per well, and different concentrations of oxaliplatin (Aladdin, Shanghai) were added 24 h later. The Cell Counting Kit-8 (CCK8) (MeilunBio, #MA0218) was used to measure cell viability 48 h after the addition of oxaliplatin. Growth-inhibition curves were drawn and the IC50 of oxaliplatin for each cell line was calculated.
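The fitting procedure used to obtain the IC50 is not described. As a simple stand-in for a nonlinear dose-response fit, the sketch below estimates the IC50 by linear interpolation on the log-concentration axis between the two doses that bracket 50% viability. The function names are hypothetical, and the use of blank-corrected absorbance to express viability as a percentage of untreated wells is an assumption.

```python
import math


def viability_percent(od_treated, od_untreated, od_blank):
    """Viability as a percentage of untreated wells after blank subtraction."""
    return 100.0 * (od_treated - od_blank) / (od_untreated - od_blank)


def ic50_by_interpolation(concentrations, viabilities):
    """Estimate IC50 from ascending positive concentrations and % viabilities.

    Returns None when the curve never crosses 50% within the tested range.
    """
    points = list(zip(concentrations, viabilities))
    for (c_low, v_low), (c_high, v_high) in zip(points, points[1:]):
        if v_low >= 50.0 >= v_high and v_low != v_high:
            # Fraction of the way from the lower to the higher dose,
            # measured on the log10 concentration axis.
            fraction = (v_low - 50.0) / (v_low - v_high)
            log_ic50 = math.log10(c_low) + fraction * (
                math.log10(c_high) - math.log10(c_low)
            )
            return 10.0 ** log_ic50
    return None
```

Interpolation depends only on the two bracketing doses, so a fitted sigmoid model would generally give a somewhat different estimate.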

Wound healing assay

The wound healing assay was performed to evaluate the infiltration and migration ability of the cells. Cancer cells were seeded in 6-well plates at 1.0 × 10⁶ cells/well (n = 3). Once they reached around 75–90% confluence, the cells in each well were scratched to generate wound areas and then cultured for 24 h at 37 °C with 5% CO2. The wound area was measured at 0 h and 48 h after scratching using ImageJ.
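The text does not state how wound healing was quantified from the measured areas. One common choice, shown here as an assumption, is the percentage of the initial wound area that has closed by the later time point, summarized across replicate wells as mean ± SD. The function names are hypothetical.

```python
from statistics import mean, stdev


def wound_closure_percent(area_start, area_end):
    """Percentage of the initial wound area closed at the later time point."""
    if area_start <= 0:
        raise ValueError("initial wound area must be positive")
    return 100.0 * (area_start - area_end) / area_start


def summarize_closure(area_pairs):
    """Mean and sample SD of closure over (area_start, area_end) replicate wells."""
    closures = [wound_closure_percent(start, end) for start, end in area_pairs]
    return mean(closures), stdev(closures)
```

Areas can be supplied in any unit reported by ImageJ, provided the same unit is used for both time points of a well.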

Immunohistochemistry (IHC)

Paraffin-embedded sections were dewaxed and rehydrated using dimethylbenzene and a graded ethanol series. Next, antigen retrieval was conducted using an EDTA solution, and endogenous peroxidase was inactivated using H2O2. After non-specific binding had been blocked with goat serum for 30 min, human tissue sections were incubated with the primary antibody overnight at 4 °C. The following day, immunodetection was performed using the DAB Detection Kit (Gene Tech, #GK600511). Protein expression was evaluated by a panel of two expert pathologists on the basis of staining intensity and the percentage of positive cells.

Western blotting

Western blotting was performed to investigate the expression of F2R in the cell lines. Total protein was extracted and denatured before electrophoresis. The protein was then separated by SDS-PAGE and transferred to a PVDF membrane. The membrane was incubated overnight with primary antibodies, including F2R (Proteintech, #26366-1-AP), E-cadherin (Proteintech, #60335-1-Ig), N-cadherin (Proteintech, #66219-1-Ig), Vimentin (Proteintech, #60330-1-Ig), TWIST1 (Proteintech, #25465-1-AP), Phospho-Histone H2A.X (CST, #60566), β-Actin (Proteintech, #66009-1-Ig) and GAPDH (Proteintech, #60004-1-Ig). Protein expression was detected the next day after incubation with the secondary antibody.

HR (homologous recombination) and NHEJ (non-homologous end joining) DNA repair assay

To assess the efficiency of the HR and NHEJ repair pathways, HR and NHEJ reporter assays were employed. Specifically, gastric cancer cells carrying an integrated direct repeat GFP (DR-GFP) or EJ5-GFP reporter gene were transfected with the control plasmid or siRNA targeting F2R. Subsequently, the cells were transfected with an I-SceI (mCherry) expression vector. Doxycycline (DOX) was added to induce I-SceI expression. At 24 h after the addition of DOX, the percentages of GFP-positive and mCherry-positive cells were analyzed by flow cytometry. The efficiencies of HR and NHEJ were represented by the percentage of GFP-positive cells among mCherry-positive cells.

Flow cytometry

Single-cell suspensions were prepared from the gastric cancer cells used in the HR and NHEJ assays, washed, and resuspended in PBS. A CytoFLEX flow cytometer (Beckman Coulter) was used for the analysis.

Statistical analysis and repeatability

R (4.2.0), RStudio and GraphPad Prism 8.0 were used for statistical analysis and data visualization. Statistical methods were chosen according to the distribution of the data, with normality assessed using the Shapiro–Wilk test. For normally distributed data, parametric tests such as Student’s t-test or ANOVA were applied. For data that did not follow a normal distribution, non-parametric tests, including the Mann–Whitney U test or Kruskal–Wallis test, were used. The Chi-square test was used for categorical data. Correlation between parameters was assessed using Pearson correlation analysis or Spearman rank correlation analysis. Log-rank tests and univariate and multivariate Cox regression analyses were used to analyze survival data. All statistical tests were two-sided. P < 0.05 was considered statistically significant.

All experiments were repeated at least three times to ensure reproducibility, and results are presented as mean ± standard deviation (SD) or median with interquartile range (IQR), as appropriate.

Results

Demographic and clinicopathologic characteristics

The demographic and clinicopathologic characteristics of the patients in the three datasets used in this study are summarized in Table 1. A total of 549 patients with GC with complete gene expression data and relevant information were included in the following analysis. The TCGA cohort was employed as the training set to identify differentially expressed genes and prognostic genes, while GSE26901 and GSE13861 were used as validation sets. Of note, a portion of the patients in GSE26901 (39/109, 35.78%) and GSE13861 (49/65, 75.38%) had received adjuvant chemotherapy.

Table 1 Demographic and clinicopathologic characteristics.

Prognostic DEGs (PDEGs) of G2/M checkpoint and P53 pathway

In all, 399 genes of the G2/M checkpoint and P53 pathway and 8701 DEGs (7348 up-regulated and 1353 down-regulated) were obtained via the above-mentioned statistical analysis (Supplementary Tables II and III; Fig. 1A). Meanwhile, 40 genes of the G2/M checkpoint and P53 pathway with prognostic value were identified by univariate Cox regression analysis (Supplementary Table IV). Closer inspection of the 40 genes revealed that most of them were highly expressed in tumor tissue compared with normal tissue (Fig. 1B). Finally, we obtained 19 PDEGs related to the cell cycle from the intersection of the DEG set and the prognostic gene set. Specifically, POLQ, MCM2, RAD54L, EZH2, MCM3, LMNB1, STIL, CKS2, KIF11, UCK2, INCENP, FANCC, KIF15, CDC25A, CHAF1A, AEN, F2R, IRAK1, and PVT1 were the PDEGs related to the prognosis of TCGA patients among the 399 genes of the G2/M checkpoint and P53 pathway (Fig. 1C,D).

Fig. 1

19 PDEGs of G2/M checkpoint and P53 pathway identified from 8701 DEGs in TCGA-STAD cohort. (A) Volcano plot revealing 8701 DEGs between normal tissue and tumor tissue of 375 patients. (B) The expressions of 40 prognostic genes of G2/M checkpoint and P53 pathway in normal tissue and tumor tissue. (C) Venn diagram showing 19 PDEGs by the intersection of two gene sets. (D) The overview of the expression of 19 PDEGs among 399 G2/M checkpoint and P53 pathway genes. *P < 0.05, **P < 0.01, ***P < 0.001.

The calculation of PDEGs score and PDEGs score group

Here, we further identified critical PDEGs with independent prognostic significance among the above 19 cell-cycle-related hallmark genes. A Cox proportional hazards model with the stepwise method, a multivariate analysis of the 19 PDEGs, was used to identify the optimal PDEGs for calculating the PDEGs score. The final result is shown in Table 2. The expression levels of four genes, INCENP, F2R, PVT1, and IRAK1, were the key elements of the PDEGs score associated with the overall survival of patients with GC. Hence, we employed the “predict” function to calculate the PDEGs score of each patient in the TCGA cohort:

$$\begin{aligned} PDEGs\,score & = h_{0}\left( t \right) \times \exp [Exp_{INCENP} \times ( - 0.044060) \\ & \quad + Exp_{F2R} \times (0.011727) \\ & \quad + Exp_{IRAK1} \times ( - 0.006604) \\ & \quad + Exp_{PVT1} \times ( - 0.234261)]. \\ \end{aligned}$$
Table 2 Four critical genes for calculation of PDEGs score.

According to the median of the PDEGs score (median value = 0.03), we divided the patients in the TCGA cohort into two subgroups: the high score group (PDEGs score > 0.03) and the low score group (PDEGs score < 0.03). The PDEGs score distribution, the survival status of each patient, and the expression of the 4 PDEGs in patients with high PDEGs scores (n = 184) compared with those with low PDEGs scores (n = 183) are shown in Fig. 2A (8 patients were removed because of missing survival data). Closer inspection of the 4 PDEGs revealed a high expression level of F2R and low expression levels of INCENP, PVT1, and IRAK1 in the high PDEGs score group. Additionally, Kaplan–Meier analysis showed that patients in the low PDEGs score group had better OS (log-rank test, P < 0.001) (Fig. 2B) and better progression-free survival (PFS) (log-rank test, P = 0.021) (Fig. 2C) than patients in the high score group. Further, correlation analysis found no significant association between the PDEGs subtype and either the TNM stage or the pathological type (Fig. 2D), which implied that the PDEGs subtype might be a risk factor for OS independent of TNM stage. To provide more information for the prediction of patients’ survival, we carried out a subgroup analysis stratified by T stage, N stage and M stage. The results indicated that the three variables can further predict the prognosis of patients in the high or low PDEGs score subgroup. Patients in the high PDEGs score subgroup with a more advanced TNM stage had the worst survival (Fig. S2).

Fig. 2

PDEGs score and the clinical significance of PDEGs score in TCGA-STAD cohort. (A) Ranked dot and scatter plots representing the PDEGs score distribution and patient survival status. Heatmap showing expression of PVT1, F2R, INCENP and IRAK1 in the two score subgroups. Scale from 0 to 15 represents minimum and maximum values in the heatmap. (B) The overall survival plot of the high score and low score groups (log-rank test, P < 0.001). (C) The progression-free survival plot of the high score and low score groups (log-rank test, P = 0.021). (D) Differences in TNM stage and pathologic type between the two PDEGs score subgroups.

The prediction model and nomogram based on PDEGs score and clinical features

Likewise, PDEGs score groups, age, gender, and TNM stage were identified as appropriate variables for the establishment of the PDEGs-score-based model and nomogram (Supplementary Table V, Fig. 3A). Hence, a PDEGs-score-based model was constructed for the prediction of patient prognosis:

$$\begin{aligned} Risk\,score & = h_{0}\left( t \right) \times \exp [0.035 \times Age + Stage\,(I = 0,\,II = 0.368,\,III = 0.756,\,IV = 1.656) \\ & \quad + PDEGs\,score\,group\,(High\,score = 0,\,Low\,score = - 0.818)]. \\ \end{aligned}$$
Fig. 3

The establishment of the model and nomogram based on score groups and clinical features in TCGA-STAD cohort. (A) Age, score groups and TNM stage were independent factors of OS by multivariate Cox regression analysis. (B) The survival plot of the high risk and low risk groups (log-rank test, P < 0.001). (C) Nomogram consisting of age, PDEGs score group and TNM stage and predicting the 2-, 3- and 5-year survival rates of patients. (D, E) Time-dependent ROC curves (D) and time-dependent AUC curve (E) evaluating the PDEGs-score-based risk model.

Subsequently, we divided the above patients with complete data into two risk groups according to the median risk score. The analysis showed that the low-risk group had a better prognosis than the high-risk group (HR 0.43, 95% CI [0.30, 0.62], P < 0.0001) (Fig. 3B). The PDEGs-score-based nomogram was drawn to predict the probability of OS exceeding 2 years, 3 years or 5 years (Fig. 3C).
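To make the risk model concrete, the following sketch evaluates its linear predictor for a single patient. It assumes that the high score group is the reference level, that the coefficient of -0.818 applies to the low score group, and that the group term enters the sum additively; the baseline hazard h0(t) is omitted because it cancels when two patients are compared. The function and constant names are hypothetical.

```python
import math

AGE_COEF = 0.035
STAGE_COEF = {"I": 0.0, "II": 0.368, "III": 0.756, "IV": 1.656}
# Assumption: high score group is the reference; -0.818 applies to the low group.
GROUP_COEF = {"high": 0.0, "low": -0.818}


def linear_predictor(age, stage, score_group):
    """Sum of the model terms inside the exponential."""
    return AGE_COEF * age + STAGE_COEF[stage] + GROUP_COEF[score_group]


def relative_hazard(age, stage, score_group):
    """Hazard relative to the baseline, i.e. the risk score divided by h0(t)."""
    return math.exp(linear_predictor(age, stage, score_group))


def hazard_ratio(patient_a, patient_b):
    """Hazard of patient_a relative to patient_b; each is (age, stage, group)."""
    return relative_hazard(*patient_a) / relative_hazard(*patient_b)
```

Under these assumptions, two patients of the same age and stage who differ only in score group have a hazard ratio of exp(-0.818), about 0.44, in favor of the low score group.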

Figure 3D presents time-dependent ROC curves for the PDEGs-score-based nomogram predicting 2-, 3-, and 5-year survival of patients with GC (AUC: 73.7%, 67.0%, and 67.2%, respectively), and Fig. 3E shows a modest predictive ability for 2-year survival according to the time-dependent AUC curve. In addition, calibration curves of 2-, 3- and 5-year OS showed good concordance between the predicted and the observed OS rates (Fig. S3).

TMB between different subgroups

TMB is a well-recognized biomarker that is relevant to the immunogenicity of tumors and predicts response or nonresponse to immune checkpoint blockade (ICB)18,19. As depicted in the two oncoplots, patients in the low PDEGs score group appeared to have higher TMB than their counterparts in the high PDEGs score group (Fig. 4A). Statistical testing confirmed a significant difference in TMB (P < 0.001) between the high and low PDEGs score groups (Fig. 4C). Besides, correlation analysis also revealed that PDEGs score was positively correlated with TMB (R = 0.4, P < 0.001) (Fig. 4B). Overall, patients in the low PDEGs score group had a higher mutation frequency and stronger immunogenicity.

Fig. 4

TMB of different subgroups in TCGA-STAD cohort. (A) Oncoplots showing the mutational difference between different subgroups. (B) The relationship between TMB and PDEGs score (Spearman correlation test, R = 0.4, P < 0.001). (C) TMB in different PDEGs score groups (Wilcoxon test, P < 0.001).

The infiltration of immunocytes and immunotherapy responses of TCGA-STAD cohort

The composition of immune cells within a tumor is considered a prominent predictive marker of prognosis and of effective ICB therapy. In view of this, the abundances of 22 immunocyte types were estimated in the different molecular subtypes using CIBERSORTx. As shown in Fig. 5A, compared with the high PDEGs score group, the low score group showed relatively more M0-like macrophages, activated mast cells, and T follicular helper (Tfh) cells, and relatively fewer M1-like macrophages, memory B cells, naïve B cells and resting mast cells. Taken together, patients in the low PDEGs score group had a more inflammatory anti-tumor microenvironment.

Fig. 5

The infiltration of immunocytes and the predictive value of molecular subtype for immunotherapy in TCGA-STAD cohort. (A) The relative immune infiltration score of 22 immune cells between low PDEGs score and high PDEGs score groups. (B) The distribution of TIDE score of 375 patients. (C) Scatter plot displaying the linear correlation between TIDE score and PDEGs score (Spearman correlation test, R = 0.46, P < 0.001). (D) Violin diagram revealing the difference of TIDE score between low PDEGs score and high PDEGs score groups (Wilcoxon tests, P < 0.001). (E) Bar plot depicting the difference of response rate between different subgroups (Chi-square tests, P < 0.001). *P < 0.05, **P < 0.01, ***P < 0.001.

Based on the RNA-seq expression profiles of 375 patients, the TIDE score of each patient was calculated to predict response to immunotherapy, including anti-PD1 and anti-CTLA4. According to the TIDE scores, patients were allotted into two groups: the Nonresponder group (TIDE score > 0) and the Responder group (TIDE score < 0) (Fig. 5B). PDEGs scores were significantly and positively correlated with TIDE scores (R = 0.46, P < 0.001) (Fig. 5C). In addition, the group comparison showed that the TIDE scores of patients in the high PDEGs score group were significantly higher than those of the low PDEGs score group (Fig. 5D), which implied a lower proportion of responders to immunotherapy (Fig. 5E). In conclusion, these results suggested that patients in the low score group were more likely to respond to immunotherapy, which is consistent with the results mentioned above.
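The grouping rule described above (TIDE score > 0 for nonresponders, TIDE score < 0 for responders) and the per-group responder proportion can be sketched as follows. The function names are hypothetical, and a score of exactly 0 is treated as unclassified because the text does not assign it to either group.

```python
def classify_tide(tide_score):
    """Predicted immunotherapy response from the sign of the TIDE score."""
    if tide_score > 0:
        return "nonresponder"
    if tide_score < 0:
        return "responder"
    return "unclassified"  # a score of exactly 0 is not covered by the stated rule


def responder_rate(tide_scores):
    """Proportion of predicted responders among the classified patients."""
    labels = [classify_tide(score) for score in tide_scores]
    classified = [label for label in labels if label != "unclassified"]
    if not classified:
        raise ValueError("no classifiable TIDE scores")
    return classified.count("responder") / len(classified)
```

Applying `responder_rate` separately to the TIDE scores of the high and low PDEGs score groups gives the two proportions compared in the bar plot.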

The prediction of chemotherapy

A number of studies have also found that the expression of cell-cycle-related genes is associated with a better prognosis and a greater likelihood of response to chemotherapy20,21. In this part, the “oncoPredict” package was employed to estimate the half-inhibitory concentration (IC50) for TCGA patients and to assess the sensitivity of the different subtypes to chemotherapeutics, including 5-fluorouracil (5-Fu), oxaliplatin, cisplatin, irinotecan, docetaxel and paclitaxel. The analysis showed that a low PDEGs score was associated with a lower IC50 for these chemotherapeutics (P < 0.05), except cisplatin (P = 0.1), as shown in Fig. 6. Collectively, this molecular classification based on PDEGs score group could act as a potential predictor of chemosensitivity.

Fig. 6

Relationships between chemotherapeutics sensitivity and molecular subtypes, which showed the difference of IC50 between high score group and low score group (Wilcoxon test).

Testing the predictive model by GSE13861 and GSE26901

Firstly, to validate the prognostic value of the PDEGs score derived from the training set, we applied the same formula to two validation sets (GSE13861 and GSE26901). Using the median PDEGs score as the threshold, the 4-gene prognostic signature significantly stratified the patients in GSE13861 for OS (HR 0.37, 95% CI 0.17–0.80, P = 0.008) independently of age and stage (Fig. 7A,E), although it did not show prognostic significance in the GSE26901 set (Fig. 7B). Secondly, considering the potential role of the PDEGs score in chemotherapy, we further validated the prognostic value of the 4-gene signature in the chemotherapy subgroup. The signature showed modest power to distinguish the prognosis of patients in the high and low PDEGs score groups (HR 0.34, 95% CI 0.14–0.82, P = 0.016) (Fig. 7C,D). Thirdly, we verified the ability of the PDEGs-score-based model to predict 2-year, 3-year and 5-year OS of patients with GC in the GSE13861 cohort. This prognostic model exhibited high accuracy (AUC = 0.910, 0.910, and 0.847, respectively) (Fig. 7F,G). Together, the clinical significance of the PDEGs score was confirmed in the GSE13861 cohort.

Fig. 7

The validation of PDEGs score and corresponding risk model in GSE13861 and GSE26901 cohorts. (A) The survival plot of high score and low score groups in GSE13861 cohort (log-rank test, P = 0.013). (B) The survival plot of high score and low score groups in GSE26901 cohort (log-rank test, P = 0.88). (C) Alluvial diagram of the subtype distributions in groups with different PDEGs score and survival outcomes in GSE13861. (D) The survival plot of high score and low score groups in the chemotherapy subgroup of GSE13861 cohort (log-rank test, P = 0.093). (E) Age, PDEGs score and TNM stage were independent factors of OS in the validation set (GSE13861). (F, G) Time-dependent ROC curves (F) and time-dependent AUC curve (G) evaluating the performance of the score-group-based model in predicting the prognosis of patients in the validation set (GSE13861).

The validation of differential expression of 4 genes in vitro and ex vivo

To verify the expression characteristics of the selected genes, we first examined their mRNA levels in each GC cell line and compared them with those in GES-1. Quantitative real-time polymerase chain reaction (qRT-PCR) in GC cell lines showed that F2R was up-regulated only in the HGC-27 cell line and was down-regulated in MKN-28 (Fig. 8A). The expression of PVT1 was markedly enhanced in GC cell lines except for SGC-7901 (Fig. 8B). Similarly, the expression of IRAK1 was markedly enhanced in GC cell lines except for HGC-27 (Fig. 8C). In addition, the mRNA level of INCENP increased significantly in MGC-803, SNU-719 and HGC-27, while it was substantially down-regulated in MKN-28 and SGC-7901 in comparison with GES-1 (Fig. 8D).

Fig. 8

The expression of four PDEGs in GC cell lines and tissues. (A–D) The evaluation of expression level of F2R (A), PVT1 (B), INCENP (C) and IRAK1 (D) in GC cells. (E–H) The evaluation of expression level of F2R (E), PVT1 (F), INCENP (G) and IRAK1 (H) in GC tissues. *P < 0.05, **P < 0.01, ***P < 0.001.

To further investigate whether the expression of these genes was up-regulated in clinical patients, their mRNA levels were measured in tumor tissue specimens and corresponding adjacent normal tissue specimens from 15 patients with GC. As confirmed by qRT-PCR, the four genes were highly expressed in tumor tissue compared with adjacent normal tissue (Fig. 8E–H).

The validation of differential expression of 4 genes in drug-resistant cell lines and identification of F2R as the hub gene for tumor progression

Given that chemotherapy remains the preferred adjuvant therapy and oxaliplatin remains one of the first-line drugs for advanced gastric cancer, we further validated the expression of the four genes in the drug-resistant cell line and its parental cell line. We first confirmed that MGC-803R is markedly resistant to oxaliplatin compared with wild-type MGC-803, which is sensitive to oxaliplatin (Fig. 9A). qRT-PCR showed that F2R and PVT1 were significantly upregulated in the drug-resistant GC cells, while no difference in the expression of INCENP and IRAK1 was observed between MGC-803 and MGC-803R (Fig. 9B). Combining the above experimental results, we found that F2R was the only gene significantly increased in patients with poor survival, in GC tissue, and in oxaliplatin-resistant GC cells (Fig. 9C). Meanwhile, the expression of INCENP, IRAK1 and PVT1 was not elevated in patients with poor prognosis or in drug-resistant GC cells. Hence, F2R might be the hub gene responsible for disease progression.

Fig. 9

The difference in the expression of four PDEGs between drug-resistant MGC-803 and its parental cell line. (A) Cell viability of MGC-803 and MGC-803R cells treated with a concentration gradient of oxaliplatin for 48 h, measured by CCK8 assay (IC50: 15.60 μM vs 51.13 μM). (B) The evaluation of expression level of F2R, PVT1, INCENP and IRAK1 in MGC-803R and its parental cell line. (C) Venn diagram indicating that F2R is the hub gene. (D) Representative sections of GC tissue with high or low expression of F2R (scale bars: 60 μm). (E) Kaplan–Meier survival curves for OS of GC patients with low or high F2R expression (log-rank test, P = 0.0033). (F) Western blot confirming the successful silencing of F2R. (G) Cell viability curves of HGC27-si-NC and HGC27-si-F2R cells treated with a concentration gradient of oxaliplatin for 48 h, measured by CCK8 assay (IC50: 1.42 μM vs 0.42 μM). (H) Silencing of F2R markedly inhibited wound healing. (I) Western blot analysis demonstrating a marked elevation in γ-H2A.X levels within gastric cancer cells (exposed to 1 μM oxaliplatin) following the suppression of F2R expression. (J) Flow cytometry analysis of the HR and NHEJ DNA repair assays measuring the ratio of GFP+ cells among mCherry+ cells. (K) Western blot analysis revealing alterations in the expression levels of E-cadherin, N-cadherin, and Vimentin subsequent to the inhibition of F2R expression. *P < 0.05, **P < 0.01, ***P < 0.001.

To further explore the role of F2R, we first conducted an IHC analysis of F2R in 97 patients and investigated the relationship between F2R and overall survival. The results showed that high F2R expression was associated with poor survival (Fig. 9D,E). In addition, silencing F2R in GC cells reduced resistance to oxaliplatin and impaired the capacity for invasion and metastasis (Fig. 9F–H). Since cancer cells resist the effects of chemotherapy drugs by enhancing their DNA repair capabilities22, which leads to drug resistance, we further investigated whether F2R silencing affected the DNA repair mechanism. WB analysis showed that, after F2R silencing, the level of γ-H2A.X, a marker of DNA damage, was significantly increased in cells exposed to 1 μM oxaliplatin, suggesting that F2R may promote resistance to oxaliplatin by regulating the DNA repair pathway (Fig. 9I). HR and NHEJ are the two main pathways for DNA repair. We next performed HR and NHEJ assays to determine which DNA repair pathway was affected by F2R. The results indicated that knocking down F2R impaired the HR repair pathway of cancer cells but did not affect the NHEJ pathway (Fig. 9J). Invasion and metastasis are major causes of poor prognosis in cancer patients, and EMT (epithelial-mesenchymal transition) is a key biological process through which cancer cells acquire invasive and migratory abilities. To further validate the effect of F2R on tumor invasion and metastasis, we examined the expression of EMT markers (E-cadherin, N-cadherin and Vimentin) in si-NC and si-F2R cells. As shown in Fig. 9K, following the knockdown of F2R, a notable reduction in E-cadherin expression was observed, accompanied by a significant upregulation of N-cadherin and Vimentin. These findings underscore the role of F2R in facilitating metastatic processes. Collectively, high F2R expression is associated with the development of oxaliplatin resistance and tumor metastasis, which is consistent with the results above.
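The survival comparison in Fig. 9E relies on a log-rank test between the high and low F2R groups. As an illustration only, the following minimal Python sketch shows how a two-group log-rank statistic can be computed. The function name logrank_test and the toy follow-up data are assumptions made for this example; they are not the patient data or the software used in this study.

```python
import math


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test. Returns (chi-square statistic, p-value).

    times_*  : follow-up times (any consistent unit)
    events_* : 1 if death was observed, 0 if the patient was censored
    """
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    # Distinct times at which at least one death occurred
    event_times = sorted({t for t, e, _ in data if e})

    observed_a = expected_a = variance = 0.0
    for t in event_times:
        # Patients still at risk just before time t, per group
        n_a = sum(1 for ti, _, g in data if ti >= t and g == 0)
        n_b = sum(1 for ti, _, g in data if ti >= t and g == 1)
        # Deaths occurring exactly at time t
        d_a = sum(1 for ti, e, g in data if ti == t and e and g == 0)
        d_b = sum(1 for ti, e, g in data if ti == t and e and g == 1)
        n, d = n_a + n_b, d_a + d_b
        observed_a += d_a
        expected_a += d * n_a / n
        if n > 1:
            # Hypergeometric variance of the number of deaths in group A
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)

    if variance == 0.0:
        return 0.0, 1.0
    chi2 = (observed_a - expected_a) ** 2 / variance
    # Upper tail of a chi-square distribution with 1 degree of freedom
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Toy data (hypothetical): months of follow-up and event indicators
high_f2r_times, high_f2r_events = [5, 8, 12, 14, 20], [1, 1, 1, 1, 0]
low_f2r_times, low_f2r_events = [15, 22, 30, 34, 40], [1, 0, 1, 0, 0]
chi2, p = logrank_test(high_f2r_times, high_f2r_events,
                       low_f2r_times, low_f2r_events)
print(f"chi-square = {chi2:.3f}, p = {p:.4f}")
```

In practice, a dedicated survival analysis package would normally be used; the sketch only makes the observed-versus-expected logic of the test explicit.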

Discussion

Globally, GC is one of the leading causes of preventable death and imposes a heavy disease burden. East Asia, including China, is the region with the highest prevalence and mortality of gastric carcinoma1. Although great strides have been made in treatment in recent years, such as immunotherapy, targeted therapy and adjuvant chemotherapy, patients with GC still have poor clinical outcomes. One important reason is that current clinical staging cannot predict patients’ response to treatment. For instance, classic TNM staging predicts patient prognosis well but provides limited information to guide immunotherapy, targeted therapy and adjuvant chemotherapy23,24. With the rapid progress of sequencing technology, numerous studies and databases have provided researchers with massive amounts of gene sequencing data for various carcinomas. However, a large amount of publicly available online data is still not fully utilized.

In this study, we identified 19 PDEGs among 399 genes related to the cell cycle and selected four critical genes to develop the molecular classification by stepwise Cox regression analysis. The overall survival of patients with GC differed between PDEGs score groups. In addition, correlation and multivariable analyses identified the PDEGs score as an independent factor for OS in TCGA patients, indicating the predictive value of this molecular subtype for patients’ prognosis. To further improve the prognostic value of this classification, we constructed prognostic models based on molecular subtype, age and TNM stage and evaluated their predictive ability using ROC curves. These models showed a definite but relatively modest ability to predict 2-, 3- and 5-year survival rates for TCGA patients. Moreover, in the validation set GSE13861 and its chemotherapy subgroup, the PDEGs score and the corresponding model were also significantly associated with patient survival.
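To make the scoring step concrete, the following minimal Python sketch shows how a Cox-derived score of this kind is typically computed, namely as a coefficient-weighted sum of gene expression values, and how patients can then be divided into two groups. The coefficients, the expression values and the median cutoff are hypothetical placeholders chosen for illustration; the fitted coefficients and the cutoff used in this study are not restated in this section.

```python
from statistics import median

# Hypothetical coefficients for illustration only (not the fitted values)
HYPOTHETICAL_COEFS = {"INCENP": -0.30, "F2R": 0.25, "IRAK1": -0.20, "PVT1": 0.15}


def risk_score(expression, coefs):
    """Linear predictor: sum of coefficient * expression over the model genes."""
    return sum(coefs[gene] * expression[gene] for gene in coefs)


def split_by_median(scores):
    """Label each patient 'high' or 'low' relative to the cohort median score.

    Using the median as the cutoff is an assumption of this sketch.
    """
    cutoff = median(scores.values())
    return {pid: ("high" if s > cutoff else "low") for pid, s in scores.items()}


# Toy expression profiles (hypothetical values, e.g. log2-transformed)
cohort = {
    "patient_1": {"INCENP": 5.1, "F2R": 7.9, "IRAK1": 6.0, "PVT1": 4.2},
    "patient_2": {"INCENP": 6.3, "F2R": 4.8, "IRAK1": 6.8, "PVT1": 3.1},
    "patient_3": {"INCENP": 5.7, "F2R": 6.5, "IRAK1": 5.9, "PVT1": 4.9},
    "patient_4": {"INCENP": 6.0, "F2R": 5.2, "IRAK1": 7.1, "PVT1": 3.6},
}
scores = {pid: risk_score(expr, HYPOTHETICAL_COEFS) for pid, expr in cohort.items()}
groups = split_by_median(scores)
for pid in cohort:
    print(pid, round(scores[pid], 3), groups[pid])
```

The resulting group labels are the kind of input that a survival comparison between score groups would use.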

In our study, four genes (INCENP, F2R, IRAK1 and PVT1) were shown to play crucial parts in the PDEGs score model. Moreover, validation experiments in cell lines and human specimens confirmed that these critical genes were upregulated compared with normal epithelial cell lines or normal tissues. Mechanistically, INCENP is a critical component of the chromosomal passenger complex (CPC), which acts as a key regulator of mitosis. Its upregulation is believed to be associated with poor prognosis, and targeting INCENP is considered a promising treatment approach in some cancer patients25,26. F2R is a 7-transmembrane receptor involved in the regulation of the thrombotic response. It has been reported to correlate with the killing effect of cytotoxic T cells and with the expression of PD-1 and PD-L1 in tumor tissues of patients with liver cancer27. Notably, F2R has also been shown to mediate resistance to PI3K pathway inhibitors in PTEN-null triple-negative breast cancer28. IRAK1 encodes interleukin-1 receptor-associated kinase 1, which is partially responsible for IL1-induced upregulation of the transcription factor NF-kappa B and plays a critical role in initiating the innate immune response against foreign pathogens. It is noteworthy that IRAK1 has been found to affect disease progression and chemoresistance in many types of cancer, including hepatocellular carcinoma, breast cancer and nasopharyngeal carcinoma29,30,31. PVT1 is a long non-coding RNA that has been identified as a candidate oncogene. Transcription of PVT1 is regulated by p53 through a canonical p53-binding site. PVT1 is closely related to tumor chemoresistance in gastric cancer and colon cancer32,33. In addition, recent research has shown that PVT1 inhibition can stimulate anti-tumor immunity and suppress tumor growth in HNSCC34.

Cellular components (e.g., fibroblasts and immunocytes), cytokines and metabolites in the tumor microenvironment (TME) can dramatically affect the immune response to cancer cells and the response to immunotherapy, and are thus related to prognosis in patients with solid cancer35,36. Cytotoxic CD8-positive T cells and NK cells in the TME have a powerful ability to kill tumor cells, whereas Treg cells play the opposite role through their robust immunosuppressive function37,38,39. In addition, high infiltration of M1-like cells in the TME indicates a better prognosis, whereas a high abundance of M2-like cells is associated with tumor growth and metastasis as well as poor clinical outcome40. The depletion of immunosuppressive M2-like cells can enhance the effectiveness of immunotherapies41. A recent study of T follicular helper cells revealed that Tfh cells exert an antitumor immune effect in a CD8+-dependent manner42. Single-cell sequencing analyzes each cell individually and can therefore identify subpopulations of immune cells more accurately. However, it is time-consuming and expensive for some researchers. CIBERSORTx is a simple and convenient tool that estimates the composition and abundance of immune cells in mixed cell populations using deconvolution algorithms15. In this paper, CIBERSORTx was used to analyze the relative proportions of 22 immune cell types in each GC specimen. Our results indicated stronger anti-tumor immunity in patients in the low PDEGs score group. Specifically, tumors in the low PDEGs score group had a lower abundance of M2-like macrophages and a higher abundance of follicular helper T cells. This difference in the inflammatory cell infiltration phenotype of the TME could be attributed to alterations in cell cycle pathways. For instance, inhibition of CDK4 and CDK6, key cyclin-dependent kinases of the cell cycle, has been shown to enhance T cell infiltration and anti-tumor immunity, thereby augmenting the clinical benefit of immunotherapy12,13. This could be one of the reasons for the poor prognosis of patients in the high score group.

ICBs are capable of strengthening and maintaining the anti-tumor responses of T cells in the TME by interrupting inhibitory signals, such as PD-1/PD-L1 and CTLA-4/B7 signaling. Response rates to ICB treatment vary widely among patients, and only a few patients benefit from it43. Therefore, it is important to identify patients who are likely to respond to ICBs before treatment. TMB refers to the total number of mutations, including base substitutions and insertions or deletions, detected per million bases. TMB has proven to be a reliable biomarker for predicting response to ICBs44,45. Mechanistically, researchers generally agree that the immunogenicity of tumors is closely associated with TMB. In other words, the production of new tumor-associated antigens (TAAs) depends on gene mutations, which means that, in theory, a high TMB indicates stronger immunogenicity of cancer cells43,46,47. The TIDE score is a predictive model based on gene signatures of T cell dysfunction and low infiltration of cytotoxic T cells, both of which result in tumor immune evasion. It is considered to predict ICB response better than biomarkers such as TMB and PD-L1 level48. Results in this study showed higher TMB and lower TIDE scores, which imply a higher response rate to immunotherapy, in patients in the low PDEGs score group than in the high score group. Consequently, the molecular classification is closely linked to TMB and to the response to ICBs. This could be another reason why patients with a low PDEGs score have a better clinical outcome.
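The definition of TMB given above can be expressed as a short calculation. The following Python sketch is illustrative only: the function name, the mutation count and the 38 Mb size of the sequenced region are assumptions made for this example, not values taken from the analysis in this study.

```python
def tumor_mutational_burden(n_mutations, covered_bases):
    """Return TMB as mutations per megabase (per million bases) of sequence.

    n_mutations   : count of somatic substitutions plus insertions/deletions
    covered_bases : number of bases in the sequenced region
    """
    if covered_bases <= 0:
        raise ValueError("covered_bases must be positive")
    return n_mutations / (covered_bases / 1e6)


# Hypothetical example: 380 somatic mutations over an assumed 38 Mb region
example_tmb = tumor_mutational_burden(380, 38_000_000)
print(f"TMB = {example_tmb:.1f} mutations/Mb")
```

Because the denominator depends on the size of the region actually sequenced, TMB values are comparable only when the same region size is assumed across samples.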

5-FU, oxaliplatin, cisplatin, irinotecan, docetaxel and paclitaxel are common chemotherapeutic drugs for gastric cancer49. These drugs can damage the DNA of cancer cells directly or indirectly50,51,52. However, sensitivity to the same drug varies greatly among patients. The variability in outcomes among patients receiving the same therapy for seemingly similar tumors can be attributed in part to genetics53. Cell cycle checkpoint pathways influence sensitivity to chemotherapy through multiple mechanisms, as illustrated by the functions of the key PDEGs described above. Primary and acquired resistance to anti-tumor drugs is one of the important factors affecting the prognosis of patients. “oncoPredict” is an R package for predicting in vivo or cancer patient drug response and biomarkers from cell line screening data17. In this paper, we used this R package to estimate the IC50 of these chemotherapy drugs for each patient, and the results supported a correlation between individual drug sensitivity and molecular subtype. More importantly, in the subsequent validation experiment, the expression level of F2R was significantly elevated in the drug-resistant cell line, which was consistent with the results of our bioinformatics analysis. This implies that the PDEGs score may be used as a candidate biomarker for predicting the efficacy of chemotherapy and that F2R may be a key molecule for oxaliplatin resistance in GC. The capacity of some cancer cells to repair DNA damage influences the effectiveness of chemotherapeutics22. The results of this study revealed that suppression of F2R could impair the capacity for homologous recombination repair in gastric cancer. Thus, this may be the underlying mechanism for the predictive power of our molecular subtypes for chemotherapy.
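For readers less familiar with IC50 values, the following Python sketch illustrates the concept: the IC50 is the drug concentration at which cell viability falls to 50% of the untreated control, and the ratio of two IC50 values expresses the fold change in resistance. The sketch uses simple log-linear interpolation between the two measured concentrations that bracket 50% viability. This is a deliberate simplification chosen for illustration, not the curve-fitting procedure used for the CCK8 data in this study, and the dose-response values are hypothetical.

```python
import math


def ic50_by_interpolation(concentrations, viabilities):
    """Estimate IC50 by log-linear interpolation.

    concentrations : ascending, positive drug concentrations (same unit)
    viabilities    : viability at each concentration, as a fraction of the
                     untreated control (1.0 = 100%)
    Returns None if the measured range never crosses 50% viability.
    """
    points = list(zip(concentrations, viabilities))
    for (c_lo, v_lo), (c_hi, v_hi) in zip(points, points[1:]):
        if v_lo >= 0.5 >= v_hi and v_lo != v_hi:
            # Position of the 50% point between the two bracketing doses
            frac = (v_lo - 0.5) / (v_lo - v_hi)
            log_ic50 = math.log10(c_lo) + frac * (math.log10(c_hi) - math.log10(c_lo))
            return 10 ** log_ic50
    return None


def resistance_fold(ic50_resistant, ic50_parental):
    """Fold change in IC50 of a resistant line relative to its parental line."""
    return ic50_resistant / ic50_parental


# Hypothetical dose-response data (concentrations in micromolar)
doses = [1, 5, 25, 125]
viability = [0.95, 0.80, 0.40, 0.10]
print("Estimated IC50:", ic50_by_interpolation(doses, viability))
```

Applied to the IC50 values reported in the Fig. 9A legend (15.60 μM for MGC-803 and 51.13 μM for MGC-803R), resistance_fold gives a roughly 3.3-fold increase in the resistant line.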

Conclusions

In conclusion, this comprehensive analysis developed and validated a combined genetic signature of the P53 and G2/M checkpoint pathways for GC classification. This molecular classification is significantly related to the clinical prognosis of patients with GC and has the potential to guide individualized treatment, including chemotherapy and immunotherapy.