Introduction

Gastric cancer is a major global health burden, with more than one million new cases each year, and it is the fifth leading cause of cancer-related death1. Its morphological heterogeneity and the complexity of the available classification systems mean that conventional histopathological classification offers limited prognostic precision2. Early detection also remains difficult, and many patients present at a late stage. Outside Japan, the overall 5-year relative survival rate is only about 20% in most regions of the world3, underscoring the urgent need for robust biomarkers that can predict the prognosis of patients with gastric cancer. In recent years, various prognostic models have been developed using bioinformatics methods. For example, Gui et al. constructed a prognostic model based on cancer-associated fibroblast (CAF) genes in cervical cancer4, while Bian et al. established a cancer-testis antigen-related gene signature for gastric cancer5. More recently, Sun et al. developed an immune-related signature for papillary thyroid cancer using comprehensive immune infiltration analysis6, and Shi et al. constructed an APOBEC mutagenesis-related model for bladder cancer through machine learning approaches7. These studies demonstrate the feasibility and significance of developing prognostic models based on specific gene families or molecular features. The present study aims to construct a prognostic signature composed of CLEC family genes to predict the outcomes of gastric cancer patients.

The CLEC gene family encompasses genes bearing C-type lectin domains. C-type lectins (CLECs) are carbohydrate-binding proteins (lectins)8, and the ‘C’ denotes their requirement for calcium. The proteins encoded by CLEC family genes play pivotal roles within the immune system, including cell–cell adhesion and immune responses to pathogens9. Drickamer et al. categorized C-type lectins into seven subgroups (I to VII) based on the arrangement of the protein domains within each protein10. This classification was updated in 2002 with seven additional subgroups (VIII to XIV)11 and later expanded to include three more subgroups (XV to XVII)8. Owing to the diversity of their protein domains, CLEC proteins exhibit varied regulatory roles in response to different cellular environments and stimuli. Convincing evidence suggests that CLEC proteins play pivotal roles in activating and reshaping the immune system12. In particular, neutrophils express a subset of C-type lectin innate immune receptors, including Dectin-1 (CLEC7A), Mincle (CLEC4E), MDL-1 (CLEC5A), Mcl (CLEC4D), and CLEC2. Among these, CLEC7A is the primary receptor for fungal β-glucans and plays a role in fungal recognition by neutrophils. CLEC4E is a versatile receptor capable of recognizing Malassezia fungi, mycobacterial structures, and cytosolic danger signals such as SAP130. CLEC5A is implicated in viral recognition, while CLEC4α3 increases T-cell infiltration into the spinal cord after nerve root injury13.

Furthermore, CLEC2 is predominantly expressed in platelets or megakaryocytes and can adhere to cancer cells, inducing the release of pro-inflammatory cytokines14, thus inhibiting the proliferation of gastric cancer cell lines15. CLEC2 also interacts with podoplanin, promoting lung metastasis of osteosarcoma cancer cells through cell adhesion16,17. In summary, CLEC family genes may participate in tumor immunity and contribute to the initiation and progression of cancer.

With the advancement of large-scale genome sequencing technologies, integrating family gene markers associated with prognosis has improved the early diagnosis of cancer, surpassing traditional clinical parameters and single-gene predictive models. In this study, we selected CLEC family genes associated with prognosis from the TCGA dataset. We validated the prognostic value of three key genes within the CLEC family using the GEO dataset. Furthermore, we constructed a nomogram based on the risk score derived from these three key genes and clinical characteristics to predict individual overall survival (OS). In summary, our work holds the potential to contribute to the early diagnosis of gastric cancer patients.

Materials and methods

Collection of data

RNA sequencing data from gastric cancer patients were extracted from The Cancer Genome Atlas (TCGA, https://cancergenome.nih.gov) and the Gene Expression Omnibus (GEO, https://www.ncbi.nlm.nih.gov/geo/). RNA sequencing data for normal gastric tissue were extracted from the Genotype-Tissue Expression (GTEx, https://www.gtexportal.org/) database. Relevant clinical information, including age, gender, grade, survival status, and TNM staging, was obtained. In this study, we collected 412 gastric cancer samples and 36 adjacent non-cancer samples from TCGA, 483 gastric cancer samples from the GSE84437 dataset, and 174 normal gastric tissue samples from the GTEx database. The standardization of RNA-Seq data is essential for robust inference and reproducible results18,19. We uniformly converted TCGA and GEO data into TPM format, followed by normalization to eliminate batch effects using R packages. Subsequently, we extracted the expression levels of CLEC family genes for survival analysis to identify CLEC genes associated with prognosis. Using these genes, we constructed a prognostic model and validated its accuracy.
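
The TPM conversion mentioned above can be illustrated with a minimal sketch. The input unit is not stated in this section, so the sketch assumes per-sample FPKM values, for which TPM_i = FPKM_i / ΣFPKM × 10^6. The function name and the expression values are hypothetical and serve only as an illustration; the actual analysis was carried out with R packages.

```python
def fpkm_to_tpm(fpkm):
    """Convert one sample's FPKM values (gene -> value) to TPM.

    Assumes the input is FPKM; each value is rescaled so that the
    sample's values sum to one million.
    """
    total = sum(fpkm.values())
    if total <= 0:
        raise ValueError("sample has no expressed genes")
    return {gene: value / total * 1e6 for gene, value in fpkm.items()}


# Hypothetical FPKM values for a single sample (illustration only).
sample_fpkm = {"VCAN": 30.0, "CD93": 15.0, "CLEC3A": 5.0}
sample_tpm = fpkm_to_tpm(sample_fpkm)
```

Because every sample is rescaled to the same total, TPM values are comparable across samples in a way that raw FPKM values are not.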

Establishment and testing of the risk score model

First, we applied the ‘limma’ package in R to select differentially expressed genes within the CLEC family. Subsequently, we randomly divided TCGA-STAD patient samples into training and testing sets. Concurrently, we conducted chi-square tests to analyze whether the training and testing sets were associated with clinical characteristics, ensuring the reliability of the randomized grouping results. We established a prognostic model using the training set and validated it with the testing set. Through univariate Cox regression analysis, least absolute shrinkage and selection operator (LASSO) regression analysis, and multivariate Cox regression analysis, we identified three CLEC family genes associated with the prognosis of gastric cancer patients. These genes serve as prognostic markers for predicting the prognosis of gastric cancer patients. Subsequently, we divided the training set into high-risk and low-risk groups using the median risk score as the threshold. We employed Kaplan–Meier survival curve analysis to assess the overall survival (OS) differences between the two groups and risk curves to evaluate the model’s reliability.
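
The risk score and the median split described above can be sketched as follows. The sketch assumes the usual linear form, risk score = Σ βi × Expi, where βi is the multivariate Cox coefficient of gene i and Expi is its expression level. The coefficients, patient identifiers, and expression values below are hypothetical and are not the fitted values of this study.

```python
from statistics import median


def risk_score(expression, coefficients):
    """Linear predictor: sum of coefficient * expression over signature genes."""
    return sum(coefficients[g] * expression[g] for g in coefficients)


def split_by_median(scores):
    """Label a patient 'high' if the score exceeds the cohort median, else 'low'."""
    cutoff = median(scores.values())
    return {pid: ("high" if s > cutoff else "low") for pid, s in scores.items()}


# Hypothetical coefficients and expression values (not the fitted model).
coefficients = {"VCAN": 0.2, "CD93": 0.1, "CLEC3A": 0.3}
patients = {
    "P1": {"VCAN": 1.0, "CD93": 2.0, "CLEC3A": 0.0},
    "P2": {"VCAN": 3.0, "CD93": 1.0, "CLEC3A": 1.0},
    "P3": {"VCAN": 0.5, "CD93": 0.5, "CLEC3A": 0.5},
    "P4": {"VCAN": 4.0, "CD93": 3.0, "CLEC3A": 2.0},
}
scores = {pid: risk_score(expr, coefficients) for pid, expr in patients.items()}
groups = split_by_median(scores)
```

In this sketch a patient whose score equals the median is assigned to the low-risk group; that tie-breaking rule is an assumption.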

In addition, we performed univariate and multivariate Cox analyses on the TCGA and GEO datasets to confirm whether the three prognostic genes were independent prognostic factors for gastric cancer when compared to other clinical features such as age, gender, clinical staging, TNM, and risk scores. Next, we used ROC curves to validate the accuracy of clinical parameters and survival rates in the prognostic model. We analyzed the correlation between risk scores and clinical parameters, explored whether there were differences in risk scores among different clinical parameters, and ultimately constructed a nomogram to predict survival rates based on the prognostic model.

Immune analysis

To assess whether CLEC prognostic gene markers contribute to distinguishing differences in the tumor microenvironment of gastric cancer, we utilized the R package ‘ESTIMATE’ to compare the gene expression characteristics of stromal cells and immune cells between high-risk and low-risk groups. We employed the R package ‘CIBERSORT’ to investigate differences in the expression of infiltrating immune cells between the high-risk and low-risk groups. Further analysis of immune cell-related functions between risk groups was conducted using ‘GSVA’ and ‘GSEABase’ R packages, among others. Additionally, we evaluated the immune evasion status and the effectiveness of immunotherapy in patients from different risk groups by scoring tumor samples using the online database Tumor Immune Dysfunction and Exclusion (TIDE, http://tide.dfci.harvard.edu/). Finally, we employed the ‘oncoPredict’ R package to analyze drug sensitivity in the high-risk and low-risk groups.

Tumor mutation analysis

To assess the relationship between CLEC family prognostic biomarkers and tumor mutation characteristics, we downloaded somatic mutation data for gastric cancer patients from the TCGA database. Using the R package “maftools,” we analyzed and visualized mutation profiles in both high- and low-risk groups, selecting the top 15 genes with the highest mutation frequencies to generate an oncoplot. For each sample, the number of nonsynonymous mutations was calculated to represent the tumor mutation burden (TMB), with TMB values log2-transformed. We then applied the Wilcoxon rank-sum test to compare TMB differences between high- and low-risk groups. Using the R package “survminer,” we determined the optimal TMB cut-off value, categorizing patients into high-TMB (H-TMB) and low-TMB (L-TMB) groups. Combined with risk scores, patients were further stratified into four subgroups: H-TMB + high risk, H-TMB + low risk, L-TMB + high risk, and L-TMB + low risk. Kaplan–Meier survival analysis was conducted to evaluate the prognostic value of TMB level and its combination with risk scores.
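
The TMB calculation and the four-way stratification described above can be sketched as follows. The set of variant classes treated as nonsynonymous, the plain log2 transform of the count, the cut-off value, and the sample data are all assumptions made for illustration; in the study these steps were performed with “maftools” and “survminer”.

```python
import math

# Variant classes counted as nonsynonymous here; this set is an assumption.
NONSYNONYMOUS = {
    "Missense_Mutation", "Nonsense_Mutation", "Nonstop_Mutation",
    "Frame_Shift_Del", "Frame_Shift_Ins", "In_Frame_Del", "In_Frame_Ins",
    "Splice_Site", "Translation_Start_Site",
}


def log2_tmb(mutations):
    """Count nonsynonymous mutations per sample and log2-transform the count.

    mutations: iterable of (sample_id, variant_class) pairs.
    Samples without any nonsynonymous mutation are omitted.
    """
    counts = {}
    for sample, variant_class in mutations:
        if variant_class in NONSYNONYMOUS:
            counts[sample] = counts.get(sample, 0) + 1
    return {s: math.log2(n) for s, n in counts.items()}


def stratify(tmb, risk_group, cutoff):
    """Combine TMB level and risk group into one of four subgroup labels."""
    labels = {}
    for sample, value in tmb.items():
        level = "H-TMB" if value > cutoff else "L-TMB"
        labels[sample] = f"{level} + {risk_group[sample]} risk"
    return labels


# Hypothetical mutation calls, risk groups, and cut-off (illustration only).
mutations = [("S1", "Missense_Mutation")] * 4 + [
    ("S2", "Nonsense_Mutation"),
    ("S2", "Silent"),
]
tmb = log2_tmb(mutations)
labels = stratify(tmb, {"S1": "low", "S2": "high"}, cutoff=1.0)
```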

Prediction of chemotherapy drug sensitivity

We used the R package “oncoPredict” to perform drug sensitivity prediction analysis across patient groups with different risk levels. This analysis was based on drug sensitivity data and gene expression profiles from the GDSC2 (Genomics of Drug Sensitivity in Cancer) database. Using preprocessed expression data, we predicted the half-maximal inhibitory concentration (IC50) values for each sample in response to commonly used clinical chemotherapy drugs. The Wilcoxon rank-sum test was applied to compare drug sensitivity differences between high- and low-risk groups (P < 0.001), and box plots were generated using the ggplot2 package to visualize the results. This analysis aims to provide a reference for selecting personalized treatment options for patients in different risk groups.
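
The group comparison described above can be sketched with a small implementation of the Wilcoxon rank-sum test. This version uses the normal approximation, assigns average ranks to ties, and does not apply a tie or continuity correction, so it is a simplification rather than a reproduction of the R routine used in the study. The predicted IC50 values are hypothetical.

```python
import math


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test via the normal approximation.

    Ties receive average ranks; the variance is not tie-corrected.
    Returns (U statistic for x, two-sided p-value).
    """
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    rank_sum_x = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2  # mean of the 1-based ranks i+1 .. j
        rank_sum_x += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    n1, n2 = len(x), len(y)
    u = rank_sum_x - n1 * (n1 + 1) / 2
    mean = n1 * n2 / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean) / sd
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail probability
    return u, p


# Hypothetical predicted IC50 values for one drug (illustration only).
ic50_high_risk = [1.0, 1.2, 0.8, 1.1, 0.9]
ic50_low_risk = [2.0, 2.3, 1.9, 2.1, 2.2]
u_stat, p_value = rank_sum_test(ic50_high_risk, ic50_low_risk)
```

A lower predicted IC50 in one group, together with a small p-value, would be read as higher predicted sensitivity of that group to the drug.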

Gene-set enrichment analysis

We conducted Gene Ontology (GO) term enrichment analysis and Kyoto Encyclopedia of Genes and Genomes (KEGG)20,21,22 pathway analysis to investigate potential signaling pathways and functions associated with the model containing three CLEC family genes. Furthermore, we utilized Gene Set Enrichment Analysis (GSEA) to elucidate further the regulated signaling pathways in high-risk and low-risk group patients.

Immunohistochemical analysis

With the approval of the First Affiliated Hospital of Gannan Medical University, we collected 20 pairs of gastric cancer and adjacent normal tissue specimens. All patients provided written informed consent and had not received radiotherapy or chemotherapy prior to surgery. Paraffin-embedded tumor sections were incubated overnight at 4 °C with Anti-VCAN rabbit polyclonal antibody (1:200), Anti-CD93 rabbit polyclonal antibody (1:400), and Anti-CLEC3A rabbit polyclonal antibody (1:200). The sections were then incubated with enzyme-labeled secondary antibodies at 37 °C for 60 min. DAB staining was performed, followed by hematoxylin counterstaining. Photographs were taken using an inverted microscope. Protein expression levels in each section were quantified using the mean Integrated Optical Density (IOD) method with Image-Pro Plus software. Anti-VCAN and Anti-CLEC3A antibodies were purchased from Beijing Solarbio Science & Technology Co., Ltd. (Beijing, China), and the Anti-CD93 antibody was obtained from Elabscience Biotechnology Co., Ltd. (Wuhan, China).

Statistical analysis

We conducted statistical analyses using R software (version 4.3.1, x64), including differential analysis, Cox regression analysis, LASSO analysis, survival analysis, ROC curve analysis, gene enrichment analysis, immune-related analysis, mutation analysis, and drug sensitivity analysis. Immunohistochemical experimental data were analyzed using GraphPad Prism 10.1.2 software for paired analysis.

Results

Features of patients with GC enrolled in this study

We obtained sequencing data and corresponding clinical data for 412 tumors and 36 adjacent non-cancer tissues of STAD from the TCGA database. Clinical information for the 412 gastric cancer patients, including age, gender, tumor grade, tumor stage, and pathological TNM staging, is summarized in Supplementary Table 1.

Construction and evaluation of the prognostic risk signature

This study aimed to explore the prognostic significance of differentially expressed genes within the CLEC family in gastric cancer. The process of constructing and validating the signature of three prognostic CLEC family genes is illustrated in Fig. 1. First, we integrated data from the TCGA and GTEx databases to conduct a differential analysis of CLEC family genes in gastric cancer tissues. Genes with |logFC| > 0.585 and FDR < 0.05 were selected as significantly differentially expressed using the ‘limma’ package, as shown in Fig. 2. Subsequently, univariate Cox regression was performed on the differentially expressed genes, and nine significant prognostic genes (ATRNL1, CD93, CD248, CLEC3A, CLEC4A, ASGR2, CLEC11A, SELL, VCAN) were identified (Fig. 3A). Next, we used LASSO regression to establish prognostic features and determine coefficients (Fig. 3B–C). Finally, three CLEC family genes were included in the signature, with each coefficient representing the weight of the corresponding gene’s expression (P < 0.05, Supplementary Table 2). We then employed multivariate Cox regression to calculate their respective coefficients (βi) and build the risk score model. We also validated the randomness of the grouping through chi-square tests (Supplementary Table 3). We analyzed the expression differences of the three prognostic CLEC family genes in 412 GC patients and 36 adjacent samples using unpaired t-tests. The results showed that VCAN, CLEC3A, and CD93 were upregulated in the high-risk group (Fig. 4A–B). Using the median risk score as the cutoff, we divided the 412 patients into high-risk and low-risk groups (Fig. 4C). Next, we performed PCA to assess the distributions of the high-risk and low-risk groups. Patients tended to separate into two groups, indicating distinct states of gastric cancer patients in the two risk score groups (Fig. 4D). The scatter plot shows that higher risk was associated with more deceased patients (Fig. 4E). Kaplan–Meier survival curves confirmed a negative correlation between risk score and prognosis (P = 0.003, Fig. 4F).
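
The differential-expression filter described above can be sketched as follows. The logFC threshold of 0.585 is approximately log2(1.5), i.e. a 1.5-fold change. The sketch assumes that the FDR is obtained by Benjamini–Hochberg adjustment of the raw p-values, which is an assumption about the procedure; the gene names, logFC values, and p-values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_degs(results, lfc_cutoff=0.585, fdr_cutoff=0.05):
    """Keep genes with |logFC| > lfc_cutoff and adjusted p-value < fdr_cutoff.

    results: gene -> (logFC, raw p-value).
    """
    genes = list(results)
    fdr = benjamini_hochberg([results[g][1] for g in genes])
    return [g for g, q in zip(genes, fdr)
            if abs(results[g][0]) > lfc_cutoff and q < fdr_cutoff]


# Hypothetical differential-expression results (illustration only).
results = {
    "GENE_A": (1.2, 0.001),   # upregulated, significant
    "GENE_B": (0.3, 0.002),   # significant but fold change too small
    "GENE_C": (-0.9, 0.03),   # downregulated, significant
    "GENE_D": (2.0, 0.5),     # large fold change, not significant
}
degs = select_degs(results)
```

Both criteria must hold at once, so a gene with a large fold change but a weak p-value is excluded, as is a gene with a strong p-value but a small fold change.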

Fig. 1

Flowchart for generating and validating the three prognosis-related CLEC family genes signature.

Fig. 2

Identification of differentially expressed CLEC genes in the TCGA and GTEx cohorts. (A) Heatmap depicting the expression levels of CLEC genes in tumor (T) and normal (N) samples. (B) Volcano plot of differentially expressed CLEC genes in the TCGA and GTEx cohorts.

Fig. 3

Selection of CLEC family genes with prognostic value. (A) The hazard ratio forest plot shows that nine CLEC family genes were significantly related to the OS of GC patients. (B) Tuning parameter selection for the LASSO regression model. (C) LASSO coefficient profiles of the prognostic CLEC family genes.

Fig. 4

CLEC family gene signature predicts overall survival in patients with GC. (A–B) Heatmap and box plot showing the differential expression of the three prognosis-related CLEC family genes between the high- and low-risk subgroups. (C–D) The distribution of risk scores for each patient. With the median risk score as the cutoff, GC patients were divided into high- and low-risk subgroups. (E) Relationship between survival time (years) and survival status for each patient. (F) Kaplan–Meier curve of patients in the high- and low-risk subgroups to validate the predictive value of the CLEC gene signature. The difference between the high- and low-risk subgroups was assessed by the log-rank test, with a P-value < 0.01. OS, Overall Survival.

Testing of the risk score model

After calculating the risk scores for all patients in TCGA, we divided the training set and testing set samples into high-risk and low-risk groups based on the median risk score, as shown in Fig. 5A and B. We observed a higher proportion of deaths among gastric cancer patients in the high-risk group compared to the low-risk group in both the training and testing sets (Fig. 5C–D). Furthermore, the three CLEC family genes associated with prognosis in the risk model exhibited consistent expression patterns in both the training and testing sets (Fig. 5E–F). Kaplan–Meier survival curves demonstrated that patients in the low-risk group had better clinical prognosis than those in the high-risk group, with both P-values below 0.05 (training set: P = 0.021, Fig. 5G; testing set: P = 0.044, Fig. 5H).
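
The Kaplan–Meier curves reported above rest on the product-limit estimator, which can be sketched as follows. At each distinct death time the running survival probability is multiplied by (1 − deaths / number at risk), and censored patients leave the risk set without lowering the curve. The follow-up times and event indicators are hypothetical.

```python
def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct death time.

    times: follow-up time per patient; events: 1 = death, 0 = censored.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Gather every patient whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve


# Hypothetical follow-up data for one risk group (illustration only).
times = [1, 2, 2, 3, 4]
events = [1, 1, 0, 1, 0]
curve = kaplan_meier(times, events)
```

Computing one such curve per risk group and comparing them with a log-rank test corresponds to the comparison shown in Fig. 5G and H.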

Fig. 5

Development and validation of the risk model for patients with GC. (A, B) Distribution of the GC patients with different risk scores in the training set and testing set. According to the median risk score, the GC patients were divided into high-risk (red) and low-risk (blue) groups. (C, D) The distribution of survival status of GC patients. Blue dots represent patients who are alive, and red dots represent patients who have died. (E, F) Heat map depicting the expression patterns of the three prognosis-related CLEC family genes in the high- and low-risk groups. (G, H) Overall survival (OS) of high-risk and low-risk patients in the training set and testing set.

Correlation between the model and the clinical parameters

We then analyzed the correlation between risk score and other clinical parameters. Kaplan–Meier analysis demonstrated the significant risk stratification ability of the model based on the three prognostic CLEC family genes in gastric cancer. High-risk patients had a poorer prognosis than low-risk patients in both age subgroups (Fig. 6A, B) and among female patients (Fig. 6C), patients with G1–G2 tumors (Fig. 6E), stage III–IV disease (Fig. 6H), T3–T4 tumors (Fig. 6J), N0–N1 disease (Fig. 6K), and M0 disease (Fig. 6M; all P < 0.05). However, no significant difference in prognosis between high-risk and low-risk patients was observed among male patients (Fig. 6D) or among patients with G3 tumors (Fig. 6F), stage I–II disease (Fig. 6G), T1–T2 tumors (Fig. 6I), N2–N3 disease (Fig. 6L), or M1 disease (Fig. 6N).

Fig. 6

Kaplan–Meier curves showing the differences in prognosis between the high- and low-risk groups in different clinical subgroups, including age (A–B), female (C), male (D), G1–2 (E), G3 (F), stage I–II (G), stage III–IV (H), T1–2 (I), T3–4 (J), N0–1 (K), N2–3 (L), M0 (M), and M1 (N).

Independent prognostic analysis and construction of a nomogram

Both univariate and multivariate Cox analyses showed that age, stage, and risk score could independently predict the prognosis of gastric cancer (Fig. 7A–B). We further compared these variables and found that the risk score was more accurate in predicting 5-year OS than pathological staging and age. The 5-year AUCs for risk score, age, gender, grade, and stage were 0.802, 0.633, 0.592, 0.505, and 0.593, respectively (Fig. 7C). Time-dependent ROC curves for the two groups of patients were also plotted (Fig. 7D). The AUC values for the risk score at 1, 3, and 5 years were 0.615, 0.657, and 0.802, respectively. We also constructed a nomogram to estimate the 1-year, 3-year, and 5-year survival probabilities, incorporating age, grade, T stage, M stage, N stage, and risk group based on CLEC family genes (Fig. 7E). The C-index of the nomogram was 0.659, and the calibration curves for survival at 1, 3, and 5 years were relatively consistent with the reference line (Fig. 7F). These results suggest that the predicted survival rates from the nomogram are accurate and reliable.
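
The C-index reported above measures how often a model ranks patients correctly. A minimal sketch of one common definition, Harrell's concordance index, is given below; whether the study used exactly this variant is an assumption. A pair of patients is usable when the one with the shorter follow-up died, and it is concordant when that patient also has the higher risk score. The data are hypothetical.

```python
def concordance_index(times, events, scores):
    """Harrell's C-index for right-censored survival data.

    times: follow-up times; events: 1 = death, 0 = censored;
    scores: predicted risk (higher means worse expected survival).
    """
    concordant = 0.0
    usable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # Usable pair: patient i died strictly before patient j's time.
            if events[i] == 1 and times[i] < times[j]:
                usable += 1
                if scores[i] > scores[j]:
                    concordant += 1
                elif scores[i] == scores[j]:
                    concordant += 0.5  # tied scores count as half
    if usable == 0:
        raise ValueError("no usable pairs")
    return concordant / usable


# Hypothetical follow-up data and risk scores (illustration only).
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
scores = [0.9, 0.4, 0.5, 0.1]
c_index = concordance_index(times, events, scores)
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which gives context for the reported value of 0.659.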

Fig. 7

Assessing risk factors and constructing a prognostic nomogram. Univariate analysis (A) and multivariate analysis (B) were performed to screen risk factors. (C) The ROC curves of clinicopathological characteristics and risk score for 5-year OS. (D) The ROC curves for 1-, 3-, and 5-year OS. (E) A nomogram incorporating the three prognosis-related CLEC family genes and clinicopathological parameters for the prediction of OS in the TCGA dataset. (F) Calibration curves showing the concordance between predicted and observed 1-, 3-, and 5-year survival rates of GC patients based on the nomogram after bias correction.

External validation of the prognostic gene signature

To confirm the predictive value of the prognostic model in different populations, we conducted external validation using the GEO cohort. Supplementary Table 1 displays gastric cancer patients’ demographic and clinicopathological characteristics in the GEO validation cohort. As in the TCGA cohort, we performed univariate and multivariate Cox regression analyses to assess the prognostic significance of the model in combination with various clinical and pathological parameters (Fig. 8A–B). The risk score showed slightly lower accuracy in predicting 5-year OS than tumor T and N staging: the 5-year AUCs for risk score, age, gender, T, and N were 0.588, 0.558, 0.538, 0.627, and 0.691, respectively (Fig. 8C). Nevertheless, the Cox analyses indicate that the prognostic risk score is an independent prognostic factor for gastric cancer in the GEO validation cohort. The risk score model also demonstrated predictive capability for 1-year, 3-year, and 5-year OS rates, with AUC values of 0.517, 0.560, and 0.588, respectively (Fig. 8D). Additionally, based on the risk coefficients of the three CLEC prognostic genes, the GEO cohort of 483 gastric cancer patients was divided into low-risk and high-risk groups (Fig. 8E–F), with significantly lower OS in the high-risk group than in the low-risk group (P = 0.004; Fig. 8G). Finally, we constructed a nomogram for predicting survival probability based on the GEO validation cohort (Fig. 9A). The calibration curves comparing the actual survival rates at 1, 3, and 5 years with the survival rates predicted by the nomogram were relatively consistent with the reference line (Fig. 9B).

Fig. 8

External validation of the prognostic gene signature. Univariate analysis (A) and multivariate analysis (B) were performed to screen risk factors in the GEO dataset. (C) The ROC curves of clinicopathological characteristics and risk score for 5-year OS. (D) The ROC curves for 1-, 3-, and 5-year OS. (E) Distribution of the GC patients with different risk scores in the high- and low-risk groups. (F) The distribution of survival status of GC patients. (G) Overall survival (OS) of high-risk and low-risk patients.

Fig. 9

Established nomogram model. (A) A nomogram incorporating the three prognosis-related CLEC family genes and clinicopathological parameters for the prediction of OS in the GEO dataset. (B) Calibration curves showing the concordance between predicted and observed 1-, 3-, and 5-year survival rates of GC patients based on the nomogram after bias correction.

The tumor immune microenvironment in gastric cancer

To assess whether the prognostic model based on CLEC family genes contributes to distinguishing differences in gastric cancer’s tumor immune microenvironment, we used the ESTIMATE tool to compare the gene expression characteristics of stromal cells and immune cells in the high-risk and low-risk groups. We found that tumor stromal score, immune score, and ESTIMATE score in the high-risk group were significantly higher than in the low-risk group (P < 0.001, Fig. 10D).

Fig. 10

Tumor immune correlation analysis. (A) Differences in immune cell infiltration between the high- and low-risk groups of patients with gastric cancer. (B) Histogram showing the relative infiltration of immune cell populations in tumor samples from The Cancer Genome Atlas dataset. (C) Comparison of immune-related functions between the different risk groups. (D) Comparison of tumor microenvironment composition between risk groups in the TCGA-GC cohort. (E) The difference in TIDE signature between the two risk groups.

Immunological analysis and evaluation

We quantified immune cell infiltration scores and immune-related functions using R packages such as ‘CIBERSORT’ and ‘GSVA’. High-risk patients showed significantly higher levels of M2 macrophages, resting dendritic cells, resting mast cells, and eosinophils than low-risk patients (P < 0.05; Fig. 10A). Plasma cells, memory B cells, regulatory T cells, and activated mast cells were significantly lower in high-risk patients than in low-risk patients (P < 0.05; Fig. 10A). Additionally, Fig. 10B displays the proportion of each immune cell type in each sample, providing a more intuitive representation of the distribution of immune cells across samples. High-risk patients exhibited higher scores for immune functions, including APC co-inhibition, APC co-stimulation, B cells, CCR, CD8+ T cells, Check-point, Cytolytic activity, DCs, iDCs, Macrophages, Mast cells, Neutrophils, NK cells, Parainflammation, pDCs, T cell co-inhibition, T cell co-stimulation, T helper cells, TIL, Treg, and Type II IFN Response (P < 0.001, Fig. 10C). Furthermore, we used TIDE to assess the potential response to immunotherapy in the high-risk and low-risk groups. The high-risk group had higher TIDE scores (P < 0.001, Fig. 10E), indicating a greater likelihood of immune escape and suggesting that the effectiveness of immunotherapy may be limited, which could contribute to cancer development.

Tumor mutation analysis

We obtained the mutation profiles of gastric cancer patients from the TCGA datasets and visualized the top 15 genes with the highest mutation frequencies. Waterfall plots were generated to compare mutation frequencies between the high-risk and low-risk groups. As shown in Fig. 11A and B, the mutation frequency of 14 genes was lower in the high-risk group than in the low-risk group, with the exception of ARID1A, which exhibited a higher mutation frequency in the high-risk group. The five most frequently mutated genes in the high-risk group were TTN, TP53, MUC16, LRP1B, and ARID1A. In the low-risk group, the five most frequently mutated genes—TTN, TP53, MUC16, LRP1B, and SYNE1—had higher mutation frequencies compared to the high-risk group.

Fig. 11

Tumor mutation correlation analysis for high- and low-risk subgroups. (A–B) The overall mutation burden of patients in the high- and low-risk groups. (C) Violin plot of the TMB scores in the high-risk and low-risk groups. (D) Kaplan–Meier survival curve in the high-mutation and low-mutation groups. (E) Kaplan–Meier survival curve in the high-mutation + high-risk, high-mutation + low-risk, low-mutation + high-risk, and low-mutation + low-risk groups.

We further analyzed the tumor mutation burden (TMB) levels in both groups. The TMB was significantly lower in the high-risk group than in the low-risk group (p < 0.0042), indicating a correlation between lower mutation burden and increased risk (Fig. 11C). To explore the predictive value of TMB and risk scores on survival outcomes, patients were stratified into high-TMB and low-TMB groups based on the median TMB. Survival curves for each subgroup were generated, revealing that gastric cancer patients in the high-TMB group had better survival outcomes compared to those in the low-TMB group (Fig. 11D). Additionally, survival analysis of four subgroups—high-risk + high-TMB, high-risk + low-TMB, low-risk + high-TMB, and low-risk + low-TMB—showed significant differences in survival outcomes (Fig. 11E), with high-risk patients consistently exhibiting poorer prognosis across TMB levels.

Analysis of drug sensitivity

The R package “oncoPredict” predicts drug sensitivity from gene expression levels23 by estimating an IC50 value for each drug in each sample. A lower IC50 value indicates that the drug is more effective in inhibiting the growth of cancer cells. Our analysis identified 53 drugs with higher sensitivity in the high-risk group. Among these, several drugs with clinical trial evidence in gastric cancer were identified, including tyrosine kinase inhibitors (Dasatinib, Cediranib, Foretinib, AZD4547), PI3K/AKT pathway inhibitors (Alpelisib, AZD5363, AZD8186), a PARP inhibitor (Olaparib), a hormone receptor modulator (Fulvestrant), a bone metabolism agent (Zoledronate), and the alkylating agent Carmustine (Fig. 12B). Similarly, among the 12 drugs showing higher sensitivity in the low-risk group, we identified EGFR inhibitors with clinical trial evidence, including Sapitinib, Gefitinib, Erlotinib, Afatinib, and Lapatinib (Fig. 12A). The remaining drugs without clinical trial reports for both high-risk and low-risk groups are detailed in Supplementary Materials for Drug Analysis.

Fig. 12

Drug sensitivity analysis of the high- and low-risk subgroups. (A) Drugs to which patients in the low-risk group were more sensitive. (B) Drugs to which patients in the high-risk group were more sensitive.

Gene set and functional enrichment analysis

We conducted GO and KEGG analyses on the differentially expressed mRNAs of the three prognosis-related CLEC family genes. The GO enrichment analysis revealed that the target genes were predominantly enriched in terms related to extracellular matrix organization, extracellular structure organization, external encapsulating structure organization, collagen-containing extracellular matrix, and extracellular matrix structural constituents (Fig. 13A). These enriched terms indicate the involvement of the gene set in biological processes associated with the extracellular matrix and extracellular structural organization. In the KEGG enrichment analysis, we observed significant enrichment in pathways such as the PI3K-Akt signaling pathway, Focal adhesion, Neuroactive ligand-receptor interaction, and Calcium signaling pathway (Fig. 13B). These pathways play crucial roles in various aspects of cell signaling, cell adhesion, metabolism, and cell survival. Furthermore, GSEA showed that the high-risk group exhibited enrichment in pathways including KEGG ECM RECEPTOR INTERACTION and KEGG FOCAL ADHESION (Fig. 13C). These pathways are closely associated with extracellular matrix interactions and adhesion processes. Conversely, the low-risk group showed significant enrichment in pathways including KEGG OXIDATIVE PHOSPHORYLATION and KEGG PROTEASOME (Fig. 13D). These pathways are closely related to cellular energy production and protein turnover.

Fig. 13

Gene set enrichment analysis. (A) Bubble chart of the top 10 enriched GO terms in each of the BP, CC, and MF categories. (B) Bubble chart of the top 30 enriched KEGG pathways20,21,22. (C, D) Gene set and functional enrichment analysis of differentially expressed genes between the high-risk group and low-risk group.

Immunohistochemical validation of protein expression

Immunohistochemical methods were used to measure the protein expression levels of three genes from the CLEC family in 20 pairs of gastric cancer and adjacent normal tissue samples (Fig. 14). The results indicated that VCAN (Fig. 14A) and CD93 (Fig. 14B) were highly expressed in gastric cancer tissues, whereas CLEC3A (Fig. 14C) was highly expressed in adjacent normal tissues. The expression levels of these proteins were quantified by IOD values (Fig. 14D), with higher IOD values indicating greater protein expression. The immunohistochemistry and clinical pathology correlation analysis can be found in the Supplementary Materials for Immunohistochemistry.

Fig. 14

Expression of three prognosis-related genes of the CLEC family at the protein level in 20 pairs of gastric cancer and adjacent normal tissue samples. (A) Expression of VCAN in gastric cancer and adjacent normal tissues. (B) Expression of CD93 in gastric cancer and adjacent normal tissues. (C) Expression of CLEC3A in gastric cancer and adjacent normal tissues. (D) Paired differential analysis of the proteins of three prognosis-related genes from the CLEC family. T represents tumor tissues, and N represents adjacent normal tissues. Magnification levels: 100X, 200X, 400X. ****P < 0.0001. Integrated Optical Density (IOD) reflects the protein expression levels in immunohistochemistry images.
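The paired differential analysis of IOD values compares each tumor sample with its matched adjacent normal sample. The statistical test used is not named in this section, so the sketch below substitutes an exact sign-flip permutation test on the paired differences as one possible approach. The function name and the IOD values are hypothetical.

```python
from itertools import product


def paired_sign_flip_p(tumor, normal):
    """Two-sided exact sign-flip permutation p-value for paired samples.

    Under the null hypothesis, each tumor-minus-normal difference is equally
    likely to be positive or negative, so every sign pattern is enumerated.
    """
    diffs = [t - n for t, n in zip(tumor, normal)]
    observed = abs(sum(diffs))
    n_extreme = 0
    n_total = 0
    for signs in product((1, -1), repeat=len(diffs)):
        stat = abs(sum(s * d for s, d in zip(signs, diffs)))
        if stat >= observed - 1e-12:  # small tolerance for float rounding
            n_extreme += 1
        n_total += 1
    return n_extreme / n_total


# Hypothetical IOD values for 8 matched pairs (not study data).
tumor_iod = [52.0, 61.0, 48.0, 70.0, 66.0, 58.0, 63.0, 55.0]
normal_iod = [30.0, 35.0, 29.0, 41.0, 33.0, 37.0, 40.0, 31.0]
print(paired_sign_flip_p(tumor_iod, normal_iod))
```

Full enumeration visits 2^n sign patterns, which remains feasible for the 20 pairs described here (about one million patterns) but would call for random sampling of patterns in much larger cohorts.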

Discussion

Gastric cancer, one of China’s most common malignant tumors, exhibits high incidence and mortality rates. Current treatment strategies for gastric cancer include chemotherapy, radiation therapy, surgery, immunotherapy, targeted therapy, and their combinations24. However, the prognosis for gastric cancer remains poor, with a low 5-year survival rate. Achieving early diagnosis and effective treatment remains challenging. Therefore, identifying new biomarkers is crucial for assessing prognosis, selecting patients suitable for immunotherapy and drug treatment, and guiding personalized therapy. The CLEC family proteins represent promising biomarkers for gastric cancer. Firstly, they are a diverse group of proteins with a wide range of molecular functions, particularly playing essential roles in processes such as cell adhesion8, inflammatory responses25, and antigen presentation26 in the immune system. Secondly, CLEC family proteins are involved in tumor initiation, cancer progression, and metastasis formation. Thus, in this study, we conducted bioinformatics analysis to explore whether differential expression of CLEC family genes can predict the outcomes of gastric cancer and aid in risk stratification.

By including 412 gastric cancer cases from TCGA and 483 cases from the GEO database, we strengthened the reliability of our data analysis. Utilizing mRNA expression matrices and clinical data from the TCGA-STAD cohort, we identified three prognostically relevant CLEC genes that may serve as clinically valuable biomarkers. Based on the prognostic model of these three CLEC family genes, gastric cancer patients were stratified into two subgroups with distinct survival outcomes. Furthermore, we established a risk-scoring model to predict the prognosis of gastric cancer patients based on these prognostic genes. Importantly, we not only validated the overall survival (OS) differences between high-risk and low-risk patients within our cohort but also confirmed the model’s ability to distinguish between them and estimate OS using an external dataset, GSE84437. Additionally, we integrated the risk score with other clinical variables to create a quantitative prognostic assessment chart, known as a nomogram, to better assess the prognosis of gastric cancer patients.
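Differences in overall survival between two risk subgroups are usually visualized with Kaplan-Meier curves. As a minimal sketch of that estimator, assuming follow-up times and event indicators (1 = death, 0 = censored) are available per patient, the following computes the survival curve for one subgroup. The function name and the example data are hypothetical.

```python
def kaplan_meier(times, events):
    """Return [(time, survival_probability)] at each distinct event time.

    times:  follow-up time for each patient
    events: 1 if death was observed at that time, 0 if the patient was censored
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all patients who share the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            survival *= 1 - deaths / n_at_risk
            curve.append((t, survival))
        n_at_risk -= removed  # deaths and censored patients leave the risk set
    return curve


# Hypothetical follow-up data in months for one risk subgroup.
example_curve = kaplan_meier(times=[1, 2, 3, 4], events=[1, 0, 1, 1])
print(example_curve)
```

Running the same function on the high-risk and low-risk subgroups separately gives the two curves that a log-rank test would then compare.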

We based our signature on differential genes from the CLEC family, including VCAN, CD93, and CLEC3A. VCAN encodes a large aggregating chondroitin sulfate proteoglycan and is also a CLEC family member. VCAN is a crucial extracellular matrix component and is closely associated with tumorigenesis27. Previous studies have indicated that VCAN enhances tumor cell survival, growth, migration, invasion, angiogenesis, and metastasis28,29. It has been reported that increased VCAN expression is associated with leukemia, brain cancer, colorectal cancer, liver cancer, prostate cancer, breast cancer, ovarian cancer, oral squamous cell carcinoma, and lung cancer and is correlated with adverse outcomes29,30,31,32,33,34,35. VCAN is involved in key signaling pathways, such as PI3K in bladder36 and liver cancer37, NF-κB in ovarian cancer38, and TGF-β in leukemia39. It also regulates EGFR signaling in renal cell carcinoma40 and plays a role in EMT in lung adenocarcinoma41. These findings highlight VCAN’s complex role in tumorigenesis and provide insights into its potential mechanisms in gastric cancer. Our results also indicate that VCAN is upregulated and oncogenic in gastric cancer. Human CD93, also known as complement component 1q subcomponent receptor 1 (C1qR1 or C1qRp), is considered to be a cell surface receptor for the complement component C1q42. CD93 contains C-type lectin-like domains and is involved in various cellular processes, including angiogenesis, inflammation, and cell adhesion43. CD93 has been shown to activate the integrin β1/PI3K/AKT/SP2 signaling pathway in breast cancer to promote growth and angiogenesis44, and it interacts with VEGFR2 to regulate tumor vasculature integrity, with loss of CD93 leading to excessive VEGFR2 phosphorylation and promoting metastasis45. These findings provide important insights into CD93’s role in tumor progression, especially in gastric cancer.
In addition to renal clear cell carcinoma, high CD93 expression has been associated with tumor angiogenesis, immune cell infiltration, poor prognosis, and advanced TNM staging in many cancer types, including glioblastoma, nasopharyngeal carcinoma, ovarian cancer, renal interstitial cell carcinoma, and squamous cell lung carcinoma46,47,48. Furthermore, it has been reported that CD93 overexpression is associated with poor prognosis in gastric cancer, although the molecular mechanisms behind this association are still relatively unexplored49. CLEC3A belongs to the C-type lectin superfamily and was initially known to be expressed in cartilage8,50. CLEC3A influences tumor cell proliferation and migration through cell adhesion51,52. Additionally, CLEC3A participates in cell invasion and metastatic spread by enhancing tissue plasminogen activator activation53,54,55. In osteosarcoma, CLEC3A promotes proliferation and enhances chemotherapy sensitivity through the AKT1/mTOR/HIF1α signaling pathway, with increased expression linked to advanced TNM stages and lymph node metastasis56. In breast cancer, CLEC3A also contributes to tumor growth and metastasis via the PI3K/AKT pathway57. These findings suggest that CLEC3A may be a promising therapeutic target in gastric cancer. However, the role of CLEC3A in the occurrence and development of gastric cancer has not been explored yet. In summary, our research findings suggest that the three prognostically relevant CLEC family genes may play significant roles in the development of gastric cancer.

Through univariate Cox analysis, Lasso regression, and multivariate Cox analysis, we identified three prognosis-associated genes of the CLEC gene family to construct a prognostic model. Survival and ROC curve analyses revealed that these three genes possess solid diagnostic capabilities and can be utilized to identify gastric cancer patients with poor prognoses. The risk score model exhibited good predictive ability with AUC values of 0.615, 0.657, and 0.802 at 1, 3, and 5 years, respectively. However, the specific molecular mechanisms by which these three CLEC family genes relate to prognosis in gastric cancer remain unclear and await further exploration. Subsequently, we evaluated the relationship between the risk score model and clinical variables, finding that the risk score model exhibited significant risk stratification capabilities among the clinical parameters of gastric cancer. Furthermore, we developed a nomogram to predict gastric cancer patients’ 1-year, 3-year, and 5-year survival rates. Our results demonstrated that when predicting overall survival in TCGA and GEO datasets, the risk score exhibited good accuracy among various clinical parameters.
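A Cox-based risk score is typically a weighted sum of gene expression values, with the multivariate Cox coefficients as weights. The sketch below illustrates that calculation and a median split into high-risk and low-risk groups. The coefficients, the expression values, and the choice of the median as cutoff are assumptions made for illustration and are not taken from the study.

```python
from statistics import median

# Hypothetical Cox coefficients; the fitted values are not given in this section.
COEFFICIENTS = {"VCAN": 0.10, "CD93": 0.15, "CLEC3A": 0.20}


def risk_score(expression):
    """Weighted sum of expression values for the three signature genes."""
    return sum(COEFFICIENTS[gene] * expression[gene] for gene in COEFFICIENTS)


def stratify(patients):
    """Label each patient 'high' or 'low' relative to the median risk score."""
    scores = {pid: risk_score(expr) for pid, expr in patients.items()}
    cutoff = median(scores.values())
    return {pid: ("high" if s > cutoff else "low") for pid, s in scores.items()}


# Hypothetical expression profiles for four patients.
patients = {
    "P1": {"VCAN": 1.0, "CD93": 1.0, "CLEC3A": 1.0},
    "P2": {"VCAN": 2.0, "CD93": 2.0, "CLEC3A": 2.0},
    "P3": {"VCAN": 3.0, "CD93": 3.0, "CLEC3A": 3.0},
    "P4": {"VCAN": 4.0, "CD93": 4.0, "CLEC3A": 4.0},
}
groups = stratify(patients)
print(groups)
```

When the model is applied to an external cohort, the same coefficients are reused so that the score remains comparable across datasets.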

Next, we utilized the characteristics of CLEC genes to predict differences in the tumor microenvironment within distinct risk groups. In the high-risk group, tumor stromal scores, immune scores, and ESTIMATE scores were significantly higher than those in the low-risk group, indicating that the extracellular matrix components in the tumor microenvironment are more abundant in high-risk patients. This may be associated with tumor growth, spread, and invasiveness58. Furthermore, in high-risk patients, there was a higher infiltration of immune cells in the tumor microenvironment, suggesting that the immune system is more active in these individuals. This could be a favorable aspect of the immune system’s anti-tumor response. In summary, differences in the tumor microenvironment may be related to tumor growth, spread, and patient prognosis. These variances can provide researchers with valuable insights into the biological characteristics of gastric cancer and patient prognosis, aiding in developing more precise treatment strategies.

Subsequently, we further analyzed the differences in immune cells, immune functions, and immune evasion within different risk groups as defined by our prognostic model. In high-risk patients, M2 macrophages, resting dendritic cells, resting mast cells, and eosinophils significantly increased, while memory B cells, plasma cells, regulatory T cells, and activated mast cells significantly decreased. This suggests that in gastric cancer patients, the increased presence of immunosuppressive cells and inflammation-promoting cells, along with the decreased presence of immune-activating cells, is associated with poor prognosis. Additionally, studies have shown that inflammatory cells can secrete various pro-inflammatory factors, growth factors, and metalloproteinases, leading to extracellular matrix remodeling and degradation, thus promoting tumor cell proliferation, migration, and invasion58. These results indicate that changes in the composition of immune cell populations are correlated with the severity of gastric cancer, playing a crucial role in assessing disease progression and prognosis. Moreover, differences in various immune cells and immune functions, such as APC co-inhibition, APC co-stimulation, B cells, CCR, CD8+ T cells, check-point markers, and more, indicate the presence of widespread immune activation in high-risk gastric cancer patients. The overall immune function is more active in high-risk patients. However, the tumors in high-risk patients are more invasive, and the disease is more severe, suggesting that this immune activation does not effectively eliminate the tumor but contributes to disease progression. This finding holds significant implications for our understanding of the gastric cancer immune microenvironment and objective prognosis assessment. Early recognition of this “pseudo” immune activation state can provide a basis for optimizing treatment timing and strategies.
Finally, we analyzed the potential for immune therapy and evasion in different risk groups using TIDE scores. In high-risk gastric cancer patients, the TIDE scores were significantly higher compared to the low-risk group. This indicates that tumor cells have acquired stronger immune evasion and suppression capabilities in high-risk gastric cancer patients. In summary, although high-risk gastric cancer patients exhibit a significant presence of immune cells and widespread immune activation, they are unable to suppress the tumor effectively. This is because the tumors have also developed robust immune evasion capabilities. Therefore, high-risk patients may manifest a complex “pseudo” immune-enhanced state, leading to poor responses to immune therapy. Recognizing this “pseudo” immune-enhanced state in gastric cancer patients will help provide a more accurate understanding of the disease’s immune characteristics, optimizing treatment strategies and developing new therapies.

In this study, we revealed significant differences in mutation frequencies between high-risk and low-risk subgroups, as well as their association with tumor mutation burden (TMB). In the high-risk group, the mutation frequencies of genes such as TTN, TP53, and MUC16 were lower, while the low-risk group exhibited higher frequencies. This observation may reflect that the normal functions of these genes are, on the whole, more effective in promoting tumor progression within the high-risk cohort. Notably, TP53, recognized as a crucial tumor suppressor gene, plays a vital role in the body’s defense against cancer59, and its lower mutation frequency may indicate that its function is preserved in the high-risk group, thereby exerting some degree of inhibitory effect on tumor progression.

MUC16 is important for tumor proliferation, metastasis, and the regulation of innate immune responses by inhibiting natural killer cells60,61. Literature has reported a strong association between TTN mutations and the development of gastric cancer, suggesting that TTN mutations may serve as clinical biomarkers for this malignancy. Additionally, the relationship between TTN mutations and immune response has garnered attention, with studies developing models based on immune characteristics associated with TTN mutations62.

Furthermore, the low-risk group exhibited significantly higher TMB compared to the high-risk group (p < 0.0042), aligning with the understanding that elevated TMB is typically associated with favorable prognoses63. This finding underscores the potential value of TMB as a biomarker for personalized treatment strategies and highlights the need for further investigation into the roles of these genes in relation to prognosis.
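A two-group TMB comparison like the one above is often carried out with a rank-based test, although the test used is not named in this section. The sketch below assumes a Mann-Whitney U test with a normal approximation and no tie correction. The function name and the TMB values are hypothetical.

```python
from math import erfc, sqrt


def mann_whitney_u(x, y):
    """Return (U statistic for x, two-sided p-value by normal approximation)."""
    combined = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * len(combined)
    i = 0
    while i < len(combined):
        j = i
        while j < len(combined) and combined[j][0] == combined[i][0]:
            j += 1
        average_rank = (i + 1 + j) / 2  # mean of the tied ranks i+1 .. j
        for k in range(i, j):
            ranks[k] = average_rank
        i = j
    rank_sum_x = sum(r for r, (_, group) in zip(ranks, combined) if group == 0)
    n1, n2 = len(x), len(y)
    u = rank_sum_x - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)  # no tie correction (simplification)
    z = (u - mean_u) / sd_u
    p_value = erfc(abs(z) / sqrt(2))  # two-sided tail of the standard normal
    return u, p_value


# Hypothetical TMB values in mutations per megabase (not study data).
low_risk_tmb = [5.0, 6.0, 7.0, 8.0]
high_risk_tmb = [1.0, 2.0, 3.0, 4.0]
u_stat, p = mann_whitney_u(low_risk_tmb, high_risk_tmb)
print(u_stat, p)
```

The normal approximation is only rough for groups this small; it is used here to keep the example dependency-free.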

Utilizing prognostic CLEC gene markers in gastric cancer, we sought to predict the sensitivity of patients from different risk groups to chemotherapy, offering potential guidance for treatment strategies. Our drug analysis results revealed that high-risk gastric cancer patients exhibited higher sensitivity to various agents targeting the PI3K/mTOR pathways, CDK inhibitors, and HER2 and EGFR targeting drugs. Conversely, low-risk patients demonstrated increased sensitivity to RAF/MEK pathway inhibitors. These findings suggest that tumor cells in high-risk patients rely heavily on the PI3K/mTOR pathway and cell cycle protein activity, whereas low-risk patients depend more on the MAPK pathway. This distinction underscores the intrinsic variances in driving pathways and oncogenic mechanisms between risk groups, thereby advocating for the adoption of personalized targeted treatment strategies and optimized drug selection tailored to patients of varying risk profiles. Analyzing drug sensitivities among patients of different risk groups can identify disparities in tumor-driving pathways, ultimately offering insights for novel drug development and the formulation of individualized precision treatment strategies.

We further conducted GO, KEGG, and GSEA analyses to assess biological functions. Enrichment analysis of CLEC family genes’ biological functions and pathways revealed distinctions between high and low-risk groups in extracellular matrix and signaling pathways. The high-risk group exhibited anomalies in extracellular matrix and cell adhesion pathways, which may be related to the increased disease risk in this group. However, experimental validation is required to elucidate these pathways’ specific roles in the disease’s development and progression.

In summary, this study identified differential expression of CLEC genes in gastric cancer tissues, potentially attributed to varying mechanisms involved in tumorigenesis. Through univariate Cox analysis, we investigated the prognostic significance of three CLEC family genes. We found that high expression of VCAN, CD93, and CLEC3A is associated with poor prognosis.

However, there are several limitations to the current study. Firstly, the TCGA database lacks adequate normal tissue samples, necessitating expanding the sample size for validation. Secondly, the samples in the GSE84437 dataset are derived from diffuse gastric cancer, which does not align entirely with the pathological classification of gastric adenocarcinoma in the TCGA dataset. Moreover, the clinical data in the GEO dataset lack information such as grading and staging of gastric cancer. As a result, when the nomogram was used for external validation with GEO data, there were discrepancies in survival rate predictions compared to the results predicted by the TCGA data. This might require more comprehensive information for further validation. The functional relationships between members of the CLEC gene signature and non-tumor cells, especially infiltrating immune cells, in the tumor microenvironment are still poorly understood and require further in vitro and in vivo research. The impact of CLEC family genes on gastric cancer cell proliferation, invasion, and migration also needs further validation in both in vitro and in vivo settings.

Compared with previous prognostic models for gastric cancer, our CLEC family-based model provides unique insights into the tumor immune microenvironment. While the CAF-based model focuses on the characteristics of tumor stromal cells4, our model emphasizes immune recognition molecules, offering complementary perspectives on the tumor microenvironment. Similarly, compared to the cancer-testis antigen model which concentrates on tumor-specific antigens5, our study highlights immune recognition molecules and inflammatory responses. Recent studies have demonstrated the value of immune-related signatures in various cancers, such as the immune-related model in papillary thyroid cancer6 and the APOBEC mutagenesis-related model in bladder cancer7. In line with these advances, our CLEC family-based model specifically focuses on the immune recognition aspect in gastric cancer, representing a novel and complementary approach to existing models. These different prognostic models each have their distinct features and collectively contribute to a more comprehensive understanding of cancer prognosis mechanisms. Notably, our CLEC family-based model may have unique advantages in predicting immunotherapy response and identifying potential immunotherapy targets, as CLEC family members play crucial roles in immune cell recognition and inflammatory responses. This suggests that our model might provide valuable guidance for personalized immunotherapy strategies in gastric cancer treatment.

Conclusion

Taken together, we initially constructed a gastric cancer prognostic model based on the characteristics of CLEC family genes, stratifying gastric cancer patients into two distinct subgroups with varying survival outcomes. Additionally, we developed a nomogram chart to aid clinical decision-makers in providing optimal treatment strategies. The prognostic features were associated with different immune cells and immune functions and predicted sensitivity to chemotherapy drugs. These findings may serve as novel targets for developing immunotherapies for low-risk and high-risk gastric cancer patients. The prognosis-related CLEC family genes likely play pivotal roles in gastric cancer’s initiation, progression, invasion, and metastasis. Furthermore, these discoveries have paved the way for developing new clinical treatment targets or prognostic markers.