Abstract
This study aimed to elucidate the potential causal relationship between 4,907 plasma proteins and the risk of gastric cancer using a two-sample Mendelian randomization approach. We used genome-wide association study (GWAS) data, treating the 4,907 plasma proteins as exposures and gastric cancer as the outcome. Instrumental variables for plasma proteins were strongly associated SNPs selected by processing and screening the GWAS data provided by the deCODE database. We employed a set of statistical methods centered on inverse variance weighting (IVW) to estimate the odds ratios (ORs) for the effects of these plasma proteins on gastric cancer susceptibility. According to the IVW method, 14 plasma proteins were associated with gastric cancer (p < 0.005). Specifically, CHST15 (OR = 0.7553, 95% CI = 0.6346–0.8988), L1CAM (OR = 0.7230, 95% CI = 0.5876–0.8896), FTMT (OR = 0.8246, 95% CI = 0.7241–0.9391), and PMM2 (OR = 0.5767, 95% CI = 0.3943–0.8433) were negatively associated with gastric cancer, whereas ABO (OR = 1.1868, 95% CI = 1.0638–1.3240), FAM3D (OR = 1.2109, 95% CI = 1.0850–1.3515), FAM3B (OR = 1.2988, 95% CI = 1.0953–1.5402), ADH7 (OR = 1.3568, 95% CI = 1.1044–1.6670), MAP1LC3A (OR = 1.3704, 95% CI = 1.1194–1.6778), PGLYRP1 (OR = 1.4071, 95% CI = 1.1235–1.7623), PDE5A (OR = 1.7446, 95% CI = 1.2693–2.3978), GLUL (OR = 3.1203, 95% CI = 1.5017–6.4839), NFE2L1 (OR = 3.1759, 95% CI = 1.6163–6.2402), and MAFG (OR = 3.1945, 95% CI = 1.5329–6.6575) were positively associated. Results from weighted median and MR-Egger analyses were consistent with these associations. Reverse Mendelian randomization analysis provided no evidence that gastric cancer significantly alters the levels of these 14 plasma proteins (p > 0.05).
Sensitivity analyses, including assessments of heterogeneity and horizontal pleiotropy, supported the robustness of our findings and showed no evidence of significant bias. GO and KEGG pathway enrichment analysis was performed on the genes encoding these 14 plasma proteins. Taken together, the results suggest that CHST15, L1CAM, FTMT, and PMM2 may serve as protective factors against gastric cancer, while ABO, FAM3D, FAM3B, ADH7, MAP1LC3A, PGLYRP1, PDE5A, GLUL, NFE2L1, and MAFG may contribute to gastric cancer pathogenesis. These results highlight the complex biological interactions between plasma proteins and tumorigenesis and may inform preventive and therapeutic strategies in the management of gastric malignancy.
Introduction
Gastric cancer (GC) is the second most prevalent malignancy in China1, with gastric adenocarcinoma accounting for 95% of cases. Globally, it ranks among the highest in incidence and mortality, with the World Health Organization reporting it as the fifth most commonly diagnosed cancer and the fourth leading cause of cancer-related death2,3. Major risk factors include Helicobacter pylori infection, smoking, alcohol consumption, and an unhealthy diet4. Early-stage GC often presents with no obvious symptoms, leading to diagnosis at more advanced stages. Several tumor markers for GC have been identified, such as carcinoembryonic antigen (CEA) and CA19-9; however, their sensitivity remains suboptimal5. Consequently, there is a pressing need to discover new circulating tumor markers to improve early detection and screening of GC.
With the rapid advancement of proteomics technologies, a large number of differentially expressed plasma proteins have been identified, providing a substantial data foundation for exploring new tumor markers for GC. In recent years, genome-wide association studies (GWAS) have identified numerous genetic polymorphisms associated with GC. Protein quantitative trait loci (pQTLs) are genetic variants that influence the abundance of specific proteins, and GWAS of circulating protein levels can pinpoint these pQTLs. By co-localizing with disease variants, pQTLs can guide the identification of pathogenic genes and disease pathways, thereby elucidating the association between circulating proteins and disease6,7. Mendelian randomization (MR), increasingly used in medical research, employs genetic variants as instrumental variables (IVs) to mitigate confounding and analyze exposure-outcome relationships, thereby helping to clarify causal relationships8. Compared with traditional observational studies and randomized controlled trials, MR offers several advantages: (a) by leveraging existing large-scale GWAS data, MR studies are more cost- and time-efficient; (b) MR can reveal potential causal relationships between exposure factors and diseases; (c) because alleles segregate randomly during gamete formation, MR is analogous to the random allocation of treatments in a randomized controlled trial, which minimizes the confounding and reverse-causation biases inherent in conventional observational studies9. This approach can enhance our understanding of the mechanisms by which abnormal expression of circulating plasma proteins may promote the progression of GC, and it may also provide potential targets and strategies for prevention and therapeutic intervention.
This study aims to use MR analysis, in conjunction with data from GWAS, to explore the causal relationship between plasma proteins and GC risk, thereby offering guidance for early screening and efficacy assessment of GC.
Materials and methods
Material
GWAS data on GC were sourced from the FinnGen database (https://www.finngen.fi/en/access_results), specifically from the R11 release cohort labeled “Malignant neoplasm of stomach (controls excluding all cancers)” (Ncase = 1,741; Ncontrol = 345,118)10. Data on 4,907 plasma proteins were obtained from the study by Ferkingstad et al.6 and downloaded from the deCODE database (https://www.decode.com) (Table 1).
Selection of instrumental variables (IVs)
MR analysis estimates the effect of an exposure on an outcome by using genetic variants as instrumental variables (IVs) that serve as proxies for the exposure. To ensure the validity of IVs, three assumptions must be met: (a) the relevance assumption, where IVs must be strongly associated with the exposure; (b) the independence assumption, where IVs must not be associated with confounders of the exposure-outcome relationship; and (c) the exclusion restriction assumption, where IVs affect the outcome solely through the exposure (Fig. 1)11. IVs were selected from the GWAS data in the following steps: (a) SNPs were retained if p < 5 × 10^-8, indicating a strong association with the exposure; (b) SNPs were clumped for linkage disequilibrium using a 10,000 kb window and r^2 < 0.001, so that only approximately independent SNPs were retained; (c) exposure and outcome effect alleles were harmonized to the same direction, and palindromic or ambiguous SNPs were excluded; (d) an F-statistic was calculated for each SNP as F = R^2 × (n − 2)/(1 − R^2), where R^2 = 2 × (1 − MAF) × MAF × beta^2/[2 × (1 − MAF) × MAF × beta^2 + 2 × (1 − MAF) × MAF × se^2 × n], MAF is the minor allele frequency, beta and se are the SNP effect estimate and its standard error for the exposure, and n is the sample size; SNPs with F < 10 were excluded12. For the forward MR analysis, the threshold of p < 5 × 10^-8 identified an adequate number of plasma protein IVs, so we used this genome-wide significance criterion to enhance the robustness of the findings. For the reverse MR analysis of plasma proteins and GC, we relaxed the threshold to p < 5 × 10^-6 to ensure an adequate number of GC IVs.
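The F-statistic screen in step (d) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code (the study was run in R); the function names, the dictionary layout, and the input values are hypothetical. One point worth noting is that the MAF terms cancel in the R^2 formula as written, so F reduces to (beta/se)^2 × (n − 2)/n.

```python
def snp_r2(beta, se, maf, n):
    """R^2 for one SNP, following the formula given in the text."""
    explained = 2.0 * (1.0 - maf) * maf * beta ** 2
    unexplained = 2.0 * (1.0 - maf) * maf * se ** 2 * n
    return explained / (explained + unexplained)


def snp_f_statistic(beta, se, maf, n):
    """F = R^2 * (n - 2) / (1 - R^2)."""
    r2 = snp_r2(beta, se, maf, n)
    return r2 * (n - 2) / (1.0 - r2)


def keep_strong_instruments(snps, n, min_f=10.0):
    """Drop SNPs with F < min_f; `snps` is a list of dicts (hypothetical layout)."""
    return [
        s for s in snps
        if snp_f_statistic(s["beta"], s["se"], s["maf"], n) >= min_f
    ]


# Hypothetical example values, not taken from the study data.
example_snps = [
    {"rsid": "rs_strong", "beta": 0.10, "se": 0.02, "maf": 0.30},  # F is about 25
    {"rsid": "rs_weak", "beta": 0.04, "se": 0.02, "maf": 0.30},    # F is about 4
]
kept = keep_strong_instruments(example_snps, n=30000)
print([s["rsid"] for s in kept])
```

In this toy input only the first SNP passes the F ≥ 10 screen.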
Forward MR analysis
MR analysis was conducted using the TwoSampleMR package (v0.6.6) in R (v4.4.1). The primary analysis employed the inverse-variance weighted (IVW) method to assess the causal effects of genetically predicted protein levels on GC risk. To supplement the IVW results, we also used MR-Egger regression, weighted median, simple mode, and weighted mode analyses13. To retain a sufficient number of candidate plasma proteins while limiting the probability of false positives, we set a stringent cut-off of p < 0.005 for the IVW results. The flowchart of the research design is shown in Fig. 2.
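To make the IVW step concrete, the following Python sketch computes a fixed-effect IVW estimate from per-SNP summary statistics and converts it to an odds ratio with a 95% confidence interval. It is an illustration under stated assumptions: the function names and the toy numbers are hypothetical, and the TwoSampleMR implementation used in the study may differ in detail (for example in how standard errors are scaled).

```python
import math


def ivw_estimate(beta_exp, beta_out, se_out):
    """Fixed-effect IVW estimate of the log odds ratio and its standard error."""
    weights = [bx ** 2 / sy ** 2 for bx, sy in zip(beta_exp, se_out)]
    ratios = [by / bx for bx, by in zip(beta_exp, beta_out)]  # per-SNP Wald ratios
    total = sum(weights)
    estimate = sum(w * r for w, r in zip(weights, ratios)) / total
    return estimate, math.sqrt(1.0 / total)


def summarize(estimate, se, z_crit=1.959964):
    """Convert a log odds ratio and SE to OR, 95% CI and a two-sided p-value."""
    z = estimate / se
    return {
        "or": math.exp(estimate),
        "ci_low": math.exp(estimate - z_crit * se),
        "ci_high": math.exp(estimate + z_crit * se),
        "p": math.erfc(abs(z) / math.sqrt(2.0)),  # normal approximation
    }


# Hypothetical SNP effects on the protein (exposure) and on GC (outcome).
beta_exp = [0.10, 0.20, 0.15]
beta_out = [0.05, 0.10, 0.075]
se_out = [0.01, 0.01, 0.01]

est, se = ivw_estimate(beta_exp, beta_out, se_out)
print(summarize(est, se))
```

Every Wald ratio in the toy input equals 0.5, so the pooled log odds ratio is 0.5 as well; with real data the ratios differ and the weights determine how much each SNP contributes.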
The study was conducted following these steps: First, we downloaded GWAS data on GC and plasma proteins for association and linkage disequilibrium analyses, selecting SNPs that could serve as instrumental variables. Second, we performed forward Mendelian randomization analysis with plasma proteins as the exposure and GC as the outcome, employing sensitivity analyses to account for heterogeneity and pleiotropy. Subsequently, we conducted reverse Mendelian randomization, using the GC-associated plasma proteins as the outcome and GC as the exposure, to elucidate the causal relationship between the two. Finally, we performed pathway enrichment analyses using the GO and KEGG databases, and conducted GeneMANIA analysis on the identified protein-coding genes. GWAS: Genome-Wide Association Study. SNP: Single Nucleotide Polymorphism. Ncase: Number of cases. Ncontrol: Number of controls. MR analysis: Mendelian randomization analysis. GC: Gastric cancer. GO: Gene Ontology. KEGG: Kyoto Encyclopedia of Genes and Genomes.
Sensitivity analyses
Sensitivity analyses were conducted using the TwoSampleMR package (v0.6.6) in R (v4.4.1). Heterogeneity among the SNP-specific causal estimates was assessed with Cochran’s Q test, where p > 0.05 indicated the absence of significant heterogeneity. Horizontal pleiotropy was evaluated using the MR-Egger intercept, with p > 0.05 suggesting no evidence of directional pleiotropy14. Additionally, we employed the MR Pleiotropy RESidual Sum and Outlier (MR-PRESSO) test to detect outliers indicative of horizontal pleiotropy among the multiple instrumental variables. A p-value > 0.05 from the MR-PRESSO global test indicated no significant horizontal pleiotropy15.
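The two statistics behind these checks can be sketched as follows. This is a simplified Python illustration with hypothetical function names and toy inputs, not the TwoSampleMR code: it returns Cochran's Q with its degrees of freedom and the MR-Egger intercept and slope, and it leaves out the p-values (which need chi-square and t distributions) to stay within the standard library.

```python
def cochran_q(beta_exp, beta_out, se_out):
    """Cochran's Q of the per-SNP Wald ratios around the IVW estimate."""
    weights = [bx ** 2 / sy ** 2 for bx, sy in zip(beta_exp, se_out)]
    ratios = [by / bx for bx, by in zip(beta_exp, beta_out)]
    ivw = sum(w * r for w, r in zip(weights, ratios)) / sum(weights)
    q = sum(w * (r - ivw) ** 2 for w, r in zip(weights, ratios))
    return q, len(ratios) - 1  # statistic and degrees of freedom


def egger_intercept(beta_exp, beta_out, se_out):
    """Weighted regression of outcome effects on exposure effects with an intercept."""
    xs, ys = [], []
    for bx, by in zip(beta_exp, beta_out):
        sign = 1.0 if bx >= 0 else -1.0  # orient every SNP to a positive exposure effect
        xs.append(sign * bx)
        ys.append(sign * by)
    w = [1.0 / sy ** 2 for sy in se_out]
    sw = sum(w)
    x_bar = sum(wi * x for wi, x in zip(w, xs)) / sw
    y_bar = sum(wi * y for wi, y in zip(w, ys)) / sw
    sxx = sum(wi * (x - x_bar) ** 2 for wi, x in zip(w, xs))
    sxy = sum(wi * (x - x_bar) * (y - y_bar) for wi, x, y in zip(w, xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope  # intercept, slope


# Hypothetical summary statistics with a built-in pleiotropic offset of 0.02.
beta_exp = [0.10, -0.20, 0.30]
beta_out = [0.07, -0.12, 0.17]
se_out = [0.01, 0.02, 0.01]

print(cochran_q(beta_exp, beta_out, se_out))
print(egger_intercept(beta_exp, beta_out, se_out))
```

An intercept close to zero is what the text describes as no evidence of horizontal pleiotropy; the toy data are constructed so that the intercept comes out at 0.02.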
Reverse MR analysis
To better establish the direction of the relationship between exposure and outcome, we conducted reverse MR analysis and sensitivity analysis by treating GC as the exposure and the plasma proteins identified as associated with GC in forward MR as the outcomes. This approach helps detect potential reverse causality and supports the robustness of our study16.
GO/KEGG pathway enrichment and GeneMANIA analysis
To better understand the relationship between the screened proteins and GC, we performed Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG)17,18,19 pathway enrichment analysis of the genes encoding the screened proteins, using the clusterProfiler package (v4.12.2) in R (v4.4.1). GeneMANIA (http://genemania.org/) is an online tool for analyzing protein-protein interactions (PPIs); we used it to analyze the genes encoding the screened proteins and to construct their PPI network.
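Over-representation analyses of this kind are commonly based on a hypergeometric test, which asks how likely it is to see at least the observed number of query genes in a pathway by chance. The Python sketch below illustrates that calculation only; the function name and all counts are hypothetical, and it is not the clusterProfiler implementation, which additionally handles annotation lookup and multiple-testing adjustment.

```python
from math import comb


def enrichment_p_value(n_background, n_in_term, n_query, n_overlap):
    """P(overlap >= n_overlap) under a hypergeometric null.

    n_background: annotated genes in the background set
    n_in_term:    background genes annotated to the pathway or GO term
    n_query:      genes in the query list (14 in this study)
    n_overlap:    query genes annotated to the pathway or GO term
    """
    upper = min(n_in_term, n_query)
    total = comb(n_background, n_query)
    hits = sum(
        comb(n_in_term, i) * comb(n_background - n_in_term, n_query - i)
        for i in range(n_overlap, upper + 1)
    )
    return hits / total


# Hypothetical counts: 2 of 14 query genes fall in a 40-gene pathway
# drawn from a background of 8,000 annotated genes.
print(enrichment_p_value(8000, 40, 14, 2))
```

With only 14 query genes, a pathway can reach nominal significance with two overlapping genes, which is worth keeping in mind when interpreting enrichment results from short gene lists.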
Results
Forward MR analysis
After association analysis and removal of linkage disequilibrium from the GWAS data for 4,907 plasma proteins, 1,733 protein GWAS datasets were excluded. The remaining proteins were subjected to forward MR analysis, and the results were examined at several p-value thresholds: 93 plasma proteins were associated with the incidence of GC at p < 0.05, 30 proteins at p < 0.01, and 14 proteins at p < 0.005. To minimize the probability of false positives while still identifying a sufficient number of plasma proteins, we adopted p < 0.005 as a stringent criterion, thereby selecting 14 plasma proteins for subsequent analyses (Figs. 3 and 4). These proteins include Carbohydrate Sulfotransferase 15 (CHST15), L1 Cell Adhesion Molecule (L1CAM), Ferritin Mitochondrial (FTMT), Phosphomannomutase 2 (PMM2), Alpha 1-3-N-Acetylgalactosaminyltransferase and Alpha 1-3-Galactosyltransferase (ABO), FAM3 Metabolism Regulating Signaling Molecule D (FAM3D), FAM3 Metabolism Regulating Signaling Molecule B (FAM3B), Alcohol Dehydrogenase 7 (Class IV) (ADH7), Microtubule Associated Protein 1 Light Chain 3 Alpha (MAP1LC3A), Peptidoglycan Recognition Protein 1 (PGLYRP1), Phosphodiesterase 5A (PDE5A), Glutamate-Ammonia Ligase (GLUL), NFE2 Like BZIP Transcription Factor 1 (NFE2L1), and MAF BZIP Transcription Factor G (MAFG). The IVW results show that CHST15 (OR = 0.7553, 95% CI = 0.6346–0.8988, p = 0.0016), L1CAM (OR = 0.7230, 95% CI = 0.5876–0.8896, p = 0.0022), FTMT (OR = 0.8246, 95% CI = 0.7241–0.9391, p = 0.0036), and PMM2 (OR = 0.5767, 95% CI = 0.3943–0.8433, p = 0.0045) were negatively associated with GC, suggesting they may act as protective factors.
Conversely, ABO (OR = 1.1868, 95% CI = 1.0638–1.3240, p = 0.0021), FAM3D (OR = 1.2109, 95% CI = 1.0850–1.3515, p = 0.0006), FAM3B (OR = 1.2988, 95% CI = 1.0953–1.5402, p = 0.0026), ADH7 (OR = 1.3568, 95% CI = 1.1044–1.6670, p = 0.0036), MAP1LC3A (OR = 1.3704, 95% CI = 1.1194–1.6778, p = 0.0023), PGLYRP1 (OR = 1.4071, 95% CI = 1.1235–1.7623, p = 0.0029), PDE5A (OR = 1.7446, 95% CI = 1.2693–2.3978, p = 0.0006), GLUL (OR = 3.1203, 95% CI = 1.5017–6.4839, p = 0.0023), NFE2L1 (OR = 3.1759, 95% CI = 1.6163–6.2402, p = 0.0008), and MAFG (OR = 3.1945, 95% CI = 1.5329–6.6575, p = 0.0019) were positively associated with GC, indicating they may contribute to disease pathogenesis (Figs. 5 and 6). The findings were further corroborated by the MR-Egger regression, weighted median (WM) analysis, as well as the Simple mode and Weighted mode methodologies (Figure S1).
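The reported odds ratios, confidence intervals, and p-values can be cross-checked against one another, because a 95% CI on the log scale determines the standard error and hence a z-statistic. The Python sketch below does this for CHST15 using the values quoted above; the function name is hypothetical and the calculation assumes a normal approximation on the log-odds scale, which is an assumption about how the intervals were derived rather than something stated in the text.

```python
import math

Z_95 = 1.959964  # normal quantile for a two-sided 95% interval


def p_from_or_ci(odds_ratio, ci_low, ci_high):
    """Recover log(OR), its SE, the z-statistic and a two-sided p-value."""
    log_or = math.log(odds_ratio)
    se = (math.log(ci_high) - math.log(ci_low)) / (2.0 * Z_95)
    z = log_or / se
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return log_or, se, z, p


# CHST15: OR = 0.7553, 95% CI = 0.6346-0.8988, reported p = 0.0016.
log_or, se, z, p = p_from_or_ci(0.7553, 0.6346, 0.8988)
print(f"log(OR) = {log_or:.4f}, SE = {se:.4f}, z = {z:.2f}, p = {p:.4f}")
```

Under that assumption the recovered p-value agrees with the reported 0.0016 to the precision given, and the same check can be applied to the other 13 proteins.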
Scatter plots of 4 plasma proteins (CHST15, L1CAM, FTMT, and PMM2) negatively associated with GC generated by Mendelian randomization analysis. The X-axis represents the SNP effect on plasma proteins, while the Y-axis represents the SNP effect on GC. The five sloped lines represent the five MR analysis methods. Each black dot represents 1 SNP. SNP: single nucleotide polymorphism.
Scatter plots of 10 plasma proteins (PDE5A, FAM3D, NFE2L1, MAFG, ABO, MAP1LC3A, GLUL, FAM3B, PGLYRP1, and ADH7) positively associated with GC generated by Mendelian randomization analysis. The X-axis represents the SNP effect on plasma proteins, while the Y-axis represents the SNP effect on GC. The five sloped lines represent the five MR analysis methods. Each black dot represents 1 SNP. SNP: single nucleotide polymorphism.
A forest plot summarizing the Mendelian randomization analysis results of plasma proteins associated with GC is presented. Pval is determined from the IVW MR method. The black dots represent odds ratios (OR), with horizontal lines indicating 95% confidence intervals (CI), and the vertical red dashed line represents OR = 1. OR: odds ratio, with values greater than 1 indicating a positive association between the exposure and outcome, and values less than 1 indicating a negative association. CI: confidence interval.
Sensitivity analysis
Sensitivity analysis supported the robustness of our findings; it comprised heterogeneity analysis using Cochran’s Q test and horizontal pleiotropy assessment via the MR-Egger intercept and MR-PRESSO. The Cochran’s Q test p-values for the 14 plasma proteins in relation to GC were all greater than 0.05, suggesting no significant heterogeneity. Similarly, the MR-Egger intercept p-values were all above 0.05, indicating no evidence of horizontal pleiotropy. Furthermore, the MR-PRESSO global test p-values were all greater than 0.05, which is consistent with the absence of horizontal pleiotropy (Table 2).
Reverse MR analysis
With plasma protein levels as exposures and GC as the outcome, the forward MR analysis suggested that abnormal expression of these 14 plasma proteins can influence the onset and progression of GC. To investigate whether the occurrence of GC in turn leads to abnormal expression of these 14 plasma proteins, we conducted a reverse MR analysis. For selecting instrumental variables (IVs), we relaxed the criterion to include SNPs with p < 5 × 10^-6. After association analysis and linkage disequilibrium processing of the GC GWAS data, 8 IVs were selected. The IVW analysis yielded p-values greater than 0.05 for all 14 proteins, providing no evidence that GC incidence affects the expression of these proteins (Fig. 7). This finding was corroborated by MR-Egger regression, weighted median, simple mode, and weighted mode analyses (Fig. S2). Sensitivity analysis of the reverse MR results showed Cochran’s Q test p-values greater than 0.05, indicating no significant heterogeneity. Additionally, the MR-Egger intercept and MR-PRESSO global test p-values were also above 0.05, indicating no evidence of horizontal pleiotropy (Table 3).
A forest plot summarizing the results of reverse Mendelian randomization analysis for plasma proteins associated with GC is presented. Pval is determined from the IVW MR method. Black dots represent odds ratios (OR), horizontal lines indicate 95% confidence intervals (CI), and the vertical red dashed line denotes OR = 1. OR: odds ratio, with values greater than 1 indicating a positive association between the exposure and outcome, and values less than 1 indicating a negative association. CI: confidence interval.
GO/KEGG pathway enrichment analysis and GeneMANIA analysis
Pathway enrichment analysis using the GO database revealed the following: In the Biological Process (BP), significant enrichment was observed in the glycoprotein biosynthetic process (p = 0.0016) and glycoprotein metabolic process (p = 0.0026). In the Cellular Component (CC), these genes were significantly enriched in the neuronal cell body (p = 0.0450). In the Molecular Function (MF), these genes were significantly enriched in cytokine activity (p = 0.0135) (Fig. 8A, B). KEGG database analysis indicated significant enrichment in Ferroptosis (p = 0.0008) (Fig. 8C, D). Lastly, GeneMANIA network analysis of these genes revealed that Physical Interactions accounted for 69.92%, Shared Protein Domains for 14.63%, Co-expression for 11.72%, Co-localization for 2.16%, Genetic Interactions for 0.98%, and Predicted interactions for 0.59% (Fig. 8E).
(A) A bar chart depicting the enrichment analysis of 14 selected protein genes using the GO database. The color of the bars represents the p-value, and “Count” denotes the number of genes enriched in the pathway. (B) A bubble chart illustrating the enrichment analysis of 14 selected protein genes using the GO database. The color of the bubbles indicates the p-value, and the size of the bubbles represents the number of genes enriched in the pathway. (C) A bar chart showing the enrichment analysis of 14 selected protein genes using the KEGG database. The color of the bars represents the p-value, and “Count” denotes the number of genes enriched in the pathway. (D) A bubble chart depicting the enrichment analysis of 14 selected protein genes using the KEGG database. The color of the bubbles represents the p-value, and the size of the bubbles indicates the number of genes enriched in the pathway. GeneRatio: the ratio of the number of genes enriched in the pathway to the total number of these 14 genes. BP: Biological Process. CC: Cellular Component. MF: Molecular Function. (E) A PPI network generated from GeneMANIA analysis of these proteins.
Discussion
We conducted MR analysis using GWAS data for 4,907 plasma proteins to identify those potentially associated with GC. This analysis highlighted 14 plasma proteins of interest. Sensitivity analyses were performed to assess the impact of heterogeneity and horizontal pleiotropy on our findings. Additionally, reverse MR provided no evidence that the incidence of GC leads to abnormal expression of these proteins, which supports the inferred direction of effect from these proteins to GC. Among the identified proteins, ABO, FAM3D, FAM3B, ADH7, MAP1LC3A, PGLYRP1, PDE5A, GLUL, NFE2L1, and MAFG showed a positive association with GC, indicating their potential as biomarkers for early detection of this malignancy. Enrichment analyses using the GO and KEGG databases revealed significant involvement of MAP1LC3A and FTMT in ferroptosis, a process that is critically implicated in tumor development and progression according to current research20.
The ABO protein is a circulating protein whose genetic variants determine the human ABO blood groups. Research dating back to 1950 has indicated an association between ABO blood groups and GC21. A large-scale meta-analysis conducted by Cui et al. examined the relationship between ABO blood group and cancer, revealing that blood type A (OR = 1.19, p = 3.90 × 10^–15) and blood type AB (OR = 1.10, p = 0.007) are associated with an increased risk of GC22. A/B antigens have pro-coagulant and angiogenic properties and act as receptors and ligands for key immune proteins. They can also alter cell motility and resistance to apoptosis, potentially linking blood type to tumor development and metastasis23,24. Our study likewise identified ABO protein as a potential causal factor for GC and a candidate biomarker for its early detection, implying that ABO protein plays an important role in GC tumorigenesis. FAM3D and FAM3B are metabolism-regulating signaling molecules of the FAM3 family. It has been documented that upregulation of FAM3B induces cisplatin resistance in GC cells by modulating epithelial-mesenchymal transition (EMT), suggesting an association with GC25. Additionally, Park et al. reported that FAM3D expression is highly correlated with Helicobacter pylori infection in gastritis, a significant risk factor for GC26. These results are consistent with our findings, which indicate that FAM3D and FAM3B may be causal factors for GC, potentially promoting its occurrence. ADH7, a member of the alcohol dehydrogenase family, is highly expressed in gastric epithelial tissues and converts ethanol to acetaldehyde27,28. Acetaldehyde is a known toxin and carcinogen29. Because acetaldehyde is rapidly metabolized to acetate by aldehyde dehydrogenase, further research is needed to determine whether excessive expression of ADH7 increases the risk of GC.
Autophagy, crucial for maintaining homeostasis, limiting inflammation, and preventing tissue damage and genomic instability, involves microtubule-associated protein 1 light chain 3 (MAP1LC3), a key regulator of autophagosome formation30,31. A retrospective study by Liao et al.32 found that abnormal MAP1LC3A expression is linked to GC and poor prognosis. Our MR analysis revealed that ADH7 and MAP1LC3A were positively associated with GC, indicating that they may contribute to disease pathogenesis. However, the underlying mechanisms remain to be clarified. Peptidoglycan recognition protein 1 (PGLYRP1) is an innate immune protein with critical roles in antimicrobial and antitumor defense, exhibiting direct bactericidal activity against both Gram-positive and Gram-negative bacteria33. Recent research indicates that the gene encoding PGLYRP1 is highly co-expressed with genes encoding co-suppressor molecules, suggesting its potential as a promising target for cancer immunotherapy and possibly a new target for GC immunotherapy34. Glutamate-ammonia ligase (GLUL), involved in glutamine synthesis, has recently been found to have significant positive correlations with various cancers and may serve as a novel cancer treatment target35,36. However, a recent study by Jiang et al.37 reported that GLUL expression is significantly reduced in GC tissues compared to adjacent normal tissues and that GLUL stabilizes N-cadherin by antagonizing β-catenin, thus inhibiting GC progression. Our findings indicate that high GLUL expression increases GC risk, which may seem contradictory. This discrepancy may arise because the highly invasive nature of GC might lead to substantial glutamine uptake from the environment, reducing the need for endogenous glutamine synthesis. Further research is needed to elucidate the underlying mechanisms38.
NFE2 Like BZIP Transcription Factor 1 (NFE2L1) is a key regulator of antioxidant, detoxification, and cell protection genes in response to cellular stress39. Increasing evidence highlights its critical role in tumor development, including growth, invasion, migration, and metastasis40. MAF BZIP transcription factor G (MAFG) is a basic leucine zipper (bZIP) protein belonging to the v-maf oncogene family. It is implicated in regulating hematopoietic gene expression41, although no studies have yet linked it to cancer development. However, its antisense gene, MAFG-AS1, has been associated with the onset and progression of various cancers, including GC. This long non-coding RNA (lncRNA) promotes tumorigenesis by inhibiting apoptosis and enhancing proliferation, migration, invasion, aerobic glycolysis, ferroptosis, and angiogenesis42,43. Moreover, our enrichment analyses showed that MAP1LC3A and FTMT were mainly enriched in the ferroptosis pathway; ferroptosis is a recently described form of programmed cell death that is related to tumorigenesis and cancer progression44. Previous studies have reported various forms of programmed cell death, such as apoptosis, ferroptosis, and cuproptosis, which may offer new therapeutic directions for GC45,46. Among them, ferroptosis, an iron-dependent form of cell death, is characterized by intracellular lipid peroxide accumulation and redox imbalance. Several studies have documented that ferroptosis can play an important role in the proliferation, invasion, metastasis, and tumor microenvironment of GC45,47. Thus, the GO and KEGG enrichment results for MAP1LC3A and FTMT suggest a possible ferroptosis-related therapeutic direction in GC.
In our study, we employed bidirectional two-sample MR to identify 14 plasma proteins with causal relationships to GC. Gene enrichment analyses using GO and KEGG databases, along with co-expression network analysis via the GeneMANIA online tool, provided deeper insights into the relationships between these proteins and GC. Our findings may enhance gene-based screening methods for early GC and offer valuable evidence for the development of screening and prevention strategies. Future research will focus on further validating the functional roles of the identified proteins through additional experiments to explore their impact on GC cells.
This study has several limitations. First, the GWAS data for GC lack stratification by specific subtypes, precluding subtype analyses. Second, the inherent limitations of MR studies must be acknowledged: valid inference requires that the instrumental variables (IVs) satisfy the relevance, independence, and exclusion restriction assumptions, so that horizontal pleiotropy and confounding do not bias the estimates. Despite our efforts to meet these conditions, completely eliminating these biases remains challenging. Third, potential weak instrument bias in MR analysis may lead to inaccuracies and reduced statistical power; we employed various statistical methods and sensitivity analyses to address this issue, yet some inaccuracies may persist. Lastly, the GWAS data used come primarily from European populations, and further validation is needed to confirm whether these findings apply to other ethnic groups.
Conclusion
In summary, our study identifies a potential causal relationship between 14 genetically determined plasma proteins and GC. Notably, proteins such as ABO, FAM3D, FAM3B, ADH7, MAP1LC3A, PGLYRP1, PDE5A, GLUL, NFE2L1, and MAFG demonstrate potential as serological biomarkers for early GC detection. Nonetheless, further investigation is required to elucidate the specific mechanisms by which these candidate proteins contribute to the progression of GC, thereby exploring their potential as therapeutic targets.
Data availability
The data used to perform all the analyses are publicly accessible through the following sources: GWAS data for GC: https://storage.googleapis.com/finngen-public-data-r11/summary_stats/finngen_R11_C3_STOMACH_EXALLC.gz. GWAS data for plasma proteins: https://download.decode.is/form/folder/proteomics. For other data related to the study, please contact 1394418251@qq.com.
Code availability
The MR analyses were performed with the TwoSampleMR package (v0.6.6) in R (v4.4.1). The enrichment analyses were performed with the clusterProfiler package (v4.12.2) in R (v4.4.1).
References
Cao, M. et al. Cancer burden of major cancers in China: A need for sustainable actions. Cancer Commun. (Lond.) 40(5), 205–210 (2020).
Joshi, S. S. & Badgwell, B. D. Current treatment and recent progress in gastric cancer. CA Cancer J. Clin. 71(3), 264–279 (2021).
Wang, F. et al. Fruquintinib plus paclitaxel versus placebo plus paclitaxel for gastric or gastroesophageal junction adenocarcinoma: The randomized phase 3 FRUTIGA trial. Nat. Med. (2024).
Kartsonaki, C. et al. Infectious pathogens and risk of esophageal, gastric and duodenal cancers and ulcers in China: A case-cohort study. Int. J. Cancer 154(8), 1423–1432 (2024).
de Mello, R. A. et al. Current and potential biomarkers in gastric cancer: A critical review of the literature. Future Oncol. 17(25), 3383–3396 (2021).
Ferkingstad, E. et al. Large-scale integration of the plasma proteome with genetics and disease. Nat. Genet. 53(12), 1712–1721 (2021).
Sun, B. B. et al. Genomic atlas of the human plasma proteome. Nature 558(7708), 73–79 (2018).
Skrivankova, V. W. et al. Strengthening the reporting of observational studies in epidemiology using mendelian randomization: The STROBE-MR statement. JAMA 326(16), 1614–1621 (2021).
Larsson, S. C., Butterworth, A. S. & Burgess, S. Mendelian randomization for cardiovascular diseases: Principles and applications. Eur. Heart J. 44(47), 4913–4924 (2023).
Kurki, M. I. et al. FinnGen provides genetic insights from a well-phenotyped isolated population. Nature 613(7944), 508–518 (2023).
VanderWeele, T. J. et al. Methodological challenges in mendelian randomization. Epidemiology 25(3), 427–435 (2014).
Pierce, B. L., Ahsan, H. & Vanderweele, T. J. Power and instrument strength requirements for mendelian randomization studies using multiple genetic variants. Int. J. Epidemiol. 40(3), 740–752 (2011).
Zhang, L. et al. Systematic proteome-wide mendelian randomization using the human plasma proteome to identify therapeutic targets for lung adenocarcinoma. J. Transl. Med. 22(1), 330 (2024).
Bowden, J., Davey Smith, G. & Burgess, S. Mendelian randomization with invalid instruments: Effect estimation and bias detection through Egger regression. Int. J. Epidemiol. 44(2), 512–525 (2015).
Verbanck, M. et al. Detection of widespread horizontal pleiotropy in causal relationships inferred from mendelian randomization between complex traits and diseases. Nat. Genet. 50(5), 693–698 (2018).
Hemani, G., Tilling, K. & Davey Smith, G. Orienting the causal relationship between imprecisely measured traits using GWAS summary data. PLoS Genet. 13(11), e1007081 (2017).
Kanehisa, M. & Goto, S. KEGG: Kyoto encyclopedia of genes and genomes. Nucleic Acids Res. 28(1), 27–30 (2000).
Kanehisa, M. Toward understanding the origin and evolution of cellular organisms. Protein Sci. 28(11), 1947–1951 (2019).
Kanehisa, M. et al. KEGG for taxonomy-based analysis of pathways and genomes. Nucleic Acids Res. 51(D1), D587–D592 (2023).
Zhang, Y. & Xie, J. Unveiling the role of ferroptosis-associated exosomal non-coding RNAs in cancer pathogenesis. Biomed. Pharmacother. 172, 116235 (2024).
Mao, Y. et al. Blood groups A and AB are associated with increased gastric cancer risk: Evidence from a large genetic study and systematic review. BMC Cancer 19(1), 164 (2019).
Cui, H. et al. Epidemiological and genetic evidence for the relationship between ABO blood group and human cancer. Int. J. Cancer 153(2), 320–330 (2023).
Le Pendu, J. et al. ABH and Lewis histo-blood group antigens in cancer. APMIS 109(1), 9–31 (2001).
Marionneau, S., Le Moullac-Vaidye, B. & Le Pendu, J. Expression of histo-blood group A antigen increases resistance to apoptosis and facilitates escape from immune control of rat colon carcinoma cells. Glycobiology 12(12), 851–856 (2002).
Zhu, Y. et al. Cloning, expression, and initial characterization of a novel cytokine-like gene family. Genomics 80(2), 144–150 (2002).
Park, C. H. et al. Multi-omics reveals microbiome, host gene expression, and immune landscape in gastric carcinogenesis. iScience 25(3), 103956 (2022).
Kedishvili, N. Y. et al. Expression and kinetic characterization of recombinant human stomach alcohol dehydrogenase. Active-site amino acid sequence explains substrate specificity compared with liver isozymes. J. Biol. Chem. 270(8), 3625–3630 (1995).
Yin, S. J. et al. Identification of a human stomach alcohol dehydrogenase with distinctive kinetic properties. Biochem. Int. 22(5), 829–835 (1990).
Salaspuro, M. Acetaldehyde as a common denominator and cumulative carcinogen in digestive tract cancers. Scand. J. Gastroenterol. 44(8), 912–925 (2009).
Ravikumar, B. et al. Regulation of mammalian autophagy in physiology and pathophysiology. Physiol. Rev. 90(4), 1383–1435 (2010).
Choi, A. M., Ryter, S. W. & Levine, B. Autophagy in human health and disease. N. Engl. J. Med. 368(7), 651–662 (2013).
Liao, W. et al. LC3A-positive stone-like structures predict an adverse prognosis of gastric cancer. Anat. Rec. (Hoboken) 297(4), 653–662 (2014).
Dziarski, R. & Gupta, D. Review: Mammalian peptidoglycan recognition proteins (PGRPs) in innate immunity. Innate Immun. 16(3), 168–174 (2010).
Schnell, A. et al. Targeting PGLYRP1 promotes antitumor immunity while inhibiting autoimmune neuroinflammation. Nat. Immunol. 24(11), 1908–1920 (2023).
Frieg, B. et al. Glutamine synthetase as a central element in hepatic glutamine and ammonia metabolism: novel aspects. Biol. Chem. 402(9), 1063–1072 (2021).
Kim, G. W. et al. Glutamine synthetase as a therapeutic target for cancer treatment. Int. J. Mol. Sci. 22(4) (2021).
Jiang, Q. et al. GLUL stabilizes N-cadherin by antagonizing beta-catenin to inhibit the progresses of gastric cancer. Acta Pharm. Sin. B 14(2), 698–711 (2024).
Ye, J. et al. Targeting of glutamine transporter ASCT2 and glutamine synthetase suppresses gastric cancer cell growth. J. Cancer Res. Clin. Oncol. 144(5), 821–833 (2018).
Zhang, Y. & Xiang, Y. Molecular and cellular basis for the unique functioning of Nrf1, an indispensable transcription factor for maintaining cell homoeostasis and organ integrity. Biochem. J. 473(8), 961–1000 (2016).
Zhang, H. et al. Understanding the transcription factor NFE2L1/NRF1 from the perspective of hallmarks of cancer. Antioxidants (Basel) 13(7) (2024).
Blank, V., Knoll, J. H. & Andrews, N. C. Molecular characterization and localization of the human MAFG gene. Genomics 44(1), 147–149 (1997).
Ahmadi, M. et al. Carcinogenic roles of MAFG-AS1 in human cancers. Clin. Transl. Oncol. 26(1), 52–68 (2024).
Li, P., Ma, X. & Gu, X. LncRNA MAFG-AS1 is involved in human cancer progression. Eur. J. Med. Res. 28(1), 497 (2023).
Strasser, A. & Vaux, D. L. Cell death in the origin and treatment of cancer. Mol. Cell 78(6), 1045–1054 (2020).
Ouyang, S. et al. Inhibition of STAT3-ferroptosis negative regulatory axis suppresses tumor growth and alleviates chemoresistance in gastric cancer. Redox Biol. 52, 102317 (2022).
Song, X. et al. Metal-dependent programmed cell death-related lncRNA prognostic signatures and natural drug sensitivity prediction for gastric cancer. Front. Pharmacol. 13, 1039499 (2022).
Gu, R. et al. Ferroptosis and its role in gastric cancer. Front. Cell Dev. Biol. 10, 860344 (2022).
Acknowledgements
We thank the deCODE database for providing the GWAS summary statistics for our analysis. We want to acknowledge the participants and investigators of the FinnGen study. The FinnGen study is a large-scale genomics initiative that has analyzed over 500,000 Finnish biobank samples and correlated genetic variation with health data to understand disease mechanisms and predispositions. The project is a collaboration between research organizations and biobanks within Finland and international industry partners.
Author information
Contributions
Fan WH contributed to the design of the work; the acquisition, analysis, and interpretation of data; and the drafting and revision of the article. Wu ZJ, Xu SH, Liu ZH, and Huang YM contributed to the design, drafting, and revision of this article. Wang P contributed to the design of the work, the acquisition of funding, the management of the project, and the drafting and revision of the terms. All authors agree to take personal responsibility for their contributions. All authors have read and approved the final manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Fan, W., Wu, Z., Xu, S. et al. Mendelian randomization analysis of plasma proteins reveals potential novel tumor markers for gastric cancer. Sci Rep 15, 3537 (2025). https://doi.org/10.1038/s41598-025-88118-w
DOI: https://doi.org/10.1038/s41598-025-88118-w