Introduction

Pancreatic cancer remains one of the most lethal cancers and is predicted to become the second leading cause of cancer-related deaths by 2030 (refs. 1,2). Pancreatic ductal adenocarcinoma (PDAC) accounts for 95% of pancreatic cancer cases, and patients with PDAC display a dismal 5-year survival rate of 5–10% (refs. 3,4). Unfortunately, this bleak scenario will not change in the future unless multiple actions are taken to control PDAC, including the identification of PDAC high-risk populations to improve primary and secondary prevention interventions. To this end, a better understanding of the role of genetic and non-genetic factors involved in pancreas carcinogenesis is central.

The immune system plays a critical role in detecting and eliminating cancer cells; however, it can also promote tumor growth and progression through various mechanisms5,6. Among the immune-related pathways, multiple pieces of evidence support the role of the Major Histocompatibility Complex (MHC) region in PDAC risk. While most studies have focused on the MHC class I and class II regions, the MHC class III region has barely been explored. This region is responsible for inflammatory processes and the complement system (CS) response, a multistep cascade operating as a key component of the innate immune response. Its activation triggers the formation of the cell-killing membrane attack complex (MAC), which disrupts the cell membrane and ultimately leads to the lysis of target cells. This process is orchestrated through three distinct activation pathways: classical, lectin, and alternative pathways7,8,9. Further to this role, the complement cascade interacts closely with the clotting system. There is strong evidence that several proteases of the coagulation system can activate C3 and C5, indirectly leading to the activation of the CS (ref. 10). Notably, the CS can be activated by cancer cells and their microenvironment, producing inflammatory and immunomodulatory molecules that can promote tumor growth and suppress immune surveillance5,11,12. Genetic variation within CS-related genes, such as C3, C5 (ref. 13), or CD35 (ref. 14), has been associated with the prognosis of different cancer types, including PDAC15. In fact, some CS genes have been suggested as biomarkers for PDAC prognosis and treatment response16. However, the role of the CS in cancer risk and prognosis, particularly in the context of PDAC, is still not well understood. This is particularly relevant given the fact that PDAC risk is also associated with asthma, allergies, autoimmune diseases, and other inflammatory conditions17,18.

In this hypothesis-driven study, we report on the association between variation in CS-related genes and susceptibility to, as well as prognosis of, PDAC. Considering that the impact of a single genetic variant might have a small effect on a complex disease like PDAC, we conducted a comprehensive gene-based association analysis by aggregating all variants within each gene to provide a broader perspective on the overall gene-level effects. To gain deeper insights into the identified susceptibility loci in PDAC, we have also integrated the information on the association of SNPs with other omics data, such as expression and splicing quantitative trait loci (eQTL and sQTL, respectively) in normal pancreas, using colocalization and performing functional in silico analysis. Overall, our findings suggest that several CS-related genes play a significant role in the genetic susceptibility to PDAC. Finally, we demonstrate that the expression of different CS-related genes not only influences PDAC survival but also correlates with immune cell infiltration patterns in PDAC. We also provide a signature of genes whose expression is associated with improved PDAC prognosis. By utilizing large-scale studies and state-of-the-art analytical methodologies, our study contributes to a deeper understanding of the role of CS-related genes in pancreas carcinogenesis and progression, shedding light on the potential implications for disease susceptibility and prognosis.

Results

Characteristics of the study populations

The demographic and clinical characteristics of the PanGenEU and UK Biobank study populations are displayed in Supplementary Table 3. PanGenEU individuals were from Spain (58.8%), Italy (22.6%), Sweden (7.9%), Germany (6.7%), the UK (3.7%), and Ireland (0.2%). The average age of cases and controls was comparable within each study population, although the UK Biobank controls were younger. Moreover, a slightly lower proportion of females was observed in both cases and controls in PanGenEU (not significant). In terms of risk factors, cases in both PanGenEU and UK Biobank populations had a higher prevalence of smoking and diabetes compared to controls. On the other hand, the prevalence of asthma and allergies was higher among controls than among cases in both studies (Supplementary Table 3). We observed significant differences in the proportion of stage I-II and III-IV PDAC tumors across studies: PanGenEU had the highest proportion of stage III-IV PDAC tumors and was, therefore, more representative of the full spectrum of the disease. The median survival of the TCGA patients was significantly shorter than survival in the other cohorts (PanGenEU, ICGC-AU, and ICGC-CA).

Single variant association analyses with PDAC risk

Results of the meta-analysis of the CS-related SNPs and PDAC risk are presented in Supplementary Data 1. A total of 653 SNPs located within 41 CS-related genes were found to be significantly associated with PDAC risk at a nominal P < 0.05. However, after adjusting for multiple testing (Benjamini–Hochberg), none of the SNPs remained statistically significant (minimum adjusted P = 0.09). The five SNPs with the lowest P-value map to F13A1 (rs6597196), A2M (rs34803501, rs12427063, and rs12426061), and CFH (rs10922096).
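
To illustrate how Benjamini–Hochberg adjusted P-values such as those quoted above are obtained from nominal P-values, the following is a minimal sketch. The function name and the example P-values are hypothetical and are not taken from the study data; the study itself does not state which software routine was used for this step.

```python
import numpy as np

def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original input order."""
    p = np.asarray(pvalues, dtype=float)
    m = p.size
    order = np.argsort(p)                          # indices that sort p ascending
    ranked = p[order] * m / np.arange(1, m + 1)    # p_(i) * m / i for rank i
    # Enforce monotonicity starting from the largest rank, then cap at 1.
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    ranked = np.minimum(ranked, 1.0)
    adjusted = np.empty(m)
    adjusted[order] = ranked                       # map back to input order
    return adjusted

# Made-up nominal p-values, for illustration only.
example_p = [0.001, 0.008, 0.039, 0.041, 0.6]
example_adj = benjamini_hochberg(example_p)
```

A variant is then declared significant when its adjusted P-value falls below the chosen FDR threshold, which is why a set of nominally significant SNPs can yield no significant result after correction.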

Gene-based association analysis with PDAC risk and established PDAC risk factors

Utilizing SKAT-O, we identified two genes significantly associated with PDAC risk in the meta-analysis, ficolin-1 (FCN1, adjusted P = 9.9 × 10⁻³) and tissue-type plasminogen activator (PLAT, adjusted P = 1.56 × 10⁻²). Furthermore, four genes showed a borderline significant association: vitronectin (VTN, adjusted P = 7.64 × 10⁻²), the complement regulatory protein CD46 (adjusted P = 7.24 × 10⁻²), coagulation factor II receptor-like 2 (F2RL2, adjusted P = 7.24 × 10⁻²), and alpha-2-macroglobulin (A2M, adjusted P = 7.64 × 10⁻²) (Table 1). Notably, PLAT and F2RL2 were also associated with asthma and VTN with nasal allergies (Supplementary Table 4).

Table 1 Number of SNPs (nSNPs) within each gene and results from the SKAT-O association test between complement system–related genes and PDAC risk

Functional in silico analyses

The functional in silico analyses provided valuable insights into the CS-related genes associated with PDAC risk. Gene set enrichment analysis conducted using FUMA GWAS revealed that pathways related to humoral immune responses, coagulation, and regulation of immune responses were enriched with the significant set of CS-related genes (Supplementary Fig. 1). Exploring the DisGeNET database, we found that many of the CS-related genes associated with PDAC risk had previously been associated with digestive system diseases and neoplasms, as well as with nervous diseases and mental disorders (Supplementary Fig. 2).

By using GTEx expression data, we discovered that within the VTN gene, eight out of the eleven SNPs associated with an increased risk of PDAC (nominal P < 0.05) were also eQTLs associated with a higher expression of VTN in the normal pancreas: rs2227726, rs2227721, rs2227720, rs2227718, rs111502247, rs2071378, rs2071377, and rs113635549 (max. PDAC OR = 1.17, P = 4.58 × 10⁻²; max. eQTL OR = 2.34, P = 3.66 × 10⁻¹¹) (Supplementary Fig. 3a). Additionally, a single SNP within CD46 was identified as a sQTL, meaning that it is also associated with differential splicing (CD46-rs2796278, PDAC OR = 1.08, P = 5.79 × 10⁻², sQTL OR = 0.51, P = 5.73 × 10⁻¹⁶) (Supplementary Fig. 3b). Results of colocalization analyses (Supplementary Table 5) suggest that there is a moderate likelihood that the causal variants associated with a higher risk of PDAC (rs2111023, rs12427063, and rs12426061) are also causal of differential expression of the A2M gene. The posterior probability for colocalization of PDAC-associated SNPs and eQTLs corresponding to the A2M gene was 46.3%. For VTN, the posterior probability that the SNPs are causal for both PDAC risk and expression was much lower (13.5%). For PLAT, FCN1, CD46, and F2RL2 the posterior probabilities for sharing a causal variant were lower than 0.1, as well as for all the genes in the colocalization analysis of PDAC and sQTLs (Supplementary Table 6).

Prognostic associations of CS-related genes in PDAC patients

We first assessed the association of the CS-related SNPs and genes with PDAC survival in the PanGenEU and UK Biobank populations independently. To further assess their association with survival, we conducted a meta-analysis incorporating the findings from each population. This identified 583 SNPs associated with PDAC survival at a nominal significance level. However, after adjusting for multiple comparisons, these associations did not retain statistical significance (Supplementary Data 2). Additionally, through our SKAT-O model, we observed that three CS-related genes (C8B, F2, and KLKB1) were associated with PDAC survival at a nominal significance level (P < 0.05, Supplementary Table 6).

Prognostic CS-based gene signatures for patient stratification

Initially, we tested the association between the expression of CS-related genes and PDAC OS at the individual study level (PanGenEU, TCGA, ICGC-CA, and ICGC-AU). We then performed a meta-analysis combining the results from each study. Out of 17 CS-related genes associated with PDAC OS at nominal P < 0.05, 12 remained significant after correcting for multiple testing: Ig heavy chain gamma 3 (IGHG3), Ig heavy chain mu (IGHM), Ig kappa chain (IGKC), coagulation factor II thrombin receptor (F2R), F2RL2, complement factor I (CFI), alpha-2 macroglobulin (A2M), complement component 4A (C4A), serpin family E member 1 (SERPINE1), fibrinogen alpha chain (FGA), fibrinogen gamma chain (FGG), coagulation factor III, tissue factor (F3). Results showed that an elevated expression of IGHG3, IGHM, IGKC, F2R, F2RL2, CFI, A2M, and C4A was significantly associated with better OS (Fig. 1). In contrast, higher expression of SERPINE1, FGA, FGG, and F3 was significantly associated with poorer OS (Fig. 2).

Fig. 1: Forest plots from the meta-analysis of complement system-related genes associated with improved overall survival when considering PanGenEU, TCGA, ICGC-CA, and ICGC-AU (Fig. 1A-1G) and when including data from TCGA, ICGC-CA, and ICGC-AU (Fig. 1H).

Hazard ratios (HR) and 95% confidence intervals (CI) were estimated using two-sided Cox proportional-hazards regression models under a dominance mode of inheritance. Meta-analysis p-values were adjusted for multiple comparisons using the Benjamini–Hochberg false discovery rate method.

Fig. 2: Forest plots from the meta-analysis of the complement system-related genes indicate an association with poorer overall survival when considering PanGenEU, TCGA, ICGC-CA, and ICGC-AU (Fig. 2A-2D).

Hazard ratios (HR) and 95% confidence intervals (CI) were estimated using two-sided Cox proportional-hazards regression models under a dominance mode of inheritance. Meta-analysis p-values were adjusted for multiple comparisons using the Benjamini–Hochberg false discovery rate method. Source data are provided as a Source Data file.

We also investigated the association of two gene expression signatures comprising genes whose expression was associated with better (Signature 1: IGHG3, IGKC, IGHM, F2R, F2RL2, CFI, and A2M) or worse (Signature 2: FGA, SERPINE1, FGG, F3) OS in the PanGenEU, TCGA, ICGC-CA, and ICGC-AU cohorts, followed by a meta-analysis. Patients with high expression levels of Signature 1 genes experienced a 32% reduction in mortality risk compared to those with low expression levels (HR = 0.68, P = 4.0 × 10⁻⁴, Fig. 3a). Conversely, patients with high expression levels of Signature 2 genes displayed a less favorable OS compared to their low expression counterparts. However, this association was not statistically significant (HR = 1.16, P = 2.01 × 10⁻¹, Fig. 3b).

Fig. 3: Forest plots from the meta-analysis of complement system-related genes illustrate the protective signature that includes genes linked to improved overall survival, as analyzed in the PanGenEU, TCGA, ICGC-CA, and ICGC-AU databases (Fig. 3A).

Additionally, the risk signature considers CS-related genes tied to an increased risk of death in the TCGA, ICGC-CA, and ICGC-AU cohorts (Fig. 3B). Hazard ratios (HR) and 95% confidence intervals (CI) were estimated using two-sided Cox proportional-hazards regression models under a dominance mode of inheritance. Meta-analysis p-values were adjusted for multiple comparisons using the Benjamini–Hochberg false discovery rate method. Source data are provided as a Source Data file.

Patients stratified according to CS-related gene expression have distinct immune cell infiltration profiles

We found that patients with elevated expression of any of the CS-related genes associated with better survival in the meta-analysis (IGHG3, IGKC, IGHM, F2R, F2RL2, CFI, and A2M) had significantly higher infiltration of TCD8+, B cells, and Th1 (Fig. 4). These PDAC tumors also exhibited an increased abundance of state-S1 TCD8+, state-S2 TCD4+, or state-S1 macrophages (Fig. 5), as well as a higher abundance of the CE10 ecotype (Fig. 6).

Fig. 4: Complement-related gene expression is linked to distinct immune-cell infiltration patterns in PDAC.

A heatmap displays the log₂ fold-change (log₂FC) in the relative abundance of immune-cell populations when comparing samples with high expression of each complement system–related gene to those with low expression. The left block (PanGenEU, n = 142) corresponds to 21 immune cells derived from CIBERSORTx. The right block (TCGA-PAAD, n = 124) contains 12 immune cells inferred by Thorsson. Columns are ordered based on hierarchical clustering of the cell types shared between the two cohorts; the brace below indicates the two data sources. Rows represent the 12 complement system-related genes that were included in both expression matrices (IGHG3, IGKC, F2R, F2RL2, IGHM, CFI, A2M, signature 1, FGA, SERPINE1, FGG, F3). Color intensity reflects log₂FC (red = higher, blue = lower abundance in the high-expression group). Gray squares represent cell types for which all values were missing in one comparison group (e.g., Th1 cells in TCGA). Asterisks indicate significant differences that remain after the Benjamini–Hochberg correction (FDR < 0.05, two-sided Wilcoxon rank-sum test). One extreme outlier for T cells CD4 Naïve in TCGA was set to NA for that specific comparison only; the sample was retained for all other cell-type tests. Source data are provided as a Source Data file.

Fig. 5: The heatmap illustrates the relationships between gene-cell state types, highlighting significant differences in the abundance of 12 immune cell types in PanGenEU.

This is based on a comparison of high expression levels of complement system-related genes versus low expression levels. Color intensity reflects log₂FC (red = higher, blue = lower abundance in the high-expression group). Asterisks indicate significant differences that remain after the Benjamini–Hochberg correction (FDR < 0.05, two-sided Wilcoxon rank-sum test). Source data are provided as a Source Data file.

Fig. 6: Heatmap illustrating gene-ecotype relationships that highlights the significant differences in the abundance of 10 immune ecotypes in PanGenEU by comparing high expression levels of the complement system-related genes to low expression levels.

Color intensity reflects log₂FC (red = higher, blue = lower abundance in the high-expression group). Asterisks indicate significant differences that remain after the Benjamini–Hochberg correction (FDR < 0.05, two-sided Wilcoxon rank-sum test). Source data are provided as a Source Data file.

Patients with high expression of Signature 1 also had increased infiltration by TCD8+, Th1, or B cells, and a reduced presence of cells like M2 macrophages (Fig. 4), higher abundance of state-S1 TCD8+, state-S1 and state-S2 TCD4+, and state-S1 B cells (Fig. 5), along with a higher abundance of ecotype CE10 (Fig. 6). Interestingly, PDAC tumors with high Signature 1 levels showed elevated clonal expansion of B cells compared to those with low Signature 1 levels (Supplementary Fig. 4).

Conversely, tumors expressing higher levels of FGA or F3 genes, associated with worse outcomes, had significantly reduced infiltration of TCD8+ (Fig. 4). Tumors with high expression of F3 or FGG had a lower abundance of state-S1 NK (Fig. 5) and CE10 ecotype (Fig. 6), and higher abundance of state-S1 mast cells and CE2 ecotype.

Discussion

The development of pancreatic ductal adenocarcinoma is influenced by several genetic factors as described before19,20. The present hypothesis-driven study aimed to investigate the potential role of CS-related genes in PDAC risk and prognosis. Understanding the genetic factors involved in PDAC development is crucial to defining high-risk populations suitable for screening and primary prevention targeted interventions.

Our findings, based on a meta-analysis of PanGenEU and UK Biobank study results at both the SNP and gene levels, suggest that the CS is associated with PDAC risk as well as with the risk of allergy and asthma. Functional in silico analyses of identified susceptibility SNPs/genes have offered additional evidence supporting the involvement of CS-related genes in PDAC and shed light on their potential mechanisms of action, including immune regulation, expression modulation, and tumor-specific expression changes. In particular, eight SNPs in VTN identified as risk variants were also associated with a higher VTN expression in the normal pancreas. Our study also identified moderate and weak colocalization between PDAC and eQTL signals for A2M and VTN, respectively, suggesting that the SNPs within these genes might share a functional effect on PDAC risk and gene expression.

At the gene level, we identified six (four with borderline significance) susceptibility CS-related genes, including FCN1, which is expressed in leukocytes and involved in elastin-binding21,22; PLAT converts plasminogen into plasmin, crucial in cell migration and tissue remodeling/degradation23; CD46 protects host cells from complement self-cell damage24,25; F2RL2 is key in thrombosis26; VTN promotes cell adhesion27 and inhibits the MAC28; and A2M, which works as a protease inhibitor29,30. Remarkably, our study provides evidence of the association between these CS-related genes and PDAC risk. Interestingly, PLAT is among the top overexpressed genes in PDAC compared with normal pancreas31,32. Functional studies have shown its role in tumor growth and invasion in vitro and in vivo33,34 through a variety of mechanisms including non-catalytic effects35. We also discovered associations between PLAT and F2RL2 and the risk of asthma, and VTN with the risk of allergy, highlighting the interplay between complement-related genetic variations and immune-related conditions also associated with PDAC risk.

Pathways related to humoral immune responses and coagulation are enriched with the identified CS-related genes. Notably, A2M, F2RL2, VTN, and PLAT are related to thrombosis, a condition that is prevalent in cancer patients, especially among those with PDAC36, who frequently suffer thromboembolic events37,38,39 due to the release of procoagulant factors like thrombin40. A2M plays a crucial role in regulating thrombin activity, while thrombin can cleave the F2RL2 receptor41, enhancing platelet activation, promoting a procoagulant state, and contributing to thromboembolic diseases42. VTN binds to fibrin and platelets participating in platelet aggregation43 and inhibiting PLAT, responsible for dissolving blood clots44,45. The interplay between these genes and the coagulation cascade components warrants further investigation to elucidate their contribution to the development and progression of PDAC. Furthermore, DisGeNET analysis showed that some of these genes were associated with central nervous system diseases. PLAT and A2M, previously linked to depression46,47, were also associated with PDAC risk. Interestingly, PDAC has the highest prevalence of depression symptoms preceding its diagnosis48,49,50 compared to other cancer types, suggesting a potential connection between genetic variations in PLAT and A2M, depression, and PDAC. Furthermore, A2M has been linked to neuronal disorders like Alzheimer’s disease51,52. PDAC is a neurotropic tumor53,54,55 with a high incidence of perineural invasion. Given this neurotropic nature of PDAC, the association between A2M and PDAC becomes even more relevant for its potential involvement in Alzheimer’s disease, supporting the observed inverse relationship between neurodegenerative diseases and cancer56,57,58,59,60. However, further research is needed to explore the interplay between these genes, neuronal and mental diseases, and the development of PDAC.

Next, we aimed to identify robust prognostic biomarkers considering the CS-related genetic variation and transcriptomic data. Although no significant variants at the germline level were identified, our results found biomarkers at the somatic level. We observed that overexpression of eight CS-related genes was associated with a favorable prognosis. Among them, IGHG3, IGKC, and IGHM code for proteins essential for immunoglobulin (IG) integrity, indicating a robust immune response potentially through the clonal expansion of B cells and increased IG levels to target tumor cells. We also found that high expression levels of F2R and F2RL2, crucial for platelet activation and thrombotic responses, were also associated with anti-tumor immune cell infiltration, including TCD8+, TCD4+, B cells, and DCs. We hypothesize that these inflammation processes, in the context of “cold” tumors like PDAC, might favor the activation and infiltration of the abovementioned TCD8+, DCs, and B cells, leading to improved patient survival. Conversely, Wu et al. (2024)61 identified F2R as a potential biomarker of adverse outcomes in gastric adenocarcinoma due to inflammatory processes. Elevated expression of F2RL2 has been associated with improved prognosis in esophageal squamous cell carcinoma62, although with poorer survival in colorectal adenocarcinoma63 and gliomas64. Furthermore, we showed that A2M expression was associated with enhanced overall survival outcomes and anti-tumor immune cell infiltration (TCD8+, Th1, and B cells), consistent with findings in intrahepatic cholangiocarcinoma, where its low expression was associated with unfavorable prognosis65. Lastly, we found that greater expression of CFI was associated with enhanced patient survival and infiltration of tumor-tolerant immune cells (TCD8+, Th1, and NK cells). Other studies have associated CFI with shorter breast cancer survival66, and tumor progression in cutaneous squamous cell carcinoma and gliomas67,68. 
Considering that the dual role of these genes might be cancer-specific, further research on the roles of F2R, F2RL2, A2M, CFI, and C4A in PDAC prognosis is crucial. Additionally, we propose a prognostic gene expression signature composed of the genes associated with a favorable prognosis: Signature 1. Interestingly, patients with higher levels of this signature had tumors with enhanced clonal expansion, higher immune infiltration of tumor-killing immune cells, and state-S1 TCD8+, state-S2 TCD4+, or state-S1 macrophages or ecotype CE10, previously associated with better overall survival69, suggesting a more robust immune response. Based on these findings, we anticipate that these patients may respond favorably to immunotherapy.

We also identified four biomarkers associated with unfavorable prognosis: FGA, FGG, SERPINE1, and F3. These genes are involved in blood clot formation through the coagulation cascade. Higher levels of FGA and FGG are linked to poor prognosis in gastric cancer70. Serum levels of FGG are also suggested to act as a biomarker for castration-resistant prostate cancer71. SERPINE1 expression has been associated with poorer prognosis72 and cancer progression in colon73 and gastric74 cancer. F3, which activates the coagulation cascade via coagulation factor VII interaction, is associated with colorectal cancer75. We observed that PDAC tumors displaying high expression of these genes also had a decreased presence of anti-tumor immune cells like TCD8+, decreased abundance of state-S1 mast cells or NK cells, and higher abundance of cell communities such as the CE1 and CE2 ecotypes (ref. 69), possibly creating an environment conducive to tumor progression, leading to poorer patient outcomes69. Interestingly, a signature comprising SERPINE1 and F3, among other genes, has been recently associated with worse prognosis in lower-grade glioma76. Based on our results and previous evidence, we speculate that targeting SERPINE1, F3, FGG, or FGA genes in combination with chemotherapies could enhance treatment effectiveness in PDAC tumors. This is particularly relevant as some F3 targeted therapies, like Genmab’s tisotumab vedotin, are already approved to treat cervical cancer77. Additionally, SERPINE1 has been shown to reduce the efficacy of chemotherapy agents, promoting drug resistance in triple-negative breast tumors78, while FGA has been proposed for targeted therapy responsiveness in NSCLC patients79.

This study has some limitations. First, our analyses focused on the European population, and therefore, the generalizability to other ancestry groups remains uncertain. Additionally, the study relied on self-reported epidemiological information, which may introduce misclassification of risk factors and comorbidity. However, the use of a meta-analysis approach incorporating data from different populations helped mitigate this potential bias. In this study, we also encountered one of the key limitations affecting association studies in complex diseases, which relates to the small effect sizes of SNP association estimates, resulting in a lack of significant results. The gene-based analysis using the SKAT-O approach alleviates this issue, although it does not elucidate the direction of the effect. We minimized this limitation by applying functional in silico analyses. We applied the Benjamini-Hochberg procedure to control false discoveries. Lastly, we focused on common variants; future studies should explore rare variants through exome or genome sequencing or targeted analysis to gain deeper insights.

Importantly, our study has several strengths. We conducted an extensive analysis of the association between the CS-related genes and PDAC risk utilizing advanced methodologies in two large studies with distinct designs, PanGenEU and UK Biobank, enhancing the statistical power and robustness of our findings. Additionally, we thoroughly examined the association of CS-related genes with various PDAC risk factors using comprehensive epidemiological and clinical data, which provides a deeper understanding of the complex interplay between CS-related genes and PDAC risk. The use of four independent study populations to explore the role of CS-based genes on PDAC prognosis allowed us to identify a strong signature indicative of a better prognosis, which is also supported by immune infiltration characteristics within the tumor in addition to the statistical evidence.

In conclusion, our work exhaustively assesses the association of six CS-related genes (CD46, A2M, VTN, F2RL2, FCN1, and PLAT) with PDAC risk. The functional relevance of these genetic variations, particularly in gene expression regulation and disease pathways, underscores their role in PDAC. These findings highlight their importance as potential biomarkers for PDAC risk profiling, leading to early diagnosis. Furthermore, we showed that gene expression patterns of CS-related genes may help to discriminate PDAC patients based on their prognosis. We also shed light on the molecular mechanisms underlying pancreatic carcinogenesis, which might provide valuable insights for personalized treatment strategies. Future research should focus on unraveling the intricate mechanisms through which these complement-related genes contribute to PDAC development, as well as exploring their potential as therapeutic targets in PDAC tumors.

Methods

Study populations

We utilized the resources of the PanGenEU case-control study, the UK Biobank, the International Cancer Genome Consortium (ICGC), Pancreatic Cancer Canada (ICGC-CA) and Pancreatic Cancer Australia (ICGC-AU) repositories (https://dcc.icgc.org/repositories), and The Cancer Genome Atlas (TCGA) repository (https://portal.gdc.cancer.gov). A description of these studies and populations can be found in Supplementary Methods. Sex information was determined by self-report. For the statistical analyses, we prioritized individuals with genetic European ancestry belonging to the PanGenEU and UK Biobank study populations, and those whose self-reported race was white in TCGA and ICGC.

Single variant association analysis with PDAC risk

For the single variant association analysis with PDAC risk, we considered both imputed and genotyped SNPs annotated within the 111 complement-related genes preselected by Qian et al. (2019)80 (Supplementary Table 2), for both PanGenEU (197,050 SNPs) and UK Biobank (183,849 SNPs) studies. In the PanGenEU, we prioritized SNPs with a minor allele frequency (MAF) ≥ 5% and high imputation quality (info > 0.3). To estimate the OR and 95% CI for the association of the selected SNPs with PDAC risk in each study population, we fitted logistic regression models. In the PanGenEU study population, the models were adjusted for the first five principal components (to account for population stratification), age, sex, and the European region. For the UK Biobank population, we adjusted the models for the aforementioned variables except for the European region, in addition to the genotype array type and center. Finally, we combined the individual study estimates in a meta-analysis assessing the summary statistics of the association analysis of 5,932 SNPs obtained in both populations using the function rma.uni (metafor R package, v. 4.2-0)81. To limit the number of false positives resulting from the multiple comparisons performed in the meta-analysis, given the hypothesis-driven approach we followed, we applied the Benjamini–Hochberg procedure, one of the most widely used false discovery rate (FDR) methods.
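
As an illustration of how per-study SNP estimates can be pooled, the sketch below combines log odds ratios and standard errors with inverse-variance weights. It is a simplified stand-in and not the authors' pipeline: it uses the DerSimonian–Laird estimator of between-study variance, whereas the metafor package fits random-effects models by REML by default, and the function name and input values are hypothetical.

```python
import math

def random_effects_meta(log_ors, std_errs):
    """Pool per-study log odds ratios (DerSimonian-Laird random effects)."""
    w = [1.0 / se ** 2 for se in std_errs]             # inverse-variance weights
    fixed = sum(wi * yi for wi, yi in zip(w, log_ors)) / sum(w)
    # Cochran's Q and the DerSimonian-Laird between-study variance tau^2.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, log_ors))
    df = len(log_ors) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    # Random-effects weights add tau^2 to each study's sampling variance.
    w_re = [1.0 / (se ** 2 + tau2) for se in std_errs]
    pooled = sum(wi * yi for wi, yi in zip(w_re, log_ors)) / sum(w_re)
    se_pooled = math.sqrt(1.0 / sum(w_re))
    z = pooled / se_pooled
    p_value = math.erfc(abs(z) / math.sqrt(2.0))       # two-sided normal p-value
    return {
        "or": math.exp(pooled),
        "ci_low": math.exp(pooled - 1.96 * se_pooled),
        "ci_high": math.exp(pooled + 1.96 * se_pooled),
        "se": se_pooled,
        "tau2": tau2,
        "p": p_value,
    }

# Hypothetical estimates for one SNP in two studies (not study data).
result = random_effects_meta([math.log(1.10), math.log(1.20)], [0.05, 0.08])
```

The resulting per-SNP P-values would then be passed to the FDR correction described above.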

Gene-based association analysis with PDAC risk

We conducted a gene-based association analysis in each population using the optimal Sequence Kernel Association Test (SKAT-O), implemented in the R package seqMeta (v1.6.7) (https://github.com/DavisBrian/seqMeta). For this, we used the SNPs included in the single variant association analysis after a filtering step based on linkage disequilibrium (LD > 0.7). We adjusted the SKAT-O models for the same covariates as in the single variant analysis. Furthermore, to detect potential associations and gain more robust insights into the role of the complement-related genes in PDAC risk, we performed a meta-analysis of the summary statistics from the 111 complement-related genes derived from both study populations. We then assessed the association (OR and 95% CI) between the CS-related genes linked with PDAC risk in our analysis and well-known pancreatic cancer risk factors, including asthma, allergies, tobacco, family history of cancer, and body mass index (BMI), in each study population using logistic regression models adjusted for the same covariates as the previous models. Finally, we integrated the results from both study populations in a random-effects meta-analysis and applied the FDR correction described above to account for multiple comparisons.
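
To illustrate how two study-specific estimates can be pooled under a random-effects model, the sketch below uses the DerSimonian-Laird estimator of the between-study variance. This choice is an assumption made for simplicity: the text does not state which estimator was used, and the function name and the input values are hypothetical.

```python
import math


def random_effects_meta(log_ors, std_errs):
    """Pool per-study log odds ratios with a DerSimonian-Laird random-effects model."""
    # Inverse-variance (fixed-effect) weights and pooled estimate.
    weights = [1.0 / se ** 2 for se in std_errs]
    sum_w = sum(weights)
    fixed = sum(w * y for w, y in zip(weights, log_ors)) / sum_w

    # Cochran's Q and the DerSimonian-Laird between-study variance (tau^2).
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, log_ors))
    df = len(log_ors) - 1
    c = sum_w - sum(w ** 2 for w in weights) / sum_w
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0

    # Random-effects weights add tau^2 to each study's sampling variance.
    re_weights = [1.0 / (se ** 2 + tau2) for se in std_errs]
    pooled = sum(w * y for w, y in zip(re_weights, log_ors)) / sum(re_weights)
    pooled_se = math.sqrt(1.0 / sum(re_weights))

    return {
        "log_or": pooled,
        "se": pooled_se,
        "tau2": tau2,
        "or": math.exp(pooled),
        "ci_low": math.exp(pooled - 1.96 * pooled_se),
        "ci_high": math.exp(pooled + 1.96 * pooled_se),
    }


# Hypothetical log odds ratios and standard errors from two study populations.
example = random_effects_meta([0.10, 0.25], [0.08, 0.05])
```

When the two studies agree closely, the estimated between-study variance is zero and the result coincides with the fixed-effect inverse-variance estimate.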

To understand the functional implications of the genes significantly associated with PDAC risk, we conducted an in silico functional analysis using various bioinformatics tools (see Supplementary Methods).

Survival analysis of CS-related genes at germline and somatic levels

We examined the association between the SNPs within 111 CS-related genes and PDAC overall survival (OS) in the PanGenEU and UK Biobank populations (see Supplementary Methods).

Similarly, we used data on the expression levels of the 111 CS-related genes in PDAC tumors to examine their association with the OS of 134 PDAC patients from TCGA, classifying patients into low-expression and high-expression groups based on the median. To validate these findings, we analyzed survival data from PanGenEU (n = 122), ICGC-CA (n = 160), and ICGC-AU (n = 86), employing Cox proportional hazards (PH) models adjusted for sex, age at diagnosis, and tumor stage in each population. Next, we meta-analyzed the findings from the four study populations, focusing on CS-related genes with a nominal P < 0.1 in any of them. We also created two gene expression signatures based on the meta-analysis results: Signature 1, comprising genes whose expression was associated with improved OS after multiple testing correction, and Signature 2, comprising genes whose expression was associated with decreased OS after FDR correction. For each signature, gene expression levels were converted into a score, and patients were classified into low-expression or high-expression groups based on the median expression level of the signature. We then assessed the association between these signature scores and PDAC survival in each study population and conducted a meta-analysis combining the findings from the four cohorts.
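
The median-based grouping described above can be sketched as follows. The scoring rule is an assumption, since the text does not specify how expression levels were converted into a score: here the score is the mean of per-gene z-scores across the signature genes. The gene names, function names, and expression values are hypothetical.

```python
from statistics import mean, median, pstdev


def signature_scores(expression, signature_genes):
    """Average the per-gene z-scores of the signature genes for each patient.

    expression maps a gene name to a list of per-patient expression values,
    with patients in the same order for every gene.
    """
    n_patients = len(expression[signature_genes[0]])
    scores = [0.0] * n_patients
    for gene in signature_genes:
        values = expression[gene]
        mu, sd = mean(values), pstdev(values)
        for i, value in enumerate(values):
            # A gene with no variation contributes zero to every patient.
            z = (value - mu) / sd if sd > 0 else 0.0
            scores[i] += z / len(signature_genes)
    return scores


def median_split(scores):
    """Label each patient 'high' if the score exceeds the cohort median, else 'low'."""
    cutoff = median(scores)
    return ["high" if s > cutoff else "low" for s in scores]


# Hypothetical expression values for two genes in four patients.
expression = {"GENE_A": [1.0, 2.0, 3.0, 4.0], "GENE_B": [2.0, 4.0, 6.0, 8.0]}
groups = median_split(signature_scores(expression, ["GENE_A", "GENE_B"]))
```

The resulting group labels would then enter the survival model as a binary covariate; patients with a score exactly equal to the median are assigned to the low group in this sketch, which is also an assumption.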

Finally, we explored whether the observed associations of CS-related genes with survival, at the individual and signature levels, might be partially explained by specific immune infiltration patterns within PDAC tumors (see Supplementary Methods).

Ethics Statement

IRB ethical approval was obtained by all participating centers contributing to PanGenEU (Supplementary Note 1). Written informed consent was obtained from all study participants. The study was conducted in accordance with the Declaration of Helsinki.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.