Abstract
Background
Cancer is a polygenic disease that involves the dysregulation of multiple biological events, including aberrant cell proliferation and evasion of immune responses, which collectively contribute to cancer development and progression. Although the polygenic risk score (PRS) has enabled assessment of the effects of cancer risk loci, the potential impact of other genetic variants with functional relevance to cancer biology remains largely unexplored.
Methods
We conducted a large-scale analysis of UK Biobank data, including 61,249 cancer cases and 254,203 controls of European ancestry across 16 cancer types. We assessed the impact of germline variants relevant to cancer gene expression and 33 cancer immune responses by constructing genetic scores (eQS: eQTL-specific genetic score; IMS: immune response-specific genetic score) and examining the utility of these scores independently and in addition to cancer-specific PRSs, for each of the 16 cancer types using multivariable logistic regression models.
Results
Here, we show that seven associations are significant after the Bonferroni correction (eQS of MAP3K1 and IDH2 in breast cancer, eQS of PAX8 in cervical cancer, eQS of NRG1 in thyroid cancer, IMS26 in lung cancer, IMS16 in prostate cancer, and IMS8 in testicular cancer). The IMS for interferon responses and effector T cells show protective associations with lung, prostate, and testicular cancers. Combining these IMS with cancer PRS improves risk stratification for testicular and prostate cancers.
Conclusions
Our study provides insights into functional genetic scores associated with cancers.
Plain Language Summary
Many genetic factors contribute to cancer development, and most have been identified by comparing the genetics of individuals with and without cancer. However, genetic factors that affect how the immune system responds to cancer and what genes are expressed by cancer cells remain less explored. In this study, we developed genetic scores signifying cancer gene expression and immune responses across multiple cancer types, using large-scale data sets. The scores showed that several specific genes were significantly associated with the risk of breast, cervical, and thyroid cancers. In addition, immune-related scores were associated with the risk of lung, prostate, and testicular cancers. These findings add to our understanding of the underlying genetics of cancer and provide additional information beyond existing risk scores.
Introduction
Cancer is a complex, polygenic disease involving many germline variants. Hundreds of cancer susceptibility loci across various types of cancer have been identified using genome-wide association studies (GWAS). The aggregate impact of these loci on cancer has been assessed by constructing polygenic risk scores (PRS) for each cancer type1,2,3,4,5,6. GWAS is a methodology for identifying trait susceptibility loci by comparing the frequencies of variants between cases and controls, but it does not consider the functional characteristics of variants. Most of the identified variants are in non-coding regions, which limits their interpretation and in vitro investigation. Because cancer PRS represents the cumulative effect of genetic variants identified by GWAS, it is difficult to determine the functional relevance of these variants to cancer1,5.
Cancer involves distinct biological mechanisms, including abnormal cell proliferation and aberrant immune responses such as evasion of the host immune system7,8. Although somatic events that lead to the accumulation of somatic mutations in cancer cells play a critical role in cancer etiology, recent studies have shown that germline variants may also be important in shaping the tumor environment9. One of the key mechanisms of cancer is the accumulation of somatic driver mutations within cells, which can directly affect tumorigenesis by accelerating cell proliferation. These mutations have been identified in various types of cancer10. Because germline variants can contribute to the tumor environment, germline variants associated with these somatic driver genes may play a role in tumorigenesis. Previous studies have investigated germline variants associated with 138 cancer gene mutations (single nucleotide variants, SNV, and copy number variants, CNV) or gene expression in samples from The Cancer Genome Atlas (TCGA)11,12. However, the effect of germline variants specifically associated with cancer gene expression in cancer compared to non-cancer controls has not been assessed.
Aberrant immune responses are also key contributors to tumorigenesis, as cancer occurs from the escape phase of the balance between immune surveillance and cancer-initiating factors13. Using genomic and clinical data, a recent study identified six cancer-specific immune subtypes affecting patient prognosis across 33 cancer types14. Although a subsequent study identified germline variants associated with these 33 cancer immune traits using samples from TCGA, the effects of these variants on individual cancer types compared to non-cancer controls have not been investigated15.
We hypothesized that germline variants with functional relevance to cancer, in addition to cancer susceptibility loci, would have a significant impact on cancer. This study aimed to identify such germline variants and assess their association with cancer using genetic scores. We used UK Biobank (UKB) samples of European ancestry, which is the same ancestry as that of the summary statistics used to determine the PRS and genetic scores, and defined cancer and control samples based on a recent study of 16 cancer types using data from the UKB study (Supplementary Table 1)5,16. We examined germline variants associated with somatic cancer gene expression and 33 cancer immune responses using published studies, as well as germline variants that increase cancer risk identified by cancer GWAS5,12,15,17. For each of the 16 cancer types, we constructed a cancer PRS (known cancer PRS), genetic scores for cancer gene expression (eQS: expression quantitative trait loci-specific genetic score), and 33 cancer immune responses (IMS: immune response-specific genetic score), using summary statistics from relevant studies. We then assessed the impact of these genetic scores in each of the 16 cancer types.
In summary, we show that four eQS and three IMS are significantly associated with the risk of multiple cancer types, including breast, cervical, thyroid, lung, prostate, and testicular cancers. These findings provide insights into how functional genetic scores contribute to cancer risk and progression and highlight their potential relevance for understanding cancer biology and improving risk stratification.
Methods
UK Biobank dataset and quality control
We used data from the UKB study, which included data on approximately 500,000 individuals from the United Kingdom, aged 40 to 69 at recruitment16. Participants were recruited from 2006 to 2010 across multiple centers in the UK, and the genotypic and phenotypic data were collected from 487,409 participants (as of March 3, 2018). We used GRCh37 coordinates. After excluding variants and samples in the quality control process (variants with a call rate <95%, Hardy-Weinberg equilibrium p < 1×10−6, minor allele frequency [MAF] <1×10−4, or imputation quality score [INFO] <0.4; samples from participants who were non-European, had mismatched sex, putative sex chromosome aneuploidy, or no sex information, or had withdrawn from the UKB), we additionally excluded participants with familial relatedness to minimize confounding effects due to ancestry and kinship. In total, 9,575,249 variants and 338,649 samples were included. This study was conducted using the UKB Resource under Application Number 33002.
Phenotypic data
We defined participants with any of 16 types of cancer as cancer cases (bladder, breast, cervix, colorectum, endometrium, kidney, lung, lymphocytic leukemia, melanoma, non-Hodgkin’s lymphoma, ovary, pancreas, prostate, testis, and thyroid), who had ICD-9 or ICD-10 codes for one of the 16 malignant neoplasms or in situ carcinoma registered in the hospital’s inpatient or cancer registries of UKB. Samples from participants who were identified as having no history of cancer in the registries (inpatient, self-reported, or death records) were used as controls. A total of 61,249 cancer cases and 254,203 controls were included in the analysis.
Construction of genetic scores
All summary statistics used in this study are summarized in Supplementary Data 1. To unify cancer types between these summary statistics and our dataset, cancer types were integrated as described in Supplementary Table 2. The correlations between variants in the known cancer PRS and those in the genetic scores of eQTL (eQS) or immune responses (IMS), estimated using the PLINK --r2 command, are summarized in Supplementary Table 3. Each genetic score was normalized using the z-score transformation. To assess the effect of genetic scores on cancer incidence/prevalence, we performed multivariable logistic regression analysis adjusted for age at recruitment, genotyping array, 10 principal components (PCs), and sex (if applicable).
Known cancer PRS (polygenic risk score)
To construct a known cancer PRS, we obtained the summary statistics of cancer risk variants across the 16 cancer types from a recent study (Supplementary Data 1)5. After excluding variants that were not included in our dataset, 787 variants (range: 8 [endometrial cancer] to 165 [breast cancer]) were used to construct a known cancer PRS for each type of cancer. PRS for each cancer was estimated as follows, which involved consideration of the effect and dose of risk variants per sample:
\({{PRS}}_{i}^{({\rm{cancer}})}={\sum }_{j}{\beta }_{j}^{({cancer})}{G}_{{ij}}\)
where \({{PRS}}_{i}^{({\rm{cancer}})}\) denotes the PRS of each cancer type for individual \(i\), \({\beta }_{j}^{({cancer})}\) denotes the effect size for each risk allele \(j\) in the cancer-specific summary statistics, and \({G}_{{ij}}\) is the dosage of effective allele \(j\) in individual \(i\). The z-score of the cancer PRS [z = (cancer PRS – mean (cancer PRS)) / standard deviation (cancer PRS)] was used to normalize the scores between individuals.
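The weighted sum and z-score normalization described above can be sketched in a few lines of Python. This is a minimal illustration only: the effect sizes and dosages are hypothetical, the function names are ours, and the use of the population standard deviation is an assumption (the text does not say which estimator was used).

```python
from statistics import mean, pstdev


def polygenic_score(dosages, betas):
    """Weighted sum of effect-allele dosages for one individual."""
    if len(dosages) != len(betas):
        raise ValueError("dosages and betas must have the same length")
    return sum(beta * g for beta, g in zip(betas, dosages))


def z_normalize(scores):
    """z = (score - mean) / standard deviation, across individuals."""
    mu = mean(scores)
    sd = pstdev(scores)  # assumption: population SD; sample SD is equally plausible
    if sd == 0:
        raise ValueError("scores have zero variance")
    return [(s - mu) / sd for s in scores]


# Hypothetical effect sizes for three variants and dosages for four individuals.
betas = [0.10, 0.25, -0.05]
dosage_matrix = [
    [0.0, 1.0, 2.0],
    [1.0, 1.0, 0.0],
    [2.0, 0.0, 1.5],
    [0.0, 0.0, 0.0],
]

raw_scores = [polygenic_score(row, betas) for row in dosage_matrix]
z_scores = z_normalize(raw_scores)
```

Dosages are kept as floats so that imputed (non-integer) genotypes fit the same code path.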
eQS (eQTL-specific genetic score)
We first obtained the cis- and trans- expression quantitative trait loci (eQTL) variants across various cancer types using a published study, PancanQTL (Supplementary Fig. 1)12. We extracted eQTLs associated with cancer genes listed in the OncoKB database and categorized them as either oncogenes (OG) or tumor suppressor genes (TSG)17. No data were available for lymphocytic leukemia or non-Hodgkin’s lymphoma during eQS construction. Among the 14 cancer types, for cases in which the same eQTL was associated with multiple genes within the same cancer type, we retained the eQTL-eGene pair with the smallest p-value and removed duplicates. The SNP IDs were harmonized by creating a standardized identifier based on chromosomes, base positions, and alphabetically ordered alleles. To reduce redundancy, we clumped variants with R2 > 0.5 using PLINK. To harmonize the coefficients with cancer risk, we applied the following adjustments based on the type of gene (TSG or OG) and the direction of the eQTL beta value (\({\beta }_{{eQTL}}\)). For TSGs, if the eQTL variant was positively associated with the gene (\({\beta }_{{eQTL}}\) > 0), we reversed the direction of the beta coefficient by multiplying it by −1 (\({\beta }_{{eQTL}}\) × (−1)). Additionally, the alternative allele was designated as a risk allele to ensure that the beta direction aligned with the increased cancer risk. However, if the eQTL variant was negatively associated with the TSG (\({\beta }_{{eQTL}}\) < 0), no adjustments were made to the beta value or the risk allele, as the original beta direction already aligned with cancer risk increase. For OGs, if the eQTL variant was negatively associated with the gene (\({\beta }_{{eQTL}}\) < 0), the direction of the beta coefficient was similarly reversed by multiplying it by −1 (\({\beta }_{{eQTL}}\) × (−1)) and the alternative allele was assigned as the risk allele to harmonize the coefficients with cancer risk.
Conversely, if the eQTL variant was positively associated with the oncogene (\({\beta }_{{eQTL}}\) > 0), no changes were applied to the beta value or the risk allele, as the positive beta direction was already consistent with cancer risk increase. Finally, for all TSG, an additional transformation (\({\beta }_{{eQTL}}\) × (−1) × (−1)) was applied to align the direction of TSG coefficients with those of oncogenes, ensuring consistency in the interpretation of cancer risk contributions. The eQS per gene was estimated as the sum of the effects of eQTLs (Supplementary Fig. 1), as shown below, and normalized by the z-score.
where \({G}_{{ij}}\) is the dosage of effective allele \(j\) in individual \(i\) and \(S\) is a set of genes.
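The SNP ID harmonization step mentioned in this section (a standardized identifier built from chromosome, base position, and alphabetically ordered alleles) could look like the sketch below. The exact string format is an assumption on our part; it was chosen to resemble identifiers written elsewhere in this article, such as 6:31613739:C:T. The function name and the handling of a "chr" prefix are hypothetical.

```python
def harmonized_snp_id(chrom, pos, allele_a, allele_b):
    """Build a 'chrom:pos:allele1:allele2' key with alleles sorted alphabetically.

    Assumption: a 'chr' prefix, if present, is dropped so that '6' and 'chr6'
    map to the same key.
    """
    chrom = str(chrom).strip()
    if chrom.lower().startswith("chr"):
        chrom = chrom[3:]
    first, second = sorted([allele_a.strip().upper(), allele_b.strip().upper()])
    return f"{chrom}:{int(pos)}:{first}:{second}"


# The same variant reported with different conventions maps to one key.
key_1 = harmonized_snp_id("chr6", 31613739, "T", "C")
key_2 = harmonized_snp_id(6, "31613739", "c", "t")
```

Because sorting discards which allele is the effect allele, that information has to be carried in a separate column and checked when the weights are applied.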
IMS (immune response-specific genetic score)
To estimate the immune response-specific genetic score (IMS) across the 16 cancer types, we used summary statistics for 33 heritable immune traits and five immune modules defined in recent studies employing over 9000 TCGA cancer samples14,15. The 33 immune traits were composed of four immune-related features (IMS1–IMS4) and 29 gene signatures of specific immune cells or immune responses from gene expression data (IMS5–33). In the study, the effects of SNPs (n = 1794) on 33 heritable immune traits were estimated, adjusted for age, sex, cancer type, and seven genetic PCs, at genome-wide significance (p < 5×10−8, n = 598) and suggestive (p < 1×10−6, n = 1196) significance levels. We constructed IMSs for 33 immune responses per sample using 223 variants, with their effects after SNP ID harmonization and clumping (r2 < 0.5), as described below (Supplementary Fig. 2 and Supplementary Data 1).
where \({G}_{{ij}}\) is the dosage of effective allele \(j\) in individual \(i\). If the variants were duplicated within the same immune traits, we retained the variant with the smallest p value.
In the original study, immune modules were generated as clusters from a Pearson correlation matrix of 139 immune traits15. For our study, to construct IMSs for the five immune modules, we aggregated the SNPs of 33 heritable immune traits into five groups according to the immune modules and retained variants with the lowest p values among traits if variants were duplicated within the same modules. After clumping with r2 < 0.5, the remaining 132 variants were used in IMS construction for the immune modules. While the original study determined the effect of each variant of the immune traits on pan-cancer using linear or logistic regression models, we used the effect size as a weight for IMS in each specific cancer type, as below (Supplementary Fig. 2 and Supplementary Data 1).
where \({G}_{{ij}}\) is the dosage of the effective allele \(j\) in individual \(i\). The Z-score was used in normalization of IMS. To assess the effect of IMSs on cancers, we used univariable and multivariable logistic models adjusted for age at recruitment, genotype array, 10 genetic PCs, and sex, if applicable. The associations between SNPs of IMS and cancer in the final models are shown in Supplementary Data 2.
Construction and validation of final genetic score models for cancer
After constructing the known cancer PRS, eQS, and IMS, we performed multivariable logistic regression analysis to evaluate the effects of each genetic score on each of the 16 types of cancer. To ensure that significant genetic scores were not observed by chance, we performed permutation tests on the final models, using cancer risk as the outcome and significant genetic scores as predictors. This analysis included 500 permutations across the 16 cancer types. To avoid a p-value of zero, the p-value was estimated as (k + 1)/(n + 1), where k is the number of instances in which |observed coefficient of genetic score| ≥ |original coefficient of genetic score|, and n is the total number of permutations.
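The (k + 1)/(n + 1) estimator can be illustrated with a short, self-contained Python sketch. To keep it dependency-free, the statistic below is a stand-in (the difference in mean score between cases and controls) rather than the logistic regression coefficient used in the study; the data, seed, and function names are hypothetical.

```python
import random


def permutation_p_value(original, permuted):
    """(k + 1) / (n + 1), with k = count of |permuted| >= |original|."""
    k = sum(1 for value in permuted if abs(value) >= abs(original))
    return (k + 1) / (len(permuted) + 1)


def mean_difference(scores, labels):
    """Stand-in statistic: mean score in cases minus mean score in controls."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    return sum(cases) / len(cases) - sum(controls) / len(controls)


def permuted_statistics(scores, labels, n_permutations=500, seed=0):
    """Recompute the statistic after shuffling case/control labels."""
    rng = random.Random(seed)
    shuffled = list(labels)
    results = []
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # keeps the case/control counts fixed
        results.append(mean_difference(scores, shuffled))
    return results


# Hypothetical normalized scores for 6 cases (1) and 6 controls (0).
scores = [1.2, 0.8, 1.5, 0.3, 0.9, 1.1, -0.4, 0.1, -0.9, 0.2, -0.3, -1.0]
labels = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]

original_stat = mean_difference(scores, labels)
null_stats = permuted_statistics(scores, labels, n_permutations=500, seed=0)
p_value = permutation_p_value(original_stat, null_stats)
```

With n = 500 permutations, the smallest attainable p-value is 1/501, which is why the estimator never returns zero.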
The final model was constructed as follows: when the model included cancer PRS and eQS, eQS was incorporated if it met the Bonferroni threshold. Similarly, when the model included cancer PRS and IMS, IMS was incorporated if the p-value was <0.05. Thus, the final model consisted of the cancer PRS, eQS (if applicable), and IMS (if applicable). The p-values for eQS and IMS were then recalculated within the final model, which was considered complete only if both p-values for eQS and IMS were <0.05, and the permutation p-value was <0.05.
The variance inflation factor (VIF) was estimated to assess the multicollinearity between genetic scores in the final models for each type of cancer; among genetic scores with a high VIF (>6), IMS26 was retained in the final model for lung cancer, while IMS6 was removed. Although variants 6:31613739:C:T and 6:31851234:A:G were present in both IMS29 and lung cancer PRS (Supplementary Data 1), there were no overlapping variants in the genetic scores of the final models for each cancer type, as IMS29 was excluded from the final lung cancer model. To evaluate the improvement in the final models compared to those with only known cancer PRS, we estimated McFadden’s R2 and performed a likelihood ratio test.
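As a rough illustration of the model comparison, the sketch below computes McFadden's R2 and a likelihood ratio test from model log-likelihoods. It assumes the log-likelihoods have already been obtained from fitted logistic models, handles only the case where the larger model adds a single score (one degree of freedom), and uses hypothetical numbers and function names throughout.

```python
import math


def mcfadden_r2(loglik_model, loglik_null):
    """McFadden's pseudo-R2: 1 - lnL(model) / lnL(intercept-only model)."""
    return 1.0 - loglik_model / loglik_null


def likelihood_ratio_test_1df(loglik_reduced, loglik_full):
    """LR statistic and p-value when the full model adds exactly one parameter.

    For one degree of freedom the chi-square survival function reduces to
    erfc(sqrt(x / 2)); more added parameters need a general chi-square routine.
    """
    statistic = max(2.0 * (loglik_full - loglik_reduced), 0.0)
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# Hypothetical log-likelihoods: intercept-only, PRS-only, and PRS plus one score.
loglik_null = -100.0
loglik_prs_only = -82.0
loglik_prs_plus_score = -80.0

r2_prs_only = mcfadden_r2(loglik_prs_only, loglik_null)
r2_final = mcfadden_r2(loglik_prs_plus_score, loglik_null)
lr_stat, lr_p = likelihood_ratio_test_1df(loglik_prs_only, loglik_prs_plus_score)
```

A higher R2 for the final model together with a small likelihood ratio p-value would indicate that the added score improves fit beyond the PRS-only model.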
Stratifying samples by genetic scores
To investigate whether individuals with higher genetic scores were at greater risk of cancer, we divided our samples into groups based on known cancer PRS and IMS and estimated the odds ratio for each group, adjusted for age at recruitment, genotype array, 10 genetic PCs, and sex, if applicable.
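A simplified version of this stratification is sketched below: individuals in the top fraction of two scores are flagged as carriers, and an odds ratio with a Wald 95% confidence interval is computed from the resulting 2×2 table. Unlike the analysis described above, this sketch is unadjusted (no covariates), treats everyone outside the carrier group as a non-carrier, and uses hypothetical data and function names.

```python
import math


def top_fraction_flags(scores, fraction):
    """True for individuals whose score lies in the top `fraction` of the sample.

    Ties at the cutoff are all flagged, so slightly more than `fraction` of the
    sample can be selected.
    """
    n_top = max(1, int(round(len(scores) * fraction)))
    cutoff = sorted(scores, reverse=True)[n_top - 1]
    return [s >= cutoff for s in scores]


def odds_ratio(carrier, is_case):
    """Unadjusted odds ratio and Wald 95% CI for carriers versus non-carriers."""
    a = sum(1 for c, y in zip(carrier, is_case) if c and y)          # carrier cases
    b = sum(1 for c, y in zip(carrier, is_case) if c and not y)      # carrier controls
    c_ = sum(1 for c, y in zip(carrier, is_case) if not c and y)     # non-carrier cases
    d = sum(1 for c, y in zip(carrier, is_case) if not c and not y)  # non-carrier controls
    if min(a, b, c_, d) == 0:
        raise ValueError("empty cell in the 2x2 table")
    estimate = (a * d) / (b * c_)
    se = math.sqrt(1 / a + 1 / b + 1 / c_ + 1 / d)
    lower = math.exp(math.log(estimate) - 1.96 * se)
    upper = math.exp(math.log(estimate) + 1.96 * se)
    return estimate, lower, upper


# Flag the top 20% of one score and the top 5% of another, then combine.
prs = [float(i) for i in range(100)]
ims = [float((i * 37) % 100) for i in range(100)]
top_prs = top_fraction_flags(prs, 0.20)
top_ims = top_fraction_flags(ims, 0.05)
combined = [p and m for p, m in zip(top_prs, top_ims)]

# Hypothetical table: 10 carriers (6 cases), 90 non-carriers (20 cases).
carrier = [True] * 10 + [False] * 90
is_case = [True] * 6 + [False] * 4 + [True] * 20 + [False] * 70
or_estimate, ci_low, ci_high = odds_ratio(carrier, is_case)
```

Reproducing the adjusted estimates would require fitting a logistic regression with the carrier indicator and the listed covariates instead of using the raw 2×2 table.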
Statistics and reproducibility
To examine associations between genetic scores and cancer risk, we conducted multivariable logistic regression analyses adjusting for age at recruitment, genotype array, 10 genetic PCs, and sex, if applicable. Since an appropriate independent validation dataset was not available, we instead conducted permutation tests (N = 500) for the final models of each cancer type. Functional genetic scores that remained significant at p < 0.05 in the permutation tests were considered robust and retained as final associations.
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
Results
Variants used in the construction of genetic scores
To assess the effect of genetic scores related to cancer biology, in addition to known cancer PRS, we first listed germline variants associated with cancer risk, somatic cancer gene expression, and 33 cancer immune responses used in the construction of known cancer PRS, eQS per gene of each cancer type, and 33 IMS from the Pan-cancer study, respectively (Fig. 1). A total of 787, 3311, and 223 variants were used to construct known cancer PRS, eQS, and IMS, respectively, in the unrelated European UKB samples (Supplementary Figs. 1, 2, Supplementary Table 2, and Supplementary Data 1). Since our study aimed to assess the impact of variants associated with cancer gene expression and cancer-related immune responses on cancer, represented as genetic scores (germline variant sets 2 and 3, respectively), independently of known cancer PRS (germline variant set 1), we examined the correlation between known cancer PRS and eQS or IMS (Supplementary Table 3). A weak correlation was observed between the known cancer PRS and the derived genetic scores (eQS and IMS) (R2 < 0.3) in each type of cancer, indicating that the effects of germline variant sets 2 (eQS) and 3 (IMS) were mostly independent of those of germline variant set 1 (known cancer PRS) identified by cancer GWAS (Fig. 1).
Effect of cancer PRS on the UK Biobank samples across 16 cancer types
We constructed known cancer PRS for 16 cancer types using summary statistics from a recent study by Graff et al.5 In total, 787 variants (range: 8 [endometrial cancer] – 165 [breast cancer]) were used to construct known cancer PRS across the 16 cancer types. We confirmed a similar trend in odds ratios (ORs) across the 16 cancer types in both our study and the study by Graff et al., although there may be minor differences in the definition of samples owing to variations in the dates of sample download (Supplementary Table 4). In all cancer types, we observed that cancer PRS significantly increased the risk of cancer (range of odds ratio: 1.09 [ovary] – 2.26 [testis]).
Significant associations of eQS: MAP3K1 and IDH2 in breast cancer, PAX8 in cervical cancer, and NRG1 in thyroid cancer
To assess the effect of variants associated with cancer gene expression (germline variants set 2, eQS), we calculated eQS for 687 cancer genes across 14 cancer types using expression quantitative trait loci (eQTL) data and the OncoKB cancer gene list and performed multivariable logistic regression analysis (Supplementary Fig. 1)12,17. In total, we identified 1606 eQS genes across 14 cancer types (Supplementary Table 5). We identified four significant eQS (eQS for MAP3K1 and IDH2 in breast cancer, PAX8 in cervical cancer, and NRG1 in thyroid cancer) that remained significant at the Bonferroni threshold even when cancer PRS was included in the model (OR = 1.06 [1.04–1.08], p = 4.80×10−13; OR = 1.03 [1.02–1.05], p = 3.42×10−4; OR = 1.09 [1.06–1.13], p = 1.85×10−9; OR = 1.24 [1.15–1.34], p = 2.26×10−8, respectively) (Table 1). MAP3K1 is recognized as a driver gene in breast cancer, whereas other genes have not been classified as drivers10. Interestingly, while the variant components of the eQS for IDH2 in breast cancer and PAX8 in cervical cancer did not appear to be associated with cancer risk, rs11073900 of IDH2 and rs6707386 of PAX8 were significantly associated with lymphocyte count18,19,20. These variants may not only regulate IDH2 and PAX8 gene expression but may also be linked to the regulation of lymphocytes.
Protective effects of immune scores of interferon responses and effector T cells on various cancer types
To assess the effect of germline variants set 3 (IMS), which is associated with 33 heritable immune traits across 30 cancer types in TCGA data (pan-cancer), we constructed IMS for 33 immune traits and 5 immune modules using summary statistics of 223 and 132 germline variants, respectively, from a published study (Supplementary Fig. 2, Supplementary Tables 6, 7, and Supplementary Data 1)14,15. To determine the significant associations between IMS, or immune modules, and cancer, we performed univariable and multivariable logistic regression analyses. We observed 46 significant associations between 35 IMSs and 15 cancer types, and between 11 immune modules and nine cancer types (p < 0.05). After applying the Bonferroni correction, four associations remained significant (p < 0.05/16 = 0.0031) (Fig. 2 and Supplementary Tables 8, 9). The number of associations varied across the different types of cancer (range,1–6; median,3.5). Interferon response-related IMS (IMS26–IMS28) showed significant negative associations in lung, oral cavity/pharynx cancer, and melanoma (OR = 0.86 [0.79–0.94], p = 6.71×10−4; OR = 0.79 [0.68–0.93], p = 3.70×10−3; OR = 0.91 [0.85–0.97], p = 5.27×10−3, respectively, Fig. 2a and Supplementary Table 8). These results indicate that some germline variants relevant to the expression of interferon responses may contribute to the suppression of cancer incidence and prevalence, which is consistent with a previous finding that interferon responses are involved in the anti-tumor tumor environment during initial tumor formation21,22,23. Additionally, T cell-related immune scores (IMS8 and IMS16) demonstrated a protective effect against testicular cancer and prostate cancer (OR = 0.89 [0.83–0.96], p = 2.68×10−3; OR = 0.96 [0.94–0.99], p = 1.79×10−3, respectively, Fig. 2a and Supplementary Tables 8, 9). 
Given the crucial role of cytotoxic CD8+ T cells and effector memory T cells in the early phase of cancer as key participants in adaptive immunity, our results suggest that genes associated with these cell types may have protective roles in testicular and prostate cancer (Supplementary Table 10)24,25,26.
Fig. 2: a Significant associations between 33 IMS or 5 immune modules and each of the 16 cancer types, analyzed using multivariable logistic regression adjusted for age at recruitment, genotyping array, 10 principal components (PCs), and sex (if applicable). The direction of the coefficients of the scores is indicated in red (OR > 1) and blue (OR < 1). The significance levels are depicted using different colors: light red and light blue for p < 0.05, red and blue for p < 0.01, and an asterisk for associations that passed the Bonferroni correction. Immune traits that are involved in the same immune modules are represented in the same color. b The forest plot displays the odds ratio for carriers of the top 20% testicular cancer PRS combined with the top 1%, 5%, 10%, and 20% of IMS8 versus non-carriers (upper plot), and for carriers of the top 20% prostate cancer PRS combined with the top 1%, 5%, 10%, and 20% of IMS16 versus non-carriers (lower plot), representing the association between genetic scores and cancer risk.
By stratifying testicular and prostate cancer patients based on known cancer PRS and IMS8 or IMS16, respectively, we observed a large decrease in the OR for individuals with high cancer PRS and high IMS compared to those using known cancer PRS alone. Specifically, OR for individuals with top 20% cancer PRS and top 5% IMS8 or top 1% IMS16 was 1.89 (1.04–3.44, p = 3.78×10−2) and 1.52 (1.03–2.26, p = 3.73×10−2), respectively, while OR for individuals with top 20% testicular cancer PRS alone was 3.77 (3.21–4.43, p = 3.24×10−58) and for those with prostate cancer PRS alone was 1.81 (1.73–1.89, p = 1.15×10−141) (Fig. 2b and Supplementary Figs. 3, 4). These findings suggest that immune genetic scores (IMS), as determined in this study, have the potential to be used as stratifying scores, along with the known cancer PRS.
Independent effect of significant genetic scores and age on cancer
We constructed the final models for each cancer type using only eQS, IMS, and cancer PRS that were significant based on the permutation test (Table 2, Fig. 3 and Supplementary Figs. 5–7). We observed significant associations between our genetic scores (eQS or IMS) and cancer, as well as the known cancer PRS, for all cancer types except kidney cancer (range of our genetic scores (eQS or IMS):1–4). Although the effect of the known cancer PRS was generally stronger than that of our genetic scores, models incorporating our genetic scores showed significant improvements compared with those using only the known cancer PRS (Supplementary Table 11). We also identified seven genetic scores that remained significant after Bonferroni correction: eQS for MAP3K1 and IDH2 in breast cancer, eQS for PAX8 in cervical cancer, eQS for NRG1 in thyroid cancer, IMS26 in lung cancer, IMS16 in prostate cancer, and IMS8 in testicular cancer.
Fig. 3: The forest plots display the final models with significantly associated genetic scores (known cancer PRS, eQS, and IMS). Age was also included in the models. Significance levels are denoted by the number of asterisks (***: p < 0.001, **: p < 0.01, *: p < 0.05). Odds ratios (OR < 1 in red and OR > 1 in blue) and 95% CI are shown for each cancer type.
In addition to analyzing genetic factors, we evaluated the effect of age, a known risk factor for most types of cancer. Our results indicated a significant positive association with age for most types of cancer, except for cervical and testicular cancers, which often manifest at a relatively young age (Table 2 and Fig. 3).
Discussion
In this study, we examined germline variants that are functionally associated with cancer biology and assessed their effects on the risk of cancer using genetic scores. We showed a significant association of these functional genetic scores with 15 different cancer types, independent of known cancer PRS, which are composed of multiple cancer susceptibility loci. Our study suggests that functional genetic scores may provide a complementary tool for stratifying individuals with known PRS, particularly those at a high risk of testicular and prostate cancers.
Our findings suggest that germline variants that are functionally relevant to cancer could have a substantial impact on cancer when constructed as genetic scores, and that the genetic scores can be interpreted based on biological knowledge. We observed protective associations of IMS8, IMS16, and IMS26 with testicular, prostate, and lung cancers, respectively (Fig. 2 and Table 2). Although the variants for these scores displayed both positive and negative associations for each immune trait in the original study of TCGA samples, we can infer that the sum of the variants may have a protective effect on those cancers15. Specifically, eight of the nine variants composing IMS26 exhibited a negative association with pan-cancer, and four variants showed a negative association in lung adenocarcinoma and lung squamous cell carcinoma from TCGA data (2:163110536:A:G [rs2111485], 2:163260691:C:T [rs17716942], 5:138857919:C:T [rs1131769], and 3:133015467:A:G [rs35356925]). Considering that the variants composing IMS26, IMS8, and IMS16 were associated with gene expression of interferon responses, CD8+ T cells, and effector memory T cells, respectively, which are known to play crucial roles in anti-tumor activity in cancer, our results suggest that these genes may play an important role in the development of lung, testicular, and prostate cancers compared to other types of cancer (Supplementary Table 10)21,22,24,25,26. In addition, a higher B cell proportion (IMS1) was associated with testicular cancer. This finding is consistent with the results of a study involving immunohistochemical characterization (IHC) of infiltrating immune cells in testicular cancer, where a high number of B cells were exclusively observed in germ cell neoplasia in situ (n = 26) and seminoma (n = 18), but not in cases of hypospermatogenesis (n = 10) or normal spermatogenesis (n = 18)27.
We used genome-wide significant SNPs from a published study by Graff et al. without considering the LD SNPs of the selected GWAS SNPs or genome-wide significant SNPs in other sources in the construction of the known cancer PRS. There may, therefore, be variants within our genetic scores (eQS or IMS) that overlap with cancer GWAS signals which were not considered in our study. For example, we identified rs80310238 as a component of eQS for MAP3K1 in breast cancer and showed that rs80310238 is associated with an increased risk of breast cancer28. This occurred because the breast cancer PRS used in our study was based on a study by Graff et al., which did not include rs80310238 as a breast cancer risk variant5. Our study highlights the significance of the functional relevance of variants that may overlap with cancer GWAS signals.
Our study had several limitations. Cancer subtypes were not considered, although they may have distinct biological mechanisms, such as different driver genes and immunological events. Subgroup analysis by tumor subtype may identify additional significant associations with genetic scores, but it requires appropriate datasets with more samples. We considered the risk of cancer only among individuals of European ancestry, which may limit the generalizability of our findings to other ancestral groups. We evaluated only the effects of genetic factors and age on cancer; hence, the effects of other cancer risk factors, including tobacco use and family history, were not considered. Although our study included data for cancer cases and controls from a large dataset (UKB) and considered statistics from another large dataset (TCGA), repeating these analyses using different datasets is warranted.
Our study demonstrated the impact of germline variants with functional relevance on cancer gene expression and cancer immune responses. This finding may contribute to interpretation of the genetic architecture of the cancer genome, in addition to considering the signals derived from cancer GWASs.
Data availability
Genetic and phenotypic data from UK Biobank samples are available to researchers with approved access, which can be requested through the UK Biobank website (https://www.ukbiobank.ac.uk/). Summary statistics for the known cancer PRS, eQTLs, and the 33 immune traits can be obtained from the respective studies5,12,15,17.
References
1. Sud, A., Kinnersley, B. & Houlston, R. S. Genome-wide association studies of cancer: current insights and future perspectives. Nat Rev Cancer 17, 692–704 (2017).
2. Rashkin, S. R. et al. Pan-cancer study detects genetic risk variants and shared genetic basis in two large cohorts. Nat Commun 11, 4423 (2020).
3. Choi, J., Jia, G., Wen, W., Long, J. & Zheng, W. Evaluating polygenic risk scores in assessing risk of nine solid and hematologic cancers in European descendants. Int J Cancer 147, 3416–3423 (2020).
4. Jia, G. et al. Evaluating the utility of polygenic risk scores in identifying high-risk individuals for eight common cancers. JNCI Cancer Spectr 4, pkaa021 (2020).
5. Graff, R. E. et al. Cross-cancer evaluation of polygenic risk scores for 16 cancer types in two large cohorts. Nat Commun 12, 970 (2021).
6. Fritsche, L. G. et al. Cancer PRSweb: An online repository with polygenic risk scores for major cancer traits and their evaluation in two independent biobanks. Am J Hum Genet 107, 815–836 (2020).
7. Hanahan, D. Hallmarks of Cancer: New Dimensions. Cancer Discov 12, 31–46 (2022).
8. Hanahan, D. & Weinberg, R. A. Hallmarks of cancer: the next generation. Cell 144, 646–674 (2011).
9. Chatrath, A., Ratan, A. & Dutta, A. Germline variants that affect tumor progression. Trends Genet 37, 433–443 (2021).
10. Bailey, M. H. et al. Comprehensive characterization of cancer driver genes and mutations. Cell 173, 371–385.e18 (2018).
11. Carter, H. et al. Interaction Landscape of Inherited Polymorphisms with Somatic Events in Cancer. Cancer Discov 7, 410–423 (2017).
12. Gong, J. et al. PancanQTL: systematic identification of cis-eQTLs and trans-eQTLs in 33 cancer types. Nucleic Acids Res 46, D971–D976 (2018).
13. O’Donnell, J. S., Teng, M. W. L. & Smyth, M. J. Cancer immunoediting and resistance to T cell-based immunotherapy. Nat Rev Clin Oncol 16, 151–167 (2019).
14. Thorsson, V. et al. The Immune Landscape of Cancer. Immunity 48, 812–830.e14 (2018).
15. Sayaman, R. W. et al. Germline genetic contribution to the immune landscape of cancer. Immunity 54, 367–386.e8 (2021).
16. Bycroft, C. et al. The UK Biobank resource with deep phenotyping and genomic data. Nature 562, 203–209 (2018).
17. Chakravarty, D. et al. OncoKB: A Precision oncology knowledge base. JCO Precis Oncol 1, 1–16 (2017).
18. Chen, M. H. et al. Trans-ethnic and Ancestry-Specific Blood-Cell Genetics in 746,667 Individuals from 5 Global Populations. Cell 182, 1198–1213.e14 (2020).
19. Astle, W. J. et al. The allelic landscape of human blood cell trait variation and links to common complex disease. Cell 167, 1415–1429.e19 (2016).
20. Vuckovic, D. et al. The polygenic and monogenic basis of blood traits and diseases. Cell 182, 1214–1231.e11 (2020).
21. Boukhaled, G. M., Harding, S. & Brooks, D. G. Opposing Roles of Type I Interferons in Cancer Immunity. Annu Rev Pathol 16, 167–198 (2021).
22. Budhwani, M., Mazzieri, R. & Dolcetti, R. Plasticity of Type I Interferon-Mediated Responses in Cancer Therapy: From Anti-tumor Immunity to Resistance. Front Oncol 8, 322 (2018).
23. Jorgovanovic, D., Song, M., Wang, L. & Zhang, Y. Roles of IFN-gamma in tumor progression and regression: a review. Biomark Res 8, 49 (2020).
24. Liu, Q., Sun, Z. & Chen, L. Memory T cells: strategies for optimizing tumor immunotherapy. Protein Cell 11, 549–564 (2020).
25. Barach, Y. S., Lee, J. S. & Zang, X. T cell coinhibition in prostate cancer: new immune evasion pathways and emerging therapeutics. Trends Mol Med 17, 47–55 (2011).
26. Raskov, H., Orhan, A., Christensen, J. P. & Gogenur, I. Cytotoxic CD8(+) T cells in cancer and cancer immunotherapy. Br J Cancer 124, 359–367 (2021).
27. Klein, B. et al. Specific immune cell and cytokine characteristics of human testicular germ cell neoplasia. Hum Reprod 31, 2192–2202 (2016).
28. Michailidou, K. et al. Association analysis identifies 65 new breast cancer risk loci. Nature 551, 92–94 (2017).
Acknowledgements
This study was approved by the UK Biobank (Application Number 33002). This research was supported by the Bio&Medical Technology Development Program of the National Research Foundation (NRF), funded by the Korean government (MSIT) (RS-2023-00223277), and the Korea Health Technology R&D Project through the Korea Health Industry Development Institute (KHIDI), funded by the Ministry of Health & Welfare, South Korea (RS-2020-KH088686 and HI19C1328).
Author information
Contributions
SC, DK, and H-HW designed the study and interpreted the data. IS processed the UKB data. SC analyzed the data and wrote the draft. IS, S-HJ, BK, SK, M-SP, HC, YN, W-YP, DK, and H-HW contributed to the manuscript review. All the authors have read and approved the final version of the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Communications Medicine thanks Jyotsna Batra and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. A peer review file is available.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Cha, S., Shim, I., Jung, SH. et al. Germline genetic scores associated with cancer gene expression and immune responses across multiple cancer types. Commun Med 5, 265 (2025). https://doi.org/10.1038/s43856-025-00958-9