Introduction

Lymphoid malignancies (LM) are neoplasms that arise in the lymph nodes, other lymphatic tissue, or the lymphatic cells of other organs. They are heterogeneous with respect to biology, aggressiveness, and treatment, with over 100 subtypes, including the common subtypes of diffuse large B-cell lymphoma (DLBCL), follicular lymphoma (FL), mantle cell lymphoma (MCL), marginal zone lymphoma (MZL), small lymphocytic lymphoma (SLL), T-cell lymphoma (TCL), Hodgkin lymphoma (HL), Waldenström macroglobulinemia (WM), and multiple myeloma (MM) [1, 2]. Collectively, there were an estimated 125,110 new cases in the United States in 2023 [3].

Established risk factors for LM, or for specific subtypes, include family history of hematologic cancer, infectious agents, immune dysregulation, and chemical/toxin exposure [4]. Genetic epidemiology studies have shown that LM clusters in families and that family history, irrespective of subtype, is a risk factor for LM [5]. However, a family history of a specific subtype is generally most strongly associated with an increased risk of the same subtype [6]. In a large study of over 150,000 individuals with hematological malignancies, a family history of a specific subtype was associated with a >4-fold higher risk of the same subtype for chronic lymphocytic leukemia (CLL), MCL, and HL, and a 2- to 3-fold higher risk for DLBCL, FL, and MM [7]. This study also showed that a family history of a specific LM subtype is associated with a risk of other subtypes.

The familial clustering of LM indicates that inherited genetics play an important role in the etiology of LM. Genome-wide association studies (GWAS) have identified susceptibility variants for risk of HL, CLL, FL, DLBCL, MZL, WM, and MM [8,9,10,11,12]. These variants, in the form of single nucleotide polymorphisms (SNPs), are relatively common and have small effects on risk. However, the contribution of rare mutations to the risk of LM is less well understood, particularly in the general population.

Genetic variants that alter the function of a gene, most notably protein-truncating variants, can increase the risk of disease. These variants, called pathogenic variants (PV), are rare in the general population. Many of the genes that most commonly harbor PV associated with cancer risk are involved in DNA damage repair. For example, PV in BRCA1 and BRCA2 are notably associated with increased risk of breast (5- to 10-fold increase) [13, 14] and ovarian (5- to 11-fold increase) cancer [15]. CHEK2 is associated with breast cancer (~2-fold increase) [13, 14], prostate cancer (~4-fold increase) [16, 17], and thyroid cancer (1.6- to 3-fold increase) [18, 19]. ATM is associated with breast cancer (~2-fold increase) [13, 14], ovarian cancer (1.7-fold increase) [15], and pancreatic cancer (4-fold increase) [20]. Given the strong associations of PV in these genes with several solid tumor malignancies, individuals diagnosed with these cancers often qualify for genetic testing under National Comprehensive Cancer Network (NCCN) criteria, based on patient or clinical characteristics at the time of diagnosis (e.g., age at diagnosis, cancer subtype, or family history), because of the increased likelihood that they carry a PV in a known cancer predisposition gene [21]. Knowing mutation status can inform therapy for the individual as well as risk assessment and management for family members without cancer.

To date, our understanding of the relationship between PV in known cancer predisposition genes and risk of LM is limited, but some suggestive evidence of an association has come from small population studies, family studies, and predisposition syndromes. Increased incidence of lymphoma has been reported in Li-Fraumeni syndrome (TP53) [22] and Lynch syndrome (MSH2, MLH1, MSH6) [23, 24]. Survivors of pediatric or adolescent lymphomas (N = 1380) were enriched for BRCA2 mutations compared to gnomAD [25]. A recent study from the Japanese Biobank found that PV in BRCA1, BRCA2, ATM, and TP53 were associated with the risk of lymphoma overall [26]. However, further research with large sample sizes is necessary to understand the prevalence of PV and the associated risk of LM, both overall and within LM subtypes. Herein, we investigate the association between PV in 19 cancer predisposition genes commonly found on genetic testing panels and the risk of LM and major LM subtypes using a large clinic-based case-control study.

Methods

Study participants

Lymphoma cases

Lymphoma cases (excluding CLL and MM) included newly diagnosed Hodgkin and non-Hodgkin lymphomas from the Mayo component of the Molecular Epidemiology Resource (MER), a prospective observational cohort study of newly diagnosed lymphoma patients from the Iowa/Mayo Lymphoma Specialized Program of Research Excellence (SPORE) [27]. Cases were eligible for this analysis if they were aged 18 years or older, were enrolled from 2002 to 2018, and had an available blood sample for DNA sequencing. All pathology was centrally reviewed and classified according to the WHO Classification [28]. Participants were offered an enrollment questionnaire (completion rate 82%) and a risk factor questionnaire (completion rate 76%), which included family history of hematological malignancies. MM cases were recruited from individuals having a bone marrow biopsy as part of their clinical workup for MM diagnosis at Mayo Clinic. MM cases were eligible for this analysis if they were aged 18 years or older, were seen between 1998 and 2022, and had an available blood sample as a source of DNA for sequencing. All MM diagnoses were confirmed by a hematopathologist. Family history of hematologic malignancy was obtained through medical record abstraction or a risk factor questionnaire.

Controls

Controls were from the Mayo Clinic Biobank, which is a large-scale bio-repository of Mayo patients aged 18 years or older who were enrolled between 2009 and 2016, mainly from primary care clinics [29]. Participants completed an enrollment questionnaire, which included a personal and family history of hematological cancers. Participants provided a blood sample for DNA sequencing. Participants with a personal history of a hematologic malignancy were excluded from this analysis.

DNA sequencing and bioinformatics analysis

Germline DNA samples underwent exome sequencing at the Regeneron Genetics Center (RGC) using a high-throughput automated approach. The targeted bases were captured using the Twist Comprehensive Exome design. The captured libraries were sequenced on the Illumina NovaSeq 6000 system on S4 flow cells using paired-end 75 bp reads. Genetic variants were called using a Parabricks accelerated version of DeepVariant v0.10. High-quality sequence data (>20× read depth) was observed in >95% of samples. Variants were annotated and classified using the Biological Reference Repository, a toolkit for annotating variants using public and user-specific annotation resources in indexed JSON-encoded flat files (catalogs) [30]. PVs, defined as loss-of-function variants (i.e., nonsense, frameshift, or consensus splice site (±1 or 2) variants) or variants identified as “pathogenic” or “likely pathogenic” in the ClinVar database, were identified in 19 cancer predisposition genes (ATM, BARD1, BRCA1, BRCA2, BRIP1, CDKN2A, CHEK2, MLH1, MRE11, MSH2, MSH6, NBN, NF1, PALB2, PTEN, RAD50, RAD51C, RAD51D, TP53). PV in TP53 were restricted to those with alternate allele fractions (calculated as the ratio of alternate allele reads to the total number of reads at a specific genomic position) between 0.3 and 0.7 in an attempt to exclude potential clonal hematopoiesis [31]. The distributions of the variant allele fraction for the PVs in each gene are shown in Fig. S1.
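
The PV definition and the TP53 allele-fraction restriction described above can be sketched as a simple filter. This is a minimal Python illustration and not the pipeline used in the study: the record fields (gene, consequence, clinvar, alt_reads, total_reads) and the consequence labels are hypothetical names introduced here, and the 0.3 and 0.7 bounds are assumed to be inclusive.

```python
# Illustrative sketch only. Field names and consequence labels are
# hypothetical; they are not taken from the study's annotation pipeline.

PANEL_GENES = {
    "ATM", "BARD1", "BRCA1", "BRCA2", "BRIP1", "CDKN2A", "CHEK2", "MLH1",
    "MRE11", "MSH2", "MSH6", "NBN", "NF1", "PALB2", "PTEN", "RAD50",
    "RAD51C", "RAD51D", "TP53",
}

# Loss-of-function classes named in the text (labels are assumptions).
LOF_CONSEQUENCES = {"nonsense", "frameshift", "consensus_splice"}
CLINVAR_PATHOGENIC = {"pathogenic", "likely pathogenic"}

# Bounds from the text; treating them as inclusive is an assumption.
TP53_AAF_RANGE = (0.3, 0.7)


def alt_allele_fraction(alt_reads, total_reads):
    """Ratio of alternate-allele reads to all reads at a position."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return alt_reads / total_reads


def is_pathogenic_variant(variant):
    """Return True if a variant record meets the PV definition sketched above."""
    gene = variant["gene"]
    if gene not in PANEL_GENES:
        return False

    is_lof = variant.get("consequence") in LOF_CONSEQUENCES
    is_clinvar = str(variant.get("clinvar", "")).lower() in CLINVAR_PATHOGENIC
    if not (is_lof or is_clinvar):
        return False

    # TP53 only: require an allele fraction compatible with a germline
    # heterozygous call, to reduce inclusion of clonal hematopoiesis.
    if gene == "TP53":
        aaf = alt_allele_fraction(variant["alt_reads"], variant["total_reads"])
        low, high = TP53_AAF_RANGE
        return low <= aaf <= high
    return True


# Made-up example records, for illustration only.
examples = [
    {"gene": "TP53", "consequence": "nonsense", "alt_reads": 10, "total_reads": 100},
    {"gene": "TP53", "consequence": "nonsense", "alt_reads": 48, "total_reads": 100},
    {"gene": "BRCA2", "consequence": "missense", "clinvar": "Likely pathogenic"},
    {"gene": "ATM", "consequence": "missense"},
]
flags = [is_pathogenic_variant(v) for v in examples]
```

In this sketch the first TP53 record is dropped because its allele fraction (0.10) falls outside the window, while the second (0.48) is kept.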

United Kingdom Biobank (UKB)

To validate our findings, we used lymphoma patients and controls from the UKB. The UKB is a prospective cohort of >500,000 individuals aged 40–70 years at the time of recruitment, of whom 450,000 had whole exome sequencing (WES) [32]. We used the International Classification of Disease (ICD) codes to identify lymphoid (ICD code C80) or MM (C90) patients. Prevalent and incident cases were included in the analyses. Controls were defined as those with no personal history of any cancer. In total, 4772 cases and 389,731 controls were included. PVs were identified using the same definition as above. The analyses were performed under the UKB application no. 79864.

Statistical analysis

Frequencies of PV in each gene were tabulated for cases and controls. The association between PV in each gene and overall LM risk was estimated using odds ratios (OR) and 95% confidence intervals (CI) from logistic regression models, adjusted for age (at the time of diagnosis for cases and at the time of enrollment into the biobank for controls) and sex. Separate analyses by LM subtype were also conducted. Further analyses were conducted stratified by age, sex, and family history of hematologic malignancy. Case-only analyses were performed to compare patient characteristics between PV carriers and non-carriers in genes significantly associated with the risk of LM. All analyses were performed in R (version 4.2.2). Statistical significance was set at p < 0.003 to account for multiple testing (0.05/19 genes) for the analysis of LM overall. For UKB, Fisher’s exact test was used to compare the frequency of PV carriers between cases and controls by gene. Meta-analyses were performed across the Mayo Clinic (discovery) and UKB (validation) datasets for genes where the Mayo Clinic cases had at least 4 PV carriers. Meta-analyses were conducted using the ‘metafor’ package to fit random-effects models via the restricted maximum likelihood method.
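
The multiple-testing threshold and the Fisher's exact test used for the UKB comparison can be illustrated with a short, dependency-free Python sketch. The study itself was analyzed in R, so this is not the authors' code; the function names are introduced here for illustration and the carrier counts in the usage lines are invented, not study data.

```python
import math


def bonferroni_alpha(alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests


def _log_comb(n, k):
    """Natural log of the binomial coefficient C(n, k), via log-gamma."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table

                   carriers   non-carriers
        cases         a            b
        controls      c            d

    The p-value sums the hypergeometric probabilities of all tables
    (with the same margins) that are no more likely than the observed one.
    """
    n_cases, n_controls = a + b, c + d
    n_carriers = a + c
    n_total = n_cases + n_controls

    def log_prob(x):
        # Log-probability of x carriers among cases, given fixed margins.
        return (_log_comb(n_cases, x)
                + _log_comb(n_controls, n_carriers - x)
                - _log_comb(n_total, n_carriers))

    lowest = max(0, n_carriers - n_controls)
    highest = min(n_cases, n_carriers)
    log_observed = log_prob(a)

    p_value = 0.0
    for x in range(lowest, highest + 1):
        lp = log_prob(x)
        if lp <= log_observed + 1e-7:  # small tolerance for float ties
            p_value += math.exp(lp)
    return min(1.0, p_value)


# 0.05 / 19 genes is about 0.0026, reported in the text as p < 0.003.
alpha = bonferroni_alpha(0.05, 19)

# Invented counts for illustration: 12 carriers among 1,000 cases
# versus 60 carriers among 10,000 controls.
p_example = fisher_exact_two_sided(12, 988, 60, 9940)
is_significant = p_example < alpha
```

Working on the log scale avoids the very large binomial coefficients that would otherwise arise with biobank-sized control groups.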

Results

Patient characteristics

The Mayo Clinic study included a total of 6990 LM cases and 42,632 controls, all unrelated individuals (Table 1). The median age for cases and controls was 63 years (range 19–99 years). The proportion of males was 58.5% in cases and 41.7% in controls. In both cases and controls, EUR ancestry accounted for 96.5% of the study population, followed by 1.4% AFR ancestry. Self-reported positive family history of lymphoma or leukemia was 14.9% in cases and 8.6% in controls. As expected, increasing age (OR = 1.007 [continuous], 95% CI: 1.005–1.009), male sex (OR = 1.88, 95% CI: 1.76–2.00), and a positive family history of a first-degree relative with a hematologic malignancy (OR = 1.92, 95% CI: 1.75–2.10) were all significantly associated with LM. Non-Hodgkin lymphoma subtypes (NHL, n = 4395), HL (n = 457), and MM accounted for 62.9%, 6.5%, and 30.6% of cases, respectively. Within NHL, the most prevalent subtype was DLBCL (n = 1146, 26.1%), followed by FL (n = 1132, 25.8%), MZL (n = 444, 10.1%), TCL (n = 369, 8.4%), MCL (n = 325, 7.4%), SLL (n = 192, 4.4%), and other B-cell lymphomas (n = 787, 17.9%).

Table 1 Patient characteristics by case/control status and by lymphoid malignancy subtype.

Association of predisposition genes and lymphoid malignancies overall

The prevalence of PV across the 19 cancer predisposition genes was 4.7% in cases and 3.5% in controls. Among cases, the highest prevalence of PV was observed for CHEK2 (1.8%), ATM (0.8%), BRCA1 (0.3%), BRCA2 (0.3%), and TP53 (0.3%) (Fig. 1, Table S1). PVs in ATM (OR = 1.86, 95% CI: 1.36–2.49), CHEK2 (OR = 1.74, 95% CI: 1.42–2.13), and TP53 (OR = 9.07, 95% CI: 4.51–18.87) were associated with increased risk of LM (all p < 5.7 × 10⁻⁵). The remaining genes showed either no evidence of an association with the risk of LM (e.g., BRCA1, BRCA2) or had too few patients with PVs to assess associations (e.g., CDKN2A, MLH1, MSH2, NF1, PTEN, and RAD51D). The results remained unchanged when restricted to cases and controls who did not have a prior history of any cancer (excluding non-melanoma skin cancer, Table S2).

Fig. 1: Association between pathogenic variants in cancer predisposition genes and risk of lymphoid malignancy.

Estimates were adjusted for age and sex in the Mayo Clinic study. Univariate analysis was used in the UK Biobank cohort. OR Odds Ratio, CI Confidence Interval.

In the UKB cohort, we evaluated the 13 genes that had at least 4 cases with PV in the Mayo study. The median age was 65 years in cases and 56 years in controls, and the proportion of males was 55% in cases and 45% in controls.

We found significant associations with LM risk for both ATM (OR = 1.72, 95% CI: 0.96–2.86) and CHEK2 (OR = 1.88, 95% CI: 1.40–2.48), with risk estimates similar to those observed in the Mayo study (Fig. 1). We also observed a significant association for NBN (OR = 1.94, 95% CI: 1.10–3.17), which was not significant in the Mayo study. The association of LM risk with TP53 was weaker (OR = 2.27, 95% CI: 0.46–6.81) than that in the Mayo study and not statistically significant in the UKB cohort. As in the Mayo cohort, no evidence of association was observed for the other genes. In a meta-analysis across the two cohorts (11,762 LM cases and 432,363 controls), we observed significant associations between risk of LM and ATM (OR = 1.83, 95% CI: 1.40–2.38), CHEK2 (OR = 1.79, 95% CI: 1.51–2.11), and TP53 (OR = 5.13, 95% CI: 1.35–19.51) (Fig. 1).
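
As a rough check on how the two cohort estimates combine, the pooling step can be sketched from the published ORs and CIs alone. This Python sketch makes two assumptions that differ from the study's method: standard errors are back-calculated from the reported 95% CIs as if they were symmetric on the log scale (the UKB intervals come from an exact test, so this is only approximate), and the between-study variance uses the DerSimonian-Laird estimator rather than the restricted maximum likelihood fit from 'metafor'. Function names are introduced here for illustration.

```python
import math
from statistics import NormalDist

Z_975 = NormalDist().inv_cdf(0.975)  # about 1.96


def log_or_and_se(odds_ratio, ci_low, ci_high):
    """Log odds ratio and a standard error back-calculated from a 95% CI."""
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * Z_975)
    return math.log(odds_ratio), se


def pool_random_effects(estimates):
    """DerSimonian-Laird random-effects pooling of (OR, low, high) tuples.

    Returns (pooled OR, CI low, CI high, tau squared).
    """
    log_ors, variances = [], []
    for odds_ratio, ci_low, ci_high in estimates:
        y, se = log_or_and_se(odds_ratio, ci_low, ci_high)
        log_ors.append(y)
        variances.append(se ** 2)

    # Fixed-effect (inverse-variance) estimate, needed for Cochran's Q.
    w = [1.0 / v for v in variances]
    fixed = sum(wi * yi for wi, yi in zip(w, log_ors)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, log_ors))
    df = len(log_ors) - 1

    # Between-study variance, truncated at zero.
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0

    # Random-effects weights and pooled estimate.
    w_re = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_re, log_ors)) / sum(w_re)
    se_pooled = math.sqrt(1.0 / sum(w_re))
    return (math.exp(pooled),
            math.exp(pooled - Z_975 * se_pooled),
            math.exp(pooled + Z_975 * se_pooled),
            tau2)


# ATM estimates as reported in the text: Mayo Clinic, then UKB.
atm = [(1.86, 1.36, 2.49), (1.72, 0.96, 2.86)]
atm_or, atm_low, atm_high, atm_tau2 = pool_random_effects(atm)
```

Under these assumptions the pooled ATM estimate comes out close to the reported meta-analysis value (OR = 1.83, 95% CI: 1.40–2.38). The estimated between-study variance is zero, so the random-effects result coincides with a fixed-effect inverse-variance average.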

In the Mayo study, we performed stratified analyses for age (<60 years of age or ≥60 years of age), sex (male or female), and family history of hematologic malignancy (positive or negative first-degree family history). Among participants under the age of 60 years (n cases = 2804, n controls = 17,804), the frequency of PVs was 4.9% in cases and 3.8% in controls. PVs in CHEK2 (OR = 1.71, 95% CI: 1.24–2.32) and TP53 (OR = 23.12, 95% CI: 5.78–154.42) were associated with increased risk of LM in participants under age 60. Interestingly, RAD51C had an OR > 2 but did not reach statistical significance (p = 0.094, Table S3). Among those aged 60 years or older (n cases = 4182, n controls = 24,652), the frequency of PVs was 4.5% in cases and 3.2% in controls. PVs in ATM (OR = 2.10, 95% CI: 1.42–3.04), CHEK2 (OR = 1.74, 95% CI: 1.32–2.26), and TP53 (OR = 6.03, 95% CI: 2.57–14.17) were all significantly associated with the risk of LM. Stratifying by sex, among males (n cases = 4090, n controls = 17,762), the frequency of PVs was 4.5% in cases and 3.7% in controls (Table S4). PVs in ATM (OR = 2.04, 95% CI: 1.36–3.00), CHEK2 (OR = 1.33, 95% CI: 1.00–1.74), and TP53 (OR = 12.52, 95% CI: 4.77–38.83) were significantly associated with the risk of LM. For females (n cases = 2897, n controls = 24,870), the frequency of PVs was 4.9% in cases and 3.4% in controls. Similarly, PVs in ATM (OR = 1.64, 95% CI: 0.99–2.59), CHEK2 (OR = 2.45, 95% CI: 1.82–3.26), and TP53 (OR = 6.09, 95% CI: 2.00–17.55) were significantly associated with the risk of LM. Finally, for individuals with a positive family history of hematologic malignancy (n cases = 709, n controls = 3376), the frequency of PVs was 5.4% in cases and 3.7% in controls (Table S5). PVs in CHEK2 (OR = 2.70, 95% CI: 1.52–4.68) were significantly associated with increased risk of LM; ATM (OR = 2.27, 95% CI: 0.84–5.63) did not reach statistical significance, and TP53 did not have enough mutation carriers to assess risk.

Association of predisposition genes and lymphoid subtypes

Given the heterogeneous nature of lymphoid malignancies, we further investigated the prevalence of PV in the cancer predisposition genes and the risk associated with each subtype (Tables S6–S14). The collective frequency of PVs across the subtypes in the Mayo study ranged from 2.0% to 10.5%, with HL having the lowest frequency and MCL having the highest. PVs in ATM, CHEK2, NBN, and TP53 were associated with at least one LM subtype, while all other genes showed no evidence of subtype-specific associations (Fig. 2). PVs in ATM were associated with increased risk of FL (OR = 2.22, 95% CI: 1.13–3.90), MCL (OR = 8.62, 95% CI: 4.48–15.12), MM (OR = 1.97, 95% CI: 1.16–3.12), and TCL (OR = 3.11, 95% CI: 1.16–6.85). PVs in CHEK2 were associated with increased risk of SLL (OR = 3.65, 95% CI: 1.54–7.24), DLBCL (OR = 1.93, 95% CI: 1.23–2.89), and MM (OR = 1.75, 95% CI: 1.24–2.41). Additionally, OR estimates for FL, MCL, MZL, and TCL were all elevated (OR ~1.5 or higher) for CHEK2, but none were statistically significant. Finally, PVs in TP53 were associated with increased risk of DLBCL (OR = 10.97, 95% CI: 3.06–31.43), MCL (OR = 38.30, 95% CI: 10.27–116.69), MM (OR = 7.67, 95% CI: 2.43–20.72), and other B-cell lymphomas (OR = 16.12, 95% CI: 4.47–46.62).

Fig. 2: Risk of lymphoid malignancy subtype by cancer predisposition gene.

Data are shown for ATM (A), CHEK2 (B), and TP53 (C). Genes were selected based on being significantly associated with at least one lymphoid subtype. OR Odds Ratio, CI Confidence Interval. Estimates were adjusted for age and sex. NA denotes not applicable (too few events [<4] to calculate a stable odds ratio). SLL small lymphocytic lymphoma, DLBCL diffuse large B-cell lymphoma, FL follicular lymphoma, HL Hodgkin lymphoma, MCL mantle cell lymphoma, MM multiple myeloma, MZL marginal zone lymphoma.

Patient characteristics of cases with PV

Given that ATM, CHEK2, and TP53 were significantly associated with LM risk, we next investigated patient characteristics of cases with or without PV in these genes (Table 2). A total of 202 (2.9%) cases had a PV. Overall, cases who were PV carriers across these three genes had an elevated frequency of positive family history of hematologic malignancies compared to non-carrier cases (20.6% vs. 14.7%, p = 0.06), which was largely driven by the frequency of positive family history among CHEK2 PV carriers (22.7%). However, PV carriers and non-carriers had a similar median age at diagnosis (median age 63), a similar proportion of cases under age 60 (40.1% vs. 41.1%, respectively, p = 0.78), and a similar proportion of males (58.6% vs. 57.9%, respectively, p = 0.86). Lastly, 19.6% of cases with a PV had a prior history of solid tumor cancer, which was not significantly different from the 15.6% among non-carrier cases (p = 0.27).

Table 2 Patient characteristics by pathogenic variant status overall and by gene.

Discussion

In this case-control study of 6990 cases and 42,632 controls, we report the prevalence of PV in 19 cancer predisposition genes commonly found on genetic testing panels and provide estimates of the risk of LM associated with these PV. Overall, cases had a significantly higher prevalence of PV compared to controls. ATM and CHEK2 were significantly associated with an approximately 2-fold increased risk, while TP53 was associated with a 9-fold increased risk of LM. These associations were validated using 4772 LM cases and 389,731 controls from the UKB, further confirming that these cancer predisposition genes are associated with LM. Furthermore, PV in ATM, CHEK2, and TP53 were associated with moderate to high risk of specific LM subtypes, but not all subtypes, highlighting the heterogeneity of LM.

The cancer predisposition genes investigated here are associated with the risk of an array of solid tumor cancers, with some evidence in LM. In an analysis of 23 hereditary cancer genes from 3 cohorts, carriers of PV in ATM had a 1.9-fold increased risk of NHL [33]. CHEK2 was associated with a 2-fold elevated risk of NHL in a Polish study [34], a finding that was replicated in a Czech study (2.9-fold increase) [35]. Lymphoma risk was associated with PV in ATM (2.6-fold increase) and TP53 (5.2-fold increase) in a recent study using a Japanese biobank [26]. Our results for ATM, CHEK2, and TP53 from both the Mayo Clinic and UKB cohorts are consistent with these smaller studies. Collectively, these results provide strong evidence that these cancer predisposition genes are associated with LM. In contrast, a few studies have suggested that the BRCA genes are associated with increased risk of NHL, HL, or MM, including BRCA1 (7.7-fold increase) and BRCA2 (5.9-fold increase) in the Japanese population [26], BRCA2 (3.3-fold increase) in pediatric or adolescent lymphoma survivors from the St. Jude Life cohort [25], and BRCA1 (3.9-fold increase) and BRCA2 (7.0-fold increase) in MM cases compared to gnomAD controls [36]. We did not find evidence of an association between PV in either gene and the risk of LM. In support of our BRCA1 and BRCA2 results, there was no increased frequency of PV in LM cases compared to controls in the UKB for BRCA1 (0.2% in cases vs. 0.1% in controls) or BRCA2 (0.3% in cases vs. 0.3% in controls). Further research is required to understand these discrepancies. Possible explanations include the need for replication in these target populations; population-specific associations through founder mutations, although this is unlikely given the similar frequencies of PV among controls in the Japanese population and in our study; an age-related association that could not be detected in our study because of the older age of onset compared to the St. Jude study; or comparison to gnomAD controls, which may have a different population structure from that of the ascertained cases.

Consistent with the heterogeneous nature of LM, we also observed heterogeneity in the genes associated with LM. The 3 genes (ATM, CHEK2, TP53) associated with overall LM in our study were generally associated with multiple subtypes, but not consistently the same subtypes. For example, the risk of DLBCL was associated with PV in CHEK2 and TP53 but not ATM. FL was associated with PV in ATM but not CHEK2 or TP53. Studies investigating the relationship between PV in cancer predisposition genes and MM are lacking. Here, MM was the only subtype associated with PV in all three genes, ATM, CHEK2, and TP53. In the Japanese population, PV collectively in ATM, BRCA1, BRCA2, and TP53 was most highly associated with the risk of MCL [26]. Both ATM and TP53 were highly associated with the risk of MCL in our study, with 8.62- and 38.3-fold increased risk, respectively. HL had too few cases with a PV in any gene to estimate risk, which may suggest that genes predominantly associated with the DNA damage response are not associated with HL. It is worth noting that the CHEK2 risk estimate was consistently elevated (~2-fold) for each subtype (except for HL) but was not always statistically significant.

All three genes found to be associated with LM in this study are directly related to the DNA damage response (DDR) pathway. Mutations in either ATM or CHEK2 disrupt the repair mechanisms for double-strand breaks, impairing the cell’s ability to repair these breaks [37, 38]. Mouse studies have shown that ATM- or CHEK2-deficient mice often develop lymphoma [39,40,41]. TP53 is important to cell cycle checkpoints. The absence of functional TP53 results in checkpoint failures and unrepaired double-strand breaks. As with ATM and CHEK2, mutations in TP53 predispose mice to lymphoma [42, 43]. Our large study is consistent with these mouse models, showing that PV in these genes, which are integral to DNA repair, are associated with the risk of LM.

The presence of PV in cancer predisposition genes has often been shown to be associated with patient and clinical characteristics, such as younger age of onset of disease, family history, or histology [14, 44]. In this study, we found no evidence that PV carriers differed from non-carriers in age of onset, sex, or family history of hematological malignancies among our cases. These results are consistent with those reported in a study from Japan [26]. Our null results with respect to patient characteristics at the time of diagnosis provide little information as to which individuals carry a PV in one of these three genes. Therefore, these results do not inform NCCN criteria to identify individuals who might benefit from genetic testing based on risk of LM, nor do they support the need for routine screening given the moderate effect sizes of the associations. However, this does not mean these results have no clinical utility. Individuals who do qualify for genetic testing based on NCCN criteria for other malignancies can be informed of their risk of LM. Further, family members who may carry a PV in a cancer predisposition gene may benefit from understanding their risks of LM. Additional research is required to investigate the clinical characteristics, response to therapy, toxicity, and survival of patients with LM who carry these inherited PV. This information may inform clinical management, alternative therapies, and surveillance. Finally, the WHO-HAEM5 Classification acknowledges that with the growing number of lymphomas linked to germline predisposition genes, there is a need to incorporate these findings into the classification (as for other organ sites) [2]. The current recommendation is to use conventional criteria for the diagnosis, with a notation for any germline associations. It is not yet clear how many germline associations remain to be identified.

This study is not without limitations. The study population is almost exclusively of European ancestry, so the results may not generalize to non-European ancestries, which should be a focus of future studies. As our DNA source was peripheral blood, some mutations could be somatic (i.e., clonal hematopoiesis of indeterminate potential, or CHIP), most prominently in TP53 [45, 46]. Although we did not confirm our mutations as inherited or somatic through orthogonal sequencing, our variant allele fractions were normally distributed around 0.5 (Fig. S1), which is in line with inherited mutations. To limit the inclusion of CHIP variants, we restricted the VAF to be within 0.3–0.7 for TP53; however, we cannot rule out that a small percentage of the TP53 mutations included in this study were somatic variants from large clones. Nonetheless, our findings agree with other studies that have investigated inherited PV in TP53 in LM patients [26, 47].

In summary, we found that PV in established cancer predisposition genes are associated with a moderate to high risk of LM overall and of LM subtypes. These results can be used to better inform individuals of their risk of LM. Furthermore, this study demonstrates that rare inherited variants are an additional source of variation that expands our understanding of the genetic architecture of LM risk.