Introduction

Lynch syndrome (LS) is an inherited condition resulting from defective DNA mismatch repair (MMR) due to germline pathogenic variants in MLH1 (path_MLH1), MSH2 (path_MSH2), MSH6 (path_MSH6), or PMS2 (path_PMS2)1. The National Comprehensive Cancer Network (NCCN) guideline recognizes colorectal, endometrial, gastric, ovarian, pancreatic, urothelial, brain (usually glioblastoma), biliary tract, and small intestine cancers and sebaceous adenomas, sebaceous carcinomas, and keratoacanthomas (Muir-Torre syndrome) as LS-related cancers2. Clinical guidelines provide selection criteria for genetic testing for LS based on age at diagnosis, personal and family cancer histories, and history of tumor MMR deficiency2. The Amsterdam II criteria and revised Bethesda guidelines are widely used to identify patients and families at risk for LS. The Amsterdam II criteria have high specificity (up to 98%) and low sensitivity (between 27 and 42%), while the revised Bethesda guidelines have higher sensitivity (82–95%) and lower specificity (77–93%)3. According to a study analyzing 10,206 colorectal cancer patients, 27.2% and 68.6% of 312 patients with LS met the Amsterdam II criteria and the revised Bethesda guidelines, respectively4. The NCCN guideline also provides criteria for genetic testing, but these criteria miss around one-third of patients with early-onset colorectal cancer who carry pathogenic variants5.

One approach to improve the sensitivity of testing is universal screening, in which all individuals newly diagnosed with colorectal and endometrial cancer undergo either microsatellite instability (MSI) testing or immunohistochemical testing for the absence of one of the four MMR proteins6. This approach provides a sensitivity of 100% and a specificity of 93.0% for identifying LS in patients with colorectal cancer4. Routine screening for LS in patients with colorectal and endometrial cancer up to 70 years of age is also recommended because it is cost-effective7,8. The NCCN guideline, with sections dedicated to LS, describes a germline multigene panel test (MGPT) strategy as an alternative to tumor- and family history-driven selection of patients with colorectal or endometrial cancer for genetic testing2. While this MGPT strategy offers increased sensitivity, the guideline indicates several evidence gaps. First, its effectiveness in individuals not selected based on clinical criteria remains uncertain. This is partly because existing LS data primarily originate from preselected populations, which may inflate the estimated prevalence of pathogenic variant carriers8,9,10,11,12,13. Second, data on how MGPT results influence subsequent testing of family members are currently insufficient. There are also challenges such as the uncertainty of cost-effectiveness and the lack of data regarding individuals from diverse population backgrounds14,15,16. In addition, an increase in the number of variants of uncertain significance (VUS) identified is expected. The International Society for Gastrointestinal Hereditary Tumors (InSiGHT) database provides annotation of variants in MMR genes; 40% of the variants identified in UK Biobank samples remain VUS in the database9. Therefore, further investigations are required for the germline MGPT strategy.

The present study aimed to provide evidence that can be used to support the development or revision of criteria for germline testing for LS by examining the clinical and demographic profiles of pathogenic variants in MMR genes using biobank samples and comparing our findings with the NCCN guidelines and existing literature. We analyzed the coding regions of MMR genes in 74,085 unselected patients with 23 cancer types and 38,842 individuals without cancer as controls from BioBank Japan. This study reveals carrier frequencies and cancer risks for each type, along with clinical features including age at diagnosis, incidence of multiple cancers, and family history. These data provide useful insights for guidelines on genetic testing for Lynch syndrome.

Methods

Participants

We obtained samples from 74,085 patients diagnosed with 23 different types of cancer (Table 1) through BioBank Japan, which collected DNA and clinical information nationwide between April 2003 and March 2018, without bias toward suspected hereditary diseases10,11,12. The participants were representative of the general patient population in Japan10,11,12. Additionally, we enrolled 38,842 individuals with no cancer history as controls, selected via frequency matching for sex, age, and hospital area.

Table 1 Participant characteristics in the study

All the participants provided written informed consent. This study was approved by the ethics committees of the Institute of Medical Sciences, University of Tokyo, and the RIKEN Center for Integrative Medical Sciences.

Sequencing and bioinformatics analyses

For germline sequencing, we analyzed all coding regions and 2 bp flanking intronic sequences (10,363 bp) of all transcripts (Consensus CDS, release 15; Supplementary Table 1)13 of MLH1, MSH2, MSH6, and PMS2, excluding exons 10–15 of PMS2 owing to the presence of the pseudogene PMS2CL14,15. This was achieved using a multiplex polymerase chain reaction (PCR)-based target sequence method16. Experimental and bioinformatics analyses were conducted as outlined in our prior study17. Multiplex PCR primers for MMR genes are listed in Supplementary Data 1. Variants with minor allele frequencies below 0.1% in the controls in this study and predicted as loss-of-function, categorized as HIGH impact by SnpEff18 or listed as pathogenic and likely pathogenic in ClinVar (October 15, 2022)19, were collectively designated as pathogenic variants.
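The classification rule described above can be written as a small predicate. The following is only a sketch under stated assumptions: the `Variant` record and its field names are hypothetical, the ClinVar significance strings are assumed to follow the usual VCF-style labels, and the rule is read as applying the 0.1% control-frequency filter to both the loss-of-function and the ClinVar branches. It is not the pipeline used in the study.

```python
from dataclasses import dataclass

# 0.1% minor allele frequency in the study controls (threshold from the text).
MAF_THRESHOLD = 0.001

# Assumed ClinVar labels; the exact strings depend on the ClinVar release and parser.
CLINVAR_PATHOGENIC = {"pathogenic", "likely_pathogenic", "pathogenic/likely_pathogenic"}


@dataclass
class Variant:
    """Hypothetical per-variant record; field names are illustrative only."""
    control_maf: float          # minor allele frequency among controls
    snpeff_impact: str          # SnpEff putative impact: HIGH, MODERATE, LOW, MODIFIER
    clinvar_significance: str   # ClinVar clinical significance label


def is_pathogenic(v: Variant) -> bool:
    """Rare in controls AND (SnpEff HIGH impact OR ClinVar pathogenic/likely pathogenic)."""
    rare = v.control_maf < MAF_THRESHOLD
    loss_of_function = v.snpeff_impact.upper() == "HIGH"
    clinvar_hit = v.clinvar_significance.lower() in CLINVAR_PATHOGENIC
    return rare and (loss_of_function or clinvar_hit)
```

Under this reading, a common variant is never designated pathogenic regardless of its annotation, and a rare missense variant without a ClinVar pathogenic label is left unclassified.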

We evaluated the clinical validity of in silico pathogenicity prediction tools to minimize the number of VUS in MMR genes. We predicted the pathogenicity of missense variants (MAF < 0.1%) using 41 software tools (Supplementary Table 2). Subsequently, we conducted a case-control association analysis of individuals with colorectal and endometrial cancers harboring putative pathogenic missense variants, after excluding those with pathogenic or likely pathogenic variants from ClinVar. If a particular software program exclusively identified pathogenic variants, its OR would be similar to that observed in the association analysis of loss-of-function variants and pathogenic or likely pathogenic variants from ClinVar.

To investigate the contribution of copy number deletions of exons in MMR genes to LS, we examined copy number deletions using HI-CNV software20 in 69,494 individuals (61.5% of case and control samples) with single-nucleotide polymorphism array data. This analysis included copy number deletions encompassing exons 8 and 9 of EPCAM, upstream of MSH221. Furthermore, we performed a case-control association analysis of individuals with copy number deletions after excluding those with loss-of-function variants and pathogenic or likely pathogenic variants from ClinVar.

Statistical analysis

In the case-control association analysis, we used logistic regression under a dominant model, with age at registration, sex, and hospital area as covariates to control for potential confounding effects of these variables. We used Welch’s t-test to explore the associations between pathogenic variants and continuous variables such as age at diagnosis. We applied Fisher’s exact test or the Cochran-Armitage test for discrete variables, including multiple cancer types and family history. We evaluated the association between path_MMR and overall survival using Cox regression models adjusted for age at diagnosis, sex, and area, with significance estimated using likelihood ratio tests. To reduce immortal time bias, the survival analysis was limited to individuals enrolled in BioBank Japan within 92 days of diagnosis. All statistical tests were two-sided, with P < 0.05 considered significant. The Bonferroni correction was applied to case-control analyses. All analyses were conducted using R version 4.1.2 (R Foundation). Missing data were excluded from each analysis.
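To make the dominant-model coding concrete, the sketch below computes a crude (unadjusted) odds ratio with a Wald 95% confidence interval from a 2 × 2 table of carriers versus non-carriers. This is a deliberate simplification: the study fitted logistic regression with age, sex, and hospital area as covariates, which this example does not do. The function name is hypothetical, and the 0.5 correction for empty cells (Haldane-Anscombe) is an assumption added here for illustration.

```python
import math


def crude_odds_ratio(case_carriers, case_total, control_carriers, control_total, z=1.959964):
    """Unadjusted odds ratio and Wald CI; carrier = at least one pathogenic allele."""
    a = case_carriers                      # carriers among cases
    b = case_total - case_carriers         # non-carriers among cases
    c = control_carriers                   # carriers among controls
    d = control_total - control_carriers   # non-carriers among controls
    if min(a, b, c, d) == 0:
        # Haldane-Anscombe correction so the estimate stays finite (assumption).
        a, b, c, d = (x + 0.5 for x in (a, b, c, d))
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper
```

With very few carriers among controls, the standard error is dominated by the `1 / c` term, which is one reason rare-variant odds ratios tend to come with wide confidence intervals.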

Results

Patient characteristics

Table 1 presents the characteristics of 74,085 patients with 84,333 diagnosed cancers (42.9% female) and 38,842 controls without cancer (43.8% female). Among the patients, 8972 (12.1%) were diagnosed with 2–5 cancer types. The mean age at registration was 66.8 (standard deviation 11.3) years for patients and 66.6 (11.5) years for controls.

Germline pathogenic variants in MMR genes

Following sequencing quality control procedures, our study included 73,677 patients (83,880 cases) and 38,297 controls, with 99.9% of the target region covered by at least 20 sequence reads. Of the 1692 genetic variants with <0.1% allele frequency in the controls, 228 were classified as pathogenic (Supplementary Table 3). The prevalence of pathogenic variants was highest for MSH6 (0.2%), followed by MSH2 (0.1%), MLH1 (0.05%), and PMS2 (0.04%). Supplementary Fig. 1 illustrates the distribution and frequency of the pathogenic variants identified in the patients. The proportion of path_PMS2 among pathogenic variants in MMR genes (path_MMR) in this study (5.7%) was lower than that observed in a study using samples from the UK Biobank (51.9%)9. This was primarily due to the difference in the prevalence of the c.137G>T (p.Ser46Ile) variant in PMS2, which is frequently observed in the UK Biobank but not in this study. In addition, variants previously reported as founder or high-frequency pathogenic variants in other populations were absent or observed only as solitary cases in this study cohort; these included c.2252_2253del (p.Lys751fs) and c.2059C>T (p.Arg687Trp) in MLH1, identified as founder variants in the Italian and Swedish populations, respectively22; c.1165C>T (p.Arg389*) in MSH2 and c.10C>T (p.Gln4*) in MSH6, reported as founder variants in the Quebec region22; and c.199G>A (p.Gly67Arg) in MLH1 and c.1147C>T (p.Arg383*) in MSH2, known as high-frequency pathogenic variants in the Chinese population23. Conversely, some high-frequency pathogenic variants were observed in 10 or more patients; these included c.1684G>T (p.Glu562*) in MSH2 (30 individuals, 27.8% of the patients with pathogenic variants in MSH2), c.3261dupC (p.Phe1088fs) in MSH6 (19 individuals, 10.4%), and c.943C>T (p.Arg315*) in PMS2 (19 individuals, 44.2%).

As depicted in Fig. 1, the prevalence of path_MMR was highest in patients with endometrial cancer (5.0%), comprising 0.7% for MLH1, 1.7% for MSH2, 2.6% for MSH6, and 0.1% for PMS2 (one patient carried pathogenic variants in both MSH2 and MSH6) (Supplementary Data 2). Among the LS-related cancer types, the second and third highest prevalences of path_MMR were observed in ureteral (3.6%) and colorectal (1.0%) cancers, respectively, although only 56 patients with ureteral cancer were analyzed in this study, with a single pathogenic variant carrier for each of the MSH2 and MSH6 genes.

Fig. 1: Frequency of pathogenic variants in DNA mismatch repair genes across 23 cancer types and controls.

Patients with pathogenic variants in both MSH2 and MSH6 are indicated as MSH2_MSH6. The numbers of patients with each cancer type and of controls are given in Table 2. The number of patients with ureteral cancer analyzed in this study was 56, and there was only one pathogenic variant carrier for each of the MSH2 and MSH6 genes.

The prevalence of patients with path_MMR and tumor MSI-H was compared to obtain a comprehensive picture of LS. As tumor MSI-H data were unavailable in BioBank Japan, we used tumor MSI-H data from 17 cancer types from a published large-scale Japanese study as an alternative24. All cancer types showed a higher prevalence of tumor MSI-H carriers than of path_MMR carriers (Supplementary Fig. 2). When focusing on cancer types with 100 samples or more, the prevalence of path_MMR carriers and tumor MSI-H was correlated (Spearman’s rank correlation coefficient 0.57, P = 0.033), and the prevalence of tumor MSI-H carriers was, on average, 6.2-fold higher than that of path_MMR carriers, ranging from 1.2- to 15.8-fold.

Association between each gene and each cancer type

We conducted 92 association analyses involving all patients and controls for each gene (23 cancers × 4 genes) (Table 2, Supplementary Table 4). Among LS-related cancer types, we identified 11 significant associations with P-values < 5.43 × 10−4 ( = 0.05/92) as follows: colorectal cancer: MLH1 (odds ratio [OR] 111.7; 95% confidence interval [CI] 15.5–806.1; P = 2.93 × 10−6), MSH2 (OR 16.2; 95% CI 8.0–32.9; P = 1.52 × 10−14), and MSH6 (OR 6.3; 95% CI 3.9–10.2; P = 8.05 × 10−14); endometrial cancer: MLH1 (OR 117.0; 95% CI 15.3–896.9; P = 4.60 × 10−6), MSH2 (OR 53.6; 95% CI 21.0–136.9; P = 8.65 × 10−17), and MSH6 (OR 148.2; 95% CI 46.4–473.9; P = 3.49 × 10−17); gastric cancer: MLH1 (OR 40.9; 95% CI 5.2–322.1; P = 4.29 × 10−4) and MSH2 (OR 5.8; 95% CI 2.5–13.4; P = 3.70 × 10−5); ovarian cancer: MSH6 (OR 29.5; 95% CI 8.0–108.8; P = 3.81 × 10−7); brain tumor: MLH1 (OR 216.6; 95% CI 13.4–3515.6; P = 1.55 × 10−4); and ureteral cancer: MSH2 (OR 101.9; 95% CI 12.2–848.3; P = 1.90 × 10−5). Additionally, eight nominal associations with P < 0.05 were observed in LS-related cancers. In contrast, pancreatic cancer did not exhibit any association with these four genes. Among the 15 other cancer types, we identified a significant association between path_MSH2 and bladder cancer (OR 26.6; 95% CI 7.4–94.8; P = 4.39 × 10−7) and between path_MLH1 and bone cancer (OR 378.6; 95% CI 22.6–6350.0; P = 3.68 × 10−5), alongside seven nominal associations.
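The Bonferroni thresholds quoted in this study follow directly from dividing the nominal significance level by the number of tests. The short sketch below reproduces the arithmetic for the gene-by-cancer analyses (23 cancers × 4 genes) and for the combined copy number deletion analysis (23 cancers); the function name is illustrative.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under the Bonferroni correction."""
    return alpha / n_tests


n_cancer_types = 23
n_genes = 4

# 92 gene-by-cancer association tests
per_gene = bonferroni_threshold(0.05, n_cancer_types * n_genes)
# 23 tests when all MMR gene deletions are combined per cancer type
combined = bonferroni_threshold(0.05, n_cancer_types)

print(f"{per_gene:.2e}")  # 5.43e-04
print(f"{combined:.2e}")  # 2.17e-03
```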

Table 2 Associations Between Pathogenic Variants in DNA Mismatch Repair Genes and the Risk of Eight LS-related Cancers and Controls

Age at diagnosis in carrier and non-carrier patients with colorectal, endometrial, gastric, and ovarian cancers

Because at least 10 carrier patients with a pathogenic variant in at least one gene were observed for colorectal, endometrial, gastric, and ovarian cancers, we focused on these four cancers to compare the ages at diagnosis between carrier and non-carrier patients. Among patients with colorectal cancer, those carrying path_MLH1 (−12.4 years, P = 6.62 × 10−6), path_MSH2 (−12.2 years, P = 4.57 × 10−7), and path_MSH6 (−4.3 years, P = 8.32 × 10−3) exhibited significantly earlier diagnoses (Fig. 2, Supplementary Data 3). Among patients with endometrial cancer, carriers of path_MLH1 (−6.6 years, P = 6.86 × 10−3) and path_MSH2 (−8.5 years, P = 8.05 × 10−5) demonstrated significantly earlier diagnoses. However, patients with gastric cancer did not have a significantly earlier diagnosis for any gene. Patients with ovarian cancer harboring path_MSH6 exhibited significantly earlier diagnoses (−5.0 years, P = 0.035).

Fig. 2: Age of diagnosis in patients with Lynch syndrome-related cancers by DNA mismatch repair genes.

Error bars indicate the standard deviation. The (estimated) average age of diagnosis based on NCCN guidelines for carriers is depicted by a triangle. Differences were evaluated using Welch’s t-test (two-sided). The numbers of patients with missing age at diagnosis in colorectal, endometrial, gastric, and ovarian cancers were 3369, 360, 2983, and 263, respectively.

We also compared the average age at diagnosis of carriers with pathogenic variants to those described in the NCCN guidelines (indicated by red triangles in Fig. 2). Compared to those reported in the guidelines, the average age at diagnosis of pathogenic variant carriers was higher for colorectal cancer among path_MLH1 carriers (+9.2 years) and path_MSH2 carriers (+9.5 years), as well as for gastric cancer among path_MSH2 carriers (+16.4 years)2. Conversely, the average age at diagnosis was comparable for endometrial and ovarian cancers2.

Patients with multiple cancers and carrier status

As the selection criteria for genetic testing included multiple LS-related cancers2, we explored the association between the number of cancer types and carrier frequency separately for females and males, considering that endometrial and ovarian cancers affect only females. We observed a higher carrier frequency in patients with two or more LS-related cancer types for path_MLH1 (OR 8.8, 95% CI 3.5–19.7, P = 5.81 × 10−6), path_MSH2 (OR 13.2, 95% CI 7.1–23.7, P = 4.42 × 10−14), and path_MSH6 (OR 3.9, 95% CI 1.9–7.4, P = 1.61 × 10−4) among females and for path_MLH1 (OR 4.0, 95% CI 1.0–12.0, P = 0.027) and path_MSH2 (OR 4.4, 95% CI 1.5–10.8, P = 4.81 × 10−3) among males (Fig. 3a, Supplementary Data 4). In comparison, the same analyses with non-LS-related cancers revealed no associations, with a carrier frequency of <1.0% for all genes (Fig. 3a). We further investigated 53 cancer pairs in groups of more than 50 patients to identify those with high carrier frequencies (Fig. 3b, Supplementary Data 5). Notably, colorectal and endometrial cancer pairs exhibited path_MMR in 24.8% of patients, representing an increase compared to that in patients with either colorectal (OR 41.7, 95% CI 25.5–67.0, P = 3.91 × 10−34) or endometrial (OR 8.5, 95% CI 5.1–14.1, P = 8.47 × 10−15) cancers. The next highest carrier frequency was observed in the endometrial and breast cancer pair (5.6%), although breast cancer is not an LS-related cancer type. The carrier frequencies in patients with other pairs of LS-related cancer types ranged from 0% to 5.3%.
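Comparisons of carrier frequency between patients with multiple and single cancers rest on a two-sided Fisher's exact test of a 2 × 2 table. The following is a minimal pure-Python sketch of that test, assuming the common convention of summing all table probabilities no larger than that of the observed table. The function name is hypothetical, and the odds ratio it returns is the simple sample cross-product, which can differ from the conditional maximum-likelihood estimate that some statistical packages report.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the table [[a, b], [c, d]].

    Rows could be, for example, multiple vs single cancers; columns carriers
    vs non-carriers. Returns (sample odds ratio, two-sided P-value).
    """
    row1, row2, col1 = a + b, c + d, a + c
    total_tables = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x carriers in the first row.
        return comb(row1, x) * comb(row2, col1 - x) / total_tables

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_observed = prob(a)
    # Sum probabilities of all tables at least as extreme as the observed one;
    # the small relative tolerance guards against floating-point ties.
    p_value = sum(
        prob(x) for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-7)
    )
    odds_ratio = (a * d) / (b * c) if b * c else float("inf")
    return odds_ratio, min(p_value, 1.0)
```

Because the test enumerates exact hypergeometric probabilities, it remains valid when some cells contain only a handful of carriers, which is the typical situation for rare pathogenic variants.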

Fig. 3: Carrier frequency in patients with single and multiple cancer diagnoses.

a Comparison of carrier frequency of path_MMR between patients with multiple (non-) Lynch syndrome (LS)-related cancers and those with a single (non-) LS-related cancer, with separate calculations for males and females owing to the inclusion of endometrial and ovarian cancers. The P-value was calculated using the Fisher’s exact test (two-sided). b Triangular matrix depicting carrier frequency of path_MMR in patients with multiple cancer diagnoses, with heat map representation. Carrier frequency in patients with a single cancer is displayed below the triangular matrix. Combinations with fewer than 50 patients are shaded gray. LS Lynch syndrome, MMR mismatch repair, PV pathogenic variant, df degree of freedom.

Family history of LS-related cancer and carrier status

A family history of LS-related cancer is a crucial consideration in patient selection for genetic testing2. Therefore, we analyzed the carrier frequency of path_MMR and the presence of a cancer-related family history in first-degree relatives. First, we investigated the association between the number of family members with or without LS-related cancers and the carrier frequency. We observed an increase in carrier frequency from 0.2% in patients with no family history of LS-related cancer to 9.4% in patients with four or more first-degree relatives with LS-related cancer (P = 8.64 × 10−45) (Fig. 4a, Supplementary Data 6). As a negative control, we conducted a similar analysis with the number of family members affected by non-LS-related cancers, which revealed no significant increase in the carrier frequency (P = 0.777).
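The trend in carrier frequency across increasing numbers of affected relatives is the kind of question the Cochran-Armitage test addresses. The sketch below implements the standard normal-approximation form of the test with equally spaced scores (0, 1, 2, ...). The function name and the default scores are assumptions made for illustration, and software implementations may differ in details such as variance conventions.

```python
import math


def cochran_armitage(carriers, totals, scores=None):
    """Cochran-Armitage test for trend in proportions across ordered groups.

    carriers[i] and totals[i] are the carrier count and group size for group i
    (for example, patients with i affected first-degree relatives).
    Returns (z statistic, two-sided P-value from the normal approximation).
    """
    if scores is None:
        scores = list(range(len(totals)))  # equally spaced scores (assumption)
    n = sum(totals)
    p_bar = sum(carriers) / n  # pooled carrier frequency
    # Score-weighted deviation of observed from expected carrier counts.
    t = sum(s * (r - m * p_bar) for s, r, m in zip(scores, carriers, totals))
    sum_s = sum(m * s for s, m in zip(scores, totals))
    sum_s2 = sum(m * s * s for s, m in zip(scores, totals))
    variance = p_bar * (1 - p_bar) * (sum_s2 - sum_s * sum_s / n)
    z = t / math.sqrt(variance)
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value
```

A positive `z` indicates that carrier frequency rises with the score; when every group has the same carrier frequency, `z` is zero and the P-value is 1.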

Fig. 4: Carrier frequency among individuals with a family history among first-degree relatives.

a Carrier frequency of path_MMR according to the number of first-degree relatives’ family history of Lynch syndrome (LS)-related and non-LS-related cancers. The P-value was calculated using the Cochran-Armitage test (two-sided). b Matrix showing carrier frequency of path_MMR between individuals with and without a family history among first-degree relatives, represented as a heat map. Combinations with fewer than 50 patients are shaded gray. LS Lynch syndrome, PV pathogenic variant, FH family history.

Next, we investigated the combination of personal and family histories of cancer because they may also influence the carrier frequency of path_MMR. Notably, the highest carrier frequency (26.0%) was observed in patients with personal and family histories of endometrial cancer (Fig. 4b, Supplementary Data 7). This frequency was higher (OR 7.4, 95% CI 3.5–14.8, P = 4.45 × 10−7) than that in patients with a personal history alone (4.5%). The second highest frequency, at 16.1%, was observed in patients with endometrial cancer and a family history of colorectal cancer, representing an increase (OR 5.2, 95% CI 3.3–7.9, P = 4.52 × 10−13) compared to that in patients with a personal history alone (3.6%). The carrier frequencies of patients with other combinations of LS-related cancer types, as well as those between LS-related cancer types and non-LS-related cancer types or among non-LS-related cancer types, were 0–7.8% and 0–7.1%, respectively.

Survival analysis of carrier and non-carrier patients with colorectal and gastric cancers

We examined the relationship between path_MMR and survival outcomes in colorectal and gastric cancers because survival data from ≥10 carrier patients were available only for these two cancer types. The number of patients analyzed was 2190 (19 carriers, 2171 non-carriers) with colorectal cancer and 2160 (11 carriers, 2149 non-carriers) with gastric cancer. Patients with colorectal cancer who carried path_MMR exhibited a better prognosis than that of non-carriers (hazard ratio 0.305, 95% CI 0.093–1.003; P = 0.021). However, the association between carrier status and survival was not statistically significant for gastric cancer (hazard ratio 0.911, 95% CI 0.323–2.571; P = 0.859) (Supplementary Fig. 3).

Performance evaluation of in silico prediction tools for missense variants

We evaluated the performance of in silico prediction tools, with the aim of reducing the number of VUS, by examining whether individuals with variants predicted to be “pathogenic” actually developed the disease, as this is the most clinically relevant factor. Variants predicted to be pathogenic by each tool were analyzed in the same manner as the loss-of-function and ClinVar pathogenic/likely pathogenic variants, and their ORs were compared to identify the tool whose predictions most closely approximated the pathogenicity of the loss-of-function and ClinVar pathogenic/likely pathogenic variants. For MLH1, we noted considerable variability in the number of predicted pathogenic variants (ranging from 1 to 188) among the 41 tools (Supplementary Fig. 4). While the OR for pathogenic variants (defined by loss-of-function mutations and ClinVar annotations) was 111.7 in colorectal cancer and 117.0 in endometrial cancer, the OR for predicted pathogenic variants remained close to one across all software tools tested (Supplementary Fig. 5). Similarly, poor performance was observed for other genes (Supplementary Figs. 6–9).

Detection and risk estimation of copy number deletion

We identified 18 copy number deletions: 1 in MLH1, 7 in MSH2, 4 in MSH6, 1 in PMS2, 3 in EPCAM, and 2 involving both MSH2 and EPCAM. Notably, no recurrent copy number deletions, defined as deletions occurring in three or more individuals, were detected. Given the limited number of copy number deletions within each gene, we conducted a combined case-control association analysis for all deletions in MMR genes. Endometrial (OR 79.1; 95% CI 9.6–649.7; P = 4.73 × 10−5), colorectal (OR 8.1; 95% CI 2.5–26.3; P = 5.53 × 10−4), and cervical (OR 40.7; 95% CI 4.2–369.7; P = 1.42 × 10−3) cancers exhibited significant associations, with P-values < 2.17 × 10−3 (= 0.05/23).

Discussion

The present study is the largest and most comprehensive case-control association study using sequencing data from unselected patients. The results revealed different risks among LS-related cancer types and MMR genes, earlier age at diagnosis in carriers with pathogenic variants, and higher carrier frequencies in patients with personal and/or family histories of colorectal and endometrial cancers. We also observed several discrepancies between our findings and the germline MMR gene testing recommendations outlined in the NCCN guidelines.

The combinations of path_MMR genes and cancer types that showed significant associations in this study also exhibited increased cumulative incidences by the age of 75 years, based on data from the Prospective Lynch Syndrome Database25, except for cancer types for which only a single carrier was identified in this study. Specifically, the cumulative incidences were as follows: for colorectal cancer, 45.8% in MLH1, 43.0% in MSH2, and 15.0% in MSH6; for endometrial cancer, 42.7% in MLH1, 56.7% in MSH2, and 46.2% in MSH6; for gastric cancer, 7.1% in MLH1 and 7.7% in MSH2; for ovarian cancer, 13.1% in MSH6; for bladder cancer, 8.1% in MSH2; and for ureteral cancer, 17.8% in MSH225. When compared with the results of a study that analyzed LS using data from the UK Biobank9, our findings were also consistent in that the risks of colorectal and endometrial cancer were elevated for MLH1, MSH2, and MSH6, but not for PMS2. In contrast, pancreatic cancer was not associated with any genes in this study, and no path_MLH1 carriers were observed among the 1137 patients with pancreatic cancer. These results are consistent with published data showing that pancreatic cancer has the lowest proportion of tumor MSI-H cases (0.76%) among LS-related cancers in this population24. Based on a prospective study of 3119 path_MMR carriers, the relative risk of pancreatic cancer in path_MLH1 carriers at the age of 75 years was 7.825. In a case-control study analyzing 21 cancer predisposition genes in 2999 patients with pancreatic cancer (mostly non-Hispanic white), the prevalence of path_MLH1 carriers was 0.13%26. The statistical power of this study (n = 1137) to detect carriers at that frequency was 77.2%, indicating a reasonably sufficient sample size. Nevertheless, it is important to validate this association in this cohort using a larger sample size.

In women with Lynch syndrome, breast cancer showed immunohistochemical loss of MMR protein expression in 42–51% of cases27,28, and was initially listed as an LS-related cancer in the NCCN guidelines and is still recognized as such by InSiGHT2. However, large-scale association analyses of breast cancer showed no significant associations29,30. A recent study using samples from the UK Biobank also suggested that breast cancer is not an LS-related cancer. Our case-control analysis of 11,595 female patients with breast cancer showed no significant associations with the four MMR genes, which supports the observations of these studies.
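The power figure quoted above for pancreatic cancer (77.2% with n = 1137 and a carrier prevalence of 0.13%) is not accompanied by a stated method. One simple reading that happens to reproduce it is the probability of observing at least one carrier among the sampled patients, assuming independent sampling at the published prevalence. The sketch below shows that arithmetic; it is an interpretation offered for illustration, not a confirmed description of the authors' calculation, and the function name is hypothetical.

```python
def prob_at_least_one_carrier(n_patients: int, carrier_freq: float) -> float:
    """Probability that a sample of n_patients contains one or more carriers."""
    return 1.0 - (1.0 - carrier_freq) ** n_patients


# 1137 pancreatic cancer patients; path_MLH1 carrier prevalence of 0.13%
power = prob_at_least_one_carrier(1137, 0.0013)
print(f"{power:.1%}")  # 77.2%
```

Under the same assumption, the complement (about 23%) is the chance of seeing no carrier at all even if the true prevalence matched the published estimate, which is why the absence of path_MLH1 carriers here is suggestive rather than conclusive.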

The present study revealed that carriers were diagnosed 4.3–12.4 years earlier than non-carriers. Notably, the average age at diagnosis of colorectal and gastric cancers in this study was approximately 10 years higher than that indicated in the NCCN guidelines. In a study analyzing samples from the UK Biobank9, the ages at diagnosis for colorectal cancer were comparable to those in this study, both in carriers and non-carriers. For endometrial cancer, patients in our study, including non-carriers, were diagnosed approximately five years earlier than those in the study analyzing samples from the UK Biobank. This may reflect ethnic differences, as SEER cancer statistics indicate that Asians are diagnosed with endometrial cancer at a median age six years younger than Non-Hispanic Whites31. Therefore, the age at diagnosis in the guidelines for endometrial cancer may also be younger than that in the biobank samples. One possible reason for the discrepancy in age at diagnosis between the guidelines and studies analyzing biobank samples is that the guidelines derive age at diagnosis primarily from follow-up studies of pathogenic variant carriers who undergo surveillance for LS-related cancers2. Colorectal and gastric cancers may be detected earlier through surveillance such as endoscopy. In contrast, in population-based studies such as ours, these cancers are often diagnosed in general hospitals after symptom presentation. Nevertheless, both findings regarding the age at diagnosis are crucial in personalized medicine. While the NCCN guidelines specify the age at diagnosis for surveillance purposes, these ages may not be effective for identifying carriers for genetic testing because they are relatively young. For example, selecting patients based on the average age at diagnosis for path_MSH2 carriers in gastric cancer outlined in the NCCN guidelines (52 years) or the age at diagnosis indicated in the guidelines’ selection criteria (50 years) would result in missing 90% and 100% of the carriers in our cohorts, respectively. These data suggest that different age criteria, derived from follow-up studies of carriers and case-control investigations, should be used for different purposes.

Clinical guidelines for selecting patients for genetic testing for LS, including the Amsterdam II criteria and revised Bethesda guidelines, as well as the NCCN guideline, emphasize the personal and family histories of patients with LS-related cancers. While the NCCN guideline prioritizes personal and family history of colorectal and endometrial cancers2, it contains selection items that do not distinguish between different types of LS-related cancers. The data from this study demonstrated that even among LS-related cancers, the carrier frequency of path_MMR varied greatly depending on the type of cancer present in the family history (0%–26.0%) or a combination of multiple cancers (0%–24.8%), suggesting that refining which LS-related cancers are considered could enhance the specificity of these guidelines. In addition, the carrier frequencies of path_MMR remained high even among patients without a family history of cancer: 3.6%–5.1% in endometrial cancer and 0.7%–1.1% in colorectal cancer (as shown in parentheses in Fig. 4b). This is consistent with previous reports indicating that a substantial number of patients with LS do not meet traditional clinical criteria, such as the Amsterdam II or revised Bethesda guidelines4. This finding highlights the limitations of relying on family history to identify patients with LS and supports the implementation of a universal screening strategy to avoid missing opportunities for clinical intervention in both patients with LS and their relatives.

This study had two limitations. First, approximately half of the PMS2 gene was not sequenced because of the presence of a pseudogene, as noted in previous studies15,32,33. However, as the risk associated with PMS2 pathogenic variants is relatively small compared to that with other MMR genes, any potential underestimation in the unsequenced region would likely affect both patients and controls similarly, minimizing its clinical impact. Second, the tumor MSI-H data referenced in this study were not derived from the same samples analyzed in this study. Although only germline DNA was available in BioBank Japan, we considered it important to compare the carrier frequency of pathogenic variants with the frequency of tumor MSI-H; therefore, we used publicly available large-scale data on tumor MSI-H as an alternative. As tumor MSI-H data were obtained from the same population (Japanese), the prevalence was expected to be comparable.

In conclusion, this study delineated the disease risk and clinical and demographic characteristics associated with pathogenic variants of MMR genes across 23 cancer types. A comparison of these findings with the descriptions in the clinical guidelines, primarily derived from follow-up studies of carriers, revealed several disparities in the disease risk and characteristics of carrier patients. These disparities illustrate the complementary nature of the two types of studies and highlight their indispensable roles in advancing personalized medicine for LS.