Abstract
Microsatellite instability (MSI) serves as an important therapeutic and prognostic marker. Immunohistochemistry (IHC) and polymerase chain reaction (PCR) have long been regarded as the “gold standard” methods for MSI detection. In recent years, next-generation sequencing (NGS)-based methods, which offer expanded coverage of microsatellite (MS) loci and improved analytical performance, have gained widespread acceptance. However, discrepancies between NGS and traditional methods have occasionally been reported. In this study, we conducted a large-scale retrospective analysis of the MSI results of 35,563 Chinese pan-cancer cases that underwent an NGS test in a central clinical laboratory. Our work introduces a novel algorithm for NGS-based MSI detection, examines MSI-H prevalence and value distribution across cancer types, identifies MSI-associated genes and variants, evaluates the discordance in MSI detection between PCR and NGS in a pan-cancer context, and distills 7 MS loci suitable for pan-cancer MSI detection.
Introduction
Microsatellites (MS), also known as short tandem repeats (STR), consisting of DNA sequences formed by tandem repetitive units of 1-6 nucleotides, are ubiquitous in the human genome1. When the DNA mismatch repair (MMR) machinery is compromised by acquired or inherited factors, deletions or insertions of one or more units accumulate at MS loci. This phenomenon is termed microsatellite instability (MSI)2,3.
MSI is most prevalent in endometrial, gastric and colorectal cancers, with lower incidence in other malignancies4. It serves as an important therapeutic and prognostic marker. High-level MSI status (MSI-H) is associated with sensitivity to immune checkpoint inhibitors and resistance to 5-fluorouracil-based chemotherapy5; therefore, accurate identification of patients’ MSI status is vital.
According to ESMO guidelines6, IHC-based methods targeting the proteins MLH1, MSH2, PMS2 and MSH6 (IHC-MSI) are the preferred approach for MSI testing. IHC indirectly assesses the integrity of MMR function by checking the nuclear localization of key MMR proteins. However, the results of IHC-MSI can be influenced by various factors, such as pre-analytical processing of samples7 and non-truncating inactivating mutations of MMR genes8. As reported, ~5–11% of MSI samples are caused by such mutations4, which result in loss of function of the gene products while retaining their antigenicity. When IHC results are indeterminate, PCR-based methods (PCR-MSI) are employed. PCR directly determines the integrity of MMR function by checking the length changes of microsatellites caused by insertions or deletions of repeat units due to unrepaired “replication slippage”4. At present, a PCR panel of five quasi-monomorphic poly-A mononucleotide repeats9, whose most popular commercial implementation comes from Promega, is widely used because it performs better than the Bethesda panel2. Though the concordance between PCR-MSI and IHC-MSI can reach 97%10, it is noteworthy that the currently approved five-locus PCR-MSI testing products are intended only for colorectal cancers; their use in non-colorectal malignancies is still controversial11.
In recent years, NGS-based MSI detection methods (NGS-MSI) have gained widespread acceptance. NGS also checks length changes of MS loci but can expand the number of MS loci targeted, thereby potentially improving analytical performance, particularly in non-colorectal samples. NGS-MSI is highly concordant with PCR-MSI in colorectal cancers; however, some discordance has been reported in non-colorectal cancers12,13,14,15. In a study led by Memorial Sloan Kettering Cancer Center, NGS-MSI detection demonstrated 99.4% concordance with PCR or IHC in colorectal and endometrial cancers, and 96.6% concordance with PCR in cancers other than colorectal and endometrial cancers14. Several NGS-MSI algorithms have been developed, such as MSIsensor16 and MSI-ColonCore17.
In this study, we conducted a large-scale retrospective analysis of NGS-MSI results of 35,563 Chinese pan-cancer cases. We introduced a novel NGS-MSI algorithm, examined the prevalence of MSI and its association with genomic variation, systematically evaluated the discordance between NGS-MSI and PCR-MSI in a pan-cancer context, and finally extracted 7 MS loci suitable for pan-cancer MSI detection.
Results
Development of an NGS-based MSI detector
An in-house NGS-based MSI detector, MSIDRL, was developed primarily following the approach of Wang et al.18.
Initially, the 500 most robust noncoding MS loci in 10 colorectal circulating tumor DNA whole-exome sequencing assays were selected. Capture probes targeting these loci were designed and synthesized to form a prototype panel. A training set of 105 pan-cancer FFPE samples, whose MSI status had been determined by PCR (31 MSI-H and 74 MSI-L/MSS; Supplementary Table 2, Training Set), was assayed with the prototype. For each MS locus i, the reads covering the entire repeat were counted and summed separately in the MSI-H samples and the MSI-L/MSS samples, and then accumulated by observed repeat length. The observed repeat length that maximized the difference in cumulative read count between the MSI-H samples and the MSI-L/MSS samples was defined as the “diacritical repeat length” of locus i, designated DRLi. For any MS locus i of any sample j, reads with a repeat length longer than DRLi were defined as “stable” reads, the count of which was designated SRCij; reads with a repeat length shorter than or equal to DRLi were defined as “unstable” reads, the count of which was designated URCij. We defined the background noise of locus i as:
For any locus i of any sample j, we then computed \(b_{ij}=\frac{\mathrm{URC}_{ij}}{\mathrm{SRC}_{ij}+\mathrm{URC}_{ij}}\), tested the null hypothesis \(H_0: b_{ij} > B_i\) with a binomial test, and obtained the p-value \(p_{ij}\). Using the PCR-determined MSI status, we determined a p-value cutoff \(P_i\) for each locus, requiring a specificity >= 99.0% and a sensitivity as high as possible.
The 100 most sensitive MS loci were selected to form the final panel. These loci do not overlap with the 6 loci of PCR-MSI. The unstable locus count (ULC) of a sample is the number of final-panel MS loci whose binomial test p-values are less than or equal to their cutoffs. ULC classified the training set and the validation set properly (Supplementary Fig. 1, Supplementary Table 2).
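The locus-level procedure described above can be illustrated with a minimal Python sketch. It is not the MSIDRL implementation, and several details are assumptions made for illustration only: cumulative read fractions (rather than raw counts) are compared so that unequal group sizes do not dominate, the background noise of a locus is taken as the pooled unstable-read fraction of the MSI-L/MSS training samples, and the binomial test is one-sided (upper tail), so that a small p-value indicates an excess of unstable reads. All function names, the cutoff value and the toy read counts are hypothetical.

```python
from math import exp, lgamma, log


def diacritical_repeat_length(msi_h_reads, mss_reads):
    """Repeat length maximizing the cumulative read-fraction difference.

    Both arguments map observed repeat length -> pooled read count at one locus.
    Assumption: fractions, not raw counts, are compared between the groups.
    """
    total_h = sum(msi_h_reads.values())
    total_s = sum(mss_reads.values())
    best_len, best_diff = None, float("-inf")
    cum_h = cum_s = 0
    for length in sorted(set(msi_h_reads) | set(mss_reads)):
        cum_h += msi_h_reads.get(length, 0)
        cum_s += mss_reads.get(length, 0)
        diff = cum_h / total_h - cum_s / total_s
        if diff > best_diff:
            best_len, best_diff = length, diff
    return best_len


def split_reads(reads, drl):
    """Return (SRC, URC): reads longer than DRL are stable, the rest unstable."""
    urc = sum(n for length, n in reads.items() if length <= drl)
    src = sum(n for length, n in reads.items() if length > drl)
    return src, urc


def binom_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p) with 0 < p < 1, summed in log space."""
    def log_pmf(x):
        return (lgamma(n + 1) - lgamma(x + 1) - lgamma(n - x + 1)
                + x * log(p) + (n - x) * log(1 - p))
    return min(1.0, sum(exp(log_pmf(x)) for x in range(k, n + 1)))


def unstable_locus_count(sample, drl, noise, cutoff):
    """ULC: number of loci whose p-value is at or below the locus cutoff."""
    ulc = 0
    for locus, reads in sample.items():
        src, urc = split_reads(reads, drl[locus])
        if src + urc == 0:
            continue  # no spanning reads at this locus
        if binom_upper_tail(urc, src + urc, noise[locus]) <= cutoff[locus]:
            ulc += 1
    return ulc


# Toy pooled training reads for one hypothetical locus (reference length 25).
MSI_H_POOL = {20: 20, 21: 30, 22: 25, 23: 15, 24: 5, 25: 5}
MSS_POOL = {23: 2, 24: 8, 25: 90}

drl = {"locus1": diacritical_repeat_length(MSI_H_POOL, MSS_POOL)}
src, urc = split_reads(MSS_POOL, drl["locus1"])
noise = {"locus1": urc / (src + urc)}  # assumed definition of background noise
cutoff = {"locus1": 0.01}              # hypothetical p-value cutoff

stable_sample = {"locus1": {23: 1, 25: 49}}
unstable_sample = {"locus1": {21: 20, 25: 30}}
print(drl, noise)
print(unstable_locus_count(stable_sample, drl, noise, cutoff))
print(unstable_locus_count(unstable_sample, drl, noise, cutoff))
```

With SciPy available, `scipy.stats.binomtest(urc, src + urc, p, alternative="greater").pvalue` could replace the hand-written tail sum; the standard-library version is used here only to keep the sketch dependency-free.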
ULC cutoff & MSI-H prevalence
From June 2020 to July 2023, 35,563 valid cases were tested with the MSIDRL-embedded 733-gene NGS LDT (see Methods), which produced abundant data that allowed fine-tuning of the ULC cutoff.
The pan-cancer ULC distribution is bimodal (Fig. 1A). The first peak appeared at the lower extreme of the ULC spectrum, followed by a sharp decrease in case count near 10 and then a wide, flat valley from 10 to 90. We regarded the first peak as a self-explanatory aggregation of MSS cases and therefore set the ULC cutoff at 11. The case count rose gently around 90 and culminated at 100, forming a second peak. Intriguingly, when the cases were inspected by anatomical cancer type, the second peak was observed only in gastric cancer (GACA) and bowel cancers (BWCA), whereas the first peak arose in all cancer types (Supplementary Fig. 2).
Fig. 1. A: Pan-cancer ULCs demonstrated a bimodal distribution, and cases with ULC >= 11 were considered MSI-H. B: Total and MSI-H case counts differed between cancer types of four clusters. BWCA bowel cancers, GACA gastric cancer, UTNP uterine neoplasms, BITC biliary tract cancers, LICA liver cancers, OFPC ovarian cancer including Fallopian tube cancer and primary peritoneal cancer, PACA pancreatic cancer, LUCA lung cancers, The rest other cancers not listed above.
With the prevalence of MSI-H calculated (Supplementary Table 3), the cancer types could be grouped into 4 clusters (Fig. 1B). UTNP, GACA and BWCA were common cancers with high MSI-H prevalence; together they contributed approximately 80% of the MSI-H cases. BITC, LICA, OFPC, and PACA were common cancers with lower MSI-H prevalence. LUCA was the most common cancer, but MSI-H was rare in it. The remaining cancers were uncommon and had few MSI-H cases.
We also investigated MSI-H prevalence in some cancer subtypes (Supplementary Table 4). Significant differences were observed between colon cancer and rectal cancer (10.66% vs. 2.19%, p-value = \(1.26\times 10^{-36}\)) and between esophagogastric junction cancer and esophageal cancer (4.04% vs. 0.30%, p-value = \(2.11\times 10^{-3}\)).
DMGs associated with ULC
Within the scope of the 733-gene panel, 363 genes were found to be deleteriously mutated in at least one case (Supplementary Data 1), 94 of which were associated with ULC from a pan-cancer perspective: 92 positively, and BTK and TERT negatively (Fig. 2). The ULC-associated deleteriously mutated genes (DMGs) of a specific cancer type were mostly a subset of the 94 genes, with some cancer-specific exceptions, such as MYC in UTNP, KRAS in GACA, and CTNNB1 and TP53 in BWCA (Supplementary Fig. 3).
We supposed that DMGs that were positively associated with ULC and carried germline mutations were potential MSI drivers. Twenty-nine such genes were discovered, most of which were well-established DNA damage repair genes whose products are involved in a physical interaction network (Supplementary Fig. 4). Among the MMR genes investigated (MLH1, MLH3, MSH2, MSH3, MSH6, PMS2), MLH3 and MSH3 bore few germline mutations, while a considerable number were observed in the others.
Variants associated with ULC
To discover more specific factors associated with MSI, ACMG Pathogenic (P) and Likely Pathogenic (LP) variants and variants of uncertain significance (VUS) detected in the data set were analyzed.
Across the entire data set, 481 variants were associated with ULC, the majority of which were somatic single-nucleotide indels positively associated with ULC (Supplementary Data 2). A single deletion, chr2:g.148683686del (ACVR2A:NM_001616.5:c.1310del:p.K437Rfs*5), was detected in 66.6% (728/1,093) of MSI-H cases (Fig. 3A). Four germline VUSs, chr7:g.6445235C>T (RAC1, rs836554), chr6:g.43737486C>T (VEGFA, rs833061), chr10:g.131264931A>C (MGMT, rs1625649) and chr7:g.6443839T>C (RAC1, rs4720672), were found to be negatively associated with ULC; all of them are non-coding germline SNVs (Fig. 3B).
Correlation between MSI and TMB
TMB information was available for 97.3% of the cases in the entire data set (34,588/35,563). We investigated the correlation between MSI and TMB.
As expected, ULC demonstrated a weak positive correlation with TMB in MSI-H cases (Fig. 4A). Meanwhile, a considerable proportion of MSS cases were TMB-H, indicating MMR-independent mutagenesis mechanisms (Fig. 4A). The overall fractions of MSI-H/TMB-H, MSI-H/TMB-L, MSS/TMB-H, and MSS/TMB-L were 2.97%, 0.08%, 17.49% and 79.47%, respectively, though these fractions varied across cancer types (Fig. 4B).
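As an illustration of how the four MSI/TMB groups can be tabulated, the following sketch classifies each case by ULC and TMB and reports the group percentages. The ULC cutoff of 11 comes from the text and the TMB cutoff of 10 mutations per Mb is the one quoted in the Discussion; treating a TMB value exactly at the cutoff as TMB-H is an assumption, and the function names and toy cases are hypothetical. The last lines simply confirm that the reported fractions sum to 100% within rounding.

```python
from collections import Counter

ULC_CUTOFF = 11    # cases with ULC >= 11 are called MSI-H (from the text)
TMB_CUTOFF = 10.0  # mutations per Mb; the ">=" comparison is an assumption


def msi_tmb_group(ulc, tmb):
    """Label one case, e.g. 'MSI-H/TMB-L'."""
    msi = "MSI-H" if ulc >= ULC_CUTOFF else "MSS"
    tmb_status = "TMB-H" if tmb >= TMB_CUTOFF else "TMB-L"
    return f"{msi}/{tmb_status}"


def group_percentages(cases):
    """Percentage of cases per MSI/TMB group; cases are (ULC, TMB) pairs."""
    counts = Counter(msi_tmb_group(ulc, tmb) for ulc, tmb in cases)
    total = sum(counts.values())
    return {group: 100.0 * n / total for group, n in counts.items()}


# Hypothetical (ULC, TMB) pairs, not taken from the cohort.
toy_cases = [(95, 42.0), (0, 3.1), (2, 15.8), (1, 0.0),
             (30, 6.5), (0, 1.2), (3, 8.0), (0, 12.4)]
print(group_percentages(toy_cases))

# Fractions reported in the text.
reported = {"MSI-H/TMB-H": 2.97, "MSI-H/TMB-L": 0.08,
            "MSS/TMB-H": 17.49, "MSS/TMB-L": 79.47}
print(round(sum(reported.values()), 2))
```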
Fig. 4. A: ULC-TMB scatter plot of the cohort’s cases (TMB is set to 0.1 if 0). B: MSI-TMB status concordance in various cancer types. AMCA ampullary cancer, ANCA anal carcinoma, BITC biliary tract cancers, BLCA bladder cancer, BNCA bone cancer, BRCA breast cancer, BWCA bowel cancers, CECA cervical cancer, CNSC central nervous system cancers, EEJC esophageal and esophagogastric junction cancers, EXPD extramammary Paget’s disease, GACA gastric cancer, GIST gastrointestinal stromal tumors, HCNP histiocytic neoplasms, HNCA head and neck cancers, KASA Kaposi’s sarcoma, KICA kidney cancer, LICA liver cancers, LUCA lung cancers, LYMM lymphomas, MESO mesothelioma, MLNM melanoma, NEAT neuroendocrine and adrenal tumors, NMSK non-melanoma skin cancers, OFPC ovarian cancer including Fallopian tube cancer and primary peritoneal cancer, PACA pancreatic cancer, PECA penile cancer, PRCA prostate cancer, STSM soft tissue sarcoma, TECA testicular cancer, TERA teratoma, THCA thyroid carcinoma, TTCA thymomas and thymic carcinomas, UTNP uterine neoplasms, VVCA vulvar and vaginal cancers.
Discordance between PCR-MSI and NGS-MSI
With the fine-tuned ULC cutoff, we reviewed the validation set of MSIDRL development. Three PCR-determined MSI-L/MSS gastric cases were defined as MSI-H by MSIDRL (Supplementary Fig. 1), indicating a non-negligible discordance between PCR-MSI and NGS-MSI.
To verify this idea, 50 cases with ULCs between 12 and 45 were randomly selected from the data set. Their PCR-MSI results, ULC, MLH1 methylation status and genomic variants were integrated and analyzed. Of these NGS-determined MSI-H cases, only 4 were determined as MSI-H by PCR and the rest were MSS/MSI-L, although most of them had a methylated MLH1 promoter, supportive genomic variants, or both, the exceptions being 2 LUCA cases (Fig. 5, Supplementary Data 3). All MSI-L samples were supported by at least MSI-related variants, with one case additionally supported by MLH1 methylation. We believe that these samples are actually MSI-H. This observation suggests a possible gap in sensitivity between PCR and NGS.
Shrinkage of the MSI panel
Inspired by the bimodal distribution of ULC (Fig. 1A) and the prevalence of chr2:g.148683686del in MSI-H cases (Fig. 3A), we wondered whether a panel with a small number of MS loci would be sufficient to represent the MSI status of any case of any cancer type.
Using an in-house greedy algorithm, we evaluated the classifier performance of virtual panels consisting of different numbers of MS loci (Fig. 6A). Taking the original 100-locus panel as the reference, a panel of 7 loci (Supplementary Table 5) reproduced the MSI status with an overall percent agreement (OPA) of 99.5%, with 115 false positives and 49 false negatives (Fig. 6B).
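The in-house greedy algorithm is not described in detail, so the following is only a generic forward-selection sketch of how a small panel could be distilled against a reference call. It assumes that each case is represented by per-locus instability flags, that a sub-panel calls MSI-H when at least a threshold number of its loci are unstable, and that this threshold is re-tuned at every step; all names and the toy data are hypothetical. The final line checks the reported OPA under the additional assumption that all 35,563 cases formed the denominator.

```python
def opa(calls, reference):
    """Overall percent agreement, as a fraction, between two lists of calls."""
    return sum(c == r for c, r in zip(calls, reference)) / len(reference)


def panel_calls(cases, panel, min_unstable):
    """Call MSI-H (True) when at least min_unstable panel loci are unstable."""
    return [sum(case[locus] for locus in panel) >= min_unstable for case in cases]


def best_opa(cases, reference, panel):
    """Best (OPA, threshold) over all thresholds for a non-empty panel."""
    return max((opa(panel_calls(cases, panel, t), reference), t)
               for t in range(1, len(panel) + 1))


def greedy_panel(cases, reference, loci, size):
    """Forward selection: repeatedly add the locus that maximizes OPA (size >= 1)."""
    panel, remaining = [], list(loci)
    while len(panel) < size and remaining:
        pick = max(remaining,
                   key=lambda locus: best_opa(cases, reference, panel + [locus])[0])
        panel.append(pick)
        remaining.remove(pick)
    return panel, best_opa(cases, reference, panel)


# Toy data: 6 hypothetical cases, 4 hypothetical loci (1 = unstable locus).
LOCI = ["A", "B", "C", "D"]
toy_cases = [
    {"A": 1, "B": 1, "C": 1, "D": 0},
    {"A": 1, "B": 1, "C": 0, "D": 1},
    {"A": 0, "B": 0, "C": 0, "D": 1},
    {"A": 0, "B": 0, "C": 0, "D": 0},
    {"A": 1, "B": 0, "C": 1, "D": 0},
    {"A": 0, "B": 0, "C": 1, "D": 0},
]
# Reference call from the full toy panel: at least 2 of 4 loci unstable.
reference = panel_calls(toy_cases, LOCI, 2)
print(greedy_panel(toy_cases, reference, LOCI, 2))

# Reported disagreement: 115 false positives + 49 false negatives.
print(round(100 * (1 - (115 + 49) / 35563), 1))
```

A greedy search of this kind does not guarantee the globally best 7-locus subset; it trades optimality for a search cost that grows only linearly with the panel size.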
Discussion
In this study, we developed MSIDRL, a novel NGS-based pan-cancer MSI detection algorithm. It avoids the biologically unvalidated empirical statistical assumptions (e.g. mean ± 3 × standard deviation) applied by previous approaches such as mSINGS19, MSI-ColonCore17 and MSIsensor-pro20. The output of MSIDRL, ULC, reflects the extent to which MMR deficiency affects MS loci. Interestingly, the ULC distribution aggregated at the extremes only in GACA and BWCA (Supplementary Fig. 2), indicating that the impact of MMR deficiency in these cancers was more intense and universal than in other cancers. This phenomenon does not depend on the cancer-type composition of the training set, as a similar phenomenon was observed when the loci were selected based on UTNP samples only (data not shown). We also investigated the prevalence of MSI in various cancer types. Besides the well-established MSI-prevalent UTNP, GACA and BWCA, the cancer types BITC, LICA, OFPC and PACA contributed a considerable number of MSI-H cases. The prevalence of MSI-H was lower in these Chinese patients than in European cohorts, which is consistent with a previous report21. Differences in prevalence were observed between cancer subtypes, such as colon cancer and rectal cancer, which may explain the controversial effect of the MSI biomarker in rectal cancer22.
We analyzed the DMGs associated with MSI status, which may help in investigating the mechanism or the consequence of MMR deficiency. Some genes were inactivated by germline mutations, such as BRCA2, ATM, RAD50, MLH1 and PALB2, indicating a potential role as MSI drivers, while others were found with only somatic mutations, such as ACVR2A, MSH3, TGFBR2, KMT2C and RNF43, and these mutations were always found in “hotspot” STR regions in their coding sequences (CDS). Although MLH3 and MSH3 are MMR genes, they bore few germline mutations, which is consistent with the rarity of MSH3- or MLH3-related hereditary non-polyposis colorectal cancer cases23.
Besides somatic single-nucleotide deletion “hotspots” in STR regions, our study found 4 non-coding germline SNPs with a strong but negative association with MSI. None of these SNPs were involved in the ULC calculation. These SNPs have been reported to be associated with chemotherapy toxicity, susceptibility to cancer, or prognosis24,25,26, but reports of an association with MSI are rare, and the protective mechanism underlying MSI suppression is intriguing.
We also studied the relationship between MSI and TMB. It is not surprising that the ULC of MSI-H cases correlated with TMB, as some of the variants counted by ULC were also counted by TMB. With a cutoff of 10 mutations per Mb, TMB-H cases included almost all MSI-H cases, showing the potential of TMB as a surrogate biomarker for MSI.
The application of PCR-based five-locus MSI detection panels (either Bethesda or Promega) in non-colorectal samples is still under debate4. In this study, we demonstrated that the Promega panel produced putative false-negative results in pan-cancer cases. As Supplementary Fig. 2 shows, MSI in GACA and BWCA tends to be global, while in other cancers its effects were more diverse. The ineffectiveness was therefore likely caused by the lack of representativeness or vulnerability of the classical loci in other cancers, which was probably related to cancer-specific chromatin organization. Eventually, 7 MS loci with the power of pan-cancer MSI detection were discovered. Consolidating the panel from 100 loci to 7 loci would reduce diagnostic costs.
The main limitation of this study is the lack of clinical trials to definitively determine the clinical effectiveness of NGS and PCR on discordant samples. We expect head-to-head clinical trials of drug response to ultimately evaluate the performance of NGS and PCR.
Methods
Patients and data
Cancer patient cases tested by the capture-based NGS 733-gene laboratory-developed test (LDT) (see the “NGS LDT” section and Supplementary Table 1) in the CAP-, CLIA- and ISO15189-certified 3DMed Medical Laboratory (3DMed Biomedical Technology, Shanghai, China) from June 2020 to July 2023 were consecutively recruited to this study, except those that failed any QC procedure. Written informed consent was obtained from all participants, permitting the use of anonymized NGS data for academic research.
All data involved in this study were handled in accordance with the Declaration of Helsinki. This study is exempt from ethical review according to Article 32 of the “Measures for Ethical Review of Life Science and Medical Research Involving Human Beings” (https://www.gov.cn/zhengce/zhengceku/2023-02/28/content_5743658.htm) issued by the National Health Commission of the People’s Republic of China.
NGS LDT
For each patient, a pair of samples was analyzed: formalin-fixed paraffin-embedded (FFPE) tumor tissue alongside either FFPE paracancerous normal tissue or peripheral blood. For FFPE samples, we required at least 15 unstained slides, 4–5 μm thick, prepared within the past year and with a tumor content >= 20%, to ensure a DNA input of 200 ng. For peripheral blood, we required at least 5 ml collected in a Streck tube. All samples were transported at ambient temperature.
Genomic DNA was extracted with the ReliaPrep™ FFPE gDNA Miniprep System (Promega Corporation, Madison, Wisconsin, USA) or the QIAamp DNA Blood Mini Kit (QIAGEN, Germantown, Maryland, USA), and sonicated to an average size of 250 bp. Libraries were prepared with the KAPA HyperPrep Kit (KAPA Biosystems, Cape Town, South Africa) and targets were enriched by hybridization with customized single-stranded DNA probes synthesized by Integrated DNA Technologies (Coralville, Iowa, USA). Sequencing was performed on NovaSeq™ 6000 Sequencing Systems (Illumina, San Diego, California, USA) in PE100 or PE150 mode to produce sufficient data to ensure a minimum mean effective depth of 500×. Reads in FASTQ files were mapped to the human reference genome hg19. Somatic and/or germline single-nucleotide variations (SNV), insertions or deletions not longer than 40 bp (indel), large genomic rearrangements (LGR), copy-number variations (CNV), gene fusions, MSI, and tumor mutational burden (TMB) were called or calculated with in-house bioinformatics pipelines.
PCR-MSI
PCR-MSI assays were performed by Guangyue Medical Laboratory (Microread Genetics Co., Ltd., Guangzhou, China) with a multiplex fluorescent PCR and capillary electrophoresis approach. Amplicon lengths of 6 MS loci (NR-21, BAT-26, NR-27, BAT-25, NR-24, MONO-27) were analyzed. A sample was defined as MSS if none of the six loci was altered, as MSI-L if only 1 locus was altered, and as MSI-H if 2 or more loci were altered.
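The three-way PCR rule above can be written as a small function. This is a minimal sketch: the function name and the input format (a collection of the loci whose amplicon length was altered) are assumptions for illustration, while the locus names and the thresholds are those stated in the text.

```python
PCR_LOCI = ("NR-21", "BAT-26", "NR-27", "BAT-25", "NR-24", "MONO-27")


def pcr_msi_status(altered_loci):
    """Classify a sample from the PCR loci whose amplicon length was altered."""
    altered = set(altered_loci)
    unknown = altered - set(PCR_LOCI)
    if unknown:
        raise ValueError(f"unknown loci: {sorted(unknown)}")
    if len(altered) == 0:
        return "MSS"     # no locus altered
    if len(altered) == 1:
        return "MSI-L"   # exactly one locus altered
    return "MSI-H"       # two or more loci altered


print(pcr_msi_status([]))
print(pcr_msi_status(["BAT-26"]))
print(pcr_msi_status(["BAT-26", "NR-21", "MONO-27"]))
```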
Cancer typing and subtyping
The cancers of the test cases were classified primarily according to the categories outlined in the NCCN cancer treatment guidelines (https://www.nccn.org/guidelines/category_1) with a few minor, arbitrary modifications. The classification was primarily based on anatomy with some consideration of certain histological types. Subtyping of cancer types was based on anatomy or histology depending on cancer types.
Definition of deleteriously mutated genes (DMG)
If an ACMG-classified Pathogenic (P) or Likely Pathogenic (LP)27 somatic or germline SNV, indel, CNV, LGR or fusion variant was detected in a gene in a case, the gene was defined as a DMG of the case.
Protein association analysis
Protein association analysis was performed with STRING (Version: 12.0) (https://cn.string-db.org/). Default parameters were used except that only physical interactions evidenced by experiments were considered.
MLH1 promoter methylation test
MLH1 promoter methylation was tested with MethylTarget™ (GeneSky, Shanghai, China). Sample DNA was bisulfite-converted, amplified and sequenced. The MLH1 promoter was defined as methylated if the average methylation level of CpG dinucleotides within hg19 chr3:37034654-37034840 was greater than or equal to 10%, and as unmethylated otherwise.
Statistical Analysis
Statistical analyses were performed with the SciPy package (Version: 1.10.0) of Python (Version: 3.10.9). Fisher’s exact test was used to analyze categorical data. The Mann-Whitney U test was used to compare ULC between groups. P-values were Bonferroni-adjusted and converted to Q-values for convenience (Eq. 2). Q-values larger than 0 (i.e. adjusted P-values < 0.01) were considered statistically significant. The correlation between MSI and TMB was analyzed with linear regression.
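A minimal sketch of the multiple-testing step is given below. The Bonferroni adjustment is standard; the Q-value transform, however, is an assumption: the sketch uses Q = −log10(adjusted P / 0.01), which is merely one transform with the stated property that Q > 0 exactly when the adjusted P-value is below 0.01, and it may differ from the authors' definition. The function names and the example P-values are hypothetical.

```python
import math

ALPHA = 0.01  # significance level for adjusted P-values (from the text)


def bonferroni(p_values):
    """Bonferroni adjustment: multiply by the number of tests, cap at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def q_value(p_adj, alpha=ALPHA):
    """Assumed transform: Q = -log10(p_adj / alpha), so Q > 0 iff p_adj < alpha."""
    if p_adj == 0:
        return math.inf
    return -math.log10(p_adj / alpha)


raw_p = [0.001, 0.02, 0.5]  # hypothetical raw P-values from three tests
for p, p_adj in zip(raw_p, bonferroni(raw_p)):
    q = q_value(p_adj)
    print(p, p_adj, q, "significant" if q > 0 else "not significant")
```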
Data availability
Most if not all data generated or analyzed in this study are included in this article and its supplementary information files.
Code availability
All relevant code is available from the corresponding authors upon reasonable request.
References
Litt, M. & Luty, J. A. A hypervariable microsatellite revealed by in vitro amplification of a dinucleotide repeat within the cardiac muscle actin gene. Am. J. Hum. Genet. 44, 397–401 (1989).
Boland, C. R. et al. A National Cancer Institute Workshop on Microsatellite Instability for cancer detection and familial predisposition: development of international criteria for the determination of microsatellite instability in colorectal cancer. Cancer Res. 58, 5248–5257 (1998).
Li, G.-M. Mechanisms and functions of DNA mismatch repair. Cell Res 18, 85–98 (2008).
Yakushina, V. et al. Microsatellite instability detection: the current standards, limitations, and misinterpretations. JCO Precis. Oncol. 7, e2300010 (2023).
Diao, Z., Han, Y., Chen, Y., Zhang, R. & Li, J. The clinical utility of microsatellite instability in colorectal cancer. Crit. Rev. Oncol. Hematol. 157, 103171 (2021).
Luchini, C. et al. ESMO recommendations on microsatellite instability testing for immunotherapy in cancer, and its relationship with PD-1/PD-L1 expression and tumour mutational burden: a systematic review-based approach. Ann. Oncol. 30, 1232–1243 (2019).
Engel, K. B. & Moore, H. M. Effects of preanalytical variables on the detection of proteins by immunohistochemistry in formalin-fixed, paraffin-embedded tissue. Arch. Pathol. Lab. Med. 135, 537–543 (2011).
Shia, J. Immunohistochemistry versus microsatellite instability testing for screening colorectal cancer patients at risk for hereditary nonpolyposis colorectal cancer syndrome. J. Mol. Diagn. 10, 293–300 (2008).
Goel, A., Nagasaka, T., Hamelin, R. & Boland, C. R. An optimized pentaplex PCR for detecting DNA mismatch repair-deficient colorectal cancers. PLoS One 5, e9393 (2010).
Loughrey, M. B. et al. Identifying mismatch repair-deficient colon cancer: near-perfect concordance between immunohistochemistry and microsatellite instability testing in a large, population-based series. Histopathology 78, 401–413 (2021).
Shia, J. The diversity of tumours with microsatellite instability: molecular mechanisms and impact upon microsatellite instability testing and mismatch repair protein immunohistochemistry. Histopathology 78, 485–497 (2021).
Kautto, E. A. et al. Performance evaluation for rapid detection of pan-cancer microsatellite instability with MANTIS. Oncotarget 8, 7452–7463 (2017).
Vanderwalde, A., Spetzler, D., Xiao, N., Gatalica, Z. & Marshall, J. Microsatellite instability status determined by next-generation sequencing and compared with PD-L1 and tumor mutational burden in 11,348 patients. Cancer Med. 7, 746–756 (2018).
Middha, S. et al. Reliable pan-cancer microsatellite instability assessment by using targeted next-generation sequencing data. JCO Precis. Oncol. 1–17 (2017).
Bartels, S. et al. Concordance in detection of microsatellite instability by PCR and NGS in routinely processed tumor specimens of several cancer types. Cancer Med. https://doi.org/10.1002/cam4.6293 (2023).
Niu, B. et al. MSIsensor: microsatellite instability detection using paired tumor-normal sequence data. Bioinformatics 30, 1015–1016 (2014).
Zhu, L. et al. A novel and reliable method to detect microsatellite instability in colorectal cancer by next-generation sequencing. J. Mol. Diagn. 20, 225–231 (2018).
Wang, Z. et al. Plasma-based microsatellite instability detection strategy to guide immune checkpoint blockade treatment. J. Immunother. Cancer 8, e001297 (2020).
Salipante, S. J. et al. Microsatellite instability detection by next generation sequencing. Clin. Chem. 60, 1192–1199 (2014).
Jia, P. et al. MSIsensor-pro: FAst, Accurate, and Matched-normal-sample-free Detection of Microsatellite Instability. Genomics, Proteom. Bioinforma. 18, 65–71 (2020).
Li, Z. et al. Genomic landscape of microsatellite instability in Chinese tumors: a comparison of Chinese and TCGA cohorts. Int. J. Cancer 151, 1382–1393 (2022).
Swets, M. et al. Microsatellite instability in rectal cancer: what does it mean? A study of two randomized trials and a systematic review of the literature. Histopathology 81, 352–362 (2022).
Harfe, B. D. & Jinks-Robertson, S. DNA mismatch repair and genetic instability. Annu. Rev. Genet. 34, 359–399 (2000).
Zou, T. et al. Rho GTPases: RAC1 polymorphisms affected platinum-based chemotherapy toxicity in lung cancer patients. Cancer Chemother. Pharm. 78, 249–258 (2016).
Wu, X. et al. Polymorphisms in the VEGFA promoter are associated with susceptibility to hepatocellular carcinoma by altering promoter activity. Int. J. Cancer 133, 1085–1093 (2013).
Chae, Y. et al. Association of MGMT-535G>T polymorphism with prognosis for patients with metastatic colorectal cancer treated with oxaliplatin-based chemotherapy. J. Clin. Oncol. 28 (15_suppl), e14067 (2010).
Richards, S. et al. Standards and guidelines for the interpretation of sequence variants: a joint consensus recommendation of the American College of Medical Genetics and Genomics and the Association for Molecular Pathology. Genet Med 17, 405–424 (2015).
Acknowledgements
We would like to thank Hank Yang for his invaluable guidance and mentorship throughout the development of this research. His expertise in algorithm design and data analysis techniques proved instrumental in shaping the analytical framework of this study. Additionally, this work would not have been possible without the generous participation of the study participants who voluntarily contributed their data. Their willingness to engage with our research protocol has enabled us to generate meaningful insights into molecular diagnostics of MSI. We deeply appreciate their trust and commitment to advancing scientific knowledge. This research is funded by Yunnan Province XingDian Talent Support Plan (XDYC-YLWS-2023-0068) and 3DMed Biomedical Technology Co., Ltd., Shanghai, China.
Author information
Contributions
Y.S., L.Y., and P.H. performed data analysis and wrote the main manuscript. S.Z. and J.X. collected and structuralized NGS test results. Y.Z. conducted the promoter methylation assay and structuralized the results. X.R. designed statistical analysis. D.Z. collected and structuralized metadata. Q.H. and X.L. designed the study. All authors reviewed the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Shang, Y., Yang, L., Hu, P. et al. Applying next-generation sequencing to detect microsatellite instability in pan-cancer patients: a retrospective study of 35,563 Chinese cases. npj Precis. Onc. 9, 303 (2025). https://doi.org/10.1038/s41698-025-01096-0