Optimizing the NGS-based discrimination of multiple lung cancers from the perspective of evolution

Wang, Ziyang; Yuan, Xiaoqiu; Sun, Kunkun; Wu, Fang; Liu, Ke; Jin, Yiruo; Chervova, Olga; Nie, Yuntao; Yang, Airong; Jin, Yichen; Li, Jing; Li, Yun; Yang, Fan; Wang, Jun; Beck, Stephan; Carbone, David; Jiang, Guanchao; Chen, Kezhong

doi:10.1038/s41698-024-00786-5

Article
Open access
Published: 14 January 2025

Ziyang Wang^1,2,3^na1,
Xiaoqiu Yuan^1,2,3,4^na1,
Kunkun Sun⁵^na1,
Fang Wu ORCID: orcid.org/0000-0002-6627-3437^6,7,8^na1,
Ke Liu⁹^na1,
Yiruo Jin^1,2,3,4,
Olga Chervova ORCID: orcid.org/0000-0002-1671-9488¹⁰,
Yuntao Nie¹¹,
Airong Yang⁹,
Yichen Jin^1,2,3,
Jing Li⁹,
Yun Li^1,2,3,
Fan Yang^1,2,3,
Jun Wang ORCID: orcid.org/0000-0002-7033-5012^1,2,3,
Stephan Beck ORCID: orcid.org/0000-0001-5290-2151¹⁰,
David Carbone ORCID: orcid.org/0000-0003-3002-1921¹²^na2,
Guanchao Jiang^1,2,3^na2 &
Kezhong Chen ORCID: orcid.org/0000-0002-9723-6153^1,2,3

npj Precision Oncology volume 9, Article number: 14 (2025)

Abstract

Next-generation sequencing (NGS) offers a promising approach for differentiating multiple primary lung cancers (MPLC) from intrapulmonary metastasis (IPM), although panel selection and clonal interpretation remain challenging. Whole-exome sequencing (WES) data from 80 lung cancer samples were used to simulate MPLC and IPM, and virtual sequencing panels of various sizes were constructed through gene subsampling. Two clonal interpretation approaches commonly applied in clinical practice, MoleA (based on comparison of shared mutations) and MoleB (based on probability calculation), were then evaluated. ROC analysis highlighted the superior performance of MoleB, especially with the NCCNplus panel (AUC = 0.950 ± 0.002), whereas MoleA required a pan-cancer panel to reach acceptable accuracy (AUC = 0.792 ± 0.004). In two independent cohorts (WES cohort, N = 42; non-WES cohort, N = 94), NGS-based methodologies effectively stratified disease-free survival, with NCCNplus MoleB stratifying prognosis in both cohorts. Phylogenetic analysis further revealed evolutionary distinctions between MPLC and IPM. Together, these results establish an optimized NGS-based framework for differentiating multiple lung cancers.

Introduction

Multiple lung cancers (MLCs) refer to the occurrence of more than one malignant tumor in the lung(s); tumors of independent origin are termed multiple primary lung cancers (MPLC), whereas those sharing an original clone are intrapulmonary metastases (IPM). Lung cancer remains the leading cause of cancer-related death worldwide¹, and MLCs account for approximately 4.5–15.8% of lung cancers^2,3. Clinical management differs markedly between MPLC and IPM^4,5,6: MPLC patients usually undergo curative resection and achieve long-term survival, whereas adjuvant or neoadjuvant therapies are preferred for IPM given its worse prognosis^7,8. Misclassification of MLCs leads to improper treatment and should therefore be avoided. Clinicopathology has laid the foundation for discriminating MPLC from IPM, with comprehensive histology assessment (CHA)⁹ recommended by guidelines^6,10,11,12; nevertheless, it is insufficient in certain cases, with a sensitivity of 76–78% and a specificity of 47–74%^13,14. The advent of next-generation sequencing (NGS) has revolutionized the discrimination of MLCs^5,13,14,15,16, but an explicit protocol accessible to medical practitioners is still lacking. A few studies tentatively suggested, without validation, that at least 100 genes are required to define the clonal relatedness of MLCs¹⁷. Moreover, most studies empirically interpreted sequencing results by counting the number of shared mutations, while only a few attempted to calculate the clonal probability from all mutations with bioinformatic tools^16,17,18. The critical questions of which panel to use and how to infer clonal relatedness from sequencing results therefore remain unresolved. We designed this integrative study (Fig. 1) to fill these gaps and establish a detailed procedure.

Results

Exploration of panel size and clonal relatedness analysis from previous studies

We conducted a comprehensive comparison of different diagnostic methods, incorporating clinicopathological data and 2842 molecularly testable variants identified in 495 patients from 15 retrospective studies (Fig. 2, Supplementary Tables 1 and 2). Here, we present three main conclusions and the related challenges in discriminating MLCs. First, although NGS resolved all cases with an equivocal pathological diagnosis in these studies, it did not outperform clinical or pathological methods, given the similar inconclusive rates (MoleA: 6.7%, 29/435; MoleB all-gene: 9.7%, 48/495; MoleB >50 genes: 5.1%, 6/314; vs. CHA: 9.8%, 27/274) (Supplementary Tables 3 and 4)^15,18,19,20,21,22,23,24 and consistent failures in prognosis stratification (Supplementary Table 2)^15,17,22,23,25. Second, the optimal method for assessing clonal relatedness has not been validated against a gold standard. Although counting shared mutations and calculating clonal probability show similar efficacy, the lack of follow-up data hinders more definitive validation (Fig. 2A, Supplementary Tables 1, 4 and 5). Third, panel selection remains confusing. The similar inconclusive rates of the “50 genes” and “>300 genes” groups (6.7% vs. 4.5%) suggest that blindly expanding the panel yields diminishing marginal returns (Fig. 2B). However, expanding sequencing coverage can still resolve uncertainties: the sequencing results of rarely analyzed genes clarified 21 of 32 inconclusive cases (65%) (Fig. 2C). Together with the substantial inconsistency between pan-cancer panels and the hotspot mutations in MLCs (Fig. 2D), this underscores the importance of optimizing panels specifically for MLC identification. We therefore conducted further investigations to address these key issues.

**Fig. 2: Data extracted from previous studies suggest the superiority of NGS in discriminating MLCs.**

Simulated data identifies the optimal panels and molecular approaches for interpreting clonal relatedness

WES data from 235 samples of 80 patients with solitary NSCLC treated at our institution were used for simulation. Review of the mutational landscape of the mimicked MLCs²⁶ shows that the vast majority (92.6%) of sim-IPM pairs share at least one mutation, whereas most sim-MPLC pairs (89.3%) carry no identical mutations (Fig. 3A, Supplementary Table 6), partially supporting the validity of the simulation method. We therefore first compared the effectiveness of two interpretation methods, MoleA and MoleB (detailed in the “Methods” section)²⁷. Overall, as the sequencing range broadens from one gene (EGFR) to WES, diagnostic uncertainty declines (MoleA from 62.2% ± 0.59% to 1.68% ± 0.16%; MoleB from 43.7% ± 0.90% to 0.0% ± 0.0%, Fig. 3B) while the AUC increases (MoleA from 0.437 ± 0.009 to 0.949 ± 0.002; MoleB from 0.910 ± 0.005 to 0.987 ± 0.001, Fig. 3C). For every panel, MoleB diagnoses more cases than MoleA and achieves higher AUCs, implying the superiority of bioinformatic interpretation in assessing clonal relatedness.
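
The simulation and ROC comparison above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: sim-IPM pairs are assumed to be two samples from the same patient and sim-MPLC pairs samples from different patients, a virtual panel is obtained simply by discarding mutations outside its gene list, and the raw shared-mutation count serves as a crude, MoleA-like score (it is not the MoleA rule itself). The data layout and the function names (`restrict_to_panel`, `simulate_pairs`, `auc_mann_whitney`, `panel_auc`) are hypothetical.

```python
from itertools import combinations

# Hypothetical input layout: {sample_id: (patient_id, {(gene, protein_change), ...})}


def restrict_to_panel(mutations, panel):
    """Keep mutations in genes covered by a virtual panel (panel=None means WES)."""
    if panel is None:
        return set(mutations)
    return {m for m in mutations if m[0] in panel}


def simulate_pairs(samples, panel=None):
    """Yield (label, n_shared, n_private_a, n_private_b) for every pair of samples.

    Assumed labelling: two samples from the same patient form a sim-IPM pair
    (label 1); samples from different patients form a sim-MPLC pair (label 0).
    """
    for (_, (pat_a, mut_a)), (_, (pat_b, mut_b)) in combinations(sorted(samples.items()), 2):
        a = restrict_to_panel(mut_a, panel)
        b = restrict_to_panel(mut_b, panel)
        yield (1 if pat_a == pat_b else 0), len(a & b), len(a - b), len(b - a)


def auc_mann_whitney(scores_pos, scores_neg):
    """ROC AUC = P(positive score > negative score) + 0.5 * P(tie)."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(scores_pos) * len(scores_neg))


def panel_auc(samples, panel=None):
    """AUC of the shared-mutation count for separating sim-IPM from sim-MPLC pairs."""
    pairs = list(simulate_pairs(samples, panel))
    pos = [shared for label, shared, _, _ in pairs if label == 1]
    neg = [shared for label, shared, _, _ in pairs if label == 0]
    return auc_mann_whitney(pos, neg)
```

Calling `panel_auc(samples, panel={...})` once per gene subset gives one point on an AUC-versus-panel-size curve; repeating it over random gene subsets of the same size would give the mean ± SD style of summary reported above.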

**Fig. 3: Simulation data reveals the optimal panels and superiority of bioinformatic analysis in judging clonal relatedness.**

Furthermore, we sought to identify the optimal panels for MLC discrimination. The inflection points on the curves of inconclusive fractions and AUC values indicate candidate optimal coverages (Fig. 3B, C), balancing the accuracy lost through panel simplification against the limited benefit of panel expansion. Based on the rates of decline and growth, the inflection points occur at 10 genes (TP53 plus nine driver genes recommended by the NCCN²⁸: EGFR, KRAS, ALK, BRAF, ERBB2, MET, RET, ROS1, and PIK3CA) for MoleB and at 363 genes (a pan-cancer panel) for MoleA (Fig. 3B, C). Therefore, the 10-gene panel plus MoleB appears sufficient in most cases, whereas a pan-cancer panel of at least 363 genes is necessary when bioinformatic support is limited. Although inconclusive results disappear only with WES MoleB, the ROC curves (Fig. 3D, E) show excellent and comparable accuracy for 10-gene MoleB (AUC = 0.950 ± 0.002) and WES MoleB (AUC = 0.983 ± 0.001). The 363-gene MoleA performs worse than WES MoleA (AUC = 0.949 ± 0.002) but still shows an acceptable AUC of 0.792 ± 0.004 (Fig. 3F, G). ROC curves for other coverages are available in Supplementary Fig. 1 (MoleA) and Supplementary Fig. 2 (MoleB).
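
The inflection points were chosen from the rates of decline and growth of the curves; the exact criterion is not given in this section. One common heuristic for locating such an elbow, swapped in here purely for illustration, is to rescale both axes to [0, 1] and take the point farthest from the chord joining the first and last points. The sketch applies it on a log10 gene-count axis; both the heuristic and the log scale are assumptions, and `elbow_point` is a hypothetical helper.

```python
import math


def elbow_point(panel_sizes, values):
    """Panel size at the elbow of a metric curve plotted against log10(panel size).

    Both axes are rescaled to [0, 1]; the elbow is the point farthest from the
    chord joining the first and last points (the chord becomes the diagonal).
    Works for rising (AUC) and falling (inconclusive fraction) curves alike.
    """
    xs = [math.log10(n) for n in panel_sizes]
    best_size, best_dist = panel_sizes[0], -1.0
    for size, x, y in zip(panel_sizes, xs, values):
        x_norm = (x - xs[0]) / (xs[-1] - xs[0])
        y_norm = (y - values[0]) / (values[-1] - values[0])
        dist = abs(y_norm - x_norm)  # proportional to the distance from the diagonal
        if dist > best_dist:
            best_size, best_dist = size, dist
    return best_size
```

The inputs would be the per-panel mean AUCs or inconclusive fractions from the simulation, ordered by panel size; a curve that rises steeply and then plateaus returns the panel size where the plateau begins.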

MLCs cohorts verify the optimal panels and clonal relatedness analysis

The clinical characteristics of the two validation MLC cohorts are provided in Table 1. Stage I disease, never-smokers, lung adenocarcinoma, and synchronous MLCs predominate in both cohorts. The average follow-up time was 51 months.

Table 1 Clinical characteristics of MLCs patients

First, we verified the superiority of NGS and identified the optimal panels. In the WES cohort, the clinical method and CHA were clearly less conclusive than the two molecular methods. Although CHA has been regarded as the gold-standard diagnostic method, 14 cases were inconclusive or undiagnosable by CHA. In contrast, molecular testing displayed higher conclusive rates (10-gene MoleB = 75.6%, WES MoleB = 100%, 363-gene MoleA = 82.9%, WES MoleA = 87.8%) (Fig. 4A).

**Fig. 4: Analysis of the WES cohort verifies the superiority of optimal panels in discriminating MLCs.**

We then evaluated the diagnostic methods by their ability to differentiate prognosis. After excluding five non-stage I patients to reduce bias from tumor stage, all molecular methods successfully distinguished DFS between MPLC and IPM (Fig. 4D, 10-gene MoleB P = 0.016; Fig. 4E, 363-gene MoleA P = 0.029; Supplementary Fig. 3L, WES MoleB P = 0.033; Supplementary Fig. 4L, WES MoleA P = 0.029), whereas the clinical method and CHA failed (Fig. 4B, clinical P = 0.92; Fig. 4C, CHA P = 0.87). Additionally, all panels of ≥10 genes interpreted with MoleB and of ≥363 genes interpreted with MoleA could stratify prognosis (Fig. 4F, Supplementary Figs. 3 and 4), consistent with the simulation. Notably, although 3-gene MoleB could also stratify prognosis, its conclusive rate (24/41, 58.5%) is too low to provide meaningful clinical value.
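
The DFS comparisons above rest on P values from survival curves. Assuming these are two-sided log-rank tests (the usual choice for comparing Kaplan–Meier curves, although the test is not named in this section), a dependency-free sketch is shown below. `logrank_test` is a hypothetical helper; in practice a maintained library such as lifelines would normally be used.

```python
import math


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-sample log-rank test; returns (chi-square statistic with 1 df, two-sided P).

    events_*: 1 = relapse/metastasis observed, 0 = censored at last follow-up.
    """
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    o_minus_e, variance = 0.0, 0.0
    for t in sorted({t for t, e, _ in data if e}):
        n = sum(1 for ti, _, _ in data if ti >= t)  # at risk, both groups
        n_a = sum(1 for ti, _, g in data if ti >= t and g == 0)  # at risk, group A
        d = sum(1 for ti, e, _ in data if ti == t and e)  # events at time t
        d_a = sum(1 for ti, e, g in data if ti == t and e and g == 0)  # events in A
        o_minus_e += d_a - d * n_a / n  # observed minus expected in group A
        if n > 1:
            # Hypergeometric variance of the number of events in group A
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    if variance == 0:
        return 0.0, 1.0
    chi2 = o_minus_e ** 2 / variance
    return chi2, math.erfc(math.sqrt(chi2 / 2))  # survival function of chi-square(1 df)
```

For example, `logrank_test(dfs_mplc, events_mplc, dfs_ipm, events_ipm)` would compare the two diagnostic groups assigned by one method; the variable names are placeholders.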

Because high-depth NGS could amplify variation arising from single-site sampling, potentially yielding results inconsistent with those of the WES cohort, we also performed validation in a non-WES cohort that better reflects clinical practice. A similar trend was observed in this high-depth non-WES cohort, where both clinical and pathological methods failed to distinguish the prognosis of MLCs (all patients: clinical P = 0.6, pathological P = 0.99; patients tested with ≥10 genes, N = 49: clinical P = 0.53, pathological P = 0.69; Supplementary Fig. 5B, C). In contrast, 10-gene MoleB significantly stratified prognosis (P = 0.0011, Supplementary Fig. 5D), whereas 363-gene MoleA failed to separate the survival curves.

Therefore, 10-gene MoleB is the most cost-effective choice, meeting the requirements for both high conclusive rates and diagnostic accuracy. To further refine the sequencing strategy, we reviewed the mutations of the 10 patients left inconclusive by 10-gene MoleB and found that certain frequently mutated genes, such as TTN (the most frequent, at 36.4%; Fig. 4G), are typically excluded from routine sequencing because of their limited therapeutic value. However, because mutations in such genes are often passenger or background events that may reflect similar mutational landscapes without reliably indicating a shared clonal origin, designing an optimized diagnostic panel with the best cost-effectiveness still requires careful consideration and validation.

Next, we compared the different approaches for discriminating MLCs. To illustrate the challenges CHA may encounter, we present three pathological examples in Fig. 4H: MPLC (P38), IPM (P3), and an inconclusive case (P32). Diagnostic concordance was low between clinical and molecular methods, moderate between pathological and molecular methods, and high among molecular methods (Fig. 4I). CHA disagreed with WES MoleB in 15.4% (6/39) of tumor pairs. In these cases, histological similarity (P12, P15) or morphological differences (P5, P6, and P34) contradicted the mutational profiles (Supplementary Fig. 6). The variable relationship between genotype and phenotype may partially explain this discordance and highlights an inherent limitation of pathology. We then adopted WES MoleB as the reference to evaluate the validity of the molecular methods, because it is the only method that diagnosed all cases while effectively stratifying prognosis (Fig. 4J). 10-gene MoleB had the highest sensitivity for diagnosing IPM (i.e., the highest specificity for diagnosing MPLC) and balanced performance, given the small difference between its sensitivity and specificity, whereas 363-gene MoleA had the highest specificity for diagnosing IPM (i.e., the highest sensitivity for diagnosing MPLC).
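
With WES MoleB as the reference, the sensitivity and specificity of each molecular method can be tabulated as in the sketch below. It treats IPM as the positive class by default and skips tumor pairs that are inconclusive under either method; that handling of inconclusive calls, and the helper name `sens_spec`, are assumptions. Switching the positive class to MPLC swaps the two numbers, which is why the highest sensitivity for IPM is the same thing as the highest specificity for MPLC.

```python
def sens_spec(calls, reference, positive="IPM"):
    """Sensitivity and specificity of `calls` against `reference` for the `positive` class.

    Labels are 'IPM', 'MPLC', or anything else (treated as inconclusive).
    Pairs that are inconclusive under either method are skipped (an assumption).
    """
    conclusive = ("IPM", "MPLC")
    tp = fn = tn = fp = 0
    for call, ref in zip(calls, reference):
        if call not in conclusive or ref not in conclusive:
            continue
        if ref == positive:
            tp += call == positive  # bools count as 0/1
            fn += call != positive
        else:
            fp += call == positive
            tn += call != positive
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    return sensitivity, specificity
```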

Based on these findings, molecular testing should be performed alongside pathology, rather than replacing it, to achieve optimal results. The 10-gene panel (NCCNplus panel) is preferred when bioinformatic support is fully available, whereas NGS covering at least 363 genes (a pan-cancer panel) is required if bioinformatic support is limited. WES remains the ultimate solution for both MoleA and MoleB.

Evolutionary features of shared mutations in MLCs

We reviewed two cases that posed challenges for clinical or pathological discrimination but were resolved through phylogenetic reconstruction, providing a preliminary view of the evolutionary features of MLCs and their value in distinguishing MLCs.

Case 1 (P15, Fig. 5A) is a 73-year-old man who presented with two solid nodules, in the left upper lobe (T1) and the right upper lobe (T2). Pathology showed similar adenocarcinomas with a predominant acinar pattern. However, WES-MoleB yielded a final diagnosis of MPLC, which was further supported by the phylogenetic tree. He has been free of recurrence for 43 months. Case 2 (P34, Fig. 5B) is a 62-year-old man who had two pure solid tumors removed from the left upper lobe. The pathologist diagnosed MPLC because the lesions were adenocarcinomas with different proportions of acinar and papillary components. However, WES-MoleB and the phylogenetic tree both pointed to IPM. This patient received four cycles of chemotherapy and died of intracranial metastasis 33 months after surgery.

**Fig. 5: Representative cases of pathological misdiagnoses and exploration of MLCs evolution.**

This again demonstrates that MPLC and IPM follow markedly different evolutionary trajectories (Fig. 5C). In total, we identified 283 clonal driver events (MPLC: total 197, median 2, range 0–29; IPM: total 86, median 2, range 0–7), 98 subclonal drivers (MPLC: total 86, median 1, range 0–7; IPM: total 12, range 0–2), and 103 overlapping driver mutations in IPM (clonal: total 94, median 1, range 0–7; subclonal: total 9) (Supplementary Figs. 8 and 9). We first focused on shared mutations, which are most likely to affect the molecular differentiation of MLCs. Overall, MPLC and IPM showed a statistically significant difference in the clonal/subclonal classification of shared mutations (P = 0.0000539) (Fig. 5D, E, Supplementary Fig. 7). Specifically, overlapping mutations in MPLC were predominantly subclonal or neutral with low VAFs (76.9%, 20/26), suggesting independent lesion progression and coincidental overlap. In contrast, shared mutations in IPM included a larger proportion (72.7%, 8/11) of clonal mutations with VAFs close to 0.5, along with a minor proportion of subclonal mutations with low VAFs, possibly resulting from complex late-stage dissemination of metastatic clones, such as cross-seeding or reseeding (Figs. 5D and 6A)²⁹. Thus, if feasible in the future, parallel sequencing of all tumors in MLC patients, with VAF incorporated as a correction factor in clonality assessment, is highly recommended to recognize coincidental low-VAF overlapping mutations in MPLC. In addition, we found that only 4 of the 10 genes in the NCCNplus panel play a role in the evolution of MLCs. A modified MLC-specific panel based on unique driver mutations was developed (Supplementary Table 7) and successfully classified 41.67% of the cases left inconclusive by 10-gene MoleB.
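
The clonal/subclonal split of shared mutations above rests on VAF. A standard way to make VAFs comparable across tumors with different purity and copy number is to convert them to cancer cell fractions (CCF). The sketch below uses the usual point estimate CCF = VAF × (p·CN_t + 2(1 − p)) / (p·m), with tumor purity p, tumor copy number CN_t, and mutation multiplicity m. The clonal cut-off of 0.8 and the helper names are hypothetical and not taken from this study.

```python
def cancer_cell_fraction(vaf, purity, tumor_cn=2, multiplicity=1, normal_cn=2):
    """Point estimate of the fraction of cancer cells carrying a mutation (capped at 1)."""
    # Average number of copies of the locus per cell in the sequenced sample
    total_copies = purity * tumor_cn + (1 - purity) * normal_cn
    return min(1.0, vaf * total_copies / (purity * multiplicity))


def classify_shared(vaf_a, vaf_b, purity_a, purity_b, cn_a=2, cn_b=2, clonal_cutoff=0.8):
    """Label a mutation shared by two tumors 'clonal' only if it is clonal in both."""
    ccf_a = cancer_cell_fraction(vaf_a, purity_a, cn_a)
    ccf_b = cancer_cell_fraction(vaf_b, purity_b, cn_b)
    return "clonal" if min(ccf_a, ccf_b) >= clonal_cutoff else "subclonal"
```

In a pure diploid tumor a heterozygous clonal mutation has VAF ≈ 0.5 and CCF ≈ 1, matching the “VAF close to 0.5” pattern described for IPM; a low-VAF hit shared by chance between two MPLC lesions would typically fall below the cut-off in at least one tumor.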

**Fig. 6: Timing of mutations in MLCs evolution based on the WES cohort.**

Genome doubling in metastatic progression of MLCs

To decipher the timing of somatic events in each tumor, we further analyzed whole-genome doubling (WGD) in MLCs. Overall, WGD was observed in 43% (37/86) of all lesions, with a higher incidence in MPLC (46.8%, 29/62) than in IPM (36.0%, 9/25) (Supplementary Fig. 8). Previous studies suggested that in metachronous IPM most primary foci appear larger than the metastases³⁰, and that tumor sizes do not differ significantly between synchronous and metachronous MLCs¹⁵. MPLC lesions are often considered unrelated primary tumors, whereas IPM lesions tend to be linked through metastatic progression. We therefore preliminarily investigated the differences between the primary site (IPM-L) and the metastatic site (IPM-S) in IPM from the perspective of tumor evolution (Fig. 6). Consistent with previous findings^31,32, IPM-S more frequently harbored WGD than IPM-L (46.2%, 6/13 vs. 25.0%, 3/12). The temporal distribution of mutations in IPM-S differed significantly from those in MPLC, IPM-L, and SLC (a cohort of solitary lung cancers from our institution), whereas no significant differences were observed among MPLC, IPM-L, and SLC (P = 0.61) (Supplementary Fig. 9C). Moreover, most mutations in IPM-S were clonal, and more than 75% of them occurred before WGD, suggesting that metastasis may occur at early stages of cancer development. These findings suggest that WGD may serve as a potential biomarker for distinguishing IPM from MPLC from an evolutionary perspective. Additionally, the timing of overlapping mutations in IPM differed significantly from that in SLC (P = 0.0003), although this may be influenced by differences in sample size and population characteristics (Supplementary Fig. 9D). Similarly, the temporal distributions of mutations in TRACERx³³ and SLC differed significantly (Supplementary Fig. 9, P < 0.001), further emphasizing the impact of population differences on the results.
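
Timing a mutation relative to WGD is commonly done through its multiplicity, the number of chromosome copies that carry it: a clonal mutation present on two or more copies of a duplicated allele must have arisen before the doubling, whereas a single-copy mutation on a duplicated allele arose afterwards. The sketch below is a simplified version of that idea, assuming the mutation is clonal and that purity and allele-specific copy numbers are already known; it is not the timing method used in this study, and the helper names are hypothetical.

```python
def mutation_multiplicity(vaf, purity, tumor_cn, normal_cn=2):
    """Estimated number of mutated copies per cancer cell, assuming the mutation is clonal."""
    return vaf * (purity * tumor_cn + (1 - purity) * normal_cn) / purity


def time_relative_to_wgd(vaf, purity, tumor_cn, major_cn):
    """Return 'pre-WGD', 'post-WGD', or 'uninformative' (simplified rule).

    major_cn is the copy number of the more abundant allele at the locus.
    """
    if major_cn < 2:
        # Without a duplicated allele, multiplicity cannot reveal the timing
        return "uninformative"
    m = round(mutation_multiplicity(vaf, purity, tumor_cn))
    m = max(1, min(m, major_cn))  # keep the estimate within the plausible range
    return "pre-WGD" if m >= 2 else "post-WGD"
```

For a pure tumor with total copy number 4 (2 + 2) at a locus, a VAF of 0.5 implies two mutated copies (pre-WGD), while a VAF of 0.25 implies one (post-WGD).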

Recommended process for clinicopathologic-molecular discrimination of MLCs

Incorporating the preceding findings, we propose an integrative procedure for discriminating MLCs (Fig. 7):
(1) After preoperative prediction of multiple malignancies, the discrimination of MLCs starts with clinical evaluation of lymph nodes and systemic metastases.
(2) With preoperative biopsy of multiple foci, molecular diagnosis is possible but faces practical difficulties, whereas pathological discrimination is not applicable³⁴.
(3) Resected tumors are routinely subjected to pathology, and some typical cases can be diagnosed with CHA. Clinical and pathological assessments should follow the Martini criteria¹⁰, ACCP guidelines¹², or IASLC proposals⁶.
(4) Molecular evaluation is recommended alongside pathology in all cases to reduce misdiagnosis, and should be strongly prioritized in cases with an equivocal pathological diagnosis or when experienced pathologists are not available.
(5) Bioinformatic support for analyzing detected mutations is recommended to interpret clonal relatedness.
(6) On this basis, the NCCNplus panel is the preferred first choice, followed by WES, with the MLC-modified panel as an alternative. Blindly using a large panel from the outset is not recommended. For example, when different oncogenic drivers are found, or when an oncogenic driver is present in one sample but absent in another, the result is highly reliable and there is no need to expand the sequencing scope.
(7) If no mutations are detected in any lesion, WES should be applied to resolve the inconclusiveness.
(8) Techniques based on other molecular markers require further validation.
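
The molecular part of this procedure amounts to a small decision rule. The sketch below encodes one reading of it purely as an illustration: the panel names are descriptive strings rather than product identifiers, the call labels ('MPLC', 'IPM', 'inconclusive') and function names are hypothetical, and routing every remaining inconclusive result to WES (with the MLC-modified panel as the alternative) is a simplification of the procedure rather than a validated algorithm.

```python
def first_line_test(bioinformatics_available):
    """Initial panel and interpretation method, following the recommendations above."""
    if bioinformatics_available:
        return "NCCNplus 10-gene panel", "MoleB"
    return "pan-cancer panel (>=363 genes)", "MoleA"


def next_step(call, mutations_per_lesion):
    """What to do after the initial test (hypothetical labels).

    call: 'MPLC', 'IPM' or 'inconclusive'.
    mutations_per_lesion: one set of detected mutations per lesion.
    """
    if call in ("MPLC", "IPM"):
        return "report result"  # e.g. discordant oncogenic drivers need no wider panel
    if not any(mutations_per_lesion):
        return "WES"  # no mutation detected in any lesion
    return "WES or MLC-modified panel"  # WES preferred; modified panel as an alternative
```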

**Fig. 7: Recommended procedure for discriminating MLCs.**

Discussion

Although NGS can determine tumor clonal relatedness, the details of its application remain controversial. To our knowledge, this study is the first to systematically investigate the optimal panels and the mutation-based interpretation of clonal relatedness; it also explored the evolutionary characteristics of MLCs in depth and developed a feasible process for discriminating MLCs that integrates clinicopathology and NGS.

Generally, MoleA clearly underperforms MoleB when applied to the same panel, likely because of its qualitative nature and its susceptibility to limited coverage. In contrast, reproducible calculations can minimize human error in ambiguous cases. To date, three studies have used bioinformatic analysis to discriminate MLCs; however, their implications should be interpreted with caution. Ezer et al.¹⁸ proposed an algorithm that considers only shared mutations and failed to distinguish prognosis. Chang et al.¹⁷ and Goodwin et al.¹⁶ both used the SNVtest function from the “Clonality” package, which is less accurate and less adaptable to target-specific parameter adjustments than the mutation.rem used in this study³⁵. Other limitations include the relatively short follow-up of 15 months and the failed prognosis analysis in Chang’s cohort¹⁷, as well as the exclusive enrollment of MPLC cases diagnosed using the Martini criteria in Goodwin’s study¹⁶. In addition, ever-smokers predominated in these cohorts, which differs from the characteristics of MLC patients in East Asian populations. Notably, we implemented several analytical optimizations. Instead of the default TCGA datasets^36,37, mutational datasets from Chinese NSCLC^38,39 were used as references to account for population-specific mutational differences. Additionally, to remove potential bias caused by different sequencing depths, two MLC cohorts were used for validation. Low-depth sequencing commonly impedes the capture of low-frequency mutations, whereas high-depth sequencing may be limited by budget constraints⁴⁰. Theoretically, based on the principles of NGS, the minimum depths required to detect ≥95% of subclonal and clonal mutations are 91× and 29×, respectively⁴¹. Consistent with these theoretical estimates, the low-depth WES used in our study appears sufficient for distinguishing MLCs.
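
The 91× and 29× figures come from ref. 41, whose exact assumptions are not restated here. The underlying idea can be sketched with a binomial model: at depth d, the number of variant-supporting reads follows Binomial(d, VAF), and a mutation counts as detected if at least k reads support it. The VAFs and the threshold k = 3 used below are placeholder assumptions, so the sketch is not expected to reproduce the published numbers; the function names are hypothetical.

```python
import math


def detection_probability(depth, vaf, min_alt_reads=3):
    """P(at least `min_alt_reads` variant reads) when alt reads ~ Binomial(depth, vaf)."""
    p_miss = sum(
        math.comb(depth, k) * vaf ** k * (1 - vaf) ** (depth - k)
        for k in range(min_alt_reads)
    )
    return 1.0 - p_miss


def minimum_depth(vaf, min_alt_reads=3, target=0.95, max_depth=10_000):
    """Smallest depth at which the detection probability reaches `target`."""
    for depth in range(min_alt_reads, max_depth + 1):
        if detection_probability(depth, vaf, min_alt_reads) >= target:
            return depth
    raise ValueError("target not reached within max_depth")
```

Because subclonal mutations have lower expected VAFs than clonal ones, `minimum_depth` rises steeply as the VAF falls, which is the qualitative reason subclonal detection needs a higher depth than clonal detection.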

We also observed that intratumor heterogeneity (ITH) may affect molecular discrimination. In a manual review of the 17 MLC patients with multi-point sampling, the result for one patient changed when a different sampling point within the same tumor was selected. This case carried a heavy burden of subclonal mutations (private mutations in point 1 : private mutations in point 2 : shared mutations = 483:455:11). This aligns with our earlier observation that distinguishing incidental subclonal overlap in MPLC from low-VAF shared mutations in IPM is challenging (Fig. 5D). As the sequencing range expands, more unshared subclonal mutations are detected, while the number of overlapping mutations plateaus. The weight of unshared mutations may reverse the inferred clonal relatedness in some cases, as reflected by the leftward shift in the distribution of the clonality signal ξ (Supplementary Table 8). Therefore, we suggest incorporating additional metrics related to cancer evolutionary dynamics, such as VAF, cancer cell fraction (CCF), and copy number variation (CNV), for a more accurate assessment of clonal relatedness, rather than relying solely on the number of mutations. Additionally, although single-region biopsy can capture the majority of cancer mutations⁴², multi-point sampling is strongly recommended for large or highly heterogeneous tumors to avoid misinterpretation⁴³. Phylogenetic quantification to cross-validate clonal relatedness is promising⁴⁴, although it is still far from widely adopted in clinical practice.
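
Why extra private mutations pull ξ to the left can be seen in a deliberately simplified mixture model. Each mutation observed in at least one tumor has a known population frequency p; with probability ξ it is inherited from a common clone (so it is shared), and otherwise each tumor acquires it independently. This toy model ignores mutations absent from both tumors and treats the population frequencies as known; it is not the model implemented in the Clonality package or in this study, and the function names are hypothetical.

```python
import math


def log_likelihood(xi, mutations):
    """Log-likelihood of the observed sharing pattern under the toy mixture model.

    mutations: (population_frequency, in_tumor_a, in_tumor_b) for every mutation
    seen in at least one of the two tumors.
    """
    total = 0.0
    for p, in_a, in_b in mutations:
        if in_a and in_b:
            prob = xi * p + (1 - xi) * p * p  # inherited from a common clone, or by chance
        else:
            prob = (1 - xi) * p * (1 - p)  # private to one tumor
        if prob <= 0:
            return -math.inf
        total += math.log(prob)
    return total


def estimate_xi(mutations, grid_size=1000):
    """Grid-search MLE of xi in [0, 1] and the likelihood-ratio statistic against xi = 0."""
    best_xi, best_ll = 0.0, -math.inf
    for i in range(grid_size + 1):
        xi = i / grid_size
        ll = log_likelihood(xi, mutations)
        if ll > best_ll:
            best_xi, best_ll = xi, ll
    return best_xi, 2 * (best_ll - log_likelihood(0.0, mutations))
```

With two shared rare mutations (p = 0.01), adding 10 private mutations gives ξ̂ ≈ 0.16, while adding 20 gives ξ̂ ≈ 0.08: the shared evidence is unchanged, but the estimate shifts left as unshared mutations accumulate.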

This study has some limitations. First, despite our best efforts, the sample size was restricted by the inability to obtain adequate specimens or by poor DNA quality. Larger datasets are warranted for parameter estimation, and external validation cohorts would further solidify the conclusions. Second, there are discrepancies between the mimicked and real mutation spectra of MLCs (Figs. 3A and 5C). Nevertheless, reassessment of the simulation data suggests that the method is acceptable, especially given the current lack of validated MLC datasets. Third, the clonal relationship analysis was limited to SNVs. We aim to incorporate CNV and fusion analysis into molecular identification in future investigations. Including synonymous mutations is another potential optimization⁴⁵; they have been found helpful in clonality assessment⁴⁶, which is also supported by two of our cases (P20 and P23). Nevertheless, the “Clonality” package remains sufficient in most cases, given its consistently satisfactory performance. Moreover, because of objective factors such as sample size, several research topics warrant further investigation. For instance, diagnostic panels for the relatively rare non-adenocarcinomas, which are difficult to diagnose pathologically, require improvement. Because multiple non-adenocarcinomas typically exhibit greater mutational heterogeneity and genomic instability, a broader sequencing scope or more targeted diagnostic panels may improve diagnostic efficiency. In addition, distinct mutational profiles across adenocarcinoma subtypes^47,48, as well as intergenic correlations between hotspot mutations^17,25 that may affect diagnostic efficiency, call for larger sequencing studies and wider clinical application.

Today’s clinicians face an intricate array of molecular tests and struggle to avoid inaccuracy while reducing the economic burden. By modeling different panels through subsampling of sequencing data and by evolutionary analysis, we clarified the optimal panels and appropriate interpretation methods, and proposed a feasible procedure for discriminating MLCs.

Methods

Preliminary exploration of NGS-based discrimination through systematic review

Two authors (Z.Y.W. and X.Q.Y.) independently performed comprehensive online searches of four databases (PubMed, Web of Science, Scopus, and the Cochrane Library) from inception to September 20, 2021, using the terms “multiple lung cancers”, “MLCs”, “multiple primary lung cancers”, “MPLC”, “intrapulmonary metastasis”, and “IPM” in combination with “next-generation sequencing”. The search was restricted to original studies published in English with main texts available for extraction of genomic profiles. A total of 89 records were obtained from the four databases. After duplicate removal and initial screening, 15 articles remained for full-text assessment. The screening process followed the PRISMA statement (Supplementary Table 10)⁴⁹. All included studies were scored 3 on the Newcastle–Ottawa Scale, as they were all retrospective cohort studies. Three investigators (Z.Y.W., K.L., and X.Y.Y.) read the full texts of the 15 articles and extracted data independently. Any disagreements were resolved by consultation with a fourth reviewer (Y.T.N.). We collected patient characteristics, identification results by clinicopathological criteria (including histopathological examination, the American College of Chest Physicians (ACCP) guideline, and the Martini–Melamed criteria), discrimination procedures for MLCs, prognosis, and available genomic alterations at different levels. If an article lacked essential data, the authors were contacted with requests for supplementary data. The bioinformatic analysis is detailed below. We calculated hazard ratios (HRs) for survival analysis from the extracted information using established methods⁵⁰.
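
One widely used way to recover an HR from published survival summaries, shown here only as an illustration and not necessarily the method of ref. 50, back-calculates the log-rank statistic from a reported two-sided P value and the number of events: V ≈ E·R₁·R₂/(R₁ + R₂)², O − E = ±z·√V, and ln HR = (O − E)/V with standard error 1/√V, where E is the total number of events and R₁, R₂ are the group sizes. The function name and its arguments are hypothetical.

```python
import math
from statistics import NormalDist


def hr_from_logrank_p(p_value, total_events, n_group1, n_group2, group1_worse=True):
    """Approximate HR (group 1 vs group 2) and 95% CI from a two-sided log-rank P value."""
    # Approximate variance of the log-rank statistic, allowing unequal group sizes
    v = total_events * n_group1 * n_group2 / (n_group1 + n_group2) ** 2
    z = NormalDist().inv_cdf(1 - p_value / 2)
    o_minus_e = (z if group1_worse else -z) * math.sqrt(v)
    log_hr = o_minus_e / v
    se = 1 / math.sqrt(v)
    z95 = NormalDist().inv_cdf(0.975)
    return math.exp(log_hr), math.exp(log_hr - z95 * se), math.exp(log_hr + z95 * se)
```

The direction of the effect cannot be recovered from a P value alone, so it has to be read from the published curves and passed in via `group1_worse`.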

Study design and patient selection of the validation cohorts

This study investigated the superiority of NGS, suitable panels, and precise interpretation in discriminating multiple lung cancers (MLCs). The study is reported according to the STROBE statement. We reviewed MLC patients who underwent curative surgery at Peking University People’s Hospital from January 2009 to December 2021. We selected patients who underwent surgical resection of more than one NSCLC and excluded those with (1) preoperatively confirmed or suspected systemic metastases, (2) malignant pleural or pericardial effusion, (3) diffuse pneumonic lesions, (4) a known history of primary cancer, or (5) all lesions diagnosed as ground-glass opacity (GGO). Two MLC cohorts were used for validation: a whole-exome sequencing (WES) cohort and a non-WES cohort. The WES cohort includes 42 patients, 17 of whom underwent multiregional WES. The non-WES cohort consists of 94 patients tested with NGS panels of different sizes (38 with 8 genes, 7 with 9 genes, and 49 with ≥10 genes) (Supplementary Table 6). The studies involving human participants were reviewed and approved by the Institutional Review Board and Ethics Committee of Peking University People’s Hospital (approval No. 2020PHB210-01), in compliance with the Declaration of Helsinki.

The patient tissue and blood samples, along with associated clinical and demographic data (e.g., age, diagnosis), were collected under written informed consent from all participants. All patient samples were deidentified and assigned unique sample IDs to ensure the privacy of patients’ information. Identifiable information linking patients to their samples was accessible only to study personnel involved in the clinical annotation of study data.

Clinical review and pathological review of the validation cohorts

We reviewed the electronic medical records of enrolled patients for the following clinical information: demographics (gender, age, and occupation), medical history (chief complaint, history of present illness, history of malignancies, smoking history, and family history of lung cancer), preoperative examinations (tumor markers, pulmonary function, and radiology), features of pulmonary nodules on chest CT (counts, size, locations, unilateral or bilateral, ground-glass or solid nodules, lymphadenopathy, and emphysema), surgical records (dates, surgical approach, and perioperative complications), and pathological details (histologic type, main subtype, pleural invasion, vascular invasion, and lymph node status). During the first 5 years after surgery, chest CT and abdominal ultrasound were performed every 6 months, and MRI and bone scans annually, at follow-up visits or at any time symptoms occurred. Additionally, patients or their family members were regularly followed up by telephone; the last follow-up for this study took place in August 2022. Disease-free survival (DFS) was defined as the time from the day of the last lung cancer surgery until the first relapse or metastasis, or the last follow-up. Four thoracic surgeons (K.Z.C., Z.Y.W., Y.C.J., X.Q.Y.) performed the clinical discrimination of MLCs using all collected information except follow-up data, following the 2013 ACCP guideline¹² and the International Association for the Study of Lung Cancer (IASLC) proposals⁶.
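
The DFS definition above maps directly onto a (time, event) pair per patient, from which a Kaplan–Meier curve can be estimated. The sketch below assumes dates are available as `datetime.date` objects and converts days to months using the mean month length (365.25/12 days); both the conversion and the helper names are assumptions rather than details from the study.

```python
from datetime import date
from typing import Optional

AVG_DAYS_PER_MONTH = 365.25 / 12  # assumption: mean month length for day-to-month conversion


def dfs_record(last_surgery: date, relapse: Optional[date], last_follow_up: date):
    """Return (DFS in months, event flag): 1 = relapse/metastasis, 0 = censored."""
    if relapse is not None:
        return (relapse - last_surgery).days / AVG_DAYS_PER_MONTH, 1
    return (last_follow_up - last_surgery).days / AVG_DAYS_PER_MONTH, 0


def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate at each distinct event time."""
    survival, curve = 1.0, []
    for t in sorted({t for t, e in zip(times, events) if e}):
        at_risk = sum(1 for ti in times if ti >= t)
        failed = sum(1 for ti, e in zip(times, events) if ti == t and e)
        survival *= 1 - failed / at_risk  # conditional probability of remaining event-free
        curve.append((t, survival))
    return curve
```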

Pathologic evaluation was performed by an experienced pathologist at Peking University People’s Hospital (K.K.S.) following the CHA criteria⁹. After excluding one patient whose pathological data were unavailable, 41 patients were assessed by CHA. Of these, four were indeterminate owing to technical factors (inconsistent staining in one, too little tissue in two, and severe tissue loosening in one), while 10 received relatively favorable but uncertain diagnoses.

Sequencing workflow for simulation cohort and validation cohort

DNA was extracted from formalin-fixed, paraffin-embedded (FFPE) tissues using the QIAamp DNA FFPE Tissue Kit (QIAGEN, Valencia, Calif), following the manufacturer’s protocols. To ensure the quality of the libraries, DNA was quantified with the Qubit dsDNA HS Assay Kit (Life Technologies, Carlsbad, Calif) and Qubit 2.0/3.0/4.0 Fluorometer (Life Technologies, USA).

For the 8-gene NGS panel, initial sequencing data for EGFR, KRAS, BRAF, ERBB2, and MET mutations were processed using Torrent Suite (5.0.4) and the Coverage Analysis plugin (5.0.4.0), together with the Ion Reporter (4.4) Oncomine DNA analysis workflow. The Torrent Mapping Alignment Program and the Torrent Variant Caller plugin (5.0.4.0) were used with default settings to align DNA sequences and call variants, respectively. Two filtering steps were applied to eliminate misidentified bases and generate final calls. ALK, ROS1, and RET fusions were analyzed using the Ion Reporter (4.4) workflow and the Torrent Mapping Alignment Program.

For the 31-, 68-, 143-, 363-, 457-, 520-, and 654-gene NGS panels, purified DNA was first fragmented to around 300 bp using 5X WGS Fragmentation Mix (Qiagen, USA), followed by end repair and adapter ligation. Targeted polymerase chain reaction (PCR) using the cSMART method⁵¹ and the xGen Exome Research Panel v1.0 (96 RXN; Integrated DNA Technologies, USA) were then used successively for library preparation. Sequencing was performed on the NovaSeq 6000 platform (Illumina, San Diego, USA) in 150PE mode at a depth of more than 1000x. We then used fastp⁵² to remove low-quality reads, the Burrows-Wheeler Aligner (BWA)⁵³ to align the filtered reads to the reference genome (hg19), Gencore⁵⁴ to remove duplicate reads, SAMtools⁵⁵ to call variants, and ANNOVAR⁵⁶ to annotate mutations. Non-synonymous SNVs with population allele frequencies lower than 0.05% and a variant allele fraction (VAF) higher than 0.5% were kept for further analysis. For the 520-gene NGS panel, targeted paired-end sequencing was performed on a NextSeq 500 sequencer (Illumina, Inc., USA) at a depth of more than 1000x, and variants with allele frequencies lower than 0.1% in the ExAC, 1000Genomes, dbSNP, and ESP6500SI-V2 databases were retained for further analysis.

For patients in the non-whole-exome sequencing (WES) cohort, the sequencing platform was chosen according to the patient’s preference. The gene coverage of each targeted panel is presented in Supplementary Table 6.

For the WES cohort, lung tumor tissue and adjacent normal tissue specimens were collected at surgical resection for histopathological diagnosis and further study. To assess the impact of intra-tumor heterogeneity (ITH), tumor tissue was sampled from two or three regions, depending on tumor size. Paired peripheral blood samples were collected immediately before surgery and fractionated into white blood cells and plasma. All samples were stored at −80 °C until further processing. Sequencing data from all sampled regions were included in the analysis. To eliminate false-positive calls, matched normal control samples were also sequenced to remove germline polymorphisms.

The library was prepared with the SureSelectXT Target Enrichment System (G7530-90000). Paired-end sequence data were generated on an Illumina HiSeq platform and aligned to the reference genome (hg19) using BWA. We used SAMtools to sort the raw alignments, Picard (http://broadinstitute.github.io/picard/) to mark duplicate reads, and GATK⁵⁷ to reduce false positives through local realignment and recalibration. SNVs and small indels were called by MuTect2⁵⁸ with default settings and annotated by ANNOVAR with reference to the HGVS, ExAC, 1000Genomes, dbSNP, OMIM, COSMIC, and ClinVar databases. Notably, all variants were retained regardless of their pathogenicity. Variants with population allele frequencies lower than 0.05% were subjected to further analysis. After excluding significant outlier data, we filtered out SNVs with a total depth below 20X or an alternate-allele depth below 4X in tumor samples, as well as SNVs with a depth below 10X in normal samples or with variant reads exceeding 1% of the reads in normal samples. For multi-region samples, variants detected in some but not all regions were re-called, because their absence from some regions might reflect low VAF; this reduced false-negative calls. The sequenced samples had an average purity of 0.52 (range, 0.14–1) (Supplementary Table 9). The average on-target sequencing depth was 129X (range: 88–180X) for tumor tissues and 119X (range: 36–175X) for adjacent normal tissues.
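The tumor/normal read-count filters above can be expressed as a single predicate. The following is a minimal Python sketch, not the pipeline actually used: the `Variant` record and its field names are hypothetical, and "variant reads exceeding 1% of the reads in normal samples" is assumed to mean an alternate-allele fraction above 1% in the matched normal.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    """Hypothetical per-site record; field names are illustrative only."""
    pop_af: float      # population allele frequency as a fraction (0.0005 == 0.05%)
    tumor_depth: int   # total reads covering the site in the tumor sample
    tumor_alt: int     # reads supporting the alternate allele in the tumor sample
    normal_depth: int  # total reads covering the site in the matched normal sample
    normal_alt: int    # reads supporting the alternate allele in the matched normal


def passes_filters(v: Variant) -> bool:
    """Return True if a candidate SNV survives the thresholds described in the text."""
    if v.pop_af >= 0.0005:                      # keep population AF < 0.05% only
        return False
    if v.tumor_depth < 20 or v.tumor_alt < 4:   # tumor: >= 20X total, >= 4X alternate
        return False
    if v.normal_depth < 10:                     # normal: >= 10X total
        return False
    if v.normal_alt > 0.01 * v.normal_depth:    # normal: alternate reads must not exceed 1%
        return False
    return True


calls = [
    Variant(0.0, 120, 30, 100, 0),   # passes every filter
    Variant(0.0, 15, 6, 100, 0),     # fails: tumor depth below 20X
    Variant(0.0, 120, 30, 100, 3),   # fails: 3% of normal reads carry the variant
    Variant(0.01, 120, 30, 100, 0),  # fails: common in the population (1%)
]
kept = [v for v in calls if passes_filters(v)]
```

Outlier exclusion and the multi-region re-calling step are not modeled in this sketch.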

Mutation-based analysis of clonal relatedness

We compared two methods for mutation-based analysis of clonal relatedness in the simulation and validation cohorts (Fig. 1): counting shared mutations (MoleA) and calculating clonal probability from all mutations (MoleB). MoleA represents the empirical approach commonly used in previous studies, which typically defines MPLC as paired tumors with no shared mutations or in which only one tumor harbors mutations; “inconclusive” refers to cases with no detected mutation. As an additional criterion, MoleA classified cases sharing only one hotspot mutation as “inconclusive”, with hotspots defined by the COSMIC database⁵⁹ and by the hotspots identified in 24,592 tumor samples by Chang et al.⁶⁰. MoleB uses the “Clonality” package to calculate clonal relatedness³⁸; here too, “inconclusive” refers to cases with no detected mutation. For MLC patients with three tumors, mutations were compared pairwise, and IPM was diagnosed if any pair was judged to be IPM.
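The MoleA rule and the pairwise rule for three-tumor patients can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the authors' code: mutations are hypothetical `"GENE:change"` strings, the small `hotspots` set stands in for the COSMIC and Chang et al. hotspot lists, "no detected mutation" is read as no mutation in either tumor, and any overlap other than a single shared hotspot is read as IPM.

```python
from __future__ import annotations

from itertools import combinations


def molea_pair(muts_a: set[str], muts_b: set[str], hotspots: set[str]) -> str:
    """Classify one tumor pair by counting shared mutations (MoleA-style rule)."""
    if not muts_a and not muts_b:
        return "inconclusive"      # assumption: no mutation detected in either tumor
    shared = muts_a & muts_b
    if not shared:
        return "MPLC"              # no overlap, including when only one tumor is mutated
    if len(shared) == 1 and shared <= hotspots:
        return "inconclusive"      # a single shared hotspot may arise independently
    return "IPM"                   # assumption: any other overlap is read as IPM


def patient_is_ipm(tumors: list[set[str]], hotspots: set[str]) -> bool:
    """For patients with three or more tumors: IPM if any pair is called IPM."""
    return any(molea_pair(a, b, hotspots) == "IPM" for a, b in combinations(tumors, 2))


hotspots = {"EGFR:L858R", "KRAS:G12C"}  # hypothetical hotspot list
print(molea_pair({"EGFR:L858R"}, {"KRAS:G12C"}, hotspots))    # MPLC
print(molea_pair({"EGFR:L858R"}, {"EGFR:L858R"}, hotspots))   # inconclusive
print(molea_pair({"EGFR:L858R", "TP53:R273H"},
                 {"EGFR:L858R", "TP53:R273H"}, hotspots))      # IPM
```

The text does not say how non-IPM pair calls are combined at the patient level, so `patient_is_ipm` only encodes the stated "any pair is IPM" rule.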

To evaluate the performance of different panels, we simulated the application of multiple panels to the same cases by subsampling the sequencing data of the included studies, the simulated cases, and the validation cohort. The coverages modeled in the review part were as follows (detailed in Supplementary Table 6): 1 gene, 3 genes, 9 genes, 10 genes (EGFR, ALK, KRAS, BRAF, ERBB2, MET, PIK3CA, RET, ROS1, TP53), and 50 genes. These ranges were chosen based on commonly used panels and authoritative guidelines such as the National Comprehensive Cancer Network (NCCN) guideline^28,61. Each larger panel covered the genes of the smaller panels (Supplementary Fig. 11), which ensured the feasibility of subsampling. Note that gene fusions were not included in the analysis, which inevitably diminishes, at least in part, the diagnostic value of certain genes such as ALK, RET, and ROS1.
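Panel subsampling amounts to restricting each sample's mutation calls to the genes a panel covers, which is only valid when the larger panel contains the smaller one. A minimal Python sketch, assuming calls are stored as hypothetical `(gene, alteration)` tuples; only the 10-gene set is taken from the text, and the example calls are invented for illustration.

```python
from __future__ import annotations

# The 10-gene set listed in the text; other panel definitions are in Supplementary Table 6.
PANEL_10 = {"EGFR", "ALK", "KRAS", "BRAF", "ERBB2", "MET", "PIK3CA", "RET", "ROS1", "TP53"}


def subsample_to_panel(mutations: set[tuple[str, str]], panel: set[str]) -> set[tuple[str, str]]:
    """Keep only (gene, alteration) calls in genes the panel covers.

    Fusions are absent from the input, mirroring the analysis described above.
    """
    return {m for m in mutations if m[0] in panel}


def is_nested(smaller: set[str], larger: set[str]) -> bool:
    """Subsampling from a larger panel is only valid if it contains the smaller one."""
    return smaller <= larger


wes_calls = {("EGFR", "L858R"), ("TP53", "R273H"), ("KEAP1", "G333C")}  # hypothetical
print(sorted(subsample_to_panel(wes_calls, PANEL_10)))
# [('EGFR', 'L858R'), ('TP53', 'R273H')]
```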

Clonal relatedness calculation using the “Clonality” package (MoleB)

This package has been described in detail by its developers^38,39,62. The first step is to estimate the frequencies of the detected mutations. Briefly, if a mutation was detected in A samples of the reference dataset and B samples of the study cohort, its frequency is estimated as (A + B)/(M + N), where M and N are the total numbers of cases in the reference dataset and the study cohort, respectively⁶². For this step, we queried the default TCGA data^38,39 in the systematic review but used mutational datasets built from Chinese NSCLC cohorts^36,37 in the simulation and validation parts.
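The pooled frequency estimate (A + B)/(M + N) can be written as a one-line function with basic input checks. This Python sketch only illustrates the arithmetic; the counts in the example are hypothetical and not taken from TCGA or the Chinese NSCLC datasets.

```python
def pooled_frequency(a_ref: int, b_study: int, m_ref: int, n_study: int) -> float:
    """Frequency of one mutation pooled over a reference dataset and the study cohort.

    a_ref: cases carrying the mutation in the reference dataset (A)
    b_study: cases carrying it in the study cohort (B)
    m_ref, n_study: total cases in the reference dataset (M) and study cohort (N)
    """
    if not (0 <= a_ref <= m_ref and 0 <= b_study <= n_study) or m_ref + n_study == 0:
        raise ValueError("counts must satisfy 0 <= A <= M, 0 <= B <= N, and M + N > 0")
    return (a_ref + b_study) / (m_ref + n_study)


# Hypothetical counts: seen in 30 of 500 reference cases and 2 of 42 study cases.
f = pooled_frequency(30, 2, 500, 42)
print(round(f, 4))  # 0.059
```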

In the second step, clonal relatedness is evaluated from two kinds of input: the number and frequencies of private (unshared) variants in a pair of tumors, which reflect independent clones, and the shared mutations, which indicate the likely degree of common origin^39,62. Two functions are available for this step. SNVtest is simpler and was adopted by previous studies of MLCs^16,17; it tests the null hypothesis that the two tumors arose independently⁶². The mutation.rem function estimates the probability that two tumors in an individual are clonally related; it had not been used to investigate MLCs but has shown higher accuracy in other cancers^38,39. We therefore chose this function for the simulation and validation parts. In the review part, however, we adopted SNVtest because of inter-study heterogeneity: SNVtest operates independently of the genetic profiles of the studied cohorts, whereas mutation.rem estimates model parameters specific to the studied population³⁹. Mutations extracted in the review were reverse-annotated with TransVar⁶³ to meet the input requirements of SNVtest; however, 76 genomic alterations could not be reverse-translated even after manual correction.
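Why shared rare mutations carry more weight than shared common ones can be seen with a deliberately simplified model. The Python sketch below assumes each tumor independently acquires a given mutation with probability equal to its population frequency f, so two independent primaries share it by chance with probability f². This is an illustration of the intuition only; it is not SNVtest or mutation.rem, which use their own likelihood models.

```python
from __future__ import annotations

import math


def p_shared_by_chance(f: float) -> float:
    """P(two independent tumors both carry a mutation of per-tumor frequency f)."""
    if not 0.0 <= f <= 1.0:
        raise ValueError("frequency must lie in [0, 1]")
    return f * f


def p_any_shared_by_chance(freqs: list[float]) -> float:
    """P(at least one listed mutation is shared by chance), assuming sites are independent."""
    p_none = math.prod(1.0 - p_shared_by_chance(f) for f in freqs)
    return 1.0 - p_none


# A common driver (f = 0.3) versus a rare passenger (f = 0.001), both hypothetical.
print(p_shared_by_chance(0.3), p_shared_by_chance(0.001))
```

Under this toy model a shared mutation at frequency 0.001 is about 90,000 times less likely to arise independently in both tumors than one at frequency 0.3, which is why a single shared hotspot is weak evidence of common origin.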

The third step, interpretation, also differs between the two functions. For SNVtest, a p-value below 0.05 rejects the null hypothesis and IPM is diagnosed, whereas MPLC is diagnosed only when the p-value exceeds 0.95; p-values between 0.05 and 0.95 are considered indeterminate. Interpretation of mutation.rem relies on an established probability cutoff (derived in this study as described below): probabilities above the cutoff indicate clonally related tumors (IPM), and those below it indicate MPLC.
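The two interpretation rules can be sketched as small functions. The thresholds 0.05 and 0.95 come from the text; the cutoff of 0.3 in the example is a placeholder, since the actual cutoffs were derived per panel, and treating a probability exactly equal to the cutoff as MPLC is an assumption. Cases with no detected mutation ("inconclusive") are assumed to be handled before these functions are called.

```python
def interpret_snvtest(p_value: float) -> str:
    """Three-way reading of an SNVtest p-value, using the thresholds in the text."""
    if p_value < 0.05:
        return "IPM"            # independence rejected
    if p_value > 0.95:
        return "MPLC"
    return "indeterminate"      # 0.05 <= p <= 0.95


def interpret_mutation_rem(prob_related: float, cutoff: float) -> str:
    """Binary reading of a clonal-relatedness probability against a panel-specific cutoff.

    Assumption: a probability exactly equal to the cutoff is read as MPLC.
    """
    return "IPM" if prob_related > cutoff else "MPLC"


print(interpret_snvtest(0.01), interpret_snvtest(0.5), interpret_snvtest(0.99))
# IPM indeterminate MPLC
print(interpret_mutation_rem(0.8, cutoff=0.3))  # IPM (0.3 is a placeholder cutoff)
```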

Simulation of precisely diagnosed MLCs

Given the lack of MLC cohorts with validated diagnoses, we paired sequencing profiles from multi-region sequencing of solitary NSCLCs to model MPLC and IPM⁶⁴. Samples from the same patient were paired to model IPM (sim-IPM), whereas samples from different patients were paired to model MPLC (sim-MPLC), generating 28,203 simulated cases of precisely diagnosed MLCs (details of these samples have been reported previously²⁶). The WES data of these cases were then subsampled to evaluate the performance of different molecular methods.
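The pairing scheme can be sketched by enumerating all unordered sample pairs and labeling each by whether both samples come from the same patient. This Python sketch assumes samples are hypothetical `(patient_id, region_id)` tuples; whether every possible pair was retained in the study is not stated here, so the counts it produces need not match the published total.

```python
from itertools import combinations


def simulate_pairs(samples):
    """Split all unordered sample pairs into sim-IPM and sim-MPLC.

    samples: list of (patient_id, region_id) tuples from multi-region sequencing.
    Same patient -> sim-IPM; different patients -> sim-MPLC.
    """
    sim_ipm, sim_mplc = [], []
    for s1, s2 in combinations(samples, 2):
        (sim_ipm if s1[0] == s2[0] else sim_mplc).append((s1, s2))
    return sim_ipm, sim_mplc


samples = [("P1", "R1"), ("P1", "R2"), ("P1", "R3"), ("P2", "R1"), ("P2", "R2")]
ipm, mplc = simulate_pairs(samples)
print(len(ipm), len(mplc))  # 4 6
```

For n samples this yields n(n − 1)/2 pairs in total, most of which are between-patient (sim-MPLC) pairs.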

For the bioinformatic analysis of the modeled MLCs, we first randomly selected 56 of the 80 SLC patients and used the simulated cases derived from them to establish the optimal parameters of the mutation.rem function for each panel (Supplementary Table 8). The resulting parameters were then applied to the simulated cases derived from the remaining 24 SLC patients to draw receiver operating characteristic (ROC) curves for each panel and to calculate the areas under the curve (AUCs) and the probability cutoffs for interpretation. This process was repeated 100 times to obtain the optimal parameters and cutoffs, which were used and validated in subsequent analyses.
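The repeated 56/24 patient-level split and the AUC calculation can be sketched as below. This Python sketch does not reproduce the mutation.rem parameter fitting (which runs in the R "Clonality" package); it assumes that only simulated pairs whose two samples both come from the same split are used, and it computes AUC through its equivalence with the Mann-Whitney statistic. The function names are hypothetical.

```python
import random


def roc_auc(pos_scores, neg_scores):
    """ROC AUC via the Mann-Whitney statistic; ties count as one half."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos_scores) * len(neg_scores))


def repeated_patient_splits(patient_ids, n_train=56, n_repeats=100, seed=0):
    """Yield (train, test) patient sets; splitting by patient keeps pairs from leaking."""
    rng = random.Random(seed)
    ids = sorted(patient_ids)
    for _ in range(n_repeats):
        train = set(rng.sample(ids, n_train))
        yield train, set(ids) - train


def pairs_within(pairs, patients):
    """Keep simulated pairs whose two samples both come from the given patients."""
    return [(a, b) for a, b in pairs if a[0] in patients and b[0] in patients]


splits = list(repeated_patient_splits(range(80)))  # 100 splits of 56 train / 24 test
print(roc_auc([0.9, 0.7, 0.4], [0.2, 0.5]))        # probabilities for sim-IPM vs sim-MPLC
```

In each split, the fitted model would score the test pairs (sim-IPM as positives, sim-MPLC as negatives) and `roc_auc` would summarize discrimination; averaging over the 100 splits gives a mean AUC with its spread.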

Evolutionary analysis

VAF analysis and phylogenetic reconstruction were performed using established methods, TumE⁶⁵ and PyClone⁶⁶. Briefly, mutations were classified as clonal or subclonal based on cancer cell fraction (CCF)⁶⁷ and further timed relative to whole-genome doubling (WGD) following previous publications³³. This section consists of two parts: VAF analysis and mutation timing.

An established synthetic supervised learning method, TumE⁶⁵, was applied to compare the subclonal composition of shared mutations in MLCs and to explore differences in evolutionary trajectories between MPLC and IPM through VAF analysis of single bulk-sequenced biopsies. Briefly, this algorithm integrates simulated models of cancer evolution under positive and neutral selection with Bayesian neural networks to quantify subclonal dynamics using purity-corrected VAF information from diploid genomic regions.
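The "purity-corrected VAF from diploid regions" input can be illustrated with the common correction VAF/purity: in a copy-neutral diploid region, a clonal heterozygous mutation is expected at an observed VAF of about purity/2, so dividing by purity maps it back to about 0.5. This Python sketch assumes that correction and a hypothetical `(vaf, major_cn, minor_cn)` input format; TumE's own preprocessing may differ.

```python
def purity_corrected_vaf(vaf: float, purity: float) -> float:
    """Rescale an observed VAF to the tumor-cell fraction of reads (VAF / purity)."""
    if not 0.0 < purity <= 1.0:
        raise ValueError("purity must lie in (0, 1]")
    return vaf / purity


def diploid_corrected_vafs(mutations, purity):
    """Purity-corrected VAFs for mutations in copy-neutral diploid (1+1) regions.

    mutations: iterable of (vaf, major_cn, minor_cn) tuples.
    """
    return [purity_corrected_vaf(v, purity)
            for v, major, minor in mutations if (major, minor) == (1, 1)]


# At the cohort's mean purity (0.52), a clonal heterozygous diploid mutation
# observed at VAF 0.26 maps back to about 0.5; the (2, 1) site is excluded.
print(diploid_corrected_vafs([(0.26, 1, 1), (0.40, 2, 1), (0.10, 1, 1)], 0.52))
```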

Sequenza (v.2.1.2) was used to estimate the copy number profile (including allele-specific copy number), tumor purity, and ploidy of each sample. Mutation copy number was defined as the fraction of tumor cells carrying a given mutation multiplied by the number of chromosomal copies at that locus. CCF was computed using a previously proposed method⁶⁷. Mutations were classified as clonal if CCF ≥ 1 and subclonal otherwise. Clonal mutations were further timed as early, late, or untimed clonal. In brief, mutations at loci with at least two copies of the major allele were preliminarily classified as early if the mutation copy number was >1 and late if it was ≤1. Clonal mutations that could not be timed as either early or late were classified as “clonal untimed”. Sequencing profiles from TRACERx and from our published study of solitary NSCLC were also included to further investigate temporal differences in cancer evolution^26,33.
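The copy-number and timing rules can be sketched as follows. The mutation copy number uses the TRACERx-style formula n_mut = VAF × (1/p) × [p × CN_t + CN_n × (1 − p)], assumed here as the implementation of the definition above, with CN_n = 2 assumed for autosomes. CCF is taken as an input because its computation follows ref. 67; the example values are hypothetical.

```python
def mutation_copy_number(vaf: float, purity: float, cn_tumor: float, cn_normal: float = 2.0) -> float:
    """n_mut = VAF / p * (p * CN_t + CN_n * (1 - p)), as in TRACERx-style analyses."""
    if not 0.0 < purity <= 1.0:
        raise ValueError("purity must lie in (0, 1]")
    return vaf / purity * (purity * cn_tumor + cn_normal * (1.0 - purity))


def time_mutation(ccf: float, n_mut: float, major_cn: int) -> str:
    """Clonal/subclonal call plus early/late timing relative to the major-allele gain."""
    if ccf < 1.0:
        return "subclonal"
    if major_cn >= 2:
        return "early clonal" if n_mut > 1.0 else "late clonal"
    return "clonal untimed"   # no gained major allele to time against


n = mutation_copy_number(vaf=0.4, purity=0.5, cn_tumor=4)   # about 2.4 copies
print(time_mutation(ccf=1.0, n_mut=n, major_cn=2))           # early clonal
```

A mutation present on more than one copy of a gained allele must have arisen before the gain, which is why n_mut > 1 at such loci is read as early.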

Generally, MPLC represents a cluster of unrelated primary tumors, whereas IPM consists of two foci, one of which arose from tumor cells that spread from the other. Previous studies suggested that in metachronous IPM most primary foci were larger than the metastases³⁰, and that tumor sizes do not differ significantly between synchronous and metachronous MLCs¹⁵. We therefore hypothesized that the larger lesion of an IPM pair was the primary focus (IPM-L) and the smaller lesion the metastasis (IPM-S), in order to explore differences within IPM.

Statistics

DFS was estimated with Kaplan–Meier curves, and differences in prognosis were assessed by the log-rank test. Continuous variables with a normal distribution were described as mean ± standard deviation, and categorical variables as frequencies and percentages. Univariable comparisons were conducted using the Wilcoxon rank-sum test for continuous variables and the chi-square test for categorical variables. Statistical analyses were performed using R 4.1.0 (R Core Team) and GraphPad PRISM 9.0 (GraphPad Software). Two-tailed p-values less than 0.05 were considered statistically significant.
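For readers unfamiliar with the estimator, the Kaplan–Meier DFS curve can be computed in a few lines. This Python sketch is illustrative only (the analyses were run in R and PRISM); it treats relapse or metastasis as the event and last follow-up as censoring, per the DFS definition above, and the example data are hypothetical. The log-rank comparison is not reproduced; in R it is typically done with `survdiff` from the survival package.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate.

    times: follow-up durations (e.g. months from the last lung cancer surgery)
    events: 1 if relapse or metastasis occurred, 0 if censored at last follow-up
    Returns [(time, survival)] at each distinct event time.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv, curve, i = 1.0, [], 0
    while i < len(data):
        t, n_events, n_censored = data[i][0], 0, 0
        while i < len(data) and data[i][0] == t:   # group tied times
            if data[i][1]:
                n_events += 1
            else:
                n_censored += 1
            i += 1
        if n_events:
            surv *= 1.0 - n_events / n_at_risk
            curve.append((t, surv))
        n_at_risk -= n_events + n_censored
    return curve


# Hypothetical DFS data (months): events at 2, 3 and 5; censored at 3 and 8.
print(kaplan_meier([2, 3, 3, 5, 8], [1, 1, 0, 1, 0]))
```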

Data availability

Raw NGS data derived from samples of our MLC patients have been deposited at the China National Center for Bioinformation (https://ngdc.cncb.ac.cn/gsa/) under accession number HRA005102. Local law prohibits depositing raw WES datasets derived from human samples outside the country of origin. Prior to publication, the authors officially requested that the raw sequencing datasets reported in this paper be made publicly accessible. To request access, contact the Office of Human Genetic Resource Administration of the Ministry of Science and Technology, in accordance with the Regulation of the People’s Republic of China on the Administration of Human Genetic Resources. The software used in this study is described in detail in the sections above. Any additional information and code required to reanalyze the data reported in this paper are available from the lead contact upon reasonable request.

References

Siegel, R. L., Miller, K. D., Fuchs, H. E. & Jemal, A. Cancer statistics, 2022. CA Cancer J. Clin. 72, 7–33 (2022).
Shintani, Y. et al. Clinical features and outcomes of patients with stage I multiple primary lung cancers. Cancer Sci. 112, 1924–1935 (2021).
Mascalchi, M. et al. Screen-detected multiple primary lung cancers in the ITALUNG trial. J. Thorac. Dis. 10, 1058–1066 (2018).
Cheng, H., Lei, B. F., Peng, P. J., Lin, Y. J. & Wang, X. J. Histologic lung cancer subtype differentiates synchronous multiple primary lung adenocarcinomas from intrapulmonary metastases. J. Surg. Res. 211, 215–222 (2017).
Shao, J. et al. A comprehensive algorithm to distinguish between MPLC and IPM in multiple lung tumors patients. Ann. Transl. Med. 8, 1137–1137 (2020).
Detterbeck, F. C. et al. The IASLC lung cancer staging project: summary of proposals for revisions of the classification of lung cancers with multiple pulmonary sites of involvement in the forthcoming eighth edition of the TNM classification. J. Thorac. Oncol. 11, 639–650 (2016).
Hamaji, M., Ali, S. O. & Burt, B. M. A meta-analysis of resected metachronous second non-small cell lung cancer. Ann. Thorac. Surg. 99, 1470–1478 (2015).
Nie, Y. et al. Surgical prognosis of synchronous multiple primary lung cancer: systematic review and meta-analysis. Clin. Lung Cancer 22, 341–350.e3 (2021).
Girard, N. et al. Comprehensive histologic assessment helps to differentiate multiple lung primary nonsmall cell carcinomas from metastases. Am. J. Surg. Pathol. 33, 1752–1764 (2009).
Martini, N. & Melamed, M. R. Multiple primary lung cancers. J. Thorac. Cardiovasc. Surg. 70, 606–612 (1975).
Antakli, T., Schaefer, R. F., Rutherford, J. E. & Read, R. C. Second primary lung cancer. Ann. Thorac. Surg. 59, 863–867 (1995).
Kozower, B. D., Larner, J. M., Detterbeck, F. C. & Jones, D. R. Special treatment issues in non-small cell lung cancer: Diagnosis and management of lung cancer: American College of Chest Physicians evidence-based clinical practice guidelines. Chest 143, e369S–e399S (2013).
Tian, S. et al. Differential diagnostic value of histology in MPLC and IPM: a systematic review and meta-analysis. Front. Oncol. 12, 871827 (2022).
Wang, Z. et al. Towards the molecular era of discriminating multiple lung cancers. EBioMedicine 90, 104508 (2023).
Takahashi, Y. et al. Comparative mutational evaluation of multiple lung cancers by multiplex oncogene mutation analysis. Cancer Sci. 109, 3634–3642 (2018).
Goodwin, D., Rathi, V., Conron, M. & Wright, G. M. Genomic and clinical significance of multiple primary lung cancers as determined by next-generation sequencing. J. Thorac. Oncol. 16, 1166–1175 (2021).
Chang, J. C. et al. Comprehensive next-generation sequencing unambiguously distinguishes separate primary lung carcinomas from intrapulmonary metastases: comparison with standard histopathologic approach. Clin. Cancer Res. 25, 7113–7125 (2019).
Ezer, N. et al. Integrating NGS-derived mutational profiling in the diagnosis of multiple lung adenocarcinomas. Cancer Treat. Res. Commun. 29, 100484 (2021).
Belardinilli, F. et al. A multidisciplinary approach for the differential diagnosis between multiple primary lung adenocarcinomas and intrapulmonary metastases. Pathol. Res. Pract. 220, 153387 (2021).
Bruehl, F. K. et al. Does histological assessment accurately distinguish separate primary lung adenocarcinomas from intrapulmonary metastases? A study of paired resected lung nodules in 32 patients using a routine next-generation sequencing panel for driver mutations. J. Clin. Pathol. 75, 390–396 (2022).
Patel, S. B. et al. Next-generation sequencing: a novel approach to distinguish multifocal primary lung adenocarcinomas from intrapulmonary metastases. J. Mol. Diagn. 19, 870–880 (2017).
Roepman, P. et al. Added value of 50-gene panel sequencing to distinguish multiple primary lung cancers from pulmonary metastases: a systematic investigation. J. Mol. Diagn. 20, 436–445 (2018).
Donfrancesco, E. et al. Histopathological and molecular study for synchronous lung adenocarcinoma staging. Virchows Arch. 476, 835–842 (2020).
Zheng, R. et al. Molecular profiling of key driver genes improves staging accuracy in multifocal non-small cell lung cancer. J. Thorac. Cardiovasc. Surg. 160, e71–e79 (2020).
Mansuet-Lupo, A. et al. Proposal for a combined histomolecular algorithm to distinguish multiple primary adenocarcinomas from intrapulmonary metastasis in patients with multiple lung tumors. J. Thorac. Oncol. 14, 844–856 (2019).
Chen, K. et al. Spatiotemporal genomic analysis reveals distinct molecular features in recurrent stage I non-small cell lung cancers. Cell Rep. 40, 111047 (2022).
Ostrovnaya, I., Seshan, V. E., Olshen, A. B. & Begg, C. B. Clonality: an R package for testing clonal relatedness of two tumors from the same patient based on their genomic profiles. Bioinformatics 27, 1698–1699 (2011).
Ettinger, D. S. et al. Non-small cell lung cancer, version 3.2022. J. Natl Compr. Cancer Netw. 20, 497–530 (2022).
Hunter, K. W., Amin, R., Deasy, S., Ha, N.-H. & Wakefield, L. Genetic insights into the morass of metastatic heterogeneity. Nat. Rev. Cancer 18, 211–223 (2018).
Wu, C. T., Lin, M. W., Hsieh, M. S., Kuo, S. W. & Chang, Y. L. New aspects of the clinicopathology and genetic profile of metachronous multiple lung cancers. Ann. Surg. 259, 1018–1024 (2014).
Priestley, P. et al. Pan-cancer whole-genome analyses of metastatic solid tumours. Nature 575, 210–216 (2019).
Bielski, C. M. et al. Genome doubling shapes the evolution and prognosis of advanced cancers. Nat. Genet. 50, 1189–1195 (2018).
Jamal-Hanjani, M. et al. Tracking the evolution of non-small-cell lung cancer. N. Engl. J. Med. 376, 2109–2121 (2017).
Matsuzawa, R. et al. Factors influencing the concordance of histological subtype diagnosis from biopsy and resected specimens of lung adenocarcinoma. Lung Cancer 94, 1–6 (2016).
Mauguen, A., Seshan, V. E., Ostrovnaya, I. & Begg, C. B. An EM algorithm to improve the estimation of the probability of clonal relatedness of pairs of tumors in cancer patients. BMC Bioinform. 20, 1–8 (2019).
Zhang, X. C. et al. Comprehensive genomic and immunological characterization of Chinese non-small cell lung cancer patients. Nat. Commun. 10, 1–12 (2019).
Chen, J. et al. Genomic landscape of lung adenocarcinoma in East Asians. Nat. Genet. 52, 177–186 (2020).
Mauguen, A., Seshan, V. E., Begg, C. B. & Ostrovnaya, I. Testing clonal relatedness of two tumors from the same patient based on their mutational profiles: update of the Clonality R package. Bioinformatics 35, 4776–4778 (2019).
Mauguen, A., Seshan, V. E., Ostrovnaya, I. & Begg, C. B. Estimating the probability of clonal relatedness of pairs of tumors in cancer patients. Biometrics 74, 321–330 (2018).
Gydush, G. et al. Massively parallel enrichment of low-frequency alleles enables duplex sequencing at low depth. Nat. Biomed. Eng. 6, 257–266 (2022).
Tarabichi, M. et al. A practical guide to cancer subclonal reconstruction from DNA sequencing. Nat. Methods 18, 144–155 (2021).
Zhang, J. et al. Intratumor heterogeneity in localized lung adenocarcinomas delineated by multiregion sequencing. Science 346, 256–259 (2014).
Litchfield, K. et al. Representative sequencing: unbiased sampling of solid tumor tissue. Cell Rep. 31, 107550 (2020).
Kader, T., Zethoven, M. & Gorringe, K. L. Evaluating statistical approaches to define clonal origin of tumours using bulk DNA sequencing: context is everything. Genome Biol. 23, 1–23 (2022).
Sharma, Y. et al. A pan-cancer analysis of synonymous mutations. Nat. Commun. 10, 1–14 (2019).
Schultheis, A. M. et al. Massively parallel sequencing-based clonality analysis of synchronous endometrioid endometrial and ovarian carcinomas. J. Natl Cancer Inst. 108, djv427 (2015).
Zhang, T. et al. Genomic and evolutionary classification of lung cancer in never smokers. Nat. Genet. 53, 1348–1359 (2021).
Caso, R. et al. The underlying tumor genomics of predominant histologic subtypes in lung adenocarcinoma. J. Thorac. Oncol. 15, 1844–1856 (2020).
Moher, D., Liberati, A., Tetzlaff, J. & Altman, D. G. Preferred reporting items for systematic reviews and meta-analyses: the PRISMA statement. BMJ 339, 332–336 (2009).
Tierney, J. F., Stewart, L. A., Ghersi, D., Burdett, S. & Sydes, M. R. Practical methods for incorporating summary time-to-event data into meta-analysis. Trials 8, 1–16 (2007).
Lv, W. et al. Noninvasive prenatal testing for Wilson disease by use of circulating single-molecule amplification and resequencing technology (cSMART). Clin. Chem. 61, 172–181 (2015).
Chen, S., Zhou, Y., Chen, Y. & Gu, J. Fastp: an ultra-fast all-in-one FASTQ preprocessor. Bioinformatics 34, i884–i890 (2018).
Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25, 1754–1760 (2009).
Chen, S. et al. Gencore: an efficient tool to generate consensus reads for error suppressing and duplicate removing of NGS data. BMC Bioinform. 20, 1–8 (2019).
Li, H. et al. The sequence alignment/map format and SAMtools. Bioinformatics 25, 2078–2079 (2009).
Wang, K., Li, M. & Hakonarson, H. ANNOVAR: functional annotation of genetic variants from high-throughput sequencing data. Nucleic Acids Res. 38, 1–7 (2010).
McKenna, A. et al. The genome analysis toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 20, 1297–1303 (2010).
Cibulskis, K. et al. Sensitive detection of somatic point mutations in impure and heterogeneous cancer samples. Nat. Biotechnol. 31, 213–219 (2013).
Tate, J. G. et al. COSMIC: the catalogue of somatic mutations in cancer. Nucleic Acids Res. 47, D941–D947 (2019).
Chang, M. T. et al. Accelerating discovery of functional mutant alleles in cancer. Cancer Discov. 8, 174–183 (2018).
Ettinger, D. S. et al. Non-small cell lung cancer, Version 2.2021 featured updates to the NCCN guidelines. J. Natl Compr. Cancer Netw. 19, 254–266 (2021).
Ostrovnaya, I., Seshan, V. E. & Begg, C. B. Using somatic mutation data to test tumors for clonal relatedness. Ann. Appl. Stat. 9, 1533–1548 (2015).
Zhou, W. et al. TransVar: a multilevel variant annotator for precision genomics. Nat. Methods 12, 1002–1003 (2015).
Zhang, X. et al. A novel NGS-based diagnostic algorithm for classifying multifocal lung adenocarcinomas in pN0M0 patients. J. Pathol.: Clin. Res. 9, 108–120 (2023).
Ouellette, T. W. & Awadalla, P. Inferring ongoing cancer evolution from single tumour biopsies using synthetic supervised learning. PLoS Comput. Biol. 18, 1–30 (2022).
Gillis, S. & Roth, A. PyClone-VI: scalable inference of clonal population structures using whole genome data. BMC Bioinform. https://doi.org/10.1186/s12859-020-03919-2 (2020).
Dentro, S. C., Wedge, D. C. & Van Loo, P. Principles of reconstructing the subclonal architecture of cancers. Cold Spring Harb. Perspect. Med. 7, a026625 (2017).


Acknowledgements

This work was supported by National Natural Science Foundation of China (No. 82072566, No. 82373416, No. 92059203 and No. 82388102), CAMS Medical and Health Science and Technology Innovation Project (2021-I2M-5-002), Chinese Academy of Medical Sciences (2021RU002), Clinical Medicine Plus X - Young Scholars Project, Peking University, the Fundamental Research Funds for the Central Universities (PKU2023LCXQ008), Peking University People's Hospital Research and Development Funds (RZ2022-03). The funders had no role in study design, data collection, data analysis, interpretation, or writing of the paper.

Author information

These authors contributed equally: Ziyang Wang, Xiaoqiu Yuan, Kunkun Sun, Fang Wu, Ke Liu.
These authors jointly supervised this work: David Carbone, Guanchao Jiang.

Authors and Affiliations

Department of Thoracic Surgery, Peking University People’s Hospital, Beijing, 100044, China
Ziyang Wang, Xiaoqiu Yuan, Yiruo Jin, Yichen Jin, Yun Li, Fan Yang, Jun Wang, Guanchao Jiang & Kezhong Chen
Thoracic Oncology Institute, Peking University People’s Hospital, Beijing, 100044, China
Ziyang Wang, Xiaoqiu Yuan, Yiruo Jin, Yichen Jin, Yun Li, Fan Yang, Jun Wang, Guanchao Jiang & Kezhong Chen
Research Unit of Intelligence Diagnosis and Treatment in Early Non-small Cell Lung Cancer, Chinese Academy of Medical Sciences, 2021RU002, Peking University People’s Hospital, Beijing, 100044, China
Ziyang Wang, Xiaoqiu Yuan, Yiruo Jin, Yichen Jin, Yun Li, Fan Yang, Jun Wang, Guanchao Jiang & Kezhong Chen
Peking University Health Science Center, Beijing, China
Xiaoqiu Yuan & Yiruo Jin
Department of Pathology, Peking University People’s Hospital, Beijing, China
Kunkun Sun
Department of Oncology, The Second Xiangya Hospital, Changsha, Hunan, 410011, China
Fang Wu
Hunan Cancer Mega-Data Intelligent Application and Engineering Research Centre, Changsha, Hunan, China
Fang Wu
Changsha Thoracic Cancer Prevention and Treatment Technology Innovation Center, Changsha, Hunan, China
Fang Wu
Berry Oncology Corporation, Beijing, China
Ke Liu, Airong Yang & Jing Li
University College London Cancer Institute, University College London, London, UK
Olga Chervova & Stephan Beck
China-Japan Friendship Hospital, Beijing, China
Yuntao Nie
James Thoracic Oncology Center, Ohio State University, Columbus, USA
David Carbone


Contributions

Z.Y.W. and K.L.: Conceptualization, resources, data curation, formal analysis, validation, investigation, visualization, methodology, writing—original draft, writing—review and editing. X.Q.Y.: Conceptualization, resources, formal analysis, validation, investigation, visualization, methodology, writing—original draft, and writing—review and editing. K.K.S. and F.W.: Resources, formal analysis, supervision, methodology, writing—original draft. R.Y.J., O.C., S.B., and D.C.: Writing—original draft, validation, writing—review and editing. Y.T.N.: Conceptualization, resources, investigation, methodology. A.R.Y. and J.L.: Formal analysis, supervision, validation, visualization, project administration, writing—review and editing. Y.C.J.: Resources, validation, writing—original draft. Y.L., F.Y., J.W., and G.C.J.: Supervision, funding acquisition, methodology, project administration, writing—review and editing. K.Z.C.: Conceptualization, formal analysis, supervision, funding acquisition, methodology, validation, writing—original draft, project administration, writing—review and editing.

Corresponding author

Correspondence to Kezhong Chen.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Material

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.


About this article

Cite this article

Wang, Z., Yuan, X., Sun, K. et al. Optimizing the NGS-based discrimination of multiple lung cancers from the perspective of evolution. npj Precis. Onc. 9, 14 (2025). https://doi.org/10.1038/s41698-024-00786-5


Received: 01 April 2024
Accepted: 14 December 2024
Published: 14 January 2025
Version of record: 14 January 2025
DOI: https://doi.org/10.1038/s41698-024-00786-5