Introduction

Multiple lung cancers (MLCs) refer to the occurrence of more than one malignant tumor in the lung(s). Tumors of independent origin are termed multiple primary lung cancers (MPLC), while those sharing an original clone are termed intrapulmonary metastasis (IPM). Lung cancer remains the leading cause of cancer-related death worldwide1, and MLCs account for approximately 4.5 to 15.8% of lung cancer cases2,3. Clinical management differs markedly within MLCs4,5,6. MPLC patients usually receive curative resection and achieve long-term survival, whereas adjuvant or neoadjuvant therapies are preferred for IPM given its worse prognosis7,8. Incorrect discrimination of MLCs leads to improper treatment and should therefore be avoided. Clinicopathology has laid the foundation for discriminating MPLC from IPM, with comprehensive histology assessment (CHA)9 as the guideline recommendation6,10,11,12; nevertheless, it is insufficient in certain cases, with a sensitivity of 76–78% and a specificity of 47–74%13,14. The advent of next-generation sequencing (NGS) revolutionized the discrimination of MLCs5,13,14,15,16, but an explicit protocol accessible to medical practitioners is still lacking. Limited studies have tentatively suggested, without validation, that at least 100 genes are required to define the clonal relatedness of MLCs17. Moreover, most studies interpreted the sequencing results empirically by counting the number of shared mutations, while only a few attempted to calculate the clonal probability from all the mutations with bioinformatic tools16,17,18. The critical questions of which panel to use and how to infer clonal relatedness from sequencing results remain unresolved. Therefore, we designed this integrative study (Fig. 1) to fill these gaps and establish a detailed procedure.

Fig. 1: Design of this study.

The left half illustrates the main workflow of our study, starting with a systematic review, followed by simulation analysis based on WES data from solitary NSCLC patients and independent validation in two MLCs cohorts: a WES cohort (N = 42) and a non-WES cohort (N = 94). The right half depicts the mutation-based clonal relatedness analyses adopted in this study, namely MoleA (counting the shared mutations) and MoleB (clonal probability calculation). More details are given in the “Methods” section. MLCs multiple lung cancers, MPLC multiple primary lung cancers, IPM intrapulmonary metastasis, WES whole-exome sequencing, NGS next-generation sequencing, NSCLC non-small cell lung cancer.

Results

Exploration of panel size and clonal relatedness analysis from previous studies

A comprehensive comparison of different diagnostic methods was conducted, incorporating clinical-pathological data and 2842 molecularly testable variants identified in 495 patients from 15 retrospective studies (Fig. 2, Supplementary Tables 1 and 2). Here, we present three main conclusions and the related challenges in the discrimination of MLCs. First, although NGS resolves all cases with an equivocal pathological diagnosis, it does not outperform clinical or pathological methods, given the similar inconclusive rates (MoleA: 6.7%, 29/435; MoleB all-gene: 9.7%, 48/495; MoleB >50-genes: 5.1%, 6/314 vs CHA 9.8%, 27/274) (Supplementary Tables 3 and 4)15,18,19,20,21,22,23,24 and the consistent failure to stratify prognosis (Supplementary Table 2)15,17,22,23,25. Second, the optimal method for assessing clonal relatedness has not been validated against a gold standard. Although discrimination by counting shared mutations and by calculating clonal probability show similar efficacy, the lack of follow-up data hinders more definitive validation (Fig. 2A, Supplementary Tables 1, 4 and 5). Third, panel selection remains unclear. The similar inconclusive rates between the “50 genes” and “>300 genes” groups (6.7% vs. 4.5%) suggest that blindly expanding the panel yields diminishing marginal returns (Fig. 2B). However, expanding sequencing coverage did help resolve uncertainties: the sequencing results of rarely analyzed genes clarified 21 of 32 inconclusive cases (65.6%) (Fig. 2C). Together with the substantial inconsistency between pan-cancer panels and the hotspot mutations in MLCs (Fig. 2D), this underscores the importance of optimizing panels specifically for MLCs identification. Given these key issues and challenges, we conducted further investigations to address them.

Fig. 2: Data extracted from previous studies implicate the superiority of NGS in discriminating MLCs.

A Conclusive rates of the clinical criteria (Martini-Melamed criteria and proposals in the American Joint Committee on Cancer (AJCC) staging system), pathology, and NGS (empirical and bioinformatic interpretation of sequencing results) at different coverages subsampled from the extracted sequencing data (sub3genes, etc.). These charts are separated according to the panels (below 10 genes, etc.) used in the original articles. B Summary of discrimination results of different mimicked panels. C The effects of the coverage excluding the 50 genes (sub50 genes or non 50 genes) compared with the 50-genes panel. The results changed in 21 cases. D Overlaps between the genes covered by most NGS panels and the genes mutated frequently in MLCs. The number of mutations located on a specific gene is defined as the total number of selected mutations divided by the proportion of mutated cases in all cases where the corresponding gene was sequenced. The diagnoses of the different methods applied to each patient in the included studies are presented in tabular form (Supplementary Tables 3 and 4). MLCs multiple lung cancers, MPLC multiple primary lung cancers, IPM intrapulmonary metastasis.

Simulated data identifies the optimal panels and molecular approaches for interpreting clonal relatedness

WES data from 235 samples from 80 patients with solitary NSCLC at our institution were used for simulation. A review of the mutational landscape of mimicked MLCs26 shows that the vast majority (92.6%) of sim-IPM share at least one mutation, whereas sim-MPLC (89.3%) carry almost no identical mutations (Fig. 3A, Supplementary Table 6), partially supporting the validity of the simulation method. We therefore first compared the effectiveness of two interpretation methods, MoleA and MoleB (detailed in the “Methods” section)27. Overall, as the sequencing range broadens from one gene (EGFR) to WES, the diagnostic uncertainty declines (MoleA from 62.2% ± 0.59% to 1.68% ± 0.16%; MoleB from 43.7% ± 0.90% to 0.0% ± 0.0%, Fig. 3B) while the AUC increases (MoleA from 0.437 ± 0.009 to 0.949 ± 0.002; MoleB from 0.910 ± 0.005 to 0.987 ± 0.001, Fig. 3C). For each panel, MoleB consistently diagnoses more cases than MoleA and achieves a higher AUC, implying the superiority of bioinformatic interpretation in assessing clonal relatedness.

Fig. 3: Simulation data reveals the optimal panels and superiority of bioinformatic analysis in judging clonal relatedness.

A Left panel: heatmap of overlapped mutations in 230 mimicked IPM tumor pairs, ranked by frequencies and counts. Right panel: heatmap of mutations in 235 tumor samples from 80 solitary NSCLC patients, ranked by frequencies and counts. B The proportions of inconclusive cases corresponding to different panels with MoleB or MoleA and their drop rates. C The AUCs corresponding to different panels with MoleB or MoleA and their growth rates. D The ROC curve of the 10-genes panel with MoleB. E The ROC curve of WES with MoleB. F The ROC curve of the 363-genes panel with MoleA. G The ROC curve of WES with MoleA. MoleA empirical interpretation by counting the shared mutations, MoleB bioinformatic interpretation by calculating the clonal probability based on all the mutations, AUC area under the receiver operating characteristic curve.

Furthermore, we attempted to identify the optimal panels for MLCs discrimination. The inflection points on the curves of inconclusive fractions or AUC values indicate candidate optimal coverages (Fig. 3B, C), balancing the accuracy lost through panel simplification against the limited benefits of panel expansion. Based on the drop and growth rates, the inflections occur at 10 genes (covering TP53 plus nine driver genes recommended by the NCCN28: EGFR, KRAS, ALK, BRAF, ERBB2, MET, RET, ROS1, PIK3CA) under MoleB and at 363 genes (one pan-cancer panel) under MoleA (Fig. 3B, C). Therefore, the 10-genes panel plus MoleB seems to be sufficient in most cases, while a pan-cancer panel of at least 363 genes is necessary when bioinformatic support is limited. Although inconclusiveness only disappears with WES MoleB, the ROC curves (Fig. 3D, E) show excellent and comparable accuracy between the 10-genes MoleB (AUC = 0.950 ± 0.002) and WES MoleB (AUC = 0.983 ± 0.001). The 363-genes MoleA performs worse than WES MoleA (AUC = 0.949 ± 0.002), but still shows an acceptable AUC of 0.792 ± 0.004 (Fig. 3F, G). ROC curves for other coverages are available in Supplementary Fig. 1 (MoleA) and Supplementary Fig. 2 (MoleB).
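
The idea of mimicking a smaller panel from WES calls and then counting shared mutations (the MoleA idea) can be illustrated with a minimal Python sketch. The gene list is the 10-genes panel named in the text; the mutation record format, the function names, and the example calls in the usage note are hypothetical, and the sketch only returns counts rather than applying any diagnostic threshold from this study.

```python
from typing import Dict, Iterable, NamedTuple, Set


class Mutation(NamedTuple):
    """Hypothetical minimal record for one somatic variant call."""
    gene: str
    chrom: str
    pos: int
    ref: str
    alt: str


# TP53 plus the nine driver genes listed in the text.
NCCNPLUS_PANEL = frozenset(
    {"TP53", "EGFR", "KRAS", "ALK", "BRAF", "ERBB2", "MET", "RET", "ROS1", "PIK3CA"}
)


def restrict_to_panel(mutations: Iterable[Mutation], panel: Iterable[str]) -> Set[Mutation]:
    """Keep only the calls whose gene is covered by the mimicked panel."""
    covered = set(panel)
    return {m for m in mutations if m.gene in covered}


def count_shared(tumor1: Iterable[Mutation], tumor2: Iterable[Mutation],
                 panel: Iterable[str]) -> Dict[str, object]:
    """Count shared and private mutations of a tumor pair within a panel."""
    t1 = restrict_to_panel(tumor1, panel)
    t2 = restrict_to_panel(tumor2, panel)
    return {
        "shared": len(t1 & t2),
        "private_t1": len(t1 - t2),
        "private_t2": len(t2 - t1),
        # False when the panel detects nothing in either tumor,
        # i.e. the pair cannot be interpreted at this coverage.
        "any_detected": bool(t1 or t2),
    }
```

For example, a pair that shares one EGFR call while a TTN call falls outside the panel would yield one shared mutation, and the TTN call would not be counted at all. Two mutations are treated as identical only if gene, position, and alleles all match, which is an assumption of this sketch.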

MLCs cohorts verify the optimal panels and clonal relatedness analysis

The clinical characteristics of two validation MLCs cohorts are provided in Table 1. Stage I, never-smokers, lung adenocarcinomas and synchronous MLCs dominate both cohorts. The average follow-up time was 51 months.

Table 1 Clinical characteristics of MLCs patients

Initially, we verified the superiority of NGS and identified the optimal panels. In the WES cohort, the clinical method and CHA are clearly less conclusive than the two molecular methods. CHA was previously considered the gold standard diagnostic method; however, fourteen cases are inconclusive or undiagnosable by CHA. In contrast, molecular testing displays higher conclusive rates (10-genes MoleB = 75.6%, WES MoleB = 100%, 363-genes MoleA = 82.9%, WES MoleA = 87.8%) (Fig. 4A).

Fig. 4: Analysis of the WES cohort verifies the superiority of optimal panels in discriminating MLCs.

A Clinical features, discriminating results, and mutational profiles of the 42 patients in the WES cohort. The stage was the highest stage among the MLCs when staging all lesions separately. T1, T2, and T3 refer to different tumors of the same MLCs patient. A square consisting of two colors indicates that the patient has multiple tumor pairs with different identification results. B Survival analyses stratified by the clinical ACCP criteria. C Survival analyses stratified by CHA. D Survival analyses of patients diagnosed by the 10-genes panel with MoleB. E Survival analyses of patients diagnosed by WES with MoleB. F Summary of survival analyses stratified by different panels with MoleB, subsampled from WES data. G Heatmap of mutations with high frequency in the 10 cases where the 10-genes panel failed to detect mutations. H Typical pathologic manifestations of MPLC and IPM and one ambiguous example of MLCs. I Concordant rates between different discriminating methods (note that the unit here is “tumor pairs”). J Sensitivities and specificities for diagnosing MPLC or IPM of different discriminating methods, with WES MoleB as reference. MLCs multiple lung cancers, MPLC multiple primary lung cancers, IPM intrapulmonary metastasis, WES whole-exome sequencing, MoleA empirical interpretation by counting the shared mutations, MoleB bioinformatic interpretation by calculating the clonal probability based on all the mutations, ACCP American College of Chest Physicians, CHA comprehensive histology assessment.

Then, we evaluated the effectiveness of the diagnostic methods based on their ability to differentiate prognosis. After excluding five non-stage I patients to reduce the bias of tumor staging, all molecular methods successfully distinguish the DFS between MPLC and IPM (Fig. 4D, 10-genes MoleB P = 0.016; Fig. 4E, 363-genes MoleA P = 0.029; Supplementary Fig. 3L, WES MoleB P = 0.033; Supplementary Fig. 4L, WES MoleA P = 0.029), while the clinical criteria and CHA fail to do so (Fig. 4B, clinical P = 0.92; Fig. 4C, CHA P = 0.87). Additionally, all panels with ≥10 genes using MoleB and those with ≥363 genes using MoleA demonstrate the ability to stratify prognosis (Fig. 4F, Supplementary Figs. 3 and 4), consistent with the simulation results. Notably, although the 3-gene MoleB can also stratify the prognosis, its conclusive rate (24/41, 58.5%) is insufficient to provide meaningful clinical value.

Given that high-depth NGS could amplify variation from single-site sampling, potentially leading to results inconsistent with those of the WES cohort, we also conducted a validation using a non-WES cohort to better reflect clinical practice. A similar trend was observed in the high-depth non-WES cohort, where both clinical and pathological methods failed to distinguish the prognosis of MLCs (clinical P = 0.6, pathological P = 0.99; ≥10 genes, N = 49: clinical P = 0.53, pathological P = 0.69, Supplementary Fig. 5B and Supplementary Fig. 5C). In contrast, 10-genes MoleB significantly stratifies the prognosis (P = 0.0011, Supplementary Fig. 5D). However, the 363-genes MoleA is unable to separate the survival curves.

Therefore, the 10-genes MoleB is the most cost-effective choice, meeting the requirements for both a high conclusive rate and diagnostic accuracy. To refine the sequencing strategy further, we reviewed the mutations of the 10 patients left inconclusive by 10-genes MoleB and found that certain frequently mutated genes, such as TTN (with the highest frequency of 36.4%, Fig. 4G), are typically excluded from routine sequencing due to their limited therapeutic value. However, since mutations in these genes are often passenger or background mutations that may reflect similar mutational landscapes but cannot reliably indicate a shared clonal origin, designing an optimized diagnostic panel with the best cost-effectiveness still requires more careful thought and validation.

Next, we compared different approaches for discriminating MLCs. To illustrate the challenges CHA may encounter, we present three pathological examples in Fig. 4H: MPLC (P38), IPM (P3), and an inconclusive case (P32). The consistency of diagnostic results is low in the clinical-molecular comparison, moderate in the pathology-molecular comparison, and high in the molecular-molecular comparison (Fig. 4I). CHA disagrees with WES MoleB in 15.4% (6/39) of tumor pairs. In these cases, histological similarity (P12, P15) or differences in morphology (P5, P6, and P34) contradict the mutational profiles (Supplementary Fig. 6). The varying relationships between genotype and phenotype could partially explain the discordance, underscoring the inherent weakness of pathology. We then adopted WES MoleB as the reference to evaluate the validity of molecular methods, as it is the only method capable of diagnosing all cases while effectively stratifying prognosis (Fig. 4J). The 10-genes MoleB has the highest sensitivity for diagnosing IPM (i.e., the highest specificity for diagnosing MPLC) and a balanced performance, given the small difference between its sensitivity and specificity, while the 363-genes MoleA has the highest specificity for diagnosing IPM (i.e., the highest sensitivity for diagnosing MPLC).
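
The equivalence noted above (sensitivity for IPM equals specificity for MPLC in a two-class setting) follows directly from the definitions, as the following minimal Python sketch shows. The function name and the toy labels in the usage note are hypothetical; the sketch assumes every tumor pair has a conclusive call of either "IPM" or "MPLC" from both the reference and the evaluated method.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(reference: Sequence[str], predicted: Sequence[str],
                            positive: str) -> Tuple[float, float]:
    """Sensitivity and specificity of `predicted` for the class `positive`,
    measured against `reference` (e.g. the WES MoleB calls)."""
    if len(reference) != len(predicted):
        raise ValueError("reference and predicted must have the same length")
    tp = fn = tn = fp = 0
    for ref, pred in zip(reference, predicted):
        if ref == positive:
            if pred == positive:
                tp += 1  # reference positive, called positive
            else:
                fn += 1  # reference positive, missed
        else:
            if pred == positive:
                fp += 1  # reference negative, wrongly called positive
            else:
                tn += 1  # reference negative, correctly called negative
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("reference must contain both classes")
    return tp / (tp + fn), tn / (tn + fp)
```

With reference calls IPM, IPM, MPLC, MPLC, MPLC and evaluated calls IPM, MPLC, MPLC, MPLC, IPM, treating IPM as positive gives a sensitivity of 1/2 and a specificity of 2/3; treating MPLC as positive swaps the two values.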

Based on these findings, molecular testing should be performed in parallel with pathology, rather than exclusively, to achieve optimal results. The 10-genes panel (NCCNplus panel) is preferred when bioinformatics support is fully available while at least 363-genes (pancancer panel) NGS is required if bioinformatics is limited. WES remains the ultimate solution for both MoleA and MoleB.

Evolutionary features of shared mutations in MLCs

We reviewed two cases that encountered challenges in clinical or pathological discrimination but were corrected through phylogenetic reconstruction, preliminarily revealing the evolutionary features of MLCs and their value in distinguishing MLCs.

Case 1 (P15, Fig. 5A) is a 73-year-old man who presented with two solid nodules, one in the left upper lobe (T1) and one in the right upper lobe (T2). Pathology showed similar adenocarcinomas with a predominant acinar pattern. However, using the WES-MoleB method, the final diagnosis was MPLC, further confirmed by the phylogenetic tree. He has been free of recurrence for 43 months. Case 2 (P34, Fig. 5B) is a 62-year-old man who had two pure solid tumors removed from the left upper lobe. The pathologist diagnosed MPLC since the lesions were adenocarcinomas with different proportions of acinar and papillary components. However, the results from WES-MoleB and the phylogenetic tree pointed to IPM. This patient received four cycles of chemotherapy and died of intracranial metastasis 33 months after surgery.

Fig. 5: Representative cases of pathological misdiagnoses and exploration of MLCs evolution.

A Images, pathology, mutation profiles, and the phylogenetic tree of a pathologically-inconclusive patient with a clear molecular diagnosis of MPLC (case 1). T1 and T2 refer to tumor1 and tumor2 of the same patient. B Images, pathology, mutation profiles, and the phylogenetic tree of a molecular-diagnosed IPM patient with typical MPLC pathology (case 2). C Heatmap of overlapped mutations within MLCs cases, ranked by frequencies and counts. D VAF distributions of overlapped mutations within MPLC and IPM. E Clonal/subclonal compositions of overlapped mutations within MPLC and IPM. MLCs multiple lung cancers, MPLC multiple primary lung cancers, IPM intrapulmonary metastasis, VAF variant allele frequency.

These cases again illustrate that the evolutionary trajectories of MPLC and IPM differ greatly (Fig. 5C). In total, we identified 283 clonal driver events (MPLC, total 197, median 2, range 0–29; IPM, total 86, median 2, range 0–7), 98 subclonal drivers (MPLC, total 86, median 1, range 0–7; IPM, total 12, range 0–2), and 103 overlapped driver mutations in IPM (clonal, total 94, median 1, range 0–7; subclonal, total 9) (Supplementary Figs. 8 and 9). First, we focused on the shared mutations most likely to affect the molecular differentiation of MLCs. In general, MPLC and IPM show a statistically significant difference in the clonal/subclonal classification of shared mutations (P = 0.0000539) (Fig. 5D, E, Supplementary Fig. 7). Specifically, overlapping mutations in MPLC are predominantly subclonal or neutral with low VAFs (76.9%, 20/26), suggesting independent lesion progression and accidental overlap. In contrast, shared mutations in IPM include a larger proportion (72.7%, 8/11) of clonal mutations with VAF close to 0.5, along with a minor proportion of subclonal mutations with low VAF, possibly resulting from complex late-stage dissemination of metastatic clones, such as cross-seeding or reseeding (Figs. 5D and 6A)29. Thus, if feasible in the future, parallel sequencing of all MLCs tumors, with VAF incorporated as a correction factor in clonality assessments, is highly recommended to identify coincidental low-VAF overlapped mutations in MPLC. In addition, we found that only 4 of the 10 genes in the NCCNplus panel play a role in the evolution of MLCs. A modified panel specific to MLCs, based on unique driver mutations, was therefore developed (Supplementary Table 7); it successfully classified 41.67% of the cases left inconclusive by 10-genes MoleB.
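
One simple way to picture VAF as a correction factor is to flag shared mutations whose VAF is low in either tumor, so that they can be reviewed separately from shared mutations with higher VAF. The Python sketch below is only an illustration of that idea: the function name, the mutation identifiers, and the 0.1 cutoff are hypothetical and are not parameters of this study, and raw VAF is used without any adjustment for tumor purity or copy number.

```python
from typing import Dict


def flag_shared_mutations(vaf_t1: Dict[str, float], vaf_t2: Dict[str, float],
                          low_vaf_cutoff: float = 0.1) -> Dict[str, str]:
    """Label each mutation found in both tumors by its lower VAF.

    `vaf_t1` and `vaf_t2` map a mutation identifier to its VAF in each tumor.
    The cutoff is an illustrative assumption, not a validated threshold.
    """
    labels = {}
    for mutation in sorted(vaf_t1.keys() & vaf_t2.keys()):
        lowest = min(vaf_t1[mutation], vaf_t2[mutation])
        # A shared mutation with low VAF in at least one tumor may be a
        # coincidental subclonal overlap and deserves closer review.
        labels[mutation] = "low_vaf" if lowest < low_vaf_cutoff else "higher_vaf"
    return labels
```

Mutations present in only one tumor are ignored here, since the sketch concerns only the interpretation of overlaps.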

Fig. 6: Timing of mutations in MLCs evolution based on the WES cohort.

A Tumor evolution of MLCs, merging MPLC and IPM. B Tumor evolution of IPM. This figure shows the approximate timing of driver mutations with respect to the cancer life history. Driver genes were screened against the COSMIC database. The timing of mutations is shown as bars indicating whether the events are clonal or subclonal. Clonal mutations are further timed as early, late, or untimed with respect to whole-genome doubling. The frequency of mutations (subclonal and total) is indicated on the right side of the bars. Only genes containing ≥2 driver alterations across the cohort are included. MLCs multiple lung cancers, MPLC multiple primary lung cancers, IPM intrapulmonary metastasis, WES whole-exome sequencing, Pre-GD occurrence before whole-genome doubling, Post-GD occurrence after whole-genome doubling.

Genome doubling in metastatic progression of MLCs

To decipher the timing of somatic events in each tumor, we further analyzed whole-genome doubling (WGD) in MLCs. Overall, WGD is observed in 43% (37/86) of all lesions. The incidence of WGD in MPLC (46.8%, 29/62) is higher than that in IPM (36.0%, 9/25) (Supplementary Fig. 8). Previous studies suggested that in metachronous IPM, most primary foci appeared larger than metastases30, and that there is no significant difference in tumor sizes between synchronous and metachronous MLCs15. MPLC are often considered unrelated primary tumors, whereas IPM lesions tend to show greater correlation in metastatic progression. Therefore, we attempted a preliminary investigation of the discrepancy between the primary site (IPM-L) and the metastatic site (IPM-S) in IPM from the perspective of tumor evolution (Fig. 6). Consistent with previous findings, IPM-S (46.2%, 6/13) harbor more WGD events than IPM-L (25.0%, 3/12)31,32. The temporal distribution of mutations in IPM-S is statistically different from those in MPLC, IPM-L, and SLC (a cohort of solitary lung cancer from our institution), while no significant differences were observed among MPLC, IPM-L, and SLC (P = 0.61) (Supplementary Fig. 9C). Moreover, most mutations in IPM-S are clonal, and more than 75% of them occur before WGD, indicating that metastasis could occur at early stages of cancer development. These findings suggest that WGD may serve as a potential biomarker for distinguishing between IPM and MPLC from an evolutionary perspective. Additionally, the timing of overlapping mutations in IPM differs significantly from that in SLC (P = 0.0003); however, this may be influenced by differences in sample size and population characteristics (Supplementary Fig. 9D). Similarly, the temporal analysis of mutations in TRACERx33 and SLC exhibits significant differences (Supplementary Fig. 9, P < 0.001), again emphasizing the impact of population differences on the analysis results.

Recommended process for clinicopathologic-molecular discrimination of MLCs

Incorporating the preceding findings, we propose an integrative procedure for discriminating MLCs (Fig. 7): (1) After preoperative prediction of multiple malignancies, the discrimination of MLCs starts with clinical evaluation of lymph nodes and systemic metastases. (2) With preoperative biopsy of multiple foci, molecular diagnosis is possible but faces practical difficulties, while pathological discrimination is not applicable34. (3) Resected tumors are routinely subjected to pathology, and some typical cases can be diagnosed with CHA. Clinical and pathological assessments should follow the Martini criteria10, ACCP guidelines12, or IASLC proposals6. (4) Molecular evaluation is recommended alongside pathology in all cases to reduce misdiagnosis, and should be strongly prioritized in cases with an equivocal pathological diagnosis or when experienced pathologists are not available. (5) Bioinformatic support for analyzing detected mutations is recommended to interpret clonal relatedness. On this basis, the NCCNplus panel is the preferred first choice, followed by WES, with the panel modified for MLCs as the alternative option. (6) It is not recommended to blindly use a large panel for sequencing from the beginning. For example, in cases where different oncogenic drivers are found, or where one oncogenic driver is present in one sample but absent in another, the results obtained through such analysis are highly reliable, and there is no need to expand the sequencing scope. (7) If no mutations are detected across all lesions, WES should be applied to resolve any inconclusiveness. (8) Techniques based on other molecular markers require further validation.

Fig. 7: Recommended procedure for discriminating MLCs.

The gray boxes stand for the diagnostic steps before discriminating MLCs. The pink boxes display the clinicopathological evaluation and possible biopsy. The blue boxes describe the molecular assessment. The discrimination of MLCs starts with clinical evaluation. If preoperative biopsy of multiple foci is possible, it may allow identification based on molecular features. Resected tumors are routinely subjected to pathological evaluation, which can unambiguously diagnose some typical cases with CHA. Molecular evaluation is recommended to be carried out together with pathology for all cases to reduce misdiagnosis. If the case is pathologically equivocal or experienced pathologists are not available, molecular evaluation should be preferentially performed. Bioinformatic interpretation of detected mutations is advised to quantify the clonal relatedness. On this premise, NGS with the NCCNplus panel (9 drivers recommended by the NCCN [EGFR, KRAS, ALK, BRAF, ERBB2, MET, RET, ROS1, PIK3CA] plus TP53) is recommended as the first choice, followed by WES as the second choice, and NGS with panels modified for MLCs as the third choice. With limited access to bioinformatic analysis, NGS using pan-cancer panels is recommended. If no mutation has been detected by limited sequencing, WES should be applied to eliminate inconclusiveness. Techniques based on other markers, such as variations in chromosomes or RNAs, need further verification but could be carried out simultaneously with the former methods for research purposes. MLCs multiple lung cancers, MPLC multiple primary lung cancers, IPM intrapulmonary metastasis, NGS next-generation sequencing, WES whole-exome sequencing, ACCP American College of Chest Physicians, IASLC International Association for the Study of Lung Cancer, NCCN National Comprehensive Cancer Network.
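
The molecular branch of this procedure can be summarized as a small decision function. The Python sketch below is a simplified reading of the recommendations described above, not a clinical tool: the function name, the argument names, and the returned strings are hypothetical, and the clinical and pathological steps are deliberately left out.

```python
from typing import Optional


def recommend_sequencing(bioinformatics_available: bool,
                         mutations_detected: Optional[bool] = None) -> str:
    """Suggest the next molecular step for a patient with MLCs.

    `mutations_detected` is None before any sequencing has been done,
    and True/False once a limited panel has been sequenced.
    """
    if mutations_detected is None:
        # First-line choice depends on whether clonal probability can be
        # calculated (MoleB) or only shared mutations counted (MoleA).
        if bioinformatics_available:
            return "NCCNplus 10-genes panel with bioinformatic interpretation (MoleB)"
        return "pan-cancer panel (at least 363 genes) with shared-mutation counting (MoleA)"
    if not mutations_detected:
        # Limited sequencing found nothing, so the pair stays inconclusive.
        return "WES"
    return "interpret current results without expanding the panel"
```

Encoding the first-line choice and the escalation to WES as separate branches mirrors the recommendation not to start with a large panel by default.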

Discussion

Although NGS can determine tumor clonal relatedness, the details of its application remain controversial. To our knowledge, this study is the first to investigate the optimal panels and the mutation-based interpretation of clonal relatedness while exploring the evolutionary characteristics of MLCs in depth, and it further develops a feasible process for discriminating MLCs that integrates clinicopathology and NGS.

Generally, MoleA is clearly weaker than MoleB when the same panel is applied, likely due to its qualitative nature and its susceptibility to limited coverage. In contrast, reproducible calculations can minimize human error in ambiguous cases. To date, three studies have utilized bioinformatic analysis to discriminate MLCs; however, their implications should be interpreted with caution. Ezer et al.18 proposed an algorithm that only considers shared mutations and failed to distinguish the prognosis. Chang et al.17 and Goodwin et al.16 both utilized the SNVtest function from the “Clonality” package, which is less accurate and less adaptable for target-specific parameter adjustments compared to the mutation.rem used in this study35. Other limitations include the relatively short follow-up time of 15 months and the failure of prognosis analysis in Chang’s cohort17, as well as the exclusive enrollment of MPLC cases diagnosed using the Martini criteria in Goodwin’s research16. Besides, ever-smokers predominated in these cohorts, which differs from the characteristics of MLCs patients in East Asian populations. Notably, we implemented several analytical optimizations. Instead of using the default TCGA datasets36,37, mutational datasets from Chinese NSCLC38,39 were used as references to address population-specific mutational differences. Additionally, to remove the potential bias caused by different sequencing depths, two MLCs cohorts were used for validation. Commonly, low-depth sequencing impedes the capture of low-frequency mutations, while high-depth sequencing may be limited by budget constraints40. Technically, calculated from the principle of NGS, the minimum depths to detect ≥95% of subclonal and clonal mutations are 91× and 29×, respectively41. In line with these theoretical assumptions, the low-depth WES panels used in our study appear sufficient for distinguishing MLCs.
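
The relationship between sequencing depth and the chance of detecting a mutation can be sketched with a simple binomial model. The Python code below is a generic illustration under stated assumptions, not the calculation behind the 91× and 29× figures cited above, whose exact parameters are not given here: it assumes each read independently carries the variant with probability equal to the VAF, ignores sequencing error, and uses hypothetical function and parameter names.

```python
from math import comb


def detection_probability(depth: int, vaf: float, min_alt_reads: int = 1) -> float:
    """P(at least `min_alt_reads` variant reads) when reads ~ Binomial(depth, vaf)."""
    missed = sum(
        comb(depth, i) * vaf ** i * (1 - vaf) ** (depth - i)
        for i in range(min_alt_reads)
    )
    return 1.0 - missed


def minimum_depth(vaf: float, min_alt_reads: int = 1, target: float = 0.95,
                  max_depth: int = 10000) -> int:
    """Smallest depth at which the detection probability reaches `target`."""
    if not 0 < vaf <= 1:
        raise ValueError("vaf must be in (0, 1]")
    for depth in range(min_alt_reads, max_depth + 1):
        if detection_probability(depth, vaf, min_alt_reads) >= target:
            return depth
    raise ValueError("target not reached within max_depth")
```

Under this model, lower VAFs and stricter read-support requirements both push the required depth upward, which is the qualitative point made in the text about subclonal versus clonal mutations.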

We also observed that intratumor heterogeneity (ITH) may affect molecular discrimination. In a manual review of the 17 MLCs patients with multi-point sampling, the result of one patient changed when a different sampled point within the same tumor was selected. We noticed that this case carried a great burden of subclonal mutations (private mutations in point 1:private in point 2:shared = 483:455:11). This aligns with our earlier observation that distinguishing incidental subclonal overlap in MPLC from low-VAF shared mutations in IPM is challenging (Fig. 5D). As the sequencing range expands, more unshared subclonal mutations are detected, while the number of overlapped mutations plateaus. The impact of unshared mutations may reverse the clonal relatedness in some cases, as reflected by the leftward shift in the distribution of the clonality signal ξ (Supplementary Table 8). Therefore, we suggest incorporating additional metrics related to cancer evolutionary dynamics, such as VAF, cancer cell fraction (CCF), and copy number variation (CNV), for a more accurate assessment of clonal relatedness, rather than relying solely on the numerical count of mutations. Additionally, although single-region biopsy can capture the majority of cancer mutations42, multi-point sampling is strongly recommended for large-volume or highly heterogeneous tumors to avoid misinterpretation43. Phylogenetic quantification to cross-validate the clonal correlation is promising44, though it is still far from being widely adopted in clinical practice.

This study has some limitations. First, despite our best efforts, the sample size was restricted by the inability to obtain adequate specimens and by poor DNA quality. Larger datasets are warranted for parameter estimation, and the inclusion of external validation cohorts would further solidify the conclusions. Second, there are discrepancies between the mimicked and real mutation spectra of MLCs (Figs. 3A and 5C). Nevertheless, reassessment of the simulation data suggests that this method is acceptable, especially given the current lack of validated MLC datasets. Third, the clonal relationship analysis stopped at the level of SNVs. We aim to extend molecular identification to CNV and fusion analysis in future investigations. The inclusion of synonymous mutations is also a potential optimization45. They have been found helpful in clonality assessment46, which is also supported by two of our cases (P20 and P23). Even so, the “Clonality” package remains sufficient in most cases, given its consistently satisfactory performance. Moreover, owing to constraints such as sample size, several research topics warrant further investigation. For instance, diagnostic panels need to be improved for the relatively rare non-adenocarcinomas, which are difficult to diagnose pathologically. Considering that multiple non-adenocarcinomas often exhibit greater mutational heterogeneity and genomic instability, a broader sequencing scope or more targeted diagnostic panels may improve diagnostic efficiency. Besides, distinct mutational profiles across adenocarcinoma subtypes47,48, as well as intergenic correlations between hotspot mutations17,25 that may affect diagnostic efficiency, also call for larger sequencing studies and wider clinical application.

Today’s clinicians are confronted with an intricate list of diverse molecular tests and struggle to avoid inaccuracy while reducing the economic burden. By modeling different panels through subsampling of sequencing data and by evolutionary analysis, we clarified the optimized panels and their proper interpretation, and we propose a feasible procedure for discriminating MLCs.

Methods

Preliminary exploration of NGS-based discrimination through systematic review

Comprehensive online searches were performed by two independent authors (Z.Y.W. and X.Q.Y.) in four databases (PubMed, Web of Science, Scopus, and Cochrane Library) from inception to Sep 20, 2021, using the terms “multiple lung cancers”, “MLCs”, “multiple primary lung cancers”, “MPLC”, “intrapulmonary metastasis”, and “IPM” in combination with “next-generation sequencing”. The search was restricted to original studies published in English with the main text available for extraction of genomic profiles. A total of 89 records were obtained from the four databases. After duplicate removal and rough screening, 15 articles were left for full-text assessment. The screening process followed the PRISMA statement (Supplementary Table 10)49. All included studies were retrospective cohort studies, and each received a quality score of 3 on the Newcastle-Ottawa Scale. Three investigators (Z.Y.W., K.L., and X.Y.Y.) read the full text of the 15 articles and performed data extraction independently. Any disagreements were resolved by consultation with a fourth reviewer (Y.T.N.). We collected the patient characteristics, identification results by clinicopathological criteria (including histopathology examination, the ACCP (American College of Chest Physicians) guideline, and the Martini-Melamed criteria), the discriminating procedure of MLCs, prognosis, and available genomic alterations at different levels. If an article lacked essential data, the authors were contacted with a request for the missing information. The process of bioinformatic analysis is detailed later. We calculated hazard ratios (HRs) for survival analysis from the extracted information using established methods50.

Study design and patient selection of the validation cohorts

This study investigated the superiority of NGS, suitable panels, and precise interpretation in discriminating multiple lung cancers (MLCs). The study is reported according to the STROBE statement. We reviewed the MLCs patients who underwent curative surgery at Peking University People’s Hospital from January 2009 to December 2021. We selected patients who underwent surgical resection for more than one NSCLC while excluding those with (1) preoperatively confirmed or suspected systemic metastases, (2) malignant pleural or pericardial effusion, (3) diffuse pneumonic lesions, (4) a known history of primary cancer, or (5) all lesions diagnosed as ground-glass opacity (GGO). Two cohorts of multiple lung cancers were used for validation: a WES (whole-exome sequencing) cohort and a non-WES cohort. The WES cohort includes 42 patients, 17 of whom underwent multiregional WES. The non-WES cohort consists of 94 patients tested with NGS panels of different sizes (38 with 8-gene, 7 with 9-gene, 49 with ≥10-gene) (Supplementary Table 6). The studies involving human participants were reviewed and approved by the Institutional Review Board and Ethics Committee at Peking University People’s Hospital (approval No. 2020PHB210-01), in compliance with the regulations laid forth in the Declaration of Helsinki.

The patient tissue and blood samples, along with associated clinical and demographic data (e.g., age, diagnosis), were collected under written informed consent from all participants. All patient samples were deidentified and assigned unique sample IDs to ensure the privacy of patients’ information. Identifiable information linking patients to their samples was accessible only to study personnel involved in the clinical annotation of study data.

Clinical review and pathological review of the validation cohorts

The electronic medical records of enrolled patients were reviewed for the following clinical information: demographics (gender, age, and occupation), medical history (chief complaint, history of present illness, history of malignancies, smoking history, and family history of lung cancer), preoperative examinations (tumor markers, pulmonary function, and radiology), features of pulmonary nodules on chest CT (counts, size, locations, unilateral or bilateral, ground glass or solid nodules, lymphadenopathy, and emphysema), surgical records (dates, surgical approach, and perioperative complications), and pathological details (histologic type, main subtype, pleural invasion, vascular invasion, and lymph node status). For 5 years after surgery, chest CT and abdominal ultrasound were performed every 6 months, and MRI and bone scan every year, at follow-up visits or at any time when symptoms occurred. Additionally, the patients or their family members were regularly followed up by telephone. For this study, the last follow-up took place in August 2022. Disease-free survival (DFS) was defined as the time from the day of the last lung cancer surgery until the first relapse or metastasis or the last follow-up. Four thoracic surgeons (K.Z.C., Z.Y.W., Y.C.J., X.Q.Y.) carried out the clinical discrimination of MLCs using the collected information except follow-up data, following the 2013 ACCP guideline12 and the International Association for the Study of Lung Cancer (IASLC) proposals6.

Pathologic evaluation was performed by an experienced pathologist at Peking University People’s Hospital (K.K.S.), following the CHA criteria9. After excluding one patient whose pathological data were unavailable, 41 patients were assessed by CHA. Of these, four were indeterminate due to objective factors (one with inconsistent staining, two with too little tissue, and one with severe tissue loosening), while 10 received relatively favorable but uncertain diagnoses from the pathologist.

Sequencing workflow for simulation cohort and validation cohort

DNA was extracted from formalin-fixed, paraffin-embedded (FFPE) tissues using the QIAamp DNA FFPE Tissue Kit (QIAGEN, Valencia, Calif), following the manufacturer’s protocols. To ensure the quality of the libraries, DNA was quantified with the Qubit dsDNA HS Assay Kit (Life Technologies, Carlsbad, Calif) and Qubit 2.0/3.0/4.0 Fluorometer (Life Technologies, USA).

For the 8-gene NGS panel, initial sequencing data of EGFR, KRAS, BRAF, ERBB2, and MET mutants were processed using Torrent Suite (5.0.4) and Coverage Analysis plugins (5.0.4.0), along with the Ion Reporter (4.4) Oncomine DNA analysis workflow. The Torrent Mapping Alignment Program and the Torrent Variant Caller plugin (5.0.4.0) were applied to align DNA sequences and to call variants, respectively, under default settings. Two filtering steps were used to eliminate misidentified bases and generate final calls. ALK, ROS1, and RET fusion analysis was performed with the Ion Reporter (4.4) workflow and the Torrent Mapping Alignment Program.

For the 31/68/143/363/457/520/654-gene NGS panels, purified DNA was first broken into fragments of around 300 bp using 5X WGS Fragmentation Mix (Qiagen, USA), followed by end repair and adapter ligation. Library preparation then used, successively, polymerase chain reaction (PCR) with genes targeted by the cSMART method51 and the 96 RXN xGen Exome Research Panel v1.0 (Integrated DNA Technologies, USA). The NovaSeq 6000 platform (Illumina, San Diego, USA) in 150PE mode was used for sequencing, with a depth of more than 1000x. We then used fastp52 to remove low-quality reads, Burrows-Wheeler Aligner (BWA)53 to align the filtered data to the reference genome (hg19), the Gencore tool54 to handle repeated sequences, SAMtools55 to call variants, and ANNOVAR56 to annotate mutations. Non-synonymous SNPs with population allele frequencies lower than 0.05% and variant allele fraction (VAF) higher than 0.5% were kept for further analysis. For the 520-gene NGS panel, we used a Nextseq500 sequencer (Illumina, Inc., USA) for paired-end sequencing and gene targeting with a depth of more than 1000x. Variants with VAF lower than 0.1%, with reference to the ExAC, 1000Genomes, dbSNP, and ESP6500SI-V2 databases, were retained for further analysis.

For the patients included in the non-WES cohort, the choice of sequencing platform was based on the patients’ preferences. Gene coverages of the targeted panels are presented in Supplementary Table 6.

For the WES cohort, lung tumor tissue and adjacent normal tissue specimens were collected at surgical resection for histopathological diagnosis and further study. To assess the impact of intra-tumor heterogeneity (ITH), tumor tissue samples were collected from two or three regions, depending on the size of the tumor. Paired peripheral blood samples were collected immediately before surgery and fractionated into white blood cells and plasma. The samples were stored at −80 °C in the laboratory for subsequent experiments. Sequencing data from multi-point sampling were included in the analysis together. To eliminate false-positive discoveries, matched normal control samples were also sequenced to remove germline polymorphisms.

The library was prepared with the SureSelectXT Target Enrichment System (G7530-90000). Paired-end sequence data were generated on an Illumina HiSeq instrument and aligned to the reference genome (hg19) using BWA. We used SAMtools to sort the original alignment results, Picard (http://broadinstitute.github.io/picard/) to mark duplicate reads, and GATK57 to reduce false positives via local realignment and recalibration. SNVs and small indels were called by MuTect258 with default settings and then annotated by ANNOVAR according to the HGVS, ExAC, 1000Genomes, dbSNP, OMIM, COSMIC, and ClinVar databases. It is important to emphasize that all variants were retained, regardless of their pathogenicity. Variants with population allele frequencies lower than 0.05% were subjected to further analysis. After excluding significant outlier data, we filtered out SNVs with less than 20X depth or less than 4X depth of the alternate allele in tumor samples, as well as SNVs with less than 10X depth in normal samples or with variant reads exceeding 1% of normal reads. For multiple-region samples, variants detected in some but not all samples were recalled, because a variant absent from some samples might have been missed owing to low VAF; this step reduces false-negative calls. The sequenced samples had an average purity of 0.52 (range: 0.14 to 1) (Supplementary Table 9). The average on-target sequencing depth was 129X (range: 88–180X) for tumor tissues and 119X (range: 36–175X) for adjacent normal tissues.
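The SNV filters described above can be sketched in a few lines of Python. This is only an illustration of the stated thresholds, not the pipeline actually used; the `Snv` record and its field names are hypothetical, and treating values exactly at a threshold as passing is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Snv:
    """Hypothetical per-variant record; field names are illustrative only."""
    tumor_depth: int        # total reads at the site in the tumor sample
    tumor_alt_depth: int    # reads supporting the alternate allele in the tumor
    normal_depth: int       # total reads at the site in the matched normal
    normal_alt_depth: int   # alternate-allele reads in the matched normal
    population_af: float    # population allele frequency as a fraction


def passes_filters(v: Snv) -> bool:
    """Apply the thresholds stated in the text to one SNV."""
    if v.population_af >= 0.0005:          # keep only population AF < 0.05%
        return False
    if v.tumor_depth < 20:                 # less than 20X depth in tumor
        return False
    if v.tumor_alt_depth < 4:              # less than 4X alternate-allele depth
        return False
    if v.normal_depth < 10:                # less than 10X depth in normal
        return False
    if v.normal_alt_depth > 0.01 * v.normal_depth:  # variant reads > 1% of normal reads
        return False
    return True


kept = [v for v in [Snv(100, 10, 50, 0, 0.0), Snv(19, 10, 50, 0, 0.0)] if passes_filters(v)]
```

In this toy call only the first record survives, because the second has a tumor depth below 20X.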

Mutation-based analysis of clonal relatedness

We compared two methods for mutation-based analysis of clonal relatedness in the simulation cohort and the validation cohort (Fig. 1): counting the shared mutations (MoleA) and calculating the clonal probability based on all mutations (MoleB). MoleA represents the empirical approach commonly used previously, which typically defines MPLC as paired tumors with no shared mutations or with only one tumor harboring mutations. Under MoleA, “inconclusive” refers to cases with no detected mutation. The additional criteria for MoleA categorized cases with only one overlapping hotspot mutation as “inconclusive”, with hotspots defined by the COSMIC database59 and by the hotspots identified in 24,592 tumor samples by Chang et al.60. MoleB relies on bioinformatic tools, specifically the “Clonality” package, to calculate the clonal relatedness38. Under MoleB, “inconclusive” likewise refers to cases with no detected mutation. For MLCs patients with three tumors, the mutations were compared pairwise, and IPM was diagnosed as long as one pair was judged as IPM.
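A minimal Python sketch of the MoleA counting rule may make the decision logic concrete. Several points are assumptions rather than statements from this study: any shared mutation is treated as evidence of IPM, mutations are represented as plain strings, and a patient with no IPM pair is called MPLC only if every pair is MPLC. The function names and the `hotspots` argument are hypothetical.

```python
from itertools import combinations


def molea_pair(muts_a, muts_b, hotspots=frozenset()):
    """Classify one tumor pair by counting shared mutations (MoleA-style)."""
    muts_a, muts_b = set(muts_a), set(muts_b)
    if not muts_a and not muts_b:
        return "inconclusive"            # no mutation detected in either tumor
    shared = muts_a & muts_b
    if not shared:
        return "MPLC"                    # no overlap, or only one tumor mutated
    if len(shared) == 1 and shared <= set(hotspots):
        return "inconclusive"            # additional criterion: lone shared hotspot
    return "IPM"                         # assumption: any other overlap means IPM


def molea_patient(tumors, hotspots=frozenset()):
    """Compare all tumor pairs; IPM is called if any pair is IPM."""
    calls = [molea_pair(a, b, hotspots) for a, b in combinations(tumors, 2)]
    if "IPM" in calls:
        return "IPM"
    if all(c == "MPLC" for c in calls):  # assumption for patients without an IPM pair
        return "MPLC"
    return "inconclusive"


three_tumors = [{"EGFR p.L858R"}, {"KRAS p.G12C"}, {"KRAS p.G12C", "TP53 p.R273H"}]
call = molea_patient(three_tumors)
```

Passing a hotspot set that contains the single shared mutation switches the pair call to "inconclusive", which mirrors the additional criterion described above.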

To evaluate the performance of different panels, we mimicked the application of multiple panels to the same cases by subsampling the sequencing data of the included studies, the simulated cases, and the validation cohort. The modeled coverages in the review part were as follows (detailed in Supplementary Table 6): 1 gene, 3 genes, 9 genes, 10 genes (EGFR, ALK, KRAS, BRAF, ERBB2, MET, PIK3CA, RET, ROS1, TP53), and 50 genes. These ranges were determined based on commonly used panels and authoritative guidelines such as the National Comprehensive Cancer Network (NCCN) guideline28,61. All the larger panels covered the smaller ranges well (Supplementary Fig. 11), ensuring the feasibility of the subsampling process. It should be noted that gene fusions were not included in the analysis, which inevitably diminishes, at least in part, the diagnostic role of certain genes, such as ALK, RET, and ROS1.
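The subsampling idea amounts to keeping only the mutations that fall in the genes of a given panel. The sketch below assumes mutations are stored as hypothetical `(gene, change)` tuples; the 10-gene list is the one given above, while the function name is illustrative.

```python
# The 10-gene range listed in the text.
PANEL_10 = frozenset(
    {"EGFR", "ALK", "KRAS", "BRAF", "ERBB2", "MET", "PIK3CA", "RET", "ROS1", "TP53"}
)


def subsample_to_panel(mutations, panel_genes):
    """Keep only mutations whose gene is covered by the panel."""
    return [m for m in mutations if m[0] in panel_genes]


# Toy exome-wide call set for one tumor (illustrative values only).
wes_calls = [("EGFR", "p.L858R"), ("TTN", "p.A100T"), ("TP53", "p.R273H")]
panel_calls = subsample_to_panel(wes_calls, PANEL_10)
```

Applying the same filter to both tumors of a pair before the clonality analysis mimics what a smaller panel would have reported for that patient.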

Clonal relatedness calculation using the “Clonality” package (MoleB)

This package has been detailed by the developers38,39,62. The first step is to estimate the frequencies of detected mutations. Briefly, if one mutation was detected in A samples in the reference dataset and B samples in the studied cohort, then the frequency of this mutation is assigned as (A + B)/(M + N), where M and N are the total number of cases in the reference dataset and the study in question, respectively62. For this step, we queried the default TCGA data38,39 in the systematic review, but leveraged the mutational datasets built from Chinese NSCLC36,37 in simulation and validation parts.
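The frequency estimate (A + B)/(M + N) can be illustrated with a short Python sketch. It does not reproduce the internals of the “Clonality” package; the function name and input layout are hypothetical, and each tumor sample in the study cohort is counted as one case, which is an assumption.

```python
from collections import Counter


def mutation_frequencies(ref_counts, n_ref, cohort_samples):
    """Estimate (A + B) / (M + N) for every mutation seen in the study cohort.

    ref_counts: mutation -> number of reference samples carrying it (A)
    n_ref: total number of reference cases (M)
    cohort_samples: list of per-sample mutation collections (N = its length)
    """
    # B: number of study samples carrying each mutation (counted once per sample).
    cohort_counts = Counter(m for sample in cohort_samples for m in set(sample))
    n_total = n_ref + len(cohort_samples)
    return {m: (ref_counts.get(m, 0) + b) / n_total for m, b in cohort_counts.items()}


freqs = mutation_frequencies(
    ref_counts={"EGFR p.L858R": 3},
    n_ref=8,
    cohort_samples=[{"EGFR p.L858R", "TP53 p.R273H"}, {"EGFR p.L858R"}],
)
```

In this toy example the EGFR mutation gets (3 + 2)/(8 + 2) = 0.5 and the TP53 mutation, absent from the reference, gets (0 + 1)/(8 + 2) = 0.1.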

In the second step, clonal relatedness is evaluated from two kinds of input: the number and incidence of variants unique to each tumor in a pair, which reflect independent clones, and the shared mutations, which indicate the possible degree of identical origin39,62. Two functions have been proposed for step 2. The SNVtest is simpler and was adopted by previous studies involving MLCs16,17; in essence, it tests the null hypothesis of independent origins62. The mutation.rem function computes the exact probability of identical clones in an individual; it has not been employed in investigating MLCs but has shown higher accuracy in other cancers38,39. Thus, we chose this function in the simulation and validation parts. In the review part, however, we adopted the SNVtest given the inter-study heterogeneity, since this function operates independently of the genetic profiles of the studied cohorts, while mutation.rem yields model parameters specific to the studied population39. Mutations extracted from the review were reverse-annotated using Transvar63 to meet the input requirements of the SNVtest. However, 76 genomic alterations could not be reverse-annotated even after manual correction.

The third step, interpretation, also differs between the two functions. For the SNVtest, a p-value of less than 0.05 rejects the null hypothesis and IPM is diagnosed, while MPLC is strictly identified when the p-value is more than 0.95. The diagnosis cannot be determined when the p-value lies between 0.05 and 0.95. The interpretation of mutation.rem is based on an established cutoff (how it was obtained in this study is detailed later): a probability above the cutoff indicates the same clone (IPM), while a probability below it leads to a diagnosis of MPLC.
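The two interpretation rules can be written as small Python helpers. The function names are hypothetical, and the handling of a probability exactly equal to the cutoff (treated here as MPLC) is an assumption, since the text only specifies values above and below it.

```python
def interpret_snvtest(p_value):
    """Map an SNVtest p-value to a diagnosis using the 0.05 / 0.95 thresholds."""
    if p_value < 0.05:
        return "IPM"             # independence rejected
    if p_value > 0.95:
        return "MPLC"
    return "indeterminate"       # 0.05 <= p <= 0.95


def interpret_mutation_rem(clonal_probability, cutoff):
    """Map a clonal probability to a diagnosis using a panel-specific cutoff."""
    # Assumption: a probability exactly at the cutoff is called MPLC.
    return "IPM" if clonal_probability > cutoff else "MPLC"


# The cutoff of 0.5 is a placeholder, not a value reported in this study.
calls = [interpret_snvtest(0.01), interpret_snvtest(0.5), interpret_mutation_rem(0.8, 0.5)]
```

Note that the SNVtest rule has a three-way outcome, whereas the cutoff rule always returns a diagnosis.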

Simulation of precisely-diagnosed MLCs

Considering the lack of validated MLCs cohorts, we paired sequencing profiles from multi-region sequencing of solitary NSCLC to model MPLC and IPM64. Samples from the same patient were paired to model IPM (sim-IPM), while samples from different patients mimicked MPLC (sim-MPLC), generating 28,203 simulated cases of precisely diagnosed MLCs (details of these samples have been reported26). WES data of these cases were subsequently subsampled to evaluate the performance of different molecular methods.
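The pairing scheme can be sketched as follows. Enumerating every possible pair is an assumption made for illustration; the text does not state exactly which pairs were retained to reach 28,203 cases, and the function and sample names are hypothetical.

```python
from itertools import combinations


def simulate_pairs(samples_by_patient):
    """Label sample pairs: same patient -> sim-IPM, different patients -> sim-MPLC."""
    labelled = [(pid, s) for pid, samples in samples_by_patient.items() for s in samples]
    pairs = []
    for (pid1, s1), (pid2, s2) in combinations(labelled, 2):
        label = "sim-IPM" if pid1 == pid2 else "sim-MPLC"
        pairs.append((s1, s2, label))
    return pairs


# Toy input: patient P1 has two sequenced regions, patient P2 has one.
pairs = simulate_pairs({"P1": ["P1_R1", "P1_R2"], "P2": ["P2_R1"]})
```

With this toy input the two regions of P1 form one sim-IPM pair, and each of them paired with the P2 region forms a sim-MPLC pair.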

During the bioinformatic analysis of the modeled MLCs, we first randomly selected 56 of the 80 SLC patients and used the simulated cases derived from them to establish the optimal parameters of the mutation.rem function for each panel (Supplementary Table 8). The determined parameters were then applied to the data derived from the remaining 24 SLC patients to draw the receiver operating characteristic (ROC) curve for each panel and to calculate the area under the curve (AUC) as well as the interpretation cutoff of the probabilities. This process was repeated 100 times to obtain the optimal parameters and cutoffs, which were used and validated in subsequent analyses.
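The repeated 56/24 split and the AUC calculation can be outlined in plain Python. This is a sketch under assumptions: the model-fitting step is omitted, the AUC is computed by the rank-based (Mann-Whitney) definition rather than by whatever software was actually used, and all names and seeds are hypothetical.

```python
import random


def split_patients(patient_ids, n_train=56, seed=0):
    """Randomly split patients into a training set and a held-out set."""
    rng = random.Random(seed)
    shuffled = list(patient_ids)
    rng.shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]


def auc(scores_ipm, scores_mplc):
    """AUC as the probability that a sim-IPM pair scores above a sim-MPLC pair."""
    wins = 0.0
    for p in scores_ipm:
        for n in scores_mplc:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5      # ties count as half
    return wins / (len(scores_ipm) * len(scores_mplc))


# 100 repetitions, each with a different random split of the 80 patients.
splits = [split_patients(range(80), seed=i) for i in range(100)]
toy_auc = auc([0.9, 0.8], [0.1, 0.85])
```

Splitting at the patient level, as described in the text, keeps all simulated pairs from one patient on the same side of the split.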

Evolutionary analysis

Analysis of variant allele frequencies (VAF) and phylogenetic reconstruction were performed using established methods, TumE65 and PyClone66. Briefly, mutations were classified as clonal/subclonal based on cancer cell fraction (CCF)67 and further timed with respect to whole-genome doubling (WGD) following previous publications33. This section consists of two parts, VAF analysis and timing of mutations.

An established synthetic supervised learning method (TumE) was applied65 to compare the subclonal compositions of overlapping mutations in MLCs and to explore the differences in evolutionary trajectories between MPLC and IPM through VAF analysis based on a bulk-sequenced single biopsy. Briefly, this algorithm integrates simulated models of cancer evolution under positive and neutral selection with Bayesian neural networks to quantify subclonal dynamics using purity-corrected VAF information from diploid genomic regions.

Sequenza (v.2.1.2) was used to estimate the copy number profile (including allele-specific copy number), tumor purity, and ploidy for each sample. Mutation copy number was equal to the fraction of tumor cells carrying a given mutation multiplied by the number of chromosomal copies at that locus. CCF was computed based on a previously proposed method67. Mutations were classified as clonal if CCF ≥1 and subclonal if CCF was below 1. Clonal mutations were then further timed as early, late, or untimed clonal. In brief, mutations with at least two copies of the major allele were preliminarily classified as early if the mutation copy number was >1, and late if it was ≤1. Any clonal mutations that could not be timed as either early or late were classified as “clonal untimed”. The sequencing profiles from TRACERx and from our published study of solitary NSCLC were also included to further investigate the temporal difference in cancer evolution26,33.
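The classification rules in this paragraph can be condensed into a single Python function. It takes CCF, mutation copy number, and major-allele copy number as precomputed inputs (the CCF calculation itself follows the cited method and is not reproduced here); the function name and argument names are hypothetical.

```python
def classify_mutation(ccf, mutation_copy_number, major_allele_copies):
    """Classify one mutation as subclonal, clonal early, clonal late, or clonal untimed."""
    if ccf < 1:
        return "subclonal"
    # Clonal (CCF >= 1): timing is only attempted with >= 2 copies of the major allele.
    if major_allele_copies >= 2:
        return "clonal early" if mutation_copy_number > 1 else "clonal late"
    return "clonal untimed"


labels = [
    classify_mutation(0.4, 0.4, 1),   # CCF below 1
    classify_mutation(1.0, 2.0, 2),   # clonal, mutation copy number > 1
    classify_mutation(1.0, 1.0, 2),   # clonal, mutation copy number <= 1
    classify_mutation(1.0, 1.0, 1),   # clonal, fewer than two major-allele copies
]
```

The four toy calls return, in order, "subclonal", "clonal early", "clonal late", and "clonal untimed".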

Generally, MPLC represents a cluster of unrelated primary tumors, whereas IPM consists of two foci, one of which arises from tumor cells that have spread from the other. Previous studies suggested that in metachronous IPM most primary foci appeared larger than the metastases30, and that there is no significant difference in tumor size between synchronous and metachronous MLCs15. We therefore hypothesized that the larger lesion of IPM is the primary focus (IPM-L) and the smaller lesion is the metastasis (IPM-S) in order to explore the discrepancy within IPM.

Statistics

DFS was presented on Kaplan–Meier curves, with differences in prognosis determined by the log-rank test. Continuous variables with normal distribution were described as mean ± standard deviation, while categorical variables were expressed as frequencies and percentages. Univariable comparisons were conducted using the Wilcoxon rank-sum test for continuous variables and the chi-square test for categorical variables. Statistical analyses were performed using R 4.1.0 (R Core Team) and GraphPad PRISM 9.0 (GraphPad Software). Two-tailed p-values less than 0.05 were considered statistically significant.