Introduction

Genetic alterations in the erythroblastic leukemia oncogene B (erbB) family, also known as the human epidermal growth factor receptor (HER) family, are important mediators of cellular proliferation, tumorigenesis, and apoptosis1. The HER family has four members: epidermal growth factor receptor (EGFR; also known as HER1 or erbB1), ERBB2 (erbB2 or neu), HER3 (erbB3), and HER4 (erbB4). EGFR and ERBB2 are major therapeutic targets in patients with non-small cell lung cancer (NSCLC)2. ERBB2 has no ligand-binding domain of its own; instead, it binds to other ligand-bound EGF receptor family members to form a heterodimer, stabilizing ligand binding and enhancing kinase-mediated activation of downstream signaling pathways, such as the mitogen-activated protein kinase (MAPK) and phosphatidylinositol-3 kinase (PI3K) pathways. Around 1–6% of NSCLC tumors harbor ERBB2 mutations3,4,5, leading to constitutive activation of downstream proliferative pathways and resulting in oncogenesis6. Similar to EGFR mutations, ERBB2 mutations in NSCLC are most common in female non-smokers and are associated with a higher incidence of brain metastases3,7,8. Other reports showed that 6–35% of NSCLC tumors display ERBB2 protein overexpression, while 10–20% of NSCLC tumors contain ERBB2 amplification9,10,11,12,13,14.

ERBB2 exon 20 insertions (3–12 bp) in the kinase domain between codons 775 and 881 have been characterized as the most common oncogenic ERBB2 alteration3. The most common ERBB2 insertion is a YVMA duplication at codon 775 (83% of cases)3,8. Some point mutations in ERBB2 exon 20 are oncogenic as well, such as L755S and G776C3. Recurrent mutations in the ERBB2 transmembrane domain (TMD) and juxtamembrane domain (JMD) have been identified, including G660D, R678Q, E693K, and Q709L. These oncogenic mutations enhance ERBB2 activity by improving the active dimer interface or stabilizing an activating conformation15. The exon 8 ERBB2 mutation (S310X) lies in the extracellular domain, also known as the furin-like cysteine-rich domain. S310F and S310Y mutations may result in hydrophobic interactions between the aromatic rings of the newly introduced 310F or 310Y and Y274 and F279 of the neighboring molecule, promoting noncovalent dimerization and kinase activation16. However, all previous studies had limited sample sizes, so the incidence of each ERBB2 mutation in lung cancer is still largely unknown. Furthermore, the co-occurring mutation profile of ERBB2-altered lung cancer also requires further investigation.

Currently, for metastatic lung cancer harboring ERBB2 mutations, trastuzumab deruxtecan (Enhertu, T-DXd) is the only approved targeted therapy. In the DESTINY-Lung01 and DESTINY-Lung02 trials, trastuzumab deruxtecan, a humanized anti-ERBB2 monoclonal antibody-drug conjugate (ADC), showed an overall response rate (ORR) of 33.9–72.5% with a median progression-free survival (PFS) of 8.2 (95% CI, 6.0–11.9) months17,18,19. Small-molecule tyrosine kinase inhibitors (TKIs) targeting oncogenic ERBB2 mutations have also been investigated. Studies using non-selective ERBB2 TKIs such as dacomitinib (NCT00818441), afatinib (NCT02597946), tarloxotinib (NCT03805841), and neratinib (NCT01827267) yielded ORRs ranging from 0 to 19%. More recently, mobocertinib (NCT02716116), poziotinib (NCT03318939) and pyrotinib (NCT04063462, NCT02834936)20,21,22,23 demonstrated moderate efficacy. Multiple anti-ERBB2 agents are under active development, including TKIs such as BI1810631 (NCT04886804) and BAYER 2927088 (NCT05099172). Additionally, trials of other ERBB2-directed monoclonal antibodies and ADCs, such as trastuzumab with pertuzumab (NCT02925234, NCT02091141, NCT03845270) and T-DM1 (NCT02675829), have also shown clinical activity.

As ERBB2 is established as a new actionable target oncogene in lung cancer, it is critical to understand the heterogeneity of its mutational landscape. Using next-generation sequencing (NGS) of both patient blood and tissue samples, this study identified a diversity of ERBB2 alterations and the associated clinicopathological features in two independent retrospective cohorts from Geneplus (China) and Guardant360 (the US). Furthermore, this study examined the mutations co-occurring with ERBB2 mutations and compared the distribution of EGFR and ERBB2 exon 20 insertions. This study represents a real-world dataset that incorporates patients with ERBB2-altered NSCLC from both China and the United States.

Results

Clinicopathologic characteristics of ERBB2 altered NSCLC

In this study, two large retrospective cohorts of NSCLC patients were analyzed. In the Geneplus study, a total of 1281 Chinese patients diagnosed with NSCLC harboring ERBB2 somatic mutations (including amplification) were enrolled from November 2016 to April 2022. The incidence of all types of ERBB2 alterations in the tested NSCLC patients during the enrollment period was 5.6% (1281 out of 22,905). In the Guardant360 cohort, 1719 patients in the United States diagnosed with NSCLC harboring ERBB2 somatic mutations were analyzed from June 2019 to October 2021, with an incidence of 5.2% (1719 out of 33,080). The Guardant360 cohort comprised samples from patients in the United States with various ethnic backgrounds; however, racial data were not required to be collected in this cohort. As Guardant360 is widely used in the United States, it is very likely that this cohort represents the US patient population. In total, 3000 ERBB2-altered NSCLC patients were identified and analyzed.

Among the 1281 patients with ERBB2 alterations in the Geneplus cohort, 55% were female, and the median age was 58 (range, 18–89) years. The majority of patients with ERBB2 alterations were never smokers (70.9%) and had adenocarcinoma histology (94.9%). Clinicopathologic and demographic features of patients with ERBB2 somatic mutations including ERBB2 amplifications are included in Table 1. Among the 1719 patients with ERBB2 alterations in the Guardant360 cohort, 51.8% were female, and the median age was 70 (range, 22–100) years. The majority of patients had adenocarcinoma (94.7%).

Table 1 Clinical features of patients with ERBB2 somatic mutations including ERBB2 amplifications in Geneplus cohort

Mutation landscape of ERBB2 alterations in NSCLC

We characterized the mutational landscape of ERBB2 in NSCLC patients from these two cohorts. In Geneplus, ERBB2 mutations or insertions were identified in 930 patients and ERBB2 amplification (amp) only in 351 patients; concurrent mutation and amplification was found in 5.5% (70 out of 1281). Next, we annotated the mutations/insertions for their oncogenic potential using the OncoKB and COSMIC databases (“Methods”) and found that 920 mutations/insertions were oncogenic (Fig. 1a). Therefore, the incidence of ERBB2 alterations was 5.6% (1,281/22,905), and that of ERBB2 oncogenic mutations or insertions was 4.0% (920/22,905).

Fig. 1: ERBB2 mutational profile in newly diagnosed NSCLC patients.

ERBB2 oncogenic mutations and proportion reported by OncoKB and COSMIC in Geneplus (a, c) and Guardant360 (b, d). e Proportion of ERBB2 oncogenic mutations for 70 patients with co-occurring ERBB2 amplification in Geneplus cohort. f ERBB2 mutational profile of the 70 patients with co-occurring ERBB2 amplification in Geneplus cohort.

Among the oncogenic mutations, the most common alterations were in the tyrosine kinase domain exon 20 (80.1% [737/920]), with Y772_A775dupYVMA being the most frequent (58% [534/920]), followed by G776delinsVC/LC/VV/IC (10.7% [98/920]) and S310F/Y (10.5% [97/920]) (Fig. 1a, c). In the 70 patients with co-occurring ERBB2 amplification, oncogenic ERBB2 alterations were identified in 61 patients (Fig. 1e). Among them, Y772_A775dupYVMA was the most common mutation (50.8% [31/61]) (Fig. 1f). In addition, 2.5% (32/1,281) of patients had ERBB2 mutations that were reported as neutral by OncoKB or COSMIC or were not reported (Supplemental Fig. 1a).

In the Guardant360 cohort, 634 mutations/insertions were annotated as oncogenic by OncoKB and/or COSMIC (Fig. 1b). Therefore, the incidence of ERBB2 alterations was 5.2% (1,719/33,080), and that of ERBB2 oncogenic mutations or insertions was 1.9% (634/33,080).
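The incidence figures for both cohorts follow directly from the reported counts. The short sketch below recomputes them as a sanity check; the dictionary layout and the function name `incidence_percent` are illustrative choices, not part of the study's analysis pipeline.

```python
# Counts as reported in the text for the two cohorts.
COHORTS = {
    "Geneplus": {"tested": 22905, "altered": 1281, "oncogenic": 920},
    "Guardant360": {"tested": 33080, "altered": 1719, "oncogenic": 634},
}


def incidence_percent(cases: int, tested: int) -> float:
    """Return cases / tested as a percentage rounded to one decimal place."""
    if tested <= 0:
        raise ValueError("tested must be a positive count")
    return round(100.0 * cases / tested, 1)


if __name__ == "__main__":
    for name, counts in COHORTS.items():
        any_alteration = incidence_percent(counts["altered"], counts["tested"])
        oncogenic = incidence_percent(counts["oncogenic"], counts["tested"])
        print(f"{name}: any ERBB2 alteration {any_alteration}%, oncogenic {oncogenic}%")
```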

Among those, a similar pattern was observed, with Y772_A775dupYVMA being the most frequent (41.6% [264/634]) (Fig. 1b, d), followed by S310F/Y (15.4% [98/634]) and G776delinsVC/LC/VV (9.7% [61/634]). In addition, 48% (825 out of 1,719) of alterations, comprising 806 unique variants, had no known annotation in those two databases (Supplemental Fig. 1b).

Among the 806 unique variants of unknown significance (VUS), 154 (19.1%) were detected in more than one patient and 50 (6.2%) in more than three patients. Using clonality analysis (“Methods”), the clonality of 309 (38.3%) variants was above 0.5. Overall, 29 VUSs both recurred (in more than one patient) and were predicted to be the dominant clone. S335C, G222C, and D277Y, all located in the furin-like region, were the three most frequent protein changes with dominant clonality (Supplemental Fig. 1c). Given these stringent filtering criteria, these 29 alterations may be potential oncogenic driver mutations and warrant further investigation (Supplemental Fig. 1c).
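The two filters described above (recurrence in more than one patient and clonality above 0.5) can be sketched as follows. This is a minimal illustration, not the study's code: the input layout is hypothetical, and how clonality was aggregated across patients is not stated in the text, so the sketch assumes a variant passes if its clonality exceeds the cutoff in at least one patient.

```python
from collections import defaultdict


def recurrent_dominant_vus(calls, min_patients=2, clonality_cutoff=0.5):
    """Return VUS protein changes that recur and reach dominant clonality.

    calls: iterable of (patient_id, protein_change, clonality) tuples
           (hypothetical layout chosen for this sketch).
    A variant is kept when it is seen in at least `min_patients` distinct
    patients and its clonality exceeds `clonality_cutoff` in at least one
    of them (an assumption; the aggregation rule is not given in the text).
    """
    patients = defaultdict(set)
    max_clonality = defaultdict(float)
    for patient_id, change, clonality in calls:
        patients[change].add(patient_id)
        max_clonality[change] = max(max_clonality[change], clonality)
    return sorted(
        change
        for change in patients
        if len(patients[change]) >= min_patients
        and max_clonality[change] > clonality_cutoff
    )
```

The variant labels passed to this function would be protein changes such as those listed in Supplemental Fig. 1c; the sketch itself makes no claim about which variants pass.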

Co-occurring alterations with ERBB2 in NSCLC

Next, we evaluated other genetic alterations co-occurring with ERBB2 alterations in NSCLC. In the Geneplus cohort, samples from 663 patients were analyzed using the 1021-gene panel (Table 1); the samples were tissue (91.4%), blood (6.8%), or pleural effusion (1.5%). Of these, 451 (68.0%) samples had ERBB2 mutations, 175 (26.4%) had ERBB2 amplification only, and 37 (5.6%) had both ERBB2 mutations and amplifications (Fig. 2a). Clinicopathologic and demographic features of these patients are described in Supplementary Table 1. The median age at diagnosis was 57 years (range, 21–89); 56.3% of patients were female and 70.3% were non-smokers. Among patients with known histology types, most had adenocarcinoma histology (94.3%).

Fig. 2: Differential co-mutated genes between patients with ERBB2 oncogenic mutations and with ERBB2 amplifications in Geneplus cohort.

a Co-mutation plot showing co-occurring alterations between ERBB2 oncogenic mutations and amplifications. Enrichment analysis of co-altered genes between patients with ERBB2 oncogenic mutations and amplifications: (b) gene SNVs/Indels; (c) gene CNVs. d Map of chromosome 17 (Chr17) showing the location of ERBB2 and co-amplified genes on the same chromosome.

When co-mutations were compared between ERBB2 mutations and ERBB2 amplification, TP53, EGFR, and CDKN2A co-occurred significantly more frequently with amplifications in both the Geneplus (Fig. 2b) and Guardant360 cohorts (Supplemental Fig. 2a, b). Copy number gain of CDK12, FGFR1, and EGFR was observed significantly more frequently in patients with ERBB2 amplification than in those with ERBB2 mutation (Fig. 2c). Given the proximity of CDK12 and ERBB2 on Chr17q12, it is not surprising that ERBB2-amplified samples also contained an amplification of CDK12 (Fig. 2d). This observation has also been reported for ERBB2-amplified tumors of other solid cancer types, including breast cancer, gastric cancer, biliary tract cancer, and colorectal cancer24.

As ERBB2 mutation and amplification displayed distinct co-mutation landscapes, we also compared the clinical features of the two groups (Table 2). Patients with ERBB2 mutations were significantly younger (mean age: 53.6 vs 63 years; p < 0.001), more likely to be female (66% vs 31%; p < 0.001), and more likely to be never smokers (79% vs 50%; p < 0.001). The majority of the ERBB2-mutant tumors were adenocarcinomas (98.6%), whereas a higher proportion of ERBB2-amplified tumors had squamous cell carcinoma histology (15.4% vs 0.7%; p < 0.001). ERBB2-mutant tumors had a lower TMB (mean value: 3.5 vs 8.9; p < 0.001).

Table 2 Clinical features of patients with ERBB2 somatic mutations or ERBB2 amplification only in Geneplus cohort

Clonality relationship between ERBB2 and EGFR when co-occurring in NSCLC

Oncogene variant clonality can be deduced from the variant allele frequency (VAF) to infer dominant versus non-dominant clonal relationships. We annotated the relative clonality inferred by VAF for oncogenic alterations of ERBB2 and EGFR to dissect the relationship between these two driver genes.

Among samples with ERBB2 oncogenic alterations (n = 474), co-occurring EGFR oncogenic mutations were observed in 48 cases (10.1%), and the dominant clone was assigned as follows: ERBB2 (n = 7), EGFR (n = 16), both (n = 20), and other gene alterations (n = 5) (Supplementary Fig. 3a). Of the 27 cases in which ERBB2 was dominant, with or without EGFR also being dominant (22 S310F; 3 Y772_A775dup; 1 V659D; 1 S310Y), 18 had EGFR L858R and 7 had an exon 19 deletion. Of the 16 cases in which only EGFR was dominant, 11 had ERBB2 S310X mutations and 1 had Y772_A775dup (Supplementary Fig. 3b–d). These results suggest that, in patients with concomitant EGFR and ERBB2 oncogenic mutations, well-documented EGFR drivers such as L858R and exon 19 deletion were most often part of the dominant clone.

Difference between EGFR and ERBB2 exon 20 insertions in NSCLC

As EGFR and ERBB2 are homologous members of the ERBB family and exon 20 insertion is a common mechanism of activation for both genes, we directly compared the exon 20 insertion mutations of the two genes. In the Geneplus cohort, we compared clinical characteristics and molecular features between samples with ERBB2 exon 20 insertions (n = 370) and EGFR exon 20 insertions (n = 323) (Table 3). ERBB2 exon 20 insertions were significantly associated with younger age (mean age: 51.6 vs 57.1 years; p < 0.001) and female sex (69% vs 60%; p = 0.03). A lower TMB (mean value: 2.8 vs 3.6; p = 0.007) was observed in patients with ERBB2 exon 20 insertions than in those with EGFR exon 20 insertions.

Table 3 Clinical features of patients with ERBB2 exon 20 (ex20) insertions (ins) or EGFR ex20 ins using 1021 gene panel in Geneplus cohort

We then compared the exon 20 insertion patterns between EGFR and ERBB2. In the Geneplus dataset, 276 out of 370 (74.7%) ERBB2 exon 20 insertions were the Y772_A775dupYVMA alteration in the alpha C-helix domain, and 25% of the cases were located in the loop following the alpha C-helix domain (Fig. 3a). A similar percentage of samples in each location was identified in the Guardant360 cohort (Fig. 3b). For EGFR exon 20 insertions in the Geneplus cohort, 29.3% were A767_V769dup and 20.2% were S768_D770dup, both located in the near loop. In the far loop, H773dup (4.6%) was the most frequent amino acid change, followed by H773_V774dup (3%) and H773_V774insAH (1.6%) (Fig. 3c). Similarly, in patients with EGFR exon 20 insertions in the Guardant360 cohort, A767_V769dup and S768_D770dup were the most frequent insertions in the near loop, whereas H773dup and V774_C775insHV occurred more frequently in the far loop (Fig. 3d). Overall, EGFR exon 20 insertions had a more diverse insertion distribution than ERBB2 exon 20 insertions in both cohorts. Most of the variants were diverse in-frame insertions or duplications of one to four amino acids spanning A767-V774 in EGFR and Y772-P780 in ERBB2 (Fig. 3e, f).

Fig. 3: Difference between EGFR and ERBB2 exon 20 insertions (ex20 ins) in NSCLC.

ERBB2 ex20 ins in Geneplus (a) and Guardant360 (b). EGFR ex20 ins in Geneplus (c) and Guardant360 (d). Bubble plots of ex20 ins comparison between ERBB2 and EGFR in NSCLC patients in Geneplus (e) and Guardant360 (f).

We also compared the co-mutation landscape between EGFR and ERBB2 exon 20 insertions. In tumors with ERBB2 exon 20 insertions, TP53 was the most frequently co-mutated gene (29%), followed by MED12 (7%), LRP1B (5%) and RB1 (5%) (Fig. 4a). In tumors with EGFR exon 20 insertions (Fig. 4b), the top co-mutated genes were similar: TP53 (46%), LRP1B (9%), RB1 (9%), and MED12 (8%). When the two groups were compared directly, co-mutations in TERT occurred more frequently with ERBB2 than with EGFR exon 20 insertions, whereas co-mutations in TP53, RBM10, CTNNB1, TBX3, STAG2, RPTOR, and MLH3 occurred more frequently in patients with EGFR exon 20 insertions (Fig. 4c and Supplementary Fig. 4a). When copy number gains were compared, EGFR and ERBB2 exon 20 insertions were each enriched for copy number gain of the same gene (Fig. 4d). ERBB2 co-mutations occurred in 13 samples with ERBB2 exon 20 insertions and 3 samples with EGFR exon 20 insertions (Supplementary Fig. 4b), whereas EGFR co-mutations occurred in 8 cases with ERBB2 exon 20 insertions and 42 cases with EGFR exon 20 insertions (Supplementary Fig. 4c). Regarding copy number variants (CNVs), MYC, MDM2, KRAS, IKZF1, CARD11, and PMS2 were associated with EGFR exon 20 insertions.

Fig. 4: Differential co-mutated genes between patients with EGFR and ERBB2 exon 20 insertions (ex20 ins) in Geneplus cohort.

Co-mutation plots of ERBB2 (a) and EGFR (b). Enrichment analysis of co-altered genes between patients with ERBB2 ex20 ins and EGFR ex20 ins: (c) gene SNVs/Indels; (d) gene CNVs.

Discussion

In this study, we report what is, to our knowledge, the largest mutational profile dataset of NSCLC harboring ERBB2 alterations, including both Geneplus (Chinese) patients and Guardant360 (US) patients, and provide important insights into ERBB2 alterations in NSCLC.

The incidence of ERBB2 mutations in NSCLC has been controversial, often cited as 1–6%. The incidence depends on how a mutation is defined: in the Guardant360 cohort, without filtering, 5.2% of NSCLC samples had at least one ERBB2 alteration, whereas after annotation with OncoKB and COSMIC, the incidence of oncogenic ERBB2 alterations decreased to 1.9%. Unlike EGFR mutations, which show a strong predisposition in Asian patients, the prevalence of ERBB2 oncogenic alterations in previous reports appears to be comparable between patients from the US or France (~2%)3,8 and Asian populations (4–8%)14,25. By alteration subtype, previous studies have reported gene mutations (3–5%)5,8,26, gene amplification (2–5%), and protein overexpression (2–30%)27 in lung cancers. Here, we report oncogenic ERBB2 mutations at 1.9–5.1% and amplification at 0.5–1.8%.

We characterized the demographics of NSCLC patients with ERBB2 mutation or amplification. Patients with ERBB2-mutant NSCLC were relatively young, with a predominance of women and never smokers, and their tumors were mostly adenocarcinomas. These features closely resemble those of EGFR-mutant NSCLC, although with some differences. This demographic similarity was further confirmed in our comparison between the EGFR and ERBB2 exon 20 patient populations. However, patients with ERBB2-amplified NSCLC had distinct features, including a predominance of men and smokers as well as high TMB, similar to previous reports24,28, suggesting that ERBB2 amplification might not be a strong oncogenic driver.

Previous studies have demonstrated variant-specific differences in patient outcomes on targeted treatment. Dacomitinib treatment resulted in an ORR of 11.5% for ERBB2-mutant NSCLC but no response in patients with the ERBB2 exon 20 insertion Y772dupYVMA29. In a pan-ERBB2-mutant NSCLC study testing the efficacy of T-DM1, patients harboring ERBB2 exon 20 insertions had an ORR of 54.5%, but patients with the ERBB2 exon 19 mutation (p.L755P) did not respond30. Therefore, understanding the mutational landscape is important for matching patients to precision oncology treatments and crucial for future drug development. We analyzed the frequency of mutations within the various regions reported by OncoKB and COSMIC. In cohorts from Asian and Western countries, ERBB2 exon 20 insertions occur frequently (>50%); previous clinical studies reported that they respond poorly to TKIs (ORR: 0–18.8%)29,31,32,33 but well to ADCs (14.3–75%)30,34,35,36. Y772_A775dupYVMA is the most recurrent mutation in the kinase domain in the Geneplus and Guardant360 datasets (58% and 41.6%, respectively). Worse survival was associated with A775_G776YVMA in NSCLC when compared to other less common ERBB2 alterations37. Conventional platinum-based chemotherapy has previously been found to yield worse PFS in patients with A775_G776YVMA than in those with other ERBB2 variants. However, studies have reported that ADCs had better outcomes in this group of patients. The most recurrent variant in the non-TK domain is S310F, which is located in the furin-like cysteine-rich domain (8.5% in Geneplus and 10.7% in Guardant360). The furin-like cysteine-rich domain contains numerous cysteine residues that participate in disulfide bond formation and in homodimer and heterodimer formation with other ErbB family members. The S310F mutation promoted ERBB2 homodimerization and consequent auto-phosphorylation to activate the downstream PI3K/AKT and MAPK pathways, independently of ERBB1, ERBB3, and ERBB4, contributing to the growth and migration of cancer cells38. Responses to ADCs in transmembrane and extracellular domain ERBB2 mutations (V659E and S310F) have been reported30. In summary, given the high incidence of ERBB2 exon 20 A775_G776YVMA, this population alone can represent an area of focused drug development. Drugs targeting transmembrane and extracellular domain ERBB2 mutations are also needed.

Genetic alterations that co-occur with an oncogenic mutation can also be associated with clinical response or resistance. Our findings indicated that concurrent driver mutations, including actionable KRAS, ALK, or BRAF alterations, were mostly mutually exclusive with ERBB2 kinase domain mutations. However, a relatively higher frequency of co-mutation with EGFR was observed for non-TK domain mutations, especially S310F, which was dominant or co-dominant with EGFR mutations. Furthermore, a significantly higher frequency of co-mutation with EGFR was observed in patients with ERBB2 amplification, indicating that S310F and ERBB2 amplification could be a potential mechanism of resistance to EGFR TKIs39,40. Dual-targeting drugs such as afatinib, or a TKI plus an ADC, could be a potential choice for this special group of patients with both EGFR and ERBB2 actionable alterations.

Compared to EGFR classical mutations, EGFR exon 20 insertions account for a small fraction of EGFR mutations and now have a distinct set of targeted therapy approvals in lung cancer patients. As described above, ERBB2 exon 20 insertions are the dominant alterations in NSCLC patients harboring ERBB2 alterations, in contrast to EGFR, where classical mutations are the most common and exon 20 insertions are less common. We compared the characteristics of these two groups of exon 20 insertions. We observed high heterogeneity in EGFR exon 20 insertions, whereas ERBB2 exon 20 insertions were dominated by A775_G776YVMA. In the past, some TKIs were developed to target both EGFR and ERBB2 exon 20 insertions, including poziotinib, mobocertinib and pyrotinib. Poziotinib is an irreversible pan-HER TKI. The ZENITH 20 trial demonstrated a response rate of 14.8% in previously treated patients with EGFR exon 20 insertions41 and 27.8% in those with ERBB2 exon 20 insertions42. Data from another trial of mobocertinib showed an ORR of 28% for patients with EGFR exon 20 insertions22. In patients with ERBB2 exon 20 insertions treated with pyrotinib, a pan-HER TKI against HER1/2/4, a phase II study reported an ORR of 31.7% and a PFS of 6.8 months43, which is similar to the outcomes of patients with EGFR exon 20 insertions treated with pan-HER TKIs. For patients with ERBB2 exon 20 insertions, ADCs showed excellent results in the DESTINY-Lung01 clinical trial and may change clinical practice. Small molecules designed to specifically target ERBB2 mutations are now under investigation and could potentially spare patients EGFR-related toxicities.

There are strengths and weaknesses to our study. To our knowledge, this is the largest comparison of NSCLC patients with ERBB2 alterations between Chinese and US datasets; however, the conclusions are limited by the heterogeneity of laboratory assays and sample sources. Furthermore, because this is a real-world study with incomplete information, clinical outcomes on specific drugs were not assessed. Finally, for the VUS, further technologies and bioinformatic approaches are needed to identify the function of rare variants. As such, future studies, both experimental and clinical, are warranted to validate these provocative genomic findings and their clinical implications.

In conclusion, in two large independent cohorts, Geneplus with patients from China and Guardant360 with patients from the United States, ERBB2 mutation and co-mutation patterns were similar. ERBB2 exon 20 insertions/mutations were dominant at over 80%, with Y772_A775dupYVMA being the most common driver mutation; TP53 and EGFR were the most frequently co-mutated genes. ERBB2-mutant lung cancers had low TMB and PD-L1 expression, as expected in lung adenocarcinomas that predominantly affect women and never smokers, similar to EGFR exon 20 NSCLC.

Methods

Study population and platforms

Two retrospective cohorts of NSCLC patients were analyzed for ERBB2 alterations: Geneplus (both ctDNA and tissue, November 2016 to April 2022) and Guardant360 (ctDNA, June 2019 to October 2021). The panels used for tissue or blood samples from Geneplus are summarized in Supplementary Tables 2–6. Supplementary Tables 7–8 show the Guardant360 ctDNA 74- and 83-gene panels. All patients in the study provided written informed consent. The study protocol was approved by the Institutional Review Board of Peking Union Medical College Hospital (K3415) and was conducted in accordance with the Declaration of Helsinki and the principles of Good Clinical Practice. This study is compliant with the Guidance of the Ministry of Science and Technology (MOST) for the Review and Approval of Human Genetic Resources, with formal approval for publishing data in a scientific journal (2024BAT00891). The raw data from China will not leave the country and will not be disclosed publicly. The generation of de-identified data sets by Guardant Health for research purposes was approved by the Advarra Institutional Review Board (Pro00034566); patient identity protection was maintained throughout the study in a de-identified database.

DNA extraction and targeted next-generation sequencing (Geneplus)

DNA was isolated from tissue samples, peripheral blood, pleural effusion and hydrothorax using commercial kits (Qiagen, Hilden, Germany). Peripheral blood leukocytes were separated to extract germline genomic DNA using the QIAamp DNA Blood Mini Kit (Qiagen, Hilden, Germany). The KAPA Library Preparation Kit (Kapa Biosystems, Wilmington, MA, USA) was used to prepare indexed Illumina NGS libraries. Custom-designed 1021-, 73-, 36-, 59- or 86-gene cancer-related panels were used to hybridize the DNA libraries, and their selected regions and genes are listed in Supplementary Tables 2–6. The hybridized libraries were sequenced using a 100-bp paired-end configuration on a DNBSEQ-T7RS sequencer (MGI Tech, Shenzhen, China). The minimal mean effective depth of coverage was 300× for tissue and germline DNA and 1000× for ctDNA.

After the removal of terminal adaptor sequences and low-quality reads with FASTP44, the remaining reads were aligned to the reference human genome (hg19) using the Burrows-Wheeler Aligner (version 0.7.12-r1039) with default parameters. GATK (3.4-46-gbc02625) and MuTect2 (1.1.4) were used to call somatic single nucleotide variants and small insertions and deletions. Contra (2.0.8) was used to identify copy number variations45. NCSV (in-house algorithm 0.2.3) was employed to detect structural variants46. All candidate variants were manually confirmed using the Integrative Genomics Viewer browser. Variants were filtered to exclude clonal hematopoietic mutations, using an in-house database of clonal hematopoiesis variants from >10,000 pan-cancer patients and healthy individuals47, as well as germline mutations in dbSNP and variants that occur at a population frequency of >1% in the Exome Sequencing Project. Oncogenic/neutral ERBB2 mutations are those documented as pathogenic/neutral in the OncoKB (https://www.oncokb.org/) and COSMIC (https://cancer.sanger.ac.uk/cosmic) databases. Unknown ERBB2 mutations are those not reported by either the OncoKB or COSMIC database. ERBB2 copy number alterations with copy number ≥ 2.6 were considered amplifications, and the alterations were manually confirmed with CNV plots.
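The annotation and amplification rules stated above can be expressed as two small decision functions. The sketch below is illustrative only: it does not query OncoKB or COSMIC, the label strings and function names are hypothetical, and it assumes that a "pathogenic" record in either database is sufficient to call a variant oncogenic, a precedence rule the text does not spell out.

```python
from typing import Optional

# Copy number cutoff for ERBB2 amplification in the Geneplus cohort, as stated in the text.
GENEPLUS_AMP_CUTOFF = 2.6


def annotate_erbb2_variant(oncokb: Optional[str], cosmic: Optional[str]) -> str:
    """Classify a variant from pre-fetched database labels.

    Each argument is "pathogenic", "neutral", or None when the database has
    no record (hypothetical labels for this sketch). Assumption: a pathogenic
    record in either database takes precedence over a neutral one.
    """
    labels = {label for label in (oncokb, cosmic) if label is not None}
    if "pathogenic" in labels:
        return "oncogenic"
    if "neutral" in labels:
        return "neutral"
    return "unknown"


def is_erbb2_amplified(copy_number: float, cutoff: float = GENEPLUS_AMP_CUTOFF) -> bool:
    """Return True when the estimated copy number meets the amplification cutoff."""
    return copy_number >= cutoff
```

In the study, amplification calls were additionally confirmed manually with CNV plots, a step this sketch does not capture.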

Tumor mutation burden (TMB) and PD-L1 expression evaluation (Geneplus)

The TMB was determined as the number of somatic non-synonymous single nucleotide variants and small insertions/deletions per mega-base in the coding region (with VAF ≥ 0.03 for tumor tissues and ≥0.005 for ctDNA, respectively). Immunohistochemistry with the PD-L1 IHC 22C3 pharmDx assay (Agilent Technologies, Santa Clara, CA, USA) was performed to evaluate PD-L1 expression of tumor tissues.
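The TMB definition above amounts to counting qualifying variants and dividing by the size of the covered coding region. The following is a minimal sketch under stated assumptions: the variant record layout and the `panel_coding_mb` parameter are hypothetical, and the covered coding size of the panel is not given in the text, so it must be supplied by the caller.

```python
# VAF cutoffs by sample type, as stated in the text.
VAF_CUTOFFS = {"tissue": 0.03, "ctDNA": 0.005}

# Variant types that count toward TMB: non-synonymous SNVs and small indels.
COUNTED_TYPES = {"nonsynonymous_snv", "indel"}


def tumor_mutation_burden(variants, panel_coding_mb: float, sample_type: str) -> float:
    """Return somatic mutations per megabase of covered coding sequence.

    variants: iterable of dicts with keys "type" and "vaf" (hypothetical layout).
    panel_coding_mb: covered coding region in megabases (assumed input).
    sample_type: "tissue" or "ctDNA", which selects the VAF cutoff.
    """
    if panel_coding_mb <= 0:
        raise ValueError("panel_coding_mb must be positive")
    cutoff = VAF_CUTOFFS[sample_type]
    counted = sum(
        1 for v in variants if v["type"] in COUNTED_TYPES and v["vaf"] >= cutoff
    )
    return counted / panel_coding_mb
```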

Clonality analysis (Geneplus)

Somatic substitutions and small insertions and deletions were analyzed with PyClone, using default settings, to infer the clonal structure with a Bayesian clustering method48. Cancer cell fraction was calculated as the mean of the predicted cellular frequencies. The cluster with the highest mean VAF was identified as the clonal cluster, and mutations in this cluster were considered clonal; mutations in all other clusters were considered subclonal.
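The final step of this procedure, choosing the clonal cluster once mutations have been assigned to clusters, can be sketched as below. The sketch does not run PyClone; it assumes cluster assignments are already available as plain tuples, and the function name and input layout are illustrative.

```python
from collections import defaultdict
from statistics import mean


def split_clonal(mutations):
    """Label mutations as clonal or subclonal from cluster assignments.

    mutations: iterable of (mutation_id, cluster_id, vaf) tuples, where the
               cluster_id comes from an upstream clustering step (assumed input).
    Returns (clonal_cluster_id, clonal_mutations, subclonal_mutations).
    Raises ValueError if no mutations are given.
    """
    by_cluster = defaultdict(list)
    for mutation_id, cluster_id, vaf in mutations:
        by_cluster[cluster_id].append((mutation_id, vaf))
    if not by_cluster:
        raise ValueError("at least one mutation is required")
    # The cluster with the highest mean VAF is taken as the clonal cluster.
    clonal_cluster = max(
        by_cluster, key=lambda c: mean(vaf for _, vaf in by_cluster[c])
    )
    clonal = sorted(m for m, _ in by_cluster[clonal_cluster])
    subclonal = sorted(
        m
        for cluster_id, members in by_cluster.items()
        if cluster_id != clonal_cluster
        for m, _ in members
    )
    return clonal_cluster, clonal, subclonal
```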

ctDNA sequencing and analysis (Guardant360)

ctDNA was evaluated using the commercially available Guardant360™ assay (Guardant Health, Inc., Redwood City, CA), which covers up to 83 cancer-related genes, as previously described49. The Guardant360 assay is a comprehensive genomic profiling assay that identifies single-nucleotide variants (SNVs), insertions and deletions (indels), fusions, and amplifications50. The assay covers complete exon sequencing of multiple genes, including EGFR, ERBB2, and KRAS. During the collection period, the assay included 74 to 83 genes. The NGS testing was performed as part of standard clinical care in a CLIA-certified and College of American Pathologists-accredited laboratory51. Blood was collected in two 10 mL Streck tubes, and processed plasma was evaluated for SNVs, indels, gene fusions/rearrangements, and copy number variants (CNVs)51. Mutations were annotated using OncoKB to define pathogenic variants52. Synonymous mutations and variants of unknown significance were not considered to be clinically relevant but were included as indicators of tumor shed in the plasma. ERBB2 amplifications of ≥2.2 copies were included in the Guardant360 cohort.

Clonality analysis (Guardant360)

Variant clonality was determined by normalizing VAF to the maximum somatic VAF in a sample. Variants were classified as clonal if the normalized value was ≥0.5.
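The normalization described above can be illustrated with a short sketch. The function names and the dictionary input are illustrative assumptions; only the rule itself (divide each VAF by the maximum somatic VAF in the sample, call clonal at ≥0.5) comes from the text.

```python
def normalized_clonality(vafs):
    """Normalize each variant's VAF to the maximum somatic VAF in the sample.

    vafs: dict mapping a variant label to its VAF for one sample
          (hypothetical input layout).
    """
    if not vafs:
        return {}
    max_vaf = max(vafs.values())
    if max_vaf <= 0:
        raise ValueError("maximum VAF must be positive")
    return {variant: vaf / max_vaf for variant, vaf in vafs.items()}


def clonal_variants(vafs, cutoff=0.5):
    """Return the variants whose normalized clonality is at or above the cutoff."""
    return sorted(
        variant
        for variant, clonality in normalized_clonality(vafs).items()
        if clonality >= cutoff
    )
```

By construction, the variant with the highest VAF in a sample always has a normalized clonality of 1 and is therefore always classified as clonal.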

Statistical analysis

Statistical analyses were performed using R statistical software (version 4.1.3 for Windows). Categorical variables were compared with Pearson’s χ2 test or Fisher’s exact test. The Mann-Whitney U test and Student’s t-test were used for non-normally and normally distributed continuous variables, respectively. The Kruskal-Wallis test was used to compare non-normally distributed continuous variables among three or more independently sampled groups. All statistical tests were two-sided, and P < 0.05 was considered to indicate statistical significance.
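As an illustration of the categorical comparison, the sketch below computes Pearson's χ2 statistic and its two-sided p-value for a 2×2 table using only the Python standard library. It is not the study's R code: the function name is hypothetical, no continuity correction is applied (R's `chisq.test` applies Yates' correction to 2×2 tables by default, so results may differ), and the counts used in any call would need to come from the underlying tables, which are not reproduced here.

```python
import math


def pearson_chi2_2x2(a: int, b: int, c: int, d: int):
    """Pearson chi-square test for the 2x2 table [[a, b], [c, d]].

    Returns (chi2, p_value) with one degree of freedom and no continuity
    correction. For one degree of freedom, the upper tail probability of
    the chi-square distribution equals erfc(sqrt(chi2 / 2)).
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column total must be non-zero")
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value
```

For tables with small expected counts, Fisher's exact test, as named in the text, would be the appropriate alternative.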