Abstract
Lung cancer in never smokers (LCINS) accounts for around 25% of all lung cancers¹,² and has been associated with exposure to second-hand tobacco smoke and air pollution in observational studies³,⁴,⁵. Here we use data from the Sherlock-Lung study to evaluate mutagenic exposures in LCINS by examining the cancer genomes of 871 treatment-naive individuals with lung cancer who had never smoked, from 28 geographical locations. KRAS mutations were 3.8 times more common in adenocarcinomas of never smokers from North America and Europe than in those from East Asia, whereas a higher prevalence of EGFR and TP53 mutations was observed in adenocarcinomas of never smokers from East Asia. Signature SBS40a, of unknown cause⁶, contributed the largest proportion of single base substitutions in adenocarcinomas, and was enriched in cases with EGFR mutations. Signature SBS22a, which is associated with exposure to aristolochic acid⁷,⁸, was observed almost exclusively in patients from Taiwan. Exposure to second-hand smoke was not associated with individual driver mutations or mutational signatures. By contrast, patients from regions with high levels of air pollution were more likely to have TP53 mutations and shorter telomeres. They also exhibited an increase in most types of mutations, including a 3.9-fold increase in signature SBS4, which has previously been linked with tobacco smoking⁹, and a 76% increase in the clock-like¹⁰ signature SBS5. A positive dose–response effect was observed with air-pollution levels, correlating with both a decrease in telomere length and an increase in somatic mutations, mainly attributed to signatures SBS4 and SBS5. Our results elucidate the diversity of mutational processes shaping the genomic landscape of lung cancer in never smokers.
Data availability
Paired normal and tumour CRAM files for the WGS data for the individuals in the Sherlock-Lung study and the EAGLE study have been deposited in dbGaP under the accession numbers phs001697.v2.p1 and phs002992.v1.p1, respectively. Detailed access information for the publicly available datasets is available in Supplementary Table 13. Data from rnaturalearthdata v.1.0.0 (https://github.com/ropensci/rnaturalearthdata) were used to generate maps. Data on passive smoking and PM2.5 estimates of outdoor air pollution are available in Supplementary Table 14. Human reference genome GRCh38 was downloaded from the GATK resources at https://github.com/broadinstitute/gatk/blob/master/src/test/resources/large/Homo_sapiens_assembly38.fasta.gz.
Code availability
The WGS bioinformatics pipelines are available at https://github.com/xtmgah/Sherlock-Lung. The Battenberg SCNA calling algorithm is available at https://github.com/Wedge-lab/battenberg.
References
Sun, S., Schiller, J. H. & Gazdar, A. F. Lung cancer in never smokers—a different disease. Nat. Rev. Cancer 7, 778–790 (2007).
Bray, F. et al. Global cancer statistics 2022: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J. Clin. 74, 229–263 (2024).
World Health Organization & International Agency for Research on Cancer. Tobacco Smoke and Involuntary Smoking: IARC Monographs on the Evaluation of Carcinogenic Risks to Humans Vol. 83 (WHO & IARC, 2004).
Turner, M. C. et al. Outdoor air pollution and cancer: an overview of the current evidence and public health recommendations. CA Cancer J. Clin. 70, 460–479 (2020).
Ciabattini, M., Rizzello, E., Lucaroni, F., Palombi, L. & Boffetta, P. Systematic review and meta-analysis of recent high-quality studies on exposure to particulate matter and risk of lung cancer. Environ. Res. 196, 110440 (2021).
Senkin, S. et al. Geographic variation of mutagenic exposures in kidney cancer genomes. Nature 629, 910–918 (2024).
Poon, S. L. et al. Genome-wide mutational signatures of aristolochic acid and its application as a screening tool. Sci. Transl. Med. 5, 197ra101 (2013).
Hoang, M. L. et al. Mutational signature of aristolochic acid exposure as revealed by whole-exome sequencing. Sci. Transl. Med. 5, 197ra102 (2013).
Alexandrov, L. B. et al. Signatures of mutational processes in human cancer. Nature 500, 415–421 (2013).
Alexandrov, L. B. et al. Clock-like mutational processes in human somatic cells. Nat. Genet. 47, 1402–1407 (2015).
Proctor, R. N. Tobacco and the global lung cancer epidemic. Nat. Rev. Cancer 1, 82–86 (2001).
Siegel, D. A., Fedewa, S. A., Henley, S. J., Pollack, L. A. & Jemal, A. Proportion of never smokers among men and women with lung cancer in 7 US states. JAMA Oncol. 7, 302–304 (2021).
Lui, N. S. et al. Sub-solid lung adenocarcinoma in Asian versus Caucasian patients: different biology but similar outcomes. J. Thorac. Dis. 12, 2161–2171 (2020).
Gaughan, E. M., Cryer, S. K., Yeap, B. Y., Jackman, D. M. & Costa, D. B. Family history of lung cancer in never smokers with non-small-cell lung cancer and its association with tumors harboring EGFR mutations. Lung Cancer 79, 193–197 (2013).
Toh, C. K. et al. Never-smokers with lung cancer: epidemiologic evidence of a distinct disease entity. J. Clin. Oncol. 24, 2245–2251 (2006).
Yano, T. et al. Never-smoking nonsmall cell lung cancer as a separate entity: clinicopathologic features and survival. Cancer 113, 1012–1018 (2008).
Brennan, P. et al. High cumulative risk of lung cancer death among smokers and nonsmokers in Central and Eastern Europe. Am. J. Epidemiol. 164, 1233–1241 (2006).
Wang, P., Sun, S., Lam, S. & Lockwood, W. W. New insights into the biology and development of lung cancer in never smokers—implications for early detection and treatment. J. Transl. Med. 21, 585 (2023).
Koh, G., Degasperi, A., Zou, X., Momen, S. & Nik-Zainal, S. Mutational signatures: emerging concepts, caveats and clinical applications. Nat. Rev. Cancer 21, 619–637 (2021).
Alexandrov, L. B. et al. The repertoire of mutational signatures in human cancer. Nature 578, 94–101 (2020).
The ICGC/TCGA Pan-Cancer Analysis of Whole Genomes Consortium. Pan-cancer analysis of whole genomes. Nature 578, 82–93 (2020).
Wang, X. et al. Association between smoking history and tumor mutation burden in advanced non-small cell lung cancer. Cancer Res. 81, 2566–2573 (2021).
Lee, J. J. et al. Tracing oncogene rearrangements in the mutational history of lung adenocarcinoma. Cell 177, 1842–1857 (2019).
Zhang, T. et al. Genomic and evolutionary classification of lung cancer in never smokers. Nat. Genet. 53, 1348–1359 (2021).
Landi, M. T. et al. Tracing lung cancer risk factors through mutational signatures in never-smokers: the Sherlock-Lung study. Am. J. Epidemiol. 190, 962–976 (2021).
Auton, A. et al. A global reference for human genetic variation. Nature 526, 68–74 (2015).
Islam, S. M. A. et al. Uncovering novel mutational signatures by de novo extraction with SigProfilerExtractor. Cell Genom. 2, 100179 (2022).
Sondka, Z. et al. COSMIC: a curated database of somatic variants and clinical data for cancer. Nucleic Acids Res. 52, D1210–D1217 (2024).
Zou, X. et al. A systematic CRISPR screen defines mutational mechanisms underpinning signatures caused by replication errors and endogenous DNA damage. Nat. Cancer 2, 643–657 (2021).
Steele, C. D. et al. Signatures of copy number alterations in human cancer. Nature 606, 984–991 (2022).
Everall, A. et al. Comprehensive repertoire of the chromosomal alteration and mutational signatures across 16 cancer types from 10,983 cancer patients. Preprint at medRxiv https://doi.org/10.1101/2023.06.07.23290970 (2023).
Nik-Zainal, S. et al. Landscape of somatic mutations in 560 breast cancer whole-genome sequences. Nature 534, 47–54 (2016).
Degasperi, A. et al. A practical framework and online tool for mutational signature analyses show inter-tissue variation and driver dependencies. Nat. Cancer 1, 249–263 (2020).
Huang, K. L. et al. Pathogenic germline variants in 10,389 adult cancers. Cell 173, 355–370 (2018).
Nguyen, L., Martens, J. W. M., Van Hoeck, A. & Cuppen, E. Pan-cancer landscape of homologous recombination deficiency. Nat. Commun. 11, 5584 (2020).
Davies, H. et al. HRDetect is a predictor of BRCA1 and BRCA2 deficiency based on mutational signatures. Nat. Med. 23, 517–525 (2017).
Zhang, T. et al. Deciphering lung adenocarcinoma evolution and the role of LINE-1 retrotransposition. Preprint at bioRxiv https://doi.org/10.1101/2025.03.14.643063 (2025).
Letouze, E. et al. Mutational signatures reveal the dynamic interplay of risk factors and cellular processes during liver tumorigenesis. Nat. Commun. 8, 1315 (2017).
Fujimoto, A. et al. Whole-genome mutational landscape and characterization of noncoding and structural mutations in liver cancer. Nat. Genet. 48, 500–509 (2016).
Swanton, C., McGranahan, N., Starrett, G. J. & Harris, R. S. APOBEC enzymes: mutagenic fuel for cancer evolution and heterogeneity. Cancer Discov. 5, 704–712 (2015).
Chen, Y.-J. et al. Proteogenomics of non-smoking lung cancer in East Asia delineates molecular signatures of pathogenesis and progression. Cell 182, 226–244 (2020).
Zhang, T. et al. APOBEC affects tumor evolution and age at onset of lung cancer in smokers. Nat. Commun. 16, 4711 (2025).
Morton, L. M. et al. Radiation-related genomic profile of papillary thyroid carcinoma after the Chernobyl accident. Science 372, eabg2538 (2021).
Lawson, A. R. J. et al. Extensive heterogeneity in somatic mutation and selection in the human bladder. Science 370, 75–82 (2020).
Degasperi, A. et al. Substitution mutational signatures in whole-genome-sequenced cancers in the UK population. Science 376, abl9283 (2022).
Otlu, B. et al. Topography of mutational signatures in human cancer. Cell Rep. 42, 112930 (2023).
Jamal-Hanjani, M. et al. Tracking the evolution of non-small-cell lung cancer. N. Engl. J. Med. 376, 2109–2121 (2017).
Zhang, T. et al. Distinct genomic landscape of lung adenocarcinoma from household use of smoky coal. Am. J. Respir. Crit. Care Med. 208, 733–736 (2023).
Hill, W. et al. Lung adenocarcinoma promotion by air pollutants. Nature 616, 159–167 (2023).
van Donkelaar, A. et al. Monthly global estimates of fine particulate matter and their uncertainty. Environ. Sci. Technol. 55, 15287–15300 (2021).
Mochizuki, A. et al. Passive smoking-induced mutagenesis as a promoter of lung carcinogenesis. J. Thorac. Oncol. 19, 984–994 (2024).
Yu, X. J. et al. Characterization of somatic mutations in air pollution-related lung cancer. EBioMedicine 2, 583–590 (2015).
Chan, W.-H. et al. Verifying the accuracy of self-reported smoking behavior in female volunteer soldiers. Sci. Rep. 13, 3438 (2023).
Landi, M. T. et al. Environment And Genetics in Lung cancer Etiology (EAGLE) study: an integrative population-based case–control study of lung cancer. BMC Public Health 8, 203 (2008).
Bergmann, E. A., Chen, B. J., Arora, K., Vacic, V. & Zody, M. C. Conpair: concordance and contamination estimator for matched tumor–normal pairs. Bioinformatics 32, 3196–3198 (2016).
Pedersen, B. S. et al. Somalier: rapid relatedness estimation for cancer and germline studies using efficient genome sketches. Genome Med. 12, 62 (2020).
Nik-Zainal, S. et al. The life history of 21 breast cancers. Cell 149, 994–1007 (2012).
Boot, A. et al. In-depth characterization of the cisplatin mutational signature in human cell lines and in esophageal and liver tumors. Genome Res. 28, 654–665 (2018).
Dentro, S. C. et al. Characterizing genetic intra-tumor heterogeneity across 2,658 human cancer genomes. Cell 184, 2239–2254 (2021).
Imielinski, M. et al. Mapping the hallmarks of lung adenocarcinoma with massively parallel sequencing. Cell 150, 1107–1120 (2012).
Lee, J. K. et al. Clonal history and genetic predictors of transformation into small-cell carcinomas from lung adenocarcinomas. J. Clin. Oncol. 35, 3065–3074 (2017).
The Cancer Genome Atlas Research Network. Comprehensive molecular profiling of lung adenocarcinoma. Nature 511, 543–550 (2014).
Carrot-Zhang, J. et al. Whole-genome characterization of lung adenocarcinomas lacking the RTK/RAS/RAF pathway. Cell Rep. 34, 108707 (2021).
Pedersen, B. S. & Quinlan, A. R. Mosdepth: quick coverage calculation for genomes and exomes. Bioinformatics 34, 867–868 (2018).
Sadedin, S. P. & Oshlack, A. Bazam: a rapid method for read extraction and realignment of high-throughput sequencing data. Genome Biol. 20, 78 (2019).
Cibulskis, K. et al. Sensitive detection of somatic point mutations in impure and heterogeneous cancer samples. Nat. Biotechnol. 31, 213–219 (2013).
Kim, S. et al. Strelka2: fast and accurate calling of germline and somatic variants. Nat. Methods 15, 591–594 (2018).
Freed, D., Pan, R. & Aldana, R. TNscope: accurate detection of somatic mutations with haplotype-based variant candidate detection and machine learning filtering. Preprint at bioRxiv https://doi.org/10.1101/250647 (2018).
Karczewski, K. J. et al. The mutational constraint spectrum quantified from variation in 141,456 humans. Nature 581, 434–443 (2020).
Ramos, A. H. et al. Oncotator: cancer variant annotation tool. Hum. Mutat. 36, E2423–E2429 (2015).
Hasan, M. S., Wu, X., Watson, L. T. & Zhang, L. UPS-indel: a universal positioning system for indels. Sci. Rep. 7, 14106 (2017).
Mayakonda, A., Lin, D. C., Assenov, Y., Plass, C. & Koeffler, H. P. Maftools: efficient and comprehensive analysis of somatic variants in cancer. Genome Res. 28, 1747–1756 (2018).
Jun, G. et al. Detecting and estimating contamination of human DNA samples in sequencing and array-based genotype data. Am. J. Hum. Genet. 91, 839–848 (2012).
Martinez-Jimenez, F. et al. A compendium of mutational cancer driver genes. Nat. Rev. Cancer 20, 555–572 (2020).
Martincorena, I. et al. Universal patterns of selection in cancer and somatic tissues. Cell 171, 1029–1041 (2017).
Sondka, Z. et al. The COSMIC Cancer Gene Census: describing genetic dysfunction across all human cancers. Nat. Rev. Cancer 18, 696–705 (2018).
Muiños, F., Martinez-Jimenez, F., Pich, O., Gonzalez-Perez, A. & Lopez-Bigas, N. In silico saturation mutagenesis of cancer genes. Nature 596, 428–432 (2021).
Chakravarty, D. et al. OncoKB: a precision oncology knowledge base. JCO Precis. Oncol. 2017, 1–16 (2017).
Bailey, M. H. et al. Comprehensive characterization of cancer driver genes and mutations. Cell 173, 371–385 (2018).
Cheng, J. et al. Accurate proteome-wide missense variant effect prediction with AlphaMissense. Science 381, eadg7492 (2023).
Yuan, K., Macintyre, G., Liu, W., PCAWG-11 working group & Markowetz, F. Ccube: a fast and robust method for estimating cancer cell fractions. Preprint at bioRxiv https://doi.org/10.1101/484402 (2018).
Mermel, C. H. et al. GISTIC2.0 facilitates sensitive and confident localization of the targets of focal somatic copy-number alteration in human cancers. Genome Biol. 12, R41 (2011).
Yang, L. et al. Diverse mechanisms of somatic structural variations in human cancer genomes. Cell 153, 919–929 (2013).
Chen, X. et al. Manta: rapid detection of structural variants and indels for germline and cancer sequencing applications. Bioinformatics 32, 1220–1222 (2016).
Yang, Y. & Yang, L. Somatic structural variation signatures in pediatric brain tumors. Cell Rep. 42, 113276 (2023).
Zhu, H. et al. Candidate cancer driver mutations in distal regulatory elements and long-range chromatin interaction networks. Mol. Cell 77, 1307–1321 (2020).
Ding, Z. et al. Estimating telomere length from whole genome sequence data. Nucleic Acids Res. 42, e75 (2014).
Alexandrov, L. B., Nik-Zainal, S., Wedge, D. C., Campbell, P. J. & Stratton, M. R. Deciphering signatures of mutational processes operative in human cancer. Cell Rep. 3, 246–259 (2013).
Bergstrom, E. N. et al. SigProfilerMatrixGenerator: a tool for visualizing and exploring patterns of small mutational events. BMC Genomics 20, 685 (2019).
Díaz-Gay, M. et al. Assigning mutational signatures to individual samples and individual somatic mutations with SigProfilerAssignment. Bioinformatics 39, btad756 (2023).
Otlu, B. & Alexandrov, L. B. Evaluating topography of mutational signatures with SigProfilerTopography. Genome Biol. 26, 134 (2025).
Benjamini, Y. & Hochberg, Y. Controlling the false discovery rate—a practical and powerful approach to multiple testing. J. R. Stat. Soc. B 57, 289–300 (1995).
Acknowledgements
This work was supported by the Intramural Research Program of the National Cancer Institute, US NIH (project ZIACP101231 to M.T.L.); by the NIH grants R01ES032547-01, R01CA269919-01 and 1U01CA290479-01 to L.B.A.; and by a Packard Fellowship for Science and Engineering to L.B.A. The research performed in the L.B.A. laboratory was also supported by the Sanford Stem Cell Institute at the University of California San Diego. M.D.-G. and P.G.-G. were awarded fellowships within the Generación D initiative, Red.es, Ministerio para la Transformación Digital y de la Función Pública, for talent attraction (C005/24-ED CV1), funded by the European Union NextGenerationEU funds, through the Plan de Recuperación, Transformación y Resiliencia (PRTR). The funders had no roles in study design, data collection and analysis, decision to publish or preparation of the manuscript. The computational analyses reported in this manuscript used the Triton Shared Computing Cluster at the San Diego Supercomputer Center of the University of California San Diego. We thank the study participants; P. Kraft for reading and commenting on the manuscript; and the staff at Westat for their assistance with collecting samples and corresponding clinical data. This work used the computational resources of the NIH HPC Biowulf cluster (http://hpc.nih.gov).
Author information
Contributions
Conceptualization: M.D.-G., T.Z., L.B.A. and M.T.L. Methodology: M.D.-G., T.Z., L.Y., J. Shi, D.C.W., B.Z., L.B.A. and M.T.L. Formal analysis: M.D.-G., T.Z., P.H.H., A.K., W.Z., C.D.S., B.O., S.P.N., R.V., E.N.B., M. Kazachkova, J. Sang, J.P.M., C.H., O.W.L., K.M.J., P.G.-G., Y.Y., X.Z., L.Y., M.A.N., J. Shi, B.Z. and J.C. Pathology work: C.L., M.K.B., W.D.T., L.M.S., P.J., R.H. and S.-R.Y. Resources: O.P., C.S., C.A.H., I.-S.C., M.P.W., K.C.L., E.S.E., J.M.S., M.B.S., S.S.Y., M. Manczuk, J.L., B.S., A.M., O.S., D.Z., I.H., D.M., S.M., M. Kontic, Y.B., B.E.G.R., D.C.C., V.G., P.B., G.L., P.H., N.R., A.C.P., D.C., Q.L., S.J.C. and M.T.L. Data curation: P.H.H., T.Z., F.J.C.-M., M. Miraftab, M.S. and O.W.L. Writing (original draft): M.D.-G., T.Z., L.B.A. and M.T.L. Writing (review and editing): M.D.-G., T.Z., C.S., L.Y., M.A.N., D.C.W., B.Z., S.J.C., J.C., L.B.A. and M.T.L. Visualization: M.D.-G., T.Z., L.B.A. and M.T.L. Supervision: L.B.A. and M.T.L.
Ethics declarations
Competing interests
L.B.A. is a co-founder, CSO, scientific advisory member and consultant for io9, holds equity in the company and receives income from it. The terms of this arrangement have been reviewed and approved by the University of California San Diego in accordance with its conflict-of-interest policies. L.B.A. is also a compensated member of the scientific advisory board of Inocras. L.B.A.’s spouse is an employee of Biotheranostics. E.N.B. and L.B.A. declare a US provisional patent application filed with the University of California San Diego with serial number 63/269,033. L.B.A. also declares US provisional applications filed with the University of California San Diego with serial numbers 63/366,392, 63/289,601, 63/483,237, 63/412,835 and 63/492,348. L.B.A. is also an inventor of US patent 10,776,718 for source identification by non-negative matrix factorization. L.B.A. and M.D.-G. further declare a European patent application with application number EP25305077.7. S.-R.Y. has received consulting fees from AstraZeneca, Sanofi, Amgen and AbbVie, and speaking fees from AstraZeneca, Medscape, PRIME Education and Medical Learning Institute. The remaining authors declare no competing interests.
Peer review
Peer review information
Nature thanks the anonymous reviewers for their contribution to the peer review of this work.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data figures and tables
Extended Data Fig. 1 Association of mutational signature prevalence and driver mutations with geographical region, biological sex and EGFR mutation status in LCINS adenocarcinoma cases.
a, DBS, ID, CN and SV mutational signatures enrichment analysis with geographical regions. Horizontal lines mark statistical significance thresholds at FDR values of 0.05 (dashed orange line) and 0.01 (dashed red line). Blue-coloured signatures were enriched in North American and European patients, whereas red-coloured signatures were enriched in East Asian patients. Statistical significance was evaluated using multivariable logistic regression models for geographical regions and adjusted by age, sex, and tumour purity. b, SBS, DBS, ID, CN, and SV mutational signatures enrichment analysis with biological sexes. Blue-coloured signatures were enriched in males, whereas red-coloured signatures were enriched in females. Statistical significance was evaluated using multivariable logistic regression models for biological sex and adjusted by age, genetic ancestry, and tumour purity. c–e, Detail of the enrichment of EGFR (c), TP53 (d) and KRAS (e) driver mutations in North American and European versus East Asian LCINS adenocarcinoma cases. f, Driver mutations enrichment analysis with biological sexes. Blue-coloured genes were enriched in males, whereas red-coloured genes were enriched in females. Statistical significance was evaluated using multivariable logistic regression models for biological sex and adjusted by age, genetic ancestry, and tumour purity. g, Quantification of the tumour mutational burden for TP53 wild-type and mutant tumours across EGFR mutation status (n = 271 TP53 wild-type EGFR wild-type, n = 241 TP53 wild-type EGFR mutant, n = 81 TP53 mutant EGFR wild-type, n = 144 TP53 mutant EGFR mutant). Statistical significance was evaluated using a multivariable linear regression model for EGFR mutation status and adjusted by age, sex, ancestry, and tumour purity. The line within the box indicates the median, the lower and upper ends indicate the 25th and 75th percentiles, whiskers show 1.5 × interquartile range, and values outside are shown as individual data points. h, SBS, DBS, ID, CN and SV mutational signatures enrichment analysis with EGFR mutation status. Blue-coloured signatures were enriched in EGFR mutant tumours, whereas red-coloured signatures were enriched in EGFR wild-type tumours. Statistical significance was evaluated using multivariable logistic regression models for EGFR mutation status and adjusted by age, sex, genetic ancestry, and tumour purity.
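The FDR thresholds drawn in these volcano plots can be illustrated with a minimal sketch of the Benjamini-Hochberg adjustment (the procedure cited in the reference list). The signature names and p-values below are hypothetical, and treating the 0.05 and 0.01 cut-offs as strict inequalities is an assumption; this is not the authors' pipeline.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values (FDR values), in input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def significance_level(fdr_value):
    """Map an FDR value to the two threshold lines (strict cut-offs assumed)."""
    if fdr_value < 0.01:
        return "FDR < 0.01"
    if fdr_value < 0.05:
        return "FDR < 0.05"
    return "not significant"


# Hypothetical p-values for four signatures (illustrative only).
p_values = {"SigA": 0.001, "SigB": 0.02, "SigC": 0.03, "SigD": 0.5}
fdr = dict(zip(p_values, benjamini_hochberg(list(p_values.values()))))
for name, value in fdr.items():
    print(name, round(value, 4), significance_level(value))
```

Adjusting across the total number of signatures tested, as the captions describe, means a signature can fall below the line even when its raw p-value is under 0.05.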
Extended Data Fig. 2 Genomic differences between lung cancers from smokers and lung cancers from never-smokers, and landscape of 56 LCINS tumours that exhibit SBS4 activity.
a, Differences between smoker and never-smoker lung cancer cases across SBS signatures. Volcano plot (top) indicating the enrichment of SBS signature prevalence in never-smokers (left) and smokers (right) with lung cancer. Statistically significant enrichments were evaluated using multivariable logistic regression models for smoking status and adjusted by age, sex, histology, genetic ancestry, and tumour purity. Firth’s bias-reduced logistic regressions were used for regressions presenting complete or quasi-complete separation. P-values were adjusted for multiple comparisons based on the total number of mutational signatures considered, and adjusted p-values were reported as FDR values. Horizontal lines mark statistical significance thresholds at FDR values of 0.05 (dashed orange line) and 0.01 (dashed red line). Bar plot (bottom) indicating prevalence by smoking history. b, Tumour mutational burden differences between SBS4-positive (n = 56) and negative (n = 815) LCINS tumours for SBS, DBS, ID, CN segments, and SV events. Statistical significance was evaluated using two-sided Wilcoxon rank sum tests. The line within the box indicates the median, the lower and upper ends indicate the 25th and 75th percentiles, whiskers show 1.5 × interquartile range, and values outside are shown as individual data points. c–e, Mutational signature landscape for SBS (c), DBS (d) and ID (e) mutation types, including absolute and relative number of mutations assigned to each mutational signature, unsupervised clustering based on the signature contributions, and sample-level annotations of sex, genetic ancestry, passive smoking, and accuracy of signature reconstruction based on cosine similarity. f, Driver mutations landscape, including different types of genomic alterations, as well as sample-level annotations of sex, genetic ancestry, histology, and tumour purity. g, Enrichment of EGFR p.L858R hotspot driver mutations in SBS4-positive tumours from never-smokers compared to smokers using multivariable logistic regressions considering clinical and epidemiological covariates, including age, sex, genetic ancestry, histology, and tumour purity (n = 5 mutated non-smoker cases, n = 1 mutated smoker case, n = 51 non-smoker wild-type cases, n = 301 smoker wild-type tumours). Error bars indicate 95% CIs.
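The "accuracy of signature reconstruction based on cosine similarity" annotation can be illustrated as follows. This is a minimal sketch, assuming the reconstructed profile is the sum of signature profiles weighted by the mutations assigned to each; the three-channel signatures and counts are hypothetical toy values (real SBS profiles use 96 channels), and the function names are not taken from any published tool.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two mutation-count vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # Undefined for an all-zero vector; report 0 by convention.
    return dot / (norm_a * norm_b)


def reconstruct(signatures, activities):
    """Sum each signature profile weighted by its assigned mutation count."""
    n_channels = len(next(iter(signatures.values())))
    profile = [0.0] * n_channels
    for name, count in activities.items():
        for k, probability in enumerate(signatures[name]):
            profile[k] += count * probability
    return profile


# Hypothetical toy signatures over three mutation channels.
signatures = {"SigA": [0.5, 0.5, 0.0], "SigB": [0.0, 0.2, 0.8]}
activities = {"SigA": 100, "SigB": 50}   # Mutations assigned per signature.
observed = [52, 58, 40]                  # Observed mutation counts per channel.

reconstructed = reconstruct(signatures, activities)
accuracy = cosine_similarity(observed, reconstructed)
print(reconstructed, round(accuracy, 4))
```

A value close to 1 indicates that the assigned signatures explain the observed mutational profile well.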
Extended Data Fig. 3 Topographical characteristics of 56 LCINS and 68 lung cancers from smokers exhibiting SBS4 activity.
a,b, Distribution of SBS4 mutations with replication timing in our cohort of never-smokers (a) and in the smokers from the PCAWG cohort (b). Data are separated into deciles, with each segment harbouring 10% of the observed replication time signal in the x-axis, and the normalized mutational density displayed in the y-axis. Black dashed lines represent the behaviour of simulated mutations. c,d, Association of SBS4 mutations with nucleosome occupancy in never-smokers (c) and smokers (d). The solid blue line represents real somatic mutations, whereas the dashed grey line indicates the distribution of simulated mutations. Both lines show the average nucleosome signal in the y-axis, using a genomic window of 2 kilobases centred around the SBS4-associated mutations in the x-axis. e,f, Strand asymmetry of SBS4-associated mutations in comparison to simulations and considering lagging and leading DNA strands, transcribed and untranscribed DNA regions and genic and intergenic genomic locations in never-smokers (e) and smokers (f). The number of circles represents the odds ratio and the colour the corresponding strand/region of statistically significant asymmetries.
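The decile analysis in panels a,b can be sketched as below. This is an illustrative simplification, not the SigProfilerTopography implementation: it assumes the genome has already been divided into windows with a replication-timing signal and a mutation count each, splits the windows into ten equal-sized groups ranked by signal, and normalises by the peak decile. All three choices are assumptions.

```python
def decile_mutation_density(bins):
    """Normalised mutation density per replication-timing decile.

    bins: list of (replication_signal, mutation_count) tuples, one per
    genomic window. Returns ten values ordered from the highest-signal
    decile to the lowest, scaled so the largest equals 1.
    """
    if len(bins) < 10:
        raise ValueError("need at least 10 bins to form deciles")
    # Rank windows from highest to lowest replication-timing signal.
    ranked = sorted(bins, key=lambda b: b[0], reverse=True)
    n = len(ranked)
    densities = []
    for d in range(10):
        chunk = ranked[d * n // 10:(d + 1) * n // 10]
        # Mean mutation count per window within this decile.
        densities.append(sum(count for _, count in chunk) / len(chunk))
    peak = max(densities)
    return [x / peak if peak > 0 else 0.0 for x in densities]


# Hypothetical windows in which lower-signal regions carry more mutations.
toy_bins = [(signal, 20 - signal) for signal in range(20)]
print([round(x, 3) for x in decile_mutation_density(toy_bins)])
```

Running the same function on simulated mutations gives the baseline that the dashed lines in the figure represent.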
Extended Data Fig. 4 Influence of passive smoking on the landscape of ID, DBS, CN and SV signatures in LCINS.
a–e, Differences in DBS, ID, CN, and SV burden using univariate comparisons based on two-sided Wilcoxon rank sum tests (a) as well as multivariable linear regressions considering clinical and epidemiological covariates (b–e), including age, sex, genetic ancestry, and tumour purity (n = 250 passive smokers, n = 208 non-passive smokers). The line within the box indicates the median, the lower and upper ends indicate the 25th and 75th percentiles, whiskers show 1.5 × interquartile range, and values outside are shown as individual data points (a). Error bars indicate 95% CIs (b–e). f, Enrichment of mutational signatures derived from DBS, ID, CN, and SV alterations. Horizontal lines mark statistical significance thresholds at FDR values of 0.05 (dashed orange line) and 0.01 (dashed red line). Statistical significance was evaluated using multivariable logistic regression models for passive smoking history and adjusted by age, sex, genetic ancestry, histology and tumour purity.
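The box-plot convention used throughout these captions (median, 25th and 75th percentiles, whiskers at 1.5 × interquartile range, outliers drawn individually) can be computed with the standard library. This is a sketch under two assumptions: quartiles use the "inclusive" interpolation method, and whiskers stop at the most extreme data point inside the 1.5 × IQR fences, as in common plotting libraries. The burden values are hypothetical.

```python
import statistics


def box_plot_summary(values):
    """Return the statistics drawn in a box plot with 1.5 x IQR whiskers."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    # Whiskers extend to the most extreme observations inside the fences.
    inside = [v for v in values if lower_fence <= v <= upper_fence]
    return {
        "median": median,
        "q1": q1,
        "q3": q3,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        # Points beyond the fences are plotted individually.
        "outliers": sorted(v for v in values if v < lower_fence or v > upper_fence),
    }


# Hypothetical mutation burdens with one extreme sample.
burdens = [1, 2, 3, 4, 5, 6, 7, 8, 9, 50]
print(box_plot_summary(burdens))
```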
Extended Data Fig. 5 Effects of PM2.5 exposure on large genomic alterations in LCINS.
a,b, Differences in the number of CN segments and SV events using univariate comparisons based on two-sided Wilcoxon rank sum tests (a) as well as multivariable linear regressions, considering clinical and epidemiological covariates (b), including age, sex, genetic ancestry, histology, and tumour purity, for patients diagnosed in geographical regions with high and low PM2.5 exposure levels (threshold defined at 20 μg m⁻³; n = 440 high-pollution group, n = 413 low-pollution group; only samples for which the country of origin was known are included). The line within the box indicates the median, the lower and upper ends indicate the 25th and 75th percentiles, whiskers show 1.5 × interquartile range, and values outside are shown as individual data points (a). Error bars indicate 95% CIs (b). c, Volcano plots indicating enrichment of mutational signatures derived from CN and SV alterations. Horizontal lines mark statistical significance thresholds at FDR values of 0.05 (dashed orange line) and 0.01 (dashed red line). Statistical significance was evaluated using multivariable logistic regression models for PM2.5 exposure levels and adjusted by age, sex, genetic ancestry, histology, and tumour purity.
Extended Data Fig. 6 Assignment of mutational signatures and estimation of telomere length using data from control and PM2.5-exposed mice.
a,b, Box plots comparing the mutations assigned to SBS5 (a) and the estimations for the telomere length ratio between the tumour and normal samples (b). Two-sided Student’s t-tests were used to calculate statistical significance. The line within the box indicates the median, the lower and upper ends indicate the 25th and 75th percentiles, and whiskers show 1.5 × interquartile range. n = 5 (control mice) and n = 5 (PM2.5-exposed mice); data from a previous study⁴⁹.
Extended Data Fig. 7 Mutagenic effects of PM2.5 exposure in LCINS cases excluding SBS4 contributions.
a, Quantification of SBS burden excluding SBS4 mutations for patients living in geographical regions with high and low PM2.5 exposure levels (threshold defined at 20 μg m⁻³; n = 440 high-pollution group, n = 413 low-pollution group; only samples for which the country of origin was known are included). Statistical significance was evaluated using two-sided Wilcoxon rank sum tests. The line within the box indicates the median, the lower and upper ends indicate the 25th and 75th percentiles, whiskers show 1.5 × interquartile range, and values outside are shown as individual data points. b, Forest plot corresponding to a multivariable linear regression considering high or low PM2.5 exposure group, age, sex, genetic ancestry, histology and tumour sample purity as covariates and SBS burden as the dependent variable (threshold defined at 20 μg m⁻³; n = 440 high-pollution group, n = 413 low-pollution group; only samples for which the country of origin was known are included). Error bars indicate 95% CIs. c, Scatter plot showing a significant correlation between individual sample estimates of PM2.5 exposure and SBS burden. Statistical significance was evaluated using a multivariable linear regression of the individual PM2.5 estimates per sample and mutation burden (log10 scale), and adjusted by age, sex, genetic ancestry, histology and tumour purity. Blue lines and bands indicate univariate linear regressions and 95% CIs for average mutation burden versus average PM2.5 estimates.
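The two ways PM2.5 enters these analyses, as a high/low group split at 20 μg m⁻³ and as a continuous predictor of log10 mutation burden, can be sketched as follows. This is a simplified, unadjusted illustration: the published models also include age, sex, genetic ancestry, histology and tumour purity, which are omitted here. Assigning a value of exactly 20 to the high group is an assumption, and the data are hypothetical.

```python
import math


def exposure_group(pm25, threshold=20.0):
    """Assign a sample to the high- or low-pollution group (>= is assumed)."""
    return "high" if pm25 >= threshold else "low"


def fit_log10_burden(pm25_values, burdens):
    """Ordinary least squares of log10(burden) on PM2.5.

    Returns (slope, intercept); 10 ** slope is the fold change in burden
    per 1 unit increase in PM2.5.
    """
    y = [math.log10(b) for b in burdens]
    n = len(pm25_values)
    mean_x = sum(pm25_values) / n
    mean_y = sum(y) / n
    sxx = sum((x - mean_x) ** 2 for x in pm25_values)
    sxy = sum((x - mean_x) * (yi - mean_y) for x, yi in zip(pm25_values, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical samples whose burden rises with exposure.
pm25_values = [5.0, 10.0, 20.0, 40.0]
burdens = [10 ** (3 + 0.01 * x) for x in pm25_values]

slope, intercept = fit_log10_burden(pm25_values, burdens)
fold_per_10 = 10 ** (slope * 10)  # Fold change per 10 units of PM2.5.
print([exposure_group(x) for x in pm25_values], round(slope, 4), round(fold_per_10, 3))
```

Working on the log10 scale means the slope translates into a multiplicative (fold) change in burden, which matches how the increases are reported in the abstract.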
Supplementary information
Supplementary Information
Supplementary Note, Supplementary Figures and Supplementary References
Supplementary Tables
Supplementary Tables 1–14
Cite this article
Díaz-Gay, M., Zhang, T., Hoang, P.H. et al. The mutagenic forces shaping the genomes of lung cancer in never smokers. Nature 644, 133–144 (2025). https://doi.org/10.1038/s41586-025-09219-0