The mutagenic forces shaping the genomes of lung cancer in never smokers

Abstract

Lung cancer in never smokers (LCINS) accounts for around 25% of all lung cancers¹,² and has been associated with exposure to second-hand tobacco smoke and air pollution in observational studies³,⁴,⁵. Here we use data from the Sherlock-Lung study to evaluate mutagenic exposures in LCINS by examining the cancer genomes of 871 treatment-naive individuals with lung cancer who had never smoked, from 28 geographical locations. KRAS mutations were 3.8 times more common in adenocarcinomas of never smokers from North America and Europe than in those from East Asia, whereas a higher prevalence of EGFR and TP53 mutations was observed in adenocarcinomas of never smokers from East Asia. Signature SBS40a, with unknown cause⁶, contributed the largest proportion of single base substitutions in adenocarcinomas, and was enriched in cases with EGFR mutations. Signature SBS22a, which is associated with exposure to aristolochic acid⁷,⁸, was observed almost exclusively in patients from Taiwan. Exposure to second-hand smoke was not associated with individual driver mutations or mutational signatures. By contrast, patients from regions with high levels of air pollution were more likely to have TP53 mutations and shorter telomeres. They also exhibited an increase in most types of mutations, including a 3.9-fold increase in signature SBS4, which has previously been linked with tobacco smoking⁹, and a 76% increase in the clock-like¹⁰ signature SBS5. A positive dose–response effect was observed with air-pollution levels, correlating with both a decrease in telomere length and an increase in somatic mutations, mainly attributed to signatures SBS4 and SBS5. Our results elucidate the diversity of mutational processes shaping the genomic landscape of lung cancer in never smokers.

Fig. 1: Overview of the Sherlock-Lung cohort of LCINS.
Fig. 2: Repertoire of mutational signatures and driver mutations in LCINS adenocarcinomas.
Fig. 3: Influence of passive smoking in the genomic landscape of LCINS.
Fig. 4: Mutagenic effects of exposure to PM2.5 in LCINS.
Fig. 5: Associations between PM2.5 exposure and specific mutational signatures that affect LCINS tumours.

Data availability

Normal and tumour-paired CRAM files for the WGS data for the individuals in the Sherlock-Lung study and the EAGLE study have been deposited in dbGaP under the accession numbers phs001697.v2.p1 and phs002992.v1.p1, respectively. Detailed access information for the publicly available datasets is available in Supplementary Table 13. Data from the rnaturalearthdata v.1.0.0 (https://github.com/ropensci/rnaturalearthdata) were used to generate maps. Data on passive smoking and PM2.5 estimates of outdoor air pollution are available in Supplementary Table 14. Human reference genome GRCh38 was downloaded from the GATK resources at https://github.com/broadinstitute/gatk/blob/master/src/test/resources/large/Homo_sapiens_assembly38.fasta.gz.

Code availability

The WGS bioinformatics pipelines are available at https://github.com/xtmgah/Sherlock-Lung. The Battenberg SCNA calling algorithm is available at https://github.com/Wedge-lab/battenberg.

References

  1. Sun, S., Schiller, J. H. & Gazdar, A. F. Lung cancer in never smokers—a different disease. Nat. Rev. Cancer 7, 778–790 (2007).

  2. Bray, F. et al. Global cancer statistics 2022: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J. Clin. 74, 229–263 (2024).

  3. World Health Organization & International Agency for Research on Cancer. Tobacco Smoke and Involuntary Smoking: IARC Monographs on the Evaluation of Carcinogenic Risks to Humans Vol. 83 (WHO & IARC, 2004).

  4. Turner, M. C. et al. Outdoor air pollution and cancer: an overview of the current evidence and public health recommendations. CA Cancer J. Clin. 70, 460–479 (2020).

  5. Ciabattini, M., Rizzello, E., Lucaroni, F., Palombi, L. & Boffetta, P. Systematic review and meta-analysis of recent high-quality studies on exposure to particulate matter and risk of lung cancer. Environ. Res. 196, 110440 (2021).

  6. Senkin, S. et al. Geographic variation of mutagenic exposures in kidney cancer genomes. Nature 629, 910–918 (2024).

  7. Poon, S. L. et al. Genome-wide mutational signatures of aristolochic acid and its application as a screening tool. Sci. Transl. Med. 5, 197ra101 (2013).

  8. Hoang, M. L. et al. Mutational signature of aristolochic acid exposure as revealed by whole-exome sequencing. Sci. Transl. Med. 5, 197ra102 (2013).

  9. Alexandrov, L. B. et al. Signatures of mutational processes in human cancer. Nature 500, 415–421 (2013).

  10. Alexandrov, L. B. et al. Clock-like mutational processes in human somatic cells. Nat. Genet. 47, 1402–1407 (2015).

  11. Proctor, R. N. Tobacco and the global lung cancer epidemic. Nat. Rev. Cancer 1, 82–86 (2001).

  12. Siegel, D. A., Fedewa, S. A., Henley, S. J., Pollack, L. A. & Jemal, A. Proportion of never smokers among men and women with lung cancer in 7 US states. JAMA Oncol. 7, 302–304 (2021).

  13. Lui, N. S. et al. Sub-solid lung adenocarcinoma in Asian versus Caucasian patients: different biology but similar outcomes. J. Thorac. Dis. 12, 2161–2171 (2020).

  14. Gaughan, E. M., Cryer, S. K., Yeap, B. Y., Jackman, D. M. & Costa, D. B. Family history of lung cancer in never smokers with non-small-cell lung cancer and its association with tumors harboring EGFR mutations. Lung Cancer 79, 193–197 (2013).

  15. Toh, C. K. et al. Never-smokers with lung cancer: epidemiologic evidence of a distinct disease entity. J. Clin. Oncol. 24, 2245–2251 (2006).

  16. Yano, T. et al. Never-smoking nonsmall cell lung cancer as a separate entity: clinicopathologic features and survival. Cancer 113, 1012–1018 (2008).

  17. Brennan, P. et al. High cumulative risk of lung cancer death among smokers and nonsmokers in Central and Eastern Europe. Am. J. Epidemiol. 164, 1233–1241 (2006).

  18. Wang, P., Sun, S., Lam, S. & Lockwood, W. W. New insights into the biology and development of lung cancer in never smokers—implications for early detection and treatment. J. Transl. Med. 21, 585 (2023).

  19. Koh, G., Degasperi, A., Zou, X., Momen, S. & Nik-Zainal, S. Mutational signatures: emerging concepts, caveats and clinical applications. Nat. Rev. Cancer 21, 619–637 (2021).

  20. Alexandrov, L. B. et al. The repertoire of mutational signatures in human cancer. Nature 578, 94–101 (2020).

  21. The ICGC/TCGA Pan-Cancer Analysis of Whole Genomes Consortium. Pan-cancer analysis of whole genomes. Nature 578, 82–93 (2020).

  22. Wang, X. et al. Association between smoking history and tumor mutation burden in advanced non-small cell lung cancer. Cancer Res. 81, 2566–2573 (2021).

  23. Lee, J. J. et al. Tracing oncogene rearrangements in the mutational history of lung adenocarcinoma. Cell 177, 1842–1857 (2019).

  24. Zhang, T. et al. Genomic and evolutionary classification of lung cancer in never smokers. Nat. Genet. 53, 1348–1359 (2021).

  25. Landi, M. T. et al. Tracing lung cancer risk factors through mutational signatures in never-smokers: the Sherlock-Lung study. Am. J. Epidemiol. 190, 962–976 (2021).

  26. Auton, A. et al. A global reference for human genetic variation. Nature 526, 68–74 (2015).

  27. Islam, S. M. A. et al. Uncovering novel mutational signatures by de novo extraction with SigProfilerExtractor. Cell Genom. 2, 100179 (2022).

  28. Sondka, Z. et al. COSMIC: a curated database of somatic variants and clinical data for cancer. Nucleic Acids Res. 52, D1210–D1217 (2024).

  29. Zou, X. et al. A systematic CRISPR screen defines mutational mechanisms underpinning signatures caused by replication errors and endogenous DNA damage. Nat. Cancer 2, 643–657 (2021).

  30. Steele, C. D. et al. Signatures of copy number alterations in human cancer. Nature 606, 984–991 (2022).

  31. Everall, A. et al. Comprehensive repertoire of the chromosomal alteration and mutational signatures across 16 cancer types from 10,983 cancer patients. Preprint at medRxiv https://doi.org/10.1101/2023.06.07.23290970 (2023).

  32. Nik-Zainal, S. et al. Landscape of somatic mutations in 560 breast cancer whole-genome sequences. Nature 534, 47–54 (2016).

  33. Degasperi, A. et al. A practical framework and online tool for mutational signature analyses show inter-tissue variation and driver dependencies. Nat. Cancer 1, 249–263 (2020).

  34. Huang, K. L. et al. Pathogenic germline variants in 10,389 adult cancers. Cell 173, 355–370 (2018).

  35. Nguyen, L., Martens, J. W. M., Van Hoeck, A. & Cuppen, E. Pan-cancer landscape of homologous recombination deficiency. Nat. Commun. 11, 5584 (2020).

  36. Davies, H. et al. HRDetect is a predictor of BRCA1 and BRCA2 deficiency based on mutational signatures. Nat. Med. 23, 517–525 (2017).

  37. Zhang, T. et al. Deciphering lung adenocarcinoma evolution and the role of LINE-1 retrotransposition. Preprint at bioRxiv https://doi.org/10.1101/2025.03.14.643063 (2025).

  38. Letouze, E. et al. Mutational signatures reveal the dynamic interplay of risk factors and cellular processes during liver tumorigenesis. Nat. Commun. 8, 1315 (2017).

  39. Fujimoto, A. et al. Whole-genome mutational landscape and characterization of noncoding and structural mutations in liver cancer. Nat. Genet. 48, 500–509 (2016).

  40. Swanton, C., McGranahan, N., Starrett, G. J. & Harris, R. S. APOBEC enzymes: mutagenic fuel for cancer evolution and heterogeneity. Cancer Discov. 5, 704–712 (2015).

  41. Chen, Y.-J. et al. Proteogenomics of non-smoking lung cancer in East Asia delineates molecular signatures of pathogenesis and progression. Cell 182, 226–244 (2020).

  42. Zhang, T. et al. APOBEC affects tumor evolution and age at onset of lung cancer in smokers. Nat. Commun. 16, 4711 (2025).

  43. Morton, L. M. et al. Radiation-related genomic profile of papillary thyroid carcinoma after the Chernobyl accident. Science 372, eabg2538 (2021).

  44. Lawson, A. R. J. et al. Extensive heterogeneity in somatic mutation and selection in the human bladder. Science 370, 75–82 (2020).

  45. Degasperi, A. et al. Substitution mutational signatures in whole-genome-sequenced cancers in the UK population. Science 376, abl9283 (2022).

  46. Otlu, B. et al. Topography of mutational signatures in human cancer. Cell Rep. 42, 112930 (2023).

  47. Jamal-Hanjani, M. et al. Tracking the evolution of non-small-cell lung cancer. N. Engl. J. Med. 376, 2109–2121 (2017).

  48. Zhang, T. et al. Distinct genomic landscape of lung adenocarcinoma from household use of smoky coal. Am. J. Respir. Crit. Care Med. 208, 733–736 (2023).

  49. Hill, W. et al. Lung adenocarcinoma promotion by air pollutants. Nature 616, 159–167 (2023).

  50. van Donkelaar, A. et al. Monthly global estimates of fine particulate matter and their uncertainty. Environ. Sci. Technol. 55, 15287–15300 (2021).

  51. Mochizuki, A. et al. Passive smoking-induced mutagenesis as a promoter of lung carcinogenesis. J. Thorac. Oncol. 19, 984–994 (2024).

  52. Yu, X. J. et al. Characterization of somatic mutations in air pollution-related lung cancer. EBioMedicine 2, 583–590 (2015).

  53. Chan, W.-H. et al. Verifying the accuracy of self-reported smoking behavior in female volunteer soldiers. Sci. Rep. 13, 3438 (2023).

  54. Landi, M. T. et al. Environment And Genetics in Lung cancer Etiology (EAGLE) study: an integrative population-based case–control study of lung cancer. BMC Public Health 8, 203 (2008).

  55. Bergmann, E. A., Chen, B. J., Arora, K., Vacic, V. & Zody, M. C. Conpair: concordance and contamination estimator for matched tumor–normal pairs. Bioinformatics 32, 3196–3198 (2016).

  56. Pedersen, B. S. et al. Somalier: rapid relatedness estimation for cancer and germline studies using efficient genome sketches. Genome Med. 12, 62 (2020).

  57. Nik-Zainal, S. et al. The life history of 21 breast cancers. Cell 149, 994–1007 (2012).

  58. Boot, A. et al. In-depth characterization of the cisplatin mutational signature in human cell lines and in esophageal and liver tumors. Genome Res. 28, 654–665 (2018).

  59. Dentro, S. C. et al. Characterizing genetic intra-tumor heterogeneity across 2,658 human cancer genomes. Cell 184, 2239–2254 (2021).

  60. Imielinski, M. et al. Mapping the hallmarks of lung adenocarcinoma with massively parallel sequencing. Cell 150, 1107–1120 (2012).

  61. Lee, J. K. et al. Clonal history and genetic predictors of transformation into small-cell carcinomas from lung adenocarcinomas. J. Clin. Oncol. 35, 3065–3074 (2017).

  62. The Cancer Genome Atlas Research Network. Comprehensive molecular profiling of lung adenocarcinoma. Nature 511, 543–550 (2014).

  63. Carrot-Zhang, J. et al. Whole-genome characterization of lung adenocarcinomas lacking the RTK/RAS/RAF pathway. Cell Rep. 34, 108707 (2021).

  64. Pedersen, B. S. & Quinlan, A. R. Mosdepth: quick coverage calculation for genomes and exomes. Bioinformatics 34, 867–868 (2018).

  65. Sadedin, S. P. & Oshlack, A. Bazam: a rapid method for read extraction and realignment of high-throughput sequencing data. Genome Biol. 20, 78 (2019).

  66. Cibulskis, K. et al. Sensitive detection of somatic point mutations in impure and heterogeneous cancer samples. Nat. Biotechnol. 31, 213–219 (2013).

  67. Kim, S. et al. Strelka2: fast and accurate calling of germline and somatic variants. Nat. Methods 15, 591–594 (2018).

  68. Freed, D., Pan, R. & Aldana, R. TNscope: accurate detection of somatic mutations with haplotype-based variant candidate detection and machine learning filtering. Preprint at bioRxiv https://doi.org/10.1101/250647 (2018).

  69. Karczewski, K. J. et al. The mutational constraint spectrum quantified from variation in 141,456 humans. Nature 581, 434–443 (2020).

  70. Ramos, A. H. et al. Oncotator: cancer variant annotation tool. Hum. Mutat. 36, E2423–E2429 (2015).

  71. Hasan, M. S., Wu, X., Watson, L. T. & Zhang, L. UPS-indel: a universal positioning system for indels. Sci. Rep. 7, 14106 (2017).

  72. Mayakonda, A., Lin, D. C., Assenov, Y., Plass, C. & Koeffler, H. P. Maftools: efficient and comprehensive analysis of somatic variants in cancer. Genome Res. 28, 1747–1756 (2018).

  73. Jun, G. et al. Detecting and estimating contamination of human DNA samples in sequencing and array-based genotype data. Am. J. Hum. Genet. 91, 839–848 (2012).

  74. Martinez-Jimenez, F. et al. A compendium of mutational cancer driver genes. Nat. Rev. Cancer 20, 555–572 (2020).

  75. Martincorena, I. et al. Universal patterns of selection in cancer and somatic tissues. Cell 171, 1029–1041 (2017).

  76. Sondka, Z. et al. The COSMIC Cancer Gene Census: describing genetic dysfunction across all human cancers. Nat. Rev. Cancer 18, 696–705 (2018).

  77. Muiños, F., Martinez-Jimenez, F., Pich, O., Gonzalez-Perez, A. & Lopez-Bigas, N. In silico saturation mutagenesis of cancer genes. Nature 596, 428–432 (2021).

  78. Chakravarty, D. et al. OncoKB: a precision oncology knowledge base. JCO Precis. Oncol. 2017, 1–16 (2017).

  79. Bailey, M. H. et al. Comprehensive characterization of cancer driver genes and mutations. Cell 173, 371–385 (2018).

  80. Cheng, J. et al. Accurate proteome-wide missense variant effect prediction with AlphaMissense. Science 381, eadg7492 (2023).

  81. Yuan, K., Macintyre, G., Liu, W., PCAWG-11 working group & Markowetz, F. Ccube: a fast and robust method for estimating cancer cell fractions. Preprint at bioRxiv https://doi.org/10.1101/484402 (2018).

  82. Mermel, C. H. et al. GISTIC2.0 facilitates sensitive and confident localization of the targets of focal somatic copy-number alteration in human cancers. Genome Biol. 12, R41 (2011).

  83. Yang, L. et al. Diverse mechanisms of somatic structural variations in human cancer genomes. Cell 153, 919–929 (2013).

  84. Chen, X. et al. Manta: rapid detection of structural variants and indels for germline and cancer sequencing applications. Bioinformatics 32, 1220–1222 (2016).

  85. Yang, Y. & Yang, L. Somatic structural variation signatures in pediatric brain tumors. Cell Rep. 42, 113276 (2023).

  86. Zhu, H. et al. Candidate cancer driver mutations in distal regulatory elements and long-range chromatin interaction networks. Mol. Cell 77, 1307–1321 (2020).

  87. Ding, Z. et al. Estimating telomere length from whole genome sequence data. Nucleic Acids Res. 42, e75 (2014).

  88. Alexandrov, L. B., Nik-Zainal, S., Wedge, D. C., Campbell, P. J. & Stratton, M. R. Deciphering signatures of mutational processes operative in human cancer. Cell Rep. 3, 246–259 (2013).

  89. Bergstrom, E. N. et al. SigProfilerMatrixGenerator: a tool for visualizing and exploring patterns of small mutational events. BMC Genomics 20, 685 (2019).

  90. Díaz-Gay, M. et al. Assigning mutational signatures to individual samples and individual somatic mutations with SigProfilerAssignment. Bioinformatics 39, btad756 (2023).

  91. Otlu, B. & Alexandrov, L. B. Evaluating topography of mutational signatures with SigProfilerTopography. Genome Biol. 26, 134 (2025).

  92. Benjamini, Y. & Hochberg, Y. Controlling the false discovery rate—a practical and powerful approach to multiple testing. J. R. Stat. Soc. B 57, 289–300 (1995).

Acknowledgements

This work was supported by the Intramural Research Program of the National Cancer Institute, US NIH (project ZIACP101231 to M.T.L.); by the NIH grants R01ES032547-01, R01CA269919-01 and 1U01CA290479-01 to L.B.A.; and by a Packard Fellowship for Science and Engineering to L.B.A. The research performed in the L.B.A. laboratory was also supported by the Sanford Stem Cell Institute at the University of California San Diego. M.D.-G. and P.G.-G. were awarded fellowships within the Generación D initiative, Red.es, Ministerio para la Transformación Digital y de la Función Pública, for talent attraction (C005/24-ED CV1), funded by the European Union NextGenerationEU funds, through the Plan de Recuperación, Transformación y Resiliencia (PRTR). The funders had no roles in study design, data collection and analysis, decision to publish or preparation of the manuscript. The computational analyses reported in this manuscript used the Triton Shared Computing Cluster at the San Diego Supercomputer Center of the University of California San Diego. We thank the study participants; P. Kraft for reading and commenting on the manuscript; and the staff at Westat for their assistance with collecting samples and corresponding clinical data. This work used the computational resources of the NIH HPC Biowulf cluster (http://hpc.nih.gov).

Author information

Contributions

Conceptualization: M.D.-G., T.Z., L.B.A. and M.T.L. Methodology: M.D.-G., T.Z., L.Y., J. Shi, D.C.W., B.Z., L.B.A. and M.T.L. Formal analysis: M.D.-G., T.Z., P.H.H., A.K., W.Z., C.D.S., B.O., S.P.N., R.V., E.N.B., M. Kazachkova, J. Sang, J.P.M., C.H., O.W.L., K.M.J., P.G.-G., Y.Y., X.Z., L.Y., M.A.N., J. Shi, B.Z. and J.C. Pathology work: C.L., M.K.B., W.D.T., L.M.S., P.J., R.H. and S.-R.Y. Resources: O.P., C.S., C.A.H., I.-S.C., M.P.W., K.C.L., E.S.E., J.M.S., M.B.S., S.S.Y., M. Manczuk, J.L., B.S., A.M., O.S., D.Z., I.H., D.M., S.M., M. Kontic, Y.B., B.E.G.R., D.C.C., V.G., P.B., G.L., P.H., N.R., A.C.P., D.C., Q.L., S.J.C. and M.T.L. Data curation: P.H.H., T.Z., F.J.C.-M., M. Miraftab, M.S. and O.W.L. Writing (original draft): M.D.-G., T.Z., L.B.A. and M.T.L. Writing (review and editing): M.D.-G., T.Z., C.S., L.Y., M.A.N., D.C.W., B.Z., S.J.C., J.C., L.B.A. and M.T.L. Visualization: M.D.-G., T.Z., L.B.A. and M.T.L. Supervision: L.B.A. and M.T.L.

Corresponding authors

Correspondence to Ludmil B. Alexandrov or Maria Teresa Landi.

Ethics declarations

Competing interests

L.B.A. is a co-founder, CSO, scientific advisory member and consultant for io9, holds equity in the company and receives income from it. The terms of this arrangement have been reviewed and approved by the University of California San Diego in accordance with its conflict-of-interest policies. L.B.A. is also a compensated member of the scientific advisory board of Inocras. L.B.A.’s spouse is an employee of Biotheranostics. E.N.B. and L.B.A. declare a US provisional patent application filed with the University of California San Diego with serial number 63/269,033. L.B.A. also declares US provisional applications filed with the University of California San Diego with serial numbers 63/366,392, 63/289,601, 63/483,237, 63/412,835 and 63/492,348. L.B.A. is also an inventor of US patent 10,776,718 for source identification by non-negative matrix factorization. L.B.A. and M.D.-G. further declare a European patent application with application number EP25305077.7. S.-R.Y. has received consulting fees from AstraZeneca, Sanofi, Amgen and AbbVie, and speaking fees from AstraZeneca, Medscape, PRIME Education and Medical Learning Institute. The remaining authors declare no competing interests.

Peer review

Peer review information

Nature thanks the anonymous reviewers for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data figures and tables

Extended Data Fig. 1 Association of mutational signature prevalence and driver mutations with geographical region, biological sex and EGFR mutation status in LCINS adenocarcinoma cases.

a, DBS, ID, CN and SV mutational signatures enrichment analysis with geographical regions. Horizontal lines mark the statistical significance thresholds at FDR levels of 0.05 (dashed orange line) and 0.01 (dashed red line). Blue-coloured signatures were enriched in North American and European patients, whereas red-coloured signatures were enriched in East Asian patients. Statistical significance was evaluated using multivariable logistic regression models for geographical regions and adjusted by age, sex, and tumour purity. b, SBS, DBS, ID, CN and SV mutational signatures enrichment analysis with biological sexes. Blue-coloured signatures were enriched in males, whereas red-coloured signatures were enriched in females. Statistical significance was evaluated using multivariable logistic regression models for biological sex and adjusted by age, genetic ancestry, and tumour purity. c–e, Detail of the enrichment of EGFR (c), TP53 (d) and KRAS (e) driver mutations in North American and European versus East Asian LCINS adenocarcinoma cases. f, Driver mutations enrichment analysis with biological sexes. Blue-coloured genes were enriched in males, whereas red-coloured genes were enriched in females. Statistical significance was evaluated using multivariable logistic regression models for biological sex and adjusted by age, genetic ancestry, and tumour purity. g, Quantification of the tumour mutational burden for TP53 wild-type and mutant tumours across EGFR mutation status (n = 271 TP53 wild-type EGFR wild-type, n = 241 TP53 wild-type EGFR mutant, n = 81 TP53 mutant EGFR wild-type, n = 144 TP53 mutant EGFR mutant). Statistical significance was evaluated using a multivariable linear regression model for EGFR mutation status and adjusted by age, sex, ancestry, and tumour purity. The line within the box indicates the median, the lower and upper ends indicate the 25th and 75th percentiles, whiskers show 1.5 × interquartile range, and values outside are shown as individual data points. h, SBS, DBS, ID, CN and SV mutational signatures enrichment analysis with EGFR mutation status. Blue-coloured signatures were enriched in EGFR mutant tumours, whereas red-coloured signatures were enriched in EGFR wild-type tumours. Statistical significance was evaluated using multivariable logistic regression models for EGFR mutation status and adjusted by age, sex, genetic ancestry, and tumour purity.
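
The FDR thresholds drawn in these volcano plots can be illustrated with a short sketch. The legend does not name the adjustment procedure, so the example assumes the Benjamini–Hochberg method, which appears in the reference list. The p-values and the helper names `benjamini_hochberg` and `classify` are hypothetical and are not taken from the study's pipeline.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values (FDR), in input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def classify(fdr):
    """Map an FDR value onto the two thresholds drawn in the plots."""
    if fdr < 0.01:
        return "below 0.01"
    if fdr < 0.05:
        return "below 0.05"
    return "not significant"


# Hypothetical p-values, one per mutational signature tested.
p_values = [0.0004, 0.02, 0.03, 0.5]
fdr = benjamini_hochberg(p_values)
labels = [classify(q) for q in fdr]
```

With these made-up inputs the adjusted values are approximately 0.0016, 0.04, 0.04 and 0.5, so only the first signature would cross the stricter 0.01 line.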

Extended Data Fig. 2 Genomic differences between lung cancers from smokers and lung cancers from never-smokers, and landscape of 56 LCINS tumours that exhibit SBS4 activity.

a, Differences between smoker and never-smoker lung cancer cases across SBS signatures. Volcano plot (top) indicating the enrichment of SBS signature prevalence in never-smokers (left) and smokers (right) with lung cancer. Statistically significant enrichments were evaluated using multivariable logistic regression models for smoking status and adjusted by age, sex, histology, genetic ancestry, and tumour purity. Firth’s bias-reduced logistic regressions were used for regressions presenting complete or quasi-complete separation. P-values were adjusted for multiple comparisons based on the total number of mutational signatures considered, and adjusted p-values were reported as FDR values. Horizontal lines mark the statistical significance thresholds at FDR levels of 0.05 (dashed orange line) and 0.01 (dashed red line). Bar plot (bottom) indicating prevalence by smoking history. b, Tumour mutational burden differences between SBS4-positive (n = 56) and negative (n = 815) LCINS tumours for SBS, DBS, ID, CN segments, and SV events. Statistical significance was evaluated using two-sided Wilcoxon rank sum tests. The line within the box indicates the median, the lower and upper ends indicate the 25th and 75th percentiles, whiskers show 1.5 × interquartile range, and values outside are shown as individual data points. c–e, Mutational signature landscape for SBS (c), DBS (d) and ID (e) mutation types, including absolute and relative number of mutations assigned to each mutational signature, unsupervised clustering based on the signature contributions, and sample-level annotations of sex, genetic ancestry, passive smoking, and accuracy of signature reconstruction based on cosine similarity. f, Driver mutations landscape, including different types of genomic alterations, as well as sample-level annotations of sex, genetic ancestry, histology, and tumour purity. g, Enrichment of EGFR p.L858R hotspot driver mutations in SBS4-positive tumours from never-smokers compared to smokers using multivariable logistic regressions considering clinical and epidemiological covariates, including age, sex, genetic ancestry, histology, and tumour purity (n = 5 mutated non-smoker cases, n = 1 mutated smoker case, n = 51 non-smoker wild-type cases, n = 301 smoker wild-type tumours). Error bars indicate 95% CIs.
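
The reconstruction accuracy annotated in panels c–e is based on cosine similarity between a sample's observed mutation profile and the profile rebuilt from its assigned signatures. The sketch below shows that comparison under simplifying assumptions: it uses four mutation channels rather than a full SBS catalogue, and the signature profiles, activities and function names are hypothetical rather than drawn from the study or from any SigProfiler tool.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equally long vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for an all-zero vector")
    return dot / (norm_u * norm_v)


def reconstruct(signatures, activities):
    """Sum signature profiles weighted by the mutations assigned to each."""
    n_channels = len(next(iter(signatures.values())))
    profile = [0.0] * n_channels
    for name, count in activities.items():
        for k, probability in enumerate(signatures[name]):
            profile[k] += count * probability
    return profile


# Hypothetical toy signatures over four mutation channels (each sums to 1).
signatures = {"sigA": [0.5, 0.5, 0.0, 0.0], "sigB": [0.0, 0.0, 0.5, 0.5]}
# Hypothetical number of mutations assigned to each signature in one sample.
activities = {"sigA": 100, "sigB": 50}
observed = [52, 48, 26, 24]

reconstructed = reconstruct(signatures, activities)
accuracy = cosine_similarity(observed, reconstructed)
```

A value close to 1 means the assigned signatures explain the observed profile well; the toy sample above gives a similarity above 0.99.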

Extended Data Fig. 3 Topographical characteristics of 56 LCINS and 68 lung cancers from smokers exhibiting SBS4 activity.

a,b, Distribution of SBS4 mutations with replication timing in our cohort of never-smokers (a) and in the smokers from the PCAWG cohort (b). Data are separated into deciles, with each segment harbouring 10% of the observed replication time signal in the x-axis, and the normalized mutational density displayed in the y-axis. Black dashed lines represent the behaviour of simulated mutations. c,d, Association of SBS4 mutations with nucleosome occupancy in never-smokers (c) and smokers (d). The solid blue line represents real somatic mutations, whereas the dashed grey line indicates the distribution of simulated mutations. Both lines show the average nucleosome signal in the y-axis, using a genomic window of 2 kilobases centred around the SBS4-associated mutations in the x-axis. e,f, Strand asymmetry of SBS4-associated mutations in comparison to simulations and considering lagging and leading DNA strands, transcribed and untranscribed DNA regions and genic and intergenic genomic locations in never-smokers (e) and smokers (f). The number of circles represents the odds ratio and the colour the corresponding strand/region of statistically significant asymmetries.
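
The decile view in panels a and b can be sketched as follows. This is a minimal illustration, not the method used to produce the figure: it assumes equally sized genomic windows, forms deciles as equal-count groups of windows ranked by replication timing signal, and normalizes densities by the largest decile. The function name and input data are hypothetical.

```python
def normalized_decile_density(windows):
    """windows: list of (replication_signal, mutation_count) pairs,
    one per equally sized genomic window (an assumption of this sketch)."""
    if len(windows) < 10:
        raise ValueError("need at least 10 windows to form deciles")
    # Rank windows by replication timing signal, lowest first.
    ordered = sorted(windows, key=lambda w: w[0])
    n = len(ordered)
    densities = []
    for d in range(10):
        chunk = ordered[d * n // 10:(d + 1) * n // 10]
        # Mean number of mutations per window within this decile.
        densities.append(sum(count for _, count in chunk) / len(chunk))
    peak = max(densities)
    if peak == 0:
        return [0.0] * 10
    # Assumed normalization: scale so the densest decile equals 1.
    return [x / peak for x in densities]


# Hypothetical input: 20 windows whose mutation count rises with the signal.
windows = [(float(i), i) for i in range(20)]
density = normalized_decile_density(windows)
```

The same function applied to simulated mutations would give the baseline that the dashed lines represent in the figure.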

Extended Data Fig. 4 Influence of passive smoking on the landscape of ID, DBS, CN and SV signatures in LCINS.

a–e, Differences in DBS, ID, CN, and SV burden using univariate comparisons based on two-sided Wilcoxon rank sum tests (a) as well as multivariable linear regressions considering clinical and epidemiological covariates (b–e), including age, sex, genetic ancestry, and tumour purity (n = 250 passive smokers, n = 208 non-passive smokers). The line within the box indicates the median, the upper and lower ends indicate the 25th and 75th percentiles, whiskers show 1.5 × interquartile range, and values outside are shown as individual data points (a). Error bars indicate 95% CIs (b–e). f, Enrichment of mutational signatures derived from DBS, ID, CN, and SV alterations. Horizontal lines marking statistically significant thresholds were included at 0.05 (dashed orange line) and 0.01 FDR value levels (dashed red line). Statistical significance was evaluated using multivariable logistic regression models for passive smoking history and adjusted by age, sex, genetic ancestry, histology and tumour purity.
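
The box plot elements named in this legend (median, 25th and 75th percentiles, whiskers at 1.5 × interquartile range, and points beyond them) can be computed as in the sketch below. It assumes the common Tukey convention, in which each whisker ends at the most extreme observation still within 1.5 × IQR of the box; the quartile interpolation method, the function name and the example values are likewise assumptions and are not taken from the study.

```python
import statistics


def box_plot_summary(values):
    """Summarise values the way a Tukey-style box plot draws them."""
    # Quartiles; the "inclusive" interpolation method is an assumption,
    # plotting libraries differ slightly in how they interpolate.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    # Whiskers stop at the most extreme observations inside the fences.
    inside = [v for v in values if low_fence <= v <= high_fence]
    outliers = sorted(v for v in values if v < low_fence or v > high_fence)
    return {
        "median": median,
        "q1": q1,
        "q3": q3,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": outliers,
    }


# Made-up burden values for illustration only.
summary = box_plot_summary([1, 2, 3, 4, 5, 6, 7, 8, 40])
```

In this made-up example the value 40 lies above the upper fence, so it would be drawn as an individual data point rather than extending the whisker.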

Extended Data Fig. 5 Effects of PM2.5 exposure in large genomic alterations in LCINS.

a,b, Differences in the number of CN segments and SV events using univariate comparisons based on two-sided Wilcoxon rank sum tests (a) as well as multivariable linear regressions, considering clinical and epidemiological covariates (b), including age, sex, genetic ancestry, histology, and tumour purity, for patients diagnosed in geographical regions with high and low PM2.5 exposure levels (threshold defined at 20 μg m−3; n = 440 high-pollution group, n = 413 low-pollution group; only samples for which the country of origin was known are included). The line within the box indicates the median, the upper and lower ends indicate the 25th and 75th percentiles, whiskers show 1.5 × interquartile range, and values outside are shown as individual data points (a). Error bars indicate 95% CIs (b). c, Volcano plots indicating enrichment of mutational signatures derived from CN and SV alterations. Horizontal lines marking statistically significant thresholds were included at 0.05 (dashed orange line) and 0.01 FDR value levels (dashed red line). Statistical significance was evaluated using multivariable logistic regression models for PM2.5 exposure levels and adjusted by age, sex, genetic ancestry, histology, and tumour purity.
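
As an illustration of the univariate comparison named in this legend, the following sketch implements a two-sided Wilcoxon rank sum test using the normal approximation with a tie correction. This is an assumption about the form of the test: the legend does not state which software or which variant (exact, continuity-corrected, or approximate) was used. The function names and example values are hypothetical.

```python
import math


def midranks(values):
    """Assign 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next sorted value is tied with this one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        average_rank = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = average_rank
        pos = end + 1
    return ranks


def rank_sum_test(group_a, group_b):
    """Return (z, two-sided p) from the normal approximation."""
    pooled = list(group_a) + list(group_b)
    n_a, n_b = len(group_a), len(group_b)
    n = n_a + n_b
    ranks = midranks(pooled)
    rank_sum_a = sum(ranks[:n_a])
    expected = n_a * (n + 1) / 2
    # Tie correction: each group of t tied values reduces the variance.
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in counts.values())
    variance = n_a * n_b / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    z = (rank_sum_a - expected) / math.sqrt(variance)
    # Two-sided tail probability of a standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Made-up counts for two hypothetical exposure groups (illustration only).
z_score, p_value = rank_sum_test([1, 2, 3, 4, 5], [6, 7, 8, 9, 10])
```

With group sizes in the hundreds, as in the high and low PM2.5 groups, the normal approximation is generally considered reasonable; for very small groups an exact test would be the safer choice.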

Extended Data Fig. 6 Assignment of mutational signatures and estimation of telomere length using data from control and PM2.5-exposed mice.

a,b, Box plots comparing the mutations assigned to SBS5 (a) and the estimations for the telomere length ratio between the tumour and normal samples (b). Two-sided Student’s t-tests were used to calculate statistical significance. The line within the box indicates the median, the upper and lower ends indicate the 25th and 75th percentiles, and whiskers show 1.5 × interquartile range. n = 5 (control mice) and n = 5 (PM2.5-exposed mice); data from a previous study49.

Extended Data Fig. 7 Mutagenic effects of PM2.5 exposure in LCINS cases excluding SBS4 contributions.

a, Quantification of SBS burden excluding SBS4 mutations for patients living in geographical regions with high and low PM2.5 exposure levels (threshold defined at 20 μg m−3; n = 440 high-pollution group, n = 413 low-pollution group; only samples for which the country of origin was known are included). Statistical significance was evaluated using two-sided Wilcoxon rank sum tests. The line within the box indicates the median, the upper and lower ends indicate the 25th and 75th percentiles, whiskers show 1.5 × interquartile range, and values outside are shown as individual data points. b, Forest plot corresponding to a multivariable linear regression considering high or low PM2.5 exposure group, age, sex, genetic ancestry, histology and tumour sample purity as covariates and SBS burden as dependent variable (threshold defined at 20 μg m−3; n = 440 high-pollution group, n = 413 low-pollution group; only samples for which the country of origin was known are included). Error bars indicate 95% CIs. c, Scatter plot showing a significant correlation between individual sample estimates of PM2.5 exposure and SBS burden. Statistical significance was evaluated using a multivariable linear regression of the individual PM2.5 estimates per sample and mutation burden (log10 scale), and adjusted by age, sex, genetic ancestry, histology and tumour purity. Blue lines and bands indicate univariate linear regressions and 95% CIs for average mutation burden versus average PM2.5 estimates.
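
The univariate fit drawn as the blue line in panel c can be sketched as an ordinary least-squares regression of log10 mutation burden on PM2.5. The sketch below is a simplified illustration under stated assumptions: it omits the covariates used in the multivariable model (age, sex, genetic ancestry, histology and tumour purity), and the function name and data points are made up rather than taken from the study.

```python
import math


def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up PM2.5 estimates (micrograms per cubic metre) and SBS burdens.
pm25 = [5.0, 10.0, 20.0, 30.0, 40.0]
burden = [1000, 1200, 2000, 3000, 5000]

# The burden is modelled on the log10 scale, as stated in the legend.
log_burden = [math.log10(b) for b in burden]
slope, intercept = fit_line(pm25, log_burden)

# On a log10 scale the slope translates into a multiplicative change:
# the fold change in burden per 10-unit increase in PM2.5.
fold_change_per_10 = 10 ** (slope * 10)
```

Working on the log10 scale means the fitted slope describes a proportional change in mutation burden per unit of PM2.5 rather than an absolute number of additional mutations.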

Supplementary information

Supplementary Information

Supplementary Note, Supplementary Figures and Supplementary References

Reporting Summary

Supplementary Tables

Supplementary Tables 1–14

About this article

Cite this article

Díaz-Gay, M., Zhang, T., Hoang, P.H. et al. The mutagenic forces shaping the genomes of lung cancer in never smokers. Nature 644, 133–144 (2025). https://doi.org/10.1038/s41586-025-09219-0

  • DOI: https://doi.org/10.1038/s41586-025-09219-0
