Abstract
Understanding lung cancer evolution can identify tools for intercepting its growth (refs. 1,2). Here, in a landscape analysis of 1,024 lung adenocarcinomas (LUADs) with deep whole-genome sequencing integrated with multiomic data, we identified 542 LUADs with a diverse clonal architecture. In this group, we observed divergent evolutionary trajectories based on tobacco smoking exposure, ancestry and sex. LUAD from smokers showed an abundance of tobacco-related C:G>A:T driver mutations (ref. 3) in KRAS and short subclonal diversification. LUAD in people who have never smoked (hereafter, never-smokers) showed early occurrence of copy-number alterations and EGFR mutations associated with SBS5 and SBS40a mutational signatures. Tumours containing EGFR mutations exhibited long latency, particularly in female individuals of European ancestry. Tumours from Asian never-smokers showed a short clonal evolution. Importantly, we found that the mutational signature ID2 (ref. 4) is a marker of a previously unrecognized mechanism for LUAD evolution. Tumours with ID2 showed short latency and high long interspersed nuclear element-1 (LINE-1, hereafter L1) retrotransposon activity linked to L1 promoter demethylation. These tumours exhibited an aggressive phenotype with genomic instability, elevated hypoxia scores, low neoantigen burden, metastasis propensity and poor overall survival. Reactivated L1-retrotransposition-induced mutagenesis probably contributes to the mutational signature ID2, including through the regulation of the transcription factor ZNF695, a member of the KZFP family (ref. 5). The complex nature of LUAD evolution creates both challenges and opportunities for screening and treatment plans.
Data availability
Normal and tumour-paired CRAM files, as well as raw methylation intensity (IDAT) files for the WGS participants from the Sherlock-Lung study, have been deposited in dbGaP under accession numbers phs001697.v2.p1 and phs002992.v1.p1. RNA-seq FASTQ files for the same individuals are available through dbGaP under the accession number phs002346.v1.p1. The human reference genome (GRCh38) was obtained from the GATK resource repository (https://github.com/broadinstitute/gatk/blob/master/src/test/resources/large/Homo_sapiens_assembly38.fasta.gz). Publicly available LUAD multi-omics datasets can be accessed through dbGaP (accessions phs000178.v9.p8 and phs000488.v1.p1) and the European Genome-phenome Archive (EGA) (EGAS00001001757, EGAS00001002801 and EGAS00001003830). Detailed dataset descriptions and accession information are also provided in Supplementary Table 1.
Code availability
The WGS bioinformatics pipelines can be accessed at GitHub (https://github.com/xtmgah/Sherlock-Lung). The Battenberg SCNA calling algorithm can be found at GitHub (https://github.com/Wedge-lab/battenberg). The Dirichlet process-based method for the subclonal reconstruction of tumours can be found at GitHub (https://github.com/Wedge-lab/dpclust). The bioinformatic pipeline for identifying TE insertions is available at GitLab (https://gitlab.com/mobilegenomesgroup/TraFiC).
References
Yates, L. R. & Campbell, P. J. Evolution of the cancer genome. Nat. Rev. Genet. 13, 795–806 (2012).
Jamal-Hanjani, M. et al. Tracking the evolution of non-small-cell lung cancer. N. Engl. J. Med. 376, 2109–2121 (2017).
Alexandrov, L. B. et al. Mutational signatures associated with tobacco smoking in human cancer. Science 354, 618–622 (2016).
Alexandrov, L. B. et al. The repertoire of mutational signatures in human cancer. Nature 578, 94–101 (2020).
Yang, P., Wang, Y. & Macfarlan, T. S. The role of KRAB-ZFPs in transposable element repression and mammalian evolution. Trends Genet. 33, 871–881 (2017).
Zhang, Y. et al. Global variations in lung cancer incidence by histological subtype in 2020: a population-based study. Lancet Oncol. 24, 1206–1218 (2023).
Gerstung, M. et al. The evolutionary history of 2,658 cancers. Nature 578, 122–128 (2020).
Frankell, A. M. et al. The evolution of lung cancer and impact of subclonal selection in TRACERx. Nature 616, 525–533 (2023).
Burns, K. H. Transposable elements in cancer. Nat. Rev. Cancer 17, 415–424 (2017).
Rodriguez-Martin, B. et al. Pan-cancer analysis of whole genomes identifies driver rearrangements promoted by LINE-1 retrotransposition. Nat. Genet. 52, 306–319 (2020).
Mendez-Dorantes, C. et al. Chromosomal rearrangements and instability caused by the LINE-1 retrotransposon. Preprint at bioRxiv https://doi.org/10.1101/2024.12.14.628481 (2024).
Rodić, N. et al. Retrotransposon insertions in the clonal evolution of pancreatic ductal adenocarcinoma. Nat. Med. 21, 1060–1064 (2015).
Nguyen, T. H. M. et al. L1 retrotransposon heterogeneity in ovarian tumor cell evolution. Cell Rep. 23, 3730–3740 (2018).
Zhang, T. et al. Genomic and evolutionary classification of lung cancer in never smokers. Nat. Genet. 53, 1348–1359 (2021).
Landi, M. T. et al. Tracing lung cancer risk factors through mutational signatures in never-smokers. Am. J. Epidemiol. 190, 962–976 (2021).
Zhu, B. et al. The genomic and epigenomic evolutionary history of papillary renal cell carcinomas. Nat. Commun. 11, 3096 (2020).
Senkin, S. et al. Geographic variation of mutagenic exposures in kidney cancer genomes. Nature 629, 910–918 (2024).
Díaz-Gay, M. et al. The mutagenic forces shaping the genomes of lung cancer in never smokers. Nature 644, 133–144 (2025).
McGranahan, N. et al. Clonal status of actionable driver events and the timing of mutational processes in cancer evolution. Sci. Transl. Med. 7, 283ra54 (2015).
Otlu, B. et al. Topography of mutational signatures in human cancer. Cell Rep. 42, 112930 (2023).
Stamatoyannopoulos, J. A. et al. Human mutation rate associated with DNA replication timing. Nat. Genet. 41, 393–395 (2009).
Whitfield, M. L., George, L. K., Grant, G. D. & Perou, C. M. Common markers of proliferation. Nat. Rev. Cancer 6, 99–106 (2006).
Emami Nejad, A. et al. The role of hypoxia in the tumor microenvironment and development of cancer stem cell: a novel approach to developing treatment. Cancer Cell Int. 21, 62 (2021).
Bhandari, V., Li, C. H., Bristow, R. G., Boutros, P. C. & PCAWG Consortium. Divergent mutational processes distinguish hypoxic and normoxic tumours. Nat. Commun. 11, 737 (2020).
Liu, N. et al. Selective silencing of euchromatic L1s revealed by genome-wide screens for L1 regulators. Nature 553, 228–232 (2018).
Li, X. et al. LINE-1 transcription activates long-range gene expression. Nat. Genet. 56, 1494–1502 (2024).
Scott, E. C. & Devine, S. E. The role of somatic L1 retrotransposition in human cancers. Viruses 9, 131 (2017).
Tubio, J. M. C. et al. Mobile DNA in cancer. Extensive transduction of nonrepetitive DNA mediated by L1 retrotransposition in cancer genomes. Science 345, 1251343 (2014).
McKerrow, W. et al. LINE-1 expression in cancer correlates with p53 mutation, copy number alteration, and S phase checkpoint. Proc. Natl Acad. Sci. USA 119, e2115999119 (2022).
Kazazian, H. H. Jr & Moran, J. V. Mobile DNA in health and disease. N. Engl. J. Med. 377, 361–370 (2017).
Petljak, M. et al. Characterizing mutational signatures in human cancer cell lines reveals episodic APOBEC mutagenesis. Cell 176, 1282–1294 (2019).
Nam, C. H. et al. Widespread somatic L1 retrotransposition in normal colorectal epithelium. Nature 617, 540–547 (2023).
Levin, H. L. & Moran, J. V. Dynamic interactions between transposable elements and their hosts. Nat. Rev. Genet. 12, 615–627 (2011).
Gasior, S. L., Wakeman, T. P., Xu, B. & Deininger, P. L. The human LINE-1 retrotransposon creates DNA double-strand breaks. J. Mol. Biol. 357, 1383–1393 (2006).
Morrish, T. A. et al. DNA repair mediated by endonuclease-independent LINE-1 retrotransposition. Nat. Genet. 31, 159–165 (2002).
Farkash, E. A. & Luning Prak, E. T. DNA damage and L1 retrotransposition. J. Biomed. Biotechnol. 2006, 37285 (2006).
Suzuki, J. et al. Genetic evidence that the non-homologous end-joining repair pathway is involved in LINE retrotransposition. PLoS Genet. 5, e1000461 (2009).
Baldwin, E. T. et al. Structures, functions and adaptations of the human LINE-1 ORF2 protein. Nature 626, 194–206 (2024).
Freeman, B. et al. Analysis of epigenetic features characteristic of L1 loci expressed in human cells. Nucleic Acids Res. 50, 1888–1907 (2022).
Rodgers, K. & McVey, M. Error-prone repair of DNA double-strand breaks. J. Cell. Physiol. 231, 15–24 (2016).
Wangsri, S., Subbalekha, K., Kitkumthorn, N. & Mutirangura, A. Patterns and possible roles of LINE-1 methylation changes in smoke-exposed epithelia. PLoS ONE 7, e45292 (2012).
Stueve, T. R. et al. Epigenome-wide analysis of DNA methylation in lung tissue shows concordance with blood studies and identifies tobacco smoke-inducible enhancers. Hum. Mol. Genet. 26, 3014–3027 (2017).
Caliri, A. W., Caceres, A., Tommasi, S. & Besaratinia, A. Hypomethylation of LINE-1 repeat elements and global loss of DNA hydroxymethylation in vapers and smokers. Epigenetics 15, 816–829 (2020).
Camila, B. et al. Genotoxicity and hypomethylation of LINE-1 induced by electronic cigarettes. Ecotoxicol. Environ. Saf. 256, 114900 (2023).
Joehanes, R. et al. Epigenetic signatures of cigarette smoking. Circ. Cardiovasc. Genet. 9, 436–447 (2016).
Imbeault, M., Helleboid, P.-Y. & Trono, D. KRAB zinc-finger proteins contribute to the evolution of gene regulatory networks. Nature 543, 550–554 (2017).
Han, G. et al. An atlas of epithelial cell states and plasticity in lung adenocarcinoma. Nature 627, 656–663 (2024).
Long, E. et al. Context-aware single-cell multiomics approach identifies cell-type-specific lung cancer susceptibility genes. Nat. Commun. 15, 7995 (2024).
Rosspopoff, O. & Trono, D. Take a walk on the KRAB side. Trends Genet. 39, 844–857 (2023).
Hill, W. et al. Lung adenocarcinoma promotion by air pollutants. Nature 616, 159–167 (2023).
Haga, Y. et al. Whole-genome sequencing reveals the molecular implications of the stepwise progression of lung adenocarcinoma. Nat. Commun. 14, 8375 (2023).
Huang, Z. et al. Single-cell analysis of somatic mutations in human bronchial epithelial cells in relation to aging and smoking. Nat. Genet. 54, 492–498 (2022).
Colom, B. et al. Mutant clones in normal epithelium outcompete and eliminate emerging tumours. Nature 598, 510–514 (2021).
Jardim, D. L., Goodman, A., de Melo Gagliato, D. & Kurzrock, R. The challenges of tumor mutational burden as an immunotherapy biomarker. Cancer Cell 39, 154–173 (2021).
Klein, S. L. & Flanagan, K. L. Sex differences in immune responses. Nat. Rev. Immunol. 16, 626–638 (2016).
Vaz, M. et al. Chronic cigarette smoke-induced epigenomic changes precede sensitization of bronchial epithelial cells to single-step transformation by KRAS mutations. Cancer Cell 32, 360–376 (2017).
Mengs, U. Tumour induction in mice following exposure to aristolochic acid. Arch. Toxicol. 61, 504–505 (1988).
Ambatipudi, S. et al. Tobacco smoking-associated genome-wide DNA methylation changes in the EPIC study. Epigenomics 8, 599–618 (2016).
Kobayashi, S. et al. EGFR mutation and resistance of non-small-cell lung cancer to gefitinib. N. Engl. J. Med. 352, 786–792 (2005).
Shah, N. M. et al. Pan-cancer analysis identifies tumor-specific antigens derived from transposable elements. Nat. Genet. 55, 631–639 (2023).
Bergmann, E. A., Chen, B.-J., Arora, K., Vacic, V. & Zody, M. C. Conpair: concordance and contamination estimator for matched tumor-normal pairs. Bioinformatics 32, 3196–3198 (2016).
Pedersen, B. S. et al. Somalier: rapid relatedness estimation for cancer and germline studies using efficient genome sketches. Genome Med. 12, 62 (2020).
Nik-Zainal, S. et al. The life history of 21 breast cancers. Cell 149, 994–1007 (2012).
Dentro, S. C. et al. Characterizing genetic intra-tumor heterogeneity across 2,658 human cancer genomes. Cell 184, 2239–2254 (2021).
Sadedin, S. P. & Oshlack, A. Bazam: a rapid method for read extraction and realignment of high-throughput sequencing data. Genome Biol. 20, 78 (2019).
Martínez-Jiménez, F. et al. A compendium of mutational cancer driver genes. Nat. Rev. Cancer 20, 555–572 (2020).
Yuan, K., Macintyre, G., Liu, W., PCAWG-11 Working Group & Markowetz, F. Ccube: a fast and robust method for estimating cancer cell fractions. Preprint at bioRxiv https://doi.org/10.1101/484402 (2018).
Yang, L. et al. Diverse mechanisms of somatic structural variations in human cancer genomes. Cell 153, 919–929 (2013).
Chen, X. et al. Manta: rapid detection of structural variants and indels for germline and cancer sequencing applications. Bioinformatics 32, 1220–1222 (2016).
Muiños, F., Martínez-Jiménez, F., Pich, O., Gonzalez-Perez, A. & Lopez-Bigas, N. In silico saturation mutagenesis of cancer genes. Nature 596, 428–432 (2021).
Chakravarty, D. et al. OncoKB: a precision oncology knowledge base. JCO Precis. Oncol. 1, 1–16 (2017).
Bailey, M. H. et al. Comprehensive characterization of cancer driver genes and mutations. Cell 173, 371–385 (2018).
Cheng, J. et al. Accurate proteome-wide missense variant effect prediction with AlphaMissense. Science 381, eadg7492 (2023).
Bergstrom, E. N. et al. SigProfilerMatrixGenerator: a tool for visualizing and exploring patterns of small mutational events. BMC Genom. 20, 685 (2019).
Islam, S. M. A. et al. Uncovering novel mutational signatures by de novo extraction with SigProfilerExtractor. Cell Genom. 2, 100179 (2022).
Sondka, Z. et al. COSMIC: a curated database of somatic variants and clinical data for cancer. Nucleic Acids Res. 52, D1210–D1217 (2024).
Díaz-Gay, M. et al. Assigning mutational signatures to individual samples and individual somatic mutations with SigProfilerAssignment. Bioinformatics 39, btad756 (2023).
Degasperi, A. et al. Substitution mutational signatures in whole-genome-sequenced cancers in the UK population. Science 376, abl9283 (2022).
Cancer Genome Atlas Research Network. Comprehensive molecular profiling of lung adenocarcinoma. Nature 511, 543–550 (2014).
Grossman, R. L. et al. Toward a shared vision for cancer genomic data. N. Engl. J. Med. 375, 1109–1112 (2016).
Dobin, A. et al. STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29, 15–21 (2013).
Putri, G. H., Anders, S., Pyl, P. T., Pimanda, J. E. & Zanini, F. Analysing high-throughput sequencing data in Python with HTSeq 2.0. Bioinformatics 38, 2943–2945 (2022).
Zhang, Y., Parmigiani, G. & Johnson, W. E. ComBat-seq: batch effect adjustment for RNA-seq count data. NAR Genom. Bioinform. 2, lqaa078 (2020).
Robinson, M. D., McCarthy, D. J. & Smyth, G. K. edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 26, 139–140 (2010).
Liberzon, A. et al. Molecular signatures database (MSigDB) 3.0. Bioinformatics 27, 1739–1740 (2011).
Li, T. et al. TIMER2.0 for analysis of tumor-infiltrating immune cells. Nucleic Acids Res. 48, W509–W514 (2020).
de Tribolet-Hardy, J. et al. Genetic features and genomic targets of human KRAB-zinc finger proteins. Genome Res. 33, 1409–1423 (2023).
Del Toro, N. et al. The IntAct database: efficient access to fine-grained molecular interaction data. Nucleic Acids Res. 50, D648–D653 (2022).
Oleksiewicz, U. et al. TRIM28 and interacting KRAB-ZNFs control self-renewal of human pluripotent stem cells through epigenetic repression of pro-differentiation genes. Stem Cell Rep. 9, 2065–2080 (2017).
McKerrow, W. & Fenyö, D. L1EM: a tool for accurate locus specific LINE-1 RNA quantification. Bioinformatics 36, 1167–1173 (2020).
Reyes-Gopar, H. et al. A single-cell transposable element atlas of human cell identity. Cell Rep. Methods 5, 101086 (2025).
Müller, F. et al. RnBeads 2.0: comprehensive analysis of DNA methylation data. Genome Biol. 20, 55 (2019).
Leek, J. T., Johnson, W. E., Parker, H. S., Jaffe, A. E. & Storey, J. D. The sva package for removing batch effects and other unwanted variation in high-throughput experiments. Bioinformatics 28, 882–883 (2012).
Kawaguchi, S., Higasa, K., Shimizu, M., Yamada, R. & Matsuda, F. HLA-HD: an accurate HLA typing algorithm for next-generation sequencing data. Hum. Mutat. 38, 788–797 (2017).
Reynisson, B., Alvarez, B., Paul, S., Peters, B. & Nielsen, M. NetMHCpan-4.1 and NetMHCIIpan-4.0: improved predictions of MHC antigen presentation by concurrent motif deconvolution and integration of MS MHC eluted ligand data. Nucleic Acids Res. 48, W449–W454 (2020).
Schenck, R. O., Lakatos, E., Gatenbee, C., Graham, T. A. & Anderson, A. R. A. NeoPredPipe: high-throughput neoantigen prediction and recognition potential pipeline. BMC Bioinform. 20, 264 (2019).
Thorsson, V. et al. The immune landscape of cancer. Immunity 48, 812–830 (2018).
Buffa, F. M., Harris, A. L., West, C. M. & Miller, C. J. Large meta-analysis of multiple cancers reveals a common, compact and highly prognostic hypoxia metagene. Br. J. Cancer 102, 428–435 (2010).
Winter, S. C. et al. Relation of a hypoxia metagene derived from head and neck cancer to prognosis of multiple cancers. Cancer Res. 67, 3441–3449 (2007).
Ragnum, H. B. et al. The tumour hypoxia marker pimonidazole reflects a transcriptional programme associated with aggressive prostate cancer. Br. J. Cancer 112, 382–390 (2015).
Elvidge, G. P. et al. Concordant regulation of gene expression by hypoxia and 2-oxoglutarate-dependent dioxygenase inhibition: the role of HIF-1α, HIF-2α, and other pathways. J. Biol. Chem. 281, 15215–15226 (2006).
Sørensen, B. S., Toustrup, K., Horsman, M. R., Overgaard, J. & Alsner, J. Identifying pH independent hypoxia induced genes in human squamous cell carcinomas in vitro. Acta Oncol. 49, 895–905 (2010).
Acknowledgements
This work was supported by the Intramural Research Program of the National Cancer Institute, US National Institutes of Health (NIH) (project ZIACP101231 to M.T.L.); by the NIH grants R01ES032547-01, R01CA269919-01 and 1U01CA290479-01 to L.B.A.; and by L.B.A.’s Packard Fellowship for Science and Engineering. The research performed in L.B.A.’s laboratory was also supported by UC San Diego Sanford Stem Cell Institute. The funders had no role in study design, data collection and analysis, decision to publish or preparation of the manuscript. The M.D.-G. fellowship, within the “Generación D” initiative, Red.es, Ministerio para la Transformación Digital y de la Función Pública, for talent attraction (C005/24-ED CV1), is funded by the European Union NextGenerationEU funds, through PRTR. The computational analyses reported in this manuscript used the NIH high-performance Biowulf Cluster and the Triton Shared Computing Cluster at the San Diego Supercomputer Center of UC San Diego. We thank the study participants; P. Kraft for reading the manuscript and providing comments; and the staff at Westat for their assistance in collecting samples and corresponding clinical data. Where authors are identified as personnel of the International Agency for Research on Cancer/World Health Organization, the authors alone are responsible for the views expressed in this article and they do not necessarily represent the decisions, policy or views of the International Agency for Research on Cancer/World Health Organization. This research was supported by the Intramural Research Program of the NIH. The contributions of the NIH authors were made as part of their official duties as NIH federal employees, are in compliance with agency policy requirements, and are considered works of the United States Government. However, the findings and conclusions presented in this paper are those of the authors and do not necessarily reflect the views of the NIH or the US Department of Health and Human Services.
We acknowledge the patients and the INCLIVA Biobank (PT17/0015/0049), integrated in the Spanish National Biobanks Network and in the Valencian Biobanking Network, for their collaboration. This study was supported by the Health and Medical Research Fund of Hong Kong SAR, HMRF 03142856. The related studies of the Taiwan site were supported by grants from the Ministry of Health and Welfare, Taiwan DOH97-TD-G-111-026 (C.A.H.), DOH98-TD-G-111-015 (C.A.H.), DOH99-TD-G-111-028 (C.A.H.), DOH97-TD-G-111-029 (C.Y.C.), DOH98-TD-G-111-018 (C.Y.C.), DOH99-TD-G-111-015 (C.Y.C.), DOH97-TD-G-111-028 (I.S.C.), DOH98-TD-G-111-017 (I.S.C.), DOH99-TD-G-111-014 (I.S.C.), and the Ministry of Science and Technology, Taiwan MOST109-2740-B-400-002 (C.A.H.), MOST110-2740-B-400-002 (C.A.H.), MOST111-2740-B-400-002 (C.A.H.). This work has been supported in part by the Tissue Core at the H. Lee Moffitt Cancer Center & Research Institute, a comprehensive cancer center designated by the National Cancer Institute and funded in part by a Moffitt Cancer Center Support Grant (no. P30-CA076292), and in part by NIH (NCI) grant no. U01CA209414 to the Boston Lung Cancer Survival Study of the Dana-Farber/Harvard Cancer Center (D.C.C.). The authors thank the team at the IUCPQ site of the Quebec Respiratory Health Network Biobank of the FRQS for their valuable assistance, and the staff at Harvard University, Yale University, Roswell Park Cancer Institute and Roswell PI, Instituto Nacional de Cancerologia, Nice University Hospital Centre (Nice UHC) - University Côte d’Azur and the Nice Biobank CRB, Toronto University Health Network, and Mayo Clinic for their assistance in providing samples and corresponding clinical data.
Author information
Contributions
T.Z. and M.T.L. conceived and designed the study. T.Z., L.Y., B.Z., J. Shi, L.B.A., D.C.W. and M.T.L. developed the methodology. T.Z., W.Z., C.W., M.D.-G., P.H.H., J.M.S., J.P.M., A. Klein, C.H., L.Y., B.Z., A. Khandekar and J. Sang carried out the formal analyses. J.Y., M.A., M. Cardelli, M. Cecati, F.M., K.B., J.C. and K.M.J. performed validation of findings. C.L., M.K.B., W.D.T., L.M.S., P.J., R.H. and S.-R.Y. conducted pathology review and analysis. M.P.W., K.C.L., C.A.H., C.-Y.C., N.E.C., A.C.P., D.C., E.S.E., J.M.S., M.B.S., S.S.Y., M. Manczuk, J.L., B.Ś., A.M., O.S., D.Z., I.H., D.M., S.M., M. Savic, Y.B., B.E.G.R., D.C.C., V.G., P.B., G.L., P.H., N.R., Q.L., M.T.L. and S.J.C. provided resources and essential materials. T.Z., J.R., M. Miraftab, F.J.C.-M., M. Saha, P.H.H. and O.W.L. undertook data curation. The original manuscript draft was written by T.Z. and M.T.L. All of the authors, including L.Y., B.Z., J. Shi, J.C., T.Z., M.A.N., D.C.W., S.J.C., L.B.A. and M.T.L., contributed to the review and editing of the manuscript. T.Z. performed visualization. M.T.L. supervised the study. All of the authors read and approved the final version of the manuscript.
Ethics declarations
Competing interests
L.B.A. is a co-founder, CSO, scientific advisory member and consultant for io9, and has equity and receives income. The terms of this arrangement have been reviewed and approved by the University of California, San Diego in accordance with its conflict of interest policies. L.B.A. is also a compensated member of the scientific advisory board of Inocras. L.B.A.’s spouse is an employee of Biotheranostics. L.B.A. declares US provisional applications filed with UCSD with serial numbers: 63/269,033, 63/366,392, 63/289,601, 63/483,237, 63/412,835 and 63/492,348. L.B.A. is also listed as an inventor of a US patent 10,776,718 for source identification by non-negative matrix factorization. S.-R.Y. has received consulting fees from AstraZeneca, Sanofi, Amgen and AbbVie; and received speaking fees from AstraZeneca, Medscape, PRIME Education and Medical Learning Institute. The other authors declare no competing interests.
Peer review
Peer review information
Nature thanks David Adams and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data figures and tables
Extended Data Fig. 1 Power analysis for detecting a diverse clonal architecture in 1,024 LUADs.
a, The scatter plot illustrates the relationship between the number of reads per chromosome copy (NRPCC) and the total detected single nucleotide variants (SNVs). Our ability to detect subclones relies not on the number of identified SNVs, but on the number of reads per tumour chromosomal copy. NRPCC accounts for tumour purity, ploidy, and sequencing coverage. b, The minimum cancer cell fraction (CCF) of the detected clusters in each tumour is plotted against NRPCC. To mitigate biases, we exclusively considered tumours with NRPCC ≥ 10. In these tumours, our analysis is sufficiently powered to identify a subclone with a CCF ≥ 30%. The suggested NRPCC threshold is denoted by the dashed line.
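The caption states that NRPCC accounts for tumour purity, ploidy and sequencing coverage but does not give the formula. The following is a minimal sketch, assuming the commonly used definition in which coverage is scaled by purity and divided by the average ploidy of the tumour-normal mixture. The function names and the default normal ploidy of 2 are illustrative assumptions and are not taken from the Sherlock-Lung pipeline.

```python
def nrpcc(coverage: float, purity: float, tumour_ploidy: float,
          normal_ploidy: float = 2.0) -> float:
    """Number of reads per tumour chromosome copy (assumed definition).

    NRPCC = coverage * purity
            / (purity * tumour_ploidy + (1 - purity) * normal_ploidy)
    """
    if not 0.0 < purity <= 1.0:
        raise ValueError("purity must be in (0, 1]")
    if coverage < 0 or tumour_ploidy <= 0:
        raise ValueError("coverage must be >= 0 and tumour_ploidy > 0")
    # Average ploidy of the sequenced sample (tumour plus admixed normal cells).
    sample_ploidy = purity * tumour_ploidy + (1.0 - purity) * normal_ploidy
    return coverage * purity / sample_ploidy


def passes_nrpcc_filter(coverage: float, purity: float, tumour_ploidy: float,
                        threshold: float = 10.0) -> bool:
    """Apply the NRPCC >= 10 inclusion threshold described in the caption."""
    return nrpcc(coverage, purity, tumour_ploidy) >= threshold
```

Under this assumed definition, a diploid tumour at 50% purity sequenced to 60× gives an NRPCC of 15 and would be retained; lower purity or higher ploidy at the same depth reduces the value.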
Extended Data Fig. 2 Overview of the molecular timing distribution of copy number gains per chromosome.
a, Pie charts depict the distribution of the inferred mutation time for a given copy number gain in a group. Green denotes early clonal gains, with a gradient to purple for late gains. The size of each pie chart is proportional to the frequency of recurrence of each event. b, Comparison of the molecular timing of copy number gains on chromosomes 21 and 22 versus other chromosomes across all three groups. c, Comparison of the molecular timing of copy number gains on chromosome X versus other chromosomes in tumours from the EU_S group. All box plots show the median (centre line), IQR (box), and whiskers extending to 1.5× the IQR by default in ggplot2. The P-values derived from the two-sided Wilcoxon rank-sum test are shown above the box plots.
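The box plot convention used in these captions (median, IQR box, whiskers to 1.5× the IQR) can be sketched as below. This is an illustration only, assuming the ggplot2 default behaviour in which each whisker ends at the most extreme data point within 1.5× the IQR of the box edge; `boxplot_stats` is a hypothetical helper name, and the quartiles use the standard library's "inclusive" method, which corresponds to linear interpolation between order statistics.

```python
from statistics import quantiles


def boxplot_stats(values):
    """Summary statistics for a box plot with 1.5 x IQR whiskers."""
    data = sorted(values)
    if len(data) < 2:
        raise ValueError("need at least two observations")
    # Quartiles by linear interpolation between order statistics.
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    # Whiskers stop at the most extreme observations inside the fences.
    inside = [v for v in data if lower_fence <= v <= upper_fence]
    outliers = [v for v in data if v < lower_fence or v > upper_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": iqr,
        "whisker_low": inside[0],
        "whisker_high": inside[-1],
        "outliers": outliers,
    }
```

Points returned in `outliers` are those that would be drawn individually beyond the whiskers.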
Extended Data Fig. 3 Contribution of mutational signatures to major driver genes.
a, Single base substitution (SBS) mutational patterns for the driver mutations of EGFR, KRAS, and TP53. b, Distribution of all driver mutations and hotspot mutations (recurrence > 5) in EGFR, KRAS, and TP53 attributed to each mutational signature.
Extended Data Fig. 4 Dynamic SBS mutational processes during early and late tumour evolution.
Fold changes between the relative proportions of early and late clonal mutations attributed to individual SBS mutational signatures. Box plots display the median (centre line), interquartile range (box), and whiskers extending to 1.5× the IQR. Each point represents a tumour sample (n = 542) and is coloured by mutational signature. P values from two-sided Wilcoxon rank-sum tests are shown below the box plots.
Extended Data Fig. 5 Associations between tumour latency and EGFR mutation status (a), KRAS mutation status (b) and sex (c), stratified by ancestry and tobacco smoking group.
All box plots display the median (centre line), interquartile range (box), and whiskers extending to 1.5× the IQR. Sample size: n = 542. P values were calculated using two-sided Wilcoxon rank-sum tests.
Extended Data Fig. 6 Association between tumours with mutational signature ID2 and increased tumour proliferation.
a, Ingenuity pathway analysis (IPA) of differentially expressed genes in tumours with and without mutational signature ID2. The x-axis represents the Z-score, and the y-axis indicates the corresponding canonical pathways. Suppressed pathways are shown in blue, while activated pathways are shown in orange. Circle size reflects the level of significance. b–d, Gene set enrichment analysis (GSEA) highlights key proliferation-related pathways significantly associated with tumours harbouring mutational signature ID2, including hallmark mitotic spindle (b), hallmark MYC targets V2 (c), and hallmark G2M checkpoint (d). The normalized enrichment score (NES) and FDR-adjusted q-value are displayed at the bottom of each GSEA plot. e,f, Enrichment of tumour metastasis in tumours with mutational signature ID2, stratified by TP53 mutation status. Odds ratios and P-values from Fisher’s exact test are shown above the plot. g,h, Comparison of ID2 deletion burden between tumours with and without metastasis, stratified by TP53 mutation status (g) or across all tumours (n = 294) (h). All box plots show the median (centre line), IQR (box), and whiskers extending to 1.5× the IQR by default in ggplot2. The P-values derived from the two-sided Wilcoxon rank-sum test are shown above the box plots.
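The enrichment test in panels e and f can be sketched for a single 2×2 table as follows. This is a minimal illustration under stated assumptions: the function name is hypothetical, the odds ratio is the simple sample odds ratio (the caption does not say which estimator was used, and some software reports a conditional maximum-likelihood estimate instead), and the two-sided P value sums the probabilities of all tables no more likely than the observed one.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Returns (sample_odds_ratio, p_value).
    """
    if min(a, b, c, d) < 0:
        raise ValueError("counts must be non-negative")
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2
    if total == 0:
        raise ValueError("table is empty")

    def weight(k: int) -> int:
        # Unnormalised hypergeometric probability of k in the top-left cell.
        return comb(row1, k) * comb(row2, col1 - k)

    k_min, k_max = max(0, col1 - row2), min(row1, col1)
    observed = weight(a)
    # Integer arithmetic keeps the "as or less likely" comparison exact.
    extreme = sum(weight(k) for k in range(k_min, k_max + 1)
                  if weight(k) <= observed)
    p_value = extreme / comb(total, col1)

    if b * c > 0:
        odds_ratio = (a * d) / (b * c)
    else:
        odds_ratio = float("inf") if a * d > 0 else float("nan")
    return odds_ratio, p_value
```

Here `a` and `b` would be, for instance, the counts of metastatic and non-metastatic tumours with ID2, and `c` and `d` the same counts for tumours without ID2; this ordering is an assumption made for the example.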
Extended Data Fig. 7 Box plot illustrating neoantigen predictions across all mutational signatures.
Neoantigen burden is normalized as the number of neoantigens per 1000 mutations. All box plots show the median (centre line), IQR (box), and whiskers extending to 1.5× the IQR by default in ggplot2. Each data point is shown as an individual sample.
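The normalization described above is a simple rate. A minimal sketch follows, assuming that both counts refer to the same set of mutations (for example, those attributed to one signature in one tumour); the function name is hypothetical.

```python
def neoantigens_per_1000_mutations(n_neoantigens: int, n_mutations: int) -> float:
    """Neoantigen burden expressed per 1,000 mutations."""
    if n_mutations <= 0:
        raise ValueError("n_mutations must be positive")
    if n_neoantigens < 0:
        raise ValueError("n_neoantigens must be non-negative")
    return 1000.0 * n_neoantigens / n_mutations
```

Expressing the burden as a rate makes signatures that contribute very different numbers of mutations comparable on one axis.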
Extended Data Fig. 8 Validation of the association between L1 retrotransposition and mutational signature ID2 using WGS data from normal and matched colorectal cancer clones (ref. 32).
Scatter plots show Pearson correlations between the number of somatic L1 insertions and the number of ID1/ID2 mutations in normal colorectal epithelium (red), matched colorectal cancer (blue), and MUTYH-associated adenomatous polyps (orange). Spearman correlation coefficients (R) and p-values (P) are indicated in the plots. The fitted regression line (blue) and 95% confidence intervals (grey) are shown.
Extended Data Fig. 9 ZNF695 expression in single-cell RNA-seq studies.
a, Identification of lung cell types with detectable ZNF695 expression across multiple cell types from single-cell RNA-seq data in the CZ CELLxGENE database. The numbers on the right y-axis represent the number of cells identified from the single-cell studies. b, Enrichment of ZNF695 expression in alveolar type 2 proliferating (AT2pro) cells from a normal lung single-nucleus multiome dataset. The dot plot visualizes the normalized RNA expression of ZNF695 by cell type. The colour and size of each dot correspond to the scaled average expression level and fraction of expressing cells, respectively. c, Enrichment of L1 RNA expression in cells with detectable ZNF695 expression. P-values from the Wilcoxon rank-sum test are shown above the box plot.
Extended Data Fig. 10 Schematic model of canonical and non-canonical ZNF695 function in L1 regulation.
This schematic, adapted from ref. 49, illustrates ZNF695-mediated regulation of L1 retrotransposons. In normal cells, canonical KZFP transcripts encode proteins with an N-terminal KRAB domain and a C-terminal zinc finger array. The zinc finger domain binds transposable elements (TEs), while the KRAB domain recruits a heterochromatin-inducing complex—including TRIM28 (KAP1), SETDB1, the NuRD complex, HP1, and DNMTs. This complex mediates H3K9 trimethylation and directs DNMTs to methylate CpG sites in L1 promoter regions, repressing L1 retrotransposition. In tumour cells, however, non-canonical ZNF695 transcripts, lacking the zinc finger domain, predominate. This impairs TE binding and DNMT recruitment, leading to L1 promoter hypomethylation and subsequent L1 activation. Between the first and second strand synthesis by activated ORF2p (ref. 38), intermediates may be cleaved and integrated into the genome. Coupled with DNA repair mechanisms, this process may generate single-base-pair indels (e.g., ID1/ID2). This suggests a dominant-negative effect, where the balance between canonical and non-canonical transcripts governs L1 methylation status.
Supplementary information
Supplementary Information
Guide to the Supplementary Tables 1–7 and Supplementary Figs 1–36.
Supplementary Tables 1–7
Supplementary Tables 1–7.
About this article
Cite this article
Zhang, T., Zhao, W., Wirth, C. et al. Uncovering the role of LINE-1 in the evolution of lung adenocarcinoma. Nature (2025). https://doi.org/10.1038/s41586-025-09825-y
DOI: https://doi.org/10.1038/s41586-025-09825-y


