Passenger mutations link cellular origin and transcriptional identity in human lung adenocarcinomas

Abstract

DNA damage is preferentially repaired in expressed genes; thus, genome-wide correlations between somatic mutation patterns and normal cell transcription may reflect tumor cell origins. Accordingly, we found that aggregate lung adenocarcinoma (LUAD) and squamous cancer (LUSC) somatic mutation density associated most strongly with distal (alveolar) and proximal (basal) lung cell-type-specific gene expression, respectively, consistent with presumed LUAD and LUSC cell origins. Analyzing individual genomes, 21% of LUADs bore mutational footprints of proximal airway origins, with 38% classified as ambiguous. Distal origin LUADs, enriched for KRAS and STK11 drivers, occurred mainly in smokers; proximal origin LUADs, enriched for EGFR drivers, were more common in never-smokers. Ambiguous origin LUADs showed APOBEC signatures and SMARCA4 alterations. TP53 mutant LUADs with non-distal cell origins preferentially exhibited non-distal transcriptional identity. Our study reveals a complex interplay between lineage and identity in LUAD evolution and offers a scalable strategy to infer tumor origins in human cancers.

Fig. 1: Lung scRNA-seq atlas reveals transcriptional plasticity in LUAD cells.
Fig. 2: Integrating single-cell transcriptomes with passenger mutational footprints enables inference of COO.
Fig. 3: Inferring patient-specific LUAD COO from passenger mutational patterns uncovers non-distal origins.
Fig. 4: Molecular features associated with LUAD COO subgroups.
Fig. 5: Distal identity is depleted in a subset of LUADs enriched for TP53 mutations.
Fig. 6: Lineage plasticity in an ambiguous origin LUAD.

Data availability

Obtained data sources

Whole-genome sequencing (WGS) data were obtained for cases from the TCGA Research Network consortium through the Genomic Data Commons (https://portal.gdc.cancer.gov). We obtained 205 cases with appropriate dbGaP permissions (study accession, phs000178.v11.p8). Additional WGS data were obtained for 90 cases from multiple studies publicly available through the EGA (https://ega-archive.org) and dbGaP (https://dbgap.ncbi.nlm.nih.gov). These cohorts include 49 LUADs (accession no. EGAS00001002801; ref. 28), 13 LUADs and three LUSCs from Cancer Alliance (accession no. EGAS00001004013) and 25 LUADs (accession no. phs000488.v2.p1; ref. 25). The HLCA was downloaded from CellXGene (https://cellxgene.cziscience.com/collections/6f6d381a-7701-4781-935c-db10d30de293). ScRNA-seq data for the lung cancer atlas were obtained from several publicly available cohorts, including https://www.ebi.ac.uk/biostudies/arrayexpress/studies/E-MTAB-6149 (ref. 30), GEO accession GSE123904 (ref. 32), https://lungcancer.chenlulab.com (ref. 34), https://ega-archive.org/studies/EGAS00001004419 (ref. 33) and GSE133747 (ref. 31).

Generated data sources

Five LUAD samples were processed for both scRNA-seq and WGS (described in detail above). Raw count scRNA-seq data for these samples are available as Supplementary Data 1. WGS data for WCM-1 are available as Supplementary Data 2. Raw sequencing data are available upon request from the corresponding author, subject to a 4–8-week review of the request by a data access committee and the IRB. Source data are provided with this paper.

Code availability

Code for generating analyses and figures is available on GitHub (https://github.com/mskilab-org/lung_coo_2025) and Zenodo (https://doi.org/10.5281/zenodo.17243535; ref. 59).

References

  1. Visvader, J. E. Cells of origin in cancer. Nature 469, 314–322 (2011).

  2. Sainz de Aja, J., Dost, A. F. M. & Kim, C. F. Alveolar progenitor cells and the origin of lung cancer. J. Intern. Med. 289, 629–635 (2021).

  3. Ferone, G., Lee, M. C., Sage, J. & Berns, A. Cells of origin of lung cancers: lessons from mouse studies. Genes Dev. 34, 1017–1032 (2020).

  4. Rubin, M. A., Bristow, R. G., Thienger, P. D., Dive, C. & Imielinski, M. Impact of lineage plasticity to and from a neuroendocrine phenotype on progression and response in prostate and lung cancers. Mol. Cell 80, 562–577 (2020).

  5. Quintanal-Villalonga, A. et al. Lineage plasticity in cancer: a shared pathway of therapeutic resistance. Nat. Rev. Clin. Oncol. 17, 360–371 (2020).

  6. Ci, B. et al. Molecular differences across invasive lung adenocarcinoma morphological subgroups. Transl. Lung Cancer Res. 9, 1029–1040 (2020).

  7. Nicholson, A. G., Scagliotti, G., Tsao, M. S., Yatabe, Y. & Travis, W. D. 2021 WHO classification of lung cancer: a globally applicable and molecular biomarker-relevant classification. J. Thorac. Oncol. 17, e80–e83 (2022).

  8. Tomasetti, C., Vogelstein, B. & Parmigiani, G. Half or more of the somatic mutations in cancers of self-renewing tissues originate prior to tumor initiation. Proc. Natl Acad. Sci. USA 110, 1999–2004 (2013).

  9. Pleasance, E. D. et al. A comprehensive catalogue of somatic mutations from a human cancer genome. Nature 463, 191–196 (2010).

  10. Pleasance, E. D. et al. A small-cell lung cancer genome with complex signatures of tobacco exposure. Nature 463, 184–190 (2010).

  11. Imielinski, M., Guo, G. & Meyerson, M. Insertions and deletions target lineage-defining genes in human cancers. Cell 168, 460–472.e14 (2017).

  12. Haradhvala, N. J. et al. Mutational strand asymmetries in cancer genomes reveal mechanisms of DNA damage and repair. Cell 164, 538–549 (2016).

  13. Supek, F. & Lehner, B. Differential DNA mismatch repair underlies mutation rate variation across the human genome. Nature 521, 81–84 (2015).

  14. Stamatoyannopoulos, J. A. et al. Human mutation rate associated with DNA replication timing. Nat. Genet. 41, 393–395 (2009).

  15. Tomkova, M., Tomek, J., Kriaucionis, S. & Schuster-Böckler, B. Mutational signature distribution varies with DNA replication timing and strand asymmetry. Genome Biol. 19, 129 (2018).

  16. Lawrence, M. S. et al. Mutational heterogeneity in cancer and the search for new cancer-associated genes. Nature 499, 214–218 (2013).

  17. Schuster-Böckler, B. & Lehner, B. Chromatin organization is a major influence on regional mutation rates in human cancer cells. Nature 488, 504–507 (2012).

  18. Polak, P. et al. Cell-of-origin chromatin organization shapes the mutational landscape of cancer. Nature 518, 360–364 (2015).

  19. Gonzalez-Perez, A., Sabarinathan, R. & Lopez-Bigas, N. Local determinants of the mutational landscape of the human genome. Cell 177, 101–114 (2019).

  20. Pich, O. et al. Somatic and germline mutation periodicity follow the orientation of the DNA minor groove around nucleosomes. Cell 175, 1074–1087.e18 (2018).

  21. Salvadores, M., Mas-Ponte, D. & Supek, F. Passenger mutations accurately classify human tumors. PLoS Comput. Biol. 15, e1006953 (2019).

  22. Jiao, W. et al. A deep learning system accurately classifies primary and metastatic cancers using passenger mutation patterns. Nat. Commun. 11, 728 (2020).

  23. Nguyen, L., Van Hoeck, A. & Cuppen, E. Machine learning-based tissue of origin classification for cancer of unknown primary diagnostics using genome-wide mutation features. Nat. Commun. 13, 4013 (2022).

  24. Sikkema, L. et al. An integrated cell atlas of the lung in health and disease. Nat. Med. 29, 1563–1577 (2023).

  25. Imielinski, M. et al. Mapping the hallmarks of lung adenocarcinoma with massively parallel sequencing. Cell 150, 1107–1120 (2012).

  26. Carrot-Zhang, J. et al. Whole-genome characterization of lung adenocarcinomas lacking the RTK/RAS/RAF pathway. Cell Rep. 34, 108707 (2021).

  27. Collisson, E. A. et al. Comprehensive molecular profiling of lung adenocarcinoma. Nature 511, 543–550 (2014).

  28. Lee, J. J.-K. et al. Tracing oncogene rearrangements in the mutational history of lung adenocarcinoma. Cell 177, 1842–1857.e21 (2019).

  29. Hadi, K. et al. Distinct classes of complex structural variation uncovered across thousands of cancer genome graphs. Cell 183, 197–210.e32 (2020).

  30. Lambrechts, D. et al. Phenotype molding of stromal cells in the lung tumor microenvironment. Nat. Med. 24, 1277–1289 (2018).

  31. Raredon, M. S. B. et al. Single-cell connectomic analysis of adult mammalian lungs. Sci. Adv. 5, eaaw3851 (2019).

  32. Laughney, A. M. et al. Regenerative lineages and immune-mediated pruning in lung cancer metastasis. Nat. Med. 26, 259–269 (2020).

  33. Lukassen, S. et al. SARS-CoV-2 receptor ACE2 and TMPRSS2 are primarily expressed in bronchial transient secretory cells. EMBO J. 39, e105114 (2020).

  34. Zhang, L. et al. Integrated single-cell RNA sequencing analysis reveals distinct cellular and transcriptional modules associated with survival in lung cancer. Signal Transduct. Target. Ther. 7, 9 (2022).

  35. Helleday, T., Eshtad, S. & Nik-Zainal, S. Mechanisms underlying mutational signatures in human cancers. Nat. Rev. Genet. 15, 585–598 (2014).

  36. Spisak, N., de Manuel, M., Milligan, W., Sella, G. & Przeworski, M. The clock-like accumulation of germline and somatic mutations can arise from the interplay of DNA damage and repair. PLoS Biol. 22, e3002678 (2024).

  37. Newman, A. M. et al. Determining cell type abundance and expression from bulk tissues with digital cytometry. Nat. Biotechnol. 37, 773–782 (2019).

  38. Campbell, J. D. et al. Distinct patterns of somatic genome alterations in lung adenocarcinomas and squamous cell carcinomas. Nat. Genet. 48, 607–616 (2016).

  39. Kadur Lakshminarasimha Murthy, P. et al. Human distal lung maps and lineage hierarchies reveal a bipotent progenitor. Nature 604, 111–119 (2022).

  40. Hill, W. et al. Lung adenocarcinoma promotion by air pollutants. Nature 616, 159–167 (2023).

  41. Tong, X. et al. Adeno-to-squamous transition drives resistance to KRAS inhibition in LKB1 mutant lung cancer. Cancer Cell 42, 413–428.e7 (2024).

  42. Rekhtman, N. et al. SMARCA4-deficient thoracic sarcomatoid tumors represent primarily smoking-related undifferentiated carcinomas rather than primary thoracic sarcomas. J. Thorac. Oncol. 15, 231–247 (2020).

  43. Concepcion, C. P. et al. Smarca4 inactivation promotes lineage-specific transformation and early metastatic features in the lung. Cancer Discov. 12, 562–585 (2022).

  44. Petljak, M., Green, A. M., Maciejowski, J. & Weitzman, M. D. Addressing the benefits of inhibiting APOBEC3-dependent mutagenesis in cancer. Nat. Genet. 54, 1599–1608 (2022).

  45. Ettinger, D. S. et al. NCCN guidelines insights: non-small cell lung cancer, version 2.2021: featured updates to the NCCN guidelines. J. Natl Compr. Cancer Netw. 19, 254–266 (2021).

  46. Hafemeister, C. & Satija, R. Normalization and variance stabilization of single-cell RNA-seq data using regularized negative binomial regression. Genome Biol. 20, 296 (2019).

  47. Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics 25, 1754–1760 (2009).

  48. Díaz-Gay, M. et al. Assigning mutational signatures to individual samples and individual somatic mutations with SigProfilerAssignment. Bioinformatics 39, btad756 (2023).

  49. Mallick, S. et al. The Simons Genome Diversity Project: 300 genomes from 142 diverse populations. Nature 538, 201–206 (2016).

  50. Alexandrov, L. B. et al. The repertoire of mutational signatures in human cancer. Nature 578, 94–101 (2020).

  51. Wolf, F. A., Angerer, P. & Theis, F. J. SCANPY: large-scale single-cell gene expression data analysis. Genome Biol. 19, 15 (2018).

  52. Pliner, H. A., Shendure, J. & Trapnell, C. Supervised classification enables rapid annotation of cell atlases. Nat. Methods 16, 983–986 (2019).

  53. Tokheim, C. & Karchin, R. CHASMplus reveals the scope of somatic missense mutations driving human cancers. Cell Syst. 9, 9–23.e28 (2019).

  54. Sondka, Z. et al. The COSMIC Cancer Gene Census: describing genetic dysfunction across all human cancers. Nat. Rev. Cancer 18, 696–705 (2018).

  55. Suehnholz, S. P. et al. Quantifying the expanding landscape of clinical actionability for patients with cancer. Cancer Discov. 14, 49–65 (2024).

  56. Gu, Z., Eils, R. & Schlesner, M. Complex heatmaps reveal patterns and correlations in multidimensional genomic data. Bioinformatics 32, 2847–2849 (2016).

  57. Masica, D. L. et al. CRAVAT 4: Cancer-Related Analysis of Variants Toolkit. Cancer Res. 77, e35–e38 (2017).

  58. Finak, G. et al. MAST: a flexible statistical framework for assessing transcriptional changes and characterizing heterogeneity in single-cell RNA sequencing data. Genome Biol. 16, 278 (2015).

  59. Panja, S., Mantri, P., Johnson, K. E., Martinez, J. S. A. & Imielinski, M. Analysis code for inferring cell-of-origin and transcriptional identity in human lung adenocarcinoma. Zenodo https://doi.org/10.5281/zenodo.17243535 (2025).

Acknowledgements

We thank C. Kim, T. Tammela and D. Lyden for helpful discussions and the Weill Cornell Medicine Epigenomics and Histology core facilities for technical support. Project support for this research was provided in part by the Center for Translational Pathology at the Department of Pathology and Laboratory Medicine, Weill Cornell Medicine. M.I., S.P., P.M. and H.T. were supported by National Institutes of Health (NIH) award R37CA229861 to M.I. In addition, M.I., J.S.A.-M., A.D., J.R. and H.T. were supported by Weill Cornell Medicine Department of Pathology and Laboratory Medicine startup funds, NYU Perlmutter Cancer Center startup funds, a Pershing Square Sohn Cancer Prize and a Burroughs Wellcome Fund Career Award for Medical Scientists awarded to M.I. K.E.J. was supported by a National Science Foundation Graduate Research Fellowship (1746886).

Author information

Contributions

M.I. designed and supervised the study. S.P., P.M., J.S.A.-M. and K.E.J. performed experiments and data analysis with contributions from M.I. and A.D. S.B., K.O. and J.M.M. coordinated sample collection and reviewed histopathology. M.S. and P.S. performed scRNA-seq experiments. H.T., M.S., J.S.A.-M. and K.E.J. performed sample processing and library preparation for scRNA-seq and WGS. S.P., P.M., J.S.A.-M. and K.E.J. performed data curation, scRNA-seq analysis, cancer genomics analyses and simulations, with assistance from A.L. on driver mutation analysis. S.R.Y. and W.T. conducted the expert pathology review. K.E.J., S.B., J.R. and J.M.M. performed immunohistochemistry analysis. S.P., P.M., K.E.J., J.S.A.-M., A.D., P.P. and M.I. interpreted data. S.P., P.M., J.S.A.-M., K.E.J. and M.I. wrote the manuscript with comments from all authors.

Corresponding author

Correspondence to Marcin Imieliński.

Ethics declarations

Competing interests

M.I. reports receiving personal or consultancy fees from ImmPACT Bio outside of the scope of the submitted work. M.S. and P.S. report receiving personal fees from 10× Genomics outside of the scope of the submitted work. P.P. reports receiving personal fees from C2I-Genomics outside of the scope of the submitted work. S.R.Y. reports receiving speaking fees from Medscape, Onclive, Medical Learning Institute, PRIME Education, AstraZeneca and Roche outside of the scope of the submitted work. S.R.Y. reports consultancy fees from AstraZeneca, AbbVie, Merus, Eli Lilly, Boehringer Ingelheim, Roche, Amgen and Sanofi outside of the scope of the submitted work. The remaining authors declare no competing interests.

Peer review

Peer review information

Nature Genetics thanks the anonymous reviewer(s) for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Identification of carcinoma cells.

(a) Average gene expression of selected cell-type-specific markers across lung cell atlas scRNA-seq clusters (see Fig. 1b). Each circle i,j represents the expression of gene i in cell cluster j. The size of each circle represents the percentage of cells in cluster j expressing gene i. The color represents the average expression of gene i in cluster j. (b) Scatter plot of Shannon’s diversity index (quantifying uniformity of cluster membership across patient samples) and cluster-level aneuploidy score used to label epithelial cell (EPCAM+) clusters as carcinoma vs. non-carcinoma (see Methods and Supplementary Note 5). Carcinoma cells belong to EPCAM+ clusters with low diversity (Shannon’s diversity index < 10) and high aneuploidy (aneuploidy score > 40; thresholds indicated by dotted lines in the plot). (c) Heatmap of chromosome arm-level aneuploidy scores (average of normalized gene expression across each chromosome arm; see Methods and Supplementary Note 5 for details) showing gains (red) and losses (blue) across individual cells from EPCAM+ clusters (rows). Bars (right) delineate carcinoma and non-carcinoma labels for clusters to which cells were assigned.
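
The labeling rule in panel (b) can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' code: the function names, the natural-log form of Shannon's index and the toy patient labels are all hypothetical, and the index and aneuploidy score used in the paper may be scaled differently, so the thresholds of 10 and 40 are simply passed through as parameters.

```python
import math
from collections import Counter

def shannon_diversity(sample_labels):
    """Shannon's index H = -sum(p * ln p) over the patient samples found in one cluster."""
    counts = Counter(sample_labels)
    total = sum(counts.values())
    return -sum((n / total) * math.log(n / total) for n in counts.values())

def label_cluster(diversity, aneuploidy, max_diversity=10.0, min_aneuploidy=40.0):
    """Call an EPCAM+ cluster carcinoma when diversity is low AND aneuploidy is high."""
    if diversity < max_diversity and aneuploidy > min_aneuploidy:
        return "carcinoma"
    return "non-carcinoma"

# Hypothetical clusters: one patient label per cell.
patient_specific = ["P1"] * 95 + ["P2"] * 5    # dominated by a single patient
well_mixed = ["P1", "P2", "P3", "P4"] * 25     # evenly shared across patients

print(shannon_diversity(patient_specific) < shannon_diversity(well_mixed))
print(label_cluster(diversity=1.0, aneuploidy=50.0))
```

A cluster drawn almost entirely from one patient has a lower index than a cluster shared evenly across patients, which is the intuition behind treating low diversity as a carcinoma feature.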

Extended Data Fig. 2 Hierarchical representation of epithelial cell-type composition across five levels of annotation in the healthy human lung atlas.

Hierarchy of human lung cell atlas cell types across five levels, from broadest (level 1) to most detailed (finest level). The “finest level” annotation comprises 23 cell types. Cell types in level 4 and “finest level” are denoted: b = bronchial, n = nasal, nn = non-nasal, ss = subsegmental, SMG = submucosal gland, TB = terminal bronchiole, and AT = alveolar type.

Extended Data Fig. 3 Benchmarking label transfer and COO inference.

(a) Bar plot showing label concordance between auto-encoder-based label transfer to the human lung cell atlas (scANVI) and marker-based cell annotation (Garnett) for benign cells from tumor and adjacent normal tissue samples (see Methods). (b) Gene-specific SNV density (SNVs/Mbp) in LUAD (n = 242 independent tumor/normal pairs) and LUSC (n = 53 independent tumor/normal pairs). Densities are shown across tertiles of AT2 and basal-resting gene expression. (c) Gene-specific LUAD SNV density of tobacco, aging and APOBEC SNVs (SNVs/Mbp) across three tertiles of average lung epithelial cell gene expression (n = 242, as in b). (d) Accuracy of COO inference (see Methods) across levels of the HLCA lung epithelial cell hierarchy (see Extended Data Fig. 2). Cell types are denoted: b = bronchial, n = nasal, nn = non-nasal, ss = subsegmental, SMG = submucosal gland, TB = terminal bronchiole, and AT = alveolar type. (e) Accuracy of COO inference at various levels of cell type resolution (see Extended Data Fig. 2) and tumor mutational burden (TMB) (for a single tumor sample and/or aggregated cohort). Error bars in b,c represent standard error of the mean. Accuracy for d and e was calculated as the fraction of simulations in which the inferred COO matched the true COO at the specified cell type taxonomy level (see Methods).
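
Two quantities in this caption lend themselves to a short worked sketch: SNV density per expression tertile (panels b and c) and accuracy at a chosen taxonomy level (panels d and e). The Python below is illustrative only. The function names, the equal-count tertile split, the toy gene tuples and the fine-to-coarse label mapping are assumptions and are not taken from the paper's code.

```python
def snv_density_by_tertile(genes):
    """genes: list of (expression, snv_count, length_bp) tuples.
    Returns SNVs/Mbp for the low, middle and high expression tertiles."""
    ranked = sorted(genes, key=lambda g: g[0])
    n = len(ranked)
    bounds = [0, n // 3, 2 * n // 3, n]
    densities = []
    for lo, hi in zip(bounds, bounds[1:]):
        group = ranked[lo:hi]
        snvs = sum(g[1] for g in group)
        mbp = sum(g[2] for g in group) / 1e6
        densities.append(snvs / mbp)
    return densities

def coo_accuracy(true_types, inferred_types, to_level):
    """Fraction of simulations in which inferred and true COO agree
    after both labels are mapped to the same (coarser) taxonomy level."""
    hits = sum(to_level[t] == to_level[p] for t, p in zip(true_types, inferred_types))
    return hits / len(true_types)

# Hypothetical genes: (expression, SNV count, gene length in bp).
toy_genes = [(1, 10, 1_000_000), (2, 8, 1_000_000), (3, 6, 1_000_000),
             (4, 6, 1_000_000), (5, 3, 1_000_000), (6, 1, 1_000_000)]
print(snv_density_by_tertile(toy_genes))

# Hypothetical mapping from fine cell types to a coarse level.
coarse = {"AT1": "alveolar", "AT2": "alveolar",
          "Basal resting": "airway", "Suprabasal": "airway"}
truth = ["AT2", "AT2", "Basal resting", "AT1"]
calls = ["AT1", "AT2", "Suprabasal", "Basal resting"]
print(coo_accuracy(truth, calls, coarse))
```

Scoring the same calls against an identity mapping (each fine label mapped to itself) gives a lower accuracy, which mirrors why accuracy in panels d and e depends on the taxonomy level.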

Extended Data Fig. 4 Inferring patient-specific LUSC COO from passenger mutational patterns uncovers proximal origins.

(a) Hierarchically clustered heatmap showing association between benign cell-type-specific gene expression and SNV density across individual LUSC WGS samples (n = 53 independent tumor/normal pairs). Each heatmap pixel i,j represents the strength of correlation (relative risk, RR) for tumor sample i and benign lung cell type j, with values below 1 (blue) representing anti-correlation. (b) Bar plot of regression results for benign cell-type-specific gene expression vs. SNV counts across the aggregated mutation calls from each of the LUSC clusters 1–3 (n = 6, 25 and 22 cases, respectively). Lime green bars represent proximal cell types, and orange bars represent distal cell types. RR and 95% confidence interval (error bars) from the maximum likelihood regression fit are plotted for each aggregate cluster sample and benign lung cell type combination. Error bars represent 95% confidence intervals on the Bernoulli trial parameter. Cell types are labeled with the following abbreviations: b = bronchial, n = nasal, nn = non-nasal, ss = subsegmental, SMG = submucosal gland, TB = terminal bronchiole, and AT = alveolar type.
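
To make the RR reading concrete, the sketch below fits a maximum likelihood regression of per-gene SNV counts on expression and reports RR with a 95% interval. The caption does not specify the model, so the Poisson form with a log gene-length offset, the Wald interval, the function names and the toy data are all assumptions made for illustration. An RR below 1 means SNV density falls as expression rises.

```python
import math

def _information(expression, mu):
    """Entries of the 2x2 Fisher information matrix for (b0, b1)."""
    h00 = sum(mu)
    h01 = sum(x * m for x, m in zip(expression, mu))
    h11 = sum(x * x * m for x, m in zip(expression, mu))
    return h00, h01, h11

def poisson_rr(expression, snv_counts, lengths_mbp, iters=50, tol=1e-10):
    """Fit log E[count] = log(length) + b0 + b1 * expression by Newton-Raphson.
    Returns (RR, lower, upper), where RR = exp(b1) with a 95% Wald interval."""
    b0 = math.log(sum(snv_counts) / sum(lengths_mbp))  # start at the overall rate
    b1 = 0.0
    for _ in range(iters):
        mu = [l * math.exp(b0 + b1 * x) for x, l in zip(expression, lengths_mbp)]
        # Score vector: gradient of the Poisson log-likelihood.
        g0 = sum(y - m for y, m in zip(snv_counts, mu))
        g1 = sum(x * (y - m) for x, y, m in zip(expression, snv_counts, mu))
        h00, h01, h11 = _information(expression, mu)
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system (information matrix) * delta = score.
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < tol:
            break
    mu = [l * math.exp(b0 + b1 * x) for x, l in zip(expression, lengths_mbp)]
    h00, h01, h11 = _information(expression, mu)
    se = math.sqrt(h00 / (h00 * h11 - h01 * h01))  # standard error of b1
    return math.exp(b1), math.exp(b1 - 1.96 * se), math.exp(b1 + 1.96 * se)

# Toy, noise-free input: density halves with each unit of expression.
rr, lower, upper = poisson_rr([0, 1, 2, 3], [10, 5, 2.5, 1.25], [1.0, 1.0, 1.0, 1.0])
print(rr, lower, upper)
```

On this toy input the fitted RR is about 0.5, that is, each unit increase in expression halves the expected SNV density.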

Extended Data Fig. 5 Relationship between identity, origin, histology and TP53 mutations in LUAD.

(a) Bar plot comparing the fraction of cases with distal/non-distal lineage plasticity (that is, those with distal COO and non-distal identity or vice versa) grouped by inferred LUAD COO and TP53 mutation status. (b) Alluvial plot linking origin, identity and histology with TP53 wild-type (WT) and mutant status across n = 75 LUAD cases. Bar height is proportional to the number of cases with the given feature, and ribbons indicate the number of cases with the given feature pair. (c,d) Bar plots comparing fraction of (c) papillary histology and (d) NSCLC, NOS histology between non-distal (ND, n = 34 cases) and distal (D, n = 41 cases) identity groups. (e) Bar plots comparing fraction of NSCLC, NOS between TP53 mutant (n = 44) and wild-type (n = 31) groups. (f) Bar plots comparing fraction of TP53 mutant NSCLC, NOS (n = 19 cases) and other TP53 mutant histologies (n = 56 cases). (g) Oncoprint of genomic alterations in LUAD, LUSC and Pan NSCLC drivers in NSCLC-NOS tumor samples. Error bars in a,c–f represent 95% confidence intervals on the Bernoulli trial parameter. P values in c–f obtained by two-sided Fisher’s exact test.
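
The statistics named in this caption can be reproduced with the standard library alone. The sketch below is a hedged illustration: the two-sided Fisher's exact test sums the probabilities of all tables no more likely than the observed one (a common convention), and the Wilson score interval stands in for the unspecified method behind the "95% confidence intervals on the Bernoulli trial parameter". The counts are hypothetical and only the group sizes (34 and 41) come from the caption.

```python
import math

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test P value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d

    def prob(k):
        # Hypergeometric probability of observing k in the top-left cell.
        return math.comb(row1, k) * math.comb(row2, col1 - k) / math.comb(n, col1)

    p_obs = prob(a)
    k_min, k_max = max(0, col1 - row2), min(row1, col1)
    total = sum(prob(k) for k in range(k_min, k_max + 1)
                if prob(k) <= p_obs * (1 + 1e-9))  # tolerance for float ties
    return min(1.0, total)

def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a Bernoulli proportion (assumed method)."""
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half

# Hypothetical counts: cases with a given histology in ND (n = 34) vs D (n = 41).
nd_yes, nd_total = 5, 34
d_yes, d_total = 15, 41
print(fisher_exact_two_sided(nd_yes, nd_total - nd_yes, d_yes, d_total - d_yes))
print(wilson_interval(nd_yes, nd_total), wilson_interval(d_yes, d_total))
```

The Wilson interval is used here because it stays within [0, 1] for small groups; a Clopper–Pearson interval would be an equally plausible reading of the caption.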

Extended Data Fig. 6 Identification of two distinct carcinoma cell populations in WCM-1.

(a) Euclidean distance of the regression result vector (gene-wise cell type expression vs. SNV density) for WCM-1 to the regression vector centroids for distal, proximal and ambiguous groups (Fig. 3; see Methods). (b,c) UMAP projection highlighting WCM-1 carcinoma cells (b) and the two major WCM-1 carcinoma cell clusters (WCM-1-A, n = 618 cells; WCM-1-B, n = 1,297 cells) (c) against the backdrop of scRNA-seq transcriptomes of carcinoma cells from other patients’ tumor samples.
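
The comparison in panel (a) amounts to a nearest-centroid calculation. A minimal Python sketch follows, assuming each case is summarized by a vector of per-cell-type regression results; the vectors, their length and the function names are hypothetical and chosen only to show the arithmetic.

```python
import math

def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def nearest_group(sample, groups):
    """Return (closest group name, {group: Euclidean distance to its centroid})."""
    distances = {name: math.dist(sample, centroid(vs)) for name, vs in groups.items()}
    return min(distances, key=distances.get), distances

# Hypothetical two-dimensional regression vectors for the three COO groups.
groups = {
    "distal": [[0.8, 1.2], [0.9, 1.1]],
    "proximal": [[1.2, 0.8], [1.1, 0.9]],
    "ambiguous": [[1.0, 1.0], [1.0, 1.0]],
}
label, distances = nearest_group([1.02, 0.99], groups)
print(label, distances)
```

Reporting all three distances, rather than only the nearest label, matches the panel, which shows how close the case lies to each group.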

Supplementary information

Supplementary Information

Supplementary Notes.

Reporting Summary

Supplementary Tables

Supplementary Table 1 contains lung cancer single-cell atlas cases; Supplementary Table 2 contains cell-type-specific markers; Supplementary Table 3 contains LUAD WGS cases; Supplementary Table 4 contains antibody information; Supplementary Table 5 contains reagents and instruments used in the study.

Supplementary Data 1

Zip file with contents: highly_variable_genes.txt, most highly variable genes across lung epithelial cell types; WCMcounts.csv, scRNA-seq counts matrix for LUAD patients profiled as part of this study; and WCMMetaData.csv, metadata for LUAD patients profiled as part of this study.

Supplementary Data 2

Zip file with contents: WCM-1_non-apobec_data.csv, per-gene counts of non-APOBEC mutations in LUAD patient WCM-1.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Panja, S., Mantri, P., Johnson, K.E. et al. Passenger mutations link cellular origin and transcriptional identity in human lung adenocarcinomas. Nat Genet 57, 3066–3074 (2025). https://doi.org/10.1038/s41588-025-02418-5

  • DOI: https://doi.org/10.1038/s41588-025-02418-5
