  • Article

EpiAgent: foundation model for single-cell epigenomics

Abstract

Although single-cell assay for transposase-accessible chromatin using sequencing (scATAC-seq) enables the exploration of the epigenomic landscape that governs transcription at the cellular level, the complicated characteristics of the sequencing data and the broad scope of downstream tasks mean that a sophisticated and versatile computational method is urgently needed. Here we introduce EpiAgent, a foundation model pretrained on our manually curated large-scale Human-scATAC-Corpus. EpiAgent encodes chromatin accessibility patterns of cells as concise ‘cell sentences’ and captures cellular heterogeneity behind regulatory networks via bidirectional attention. Comprehensive benchmarks show that EpiAgent excels in typical downstream tasks, including unsupervised feature extraction, supervised cell type annotation and data imputation. By incorporating external embeddings, EpiAgent enables effective cellular response prediction for both out-of-sample stimulated and unseen genetic perturbations, reference data integration and query data mapping. Through in silico knockout of cis-regulatory elements, EpiAgent demonstrates the potential to model cell state changes. EpiAgent is further extended to directly annotate cell types in a zero-shot manner.
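As a rough illustration of the 'cell sentence' idea and of in silico cCRE knockout mentioned in the abstract, the toy sketch below turns a binary accessibility vector into an ordered list of cCRE indices and then removes selected elements from it. The ranking by a per-cCRE weight, the function names `cell_sentence` and `knock_out`, and the example values are all assumptions made for illustration. They are not taken from the EpiAgent codebase, and the actual tokenization may differ.

```python
from typing import List, Optional, Sequence


def cell_sentence(
    accessibility: Sequence[int],
    weights: Sequence[float],
    max_len: Optional[int] = None,
) -> List[int]:
    """Return indices of accessible cCREs, ordered by descending weight.

    `weights` is a hypothetical per-cCRE importance score (an assumption,
    not EpiAgent's documented scheme). Ties are broken by cCRE index.
    """
    accessible = [i for i, a in enumerate(accessibility) if a > 0]
    ranked = sorted(accessible, key=lambda i: (-weights[i], i))
    return ranked if max_len is None else ranked[:max_len]


def knock_out(sentence: Sequence[int], ccre_ids: Sequence[int]) -> List[int]:
    """Simulate an in silico knockout by dropping the given cCRE tokens."""
    removed = set(ccre_ids)
    return [token for token in sentence if token not in removed]


# Toy cell with six candidate cCREs; 1 marks an accessible element.
accessibility = [1, 0, 1, 1, 0, 1]
weights = [0.2, 0.9, 0.7, 0.1, 0.5, 0.7]

sentence = cell_sentence(accessibility, weights)
perturbed = knock_out(sentence, [5])
print(sentence)
print(perturbed)
```

In this toy setting, a model would be given `sentence` and `perturbed` as two inputs, and the difference between its outputs would serve as a proxy for the effect of removing cCRE 5.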

Fig. 1: Overview of EpiAgent.
Fig. 2: Typical downstream tasks for analyzing scATAC-seq data using EpiAgent.
Fig. 3: Prediction of cellular responses to out-of-sample stimulated and unseen genetic perturbations using EpiAgent.
Fig. 4: Reference data integration and query data mapping using EpiAgent.
Fig. 5: In silico cCRE knockouts through EpiAgent to simulate cell state.
Fig. 6: Direct annotation on newly sequenced datasets using supervised EpiAgent-B and EpiAgent-NT models.

Data availability

All data used in this study, including pretraining datasets and those used for downstream analyses, are available in the ensemble database (ref. 79). The pretraining data in Human-scATAC-Corpus include 27 published datasets (refs. 6–8, 38, 42, 51–72) and an integrated dataset from 30 PBMC samples from the 10x Genomics website (https://www.10xgenomics.com/datasets/). All of these datasets are publicly accessible, and detailed statistics and data availability are provided in Supplementary Table 1. Additional datasets (refs. 3–5, 30, 31, 42, 45, 46) not used for pretraining, but used for evaluating EpiAgent in downstream analyses, are also publicly available, with detailed statistics and data availability provided in Supplementary Table 5. Source data are provided with this paper.

Code availability

EpiAgent is freely available on GitHub (https://github.com/xy-chen16/EpiAgent) and in the Zenodo repository (ref. 80; https://doi.org/10.5281/zenodo.16562787).

References

  1. Klemm, S. L., Shipony, Z. & Greenleaf, W. J. Chromatin accessibility and the regulatory epigenome. Nat. Rev. Genet. 20, 207–220 (2019).

  2. Minnoye, L. et al. Chromatin accessibility profiling methods. Nat. Rev. Methods Primers 1, 10 (2021).

  3. Buenrostro, J. D. et al. Integrated single-cell analysis maps the continuous regulatory landscape of human hematopoietic differentiation. Cell 173, 1535–1548 (2018).

  4. Ameen, M. et al. Integrative single-cell analysis of cardiogenesis identifies developmental trajectories and non-coding mutations in congenital heart disease. Cell 185, 4937–4953 (2022).

  5. Terekhanova, N. V. et al. Epigenetic regulation during cancer transitions across 11 tumour types. Nature 623, 432–441 (2023).

  6. Domcke, S. et al. A human cell atlas of fetal chromatin accessibility. Science 370, eaba7612 (2020).

  7. Zhang, K. et al. A single-cell atlas of chromatin accessibility in the human genome. Cell 184, 5985–6001 (2021).

  8. Li, Y. E. et al. A comparative atlas of single-cell chromatin accessibility in the human brain. Science 382, eadf7044 (2023).

  9. Stuart, T., Srivastava, A., Madad, S., Lareau, C. A. & Satija, R. Single-cell chromatin state analysis with Signac. Nat. Methods 18, 1333–1341 (2021).

  10. Danese, A. et al. EpiScanpy: integrated single-cell epigenomic analysis. Nat. Commun. 12, 5228 (2021).

  11. Bravo Gonzalez-Blas, C. et al. cisTopic: cis-regulatory topic modeling on single-cell ATAC-seq data. Nat. Methods 16, 397–400 (2019).

  12. Xiong, L. et al. SCALE method for single-cell ATAC-seq analysis via latent feature extraction. Nat. Commun. 10, 4576 (2019).

  13. Ashuach, T., Reidenbach, D. A., Gayoso, A. & Yosef, N. PeakVI: a deep generative model for single-cell chromatin accessibility analysis. Cell Rep. Methods 2, 100182 (2022).

  14. Xiong, L. et al. Online single-cell data integration through projecting heterogeneous datasets into a common cell-embedding space. Nat. Commun. 13, 6118 (2022).

  15. Yuan, H. & Kelley, D. R. scBasset: sequence-based modeling of single-cell ATAC-seq using convolutional neural networks. Nat. Methods 19, 1088–1096 (2022).

  16. Cui, X. et al. Discrete latent embedding of single-cell chromatin accessibility sequencing data for uncovering cell heterogeneity. Nat. Comput. Sci. 4, 346–359 (2024).

  17. Chen, X. et al. Cell type annotation of single-cell chromatin accessibility data via supervised Bayesian embedding. Nat. Mach. Intell. 4, 116–126 (2022).

  18. Ma, W., Lu, J. & Wu, H. Cellcano: supervised cell type identification for single cell ATAC-seq data. Nat. Commun. 14, 1864 (2023).

  19. Zeng, Y. et al. Deciphering cell types by integrating scATAC-seq data with genome sequences. Nat. Comput. Sci. 4, 285–298 (2024).

  20. Li, Z. et al. Chromatin-accessibility estimation from single-cell ATAC-seq data with scOpen. Nat. Commun. 12, 6386 (2021).

  21. Tang, S. et al. scCASE: accurate and interpretable enhancement for single-cell chromatin accessibility sequencing data. Nat. Commun. 15, 1629 (2024).

  22. Han, X. et al. Pre-trained models: past, present and future. AI Open 2, 225–250 (2021).

  23. Theodoris, C. V. et al. Transfer learning enables predictions in network biology. Nature 618, 616–624 (2023).

  24. Cui, H. et al. scGPT: toward building a foundation model for single-cell multi-omics using generative AI. Nat. Methods 21, 1470–1480 (2024).

  25. Hao, M. et al. Large-scale foundation model on single-cell transcriptomics. Nat. Methods 21, 1481–1491 (2024).

  26. Yang, X. et al. GeneCompass: deciphering universal gene regulatory mechanisms with a knowledge-informed cross-species foundation model. Cell Res. 34, 830–845 (2024).

  27. Lotfollahi, M. Toward learning a foundational representation of cells and genes. Nat. Methods 21, 1416–1417 (2024).

  28. Dao, T. FlashAttention-2: faster attention with better parallelism and work partitioning. Preprint at https://arxiv.org/abs/2307.08691 (2023).

  29. Heumos, L. et al. Best practices for single-cell analysis across modalities. Nat. Rev. Genet. 24, 550–572 (2023).

  30. Kanemaru, K. et al. Spatially resolved multiomics of human cardiac niches. Nature 619, 801–810 (2023).

  31. Li, J. et al. Divergent single cell transcriptome and epigenome alterations in ALS and FTD patients with C9orf72 mutation. Nat. Commun. 14, 5714 (2023).

  32. Long, Z. et al. Single-cell multiomics analysis reveals regulatory programs in clear cell renal cell carcinoma. Cell Discov. 8, 68 (2022).

  33. Wolf, F. A., Angerer, P. & Theis, F. J. SCANPY: large-scale single-cell gene expression data analysis. Genome Biol. 19, 15 (2018).

  34. Moon, K. R. et al. Visualizing structure and transitions in high-dimensional biological data. Nat. Biotechnol. 37, 1482–1492 (2019).

  35. Xiao, Y. et al. Tracking single-cell evolution using clock-like chromatin accessibility loci. Nat. Biotechnol. 43, 784–798 (2025).

  36. Abdelaal, T. et al. A comparison of automatic cell identification methods for single-cell RNA sequencing data. Genome Biol. 20, 194 (2019).

  37. Ma, W., Su, K. & Wu, H. Evaluation of some aspects in supervised cell type identification for single-cell RNA-seq: classifier, feature selection, and reference construction. Genome Biol. 22, 264 (2021).

  38. Lee, A. J. et al. Characterization of altered molecular mechanisms in Parkinson’s disease through cell type-resolved multiomics analyses. Sci. Adv. 9, eabo2467 (2023).

  39. McLean, C. Y. et al. GREAT improves functional interpretation of cis-regulatory regions. Nat. Biotechnol. 28, 495–501 (2010).

  40. Peidli, S. et al. scPerturb: harmonized single-cell perturbation data. Nat. Methods 21, 531–540 (2024).

  41. Harris, M. A. et al. The Gene Ontology (GO) database and informatics resource. Nucleic Acids Res. 32, D258–D261 (2004).

  42. Lareau, C. A. et al. Droplet-based combinatorial indexing for massive-scale single-cell chromatin accessibility. Nat. Biotechnol. 37, 916–924 (2019).

  43. Lotfollahi, M., Wolf, F. A. & Theis, F. J. scGen predicts single-cell perturbation responses. Nat. Methods 16, 715–721 (2019).

  44. Jiang, Q., Chen, S., Chen, X. & Jiang, R. scPRAM accurately predicts single-cell gene expression perturbation response based on attention mechanism. Bioinformatics 40, btae265 (2024).

  45. Pierce, S. E., Granja, J. M. & Greenleaf, W. J. High-throughput single-cell chromatin accessibility CRISPR screens enable unbiased identification of regulatory networks in cancer. Nat. Commun. 12, 2969 (2021).

  46. Liscovitch-Brauer, N. et al. Profiling the genetic determinants of chromatin accessibility with scalable single-cell CRISPR screens. Nat. Biotechnol. 39, 1270–1277 (2021).

  47. Roohani, Y., Huang, K. & Leskovec, J. Predicting transcriptional outcomes of novel multigene perturbations with GEARS. Nat. Biotechnol. 42, 927–935 (2024).

  48. Argelaguet, R., Cuomo, A. S. E., Stegle, O. & Marioni, J. C. Computational principles and challenges in single-cell data integration. Nat. Biotechnol. 39, 1202–1215 (2021).

  49. Fu, X. et al. A foundation model of transcription across human cell types. Nature 637, 965–973 (2025).

  50. Yang, Z. et al. Multiomic foundation model predicts epigenetic regulation by zero-shot. Preprint at bioRxiv https://doi.org/10.1101/2024.12.19.629561 (2024).

  51. Mannens, C. C. et al. Chromatin accessibility during human first-trimester neurodevelopment. Nature https://doi.org/10.1038/s41586-024-07234-1 (2024).

  52. Garcia-Alonso, L. et al. Single-cell roadmap of human gonadal development. Nature 607, 540–547 (2022).

  53. Hocker, J. D. et al. Cardiac cell type-specific gene regulatory programs and disease risk association. Sci. Adv. 7, eabf1444 (2021).

  54. Yoshimura, Y. et al. A single-cell multiomic analysis of kidney organoid differentiation. Proc. Natl Acad. Sci. USA 120, e2219699120 (2023).

  55. Satpathy, A. T. et al. Massively parallel single-cell chromatin landscapes of human immune cell development and intratumoral T cell exhaustion. Nat. Biotechnol. 37, 925–936 (2019).

  56. Wang, S. K. et al. Single-cell multiome of the human retina and deep learning nominate causal variants in complex eye diseases. Cell Genom. 2, 100164 (2022).

  57. Jin, C. et al. Molecular and genetic insights into human ovarian aging from single-nuclei multi-omics analyses. Nat. Aging 5, 275–290 (2025).

  58. Zhang, Z. et al. Single nucleus transcriptome and chromatin accessibility of postmortem human pituitaries reveal diverse stem cell regulatory mechanisms. Cell Rep. 38, 110467 (2022).

  59. Muto, Y. et al. Single cell transcriptional and chromatin accessibility profiling redefine cellular heterogeneity in the adult human kidney. Nat. Commun. 12, 2190 (2021).

  60. Granja, J. M. et al. Single-cell multiomic analysis identifies regulatory programs in mixed-phenotype acute leukemia. Nat. Biotechnol. 37, 1458–1465 (2019).

  61. Wang, J. et al. Single-cell multiomics of the human retina reveals hierarchical transcription factor collaboration in mediating cell type-specific effects of genetic variants on gene regulation. Genome Biol. 24, 269 (2023).

  62. Liang, Q. et al. A multi-omics atlas of the human retina at single-cell resolution. Cell Genom. 3, 100298 (2023).

  63. Herring, C. A. et al. Human prefrontal cortex gene regulatory dynamics from gestation to adulthood at single-cell resolution. Cell 185, 4428–4447 (2022).

  64. Ziffra, R. S. et al. Single-cell epigenomics reveals mechanisms of human cortical development. Nature 598, 205–213 (2021).

  65. Corces, M. R. et al. Single-cell epigenomic analyses implicate candidate causal variants at inherited risk loci for Alzheimer’s and Parkinson’s diseases. Nat. Genet. 52, 1158–1168 (2020).

  66. Morabito, S. et al. Single-nucleus chromatin accessibility and transcriptomic characterization of Alzheimer’s disease. Nat. Genet. 53, 1143–1155 (2021).

  67. Ma, S. et al. Molecular and cellular evolution of the primate dorsolateral prefrontal cortex. Science 377, eabo7257 (2022).

  68. Zhu, K. et al. Multi-omic profiling of the developing human cerebral cortex at the single-cell level. Sci. Adv. 9, eadg3754 (2023).

  69. Trevino, A. E. et al. Chromatin and gene-regulatory dynamics of the developing human cerebral cortex at single-cell resolution. Cell 184, 5053–5069 (2021).

  70. Wang, A. et al. Single-cell multiomic profiling of human lungs reveals cell-type-specific and age-dynamic control of SARS-CoV2 host genes. eLife 9, e62522 (2020).

  71. Chiou, J. et al. Single-cell chromatin accessibility identifies pancreatic islet cell type- and state-specific regulatory programs of diabetes risk. Nat. Genet. 53, 455–466 (2021).

  72. Duong, T. E. et al. A single-cell regulatory map of postnatal lung alveologenesis in humans and mice. Cell Genom. 2, 100108 (2022).

  73. Luecken, M. D. et al. Benchmarking atlas-level data integration in single-cell genomics. Nat. Methods 19, 41–50 (2022).

  74. Cao, Z. J. & Gao, G. Multi-omics single-cell data integration and regulatory inference with graph-linked embedding. Nat. Biotechnol. 40, 1458–1466 (2022).

  75. Klingler, H. C. [Energy ablative therapy of renal tumours.] Urologe A 46, 485–486, 488–490, 492–495 (2007).

  76. Neural optimal transport predicts perturbation responses at the single-cell level. Nat. Methods 20, 1639–1640 (2023).

  77. Wei, X., Dong, J. & Wang, F. scPreGAN, a deep generative model for predicting the response of single-cell expression to perturbation. Bioinformatics 38, 3377–3384 (2022).

  78. Büttner, M., Miao, Z., Wolf, F. A., Teichmann, S. A. & Theis, F. J. A test metric for assessing single-cell RNA-seq batch correction. Nat. Methods 16, 43–49 (2019).

  79. Chen, X. et al. Human-scATAC-Corpus: a comprehensive database of scATAC-seq data. Preprint at bioRxiv https://doi.org/10.1101/2025.09.05.674505 (2025).

  80. Chen, X. Codebase for EpiAgent: foundation model for single-cell epigenomics. Zenodo https://doi.org/10.5281/zenodo.16562787 (2025).

Acknowledgements

This work was partly supported by the National Key Research and Development Program of China, grant numbers 2023YFF1204802 (R.J.), 2025YFC3409300 (R.J.), 2022YFF1202400 (H.L.) and 2021YFF1200902 (R.J.); the National Natural Science Foundation of China, grant number 62273194 (R.J.); and the Beijing Natural Science Foundation, grant number L242026 (R.J.).

Author information

Contributions

R.J. conceived the study and supervised the project. X. Chen collected and processed all data in Human-scATAC-Corpus and downstream analyses, and designed, implemented and validated EpiAgent. K.L. assisted in analyzing the results for data imputation, prediction of unseen genetic perturbations, and in silico treatment. X. Cui contributed to analyzing the results for unsupervised feature extraction and reference data integration. Z.W. helped with analyzing the results for supervised cell type annotation and validation of EpiAgent-B and EpiAgent-NT. Q.J. aided in analyzing the results for prediction of out-of-sample stimulated perturbations. J.L. contributed to designing the pretraining tasks. Z.L., Z.G. and H.L. helped with implementation and analysis of EpiAgent. X. Chen, K.L. and R.J. wrote the manuscript, with input from all of the authors.

Corresponding author

Correspondence to Rui Jiang.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Methods thanks the anonymous reviewers for their contribution to the peer review of this work. Primary Handling Editor: Rita Strack, in collaboration with the Nature Methods team.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Supplementary Notes 1–17, Supplementary Figs. 1–47.

Reporting Summary

Supplementary Tables

Supplementary Tables 1–5.

Source data

Source Data Figs. 2–6

Source data for Figs. 2–6.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Chen, X., Li, K., Cui, X. et al. EpiAgent: foundation model for single-cell epigenomics. Nat Methods 22, 2316–2327 (2025). https://doi.org/10.1038/s41592-025-02822-z

  • DOI: https://doi.org/10.1038/s41592-025-02822-z
