EpiAgent: foundation model for single-cell epigenomics

Abstract

Although single-cell assay for transposase-accessible chromatin using sequencing (scATAC-seq) enables exploration of the epigenomic landscape that governs transcription at the cellular level, the complex characteristics of the sequencing data and the broad scope of downstream tasks mean that a sophisticated and versatile computational method is urgently needed. Here we introduce EpiAgent, a foundation model pretrained on Human-scATAC-Corpus, a large-scale corpus that we manually curated. EpiAgent encodes the chromatin accessibility pattern of each cell as a concise ‘cell sentence’ and captures the cellular heterogeneity behind regulatory networks via bidirectional attention. Comprehensive benchmarks show that EpiAgent excels in typical downstream tasks, including unsupervised feature extraction, supervised cell type annotation and data imputation. By incorporating external embeddings, EpiAgent enables effective prediction of cellular responses to both out-of-sample stimulated perturbations and unseen genetic perturbations, as well as reference data integration and query data mapping. Through in silico knockout of cis-regulatory elements, EpiAgent demonstrates the potential to model cell state changes. EpiAgent is further extended to annotate cell types directly in a zero-shot manner.
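The abstract states that EpiAgent encodes each cell's accessibility profile as a concise ‘cell sentence’. As a minimal sketch, and purely as an assumption (it is not taken from the EpiAgent codebase), a cell sentence can be built from a binary cell-by-cCRE matrix by listing the indices of the cell's accessible cCREs, ranked by a TF-IDF weight so that cCREs open in fewer cells come first, and optionally truncated. The names `tfidf_rows`, `cell_sentences` and `max_len` are hypothetical.

```python
import math
from typing import List, Optional


def tfidf_rows(X: List[List[int]]) -> List[List[float]]:
    """TF-IDF weights for a binary cell-by-cCRE matrix (one common scATAC-seq variant)."""
    n_cells = len(X)
    n_ccres = len(X[0]) if X else 0
    # Document frequency: number of cells in which each cCRE is accessible.
    df = [sum(row[j] for row in X) for j in range(n_ccres)]
    # Smoothed inverse document frequency; cCREs open in fewer cells weigh more.
    idf = [math.log1p(n_cells / max(d, 1)) for d in df]
    weights = []
    for row in X:
        n_open = max(sum(row), 1)  # guard against cells with no accessible cCRE
        weights.append([(v / n_open) * idf[j] for j, v in enumerate(row)])
    return weights


def cell_sentences(X: List[List[int]], max_len: Optional[int] = None) -> List[List[int]]:
    """Hypothetical 'cell sentence': accessible cCRE indices, highest TF-IDF first."""
    sentences = []
    for row, w in zip(X, tfidf_rows(X)):
        accessible = [j for j, v in enumerate(row) if v > 0]
        # sorted() is stable, so ties keep ascending cCRE-index order.
        ranked = sorted(accessible, key=lambda j: -w[j])
        sentences.append(ranked[:max_len])  # max_len=None keeps every token
    return sentences


X = [[1, 0, 1, 1],
     [0, 1, 1, 0],
     [1, 1, 1, 0]]
print(cell_sentences(X))             # [[3, 0, 2], [1, 2], [0, 1, 2]]
print(cell_sentences(X, max_len=1))  # [[3], [1], [0]]
```

Because every accessible cCRE in a binary row receives the same term frequency, the ranking within a cell reduces to the inverse document frequency; the real tokenization and length limit used by EpiAgent may differ.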

Fig. 1: Overview of EpiAgent.
Fig. 2: Typical downstream tasks for analyzing scATAC-seq data using EpiAgent.
Fig. 3: Prediction of cellular responses to out-of-sample stimulated and unseen genetic perturbations using EpiAgent.
Fig. 4: Reference data integration and query data mapping using EpiAgent.
Fig. 5: In silico cCRE knockouts through EpiAgent to simulate cell state.
Fig. 6: Direct annotation on newly sequenced datasets using supervised EpiAgent-B and EpiAgent-NT models.
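Fig. 5 and the abstract describe in silico knockout of cis-regulatory elements. One simple way to picture such an experiment, under our own assumptions: represent a cell as a cell sentence (a list of cCRE token indices), delete the tokens of the knocked-out cCREs, re-embed the cell and measure how far its embedding moves. `toy_embed` is a hypothetical stand-in (the mean of per-token vectors) for the pretrained encoder, and `knock_out`, `cosine_distance` and `knockout_shift` are illustrative names, not functions from the EpiAgent codebase.

```python
import math
from typing import List, Sequence, Set


def knock_out(sentence: Sequence[int], ccre_ids: Set[int]) -> List[int]:
    """Remove the knocked-out cCRE tokens; the remaining tokens keep their order."""
    return [t for t in sentence if t not in ccre_ids]


def toy_embed(sentence: Sequence[int], table: List[List[float]]) -> List[float]:
    """Stand-in encoder: mean of per-token vectors (NOT the pretrained EpiAgent model)."""
    dim = len(table[0])
    if not sentence:
        return [0.0] * dim
    return [sum(table[t][k] for t in sentence) / len(sentence) for k in range(dim)]


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        # Convention for an all-zero embedding: 0.0 if both are zero, otherwise 1.0.
        return 0.0 if na == nb else 1.0
    return 1.0 - dot / (na * nb)


def knockout_shift(sentence: Sequence[int], ccre_ids: Set[int],
                   table: List[List[float]]) -> float:
    """How far a cell's (toy) embedding moves when the given cCREs are knocked out."""
    before = toy_embed(sentence, table)
    after = toy_embed(knock_out(sentence, ccre_ids), table)
    return cosine_distance(before, after)


# One-hot token vectors for 4 cCREs, so the toy embedding is easy to inspect.
table = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
cell = [3, 0, 2]
print(knock_out(cell, {3}))                        # [0, 2]
print(round(knockout_shift(cell, {3}, table), 4))  # 0.1835
```

With a real model, `toy_embed` would be replaced by the pretrained encoder; the mean-of-vectors stand-in is used only so that the sketch runs without external dependencies.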

Data availability

All data used in this study, including the pretraining datasets and those used for downstream analyses, are available in the Human-scATAC-Corpus database79. The pretraining data in Human-scATAC-Corpus comprise 27 published datasets6,7,8,38,42,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72 and an integrated dataset of 30 PBMC samples from the 10x Genomics website (https://www.10xgenomics.com/datasets/). All of these datasets are publicly accessible; detailed statistics and data availability are provided in Supplementary Table 1. Additional datasets3,4,5,30,31,42,45,46 that were not used for pretraining but were used to evaluate EpiAgent in downstream analyses are also publicly available, with detailed statistics and data availability provided in Supplementary Table 5. Source data are provided with this paper.

Code availability

EpiAgent is freely available on GitHub (https://github.com/xy-chen16/EpiAgent) and the Zenodo repository80 (https://doi.org/10.5281/zenodo.16562787).

References

  1. Klemm, S. L., Shipony, Z. & Greenleaf, W. J. Chromatin accessibility and the regulatory epigenome. Nat. Rev. Genet. 20, 207–220 (2019).

  2. Monnoye, L. et al. Chromatin accessibility profiling methods. Nat. Rev. Methods Primers 1, 10 (2021).

  3. Buenrostro, J. D. et al. Integrated single-cell analysis maps the continuous regulatory landscape of human hematopoietic differentiation. Cell 173, 1535–1548 (2018).

  4. Ameen, M. et al. Integrative single-cell analysis of cardiogenesis identifies developmental trajectories and non-coding mutations in congenital heart disease. Cell 185, 4937–4953 (2022).

  5. Terekhanova, N. V. et al. Epigenetic regulation during cancer transitions across 11 tumour types. Nature 623, 432–441 (2023).

  6. Domcke, S. et al. A human cell atlas of fetal chromatin accessibility. Science 370, eaba7612 (2020).

  7. Zhang, K. et al. A single-cell atlas of chromatin accessibility in the human genome. Cell 184, 5985–6001 (2021).

  8. Li, Y. E. et al. A comparative atlas of single-cell chromatin accessibility in the human brain. Science 382, eadf7044 (2023).

  9. Stuart, T., Srivastava, A., Madad, S., Lareau, C. A. & Satija, R. Single-cell chromatin state analysis with Signac. Nat. Methods 18, 1333–1341 (2021).

  10. Danese, A. et al. EpiScanpy: integrated single-cell epigenomic analysis. Nat. Commun. 12, 5228 (2021).

  11. Bravo González-Blas, C. et al. cisTopic: cis-regulatory topic modeling on single-cell ATAC-seq data. Nat. Methods 16, 397–400 (2019).

  12. Xiong, L. et al. SCALE method for single-cell ATAC-seq analysis via latent feature extraction. Nat. Commun. 10, 4576 (2019).

  13. Ashuach, T., Reidenbach, D. A., Gayoso, A. & Yosef, N. PeakVI: a deep generative model for single-cell chromatin accessibility analysis. Cell Rep. Methods 2, 100182 (2022).

  14. Xiong, L. et al. Online single-cell data integration through projecting heterogeneous datasets into a common cell-embedding space. Nat. Commun. 13, 6118 (2022).

  15. Yuan, H. & Kelley, D. R. scBasset: sequence-based modeling of single-cell ATAC-seq using convolutional neural networks. Nat. Methods 19, 1088–1096 (2022).

  16. Cui, X. et al. Discrete latent embedding of single-cell chromatin accessibility sequencing data for uncovering cell heterogeneity. Nat. Comput. Sci. 4, 346–359 (2024).

  17. Chen, X. et al. Cell type annotation of single-cell chromatin accessibility data via supervised Bayesian embedding. Nat. Mach. Intell. 4, 116–126 (2022).

  18. Ma, W., Lu, J. & Wu, H. Cellcano: supervised cell type identification for single cell ATAC-seq data. Nat. Commun. 14, 1864 (2023).

  19. Zeng, Y. et al. Deciphering cell types by integrating scATAC-seq data with genome sequences. Nat. Comput. Sci. 4, 285–298 (2024).

  20. Li, Z. et al. Chromatin-accessibility estimation from single-cell ATAC-seq data with scOpen. Nat. Commun. 12, 6386 (2021).

  21. Tang, S. et al. scCASE: accurate and interpretable enhancement for single-cell chromatin accessibility sequencing data. Nat. Commun. 15, 1629 (2024).

  22. Han, X. et al. Pre-trained models: past, present and future. AI Open 2, 225–250 (2021).

  23. Theodoris, C. V. et al. Transfer learning enables predictions in network biology. Nature 618, 616–624 (2023).

  24. Cui, H. et al. scGPT: toward building a foundation model for single-cell multi-omics using generative AI. Nat. Methods 21, 1470–1480 (2024).

  25. Hao, M. et al. Large-scale foundation model on single-cell transcriptomics. Nat. Methods 21, 1481–1491 (2024).

  26. Yang, X. et al. GeneCompass: deciphering universal gene regulatory mechanisms with a knowledge-informed cross-species foundation model. Cell Res. 34, 830–845 (2024).

  27. Lotfollahi, M. Toward learning a foundational representation of cells and genes. Nat. Methods 21, 1416–1417 (2024).

  28. Dao, T. FlashAttention-2: faster attention with better parallelism and work partitioning. Preprint at https://arxiv.org/abs/2307.08691 (2023).

  29. Heumos, L. et al. Best practices for single-cell analysis across modalities. Nat. Rev. Genet. 24, 550–572 (2023).

  30. Kanemaru, K. et al. Spatially resolved multiomics of human cardiac niches. Nature 619, 801–810 (2023).

  31. Li, J. et al. Divergent single cell transcriptome and epigenome alterations in ALS and FTD patients with C9orf72 mutation. Nat. Commun. 14, 5714 (2023).

  32. Long, Z. et al. Single-cell multiomics analysis reveals regulatory programs in clear cell renal cell carcinoma. Cell Discov. 8, 68 (2022).

  33. Wolf, F. A., Angerer, P. & Theis, F. J. SCANPY: large-scale single-cell gene expression data analysis. Genome Biol. 19, 15 (2018).

  34. Moon, K. R. et al. Visualizing structure and transitions in high-dimensional biological data. Nat. Biotechnol. 37, 1482–1492 (2019).

  35. Xiao, Y. et al. Tracking single-cell evolution using clock-like chromatin accessibility loci. Nat. Biotechnol. 43, 784–798 (2025).

  36. Abdelaal, T. et al. A comparison of automatic cell identification methods for single-cell RNA sequencing data. Genome Biol. 20, 194 (2019).

  37. Ma, W., Su, K. & Wu, H. Evaluation of some aspects in supervised cell type identification for single-cell RNA-seq: classifier, feature selection, and reference construction. Genome Biol. 22, 264 (2021).

  38. Lee, A. J. et al. Characterization of altered molecular mechanisms in Parkinson’s disease through cell type-resolved multiomics analyses. Sci. Adv. 9, eabo2467 (2023).

  39. McLean, C. Y. et al. GREAT improves functional interpretation of cis-regulatory regions. Nat. Biotechnol. 28, 495–501 (2010).

  40. Peidli, S. et al. scPerturb: harmonized single-cell perturbation data. Nat. Methods 21, 531–540 (2024).

  41. Harris, M. A. et al. The Gene Ontology (GO) database and informatics resource. Nucleic Acids Res. 32, D258–D261 (2004).

  42. Lareau, C. A. et al. Droplet-based combinatorial indexing for massive-scale single-cell chromatin accessibility. Nat. Biotechnol. 37, 916–924 (2019).

  43. Lotfollahi, M., Wolf, F. A. & Theis, F. J. scGen predicts single-cell perturbation responses. Nat. Methods 16, 715–721 (2019).

  44. Jiang, Q., Chen, S., Chen, X. & Jiang, R. scPRAM accurately predicts single-cell gene expression perturbation response based on attention mechanism. Bioinformatics 40, btae265 (2024).

  45. Pierce, S. E., Granja, J. M. & Greenleaf, W. J. High-throughput single-cell chromatin accessibility CRISPR screens enable unbiased identification of regulatory networks in cancer. Nat. Commun. 12, 2969 (2021).

  46. Liscovitch-Brauer, N. et al. Profiling the genetic determinants of chromatin accessibility with scalable single-cell CRISPR screens. Nat. Biotechnol. 39, 1270–1277 (2021).

  47. Roohani, Y., Huang, K. & Leskovec, J. Predicting transcriptional outcomes of novel multigene perturbations with GEARS. Nat. Biotechnol. 42, 927–935 (2024).

  48. Argelaguet, R., Cuomo, A. S. E., Stegle, O. & Marioni, J. C. Computational principles and challenges in single-cell data integration. Nat. Biotechnol. 39, 1202–1215 (2021).

  49. Fu, X. et al. A foundation model of transcription across human cell types. Nature 637, 965–973 (2025).

  50. Yang, Z. et al. Multiomic foundation model predicts epigenetic regulation by zero-shot. Preprint at bioRxiv https://doi.org/10.1101/2024.12.19.629561 (2024).

  51. Mannens, C. C. et al. Chromatin accessibility during human first-trimester neurodevelopment. Nature https://doi.org/10.1038/s41586-024-07234-1 (2024).

  52. Garcia-Alonso, L. et al. Single-cell roadmap of human gonadal development. Nature 607, 540–547 (2022).

  53. Hocker, J. D. et al. Cardiac cell type-specific gene regulatory programs and disease risk association. Sci. Adv. 7, eabf1444 (2021).

  54. Yoshimura, Y. et al. A single-cell multiomic analysis of kidney organoid differentiation. Proc. Natl Acad. Sci. USA 120, e2219699120 (2023).

  55. Satpathy, A. T. et al. Massively parallel single-cell chromatin landscapes of human immune cell development and intratumoral T cell exhaustion. Nat. Biotechnol. 37, 925–936 (2019).

  56. Wang, S. K. et al. Single-cell multiome of the human retina and deep learning nominate causal variants in complex eye diseases. Cell Genom. 2, 100164 (2022).

  57. Jin, C. et al. Molecular and genetic insights into human ovarian aging from single-nuclei multi-omics analyses. Nat. Aging 5, 275–290 (2025).

  58. Zhang, Z. et al. Single nucleus transcriptome and chromatin accessibility of postmortem human pituitaries reveal diverse stem cell regulatory mechanisms. Cell Rep. 38, 110467 (2022).

  59. Muto, Y. et al. Single cell transcriptional and chromatin accessibility profiling redefine cellular heterogeneity in the adult human kidney. Nat. Commun. 12, 2190 (2021).

  60. Granja, J. M. et al. Single-cell multiomic analysis identifies regulatory programs in mixed-phenotype acute leukemia. Nat. Biotechnol. 37, 1458–1465 (2019).

  61. Wang, J. et al. Single-cell multiomics of the human retina reveals hierarchical transcription factor collaboration in mediating cell type-specific effects of genetic variants on gene regulation. Genome Biol. 24, 269 (2023).

  62. Liang, Q. et al. A multi-omics atlas of the human retina at single-cell resolution. Cell Genom. 3, 100298 (2023).

  63. Herring, C. A. et al. Human prefrontal cortex gene regulatory dynamics from gestation to adulthood at single-cell resolution. Cell 185, 4428–4447 (2022).

  64. Ziffra, R. S. et al. Single-cell epigenomics reveals mechanisms of human cortical development. Nature 598, 205–213 (2021).

  65. Corces, M. R. et al. Single-cell epigenomic analyses implicate candidate causal variants at inherited risk loci for Alzheimer’s and Parkinson’s diseases. Nat. Genet. 52, 1158–1168 (2020).

  66. Morabito, S. et al. Single-nucleus chromatin accessibility and transcriptomic characterization of Alzheimer’s disease. Nat. Genet. 53, 1143–1155 (2021).

  67. Ma, S. et al. Molecular and cellular evolution of the primate dorsolateral prefrontal cortex. Science 377, eabo7257 (2022).

  68. Zhu, K. et al. Multi-omic profiling of the developing human cerebral cortex at the single-cell level. Sci. Adv. 9, eadg3754 (2023).

  69. Trevino, A. E. et al. Chromatin and gene-regulatory dynamics of the developing human cerebral cortex at single-cell resolution. Cell 184, 5053–5069 (2021).

  70. Wang, A. et al. Single-cell multiomic profiling of human lungs reveals cell-type-specific and age-dynamic control of SARS-CoV2 host genes. eLife 9, e62522 (2020).

  71. Chiou, J. et al. Single-cell chromatin accessibility identifies pancreatic islet cell type- and state-specific regulatory programs of diabetes risk. Nat. Genet. 53, 455–466 (2021).

  72. Duong, T. E. et al. A single-cell regulatory map of postnatal lung alveologenesis in humans and mice. Cell Genom. 2, 100108 (2022).

  73. Luecken, M. D. et al. Benchmarking atlas-level data integration in single-cell genomics. Nat. Methods 19, 41–50 (2022).

  74. Cao, Z. J. & Gao, G. Multi-omics single-cell data integration and regulatory inference with graph-linked embedding. Nat. Biotechnol. 40, 1458–1466 (2022).

  75. Klingler, H. C. [Energy ablative therapy of renal tumours.] Urologe A 46, 485–486, 488–490, 492–495 (2007).

  76. Neural optimal transport predicts perturbation responses at the single-cell level. Nat. Methods 20, 1639–1640 (2023).

  77. Wei, X., Dong, J. & Wang, F. scPreGAN, a deep generative model for predicting the response of single-cell expression to perturbation. Bioinformatics 38, 3377–3384 (2022).

  78. Büttner, M., Miao, Z., Wolf, F. A., Teichmann, S. A. & Theis, F. J. A test metric for assessing single-cell RNA-seq batch correction. Nat. Methods 16, 43–49 (2019).

  79. Chen, X. et al. Human-scATAC-Corpus: a comprehensive database of scATAC-seq data. Preprint at bioRxiv https://doi.org/10.1101/2025.09.05.674505 (2025).

  80. Chen, X. Codebase for EpiAgent: foundation model for single-cell epigenomics. Zenodo https://doi.org/10.5281/zenodo.16562787 (2025).

Acknowledgements

This work was partly supported by the National Key Research and Development Program of China, grant numbers 2023YFF1204802 (R.J.), 2025YFC3409300 (R.J.), 2022YFF1202400 (H.L.), 2021YFF1200902 (R.J.), the National Natural Science Foundation of China, grant number 62273194 (R.J.) and the Beijing Natural Science Foundation grant number L242026 (R.J.).

Author information

Contributions

R.J. conceived the study and supervised the project. X. Chen collected and processed all data in Human-scATAC-Corpus and for the downstream analyses, and designed, implemented and validated EpiAgent. K.L. assisted in analyzing the results for data imputation, prediction of unseen genetic perturbations and in silico treatment. X. Cui contributed to analyzing the results for unsupervised feature extraction and reference data integration. Z.W. helped with analyzing the results for supervised cell type annotation and with validating EpiAgent-B and EpiAgent-NT. Q.J. aided in analyzing the prediction of out-of-sample stimulated perturbations. J.L. contributed to designing the pretraining tasks. Z.L., Z.G. and H.L. helped with the implementation and analysis of EpiAgent. X. Chen, K.L. and R.J. wrote the manuscript, with input from all of the authors.

Corresponding author

Correspondence to Rui Jiang.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Methods thanks the anonymous reviewers for their contribution to the peer review of this work. Primary Handling Editor: Rita Strack, in collaboration with the Nature Methods team.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Supplementary Notes 1–17, Supplementary Figs. 1–47.

Reporting Summary

Supplementary Tables

Supplementary Tables 1–5.

Source data

Source Data Figs. 2–6

Source data for Figs. 2–6.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Chen, X., Li, K., Cui, X. et al. EpiAgent: foundation model for single-cell epigenomics. Nat Methods (2025). https://doi.org/10.1038/s41592-025-02822-z

  • DOI: https://doi.org/10.1038/s41592-025-02822-z
