Abstract
Although single-cell assay for transposase-accessible chromatin using sequencing (scATAC-seq) enables the exploration of the epigenomic landscape that governs transcription at the cellular level, the complicated characteristics of the sequencing data and the broad scope of downstream tasks mean that a sophisticated and versatile computational method is urgently needed. Here we introduce EpiAgent, a foundation model pretrained on our manually curated large-scale Human-scATAC-Corpus. EpiAgent encodes chromatin accessibility patterns of cells as concise ‘cell sentences’ and captures cellular heterogeneity behind regulatory networks via bidirectional attention. Comprehensive benchmarks show that EpiAgent excels in typical downstream tasks, including unsupervised feature extraction, supervised cell type annotation and data imputation. By incorporating external embeddings, EpiAgent enables effective cellular response prediction for both out-of-sample stimulated and unseen genetic perturbations, reference data integration and query data mapping. Through in silico knockout of cis-regulatory elements, EpiAgent demonstrates the potential to model cell state changes. EpiAgent is further extended to directly annotate cell types in a zero-shot manner.
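As a rough illustration of the 'cell sentence' idea and of in silico knockout described above, the following minimal sketch turns one cell's peak counts into an ordered list of accessible cis-regulatory element indices and then deletes one element from that list. The TF-IDF-style ranking, the smoothing in the IDF formula and all function names are assumptions made for illustration; they are not taken from the EpiAgent codebase, and the actual tokenization may differ.

```python
import numpy as np


def inverse_document_frequency(matrix):
    """Smoothed IDF per element from a cells x elements count matrix (hypothetical helper)."""
    matrix = np.asarray(matrix)
    n_cells = matrix.shape[0]
    cells_with_element = (matrix > 0).sum(axis=0)
    return np.log(1 + n_cells / (1 + cells_with_element))


def cell_sentence(counts, idf, max_len=None):
    """Indices of a cell's accessible elements, ordered by descending TF-IDF-style weight.

    The ranking rule is an assumption for this sketch, not the published method.
    """
    counts = np.asarray(counts, dtype=float)
    accessible = np.flatnonzero(counts > 0)
    if accessible.size == 0:
        return []
    term_frequency = counts[accessible] / counts.sum()
    weight = term_frequency * np.asarray(idf)[accessible]
    order = np.argsort(-weight, kind="stable")  # highest weight first
    tokens = accessible[order]
    if max_len is not None:
        tokens = tokens[:max_len]
    return tokens.tolist()


def knock_out(sentence, element_index):
    """In silico knockout in this sketch: drop one element's token from the sentence."""
    return [token for token in sentence if token != element_index]


# Toy data: 3 cells x 4 candidate elements (made-up values).
toy_matrix = np.array([
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
])
toy_idf = inverse_document_frequency(toy_matrix)
sentence = cell_sentence(toy_matrix[0], toy_idf)
print(sentence)                # element 2 is rarer than element 0, so it ranks first
print(knock_out(sentence, 2))  # the sentence with element 2 removed
```

In a real setting, the original and the knocked-out sentences would each be passed to the pretrained model, and the two resulting cell embeddings compared to estimate the effect of removing the element.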
Data availability
All data used in this study, including pretraining datasets and those used for downstream analyses, are available in the ensemble database (ref. 79). The pretraining data in Human-scATAC-Corpus include 27 published datasets (refs. 6–8, 38, 42, 51–72) and an integrated dataset from 30 PBMC samples from the 10x Genomics website (https://www.10xgenomics.com/datasets/). All of these datasets are publicly accessible, and detailed statistics and data availability are provided in Supplementary Table 1. Additional datasets (refs. 3–5, 30, 31, 42, 45, 46) not used for pretraining, but used for evaluating EpiAgent in downstream analyses, are also publicly available, with detailed statistics and data availability provided in Supplementary Table 5. Source data are provided with this paper.
Code availability
EpiAgent is freely available on GitHub (https://github.com/xy-chen16/EpiAgent) and in the Zenodo repository (ref. 80; https://doi.org/10.5281/zenodo.16562787).
References
Klemm, S. L., Shipony, Z. & Greenleaf, W. J. Chromatin accessibility and the regulatory epigenome. Nat. Rev. Genet. 20, 207–220 (2019).
Monnoye, L. et al. Chromatin accessibility profiling methods. Nat. Rev. Methods Primers 1, 10 (2021).
Buenrostro, J. D. et al. Integrated single-cell analysis maps the continuous regulatory landscape of human hematopoietic differentiation. Cell 173, 1535–1548 (2018).
Ameen, M. et al. Integrative single-cell analysis of cardiogenesis identifies developmental trajectories and non-coding mutations in congenital heart disease. Cell 185, 4937–4953 (2022).
Terekhanova, N. V. et al. Epigenetic regulation during cancer transitions across 11 tumour types. Nature 623, 432–441 (2023).
Domcke, S. et al. A human cell atlas of fetal chromatin accessibility. Science 370, eaba7612 (2020).
Zhang, K. et al. A single-cell atlas of chromatin accessibility in the human genome. Cell 184, 5985–6001 (2021).
Li, Y. E. et al. A comparative atlas of single-cell chromatin accessibility in the human brain. Science 382, eadf7044 (2023).
Stuart, T., Srivastava, A., Madad, S., Lareau, C. A. & Satija, R. Single-cell chromatin state analysis with Signac. Nat. Methods 18, 1333–1341 (2021).
Danese, A. et al. EpiScanpy: integrated single-cell epigenomic analysis. Nat. Commun. 12, 5228 (2021).
Bravo Gonzalez-Blas, C. et al. cisTopic: cis-regulatory topic modeling on single-cell ATAC-seq data. Nat. Methods 16, 397–400 (2019).
Xiong, L. et al. SCALE method for single-cell ATAC-seq analysis via latent feature extraction. Nat. Commun. 10, 4576 (2019).
Ashuach, T., Reidenbach, D. A., Gayoso, A. & Yosef, N. PeakVI: a deep generative model for single-cell chromatin accessibility analysis. Cell Rep. Methods 2, 100182 (2022).
Xiong, L. et al. Online single-cell data integration through projecting heterogeneous datasets into a common cell-embedding space. Nat. Commun. 13, 6118 (2022).
Yuan, H. & Kelley, D. R. scBasset: sequence-based modeling of single-cell ATAC-seq using convolutional neural networks. Nat. Methods 19, 1088–1096 (2022).
Cui, X. et al. Discrete latent embedding of single-cell chromatin accessibility sequencing data for uncovering cell heterogeneity. Nat. Comput. Sci. 4, 346–359 (2024).
Chen, X. et al. Cell type annotation of single-cell chromatin accessibility data via supervised Bayesian embedding. Nat. Mach. Intell. 4, 116–126 (2022).
Ma, W., Lu, J. & Wu, H. Cellcano: supervised cell type identification for single cell ATAC-seq data. Nat. Commun. 14, 1864 (2023).
Zeng, Y. et al. Deciphering cell types by integrating scATAC-seq data with genome sequences. Nat. Comput. Sci. 4, 285–298 (2024).
Li, Z. et al. Chromatin-accessibility estimation from single-cell ATAC-seq data with scOpen. Nat. Commun. 12, 6386 (2021).
Tang, S. et al. scCASE: accurate and interpretable enhancement for single-cell chromatin accessibility sequencing data. Nat. Commun. 15, 1629 (2024).
Han, X. et al. Pre-trained models: past, present and future. AI Open 2, 225–250 (2021).
Theodoris, C. V. et al. Transfer learning enables predictions in network biology. Nature 618, 616–624 (2023).
Cui, H. et al. scGPT: toward building a foundation model for single-cell multi-omics using generative AI. Nat. Methods 21, 1470–1480 (2024).
Hao, M. et al. Large-scale foundation model on single-cell transcriptomics. Nat. Methods 21, 1481–1491 (2024).
Yang, X. et al. GeneCompass: deciphering universal gene regulatory mechanisms with a knowledge-informed cross-species foundation model. Cell Res. 34, 830–845 (2024).
Lotfollahi, M. Toward learning a foundational representation of cells and genes. Nat. Methods 21, 1416–1417 (2024).
Dao, T. FlashAttention-2: faster attention with better parallelism and work partitioning. Preprint at https://arxiv.org/abs/2307.08691 (2023).
Heumos, L. et al. Best practices for single-cell analysis across modalities. Nat. Rev. Genet. 24, 550–572 (2023).
Kanemaru, K. et al. Spatially resolved multiomics of human cardiac niches. Nature 619, 801–810 (2023).
Li, J. et al. Divergent single cell transcriptome and epigenome alterations in ALS and FTD patients with C9orf72 mutation. Nat. Commun. 14, 5714 (2023).
Long, Z. et al. Single-cell multiomics analysis reveals regulatory programs in clear cell renal cell carcinoma. Cell Discov. 8, 68 (2022).
Wolf, F. A., Angerer, P. & Theis, F. J. SCANPY: large-scale single-cell gene expression data analysis. Genome Biol. 19, 15 (2018).
Moon, K. R. et al. Visualizing structure and transitions in high-dimensional biological data. Nat. Biotechnol. 37, 1482–1492 (2019).
Xiao, Y. et al. Tracking single-cell evolution using clock-like chromatin accessibility loci. Nat. Biotechnol. 43, 784–798 (2025).
Abdelaal, T. et al. A comparison of automatic cell identification methods for single-cell RNA sequencing data. Genome Biol. 20, 194 (2019).
Ma, W., Su, K. & Wu, H. Evaluation of some aspects in supervised cell type identification for single-cell RNA-seq: classifier, feature selection, and reference construction. Genome Biol. 22, 264 (2021).
Lee, A. J. et al. Characterization of altered molecular mechanisms in Parkinson’s disease through cell type-resolved multiomics analyses. Sci. Adv. 9, eabo2467 (2023).
McLean, C. Y. et al. GREAT improves functional interpretation of cis-regulatory regions. Nat. Biotechnol. 28, 495–501 (2010).
Peidli, S. et al. scPerturb: harmonized single-cell perturbation data. Nat. Methods 21, 531–540 (2024).
Harris, M. A. et al. The Gene Ontology (GO) database and informatics resource. Nucleic Acids Res. 32, D258–D261 (2004).
Lareau, C. A. et al. Droplet-based combinatorial indexing for massive-scale single-cell chromatin accessibility. Nat. Biotechnol. 37, 916–924 (2019).
Lotfollahi, M., Wolf, F. A. & Theis, F. J. scGen predicts single-cell perturbation responses. Nat. Methods 16, 715–721 (2019).
Jiang, Q., Chen, S., Chen, X. & Jiang, R. scPRAM accurately predicts single-cell gene expression perturbation response based on attention mechanism. Bioinformatics 40, btae265 (2024).
Pierce, S. E., Granja, J. M. & Greenleaf, W. J. High-throughput single-cell chromatin accessibility CRISPR screens enable unbiased identification of regulatory networks in cancer. Nat. Commun. 12, 2969 (2021).
Liscovitch-Brauer, N. et al. Profiling the genetic determinants of chromatin accessibility with scalable single-cell CRISPR screens. Nat. Biotechnol. 39, 1270–1277 (2021).
Roohani, Y., Huang, K. & Leskovec, J. Predicting transcriptional outcomes of novel multigene perturbations with GEARS. Nat. Biotechnol. 42, 927–935 (2024).
Argelaguet, R., Cuomo, A. S. E., Stegle, O. & Marioni, J. C. Computational principles and challenges in single-cell data integration. Nat. Biotechnol. 39, 1202–1215 (2021).
Fu, X. et al. A foundation model of transcription across human cell types. Nature 637, 965–973 (2025).
Yang, Z. et al. Multiomic foundation model predicts epigenetic regulation by zero-shot. Preprint at bioRxiv https://doi.org/10.1101/2024.12.19.629561 (2024).
Mannens, C. C. et al. Chromatin accessibility during human first-trimester neurodevelopment. Nature https://doi.org/10.1038/s41586-024-07234-1 (2024).
Garcia-Alonso, L. et al. Single-cell roadmap of human gonadal development. Nature 607, 540–547 (2022).
Hocker, J. D. et al. Cardiac cell type-specific gene regulatory programs and disease risk association. Sci. Adv. 7, eabf1444 (2021).
Yoshimura, Y. et al. A single-cell multiomic analysis of kidney organoid differentiation. Proc. Natl Acad. Sci. USA 120, e2219699120 (2023).
Satpathy, A. T. et al. Massively parallel single-cell chromatin landscapes of human immune cell development and intratumoral T cell exhaustion. Nat. Biotechnol. 37, 925–936 (2019).
Wang, S. K. et al. Single-cell multiome of the human retina and deep learning nominate causal variants in complex eye diseases. Cell Genom. 2, 100164 (2022).
Jin, C. et al. Molecular and genetic insights into human ovarian aging from single-nuclei multi-omics analyses. Nat. Aging 5, 275–290 (2025).
Zhang, Z. et al. Single nucleus transcriptome and chromatin accessibility of postmortem human pituitaries reveal diverse stem cell regulatory mechanisms. Cell Rep. 38, 110467 (2022).
Muto, Y. et al. Single cell transcriptional and chromatin accessibility profiling redefine cellular heterogeneity in the adult human kidney. Nat. Commun. 12, 2190 (2021).
Granja, J. M. et al. Single-cell multiomic analysis identifies regulatory programs in mixed-phenotype acute leukemia. Nat. Biotechnol. 37, 1458–1465 (2019).
Wang, J. et al. Single-cell multiomics of the human retina reveals hierarchical transcription factor collaboration in mediating cell type-specific effects of genetic variants on gene regulation. Genome Biol. 24, 269 (2023).
Liang, Q. et al. A multi-omics atlas of the human retina at single-cell resolution. Cell Genom. 3, 100298 (2023).
Herring, C. A. et al. Human prefrontal cortex gene regulatory dynamics from gestation to adulthood at single-cell resolution. Cell 185, 4428–4447 (2022).
Ziffra, R. S. et al. Single-cell epigenomics reveals mechanisms of human cortical development. Nature 598, 205–213 (2021).
Corces, M. R. et al. Single-cell epigenomic analyses implicate candidate causal variants at inherited risk loci for Alzheimer’s and Parkinson’s diseases. Nat. Genet. 52, 1158–1168 (2020).
Morabito, S. et al. Single-nucleus chromatin accessibility and transcriptomic characterization of Alzheimer’s disease. Nat. Genet. 53, 1143–1155 (2021).
Ma, S. et al. Molecular and cellular evolution of the primate dorsolateral prefrontal cortex. Science 377, eabo7257 (2022).
Zhu, K. et al. Multi-omic profiling of the developing human cerebral cortex at the single-cell level. Sci. Adv. 9, eadg3754 (2023).
Trevino, A. E. et al. Chromatin and gene-regulatory dynamics of the developing human cerebral cortex at single-cell resolution. Cell 184, 5053–5069 (2021).
Wang, A. et al. Single-cell multiomic profiling of human lungs reveals cell-type-specific and age-dynamic control of SARS-CoV2 host genes. eLife 9, e62522 (2020).
Chiou, J. et al. Single-cell chromatin accessibility identifies pancreatic islet cell type- and state-specific regulatory programs of diabetes risk. Nat. Genet. 53, 455–466 (2021).
Duong, T. E. et al. A single-cell regulatory map of postnatal lung alveologenesis in humans and mice. Cell Genom. 2, 100108 (2022).
Luecken, M. D. et al. Benchmarking atlas-level data integration in single-cell genomics. Nat. Methods 19, 41–50 (2022).
Cao, Z. J. & Gao, G. Multi-omics single-cell data integration and regulatory inference with graph-linked embedding. Nat. Biotechnol. 40, 1458–1466 (2022).
Klingler, H. C. [Energy ablative therapy of renal tumours.] Urologe A 46, 485–486, 488–490, 492–495 (2007).
Neural optimal transport predicts perturbation responses at the single-cell level. Nat. Methods 20, 1639–1640 (2023).
Wei, X., Dong, J. & Wang, F. scPreGAN, a deep generative model for predicting the response of single-cell expression to perturbation. Bioinformatics 38, 3377–3384 (2022).
Büttner, M., Miao, Z., Wolf, F. A., Teichmann, S. A. & Theis, F. J. A test metric for assessing single-cell RNA-seq batch correction. Nat. Methods 16, 43–49 (2019).
Chen, X. et al. Human-scATAC-Corpus: a comprehensive database of scATAC-seq data. Preprint at bioRxiv https://doi.org/10.1101/2025.09.05.674505 (2025).
Chen, X. Codebase for EpiAgent: foundation model for single-cell epigenomics. Zenodo https://doi.org/10.5281/zenodo.16562787 (2025).
Acknowledgements
This work was partly supported by the National Key Research and Development Program of China, grant numbers 2023YFF1204802 (R.J.), 2025YFC3409300 (R.J.), 2022YFF1202400 (H.L.), 2021YFF1200902 (R.J.), the National Natural Science Foundation of China, grant number 62273194 (R.J.) and the Beijing Natural Science Foundation grant number L242026 (R.J.).
Author information
Contributions
R.J. conceived the study and supervised the project. X. Chen collected and processed all data in Human-scATAC-Corpus and downstream analyses, and designed, implemented and validated EpiAgent. K.L. assisted in analyzing the results for data imputation, prediction of unseen genetic perturbations, and in silico treatment. X. Cui contributed to analyzing the results for unsupervised feature extraction and reference data integration. Z.W. helped with analyzing the results for supervised cell type annotation and validation of EpiAgent-B and EpiAgent-NT. Q.J. aided in analyzing the prediction of out-of-sample stimulated perturbations. J.L. contributed to designing the pretraining tasks. Z.L., Z.G. and H.L. helped with implementation and analysis of EpiAgent. X. Chen, K.L. and R.J. wrote the manuscript, with input from all of the authors.
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Methods thanks the anonymous reviewers for their contribution to the peer review of this work. Primary Handling Editor: Rita Strack, in collaboration with the Nature Methods team.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
Supplementary Information
Supplementary Notes 1–17, Supplementary Figs. 1–47.
Supplementary Tables
Supplementary Tables 1–5.
Source data
Source Data Figs. 2–6
Source data for Figs. 2–6.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Chen, X., Li, K., Cui, X. et al. EpiAgent: foundation model for single-cell epigenomics. Nat Methods (2025). https://doi.org/10.1038/s41592-025-02822-z
DOI: https://doi.org/10.1038/s41592-025-02822-z