Abstract
Single-cell assay for transposase-accessible chromatin using sequencing (scATAC-seq) deciphers genome-wide chromatin accessibility, providing profound insights into gene regulation mechanisms. With the rapid advance of sequencing technologies, scATAC-seq data typically encompass numerous samples from various conditions, resulting in complex batch effects and thus necessitating reliable integration tools. While numerous batch integration tools exist for single-cell RNA sequencing data, inherent differences in data characteristics limit their effectiveness on scATAC-seq data. Existing integration methods for scATAC-seq data suffer from several fundamental limitations, such as disrupting biological heterogeneity and focusing solely on low-dimensional correction, which may distort data and hinder downstream analysis. Here we propose Fountain, a deep learning framework for scATAC-seq data integration via rigorous barycentric mapping. Barycentric mapping transforms one data distribution to another in a principled and effective manner through optimal transport. By regularizing barycentric mapping with geometric data information, Fountain achieves accurate batch alignment while preserving biological heterogeneity. Comprehensive experiments across diverse real-world datasets demonstrate the advantages of Fountain over existing methods in batch correction and biological conservation. In addition, the trained Fountain model can integrate data from new batches alongside already integrated data without retraining, enabling continuous online data integration. Moreover, Fountain’s reconstruction strategy generates batch-corrected ATAC profiles, improving the capture of cellular heterogeneity and supporting cell-type-specific downstream analyses such as expression enrichment analysis and partitioned heritability analysis.
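As a rough illustration of the barycentric mapping idea mentioned in the abstract, the sketch below computes an entropy-regularized optimal transport plan between two toy "batches" with Sinkhorn iterations and then sends each source point to the plan-weighted average of the target points. This is a minimal, dependency-free sketch and not Fountain's implementation: Fountain additionally regularizes the mapping with geometric information and trains a deep network, neither of which is shown. The function names, the toy data and the values of `eps` and `n_iter` are all assumptions made for illustration.

```python
import math


def sq_dist(x, y):
    """Squared Euclidean distance between two points given as lists."""
    return sum((xi - yi) ** 2 for xi, yi in zip(x, y))


def sinkhorn_plan(cost, a, b, eps=0.1, n_iter=200):
    """Entropy-regularized OT plan between histograms a and b.

    eps and n_iter are illustrative defaults, not values from the paper.
    """
    n, m = len(a), len(b)
    # Gibbs kernel K = exp(-C / eps)
    K = [[math.exp(-cost[i][j] / eps) for j in range(m)] for i in range(n)]
    u, v = [1.0] * n, [1.0] * m
    for _ in range(n_iter):
        # Alternately rescale rows and columns to match the marginals
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]


def barycentric_map(plan, targets):
    """Map each source point to the plan-weighted mean of the target points."""
    dim = len(targets[0])
    mapped = []
    for row in plan:
        mass = sum(row)
        mapped.append(
            [sum(w * t[d] for w, t in zip(row, targets)) / mass for d in range(dim)]
        )
    return mapped


# Hypothetical toy batches: the same two clusters, with the target shifted by (+1, +1)
source = [[0.0, 0.0], [0.1, 0.0], [3.0, 3.0], [3.1, 3.0]]
target = [[1.0, 1.0], [1.1, 1.0], [4.0, 4.0], [4.1, 4.0]]

cost = [[sq_dist(x, y) for y in target] for x in source]
a = [1.0 / len(source)] * len(source)  # uniform mass on source cells
b = [1.0 / len(target)] * len(target)  # uniform mass on target cells

plan = sinkhorn_plan(cost, a, b)
mapped = barycentric_map(plan, target)

for src, dst in zip(source, mapped):
    print(src, "->", [round(c, 3) for c in dst])
```

In this toy case each source point lands inside the matching target cluster, so the batch shift is removed while the two-cluster structure is kept. Smaller `eps` makes the plan closer to a hard assignment, at the cost of numerical stability.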
Data availability
The following publicly available datasets were used in this study: the MB dataset from Gene Expression Omnibus (GEO) under accession number GSE126724 and 10x Genomics (https://support.10xgenomics.com/single-cell-atac/datasets/1.1.0/atac_v1_adult_brain_fresh_5k?; free registration required), the BMMC dataset from GSE123581, the HI dataset from GSE149683, the HL dataset from GSE161383 and the mouse cortex and marmoset cortex (MM) dataset from GSE204851. The methods for unifying cell type labels across batches, and the sources of those labels in these datasets, are described in Supplementary Note 1. The download methods for the two human peripheral blood mononuclear cell datasets (PBMCA and PBMCB) are also provided in Supplementary Note 1. More detailed descriptions and the sources of all datasets are provided in Supplementary Note 1 and Supplementary Table 1.
Code availability
Fountain is implemented in Python using the PyTorch framework. The MIT-licensed Fountain software with detailed documentation and tutorials is available via GitHub at https://github.com/BioX-NKU/Fountain. The source code is also available via Zenodo at https://doi.org/10.5281/zenodo.14924285 (ref. 64).
References
Buenrostro, J. D., Wu, B., Chang, H. Y. & Greenleaf, W. J. ATAC-seq: a method for assaying chromatin accessibility genome-wide. Curr. Protoc. Mol. Biol. 109, 21–29 (2015).
Satpathy, A. T. et al. Massively parallel single-cell chromatin landscapes of human immune cell development and intratumoral T cell exhaustion. Nat. Biotechnol. 37, 925–936 (2019).
Preissl, S. et al. Single-nucleus analysis of accessible chromatin in developing mouse forebrain reveals cell-type-specific transcriptional regulation. Nat. Neurosci. 21, 432–439 (2018).
Gehring, J., Hwee Park, J., Chen, S., Thomson, M. & Pachter, L. Highly multiplexed single-cell RNA-seq by DNA oligonucleotide tagging of cellular proteins. Nat. Biotechnol. 38, 35–38 (2020).
Almanzar, N. et al. A single-cell transcriptomic atlas characterizes ageing tissues in the mouse. Nature 583, 590–595 (2020).
Korsunsky, I. et al. Fast, sensitive and accurate integration of single-cell data with Harmony. Nat. Methods 16, 1289–1296 (2019).
Tran, H. T. N. et al. A benchmark of batch-effect correction methods for single-cell RNA sequencing data. Genome Biol. 21, 12 (2020).
Chazarra-Gil, R., van Dongen, S., Kiselev, V. Y. & Hemberg, M. Flexible comparison of batch correction methods for single-cell RNA-seq using BatchBench. Nucleic Acids Res. 49, e42 (2021).
Chen, S. et al. RA3 is a reference-guided approach for epigenetic characterization of single cells. Nat. Commun. 12, 2177 (2021).
Stuart, T., Srivastava, A., Madad, S., Lareau, C. A. & Satija, R. Single-cell chromatin state analysis with Signac. Nat. Methods 18, 1333–1341 (2021).
Ashuach, T., Reidenbach, D. A., Gayoso, A. & Yosef, N. PeakVI: a deep generative model for single-cell chromatin accessibility analysis. Cell Rep. Methods 2, 100182 (2022).
Kopp, W., Akalin, A. & Ohler, U. Simultaneous dimensionality reduction and integration for single-cell ATAC-seq data using deep learning. Nat. Mach. Intell. 4, 162–168 (2022).
Xiong, L. et al. Online single-cell data integration through projecting heterogeneous datasets into a common cell-embedding space. Nat. Commun. 13, 6118 (2022).
Yuan, H. & Kelley, D. R. scBasset: sequence-based modeling of single-cell ATAC-seq using convolutional neural networks. Nat. Methods 19, 1088–1096 (2022).
McInnes, L., Healy, J. & Melville, J. UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction. Preprint at https://arxiv.org/abs/1802.03426 (2020).
Traag, V. A., Waltman, L. & van Eck, N. J. From Louvain to Leiden: guaranteeing well-connected communities. Sci. Rep. 9, 5233 (2019).
Luecken, M. D. et al. Benchmarking atlas-level data integration in single-cell genomics. Nat. Methods 19, 41–50 (2022).
Grandi, F. C., Modi, H., Kampman, L. & Corces, M. R. Chromatin accessibility profiling by ATAC-seq. Nat. Protoc. 17, 1518–1552 (2022).
Lareau, C. A. et al. Droplet-based combinatorial indexing for massive-scale single-cell chromatin accessibility. Nat. Biotechnol. 37, 916–924 (2019).
Domcke, S. et al. A human cell atlas of fetal chromatin accessibility. Science 370, eaba7612 (2020).
Wang, A. et al. Single-cell multiomic profiling of human lungs reveals cell-type-specific and age-dynamic control of SARS-CoV2 host genes. eLife 9, e62522 (2020).
Yuan, W. et al. Temporally divergent regulatory mechanisms govern neuronal diversification and maturation in the mouse and marmoset neocortex. Nat. Neurosci. 25, 1049–1058 (2022).
Wilcoxon, F., Katti, S. K. & Wilcox, R. A. in Critical Values and Probability Levels for the Wilcoxon Rank Sum Test and the Wilcoxon Signed Rank Test, Vol. 1. 171–259 (American Cyanamid, 1963).
Chen, Y. T. & Zou, J. GenePT: a simple but hard-to-beat foundation model for genes and cells built from ChatGPT. Preprint at bioRxiv https://doi.org/10.1101/2023.10.16.562533 (2024).
Danese, A. et al. EpiScanpy: integrated single-cell epigenomic analysis. Nat. Commun. 12, 5228 (2021).
Abdelaal, T. et al. A comparison of automatic cell identification methods for single-cell RNA sequencing data. Genome Biol. 20, 194 (2019).
Li, Z. et al. Chromatin-accessibility estimation from single-cell ATAC-seq data with scOpen. Nat. Commun. 12, 6386 (2021).
Kiselev, V. Y., Yiu, A. & Hemberg, M. scmap: projection of single-cell RNA-seq data across data sets. Nat. Methods 15, 359–362 (2018).
Slowikowski, K., Hu, X. & Raychaudhuri, S. SNPsea: an algorithm to identify cell types, tissues and pathways affected by risk loci. Bioinformatics 30, 2496–2497 (2014).
Manz, M. G. & Boettcher, S. Emergency granulopoiesis. Nat. Rev. Immunol. 14, 302–314 (2014).
Rock, J. R. & Hogan, B. L. Epithelial progenitor cells in lung development, maintenance, repair, and disease. Annu. Rev. Cell Dev. Biol. 27, 493–512 (2011).
Fonsatti, E., Altomonte, M., Nicotra, M. R., Natali, P. G. & Maio, M. Endoglin (CD105): a powerful therapeutic target on tumor-associated angiogenetic blood vessels. Oncogene 22, 6557–6563 (2003).
Finucane, H. K. et al. Partitioning heritability by functional annotation using genome-wide association summary statistics. Nat. Genet. 47, 1228–1235 (2015).
Dendrou, C. A., Fugger, L. & Friese, M. A. Immunopathology of multiple sclerosis. Nat. Rev. Immunol. 15, 545–558 (2015).
Friese, M. A. & Fugger, L. Pathogenic CD8+ T cells in multiple sclerosis. Ann. Neurol. 66, 132–141 (2009).
Lu, L. et al. Regulation of activated CD4+ T cells by NK cells via the Qa-1–NKG2A inhibitory pathway. Immunity 26, 593–604 (2007).
McLean, C. Y. et al. GREAT improves functional interpretation of cis-regulatory regions. Nat. Biotechnol. 28, 495–501 (2010).
Chatzileontiadou, D. S., Sloane, H., Nguyen, A. T., Gras, S. & Grant, E. J. The many faces of CD4+ T cells: Immunological and structural characteristics. Int. J. Mol. Sci. 22, 73 (2020).
Zhu, J., Yamane, H. & Paul, W. E. Differentiation of effector CD4 T cell populations. Annu. Rev. Immunol. 28, 445–489 (2009).
Sakaguchi, S., Yamaguchi, T., Nomura, T. & Ono, M. Regulatory T cells and immune tolerance. Cell 133, 775–787 (2008).
Smith-Garvin, J. E., Koretzky, G. A. & Jordan, M. S. T cell activation. Annu. Rev. Immunol. 27, 591–619 (2009).
Gordon, S. & Taylor, P. R. Monocyte and macrophage heterogeneity. Nat. Rev. Immunol. 5, 953–964 (2005).
Wang, Y. et al. The essential role of transcription factor Pitx3 in preventing mesodiencephalic dopaminergic neurodegeneration and maintaining neuronal subtype identities during aging. Cell Death Dis. 12, 1008 (2021).
Pei, J. et al. Integrated analysis reveals FLI1 regulates the tumor immune microenvironment via its cell-type-specific expression and transcriptional regulation of distinct target genes of immune cells in breast cancer. BMC Genomics 25, 250 (2024).
Goodnight, A. V. et al. Chromatin accessibility and transcription dynamics during in vitro astrocyte differentiation of Huntington’s Disease Monkey pluripotent stem cells. Epigenet. Chromatin 12, 67 (2019).
Demetci, P., Santorella, R., Sandstede, B., Noble, W. S. & Singh, R. SCOT: single-cell multi-omics alignment with optimal transport. J. Comput. Biol. 29, 3–18 (2022).
Cang, Z. & Nie, Q. Inferring spatial and signaling relationships between cells from single cell transcriptomic data. Nat. Commun. 11, 2084 (2020).
Cao, K., Hong, Y. & Wan, L. Manifold alignment for heterogeneous single-cell multi-omics data integration using Pamona. Bioinformatics 38, 211–219 (2021).
Cao, K., Gong, Q., Hong, Y. & Wan, L. A unified computational framework for single-cell data integration with optimal transport. Nat. Commun. 13, 7419 (2022).
Samaran, J., Peyré, G. & Cantini, L. scConfluence: single-cell diagonal integration with regularized inverse optimal transport on weakly connected features. Nat. Commun. 15, 7762 (2024).
Villani, C. Optimal Transport: Old and New vol. 338 (Springer, 2009).
Peyré, G. & Cuturi, M. Computational optimal transport. Found. Trends Mach. Learn. 11, 355–607 (2019).
Fefferman, C., Mitter, S. & Narayanan, H. Testing the manifold hypothesis. J. Am. Math. Soc. 29, 983–1049 (2016).
Jost, J. Riemannian Geometry and Geometric Analysis (Springer, 2017).
Tenenbaum, J. B., De Silva, V. & Langford, J. C. A global geometric framework for nonlinear dimensionality reduction. Science 290, 2319–2323 (2000).
Reuter, M., Biasotti, S., Giorgi, D., Patanè, G. & Spagnuolo, M. Discrete Laplace–Beltrami operators for shape analysis and segmentation. Comput. Graph. 33, 381–390 (2009).
Belkin, M. & Niyogi, P. Laplacian eigenmaps for dimensionality reduction and data representation. Neural Comput. 15, 1373–1396 (2003).
Cui, Z., Chang, H., Shan, S. & Chen, X. Generalized unsupervised manifold alignment. In Advances in Neural Information Processing Systems (eds Ghahramani, Z., Welling, M., Cortes, C., Lawrence, N. & Weinberger, K. Q.) 27, 2429–2437 (Curran Associates, Inc., 2014).
Perrot, M., Courty, N., Flamary, R. & Habrard, A. Mapping estimation for discrete optimal transport. In Advances in Neural Information Processing Systems (eds Lee, D., Sugiyama, M., Luxburg, U., Guyon, I. & Garnett, R.) 29, 4197–4205 (Curran Associates, Inc., 2016).
Zhou, P. et al. Towards theoretically understanding why SGD generalizes better than ADAM in deep learning. Adv. Neural Inf. Process. Syst. 33, 21285–21296 (2020).
Fatras, K., Sejourne, T., Flamary, R. & Courty, N. Unbalanced minibatch optimal transport: applications to domain adaptation. Proc. 38th Int. Conf. Mach. Learn. 139, 3186–3197 (2021).
Hendrycks, D. & Gimpel, K. Gaussian error linear units (GELUs). Preprint at https://arxiv.org/abs/1606.08415 (2016).
Wolf, F. A., Angerer, P. & Theis, F. J. SCANPY: large-scale single-cell gene expression data analysis. Genome Biol. 19, 15 (2018).
Zhu, S., Hua, H. & Chen, S. Rigorous integration of single-cell ATAC-seq data using regularized barycentric mapping. Zenodo https://doi.org/10.5281/zenodo.14924285 (2025).
Acknowledgements
We thank H. Yuan of Calico Life Sciences LLC, S. Li of Nankai University and R. Li of Xi’an Jiaotong University for their input on this project. This work was supported by the National Natural Science Foundation of China (grant nos. 62473212 and 62203236 to S.C.) and the Young Elite Scientists Sponsorship Program by CAST (grant no. 2023QNRC001 to S.C.).
Author information
Contributions
S.C. conceived the study and supervised the project; S.Z. designed and implemented Fountain; S.Z., H.H. and S.C. performed research; S.Z. and H.H. analysed the results; and S.Z., H.H. and S.C. wrote the paper.
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Machine Intelligence thanks Zhana Duren and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
Supplementary Notes 1–15, Figs. 1–20 and Tables 1–8.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Zhu, S., Hua, H. & Chen, S. Rigorous integration of single-cell ATAC-seq data using regularized barycentric mapping. Nat Mach Intell 7, 1461–1477 (2025). https://doi.org/10.1038/s42256-025-01099-3
DOI: https://doi.org/10.1038/s42256-025-01099-3