Rigorous integration of single-cell ATAC-seq data using regularized barycentric mapping

Abstract

Single-cell assay for transposase-accessible chromatin using sequencing (scATAC-seq) deciphers genome-wide chromatin accessibility, providing profound insights into gene regulation mechanisms. With the rapid advance of sequencing technologies, scATAC-seq data typically encompass numerous samples from various conditions, resulting in complex batch effects that necessitate reliable integration tools. While numerous batch integration tools exist for single-cell RNA sequencing data, inherent differences in data characteristics limit their effectiveness on scATAC-seq data. Existing integration methods for scATAC-seq data suffer from several fundamental limitations, such as disrupting biological heterogeneity and correcting only the low-dimensional representation, which may distort data and hinder downstream analysis. Here we propose Fountain, a deep learning framework for scATAC-seq data integration via rigorous barycentric mapping. Barycentric mapping transforms one data distribution to another in a principled and effective manner through optimal transport. By regularizing barycentric mapping with geometric data information, Fountain achieves accurate batch alignment while preserving biological heterogeneity. Comprehensive experiments across diverse real-world datasets demonstrate the advantages of Fountain over existing methods in batch correction and biological conservation. In addition, the trained Fountain model can integrate data from new batches alongside already integrated data without retraining, enabling continuous online data integration. Moreover, Fountain’s reconstruction strategy generates batch-corrected ATAC profiles, improving the capture of cellular heterogeneity and revealing cell-type-specific signals in downstream analyses such as expression enrichment analysis and partitioned heritability analysis.
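To make the idea of barycentric mapping concrete, the following is a minimal sketch of a generic barycentric projection computed from an entropy-regularized optimal transport plan (Sinkhorn iterations). It is not Fountain's implementation and omits the geometric regularization described above. The function names `sinkhorn_plan` and `barycentric_map`, the squared Euclidean cost, the uniform cell weights and the values of `eps` and `n_iter` are all assumptions chosen for illustration.

```python
import numpy as np


def sinkhorn_plan(cost, a, b, eps=0.1, n_iter=500):
    """Entropy-regularized OT plan between weight vectors a and b (illustrative)."""
    K = np.exp(-cost / eps)          # Gibbs kernel
    u = np.ones_like(a)
    for _ in range(n_iter):
        v = b / (K.T @ u)            # rescale to match column marginals
        u = a / (K @ v)              # rescale to match row marginals
    return u[:, None] * K * v[None, :]


def barycentric_map(source, target, eps=0.1, n_iter=500):
    """Move each source point to the plan-weighted mean of the target points."""
    a = np.full(len(source), 1.0 / len(source))   # assumed uniform weights
    b = np.full(len(target), 1.0 / len(target))
    diff = source[:, None, :] - target[None, :, :]
    cost = (diff ** 2).sum(axis=-1)               # squared Euclidean cost
    cost = cost / cost.max()                      # scale so eps is comparable across data
    plan = sinkhorn_plan(cost, a, b, eps, n_iter)
    mapped = (plan @ target) / plan.sum(axis=1, keepdims=True)
    return mapped, plan


# Toy "batches": the source batch is a shifted copy of the same distribution.
rng = np.random.default_rng(0)
target = rng.normal(size=(60, 2))
source = rng.normal(size=(50, 2)) + 3.0
mapped, plan = barycentric_map(source, target)
```

In this toy setting each mapped point is a convex combination of target points, so the shifted batch is pulled onto the support of the reference batch. A plain projection like this can collapse within-batch structure, which is the kind of distortion that a regularized formulation is meant to limit.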

Fig. 1: Overview of Fountain.
Fig. 2: Fountain removes batch effects and preserves biological heterogeneity in scATAC-seq data.
Fig. 3: Fountain integrates datasets with various types and degrees of imbalance.
Fig. 4: Comparison between the raw data and Fountain-enhanced data.
Fig. 5: Fountain effectively captures cell-type-specific biological signals.

Data availability

The following publicly available datasets were used in this study: the MB dataset from Gene Expression Omnibus (GEO) under accession number GSE126724 and from 10x Genomics (https://support.10xgenomics.com/single-cell-atac/datasets/1.1.0/atac_v1_adult_brain_fresh_5k?; free registration required), the BMMC dataset from GSE123581, the HI dataset from GSE149683, the HL dataset from GSE161383 and the mouse cortex and marmoset cortex (MM) dataset from GSE204851. The methods for unifying cell type labels across batches, and the sources of those labels in these datasets, are described in Supplementary Note 1. Download instructions for the two human peripheral blood mononuclear cell datasets (PBMCA and PBMCB) are also provided in Supplementary Note 1. More detailed descriptions and the sources of the datasets are provided in Supplementary Note 1 and Supplementary Table 1.

Code availability

Fountain is implemented in Python using the PyTorch framework. The MIT-licensed Fountain software with detailed documentation and tutorials is available via GitHub at https://github.com/BioX-NKU/Fountain. The source code is also available via Zenodo at https://doi.org/10.5281/zenodo.14924285 (ref. 64).

References

  1. Buenrostro, J. D., Wu, B., Chang, H. Y. & Greenleaf, W. J. ATAC-seq: a method for assaying chromatin accessibility genome-wide. Curr. Protoc. Mol. Biol. 109, 21–29 (2015).

  2. Satpathy, A. T. et al. Massively parallel single-cell chromatin landscapes of human immune cell development and intratumoral T cell exhaustion. Nat. Biotechnol. 37, 925–936 (2019).

  3. Preissl, S. et al. Single-nucleus analysis of accessible chromatin in developing mouse forebrain reveals cell-type-specific transcriptional regulation. Nat. Neurosci. 21, 432–439 (2018).

  4. Gehring, J., Hwee Park, J., Chen, S., Thomson, M. & Pachter, L. Highly multiplexed single-cell RNA-seq by DNA oligonucleotide tagging of cellular proteins. Nat. Biotechnol. 38, 35–38 (2020).

  5. Almanzar, N. et al. A single-cell transcriptomic atlas characterizes ageing tissues in the mouse. Nature 583, 590–595 (2020).

  6. Korsunsky, I. et al. Fast, sensitive and accurate integration of single-cell data with Harmony. Nat. Methods 16, 1289–1296 (2019).

  7. Tran, H. T. N. et al. A benchmark of batch-effect correction methods for single-cell RNA sequencing data. Genome Biol. 21, 12 (2020).

  8. Chazarra-Gil, R., van Dongen, S., Kiselev, V. Y. & Hemberg, M. Flexible comparison of batch correction methods for single-cell RNA-seq using BatchBench. Nucleic Acids Res. 49, e42 (2021).

  9. Chen, S. et al. RA3 is a reference-guided approach for epigenetic characterization of single cells. Nat. Commun. 12, 2177 (2021).

  10. Stuart, T., Srivastava, A., Madad, S., Lareau, C. A. & Satija, R. Single-cell chromatin state analysis with Signac. Nat. Methods 18, 1333–1341 (2021).

  11. Ashuach, T., Reidenbach, D. A., Gayoso, A. & Yosef, N. PeakVI: a deep generative model for single-cell chromatin accessibility analysis. Cell Rep. Methods 2, 100182 (2022).

  12. Kopp, W., Akalin, A. & Ohler, U. Simultaneous dimensionality reduction and integration for single-cell ATAC-seq data using deep learning. Nat. Mach. Intell. 4, 162–168 (2022).

  13. Xiong, L. et al. Online single-cell data integration through projecting heterogeneous datasets into a common cell-embedding space. Nat. Commun. 13, 6118 (2022).

  14. Yuan, H. & Kelley, D. R. scBasset: sequence-based modeling of single-cell ATAC-seq using convolutional neural networks. Nat. Methods 19, 1088–1096 (2022).

  15. McInnes, L., Healy, J. & Melville, J. UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction. Preprint at https://arxiv.org/abs/1802.03426 (2020).

  16. Traag, V. A., Waltman, L. & van Eck, N. J. From Louvain to Leiden: guaranteeing well-connected communities. Sci. Rep. 9, 5233 (2019).

  17. Luecken, M. D. et al. Benchmarking atlas-level data integration in single-cell genomics. Nat. Methods 19, 41–50 (2022).

  18. Grandi, F. C., Modi, H., Kampman, L. & Corces, M. R. Chromatin accessibility profiling by ATAC-seq. Nat. Protoc. 17, 1518–1552 (2022).

  19. Lareau, C. A. et al. Droplet-based combinatorial indexing for massive-scale single-cell chromatin accessibility. Nat. Biotechnol. 37, 916–924 (2019).

  20. Domcke, S. et al. A human cell atlas of fetal chromatin accessibility. Science 370, eaba7612 (2020).

  21. Wang, A. et al. Single-cell multiomic profiling of human lungs reveals cell-type-specific and age-dynamic control of SARS-CoV2 host genes. eLife 9, e62522 (2020).

  22. Yuan, W. et al. Temporally divergent regulatory mechanisms govern neuronal diversification and maturation in the mouse and marmoset neocortex. Nat. Neurosci. 25, 1049–1058 (2022).

  23. Wilcoxon, F., Katti, S. K. & Wilcox, R. A. in Critical Values and Probability Levels for the Wilcoxon Rank Sum Test and the Wilcoxon Signed Rank Test, Vol. 1. 171–259 (American Cyanamid, 1963).

  24. Chen, Y. T. & Zou, J. GenePT: a simple but hard-to-beat foundation model for genes and cells built from ChatGPT. Preprint at bioRxiv https://doi.org/10.1101/2023.10.16.562533 (2024).

  25. Danese, A. et al. EpiScanpy: integrated single-cell epigenomic analysis. Nat. Commun. 12, 5228 (2021).

  26. Abdelaal, T. et al. A comparison of automatic cell identification methods for single-cell RNA sequencing data. Genome Biol. 20, 194 (2019).

  27. Li, Z. et al. Chromatin-accessibility estimation from single-cell ATAC-seq data with scOpen. Nat. Commun. 12, 6386 (2021).

  28. Kiselev, V. Y., Yiu, A. & Hemberg, M. scmap: projection of single-cell RNA-seq data across data sets. Nat. Methods 15, 359–362 (2018).

  29. Slowikowski, K., Hu, X. & Raychaudhuri, S. SNPsea: an algorithm to identify cell types, tissues and pathways affected by risk loci. Bioinformatics 30, 2496–2497 (2014).

  30. Manz, M. G. & Boettcher, S. Emergency granulopoiesis. Nat. Rev. Immunol. 14, 302–314 (2014).

  31. Rock, J. R. & Hogan, B. L. Epithelial progenitor cells in lung development, maintenance, repair, and disease. Annu. Rev. Cell Dev. Biol. 27, 493–512 (2011).

  32. Fonsatti, E., Altomonte, M., Nicotra, M. R., Natali, P. G. & Maio, M. Endoglin (CD105): a powerful therapeutic target on tumor-associated angiogenetic blood vessels. Oncogene 22, 6557–6563 (2003).

  33. Finucane, H. K. et al. Partitioning heritability by functional annotation using genome-wide association summary statistics. Nat. Genet. 47, 1228–1235 (2015).

  34. Dendrou, C. A., Fugger, L. & Friese, M. A. Immunopathology of multiple sclerosis. Nat. Rev. Immunol. 15, 545–558 (2015).

  35. Friese, M. A. & Fugger, L. Pathogenic CD8+ T cells in multiple sclerosis. Ann. Neurol. 66, 132–141 (2009).

  36. Lu, L. et al. Regulation of activated CD4+ T cells by NK cells via the Qa-1–NKG2A inhibitory pathway. Immunity 26, 593–604 (2007).

  37. McLean, C. Y. et al. GREAT improves functional interpretation of cis-regulatory regions. Nat. Biotechnol. 28, 495–501 (2010).

  38. Chatzileontiadou, D. S., Sloane, H., Nguyen, A. T., Gras, S. & Grant, E. J. The many faces of CD4+ T cells: Immunological and structural characteristics. Int. J. Mol. Sci. 22, 73 (2020).

  39. Zhu, J., Yamane, H. & Paul, W. E. Differentiation of effector CD4 T cell populations. Annu. Rev. Immunol. 28, 445–489 (2009).

  40. Sakaguchi, S., Yamaguchi, T., Nomura, T. & Ono, M. Regulatory T cells and immune tolerance. Cell 133, 775–787 (2008).

  41. Smith-Garvin, J. E., Koretzky, G. A. & Jordan, M. S. T cell activation. Annu. Rev. Immunol. 27, 591–619 (2009).

  42. Gordon, S. & Taylor, P. R. Monocyte and macrophage heterogeneity. Nat. Rev. Immunol. 5, 953–964 (2005).

  43. Wang, Y. et al. The essential role of transcription factor Pitx3 in preventing mesodiencephalic dopaminergic neurodegeneration and maintaining neuronal subtype identities during aging. Cell Death Dis. 12, 1008 (2021).

  44. Pei, J. et al. Integrated analysis reveals FLI1 regulates the tumor immune microenvironment via its cell-type-specific expression and transcriptional regulation of distinct target genes of immune cells in breast cancer. BMC Genomics 25, 250 (2024).

  45. Goodnight, A. V. et al. Chromatin accessibility and transcription dynamics during in vitro astrocyte differentiation of Huntington’s Disease Monkey pluripotent stem cells. Epigenet. Chromatin 12, 67 (2019).

  46. Demetci, P., Santorella, R., Sandstede, B., Noble, W. S. & Singh, R. SCOT: single-cell multi-omics alignment with optimal transport. J. Comput. Biol. 29, 3–18 (2022).

  47. Cang, Z. & Nie, Q. Inferring spatial and signaling relationships between cells from single cell transcriptomic data. Nat. Commun. 11, 2084 (2020).

  48. Cao, K., Hong, Y. & Wan, L. Manifold alignment for heterogeneous single-cell multi-omics data integration using Pamona. Bioinformatics 38, 211–219 (2021).

  49. Cao, K., Gong, Q., Hong, Y. & Wan, L. A unified computational framework for single-cell data integration with optimal transport. Nat. Commun. 13, 7419 (2022).

  50. Samaran, J., Peyré, G. & Cantini, L. scConfluence: single-cell diagonal integration with regularized inverse optimal transport on weakly connected features. Nat. Commun. 15, 7762 (2024).

  51. Villani, C. Optimal Transport: Old and New vol. 338 (Springer, 2009).

  52. Peyré, G. & Cuturi, M. Computational optimal transport. Found. Trends Mach. Learn. 11, 355–607 (2019).

  53. Fefferman, C., Mitter, S. & Narayanan, H. Testing the manifold hypothesis. J. Am. Math. Soc. 29, 983–1049 (2016).

  54. Jost, J. Riemannian Geometry and Geometric Analysis (Springer, 2017).

  55. Tenenbaum, J. B., De Silva, V. & Langford, J. C. A global geometric framework for nonlinear dimensionality reduction. Science 290, 2319–2323 (2000).

  56. Reuter, M., Biasotti, S., Giorgi, D., Patanè, G. & Spagnuolo, M. Discrete Laplace–Beltrami operators for shape analysis and segmentation. Comput. Graph. 33, 381–390 (2009).

  57. Belkin, M. & Niyogi, P. Laplacian eigenmaps for dimensionality reduction and data representation. Neural Comput. 15, 1373–1396 (2003).

  58. Cui, Z., Chang, H., Shan, S. & Chen, X. Generalized unsupervised manifold alignment. In Advances in Neural Information Processing Systems (eds Ghahramani, Z., Welling, M., Cortes, C., Lawrence, N. & Weinberger, K. Q.) 27, 2429–2437 (Curran Associates, Inc., 2014).

  59. Perrot, M., Courty, N., Flamary, R. & Habrard, A. Mapping estimation for discrete optimal transport. In Advances in Neural Information Processing Systems (eds Lee, D., Sugiyama, M., Luxburg, U., Guyon, I. & Garnett, R.) 29, 4197–4205 (Curran Associates, Inc., 2016).

  60. Zhou, P. et al. Towards theoretically understanding why SGD generalizes better than ADAM in deep learning. Adv. Neural Inf. Process. Syst. 33, 21285–21296 (2020).

  61. Fatras, K., Sejourne, T., Flamary, R. & Courty, N. Unbalanced minibatch optimal transport: applications to domain adaptation. Proc. 38th Int. Conf. Mach. Learn. 139, 3186–3197 (2021).

  62. Hendrycks, D. & Gimpel, K. Gaussian error linear units (GELUs). Preprint at https://arxiv.org/abs/1606.08415 (2016).

  63. Wolf, F. A., Angerer, P. & Theis, F. J. SCANPY: large-scale single-cell gene expression data analysis. Genome Biol. 19, 15 (2018).

  64. Zhu, S., Hua, H. & Chen, S. Rigorous integration of single-cell ATAC-seq data using regularized barycentric mapping. Zenodo https://doi.org/10.5281/zenodo.14924285 (2025).

Acknowledgements

We thank H. Yuan of Calico Life Sciences LLC, S. Li of Nankai University and R. Li of Xi’an Jiaotong University for their input on this project. This work was supported by the National Natural Science Foundation of China (grant nos. 62473212 and 62203236 to S.C.) and the Young Elite Scientists Sponsorship Program by CAST (grant no. 2023QNRC001 to S.C.).

Author information

Contributions

S.C. conceived the study and supervised the project; S.Z. designed and implemented Fountain; S.Z., H.H. and S.C. performed research; S.Z. and H.H. analysed the results; and S.Z., H.H. and S.C. wrote the paper.

Corresponding author

Correspondence to Shengquan Chen.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Machine Intelligence thanks Zhana Duren and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Supplementary Notes 1–15, Figs. 1–20 and Tables 1–8.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Zhu, S., Hua, H. & Chen, S. Rigorous integration of single-cell ATAC-seq data using regularized barycentric mapping. Nat Mach Intell 7, 1461–1477 (2025). https://doi.org/10.1038/s42256-025-01099-3

  • DOI: https://doi.org/10.1038/s42256-025-01099-3
