  • Perspective

Causal machine learning for single-cell genomics

Abstract

Advances in single-cell '-omics' allow unprecedented insights into the transcriptional profiles of individual cells and, when combined with large-scale perturbation screens, make it possible to measure the effect of targeted perturbations on the whole transcriptome. These advances provide an opportunity to better understand the causative role of genes in complex biological processes. In this Perspective, we delineate the application of causal machine learning to single-cell genomics and its associated challenges. We first present the causal model that is most commonly applied to single-cell biology and then identify and discuss potential approaches to three open problems: the lack of generalization of models to novel experimental conditions, the complexity of interpreting learned models, and the difficulty of learning cell dynamics.
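As an illustration of what a causal model of perturbation data can look like, the sketch below simulates a toy linear structural causal model (SCM) over three genes and applies a hard intervention that mimics a knockout. It is a minimal example under stated assumptions, not the model specified in the article: the function name `simulate_linear_scm`, the three-gene chain, the edge weights, the baseline of 1.0 and the Gaussian noise are all hypothetical choices made for illustration.

```python
import numpy as np


def simulate_linear_scm(weights, n_cells, rng, knockout=None, noise_scale=0.1):
    """Sample toy expression values from x_j = 1 + sum_i W[i, j] * x_i + noise_j.

    `weights` is assumed strictly upper triangular, so gene indices already
    follow a causal (topological) order. `knockout` is the index of a gene
    clamped to 0, i.e. a hard intervention do(x_k = 0).
    """
    n_genes = weights.shape[0]
    x = np.zeros((n_cells, n_genes))
    for j in range(n_genes):  # visit genes in causal order
        if j == knockout:
            continue  # the intervened gene ignores its parents and stays at 0
        noise = noise_scale * rng.normal(size=n_cells)
        x[:, j] = 1.0 + x @ weights[:, j] + noise
    return x


# Hypothetical regulatory chain: gene 0 -> gene 1 -> gene 2.
weights = np.zeros((3, 3))
weights[0, 1] = 2.0
weights[1, 2] = 0.5

control = simulate_linear_scm(weights, 5000, np.random.default_rng(0))
knocked = simulate_linear_scm(weights, 5000, np.random.default_rng(1), knockout=1)

# Average effect of the knockout on each gene (perturbed minus control).
effect = knocked.mean(axis=0) - control.mean(axis=0)
print(np.round(effect, 2))
```

In this toy setting the knockout of gene 1 shifts its downstream target (gene 2) while leaving its upstream regulator (gene 0) essentially unchanged. That asymmetry is the kind of signal interventional data can provide and purely observational correlations cannot.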

Fig. 1: Causal modeling of the cell.
Fig. 2: Overview of the default SCM model and strategies for learning generalizable causal models of cells.
Fig. 3: Latent causal variables help to model complex processes.
Fig. 4: Modeling of a cell through causal kinetic models.

Author information

Contributions

A.T.-L. and P.B. conceptualized the work. P.B., A.T.-L. and H.A. wrote the original draft and created the figures. S.B. provided useful feedback. All authors critically reviewed and edited the manuscript. F.J.T., Y.B. and H.A. supervised the project.

Corresponding authors

Correspondence to Hananeh Aliee, Yoshua Bengio or Fabian J. Theis.

Ethics declarations

Competing interests

Y.B. is an advisor to Recursion Pharmaceuticals. F.J.T. consults for Immunai, CytoReason, Cellarity, BioTuring and Genbio.AI and has an ownership interest in Dermagnostix and Cellarity. All other authors declare no competing interests.

Peer review

Peer review information

Nature Genetics thanks Patrick Schwab and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

About this article

Cite this article

Tejada-Lapuerta, A., Bertin, P., Bauer, S. et al. Causal machine learning for single-cell genomics. Nat Genet 57, 797–808 (2025). https://doi.org/10.1038/s41588-025-02124-2

  • DOI: https://doi.org/10.1038/s41588-025-02124-2
