  • Review Article

Interpretation, extrapolation and perturbation of single cells

Abstract

Single-cell analyses have transitioned from descriptive atlasing towards inferring causal effects and mechanistic relationships that capture cellular logic. Technological advances and the growing scale of observational and interventional datasets have fuelled the development of machine learning methods aimed at identifying such dependencies and extrapolating perturbation effects. Here, we review and connect these approaches according to their modelling concepts (including representation learning, causal inference, mechanistic discovery, disentanglement and population tracing), underlying assumptions and downstream tasks. We propose a unifying ontology to guide practitioners in selecting the most suitable methods for a given biological question, with detailed technical descriptions provided in an online resource. Finally, we identify promising computational directions and underexplored data properties that could pave the way for future developments.
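To make "extrapolating perturbation effects" concrete, the sketch below implements a deliberately simple baseline. It is an illustrative assumption, not one of the methods reviewed here. It uses PCA (computed with NumPy's SVD) as a stand-in for a learned latent representation. It assumes that a perturbation acts as the same additive shift in that latent space in every cellular context. It estimates that shift in a context where both control and perturbed cells were profiled, then applies it to control cells from a context where the perturbation was never observed. All function names and the synthetic data are hypothetical.

```python
import numpy as np


def fit_linear_latent(X, n_components):
    """Return (mean, W) where the rows of W are the top principal axes of X.

    PCA via SVD stands in for a learned encoder/decoder in this sketch.
    """
    mu = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
    return mu, Vt[:n_components]


def encode(X, mu, W):
    # Project centred expression profiles onto the latent axes
    return (X - mu) @ W.T


def decode(Z, mu, W):
    # Map latent codes back to gene space
    return Z @ W + mu


def predict_unseen_response(ctrl_seen, pert_seen, ctrl_unseen, n_components=10):
    """Hypothetical latent-shift baseline.

    Assumes the perturbation is the same additive shift in latent space in
    every context, so the shift estimated where control and perturbed cells
    were both measured can be transferred to a context with controls only.
    The latent space is fitted on all observed cells, including the unseen
    context's controls (but never its perturbed cells).
    """
    mu, W = fit_linear_latent(np.vstack([ctrl_seen, pert_seen, ctrl_unseen]), n_components)
    delta = encode(pert_seen, mu, W).mean(axis=0) - encode(ctrl_seen, mu, W).mean(axis=0)
    return decode(encode(ctrl_unseen, mu, W) + delta, mu, W)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_cells, n_genes = 200, 50
    base_a = rng.normal(5.0, 1.0, n_genes)  # baseline of a "seen" cell type
    base_b = rng.normal(5.0, 1.0, n_genes)  # baseline of an "unseen" cell type
    shift = np.zeros(n_genes)
    shift[:5] = 2.0  # synthetic perturbation: up-regulates the first five genes

    def noisy(mean):
        # Add small Gaussian cell-to-cell noise around a mean profile
        return mean + rng.normal(0.0, 0.1, (n_cells, n_genes))

    pred = predict_unseen_response(noisy(base_a), noisy(base_a + shift), noisy(base_b))
    truth = base_b + shift  # held-out perturbed mean of the unseen cell type
    print("max error, latent shift:", np.abs(pred.mean(axis=0) - truth).max())
    print("max error, no shift:    ", np.abs(base_b - truth).max())
```

The additive, context-independent shift is the strongest assumption in this sketch, and it is exactly what breaks when responses depend on cell state. Nonlinear encoders, disentangled representations and causal models of the kind surveyed in this review relax it in different ways, each with its own assumptions.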

Fig. 1: From perturbations to modelling causal cellular processes.
Fig. 2: The overarching aims of causal and mechanistic modelling.
Fig. 3: An ontology for modelling alterations and response.


    Article  PubMed  Google Scholar 

  149. Bai, D., Ellington, C. N., Mo, S., Song, L. & Xing, E. P. AttentionPert: accurately modeling multiplexed genetic perturbations with multi-scale effects. Bioinformatics 40, i453–i461 (2024).

    Article  PubMed  PubMed Central  Google Scholar 

  150. Wu, Y., et al. Predicting cellular responses with variational causal inference and refined relational information. In Proc. 11th International Conference on Learning Representations (ICLR, 2023).

  151. Alsulami, R. et al. PrePR-CT: predicting perturbation responses in unseen cell types using cell-type-specific graphs. Preprint at bioRxiv https://doi.org/10.1101/2024.07.24.604816 (2024).

  152. Huang, W. & Liu, H. Predicting single-cell cellular responses to perturbations using cycle consistency learning. Bioinformatics 40, i462–i470 (2024).

    Article  PubMed  PubMed Central  Google Scholar 

  153. Hetzel, L. et al. Predicting cellular responses to novel drug perturbations at a single-cell resolution. In Proc. 36th International Conference on Neural Information Processing Systems (eds Koyejo, S. et al.) 26711–26722 (Curran, 2022).

  154. Qi, X. et al. Predicting transcriptional responses to novel chemical perturbations using deep generative model for drug discovery. Nat. Commun. 15, 9256 (2024).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  155. Schrod, S., Zacharias, H. U., Beißbarth, T., Hauschild, A.-C. & Altenbuchinger, M. CODEX: COunterfactual Deep learning for the in silico EXploration of cancer cell line perturbations. Bioinformatics 40, i91–i99 (2024).

    Article  PubMed  PubMed Central  Google Scholar 

  156. Huang, K. et al. Sequential optimal experimental design of perturbation screens guided by multi-modal priors. In 28th Annual Conference on Research in Computational Molecular Biology (ed. Ma, J.) 17–37 (Springer-Verlag, 2024).

  157. Märtens, K., Donovan-Maiye, R. & Ferkinghoff-Borg, J. Enhancing generative perturbation models with LLM-informed gene embeddings. In Proc. Workshop on Machine Learning for Genomics Explorations (ICLR, 2024).

  158. Klein, D. et al. CellFlow enables generative single-cell phenotype modeling with flow matching. Preprint at bioRxiv https://doi.org/10.1101/2025.04.11.648220 (2025).

  159. Hetzel, L. et al. Predicting cellular responses to novel drug perturbations at a single-cell resolution. In Advances in Neural Information Processing Systems 35, 26711–26722 (NeurIPS, 2025).

  160. Badia-i-Mompel, P. et al. Comparison and evaluation of methods to infer gene regulatory networks from multimodal single-cell data. Preprint at bioRxiv https://doi.org/10.1101/2024.12.20.629764 (2025).

  161. Hasanaj, E. et al. Multimodal benchmarking of foundation model representations for cellular perturbation response prediction. Preprint at bioRxiv https://doi.org/10.1101/2025.06.26.661186 (2025).

  162. Szalai, B. & Saez-Rodriguez, J. Why do pathway methods work better than they should? FEBS Lett. 594, 4189–4200 (2020).

    Article  CAS  PubMed  Google Scholar 

  163. Barabási, A.-L. & Oltvai, Z. N. Network biology: understanding the cell’s functional organization. Nat. Rev. Genet. 5, 101–113 (2004).

    Article  PubMed  Google Scholar 

  164. Pratapa, A., Jalihal, A. P., Law, J. N., Bharadwaj, A. & Murali, T. M. Benchmarking algorithms for gene regulatory network inference from single-cell transcriptomic data. Nat. Methods 17, 147–154 (2020).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  165. Gao, S. & Wang, X. Quantitative utilization of prior biological knowledge in the Bayesian network modeling of gene expression data. BMC Bioinform. 12, 359 (2011).

    Article  Google Scholar 

  166. Huynh-Thu, V. A., Irrthum, A., Wehenkel, L. & Geurts, P. Inferring regulatory networks from expression data using tree-based methods. PLoS One 5, e12776 (2010).

    Article  PubMed  PubMed Central  Google Scholar 

  167. Aibar, S. et al. SCENIC: single-cell regulatory network inference and clustering. Nat. Methods 14, 1083–1086 (2017).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  168. Wang, L. et al. Dictys: dynamic gene regulatory network dissects developmental continuum with single-cell multiomics. Nat. Methods 20, 1368–1378 (2023).

    Article  CAS  PubMed  Google Scholar 

  169. Dong, M. & Kluger, Y. GEASS: neural causal feature selection for high-dimensional biological data. In Proc. 11th International Conference on Learning Representations (ICLR, 2023).

  170. Wang, W. et al. RegVelo: gene-regulatory-informed dynamics of single cells. Preprint at bioRxiv https://doi.org/10.1101/2024.12.11.627935 (2024).

  171. Tanevski, J. et al. Learning tissue representation by identification of persistent local patterns in spatial omics data. Nat. Commun. 16, 4071 (2025).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  172. Tanevski, J., Flores, R. O. R., Gabor, A., Schapiro, D. & Saez-Rodriguez, J. Explainable multiview framework for dissecting spatial relationships from highly multiplexed data. Genome Biol. 23, 97 (2022).

    Article  PubMed  PubMed Central  Google Scholar 

  173. Megas, S. et al. Estimation of single-cell and tissue perturbation effect in spatial transcriptomics via spatial causal disentanglement. In Proc. 13th International Conference on Learning Representations (ICLR, 2024).

  174. Wen, Y. et al. Applying causal discovery to single-cell analyses using CausalCell. eLife 12, e81464 (2023).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  175. Belyaeva, A., Squires, C. & Uhler, C. DCI: learning causal differences between gene regulatory networks. Bioinformatics 37, 3067–3069 (2021).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  176. Chevalley, M., Roohani, Y. H., Mehrjou, A., Leskovec, J. & Schwab, P. A large-scale benchmark for network inference from single-cell perturbation data. Commun. Biol. 8, 412 (2025).

    Article  PubMed  PubMed Central  Google Scholar 

  177. Zheng, X., Dan, C., Aragam, B., Ravikumar, P. & Xing, E. Learning sparse nonparametric DAGs. In Proc. 23rd International Conference on Artificial Intelligence and Statistics (eds Chiappa, S. & Calandra, R.) 3414–3425 (PMLR, 2020).

  178. Yu, Y., Chen, J., Gao, T. & Yu, M. DAG-GNN: DAG structure learning with graph neural networks. In Proc. 36th International Conference on Machine Learning 7154–7163 (PMLR, 2019).

  179. Wu, M., Bao, Y., Barzilay, R. & Jaakkola, T. Sample, estimate, aggregate: a recipe for causal discovery foundation models. In Transactions on Machine Learning Research (eds Kamath, G. et al.) 10 (TMLR, 2025).

  180. Zhang, J., Cammarata, L., Squires, C., Sapsis, T. P. & Uhler, C. Active learning for optimal intervention design in causal models. Nat. Mach. Intell. 5, 1066–1075 (2023). This work introduces an early active learning scheme that uses a causal graph model to guide the experimental exploration of genetic perturbations.

    Article  Google Scholar 

  181. Lorch, L., Sussex, S., Rothfuss, J., Krause, A. & Schölkopf, B. In Proc. 36th International Conference on Neural Information Processing Systems (eds Koyejo, S. et al.) 13104–13118 (Curran, 2022).

  182. Sethuraman, M. G. et al. NODAGS-Flow: nonlinear cyclic causal structure learning. In Proc. 26th International Conference on Artificial Intelligence and Statistics (eds Ruiz, F. et al.) 6371–6387 (PMLR, 2023).

  183. Theodoris, C. V. et al. Transfer learning enables predictions in network biology. Nature 618, 616–624 (2023).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  184. Luecken, M. D. et al. Benchmarking atlas-level data integration in single-cell genomics. Nat. Methods 19, 41–50 (2022).

    Article  CAS  PubMed  Google Scholar 

  185. Comon, P. Independent component analysis, a new concept? Signal Process. 36, 287–314 (1994).

    Article  Google Scholar 

  186. Hyvärinen, A. & Oja, E. Independent component analysis: algorithms and applications. Neural Netw. 13, 411–430 (2000).

    Article  PubMed  Google Scholar 

  187. Yu, H. & Welch, J. D. MichiGAN: sampling from disentangled representations of single-cell data using generative adversarial networks. Genome Biol. 22, 158 (2021).

    Article  PubMed  PubMed Central  Google Scholar 

  188. Moran, G. E., Sridhar, D., Wang, Y. & Blei, D. Identifiable deep generative models via sparse decoding. In Transactions on Machine Learning Research (eds Kamath, G. et al.) 182 (TMLR, 2022).

  189. Lopez, R., Regier, J., Jordan, M. I. & Yosef, N. Information constraints on auto-encoding variational Bayes. In Proc. 32nd International Conference on Neural Information Processing Systems (eds Bengio, S. et al.) 6117–6128 (Curran, 2018).

  190. Gayoso, A. et al. A Python library for probabilistic analysis of single-cell omics data. Nat. Biotechnol. 40, 163–166 (2022). This work combines a series of variational autoencoder extensions that build on scVI into a centralized Python framework that aims to accelerate the development of probabilistic (autoencoder) models for single-cell omics data analysis.

    Article  CAS  PubMed  Google Scholar 

  191. Hyvärinen, A. & Pajunen, P. Nonlinear independent component analysis: Existence and uniqueness results. Neural Netw. 12, 429–439 (1999).

    Article  PubMed  Google Scholar 

  192. Hyvärinen, A., Khemakhem, I. & Morioka, H. Nonlinear independent component analysis for principled disentanglement in unsupervised deep learning. Patterns 4, 100844 (2023).

    Article  PubMed  PubMed Central  Google Scholar 

  193. Lachapelle, S. et al. Disentanglement via mechanism sparsity regularization: a new principle for nonlinear ICA. In First Conference on Causal Learning and Reasoning (eds Schölkopf, B. et al.) 177, 428–484 (2022).

  194. Zou, J. Y., Hsu, D. J., Parkes, D. C. & Adams, R. P. Contrastive learning using spectral methods. In Proc. 27th International Conference on Neural Information Processing Systems - Volume 2 (eds Burges, C. J. C. et al.) 2238–2246 (Curran, 2013).

  195. Abid, A., Zhang, M. J., Bagaria, V. K. & Zou, J. Exploring patterns enriched in a dataset with contrastive principal component analysis. Nat. Commun. 9, 2134 (2018).

    Article  PubMed  PubMed Central  Google Scholar 

  196. Li, D., Jones, A. & Engelhardt, B. Probabilistic contrastive dimension reduction for case-control study data. Ann. Appl. Stat. 18, 2207–2229 (2024).

    Article  Google Scholar 

  197. Boileau, P., Hejazi, N. S. & Dudoit, S. Exploring high-dimensional biological data with sparse contrastive principal component analysis. Bioinformatics 36, 3422–3430 (2020).

    Article  CAS  PubMed  Google Scholar 

  198. Abid, A. & Zou, J. Contrastive variational autoencoder enhances salient features. Preprint at https://doi.org/10.48550/arXiv.1902.04601 (2019).

  199. Severson, K. A., Ghosh, S. & Ng, K. Unsupervised learning with contrastive latent variable models. In Proc. AAAI Conference on Artificial Intelligence 33, 4862–4869 (AAAI, 2019).

  200. Zhang, L. & Zhang, S. Learning common and specific patterns from data of multiple interrelated biological scenarios with matrix factorization. Nucleic Acids Res. 47, 6606–6617 (2019).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  201. Qian, K., Fu, S., Li, H. & Li, W. V. scINSIGHT for interpreting single-cell gene expression from biologically heterogeneous data. Genome Biol. 23, 82 (2022).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  202. Weinberger, E., Beebe-Wang, N. & Lee, S.-I. Moment matching deep contrastive latent variable models. In Proc. 25th International Conference on Artificial Intelligence and Statistics 2354–2371 (PMLR, 2022).

  203. Zhang, Z., Zhao, X., Bindra, M., Qiu, P. & Zhang, X. scDisInFact: disentangled learning for integration and prediction of multi-batch multi-condition single-cell RNA-sequencing data. Nat. Commun. 15, 912 (2024).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  204. Megas, S. et al. Integrating multi-covariate disentanglement with counterfactual analysis on synthetic data enables cell type discovery and counterfactual predictions. Preprint at bioRxiv https://doi.org/10.1101/2025.06.03.657578 (2025).

  205. Inecik, K., Kara, A., Rose, A., Haniffa, M. & Theis, F. J. TarDis: achieving robust and structured disentanglement of multiple covariates. In Proc. Research in Computational Molecular Biology: 29th International Conference, RECOMB 2025 (ed. Sankararaman, S.) 285–289 (Springer, 2025).

  206. Inecik, K., Uhlmann, A., Lotfollahi, M. & Theis, F. MultiCPA: multimodal compositional perturbation autoencoder. Preprint at bioRxiv https://doi.org/10.1101/2022.07.08.499049 (2022).

  207. Wei, X., Dong, J. & Wang, F. scPreGAN, a deep generative model for predicting the response of single-cell expression to perturbation. Bioinformatics 38, 3377–3384 (2022).

    Article  CAS  PubMed  Google Scholar 

  208. Mao, H. et al. Learning identifiable factorized causal representations of cellular responses. In Advances in Neural Information Processing Systems 37 (NeurIPS 2024) (eds Globerson, A. et al.) 121630–121669 (NeurIPS, 2024).

  209. Miladinovic, D. et al. In silico biological discovery with large perturbation models. Nat. Comput. Sci. 5, 1029–1040 (2025).

    Article  PubMed  PubMed Central  Google Scholar 

  210. Adduri, A. K. et al. Predicting cellular responses to perturbation across diverse contexts with State. Preprint at bioRxiv https://doi.org/10.1101/2025.06.26.661135 (2025).

  211. Rampášek, L., Hidru, D., Smirnov, P., Haibe-Kains, B. & Goldenberg, A. Dr.VAE: improving drug response prediction via modeling of drug perturbation effects. Bioinformatics 35, 3743–3751 (2019).

    Article  PubMed  PubMed Central  Google Scholar 

  212. Zhang, J. et al. Identifiability guarantees for causal disentanglement from soft interventions. In Proc. 37th International Conference on Neural Information Processing Systems (eds Oh, A. et al.) 50254–50292 (Curran, 2023).

  213. Wu, Y. et al. PerturBench: benchmarking machine learning models for cellular perturbation analysis. In NeurIPS 2024 Workshop on AI for New Drug Modalities (NeurIPS, 2024).

  214. Liu, T. et al. scELMo: embeddings from language models are good learners for single-cell data analysis. Preprint at bioRxiv https://doi.org/10.1101/2023.12.07.569910 (2023).

  215. Zhong, J., Li, L., Dannenfelser, R. & Yao, V. Benchmarking gene embeddings from sequence, expression, network, and text models for functional prediction tasks. Preprint at bioRxiv https://doi.org/10.1101/2025.01.29.635607 (2025).

  216. Istrate, A.-M., Li, D. & Karaletsos, T. scGenePT: is language all you need for modeling single-cell perturbations? Preprint at bioRxiv https://doi.org/10.1101/2024.10.23.619972 (2024).

  217. Wenteler, A. et al. PertEval-scFM: benchmarking single-cell foundation models for perturbation effect prediction. In 42nd International Conference on Machine Learning (ICML, 2025).

  218. Csendes, G., Sanz, G., Szalay, K. Z. & Szalai, B. Benchmarking foundation cell models for post-perturbation RNA-seq prediction. BMC Genom. 26, 393 (2025).

    Article  Google Scholar 

  219. Kernfeld, E., Yang, Y., Weinstock, J. S., Battle, A. & Cahan, P. A comparison of computational methods for expression forecasting. Genome Biol. 26, 388 (2025).

    Article  PubMed  PubMed Central  Google Scholar 

  220. Viñas Torné, R. et al. Systema: a framework for evaluating genetic perturbation response prediction beyond systematic variation. Nat. Biotechnol. https://doi.org/10.1038/s41587-025-02777-8 (2025).

    Article  PubMed  Google Scholar 

  221. Ahlmann-Eltze, C., Huber, W. & Anders, S. Deep-learning-based gene perturbation effect prediction does not yet outperform simple linear baselines. Nat. Methods 22, 1657–1661 (2025).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  222. von Kügelgen, J., Ketterer, J., Shen, X., Meinshausen, N. & Peters, J. Representation learning for distributional perturbation extrapolation. In Learning Meaningful Representations of Life (LMRL) Workshop at ICLR (ICLR, 2025).

  223. Carvalho, C. M. et al. High-dimensional sparse factor modeling: applications in gene expression genomics. J. Am. Stat. Assoc. 103, 1438–1456 (2008).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  224. Liu, E., Zhang, J. & Uhler, C. Learning genetic perturbation effects with variational causal inference. Preprint at bioRxiv https://doi.org/10.1101/2025.06.05.657988 (2025).

  225. Jiang, Q., Chen, S., Chen, X. & Jiang, R. scPRAM accurately predicts single-cell gene expression perturbation response based on attention mechanism. Bioinformatics 40, btae265 (2024).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  226. Klein, D. et al. Mapping cells through time and space with moscot. Nature 638, 1065–1075 (2025).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  227. Schiebinger, G. et al. Optimal-transport analysis of single-cell gene expression identifies developmental trajectories in reprogramming. Cell 176, 928–943.e22 (2019).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  228. Kapuńniak, K. et al. Metric flow matching for smooth interpolations on the data manifold. In Proc. 38th International Conference on Neural Information Processing Systems (eds Globerson, A. et al.) 135011–135042 (Curran, 2024).

  229. Tong, A. et al. Improving and generalizing flow-based generative models with minibatch optimal transport. In Transactions on Machine Learning Research (eds Kamath, G. et al.) 1768 (TMLR, 2024).

  230. Erbe, R., Stein-O’Brien, G. & Fertig, E. J. Transcriptomic forecasting with neural ordinary differential equations. Patterns 4, 100793 (2023).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  231. Palma, A. et al. Multi-modal and multi-attribute generation of single cells with CFGen. In Proc. 13th International Conference on Learning Representations (ICLR, 2025).

  232. Yuan, B. et al. CellBox: interpretable machine learning for perturbation biology with application to the design of cancer combination therapy. Cell Syst. 12, 128–140.e4 (2021).

    Article  CAS  PubMed  Google Scholar 

  233. Aivazidis, A. et al. Cell2fate infers RNA velocity modules to improve cell fate prediction. Nat. Methods 22, 698–707 (2025).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  234. Qiu, X. et al. Mapping transcriptomic vector fields of single cells. Cell 185, 690–711.e45 (2022).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  235. Tong, A., Huang, J., Wolf, G., van Dijk, D. & Krishnaswamy, S. Trajectorynet: a dynamic optimal transport network for modeling cellular dynamics. Proc. Mach. Learn. Res. 119, 9526–9536 (2020).

    PubMed  PubMed Central  Google Scholar 

  236. Alatkar, S. A. & Wang, D. ARTEMIS integrates autoencoders and Schrödinger Bridges to predict continuous dynamics of gene expression, cell population, and perturbation from time-series single-cell data. Bioinformatics 41, i189–i197 (2025).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  237. Somnath, V. R. et al. Aligned diffusion Schrödinger bridges. In Proc. 39th Conference on Uncertainty in Artificial Intelligence 1985–1995 (PMLR, 2023).

  238. Zhang, Z., Li, T. & Zhou, P. Learning stochastic dynamics from snapshots through regularized unbalanced optimal transport. In Proc. 13th International Conference on Learning Representations (ICLR, 2025).

  239. Yeo, G. H. T., Saksena, S. D. & Gifford, D. K. Generative modeling of single-cell time series with PRESCIENT enables prediction of cell trajectories with interventions. Nat. Commun. 12, 3222 (2021).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  240. Luo, E., Hao, M., Wei, L. & Zhang, X. scDiffusion: conditional generation of high-quality single-cell data using diffusion model. Bioinformatics 40, btae518 (2024).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  241. Luecken, M. D. & Theis, F. J. Current best practices in single-cell RNA-seq analysis: a tutorial. Mol. Syst. Biol. 15, e8746 (2019).

    Article  PubMed  PubMed Central  Google Scholar 

  242. Huang, S., Soto, A. M. & Sonnenschein, C. The end of the genetic paradigm of cancer. PLoS Biol. 23, e3003052 (2025).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  243. Szałata, A. et al. A benchmark for prediction of transcriptomic responses to chemical perturbations across cell types. In Proc. 38th International Conference on Neural Information Processing Systems (eds Globerson, A. et al.) 20566–20616 (Curran, 2024).

  244. Kernfeld, E., Keener, R., Cahan, P. & Battle, A. Transcriptome data are insufficient to control false discoveries in regulatory network inference. Cell Syst. 15, 709–724.e13 (2024).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  245. Caranzano, I. et al. Sparsity is all you need: rethinking biological pathway-informed approaches in deep learning. Preprint at https://doi.org/10.48550/arXiv.2505.04300 (2025).

  246. Radig, J. et al. Tracking biological hallucinations in single-cell perturbation predictions using scArchon, a comprehensive benchmarking platform. Preprint at bioRxiv https://doi.org/10.1101/2025.06.23.661046 (2025).

  247. Kedzierska, K. Z., Crawford, L., Amini, A. P. & Lu, A. X. Zero-shot evaluation reveals limitations of single-cell foundation models. Genome Biol. 26, 101 (2025).

    Article  PubMed  PubMed Central  Google Scholar 

  248. Mejia, G. M. et al. Diversity by design: addressing mode collapse improves scRNA-seq perturbation modeling on well-calibrated metrics. In ICML 2025 Generative AI and Biology Workshop (ICML, 2025).

  249. Mahmood, F. A benchmarking crisis in biomedical machine learning. Nat. Med. 31, 1060 (2025).

    Article  CAS  PubMed  Google Scholar 

  250. Ji, Y. et al. Optimal distance metrics for single-cell RNA-seq populations. Preprint at bioRxiv https://doi.org/10.1101/2023.12.26.572833 (2023).

  251. Luecken, M. D. et al. Defining and benchmarking open problems in single-cell analysis. Nat. Biotechnol. 43, 1035–1040 (2025).

    Article  CAS  PubMed  Google Scholar 

  252. Roohani, Y. H. et al. Virtual Cell Challenge: toward a Turing test for the virtual cell. Cell 188, 3370–3374 (2025).

    Article  CAS  PubMed  Google Scholar 

  253. Heumos, L. et al. Pertpy: an end-to-end framework for perturbation analysis. Preprint at bioRxiv https://doi.org/10.1101/2024.08.04.606516 (2024).

  254. CZI Cell Science Program et al. CZ CELLxGENE Discover: a single-cell data platform for scalable exploration, analysis and modeling of aggregated data. Nucleic Acids Res. 53, D886–D900 (2025).

    Article  Google Scholar 

  255. Youngblut, N. D. et al. scBaseCamp: an AI agent-curated, uniformly processed, and continually expanding single cell data repository. Preprint at bioRxiv https://doi.org/10.1101/2025.02.27.640494 (2025).

  256. Roohani, Y. et al. BioDiscoveryAgent: an AI agent for designing genetic perturbation experiments. The 13th International Conference on Learning Representations (ICLR, 2024).

  257. Weinreb, C., Rodriguez-Fraticelli, A., Camargo, F. D. & Klein, A. M. Lineage tracing on transcriptional landscapes links state to fate during differentiation. Science 367, eaaw3381 (2020).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  258. Chen, W. et al. Live-seq enables temporal transcriptomic recording of single cells. Nature 608, 733–740 (2022).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  259. Kobayashi-Kirschvink, K. J. et al. Prediction of single-cell RNA expression profiles in live cells by Raman microscopy with Raman2RNA. Nat. Biotechnol. 42, 1726–1734 (2024).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  260. Reynolds, D. E. et al. Temporal and spatial omics technologies for 4D profiling. Nat. Methods 22, 1408–1419 (2025).

    Article  CAS  PubMed  Google Scholar 

  261. Gu, J. et al. Mapping multimodal phenotypes to perturbations in cells and tissue with CRISPRmap. Nat. Biotechnol. 43, 1101–1115 (2025).

    Article  CAS  PubMed  Google Scholar 

  262. Dhainaut, M. et al. Spatial CRISPR genomics identifies regulators of the tumor microenvironment. Cell 185, 1223–1239.e20 (2022).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  263. Saunders, R. A. et al. A platform for multimodal in vivo pooled genetic screens reveals regulators of liver function. Preprint at bioRxiv https://doi.org/10.1101/2024.11.18.624217 (2025).

  264. Breinig, M. et al. Integrated in vivo combinatorial functional genomics and spatial transcriptomics of tumours to decode genotype-to-phenotype relationships. Nat. Biomed. Eng. https://doi.org/10.1038/s41551-025-01437-1 (2025).

    Article  PubMed  Google Scholar 

  265. Metzner, E., Southard, K. M. & Norman, T. M. Multiome Perturb-seq unlocks scalable discovery of integrated perturbation effects on the transcriptome and epigenome. Cell Syst. 16, 101161 (2025).

    Article  CAS  PubMed  Google Scholar 

  266. Mimitou, E. P. et al. Multiplexed detection of proteins, transcriptomes, clonotypes and CRISPR perturbations in single cells. Nat. Methods 16, 409–412 (2019).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  267. Ryu, J., Lopez, R., Bunne, C., Pinello, L. & Regev, A. Cross-modality matching and prediction of perturbation responses with labeled Gromov-Wasserstein optimal transport. In ICML 2024 AI for Science Workshop (ICML, 2024).

  268. Wenckstern, J. et al. AI-powered virtual tissues from spatial proteomics for clinical diagnostics and biomedical discovery. In Proc. Learning Meaningful Representations of Life (LMRL) Workshop at ICLR (ICLR, 2025).

  269. Chen, W. et al. A visual-omics foundation model to bridge histopathology with spatial transcriptomics. Nat. Methods 22, 1568–1582 (2025).

    Article  PubMed  PubMed Central  Google Scholar 

  270. Rizvi, S. A. et al. Scaling large language models for next-generation single-cell analysis. Preprint at bioRxiv https://doi.org/10.1101/2025.04.14.648850 (2025).

  271. Ji, Y. et al. Scalable and universal prediction of cellular phenotypes. Preprint at bioRxiv https://doi.org/10.1101/2024.08.12.607533 (2025).

  272. Gupta, A. et al. SubCell: vision foundation models for microscopy capture single-cell biology. Preprint at bioRxiv https://doi.org/10.1101/2024.12.06.627299 (2025).

  273. Maan, H. et al. Multi-modal disentanglement of spatial transcriptomics and histopathology imaging. In Learning Meaningful Representations of Life (LMRL) Workshop at ICLR (ICLR, 2025).

  274. Datlinger, P. et al. Pooled CRISPR screening with single-cell transcriptome readout. Nat. Methods 14, 297–301 (2017).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  275. Lalli, M. A., Avey, D., Dougherty, J. D., Milbrandt, J. & Mitra, R. D. High-throughput single-cell functional elucidation of neurodevelopmental disease-associated genes reveals convergent mechanisms altering neuronal differentiation. Genome Res. 30, 1317–1331 (2020).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  276. Huguet, G. et al. Manifold interpolating optimal-transport flows for trajectory inference. Adv. Neural Inf. Process. Syst. 35, 29705–29718 (2022).

    PubMed  PubMed Central  Google Scholar 

  277. Wang, S.-W., Herriges, M. J., Hurley, K., Kotton, D. N. & Klein, A. M. CoSpar identifies early cell fate biases from single-cell transcriptomic and lineage information. Nat. Biotechnol. 40, 1066–1074 (2022).

    Article  CAS  PubMed  Google Scholar 

  278. Heimberg, G., Bhatnagar, R., El-Samad, H. & Thomson, M. Low dimensionality in gene expression data enables the accurate extraction of transcriptional programs from shallow sequencing. Cell Syst. 2, 239–250 (2016).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  279. VanderWeele, T. J. & Shpitser, I. On the definition of a confounder. Ann. Stat. 41, 196–220 (2013).

    Article  PubMed  PubMed Central  Google Scholar 

  280. Fröhlich, F. et al. Efficient parameter estimation enables the prediction of drug response using a mechanistic pan-cancer pathway model. Cell Syst. 7, 567–579.e6 (2018).

    Article  PubMed  Google Scholar 

  281. Cuturi, M. et al. Optimal Transport Tools (OTT): a JAX toolbox for all things Wasserstein. Preprint at https://doi.org/10.48550/arXiv.2201.12324 (2022).

Download references

Acknowledgements

The authors thank S. Müller-Dott, P. S. L. Schäfer, P. Rodriguez Mier, A. Moeed, M. Garrido Rodriguez-Cordoba, R. O. Ramirez Flores, R. Abdulhamid and J. Saez-Rodriguez for their feedback on the initial draft. The authors’ work is supported through state funds approved by the State Parliament of Baden-Württemberg for the Innovation Campus Health + Life Science alliance Heidelberg Mannheim, the Data Science Collaborative Research Programme 2022 by the Novo Nordisk Foundation (grant NNF22OC0076414), and the European Research Council (Synergy Grant DECODE 810296). The authors also acknowledge funding from GSK through the EMBL-GSK collaboration framework (3000038350).

Author information


Contributions

D.D., S.S. and M.R. researched the literature. D.D., S.S. and O.S. contributed substantially to discussions of the content. All authors wrote the article and reviewed and/or edited the manuscript.

Corresponding authors

Correspondence to Daniel Dimitrov, Stefan Schrod or Oliver Stegle.

Ethics declarations

Competing interests

O.S. is a paid consultant of Insitro. The other authors declare no competing interests.

Peer review

Peer review information

Nature Reviews Genetics thanks the anonymous reviewer(s) for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Related links

Online resource: https://interp-extrap-perturb.readthedocs.io

Glossary

Agentic workflows

A computational process in which multiple task-specific models (agents) autonomously collaborate to plan and execute a sequence of tasks, attempting to achieve a complex common objective with minimal human intervention.

Autoencoders

Neural networks that learn a (typically compressed, low-dimensional) representation (encoding) of input data and then reconstruct (decode) the original input from this encoding.
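As a minimal sketch of the encode/decode idea, the NumPy example below trains a linear autoencoder (no nonlinearities, a simplifying assumption) with plain gradient descent on toy data. The data, the bottleneck width k and the learning rate lr are illustrative choices, not taken from any published method; single-cell autoencoders often use nonlinear layers and count-based likelihoods instead of squared error.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data (assumption): 200 "cells" x 10 "genes" driven by 2 hidden factors plus noise.
Z_true = rng.normal(size=(200, 2))
X = Z_true @ rng.normal(size=(2, 10)) + 0.1 * rng.normal(size=(200, 10))
X = X - X.mean(axis=0)  # centre each gene

k = 2                                   # width of the bottleneck (the encoding)
W_enc = 0.1 * rng.normal(size=(10, k))  # encoder: 10 genes -> 2 latent dimensions
W_dec = 0.1 * rng.normal(size=(k, 10))  # decoder: 2 latent dimensions -> 10 genes


def reconstruction_loss(X, W_enc, W_dec):
    """Mean squared error between the input and its reconstruction."""
    return np.mean((X @ W_enc @ W_dec - X) ** 2)


initial_loss = reconstruction_loss(X, W_enc, W_dec)
lr = 0.05
for _ in range(2000):
    H = X @ W_enc                   # encode
    X_hat = H @ W_dec               # decode
    G = 2 * (X_hat - X) / X.size    # gradient of the loss w.r.t. X_hat
    grad_dec = H.T @ G              # chain rule through the decoder
    grad_enc = X.T @ (G @ W_dec.T)  # chain rule through the encoder
    W_dec -= lr * grad_dec
    W_enc -= lr * grad_enc
final_loss = reconstruction_loss(X, W_enc, W_dec)
```

Because the toy data have only two underlying factors, a two-dimensional encoding is enough to reconstruct most of the signal.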

Causal graph models

Statistical models that represent cause-and-effect relationships through a structured graph in which variables are represented by nodes and causal influences by directed edges.
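A causal graph can be stored as a mapping from each node to its parents (its direct causes). The sketch below assumes a hypothetical three-gene graph with made-up linear edge weights and uses the standard-library graphlib module to order the nodes so that every cause is simulated before its effects (ancestral sampling from a linear structural causal model).

```python
import numpy as np
from graphlib import TopologicalSorter

# Hypothetical graph: TF -> target1, TF -> target2, target1 -> target2.
# Each key lists its parents, i.e. the sources of its incoming directed edges.
parents = {
    "TF": [],
    "target1": ["TF"],
    "target2": ["TF", "target1"],
}
# Illustrative edge weights for a linear structural causal model.
weights = {("TF", "target1"): 0.8, ("TF", "target2"): -0.5, ("target1", "target2"): 1.2}


def sample(n_cells, rng):
    """Ancestral sampling: simulate each node after all of its parents."""
    order = list(TopologicalSorter(parents).static_order())
    data = {}
    for node in order:
        value = rng.normal(size=n_cells)  # independent noise term for this node
        for parent in parents[node]:
            value = value + weights[(parent, node)] * data[parent]
        data[node] = value
    return order, data


order, data = sample(1000, np.random.default_rng(0))
```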

Causal mechanisms

Directed, causal interactions between specific molecules through which signals propagate.

Causal signatures

A set of observable variables that reflect the underlying causal processes, such as perturbations, cellular heterogeneity, regulatory layers, and temporal and spatial scales.

Conditional independence

The property of two variables that provide no information about each other once other variables are accounted for (conditioned on).
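For linear Gaussian variables, conditional independence can be checked with a partial correlation: correlate what remains of two variables after regressing out the conditioning variable. The toy chain X → Y → Z below is an assumption chosen for illustration; X and Z are correlated, yet become approximately uncorrelated once Y is accounted for. Nonlinear or non-Gaussian data require other conditional independence tests.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5000
# Chain X -> Y -> Z: X influences Z only through Y.
x = rng.normal(size=n)
y = 0.9 * x + rng.normal(size=n)
z = 0.9 * y + rng.normal(size=n)


def residualize(a, b):
    """Residuals of a after a least-squares fit on b (with intercept)."""
    B = np.column_stack([np.ones_like(b), b])
    coef, *_ = np.linalg.lstsq(B, a, rcond=None)
    return a - B @ coef


marginal_corr = np.corrcoef(x, z)[0, 1]                                # clearly non-zero
partial_corr = np.corrcoef(residualize(x, y), residualize(z, y))[0, 1]  # close to zero
```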

Confounders

Extraneous factors that, if not controlled for, can produce misleading or spurious associations between variables of interest.
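The simulation below, with a hypothetical confounder c driving two otherwise unrelated variables, shows how ignoring a confounder produces a spurious association and how including it as a covariate in a linear regression removes it. Regression adjustment is only one way of controlling for a confounder, and it assumes the confounder is observed and acts linearly.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5000
# Hypothetical confounder c (e.g. cell-cycle stage) drives both variables; there is no x -> y edge.
c = rng.normal(size=n)
x = 1.0 * c + rng.normal(size=n)
y = 1.0 * c + rng.normal(size=n)


def ols(outcome, *covariates):
    """Ordinary least-squares coefficients, intercept first."""
    design = np.column_stack([np.ones_like(outcome), *covariates])
    coef, *_ = np.linalg.lstsq(design, outcome, rcond=None)
    return coef


naive_effect = ols(y, x)[1]        # confounder ignored: spurious association (about 0.5)
adjusted_effect = ols(y, x, c)[1]  # confounder controlled for (about 0)
```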

Counterfactual

A hypothetical outcome representing what would have occurred under alternative conditions or different interventions from those actually observed.
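In a structural causal model, a counterfactual for an individual cell is commonly computed in three steps: abduction (infer the cell's noise terms from what was observed), action (set the intervened variable to its alternative value) and prediction. The sketch below assumes a hypothetical one-edge linear model with a made-up coefficient beta.

```python
# Hypothetical linear structural causal model: Y := beta * X + U_y.
beta = 2.0


def counterfactual_y(x_obs, y_obs, x_new, beta=beta):
    # 1. Abduction: infer this cell's noise term from what was actually observed.
    u_y = y_obs - beta * x_obs
    # 2. Action: replace the observed X by the alternative value, do(X = x_new).
    # 3. Prediction: propagate the same noise term through the modified model.
    return beta * x_new + u_y


print(counterfactual_y(x_obs=1.0, y_obs=3.0, x_new=0.0))  # 1.0
```

Keeping the inferred noise term fixed is what distinguishes a counterfactual for this particular cell from a population-level interventional prediction.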

Diffusion models

A class of generative models that progressively add noise to data and learn to reverse this process, generating new data by modelling complex probability distributions.
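The forward (noising) half of a denoising diffusion model has a closed form, x_t = sqrt(ᾱ_t) x_0 + sqrt(1 - ᾱ_t) ε. The sketch below assumes the linear variance schedule used in denoising diffusion probabilistic models (DDPM) and toy data, and shows that after enough steps the data are indistinguishable from standard Gaussian noise. The learned reverse (denoising) network, where the actual modelling happens, is not shown.

```python
import numpy as np

rng = np.random.default_rng(3)
T = 1000
betas = np.linspace(1e-4, 0.02, T)    # linear noise schedule (assumption, as in DDPM)
alpha_bars = np.cumprod(1.0 - betas)  # alpha_bar_t = product over s <= t of (1 - beta_s)


def noise_to_step(x0, t, rng):
    """Sample x_t ~ N(sqrt(alpha_bar_t) * x0, (1 - alpha_bar_t) * I) in one shot."""
    eps = rng.normal(size=x0.shape)
    x_t = np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * eps
    return x_t, eps  # eps is the usual regression target for the denoising network


# Toy "expression profiles" far from standard normal: mean 5, small spread.
x0 = 5.0 + 0.1 * rng.normal(size=(2000, 20))
x_T, _ = noise_to_step(x0, T - 1, rng)
```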

Embeddings

Low-dimensional vector (or matrix) representations of an entity, such as a sample, feature or condition, that capture its relevant properties and relationships.

Factor models

Statistical models that represent observed variables as linear combinations of lower-dimensional latent factors plus noise, in which each factor captures shared variation among the variables.

Gene programmes

A coordinated set of genes that represent shared biological functions and responses.

Generalize

To maintain performance and validity across datasets or conditions beyond those used during development or training, indicating robustness and broader applicability.

Generative models

Models designed to learn the underlying distribution of a dataset in order to generate new, similar data from it.

Identifiable

A model’s parameters or solutions are identifiable if they can be uniquely determined from the available data under the assumed model.
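Factor models illustrate why identifiability matters. For a model X ≈ ZW, any rotation R of the latent space gives factors ZR and loadings RᵀW that reproduce the data exactly, so the individual factors are identifiable only up to rotation unless further constraints (for example, sparsity or prior knowledge) are imposed. The sketch uses random toy matrices and an arbitrary angle theta.

```python
import numpy as np

rng = np.random.default_rng(4)
Z = rng.normal(size=(100, 2))  # latent factors for 100 cells
W = rng.normal(size=(2, 30))   # loadings onto 30 genes

# Any rotation R of the latent space...
theta = 0.7
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta), np.cos(theta)]])
Z_alt, W_alt = Z @ R, R.T @ W

# ...reproduces exactly the same data, because R @ R.T is the identity,
# so the data alone cannot distinguish Z from Z_alt.
same_fit = np.allclose(Z @ W, Z_alt @ W_alt)
different_factors = not np.allclose(Z, Z_alt)
```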

Interventions

Deliberate actions that manipulate a biological variable or process within a system in order to observe the resulting effects.

Latent spaces

Abstract representations of the data that capture the essential features and relationships in low dimensions.

Latent variable

A hidden or unobservable variable that cannot be measured directly but is inferred from observable data, ideally representing the underlying factors or structures influencing the observed measurements.

Optimal transport

A method used to pair distributions of cells (for example, control and perturbed) in a cost-efficient way, while preserving overall mass.
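A common way to compute optimal transport between two groups of cells is the entropy-regularized Sinkhorn algorithm, which alternately rescales the rows and columns of a kernel matrix until both marginals (the total mass of each group) are matched. The NumPy sketch below uses toy two-dimensional "cells", a squared-Euclidean cost and an illustrative regularization strength epsilon; it is not the API of OTT or of any other toolbox.

```python
import numpy as np


def sinkhorn(a, b, cost, epsilon=0.1, n_iters=1000):
    """Entropy-regularized optimal transport between weight vectors a and b.

    Returns a coupling P whose rows sum to a and columns sum to b
    (mass preservation), trading transport cost against entropy.
    """
    K = np.exp(-cost / epsilon)  # Gibbs kernel
    u = np.ones_like(a)
    v = np.ones_like(b)
    for _ in range(n_iters):
        u = a / (K @ v)    # rescale rows to match a
        v = b / (K.T @ u)  # rescale columns to match b
    return u[:, None] * K * v[None, :]


rng = np.random.default_rng(5)
control = rng.normal(0.0, 1.0, size=(50, 2))    # toy control cells in a 2-D embedding
perturbed = rng.normal(2.0, 1.0, size=(60, 2))  # toy perturbed cells, shifted
cost = ((control[:, None, :] - perturbed[None, :, :]) ** 2).sum(-1)
cost = cost / cost.max()                        # rescale for numerical stability
a = np.full(50, 1 / 50)                         # uniform mass on control cells
b = np.full(60, 1 / 60)                         # uniform mass on perturbed cells
P = sinkhorn(a, b, cost)
```

Each row of P describes how one control cell's mass is spread over perturbed cells; the entropic term makes this pairing soft rather than one-to-one.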

Ordinary differential equations

Equations, or systems of equations, that describe the rate of change of a quantity (for example, the change in RNA abundance governed by transcription and degradation rates).
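As an example of the kind of ODE system used in RNA velocity models, the sketch below integrates a commonly used transcription, splicing and degradation model, du/dt = α - βu and ds/dt = βu - γs, for unspliced (u) and spliced (s) RNA with the forward Euler method. The rate constants are made-up values; the system settles at the steady state u = α/β and s = α/γ.

```python
# Hypothetical rate constants (arbitrary units).
alpha, beta, gamma = 2.0, 1.0, 0.5  # transcription, splicing, degradation


def derivatives(u, s):
    du = alpha - beta * u      # unspliced: produced by transcription, removed by splicing
    ds = beta * u - gamma * s  # spliced: produced by splicing, removed by degradation
    return du, ds


def simulate(t_end=30.0, dt=0.01):
    """Forward-Euler integration starting from zero RNA."""
    u, s = 0.0, 0.0
    for _ in range(int(t_end / dt)):
        du, ds = derivatives(u, s)
        u, s = u + dt * du, s + dt * ds
    return u, s


u_end, s_end = simulate()  # close to (alpha / beta, alpha / gamma) = (2.0, 4.0)
```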

Perturbations

Disturbances or deviations from a system’s normal or steady state, which can be intentional or unintentional.

Prior knowledge

Information about a biological system, such as molecular interactions, pathways or phenotypic relationships, collected or estimated from diverse experiments and data modalities.

Pseudotime

An estimate that orders cells along a continuous trajectory, such as differentiation, by using the similarities in their gene expression profiles.
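One simple way to compute a pseudotime, sketched below under illustrative assumptions, is to build a k-nearest-neighbour graph of cells and use each cell's shortest-path (geodesic) distance from a chosen root cell. Published methods such as diffusion pseudotime use more robust distances; the toy curved trajectory, the value of k and the root choice here are assumptions.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import shortest_path

rng = np.random.default_rng(6)
# Toy trajectory: 200 cells along a curved 1-D path in 2-D, with a hidden "true" time.
true_time = np.sort(rng.uniform(0, 1, size=200))
X = np.column_stack([true_time, np.sin(3 * true_time)]) + 0.01 * rng.normal(size=(200, 2))

# k-nearest-neighbour graph with edges weighted by Euclidean distance.
k = 15
dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
neighbours = np.argsort(dists, axis=1)[:, 1:k + 1]  # skip column 0 (the cell itself)
rows = np.repeat(np.arange(len(X)), k)
cols = neighbours.ravel()
graph = csr_matrix((dists[rows, cols], (rows, cols)), shape=dists.shape)

# Pseudotime = geodesic distance from the root cell, rescaled to [0, 1].
root = 0  # assumption: the root is known (here, the earliest cell)
geodesic = shortest_path(graph, directed=False, indices=root)
pseudotime = geodesic / geodesic.max()
```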

RNA velocity

An estimate of the time derivative of gene expression states, commonly calculated by comparing the abundances of unspliced and spliced messenger RNAs.
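Under the steady-state model of RNA velocity, the degradation rate γ (relative to the splicing rate) is estimated by regressing unspliced on spliced counts, and a cell's velocity is the residual u - γs. The sketch below uses simulated counts for one hypothetical gene and fits γ on all cells for simplicity, whereas implementations typically restrict the fit to cells at the extremes of the phase portrait.

```python
import numpy as np

rng = np.random.default_rng(7)
gamma_true = 0.5
# Toy counts for one gene: most cells near steady state (u close to gamma * s)...
s = rng.uniform(1, 10, size=300)
u = gamma_true * s + 0.05 * rng.normal(size=300)
# ...plus a group of cells being induced, with excess unspliced RNA.
u[:30] += 2.0

# Steady-state fit: least squares through the origin, u ~ gamma * s.
gamma_hat = (s @ u) / (s @ s)
velocity = u - gamma_hat * s  # > 0: gene being up-regulated; < 0: down-regulated
```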

Spurious correlations

Associations between pairs of variables that seem to be causal but arise solely from coincidence or from the influence of a third variable linking them.

Supervised

A machine learning paradigm in which a model is trained on input features paired with known labels or outcomes.

Transformer

A neural network architecture based on attention that processes data by computing pairwise relationships between elements in parallel.
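The core operation of a transformer is scaled dot-product attention, softmax(QKᵀ/√d)V, which scores every pair of elements (tokens) in a single matrix product. The single-head NumPy sketch below uses random projection matrices and treats five hypothetical gene tokens from one cell as input; real models add multiple heads, learned weights, positional or gene-identity embeddings, and feed-forward layers.

```python
import numpy as np


def self_attention(X, W_q, W_k, W_v):
    """Single-head scaled dot-product self-attention over the rows of X."""
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    scores = Q @ K.T / np.sqrt(K.shape[1])                 # all pairwise query-key scores at once
    scores = scores - scores.max(axis=1, keepdims=True)    # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)  # softmax over each row
    return weights @ V, weights


rng = np.random.default_rng(8)
n_tokens, d_model, d_head = 5, 16, 8  # e.g. 5 gene tokens from one cell (illustrative)
X = rng.normal(size=(n_tokens, d_model))
W_q, W_k, W_v = (rng.normal(size=(d_model, d_head)) for _ in range(3))
out, attn = self_attention(X, W_q, W_k, W_v)
```

Each row of attn holds one token's attention weights over all tokens, which is how every element can draw on every other in parallel.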

Unsupervised

A machine learning paradigm in which a model learns from input data without access to known labels or categories.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Dimitrov, D., Schrod, S., Rohbeck, M. et al. Interpretation, extrapolation and perturbation of single cells. Nat Rev Genet (2026). https://doi.org/10.1038/s41576-025-00920-4

  • DOI: https://doi.org/10.1038/s41576-025-00920-4
