Interpretation, extrapolation and perturbation of single cells

Dimitrov, Daniel; Schrod, Stefan; Rohbeck, Martin; Stegle, Oliver

doi:10.1038/s41576-025-00920-4

Review Article
Published: 02 January 2026

Nature Reviews Genetics (2026)

Abstract

Single-cell analyses have transitioned from descriptive atlasing towards inferring causal effects and mechanistic relationships that capture cellular logic. Technological advances and the growing scale of observational and interventional datasets have fuelled the development of machine learning methods aimed at identifying such dependencies and extrapolating perturbation effects. Here, we review and connect these approaches according to their modelling concepts (including representation learning, causal inference, mechanistic discovery, disentanglement and population tracing), underlying assumptions and downstream tasks. We propose a unifying ontology to guide practitioners in selecting the most suitable methods for a given biological question, with detailed technical descriptions provided in an online resource. Finally, we identify promising computational directions and underexplored data properties that could pave the way for future developments.

**Fig. 1: From perturbations to modelling causal cellular processes.**

**Fig. 2: The overarching aims of causal and mechanistic modelling.**

**Fig. 3: An ontology for modelling alterations and response.**

Märtens, K., Donovan-Maiye, R. & Ferkinghoff-Borg, J. Enhancing generative perturbation models with LLM-informed gene embeddings. In Proc. Workshop on Machine Learning for Genomics Explorations (ICLR, 2024).
Klein, D. et al. CellFlow enables generative single-cell phenotype modeling with flow matching. Preprint at bioRxiv https://doi.org/10.1101/2025.04.11.648220 (2025).
Hetzel, L. et al. Predicting cellular responses to novel drug perturbations at a single-cell resolution. In Advances in Neural Information Processing Systems 35, 26711–26722 (NeurIPS, 2025).
Badia-i-Mompel, P. et al. Comparison and evaluation of methods to infer gene regulatory networks from multimodal single-cell data. Preprint at bioRxiv https://doi.org/10.1101/2024.12.20.629764 (2025).
Hasanaj, E. et al. Multimodal benchmarking of foundation model representations for cellular perturbation response prediction. Preprint at bioRxiv https://doi.org/10.1101/2025.06.26.661186 (2025).
Szalai, B. & Saez-Rodriguez, J. Why do pathway methods work better than they should? FEBS Lett. 594, 4189–4200 (2020).
Article CAS PubMed Google Scholar
Barabási, A.-L. & Oltvai, Z. N. Network biology: understanding the cell’s functional organization. Nat. Rev. Genet. 5, 101–113 (2004).
Article PubMed Google Scholar
Pratapa, A., Jalihal, A. P., Law, J. N., Bharadwaj, A. & Murali, T. M. Benchmarking algorithms for gene regulatory network inference from single-cell transcriptomic data. Nat. Methods 17, 147–154 (2020).
Article CAS PubMed PubMed Central Google Scholar
Gao, S. & Wang, X. Quantitative utilization of prior biological knowledge in the Bayesian network modeling of gene expression data. BMC Bioinform. 12, 359 (2011).
Article Google Scholar
Huynh-Thu, V. A., Irrthum, A., Wehenkel, L. & Geurts, P. Inferring regulatory networks from expression data using tree-based methods. PLoS One 5, e12776 (2010).
Article PubMed PubMed Central Google Scholar
Aibar, S. et al. SCENIC: single-cell regulatory network inference and clustering. Nat. Methods 14, 1083–1086 (2017).
Article CAS PubMed PubMed Central Google Scholar
Wang, L. et al. Dictys: dynamic gene regulatory network dissects developmental continuum with single-cell multiomics. Nat. Methods 20, 1368–1378 (2023).
Article CAS PubMed Google Scholar
Dong, M. & Kluger, Y. GEASS: neural causal feature selection for high-dimensional biological data. In Proc. 11th International Conference on Learning Representations (ICLR, 2023).
Wang, W. et al. RegVelo: gene-regulatory-informed dynamics of single cells. Preprint at bioRxiv https://doi.org/10.1101/2024.12.11.627935 (2024).
Tanevski, J. et al. Learning tissue representation by identification of persistent local patterns in spatial omics data. Nat. Commun. 16, 4071 (2025).
Article CAS PubMed PubMed Central Google Scholar
Tanevski, J., Flores, R. O. R., Gabor, A., Schapiro, D. & Saez-Rodriguez, J. Explainable multiview framework for dissecting spatial relationships from highly multiplexed data. Genome Biol. 23, 97 (2022).
Article PubMed PubMed Central Google Scholar
Megas, S. et al. Estimation of single-cell and tissue perturbation effect in spatial transcriptomics via spatial causal disentanglement. In Proc. 13th International Conference on Learning Representations (ICLR, 2024).
Wen, Y. et al. Applying causal discovery to single-cell analyses using CausalCell. eLife 12, e81464 (2023).
Article CAS PubMed PubMed Central Google Scholar
Belyaeva, A., Squires, C. & Uhler, C. DCI: learning causal differences between gene regulatory networks. Bioinformatics 37, 3067–3069 (2021).
Article CAS PubMed PubMed Central Google Scholar
Chevalley, M., Roohani, Y. H., Mehrjou, A., Leskovec, J. & Schwab, P. A large-scale benchmark for network inference from single-cell perturbation data. Commun. Biol. 8, 412 (2025).
Article PubMed PubMed Central Google Scholar
Zheng, X., Dan, C., Aragam, B., Ravikumar, P. & Xing, E. Learning sparse nonparametric DAGs. In Proc. 23rd International Conference on Artificial Intelligence and Statistics (eds Chiappa, S. & Calandra, R.) 3414–3425 (PMLR, 2020).
Yu, Y., Chen, J., Gao, T. & Yu, M. DAG-GNN: DAG structure learning with graph neural networks. In Proc. 36th International Conference on Machine Learning 7154–7163 (PMLR, 2019).
Wu, M., Bao, Y., Barzilay, R. & Jaakkola, T. Sample, estimate, aggregate: a recipe for causal discovery foundation models. In Transactions on Machine Learning Research (eds Kamath, G. et al.) 10 (TMLR, 2025).
Zhang, J., Cammarata, L., Squires, C., Sapsis, T. P. & Uhler, C. Active learning for optimal intervention design in causal models. Nat. Mach. Intell. 5, 1066–1075 (2023). This work introduces an early active learning scheme that uses a causal graph model to guide the experimental exploration of genetic perturbations.
Article Google Scholar
Lorch, L., Sussex, S., Rothfuss, J., Krause, A. & Schölkopf, B. In Proc. 36th International Conference on Neural Information Processing Systems (eds Koyejo, S. et al.) 13104–13118 (Curran, 2022).
Sethuraman, M. G. et al. NODAGS-Flow: nonlinear cyclic causal structure learning. In Proc. 26th International Conference on Artificial Intelligence and Statistics (eds Ruiz, F. et al.) 6371–6387 (PMLR, 2023).
Theodoris, C. V. et al. Transfer learning enables predictions in network biology. Nature 618, 616–624 (2023).
Article CAS PubMed PubMed Central Google Scholar
Luecken, M. D. et al. Benchmarking atlas-level data integration in single-cell genomics. Nat. Methods 19, 41–50 (2022).
Article CAS PubMed Google Scholar
Comon, P. Independent component analysis, a new concept? Signal Process. 36, 287–314 (1994).
Article Google Scholar
Hyvärinen, A. & Oja, E. Independent component analysis: algorithms and applications. Neural Netw. 13, 411–430 (2000).
Article PubMed Google Scholar
Yu, H. & Welch, J. D. MichiGAN: sampling from disentangled representations of single-cell data using generative adversarial networks. Genome Biol. 22, 158 (2021).
Article PubMed PubMed Central Google Scholar
Moran, G. E., Sridhar, D., Wang, Y. & Blei, D. Identifiable deep generative models via sparse decoding. In Transactions on Machine Learning Research (eds Kamath, G. et al.) 182 (TMLR, 2022).
Lopez, R., Regier, J., Jordan, M. I. & Yosef, N. Information constraints on auto-encoding variational Bayes. In Proc. 32nd International Conference on Neural Information Processing Systems (eds Bengio, S. et al.) 6117–6128 (Curran, 2018).
Gayoso, A. et al. A Python library for probabilistic analysis of single-cell omics data. Nat. Biotechnol. 40, 163–166 (2022). This work combines a series of variational autoencoder extensions that build on scVI into a centralized Python framework that aims to accelerate the development of probabilistic (autoencoder) models for single-cell omics data analysis.
Article CAS PubMed Google Scholar
Hyvärinen, A. & Pajunen, P. Nonlinear independent component analysis: Existence and uniqueness results. Neural Netw. 12, 429–439 (1999).
Article PubMed Google Scholar
Hyvärinen, A., Khemakhem, I. & Morioka, H. Nonlinear independent component analysis for principled disentanglement in unsupervised deep learning. Patterns 4, 100844 (2023).
Article PubMed PubMed Central Google Scholar
Lachapelle, S. et al. Disentanglement via mechanism sparsity regularization: a new principle for nonlinear ICA. In First Conference on Causal Learning and Reasoning (eds Schölkopf, B. et al.) 177, 428–484 (2022).
Zou, J. Y., Hsu, D. J., Parkes, D. C. & Adams, R. P. Contrastive learning using spectral methods. In Proc. 27th International Conference on Neural Information Processing Systems - Volume 2 (eds Burges, C. J. C. et al.) 2238–2246 (Curran, 2013).
Abid, A., Zhang, M. J., Bagaria, V. K. & Zou, J. Exploring patterns enriched in a dataset with contrastive principal component analysis. Nat. Commun. 9, 2134 (2018).
Article PubMed PubMed Central Google Scholar
Li, D., Jones, A. & Engelhardt, B. Probabilistic contrastive dimension reduction for case-control study data. Ann. Appl. Stat. 18, 2207–2229 (2024).
Article Google Scholar
Boileau, P., Hejazi, N. S. & Dudoit, S. Exploring high-dimensional biological data with sparse contrastive principal component analysis. Bioinformatics 36, 3422–3430 (2020).
Article CAS PubMed Google Scholar
Abid, A. & Zou, J. Contrastive variational autoencoder enhances salient features. Preprint at https://doi.org/10.48550/arXiv.1902.04601 (2019).
Severson, K. A., Ghosh, S. & Ng, K. Unsupervised learning with contrastive latent variable models. In Proc. AAAI Conference on Artificial Intelligence 33, 4862–4869 (AAAI, 2019).
Zhang, L. & Zhang, S. Learning common and specific patterns from data of multiple interrelated biological scenarios with matrix factorization. Nucleic Acids Res. 47, 6606–6617 (2019).
Article CAS PubMed PubMed Central Google Scholar
Qian, K., Fu, S., Li, H. & Li, W. V. scINSIGHT for interpreting single-cell gene expression from biologically heterogeneous data. Genome Biol. 23, 82 (2022).
Article CAS PubMed PubMed Central Google Scholar
Weinberger, E., Beebe-Wang, N. & Lee, S.-I. Moment matching deep contrastive latent variable models. In Proc. 25th International Conference on Artificial Intelligence and Statistics 2354–2371 (PMLR, 2022).
Zhang, Z., Zhao, X., Bindra, M., Qiu, P. & Zhang, X. scDisInFact: disentangled learning for integration and prediction of multi-batch multi-condition single-cell RNA-sequencing data. Nat. Commun. 15, 912 (2024).
Article CAS PubMed PubMed Central Google Scholar
Megas, S. et al. Integrating multi-covariate disentanglement with counterfactual analysis on synthetic data enables cell type discovery and counterfactual predictions. Preprint at bioRxiv https://doi.org/10.1101/2025.06.03.657578 (2025).
Inecik, K., Kara, A., Rose, A., Haniffa, M. & Theis, F. J. TarDis: achieving robust and structured disentanglement of multiple covariates. In Proc. Research in Computational Molecular Biology: 29th International Conference, RECOMB 2025 (ed. Sankararaman, S.) 285–289 (Springer, 2025).
Inecik, K., Uhlmann, A., Lotfollahi, M. & Theis, F. MultiCPA: multimodal compositional perturbation autoencoder. Preprint at bioRxiv https://doi.org/10.1101/2022.07.08.499049 (2022).
Wei, X., Dong, J. & Wang, F. scPreGAN, a deep generative model for predicting the response of single-cell expression to perturbation. Bioinformatics 38, 3377–3384 (2022).
Article CAS PubMed Google Scholar
Mao, H. et al. Learning identifiable factorized causal representations of cellular responses. In Advances in Neural Information Processing Systems 37 (NeurIPS 2024) (eds Globerson, A. et al.) 121630–121669 (NeurIPS, 2024).
Miladinovic, D. et al. In silico biological discovery with large perturbation models. Nat. Comput. Sci. 5, 1029–1040 (2025).
Article PubMed PubMed Central Google Scholar
Adduri, A. K. et al. Predicting cellular responses to perturbation across diverse contexts with State. Preprint at bioRxiv https://doi.org/10.1101/2025.06.26.661135 (2025).
Rampášek, L., Hidru, D., Smirnov, P., Haibe-Kains, B. & Goldenberg, A. Dr.VAE: improving drug response prediction via modeling of drug perturbation effects. Bioinformatics 35, 3743–3751 (2019).
Article PubMed PubMed Central Google Scholar
Zhang, J. et al. Identifiability guarantees for causal disentanglement from soft interventions. In Proc. 37th International Conference on Neural Information Processing Systems (eds Oh, A. et al.) 50254–50292 (Curran, 2023).
Wu, Y. et al. PerturBench: benchmarking machine learning models for cellular perturbation analysis. In NeurIPS 2024 Workshop on AI for New Drug Modalities (NeurIPS, 2024).
Liu, T. et al. scELMo: embeddings from language models are good learners for single-cell data analysis. Preprint at bioRxiv https://doi.org/10.1101/2023.12.07.569910 (2023).
Zhong, J., Li, L., Dannenfelser, R. & Yao, V. Benchmarking gene embeddings from sequence, expression, network, and text models for functional prediction tasks. Preprint at bioRxiv https://doi.org/10.1101/2025.01.29.635607 (2025).
Istrate, A.-M., Li, D. & Karaletsos, T. scGenePT: is language all you need for modeling single-cell perturbations? Preprint at bioRxiv https://doi.org/10.1101/2024.10.23.619972 (2024).
Wenteler, A. et al. PertEval-scFM: benchmarking single-cell foundation models for perturbation effect prediction. In 42nd International Conference on Machine Learning (ICML, 2025).
Csendes, G., Sanz, G., Szalay, K. Z. & Szalai, B. Benchmarking foundation cell models for post-perturbation RNA-seq prediction. BMC Genom. 26, 393 (2025).
Article Google Scholar
Kernfeld, E., Yang, Y., Weinstock, J. S., Battle, A. & Cahan, P. A comparison of computational methods for expression forecasting. Genome Biol. 26, 388 (2025).
Article PubMed PubMed Central Google Scholar
Viñas Torné, R. et al. Systema: a framework for evaluating genetic perturbation response prediction beyond systematic variation. Nat. Biotechnol. https://doi.org/10.1038/s41587-025-02777-8 (2025).
Article PubMed Google Scholar
Ahlmann-Eltze, C., Huber, W. & Anders, S. Deep-learning-based gene perturbation effect prediction does not yet outperform simple linear baselines. Nat. Methods 22, 1657–1661 (2025).
Article CAS PubMed PubMed Central Google Scholar
von Kügelgen, J., Ketterer, J., Shen, X., Meinshausen, N. & Peters, J. Representation learning for distributional perturbation extrapolation. In Learning Meaningful Representations of Life (LMRL) Workshop at ICLR (ICLR, 2025).
Carvalho, C. M. et al. High-dimensional sparse factor modeling: applications in gene expression genomics. J. Am. Stat. Assoc. 103, 1438–1456 (2008).
Article CAS PubMed PubMed Central Google Scholar
Liu, E., Zhang, J. & Uhler, C. Learning genetic perturbation effects with variational causal inference. Preprint at bioRxiv https://doi.org/10.1101/2025.06.05.657988 (2025).
Jiang, Q., Chen, S., Chen, X. & Jiang, R. scPRAM accurately predicts single-cell gene expression perturbation response based on attention mechanism. Bioinformatics 40, btae265 (2024).
Article CAS PubMed PubMed Central Google Scholar
Klein, D. et al. Mapping cells through time and space with moscot. Nature 638, 1065–1075 (2025).
Article CAS PubMed PubMed Central Google Scholar
Schiebinger, G. et al. Optimal-transport analysis of single-cell gene expression identifies developmental trajectories in reprogramming. Cell 176, 928–943.e22 (2019).
Article CAS PubMed PubMed Central Google Scholar
Kapuńniak, K. et al. Metric flow matching for smooth interpolations on the data manifold. In Proc. 38th International Conference on Neural Information Processing Systems (eds Globerson, A. et al.) 135011–135042 (Curran, 2024).
Tong, A. et al. Improving and generalizing flow-based generative models with minibatch optimal transport. In Transactions on Machine Learning Research (eds Kamath, G. et al.) 1768 (TMLR, 2024).
Erbe, R., Stein-O’Brien, G. & Fertig, E. J. Transcriptomic forecasting with neural ordinary differential equations. Patterns 4, 100793 (2023).
Article CAS PubMed PubMed Central Google Scholar
Palma, A. et al. Multi-modal and multi-attribute generation of single cells with CFGen. In Proc. 13th International Conference on Learning Representations (ICLR, 2025).
Yuan, B. et al. CellBox: interpretable machine learning for perturbation biology with application to the design of cancer combination therapy. Cell Syst. 12, 128–140.e4 (2021).
Article CAS PubMed Google Scholar
Aivazidis, A. et al. Cell2fate infers RNA velocity modules to improve cell fate prediction. Nat. Methods 22, 698–707 (2025).
Article CAS PubMed PubMed Central Google Scholar
Qiu, X. et al. Mapping transcriptomic vector fields of single cells. Cell 185, 690–711.e45 (2022).
Article CAS PubMed PubMed Central Google Scholar
Tong, A., Huang, J., Wolf, G., van Dijk, D. & Krishnaswamy, S. Trajectorynet: a dynamic optimal transport network for modeling cellular dynamics. Proc. Mach. Learn. Res. 119, 9526–9536 (2020).
PubMed PubMed Central Google Scholar
Alatkar, S. A. & Wang, D. ARTEMIS integrates autoencoders and Schrödinger Bridges to predict continuous dynamics of gene expression, cell population, and perturbation from time-series single-cell data. Bioinformatics 41, i189–i197 (2025).
Article CAS PubMed PubMed Central Google Scholar
Somnath, V. R. et al. Aligned diffusion Schrödinger bridges. In Proc. 39th Conference on Uncertainty in Artificial Intelligence 1985–1995 (PMLR, 2023).
Zhang, Z., Li, T. & Zhou, P. Learning stochastic dynamics from snapshots through regularized unbalanced optimal transport. In Proc. 13th International Conference on Learning Representations (ICLR, 2025).
Yeo, G. H. T., Saksena, S. D. & Gifford, D. K. Generative modeling of single-cell time series with PRESCIENT enables prediction of cell trajectories with interventions. Nat. Commun. 12, 3222 (2021).
Article CAS PubMed PubMed Central Google Scholar
Luo, E., Hao, M., Wei, L. & Zhang, X. scDiffusion: conditional generation of high-quality single-cell data using diffusion model. Bioinformatics 40, btae518 (2024).
Article CAS PubMed PubMed Central Google Scholar
Luecken, M. D. & Theis, F. J. Current best practices in single-cell RNA-seq analysis: a tutorial. Mol. Syst. Biol. 15, e8746 (2019).
Article PubMed PubMed Central Google Scholar
Huang, S., Soto, A. M. & Sonnenschein, C. The end of the genetic paradigm of cancer. PLoS Biol. 23, e3003052 (2025).
Article CAS PubMed PubMed Central Google Scholar
Szałata, A. et al. A benchmark for prediction of transcriptomic responses to chemical perturbations across cell types. In Proc. 38th International Conference on Neural Information Processing Systems (eds Globerson, A. et al.) 20566–20616 (Curran, 2024).
Kernfeld, E., Keener, R., Cahan, P. & Battle, A. Transcriptome data are insufficient to control false discoveries in regulatory network inference. Cell Syst. 15, 709–724.e13 (2024).
Article CAS PubMed PubMed Central Google Scholar
Caranzano, I. et al. Sparsity is all you need: rethinking biological pathway-informed approaches in deep learning. Preprint at https://doi.org/10.48550/arXiv.2505.04300 (2025).
Radig, J. et al. Tracking biological hallucinations in single-cell perturbation predictions using scArchon, a comprehensive benchmarking platform. Preprint at bioRxiv https://doi.org/10.1101/2025.06.23.661046 (2025).
Kedzierska, K. Z., Crawford, L., Amini, A. P. & Lu, A. X. Zero-shot evaluation reveals limitations of single-cell foundation models. Genome Biol. 26, 101 (2025).
Article PubMed PubMed Central Google Scholar
Mejia, G. M. et al. Diversity by design: addressing mode collapse improves scRNA-seq perturbation modeling on well-calibrated metrics. In ICML 2025 Generative AI and Biology Workshop (ICML, 2025).
Mahmood, F. A benchmarking crisis in biomedical machine learning. Nat. Med. 31, 1060 (2025).
Article CAS PubMed Google Scholar
Ji, Y. et al. Optimal distance metrics for single-cell RNA-seq populations. Preprint at bioRxiv https://doi.org/10.1101/2023.12.26.572833 (2023).
Luecken, M. D. et al. Defining and benchmarking open problems in single-cell analysis. Nat. Biotechnol. 43, 1035–1040 (2025).
Article CAS PubMed Google Scholar
Roohani, Y. H. et al. Virtual Cell Challenge: toward a Turing test for the virtual cell. Cell 188, 3370–3374 (2025).
Article CAS PubMed Google Scholar
Heumos, L. et al. Pertpy: an end-to-end framework for perturbation analysis. Preprint at bioRxiv https://doi.org/10.1101/2024.08.04.606516 (2024).
CZI Cell Science Program et al. CZ CELLxGENE Discover: a single-cell data platform for scalable exploration, analysis and modeling of aggregated data. Nucleic Acids Res. 53, D886–D900 (2025).
Article Google Scholar
Youngblut, N. D. et al. scBaseCamp: an AI agent-curated, uniformly processed, and continually expanding single cell data repository. Preprint at bioRxiv https://doi.org/10.1101/2025.02.27.640494 (2025).
Roohani, Y. et al. BioDiscoveryAgent: an AI agent for designing genetic perturbation experiments. The 13th International Conference on Learning Representations (ICLR, 2024).
Weinreb, C., Rodriguez-Fraticelli, A., Camargo, F. D. & Klein, A. M. Lineage tracing on transcriptional landscapes links state to fate during differentiation. Science 367, eaaw3381 (2020).
Article CAS PubMed PubMed Central Google Scholar
Chen, W. et al. Live-seq enables temporal transcriptomic recording of single cells. Nature 608, 733–740 (2022).
Article CAS PubMed PubMed Central Google Scholar
Kobayashi-Kirschvink, K. J. et al. Prediction of single-cell RNA expression profiles in live cells by Raman microscopy with Raman2RNA. Nat. Biotechnol. 42, 1726–1734 (2024).
Article CAS PubMed PubMed Central Google Scholar
Reynolds, D. E. et al. Temporal and spatial omics technologies for 4D profiling. Nat. Methods 22, 1408–1419 (2025).
Article CAS PubMed Google Scholar
Gu, J. et al. Mapping multimodal phenotypes to perturbations in cells and tissue with CRISPRmap. Nat. Biotechnol. 43, 1101–1115 (2025).
Article CAS PubMed Google Scholar
Dhainaut, M. et al. Spatial CRISPR genomics identifies regulators of the tumor microenvironment. Cell 185, 1223–1239.e20 (2022).
Article CAS PubMed PubMed Central Google Scholar
Saunders, R. A. et al. A platform for multimodal in vivo pooled genetic screens reveals regulators of liver function. Preprint at bioRxiv https://doi.org/10.1101/2024.11.18.624217 (2025).
Breinig, M. et al. Integrated in vivo combinatorial functional genomics and spatial transcriptomics of tumours to decode genotype-to-phenotype relationships. Nat. Biomed. Eng. https://doi.org/10.1038/s41551-025-01437-1 (2025).
Article PubMed Google Scholar
Metzner, E., Southard, K. M. & Norman, T. M. Multiome Perturb-seq unlocks scalable discovery of integrated perturbation effects on the transcriptome and epigenome. Cell Syst. 16, 101161 (2025).
Article CAS PubMed Google Scholar
Mimitou, E. P. et al. Multiplexed detection of proteins, transcriptomes, clonotypes and CRISPR perturbations in single cells. Nat. Methods 16, 409–412 (2019).
Article CAS PubMed PubMed Central Google Scholar
Ryu, J., Lopez, R., Bunne, C., Pinello, L. & Regev, A. Cross-modality matching and prediction of perturbation responses with labeled Gromov-Wasserstein optimal transport. In ICML 2024 AI for Science Workshop (ICML, 2024).
Wenckstern, J. et al. AI-powered virtual tissues from spatial proteomics for clinical diagnostics and biomedical discovery. In Proc. Learning Meaningful Representations of Life (LMRL) Workshop at ICLR (ICLR, 2025).
Chen, W. et al. A visual-omics foundation model to bridge histopathology with spatial transcriptomics. Nat. Methods 22, 1568–1582 (2025).
Article PubMed PubMed Central Google Scholar
Rizvi, S. A. et al. Scaling large language models for next-generation single-cell analysis. Preprint at bioRxiv https://doi.org/10.1101/2025.04.14.648850 (2025).
Ji, Y. et al. Scalable and universal prediction of cellular phenotypes. Preprint at bioRxiv https://doi.org/10.1101/2024.08.12.607533 (2025).
Gupta, A. et al. SubCell: vision foundation models for microscopy capture single-cell biology. Preprint at bioRxiv https://doi.org/10.1101/2024.12.06.627299 (2025).
Maan, H. et al. Multi-modal disentanglement of spatial transcriptomics and histopathology imaging. In Learning Meaningful Representations of Life (LMRL) Workshop at ICLR (ICLR, 2025).
Datlinger, P. et al. Pooled CRISPR screening with single-cell transcriptome readout. Nat. Methods 14, 297–301 (2017).
Article CAS PubMed PubMed Central Google Scholar
Lalli, M. A., Avey, D., Dougherty, J. D., Milbrandt, J. & Mitra, R. D. High-throughput single-cell functional elucidation of neurodevelopmental disease-associated genes reveals convergent mechanisms altering neuronal differentiation. Genome Res. 30, 1317–1331 (2020).
Article CAS PubMed PubMed Central Google Scholar
Huguet, G. et al. Manifold interpolating optimal-transport flows for trajectory inference. Adv. Neural Inf. Process. Syst. 35, 29705–29718 (2022).
PubMed PubMed Central Google Scholar
Wang, S.-W., Herriges, M. J., Hurley, K., Kotton, D. N. & Klein, A. M. CoSpar identifies early cell fate biases from single-cell transcriptomic and lineage information. Nat. Biotechnol. 40, 1066–1074 (2022).
Article CAS PubMed Google Scholar
Heimberg, G., Bhatnagar, R., El-Samad, H. & Thomson, M. Low dimensionality in gene expression data enables the accurate extraction of transcriptional programs from shallow sequencing. Cell Syst. 2, 239–250 (2016).
Article CAS PubMed PubMed Central Google Scholar
VanderWeele, T. J. & Shpitser, I. On the definition of a confounder. Ann. Stat. 41, 196–220 (2013).
Article PubMed PubMed Central Google Scholar
Fröhlich, F. et al. Efficient parameter estimation enables the prediction of drug response using a mechanistic pan-cancer pathway model. Cell Syst. 7, 567–579.e6 (2018).
Article PubMed Google Scholar
Cuturi, M. et al. Optimal Transport Tools (OTT): a JAX toolbox for all things Wasserstein. Preprint at https://doi.org/10.48550/arXiv.2201.12324 (2022).

Download references

Acknowledgements

The authors thank S. Müller-Dott, P. S. L. Schäfer, P. Rodriguez Mier, A. Moeed, M. Garrido Rodriguez-Cordoba, R. O. Ramirez Flores, R. Abdulhamid and J. Saez-Rodriguez for their feedback on the initial draft. The authors’ work is supported through state funds approved by the State Parliament of Baden-Württemberg for the Innovation Campus Health + Life Science alliance Heidelberg Mannheim, the Data Science Collaborative Research Programme 2022 by the Novo Nordisk Foundation (grant NNF22OC0076414), and the European Research Council (Synergy Grant DECODE 810296). The authors also acknowledge funding from GSK through the EMBL-GSK collaboration framework (3000038350).

Author information

These authors contributed equally: Daniel Dimitrov, Stefan Schrod.

Authors and Affiliations

Genome Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany
Daniel Dimitrov, Stefan Schrod & Oliver Stegle
Division of Computational Genomics and Systems Genetics, German Cancer Research Center (DKFZ), Heidelberg, Germany
Daniel Dimitrov, Stefan Schrod, Martin Rohbeck & Oliver Stegle
Heidelberg University, Heidelberg, Germany
Martin Rohbeck
Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, UK
Oliver Stegle

Authors

Daniel Dimitrov
Stefan Schrod
Martin Rohbeck
Oliver Stegle

Contributions

D.D., S.S. and M.R. researched the literature. D.D., S.S. and O.S. contributed substantially to discussions of the content. All authors wrote the article and reviewed and/or edited the manuscript.

Corresponding authors

Correspondence to Daniel Dimitrov, Stefan Schrod or Oliver Stegle.

Ethics declarations

Competing interests

O.S. is a paid consultant of Insitro. The other authors declare no competing interests.

Peer review

Peer review information

Nature Reviews Genetics thanks the anonymous reviewer(s) for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Glossary

Agentic workflows: Computational processes in which multiple task-specific models (agents) autonomously collaborate to plan and execute a sequence of tasks, attempting to achieve a complex common objective with minimal human intervention.
Autoencoders: Types of neural networks that learn a compressed, low-dimensional representation (encoding) of input data and then reconstruct (decode) the original input from the (typically) compressed encoding.
Causal graph models: Statistical models that represent cause-and-effect relationships through a structured graph in which variables are represented by nodes and causal influences by directed edges.
Causal mechanisms: Directed, causal interactions between specific molecules through which signals propagate.
Causal signatures: A set of observable variables that reflect the underlying causal processes, such as perturbations, cellular heterogeneity, regulatory layers, and temporal and spatial scales.
Conditional independence: The mutual status of two variables that no longer provide information about each other once other variables are accounted for.
Confounders: Extraneous factors that, if not controlled for, can produce misleading or spurious associations between variables of interest.
Counterfactual: A hypothetical outcome representing what would have occurred under alternative conditions or different interventions from those actually observed.
Diffusion models: A class of generative models that systematically introduce noise into data and attempt to reverse this process to generate new data by modelling complex probability distributions.
Embeddings: Low-dimensional vector (or matrix) representations of an entity, such as a sample, feature or condition, that capture its relevant properties and relationships.
Factor models: Statistical models that represent observed variables as linear combinations of lower-dimensional latent factors plus noise, in which each factor captures shared variation among the variables.
Gene programmes: Coordinated sets of genes that represent shared biological functions and responses.
Generalize: To maintain performance and validity across datasets or conditions beyond those used during development or training, indicating robustness and broader applicability.
Generative models: Models designed to learn the underlying distributions of datasets, in order to generate new, similar data from them.
Identifiable: A model’s parameters or solutions are identifiable if they can be uniquely determined from the available data under the assumed model.
Interventions: Deliberate actions that manipulate a biological variable or process within a system in order to observe the resulting effects.
Latent spaces: Abstract representations of the data that capture the essential features and relationships in low dimensions.
Latent variable: A hidden or unobservable variable that cannot be measured directly but is inferred from observable data, ideally representing the underlying factors or structures influencing the observed measurements.
Optimal transport: A mathematical framework used to map one distribution of cells onto another (for example, control onto perturbed) at minimal total cost, while preserving overall mass.
Ordinary differential equations: Equations or sets of equations that describe the rate of change of a quantity with respect to a single independent variable, typically time (for example, the rate of RNA degradation).
Perturbations: Disturbances or deviations from a system’s normal or steady state, which can be intentional or unintentional.
Prior knowledge: Information about a biological system, such as molecular interactions, pathways or phenotypic relationships, collected or estimated from diverse experiments and data modalities.
Pseudotime: An estimate that orders cells along a continuous trajectory, such as differentiation, by using the similarities in their gene expression profiles.
RNA velocity: An estimate of the time derivative of gene expression states, commonly calculated by analysing the ratios of spliced to unspliced messenger RNAs.
Spurious correlations: Relationships between pairs of variables that seem to be causal but arise solely by coincidence or through the influence of third variables linking them.
Supervised: A machine learning paradigm in which a model is trained on input features paired with known labels or outcomes.
Transformer: A neural network architecture based on attention that processes data by computing pairwise relationships between elements in parallel.
Unsupervised: A machine learning paradigm in which a model learns from input data without access to known labels or categories.
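
Several of the entries above (confounders, spurious correlations and conditional independence) describe facets of the same situation, which a small simulation can make concrete. The following is a minimal sketch and is not taken from the article: the variable names, effect sizes and the interpretation of `z` as a shared driver such as cell-cycle stage are all illustrative assumptions. It also relies on the data being linear and Gaussian, the setting in which a vanishing partial correlation corresponds to conditional independence.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Hypothetical confounder z (for example, cell-cycle stage) drives both x and y.
# There is no direct causal link between x and y in this simulation.
z = rng.normal(size=n)
x = 2.0 * z + rng.normal(size=n)
y = -1.5 * z + rng.normal(size=n)


def residualize(v, covariate):
    """Remove the linear effect of `covariate` from `v` by least squares."""
    design = np.column_stack([np.ones_like(covariate), covariate])
    coef, *_ = np.linalg.lstsq(design, v, rcond=None)
    return v - design @ coef


# Marginal correlation: strong, although x does not cause y (a spurious correlation).
marginal_corr = np.corrcoef(x, y)[0, 1]

# Partial correlation given z: close to zero, consistent with x and y being
# conditionally independent once the confounder is accounted for.
partial_corr = np.corrcoef(residualize(x, z), residualize(y, z))[0, 1]

print(f"marginal correlation:     {marginal_corr:.3f}")
print(f"partial correlation | z:  {partial_corr:.3f}")
```

In this toy setting, the marginal correlation is strongly negative whereas the partial correlation is near zero. For non-linear or non-Gaussian dependencies, a dedicated conditional independence test would be needed instead of partial correlation.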

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Dimitrov, D., Schrod, S., Rohbeck, M. et al. Interpretation, extrapolation and perturbation of single cells. Nat Rev Genet (2026). https://doi.org/10.1038/s41576-025-00920-4

Accepted: 11 November 2025
Published: 02 January 2026
Version of record: 02 January 2026
DOI: https://doi.org/10.1038/s41576-025-00920-4
