Abstract
Single-cell analyses have transitioned from descriptive atlasing towards inferring causal effects and mechanistic relationships that capture cellular logic. Technological advances and the growing scale of observational and interventional datasets have fuelled the development of machine learning methods aimed at identifying such dependencies and extrapolating perturbation effects. Here, we review and connect these approaches according to their modelling concepts (including representation learning, causal inference, mechanistic discovery, disentanglement and population tracing), underlying assumptions and downstream tasks. We propose a unifying ontology to guide practitioners in selecting the most suitable methods for a given biological question, with detailed technical descriptions provided in an online resource. Finally, we identify promising computational directions and underexplored data properties that could pave the way for future developments.
References
Regev, A. et al. The human cell atlas. eLife 6, e27041 (2017).
Rood, J. E. et al. The Human Cell Atlas from a cell census to a unified foundation model. Nature 637, 1065–1071 (2025).
Argelaguet, R. et al. MOFA+: a statistical framework for comprehensive integration of multi-modal single-cell data. Genome Biol. 21, 111 (2020).
Lopez, R., Regier, J., Cole, M. B., Jordan, M. I. & Yosef, N. Deep generative modeling for single-cell transcriptomics. Nat. Methods 15, 1053–1058 (2018).
Velten, B. et al. Identifying temporal and spatial patterns of variation from multimodal data using MEFISTO. Nat. Methods 19, 179–186 (2022).
Tian, T., Zhang, J., Lin, X., Wei, Z. & Hakonarson, H. Dependency-aware deep generative models for multitasking analysis of spatial omics data. Nat. Methods 21, 1501–1513 (2024).
Zhong, C., Ang, K. S. & Chen, J. Interpretable spatially aware dimension reduction of spatial transcriptomics with STAMP. Nat. Methods 21, 2072–2083 (2024).
Lotfollahi, M. et al. Mapping single-cell data to reference atlases by transfer learning. Nat. Biotechnol. 40, 121–130 (2022).
Pearl, J. Causality 2nd edn (Cambridge Univ. Press, 2009).
Schölkopf, B. et al. Toward causal representation learning. Proc. IEEE 109, 612–634 (2021).
Saez-Rodriguez, J. et al. Discrete logic modelling as a means to link protein signalling networks with functional analysis of mammalian signal transduction. Mol. Syst. Biol. 5, 331 (2009).
Yazar, S. et al. Single-cell eQTL mapping identifies cell type-specific genetic control of autoimmune disease. Science 376, eabf3041 (2022).
Jerber, J. et al. Population-scale single-cell RNA-seq profiling across dopaminergic neuron differentiation. Nat. Genet. 53, 304–312 (2021).
Kuppe, C. et al. Spatial multi-omic map of human myocardial infarction. Nature 608, 766–777 (2022).
Oliver, A. J. et al. Single-cell integration reveals metaplasia in inflammatory gut diseases. Nature 635, 699–707 (2024).
Velten, B. & Stegle, O. Principles and challenges of modeling temporal and spatial omics data. Nat. Methods 20, 1462–1474 (2023).
Fischer, D. S., Villanueva, M. A., Winter, P. S. & Shalek, A. K. Adapting systems biology to address the complexity of human disease in the single-cell era. Nat. Rev. Genet. 26, 514–531 (2025).
Shojaie, A. & Fox, E. B. Granger causality: a review and recent advances. Annu. Rev. Stat. Appl. 9, 289–319 (2022).
Arnol, D., Schapiro, D., Bodenmiller, B., Saez-Rodriguez, J. & Stegle, O. Modeling cell-cell interactions from spatial molecular data with spatial variance component analysis. Cell Rep. 29, 202–211.e6 (2019).
Bressan, D., Battistoni, G. & Hannon, G. J. The dawn of spatial omics. Science 381, eabq4964 (2023).
Armingol, E., Baghdassarian, H. M. & Lewis, N. E. The diversification of methods for studying cell-cell interactions and communication. Nat. Rev. Genet. 25, 381–400 (2024).
Palla, G., Fischer, D. S., Regev, A. & Theis, F. J. Spatial components of molecular tissue biology. Nat. Biotechnol. 40, 308–318 (2022).
Vandereyken, K., Sifrim, A., Thienpont, B. & Voet, T. Methods and applications for single-cell and spatial multi-omics. Nat. Rev. Genet. 24, 494–515 (2023).
Badia-I-Mompel, P. et al. Gene regulatory network inference in the era of single-cell multi-omics. Nat. Rev. Genet. 24, 739–754 (2023).
Zhang, J. et al. Tahoe-100M: a giga-scale single-cell perturbation atlas for context-dependent gene function and cellular modeling. Preprint at bioRxiv https://doi.org/10.1101/2025.02.20.639398 (2025).
McFarland, J. M. et al. Multiplexed single-cell transcriptional response profiling to define cancer vulnerabilities and therapeutic mechanism of action. Nat. Commun. 11, 4296 (2020).
Srivatsan, S. R. et al. Massively multiplex chemical transcriptomics at single-cell resolution. Science 367, 45–51 (2020).
Dixit, A. et al. Perturb-Seq: dissecting molecular circuits with scalable single-cell RNA profiling of pooled genetic screens. Cell 167, 1853–1866.e17 (2016).
Replogle, J. M. et al. Mapping information-rich genotype-phenotype landscapes with genome-scale Perturb-seq. Cell 185, 2559–2575.e28 (2022).
Norman, T. M. et al. Exploring genetic interaction manifolds constructed from rich single-cell phenotypes. Science 365, 786–793 (2019).
Ji, Y., Lotfollahi, M., Wolf, F. A. & Theis, F. J. Machine learning for perturbational single-cell omics. Cell Syst. 12, 522–537 (2021).
Rood, J. E., Hupalowska, A. & Regev, A. Toward a foundation model of causal cell and tissue biology with a Perturbation Cell and Tissue Atlas. Cell 187, 4520–4545 (2024).
Bock, C. et al. High-content CRISPR screening. Nat. Rev. Methods Primers 2, 8 (2022).
Peidli, S. et al. scPerturb: harmonized single-cell perturbation data. Nat. Methods 21, 531–540 (2024).
Ishikawa, M. et al. RENGE infers gene regulatory networks using time-series single-cell RNA-seq data with CRISPR perturbations. Commun. Biol. 6, 1290 (2023).
Feng, C. et al. A genome-scale single cell CRISPRi map of trans gene regulation across human pluripotent stem cell lines. Preprint at bioRxiv https://doi.org/10.1101/2024.11.28.625833 (2024).
Dong, M. et al. Causal identification of single-cell experimental perturbation effects with CINEMA-OT. Nat. Methods 20, 1769–1779 (2023). This work uses optimal transport to identify counterfactual couplings between control and perturbed populations, following the disentanglement and exclusion of perturbation effects.
Tejada-Lapuerta, A. et al. Causal machine learning for single-cell genomics. Nat. Genet. 57, 797–808 (2025).
Adamson, B. et al. A multiplexed single-cell CRISPR screening platform enables systematic dissection of the unfolded protein response. Cell 167, 1867–1882.e21 (2016).
Papalexi, E. et al. Characterizing the molecular regulation of inhibitory immune checkpoints with multimodal single-cell screens. Nat. Genet. 53, 322–331 (2021).
Duan, B. et al. Model-based understanding of single-cell CRISPR screening. Nat. Commun. 10, 2233 (2019).
Jiang, L. et al. Systematic reconstruction of molecular pathway signatures using scalable single-cell perturbation screens. Nat. Cell Biol. 27, 505–517 (2025).
Barry, T., Wang, X., Morris, J. A., Roeder, K. & Katsevich, E. SCEPTRE improves calibration and sensitivity in single-cell CRISPR screen analysis. Genome Biol. 22, 344 (2021).
Ryu, J. et al. Joint genotypic and phenotypic outcome modeling improves base editing variant effect quantification. Nat. Genet. 56, 925–937 (2024).
Sánchez-Rivera, F. J. et al. Base editing sensor libraries for high-throughput engineering and functional analysis of cancer-associated single nucleotide variants. Nat. Biotechnol. 40, 862–873 (2022).
Huang, A. C. et al. X-Atlas/Orion: genome-wide perturb-seq datasets via a scalable fix-cryopreserve platform for training dose-dependent biological foundation models. Preprint at bioRxiv https://doi.org/10.1101/2025.06.11.659105 (2025).
Trapnell, C. Revealing gene function with statistical inference at single-cell resolution. Nat. Rev. Genet. 25, 623–638 (2024).
Heumos, L. et al. Best practices for single-cell analysis across modalities. Nat. Rev. Genet. 24, 550–572 (2023).
Ramirez Flores, R. O., Schäfer, P. S. L., Küchenhoff, L. & Saez-Rodriguez, J. Complementing cell taxonomies with a multicellular analysis of tissues. Physiology 39, 129–141 (2024).
Montesuma, E. F., Mboula, F. N. & Souloumiac, A. Recent advances in optimal transport for machine learning. IEEE Trans. Pattern Anal. Mach. Intell. 47, 1161–1180 (2025).
Bunne, C., Schiebinger, G., Krause, A., Regev, A. & Cuturi, M. Optimal transport for single-cell and spatial omics. Nat. Rev. Methods Primers 4, 58 (2024).
Szałata, A. et al. Transformers in single-cell omics: a review and new perspectives. Nat. Methods 21, 1430–1443 (2024).
Consens, M. E. et al. Transformers and genome language models. Nat. Mach. Intell. 7, 346–362 (2025).
Bunne, C. et al. How to build the virtual cell with artificial intelligence: Priorities and opportunities. Cell 187, 7045–7063 (2024).
Lobentanzer, S., Rodriguez-Mier, P., Bauer, S. & Saez-Rodriguez, J. Molecular causality in the advent of foundation models. Mol. Syst. Biol. 20, 848–858 (2024).
Cui, H. et al. Towards multimodal foundation models in molecular cell biology. Nature 640, 623–633 (2025).
Stuart, J. M., Segal, E., Koller, D. & Kim, S. K. A gene-coexpression network for global discovery of conserved genetic modules. Science 302, 249–255 (2003).
Segal, E. et al. Module networks: identifying regulatory modules and their condition-specific regulators from gene expression data. Nat. Genet. 34, 166–176 (2003).
Lopez, R. et al. Learning causal representations of single cells via sparse mechanism shift modeling. In Proc. 2nd Conference on Causal Learning and Reasoning (eds van der Schaar, M. et al.) 662–691 (PMLR, 2023). This work uses sparse mechanism shifts to provide interpretable causal effects on learned latent variables.
Träuble, F. et al. On disentangled representations learned from correlated data. In Proc. 38th International Conference on Machine Learning 10401–10412 (PMLR, 2021).
Locatello, F. et al. Challenging common assumptions in the unsupervised learning of disentangled representations. In Proc. 36th International Conference on Machine Learning (eds Chaudhuri, K. & Salakhutdinov, R.) 4114–4124 (PMLR, 2019).
Weinberger, E., Lin, C. & Lee, S.-I. Isolating salient variations of interest in single-cell data with contrastiveVI. Nat. Methods 20, 1336–1345 (2023). This work builds on a series of contrastive autoencoder frameworks to isolate variations of interest, such as perturbation-induced changes, from ‘background’ biological signals using single-cell omics data.
Piran, Z., Cohen, N., Hoshen, Y. & Nitzan, M. Disentanglement of single-cell data with biolord. Nat. Biotechnol. 42, 1678–1683 (2024).
Moinfar, A. A. & Theis, F. J. Unsupervised deep disentangled representation of single-cell omics with DRVI. In Proc. Learning Meaningful Representations of Life Workshop at ICLR (ICLR, 2025).
Kamimoto, K. et al. Dissecting cell identity via network inference and in silico gene perturbation. Nature 614, 742–751 (2023).
Bravo González-Blas, C. et al. SCENIC+: single-cell multiomic inference of enhancers and gene regulatory networks. Nat. Methods 20, 1355–1367 (2023).
Birk, S. et al. Quantitative characterization of cell niches in spatially resolved omics data. Nat. Genet. 57, 897–909 (2025).
Schrod, S. et al. Spatial Cellular Networks from omics data with SpaCeNet. Genome Res. 34, 1371–1383 (2024).
Zheng, X., Aragam, B. & Ravikumar, P. K. DAGs with NO TEARS: Continuous optimization for structure learning. In Advances in Neural Information Processing Systems 31 (eds Bengio, S. et al.) (2018).
Rohbeck, M. et al. Bicycle: intervention-based causal discovery with cycles. In Proc. 3rd Conference on Causal Learning and Reasoning 209–242 (PMLR, 2024).
Brouillard, P., Lachapelle, S., Lacoste, A., Lacoste-Julien, S. & Drouin, A. Differentiable causal discovery from interventional data. In Proc. 34th International Conference on Neural Information Processing Systems (eds Larochelle, H. et al.) 21865–21877 (Curran, 2020).
Bertin, P. et al. A scalable gene network model of regulatory dynamics in single cells. Preprint at https://doi.org/10.48550/arXiv.2503.20027 (2025). This work combines optimal transport and pseudotime inference to model perturbation-dependent gene regulatory networks and cellular differentiation using ordinary differential equations.
Lopez, R., Hütter, J. C., Pritchard, J. & Regev, A. Large-scale differentiable causal discovery of factor graphs. In Proc. 36th International Conference on Neural Information Processing Systems (eds Koyejo, S. et al.) 19290–19303 (Curran, 2022).
Peters, J., Janzing, D. & Schölkopf, B. Elements of Causal Inference: Foundations and Learning Algorithms (Adaptive Computation and Machine Learning series) 288 (MIT Press, 2017).
Subramanian, A. et al. A next generation connectivity map: L1000 platform and the first 1,000,000 profiles. Cell 171, 1437–1452.e17 (2017).
Lotfollahi, M., Wolf, F. A. & Theis, F. J. scGen predicts single-cell perturbation responses. Nat. Methods 16, 715–721 (2019).
Lotfollahi, M., Naghipourfar, M., Theis, F. J. & Wolf, F. A. Conditional out-of-distribution generation for unpaired data using transfer VAE. Bioinformatics 36, i610–i617 (2020).
Lotfollahi, M. et al. Predicting cellular responses to complex perturbations in high-throughput screens. Mol. Syst. Biol. 19, e11517 (2023). This work introduces the concept of explicitly disentangling and combining perturbational, covariate and background effects using autoencoder frameworks in single-cell data.
Hediyeh-zadeh, S., Fischer, T. & Theis, F. J. Disentanglement via mechanism sparsity by replaying realizations of the past. In Proc. ICLR 2024 Workshop on Machine Learning for Genomics Explorations (ICLR, 2024).
Bunne, C. et al. Learning single-cell perturbation responses using neural optimal transport. Nat. Methods 20, 1759–1768 (2023).
Rohbeck, M. et al. Modeling complex system dynamics with flow matching across time and conditions. In Proc. 13th International Conference on Learning Representations (ICLR, 2025).
Bhaskar, D., et al. Inferring dynamic regulatory interaction graphs from time series data with perturbations. In Proc. 2nd Learning on Graphs Conference (eds Villar, S. & Chamberlain, B.) 22:1–22:21 (PMLR, 2024).
Roohani, Y., Huang, K. & Leskovec, J. Predicting transcriptional outcomes of novel multigene perturbations with GEARS. Nat. Biotechnol. 42, 927–935 (2024). This work shows that co-expressions and prior knowledge representations can be used to relate gene perturbations, thus improving the extrapolation of unobserved perturbations.
Gaudelet, T. et al. Season combinatorial intervention predictions with Salt & Peper. In ICLR 2024 Workshop on Machine Learning for Genomics Explorations (ICLR, 2024).
Cui, H. et al. scGPT: toward building a foundation model for single-cell multi-omics using generative AI. Nat. Methods 21, 1470–1480 (2024).
Bereket, M. & Karaletsos, T. Modelling cellular perturbations with the sparse additive mechanism shift variational autoencoder. In Proc. 37th Conference on Neural Information Processing Systems (eds Oh, A. et al.) 1–12 (Curran, 2023).
Slack, M. D., Martinez, E. D., Wu, L. F. & Altschuler, S. J. Characterizing heterogeneous cellular responses to perturbations. Proc. Natl Acad. Sci. USA 105, 19306–19311 (2008).
He, S. et al. Squidiff: predicting cellular development and responses to perturbations using a diffusion model. Nat. Methods https://doi.org/10.1038/s41592-025-02877-y (2025).
Bunne, C., Krause, A. & Cuturi, M. Supervised training of conditional monge maps. In Proc. 36th International Conference on Neural Information Processing Systems (eds Koyejo, S. et al.) 35, 6859–6872 (Curran, 2022). This work builds on CellOT to introduce a context-aware optimal transport method that enables the extrapolation to novel perturbations and combinatorial effects.
Kim, M. C. et al. Method of moments framework for differential expression analysis of single-cell RNA sequencing data. Cell 187, 6393–6410.e16 (2024).
Squair, J. W. et al. Confronting false discoveries in single-cell differential expression. Nat. Commun. 12, 5692 (2021).
Neufeld, A., Gao, L. L., Popp, J., Battle, A. & Witten, D. Inference after latent variable estimation for single-cell RNA sequencing data. Biostatistics 25, 270–287 (2023).
Missarova, A., Dann, E., Rosen, L., Satija, R. & Marioni, J. Leveraging neighborhood representations of single-cell data to achieve sensitive DE testing with miloDE. Genome Biol. 25, 189 (2024).
Ahlmann-Eltze, C. & Huber, W. Analysis of multi-condition single-cell data with latent embedding multivariate regression. Nat. Genet. 57, 659–667 (2025).
Madrigal, A., Lu, T., Soto, L. M. & Najafabadi, H. S. A unified model for interpretable latent embedding of multi-sample, multi-condition single-cell data. Nat. Commun. 15, 6573 (2024).
Jin, K. et al. CellDrift: inferring perturbation responses in temporally sampled single-cell data. Brief. Bioinform. 23, bbac324 (2022).
Dong, M., Su, D. G., Kluger, H., Fan, R. & Kluger, Y. SIMVI disentangles intrinsic and spatial-induced cellular states in spatial omics data. Nat. Commun. 16, 2990 (2025).
Cui, Y. & Yuan, Z. Prioritizing perturbation-responsive gene patterns using interpretable deep learning. Nat. Commun. 16, 6095 (2025).
Song, B. et al. Decoding heterogeneous single-cell perturbation responses. Nat. Cell Biol. 27, 493–504 (2025).
Yang, L. et al. scMAGeCK links genotypes with multiple phenotypes in single-cell CRISPR screens. Genome Biol. 21, 19 (2020).
Skinnider, M. A. et al. Cell type prioritization in single-cell data. Nat. Biotechnol. 39, 30–34 (2021).
Burkhardt, D. B. et al. Quantifying the effect of experimental perturbations at single-cell resolution. Nat. Biotechnol. 39, 619–629 (2021).
Nicol, P. B. et al. Robust identification of perturbed cell types in single-cell RNA-seq data. Nat. Commun. 15, 7610 (2024).
Li, C. et al. scRank infers drug-responsive cell types from untreated scRNA-seq data using a target-perturbed gene regulatory network. Cell Rep. Med. 5, 101568 (2024).
Cui, Y. & Yuan, Z. Scalable condition-relevant cell niche analysis of spatial omics data with Taichi. Preprint at bioRxiv https://doi.org/10.1101/2024.05.30.596656 (2024).
Teo, A. Y. Y. et al. Identification of perturbation-responsive regions and genes in comparative spatial transcriptomics atlases. Preprint at bioRxiv https://doi.org/10.1101/2024.06.13.598641 (2024).
Stein-O’Brien, G. L. et al. Enter the matrix: factorization uncovers knowledge from omics. Trends Genet. 34, 790–805 (2018).
Lopez, R., Gayoso, A. & Yosef, N. Enhancing scientific discoveries in molecular biology with deep generative models. Mol. Syst. Biol. 16, e9198 (2020).
Zhou, Y., Luo, K., Liang, L., Chen, M. & He, X. A new Bayesian factor analysis method improves detection of genes and biological processes affected by perturbations in single-cell CRISPR screening. Nat. Methods 20, 1693–1703 (2023). This work proposes a supervised factor model that allows the direct mapping of interventions to latent factors and associated genes.
Jones, A., Townes, F. W., Li, D. & Engelhardt, B. E. Contrastive latent variable modeling with application to case-control sequencing experiments. Ann. Appl. Stat. 16, 1268–1291 (2022).
Capraz, T. et al. Semi-supervised Omics Factor Analysis (SOFA) disentangles known sources of variation from latent factors in multi-omics data. Preprint at bioRxiv https://doi.org/10.1101/2024.10.10.617527 (2025).
Moeed, A. et al. Identifying effects of disease on single-cells with domain-invariant generative modeling. In Proc. Causal Representation Learning Workshop at NeurIPS (NeurIPS, 2023).
He, S. et al. High-plex imaging of RNA and proteins at subcellular resolution in fixed tissue by spatial molecular imaging. Nat. Biotechnol. 40, 1794–1806 (2022).
Mitchel, J. et al. Coordinated, multicellular patterns of transcriptional variation that stratify patient cohorts are revealed by tensor decomposition. Nat. Biotechnol. 43, 1192–1201 (2025).
Ramirez Flores, R. O., Lanzer, J. D., Dimitrov, D., Velten, B. & Saez-Rodriguez, J. Multicellular factor analysis of single-cell data for a tissue-centric understanding of disease. eLife 12, e93161 (2023).
Jerby-Arnon, L. & Regev, A. DIALOGUE maps multicellular programs in tissue from single-cell or spatial transcriptomics data. Nat. Biotechnol. 40, 1467–1477 (2022).
Pekayvaz, K. et al. Multiomic analyses uncover immunological signatures in acute and chronic coronary syndromes. Nat. Med. 30, 1696–1710 (2024).
Macnair, W. et al. snRNA-seq stratifies multiple sclerosis patients into distinct white matter glial responses. Neuron 113, 396–410.e9 (2025).
Yuan, Q. & Duren, Z. Inferring gene regulatory networks from single-cell multiome data using atlas-scale external data. Nat. Biotechnol. 43, 247–257 (2025).
Lundberg, S. M. & Lee, S. I. A unified approach to interpreting model predictions. In Proc. 31st International Conference on Neural Information Processing Systems (eds von Luxburg, U. et al.) 4768–4777 (Curran, 2017).
Shrikumar, A., Greenside, P. & Kundaje, A. Learning important features through propagating activation differences. In Proc. 34th International Conference on Machine Learning (eds Precup, D. & Teh, Y. W.) 3145–3153 (PMLR, 2017).
Sundararajan, M., Taly, A. & Yan, Q. Axiomatic attribution for deep networks. In Proc. 34th International Conference on Machine Learning (eds Precup, D. & Teh, Y. W.) 3319–3328 (PMLR, 2017).
Kalfon, J., Samaran, J., Peyré, G. & Cantini, L. scPRINT: pre-training on 50 million cells allows robust gene network predictions. Nat. Commun. 16, 3607 (2025). This work introduces a foundational model that combines the learned representations with diverse prior knowledge to evaluate and improve gene regulatory network inference.
Hao, M. et al. Large-scale foundation model on single-cell transcriptomics. Nat. Methods 21, 1481–1491 (2024).
Yang, X. et al. GeneCompass: deciphering universal gene regulatory mechanisms with a knowledge-informed cross-species foundation model. Cell Res. 34, 830–845 (2024).
Tu, X. et al. A supervised contrastive framework for learning disentangled representations of cell perturbation data. In Proc. 18th Machine Learning in Computational Biology Meeting (eds Knowles, D. A. & Mostafavi, S.) 90–100 (PMLR, 2024).
Weinberger, E., Conrad, R. & Ashuach, T. Modeling variable guide efficiency in pooled CRISPR screens with ContrastiveVI+. In Proc. NeurIPS 2024 Workshop on AI for New Drug Modalities (NeurIPS, 2024).
Aliee, H. et al. inVAE: conditionally invariant representation learning for generating multivariate single-cell reference maps. Preprint at bioRxiv https://doi.org/10.1101/2024.12.06.627196 (2024).
Weinberger, E., Lopez, R., Huetter, J.-C. & Regev, A. Disentangling shared and group-specific variations in single-cell transcriptomics data with multiGroupVI. In Proc. 17th Machine Learning in Computational Biology Meeting 16–32 (PMLR, 2022).
DeTomaso, D. & Yosef, N. Hotspot identifies informative gene modules across modalities of single-cell genomics. Cell Syst. 12, 446–456.e9 (2021).
Xu, Y., Fleming, S., Tegtmeyer, M., McCarroll, S. A. & Babadi, M. Explainable modeling of single-cell perturbation data using attention and sparse dictionary learning. Cell Syst. 16, 101245 (2025).
Lotfollahi, M. et al. Biologically informed deep learning to query gene programs in single-cell atlases. Nat. Cell Biol. 25, 337–350 (2023).
Seninge, L., Anastopoulos, I., Ding, H. & Stuart, J. VEGA is an interpretable generative model for inferring biological network activity in single-cell transcriptomics. Nat. Commun. 12, 5684 (2021).
Zhao, Y., Cai, H., Zhang, Z., Tang, J. & Li, Y. Learning interpretable cellular and gene signature embeddings from single-cell transcriptomic data. Nat. Commun. 12, 5261 (2021).
Doncevic, D. & Herrmann, C. Biologically informed variational autoencoders allow predictive modeling of genetic and drug-induced perturbations. Bioinformatics 39, btad387 (2023).
Saraswat, M. et al. Decoding plasticity regulators and transition trajectories in glioblastoma with single-cell multiomics. Preprint at bioRxiv https://doi.org/10.1101/2025.05.13.653733 (2025).
Nazaret, A. et al. Joint representation and visualization of derailed cell states with Decipher. Genome Biol. 26, 219 (2025).
Svensson, V., Gayoso, A., Yosef, N. & Pachter, L. Interpretable factor models of single-cell RNA-seq via variational autoencoders. Bioinformatics 36, 3418–3421 (2020).
Lucas, J., Tucker, G., Grosse, R. B. & Norouzi, M. Don’t blame the ELBO! a linear VAE perspective on posterior collapse. In Proc. 33rd International Conference on Neural Information Processing Systems (eds Wallach, H. M.) 9408–9418 (Curran, 2019).
Garrido-Rodriguez, M., Zirngibl, K., Ivanova, O., Lobentanzer, S. & Saez-Rodriguez, J. Integrating knowledge and omics to decipher mechanisms via large-scale models of signaling networks. Mol. Syst. Biol. 18, e11036 (2022).
Badia-I-Mompel, P. et al. decoupleR: ensemble of computational methods to infer biological activities from omics data. Bioinform. Adv. 2, vbac016 (2022).
Kunes, R. Z., Walle, T., Land, M., Nawy, T. & Pe’er, D. Supervised discovery of interpretable gene programs from single-cell data. Nat. Biotechnol. 42, 1084–1095 (2024).
Qoku, A. & Buettner, F. Encoding domain knowledge in multi-view latent variable models: a Bayesian approach with structured sparsity. In Proc. 26th International Conference on Artificial Intelligence and Statistics 11545–11562 (PMLR, 2023).
Buettner, F., Pratanwanich, N., McCarthy, D. J., Marioni, J. C. & Stegle, O. f-scLVM: scalable and versatile factor analysis for single-cell RNA-seq. Genome Biol. 18, 212 (2017).
Gut, G., Stark, S. G., Rätsch, G. & Davidson, N. R. pmVAE: learning interpretable single-cell representations with pathway modules. Preprint at bioRxiv https://doi.org/10.1101/2021.01.28.428664 (2021).
Niyakan, S., Luo, X., Yoon, B.-J. & Qian, X. Biologically interpretable VAE with supervision for transcriptomics data under ordinal perturbations. In ICLR 2024 Workshop on Machine Learning for Genomics Explorations (ICLR, 2024).
de la Fuente Cedeño, J. et al. Interpretable causal representation learning for biological data in the pathway space. In Proc. 13th International Conference on Learning Representations (eds Yue, Y. et al.) (ICLR, 2025).
Gonzalez, G. et al. Combinatorial prediction of therapeutic perturbations using causally inspired neural networks. Nat. Biomed. Eng. https://doi.org/10.1038/s41551-025-01481-x (2025).
Bai, D., Ellington, C. N., Mo, S., Song, L. & Xing, E. P. AttentionPert: accurately modeling multiplexed genetic perturbations with multi-scale effects. Bioinformatics 40, i453–i461 (2024).
Wu, Y., et al. Predicting cellular responses with variational causal inference and refined relational information. In Proc. 11th International Conference on Learning Representations (ICLR, 2023).
Alsulami, R. et al. PrePR-CT: predicting perturbation responses in unseen cell types using cell-type-specific graphs. Preprint at bioRxiv https://doi.org/10.1101/2024.07.24.604816 (2024).
Huang, W. & Liu, H. Predicting single-cell cellular responses to perturbations using cycle consistency learning. Bioinformatics 40, i462–i470 (2024).
Hetzel, L. et al. Predicting cellular responses to novel drug perturbations at a single-cell resolution. In Proc. 36th International Conference on Neural Information Processing Systems (eds Koyejo, S. et al.) 26711–26722 (Curran, 2022).
Qi, X. et al. Predicting transcriptional responses to novel chemical perturbations using deep generative model for drug discovery. Nat. Commun. 15, 9256 (2024).
Schrod, S., Zacharias, H. U., Beißbarth, T., Hauschild, A.-C. & Altenbuchinger, M. CODEX: COunterfactual Deep learning for the in silico EXploration of cancer cell line perturbations. Bioinformatics 40, i91–i99 (2024).
Huang, K. et al. Sequential optimal experimental design of perturbation screens guided by multi-modal priors. In 28th Annual Conference on Research in Computational Molecular Biology (ed. Ma, J.) 17–37 (Springer-Verlag, 2024).
Märtens, K., Donovan-Maiye, R. & Ferkinghoff-Borg, J. Enhancing generative perturbation models with LLM-informed gene embeddings. In Proc. Workshop on Machine Learning for Genomics Explorations (ICLR, 2024).
Klein, D. et al. CellFlow enables generative single-cell phenotype modeling with flow matching. Preprint at bioRxiv https://doi.org/10.1101/2025.04.11.648220 (2025).
Hetzel, L. et al. Predicting cellular responses to novel drug perturbations at a single-cell resolution. In Advances in Neural Information Processing Systems 35, 26711–26722 (NeurIPS, 2025).
Badia-i-Mompel, P. et al. Comparison and evaluation of methods to infer gene regulatory networks from multimodal single-cell data. Preprint at bioRxiv https://doi.org/10.1101/2024.12.20.629764 (2025).
Hasanaj, E. et al. Multimodal benchmarking of foundation model representations for cellular perturbation response prediction. Preprint at bioRxiv https://doi.org/10.1101/2025.06.26.661186 (2025).
Szalai, B. & Saez-Rodriguez, J. Why do pathway methods work better than they should? FEBS Lett. 594, 4189–4200 (2020).
Barabási, A.-L. & Oltvai, Z. N. Network biology: understanding the cell’s functional organization. Nat. Rev. Genet. 5, 101–113 (2004).
Pratapa, A., Jalihal, A. P., Law, J. N., Bharadwaj, A. & Murali, T. M. Benchmarking algorithms for gene regulatory network inference from single-cell transcriptomic data. Nat. Methods 17, 147–154 (2020).
Gao, S. & Wang, X. Quantitative utilization of prior biological knowledge in the Bayesian network modeling of gene expression data. BMC Bioinform. 12, 359 (2011).
Huynh-Thu, V. A., Irrthum, A., Wehenkel, L. & Geurts, P. Inferring regulatory networks from expression data using tree-based methods. PLoS One 5, e12776 (2010).
Aibar, S. et al. SCENIC: single-cell regulatory network inference and clustering. Nat. Methods 14, 1083–1086 (2017).
Wang, L. et al. Dictys: dynamic gene regulatory network dissects developmental continuum with single-cell multiomics. Nat. Methods 20, 1368–1378 (2023).
Dong, M. & Kluger, Y. GEASS: neural causal feature selection for high-dimensional biological data. In Proc. 11th International Conference on Learning Representations (ICLR, 2023).
Wang, W. et al. RegVelo: gene-regulatory-informed dynamics of single cells. Preprint at bioRxiv https://doi.org/10.1101/2024.12.11.627935 (2024).
Tanevski, J. et al. Learning tissue representation by identification of persistent local patterns in spatial omics data. Nat. Commun. 16, 4071 (2025).
Tanevski, J., Flores, R. O. R., Gabor, A., Schapiro, D. & Saez-Rodriguez, J. Explainable multiview framework for dissecting spatial relationships from highly multiplexed data. Genome Biol. 23, 97 (2022).
Megas, S. et al. Estimation of single-cell and tissue perturbation effect in spatial transcriptomics via spatial causal disentanglement. In Proc. 13th International Conference on Learning Representations (ICLR, 2024).
Wen, Y. et al. Applying causal discovery to single-cell analyses using CausalCell. eLife 12, e81464 (2023).
Belyaeva, A., Squires, C. & Uhler, C. DCI: learning causal differences between gene regulatory networks. Bioinformatics 37, 3067–3069 (2021).
Chevalley, M., Roohani, Y. H., Mehrjou, A., Leskovec, J. & Schwab, P. A large-scale benchmark for network inference from single-cell perturbation data. Commun. Biol. 8, 412 (2025).
Zheng, X., Dan, C., Aragam, B., Ravikumar, P. & Xing, E. Learning sparse nonparametric DAGs. In Proc. 23rd International Conference on Artificial Intelligence and Statistics (eds Chiappa, S. & Calandra, R.) 3414–3425 (PMLR, 2020).
Yu, Y., Chen, J., Gao, T. & Yu, M. DAG-GNN: DAG structure learning with graph neural networks. In Proc. 36th International Conference on Machine Learning 7154–7163 (PMLR, 2019).
Wu, M., Bao, Y., Barzilay, R. & Jaakkola, T. Sample, estimate, aggregate: a recipe for causal discovery foundation models. In Transactions on Machine Learning Research (eds Kamath, G. et al.) 10 (TMLR, 2025).
Zhang, J., Cammarata, L., Squires, C., Sapsis, T. P. & Uhler, C. Active learning for optimal intervention design in causal models. Nat. Mach. Intell. 5, 1066–1075 (2023). This work introduces an early active learning scheme that uses a causal graph model to guide the experimental exploration of genetic perturbations.
Lorch, L., Sussex, S., Rothfuss, J., Krause, A. & Schölkopf, B. Amortized inference for causal structure learning. In Proc. 36th International Conference on Neural Information Processing Systems (eds Koyejo, S. et al.) 13104–13118 (Curran, 2022).
Sethuraman, M. G. et al. NODAGS-Flow: nonlinear cyclic causal structure learning. In Proc. 26th International Conference on Artificial Intelligence and Statistics (eds Ruiz, F. et al.) 6371–6387 (PMLR, 2023).
Theodoris, C. V. et al. Transfer learning enables predictions in network biology. Nature 618, 616–624 (2023).
Luecken, M. D. et al. Benchmarking atlas-level data integration in single-cell genomics. Nat. Methods 19, 41–50 (2022).
Comon, P. Independent component analysis, a new concept? Signal Process. 36, 287–314 (1994).
Hyvärinen, A. & Oja, E. Independent component analysis: algorithms and applications. Neural Netw. 13, 411–430 (2000).
Yu, H. & Welch, J. D. MichiGAN: sampling from disentangled representations of single-cell data using generative adversarial networks. Genome Biol. 22, 158 (2021).
Moran, G. E., Sridhar, D., Wang, Y. & Blei, D. Identifiable deep generative models via sparse decoding. In Transactions on Machine Learning Research (eds Kamath, G. et al.) 182 (TMLR, 2022).
Lopez, R., Regier, J., Jordan, M. I. & Yosef, N. Information constraints on auto-encoding variational Bayes. In Proc. 32nd International Conference on Neural Information Processing Systems (eds Bengio, S. et al.) 6117–6128 (Curran, 2018).
Gayoso, A. et al. A Python library for probabilistic analysis of single-cell omics data. Nat. Biotechnol. 40, 163–166 (2022). This work combines a series of variational autoencoder extensions that build on scVI into a centralized Python framework that aims to accelerate the development of probabilistic (autoencoder) models for single-cell omics data analysis.
Hyvärinen, A. & Pajunen, P. Nonlinear independent component analysis: existence and uniqueness results. Neural Netw. 12, 429–439 (1999).
Hyvärinen, A., Khemakhem, I. & Morioka, H. Nonlinear independent component analysis for principled disentanglement in unsupervised deep learning. Patterns 4, 100844 (2023).
Lachapelle, S. et al. Disentanglement via mechanism sparsity regularization: a new principle for nonlinear ICA. In First Conference on Causal Learning and Reasoning (eds Schölkopf, B. et al.) 177, 428–484 (2022).
Zou, J. Y., Hsu, D. J., Parkes, D. C. & Adams, R. P. Contrastive learning using spectral methods. In Proc. 27th International Conference on Neural Information Processing Systems - Volume 2 (eds Burges, C. J. C. et al.) 2238–2246 (Curran, 2013).
Abid, A., Zhang, M. J., Bagaria, V. K. & Zou, J. Exploring patterns enriched in a dataset with contrastive principal component analysis. Nat. Commun. 9, 2134 (2018).
Li, D., Jones, A. & Engelhardt, B. Probabilistic contrastive dimension reduction for case-control study data. Ann. Appl. Stat. 18, 2207–2229 (2024).
Boileau, P., Hejazi, N. S. & Dudoit, S. Exploring high-dimensional biological data with sparse contrastive principal component analysis. Bioinformatics 36, 3422–3430 (2020).
Abid, A. & Zou, J. Contrastive variational autoencoder enhances salient features. Preprint at https://doi.org/10.48550/arXiv.1902.04601 (2019).
Severson, K. A., Ghosh, S. & Ng, K. Unsupervised learning with contrastive latent variable models. In Proc. AAAI Conference on Artificial Intelligence 33, 4862–4869 (AAAI, 2019).
Zhang, L. & Zhang, S. Learning common and specific patterns from data of multiple interrelated biological scenarios with matrix factorization. Nucleic Acids Res. 47, 6606–6617 (2019).
Qian, K., Fu, S., Li, H. & Li, W. V. scINSIGHT for interpreting single-cell gene expression from biologically heterogeneous data. Genome Biol. 23, 82 (2022).
Weinberger, E., Beebe-Wang, N. & Lee, S.-I. Moment matching deep contrastive latent variable models. In Proc. 25th International Conference on Artificial Intelligence and Statistics 2354–2371 (PMLR, 2022).
Zhang, Z., Zhao, X., Bindra, M., Qiu, P. & Zhang, X. scDisInFact: disentangled learning for integration and prediction of multi-batch multi-condition single-cell RNA-sequencing data. Nat. Commun. 15, 912 (2024).
Megas, S. et al. Integrating multi-covariate disentanglement with counterfactual analysis on synthetic data enables cell type discovery and counterfactual predictions. Preprint at bioRxiv https://doi.org/10.1101/2025.06.03.657578 (2025).
Inecik, K., Kara, A., Rose, A., Haniffa, M. & Theis, F. J. TarDis: achieving robust and structured disentanglement of multiple covariates. In Proc. Research in Computational Molecular Biology: 29th International Conference, RECOMB 2025 (ed. Sankararaman, S.) 285–289 (Springer, 2025).
Inecik, K., Uhlmann, A., Lotfollahi, M. & Theis, F. MultiCPA: multimodal compositional perturbation autoencoder. Preprint at bioRxiv https://doi.org/10.1101/2022.07.08.499049 (2022).
Wei, X., Dong, J. & Wang, F. scPreGAN, a deep generative model for predicting the response of single-cell expression to perturbation. Bioinformatics 38, 3377–3384 (2022).
Mao, H. et al. Learning identifiable factorized causal representations of cellular responses. In Advances in Neural Information Processing Systems 37 (NeurIPS 2024) (eds Globerson, A. et al.) 121630–121669 (NeurIPS, 2024).
Miladinovic, D. et al. In silico biological discovery with large perturbation models. Nat. Comput. Sci. 5, 1029–1040 (2025).
Adduri, A. K. et al. Predicting cellular responses to perturbation across diverse contexts with State. Preprint at bioRxiv https://doi.org/10.1101/2025.06.26.661135 (2025).
Rampášek, L., Hidru, D., Smirnov, P., Haibe-Kains, B. & Goldenberg, A. Dr.VAE: improving drug response prediction via modeling of drug perturbation effects. Bioinformatics 35, 3743–3751 (2019).
Zhang, J. et al. Identifiability guarantees for causal disentanglement from soft interventions. In Proc. 37th International Conference on Neural Information Processing Systems (eds Oh, A. et al.) 50254–50292 (Curran, 2023).
Wu, Y. et al. PerturBench: benchmarking machine learning models for cellular perturbation analysis. In NeurIPS 2024 Workshop on AI for New Drug Modalities (NeurIPS, 2024).
Liu, T. et al. scELMo: embeddings from language models are good learners for single-cell data analysis. Preprint at bioRxiv https://doi.org/10.1101/2023.12.07.569910 (2023).
Zhong, J., Li, L., Dannenfelser, R. & Yao, V. Benchmarking gene embeddings from sequence, expression, network, and text models for functional prediction tasks. Preprint at bioRxiv https://doi.org/10.1101/2025.01.29.635607 (2025).
Istrate, A.-M., Li, D. & Karaletsos, T. scGenePT: is language all you need for modeling single-cell perturbations? Preprint at bioRxiv https://doi.org/10.1101/2024.10.23.619972 (2024).
Wenteler, A. et al. PertEval-scFM: benchmarking single-cell foundation models for perturbation effect prediction. In 42nd International Conference on Machine Learning (ICML, 2025).
Csendes, G., Sanz, G., Szalay, K. Z. & Szalai, B. Benchmarking foundation cell models for post-perturbation RNA-seq prediction. BMC Genom. 26, 393 (2025).
Kernfeld, E., Yang, Y., Weinstock, J. S., Battle, A. & Cahan, P. A comparison of computational methods for expression forecasting. Genome Biol. 26, 388 (2025).
Viñas Torné, R. et al. Systema: a framework for evaluating genetic perturbation response prediction beyond systematic variation. Nat. Biotechnol. https://doi.org/10.1038/s41587-025-02777-8 (2025).
Ahlmann-Eltze, C., Huber, W. & Anders, S. Deep-learning-based gene perturbation effect prediction does not yet outperform simple linear baselines. Nat. Methods 22, 1657–1661 (2025).
von Kügelgen, J., Ketterer, J., Shen, X., Meinshausen, N. & Peters, J. Representation learning for distributional perturbation extrapolation. In Learning Meaningful Representations of Life (LMRL) Workshop at ICLR (ICLR, 2025).
Carvalho, C. M. et al. High-dimensional sparse factor modeling: applications in gene expression genomics. J. Am. Stat. Assoc. 103, 1438–1456 (2008).
Liu, E., Zhang, J. & Uhler, C. Learning genetic perturbation effects with variational causal inference. Preprint at bioRxiv https://doi.org/10.1101/2025.06.05.657988 (2025).
Jiang, Q., Chen, S., Chen, X. & Jiang, R. scPRAM accurately predicts single-cell gene expression perturbation response based on attention mechanism. Bioinformatics 40, btae265 (2024).
Klein, D. et al. Mapping cells through time and space with moscot. Nature 638, 1065–1075 (2025).
Schiebinger, G. et al. Optimal-transport analysis of single-cell gene expression identifies developmental trajectories in reprogramming. Cell 176, 928–943.e22 (2019).
Kapuśniak, K. et al. Metric flow matching for smooth interpolations on the data manifold. In Proc. 38th International Conference on Neural Information Processing Systems (eds Globerson, A. et al.) 135011–135042 (Curran, 2024).
Tong, A. et al. Improving and generalizing flow-based generative models with minibatch optimal transport. In Transactions on Machine Learning Research (eds Kamath, G. et al.) 1768 (TMLR, 2024).
Erbe, R., Stein-O’Brien, G. & Fertig, E. J. Transcriptomic forecasting with neural ordinary differential equations. Patterns 4, 100793 (2023).
Palma, A. et al. Multi-modal and multi-attribute generation of single cells with CFGen. In Proc. 13th International Conference on Learning Representations (ICLR, 2025).
Yuan, B. et al. CellBox: interpretable machine learning for perturbation biology with application to the design of cancer combination therapy. Cell Syst. 12, 128–140.e4 (2021).
Aivazidis, A. et al. Cell2fate infers RNA velocity modules to improve cell fate prediction. Nat. Methods 22, 698–707 (2025).
Qiu, X. et al. Mapping transcriptomic vector fields of single cells. Cell 185, 690–711.e45 (2022).
Tong, A., Huang, J., Wolf, G., van Dijk, D. & Krishnaswamy, S. TrajectoryNet: a dynamic optimal transport network for modeling cellular dynamics. Proc. Mach. Learn. Res. 119, 9526–9536 (2020).
Alatkar, S. A. & Wang, D. ARTEMIS integrates autoencoders and Schrödinger Bridges to predict continuous dynamics of gene expression, cell population, and perturbation from time-series single-cell data. Bioinformatics 41, i189–i197 (2025).
Somnath, V. R. et al. Aligned diffusion Schrödinger bridges. In Proc. 39th Conference on Uncertainty in Artificial Intelligence 1985–1995 (PMLR, 2023).
Zhang, Z., Li, T. & Zhou, P. Learning stochastic dynamics from snapshots through regularized unbalanced optimal transport. In Proc. 13th International Conference on Learning Representations (ICLR, 2025).
Yeo, G. H. T., Saksena, S. D. & Gifford, D. K. Generative modeling of single-cell time series with PRESCIENT enables prediction of cell trajectories with interventions. Nat. Commun. 12, 3222 (2021).
Luo, E., Hao, M., Wei, L. & Zhang, X. scDiffusion: conditional generation of high-quality single-cell data using diffusion model. Bioinformatics 40, btae518 (2024).
Luecken, M. D. & Theis, F. J. Current best practices in single-cell RNA-seq analysis: a tutorial. Mol. Syst. Biol. 15, e8746 (2019).
Huang, S., Soto, A. M. & Sonnenschein, C. The end of the genetic paradigm of cancer. PLoS Biol. 23, e3003052 (2025).
Szałata, A. et al. A benchmark for prediction of transcriptomic responses to chemical perturbations across cell types. In Proc. 38th International Conference on Neural Information Processing Systems (eds Globerson, A. et al.) 20566–20616 (Curran, 2024).
Kernfeld, E., Keener, R., Cahan, P. & Battle, A. Transcriptome data are insufficient to control false discoveries in regulatory network inference. Cell Syst. 15, 709–724.e13 (2024).
Caranzano, I. et al. Sparsity is all you need: rethinking biological pathway-informed approaches in deep learning. Preprint at https://doi.org/10.48550/arXiv.2505.04300 (2025).
Radig, J. et al. Tracking biological hallucinations in single-cell perturbation predictions using scArchon, a comprehensive benchmarking platform. Preprint at bioRxiv https://doi.org/10.1101/2025.06.23.661046 (2025).
Kedzierska, K. Z., Crawford, L., Amini, A. P. & Lu, A. X. Zero-shot evaluation reveals limitations of single-cell foundation models. Genome Biol. 26, 101 (2025).
Mejia, G. M. et al. Diversity by design: addressing mode collapse improves scRNA-seq perturbation modeling on well-calibrated metrics. In ICML 2025 Generative AI and Biology Workshop (ICML, 2025).
Mahmood, F. A benchmarking crisis in biomedical machine learning. Nat. Med. 31, 1060 (2025).
Ji, Y. et al. Optimal distance metrics for single-cell RNA-seq populations. Preprint at bioRxiv https://doi.org/10.1101/2023.12.26.572833 (2023).
Luecken, M. D. et al. Defining and benchmarking open problems in single-cell analysis. Nat. Biotechnol. 43, 1035–1040 (2025).
Roohani, Y. H. et al. Virtual Cell Challenge: toward a Turing test for the virtual cell. Cell 188, 3370–3374 (2025).
Heumos, L. et al. Pertpy: an end-to-end framework for perturbation analysis. Preprint at bioRxiv https://doi.org/10.1101/2024.08.04.606516 (2024).
CZI Cell Science Program et al. CZ CELLxGENE Discover: a single-cell data platform for scalable exploration, analysis and modeling of aggregated data. Nucleic Acids Res. 53, D886–D900 (2025).
Youngblut, N. D. et al. scBaseCamp: an AI agent-curated, uniformly processed, and continually expanding single cell data repository. Preprint at bioRxiv https://doi.org/10.1101/2025.02.27.640494 (2025).
Roohani, Y. et al. BioDiscoveryAgent: an AI agent for designing genetic perturbation experiments. In Proc. 13th International Conference on Learning Representations (ICLR, 2025).
Weinreb, C., Rodriguez-Fraticelli, A., Camargo, F. D. & Klein, A. M. Lineage tracing on transcriptional landscapes links state to fate during differentiation. Science 367, eaaw3381 (2020).
Chen, W. et al. Live-seq enables temporal transcriptomic recording of single cells. Nature 608, 733–740 (2022).
Kobayashi-Kirschvink, K. J. et al. Prediction of single-cell RNA expression profiles in live cells by Raman microscopy with Raman2RNA. Nat. Biotechnol. 42, 1726–1734 (2024).
Reynolds, D. E. et al. Temporal and spatial omics technologies for 4D profiling. Nat. Methods 22, 1408–1419 (2025).
Gu, J. et al. Mapping multimodal phenotypes to perturbations in cells and tissue with CRISPRmap. Nat. Biotechnol. 43, 1101–1115 (2025).
Dhainaut, M. et al. Spatial CRISPR genomics identifies regulators of the tumor microenvironment. Cell 185, 1223–1239.e20 (2022).
Saunders, R. A. et al. A platform for multimodal in vivo pooled genetic screens reveals regulators of liver function. Preprint at bioRxiv https://doi.org/10.1101/2024.11.18.624217 (2025).
Breinig, M. et al. Integrated in vivo combinatorial functional genomics and spatial transcriptomics of tumours to decode genotype-to-phenotype relationships. Nat. Biomed. Eng. https://doi.org/10.1038/s41551-025-01437-1 (2025).
Metzner, E., Southard, K. M. & Norman, T. M. Multiome Perturb-seq unlocks scalable discovery of integrated perturbation effects on the transcriptome and epigenome. Cell Syst. 16, 101161 (2025).
Mimitou, E. P. et al. Multiplexed detection of proteins, transcriptomes, clonotypes and CRISPR perturbations in single cells. Nat. Methods 16, 409–412 (2019).
Ryu, J., Lopez, R., Bunne, C., Pinello, L. & Regev, A. Cross-modality matching and prediction of perturbation responses with labeled Gromov-Wasserstein optimal transport. In ICML 2024 AI for Science Workshop (ICML, 2024).
Wenckstern, J. et al. AI-powered virtual tissues from spatial proteomics for clinical diagnostics and biomedical discovery. In Proc. Learning Meaningful Representations of Life (LMRL) Workshop at ICLR (ICLR, 2025).
Chen, W. et al. A visual-omics foundation model to bridge histopathology with spatial transcriptomics. Nat. Methods 22, 1568–1582 (2025).
Rizvi, S. A. et al. Scaling large language models for next-generation single-cell analysis. Preprint at bioRxiv https://doi.org/10.1101/2025.04.14.648850 (2025).
Ji, Y. et al. Scalable and universal prediction of cellular phenotypes. Preprint at bioRxiv https://doi.org/10.1101/2024.08.12.607533 (2025).
Gupta, A. et al. SubCell: vision foundation models for microscopy capture single-cell biology. Preprint at bioRxiv https://doi.org/10.1101/2024.12.06.627299 (2025).
Maan, H. et al. Multi-modal disentanglement of spatial transcriptomics and histopathology imaging. In Learning Meaningful Representations of Life (LMRL) Workshop at ICLR (ICLR, 2025).
Datlinger, P. et al. Pooled CRISPR screening with single-cell transcriptome readout. Nat. Methods 14, 297–301 (2017).
Lalli, M. A., Avey, D., Dougherty, J. D., Milbrandt, J. & Mitra, R. D. High-throughput single-cell functional elucidation of neurodevelopmental disease-associated genes reveals convergent mechanisms altering neuronal differentiation. Genome Res. 30, 1317–1331 (2020).
Huguet, G. et al. Manifold interpolating optimal-transport flows for trajectory inference. Adv. Neural Inf. Process. Syst. 35, 29705–29718 (2022).
Wang, S.-W., Herriges, M. J., Hurley, K., Kotton, D. N. & Klein, A. M. CoSpar identifies early cell fate biases from single-cell transcriptomic and lineage information. Nat. Biotechnol. 40, 1066–1074 (2022).
Heimberg, G., Bhatnagar, R., El-Samad, H. & Thomson, M. Low dimensionality in gene expression data enables the accurate extraction of transcriptional programs from shallow sequencing. Cell Syst. 2, 239–250 (2016).
VanderWeele, T. J. & Shpitser, I. On the definition of a confounder. Ann. Stat. 41, 196–220 (2013).
Fröhlich, F. et al. Efficient parameter estimation enables the prediction of drug response using a mechanistic pan-cancer pathway model. Cell Syst. 7, 567–579.e6 (2018).
Cuturi, M. et al. Optimal Transport Tools (OTT): a JAX toolbox for all things Wasserstein. Preprint at https://doi.org/10.48550/arXiv.2201.12324 (2022).
Acknowledgements
The authors thank S. Müller-Dott, P. S. L. Schäfer, P. Rodriguez Mier, A. Moeed, M. Garrido Rodriguez-Cordoba, R. O. Ramirez Flores, R. Abdulhamid and J. Saez-Rodriguez for their feedback on the initial draft. The authors’ work is supported through state funds approved by the State Parliament of Baden-Württemberg for the Innovation Campus Health + Life Science alliance Heidelberg Mannheim, the Data Science Collaborative Research Programme 2022 by the Novo Nordisk Foundation (grant NNF22OC0076414), and the European Research Council (Synergy Grant DECODE 810296). The authors also acknowledge funding from GSK through the EMBL-GSK collaboration framework (3000038350).
Author information
Contributions
D.D., S.S. and M.R. researched the literature. D.D., S.S. and O.S. contributed substantially to discussions of the content. All authors wrote the article and reviewed and/or edited the manuscript.
Ethics declarations
Competing interests
O.S. is a paid consultant of Insitro. The other authors declare no competing interests.
Peer review
Peer review information
Nature Reviews Genetics thanks the anonymous reviewer(s) for their contribution to the peer review of this work.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Related links
Online resource: https://interp-extrap-perturb.readthedocs.io
Glossary
- Agentic workflows
  Computational processes in which multiple task-specific models (agents) autonomously collaborate to plan and execute a sequence of tasks, attempting to achieve a complex common objective with minimal human intervention.
- Autoencoders
  Types of neural networks that learn a compressed, low-dimensional representation (encoding) of input data and then reconstruct (decode) the original input from the (typically) compressed encoding.
- Causal graph models
  Statistical models that represent cause-and-effect relationships through a structured graph in which variables are represented by nodes and causal influences by directed edges.
- Causal mechanisms
  Directed, causal interactions between specific molecules through which signals propagate.
- Causal signatures
  Sets of observable variables that reflect the underlying causal processes, such as perturbations, cellular heterogeneity, regulatory layers, and temporal and spatial scales.
- Conditional independence
  The mutual status of two variables that no longer provide information about each other once other variables are accounted for.
- Confounders
  Extraneous factors that, if not controlled for, can produce misleading or spurious associations between variables of interest.
- Counterfactual
  A hypothetical outcome representing what would have occurred under alternative conditions or different interventions from those actually observed.
- Diffusion models
  A class of generative models that systematically introduce noise into data and attempt to reverse this process to generate new data by modelling complex probability distributions.
- Embeddings
  Low-dimensional vector (or matrix) representations of an entity, such as a sample, feature or condition, that capture its relevant properties and relationships.
- Factor models
  Statistical models that represent observed variables as linear combinations of lower-dimensional latent factors plus noise, in which each factor captures shared variation among the variables.
- Gene programmes
  Coordinated sets of genes that represent shared biological functions and responses.
- Generalize
  To maintain performance and validity across datasets or conditions beyond those used during development or training, indicating robustness and broader applicability.
- Generative models
  Models designed to learn the underlying distributions of datasets, in order to generate new, similar data from them.
- Identifiable
  A model’s parameters or solutions are identifiable if they can be uniquely determined from the available data under the assumed model.
- Interventions
  Deliberate actions that manipulate a biological variable or process within a system in order to observe the resulting effects.
- Latent spaces
  Abstract representations of the data that capture the essential features and relationships in low dimensions.
- Latent variable
  A hidden or unobservable variable that cannot be measured directly but is inferred from observable data, ideally representing the underlying factors or structures influencing the observed measurements.
- Optimal transport
  A method used to pair distributions of cells (for example, control and perturbed) in a cost-efficient way, while preserving overall mass.
- Ordinary differential equations
  Equations or sets of equations that describe the rate of change of a quantity (for example, RNA degradation rate).
- Perturbations
  Disturbances or deviations from a system’s normal or steady state, which can be intentional or unintentional.
- Prior knowledge
  Information about a biological system, such as molecular interactions, pathways or phenotypic relationships, collected or estimated from diverse experiments and data modalities.
- Pseudotime
  An estimate that orders cells along a continuous trajectory, such as differentiation, by using the similarities in their gene expression profiles.
- RNA velocity
  An estimate of the time derivative of gene expression states, commonly calculated by analysing the ratios of spliced to unspliced messenger RNAs.
- Spurious correlations
  Relationships between pairs of variables that seem to be causal but are solely coincidental or owing to the influence of third variables linking them.
- Supervised
  A machine learning paradigm in which a model is trained on input features paired with known labels or outcomes.
- Transformer
  A neural network architecture based on attention that processes data by computing pairwise relationships between elements in parallel.
- Unsupervised
  A machine learning paradigm in which a model learns from input data without access to known labels or categories.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Dimitrov, D., Schrod, S., Rohbeck, M. et al. Interpretation, extrapolation and perturbation of single cells. Nat Rev Genet (2026). https://doi.org/10.1038/s41576-025-00920-4
DOI: https://doi.org/10.1038/s41576-025-00920-4


