  • Analysis

Benchmarking algorithms for generalizable single-cell perturbation response prediction

Abstract

Single-cell perturbation technologies enable systematic investigation of gene functions and regulatory networks with single-cell resolution. However, performing large-scale and combinatorial perturbation screens poses notable challenges due to their exponentially increased complexity. Computational methods, including foundation models, have been developed to predict perturbation effects. Yet despite claims of promising performance, concerns remain about their true efficacy, particularly when evaluated across diverse and previously unseen cellular contexts and perturbation scenarios. Here, we present a comprehensive benchmark of 27 methods for single-cell perturbation response prediction, evaluated across 29 datasets using 6 complementary performance metrics. By evaluating them under multiple scenarios, we systematically assess their generalizability, including that of emerging foundation models. Our results provide practical guidance for method selection and underscore the need for cellular context embedding approaches to enhance the generalizability of perturbation effect prediction in single-cell research.
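As a rough illustration of how a predicted response can be scored, the sketch below computes MSE and PCC-delta, the two metrics that the extended data captions name as representative. The definition of PCC-delta used here (Pearson correlation of predicted versus observed expression changes relative to control) is an assumption, since this page does not define it, and the toy vectors are hypothetical.

```python
from math import sqrt


def mean_squared_error(pred, true):
    """Mean squared error between two equal-length expression vectors."""
    return sum((p - t) ** 2 for p, t in zip(pred, true)) / len(true)


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def pcc_delta(pred, true, control):
    """ASSUMED definition: correlate predicted and observed expression
    changes, each measured relative to the control profile."""
    pred_delta = [p - c for p, c in zip(pred, control)]
    true_delta = [t - c for t, c in zip(true, control)]
    return pearson(pred_delta, true_delta)


# Hypothetical per-gene mean expression for four genes.
control = [1.0, 2.0, 3.0, 4.0]
true = [2.0, 2.0, 5.0, 3.0]
pred = [1.5, 2.5, 4.5, 3.5]

mse = mean_squared_error(pred, true)
pcc = pcc_delta(pred, true, control)
print(f"MSE={mse:.3f}  PCC-delta={pcc:.3f}")
```

Subtracting the control profile before correlating keeps baseline expression, which is shared by prediction and ground truth, from dominating the score.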

Fig. 1: Workflow and datasets for benchmarking the single-cell perturbation effect prediction.
Fig. 2: Benchmarking results for the o.o.d. setting in the cellular context generalization scenario.
Fig. 3: Limitation of current methods in the cellular context generalization scenario.
Fig. 4: Benchmarking results for genetic perturbation in the perturbation generalization scenario.
Fig. 5: Benchmarking results in the perturbation generalization scenario.
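The two scenarios named in the figure titles can be pictured as leave-group-out splits. The sketch below is a minimal illustration under stated assumptions: the record layout, the field names `context` and `perturbation`, and the helper `split_by_held_out` are hypothetical, and this page does not specify whether control cells from a held-out context remain available during training.

```python
def split_by_held_out(records, key, held_out):
    """Send every record whose `key` value is in `held_out` to the test set
    and all remaining records to the training set."""
    held_out = set(held_out)
    train = [r for r in records if r[key] not in held_out]
    test = [r for r in records if r[key] in held_out]
    return train, test


# Hypothetical cell-level metadata: one dict per cell.
cells = [
    {"context": "A549", "perturbation": "control"},
    {"context": "A549", "perturbation": "drugX"},
    {"context": "K562", "perturbation": "control"},
    {"context": "K562", "perturbation": "drugX"},
    {"context": "MCF7", "perturbation": "control"},
    {"context": "MCF7", "perturbation": "drugX"},
]

# Cellular context generalization: an entire context is unseen in training.
ctx_train, ctx_test = split_by_held_out(cells, "context", ["MCF7"])

# Perturbation generalization: an entire perturbation is unseen in training.
pert_train, pert_test = split_by_held_out(cells, "perturbation", ["drugX"])

print(len(ctx_train), len(ctx_test), len(pert_train), len(pert_test))
```

Splitting on a group label rather than on individual cells is what makes the evaluation out-of-distribution: no cell from the held-out group can leak into training.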

Data availability

We have uploaded all the processed datasets used in our benchmark study to Figshare at https://doi.org/10.6084/m9.figshare.28143422 (ref. 63) and https://doi.org/10.6084/m9.figshare.28147883 (ref. 64) and Zenodo via https://doi.org/10.5281/zenodo.14607156 (ref. 65) and https://doi.org/10.5281/zenodo.14638779 (ref. 66).

The kangCrossCell (ref. 33), kangCrossPatient (ref. 33) and Haber (ref. 37) datasets consist of preprocessed data from Lotfollahi et al., which can be downloaded directly from Google Drive via https://drive.google.com/drive/folders/1n1SLbXha4OH7j7zZ0zZAxrj_-2kczgl8. The Parekh (ref. 58), CrossPatient (ref. 35) and sciPlex3 (ref. 1) datasets were obtained from the PerturBase database http://www.perturbase.cn/. KaggleCrossPatient and KaggleCrossCell can be downloaded from the Kaggle competition webpage via https://www.kaggle.com/competitions/open-problems-single-cell-perturbations/data/. The McFarland (ref. 32) dataset was downloaded from the scPerturb database (version 1.3), which is available at Zenodo via https://doi.org/10.5281/zenodo.10044268 (ref. 59). CrossSpecies (ref. 34) is available at https://github.com/theislab/scgen-reproducibility/blob/master/code/DataDownloader.py/. The Afriat (ref. 60) perturbation dataset was downloaded from the biolord GitHub tutorial site via https://biolord.readthedocs.io/en/latest/tutorials/biolord_pipeline.html. The sciPlex3_comb dataset (ref. 1) was downloaded from the CPA tutorial website and can be accessed at https://drive.google.com/uc?export=download&id=1RRV0_qYKGTvD3oCklKfoZQFYqKJy4l6t. The remaining 16 datasets were downloaded from the PerturBase database via http://www.perturbase.cn/: Adamson (ref. 3), Frangieh (ref. 45), TianActivation (ref. 44), TianInhibition (ref. 44), Replogle_exp6 (ref. 43), Replogle_exp7 (ref. 43), Replogle_exp8 (ref. 43), Papalexi (ref. 42), Replogle_RPE1essential (ref. 41), Replogle_K562essential (ref. 41), Norman (ref. 40), Wessels (ref. 39), Schmidt (ref. 38), sciPlex_A549 (ref. 1), sciPlex3_K562 (ref. 1) and sciPlex3_MCF7 (ref. 1). Source data are provided with this paper.

Code availability

The scripts used in this study are available via GitHub at https://github.com/bm2-lab/scPerturBench/ and Zenodo at https://doi.org/10.5281/zenodo.15904698 (ref. 65). To promote transparency and reproducibility, we provide a Podman image that contains all major scripts used in our benchmark, along with the complete set of preconfigured conda environments (https://github.com/bm2-lab/scPerturBench/). We have created an online platform that hosts the benchmarking results of all evaluated tools across all datasets and performance metrics included in our study (https://bm2-lab.github.io/scPerturBench-reproducibility/).

References

  1. Srivatsan, S. R. et al. Massively multiplex chemical transcriptomics at single-cell resolution. Science 367, 45–51 (2020).

  2. Dixit, A. et al. Perturb-seq: dissecting molecular circuits with scalable single-cell RNA profiling of pooled genetic screens. Cell 167, 1853–1866 (2016).

  3. Adamson, B. et al. A multiplexed single-cell CRISPR screening platform enables systematic dissection of the unfolded protein response. Cell 167, 1867–1882 (2016).

  4. Lotfollahi, M., Wolf, F. A. & Theis, F. J. scGen predicts single-cell perturbation responses. Nat. Methods 16, 715–721 (2019).

  5. Roohani, Y., Huang, K. & Leskovec, J. Predicting transcriptional outcomes of novel multigene perturbations with GEARS. Nat. Biotechnol. 42, 927–935 (2024).

  6. Hao, M. et al. Large-scale foundation model on single-cell transcriptomics. Nat. Methods 21, 1481–1491 (2024).

  7. Cui, H. et al. scGPT: toward building a foundation model for single-cell multi-omics using generative AI. Nat. Methods 21, 1470–1480 (2024).

  8. Yang, F. et al. scBERT as a large-scale pretrained deep language model for cell type annotation of single-cell RNA-seq data. Nat. Mach. Intell. 4, 852–866 (2022).

  9. Theodoris, C. V. et al. Transfer learning enables predictions in network biology. Nature 618, 616–624 (2023).

  10. Wu, Y. et al. PerturBench: benchmarking machine learning models for cellular perturbation analysis. Preprint at https://arxiv.org/abs/2408.10609 (2024).

  11. Bendidi, I. et al. Benchmarking transcriptomics foundation models for perturbation analysis: one PCA still rules them all. Preprint at https://arxiv.org/abs/2410.13956 (2024).

  12. Ahlmann-Eltze, C., Huber, W. & Anders, S. Deep-learning-based gene perturbation effect prediction does not yet outperform simple linear baselines. Nat. Methods 22, 1657–1661 (2025).

  13. Peidli, S. et al. scPerturb: harmonized single-cell perturbation data. Nat. Methods 21, 531–540 (2024).

  14. Bunne, C. et al. Learning single-cell perturbation responses using neural optimal transport. Nat. Methods 20, 1759–1768 (2023).

  15. Jiang, Q., Chen, S., Chen, X. & Jiang, R. scPRAM accurately predicts single-cell gene expression perturbation response based on attention mechanism. Bioinformatics 40, btae265 (2024).

  16. Piran, Z., Cohen, N., Hoshen, Y. & Nitzan, M. Disentanglement of single-cell data with biolord. Nat. Biotechnol. 42, 1678–1683 (2024).

  17. Yeh, C. -H., Chen, Z. -G., Liou, C. -Y. & Chen, M. -J. Homogeneous space construction and projection for single-cell expression prediction based on deep learning. Bioengineering 10, 996 (2023).

  18. Zhang, Z., Zhao, X., Bindra, M., Qiu, P. & Zhang, X. scDisInFact: disentangled learning for integration and prediction of multi-batch multi-condition single-cell RNA-sequencing data. Nat. Commun. 15, 912 (2024).

  19. Wei, X., Dong, J. & Wang, F. scPreGAN, a deep generative model for predicting the response of single-cell expression to perturbation. Bioinformatics 38, 3377–3384 (2022).

  20. Wang, H., Wang, Y., Jiang, Q., Zhang, Y. & Chen, S. SCREEN: predicting single-cell gene expression perturbation responses via optimal transport. Front. Comput. Sci. 18, 2095–2228 (2024).

  21. Kana, O. et al. Generative modeling of single-cell gene expression for dose-dependent chemical perturbations. Patterns 4, 100817 (2023).

  22. Lotfollahi, M., Naghipourfar, M., Theis, F. J. & Wolf, F. A. Conditional out-of-distribution generation for unpaired data using transfer VAE. Bioinformatics 36, i610–i617 (2020).

  23. Bai, D., Ellington, C. N., Mo, S., Song, L. & Xing, E. P. AttentionPert: accurately modeling multiplexed genetic perturbations with multi-scale effects. Bioinformatics 40, i453–i461 (2024).

  24. Lotfollahi, M. et al. Predicting cellular responses to complex perturbations in high-throughput screens. Mol. Syst. Biol. 19, e11517 (2023).

  25. Chen, Y. & Zou, J. Simple and effective embedding model for single-cell biology built from ChatGPT. Nat. Biomed. Eng. 9, 483–493 (2025).

  26. Hetzel, L., Boehm, S., Kilbertus, N., Günnemann, S. & Theis, F. Predicting cellular responses to novel drug perturbations at a single-cell resolution. Adv. Neural Inf. Process. Syst. 35, 26711–26722 (2022).

  27. Zhu, O. & Li, J. Scouter: predicting transcriptional responses to genetic perturbations with LLM embeddings. Preprint at bioRxiv https://doi.org/10.1101/2024.12.06.627290 (2024).

  28. Liu, T., Chen, T., Zheng, W., Luo, X. & Zhao, H. scELMo: embeddings from language models are good learners for single-cell data analysis. Preprint at bioRxiv https://doi.org/10.1101/2023.12.07.569910 (2023).

  29. Yang, X. et al. GeneCompass: deciphering universal gene regulatory mechanisms with a knowledge-informed cross-species foundation model. Cell Res. 34, 830–845 (2024).

  30. Qi, X. et al. Predicting transcriptional responses to novel chemical perturbations using deep generative model for drug discovery. Nat. Commun. 15, 9256 (2024).

  31. Huang, W. & Liu, H. Predicting single-cell cellular responses to perturbations using cycle consistency learning. Bioinformatics 40, i462–i470 (2024).

  32. McFarland, J. M. et al. Multiplexed single-cell transcriptional response profiling to define cancer vulnerabilities and therapeutic mechanism of action. Nat. Commun. 11, 4296 (2020).

  33. Kang, H. M. et al. Multiplexed droplet single-cell RNA-sequencing using natural genetic variation. Nat. Biotechnol. 36, 89–94 (2018).

  34. Hagai, T. et al. Gene expression variability across cells and species shapes innate immunity. Nature 563, 197–202 (2018).

  35. Zhao, W. et al. Deconvolution of cell type-specific drug responses in human tumor tissue with single-cell RNA-seq. Genome Med. 13, 82 (2021).

  36. Nault, R., Fader, K. A., Bhattacharya, S. & Zacharewski, T. R. Single-nuclei RNA sequencing assessment of the hepatic effects of 2,3,7,8-tetrachlorodibenzo-p-dioxin. Cell Mol. Gastroenterol. Hepatol. 11, 147–159 (2021).

  37. Haber, A. L. et al. A single-cell survey of the small intestinal epithelium. Nature 551, 333–339 (2017).

  38. Schmidt, R. et al. CRISPR activation and interference screens decode stimulation responses in primary human T cells. Science 375, eabj4008 (2022).

  39. Wessels, H. H. et al. Efficient combinatorial targeting of RNA transcripts in single cells with Cas13 RNA Perturb-seq. Nat. Methods 20, 86–94 (2023).

  40. Norman, T. M. et al. Exploring genetic interaction manifolds constructed from rich single-cell phenotypes. Science 365, 786–793 (2019).

  41. Replogle, J. M. et al. Mapping information-rich genotype-phenotype landscapes with genome-scale Perturb-seq. Cell 185, 2559–2575 (2022).

  42. Papalexi, E. et al. Characterizing the molecular regulation of inhibitory immune checkpoints with multimodal single-cell screens. Nat. Genet. 53, 322–331 (2021).

  43. Replogle, J. M. et al. Combinatorial single-cell CRISPR screens by direct guide RNA capture and targeted sequencing. Nat. Biotechnol. 38, 954–961 (2020).

  44. Tian, R. et al. Genome-wide CRISPRi/a screens in human neurons link lysosomal failure to ferroptosis. Nat. Neurosci. 24, 1020–1034 (2021).

  45. Frangieh, C. J. et al. Multimodal pooled Perturb-CITE-seq screens in patient models define mechanisms of cancer immune evasion. Nat. Genet. 53, 332–341 (2021).

  46. Wei, Z. et al. PerturBase: a comprehensive database for single-cell perturbation data analysis and visualization. Nucleic Acids Res. 53, D1099–D1111 (2025).

  47. Ji, Y. et al. Optimal distance metrics for single-cell RNA-seq populations. Preprint at bioRxiv https://doi.org/10.1101/2023.12.26.572833 (2023).

  48. Gaudelet, T. et al. Season combinatorial intervention predictions with Salt & Peper. Preprint at https://arxiv.org/html/2404.16907v1 (2024).

  49. Luecken, M. D. et al. Benchmarking atlas-level data integration in single-cell genomics. Nat. Methods 19, 41–50 (2022).

  50. Yuan, Z. et al. Benchmarking spatial clustering methods with spatially resolved transcriptomics data. Nat. Methods 21, 712–722 (2024).

  51. Shahapure, K. R. & Nicholas, C. Cluster quality analysis using silhouette score. in 2020 IEEE 7th International Conference on Data Science and Advanced Analytics (DSAA) 747–748 (IEEE, 2020).

  52. Gao, Y. et al. Toward subtask-decomposition-based learning and benchmarking for predicting genetic perturbation outcomes and beyond. Nat. Comput. Sci. 4, 773–785 (2024).

  53. Subramanian, A. et al. A next generation connectivity map: L1000 platform and the first 1,000,000 profiles. Cell 171, 1437–1452 (2017).

  54. Song, D. et al. scDesign3 generates realistic in silico data for multimodal single-cell and spatial omics. Nat. Biotechnol. 42, 247–252 (2024).

  55. Rood, J. E., Hupalowska, A. & Regev, A. Toward a foundation model of causal cell and tissue biology with a perturbation cell and tissue atlas. Cell 187, 4520–4545 (2024).

  56. Zhang, J. et al. Tahoe-100M: a giga-scale single-cell perturbation atlas for context-dependent gene function and cellular modeling. Preprint at bioRxiv https://doi.org/10.1101/2025.02.20.639398 (2025).

  57. Huang, A. C. et al. X-Atlas/Orion: genome-wide Perturb-seq datasets via a scalable fix-cryopreserve platform for training dose-dependent biological foundation models. Preprint at bioRxiv https://doi.org/10.1101/2025.06.11.659105 (2025).

  58. Parekh, U. et al. Mapping cellular reprogramming via pooled overexpression screens with paired fitness and single-cell RNA-sequencing readout. Cell Syst. 7, 548–555 (2018).

  59. Peidli, S. et al. scPerturb single-cell perturbation data: RNA and protein h5ad files (1.3). Zenodo https://doi.org/10.5281/zenodo.10044268 (2022).

  60. Afriat, A. et al. A spatiotemporally resolved single-cell atlas of the Plasmodium liver stage. Nature 611, 563–569 (2022).

  61. Wolf, F. A., Angerer, P. & Theis, F. J. SCANPY: large-scale single-cell gene expression data analysis. Genome Biol. 19, 15 (2018).

  62. Heumos, L. et al. Pertpy: an end-to-end framework for perturbation analysis. Preprint at bioRxiv https://doi.org/10.1101/2024.08.04.606516 (2024).

  63. Zhiting, W. Cellular context generalization datasets. Figshare https://doi.org/10.6084/m9.figshare.28143422 (2025).

  64. Zhiting, W. Perturbation generalization datasets. Figshare https://doi.org/10.6084/m9.figshare.28147883 (2025).

  65. Wei, Z. et al. Benchmarking algorithms for generalizable single-cell perturbation response prediction. Zenodo https://doi.org/10.5281/zenodo.14607156 (2025).

  66. Zhiting, W. Perturbation generalization H5ad datasets. Zenodo https://doi.org/10.5281/zenodo.14638780 (2025).

Acknowledgements

We gratefully acknowledge all single-cell perturbation dataset owners for generously sharing their data. Q.L. was supported by the National Key Research and Development Program of China (grant no. 2025YFC3409300), National Natural Science Foundation of China (grant nos. T2425019 and 32341008), Shanghai Pilot Program for Basic Research, Shanghai Science and Technology Innovation Action Plan (Key Specialization in Computational Biology), Shanghai Shuguang Scholars Project, Shanghai Excellent Academic Leader Project, Shanghai Municipal Science and Technology Major Project (2021SHZDZX0100) and Fundamental Research Funds for the Central Universities. Funding for open access charge: National Natural Science Foundation of China. Z.W. was supported by the National Natural Science Foundation of China (grant no. 32500555).

Author information

Contributions

Z.W., Y.W., Y.C.G., A.L., G.C. and Q.L. conceived and designed the study. Z.W. designed the metrics, established the benchmarking pipeline and collected the methods and datasets. Z.W. and Y.W. implemented the benchmarking pipeline. Y.C.G. developed the cell-line embedding strategy. Z.W. and Y.W. analyzed the results and generated the figures with help from P.L., D.S., Y.L.G., S.Q.W., D.L., K.D., X.Y., C.T., S.F., X.C., W.L., Y.Y. and C.Z. Z.W., Y.W. and Q.L. wrote the manuscript. Z.W. and S.G.W. designed the website. Q.L., G.C. and A.L. supervised the entire project. All authors read and approved the final manuscript.

Corresponding authors

Correspondence to Aibin Liang, Guohui Chuai or Qi Liu.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Methods thanks Yi Zhao and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Primary Handling Editor: Lin Tang, in collaboration with the Nature Methods team.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Correlation of model performance across 13 commonly used evaluation metrics.

(a–d) Pairwise correlations among the 13 metrics used to assess model performance on the KangCrossCell, Haber, Replogle-RPE1essential, and Replogle-K562essential datasets, respectively. Each panel displays the Spearman correlation coefficients between different metrics, computed across all prediction methods evaluated in each dataset.
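To make the caption concrete, the sketch below computes pairwise Spearman correlations between metrics, with one score per prediction method. It is an illustration only: the metric names and score values are hypothetical, and the helper functions are not taken from the authors' pipeline.

```python
from math import sqrt


def average_ranks(values):
    """Ranks starting at 1; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical scores: one entry per prediction method, per metric.
scores = {
    "MSE": [0.10, 0.25, 0.40, 0.15],
    "PCC-delta": [0.90, 0.60, 0.30, 0.80],
    "metric_C": [0.5, 0.7, 0.6, 0.9],
}
names = list(scores)
corr = {(a, b): spearman(scores[a], scores[b]) for a in names for b in names}

for a in names:
    print(a, [round(corr[(a, b)], 2) for b in names])
```

A strongly negative coefficient between an error metric and a correlation metric, as in this toy example, indicates that the two rank methods in the same order once the direction of "better" is accounted for.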

Extended Data Fig. 2 Effects of covariates on model performance across datasets.

(a) For each of the 12 benchmark datasets, we evaluated the impact of covariates—including cellular context, perturbation, and model (method)—on prediction performance using ANOVA based on ordinary least squares (OLS) regression. In datasets containing only a single perturbation condition (KangCrossCell, CrossSpecies, KangCrossPatient and TCDD), only cellular context and model were included as covariates. For the remaining datasets, perturbation identity was additionally included as a third covariate.
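The sketch below illustrates the kind of variance decomposition such an analysis produces for the two-covariate case (cellular context and model). It is a simplified sketch under stated assumptions: a balanced table with exactly one score per context and model, an additive model without interaction, and hypothetical scores. For that balanced case, these sums of squares coincide with those of an ANOVA on the corresponding OLS fit. The model formula, ANOVA type and software used in the study are not stated on this page.

```python
def two_factor_anova(table):
    """Additive two-factor ANOVA for a balanced table.

    table[i][j] is the performance score of model j in cellular context i,
    with exactly one score per (context, model) pair.
    """
    n_ctx, n_mod = len(table), len(table[0])
    grand = sum(map(sum, table)) / (n_ctx * n_mod)
    ctx_means = [sum(row) / n_mod for row in table]
    mod_means = [sum(row[j] for row in table) / n_ctx for j in range(n_mod)]

    # Partition the total sum of squares into context, model and residual.
    ss_total = sum((v - grand) ** 2 for row in table for v in row)
    ss_ctx = n_mod * sum((m - grand) ** 2 for m in ctx_means)
    ss_mod = n_ctx * sum((m - grand) ** 2 for m in mod_means)
    ss_res = ss_total - ss_ctx - ss_mod

    df_ctx, df_mod = n_ctx - 1, n_mod - 1
    df_res = df_ctx * df_mod
    ms_res = ss_res / df_res
    return {
        "context": {"ss": ss_ctx, "df": df_ctx, "F": (ss_ctx / df_ctx) / ms_res},
        "model": {"ss": ss_mod, "df": df_mod, "F": (ss_mod / df_mod) / ms_res},
        "residual": {"ss": ss_res, "df": df_res},
    }


# Hypothetical scores: rows are cellular contexts, columns are models.
scores = [
    [1.0, 2.0, 3.0],
    [4.0, 5.0, 6.0],
    [7.0, 8.0, 12.0],
]
result = two_factor_anova(scores)
for factor, stats in result.items():
    print(factor, stats)
```

Comparing the F statistics (or the shares of the total sum of squares) shows which covariate accounts for more of the variation in performance. Datasets with several perturbations would add perturbation identity as a further factor, which calls for a general OLS-based routine rather than this closed-form shortcut.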

Extended Data Fig. 3 Effects of covariates on model performance across multicondition datasets.

(a–e) For each of the 5 multicondition benchmark datasets, we evaluated the impact of covariates, including cellular context, perturbation, model (method) and time-point/dosage, on prediction performance using ANOVA based on ordinary least squares (OLS) regression. Only scDisInFact, biolord, and scVIDR explicitly incorporate time-point/dosage information. Consequently, only these three methods were included in the 5 multicondition datasets. In datasets containing only a single perturbation condition (CrossSpecies and TCDD), only cellular context, model and time-point/dosage were included as covariates. For the remaining datasets, perturbation identity was additionally included as a covariate.

Extended Data Fig. 4 Impact of inter- and intra-heterogeneity on model performance.

(a–b) Correlation between model performance and inter-heterogeneity, as measured by MSE and PCC-delta. A higher degree of inter-heterogeneity indicates greater variation across cellular contexts, which typically increases the difficulty of generalizing perturbation effects. A linear regression line with 95% confidence interval is shown. Pearson correlation coefficients were calculated, and statistical significance was assessed using two-sided t-tests. Adjustments were not made for multiple comparisons. (c–d) Correlation between model performance and intra-heterogeneity, as measured by MSE and PCC-delta. In both cases, results from test contexts across 12 datasets were aggregated, with each point representing a test context within a specific dataset. MSE and PCC-delta were chosen as representative performance metrics (see Methods). A linear regression line with 95% confidence interval is shown. Pearson correlation coefficients were calculated, and statistical significance was assessed using two-sided t-tests. Adjustments were not made for multiple comparisons. (e) Two-way ANOVA assessing the effects of inter- and intra-heterogeneity on model performance, as measured by MSE. (f) Two-way ANOVA assessing the effects of inter- and intra-heterogeneity on model performance, as measured by PCC-delta.
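The correlation test described in this caption can be sketched as follows: compute the Pearson coefficient r between a heterogeneity score and a performance metric, then convert it to a t statistic with n - 2 degrees of freedom via t = r * sqrt((n - 2) / (1 - r^2)). The data points and the function name `pearson_t` are hypothetical. The two-sided P value is omitted because the Python standard library has no Student's t distribution; SciPy's `scipy.stats.pearsonr` returns both the coefficient and a two-sided P value.

```python
from math import sqrt


def pearson_t(x, y):
    """Return (r, t, df) for the Pearson correlation of x and y.

    t follows a Student's t distribution with df = n - 2 under the null
    hypothesis of zero correlation. Undefined when |r| equals 1.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    r = cov / sqrt(var_x * var_y)
    t = r * sqrt((n - 2) / (1 - r * r))
    return r, t, n - 2


# Hypothetical points: one per test context.
heterogeneity = [1.0, 2.0, 3.0, 4.0, 5.0]
mse = [2.0, 1.0, 4.0, 3.0, 5.0]

r, t, df = pearson_t(heterogeneity, mse)
print(f"r={r:.3f}  t={t:.3f}  df={df}")
```

Because the caption states that no adjustment was made for multiple comparisons, each panel's P value should be read on its own rather than as part of a family-wise test.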

Extended Data Fig. 5 Impact of fine-tuning set size on model performance.

(a) Impact of fine-tuning set size on model performance in the Replogle-K562essential dataset. (b) Impact of fine-tuning set size on model performance in the Replogle-RPE1essential dataset.

Supplementary information

Supplementary Information: Supplementary Notes 1–18 and Figs. 1–45
Reporting Summary
Peer Review File
Supplementary Table 1: Metrics used in single-cell perturbation prediction methods.
Supplementary Table 2: Effects of covariates on model performance across multi-condition datasets.
Supplementary Table 3: The detailed information of datasets in the cellular context and perturbation generalization scenario.
Supplementary Table 4: The detailed information of simulated datasets for robustness experiments in the cellular context and perturbation generalization scenario.
Supplementary Table 5: Comparison of method performance between our study and prior studies.

Source data

Source Data Fig. 1: Datasets for benchmarking the single-cell perturbation effect prediction.
Source Data Fig. 2: Benchmarking results for the o.o.d. setting in the cellular context generalization scenario.
Source Data Fig. 3: Limitation of current methods in the cellular context generalization scenario.
Source Data Fig. 4: Benchmarking results for genetic perturbation in the perturbation generalization scenario.
Source Data Fig. 5: Benchmarking results in the perturbation generalization scenario.
Source Data Extended Data Fig./Table 1: Correlation of model performance across 13 commonly used evaluation metrics.
Source Data Extended Data Fig./Table 2: Effects of covariates on model performance across datasets.
Source Data Extended Data Fig./Table 3: Effects of covariates on model performance across multi-condition datasets.
Source Data Extended Data Fig./Table 4: Impact of inter-heterogeneity and intra-heterogeneity on model performance.
Source Data Extended Data Fig./Table 5: Impact of fine-tuning set size on model performance.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Wei, Z., Wang, Y., Gao, Y. et al. Benchmarking algorithms for generalizable single-cell perturbation response prediction. Nat Methods 23, 451–464 (2026). https://doi.org/10.1038/s41592-025-02980-0

  • DOI: https://doi.org/10.1038/s41592-025-02980-0
