Reusability report: Exploring the transferability of self-supervised learning models from single-cell to spatial transcriptomics

Abstract

Self-supervised learning (SSL) has emerged as a powerful approach for learning meaningful representations from large-scale unlabelled datasets in single-cell genomics. Richter et al. evaluated SSL pretext tasks on modelling single-cell RNA sequencing (scRNA-seq) data, demonstrating the effective use of SSL models. However, the transferability of these pretrained SSL models to the spatial transcriptomics domain remains unexplored. Here we assess the performance of three SSL models (random mask, gene programme mask and Barlow Twins) pretrained on scRNA-seq data when applied to spatial transcriptomics datasets, focusing on cell-type prediction and spatial clustering. Our experiments demonstrate that the SSL model with the random mask strategy exhibits the best overall performance among the evaluated SSL models. Moreover, models trained from scratch on spatial transcriptomics data outperform the fine-tuned SSL models on cell-type prediction, highlighting a domain gap between scRNA-seq and spatial transcriptomics data whose underlying causes remain an open question. Through expanded analyses of multiple imputation methods and data degradation scenarios, we show that gene imputation degrades SSL model performance on cell-type prediction, an effect that is exacerbated by increasing data sparsity. Finally, integrating zero-shot random mask embeddings into the chosen spatial clustering methods (STAGATE and GraphST) significantly enhanced their accuracy. Overall, our findings provide valuable insights into the limitations and potential of transferring SSL models to spatial transcriptomics and offer practical guidance for researchers leveraging pretrained models for spatial transcriptomics data analysis.
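For readers unfamiliar with the random mask pretext task named above, the following is a minimal, standard-library sketch of the general idea: hide a random subset of genes in a cell's expression vector and score a reconstruction only on the hidden positions. It is an illustration under stated assumptions, not the implementation evaluated in this report. The function names, the 50% mask fraction, zero-filling of masked genes and the masked-only mean squared error are all assumptions made for the example.

```python
import random

def random_mask(expression, mask_fraction=0.5, seed=0):
    """Zero out a random subset of genes (hypothetical helper).

    Returns the masked copy and the sorted indices that were hidden.
    The mask fraction and zero-filling are illustrative assumptions.
    """
    rng = random.Random(seed)
    n_genes = len(expression)
    n_masked = int(round(mask_fraction * n_genes))
    masked_idx = sorted(rng.sample(range(n_genes), n_masked))
    masked = list(expression)
    for i in masked_idx:
        masked[i] = 0.0
    return masked, masked_idx

def masked_mse(prediction, target, masked_idx):
    """Mean squared error restricted to the masked genes (an assumed loss)."""
    if not masked_idx:
        return 0.0
    return sum((prediction[i] - target[i]) ** 2 for i in masked_idx) / len(masked_idx)

# Toy log-normalized expression vector for one cell (made-up values).
cell = [0.0, 1.2, 3.4, 0.0, 2.1, 0.7, 0.0, 5.0]
masked_cell, idx = random_mask(cell, mask_fraction=0.5, seed=0)

# Loss of a trivial "model" that simply returns its masked input;
# a trained encoder-decoder would aim to drive this towards zero.
loss = masked_mse(masked_cell, cell, idx)
```

In a real pretraining loop the masked vector would be passed through a neural network and the loss backpropagated; the sketch only shows the masking and scoring steps.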

Fig. 1: Overview of the reusability framework.
Fig. 2: Benchmarking SSL models on cell-type prediction with new scRNA-seq datasets.
Fig. 3: Evaluation in the MERSCOPE human neocortex dataset.
Fig. 4: The impact of imputation on performance of cell-type prediction across models.
Fig. 5: Cross-species experiments.
Fig. 6: Enhancing spatial clustering with zero-shot SSL embeddings.

Data availability

This study utilized publicly available datasets. For reproducing the original study’s results, subsets of the scTab dataset were used. These include datasets of peripheral blood mononuclear cells after SARS-CoV-2 infection (ref. 52; dataset ID 87) and the Tabula Sapiens Atlas (ref. 53; dataset ID 41), available at https://pklab.med.harvard.edu/felix/data/merlin_cxg_2023_05_15_sf-log1p.tar.gz. Two unseen datasets also used in the original study (ref. 14) are tail of hippocampus (HiT) - caudal hippocampus - CA4-DGC (ref. 54) from the Human Brain Atlas (available at https://cellxgene.cziscience.com/e/9f499d32-400d-4c42-ac9a-fb1481844fee.cxg/ (ref. 55)) and human, great apes study (ref. 56) (available at http://cellxgene.cziscience.com/e/2bdd3a2c-2ff4-4314-adf3-8a06b797a33a.cxg (ref. 57)). For further evaluating generalizability in single-cell genomics, two single-cell transcriptomics datasets were used: a developing human neocortex dataset (ref. 4) (CELLxGENE portal: https://cellxgene.cziscience.com/collections/ad2149fc-19c5-41de-8cfe-44710fbada73 (ref. 58) or UCSF Cell Atlas: https://cell.ucsf.edu/snMultiome/) and a human breast cancer dataset (ref. 26) (via 10x Genomics at https://www.10xgenomics.com/products/xenium-in-situ/preview-dataset-human-breast (ref. 59)). For spatial transcriptomics analyses, we used the MERSCOPE human neocortex dataset (ref. 4) (via Brain Image Library at https://doi.org/10.35077/g.1156), the Xenium human breast cancer dataset (ref. 26) (via 10x Genomics at https://www.10xgenomics.com/products/xenium-in-situ/preview-dataset-human-breast (ref. 59)) and the Slide-seqV2 human and mouse kidney datasets (ref. 27) (via CELLxGENE portal at https://cellxgene.cziscience.com/collections/8e880741-bf9a-4c8e-9227-934204631d2a (ref. 60)). Checkpoints of SSL models are available at https://huggingface.co/TillR/sc_pretrained/tree/main (ref. 51). Source data are provided with this paper.

Code availability

The original implementation of STAGATE (ref. 28) is available via GitHub at https://github.com/QIFEIDKN/STAGATE_pyG. The original implementation of GraphST (ref. 29) is available via GitHub at https://github.com/JinmiaoChenLab/GraphST. Code of the original study (ref. 14) is available via GitHub at https://github.com/theislab/ssl_in_scg (ref. 50). Our code and tutorials for reusability are available via GitHub at https://github.com/CSHCY/Reusability_SSL_in_SCG and via Zenodo at https://doi.org/10.5281/zenodo.15767719 (ref. 64).

References

  1. Regev, A. et al. The Human Cell Atlas. eLife https://doi.org/10.7554/elife.27041 (2017).

  2. Rood, J. E., Maartens, A., Hupalowska, A., Teichmann, S. A. & Regev, A. Impact of the Human Cell Atlas on medicine. Nat. Med. 28, 2486–2496 (2022).

  3. Bressan, D., Battistoni, G. & Hannon, G. J. The dawn of spatial omics. Science https://doi.org/10.1126/science.abq4964 (2023).

  4. Wang, L. et al. Molecular and cellular dynamics of the developing human neocortex. Nature https://doi.org/10.1038/s41586-024-08351-7 (2025).

  5. Sun, E. D. et al. Spatial transcriptomic clocks reveal cell proximity effects in brain ageing. Nature 638, 160–171 (2024).

  6. Moses, L. & Pachter, L. Museum of spatial transcriptomics. Nat. Methods 19, 534–546 (2022).

  7. Stegle, O., Teichmann, S. A. & Marioni, J. C. Computational and analytical challenges in single-cell transcriptomics. Nat. Rev. Genet. 16, 133–145 (2015).

  8. Devlin, J., Chang, M.-W., Lee, K. & Toutanova, K. BERT: pre-training of deep bidirectional transformers for language understanding. In Proc. 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers) 4171–4186 (Association for Computational Linguistics, 2019).

  9. He, K. et al. Masked autoencoders are scalable vision learners. In Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition 16000–16009 (IEEE, 2022).

  10. Radford, A. et al. Learning transferable visual models from natural language supervision. In Proc. 38th International Conference on Machine Learning 8748–8763 (PMLR, 2021).

  11. Cui, H. et al. scGPT: toward building a foundation model for single-cell multi-omics using generative AI. Nat. Methods 21, 1470–1480 (2024).

  12. Hao, M. et al. Large-scale foundation model on single-cell transcriptomics. Nat. Methods 21, 1481–1491 (2024).

  13. Yang, F. et al. scBERT as a large-scale pretrained deep language model for cell type annotation of single-cell RNA-seq data. Nat. Mach. Intell. 4, 852–866 (2022).

  14. Richter, T., Bahrami, M., Xia, Y., Fischer, D. S. & Theis, F. J. Delineating the effective use of self-supervised learning in single-cell genomics. Nat. Mach. Intell. 7, 68–78 (2024).

  15. Fischer, F. et al. scTab: scaling cross-tissue single-cell annotation models. Nat. Commun. 15, 6611 (2024).

  16. Chen, T., Kornblith, S., Norouzi, M. & Hinton, G. A simple framework for contrastive learning of visual representations. In Proc. 37th International Conference on Machine Learning 1597–1607 (PMLR, 2020).

  17. Zbontar, J., Jing, L., Misra, I., LeCun, Y. & Deny, S. Barlow Twins: self-supervised learning via redundancy reduction. In Proc. 38th International Conference on Machine Learning 12310–12320 (PMLR, 2021).

  18. Marx, V. Method of the year: spatially resolved transcriptomics. Nat. Methods 18, 9–14 (2021).

  19. Carstens, J. L. et al. Spatial multiplexing and omics. Nat. Rev. Methods Prim. 4, 54 (2024).

  20. Heumos, L. et al. Best practices for single-cell analysis across modalities. Nat. Rev. Genet. 24, 550–572 (2023).

  21. Ergen, C. et al. Consensus prediction of cell type labels in single-cell data with popV. Nat. Genet. 56, 2731–2738 (2024).

  22. Zormpas, E., Queen, R., Comber, A. & Cockell, S. J. Mapping the transcriptome: realizing the full potential of spatial data analysis. Cell 186, 5677–5689 (2023).

  23. Yuan, Z. et al. Benchmarking spatial clustering methods with spatially resolved transcriptomics data. Nat. Methods 21, 712–722 (2024).

  24. Chen, K. H., Boettiger, A. N., Moffitt, J. R., Wang, S. & Zhuang, X. Spatially resolved, highly multiplexed RNA profiling in single cells. Science 348, aaa6090 (2015).

  25. Stickels, R. R. et al. Highly sensitive spatial transcriptomics at near-cellular resolution with Slide-seqV2. Nat. Biotechnol. 39, 313–319 (2021).

  26. Janesick, A. et al. High resolution mapping of the tumor microenvironment using integrated single-cell, spatial and in situ analysis. Nat. Commun. 14, 8353 (2023).

  27. Marshall, J. L. et al. High-resolution Slide-seqV2 spatial transcriptomics enables discovery of disease-specific cell neighborhoods and pathways. iScience 25, 104097 (2022).

  28. Dong, K. & Zhang, S. Deciphering spatial domains from spatially resolved transcriptomics with an adaptive graph attention auto-encoder. Nat. Commun. 13, 1739 (2022).

  29. Long, Y. et al. Spatially informed clustering, integration, and deconvolution of spatial transcriptomics with GraphST. Nat. Commun. 14, 1155 (2023).

  30. Stuart, T. et al. Comprehensive integration of single-cell data. Cell 177, 1888–1902.e21 (2019).

  31. Zhang, M. et al. Molecularly defined and spatially resolved cell atlas of the whole mouse brain. Nature 624, 343–354 (2023).

  32. Wu, B. et al. A spatiotemporal atlas of cholestatic injury and repair in mice. Nat. Genet. 56, 938–952 (2024).

  33. Chen, X. et al. Whole-cortex in situ sequencing reveals input-dependent area identity. Nature https://doi.org/10.1038/s41586-024-07221-6 (2024).

  34. Greenwald, A. C. et al. Integrative spatial analysis reveals a multi-layered organization of glioblastoma. Cell 187, 2485–2501 (2024).

  35. Lucas, C.-H. G. et al. Spatial genomic, biochemical and cellular mechanisms underlying meningioma heterogeneity and evolution. Nat. Genet. 56, 1121–1133 (2024).

  36. Atta, L. & Fan, J. Computational challenges and opportunities in spatially resolved transcriptomic data analysis. Nat. Commun. 12, 5283 (2021).

  37. Szałata, A. et al. Transformers in single-cell omics: a review and new perspectives. Nat. Methods 21, 1430–1443 (2024).

  38. Wang, Y. et al. Spatial transcriptomics: technologies, applications and experimental considerations. Genomics 115, 110671 (2023).

  39. Shah, S., Lubeck, E., Zhou, W. & Cai, L. In situ transcription profiling of single cells reveals spatial organization of cells in the mouse hippocampus. Neuron 92, 342–357 (2016).

  40. Wang, X. et al. Three-dimensional intact-tissue sequencing of single-cell transcriptional states. Science 361, eaat5691 (2018).

  41. Tang, F. et al. mRNA-Seq whole-transcriptome analysis of a single cell. Nat. Methods 6, 377–382 (2009).

  42. Biancalani, T. et al. Deep learning and alignment of spatially resolved single-cell transcriptomes with Tangram. Nat. Methods 18, 1352–1362 (2021).

  43. Benchmarking spatial and single-cell transcriptomics integration methods. Nat. Methods https://doi.org/10.1038/s41592-022-01481-8 (2022).

  44. Li, B. et al. Benchmarking spatial and single-cell transcriptomics integration methods for transcript distribution prediction and cell type deconvolution. Nat. Methods 19, 662–670 (2022).

  45. Ren, H., Walker, B. L., Cang, Z. & Nie, Q. Identifying multicellular spatiotemporal organization of cells with SpaceFlow. Nat. Commun. 13, 4076 (2022).

  46. Varrone, M., Tavernari, D., Santamaria-Martínez, A., Walsh, L. A. & Ciriello, G. CellCharter reveals spatial cell niches associated with tissue remodeling and cell plasticity. Nat. Genet. 56, 74–84 (2024).

  47. Singhal, V. et al. BANKSY unifies cell typing and tissue domain segmentation for scalable spatial omics data analysis. Nat. Genet. 56, 431–441 (2024).

  48. Ma, Y. & Zhou, X. Accurate and efficient integrative reference-informed spatial domain detection for spatial transcriptomics. Nat. Methods 21, 1231–1244 (2024).

  49. Boiarsky, R. et al. Deeper evaluation of a single-cell foundation model. Nat. Mach. Intell. 6, 1443–1446 (2024).

  50. Richter, T. & Bahrami, M. theislab/ssl_in_scg. GitHub https://github.com/theislab/ssl_in_scg (2025).

  51. Richter, T. TillR/sc_pretrained. Hugging Face https://huggingface.co/TillR/sc_pretrained/tree/main (2024).

  52. Yoshida, M. et al. Local and systemic responses to SARS-CoV-2 infection in children and adults. Nature 602, 321–327 (2022).

  53. Tabula Sapiens Consortium et al. The Tabula Sapiens: a multiple-organ, single-cell transcriptomic atlas of humans. Science 376, eabl4896 (2022).

  54. Siletti, K. et al. Transcriptomic diversity of cell types across the adult human brain. Science 382, eadd7046 (2023).

  55. Siletti, K. et al. Human Brain Cell Atlas v1.0. CELLxGENE https://cellxgene.cziscience.com/e/9f499d32-400d-4c42-ac9a-fb1481844fee.cxg (2023).

  56. Jorstad, N. L. et al. Comparative transcriptomics reveals human-specific cortical features. Science 382, eade9516 (2023).

  57. Bakken, T. E. et al. Comparative transcriptomics reveals human-specific cortical features. CELLxGENE http://cellxgene.cziscience.com/e/2bdd3a2c-2ff4-4314-adf3-8a06b797a33a.cxg (2023).

  58. Wang, L. et al. Molecular and cellular dynamics of the developing human neocortex at single-cell resolution. CELLxGENE https://cellxgene.cziscience.com/collections/ad2149fc-19c5-41de-8cfe-44710fbada73 (2025).

  59. Janesick, A. et al. High resolution mapping of the tumor microenvironment using integrated single-cell, spatial and in situ analysis. 10x Genomics https://www.10xgenomics.com/products/xenium-in-situ/preview-dataset-human-breast (2023).

  60. Marshall, J. L. et al. High resolution Slide-seqV2 spatial transcriptomics enables discovery of disease-specific cell neighborhoods and pathways. CELLxGENE https://cellxgene.cziscience.com/collections/8e880741-bf9a-4c8e-9227-934204631d2a (2022).

  61. Wolf, F. A., Angerer, P. & Theis, F. J. SCANPY: large-scale single-cell gene expression data analysis. Genome Biol. https://doi.org/10.1186/s13059-017-1382-0 (2018).

  62. Biancalani, T., Scalia, G., Lu, Z., Gaddam, S. & Hupalowska, A. Tangram 0.4.0 documentation. TANGRAM https://tangram-sc.readthedocs.io/en/latest/index.html (2021).

  63. Biancalani, T. et al. broadinstitute/Tangram/tree/master. GitHub https://github.com/broadinstitute/Tangram/tree/master (2023).

  64. Han, C. CSHCY/Reusability_SSL_in_SCG: release of the SSLBench. Zenodo https://doi.org/10.5281/zenodo.15767720 (2025).

  65. Becht, E. et al. Dimensionality reduction for visualizing single-cell data using UMAP. Nat. Biotechnol. 37, 38–44 (2019).

Acknowledgements

Z.Y. acknowledges support from the Computational Biology Program (grant no. 25JS2850200) of the Science and Technology Commission of Shanghai Municipality (STCSM), the National Natural Science Foundation of China (grant nos. 62303119 and 32470706), the Shanghai Science and Technology Development Funds (grant no. 23YF1403000) and the Fund of Fudan University and Cao’ejiang Basic Research (grant no. 24FCA10).

Author information

Contributions

Z.Y. conceived of and designed the study. C.H., S.L. and Z.W. designed the pipeline and collected the methods and datasets. Z.Y., C.H., Y.C. and Q.Z. analysed the results and generated the figures. Z.Y. and C.H. wrote the paper and designed the figures. All authors approved the paper.

Corresponding author

Correspondence to Zhiyuan Yuan.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Machine Intelligence thanks Qing Nie, Jesper Tegner and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Supplementary Notes 1–8, Table 1 and Figs. 1–11.

Reporting Summary

Source data

Source Data Fig. 2

Statistical source data for Fig. 2: cell-type prediction performance of different SSL methods on the human breast cancer and the human neocortex scRNA-seq datasets.

Source Data Fig. 3

Statistical source data for Fig. 3: cell-type prediction performance of different SSL methods on the MERSCOPE human neocortex dataset.

Source Data Fig. 4

Statistical source data for Fig. 4: cell-type prediction performance of different SSL methods on the original and the imputed Xenium breast cancer datasets.

Source Data Fig. 5

Statistical source data for Fig. 5: cell-type prediction performance of different SSL methods on the Slide-seqV2 human kidney and the Slide-seqV2 mouse kidney datasets.

Source Data Fig. 6

Statistical source data for Fig. 6: spatial clustering performance of STAGATE, GraphST, STAGATE-RM, STAGATE-BT, GraphST-RM and GraphST-BT methods on the MERSCOPE human neocortex dataset.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Han, C., Lin, S., Wang, Z. et al. Reusability report: Exploring the transferability of self-supervised learning models from single-cell to spatial transcriptomics. Nat Mach Intell 7, 1414–1428 (2025). https://doi.org/10.1038/s42256-025-01097-5

  • DOI: https://doi.org/10.1038/s42256-025-01097-5
