Abstract
Self-supervised learning (SSL) has emerged as a powerful approach for learning meaningful representations from large-scale unlabelled datasets in single-cell genomics. Richter et al. evaluated SSL pretext tasks on modelling single-cell RNA sequencing (scRNA-seq) data, demonstrating the effective use of SSL models. However, the transferability of these pretrained SSL models to the spatial transcriptomics domain remains unexplored. Here we assess the performance of three SSL models (random mask, gene programme mask and Barlow Twins) pretrained on scRNA-seq data when applied to spatial transcriptomics datasets, focusing on cell-type prediction and spatial clustering. Our experiments demonstrate that the SSL model with the random mask strategy exhibits the best overall performance among the evaluated SSL models. Moreover, models trained from scratch on spatial transcriptomics data outperform the fine-tuned SSL models on cell-type prediction, highlighting a domain gap between scRNA-seq and spatial transcriptomics data whose underlying causes remain an open question. Through expanded analyses of multiple imputation methods and data degradation scenarios, we demonstrate that gene imputation degrades SSL model performance on cell-type prediction, an effect that is exacerbated by increasing data sparsity. Finally, integrating zero-shot random mask embeddings into chosen spatial clustering methods significantly enhanced their accuracy. Overall, our findings provide valuable insights into the limitations and potential of transferring SSL models to spatial transcriptomics and offer practical guidance for researchers leveraging pretrained models for spatial transcriptomics data analysis.
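To make the random mask pretext task concrete, the following minimal sketch hides a random subset of genes in one cell's expression vector and scores a reconstruction on the hidden genes. It is an illustration only, not the implementation used in the original study: the function names `random_mask` and `masked_mse`, the masking rate of 0.5, the choice to zero out hidden genes and the choice to compute the loss on hidden genes only are all assumptions made here.

```python
import random


def random_mask(expression, mask_rate=0.5, seed=None):
    """Zero out a random subset of genes in one cell's expression vector.

    Returns the masked vector and a boolean mask (True = gene was hidden).
    mask_rate is an illustrative assumption, not a value from the paper.
    """
    rng = random.Random(seed)
    n_genes = len(expression)
    n_masked = int(round(mask_rate * n_genes))
    # Pick which gene positions to hide, without replacement.
    masked_idx = set(rng.sample(range(n_genes), n_masked))
    mask = [i in masked_idx for i in range(n_genes)]
    masked = [0.0 if hidden else value for value, hidden in zip(expression, mask)]
    return masked, mask


def masked_mse(prediction, target, mask):
    """Mean squared reconstruction error over the hidden genes only."""
    errors = [(p - t) ** 2 for p, t, hidden in zip(prediction, target, mask) if hidden]
    return sum(errors) / len(errors) if errors else 0.0


# Usage: mask a toy six-gene cell, then score a trivial "reconstruction"
# that simply returns the masked input unchanged.
cell = [1.0, 0.0, 2.5, 3.0, 0.5, 4.0]
masked_cell, mask = random_mask(cell, mask_rate=0.5, seed=0)
loss = masked_mse(masked_cell, cell, mask)
```

In a pretraining loop, an encoder-decoder network would replace the trivial reconstruction above, and the gene programme mask variant would presumably hide biologically related gene sets rather than independently sampled positions.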
Data availability
This study utilized publicly available datasets. For reproducing the original study’s results, subsets of the scTab dataset were used. These include datasets of peripheral blood mononuclear cells after SARS-CoV-2 infection (ref. 52; dataset ID 87) and the Tabula Sapiens Atlas (ref. 53; dataset ID 41), available at https://pklab.med.harvard.edu/felix/data/merlin_cxg_2023_05_15_sf-log1p.tar.gz. Two unseen datasets also used in the original study (ref. 14) are tail of hippocampus (HiT) - caudal hippocampus - CA4-DGC (ref. 54) from the Human Brain Atlas (available at https://cellxgene.cziscience.com/e/9f499d32-400d-4c42-ac9a-fb1481844fee.cxg/ (ref. 55)) and human, great apes study (ref. 56) (available at http://cellxgene.cziscience.com/e/2bdd3a2c-2ff4-4314-adf3-8a06b797a33a.cxg (ref. 57)). For further evaluating generalizability in single-cell genomics, two single-cell transcriptomics datasets were used: a developing human neocortex dataset (ref. 4) (CELLxGENE portal: https://cellxgene.cziscience.com/collections/ad2149fc-19c5-41de-8cfe-44710fbada73 (ref. 58) or UCSF Cell Atlas: https://cell.ucsf.edu/snMultiome/) and a human breast cancer dataset (ref. 26) (via 10x Genomics at https://www.10xgenomics.com/products/xenium-in-situ/preview-dataset-human-breast (ref. 59)). For spatial transcriptomics analyses, we used the MERSCOPE human neocortex dataset (ref. 4) (via Brain Image Library at https://doi.org/10.35077/g.1156), the Xenium human breast cancer dataset (ref. 26) (via 10x Genomics at https://www.10xgenomics.com/products/xenium-in-situ/preview-dataset-human-breast (ref. 59)) and the Slide-seqV2 human and mouse kidney datasets (ref. 27) (via CELLxGENE portal at https://cellxgene.cziscience.com/collections/8e880741-bf9a-4c8e-9227-934204631d2a (ref. 60)). Checkpoints of SSL models are available at https://huggingface.co/TillR/sc_pretrained/tree/main (ref. 51). Source data are provided with this paper.
Code availability
The original implementation of STAGATE (ref. 28) is available via GitHub at https://github.com/QIFEIDKN/STAGATE_pyG. The original implementation of GraphST (ref. 29) is available via GitHub at https://github.com/JinmiaoChenLab/GraphST. Code of the original study (ref. 14) is available via GitHub at https://github.com/theislab/ssl_in_scg (ref. 50). Our code and tutorials for reusability are available via GitHub at https://github.com/CSHCY/Reusability_SSL_in_SCG and via Zenodo at https://doi.org/10.5281/zenodo.15767719 (ref. 64).
References
Regev, A. et al. The Human Cell Atlas. eLife https://doi.org/10.7554/elife.27041 (2017).
Rood, J. E., Maartens, A., Hupalowska, A., Teichmann, S. A. & Regev, A. Impact of the Human Cell Atlas on medicine. Nat. Med. 28, 2486–2496 (2022).
Bressan, D., Battistoni, G. & Hannon, G. J. The dawn of spatial omics. Science https://doi.org/10.1126/science.abq4964 (2023).
Wang, L. et al. Molecular and cellular dynamics of the developing human neocortex. Nature https://doi.org/10.1038/s41586-024-08351-7 (2025).
Sun, E. D. et al. Spatial transcriptomic clocks reveal cell proximity effects in brain ageing. Nature 638, 160–171 (2024).
Moses, L. & Pachter, L. Museum of spatial transcriptomics. Nat. Methods 19, 534–546 (2022).
Stegle, O., Teichmann, S. A. & Marioni, J. C. Computational and analytical challenges in single-cell transcriptomics. Nat. Rev. Genet. 16, 133–145 (2015).
Devlin, J., Chang, M.-W., Lee, K. & Toutanova, K. BERT: pre-training of deep bidirectional transformers for language understanding. In Proc. 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers) 4171–4186 (Association for Computational Linguistics, 2019).
He, K. et al. Masked autoencoders are scalable vision learners. In Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition 16000–16009 (IEEE, 2022).
Radford, A. et al. Learning transferable visual models from natural language supervision. In Proc. 38th International Conference on Machine Learning 8748–8763 (PMLR, 2021).
Cui, H. et al. scGPT: toward building a foundation model for single-cell multi-omics using generative AI. Nat. Methods 21, 1470–1480 (2024).
Hao, M. et al. Large-scale foundation model on single-cell transcriptomics. Nat. Methods 21, 1481–1491 (2024).
Yang, F. et al. scBERT as a large-scale pretrained deep language model for cell type annotation of single-cell RNA-seq data. Nat. Mach. Intell. 4, 852–866 (2022).
Richter, T., Bahrami, M., Xia, Y., Fischer, D. S. & Theis, F. J. Delineating the effective use of self-supervised learning in single-cell genomics. Nat. Mach. Intell. 7, 68–78 (2024).
Fischer, F. et al. scTab: scaling cross-tissue single-cell annotation models. Nat. Commun. 15, 6611 (2024).
Chen, T., Kornblith, S., Norouzi, M. & Hinton, G. A simple framework for contrastive learning of visual representations. In Proc. 37th International Conference on Machine Learning 1597–1607 (PMLR, 2020).
Zbontar, J., Jing, L., Misra, I., LeCun, Y. & Deny, S. Barlow Twins: self-supervised learning via redundancy reduction. In Proc. 38th International Conference on Machine Learning 12310–12320 (PMLR, 2021).
Marx, V. Method of the year: spatially resolved transcriptomics. Nat. Methods 18, 9–14 (2021).
Carstens, J. L. et al. Spatial multiplexing and omics. Nat. Rev. Methods Prim. 4, 54 (2024).
Heumos, L. et al. Best practices for single-cell analysis across modalities. Nat. Rev. Genet. 24, 550–572 (2023).
Ergen, C. et al. Consensus prediction of cell type labels in single-cell data with popV. Nat. Genet. 56, 2731–2738 (2024).
Zormpas, E., Queen, R., Comber, A. & Cockell, S. J. Mapping the transcriptome: realizing the full potential of spatial data analysis. Cell 186, 5677–5689 (2023).
Yuan, Z. et al. Benchmarking spatial clustering methods with spatially resolved transcriptomics data. Nat. Methods 21, 712–722 (2024).
Chen, K. H., Boettiger, A. N., Moffitt, J. R., Wang, S. & Zhuang, X. Spatially resolved, highly multiplexed RNA profiling in single cells. Science 348, aaa6090 (2015).
Stickels, R. R. et al. Highly sensitive spatial transcriptomics at near-cellular resolution with Slide-seqV2. Nat. Biotechnol. 39, 313–319 (2021).
Janesick, A. et al. High resolution mapping of the tumor microenvironment using integrated single-cell, spatial and in situ analysis. Nat. Commun. 14, 8353 (2023).
Marshall, J. L. et al. High-resolution Slide-seqV2 spatial transcriptomics enables discovery of disease-specific cell neighborhoods and pathways. iScience 25, 104097 (2022).
Dong, K. & Zhang, S. Deciphering spatial domains from spatially resolved transcriptomics with an adaptive graph attention auto-encoder. Nat. Commun. 13, 1739 (2022).
Long, Y. et al. Spatially informed clustering, integration, and deconvolution of spatial transcriptomics with GraphST. Nat. Commun. 14, 1155 (2023).
Stuart, T. et al. Comprehensive integration of single-cell data. Cell 177, 1888–1902.e21 (2019).
Zhang, M. et al. Molecularly defined and spatially resolved cell atlas of the whole mouse brain. Nature 624, 343–354 (2023).
Wu, B. et al. A spatiotemporal atlas of cholestatic injury and repair in mice. Nat. Genet. 56, 938–952 (2024).
Chen, X. et al. Whole-cortex in situ sequencing reveals input-dependent area identity. Nature https://doi.org/10.1038/s41586-024-07221-6 (2024).
Greenwald, A. C. et al. Integrative spatial analysis reveals a multi-layered organization of glioblastoma. Cell 187, 2485–2501 (2024).
Lucas, C.-H. G. et al. Spatial genomic, biochemical and cellular mechanisms underlying meningioma heterogeneity and evolution. Nat. Genet. 56, 1121–1133 (2024).
Atta, L. & Fan, J. Computational challenges and opportunities in spatially resolved transcriptomic data analysis. Nat. Commun. 12, 5283 (2021).
Szałata, A. et al. Transformers in single-cell omics: a review and new perspectives. Nat. Methods 21, 1430–1443 (2024).
Wang, Y. et al. Spatial transcriptomics: technologies, applications and experimental considerations. Genomics 115, 110671 (2023).
Shah, S., Lubeck, E., Zhou, W. & Cai, L. In situ transcription profiling of single cells reveals spatial organization of cells in the mouse hippocampus. Neuron 92, 342–357 (2016).
Wang, X. et al. Three-dimensional intact-tissue sequencing of single-cell transcriptional states. Science 361, eaat5691 (2018).
Tang, F. et al. mRNA-Seq whole-transcriptome analysis of a single cell. Nat. Methods 6, 377–382 (2009).
Biancalani, T. et al. Deep learning and alignment of spatially resolved single-cell transcriptomes with Tangram. Nat. Methods 18, 1352–1362 (2021).
Benchmarking spatial and single-cell transcriptomics integration methods. Nat. Methods https://doi.org/10.1038/s41592-022-01481-8 (2022).
Li, B. et al. Benchmarking spatial and single-cell transcriptomics integration methods for transcript distribution prediction and cell type deconvolution. Nat. Methods 19, 662–670 (2022).
Ren, H., Walker, B. L., Cang, Z. & Nie, Q. Identifying multicellular spatiotemporal organization of cells with SpaceFlow. Nat. Commun. 13, 4076 (2022).
Varrone, M., Tavernari, D., Santamaria-Martínez, A., Walsh, L. A. & Ciriello, G. CellCharter reveals spatial cell niches associated with tissue remodeling and cell plasticity. Nat. Genet. 56, 74–84 (2024).
Singhal, V. et al. BANKSY unifies cell typing and tissue domain segmentation for scalable spatial omics data analysis. Nat. Genet. 56, 431–441 (2024).
Ma, Y. & Zhou, X. Accurate and efficient integrative reference-informed spatial domain detection for spatial transcriptomics. Nat. Methods 21, 1231–1244 (2024).
Boiarsky, R. et al. Deeper evaluation of a single-cell foundation model. Nat. Mach. Intell. 6, 1443–1446 (2024).
Richter, T. & Bahrami, M. theislab/ssl_in_scg. GitHub https://github.com/theislab/ssl_in_scg (2025).
Richter, T. TillR/sc_pretrained. Hugging Face https://huggingface.co/TillR/sc_pretrained/tree/main (2024).
Yoshida, M. et al. Local and systemic responses to SARS-CoV-2 infection in children and adults. Nature 602, 321–327 (2022).
Tabula Sapiens Consortium et al. The Tabula Sapiens: a multiple-organ, single-cell transcriptomic atlas of humans. Science 376, eabl4896 (2022).
Siletti, K. et al. Transcriptomic diversity of cell types across the adult human brain. Science 382, eadd7046 (2023).
Siletti, K. et al. Human Brain Cell Atlas v1.0. CELLxGENE cellxgene.cziscience.com/e/9f499d32-400d-4c42-ac9a-fb1481844fee.cxg (2023).
Jorstad, N. L. et al. Comparative transcriptomics reveals human-specific cortical features. Science 382, eade9516 (2023).
Bakken, T. E. et al. Comparative transcriptomics reveals human-specific cortical features. CELLxGENE cellxgene.cziscience.com/e/2bdd3a2c-2ff4-4314-adf3-8a06b797a33a.cxg (2023).
Wang, L. et al. Molecular and cellular dynamics of the developing human neocortex at single-cell resolution. CELLxGENE https://cellxgene.cziscience.com/collections/ad2149fc-19c5-41de-8cfe-44710fbada73 (2025).
Janesick, A. et al. High resolution mapping of the tumor microenvironment using integrated single-cell, spatial and in situ analysis. 10x Genomics https://www.10xgenomics.com/products/xenium-in-situ/preview-dataset-human-breast (2023).
Marshall, J. L. et al. High resolution Slide-seqV2 spatial transcriptomics enables discovery of disease-specific cell neighborhoods and pathways. CELLxGENE https://cellxgene.cziscience.com/collections/8e880741-bf9a-4c8e-9227-934204631d2a (2022).
Wolf, F. A., Angerer, P. & Theis, F. J. SCANPY: large-scale single-cell gene expression data analysis. Genome Biol. https://doi.org/10.1186/s13059-017-1382-0 (2018).
Biancalani, T., Scalia, G., Lu, Z., Gaddam, S. & Hupalowska, A. Tangram 0.4.0 documentation. TANGRAM https://tangram-sc.readthedocs.io/en/latest/index.html (2021).
Biancalani, T. et al. broadinstitute/Tangram/tree/master. GitHub https://github.com/broadinstitute/Tangram/tree/master (2023).
Han, C. CSHCY/Reusability_SSL_in_SCG: release of the SSLBench. Zenodo https://doi.org/10.5281/zenodo.15767720 (2025).
Becht, E. et al. Dimensionality reduction for visualizing single-cell data using UMAP. Nat. Biotechnol. 37, 38–44 (2019).
Acknowledgements
Z.Y. acknowledges support from the Computational Biology Program (grant no. 25JS2850200) of the Science and Technology Commission of Shanghai Municipality (STCSM), the National Natural Science Foundation of China (grant nos. 62303119 and 32470706), the Shanghai Science and Technology Development Funds (grant no. 23YF1403000) and the Fund of Fudan University and Cao’ejiang Basic Research (grant no. 24FCA10).
Author information
Contributions
Z.Y. conceived of and designed the study. C.H., S.L. and Z.W. designed the pipeline and collected the methods and datasets. Z.Y., C.H., Y.C. and Q.Z. analysed the results and generated the figures. Z.Y. and C.H. wrote the paper and designed the figures. All authors approved the paper.
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Machine Intelligence thanks Qing Nie, Jesper Tegner and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
Supplementary Information
Supplementary Notes 1–8, Table 1 and Figs. 1–11.
Source data
Source Data Fig. 2
Statistical source data for Fig. 2: cell-type prediction performance of different SSL methods on the human breast cancer and the human neocortex scRNA-seq datasets.
Source Data Fig. 3
Statistical source data for Fig. 3: cell-type prediction performance of different SSL methods on the MERSCOPE human neocortex dataset.
Source Data Fig. 4
Statistical source data for Fig. 4: cell-type prediction performance of different SSL methods on the original and the imputed Xenium breast cancer datasets.
Source Data Fig. 5
Statistical source data for Fig. 5: cell-type prediction performance of different SSL methods on the Slide-seqV2 human kidney and the Slide-seqV2 mouse kidney datasets.
Source Data Fig. 6
Statistical source data for Fig. 6: spatial clustering performance of STAGATE, GraphST, STAGATE-RM, STAGATE-BT, GraphST-RM and GraphST-BT methods on the MERSCOPE human neocortex dataset.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Han, C., Lin, S., Wang, Z. et al. Reusability report: Exploring the transferability of self-supervised learning models from single-cell to spatial transcriptomics. Nat Mach Intell 7, 1414–1428 (2025). https://doi.org/10.1038/s42256-025-01097-5
DOI: https://doi.org/10.1038/s42256-025-01097-5


