Abstract
Recent advances in pathology foundation models, pre-trained on large-scale histopathology images, have greatly advanced disease-focused applications. At the same time, spatial multi-omic technologies now measure gene and protein expression with high spatial resolution, offering valuable insights into tissue context. Yet, existing models struggle to integrate these complementary data types. Here, to address this challenge, we present spEMO, a computational framework that unifies embeddings from pathology foundation models and large language models for spatial multi-omic analysis. By leveraging multi-modal representations, spEMO surpasses single-modality models across diverse downstream tasks, including spatial domain identification, spot-type classification, whole-slide disease prediction and interpretation, multicellular interaction inference and automated medical reporting. These results highlight spEMO’s strength in both biological discovery and clinical applications. Furthermore, we introduce a new benchmark task—multi-modal alignment—to evaluate how effectively pathology foundation models retrieve complementary information. Together, these results establish spEMO as a powerful step towards holistic, interpretable and generalizable AI for spatial biology and pathology.
Data availability
We did not generate new data in this research, and all data used in this paper are publicly available without access restrictions. SpatialLIBD data can be accessed at https://research.libd.org/spatialLIBD/. Spatial multi-omic data can be accessed at https://drive.google.com/drive/folders/1RlU3JmHg_LZM1d-o6QORvykYPoulWWMI and https://drive.google.com/drive/folders/1RlU3JmHg_LZM1d-o6QORvykYPoulWWMI. Two 10x Visium datasets can be accessed at https://support.10xgenomics.com/spatial-gene-expression/datasets/1.1.0/V1_Adult_Mouse_Brain and via figshare at https://doi.org/10.6084/m9.figshare.13604168 (ref. 53). TCGA data can be accessed at https://portal.gdc.cancer.gov/. HEST data can be accessed at https://huggingface.co/datasets/MahmoodLab/hest. We summarize the dataset statistics and download information in Supplementary Table 1. Source data are provided with this paper.
Code availability
To generate the text descriptions and corresponding embeddings, we rely on the OpenAI API. To generate the embeddings from pathology foundation models, as well as to fine-tune them, we rely on the Yale High-Performance Computing Center (YCRC) and use one NVIDIA A100 GPU with up to 50 GB of RAM. To run spEMO and compare it with other methods, we use one NVIDIA A100 GPU with up to 150 GB of RAM. Information on running time and memory usage can be found in Supplementary Table 2. The code for spEMO is available at https://github.com/HelloWorldLTY/spEMO under the MIT licence.
References
Song, A. H. et al. Artificial intelligence for digital and computational pathology. Nat. Rev. Bioeng. 1, 930–949 (2023).
Zhang, S. & Metaxas, D. On the challenges and perspectives of foundation models for medical image analysis. Med. Image Anal. 91, 102996 (2024).
Chen, R. J. et al. Towards a general-purpose foundation model for computational pathology. Nat. Med. 30, 850–862 (2024).
Moses, L. & Pachter, L. Museum of spatial transcriptomics. Nat. Methods 19, 534–546 (2022).
Wheeler, D. L. et al. Database resources of the national center for biotechnology information. Nucleic Acids Res. 35, 5–12 (2007).
Cui, H. et al. scGPT: toward building a foundation model for single-cell multi-omics using generative AI. Nat. Methods 21, 1470–1480 (2024).
Tejada-Lapuerta, A. et al. Nicheformer: a foundation model for single-cell and spatial omics. Nat. Methods 22, 2525–2538 (2025).
Liu, T., Li, K., Wang, Y., Li, H. & Zhao, H. Evaluating the utilities of foundation models in single-cell data analysis. Preprint at bioRxiv https://doi.org/10.1101/2023.09.08.555192 (2023).
Liu, T., Chen, T., Zheng, W., Luo, X. & Zhao, H. scELMo: embeddings from language models are good learners for single-cell data analysis. Preprint at bioRxiv https://doi.org/10.1101/2023.12.07.569910 (2023).
Abdi, H. & Williams, L. J. Principal component analysis. Wiley Interdiscip. Rev. Comput. Stat. 2, 433–459 (2010).
Maynard, K. R. et al. Transcriptome-scale spatial gene expression in the human dorsolateral prefrontal cortex. Nat. Neurosci. 24, 425–436 (2021).
Kiselev, V. Y. et al. SC3: consensus clustering of single-cell RNA-seq data. Nat. Methods 14, 483–486 (2017).
Hu, J. et al. SpaGCN: integrating gene expression, spatial location and histology to identify spatial domains and spatially variable genes by graph convolutional network. Nat. Methods 18, 1342–1351 (2021).
Blampey, Q. et al. Novae: a graph-based foundation model for spatial transcriptomics data. Nat. Methods 22, 2539–2550 (2025).
Zhang, D. et al. Inferring super-resolution tissue architecture by integrating spatial transcriptomics with histology. Nat. Biotechnol. 42, 1372–1377 (2024).
Fu, X. & Chen, Y. Pix2Path: integrating spatial transcriptomics and digital pathology with deep learning to score pathological risk and link gene expression to disease mechanisms. Preprint at bioRxiv https://doi.org/10.1101/2024.08.18.608468 (2024).
Ma, J. et al. A generalizable pathology foundation model using a unified knowledge distillation pretraining framework. Nat. Biomed. Eng. https://doi.org/10.1038/s41551-025-01488-4 (2025).
Xu, H. et al. A whole-slide foundation model for digital pathology from real-world data. Nature 630, 181–188 (2024).
Zimmermann, E. et al. Virchow2: scaling self-supervised mixed magnification models in pathology. Preprint at https://arxiv.org/abs/2408.00738 (2024).
Wang, X. et al. A pathology foundation model for cancer diagnosis and prognosis prediction. Nature 634, 970–978 (2024).
Xiang, J. et al. A vision–language foundation model for precision oncology. Nature 638, 769–778 (2025).
Lu, M. Y. et al. A visual-language foundation model for computational pathology. Nat. Med. 30, 863–874 (2024).
Ding, T. et al. A multimodal whole-slide foundation model for pathology. Nat. Med. 31, 3749–3761 (2025).
Chen, W. et al. A visual–omics foundation model to bridge histopathology with spatial transcriptomics. Nat. Methods 22, 1568–1582 (2025).
Jaume, G. et al. HEST-1k: a dataset for spatial transcriptomics and histology image analysis. In Proc. 38th International Conference on Neural Information Processing Systems (NIPS '24) Vol. 37, Article 1704, 53798–53833 (Curran Associates, 2024).
Palla, G. et al. Squidpy: a scalable framework for spatial omics analysis. Nat. Methods 19, 171–178 (2022).
Long, Y. et al. Deciphering spatial domains from spatial multi-omics with SpatialGlue. Nat. Methods 21, 1658–1667 (2024).
McInnes, L., Healy, J., Saul, N. & Großberger, L. UMAP: uniform manifold approximation and projection. J. Open Source Softw. 3, 861 (2018).
Labianca, R. et al. Colon cancer. Crit. Rev. Oncol. Hematol. 74, 106–133 (2010).
Lemaître, G., Nogueira, F. & Aridas, C. K. Imbalanced-learn: a Python toolbox to tackle the curse of imbalanced datasets in machine learning. J. Mach. Learn. Res. 18, 559–563 (2017).
Liu, T. et al. Learning multi-cellular representations of single-cell transcriptomics data enables characterization of patient-level disease states. In Research in Computational Molecular Biology: Proc. 29th International Conference, RECOMB 2025 (ed. Sankararaman, S.) 303–306 (Springer, 2025).
Weinstein, J. N. et al. The Cancer Genome Atlas Pan-Cancer analysis project. Nat. Genet. 45, 1113–1120 (2013).
Katzman, J. L. et al. DeepSurv: personalized treatment recommender system using a Cox proportional hazards deep neural network. BMC Med. Res. Methodol. 18, 24 (2018).
Kvamme, H. Pycox. GitHub https://github.com/havakv/pycox (2025).
Lee, Y., Liu, X., Hao, M., Liu, T. & Regev, A. PathOmCLIP: connecting tumor histology with spatial gene expression via locally enhanced contrastive learning of pathology and single-cell foundation model. Preprint at bioRxiv https://doi.org/10.1101/2024.12.10.627865 (2024).
Huang, T., Liu, T., Babadi, M., Jin, W. & Ying, Z. Scalable generation of spatial transcriptomics from histology images via whole-slide flow matching. In Proc. 42nd International Conference on Machine Learning (eds Singh, A. et al.) Vol. 267, 25550–25565 https://proceedings.mlr.press/v267/huang25t.html (2025).
Cang, Z. et al. Screening cell–cell communication in spatial transcriptomics via collective optimal transport. Nat. Methods 20, 218–228 (2023).
Tsuneki, M. & Kanavati, F. Inference of captions from histopathological patches. In International Conference on Medical Imaging with Deep Learning 1235–1250 (PMLR, 2022).
Zhang, T., Kishore, V., Wu, F., Weinberger, K. Q. & Artzi, Y. BERTScore: evaluating text generation with BERT. In International Conference on Learning Representations https://openreview.net/forum?id=SkeHuCVFDr (2019).
Soldaini, L. & Goharian, N. QuickUMLS: a fast, unsupervised approach for medical concept extraction. In Medical Information Retrieval (MedIR) Workshop at SIGIR 2016 https://ir.cs.georgetown.edu/downloads/quickumls.pdf (2016).
Shaikovski, G. et al. PRISM: a multi-modal generative foundation model for slide-level histopathology. Preprint at https://arxiv.org/abs/2405.10254 (2024).
Vorontsov, E. et al. A foundation model for clinical-grade computational pathology and rare cancers detection. Nat. Med. 30, 2924–2935 (2024).
Wang, X. et al. Transformer-based unsupervised contrastive learning for histopathological image classification. Med. Image Anal. 81, 102559 (2022).
Theodoris, C. V. et al. Transfer learning enables predictions in network biology. Nature 618, 616–624 (2023).
Campanella, G. et al. A clinical benchmark of public self-supervised pathology foundation models. Nat. Commun. 16, 3640 (2025).
Yuan, Z. et al. Benchmarking spatial clustering methods with spatially resolved transcriptomics data. Nat. Methods 21, 712–722 (2024).
Xu, Y. et al. A multimodal knowledge-enhanced whole-slide pathology foundation model. Nat. Commun. 16, 11406 (2025).
Wolf, F. A., Angerer, P. & Theis, F. J. SCANPY: large-scale single-cell gene expression data analysis. Genome Biol. 19, 15 (2018).
Pedregosa, F. et al. Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011).
Litinetskaya, A. et al. Multimodal weakly supervised learning to identify disease-specific changes in single-cell atlases. Preprint at bioRxiv https://doi.org/10.1101/2024.07.29.605625 (2024).
Luecken, M. D. et al. Benchmarking atlas-level data integration in single-cell genomics. Nat. Methods 19, 41–50 (2022).
Antolini, L., Boracchi, P. & Biganzoli, E. A time-dependent discrimination index for survival data. Stat. Med. 24, 3927–3944 (2005).
Palla, G. Brain coronal fluorescent Adata crop. figshare https://doi.org/10.6084/m9.figshare.13604168.v1 (2021).
Mueller, A. C. amueller/word_cloud. GitHub https://github.com/amueller/word_cloud (2023).
Acknowledgements
We thank M. Hao (Harvard University) for suggestions on naming the model. We thank N. Sun (Yale University), W. Jin (Northeastern University) and T. Chu (Yale University) for suggestions on the model comparison. We thank D. Stern (Yale University) and H. Kluger (Yale University) for helping us recruit pathologists. We also thank S. John (Yale University) and one anonymous pathologist (Lois Hole Hospital for Women) for helping us validate the medical reports. This project is supported in part by NIH grants U24HG012108 and U01HG013840.
Author information
Authors and Affiliations
Contributions
T.L. designed this study. T.L., T.H. and T.D. designed the model. T.L. and T.H. ran all the experiments. H.W., P.H., S.P. and K.S. performed human evaluations. T.L., T.H., R.Y., H.X., J.Z., F.M. and H.Z. wrote the paper. R.Y. provided the computational resources. H.Z. supervised this project.
Corresponding author
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Biomedical Engineering thanks Dong Xu and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
Supplementary Information
Supplementary Figs. 1–17.
Supplementary Table 1
Information on the human evaluation test.
Supplementary Table 2
LLM output for report generation.
Supplementary Data 1
Supplementary data.
Supplementary Data 2
Source data for supplementary figures and tables.
Source data
Source Data Figs. 2–7 and Table 1
Source data for main figures and tables.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Liu, T., Huang, T., Ding, T. et al. Leveraging multi-modal foundation models for analysing spatial multi-omic and histopathology data. Nat. Biomed. Eng. (2026). https://doi.org/10.1038/s41551-025-01602-6
Received:
Accepted:
Published:
Version of record:
DOI: https://doi.org/10.1038/s41551-025-01602-6