
  • Article
  • Published:

Leveraging multi-modal foundation models for analysing spatial multi-omic and histopathology data

Abstract

Recent advances in pathology foundation models, pre-trained on large-scale histopathology images, have greatly advanced disease-focused applications. At the same time, spatial multi-omic technologies now measure gene and protein expression with high spatial resolution, offering valuable insights into tissue context. Yet, existing models struggle to integrate these complementary data types. Here, to address this challenge, we present spEMO, a computational framework that unifies embeddings from pathology foundation models and large language models for spatial multi-omic analysis. By leveraging multi-modal representations, spEMO surpasses single-modality models across diverse downstream tasks, including spatial domain identification, spot-type classification, whole-slide disease prediction and interpretation, multicellular interaction inference and automated medical reporting. These results highlight spEMO’s strength in both biological discovery and clinical applications. Furthermore, we introduce a new benchmark task—multi-modal alignment—to evaluate how effectively pathology foundation models retrieve complementary information. Together, these results establish spEMO as a step towards holistic, interpretable and generalizable AI for spatial biology and pathology.


Fig. 1: The landscape of spEMO.
Fig. 2: Understanding the contribution of spEMO for spatial domain identification.
Fig. 3: Clustering scores and visualization of embeddings coloured by different metadata.
Fig. 4: Results of utilizing image embeddings for possibly improving disease-state prediction.
Fig. 5: The analysis of multicellular interaction based on predicted gene expression profiles.
Fig. 6: Pipeline and results of generating medical reports with a multi-modal AI agent.
Fig. 7: Evaluations and examples for the medical report generation task.

Data availability

We did not generate new data in this research, and all data used in this paper are publicly available without access restrictions. SpatialLIBD data can be accessed at https://research.libd.org/spatialLIBD/. Spatial multi-omic data can be accessed at https://drive.google.com/drive/folders/1RlU3JmHg_LZM1d-o6QORvykYPoulWWMI. Two 10x Visium datasets can be accessed at https://support.10xgenomics.com/spatial-gene-expression/datasets/1.1.0/V1_Adult_Mouse_Brain and via figshare at https://doi.org/10.6084/m9.figshare.13604168 (ref. 53). TCGA data can be accessed at https://portal.gdc.cancer.gov/. HEST data can be accessed at https://huggingface.co/datasets/MahmoodLab/hest. We summarize the dataset statistics and download information in Supplementary Table 1. Source data are provided with this paper.
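The public sources above can be collected programmatically. The sketch below (standard library only) gathers the landing URLs from the statement into one lookup table; the dataset keys, the `dataset_url` helper and the `download` helper are illustrative assumptions, not part of the paper's pipeline:

```python
# Hypothetical helper for locating the public datasets listed in the
# data availability statement. URLs are copied verbatim from the statement;
# the dataset keys and function names are illustrative assumptions.
import urllib.request
from pathlib import Path

DATASETS = {
    "spatialLIBD": "https://research.libd.org/spatialLIBD/",
    "visium_adult_mouse_brain": (
        "https://support.10xgenomics.com/spatial-gene-expression/"
        "datasets/1.1.0/V1_Adult_Mouse_Brain"
    ),
    "visium_figshare": "https://doi.org/10.6084/m9.figshare.13604168",
    "tcga": "https://portal.gdc.cancer.gov/",
    "hest": "https://huggingface.co/datasets/MahmoodLab/hest",
}

def dataset_url(name: str) -> str:
    """Return the public landing URL for a dataset key (KeyError if unknown)."""
    return DATASETS[name]

def download(name: str, dest: Path) -> Path:
    """Fetch a dataset landing page into `dest` (requires network access)."""
    target = dest / f"{name}.html"
    urllib.request.urlretrieve(dataset_url(name), target)
    return target
```

Note that several sources (TCGA, HEST) are portals rather than direct file links, so in practice their own download clients would be used after locating the landing page.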

Code availability

To generate the text descriptions and corresponding embeddings, we rely on the OpenAI API. To generate the embeddings from pathology foundation models, as well as to fine-tune them, we rely on the Yale High-performance Computing Center (YCRC) and use one NVIDIA A100 GPU with up to 50 GB RAM. To run spEMO and compare it with other methods, we use one NVIDIA A100 GPU with up to 150 GB RAM. Information on running time and memory usage can be found in Supplementary Table 2. The code for spEMO is available at https://github.com/HelloWorldLTY/spEMO under the MIT licence.
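The text-embedding step described above can be sketched as follows. This is a minimal sketch, not the paper's implementation: the model name (`text-embedding-3-small`), the batch size and both helper functions are assumptions; consult the spEMO repository for the exact calls.

```python
# Minimal sketch of obtaining text embeddings via the OpenAI API.
# The model name and batching scheme are assumptions, not taken from spEMO.

def batch(items, size):
    """Split a sequence into consecutive chunks of at most `size` elements."""
    return [items[i:i + size] for i in range(0, len(items), size)]

def embed_descriptions(descriptions, client, model="text-embedding-3-small"):
    """Return one embedding vector per text description.

    `client` is an authenticated OpenAI client (requires an API key);
    requests are batched to stay within per-call input limits.
    """
    vectors = []
    for chunk in batch(descriptions, 100):
        resp = client.embeddings.create(model=model, input=chunk)
        vectors.extend(d.embedding for d in resp.data)
    return vectors
```

The resulting vectors can then be concatenated with image embeddings from a pathology foundation model before downstream tasks such as clustering or classification.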

References

  1. Song, A. H. et al. Artificial intelligence for digital and computational pathology. Nat. Rev. Bioeng. 1, 930–949 (2023).

  2. Zhang, S. & Metaxas, D. On the challenges and perspectives of foundation models for medical image analysis. Med. Image Anal. 91, 102996 (2024).

  3. Chen, R. J. et al. Towards a general-purpose foundation model for computational pathology. Nat. Med. 30, 850–862 (2024).

  4. Moses, L. & Pachter, L. Museum of spatial transcriptomics. Nat. Methods 19, 534–546 (2022).

  5. Wheeler, D. L. et al. Database resources of the national center for biotechnology information. Nucleic Acids Res. 35, 5–12 (2007).

  6. Cui, H. et al. scGPT: toward building a foundation model for single-cell multi-omics using generative AI. Nat. Methods 21, 1470–1480 (2024).

  7. Tejada-Lapuerta, A. et al. Nicheformer: a foundation model for single-cell and spatial omics. Nat. Methods 22, 2525–2538 (2025).

  8. Liu, T., Li, K., Wang, Y., Li, H. & Zhao, H. Evaluating the utilities of foundation models in single-cell data analysis. Preprint at bioRxiv https://doi.org/10.1101/2023.09.08.555192 (2023).

  9. Liu, T., Chen, T., Zheng, W., Luo, X. & Zhao, H. scELMo: embeddings from language models are good learners for single-cell data analysis. Preprint at bioRxiv https://doi.org/10.1101/2023.12.07.569910 (2023).

  10. Abdi, H. & Williams, L. J. Principal component analysis. Wiley Interdiscip. Rev. Comput. Stat. 2, 433–459 (2010).

  11. Maynard, K. R. et al. Transcriptome-scale spatial gene expression in the human dorsolateral prefrontal cortex. Nat. Neurosci. 24, 425–436 (2021).

  12. Kiselev, V. Y. et al. SC3: consensus clustering of single-cell RNA-seq data. Nat. Methods 14, 483–486 (2017).

  13. Hu, J. et al. SpaGCN: integrating gene expression, spatial location and histology to identify spatial domains and spatially variable genes by graph convolutional network. Nat. Methods 18, 1342–1351 (2021).

  14. Blampey, Q. et al. Novae: a graph-based foundation model for spatial transcriptomics data. Nat. Methods 22, 2539–2550 (2025).

  15. Zhang, D. et al. Inferring super-resolution tissue architecture by integrating spatial transcriptomics with histology. Nat. Biotechnol. 42, 1372–1377 (2024).

  16. Fu, X. & Chen, Y. Pix2Path: integrating spatial transcriptomics and digital pathology with deep learning to score pathological risk and link gene expression to disease mechanisms. Preprint at bioRxiv https://doi.org/10.1101/2024.08.18.608468 (2024).

  17. Ma, J. et al. A generalizable pathology foundation model using a unified knowledge distillation pretraining framework. Nat. Biomed. Eng. https://doi.org/10.1038/s41551-025-01488-4 (2025).

  18. Xu, H. et al. A whole-slide foundation model for digital pathology from real-world data. Nature 630, 181–188 (2024).

  19. Zimmermann, E. et al. Virchow2: scaling self-supervised mixed magnification models in pathology. Preprint at https://arxiv.org/abs/2408.00738 (2024).

  20. Wang, X. et al. A pathology foundation model for cancer diagnosis and prognosis prediction. Nature 634, 970–978 (2024).

  21. Xiang, J. et al. A vision–language foundation model for precision oncology. Nature 638, 769–778 (2025).

  22. Lu, M. Y. et al. A visual-language foundation model for computational pathology. Nat. Med. 30, 863–874 (2024).

  23. Ding, T. et al. A multimodal whole-slide foundation model for pathology. Nat. Med. 31, 3749–3761 (2025).

  24. Chen, W. et al. A visual–omics foundation model to bridge histopathology with spatial transcriptomics. Nat. Methods 22, 1568–1582 (2025).

  25. Jaume, G. et al. HEST-1k: a dataset for spatial transcriptomics and histology image analysis. In Proc. 38th International Conference on Neural Information Processing Systems (NIPS '24) Vol. 37, Article 1704, 53798–53833 (Curran Associates, 2024).

  26. Palla, G. et al. Squidpy: a scalable framework for spatial omics analysis. Nat. Methods 19, 171–178 (2022).

  27. Long, Y. et al. Deciphering spatial domains from spatial multi-omics with SpatialGlue. Nat. Methods 21, 1658–1667 (2024).

  28. McInnes, L., Healy, J., Saul, N. & Großberger, L. UMAP: uniform manifold approximation and projection. J. Open Source Softw. 3, 861 (2018).

  29. Labianca, R. et al. Colon cancer. Crit. Rev. Oncol. Hematol. 74, 106–133 (2010).

  30. Lemaître, G., Nogueira, F. & Aridas, C. K. Imbalanced-learn: a Python toolbox to tackle the curse of imbalanced datasets in machine learning. J. Mach. Learn. Res. 18, 559–563 (2017).

  31. Liu, T. et al. Learning multi-cellular representations of single-cell transcriptomics data enables characterization of patient-level disease states. In Research in Computational Molecular Biology: Proc. 29th International Conference, RECOMB 2025 (ed. Sankararaman, S.) 303–306 (Springer, 2025).

  32. Weinstein, J. N. et al. The Cancer Genome Atlas Pan-Cancer analysis project. Nat. Genet. 45, 1113–1120 (2013).

  33. Katzman, J. L. et al. DeepSurv: personalized treatment recommender system using a Cox proportional hazards deep neural network. BMC Med. Res. Methodol. 18, 24 (2018).

  34. Kvamme, H. Pycox. GitHub https://github.com/havakv/pycox (2025).

  35. Lee, Y., Liu, X., Hao, M., Liu, T. & Regev, A. PathOmCLIP: connecting tumor histology with spatial gene expression via locally enhanced contrastive learning of pathology and single-cell foundation model. Preprint at bioRxiv https://doi.org/10.1101/2024.12.10.627865 (2024).

  36. Huang, T., Liu, T., Babadi, M., Jin, W. & Ying, Z. Scalable generation of spatial transcriptomics from histology images via whole-slide flow matching. In Proc. 42nd International Conference on Machine Learning (eds Singh, A. et al.) Vol. 267, 25550–25565 https://proceedings.mlr.press/v267/huang25t.html (2025).

  37. Cang, Z. et al. Screening cell–cell communication in spatial transcriptomics via collective optimal transport. Nat. Methods 20, 218–228 (2023).

  38. Tsuneki, M. & Kanavati, F. Inference of captions from histopathological patches. In International Conference on Medical Imaging with Deep Learning 1235–1250 (PMLR, 2022).

  39. Zhang, T., Kishore, V., Wu, F., Weinberger, K. Q. & Artzi, Y. BERTScore: evaluating text generation with BERT. In International Conference on Learning Representations https://openreview.net/forum?id=SkeHuCVFDr (2019).

  40. Soldaini, L. & Goharian, N. QuickUMLS: a fast, unsupervised approach for medical concept extraction. In Medical Information Retrieval (MedIR) Workshop at SIGIR 2016 https://ir.cs.georgetown.edu/downloads/quickumls.pdf (2016).

  41. Shaikovski, G. et al. PRISM: a multi-modal generative foundation model for slide-level histopathology. Preprint at https://arxiv.org/abs/2405.10254 (2024).

  42. Vorontsov, E. et al. A foundation model for clinical-grade computational pathology and rare cancers detection. Nat. Med. 30, 2924–2935 (2024).

  43. Wang, X. et al. Transformer-based unsupervised contrastive learning for histopathological image classification. Med. Image Anal. 81, 102559 (2022).

  44. Theodoris, C. V. et al. Transfer learning enables predictions in network biology. Nature 618, 616–624 (2023).

  45. Campanella, G. et al. A clinical benchmark of public self-supervised pathology foundation models. Nat. Commun. 16, 3640 (2025).

  46. Yuan, Z. et al. Benchmarking spatial clustering methods with spatially resolved transcriptomics data. Nat. Methods 21, 712–722 (2024).

  47. Xu, Y. et al. A multimodal knowledge-enhanced whole-slide pathology foundation model. Nat. Commun. 16, 11406 (2025).

  48. Wolf, F. A., Angerer, P. & Theis, F. J. SCANPY: large-scale single-cell gene expression data analysis. Genome Biol. 19, 15 (2018).

  49. Pedregosa, F. et al. Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011).

  50. Litinetskaya, A. et al. Multimodal weakly supervised learning to identify disease-specific changes in single-cell atlases. Preprint at bioRxiv https://doi.org/10.1101/2024.07.29.605625 (2024).

  51. Luecken, M. D. et al. Benchmarking atlas-level data integration in single-cell genomics. Nat. Methods 19, 41–50 (2022).

  52. Antolini, L., Boracchi, P. & Biganzoli, E. A time-dependent discrimination index for survival data. Stat. Med. 24, 3927–3944 (2005).

  53. Palla, G. Brain coronal fluorescent Adata crop. figshare https://doi.org/10.6084/m9.figshare.13604168.v1 (2021).

  54. Mueller, A. C. amueller/word_cloud. GitHub https://github.com/amueller/word_cloud (2023).


Acknowledgements

We thank M. Hao (Harvard University) for suggestions on naming the model. We thank N. Sun (Yale University), W. Jin (Northeastern University) and T. Chu (Yale University) for suggestions on model comparison. We thank D. Stern (Yale University) and H. Kluger (Yale University) for helping us recruit pathologists. We also thank S. John (Yale University) and one anonymous pathologist (Lois Hole Hospital for Women) for helping us validate medical reports. This project is supported in part by NIH grants U24HG012108 and U01HG013840.

Author information

Authors and Affiliations

Authors

Contributions

T.L. designed this study. T.L., T.H. and T.D. designed the model. T.L. and T.H. ran all the experiments. H.W., P.H., S.P. and K.S. performed human evaluations. T.L., T.H., R.Y., H.X., J.Z., F.M. and H.Z. wrote the paper. R.Y. provided computational resources. H.Z. supervised this project.

Corresponding author

Correspondence to Hongyu Zhao.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Biomedical Engineering thanks Dong Xu and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Supplementary Figs. 1–17.

Reporting Summary

Supplementary Table 1

Information on the human evaluation test.

Supplementary Table 2

LLM output for report generation.

Supplementary Data 1

Supplementary data.

Supplementary Data 2

Source data for supplementary figures and tables.

Source data

Source Data Figs. 2–7 and Table 1

Source data for main figures and tables.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Liu, T., Huang, T., Ding, T. et al. Leveraging multi-modal foundation models for analysing spatial multi-omic and histopathology data. Nat. Biomed. Eng. (2026). https://doi.org/10.1038/s41551-025-01602-6
