SIMVI disentangles intrinsic and spatial-induced cellular states in spatial omics data

Dong, Mingze; Su, David G.; Kluger, Harriet; Fan, Rong; Kluger, Yuval

doi:10.1038/s41467-025-58089-7


Article
Open access
Published: 27 March 2025


Nature Communications volume 16, Article number: 2990 (2025)


Abstract

Spatial omics technologies enable analysis of gene expression and interaction dynamics in relation to tissue structure and function. However, existing computational methods may not properly distinguish cellular intrinsic variability and intercellular interactions, and may thus fail to reliably capture spatial regulations. Here, we present Spatial Interaction Modeling using Variational Inference (SIMVI), an annotation-free deep learning framework that disentangles cell intrinsic and spatial-induced latent variables in spatial omics data with rigorous theoretical support. By this disentanglement, SIMVI enables estimation of spatial effects at a single-cell resolution, and empowers various downstream analyses. We demonstrate the superior performance of SIMVI across datasets from diverse platforms and tissues. SIMVI illuminates the cyclical spatial dynamics of germinal center B cells in human tonsil. Applying SIMVI to multiome melanoma data reveals potential tumor epigenetic reprogramming states. On our newly-collected cohort-level CosMx melanoma data, SIMVI uncovers space-and-outcome-dependent macrophage states and cellular communication machinery in tumor microenvironments.

Introduction

Spatial omics technologies have emerged as powerful tools for exploring tissue organization and function at unprecedented resolution. Modern imaging-based spatial transcriptomics (ST) platforms, including SeqFISH¹, MERFISH^2,3, and CosMx⁴, can profile hundreds to thousands of genes at single-cell or subcellular resolution. Complementary sequencing-based platforms such as DBiT-seq^5,6,7, Slide-seqV2⁸, HDST⁹, Slide-tags¹⁰, and Stereo-seq^11,12 offer genome-wide profiling at cellular or near-cellular resolution. Together, these technologies enable detailed analysis of how spatial microenvironments influence gene expression at the cellular level.

A cell’s gene expression is shaped by both its intrinsic properties and spatial microenvironment. Intrinsic properties represent aspects of gene expression that operate independently of the cellular neighborhood, such as cell type identity and cell cycle state. The spatial microenvironment, in contrast, encompasses spatially distributed signaling molecules and cell-cell interactions¹³. Disentangling these two factors is crucial for spatial omics analysis, as it reveals how cells of specific phenotypes respond to their local environment, giving rise to gene expression changes. This understanding is particularly valuable for dissecting complex biological processes involving both intrinsic state transitions and spatial dynamics.

Yet, the task brings forth two significant challenges. First, cells of the same type often cluster together in space, creating spatial patterns of gene expression that mostly reflect intrinsic properties rather than spatial interactions. Second, different cell types may respond differently to their local environment, particularly in the case of cellular interaction (Fig. 1a). To the best of our knowledge, most existing computational approaches do not address the challenge of disentangling intrinsic and spatial patterns from spatial omics data. Existing analysis methods for spatial omics data generally aim to identify spatially variable genes^14,15,16,17,18,19, learn spatial-aware embeddings^20,21,22,23,24,25,26,27,28,29,30, select genes that mediate cell interactions³¹, predict patient phenotypes^32,33, or learn gene expression changes associated with local interactions^34,35,36. Because most of these methods do not address the disentanglement task, they improperly specify the contributions of different variations in driving spatial-associated gene expression changes. While several works made preliminary efforts^21,34 to separate intrinsic and spatial-induced variations, a failure of disentanglement has been noted³⁴ due to the lack of identifiability.

**Fig. 1: Overview of the SIMVI framework.**

In this work, we present Spatial Interaction Modeling using Variational Inference (SIMVI), a deep variational inference framework that disentangles intrinsic and spatial-induced (hereafter also termed spatial) variations in spatial omics data. Our approach is supported by rigorous theoretical guarantees for model identifiability in achieving this disentanglement. The learned representations enable diverse downstream analyses, including but not limited to batch integration, niche identification, and differential expression analysis. A key innovation of SIMVI is its ability to quantify spatial effects (SE), that is, the influence of spatial location on gene expression, at single-cell resolution. This quantification yields both gene-specific variance decomposition and rankings of spatially regulated genes. Importantly, SIMVI’s SE estimation framework is adaptable to emerging modalities in spatial multi-omics data.

We comprehensively evaluated SIMVI’s capacity to uncover disentangled variations and spatial effects across multiple public datasets from different platforms and tissues: MERFISH human cortex³, Slide-seqV2 mouse hippocampus⁸, Slide-tags human tonsil, and spatial multiome human melanoma¹⁰. Our analyses demonstrate SIMVI’s superior performance in disentangling intrinsic and spatial variations compared to existing methods. SIMVI also uniquely captures complex spatial interactions and dynamics, yielding novel biological insights. In the Slide-tags tonsil data, SIMVI reveals spatial-dependent dynamics of germinal center B cells during maturation. Analysis of the multiome melanoma data uncovers refined spatial niches and multi-omics spatial effects, suggesting potential epigenetic reprogramming states. When applied to our newly collected cohort-level CosMx melanoma dataset, SIMVI identifies macrophage states with distinct spatial patterns that correlate with patient outcomes. Furthermore, it reveals the underlying cellular interaction machinery within tumor microenvironments, characterized by asymmetric dependencies between ligand-receptor strength and gene spatial effects.

Results

Disentanglement of intrinsic and spatial-induced variations by SIMVI

In our work, we consider the gene expression xⁱ of each cell i to be generated by two sets of low-dimensional latent variables zⁱ, sⁱ, representing intrinsic and spatial variation respectively (Methods, Supplementary Note 1). The intrinsic variation zⁱ encodes cell type information, while the spatial variation sⁱ captures factors such as spatial gradients and cell interactions (Fig. 1a). We also considered a key biological principle in our framework - cell types often exhibit specific spatial organizations, introducing natural dependencies between intrinsic and spatial variations. We implement this principle in our generative process through a permutation procedure (Supplementary Note 1).

We next investigated the theoretical possibility of disentangling intrinsic and spatial variations from data distributions, which is known as model identifiability in statistics³⁷. Without identifiability, the inferred intrinsic and spatial latent variables could become arbitrary mixtures of the true underlying variations. In our context, establishing identifiability presents two key challenges: 1. non-linear models generally need additional supervision for identifiability³⁸; 2. the intrinsic and spatial variation may be correlated due to cell-type-specific spatial organizations. We derived the first rigorous theoretical support for model identifiability in this context, showing that these challenges can be addressed together by viewing neighborhood intrinsic variation as a supervision on the spatial variation. Specifically, we show that it is possible to find a representation of the intrinsic latent variables up to a non-linear transformation. As for the spatial latent variables, it is possible to infer them up to an invertible linear transformation. For this theoretical framework to hold, intrinsic variations must encode minimal information, alongside other mild theoretical conditions (Supplementary Note 1).

We designed a deep variational inference framework (SIMVI) to infer intrinsic and spatial latent variables based on our theoretical support. In SIMVI, the intrinsic latent variable zⁱ is modeled by the gene expression xⁱ of cell i. The spatial latent variable sⁱ is modeled through weighted aggregation of neighborhood intrinsic variations using a Graph Attention Network (GAT)^39,40. This treatment of spatial latent variables aligns with the theoretically required supervision of neighborhood intrinsic variation. To enforce the other theoretical requirement of minimizing information encoded in intrinsic variation, SIMVI incorporates an asymmetric regularization term (Fig. 1b, Methods). SIMVI further includes a pretraining step that randomly permutes a subset of input genes, which empirically enhances model performance (see Methods, Supplementary Note 3). The model parameters are optimized by maximizing a modified evidence lower bound (ELBO, see Methods). Additional connections between the theoretical framework and model design are discussed in Supplementary Note 1.4. Together, SIMVI integrates several innovative designs that align with our theoretical foundation.
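The aggregation step described above can be illustrated with a minimal sketch. It is not the SIMVI implementation: the function names `dot` and `aggregate_neighbors` are hypothetical, and a fixed dot-product score stands in for the learned GAT attention. What it shows is the structural point that the spatial representation of a cell is built only from the intrinsic vectors of its neighbors, combined with softmax weights.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors (stand-in for a learned attention score)."""
    return sum(x * y for x, y in zip(a, b))


def aggregate_neighbors(z, neighbors, score=dot):
    """Softmax-weighted mean of neighbor intrinsic vectors for every cell.

    z         : list of intrinsic vectors, one per cell
    neighbors : neighbors[i] lists the indices of the spatial neighbors of cell i
                (cell i itself is assumed to be excluded)
    score     : function (z_i, z_j) -> float used as the attention logit
    """
    dim = len(z[0])
    s = []
    for i, nbrs in enumerate(neighbors):
        if not nbrs:
            # Isolated cell: no neighborhood information is available.
            s.append([0.0] * dim)
            continue
        logits = [score(z[i], z[j]) for j in nbrs]
        top = max(logits)  # subtract the maximum for numerical stability
        weights = [math.exp(v - top) for v in logits]
        total = sum(weights)
        s.append([
            sum(w * z[j][d] for w, j in zip(weights, nbrs)) / total
            for d in range(dim)
        ])
    return s
```

Passing `score=lambda a, b: 0.0` reduces the sketch to a plain neighborhood average, which is a convenient sanity check before swapping in a learned score.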

After training on spatial omics data, SIMVI returns two posterior embeddings. The intrinsic embedding represents cell-intrinsic variation, while the spatial embedding captures the contribution of spatial neighborhood context on the spatial variation. For clarity, we hereafter refer to these as SIMVI intrinsic and spatial variations, respectively. The disentangled representations enhance our capacity to explore and characterize cellular diversity at different levels including cell-intrinsic properties, spatial heterogeneity, and their combination. They facilitate a variety of analyses, including cell type annotation, spatial niche identification, and differential gene expression analysis. SIMVI can also incorporate covariates such as batch labels by the conditional design implemented in scVI⁴¹, enabling analyses for datasets comprised of multiple batches.

The disentangled representations provide a natural framework for estimating the effect of spatial location on individual gene expression, by viewing SIMVI spatial variation as “treatments” and intrinsic variation as “covariates”. This formulation transforms the task into a problem of estimating continuous treatment effects, a well-established paradigm in causal inference. In this study, we employ a linear-regression based estimator rooted in causal inference and double machine learning (DML)⁴² to estimate the conditional spatial effect on individual cells (see Methods and Supplementary Note 2).
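As a point of reference, the textbook partially linear model that underlies this kind of estimator can be written as below. The notation is an assumption made for illustration (tⁱ for the treatment derived from the spatial variation of cell i, θ_g for the effect on gene g, f_g and m for nuisance functions of the intrinsic variation); the exact estimator used by SIMVI is the one specified in Methods and Supplementary Note 2.

```latex
\begin{aligned}
x_g^i &= \theta_g^{\top} t^i + f_g(z^i) + \varepsilon_g^i,
  & \mathbb{E}\!\left[\varepsilon_g^i \mid z^i, t^i\right] &= 0, \\
t^i &= m(z^i) + \eta^i,
  & \mathbb{E}\!\left[\eta^i \mid z^i\right] &= 0, \\
\hat{\theta}_g &= \arg\min_{\theta} \sum_i \bigl(\tilde{x}_g^i - \theta^{\top}\tilde{t}^i\bigr)^2,
  & \tilde{x}_g^i &= x_g^i - \hat{\mathbb{E}}\!\left[x_g \mid z^i\right],
  \quad \tilde{t}^i = t^i - \hat{m}(z^i).
\end{aligned}
```

When both nuisance fits are linear and no cross-fitting is used, the last line is an ordinary partial regression of residualized expression on residualized treatment.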

In some cases, discerning the spatial effect may be impossible. For instance, if a particular cell type exhibits unique intrinsic states and localizes to spatial niches that do not overlap with those of other cells, then it is impossible to determine whether its expression shifts are influenced by differences in intrinsic states or by spatial niches. This case constitutes non-identifiability and is well-known as the violation of the positivity assumption in causal inference (refs. ^43,44, Supplementary Note 2). To this end, we derived single-cell positivity indices that can be used to exclude cells potentially violating the positivity assumption, enforcing reliable spatial effect inference (Methods, Supplementary Note 2).

SIMVI effectively disentangles variations

To demonstrate SIMVI’s ability to reveal disentangled variations, we applied it to two datasets that were profiled using representative imaging-based and sequencing-based technologies. The first dataset comes from a recent MERFISH study that profiled 4000 genes in the middle temporal gyrus (MTG) and superior temporal gyrus (STG) regions of human cortex³. With its detailed annotations and experimental replicates, this dataset provides an ideal case for comprehensively benchmarking SIMVI’s capability. The second dataset is the widely used Slide-seqV2 mouse hippocampus data⁸, which provides unbiased spatial profiling of gene expression in the mouse hippocampus at near-cellular resolution.

An overview of the MERFISH MTG/STG dataset is shown in Fig. 2a and Supplementary Fig. 5a. Each dataset included spatial profiles of the cortex sampled across three replicates, each capturing slightly different cortical layers. Apart from cortex layers, another prominent spatial pattern within the data is the “local niche” vascular structure, characterized by the colocalization of mural and endothelial cells expressing MYH11³.

**Fig. 2: SIMVI reveals intrinsic and spatial variations in MERFISH human cortex and Slide-seqV2 mouse hippocampus.**

To evaluate the disentanglement performance, we defined four key metrics: batch correction, cell type preservation, layer preservation, and local niche preservation scores (see Methods). These metrics assess the effectiveness of embeddings in representing corresponding intrinsic or spatial information, while adequately removing batch effects across replicates. Prior to benchmarking SIMVI against other methods, we conducted an extensive evaluation of the impact of parameter settings and model modifications on SIMVI’s performance, as detailed in Supplementary Note 3 (Supplementary Figs. 1–4, Supplementary Data). In particular, while it has been argued that modeling zero inflation is not necessary for spatial transcriptomics data⁴⁵, our results show that the quantitative performance of SIMVI embeddings is unchanged whether or not zero inflation is modeled.

Apart from SIMVI, we applied a variety of methods that learn batch integrated embeddings (Harmony, scVI^41,46), spatial-aware embeddings (MEFISTO, NSF, SpiceMix, GraphST, STAGATE^20,21,22,23,24) or perform both simultaneously (GraphST + Harmony, STAGATE + Harmony, CellCharter, SpiceMix + Harmony^22,23,24,26,46). We further tested baseline models (Graph Only, Graph Only + cell type 1, Graph Only + cell type 2) that have a SIMVI graph encoder design, yet replace intrinsic variation with different settings of covariates (batch label, batch label + cell type label, batch label + cell subtype label, see Methods). Such a design coincides with NCEM³⁴ when the covariate contains cell types. Overall, SIMVI performs well across all tasks, with higher accuracy in spatially relevant tasks and higher total scores in both MTG and STG compared with alternative methods (Fig. 2b, Supplementary Fig. 7e, f, Source Data). Apart from SIMVI, we observed a clear distinction in the capabilities of different methods. Several methods that do not address disentanglement (scVI, NSF, CellCharter, SpiceMix, Graph Only) also excel at batch correction and cell type preservation. This is expected since cell type represents the most prominent variation in the data. However, these methods underperform in spatially relevant tasks. For these tasks, MEFISTO works reasonably well only in layer preservation, while there is no consistent winner apart from SIMVI in local niche preservation. Qualitatively, SIMVI intrinsic variation preserves cell type structures and removes batch effects. SIMVI spatial variation distinguishes different layers, including those not observed in all batches, and localizes cells with high expression levels of MYH11 (Fig. 2c, d, Supplementary Fig. 5b–f).

We next applied SIMVI to the Slide-seqV2 mouse hippocampus data⁸ (Fig. 2e). In this dataset, each pixel represents a mixture of cells, and the cell type annotation for each pixel is based on the dominant cell type obtained from deconvolution (Supplementary Fig. 8a, b). In this case, although the cell type label is no longer appropriate for benchmarking method performance, an evaluation of the spatial representation can still be conducted by leveraging an anatomic annotation for the same brain region (Allen Mouse Brain Atlas, Fig. 2e). Compared to prior methods with good performance (CellCharter, GraphST), SIMVI is the only method that correctly reveals the organizations of cortical layers and the Third Ventricle (V3) region (Fig. 2f). Quantitatively, SIMVI has an advantage in revealing either all niches or selected niches that show clear spatial structures aligning with the anatomy (see Methods, Fig. 2g, Source Data). The suboptimal performance of alternative methods in identifying V3 is likely due to the different cell compositions in V3 upper and lower regions, highlighting the need for disentangling intrinsic from spatial variations. Interestingly, we observed that the SIMVI Cornu Ammonis (CA) and Dentate Gyrus (DG) clusters show a consistent boundary with the manually derived annotation (see Methods), while CellCharter and GraphST identified thicker CA and DG regions (Fig. 2f, Supplementary Fig. 8e). This suggests a potential “oversmoothing” phenomenon for these methods through spatial neighborhood aggregation. A benchmark of the relative distances between externally annotated CA/DG regions and their spatial neighborhoods confirms our observation, and highlights SIMVI’s advantage in revealing reliable spatial structures (Supplementary Fig. 8f, Supplementary Data).

**Fig. 3: SIMVI infers single-cell level spatial effects.**

SIMVI infers accurate spatial effects

To estimate the spatial effect, we first decompose the spatial variation using archetypal analysis⁴⁷. This method represents the spatial variation as a product of extreme spatial states (archetypes) and “archetype weight vectors”, which quantify the contribution of each archetype to individual cells. Next, we formulate a conditional treatment estimation task by considering the archetype weights as continuous treatments, and adjusting for confounders including intrinsic variation and other optional covariates (Fig. 3a). Specifically, we employ a partial regression procedure to estimate the conditional spatial effect at the level of individual cells and genes. It corresponds to an ordinary causal regression model, or a simplified version of double machine learning⁴², where all regression models are linear and no cross-fitting is used. The procedure also returns a decomposition of R² (coefficient of determination) of gene expression into the “intrinsic variation explained R²” and the “spatial effect explained R²”. Details of the procedure are described in Methods and Supplementary Note 2. Applying our procedure to MERFISH MTG data revealed archetypes corresponding to cortex layers and vascular structures (Supplementary Fig. 9). Additionally, we identified genes with distinct spatial patterns that showed notably high (outlier) spatial effect R² values compared to the bulk (Fig. 3b, Supplementary Figs. 10, 11). The spatial effects of these genes show cleaner spatial patterns in visualizations compared with normalized original counts (Fig. 3c, Supplementary Fig. 11).
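The partial regression idea can be sketched in a few lines for the simplest possible case: one gene, one archetype weight acting as the treatment, and one intrinsic covariate. This is an illustrative reduction, not the procedure from Methods; the function names are hypothetical, and the sequential R² split shown here is one natural choice that may differ from the paper's exact definition.

```python
from statistics import fmean


def ols_residuals(y, x):
    """Residuals of y after least-squares regression on x with an intercept."""
    mx, my = fmean(x), fmean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    return [b - my - slope * (a - mx) for a, b in zip(x, y)]


def partial_regression(expr, weight, intrinsic):
    """Effect of `weight` on `expr` after adjusting both for `intrinsic`.

    Linear nuisance models and no cross-fitting, so this is plain
    Frisch-Waugh-Lovell partial regression.
    """
    expr_res = ols_residuals(expr, intrinsic)
    weight_res = ols_residuals(weight, intrinsic)
    theta = (sum(w * e for w, e in zip(weight_res, expr_res))
             / sum(w * w for w in weight_res))
    return theta, expr_res, weight_res


def r2_decomposition(expr, weight, intrinsic):
    """Split R^2 into a part explained by the intrinsic covariate and a part
    explained by the residualized treatment (assumed sequential definition)."""
    theta, expr_res, weight_res = partial_regression(expr, weight, intrinsic)
    mean_expr = fmean(expr)
    total = sum((v - mean_expr) ** 2 for v in expr)
    after_intrinsic = sum(e * e for e in expr_res)
    after_both = sum((e - theta * w) ** 2 for e, w in zip(expr_res, weight_res))
    return {
        "intrinsic": 1.0 - after_intrinsic / total,
        "spatial": (after_intrinsic - after_both) / total,
    }
```

On noiseless data generated as `expr = 2 * intrinsic + 3 * weight + 1`, the estimated effect recovers the coefficient 3 and the two R² parts sum to 1, which makes a quick correctness check for the residualization.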

We further focused on the spatial effect within a specific cell type (astrocyte, ASC, see Methods). We chose astrocytes because they are known to exhibit spatial diversity across cortex layers yet do not have definitive subtype distinctions (morphology, neurotransmitter type, etc.) such as those seen in neurons^48,49. Unsupervised analyses of spatial effects identified gene clusters with divergent spatial patterns (Fig. 3d, Supplementary Fig. 12). We benchmarked the spatial effects by taking advantage of the vertical layered structure of MTG replicate 1, which implies that the spatial effect should capture trends along the vertical axis (true positive) but not along the horizontal axis (false positive). We adopted two non-parametric metrics, Kendall’s tau and Spearman R, to evaluate the dependency between each gene and the spatial coordinates. Apart from our approach (named as SIMVI-archetype, double machine learning (DML)), we additionally included six baselines: normalized counts (NC), scVI NC, NCEM NC, cell-type conditioned linear regression using SIMVI spatial variation (analogous to the c-SIDE⁵⁰ setup, SIMVI-linear regression (LR)), cell-type conditioned linear regression using SIMVI archetype representation (SIMVI-archetype, LR), and an ablation setting that substitutes SIMVI intrinsic variation with scVI embedding (scVI ablation*). SIMVI-archetype, LR and SIMVI-archetype, DML emerged as the two top performers in the benchmark, with the latter showing favorable performance in identifying spatial-dependent genes (4/7 gene clusters excluding cluster 0) while filtering out false positives (6/8 gene clusters, Fig. 3e, Supplementary Fig. 13). Substitution of SIMVI intrinsic variation with the scVI embedding (scVI ablation*) greatly hampered the performance and obscured true spatial patterns (Supplementary Figs. 12, 13). This underscores the importance of removing spatial information from intrinsic variation for the spatial effect estimation task. Further gene pathway analysis underscores functional distinctions among these genes exhibiting different spatial patterns (Fig. 3f).
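The two rank-based metrics used in this benchmark can be sketched as follows. The implementation is a plain-Python illustration with hypothetical function names: Kendall's tau is computed in its tau-a form (no tie correction) and Spearman R as the Pearson correlation of average ranks. In practice a statistics library would normally be used, and the variant applied in the paper may handle ties differently.

```python
from itertools import combinations
from statistics import fmean


def kendall_tau(x, y):
    """Kendall's tau-a: (concordant - discordant pairs) / total pairs."""
    pairs = list(combinations(range(len(x)), 2))
    balance = 0
    for i, j in pairs:
        prod = (x[i] - x[j]) * (y[i] - y[j])
        balance += (prod > 0) - (prod < 0)  # +1 concordant, -1 discordant, 0 tie
    return balance / len(pairs)


def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    ma, mb = fmean(a), fmean(b)
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    var_a = sum((u - ma) ** 2 for u in a)
    var_b = sum((v - mb) ** 2 for v in b)
    return cov / (var_a * var_b) ** 0.5


def spearman_r(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

For a gene's per-cell spatial effect, a high absolute value against the vertical coordinate together with a value near zero against the horizontal coordinate would correspond to the true-positive and false-positive criteria described above.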

In the Slide-seqV2 mouse hippocampus data, a number of spatial niches also exhibit distinct intrinsic states, presumably due to the lower inherent resolution of the data. This constitutes a case where the positivity condition may be violated and examining the positivity index is essential. The spatial effect analysis of the Slide-seqV2 mouse hippocampus data primarily identified genes upregulated within the V3 upper region (Supplementary Fig. 14a). Similar to the results in MERFISH data, the spatial effects of these genes show cleaner spatial patterns in visualizations compared with normalized original counts, especially for genes exhibiting high sparsity (Supplementary Fig. 14d, e). Furthermore, the positivity analysis identifies potential violation of the positivity assumption in CA, DG, MH (Medial Habenula), and a layer between CA1 and cortical regions that primarily consists of oligodendrocytes (Supplementary Fig. 14b). These regions are composed of cells with distinct intrinsic states and spatial environments simultaneously, supporting the validity of the approach. The spatial regions violating positivity (archetypes with high positivity index) remain consistent across different clustering resolutions (Supplementary Fig. 14c).

SIMVI illuminates B cell dynamics in Slide-tags human tonsil

We next applied SIMVI to the recently published Slide-tags human tonsil dataset (Fig. 4a)¹⁰. As expected, the intrinsic variation generated by SIMVI captured different cell types in the sample (Fig. 4b), while its spatial variation grouped biologically meaningful niches, which we annotated as B cell zone, germinal center (GC) dark / light zone, GC boundary that enriches T follicular helper (Tfh) cells, and T cell zone (Fig. 4c).

**Fig. 4: SIMVI identifies cell niches and dynamics in human tonsil.**

Our primary focus in the dataset is on germinal center B cells. These cells initially undergo rapid proliferation and somatic hypermutation (SHM) in the dark zone^51,52. Following SHM, B cells migrate to the light zone, where they receive survival and proliferation signals. This cycle repeats until the B cells mature, at which point they either undergo apoptosis or survive based on their antigen-binding efficiency^51,52. As such, we expect a cyclical organization of germinal center B cells. Using SIMVI’s full variation, we successfully annotated different phases of B cells, which indeed exhibit a circular structure (Fig. 4d). These phases are also characterized by sequentially activated genes (Fig. 4e, Supplementary Fig. 15c), and can be identified via standard pipelines or linear projection of SIMVI representation (Supplementary Fig. 15a, b).

The spatial locations of germinal center B cells reveal intricate dependencies with their gene expression states. Among the five annotated phases, B cells from phases 2 and 3 were predominantly found in the dark zone, while those from other phases were mainly located in the light zone (Fig. 4f, Supplementary Fig. 15d). Combining this with the phase adjacency relations (0-1-2-3-4-5-0, Fig. 4d, e), we were able to differentiate phase transitions caused by spatial migration from those driven by shifts in gene expression (Fig. 4g). Such a distinction explicitly requires two different representations encoding spatial information and cellular states respectively. Finally, although external spatial annotations for benchmarking are lacking, an assessment using SIMVI-curated labels demonstrates that SIMVI identifies spatial structures within the tonsil and the phases of germinal center B cells more consistently than other methods (Supplementary Fig. 15e, Supplementary Data). Altogether, these findings underscore the unique strength of SIMVI disentangled representations in revealing distinct mechanisms underlying cellular state transitions.

SIMVI reveals niches and spatial-dependent epigenetic states in human melanoma

Melanoma is the most common cause of skin cancer-related fatalities, resulting in over 7000 deaths per year in the United States⁵³. In recent years, the death rate has dropped dramatically due to the advent of immune checkpoint inhibitors which reverse T cell exhaustion and disinhibit effector T cells, among other functions⁵⁴. However, not all patients respond to these therapies, and extensive efforts are ongoing to understand mechanisms of resistance or sensitivity. Apart from the heterogeneity across individuals, substantial heterogeneity is also observed within each tumor, which may promote drug resistance and the transition to pro-metastatic cellular states. In this work, we applied SIMVI to two datasets profiling human melanoma samples, to better understand the underlying biology at both individual and cohort levels.

The first dataset we employed is the spatial multiome melanoma dataset from the Slide-tags work¹⁰. It consists of a melanoma sample from an individual patient, profiling both spatial gene expression and chromatin accessibility. The dataset is particularly well-suited for exploring tumor heterogeneity within distinct spatial niches and elucidating the role of epigenetic regulation in defining different tumor states. Analyses of the dataset in the original study revealed two tumor subpopulations (tumor 1 and tumor 2) that exhibit distinct gene expression patterns and are also spatially segregated (Fig. 5a, b). We first applied SIMVI (and other baseline methods) on the gene expression modality alone to extract intrinsic and spatial variations. The cell type heterogeneity is well captured by SIMVI intrinsic variation, which further enforces latent space proximity across tumor cells consistent with cell ontology, compared with the scVI baseline (Fig. 5b, Supplementary Fig. 16d).

**Fig. 5: SIMVI uncovers functional niches and potential epigenetic reprogramming states in human melanoma.**

The SIMVI spatial variation further depicts refined microenvironments within the sample, including niches primarily composed of tumor cells and those involving infiltrating immune cells. In particular, it differentiates two intermediate states of the immune infiltrating microenvironment: one involves a non-infiltrating intermediate state physically between the two tumors (gray cluster in Fig. 5c), whereas the other shows a continuous shift with intermediate states found in both tumor 1 and 2 regions (Fig. 5c). Furthermore, the spatial variation discriminates a spatial niche in tumor 1 with highly expressed SLC6A8 and MIR3681HG (Supplementary Fig. 16a–c), which we labeled as “hypoxia” due to the role of SLC6A8 in metabolism and previous literature highlighting its upregulation in hypoxic tumor cells⁵⁵.

Spatial effect analysis on the dataset using the gene expression modality alone reveals various genes influenced by the spatial context. One notable example is macrophages, which exhibit differential expression in tumor 1 and tumor 2 niches (AQP9, MRC, Supplementary Fig. 17a–c). We next benchmarked the quantitative performance across different methods in identifying cell types and macrophage / hypoxia niches based on external annotation labels (see Methods). Our results show that SIMVI excels in both cell type identification and spatial niche characterization, achieving the highest overall score. Moreover, multiple runs of SIMVI result in consistent spatial niches marked by a high SIMVI niche score (see Methods, Fig. 5d, Source Data). The advantage of SIMVI disentangled representations over SIMVI full variation further highlights the benefits of utilizing disentangled representations (Fig. 5d).

We extended the SIMVI spatial effect estimation procedure to infer the epigenetic spatial effect. Noting the high noise of the ATAC-seq dataset, we performed spatial effect estimation on its Latent Semantic Indexing (LSI) components, aiming to reveal a global organization of spatial effect clusters. Surprisingly, our results highlight two distinct clusters of tumor 2 in the epigenetic spatial effect space, which we termed as state 1 and 2 (Fig. 5e). State 1 was mostly observed in immune infiltrating microenvironments, whereas state 2 was primarily localized in the non-immune-infiltrating tumor regions (Fig. 5f). Compared with state 2, cells from state 1 also exhibited more spatially dispersed distributions (Fig. 5g). Although these tumor states exhibit distinct spatial distributions, they coexist within each spatial niche (Fig. 5f), suggesting that they cannot be simply resolved by niche identification analysis.

We further investigated the functions of differential genes / peaks across state 1 and state 2 cells. The gene expression differences are notably distinctive in the VEGFA-VEGFR2 signaling and focal adhesion pathways (Fig. 5i). These pathways are crucially linked to the metastatic capability and proliferation of tumor cells. The ATAC-seq differential peaks between these two states display diverse yet subtle functional enrichments, and relatively attenuated transcription start site (TSS) accessibility in state 1 (Supplementary Fig. 18). Our motif analyses identified transcription factor (TF) motifs enriched specifically in state 2 (in particular the PITX2 motif, Fig. 5j), yet no gene expression changes of these TFs were observed (Fig. 5h). These results provide evidence for an epigenetic reprogramming state in tumor 2 cells, which also exhibits a focused shift in gene expression. This finding was not noted in the original study, likely owing to substantial noise in the ATAC data that even obscured the distinction between tumor 1 and tumor 2 (Supplementary Fig. 17f, g). These results highlight the power of SIMVI in characterizing spatial-dependent states in spatial multi-omics datasets.

SIMVI uncovers spatial interactions in cohort melanoma data

The second dataset we employed is our newly collected CosMx samples from 25 melanoma patients treated with immune checkpoint inhibitors with various outcomes (Fig. 6a). The phenotype diversity within this single spatial array makes it an ideal case to systematically understand melanoma-associated spatial biology at a cohort level. In the dataset, we observed reasonable overlap for non-tumor cells across patients, while substantial heterogeneity of tumor cells across patients remains evident (Supplementary Fig. 19a). Further exploratory analyses revealed a cell type composition difference across patients with different outcomes (Supplementary Fig. 19b).

**Fig. 6: SIMVI characterizes macrophage subtypes and reveals cell interaction landscape in cohort-level spatial melanoma profiles.**

The heterogeneity across tumors includes differences of both intrinsic and spatial variations. Therefore, it constitutes a case where positivity, the assumption for estimating spatial effects, is not satisfied. Indeed, we observed a higher positivity index of all tumor cell subtypes compared with non-tumor cell types (Supplementary Fig. 19i). Nevertheless, in this challenging case, we found that the SIMVI intrinsic variation correctly merged tumor cells of the same subtypes (Fig. 6b, Supplementary Fig. 19). Meanwhile, the SIMVI spatial variation captures the pattern of patient responses in non-tumor cells, while separating tumor cells from different patients (Fig. 6c, Supplementary Fig. 19). In our benchmarking, SIMVI achieves the best performance in accurately identifying both cell types (via intrinsic variation) and patient response labels (through spatial variation), leading to improved overall performance (Fig. 6d, Source Data, see Methods).

The proportion of macrophages among all cells remained consistent across patients with different outcomes, as opposed to other prevalent non-tumor cell types (Supplementary Fig. 19a). Moreover, tumor-associated macrophages (TAMs) are known as important indicators for patient outcomes^56,57. Based on these observations, we anticipated that macrophages contribute to patient outcomes through spatial-dependent subpopulations, and therefore focused on spatial effect analysis for macrophages. Our analyses delineated four states in macrophages, characterized by canonical TAM markers (State 1: C1QC+, State 2: C1QC+,LYZ+, State 3: LYZ+, State 4: SPP1+) (Fig. 6e), showing a clearer representation than analyses using original counts (Supplementary Fig. 20a). These states align well with existing knowledge on TAMs⁵⁶, where C1QC+, LYZ+ and SPP1+ macrophages were categorized as tissue resident, classical tumor-infiltrating, and angiogenesis macrophages, respectively. We observed a monotonically increasing tumor proximity in macrophage states from 1 to 4 (Fig. 6f). This observation is supported by visualization of a representative patient sample, where SPP1 expression is primarily observed in macrophages adjacent to tumor, and the other two markers are mostly observed in immune niches (Supplementary Fig. 20c–e).

By comparing the expression of the canonical markers across patients, we observed an increase of SPP1 and a decrease of LYZ in patients with the worst outcome PD (progressive disease, Supplementary Fig. 20b). The pattern was also observed in SIMVI spatial effects (Fig. 6g). A quantitative evaluation suggests an advantage of SIMVI spatial effect over alternative baselines in predicting patient PD (progressive disease) outcome by either SPP1 or LYZ (Fig. 6g, see Methods).

Finally, we investigated the relationships between ligand-receptor strength and gene spatial effects among non-tumor cells. The correlation analysis reveals a statistically significant asymmetric relationship between ligand-receptor strength and spatial effects across patients and spatial niches (Supplementary Fig. 21), indicating a potential latent trajectory for cell interaction machinery (Fig. 6h). Specifically, a high strength of adhesion / extracellular matrix (ECM) signaling ligand-receptor (LR) pairs (dominant in innate immune cells and tissue resident cells) induces major histocompatibility complex (MHC) signaling genes. The high level of MHC-related LR pairs across all cell types further activates chemokine signaling in lymphoid cells (Fig. 6h, Supplementary Fig. 21g). Other methods mostly recovered two regions and failed to identify the asymmetric pattern (Supplementary Figs. 21, 22). This is likely because alternative methods return “normalized expressions” convolving intrinsic and spatial effects. We also found that the LR strength from different stages in the machinery varies with the patient outcome (Fig. 6i), which may be explained by the cell composition difference across patient outcomes. This latent trajectory can also be represented by gene pathways, suggesting a directed information transfer landscape across cells in the tumor microenvironment (Fig. 6j).

Discussion

We introduced SIMVI (Spatial Interaction Modeling using Variational Inference), a powerful approach to disentangle intrinsic and spatial-induced variations in spatial omics data. To the best of our knowledge, SIMVI is the first model that shows capability for the task, enabling further estimation of spatial effects at a single-cell level. SIMVI outperforms alternative methods in terms of various quantitative metrics and qualitative comparisons. We applied SIMVI to five real datasets from different tissues and platforms, including MERFISH human cortex, Slide-seqV2 mouse hippocampus, Slide-tags human tonsil, spatial multiome human melanoma, and CosMx cohort-level melanoma. SIMVI provides new biological insights for all analyzed datasets, with notable findings in the latter three. Given the rapid development of high-resolution spatial omics, we anticipate SIMVI to be of immediate interest to the spatial omics community.

SIMVI was designed to handle spatial omics data with single-cell resolution, such as imaging-based spatial omics data and high-resolution sequencing-based spatial omics data like Slide-tags¹⁰ and Stereo-seq¹¹. SIMVI may encounter limitations when applied to lower-resolution spatial omics datasets, but these challenges could potentially be addressed by incorporating complementary methods. One limitation is that these datasets have more than one cell in each pixel, which may obscure the cellular interactions and make the spatial gradient and gradual shifts in cell composition difficult to distinguish. Another limitation is that some of these technologies may have non-negligible gaps between pixels, which restricts the interpretation of local interactions between observed pixels. To address these limitations, advanced deconvolution methods such as Tangram⁵⁸, CARD⁵⁹, and DestVI⁶⁰ could help reveal the single-cell profile within each pixel by using scRNA-seq references. Moreover, computational techniques that model the spatial image and spatial transcriptomics datasets, such as XFuse⁶¹ and TESLA⁶², may be extended to provide imputations for cells not covered by pixels.

While our spatial effect estimation procedure was based on linear models, other more advanced machine learning models can be directly adopted through the double machine learning framework. However, the use of non-linear models might raise complexities related to model overfitting and interpretation⁶³. Additionally, fine-tuning of trained SIMVI models on scRNA-seq datasets may facilitate niche annotation and spatial effect identification in single-cell sequencing datasets⁶⁴. The spatial effect may require careful interpretation, as it could arise from different biological mechanisms such as cell migration and ligand-receptor interaction. Leveraging established gene interaction relationships (e.g., ligand-receptor)^{65,66,67,68,69} can aid in interpretation of the observed spatial effects. Incorporating prior knowledge into the SIMVI model may also enhance SIMVI’s power in challenging spatial omics datasets, where intrinsic and spatial variations substantially coincide.

Methods

The SIMVI model

We now provide a more detailed description of the SIMVI model. Let $X\in {{\mathbb{R}}}^{n\times p}$ denote the count matrix of a spatial omics dataset, where n is the number of cells/points and p is the number of genes. The coordinate matrix of cells/points is represented by $C\in {{\mathbb{R}}}^{n\times 2}$, with Cⁱ indicating the (x, y) coordinates of cell i. We preprocess the spatial information to construct a k-nearest neighbor graph (k = 10 throughout the study): G = (V, E), where the vertex set is V = {1, …, n}, and the edge set E consists of ordered pairs (i, j) with i, j ∈ V. The neighbors of cell i are defined as N(i) = {j∣(i, j) ∈ E}. Using these notations, we proceed to describe the generative model and inference procedure of SIMVI.
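The graph construction described above can be illustrated with a minimal NumPy sketch. It is not the SIMVI implementation: the function names `knn_edges` and `neighbors` are hypothetical, Euclidean distance is assumed, and the dense distance matrix is only practical for small n.

```python
import numpy as np

def knn_edges(coords: np.ndarray, k: int = 10) -> np.ndarray:
    """Return an (n*k, 2) array of ordered pairs (i, j), where j is one of
    the k nearest neighbors of i in Euclidean distance (assumed metric)."""
    n = coords.shape[0]
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < n")
    # Dense pairwise squared distances; acceptable for a small illustration.
    diff = coords[:, None, :] - coords[None, :, :]
    d2 = np.einsum("ijk,ijk->ij", diff, diff)
    np.fill_diagonal(d2, np.inf)  # a cell is not its own neighbor
    nbrs = np.argsort(d2, axis=1)[:, :k]
    src = np.repeat(np.arange(n), k)
    return np.column_stack([src, nbrs.ravel()])

def neighbors(edges: np.ndarray, i: int) -> np.ndarray:
    """N(i) = {j | (i, j) in E}."""
    return edges[edges[:, 0] == i, 1]
```

Because each cell selects its own k nearest neighbors, the resulting edge set is directed: (i, j) ∈ E does not imply (j, i) ∈ E.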

Generative process

In this work, we consider the following generative process for modeling the distribution of entries ${x}_{g}^{i}$ (cell i, gene g) in the count matrix of spatial omics data, given the graph G:

$$ z \sim \,{{\rm{Normal}}}\,(0,I)\\ \pi : \,{{\rm{permutation}}}\,,\,p(\pi | z,G)=\frac{1}{Z}{U}_{G}({z}^{\pi })\\ {s}^{i}| z \sim \,{{\rm{Normal}}}\,(A{z}^{\pi (N(i))},{\Sigma }_{s})\\ {l}^{i} \sim \,{{\rm{LogNormal}}}\,\left({l}_{\mu }^{{b}^{i}},{l}_{\sigma }^{{b}^{i}}\right)\\ {\rho }^{i}=\,f({z}^{\pi (i)},{s}^{i},{b}^{i})\\ {w}_{g}^{i} \sim \,{{\rm{Gamma}}}\,({\rho }_{g}^{i},{\theta }_{g})\\ {y}_{g}^{i} \sim \,{{\rm{Poisson}}}\,({l}^{i}{w}_{g}^{i})\\ {h}_{g}^{i} \sim \,{{\rm{Bernoulli}}}\,\left({f}_{h}^{g}({z}^{\pi (i)},{s}^{i},{b}^{i})\right)\\ {x}_{g}^{i}=\,{h}_{g}^{i}{y}_{g}^{i}$$

(1)

In the generative process, z = [z¹, z², …, zⁿ] represents the intrinsic variation for all cells. ${z}^{i}\in {{\mathbb{R}}}^{{d}_{1}}$ follows i.i.d. standard normal distributions, thus is independent of G. To further incorporate the possible spatial dependence, the generative process includes a probabilistic permutation π allocating zⁱ to spatial locations π(i). The permutation π is sampled from the distribution $\frac{1}{Z}{U}_{G}({z}^{\pi })$, accounting for intrinsic cell information z and spatial proximity G. After the permutation, z^π(i) represents the intrinsic information of the cell at node i.

sⁱ represents the spatial variation of the cell at node i. It depends on the neighborhood intrinsic variation z^π(N(i)), and follows a normal distribution with the form shown in (1), where the matrix $A\in {{\mathbb{R}}}^{{d}_{2}\times k{d}_{1}}$, and the covariance matrix ${\Sigma }_{s}\in {{\mathbb{R}}}^{{d}_{2}\times {d}_{2}}$ is positive definite. bⁱ represents experimental covariates such as batch label. The library size is modeled as a latent variable following log normal distribution with parameters ${l}_{\mu }^{{b}^{i}},{l}_{\sigma }^{{b}^{i}}$, following the scVI design⁴¹. In practice, we have found the statistical estimation of library size lⁱ is usually sufficient. In this case, lⁱ can be fixed as the observed library size of cell i. ρⁱ = f(z^π(i), sⁱ, bⁱ) represents the expectation of scaled gene expression. The function f employs a soft-max layer so that ρⁱ has positive entries that sum to 1. $\theta \in {{\mathbb{R}}}^{p}$ specifies the gene-specific shape parameter of the Gamma distribution. ${h}_{g}^{i}$ represents the zero inflation level. Together, ${x}_{g}^{i}$ follows a zero-inflated negative binomial (ZINB) distribution, parametrized by functions of (z^π(i), sⁱ, bⁱ, lⁱ). The generative process of SIMVI mostly follows the scVI framework^41,70, with the addition of the probabilistic spatial assignment π and the spatial variation s.
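To make the observation model concrete, the sketch below samples counts from the last four lines of (1), given already-computed ρ, θ, library sizes, and Bernoulli probabilities. It is an illustration only: the function name is hypothetical, and we assume the scVI-style parameterization in which θ_g is the Gamma shape and ρ is the Gamma mean (so the scale is ρ/θ); the decoder networks f and f_h are not modeled.

```python
import numpy as np

def sample_zinb_counts(rho, theta, library, keep_prob, rng):
    """Sample x = h * y following the observation part of the generative process.

    rho       : (n, p) expected scaled expression, rows sum to 1
    theta     : (p,)   gene-specific Gamma shape (assumed parameterization)
    library   : (n,)   library size l^i
    keep_prob : (n, p) Bernoulli probability that h = 1 (count is kept)
    """
    # w ~ Gamma with shape theta and mean rho (scale = rho / theta).
    w = rng.gamma(shape=theta, scale=rho / theta)
    # y ~ Poisson(l * w); marginally a negative binomial with mean l * rho.
    y = rng.poisson(library[:, None] * w)
    # h ~ Bernoulli; h = 0 produces an inflated zero.
    h = rng.binomial(1, keep_prob)
    return h * y
```

Under this reading, setting `keep_prob` to 1 everywhere recovers a plain negative binomial with mean lⁱρⁱ.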

Approximate posterior inference of SIMVI

In order to infer the parameters in the generative process, we approximate the posterior distribution via variational inference. The posterior distribution is approximated to be of the following factorized forms:

$$ {q}_{\Phi }({z}^{\pi i},{s}^{i},{l}^{i}| {x}^{i},{x}^{N(i)},{b}^{i})=\, {q}_{\Phi }({z}^{\pi i}| {x}^{i},{b}^{i}){q}_{\Psi }({s}^{i}| {x}^{N(i)},{b}^{N(i)}){q}_{{\Phi }_{l}}({l}^{i}| {x}^{i},{b}^{i})\\ {q}_{\Phi }({z}^{\pi i}| {x}^{i},{b}^{i}):{z}^{\pi i} \sim \, {{{\mathcal{N}}}}({\phi }_{1}({x}^{i},{b}^{i}),{\phi }_{2}({x}^{i},{b}^{i}))\\ {q}_{\Psi }({s}^{i}| {x}^{N(i)},{b}^{N(i)}):{s}^{i} \sim \, {{{\mathcal{N}}}}({\psi }_{1}({\phi }_{1}({x}^{N(i)},{b}^{N(i)})),{\psi }_{2}({\phi }_{2}({x}^{N(i)},{b}^{N(i)})))\\ {q}_{{\Phi }_{l}}({l}^{i}| {x}^{i},{b}^{i}):{l}^{i} \sim \, \exp \left({{{\mathcal{N}}}}({\phi }_{l1}({x}^{i},{b}^{i}),{\phi }_{l2}({x}^{i},{b}^{i}))\right)$$

(2)

Here (Φ, Ψ, Φ_l) denotes neural network weights that determine the variational posterior, composed of ((ϕ₁, ϕ₂), (ψ₁, ψ₂), (ϕ_l1, ϕ_l2)) as shown above. Our treatment of zⁱ and lⁱ aligns with the scVI family of models^41,70. In SIMVI, we additionally model the spatial variation sⁱ using transformed cell neighborhood gene expression ϕ₁(x^N(i)), where ϕ₁ encodes the expectation term of intrinsic variation. Such a design enforces the spatial variation to be determined by the intrinsic variation in the cell neighborhood. We use a one-layer graph attention network (GAT) with dynamic attention⁴⁰ to model the variational posterior ψ₁, ψ₂. Finally, we note that the constructed variational posterior directly starts from z^π and does not explicitly model z and π. Consequently, the variational posteriors are best viewed as descriptive models of the previously outlined generative process.

After the factorization, we derive the evidence lower bound (ELBO) as follows. Instead of considering each cell, we consider ELBO of the full likelihood. As samples in the generative process are not independent, the full likelihood is not a simple sum of individual likelihoods. In the derivation of the ELBO, terms without an upper index represent those from all data points.

$$\log p(x| b,G) =\log \int\,p(x,{z}^{\pi },s,l| b,G)dzd\pi dsdl\\ =\log \int\,\frac{p(x,{z}^{\pi },s,l| b,G)q({z}^{\pi },s,l| x,b,G)}{q({z}^{\pi },s,l| x,b,G)}dzd\pi dsdl\\ \ge \int\,q({z}^{\pi },s,l| x,b,G)\log \frac{p(x,{z}^{\pi },s,l| b,G)}{q({z}^{\pi },s,l| x,b,G)}dzd\pi dsdl\\ =\int\,q({z}^{\pi },s,l| x,b,G)\log \frac{p(x| {z}^{\pi },s,l,b,G)p({z}^{\pi },s,l| b,G)}{q({z}^{\pi },s,l| x,b,G)}dzd\pi dsdl\\ ={{\mathbb{E}}}_{q({z}^{\pi },s,l| x,b,G)}\log p(x| {z}^{\pi },s,l,b,G)-{D}_{{{{\rm{KL}}}}}(q({z}^{\pi },s,l| x,b,G)| | p({z}^{\pi },s,l| b,G))\\ ={{\mathbb{E}}}_{q({z}^{\pi },s,l| x,b,G)}{\sum}_{i}\log p({x}^{i}| {z}^{\pi (i)},{s}^{i},{l}^{i},{b}^{i},G)-{D}_{{{{\rm{KL}}}}}(q({z}^{\pi }| x,b,G)| | p({z}^{\pi }| b,G))\\ \quad -{D}_{{{{\rm{KL}}}}}(q(s| {z}^{\pi },x,b,G)| | p(s| {z}^{\pi },b,G)| q({z}^{\pi }| x,b,G))\\ \quad -{\sum}_{i}{D}_{{{{\rm{KL}}}}}(q({l}^{i}| {x}^{i},{b}^{i})| | p({l}^{i}| {b}^{i})).\quad (\,{{\rm{ELBO}}})$$

(3)

For the first KL divergence term, we have

$${D}_{{{{\rm{KL}}}}}(q({z}^{\pi }| x,b,G)| | p({z}^{\pi }| b,G))=-\int\,q({z}^{\pi }={{{\boldsymbol{z}}}}| x,b,G)\log \frac{p({z}^{\pi }={{{\boldsymbol{z}}}}| b,G)}{q({z}^{\pi }={{{\boldsymbol{z}}}}| x,b,G)}d{{{\boldsymbol{z}}}}.$$

(4)

Considering the term p(z^π = z∣b, G), we have

$$p({z}^{\pi }={{{\boldsymbol{z}}}}| b,G)={\sum}_{{{{\boldsymbol{\pi }}}}}p(z={{{{\boldsymbol{z}}}}}^{{{{{\boldsymbol{\pi }}}}}^{-1}}| b,G)p({{{\boldsymbol{\pi }}}}| z={{{{\boldsymbol{z}}}}}^{{{{{\boldsymbol{\pi }}}}}^{-1}},b,G)$$

(5)

Because ${z}^{i}| b,G \sim {{{\mathcal{N}}}}(0,I)$ are i.i.d. across different i, $p(z={{{{\boldsymbol{z}}}}}^{{{{{\boldsymbol{\pi }}}}}^{-1}}| b,G)$ is a constant with respect to π, and equals p(z = z∣b, G). Moreover, from the definition of π, $p({{{\boldsymbol{\pi }}}}| z={{{{\boldsymbol{z}}}}}^{{{{{\boldsymbol{\pi }}}}}^{-1}},b,G)={U}_{G}({{{\boldsymbol{z}}}})/Z$ is also a constant with respect to π. The total number of π is n!. Therefore we have

$$p({z}^{\pi }={{{\boldsymbol{z}}}}| b,G)=\frac{n!}{Z}p(z={{{\boldsymbol{z}}}}| b,G){U}_{G}({{{{\boldsymbol{z}}}}}^{{\pi }^{-1}\circ \pi })=\frac{n!}{Z}p(z={{{\boldsymbol{z}}}}| b,G){U}_{G}({{{\boldsymbol{z}}}}).$$

(6)

Altogether, this KL divergence term equals

$$ -\int\,q({z}^{\pi }={{{\boldsymbol{z}}}}| x,b,G)\log \frac{n!p(z={{{\boldsymbol{z}}}}| b,G){U}_{G}({{{\boldsymbol{z}}}})}{q({z}^{\pi }={{{\boldsymbol{z}}}}| x,b,G)}d{{{\boldsymbol{z}}}}\\ ={\sum}_{i}{D}_{{{{\rm{KL}}}}}(q({z}^{\pi i}| {x}^{i},{b}^{i},G)| | {{{\mathcal{N}}}}(0,I))+\int \,q({z}^{\pi }={{{\boldsymbol{z}}}}| x,b,G)\log \frac{1/n!}{{U}_{G}({{{\boldsymbol{z}}}})/Z}d{{{\boldsymbol{z}}}}\\ ={\sum}_{i}{D}_{{{{\rm{KL}}}}}(q({z}^{\pi i}| {x}^{i},{b}^{i},G)| | {{{\mathcal{N}}}}(0,I))+{{\mathbb{E}}}_{q({z}^{\pi }| x,b,G)}\log \frac{1/n!}{{U}_{G}({z}^{\pi })/Z}$$

(7)

Our derivation shows that the KL divergence regarding z can be decomposed into two parts. The first part coincides with the KL divergence term in vanilla VAEs and scVI⁴¹, while the second term is minimized when U_G(z^π) achieves maximum under the variational posterior q(z^π∣x, b, G). In practice, we only used the first part in the loss function for two reasons. First, the function U is not tractable in practice; second, if the variational posterior q(z^π∣x) well characterizes individual observations, then it should automatically capture the underlying colocalization pattern, i.e., U_G(z^π) is large.

For the second KL divergence term, we have

$$ {D}_{{{{\rm{KL}}}}}(q(s| {z}^{\pi },x,b,G)| | p(s| {z}^{\pi },b,G)| q({z}^{\pi }| x,b,G))\\ ={\sum}_{i}{D}_{{{{\rm{KL}}}}}(q({s}^{i}| x,b,G)| | {{{\mathcal{N}}}}(A{z}^{\pi (N(i))},{\Sigma }_{s})| q({z}^{\pi }| x,b,G))\\ ={{\mathbb{E}}}_{q({z}^{\pi }| x,b,G)}{\sum}_{i}{D}_{{{{\rm{KL}}}}}({{{\mathcal{N}}}}({\psi }_{1}({\phi }_{1}({x}^{N(i)},{b}^{N(i)})),{\psi }_{2}({\phi }_{2}({x}^{N(i)},{b}^{N(i)})))| | {{{\mathcal{N}}}}(A{z}^{\pi (N(i))},{\Sigma }_{s}))$$

(8)

As A, Σ_s are not known a priori, the term is not tractable in general. Therefore, we instead formulate a penalty term regularizing the variational posterior toward a standard normal distribution with a small coefficient α (0.01 in this study). This coincides with (8), up to the factor α, when A = 0, Σ_s = I. Such a regularization with a small weight is also applied in state-of-the-art vision models such as latent diffusion⁷¹.

$$ \alpha {{\mathbb{E}}}_{q({z}^{\pi }| x,b,G)}{\sum}_{i}{D}_{{{{\rm{KL}}}}}({{{\mathcal{N}}}}({\psi }_{1}({\phi }_{1}({x}^{N(i)},{b}^{N(i)})),{\psi }_{2}({\phi }_{2}({x}^{N(i)},{b}^{N(i)})))| | {{{\mathcal{N}}}}(0,I))\\ =\alpha {\sum}_{i}{D}_{{{{\rm{KL}}}}}({{{\mathcal{N}}}}({\psi }_{1}({\phi }_{1}({x}^{N(i)},{b}^{N(i)})),{\psi }_{2}({\phi }_{2}({x}^{N(i)},{b}^{N(i)})))| | {{{\mathcal{N}}}}(0,I))$$

(9)

In summary, we have derived the (effective) ELBO of the SIMVI model, which can be indeed decomposed for each cell i, after dropping the term associated with U_G:

$$\,{\mbox{ELBO for cell}}\,i= {{\mathbb{E}}}_{q({z}^{\pi },s,l| x,b,G)}\log p({x}^{i}| {z}^{\pi (i)},{s}^{i},{l}^{i},{b}^{i},G)-{D}_{{{{\rm{KL}}}}}(q({z}^{\pi i}| {x}^{i},{b}^{i},G)| | {{{\mathcal{N}}}}(0,I))\\ -\alpha {D}_{{{{\rm{KL}}}}}(q({s}^{i}| {x}^{N(i)},{b}^{N(i)},G)| | {{{\mathcal{N}}}}(0,I))-{D}_{{{{\rm{KL}}}}}(q({l}^{i}| {x}^{i},{b}^{i})| | p({l}^{i}| {b}^{i}))$$

(10)
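The two Gaussian KL penalties in (10) have a standard closed form. The sketch below evaluates them under the assumption that the variational posteriors use diagonal covariances, so that ϕ₂ and ψ₂ output per-dimension variances; the function names are hypothetical and the reconstruction and library-size terms are omitted.

```python
import numpy as np

def kl_diag_gaussian_to_standard(mu, var):
    """KL( N(mu, diag(var)) || N(0, I) ), summed over the last axis."""
    return 0.5 * np.sum(mu**2 + var - np.log(var) - 1.0, axis=-1)

def kl_penalties(mu_z, var_z, mu_s, var_s, alpha=0.01):
    """KL term for z plus the alpha-weighted KL term for s, per cell."""
    kl_z = kl_diag_gaussian_to_standard(mu_z, var_z)
    kl_s = kl_diag_gaussian_to_standard(mu_s, var_s)
    return kl_z + alpha * kl_s
```

With α = 0.01, the spatial latent s is only weakly pulled toward the standard normal, whereas z receives the full-strength penalty.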

Optimizing the ELBO alone is not enough to achieve disentanglement. In particular, the inferred intrinsic variation may encompass all information in the dataset if no additional constraints are specified. Our theoretical analysis shows that model identifiability can indeed be achieved when the intrinsic variation z encodes minimal information (Supplementary Note 1). In order to incorporate this into the SIMVI objective, we utilize an independence regularization term between s and z, and only regularize z by the term. We implement two versions of the regularization, utilizing closed-form mutual information and kernel maximum mean discrepancy (MMD) respectively. The expressions for both are introduced as follows.

1. Closed-form mutual information. Assuming [z, s] follows a joint Gaussian distribution, the mutual information term can be derived analytically:
$$I(z,s)=\frac{1}{2}\log \frac{\det {\hat{\Sigma }}_{s}\det {\hat{\Sigma }}_{z}}{\det {\hat{\Sigma }}_{[z,s]}},$$
(11)
where ${\hat{\Sigma }}_{z},{\hat{\Sigma }}_{s},{\hat{\Sigma }}_{[z,s]}$ denote the sample covariance matrices for z, s, [z, s] respectively. If [z, s] follows a joint Gaussian distribution, then this term can be viewed as a closed-form estimator of mutual information. Even when [z, s] does not follow a joint Gaussian distribution, it still serves as a meaningful measure of the dependency between z and s.
2. Kernel MMD based independence regularization. We consider the regularization of the following form:
$$I(z,s)=\,{{\rm{MMD}}}\,(p([z,s]),\,{{{\mathcal{N}}}}(0,I)).$$
(12)
where the function MMD is defined as:
$$\,{{{\rm{MMD}}}}\,(p(x)| | q(x))= {{\mathbb{E}}}_{p(x),p({x}^{{\prime} })}k(x,{x}^{{\prime} })+{{\mathbb{E}}}_{q(x),q({x}^{{\prime} })}k(x,{x}^{{\prime} })\\ -2{{\mathbb{E}}}_{p(x),q({x}^{{\prime} })}k(x,{x}^{{\prime} }).$$
(13)
Here k indicates Gaussian kernel function with the denominator equal to the dimension number, consistent with the setting in InfoVAE⁷². The MMD based independence regularization jointly enforces Gaussianity and the independence between z and s.
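Both regularizers can be written compactly. The following NumPy sketch is illustrative rather than the SIMVI code: the function names are hypothetical, the MMD uses the simple biased (V-statistic) estimator, and the kernel is assumed to be k(x, x′) = exp(−‖x − x′‖² / d) with d the dimension, which is one reading of "denominator equal to the dimension number".

```python
import numpy as np

def _logdet_cov(a):
    """Log-determinant of the sample covariance of the rows of a."""
    cov = np.atleast_2d(np.cov(a, rowvar=False))
    _, logdet = np.linalg.slogdet(cov)
    return logdet

def gaussian_mutual_information(z, s):
    """Eq. (11): 0.5 * log(det S_s * det S_z / det S_[z,s])."""
    joint = np.hstack([z, s])
    return 0.5 * (_logdet_cov(z) + _logdet_cov(s) - _logdet_cov(joint))

def gaussian_kernel(x, y):
    """Assumed kernel: exp(-||x - y||^2 / d), d = number of dimensions."""
    d = x.shape[1]
    sq = np.sum((x[:, None, :] - y[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq / d)

def mmd(x, y):
    """Eq. (13), estimated with sample means over all pairs (biased)."""
    return (gaussian_kernel(x, x).mean()
            + gaussian_kernel(y, y).mean()
            - 2.0 * gaussian_kernel(x, y).mean())

def mmd_independence_penalty(z, s, rng):
    """Eq. (12): MMD between samples of [z, s] and draws from N(0, I)."""
    joint = np.hstack([z, s])
    reference = rng.normal(size=joint.shape)
    return mmd(joint, reference)
```

The mutual information estimate is near zero for independent blocks and grows as s becomes predictable from z; the MMD penalty is near zero only when [z, s] is jointly close to a standard normal.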

To enforce meaningful disentanglement, we only apply the regularization term on z. The resulting optimization problem is formulated as follows:

$$ \,{{{\rm{Loss}}}}\, {{{\rm{for}}}}\,({\phi }_{1},{\phi }_{2})=-\,{{\rm{ELBO}}}\,+\beta I(z,s),\\ \, {{{\rm{Loss}}}}\, {{{\rm{for}}}}\, ({\psi }_{1},{\psi }_{2})\, {{{\rm{and}}}} \, {{{\rm{for}}}}\, {{{\rm{other}}}} \, {{{\rm{parameters}}}}\,=-\,{{\rm{ELBO}}}.\,$$

(14)

We conclude by noting that the train / validation set construction and the mini-batch setting for the SIMVI model differ from those of most scVI-based models^73,74. This arises from the fact that a general connected graph does not admit a natural partitioning scheme. In this work, in order to define train and validation sets or mini-batches, we adopt the semi-supervised node classification framework for graph neural networks⁷⁵. Specifically, the full dataset is fed into the SIMVI model to compute the intermediate cell-level outputs (embeddings) for each cell. The cell-level outputs are then divided into training and validation sets, with each further segmented into mini-batches. During model training, the loss function is computed using only the embedding outputs from cells in the training set.

Permutation-based pretraining

SIMVI also provides an option to employ the denoising autoencoding scheme⁷⁶ as a pretraining step. Specifically, we first sample a subset of genes in each training batch, and then permute the gene values across cells. This step adds noise in the training data while still preserving the marginal distribution of genes. Then the model is trained to reconstruct the original data from this noisy version of data.
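The corruption step can be sketched as follows. This is a hypothetical illustration, not the SIMVI code: the function name and the `frac` parameter (fraction of genes to corrupt) are our own, and we assume each selected gene is permuted across cells independently.

```python
import numpy as np

def permute_gene_subset(x, frac, rng):
    """Return a noisy copy of x (cells x genes) in which a random subset of
    genes has its values shuffled across cells, plus the chosen gene indices."""
    n_cells, n_genes = x.shape
    n_perm = int(round(frac * n_genes))
    genes = rng.choice(n_genes, size=n_perm, replace=False)
    noisy = x.copy()
    for g in genes:
        # Shuffling within a column keeps that gene's marginal distribution.
        noisy[:, g] = rng.permutation(x[:, g])
    return noisy, genes
```

The model would then be trained to reconstruct `x` from `noisy`; the permuted genes carry no cell-specific signal, so they must be predicted from the remaining genes.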

Spatial effect identification with SIMVI

The spatial effect, i.e., the gene expression changes due to spatial microenvironment, is usually determined by both intrinsic and spatial variations of a cell. We propose to estimate the spatial effect via continuous treatment effect estimation framework in causal inference⁴³. The methodology we employ (archetype transformation, and partial regression) is also described in Supplementary Note 2 with an emphasis on the mathematical foundation. A related task to ours is the treatment effect estimation problem for single-cell RNA-seq data^77,78; however, a key distinction in our case is that the “treatment”, namely SIMVI spatial variation, is inferred from the model.

We first use archetypal analysis⁴⁷ to transform the SIMVI spatial variation s into archetype weight representation ${s}^{{\prime} }$ using the py_PCHA package (v1.0.3)⁴⁷. The number of archetypes is customized across datasets in our study. We set δ = 0.1 consistent with the PCHA usage example to increase the tolerance for archetypes outside data support. We further adjust the default setting to reduce the number of iterations (conv_crit=1E-5, maxiter=200 compared with the default conv_crit=1E-6, maxiter=500). All other parameters are selected as PCHA defaults.

Then we use linear regression to fit the two models and obtain residuals $\bar{Y},{\bar{s}}^{{\prime} }$:

$$Y \sim \hat{Y}(z),\,{s}^{{\prime} } \sim {\hat{s}}^{{\prime} }(z);\quad \bar{Y}=Y-\hat{Y}(z),\,{\bar{s}}^{{\prime} }={s}^{{\prime} }-{\hat{s}}^{{\prime} }(z).$$

(15)

Here Y represents the log-normalized expression of a gene (or any compatible term of interest); z represents the covariate, which is the SIMVI intrinsic variation, optionally concatenated with other covariates; and ${s}^{{\prime} }$ represents the transformed archetypal weight vector. After obtaining the residuals $\bar{Y},{\bar{s}}^{{\prime} }$, we fit a linear regression model

$$\bar{Y} \sim {\sum}_{j}{\bar{s}}_{j}^{{\prime} }z{\beta }_{j}+{c}_{j}{\bar{s}}_{j}^{{\prime} }.$$

(16)

After the model parameters are obtained, the spatial effect can be obtained as:

$${\sum}_{j}{\bar{s}}_{j}^{{\prime} }z{\beta }_{j}+{c}_{j}{\bar{s}}_{j}^{{\prime} }.$$

(17)

The coefficient of determination, R², for intrinsic variation is represented as ${R}^{2}(Y,\hat{Y}(z))$, while the R² for spatial variation is denoted by ${R}^{2}(\bar{Y},{\sum }_{j}{\bar{s}}_{j}^{{\prime} }z{\beta }_{j}+{c}_{j}{\bar{s}}_{j}^{{\prime} })$. All linear regression models in our study were implemented using sklearn.linear_model.LinearRegression (scikit-learn v1.2.2).
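The partial regression in (15)–(17) can be sketched for a single gene as follows. This is a minimal illustration that substitutes NumPy least squares for sklearn.linear_model.LinearRegression; the function names are hypothetical, and we assume an intercept in the first-stage fits and none in the second stage, since (16) has no constant term.

```python
import numpy as np

def ols_predict(features, target, intercept=True):
    """Ordinary least squares fit; returns fitted values for target."""
    design = features
    if intercept:
        design = np.column_stack([np.ones(len(features)), features])
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return design @ coef

def r2_score(y, y_hat):
    """Coefficient of determination with a mean-centered total sum of squares."""
    return 1.0 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)

def spatial_effect(y, z, s_prime):
    """y: (n,) gene expression; z: (n, d1) covariates; s_prime: (n, K) weights."""
    # Eq. (15): residualize both the outcome and the "treatment" on z.
    y_hat = ols_predict(z, y)
    y_bar = y - y_hat
    s_bar = s_prime - ols_predict(z, s_prime)
    # Eq. (16): interaction terms s_bar_j * z and main terms s_bar_j.
    interactions = (s_bar[:, :, None] * z[:, None, :]).reshape(len(y), -1)
    features = np.column_stack([interactions, s_bar])
    # Eq. (17): the fitted values are the estimated spatial effect.
    effect = ols_predict(features, y_bar, intercept=False)
    return effect, r2_score(y, y_hat), r2_score(y_bar, effect)
```

Archetype weights sum to one per cell, so the residualized columns are linearly dependent; `lstsq` returns the minimum-norm solution, which leaves the fitted values unchanged.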

We further developed an “individual mode” to estimate the spatial effect for each archetype. In this case, we have one regression model for each archetype on the residual $\bar{Y}$:

$$\bar{Y} \sim {\bar{s}}_{j}^{{\prime} }z{\beta }_{j}+{c}_{j}{\bar{s}}_{j}^{{\prime} }$$

(18)

After fitting all regression models, the spatial effect is still obtained as the sum ${\sum }_{j}{\bar{s}}_{j}^{{\prime} }z{\beta }_{j}+{c}_{j}{\bar{s}}_{j}^{{\prime} }$, whereas the R² for each gene is defined as the max R² among all archetypes. Our analysis revealed high concordance between spatial effects generated by the original and individual modes (Supplementary Fig. 11c). Therefore we used the individual mode throughout the study, leveraging its enhanced capacity to estimate spatial effects for each archetype. To highlight the distinction between intrinsic variable genes and spatial-induced genes in the R² scatter plot, we labeled genes with intrinsic variation R² larger than a threshold as “Intrinsic-specific”. We further labeled outlier genes from Huber regression (scaled residual larger than a threshold) as “Spatial-induced”.
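A sketch of the individual mode, under the same assumptions as before (NumPy least squares in place of scikit-learn, no intercept in the residual-stage fits, hypothetical function names), takes the residuals $\bar{Y}$ and ${\bar{s}}^{{\prime} }$ as inputs and fits one model per archetype:

```python
import numpy as np

def fit_no_intercept(features, target):
    """Least-squares fitted values without a constant term."""
    coef, *_ = np.linalg.lstsq(features, target, rcond=None)
    return features @ coef

def r2_score(y, y_hat):
    return 1.0 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)

def individual_mode(y_bar, z, s_bar):
    """Fit Eq. (18) separately for each archetype j.

    Returns the summed spatial effect (n,) and the per-archetype R^2 (K,);
    the gene-level R^2 is then the maximum of the latter.
    """
    n, n_arch = s_bar.shape
    total = np.zeros(n)
    scores = np.empty(n_arch)
    for j in range(n_arch):
        # Features for archetype j: interaction with z and the main term.
        feats = np.column_stack([s_bar[:, [j]] * z, s_bar[:, j]])
        pred = fit_no_intercept(feats, y_bar)
        total += pred
        scores[j] = r2_score(y_bar, pred)
    return total, scores
```

Keeping the per-archetype predictions separate is what allows the spatial effect to be attributed to an individual archetype.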

Finally, we provide an option to derive label-based positivity scores to evaluate the feasibility of spatial effect estimation for each cell. A mathematical formulation is provided in Supplementary Note 2. The cell-level positivity index involves the selection of two thresholds: thres1, which is used in filtering out archetypes, and thres2, which is applied in filtering out individual cells. An evaluation on the real Slide-seqV2 dataset suggests that the spatial regions violating positivity (archetypes with high positivity index) remain stable with respect to the clustering resolution (Supplementary Fig. 14c).

MERFISH human cortex dataset

We downloaded the data from https://datadryad.org/stash/dataset/doi:10.5061/dryad.x3ffbg7mw. We used the following samples from the dataset: MTG sample: H18.06.006.MTG.4000 rep1-3, with 11059 cells in total and 4000 genes; STG sample: H19.30.001.STG.4000 rep1-3, with 14924 cells in total and 4000 genes. The cell type label was provided in the dataset named cluster_L1. We annotated the layers by manual segmentation, referring to the cell subtype annotations provided in the dataset (cluster_L2), which additionally labeled layers of excitatory neurons and subtypes of inhibitory neurons. We further derived a local niche annotation based on MYH11 expression. In the MTG dataset, we labeled cells with log normalized MYH11 expression ≥1 as “MYH11+”, and the remaining as “MYH11-”. In the STG dataset, due to the overall higher sparsity, we labeled all cells expressing MYH11 as “MYH11+”, and the remaining as “MYH11-”. We conducted a thorough parameter sweep using MTG and STG samples to better understand the SIMVI performance under different model configurations, with details available in Supplementary Note 3.

We included a variety of methods in the benchmarking apart from SIMVI, including Harmony (harmonypy v0.0.9), scVI (scvi-tools v0.16.1), MEFISTO (mofapy2 v0.2.0), NSF (Commit id: 2c656a72b3bba6f2c7af1ec121a832011a541d75), SpiceMix (Commit id: aea69f8081f33e8ca64b734c680804db907ed4eb), GraphST (v1.1.1), STAGATE (v1.0.1), CellCharter (Implemented using scvi-tools v0.16.1)^{21,22,23,24,26,41,46}, and a list of baselines that add a batch integration step for methods that do not address batch correction (GraphST + Harmony, STAGATE + Harmony, SpiceMix + Harmony). We further tested batch-correcting scVI-backbone models that do not incorporate intrinsic variation but include covariate labels instead (Graph Only; Graph Only + cell type 1; Graph Only + cell type 2). We tested all methods using the same set of 10 random seeds. For all methods that transform spatial information into graphs (SIMVI, SpiceMix, GraphST, STAGATE, CellCharter, GraphST + Harmony, STAGATE + Harmony, SpiceMix + Harmony, Graph Only, Graph Only + cell type 1, Graph Only + cell type 2), we used a k-NN graph encoding spatial proximity with k=10. For the remaining methods that require spatial locations (MEFISTO, NSF), we combined the datasets and positioned different samples at sufficiently distant locations. For scVI-based approaches (SIMVI, scVI, CellCharter, Graph Only, Graph Only + cell type 1, Graph Only + cell type 2) that involve mini-batch training, we used the same number of training epochs (100) and the same batch size (500). We used the default set of parameters of SIMVI (Supplementary Note 3), with 25 pretraining epochs. We used default settings for Harmony and applied it on the PCA embedding of log normalized expression of all 4000 genes. For scVI, we used a compatible setting with the SIMVI setting that is also similar to scVI default, with latent space dim = 20, number of encoder and decoder layers = 2.
For methods that involve normalization (MEFISTO, NSF, GraphST, STAGATE, SpiceMix), we used log normalized counts by Scanpy (v1.9.3) default. For MEFISTO and NSF, we used 100 and 500 inducing points respectively (Gaussian likelihood for NSF), with the spatial variation dim = 10. We observed that the version of NSF we used that outputs both spatial and non-spatial variations (NSFH) outputs NaN for the non-spatial part. Therefore, only the spatial part is used in our benchmarking. We used default parameter settings for GraphST and STAGATE. For SpiceMix, we used k-means initialization, and a compatible parameter setting with SIMVI, namely K = num_pcs = 20. For CellCharter, we leveraged the trained scVI embeddings and used the first order neighborhood and set the aggregation function as mean aggregation. For (GraphST + Harmony, STAGATE + Harmony, SpiceMix + Harmony), we utilized the formerly trained embeddings from GraphST, STAGATE, and SpiceMix as inputs for Harmony, applying a fixed random seed for Harmony. For (Graph Only, Graph Only + cell type 1, and Graph Only + cell type 2), latent space dim = 20, and all remaining applicable parameter settings were consistent with those of SIMVI.

We defined a list of benchmarking scores for evaluating performance across different methods. Each score is an ensemble of batch correction or biological preservation metrics implemented in scIB⁷⁹. Specifically, we used graph connectivity, silhouette batch / label, and kBET (k-nearest-neighbor batch-effect test) as batch correction metrics, and Leiden NMI (normalized mutual information), Leiden ARI (adjusted Rand index), and Silhouette label as biological preservation metrics. All the metrics were implemented by the scib-metrics Python package (v0.3.3)⁷⁹. Batch correction metrics require both batch and biological annotation labels as inputs, and were applied to datasets containing multiple batches. In contrast, biological preservation metrics only require biological annotation labels, and were applied to both single-batch and multi-batch datasets.

We defined three aggregation rules for summarizing metrics as final scores. Scores without auxiliary labels are averages of metrics without rescaling. Scores labeled with * are obtained by averaging metrics and then rescaling the resulting score to have min 0 and max 1. Scores labeled with † are averages of metrics that were each rescaled to have min 0 and max 1. Using the aggregation rules, we derived four scores evaluating batch correction, cell type preservation, layer preservation, and local structure (MYH11+/− label) preservation respectively. These scores are further summarized as total scores by the described aggregation rules. For SIMVI, we assessed batch correction and cell type preservation using the intrinsic variation, while layer and niche preservation performances were evaluated based on the spatial variation.
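The three aggregation rules can be illustrated with a small NumPy sketch. The function names and rule labels are hypothetical, and we assume that the min-max rescaling is taken across the compared methods, which the text does not state explicitly.

```python
import numpy as np

def minmax(values, axis=0):
    """Rescale to [0, 1] along axis; constant inputs map to 0."""
    lo = values.min(axis=axis, keepdims=True)
    span = values.max(axis=axis, keepdims=True) - lo
    safe = np.where(span > 0, span, 1.0)
    return np.where(span > 0, (values - lo) / safe, 0.0)

def aggregate(metrics, rule="plain"):
    """metrics: (n_methods, n_metrics) array of metric values.

    "plain"  : average of metrics, no rescaling (unlabeled scores)
    "star"   : average first, then rescale across methods (scores labeled *)
    "dagger" : rescale each metric across methods, then average (labeled †)
    """
    if rule == "plain":
        return metrics.mean(axis=1)
    if rule == "star":
        return minmax(metrics.mean(axis=1))
    if rule == "dagger":
        return minmax(metrics, axis=0).mean(axis=1)
    raise ValueError(f"unknown rule: {rule}")
```

The † rule gives every metric equal influence regardless of its native range, whereas the * rule preserves the relative weighting implied by the raw metric values.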

We performed the archetype transformation on the dataset with number of archetypes = 7. Then we computed the spatial effect and associated R²s as previously described, with covariates as the concatenation of SIMVI intrinsic variation and one-hot labeling of the cell subtype annotation. Genes with log normalized mean expression larger than 0.1 (1561 genes in total) were used in the analysis. The spatial effect of astrocytes from replicate 1 was binned into 25 segments based on spatial Y coordinates and subsequently clustered using k-means (k = 8) to generate gene clusters 0–7. For baseline models, normalized counts (NC) represents log normalized original expression, and scVI / NCEM NC represent the normalized expression returned by either a scVI or a Graph Only + cell type model as previously described. We implemented two additional linear regression baseline models: one regressing log-normalized expression against the concatenation of cell subtype label and SIMVI spatial variation, and another regressing log-normalized expression against the concatenation of cell subtype and archetype weights of SIMVI spatial variation. We computed Spearman correlation and Kendall’s tau between spatial coordinates and each gene from different models’ output (without binning) of replicate 1 astrocytes. We computed the enrichment of the Wikipathway 2023 Human gene set using EnrichR wrapped by GSEAPY (v0.10.8)⁸⁰,⁸¹.
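As an illustration of the binning step, the sketch below averages one gene's spatial effect within segments along the Y coordinate. It assumes equal-width segments, whereas the text only states that 25 segments were used; the function name `bin_means_by_coordinate` is hypothetical. Applying it gene by gene would give a genes × 25 profile matrix that could then be passed to k-means (k = 8).

```python
def bin_means_by_coordinate(y, values, n_bins=25):
    """Average `values` within `n_bins` equal-width bins of coordinate `y`.

    y, values: equal-length sequences (one entry per cell).
    Returns a list of length n_bins; empty bins are reported as None.
    Assumption: bins are equal-width between min(y) and max(y).
    """
    lo, hi = min(y), max(y)
    width = (hi - lo) / n_bins
    if width == 0:
        raise ValueError("all coordinates are identical")
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for yi, vi in zip(y, values):
        # the cell at the maximum coordinate is assigned to the last bin
        idx = min(int((yi - lo) / width), n_bins - 1)
        sums[idx] += vi
        counts[idx] += 1
    return [s / c if c else None for s, c in zip(sums, counts)]
```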

Slide-seqV2 mouse hippocampus dataset

We acquired the annotated dataset in AnnData format from Squidpy⁸². As the dataset does not provide raw counts, we additionally accessed the raw count matrix from the Broad single-cell portal. An anatomic map from the Allen Mouse Brain Atlas⁸³ (P56, Coronal, Image 74, plate = 100960228) was downloaded and shown in Fig. 2e. In the analyses from the original work⁸, it has been noted that not all pixels correspond to individual cells. Instead, they can represent subcellular structures such as dendrites⁸. As a result, we applied a strict filtering scheme by selecting the top 2000 highly variable genes (via Scanpy default) and retaining cells with at least 20 highly variable genes. Our preprocessing scheme yielded 32838 cells and 2000 genes. As the dataset only contains one batch, a smaller list of methods was included for comparison (scVI, MEFISTO, NSF, SpiceMix, GraphST, STAGATE, CellCharter, Graph Only, Graph Only + dominant cell type label (NCEM)²¹,²²,²³,²⁴,²⁶,³⁴,⁴¹). We note that, due to the absence of batch labels, the Graph Only model here does not have condition variables, and the Graph Only + cell type model uses the dominant cell type label from prior deconvolution. MEFISTO was not included due to its extensive time and memory usage. Due to the larger dataset size, we chose a larger batch size (2000) for scVI-based approaches. Other details of implementation were the same as those adopted in the MERFISH benchmarking.
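The cell-filtering criterion can be sketched as follows. The sketch assumes that "at least 20 highly variable genes" means at least 20 highly variable genes with a nonzero count in the cell; the function name and arguments are hypothetical and this is not the authors' preprocessing code.

```python
def keep_cells_by_detected_genes(counts, hvg_idx, min_genes=20):
    """Return indices of cells with at least `min_genes` detected HVGs.

    counts:  list of per-cell count rows (cells x genes).
    hvg_idx: column indices of the selected highly variable genes.
    Assumption: a gene is "detected" in a cell when its count is > 0.
    """
    return [
        i
        for i, row in enumerate(counts)
        if sum(1 for j in hvg_idx if row[j] > 0) >= min_genes
    ]
```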

We visualized the Leiden clustering results of SIMVI and two other top performers (GraphST, CellCharter). The clustering resolutions were manually picked for best consistency with the Allen Mouse Brain Atlas annotation. Based on the SIMVI clustering results, we defined two scores. One is the niche preservation score that accounts for all clusters shown in the SIMVI panel of Supplementary Fig. 8e (All niche score); the other is based on the subset of SIMVI clusters that show clear spatial structures and align with the Allen Mouse Brain Atlas annotation (clusters 7–14, Supplementary Fig. 8e), which we term the Selected niche score. The scores were defined as aggregations of biological preservation metrics using cluster labels. To avoid confirmation bias, we excluded the SIMVI run used to generate the clustering results from the benchmarking.

We further used the existing deconvolution results provided in Squidpy to derive an annotation for CA/DG regions and their neighborhoods and compared with the clustering results. Specifically, for each cell, we computed a score by summing its own deconvolution ratios (CA / DG) with 1.5 times the average of its 10 nearest neighbors. We then applied thresholds to the scores, resulting in spatially smooth structures corresponding to CA and DG regions (Supplementary Fig. 8b, c). We further calculated the relative distance between CA, DG and their neighborhood by Silhouette label score. For spatial effect analysis, we used the same parameter setting as those adopted in MERFISH data analysis (with the exception of archetype number = 20), with covariates as the concatenation of SIMVI intrinsic variation and the prior deconvolution results. We further calculated the binary positivity index with thres1 = 0.95, thres2 = 0.5.
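The neighborhood-smoothed score used to annotate CA / DG regions can be sketched as below. This is a brute-force illustration, assuming Euclidean distance and that a cell is not counted among its own neighbors; the function name `neighborhood_score` is hypothetical, and the thresholds applied afterwards are not given in the text, so none are shown.

```python
import math


def neighborhood_score(coords, ratio, k=10, weight=1.5):
    """score_i = ratio_i + weight * mean(ratio_j over k nearest neighbors of i).

    coords: list of (x, y) positions, one per cell.
    ratio:  deconvolution ratio (e.g. CA or DG) per cell.
    Assumptions: Euclidean distance; the cell itself is excluded
    from its own neighbor set.
    """
    n = len(coords)
    k = min(k, n - 1)
    scores = []
    for i in range(n):
        # brute-force search, O(n^2 log n); fine for a small illustration
        order = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: math.dist(coords[i], coords[j]),
        )
        neighbors = order[:k]
        neighbor_mean = sum(ratio[j] for j in neighbors) / k if k else 0.0
        scores.append(ratio[i] + weight * neighbor_mean)
    return scores
```

For datasets of the size described here (tens of thousands of cells), a spatial index such as a k-d tree would replace the brute-force search.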

Slide-tags human tonsil dataset

We downloaded the dataset along with metadata from Broad Institute Single Cell Portal under the accession number SCP2169¹⁰. The data consists of 5778 cells. We selected the top 4000 highly variable genes from the dataset by Scanpy default. We applied the same set of methods as those in the Slide-seqV2 section. The training settings of all methods were consistent with those used for the MERFISH dataset, except that the batch size for scVI-based methods (SIMVI, scVI, Graph Only, Graph Only + cell type) was set to 1000. We used the Leiden clustering result from SIMVI spatial variation to define the SIMVI niche shown in Fig. 4c. To define the germinal center B cell subset used in Fig. 4d, we first selected the cells that were labeled as germinal center B cells in the original annotation, and then filtered out possible doublet clusters that express CD247 and FDCSP respectively. We used the Leiden clustering result from SIMVI full variation to define the germinal center B cell phases. For the likelihood analysis in Supplementary Fig. 15d, we used MELD (v1.0.2)⁸⁴ to compute a graph filter that smooths a signal on a graph defined by embeddings. We used MELD to derive a continuous dark zone likelihood by applying the graph filter estimated from SIMVI spatial variation to the one-hot dark zone label. The parameter β was set to 80, and all other parameters were consistent with the MELD defaults.

To benchmark across different methods, we derived three scores evaluating the cell type preservation, niche preservation, and GC phase preservation (within the set of GC B cells) respectively. The definitions of these scores are consistent with the descriptions in previous sections. To minimize the confirmation bias, we did not include the SIMVI run that was used to generate the annotation results.

Spatial multiome melanoma dataset

We downloaded the dataset along with metadata from Broad Institute Single Cell Portal under the accession number SCP2176¹⁰. The data consists of 2535 cells. We selected the top 2000 highly variable genes from the dataset by Scanpy default. We applied the same set of methods as in the Slide-seqV2 and Slide-tags human tonsil sections. Apart from disentangled SIMVI representations, we further incorporated the full SIMVI variation in the benchmarking. We used the Leiden clustering result from SIMVI spatial variation to define the SIMVI niche shown in Fig. 5c. We derived the tumor niche label and the microenvironment label (hypoxic or not for tumor 1) by manually segmenting the spatial region (Supplementary Fig. 16a). All implementation details were consistent with those in Slide-tags human tonsil data. To benchmark across different methods, we defined four scores evaluating the cell type preservation, macrophage state preservation (across tumor 1 and tumor 2 regions), hypoxia state preservation (within tumor 1), and SIMVI niche preservation (using the clustering labels derived from SIMVI) respectively. The definitions of these scores are consistent with the descriptions in previous sections. To minimize the confirmation bias, we did not include the SIMVI run used to generate the SIMVI niche annotation in the benchmark.

For the ATAC modality preprocessing, we performed TF-IDF transformation on the provided peak matrix¹⁰ and computed the LSI components. We observed that removing the first LSI component has almost no effect on the data embedding (Supplementary Fig. 17f). Therefore, we used all LSI components in subsequent analysis. We computed the spatial effect of the LSI components and log normalized gene expression (number of archetypes = 8 in both cases) with covariates as the concatenation of SIMVI intrinsic variation and one-hot labeling of the cell type annotation. States 1 and 2 were annotated by Leiden clustering of the ATAC spatial effect (Supplementary Fig. 17g). We performed differential analysis on the gene expression and ATAC peaks using sc.tl.rank_genes_groups with default settings. We next computed the Wikipathway enrichment for differential ATAC peaks (mapped to overlapping genes by the Ensembl REST API⁸⁵ and to gene transcription start sites by the main annotation file from GRCh38.p14⁸⁶) and differentially expressed genes using EnrichR wrapped by GSEAPY⁸⁰,⁸¹. Motifs were annotated with JASPAR 2020 using the chromVAR (v1.18.0) implementation⁸⁷,⁸⁸.
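For readers unfamiliar with the TF-IDF step, the sketch below shows one common variant applied to a small cell × peak count matrix. Several TF-IDF formulations are in use for ATAC data and the text does not say which one was applied, so the weighting here (per-cell term frequency times log(1 + N / document frequency)) is an assumption, as is the function name `tfidf`. The LSI components would then be obtained from a truncated singular value decomposition of the transformed matrix, which is not shown.

```python
import math


def tfidf(counts):
    """TF-IDF transform of a cells x peaks count matrix (list of rows).

    TF  = count / total counts in the cell.
    IDF = log(1 + n_cells / n_cells_with_peak)  (assumed variant).
    Peaks absent from every cell and cells with zero total get 0.
    """
    n_cells = len(counts)
    n_peaks = len(counts[0])
    # document frequency: number of cells in which each peak is observed
    doc_freq = [sum(1 for row in counts if row[j] > 0) for j in range(n_peaks)]
    idf = [math.log(1 + n_cells / d) if d else 0.0 for d in doc_freq]
    transformed = []
    for row in counts:
        total = sum(row)
        transformed.append(
            [(c / total) * idf[j] if total else 0.0 for j, c in enumerate(row)]
        )
    return transformed
```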

CosMx melanoma dataset

Statistics and reproducibility

This study complies with all relevant ethical regulations and was approved by the Yale Human Investigation Committee protocol #9505008219, conducted in accordance with the Declaration of Helsinki. Informed consent was obtained from all patients for the use of their tissue specimens for research purposes. Tissue specimens from 60 melanoma patients treated with immune checkpoint inhibitors were placed in a tissue microarray (TMA) block. Representative tumor areas were identified by a board-certified pathologist and 0.6-mm cores from each tumor block were used. Patients were treated between 2011 and 2016. The data cutoff date was September 1, 2017 and the median follow-up time was 20.1 months. The cohort consisted of retrospectively serially identified patients without stratification or matching. Clinicopathologic information was collected from clinical records and pathology reports⁸⁹. Slices from the block were submitted to NanoString for CosMx spatial profiling. The SIMVI analysis was performed on 25 melanoma patient samples that were co-profiled on one CosMx slide. No data were excluded from the analyses. No statistical method was used to predetermine sample size. The investigators were not blinded to allocation during experiments and outcome assessment. The dataset used contains samples from 16 male patients and 9 female patients, ages ranging from 35-90. Our analysis focused on demonstrating SIMVI’s power on unsupervised biological discovery, therefore sex and gender analysis was not considered in our study. Among these patients, 11 patients received ipilimumab and nivolumab in combination (IPI+NIVO), 13 received pembrolizumab (PEMBRO), and one patient received nivolumab (NIVO) alone. CosMx Human Universal Cell Characterization RNA Panel was used as the SMI reagent. This panel included genes for cell typing and mapping (243 genes), cell state and function (269 genes), cell-cell interaction (435 genes), and hormone activities (46 genes). 
The data were divided into 11 categories of non-tumor cells and six subclasses of tumor cells by NanoString. The preprocessed dataset consisted of 56,761 cells and 960 genes.

Data analysis

The dataset is composed of only one batch but contains samples from different patients. Therefore, we included a slightly different list of methods for benchmarking (Harmony, scVI, scVI using patient label, GraphST, STAGATE, Graph Only, Graph Only + cell type, Graph Only + patient ID, SIMVI). NSF and SpiceMix were not included due to their extensive time and memory usage, and because previous benchmarking results showed that they prioritize intrinsic information and do not have a notable advantage over scVI. Apart from disentangled SIMVI representations, we further incorporated the full SIMVI variation in the benchmarking. We trained the SIMVI model with default settings and pretraining epoch number = 75. Other training settings of all methods were consistent with those used for the MERFISH dataset, except that the batch size for scVI-based methods (SIMVI, scVI, Graph Only, Graph Only + cell type) was set to 5000. We defined the cell type score as the scIB total score for batch correction (in this case patient label) and biological preservation for cell type, both defined as described in previous sections. We also defined a patient response score to evaluate whether an embedding of the CD8 T cell subset preserves the patient response label and maximally integrates different patients. We used only CD8 T cells here to remove the effect of cell composition differences across patient outcomes (Supplementary Fig. 19b), and because of the direct relevance of CD8 T cells to immunotherapy treatment. We excluded NMI and ARI in calculating the response score as these two metrics are cluster-based and prefer embeddings that separate each patient.

We computed the spatial effect for all cells (number of archetypes = 20) and calculated the continuous positivity index with thres1 = 0.9 (Supplementary Fig. 19i,j). We extracted the macrophage subset for analyses in Fig. 6e-g. For the ligand-receptor analysis, we used CellTalkDB⁶⁹ to extract the ligand-receptor pairs. For each ligand-receptor pair, we computed the sum of the average neighborhood normalized ligand expression (k = 10) and the central cell normalized receptor expression as the ligand-receptor strength for each cell. We further computed the Spearman correlation map between the ligand-receptor strength and gene spatial effect within non-tumor cells. Rows and columns with a maximum absolute value above 0.25 are preserved in Fig. 6h. We used seaborn.clustermap (v0.12.2) via the Scanpy wrapper to perform hierarchical clustering on correlation maps, with method=‘centroid’ and all other settings at their defaults. The correlation map was further segregated into different regions (Supplementary Fig. 21e). We computed the KEGG enrichment for ligand-receptor pairs and genes in different regions of the correlation map using EnrichR wrapped by GSEAPY (v0.10.8)⁸⁰,⁸¹.
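The ligand-receptor strength and the correlation-map filtering can be sketched as follows. The sketch assumes that the k nearest neighbors of each cell have already been found and are supplied as index lists, and that rows and columns are filtered independently on the full correlation map; both function names are hypothetical and this is not the SIMVI implementation.

```python
def lr_strength(ligand, receptor, neighbors):
    """Per-cell ligand-receptor strength.

    strength_i = mean(ligand_j for j in neighbors[i]) + receptor_i

    ligand, receptor: normalized expression values, one per cell.
    neighbors: neighbors[i] lists the indices of cell i's nearest
               neighbors (k = 10 in the text); assumed precomputed.
    """
    return [
        sum(ligand[j] for j in nbrs) / len(nbrs) + receptor[i]
        for i, nbrs in enumerate(neighbors)
    ]


def filter_correlation_map(corr, threshold=0.25):
    """Keep rows and columns whose maximum absolute value exceeds threshold.

    corr: list of rows (e.g. ligand-receptor pairs x genes).
    Returns (kept_row_indices, kept_col_indices, submatrix).
    Assumption: rows and columns are each tested on the full matrix.
    """
    rows = [i for i, r in enumerate(corr) if max(abs(v) for v in r) > threshold]
    cols = [
        j
        for j in range(len(corr[0]))
        if max(abs(r[j]) for r in corr) > threshold
    ]
    sub = [[corr[i][j] for j in cols] for i in rows]
    return rows, cols, sub
```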

Empirical running time

All SIMVI experiments were conducted on a high performance Linux server using one NVIDIA A6000 GPU. The most demanding (in both time and memory) training session we ran (on the CosMx melanoma dataset with 56,761 cells and 960 genes) completed 100 training epochs (75 of which are pretraining epochs) with batch size 5000 in approximately 15 minutes. We also note that the permutation epoch typically requires more time due to its permutation sampling scheme.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Data availability

Our newly-generated CosMx melanoma dataset is available at Zenodo⁹⁰ (https://doi.org/10.5281/zenodo.14708000). Other analyzed datasets from previous publications are publicly available: MERFISH MTG/STG (Dryad), Slide-seqV2 mouse hippocampus (Broad single-cell portal SCP815), Slide-tags human tonsil (Broad single-cell portal SCP2169), and Slide-tags multiome melanoma (Broad single-cell portal SCP2176). Detailed preprocessing instructions are available in Methods. Source data are provided with this paper.

Code availability

We have made SIMVI available as a public open-source Python package, which can be accessed at https://github.com/KlugerLab/SIMVI. The code used to generate the results is available at https://github.com/MingzeDong/SIMVI_reproducibility⁹⁰.

References

Lohoff, T. et al. Integration of spatial and single-cell transcriptomic data elucidates mouse organogenesis. Nat. Biotechnol. 40, 74–85 (2022).
Zhang, M. et al. Spatially resolved cell atlas of the mouse primary motor cortex by merfish. Nature 598, 137–143 (2021).
Fang, R. et al. Conservation and divergence of cortical cell organization in human and mouse revealed by merfish. Science 377, 56–62 (2022).
He, S. et al. High-plex imaging of rna and proteins at subcellular resolution in fixed tissue by spatial molecular imaging. Nat. Biotechnol. 40, 1794–1806 (2022).
Liu, Y. et al. High-spatial-resolution multi-omics sequencing via deterministic barcoding in tissue. Cell 183, 1665–1681 (2020).
Zhang, D. et al. Spatial epigenome–transcriptome co-profiling of mammalian tissues. Nature 616, 113–122 (2023).
Bai, Z. et al. Spatially exploring rna biology in archival formalin-fixed paraffin-embedded tissues. Cell 187, 6760–6779 (2024).
Stickels, R. R. et al. Highly sensitive spatial transcriptomics at near-cellular resolution with slide-seqv2. Nat. Biotechnol. 39, 313–319 (2021).
Vickovic, S. et al. High-definition spatial transcriptomics for in situ tissue profiling. Nat. methods 16, 987–990 (2019).
Russell, A. J. et al. Slide-tags enables single-nucleus barcoding for multimodal spatial genomics. Nature 625, 101–109 (2024).
Chen, A. et al. Spatiotemporal transcriptomic atlas of mouse organogenesis using dna nanoball-patterned arrays. Cell 185, 1777–1792 (2022).
Wei, X. et al. Single-cell stereo-seq reveals induced progenitor cells involved in axolotl brain regeneration. Science 377, eabp9444 (2022).
Gilbert, S. F. Developmental biology. (sinauer associates, Inc, 2010).
Dries, R. et al. Giotto: a toolbox for integrative analysis and visualization of spatial expression data. Genome Biol. 22, 1–31 (2021).
Weber, L. M., Saha, A., Datta, A., Hansen, K. D. & Hicks, S. C. nnsvg for the scalable identification of spatially variable genes using nearest-neighbor gaussian processes. Nat. Commun. 14, 4059 (2023).
Hao, M., Hua, K. & Zhang, X. Somde: a scalable method for identifying spatially variable genes with self-organizing map. Bioinformatics 37, 4392–4398 (2021).
Sun, S., Zhu, J. & Zhou, X. Statistical analysis of spatial expression patterns for spatially resolved transcriptomic studies. Nat. methods 17, 193–200 (2020).
Zhu, J., Sun, S. & Zhou, X. Spark-x: non-parametric modeling enables scalable and robust detection of spatial expression patterns for large spatial transcriptomic studies. Genome Biol. 22, 1–25 (2021).
Svensson, V., Teichmann, S. A. & Stegle, O. Spatialde: identification of spatially variable genes. Nat. methods 15, 343–346 (2018).
Velten, B. et al. Identifying temporal and spatial patterns of variation from multimodal data using mefisto. Nat. methods 19, 179–186 (2022).
Townes, F. W. & Engelhardt, B. E. Nonnegative spatial factorization applied to spatial genomics. Nat. Methods 20, 229–238 (2023).
Chidester, B., Zhou, T., Alam, S. & Ma, J. Spicemix enables integrative single-cell spatial modeling of cell identity. Nat. Genet. 55, 78–88 (2023).
Dong, K. & Zhang, S. Deciphering spatial domains from spatially resolved transcriptomics with an adaptive graph attention auto-encoder. Nat. Commun. 13, 1739 (2022).
Long, Y. et al. Spatially informed clustering, integration, and deconvolution of spatial transcriptomics with graphst. Nat. Commun. 14, 1155 (2023).
Haviv, D. et al. The covariance environment defines cellular niches for spatial inference. Nat. Biotechnol. 43, 269–280 (2025).
Varrone, M., Tavernari, D., Santamaria-Martínez, A., Walsh, L. A. & Ciriello, G. Cellcharter reveals spatial cell niches associated with tissue remodeling and cell plasticity. Nat. Genet. 56, 74–84 (2024).
Tang, Z. et al. Search and match across spatial omics samples at single-cell resolution. Nat. Methods 21, 1818–1829 (2024).
Zhong, C., Ang, K. S. & Chen, J. Interpretable spatially aware dimension reduction of spatial transcriptomics with stamp. Nat. Methods 21, 2072–2083 (2024).
Singhal, V. et al. Banksy unifies cell typing and tissue domain segmentation for scalable spatial omics data analysis. Nat. Genet. 56, 431–441 (2024).
Long, Y. et al. Deciphering spatial domains from spatial multi-omics with spatialglue. Nat. Methods 21, 1658–1667 (2024).
Dong, M. & Kluger, Y. Geass: Neural causal feature selection for high-dimensional biological data. International Conference on Learning Representations (2023).
Wu, Z. et al. Graph deep learning for the characterization of tumour microenvironments from spatial protein profiles in tissue specimens. Nat. Biomed. Eng. 6, 1435–1448 (2022).
Shimonov, S. et al. Sorbet: Automated cell-neighborhood analysis of spatial transcriptomics or proteomics for interpretable sample classification via gnn. Preprint at https://doi.org/10.1101/2023.12.30.573739 (2024).
Fischer, D. S., Schaar, A. C. & Theis, F. J. Modeling intercellular communication in tissues using spatial graphs of cells. Nat. Biotechnol. 41, 332–336 (2023).
Arnol, D., Schapiro, D., Bodenmiller, B., Saez-Rodriguez, J. & Stegle, O. Modeling cell-cell interactions from spatial molecular data with spatial variance component analysis. Cell Rep. 29, 202–211 (2019).
Tanevski, J., Flores, R. O. R., Gabor, A., Schapiro, D. & Saez-Rodriguez, J. Explainable multiview framework for dissecting spatial relationships from highly multiplexed data. Genome Biol. 23, 97 (2022).
Hyvärinen, A. & Pajunen, P. Nonlinear independent component analysis: Existence and uniqueness results. Neural Netw. 12, 429–439 (1999).
Khemakhem, I., Kingma, D., Monti, R. & Hyvarinen, A. Variational autoencoders and nonlinear ica: A unifying framework. International Conference on Artificial Intelligence and Statistics 2207–2217 (2020).
Veličković, P. et al. Graph attention networks. International Conference on Learning Representations (2018).
Brody, S., Alon, U. & Yahav, E. How attentive are graph attention networks? International Conference on Learning Representations (2022).
Lopez, R., Regier, J., Cole, M. B., Jordan, M. I. & Yosef, N. Deep generative modeling for single-cell transcriptomics. Nat. methods 15, 1053–1058 (2018).
Chernozhukov, V. et al. Double/debiased machine learning for treatment and structural parameters. Econ. J. 21, C1–C68 (2018).
Yao, L. et al. A survey on causal inference. ACM Trans. Knowl. Discov. Data (TKDD) 15, 1–46 (2021).
Imbens, G. W. & Rubin, D. B. Causal inference in statistics, social, and biomedical sciences (Cambridge university press, 2015).
Zhao, P., Zhu, J., Ma, Y. & Zhou, X. Modeling zero inflation is not necessary for spatial transcriptomics. Genome Biol. 23, 118 (2022).
Korsunsky, I. et al. Fast, sensitive and accurate integration of single-cell data with harmony. Nat. methods 16, 1289–1296 (2019).
Mørup, M. & Hansen, L. K. Archetypal analysis for machine learning and data mining. Neurocomputing 80, 54–63 (2012).
Clavreul, S., Dumas, L. & Loulier, K. Astrocyte development in the cerebral cortex: Complexity of their origin, genesis, and maturation. Front. Neurosci. 16, 916055 (2022).
Hevner, R. F. Layer-specific markers as probes for neuron type identity in human neocortex and malformations of cortical development. J. Neuropathol. Exp. Neurol. 66, 101–109 (2007).
Cable, D. M. et al. Cell type-specific inference of differential expression in spatial transcriptomics. Nat. methods 19, 1076–1087 (2022).
Allen, C. D., Okada, T. & Cyster, J. G. Germinal-center organization and cellular dynamics. Immunity 27, 190–202 (2007).
Mesin, L., Ersching, J. & Victora, G. D. Germinal center b cell dynamics. Immunity 45, 471–482 (2016).
Siegel, R. L., Miller, K. D., Wagle, N. S. & Jemal, A. Cancer statistics, 2023. Ca Cancer J. Clin. 73, 17–48 (2023).
Pichler, A. C. et al. Tcr-independent cd137 (4-1bb) signaling promotes cd8+-exhausted t cell proliferation and terminal differentiation. Immunity 56, 1631–1648 (2023).
Li, Q. et al. Slc6a8-mediated intracellular creatine accumulation enhances hypoxic breast cancer cell survival via ameliorating oxidative stress. J. Exp. Clin. Cancer Res. 40, 168 (2021).
Ma, R.-Y., Black, A. & Qian, B.-Z. Macrophage diversity in cancer revisited in the era of single-cell omics. Trends in immunology 43, 546–563 (2022).
Christofides, A. et al. The complex role of tumor-infiltrating macrophages. Nat. Immunol. 23, 1148–1156 (2022).
Biancalani, T. et al. Deep learning and alignment of spatially resolved single-cell transcriptomes with tangram. Nat. methods 18, 1352–1362 (2021).
Ma, Y. & Zhou, X. Spatially informed cell-type deconvolution for spatial transcriptomics. Nat. Biotechnol. 40, 1349–1359 (2022).
Lopez, R. et al. Destvi identifies continuums of cell types in spatial transcriptomics data. Nat. Biotechnol. 40, 1360–1369 (2022).
Bergenstråhle, L. et al. Super-resolved spatial transcriptomics by deep data fusion. Nat. Biotechnol. 40, 476–479 (2022).
Hu, J. et al. Deciphering tumor ecosystems at super resolution from spatial transcriptomics with tesla. Cell Syst. 14, 404–417 (2023).
Murdoch, W. J., Singh, C., Kumbier, K., Abbasi-Asl, R. & Yu, B. Definitions, methods, and applications in interpretable machine learning. Proc. Natl Acad. Sci. 116, 22071–22080 (2019).
Lotfollahi, M. et al. Mapping single-cell data to reference atlases by transfer learning. Nat. Biotechnol. 40, 121–130 (2022).
Vento-Tormo, R. et al. Single-cell reconstruction of the early maternal–fetal interface in humans. Nature 563, 347–353 (2018).
Efremova, M., Vento-Tormo, M., Teichmann, S. A. & Vento-Tormo, R. Cellphonedb: inferring cell–cell communication from combined expression of multi-subunit ligand–receptor complexes. Nat. Protoc. 15, 1484–1506 (2020).
Garcia-Alonso, L. et al. Mapping the temporal and spatial dynamics of the human endometrium in vivo and in vitro. Nat. Genet. 53, 1698–1711 (2021).
Garcia-Alonso, L. et al. Single-cell roadmap of human gonadal development. Nature 607, 540–547 (2022).
Shao, X. et al. Celltalkdb: a manually curated database of ligand–receptor interactions in humans and mice. Brief. Bioinforma. 22, bbaa269 (2021).
Gayoso, A. et al. A python library for probabilistic analysis of single-cell omics data. Nat. Biotechnol. 40, 163–166 (2022).
Rombach, R., Blattmann, A., Lorenz, D., Esser, P. & Ommer, B. High-resolution image synthesis with latent diffusion models. Proceedings of the IEEE/CVF conference on computer vision and pattern recognition 10684–10695 (2022).
Zhao, S., Song, J. & Ermon, S. Infovae: Balancing learning and inference in variational autoencoders. Proc. aaai Conf. Artif. Intell. 33, 5885–5892 (2019).
Xu, C. et al. Probabilistic harmonization and annotation of single-cell transcriptomics data with deep generative models. Mol. Syst. Biol. 17, e9620 (2021).
Weinberger, E., Lin, C. & Lee, S.-I. Isolating salient variations of interest in single-cell data with contrastivevi. Nat. Methods 20, 1336–1345 (2023).
Kipf, T. N. & Welling, M. Semi-supervised classification with graph convolutional networks. International Conference on Learning Representations (2017).
Vincent, P., Larochelle, H., Bengio, Y. & Manzagol, P.-A. Extracting and composing robust features with denoising autoencoders. Proceedings of the 25th international conference on Machine learning 1096–1103 (2008).
Papalexi, E. et al. Characterizing the molecular regulation of inhibitory immune checkpoints with multimodal single-cell screens. Nat. Genet. 53, 322–331 (2021).
Dong, M. et al. Causal identification of single-cell experimental perturbation effects with cinema-ot. Nat. Methods 20, 1769–1779 (2023).
Luecken, M. D. et al. Benchmarking atlas-level data integration in single-cell genomics. Nat. methods 19, 41–50 (2022).
Xie, Z. et al. Gene set knowledge discovery with enrichr. Curr. Protoc. 1, e90 (2021).
Fang, Z., Liu, X. & Peltz, G. Gseapy: a comprehensive package for performing gene set enrichment analysis in python. Bioinformatics 39, btac757 (2023).
Palla, G. et al. Squidpy: a scalable framework for spatial omics analysis. Nat. methods 19, 171–178 (2022).
Wang, Q. et al. The allen mouse brain common coordinate framework: a 3d reference atlas. Cell 181, 936–953 (2020).
Burkhardt, D. B. et al. Quantifying the effect of experimental perturbations at single-cell resolution. Nat. Biotechnol. 39, 619–629 (2021).
Martin, F. J. et al. Ensembl 2023. Nucleic acids Res. 51, D933–D941 (2023).
Frankish, A. et al. Gencode reference annotation for the human and mouse genomes. Nucleic acids Res. 47, D766–D773 (2019).
Fornes, O. et al. Jaspar 2020: update of the open-access database of transcription factor binding profiles. Nucleic acids Res. 48, D87–D92 (2020).
Schep, A. N., Wu, B., Buenrostro, J. D. & Greenleaf, W. J. chromvar: inferring transcription-factor-associated accessibility from single-cell epigenomic data. Nat. methods 14, 975–978 (2017).
Wong, P. F. et al. Multiplex quantitative analysis of tumor-infiltrating lymphocytes and immunotherapy outcome in metastatic melanoma. Clin. Cancer Res. 25, 2442–2449 (2019).
Dong, M., Su, D., Kluger, H., Fan, R. & Kluger, Y. SIMVI disentangles intrinsic and spatial-induced cellular states in spatial omics data. Zenodo Repository, https://doi.org/10.5281/zenodo.14708000 (2024).

Acknowledgements

The authors thank Boaz Nadler for extensive discussions and Henry Li for helpful feedback. R.F. and Y.K. disclose support for the research of this work from NIH [U54AG076043, U54AG079759]. H.K. and Y.K. disclose support for the research of this work from NIH [P50CA121974]. Y.K. also discloses support for the research of this work from NIH [R01GM131642, UM1DA051410, and U01DA053628].

Author information

Authors and Affiliations

Interdepartmental Program in Computational Biology & Bioinformatics, Yale University, New Haven, CT, USA
Mingze Dong & Yuval Kluger
Department of Pathology, Yale School of Medicine, New Haven, CT, USA
Mingze Dong, Rong Fan & Yuval Kluger
Department of Biomedical Engineering, Yale University, New Haven, CT, USA
Mingze Dong & Rong Fan
Department of Medicine, Yale School of Medicine, New Haven, CT, USA
David G. Su & Harriet Kluger
Yale Cancer Center, Yale School of Medicine, New Haven, CT, USA
David G. Su, Harriet Kluger & Rong Fan
Yale Center for Immuno-Oncology, Yale School of Medicine, New Haven, CT, USA
David G. Su & Harriet Kluger
Department of Surgery, Yale School of Medicine, New Haven, CT, USA
David G. Su
Applied Mathematics Program, Yale University, New Haven, CT, USA
Yuval Kluger

Authors

Mingze Dong
David G. Su
Harriet Kluger
Rong Fan
Yuval Kluger

Contributions

M.D. conceived the study, developed SIMVI, and conducted computational analyses in the study. M.D. established the theoretical foundation of SIMVI with input from Y.K. H.K. provided melanoma tumor samples and clinical data. Y.K. and H.K. provided the CosMx melanoma dataset. D.G.S. provided feedback on the melanoma data analysis. R.F. and Y.K. provided overall supervision. M.D., H.K., and Y.K. wrote the manuscript with input from R.F.

Corresponding author

Correspondence to Yuval Kluger.

Ethics declarations

Competing interests

R.F. is co-founder of and scientific advisor to IsoPlexis, Singleton Biotechnologies, and AtlasXomics with financial interest. H.K. has received institutional research grants (to Yale University) from Merck, Bristol-Myers Squibb, Apexigen and personal fees from Iovance, Celldex, Merck, Bristol-Myers Squibb, Clinigen, Shionogi, Chemocentryx, Calithera, Signatero, Gigagen, GI reviewers, Pliant Therapeutics and Esai. The remaining authors declare no competing interests.

Peer review

Peer review information

Nature Communications thanks the anonymous reviewer(s) for their contribution to the peer review of this work. A peer review file is available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Reporting Summary

Transparent Peer Review file

Source data

Source Data for Supplementary Figs.

Source Data for Main Figs.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Dong, M., Su, D.G., Kluger, H. et al. SIMVI disentangles intrinsic and spatial-induced cellular states in spatial omics data. Nat Commun 16, 2990 (2025). https://doi.org/10.1038/s41467-025-58089-7

Received: 15 January 2025
Accepted: 05 March 2025
Published: 27 March 2025
DOI: https://doi.org/10.1038/s41467-025-58089-7
