Abstract
Spatial transcriptomics has emerged as a groundbreaking tool for the study of intercellular ligand-receptor interactions (LRIs) that exhibit spatial variability. To identify spatially variable LRIs with activation evidence, we present SPIDER, which constructs cell-cell interaction interfaces constrained by cellular interaction capacity, and profiles and identifies spatially variable interaction (SVI) signals with support from downstream transcript factors via multiple probabilistic models. SPIDER demonstrates superior performance regarding accuracy, specificity, and spatial variance relative to existing methods. Experiments of simulations and real datasets in bulk and single-cell resolutions validate SPIDER-identified SVIs by spatial autocorrelation and correlation with downstream target genes, and reveal their consistency across multiple biological replicates. Particularly, distinct SVIs on mouse datasets reveal the potential in representing regional and inter-cell type interactions. SPIDER groups SVIs with similar spatial distributions into SVI patterns that are supported by strong Pearson correlations on spot annotations, generating interaction-based sub-clusters within cell-type regions, and deriving enriched pathways.
Similar content being viewed by others
Introduction
Cell-cell interactions (CCIs) play a vital role in cellular functions, organogenesis, and disease progression, with identified CCIs widely applied in disease diagnosis and therapeutic strategies1,2,3. These highly complex interactions involve multiple signaling pathways and crosstalk between different cell types, some facilitated by ligand-receptor interactions (LRIs). Single-cell RNA sequencing (scRNA-seq) technology facilitates LRI inference from the co-expression of signaling molecules, such as ligands, receptors, and downstream transcription factors4.
Recently, the advance of spatially resolved transcriptome (ST) technologies5,6,7,8 has improved the accuracy of LRI inference by applying spatial proximity to LRI signal detection9. For example, Giotto and SpaTalk infer LRIs from neighboring cells defined by Delaunay triangulation, and recent COMMOT limits the signaling range of LRIs by collective optimal transport (COT)10,11,12. The inferred LRI signals are subject to differential expression tests given cluster labels such as cell types. For example, SpaTalk identifies enriched LRIs between any two clusters with a permutation test of randomly shuffling cell labels to reconstruct interacting cell pairs11. COMMOT summarizes LRI signals into a cluster-by-cluster signaling matrix and applies label permutation tests to identify the interaction significance between any pair of clusters12. However, examining LRIs with predefined spot clusters neglects possible regional interactions among mixed clusters or sub-clusters13.
Given the regulatory role of LRIs for tissue organization and homeostasis14, LRI signals could exhibit spatial heterogeneity across spatial locations, which we refer to as spatially variable LRIs (SVIs). Spatial variance in LRI signals can reflect cell states independent of gene-based annotations - cells from multiple clusters or a subcluster could exhibit homogeneous LRI signals15. Furthermore, similar to spatially variable genes (SVG), SVIs preserve the spatial relationships of cells and capture cellular heterogeneity concerning interactions16. Therefore, the identification of SVIs fills the gap in existing CCI studies.
Statistical models for assessing the significance of the dependence between signals and spatial coordinates have been proposed and reviewed17. Spatial autocorrelation statistics, such as Moran’s I and Geary’s C, serve as the baseline measurements for spatial variance18,19. For more sophisticated methods, SOMDE, SpatialDE, SpatialDE2, and nnSVG are based on the Gaussian process, SPARK-X is based on nonparametric covariance tests, and scGCO is based on hidden Markov random fields and graph cut20,21,22,23,24. However, the above methods are not directly applicable to identifying SVIs due to two obstacles. First, the LRI signal is inferred between two neighboring cells without specific locations, which is a prerequisite for spatial variance models. Furthermore, the other challenge is the computational scalability17. For example, SPARK-X and nnSVG scale linearly with the number of spatial locations22,23, while SpatialDE2 scales quadratically21. Detecting the spatial variance of interaction signals is more computationally intensive, as the number of spot pairs producing the signal doubles or triples the number of spots. This introduces the need for current SVG methods to adapt to the increasingly larger data size, as ST platforms move towards even higher resolution.
In addition to detecting spatial variance for a single signal, methods also exist for detecting the spatial co-occurrence between two signals. SVCA is a Gaussian process-based framework that decomposes gene expression variation into intrinsic effects, environmental effects, and gene interaction effects, identifying genes with a high proportion of variance explained by the interaction term25. Similarly, SpatialCorr employs a likelihood ratio test statistic to detect spatially varying gene correlations derived from the Gaussian kernel and assesses statistical significance through sequential Monte Carlo permutation26. However, the above methods do not consider if the spatially co-occurring genes form any spatial patterns. Recently, SpatialDM proposed to identify SVIs by introducing Moran’s R, a bivariate extension of the aforementioned spatial autocorrelation statistic Moran’s I, to detect spatial co-expression with spatial patterns27. However, SpatialDM fails to consider any functional support for identified SVIs, for example, whether the SVIs trigger any downstream targets in receivers. Without functional validation, the reported SVIs could contain false positives that should not be integrated into downstream analyses. Therefore, a method that removes such false positives by utilizing the enrichment of the ligand-receptor-target (LRT) signaling network is still lacking.
In this study, we propose the package, named SPtial Interaction-encoDed intErface decipheR (SPIDER), to identify SVIs with functional supports. SPIDER constructs interaction capacity-constrained cell-cell interaction interfaces using a power diagram. SPIDER then encodes each interface with the estimated interaction strengths of individual LR pairs by joining COT and co-expressions. Subsequently, SPIDER seeks functional support of an LRI by examining the activation of its target genes. In selecting potentially spatially variable LRIs, SPIDER recruits a self-organizing map (SOM) neural network to form abstract interfaces with interfaces in close proximity and identify signals that are potentially spatially variable across interfaces using six probabilistic models. Finally, SPIDER screens for SVI candidates with functional support, and generates interaction patterns by grouping SVIs with similar spatial distributions. Across all simulated scenarios in both bulk and single-cell resolutions, SPIDER is able to receive the highest Area Under the Receiver Operator Curve (AUC) and specificity values compared to other methods. On real datasets, SPIDER shows robustness in identifying SVIs from both single-cell and spot-based ST data generated by different platforms, including 10x Visium, Slide-seq V2, seqFISH, and Stereo-seq8,28,29,30. SPIDER demonstrates scalability by effectively operating across a range of constructed interfaces, from hundreds to over one hundred thousand, accommodating the increasing size of ST data. The difference in resolution also suggests the robustness of SPIDER against sequencing platforms. The SPIDER-identified SVIs have been validated based on two criteria: they exhibit higher scores on spatial autocorrelation metrics than the excluded LRIs, and they correlate stronger with their downstream target genes than the excluded LRIs. The biological significance of SVIs and SVI patterns has been validated through strong Pearson correlations with spot annotations, indicating biologically relevant spatial patterns. The analysis has also generated interaction-based sub-clusters within cell-type regions and clusters characterized by mixed cell-type interactions. Furthermore, SPIDER derives enriched pathways from SVIs and their supporting genes and excludes false-positive SVIs without literature evidence.
Results
Overview of SPIDER
Following the assumption that LRIs are spatially constrained, SPIDER estimates an interaction interface between any pair of cells given their interaction capacity and spatial proximity (Fig. 1A). SPIDER first evaluates the interaction capacity for each cell based on the total expression of ligand and receptor genes. SPIDER then applies a power diagram to identify interfaces between cells based on varying interaction capacities31.
A SPIDER estimates interaction interfaces between neighboring cells based on interaction capacity and spatial proximity. Interface sizes are proportional to assigned capacities in a power diagram representation76. B SPIDER models ligand-receptor interaction (LRI) signals across interfaces using a collective optimal transport approach integrated with co-expression analysis to estimate interaction profiles and directions. C Interfaces are located on the spatial transcriptome slice and represented as nodes in a proximity graph. Self-organizing mapping derives an abstract interface graph joining adjacent interfaces. D A knowledge graph is used to estimate receptor activation by analyzing weighted paths reaching spatially variable TFs (svTFs) using a cell-specific adjacency matrix. The power of this matrix quantifies path counts at increasing lengths. E Correlating receptor activation scores with LRI scores provides supporting evidence for LRIs by analysis of svTF patterns. F Statistical testing identifies spatially variable interaction (SVI) candidates. G Candidates are further screened for SVIs supported by correlating receptor activations with LRI scores. H SVIs with similar distributions are clustered using a mixture model.
Similar to the commonly used Delaunay triangulation in ST data analysis, a power diagram generates polygons representing spots. However, power diagrams directly represent interaction areas, with polygon sizes proportional to assigned interaction capacities. Therefore, the capacity-based power diagram outperforms Delaunay triangulation in its superior adaptability to identify interfaces according to varying interaction dynamics (detailed comparison in Supplementary Note 1).
In a power diagram, cells are lifted onto a paraboloid surface in 3D space, with the position of each cell determined by its assigned weight and 2D positions. The lifted positions shape an overall convex hull when viewed from above the paraboloid. This convex hull is then projected back downward onto the original 2D plane. The boundaries determined by this projection define the sizes and shapes of each cell’s bounding polygon within the 2D plane. The area of each polygon is directly proportional to the corresponding cell’s original lifting weight, with higher weights yielding larger polygons.
Subsequently, SPIDER models LRI signals across interfaces, represented as interaction profiles, as well as interaction directions, using a COT approach integrated with co-expression analysis of ligand and receptor genes (Fig. 1B and Supplementary Note 2). First, SPIDER applies COT to estimate the distribution of ligand and receptor expression across interfaces12. The COT problem is formulated to minimize the transport of LR expression across interfaces while penalizing un-transported expression, with constraints ensuring expression is transported from source to target spots. The COT solution represents the optimal distribution plan of ligands and receptors across interfaces, therefore facilitating the estimation of LRI signals and directions. Subsequently, the direction of an LRI in an interface is inferred as the maximum between bi-directional COT scores. For the interaction profiles, each entry in the profile represents an LR pair from the LR database. From the optimal transport scores, LRI-specific expressions are extracted for each spot to calculate LR co-expression. The interaction strength of a ligand-receptor pair is then calculated as the maximum of the corresponding COT score and co-expression value.
Subsequently, SPIDER locates the profiled interface on the ST slice at the center of the connected spots (Fig. 1C). As a result, we obtain the coordinates and expressions for interfaces. In particular, locating interfaces enables the construction of a spatial proximity graph with interfaces as nodes, which facilitates calculating the spatial variance of an LRI signal.
To reduce the number of sites in testing spatial variance, SPIDER further finds an abstract representation of interfaces based on the interface spatial proximity graph (Fig. 1C). Specifically, SPIDER utilizes SOM, an unsupervised neural network, to adaptively integrate adjacent interfaces into an abstract interface. SOM derives the mapping between interfaces and an abstract interface by both the topological neighboring relations and the relative interface densities32. The proximity-based abstraction of SOM preserves the spatial continuity of joint interaction profiles. Each abstract interface joins the interaction profiles of the contracted interfaces into an abstract interaction profile with a mean-max signal convolution.
After interface construction, SPIDER analyzes spatially variable transcription factors (svTFs) to provide functional support for LRI scores. The inclusion of svTF analysis serves two major purposes. First, the activation of a TF downstream of an SVI provides mechanistic validation for the detected SVI. If an SVI is functional, its signaling should propagate through the receptor to downstream gene regulatory programs, resulting in the spatial activation of specific TFs11. Therefore, observing spatially variable expression of a TF that can be mechanistically linked to an upstream SVI offers strong evidence that the SVI is not merely a result of spatial co-expression or technical artifact, but is biologically meaningful and functionally realized in the tissue context. Second, integrating svTFs into our analysis enhances the biological interpretability of SVIs, as it allows us to connect spatial cell-cell communication events with downstream changes in gene regulatory networks and pathway enrichment analysis. By linking SVIs to the activation of specific TFs, we can better understand how spatial signaling events drive functional heterogeneity within the tissue and how these events may shape spatial patterns of gene expression and cell states. Consequently, we exclude non-spatially variable TF genes, as they provide less compelling functional evidence for SVIs and are less likely to reflect the downstream consequences of spatial signaling.
SPIDER contains a knowledge graph representing known regulatory relationships among ligands, receptors, and downstream target TF genes. Receptor activation is estimated by tracing weighted paths reaching svTF nodes through this graph topology, using a cell-specific adjacency matrix representation integrated with cell expressions (Fig. 1D).
Powers of the weighted adjacency matrix quantify path counts linking receptors and TFs at increasing hops, akin to signal propagation through multiple network steps. Hop matrices extracted at each power allow the dissection of receptor-TF connectivity over a range of path lengths. The summation of hop matrices within a length threshold yields a combined matrix of full activation scores, systematically characterizing signal flow over the knowledge graph. Correlating the resulting activation scores with the LRI profiles implicating those receptors provides functional evidence of receptor-TF collaborations (Fig. 1E).
Subsequently, SPIDER identifies SVI candidates by a multi-model test, as illustrated in Fig. 1F from the abstract interfaces. Specifically, SPIDER integrates six statistical models and two benchmark tests for spatial variance and retains LRIs with statistically significant spatial variances that pass the joint model tests. SPIDER further screens for SVI candidates that are supported by svTFs, generalizing the set of svTF-supported SVIs (Fig. 1G).
SPIDER then clusters SVIs based on similarities in spatial distributions using a Gaussian process mixture model (Fig. 1H). The number of clusters is automatically determined by a Dirichlet process prior21. Each SVI cluster generates a spatial pattern as the posterior mixture of the member SVIs modeled as Gaussian process components, summarizing the spatial distribution of the included SVIs. We refer to the Gaussian process mixture as the activation strength of the generated pattern, which provides a quantitative measure of the summarized SVI profiles within the cluster. To explore the potential biological processes associated with SVIs that share similar spatial distributions, SPIDER also identifies significantly enriched pathways within each SVI cluster from KEGG and Reactome databases using Fisher’s exact test33,34.
Evaluation of SPIDER on simulated datasets compared to the SOTA CCI methods
To examine the accuracy of SPIDER in identifying SVIs, we simulated multiple datasets from the pancreatic ductal adenocarcinoma (PDAC) dataset in Fig. 2A with 428 spots and 498 LR pairs28. We used the SVCA package25 to simulate the interaction strength between ligands and receptors, as well as the receptor activation of TF genes. Specifically, SVCA dissected the expression variance of a gene into distinct Gaussian process components that individually represent the contributions from intrinsic cell states, spatial proximity, gene interactions, and residual noise.
A–E Comparisons on simulated single-cell PDAC datasets. A PDAC sample with region annotations. B Receiver Operating Characteristic (ROC) of the simulation dataset with 99% interaction strength level; Area Under ROC (AUC) for each method is marked in the legend. C The boxplot of five SVI evaluation metrics on LRIs detected by SPIDER, SpatialDM, CellChat, SpatialCorr, SpaTalk, and those excluded by the above methods (n=490 LRIs). The Geary score, where a lower value indicates higher spatial clustering, is reversed for visualization consistency. D Comparison of average AUCs across datasets under four different interaction strength levels (25, 50, 75, and 99%, respectively) and median noise level across different fractions. E Comparison of average specificity scores across null datasets under four different interaction strength levels (25%, 50%, 75%, and 99%, respectively) across all three noise levels. F–J Comparisons on simulated single-cell DLPFC sample 151,673. F DLPFC sample 151,673 with region annotations. G ROC of the simulation dataset with 99% interaction strength level. H The boxplot of five SVI evaluation metrics on LRIs detected by SPIDER, SpatialDM, CellChat, and SpaTalk, and those excluded by the above methods, with reverted Geary scores (n = 407 LRIs). I Comparison of average AUCs across datasets under four different interaction strength levels and median noise level across different fractions. J Comparison of average specificity across null datasets under four different interaction strength levels with all three noise levels between SPIDER and SpatialDM. K–M Comparisons on simulated bulk PDAC datasets. K Bulk PDAC simulation with annotations. L ROC of the simulation dataset with 99% interaction strength level; Area Under ROC (AUC) for each method is marked in the legend. M Comparison of AUCs across datasets under four different interaction strength levels with two resolution settings and all three noise levels. N–P Comparisons on simulated bulk DLPFC datasets. N Bulk DLPFC simulation with annotations. O ROC of the simulation dataset with 99% interaction strength level; AUC for each method is marked in the legend. P. Comparison of AUCs across datasets under four different interaction strength levels with two resolution settings and all three noise levels). PDAC: pancreatic ductal adenocarcinoma; DLPFC: dorsolateral prefrontal cortex. The boxplots display the median (center line), the 25 and 75th percentiles (box bounds), whiskers extending to the most extreme data points within 1.5 × the interquartile range, minima and maxima as the lowest and highest points within the whiskers, and outliers as individual points beyond the whiskers. The statistical significance of box plots is calculated using one-sided Mann-Whitney-Wilcoxon test with Benjamini-Hochberg correction, with the exact adjusted p-values listed in Supplementary Table 2 and the following significance annotations: ****: adjusted p-value ≤ 1.00e-04; ***: 1.00e-04 < adjusted p-value ≤ 1.00e-03; **: 1.00e-03 < adjusted p-value ≤ 1.00e-02; *: 1.00e-02 < adjusted p-value ≤ 5.00e-02. Source data are provided as a Source Data file.
By controlling the fraction of expression variance explained by gene interactions, we simulated ligand gene expression from the corresponding receptor expression. This fraction, which defines the interaction strength level, allows heterogeneous interaction strengths across both cells and LR pairs. Similarly, we simulated the expression of TF genes by controlling the fraction of intrinsic variance from receptors. As the spatial pattern of simulated expression is dominated by that of the given receptor gene, we selected the top 100 SV and non-SV receptors. Considering that SVCA is a Guassian-based framework, we further imposed different levels of Poisson noise on the simulated count matrix.
We constructed various simulation scenarios by combining the simulated ligand and TF expression data, initially generating 12 simulations by systematically varying the interaction strength levels–specifically, setting the fraction of ligand variance explained by receptor interaction to 99, 75, 50, and 25%–in conjunction with different noise levels. At the median noise level, we generated an additional 12 simulations by adjusting the fraction of svTF-supported non-SV receptors, with options for full, partial, or no svTF support. Lastly, we created four null simulations at the median noise level across different interaction strength levels, ensuring that all SV receptors were unsupported by svTFs, while non-SV receptors could have varying levels of support.
We benchmarked SPIDER’s performance against state-of-the-art (SOTA) CCI inference methods: cluster-based SpaTalk for ST data and CellChat for non-spatial single-cell data, as well as spatial correlation-based SpatialCorr and SpatialDM. To further test the effect of SPIDER components, we replace the interface scoring step in SPIDER with COMMOT and stLearn. We also evaluated the individual performances of SPIDER SVI models, namely SpatialDE, SpatialDE2, scGCO, SPARKX, SOMDE, and nnSVG.
AUC, or Area Under the Curve, is a widely utilized performance metric that evaluates the effectiveness of binary classification models by quantifying the area under the Receiver Operating Characteristic (ROC) curve. This metric provides a comprehensive measure of a model’s ability to differentiate between positive and negative classes, with a value of 1.0 signifying perfect discrimination. When the interaction strength is set at 99%, SPIDER achieved an average AUC of 0.84 across all noise levels, surpassing both SpatialDM (AUC = 0.701) and TF-based SpaTalk (AUC = 0.697), as illustrated in Fig. 2B. Additionally, SPIDER outperformed CCI methods SpatialCorr (AUC = 0.475) and CellChat (AUC = 0.673), which are not specifically designed for identifying SVIs. In contrast, the performance of scoring methods stLearn and COMMOT was less satisfactory, yielding AUCs of 0.713 and 0.562, respectively, thereby indicating SPIDER’s superiority in LRI scoring.
To further evaluate the selected SVIs identified by SPIDER, we employed two standard measures of spatial autocorrelation: Moran’s I and Geary’s C. High values of I and low values of C indicate non-random, clustered distributions18,19. We reverse the Geary score for visual and contextual consistency. In addition to these baseline assessments, we analyzed specialized metrics from the SOMDE and nnSVG algorithms, specifically the log-likelihood ratio (LR) and SOMDE’s fraction of spatial variation (FSV) score, which quantifies spatially explained interaction variability20,22. Under conditions of 99% interaction strength level, the SVI candidates selected by SPIDER and SpatialDM, and LRIs from SpatialCorr, CellChat, and SpaTalk exhibited significantly higher scores than those excluded by the above methods (adjusted p-value ≤ 0.0001), as illustrated in Fig. 2C. Additionally, SPIDER-nominated SVI candidates generally demonstrated significantly higher scores compared to those identified by SpatialDM, SpatialCorr, CellChat, and SpaTalk.
In scenarios with lower interaction strength levels of 25, 50, and 75% and three noise levels, SPIDER consistently outperformed all other methods (Fig. 2D and Supplementary Fig. 1A). As summarized in Supplementary Table 1, SPIDER achieved an average AUC of 0.886, compared to average AUCs of 0.762 and 0.703 for SpatialDM and SpaTalk, respectively. Similar performance trends were observed for spatial variance metrics in other simulation cases, where SPIDER often attained higher scores than SpatialDM, SpatialCorr, CellChat, and SpaTalk, although not always statistically significant, as demonstrated in Supplementary Fig. 1B. Additionally, SPIDER outperformed all other methods under different combinations of interaction strength levels and fractions of svTF-supported non-SV receptors (Supplementary Fig. 1C). SPIDER also demonstrated robust performance in composite interaction scenarios (details in Supplementary Note 3), achieving mean AUROCs of 0.930 (mixed interaction strength levels) and 0.871 (mixed noise levels), with superior performance in weak interaction identification (AUROC = 0.931 vs 0.797–0.671) and high-noise spatial pattern (AUROC = 0.779 vs 0.767–0.669) compared to SpatialDM and SpaTalk. Systematic validation through variance ranking and spatial autocorrelation metrics further confirmed SPIDER reliability in detecting SVI across interaction variance gradients and varying spatial patterns.
Furthermore, the individual SPIDER SVI model consistently yielded higher AUCs than SpatialDM across varying interaction and noise levels (Fig. 2D). Notably, SPIDER-SpatialDE and SPIDER-nnSVG, which utilize only the statistical results from SpatialDE and nnSVG, achieved the highest AUCs of 0.902 and 0.901, respectively. Among SPIDER SVI models, Gaussian models, such as SPIDER-SOMDE (average AUC = 0.899) and SPIDER-SpatialDE2 (average AUC = 0.895) outperformed scGCO (average AUC = 0.884) and SPARKX (average AUC = 0.872).
To evaluate performance under null scenarios, we conducted tests on SPIDER and comparable methods using simulated data without svTF-supported SVIs. Specificity, defined as a method’s ability to minimize false positives, serves as a critical performance indicator, with higher values indicating superior outcomes. SPIDER consistently demonstrated higher specificity compared to SpatialDM and SpaTalk across various interaction levels, with average specificities of 0.848, 0.392, and 0.418, respectively (Fig. 2E). Additionally, SPIDER outperformed individual statistical models and most other methods in terms of specificity (Supplementary Fig. 1C). It is noteworthy that some methods achieved higher specificity than SPIDER, primarily due to their tendency to reject a majority of LRI pairs.
We further test for the effect of TF variability on SPIDER. Across three TF noise levels, SPIDER consistently outperformed SOTA CCI methods with the highest mean AUCs of 0.813, compared to average AUCs of 0.779 and 0.639 for SpatialDM and SpaTalk, respectively (Supplementary Fig. 2A). While both SPIDER and SpaTalk were affected by the TF noise, SPIDER presented a smaller drop of 0.036 in AUC value from low to high TF noise levels compared to that of 0.043 for SpaTalk. Additionally, SPIDER under the median TF noise generally outperformed SOTA CCI methods against varying LRI noise levels and interaction strength levels (Supplementary Fig. 2B, C).
To eliminate dataset-specific influences, we replicated the simulation using the human dorsolateral prefrontal cortex (DLPFC) sample 151673 shown in Fig. 2F5. When the interaction strength level is set at 99%, SPIDER achieved an AUC of 0.85 across all noise levels, surpassing both SpatialDM (AUC = 0.78) and TF-based SpaTalk (AUC = 0.81), as illustrated in Fig. 2G. Additionally, SPIDER outperformed the CCI method CellChat (AUC = 0.72), with SpatialCorr failing to run on this larger dataset. In this dataset, we observed a better but still lesser performance of scoring methods stLearn and COMMOT, yielding AUCs of 0.81 and 0.74, respectively, again supporting SPIDER’s superiority in LRI scoring. The spatial variance evaluation metrics also suggested the higher quality of SVIs from SPIDER compared with SpatialDM, CellChat, and SpaTalk (Fig. 2H). Such accuracy and quality persist for other combinations of interaction strength levels and noise levels (Fig. 2I and Supplementary Fig. 3A, B). Across all interaction strength and noise level, SPIDER performance is in alignment with previous findings, exhibiting higher an average AUC of 0.859 and average specificity of 0.592, compared to SpatialDM, which recorded an average AUC of 0.803 and average specificity of 0.395, and SpaTalk, which recorded an average AUC of 0.809 and average specificity of 0.408 (Fig. 2J and Supplementary Fig. 3C).
In both simulations, integrating six distinct spatial variance testing models within SPIDER confers notable advantages over relying on any single model. As shown in Supplementary Table 1, while individual models such as SPIDER-SpatialDE and SPIDER-nnSVG achieve high AUROC values (0.902 and 0.901, respectively), and others like SPIDER-SPARKX and SPIDER-SpatialDE2 exhibit strong specificity (0.835 and 0.833, respectively), none surpass SPIDER itself in both AUROC and specificity across both PDAC and DLPFC simulations. This demonstrates that the integrative framework of SPIDER leverages the complementary strengths of each method, resulting in consistently robust performance–particularly in balancing sensitivity and specificity. Furthermore, by aggregating results from multiple tests, SPIDER effectively reduces the disagreement commonly observed among existing spatial variance detection methods35, thereby enhancing the reliability of identified SVI candidates. Importantly, SPIDER’s flexible design allows users to consider a custom group of spatial variance tests, ensuring adaptability to datasets with varying size and noise characteristics.
Furthermore, we evaluate SPIDER and comparative methods (SpatialDM and SpaTalk) on bulk simulations of the PDAC and DLPFC datasets. Across all simulations, SPIDER consistently demonstrated superior predictive performance for identifying SVIs supported by svTFs while lowering resolution differentially influenced certain models’ accuracy. We generated bulk PDAC dataset with the number of cells per spot as four and six (Fig. 2L and Supplementary Fig. 4A). We found SPIDER still demonstrated satisfactory identification of SVIs with svTF support compared to SpatialDM and SpaTalk, as indicated by ROCs in Fig. 2M and Supplementary Fig. 4B when the number of cells per spot set to four. Notably, the accuracy of SpatialDE2, scGCO, and SPARKX was more impacted by the reduced bulk resolution. This pattern held for both cell-per-spot settings tested (Fig. 2N). For the DLPFC bulk simulations (Fig. 2O and Supplementary Fig. 4C), SPIDER again showed superior predictive ability based on the ROCs (Fig. 2P and Supplementary Fig. 4D). In this dataset though, only SpatialDE2 appeared to suffer substantially from the lower bulk resolution, according to its AUROCs for both cell-per-spot settings (Fig. 2Q). Overall, SPIDER maintained strong predictive performance in identifying SVIs supported by svTFs under bulk resolution, outperforming alternative approaches. However, moving to bulk simulations decreased performance for some methods, with SpatialDE2 consistently most affected on both datasets.
Using both single-cell and bulk resolution simulations, we examined the impact of spatial constraints on SPIDER and two comparative methods, SpatialDM and SpaTalk. When spatial constraints on LRIs are absent–i.e., when spatial information is disregarded–none of the methods should identify SVIs. Across repeated random simulations with varying interaction strength levels, SPIDER and SpatialDM consistently achieved remarkably accurate specificity scores of 1 (Supplementary Fig. 5). In contrast, SpaTalk performed less effectively, likely due to its primary focus on identifying cluster-level interactions rather than SVIs.
To evaluate SPIDER’s robustness under weak spatial constraints, we permuted spots or cells within blocks of varying sizes while adjusting interaction strength levels (Supplementary Fig. 6A). SPIDER consistently outperformed both SpatialDM and SpaTalk across three block size settings, achieving a mean AUROC of 0.8716, compared to 0.7625 for SpatialDM and 0.6487 for SpaTalk (Supplementary Fig. 6B). Furthermore, we observed that loosening spatial constraints had a less pronounced effect on bulk resolution datasets, with a reduction in mean AUROC of only 0.0184 compared to a reduction of 0.0558 in single-cell resolution datasets (Supplementary Fig. 6C). This difference is likely attributable to the aggregated signals within spots in bulk resolution datasets.
In real samples, we assessed the interaction types identified by SPIDER under varying degrees of spatial constraint. Specifically, we permuted spots or cells within blocks of different sizes in two cancerous and two normal brain samples. Across all samples, the proportion of short-distance interactions decreased as spatial constraints were relaxed, confirming the influence of spatial proximity on interactions (Supplementary Fig. 6D). Notably, cancerous tissues exhibited a greater reduction in short-distance interactions, consistent with the predominant role of ECM-receptor interactions in cancer. In contrast, brain samples, where long-range secreted signaling dominates, showed less pronounced changes in interaction proportions.
SPIDER involves three hyperparameters for interface abstraction and svTF filtering of LRIs. We systematically assessed the impact of interface abstraction as well as alternative settings for these hyperparameters–the mixture parameter, SOM node number, and svTF filtering threshold–on model performance using simulation datasets (detailed in Supplementary Note 4). Power analysis demonstrates that interface abstraction is essential for larger datasets, where it effectively enhances sparse signals, but is less critical for smaller datasets. While the mixture parameter had limited overall influence, adjusting the SOM node number and svTF threshold demonstrated more prominent effects. Specifically, higher SOM nodes tended to reduce accuracy for some methods more substantially on the larger DLPFC dataset, and the stringent svTF threshold of 0.5 showed stronger dependence on dataset noise profiles than less restrictive cutoffs.
For practical application, we recommend enabling interface abstraction and using a moderate SOM node number (e.g., 10) for large-scale or high-complexity datasets, while for smaller datasets with less than 1000 spots or cells, interface abstraction can be omitted or a lower SOM node number (e.g., 5) is preferred. The mixture parameter can be set to the default value of 0.3, but increasing it to 0.5 may help amplify weak signals in sparse data. For svTF filtering, the default threshold of 0.3 generally balances sensitivity and specificity, whereas a more stringent threshold of 0.5 may be used when higher specificity is required, though this may reduce sensitivity in noisy datasets. In summary, parameter values should be tailored to dataset size and noise level: larger and noisier datasets benefit from interface abstraction, moderate SOM node numbers, and careful adjustment of the svTF threshold. Users are encouraged to iteratively refine these parameters based on dataset characteristics and the biological relevance of detected interfaces.
Evaluation of SPIDER on real datasets compared to SOTA CCI methods
We compare SPIDER, SpatialDM, stLearn, and COMMOT on the real PDAC dataset, where SPIDER identified 42 svTF-supported SVIs, while SpatialDM, stLearn, and COMMOT proposed 260, 323, and 346 LRIs, respectively. Similar to the simulation cases, we compare the selected LRIs with six metrics, as shown in Fig. 3A. Statistical tests annotated in Fig. 3A further confirmed that svTF-supported SVIs by SPIDER received significantly higher metrics values than those identified by SpatialDM, stLearn, and COMMOT, suggesting that SPIDER is more strict in identifying SVIs with both spatial variance and functional support from target TF genes. In addition, such pattern generally holds for other parameter settings (Supplementary Fig. 7).
A Boxplot of six SVI-evaluation metrics, showing significantly higher scores of SPIDER-identified SVIs over those by SpatialDM, COMMOT, and stLearn, with those excluded as a baseline (n = 498 LRIs). The Geary score is again reversed. B Boxplot of the number of supporting svTFs per LRI among SPIDER svTF-supported LRIs, SpatialDM SVIs, stLearn and COMMOT LRIs (n = 374, 260, 346, and 320 LRIs, respectively). C Venn plot of svTF-supported LRIs, SpatialDM SVIs, and stLearn LRIs. D TSPAN14/15 signaling LRIs with their best supporting TF genes and correlations. E Barplot of the average number of ligands per receptor (left) and the average number of receptors per ligand (right) in three different LRI sets: SPIDER SVIs, SpatialDM SVIs, COMMOT and stLearn LRIs, and all LRIs in the database. F Barplot showing the number of TF supports for the top five SVI ranked by number of TF supports. G Dot plot of pathways enriched by each SVI and its supporting spatially variable TF genes. H Deconvoluted cell type annotations and the corresponding SPIDER SVIs with the five highest Pearson correlation coefficients. I Interaction patterns generated by LRIs identified by SPIDER and SpatialDM. J Heatmap of Pearson correlation coefficients between interaction patterns and PDAC region annotations, from SPIDER and SpatialDM, respectively. K Heatmap of pattern-indicated interaction between PDAC region annotations. PDAC: pancreatic ductal adenocarcinoma; corr: Pearson correlation coefficient. The boxplots display the median (center line), the 25th and 75th percentiles (box bounds), whiskers extending to the most extreme data points within 1.5 × the interquartile range, minima and maxima as the lowest and highest points within the whiskers, and outliers as individual points beyond the whiskers. The statistical significance of box plots is calculated using one-sided Mann-Whitney-Wilcoxon test with Benjamini-Hochberg correction, with the exact adjusted p-values listed in Supplementary Table 3 and the following significance annotations: ****: adjusted p-value ≤ 1.00e-04; ***: 1.00e-04 < adjusted p-value ≤ 1.00e-03; **: 1.00e-03 < adjusted p-value ≤ 1.00e-02; *: 1.00e-02 < adjusted p-value ≤ 5.00e-02. Source data are provided as a Source Data file.
We first compare the SPIDER-proposed LRIs with svTF gene support with the SpatialDM SVIs and stLearn and COMMOT LRIs, acknowledging that the three SOTA methods are not designed for this specific task. The level of TF support, quantified by the number of supporting svTF genes for an LRI, is significantly lower for all three SOTA methods, as shown in Fig. 3B. Furthermore, 59 SpatialDM SVIs and 85 stLearn LRIs are not supported by any svTF genes (Fig. 3C). As a result, SpatialDM, stLearn, and COMMOT received high false-positive rate (FPR) of 0.417, 0.667, and 0.750, respectively, when comparing to LRI svTF support. Such pattern can be observed across dataset: SpatialDM achieved the lowest mean FPR of 0.230 among the three methods, followed by 0.781 from stLearn and 0.958 from COMMOT (Supplementary Fig. 8A). The gap between SpatialDM and other two CCI methods also supports the relation between SVI and svTF support. We further examine the false positive LRIs from the three SOTA methods that are rejected in SPIDER for the lack of svTF support (Supplementary Fig. 8B). For example, among three LRIs related to TSPAN14/15 signaling (Fig. 3D), only APP-TSPAN15 are supported by downstream target GAPDH, while the interaction between ADAM10 and APP-TSPAN15/14 are rejected (Pearson r = 0.323, 0.256, and 0.122, respectively). However, ADAM10-TSPAN15 is falsely identified by SpatialDM, COMMOT, and stLearn, and ADAM10-TSPAN14 by SpatialDM and stLearn, as neither validates svTF gene support. We first compare the SPIDER-proposed LRIs with svTF gene support with the SpatialDM SVIs, acknowledging that SpatialDM is not designed for this specific task. The level of TF support, quantified by the number of supporting svTF genes for an LRI, is significantly lower for SpatialDM SVIs, as shown in Fig. 3C. Furthermore, 59 SpatialDM SVIs are not supported by any svTF genes. For example, among three SpatialDM SVIs related to TSPAN14/15 signaling (Fig. 3D), only APP-TSPAN15 are supported by downstream target GAPDH, while the interaction between ADAM10 and APP-TSPAN15/14 are rejected (Pearson r = 0.323, 0.256, and 0.122, respectively). APP has been shown to influence the phosphorylation of key proteins like ERK within the MAPK signaling pathway, leading to increased cell migration, invasion, and epithelial-mesenchymal transition (EMT), promoting aggressive tumor behavior in various cancers36. In particular, the ERK pathway can regulate metabolic enzymes, including GAPDH, a known PDAC therapeutic target associated with increased metabolic activity in tumor growth37. In addition, by keeping only svTF-supported LRIs, we found a reduced average number of ligands per receptor and receptors per ligand in SPIDER svTF-supported SVIs, as shown in Fig. 3E. This observation also validates SPIDER’s effectiveness in finding functional SVIs, with co-expressing but not functioning LRIs excluded by downstream target genes. Both the higher level of TF support and the exclusion of less biologically meaningful LRIs suggest the effectiveness of SPIDER in identifying svTF-supported LRIs compared with SpatialDM, stLearn, and COMMOT.
Subsequently, we examine the biological insights provided by the pathways activated jointly by SVI and their supporting svTF genes. We show the top five ranking SVIs by their number of supporting svTF genes in Fig. 3F, all of which are supported by over twenty target TF genes. GO analysis on genes implicated in the top five ranking SVIs, along with their supporting svTF genes, showed diverse enriched pathways in Fig. 3G and Supplementary Table 4. We notice similar enriched pathways for LAMC2-ITGA2 and THBS2-NOTCH3, especially two cancer-related terms, proteoglycans in cancer and PI3K-Akt signaling pathway (entries hsa05205 and hsa04151, adjusted p-values ≤ 0.0001). The remaining three SVIs all enriched immune-related pathways such as antigen processing and presentation (entry hsa04612, adjusted p-value ≤ 0.0001). Here, SPP1-ITGB1 and TGM2-SDC4 and their svTF genes further enrich T-helper cell-related pathways, while INS-INSR and its svTF genes enrich the FoxO signaling pathway (hsa04068, adjusted p-value = 0.0005). Additionally, we compare the biological insights provided by false positive LRIs and svTF-supported SVIs. The false positive LRIs from other three CCI methods enriched 22 pathways, while SPIDER svTF-supported SVIs enriched 47 pathways, with only a small intersection of nine pathways (Supplementary Fig. 8C).
Notably, these biologically enriched SVIs not only highlight key signaling axes but also offer a practical guide for downstream experimental validation and therapeutic exploration. For instance, LAMC2-ITGA2 and THBS2-NOTCH3 along with their supporting svTF genes, which activate cancer-related pathways, represent promising candidates for perturbation assays such as ligand-blocking or CRISPR-mediated disruption, enabling assessment of their spatial functional roles. Moreover, INS-INSR or TGM2-SDC4 and their svTF genes, involved in metabolism and immune regulation, may inform targeted modulation strategies within the tumor microenvironment. By integrating spatial variance and downstream activation, SPIDER effectively narrows the search space for biologically meaningful, testable ligand-receptor interactions with translational potential.
Subsequently, we examine the spatial pattern of SPIDER SVIs against LRI spatial regions identified by SptialDM and stLearn. The deconvoluted cell type annotations of the PDAC dataset served as another validation of the identified SVIs (Supplementary Fig. 8D). As shown in Fig. 3H, SVIs identified by SPIDER, such as SERPINE1-PLAUR and PLAU-ITGA3, captured subregional spatial arrangements of two cancer clones, A and B, dispersed in the tumor region of the sample (Pearson r = 0.873 and 0.764, respectively). Both SVIs aid the understanding of the spatial heterogeneity and the molecular mechanisms underlying PDAC as they involve interactions between genes that are crucial in the tumor microenvironment and are implicated in the pathogenesis and progression of pancreatic duct adenocarcinoma38. These clone-specific SVIs also provide experimentally actionable hypotheses: spatial perturbation of SERPINE1-PLAUR or PLAU-ITGA3 in situ–through blocking antibodies or local gene silencing–can help elucidate the functional consequences of disrupting localized tumor subclone communication. Such targeted validation may clarify the role of these LRIs in maintaining clonal niches and promoting invasion or immune escape, guiding precision therapeutic design in spatially heterogeneous tumors. However, SpatialDM proposed less correlating marker SVIs for the cancer clones, with the Pearson r of the top marker SVIs are 0.697 and 0.517, respectively (Supplementary Fig. 8E). Furthermore, stLearn LRIs demonstrated insignificant correlations with the two cancer clone types. Similarly, SPIDER-proposed SVIs also correlated with major cell types, such as LTF-LRP1 and acinar cells with Pearson r = 0.850, as well as TGM2-SDC4 and ductal cells with Pearson r=0.818. Again, we can observe a higher average correlation with SPIDER SVIs than SpatialDM SVIs and stLearn LRIs, along with other cell types such as the tuft cells, pDCs, and endocrine cells (Supplementary Fig. 8F).
In addition, SPIDER can reveal interactions among annotated cell types from identified SVIs. We observed that 27 SPIDER SVIs exhibited heterotypic signaling between different cell types, while the remaining 15 exhibited homotypic signaling within the same cell type (Supplementary Fig. 8G). Interaction strength is also stronger for heterotypic signaling (Supplementary Fig. 8H). Traditional CCI analysis using scRNA-seq data finds a larger portion of homotypic signaling, suggesting that SVIs could reveal more heterotypic signaling39.
In particular, we observed heterotypic SVIs representing tumor-TME and TME-exclusive interactions (Supplementary Fig. 8I). Agreeing with the metastatic role of the aforementioned PLAUR, the SVI FN1-PLAUR revealed tumor-TME interactions, including the bi-directional cancer-stroma crosstalk, as well as interaction signals from duct epithelium to the cancer region. Conversely, SVI INS-INSR, with INSR being a known regulator of immune responses, showed strong TME interactions but limited interactions with the tumor region - it mainly revealed the interaction between duct epithelium and stroma issue. The biological importance of these SVIs is further validated, with FN1-PLAUR identified by SpatialDM, COMMOT, and stLearn, and INS-INSR recognized by SpatialDM and COMMOT.
LRIs can be categorized into three types: long-distance secreted signaling and short-distance signaling, which encompasses both ECM-receptor interactions and cell-cell contact signaling27. We evaluate SPIDER’s ability to identify both long-distance and short-distance signaling, using SpatialDM as a benchmark. In the PDAC sample, the proportion of secreted signaling in SPIDER SVIs is the highest (0.33 for SPIDER compared to 0.26 for SpatialDM, 0.23 for COMMOT and 0.32 for stLearn). Additionally, a majority of the samples analyzed show larger percentages of secreted signaling in SPIDER SVIs compared to SpatialDM, stLearn, and COMMOT (Supplementary Fig. 9).
Besides identifying LRIs with spatial variance, both SPIDER and SpatialDM can generate interaction patterns from the identified LRIs, as shown in Fig. 3I. The basic evaluations of SPIDER patterns are illustrated in Supplementary Fig. 10A, suggesting SPIDER SVI patterns are satisfying representations of their member SVIs. Subsequently, we further compare the biological relevance and region specificity of the interaction patterns generated by SPIDER and SpatialDM, highlighting their respective abilities to capture meaningful tissue interactions.
We anticipate that certain interaction patterns, akin to SVIs, will reflect intra-cell type interactions, as assessed by the correlation between these patterns and regional annotations (Fig. 3J). Specifically, SPIDER-generated interaction patterns 4 and 3 exhibited strong correlations with the cancer and duct epithelium regions, with Pearson correlation coefficients of 0.822 and 0.648, respectively. In contrast, pattern 2 displayed a significant negative correlation with the cancer region, yielding a Pearson coefficient of -0.802. Conversely, interaction patterns derived from SpatialDM demonstrated weaker correlations with the annotated regions, with the highest Pearson coefficient being 0.626. Thus, the SVI patterns produced by SPIDER more effectively capture the interaction differences within tissue regions.
SPIDER interaction patterns also demonstrate interactions between cell types (Supplementary Fig. 10B). In particular, pattern 1 and pattern 2 revealed tumor-TME (Tumor Microenvironment) and TME-exclusive interactions similar to SVIs (Fig. 3K). Here, pattern 1 represents the interaction between the cancer region and duct epithelium (adjusted p-value ≤ 0.001, Supplementary Fig. 10C). Similarly, pattern 0 represents the interaction between cancer and stroma, while pattern 3 represents the stroma-epithelium interaction (adjusted p-values ≤ 0.0001, Supplementary Fig. 10C).
We conducted additional analyses comparing SVI identified by SPIDER, SpatialDM, stLearn, and COMMOT using samples from the DLPFC dataset and other datasets (Supplementary Note 5). SVIs supported by svTFs in SPIDER generally exhibited higher spatial and TF correlation metrics than those from SpatialDM, stLearn, and COMMOT. In terms of finding LRIs supported by svTFs, SPIDER received the highest level of svTF support, and generally reduced the average number of ligands/receptors per receptor/ligand. SPIDER SVI patterns also showed stronger correlations with cell types than the SpatialDM-derived patterns in the DLPFC samples.
SPIDER identifies svTF-supported spatially variable ligand-receptor interactions from multiple samples in single-cell resolution
This section serves as a comprehensive assessment of SPIDER’s capacity to identify svTF-supported SVIs across samples from different platforms and biological repeats. To test SPIDER’s robustness against biological repeats, we apply SPIDER on the seqFISH mouse organogenesis dataset with three organogenesis samples, each with two biological repeats29. Furthermore, we showcase SPIDER’s robustness against sequencing platforms with the Stereo-seq mouse brain dataset almost at a single-cell resolution30.
On the seqFISH mouse organogenesis dataset, we first assess the quality of the identified SVIs using both spatial-variance metrics and svTF correlations. In terms of the spatial variance of SPIDER-proposed SVIs, we found significantly higher scores from SVIs than those excluded in all five spatial variance metrics in all samples (Fig. 4A). Furthermore, Fig. 4A shows that the identified SVIs demonstrate significantly higher svTF correlations than those excluded. Additionally, SpatialDM and stLearn both proposed interactions lacking svTF support, receiving mean FPRs of 0.344 and 0.766 when comparing to LRI svTF support (Supplementary Fig. 11A, B). The level of TF support, quantified by the number of supporting svTF genes for an LRI, is also lower for SpatialDM and stLearn, as shown in Supplementary Fig. 11C.
A–K Validation of SPIDER on identifying TF-supported SVIs on multiple seqFISH mouse samples at the single-cell resolution. A The boxplot of five SVI evaluation metrics and one TF correlation metric (n=282 LRIs). B The stacked bar plot showing TF supports for the top 10 SVIs across samples, ranked by the number of supports. C The dot plot lists pathways enriched by each SVI and its supporting spatially variable TF genes. D SVI related to the Fzd2 receptor, with different supporting TF genes and correlations. E The location of the annotated erythroid region in three embryo samples (z2) and the profile of the top SVI with the highest correlation Kitl-Epor. F The location of the annotated cardiomyocyte region in three embryo samples (z2) and the profile of the top SVI with the highest correlation Bmp7-Acvr2a/Acvr1. G The annotated sample embryo 1 (z2) of the embryo dataset, the location of the annotated brain region composed of the forebrain, midbrain, and hindbrain, and the profiles of the regional SVIs Fgf15-Fgfr1 and Sfrp1-Fzd2. H Clustering of spots based on identified SVIs. I, J Sub-clusters within the brain region and the spinal cord region, respectively. K Marker SVIs for the main clusters involved in the brain region and the spinal cord region, respectively. L–P. Validation of SPIDER on identifying TF-supported SVIs on the Stereo-seq mouse sample at approximately single-cell resolution. L Annotation of the Stereo-seq mouse sample. M The boxplot of five SVI evaluation metrics and one TF correlation metric (n=1295 LRIs). N The dot plot lists marker SVIs of inter-cell type interfaces. O Clustering of cells based on identified SVIs. P Cell types within SVI-defined Cluster 0 and 6, respectively. The boxplots display the median (center line), the 25 and 75th percentiles (box bounds), whiskers extending to the most extreme data points within 1.5 × the interquartile range, minima and maxima as the lowest and highest points within the whiskers, and outliers as individual points beyond the whiskers. The statistical significance of box plots is calculated using one-sided Mann-Whitney-Wilcoxon test with Benjamini-Hochberg correction, with the exact adjusted p-values listed in Supplementary Table 5 and the following significance annotations: ****: adjusted p-value ≤ 1.00e-04; ***: 1.00e-04 < adjusted p-value ≤ 1.00e-03; **: 1.00e-03 < adjusted p-value ≤ 1.00e-02; *: 1.00e-02 < adjusted p-value ≤ 5.00e-02. For bar plots, data are presented as mean values ± 95% confidence intervals. Corr: Pearson correlation coefficient. Inter: interneuron-selective cells; GABA: long-projecting GABAergic Cell; Astro: astrocyte; CR: Cajal-Retzius Cell; Oligo: oligodendrocyte. Source data are provided as a Source Data file.
Aside from the benchmarking metrics, we further evaluate the biological validity of TF-supported SVI activation. Ranking SVIs by their number of supporting svTF genes in all six samples allows us to evaluate the level of support, as shown in Supplementary Fig. 11D. The top five SVIs received support from over twenty target TF genes (Fig. 4B), namely Sftp1-Fzd2, Bmp7-Acvr1, Bmp2-Acvr1, Wnt5a-Fzd2, and Apln-Aplnr. We apply GO analysis on genes implicated in the above SVIs along with the corresponding supporting svTF genes (Supplementary Table 6). All five SVIs and svTFs enrich PI3K/Akt signal transduction pathway and signaling pathways regulating pluripotency of stem cells (entries mmu04151 and mmu04550, adjusted p-values ≤ 0.05, Fig. 4C), in accordance with the developing state of samples. Additionally, the two Fzd2-related SVIs and their supporting svTF genes enrich the Wnt signaling pathways (entry mmu04310, adjusted p-value ≤ 0.001), and the two Bmp-Acvr SVIs enrich TGF-β signaling pathways (entry mmu04350, adjusted p-value ≤ 0.01), with both pathways known to participate in embryonic development.
Furthermore, we examine the difference between active and non-active SVIs. Specifically, supporting svTF genes vary among LRIs of the same receptor, potentially providing diverse information for underlying function activation. Take receptor Fgfr2 in Embryo 1 (z2) as an example (Fig. 4D and Supplementary Fig. 11E): while Fgf5-Fgfr2 is not supported by any svTF genes, Fgf10-Fgfr2 and Fgf17-Fgfr2 are most supported by Ldhd and Aldob, with Pearson r of 0.396 and 0.410, respectively. Moreover, Fgf5-Fgfr2 is rejected by svTFs in most samples. However, Fgf5-Fgfr2 is identified by SpatialDM in two samples, and stLearn in all six samples.
This concurs with the distinct roles of FGFs in development. Specifically, Fgf5 interacts primarily with Fgfr1 to regulate hair cycling40, without functional evidence in organogenesis, indicating its limited interaction with Fgfr2 and lack of supporting transcription factors in that context. Conversely, Fgf10 and Fgf17 are critical for organ development, with Fgf10 playing a vital role in lung branching morphogenesis and Fgf17 being implicated in cerebellar development41. The presence of supporting svTFs for Fgf10-Fgfr2 and Fgf17-Fgfr2 underscores the necessity of these factors in facilitating their interactions with Fgfr2, highlighting the importance of context-dependent regulatory mechanisms in these signaling pathways. A similar difference can be observed in other Fgf receptors (Supplementary Fig. 11F–J).
Now that we have validated SPIDER’s ability to identify high quality SVIs from single-cell samples, we can access its robustness as drawing similar results from bulk samples. To this end, we constructed bulk samples by aggregating neighboring cells into spots using SOM for the six single-cell samples. On the aggregated bulk sample, we also identified high-quality SVIs, with average Jaccard similarity between the single-cell and bulk SVI lists as 0.7278 for five-cell-per-spot aggregation and 0.6683 for ten-cell-per-spot aggregation (Supplementary Fig. 12A, B). Across all samples, eight of ten SVIs with the most supporting svTF genes are consistent in both aggregations, with Sftp1-Fzd2, Bmp7-Acvr1, Bmp2-Acvr1, Wnt5a-Fzd2, and Apln-Aplnr still received support from over twenty target TF genes (Supplementary Fig. 12C, D). Therefore, both the quality and level of TF support for svTF-supported SVIs are rather consistent between single-cell and bulk samples.
After examining the supporting svTF genes for SVIs, we proceed to the assessment of SVIs in terms of biological validity, demonstrated by their regional presence related to annotated cell types. First, SPIDER identified SVIs that correlate with cell types across all samples. Figure 4E shows relatively strong correlations (highest Pearson r = 0.917) between erythroid and the SVI Kitl-Epor, in accordance with the importance of Kitl-Epor interaction in erythropoiesis42. Further validating this finding, Kitl-Epor has strong correlations across all samples as shown in Supplementary Fig. 13A. Similarly, agreeing with bone morphogenetic protein (BMP) signaling’s participation in heart development, the BMP-related SVIs Bmp7-Acvr2a and Bmp7-Acvr1 correlate moderately with the spatial distribution of cardiomyocytes as shown in Fig. 4F43. Again, Bmp2/7-Acvr1/2a interactions are found across samples (Supplementary Fig. 13B) with the highest Pearson r at 0.729.
Aside from the above cell types, we also observed other cell types with correlated SVIs. In the endothelium region, we found a constant presence of Alpn-Alpnr interactions across all six samples, as shown in Supplementary Fig. 13C (highest Pearson r=0.616). Similarly, in the gut tube region, we found the presence of Cdh-Fgfr1 interactions in five out of six samples (Supplementary Fig. 13D, highest Pearson r = 0.861).
In addition, SPIDER proposed SVIs marking subregions within cell types. For example, within the annotated brain region composed of the forebrain, midbrain, and hindbrain, SPIDER identified SVIs with regional presence. In Embryo 1 (z2), SPIDER-proposed Fgf15-Fgfr1 and Sfrp1-Fzd2 are located separately in the midbrain and hindbrain, moderately correlating with the brain region (Pearson r = 0.519 and 0.505, Fig. 4G). Validating our finding, Fgf15 is a known regulator of neurogenesis in the midbrain44. Its region presence in the midbrain can be observed in other samples (Supplementary Fig. 13E). These region-specific SVIs offer direct hypotheses for functional developmental studies. For example, Fgf15-Fgfr1 and Fgf17-Fgfr3, enriched in distinct brain subregions, could be targeted using spatially resolved gene perturbation or reporter assays to study their role in neuronal differentiation or regional patterning. The ability of SPIDER to map such localized, functionally supported interactions enables focused experimental interrogation of developmental signaling landscapes.
To further investigate the regional feature of SVIs, we used SVI to perform cell clustering using the Leiden algorithm implemented in Scanpy45 as detailed in Methods. To demonstrate the effectiveness of SVIs in revealing interaction-based clusters, we also performed cell clustering using genes implicated by the identified SVIs. Based on both AMI and ARI metrics evaluated against ground truth annotations, SVI-based clustering achieved significantly higher performance (mean AMI = 0.5957, mean ARI = 0.3456; see Supplementary Fig. 14A) compared to clustering using only the implicated genes (mean AMI = 0.2783, mean ARI = 0.1368). These results indicate that while both gene- and interaction-based clustering capture tissue heterogeneity, SVI-based clustering provides additional insights into distinct cell states.
Using Embryo 1 (z2) as an example, we further assess the biological validity of SVI-based clusters compared to clusters derived from genes implicated in these SVIs (Fig. 4H; Supplementary Fig. 14B). Notably, SVI-based clustering identified major clusters corresponding to annotated brain and spinal cord regions (Fig. 4I, J), which were not detected by clustering based solely on the implicated genes (Supplementary Fig. 14C). In accordance with the aforementioned regional SVIs within the brain region, clustering based on SVIs separated the brain region into clusters 9, 10, 7, and 3, representing the hindbrain, midbrain, and two sub-regions within the forebrain, respectively. Similarly, the spinal cord also consisted of four SVI-defined subregions, which could be related to the separation of the anterior, dorsal, and ventral regions of the spinal cord.
We also found distinct marker SVIs of the subclusters in the brain region (Fig. 4K). In particular, our findings of marker SVIs Fgf17-Fgfr3 of the hindbrain and Fgf17-Fgfr2 of the midbrain concord with prior reports documenting a regional separation of these receptors46. Specifically, this separation concurs with the region-specific functions of these FGF-receptor interactions in governing neuronal differentiation and spatial patterning during embryonic brain development.
Subsequently, we test SPIDER on a different ST sequencing platform Stereo-seq, with near single-cell resolution. Cell type annotations on this mouse brain dataset30 are displayed in Fig. 4L. SPIDER again identified high-quality SVI as indicated by five out of the six metrics shown in Fig. 4M. The levels of TF support in results from SpatialDM and stLearn are significant lower compared to SPIDER svTF-supported LRIs, as shown in Supplementary Fig. 15A. Additionally, the false positive LRIs from SpatialDM and stLearn enriched 21 pathways, while SPIDER svTF-supported SVIs enriched 62 pathways, with a small intersection of fourteen pathways (Supplementary Fig. 15B). In pathways uniquely enriched by either gene sets, we found only one signaling pathway from false positive LRIs, and eleven from SPIDER svTF-supported SVIs. Therefore, SPIDER is able to provide more biological insight with TF incorporation.
Further validating the biological importance of SPIDER svTF-supported SVIs, we found marker SVIs for all cell types (Supplementary Fig. 15C). For example, the correlation between mural cell and Prg4-Cd44 reaches 0.863, and the correlation between oligodendrocyte and Trf-Tfrc reaches 0.821. In particular, Trf-Tfrc is known to promote oligodendrocyte maturation and proliferation47.
On this dataset, we explore SPIDER SVI’s potential in revealing cross-cell-type interactions. That is, aside from SVIs marking cell type and sub-cell type regions, we can also observe interactions between cell types (Fig. 4N). For example, Npy-Npy2r marks most cross-cell type interactions involving the interneuron-selective cells, a common neuromodulator for interneurons to inhibit postsynaptic neurons48. Conversely, the interaction between interneuron-selective cells and oligodendrocytes is uniquely marked by Rtn2-Rtn4r, with Rtn4r being an important regulator of oligodendrocytes49.
Furthermore, the dendrogram in Fig. 4N suggests distinct interaction among groups of cell types. Agreeing with the marker SVIs identified above, one group involves oligodendrocytes and interneuron-selective cells-related interactions. Conversely, mural cell-related interactions are closer in the dendrogram.
We further explore cross-cell-type SVIs with interaction regions. Similar to the previous dataset, we used SVI to perform cell clustering using the Leiden algorithm, achieving an ARI of 0.5047 and an AMI of 0.5923 with the spot annotation (Fig. 4O). This interaction-based clustering revealed two distinct clusters of multi-cellular interactions in the brain. Cluster 3 was predominantly defined by interactions among interneuron-selective cells, long-projecting GABAergic neurons, and astrocytes (Fig. 4P, left). This concurred with previous hierarchical clustering, especially since cluster 3 is marked by Lefty1-Acvr2a and Npy-Npy2r (Supplementary Fig. 15D) - marker SVIs of GABAergic-astrocyte and interneuron-GABA/astrocyte communications, respectively. Similarly, cluster 5 describing the crosstalk between Cajal-Retzius cells and oligodendrocytes (Fig. 4P, right) is also marked by SVI Efnb3-Ephb1 identified as a marker for oligodendrocytes-Cajal-Retzius cell interaction.
Under different clustering parameters, we can observe SVI clusters marking subregions within annotated cell types (Supplementary Fig. 15E). For example, we found the SVI cluster 6 in Fig. 4O consistently revealing a subregion within the annotated oligodendrocyte region, as shown in Supplementary Fig. 15F. Furthermore, cluster 6 is consistently marked by the dopamine signaling SVI Gnal2-Drd2 (Supplementary Fig. 15G). This SVI-indicated oligodendrocyte subregion agrees with the function of dopamine signaling in maintaining healthy oligodendrocyte function and supporting neuronal networks50.
Supplementary Note 6 provides a further discussion on SVI patterns for both datasets. The mouse embryo samples revealed a limited number of SVIs due to the small gene count, with SPIDER generating 4-6 SVI patterns per sample. While the SVI patterns captured spatial similarities within larger organ regions instead of distinct cell types, they still identified brain regions in embryos. In the SAW mouse brain sample, although the patterns were also fewer than the cell types, they effectively marked Cajal-Retzius cells, oligodendrocytes, and mural cells.
SPIDER SVI pattern clustering reveals sub-cell type clusters on the slide-seq V2 mouse brain dataset
We demonstrated SPIDER’s ability to cluster SVIs with similar spatial distributions on the slide-seq V2 mouse brain data with single-cell resolution8 as shown in Fig. 5A. SPIDER identified 119 high-quality svTF-supported SVI out of 163 LRIs, as indicated by all six metrics shown in Supplementary Fig. 16A. We can again identify marker SVIs for major cell types (Supplementary Fig. 16B) such as the SVIs Nlgn1-Nrxn3 marking the CA1/CA2/CA3/subiculum and Calm2-Cacna1c marking dentate pyramids.
A The cell type annotation of the sample. B The boxplot of the correlations between the activation strength and the profiles of member and non-member SVIs in each pattern shows that the activation strength represents the member SVIs (n=163 LRIs). C Deconvoluted cell types and their corresponding patterns are displayed, with annotated correlation coefficients for each pattern. For each pattern, two member SVIs having the highest correlation coefficients with the activation strength are plotted. D Top ten enriched pathways for genes contained in each SVI pattern, under the significance threshold of adjusted p-value ≤ 0.05. E Cell clustering using SVI patterns. F Cell type compositions in SVI pattern-based clusters. Clusters having a cell type with a majority of over 30% are displayed. G–I Clusters involved in the cell types and their top three marker SVIs. G Oligodendrocytes. H CA1/CA2/CA3 subiculum. I Astrocytes. Corr: Pearson correlation coefficient. The boxplots display the median (center line), the 25 and 75th percentiles (box bounds), whiskers extending to the most extreme data points within 1.5 × the interquartile range, minima and maxima as the lowest and highest points within the whiskers, and outliers as individual points beyond the whiskers. The statistical significance of box plots is calculated using one-sided Mann-Whitney-Wilcoxon test with Benjamini-Hochberg correction, with the exact adjusted p-values listed in Supplementary Table 7 and the following significance annotations: ****: adjusted p-value ≤ 1.00e-04; ***: 1.00e-04 < adjusted p-value ≤ 1.00e-03; **: 1.00e-03 < adjusted p-value ≤ 1.00e-02; *: 1.00e-02 < adjusted p-value ≤ 5.00e-02. Source data are provided as a Source Data file.
From the identified SVIs, SPIDER generated seven SVI patterns, each with an activation strength aggregated from the profiles of SVI members (Supplementary Fig. 16C). We first evaluated the quality of the inferred activation strength of a pattern by their correlations with profiles of member SVIs, compared to the rest of SVIs. The boxplot in Fig. 5B showed significantly higher correlations between the activation strength and member SVIs compared to the non-member SVIs.
Correlations between deconvoluted cell types and SVI patterns (Fig. 5C and Supplementary Fig. 16D) suggested that some patterns capture spatial heterogeneity similarly to gene expressions and SVIs. For example, pattern 2 strongly correlated with oligodendrocytes (Pearson r = 0.760). Examination of member SVIs in pattern 2 validated its fidelity in representing member SVI profiles, with high correlations between activation strengths and the pattern (Pearson r =0.837 and 0.834). While pattern 3 modestly correlated with ependymal cells (Pearson r = 0.322), one member SVI, Tac1-Tacr1, has been experimentally validated to relate to ependymal cell activation51, demonstrating the potential biological relevance even of patterns with weaker cell type correlations.
We also performed gene enrichment analysis with respect to patterns (Fig. 5D). For example, pattern 1 correlates moderately with the dentate pyramids (Pearson r=0.518), representing the interaction between neurotrophins Bdnf and Ntfk3, indicating a potential functional role of neurotrophins in the dentate pyramids52. Summarizing all SVI-implicated genes in pattern 1, we found enriched neuron-related pathways, such as axon guidance, neuroactive ligand-receptor interaction, and neurotrophin signaling pathway (mmu04360, mmu04080, and mmu04722, adjusted p-values ≤ 0.01), in accordance with the abundant neurons in the represented regions. Similarly, for smaller cell populations such as the endothelial tip cells, SPIDER also found SVI pattern 5 with similar spatial distributions (Pearson’s r = 0.362). In pattern 1, we found enriched TGF-β signaling (mmu04350, adjusted p-value ≤ 0.001), agreeing with the signaling role of TGF-β receptors for endothelial tip cells within the neural environment53.
To further demonstrate how the SVI patterns represent spatial interaction modules, we performed cell clustering with SVI patterns using the Leiden algorithm. We obtained 23 cell clusters from SVI patterns, as shown in Fig. 5E, among which 11 clusters describe the major cell types (Fig. 5F). For example, given the high correlation between cell types and patterns, endothelial tip and dentate pyramids are represented by clusters 17 and 15, respectively.
To further validate that the SVI patterns represent spatially organized interaction modules, we performed cell clustering using the Leiden algorithm based on the SVI patterns. This resulted in 23 clusters (Fig. 5E), with 11 associated with major cell types (Fig. 5F). For example, consistent with their high correlations to patterns, endothelial tip cells, and dentate pyramidal cells were distinctly represented by clusters 17 and 15, respectively.
Furthermore, the SVI patterns suggest subpopulation structure within major cell types astrocytes, CA1/CA2/CA3/subiculum, and oligodendrocytes. Regarding oligodendrocytes, the main regions were characterized by clusters 14 and 6, while the sparser regions were associated with cluster 5 (Fig. 5G). Cluster 14 was defined by the MAG interaction, consistent with MAG being a marker of myelination and mature oligodendrocytes54. In contrast, cluster 5 was characterized by EphA-Ephrin signaling, which is required prior to myelination. These results indicate the SVI patterns capture interaction variation within oligodendrocyte populations related to distinct developmental and functional states.
Similarly, three clusters are observed in the CA1/CA2/CA3/subiculum (Fig. 5H). Here, cluster 13 encompasses multiple Calm2-related SVIs, such as the SVI Calm2-Kcnq3 that is crucial for the proper assembly and functionality of KCNQ2/KCNQ3 channels. These channels are vital for stabilizing neuronal membrane potentials and supporting the cognitive functions associated with the CA1, CA2, CA3, and subiculum regions55. Contrarily, cluster 16, possibly representing the CA1 region, is marked by Silt2 interactions that establish synaptic specificity in hippocampal CA156. Such sub-clusters can also be observed for astrocytes in (Fig. 5I). Overall, SPIDER SVI patterns are able to produce interaction-based clusters that represent interaction diversity within cell types.
SPIDER identifies svTF-supported SVIs from multiple samples in bulk resolution
In this section, we show that SPIDER is not restricted to single-cell resolution datasets by incorporating intra-spot interfaces. We use two datasets to demonstrate these properties: a HER2-positive breast cancer dataset with annotated cancerous and non-cancerous regions57, and the aforementioned DLPFC dataset with layered structures of spot clusters. In addition, a previous study showed that the inclusion of CCI information improves psduotime and trajectory inference in scRNA-seq data, given the important regulatory function of CCI in cell differentiation and developmental processes58. Therefore, we also explore the application of SVI in revealing diffusion pseudotime and trajectory compared to gene expression.
SPIDER identified high-quality SVIs across all six samples in the breast cancer dataset, as shown by the six metrics in Fig. 6A. Furthermore, we can find marker SVIs for cell-type annotations across samples (Supplementary Fig. 17). For biological validation of SVIS, we take sample G2 as an example, which contains both invasive and in situ cancer, as well as immune infiltrated regions (Fig. 6B). We find the presence of SVI ZG16B-CXCR4 in the region of immune cell infiltration (correlation coefficient at 0.431), with CXCR4 has also been found to relate to T-lymphocyte infiltration and immunotherapy in metastatic breast cancer59. Similarly, cancer in situ and invasive cancer regions are marked by different SVIs HSP90AA1-ERBB2 and THBS2-ITGB1 with correlation coefficients of 0.824 and 0.697, respectively. In particular, ERBB2, also known as HER2, is a biomarker for HER2-positive breast cancer, and THBS2-ITGB1 interaction is known to regulate tumor invasion in breast cancer60.
A–D. Validation of SPIDER on identifying TF-supported SVIs on multiple breast cancer samples at the bulk resolution. A The boxplot of five SVI evaluation metrics and one TF correlation metric (n=490, 487, 2117, 944, 563, and 647 LRIs for samples A1 to G2, respectively). B Marker SVIs for the main clusters involved in sample G2 with correlation coefficients higher than 0.3. C Marker SVIs for invasive cancer and cancer in situ that are consistent across samples, shown on samples A1 and G2. D The dot plot lists pathways enriched by genes implicated in constant cancer marker SVIs. E Pseudotime results based on SVIs (left) and gene expression (right) shown on samples A1 and G2. F The barplot of correlations between cancer/TME labels and pseudotime from SVI or gene expression across all samples. G Boxplots showing pseudotime distributions with respect to cancer/TME labels (n=307, 140, 20 spots for TME, Invasive Cancer, and Cancer In Situ, respectively). H–L Validation of SPIDER on identifying TF-supported SVIs on multiple DLPFC samples at the bulk resolution. H The boxplot of five SVI evaluation metrics and one TF correlation metric (n=1584, 1416, and 1454 LRIs for sample 151673, 151510, and 151672, respectively). I Correlation heatmap of white matter and layer 3 regions on three samples with top three white matter marker SVIs (left) and layer 3 marker SVIs (right). The white color indicates the corresponding SVI is missing in the sample. J The dot plot lists pathways enriched by genes implicated in constant region marker SVIs. K Trajectories inferred with gene expression (top) and SVIs (bottom) on sample 151673. L The barplot showing AUROC scores of trajectories inferred from gene expression and SVIs on three samples. Corr: Pearson correlation coefficient; WM: white matter. The boxplots display the median (center line), the 2 and 75th percentiles (box bounds), whiskers extending to the most extreme data points within 1.5 × the interquartile range, minima and maxima as the lowest and highest points within the whiskers, and outliers as individual points beyond the whiskers. The statistical significance of box plots is calculated using one-sided Mann-Whitney-Wilcoxon test with Benjamini-Hochberg correction, with the exact adjusted p-values listed in Supplementary Table 8 and the following significance annotations: ****: adjusted p-value ≤ 1.00e-04; ***: 1.00e-04 < adjusted p-value ≤ 1.00e-03; **: 1.00e-03 < adjusted p-value ≤ 1.00e-02; *: 1.00e-02 < adjusted p-value ≤ 5.00e-02. Source data are provided as a Source Data file.
Notably, SPIDER is able to produce distinct marker SVIs that represent the diversity between invasive cancer and cancer in situ (Supplementary Fig. 18). Furthermore, such cancer marker SVIs are consistent across samples, as shown in Fig. 6C. The six consistent marker SVIs for invasive cancer have an average correlation coefficient of 0.464, much higher compared to the average correlation coefficient with cancer in situ region of −0.046. Similarly, we found 28 consistent marker SVIs for the cancer in situ region. These SVIs have an average correlation coefficient of 0.492 with cancer in situ region, also higher than the average correlation coefficient with invasive cancer of −0.029.
GO analysis on the above consistent cancer marker SVIs further revealed functional differences between cancer in situ and invasive cancer as shown in Fig. 6D, while sharing the enrichment of pathways in cancer with adjusted p-values ≤ 0.05. For invasive cancer, the SVI-implicated genes enrich the pathway of fluid shear stress and atherosclerosis (entry hsa05418, adjusted p-value ≤ 0.05), which has been found to promote breast cancer cell proliferation and invasive potential61. Similarly, the enrichment of thyroid hormone signaling pathway (entry hsa04919, adjusted p-value ≤ 0.05) by the interaction involving ITGAV, suggests the activation of processes critical for metastasis, such as epithelial-mesenchymal transition62. Conversely, genes from SVIs marking cancer in situ region enrich pathways whose dysregulation often contributes to the invasive progression of in situ cancer63.
Subsequently, we inspect if SVI can be used to generate pseudotime results in a similar manner to gene expression. In cancerous ST samples, pseudotime represents tumor transition revealed by gene expression changes64. Similarly, pseudotime from interaction information reveals the temporal evolution of intercellular signaling networks within the tumor microenvironment. Setting invasive cancer as the root for pseudotime, we obtained pseudotime results from both SVI and gene expression for all samples (Fig. 6E and Supplementary Fig. 19A). We evaluate the quality of the generated pseudotime by its correlation with spot labels ordered by invasive cancer, TME, and cancer in situ. Across all samples, SVI pseudotime achieved higher correlations with ordered labels compared to gene pseudotime, with mean correlations across samples as 0.5752 and 0.3967, respectively (Fig. 6F). The better performance from SVIs could be justified by the aforementioned consistent presence of cancer marker SVIs. Subsequently, we evaluate the biological significance of SVI pseudotime results. Agreeing with previous findings65, the pseudotime between invasive and in situ cancer showed the great divergence (Fig. 6G and Supplementary Fig. 19B), suggesting that SVI can serve as a valid counterpart of gene expression.
To test if SPIDER can identify SVIs with different spatial patterns, we also apply SPIDER on the DLPFC dataset with clustered and layered spatial structure, unlike the rather sparse annotations from the breast cancer dataset. Again, SPIDER identified high-quality SVIs across all three samples in the DLPFC dataset, as shown by the six metrics in Fig. 6H. We can also observe SVIs marking known brain layers (Supplementary Fig. 20). For example. the neuronal interactions LPL-LRP1 and PSAP-LRP1 are present in layers 3 and 1 from sample 151673 (Pearson r=0.632 and 0.480, respectively).
Furthermore, we demonstrated SPIDER’s ability to identify SVIs that differentiate brain regions in multiple samples (Fig. 6I). Specifically, six SVIs consistently correlated with white matter across all samples (mean Pearson r=0.720). Additionally, 26 SVIs correlated with layer 3 in two of three samples (mean Pearson r = 0.395). The higher positive correlations between SVIs and their respective regions, compared to lower negative correlations with other regions, provide statistical evidence that SPIDER can delineate discrete brain regions like white matter and layer 3 based on SVIs.
Again, we conducted GO analysis on the identified marker SVIs, which revealed functional differences between white matter and layer 3 regions (Fig. 6J). Specifically, pathways enriched in white matter markers primarily supported structural integrity (adjusted p-values ≤ 0.01), while layer 3 markers focused on synaptic interactions and cognitive functions (adjusted p-values ≤ 0.0001). Notably, the enrichment of neuroactive ligand-receptor interaction and calcium signaling pathways in layer 3 concurs with its role in synaptic interactions. Similarly, the enriched long-term potentiation and circadian entrainment pathways in layer 3 agree with its central role in cognitive processing.
The DLPFC dataset is characterized by its linear development trajectory66. Therefore, we inspect if SVI can be used to reveal the linear trajectory in a similar manner to gene expression. Trajectory analysis based on gene expression reveals the gene expression patterns and functional roles of cells across different cortical layers, while trajectory analysis based on SVIs uncovers the communication patterns between these layers, helping us understand their coordination in maintaining normal brain function. We obtained trajectory results from both SVI and gene expression for all samples (Fig. 6K and Supplementary Fig. 21A). We evaluate the quality of the generated trajectories with the AUROC score on the edge weights against ground truth connections between consecutive layers. Across all samples, SVI trajectory achieved higher AUROC scores, with mean correlations across samples as 0.9833 and 0.9065, respectively (Fig. 6L). Therefore, SVI can also generate valid trajectory as gene expression, with a possible advantage in this case coming from the spatial constraint of interactions.
Supplementary Note 7 provides further discussion on the SVI patterns obtained for both datasets. Similar to our finding based on SVIs, the SVI patterns in the breast cancer dataset also distinctly mark regions of invasive and in situ cancer. For the DLPFC dataset, we observed SVI patterns marking the white matter region across all three samples.
Discussion
Identifying functionally activated spatially variable interactions is crucial for understanding the relationship between tissue structure and cell-talk functions. In this paper, we introduce SPIDER, a computational method that accurately identifies spatially variable interactions (SVIs) with the support of svTF genes. SPIDER identifies capacity-constrained cell-cell interaction interfaces, and scores LRIs with interface-adapted COT and LR gene co-expression. SPIDER constructs the abstract interface graph by SOM, and integrates six statistical models to produce SVI candidates20,21,22,23,24, leveraging the Gaussian process with various covariance kernels, nonparametric covariance tests, and hidden Markov random fields. SPIDER further screens for SVI candidates that are supported by the activation of target svTF genes, reducing the number of false positives.
SPIDER performed well at identifying svTF-supported SVIs compared to other CCI methods, both in simulations and real datasets. In simulation testing across varying levels of supported and unsupported svTF genes, interaction strength, and noise, SPIDER generally achieved higher AUC values among SpatialDM, SpatialCorr, SpaTalk, and CellChat. In addition to the simulation data, we applied SPIDER on six datasets from four platforms varying in spot numbers and resolutions (Supplementary Table 9). SPIDER-identified svTF-supported generally exhibited higher spatial autocorrelation scores and stronger correlations with downstream target svTF genes than those from SpatialDM. These results demonstrate SPIDER’s ability to accurately recognize svTF-supported SVIs over alternative CCI methods in multiple testing scenarios.
The importance of screening for functionally active SVIs becomes evident in the context of interaction-based analyses, particularly when assessing the relevance and biological significance of these interactions. In the PDAC dataset, among LRIs related to TSPAN14/15 signaling, only the interaction APP-TSPAN15 is supported by a downstream target. Additionally, in a mouse embryo dataset, SPIDER effectively excluded a false positive interaction Fgf5-Fgfr2 with limited relevance to organogenesis due to the absence of svTF support. In contrast, SpatialDM exhibited significantly lower TF support compared to SPIDER-proposed LRIs, with many lacking supporting svTF genes. By concentrating on svTF-supported LRIs, SPIDER reduces the average number of ligands per receptor and receptors per ligand, demonstrating its ability to identify biologically meaningful interactions. The top-ranking SVIs, supported by multiple svTF genes, reveal diverse enriched pathways, highlighting their distinct biological roles in cooperation with downstream targets. These findings emphasize SPIDER’s effectiveness in identifying biologically relevant svTF-supported LRIs and the essential role of TF support in functional pathway analysis.
SPIDER SVIs demonstrate both heterotypic and homotypic signaling characteristics, providing insights into cellular interactions across diverse biological contexts. In a pancreatic ductal adenocarcinoma (PDAC) dataset, approximately 64% of SPIDER SVIs represented heterotypic interactions, with stronger interaction strength compared to homotypic signaling. For homotypic interactions, correlations between SPIDER SVIs and gene-driven spot annotations validated that these SVIs effectively capture interactions within specific cell types. Clustering analyses further confirmed the presence of both signaling types. In a mouse embryo dataset, SPIDER identified SVIs that delineated sub-regions of the brain and spinal cord. Similarly, in the SAW mouse brain dataset, SPIDER-generated SVIs revealed regions composed of mixed cell types. The identification of both heterotypic and homotypic signaling is crucial for understanding cell type coordination, particularly within heterogeneous tissues like tumors.
The SVI patterns generated by SPIDER uncover spatial interaction modules that enhance our understanding of the cooperative functions of SVIs and cell types. By grouping SVIs with similar spatial distributions, we can achieve higher-level functional annotations based on these interactions. In the slide-seq mouse brain dataset, SPIDER identified significant interaction modules, such as neurotrophins in the dentate pyramids and TGF-β signaling in endothelial tip cells. Additionally, SVI patterns facilitate interaction-based cell annotation. A comparison with SpatialDM in the PDAC dataset showed that SPIDER effectively delineates distinct SVI patterns linked to specific tissue regions, while SpatialDM patterns lacked this regional specificity. The SVI patterns in the slide-seq dataset also indicated sub-clusters within gene-based annotations, revealing the presence of distinct marker SVIs.
The evaluation of SPIDER’s runtime across key steps–scoring interfaces, scoring TF genes, and identifying SVI candidates–highlights important insights into computational efficiency. The runtime of the above steps on a single CPU core is reported for each sample in Supplementary Table 10. As detailed in Supplementary Note 8, a strong linear relationship was found between the number of interfaces and the runtime for interface scoring and SVI candidate identification. In contrast, the number of LRIs showed a weaker correlation. For svTF scoring, the number of LRIs was a more substantial factor, but the number of interfaces still played a role. This quantitative observation further supports that interface screening and abstraction can substantially enhance computational efficiency for SPIDER. However, interface abstraction presents a trade-off between LRI profile accuracy and runtime efficiency (Supplementary Note 9). While it reduces the number of interfaces and significantly decreases identification time, it also smooths out LRI signals, potentially masking important variations. Evaluation metrics indicate that abstracted interfaces have lower expression variability than original interfaces. Thus, while abstraction enhances efficiency, users should consider its impact on biological accuracy, especially in smaller datasets or specific regions of interest.
Compared to existing methods, SPIDER offers a more flexible framework for analyzing SVIs, facilitated by the approaches of interface identification and svTF validation. Our SPIDER framework allows users to utilize different interface scoring methods, as the underlying assumptions may influence the results67. Additionally, SPIDER enables customization of spatial variance testing methods, considering the specific advantages and limitations of each approach in relation to data size and noise levels. By combining results from multiple spatial variance tests, SPIDER can reduce discrepancies that often arise when calling statistically significant spatial variance35. This adaptability not only enhances the robustness of SVI identification but also facilitates the incorporation of advanced CCI methods.
In conclusion, SPIDER offers a powerful approach for analyzing SVIs in ST data, significantly enhancing our understanding of cellular interactions within complex tissues. By accurately identifying functionally activated SVIs supported by svTF genes, SPIDER minimizes false positives and captures biologically relevant interactions that are critical for elucidating tissue structure and function. Its ability to discern both heterotypic and homotypic signaling enriches our insights into cell type coordination, particularly in heterogeneous environments like tumors. Furthermore, the identification of distinct SVI patterns facilitates higher-level functional annotations and interaction-based cell annotations, allowing researchers to uncover meaningful biological modules. Overall, SPIDER serves as a valuable tool for exploring the intricate dynamics of cell communication, paving the way for insightful discoveries in ST data.
Methods
Identifying cell-cell interaction interfaces constrained by interaction capacity
The inputs of SPIDER are the spatial gene expression matrix \({{{\bf{X}}}}\in {{\mathbb{R}}}^{m\times n}\) and the spot coordinates \({{{\bf{S}}}}=({{{{\bf{s}}}}}_{1}^{T},\cdots \,,{{{{\bf{s}}}}}_{m}^{T})\in {{\mathbb{R}}}^{m\times 2}\) of the ST data, with m denoting the number of spatial spots and n denoting the number of genes. From the gene set \({{{\mathcal{G}}}}=\{{g}_{1},\cdots \,,{g}_{n}\}\), we extract the ligand and receptor gene set \({{{{\mathcal{G}}}}}^{{\prime} }\) that contains genes participating in known ligand-receptor interactions. In this work, we use the LR pairs from CellTalkDB68, which contains 3398 human LR pairs and 2033 mouse LR pairs.
We first evaluate the interaction capacity for spot by the total expression of ligand and receptor (LR) genes. Given the interaction capacity per spot, we can then use a power diagram to identify varying numbers of interfaces for spots with different interaction capacities31. Similar to the Delaunay triangulation, the power diagram generated polygons representing spots, and we consider spots with adjacent polygons to be potentially interacting. To generate a power diagram, we lift spots from the Euclidean space \({{\mathbb{E}}}^{2}\) onto a paraboloid in \({{\mathbb{E}}}^{3}\). This construction incorporates the interaction capacity cs of spot s with a scaling function f, defining its lifted height hs in \({{\mathbb{E}}}^{3}\)
The lifting equation and capacity scaling are detailed in Supplementary Method 1.1. The interaction capacity cs modifies the paraboloid, with larger capacities pulling it downward, resulting in more extensive downward-facing facets. The convex hull of this adjusted paraboloid forms these facets, which correspond to the power diagram for the points. This lifting process determines the capacity-driven paraboloid and, consequently, the capacity-guided polygons representing the points.
After obtaining possibly interacting spots from the power diagram, we further screen for spot pairs based on both their spatial proximity and interaction capacity. We first keep spot pairs that are in close proximity. For grid ST datasets, the proximity threshold is defined as the third quantile of the distance between all adjacent spots. For non-grid ST datasets, we apply the 99th quantile instead. We then estimate the ideal number of interfaces for each spot based on their interaction capacity with a Gaussian mixture model (GMM). For a spot with more interfaces than its ideal number of interfaces, we prune its interface with the most distant adjacent spot.
We also adapt the modeling of interfaces to account for ST resolution. Specifically, for bulk resolution, we consider an interface between two spots as a regional representation of interaction signals, and we define an inner interface to represent the interactions within a spot. Considering that our task is to identify interaction with spatial variation, the regional interaction signal can also indicate variation in space.
Finally, we determine the spatial locations of interaction interfaces. For each neighboring spot pair \((s,{s}^{{\prime} })\) identified and screened above, we define the interaction interface\(I=(s,{s}^{{\prime} })\) representing the interactions between the two spots. Formally, we obtain the set of interaction interfaces \({{{\mathcal{I}}}}\) and the location of interfaces Z as
with si and \({s}_{i}^{{\prime} }\) denoting the two neighboring spots on either side of the i-th interface Ii, and k denoting the number of identified interfaces.
Estimating ligand-receptor interaction strengths and directions on interfaces with COT and co-expression
We model the ligand-receptor interaction (LRI) signals at each interaction interface from gene expression X, referred to as the interaction profile, by incorporating both COT and the co-expression of ligand-receptor (LR) genes at the interaction interface. From the set of LR genes \({{{{\mathcal{G}}}}}^{{\prime} }\), we extract the ligand gene set \({{{\mathcal{L}}}}\) and the receptor gene set \({{{\mathcal{R}}}}\). The set of LR pairs are represented as \({{{\mathcal{P}}}}=\{({l}_{1},{r}_{1}),\cdots \,,({l}_{{n}^{{\prime} }},{r}_{{n}^{{\prime} }})\}\), where \({n}^{{\prime} }\) denotes the number of LR pairs.
We first utilize COT to estimate the distribution of ligand and receptor expression across the interfaces. The COT implementation is based on the approach provided by COMMOT12, with modifications to constrain the transport plan between SPIDER interfaces. Specifically, for the set of interfaces \({{{\mathcal{I}}}}\), the COT formulation is adjusted as follows:
Here, F is the penalty function associated with the untransported LR expression, defined as \({\mu }_{l,s}={X}_{s,l}-{\sum }_{r,{s}^{{\prime} }}T(l,r,s,{s}^{{\prime} })\) and \({\nu }_{r,{s}^{{\prime} }}={X}_{{s}^{{\prime} },r}-{\sum }_{l,s}T(l,r,s,{s}^{{\prime} })\). The COT score of the p-th LRI \((l,r)\in {{{\mathcal{P}}}}\) at the i-th interface \((s,{s}^{{\prime} })\in {{{\mathcal{I}}}}\) is therefore
Furthermore, from the optimal transport plan T, we extract the LR specific expressions of ligands \({{{\bf{L}}}}\in {{\mathbb{R}}}^{m\times {n}^{{\prime} }}\) and receptors \({{{\bf{R}}}}\in {{\mathbb{R}}}^{m\times {n}^{{\prime} }}\) to compute LR co-expression. Specifically, the co-expression of the p-th LRI (l, r) at the i-th interface \((s,{s}^{{\prime} })\) is
For the ith interface between spots s and \({s}^{{\prime} }\), we define the interaction profile yi, where the pth entry represents the interaction strength of the pth ligand-receptor pair:
Then we can present the interaction profiles across k interfaces between neighboring spots for all \({n}^{{\prime} }\) ligand-receptor pairs as a two-dimensional matrix \({{{\bf{Y}}}}=({{{{\bf{y}}}}}_{1}^{T},\cdots \,,{{{{\bf{y}}}}}_{k}^{T})\in {{\mathbb{R}}}^{k\times {n}^{{\prime} }}\).
Additionally, we infer the interaction direction from the optimal transport plan. The interaction direction can be determined by the choice made in Equation (4). That is, the interaction pair \((s,{s}^{{\prime} })\) is reversed to \(({s}^{{\prime} },s)\) when \(T(l,r,s,{s}^{{\prime} }) < T(l,r,{s}^{{\prime} },s)\), and kept as \((s,{s}^{{\prime} })\) otherwise.
Scoring the activation of spatially variable downstream genes
The objective of this step is to use transcription factor (TF) genes to support scored LRIs. We collect a knowledge graph encoding known LR-TF regulatory relation from SpaTalk11, which is a directed graph with LR/TF nodes and regulatory edges. Let V denote all nodes in the knowledge graph, we use VR and VTF to denote the receptor and TF nodes. Using the spatial information of ST data, we can further partition the TF node set VTF into a set of spatially varying TF (svTF) genes VsvTF and non-spatially varying TF genes VnsvTF. We calculate activation scores for each receptor by analyzing weighted paths from VR to VsvTF in the knowledge graph. The directed graph structure is encoded by the adjacency matrix \({{{\bf{A}}}}\in {{\mathbb{R}}}^{| V| \times | V| }\). The power of an adjacency matrix, denoted as Ah, where h is a positive integer, represents the number of paths of hop h between vertices in the graph. Specifically, the entry (VR, VsvTF) of the matrix Ah indicates the number of distinct paths of hop h that start at a receptor vertex VR and end at a svTF vertex VsvTF.
We further derive a weighted adjacency matrix by considering the level of target node activation represented by gene expression. Let x denote the gene expression vector of a spot, the weighted adjacency matrix \(\tilde{{{{\bf{A}}}}}\) are defined as \({\tilde{A}}_{ij}={A}_{ij}{x}_{j}\). Its powers \({\tilde{{{{\bf{A}}}}}}^{h}\) capture the number of paths between nodes within h hops weighted by the co-expression of target nodes. To make sure the weighted adjacency matrix is comparable among different h, we regularize \({\tilde{{{{\bf{A}}}}}}^{h}\) with the weights as well as the number of paths. The regularized weighted adjacency matrix \({\overline{{{{\bf{A}}}}}}^{{{{\bf{h}}}}}\) is therefore defined as
We use \({{{\bf{H}}}}\in {{\mathbb{R}}}^{| {V}_{R}| \times | {V}_{TF}| }\) to record activation scores between each receptor-svTF pair with a spot, which is defined as the weighted receptor-svTF path sum within 3 hops
Here, the hop matrices Hh is a subset that keeps only receptor-svTF paths. We reorganize the H matrices across all cells into a single activation score matrix \({{{\bf{S}}}}\in {{\mathbb{R}}}^{m\times {n}_{rt}}\), where nrt is the number of receptor-svTF pairs.
Finally, we calculate the correlation between Sr (activation scores for receptor r) and LRIs scores implicating r. Correlating svTFs provides supporting evidence for those LRIs, with a user-defined threshold on correlation coefficients filtering supported LRIs, with a threshold of 0.3 set as default.
Building abstract interfaces with self-organizing map
In this section, we build an abstract graph from the interfaces derived above. From m spots, the number of interfaces is approximately 2m for square or hex grid ST data, and 3m for non-grid ST data, as shown in Supplementary Method 1.2. Considering that statistical tests for spatial variance are challenged by computational scalability, we reduce the number of interfaces by finding an abstract graph of interfaces.
An abstract interface represents a group of spatially adjacent interfaces; we derive such a mapping with a self-organizing map (SOM) that preserves the spatial topology of interfaces20. Formally, we initialize a K × K square lattice covering the ST slice whose vertices represent an abstract interface with initial coordinates \(\hat{{{{\bf{Z}}}}}=({\hat{{{{\bf{z}}}}}}_{1\times 1}^{T},{\hat{{{{\bf{z}}}}}}_{1\times 2}^{T},\cdots \,,{\hat{{{{\bf{z}}}}}}_{K\times K}^{T})\in {{\mathbb{R}}}^{{K}^{2}\times 2}\). To achieve a level of abstraction that balances between a resolution that retains detail while still reducing the number of interfaces, we set K by requiring the number of interfaces represented by a single abstract interface as \(\frac{k}{{K}^{2}}=5\) by default. In the training process, SOM iteratively updates the mapping between an interface at z and its best-matching abstract interfaces selected as the one with the smallest distance from z, as well as the coordinates \(\hat{{{{\bf{Z}}}}}\) of abstract interfaces to better present the spatial topology of interfaces. Here, we use the batch SOM implemented in SOMDE20.
After training, we construct the interaction profile of an abstract interface with a mixed max-average pooling on the represented interface profiles. The joint of two pooling methods retains the most prominent LRI signals while avoiding overestimating variance due to the limited neighborhood size69. In particular, we set the mixture portion of average values to 0.7 to avoid extreme values presenting as noise in calculating the spatial variance of interaction signals. We denote the profiles of abstract interfaces as \(\hat{{{{\bf{Y}}}}}=({\hat{{{{\bf{y}}}}}}_{1\times 1}^{T},{\hat{{{{\bf{y}}}}}}_{1\times 2}^{T},\cdots \,,{\hat{{{{\bf{y}}}}}}_{M\times M}^{T})\in {{\mathbb{R}}}^{{M}^{2}\times {n}^{{\prime} }}\).
Identifying spatially variable LRI candidates
SPIDER incorporates six state-of-the-art methods for spatial variance evaluation. From the profile matrix and coordinates of the abstract interfaces, denoted as Y and Z to ease the notation, we have \({{{\bf{Y}}}}={({{{{\bf{y}}}}}_{1},{{{{\bf{y}}}}}_{2},\cdots,{{{{\bf{y}}}}}_{p},\cdots,{{{{\bf{y}}}}}_{{n}^{{\prime} }})}^{T}\), where \({n}^{{\prime} }\) is the number of ligand-receptor pairs and yp denotes the profile of LRI p across all abstract interfaces. SPIDER measures the spatial-induced variance of yp with several formulations below.
Gaussian process (GP) regression is a common approach for spatial regression; under the GP model, an LRI signal yp follows a normal distribution with the mean of μ ⋅ 1 and a covariance matrix from two terms - a non-spatial noise term ψI and a spatial term formulated by a covariance function σ2Q(ϕ)70. In the spatial variance term, Q(ϕ) captures the spatial correlation between abstract interfaces under the lengthscale ϕ controlling the decay rate of correlation with distance. The parameters are inferred by maximizing the log-likelihood, with which we obtain the log-likelihood ratio (LLR) compared to the null hypothesis of no spatial covariance.
Assuming that the LLR follows a χ2 distribution, we obtain the corresponding p-value and q-value for yp. We can also rank LRI with the fraction of spatial variance (FSV) defined in SpatialDE70. For the above process, we apply the implementation in SpatialDE and SOMDE20 using a squared exponential covariance function.
Furthermore, we incorporate two variations of the GP regression model (Supplementary Method 1.3.1). Replacing the full covariance matrix with a nearest-neighbor approximation accelerates the model fitting process to scale linearly with the number of abstract interfaces, instead of cubically when using the full covariance matrix; the subsequent estimation of LLR, p-value, and q-value resembles the previous method. We apply the implementation in nnSVG22. To integrate a variety of covariance functions, we incorporate all possible covariance functions with a generalized linear mixed model. For an efficient model fitting process, we utilize an extended score test that only estimates the model under the null hypothesis as implemented in SpatialDE221. Assuming that the sum of the score vector for the variance components follows a mixture of χ2 distributions, we obtain the corresponding p-value and q-value for yp.
Moreover, we apply a nonparametric covariance test for estimating the spatial variance (Supplementary Method 1.3.2). Assumes that if yp is independent of Z, then the spatial distance between two abstract interfaces zi and zj would also be independent of the LRI profile difference between the two abstract interfaces i and j. With yp and Z, we have the centered profile and coordinate covariance matrices Q(yp) and Q(Z). Subsequently, tr(Q(yp)Q(Z))/k, a correlation measurement between yp and spatial coordinates Z is computed. Assuming that the correlation measurement follows a mixture of χ2 distributions, we obtain the p-value and the corresponding q-value for yp. SPIDER uses the implementation in SPARK-X23.
Given that LRI profiles are often noisy due to sequencing limitations, we further denoise the LRI profiles by discretization, that is, converging the continuous LRI profile to discrete labels. We achieve spatial neighbor-aware discretization with hidden Markov random field by assuming that the observable LRI signal yp are conditionally dependent on the underlying discrete labels f = {fi} whose states are not observable. Initially, we estimate f with a Gaussian mixture model, with which we apply the alpha-expansion graph cut algorithm implemented in scGCO24 to find a label vector f of interfaces that minimizes an energy function (detailed in Supplementary Method 1.3.3). The optimal label f * defines an active region \({{{\mathcal{F}}}}\) for the LRI signal, on which we test for the spatial non-randomness p-value with the complete spatial randomness framework24, as well as the corresponding q-values.
Since we have multiple statistical tests, SPIDER filters the SVIs by applying a strict threshold of p-value ≤ 0.01 across all tests. If the resultant SVIs is less than 10, SPIDER combines the resulting p values using the Cauchy p value combination rule71 to obtain a single p value, which is also subject to the threshold of p-value ≤ 0.01. The threshold is also implemented as a hyperparameter for users to adjust.
SVI pattern generation with a Gaussian process mixture model
We use a Gaussian process mixture model proposed in SpatialDE70 with a Dirichlet process prior21 to group SVIs sharing similar spatial distributions. The nonparametric Dirichlet process prior automatically determines the number of patterns. From the LRI profile matrix \(\hat{{{{\bf{Y}}}}}\), we obtain the latent SVI membership of the SVI pattern and the distribution of each pattern c, denoted as \({\hat{{{{\boldsymbol{\mu }}}}}}_{c}\), in the form of a GP mixture model with components as the GP models of SVI members, solved by variational inference as implemented in SpatialDE221.
We term \({\hat{{{{\boldsymbol{\mu }}}}}}_{c}\) as the activation strength for pattern c. Following the mapping between z and its best-matching abstract interface \({\hat{{{{\bf{z}}}}}}^{*}({{{\bf{z}}}})\), each interface z has the same activation strength as \({\hat{{{{\bf{z}}}}}}^{*}({{{\bf{z}}}})\). We use \({{{\mathbf{\Upsilon }}}}={({{{{\boldsymbol{\mu }}}}}_{1},{{{{\boldsymbol{\mu }}}}}_{2},\cdots,{{{{\boldsymbol{\mu }}}}}_{c})}^{T}\) to denote the activation strength matrix of interfaces.
Construction of simulation tests and comparisons with other methods
To examine SPIDER’s accuracy in identifying SVIs, we simulated multiple ST datasets containing both spatially variable (SV) and non-SV ligand-receptor interactions (LRIs) as well as target gene supports using the SVCA package25. In these simulations, the interaction strength of each LRI is not uniform but is varied across cells. Specifically, the level of interaction strength is controlled by the fraction of ligand expression variance explained by its interaction with the corresponding receptor, as specified in SVCA. Similarly, SVCA simulates target gene expressions from a given receptor by controlling the intrinsic effect, that is, the fraction of target gene variance explained by receptor activation. Therefore, we use the fraction of the interaction effect to control the strength level of interaction, and the fraction of the intrinsic effect to control the activation of target TF genes by receptors.
We first simulated ligand expression for testing the identification of SVI candidates. To distinguish SV and non-SV patterns, we selected the top 100 SV and non-SV receptors from a real dataset. From the expression of SV receptors, SVCA simulates ligands that present similar spatial patterns, therefore generating LRIs with spatial patterns. Similarly, ligands generated from non-SV receptors will result in LRIs without spatial patterns. We control the level of interaction strength by supplying a range of fractions of variance to be explained by gene interactions. We further apply a Poisson noise on the generated ligand expression, with λ defined as the product of expressions and a scaling factor adjusting the intensity of the noise, resulting in three noise levels.
Subsequently, we simulated TF expression for testing svTF support of LRIs. We constructed a four-hop knowledge graph, where the target TF genes can be either svTF (spatially variable TF) or nsvTF (non-spatially variable TF), and their expression can either correlate or not correlate with the receptor gene. By setting the intrinsic effect to 0.9, we obtain TF expressions correlating with the input receptor gene, and the uncorrelated TF expressions take the negative part of the correlating TF expressions. If the input receptor gene is spatially variable, then the simulated TF expressions are also spatially variable. To generate nsvTF for SV receptors, we apply a Poisson distribution with λ = 1 to disrupt the generated spatial pattern while maintaining the correlation between TFs and the SV receptor. On the other hand, if the input receptor is not spatially variable, we apply a Gaussian model with length-scale 2 to at an expression hot-spot to simulate spatial variance. When testing the effect of TF variability on SPIDER, we also apply three levels of Poisson noise on the generated TF gene expression as above.
Joining the simulated ligand and TF expression, we construct various simulation cases. Initially, we generated 12 simulations by systematically varying the interaction strength levels–specifically, setting the fraction of ligand variance explained by receptor interaction to 99, 75, 50, and 25%–and applying three noise levels using scaling factors of 0.5, 0.7, and 0.9, which correspond to low (0.1), medium (0.3), and high (0.5) noise, respectively. For both SV and non-SV receptor genes, half of them are supported by svTFs, while the other half are not. At the median noise level and with four interaction strength levels, we generated an additional 12 simulations by adjusting the fraction of svTF-supported non-SV receptors. Specifically, non-SV receptor genes can be fully supported, partially supported, or not supported by svTFs to evaluate the TF scoring function of SPIDER. Lastly, we generated 12 null simulations at the median noise level across different interaction strength levels, ensuring that all SV receptors are not supported by any svTFs. In contrast, non-SV receptor genes can be fully supported, half supported, or not supported by svTFs.
Several methods have been developed for ST-based LRI analyses. SpatialDM and SpatialCorr detect spot-level interactions, while SpaTalk identifies cluster-level interactions. On the other hand, CellChat detects cluster-level interactions without using spatial knowledge. For cluster-based LRI methods SpaTalk and CellChat, we kept the lowest p-value for each ligand-receptor pair between any two clusters. We compared SPIDER with these methods. However, it should be noted that these methods, except for SpatialDM, are rather unfavorable in this simulation setting for having relatively different objectives.
Furthermore, to show the effect of SPIDER’s interaction scoring on the capacity-constrained interfaces, we incorporated two other methods, namely stLearn and COMMOT, that provide spot-based interaction scoring12,13, and followed with SPIDER’s statistical model testing. Statistical testing with similar LRIs is used in stLearn to estimate spot-wise interaction significance scores. On the other hand, we supply COMMOT’s spot-spot interaction scores onto the constructed interfaces.
We generated receiver operating characteristic (ROC) curves under each interaction scenario for all benchmarking methods, SPIDER, and different models used in SPIDER. The ROC curves were then used to compare the AUC (Area Under the Receiver Operator Curve) values. For null cases with no svTF-supported SVIs, we use the specificity metric with a significance cutoff of 0.01.
To generate bulk simulation datasets, we grouped adjacent cells into computational spots using SOM on the PDAC and DLPFC simulation datasets. Due to the smaller number of cells in the PDAC sample, we set the number of cells per spot to four and six for both the PDAC and DLPFC samples. The simulated expression values of cells within each spot were aggregated by taking the mean expression, generating bulk-level profiles from the underlying single-cell simulations. For cluster-based SpaTalk, we label a spot by the majority of cluster labels from its member cells.
Evaluating the effect of spatial constraints on both simulation and real datasets
We further assessed SPIDER’s robustness under weak and non-existent spatial constraints by applying permutation tests to both single-cell and bulk resolution PDAC simulations across four interaction strength levels. To model weak spatial constraints, we performed block permutations, randomly shuffling cells or spots within predefined block sizes of 3, 5, and 10, with larger block sizes representing progressively weaker spatial constraints. To simulate the complete absence of spatial constraints, we randomly permuted all spots or cells without maintaining any block structure, repeating this process five times for each interaction strength setting.
Decoding cell type interactions with directed and undirected SVI
Given spot annotations such as cell types or cluster labels, SPIDER decodes the interacting parties from an SVI. Let C = {Ct} denote the categories in the given annotation, and c = {cs} denote the spot labels. We define a symmetric matrix K representing cell type interactions for the pth LRI, where
subject to \(t \, \ne \, {t}^{{\prime} }\). Here \((s,{s}^{{\prime} })\) represents an interface, ys,p is the expression level of the pth LRI on spot s, and \({\mathbb{1}}\) is the indicator function. For the diagonal entry, we have
The matrix K reveals the mean SVI expressions of interfaces connecting different cell types in the sample.
Subsequently, we construct a non-symmetric version of K for directed cell type interactions, with
The diagonal entries remain unchanged as in Equation (10). SPIDER visualizes the non-symmetric K with a chord diagram, with the option to exclusively display interactions between different cell types.
Downstream analyses with identified SVIs and SVI patterns
In the above section, we obtain the SVI expression matrix \({{{{\bf{Y}}}}}^{{\prime} }\), subset from the full expression matrix Y, and the SVI pattern matrix ϒ. Subsequently, we demonstrate that SPIDER populates \({{{{\bf{Y}}}}}^{{\prime} }\) and ϒ back to the feature space of the spot.
We retain a mapping between spots and interfaces as B, with B(s) denoting a set of interfaces connected to spot s. Let X and χ denote the spot-transformed SVI expression and SVI pattern matrics, defined as
Furthermore, we calculate Pearson correlation with known spot annotations and perform SVI or SVI pattern-based spot clustering given the populated features. We showcase the resemblance between spot annotations and SVIs or SVI patterns by calculating their Pearson correlations. Specifically, we calculate the Pearson correlations between the populated matrices with known spot annotations, such as one-hot encoded cell type labels or cell type deconvolution results.
For the populated SVI and SVI pattern matrices X and χ, we apply Scanpy’s clustering pipeline45. Specifically, we embed the populated matrices in a two-dimensional space using Scanpy’s UMAP implementation with default parameters. We then use Scanpy’s Leiden implementation to extract clusters from the UMAP-reduced features. For the mouse embryo and mouse brain datasets, we set the resolution of Leiden as 0.1 and 0.5, respectively, to approximate the number of annotated clusters. For the SAW dataset, we vary the resolution parameter to 0.1, 0.5, and 1, respectively. We further find the differentially expressed SVIs among interaction-based clusters using Scanpy’s rank_genes_groups with default parameters. The top SVIs were ranked by integrating log-fold change and t-test p-value. We further utilized the SVI matrix X for inferring pseudotime and trajectory, following Scanpy’s trajectory inference pipeline with the number of PCA components set to 10.
Considering that interfaces could serve as a counterpart of spots or cells, we also performed clustering, pseudotime and trajectory inference on interfaces instead of spots. The method and results are discussed in detail in Supplementary Note 10. Similarly, SVIs could be seen as a counterpart of genes, for which we performed additional tests to examine the relation between SVIs and genes (details in Supplementary Note 11).
Pathway enrichment for LRIs and svTFs
We perform pathway enrichment analyses on a given set of LRIs, both utilizing the “enrichR” function implemented in the GSEApy package to test for enrichment on pathway databases such as KEGG and Reactome72. Specifically, for enrichment analysis on genes implicated in the set of LRIs, we call “enrichR” with default human KEGG pathway “KEGG_2021_Human” and mouse KEGG pathway “KEGG_2019_Mouse”. For enrichment on an SVI and its supporting svTFs, the gene set is the joint set of the LR genes and the supporting svTFs.
Configuration of the used software
We configured several computational software packages for interface scoring, abstraction, and SVI analysis. For COT scoring of interfaces, we used the default parameters implemented in COMMOT, except for adjustments made at interfaces. When applying SOMDE for interface abstraction, we configured the mixture portion parameter to 0.7 and set it to represent 10 interfaces, with each abstract interface by default. For identifying and evaluating SVI candidates, we applied eight software. Regarding the SpatialDE, SpatialDE2’s omnibus mode, and SPARKX for detecting SVI candidates, we left them at their default settings. For SOMDE, we set the default mixture portion and number of interfaces per abstract interface to four for under 1000 interfaces or ten otherwise. To evaluate different combinations of edge and node penalties using the scGCO software, we systematically varied the smooth factor among 1, 5, 10, and 20 and the unary scale factor between 50 and 100. Across all statistical tests run using various combinations, we report the smallest q-value to determine significant SVI candidates. For nnSVG, we try with default parameters, and when nnSVG fails for some inputs, we include two configuration options - adding a pseudo-count of 1 to the interface profile matrix or discarding low-reliability LRI pairs. We obtained Moran’s I and Geary’s C metrics using the Squidpy package73 with 1000 permutations set for significance testing.
We compared SPIDER with several methodologies, including SpatialDM, SpaTalk, CellChat, stLearn, COMMOT, and spatialcorr. For SpatialDM, we set the parameters l=1.2 and a cutoff of 0.2, using bin-spot results for cell-type correlation assessments. SpaTalk was run with default parameters. In the case of CellChat, we specified the type as truncatedMean in the computeCommunProb function and set the trim to 0.01. For stLearn, we applied score pval_adj_cutoff=0.05, and set distances to 8 for DLPFC and 1.3 for PDAC simulations. The resultant lr_sig_scores were used as the LRI scores. For COMMOT, we applied a distance threshold (dis_thr) of 2.1 and enabled heteromeric interactions. Lastly, spatialcorr was evaluated with a bandwidth of 10, but it took too long to process the DLPFC simulations, leading to its exclusion from further evaluations. A significance threshold of 0.01 was applied to the final LRI results from all methods, consistent with the threshold used in SPIDER.
Datasets
An overview of datasets, including cell or spot number, gene number, as well as the number of interfaces and SVIs, are included in Supplementary Table 9. Preprocessing of count matrices, LR pair databases, and interfaces are described in Supplementary Method 1.4.
Human pancreatic ductal adenocarcinoma dataset
Raw data of the Spatial Transcriptomics sample PDAC-A of pancreatic ductal adenocarcinoma is collected from GSE111672, and the spot annotations are from Fig. 2 of the publication28. Cell type deconvolution with paired scRNA-seq data is performed by SPOTlight following the github tutorial.
SAW mouse brain dataset
Raw data and clusters of the stereo-seq mouse brain samples are collected from CNP000443730.
Mouse hippocampus dataset
Raw data and annotations of the slide-seq V2 mouse hippocampus samples are collected from the Single Cell Portal under project SCP8158. Cell type deconvolution with a mouse single-cell RNA-seq hippocampus dataset74 is performed by Seurat following the official tutorial.
Mouse embryo dataset
Raw data and cell types of the seqFISH mouse embryos are collected from the Spatial Mouse Atlas29.
Human dorsolateral prefrontal cortex dataset
The raw counts and manual annotations of the human dorsolateral prefrontal cortex samples are collected from globus. For each biological sample, one repeat sample is selected, namely samples 151,510, 151,672, and 151,673.
Human breast cancer dataset
From the HER2-positive breast cancer Spatial Transcriptomics samples provided in the study of Wu et al.57, we selected six samples namely A1, B1, C1, D1, G2, and F1, whose count matrices and spot annotations are collected from zenodo project 3957257.
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
Data availability
The ST datasets analyzed in this study are collected from their original publications: Spatial Transcriptomics data of PDAC-A with GEO accession number GSE11167228; Stereo-seq data of the mouse brain with the China National GeneBank DataBase Sequence Archive with the accession number CNP000443730; Slide-seq V2 data of the mouse hippocampus at the Single Cell Portal under project SCP815 at https://singlecell.broadinstitute.org/single_cell/study/SCP815/sensitive-spatial-genome-wide-expression-profiling-at-cellular-resolution8; seqFISH data of mouse embryos from the Spatial Mouse Atlas dataset at https://content.cruk.cam.ac.uk/jmlab/SpatialMouseAtlas2020/29; Visium data of the human dorsolateral prefrontal cortex from the globus website at http://research.libd.org/globus5; Visium data of human breast cancer from Andersson et al. under the zenodo project 3957257 at https://zenodo.org/record/395725757. SVI lists for all samples are provide in Supplementary Data 1. Source data are provided with this paper.
Code availability
The source code of SPIDER is available in https://github.com/deepomicslab/SPIDERunder the MIT License and the scripts to reproduce the results in our manuscript are in https://github.com/deepomicslab/SPIDER-paper. The original source code for third-party software used in this study is also publicly available under OSI-approved licenses. In particular, we would like to thank the authors of COMMOT, SOMDE, SpatialDE, SpatialDE2, SPARKX, scGCO, nnSVG, and SpatialDM for their work, which was essential for our analyses. Tutorials of SPIDER are available in https://spiderst-readmedoc.readthedocs.io/en/latest/. A working example of SPIDER is hosted by Code Ocean in https://codeocean.com/capsule/119403875.
References
Bich, L., Pradeu, T. & Moreau, J.-F. Understanding multicellularity: the functional organization of the intercellular space. Front. Physiol. 10, 1170 (2019).
AlMusawi, S., Ahmed, M. & Nateri, A. S. Understanding cell-cell communication and signaling in the colorectal cancer microenvironment. Clin. Transl. Med. 11, e308 (2021).
Baghban, R. et al. Tumor microenvironment complexity and therapeutic implications at a glance. Cell Commun. Signal. 18, 1–19 (2020).
Wang, S. et al. A systematic evaluation of the computational tools for ligand-receptor-based cell-cell interaction inference. Brief. Funct. Genomics 21, 339–356 (2022).
Maynard, K. R. et al. Transcriptome-scale spatial gene expression in the human dorsolateral prefrontal cortex. Nat. Neurosci. 24, 425–436 (2021).
Eng, C.-H. L. et al. Transcriptome-scale super-resolved imaging in tissues by rna seqfish+. Nature 568, 235–239 (2019).
Chen, A. et al. Spatiotemporal transcriptomic atlas of mouse organogenesis using DNA nanoball-patterned arrays. Cell 185, 1777–1792 (2022).
Stickels, R. R. et al. Highly sensitive spatial transcriptomics at near-cellular resolution with slide-seqv2. Nat. Biotechnol. 39, 313–319 (2021).
Liu, Z., Sun, D. & Wang, C. Evaluation of cell-cell interaction methods by integrating single-cell RNA sequencing data with spatial information. Genome Biol. 23, 1–38 (2022).
Dries, R. et al. Giotto: a toolbox for integrative analysis and visualization of spatial expression data. Genome Biol. 22, 1–31 (2021).
Shao, X. et al. Knowledge-graph-based cell-cell communication inference for spatially resolved transcriptomic data with spatalk. Nat. Commun. 13, 4429 (2022).
Cang, Z. et al. Screening cell–cell communication in spatial transcriptomics via collective optimal transport. Nat. Methods 20, 218–228 (2023).
Pham, D. et al. Robust mapping of spatiotemporal trajectories and cell–cell interactions in healthy and diseased tissues. Nat. Commun. 14, 7739 (Nature Publishing Group UK London, 2023).
Walker, B. L., Cang, Z., Ren, H., Bourgain-Chang, E. & Nie, Q. Deciphering tissue structure and function using spatial transcriptomics. Commun. Biol. 5, 1–10 (2022).
Wang, Y. et al. Deciphering the transcriptional landscape of human pluripotent stem cell-derived gnrh neurons: the role of Wnt signaling in patterning the neural fate. Stem Cells 40, 1107–1121 (2022).
Edsgärd, D., Johnsson, P. & Sandberg, R. Identification of spatial expression trends in single-cell gene expression data. Nat. Methods 15, 339–342 (2018).
Li, K. et al. Computational elucidation of spatial gene expression variation from spatially resolved transcriptomics data. Mol. Ther.-Nucleic Acids 27, 404–411 (2022).
Moran, P. A. Notes on continuous stochastic phenomena. Biometrika 37, 17–23 (1950).
Geary, R. C. The contiguity ratio and statistical mapping. Incorporated Stat. 5, 115–146 (1954).
Hao, M., Hua, K. & Zhang, X. Somde: a scalable method for identifying spatially variable genes with self-organizing map. Bioinformatics 37, 4392–4398 (2021).
Kats, I., Vento-Tormo, R. & Stegle, O. Spatialde2: Fast and localized variance component analysis of spatial transcriptomics. bioRxiv (2021).
Weber, L. M., Saha, A., Datta, A., Hansen, K. D. & Hicks, S. C. nnSVG for the scalable identification of spatially variable genes using nearest-neighbor Gaussian processes Nat. Commun. 14, 4059 (Nature Publishing Group UK London, 2023).
Zhu, J., Sun, S. & Zhou, X. Spark-x: non-parametric modeling enables scalable and robust detection of spatial expression patterns for large spatial transcriptomic studies. Genome Biol. 22, 1–25 (2021).
Zhang, K., Feng, W. & Wang, P. Identification of spatially variable genes with graph cuts. Nat. Commun. 13, 1–15 (2022).
Arnol, D., Schapiro, D., Bodenmiller, B., Saez-Rodriguez, J. & Stegle, O. Modeling cell-cell interactions from spatial molecular data with spatial variance component analysis. Cell Rep. 29, 202–211 (2019).
Bernstein, M. N. et al. Spatialcorr identifies gene sets with spatially varying correlation structure. Cell Rep. Methods 2, 100369 (2022).
Li, Z., Wang, T., Liu, P. & Huang, Y. Spatialdm for rapid identification of spatially co-expressed ligand–receptor and revealing cell–cell communication patterns. Nat. Commun. 14, 3995 (2023).
Moncada, R. et al. Integrating microarray-based spatial transcriptomics and single-cell RNA-seq reveals tissue architecture in pancreatic ductal adenocarcinomas. Nat. Biotechnol. 38, 333–342 (2020).
Lohoff, T. et al. Integration of spatial and single-cell transcriptomic data elucidates mouse organogenesis. Nat. Biotechnol. 40, 74–85 (2022).
Gong, C. et al. Saw: an efficient and accurate data analysis workflow for stereo-seq spatial transcriptomics. GigaByte 2024, 111 (2024).
Aurenhammer, F. Power diagrams: properties, algorithms and applications. SIAM J. Comput. 16, 78–96 (1987).
Wittek, P., Gao, S. C., Lim, I. S. & Zhao, L. Somoclu: An efficient parallel library for self-organizing maps. J. Stat. Softw. 78, 1–21 (2017).
Kanehisa, M., Furumichi, M., Sato, Y., Kawashima, M. & Ishiguro-Watanabe, M. Kegg for taxonomy-based analysis of pathways and genomes. Nucleic Acids Res. 51, D587–D592 (2023).
Gillespie, M. et al. The reactome pathway knowledgebase 2022. Nucleic Acids Res. 50, D687–D692 (2022).
Chen, C., Kim, H. J. & Yang, P. Evaluating spatially variable gene detection methods for spatial transcriptomics data. Genome Biol. 25, 18 (2024).
Wu, X., Chen, S. & Lu, C. Amyloid precursor protein promotes the migration and invasion of breast cancer cells by regulating the MAPK signaling pathway. Int. J. Mol. Med. 45, 162–174 (2020).
Pacchiana, R. et al. 3-bromo-isoxazoline derivatives inhibit gapdh enzyme in PDAC cells triggering autophagy and apoptotic cell death. Cancers 14, 3153 (2022).
Ma, Y. et al. Identification of potential hub genes associated with the pathogenesis and prognosis of pancreatic duct adenocarcinoma using bioinformatics meta-analysis of multi-platform datasets. Oncol. Lett. 18, 6741–6751 (2019).
Hou, R., Denisenko, E., Ong, H. T., Ramilowski, J. A. & Forrest, A. R. Predicting cell-to-cell communication networks using natmi. Nat. Commun. 11, 5011 (2020).
Zhao, J. et al. Suppression of fgf5 and fgf18 expression by cholesterol-modified sirnas promotes hair growth in mice. Front. Pharmacol. 12, 666860 (2021).
Rice, R. et al. Disruption of fgf10/fgfr2b-coordinated epithelial-mesenchymal interactions causes cleft palate. J. Clin. Investig. 113, 1692–1700 (2004).
Haas, N. et al. Kit transduced signals counteract erythroid maturation by MAPK-dependent modulation of erythropoietin signaling and apoptosis induction in mouse fetal liver. Cell Death Differ. 22, 790–800 (2015).
van Wijk, B., Moorman, A. F. & van den Hoff, M. J. Role of bone morphogenetic proteins in cardiac differentiation. Cardiovasc. Res. 74, 244–255 (2007).
Fischer, T. et al. Fgf15-mediated control of neurogenic and proneural gene expression regulates dorsal midbrain neurogenesis. Develop. Biol. 350, 496–510 (2011).
Wolf, F. A., Angerer, P. & Theis, F. J. Scanpy: large-scale single-cell gene expression data analysis. Genome Biol. 19, 1–5 (2018).
Blak, A. A. et al. Expression of fgf receptors 1, 2, and 3 in the developing mid-and hindbrain of the mouse. Develop. Dyn. Off. Publ. Am. Assoc. Anatomists 233, 1023–1030 (2005).
Cheli, V. T., Correale, J., Paez, P. M. & Pasquini, J. M. Iron metabolism in oligodendrocytes and astrocytes, implications for myelination and remyelination. ASN Neuro 12, 1759091420962681 (2020).
Comeras, L. B., Hörmer, N., Mohan Bethuraj, P. & Tasan, R. O. Npy released from GABA neurons of the dentate gyrus specially reduces contextual fear without affecting cued or trace fear. Front. Synaptic Neurosci. 13, 635726 (2021).
Chen, M. S. et al. Nogo-a is a myelin-associated neurite outgrowth inhibitor and an antigen for monoclonal antibody in-1. Nature 403, 434–439 (2000).
Caldwell, M. et al. Axo-glial interactions between midbrain dopamine neurons and oligodendrocyte lineage cells in the anterior corpus callosum. Brain Struct. Funct. 228, 1993–2006 (2023).
Patel, N. et al. Developmental regulation of tac1 in peptidergic-induced human mesenchymal stem cells: implication for spinal cord injury in zebrafish. Stem Cells Dev. 21, 308–320 (2012).
Binder, D. K. Neurotrophins in the dentate gyrus. Prog. Brain Res. 163, 371–397 (2007).
Nguyen, H.-L. et al. Tgf-β signaling in endothelial cells, but not neuroepithelial cells, is essential for cerebral vascular development. Lab. Investig. 91, 1554–1563 (2011).
Linneberg, C., Harboe, M. & Laursen, L. S. Axo-glia interaction preceding CNS myelination is regulated by bidirectional eph-ephrin signaling. ASN Neuro 7, 1759091415602859 (2015).
Ma, D. et al. Ligand activation mechanisms of human KCNQ2 channel. Nat. Commun. 14, 6632 (2023).
Blockus, H. et al. Synaptogenic activity of the axon guidance molecule robo2 underlies hippocampal circuit function. Cell Rep. 37, 109828 (2021).
Andersson, A. et al. Spatial deconvolution of her2-positive breast tumors reveals novel intercellular relationships. bioRxiv 2020-07 (2020).
Fu, Y., Qu, H., Qu, D. & Zhao, M. Trajectory inference with cell-cell interactions (ticci): intercellular communication improves the accuracy of trajectory inference methods. Bioinformatics 41, btaf027 (2025).
Chen, I. X. et al. Blocking CXCR4 alleviates desmoplasia, increases T-lymphocyte infiltration, and improves immunotherapy in metastatic breast cancer. Proc. Natl. Acad. Sci. USA 116, 4558–4566 (2019).
Qi, L., Sun, B., Yang, B. & Lu, S. Cebpb regulates the migration, invasion and EMT of breast cancer cells by inhibiting THBS2 expression and o-fucosylation. Hum. Mol. Genet. 32, 1850–1863 (2023).
Novak, C. M., Horst, E. N., Taylor, C. C., Liu, C. Z. & Mehta, G. Fluid shear stress stimulates breast cancer cells to display invasive and chemoresistant phenotypes while upregulating plau in a 3D bioreactor. Biotechnol. Bioeng. 116, 3084–3097 (2019).
Uzair, I. D., Conte Grand, J., Flamini, M. I. & Sanchez, A. M. Molecular actions of thyroid hormone on breast cancer cell migration and invasion via cortactin/n-wasp. Front. Endocrinol. 10, 139 (2019).
Wang, J. et al. Progression from ductal carcinoma in situ to invasive breast cancer: molecular features and clinical significance. Signal Transduct. Target. Ther. 9, 83 (2024).
Wang, J. et al. Spatial transcriptomic analysis drives pet imaging of tight junction protein expression in pancreatic cancer theranostics. Nat. Commun. 15, 10751 (2024).
Ren, H., Walker, B. L., Cang, Z. & Nie, Q. Identifying multicellular spatiotemporal organization of cells with spaceflow. Nat. Commun. 13, 4076 (2022).
Dong, K. & Zhang, S. Deciphering spatial domains from spatially resolved transcriptomics with an adaptive graph attention auto-encoder. Nat. Commun. 13, 1739 (2022).
Armingol, E., Officer, A., Harismendy, O. & Lewis, N. E. Deciphering cell-cell interactions and communication from gene expression. Nat. Rev. Genet. 22, 71–88 (2021).
Shao, X. et al. Celltalkdb: a manually curated database of ligand–receptor interactions in humans and mice. Brief. Bioinforma. 22, bbaa269 (2021).
Yu, D., Wang, H., Chen, P. & Wei, Z. Mixed pooling for convolutional neural networks. In Proc. International Conference on Rough Sets and Knowledge Technology, 364–375 (Springer, 2014).
Svensson, V., Teichmann, S. A. & Stegle, O. Spatialde: identification of spatially variable genes. Nat. Methods 15, 343–346 (2018).
Liu, Y. et al. Acat: a fast and powerful p value combination method for rare-variant analysis in sequencing studies. Am. J. Hum. Genet. 104, 410–421 (2019).
Fang, Z., Liu, X. & Peltz, G. Gseapy: a comprehensive package for performing gene set enrichment analysis in Python. Bioinformatics 39, btac757 (2023).
Palla, G. et al. Squidpy: a scalable framework for spatial omics analysis. Nat. Methods 19, 171–178 (2022).
Saunders, A. et al. Molecular diversity and specializations among the cells of the adult mouse brain. Cell 174, 1015–1030 (2018).
Shiying, L. Spider: finding spatially variable ligand-receptor interactions with functional support from downstream genes. (CodeOcean, 2024).
Doreen, R. Abstract concept icon of human cells in line illustration. https://stock.adobe.com/images/abstract-concept-icon-of-human-cells-in-line-illustration/229924712. Standard Image, Adobe Stock, Asset ID: 229924712, Licensed 2025-06-18.
Acknowledgements
This work was supported by the National Natural Science Foundation of China (32270687) and the General Research Fund provided by the Research Grants Council of the HKSAR (Project No. 9043559, CityU 11218823). Open Access made possible with partial support from the Open Access Publishing Fund of the City University of Hong Kong.
Author information
Authors and Affiliations
Contributions
The authors confirm the contribution to the article as follows: conception and design of the study: S.C.L.; data collection: S.T.L.; code development: S.Y.L. and R.H.W.; analysis and interpretation of results: S.Y.L.; draft manuscript preparation: S.Y.L. and R.H.W.; all authors reviewed the results and approved the final version of the manuscript.
Corresponding author
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Communications thanks the anonymous, reviewers for their contribution to the peer review of this work. A peer review file is available.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Source data
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Li, S., Wang, R., Liu, S. et al. Finding spatially variable ligand-receptor interactions with functional support from downstream genes. Nat Commun 16, 7784 (2025). https://doi.org/10.1038/s41467-025-62988-0
Received:
Accepted:
Published:
DOI: https://doi.org/10.1038/s41467-025-62988-0