Identification and characterization of cell niches in tissue from spatial omics data at single-cell resolution

Qian, Jingyang; Shao, Xin; Bao, Hudong; Fang, Yin; Guo, Wenbo; Li, Chengyu; Li, Anyao; Hua, Hua; Fan, Xiaohui

doi:10.1038/s41467-025-57029-9

Open access
Published: 16 February 2025

Nature Communications volume 16, Article number: 1693 (2025)

Abstract

Deciphering the features, structure, and functions of the cell niche in tissues remains a major challenge. Here, we present scNiche, a computational framework to identify and characterize cell niches from spatial omics data at single-cell resolution. We benchmark scNiche with both simulated and biological datasets, and demonstrate that scNiche can effectively and robustly identify cell niches while outperforming other existing methods. In spatial proteomics data from human triple-negative breast cancer, scNiche reveals the influence of the microenvironment on cellular phenotypes, and further dissects patient-specific niches with distinct cellular compositions or phenotypic characteristics. By analyzing mouse liver spatial transcriptomics data across normal and early-onset liver failure donors, scNiche uncovers disease-specific liver injury niches, and further delineates the niche remodeling from normal liver to liver failure. Overall, scNiche enables decoding the cellular microenvironment in tissues from single-cell spatial omics data.

Introduction

The cell niche, also referred to as the cellular microenvironment or spatial domain, is defined as the local environment or community surrounding cells. It plays a critical role in various biological processes, such as maintaining tissue homeostasis^1,2,3 and shaping disease progression^4,5,6. Recent advances in spatial omics technologies^7,8,9,10,11,12,13,14,15 provide molecular profiles at single-cell resolution, allowing systematic exploration of cellular states, functions, and interactions in the tissue context. Although these advances have produced extensive spatial atlases, accurately deciphering the latent cell niche information within these data remains challenging.

Various computational methods have been developed to identify cell niches by integrating the molecular profiles of cells with spatial information. Early methods such as HMRF¹⁶, BayesSpace¹⁷, and DR-SC¹⁸ employ a Potts model to encourage physically proximal cells to share the same label. This strategy assumes that a cell niche is a region with homogeneous gene expression and models the gene expression of all its cells with the same distribution. As a result, it cannot accurately capture the expression heterogeneity of different cell types within the same niche¹⁹. As an improvement, BASS¹⁹ introduces additional hierarchical modeling structures on top of the Potts model to explicitly model the heterogeneous gene expression of different cell types, enabling more flexible and effective modeling of spatial omics data. SCGP²⁰ represents another class of methods: it constructs spatial cellular graphs by computing spatial edges and feature edges between cells separately, so that traditional graph community detection algorithms can be used to identify cell niches. Most subsequent methods, on the other hand, combine the molecular profiles of a cell with those of its neighbors in different ways to generate new features that are more representative of the cell niche. Specifically, UTAG²¹ and CellCharter²² integrate the molecular profiles of neighbors into the cell's own molecular profile using linear weighting and neighborhood aggregation, respectively. BANKSY²³ generates neighbor-augmented features by combining the molecular profiles of the cell with those of its neighbors, and provides a dedicated hyperparameter to tune the relative contributions of the cell and its local microenvironment. Deep learning-based methods such as SpaGCN²⁴, STAGATE²⁵, GraphST²⁶, and SpaceFlow²⁷ learn better latent features through graph neural networks.
In addition, some methods designed primarily for spatial proteomics data, such as CytoCommunity²⁸ and Spatial-LDA²⁹, rely on well-annotated cell types and use only the neighborhood composition of cells to identify cell niches. However, these methods may miss niches located in spatially specific regions, such as the tumor-immune interface, where both tumor and immune cells exhibit altered molecular profiles¹³. Overall, the effectiveness of these methods suggests that features of both the cells and their microenvironments can help to identify cell niches accurately. However, current methods are generally built on a fixed architecture of feature combinations. This limits users who want to integrate only the specific combinations of features available to them. Moreover, except for a few methods such as BANKSY and CellCharter, most methods have been demonstrated primarily on small datasets (such as spatial transcriptomics data from individual tissue slices). A prominent challenge in the field remains: scaling to large datasets with dozens or hundreds of tissue slices while simultaneously identifying conserved or specific cell niches across these slices.

In this study, we define the features of the cell itself (e.g., its molecular profile) and various features of its microenvironment (e.g., the cellular composition or molecular profiles of its neighborhood) in a unified way, as features from different “views” of the cell. We introduce scNiche, a computational method that leverages these multi-view features to identify and characterize cell niches in tissue. scNiche offers two main advantages over existing methods.

1) Most previous deep learning-based methods run graph neural networks on the spatial graph to integrate molecular profiles. scNiche instead first constructs a separate graph for the features from each view of the cell, and then uses graph neural networks to integrate these multi-view features into a meaningful joint representation of niches. This framework makes it possible to replace features, or add features from other views, as needed. It can therefore serve as a model paradigm for systematically investigating the optimal combination of multi-view features for niche modeling.

2) Through a batch training strategy, scNiche can scale to large datasets containing millions of cells from a series of tissue slices, making it possible to identify conserved or specific cell niches across multiple slices or samples simultaneously.

We first benchmarked scNiche against existing methods using simulated and biological datasets. We then applied scNiche to a variety of spatial omics datasets from different tissues. These included human triple-negative breast cancer of two archetypical subtypes (mixed and compartmentalized)¹³ and mouse liver under normal and early-onset liver failure states¹⁰. Our aims were to identify patient- or disease-specific cell niches and to characterize and interpret these niches comprehensively, from the perspectives of both cellular composition and molecular expression.

Results

Design concept of scNiche

scNiche is designed to leverage and integrate multi-view features, derived from both the cell itself and its microenvironment, to identify cell niches. By default, scNiche takes single-cell spatial omics data as input and first extracts features from three views for each cell within a pre-defined neighborhood range: the molecular profile of the cell, the molecular profiles of its neighborhood, and the cellular composition of its neighborhood (Fig. 1a). Notably, when scNiche is applied to spatial transcriptomics datasets containing multiple tissue slices, dimensionality reduction and batch correction of the features from the first two views are usually necessary. These steps balance the dimensionality of the different views and eliminate potential batch effects across slices (Methods). Features from other views can also be conveniently added or substituted, such as histological information of cells or the deconvoluted cellular compositions of spots in low-resolution spatial transcriptomics data. This allows a more flexible investigation of the optimal combination of multi-view features for niche modeling. Next, scNiche integrates the multi-view features of the cell into a joint representation (z) using a neural network architecture that couples a multiple graph autoencoder (M-GAE) with a graph fusion network (GFN). Specifically, the M-GAE encodes the complementary information of the multi-view data. The GFN captures the relationships among the graphs from different views and generates a consensus graph encoding global node relationships across all views, which is then fed back into the M-GAE. scNiche also applies a multi-view mutual information maximization (MMIM) module, which makes the joint representation (z) more clustering-friendly by boosting the similarity between the representations of neighboring samples within each view (Fig. 1a and Supplementary Fig. S1).
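The Python sketch below illustrates the three default views by building them for every cell from a k-nearest-neighbor neighborhood. It is a simplified example under stated assumptions, not scNiche's implementation. The neighborhood is assumed to be the k nearest cells by Euclidean distance, excluding the cell itself. The neighborhood molecular profile is assumed to be an unweighted mean. The composition view is the fraction of each cell type among the neighbors. The helper name `extract_three_views` is hypothetical.

```python
import numpy as np


def extract_three_views(coords, expr, cell_types, n_types, k=15):
    """Sketch of the three default per-cell views (hypothetical helper, not scNiche's code).

    coords     : (n_cells, 2) spatial coordinates
    expr       : (n_cells, n_features) molecular profiles, e.g. protein levels or a
                 reduced gene-expression space
    cell_types : (n_cells,) integer cell-type labels in [0, n_types)
    k          : neighborhood size (assumed: k nearest neighbors, cell itself excluded)
    """
    # Brute-force pairwise Euclidean distances: fine for a toy example,
    # a KD-tree or approximate search would be needed for millions of cells.
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)               # a cell is not its own neighbor
    nbrs = np.argsort(dist, axis=1)[:, :k]       # (n_cells, k) neighbor indices

    one_hot = np.eye(n_types)[cell_types]        # (n_cells, n_types)
    return {
        "self": expr,                            # view 1: the cell's own profile
        "nbr_expr": expr[nbrs].mean(axis=1),     # view 2: mean profile of the neighborhood
        "nbr_comp": one_hot[nbrs].mean(axis=1),  # view 3: cell-type fractions in the neighborhood
    }
```

Each view is a matrix with one row per cell. Other views, such as deconvoluted spot compositions, could therefore be added to the returned dictionary in the same way. In scNiche, each view then receives its own graph before fusion.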
Training minimizes a combined loss function comprising the M-GAE reconstruction loss, the graph reconstruction loss, and the mutual information loss (Fig. 1a and Methods). In addition, a batch training strategy enables scNiche to handle large datasets efficiently (Methods). After training, the learned joint representation (z) can be clustered with any unsupervised clustering algorithm, such as k-means or Leiden³⁰, to identify cell niches. Finally, scNiche implements an integrated downstream analytical framework for the comprehensive characterization of the identified cell niches (Fig. 1b and Methods).
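The learned joint representation z can be passed to any clustering algorithm. A minimal NumPy k-means (Lloyd's algorithm with k-means++ seeding) is sketched below. It assumes z is a dense array of shape (n_cells, d). The seeding step also assumes z contains at least `n_niches` distinct points. The function `kmeans` and its parameters are illustrative and not part of scNiche. For real data, a library implementation or Leiden clustering on a neighbor graph of z would normally be preferred.

```python
import numpy as np


def kmeans(z, n_niches, n_iter=100, seed=0):
    """Lloyd's k-means with k-means++ seeding on a joint embedding z of shape (n_cells, d)."""
    rng = np.random.default_rng(seed)
    # k-means++ seeding: each new center is drawn with probability proportional
    # to the squared distance from the nearest center chosen so far.
    centers = [z[rng.integers(len(z))]]
    for _ in range(1, n_niches):
        d2 = ((z[:, None, :] - np.array(centers)[None, :, :]) ** 2).sum(-1).min(axis=1)
        centers.append(z[rng.choice(len(z), p=d2 / d2.sum())])
    centers = np.array(centers, dtype=float)

    labels = np.full(len(z), -1)
    for _ in range(n_iter):
        # Assignment step: each cell goes to its closest center.
        d2 = ((z[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        new_labels = d2.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break                                   # assignments unchanged: converged
        labels = new_labels
        # Update step: move each center to the mean of its cells (skip empty clusters).
        for j in range(n_niches):
            if np.any(labels == j):
                centers[j] = z[labels == j].mean(axis=0)
    return labels, centers
```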

Multi-view feature fusion improves the accuracy of cell niche identification

We first evaluated the performance of scNiche on simulated datasets generated by scCube³¹, in which heterogeneity in both the cellular composition and the gene expression of cell niches was taken into account. The cell niches in each simulated dataset also varied in spatial continuity and compositional complexity, mimicking the cellular microenvironments of different tissues (Supplementary Fig. S2a and Methods). Ten existing methods (DR-SC¹⁸, BASS¹⁹, UTAG²¹, CellCharter²², BANKSY²³, SpaGCN²⁴, STAGATE²⁵, GraphST²⁶, SpaceFlow²⁷, and CytoCommunity²⁸) were selected for comparison. Two evaluation metrics, the adjusted Rand index (ARI) and the macro-F1 score, were used to assess the accuracy of identifying the true cell niches. As shown in Supplementary Fig. S2b, scNiche outperformed the other methods in identifying cell niches. Its performance was nearly unaffected by the spatial continuity or compositional complexity of the niches (Supplementary Fig. S2c–e).
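Both evaluation metrics compare a predicted niche partition with the ground-truth niches. The Python sketch below computes ARI from the contingency table, and computes macro-F1 after mapping predicted clusters onto true niches. The matching step is an assumption. Here we brute-force the one-to-one mapping that maximizes macro-F1, which is only practical for a handful of niches; a Hungarian assignment (e.g. `scipy.optimize.linear_sum_assignment`) scales better. The exact matching used in the benchmark may differ. Function names are illustrative.

```python
from itertools import permutations
from math import comb

import numpy as np


def adjusted_rand_index(truth, pred):
    """ARI between ground-truth niches and predicted clusters (label names are irrelevant)."""
    _, t = np.unique(truth, return_inverse=True)
    _, p = np.unique(pred, return_inverse=True)
    table = np.zeros((t.max() + 1, p.max() + 1), dtype=int)
    np.add.at(table, (t, p), 1)                        # contingency table n_ij

    def pairs(counts):
        return sum(comb(int(c), 2) for c in np.ravel(counts))

    sum_ij = pairs(table)
    sum_a, sum_b = pairs(table.sum(axis=1)), pairs(table.sum(axis=0))
    expected = sum_a * sum_b / comb(len(t), 2)
    max_index = (sum_a + sum_b) / 2
    return (sum_ij - expected) / (max_index - expected)


def macro_f1(truth, pred):
    """Macro-F1 after the one-to-one cluster-to-niche mapping that maximizes it.

    Assumption: at most as many predicted clusters as true niches; the brute-force
    search over mappings is only practical for a small number of niches.
    """
    truth, pred = np.asarray(truth), np.asarray(pred)
    true_labels, pred_labels = np.unique(truth), np.unique(pred)
    if len(pred_labels) > len(true_labels):
        raise ValueError("more predicted clusters than true niches")
    best = 0.0
    for perm in permutations(true_labels, len(pred_labels)):
        mapping = dict(zip(pred_labels.tolist(), perm))
        mapped = np.array([mapping[x] for x in pred.tolist()])
        f1s = []
        for c in true_labels:
            tp = np.sum((mapped == c) & (truth == c))
            fp = np.sum((mapped == c) & (truth != c))
            fn = np.sum((mapped != c) & (truth == c))
            f1s.append(2 * tp / (2 * tp + fp + fn) if tp else 0.0)
        best = max(best, float(np.mean(f1s)))
    return best
```

ARI is invariant to how clusters are named, so a prediction that only permutes the labels of the ground truth scores 1. Macro-F1 weights every niche equally, so small niches such as the marginal zone count as much as large ones.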

We next assessed the performance of scNiche under degraded data quality using two simulation scenarios. In the first, we randomly set the expression values of a certain proportion of genes to 0 (gene expression dropout). In the second, we randomly changed the cell annotation labels of a certain proportion of cells to “ambiguous” (cell annotation dropout) (Methods). As expected, the accuracy of all methods dropped as data quality degraded (Supplementary Fig. S3a, b). In the first scenario, scNiche performed relatively stably at lower gene expression dropout rates. At higher dropout rates, however, the performance of scNiche and all other methods declined dramatically (Supplementary Fig. S3a). In the second scenario, scNiche was more robust than CytoCommunity as the cell annotation dropout rate increased. This suggests that scNiche's multi-view feature fusion may mitigate the impact of ambiguous cell annotations by also drawing on cellular gene expression information (Supplementary Fig. S3b).
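The two degradation scenarios can be mimicked with a few lines of NumPy. This sketch rests on an assumption: we read “a certain proportion of genes” as whole gene columns zeroed in every cell, although a per-entry mask would be an equally plausible reading. The function names are hypothetical and are not taken from scNiche or scCube.

```python
import numpy as np


def gene_expression_dropout(expr, rate, seed=0):
    """Set the values of a random fraction of genes (columns) to 0 in every cell."""
    rng = np.random.default_rng(seed)
    n_genes = expr.shape[1]
    dropped = rng.choice(n_genes, size=int(round(rate * n_genes)), replace=False)
    out = expr.copy()                 # keep the original matrix untouched
    out[:, dropped] = 0
    return out


def annotation_dropout(labels, rate, seed=0):
    """Relabel a random fraction of cells as "ambiguous"."""
    rng = np.random.default_rng(seed)
    out = np.array(labels, dtype=object)      # object dtype avoids truncating longer strings
    idx = rng.choice(len(out), size=int(round(rate * len(out))), replace=False)
    out[idx] = "ambiguous"
    return out
```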

We also conducted ablation studies on each of the three default input views and on each model component of scNiche to assess their individual contributions. For the views, as shown in Supplementary Tables 1–2, scNiche outperformed all of its derivatives, each of which excludes the features from one specific view. This indicates that features from all three views contribute to the accurate identification of cell niches. Furthermore, scNiche performed better than using the features from any single view alone, and its model-based feature fusion strategy was superior to simple concatenation of the features from different views. For the model components, as expected, scNiche could not effectively encode the complementary information from multiple views when the M-GAE or GFN component was removed, and therefore failed to model the cellular microenvironment accurately (Supplementary Table 3). In addition, the performance of scNiche-w/o MMIM declined compared with scNiche, suggesting that the MMIM component contributes to learning more discriminative joint representations (Supplementary Table 3).

In summary, the performance evaluation on the simulated datasets demonstrated that scNiche can effectively integrate information from different views of the cell and holds the potential to identify cell niches accurately.

Performance evaluation of scNiche on mouse spleen CODEX dataset

Real spatial omics data provide a better test of scNiche than simulated data, because the cell niches therein are objectively present and biologically meaningful. We therefore first applied scNiche to a mouse spleen spatial proteomics dataset generated by co-detection by indexing (CODEX) technology¹². The compartment label of each cell from three wild-type spleen samples (BALBc-1, BALBc-2, and BALBc-3) was provided and can be regarded as the ground truth of niches (Fig. 2a). We first evaluated scNiche's batch training strategy on this dataset. As shown in Supplementary Fig. S4, scNiche maintained relatively stable performance under different batch number settings without requiring additional training epochs. Furthermore, scNiche's performance in identifying cell niches across multiple slices was comparable to that obtained using a single slice, consistent with previous findings³² (Supplementary Fig. S5). In the benchmark, scNiche outperformed the other methods on both evaluation metrics. It also accurately identified the marginal zone (a unique cell niche located on the periphery of the B cell follicle) in all three samples (Fig. 2b, c).

**Fig. 2: Performance evaluation of scNiche on spatial proteomics datasets.**

Performance evaluation of scNiche on human upper tract urothelial carcinoma IMC dataset

To further evaluate its performance, we also applied scNiche to another spatial proteomics dataset, from human upper tract urothelial carcinoma (UTUC), generated by Imaging Mass Cytometry (IMC)³³. In this dataset, 16 images had been manually annotated with tumor and stroma boundaries, which can be regarded as the ground truth of niches²¹ (Fig. 2d and Supplementary Fig. S6). Distinguishing tumor and stroma niches in this dataset is a relatively simple task, and all methods performed comparatively well in most samples (Fig. 2e, f and Supplementary Figs. S6–S7). Even so, scNiche demonstrated the best overall performance across all 16 samples and resolved the fine structure of the tumor-stroma boundaries in some samples, such as PM57_B8-01. Furthermore, at a higher clustering granularity, scNiche identified more fine-grained niches, including several distinct immune-enriched and tumor-enriched niches (Supplementary Fig. S8).

Because the original authors also provided finer subpopulation annotations for each cell, we further evaluated the robustness of scNiche to the granularity of cell population annotation. As illustrated in Supplementary Fig. S9, scNiche continued to identify tumor and stromal niches accurately when using the refined annotations. It consistently outperformed CytoCommunity, which also used the refined annotations.

Performance evaluation of scNiche on mouse brain spatial transcriptomics datasets

We further evaluated the performance of scNiche on two additional mouse brain single-cell spatial transcriptomics datasets with more complex niche structures generated by different technologies. Specifically, the STARmap dataset³⁴ contains one tissue slice from the mouse V1 neocortex and the MERFISH dataset³⁵ contains 31 tissue slices from the mouse frontal cortex and striatum. The brain region label of each cell was manually annotated in both datasets and can be regarded as the ground truth of niches (Fig. 3a, d). Consistent with the results on spatial proteomics datasets, scNiche also demonstrated superior overall performance compared with other methods on these two additional spatial transcriptomics datasets (Fig. 3b, c, e, f and Supplementary Fig. S10-11), suggesting the general applicability of scNiche in accurately identifying cell niches from different spatial omics data.

**Fig. 3: Performance evaluation of scNiche on spatial transcriptomics datasets.**

Performance evaluation of scNiche on low-resolution spatial transcriptomics dataset

We explored whether scNiche could be applied to spatial transcriptomics data generated by lower-resolution platforms such as ST³⁶ and 10X Visium, using the human DLPFC 10X Visium dataset³⁷. We used the human middle temporal gyrus (MTG) scRNA-seq data from Hodge et al.³⁸, which contains 75 transcriptomically distinct cell types, as the single-cell reference. We then deconvoluted the spots of each DLPFC slice using Cell2location³⁹ (Supplementary Fig. S12a). The deconvolution results replaced the neighborhood cellular composition feature used for single-cell spatial omics data. They were input into scNiche together with the features from the remaining two views (the molecular profiles of the spot and of its neighborhood). We selected four tissue slices from the same donor, in which the cortical layer label of each spot was manually annotated and can be regarded as the ground truth of niches (Supplementary Fig. S12b). As illustrated in Supplementary Fig. S12c, d, scNiche was not originally designed for spot-based spatial transcriptomics data, yet its modified version still performed comparably to some state-of-the-art methods on the four DLPFC slices.

Scalability analysis of scNiche to large datasets

Beyond the spatial transcriptomics and spatial proteomics datasets used in the benchmarking studies, we tested the scalability of scNiche and other methods on a much larger mouse whole brain MERFISH dataset generated by Zhang et al.⁴⁰. As shown in Supplementary Fig. S13, only four methods could scale to this dataset of more than 3 million cells: scNiche, BANKSY, UTAG, and CellCharter. On this dataset, scNiche identified 14 cell niches based on cluster stability, and these niches were aligned across sequential tissue sections (Supplementary Fig. S14). We also selected four representative sections corresponding to different brain regions (C57BL6J-1.032, C57BL6J-1.056, C57BL6J-1.081, and C57BL6J-1.136). We then compared the cell niches identified by each method with the anatomical regions from the Allen Mouse Brain Reference Atlas⁴¹. As shown in Fig. 4a, b, the niches identified by scNiche accurately corresponded to different structures in the mouse brain. In contrast, the niches identified by UTAG and by the nonspatial method lacked clear spatial separation. BANKSY and CellCharter failed to distinguish certain brain regions, such as the hypothalamus and striatum in the C57BL6J-1.056 section (Fig. 4c). In summary, these results suggest that scNiche can scale to large datasets while maintaining good performance.

**Fig. 4: Performance evaluation of scNiche on mouse whole brain MERFISH dataset.**

Robustness analysis of scNiche

We first evaluated the robustness of scNiche to the size of the pre-defined neighborhood range on four different datasets. As shown in Supplementary Figs. S15–S16, the performance of scNiche was stable across different numbers of k-nearest neighbors. Indeed, previous studies have shown that moderately changing the number of k-nearest neighbors does not significantly degrade method accuracy. Nevertheless, given the complexity of tissues, it may still be necessary to determine an appropriate neighborhood size empirically in practical analyses^28,42,43,44. We also evaluated the sensitivity of scNiche to the choice of random seed. scNiche was relatively robust to different random seeds compared with other methods such as UTAG, SpaGCN, and DR-SC (Supplementary Fig. S17).

scNiche deciphers the cell niches in different subtypes and patients of human triple-negative breast cancer

Mounting evidence and new therapeutic strategies have demonstrated that the tumor microenvironment plays a pivotal role in the initiation and progression of cancer, opening new opportunities for diagnosis and therapy^45,46,47,48,49. Here, we applied scNiche to a human triple-negative breast cancer (TNBC) dataset generated by multiplexed ion beam imaging by time-of-flight (MIBI-TOF)¹³. Studies have shown that TNBC exhibits three archetypical subtypes based on different tumor-immune interactions: cold, mixed, and compartmentalized. The cold subtype is characterized by low immune infiltration and is easily distinguished. The mixed and compartmentalized subtypes, in contrast, may contain similar numbers of immune cells while differing in the spatial organization and degree of mixing of tumor and immune cells¹³. We therefore used scNiche to decipher the tumor microenvironment of these two TNBC subtypes, analyzing a total of 173,205 cells from 19 mixed-subtype samples and 15 compartmentalized-subtype samples.

Based on the cluster stability criterion proposed by Varrone et al.²² (Supplementary Fig. S18), scNiche identified 13 cell niches. These broadly manifested as tumor-enriched niches (Niches 1, 10, 2, 9, 3, 5, and 11) and immune-enriched niches (Niches 7, 6, 4, 0, 8, and 12), the latter characterized by distinct combinations of immune and stromal cell types (Fig. 5a and Supplementary Figs. S19, S20). Comparing the enriched cell niches between the two TNBC subtypes, we found that the tumor-enriched niches were predominantly enriched in mixed-subtype samples. The immune-enriched niches, in contrast, were more prevalent in compartmentalized-subtype samples (Fig. 5a). This finding is consistent with previous studies^13,50,51 showing that immune cells in the mixed subtype are more extensively mixed with tumor cells than in the compartmentalized subtype, and are therefore less likely to form spatially separate regions. Furthermore, individual niches tended to be shared by only a small number of specific patients (Supplementary Fig. S21a), reflecting the inter-patient heterogeneity of the tumor microenvironment⁵².
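The number of niches was chosen with the cluster stability criterion of Varrone et al.²². The sketch below is a simplified stand-in for that idea, not the published procedure. For each candidate number of clusters k, it repeats k-means with different random seeds and reports the mean pairwise ARI between runs, so a k whose partitions are reproducible scores close to 1. The function names, the use of k-means, and the five runs per k are assumptions for illustration.

```python
from itertools import combinations
from math import comb

import numpy as np


def ari(a, b):
    """Adjusted Rand index between two label vectors."""
    _, a = np.unique(a, return_inverse=True)
    _, b = np.unique(b, return_inverse=True)
    table = np.zeros((a.max() + 1, b.max() + 1), dtype=int)
    np.add.at(table, (a, b), 1)

    def pairs(counts):
        return sum(comb(int(c), 2) for c in np.ravel(counts))

    s_ij, s_a, s_b = pairs(table), pairs(table.sum(axis=1)), pairs(table.sum(axis=0))
    expected = s_a * s_b / comb(len(a), 2)
    return (s_ij - expected) / ((s_a + s_b) / 2 - expected)


def kmeans_pp(z, k, seed, n_iter=50):
    """Compact k-means with k-means++ seeding; returns cluster labels."""
    rng = np.random.default_rng(seed)
    centers = [z[rng.integers(len(z))]]
    for _ in range(1, k):
        d2 = ((z[:, None] - np.array(centers)[None]) ** 2).sum(-1).min(axis=1)
        centers.append(z[rng.choice(len(z), p=d2 / d2.sum())])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        labels = ((z[:, None] - centers[None]) ** 2).sum(-1).argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = z[labels == j].mean(axis=0)
    return labels


def cluster_stability(z, ks, n_runs=5):
    """For each candidate k, mean pairwise ARI between k-means runs with different seeds."""
    scores = {}
    for k in ks:
        runs = [kmeans_pp(z, k, seed) for seed in range(n_runs)]
        scores[k] = float(np.mean([ari(r1, r2) for r1, r2 in combinations(runs, 2)]))
    return scores
```

In this sketch, one would retain the k with the highest score and break ties by inspection.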

**Fig. 5: scNiche deciphers the cellular compositional and phenotypic heterogeneity between immune niches from the human TNBC dataset.**

The 6 immune-enriched niches showed distinct cellular compositions, corresponding to distinct microenvironments. Niche 7 exhibited significant enrichment of B cells, CD4 T cells, CD8 T cells, dendritic cells, and NK cells, and may represent the tertiary lymphoid structure (TLS)^53,54. Niche 8 was enriched with endothelial cells and mesenchymal-like cells, potentially representing the stromal microenvironment of the tumor (Supplementary Fig. S21b). Two spatially adjacent cell niches, Niche 4 and Niche 0, co-existed in a specific subset of patients (Patients 3, 4, 5, 9, and 27). They were enriched with lymphoid immune cells (such as CD4 T cells) and other immune cells, respectively (Fig. 5b and Supplementary Fig. S21c). Cells from the same lineage have been reported to be more spatially proximate to one another. These two cell niches may therefore reflect the diversity of immune responses to tumors, whereby specific immune cells are recruited to tumor sites via specific mechanisms or local environments^13,45,55,56.
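Statements such as “Niche 7 exhibited significant enrichment of B cells” rest on comparing a niche's cell-type composition with that of the whole tissue. A minimal sketch of this comparison is the log2 ratio of observed to overall fractions, shown below. The function name and pseudocount are assumptions. The significance test used in the paper is not described in this section and is omitted.

```python
import numpy as np


def niche_celltype_enrichment(niches, cell_types, pseudocount=1e-9):
    """log2 ratio of a cell type's fraction within each niche to its fraction overall.

    Returns (niche_ids, type_ids, matrix); matrix[i, j] > 0 means cell type j is
    over-represented in niche i relative to the whole dataset.
    """
    niche_ids, n_idx = np.unique(niches, return_inverse=True)
    type_ids, t_idx = np.unique(cell_types, return_inverse=True)
    counts = np.zeros((len(niche_ids), len(type_ids)))
    np.add.at(counts, (n_idx, t_idx), 1)
    within = counts / counts.sum(axis=1, keepdims=True)   # composition of each niche
    overall = counts.sum(axis=0) / counts.sum()           # composition of the whole dataset
    # The pseudocount keeps absent cell types finite (very negative) instead of -inf.
    return niche_ids, type_ids, np.log2((within + pseudocount) / overall)
```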

We also noticed that the two macrophage-enriched niches resolved by scNiche (Niche 6 and Niche 12) were mainly present in different TNBC subtypes and consisted of macrophages with distinct phenotypes (Fig. 5c). Macrophages within both niches consistently expressed classical monocyte markers such as CD68 and CD63. Those in Niche 6, however, also displayed increased expression of CD11b, CD11c, immune regulation proteins, and antigen presentation proteins, suggesting that they were myeloid-derived suppressor cells⁵⁷ (Fig. 5d). This phenotypic divergence among cells of the same lineage was observed across the entire patient cohort and may be related to the microenvironments in which the cells reside. We found that macrophages in Niche 6 tended to be in closer spatial proximity to other immune cell types, whereas macrophages in Niche 12 were more likely to co-localize with tumor cells (Fig. 5e). Indeed, Niche 6 may represent a unique niche at the tumor-immune border¹³. Its altered phenotypes relative to Niche 12 arise from changes in the expression profiles of all cell types rather than of a particular cell population (Fig. 5f, g and Supplementary Fig. S22).

Further cell population enrichment analysis of the 7 tumor-enriched niches revealed more subtle compositional differences among them. For instance, Niche 5 and Niche 11, characterized by the enrichment of Keratin⁺ tumor cells, exhibited extremely low immune or stromal cell infiltration. The other tumor-enriched niches, in contrast, varied in the types of infiltrating cells and in the degree of infiltration (Fig. 6a, b). The cellular compositions of the immune-exhausted tumor niches (Niche 5 and Niche 11) were similar, but the tumor cells within these niches differentially expressed tumor-related proteins, including cytokeratin 6 (CK6) and CK17 (Fig. 6c). Interestingly, Niche 5 and Niche 11 were identified in different patients. They may therefore represent patient-specific niches composed of cells in distinct cellular states⁵⁸ (Fig. 6d). Furthermore, survival analysis on public datasets suggested that the differences between these two niches may also reflect phenotypic differences between patients (Supplementary Fig. S23). In the other, infiltrated tumor-enriched niches, tumor cells varied in the expression of other types of proteins, which may be associated with the specific infiltrating cell types. For instance, tumor cells in Niche 9 showed increased expression of antigen presentation proteins such as HLA1 and HLA-DR. This suggests local production of cytokines such as IFN-γ, induced by extensive immune cell infiltration^59,60 (Fig. 6e, f). Similarly, tumor cells in Niche 3, the tumor-enriched niche infiltrated by stromal cells, displayed high expression of stromal-related proteins such as SMA and vimentin. Such expression may indicate invasion and metastasis and is often associated with poor prognosis^61,62,63 (Fig. 6g, h).

**Fig. 6: scNiche reveals tumor niches characterized by distinct phenotypes from the human TNBC dataset.**

Together, these results highlight the important influence of the microenvironment on cellular phenotypes. They also demonstrate the ability of scNiche to reveal both the compositional and the phenotypic heterogeneity of cell niches.

scNiche characterizes the cell niches in normal and disease mouse livers

To further demonstrate the applicability of scNiche to other types of spatial omics data, we next applied it to a mouse liver spatial transcriptomics dataset generated by Cho et al.¹⁰ with the Seq-Scope technology. We analyzed a total of 37,505 cells from 6 normal donors and 4 donors with early-onset liver failure induced by excessive mTORC1 signaling⁶⁴ (Tsc1^Δhep/Depdc5^Δhep, or TD model). The high-dimensional spatial transcriptomics data showed significant batch effects between normal and TD livers. We therefore first used scVI⁶⁵ for dimensionality reduction and batch effect removal before applying scNiche (Supplementary Fig. S24).

Based on cluster stability (Supplementary Fig. S25), scNiche identified 15 cell niches. Most were specifically enriched in either normal or TD livers, potentially reflecting distinct physiological states (Fig. 7a and Supplementary Figs. S26, S27a). The 7 cell niches enriched in normal livers (Niches 0, 3, 12, 5, 14, 11, and 1) exhibited spatial continuity, spanning the zonation pattern from the central vein to the portal node (Fig. 7b). For example, Niche 0 (enriched with pericentral hepatocytes) and Niche 1 (enriched with periportal hepatocytes) were located in the pericentral and periportal zones, respectively. The other 5 niches were situated in the transition zones between them and were characterized by varying enrichment of different hepatocyte subtypes and non-parenchymal cell types (Fig. 7c). Moreover, the differentially expressed genes (adjusted p-value < 0.05) of these 7 niches also showed a pronounced zonal spatial expression pattern (Fig. 7d). From Niche 0 to Niche 1, we observed a gradual decrease in the expression of pericentral genes^66,67,68,69 such as Cyp2e1, Cyp1a2, Mup17, Gsta3, and Gulo. Conversely, the expression of periportal genes^66,67,68,69,70 such as Ass1, Alb, Cyp2f2, Sds, Hsd17b13, and Mup20 gradually increased (Fig. 7e and Supplementary Fig. S27b). We also found that some specific genes exhibited non-monotonic expression patterns across the 7 consecutive niches. For example, the hepcidin-encoding genes Hamp and Hamp2⁶⁶ show non-monotonic zonation patterns that peak at intermediate lobule layers. They were highly expressed in Niche 5 and Niche 12, respectively (Fig. 7e). Additional non-monotonic genes, such as Cyp8b1⁶⁶ and Apoc1^66,71, were highly expressed in Niche 3 and Niche 5, respectively (Supplementary Fig. S27b).
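Differentially expressed genes are reported at an adjusted p-value below 0.05, but this section does not name the correction. The sketch below assumes the common Benjamini-Hochberg false discovery rate procedure. The function name is hypothetical.

```python
import numpy as np


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values (false discovery rate control)."""
    p = np.asarray(pvals, dtype=float)
    m = len(p)
    order = np.argsort(p)
    ranked = p[order] * m / np.arange(1, m + 1)            # p_(i) * m / i
    # Enforce monotonicity from the largest p-value downwards, then cap at 1.
    adjusted = np.minimum.accumulate(ranked[::-1])[::-1]
    out = np.empty(m)
    out[order] = np.minimum(adjusted, 1.0)                 # restore the input order
    return out
```

A gene would then be called differentially expressed when its adjusted value is below 0.05.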
The gene expression signature scores of KEGG pathways for the 7 niches also largely recapitulated previous zonation studies^66,67. Cells in Niche 0 had higher scores of drug metabolism, primary bile acid biosynthesis, and metabolism of xenobiotics pathways, while cells in Niche 1 had higher scores of oxidative phosphorylation, gluconeogenesis, and complement and coagulation cascades pathways (Fig. 7f). Overall, these results suggested that scNiche can precisely reveal the zonation profiles of normal livers.
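The section does not specify how pathway signature scores are computed. One simple scheme, assumed here purely for illustration, z-scores each gene of a KEGG gene set across cells and averages the z-scores per cell. It then averages the per-cell scores within each niche. Tools such as Scanpy instead subtract a background of control genes, so values from this sketch are not interchangeable with the paper's scores. Function names are hypothetical.

```python
import numpy as np


def signature_score(expr, gene_names, gene_set):
    """Per-cell score: mean z-scored expression over the genes of a pathway.

    expr: (n_cells, n_genes) normalized expression; gene_names: column names.
    Genes of the set that are absent from the data are skipped.
    """
    wanted = set(gene_set)
    idx = [i for i, g in enumerate(gene_names) if g in wanted]
    if not idx:
        raise ValueError("none of the signature genes are measured")
    sub = np.asarray(expr, dtype=float)[:, idx]
    sd = sub.std(axis=0)
    sd[sd == 0] = 1.0                  # constant genes contribute 0 instead of NaN
    z = (sub - sub.mean(axis=0)) / sd
    return z.mean(axis=1)


def mean_score_per_niche(scores, niches):
    """Average the per-cell scores within each niche."""
    niches = np.asarray(niches)
    return {n: float(scores[niches == n].mean()) for n in np.unique(niches).tolist()}
```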

**Fig. 7: scNiche accurately identifies the zonation patterns from the central vein to the portal node in the normal mouse liver.**

To further compare niches between normal and TD livers, we also investigated the cell niches enriched in TD livers. scNiche revealed three niches unique to TD livers: Niche 4, Niche 9, and Niche 7. These niches were distributed from the core to the periphery of the injury and inflammation sites and were characterized by the enrichment of several emerging cell populations, including inflamed macrophages, hepatic progenitor cells (HPC), activated hepatic stellate cells (HSC-A), and injured hepatocytes (Figs. 7a, 8a). Differential expression analysis showed that these three niches up-regulated a range of injury-associated genes, each induced by a different cell population, possibly reflecting the distinct responses of different cell populations to liver injury as previously reported¹⁰ (Fig. 8b and Supplementary Fig. S27c). For example, injured hepatocytes highly expressed the serum amyloid A-encoding genes Saa1 and Saa2, and HPC highly expressed Spp1, both of which have been associated with the injury response^72,73, whereas inflamed macrophages and HSC-A highly expressed pro-inflammatory markers (Cd74 and MHC-II components) and fibrosis markers (Acta2 and collagens), respectively. Consistent with these up-regulated markers, the inflammatory cell infiltration and fibrosis⁷⁴ signature scores of Niche 4, Niche 9, and Niche 7 were also higher than those of other niches (Fig. 8c). Overall, these results suggest that these three niches identified by scNiche reflect the specific microenvironment associated with liver injury.

**Fig. 8: scNiche uncovers disease-specific liver injury niches and delineates the niche remodeling from normal liver to liver failure.**

Furthermore, scNiche uncovered a partial remodeling of the zonation pattern from the central vein to the portal node in TD livers compared with normal livers. Specifically, the niches proximal to the central vein in normal livers (Niche 0, 3, and 12) were also present in TD livers, whereas the niches proximal to the portal node were altered, despite no significant changes in their cellular composition (Fig. 8d and Supplementary Fig. S27d). These results indicate specific phenotypic changes in the niches proximal to the portal node during liver injury, which were captured by scNiche. We further performed differential expression analysis between Niche 1 and Niche 10, which were located at the portal node in normal and TD livers, respectively. Genes up-regulated in Niche 10 relative to Niche 1 included antioxidant genes such as Gpx3, Gsta1, Gsta2, and Gsto1; the cathepsin-encoding genes Ctsl and Ctsb, which have been reported to potentially mediate liver fibrosis^75,76,77; and several cytochrome P450 genes whose expression increases during liver injury and fibrosis, including Cyp17a1⁷⁸, Cyp2b9^79,80, and Cyp2b10^64,79. In contrast, major urinary protein-encoding genes, carboxylesterase-encoding genes, and Traf5, a protective factor in liver inflammation and hepatic steatosis^81,82, were down-regulated in Niche 10 (Supplementary Fig. S27e). Consistent with these results, gene set enrichment analysis (GSEA) confirmed that the genes up-regulated in Niche 10 were enriched in mTORC1 signaling, interferon response, inflammatory response, fibrosis-related, and apoptosis pathways (Fig. 8e). In particular, mTORC1 signaling, a key pathway in liver homeostasis, metabolism, transplantation, and regeneration^83,84,85, has been reported to induce pronounced and extensive hepatocyte damage when activated^64,86,87.
CellChat⁸⁸ analysis also confirmed enhanced inflammation- and fibrosis-related interactions between cells in Niche 10 compared with Niche 1 (Supplementary Fig. S28). Interestingly, the differentially expressed genes between Niche 1 and Niche 10 that reflect phenotypic changes during liver injury also exhibited spatial expression gradients similar to those of pericentral or periportal markers (Fig. 8f), suggesting that scNiche can also resolve subtle spatial variation across niches in both normal and disease states.

Discussion

In this study, we have presented a computational framework, scNiche, for identifying and characterizing cell niches from spatial omics data at single-cell resolution. scNiche uses graph neural networks differently from previous deep learning-based methods, which typically run them on a single spatial graph that combines the molecular profiles of cells with their spatial information. Instead, scNiche first constructs a separate graph for the features of each view of the cell and then integrates these graphs with a multiple graph autoencoder coupled with a graph fusion network. This approach provides greater flexibility in niche modeling while more comprehensively capturing the common and complementary information from multiple views of cells. Additionally, scNiche applies a multi-view mutual information maximization module to guide the learning of more discriminative and clustering-friendly joint representations. Benchmarking studies demonstrated the superior performance of scNiche compared with other existing methods on different spatial omics datasets, including spatial transcriptomics and spatial proteomics.

The batch training strategy of scNiche allows it to scale to large spatial omics datasets containing multiple samples under different conditions and to identify homogeneous or heterogeneous cell niches across samples without compromising accuracy. Our results on the mouse whole brain dataset of over 3 million cells illustrate this potential. Moreover, the results on the human TNBC dataset and the mouse liver dataset demonstrate the versatility of scNiche in identifying refined patient- or disease-specific cell niches. In the former analysis, we deciphered the heterogeneity of cell niches within different TNBC subtypes and identified patient-specific niches with distinct phenotypic characteristics. In the latter analysis, we discovered three liver injury-associated niches characterized by the enrichment and co-localization of inflamed macrophages, HSC-A, HPC, and injured hepatocytes, and revealed the specific remodeling of niches located proximal to the portal node during liver injury.

scNiche also implements an integrated downstream analytical framework for characterizing and interpreting the identified cell niches. Its enrichment analysis framework characterizes niches from various perspectives, including cellular composition, condition, and sample. Its multi-sample analysis framework supports differential analyses at the sample level, such as comparing specific niches across conditions or specific cell populations across niches; this holds promise for identifying clinically relevant niches or cell populations in large datasets while limiting the influence of individual outliers. In addition, owing to its modular architecture, scNiche can be readily combined with other computational tools. For example, in the analysis of the mouse liver dataset, we first applied scVI⁶⁵ to remove batch effects before using scNiche to identify cell niches. Similarly, in the downstream analysis, we performed spatial connectivity analysis among niches with Squidpy⁸⁹ in addition to the workflow provided by scNiche itself.

We also have some additional considerations regarding the “cellular compositions of neighborhoods” view. First, the ablation results on the simulated datasets indicated that this view contributed less to accurate niche identification than the other two views (Supplementary Table 1). This is expected: because cell types are typically inferred from the molecular profiles of cells, the “cellular compositions of neighborhoods” view may simply be a coarser version of the “molecular profiles of neighborhoods” view. Nevertheless, the performance of scNiche consistently declined on the simulated datasets as well as on other biological datasets when this view was removed (Supplementary Fig. S29), suggesting that this view, as an expert-based feature, may help identify niches more accurately by reducing potential noise in the original molecular profiles. Second, although our results on both the simulated datasets and the human UTUC IMC dataset showed that scNiche was relatively robust to the dropout and granularity of cell types (Supplementary Fig. S3b, 9b), the quality of the cell type labels still needs to be assessed to avoid introducing additional noise when integrating the multi-view features. For example, accurate expert-annotated or expert-verified cell type labels typically provide more useful information than annotations derived solely from a clustering algorithm. Finally, cell type labels are usually unavailable for spatial transcriptomics data generated by lower-resolution platforms such as ST³⁶ and 10X Visium. In such cases, features from other views can be used as substitutes, such as the cell type deconvolution results of spots (which can be inferred by various spot deconvolution methods^39,90,91) or histological information extracted from H&E staining images.
Benchmarking on the human DLPFC 10X Visium dataset suggested that scNiche, using the spot deconvolution results inferred by Cell2location³⁹ as alternative inputs, performed comparably to other state-of-the-art methods (Supplementary Fig. S12). In practice, however, users should test different deconvolution methods to obtain optimal niche identification results. Note also that scNiche may still be unable to resolve sufficiently fine-grained cellular microenvironments in low-resolution spatial transcriptomics data because of the resolution limits of the spot arrays. An alternative strategy is to first apply single-cell spatial mapping^92,93,94 or reconstruction^95,96 methods to assign spatial coordinates to cells and then apply scNiche to the reconstructed spatially resolved single-cell data, which may help overcome the inherent limitations of the technical platforms.

Finally, the way scNiche models features from different views of the cell opens further avenues for extension, such as application to spatial multi-omics data. We tested this on a postnatal day 22 (P22) mouse brain coronal section dataset generated by Zhang et al.⁹⁷, which includes RNA-seq and CUT&Tag (targeting the histone modification H3K27ac, acetylated histone H3 Lys27) modalities. As shown in Supplementary Fig. S30, scNiche achieved clearer brain region identification than the single-modality results provided by the original authors. In summary, scNiche offers an accurate and scalable approach to identify and characterize cell niches in tissues, with great potential for extension to larger and more complex datasets.

Methods

Data collection and preprocessing

Two multi-condition scRNA-seq datasets used to construct simulated data with scCube: the human PBMC dataset⁹⁸ (eight control vs. eight IFN-β-treated samples) and the mouse cortex dataset⁹⁹ (four vehicle vs. four LPS-treated samples) were downloaded using the ExperimentHub¹⁰⁰ R package. We performed normalization and principal component analysis (PCA) dimensionality reduction on the data using the scanpy¹⁰¹ Python package (version 1.9.1) before running scNiche.

Mouse spleen CODEX dataset¹²: Raw data were downloaded from https://data.mendeley.com/datasets/zjnpwh8m5b/1. The compartment labels of all cells from three wild-type spleen samples (BALBc-1, BALBc-2, and BALBc-3) were downloaded from https://github.com/huBioinfo/CytoCommunity. We did not perform the dimensionality reduction step and retained all proteins for running scNiche.

Human upper tract urothelial carcinoma (UTUC) IMC dataset³³: Processed h5ad files that contain raw data were downloaded from https://doi.org/10.5281/zenodo.6376766. A total of 16 images with manually annotated topological domain labels were utilized. We did not perform the dimensionality reduction step and retained all proteins for running scNiche.

Mouse V1 neocortex STARmap dataset³⁴: Raw data were downloaded from https://zenodo.org/record/7830764#.ZDpObi-1HUI. One slice replicate was utilized. We performed normalization and PCA dimensionality reduction on the data using the scanpy¹⁰¹ Python package (version 1.9.1) before running scNiche.

Mouse frontal cortex and striatum MERFISH dataset³⁵: Processed h5ad files were downloaded from CELLxGENE (https://cellxgene.cziscience.com/collections/31937775-0602-4e52-a799-b6acdd2bac2e). A total of 31 tissue slices were utilized. We retained 7 major niches shared across all slices: striatum, cortical layer VI, cortical layer V, corpus callosum, cortical layer II/III, olfactory region, and pia mater. In addition, since the data have been normalized by the original authors, we directly performed the PCA dimensionality reduction step using scanpy¹⁰¹ Python package (version 1.9.1) before running scNiche.

Human middle temporal gyrus (MTG) snRNA-seq dataset³⁸: Raw data were downloaded from https://portal.brain-map.org/atlases-and-data/rnaseq/human-mtg-smart-seq, containing 15,928 cells of 75 transcriptomically distinct cell types.

Human dorsolateral prefrontal cortex (DLPFC) 10X Visium dataset³⁷: Raw data were downloaded from http://spatial.libd.org/spatialLIBD/. A total of 4 tissue slices from the same donor (slice 151673, 151674, 151675, 151676) were utilized. Before running scNiche, we performed the normalization step using scanpy¹⁰¹ Python package (version 1.9.1) and subset the data based on the top 2000 highly variable genes (HVGs). Subsequently, we performed the dimensionality reduction and batch effect removal steps using scvi⁶⁵ Python package (version 1.1.2) on the data.

Mouse whole brain (ABCA-1) MERFISH dataset⁴⁰: Processed h5ad files were downloaded from CELLxGENE (https://cellxgene.cziscience.com/collections/0cca8620-8dee-45d0-aef5-23f032a5cf09). A total of 129 coronal sections were utilized after removing the sections that were not registered to the Allen CCFv3⁴¹. Since the data have been normalized by the original authors, we directly performed the PCA dimensionality reduction step using scanpy¹⁰¹ Python package (version 1.9.1) before running scNiche.

Human triple-negative breast cancer (TNBC) MIBI-TOF dataset¹³: Raw data were downloaded from Spatial Omics DataBase¹⁰² (https://gene.ai.tencent.com/SpatialOmics/dataset?datasetID=47). A total of 19 mixed subtype samples and 15 compartmentalized subtype samples were utilized. We did not perform the dimensionality reduction step and retained all proteins for running scNiche.

Mouse liver Seq-Scope dataset¹⁰: Processed RDS files that contain raw gene expression matrix and cell type annotation information were downloaded from Deep Blue Data (https://doi.org/10.7302/cjfe-wa35). A total of 6 normal donors and 4 early-onset liver failure donors were utilized. Before running scNiche, we performed the normalization step using scanpy¹⁰¹ Python package (version 1.9.1) and subset the data based on the top 2000 highly variable genes (HVGs). Subsequently, we performed the dimensionality reduction and batch effect removal steps using scvi⁶⁵ Python package (version 1.1.2) on the data.

Mouse brain spatial CUT&Tag–RNA-seq dataset⁹⁷: Processed h5ad files were downloaded from https://zenodo.org/records/10362607. Since the data have been processed by the authors, we directly used the low-dimensional representations of RNA (reduced by PCA) and CUT&Tag (reduced by latent semantic indexing) modalities.

The detailed description of each dataset is summarized in Supplementary Data 1.

Design of scNiche

Model overview

The scNiche model is built on the multi-view clustering method proposed by Wang et al.¹⁰³ and consists of three components: M-GAE, GFN, and MMIM. We extend the original framework in three ways to better suit spatial omics data: (1) we extend the M-GAE model so that graph convolutional layers can be created for each view in an extensible manner, replacing the fixed number of views in the original framework; this makes niche modeling flexible with respect to the number and combination of views; (2) instead of the sequential M-GAE and GFN architecture of the original framework, we couple these two components and adapt the training procedure so that the parameters of both M-GAE and GFN are updated simultaneously, enhancing the synergy between them; and (3) we develop a subgraph-based batch training strategy to handle the increasing size of spatial omics data, enabling scNiche to scale to large datasets.

scNiche first extracts the multi-view features of cells within a pre-defined neighborhood range from the given spatial omics data and constructs a graph for each view. The extracted features can optionally undergo dimensionality reduction and batch effect removal before graph construction. scNiche then applies the coupled M-GAE and GFN architecture to integrate information from multiple views and learn a joint representation, and the MMIM module guides this learning toward a more clustering-friendly joint representation. Finally, scNiche applies the k-means algorithm to the learned joint representation to identify cell niches; other unsupervised clustering methods, such as the Leiden³⁰ algorithm, are also provided. Each step of the workflow is described in detail below.

Cellular neighborhoods determination

scNiche uses the k-nearest neighbors algorithm to define the neighborhood of each cell, so that k determines the neighborhood size. We evaluated the robustness of scNiche to different values of k on the simulated and biological datasets (Supplementary Fig. S15-16).
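A minimal NumPy sketch of how per-cell neighborhoods and the three views discussed in this paper (the cell's own molecular profile, the molecular profiles of its neighborhood, and the cellular composition of its neighborhood) could be computed. The function names, averaging over neighbors, and excluding each cell from its own neighborhood are illustrative assumptions rather than scNiche's actual implementation; `knn_graph` shows one hypothetical way to turn a view's features into a per-view graph.

```python
import numpy as np

def knn_indices(X, k):
    """Indices of the k nearest rows of X for each row (self excluded), by Euclidean distance."""
    # Dense pairwise squared distances: O(N^2) memory, acceptable for a small sketch.
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(d2, np.inf)          # a cell is not counted as its own neighbor
    return np.argsort(d2, axis=1)[:, :k]  # shape (N, k)

def knn_graph(X, k):
    """Hypothetical per-view graph: symmetric 0/1 kNN adjacency in feature space."""
    n = X.shape[0]
    A = np.zeros((n, n))
    A[np.repeat(np.arange(n), k), knn_indices(X, k).ravel()] = 1.0
    return np.maximum(A, A.T)  # symmetrize

def neighborhood_views(coords, expr, cell_types, n_types, k):
    """Three feature views per cell; neighbors are the k spatially nearest cells."""
    nbrs = knn_indices(coords, k)
    view_cell = expr                           # molecular profile of the cell itself
    view_nbr_expr = expr[nbrs].mean(axis=1)    # mean molecular profile of the neighborhood
    onehot = np.eye(n_types)[cell_types]       # (N, n_types) one-hot cell-type labels
    view_nbr_comp = onehot[nbrs].mean(axis=1)  # cell-type proportions in the neighborhood
    return view_cell, view_nbr_expr, view_nbr_comp
```

Each row of the composition view sums to 1, so it has as many columns as there are cell types, typically far fewer than the number of measured genes.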

Dimensionality reduction and batch effect removal

In contrast to spatial proteomics data, which usually contain only a few dozen proteins, spatial transcriptomics data can often measure hundreds to thousands of genes, with potential batch effects commonly present across tissue slices from different samples. Therefore, dimensionality reduction and batch effect removal need to be performed on the molecular profiles of the cells and their neighborhoods before multi-view feature fusion. Additionally, considering that the number of genes measured in spatial transcriptomics data usually far exceeds the number of cell types that exist, this preprocessing step also helps balance the dimensionality of features across different views, allowing for more accurate niche identification (Supplementary Fig. S31). We use scVI⁶⁵ by default to perform dimensionality reduction and batch effect removal. However, simple PCA dimensionality reduction or other deep learning-based integration methods like scArches¹⁰⁴ are also applicable.
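Because the text notes that simple PCA can stand in for scVI at this step, here is a minimal NumPy sketch of PCA via singular value decomposition. It is not the scVI model and performs no batch correction; the function name `pca_reduce` and the example sizes are hypothetical.

```python
import numpy as np

def pca_reduce(X, n_components):
    """Project cells (rows of X) onto the top principal components via SVD."""
    Xc = X - X.mean(axis=0, keepdims=True)             # center each gene
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)  # rows of Vt are principal axes
    return Xc @ Vt[:n_components].T                    # (N, n_components) PC scores

# Example: reduce 2,000 genes to 20 PCs so that the molecular view has a dimensionality
# closer to that of a cell-type composition view (e.g., ~20 cell types).
rng = np.random.default_rng(0)
scores = pca_reduce(rng.normal(size=(300, 2000)), 20)
```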

Multiple graph auto-encoder

To learn a joint representation that combines the common and complementary information from multiple views, scNiche applies a multiple graph autoencoder (M-GAE) model consisting of a multi-graph attention fusion encoder based on GCN¹⁰⁵ layers and view-specific decoders.

In the multi-graph attention fusion encoder, the first layer consists of V view-specific GCN layers. Given the multi-view features ${\mathscr{X}}=\{X^{(v)}\}_{v=1}^{V}$ and the corresponding graphs ${\mathscr{A}}=\{A^{(v)}\}_{v=1}^{V}$, where $X^{(v)}\in\mathbb{R}^{N\times F}$ is the feature matrix of the v-th view with N nodes and F features and $A^{(v)}\in\mathbb{R}^{N\times N}$ is the graph of the v-th view, the v-th view-specific representation $Z_{(1)}^{(v)}$ learned by the first layer is obtained as follows:

$${Z}_{(1)}^{(v)}=\delta \left({\left({\widetilde{D}}^{(v)}\right)}^{-\frac{1}{2}}{\widetilde{A}}^{(v)}{\left({\widetilde{D}}^{(v)}\right)}^{-\frac{1}{2}}{X}^{(v)}{W}_{(1)}^{(v)}\right)$$

(1)

where δ(·) is the activation function, $\widetilde{A}=A+I$ with I the identity matrix, and $\widetilde{D}=\mathrm{diag}(\sum_{j}\widetilde{A}_{ij})$ is the degree matrix of $\widetilde{A}$ (computed separately for each view). $W_{(1)}^{(v)}\in\mathbb{R}^{F\times d_{1}}$ is the learnable weight matrix of the first GCN layer for the v-th view, with $d_{1}$ being its output dimension.
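A minimal NumPy sketch of the view-specific GCN layer in Eq. (1), assuming a dense adjacency matrix and ReLU as the activation δ (the activation is not specified above). The function names are hypothetical.

```python
import numpy as np

def normalize_adj(A):
    """Symmetric normalization (D~)^{-1/2} A~ (D~)^{-1/2} with A~ = A + I, as in Eq. (1)."""
    A_tilde = A + np.eye(A.shape[0])                 # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(A_tilde.sum(axis=1))  # diagonal of D~^{-1/2}
    return d_inv_sqrt[:, None] * A_tilde * d_inv_sqrt[None, :]

def gcn_layer(A, X, W):
    """Z = ReLU(A_hat X W): one view-specific GCN layer (ReLU is an assumed choice of delta)."""
    return np.maximum(normalize_adj(A) @ X @ W, 0.0)

# Example: a 3-cell path graph, 4 input features, d1 = 2.
A = np.array([[0., 1., 0.], [1., 0., 1.], [0., 1., 0.]])
rng = np.random.default_rng(0)
Z1 = gcn_layer(A, rng.normal(size=(3, 4)), rng.normal(size=(4, 2)))
```

In the path graph, the middle cell has degree 3 in $\widetilde{A}$ and the end cells degree 2, so the normalized weight between an end cell and the middle cell is $1/\sqrt{6}$.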

To adaptively fuse the representations of a sample across different views, we introduce an attention coefficient matrix ${W}_{a}^{(v)}$ to learn the importance of each view. This allows for a weighted combination of view-specific representations, leading to a more informative common representation. The operation to compute this joint representation, denoted as ${Z}_{(2)}$, is defined by the following equation:

$${Z}_{(2)}=\delta \left({\sum }_{v=1}^{V}{W}_{a}^{(v)}\left({\left({\widetilde{D}}^{(v)}\right)}^{-\frac{1}{2}}{\widetilde{A}}^{(v)}{\left({\widetilde{D}}^{(v)}\right)}^{-\frac{1}{2}}{Z}_{(1)}^{(v)}{W}_{(2)}^{(v)}\right)\right)$$

(2)

where $W_{(2)}^{(v)}\in\mathbb{R}^{d_{1}\times d_{2}}$ is the learnable weight matrix of the second GCN layer for the v-th view, $W_{a}^{(v)}\in\mathbb{R}^{d_{2}\times d_{2}}$ is the attention coefficient matrix of the v-th view, and $d_{2}$ is the output dimension of this layer.
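A NumPy sketch of the attention fusion in Eq. (2). Each bracketed term is N×d₂ and $W_{a}^{(v)}$ is d₂×d₂, so the sketch applies $W_{a}^{(v)}$ as a right multiplication; this reading, the ReLU default for δ, and the function names are assumptions.

```python
import numpy as np

def normalize_adj(A):
    """(D~)^{-1/2} (A + I) (D~)^{-1/2}, computed per view."""
    A_tilde = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_tilde.sum(axis=1))
    return d_inv_sqrt[:, None] * A_tilde * d_inv_sqrt[None, :]

def attention_fusion(As, Z1s, W2s, Was, act=lambda x: np.maximum(x, 0.0)):
    """Eq. (2): Z_(2) = act(sum_v [A_hat^(v) Z_(1)^(v) W_(2)^(v)] W_a^(v))."""
    total = 0.0
    for A, Z1, W2, Wa in zip(As, Z1s, W2s, Was):
        H = normalize_adj(A) @ Z1 @ W2  # (N, d2) view-specific GCN output
        total = total + H @ Wa          # weight it with the (d2, d2) attention matrix
    return act(total)
```

In a trained model the $W_{a}^{(v)}$ would be learned jointly with the GCN weights; setting one view's $W_{a}^{(v)}$ to zero removes that view from the fused representation.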

We then apply a further GCN layer to the joint representation $Z_{(2)}$ over the consensus graph $A^{*}$ learned by the graph fusion network, and the final joint representation Z is obtained as follows:

$$Z=\delta \left({\left({\widetilde{D}}^{*}\right)}^{-\frac{1}{2}}{\widetilde{A}}^{*}{\left({\widetilde{D}}^{*}\right)}^{-\frac{1}{2}}{Z}_{(2)}{W}_{(3)}\right)$$

(3)

where $\widetilde{A}^{*}$ and $\widetilde{D}^{*}$ are defined from $A^{*}$ as in Eq. (1), and $W_{(3)}\in\mathbb{R}^{d_{2}\times d_{3}}$ is the learnable weight matrix of this GCN layer, with $d_{3}$ being its output dimension.

In the view-specific decoders, we use a bilinear inner-product decoder to reconstruct the graph of each view from the joint representation Z:

$${\hat{A}}^{(v)}={{\rm{sigmoid}}}\left(Z\cdot {W}^{(v)}\cdot {Z}^{T}\right)$$

(4)

where $W^{(v)}\in\mathbb{R}^{d_{3}\times d_{3}}$ is the learnable weight matrix of the v-th view-specific decoder.

To minimize the reconstruction error between the original graph ${A}^{\left(v\right)}$ and the reconstructed graph ${\hat{A}}^{(v)}$ of each view, the loss of the multiple graph autoencoder is defined as:

$${L}_{{rec}}={\sum }_{v=1}^{V}{L}_{{rec}}^{(v)}={\sum }_{v=1}^{V}{loss}\left({A}^{(v)},{\hat{A}}^{(v)}\right)$$

(5)

Here, loss(·,·) is the binary cross-entropy (BCE) loss.
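A NumPy sketch of the bilinear decoder in Eq. (4) and the summed per-view BCE of Eq. (5). Averaging the BCE over all N² entries and clipping with a small epsilon are assumptions about the reduction and numerical safety; the function names are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def decode(Z, Wv):
    """Eq. (4): edge probabilities sigmoid(Z W^(v) Z^T) for view v."""
    return sigmoid(Z @ Wv @ Z.T)

def bce(A, A_hat, eps=1e-7):
    """Mean binary cross-entropy between a 0/1 adjacency A and probabilities A_hat."""
    A_hat = np.clip(A_hat, eps, 1.0 - eps)  # avoid log(0)
    return -np.mean(A * np.log(A_hat) + (1.0 - A) * np.log(1.0 - A_hat))

def reconstruction_loss(Z, As, Ws):
    """Eq. (5): L_rec = sum over views of BCE(A^(v), A_hat^(v))."""
    return sum(bce(A, decode(Z, W)) for A, W in zip(As, Ws))
```

With an all-zero Z every predicted edge probability is 0.5, so each view contributes exactly log 2 to the loss regardless of its graph.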

Graph fusion network

In the scNiche framework, we introduce an additional graph fusion network (GFN) into the M-GAE model to learn the consensus graph ${A}^{*}$, which captures the global adjacency relations across the graphs of different views (i.e., the global node relationships). Information is shared between M-GAE and GFN during training. The GFN is a two-layer fully connected network in which the first layer is followed by a ReLU activation. The output of the l-th layer is:

$${G}_{\left(l\right)}=\delta \left({W}_{{GFN}\left(l\right)}{G}_{\left(l-1\right)}+{b}_{{GFN}\left(l\right)}\right)$$

(6)

where δ(·) is the activation function, ${W}_{{GFN}\left(l\right)}\in{\mathbb{R}}^{{d}_{l}\times {d}_{l-1}}$ and ${b}_{{GFN}\left(l\right)}\in {{\mathbb{R}}}^{{d}_{l}}$ are the weight matrix and bias of the l-th layer, respectively, and ${d}_{l}$ is the output dimension of layer l. The initial input ${G}_{\left(0\right)}$ to the GFN is defined as:

$${G}_{\left(0\right)}={\sum }_{v=1}^{V}{A}^{(v)}$$

(7)

where ${A}^{(v)}$ is the graph of the v-th view and V is the number of views.

The GFN’s goal is to ensure that the final consensus graph ${A}^{*}$ integrates the information from each individual graph ${A}^{(v)}$ comprehensively. The optimization of the GFN involves a loss function that minimizes the discrepancy between the individual graphs from each view ${A}^{(v)}$ and the consensus graph ${A}^{*}$:

$${L}_{{gre}}={\sum }_{v=1}^{V}{loss}\left({A}^{(v)},{A}^{*}\right)$$

(8)

This loss function is typically a mean squared error (MSE) loss, focusing on reducing errors between the graphs from each view and the synthesized consensus graph.
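A NumPy sketch of the GFN forward pass (Eqs. (6) and (7)) and its loss (Eq. (8)), written literally with $W_{GFN(l)}$ multiplying $G_{(l-1)}$ from the left. The text specifies ReLU only after the first layer, so the second layer is left linear here; treating the second layer's output as $A^{*}$, the hidden size, and averaging the MSE per view are assumptions.

```python
import numpy as np

def gfn_forward(As, W1, b1, W2, b2):
    """Two fully connected layers on G_(0) = sum_v A^(v); returns the consensus graph A*."""
    G0 = sum(As)                                 # Eq. (7): (N, N)
    G1 = np.maximum(W1 @ G0 + b1[:, None], 0.0)  # Eq. (6), l = 1, with ReLU: (h, N)
    return W2 @ G1 + b2[:, None]                 # Eq. (6), l = 2, linear here: (N, N)

def gfn_loss(As, A_star):
    """Eq. (8) with MSE: sum over views of mean((A^(v) - A*)^2)."""
    return sum(np.mean((A - A_star) ** 2) for A in As)

# Example: N = 4 cells, 2 views, hidden size h = 8.
rng = np.random.default_rng(0)
N, h = 4, 8
As = [np.eye(N)[[1, 0, 3, 2]], np.eye(N)[[3, 2, 1, 0]]]  # two symmetric toy graphs
A_star = gfn_forward(As, rng.normal(size=(h, N)), np.zeros(h),
                     rng.normal(size=(N, h)), np.zeros(N))
```

For $A^{*}$ to be N×N, the output dimension of the last layer must equal the number of cells N, which is one reason the GFN operates per batch of subgraphs rather than on the full dataset.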

Multi-view mutual information maximization

Mutual information is a Shannon entropy-based measure of dependence between random variables¹⁰⁶. Previous studies have shown that maximizing the mutual information between input samples and their learned latent representations helps models such as encoders learn useful representations¹⁰⁷. Given the input samples $X={\left\{{x}^{\left(i\right)}\right\}}_{i=1}^{N}$ and the corresponding representations $Y={\left\{{y}^{\left(i\right)}\right\}}_{i=1}^{N}$, the mutual information between X and Y can be expressed as:

$$I\left(X,Y\right) =\iint p\left({y|x}\right)p\left(x\right)\log \frac{p\left({y|x}\right)}{p\left(y\right)}{dx}\,{dy}\\ ={{{\mathcal{D}}}}_{{KL}}(p\left({y|x}\right)p\left(x\right){{\rm{||}}}p\left(y\right)p\left(x\right))$$

(9)

Following the assumption of Wang et al.¹⁰³ that if two samples x and ${x}^{{\prime} }$ are close in any view, their corresponding representations z and ${z}^{{\prime} }$ should also be close in the common latent space, we apply their multi-view mutual information maximization (MMIM) module to guide the learning of clustering-friendly joint representations. Specifically, the MMIM module encourages the coupled M-GAE and GFN model to produce similar joint representations for cells that are similar to each other in any view, which makes the subsequent cell niche identification more accurate. Because larger mutual information indicates that two sets of representations share more information, the optimization objective of the MMIM module can be expressed as:

$$\max \left\{I\left(Z,{Z}^{{\prime} }\right)\right\}$$

(10)

where Z and ${Z}^{{\prime} }$ are the representations of the samples X and of their nearest neighbors ${X}^{{\prime} }$, respectively.

Combining Eqs. (9) and (10), the loss of the MMIM module can be written as:

$${L}_{{mim}}=-{{{\mathcal{D}}}}_{{KL}}(p\left({z}^{{\prime} }{|z}\right)p\left(z\right){||p}\left({z}^{{\prime} }\right)p\left(z\right))$$

(11)

Because the KL divergence is unbounded, we replace it with the Jensen-Shannon (JS) divergence, which is bounded, and Eq. (11) becomes:

$${L}_{{mim}}=-{{{\mathcal{D}}}}_{{JS}}(p\left({z}^{{\prime} }{|z}\right)p\left(z\right){||}p\left({z}^{{\prime} }\right)p\left(z\right))$$

(12)

The JS divergence, a special case of the f-divergences¹⁰⁸, is difficult to compute directly in practice. We therefore use the variational lower bound on the f-divergence ${{{\mathcal{D}}}}_{f}\left({P||Q}\right)$¹⁰⁸, which was introduced to estimate a generative model Q given the true distribution P. This approach follows the generative adversarial network (GAN) methodology and employs two neural networks, Q and T: Q is the generative model that outputs a sample of interest from a random input vector, and T is the variational function that evaluates these samples. The variational estimate of an f-divergence is defined as:

$${{{\mathcal{D}}}}_{f}\left({P||Q}\right)={\max }_{T}({{\mathbb{E}}}_{x \sim p(x)}\left[T(x)\right]-{{\mathbb{E}}}_{x \sim q(x)}\left[g(T(x))\right])$$

(13)

where $p(x)$ and $q(x)$ are the probability density functions of the true distribution P and the estimated distribution Q, respectively, with Q parameterized by the generative model in the GAN framework. The generator function $f(u)$ and its convex conjugate $g(t)$ determine the specific divergence being measured; for the JS divergence they are¹⁰⁸:

$$f(u) =-\left(u+1\right)\log \frac{u+1}{2}+u\log u\\ g(t) =-\log (2-{e}^{t})$$

(14)
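As a numerical sanity check of Eq. (14), the sketch below evaluates the convex conjugate $f^{*}(t)=\sup_{u>0}\,(ut-f(u))$ by grid search and compares it with the closed form $g(t)$, which is defined only for $t<\log 2$. The grid range and resolution are arbitrary choices made for this check.

```python
import numpy as np

def f_js(u):
    """Generator f(u) from Eq. (14), for u > 0."""
    return -(u + 1) * np.log((u + 1) / 2) + u * np.log(u)

def g_js(t):
    """Closed-form conjugate g(t) from Eq. (14), valid for t < log 2."""
    return -np.log(2 - np.exp(t))

def conjugate_numeric(t, u_max=50.0, n=500_001):
    """Approximate sup_{u>0} (u t - f(u)) by a grid search over (0, u_max]."""
    u = np.linspace(1e-6, u_max, n)
    return np.max(u * t - f_js(u))

for t in (-2.0, -0.5, 0.0, 0.3, 0.6):
    print(f"t={t:+.1f}  numeric={conjugate_numeric(t):.6f}  closed form={g_js(t):.6f}")
```

Setting $f'(u)=\log\frac{2u}{u+1}$ equal to t gives the maximizer $u^{*}=e^{t}/(2-e^{t})$, which is finite only while $e^{t}<2$; this is why the variational function T(x) must stay below log 2.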

This framework facilitates the calculation of the JS divergence as follows:

$${{{\mathcal{D}}}}_{{JS}}\left({P||Q}\right) ={\max }_{T}({{\mathbb{E}}}_{x \sim p\left(x\right)}\left[T\left(x\right)\right]-{{\mathbb{E}}}_{x \sim q\left(x\right)}\left[g\left(T\left(x\right)\right)\right]) \\ ={\max }_{T}\left(\int p\left(x\right)T\left(x\right)-q\left(x\right)g\left(T\left(x\right)\right){dx}\right) \\ ={\max }_{T}\left(\int p\left(x\right)T\left(x\right)-q\left(x\right)\left[-\log \left(2-{e}^{T\left(x\right)}\right)\right]{dx}\right)$$

(15)

Let $T\left(x\right)=\log [2D(x)]$, where $D(x)\in(0,1)$ is a variational function (acting as a discriminator) related to $T(x)$ by this simple transformation. The transformation simplifies optimization and yields more tractable gradients; because it maps $D(x)\in(0,1)$ one-to-one onto $T(x)\in(-\infty,\log 2)$, the range on which $g(t)$ is defined, maximizing over $D$ is equivalent to maximizing over $T$. Eq. (15) can then be converted to:

$${{{\mathcal{D}}}}_{{JS}}\left({P||Q}\right) ={\max }_{D}\left(\int p\left(x\right)\log [2D(x)]-q\left(x\right)\left[-\log \left(2-{e}^{\log [2D(x)]}\right)\right]{dx}\right)\\ ={\max }_{D}\bigg(\int p\left(x\right)\log \left[D\left(x\right)\right]+p\left(x\right)\log 2 \\ \quad+q\left(x\right)\log \left[1-D\left(x\right)\right]+q\left(x\right)\log 2{dx}\bigg) \\ ={\max }_{D}\left(\int p\left(x\right)\log \left[D\left(x\right)\right]+q\left(x\right)\log \left[1-D\left(x\right)\right]{dx}+\log 4\right) \\ \rightleftharpoons {\max }_{D}\left(\int p\left(x\right)\log \left[D\left(x\right)\right]+q\left(x\right)\log \left[1-D\left(x\right)\right]{dx}\right)\\ ={\max }_{D}\left({{\mathbb{E}}}_{x \sim p\left(x\right)}\left[\log D\left(x\right)\right]+{{\mathbb{E}}}_{x \sim q\left(x\right)}\left[\log (1-D\left(x\right))\right]\right)$$

(16)
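For discrete distributions, the inner maximization in Eq. (16) can be solved pointwise: $p\log D+q\log(1-D)$ is maximized at $D^{*}=p/(p+q)$. The sketch below checks numerically that the maximized objective plus log 4 equals the f-divergence $\sum_x q(x)f(p(x)/q(x))$ built from the f in Eq. (14). The two toy distributions are arbitrary.

```python
import numpy as np

def f_js(u):
    """Generator f(u) from Eq. (14)."""
    return -(u + 1) * np.log((u + 1) / 2) + u * np.log(u)

def gan_objective(p, q, D):
    """Inner objective of Eq. (16): sum_x p(x) log D(x) + q(x) log(1 - D(x))."""
    return np.sum(p * np.log(D) + q * np.log(1 - D))

p = np.array([0.1, 0.2, 0.3, 0.4])
q = np.array([0.4, 0.3, 0.2, 0.1])
D_opt = p / (p + q)                           # pointwise maximizer
lhs = gan_objective(p, q, D_opt) + np.log(4)  # max_D(...) + log 4, as in Eq. (16)
rhs = np.sum(q * f_js(p / q))                 # D_f(P||Q) = sum_x q(x) f(p(x)/q(x))
print(lhs, rhs)                               # equal up to floating-point rounding
```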

In our loss function, $p\left(z^{\prime} {|z}\right)p\left(z\right)$ and $p\left(z^{\prime} \right)p\left(z\right)$ take the roles of $p\left(x\right)$ and $q\left(x\right)$, respectively. Substituting them into Eq. (16) and negating (the constant $\log 4$ does not affect optimization), the loss of the MMIM module can be rewritten as:

$${L}_{{mim}}=-{{\mathbb{E}}}_{\left(z,{z}^{{\prime} }\right) \sim p\left({z}^{{\prime} }{|z}\right)p\left(z\right)}\left[\log D\left(z,{z}^{{\prime} }\right)\right]-{{\mathbb{E}}}_{\left(z,{z}^{{\prime} }\right) \sim p\left({z}^{{\prime} }\right)p\left(z\right)}\left[\log \left(1-D\left(z,{z}^{{\prime} }\right)\right)\right]$$

(17)

Eq. (17) can be optimized using negative sampling¹⁰⁷: $D\left(z,{z}^{{\prime} }\right)$ is a discriminator that distinguishes positive from negative sample pairs, thereby estimating the distribution of positive pairs. A positive pair consists of the latent representations of a sample x and one of its nearest neighbors ${x}^{{\prime} }$ in any view, and a negative pair consists of the latent representations of x and a random sample outside its nearest neighbors. The nearest neighbors of each sample are identified by the k-nearest neighbors algorithm.
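A NumPy sketch of the negative-sampling estimate of Eq. (17). The bilinear discriminator $D(z,z')=\mathrm{sigmoid}(z^{T}Wz')$, drawing one positive neighbor and one negative per cell, and the function names are assumptions for illustration; only the rule for forming positive and negative pairs comes from the text.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sample_negatives(nbrs, rng):
    """For each cell i, draw a random cell that is neither i nor one of its neighbors.

    Assumes every cell has at least one non-neighbor (k < N - 1), otherwise this loops forever.
    """
    n = nbrs.shape[0]
    neg = np.empty(n, dtype=int)
    for i in range(n):
        while True:
            j = int(rng.integers(n))
            if j != i and j not in nbrs[i]:
                neg[i] = j
                break
    return neg

def mmim_loss(Z, nbrs, W_disc, rng, eps=1e-7):
    """-E_pos[log D(z, z')] - E_neg[log(1 - D(z, z'))] with a bilinear discriminator."""
    n, k = nbrs.shape
    pos = nbrs[np.arange(n), rng.integers(k, size=n)]  # one random neighbor per cell
    neg = sample_negatives(nbrs, rng)
    ZW = Z @ W_disc
    d_pos = sigmoid(np.sum(ZW * Z[pos], axis=1))       # D on positive pairs
    d_neg = sigmoid(np.sum(ZW * Z[neg], axis=1))       # D on negative pairs
    return -np.mean(np.log(d_pos + eps)) - np.mean(np.log(1.0 - d_neg + eps))
```

When this loss is minimized jointly with the encoder that produces Z, it rewards scoring neighbor pairs high and random pairs low, which pulls the representations of neighboring cells together.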

Loss function of scNiche

The total loss function of scNiche is defined as:

$$L={L}_{{gre}}+{\lambda }_{1}{L}_{{rec}}+{\lambda }_{2}{L}_{{mim}}$$

(18)

where ${\lambda }_{1}$ and ${\lambda }_{2}$ are hyperparameters that balance the three parts of the loss function. By default, ${\lambda }_{1}={\lambda }_{2}=1$.

Batch training strategy

We develop a subgraph-based batch training strategy that enables scNiche to scale to large datasets and multiple slices. Specifically, after extracting the multi-view features of cells, we do not construct the corresponding graphs over all cells at once, because the large number of nodes and edges in the full graph would exhaust memory. Instead, we first divide the entire dataset into several non-overlapping subsets by random sampling and then construct the corresponding graphs for each subset; these are referred to as subgraphs. Next, we use the batched graph data loader provided by DGL¹⁰⁹ to iterate over this set of subgraphs and generate the batched graph of each batch for model training. Because each subgraph has far fewer nodes and edges than the entire graph, this strategy avoids out-of-memory errors. We evaluated the robustness of scNiche's batch training strategy to different numbers of batches on the mouse spleen CODEX dataset (Supplementary Fig. S4).
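The partitioning step can be sketched as follows. This is a hedged illustration rather than scNiche's code: it assumes each view's graph is a k-nearest-neighbour graph built on that view's features with scikit-learn's `kneighbors_graph`, the function `build_subgraphs` is hypothetical, and the later step of batching the subgraphs with DGL's graph data loader is not shown.

```python
import numpy as np
from sklearn.neighbors import kneighbors_graph


def build_subgraphs(features, n_batches, k=30, seed=0):
    """Hypothetical helper: split cells into random non-overlapping subsets
    and build one kNN graph per subset."""
    rng = np.random.default_rng(seed)
    # a random permutation split into contiguous chunks gives disjoint subsets
    order = rng.permutation(features.shape[0])
    subgraphs = []
    for idx in np.array_split(order, n_batches):
        # k cannot exceed the number of other cells in the subset
        k_eff = min(k, len(idx) - 1)
        adj = kneighbors_graph(features[idx], n_neighbors=k_eff,
                               mode="connectivity", include_self=False)
        subgraphs.append((idx, adj))  # cell indices and sparse adjacency
    return subgraphs


# usage: 1,000 cells split into 7 subgraphs of about 143 nodes each
features = np.random.default_rng(0).normal(size=(1000, 20))
subgraphs = build_subgraphs(features, n_batches=7, k=30)
print([adj.shape[0] for _, adj in subgraphs])
```

Each adjacency matrix is only about $(n/B)\times(n/B)$ for $B$ batches, which is why memory use stays bounded as the number of cells grows.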

Clustering

By default, we apply the unsupervised k-means clustering algorithm to the learned joint representation to identify cell niches. If the target number of clusters is not provided, we identify the optimal candidates for K using the clustering-stability criterion proposed by Varrone et al.²². In brief, for each K within the specified range, we execute a single clustering run with K clusters. We then calculate the average Fowlkes–Mallows Index¹¹⁰ (FMI) between the clusterings at K-1 and K and between those at K and K+1. A higher average FMI indicates greater similarity between clustering solutions at consecutive numbers of clusters, i.e., more stable clustering results.
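The stability criterion can be sketched with scikit-learn's `KMeans` and `fowlkes_mallows_score`. The function `cluster_stability` is hypothetical and follows the description above (one k-means run per K); CellCharter's own implementation may differ in details such as repeated runs or initialization.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import fowlkes_mallows_score


def cluster_stability(z, k_min, k_max, seed=0):
    """Hypothetical helper: average FMI of K with K-1 and K+1 for K in [k_min, k_max]."""
    # one k-means run per K, including the boundary values k_min-1 and k_max+1
    labels = {
        k: KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(z)
        for k in range(k_min - 1, k_max + 2)
    }
    return {
        k: (fowlkes_mallows_score(labels[k - 1], labels[k])
            + fowlkes_mallows_score(labels[k], labels[k + 1])) / 2
        for k in range(k_min, k_max + 1)
    }


# usage: three well-separated blobs standing in for a joint representation
rng = np.random.default_rng(0)
centers = np.array([[0, 0], [10, 0], [0, 10]])
z = np.vstack([c + rng.normal(size=(50, 2)) for c in centers])
scores = cluster_stability(z, 2, 5)
print(max(scores, key=scores.get))  # K with the highest average FMI
```

The K values with the highest average FMI are the candidate numbers of niches.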

Enrichment analysis of cell niches

We apply a general enrichment analysis framework that characterizes the identified cell niches from various perspectives (e.g., cellular composition, condition, and sample) and computes the corresponding enrichment scores. Taking cellular composition as an example, given cells belonging to S samples, N identified cell niches, and M cell populations, let $C=\left\{{c}^{(s,n,m)}|1\le s\le S,1\le n\le N,1\le m\le M\right\}$, where ${c}^{(s,n,m)}$ is the number of cells of population m in cell niche n of sample s. We first compute the observed proportion of cell population m within cell niche n in each sample s:

$${{Prop}}_{{obs}}^{(s,n,m)}=\frac{{c}^{(s,n,m)}}{{\sum }_{i=1}^{M}{c}^{(s,n,i)}}$$

(19)

and the expected value of the proportion of the cell population m within the cell niche n in each sample s is defined as:

$${{Prop}}_{\exp }^{(s,n,m)}=\frac{{\sum }_{\begin{array}{c}i=1\\ i\ne n\end{array}}^{N}{{Prop}}_{{obs}}^{(s,i,m)}}{N-1}$$

(20)

We then define the enrichment score of cell population m within cell niche n across the S samples as follows:

$${{ES}}^{(n,m)}={\log }_{2}\left(\frac{\frac{{\sum }_{i=1}^{S}{{Prop}}_{{obs}}^{(i,n,m)}}{S}}{\frac{{\sum }_{i=1}^{S}{{Prop}}_{\exp }^{(i,n,m)}}{S}}\right)$$

(21)

If requested, a P-value for ${{ES}}^{(n,m)}$ can be computed with a one-sided Mann-Whitney U test.
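Eqs. (19)–(21) vectorize directly over a count array. In the minimal NumPy sketch below, the function name `enrichment_scores` and the `(S, N, M)` array layout are assumptions; the sketch also assumes every niche contains at least one cell in every sample, because an empty niche would give a 0/0 proportion in Eq. (19).

```python
import numpy as np


def enrichment_scores(counts):
    """Hypothetical helper for Eqs. (19)-(21).

    counts[s, n, m] = number of cells of population m in niche n of sample s.
    Returns an (N, M) array of log2 enrichment scores.
    """
    counts = np.asarray(counts, dtype=float)
    n_niches = counts.shape[1]
    # Eq. (19): observed proportion of population m within niche n, per sample
    prop_obs = counts / counts.sum(axis=2, keepdims=True)
    # Eq. (20): mean observed proportion over the other N - 1 niches
    prop_exp = (prop_obs.sum(axis=1, keepdims=True) - prop_obs) / (n_niches - 1)
    # Eq. (21): log2 ratio of the sample-averaged observed and expected proportions
    return np.log2(prop_obs.mean(axis=0) / prop_exp.mean(axis=0))


# usage: one sample, two niches, two populations
print(enrichment_scores([[[3, 1], [1, 3]]]))
```

In the usage example, niche 1 is 75% population 1 while the other niche is only 25% population 1, so the score is $\log_2 3 \approx 1.585$ for that pair and $-\log_2 3$ for the depleted population.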

Multi-sample analysis framework

For large-scale datasets containing multiple samples under different conditions, scNiche provides a multi-sample analysis framework that enables niche comparisons at the sample level. Specifically, for each niche within each sample, we compute its cellular composition and phenotypic characteristics (defined as the average expression of all cells belonging to the niche), as well as the proportion and phenotypic characteristics (defined as the average expression of all cells of that population within the niche) of specific cell populations within the niche. We can then perform differential analyses across the entire sample series, including comparisons of the composition or phenotype of specific niches between conditions and comparisons of the proportion or phenotype of specific cell populations between niches. P-values are calculated with the two-sided Mann-Whitney U test. Furthermore, to limit the influence of outliers, only samples in which the proportion of a given niche exceeds a set threshold (5% by default) are considered for that niche.
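One such comparison, a niche phenotype between two conditions, can be sketched with pandas and `scipy.stats.mannwhitneyu`. This is a hedged illustration: the per-cell table layout (columns `sample`, `condition`, `niche`, plus one column per feature) and the function `compare_niche_between_conditions` are hypothetical, and the phenotype is taken as the per-sample mean of a single feature within the niche.

```python
import pandas as pd
from scipy.stats import mannwhitneyu


def compare_niche_between_conditions(cells, niche, feature, cond_a, cond_b,
                                     min_prop=0.05):
    """Hypothetical helper: compare the per-sample mean of `feature` within
    `niche` between two conditions with a two-sided Mann-Whitney U test."""
    # fraction of each sample's cells that fall in the niche
    niche_prop = cells["niche"].eq(niche).groupby(cells["sample"]).mean()
    # keep only samples in which the niche exceeds the threshold
    kept = niche_prop.index[niche_prop > min_prop]
    in_niche = cells[cells["niche"].eq(niche) & cells["sample"].isin(kept)]
    # one phenotype value per sample: average over the niche's cells
    per_sample = (in_niche.groupby(["sample", "condition"], as_index=False)[feature]
                  .mean())
    a = per_sample.loc[per_sample["condition"] == cond_a, feature]
    b = per_sample.loc[per_sample["condition"] == cond_b, feature]
    _, p_value = mannwhitneyu(a, b, alternative="two-sided")
    return per_sample, p_value
```

Testing per-sample averages rather than individual cells treats the sample as the unit of replication, and the proportion filter drops samples in which the niche is too rare to give a stable average.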

Simulation experiment setup

We simulated a scenario in which cell niches differ from one another in both gene expression and cellular composition (Supplementary Fig. S2a). To this end, we generated simulated data from multi-condition scRNA-seq datasets following the simulation framework used in scCube³¹. Specifically, we first randomly generated the proportion and cellular composition of each niche. If two cell niches were designated to differ in gene expression, their cells were drawn from different conditions of the scRNA-seq data with similar composition proportions. Conversely, if two cell niches were designated to differ in cellular composition, their cells were drawn from the same condition of the scRNA-seq data but with different composition proportions. We then generated random spatial patterns for each cell niche with the reference-free strategy of scCube.

We considered four sources of variability in the simulated data: continuity of spatial patterns, complexity of niche composition, gene expression dropout, and cell annotation dropout. For the first, we generated cell niches with different levels of spatial pattern continuity by setting the parameter δ in scCube to 10, 20, 30, and 50. For the second, we generated cell niches with different compositional complexity by randomly selecting 2, 3, or 4 cell populations for each cell niche. The last two corresponded to scenarios of degraded data quality. For gene expression dropout, we randomly set expression values to 0 at proportions of 0.1, 0.2, 0.4, and 0.8. For cell annotation dropout, we randomly changed cell annotation labels to “ambiguous” at proportions of 0.1, 0.2, 0.4, and 0.8.
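The two data-quality degradations can be sketched in NumPy. The sketch assumes that expression dropout acts on individual entries of the cells × genes matrix, chosen uniformly at random, and that annotation dropout acts on individual cells; the helpers `expression_dropout` and `annotation_dropout` are hypothetical and not taken from scCube.

```python
import numpy as np


def expression_dropout(expr, rate, seed=0):
    """Hypothetical helper: copy of a cells x genes matrix with a fraction
    `rate` of its entries set to 0."""
    rng = np.random.default_rng(seed)
    out = np.array(expr, dtype=float)  # np.array copies the input by default
    n_drop = int(round(rate * out.size))
    # choose entries uniformly at random, without replacement, on the flattened matrix
    np.put(out, rng.choice(out.size, size=n_drop, replace=False), 0.0)
    return out


def annotation_dropout(labels, rate, seed=0):
    """Hypothetical helper: copy of the cell labels with a fraction `rate`
    relabelled as 'ambiguous'."""
    rng = np.random.default_rng(seed)
    out = np.array(labels, dtype=object)
    idx = rng.choice(out.size, size=int(round(rate * out.size)), replace=False)
    out[idx] = "ambiguous"
    return out


# usage: one degraded copy per dropout proportion
expr = np.random.default_rng(0).poisson(2.0, size=(100, 50))
degraded = {rate: expression_dropout(expr, rate) for rate in (0.1, 0.2, 0.4, 0.8)}
```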

Benchmarking analysis

We compared the performance of scNiche with other existing methods on simulated and biological datasets. For each benchmarking dataset, the target number of clusters was kept consistent across all methods and set according to the ground truth. Furthermore, for methods that first require specifying the neighborhood size via the k-nearest neighbors algorithm, such as scNiche, BANKSY, STAGATE, GraphST, SpaceFlow, and CytoCommunity, we used the same value of k for every method (20 for the simulated datasets, 30 for the single-cell spatial omics datasets, and 6 for the human DLPFC 10X Visium dataset). The application of each method is described below.

scNiche

The preprocessing of each dataset is described in the “Data collection and preprocessing” section above. For the mouse spleen CODEX dataset, we applied the batch training strategy to run scNiche on both single and multiple slices, with the number of batches set to 30 and 100, respectively. For the human UTUC IMC dataset, the mouse frontal cortex and striatum MERFISH dataset, and the human DLPFC 10X Visium dataset, we applied the batch training strategy to run scNiche on multiple slices, with the number of batches set to 100, 20, and 2, respectively. For the other datasets, we ran scNiche directly with full-graph training. The parameter ‘epochs’ was set to 200 for the mouse spleen CODEX dataset and 100 for the other datasets; the parameter ‘lr’ was set to 0.01 for all datasets.

DR-SC

DR-SC¹⁸ is implemented in the R package DR.SC (version 3.3). The parameter ‘platform’ was set to ‘Visium’ for the human DLPFC 10X Visium dataset and ‘Other_SRT’ for other datasets. All other parameters were set with default values.

BASS

BASS¹⁹ is implemented in the R package BASS (version 1.1.0.016). We ran BASS with default parameters except for the parameter ‘C’, which was determined by the number of cell types in each benchmarking dataset.

CellCharter

CellCharter²² is implemented in the Python package cellcharter (version 0.1.2). We followed the instructions in the original article for dimensionality reduction and batch effect removal. Specifically, for the simulated datasets and spatial proteomics datasets, we applied the TRVAE model implemented in the Python package scArches¹⁰⁴ (version 0.5.9). For the human DLPFC 10X Visium dataset, we applied the scVI model implemented in the Python package scvi⁶⁵ (version 1.1.2). For the mouse V1 neocortex STARmap dataset, we applied the PCA dimensionality reduction directly since this dataset only contains one tissue slice. For the mouse frontal cortex and striatum MERFISH dataset, we also applied the PCA dimensionality reduction since this dataset has been normalized by the original authors. All other parameters were set with default values.

SpaGCN

SpaGCN²⁴ is implemented in the Python package SpaGCN (version 1.2.7). We ran SpaGCN with default parameters. The refinement step was also performed.

UTAG

UTAG²¹ is implemented in the Python package utag (version 0.1.1). We ran UTAG with default parameters except for the parameter ‘max_dist’, which was set to be consistent with the value of k in methods that require specifying the range of neighborhoods based on the k-nearest neighbors algorithm first (20 for the simulated datasets, 30 for the single cell spatial omics datasets, and 6 for the human DLPFC 10X Visium dataset). Additionally, since the Leiden clustering algorithm employed in UTAG does not directly allow for setting the number of clusters, we modified the clustering process by introducing the ‘search_res’ function from SpaGCN²⁴ to determine the clustering resolution first, and subsequently performed Leiden clustering with this resolution.
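The idea of fixing the number of clusters for a resolution-based algorithm can be sketched as a bisection over the resolution. This is a simple stand-in, not SpaGCN's `search_res` code, whose search strategy may differ; `search_resolution` is hypothetical and `cluster_fn` is any callable that clusters at a given resolution and returns labels, assuming the cluster count grows with the resolution as is typical for Leiden and Louvain.

```python
def search_resolution(cluster_fn, n_target, lo=0.01, hi=3.0, max_iter=50):
    """Hypothetical helper: bisect the resolution until cluster_fn(resolution)
    yields exactly n_target clusters."""
    for _ in range(max_iter):
        mid = (lo + hi) / 2
        n_found = len(set(cluster_fn(mid)))
        if n_found == n_target:
            return mid
        if n_found < n_target:
            lo = mid  # too few clusters: increase the resolution
        else:
            hi = mid  # too many clusters: decrease the resolution
    raise ValueError("no resolution in the search range gave n_target clusters")


# usage with a toy stand-in whose cluster count is floor(10 * resolution) + 1
toy = lambda r: list(range(int(r * 10) + 1))
print(search_resolution(toy, 7))
```

In practice, `cluster_fn` could wrap a call such as scanpy's `sc.tl.leiden(adata, resolution=r)` and return `adata.obs["leiden"]`; the resulting resolution is then used for the final clustering run.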

BANKSY

BANKSY²³ is implemented in the R package Banksy (version 0.99.13). We followed the tutorials provided by the original authors to set the recommended values of the parameter ‘lambda’: 0.2 for the human DLPFC 10X Visium dataset and 0.8 for the other datasets. All other parameters were set with default values. Additionally, for the human DLPFC 10X Visium dataset, we applied the multi-sample analysis following the tutorial. The clustering process was also modified as described above to determine the clustering resolution first.

STAGATE

STAGATE²⁵ is implemented in the Python package stagate-pyg (version 1.0.0). For the mouse spleen CODEX dataset, we ran STAGATE using the batch training strategy with the parameters ‘num_batch_x’ = 4, ‘num_batch_y’ = 6, and ‘n_epochs’ = 500, and all other parameters were set with default values. For other datasets, we ran STAGATE with default parameters. Additionally, for the mouse V1 neocortex STARmap dataset and the human DLPFC 10X Visium dataset, we applied the ‘mclust’ algorithm in the clustering step as recommended, and for other datasets, we applied the Louvain algorithm with the clustering resolution determined by the modified clustering process described above.

GraphST

GraphST²⁶ is implemented in the Python package GraphST (version 1.1.1). We ran GraphST with default parameters. The refinement step was also performed. Notably, GraphST raised a “CUDA out of memory” error on the mouse spleen CODEX dataset.

SpaceFlow

SpaceFlow²⁷ is implemented in the Python package SpaceFlow (version 1.0.3). We ran SpaceFlow with default parameters. The clustering process was also modified as described above to determine the clustering resolution first.

CytoCommunity

CytoCommunity²⁸ is implemented in the Python package CytoCommunity (version 1.1.0). We applied the unsupervised version of CytoCommunity. For the simulated datasets and the mouse V1 neocortex STARmap dataset, we ran CytoCommunity with default parameters. For the mouse spleen CODEX dataset, the human UTUC IMC dataset, and the mouse frontal cortex and striatum MERFISH dataset, we set the parameter ‘Num_Epoch’ to 100, 500, and 500, respectively, to reduce the training time. For the mouse spleen CODEX dataset, we further set the parameter ‘Loss_Cutoff’ to -0.3 to reduce the training time. All other parameters were set with default values. In addition, we did not run CytoCommunity on the human DLPFC 10X Visium dataset because the current version of CytoCommunity does not support data at non-single-cell resolution.

Evaluation metrics

We used the adjusted Rand index (ARI) and the macro-F1 score to evaluate the performance of each method. The benchmarking was conducted on a computing cluster with 2 AMD EPYC 7K62 CPUs (48 cores each), with approximately 503.65 GB of usable system memory. For GPU-compatible methods, an NVIDIA A10 GPU with 24 GB total memory (approximately 22.5 GB usable) was used.
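ARI can be computed directly from the two labelings, but macro-F1 requires matching niche labels to ground-truth labels first, and the text does not state how this was done. The sketch below assumes a one-to-one Hungarian matching that maximizes the number of agreeing cells, via `scipy.optimize.linear_sum_assignment`; the function `evaluate_clustering` is hypothetical.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.metrics import adjusted_rand_score, f1_score
from sklearn.metrics.cluster import contingency_matrix


def evaluate_clustering(labels_true, labels_pred):
    """Hypothetical helper: ARI, plus macro-F1 after one-to-one matching
    of predicted clusters to ground-truth classes."""
    ari = adjusted_rand_score(labels_true, labels_pred)
    _, t = np.unique(labels_true, return_inverse=True)
    _, p = np.unique(labels_pred, return_inverse=True)
    # rows: ground-truth classes, columns: predicted clusters (both sorted)
    cont = contingency_matrix(labels_true, labels_pred)
    rows, cols = linear_sum_assignment(-cont)  # maximise matched cells
    cluster_to_class = dict(zip(cols, rows))
    # clusters left unmatched (more clusters than classes) count as errors
    p_mapped = np.array([cluster_to_class.get(c, -1) for c in p])
    f1 = f1_score(t, p_mapped, labels=np.arange(cont.shape[0]), average="macro")
    return ari, f1


# usage: identical partitions with permuted label names score 1.0 on both metrics
print(evaluate_clustering([0, 0, 1, 1, 2, 2], [1, 1, 0, 0, 2, 2]))
```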

Scalability analysis of each method on the mouse whole brain MERFISH dataset

We tested the scalability of scNiche and other methods on the processed mouse whole brain MERFISH dataset, which contains 3,698,530 cells from 129 coronal sections. For scNiche, we first performed PCA dimensionality reduction on the data and then applied the batch training strategy on multiple sections with the number of batches set to 500. The other parameters were set to ‘k_cutoff’ = 30, ‘epochs’ = 25, and ‘lr’ = 0.01. The target number of clusters was set to 14 based on the cluster stability result. For CellCharter, we first performed PCA dimensionality reduction on the data and ran it with default parameters. For UTAG, we applied the batch mode provided in the tutorial with the parameter ‘max_dist’ = 30. For BANKSY, we applied the multi-sample analysis with the parameters ‘k_geom’ = 30 and ‘lambda’ = 0.8. The number of clusters was also set to 14 to be consistent with scNiche. In addition, as a nonspatial baseline for comparison, we applied k-means directly to the principal components of the data. The annotated mouse brain coronal section images were downloaded from the Allen Mouse Brain Atlas⁴¹ [https://mouse.brain-map.org/experiment/thumbnails/100048576?image_type=atlas]. The scalability benchmarking was conducted on the same computing cluster as the other benchmarking studies.

Analysis of the human TNBC MIBI-TOF dataset and the mouse liver Seq-Scope dataset

For both the human TNBC MIBI-TOF dataset and the mouse liver Seq-Scope dataset, we applied the batch training strategy to run scNiche on multiple slices with the number of batches set to 20. Notably, although Seq-Scope achieves subcellular resolution, the UMIs for each high-definition map coordinate identifier (HDMI) must be aggregated to produce interpretable results¹⁰. We therefore used the 10 μm grid binning provided by the original authors, because the resolution of these bins was already close to the single-cell level and there was little noise in cell type identification¹⁰ (Supplementary Fig. S32). We set the parameters ‘k_cutoff’ = 30, ‘epochs’ = 100, and ‘lr’ = 0.01 for both datasets. The target number of clusters was determined by the cluster stability results.

Spatial connectivity analysis of cell niches

Given cells belonging to N identified cell niches $C={\left\{{c}^{\left(n\right)}\right\}}_{n=1}^{N}$, we first computed the spatial graph of cells based on Delaunay triangulation using the ‘spatial_neighbors’ function in the squidpy⁸⁹ Python package (version 1.2.3) and obtained the adjacency matrix A. The number of spatial links between cell niches i and j can then be defined as:

$${{Link}}^{(i,j)}={\sum}_{v\in {c}^{\left(i\right)},w\in {c}^{\left(j\right)}}{A}_{{vw}}$$

(22)

Larger values indicate a stronger spatial connectivity between cell niches.
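Eq. (22) is a block sum of the adjacency matrix and can be computed for all niche pairs at once as $H^{\top}AH$, where $H$ is the one-hot cell-by-niche membership matrix. In the sketch below, the Delaunay graph is built directly with `scipy.spatial.Delaunay` as a stand-in for squidpy's `spatial_neighbors`, so edge sets may differ if squidpy applies additional options; the helpers `delaunay_adjacency` and `niche_links` are hypothetical.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.spatial import Delaunay


def delaunay_adjacency(coords):
    """Hypothetical helper: symmetric 0/1 adjacency of the Delaunay
    triangulation of 2-D coordinates."""
    edges = set()
    for simplex in Delaunay(coords).simplices:
        for a in range(3):
            for b in range(a + 1, 3):
                i, j = sorted((int(simplex[a]), int(simplex[b])))
                edges.add((i, j))
    src = [i for i, _ in edges] + [j for _, j in edges]
    dst = [j for _, j in edges] + [i for i, _ in edges]
    n = len(coords)
    return csr_matrix((np.ones(len(src)), (src, dst)), shape=(n, n))


def niche_links(adj, niche_labels, n_niches):
    """Hypothetical helper for Eq. (22): Link[i, j] = sum of adj[v, w]
    over cells v in niche i and w in niche j."""
    n = adj.shape[0]
    # one-hot membership matrix H (cells x niches)
    membership = csr_matrix((np.ones(n), (np.arange(n), niche_labels)),
                            shape=(n, n_niches))
    return (membership.T @ adj @ membership).toarray()


# usage: a triangle whose first two cells share niche 0
adj = delaunay_adjacency(np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]))
print(niche_links(adj, np.array([0, 0, 1]), 2))
```

Because A is symmetric and Eq. (22) sums over ordered pairs (v, w), each edge inside a niche is counted twice on the diagonal, while an edge between two different niches contributes once to Link^(i,j) and once to Link^(j,i).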

Differential gene expression analysis

Differentially expressed genes were identified using the ‘rank_genes_groups’ function in the scanpy¹⁰¹ Python package (version 1.9.1) with the Wilcoxon rank-sum test, retaining genes with an adjusted p-value < 0.05.
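For readers without scanpy at hand, the underlying test can be sketched with SciPy: a one-vs-rest Wilcoxon rank-sum test per gene followed by Benjamini–Hochberg correction, which is the default correction in scanpy's `rank_genes_groups`. This is an approximation, not scanpy's code: `scipy.stats.ranksums` uses a normal approximation without tie correction, so p-values may differ slightly, and the helpers `bh_adjust` and `one_vs_rest_wilcoxon` are hypothetical.

```python
import numpy as np
from scipy.stats import ranksums


def bh_adjust(pvals):
    """Hypothetical helper: Benjamini-Hochberg adjusted p-values."""
    p = np.asarray(pvals, dtype=float)
    n = p.size
    order = np.argsort(p)
    scaled = p[order] * n / np.arange(1, n + 1)
    # enforce monotonicity, working down from the largest p-value
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(n)
    adjusted[order] = np.minimum(scaled, 1.0)
    return adjusted


def one_vs_rest_wilcoxon(expr, in_group, alpha=0.05):
    """Hypothetical helper: rank-sum test of each gene (column) for cells in the
    group versus all other cells; returns significant gene indices and adjusted p."""
    in_group = np.asarray(in_group, dtype=bool)
    pvals = np.array([ranksums(expr[in_group, g], expr[~in_group, g]).pvalue
                      for g in range(expr.shape[1])])
    padj = bh_adjust(pvals)
    return np.flatnonzero(padj < alpha), padj
```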

Gene expression signature score calculation

Gene expression signature scores were calculated using the ‘score_genes’ function in the scanpy¹⁰¹ Python package (version 1.9.1) with default parameters. Signature gene sets of KEGG¹¹¹ pathways were downloaded using the ‘get_library’ function in the gseapy¹¹² Python package (version 1.1.2). Signature gene sets for cellular inflammatory infiltration and fibrosis were obtained from the original publication by Te et al.⁷⁴.

Pathway enrichment analysis

Gene set enrichment analysis¹¹³ (GSEA) was performed with default parameters using the gseapy¹¹² Python package (version 1.1.2). The hallmark gene sets were downloaded from the Molecular Signatures Database¹¹⁴,¹¹⁵ using the ‘get_library’ function in gseapy.

Statistics

Python (version 3.9.19) and R (versions 4.2.1 and 4.3.1) were used for the statistical analysis.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Data availability

No new data was generated for this study. All data used in this study is publicly available and can be accessed through the following links: (1) the multi-condition human PBMCs dataset [downloaded using the ExperimentHub R package (muscData, EH2259)]⁹⁸; (2) the multi-condition mouse cortex dataset [downloaded using the ExperimentHub R package (muscData, EH3297)]⁹⁹; (3) the mouse spleen CODEX dataset [https://data.mendeley.com/datasets/zjnpwh8m5b/1]¹²; (4) the human UTUC IMC dataset [https://doi.org/10.5281/zenodo.6376766]³³; (5) the mouse V1 neocortex STARmap dataset [https://zenodo.org/record/7830764#.ZDpObi-1HUI]³⁴; (6) the mouse frontal cortex and striatum MERFISH dataset [https://cellxgene.cziscience.com/collections/31937775-0602-4e52-a799-b6acdd2bac2e]³⁵; (7) the human MTG snRNA-seq dataset [https://portal.brain-map.org/atlases-and-data/rnaseq/human-mtg-smart-seq]³⁸; (8) the human DLPFC 10X Visium dataset [http://spatial.libd.org/spatialLIBD/]³⁷; (9) the mouse whole brain (ABCA-1) MERFISH dataset [https://cellxgene.cziscience.com/collections/0cca8620-8dee-45d0-aef5-23f032a5cf09]⁴⁰; (10) the human TNBC MIBI-TOF dataset [https://gene.ai.tencent.com/SpatialOmics/dataset?datasetID=47]¹³; (11) the mouse liver Seq-Scope dataset [https://doi.org/10.7302/cjfe-wa35]¹⁰; (12) the mouse brain spatial CUT&Tag–RNA-seq dataset [https://zenodo.org/records/10362607]⁹⁷. Source data are provided with this paper.

Code availability

scNiche is an open-access Python package available in the GitHub repository (https://github.com/ZJUFanLab/scNiche), under the GPL-3.0 license. The relevant code is also accessible via Zenodo (https://zenodo.org/records/14195486)¹¹⁶. Source code from CMGEC was used, with written permission from the authors.

References

Rojas-Ríos, P. & González-Reyes, A. Concise review: The plasticity of stem cell niches: a general property behind tissue homeostasis and repair. Stem Cells 32, 852–859 (2014).
Mendelson, A. & Frenette, P. S. Hematopoietic stem cell niche maintenance during homeostasis and regeneration. Nat. Med. 20, 833–846 (2014).
Kanemaru, K. et al. Spatially resolved multiomics of human cardiac niches. Nature 619, 801–810 (2023).
Ren, Y. et al. Spatial transcriptomics reveals niche-specific enrichment and vulnerabilities of radial glial stem-like cells in malignant gliomas. Nat. Commun. 14, 1028 (2023).
Guilliams, M. et al. Spatial proteogenomics reveals distinct and evolutionarily conserved hepatic macrophage niches. Cell 185, 379–396.e38 (2022).
Lake, B. B. et al. An atlas of healthy and injured cell states and niches in the human kidney. Nature 619, 585–594 (2023).
Moffitt, J. R. et al. Molecular, spatial, and functional single-cell profiling of the hypothalamic preoptic region. Science 362, eaau5324 (2018).
Stickels, R. R. et al. Highly sensitive spatial transcriptomics at near-cellular resolution with Slide-seqV2. Nat. Biotechnol. 39, 313–319 (2021).
Shi, H. et al. Spatial atlas of the mouse central nervous system at molecular resolution. Nature 622, 552–561 (2023).
Cho, C.-S. et al. Microscopic examination of spatial transcriptome using Seq-Scope. Cell 184, 3559–3572.e22 (2021).
Chen, A. et al. Spatiotemporal transcriptomic atlas of mouse organogenesis using DNA nanoball-patterned arrays. Cell 185, 1777–1792.e21 (2022).
Goltsev, Y. et al. Deep Profiling of Mouse Splenic Architecture with CODEX Multiplexed Imaging. Cell 174, 968–981.e15 (2018).
Keren, L. et al. A Structured Tumor-Immune Microenvironment in Triple Negative Breast Cancer Revealed by Multiplexed Ion Beam Imaging. Cell 174, 1373–1387.e19 (2018).
Jackson, H. W. et al. The single-cell pathology landscape of breast cancer. Nature 578, 615–620 (2020).
He, S. et al. High-plex imaging of RNA and proteins at subcellular resolution in fixed tissue by spatial molecular imaging. Nat. Biotechnol. 40, 1794–1806 (2022).
Zhu, Q., Shah, S., Dries, R., Cai, L. & Yuan, G.-C. Identification of spatially associated subpopulations by combining scRNAseq and sequential fluorescence in situ hybridization data. Nat. Biotechnol. https://doi.org/10.1038/nbt.4260 (2018).
Zhao, E. et al. Spatial transcriptomics at subspot resolution with BayesSpace. Nat. Biotechnol. 39, 1375–1384 (2021).
Liu, W. et al. Joint dimension reduction and clustering analysis of single-cell RNA-seq and spatial transcriptomics data. Nucleic Acids Res. 50, e72 (2022).
Li, Z. & Zhou, X. BASS: multi-scale and multi-sample analysis enables accurate cell type clustering and spatial domain detection in spatial transcriptomic studies. Genome Biol. 23, 168 (2022).
Wu, Z. et al. Discovery and generalization of tissue structures from spatial omics data. Cell Rep. Methods 4, 100838 (2024).
Kim, J. et al. Unsupervised discovery of tissue architecture in multiplexed imaging. Nat. Methods 19, 1653–1661 (2022).
Varrone, M., Tavernari, D., Santamaria-Martínez, A., Walsh, L. A. & Ciriello, G. CellCharter reveals spatial cell niches associated with tissue remodeling and cell plasticity. Nat. Genet. 56, 74–84 (2024).
Singhal, V. et al. BANKSY unifies cell typing and tissue domain segmentation for scalable spatial omics data analysis. Nat. Genet. 56, 431–441 (2024).
Hu, J. et al. SpaGCN: Integrating gene expression, spatial location and histology to identify spatial domains and spatially variable genes by graph convolutional network. Nat. Methods 18, 1342–1351 (2021).
Dong, K. & Zhang, S. Deciphering spatial domains from spatially resolved transcriptomics with an adaptive graph attention auto-encoder. Nat. Commun. 13, 1739 (2022).
Long, Y. et al. Spatially informed clustering, integration, and deconvolution of spatial transcriptomics with GraphST. Nat. Commun. 14, 1155 (2023).
Ren, H., Walker, B. L., Cang, Z. & Nie, Q. Identifying multicellular spatiotemporal organization of cells with SpaceFlow. Nat. Commun. 13, 4076 (2022).
Hu, Y. et al. Unsupervised and supervised discovery of tissue cellular neighborhoods from cell phenotypes. Nat. Methods 21, 267–278 (2024).
Chen, Z., Soifer, I., Hilton, H., Keren, L. & Jojic, V. Modeling Multiplexed Images with Spatial-LDA Reveals Novel Tissue Microenvironments. J. Comput. Biol. 27, 1204–1218 (2020).
Traag, V. A., Waltman, L. & van Eck, N. J. From Louvain to Leiden: guaranteeing well-connected communities. Sci. Rep. 9, 5233 (2019).
Qian, J. et al. Simulating multiple variability in spatially resolved transcriptomics with scCube. Nat. Commun. 15, 5021 (2024).
Yuan, Z. MENDER: fast and scalable tissue structure identification in spatial omics data. Nat. Commun. 15, 207 (2024).
Ohara, K. et al. The evolution of metastatic upper tract urothelial carcinoma through genomic-transcriptomic and single-cell protein markers analysis. Nat. Commun. 15, 2009 (2024).
Wang, X. et al. Three-dimensional intact-tissue sequencing of single-cell transcriptional states. Science 361, eaat5691 (2018).
Allen, W. E., Blosser, T. R., Sullivan, Z. A., Dulac, C. & Zhuang, X. Molecular and spatial signatures of mouse brain aging at single-cell resolution. Cell 186, 194–208.e18 (2023).
Ståhl, P. L. et al. Visualization and analysis of gene expression in tissue sections by spatial transcriptomics. Science 353, 78–82 (2016).
Maynard, K. R. et al. Transcriptome-scale spatial gene expression in the human dorsolateral prefrontal cortex. Nat. Neurosci. 24, 425–436 (2021).
Hodge, R. D. et al. Conserved cell types with divergent features in human versus mouse cortex. Nature 573, 61–68 (2019).
Kleshchevnikov, V. et al. Cell2location maps fine-grained cell types in spatial transcriptomics. Nat. Biotechnol. 40, 661–671 (2022).
Zhang, M. et al. Molecularly defined and spatially resolved cell atlas of the whole mouse brain. Nature 624, 343–354 (2023).
Wang, Q. et al. The Allen Mouse Brain Common Coordinate Framework: A 3D Reference Atlas. Cell 181, 936–953.e20 (2020).
Xu, H. et al. Unsupervised spatially embedded deep representation of spatial transcriptomics. Genome Med. 16, 12 (2024).
Duan, B., Chen, S., Cheng, X. & Liu, Q. Multi-slice spatial transcriptome domain analysis with SpaDo. Genome Biol. 25, 73 (2024).
Haviv, D. et al. The covariance environment defines cellular niches for spatial inference. Nat. Biotechnol. https://doi.org/10.1038/s41587-024-02193-4 (2024).
Quail, D. F. & Joyce, J. A. Microenvironmental regulation of tumor progression and metastasis. Nat. Med. 19, 1423–1437 (2013).
Anderson, N. M. & Simon, M. C. The tumor microenvironment. Curr. Biol. 30, R921–R925 (2020).
Binnewies, M. et al. Understanding the tumor immune microenvironment (TIME) for effective therapy. Nat. Med. 24, 541–550 (2018).
Junttila, M. R. & de Sauvage, F. J. Influence of tumour micro-environment heterogeneity on therapeutic response. Nature 501, 346–354 (2013).
Bejarano, L., Jordāo, M. J. C. & Joyce, J. A. Therapeutic Targeting of the Tumor Microenvironment. Cancer Discov. 11, 933–959 (2021).
Ptacek, J. et al. Multiplexed ion beam imaging (MIBI) for characterization of the tumor microenvironment across tumor types. Lab. Invest. 100, 1111–1123 (2020).
Hsieh, W.-C. et al. Spatial multi-omics analyses of the tumor immune microenvironment. J. Biomed. Sci. 29, 96 (2022).
Kudelova, E. et al. Genetic Heterogeneity, Tumor Microenvironment and Immunotherapy in Triple-Negative Breast Cancer. Int. J. Mol. Sci. 23, 14937 (2022).
Fridman, W. H. et al. B cells and tertiary lymphoid structures as determinants of tumour immune contexture and clinical outcome. Nat. Rev. Clin. Oncol. 19, 441–457 (2022).
Sautès-Fridman, C. et al. Tertiary Lymphoid Structures in Cancers: Prognostic Value, Regulation, and Manipulation for Therapeutic Intervention. Front. Immunol. 7, 407 (2016).
Mao, X. et al. Crosstalk between cancer-associated fibroblasts and immune cells in the tumor microenvironment: new findings and future perspectives. Mol. Cancer 20, 131 (2021).
Nishikawa, H. & Koyama, S. Mechanisms of regulatory T cell infiltration in tumors: implications for innovative immune precision therapies. J. Immunother. Cancer 9, e002591 (2021).
Gabrilovich, D. I. & Nagaraj, S. Myeloid-derived suppressor cells as regulators of the immune system. Nat. Rev. Immunol. 9, 162–174 (2009).
Keren, L. et al. MIBI-TOF: A multiplexed imaging platform relates cellular phenotypes and tissue structure. Sci. Adv. 5, eaax5851 (2019).
Carrel, S., Schmidt-Kessen, A. & Giuffrè, L. Recombinant interferon-gamma can induce the expression of HLA-DR and -DC on DR-negative melanoma cells and enhance the expression of HLA-ABC and tumor-associated antigens. Eur. J. Immunol. 15, 118–123 (1985).
Hadrup, S., Donia, M. & Thor Straten, P. Effector CD4 and CD8 T cells and their role in the tumor microenvironment. Cancer Microenviron. 6, 123–133 (2013).
Lee, E. S. et al. Calretinin, CD34, and alpha-smooth muscle actin in the identification of peritoneal invasive implants of serous borderline tumors of the ovary. Mod. Pathol. 19, 364–372 (2006).
Anggorowati, N. et al. Histochemical and Immunohistochemical Study of α-SMA, Collagen, and PCNA in Epithelial Ovarian Neoplasm. Asian Pac. J. Cancer Prev. 18, 667–671 (2017).
Sarrió, D. et al. Epithelial-mesenchymal transition in breast cancer relates to the basal-like phenotype. Cancer Res. 68, 989–997 (2008).
Cho, C.-S. et al. Concurrent activation of growth factor and nutrient arms of mTORC1 induces oxidative liver injury. Cell Discov. 5, 60 (2019).
Lopez, R., Regier, J., Cole, M. B., Jordan, M. I. & Yosef, N. Deep generative modeling for single-cell transcriptomics. Nat. Methods 15, 1053–1058 (2018).
Halpern, K. B. et al. Single-cell spatial reconstruction reveals global division of labour in the mammalian liver. Nature 542, 352–356 (2017).
Ben-Moshe, S. et al. Spatial sorting enables comprehensive characterization of liver zonation. Nat. Metab. 1, 899–911 (2019).
Hildebrandt, F. et al. Spatial Transcriptomics to define transcriptional patterns of zonation and structural components in the mouse liver. Nat. Commun. 12, 7046 (2021).
Richter, M. L. et al. Single-nucleus RNA-seq2 reveals functional crosstalk between liver zonation and ploidy. Nat. Commun. 12, 4264 (2021).
Wang, S. et al. Region-specific cellular and molecular basis of liver regeneration after acute pericentral injury. Cell Stem Cell 31, 341–358.e7 (2024).
Paris, J. & Henderson, N. C. Liver zonation, revisited. Hepatology 76, 1219–1230 (2022).
Sack, G. H. Serum Amyloid A (SAA) Proteins. Subcell. Biochem. 94, 421–436 (2020).
Strazzabosco, M., Fabris, L. & Albano, E. Osteopontin: a new player in regulating hepatic ductular reaction and hepatic progenitor cell responses during chronic liver injury. Gut 63, 1693–1694 (2014).
Te, J. A., AbdulHameed, M. D. M. & Wallqvist, A. Systems toxicology of chemically induced liver and kidney injuries: histopathology-associated gene co-expression modules. J. Appl. Toxicol. 36, 1137–1149 (2016).
Manchanda, M. et al. Cathepsin L and B as Potential Markers for Liver Fibrosis: Insights From Patients and Experimental Models. Clin. Transl. Gastroenterol. 8, e99 (2017).
Yu, C., Wan, Y., Piao, L. & Wu Cheng, X. Can cysteinyl cathepsin activity control diet-induced NAFLD? Int. J. Cardiol. Heart Vasc. 28, 100516 (2020).
Ruiz-Blázquez, P., Pistorio, V., Fernández-Fernández, M. & Moles, A. The multifaceted role of cathepsins in liver disease. J. Hepatol. 75, 1192–1202 (2021).
Anakk, S. et al. Combined deletion of Fxr and Shp in mice induces Cyp17a1 and results in juvenile onset cholestasis. J. Clin. Invest. 121, 86–95 (2011).
Gant, T. W. et al. Gene expression profiles associated with inflammation, fibrosis, and cholestasis in mouse liver after griseofulvin. EHP Toxicog. 111, 37–43 (2003).
Li, L. & Falany, C. N. Elevated hepatic SULT1E1 activity in mouse models of cystic fibrosis alters the regulation of estrogen responsive proteins. J. Cyst. Fibros. 6, 23–30 (2007).
Lalani, A. I., Zhu, S., Gokhale, S., Jin, J. & Xie, P. TRAF molecules in inflammation and inflammatory diseases. Curr. Pharm. Rep. 4, 64–90 (2018).
Gao, L. et al. Tumor necrosis factor receptor-associated factor 5 (Traf5) acts as an essential negative regulator of hepatic steatosis. J. Hepatol. 65, 125–136 (2016).
Fang, J. et al. Scientometric analysis of mTOR signaling pathway in liver disease. Ann. Transl. Med. 8, 93 (2020).
He, J. et al. Mammalian Target of Rapamycin Complex 1 Signaling Is Required for the Dedifferentiation From Biliary Cell to Bipotential Progenitor Cell in Zebrafish Liver Regeneration. Hepatology 70, 2092–2106 (2019).
Matter, M. S., Decaens, T., Andersen, J. B. & Thorgeirsson, S. S. Targeting the mTOR pathway in hepatocellular carcinoma: current state and future trends. J. Hepatol. 60, 855–865 (2014).
Cho, C.-S., Kowalsky, A. H. & Lee, J. H. Pathological Consequences of Hepatic mTORC1 Dysregulation. Genes (Basel) 11, 896 (2020).
Chen, F. et al. Loss of Ufl1/Ufbp1 in hepatocytes promotes liver pathological damage and carcinogenesis through activating mTOR signaling. J. Exp. Clin. Cancer Res. 42, 110 (2023).
Jin, S. et al. Inference and analysis of cell-cell communication using CellChat. Nat. Commun. 12, 1088 (2021).
Palla, G. et al. Squidpy: a scalable framework for spatial omics analysis. Nat. Methods 19, 171–178 (2022).
Cable, D. M. et al. Robust decomposition of cell type mixtures in spatial transcriptomics. Nat. Biotechnol. 40, 517–526 (2022).
Elosua-Bayes, M., Nieto, P., Mereu, E., Gut, I. & Heyn, H. SPOTlight: seeded NMF regression to deconvolute spatial transcriptomics spots with single-cell transcriptomes. Nucleic Acids Res. 49, e50 (2021).
Wei, R. et al. Spatial charting of single-cell transcriptomes in tissues. Nat. Biotechnol. 40, 1190–1199 (2022).
Biancalani, T. et al. Deep learning and alignment of spatially resolved single-cell transcriptomes with Tangram. Nat. Methods 18, 1352–1362 (2021).
Vahid, M. R. et al. High-resolution alignment of single-cell and spatial transcriptomes with CytoSPACE. Nat. Biotechnol. 41, 1543–1548 (2023).
Qian, J. et al. Reconstruction of the cell pseudo-space from single-cell RNA sequencing data with scSpace. Nat. Commun. 14, 2484 (2023).
Zhang, Q. et al. Leveraging spatial transcriptomics data to recover cell locations in single-cell RNA-seq with CeLEry. Nat. Commun. 14, 4050 (2023).
Zhang, D. et al. Spatial epigenome-transcriptome co-profiling of mammalian tissues. Nature 616, 113–122 (2023).
Kang, H. M. et al. Multiplexed droplet single-cell RNA-sequencing using natural genetic variation. Nat. Biotechnol. 36, 89–94 (2018).
Crowell, H. L. et al. muscat detects subpopulation-specific state transitions from multi-sample multi-condition single-cell transcriptomics data. Nat. Commun. 11, 6077 (2020).
Morgan, M. & Shepherd, L. ExperimentHub: Client to Access ExperimentHub Resources. (2022).
Wolf, F. A., Angerer, P. & Theis, F. J. SCANPY: large-scale single-cell gene expression data analysis. Genome Biol. 19, 15 (2018).
Yuan, Z. et al. SODB facilitates comprehensive exploration of spatial omics data. Nat. Methods 20, 387–399 (2023).
Wang, Y., Chang, D., Fu, Z. & Zhao, Y. Consistent Multiple Graph Embedding for Multi-View Clustering. IEEE Trans. Multimedia 25, 1008–1018 (2021).
Lotfollahi, M. et al. Mapping single-cell data to reference atlases by transfer learning. Nat. Biotechnol. 40, 121–130 (2022).
Kipf, T. N. & Welling, M. Semi-Supervised Classification with Graph Convolutional Networks. In International Conference on Learning Representations (ICLR) (2017).
Belghazi, I., Rajeswar, S., Baratin, A., Hjelm, R. D. & Courville, A. C. MINE: Mutual Information Neural Estimation. In Proceedings of the 35th International Conference on Machine Learning (PMLR), 531–540 (2018).
Hjelm, R. D. et al. Learning deep representations by mutual information estimation and maximization. In International Conference on Learning Representations (ICLR) (2019).
Nowozin, S., Cseke, B. & Tomioka, R. f-GAN: Training Generative Neural Samplers using Variational Divergence Minimization. Advances in Neural Information Processing Systems 29, 271–279 (2016).
Wang, M. et al. Deep Graph Library: A Graph-Centric, Highly-Performant Package for Graph Neural Networks. Preprint at arXiv:1909.01315 (2019).
Fowlkes, E. B. & Mallows, C. L. A Method for Comparing Two Hierarchical Clusterings. J. Am. Stat. Assoc. 78, 553–569 (1983).
Kanehisa, M. & Goto, S. KEGG: kyoto encyclopedia of genes and genomes. Nucleic Acids Res. 28, 27–30 (2000).
Fang, Z., Liu, X. & Peltz, G. GSEApy: a comprehensive package for performing gene set enrichment analysis in Python. Bioinformatics 39, btac757 (2023).
Subramanian, A. et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc. Natl Acad. Sci. USA 102, 15545–15550 (2005).
Liberzon, A. et al. Molecular signatures database (MSigDB) 3.0. Bioinformatics 27, 1739–1740 (2011).
Liberzon, A. et al. The Molecular Signatures Database (MSigDB) hallmark gene set collection. Cell Syst. 1, 417–425 (2015).
Fan, X. & Qian, J. ZJUFanLab/scNiche: scNiche v1.1.0. Zenodo https://doi.org/10.5281/zenodo.14195486 (2024).

Acknowledgements

This work was supported by the National Natural Science Foundation of China (U23A20513 to X.F.; 82274213 to H.H.), the “Pioneer” and “Leading Goose” R&D Program of Zhejiang (2024C03106 to X.F.), and the Fundamental Research Funds for the Central Universities (226-2024-00001 to X.F.). The authors thank the High-Performance Computing Cluster of the Zhejiang University Innovation Center of Yangtze River Delta for technical support, and Dr. Yao Zhao and his team for developing the multi-view clustering framework CMGEC, which inspired the development of this project.

Author information

These authors contributed equally: Jingyang Qian, Xin Shao, Hudong Bao.

Authors and Affiliations

College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, China
Jingyang Qian, Xin Shao, Hudong Bao, Wenbo Guo, Chengyu Li, Anyao Li & Xiaohui Fan
State Key Laboratory of Chinese Medicine Modernization, Innovation Center of Yangtze River Delta, Zhejiang University, Jiaxing, 314102, China
Jingyang Qian, Xin Shao, Wenbo Guo, Chengyu Li, Anyao Li & Xiaohui Fan
Zhejiang Key Laboratory of Chinese Medicine Modernization, Innovation Center of Yangtze River Delta, Zhejiang University, Jiaxing, 314100, China
Xin Shao, Wenbo Guo & Xiaohui Fan
College of Computer Science and Technology, Zhejiang University, Hangzhou, 310013, China
Yin Fang
Translational Chinese Medicine Key Laboratory of Sichuan Province, SiChuan Institute for Translational Chinese Medicine, Chengdu, 610041, China
Hua Hua
Zhejiang Key Laboratory of Precision Diagnosis and Therapy for Major Gynecological Diseases, Women’s Hospital, Zhejiang University School of Medicine, Hangzhou, 310006, China
Xiaohui Fan

Contributions

X.F. and H.H. conceived the study. J.Q. and X.S. implemented the scNiche model. J.Q., X.S., H.B., Y.F., W.G., C.L., and A.L. collected datasets involved in this article, performed benchmarking experiments, and conducted experimental analyses on biological datasets. J.Q. wrote the manuscript, and all authors edited and revised the manuscript.

Corresponding authors

Correspondence to Xin Shao, Hua Hua or Xiaohui Fan.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Communications thanks Nigel Chou, who co-reviewed with Vipul Singhal and Xinrui Zhou; Zhenqin Wu and Nancy Zhang for their contribution to the peer review of this work. A peer review file is available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information (PDF)

Description of Additional Supplementary Files (PDF)

Supplementary Data 1 (XLSX)

Reporting Summary (PDF)

Transparent Peer Review file (PDF)

Source data

Source Data (XLSX)

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Qian, J., Shao, X., Bao, H. et al. Identification and characterization of cell niches in tissue from spatial omics data at single-cell resolution. Nat Commun 16, 1693 (2025). https://doi.org/10.1038/s41467-025-57029-9

Received: 12 August 2024
Accepted: 03 February 2025
Published: 16 February 2025
Version of record: 16 February 2025
DOI: https://doi.org/10.1038/s41467-025-57029-9