Abstract
We propose HERGAST, a system for spatial structure identification and signal amplification in ultra-large-scale and ultra-high-resolution spatial transcriptomics data. To handle ultra-large spatial transcriptomics (ST) data, we adopt the divide-and-conquer strategy and devise a Divide-Iterate-Conquer framework tailored to spatial transcriptomics data analysis, which can also be adopted by other computational methods to extend them to ultra-large-scale ST data. To tackle the potential over-smoothing problem arising from data splitting, we construct a heterogeneous graph network to incorporate both local and global spatial relationships. In simulations, HERGAST consistently outperforms other methods across all settings with more than a 10% increase in average adjusted Rand index (ARI). In real-world datasets, HERGAST’s high-precision spatial clustering identifies SPP1+ macrophages intermingled within colorectal tumors, while the enhanced gene expression signals reveal unique spatial expression patterns of key genes in breast cancer.
Introduction
Spatial transcriptomics (ST) technologies have transformed our understanding of gene expression organization within tissues, providing valuable insights into cellular heterogeneity and tissue architecture. Recently, ST technology has been transitioning to high-definition platforms, such as Visium HD1, which commonly reaches about 500 K bins (Supplementary Table 1), and Spatial Molecular Imager (SMI)2 or Xenium3, which achieves subcellular resolution. This transition has significantly increased data size and sparsity, posing computational challenges in delineating spatial tissue architecture, identifying cell types, and detecting spatially specific gene expression. Please note that different ST platforms generate data with varying minimal captured locations, which are referred to by different names. For instance, in Visium HD, the minimal unit is typically an \(8{\rm{\mu }}m\times 8{\rm{\mu }}m\) resolution bin, whereas, in SMI and Xenium, the minimal unit is a cell. For the sake of convenience in this paper, we refer to these minimal units as “spots,” regardless of the ST platform used.
Despite the progress made through various computational approaches in analyzing ST data, they have struggled to perform effectively with these new ultra-large-scale and ultra-high-resolution technologies. BayesSpace4 is a Bayesian model with a Markov random field; the Markov chain Monte Carlo (MCMC) optimization process is quite time-consuming. SpatialPCA5 leverages a factor analysis model for spatially-aware dimensional reduction, but the storage of pairwise distance kernel makes it unusable when the number of spots scales up. Some deep learning approaches, such as SpaGCN6, SEDR7, conST8, STAGATE9, GraphST10, SiGra11, and xSiGra12 utilize graph neural networks for representation learning, integrating spatial location and gene expression to delineate tissue spatial structures. However, the GPU memory is commonly much smaller than CPU memory, making these methods unscalable for datasets with hundreds of thousands of spots. Simpler techniques like PCA, focusing solely on gene expression similarity, inadequately harness spatial information, thus limiting the comprehensive utilization of ST data. Additionally, the increased resolution and data scale dilute biological signals, with highly expressed genes overshadowing signals from lowly expressed ones in Xenium data3. Enhancing or denoising the biological signal in such large-scale, high-resolution datasets is also a pressing issue that must be addressed.
Given the substantial challenge posed by the scale of the dataset, we draw inspiration from the divide-and-conquer strategy prevalent in computer science and mathematics13. We split the whole slice into small, processable patches, iteratively train an advanced model on these patches, and then run inference on the whole slice and conduct downstream analysis based on the model’s output. This strategic framework is termed “Divide-Iterate-Conquer” (DIC). However, employing this methodology without proper consideration may lead to a potential issue: the model is agnostic of other patches when trained on a specific patch; thus, global patterns may be neglected, and the learned spatial pattern can be fragmented and partial, an issue we refer to as over-smoothing. To solve this problem, we need to impose an implicit connection between different patches during training. A fundamental assumption guiding our approach is that cells located on either side of a partition border are likely to be of the same cell type, exhibiting similar gene expression profiles. Therefore, we introduce gene expression profile similarity between different spots as a means of establishing connections through which information can flow during training. By linking central spots to border spots within a patch and implicitly connecting border spots to border spots in adjacent patches based on expression similarities, implicit associations between spots across different patches are formed, enabling the model to capture global patterns. Introducing spatial neighborhood relationships as an additional connection between spots further enhances the model’s ability to learn local patterns, creating a heterogeneous graph neural network. Furthermore, using a cross-attention mechanism, the model can dynamically learn the attention weights for different relationships, facilitating the adaptive fusion of local and global spatial relationships. A graph illustration can be found in Fig. 1b.
a We divide the spatial transcriptomics data into patches and iteratively train HERGAST on them; inference is then conducted on the entire slice. b To avoid over-smoothing, we use the heterogeneous graph neural network as the core model, introducing gene expression profile similarity between different spots as a means of establishing connections through which information can flow during training. By linking central spots to border spots within a patch (s14 to s16 and s8 to s11) and implicitly connecting border spots to border spots in adjacent patches (s16 to s11) based on expression similarities, implicit connections between spots across different patches are created (e.g., the orange dash-lined circle). Spatial neighborhood relationship is added as another connection between spots to make the output of model locally-aware (e.g., s2 to s3 and s4, the blue dash-lined circle). Each spot’s profile is transformed into a latent embedding by an encoder and reconstructed using a linear decoder. c By considering gene expression similarity and spatial proximity, HERGAST generates low-dimensional embeddings that enable fine-scale spatial clustering. The reconstructed expression profile serves as amplified gene expression signal.
In summary, we introduce HERGAST (High-resolution Enhanced Relational Graph Attention Network for ST), a method that takes advantage of the DIC framework, a heterogeneous graph network, and cross-attention mechanisms to handle the overwhelming data scale of high-definition ST technologies like Visium HD and Xenium. Through comprehensive analysis of simulated data and 10 real-world datasets, HERGAST demonstrated its superiority in deciphering highly heterogeneous ST landscapes. In the simulation study, HERGAST consistently outperformed other methods across all settings with more than a 10% average increase in all metrics. Notably, HERGAST showcases its capability to discern regional cell types even in the absence of cell boundary information, accurately replicating cell distribution characteristics within complex tissues such as colorectal cancer samples. Moreover, it enhances spatial distribution patterns of critical genes in high-resolution ST samples with high sensitivity and specificity, contributing to the precise identification and localization of rare cell types. These findings play an important role in unraveling the complexities of the tumor microenvironment and devising effective therapeutic strategies, and indicate that HERGAST stands out as a robust and meaningful approach for managing ultra-large-scale and ultra-high-resolution ST data.
Results
HERGAST model overview
Based on the core idea of utilizing a DIC framework for ultra-large-scale ST data analysis, HERGAST integrates DIC methodology, heterogeneous graph networks, and attention mechanisms to enhance spatial analysis. The DIC framework involves splitting the spatial slice into manageable patches, iteratively training on these patches, and then inferring on the whole slice, ensuring scalability across datasets of varying sizes (Fig. 1a, see “Methods”). Within each patch, HERGAST leverages a heterogeneous graph network to combine spatial proximity and gene expression similarity, enabling spots to leverage local spatial information and attend to distant spots with similar expression profiles, thereby addressing potential over-smoothing issues (Fig. 1b). By employing a cross-attention mechanism, the model dynamically learns attention weights for different relationships, facilitating the adaptive fusion of local and global spatial relationships.
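To make the two ingredients above concrete, the following is a minimal sketch of splitting a slice into patches and building two edge types per patch. It is an illustration under stated assumptions, not the HERGAST implementation: the regular grid layout, the brute-force k-nearest-neighbour construction, `k=4`, and the helper names `split_into_patches` and `knn_edges` are all hypothetical choices.

```python
import numpy as np

def split_into_patches(coords, n_rows, n_cols):
    """Assign each spot to one cell of an n_rows x n_cols grid over the slice."""
    mins = coords.min(axis=0)
    spans = coords.max(axis=0) - mins
    spans[spans == 0] = 1.0  # guard against a degenerate axis
    scaled = (coords - mins) / spans  # coordinates rescaled to [0, 1]
    # Clip so that the maximum coordinate falls into the last bin.
    col = np.minimum((scaled[:, 0] * n_cols).astype(int), n_cols - 1)
    row = np.minimum((scaled[:, 1] * n_rows).astype(int), n_rows - 1)
    return row * n_cols + col

def knn_edges(features, k):
    """Directed (source, target) edges to the k nearest neighbours (Euclidean)."""
    d = np.linalg.norm(features[:, None, :] - features[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)  # exclude self-loops
    nbrs = np.argsort(d, axis=1)[:, :k]
    src = np.repeat(np.arange(len(features)), k)
    return np.stack([src, nbrs.ravel()], axis=1)

rng = np.random.default_rng(0)
coords = rng.uniform(0, 100, size=(400, 2))  # toy spot coordinates
expr = rng.normal(size=(400, 20))            # toy expression features (assumed, e.g. PCs)

patch_id = split_into_patches(coords, n_rows=2, n_cols=2)
for p in np.unique(patch_id):
    idx = np.where(patch_id == p)[0]
    spatial_edges = knn_edges(coords[idx], k=4)    # spatial-proximity relation
    similarity_edges = knn_edges(expr[idx], k=4)   # expression-similarity relation
    # A relational graph network would consume both edge sets for this patch
    # in one training iteration before moving on to the next patch.
```

The brute-force distance matrix is only practical because each patch is small, which is the point of the divide step; a tree- or index-based neighbour search would be the natural replacement for larger patches.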
After training, the model conducts comprehensive inference across the entire slice, with the encoder generating low-dimensional embeddings that capture both gene expression similarity and spatial proximity. The low-dimensional embedding can be used to elucidate intricate spatial structures. Furthermore, by leveraging the spatial distribution relationships to inform gene expression profiles, the decoder of the HERGAST model generates a reconstructed expression spectrum, which can be used to enhance critical spatial patterns and amplify biologically significant signals that may have been subtle in the original data (Fig. 1c).
HERGAST consistently outperformed other methods in simulation
We initiated our study by generating ST datasets of varying scales to evaluate the scalability of different existing methods. We evaluated all methods on a machine with an 80 GB A100 GPU and 512 GB of CPU memory. We compared the GPU memory usage, CPU memory usage, and time consumption of all methods under these settings with varying data sizes. We tested the scalability of various deep learning-based graph neural network models (GraphST, SEDR, conST, STAGATE, SiGra, and xSiGra) before and after introducing the DIC strategy. We observed that these methods, without the DIC strategy, failed to analyze ST slices at a scale exceeding 80,000 spots due to GPU memory limitations (dashed lines in Fig. 2a). After introducing the DIC strategy, these graph neural network methods demonstrated greater scalability (beyond 80,000 spots) and lower GPU memory usage than before (solid lines in Fig. 2a). Among these methods, STAGATE benefitted most from the DIC strategy, with STAGATE (DIC) capable of processing datasets as large as 640,000 spots.
a Maximum GPU memory consumption of different GPU-based methods at different data scales. The dashed line indicates the maximum CUDA memory of the GPU in our experiments (NVIDIA A100-SXM4-80GB). b Maximum memory consumption of methods based on graph networks that incorporate the DIC strategy alongside statistical model methods at different data scales. Since the last experiment of each method ran out of memory and its usage could not be measured, we estimated the approximate required CUDA memory and CPU memory based on the memory allocation information obtained during the experiment. Source data are provided as a Source Data file. c An illustration of the simulated spatial transcriptome data. The left panel depicts the ground truth spatial pattern in a setting with 360,000 spots. The right panel demonstrates the increasingly complex spatial pattern as the number of spots increases, exemplified by the indicated area in the left panel. d Performance of spatial clustering of different methods across varying conditions. The x-axis represents the number of spots in the corresponding simulation setting, and the y-axis represents the ARI score. We ran 10 independent replications for each setting; the data points represent the mean value, and the error bars represent the standard error calculated across these replications. Source data are provided as a Source Data file. e Visualization results of the last replication in the 640,000-spot setting. Visualization results of all settings can be found in Supplementary Figs. 4–9. f Schematic diagram of the ground truth in different data scenarios. g ARIs of models using different graph types across varying conditions. We ran 10 independent replications for each setting. The bar represents the mean value, and the error bars represent the standard error calculated across these replications. Data points of the 10 replications are also overlaid on the plot. Source data are provided as a Source Data file.
Although the GPU memory of GraphST (DIC), SEDR (DIC), conST (DIC), SiGra (DIC), and xSiGra (DIC) was not fully utilized, these methods failed to operate on datasets larger than 250,000 spots. Further investigation revealed that this limitation was related to CPU memory consumption. Specifically, these methods exceeded the available CPU memory capacity of 512 GB when processing datasets of 250,000 spots. When GPU memory constraints no longer limit scalability, the CPU memory consumption becomes the pivotal factor, similar to statistical models that depend solely on CPU memory. Consequently, we expanded our analysis to include the CPU memory consumption of various methods and incorporated two CPU-only computational methods for ST analysis: BayesSpace and SpatialPCA. The results indicate that HERGAST consumes less CPU memory than other methods, enabling it to operate on larger datasets under the same computational conditions (Fig. 2b).
We also evaluated computational efficiency by measuring the computation time of various methods. HERGAST was the fastest method. For the dataset with 80,000 spots, HERGAST required only 3.45 min, while the second-fastest method, SEDR, took 18.99 min, and SpatialPCA exceeded 2000 min, which is impractical. Even with 640,000 spots, HERGAST’s execution time was just 34.25 min, whereas STAGATE (DIC) took 291.6 min (Supplementary Fig. 1c).
Next, we conducted an extensive simulation study to compare the performance of HERGAST against other methods in identifying fine-scale spatial structures. We also included PCA (implemented via the Scanpy package14) as a baseline, which focuses on straightforward dimensionality reduction based solely on expression similarity. Simulated datasets were created by manually designing spatial tissue structures containing diverse cell clusters. As the spot count increased, the intricacy of the spatial tissue structure and the diversity of cell clusters increased, mirroring the evolution in ST technologies (Fig. 2c and Supplementary Figs. 3–8). Gene expressions for each cell cluster were stochastically sampled from the comprehensive Human Lung Cell Atlas (HLCA) with cell annotations15 (see “Methods”). Performance was evaluated with four metrics: Adjusted Rand Index (ARI), Normalized Mutual Information (NMI), Fowlkes-Mallows Index (FMI), and Homogeneity score (HS), to provide a comprehensive assessment of model performance.
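As a reference for how the headline metric is defined, the sketch below computes ARI from the contingency counts of two label vectors using only the Python standard library. It is an illustrative re-derivation of the standard formula, not the evaluation code used in this study; scikit-learn's `sklearn.metrics.adjusted_rand_score` is the usual practical choice. The function name and the convention of returning 1.0 for degenerate inputs are assumptions of this sketch.

```python
from collections import Counter
from math import comb

def adjusted_rand_index(labels_true, labels_pred):
    """ARI = (index - expected index) / (max index - expected index)."""
    n = len(labels_true)
    if n < 2:
        return 1.0  # assumed convention: no pairs to disagree on
    # Number of spot pairs grouped together in both partitions.
    cells = Counter(zip(labels_true, labels_pred))
    sum_cells = sum(comb(c, 2) for c in cells.values())
    # Number of spot pairs grouped together within each partition separately.
    sum_true = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_pred = sum(comb(c, 2) for c in Counter(labels_pred).values())
    expected = sum_true * sum_pred / comb(n, 2)
    max_index = (sum_true + sum_pred) / 2
    if max_index == expected:  # e.g. both partitions are a single cluster
        return 1.0
    return (sum_cells - expected) / (max_index - expected)

# ARI ignores the names of the cluster labels: a relabelled copy scores 1.0.
same = adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])
# A partition that splits every true cluster scores below zero.
crossed = adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1])
```

Because ARI is corrected for chance, values near 0 indicate agreement no better than random labelling, which is why it is a stricter summary than raw pair-counting accuracy.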
The results indicate that, for most methods, performance tends to decline as the data scale increases (Fig. 2d and Supplementary Fig. 2). In contrast, HERGAST demonstrates remarkable robustness, maintaining strong performance even with extremely large datasets. For ARI and FMI, HERGAST achieves overall average scores of 0.613 and 0.725, respectively, which are approximately 10% higher than those of the second-best method, STAGATE(DIC), with average ARI and FMI scores of 0.527 and 0.613 (Fig. 2d and Supplementary Fig. 2a, c). For NMI and HS, HERGAST’s advantages may not be as pronounced when the data scale is relatively small (Supplementary Fig. 2b, d). This could be attributed to insufficient data heterogeneity at smaller scales, making it easier to achieve high scores in homogeneity. However, as the data scale increases, HERGAST consistently shows significant advantages across all metrics (Fig. 2d and Supplementary Fig. 2), also outperforming the second-best method by ~10% in terms of both NMI and HS, with overall average scores of 0.702 and 0.741, compared to 0.615 and 0.646 for the second-best method. While STAGATE(DIC) demonstrated satisfactory performance regarding the quantitative metrics, there were instances of spatial domain over-smoothing, suggesting a limitation in capturing the heterogeneity present in large-scale, high-resolution ST data (Fig. 2f and Supplementary Fig. 8). These results highlight HERGAST’s efficacy in fine-scale spatial structure identification and its capacity to outperform existing methods in analyzing intricate ST data landscapes.
To further illustrate how HERGAST addresses the potential over-smoothing problem by introducing a heterogeneous graph network, and how effective this approach is across different data conditions (dense and sparse), we also conducted multiple repeated experiments using simulated data. Specifically, we utilized a dataset of 360,000 spots and randomly applied dropout to half of the spots to simulate a sparse scenario. In contrast, the full, original dataset served as the control group, representing a dense scenario (Fig. 2f). We compared the performance of DIC using the heterogeneous graph network (HERGAST) with that of DIC using a graph network based solely on spatial neighborhood relationships. Our results indicate that the heterogeneous graph network effectively reconstructs the true spatial patterns, whereas the traditional graph network exhibited significant over-smoothing (Supplementary Fig. 9a). Across ten repetitions, the model based on heterogeneous graphs performed significantly better than the model that considered only spatial neighborhood relationships. Additionally, the ARI for the model focusing solely on spatial neighborhood relationships decreased sharply from dense conditions (0.549) to sparse conditions (0.271). In contrast, HERGAST’s ARI exhibited only a slight decline, from 0.664 to 0.613 (Fig. 2g and Supplementary Fig. 9b), which again shows the robustness of HERGAST with the heterogeneous graph network.
HERGAST accurately identifies spatial domains in CosMx SMI lung cancer with robustness
To evaluate the performance of HERGAST in deciphering spatial domains in real data, we compared it with previous methods on a single-cell spatial transcriptomics (SCST) dataset, Lung-9-1, generated by NanoString CosMx SMI. This dataset consists of 20 fields of view (FOVs) from lung cancer tumor tissue2. Here we combined these FOVs into a whole slice to conduct integrative analysis. By employing the DIC strategy, we successfully scaled all methods to this dataset, which contains 87,589 spots, enabling a comprehensive comparison. The dataset includes original annotations that serve as ground truth (Fig. 3a), allowing us to evaluate each method’s performance using quantitative metrics.
a Spatial regions of ground truth provided by NanoString CosMx. b Performance comparison of all methods using ARI, NMI, FMI, and HS. Source data are provided as a Source Data file. c Visualization of spatial clustering results of the top 3 methods in ARI performance. Numbers in the titles are the corresponding ARI scores. Full visualization results can be seen in Supplementary Fig. 10. d d-1 shows the number of clusters produced by different methods from Leiden clustering with different resolution parameters in simulated data. Two dashed lines indicate the viable boundary (3 and 20). The black line shows the number of methods in the viable interval. d-2 shows the performance of different methods (ARI) with different resolution parameters in simulated data. The x-axis is shared between the two panels. e Same as d but in real-world data. Source data are provided as a Source Data file.
The results indicate that HERGAST outperforms the others in ARI and FMI, achieving scores of ARI = 0.53 and FMI = 0.639, highlighting HERGAST’s unique advantage in distinguishing between closely related cell types. Although SEDR (DIC) slightly surpassed HERGAST in NMI (0.431 vs. 0.424), and BayesSpace achieved a marginally higher HS (0.403 vs. 0.400), these findings suggest that these two methods may place greater emphasis on maintaining the integrity of local niches. Nevertheless, HERGAST’s overall performance remains better, further demonstrating the robustness of our method. SiGra and xSiGra follow closely behind, particularly SiGra, which shows a substantial gap compared to other methods (Fig. 3b, c). This indicates that SiGra incorporating additional image information is effective for understanding the spatial structure of ST slices. Notably, STAGATE (DIC) exhibited issues with over-smoothing, while results from conST were highly disorganized, yielding minimal useful information (Supplementary Fig. 10c).
In our spatial clustering analysis, all methods utilize the Leiden algorithm to produce the final clustering results, except for BayesSpace, which is an end-to-end clustering method that does not require Leiden clustering. Because Leiden is a graph-based clustering algorithm, its resolution parameter plays a significant role in determining the number of clusters generated. To provide a more comprehensive understanding of how this key parameter affects clustering outcomes, we conducted extensive experiments on both simulated data (80k spots, so that all methods can scale to this data after applying the DIC strategy) and real data (SMI lung cancer).
To determine an optimal parameter range to conduct sensitivity analysis, we first assessed how varying the resolution parameter impacted the number of clusters produced by all methods. While we know that the datasets contain approximately ten cell types, it is often challenging to ascertain the exact number of cell types in real-world applications. An insufficient number of clusters can result in the loss of valuable information, while an excessive number can lead to confusion and difficulties in interpretation. Therefore, we deemed a range of cluster counts between 3 and 20 to be viable. If the majority of methods produce results within this range for a specific resolution parameter, we consider that parameter to be fair and reasonable. Our findings indicate that for simulated data, resolution values between 0.01 and 0.1 are generally recommended, while for real data, values ranging from 0.2 to 0.5 appear to be effective (Fig. 3d-1, e-1).
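The selection rule described above (a resolution is kept when the majority of methods yield between 3 and 20 clusters) can be sketched as follows. The method names and cluster counts in `counts` are hypothetical placeholders rather than values from this study, and `viable_resolutions` is an illustrative helper name.

```python
def viable_resolutions(cluster_counts, low=3, high=20):
    """Return resolutions at which most methods give low..high clusters.

    cluster_counts maps resolution -> {method name: number of clusters}.
    """
    keep = []
    for res, per_method in sorted(cluster_counts.items()):
        n_ok = sum(low <= n <= high for n in per_method.values())
        if n_ok > len(per_method) / 2:  # strict majority of methods
            keep.append(res)
    return keep

# Hypothetical cluster counts for three placeholder methods.
counts = {
    0.01: {"method_a": 2, "method_b": 4, "method_c": 5},
    0.1: {"method_a": 6, "method_b": 9, "method_c": 12},
    1.0: {"method_a": 25, "method_b": 31, "method_c": 18},
}
selected = viable_resolutions(counts)  # [0.01, 0.1]
```

Requiring a majority rather than unanimity keeps a single outlier method from excluding a resolution that works for the rest of the comparison.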
Next, we conducted a comprehensive sensitivity analysis, which demonstrated that HERGAST consistently outperformed other methods, both within and outside the recommended parameter interval in the simulated data (Fig. 3d-2). In the case of real data, HERGAST also exhibited the best performance across all evaluations within the recommended parameter range of 0.2–0.5, ranking second in only a few instances outside this interval, remaining slightly below the state-of-the-art (SOTA) values (Fig. 3e-2). These results further underscore the robustness of the HERGAST model.
HERGAST enables fine-scale tumor microenvironment mapping in colorectal cancer
Colorectal cancer (CRC) represents a significant health burden globally and is characterized by intricate tumor microenvironments that play pivotal roles in disease progression and treatment outcomes16,17. Analyzing CRC spatial landscapes can offer valuable insights into tumor-stroma interactions, immune responses, and cellular heterogeneity critical for understanding disease mechanisms and developing targeted therapies18,19. In this context, leveraging ST data is instrumental as it enables mapping gene expression patterns within their spatial context, providing a comprehensive view of the tumor microenvironment20. High-resolution ST technologies like Visium HD capture detailed spatial gene expression profiles with exceptional granularity, which is essential for delineating complex spatial architectures within CRC samples. Here we utilized an \(8{\rm{\mu }}m\times 8{\rm{\mu }}m\) binned Visium HD slice of colorectal cancer from 10x Genomics, which contains 545,913 bins with expression profiles for 18,085 genes. Since only STAGATE (DIC) and PCA can also scale up to this data, we benchmark HERGAST against only these two methods here.
HERGAST provided a smoother and more refined delineation of tumor stroma regions, distinguishing them from tumor regions, unlike PCA, where stroma regions tended to intermingle with tumors (Fig. 4b). This finer resolution is crucial for accurately mapping the tumor microenvironment, enabling better understanding of tumor-stroma interactions for future cancer research21. STAGATE (DIC) still suffered from over-smoothing, splitting normal colon mucosal tissue into different clusters (right panel of Fig. 4a, box ①) and disrupting the continuity of tumor areas (right panel of Fig. 4a, box ②, ③, and ④). Within tumor regions, HERGAST identified an area with a distinct gene expression profile. Highly expressed genes in this region were normalized and merged into a metagene, showing an evident spatial distribution pattern (Fig. 4c). Enrichment analysis revealed significant upregulation of immune-related pathways, indicating enhanced anti-tumor immune response and active infiltration of immune cells within the tumor microenvironment22. HERGAST also identified a unique cluster of SPP1+ macrophages surrounding calcified areas (Fig. 4d, Supplementary Fig. 11), which were validated as phagocytes through correlation with H&E pathological images and confirmation by experienced pathologists. SPP1+ macrophages play a crucial role in the tumor microenvironment and are significant for patient prognosis23,24,25. In this dataset, SPP1+ macrophages are distributed around tumor cells, forming clusters, strands, or encircling gland-like structures that resemble epithelial cells, posing identification challenges (Fig. 4d). PCA and STAGATE(DIC) failed to correctly identify these cells. PCA misclassified them as tumor stroma cells, while STAGATE(DIC) grouped them with adjacent cells (Fig. 4a, g).
In contrast, HERGAST accurately identified these macrophages within the tumor, underscoring its advantage in deciphering complex spatial organization and revealing heterogeneity within tumor regions (Fig. 4e, f). HERGAST’s precise identification of macrophages and immune cells is crucial for understanding the immune landscape of colorectal cancer and informing immunotherapy development26. Subsequent validations across additional Visium HD datasets (human lung cancer, mouse brain, and mouse small intestine) further accentuated HERGAST’s superiority in delineating spatial domain patterns, underscoring its robustness across diverse ST contexts (Supplementary Figs. 12–14).
a Spatial clustering results of different methods in Visium HD human colorectal cancer slice. These same results are observed across 10 repeated experiments. b A zoomed-in view of a tumor area with hematoxylin and eosin (H&E) stained image (top left panel), HERGAST result (top right panel), PCA result (bottom left panel), and STAGATE(DIC) result (bottom right panel). c Left: dot plot of the highly expressed genes in HERGAST’s unique cluster indicated as box c in Fig. 4a. Right top: spatial expression of the combined metagene. Right bottom: corresponding significantly enriched biological pathways determined with the Metascape web tool. d Zoomed-in view of six examples of the spatial cluster of phagocytes uniquely identified by HERGAST (indicated as dashed-line boxes d in Fig. 4a) and the corresponding H&E image. e SPP1 spatial expression plot. f Separate display of the corresponding spatial cluster of HERGAST. g Separate display of the spatial cluster of PCA containing the corresponding area.
HERGAST enhances critical molecular signatures with high sensitivity and specificity
Utilizing high-resolution ST data, such as that provided by Xenium technology, offers a unique opportunity to delve into the intricate molecular landscape of cancer samples. However, despite its advantages, one potential limitation of high-resolution data like Xenium is the risk of oversaturation of high-intensity signals, which may mask or overshadow lower-intensity signals of biological significance3. In light of this challenge, beyond traditional spatial clustering methods, we are also interested in enhancing critical molecular signatures in Xenium samples. HERGAST is designed to learn both local and global spatial features, allowing it to effectively recognize and amplify signals from genes with inherent spatial characteristics. To elucidate how HERGAST enhances the signal-to-noise ratio, we conducted a series of simulations based on the methodology described by Yuan et al.27 for simulating spatially variable genes. We utilized two widely accepted distributions in single-cell and ST data: the zero-inflated Poisson distribution (ZIP) and the zero-inflated negative binomial distribution (ZINB). Through this approach, we simulated gene expression values across various spatial expression patterns. To assess HERGAST’s robustness, we introduced Gaussian noise to the original data, effectively reducing the expression signal in marked regions and lowering the signal-to-noise ratio (Fig. 5a). This noisy data served as the input for the HERGAST model. To evaluate the correspondence between the reconstructed gene expression and the original expression, we used the Pearson correlation coefficient as our metric.
a Reconstruction results of HERGAST based on zero-inflated Poisson distribution. The top panels are visualizations of the gene’s spatial expression. Numbers in the titles are the Pearson correlation coefficients with ground truth expression. Bottom panels are distributions of gene expression in the marked (highly expressed, in orange) and non-marked (lowly expressed, in blue) areas. Full results for various spatial patterns can be seen in Supplementary Fig. 15. Results based on zero-inflated negative binomial distribution can be seen in Supplementary Fig. 16. b Same as (a) but for the negative control case. The negative control case based on zero-inflated negative binomial distribution can be seen in Supplementary Fig. 17. c Reconstruction results of lymph node markers in normal colon tissue. From left to right, panels are: H&E staining image of this sample containing manual annotation of lymph node areas; clustering result of HERGAST; original expression of the lymph node marker metagene; reconstructed expression of the lymph node marker metagene. These same results are observed across 10 repeated experiments. d H&E image and spatial clustering results of different methods in Xenium human breast cancer slice. These same results are observed across 10 repeated experiments. e The original and amplified spatial expression of ERBB2, ESR1, PGR, and EGFR in selected regions. Full spatial expression landscapes can be found in Supplementary Fig. 18. f Zoomed-in view of the H&E image, original and amplified spatial expression of EGFR and ESR1 in selected corresponding red box regions in (e). More examples can be found in Supplementary Fig. 19.
Our results showed that, despite the heavily noised data making it difficult to discern the original spatial patterns (with a Pearson correlation coefficient of only 0.37), the reconstructed gene expression values could largely restore the original spatial patterns, achieving a Pearson correlation coefficient as high as 0.89 (Fig. 5a). Additionally, the model effectively separated the expression distributions of marked and non-marked regions. This outcome was consistent across different spatial patterns and both distributions (Supplementary Figs. 15 and 16), demonstrating the high sensitivity of the HERGAST model.
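The simulation setup described above can be sketched as follows: draw zero-inflated Poisson counts with a higher rate inside a marked region, add Gaussian noise, and score the agreement with the clean signal by Pearson correlation. All numeric settings here (grid size, circular region, rates, zero-inflation probability, noise scale) and the helper names are assumptions chosen for illustration; they are not the parameters used in this study, and no HERGAST reconstruction is performed.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 50 x 50 grid of spots with a circular "marked" region.
n_side = 50
xx, yy = np.meshgrid(np.arange(n_side), np.arange(n_side))
marked = (xx - 25) ** 2 + (yy - 25) ** 2 < 12 ** 2

def sample_zip(rate, zero_prob, rng):
    """Zero-inflated Poisson: Poisson counts, each zeroed with prob. zero_prob."""
    counts = rng.poisson(rate)
    dropout = rng.random(counts.shape) < zero_prob
    return np.where(dropout, 0, counts)

# Higher expression rate inside the marked region (assumed rates).
rate = np.where(marked, 8.0, 1.0)
truth = sample_zip(rate, zero_prob=0.3, rng=rng).astype(float)

# Gaussian noise lowers the signal-to-noise ratio of the model input.
noisy = truth + rng.normal(scale=6.0, size=truth.shape)

def pearson(a, b):
    """Pearson correlation between two expression maps, flattened."""
    return float(np.corrcoef(a.ravel(), b.ravel())[0, 1])

r_noisy = pearson(noisy, truth)
# A reconstructed map from any model would be scored the same way:
# pearson(reconstructed, truth)
```

Comparing `pearson(noisy, truth)` with `pearson(reconstructed, truth)` on the same simulated gene is what separates genuine signal recovery from a model that simply reproduces its noisy input.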
Furthermore, to demonstrate that the reconstructed data does not produce spurious spatial patterns, we generated completely random data under both distributions as a negative control. The reconstruction from this random input yielded results similar to the noise data, showing no discernible spatial patterns. This indicates that for genes lacking inherent spatial patterns, HERGAST’s reconstructed expression remains random (Fig. 5b and Supplementary Fig. 17). To verify this in real data, we sequenced a Xenium slice of human normal colorectal tissue containing small lymph nodes and conducted experiments on this slice. We utilized lymph node markers curated by Zhang et al. (Supplementary Table 2), normalizing and merging them into a metagene to assess the overall reconstruction outcome of HERGAST. The results indicate that HERGAST accurately identified the lymph node area in its clustering result. The original and reconstructed expression of the lymph node markers align well with the manually annotated lymph node areas in the H&E image (Fig. 5c). Notably, HERGAST enhanced the signal only in regions where the genes are actually expressed (the lymph nodes), while signals in other areas remained zero (Fig. 5c). This finding further demonstrates HERGAST’s specificity: it does not produce spurious spatial patterns or false positive signals.
Breast cancer remains a formidable challenge in oncology, characterized by its molecular heterogeneity and the complex interplay of various cellular components within the tumor microenvironment28. In this context, we used a Xenium breast cancer tissue section (167,780 cells, 280 genes) from 10x Genomics for comprehensive analysis. HERGAST perfectly matched manually annotated regions of interest, successfully identifying and separating invasive cancer, ductal carcinoma in situ (DCIS) #1, and DCIS #2. In contrast, PCA could not distinguish between the two DCIS areas, and the STAGATE(DIC) results exhibited excessive regional smoothing, providing limited useful information (Fig. 5d). Similar patterns were observed in four other cancer slices (Supplementary Figs. 20–23). To validate the enhancement of significant biological signals in our reconstructed gene expression, we examined the expression patterns of key breast cancer genes (ERBB2, ESR1, PGR, and EGFR) in both the original and reconstructed datasets. The enhanced expression clearly revealed a triple-positive region for ERBB2, ESR1, and PGR, consistent with previous reports3 (Fig. 5e and Supplementary Fig. 18). Additionally, the enhanced expression data revealed distinct spatial distributions of EGFR and ESR1 within the DCIS regions. EGFR-expressing cells were predominantly located around necrotic areas within the DCIS, forming a clear boundary, while ESR1-expressing cells exhibited a diffuse distribution in the DCIS (Fig. 5f and Supplementary Fig. 19). Notably, there is little overlap in the spatial expression of these two genes in this slice. Understanding the spatial distribution patterns of these molecular signatures can provide insights into microenvironmental influences and cellular interactions within DCIS29, which are not easily discernible from the original expression signals.
Discussion
In conclusion, HERGAST emerges as a highly effective approach tailored for spatial clustering and signal amplification in the realm of ultra-large-scale and ultra-high-resolution ST data. At the heart of our methodology lies the DIC strategy, strategically designed to efficiently tackle scalability concerns in spatial analysis. By leveraging the capabilities of a heterogeneous graph network and cross-attention mechanism to seamlessly integrate gene expression similarity and spatial proximity, HERGAST adeptly captures the intricate nuances of both local and global spatial relationships, skillfully sidestepping potential over-smoothing pitfalls.
Our simulation results underscore the potential of the DIC strategy to improve the scalability of methods for very large ST datasets, positioning it as a promising solution to the computational challenges posed by modern high-resolution ST technologies. In comparative analyses, HERGAST consistently outperformed other methods across all scenarios, maintaining superior performance even as the number of spots increased. This consistency highlights HERGAST’s robustness in managing large-scale data and highly heterogeneous scenarios, and its strength in fine-scale spatial structure identification.
Regarding applications, HERGAST continues to showcase exceptional performance, exhibiting enhanced spatial clustering capabilities in SMI Lung cancer and refined delineation of tumor stroma regions in the Visium HD CRC sample. Noteworthy is its proficiency in pinpointing high immune-response regions and unveiling the nuanced heterogeneity within tumor regions, offering invaluable insights into the dynamic intricacies of the tumor microenvironment and potential immunotherapeutic targets.
Furthermore, simulation and real data application demonstrated HERGAST’s high sensitivity and specificity to amplify the expression signals of critical genes in high-resolution ST data. The enhanced expression patterns of key breast cancer genes offered valuable insights into the spatial distribution of molecular signatures, particularly highlighting the identification of a triple-positive region for ERBB2, ESR1, and PGR, as well as the distinct localization of EGFR and ESR1 within the DCIS regions. Collectively, HERGAST represents a leap forward in the analysis of ultra-large-scale and ultra-high-resolution ST data, empowering researchers to delve into complex biological systems with unparalleled resolution and depth.
Methods
Construction of relational graph
The spatial neighborhood relationship is established by considering the Euclidean distance between the spatial locations of different spots. In simulation and Visium HD data, we set the graph to include the eight nearest neighbors for each spot. For other datasets, spot \(i\) and spot \(j\) are connected if the Euclidean distance between spot \(i\) and spot \(j\) is less than a pre-defined hyperparameter \(d\). The expression similarity relationship is constructed by considering the similarity of the PCA representation or selected highly variable genes of the gene expression spectra of different spots. Here we use PCA representation for Visium HD data and all genes’ expression profiles for Xenium data. We use Euclidean distance of each spot’s input representation to evaluate the similarity. Across all datasets, we connect each spot to the six spots exhibiting the most similar expression patterns.
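The two edge types described above can be illustrated with a minimal, brute-force Python sketch. It is not the authors' implementation: the function names (`knn_edges`, `radius_edges`, `build_relational_graph`) are hypothetical, the quadratic-time search is only practical for small examples, and only the defaults of eight spatial neighbors, six expression neighbors, and the distance threshold \(d\) come from the text.

```python
import math

def knn_edges(points, k):
    """Directed edges (i, j) linking each point to its k nearest
    neighbours under Euclidean distance (brute force, O(n^2))."""
    edges = set()
    for i, p in enumerate(points):
        dists = sorted((math.dist(p, q), j)
                       for j, q in enumerate(points) if j != i)
        edges.update((i, j) for _, j in dists[:k])
    return edges

def radius_edges(points, d):
    """Directed edges (i, j) for every pair of distinct points closer than d."""
    return {(i, j)
            for i, p in enumerate(points)
            for j, q in enumerate(points)
            if i != j and math.dist(p, q) < d}

def build_relational_graph(coords, features, k_spatial=8, k_expr=6, d=None):
    """Return one edge set per relation type.

    coords   : spot locations, used for the spatial relation
    features : per-spot input representation (e.g. PCA scores)
    d        : if given, use the distance threshold instead of k_spatial
    """
    if d is not None:
        spatial = radius_edges(coords, d)
    else:
        spatial = knn_edges(coords, k_spatial)
    expression = knn_edges(features, k_expr)
    return {"spatial": spatial, "expression": expression}
```

For real datasets a spatial index or an approximate nearest-neighbor search would replace the brute-force loops; the sketch only fixes the meaning of the two relations.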
Divide: split ultra-large-scale ST data into patches
To scale up the relational graph attention network to arbitrarily large ST datasets, we devise a “DIC” framework crafted specifically for ST data analysis. We first split the ST data into multiple patches. The number of patches is a hyperparameter. If the number of patches is too large, each patch will contain too few spots to carry enough information, making the training process hard to converge. If the number of patches is too small, each patch will contain too many spots, and the problem of intractable data scale returns. To guide the choice of the number of patches when splitting the whole slice, we conducted a comprehensive analysis (see Supplementary Note 1 and Supplementary Fig. 24) and suggest splitting the original slice into \(m\times m\) patches, where \(m=\text{round}\left(\sqrt{n/10000}\right)\) and \(n\) is the total number of spots, so that, on average, each patch will have about 10,000 spots. This is also the setting we adopted in this research.
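As a small worked illustration of this rule, the Python sketch below computes \(m\) and assigns spots to patches. The rule \(m=\text{round}(\sqrt{n/10000})\) is from the text; the equal-width grid over the coordinate bounding box, the floor of \(m\) at 1, and the function names are assumptions made for the example, since the splitting geometry is not specified here.

```python
import math

def n_patches_per_side(n_spots, target=10_000):
    """m = round(sqrt(n / target)); the floor at 1 is an assumption
    so that small slices still yield a single patch."""
    return max(1, round(math.sqrt(n_spots / target)))

def assign_patches(coords, m):
    """Map each (x, y) spot to a patch id in [0, m*m) using an
    equal-width m x m grid over the bounding box (assumed geometry)."""
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    x0, y0 = min(xs), min(ys)
    # Fall back to width 1.0 if all spots share a coordinate.
    wx = (max(xs) - x0) / m or 1.0
    wy = (max(ys) - y0) / m or 1.0
    ids = []
    for x, y in coords:
        col = min(int((x - x0) / wx), m - 1)  # clamp the far edge into the last bin
        row = min(int((y - y0) / wy), m - 1)
        ids.append(row * m + col)
    return ids
```

For example, a slice with 160,000 spots gives \(m=4\), that is, 16 patches of roughly 10,000 spots each when spots are evenly spread.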
Iterate: Iterative training through a relational graph attention network
After splitting the original ST slice into several patches, training is conducted iteratively on each patch. As discussed above, over-smoothing may occur unless the model is carefully designed to take both local and global features into account. Here, we utilize a heterogeneous graph network that integrates both spatial proximity and gene expression similarity. An intuitive explanation can be found in Fig. 1a: when constructing the heterogeneous graph, a central spot in a small patch can attend to the spots at the boundary through the gene expression similarity relationship (s14 to s16 and s8 to s11), and these boundary spots are in turn correlated with neighboring patches through shared gene expression (s16 to s11). This relationship arises because the cells on either side of a splitting border often belong to the same cell type. Implicit connections between spots across different patches are thus created (e.g., the orange dash-lined circle). Therefore, even though training is conducted on small patches, the HERGAST model can still implicitly learn some global patterns. This effectively mitigates the over-smoothing that might arise from splitting the data during training. In addition, we conducted a comprehensive ablation study under different data conditions to validate the effectiveness of the heterogeneous graph network model (Supplementary Fig. 9).
Through a cross-attention mechanism, our model adaptively learns attention weights for the different relationships, automatically weighing the importance of spatial neighbors and expression similarity. The relational graph attention auto-encoder consists of an encoder with two relational graph attention layers and a decoder with a single linear layer.
Encoder
The input of the encoder in our architecture is the relational graph constructed on each small patch, with \(R=2\) relation types and \(N\) nodes (spots). The \(i\)-th node is represented by a feature vector \({{\bf{x}}}_{i}\), which is either its PCA representation or the expression of the selected highly variable genes. Query and key representations for the \(l\)-th encoder layer are computed for each relation type with the help of both query and key kernels, i.e.,
where \({{\boldsymbol{W}}}_{l}\) is the trainable weight matrix of layer \(l\). Then, additive attention30 is applied to compute attention logits \({a}_{{ij}}^{\left(r\right)}\):
The attention coefficients for each relation type are then obtained via the across-relation attention mechanism:
where \({\mathcal{R}}\) denotes the set of relations, i.e., edge types. \({{\mathcal{N}}}_{{{\mathcal{r}}}^{{\prime} }}^{\left({\mathcal{i}}\right)}\) denotes the set of spots connected to node \(i\) under relation \({r}^{{\prime} }\). To avoid overfitting, we employed the dropout strategy on the normalized attention coefficients with dropout rate = 0.3. That is, \({\alpha }_{{ij}}^{\left(r\right)}\) will be set as 0 with a probability of 0.3.
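The effect of normalizing across relations, followed by dropout on the coefficients, can be illustrated for a single node with a short Python sketch. It assumes the attention logits are already available; the function names and the nested-dictionary layout are hypothetical. The dropout here simply zeroes coefficients with the stated probability, as described in the text, and applies no rescaling (deep learning libraries often rescale the survivors, which is not modeled).

```python
import math
import random

def across_relation_softmax(logits):
    """logits: {relation: {neighbour j: logit a_ij}} for one node i.
    Returns coefficients normalised jointly over all relations, so the
    weights of spatial and expression neighbours compete with each other."""
    peak = max(a for nbrs in logits.values() for a in nbrs.values())
    # Subtract the largest logit before exponentiating for numerical stability.
    exps = {r: {j: math.exp(a - peak) for j, a in nbrs.items()}
            for r, nbrs in logits.items()}
    total = sum(e for nbrs in exps.values() for e in nbrs.values())
    return {r: {j: e / total for j, e in nbrs.items()}
            for r, nbrs in exps.items()}

def attention_dropout(alpha, rate=0.3, rng=random):
    """Set each coefficient to 0 independently with probability `rate`."""
    return {r: {j: (0.0 if rng.random() < rate else a)
                for j, a in nbrs.items()}
            for r, nbrs in alpha.items()}

# Usage: two spatial and two expression neighbours with equal logits
# share the attention mass equally across both relations.
example = {"spatial": {1: 0.0, 2: 0.0}, "expression": {3: 0.0, 7: 0.0}}
alpha = across_relation_softmax(example)
```

Because the denominator sums over both relation types, a strongly similar expression neighbor can outweigh all spatial neighbors of the same spot, which is what lets the model trade off the two relations adaptively.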
Spot \(i\) then aggregates information from the spots connected to it in the neighborhood aggregation step to produce the output of layer \(l\). We denote the intermediate representation of \({{\bf{x}}}_{j}\) under relation \(r\) as \({{\bf{h}}}_{j}^{\left(r\right)}\): \({{\bf{h}}}_{j}^{\left(r\right)}={{\boldsymbol{W}}}_{l}^{\left(r\right)}{{\bf{x}}}_{j}\). To enhance the discriminative power of the HERGAST layer, we further implement the additive cardinality preservation mechanism31:
where \({\mathcal{W}}\) is a non-zero vector \(\in {R}^{n}\), n is the output dimension of layer \(l\), \(\odot\) denotes the elementwise multiplication.
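For orientation, the encoder steps described in this subsection can be written in one standard form, following the general relational graph attention formulation30 with additive attention, across-relation normalization, and additive cardinality preservation31. This is a sketch under assumptions: the query and key kernels \({{\boldsymbol{Q}}}^{(r)}\) and \({{\boldsymbol{K}}}^{(r)}\), the LeakyReLU nonlinearity, and the summation over both relation types in the last line are borrowed from that general formulation and are not confirmed by the text of this section.

\({q}_{i}^{(r)}={{\boldsymbol{W}}}_{l}^{(r)}{{\bf{x}}}_{i}\cdot {{\boldsymbol{Q}}}^{(r)},\quad {k}_{i}^{(r)}={{\boldsymbol{W}}}_{l}^{(r)}{{\bf{x}}}_{i}\cdot {{\boldsymbol{K}}}^{(r)}\)

\({a}_{ij}^{(r)}={\rm{LeakyReLU}}\left({q}_{i}^{(r)}+{k}_{j}^{(r)}\right)\)

\({\alpha }_{ij}^{(r)}=\frac{\exp \left({a}_{ij}^{(r)}\right)}{{\sum }_{{r}^{\prime}\in {\mathcal{R}}}{\sum }_{k\in {{\mathcal{N}}}_{{r}^{\prime}}^{(i)}}\exp \left({a}_{ik}^{({r}^{\prime})}\right)}\)

\({{\bf{h}}}_{i}={\sum }_{r\in {\mathcal{R}}}{\sum }_{j\in {{\mathcal{N}}}_{r}^{(i)}}{\alpha }_{ij}^{(r)}{{\bf{h}}}_{j}^{(r)}+{\mathcal{W}}\odot {\sum }_{r\in {\mathcal{R}}}{\sum }_{j\in {{\mathcal{N}}}_{r}^{(i)}}{{\bf{h}}}_{j}^{(r)}\)

The second term of the last line is what preserves cardinality: an attention-weighted average alone cannot distinguish a neighborhood from a scaled copy of itself, whereas the unweighted sum grows with the number of neighbors.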
Decoder
The decoder maps the latent embedding back to the original input representation. The one-layer linear decoder takes the output of the encoder (denoted by \({{\bf{h}}}_{i}\)) as its input and computes the reconstructed result as follows:
\({\hat{{\bf{x}}}}_{i}={\boldsymbol{W}}{{\bf{h}}}_{i}+{\bf{b}}\)
where \({\boldsymbol{W}}\) and \({\bf{b}}\) are the learnable weight matrix and bias vector. If the input representation is the expression of highly variable genes (HVGs), a ReLU activation is applied to obtain a non-negative reconstruction.
Loss function
The objective of HERGAST is to minimize the reconstruction loss of the original PCA profiles as follows:
During training, the model is trained iteratively on each patch. In all experiments, we used the Adam optimizer with learning rate = 0.001, weight decay = 0.0001, and 200 training epochs.
Conquer: Inference on whole slice using the trained model weights
Since the computational overhead of inference is much lower than that of training, inference is conducted on the relational graph constructed from the entire original dataset. Because the trained model has learned both local and global spatial patterns, the output of the encoder is taken as the final spot embedding and used for spatial clustering. The output of the decoder is taken as the reconstructed, amplified gene expression profile.
Simulation study for spatial clustering
For a comprehensive comparison of various methods on simulated data, we utilized the integrated HLCA as the single-cell reference dataset. The HLCA consists of over 2.3 million lung single cells with well-established cell type annotations. To simulate different spatial resolutions, we manually designed different spatial patterns with an increasing number of spots and increasingly complex and refined spatial structures. We populated these spatial regions by extracting different cell types from the HLCA and randomly selected one cell type to be diffusely distributed throughout the entire spatial area to mimic widely present cells. Among the compared methods, SiGra and xSiGra need accompanying images as input, so the pattern images used to generate the simulated data were fed into these models, offering them substantial information. To eliminate potential biases from cell type selection, we conducted ten replicates for each setting. Spatial clustering results were obtained using the Leiden algorithm implemented in the Scanpy package with resolution = 0.01 for all settings. The performance of the methods was evaluated by their agreement with the ground truth, measured by the ARI, NMI, FMI, and HS.
Evaluation metrics for spatial clustering
Adjusted Rand Index (ARI)
The Rand Index (RI) quantifies the concordance between two clustering assignments by evaluating pairwise relationships among samples. Specifically, it examines whether pairs of samples are grouped together or separately in both the predicted and reference clusterings. To account for agreement arising by chance, the raw RI is then adjusted using its expected value under random labeling:
\({\rm{ARI}}=\frac{{\rm{RI}}-E\left[{\rm{RI}}\right]}{\max \left({\rm{RI}}\right)-E\left[{\rm{RI}}\right]}\)
To compute this metric, begin by constructing a cross-tabulation matrix:
| | \({Y}_{1}\) | \({Y}_{2}\) | \(\cdots\) | \({Y}_{s}\) | \({\text{Sums}}\) |
| \({X}_{1}\) | \({n}_{11}\) | \({n}_{12}\) | \(\cdots\) | \({n}_{1s}\) | \({a}_{1}\) |
| \({X}_{2}\) | \({n}_{21}\) | \({n}_{22}\) | \(\cdots\) | \({n}_{2s}\) | \({a}_{2}\) |
| \(\vdots\) | \(\vdots\) | \(\vdots\) | \(\ddots\) | \(\vdots\) | \(\vdots\) |
| \({X}_{r}\) | \({n}_{r1}\) | \({n}_{r2}\) | \(\cdots\) | \({n}_{{rs}}\) | \({a}_{r}\) |
| \({\text{Sums}}\) | \({b}_{1}\) | \({b}_{2}\) | \(\cdots\) | \({b}_{s}\) | |
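Each entry \({n}_{ij}\) in this matrix is the number of samples assigned to both ground truth category \({X}_{i}\) and cluster \({Y}_{j}\); \({a}_{i}\) and \({b}_{j}\) are the row and column sums. The ARI is then derived from this matrix as:
\({\rm{ARI}}=\frac{{\sum }_{ij}\binom{{n}_{ij}}{2}-\left[{\sum }_{i}\binom{{a}_{i}}{2}{\sum }_{j}\binom{{b}_{j}}{2}\right]/\binom{n}{2}}{\frac{1}{2}\left[{\sum }_{i}\binom{{a}_{i}}{2}+{\sum }_{j}\binom{{b}_{j}}{2}\right]-\left[{\sum }_{i}\binom{{a}_{i}}{2}{\sum }_{j}\binom{{b}_{j}}{2}\right]/\binom{n}{2}}\)
where \(n\) is the total number of samples.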
The ARI takes a value close to 0.0 for randomly assigned labels, regardless of the sample size or number of clusters, and reaches a perfect score of 1.0 when the cluster assignments are identical (up to a permutation of labels). For highly discordant clusterings, the ARI is bounded below by −0.5.
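The pair-counting computation from the contingency table can be sketched in plain Python as follows. The function name and the handling of degenerate inputs (fewer than two samples, or a single cluster on both sides) are assumptions made for the example; in practice a library implementation would normally be used.

```python
from collections import Counter
from math import comb

def adjusted_rand_index(truth, pred):
    """ARI from the contingency table of two label sequences."""
    n = len(truth)
    if n < 2:
        return 1.0  # assumed convention: nothing to disagree on
    n_ij = Counter(zip(truth, pred))   # contingency table entries
    a = Counter(truth)                 # row sums
    b = Counter(pred)                  # column sums
    sum_ij = sum(comb(c, 2) for c in n_ij.values())
    sum_a = sum(comb(c, 2) for c in a.values())
    sum_b = sum(comb(c, 2) for c in b.values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = 0.5 * (sum_a + sum_b)
    if max_index == expected:
        return 1.0  # degenerate case, e.g. one cluster on both sides
    return (sum_ij - expected) / (max_index - expected)

# Usage: identical partitions up to a relabelling score 1.0,
# and a maximally discordant split of four samples scores -0.5.
same = adjusted_rand_index([0, 0, 1, 1, 2, 2], [1, 1, 0, 0, 2, 2])
worst = adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1])
```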
Normalized Mutual Information (NMI)
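Mutual Information (MI) quantifies the degree of association between two distinct label assignments for a given dataset. Let \(\left|{U}_{i}\right|\) and \(\left|{V}_{j}\right|\) denote the number of elements in clusters \({U}_{i}\) and \({V}_{j}\), respectively, and let \(N\) be the total number of samples. The MI between two clustering partitions \(U\) and \(V\) is expressed as follows:
\({\rm{MI}}\left(U,V\right)={\sum }_{i=1}^{\left|U\right|}{\sum }_{j=1}^{\left|V\right|}\frac{\left|{U}_{i}\cap {V}_{j}\right|}{N}\log \frac{N\left|{U}_{i}\cap {V}_{j}\right|}{\left|{U}_{i}\right|\left|{V}_{j}\right|}\)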
NMI represents a scaled version of the MI metric, transforming its values into a standardized range from 0 (indicating complete statistical independence) to 1 (representing perfect label correspondence).
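A minimal Python sketch of MI and NMI is given below. The normalization is not specified in this section; dividing by the arithmetic mean of the two label entropies is an assumption (it is a common default), as are the function names and the convention of returning 1.0 when both partitions consist of a single cluster.

```python
from collections import Counter
from math import log

def entropy(labels):
    """Shannon entropy (natural log) of a label sequence."""
    n = len(labels)
    return -sum(c / n * log(c / n) for c in Counter(labels).values())

def mutual_information(u, v):
    """MI between two label sequences of equal length."""
    n = len(u)
    size_u, size_v = Counter(u), Counter(v)
    return sum(c / n * log(n * c / (size_u[i] * size_v[j]))
               for (i, j), c in Counter(zip(u, v)).items())

def normalized_mutual_information(u, v):
    """MI scaled by the arithmetic mean of the entropies (assumed choice)."""
    h_u, h_v = entropy(u), entropy(v)
    if h_u == 0.0 and h_v == 0.0:
        return 1.0  # both partitions are a single cluster
    return mutual_information(u, v) / ((h_u + h_v) / 2)
```

Other normalizers (geometric mean, minimum, or maximum of the entropies) give slightly different values, so reported NMI scores are only comparable when the same choice is used.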
Fowlkes-Mallows Index (FMI)
FMI is defined as the geometric mean of the pairwise precision and recall:
\({\rm{FMI}}=\frac{{\rm{TP}}}{\sqrt{\left({\rm{TP}}+{\rm{FP}}\right)\left({\rm{TP}}+{\rm{FN}}\right)}}\)
In this context, True Positives represent instances where data point pairs are correctly grouped into identical clusters in both the ground truth and predicted cluster assignments. False Positives denote cases where pairs are erroneously grouped together in the predicted clustering but not in the actual classification. Conversely, False Negatives occur when pairs are incorrectly separated in the predicted clustering despite belonging to the same cluster in the reference classification. The FMI metric yields values within the interval [0, 1], with values approaching unity indicating strong agreement between the compared clustering solutions.
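The pair counts can be obtained without enumerating pairs, as in the following Python sketch. The function name and the convention of returning 0.0 when no pair is grouped together in both clusterings are assumptions made for the example.

```python
from collections import Counter
from math import comb, sqrt

def fowlkes_mallows_index(truth, pred):
    """FMI computed from pair counts over two label sequences."""
    # Pairs placed in the same cluster in both clusterings.
    tp = sum(comb(c, 2) for c in Counter(zip(truth, pred)).values())
    # Pairs together in the prediction but not in the ground truth.
    fp = sum(comb(c, 2) for c in Counter(pred).values()) - tp
    # Pairs together in the ground truth but split by the prediction.
    fn = sum(comb(c, 2) for c in Counter(truth).values()) - tp
    if tp == 0:
        return 0.0  # assumed convention when no pair agrees
    return tp / sqrt((tp + fp) * (tp + fn))
```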
Homogeneity score (HS)
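A clustering result satisfies homogeneity if all of its clusters contain only data points that are members of a single class. That is, the class distribution within each cluster should be concentrated on a single class, giving zero entropy. Rosenberg et al.32 determine how close a given clustering is to this ideal by examining the conditional entropy of the class distribution given the proposed clustering. Consider a set of classes \(C=\{{c}_{i}|i=1,\ldots,n\}\) and a set of clusters \(K=\{{k}_{j}|j=1,\ldots,m\}\), and let \(A\) be the contingency table produced by the clustering algorithm representing the clustering solution, such that \(A=\{{a}_{{ij}}\}\) where \({a}_{{ij}}\) is the number of data points that are members of class \({c}_{i}\) and elements of cluster \({k}_{j}\). The HS is defined as:
\({\rm{HS}}=1-\frac{H\left(C|K\right)}{H\left(C\right)}\)
where
\(H\left(C|K\right)=-{\sum }_{j=1}^{m}{\sum }_{i=1}^{n}\frac{{a}_{{ij}}}{N}\log \frac{{a}_{{ij}}}{{\sum }_{{i}^{\prime}=1}^{n}{a}_{{i}^{\prime}j}},\quad H\left(C\right)=-{\sum }_{i=1}^{n}\frac{{\sum }_{j=1}^{m}{a}_{{ij}}}{N}\log \frac{{\sum }_{j=1}^{m}{a}_{{ij}}}{N}\)
and \(N\) is the total number of data points. By convention, HS is taken to be 1 when \(H\left(C\right)=0\).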
HS ranges from 0.0 to 1.0. 1.0 stands for perfectly homogeneous labeling.
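A short Python sketch of the homogeneity computation from class and cluster labels follows. The function name is hypothetical, and returning 1.0 when the class entropy is zero is a convention assumed for the example.

```python
from collections import Counter
from math import log

def homogeneity_score(classes, clusters):
    """1 - H(C|K) / H(C), computed from two label sequences."""
    n = len(classes)
    joint = Counter(zip(classes, clusters))   # contingency entries a_ij
    class_sizes = Counter(classes)
    cluster_sizes = Counter(clusters)
    h_c = -sum(c / n * log(c / n) for c in class_sizes.values())
    if h_c == 0.0:
        return 1.0  # a single class is trivially homogeneous
    # Conditional entropy of the class given the cluster.
    h_c_given_k = -sum(a / n * log(a / cluster_sizes[k])
                       for (_, k), a in joint.items())
    return 1.0 - h_c_given_k / h_c
```

Note that homogeneity rewards over-splitting: assigning every data point to its own cluster yields a score of 1.0, which is why it is reported alongside the other metrics rather than alone.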
Simulation study for gene expression amplification
To elucidate how HERGAST enhances the signal-to-noise ratio, we conducted a series of simulations based on the methodology described by Yuan et al.27 for simulating spatially variable genes. Specifically, we employed two widely accepted distributions in single-cell transcriptomics: the zero-inflated negative binomial distribution (ZINB) and the zero-inflated Poisson distribution (ZIP). For each spatial pattern, the total number of spots was fixed at 160,000, and spatial coordinates were generated using a Poisson Point Process (PPP). The expression distribution in marked areas was assigned a higher mean value (indicating highly expressed regions), while the expression distribution in non-marked areas was assigned a lower mean value (indicating lowly expressed regions). As a negative control, we generated the expression values of all spots as random samples from a single distribution, which yields no spatial pattern. We then added Gaussian noise to the original data, with both the mean and variance equal to the mean of the marked area, effectively reducing the relative expression signal in the marked regions and lowering the signal-to-noise ratio. This noisy data served as the input for HERGAST. We use the Pearson correlation coefficient to measure the concordance between the reconstructed data and the original data:
\(r=\frac{{\sum }_{i}\left({g}_{i}-\bar{g}\right)\left({\hat{g}}_{i}-\bar{\hat{g}}\right)}{\sqrt{{\sum }_{i}{\left({g}_{i}-\bar{g}\right)}^{2}}\sqrt{{\sum }_{i}{\left({\hat{g}}_{i}-\bar{\hat{g}}\right)}^{2}}}\)
For a specific gene, \({g}_{i}\) is the original expression of the \(i\)-th spot and \({\hat{g}}_{i}\) is the reconstructed expression of the \(i\)-th spot. \(\bar{g}\) and \(\bar{\hat{g}}\) are the corresponding mean expressions across all spots.
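The data-generating steps and the evaluation metric can be illustrated with a small, standard-library Python sketch for the ZIP case. Every numerical setting here is an assumption chosen for the example (2000 spots instead of 160,000, a central square as the marked area, means of 5 and 1, a zero-inflation probability of 0.3), as are the function names. Coordinates are drawn uniformly for a fixed number of spots, which stands in for the Poisson point process.

```python
import math
import random

def sample_zip(mean, pi_zero, rng):
    """Zero-inflated Poisson draw: 0 with probability pi_zero,
    otherwise Poisson(mean) sampled with Knuth's method."""
    if rng.random() < pi_zero:
        return 0
    limit, k, p = math.exp(-mean), 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def simulate(n_spots=2000, high=5.0, low=1.0, pi_zero=0.3, seed=0):
    """Return coordinates, marked flags, clean counts and noisy input."""
    rng = random.Random(seed)
    coords = [(rng.random(), rng.random()) for _ in range(n_spots)]
    # Marked area: central square of the unit square (assumed pattern).
    marked = [0.25 <= x <= 0.75 and 0.25 <= y <= 0.75 for x, y in coords]
    clean = [sample_zip(high if m else low, pi_zero, rng) for m in marked]
    # Gaussian noise with mean and variance equal to the marked-area mean.
    noisy = [g + rng.gauss(high, math.sqrt(high)) for g in clean]
    return coords, marked, clean, noisy

coords, marked, clean, noisy = simulate()
r_noisy = pearson(clean, noisy)  # baseline concordance before reconstruction
```

Comparing `r_noisy` with the same coefficient computed on a reconstructed profile mirrors the before-and-after comparison reported for Fig. 5a.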
Real data analysis
We applied HERGAST to several ultra-large, super-resolution ST datasets generated by SMI (Lung cancer 9–1), Visium HD (Human Colorectal Cancer, Human Lung Cancer, Mouse Brain, Mouse Small Intestine) and Xenium (Human Breast Cancer, Human Colorectal Cancer, Human Pancreatic Ductal Adenocarcinoma, Human Lung Cancer, Human Ovarian Cancer). For the SMI Lung cancer data, all 20 FOVs were concatenated into a whole slice for integrative analysis, and cell type annotations were obtained from the provided Giotto object. For Visium HD, the 8 µm binned data were used in this study. The data sources and statistics are summarized in Supplementary Table 1. For the SMI and Visium HD datasets, we conducted normalization and scaling and then ran PCA using the Scanpy package14. The PCA representation of each spot was selected as the input of HERGAST (200 dimensions were used for the SMI and all Visium HD datasets; see Supplementary Note 1 and Supplementary Fig. 25 for the rationale). For Xenium datasets, the normalized expression of all genes was selected as the input of HERGAST for reconstruction. For all methods, spatial clustering results were obtained using the Leiden algorithm implemented in the Scanpy package with resolution = 0.3 for all datasets.
Reporting summary
Further information on research design is available in Nature Portfolio Reporting Summary linked to this article.
Data availability
The human normal colorectal Xenium data generated in this study have been deposited in the OMIX, China National Center for Bioinformation / Beijing Institute of Genomics, Chinese Academy of Sciences under accession code OMIX009402 (Normal sample). These data are available under restricted access according to the Method of Scientific Data Management and the Regulation on the Management of Human Genetic Resources of the People’s Republic of China. Access can be obtained by consenting to academic-only usage and submitting an application in the system. The human colorectal cancer Visium HD data used in this study are available from 10x Genomics’ data portal: https://www.10xgenomics.com/datasets/visium-hd-cytassist-gene-expression-libraries-of-human-crc. Human breast cancer Xenium data were downloaded from https://www.10xgenomics.com/products/xenium-in-situ/preview-dataset-human-breast (In Situ Sample 1, Replicate 1). The lung cancer dataset of NanoString CosMx SMI is available from https://nanostring.com/products/cosmx-spatial-molecular-imager/ffpe-dataset/nsclc-ffpe-dataset (Lung-9-1). Other public data used in supplementary experiments are summarized in Supplementary Table 1. Source data are provided with this paper.
Code availability
The code used to develop the model, perform the analyses and generate results in this study is publicly available and has been deposited in GitHub at https://github.com/GYQ-form/HERGAST, under MIT license. The specific version of the code associated with this publication is archived in Zenodo and is accessible via https://doi.org/10.5281/zenodo.15000094 (ref. 33).
References
Nagendran, M. et al. 1457 Visium HD enables spatially resolved, single-cell scale resolution mapping of FFPE human breast cancer tissue. J. Immunother. Cancer 11, A1620–A1620 (2023).
He, S. et al. High-plex imaging of RNA and proteins at subcellular resolution in fixed tissue by spatial molecular imaging. Nat. Biotechnol. 40, 1794–1806 (2022).
Janesick, A. et al. High resolution mapping of the tumor microenvironment using integrated single-cell, spatial and in situ analysis. Nat. Commun. 14, 8353 (2023).
Zhao, E. et al. Spatial transcriptomics at subspot resolution with BayesSpace. Nat. Biotechnol. 39, 1375–1384 (2021).
Shang, L. & Zhou, X. Spatially aware dimension reduction for spatial transcriptomics. Nat. Commun. 13, 7203 (2022).
Hu, J. et al. SpaGCN: Integrating gene expression, spatial location and histology to identify spatial domains and spatially variable genes by graph convolutional network. Nat. Methods 18, 1342–1351 (2021).
Fu, H. et al. Unsupervised spatially embedded deep representation of spatial transcriptomics. bioRxiv https://doi.org/10.1101/2021.06.15.448542 (2021).
Zong, Y. et al. conST: an interpretable multi-modal contrastive learning framework for spatial transcriptomics. bioRxiv https://doi.org/10.1101/2022.01.14.476408 (2022).
Dong, K. & Zhang, S. Deciphering spatial domains from spatially resolved transcriptomics with an adaptive graph attention auto-encoder. Nat. Commun. 13, 1739 (2022).
Long, Y. et al. Spatially informed clustering, integration, and deconvolution of spatial transcriptomics with GraphST. Nat. Commun. 14, 1155 (2023).
Tang, Z. et al. SiGra: single-cell spatial elucidation through an image-augmented graph transformer. Nat. Commun. 14, 5618 (2023).
Budhkar, A. et al. xSiGra: explainable model for single-cell spatial data elucidation. Brief. Bioinform. 25, bbae388 (2024).
Morales, F. A. & Martínez, J. A. Analysis of Divide-and-Conquer strategies for the 0–1 minimization knapsack problem. J. Comb. Optim. 40, 234–278 (2020).
Wolf, F. A., Angerer, P. & Theis, F. J. SCANPY: large-scale single-cell gene expression data analysis. Genome Biol. 19, 15 (2018).
Sikkema, L. et al. An integrated cell atlas of the lung in health and disease. Nat. Med. 29, 1563–1577 (2023).
Siegel, R. L., Wagle, N. S., Cercek, A., Smith, R. A. & Jemal, A. Colorectal cancer statistics, 2023. CA Cancer J. Clin. 73, 233–254 (2023).
Dienstmann, R. et al. Consensus molecular subtypes and the evolution of precision medicine in colorectal cancer. Nat. Rev. Cancer 17, 79–92 (2017).
Isella, C. et al. Stromal contribution to the colorectal cancer transcriptome. Nat. Genet. 47, 312–319 (2015).
Sveen, A. et al. Colorectal cancer consensus molecular subtypes translated to preclinical models uncover potentially targetable cancer cell dependencies. Clin. Cancer Res. 24, 794–806 (2018).
Valdeolivas, A. et al. Profiling the heterogeneity of colorectal cancer consensus molecular subtypes using spatial transcriptomics. NPJ Precis. Oncol. 8, 10 (2024).
Bremnes, R. M. et al. The role of tumor stroma in cancer progression and prognosis: emphasis on carcinoma-associated fibroblasts and non-small cell lung cancer. J. Thorac. Oncol. 6, 209–217 (2011).
Chandra, P., Grigsby, S. J. & Philips, J. A. Immune evasion and provocation by Mycobacterium tuberculosis. Nat. Rev. Microbiol. 20, 750–766 (2022).
Bill, R. et al. CXCL9:SPP1 macrophage polarity identifies a network of cellular programs that control human cancers. Science 381, 515–524 (2023).
Ozato, Y. et al. Spatial and single-cell transcriptomics decipher the cellular environment containing HLA-G+ cancer cells and SPP1+ macrophages in colorectal cancer. Cell Rep. 42, 111929 (2023).
Qi, J. et al. Single-cell and spatial analysis reveal interaction of FAP(+) fibroblasts and SPP1(+) macrophages in colorectal cancer. Nat. Commun. 13, 1742 (2022).
Galon, J. et al. Type, density, and location of immune cells within human colorectal tumors predict clinical outcome. Science 313, 1960–1964 (2006).
Yuan, X. et al. HEARTSVG: a fast and accurate method for identifying spatially variable genes in large-scale spatial transcriptomics. Nat. Commun. 15, 5700 (2024).
Prat, A. et al. Clinical implications of the intrinsic molecular subtypes of breast cancer. Breast 24, S26–S35 (2015).
Stahl, P. L. et al. Visualization and analysis of gene expression in tissue sections by spatial transcriptomics. Science 353, 78–82 (2016).
Busbridge, D., Sherburn, D., Cavallo, P. & Hammerla, N. Y. Relational graph attention networks. arXiv preprint arXiv:1904.05811 (2019).
Zhang, S. & Xie, L. Improving attention mechanism in graph neural networks via cardinality preservation. IJCAI 2020, 1395–1402 (2020).
Rosenberg, A. & Hirschberg, J. In Proc. 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL). 410–420 (Association for Computational Linguistics, 2007).
Gong, Y. et al. Unveiling fine-scale spatial structures and amplifying gene expression signals in ultra-large ST slices with HERGAST. HERGAST v1.0.0 https://doi.org/10.5281/zenodo.15000094 (2025).
Acknowledgements
This study was supported by grants from the National Natural Science Foundation of China (Grant No. 12171318 to Z.Y.), the Shanghai Science and Technology Commission (Grant No. 21ZR1436300, 23XD1401900, and 23DZ2290600 to Z.Y.), the Shanghai Jiao Tong University STAR Grant (Grant No. 20190102 to Z.Y.), the Medical Engineering Cross Fund of Shanghai Jiao Tong University (Grant No. YG2023ZD21 to Z.Y.), and Yu Lab. Some parts of computations in this paper were run on the Siyuan cluster supported by the Center for High Performance Computing at Shanghai Jiao Tong University.
Author information
Contributions
Y.G. performed the main research, analyzed data, and wrote the original manuscript. X.Y. and Q.J. investigated and interpreted the analysis outcomes. Z.Y. supervised the research. Y.G., X.Y., and Z.Y. discussed and revised the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Communications thanks Qianqian Song, Xuexia Wang and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. A peer review file is available.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Gong, Y., Yuan, X., Jiao, Q. et al. Unveiling fine-scale spatial structures and amplifying gene expression signals in ultra-large ST slices with HERGAST. Nat Commun 16, 3977 (2025). https://doi.org/10.1038/s41467-025-59139-w
DOI: https://doi.org/10.1038/s41467-025-59139-w