Abstract
Multicellular programs in the tumour microenvironment (TME) drive cancer pathogenesis and response to therapy but remain challenging to identify and profile clinically1,2,3. Here, we present a machine-learning framework for multi-analyte profiling of spatially dependent cell states and multicellular ecosystems, termed spatial ecotypes (SEs). By integrating over 10 million single-cell and spot-level spatial transcriptomes from diverse human carcinomas and melanomas, we identified nine SEs with broad conservation, each of which has unique biology, geospatial features and clinical outcome associations, including several linked to immunotherapy response. Notably, SEs were distinguishable by DNA methylation profiling and were recoverable from plasma cell-free DNA (cfDNA) using deep learning. In cfDNA from nearly 100 patients with melanoma, SE levels exhibited striking associations with immunotherapy response. Our data reveal fundamental units of TME organization and demonstrate a multimodal platform for profiling solid and liquid TMEs, with implications for improved risk stratification and therapy personalization.
Main
Multicellular ecosystems are fundamental units of tissue organization and key elements of phenotypic variation. In cancer, such ecosystems—arising from immune, stromal and/or malignant cells—form dynamic signalling hubs that powerfully influence disease progression, immune evasion and response to therapy1,2,3. Although single-cell and bulk genomics studies have revealed crucial insights into multicellular ecosystems in cancer1,2,3,4,5,6,7, also known as tumour ecotypes, relatively little is known about their phenotypic diversity in relation to key geographical features in the TME. An unbiased pan-cancer survey of spatially defined cell states, their patterns of co-association into SEs8 and their clinical relationships would illuminate TME biogeography and aid the discovery of improved diagnostics and therapeutic targets9.
Two main factors currently hinder the identification and clinical application of spatially resolved ecotypes in cancer. First, SEs are challenging to profile using existing methods, which are either limited in breadth to a modest number of predefined markers (for example, multiplexed protein imaging)1,5,10, ignore spatial information (for example, EcoTyper)4,6,7 or are unable to perform integrative analyses across diverse samples, cancer types and genomic platforms11,12,13. A second key challenge is the requirement for invasive tumour biopsies to analyse SEs in clinical settings. In particular, solid-tumour biospecimens are subject to considerable sampling bias and are generally restricted to a single diagnostic biopsy14,15. Although cfDNA has emerged as a promising non-invasive analyte with the potential to address these problems16,17, no liquid biopsy assay has yet been described for non-invasive TME assessment.
Here, we introduce a new multimodal framework for solid and liquid profiling of SEs in the TME (Fig. 1a). Our approach combines data fusion, statistical learning and deep learning to overcome critical barriers in both the detection and the recovery of SEs across genomic platforms and bodily compartments. Starting with over 10 million single-cell and spot-level spatial transcriptomes from human carcinomas and melanomas, we identified nine SEs with highly conserved cellular, spatial and clinical features across cancer types, including several that are predictive of response to immune checkpoint inhibitors (ICIs). We then tested whether SEs are recoverable by liquid biopsy. From whole-genome cfDNA methylation profiles of nearly 100 patients with melanoma, we observed striking concordance between plasma-derived SE levels, tumour biopsy-confirmed SE levels and known ICI outcomes. Our study demonstrates a unified approach for granular spatial profiling of the TME, with implications for improved forecasting and spatiotemporal monitoring of therapy response.
Fig. 1 | a, Schematic description of the study. Top: discovery and clinical characterization of spatially colocalized cell states in human tumours (SEs). Bottom: recovery of SEs in plasma cell-free DNA and the use of non-invasive SE profiling for immunotherapy response assessment. b, Compendium of human tumour ST samples (left) and single-cell-scale expression profiles (right) curated and analysed in this work (Supplementary Tables 1 and 2). Inner and outer rings denote platform and cancer type proportions, respectively. Tumour-type abbreviations are defined in Supplementary Table 1. c, Main cell types and key geographic regions in representative breast cancer specimens profiled by MERSCOPE (left; n = 365,811 cells) and 10x Genomics Visium (right; n = 16,860 cells). The latter was integrated with scRNA-seq data using CytoSPACE25, resulting in a single-cell reconstructed ST specimen. Scale bars, 1 mm. d, Heat maps depicting pan-cancer gene-expression variation between tumour and adjacent stromal regions for 8 TME cell types, 10 malignancies and 120 ST tumour samples covering three platforms (Supplementary Tables 1 and 2). Genes satisfying differential expression requirements in the discovery cohort (Q < 0.05, median log2-transformed fold change (log2 FC) across samples greater than 0.02), with a maximum of 200 genes per compartment, are shown (Supplementary Table 3 and Methods). The top 10 gene symbols per compartment alongside additional canonical markers are highlighted. All ST datasets were reconstructed from scRNA-seq profiles of matching malignancies using CytoSPACE (Methods), with analyses validating this approach in Extended Data Fig. 1a–c. Plasma cells are shown in Extended Data Fig. 1d. Tumour and adjacent stromal regions were annotated as described in Supplementary Methods. Illustrations in a were created with BioRender; Steen, C. https://BioRender.com/32pzjza (2026).
Spatially constrained TME cell states
High-resolution molecular profiling has revealed considerable plasticity among immune, fibroblast and endothelial subsets in the TME18,19,20,21,22,23,24. For example, CD8+ T cells preferentially express GZMB when localized to the tumour core but preferentially express GZMK when localized to the periphery6. However, the interplay between TME cell states and spatial context across the entire transcriptome, both for diverse malignancies and cell types, has remained unclear.
To gain insight into this question, and as a crucial first step towards charting SEs, we began by assembling spatial transcriptomics (ST) data covering 132 primary human tumour specimens from two main classes of malignancy: carcinoma, which is the most common type, and melanoma. Collectively, this dataset spans ten distinct neoplasms and six ST platforms, including bulk (10x Visium and legacy ST), single-cell (Vizgen MERSCOPE, 10x Xenium V1 and V3 Prime) and single-cell-scale (10x Visium HD) ST data (Fig. 1b, Supplementary Fig. 1a,b and Supplementary Table 1). We also curated single-cell RNA sequencing (scRNA-seq) atlases for the same malignancies, spanning 144 tumour samples (Fig. 1b, Supplementary Fig. 1a,c and Supplementary Table 2).
Initially focusing on bulk ST and MERSCOPE, we integrated scRNA-seq and ST data using CytoSPACE25. This enabled us to overcome the modest spatial resolution and gene recovery of the bulk and single-cell ST assays, respectively, while also leveraging single-cell ST data when available. The resulting multi-cancer atlas consisted of 4.6 million spatially resolved transcriptomes at cell-type resolution (Supplementary Fig. 1a–c).
We started with a supervised analysis to probe differentially expressed genes between two geographic landmarks: the tumour core and the adjacent stroma (Fig. 1c and Supplementary Fig. 2a–c). Using Visium data as a discovery cohort (n = 54 tumours) and the remaining ST data for validation (n = 66 tumours), we identified substantial pan-cancer variation in nine main TME cell types: B cells, plasma cells, CD8+ T cells, CD4+ T cells, NK cells, macrophages and monocytes, dendritic cells, endothelial cells and fibroblasts (Fig. 1d, Extended Data Fig. 1a–d, Supplementary Fig. 3 and Supplementary Tables 3–5). For instance, we readily confirmed T cell and macrophage genes with known spatial polarity, including GZMB and SPP1/TREM2 in the tumour core and GZMK and FOLR2 in the periphery, respectively6,26,27,28. We also identified genes with previously unknown regional specificity. For example, PRDX1 (encoding peroxiredoxin-1), which increases resistance to oxidative stress and enhances antitumour activity in PD-L1–CAR NK cells29, emerged as a top marker of tumour-associated NK cells. Similarly strong differences in regional expression were observed for the remaining TME cell types (Fig. 1d, Extended Data Fig. 1d, Supplementary Fig. 3 and Supplementary Tables 3–5).
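The gene-selection thresholds reported for Fig. 1d (Q < 0.05, median log2 FC across samples greater than 0.02, at most 200 genes per compartment) can be illustrated with a minimal sketch. The function name, the input layout and the ranking of genes by median log2 FC are assumptions made for illustration, and the numbers are toy values rather than data from the study; the actual procedure is described in Methods.

```python
from statistics import median


def select_regional_markers(stats, q_max=0.05, min_median_lfc=0.02, max_genes=200):
    """Pick tumour-enriched and stroma-enriched genes for one cell type.

    stats maps gene -> (q_value, [log2 FC per sample]); a positive log2 FC
    means higher expression in tumour than in adjacent stroma.
    """
    tumour, stroma = [], []
    for gene, (q_value, lfcs) in stats.items():
        if q_value >= q_max:
            continue  # fails the significance requirement
        med = median(lfcs)
        if med > min_median_lfc:
            tumour.append((med, gene))
        elif med < -min_median_lfc:
            stroma.append((-med, gene))

    def top(hits):
        # Rank by effect size (an assumption) and cap the list length.
        return [gene for _, gene in sorted(hits, reverse=True)[:max_genes]]

    return {"tumour": top(tumour), "stroma": top(stroma)}


# Toy CD8+ T cell statistics (hypothetical values).
example = {
    "GZMB": (0.001, [0.8, 0.5, 0.6]),
    "GZMK": (0.010, [-0.4, -0.7, -0.3]),
    "ACTB": (0.900, [0.01, -0.02, 0.0]),
}
markers = select_regional_markers(example)
```

Using the median across samples, rather than the mean, keeps a single outlying tumour from driving a gene into either compartment.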
Surprisingly, many transcripts also showed geographic heterogeneity independent of TME cell type or malignancy, revealing broad metabolic rewiring in the tumour core and upregulation of TNFα signalling in the periphery (Extended Data Fig. 1e–g and Supplementary Tables 6 and 7). For example, among the leading markers of tumour and adjacent stroma in the Visium data, the top two genes profiled by MERSCOPE—PKM (a glycolytic enzyme30) and FOS (a key target of TNFα31), respectively—exhibited robust patterns of regional variation. This was true at multiple scales, from individual mRNA molecules to entire tumour specimens (Extended Data Fig. 1e–g).
Thus, through integrative analysis of single-cell and ST data across ten malignancies, we identified a rich landscape of conserved spatial expression in the TME (Supplementary Tables 3–7). These results were highly consistent across platforms, including when MERSCOPE data were used directly for discovery, and are detailed in Extended Data Fig. 1a–c.
Spatial map of multicellular ecosystems
We next sought to characterize the relationship between transcriptional programs and spatial coordinates for all nine TME cell types simultaneously, both across malignancies and without the constraint of supervised analysis. However, doing so has been challenging, because existing methods either ignore spatial information6,7 or cannot effectively integrate across tumour samples and cancer types11,12,13. To address these issues, we developed Spatial EcoTyper, a machine-learning framework that generates a unique data representation in which cell type-specific gene expression profiles (GEPs) are ‘fused’ across samples into a common, spatially informed embedding (Fig. 2a, Extended Data Fig. 2, Supplementary Fig. 4 and Methods). By integrating GEPs with similar spatial coordinates while balancing contributions across multiple cell types, this approach, which draws inspiration from multi-omic data fusion32, is not only highly suitable for SE detection, but also outperformed previous methods (Extended Data Fig. 3a–e). Once defined, SEs can be profiled at scale, whether from bulk, single-cell or spatial expression data, using a specialized variant of non-negative matrix factorization33 (NMF) (Methods).
Fig. 2 | a, Overview of Spatial EcoTyper applied to a melanoma specimen, highlighting three steps: identification of spatial GEPs (sGEPs) for each cell type from a grid of spatial neighbourhoods (SNs); cross-comparison of sGEPs to define a covariance matrix for each cell type; and fusion of covariance matrices into a spatial embedding. To visualize the cell types in each SN (right), a small amount of jitter was applied; also see Extended Data Fig. 2. b, UMAPs showing spatial embeddings of five tumour specimens profiled by MERSCOPE (Supplementary Table 8). Each point denotes an individual SN coloured by distance to the tumour margin. c, Same as b, except coloured by SE. d, Heat map showing a fused spatial covariance matrix from five tumour specimens (from b), representing 41,066 individual SNs grouped into 892 SN clusters and 9 SEs. The similarity index measures SN similarity (Methods); also see Extended Data Fig. 4c. e, SEs and other features in melanoma and breast tumour specimens (MERSCOPE). Left to right: global and magnified views (1 mm²) of SEs, geographic features and cell types (coloured as in d, b and a, respectively). Scale bar, 1 mm. f, Network diagrams of SE cell states (coloured as in a) showing intra-SE associations (Supplementary Table 10). Thicker lines denote more-significant spatial colocalization, for all pairwise comparisons; see Extended Data Fig. 6c–g. g, Heat maps showing SE cell-state marker expression across ten cancer types profiled by scRNA-seq (Supplementary Table 10); SE colours are defined in d. h, Left: same as g, except showing SE consensus markers conserved across cell types and malignancies (coloured as in g; Supplementary Table 11). Right: bar plot showing enrichment of top biological processes associated with consensus markers for each SE (coloured as in d); also see Extended Data Fig. 7h.
To test this strategy, we first analysed diverse tumour specimens individually. To do so, we assembled a discovery cohort consisting of five formalin-fixed paraffin-embedded (FFPE) samples encompassing 844,315 TME cells profiled by MERSCOPE in four distinct carcinomas (breast, colon, liver and prostate) and one melanoma (Extended Data Fig. 3f and Supplementary Table 8). To focus on robust spatial trends while overcoming sparsity, we aggregated single-cell expression data by cell type into neighbourhoods of predefined radius (50 µm) (Extended Data Fig. 3g). We then applied Spatial EcoTyper to fuse transcriptional covariance across cell types and neighbourhoods and generate spatially informed embeddings.
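The aggregation and fusion steps can be sketched as follows, under stated assumptions. Cells are averaged by cell type within a fixed radius of each neighbourhood centre, neighbourhoods are compared by Pearson correlation within each cell type, and the per-cell-type matrices are combined by simple averaging. The averaging step is a stand-in chosen for brevity and is not the fusion procedure of Spatial EcoTyper (see Methods); all function names and toy values are hypothetical.

```python
import math
from collections import defaultdict


def aggregate_sgeps(cells, centres, radius=50.0):
    """Average expression per cell type within `radius` (µm) of each centre.

    cells: list of (x, y, cell_type, expression_vector).
    Returns {cell_type: {centre_index: mean expression vector}}.
    """
    sums = defaultdict(dict)
    for x, y, ctype, expr in cells:
        for i, (cx, cy) in enumerate(centres):
            if math.hypot(x - cx, y - cy) <= radius:
                total, n = sums[ctype].get(i, ([0.0] * len(expr), 0))
                sums[ctype][i] = ([t + e for t, e in zip(total, expr)], n + 1)
    return {c: {i: [t / n for t in total] for i, (total, n) in d.items()}
            for c, d in sums.items()}


def pearson(u, v):
    """Pearson correlation of two equal-length vectors (0.0 if undefined)."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    du, dv = [a - mu for a in u], [b - mv for b in v]
    den = math.sqrt(sum(a * a for a in du) * sum(b * b for b in dv))
    return sum(a * b for a, b in zip(du, dv)) / den if den else 0.0


def fuse_similarities(sgeps, n_centres):
    """Average the neighbourhood-by-neighbourhood correlation matrices
    over cell types (a simple stand-in for data fusion)."""
    fused = [[0.0] * n_centres for _ in range(n_centres)]
    counts = [[0] * n_centres for _ in range(n_centres)]
    for profiles in sgeps.values():
        for i in profiles:
            for j in profiles:
                fused[i][j] += pearson(profiles[i], profiles[j])
                counts[i][j] += 1
    return [[f / c if c else 0.0 for f, c in zip(f_row, c_row)]
            for f_row, c_row in zip(fused, counts)]


# Toy data: two neighbourhood centres, three genes (hypothetical values).
centres = [(0.0, 0.0), (200.0, 0.0)]
cells = [
    (10.0, 0.0, "T cell", [1.0, 0.0, 2.0]),
    (-10.0, 5.0, "T cell", [3.0, 0.0, 2.0]),
    (190.0, 0.0, "T cell", [0.0, 4.0, 1.0]),
    (20.0, 0.0, "Fibroblast", [5.0, 1.0, 0.0]),
    (210.0, 10.0, "Fibroblast", [4.0, 2.0, 0.0]),
]
sgeps = aggregate_sgeps(cells, centres, radius=50.0)
fused = fuse_similarities(sgeps, len(centres))
```

Averaging gives each cell type equal weight wherever it is present, which is one simple way to balance contributions across cell types.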
Strikingly, when visualized using uniform manifold approximation and projection (UMAP), each embedding organized into a spatial gradient, with individual neighbourhoods tracing a trajectory from the tumour core to the adjacent stroma (Fig. 2b). These gradients were highly significant (P < 0.0001), independent of malignancy and robust to neighbourhoods of diverse radii (Extended Data Fig. 3g,h). They were also stronger than gradients derived from individual cell types, including GEPs assembled without spatial information (Supplementary Fig. 5), and were largely independent of immunologically hot and cold regions34, implying that there is an orthogonal axis of TME biology (Extended Data Fig. 3i). Thus, multicellular programs in the TME seem to vary in a granular manner, reflecting their distance to the tumour margin.
To determine whether these trajectories also reflect shared variation across cancers, we repeated our analysis by jointly considering all samples (Extended Data Fig. 2 and Methods). Spatial EcoTyper integrated all cell types and ST samples into a common embedding containing 41,066 spatial neighbourhoods. To delineate ecotypes, we applied NMF to this embedding, revealing nine clusters with strong co-association, mutual avoidance and spatial aggregation (Extended Data Fig. 4a,b). We termed these clusters ‘spatial ecotypes’ and numbered them from SE1 to SE9 according to their average distance from the tumour margin (Fig. 2b–e, Extended Data Fig. 4c and Supplementary Table 9).
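For readers unfamiliar with NMF-based clustering, the idea can be sketched with the standard multiplicative-update algorithm applied to a small non-negative similarity matrix, followed by assignment of each neighbourhood to its dominant factor. This is a generic textbook variant, not the specialized procedure described in Methods; the rank, iteration count, random initialization and toy matrix are all assumptions.

```python
import random


def matmul(a, b):
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def transpose(a):
    return [list(r) for r in zip(*a)]


def frob_error(v, w, h):
    """Squared Frobenius distance between V and the product WH."""
    wh = matmul(w, h)
    return sum((x - y) ** 2 for rv, rw in zip(v, wh) for x, y in zip(rv, rw))


def nmf(v, rank, n_iter=200, seed=0, eps=1e-9):
    """Factor a non-negative matrix V (n x m) as W (n x rank) times H (rank x m)."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    start = frob_error(v, w, h)
    for _ in range(n_iter):
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[k][j] * num[k][j] / (den[k][j] + eps) for j in range(m)]
             for k in range(rank)]
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][k] * num[i][k] / (den[i][k] + eps) for k in range(rank)]
             for i in range(n)]
    return w, h, start, frob_error(v, w, h)


# Toy similarity matrix: six neighbourhoods forming two groups of three.
sim = [[1.0 if (i < 3) == (j < 3) else 0.0 for j in range(6)] for i in range(6)]
w, h, err_start, err_end = nmf(sim, rank=2)

# Assign each neighbourhood to the factor with the largest loading.
labels = [max(range(2), key=lambda k: row[k]) for row in w]
```

The multiplicative updates never increase the reconstruction error, so comparing `err_start` with `err_end` gives a quick sanity check on the fit.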
Notably, all nine SEs were robustly identifiable, whether by perturbation analysis of key parameters, methodological choices or input data (Extended Data Fig. 4d–l). Similar clusters were also resolved from Spatial EcoTyper embeddings by archetypal analysis, an approach that represents high-dimensional input data as a mixture of non-overlapping ‘pure’ components35 (Extended Data Fig. 5). This indicates that nine discrete SEs underpin spatial and transcriptional variation in tumour microenvironments.
Cross-platform validation of SEs
To authenticate these results, we next assessed SE recovery in a validation cohort consisting of 90 held-out ST samples including melanoma, nine types of carcinoma and six ST platforms (Extended Data Fig. 6a,b). As part of this process, we defined SE-enriched cell states (n = 38) using a supervised variant of NMF applied to the discovery cohort (Fig. 2f, Extended Data Fig. 6a, Supplementary Table 10 and Methods). All cell states were reproducibly identified by cross-validation analysis (Extended Data Fig. 6a). We then predicted SE-associated cell states in all datasets using a previously established approach6.
To evaluate SE generalizability and robustness (Extended Data Fig. 6b), we first asked whether cell states belonging to the same SE are more spatially colocalized than expected by random chance. Regardless of whether we analysed MERSCOPE or Xenium and Visium HD assays, all SEs showed significant patterns of cell-state colocalization (Extended Data Fig. 6c–g). We next evaluated spatial coherence (Moran’s I) and distance to the tumour margin. Across six ST platforms, SEs aligned with expectation (Extended Data Fig. 6h–j). Finally, we asked whether SEs are discoverable in held-out data, both by de novo identification of ecotypes from a different ST platform (Xenium Prime) and by archetypal analysis of Spatial EcoTyper embeddings. On average, all nine original SEs were resolved by these approaches (Extended Data Fig. 6k–o).
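As a reminder of how spatial coherence is quantified, Moran's I for values x_i at N locations with spatial weights w_ij is I = (N / Σ_ij w_ij) × Σ_ij w_ij (x_i − x̄)(x_j − x̄) / Σ_i (x_i − x̄)². The sketch below uses binary weights (1 for distinct locations within a chosen radius, 0 otherwise); the weighting scheme and the toy coordinates are assumptions, and the weights used in the study are described in Methods.

```python
import math


def morans_i(coords, values, radius):
    """Moran's I with binary weights: w_ij = 1 if i != j and the two
    locations lie within `radius` of each other, else 0."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    cross, weight_sum = 0.0, 0.0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            dist = math.hypot(coords[i][0] - coords[j][0],
                              coords[i][1] - coords[j][1])
            if dist <= radius:
                cross += dev[i] * dev[j]
                weight_sum += 1.0
    return (n / weight_sum) * (cross / sum(d * d for d in dev))


# Four locations on a line; only adjacent locations count as neighbours.
coords = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
clustered = morans_i(coords, [1.0, 1.0, 0.0, 0.0], radius=1.5)    # 1/3
alternating = morans_i(coords, [1.0, 0.0, 1.0, 0.0], radius=1.5)  # -1
```

Positive values indicate that similar SE levels sit next to each other, whereas values near zero or below indicate no spatial aggregation or a dispersed pattern. The function assumes that at least one pair of neighbours exists and that the values are not all identical.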
Given these results, we investigated whether SEs are recoverable in scRNA-seq data using reference-guided cell labelling. Indeed, by predicting cell-state frequencies and their correlations in 144 tumours across ten malignancies, all nine SEs were significantly and specifically detectable (Extended Data Fig. 7a,b). Most SEs were also recovered in 64 brain metastases originating from melanoma and five types of carcinoma36, showing conservation beyond localized disease (Extended Data Fig. 7c). Thus, SEs exhibit conserved biogeographic features and robust cell-state co-association relationships.
Biological characteristics of SEs
To gain insight into SE-specific cellular programs, we next analysed SE-assigned cells in scRNA-seq tumour atlases (Methods). Marker genes for each SE cell state were reproducibly defined in held-out data (Fig. 2g and Extended Data Fig. 7d,e). Moreover, a subset of these genes distinguished SEs independently of TME cell type or malignancy (Fig. 2h, Extended Data Fig. 7f, Supplementary Fig. 6 and Supplementary Table 11). Such genes, termed consensus markers, were generalizable to single-cell-scale ST data and associated with distinct functional programs, establishing them as robust molecular hallmarks of SE biology (Fig. 2h and Extended Data Fig. 7g,h).
Combining these data with cellular and spatial features, we found that SE1, which is the ecotype most enriched in adjacent stroma, is associated with early response genes including FOS and EGR1. These genes, previously attributed to adjacent normal tissue using bulk transcriptomics37, are reproducibly expressed across all cell states in SE1, including naive and central memory T cells (GZMK+, IL7R+ and CXCR4+)20, FOLR2+ macrophages (SEPP1+ and FOLR2+)26, cDC2 cells (CLEC10A+ and FCER1A+)21, adventitial fibroblasts (C3+)22 and activated capillary cells (EGR1+ endothelial cells)18. Conversely, SE9, which is the ecotype most enriched in the tumour core, consists of two cell states associated with endothelial cell proliferation: angiogenesis-promoting TREM2+ macrophages38 and tip cells (INSR+ endothelium)18 (Fig. 2g and Supplementary Table 10).
Unlike SE1 and SE9, most SEs were found to preferentially localize within 250 µm of the tumour–stroma margin (Extended Data Fig. 4c and Supplementary Table 9). These include six SEs composed of unique plasma cell, myeloid and/or stromal cell states (Fig. 2f and Supplementary Table 10). For example, we identified a multi-phenotypic hub (SE3) comprising IGLL5+ plasma cells39, C1QB+ macrophages21, CD1E+ dendritic cells21, CXCL12+ cancer-associated fibroblasts40 and CD36+ endothelial cells. We also identified bi-phenotypic hubs comprising distinct myeloid and/or stromal cell states: SE4, associated with wound healing, is defined by MYH11+ myofibroblasts41 and hypoxia-associated FBLN5+ endothelial cells; and SE5, associated with immunosuppression, is composed of FAP+ cancer-associated fibroblasts42 and APOE+ M2-like macrophages21,43 (Fig. 2g and Supplementary Table 10). These and other margin-enriched ecotypes revealed extensive spatial organization along this phenotypic transition.
We also uncovered two proinflammatory SEs with elevated interferon signalling but distinct regionalization. SE7, which is largely restricted to the tumour margin, encompasses cell states from a total of eight lymphoid, myeloid and stromal lineages, each with elevated STAT1 expression and consensus genes associated with antigen processing and presentation. SE8, which is generally localized to the tumour core, consists of GZMB+ T cells, CCL8+ macrophages, CXCL8+ cancer-associated fibroblasts and CEACAM1+ endothelial cells, with consensus genes enriched in elevated metabolism (Fig. 2f,g and Supplementary Tables 10 and 11).
Beyond the TME, we found that spatially adjacent non-TME cells, including malignant subsets, express consensus markers for a subset of ecotypes (Extended Data Fig. 7i). This indicates that SEs can participate in larger multicellular assemblies with shared spatial programs.
Finally, we compared SEs with carcinoma ecotypes (CEs) identified in previous work by RNA-seq deconvolution6. We found that CEs with tighter spatial-aggregation patterns are more likely to have at least two distinct SE counterparts (Extended Data Fig. 7j). For example, CE9, which previously outperformed competing measures for predicting benefit from ICIs6, is likely to be a composite of SE7 (margin-associated) and SE8 (intratumoural). Thus, by using Spatial EcoTyper, we discovered numerous local microenvironments that were previously undetectable without high-resolution spatial analysis.
Digital cytometry of SEs
To characterize the potential clinical relevance of SEs, we next devised a gene-expression deconvolution strategy to determine SE composition at scale. Using scRNA-seq data from 10 cancer types, we leveraged 1,000 pseudo-bulk tumours to train an NMF model for quantifying SE content (Fig. 3a and Methods). After cross-validation, we observed generally high accuracy for deconvolving SE levels, both in reconstituted tumours from held-out cancers (Extended Data Fig. 8a–c) and from real tumours with scRNA-seq ‘ground truth’ (Fig. 3a, Extended Data Fig. 8d–f, Supplementary Table 12 and Supplementary Note 1). By contrast, previous state-of-the-art deconvolution methods yielded lower correlations in synthetic tumours and failed to recover several SEs from real data (Extended Data Fig. 8g).
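The principle of benchmarking deconvolution on pseudo-bulk mixtures can be sketched as follows. A pseudo-bulk profile is built as a known mixture of SE signature profiles, and the fractions are then recovered by non-negative least squares using multiplicative updates, which is equivalent to NMF with the basis matrix held fixed. The signature matrix, the gene count and the solver are hypothetical simplifications and do not reproduce the trained model described in Methods.

```python
def mix(signatures, fractions):
    """Pseudo-bulk profile: gene-wise weighted sum of SE signature columns."""
    return [sum(s * f for s, f in zip(row, fractions)) for row in signatures]


def deconvolve(signatures, bulk, n_iter=200, eps=1e-12):
    """Estimate non-negative SE fractions f such that S f approximates bulk.

    signatures: genes x SEs matrix S (list of rows); bulk: one value per gene.
    """
    k = len(signatures[0])
    f = [1.0 / k] * k
    # Precompute S^T b and S^T S once, since S and b are fixed.
    stb = [sum(row[a] * b for row, b in zip(signatures, bulk)) for a in range(k)]
    sts = [[sum(row[a] * row[c] for row in signatures) for c in range(k)]
           for a in range(k)]
    for _ in range(n_iter):
        den = [sum(sts[a][c] * f[c] for c in range(k)) for a in range(k)]
        f = [f[a] * stb[a] / (den[a] + eps) for a in range(k)]
    total = sum(f)
    return [x / total for x in f]


# Toy signature matrix: four genes (rows) by two SEs (columns).
signatures = [[10.0, 1.0], [8.0, 0.0], [1.0, 9.0], [0.0, 7.0]]
truth = [0.7, 0.3]
pseudo_bulk = mix(signatures, truth)
estimate = deconvolve(signatures, pseudo_bulk)
```

Because the ground-truth fractions are known by construction, the estimate can be compared with them directly, which is the logic behind using reconstituted tumours for validation.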
Fig. 3 | a, Technical and clinical assessment of the Spatial EcoTyper deconvolution model. ED, Extended Data. b, Correlations between SE levels deconvolved from paired bulk RNA-seq and Visium profiles of tumours from 42 patients spanning five cancer types. Significance was determined by two-sided t-tests. c, Scatter plots showing SE4 and SE7 correlations from b (also see Extended Data Fig. 8i). Pearson correlations and linear regression lines with 95% confidence bands are shown. P-values were determined as in b. Representative samples with high, medium and low SE abundances are shown. d, Box plots summarizing validation analyses in Extended Data Fig. 8a–f,h,i, showing medians, first and third quartiles, and minimum and maximum values within 1.5 × the interquartile range of the box limits. Significance was assessed by two-sided Wilcoxon signed-rank tests. NS, not significant. e, Left: MERSCOPE profile of a melanoma specimen (n = 327,822 cells) with an adjacent section captured by Visium (box). Right: spatial maps of selected SEs in the overlapping region. Scale bar, 1 mm. f, Quantification of pairwise concordance for all SEs, related to e (Methods). g, Clinical characteristics of SE levels in 7,076 tumour RNA-seq profiles spanning 17 TCGA cancer types. Prognostic associations between higher SE abundances and overall survival are shown pooled across cancer types (top) and per cancer type (bottom), with colours denoting favourable (blue) and adverse (red) relationships. h, Comparison of 41 features for predicting ICI response from 15 RNA-seq datasets (Supplementary Table 15), with features ranked by those most predictive of benefit (top) and resistance (bottom). Published features in the comparison include: T cell inflamed signature, ref. 55; CE9, ref. 6; IFNγ signature, ref. 55; CE10, ref. 6; M1 macrophage, ref. 56; TLS signature, ref. 57; and TIDE, ref. 58. Parenthetical numbers indicate history of ipilimumab (1, yes; 2, no) or chemotherapy (3, yes; 4, no). i, Forest plot showing associations between TMB, CD274 (encoding PD-L1) expression, SE7, SE8 and SE4 levels, and overall survival, across 465 evaluable tumours from h. Data are presented as hazard ratios ± 95% confidence intervals. Two-sided P-values were determined as described in Methods. Icons in panel a were created with BioRender; Steen, C. https://BioRender.com/dxvufyg (2026).
To extend this analysis to intact specimens, we analysed paired bulk RNA-seq and Visium ST data of tumour samples from 42 patients spanning breast, colorectal, lung, ovarian and pancreatic cancers from the Human Tumour Atlas Network (HTAN)44 (Fig. 3a). To compute SE abundances from each Visium sample, we aggregated deconvolution results from individual spatial spots (Methods). We found that all nine SEs were significantly and specifically correlated between platforms (Fig. 3b,c and Extended Data Fig. 8h,i). Furthermore, performance was generally independent of cancer type or disease state (Fig. 3b) and was consistent across diverse validation scenarios (Fig. 3d), emphasizing robustness.
To further evaluate SE recovery from bulk ST data, we performed spatial RNA sequencing with Visium and multiplexed RNA imaging with MERSCOPE on adjacent melanoma sections (Fig. 3a). Across 815 overlapping spatial spots, SE microarchitectures were significantly correlated between assays (Fig. 3e,f). Similar results were obtained from publicly available data of paired Visium and Visium HD profiles (Fig. 3a and Extended Data Fig. 9a–c).
These data demonstrate the promise of Spatial EcoTyper for high-resolution, scalable profiling of TME spatial ecosystems from bulk expression data.
Clinical significance of SEs
We next applied our approach to 7,076 clinically annotated bulk RNA-seq profiles of melanomas and 16 types of carcinoma from The Cancer Genome Atlas (TCGA)45 (Supplementary Table 13). Two-thirds of evaluable SEs (six of nine) were significantly prognostic for overall survival when assessed across cancers after adjustments for age and sex (Fig. 3g and Supplementary Table 14). Among them, SE5, which is a margin-enriched ecotype associated with elevated epithelial–mesenchymal transition and TGFβ1 production, emerged as the leading determinant of shorter survival (Fig. 3g and Extended Data Fig. 9d). Conversely, SE7 and SE8, which are distinguished by location and consensus programs enriched in antigen processing and metabolic activity, respectively, were significantly predictive of longer survival (Fig. 3g). Although most cancer types showed consistent survival patterns, we observed reciprocal associations for prostate, oesophageal and pancreatic cancers. These inversions were consistent with factors that hinder antitumour immunity, including lower expression levels of major histocompatibility class I (MHC-I) and MHC-II in prostate and oesophageal carcinomas46,47, and a dense desmoplastic stroma in pancreatic cancer48 (Extended Data Fig. 9e).
Given previous data linking CE9 to immunotherapy response6, we next wondered whether SEs could enhance ICI outcome stratification. To this end, we assembled public RNA-seq profiles of 1,249 pretreatment tumours spanning melanoma and three carcinomas, all with annotated outcomes to ICI monotherapy (anti-PD-[L]1) or combination therapy (anti-PD-1 and anti-CTLA-4) (Supplementary Table 15). We then enumerated SEs, CEs and 22 additional potential correlates of ICI response (Supplementary Table 16). CE9 was again a strong correlate of ICI benefit, corroborating previous findings6 in a greatly expanded meta-analysis (Fig. 3h). Nevertheless, SE8, a daughter ecotype of CE9 with strong intratumoural localization, outperformed it (Fig. 3h and Extended Data Fig. 7j). SE7, another daughter of CE9 with localization to the margin, showed nearly comparable performance (Fig. 3h and Extended Data Fig. 7j). It also surpassed SE8 in predicting ICI benefit when applied to melanoma alone (Supplementary Table 16). Moreover, SE4, a myofibroblast and hypoxia-associated endothelial cell ecotype associated with wound healing, emerged as the top correlate of ICI resistance (Fig. 3h).
To extend this analysis to established ICI biomarkers, we examined 465 patients across all datasets with available tumour mutational burden (TMB) data (non-small cell lung cancer, bladder cancer and melanoma), using CD274 expression as a surrogate for PD-L1 level. Across datasets and patients, SE7 and SE8 outperformed TMB and CD274 in predicting overall survival in multivariable models, with all three ecotypes, including SE4, showing superior performance in univariate models (Fig. 3i and Extended Data Fig. 9f).
Thus, by coupling transcriptional and spatial variation into cohesive cellular assemblies, SEs offer the potential for improved prediction of clinical outcome.
Liquid biopsy of SEs
To fully exploit SEs, it will be important to address key barriers that limit their broad clinical applicability. These include sampling bias in metastatic disease, tumour geographic heterogeneity14,15 and the impracticality of acquiring serial biopsies for longitudinal analysis. Liquid biopsies represent a promising means of overcoming such challenges16,17 but have not previously been applied to detect multicellular ecosystems. Given the promise of cfDNA methylation profiling for deciphering cell-of-origin contributions to peripheral blood plasma17, we hypothesized that a cfDNA methylation approach might allow SE levels to be quantified in a more systemic, unbiased and accessible manner.
To explore this hypothesis, we designed and trained a deep-learning framework, termed Liquid EcoTyper, to infer SE levels from CpG methylation data (Fig. 4a and Methods). Our approach is based on a binary neural network that learns discriminatory CpG sets (akin to gene sets) to quantify SE levels, non-SE tumour content and healthy cfDNA background (Fig. 4a and Methods). By learning CpG sets through a binary network, our design enforces regularization, facilitates resistance to dropout and sparsity of individual CpGs, and allows all CpG signatures to be extracted seamlessly without feature-inference strategies, thereby providing complete transparency.
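The forward pass of such a design can be sketched conceptually. Real-valued latent weights are thresholded into a 0/1 membership mask, the methylation values of the CpGs in each set are averaged (skipping CpGs with no coverage), and the set scores are transformed into relative levels. The thresholding rule, the softmax output transform and every name and value below are assumptions for illustration; the sketch omits training and is not the Liquid EcoTyper architecture itself (see Methods).

```python
import math


def binarize(latent):
    """Map real-valued latent weights to a 0/1 CpG-membership mask."""
    return [[1 if weight > 0 else 0 for weight in row] for row in latent]


def set_scores(methylation, mask):
    """Mean methylation over the observed CpGs in each set.

    A value of None marks a CpG with no coverage (dropout); it is skipped.
    """
    scores = []
    for row in mask:
        vals = [m for m, keep in zip(methylation, row)
                if keep and m is not None]
        scores.append(sum(vals) / len(vals) if vals else 0.0)
    return scores


def relative_levels(scores, temperature=1.0):
    """Softmax transform to relative levels (an assumed output transform)."""
    exps = [math.exp(s / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Two hypothetical CpG sets defined over four CpGs.
latent = [[0.4, 1.2, -0.3, -0.8],
          [-0.5, -0.1, 0.9, 0.2]]
mask = binarize(latent)                      # [[1, 1, 0, 0], [0, 0, 1, 1]]

sample = [0.9, 0.7, 0.2, 0.4]                # methylation fractions
scores = set_scores(sample, mask)
levels = relative_levels(scores)

# The same sample with one CpG lost to dropout still yields a score.
scores_dropout = set_scores([0.9, None, 0.2, 0.4], mask)
```

Because each set score is an average over member CpGs, losing an individual CpG changes the score only modestly, and the learnt mask can be read off directly as a CpG signature.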
Fig. 4 | a–c, The development of Liquid EcoTyper. a, CpG methylation profiles are processed through a binary network module to identify informative CpG sets, which are averaged and transformed into a continuous output yielding relative SE levels (Methods). b, Training and testing of Liquid EcoTyper using simulated cfDNA combining methylation profiles of melanomas with healthy cfDNA. The former includes paired RNA-seq, from which ground-truth SE levels were determined. c, Box plot of Spearman correlations between predicted and expected SE levels in test data. d, Inventory of validation data (Supplementary Table 17). WashU, Washington University in St. Louis. e, Box plots of Spearman correlations between SE levels in cfDNA and paired tumour EM-seq (n = 20 patients) and Visium (n = 15 patients) (Supplementary Table 18). f, Scatter plots of SE levels in paired cfDNA and Visium samples, ranked and normalized to 0–1 in each compartment. g, As for f, except comparing the differences between SE7 and SE4. h, Top: rank-normalized SE7 and SE4 levels in cfDNA, ordered by the difference between them. Centre: SE7 and SE4 representation in tumours with the largest differences in cfDNA (for visualization details, see Methods). Bottom: averaged SE levels shown in the spatial maps above. i, Box plots comparing correlations of SE levels from EM-seq profiles of all plasma–tumour pairs, plasma–tumour pairs with PBMC data, and tumour–PBMC and plasma–PBMC pairs. Two-sided one-sample t-tests were applied to assess differences from zero (bottom). *P < 0.05, ****P < 0.0001. Boxes in c, e and i denote medians, first and third quartiles, and minimum and maximum values within 1.5 × the interquartile range of the box limits. In c, f and g, P-values were determined by two-sided t-tests. In f and g, linear regression lines with 95% confidence intervals are shown for display. In e and i, two-sided Wilcoxon signed-rank tests were used for group comparisons. NS, not significant. Illustrations in a and d were created with BioRender; Steen, C. https://BioRender.com/eogcxwk (2026).
To implement Liquid EcoTyper, we focused on metastatic melanoma, a cancer type for which ICI therapy is the standard-of-care and for which extensive multimodal genomic data, including datasets with known ICI outcomes, are publicly available. We compiled 461 melanomas from TCGA with paired 450K methylation array data and bulk RNA-seq profiles45, the latter of which served as ground-truth data for SE composition (Fig. 4b). To boost specificity, we generated NEBNext enzymatic methyl-seq (EM-seq) profiles of cfDNA collected from 23 healthy controls49. To train the model, we then simulated cfDNA from patients with melanoma by combining CpG methylation profiles from tumour genomic DNA and healthy cfDNA in defined proportions (Fig. 4b and Extended Data Fig. 10a). By principal component analysis (PCA), these data co-embedded with real cfDNA methylation profiles from patients with melanoma according to tumour content, and clustered separately from healthy cfDNA, supporting their utility for model development (Extended Data Fig. 10b).
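The in silico mixing step can be pictured as a weighted combination of two methylation profiles. The sketch below assumes per-CpG methylation fractions on a 0–1 scale, measured at the same CpGs in both profiles, and a single tumour fraction; the procedure actually used (Extended Data Fig. 10a and Methods) may differ, for example by sampling reads. The function name `mix_methylation` and the values are hypothetical.

```python
def mix_methylation(tumour_beta, healthy_beta, tumour_fraction):
    """Blend two per-CpG methylation profiles at a given tumour fraction."""
    if not 0.0 <= tumour_fraction <= 1.0:
        raise ValueError("tumour_fraction must lie in [0, 1]")
    if len(tumour_beta) != len(healthy_beta):
        raise ValueError("profiles must cover the same CpGs")
    # Each CpG is a convex combination of the tumour and healthy values.
    return [
        tumour_fraction * t + (1.0 - tumour_fraction) * h
        for t, h in zip(tumour_beta, healthy_beta)
    ]


# Toy profiles over three CpGs (hypothetical values), mixed at 25% tumour content.
tumour = [0.90, 0.10, 0.50]
healthy = [0.10, 0.10, 0.90]
simulated = mix_methylation(tumour, healthy, 0.25)
```

Varying `tumour_fraction` over a grid would yield a family of simulated profiles spanning low to high tumour content.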
By evaluating Liquid EcoTyper on 115 simulated melanoma cfDNA profiles held out from training, we observed significant correlations for all evaluated SEs using CpG methylation signatures, both for individual SEs (P < 10^-6; Fig. 4c) and pairwise cross-correlations (P < 10^-16; Extended Data Fig. 10c). Our approach also readily extended to 13 types of carcinoma in proof-of-principle analyses, underscoring its generalizability (Extended Data Fig. 10d).
To further characterize Liquid EcoTyper trained on melanoma, we examined model consistency, specificity and biological interpretability. First, we confirmed that Liquid EcoTyper successfully learns and preserves the ground-truth association of each CpG set for each SE (Extended Data Fig. 10e). Next, we ablated the top SE-associated CpG sets, showing total, or near-total, performance loss specific to each targeted SE (Extended Data Fig. 10f). Finally, we interrogated learnt CpG sets for previously defined marker genes of each SE (Extended Data Fig. 10g). By analysing CpG sites in promoter regions and gene bodies, we found that consensus genes (Fig. 2h) were specifically enriched in matching SEs for all but one ecotype (Extended Data Fig. 10h,i and Supplementary Note 2). Thus, Liquid EcoTyper can learn biologically grounded epigenomic profiles of spatial cellular ecosystems.
To test performance on real plasma while accounting for key confounding variables, we next collected tumour specimens and plasma cfDNA isolated from 23 patients with melanoma from two institutions, Yale University and Washington University in St. Louis (Fig. 4d and Supplementary Table 17). The cfDNA was profiled by whole-genome EM-seq, and matched tumours were subjected to Visium ST (n = 15) and/or EM-seq (n = 20) (Fig. 4d). Peripheral blood mononuclear cell (PBMC) DNA was also extracted from seven of these patients for EM-seq (Fig. 4d and Supplementary Table 17). To emulate clinical conditions, we limited our analyses to clinically practical cfDNA mass input amounts (up to a maximum of 10 ng). We then applied Spatial EcoTyper to ST data and Liquid EcoTyper to all methylomes.
Strikingly, the SE levels determined by our approach were well correlated between plasma and tumour compartments, a result that was consistent with simulation data and unaffected by the number of cell states per ecotype or key technical parameters (Fig. 4e,f, Extended Data Fig. 11a–f and Supplementary Fig. 7). Indeed, we observed significant correlations for all but one ecotype (SE2; P = 0.067) profiled by ST and plasma EM-seq (Fig. 4f), and for all but one ecotype (SE3; P = 0.058) profiled by tumour and plasma EM-seq (Extended Data Fig. 11b and Supplementary Table 18). Although these exceptions were consistent with modest inter-assay variation (Supplementary Note 2), there was no significant difference between assays when evaluated across ecotypes (Fig. 4e). Further demonstrating cross-compartment concordance, differences in cfDNA abundance between two SEs with significant but reciprocal associations with ICI response in melanoma, SE7 and SE4 (Supplementary Table 16), were reflective of their corresponding levels by ST (Fig. 4g,h). Emphasizing specificity, correlations between inferred SE levels in PBMCs and tumours from the same patients were substantially lower and not significantly different from 0 (Fig. 4i). The same was true for inferred SE levels in PBMCs versus matched plasma samples (Fig. 4i). This finding indicates that the plasma-derived SE signal is largely specific to metastatic melanomas and is not simply an artefact of DNA shed from circulating leukocytes.
Liquid ecotypes forecast ICI response
We next explored the clinical potential of liquid SE profiling in 78 patients with metastatic melanoma treated with ICI monotherapy (anti-PD-1 or anti-CTLA-4; n = 35 patients) or combination therapy (anti-CTLA-4 and anti-PD-1; n = 43) (Supplementary Tables 19 and 20). To this end, we generated whole-genome EM-seq profiles of clinically obtainable quantities of pretreatment plasma cfDNA from all patients (Fig. 5a and Supplementary Table 21). We also performed targeted sequencing of pretreatment circulating tumour DNA (ctDNA) for 60 patients and TMB profiling for 38 patients as a comparator50 (Fig. 5a and Supplementary Tables 22–26).
a, Heat map of patients with melanoma profiled by whole-genome EM-seq in this study (n = 78 patients) showing, from top to bottom, patient clinical characteristics, pretreatment SE7, SE8 and SE4 levels determined by Liquid EcoTyper, and pretreatment ctDNA levels determined by ultrasensitive targeted sequencing50 (n = 60 evaluable patients) (Supplementary Tables 19 and 26). N/A, not available; NDB, no durable benefit; SNV AF, single nucleotide variant allele frequency. b, Scatter plot showing ICI response associations, expressed as z-scores, between SE levels inferred from whole-genome EM-seq data of pretreatment plasma cfDNA from patients with advanced melanoma (n = 78 patients) (Supplementary Table 27), and bulk RNA-seq data of pretreatment tumours from patients with advanced melanoma (n = 366 patients) (Fig. 3h and Supplementary Table 16). Positive and negative z-scores indicate that higher SE levels are associated with resistance and response to ICIs, respectively. Concordance was determined by Pearson correlation with significance determined by a two-sided t-test. A linear regression line with 95% confidence band is shown. c, Box plots showing inferred SE7 levels in pretreatment plasma stratified by ICI response and shown for distinct ICI therapies. Significance was determined using a two-sided Wilcoxon rank-sum test. d, Kaplan–Meier plots showing differences in progression-free survival and overall survival of 78 patients with melanoma dichotomized into high and low groups based on the median of inferred SE7 levels in pretreatment plasma (Methods). Significance was determined by a two-sided log-rank test; 95% HR confidence intervals are shown in brackets. e, Same as c, except for SE4. f, Same as d, except for SE4. Boxes in c and e denote medians, first and third quartiles, and minimum and maximum values within 1.5 × the interquartile range of the box limits, respectively.
When evaluated across patients, ICI response associations for SEs were nearly perfectly correlated between plasma-derived and bulk tumour RNA-seq-derived measurements (Fig. 5b and Supplementary Tables 16 and 27). This consistency was remarkable given that the data were generated from disparate cohorts, bodily compartments and modalities. In particular, elevated SE7 and SE8 levels determined in pretreatment plasma were strongly associated with future durable clinical benefit, longer progression-free survival (PFS) (P < 0.001 and hazard ratio (HR) < 0.38 for both) and longer overall survival (OS) (P < 0.001 and HR < 0.38 for both), whereas higher SE4 levels forecasted ICI resistance, shorter PFS (P < 0.001 and HR = 2.92) and shorter OS (P = 0.006 and HR = 2.29) (Fig. 5a–f, Extended Data Fig. 12a–d and Supplementary Tables 27 and 28). These relationships were not only consistent with our previous findings from RNA-seq data, but were also robust across various ICI therapy types, melanoma subtypes, sex and age (Extended Data Fig. 12e and Supplementary Table 28). Moreover, our findings were maintained in an independent cohort of ten patients with melanoma from another institution (Extended Data Fig. 12f and Supplementary Table 20), with consistent relationships between model-derived CpG signatures and plasma-derived SE levels across cohorts (Supplementary Fig. 8).
In contrast to SEs, higher pretreatment ctDNA levels were only modestly associated with ICI resistance and shorter OS, in line with previous studies51,52,53 (Extended Data Fig. 12g,h). Moreover, baseline levels of ctDNA were not significant in multivariable survival models incorporating plasma-derived levels of SE7, SE8 or SE4 (Extended Data Fig. 12h and Supplementary Table 28).
We also explored tissue-based TMB and PD-L1 levels as established biological benchmarks to assess whether liquid SEs capture additional, non-redundant biology (Extended Data Fig. 12i). In patients with evaluable TMB or PD-L1 levels, all three liquid SEs (SE7, SE8 and SE4) were more significantly associated with OS regardless of multivariable adjustment (Extended Data Fig. 12i and Supplementary Table 28). These results were supported by time-dependent area under the receiver operating characteristic curve (AUC(t)) analyses (Supplementary Fig. 9). Coupled with complementary tissue-based findings across 465 patients and three cancer types (Fig. 3i), these data further demonstrate that SEs capture clinically relevant biology beyond approved ICI biomarkers.
Thus, liquid SE profiling has the potential to access the TME non-invasively, infer its spatial cellular architecture and outperform existing methods for early ICI response assessment.
Discussion
In this study, we have introduced two machine-learning strategies (Spatial EcoTyper and Liquid EcoTyper) to systematically identify and profile multicellular hubs, termed spatial ecotypes, that recur across spatial domains and tissue specimens. With Spatial EcoTyper, we discovered and validated nine SEs in human solid malignancies, each with distinct localization patterns, cell-state assemblies and clinical-outcome associations. With Liquid EcoTyper, we transferred our findings to plasma-derived circulating DNA. Collectively, these data open the door to new analytical possibilities.
For example, in an exploratory analysis of nearly 100 patients with advanced melanoma, we demonstrated that SE signatures—and by extension, TME features—are discernible from cfDNA and are strongly associated with ICI response. In the future, these findings could facilitate more precise, individualized and accessible read-outs of the TME, with implications for more effective diagnostics and management of patients with cancer. To realize these possibilities, it will be important to prospectively validate our results in large multi-institutional cohorts, examine their generalizability to other cancer therapies (such as engineered immune cells and personalized vaccines) and determine the interplay between specific therapeutic interventions and longitudinal SE dynamics.
Our approach can also provide insights into tissue organization that are not readily obtainable by other methods. For example, we identified nearly 40 spatially dependent cell states with pan-cancer representation, including 14 non-immune stromal phenotypes that segregate to unique spatial ecosystems (Fig. 2f). Given that SEs consist of cell-type-specific transcriptional states, they would be difficult to identify by methods that rely on predefined cell phenotypes, small protein panels or known histological features1,5,10, or that lack a mechanism to reliably harmonize ST data across samples and subjects into conserved multicellular niches11,12,13. Moreover, by combining high-resolution SE discovery with massively scalable SE recovery, our approach enables well-powered statistical analyses. We expect that future studies will define more SEs, including some that are enriched in or are exclusive to specific malignancies, histological subtypes, treatment effects or disease stages, and that incorporate cell types beyond those analysed in this work.
Despite the promise of our approach, there are several limitations. First, current ST platforms have intrinsic trade-offs between spatial resolution and gene recovery54. To comprehensively survey SE taxonomies, larger and more diverse single-cell ST cohorts with greater gene and cell-type recovery will be needed. Second, our approach can effectively identify spatially dependent cell states and ecotypes, but the biological insights gleaned from these data require experimental confirmation. Third, although we demonstrate the feasibility of liquid SE profiling in melanoma, future studies are needed to establish the analytical sensitivity, specificity and extensibility of this strategy, including in the presence of potential background signals (such as treatment effects), and further investigate its application to other cancer types.
In summary, SEs are fundamental units of tissue biology in multicellular life. We anticipate that the analytical tools described in this work will have immediate utility for decoding SEs in health and disease, with implications for new hypotheses, improved therapeutic strategies and diagnostics that bridge the gap between surgical and liquid biopsy of solid-tissue microenvironments.
Methods
Human subjects
All human samples included in this study were collected with informed consent for research use and received approval from the Institutional Review Boards of Yale University School of Medicine and Washington University School of Medicine, in accordance with the principles of the Declaration of Helsinki (2013). These samples, obtained from a total of 123 human subjects, were divided into four cohorts.
Cohort 1
Intact and dissociated tumour samples were collected from seven patients (four with colon cancer and three with melanoma) at the time of surgery. Each sample underwent bulk RNA sequencing and the dissociated tumour samples also underwent scRNA-seq.
Cohort 2
Matched tumour and plasma cfDNA samples were collected from 23 patients with metastatic melanoma, with matched PBMCs also collected for seven. Tumour samples for each patient were profiled by ST and/or whole-genome EM-seq, depending on availability (Visium, Visium HD and/or EM-seq). A further 23 plasma cfDNA samples were collected from healthy individuals. All PBMC and plasma cfDNA samples were profiled by whole-genome EM-seq. Matched melanoma samples are graphically depicted in Fig. 4d and a full inventory is provided in Supplementary Table 17.
Cohort 3
Plasma samples were collected from 78 patients with melanoma treated at Yale Cancer Center, including seven from cohort 2, who received ICI monotherapy (30 received anti-PD-1 and five received anti-CTLA-4) or combination therapy (43 received anti-PD-1 and anti-CTLA-4). Samples were collected before treatment initiation (before or on the first day of ICI cycle 1) and underwent whole-genome EM-seq. ICI response was classified as either durable clinical benefit or no durable benefit by a board-certified medical oncologist, reflecting each patient’s disease response six months after ICI initiation. Progression-free survival was determined from the start of ICI treatment.
Cohort 4
Plasma samples were collected from ten patients with melanoma treated at Siteman Cancer Center, including eight from cohort 2, who received immune checkpoint inhibitor (ICI) monotherapy (seven received anti-PD-1) or combination therapy (three received anti-PD-1 and anti-CTLA-4). Samples were collected before or during treatment and underwent whole-genome EM-seq. ICI response was classified as described for cohort 3.
All clinical features, including age and sex, were documented using electronic medical records from Siteman Cancer Center (cohorts 1, 2 and 4) and Yale Cancer Center (cohorts 2 and 3). De-identified clinical characteristics are provided in Supplementary Tables 12, 17, 19 and 20 for the four cohorts. Details of sample processing and sequencing are provided in the Supplementary Methods.
Data collection and processing
scRNA-seq
Single-cell RNA-seq atlases of carcinomas and melanomas, either generated in this work or obtained from published studies as preprocessed data4,27,36,59,60,61,62,63,64,65,66 (Supplementary Fig. 1a and Supplementary Table 2), were annotated for B cells, plasma cells, CD4+ T cells, CD8+ T cells, NK cells, macrophages (including monocytes), dendritic cells, fibroblasts, endothelial cells, epithelial or melanoma cells and other (unclassified) cell types. For publicly available datasets with author-supplied annotations (breast cancer, colon cancer, liver cancer, squamous cell carcinoma, melanoma and brain metastasis), annotations were mapped to the above cell-type labels. Cohort 1 scRNA-seq data, as well as publicly available data without cell-type annotations (bladder cancer, lung cancer, ovarian cancer, prostate cancer and pancreatic cancer), were analysed using Seurat (v.4.3.0)67 as described below. For quality control, cells with fewer than 200 detected genes or more than 25% of reads mapped to mitochondrial genes were excluded. Raw counts were imported and cells clustered following SCTransform with the glmGamPoi method, FindVariableFeatures, ScaleData, RunPCA, FindNeighbors and FindClusters. Cell-type annotations were then manually assigned to clusters based on the expression of canonical lineage markers (Supplementary Fig. 1c): MS4A1, CD19, CD79A and CD79B for B cells, IGKC and MZB1 for plasma cells, CD3D, CD4 and IL7R for CD4+ T cells, CD3D, CD8A and CD8B for CD8+ T cells, GNLY and NCAM1 for NK cells, CD68 and CD14 for macrophages/monocytes, CD1C for dendritic cells, COL1A1, COL3A1, PDGFRA and FAP for fibroblasts, PECAM1 and VWF for endothelial cells, EPCAM for epithelial cells and SOX9, MET, MITF and MLANA for melanoma cells. Small clusters with multilineage marker expression were considered potential doublets or multiplets and eliminated from further analysis (3% of filtered cells from cohort 1, on average).
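The per-cell quality-control rule stated above can be written as a simple predicate. This is a minimal sketch in plain Python rather than the Seurat workflow used in the study; the function name `passes_qc`, the dictionary-based cell representation and the explicit set of mitochondrial genes are assumptions made for illustration.

```python
def passes_qc(counts, mito_genes, min_genes=200, max_mito_frac=0.25):
    """Return True if a cell survives the thresholds described in the text.

    counts: dict mapping gene symbol -> raw count for one cell.
    mito_genes: set of gene symbols treated as mitochondrial (assumed input).
    """
    detected = sum(1 for c in counts.values() if c > 0)
    total = sum(counts.values())
    if detected < min_genes or total == 0:
        return False  # too few detected genes
    mito = sum(c for g, c in counts.items() if g in mito_genes)
    return mito / total <= max_mito_frac  # exclude cells above 25% mitochondrial


# Toy cells with a lowered gene threshold so the example stays small.
mito = {"MT-CO1", "MT-ND1"}
good_cell = {"CD3D": 5, "CD8A": 3, "MT-CO1": 2}
bad_cell = {"CD3D": 1, "MT-CO1": 3, "MT-ND1": 1}
kept = [passes_qc(c, mito, min_genes=3) for c in (good_cell, bad_cell)]
```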
Visium and legacy ST
Processed data from 54 Visium (standard) and 54 legacy ST profiles of carcinoma and melanoma samples were downloaded from 10x Genomics (https://www.10xgenomics.com/resources/datasets) and 12 previous studies27,60,66,68,69,70,71,72,73,74,75,76 (Supplementary Fig. 1a and Supplementary Table 1). Legacy ST refers to the predecessor of 10x Visium, a lower-resolution ST assay with 100 μm spot diameter reported in ref. 77. For quality control of publicly available data, genes expressed in fewer than five spots and spots expressing fewer than 200 unique genes were omitted. Processing details of Visium data generated in this study are provided in ‘10x Visium (standard)’ in Supplementary Methods.
MERSCOPE
Preprocessed MERSCOPE profiles of 15 FFPE human tumour specimens, spanning melanoma and six distinct carcinomas, were downloaded from Vizgen (MERSCOPE FFPE Human Immuno-oncology program; https://info.vizgen.com/merscope-ffpe-solution). Three ovarian cancer samples were excluded owing to substantial tissue fragmentation (Supplementary Table 8). For quality control, genes expressed in fewer than five cells and cells with fewer than 300 total transcripts were excluded from each remaining sample.
For cell-type annotation, transcripts were downsampled to 300 per cell, and cells were clustered with Seurat (v.4.3.0)67 using the following steps: NormalizeData, FindVariableFeatures (nfeatures = 300), ScaleData, RunPCA, FindNeighbors and FindClusters (resolution = 1). Cell-type annotations were then assigned by cluster based on the expression of canonical markers (Supplementary Fig. 1b) as described for scRNA-seq above, with reclustering of individual clusters, particularly those containing mixed lymphocyte groups, performed for greater granularity as needed.
To remove ambient and improperly segmented mRNAs, publicly available tumour scRNA-seq atlases (‘scRNA-seq’ above) were used to identify genes commonly expressed in each cell type. Specifically, for each cell type, genes expressed in at least 5% of cells in three or more cancer types were identified, resulting in a whitelist for each cell type. Genes in each cell type that were absent from the corresponding whitelist were then set to zero expression in the MERSCOPE data. Subsequently, cells with fewer than five detectably expressed genes were excluded, resulting in a final dataset of 5.6 million evaluable cells from 12 samples (Supplementary Fig. 1a and Supplementary Tables 1 and 8). For MERSCOPE data generated in this study, see ‘Vizgen MERSCOPE’ in Supplementary Methods.
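The whitelist rule described above lends itself to a short sketch. It assumes that, for one cell type, the fraction of cells expressing each gene has already been computed per cancer type from the scRNA-seq atlases; the names `build_whitelist` and `mask_cell`, and the toy fractions, are hypothetical.

```python
def build_whitelist(frac_by_cancer, min_frac=0.05, min_cancers=3):
    """Genes expressed in >= min_frac of cells in >= min_cancers cancer types.

    frac_by_cancer: dict cancer type -> dict gene -> fraction of cells
    (of a single cell type) expressing that gene.
    """
    tally = {}
    for fracs in frac_by_cancer.values():
        for gene, frac in fracs.items():
            if frac >= min_frac:
                tally[gene] = tally.get(gene, 0) + 1
    return {gene for gene, n in tally.items() if n >= min_cancers}


def mask_cell(counts, whitelist):
    """Set genes absent from the cell type's whitelist to zero expression."""
    return {g: (c if g in whitelist else 0) for g, c in counts.items()}


# Toy fractions for T cells in three cancer types (hypothetical values).
t_cell_fracs = {
    "breast": {"CD3D": 0.60, "EPCAM": 0.01},
    "colon": {"CD3D": 0.50, "EPCAM": 0.02},
    "lung": {"CD3D": 0.40, "EPCAM": 0.30},
}
whitelist = build_whitelist(t_cell_fracs)
masked = mask_cell({"CD3D": 4, "EPCAM": 2}, whitelist)
```

In this toy case EPCAM passes the 5% threshold in only one cancer type, so its counts in a T cell are treated as ambient or mis-segmented signal and zeroed.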
Xenium
Space Ranger results from 11 samples (nine carcinomas and two melanomas) profiled with Xenium V1 (n = 5) and Xenium Prime (n = 6) were downloaded from https://www.10xgenomics.com/resources/datasets (Supplementary Tables 1 and 8). Cell-type annotation by canonical marker expression within clusters (Supplementary Fig. 1b) and subsequent postprocessing were done following the workflow described in ‘MERSCOPE’ above, with two modifications, owing to the lower average number of detected genes in Xenium compared with MERSCOPE: omission of the downsampling step to 300 transcripts per cell; and variable feature selection using FindVariableFeatures with nfeatures = 200. One sample lacking annotatable CD4+ T cells was excluded from downstream analysis (Supplementary Table 8). For quality control, cells with fewer than 20 detected genes or with greater than 10% mitochondrial transcript content were omitted.
Visium HD
Space Ranger results from five carcinoma samples profiled with Visium HD (bins of 8 μm × 8 μm) were downloaded from https://www.10xgenomics.com/resources/datasets (Supplementary Tables 1 and 8). Cell-type annotation and postprocessing were performed as described for Xenium data (Supplementary Fig. 1b). For these samples, Visium HD bins generally yielded robust cell-type discrimination, in line with a previous report78. One sample lacking annotatable CD8+ T cells was excluded from downstream analysis (Supplementary Table 8). For quality control of publicly available data, bins with fewer than 20 detected genes or with greater than 10% mitochondrial transcript content were omitted. Processing details for Visium HD data generated in this study, which were used for SE deconvolution as described below in section ‘Paired tumour and plasma from patients with melanoma’, are provided in ‘10x Visium HD’ in the Supplementary Methods.
Analysis of tumour versus adjacent stroma
Integration of single-cell and spatial transcriptomes
CytoSPACE (v.1.0.3)25 was used to align scRNA-seq data to ST data from the same cancer type, reconstructing transcriptome-wide spatially resolved expression profiles of single cells (Fig. 1c). Alignment was done separately for each ST sample, with source data enumerated in Supplementary Tables 1 and 2. To eliminate potential bias arising from different total unique molecular identifiers (UMIs), raw counts from droplet-based scRNA-seq data were downsampled to 1,500 total UMIs per cell, whereas transcripts per million (TPM) data from Smart-seq2 samples (melanoma) were used without downsampling. The lap_CSPR solver and recommended settings were applied for all analyses, including the default mode for bulk ST (Visium and legacy ST), single-cell mode for single-cell ST data (MERSCOPE) and an average of five cells per spot for Visium data and 20 cells per spot for legacy ST data.
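Downsampling to a fixed UMI total can be sketched as sampling molecules without replacement. The version below uses only the Python standard library and assumes that cells already at or below the target are left unchanged, which the text does not specify; the function name `downsample_umis` and the seed are hypothetical.

```python
import random
from collections import Counter


def downsample_umis(counts, target=1500, seed=0):
    """Randomly subsample one cell's UMIs, without replacement, to `target`."""
    total = sum(counts.values())
    if total <= target:
        return dict(counts)  # assumption: shallow cells are kept as they are
    # Expand counts into one label per molecule, then draw `target` molecules.
    pool = [gene for gene, c in counts.items() for _ in range(c)]
    rng = random.Random(seed)
    return dict(Counter(rng.sample(pool, target)))


# Toy cell with 2,000 UMIs across three genes.
cell = {"A": 1000, "B": 800, "C": 200}
down = downsample_umis(cell)
```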
Differential expression analysis
For the results presented in Fig. 1d and Extended Data Fig. 1a–d,e,g, we analysed scRNA-seq data mapped to ST samples, as described above. We also analysed MERSCOPE data directly (without scRNA-seq integration) as a form of reciprocal validation (Extended Data Fig. 1a–c and Supplementary Table 4). As input to the analyses described below, UMI-based scRNA-seq data were normalized for each cell type separately using SCTransform from Seurat (v.4.3.0)67; Smart-seq2 data from melanoma samples were normalized to log2[TPM]; and MERSCOPE data were normalized using NormalizeData from Seurat (v.4.3.0)67.
To study transcriptome-wide variation in TME cell types localized to the tumour or adjacent stroma, and given broad cancer coverage and sample availability, CytoSPACE-enhanced Visium data were selected as the primary discovery cohort. Differential expression between tumour and adjacent stroma (annotated as described in Supplementary Methods) was first determined for each cell type and sample separately using the wilcoxauc function from presto (v.1.1.0; https://github.com/immunogenomics/presto)79. Log2-transformed fold changes (LFCs) for each gene were then aggregated by median to avoid bias, and corresponding meta P-values were calculated using Stouffer’s approach80 following conversion of two-sided P-values to z-scores. The calculations were done first across sample replicates, then across all samples within each cancer type, and finally across all cancer types in the discovery cohort. Meta P-values were adjusted using the Benjamini–Hochberg method to derive Q-values81. Significantly differentially expressed genes between tumour and adjacent stroma were identified as genes with: significant differential expression (per-cancer LFC > 0.05 and Q < 0.05) in at least three cancer types; a pan-cancer Q < 0.05; and a pan-cancer median LFC > 0.02. Genes with conserved pan-cell-type enrichment were omitted from this analysis and examined elsewhere (Extended Data Fig. 1e; see below). Among the remaining genes, up to 400 HUGO protein-coding genes (https://www.genenames.org) with the highest LFC in tumour (n ≤ 200) or adjacent stroma (n ≤ 200) were visualized across all held-out samples mapped to scRNA-seq data (Fig. 1d and Extended Data Fig. 1d). For cell types with fewer than 200 genes in either region, the minimum number per compartment was selected for balance (Supplementary Table 3). For visualization, data were scaled per column to a maximum absolute LFC of one and genes were ordered by the resulting median enrichment balanced across platforms (Fig. 1d, Extended Data Fig. 1d and Supplementary Table 3).
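The aggregation described above (median LFC, Stouffer meta P-value and Benjamini–Hochberg adjustment) can be sketched for a single level of the hierarchy. The sketch assumes an unweighted Stouffer combination in which each two-sided P-value is converted to a z-score signed by the direction of its LFC; the weighting and sign conventions used in the study are not stated here, and the function names are hypothetical.

```python
from math import copysign, sqrt
from statistics import NormalDist, median

_NORMAL = NormalDist()


def stouffer(pvalues, lfcs):
    """Combine two-sided P-values across samples; return (median LFC, meta P)."""
    zs = []
    for p, lfc in zip(pvalues, lfcs):
        # Two-sided P -> |z|, then attach the sign of the fold change.
        magnitude = -_NORMAL.inv_cdf(max(p, 1e-300) / 2.0)
        zs.append(copysign(magnitude, lfc))
    z = sum(zs) / sqrt(len(zs))
    return median(lfcs), 2.0 * _NORMAL.cdf(-abs(z))


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted values (Q-values), in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    q = [0.0] * m
    running = 1.0
    for rank in range(m, 0, -1):  # walk from the largest P-value down
        i = order[rank - 1]
        running = min(running, pvalues[i] * m / rank)
        q[i] = running
    return q


# One gene measured in two samples with concordant direction (toy values).
med_lfc, meta_p = stouffer([0.05, 0.05], [0.2, 0.4])
qvalues = benjamini_hochberg([0.01, 0.04, 0.03, 0.5])
```

Applying `stouffer` first across replicates, then across samples within a cancer type, and finally across cancer types would mirror the order given in the text.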
To cross-validate CytoSPACE-enhanced Visium against single-cell ST data directly (without scRNA-seq integration), we repeated the above analysis using the 500 genes covered by the MERSCOPE panel (section ‘MERSCOPE’ above). We then identified up to 50 HUGO protein-coding genes by median LFC (tumour, n ≤ 25; adjacent stroma, n ≤ 25) for each TME cell type and repeated this step independently for each platform. Cross-platform concordance was quantified by Spearman correlation of median LFCs and by the directionality of expression (higher in tumour or adjacent stroma) (Extended Data Fig. 1a–c). All results are detailed in Supplementary Table 4.
To identify spatially polarized genes with conservation across TME cell types, genes with differential expression (pan-cancer LFC > 0.05 and Q < 0.05) in more than 50% of cell types (n = 5) in the Visium discovery cohort were ranked by average LFC across all cell types. For balanced representation, the minimum number of top-ranking genes per compartment was selected for visualization (tumour, n = 138; adjacent stroma, n = 138) (Extended Data Fig. 1e and Supplementary Table 6).
Spatial EcoTyper framework
Despite experimental advances enabling high-resolution expression profiling of cells in situ, leveraging such data to systematically profile the co-association of cell states into SEs and discover conserved SEs across specimens and cancer types has remained challenging. The spatial organization of cell states and their relative abundances in an ecotype can vary across regions and between samples, and even expression profiles of individual cells sharing the same phenotypic state exhibit natural variability. Furthermore, technical drop-out and sample- or platform-specific batch effects all pose obstacles to SE discovery.
With these considerations in mind, we developed Spatial EcoTyper (Fig. 2a and Extended Data Fig. 2). At its core, the framework relies upon a network integration technique to identify common patterns of ST variation shared across samples. This is achieved by adapting similarity network fusion (SNF), a previously described approach for multi-omics data integration across patients32. By introducing a series of carefully constructed spatial GEPs, our approach mitigates technical drop-out while providing stability under biological variation. Once defined, SEs can be robustly recovered in a supervised manner from non-spatial data using unique cell states and molecular signatures that are learnt from spatial data.
Spatial EcoTyper consists of five key components, described in detail in the following sections.
- Determination of sample-level spatial clusters. In each single-cell ST sample, spatial expression data are encoded into cell-type-specific GEPs of spatial neighbourhoods (SNs), spatial covariation among the SNs is computed by SNF and SNs are clustered over the resulting network (Fig. 2a; steps 1–4, Extended Data Fig. 2a).
- Identification of conserved SEs. Spatial clusters discovered from individual single-cell ST samples are represented by GEPs of their associated cell states, and clusters with similar GEPs are aggregated across samples into conserved SEs (Extended Data Fig. 2; steps 1–9, Extended Data Fig. 2b).
- Discovery of conserved SE-specific cell states. Cell states uniquely enriched in each SE and conserved across samples are identified using a specialized variant of NMF (Extended Data Fig. 6a).
- Recovery of conserved SE-specific cell states. An NMF model is developed to recover SE-specific cell states in external single-cell or spatial expression datasets (Extended Data Fig. 6b).
- Deconvolution of SEs from bulk RNA-seq. The approach from the previous component is generalized to the task of recovering SE abundances from bulk RNA-seq, using a training cohort of pseudo-bulk mixtures (Fig. 3 and Extended Data Fig. 8).
Determination of sample-level spatial clusters
The Spatial EcoTyper framework, schematically illustrated in Extended Data Fig. 2, begins by identifying clusters of SNs in each single-cell ST sample (steps 1–3, Extended Data Fig. 2a). Although ST data should be generated by the same assay, the discovery phase is applicable to diverse single-cell ST platforms. In this work, we used tumour samples profiled by MERSCOPE for discovery (Supplementary Table 8), with normalization performed as described in the ‘Spatial EcoTyper discovery cohort’ section below. To assess reproducibility, we also applied discovery mode to Xenium Prime data as described in ‘Robustness of spatial ecotype discovery to single-cell ST platform’ in Supplementary Methods (Extended Data Fig. 6k–n).
Assembly of cell-type-specific SN expression profiles
Spatial proximity is explicitly used by Spatial EcoTyper in two ways: when analysing individual cells, and when analysing distinct cell types. To accomplish the former, cell-type-specific GEPs are first aggregated by SNs centred along a regular grid (step 1, Extended Data Fig. 2a). This involves constructing a vector for each cell type c, denoted snGEPc, by averaging the normalized GEPs of up to k nearest cells of cell type c located within radius r of the centre of each SN (r was set to 50 μm in practice; section ‘SN radius’ below). The snGEPc vectors for all SNs are then concatenated into matrix Ec with g genes (rows) and m snGEPc vectors (columns). Crucially, the columns of Ec are consistently ordered left-to-right by SN coordinates, enabling co-registration across cell types. In this way, for any given SN with coordinates i, j, snGEP vectors for cell types x and y will occupy the same column index in Ex and Ey, respectively (step 1, Extended Data Fig. 2a). To identify SEs containing multiple cell types, SNs characterized by a single cell type are eliminated by default. Furthermore, from each Ec matrix, genes expressed in fewer than five SNs, SNs with no cells of type c and SNs expressing fewer than five genes are excluded with associated entries set to NA.
The snGEPc vector serves as a fundamental data unit for Spatial EcoTyper, analogous to a ‘spatial meta-cell’ (Supplementary Fig. 5). It mitigates technical drop-out in single-cell gene expression profiling by aggregating over multiple cells while simultaneously reducing the influence of cell-type abundance. Hence, the snGEPc representation is suitable for ecotype detection based on cell-state variation, rather than shifts in local cell-type composition alone.
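The aggregation step can be illustrated with a minimal Python sketch. It is not the Spatial EcoTyper implementation: the function name sn_gep, the tuple layout of the input and the example value of k are hypothetical and were chosen for illustration only.

```python
import math

def sn_gep(cells, centre, cell_type, k, r):
    """Average the GEPs of up to k nearest cells of one type within radius r.

    cells: list of (x, y, cell_type, gep) tuples, where gep is a list of floats.
    Returns None when no cell of that type lies within the radius (an NA entry).
    """
    cx, cy = centre
    nearby = []
    for x, y, ctype, gep in cells:
        d = math.hypot(x - cx, y - cy)
        if ctype == cell_type and d <= r:
            nearby.append((d, gep))
    if not nearby:
        return None
    nearby.sort(key=lambda pair: pair[0])          # nearest cells first
    chosen = [gep for _, gep in nearby[:k]]        # keep at most k cells
    n_genes = len(chosen[0])
    return [sum(g[i] for g in chosen) / len(chosen) for i in range(n_genes)]

# Toy neighbourhood centred at the origin (coordinates in micrometres).
cells = [
    (10.0, 10.0, "T", [1.0, 0.0]),
    (20.0, 10.0, "T", [3.0, 2.0]),
    (40.0, 10.0, "T", [5.0, 4.0]),
    (90.0, 10.0, "T", [9.0, 9.0]),   # outside the 50-um radius
    (12.0, 10.0, "B", [7.0, 7.0]),   # different cell type, ignored for "T"
]
profile = sn_gep(cells, centre=(0.0, 0.0), cell_type="T", k=2, r=50.0)
missing = sn_gep(cells, centre=(0.0, 0.0), cell_type="NK", k=2, r=50.0)
```

Because each snGEPc is a mean over the selected cells rather than a sum, the number of contributing cells does not scale the profile, which is the property that reduces the influence of cell-type abundance.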
SN similarity network construction
To incorporate the spatial proximity of distinct cell types, a pairwise similarity matrix Ac of dimension m × m is constructed for each matrix Ec (step 2, left; Extended Data Fig. 2a). In detail, Spatial EcoTyper first performs dimension reduction on matrix Ec to identify the top 20 principal components using the RunPCA function from Seurat (v.4.3.0)67. Pairwise similarities between all Ec columns are then calculated as inverted Euclidean distance, yielding matrix Ac for each cell type c. Given the typically large number of SNs, we retained only the top α highest similarities in each row and column of Ac, setting all other values to zero to create a sparse matrix. Although α = 50 was used in this work, we note that our results were robust to a range of empirically tested values of α (data not shown). This step maintains key edges in the similarity network and enhances scalability. For any instance in which the given cell type c was not represented in both SNs, the corresponding entry in Ac was assigned as NA.
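A minimal Python sketch of the sparsification step follows. Two points are assumptions rather than statements from the text: the inverted distance is taken to be 1 / (1 + d), and an entry is retained if it is among the top α values of either its row or its column. The function names are hypothetical, and NA handling is omitted.

```python
import math

def inverse_distance_similarity(points):
    """Pairwise similarity between embedded SNs, assumed here to be 1 / (1 + d)."""
    n = len(points)
    return [[1.0 / (1.0 + math.dist(points[i], points[j])) for j in range(n)]
            for i in range(n)]

def keep_top_alpha(sim, alpha):
    """Zero every entry that is not among the alpha largest off-diagonal
    values of its row or of its column (sim is assumed symmetric)."""
    n = len(sim)
    keep = [[False] * n for _ in range(n)]
    for i in range(n):
        order = sorted((j for j in range(n) if j != i),
                       key=lambda j: sim[i][j], reverse=True)
        for j in order[:alpha]:
            keep[i][j] = True   # top value of row i
            keep[j][i] = True   # by symmetry, also a top value of column i
    return [[sim[i][j] if keep[i][j] else 0.0 for j in range(n)]
            for i in range(n)]

# Four SNs embedded on a line; alpha = 1 keeps only nearest-neighbour edges.
points = [[0.0], [1.0], [3.0], [7.0]]
sim = inverse_distance_similarity(points)
sparse = keep_top_alpha(sim, alpha=1)
```

In this toy example only the edges between consecutive SNs survive, so the network retains its strongest local structure while most entries become zero.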
SN similarity network fusion
Because all SNs are co-registered across cell types, Spatial EcoTyper fuses all Ac matrices into a single similarity matrix A of dimension m × m using SNF (step 2, right; Extended Data Fig. 2a). Matrix A combines shared patterns of transcriptional covariance across colocalized cell types. To achieve network fusion in practice, we implemented an enhanced version of the SNF function from the SNFtool R package (v.2.3.1), adding support for sparse matrices and missing values while otherwise preserving the original functionality, then we applied this updated function to our set of Ac matrices. We then performed a rank normalization over the columns of the resulting matrix to transform similarity values per column into a standard space. Ranked values per column were subsequently converted to zero minimum and unit maximum.
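The column-wise rank normalization applied after fusion can be sketched in Python as below. This is an illustration only: the function name is hypothetical, ties are given their average rank (the tie rule is an assumption), and missing values are not handled.

```python
def rank_normalise_columns(matrix):
    """Replace each column by its ranks, rescaled to minimum 0 and maximum 1."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    out = [[0.0] * n_cols for _ in range(n_rows)]
    for j in range(n_cols):
        column = [matrix[i][j] for i in range(n_rows)]
        order = sorted(range(n_rows), key=lambda i: column[i])
        ranks = [0.0] * n_rows
        pos = 0
        while pos < n_rows:
            end = pos
            while end + 1 < n_rows and column[order[end + 1]] == column[order[pos]]:
                end += 1
            avg = (pos + end) / 2 + 1          # average 1-based rank for ties
            for t in range(pos, end + 1):
                ranks[order[t]] = avg
            pos = end + 1
        lo, hi = min(ranks), max(ranks)
        for i in range(n_rows):
            out[i][j] = (ranks[i] - lo) / (hi - lo) if hi > lo else 0.0
    return out

fused = [[0.9, 5.0],
         [0.1, 5.0],
         [0.4, 7.0]]
normalised = rank_normalise_columns(fused)
```

After this transformation every column spans the same range regardless of the original similarity scale.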
Spatial clustering and cluster profiling
Given the fused similarity matrix A, Spatial EcoTyper groups SNs into clusters (step 3; Extended Data Fig. 2a), which will become candidates for SEs when considered across multiple samples as described in section ‘Identification of conserved SEs’ below. Clustering each input sample prior to multisample SE discovery serves two related purposes. First, it reduces the dimensionality of the data by grouping SNs with similar spatial covariance patterns. Second, it simplifies cross-sample integration by de-noising the data and minimizing drop-out. To cluster matrix A, we leveraged standard processing procedures optimized within Seurat (v.4.3.0)67, sequentially applying RunPCA, FindNeighbors and FindClusters (Louvain) functions. For single-sample analyses, the resulting spatial clusters represent sample-level SEs. For integrative analysis across samples, a higher clustering resolution is recommended to enhance robustness to parameter variation (section ‘Louvain clustering resolution’ below). In this work, we selected a resolution of 30, reducing the dimensionality from tens of thousands of individual SNs to hundreds of SN clusters per sample. Extended Data Fig. 4a shows the robustness of SE discovery to different clustering resolutions (1–50).
In practice, SNs in pre-annotated domains can be balanced to ensure equal representation before clustering. To obtain an equal number of SNs from tumour and adjacent stromal regions in this work, we uniformly downsampled the region with more SNs (for example, tumour) before integrative analysis.
Identification of conserved SEs
Beyond sample-level SE analysis, a key strength of Spatial EcoTyper lies in its ability to identify SEs conserved across a variety of conditions, such as samples, patients and cancer types. To identify such SEs, Spatial EcoTyper uses a variant of the sample-level process described above (section ‘Determination of sample-level spatial clusters’), using SN clusters rather than SNs as the fundamental units (steps 4–9; Extended Data Fig. 2).
Assembly of cell-type-specific spatial cluster gene expression profiles
Following clustering of SNs in a given input sample (section ‘Spatial clustering and cluster profiling’ above), each cell is assigned to the cluster of its spatially nearest SN. To minimize batch effects across samples, row-based standardization is then applied to each single-cell GEP, normalizing gene expression to zero mean and unit variance per gene. Spatial EcoTyper then computes the average cell-type-specific GEP for each cell type c and SN cluster, referred to as ccGEPc.
Once defined, ccGEPc vectors are aggregated per cell type c for a given input sample into matrix E′c, with g genes (rows) and s SN clusters (columns), with genes restricted to those with non-zero expression in at least some SN in each sample (step 4; Extended Data Fig. 2a). Importantly, all E′c matrices are co-registered across cell types, with any given SN cluster occupying the same column index across E′c matrices. Moreover, to ensure sufficient representation and well-defined computations per sample, we require a minimum of three SN clusters containing the cell type c for inclusion of a sample into each E′c. In preparation for cross-sample integration, the feature space of each E′c is reduced to the top 200 variable genes (by default), where highly variable genes per matrix are computed according to their rank product of variances across all input samples. In other words, for each cell type c, the variance of each gene across SN cluster ccGEPc vectors is computed per sample, with genes then assigned a rank by variance. Ranks are then aggregated across samples by geometric mean, and the top highly variable genes are selected per E′c from the result.
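The rank-product selection of highly variable genes can be sketched in Python as follows. The function name and input layout are hypothetical, population variance is used for simplicity, and rank 1 is given to the most variable gene in each sample.

```python
import math
import statistics

def top_variable_genes(samples, n_top):
    """samples: list of dicts mapping gene -> values across SN clusters of one
    sample. Genes are ranked by variance within each sample, ranks are combined
    by geometric mean, and the n_top genes with the smallest mean rank are kept."""
    genes = sorted(set.intersection(*(set(s) for s in samples)))
    log_rank_sums = {g: 0.0 for g in genes}
    for sample in samples:
        variances = {g: statistics.pvariance(sample[g]) for g in genes}
        ordered = sorted(genes, key=lambda g: variances[g], reverse=True)
        for rank, g in enumerate(ordered, start=1):
            log_rank_sums[g] += math.log(rank)
    geo_mean = {g: math.exp(log_rank_sums[g] / len(samples)) for g in genes}
    return sorted(genes, key=lambda g: geo_mean[g])[:n_top]

samples = [
    {"A": [0.0, 10.0], "B": [1.0, 2.0], "C": [0.0, 4.0]},
    {"A": [2.0, 12.0], "B": [5.0, 5.0], "C": [1.0, 3.0]},
]
selected = top_variable_genes(samples, n_top=2)
```

Working on ranks rather than raw variances means that a sample with a larger dynamic range cannot dominate the selection.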
Cross-sample SN similarity network construction
When E′c matrices have been created for all input samples (step 5; Extended Data Fig. 2b), they are concatenated column-wise across samples yielding E*c, stratified by cell type c (step 6; Extended Data Fig. 2b). Similarity networks are then computed by Spearman correlation over the columns of E*c for each cell type c, yielding a set of c pairwise similarity matrices A*c describing the similarity of E*c columns across sample-level SN clusters (step 7; Extended Data Fig. 2b). For any pairwise comparison of SN clusters in which cell type c is not represented in both clusters, the corresponding entry in A*c is assigned NA. To minimize any remaining batch effects, Spatial EcoTyper standardizes similarity matrices A*c by performing rank normalization independently on each submatrix \({{\bf{A}}}_{{ij}}^{* c}\), which represents the similarities of SN clusters between samples i and j. The normalization is performed by converting the non-NA entries of each column in \({{\bf{A}}}_{{ij}}^{* c}\) to ranks and rescaling the ranks to the unit interval.
Cross-sample SN similarity network fusion
In step 8 (Extended Data Fig. 2b), Spatial EcoTyper fuses all A*c matrices across cell types into a single similarity matrix A* using the enhanced SNF function as described above in ‘SN similarity network fusion’. The resulting multisample matrix encodes the conservation of spatial community structures across cell types and SNs.
Clustering of sample-level SN clusters into SEs
In the final step, to group sample-level SN clusters into cross-sample SEs, NMF clustering is applied to A*, with the number of clusters (rank) set according to the following procedure (step 9; Extended Data Fig. 2b). In this work, NMF clustering of A* was tested for ranks ranging from 2 to 50, with 50 runs per rank using the Brunet method82. The optimal rank was selected as the highest rank at which the cophenetic coefficient exceeded 0.95 and after which it showed the greatest drop (Extended Data Fig. 4b). NMF results derived from the selected rank, here identified as 11, were used to group sample-level SN clusters and corresponding SNs and single cells into SEs. This number was further reduced to nine final SEs by excluding candidate ecotypes that were devoid of SE-specific cell states (section ‘Discovery of conserved SE-specific cell states’ below) and that did not exhibit maximal within-cluster similarity. Similarity between two clusters was calculated as the average value of the block of A* corresponding to rows of the first cluster and columns of the second. Notably, NMF has well-established performance characteristics for robustly clustering high-dimensional genomic data encompassing hundreds to thousands of data points6,70. However, for step 3 of the multisample workflow (Extended Data Fig. 2a), we used the more efficient Louvain clustering (Seurat), owing to the large SNF matrices arising from single-sample analysis. For robustness testing, see Extended Data Fig. 4a.
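The block-average similarity between two clusters, and the check for maximal within-cluster similarity, can be sketched in Python as below. The function names are hypothetical, the diagonal is included in the within-cluster block (an assumption) and NA entries are not handled.

```python
def block_similarity(A, labels, a, b):
    """Mean of the block of A with rows in cluster a and columns in cluster b."""
    rows = [i for i, lab in enumerate(labels) if lab == a]
    cols = [j for j, lab in enumerate(labels) if lab == b]
    values = [A[i][j] for i in rows for j in cols]
    return sum(values) / len(values)

def has_maximal_within_similarity(A, labels, cluster):
    """True if a cluster is more similar to itself than to every other cluster."""
    within = block_similarity(A, labels, cluster, cluster)
    others = set(labels) - {cluster}
    return all(within > block_similarity(A, labels, cluster, o) for o in others)

# Toy fused similarity matrix for four SN clusters assigned to two candidates.
A = [[1.0, 0.8, 0.2, 0.1],
     [0.8, 1.0, 0.3, 0.2],
     [0.2, 0.3, 1.0, 0.1],
     [0.1, 0.2, 0.1, 1.0]]
labels = ["x", "x", "y", "y"]
cross = block_similarity(A, labels, "x", "y")
x_ok = has_maximal_within_similarity(A, labels, "x")
y_ok = has_maximal_within_similarity(A, labels, "y")
```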
Assembly of cell-type-specific SE gene expression profiles
Having assigned single cells to SEs, Spatial EcoTyper then determines SE cell-state gene expression profiles (csGEPs). For each cell type c, NMF is performed on single-cell GEP Gc and corresponding SE label matrix Hc to derive csGEPs. Here, Gc represents a gene-by-cell matrix for each cell type c. To construct Gc, GEPs are normalized with NormalizeData from Seurat (v.4.3.0)67, standardized to zero mean and unit variance per gene in each sample, and then posneg transformed as described previously6. To ensure balanced representation, an equal number of cells (at least 300, and up to 5,000 cells) are randomly selected from each sample-SE pair. Owing to the computational constraints of NMF, a maximum of 25,000 cells is selected through random down-sampling. GEPs of the selected cells are concatenated column-wise into csGEP matrix Gc. The SE label matrix Hc is a binary cell by SE matrix indicating SE membership for each cell in Gc. NMF is then applied to solve the equation:
\({{\bf{G}}}_{c}\approx {{\bf{W}}}_{c}{{\bf{H}}}_{c}^{{\rm{T}}}\)
where Wc represents the basis matrix containing csGEPs (genes by SEs), and the transpose places the cell-by-SE label matrix Hc in SE-by-cell orientation. To refine csGEPs for cell-state recovery, the top 50 genes are chosen per SE based on the largest positive delta compared with the second-highest expression across SEs. Each basis matrix is then reduced to the selected genes. These refined basis matrices enable the recovery of SEs and their cell states from independent data using NMF (section ‘Recovery of SE-specific cell states’ below).
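One reading of the gene-refinement rule is sketched below in Python: each gene is credited to the SE in which its basis value is highest, scored by the gap to the second-highest SE, and the top genes per SE are retained. The function name and the dictionary layout are hypothetical.

```python
def select_state_genes(W, n_top):
    """W: dict gene -> list of basis-matrix values, one per SE.
    Returns dict SE index -> genes ordered by decreasing positive delta."""
    n_states = len(next(iter(W.values())))
    candidates = {s: [] for s in range(n_states)}
    for gene, values in W.items():
        order = sorted(range(n_states), key=lambda s: values[s], reverse=True)
        best, runner_up = order[0], order[1]
        delta = values[best] - values[runner_up]
        if delta > 0:                       # genes tied at the top are dropped
            candidates[best].append((delta, gene))
    return {
        s: [g for _, g in sorted(c, key=lambda t: t[0], reverse=True)[:n_top]]
        for s, c in candidates.items()
    }

W = {
    "g1": [5.0, 1.0, 0.0],
    "g2": [3.0, 2.5, 0.0],
    "g3": [0.0, 4.0, 1.0],
    "g4": [2.0, 2.0, 2.0],
    "g5": [4.0, 0.0, 1.0],
}
selected = select_state_genes(W, n_top=2)
```

In the pipeline described above the cut-off would be 50 genes per SE; a value of 2 is used here only to keep the example small.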
Discovery of conserved SE-specific cell states
Although SEs are derived from spatial covariation in cell states across cell types, shared across samples, not every cell state associated with an SE need be specific to that SE. To identify and validate cell states specifically enriched in each SE and conserved across discovery samples, we performed leave-one-sample-out cross-validation (LOOCV), repeating the csGEP construction described in ‘Assembly of cell-type-specific SE gene expression profiles’ above for each training fold. We then used the resulting NMF basis matrices to predict cell-state labels on the held-out sample. For label assignment, NMF prediction output matrices Hc were standardized to unit sum per column, yielding a probability matrix encoding the probability of each single cell being localized in each SE. Cells were then assigned to the SE-associated cell state with the highest probability. For each LOOCV iteration, the enrichment of each cell state in each SE was assessed by its ability to correctly assign cells to that SE using an F1 score.
Because the csGEP construction involves subsampling of cells, this LOOCV process was repeated 20 times to ensure robustness, with F1 scores averaged across all repetitions. Cell states were considered specific to an SE only if the associated F1 score exceeded the second highest F1 score for the cell state across other SEs by at least 0.1. Otherwise, the cell state was deemed either broadly distributed across multiple SEs or not conserved across samples. Using this approach, 38 SE-specific cell states were identified, specific to nine SEs (Fig. 2f, Extended Data Fig. 6a and Supplementary Table 10). Two SEs identified initially were excluded from further analysis owing to a lack of specific cell states, and the remaining SEs were renumbered accordingly from 1 to 9 based on their average distance across discovery samples to the tumour margin (Fig. 2d and Extended Data Fig. 4c).
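The specificity criterion can be sketched in Python as follows. The helper names are hypothetical, and the F1 score is the standard harmonic mean of precision and recall for one target label.

```python
def f1_score(true_labels, predicted_labels, target):
    """F1 score for recovering one target label."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(t == target and p == target for t, p in pairs)
    fp = sum(t != target and p == target for t, p in pairs)
    fn = sum(t == target and p != target for t, p in pairs)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def specific_se(f1_by_se, margin=0.1):
    """Return the SE whose averaged F1 exceeds the runner-up by at least
    `margin`, or None if the cell state is not specific to any SE."""
    ranked = sorted(f1_by_se.items(), key=lambda kv: kv[1], reverse=True)
    (best_se, best), (_, second) = ranked[0], ranked[1]
    return best_se if best - second >= margin else None

f1 = f1_score(["a", "a", "b", "b", "a"], ["a", "b", "b", "b", "a"], target="a")
clear_case = specific_se({"SE1": 0.62, "SE2": 0.31, "SE3": 0.10})
ambiguous_case = specific_se({"SE1": 0.40, "SE2": 0.35})
```

A state such as the second example, whose best and second-best F1 scores differ by less than 0.1, would be treated as broadly distributed or not conserved.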
Recovery of SE-specific cell states
After identifying conserved SE-specific cell states through the above LOOCV process, we used all discovery-cohort samples to prepare an ensemble basis matrix W*c for each cell type c. Specifically, for each cell type, we repeated the process described above in ‘Assembly of cell-type-specific SE gene expression profiles’ 50 times and then averaged the resulting basis matrices to produce W*c. For feature selection, genes showing the highest and most specific expression in each cell state were identified from each basis matrix as described, and then genes that were selected for the same cell state in more than half of the repetitions were retained in the ensemble matrix.
The resulting ensemble matrices are a core component of the Spatial EcoTyper framework and can be used to recover SE-specific cell states from external single-cell-scale transcriptomics datasets using NMF, as described previously6. To recover SE-specific cell states from a query dataset, NMF is applied with single-cell-scale gene expression matrix Gc and the ensemble matrix W*c as input, to yield a probability matrix Hc denoting the probability of each cell belonging to each SE. Cells are then assigned to SE-specific cell states when the prediction probability exceeds 0.6; otherwise, they are assigned to a null class, referred to as non-SE. In practice, single-cell-scale GEPs in Gc should be normalized (to counts per million (CPM), TPM or by SCTransform67, as appropriate) and then scaled to zero mean and unit variance per gene.
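The final assignment step can be sketched in Python as below, taking an SE-by-cell coefficient matrix as given. The NMF fit itself is not shown, the function name is hypothetical, and treating an all-zero column as non-SE is an assumption.

```python
def assign_states(H, se_names, threshold=0.6):
    """H: one row per SE, one column per cell, non-negative coefficients.
    Columns are scaled to unit sum; a cell is labelled with its top SE only
    if that probability exceeds the threshold, otherwise 'non-SE'."""
    n_cells = len(H[0])
    labels = []
    for j in range(n_cells):
        column = [row[j] for row in H]
        total = sum(column)
        if total == 0:
            labels.append("non-SE")        # no signal for this cell
            continue
        probs = [v / total for v in column]
        best = max(range(len(probs)), key=probs.__getitem__)
        labels.append(se_names[best] if probs[best] > threshold else "non-SE")
    return labels

H = [[3.0, 1.0, 0.0],
     [1.0, 1.0, 0.0]]
labels = assign_states(H, ["SE1", "SE2"])
```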
Deconvolution of SEs from bulk RNA-seq
To enable profiling of SEs in bulk expression data, the Spatial EcoTyper framework includes an NMF model trained over simulated bulk RNA-seq prepared by aggregation of scRNA-seq data into pseudo-bulk mixtures for which ground-truth SE proportions are known.
Construction of pseudo-bulk mixtures
Previously described publicly available scRNA-seq data (section ‘scRNA-seq’ above) from ten cancer types were used to create pseudo-bulk mixtures. First, cells were annotated as described in section ‘Recovery of cell states and SEs in ST and scRNA-seq validation datasets’ below, labelled according to the parent SE of their assigned cell state or, if unassigned, designated non-SE, for a total of ten label classes. The fractional composition of each pseudo-bulk was generated by random sampling of a value per label class from the Gaussian distribution N(μ = 2, σ = 1), with negative values set to zero and the resulting values normalized to unit sum.
Pseudo-bulks were assembled separately by cancer type, with GEPs constructed by aggregating 1,000 cells randomly selected to satisfy the predefined fractions of cell states. For UMI- and plate-seq-based data, raw counts and TPM values were respectively summed across selected cells. We generated 100 pseudo-bulks per cancer type, and the resulting GEPs were normalized using the NormalizeData function from Seurat (v.4.3.0)67. To mitigate cancer type and batch differences, GEPs were further normalized to zero mean and unit variance per gene in each cancer type.
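The sampling of fractional compositions described above can be sketched in Python as follows. The function name is hypothetical, and redrawing in the very unlikely event that every value is clipped to zero is an assumption added to keep the normalization well defined.

```python
import random

def sample_fractions(n_classes, rng, mu=2.0, sigma=1.0):
    """Draw one value per label class from N(mu, sigma), set negative draws
    to zero and rescale the result to unit sum."""
    while True:
        draws = [max(0.0, rng.gauss(mu, sigma)) for _ in range(n_classes)]
        total = sum(draws)
        if total > 0:                      # redraw if all values were clipped
            return [d / total for d in draws]

rng = random.Random(0)                     # fixed seed for reproducibility
fractions = sample_fractions(10, rng)      # nine SEs plus the non-SE class
```

With μ = 2 and σ = 1, a draw is negative only about 2% of the time, so most label classes receive a non-zero fraction in any given mixture.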
NMF model training for bulk deconvolution
The resulting profiles were concatenated into gene by sample matrix EP, with genes limited to those detected across all cancer types, and pseudo-bulk SE fractional abundances were encoded in sample by label matrix HP. From these, a basis matrix was derived by application of NMF followed by feature selection as described above (section ‘Assembly of cell-type-specific SE gene expression profiles’). The resulting basis matrix WB constitutes another core component of the Spatial EcoTyper framework and can be used to deconvolve SE fractional abundances from bulk gene expression data. For a given bulk expression dataset, predictions are performed by NMF as described in section ‘Discovery of conserved SE-specific cell states’, excluding the final classification step to yield HP, in which the values represent SE abundances across input bulk samples. In practice, to perform deconvolution, input data should be normalized to TPM or CPM as appropriate, log2-adjusted and then normalized across samples to zero mean and unit variance per gene.
Spatial EcoTyper discovery cohort
MERSCOPE samples were selected for SE discovery owing to their high spatial resolution and the availability of uniformly processed samples across multiple cancer types (Supplementary Table 8). Before analysis, MERSCOPE samples were preprocessed as described in section ‘MERSCOPE’ above, then standardized using NormalizeData from Seurat (v.4.3.0)67. For SE discovery and characterization, we focused on nine main TME cell types with strong representation across cancer types and tumour samples: B cells, plasma cells, CD4+ T cells, CD8+ T cells, NK cells, macrophages, dendritic cells, fibroblasts and endothelial cells. Malignant cells were not included owing to significant differences across tumour types. Other TME cell types, such as smooth muscle cells, pericytes and neutrophils, were not confidently detected in the MERSCOPE dataset, probably because of limitations in transcript capture inherent to MERSCOPE and the 500-gene panel that we analysed.
To capture spatial microenvironments from both tumour and adjacent stroma (annotated as described in Supplementary Methods), we selected samples in which each region included more than 5% of the total TME (immune and/or stromal) cells, yielding two melanomas, two colon cancer and two liver cancer samples, and one breast cancer, one prostate cancer and one ovarian cancer sample (Extended Data Fig. 3f and Supplementary Table 8). Five of these samples, each from a different cancer type, were used for SE discovery (Extended Data Fig. 3f and Supplementary Table 8).
Selection of Spatial EcoTyper parameters
SN radius
When applying Spatial EcoTyper to individual samples, we consistently observed a spatial gradient resembling the physical distance of SNs to the tumour margin (Fig. 2b). To assess robustness, Spatial EcoTyper analyses were performed with ten different SN radii, ranging from 10 µm to 100 µm, on a MERSCOPE melanoma sample (melanoma 1). For each SN radius, the relationship between gene expression similarity and physical distance of SNs to the tumour margin was evaluated. To do this, we used the procedure described in ‘Cells, meta-cells, and Spatial EcoTyper embeddings vs. distance to the margin’ in Supplementary Methods, with the exception that PCA was applied to the spatial embedding produced by Spatial EcoTyper (step 2; Extended Data Fig. 2) and SNs (rather than single cells) were used as the unit of analysis. To strike a balance between SN granularity and correlation with distance to the margin, a radius of 50 µm was selected for SE discovery (Extended Data Fig. 3g).
Louvain clustering resolution
A key parameter in the Spatial EcoTyper discovery pipeline (step 3; Extended Data Fig. 2a) is the resolution for Louvain clustering, which groups SNs into clusters in each sample. To ensure robustness, 11 different resolutions ranging from 1 to 50 were tested, and all resulting spatial clusters were grouped into 11 clusters following the multisample discovery pipeline. The similarity between clusters derived at different resolutions was evaluated using the average adjusted Rand index (ARI), comparing each resolution with the others (Extended Data Fig. 4a). The discovered clusters showed high overall consistency, with results being more stable when a resolution higher than 15 was used. The resolution of 30, which had the highest average ARI, was selected for SE discovery in our analysis (Extended Data Fig. 4a).
Recovery of cell states and SEs in ST and scRNA-seq validation datasets
Single-cell-scale ST recovery
To validate SEs and their associated cell states, we analysed nine samples profiled by MERSCOPE (five discovery samples and four held-out samples) and 12 held-out samples profiled by different ST platforms: Xenium V1 (n = 5), Xenium Prime (n = 4) and Visium HD (n = 3) (Supplementary Table 8). The latter, drawn from publicly available data described in section ‘Data collection and processing’ above, were selected for consistency with the TME content threshold required for the MERSCOPE discovery cohort (section ‘Spatial EcoTyper discovery cohort’ above). In all held-out samples, SE-specific cell states were recovered using the approach described above in section ‘Recovery of SE-specific cell states’, assigning single cells (or 8-µm bins from Visium HD) to either non-SE or SE-specific cell states, which allowed for further grouping into the respective SEs. The same procedure was also applied to the MERSCOPE discovery cohort but with LOOCV to avoid overfitting (section ‘Discovery of conserved SE-specific cell states’ above).
Bulk ST recovery
For the analysis presented in Extended Data Fig. 6j, 26 Visium and 48 legacy ST samples were selected, each containing at least five spots located more than 500 µm away from the tumour margin (Supplementary Methods). Spatial spots were normalized to zero mean and unit variance per gene across all spots in each sample. We applied the SE cell-state recovery models (section ‘Recovery of SE-specific cell states’ above) to obtain an H*c matrix for each cell type c, and then averaged the matrices across cell types to estimate relative SE levels across spots. Each spot was then assigned to the dominant SE.
Single-cell RNA-seq recovery
For the analyses presented in Extended Data Fig. 7b, we queried SE content in scRNA-seq profiles from 144 tumour samples spanning ten cancer types (Supplementary Table 2). For each cell type from each carcinoma type, the scRNA-seq data were normalized using SCTransform from Seurat (v.4.3.0)67, and log2 TPM data were used for melanoma Smart-seq2 profiles. Cell-state recovery was then performed as described above (‘Recovery of SE-specific cell states’), and the abundance of cell state i of parent cell type c (for example, CD8+ T cells) in sample s was then determined as the fraction of cells assigned to state i out of the total cells of cell type c in sample s. We repeated the above process for scRNA-seq profiles of 64 brain metastases from melanoma and five types of carcinoma36, using NormalizeData from Seurat (v.4.3.0)67 before SE recovery (Extended Data Fig. 7c and Supplementary Table 2).
Metrics and analyses for SE recovery in ST and scRNA-seq data
Cell-state colocalization in single-cell-scale ST data
The spatial colocalization patterns of SE-specific cell states were assessed using single-cell-scale ST data from four platforms, with cell states recovered from each sample as described above (‘Recovery of cell states and SEs in ST and scRNA-seq validation datasets’) (Extended Data Fig. 6c and Supplementary Table 8). For each single cell (or 8-µm bin for Visium HD), the fractional abundances of neighbouring cell states within a radius of 50 µm were determined, resulting in an N × S matrix, F, where Fij denotes the fraction of cells within a 50-µm radius of cell Ni that are of state Sj. Single-cell-level fractions were subsequently averaged in each cell state, producing an S × S colocalization matrix, L, where Lij represents the average fractional abundance of cell state Si near cell state Sj. To control for biases, cell-state assignments were shuffled 10,000 times and colocalization matrices were recomputed, yielding 10,000 random colocalization matrices, Lrand. The colocalization matrix L was then normalized by subtracting the average of the Lrand matrices and dividing by their standard deviation for each element:
\({L}_{ij}^{{\prime} }=\frac{{L}_{ij}-{\rm{mean}}({L}_{ij}^{{\rm{rand}}})}{{\rm{s.d.}}({L}_{ij}^{{\rm{rand}}})}\)
This resulted in a matrix L′, where Lij′ represents the colocalization index between cell state Si and cell state Sj. Finally, the colocalization indices from multiple samples in each dataset were integrated using Stouffer’s approach80, with L′ capped at an absolute value of 5 per sample to prevent any single sample from disproportionately influencing the meta-analysis (Extended Data Fig. 6c–g).
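For a single pair of cell states, the permutation-based normalization and the capped meta-analysis can be sketched in Python as follows. The function names are hypothetical, the sample standard deviation is used (the estimator is an assumption) and Stouffer's method is applied without weights (also an assumption).

```python
import math
import statistics

def colocalization_index(observed, permuted):
    """z-score of an observed colocalization value against its permutation null."""
    mean = statistics.fmean(permuted)
    sd = statistics.stdev(permuted)
    return (observed - mean) / sd

def stouffer(z_scores, cap=5.0):
    """Unweighted Stouffer combination after capping each |z| at `cap`."""
    capped = [max(-cap, min(cap, z)) for z in z_scores]
    return sum(capped) / math.sqrt(len(capped))

# One strongly colocalized sample (z far above the cap) and three others.
z_first = colocalization_index(0.30, [0.10, 0.12, 0.08, 0.10])
meta_z = stouffer([z_first, 2.0, -1.0, 1.0])
```

Here the first sample has a z-score above 12, but after capping it contributes only 5 to the sum, which illustrates how the cap limits the influence of any one sample.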
Cell-state co-association in scRNA-seq data
Cell-state abundances were determined in scRNA-seq tumour atlases as described above in ‘Recovery of cell states and SEs in ST and scRNA-seq validation datasets’. Abundances were computed under four schemes: (i) including all SE and non-SE states; (ii) excluding non-SE states; (iii) the same as (i) except treating zero abundance as missing values (NA); and (iv) the same as (ii) except treating zero abundance as NA. Pairwise Pearson correlations between cell states were then calculated across all scRNA-seq samples for each abundance matrix, using the cor function in R with pairwise complete observations. The final co-association values were obtained by averaging the correlations across the four schemes (Extended Data Fig. 7a–c).
Significance of cell-state colocalization and co-association
Permutation experiments were performed to assess the significance of cell-state co-occurrence indices, whether for colocalization in ST data (L′ from ‘Cell-state colocalization in single-cell-scale ST data’ above) or for co-associations in scRNA-seq data (‘Cell-state co-association in scRNA-seq data’ above). Let square matrix C represent all pairwise co-occurrence indices between SE cell states. Let Θw represent the average of all co-occurrence indices in C for cell states within SE class w. Θw was compared with 10,000 corresponding scores \({\theta }_{w}^{{\rm{rand}}}\) obtained by randomly shuffling the order of all columns in C, then determining the average co-occurrence index for all cell-state indices corresponding to SE w. The mean co-occurrence score Θw was then normalized by subtracting the average of \({\theta }_{w}^{{\rm{rand}}}\) and dividing by its standard deviation, yielding a two-sided z-score. For scRNA-seq co-association analyses, the z-score was directly converted into a P-value, and the process was repeated for each SE (Extended Data Fig. 7b,c). For ST colocalization analyses, each ST sample was analysed individually. Stouffer’s method80 was then used to aggregate SE-specific z-scores across all ST samples in a dataset, resulting in a meta-z-score for each SE, which was directly converted into a P-value (Extended Data Fig. 6d–f). To incorporate SE2, which is composed of a single cell state, colocalization testing was applied to individual SE2 cells. Otherwise, self-comparisons of cell states were excluded from significance testing.
Spatial autocorrelation
The spatial coherence of SE-specific cell states was evaluated using Moran’s I across 21 single-cell-scale ST datasets, with cell states recovered independently for each sample using Spatial EcoTyper (Extended Data Fig. 6j). We constructed a k-nearest-neighbour graph (k = 3) using spdep (v.1.3.11)83 over all annotated TME cells. We converted the resulting graph into a spatial weights matrix using the nb2listw function (with default parameters) and then calculated Moran’s I for each SE class using the moran function, where cells belonging to the SE were encoded as 1 and all others as 0.
To control for bias, we performed 1,000 permutation experiments in which SE labels were randomly shuffled within each cell type. Moran’s I was recalculated for each permutation to generate null distributions for every SE. Observed Moran’s I values were then normalized into z-scores by subtracting the mean of the null distribution and dividing by its standard deviation.
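Moran's I for a binary SE indicator can be sketched in pure Python as below. This does not reproduce spdep; it assumes row-standardized k-nearest-neighbour weights, and the function names are hypothetical. The statistic is computed as I = (n / S0) Σij wij (xi − x̄)(xj − x̄) / Σi (xi − x̄)², where S0 is the sum of all weights.

```python
import math

def knn_weights(coords, k):
    """Row-standardized weights: each of a cell's k nearest neighbours gets 1/k."""
    n = len(coords)
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: math.dist(coords[i], coords[j]))
        for j in others[:k]:
            W[i][j] = 1.0 / k
    return W

def morans_i(x, W):
    """Moran's I of values x under spatial weights W."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    s0 = sum(sum(row) for row in W)
    num = sum(W[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    den = sum(d * d for d in dev)
    return (n / s0) * num / den

# Six cells on a line; 1 marks membership of the SE of interest.
coords = [(float(i), 0.0) for i in range(6)]
W = knn_weights(coords, k=1)
clustered = morans_i([1, 1, 1, 0, 0, 0], W)     # spatially coherent labels
alternating = morans_i([1, 0, 1, 0, 1, 0], W)   # spatially dispersed labels
```

Coherent labels give a positive value and alternating labels a negative one. Recomputing the statistic on shuffled labels would give the null distribution used for the z-score described above.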
Distance to tumour margin
Following SE recovery from ST data as described above in section ‘Recovery of cell states and SEs in ST and scRNA-seq validation datasets’, the distance of each SE to the tumour margin (in micrometres) was computed by first averaging the Euclidean distance to the nearest tumour margin of all SNs (single-cell-scale ST) or spots (bulk ST) assigned to each SE in each sample, then averaging the resulting quantities by SE across samples in each cohort (Extended Data Fig. 6i,j). Positive and negative distances were used for cells and spots localized to the tumour region and adjacent stroma, respectively (see, for example, Fig. 1c). Finally, for each cohort, the consistency between expected distances (SE-specific distances to the tumour margin derived from the MERSCOPE discovery cohort) and predicted distances was evaluated using Pearson correlation (Extended Data Fig. 6i,j).
Characterization of SEs and associated cell states
Identification of SE cell-state markers
To identify SE-specific cell-state markers, we analysed scRNA-seq data from 144 tumours and ten cancer types (Supplementary Table 2), with all cells grouped into SE-specific cell states or the non-SE null class. The scRNA-seq data were normalized as described in section ‘Differential expression analysis’ above. Differential expression analysis was performed by comparing each cell state with all of the other cells of the same cell type using the wilcoxauc function from the presto package (v.1.0.0; https://github.com/immunogenomics/presto)79. LFCs were extracted from each cancer type and then aggregated across the ten cancer types by median, yielding pan-cancer LFCs.
To identify the markers most specific to each cell state i, we selected genes whose pan-cancer LFC in i was at least 0.1 higher than in any other state. Among these, the top 30 genes by pan-cancer LFC were considered to be marker genes (Fig. 2g and Supplementary Table 10). Markers for all SE cell types annotated in at least five cancer types in the combined scRNA-seq tumour atlas were determined (all except B and NK cells).
Reproducibility of cell-state markers
To assess the reproducibility of cell-state marker genes, we parcelled all ten scRNA-seq atlases into discovery (n = 5 cancer types corresponding to those used for the MERSCOPE discovery cohort) and validation (n = 5 remaining cancer types) cohorts, each with non-overlapping cancer types. Marker genes identified in the discovery cohort were evaluated in the validation cohort (Extended Data Fig. 7d) and compared with markers derived from the validation cohort or all ten cancer types using the Jaccard similarity index (Extended Data Fig. 7e).
Annotation of cell states
SE-specific cell states (n = 38) were annotated based on top markers and 135 previously reported reference cell states (Supplementary Table 10). Specifically, we assessed similarity to each of the 135 reference states by computing enrichment scores using AddModuleScore in Seurat (v.4.3.0)67. For each reference state, we then averaged enrichment scores across all cells of the state and tested whether the mean score was significantly higher than that of randomly sampled cells by permutation testing over 1,000 iterations. Of the 38 SE-specific states, 18 showed significant overlap with at least one reference state and were annotated with the most associated reference state by significance (Supplementary Table 10). To augment these assignments, all 38 SE cell states were also annotated based on the corresponding marker gene with the highest LFC compared with other cell states of the same cell type (Fig. 2g, Supplementary Table 10).
Identification of SE consensus markers
To identify SE-specific genes with conservation across cell types, termed consensus markers (Fig. 2h), the top 1,000 markers with positive pan-cancer LFCs (or all positive markers if fewer than 1,000 were available) were selected for each SE cell state from the analysis of scRNA-seq data described above (‘Identification of SE cell-state markers’). Consensus SE markers were then defined as genes with at least 80% conservation across all evaluable cell states in an SE (equivalent to 100% conservation for ecotypes with fewer than five states) for a minimum of 20 markers per SE. For SE2, which comprises a single cell state, we limited consensus markers to those with statistically significant conservation (Q < 0.05) in at least three cancer types. To eliminate overlap among consensus markers, genes associated with multiple SEs were assigned to the SE with the highest number of significantly conserved cancer types. In cases where markers overlapped between SE2 and other SEs, genes were preferentially assigned to non-SE2 states. Given that SE3 had fewer than 20 genes following these steps (n = 16), we augmented it by including genes at a relaxed cell-type-conservation threshold of 60%, selected in order of decreasing conservation across cancer types, until the 20-marker minimum was satisfied. Consensus markers and normalized expression values are provided in Supplementary Table 11 (see also Supplementary Methods).
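The arithmetic behind the 80% conservation rule can be made explicit with a short Python sketch, which shows why the threshold amounts to 100% conservation when an SE has fewer than five evaluable states. The function names and toy marker lists are hypothetical.

```python
def min_states_required(n_states, percent=80):
    """Smallest number of cell states that reaches the conservation threshold
    (integer ceiling of percent * n_states / 100)."""
    return (percent * n_states + 99) // 100

def consensus_markers(markers_by_state, percent=80):
    """Genes that are markers in at least `percent` of the cell states of an SE."""
    required = min_states_required(len(markers_by_state), percent)
    counts = {}
    for markers in markers_by_state.values():
        for gene in set(markers):
            counts[gene] = counts.get(gene, 0) + 1
    return sorted(g for g, c in counts.items() if c >= required)

four_states = {
    "s1": ["A", "B", "C"],
    "s2": ["A", "B"],
    "s3": ["A", "C"],
    "s4": ["A", "B"],
}
five_states = dict(four_states, s5=["B"])
strict = consensus_markers(four_states)     # 4 of 4 states required
relaxed = consensus_markers(five_states)    # 4 of 5 states required
```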
Biological pathways associated with SE consensus markers
To identify biological pathways associated with SE consensus markers, we performed overlap analysis using the enricher function from the clusterProfiler (v.4.14.6) R package84. Consensus marker sets were individually evaluated against hallmark (H) and biological process (C5:BP) gene sets from MSigDB85. Pathways with significant overlap (Q < 0.1) were retained, and for each SE, pathways showing the strongest overlap relative to other SEs were selected (Fig. 2h and Extended Data Fig. 7h).
Association between SEs and carcinoma ecotypes
To study the relationship between SEs and previously defined CEs6, SEs and CEs were recovered from the same scRNA-seq data across ten cancer types (Supplementary Table 1) using SE-specific and previously published6 recovery methods, respectively. The fraction of cells in each SE i that were also assigned to CE j was computed for each dataset, resulting in an overlap matrix O with rows representing nine SEs and columns representing nine CEs (excluding CE7 because of its low validation rate in a previous study6). To control for potential biases from different abundances, permutation experiments were performed by shuffling the cell-state assignments 10,000 times. In each iteration, the matrix O was recomputed, yielding \({O}_{i,j}^{{\rm{rand}}}\). The matrix O was then normalized using the mean and variance of \({O}_{i,j}^{{\rm{rand}}}\) (as described in section ‘Cell-state colocalization’ above), producing a normalized matrix O′, where \({O}_{i,j}^{{\prime} }\) represents the overlap index between SE i and CE j. Finally, the overlap indexes across the ten cancer types were aggregated using Stouffer’s method80 (Extended Data Fig. 7j).
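The permutation-based normalization of a single matrix entry can be sketched as below. This assumes the overlap index is a z-score of the observed overlap against the shuffled distribution; the referenced ‘Cell-state colocalization’ section is not reproduced here, so that form is an assumption, as are the function names.

```python
import random
from statistics import mean, pstdev

def overlap_fraction(se_labels, ce_labels, se, ce):
    """Fraction of cells in SE `se` that are also assigned to CE `ce`."""
    in_se = [c for s, c in zip(se_labels, ce_labels) if s == se]
    return sum(c == ce for c in in_se) / len(in_se)

def overlap_index(se_labels, ce_labels, se, ce, n_perm=10_000, seed=0):
    """Observed overlap standardized by a label-shuffling null distribution."""
    rng = random.Random(seed)
    observed = overlap_fraction(se_labels, ce_labels, se, ce)
    shuffled = list(ce_labels)
    null = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break the SE-CE pairing, keep abundances
        null.append(overlap_fraction(se_labels, shuffled, se, ce))
    return (observed - mean(null)) / pstdev(null)
```

Shuffling preserves the abundance of every class, so the index reflects co-assignment beyond what class sizes alone would produce.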
Validation of SE deconvolution
Cross-validation over training pseudo-bulks
To evaluate the Spatial EcoTyper deconvolution model (section ‘Deconvolution of SEs from bulk RNA-seq’ above), we first applied a LOOCV procedure in which we trained the NMF model on pseudo-bulk GEPs from nine cancer types (n = 900 mixtures) and applied the trained model to pseudo-bulk GEPs from the remaining cancer type (n = 100 mixtures; see section ‘Construction of pseudo-bulk mixtures’ above). Consistency between predicted and ground-truth SE abundances was assessed by Pearson correlation (Fig. 3a, Extended Data Fig. 8a–c and ‘Benchmarking of SE deconvolution’ in Supplementary Methods).
Paired bulk RNA-seq and scRNA-seq
We further evaluated the Spatial EcoTyper deconvolution model using paired scRNA-seq and bulk RNA-seq data from cohort 1 (Fig. 3a,d, Extended Data Fig. 8d and Supplementary Table 12). SE cell states were first recovered from scRNA-seq data (section ‘Recovery of cell states and SEs in ST and scRNA-seq validation datasets’ above) and SE abundances defined as the number of cells assigned to each SE over total evaluable cells per sample. Because mitochondrial quality-control filtering (section ‘Data collection and processing’ above) disproportionately removed cancer cells from a minority of samples, cancer and TME abundances were rescaled to match their proportions in the mitochondrial-unfiltered data, and SE proportions were adjusted accordingly. Next, SE abundances were inferred from bulk RNA-seq using the Spatial EcoTyper deconvolution model (section ‘NMF model training for bulk deconvolution’ above). Owing to the limited number of samples, which could bias the centring and unit variance normalization of gene expression required for SE deconvolution, we combined cohort 1 and TCGA RNA-seq data from matched cancer types, removed batch effects between the two datasets using the ComBat function from the sva R package (v.3.46.0) with default parameters86, and then normalized the batch-corrected gene expression to zero mean and unit variance per gene across all samples. This process was conducted for melanoma and colon cancer separately, and for intact and digested bulk RNA-seq datasets separately (‘Bulk RNA sequencing’ in Supplementary Methods). The resulting data were used to infer SE abundances with Spatial EcoTyper (Extended Data Fig. 8e and ‘Benchmarking of SE deconvolution’ in Supplementary Methods).
Paired bulk RNA-seq and Visium ST
We downloaded bulk RNA-seq data as FASTQ files (n = 54 samples) and preprocessed Visium ST profiles (n = 103 samples) from 47 patients spanning five carcinoma types from the HTAN44. These data were processed and subjected to quality control as described in ‘Bulk RNA sequencing’ and ‘Paired bulk RNA-seq and ST quality control’, respectively, in Supplementary Methods, resulting in matched pairs from 42 of 47 patients, comprising 46 bulk RNA-seq and 88 Visium samples. Next, bulk RNA-seq data (log2 TPM) were scaled to unit variance per gene across samples for input to the Spatial EcoTyper deconvolution model (section ‘NMF model training for bulk deconvolution’ above). The log2 CPM data from each Visium sample were scaled to mean of zero and unit variance per gene across all spots. Deconvolution was then performed for each sample separately to determine SE abundances across Visium spots. To obtain sample-level SE abundances accounting for geographic variation in cell density, we estimated cell counts per spot using CytoSPACE (v.1.0.3)25. We then used these count estimates to compute a weighted average of SE levels across all spots and renormalized the resulting SE levels to unit sum in each sample. We averaged SE levels across samples per modality for each patient. The resulting SE abundances from Visium and matched bulk RNA-seq were then compared (Fig. 3b–d, Extended Data Fig. 8h,i). We also repeated this analysis, replacing bulk RNA-seq data with pseudo-bulk profiles constructed from Visium samples as described in Supplementary Methods, but without batch correction (Extended Data Fig. 8h).
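The cell-count-weighted aggregation of spot-level SE abundances can be sketched as follows. The function name `sample_level_se` and the input layout (one dictionary of SE levels per spot, plus a list of estimated cell counts such as those from CytoSPACE) are illustrative assumptions.

```python
def sample_level_se(spot_se, spot_cells):
    """Weighted average of spot-level SE levels, renormalized to unit sum.

    spot_se:    list of {SE name: level}, one dict per spot
    spot_cells: estimated number of cells in each spot (the weights)
    """
    total = sum(spot_cells)
    weighted = {
        se: sum(w * spot[se] for spot, w in zip(spot_se, spot_cells)) / total
        for se in spot_se[0]
    }
    norm = sum(weighted.values())
    return {se: v / norm for se, v in weighted.items()}
```

Weighting by cell count means that densely populated spots contribute more to the sample-level estimate than sparse ones.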
Paired Visium and single-cell-scale ST
To assess whether spot-level deconvolution from bulk Visium data is consistent with single-cell-scale SE recovery (Fig. 3e–f and Extended Data Fig. 9a,b), we prospectively generated paired Visium data (quality control and processing as described in ‘10x Visium (standard)’ in Supplementary Methods) and MERSCOPE data (‘Vizgen MERSCOPE’ in Supplementary Methods) from adjacent melanoma sections (melanoma 3 from patient WU2109; Supplementary Tables 12, 17 and 20). We also downloaded matched Visium and Visium HD (8-μm2 bins) data for adjacent colon cancer sections (‘Visium HD, Sample P2 CRC’ and ‘Visium CytAssist v2, Sample P2 CRC’) from 10x Genomics (https://www.10xgenomics.com/platforms/visium/product-family/dataset-human-crc) and preprocessed them as described in section ‘Data collection and processing’ above. To co-register adjacent tissue sections profiled by Visium (standard) ST and paired single-cell-scale ST data (MERSCOPE or Visium HD), we manually selected four reference points at the edges of distinct morphological structures visible in both datasets. These references were used to learn a linear affine transformation function, which was subsequently applied to transform all coordinates from Visium into the coordinate space of the paired single-cell-scale ST dataset.
SE abundances were inferred for all Visium spots as described above (‘Paired bulk RNA-seq and Visium ST’). SEs were recovered from all single-cell-scale ST data as described in section ‘Recovery of cell states and SEs in ST and scRNA-seq validation datasets’ above. Two strategies were used to overcome potential imprecision in the co-registration procedure. First, SE abundances in single-cell-scale ST data corresponding to each co-registered Visium spot were estimated as the fraction of cells or bins assigned to each SE within a SN of 50 µm radius, requiring at least five cells or bins for robustness. Second, SE abundances in both datasets were smoothed by averaging across each co-registered spot and its six nearest neighbours. Non-SE cells, comprising cancer cells, non-SE TME cells and low-confidence cells (for example, because of limited gene detection; section ‘Recovery of SE-specific cell states’ above), were excluded from analysis.
Concordance between platforms was determined by Spearman correlation, adjusting for background dependencies between SE levels in the paired single-cell-scale ST sample (Fig. 3e–f and Extended Data Fig. 9a,b). To do this, pairwise Spearman correlations were computed between the levels of each Visium-derived SE i and each single-cell-scale-ST-derived SE j, conditioning on SE i in the single-cell-scale ST data, for all non-matching SE pairs, using the pcor function in the R package ppcor (v.1.1)87. Direct Spearman correlations were calculated for all matching SE pairs. The P-values of the resulting correlation coefficients were transformed into signed –log10 Q-values indicating the polarity of the correlation following Benjamini–Hochberg correction81.
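The conversion of correlation P-values into signed –log10 Q-values can be sketched as below. The Benjamini–Hochberg step is the standard step-up procedure; the function names and the floor that avoids taking the logarithm of zero are assumptions added for illustration.

```python
from math import log10

def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted P-values (Q-values), in input order."""
    n = len(pvals)
    order = sorted(range(n), key=lambda i: pvals[i], reverse=True)
    qvals = [0.0] * n
    running_min = 1.0
    for offset, i in enumerate(order):
        rank = n - offset  # 1-based rank of this P-value in ascending order
        running_min = min(running_min, pvals[i] * n / rank)
        qvals[i] = running_min
    return qvals

def signed_neg_log10_q(rhos, pvals):
    """-log10(Q) carrying the sign of the correlation coefficient."""
    qvals = benjamini_hochberg(pvals)
    # Assumption: floor Q at a tiny value so log10 is always defined.
    return [(1 if r >= 0 else -1) * -log10(max(q, 1e-300))
            for r, q in zip(rhos, qvals)]
```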
Large-scale assessment of SE levels in human tumours
Overall survival and pathway analysis
We applied the Spatial EcoTyper deconvolution model (section ‘NMF model training for bulk deconvolution’ above) to infer SE levels from 7,076 bulk tumour RNA-seq profiles across 17 cancer types from TCGA, including melanoma and 16 carcinomas (Supplementary Table 13). SE deconvolution was performed separately for each cancer type using TPM data obtained from the PanCanAtlas (https://gdc.cancer.gov/about-data/publications/pancanatlas), which were log2-adjusted. To investigate SE prognostic associations, a Cox regression analysis was conducted to examine the association between SE abundance and patient overall survival, adjusting for age and sex, using the survival R package (v.3.6.4). This analysis was done separately for each cancer type (Fig. 3g and Supplementary Table 14). To determine the pan-cancer survival associations of SEs, a meta-analysis was done by combining the resulting z-scores from each SE across all 17 cancer types using Stouffer’s method80. For clarity, all z-scores and meta z-scores were converted to directional –log10 P-values (Fig. 3g and Supplementary Table 14). For pathway analysis details (Extended Data Fig. 9d), see ‘Pathways associated with inferred SE abundance in TCGA’ in Supplementary Methods. For the analysis in Extended Data Fig. 9e, MHC levels were computed by first averaging the log2 expression of MHC-I (HLA-A/B/C) and MHC-II genes (HLA-D*) separately, then averaging the resulting quantities together. Stromal levels were assessed using ESTIMATE (v.1.0.13)88.
Associations with immunotherapy response
We obtained publicly available bulk tumour RNA-seq data from patients with melanoma and carcinoma treated with ICIs, including anti-PD-1, anti-PD-L1 and combinations of anti-PD-1 and anti-CTLA-4 therapies, after tumour sample collection. All patients were grouped into responders (partial or complete response) and non-responders (stable or progressive disease) based on collected clinical information. To mitigate within-dataset heterogeneity, patients who received prior immunotherapy or chemotherapy were separated into independent datasets. A minimum of five responders and five non-responders was required for each dataset, resulting in 1,249 total patients from 15 datasets from 12 studies51,89,90,91,92,93,94,95,96,97,98,99, representing four cancer types (melanoma and three carcinoma types) (Supplementary Table 15). All expression data were normalized to TPM before analysis.
Using the Spatial EcoTyper deconvolution model, we predicted SE abundances across tumours in each dataset. We also evaluated the activity of publicly available transcriptional features associated with immunotherapy response100, including carcinoma ecotypes6, T-cell dysfunction58, T-cell exclusion, microsatellite instability (MSI)58, tumour immune dysfunction and exclusion (TIDE)58, immune resistance signatures101, IMPRES102, TLS signatures57,103, cytolytic score (GZMA and PRF1)104, MHC-I signature (HLA-A, HLA-B, HLA-C, B2M and CASP8), PD-L1 (CD274), 18-gene inflammatory signatures55, combined tumour and immune signals (MAP4K and TBX3)105, an M1 macrophage signature56 and an IFNG signature55. The activity of carcinoma ecotypes from ref. 6, T-cell dysfunction, exclusion, MSI and TIDE from ref. 58, immune resistance signatures from ref. 101 and IMPRES from ref. 102 was evaluated using their respective algorithms with default settings. For the remaining features, average gene expression was computed using log2 TPM data.
The association between each feature and ICI response was assessed using a z-score derived from a two-sided Wilcoxon rank-sum test, within each dataset. These z-scores were then combined across datasets using Liptak’s method106, weighted by the square root of sample sizes. The resulting combined z-scores were converted to two-sided P-values (Supplementary Table 16). For the analysis in Fig. 3h, data from all four cancer types were included, whereas the comparison in Fig. 5b was restricted to melanoma datasets.
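The weighted combination of per-dataset z-scores can be sketched as follows, using weights equal to the square root of each dataset's sample size as stated above. The function name `liptak_combine` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def liptak_combine(z_scores, sample_sizes):
    """Weighted z-score combination with a two-sided P-value."""
    weights = [sqrt(n) for n in sample_sizes]
    z = sum(w * zi for w, zi in zip(weights, z_scores)) / sqrt(
        sum(w * w for w in weights)
    )
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p
```

With equal weights this reduces to Stouffer's method, so two datasets of equal size with z = 2 each combine to z = 2√2.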
SE levels versus TMB and CD274 expression in ICI-treated patients
Of the pretreatment bulk RNA-seq tumour profiles analysed in ‘Associations with immunotherapy response’, 465 patients from four studies with melanoma (n = 150), non-small cell lung cancer (n = 43) or bladder cancer (n = 272) had whole-exome-sequencing-derived TMB data available (Supplementary Table 15). TMB values, defined as the total number of non-synonymous mutations per patient, and CD274 (encoding PD-L1) expression levels were log2-transformed. For each dataset, univariate Cox proportional hazards regression models were fitted to evaluate the association between standardized feature levels (SE7, SE8, SE4, TMB and CD274 expression) and overall survival. The resulting HRs and their associated standard errors were pooled across datasets within each cancer type, and across cancer types, using a nested random-effects meta-analysis implemented in the rma.mv function of the metafor R package107 (v.4.8.0), with default parameters. Specifically, for each covariate, we used rma.mv to combine the log-hazard ratios and their corresponding variances (standard error squared) across outer and inner grouping factors (cancer type and dataset, respectively), then extracted pooled HRs, 95% confidence intervals and associated P-values to generate the forest plot in Fig. 3i. Because all covariates were standardized, each HR reflects the association with overall survival for the same (1 s.d.) change in the predictor, enabling direct comparison of their relative influence. For the analysis in Extended Data Fig. 9f, multivariable Cox proportional hazards regression models were applied to each dataset, including standardized levels of SE8, SE7 or SE4 jointly with TMB and CD274 expression. The resulting log-HRs and standard errors were combined using the same nested random-effects framework described above.
Liquid EcoTyper framework
Current analyses of the tumour microenvironment rely on invasive solid-tumour biopsies, which are prone to sampling bias and generally restricted to a single diagnostic biopsy. This limitation hinders the application of SEs as biomarkers in clinical settings. To address these challenges, we developed Liquid EcoTyper, a deep-learning framework for non-invasive profiling of SEs using plasma cfDNA methylation profiles.
Liquid EcoTyper is built around a CpG set binary network (CSBN), in which informative CpG sets and associated weights are learnt simultaneously within a unified framework to enable multivariate prediction of SE levels (Fig. 4a). This CSBN approach draws on the gene set binary network model originally introduced for predicting cellular developmental potential from single-cell RNA-seq data108. Notably, sample methylation profiles are encoded at the CpG set level for model inference, analogous to gene sets in the previous study. This representation improves robustness to batch effects and technical dropout in methylation sequencing data, and enhances generalizability across data types, including both tumour and plasma methylation profiles.
Liquid EcoTyper network architecture
Input and output
As input, the model takes an n × s matrix X containing the preprocessed methylation levels of n CpGs over s samples (section ‘Methylation data preprocessing’ below). At evaluation, the model yields an s × l matrix Ŷ containing the predicted levels of l classes over s samples. Model classes include SEs, non-SE (section ‘Recovery of SE-specific cell states’ above) and a ‘background’ class representing DNA not derived from the tumour compartment (Fig. 4a).
CpG set binary network
Model input X is passed to a core binary module in which m CpG sets are learnt in binary n × m matrix WB. As described previously108, WB has a continuous equivalent W used during training for model initialization and back-propagation that undergoes binarization at each forward pass. Here, CpG sets encoded within WB are scored simply by averaging normalized methylation values over the selected CpGs per sample:
where \({S}_{\bullet ,j}\) and \({W}_{\bullet ,j}^{B}\) denote the j-th columns of the respective matrices. Scores are standardized across samples by batch normalization, yielding s × m score matrix Snorm.
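The scoring step described above can be written compactly. Assuming that the score of each set is the mean of the normalized methylation values over the CpGs selected in that set, one form consistent with the stated dimensions is given below. The symbol \({{\bf{1}}}_{n}\), a length-n vector of ones, is introduced here for illustration only.

\[{S}_{\bullet ,j}=\frac{{X}^{\top }{W}_{\bullet ,j}^{B}}{{{\bf{1}}}_{n}^{\top }{W}_{\bullet ,j}^{B}}\]

In this form the numerator sums, for each of the s samples, the methylation values of the CpGs with a nonzero entry in column j of WB, and the denominator counts those CpGs.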
Prediction layer
The CpG set scores encoded in Snorm are then passed through a linear layer to produce s × l matrix Q. This matrix is then transformed using the sigmoid function σ and rescaled to yield the final output prediction Ŷ:
where \({\hat{Y}}_{i,\bullet }\) and \({P}_{i,\bullet }\) denote the i-th rows of the respective matrices.
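As a hedged reading of the prediction layer, suppose that P is the element-wise sigmoid of Q and that the rescaling step normalizes each sample's predictions to unit sum. Both points are assumptions, motivated by the ground-truth class levels summing to one per sample. Under those assumptions the output would take the form:

\[P=\sigma (Q),\qquad {\hat{Y}}_{i,\bullet }=\frac{{P}_{i,\bullet }}{{\sum }_{k=1}^{l}{P}_{i,k}}\]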
Liquid EcoTyper training
The Liquid EcoTyper model was implemented and trained using PyTorch 2.2.0. In practice, the model feature space comprised n = 38,431 CpGs and the CpG set binary module comprised m = 400 CpG sets.
Loss function
For model training we define a custom loss function incorporating both the mean cross-SE Pearson correlation per sample and the mean cross-sample Pearson correlation per SE. In combination with the model structure, this loss function prioritizes robust maintenance of linear relationships, particularly of SE levels across samples—essential for clinical utility—while discouraging overfitting to training values. In detail, we define prediction loss as follows:
where Ŷ and Y denote predicted and true sample level matrices, respectively, and subscripts indicate matrix columns. Along with the prediction loss, we also include a term penalizing CpG set size as described previously78, designed to provide additional regularization to the model. Full model loss is then computed as
where \(\lambda \) denotes the CpG set size penalty weight, I denotes the m × m identity matrix, ⊙ denotes the Hadamard product, and \({\parallel \bullet \parallel }_{F}\) denotes the Frobenius norm. In practice, \(\lambda \) was set to \(\sqrt{10}\) to balance loss terms.
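A correlation-based prediction loss of the kind described above can be sketched in plain Python. The exact combination used by the authors is not reproduced in this text, so the form below (one minus the average of the two mean correlations) is an assumption, the CpG set size penalty is omitted, and the names `pearson` and `pred_loss` are illustrative.

```python
from math import sqrt
from statistics import mean

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)

def pred_loss(y_pred, y_true):
    """Rows are samples, columns are classes (SEs).

    Assumed form: 1 - average of (mean per-sample correlation across SEs)
    and (mean per-SE correlation across samples).
    """
    per_sample = mean(pearson(p, t) for p, t in zip(y_pred, y_true))
    per_class = mean(pearson(p, t) for p, t in zip(zip(*y_pred), zip(*y_true)))
    return 1 - (per_sample + per_class) / 2
```

Because Pearson correlation is invariant to positive affine rescaling, a loss of this type rewards correct relative ordering and linear structure rather than exact agreement of values.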
Model regularization
To support robustness to technical drop-out in EM-seq data, drop-out was applied to model input matrix X during training at a rate of 0.5.
Model initialization and updates
Model weights were given default initialization except binary weight matrix W, which here was initialized with values sampled from the Gaussian distribution with mean μ = –0.125 and standard deviation σ = 0.055 to give a highly sparse initial matrix targeting 400–500 CpGs initially selected per set. At each iteration, model parameters were updated using PyTorch’s NAdam optimizer with learning rate lr = 0.001 and cross-epoch gradient accumulation given the stabilizing role of inertia in training binary neural networks109,110.
Model training, evaluation and stopping
Models were trained over ten random splits (80% training, 20% validation) of the simulated training cohort (section ‘Simulation of plasma cfDNA with tumour contribution’ below). Each model was trained for 40 epochs, with model performance by epoch evaluated over training and validation sets using the PredLoss function (‘Loss function’ above). Performance on the validation split was used for early stopping: final model weights were taken from the epoch yielding the best validation performance.
Model ensembling
For all evaluations outside the training framework, the outputs of all ten models (one per random split) were averaged to yield a single ensembled prediction matrix.
Initial feature selection
To prepare a reduced model feature space for efficient training, informative CpGs per SE were selected from the TCGA melanoma cohort, which includes paired methylation and bulk RNA-seq profiles (n = 461 tumours) obtained from the TCGA PanCanAtlas (https://gdc.cancer.gov/about-data/publications/pancanatlas). First, the set of CpGs was reduced to those detected across all TCGA melanoma tumours. For each SE, we then performed differential methylation analysis in the training cohort to identify differentially methylated CpGs. Specifically, for each SE class, we grouped samples with inferred SE abundance from paired bulk RNA-seq data above the 75th or below the 25th percentile. We then assessed each CpG for differential methylation between the two groups. We computed the significance of the group difference by the Wilcoxon rank-sum test and its magnitude by the absolute value of the difference of group means, \(\Delta =|{\mu }_{75}-{\mu }_{25}|\), selecting CpGs satisfying P < 0.05 and \(\Delta \) > 0.1. If more than 5,000 CpGs met both criteria for an SE, we selected the top 5,000 by differential methylation magnitude. Selection of up to 5,000 CpGs was done separately for positive and negative differential methylation, allowing for a total of up to 10,000 CpGs to be selected per SE. To ensure adequate features for SE recovery from methylation data, we required a minimum of 1,000 informative CpGs per SE. In practice, ecotype SE6 did not meet this criterion for the melanoma model and was omitted. In total, this yielded a feature space of 38,431 CpGs.
Methylation data preprocessing
Mapping EM-seq CpGs to HM450K probe IDs
Methylation observations from EM-seq data, given in terms of CpG locations (chromosome, start and end coordinates), were matched to HM450 probe IDs using the GRCh38 HM450 manifest downloaded from https://zwdzwd.github.io/InfiniumAnnotation. In brief, the manifest was reduced to entries with defined values for CpG location as well as HM450 probe ID. Any duplicate entries per CpG location were subsequently dropped, yielding a one-to-one mapping.
Normalization and imputation
Data were subset to model features with methylation values subtracted from one and normalized to zero mean and unit variance over model CpGs per sample. Missing values were then imputed per CpG by dataset mean when defined and replaced by zeros otherwise.
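The normalization and imputation steps can be sketched as follows. Several details are assumptions made for illustration: missing values are represented by `None`, per-sample statistics are computed over observed CpGs only, and the population standard deviation is used.

```python
from statistics import mean, pstdev

def preprocess(samples):
    """samples: one list of methylation values per sample, over model CpGs.
    Returns per-sample standardized (1 - value) with missing entries imputed."""
    normalized = []
    for betas in samples:
        flipped = [None if b is None else 1.0 - b for b in betas]
        observed = [v for v in flipped if v is not None]
        mu, sd = mean(observed), pstdev(observed)
        normalized.append([None if v is None else (v - mu) / sd for v in flipped])
    # Impute per CpG: dataset mean if any sample has a value, otherwise zero.
    for j in range(len(samples[0])):
        column = [row[j] for row in normalized if row[j] is not None]
        fill = mean(column) if column else 0.0
        for row in normalized:
            if row[j] is None:
                row[j] = fill
    return normalized
```

Since each sample is standardized to zero mean, the zero fallback for a CpG missing from the whole dataset corresponds to that sample's average level.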
Simulation of plasma cfDNA with tumour contribution
Because Liquid EcoTyper requires training data with known composition, and because of the limited availability of paired tumour and plasma cfDNA data, we simulated a training dataset by combining plasma cfDNA and tumour methylation profiles (Fig. 4b and Extended Data Fig. 10a). In doing so, we considered several factors. First, different malignant and TME cell states may exhibit variable kinetics of DNA shedding into circulation. We therefore prioritized cross-patient relative levels of SEs during training (‘Loss function’ above). Second, and relatedly, real biopsy samples, including those from patients with metastatic cancer, are prone to sampling bias, so plasma and ground-truth tumour SEs need not match exactly. Third, even cfDNA samples from patients with cancer with undetectable circulating tumour DNA by mutation analysis may include some tumour (and TME) methylation signal. Therefore, to isolate TME-derived signal during model training, we leveraged healthy donor plasma cfDNA to serve as a background class. Finally, inter-subject variation is much lower among cfDNA samples from healthy controls than from patients with cancer in our data (Extended Data Fig. 10b). We therefore considered it justifiable to combine methylomes from different healthy individuals to increase variation and boost the size of the training cohort (‘Generation of additional background cfDNA profiles’ below).
Accordingly, we designed simulated mixtures to contain both ‘background’ cfDNA and tumour tissue contributions, with the background compartment derived from plasma cfDNA profiled from cohort 2 healthy individuals (n = 23 in total; partitioned into n = 18 for training cohort and n = 5 for test cohort) and the tumour compartment derived from the TCGA melanoma methylation cohort (n = 461 in total; partitioned into n = 346 for training cohort and n = 115 for test cohort) (Fig. 4b,c and Extended Data Fig. 10a). Paired bulk RNA-seq data from TCGA tumour samples enabled the recovery of tissue SE levels through bulk deconvolution, as described in section ‘NMF model training for bulk deconvolution’ above.
Generation of additional background cfDNA profiles
Given the rationale outlined above, to expand the pool of healthy cfDNA profiles, we prepared new profiles from the samples of each cohort separately to reach the same number of samples as available from tumour tissue. For each new profile, we randomly selected three samples from the appropriate cohort (training or test), perturbing the methylation profiles of each sample with noise to reduce collinearity across new mixtures, then combining according to randomly generated fractions.
In detail, fractional composition was generated by sampling from the unit interval uniformly at random for each sample, then rescaling the three values to sum to one, yielding mixing fractions ϕ1, ϕ2 and ϕ3. Noise perturbations were applied multiplicatively to raw methylation values, with noise level parameters ν1, ν2 and ν3 generated by sampling from the normal distribution. Given sample methylation profiles M1, M2 and M3, perturbed profiles \({\hat{M}}_{1},{\hat{M}}_{2}\,{\rm{and}}\,{\hat{M}}_{3}\) were computed as follows:
where s denotes a scaling parameter, set here to s = 0.02, and min operates element-wise to cap resulting methylation values at 1. With these mixing parameters defined, the resulting new healthy cfDNA profile Mnew is given by:
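The generation of one additional background profile can be sketched as below. The displayed equations are not reproduced in this text, so the perturbation form used here, \(\min ({M}_{i}(1+s{\nu }_{i}),1)\) with one standard-normal draw per sample, is an assumption, as is the function name. The mixing step (uniform draws rescaled to sum to one, followed by a weighted sum) follows the description above.

```python
import random

def new_background_profile(profiles, s=0.02, seed=0):
    """Mix three randomly chosen, noise-perturbed healthy cfDNA profiles."""
    rng = random.Random(seed)
    chosen = rng.sample(profiles, 3)
    raw = [rng.random() for _ in chosen]
    fractions = [r / sum(raw) for r in raw]  # phi_1..phi_3, summing to one
    perturbed = []
    for profile in chosen:
        nu = rng.gauss(0.0, 1.0)  # assumed: one noise level per sample
        # Assumed multiplicative perturbation, capped element-wise at 1.
        perturbed.append([min(m * (1.0 + s * nu), 1.0) for m in profile])
    n_cpgs = len(chosen[0])
    return [sum(f * p[k] for f, p in zip(fractions, perturbed))
            for k in range(n_cpgs)]
```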
Combination of background and tumour compartments
With the resulting background cfDNA profiles now matching the tumour tissue profiles in number, we generated the final sets of simulated samples by matching background and tumour profiles one-to-one in the training and test cohorts (Extended Data Fig. 10a). For each simulated sample, we generate mixing fractions ϕT and ϕB of tumour and background, respectively, by sampling ϕB uniformly at random from a desired range, then computing ϕT = 1 − ϕB as the complement. This range was selected here as [0.2, 0.6] to weight fractions in favour of tumour contribution while preserving substantial background for regularization to support model extensibility to both tumour tissue and plasma cfDNA methylation profiles. Of note, tumour samples contain highly variable malignant cell content (‘Validation of simulated samples’ below), capturing a broad range of both malignant and TME DNA (Extended Data Fig. 10b). As such, this approach enables sufficient exposure of the model to diverse tumour fractions during training, facilitating generalizability. The resulting simulated samples are then constructed as:
where MT denotes the raw methylation profile of the selected tumour tissue sample and MB denotes the raw methylation profile of the selected background profile, newly generated as described above.
Ground-truth composition of simulated samples
Tumour-specific SE levels used for simulating cfDNA above were inferred by applying the Spatial EcoTyper deconvolution model to tumour RNA-seq profiles as described above in section ‘NMF model training for bulk deconvolution’. Following this, any SE with inadequate feature support in Liquid EcoTyper (for example, SE6 in the melanoma-specific model) was omitted and the remaining sample-level deconvolution results were rescaled to unit sum. To define ground-truth SE levels in simulated plasma cfDNA samples, we then scaled the SE levels in corresponding tumour samples by ϕT, with the ground-truth background level given by ϕB, for each simulated sample.
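The construction of one simulated sample and its ground-truth composition can be sketched as follows. The linear mixture \({\phi }_{T}{M}_{T}+{\phi }_{B}{M}_{B}\) is assumed from the description of mixing fractions, and the function name and dictionary layout are hypothetical.

```python
import random

def simulate_sample(tumour_profile, background_profile, tumour_se, rng,
                    bg_range=(0.2, 0.6)):
    """Return (mixed methylation profile, ground-truth class levels)."""
    phi_b = rng.uniform(*bg_range)  # background fraction
    phi_t = 1.0 - phi_b             # tumour fraction (complement)
    mixed = [phi_t * t + phi_b * b
             for t, b in zip(tumour_profile, background_profile)]
    # Rescale tumour SE levels to unit sum, then scale by the tumour fraction.
    total = sum(tumour_se.values())
    truth = {se: phi_t * level / total for se, level in tumour_se.items()}
    truth["background"] = phi_b
    return mixed, truth
```

By construction, the ground-truth levels of all classes sum to one for every simulated sample.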
Validation of simulated samples
To determine whether simulated profiles are a suitable proxy for authentic cfDNA profiles, we performed a direct assessment against real cfDNA samples from healthy individuals and patients with melanoma. Profiles from cohort 2 healthy individuals (n = 23), cohort 3 patients with melanoma (n = 78) and simulated training and test cohorts (n = 461) were jointly embedded by PCA. This embedding revealed tight clustering of healthy plasma cfDNA methylomes, with some melanoma cfDNA samples overlapping the healthy region but many others scattering out further (Extended Data Fig. 10b, left). Simulated cfDNA methylomes overlapped real melanoma cfDNA methylomes (Extended Data Fig. 10b, left).
Given these results, we hypothesized that the embedding distribution was organized by sample tumour content. Thus, we extended the visualization to colour by ctDNA percentage (Extended Data Fig. 10b, right). For real cfDNA samples, ctDNA percentage was given by AVENIO for patients with melanoma (n = 60; Supplementary Table 26) and set to zero for healthy individuals. We computed an effective ctDNA percentage for simulated profiles by multiplying the total tumour fraction by the tumour purity using consensus measurement of purity estimations111, available for 456 of 461 samples. The resulting embedding revealed a gradient of tumour content across the embedding for both real and simulated melanoma cfDNA samples, with samples harbouring the lowest tumour content placed closer to healthy plasma (Extended Data Fig. 10b, right).
Performance assessment of Liquid EcoTyper
Application to held-out simulated data
We tested Liquid EcoTyper’s ability to recover ground-truth levels of SEs from plasma methylation profiles by application to the held-out test cohort of simulated data (n = 115; section ‘Simulation of plasma cfDNA with tumour contribution’ above). Performance for each SE individually, as well as for recovery of the underlying cross-correlation structure of SE levels, was assessed by Spearman correlation between ground-truth and predicted levels (Fig. 4c and Extended Data Fig. 10c).
Extension to carcinomas
To evaluate generalizability, we prepared Liquid EcoTyper models for 13 carcinoma types profiled by TCGA, each with paired bulk RNA-seq and methylation arrays for more than 100 tumour samples. Colon adenocarcinoma (COAD) and rectum adenocarcinoma (READ) were combined as colorectal cancer (CRC). For each carcinoma type, we repeated the procedures described in sections ‘Initial feature selection’, ‘Simulation of plasma cfDNA with tumour contribution’, ‘Model training’ and ‘Application to held-out simulated data’ above. As detailed in ‘Initial feature selection’, any SE with initial feature set selection resulting in fewer than 1,000 associated CpGs was excluded from model training and evaluation (SE1 from two carcinomas, SE3 from one carcinoma, SE6 from four and SE9 from one). We aggregated model performance results across carcinoma types by average Spearman correlation coefficients between predicted (Liquid EcoTyper) and expected SE levels for held-out test sets (Extended Data Fig. 10d, left).
We also evaluated a pan-carcinoma leave-one-out framework, training Liquid EcoTyper models over 12 carcinoma types at a time (150 randomly selected tumour samples per type to balance representation, totalling 1,800 training samples) and evaluating performance on each held-out carcinoma type in turn. For each pan-carcinoma model, we followed the same process as described above, with the exception that the initial CpG feature set for each cohort was selected according to a consensus mechanism across included carcinoma types. For each SE and methylation direction, CpGs were ranked by selection frequency across per-carcinoma feature sets and the top 5,000 were included. With this approach, all SEs had adequate CpG feature set coverage. To promote a fair comparison with carcinoma-specific models (above), for each held-out carcinoma type c, we excluded any SEs from model evaluation that were also excluded from the model exclusively trained on carcinoma c. Results were aggregated as described above (Extended Data Fig. 10d, centre).
Paired tumour and plasma from patients with melanoma
We assessed concordance of Liquid EcoTyper predictions for plasma cfDNA collected from 23 patients with melanoma (cohort 2) (Fig. 4d) against: first, SE levels inferred from matched tumour Visium or Visium HD by Spatial EcoTyper (15 patients); second, Liquid EcoTyper predictions from matched tumour EM-seq (20 patients); and third, Liquid EcoTyper predictions from matched PBMC EM-seq (Supplementary Table 17), as shown in Fig. 4e–i and Extended Data Fig. 11b.
For five tumour samples, EM-seq was performed on both FFPE and fresh-frozen sections (Supplementary Table 17). Given that SE levels inferred by Liquid EcoTyper were largely concordant across replicates (Extended Data Fig. 11a), we averaged SE levels across replicates for subsequent analyses. SE levels for Visium data were inferred as described above in ‘Paired bulk RNA-seq and Visium ST’ following quality control as described in ‘10x Visium (standard)’ in Supplementary Methods. For cases in which Visium replicates of adjacent sections were available (two patients, each with two samples profiled by Visium), we averaged SE levels across replicates (Supplementary Table 17). Two Visium HD samples were also included in the comparison with paired plasma EM-seq data, subject to quality control, as described in ‘10x Visium HD data’ in Supplementary Methods. To avoid platform-specific variation in analysing Visium and Visium HD data jointly, SE abundances for Visium HD were inferred for 16-µm bins using the same Spatial EcoTyper deconvolution protocol as described for Visium (section ‘Paired bulk RNA-seq and Visium ST’ above).
Liquid EcoTyper outputs were renormalized to unit sum excluding inferred plasma cfDNA background levels. We then evaluated the consistency of SE levels between tumour ST and plasma cfDNA EM-seq (Fig. 4e–h), as well as between tumour EM-seq and plasma cfDNA EM-seq (Fig. 4e,i and Extended Data Fig. 11b), and between PBMC EM-seq and both tumour and plasma EM-seq (Fig. 4i). To emphasize relative ordering, SE levels were ranked and normalized to the 0–1 range in each compartment, with concordance then quantified by Spearman correlation for each SE. All of the SE levels are provided in Supplementary Table 18.
To visualize the most prominent SE signals in Fig. 4h (centre), the levels of each SE (SE7 or SE4) in the Visium data were first separately standardized to mean zero and unit variance across all spots and samples. Next, for each sample, the 95th percentile of SE7 and SE4 levels together was determined, and SE levels were binarized by setting all values greater than this threshold to one and all other values to zero. The bar plots in Fig. 4h (bottom) show the means of the binarized data.
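A minimal sketch of this standardize-then-binarize procedure follows. It assumes population standard deviation for the z-score and a linear-interpolation percentile; the text specifies neither, and the function names are hypothetical.

```python
import statistics

def zscore(values):
    """Standardize to mean zero and unit variance (population SD, an assumption)."""
    mu = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [(v - mu) / sd for v in values]

def percentile(values, q):
    """Percentile with linear interpolation between order statistics (q in 0..100)."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)

def binarize_sample(se7_z, se4_z, q=95):
    """Threshold both SEs at the q-th percentile of their pooled values in one sample."""
    thr = percentile(se7_z + se4_z, q)
    b7 = [1 if v > thr else 0 for v in se7_z]
    b4 = [1 if v > thr else 0 for v in se4_z]
    return b7, b4, thr

# Toy standardized spot-level values for a single sample.
se7_spots = [0.0, 1.0, 2.0, 3.0, 10.0]
se4_spots = [0.0, 0.5, 1.0, 1.5, 2.0]
b7, b4, thr = binarize_sample(se7_spots, se4_spots)
mean_se7 = statistics.fmean(b7)  # bar height for SE7 in this sample
mean_se4 = statistics.fmean(b4)  # bar height for SE4 in this sample
```

Pooling the two SEs before thresholding means that the SE with the stronger standardized signal in a sample contributes more spots above the cut-off.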
Validation of Liquid EcoTyper through learnt CpG sets
To further assess the ability of Liquid EcoTyper to learn biologically grounded methylation profiles of spatial cellular ecosystems, we performed a series of experiments leveraging the CpG sets learnt by the model to determine whether model predictions successfully target each SE accurately, specifically and in accordance with known biological features.
Concordance with ground-truth associations
We first assessed whether Liquid EcoTyper successfully learns and preserves the associations of each CpG set with ground-truth SE composition. For each CpG set extracted from the model, we averaged the methylation levels of its component CpGs in each training cohort sample, then compared the resulting quantities to ground-truth SE levels, for each SE, by Pearson correlation. We then replaced ground-truth with predicted SE levels and recomputed the associations for each CpG set–SE pair. Concordance between associations with ground-truth and predicted SE levels was evaluated using Pearson correlation across all CpG sets, separately for each SE (Extended Data Fig. 10e).
Model specificity
To assess the ability of Liquid EcoTyper to target each SE specifically for predictions, we evaluated the impact of ablating the top associated CpG sets per SE on model performance. For each SE, we selected all CpG sets with ground-truth Pearson correlations greater than 0.25 in absolute value among the training cohort (as determined from ‘Concordance with ground-truth associations’ above) and set the corresponding entries of the binary matrix WB encoding the learnt CpG sets to zero for each ensemble model. We then applied the resulting model to the held-out test cohort methylomes, assessing Liquid EcoTyper specificity for the SE in terms of performance loss. We defined a performance loss index as the fractional difference in Spearman correlations ρ of predicted versus ground-truth SE levels between the original (unablated) and ablated Liquid EcoTyper models:
Performance loss index = (ρoriginal − ρablated) / ρoriginal
Performance loss was calculated for all SEs following the ablation of each SE individually and comparing the loss for the ablated SE (on-target) with that for other SEs (off-target) (Extended Data Fig. 10f). For ease of interpretability, performance loss index values were capped to a range of 0 to 1, defined as performance losses of 0% (ρablated ≥ ρoriginal) and 100% (ρablated ≤ 0), respectively.
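As a minimal sketch, the capped index can be computed as follows, assuming the fractional difference takes the form (ρoriginal − ρablated)/ρoriginal, which is consistent with the 0% and 100% limits stated above. The function name is hypothetical, and the sketch assumes a positive ρoriginal.

```python
def performance_loss(rho_original, rho_ablated):
    """Fractional drop in Spearman correlation after ablation, capped to [0, 1]."""
    if rho_original <= 0:
        # Behaviour for non-positive baseline correlations is not described;
        # refusing to compute is a choice made for this sketch.
        raise ValueError("rho_original must be positive")
    loss = (rho_original - rho_ablated) / rho_original
    return min(1.0, max(0.0, loss))

# On-target ablation should give a large loss; off-target ablation a small one.
on_target = performance_loss(0.8, 0.4)    # half of the correlation is lost
off_target = performance_loss(0.8, 0.85)  # no loss: ablated model is at least as good
collapsed = performance_loss(0.8, -0.1)   # correlation at or below zero
```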
Biological interpretability
Hypomethylated CpGs proximal to the transcription start site (TSS) (for example, promoters, the first exon and the first intron) are generally associated with elevated gene expression112. Because Liquid EcoTyper learns CpG sets that include promoters and gene bodies, and because the Illumina Infinium HumanMethylation450 (HM450) BeadChip array is enriched for regulatory and TSS-proximal CpGs113, we proposed that, on average, learnt CpGs within these regions would reflect this general relationship. Therefore, to investigate links between CpG methylation and SE classes (Extended Data Fig. 10g–i), we focused on model CpGs (section ‘Initial feature selection’ above) that overlap with promoter regions, defined as the 1-kilobase region upstream of the TSS, and the gene body, with hg38 coordinates obtained from the HGNC track at https://genome.ucsc.edu/cgi-bin/hgTables.
For each SE, we constructed a ranked gene list from model CpGs and ground-truth CpG set associations among the training cohort. To do so, each CpG set was first converted to a gene set comprising all of the unique HUGO gene symbols overlapping the selected CpGs in either promoter regions or gene body at least once. For each gene g within a given SE, we then computed the average ground-truth SE association (‘Concordance with ground-truth associations’ above) over all CpG sets in which gene g is represented, resulting in gene-level methylation scores for each SE. The gene-level scores for each SE—along with non-SE and background components—were converted into rank space with higher ranks denoting lower scores. This yielded a rank matrix R consisting of 8,860 evaluable genes (rows) by 10 components (columns). To prioritize SE-specific features, we then calculated, for each gene in R, the delta in rank space between each SE and the maximum rank among other components (the remaining SEs, non-SE and background). This resulted in a new matrix D, comprising deltas for each gene for all ten components. We then removed non-SE and background components from R and D, averaged the resulting matrices to exploit complementary information (ensuring that SEs of the same class were averaged), and median-subtracted each column to yield equal numbers of positive and negative ranks for each SE. Finally, we subjected the resulting SE-specific ranked lists to fgsea (v.1.25.1)114, testing the enrichments of SE-specific consensus genes (Extended Data Fig. 10g,h and Supplementary Table 11). Global significance (Extended Data Fig. 10i) was determined by calculating the number of top-ranked diagonal matches (row or column) across all possible permutations of the ranked lists, yielding an exact permutation P-value.
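The first step of this procedure, converting CpG sets to gene sets and averaging associations per gene, can be sketched as below for one SE. The sketch covers only that step and omits the rank-space conversion, delta calculation and enrichment testing. The CpG-to-gene mapping, gene symbols and association values are placeholders, and the function name is hypothetical.

```python
from collections import defaultdict
from statistics import fmean

def gene_level_scores(cpg_sets, set_assoc, cpg_to_genes):
    """Average each gene's ground-truth association over the CpG sets that contain it.

    cpg_sets     : list of CpG ID lists (learnt CpG sets)
    set_assoc    : ground-truth association of each CpG set with one SE
    cpg_to_genes : CpG ID -> gene symbols overlapping its promoter or gene body
    """
    per_gene = defaultdict(list)
    for cpgs, assoc in zip(cpg_sets, set_assoc):
        # A gene counts once per CpG set, however many of its CpGs are in the set.
        genes = {g for c in cpgs for g in cpg_to_genes.get(c, ())}
        for g in genes:
            per_gene[g].append(assoc)
    return {g: fmean(vals) for g, vals in per_gene.items()}

# Toy mapping and associations (placeholders, not real annotations).
cpg_to_genes = {
    "cg1": ["GENE_A"],
    "cg2": ["GENE_A", "GENE_B"],
    "cg3": ["GENE_B"],
    "cg4": [],  # CpG outside any annotated promoter or gene body
}
scores = gene_level_scores(
    cpg_sets=[["cg1", "cg2"], ["cg3", "cg4"]],
    set_assoc=[0.6, -0.2],
    cpg_to_genes=cpg_to_genes,
)
```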
Consistency of associations
To assess faithfulness of model predictions to the learnt CpG set associations, we applied the procedure outlined above in ‘Concordance with ground-truth associations’ to compute CpG set associations with predicted SE levels for melanoma plasma cfDNA collected as part of cohorts 2, 3 and 4, split by institution (in total, 79 samples collected at Yale and 17 at Washington University in St. Louis; Supplementary Table 21). We then compared the resulting CpG associations between institutions using Pearson correlation for each SE (Supplementary Fig. 8).
Robustness of Liquid EcoTyper
To evaluate the robustness of plasma SE recovery to technical variation, we performed the following two experiments.
Plasma SE recovery versus sequencing depth
For the analysis in Extended Data Fig. 11c, we performed a down-sampling experiment using cohort 2 plasma samples with paired tumour ST profiling. Aligned read pairs were randomly down-sampled using samtools115 (v.1.18) at predefined fractions to achieve target sequencing depths of 20×, 15×, 10× and 5× for each sample. Down-sampled BAM files were subsequently processed through the remainder of the pipeline described in ‘Enzymatic methyl sequencing’ in Supplementary Methods. Samples with an original sequencing depth below a given target were retained at their original depth. Liquid EcoTyper performance was then re-evaluated as described above in ‘Paired tumour and plasma from patients with melanoma’ using down-sampled plasma samples. Performance remained robust down to 10× depth (the minimum depth in this study), with a significant but modest decline observed at 5× depth (Extended Data Fig. 11c).
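The per-sample subsampling fraction implied by this design can be computed with a short sketch. It assumes that the fraction is simply the ratio of target to original depth, which is the natural choice but is not stated explicitly. The function name and the example depth are hypothetical, and the call to the subsampling tool itself is omitted.

```python
TARGET_DEPTHS = (20, 15, 10, 5)  # target sequencing depths (x coverage)

def downsample_fraction(original_depth, target_depth):
    """Fraction of read pairs to keep so that a sample reaches the target depth.

    Samples already at or below the target are kept as-is (fraction 1.0).
    """
    if original_depth <= target_depth:
        return 1.0
    return target_depth / original_depth

# Example: a sample sequenced at 16x is unchanged at the 20x target and
# subsampled for the 15x, 10x and 5x targets.
fractions = {t: downsample_fraction(16.0, t) for t in TARGET_DEPTHS}
```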
Plasma SE recovery versus CpG imputation approach
To evaluate whether the approach for imputing missing CpG methylation values (‘Methylation data preprocessing’ above) influences Liquid EcoTyper predictions (Extended Data Fig. 11d,e), we tested an alternative imputation strategy in which missing values were imputed uniformly at random from the range [0,1] before data normalization. Although the default approach leverages other methylation profiles in the cohort to infer missing values, this alternative uses no prior knowledge and instead adds random noise. Applied to cohort 2 plasma samples with paired tumour ST profiling, the random-imputation approach had only a small effect on samples with low coverage (higher imputation fraction), and there was no significant global difference in SE recovery performance, emphasizing robustness.
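The alternative imputation strategy is simple enough to sketch directly. Missing values are represented here as `None`, and a fixed seed is used for reproducibility; both are illustrative choices, as are the function names.

```python
import random

def impute_uniform(values, seed=0):
    """Replace missing methylation values (None) with draws from Uniform(0, 1)."""
    rng = random.Random(seed)
    return [rng.uniform(0.0, 1.0) if v is None else v for v in values]

def imputation_fraction(values):
    """Share of CpGs in a sample that had to be imputed."""
    return sum(v is None for v in values) / len(values)

# Toy methylation profile with two uncovered CpGs.
profile = [0.91, None, 0.05, 0.40, None]
imputed = impute_uniform(profile, seed=1)
frac = imputation_fraction(profile)
```

Because the random draws carry no information about the sample, any sensitivity of SE recovery to this substitution would be expected to grow with the imputation fraction.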
Association between liquid SE levels and clinical outcomes in ICI-treated patients
To evaluate the clinical relevance of SE levels in plasma cfDNA from patients with melanoma, we applied Liquid EcoTyper as described above in ‘Paired tumour and plasma from patients with melanoma’ to infer SE levels in 78 pretreatment plasma samples from cohort 3 (Fig. 5a and Supplementary Tables 19 and 26) and 10 plasma samples from cohort 4 (Supplementary Tables 20 and 26).
Liquid SEs versus ICI response
The association between each SE and ICI response was assessed using a two-sided Wilcoxon rank-sum test, comparing patients with DCB to those with NDB in each dataset (Fig. 5c,e, Extended Data Fig. 12a,g and Supplementary Tables 19, 20 and 27) and by AUC (Extended Data Fig. 12f and Supplementary Table 27).
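The AUC used here is closely related to the rank-sum statistic: it equals the Mann-Whitney U statistic divided by the number of responder and non-responder pairs. The sketch below computes it by direct pairwise comparison, treating DCB as the positive class (an assumption about orientation). The function name and the values are illustrative.

```python
def auc(positive, negative):
    """Probability that a random positive exceeds a random negative (ties count 0.5).

    Equivalent to the Mann-Whitney U statistic divided by len(positive) * len(negative).
    """
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positive
        for n in negative
    )
    return wins / (len(positive) * len(negative))

# Toy SE levels in patients with and without durable clinical benefit.
dcb_levels = [0.8, 0.6, 0.7]
ndb_levels = [0.1, 0.6, 0.2]
auc_value = auc(dcb_levels, ndb_levels)
```

An AUC of 0.5 corresponds to no separation between groups, and values below 0.5 indicate that the SE is higher in patients without benefit.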
Liquid SEs versus survival
For Kaplan–Meier analyses (Fig. 5d,f and Extended Data Fig. 12b–d), we dichotomized patients based on the median level of each liquid SE (SE7, SE8 and SE4) in cohort 3 (Supplementary Table 18). Differences in OS and PFS were assessed using a two-sided log-rank test with the survival (v.3.6.4) R package. Associations with OS and PFS were also evaluated using both univariate and multivariable Cox regression models, including models combining continuous liquid SE7, SE8 or SE4 levels with other clinical indices, such as sex, age, ICI type, melanoma subtype and BRAF mutation status. Results are provided in Extended Data Fig. 12e and Supplementary Table 28. Additional Cox models evaluating the associations between OS and continuous liquid SE levels, ctDNA levels, TMB (log2-transformed number of non-synonymous mutations per megabase) and PD-L1 percentages (tumour proportion score, TPS), both alone and in bivariable models comparing liquid SE7, SE8 or SE4 levels against each of the above covariates, are provided in Extended Data Fig. 12h and Supplementary Table 27. For PD-L1 TPS values reported as ranges (for example, 1–5), the median of the range was used. For PD-L1 TPS values reported as inequalities, less than x or more than x, we subtracted or added 0.5 to x, respectively (Supplementary Table 24). Finally, we performed a time-dependent AUC(t) analysis for comparative OS prediction using the timeROC function of the timeROC R package116 (v.0.4 with default parameters) (Supplementary Fig. 9). The 6–18-month interval after ICI initiation was selected because it captures the minimum time needed for gauging durable clinical benefit (6 months) and provides additional time for delayed benefit. It also avoids losing patients to follow-up and minimizes the sparsity of events in cohort 3.
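Two of the data-preparation rules above, the handling of PD-L1 TPS values reported as ranges or inequalities and the median split used for Kaplan–Meier analyses, can be sketched as follows. The input string formats are assumptions, as is taking the midpoint of a two-endpoint range as its median and assigning values equal to the cohort median to the low group. Function names are hypothetical.

```python
from statistics import median

def parse_tps(raw):
    """Convert a reported PD-L1 TPS string to a single number.

    'a–b' or 'a-b' -> midpoint of the range
    '<x'           -> x - 0.5
    '>x'           -> x + 0.5
    """
    s = raw.strip().replace("%", "").replace("–", "-")
    if s.startswith("<"):
        return float(s[1:]) - 0.5
    if s.startswith(">"):
        return float(s[1:]) + 0.5
    if "-" in s:
        lo, hi = (float(part) for part in s.split("-"))
        return (lo + hi) / 2
    return float(s)

def dichotomize(levels):
    """Label each patient 'high' or 'low' relative to the cohort median."""
    m = median(levels)
    return ["high" if v > m else "low" for v in levels]

tps = [parse_tps(x) for x in ["1–5", "<1", ">50", "10"]]
groups = dichotomize([0.1, 0.4, 0.2, 0.9])
```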
Statistics and reproducibility
Unless otherwise noted, all statistical tests were two-sided. Two-group comparisons were conducted with Wilcoxon rank-sum or signed-rank tests as appropriate. For comparisons of SE levels imputed from tissue expression data (bulk, Visium ST and scRNA-seq), which are expected to be linearly related, Pearson correlations were applied to assess concordance. For tissue versus plasma comparisons of SE levels, Spearman correlations were used to assess the directional concordance of SE levels because strict assumptions of normality and linearity could not be guaranteed. For Cox regression models, the proportional hazards assumption was verified for each covariate by evaluating the Schoenfeld residuals.
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
Data availability
De-identified genomic data generated in this work are available from the Gene Expression Omnibus (GEO) under accession number GSE320042. Preprocessed and normalized data are available from https://doi.org/10.25936/pm3t-cn37. All requests for raw data will be promptly reviewed by the corresponding authors to determine whether the request is subject to any confidentiality obligations. Any data that can be shared will be released through a data-use agreement. The accession numbers for the publicly available data analysed in this study are listed in Supplementary Tables 1, 2 and 13. Additional data supporting the findings in this work are available in the main text, figures, extended data and supplementary files.
Code availability
The software and code generated in this study are available at https://github.com/digitalcytometry/spatialecotyper and https://spatialecotyper.stanford.edu for non-profit academic research use. An online interface for running trained Spatial EcoTyper and Liquid EcoTyper models is also available at https://spatialecotyper.stanford.edu.
References
Jackson, H. W. et al. The single-cell pathology landscape of breast cancer. Nature 578, 615–620 (2020).
Danenberg, E. et al. Breast tumor microenvironment structures are associated with genomic features and clinical outcome. Nat. Genet. 54, 660–669 (2022).
Hwang, W. L. et al. Single-nucleus and spatial transcriptome profiling of pancreatic cancer identifies multicellular dynamics associated with neoadjuvant treatment. Nat. Genet. 54, 1178–1191 (2022).
Tirosh, I. et al. Dissecting the multicellular ecosystem of metastatic melanoma by single-cell RNA-seq. Science 352, 189–196 (2016).
Pelka, K. et al. Spatially organized multicellular immune hubs in human colorectal cancer. Cell 184, 4734–4752 (2021).
Luca, B. A. et al. Atlas of clinically distinct cell states and ecosystems across human solid tumors. Cell 184, 5482–5496 (2021).
Wagner, J. et al. A single-cell atlas of the tumor and immune ecosystem of human breast cancer. Cell 177, 1330–1345 (2019).
Gulati, G. S., D’Silva, J. P., Liu, Y., Wang, L. & Newman, A. M. Profiling cell identity and tissue architecture with single-cell and spatial transcriptomics. Nat. Rev. Mol. Cell Biol. 26, 11–31 (2025).
Gong, D., Arbesfeld-Qiu, J. M., Perrault, E., Bae, J. W. & Hwang, W. L. Spatial oncology: translating contextual biology to the clinic. Cancer Cell 42, 1653–1675 (2024).
Nirmal, A. J. et al. The spatial landscape of progression and immunoediting in primary melanoma at single-cell resolution. Cancer Discov. 12, 1518–1541 (2022).
Li, Y., Zhang, J., Gao, X. & Zhang, Q. C. Tissue module discovery in single-cell-resolution spatial transcriptomics data via cell-cell interaction-aware cell embedding. Cell Syst. 15, 578–592 (2024).
Varrone, M., Tavernari, D., Santamaria-Martínez, A., Walsh, L. A. & Ciriello, G. CellCharter reveals spatial cell niches associated with tissue remodeling and cell plasticity. Nat. Genet. 56, 74–84 (2024).
Singhal, V. et al. BANKSY unifies cell typing and tissue domain segmentation for scalable spatial omics data analysis. Nat. Genet. 56, 431–441 (2024).
AbdulJabbar, K. et al. Geospatial immune variability illuminates differential evolution of lung adenocarcinoma. Nat. Med. 26, 1054–1062 (2020).
Warrick, J. I. et al. Intratumoral heterogeneity of bladder cancer by molecular subtypes and histologic variants. Eur. Urol. 75, 18–22 (2019).
Shen, S. Y. et al. Sensitive tumour detection and classification using plasma cell-free DNA methylomes. Nature 563, 579–583 (2018).
Li, S. et al. Comprehensive tissue deconvolution of cell-free DNA by deep learning for disease diagnosis and monitoring. Proc. Natl Acad. Sci. USA 120, e2305236120 (2023).
Goveia, J. et al. An integrated gene expression landscape profiling approach to identify lung tumor endothelial cell heterogeneity and angiogenic candidates. Cancer Cell 37, 21–36 (2020).
Kieffer, Y. et al. Single-cell analysis reveals fibroblast clusters linked to immunotherapy resistance in cancer. Cancer Discov. 10, 1330–1351 (2020).
Zheng, L. et al. Pan-cancer single-cell landscape of tumor-infiltrating T cells. Science 374, abe6474 (2021).
Cheng, S. et al. A pan-cancer single-cell transcriptional atlas of tumor infiltrating myeloid cells. Cell 184, 792–809 (2021).
Buechler, M. B. et al. Cross-tissue organization of the fibroblast lineage. Nature 593, 575–579 (2021).
Tang, F. et al. A pan-cancer single-cell panorama of human natural killer cells. Cell 186, 4235–4251 (2023).
Ma, J. et al. A blueprint for tumor-infiltrating B cells across human cancers. Science 384, eadj4857 (2024).
Vahid, M. R. et al. High-resolution alignment of single-cell and spatial transcriptomes with CytoSPACE. Nat. Biotechnol. 41, 1543–1548 (2023).
Ramos, R. N. et al. Tissue-resident FOLR2+ macrophages associate with CD8+ T cell infiltration in human breast cancer. Cell 185, 1189–1207 (2022).
Qi, J. et al. Single-cell and spatial analysis reveal interaction of FAP+ fibroblasts and SPP1+ macrophages in colorectal cancer. Nat. Commun. 13, 1742 (2022).
Matusiak, M. et al. Spatially segregated macrophage populations predict distinct outcomes in colon cancer. Cancer Discov. 14, 1418–1439 (2024).
Klopotowska, M. et al. PRDX-1 supports the survival and antitumor activity of primary and CAR-modified NK cells under oxidative stress. Cancer Immunol. Res. 10, 228–244 (2022).
Christofk, H. R. et al. The M2 splice isoform of pyruvate kinase is important for cancer metabolism and tumour growth. Nature 452, 230–233 (2008).
Haliday, E. M., Ramesha, C. S. & Ringold, G. TNF induces c-fos via a novel pathway requiring conversion of arachidonic acid to a lipoxygenase metabolite. EMBO J. 10, 109–115 (1991).
Wang, B. et al. Similarity network fusion for aggregating data types on a genomic scale. Nat. Methods 11, 333–337 (2014).
Lee, D. D. & Seung, H. S. Algorithms for non-negative matrix factorization. In Proc. 14th International Conference on Neural Information Processing Systems (eds Leen, T. K. et al.) 535–541 (MIT Press, 2000).
Spranger, S., Bao, R. & Gajewski, T. F. Melanoma-intrinsic β-catenin signalling prevents anti-tumour immunity. Nature 523, 231–235 (2015).
Cutler, A. & Breiman, L. Archetypal analysis. Technometrics 36, 338–347 (1994).
Xing, X. et al. Pan-cancer human brain metastases atlas at single-cell resolution. Cancer Cell 43, 1242–1260 (2025).
Aran, D. et al. Comprehensive analysis of normal adjacent to tumor transcriptomes. Nat. Commun. 8, 1077 (2017).
Lavin, Y. et al. Innate immune landscape in early lung adenocarcinoma by paired single-cell analyses. Cell 169, 750–765 (2017).
Morgan, D. & Tergaonkar, V. Unraveling B cell trajectories at single cell resolution. Trends Immunol. 43, 210–229 (2022).
Orimo, A. et al. Stromal fibroblasts present in invasive human breast carcinomas promote tumor growth and angiogenesis through elevated SDF-1/CXCL12 secretion. Cell 121, 335–348 (2005).
Grout, J. A. et al. Spatial positioning and matrix programs of cancer-associated fibroblasts promote T-cell exclusion in human lung tumors. Cancer Discov. 12, 2606–2625 (2022).
Yang, X. et al. FAP promotes immunosuppression by cancer-associated fibroblasts in the tumor microenvironment via STAT3–CCL2 signaling. Cancer Res. 76, 4124–4135 (2016).
Liu, C. et al. Pan-cancer single-cell and spatial-resolved profiling reveals the immunosuppressive role of APOE+ macrophages in immune checkpoint inhibitor therapy. Adv. Sci. 11, 2401061 (2024).
Rozenblatt-Rosen, O. et al. The Human Tumor Atlas Network: charting tumor transitions across space and time at single-cell resolution. Cell 181, 236–249 (2020).
Thorsson, V. et al. The immune landscape of cancer. Immunity 48, 812–830 (2018).
Schaafsma, E., Fugle, C. M., Wang, X. & Cheng, C. Pan-cancer association of HLA gene expression with cancer prognosis and immunotherapy efficacy. Br. J. Cancer 125, 422–432 (2021).
Dhatchinamoorthy, K., Colbert, J. D. & Rock, K. L. Cancer immune evasion through loss of MHC class I antigen presentation. Front. Immunol. 12, 636568 (2021).
Pandol, S., Edderkaoui, M., Gukovsky, I., Lugea, A. & Gukovskaya, A. Desmoplasia of pancreatic ductal adenocarcinoma. Clin. Gastroenterol. Hepatol. 7, S44–S47 (2009).
Vaisvila, R. et al. Enzymatic methyl sequencing detects DNA methylation at single-base resolution from picograms of DNA. Genome Res. 31, 1280–1289 (2021).
Newman, A. M. et al. Integrated digital error suppression for improved detection of circulating tumor DNA. Nat. Biotechnol. 34, 547–555 (2016).
Kim, S. T. et al. Comprehensive molecular characterization of clinical responses to PD-1 inhibition in metastatic gastric cancer. Nat. Med. 24, 1449–1458 (2018).
Nabet, B. Y. et al. Noninvasive early identification of therapeutic benefit from immune checkpoint inhibition. Cell 183, 363–376 (2020).
Kim, E. S. et al. Blood-based tumor mutational burden as a biomarker for atezolizumab in non-small cell lung cancer: the phase 2 B-F1RST trial. Nat. Med. 28, 939–945 (2022).
Rao, A., Barkley, D., França, G. S. & Yanai, I. Exploring tissue architecture using spatial transcriptomics. Nature 596, 211–220 (2021).
Ayers, M. et al. IFN-γ-related mRNA profile predicts clinical response to PD-1 blockade. J. Clin. Invest. 127, 2930–2940 (2017).
Hwang, S. et al. Immune gene signatures for predicting durable clinical benefit of anti-PD-1 immunotherapy in patients with non-small cell lung cancer. Sci. Rep. 10, 643 (2020).
Cabrita, R. et al. Tertiary lymphoid structures improve immunotherapy and survival in melanoma. Nature 577, 561–565 (2020).
Jiang, P. et al. Signatures of T cell dysfunction and exclusion predict cancer immunotherapy response. Nat. Med. 24, 1550–1558 (2018).
Chen, S. et al. Single-cell analysis reveals transcriptomic remodellings in distinct cell types that contribute to human prostate cancer progression. Nat. Cell Biol. 23, 87–98 (2021).
Wu, S. Z. et al. A single-cell and spatially resolved atlas of human breast cancers. Nat. Genet. 53, 1334–1347 (2021).
Wu, F. et al. Single-cell profiling of tumor heterogeneity and the microenvironment in advanced non-small cell lung cancer. Nat. Commun. 12, 2540 (2021).
Lu, Y. et al. A single-cell atlas of the multicellular ecosystem of primary and metastatic hepatocellular carcinoma. Nat. Commun. 13, 4594 (2022).
Olalekan, S., Xie, B., Back, R., Eckart, H. & Basu, A. Characterizing the tumor microenvironment of metastatic ovarian cancer by single-cell transcriptomics. Cell Rep. 35, 109165 (2021).
Peng, J. et al. Single-cell RNA-seq highlights intra-tumoral heterogeneity and malignant progression in pancreatic ductal adenocarcinoma. Cell Res. 29, 725–738 (2019).
Lai, H. et al. Single-cell RNA sequencing reveals the epithelial cell heterogeneity and invasive subpopulation in human bladder cancer. Int. J. Cancer 149, 2099–2115 (2021).
Ji, A. L. et al. Multimodal analysis of composition and spatial architecture in human squamous cell carcinoma. Cell 182, 497–514 (2020).
Hao, Y. et al. Integrated analysis of multimodal single-cell data. Cell 184, 3573–3587 (2021).
Gouin, K. H. et al. An N-Cadherin 2 expressing epithelial cell subpopulation predicts response to surgery, chemotherapy and immunotherapy in bladder cancer. Nat. Commun. 12, 4906 (2021).
Wu, R. et al. Comprehensive analysis of spatial architecture in primary liver cancer. Sci. Adv. 7, eabg3750 (2021).
Barkley, D. et al. Cancer cell states recur across tumor types and form specific interactions with the tumor microenvironment. Nat. Genet. 54, 1192–1201 (2022).
Thrane, K., Eriksson, H., Maaskola, J., Hansson, J. & Lundeberg, J. Spatially resolved transcriptomics enables dissection of genetic heterogeneity in stage III cutaneous malignant melanoma. Cancer Res. 78, 5970–5979 (2018).
Moncada, R. et al. Integrating microarray-based spatial transcriptomics and single-cell RNA-seq reveals tissue architecture in pancreatic ductal adenocarcinomas. Nat. Biotechnol. 38, 333–342 (2020).
Berglund, E. et al. Spatial maps of prostate cancer transcriptomes reveal an unexplored landscape of heterogeneity. Nat. Commun. 9, 2419 (2018).
Berglund, E. et al. Automation of Spatial Transcriptomics library preparation to enable rapid and robust insights into spatial organization of tissues. BMC Genomics 21, 298 (2020).
Andersson, A. et al. Spatial deconvolution of HER2-positive breast cancer delineates tumor-associated cell type interactions. Nat. Commun. 12, 6012 (2021).
He, B. et al. Integrating spatial gene expression and breast tumour morphology via deep learning. Nat. Biomed. Eng. 4, 827–834 (2020).
Ståhl, P. L. et al. Visualization and analysis of gene expression in tissue sections by spatial transcriptomics. Science 353, 78–82 (2016).
Oliveira, M. F. de et al. High-definition spatial transcriptomic profiling of immune cell populations in colorectal cancer. Nat. Genet. 57, 1512–1523 (2025).
Korsunsky, I., Nathan, A., Millard, N. & Raychaudhuri, S. Presto scales Wilcoxon and auROC analyses to millions of observations. Preprint at bioRxiv https://doi.org/10.1101/653253 (2019).
Stouffer, S. A., Suchman, E. A., Devinney, L. C., Star, S. A. & Williams, R. M. Jr. The American Soldier: Adjustment during Army Life. (Studies in Social Psychology in World War II) Vol. 1 (Princeton Univ. Press, 1949).
Benjamini, Y. & Hochberg, Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. R. Stat. Soc. B Methodol. 57, 289–300 (1995).
Brunet, J.-P., Tamayo, P., Golub, T. R. & Mesirov, J. P. Metagenes and molecular pattern discovery using matrix factorization. Proc. Natl Acad. Sci. USA 101, 4164–4169 (2004).
Pebesma, E. & Bivand, R. Spatial Data Science: With Applications in R (Chapman and Hall/CRC, 2023).
Yu, G. Thirteen years of clusterProfiler. Innovation 5, 100722 (2024).
Liberzon, A. et al. The Molecular Signatures Database (MSigDB) hallmark gene set collection. Cell Syst. 1, 417–425 (2015).
Johnson, W. E., Li, C. & Rabinovic, A. Adjusting batch effects in microarray expression data using empirical Bayes methods. Biostatistics 8, 118–127 (2007).
Kim, S. ppcor: An R package for a fast calculation to semi-partial correlation coefficients. Commun. Stat. Appl. Methods 22, 665–674 (2015).
Yoshihara, K. et al. Inferring tumour purity and stromal and immune cell admixture from expression data. Nat. Commun. 4, 2612 (2013).
Jung, H. et al. DNA methylation loss promotes immune evasion of tumours with high mutation and copy number load. Nat. Commun. 10, 4278 (2019).
Gide, T. N. et al. Distinct immune cell populations define response to anti-PD-1 monotherapy and anti-PD-1/anti-CTLA-4 combined therapy. Cancer Cell 35, 238–255 (2019).
Riaz, N. et al. Tumor and microenvironment evolution during immunotherapy with nivolumab. Cell 171, 934–949 (2017).
Lee, J. S. et al. Synthetic lethality-mediated precision oncology via the tumor transcriptome. Cell 184, 2487–2502 (2021).
Liu, D. et al. Integrative molecular and clinical modeling of clinical outcomes to PD1 blockade in patients with metastatic melanoma. Nat. Med. 25, 1916–1927 (2019).
Campbell, K. M. et al. Prior anti-CTLA-4 therapy impacts molecular characteristics associated with anti-PD-1 response in advanced melanoma. Cancer Cell 41, 791–806 (2023).
Mariathasan, S. et al. TGFβ attenuates tumour response to PD-L1 blockade by contributing to exclusion of T cells. Nature 554, 544–548 (2018).
Cui, C. et al. Ratio of the interferon-γ signature to the immunosuppression signature predicts anti-PD-1 therapy response in melanoma. npj Genom. Med. 6, 7 (2021).
He, Y. et al. Multi-omics characterization and therapeutic liability of ferroptosis in melanoma. Signal Transduct. Target. Ther. 7, 268 (2022).
Kang, J. et al. Systematic dissection of tumor-normal single-cell ecosystems across a thousand tumors of 30 cancer types. Nat. Commun. 15, 4067 (2024).
Ravi, A. et al. Genomic and transcriptomic analysis of checkpoint blockade response in advanced non-small cell lung cancer. Nat. Genet. 55, 807–819 (2023).
Liu, Y. et al. Predicting patient outcomes after treatment with immune checkpoint blockade: a review of biomarkers derived from diverse data modalities. Cell Genom. 4, 100444 (2024).
Jerby-Arnon, L. et al. A cancer cell program promotes T cell exclusion and resistance to checkpoint blockade. Cell 175, 984–997 (2018).
Auslander, N. et al. Robust prediction of response to immune checkpoint blockade therapy in metastatic melanoma. Nat. Med. 24, 1545–1549 (2018).
Meylan, M. et al. Tertiary lymphoid structures generate and propagate anti-tumor antibody-producing plasma cells in renal cell cancer. Immunity 55, 527–541 (2022).
Rooney, M. S., Shukla, S. A., Wu, C. J., Getz, G. & Hacohen, N. Molecular and genetic properties of tumors associated with local immune cytolytic activity. Cell 160, 48–61 (2015).
Freeman, S. S. et al. Combined tumor and immune signals from genomes or transcriptomes predict outcomes of checkpoint inhibition in melanoma. Cell Rep. Med. 3, 100500 (2022).
Lipták, T. On the combination of independent tests. A Magyar Tudományos Akadémia Matematikai Kutató Intézetének Közleményei 3, 171–197 (1958).
Viechtbauer, W. Conducting meta-analyses in R with the metafor package. J. Stat. Softw. 36, 1–48 (2010).
Kang, M. et al. Improved reconstruction of single-cell developmental potential with CytoTRACE 2. Nat. Methods 22, 2258–2263 (2025).
Alizadeh, M., Fernández-Marqués, J., Lane, N. D. & Gal, Y. A systematic study of binary neural networks’ optimisation. University of Oxford https://www.cs.ox.ac.uk/publications/publication13850-abstract.html (2019).
Helwegen, K. et al. Latent weights do not exist: rethinking binarized neural network optimization. In Proc. Advances in Neural Information Processing Systems 32 (eds Wallach, H. et al.) (NeurIPS, 2019).
Aran, D., Sirota, M. & Butte, A. J. Systematic pan-cancer analysis of tumour purity. Nat. Commun. 6, 8971 (2015).
Feinberg, A. P. & Vogelstein, B. Hypomethylation distinguishes genes of some human cancers from their normal counterparts. Nature 301, 89–92 (1983).
Bibikova, M. et al. High density DNA methylation array with single CpG site resolution. Genomics 98, 288–295 (2011).
Korotkevich, G. et al. Fast gene set enrichment analysis. Preprint at bioRxiv https://doi.org/10.1101/060012 (2021).
Danecek, P. et al. Twelve years of SAMtools and BCFtools. Gigascience 10, giab008 (2021).
Blanche, P., Dartigues, J.-F. & Jacqmin-Gadda, H. Estimating and comparing time-dependent areas under receiver operating characteristic curves for censored event times with competing risks. Stat. Med. 32, 5381–5397 (2013).
Newman, A. M. et al. Determining cell type abundance and expression from bulk tissues with digital cytometry. Nat. Biotechnol. 37, 773–782 (2019).
Chu, T., Wang, Z., Pe’er, D. & Danko, C. G. Cell type and gene expression deconvolution with BayesPrism enables Bayesian integrative analysis across bulk and single-cell RNA sequencing in oncology. Nat. Cancer 3, 505–517 (2022).
Tsoucas, D. et al. Accurate estimation of cell-type composition from gene expression data. Nat. Commun. 10, 2975 (2019).
Acknowledgements
We thank the patients and families involved in this study; J. Mudd, G. Ansstas and C. Kaufman for clinical samples; and F. Khameneh for assistance with sequencing. This work was supported by a Stanford Bio-X Interdisciplinary Graduate Fellowship (to M.K.), the Research Council of Norway (334328 to C.B.S.), the National Science Foundation (Graduate Research Fellowship DGE-1656518 to J.C.S.), a Stanford Graduate Fellowship in Science & Engineering (to J.C.S.), the National Cancer Institute (K08CA237727 to D.Y.C., P50CA121974 to A.B., M.S. and R.H., and R01CA283317 to A.A.C. and A.M.N.), a Cancer Research Foundation Young Investigator Award (to A.A.C.), a V Foundation for Cancer Research V Scholar Award (to A.A.C.), an Alvin Siteman Cancer Research Award (to A.A.C.), a Melanoma Research Alliance Team Science Award (to M.S., R.H., D.Y.C., A.A.C. and A.M.N.) and the Virginia and D.K. Ludwig Fund for Cancer Research (to A.M.N.). A.M.N. is a Chan Zuckerberg Biohub – San Francisco Investigator.
Author information
Contributions
W.Z., E.L.B., A.U., A.A.C. and A.M.N. conceived the study, developed strategies for related experiments and wrote the paper. W.Z. and E.L.B. did the data analysis and interpretation, with assistance from A.U., N.E., M.K., C.B.S., H.S.J., S.A., I.A., N.P.S., J.C.S., F.Q., Q.C., A.J.G., J.K., P.C.L., R.C.F., M.S., R.H., D.Y.C., A.A.C. and A.M.N. A.B., R.C.F., M.S., R.H. and D.Y.C. collected patient specimens, which were processed for next-generation sequencing by A.U., N.E., C.O., A.V., P.S.C., C.M.S. and P.K.H. R.P.G. performed PD-L1 scoring of tumour tissue samples. A.U., N.E. and A.B. determined the clinical characteristics and outcomes, with assistance from M.S., D.Y.C. and A.A.C. W.Z., E.L.B., A.U. and N.E. curated the clinical data. A.M.N. and A.A.C. are co-senior authors. All authors commented on the manuscript at all stages.
Ethics declarations
Competing interests
M.S. has consulted for Idera Pharmaceuticals, Regeneron Pharmaceuticals, Apexigen, Alligator Bioscience, Verastem Oncology, Agenus, Rubius Therapeutics, Bristol-Myers Squibb, Genentech-Roche, Boston Pharmaceuticals, Servier Laboratories, Adaptimmune Therapeutics, Immunocore, Dragonfly Therapeutics, Pierre-Fabre Pharmaceuticals, Molecular Partners, Boehringer Ingelheim, Innate Pharma, Nektar Therapeutics, Pieris Pharmaceuticals, Numab Therapeutics, Abbvie, Zelluna Immunotherapy, Seattle Genetics, Genocea Biosciences, GI Innovation, Chugai-Roche, BioNTech, Eli Lilly, Modulate Therapeutics, Array Biopharma, AstraZeneca and Genmab. M.S. has stock options in EvolveImmune, NextCure, Repertoire Immune Medicines, Adaptive Biotechnologies, Actym Therapeutics and Amphivena, and has stock ownership in GlaxoSmithKline and Johnson & Johnson. A.A.C. has patent filings and advisory experience related to liquid biopsy and cancer biomarkers. A.A.C. has stock options in Geneoscopy, and ownership interests in LiquidCell Dx and Droplet Biosciences. A.M.N. has patent filings related to digital cytometry, liquid biopsy and cancer biomarkers, and has served as a consultant to, and has ownership interests in, CiberMed and LiquidCell Dx. The other authors declare no competing interests.
Peer review
Peer review information
Nature thanks Dvir Aran, Hani Kim, Alexander Swarbrick and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data figures and tables
Extended Data Fig. 1 Extended analysis of TME spatial polarization.
a,b, Scatter plots showing cross-platform consistency of top differentially expressed genes between tumour and adjacent stroma in representative TME cell types, comparing genes identified from bulk ST data (10x Visium) reconstructed with scRNA-seq data using CytoSPACE25 (x-axis) and validated using single-cell ST data (MERSCOPE) (a), and vice versa (b) (Supplementary Table 4). Concordance was assessed by Spearman correlation, linear regression with 95% confidence intervals, and fraction of genes with consistent directionality between platforms (upper left confusion matrices). P-values were determined by two-sided t tests. Point colours reflect the mean of both axes, with orange and purple denoting tumour-associated and adjacent stroma-associated genes, respectively. Top genes (up to a maximum of 10% of genes included in the MERSCOPE gene panel; 50 genes in total, and 25 genes per compartment) were selected from Visium data (panel a) or from MERSCOPE data (panel b) to satisfy differential expression requirements of Q < 0.05 and median log2 fold change (FC) across samples >0.05. Four genes per compartment with the highest average rank across both axes are highlighted. c, Same as a and b but showing for all evaluable TME cell types (n = 9) the fraction of top genes with consistent directionality between platforms in relation to the discovery cohort. Significance was assessed by two-sided Wilcoxon signed-rank test. ns, not significant. For details, see Methods. d, Same as Fig. 1d, using CytoSPACE to integrate scRNA-seq and ST data, but shown for plasma cells. The mean log2 FC of differentially expressed genes in plasma cells between discovery (Visium) and each validation cohort (MERSCOPE, legacy ST) was assessed by Spearman correlation, with significance determined by a two-sided t test. Significant concordance was observed for MERSCOPE but not legacy ST, which has 100 µm spots and the lowest spatial resolution among evaluated ST assays. 
e, Heat map showing spatial expression programmes that stratify tumour and adjacent stroma independent of TME cell type (n = 9; coloured as in c), cancer type (n = 10), or ST platform (n = 3) (Supplementary Table 6). Genes with asterisks (PKM and FOS) denote the top markers in the Visium discovery cohort that are contained within the MERSCOPE gene panel. f, Spatial polarization of PKM and FOS expression in tumour and adjacent stroma in representative liver cancer (Liver 2, top) and colon cancer (Colon 2, bottom) specimens profiled by MERSCOPE (Supplementary Table 8). Left: Annotated cell types, tumour/stroma regions, and specimen-wide expression of PKM and FOS. Right: Representative microregions (50 µm²) showing PKM and FOS mRNA transcripts in diverse cell types. g, Heat map depicting normalized enrichment scores (fgsea114) of hallmark pathways in tumour versus adjacent stroma for pan-cell-type markers related to panel e (Supplementary Table 7). Gene sets were applied to an ordered expression vector of log2 fold changes between tumour and adjacent stroma, balanced by TME cell type and cancer type. Data vectors in Visium discovery and MERSCOPE validation cohorts were compared by Spearman correlation, with significance determined by a two-sided t test. In panel c, the box centre lines, bounds of the box, and whiskers denote medians, 1st and 3rd quartiles, and minimum and maximum values within 1.5 × IQR (interquartile range) of the box limits, respectively.
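The box-plot convention stated in this legend (and repeated in later legends) can be made concrete with a minimal sketch. The function name `box_stats` is hypothetical, and the quartile interpolation rule (Python's "inclusive" method) is an assumption; the legend does not say which quantile definition the plotting software used.

```python
import statistics

def box_stats(values):
    """Summarise values using the box-plot convention described in the legend.

    Assumption: quartiles use linear interpolation ("inclusive" method);
    the plotting software used for the figures may differ.
    """
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    # Whiskers extend to the most extreme observations inside the fences.
    inside = [v for v in values if low_fence <= v <= high_fence]
    outside = sorted(v for v in values if v < low_fence or v > high_fence)
    return {
        "median": median,
        "q1": q1,
        "q3": q3,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": outside,
    }

stats = box_stats([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])
```

In this toy input, the value 100 lies beyond 1.5 × IQR of the upper quartile, so the upper whisker stops at 9 and 100 is reported separately.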
Extended Data Fig. 2 Framework for spatial ecotype discovery.
a,b, Graphical depiction of the Spatial EcoTyper discovery module, illustrating its application to a single sample (steps 1–3; panel a) and to multiple samples (steps 1–9; panel b) profiled by single-cell ST. In step 1, spatial neighbourhoods (SNs) are defined over a grid and cell-type-specific gene expression profiles (GEPs) are constructed for each SN. Cell types within the same SN have the same matrix index and are co-registered. In step 2, the similarity of SNs is assessed and the covariance structures of their cell-type-specific GEPs are fused using similarity network fusion (SNF)32. In steps 3 and 4, Louvain clustering is applied to the fused similarity matrix (step 3), and cell-type-specific average expression is computed within each SN cluster (step 4), resulting in a gene by SN cluster expression matrix for each cell type. In step 5, steps 1 to 4 are repeated for multiple samples (e.g., tumour specimens). The resulting cell-type-specific expression matrices from all specimens are concatenated column-wise into a single GEP matrix per cell type (step 6), with rows representing common genes and columns representing the union of SN clusters across all samples. In steps 7 and 8, the covariance of cell-type-specific GEPs across all SN clusters is assessed and SNF is used to integrate these covariance matrices across cell types, resulting in a fused similarity matrix among SN clusters. In step 9, non-negative matrix factorization is applied to define robust spatial clusters termed spatial ecotypes82. Full details are provided in Methods. c, Inputs and outputs of Spatial EcoTyper, here showing three samples from the MERSCOPE discovery cohort used in this work. Cell icons in c were created using BioRender; Steen, C. (2026) https://BioRender.com/xxbpjx0.
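As a rough illustration of step 1 only, the sketch below assigns cells to grid-based spatial neighbourhoods and averages expression per cell type within each neighbourhood. It is not the Spatial EcoTyper implementation: the function name `neighbourhood_profiles`, the square-bin grid and the input layout are all assumptions made for illustration, and the actual procedure is described in Methods.

```python
from collections import defaultdict

def neighbourhood_profiles(cells, bin_size):
    """Average expression per (grid bin, cell type).

    cells: iterable of (x, y, cell_type, expression_list) tuples (hypothetical layout).
    bin_size: edge length of a square grid bin, in the same units as x and y.
    Returns {(bin_x, bin_y, cell_type): mean expression vector}.
    """
    sums = {}
    counts = defaultdict(int)
    for x, y, cell_type, expr in cells:
        key = (int(x // bin_size), int(y // bin_size), cell_type)
        if key not in sums:
            sums[key] = [0.0] * len(expr)
        for gene_index, value in enumerate(expr):
            sums[key][gene_index] += value
        counts[key] += 1
    return {key: [s / counts[key] for s in total] for key, total in sums.items()}

toy_cells = [
    (10, 10, "T cell", [1.0, 3.0]),
    (20, 30, "T cell", [3.0, 5.0]),
    (60, 10, "B cell", [2.0, 2.0]),
]
profiles = neighbourhood_profiles(toy_cells, bin_size=50)
```

Because profiles are keyed by the same bin coordinates for every cell type, cell types within one neighbourhood share an index, which mirrors the co-registration idea described for step 1.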
Extended Data Fig. 3 Benchmarking, MERSCOPE samples, and analysis of spatial neighbourhoods from Spatial EcoTyper embeddings.
a, Spatial ecotype clusters defined by Spatial EcoTyper and previous methods in a representative melanoma specimen profiled by MERSCOPE (Melanoma 1, Supplementary Table 8; Methods). b, Comparison of methods for identifying spatial ecotype clusters in tumour samples profiled by MERSCOPE (n = 9; Supplementary Table 8), applied to each tumour sample individually. Left: Dot plot showing the relative performance of each method for identifying spatial ecotype clusters using three metrics: (1) “spatial colocalization”, which captures the local contiguity of cells within each cluster, (2) “cell type mixing”, which captures the diversity of cell types per cluster (higher scores denote more cell types), and (3) “mean silhouette width”, which captures, for each cell type, the degree of gene expression profile (GEP) separation between clusters and the degree of GEP similarity within the same cluster (higher scores denote greater cluster separation and compactness) (Supplementary Methods). Dot sizes reflect the relative ranking of methods based on each metric (larger = better). Each quantity was first averaged across clusters within each sample, converted to rank space, and then averaged across samples. Right: Box plot aggregating the three metrics from each tumour sample by geometric mean of their ranks. The box centre lines, bounds of the box, and whiskers denote medians, 1st and 3rd quartiles, and minimum and maximum values within 1.5 × IQR (interquartile range) of the box limits, respectively. c, UMAP embeddings showing integration of two single-cell ST samples of the same cancer type (Melanoma 1 and Melanoma 2, Supplementary Table 8) for selected methods, with cells coloured by samples (top), tumour and adjacent stroma (centre), and cell types (bottom). In the Spatial EcoTyper UMAP, a small amount of jitter was applied to display individual cells within each spatial cluster. 
d, Scatter plot comparing Spatial EcoTyper against all methods in the benchmarking analysis that enable sample integration, showing their relative performance for identifying spatial ecotypes conserved across Melanoma 1 and Melanoma 2 (Supplementary Table 8). Performance was assessed using metrics that quantify the degree of cell type mixing (x-axis) and sample mixing (y-axis), averaged across identified spatial ecotype clusters (Supplementary Methods). e, Scatter plot summarizing the performance of each method for identifying spatial ecotype clusters, combining single-sample analysis (panel b) and integrative analysis (panel d). Integration performance was computed as the geometric mean of cell type mixing and sample mixing metrics (panel d) in rank space. For additional details, see Supplementary Methods. f, Composition of MERSCOPE specimens used for SE discovery and validation. Only samples with more than 5% TME cells derived from tumour or adjacent stroma regions were included (Supplementary Table 8). g, Robustness of spatial embeddings to spatial neighbourhoods of diverse radii, related to Fig. 2b. Here, Spatial EcoTyper was applied to a melanoma specimen profiled by MERSCOPE (Melanoma 1, Supplementary Table 8) (left) to create spatial embeddings as illustrated in Fig. 2a and detailed in Methods, but where the spatial neighbourhood radius was varied (centre). Each point in the embedding denotes an individual spatial neighbourhood. Right: Concordance between spatial neighbourhoods in the Spatial EcoTyper embedding and physical distance to the tumour margin, determined using Spearman correlation as described in Supplementary Methods. A radius of 50 µm, reflecting a balance between higher concordance and smaller radius, was selected for subsequent analysis. h, Analysis of the relationship between the organization of sample-specific spatial embeddings in Fig. 2b and physical distance to the tumour margin. 
Concordance was quantified as described in Supplementary Methods. Significance was calculated with two-sided t tests. i, Same as Fig. 2b, but colouring individual spatial neighbourhoods by the expression of an immune-hot signature (average log2 expression of 13 signature genes34; Supplementary Methods).
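The rank-based aggregation used for benchmarking in panel b (metrics converted to rank space, then combined by geometric mean) can be sketched as follows. The function names, the tie-handling rule (average ranks) and the toy scores are assumptions for illustration; the exact procedure is given in Supplementary Methods.

```python
from statistics import geometric_mean

def average_ranks(scores):
    """Rank scores so that 1 = lowest; tied scores share their mean rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1
        for position in range(start, end + 1):
            ranks[order[position]] = shared_rank
        start = end + 1
    return ranks

def combined_rank(metric_scores):
    """Geometric mean of per-metric ranks for each method.

    metric_scores: {metric name: [score per method]}, higher score = better.
    """
    per_metric = [average_ranks(scores) for scores in metric_scores.values()]
    return [geometric_mean(method_ranks) for method_ranks in zip(*per_metric)]

# Hypothetical scores for three methods on the three metrics named in the legend.
toy_scores = {
    "spatial colocalization": [0.9, 0.5, 0.7],
    "cell type mixing": [0.8, 0.2, 0.6],
    "mean silhouette width": [0.3, 0.1, 0.2],
}
combined = combined_rank(toy_scores)
```

A geometric mean penalises a method that ranks poorly on any single metric more strongly than an arithmetic mean would, which is one plausible reason to prefer it when metrics capture different properties.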
Extended Data Fig. 4 Discovery and robustness testing of nine spatial ecotypes from MERSCOPE data.
a, Robustness of spatial ecotype discovery to Louvain clustering at different resolutions (step 3 in Extended Data Fig. 2), performed for 10 final clusters (step 9 in Extended Data Fig. 2). Left: Spatial EcoTyper results following step 9 obtained with Louvain resolutions (step 3) ranging from 1 to 50. Right: The average adjusted Rand index (ARI) comparing clusters at each resolution (x-axis) with results obtained at other resolutions (Supplementary Methods). The resolution used for SE discovery in this study is indicated by a red arrow. b, Cophenetic coefficient plot for non-negative matrix factorization (NMF) applied to the fused spatial covariance matrix generated by Spatial EcoTyper in step 9 (Extended Data Fig. 2). NMF was initialized across a range of cluster numbers (2–50). The red arrow indicates the cluster number selected for subsequent analysis, corresponding to the region of the graph with the largest subsequent drop (Methods). c, Box plot showing the average Euclidean distance (μm) of each SE to the tumour margin across five tumour samples in the discovery cohort (>0, intratumoral; <0, adjacent stroma; Supplementary Table 9). The box centre lines, bounds of the box, and whiskers denote medians, 1st and 3rd quartiles, and minimum and maximum values within 1.5 × IQR (interquartile range) of the box limits, respectively. d, Schematic summarizing robustness experiments for single-sample analysis with Spatial EcoTyper (with steps corresponding to the workflow in Extended Data Fig. 2), highlighting the parameters and analytical steps modified, along with the outputs used for performance evaluation in each experiment. SN, spatial neighbourhood; HVGs, highly variable genes; PCs, principal components. e, Density plot comparing spatial embedding matrices from Spatial EcoTyper derived using 100 versus 200 HVGs for the analysis of a melanoma specimen (Melanoma 1, Supplementary Table 8). 
Consistency between embeddings was assessed using Spearman correlation with p-value determined by two-sided t test. f, Bar plot summarizing the robustness of spatial embeddings derived from different conditions for step 2: varying the number of HVGs and the use of PCA. Robustness is quantified as the Spearman correlation between the spatial embedding matrix under each condition and the default matrix (as shown in e) for all five MERSCOPE discovery samples. g, Confusion matrix, normalized to unit sum for each row, comparing original SEs to newly defined SEs using 100 HVGs instead of 200. h, Same as g but comparing the use of spectral clustering instead of Louvain clustering for single-sample analysis (Supplementary Methods). i, Schematic summarizing robustness experiments for cross-sample integration, highlighting modified parameters and analytical steps and evaluated outputs. j, Comparison of SEs obtained from different conditions as in g, shown here for variations in step 7 for integrative analysis. Left: 100 versus 200 HVGs. Right: PCA with 20 PCs versus no PCA. k, Same as panel j, but comparing SEs derived from spectral clustering versus default NMF clustering used for final SE discovery. l, Confusion matrix, normalized to unit sum for each row, showing average overlap between the original nine SEs and those rediscovered from running Spatial EcoTyper on all combinations of three or more cancer types from the MERSCOPE discovery cohort. In panels g, h, j, k and l, overlap was determined for spatial neighbourhood clusters within the integrated Spatial EcoTyper embedding (step 8 in Extended Data Fig. 2). For details, see Supplementary Methods.
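For readers unfamiliar with the adjusted Rand index used in panel a, the following is a minimal sketch of the standard pair-counting formula. The function name and toy labels are illustrative only, and how ARI values were averaged across resolutions is described in Supplementary Methods rather than here.

```python
from collections import Counter
from math import comb

def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two clusterings of the same items."""
    n = len(labels_a)
    total_pairs = comb(n, 2)
    if total_pairs == 0:
        return 1.0  # fewer than two items: treat the partitions as identical
    # Pairs of items placed together in both clusterings.
    together_both = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    together_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    together_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = together_a * together_b / total_pairs
    maximum = (together_a + together_b) / 2
    if maximum == expected:
        return 1.0  # degenerate case, e.g. both clusterings place all items in one cluster
    return (together_both - expected) / (maximum - expected)

same_partition = adjusted_rand_index([0, 0, 1, 1, 2, 2], [1, 1, 0, 0, 2, 2])
crossed_partition = adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1])
```

The index depends only on which items are grouped together, not on cluster names, so relabelled but otherwise identical partitions score 1, while values near or below 0 indicate agreement no better than chance.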
Extended Data Fig. 5 Distinguishability of spatial ecotypes by archetypal analysis.
a, Workflow for determining whether the nine spatial ecotypes are distinguishable by principal component analysis (PCA) applied to the Spatial EcoTyper embedding (step 2, Extended Data Fig. 2). b, Association between SEs and PCs from a spatial embedding of a representative tumour specimen profiled by MERSCOPE (Liver 1, Supplementary Table 8), where each data point is a spatial neighbourhood (SN) in the embedding (as in panel a). Association strength is quantified by an “Enrichment score”, capturing skewing of SEs toward positive (“pos”) and negative (“neg”) PC values (selected from PCs 1:10) following standardization, as described in Supplementary Methods. c, Workflow for evaluating SE distinguishability via archetypal analysis35, run with 10 archetypes applied to the top 10 PCs from the Spatial EcoTyper embedding (Supplementary Methods). d, Overlap between archetypes and SEs in the Liver 1 MERSCOPE sample. In a one-to-one manner, each archetype was assigned to a unique SE (SE1–SE9) or non-SE as described in Supplementary Methods, with archetypes assigned to the former numbered 1–9, respectively. Dot size and colour represent the fraction of overlapping spatial neighbourhoods, normalized to unit sum per SE. e, Same as d, but showing mean overlap across the five MERSCOPE discovery samples (Supplementary Table 8). f, Heat map showing single-sample SE-archetype overlap across all MERSCOPE discovery samples. Overlap is normalized for each SE and tumour sample pair as in panel d. Statistical significance was determined for each row by a two-sided Wilcoxon rank-sum test applied to each SE versus the remaining SEs. g, Relationship between the number of PCs used for archetypal analysis and SE discrimination, measured by the effect size (Cohen’s d) between each SE and the remaining SEs across the five MERSCOPE discovery samples in f (Supplementary Methods). The unit of analysis is SE-archetype overlap. 
A trend line was determined using local polynomial regression (LOESS), with the shaded area representing the 95% confidence interval.
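The effect size in panel g (Cohen's d between each SE and the remaining SEs) can be illustrated with the common pooled-standard-deviation form below. This variant is an assumption, since the legend does not state which form of Cohen's d was used, and the overlap values are made up for the example.

```python
from math import sqrt
from statistics import mean, variance

def cohens_d(group, rest):
    """Cohen's d using the pooled sample standard deviation (assumed variant)."""
    n_group, n_rest = len(group), len(rest)
    pooled_variance = (
        (n_group - 1) * variance(group) + (n_rest - 1) * variance(rest)
    ) / (n_group + n_rest - 2)
    return (mean(group) - mean(rest)) / sqrt(pooled_variance)

# Hypothetical SE-archetype overlap values: matched SE versus remaining SEs.
effect = cohens_d([2.0, 4.0, 6.0], [1.0, 3.0, 5.0])
```

Here both groups have a sample variance of 4 and the means differ by 1, giving d = 0.5.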
Extended Data Fig. 6 Cell states, geographic features, and cross-platform detectability of spatial ecotypes.
a, Identification of SE-specific cell states via supervised NMF and leave-one-sample-out cross-validation (LOOCV). Left: Workflow (Methods). Right: Heat map showing F1 scores, standardized by row for all cell states and SEs, reflecting the specificity of each cell state for the SE from which it was derived, as determined by LOOCV. See also Supplementary Table 10. b, Workflow for validating SEs using Spatial EcoTyper recovery mode (top) and discovery mode (bottom; Extended Data Fig. 2) applied to diverse ST datasets (Supplementary Tables 1 and 8). For details, see Methods. c, Workflow for evaluating the spatial colocalization of SE-specific cell states. Conditional probabilities were adjusted for background expectation and combined across samples, yielding “colocalization indices” as described in Methods. d–f, Heat maps showing cell-state colocalization across tumour samples profiled by MERSCOPE (discovery and validation cohorts), Visium HD and Xenium. Two-sided p-values were determined as described in Methods. g, Concordance of SE cell-state colocalization indices between the MERSCOPE discovery cohort (panel d) and two held-out cohorts: MERSCOPE validation (panel e) and combined Visium HD and Xenium (panel f). All colocalization indices were first capped to an absolute value of 5 per sample, preventing larger values from disproportionately influencing integration results (Methods). h, Spatial coherence (Moran’s I) of SEs across 21 samples profiled by single-cell-scale ST. Moran’s I was standardized based on background expectation (Methods). The box centre lines, bounds of the box, and whiskers denote medians, 1st and 3rd quartiles, and minimum and maximum values within 1.5 × IQR (interquartile range) of the box limits, respectively. i, Scatter plot showing the consistency between expected and predicted Euclidean distances (μm) of SEs to the tumour margin—averaged by SE within sample, then by cancer type and then across cancer types—in the MERSCOPE discovery cohort. 
The former (x-axis) was drawn from Extended Data Fig. 4c. The latter (y-axis) reflects SE distances re-derived from LOOCV applied to the discovery cohort (Methods). Concordance was determined by Pearson correlation and linear regression, with 95% confidence intervals shown. Significance was determined by a two-sided t test. j, Same as i but plotting significance (–log10 p-value) versus Pearson correlation for each ST cohort enumerated in panel b as compared to expected distances (x-axis of panel i). “MERSCOPE discovery” is the same as panel i. For details, see Methods. k, Workflow for testing the reproducibility of MERSCOPE-derived SEs from an independent cohort of three tumour samples profiled by Xenium Prime (Supplementary Methods). l, Same as Fig. 2d, but showing a fused spatial covariance matrix from three tumour specimens profiled by Xenium Prime, clustered into 11 spatial ecotypes (Supplementary Methods). m, Overlap between MERSCOPE-derived SEs (identified by recovery mode) and the best matching Xenium-derived SEs (panel l), normalized by row to unit sum. Overlap was determined for spatial neighbourhood clusters within each sample and then averaged across samples. Xenium SEs 1 to 9 were numbered according to the best matching MERSCOPE SEs. n, Concordance between cell states discovered from Xenium-derived SEs 1 to 9 (“Xenium SE”) and cell states belonging to MERSCOPE-derived SEs (“Original SE”) for the same samples analysed in panel m. The former were predicted by LOOCV as described in Supplementary Methods. F1 scores comparing cell-state overlap within the same cell type were balanced across samples and standardized per row. o, Identical to Extended Data Fig. 5f but showing SE-archetype overlap across 21 single-cell-scale ST tumour samples (Supplementary Table 8). Statistical significance was determined for each row by two-sided Wilcoxon rank-sum test of each SE versus the remaining SEs.
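Spatial coherence in panel h is measured with Moran's I. The sketch below computes the unstandardized statistic with binary neighbour weights. It does not reproduce the standardization against background expectation described in Methods, and the neighbour structure and values are toy assumptions.

```python
def morans_i(values, neighbours):
    """Moran's I with binary weights.

    values: list of measurements, one per location.
    neighbours: {location index: [indices of neighbouring locations]}.
    """
    n = len(values)
    centre = sum(values) / n
    deviations = [v - centre for v in values]
    weight_total = sum(len(linked) for linked in neighbours.values())
    cross_products = sum(
        deviations[i] * deviations[j]
        for i, linked in neighbours.items()
        for j in linked
    )
    squared_deviations = sum(d * d for d in deviations)
    return (n / weight_total) * cross_products / squared_deviations

# Four locations in a row; each is a neighbour of the adjacent ones.
chain = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
clustered = morans_i([1, 1, 0, 0], chain)
alternating = morans_i([1, 0, 1, 0], chain)
```

Positive values indicate that similar values sit next to each other (spatially coherent), whereas negative values indicate a checkerboard-like arrangement.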
Extended Data Fig. 7 Extended analysis of SE generalizability and molecular features.
a, Schema for evaluating pairwise correlations of SE-specific cell states recovered in scRNA-seq data. b, Left: Composition of scRNA-seq atlas covering 144 tumours spanning melanoma and nine carcinomas (Supplementary Table 2). Right: Heat map showing pairwise Spearman correlations of cell-state abundances in the scRNA-seq atlas, with two-sided p-values determined as described in Methods. SE and cell type colours are the same as panel c. c, Same as b but for 64 brain metastases originating from melanoma and five types of carcinoma (Supplementary Table 2). d, Same as Fig. 2g but showing SE-specific cell-state markers identified from five cancer types (those included in the MERSCOPE discovery cohort) and validated in five held-out cancer types. Cancer type colours are provided in panel f. e, Left: Pairwise Jaccard similarity matrix, standardized by rows, comparing cell-state markers identified from the discovery cohort in panel d (rows) versus those independently discovered from the validation cohort in panel d (columns). Right: Same as the left panel but comparing markers identified from the discovery cohort in panel d (rows) versus those identified from all 10 cancer types in the scRNA-seq compendium (columns). The latter are plotted in Fig. 2g and used throughout this work. SE colours are identical to those in d. f,g, Normalized expression levels of SE-specific consensus markers defined from 144 tumour samples profiled by scRNA-seq (panel f; Supplementary Table 2) and validated across 21 tumour samples profiled by single-cell-scale ST (panel g; Supplementary Table 8), shown mean-aggregated by cancer type and sample, respectively (Supplementary Methods). Consensus markers are provided in Supplementary Table 11. Statistical significance was determined for each row by a two-sided Wilcoxon rank-sum test, comparing differential expression of each SE versus the remaining SEs. ****P < 0.0001. “Not detected”, consensus genes absent from feature space. 
h, Enrichment of top biological processes overlapping consensus markers defined for each SE (Supplementary Table 11). Only processes satisfying a hypergeometric test significance threshold of Q < 0.1 are shown. Dot size and colour indicate significance of the enrichment. For additional details, see Methods. i, Heat map showing relative levels of SE consensus markers in malignant and non-TME cells, aggregated by platform across 10 scRNA-seq datasets (n = 144 tumours) and four single-cell-scale ST assays (Supplementary Tables 2 and 8; Supplementary Methods). j, Heat map depicting the overlap between spatial ecotypes and carcinoma ecotypes (CEs)6 in scRNA-seq data, with the latter ordered by Moran’s I, as determined previously6. The overlap index was computed as the fraction of cells within SE \(i\) that are jointly assigned to CE \(j\), normalized by the fraction expected by random chance and expressed as a two-sided z-score averaged across 10 evaluable cancer types (Methods). Cohort colours in g and i are identical to f.
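The first part of the overlap index in panel j (the fraction of cells in SE i that are jointly assigned to CE j, relative to chance) can be sketched as an observed-to-expected ratio. The conversion to z-scores and the averaging across cancer types described in Methods are not reproduced; the function name, the independence-based expectation and the labels are assumptions.

```python
from collections import Counter

def overlap_ratio(se_labels, ce_labels):
    """Observed / expected fraction of each SE's cells assigned to each CE.

    Expected fraction assumes SE and CE assignments are independent, so it
    equals the overall fraction of cells assigned to that CE.
    Pairs with no shared cells are omitted from the result.
    """
    n_cells = len(se_labels)
    joint = Counter(zip(se_labels, ce_labels))
    se_totals = Counter(se_labels)
    ce_totals = Counter(ce_labels)
    ratios = {}
    for (se, ce), n_joint in joint.items():
        observed = n_joint / se_totals[se]
        expected = ce_totals[ce] / n_cells
        ratios[(se, ce)] = observed / expected
    return ratios

# Hypothetical per-cell assignments.
ratios = overlap_ratio(
    ["SE1", "SE1", "SE2", "SE2"],
    ["CE1", "CE1", "CE1", "CE2"],
)
```

A ratio above 1 means the SE and CE share more cells than independence would predict; a ratio below 1 means fewer.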
Extended Data Fig. 8 Multimodal validation of SE deconvolution.
a, Heat map showing pairwise Pearson correlation coefficients between predicted and expected levels of nine SEs across pseudo-bulk tumour samples (n = 1,000). b, Box plots showing Pearson correlation coefficients between predicted and expected SE proportions in pseudo-bulk tumours of cancer types held out from training. Results for 10 malignancies and 1,000 pseudo-bulk tumours from 10 scRNA-seq datasets are shown (Methods). c, Scatter plots showing the concordance between predicted and expected SE proportions in pseudo-bulk tumour samples (n = 1,000). Each cancer type was held out from training using LOOCV (Methods). d, Workflow for assessing deconvolution performance of Spatial EcoTyper using paired bulk RNA-seq and single-cell RNA-seq data from melanoma and colorectal tumour specimens (Supplementary Table 12). SEs were deconvolved from bulk RNA-seq data to estimate SE abundances, while cell states were assigned to single cells to determine ecotype membership and infer relative abundances from scRNA-seq data (Methods). e, Scatter plots showing concordance between predicted and expected SE proportions, with the former deconvolved from real bulk RNA-seq data of melanomas (n = 3) and colorectal tumours (n = 4) and the latter determined from paired scRNA-seq data (Supplementary Table 12). Both assays were performed on the same cell suspensions to minimize bias (Methods). f, Box plot comparing Pearson correlation coefficients from panel e (“Dissociated”) versus those determined from the same analysis performed on bulk RNA-seq of non-dissociated (“Intact”) tumours (see panel d). Statistical significance was determined using a two-sided Wilcoxon signed-rank test. g, Left: Box plot showing performance of Spatial EcoTyper against CIBERSORTx117, BayesPrism118, and DWLS119 for deconvolving SE abundances from the same pseudo-bulk samples as in panel b (Methods). Statistical significance was assessed by a one-sided Wilcoxon signed-rank test relative to Spatial EcoTyper. 
Right: Same as the left panel but showing performance evaluated over the same data as in panel e (Methods). h, Left: Same as Fig. 3b but showing all pairwise Pearson correlations between SE levels determined from bulk RNA-seq and paired Visium ST profiles of tumours from 42 patients. Right: Same as the left panel but using Spatial EcoTyper to determine SE levels from pseudo-bulk RNA-seq profiles assembled from each Visium sample (Methods). i, Scatter plots showing all data underlying Fig. 3b and the diagonal of panel h (left). Performance in panels c and i was determined by Pearson correlation and linear regression, with 95% confidence intervals shown and p-values determined by two-sided t tests. In b, f and g, the box centre lines, bounds of the box, and whiskers denote medians, 1st and 3rd quartiles, and minimum and maximum values within 1.5 × IQR (interquartile range) of the box limits, respectively.
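The pseudo-bulk validation logic in panels a–c can be illustrated in miniature: single-cell profiles with known SE membership are pooled into one bulk-like profile, the known SE proportions serve as ground truth, and predictions are scored by Pearson correlation. Summing expression across cells, the function names and the toy numbers are assumptions; the actual construction of pseudo-bulk tumours is described in Methods.

```python
from collections import Counter
from math import sqrt

def pseudo_bulk(cells):
    """Pool labelled single cells into one profile plus expected SE proportions.

    cells: list of (se_label, expression_list) tuples (hypothetical layout).
    """
    n_genes = len(cells[0][1])
    profile = [0.0] * n_genes
    for _, expr in cells:
        for gene_index, value in enumerate(expr):
            profile[gene_index] += value
    counts = Counter(label for label, _ in cells)
    proportions = {se: count / len(cells) for se, count in counts.items()}
    return profile, proportions

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return covariance / (spread_x * spread_y)

toy_cells = [
    ("SE1", [1.0, 0.0]),
    ("SE1", [3.0, 1.0]),
    ("SE2", [0.0, 2.0]),
    ("SE2", [0.0, 4.0]),
]
profile, expected = pseudo_bulk(toy_cells)
# Hypothetical predicted versus expected levels of one SE across three samples.
agreement = pearson([0.1, 0.2, 0.3], [0.2, 0.4, 0.6])
```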
Extended Data Fig. 9 Extended analysis of SE deconvolution from paired ST profiles and large clinical cohorts.
a, Left: Spatial map of a colon cancer specimen profiled by Visium HD, for which an adjacent section profiled by standard Visium was deconvolved with Spatial EcoTyper (Supplementary Table 8). Spots are coloured by dominant cell types following stringent quality control (n = 509,217 spatial spots, each 8 µm², are shown; Methods). Right: Spatial maps of selected SEs following co-registration of Visium HD and Visium samples (Methods). SEs in the former were aggregated to compare against 55 µm diameter Visium spots. SEs in the latter were quantified by Spatial EcoTyper deconvolution. b, Concordance between SE levels for all nine SEs in the co-registered ST data from panel a. Statistical significance was determined as described in Methods. c, Same as Extended Data Fig. 7g, but showing the normalized expression of SE consensus markers (Supplementary Table 11) in the MERSCOPE sample from Fig. 3e and the Visium HD sample from panel a (Supplementary Methods). d, Biological features enriched or depleted in each SE, determined by Spearman correlation between inferred SE abundances and each feature (Supplementary Methods). e, Scatter plot showing mean major histocompatibility complex (MHC) expression levels (x-axis) and relative stromal abundances (y-axis) across 17 TCGA cancer types (same as in Fig. 3g). PRAD/ESCA and PAAD have the lowest MHC expression and highest stromal content, respectively. For details, see Methods. f, Multivariable associations between overall survival and TMB, CD274 (PD-L1) expression, and inferred levels of either SE7, SE8, or SE4, in 465 pretreatment tumours from patients with advanced melanoma, non-small-cell lung cancer, or bladder cancer treated with ICIs (Supplementary Table 15). Data are presented as hazard ratios ± 95% confidence intervals. Two-sided p-values were determined as described in Methods.
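Panel f reports hazard ratios with 95% confidence intervals. As a reminder of how such values relate to a fitted survival model, the sketch below converts a log-hazard coefficient and its standard error into a hazard ratio with a Wald-type interval. Using a Wald interval is an assumption, since the legend defers statistical details to Methods, and the input numbers are hypothetical.

```python
from math import exp, log

def hazard_ratio_ci(coefficient, standard_error, z=1.959964):
    """Hazard ratio and Wald-type 95% interval from a log-hazard coefficient.

    z defaults to the 97.5th percentile of the standard normal distribution.
    """
    return (
        exp(coefficient),
        exp(coefficient - z * standard_error),
        exp(coefficient + z * standard_error),
    )

# Hypothetical coefficient: a null effect with standard error 0.1.
hr, lower, upper = hazard_ratio_ci(0.0, 0.1)
```

Because the interval is symmetric on the log scale, it is asymmetric around the hazard ratio itself, which is why forest plots of hazard ratios are usually drawn on a logarithmic axis.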
Extended Data Fig. 10 Development and technical assessment of Liquid EcoTyper.
a, Workflow for simulating plasma cfDNA methylation profiles for the development of a Liquid EcoTyper model for melanoma. Triplets of healthy plasma methylation profiles were first combined with noise, followed by mixing with melanoma methylation profiles to simulate melanoma plasma cfDNA methylation. Full details are provided in Methods. b, Comparison of real and simulated melanoma cfDNA methylomes. Left: Principal component analysis (PCA) of simulated plasma cfDNA methylomes from melanoma patients (n = 461) and real plasma cfDNA methylomes from melanoma patients (n = 78) and healthy individuals (n = 23). Right: Same as the left panel but showing circulating tumour DNA (ctDNA) levels for real melanoma samples (n = 60) and simulated samples (n = 461, tumour purity-weighted contribution) (Methods). c, Performance of the Liquid EcoTyper melanoma model on simulated cfDNA (related to a and Fig. 4b). Left: Pairwise correlations between predicted and expected SE levels in simulated test data. Centre: Ground truth SE covariance in test data. Right: Spearman correlation between left and centre panels, with p-value determined by a two-sided t test. A linear regression line with 95% confidence intervals is shown. d, Left: Average Spearman correlation coefficients between predicted (Liquid EcoTyper) and expected SE levels in held-out test cohorts for Liquid EcoTyper models trained and evaluated on 13 TCGA carcinoma types individually. Centre: Same as the left panel, but for models trained jointly over 12 carcinoma types at a time and then evaluated over the remaining held-out carcinoma type in a leave-one-out framework. Right: Median Spearman correlation coefficients across eight evaluable SEs per carcinoma type for per-cancer (left) and leave-one-out (centre) frameworks. The box centre lines, bounds of the box, and whiskers denote medians, 1st and 3rd quartiles, and minimum and maximum values within 1.5 × IQR (interquartile range) of the box limits, respectively. 
Statistical significance between boxes was assessed by a two-sided Wilcoxon signed-rank test. ns, not significant. For details, see Methods. e–i, Consistency, specificity, and interpretability of the Liquid EcoTyper melanoma model (related to a and Fig. 4b). e, Density plots of Pearson correlation coefficients comparing—for each SE—the average methylation levels of each CpG set learnt by Liquid EcoTyper (n = 4,000 CpG sets) against SE levels in the melanoma test set (n = 115 TCGA samples), calculated for predicted (y-axis) versus ground truth (x-axis) SE levels. Denser regions are denoted by darker colour. Concordance was assessed by Pearson correlation, with statistical significance assessed by two-sided t test. f, Heat map showing the specificity and degree of performance loss of Liquid EcoTyper in the melanoma test set (n = 115 TCGA samples) after ablation of CpG sets associated with each SE. More effective ablations are captured by a higher performance loss index, with 1 indicating complete loss (Spearman correlation against ground truth ≤0). For details, see Methods. g, Approach for determining whether CpG feature sets learnt by Liquid EcoTyper prioritize SE-specific markers, termed consensus genes (Fig. 2h and Supplementary Table 11). h, Dot plot (left) and heat map (right) showing enrichment of SE consensus genes within respective ranked gene lists derived from Liquid EcoTyper CpG sets. Normalized enrichment score (NES) values were determined with fgsea114. In the left panel, comparisons of SE-specific consensus markers with ranked gene lists of matching SEs are highlighted by coloured circles. i, Summary of results and corresponding global p-value, as determined by a permutation test (related to panel h; Methods).
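The leave-one-out framework in panel d (train jointly on 12 carcinoma types, evaluate on the remaining one) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `leave_one_group_out` is hypothetical, the cancer-type labels are placeholder TCGA abbreviations chosen only for illustration, and model training and scoring are omitted.

```python
def leave_one_group_out(groups):
    """Yield (held_out, train_idx, test_idx) for each distinct group label.

    `groups` holds one label per sample (here, a cancer type). Each split
    trains on every sample outside the held-out group and tests on the
    samples inside it.
    """
    for held_out in sorted(set(groups)):
        train_idx = [i for i, g in enumerate(groups) if g != held_out]
        test_idx = [i for i, g in enumerate(groups) if g == held_out]
        yield held_out, train_idx, test_idx


# Placeholder labels: one cancer-type code per sample (illustrative only).
cancer_types = ["BLCA", "BRCA", "BLCA", "LUAD", "BRCA", "LUAD"]
splits = list(leave_one_group_out(cancer_types))

for held_out, train_idx, test_idx in splits:
    print(held_out, "train:", train_idx, "test:", test_idx)
```

With 13 distinct labels, the same generator would produce 13 splits, each with 12 training types and one held-out type.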
Extended Data Fig. 11 Extended assessment of Liquid EcoTyper using paired tumour and plasma profiles.
a, Box plot showing Spearman correlations between SE levels inferred from EM-seq profiles of paired FFPE and fresh-frozen tumour samples from patients with melanoma (n = 5 pairs, Supplementary Table 17). b, Scatter plots showing Spearman correlations between SE levels in cfDNA and paired tumour EM-seq (n = 20 patients; see also Fig. 4d). SE levels are ranked and normalized to 0–1 within each compartment (Supplementary Table 18). c, Box plot summarizing Liquid EcoTyper performance on paired tumour Visium and plasma cfDNA EM-seq data (n = 15 pairs) across varying downsampled plasma cfDNA EM-seq sequencing depths. Importantly, across evaluable CpG sites, all plasma EM-seq samples in this study had a minimum median sequencing depth of 10× (Supplementary Table 21). d, Comparison of Liquid EcoTyper predictions on plasma cfDNA methylation data with missing CpG values imputed using the dataset mean (x-axis) or by random beta values (y-axis) (Methods). Consistency was assessed using Spearman correlation. e, Box plots summarizing Liquid EcoTyper performance on paired tumour Visium and plasma cfDNA EM-seq data (n = 15 pairs) with different CpG imputation strategies, related to panel d. f, Scatter plot showing the consistency of Liquid EcoTyper performance for SE recovery (coloured as in a) from simulated plasma cfDNA (x-axis) and real plasma cfDNA (y-axis). The x-axis is identical to Fig. 4c whereas the y-axis shows mean Spearman correlations from Visium versus plasma EM-seq (Fig. 4f) and tumour EM-seq versus plasma EM-seq (panel b). Concordance was assessed by Pearson correlation, with the identity line (y = x) shown as a comparator. Colours in c and e are defined in panel a. Group comparisons in c and e were evaluated by two-sided Wilcoxon signed-rank tests. Correlations in b, d, and f were evaluated by two-sided t tests. Linear regression lines with 95% confidence intervals are shown in b and d for display. 
In a, c, and e, the box centre lines, bounds of the box, and whiskers denote medians, 1st and 3rd quartiles, and minimum and maximum values within 1.5 × IQR (interquartile range) of the box limits, respectively.
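The box plot convention stated in these legends can be made concrete with a short sketch. It assumes the "inclusive" (linear interpolation) quartile definition; plotting packages differ in how they compute quartiles, and the definition used for the figures is not stated here. The function name `tukey_box_stats` and the input values are hypothetical.

```python
from statistics import quantiles


def tukey_box_stats(values):
    """Return the five numbers drawn in a box plot with 1.5 x IQR whiskers."""
    data = sorted(float(v) for v in values)
    # Quartiles by linear interpolation over the observed range (assumption).
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    # Whiskers end at the most extreme observations inside the fences.
    inside = [v for v in data if low_fence <= v <= high_fence]
    return {
        "median": median,
        "q1": q1,
        "q3": q3,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
    }


# Arbitrary example values; 100 lies beyond the upper fence (outlier).
stats = tukey_box_stats([1, 2, 3, 4, 5, 6, 7, 8, 100])
print(stats)
```

In this example the upper whisker stops at 8 rather than 100, because 100 exceeds the third quartile by more than 1.5 × IQR.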
Extended Data Fig. 12 Non-invasive early assessment of melanoma patient survival and immunotherapy response with liquid SEs.
a, Box plots showing inferred SE8 levels in pretreatment plasma from 78 patients with melanoma stratified by ICI response and shown for distinct ICI therapies. b, Kaplan–Meier plots showing differences in progression-free survival (left) and overall survival (right) of melanoma patients dichotomized into high and low groups based on the median of inferred SE8 levels in pretreatment plasma. c, Same as the right panel of b but for SE7 stratified by ICI therapy type (median split was determined from the entire 78-patient Yale cohort). d, Same as c but for SE4. In b–d, statistical significance was determined by a two-sided log-rank test. Hazard ratios (HRs) are shown along with 95% confidence intervals (CIs) in brackets. e, Univariate (left) and multivariable (right) Cox regression models comparing continuous liquid SE levels with key clinical indices for prediction of overall survival (see also Supplementary Table 28). Of note, all analyses are shown for the 78 patients in the Yale ICI cohort except for BRAF mutation status (mutant vs. wildtype), for which 65 of 78 patients were evaluable (Supplementary Table 19). Bars above the horizontal dashed line are statistically significant (P < 0.05). ns, not significant. f, Box plots showing inferred SE7 (left), SE8 (centre), and SE4 (right) levels in pretreatment plasma stratified by ICI response across datasets from different institutions (Supplementary Tables 19 and 20). The dashed line was determined by maximizing Youden’s J statistic in the Yale cohort. The area under the receiver operating characteristic (ROC) curve (AUC) was calculated within each cohort separately. g, Same as panel a but showing ctDNA levels in the Yale cohort (n = 60 evaluable patients, Supplementary Table 26). h, Same as e but showing univariate and bivariable Cox models comparing continuous liquid SE levels against ctDNA levels in all melanoma patients treated with ICI with evaluable ctDNA (n = 60, Supplementary Table 26). 
i, Same as h but shown for melanoma patients with liquid SE levels and paired TMB (left, n = 38 patients) or liquid SE levels and paired PD-L1 staining (right, n = 15 patients) (Supplementary Table 24). See also Supplementary Table 28. In a, f, and g, statistical significance was determined using two-sided Wilcoxon rank-sum tests. In e, h, and i, p-values were determined using two-sided Wald tests. In a, f, and g, the box centre lines, bounds of the box, and whiskers denote medians, 1st and 3rd quartiles, and minimum and maximum values within 1.5 × IQR (interquartile range) of the box limits, respectively. DCB, durable clinical benefit; NDB, non-durable clinical benefit.
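The threshold selection described for panel f (maximizing Youden's J statistic in one cohort, then applying the same cut-off elsewhere) can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the function name `youden_threshold` is hypothetical, the scores and labels are invented, and higher scores are assumed to indicate the positive class, which may be reversed for a given SE.

```python
def youden_threshold(scores, labels):
    """Return (threshold, J) maximizing J = sensitivity + specificity - 1.

    A sample is called positive when its score is >= threshold.
    `labels` uses 1 for the positive class and 0 for the negative class.
    """
    positives = sum(1 for y in labels if y == 1)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("both classes are required")
    best_threshold, best_j = None, float("-inf")
    # Every observed score is a candidate cut-off.
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= t)
        tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < t)
        j = tp / positives + tn / negatives - 1.0
        if j > best_j:
            best_threshold, best_j = t, j
    return best_threshold, best_j


# Invented discovery-cohort data (not from the study).
discovery_scores = [0.1, 0.2, 0.3, 0.5, 0.6, 0.8, 0.9]
discovery_labels = [0, 0, 1, 0, 1, 1, 1]
threshold, j_stat = youden_threshold(discovery_scores, discovery_labels)

# The fixed cut-off is then applied unchanged to an independent cohort.
validation_scores = [0.25, 0.65, 0.95]
validation_calls = [s >= threshold for s in validation_scores]
```

Fixing the cut-off in the discovery cohort before applying it to other cohorts avoids re-tuning the threshold on the data used to evaluate it.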
Supplementary information
Supplementary Information
Supplementary Figures 1–9, Supplementary Methods, Supplementary Notes and Supplementary References
Supplementary Tables
Supplementary Tables 1–28
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Zhang, W., Brown, E.L., Usmani, A. et al. Non-invasive profiling of the tumour microenvironment with spatial ecotypes. Nature (2026). https://doi.org/10.1038/s41586-026-10452-4
DOI: https://doi.org/10.1038/s41586-026-10452-4




