Cross-tissue multicellular coordination and its rewiring in cancer

Shi, Qiang; Chen, Yihan; Li, Yang; Qin, Shishang; Yang, Yu; Gao, Yang; Zhu, Linnan; Wang, Dongfang; Zhang, Zemin

doi:10.1038/s41586-025-09053-4

Article
Open access
Published: 28 May 2025

Nature volume 643, pages 529–538 (2025)

Abstract

The multicellular coordination that underlies tissue homeostasis and disease progression is of fundamental interest^1,2,3,4,5. However, how diverse cell types are organized within tissue niches for cohesive functioning remains largely unknown. Here we systematically characterized cross-tissue coordinated cellular modules in healthy tissues, uncovering their spatiotemporal dynamics and phenotypic associations, and examined their rewiring in cancer. We first compiled a comprehensive single-cell transcriptomic atlas from 35 human tissues, revealing substantial inter-tissue variability in cellular composition. By leveraging covariance in cellular abundance, we identified 12 cellular modules with distinct cellular compositions, tissue prevalences and spatial organizations, and demonstrated coordinated intercellular communication within cellular modules using in situ spatial and in vivo perturbation data. Among them, two immune cellular modules in the spleen showed contrasting chronological dynamics with ageing. Analysis of multicellular changes in the breast revealed a menopausal trajectory associated with fibroblast dynamics. Furthermore, interrogation across cancer types uncovered simultaneous rewiring of two types of multicellular ecosystem during tumour progression, including the loss of tissue-specific healthy organization and the emergence of a convergent cancerous ecosystem. These findings reveal fundamental organizing principles of multicellular ecosystems in health and cancer, laying a foundation for further investigations into tissue-level functional coordination across diverse contexts.

Main

The human organism is composed of an intricate mosaic of approximately 37 trillion diverse cells, harmoniously arranged into specialized tissues, organs and interconnected systems. Over the past decade, advances in single-cell technologies and initiatives such as the Human Cell Atlas⁶ have enabled comprehensive profiling of various human tissues, revealing a substantial number of previously uncharacterized cell types^7,8,9. However, it remains largely unclear how multiple cell types precisely coordinate within their native niches to perform essential physiological functions, and which components are particularly constrained under pathological conditions. In the context of tumours, for instance, the complex network of cellular interactions that underlie pro- and anti-tumour responses is still not fully elucidated¹⁰. This knowledge gap hampers our understanding of tissue organization and function, impeding the development of more effective strategies for disease diagnosis, monitoring and intervention.

Previous studies aimed to establish connections among cell types by measuring cell–cell interactions based on known ligand–receptor pairs^11,12. Although these efforts have provided valuable insights into intercellular communication within specific contexts, they fall short of systematically unravelling complex tissue-level signalling networks, in which multiple cell types are coordinated into higher-order functional modules. Additionally, spatially resolved technologies have been utilized to characterize cellular organization in tissues^13,14,15,16; however, the limited number of genes that has been profiled and the lack of bona fide single-cell resolution restrict their ability to comprehensively portray those intricate multicellular structures. Notably, functionally related organs may share similar functional modules, as exemplified by mucosal immunity in both the small intestine and colon¹³, suggesting the cross-tissue potential of multicellular ecosystems. Therefore, a systematic investigation of multicellular coordination is essential to elucidate fundamental units across human tissues.

A single-cell atlas of the human

To systematically elucidate the multicellular coordination, we assembled a comprehensive pan-tissue transcriptomic atlas at single-cell resolution. Following stringent quality control, we obtained a total of 2,293,951 high-quality cells from 706 healthy samples across 35 human tissues for downstream analyses (Fig. 1a, Extended Data Fig. 1 and Supplementary Table 1). To harmonize such extensive datasets, we evaluated several widely used data integration tools using the scIB platform¹⁷, with BBKNN¹⁸ emerging as the top performer (Extended Data Fig. 2). Of note, uniform manifold approximation and projection (UMAP) embedding demonstrated distinct separation among various cell types and effective integration across different sexes, tissues and datasets (Fig. 1b and Supplementary Fig. 1a).

We performed unsupervised cell clustering and annotation hierarchically. Initially, we established eight broad cell types using canonical markers, consistent with their annotations in the original studies (Fig. 1b and Supplementary Fig. 1b,c). Notably, approximately 45% of the cells in our atlas were attributed to the immune compartment (Supplementary Fig. 1c), providing a solid foundation for studying the phenotypic diversification of immune cells¹⁹. The composition of these cell types exhibited noticeable variation among diverse tissues (P < 2.2 × 10⁻¹⁶, Chi-squared test) (Fig. 1c). For example, peripheral blood and immune organs or tissues, including the bone marrow, lymph nodes, omentum, spleen and thymus, were predominantly composed of immune cells, aligning with their roles in immune cell production, maturation or storage. By contrast, the reproductive tissues, such as the uterus, vagina and testis, exhibited a higher proportion of stromal cells.
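The Chi-squared comparison of cell-type composition across tissues can be illustrated with a small worked example. The sketch below is not the authors' code: the tissue names are taken from the text, but the counts and the helper `chi_squared_statistic` are hypothetical and chosen only to show how a tissue-by-cell-type contingency table yields the test statistic and its degrees of freedom.

```python
# Hypothetical cell counts per broad cell type in three tissues
# (illustrative numbers only, not taken from the atlas).
counts = {
    "spleen": {"immune": 900, "stromal": 60, "endothelial": 40},
    "uterus": {"immune": 150, "stromal": 700, "endothelial": 150},
    "lung": {"immune": 400, "stromal": 300, "endothelial": 300},
}


def chi_squared_statistic(table):
    """Return (statistic, degrees_of_freedom) for a tissue x cell-type table."""
    rows = list(table)
    cols = list(next(iter(table.values())))
    row_tot = {r: sum(table[r][c] for c in cols) for r in rows}
    col_tot = {c: sum(table[r][c] for r in rows) for c in cols}
    total = sum(row_tot.values())
    stat = 0.0
    for r in rows:
        for c in cols:
            # Expected count if composition were independent of tissue.
            expected = row_tot[r] * col_tot[c] / total
            stat += (table[r][c] - expected) ** 2 / expected
    dof = (len(rows) - 1) * (len(cols) - 1)
    return stat, dof


stat, dof = chi_squared_statistic(counts)
print(f"chi-squared = {stat:.1f}, degrees of freedom = {dof}")
```

A large statistic relative to the degrees of freedom indicates that composition differs between tissues; converting it to a P value requires the Chi-squared distribution, which is left to a statistics library.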

Next, each cell type was further dissected into several distinct cell states or subsets (Fig. 1d). Despite sharing similar transcriptomic profiles, cell subsets within the same cell types displayed clear preferences for specific tissues (Supplementary Fig. 2). For instance, endothelial cells—which are critical components of the vasculature—exhibited unique delineations based on their sources, whether from blood or lymphatic vessels. Additionally, a rare cell subset called age-associated memory B cells (ABC, B08), which constituted less than 1% of total B cells, was prominently present not only in the liver, bone marrow and spleen—as previously reported^7,20—but also in unexpected locations such as the ureter and skeletal muscle.

Among stromal cells, fibroblasts exhibit substantial diversity, encompassing both universal and specialized subsets²¹. We identified 12 fibroblast subsets across various tissues, with subsets S01 to S04 being the most widely distributed (Supplementary Fig. 3a–d and Supplementary Table 2). These universal subsets were considerably depleted in the reproductive system, whereas reproductive-system-specific subsets appeared more specialized to specific tissues rather than the entire reproductive system (Fig. 1e). This suggests unique architectural adaptations related to reproductive functions. Spatially resolved transcriptomics revealed distinct spatial distribution patterns for two uterus-specific fibroblast subsets, with S11 corresponding to the endometrium and S12 to the myometrium²² (Fig. 1f). Moreover, while exhibiting the strongest specificity for particular tissues, epithelial cells within the same human body systems displayed higher similarities than those among different systems (Supplementary Fig. 3e). In sum, we constructed a comprehensive pan-tissue single-cell atlas (http://cm.cancer-pku.cn) and revealed substantial preferences of cell subsets across various tissues.

Identification of cross-tissue CMs

A notable illustration of multicellular coordination is evident in gut-associated mucosal immunity, where diverse cell types such as lymphocytes, dendritic cells and epithelial cells collaborate to defend against pathogenic insults²³. Our study aims to determine whether such coordinated multicellular ecosystems represent a recurrent theme across the human body, thereby addressing the fundamental question of how these ecosystems contribute to cohesive functional coordination among diverse cell types at the tissue level.

To investigate this, we translated the concept of multicellular coordination into representations of co-occurring cellular networks and developed a computational framework, CoVarNet. It reconstructs cellular module (CM) networks by leveraging covariance in cell subset frequencies across samples through two parallel modules (Fig. 2a and Methods). The first module utilizes non-negative matrix factorization (NMF) to identify a set of factors, prioritizing cell subsets on the basis of their weights. The top subsets from each factor serve as co-occurring nodes in the CM network. The second module determines specifically correlated subset pairs, which act as potential edges in the network. Multiple CM networks are then constructed by connecting these co-occurring nodes through the potential edges, followed by topological and statistical evaluations.
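The two-module idea described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the CoVarNet implementation: the toy frequency matrix, the function names (`nmf`, `pearson`, `build_modules`), the multiplicative-update NMF solver, and the plain Pearson threshold used in place of the framework's "specifically correlated" criterion and its topological and statistical evaluations are all hypothetical stand-ins.

```python
import random
from itertools import combinations

# Toy subset-by-sample frequency matrix (hypothetical; each column sums to 1).
subsets = ["A", "B", "C", "D"]
V = [
    [0.50, 0.40, 0.45, 0.05, 0.10, 0.05],  # A
    [0.40, 0.50, 0.45, 0.05, 0.05, 0.10],  # B
    [0.05, 0.05, 0.05, 0.50, 0.40, 0.45],  # C
    [0.05, 0.05, 0.05, 0.40, 0.45, 0.40],  # D
]


def transpose(A):
    return [list(col) for col in zip(*A)]


def matmul(A, B):
    cols_b = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols_b] for row in A]


def nmf(V, k, iters=500, seed=0, eps=1e-9):
    """Factor V (subsets x samples) into W (subsets x k) and H (k x samples)
    using multiplicative updates for the squared-error objective."""
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    W = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    H = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]
    for _ in range(iters):
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(matmul(Wt, W), H)
        H = [[H[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)]
             for i in range(k)]
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(W, matmul(H, Ht))
        W = [[W[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)]
             for i in range(n)]
    return W, H


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy) if sx and sy else 0.0


def build_modules(V, names, k=2, top_n=2, min_r=0.5):
    """Nodes: top-weighted subsets per NMF factor. Edges: correlated node pairs."""
    W, H = nmf(V, k)
    modules = []
    for f in range(k):
        ranked = sorted(range(len(names)), key=lambda i: W[i][f], reverse=True)
        nodes = sorted(names[i] for i in ranked[:top_n])
        edges = [
            (a, b)
            for a, b in combinations(nodes, 2)
            if pearson(V[names.index(a)], V[names.index(b)]) > min_r
        ]
        modules.append({"nodes": nodes, "edges": edges})
    return modules, H


modules, H = build_modules(V, subsets)
for module in modules:
    print(module)
```

In this sketch the rows of `H` play the role of per-sample module activities, which is how the NMF coefficient matrix is used later in the text.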

**Fig. 2: Systematic identification of cross-tissue CMs.**

Applying this framework to our pan-tissue atlas, we identified a total of 12 CMs (Fig. 2b, Extended Data Fig. 3a–d and Supplementary Fig. 4). In light of our focus on cross-tissue multicellular coordination, epithelial cells, owing to their highly tissue-divergent nature, were excluded from this analysis. The vast majority (88.16%, 67 out of 76) of non-epithelial cell subsets participated in at least one CM, whereas the remaining subsets (9 out of 76) were excluded either owing to technical underrepresentation, including germinal centre B cells (B09) and neutrophils (M13 and M14), or owing to their highly specialized roles in a single tissue, as in the case of skeletal muscle cells (S18). In particular, one-quarter of these subsets (17 out of 67) were involved in multiple CMs, and none of the networks clearly exhibited the presence of a hub node. These findings are consistent with fundamental biological principles, emphasizing that all cell types are functionally distinct, with some serving unique roles in different tissues. Additionally, we utilized the coefficient matrix from NMF as a measure of CM activities for each sample (Extended Data Fig. 3e). As expected, the activities of all CMs exhibited positive correlations with the frequencies of their component cell subsets across tissues (Supplementary Fig. 5). Given that each sample hosted a predominant CM, we categorized all samples into 12 CM types (CMTs) according to their most abundant CMs. Notably, all CMTs were composed of multiple tissues, indicating the cross-tissue nature of the identified CMs (Extended Data Fig. 3e,f).
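As a small illustration of the last two steps, the snippet below assigns each sample to the CM with the highest activity and reproduces the participation percentage quoted above. The activity values, sample names and the helper `assign_cmt` are hypothetical; only the counts 67 and 76 come from the text.

```python
# Hypothetical CM activities: one list of per-sample values for each CM.
samples = ["s1", "s2", "s3"]
activities = {
    "CM01": [0.6, 0.1, 0.2],
    "CM05": [0.3, 0.7, 0.1],
    "CM10": [0.1, 0.2, 0.7],
}


def assign_cmt(activities, samples):
    """Label each sample with its most active (predominant) CM."""
    return {
        sample: max(activities, key=lambda cm: activities[cm][j])
        for j, sample in enumerate(samples)
    }


cmt = assign_cmt(activities, samples)
print(cmt)  # {'s1': 'CM01', 's2': 'CM05', 's3': 'CM10'}

# Share of non-epithelial subsets that participate in at least one CM.
participation = 67 / 76
print(f"{participation:.2%}")  # 88.16%
```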

To validate the robustness of these results, we conducted an integrative analysis combining single-cell RNA sequencing (scRNA-seq) data with approximately 12,000 RNA-sequencing (RNA-seq) profiles from the Genotype-Tissue Expression (GTEx) project²⁴ (Extended Data Fig. 4a,b and Methods). Of note, all CMTs could be well recovered, and each bulk sample also had a predominant CM (Extended Data Fig. 4c,d). Furthermore, the expression patterns of markers for CM-related cell subsets demonstrated strong concordance between the two data types (Extended Data Fig. 4e), further confirming the validity of the CM landscape that we established.

CM annotations

Most CMs exhibited notable preferences for specific human body systems (Fig. 2c), validated by external GTEx data (Extended Data Fig. 4f). Among the 12 identified CMs, CM04, CM05, CM06 and CM09 were enriched in primary immune organs (bone marrow and thymus), secondary immune organs (lymph nodes and spleen) and peripheral blood. This enrichment correlated with their cellular compositions, primarily comprising immune subsets prevalent in these tissues (Fig. 2b). Similarly, CM07 and CM12 demonstrated preferences for the reproductive system, aligning with the presence of those specialized fibroblasts predominantly found in the ovary, vagina, breast and prostate (Figs. 1e and 2b). CM02 and CM03 were mainly distributed in the urinary system and gastrointestinal tract, whereas CM08 was enriched in barrier tissues such as the skin, oral mucosa, tongue, vagina and trachea, indicating that these CMs may represent multicellular ecosystems within mucosa-associated lymphoid tissues. Additionally, CM10 appeared to function as a vascular unit, characterized by pericytes (S13), smooth muscle cells (S14) and vascular endothelial cells (E01 and E02) (Fig. 2b and Extended Data Fig. 3c), and was enriched in the vasculature, heart, skin and fat tissues. By contrast, CM11 showed enrichment in the lung, kidney, liver and fat, implying roles in metabolic processes. CM01, characterized by tissue-resident macrophages (M09), universal fibroblasts (S03 and S05) and lymphatic endothelial cells (E05) (Fig. 2b and Extended Data Fig. 3c), was broadly distributed across nearly all human body systems (Fig. 2c and Extended Data Fig. 3f), suggesting a universal multicellular organization. Collectively, these findings demonstrate that the identified CMs exhibit distinct preferences for multiple tissues, thus representing cross-tissue multicellular ecosystems (Fig. 2d).

Spatial characteristics of CMs

To further characterize CMs in a spatial context, we mapped them onto spatial transcriptomics data (Visium) using cell2location²⁵ (Supplementary Table 3). We first examined CM08 and CM10, both of which were enriched in the skin. Our analysis revealed prominent spatial colocalization of cell subsets within CMs (Extended Data Fig. 5a–c). CM08, which included various immune cells such as dendritic cells and T cells, was localized to the epidermis and adjacent dermis layers, where it is likely to have a key role in immune defence. By contrast, CM10 represented components of the blood vessel that are located primarily in the dermis layer. Of note, some CM08 subsets, such as vein endothelial cells (E04), also overlapped with CM10 subsets, highlighting the interconnection of different CMs within the tissue microenvironment. These findings were consistently observed across samples from six available donors (Extended Data Fig. 5d), underscoring the robust existence of CMs.

Out of the 12 identified CMs, CM02, CM03 and CM05 exhibited prominent enrichment in the small intestine (Fig. 2c), a tissue with distinct anatomical organization¹³. Consistent spatial patterns of these CMs were observed in four sequential ileum sections from a single donor (Fig. 3a,b and Extended Data Fig. 5e,f). Unsupervised spatial clustering using CellCharter²⁶ further validated this organization: CM05 aligned with the C2 niche, whereas CM02 and CM03 aligned with the C1 niche (Fig. 3c and Extended Data Fig. 5g). Notably, CM05 showed elevated spatial concentration in the Peyer’s patch region (Extended Data Fig. 5h), where its cell subsets, including naive B cells (B03), naive T cells (CD4T03) and follicular helper T cells (CD4T04), displayed notable colocalization (Fig. 3a,b). By contrast, CM02 and CM03 were located primarily in the intestinal mucosa, in line with their composition of IgA-producing plasma cells (B12), memory (CD4T06) and tissue-resident memory (CD8T03 and CD8T04) T cells, as well as innate immune cells (I08 and I09) (Fig. 3a,b). These results suggest that CM05 and CM02/CM03, respectively, recapitulate the inductive and effector modules of mucosal immunity, underscoring their potential functional roles within tissue ecosystems.

**Fig. 3: Spatially resolved multicellular coordination in CMs.**

To deepen our understanding of CM02 and CM03, we utilized high-resolution Xenium data, enabling precise in situ characterization of hundreds to thousands of genes within cells and tissues. Using intestinal Xenium data, we designed a gene panel to differentiate the multiple cell subsets within CM02 and CM03, with gene transcript density representing the intensity of individual CMs (Extended Data Fig. 5i). Our analysis revealed a notable enrichment of CM03 in the lamina propria, whereas CM02 exhibited a more uniform distribution across the tissue (Fig. 3d,e). This subtle difference, which cannot be detected from Visium data alone, highlights not only the importance of integrating single-cell and spatial data, but also the superior sensitivity of single cell-based approaches in dissecting multicellular ecosystems with greater precision.

Multicellular communication within CMs

The formation and maintenance of CMs within tissue niches may rely on the local microenvironment. We hypothesized that diverse cell subsets within CMs are spatially organized to respond collectively, with cellular crosstalk potentially varying across different tissue microenvironments.

To explore this, we first investigated the spatial organization of CM components using Visium data across various tissues (Supplementary Table 3). Our analysis revealed a strong association between CM spatial organization and composition. The highest spatial concordance was observed in CMs with a greater proportion of lymphocytes, followed by myeloid cells, stromal cells and endothelial cells (Fig. 3f and Extended Data Fig. 6a,b). To further understand the implications of these patterns, we analysed ligand–receptor-mediated cell–cell communication using single-cell data with CellPhoneDB²⁷ (Methods). Notably, endothelial cells and stromal cells produced a wider variety of ligands compared with lymphocytes (B cells, T cells and innate lymphoid cells) (Extended Data Fig. 6c). This aligns with the notion that spatial proximity may enhance the specificity of intercellular communication, suggesting that lymphocyte-enriched CMs foster particular interactions within local niches. Conversely, CMs with a higher proportion of stromal and endothelial cells—often situated farther apart—generated a more diverse array of signalling molecules, facilitating broader signalling interactions over longer distances (Extended Data Fig. 6d,e). These insights highlight the relationship among spatial organization, cellular composition and intercellular communication.

To investigate the effect of tissue microenvironments on intercellular communication within CMs, we assessed cell–cell interactions within CMs using single-cell samples from different CMTs (Extended Data Fig. 3e and Methods). Only six CMs with high colocalization scores were analysed. Our analysis revealed enhanced cell–cell interactions within samples with high CM activities, except for CM08 (Fig. 3g), implying distinct cellular phenotypes shaped by tissue microenvironments in a CM-dependent manner. To further investigate such phenotypes, we utilized recent in vivo perturbation data from the Immune Dictionary²⁸ to identify cell-type-specific cytokines that induced these cellular phenotypes (Methods). We found that half of the cell subsets showed responsive associations with at least one cytokine and CMs with high colocalization scores tended to harbour more diverse cytokines (Extended Data Fig. 6f,g and Supplementary Table 4). Diverse cell subsets clustered together on the basis of both their cell types and CM identities (Extended Data Fig. 6f), suggesting that cellular phenotypes are collectively determined by intrinsic properties and local stimuli. For example, CD8⁺ effector memory T cells (CD8T02) exhibited distinct cytokine responses among different CMs, with TNF identified in CM02, CM08 and CM09, but absent in CM04 and CM06 (Extended Data Fig. 6f). Additionally, we examined spatial distribution of cytokines using the intestine data (Fig. 3a). Despite the transient and low-level expression characteristics of many cytokine genes, our analysis successfully validated their spatial enrichment, such as IL7 and IL18 in the CM02/CM03 (C1) niche, as well as LTA and LTB in the CM05 (C2) niche (Fig. 3h–j). These results provide a comprehensive overview of the cytokine-mediated regulatory landscape within CMs.

Collectively, our analyses of spatial organization and intercellular communication demonstrate that CMs, as fundamental tissue configurations, effectively recapitulate the complexity of multicellular ecosystems. These findings emphasize the interplay among diverse cell types within tissue microenvironments, providing insights into the mechanisms underlying tissue homeostasis.

Coordinated ageing dynamics in the spleen

To further investigate the significance of CMs, we assessed CM associations with phenotypic data. As CM activities were primarily influenced by the tissue (Extended Data Fig. 7a), we conducted a systematic interrogation of individual CMs within tissue-specific contexts (Supplementary Fig. 6 and Methods). This analysis revealed comparable CM patterns between male and female individuals in nearly all non-reproductive tissues (Extended Data Fig. 7b), whereas the thymus exhibited notable variation across age groups (Extended Data Fig. 7c). Consistent with age-related thymic involution, the thymus-enriched CM09 showed reduced activity in older individuals, with components such as naive T cells (CD8T01) and regulatory T cells (CD4T08) being more abundant in younger individuals (Extended Data Fig. 7d).

Another noteworthy age-related association was observed in the spleen, where CM05 increased chronologically and CM06 decreased (Fig. 4a and Extended Data Fig. 7c). As the spleen included various immune cells (Fig. 1c), we systematically examined all immune subsets in the spleen, identifying ten that varied across age groups (Extended Data Fig. 7e,f). Of these, 80% (8 out of 10) were components of CM05 or CM06 (Fig. 4b), highlighting that these cross-tissue CMs effectively captured tissue-specialized variations. Notably, the expansion of four CM05 subsets (B03, B05, CD4T03 and I06) was more pronounced with ageing than that of the previously reported ABCs (B08)^7,20 (Fig. 4b and Extended Data Fig. 7g), suggesting their potential roles in the ageing process. The accumulation of NR4A1-high CD4⁺ T cells (CD4T03) in the spleen with ageing might be explained by the combined effects of immune tolerance²⁹ and thymic involution.

**Fig. 4: Multicellular dynamics in context.**

To gain deeper insights into these coordinated dynamics, we applied SCENIC³⁰ to uncover the regulators of these subsets. For each of the four CM05 subsets, we determined their specific regulons relative to other subsets within their cell types (Methods and Supplementary Table 5). Of note, these subsets exhibited notable overlap in regulons, and the convergent regulons in CM05 showed increased activity with ageing (Fig. 4c), shedding light on the common regulatory mechanisms that underlie multicellular coordination. Additionally, these subsets shared a set of signature genes, including 20 transcription factor genes such as NR4A1 and NR4A2 (Extended Data Fig. 8a–c). Further analysis identified eight key transcription factor genes (ATF3, FOS, FOSB, JUN, JUNB, JUND, KLF6 and NFKB1) as both regulons and signature genes (Extended Data Fig. 8c). These key regulators tended to act as regulatory hubs (Extended Data Fig. 8d), targeting many common genes across different cell types (Extended Data Fig. 8e). These findings align with previous reports in mice that highlight the activation of Jun and Fos members of the AP-1 complex as a signature of immune ageing³¹. Of note, NR4A1, a key mediator of T cell dysfunction²⁹, was regulated by multiple key regulators across cell types (Extended Data Fig. 8e), suggesting possible immune dysfunction associated with ageing. In summary, these results in the spleen, the largest peripheral immune organ in adults, underscore coordinated behaviour at the molecular, cellular and multicellular levels. Further investigation is warranted to identify the functional mechanisms underlying these dynamics in human ageing.

Fibroblast-engaged menopausal trajectory

To explore the role of fibroblasts within multicellular ecosystems, we focused on breast-enriched CM12, which comprised three specialized fibroblast subsets (S06, S09 and S10) alongside other diverse cells (Fig. 2b,c). We first utilized DIALOGUE³ to investigate whether local microenvironments in the breast triggered CM12-related coordinated multicellular programmes (MCPs), representing combinations of gene programmes across different cell subsets. Notably, this analysis identified an MCP that was upregulated in pre-menopausal samples compared with post-menopausal samples (Fig. 4d,e, Supplementary Fig. 7 and Supplementary Table 6). For instance, expression of genes such as SCGB2A2 and SCGB1D2 decreased in post-menopausal samples across most subsets (Fig. 4d), aligning with previous reports highlighting SCGB2A2 as a promising biomarker for breast cancer detection³². Additionally, many inflammatory genes, including human leukocyte antigen genes, were found to be more highly expressed in pre-menopausal samples across cell subsets (Supplementary Table 6). Additional analysis confirmed a decrease in inflammatory scores after menopause (Extended Data Fig. 9a). Although this may seem at odds with literature suggesting a systemic increase in inflammation with ageing³³, we hypothesize that breast tissue, as a reproductive organ, is more strongly influenced by levels of oestrogen, leading to localized reductions in inflammation. These results indicate coordinated phenotypic shifts among diverse subsets within CM12 in response to menopause.

Next, we systematically examined the association between all non-epithelial subsets and menopausal status in the breast. Fibroblasts showed a particularly strong association with menopause, with subsets S10 and S06 exhibiting the most notable decreases in post-menopausal samples (Fig. 4f and Extended Data Fig. 9b). Notably, S10 displayed high expression of collagen genes (COL1A1, COL1A2 and COL3A1) (Extended Data Fig. 9c), consistent with previous reports demonstrating a reduction in such fibroblasts in women over the age of 50³⁴. Spatial analysis revealed that S10 was uniformly distributed across the connective tissue, whereas S06 was located primarily in the lobules (Fig. 4g). By contrast, immune subsets and S09 fibroblasts within CM12 were found to colocalize in the ductal tissues of the breast (Extended Data Fig. 9d), with little change in their abundance observed after menopause. These findings underscore distinct functional roles of different fibroblast subsets in the breast.

In line with these molecular and cellular observations, post-menopausal women exhibited decreased CM12 activity compared with pre-menopausal women (Fig. 4f). Of note, this decrease was less pronounced in women over 50 (Extended Data Fig. 9e), suggesting that CM12 recapitulates menopause-associated biological changes rather than merely chronological variation. Further tailored studies are needed to fully elucidate the respective contributions of ageing and menopause, as well as their potential interactions.

Given that these changes are often progressive, we hypothesized that alterations in CM12 could serve as indicators of menopausal progression. Using the frequencies of CM12 cell subsets, we identified a menopausal trajectory that transitions from pre-menopausal to post-menopausal states (Fig. 4h). Specifically, the frequency of S10 exhibited a consistent decrease along this trajectory, whereas S06 tended to decrease later in the process (Fig. 4i). Notably, these fibroblasts (S06, S09, and S10) exhibited decreased inflammatory scores along the trajectory, particularly after menopause (Fig. 4j). Of note, we replicated the menopausal trajectory and fibroblast changes using an external breast dataset (Extended Data Fig. 9f,g), underscoring the robustness of our results. Together, these multicellular analyses highlight spatiotemporal dynamics of specialized fibroblasts in the breast.

Multicellular rewiring in cancer

To systematically understand the multicellular ecosystems in cancer, we extended our analysis to the tumour microenvironment (TME), a pathological tissue niche in which diverse immune and stromal cells interact to form a complex network¹⁰. We first established a comprehensive pan-cancer single-cell transcriptomic atlas comprising 1,062 clinical samples from 29 cancer types, identifying 91 cell subsets including 15 cancer-associated subsets previously reported across various cancer types^35,36,37,38,39 (Fig. 5a,b, Extended Data Fig. 10, Supplementary Fig. 8 and Supplementary Tables 7–9) (http://cm.cancer-pku.cn).

**Fig. 5: Rewiring of multicellular ecosystems in cancer.**

Previous studies have shown that histologically normal tissues adjacent to tumours harbour genetic alterations and manifest a unique intermediate state between healthy and tumour tissues^40,41. Thus, we used adjacent non-tumour samples as a surrogate for precancerous tissues. To examine multicellular dynamics during tumour progression, we focused on eight cancer types with matched healthy, tumour and adjacent non-tumour samples (Fig. 5c). A notable reduction in healthy CM activity was observed in tumour samples across cancer types, suggesting a widespread disruption of tissue-specific multicellular ecosystems (Fig. 5d,e and Extended Data Fig. 11a). Notably, CM08 maintained consistent activity across healthy, adjacent non-tumour and tumour samples, suggesting that the multicellular ecosystem in healthy tissues was relatively well-preserved in cutaneous squamous cell carcinoma (Extended Data Fig. 11a). This finding aligns with the superior response to immunotherapy observed in skin cancers, such as melanoma and cutaneous squamous cell carcinoma, compared with other cancer types⁴², emphasizing the significance of contextualizing healthy ecosystems within the framework of cancer research (Supplementary Figs. 9 and 10 and Supplementary Table 10).

Given the extensive remodelling of multicellular ecosystems in cancer, we next examined the co-occurrence of cell subsets across individual cancer types. Our analysis revealed that cancer-associated subsets frequently co-occurred across multiple cancer types (Fig. 5f), indicating the emergence of a convergent multicellular ecosystem shared across cancers. To further explore this, we applied CoVarNet to the eight cancer types and identified four cancer-associated CMs (cCMs) (Extended Data Fig. 11b–d). Among these, cCM02, composed primarily of cancer-associated cell subsets, was enriched in tumour samples from most cancer types (Extended Data Fig. 12a,b), representing a cancer-associated TME ecosystem. Notably, cCM02 activity progressively increased from healthy to adjacent non-tumour, and then to tumour samples across cancer types (Fig. 5g,h and Extended Data Fig. 12c), underscoring its role as an indicator of tumour progression. We also observed increased co-occurrence of cell subsets within cCM02 in tumour samples compared with adjacent non-tumour samples, providing discernible evidence of tumour progression (Fig. 5i). Together, these findings highlight simultaneous rewiring of two types of multicellular ecosystem during tumour progression—marked by the loss of tissue-specific healthy organizations and the emergence of a convergent cancerous ecosystem (Extended Data Fig. 12d).

Further cytokine analysis for cCM02 revealed that key mediators of intercellular regulation included interferon, IL-18 and IL-15 (Extended Data Fig. 12e and Supplementary Table 11), aligning with a recent study highlighting CD8⁺ T cell-derived IFNγ as a critical modulator of the TME compared with TNF⁴³. Additionally, DIALOGUE analysis identified an MCP associated with increased cCM02 activity in tumour samples, characterized by upregulation of S100 family member genes (S100A2, S100A9 and S100A8)⁴⁴ across most cell subsets (Extended Data Fig. 12f and Supplementary Table 12). This TME programme was validated using external datasets from The Cancer Genome Atlas (TCGA) (Fig. 5j). To assess its clinical significance, we examined its expression in pre-invasive lung lesions from 51 individuals with known outcomes⁴⁵. Remarkably, pre-invasive lesions that progressed to invasive lung cancer exhibited higher expression of this programme compared with those undergoing spontaneous regression (Fig. 5k), suggesting its potential for early diagnosis of cancer.

Discussion

Understanding how diverse cell types coordinate to maintain tissue homeostasis and contribute to disease progression remains a fundamental challenge in biology. Here we present a computational framework for systematically identifying cross-tissue, co-occurring CMs and their rewiring in cancer. The pan-tissue and pan-cancer single-cell atlases that we curated represent valuable resources for the community. CoVarNet bridges the gap between well-characterized cellular diversity and the complex organization and function of tissues. By linking CMs to phenotypic data, we uncovered fundamental biological insights, highlighting CMs as a scaffold for studying multicellular organization across diverse contexts. Together, these findings illuminate core principles of multicellular ecosystems and advance our understanding of tissue-level coordination in health and disease, opening avenues for future research and potential therapeutic insights.

Methodologically, CoVarNet offers several advantages over existing strategies for identifying cellular niches²⁶ or spatial domains⁴⁶ using spatial data. First, spatial datasets are often limited by gene coverage or resolution, impeding comprehensive profiling of multicellular ecosystems. By contrast, our approach leverages single-cell transcriptomes to define fine-grained CMs, which can then be mapped onto spatial data to capitalize on the strengths of both modalities. For instance, spatial mapping of CM02 and CM03 revealed distinct distributions that were not apparent from spatial data alone. This framework enables integrative, multimodal analysis across a wide range of biological contexts. Second, many existing methods rely on spatial proximity to infer intercellular relationships, potentially overlooking broader coordination. By contrast, our approach—based on co-occurrence patterns—captures both local and distal multicellular interactions, which may be essential for understanding complex networks such as systemic immunity and cross-tissue regulation.

This study has several limitations. Our current framework does not explicitly incorporate epithelial cells, the extracellular matrix or microbiome components. Including these essential elements in future analyses will provide a more holistic perspective on tissue-level functional coordination. Additionally, integrating coordinated intercellular networks with intracellular regulatory circuits⁴⁷ holds promise for a more nuanced understanding of tissue functions. Expanding the analysis to larger cohorts and a broader range of phenotypes will further advance our understanding of multicellular ecosystems and their implications for translational medicine.

Methods

Single-cell data collection and preprocessing of healthy samples

To assemble a comprehensive pan-tissue cell atlas, we collected scRNA-seq datasets and conducted quality control procedures via the Scanpy⁴⁸ toolkit, as detailed in subsequent sections (Extended Data Fig. 1 and Supplementary Table 1). Default parameters were used unless otherwise specified.

Data collection

We included scRNA-seq datasets from adult samples that met the following criteria: (1) utilization of fresh, not frozen, samples; (2) inclusion of samples based on cell-type enrichment: (a) no cell-type enrichment; (b) a mixture of immune, epithelial, endothelial and stromal compartments; (c) enrichment for either immune or non-immune cell populations; and (3) generation of single-cell, not single-nucleus, data using the 10x Genomics platform. These criteria were implemented to minimize batch effects across the datasets⁴⁹. Ultimately, a total of 33 datasets from 26 cohorts were included, collectively representing a cell atlas across 35 human tissues.

Quality control

To standardize datasets annotated with different versions of the human genome assembly, we limited the transcriptome to the common set of 21,812 genes found in the three most widely used 10x Genomics gene annotations, specifically GRCh38 (Ensembl 84), GRCh38 (Ensembl 93) and GRCh38 (GENCODE v32/Ensembl 98). Cells identified as low-quality or germ line in the original studies were excluded, and only cells meeting the following criteria were retained: 500–8,000 genes, 1,000–100,000 gene counts, and less than 20% mitochondrial gene counts. We applied Scrublet⁵⁰, integrated into Scanpy, to each cohort and removed cells with a doublet score exceeding the 90th percentile across all cohorts. We then excluded samples with fewer than 50 high-quality cells. In the end, the analysis comprised a total of more than 700 samples that passed the stringent quality control measures.
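The per-cell and per-sample thresholds described above can be illustrated with a minimal sketch in plain Python. The dictionary keys (`n_genes`, `total_counts`, `pct_mito`) and the function names are hypothetical and chosen only for illustration; the bounds are assumed to be inclusive, and the Scrublet doublet-removal step is not reproduced here.

```python
def passes_qc(cell):
    """Return True if a cell meets the stated thresholds.

    `cell` is a dict with hypothetical keys: n_genes, total_counts, pct_mito.
    Inclusive bounds are an assumption; the text gives only the ranges.
    """
    return (
        500 <= cell["n_genes"] <= 8000
        and 1000 <= cell["total_counts"] <= 100000
        and cell["pct_mito"] < 20.0
    )


def filter_samples(cells_by_sample, min_cells=50):
    """Keep QC-passing cells, then drop samples with fewer than min_cells of them."""
    kept = {}
    for sample, cells in cells_by_sample.items():
        good = [c for c in cells if passes_qc(c)]
        if len(good) >= min_cells:
            kept[sample] = good
    return kept
```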

Preprocessing

Beginning with the combined gene count matrix across all datasets, we derived the normalized gene expression matrix by normalizing total counts per cell (library size) using a scale factor of 10,000 followed by logarithmic transformation. Highly variable genes (HVGs) were then selected using the function scanpy.pp.highly_variable_genes with the following parameters: n_top_genes = 2000, flavor = “cell_ranger”, batch_key = “datasetID”. Notably, HVG selection was performed after removing specific genes, including immunoglobulin genes, T cell receptor genes, ribosomal protein-coding genes, heat shock protein-associated genes and mitochondrial genes. Several confounding effects, including total gene counts per cell, the percentage of mitochondrial gene counts and cell cycle, were regressed out using the function scanpy.pp.regress_out. Finally, HVGs were centred and scaled across all cells.
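As a worked illustration of the library-size normalization step, the following sketch rescales one cell's counts to a total of 10,000 and applies a log transform. The function name is hypothetical, and the use of the natural logarithm of (1 + x) is an assumption based on common single-cell practice rather than something stated in the text.

```python
import math


def normalize_log1p(counts, scale=10_000):
    """Scale one cell's gene counts to a fixed library size, then log-transform.

    Assumption: the log transform is the natural log of (1 + x).
    """
    total = sum(counts)
    if total == 0:
        # A cell with no counts has no defined proportions; return zeros.
        return [0.0 for _ in counts]
    return [math.log1p(c / total * scale) for c in counts]
```

For a cell with counts [1, 1, 2], the scaled values before the log step are 2,500, 2,500 and 5,000.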

Single-cell data integration and annotation

To integrate these extensive datasets, we used the Scanpy toolkit with default parameters unless otherwise specified.

Benchmarking integration methods

To determine the best integration method for our datasets, we utilized scIB¹⁷ to benchmark several widely used Python-based tools: BBKNN¹⁸, Harmony⁵¹, Scanorama⁵² and deep learning-based scVI⁵³, scANVI⁵⁴, and SCALEX⁵⁵. Among the 14 metrics in scIB, biological conservation for HVG and trajectory were not applicable, and the kBET metric was excluded owing to memory requirements exceeding 2 TB. Overall scores were calculated as a weighted mean (40/60) of batch correction and biological variance conservation. Importantly, we conducted two independent benchmarking analyses on the entire atlas and a subset atlas, respectively. In the end, BBKNN emerged as the top performer and was used for the integration of the pan-tissue datasets (Extended Data Fig. 2).
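The 40/60 weighting of the overall score amounts to a simple weighted mean, sketched here. The function and argument names are hypothetical; the two inputs are assumed to be the aggregated batch-correction and biological-conservation scores, each on a 0 to 1 scale.

```python
def overall_score(batch_correction, bio_conservation, w_batch=0.4, w_bio=0.6):
    """Weighted mean of the two aggregate scores (40% batch, 60% biology)."""
    return w_batch * batch_correction + w_bio * bio_conservation


def rank_methods(scores):
    """Sort method names by overall score, best first.

    `scores` maps a method name to a (batch_correction, bio_conservation) pair.
    """
    return sorted(scores, key=lambda m: overall_score(*scores[m]), reverse=True)
```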

Dataset integration

Principal component analysis was performed on the centred and scaled HVG expression matrix to extract 50 principal components. BBKNN, integrated into Scanpy, was then executed with the dataset as the batch variable. The batch-corrected graph was then utilized to perform UMAP⁵⁶ for visualizing cells on a two-dimensional layout.

Cell clustering and annotation

We performed at least two levels of unsupervised cell clustering and annotation. The first level of clustering was performed using the function scanpy.tl.leiden with resolution = 0.1, followed by identification of differentially expressed genes (DEGs; log₂-transformed fold changes >1, FDR < 0.05, Student’s t-test). The eight broad cell types were identified on the basis of canonical markers and DEGs. Cell annotation was additionally guided by CellTypist⁷, an automated cell-type annotation tool, using the Immune_All_High and Immune_All_Low models. Subsequently, further clustering (second or more levels) was performed using context-specific resolutions to obtain several distinct cell subsets for each cell type. Epithelial cells were excluded from further clustering owing to their highly tissue-specific nature. In total, 2,293,951 high-quality cells from 706 samples across 317 donors were annotated into 76 non-epithelial subsets and 26 epithelial cell types.

Hierarchical clustering of cell subsets

We generated pseudo-bulk profiles for the 76 non-epithelial cell subsets by averaging the gene expression of all cells within each subset. Next, unsupervised hierarchical clustering was performed using correlation distance and the hclust function (method = “ward.D”). The results were visualized using the dendextend R package.

CoVarNet framework

We introduced CoVarNet, a computational framework designed to systematically unravel coordination among multiple cell types. CoVarNet identifies co-occurring CM networks by analysing the covariance in cell subset frequencies across various samples.

CoVarNet overview

CoVarNet uses input data on cell subset frequencies within each cell type and sample. It utilizes two parallel modules to jointly determine CM networks by connecting co-occurring subsets (nodes) through edges. The first module applies NMF to the cell subset frequency matrix, identifying factors that prioritize subsets based on their weights. The top subsets of each factor act as co-occurring nodes in a single CM network. The second module identifies specifically correlated subset pairs, which act as potential edges. Multiple CM networks are then constructed to interconnect co-occurring nodes via these potential edges, followed by topological and statistical evaluations.

Input frequency matrix

To ensure comparability of cell-subset frequencies across tissues and clinical specimens, we included only samples without cell-type enrichment or those from mixtures of the four cell compartments. Samples with fewer than 50 high-quality cells were excluded. For each eligible sample, we computed the frequencies of cell subsets within their corresponding cell types. Min-max normalization was applied to correct the frequency matrix, mitigating the impact of varying numbers of cell subsets across different cell types. Thus, a corrected frequency matrix, ranging from 0 to 1, was utilized in the CoVarNet procedure. Specifically, we generated a frequency matrix consisting of 76 subsets (rows) and 510 samples (columns) for the pan-tissue atlas.
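The construction of the input matrix can be sketched in two steps: subset frequencies within each parent cell type, followed by min-max rescaling. The function names are hypothetical, and applying min-max normalization to each subset (row) across samples is an assumption; the text does not state the axis.

```python
from collections import Counter


def subset_frequencies(cells):
    """Frequencies of subsets within their parent cell type for one sample.

    `cells` is a list of (cell_type, subset) labels, one per cell.
    """
    type_counts = Counter(cell_type for cell_type, _ in cells)
    pair_counts = Counter(cells)
    return {subset: n / type_counts[cell_type]
            for (cell_type, subset), n in pair_counts.items()}


def min_max_rows(matrix):
    """Rescale each row (one subset across samples) to the range 0 to 1.

    Row-wise scaling is an assumption. Constant rows are set to 0.
    """
    scaled = []
    for row in matrix:
        lo, hi = min(row), max(row)
        span = hi - lo
        scaled.append([(v - lo) / span if span > 0 else 0.0 for v in row])
    return scaled
```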

NMF

NMF has been used in the analysis of single-cell^57,58,59,60 and spatial^61,62 expression data to extract gene expression programmes. In this study, CoVarNet applies NMF to the frequency matrix to decipher cellular co-occurring programmes, specifically using the nsNMF method with ranks from 2 to 20, as implemented in the NMF R package⁶³. To ensure robustness, we conducted 30 runs to derive a consensus output, consisting of k factors and their activities in each sample. Specifically, the top ten subsets of each factor were used as co-occurring node candidates in a single CM network for the pan-tissue analysis.

Rank selection

To determine the optimal rank for NMF analysis, we used the cophenetic correlation coefficient (CCC) as the evaluation index, in accordance with practices from previous reports¹. CCC is used to quantify classification stability, with values ranging from 0 to 1 and 1 indicating maximum stability⁶⁴. We denoted the CCC at rank k as ρ_k and established a procedure tailored to this context for consistent stability based on the following criteria: (1) ρ_{k−2} < ρ_{k−1} < ρ_k; (2) ρ_k > ρ_{k+1}. Among the set of ranks meeting these criteria, the optimal rank was then identified as the one at which CCC is maximized. The optimal rank selected for the pan-tissue atlas was 12 (Extended Data Fig. 3a,b).
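The two rank-selection criteria can be expressed as a short sketch. The function name is hypothetical, and skipping ranks whose neighbours (k − 2, k − 1 or k + 1) were not evaluated is an assumption about how the boundaries of the tested range are handled.

```python
def select_rank(ccc):
    """Pick the NMF rank from cophenetic correlation coefficients.

    `ccc` maps rank k to its CCC. A rank qualifies if the CCC rose over the
    two preceding ranks and drops at the next rank; among qualifying ranks,
    the one with the highest CCC is returned (None if there is none).
    """
    candidates = []
    for k in ccc:
        if not all(j in ccc for j in (k - 2, k - 1, k + 1)):
            continue  # assumption: ranks lacking evaluated neighbours are skipped
        if ccc[k - 2] < ccc[k - 1] < ccc[k] and ccc[k] > ccc[k + 1]:
            candidates.append(k)
    if not candidates:
        return None
    return max(candidates, key=lambda k: ccc[k])
```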

Specifically correlated subset pairs

CoVarNet utilizes Pearson correlation coefficients to assess whether any two cell subsets co-occur. For a given set of s cell subsets, pairwise correlation tests are performed based on the frequency matrix, resulting in an s × s correlation coefficient matrix (denoted R). To quantify the specificity of correlations, an indicator is defined. For each element r_ij (i < j) in R, its background set S_ij and specificity index Spec (r_ij) are defined as:

$${S}_{ij}=\{{r}_{ik}| k\ne i\}\cup \{{r}_{kj}| k\ne j\}$$

$${\rm{Spec}}({r}_{ij})=\frac{| \{r\in {S}_{ij}| r\le {r}_{ij}\}| }{| {S}_{ij}| }$$

In other words, the specificity index is defined as the fraction of elements in the background set that do not exceed r_ij. The specificity cutoff is determined by an automatic method. If n and N represent the assumed number of subsets in each CM and the total number of subsets, respectively, then the specificity cutoff Cutoff(n, N) is determined as:

$${\rm{C}}{\rm{u}}{\rm{t}}{\rm{o}}{\rm{f}}{\rm{f}}(n,N)=1-\frac{(n-1)\times 2-1}{(N-1)\times 2-1}$$

This approach enables a balanced assessment of the number of subsets within CMs and their co-occurrence. Specifically correlated subset pairs are determined jointly by the correlation (coefficient and FDR) and specificity. We generated 147 pairs for the pan-tissue atlas. These pairs were visualized as a global network (Supplementary Fig. 4).
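The specificity index and its cutoff can be sketched directly from the definitions above. The function names are hypothetical. The background set is treated as a set of matrix entries, so that r_ij, which belongs to both row i and column j, is counted once; this reading is an assumption, chosen because it gives |S_ij| = 2(N − 1) − 1, matching the denominator of the cutoff.

```python
def specificity(R, i, j):
    """Fraction of background correlations that do not exceed R[i][j].

    R is a square correlation matrix given as a list of lists. The background
    is row i without its diagonal entry plus column j without its diagonal
    entry, with the shared entry (i, j) counted once.
    """
    s = len(R)
    positions = {(i, k) for k in range(s) if k != i}
    positions |= {(k, j) for k in range(s) if k != j}
    background = [R[a][b] for a, b in positions]
    return sum(1 for r in background if r <= R[i][j]) / len(background)


def specificity_cutoff(n, N):
    """Cutoff(n, N) for n assumed subsets per CM out of N subsets in total."""
    return 1 - ((n - 1) * 2 - 1) / ((N - 1) * 2 - 1)
```

With the pan-tissue settings mentioned in the text (n = 10 top subsets per factor, N = 76 subsets), this formula gives 1 − 17/149, or approximately 0.886.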

Construction, evaluation and visualization of CM networks

For each NMF factor, the top subsets are designated as potential nodes, and edges connect specifically correlated subset pairs, removing isolated nodes. In each constructed CM network, the connectivity score is calculated as the ratio of observed edges to the total possible edges among all nodes within that network. The statistical significance of this score is assessed using a permutation test (n = 10,000) on the node labels. We used the igraph R package to visualize the CM networks, with nodes colour-coded by cell type and edge colour gradients scaled to reflect specificity.
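The connectivity score and its permutation test can be sketched as follows. The function names are hypothetical. Drawing random node sets of the same size from all subsets is used here as one way to realize a permutation of node labels, and the add-one correction in the P value is a choice made for this sketch; neither detail is specified in the text.

```python
import random
from itertools import combinations


def connectivity(nodes, edges):
    """Observed edges among `nodes` divided by all possible node pairs.

    `edges` is a set of frozensets, each holding the two endpoints of an edge.
    """
    nodes = list(nodes)
    possible = len(nodes) * (len(nodes) - 1) // 2
    if possible == 0:
        return 0.0
    observed = sum(1 for a, b in combinations(nodes, 2)
                   if frozenset((a, b)) in edges)
    return observed / possible


def permutation_pvalue(module_nodes, all_nodes, edges, n_perm=10_000, seed=0):
    """Compare the module's connectivity with that of random node sets."""
    rng = random.Random(seed)
    module_nodes = list(module_nodes)
    all_nodes = list(all_nodes)
    observed = connectivity(module_nodes, edges)
    hits = 0
    for _ in range(n_perm):
        random_nodes = rng.sample(all_nodes, len(module_nodes))
        if connectivity(random_nodes, edges) >= observed:
            hits += 1
    # Add-one correction keeps the estimated P value above zero.
    return observed, (hits + 1) / (n_perm + 1)
```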

CMT classifications of samples

The CM activities in individual samples are measured by the coefficient matrix from the NMF procedure, with the sum of activities for all CMs equalling 1 for each sample. Each sample was assigned a CMT label based on its most abundant CM. For instance, if a sample exhibited the highest activity of CM01 among all CMs, it was labelled as CMT01. All healthy single-cell samples used across tissues were stratified into 12 CMT groups (Extended Data Fig. 3e).
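The labelling rule can be illustrated with a minimal sketch. The function name is hypothetical, and the input is assumed to be the non-negative NMF coefficients of one sample keyed by CM name (for example 'CM01').

```python
def assign_cmt(activities):
    """Normalize CM activities to sum to 1 and label the sample by its top CM.

    `activities` maps CM names such as 'CM01' to non-negative coefficients.
    Returns the CMT label and the normalized activities.
    """
    total = sum(activities.values())
    if total <= 0:
        raise ValueError("sample has no CM activity")
    normalized = {cm: a / total for cm, a in activities.items()}
    top = max(normalized, key=normalized.get)
    return "CMT" + top[2:], normalized
```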

Integrative analysis of scRNA-seq and GTEx RNA-seq

We utilized GTEx²⁴ RNA-seq datasets to validate the CMs defined by single-cell data (Extended Data Fig. 4).

RNA-seq data preprocessing

We retrieved gene transcripts per million (TPM) and metadata for 17,382 bulk RNA-seq samples from the GTEx Portal (V8 release)⁶⁵. Samples derived from cell lines were excluded, and the ‘Cervix uteri’ category was merged into the ‘Uterus’ category for consistency. To ensure consistency, only tissues represented in the single-cell cohort were retained, narrowing down to a total of 12,240 samples spanning 23 tissues for further analysis. Gene expression data were re-normalized to a uniform library size of 10,000 and log-transformed for comparability with single-cell data.

CMT classifications of RNA-seq samples

We began by identifying DEGs among pseudo-bulk CMT samples. The top ten DEGs, ranked by fold change, were designated as CMT signature genes. Utilizing the Seurat R package⁶⁶, we applied the AddModuleScore function to calculate scores for individual RNA-seq samples based on these CMT signature sets. All negative scores were adjusted to zero, and 2.3% (278 out of 12,240) of samples with the highest score less than 0.2 were excluded to ensure robust classification. Ultimately, the remaining 11,962 samples were categorized into 12 distinct CMTs, facilitating a detailed examination of CM representation across the analysed tissues.
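The filtering and labelling of bulk samples can be sketched as follows. The function name is hypothetical, and assigning each retained sample to the CMT with the highest signature score is an assumption; the text states the clipping and exclusion rules but not the assignment rule explicitly.

```python
def classify_bulk_sample(scores, min_top_score=0.2):
    """Label one bulk sample from its CMT signature scores.

    Negative scores are set to zero. Samples whose highest score is below
    `min_top_score` are excluded (None is returned). Otherwise the sample is
    given the top-scoring CMT, which is an assumption of this sketch.
    """
    clipped = {cmt: max(0.0, s) for cmt, s in scores.items()}
    top = max(clipped, key=clipped.get)
    if clipped[top] < min_top_score:
        return None
    return top
```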

Tissue prevalence of cell subsets and CMs

To assess the prevalence of cell subsets across tissues, we compared observed (o) to expected (e) cell numbers for each subset-tissue combination, expressed as R_o/e = observed/expected, following established methods^35,38,39. Expected cell numbers for each subset–tissue combination were derived from the Chi-square test, with enrichment defined as R_o/e > 1 (Fig. 1e and Supplementary Fig. 2). For the assessment of each CM, we computed tissue-level CM activities by averaging its activity across all samples within each tissue. The R_o/e ratio indicated the tissue distribution of CM profiles (Fig. 2c). To compare CM enrichment across 23 overlapping tissues between bulk and single-cell analyses, we independently calculated R_o/e ratios for each data type and combined them for comparison (Extended Data Fig. 4f). Results were visualized using the ComplexHeatmap R package⁶⁷.
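The R_o/e calculation can be illustrated with a short sketch in which expected counts follow the usual Chi-square contingency-table formula (row total × column total / grand total). The function name is hypothetical.

```python
def ratio_observed_expected(table):
    """Compute R_o/e for every cell of a subset-by-tissue count table.

    `table[i][j]` is the observed number of cells of subset i in tissue j.
    Entries with an expected count of zero are returned as NaN.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    ratios = []
    for i, row in enumerate(table):
        out_row = []
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            out_row.append(observed / expected if expected > 0 else float("nan"))
        ratios.append(out_row)
    return ratios
```

For a table with rows [30, 10] and [10, 50], the expected count of the first entry is 40 × 40 / 100 = 16, giving R_o/e = 30/16 = 1.875, which indicates enrichment.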

Analysis of spatial transcriptomics data

Data collection

We gathered published spatially resolved transcriptomics datasets (Visium and Xenium) of various human tissues and cancer types. Detailed accession numbers and references for these datasets are provided (Supplementary Table 3).

Cell subset identification

For deconvoluting spatial transcriptomics data, we utilized cell2location²⁵, a Bayesian model capable of accurately resolving fine-grained cell types within spatial data. For both healthy and cancer datasets, we used the corresponding integrated scRNA-seq data as a reference to obtain cell-type signatures. Prior to this process, the scRNA-seq data were subsampled to 1,000 cells per cell subset; where a subset comprised fewer than 1,000 cells, all available cells were included. Following the recommended guidelines of cell2location, we set N = 5 as the expected cell abundance per spot and α_y = 20 to regularize within-experiment variation in RNA detection sensitivity. The output yielded the expected cell abundance per cell subset in each spot.

CM activities

To quantify and visualize CMs in spatial transcriptomics, we aggregated the abundance of the component cell subsets within each CM, applying weights derived from NMF results. The resulting CM activities were then scaled to a uniform standard, with the 99th percentile value across all CMs being set to 1, allowing for a direct comparison of their relative magnitudes.
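A minimal sketch of this aggregation and rescaling is given here. The function names are hypothetical; pooling all spots and all CMs to compute the 99th percentile, and the 'inclusive' percentile method, are assumptions about details not given in the text.

```python
from statistics import quantiles


def cm_activity(abundance, weights):
    """Weighted sum of per-spot subset abundances for one CM.

    `abundance` maps subset names to abundances in one spot; `weights` maps
    the CM's component subsets to their NMF-derived weights.
    """
    return sum(w * abundance.get(subset, 0.0) for subset, w in weights.items())


def scale_to_p99(activities_by_cm):
    """Divide all activities by the 99th percentile pooled across CMs."""
    pooled = [v for values in activities_by_cm.values() for v in values]
    p99 = quantiles(pooled, n=100, method="inclusive")[98]
    if p99 <= 0:
        raise ValueError("99th percentile must be positive")
    return {cm: [v / p99 for v in values]
            for cm, values in activities_by_cm.items()}
```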

Colocalization scores

To assess the colocalization of cell-subset components within CMs across Visium spots, we calculated the colocalization score for individual spatial sections. For each CM, we calculated Spearman correlation coefficients between subset pairs where at least one subset is within this CM, resulting in a set of correlation coefficients denoted as S. The median correlation coefficient within the CM is termed r. The colocalization score for each CM is defined as the proportion of correlations in S that are less than or equal to r, providing a measure of colocalization relative to global contexts.
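The colocalization score can be sketched from precomputed pairwise correlations. The function name is hypothetical, and taking the median over pairs in which both subsets belong to the CM is this sketch's reading of 'the median correlation coefficient within the CM'.

```python
from statistics import median


def colocalization_score(corr, module):
    """Rank the within-CM median correlation against the CM's background.

    `corr` maps frozenset({a, b}) to the Spearman correlation of subsets a
    and b across spots; `module` is the set of subsets in the CM.
    """
    module = set(module)
    # S: pairs with at least one subset in the CM.
    background = [r for pair, r in corr.items() if pair & module]
    # Pairs with both subsets in the CM (assumed meaning of 'within the CM').
    within = [r for pair, r in corr.items() if pair <= module]
    if not within:
        raise ValueError("the CM needs at least one within-module pair")
    r_median = median(within)
    return sum(1 for r in background if r <= r_median) / len(background)
```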

Aggregation scores

To assess the regional aggregation of cell-subset components within CMs in spatial transcriptomics, we utilized the global bivariate Moran’s I⁶⁸ using the spdep R package. Similar to the colocalization score, the aggregation scores were calculated using global Moran’s I instead of the correlation coefficient.

Cellular niches

To provide orthogonal validation for the identified CMs, we used CellCharter²⁶ to identify cellular niches, clustering Visium spots based on both gene expression and spatial information to enable spatially informed niche categorization. This analysis was performed for each sample independently to mitigate batch effects, allowing cross-validation of results across samples from the same tissues.

Xenium analysis

The Xenium platform enables in situ characterization of hundreds to thousands of genes in cells and tissues with ultra-precise single-cell spatial imaging. Using published intestinal Xenium data, we characterized the spatial locations of CM02 and CM03. We first designed a gene panel to distinguish multiple cell subsets within these CMs, with gene transcript density serving as a proxy for CM intensity (Extended Data Fig. 5i). Spatial regions of the tissue (epithelium or lamina propria) were identified on the basis of k-means clustering (k = 2) in the original dataset. To assess CM distribution across tissue sub-regions, we selected six different areas within the intestinal mucosa and measured CM intensities, providing a spatially resolved, single cell-resolution distribution of CMs.

Cell–cell communication analysis

To disentangle complex cellular crosstalk within and across CMs, we performed ligand–receptor-mediated cell–cell communication analysis using single-cell data with the CellPhoneDB Python package^11,27.

Global analysis agnostic to CMs

Considering the extensive number of cells, we conducted subsampling to equalize the contribution of each cell subset. Specifically, we subsampled the number of cells to 1,000 cells for each cell subset. However, if the total cell count for certain subsets did not exceed 1,000, all cells were included in the analysis. This approach aimed to ensure that the null distribution accurately represented all cell subsets, avoiding bias towards cell subsets with larger cell numbers. Subsequently, the CellPhoneDB procedure was used for statistical inference of cell–cell interaction specificity, allowing the derivation of cell–cell interaction counts among different cell types (Extended Data Fig. 6c) or CMs (Extended Data Fig. 6d) by averaging the results of corresponding cell subsets. We also validated these results using an alternative tool, CellChat⁶⁹ (Extended Data Fig. 6e).
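The subsampling rule can be sketched as follows. The function name and the fixed random seed are hypothetical choices for this illustration.

```python
import random


def subsample_per_subset(cells_by_subset, max_cells=1000, seed=0):
    """Cap each subset at `max_cells` cells; smaller subsets are kept whole."""
    rng = random.Random(seed)
    sampled = {}
    for subset, cells in cells_by_subset.items():
        cells = list(cells)
        if len(cells) <= max_cells:
            sampled[subset] = cells
        else:
            # Sampling without replacement, so no cell is drawn twice.
            sampled[subset] = rng.sample(cells, max_cells)
    return sampled
```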

Comparative analysis informed by CMs

To explore the effect of tissue microenvironments (CMs) on cellular crosstalk within CMs, we conducted CellPhoneDB analysis using two groups of samples with high or low CM activities, respectively. For each CM, the ‘High’ group included all samples in which that CM showed the highest activity among all CMs, while other samples were used as the ‘Low’ group. Following the previously defined CMTs in ‘CMT classifications of samples’, taking CM01 as an example, all samples labelled as CMT01 were used as the High group, while other samples were used as the Low group. Subsequently, comparative analysis between the two groups focused on cell subsets that were components of each CM (Fig. 3g).

Cytokine response analysis

The recently reported Immune Dictionary provides a comprehensive overview of cell-type-specific responses to 86 cytokines²⁸. Leveraging this foundational knowledge, we explored intercellular crosstalk within CMs through cytokine responses. CM07 and CM10 were excluded as they lack immune cell subsets, while CM12 had no significant cytokine outputs.

CM-dependent DEGs of cell subsets

The tissue microenvironments exerted a broad influence on cell-subset phenotypes in a CM-dependent manner. Specifically, for each immune subset, we identified its DEGs (log₂-transformed fold changes >log₂(1.2), FDR < 0.05, Student’s t-test) in samples from corresponding CMTs when compared with those from other CMTs, utilizing the Scanpy toolkit. These identified DEGs were interpreted as responses to cytokines within the tissue microenvironments associated with CMs.

Cytokine response inference

First, we built a database of cytokine signatures for each cell type, using supplementary table 3 from the Immune Dictionary publication²⁸. Of note, we performed one-to-one conversion of mouse and human ortholog genes based on the Mouse Genome Informatics (MGI) database⁷⁰ (http://www.informatics.jax.org/downloads/reports/HMD_HumanPhenotype.rpt). Subsequently, we conducted cell-type-aware immune response enrichment analysis using the hypergeometric test (FDR < 0.05) through the enricher function in the clusterProfiler R package⁷¹.

CM-specific cytokine networks

To visualize cytokine-mediated multicellular regulation, we constructed cytokine networks by considering both cytokine production and response in a CM-specific manner. For a cytokine to be considered in a CM, the normalized expression value of the cytokine gene needed to exceed 0.1 in at least one of the non-responsive cell subsets. In the case of heteromeric cytokines or cytokine complexes with two subunits, each subunit was represented separately. The igraph R package was used to generate visual representations of the cytokine networks.

Association analysis of CM activity and phenotypic factors

Tissues

We first examined the association between CM activities and tissues, excluding tissues with fewer than five samples. For each CM, we fitted a linear model, with adjusted R² indicating the proportion of variance explained. The FDR of the F test was reported to ensure the robustness of the results. Given the strong tissue preferences of CMs, subsequent analyses focused on tissue-specific associations.

Sexes and age groups

In non-reproductive tissues, CM activities were compared between samples from male and female participants, with FDR-adjusted significance. Samples lacking age information were excluded from the analysis. Age data were categorized into groups: <35, 35–39, 40–49, 50–59, 60–69 and 70–85 years. For the breast dataset (D03) with over 100 samples (Supplementary Table 1), age was categorized as <50 or ≥50 years. Associations between CM activities and age groups were assessed, with adjustments for statistical significance. Specifically, we also examined associations between immune cell-subset frequencies and age groups in the spleen.

Additional phenotypes

We performed further association analyses between CMs and specific phenotypic factors. We analysed CM09 in relation to alcohol consumption in the lymph node, CM06 and CM11 with childhood tuberculosis in the lung, CM12 with menopause in the breast, and CM07 with menstrual cycle phases in the uterus.

CM05 regulators in the spleen

Inferring regulons

We used the pySCENIC^30,72 pipeline to infer regulons for the four subsets (B03, B05, CD4T03, and I06) within CM05, performing the analysis separately for the three cellular lineages (B cells, CD4⁺ T cells, and innate lymphoid cells). For each subset, regulons were ranked on the basis of their regulon specificity scores (RSS), and the top 50 regulons with the highest RSS were selected for each cell subset. Seventeen regulons were shared among the four subsets.

Activities and target genes of shared regulons

Sample-level activities for shared regulons were determined by averaging cellular activities across all cells in each sample. The mean activity of each regulon was then compared across age-stratified sample groups. Target genes of shared regulons were compared across lineages, and a regulatory network was constructed to illustrate their interrelationships.

MCP analysis

MCP identifications

As different cell types within the same CMs tend to be exposed to similar tissue microenvironments, we hypothesized that they might exhibit coordinated responses. To investigate this, we used DIALOGUE³ to map MCPs for CMs. We set the parameter k = 3 and assessed the association between the identified MCPs and other phenotypes. This analysis was applied to CM12 in the breast and to CM08, and the resulting MCPs were termed the CM12 programme and the CM08 programme, respectively.

Comparison between CM08 programme and other signatures

To compare CM08 programme and inflammatory and cytotoxic signatures, we calculated their overall expression in external RNA-seq data, as described previously^73,74. Specifically, RNA-seq datasets of samples from individuals with advanced melanoma following anti-PD-1 therapies⁷⁵ were downloaded from https://github.com/ParkerICI/MORRISON-1-public. The inflammatory signature genes of immune cells are defined as CD3D, IDO1, CIITA, CD3E, CCL5, GZMK, CD2, HLA-DRA, CXCL13, IL2RG, NKG7, HLA-E, CXCR6, LAG3, TAGAP, CXCL10, STAT1 and GZMB⁷⁶. The cytotoxic signature genes are defined as IFNG, GZMA, GZMB, PRF1, GZMK, ZAP70, GNLY, FASLG, TBX21, EOMES, CD8A, CD8B, CXCL9, CXCL10, CXCL11, CX3CL1, CCL3, CCL4, CX3CR1, CCL5, CXCR3, NKG7, CD160, CD244, NCR1, KLRC2, KLRK1, CD226, GZMH, ITK, CD3D, CD3E, CD3G, TRAC, TRBC1, TRBC2, CD28, CD5, KIRDL4, FGFBP2, KLRF1, SH2D1B and NCR3 (ref. ⁷⁷).

Inflammatory scores of CM12 subsets in the breast

We calculated the sample-level inflammatory scores for immune and fibroblast subsets within CM12. Fibroblasts and immune subsets were scored using the corresponding inflammatory gene sets. The inflammatory signature genes of fibroblasts are defined as PLAU, CHI3L1, MMP3, IL1R1, IL13RA2, TNFSF11, MMP10, OSMR, IL11, STRA6, FAP, WNT2, TWIST1 and IL24 (ref. ⁷⁸). The inflammatory signature genes of immune cells are defined as above. Specifically, we first calculated the average gene expression among all cells of individual subsets for each sample. Subsequently, we used the R package AUCell³⁰ to calculate the sample-level inflammatory scores.

Menopausal trajectory analysis

Discovery cohort

We constructed a menopausal trajectory in the breast (dataset D03; Supplementary Table 1) based on the frequencies of cell subsets within CM12, following the methodology described in a recent study⁵. To mitigate the impact of frequency differences between cell subsets, we applied z-score normalization to correct the frequency matrix. Subsequently, we computed the k-nearest-neighbour graph and performed clustering for breast samples using the functions scanpy.pp.neighbors and scanpy.tl.leiden with default parameters. To model trajectories along menopause, we performed PHATE⁷⁹ with a = 40, followed by pseudotime analysis using the Palantir⁸⁰ standard pipeline. The starting point was defined as the cluster with a high proportion of pre-menopausal samples.

Validation cohort

The Reed dataset⁸¹ was used to validate the menopausal trajectory. For the epithelial-enriched (‘organoid’) samples in the dataset, cell subset annotation was performed with CellTypist⁷, using previously annotated gene expression profiles of breast tissue as a reference. The cell subset frequency matrix was subsequently input into the trajectory analysis as described above.

Pan-cancer single-cell atlas

To disentangle the rewiring of CMs along malignant progression, we constructed a pan-cancer transcriptomic atlas at single-cell resolution (Fig. 5a).

Data collection and preprocessing

Following the criteria set for healthy datasets, we selectively included scRNA-seq datasets from fresh (not frozen) samples without cell-type enrichment, generated using 10x Genomics single-cell (not single-nucleus) platforms. One exception is the ESCC_GSE160269 cohort, where samples represent a mixture of immune, epithelial, endothelial, and stromal compartments. Quality control and other preprocessing procedures were also applied consistently with healthy datasets. In total, more than 1,000 samples from 48 datasets were incorporated, collectively forming a cell atlas across 29 human major cancer types (Extended Data Fig. 10 and Supplementary Table 7).

Data integration, cell clustering and initial annotation

Following the methodology applied to healthy datasets, we performed dataset integration using BBKNN. Unsupervised clustering for all cells was carried out using the scanpy.tl.leiden function with a resolution of 0.1. Subsequently, we identified the eight broad cell types based on canonical markers.

Supervised cell annotation

To accurately determine cell identities in cancerous samples, we utilized a transfer-learning-based strategy for cell subset annotation. Initially, we curated a single-cell reference dataset that encompassed 76 non-epithelial cell subsets identified in healthy data and 15 cancer-associated subsets identified in various cancer types^35,36,37,38,39 (Supplementary Table 9). Each subset was subsampled to 1,000 cells, unless its total cell count did not exceed 1,000, in which case all cells were retained. Subsequently, a transformer-based reference model was trained on the reference dataset, and non-epithelial cells from the pan-cancer atlas were annotated using this model. These procedures were executed using TOSICA, a multi-head self-attention model that enables interpretable cell-type annotation⁸². The epoch number for prediction was selected as 15 (Supplementary Fig. 8c). Cells with predicted probabilities <0.5 were removed (Supplementary Fig. 8d). In the end, a total of 3,038,535 high-quality cells from 1,062 samples of 717 donors were annotated into 91 non-epithelial subsets.

Rewiring of multicellular ecosystems in cancer

Samples used

To compare multicellular dynamics during tumour progression, we included eight cancer types that had at least three samples from each condition (healthy, adjacent non-tumour and tumour).
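The inclusion rule above amounts to a simple count per cancer type and condition. The sketch below assumes a flat list of sample records with hypothetical field names (`cancer_type`, `condition`) and toy cancer-type labels.

```python
from collections import Counter

CONDITIONS = ("healthy", "adjacent", "tumour")

def eligible_cancer_types(samples, min_per_condition=3):
    """Return cancer types with at least `min_per_condition` samples in every condition."""
    counts = Counter((s["cancer_type"], s["condition"]) for s in samples)
    cancer_types = sorted({s["cancer_type"] for s in samples})
    return [
        t for t in cancer_types
        if all(counts[(t, c)] >= min_per_condition for c in CONDITIONS)
    ]

# Toy sample table (hypothetical labels and counts).
samples = (
    [{"cancer_type": "LUAD", "condition": c} for c in CONDITIONS for _ in range(3)]
    + [{"cancer_type": "ESCC", "condition": "tumour"} for _ in range(10)]
    + [{"cancer_type": "ESCC", "condition": "adjacent"} for _ in range(2)]
)
selected = eligible_cancer_types(samples)
```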

Interrogation of healthy CMs

To quantify healthy CMs across different conditions, we aggregated the abundance of the component cell subsets within each CM, applying weights derived from NMF results. The resulting CM activities were then rescaled to range from 0 to 1. For each cancer type, only the most dominant CM was considered.
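A minimal sketch of this scoring follows. It assumes that the aggregation is a weighted sum of subset frequencies and that rescaling is a min-max transform per CM across samples; the text does not spell out either detail, and the dominant CM is shown per sample for simplicity. All names and numbers are illustrative.

```python
def cm_raw_activity(freqs, weights):
    """Weighted sum of subset frequencies per CM and sample.

    freqs:   {sample: {subset: frequency}}
    weights: {cm: {subset: NMF weight}}
    """
    return {
        cm: {
            sample: sum(w * f.get(subset, 0.0) for subset, w in ws.items())
            for sample, f in freqs.items()
        }
        for cm, ws in weights.items()
    }

def rescale01(values):
    """Min-max rescale a {sample: value} mapping to the range [0, 1]."""
    lo, hi = min(values.values()), max(values.values())
    span = hi - lo
    return {k: (v - lo) / span if span else 0.0 for k, v in values.items()}

def dominant_cm(activity, sample):
    """CM with the highest rescaled activity in one sample."""
    return max(activity, key=lambda cm: activity[cm][sample])

# Toy inputs (hypothetical subsets and weights).
freqs = {
    "s1": {"B01": 0.2, "S10": 0.1},
    "s2": {"B01": 0.0, "S10": 0.5},
    "s3": {"B01": 0.4, "S10": 0.3},
}
weights = {"CM01": {"B01": 1.0, "S10": 0.0}, "CM02": {"B01": 0.0, "S10": 2.0}}

activity = {cm: rescale01(v) for cm, v in cm_raw_activity(freqs, weights).items()}
dominant = {s: dominant_cm(activity, s) for s in freqs}
```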

Co-occurrence of cell subsets in individual cancer types

For each cancer type, we derived all specifically correlated cell-subset pairs using the correlation analysis module of CoVarNet. Compared with the pan-tissue and pan-cancer analyses, more stringent cutoffs were used: coefficients >0.5, FDR <0.05 and specificity >0.95. Only the subset pairs that passed these cutoffs were used for comparison across the eight cancer types.
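The cutoffs can be applied as a plain filter once the per-pair statistics are available. The sketch below is a stand-in for illustration and does not call CoVarNet; the record fields (`coef`, `fdr`, `specificity`) and the toy values are assumptions.

```python
from collections import Counter

def specific_pairs(pairs, min_coef=0.5, max_fdr=0.05, min_spec=0.95):
    """Return the unordered subset pairs that pass all three cutoffs."""
    return {
        frozenset((p["a"], p["b"]))
        for p in pairs
        if p["coef"] > min_coef and p["fdr"] < max_fdr and p["specificity"] > min_spec
    }

def pair_recurrence(pairs_by_cancer):
    """Count in how many cancer types each passing pair is found."""
    counts = Counter()
    for pairs in pairs_by_cancer.values():
        counts.update(specific_pairs(pairs))
    return counts

# Toy statistics for two cancer types (hypothetical values).
pairs_by_cancer = {
    "LUAD": [
        {"a": "B01", "b": "T05", "coef": 0.70, "fdr": 0.01, "specificity": 0.97},
        {"a": "S10", "b": "M03", "coef": 0.60, "fdr": 0.20, "specificity": 0.99},
    ],
    "CRC": [
        {"a": "T05", "b": "B01", "coef": 0.55, "fdr": 0.04, "specificity": 0.96},
        {"a": "S10", "b": "M03", "coef": 0.50, "fdr": 0.01, "specificity": 0.99},
    ],
}
recurrence = pair_recurrence(pairs_by_cancer)
```

Storing each pair as a `frozenset` makes the comparison independent of the order in which the two subsets are listed.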

Identification of cCMs using CoVarNet

We generated a frequency matrix consisting of 91 subsets and 955 samples for the pan-cancer atlas. For the pan-cancer analysis, the top 15 subsets of each factor were used as co-occurring node candidates in a single CM network.
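The two inputs named here, a subset-by-sample frequency matrix and the top-ranked subsets of each factor, can be sketched as below. The normalization (fraction of all cells in a sample) is an assumption made for the example, and the factor weights are toy values rather than NMF output.

```python
from collections import Counter

def frequency_matrix(cell_labels):
    """Build {sample: {subset: fraction of that sample's cells}}.

    cell_labels: iterable of (sample, subset) tuples, one per cell.
    """
    counts = {}
    for sample, subset in cell_labels:
        counts.setdefault(sample, Counter())[subset] += 1
    matrix = {}
    for sample, c in counts.items():
        total = sum(c.values())
        matrix[sample] = {subset: n / total for subset, n in c.items()}
    return matrix

def top_subsets(factor_weights, k=15):
    """Return the k subsets with the largest weights in one factor."""
    return sorted(factor_weights, key=factor_weights.get, reverse=True)[:k]

# Toy per-cell annotations and factor weights (hypothetical).
cell_labels = [("s1", "B01"), ("s1", "B01"), ("s1", "S10"), ("s2", "S10")]
freq = frequency_matrix(cell_labels)
candidates = top_subsets({"B01": 0.9, "S10": 0.1, "M03": 0.5}, k=2)
```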

Interrogation of cCM02

To quantify cCMs across different conditions, we measured cCM02 activities using the same procedure as for healthy CMs.

cCM02 analysis

Co-occurring network

To explore the dynamics of cCM02 during tumour progression, we constructed two co-occurring networks whose nodes are the same as the original nodes in cCM02 but whose edges were recalculated in two scenarios. One scenario used healthy and adjacent non-tumour samples; the other used tumour and adjacent non-tumour samples. Both networks therefore share unaltered nodes and differ only in their recalculated edges.
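The idea of fixed nodes with scenario-specific edges can be illustrated as follows. The sketch uses a plain Pearson correlation threshold as a simplified stand-in for the edge criteria of CoVarNet, and the node names, sample identifiers and frequencies are toy values.

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy) if sxx and syy else 0.0

def rebuild_edges(nodes, freqs, sample_ids, min_coef=0.5):
    """Recompute edges among fixed nodes using only the given samples."""
    edges = set()
    for a, b in combinations(sorted(nodes), 2):
        xa = [freqs[s][a] for s in sample_ids]
        xb = [freqs[s][b] for s in sample_ids]
        if pearson(xa, xb) > min_coef:
            edges.add((a, b))
    return edges

# Toy frequencies: h = healthy, a = adjacent non-tumour, t = tumour.
freqs = {
    "h1": {"A": 0.1, "B": 0.4}, "h2": {"A": 0.4, "B": 0.1},
    "a1": {"A": 0.2, "B": 0.2}, "a2": {"A": 0.3, "B": 0.3},
    "t1": {"A": 0.5, "B": 0.6}, "t2": {"A": 0.7, "B": 0.8},
}
nodes = {"A", "B"}
edges_healthy_adjacent = rebuild_edges(nodes, freqs, ["h1", "h2", "a1", "a2"])
edges_tumour_adjacent = rebuild_edges(nodes, freqs, ["a1", "a2", "t1", "t2"])
```

In this toy case the edge between A and B is absent in the first scenario and present in the second, which is the kind of contrast the two networks are meant to expose.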

Cytokine analysis

For each cell-subset component of cCM02, we identified its DEGs (log₂-transformed fold changes >log₂(1.2), FDR < 0.05, Student’s t-test) in tumour samples compared with adjacent non-tumour samples. These identified DEGs were used to perform cytokine analysis as described above.
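As a small worked example of the thresholds, a fold change of 1.2 corresponds to a log₂ fold change of about 0.263. The helper below is a hypothetical illustration that only applies the two cutoffs; it assumes the test statistics and FDR values have already been computed.

```python
import math

# log2(1.2) is roughly 0.263, i.e. a 20% increase in tumour versus adjacent tissue.
LOG2FC_CUTOFF = math.log2(1.2)
FDR_CUTOFF = 0.05

def is_deg(log2fc, fdr):
    """True if a gene passes both the fold-change and the FDR cutoff."""
    return log2fc > LOG2FC_CUTOFF and fdr < FDR_CUTOFF

# Toy results: (gene label, log2 fold change, FDR); values are made up.
results = [("geneA", 0.30, 0.01), ("geneB", 0.20, 0.01), ("geneC", 1.00, 0.05)]
degs = [gene for gene, lfc, fdr in results if is_deg(lfc, fdr)]
```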

MCP analysis

We conducted MCP identification using only tumour and adjacent non-tumour samples. Overall expression of the MCP was calculated for RNA-seq datasets from the TCGA portal and for microarray data of pre-invasive lung lesions⁴⁵. TCGA datasets were downloaded using the TCGAbiolinks⁸³ R package. Only projects with more than ten tumour or adjacent non-tumour samples were included. Datasets of pre-invasive lung lesions were downloaded from https://github.com/ucl-respiratory/preinvasive.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Data availability

This study conducted a comprehensive analysis based on curated datasets from previously published studies without generating new primary data. Curated pan-tissue and pan-cancer scRNA-seq datasets can be downloaded from Zenodo (https://zenodo.org/records/15169362)⁸⁴ and can be interactively explored at http://cm.cancer-pku.cn. Spatially resolved transcriptomics datasets used are listed in Supplementary Table 3. Source data are provided with this paper.

Code availability

The source code of CoVarNet is available in the GitHub repository https://github.com/QiangShiPKU/CoVarNet. Other analysis code is available from Zenodo (https://zenodo.org/records/14175262)⁸⁵.

References

Luca, B. A. et al. Atlas of clinically distinct cell states and ecosystems across human solid tumors. Cell 184, 5482–5496.e5428 (2021).
Ramirez Flores, R. O., Lanzer, J. D., Dimitrov, D., Velten, B. & Saez-Rodriguez, J. Multicellular factor analysis of single-cell data for a tissue-centric understanding of disease. eLife 12, e93161 (2023).
Jerby-Arnon, L. & Regev, A. DIALOGUE maps multicellular programs in tissue from single-cell or spatial transcriptomics data. Nat. Biotechnol. 40, 1467–1477 (2022).
Mitchel, J. et al. Coordinated, multicellular patterns of transcriptional variation that stratify patient cohorts are revealed by tensor decomposition. Nat. Biotechnol. https://doi.org/10.1038/s41587-024-02411-z (2024).
Green, G. S. et al. Cellular communities reveal trajectories of brain ageing and Alzheimer’s disease. Nature 633, 634–645 (2024).
Regev, A. et al. The Human Cell Atlas. eLife 6, e27041 (2017).
Dominguez Conde, C. et al. Cross-tissue immune cell analysis reveals tissue-specific features in humans. Science 376, eabl5197 (2022).
Eraslan, G. et al. Single-nucleus cross-tissue molecular reference maps toward understanding disease gene function. Science 376, eabl4290 (2022).
The Tabula Sapiens Consortium. The Tabula Sapiens: a multiple-organ, single-cell transcriptomic atlas of humans. Science 376, eabl4896 (2022).
Wang, D., Liu, B. & Zhang, Z. Accelerating the understanding of cancer biology through the lens of genomics. Cell 186, 1755–1771 (2023).
Vento-Tormo, R. et al. Single-cell reconstruction of the early maternal-fetal interface in humans. Nature 563, 347–353 (2018).
Armingol, E., Baghdassarian, H. M. & Lewis, N. E. The diversification of methods for studying cell–cell interactions and communication. Nat. Rev. Genet. 25, 381–400 (2024).
Hickey, J. W. et al. Organization of the human intestine at single-cell resolution. Nature 619, 572–584 (2023).
Madissoon, E. et al. A spatially resolved atlas of the human lung characterizes a gland-associated immune niche. Nat. Genet. 55, 66–77 (2023).
Ji, A. L. et al. Multimodal analysis of composition and spatial architecture in human squamous cell carcinoma. Cell 182, 497–514.e422 (2020).
He, S. et al. High-plex imaging of RNA and proteins at subcellular resolution in fixed tissue by spatial molecular imaging. Nat. Biotechnol. 40, 1794–1806 (2022).
Luecken, M. D. et al. Benchmarking atlas-level data integration in single-cell genomics. Nat. Methods 19, 41–50 (2022).
Polanski, K. et al. BBKNN: fast batch alignment of single cell transcriptomes. Bioinformatics 36, 964–965 (2020).
Sender, R. et al. The total mass, number, and distribution of immune cells in the human body. Proc. Natl Acad. Sci. USA 120, e2308511120 (2023).
Blanco, E. et al. Age-associated distribution of normal B-cell and plasma cell subsets in peripheral blood. J. Allergy Clin. Immunol. 141, 2208–2219.e2216 (2018).
Buechler, M. B. et al. Cross-tissue organization of the fibroblast lineage. Nature 593, 575–579 (2021).
Garcia-Alonso, L. et al. Mapping the temporal and spatial dynamics of the human endometrium in vivo and in vitro. Nat. Genet. 53, 1698–1711 (2021).
Agace, W. W. & McCoy, K. D. Regionalized development and maintenance of the intestinal adaptive immune landscape. Immunity 46, 532–548 (2017).
The GTEx Consortium. The Genotype-Tissue Expression (GTEx) project. Nat. Genet. 45, 580–585 (2013).
Kleshchevnikov, V. et al. Cell2location maps fine-grained cell types in spatial transcriptomics. Nat. Biotechnol. 40, 661–671 (2022).
Varrone, M., Tavernari, D., Santamaria-Martinez, A., Walsh, L. A. & Ciriello, G. CellCharter reveals spatial cell niches associated with tissue remodeling and cell plasticity. Nat. Genet. 56, 74–84 (2024).
Efremova, M., Vento-Tormo, M., Teichmann, S. A. & Vento-Tormo, R. CellPhoneDB: inferring cell–cell communication from combined expression of multi-subunit ligand–receptor complexes. Nat. Protoc. 15, 1484–1506 (2020).
Cui, A. et al. Dictionary of immune responses to cytokines at single-cell resolution. Nature 625, 377–384 (2024).
Liu, X. et al. Genome-wide analysis identifies NR4A1 as a key mediator of T cell dysfunction. Nature 567, 525–529 (2019).
Aibar, S. et al. SCENIC: single-cell regulatory network inference and clustering. Nat. Methods 14, 1083–1086 (2017).
Karakaslar, E. O. et al. Transcriptional activation of Jun and Fos members of the AP-1 complex is a conserved signature of immune aging that contributes to inflammaging. Aging Cell 22, e13792 (2023).
Larson, M. H. et al. A comprehensive characterization of the cell-free transcriptome reveals tissue- and subtype-specific biomarkers for cancer detection. Nat. Commun. 12, 2357 (2021).
Bonafe, M., Storci, G. & Franceschi, C. Inflamm-aging of the stem cell niche: breast cancer as a paradigmatic example: breakdown of the multi-shell cytokine network fuels cancer in aged people. Bioessays 34, 40–49 (2012).
Kumar, T. et al. A spatially resolved single-cell genomic atlas of the adult human breast. Nature 620, 181–191 (2023).
Cheng, S. et al. A pan-cancer single-cell transcriptional atlas of tumor infiltrating myeloid cells. Cell 184, 792–809.e723 (2021).
Tang, F. et al. A pan-cancer single-cell panorama of human natural killer cells. Cell 186, 4235–4251.e4220 (2023).
Zheng, L. et al. Pan-cancer single-cell landscape of tumor-infiltrating T cells. Science 374, abe6474 (2021).
Gao, Y. et al. Cross-tissue human fibroblast atlas reveals myofibroblast subtypes with distinct roles in immune modulation. Cancer Cell 42, 1764–1783.e1710 (2024).
Yang, Y. et al. Pan-cancer single-cell dissection reveals phenotypically distinct B cell subtypes. Cell 187, 4790–4811.e4722 (2024).
Aran, D. et al. Comprehensive analysis of normal adjacent to tumor transcriptomes. Nat. Commun. 8, 1077 (2017).
Jakubek, Y. A. et al. Large-scale analysis of acquired chromosomal alterations in non-tumor samples from patients with cancer. Nat. Biotechnol. 38, 90–96 (2020).
Yarchoan, M. et al. PD-L1 expression and tumor mutational burden are independent biomarkers in most cancers. JCI Insight 4, e126908 (2019).
Hoekstra, M. E. et al. Distinct spatiotemporal dynamics of CD8⁺ T cell-derived cytokines in the tumor microenvironment. Cancer Cell 42, 157–167.e159 (2024).
Bresnick, A. R., Weber, D. J. & Zimmer, D. B. S100 proteins in cancer. Nat. Rev. Cancer 15, 96–109 (2015).
Teixeira, V. H. et al. Deciphering the genomic, epigenomic, and transcriptomic landscapes of pre-invasive lung cancer lesions. Nat. Med. 25, 517–525 (2019).
Zhao, E. et al. Spatial transcriptomics at subspot resolution with BayesSpace. Nat. Biotechnol. 39, 1375–1384 (2021).
Badia-i-Mompel, P. et al. Gene regulatory network inference in the era of single-cell multi-omics. Nat. Rev. Genet. 24, 739–754 (2023).
Wolf, F. A., Angerer, P. & Theis, F. J. SCANPY: large-scale single-cell gene expression data analysis. Genome Biol. 19, 15 (2018).
Mereu, E. et al. Benchmarking single-cell RNA-sequencing protocols for cell atlas projects. Nat. Biotechnol. 38, 747–755 (2020).
Wolock, S. L., Lopez, R. & Klein, A. M. Scrublet: computational identification of cell doublets in single-cell transcriptomic data. Cell Syst. 8, 281–291.e289 (2019).
Korsunsky, I. et al. Fast, sensitive and accurate integration of single-cell data with Harmony. Nat. Methods 16, 1289–1296 (2019).
Hie, B., Bryson, B. & Berger, B. Efficient integration of heterogeneous single-cell transcriptomes using Scanorama. Nat. Biotechnol. 37, 685–691 (2019).
Lopez, R., Regier, J., Cole, M. B., Jordan, M. I. & Yosef, N. Deep generative modeling for single-cell transcriptomics. Nat. Methods 15, 1053–1058 (2018).
Xu, C. et al. Probabilistic harmonization and annotation of single-cell transcriptomics data with deep generative models. Mol. Syst. Biol. 17, e9620 (2021).
Xiong, L. et al. Online single-cell data integration through projecting heterogeneous datasets into a common cell-embedding space. Nat. Commun. 13, 6118 (2022).
Becht, E. et al. Dimensionality reduction for visualizing single-cell data using UMAP. Nat. Biotechnol. 37, 37–44 (2018).
Puram, S. V. et al. Single-cell transcriptomic analysis of primary and metastatic tumor ecosystems in head and neck cancer. Cell 171, 1611–1624.e1624 (2017).
Welch, J. D. et al. Single-cell multi-omic integration compares and contrasts features of brain cell identity. Cell 177, 1873–1887.e1817 (2019).
Barkley, D. et al. Cancer cell states recur across tumor types and form specific interactions with the tumor microenvironment. Nat. Genet. 54, 1192–1201 (2022).
Gavish, A. et al. Hallmarks of transcriptional intratumour heterogeneity across a thousand tumours. Nature 618, 598–606 (2023).
Townes, F. W. & Engelhardt, B. E. Nonnegative spatial factorization applied to spatial genomics. Nat. Methods 20, 229–238 (2023).
Rodriques, S. G. et al. Slide-seq: A scalable technology for measuring genome-wide expression at high spatial resolution. Science 363, 1463–1467 (2019).
Gaujoux, R. & Seoighe, C. A flexible R package for nonnegative matrix factorization. BMC Bioinf. 11, 367 (2010).
Brunet, J. P., Tamayo, P., Golub, T. R. & Mesirov, J. P. Metagenes and molecular pattern discovery using matrix factorization. Proc. Natl Acad. Sci. USA 101, 4164–4169 (2004).
The GTEx Consortium. The GTEx Consortium atlas of genetic regulatory effects across human tissues. Science 369, 1318–1330 (2020).
Hao, Y. et al. Dictionary learning for integrative, multimodal and scalable single-cell analysis. Nat. Biotechnol. 42, 293–304 (2024).
Gu, Z., Eils, R. & Schlesner, M. Complex heatmaps reveal patterns and correlations in multidimensional genomic data. Bioinformatics 32, 2847–2849 (2016).
Thioulouse, J., Chessel, D. & Champely, S. Multivariate analysis of spatial patterns: a unified approach to local and global structures. Environ. Ecol. Stat. 2, 1–14 (1995).
Jin, S. et al. Inference and analysis of cell–cell communication using CellChat. Nat. Commun. 12, 1088 (2021).
Blake, J. A. et al. Mouse Genome Database (MGD): knowledgebase for mouse-human comparative biology. Nucleic Acids Res. 49, D981–D987 (2021).
Wu, T. et al. clusterProfiler 4.0: a universal enrichment tool for interpreting omics data. Innovation 2, 100141 (2021).
Van de Sande, B. et al. A scalable SCENIC workflow for single-cell gene regulatory network analysis. Nat. Protoc. 15, 2247–2276 (2020).
Jerby-Arnon, L. et al. A cancer cell program promotes T cell exclusion and resistance to checkpoint blockade. Cell 175, 984–997.e924 (2018).
Jerby-Arnon, L. et al. Opposing immune and genetic mechanisms shape oncogenic programs in synovial sarcoma. Nat. Med. 27, 289–300 (2021).
Campbell, K. M. et al. Prior anti-CTLA-4 therapy impacts molecular characteristics associated with anti-PD-1 response in advanced melanoma. Cancer Cell 41, 791–806.e794 (2023).
Ayers, M. et al. IFN-γ-related mRNA profile predicts clinical response to PD−1 blockade. J. Clin. Invest. 127, 2930–2940 (2017).
Bagaev, A. et al. Conserved pan-cancer microenvironment subtypes predict response to immunotherapy. Cancer Cell 39, 845–865.e847 (2021).
Smillie, C. S. et al. Intra- and inter-cellular rewiring of the human colon during ulcerative colitis. Cell 178, 714–730.e722 (2019).
Moon, K. R. et al. Visualizing structure and transitions in high-dimensional biological data. Nat. Biotechnol. 37, 1482–1492 (2019).
Setty, M. et al. Characterization of cell fate probabilities in single-cell data with Palantir. Nat. Biotechnol. 37, 451–460 (2019).
Reed, A. D. et al. A single-cell atlas enables mapping of homeostatic cellular shifts in the adult human breast. Nat. Genet. 56, 652–662 (2024).
Chen, J. et al. Transformer for one stop interpretable cell type annotation. Nat. Commun. 14, 223 (2023).
Colaprico, A. et al. TCGAbiolinks: an R/Bioconductor package for integrative analysis of TCGA data. Nucleic Acids Res. 44, e71 (2016).
Shi, Q. & Chen, Y. Data for “Cross-tissue multicellular coordination and its rewiring in cancer” [Dataset]. Zenodo https://doi.org/10.5281/zenodo.15169362 (2025).
Shi, Q. & Chen, Y. Code for “Cross-tissue multicellular coordination and its rewiring in cancer”. Zenodo https://doi.org/10.5281/zenodo.14175262 (2024).

Acknowledgements

The authors thank members of the Z.Z. laboratory for insightful discussion. This work was supported by the National Key Research and Development Program of China (2023YFF1204700), the National Natural Science Foundation of China (62203019, 92459001 and L2424217) and the Fundamental Research Funds for the Central Universities (Peking University). Q.S. was supported in part by the National Postdoctoral Fellowship Program of China (GZB20230023), the China Postdoctoral Science Foundation (2021TQ0012, 2022M720246 and 2023M740093), the Peking University Boya Postdoctoral Fellowship, the Postdoctoral Fellowship of Peking-Tsinghua Center for Life Sciences and the Boehringer-Ingelheim Postdoctoral Fellowship Program. Part of the analysis was performed on the High Performance Computing Platform of the Center for Life Sciences (Peking University).

Author information

These authors contributed equally: Qiang Shi, Yihan Chen

Authors and Affiliations

Biomedical Pioneering Innovation Center (BIOPIC), School of Life Sciences, and Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, China
Qiang Shi, Yihan Chen, Yang Li, Shishang Qin, Yu Yang, Linnan Zhu, Dongfang Wang & Zemin Zhang
Chongqing Medical University, Chongqing, China
Yu Yang & Zemin Zhang
Institute of Cancer Research, Shenzhen Bay Laboratory, Shenzhen, China
Yang Gao

Contributions

Z.Z. and Q.S. conceived and designed the project. Q.S. and Y.C. carried out data curation with the help of S.Q., Y.Y. and Y.G. Q.S., Y.C. and Y.L. conducted the computational analysis with the help of S.Q. D.W., L.Z., Y.G. and Y.Y. contributed valuable discussions. Q.S. and Z.Z. wrote the manuscript with input from all authors.

Corresponding authors

Correspondence to Qiang Shi or Zemin Zhang.

Ethics declarations

Competing interests

Z.Z. is a founder of Analytical BioSciences and is a board member for InnoCare Pharma. The other authors declare no competing interests.

Peer review

Peer review information

Nature thanks Sarah Teichmann, who co-reviewed with Simon Koplev; Itai Yanai; and the other, anonymous, reviewers for their contribution to the peer review of this work. Peer review reports are available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data figures and tables

Extended Data Fig. 1 Overview of pan-tissue single-cell datasets.

(a) Statistics of donor, sample, and cell numbers before and after QC for scRNA-seq datasets. (b) Statistics of gene numbers. The intersecting gene sets of the three commonly used versions of 10x Genomics gene annotations are used for data integration. (c) Dot plot showing the number of samples per cohort. (d) Pie chart showing the percentages of cells profiled by different 10x Genomics platforms. (e) Bar plot showing the number of cells per tissue.

Extended Data Fig. 2 Benchmarking data integration methods.

(a) Summary of benchmarking for several widely used Python-based integration methods on the total atlas and a subset atlas according to the scIB platform. Overall scores are calculated using a weighted mean of batch correction (40%) and bio-conservation (60%) scores. Methods are ranked by overall scores. (b) UMAP visualization of total cells colored by compartments (top) or tissues (bottom) across different integration methods.

Extended Data Fig. 3 NMF and cell subset ranking.

(a) NMF rank selection based on cophenetic correlation coefficients and their consistency, with the final rank chosen as 12 (Methods). (b) Consensus plot for NMF with a rank of 12. (c) Bar plots showing the top 15 cell subsets ranked by NMF weights for each factor (CM). (d) Topological and statistical evaluations of CMs, with the connectivity score defined as the ratio of observed edges to the total possible edges among all nodes within the CM network. (e) CM activities in scRNA-seq samples across 34 tissues. Each sample is assigned a CMT based on its dominant CM, with samples ordered by CMT identities and tissues. (f) CM Shannon equitability across tissues. For box plot in d, the center line represents the median, the box limits delineate the upper and lower quartiles, and the whiskers extend to the highest and lowest values within 1.5× the interquartile range.

Extended Data Fig. 4 Validation of CMs using GTEx RNA-seq data.

(a) Overview of the GTEx data analysis pipeline. (b) Venn diagram showing intersecting tissues in GTEx RNA-seq and curated scRNA-seq cohorts. (c) CM activities in RNA-seq samples across 23 tissues. Each sample is assigned a CMT based on its dominant CM, with samples ordered by CMT identities and tissues. (d) Heat maps showing the average expression of CMT signature genes (rows) across scRNA-seq (left) and GTEx RNA-seq (right) samples grouped by CMT identities (columns). (e) Gene expression of selected cell subset markers across samples grouped by CMT identities. Data from scRNA-seq (top) and GTEx RNA-seq (bottom) are shown separately. Sample sizes are noted. *P < 0.05, **P < 0.01, ***P < 0.001, ****P < 0.0001, one-tailed unpaired Wilcoxon tests. (f) Tissue prevalence of CMs measured by R_o/e for tissues profiled using both single-cell (SC) and bulk RNA-seq. For box plots in e, the center line represents the median, the box limits delineate the upper and lower quartiles, and the whiskers extend to the highest and lowest values within 1.5× the interquartile range.

Extended Data Fig. 5 Spatial characterization of CMs.

(a,b) Spatial distribution of CM08 and CM10 (a) and their component cell subsets (b) in a Visium section of the skin. (c) Spatial aggregation (based on Moran’s I) among cell subsets within CM08 or CM10 in the Visium section shown in a,b. (d) Averaged spatial aggregation (based on Moran’s I) among cell subsets within CM08 or CM10 across six Visium sections from six available donors. (e–g) H&E staining (e), spatial distribution of CMs (CM02, CM03, and CM05) (f), and cellular niches identified by CellCharter (g) in three adjacent Visium sections of the ileum from one donor. (h) CODEX imaging of intestinal Peyer’s patches showing B cells (CD21, green), T cells (CD3, pink), and DCs (CD11c, cyan). Scale bars, 200 μm. Data from the HuBMAP portal (dataset ID: HBM785.FJVT.469). (i) Dot plot illustrating scRNA-seq expression of a gene panel differentiating CM02 and CM03 among various cell subsets in the colon. The percentages of colored subsets among similar subsets are indicated at the top. For example, 98.2% of the plasma cells (B10, B11, and B12) are B12.

Extended Data Fig. 6 Spatial organization and cell-cell interactions of CMs.

(a) Schematic of two spatial metrics for CMs using 10x Visium data. The colocalization score evaluates the extent to which two cell subsets colocalize within the same spots, while the aggregation score measures the spatial adjacency of two cell subsets, encompassing both colocalization and broader spatial relationships. (b) Spatial aggregation (top) and cellular composition (bottom) of CMs. Top: CM aggregation scores across Visium samples colored by tissues (Methods). CMs are ordered by the median of aggregation scores across samples. Bottom: Proportion of different cell lineages in all cell-subset nodes within individual CM networks shown in Fig. 2b. None = 0 < Low < 0.2 ≤ Medium < 0.4 ≤ High. (c,d) CellPhoneDB cell-cell interaction analysis based on scRNA-seq data. Heat maps showing cell-cell interaction counts within and across cell types (c) or CMs (d). Bar plots on the top and right represent the sums of the columns and rows, respectively. (e) CellChat cell-cell interaction counts based on scRNA-seq, stratified by ligand-receptor pair types. (f) Heat map showing the responsive cytokines (columns) among cell subsets within individual CMs (rows) (Methods). Bar plots on the top and left represent the sums of columns and rows, respectively. (g) Cytokine responsiveness in CMs. For each CM, the CM FDR denotes the minimum FDR value among all subsets within the CM (Methods). For box plot in b, the center line represents the median, the box limits delineate the upper and lower quartiles, and the whiskers extend to the highest and lowest values within 1.5× the interquartile range.

Extended Data Fig. 7 Associations between CM activity and tissue, sex, and age group.

(a) Bar plot showing the proportion of variance explained of CM activities by tissues, measured as adjusted R² from linear models. Significance of linear models is indicated. (b) Heat map showing associations of CM activity with sexes in non-reproductive tissues. Two-tailed unpaired Wilcoxon tests. (c) Heat map showing associations of CM activity with age groups. Kruskal–Wallis tests. (d) CM09 activity and frequencies of its subsets in individual thymus samples stratified by age groups. Two-tailed unpaired Wilcoxon tests. (e) Bar plot showing associations between age groups and frequencies of immune cell subsets in the spleen. Kruskal–Wallis tests. (f) Frequencies of selected immune subsets in individual spleen samples stratified by age groups. Two-tailed unpaired Wilcoxon tests. (g) Frequencies of the B08 subset in individual spleen samples stratified by age groups. Two-tailed unpaired Wilcoxon tests. For box plots in d,f,g, the center line represents the median, the box limits delineate the upper and lower quartiles, and the whiskers extend to the highest and lowest values within 1.5× the interquartile range.

Extended Data Fig. 8 CM05 analysis in the spleen.

(a) Venn diagrams showing the overlap of four CM05 subsets in all signature genes (left) and signature TFs (right). (b) Expression of NR4A1, NR4A2, and NR4A3 in CM05 subsets and other cells within the same cell types. (c) Venn diagram showing the overlap of regulon TFs identified by SCENIC analysis and signature TFs. The eight shared TFs (key regulators) and other signature TFs are indicated. (d) Regulatory network of 17 convergent regulons in CM05. The key regulators are highlighted. Regulatory relationships are determined by SCENIC analysis. (e) Venn diagrams showing the overlap in regulon targets among different cell types. Hypergeometric tests. TF, transcription factor.

Extended Data Fig. 9 CM12 analysis in the breast.

(a) Inflammatory scores of CM12 subsets in pre- and post-menopausal samples. Two-tailed unpaired Wilcoxon tests. (b) Bar plot showing the association between menopausal status and frequencies of all nonepithelial subsets in the breast. The dotted line indicates FDR at 0.05. Two-tailed unpaired Wilcoxon tests. (c) Dot plot showing the scaled expression of selected signature genes of CM12-related specialized fibroblast subsets in the breast. (d) Spatial distribution of CM12 cell subsets in a spatial transcriptomics slide of the breast. (e) CM12 activity and frequencies of S10 and S06 fibroblasts in samples under 50 years old and over 50 years old. Two-tailed unpaired Wilcoxon tests. (f) Validation of menopausal trajectory using another breast dataset (Methods). The PHATE embedding showing breast samples (individual dots) colored by menopausal status, pseudotime and frequencies of fibroblasts. (g) Trend lines show LOESS-smoothed fibroblast inflammatory scores along pseudotime in breast samples. The error bands show the 95% confidence intervals. For box plots in a,e, the center line represents the median, the box limits delineate the upper and lower quartiles, and the whiskers extend to the highest and lowest values within 1.5× the interquartile ranges.

Extended Data Fig. 10 Overview of pan-cancer single-cell datasets.

(a) Statistics of donor, sample and cell numbers before and after QC for scRNA-seq datasets. (b) Dot plot showing the number of samples per cohort. (c) Bar plot showing the number of cells per cancer type.

Extended Data Fig. 11 Analysis of CMs and cCMs.

(a) Healthy CM activities across healthy, adjacent non-tumor, and tumor samples. Dots represent individual samples. Two-tailed unpaired Wilcoxon tests. (b) Topological and statistical evaluations of cCMs, with the connectivity score defined as the ratio of observed edges to the total possible edges among all nodes within the cCM network. (c) Bar plots showing the top 20 cell subsets ranked by NMF weights for each factor (cCM). (d) Network plots of the four CMs identified from the pan-cancer analysis. Nodes represent cell subsets labeled by short names and colored by cell types. Edge color indicates correlation specificity. For box plots in a,b, the center line represents the median, the box limits delineate the upper and lower quartiles, and the whiskers extend to the highest and lowest values within 1.5× the interquartile range.

Source Data

Extended Data Fig. 12 Pan-cancer analysis of cCMs.

(a) cCM activities across cancer types. Each sample is assigned a cCMT by its dominant cCM, and all samples are ordered by their cCMT identities and cancer types. (b) Tissue prevalence of cCMs measured by R_o/e. (c) cCM02 activities in healthy, adjacent non-tumor, and tumor samples across cancer types. Dots represent individual samples. Two-tailed unpaired Wilcoxon tests. (d) Schematic representation of rewiring of multicellular ecosystems during tumor progression. Created in BioRender; Shi, Q. (2025), https://BioRender.com/atp19sm. (e) Heat map showing the responsive cytokines among cell-subset components of cCM02. Bar plots on the top and left represent sums of columns and rows, respectively. (f) Distribution of cCM02 MCP overall expression across different cell subsets for cells from tumor and adjacent non-tumor samples. Vertical black line indicates the median of the distribution. Two-tailed unpaired Wilcoxon tests. For box plots in c, the center line represents the median, the box limits delineate the upper and lower quartiles, and the whiskers extend to the highest and lowest values within 1.5× the interquartile range. cCMT, cCM type.

Source Data

Supplementary information

Supplementary Information

This file contains Supplementary Figs. 1–10, Supplementary Notes 1 and 2 and legends for Supplementary Tables 1–12.

Reporting Summary

Supplementary Tables

This file contains Supplementary Tables 1–12.

Peer Review File

Source data

Source Data Fig. 1

Source Data Fig. 2

Source Data Fig. 3

Source Data Fig. 4

Source Data Fig. 5

Source Data Extended Data Fig. 1

Source Data Extended Data Fig. 2

Source Data Extended Data Fig. 3

Source Data Extended Data Fig. 4

Source Data Extended Data Fig. 5

Source Data Extended Data Fig. 6

Source Data Extended Data Fig. 7

Source Data Extended Data Fig. 8

Source Data Extended Data Fig. 9

Source Data Extended Data Fig. 10

Source Data Extended Data Fig. 11

Source Data Extended Data Fig. 12

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Shi, Q., Chen, Y., Li, Y. et al. Cross-tissue multicellular coordination and its rewiring in cancer. Nature 643, 529–538 (2025). https://doi.org/10.1038/s41586-025-09053-4

Received: 11 April 2024
Accepted: 22 April 2025
Published: 28 May 2025
Version of record: 28 May 2025
Issue date: 10 July 2025
DOI: https://doi.org/10.1038/s41586-025-09053-4

This article is cited by

Spatial multi-omics mapping of tumor microanatomy dynamics following radiotherapy combined with targeted-immunotherapy in hepatocellular carcinoma
- Fansen Ji
- Haoming Xia
- Jiahong Dong
Biomarker Research (2026)

Cellular neighborhoods in cancer
- Lichun Ma
- Barbara Xiong
- Kai Tan
Nature Cancer (2026)

Various cell types in tissues cooperate in networks that are disrupted in disease
- Mirjana Efremova
Nature (2025)
