Introduction

Pleomorphic adenoma (PA), a benign tumor, is the most prevalent neoplasm of the salivary glands1. However, recurrent or prolonged PA can undergo malignant transformation into carcinoma ex pleomorphic adenoma (CXPA), with an estimated malignancy rate of 6.2%2. Invasive CXPA is aggressive, with a poor prognosis and a 5-year survival rate of 25%–65%, often associated with lymph node metastasis and disease-related deaths3,4,5.

Currently, CXPA tumorigenesis is assumed to be accompanied by the accumulation of genetic changes, primarily the amplification of the human epidermal growth factor receptor 2 (HER-2), encoded by the ERBB-2 gene6. Literature and our previous studies indicate that HER-2 amplification promotes CXPA carcinogenesis by enhancing cell proliferation and invasiveness through the activation of the PI3K/Akt and MAPK/ERK signaling pathways7,8,9. However, clinical trials of HER-2 targeted therapy in CXPA have failed to demonstrate significant therapeutic benefit10, possibly because only a subset of patients have tumors with HER-2 amplification7. Therefore, effective targeted therapies to improve survival in patients with CXPA are still lacking, and identifying effective therapeutic targets based on the mechanisms of CXPA carcinogenesis remains a viable research strategy.

Cancer stem cells (CSCs) play a key role in cancer aggressiveness11 due to their capacity for self-renewal and multilineage differentiation (stemness) through asymmetric or symmetric division12,13,14. Cell surface markers have been beneficial in characterizing CSCs15,16,17. Al-Hajj et al. discovered that the CD44+/CD24−/low cell subpopulation in breast cancer is highly tumorigenic18. An anti-CD44 antibody significantly reduced human breast tumor growth in a xenograft model19. CD133, a CSC marker, is highly expressed in several tumors, including gastric cancer. CD133-positive cells exhibited more malignant biological behaviors in gastric cancer20. CD133 inhibition in cisplatin-resistant CD133+ gastric cancer cells reduced tumor cell viability and increased apoptotic cell death21,22. Accordingly, CSC-targeted therapies, such as those that induce CSC apoptosis and differentiation, inhibit CSC self-renewal and division, or target the CSC niche23,24, may constitute a key component of multimodal treatment.

However, little research has been conducted on CSCs in salivary gland tumors. Only recently have studies begun to emphasize the presence of stem cells and the expression of marker proteins in salivary gland carcinomas and investigate their relationship with tumor behavior25,26. Wang et al. confirmed CD36 as a tumor stem cell marker in PA and demonstrated that PI3K-AKT pathway inhibition suppresses PA initiation and growth27. However, the significance of CSC markers in CXPA tumorigenesis and the relationship between CXPA-associated CSC markers and tumor proliferation and invasion remain unknown. Identifying CSCs in CXPA could facilitate the development of targeted therapeutic strategies.

High-throughput RNA sequencing (RNA-seq) has significantly advanced the discovery of CSCs28. While transcriptional heterogeneity is typically assessed through bulk RNA-seq and bioinformatics analyses, identifying specific cell types is challenging. Cell analysis using single-cell RNA sequencing (scRNA-seq) has enabled comprehensive dissection of cellular heterogeneity29,30. Therefore, this study performed scRNA-seq to analyze the transcriptomes of 41,145 single cells from two CXPA specimens and paired normal parotid glands. Subsequently, we used nonnegative matrix factorization (NMF) to examine epithelial cell heterogeneity and discovered a potential CSC population in CXPA. We further validated the potential marker gene of this CSC population in 100 CXPA surgical specimens. Finally, sphere-forming experiments and cell biology behavioral assays were conducted to validate stemness and function. Investigating the molecular mechanisms associated with CSCs underlying CXPA tumorigenesis will help identify effective therapeutic targets to improve patient survival.

Results

Single-cell sequencing and cell type identification

After removing low-quality cells and normalizing gene expression (Figs. 1A and S1A), 41,145 single cells were processed for further analysis, with 11 cell clusters identified using graph-based clustering and dimensional reduction with UMAP (Fig. S1B). The UMAP analysis revealed that cell distribution was consistent across the different CXPA samples (Fig. 1B), suggesting a unified cellular landscape within CXPA. Eight major cell types were annotated based on the expression of canonical gene markers (Figs. 1B, C and S1C), including epithelial cells expressing KRT8, KRT14, KRT18, EPCAM, KRT19, and KRT7; endothelial cells expressing VWF and PECAM1; fibroblasts expressing COL1A2, DCN, COL1A1, PDGFRA, FAP, and POSTN; T cells expressing CD3D and CD3E; B cells expressing CD79A and MS4A1; plasma cells expressing MZB1; myeloid cells expressing CD68, CD14, and LYZ; and pericytes expressing NOTCH3, RGS5, and NR2F2. For each cell type, we identified DEGs and discovered cell-type-specific pathway enrichments (Fig. 1D). Consequently, our scRNA-seq approach comprehensively delineated the cellular diversity present within the CXPA tumor microenvironment (TME).

Fig. 1: A single-cell atlas of CXPA and Parotid gland (PG).

A The workflow of single-cell RNA sequencing. B UMAP plot of 41,145 single cells from scRNA-seq labeled by cell type. T refers to CXPA and N refers to PG. C Dot plot of canonical marker genes for each cell type. D A heatmap of marker genes for each cell type and GO and KEGG enrichment analyses of those marker genes. E The cell numbers and proportions of CXPA cell types in different samples and tissue types.

Notably, each of the eight clusters comprised cells from different samples despite variations in cell type proportions (Fig. 1E).

Molecular characteristics of CXPA epithelial carcinomatous cells

First, we inferred CNVs in the epithelial cells of each CXPA sample based on smoothed expression profiles across chromosomal intervals. We observed higher CNV levels in CXPA epithelial cells than in the reference pericytes from PG (Fig. 2A, B). The inferred CNV analysis revealed significant amplifications on chromosomes 1 and 22 and significant deletions on chromosomes 4 and 8 (Fig. 2A, B). Subsequently, we compared marker gene expression between CXPA and PG epithelial cells. Marker genes related to salivary gland secretion and saliva enzymes, such as PRB2, HTN1, and CST5, were highly expressed in PG epithelial cells, while several genes reported to promote CXPA carcinogenesis, including FASN, NQO1, and SDC1, were upregulated in CXPA cells (Fig. 2C, D)31,32,33. GO enrichment analysis based on the DEGs revealed that metabolism-related pathways, such as the lipid biosynthetic process, were enriched in the malignant epithelium, suggesting that an abnormal metabolic process occurred in the CXPA epithelium (Fig. 2E). Conversely, several pathways related to the physiological function of salivary glands, including defense response to bacteria, were mainly enriched in the PG epithelium (Fig. 2F).

Fig. 2: Molecular characteristics of epithelial cells.

A, B Chromosomal copy-number variation values in epithelial cells of two CXPA samples, with blue indicating deletion and red indicating amplification. C Violin plots revealing the high differential expression of the representative genes PRB2, HTN1, and CST5 in the PG sample (P < 0.01). D Violin plots demonstrating the high differential expression of the representative genes FASN, NQO1, and SDC1 in the CXPA samples (P < 0.01). E GO analysis of genes in CXPA epithelial cells (P < 0.05). F GO analysis of genes in PG epithelial cells (P < 0.05). ****P < 0.0001, Mann–Whitney Wilcoxon.

Characterization of transcriptional heterogeneity of epithelial cells via NMF

To characterize the transcriptional heterogeneity of epithelial cells, we analyzed clusters of co-expressed genes in each sample using NMF and obtained 30 signatures (Supplementary Data1). For example, the NMF results for CXPA-2 revealed complex intratumoral expression features (Fig. 3A). Subsequently, we identified four prominent meta-signatures across the four samples via correlation cluster analysis of the 30 signatures (Fig. 3B). Each of these meta-signatures comprised unique gene sets (Fig. 3C). The first meta-signature was associated with epithelial development and included genes such as MUC1, CAMK2N1, and AQP3, as well as fatty-acid-related genes, such as FASN. The second meta-signature mostly comprised genes associated with the cell cycle, including mitotic, DNA replication, and proliferative-related genes such as CCNB1, CDK1, CDKN3, HMGB2, and MKI67. The third meta-signature comprised genes involved in ECM remodeling, including COL1A2, CAV1, MYL9, TPM2, and IGFBP5. Genes in the fourth meta-signature were associated with salivary secretion, including PRB1, PRH1, and WFDC2 (Fig. 3C). Based on these meta-program signatures, we calculated functional scores for all epithelial cells (Fig. 3D), which enabled us to define four cell states with specific meta-signatures: (i) epithelial development (ED), (ii) cell cycle (CC), (iii) ECM remodeling (ER), and (iv) salivary secretion (SS) (Fig. 3E).

Fig. 3: Expression profiles revealed distinct clusters of epithelial cells in CXPA and PG.

A Heatmap illustrating gene expression signatures parsed from a representative tumor (CXPA-2) using NMF. Each expression signature is numbered, with corresponding genes displayed (right). CXPA-2 refers to T2. B Pearson’s correlation clustering of 30 expression signatures. Four regions comprising clusters of highly correlated signatures were identified as meta-signatures. C Heatmap depicting the expression levels of the top 50 marker genes in each meta-signature. D UMAP plots displaying the distribution of four meta-signatures after using the “AddModuleScore” function. E UMAP plots demonstrating the distribution of four cell states and sample origins in CXPA and PG epithelial cells. F GO and KEGG enrichment analyses of marker genes for each cell state. G Heatmap depicting differences in TF activity across epithelial clusters of CXPA and PG (scored by SCENIC).

To illustrate epithelial cell functional heterogeneity, we conducted KEGG and GO enrichment analyses to infer the biological functions of epithelial clusters (Fig. 3F). Enrichment analysis revealed that ED was associated with epithelial branch elongation, positive regulation of keratinocyte differentiation, and mammary gland epithelial cell proliferation. Pathways such as muscle contraction, focal adhesion, cell junction assembly, and regulation of the actin cytoskeleton were upregulated in ER. Mitotic nuclear division and cell cycle pathways were enriched in CC. SS was associated with defense response to bacteria and salivary secretion.

Furthermore, we correlated transcriptome differences with TF motifs across all epithelial clusters using single-cell regulatory network inference and clustering (SCENIC) analysis. The SCENIC analysis identified a set of TFs associated with biological signatures in distinct epithelial clusters (Fig. 3G). We discovered that key transcription motifs of FOS, JUN, and TWIST1 were activated in the ER cluster. Additionally, MYC2 and AR were substantially activated in the ED cluster. In the CC cluster, EZH2 and BRCA1 motifs were activated (Fig. 3G). EZH2 regulates the stemness-associated gene expression of CSCs34. BRCA1 becomes hyperphosphorylated during the late G1 and S phases of the cell cycle and is transiently dephosphorylated shortly after the M phase, suggesting its role in the regulation of cell cycle progression35.

CC cluster is a potential CSC population in CXPA

To elucidate whether a stemness-associated epithelial cell subpopulation is associated with CXPA tumorigenesis, we first assessed the stemness state across the four epithelial cell states. Cells in the CC state exhibited higher stemness scores than cells in the other states (n = 16 genes, P < 0.0001) (Supplementary Data 2, Fig. 4A). Additionally, we investigated the distribution pattern of the four cell states along the pseudotime trajectory. Cells in the CC state were predominantly observed at the start of the pseudotime, whereas cells in the ER and SS states were mostly found at the end (Fig. 4B), supporting our notion that CC cells are highly undifferentiated cells. Afterward, we regrouped the four cell states into eight main clusters (Fig. 4C) and assessed the stemness state in these eight epithelial cell clusters. Cells in CC1 exhibited the highest stemness scores, followed by CC2 (n = 16 genes, P < 0.0001) (Fig. 4D). CytoTRACE analysis further revealed that CC1 exhibited a higher differentiation potential than CC2 (P < 0.0001) (Fig. 4E). CC1 and CC2 were almost exclusively enriched in tumor samples, suggesting that they represent two potential populations of CSCs (Fig. 4F).

Fig. 4: Transcriptional landscape heterogeneity of epithelial cell subsets.

A Violin plot of stem cell score in four cell states. B Pseudotime evolution trajectories of epithelial cell states ED, ER, SS, and CC (inferred by Monocle2). C UMAP plots illustrating the distribution of cell clusters in CXPA and PG epithelial cells. D Violin plot of stem cell score in eight epithelial cell subsets. E Bar chart for CC (CC1 and CC2) stemness and differentiation (assessed using CytoTRACE analysis). F Bar chart depicting the proportions of the eight epithelial cell subsets in CXPA and PG samples. ****P < 0.0001, Mann–Whitney Wilcoxon.

We further analyzed marker genes of the CC1 and CC2 populations. UMAP plots revealed that ECT2 and TACC3 expression was relatively specific to the CC1 and CC2 populations, respectively (Figs. 5A, B, S2 and S3). IHC analysis indicated that ECT2 is localized in the nucleus (Fig. 5C), whereas TACC3 is mainly expressed in the cytoplasm of CXPA cells (Fig. 5D). These results suggest that ECT2 can serve as a marker gene for CC1 cells and TACC3 as a marker gene for CC2 cells.

Fig. 5: Correlation analysis of ECT2 and TACC3 expression levels with invasiveness and prognosis in patients with CXPA.

A Expression levels of ECT2 illustrated in UMAP plots from eight epithelial cell clusters. B Expression levels of TACC3 illustrated in UMAP plots from eight epithelial cell clusters. C IHC analysis of the ECT2 expression levels (Scale bar = 20 μm) in CXPA and PA, as well as corresponding H&E staining results. D IHC analysis of the TACC3 expression levels (Scale bar = 20 μm) and the H&E staining results in CXPA and PA. The scoring system of ECT2 and TACC3 expression was as follows: low expression (score < 3), moderate expression (3 ≤ score < 6), and high expression (score ≥ 6). E Comparison of ECT2 and TACC3 expression in CXPA (n = 100) and PA (n = 29) patients. F Comparison of ECT2 and TACC3 expression in patients with WICXPA (n = 50) versus patients with MICXPA and ICCXPA (n = 50; Mann–Whitney U-test). G Comparison of ECT2 and TACC3 expression in patients with WICXPA (n = 50) versus patients with MICXPA and ICCXPA (n = 50; Fisher’s exact test). The thick line indicates the median value. The bottom and top of the boxes represent the 25th and 75th percentiles (interquartile range), and the whiskers extend to 1.5 times the interquartile range. H Kaplan–Meier curves for OS based on ECT2 expression levels (log-rank test, P < 0.05) in WICXPA (n = 50). I Kaplan–Meier curves for OS based on TACC3 expression levels (log-rank test, P > 0.05) in WICXPA (n = 50). J Univariate and multivariate logistic regression analyses assessing the predictive value of biomarkers (ECT2 and TACC3 expression) for the invasive degree of CXPA. K Univariate and multivariate logistic regression analyses assessing the prognostic value of biomarkers (ECT2 and TACC3 expression) in patients with WICXPA. OR Odds Ratio.

CC1 cluster marker gene ECT2 was validated in surgical specimens from CXPA and PA

The extent of invasion beyond the capsule of the pre-existing PA is a key factor in the prognosis of patients with CXPA1,2. Accordingly, CXPA is classified as intracapsular, minimally invasive, or widely invasive. We conducted subsequent immunohistochemical validation on 100 CXPA and 29 PA archived paraffin-embedded samples. The IHC results revealed that ECT2 and TACC3 were predominantly expressed in CXPA, with minimal expression in PA (Fig. 5E, P < 0.001), suggesting that both ECT2 and TACC3, which function as CSC marker genes of CXPA, are crucial for CXPA tumorigenesis. ECT2 and TACC3 expression levels were higher in widely invasive CXPA (WICXPA) than in minimally invasive CXPA (MICXPA) and intracapsular CXPA (ICCXPA) (Fig. 5F, P < 0.001). High ECT2 expression was predominantly observed in patients with WICXPA (Fig. 5G, P < 0.001). Furthermore, univariate and multivariate logistic regression analyses confirmed the strong correlation between ECT2 expression and CXPA invasiveness (Fig. 5J). However, the multivariate logistic regression analysis revealed no correlation between TACC3 expression and CXPA invasiveness (Fig. 5J).

The median ECT2 or TACC3 positivity rate was used as a cutoff to classify patients with CXPA into high- and low-expression groups for Kaplan–Meier survival analysis. Kaplan–Meier survival analysis revealed that patients with WICXPA exhibiting low ECT2 expression demonstrated better OS than those with high ECT2 expression (Fig. 5H, P < 0.05). No statistically significant difference in OS was observed between patients with WICXPA exhibiting low TACC3 expression and those exhibiting high TACC3 expression (Fig. 5I). ECT2 and TACC3 expression demonstrated no statistically significant effects on DFS rate (Fig. S4A, B). Additionally, the univariate and multivariate logistic regression analyses revealed a correlation between ECT2 expression and prognosis in patients with WICXPA (Fig. 5K). Patients with intracapsular or minimally invasive CXPA exhibited a better prognosis, and Kaplan–Meier survival analyses in these groups revealed no statistically significant differences (Fig. S4C–F). These results indicate that ECT2 expression can be used to evaluate the prognosis of patients with WICXPA.
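The median split and the Kaplan–Meier (product-limit) estimate underlying this analysis can be illustrated with a minimal Python sketch. It is not the analysis code used in this study: the helper names `split_by_median` and `kaplan_meier` are hypothetical, the follow-up times and event indicators are invented for illustration, and labeling values strictly above the median as "high" is an assumption because the handling of ties is not stated. The log-rank test is not implemented here.

```python
from statistics import median
from typing import List, Tuple


def split_by_median(values: List[float]) -> List[str]:
    """Label each value 'high' if strictly above the median, else 'low' (tie rule assumed)."""
    cutoff = median(values)
    return ["high" if v > cutoff else "low" for v in values]


def kaplan_meier(times: List[float], events: List[int]) -> List[Tuple[float, float]]:
    """Product-limit estimate: (time, survival probability) at each distinct event time.

    events[i] is 1 if the patient died at times[i] and 0 if the patient was censored.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        tied = [e for tt, e in data if tt == t]  # all patients leaving the risk set at t
        deaths = sum(tied)
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= len(tied)
        i += len(tied)
    return curve


# Hypothetical follow-up times (months) and event indicators, not study data.
times = [2, 3, 3, 5, 8]
events = [1, 1, 0, 1, 0]
curve = kaplan_meier(times, events)

# Hypothetical positivity rates (%), split at their median.
labels = split_by_median([10.0, 40.0, 25.0, 60.0, 5.0])
print(curve, labels)
```

In practice, one curve would be estimated per expression group and the groups compared with a log-rank test.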

ECT2+ cells exhibited stem cell properties and promoted the proliferation, invasion, and migration of CXPA

Because ECT2 expression was more closely associated with CXPA prognosis and invasion than TACC3 expression, we investigated its role in CXPA carcinogenesis. First, we assessed ECT2 expression in two CXPA cell lines and confirmed that SM-AP4 exhibited higher ECT2 levels than SM-AP1 (Fig. 6A). Subsequently, SM-AP1 was stably transfected with a lentivirus containing a negative control or OE-ECT2 (Fig. 6B). The CCK-8 assay revealed that ECT2 overexpression significantly increased the growth rate of SM-AP1 cells (Fig. 6C). Additionally, wound healing and transwell assays demonstrated that the migration and invasion abilities of OE-ECT2 SM-AP1 cells were markedly enhanced (Fig. 6D, E). The live/dead staining assay revealed that the number of viable SM-AP4 cells decreased after ECT2 knockdown (Fig. 6F, G). Wound healing assays indicated that fewer cells migrated in the ECT2 knockdown group than in the control group (Fig. 6H). In the transwell assay, fewer cancer cells passed through the membrane in the ECT2 knockdown group than in the control group (Fig. 6I), indicating a notable reduction in the invasive capacity of the cells after ECT2 knockdown.

Fig. 6: Effects of ECT2 on proliferation, migration, invasion, and stemness of CXPA cell lines.

A Real-time quantitative polymerase chain reaction (RT-qPCR) and western blot analyses detected ECT2 expression in SM-AP1 and SM-AP4 cells. B RT-qPCR and western blot analyses detected the overexpression efficiency of ECT2 in SM-AP1 cells. C CCK-8 assay results for SM-AP1 and OE-ECT2 SM-AP1 cells. D Transwell assay results (with or without Matrigel) for SM-AP1 and OE-ECT2 SM-AP1 cells (Scale bar = 100 μm). E Wound healing assay results for OE-ECT2 SM-AP1 and SM-AP1 (Scale bar = 200 μm). F RT-qPCR and western blot analysis detected the knockdown efficiency of ECT2 in SM-AP4 cells. G Representative live/dead images of siECT2 SM-AP4 and SM-AP4 cells (Scale bar = 100 μm). H Wound healing assay results for siECT2 SM-AP4 and SM-AP4 cells (Scale bar = 200 μm). I Transwell assay results (with or without Matrigel) for SM-AP4 and siECT2 SM-AP4 cells (Scale bar = 100 μm). J Measurement of the area of the cell spheroids by ImageJ after 5 days of culture (n = 20, scale bar = 200 μm). Independent experiments were performed three times. *P < 0.05, **P < 0.01, ***P < 0.001, and ****P < 0.0001, determined by Student’s t-test and two-way ANOVA. Error bars indicate mean ± standard deviation.

To assess the effect of ECT2 on the stemness of the CXPA cell lines, a tumor sphere formation assay was performed using OE-ECT2 SM-AP1, siECT2 SM-AP4, and control cells. Compared with the control groups, OE-ECT2 SM-AP1 cells formed larger spheres, whereas siECT2 SM-AP4 cells formed smaller spheres (Fig. 6J). Overall, ECT2 promoted the stemness, proliferation, invasion, and migration capabilities of the CXPA cell line in vitro.

Discussion

CXPA is a tumor caused by the malignant transformation of PA and is characterized by aggressive behavior and a poor prognosis36. The major molecular events in CXPA carcinogenesis are genomic changes, altered metabolism, and dynamic shifts in the TME. Notably, CSCs are considered key drivers of carcinogenesis and tumor progression owing to their self-renewal and differentiation capabilities37. Therefore, identifying CSCs in CXPA will help elucidate the mechanisms of malignancy to guide targeted therapy and improve patient survival. However, the CSCs of CXPA remain poorly understood.

Neoplastic cells have been identified as heterogeneous populations. The scRNA-seq technique supports the description of the biological characteristics of heterogeneous cell populations by providing unbiased information about individual cells38. In this study, the landscapes of cell constituents and their genetic heterogeneities in CXPA were investigated. To the best of our knowledge, this is the first study to generate a comprehensive gene expression atlas of CXPA using scRNA-seq technology. NMF is widely used to extract significant features from microarray and scRNA-seq data and to identify mRNA isoforms39,40. Yu Jiang et al. grouped four major neoplastic cell states with distinctive expression signatures using NMF based on single-cell data41, revealing the tumor heterogeneity of adamantinomatous craniopharyngioma. Similarly, in this study, NMF analysis of single-cell data identified four cell states of tumor epithelial cells with specific meta-signatures: (i) epithelial development (ED), (ii) cell cycle (CC), (iii) ECM remodeling (ER), and (iv) salivary secretion (SS), indicating the heterogeneity of CXPA tumor epithelial cells.

Stemness scores and pseudotime trajectory analyses were used to assess cell stemness. Results revealed that tumor epithelial cells within the CC cluster exhibited significantly higher stemness scores than the other three clusters (P < 0.0001). Pseudotime trajectory analysis revealed that cells in the CC cluster were predominantly located at the start of the pseudotime trajectory, supporting our notion that these tumor epithelial cells may be highly undifferentiated and associated with stemness. Additionally, KEGG and GO enrichment analyses indicated that pathways involved in cell proliferation, such as nuclear division, mitotic nuclear division, and the cell cycle, were enriched in the CC cluster. This evidence suggests that the CC cluster represents a tumor epithelial population with stemness in CXPA.

In addition to the heterogeneity of tumor epithelial cells, CSCs exhibit heterogeneity, with different tumor CSCs characterized by different markers. Studies have identified novel CSC-related markers, including CD44, CD133, and CD24, in breast, hepatocellular, and bladder cancers, respectively42,43,44. We regrouped the CC cluster into CC1 and CC2 clusters using UMAP clustering. Results revealed that the stemness-associated tumor epithelial cells in CXPA also displayed heterogeneity. Based on specific gene expression in the UMAP plots, ECT2 and TACC3 were identified as candidate markers for the CC1 and CC2 clusters, respectively. Further analysis of 100 archived paraffin-embedded CXPA specimens and 29 cases of PA specimens revealed that ECT2 and TACC3 were significantly more expressed in CXPA than in its benign precursor PA and normal glands, validating ECT2 and TACC3 as markers for the CC1 and CC2 populations, respectively, in CXPA.

Epithelial cell transforming sequence 2 (ECT2), also known as ARHGEF31, is a guanine nucleotide exchange factor for Rho GTPases (RhoGEF) that primarily activates RhoA45. ECT2 is overexpressed in several tumors relative to normal tissues, including oral squamous cell carcinoma46, glioblastoma47, lung adenocarcinoma48, and colorectal carcinomas49. TACC3 is upregulated in solid tumors and is strongly associated with poor prognosis in several cancers, including breast and lung cancers50. It is localized to centrosomes and microtubules, controlling spindle stability and microtubule nucleation51. Although both ECT2 and TACC3 are highly relevant markers in malignant tumors, CytoTRACE analysis revealed that CC1 cells exhibited a higher potential for differentiation than CC2 cells (P < 0.0001). Logistic regression analyses of 100 CXPAs revealed that ECT2 was associated with tumor invasiveness and patient prognosis, whereas TACC3 was not, suggesting that ECT2-positive CSC clusters are more closely associated with the biological behavior of CXPA. Worse OS was observed in patients with high ECT2 expression in invasive CXPA, suggesting that ECT2 may be an important prognostic molecule in CXPA.

Tumor-derived spheroid cultures are closely linked to in vitro studies of cancer stemness and are considered one of the criteria for identifying CSCs52. Sphere cells from prostate cancer exhibited extensive self-renewal and multilineage differentiation capabilities53. In this study, in vitro cellular assays demonstrated that ECT2 knockdown suppressed CXPA tumor sphere formation. Additionally, ECT2 knockdown in SM-AP4 cells suppressed cell proliferation, migration, and invasion, whereas ECT2-overexpressing SM-AP1 cells exhibited the opposite effect, indicating that ECT2 may play a pro-oncogenic role in CXPA.

Nowadays, CSCs are recognized as drivers of tumor initiation and growth, and they are often associated with aggressive, heterogeneous, and therapy-resistant tumors37. For many years, researchers have focused on CSCs54. Combining CSC-targeted therapies with conventional non-targeted therapies can result in decreased chemoresistance55. Overall, our findings provide the first comprehensive gene expression atlas of CXPA, revealing intratumoral heterogeneity at single-cell resolution. Furthermore, we identified a unique tumor epithelial cell population with CSC characteristics and high ECT2 expression that contributes to CXPA carcinogenesis, which lays the groundwork for the development of precision therapies in CXPA.

Methods

Sample collection

All CXPA and PA samples were collected from the Department of Oral Pathology at Ninth People’s Hospital, Shanghai Jiao Tong University School of Medicine (Shanghai, China), between 2006 and 2022. Archived paraffin-embedded CXPA (n = 100) and PA (n = 29) samples were selected for immunohistochemical analysis. Additionally, two fresh CXPA specimens and paired normal parotid gland specimens were collected from 2021 to 2022 for single-cell sequencing. All research processes were approved by the ethics committee of Shanghai Ninth People’s Hospital (SH9H-2023-T308-1).

Single-cell suspension preparation and droplet-based single-cell sequencing

Widely invasive CXPA specimens and paired normal parotid glands (derived from the tumor-free tissue of the two CXPA patients) were processed immediately after collection. The histological morphology of the remaining parts of the four samples was evaluated to confirm the diagnosis. Samples were mechanically and enzymatically digested into single-cell suspensions using gentleMACS (Miltenyi) according to the manufacturer’s instructions. The dissociated cells were filtered through a 70-μm cell sieve and then through a 40-μm cell sieve. The filtrate was centrifuged at 300 × g for 5 min, and the cell pellet was resuspended in red blood cell lysis buffer (MACS, 130-094-183) after three washes with phosphate-buffered saline (PBS). The cell pellet was centrifuged and resuspended in 3 mL of Dulbecco’s phosphate-buffered saline. The cells were counted (LUNA-II™) and assessed for viability using trypan blue staining.

Single-cell suspensions with viability greater than 90% were loaded into Chromium chips. cDNA libraries were constructed using the 10× Genomics Chromium Next GEM Single Cell 3′ Reagent kits (version 3.1, 1000268) following the manufacturer’s instructions. The scRNA-seq library was sequenced on the Illumina NovaSeq 6000 platform (PE150). Sequenced reads were aligned and quantified using the Cell Ranger56 (version 5.0.0) pipeline. The unique molecular identifier count matrix was processed using the Seurat57 R package (version 4.0.0). Quality control (QC) of the expression matrix was based on three metrics: the number of detected genes (nFeature_RNA), the number of RNA molecules (nCount_RNA), and the proportion of mitochondrial gene counts per cell (percent.mt). After applying these QC criteria, 41,145 high-quality cells (400 < nFeature_RNA < 8000, 400 < nCount_RNA < 50,000, and percent.mt < 20) were included in downstream analyses.
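The QC criteria above amount to a per-cell filter on three metrics. The following minimal Python sketch illustrates the logic only; it is not the Seurat code used in this study. The `passes_qc` helper and the example cells are hypothetical, and only the thresholds are taken from the text.

```python
# Thresholds quoted in the text (all bounds are exclusive).
MIN_FEATURES, MAX_FEATURES = 400, 8000
MIN_COUNTS, MAX_COUNTS = 400, 50_000
MAX_PERCENT_MT = 20.0


def passes_qc(cell: dict) -> bool:
    """Return True if a cell satisfies all three QC criteria."""
    return (
        MIN_FEATURES < cell["nFeature_RNA"] < MAX_FEATURES
        and MIN_COUNTS < cell["nCount_RNA"] < MAX_COUNTS
        and cell["percent.mt"] < MAX_PERCENT_MT
    )


# Hypothetical cells for illustration only.
cells = [
    {"id": "cell_a", "nFeature_RNA": 2500, "nCount_RNA": 9000, "percent.mt": 5.0},
    {"id": "cell_b", "nFeature_RNA": 300, "nCount_RNA": 800, "percent.mt": 3.0},      # too few genes
    {"id": "cell_c", "nFeature_RNA": 3000, "nCount_RNA": 12000, "percent.mt": 35.0},  # high mitochondrial fraction
]
kept = [c["id"] for c in cells if passes_qc(c)]
print(kept)
```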

Dimensionality reduction, clustering, and cell type annotation

To better capture the biological functional characteristics of the cells, we used the “NormalizeData” function to normalize the expression matrix and the “FindVariableFeatures” function to identify genes with high cell-to-cell variation as highly variable genes (HVGs). Subsequently, we scaled the data using “ScaleData” and computed the first 50 principal components of the expression matrix using “RunPCA.” Batch effects were corrected using the Harmony package. The first 20 principal components were used for clustering with “FindNeighbors” and “FindClusters” and projected onto two-dimensional uniform manifold approximation and projection (UMAP)58 plots for visualization. The characteristic genes of each cell subgroup were calculated using the “FindAllMarkers” function in the Seurat package, and the cell type of each group was annotated using recognized cell marker genes from previous studies.

Single-cell copy-number variation (CNV) evaluation

Using the default parameters, the “InferCNV”59 package (version 1.14.2) was used to detect CNVs in epithelial cells and identify actual cancer cells. We used 300 pericytes from normal samples as the reference, epithelial cells from different tumor samples as the observations, and ran inferCNV with a cutoff of 0.1. The expected output was a heatmap of observed expression relative to the reference, revealing regions of chromosomal gain and loss.
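The idea behind expression-based CNV inference, namely centering each gene on the reference cells and smoothing along chromosomal order, can be illustrated with a greatly simplified Python sketch. It does not reproduce the inferCNV algorithm or its parameters; the function names, the window size, and the toy expression values are assumptions made for illustration.

```python
from statistics import mean
from typing import List


def moving_average(values: List[float], window: int) -> List[float]:
    """Centered moving average; the window is truncated at both ends."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo, hi = max(0, i - half), min(len(values), i + half + 1)
        smoothed.append(mean(values[lo:hi]))
    return smoothed


def relative_expression_profile(
    cell: List[float], reference_cells: List[List[float]], window: int = 3
) -> List[float]:
    """Center a cell on the per-gene reference mean, then smooth along gene order.

    Genes are assumed to be sorted by genomic position on one chromosome.
    """
    reference_mean = [mean(gene_values) for gene_values in zip(*reference_cells)]
    centered = [x - r for x, r in zip(cell, reference_mean)]
    return moving_average(centered, window)


# Toy data: six genes in genomic order; values are invented.
reference = [[1.0] * 6, [1.0] * 6]          # stand-ins for reference (normal) cells
tumor_cell = [1.0, 1.0, 2.0, 2.0, 2.0, 1.0]  # a contiguous block of elevated expression
profile = relative_expression_profile(tumor_cell, reference)
print(profile)
```

A contiguous run of positive values in the smoothed profile is what would be read as a candidate gain, and a run of negative values as a candidate loss.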

NMF analysis of epithelial cell heterogeneity in expression signatures

NMF was used to identify variable expression signatures in epithelial cells from CXPA and paired normal parotid glands using the NMF60 package (version 0.23.0). For each sample, we selected the top 2000 HVGs for expression program analysis. All negative values in the expression matrix were replaced with 0. For epithelial cells, we retrieved 30 potential expression signatures from four samples. To investigate the common characteristics across these four samples, we calculated the score of each program in each sample using the R “AUCell”61 package (version 1.20.2), followed by calculation of Pearson’s correlation coefficients and hierarchical clustering of all signatures. Finally, four meta-signatures were identified across the samples, representing epithelial development, cell cycle, extracellular matrix (ECM) remodeling, and salivary secretion. To determine epithelial cell states, we used the built-in “AddModuleScore” function in the Seurat package to score single cells. The scoring gene sets were derived from the marker genes of the four meta-signatures.
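To make the factorization step concrete, the sketch below factorizes a small non-negative genes × cells matrix V into W (gene loadings per signature) and H (signature usage per cell) with the Lee–Seung multiplicative updates for the squared-error objective. This is a generic textbook variant chosen for brevity, not necessarily the algorithm run by the NMF R package in this study; the toy matrix, rank, iteration count, and function names are assumptions.

```python
import random
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def frobenius_error(v: Matrix, w: Matrix, h: Matrix) -> float:
    """Squared reconstruction error ||V - WH||^2."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2 for i in range(len(v)) for j in range(len(v[0])))


def nmf(v: Matrix, rank: int, n_iter: int = 200, seed: int = 0,
        eps: float = 1e-9) -> Tuple[Matrix, Matrix]:
    """Factorize V (n x m) into non-negative W (n x rank) and H (rank x m)."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(n_iter):
        # H <- H * (W^T V) / (W^T W H)
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[k][j] * num[k][j] / (den[k][j] + eps) for j in range(m)] for k in range(rank)]
        # W <- W * (V H^T) / (W H H^T)
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][k] * num[i][k] / (den[i][k] + eps) for k in range(rank)] for i in range(n)]
    return w, h


# Toy matrix (4 genes x 4 cells) with two obvious expression programs; values are invented.
v = [
    [5.0, 5.0, 0.0, 0.0],
    [4.0, 4.0, 0.0, 0.0],
    [0.0, 0.0, 3.0, 3.0],
    [0.0, 0.0, 6.0, 6.0],
]
w0, h0 = nmf(v, rank=2, n_iter=0)  # random initialization only
w, h = nmf(v, rank=2)              # same seed, after 200 update rounds
print(frobenius_error(v, w0, h0), frobenius_error(v, w, h))
```

The top-loading genes in each column of W would then be read as one expression signature for that sample.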

Identification of differentially expressed genes (DEGs) and enrichment analysis

We identified the genes overexpressed in a given cluster relative to the other clusters using the “FindMarkers” function in Seurat (adjusted P-value < 0.05 and log fold-change threshold = 0.25). Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analyses of the DEGs were performed in R62 using the hypergeometric distribution.
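The hypergeometric enrichment test can be sketched as follows. This is a minimal Python illustration of the upper-tail probability that such a test computes, not the R code used in the study. The function and argument names are hypothetical.

```python
from math import comb

def hypergeom_enrichment_p(k, n, K, N):
    """Upper-tail hypergeometric P-value, P(X >= k).

    k: DEGs annotated to the term, n: total DEGs,
    K: background genes annotated to the term, N: background size.
    """
    upper = min(n, K)
    total = comb(N, n)
    # Sum the probabilities of observing k, k+1, ..., upper annotated DEGs.
    favorable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favorable / total

# Hypothetical example: 8 of 50 DEGs fall in a 100-gene pathway,
# with a background of 10,000 genes.
p_value = hypergeom_enrichment_p(8, 50, 100, 10_000)
```

A small P-value indicates that the term contains more DEGs than expected by chance. In practice, these values are adjusted for multiple testing across all tested terms.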

Transcription factor (TF) regulatory network analysis

The SCENIC61 package (version 1.2.4) was used to construct a TF regulatory network from scRNA-seq data and determine the functional status of cells. To investigate differences in TF activity and their downstream target genes across the four cell states, SCENIC analysis was performed on all single cells. The quantification matrix of TF activity was imported into the Seurat object, and subgroup-specific TFs were identified using the “FindAllMarkers” function.

Stemness score

Epithelial cell subsets were scored based on the reference gene sets using the “AddModuleScore” function in Seurat. We scored each epithelial cell state by comparing the mean expression levels of the function-related genes in each subset with those of the control gene sets.
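The idea behind this score can be sketched in a few lines of Python. This is a deliberately simplified illustration: Seurat's “AddModuleScore” selects control genes from bins of similar average expression, whereas here the control set is passed in explicitly. All names and values are hypothetical.

```python
def module_score(expression, gene_set, control_set):
    """Mean expression of gene_set minus mean expression of control_set for one cell.

    expression maps gene name -> normalized expression; genes that are
    missing from the cell's profile are ignored.
    """
    def mean_of(genes):
        values = [expression[g] for g in genes if g in expression]
        return sum(values) / len(values) if values else 0.0

    return mean_of(gene_set) - mean_of(control_set)

# Hypothetical normalized expression values for one cell.
cell = {"GENE_A": 2.0, "GENE_B": 4.0, "GENE_C": 1.0, "GENE_D": 1.0}
score = module_score(cell, ["GENE_A", "GENE_B"], ["GENE_C", "GENE_D"])
```

A positive score means that the cell expresses the gene set more strongly than the matched control genes.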

Delineation of cell differentiation trajectories

The Monocle263 package (version 2.24.0) was used to conduct a pseudotime analysis and investigate the possible evolutionary relationships among the different epithelial cell states. All parameters were set to default.

CytoTRACE64 (version 0.3.3), a scoring method that estimates developmental potential based on included datasets, was applied using the default recommended settings after concatenating relevant batches through an outer join.

Identification and validation of the potential marker genes associated with CSCs of CXPA in surgical specimens

Immunohistochemistry (IHC) was performed on archived paraffin-embedded samples of CXPA (n = 100) and PA (n = 29) to assess the expression of ECT2 and transforming acidic coiled-coil 3 (TACC3), potential marker genes associated with CSCs in CXPA. IHC staining was performed using the Envision™ (Dako Denmark A/S, Glostrup, Denmark) two-step method. The paraffin-embedded tissues were sliced into 3-μm-thick sections, dewaxed in xylene, hydrated in gradient ethanol, and subjected to antigen retrieval in pH 8.0 EDTA buffer for 20 min. The slides were incubated overnight at 4 °C with primary antibodies, including anti-ECT2 (1:150; Abmart, China) and anti-TACC3 (1:150; Abmart, China), followed by a ready-to-use secondary antibody (Dako Denmark A/S, Glostrup, Denmark) for 30 min at 37 °C. PBS was used as a blank control instead of a primary antibody. The stained sections were examined using an OLYMPUS BX43 microscope. Tumor tissue samples that had been validated as ECT2- and TACC3-positive in our laboratory were employed as the positive control. Brown-stained cell nuclei were considered ECT2-positive, whereas TACC3 staining was located in the cytoplasm. Ten random fields were chosen from every section to assess the ECT2 and TACC3 scores. The score of each section was based on the percentage and intensity of positively stained cells. The percentage scoring system was as follows: no positive cells (0), <10% positive tumor cells (1), 10–30% positive tumor cells (2), and >30% positive tumor cells (3). The intensity scoring system was as follows: no staining (0), light yellow (1), yellow brown (2), and dark brown (3). The percentage score was multiplied by the intensity score, and sections were divided into 3 groups based on the resulting product, as follows: low expression (score < 3), moderate expression (3 ≤ score < 6), and high expression (score ≥ 6).
IHC slides were scored by two pathologists without knowledge of the clinical data in order to eliminate bias.
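The scoring rule above can be written out as a short Python sketch. The function names are hypothetical, and treating a percentage of exactly 0 as “no positive cells” is an assumption about how the first category is defined.

```python
def percentage_score(pct_positive):
    """Map the percentage of positive tumor cells to a score of 0-3."""
    if pct_positive <= 0:
        return 0
    if pct_positive < 10:
        return 1
    if pct_positive <= 30:
        return 2
    return 3

def expression_group(pct_positive, intensity):
    """Classify a section as low, moderate, or high expression.

    intensity: 0 (no staining), 1 (light yellow), 2 (yellow brown), 3 (dark brown).
    """
    if intensity not in (0, 1, 2, 3):
        raise ValueError("intensity must be 0, 1, 2, or 3")
    product = percentage_score(pct_positive) * intensity
    if product < 3:
        return "low"
    if product < 6:
        return "moderate"
    return "high"

# Hypothetical section: 20% positive cells with yellow-brown staining.
group = expression_group(20, 2)  # percentage score 2 x intensity 2 = 4
```

Because the product of two scores in the range 0–3 can only be 0, 1, 2, 3, 4, 6, or 9, the three groups correspond to products of 0–2, 3–4, and 6–9, respectively.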

Follow-up

Follow-up information was available for all 100 patients with CXPA. The follow-up duration ranged from 2 to 221 months (mean = 73.4 months). At the end of the follow-up period, 23 patients (23%) had died. Disease-free survival (DFS) was calculated from the date of the initial surgery to the date of local recurrence. Overall survival (OS) was assessed from the date of the initial surgery to death from any cause.

Cell culture

The human CXPA cell lines SM-AP1 and SM-AP4 were obtained from the Division of Oral Pathology, Department of Tissue Regeneration and Reconstruction, Niigata University, Japan65. SM-AP1 and SM-AP4 cells were cultured in DMEM (Invitrogen, USA) supplemented with 10% fetal bovine serum and 1% penicillin/streptomycin.

Small interfering RNA (siRNA) transfection

Three siRNA sequences were designed to silence ECT2 mRNA: ECT2-si-1 (5′-3′): CAGAGGAGAUUAAGACUAUTT; AUAGUCUUAAUCUCCUCUGTT, ECT2-si-2 (5′-3′): GCACUCACCUUGUAGUUGATT; UCAACUACAAGGUGAGUGCTT, and ECT2-si-3 (5′-3′): GAUAAAGGAUGAUCUUGAATT; GAUAAAGGAUGAUCUUGAATT. The negative control (NC) siRNA sequences (5′-3′) were UUCUCCGAACGUGUCACGUTT; ACGUGACACGUUCGGAGAATT. SM-AP4 cells were seeded in 6-well plates and transfected with the targeted siRNA when the cells reached 40% confluence. The knockdown efficiency was evaluated via real-time polymerase chain reaction (PCR) and western blot analysis.
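In an siRNA duplex, the antisense strand is the reverse complement of the 19-nt sense core, followed by the same 3′ overhang. The minimal Python sketch below derives the antisense strand from a sense strand and can be used to cross-check listed sequence pairs. The function name is hypothetical, and the TT overhang is taken from the sequences above.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

def antisense_strand(sense, overhang="TT"):
    """Return the antisense strand (5'-3') for a sense strand with a 3' overhang."""
    # Strip the 3' overhang so that only the RNA core is reverse-complemented.
    if overhang and sense.endswith(overhang):
        core = sense[: -len(overhang)]
    else:
        core = sense
    reverse_complement = "".join(COMPLEMENT[base] for base in reversed(core))
    return reverse_complement + overhang

# Sense strands of ECT2-si-1 and the negative control, as listed above.
si1_antisense = antisense_strand("CAGAGGAGAUUAAGACUAUTT")
nc_antisense = antisense_strand("UUCUCCGAACGUGUCACGUTT")
```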

Lentivirus-mediated ECT2 overexpression

The ECT2-overexpressing SM-AP1 cell line was generated through lentivirus transfection following the manufacturer’s instructions. The H-ECT2-ZV342 lentivirus vector (ZORIN, China) was used for transfection. After transfection, 5 μg/mL puromycin (Yeasen, China) was used to select ECT2-overexpressing SM-AP1 cells, and transfection efficiency was evaluated via real-time PCR and western blot analysis.

RNA extraction and real-time PCR analysis

RNA was extracted using TRIzol Reagent (NCM, USA). cDNA was synthesized and amplified according to the protocol provided with the SYBR Premix Ex Taq reagent kit (Takara, Japan). The holding stage was set at 95 °C for 30 s, followed by a cycling stage comprising 40 cycles of 95 °C for 5 s and 60 °C for 30 s. The primers for ECT2 (5′-3′) were ACTACTGGGAGGACTAGCTTG and CACTCTTGTTTCAATCTGAGGCA. GAPDH (Sangon Biotech, China) served as the internal control. Relative expression levels were calculated using the 2^−ΔΔCt method.
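The 2^−ΔΔCt calculation can be illustrated with a short Python sketch. The Ct values below are hypothetical and serve only to show the arithmetic; the method assumes approximately 100% amplification efficiency for both the target and the reference gene.

```python
def relative_expression(ct_target_sample, ct_ref_sample,
                        ct_target_control, ct_ref_control):
    """Fold change of the target gene by the 2^-ddCt method."""
    # Normalize the target gene to the reference gene within each condition.
    delta_ct_sample = ct_target_sample - ct_ref_sample
    delta_ct_control = ct_target_control - ct_ref_control
    # Compare the treated sample with the control condition.
    delta_delta_ct = delta_ct_sample - delta_ct_control
    return 2 ** -delta_delta_ct

# Hypothetical Ct values: ECT2 and GAPDH in knockdown cells versus NC cells.
fold_change = relative_expression(26.0, 18.0, 24.0, 18.0)  # ddCt = 8 - 6 = 2
```

In this example, ΔΔCt = 2, which corresponds to a fold change of 2^−2 = 0.25, that is, a 75% reduction in ECT2 mRNA relative to the control.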

Western blot analysis

Western blot analysis was performed using standard procedures. The primary antibodies used included ECT2 (1:1000; Abmart, China), GAPDH (1:1000; CST, USA), β-Actin (1:1000; Abmart, China), and Tubulin (1:1000; Abmart, China). The corresponding secondary HRP-conjugated antibodies were anti-rabbit (1:5000; Abmart, China) and anti-mouse (1:5000; Abmart, China).

Sphere culture

The ECT2-overexpressing SM-AP1, siECT2 SM-AP4, and control cells were seeded (1 × 10^4 cells/well) in CAPs 3D Organoid Culture Microarray (Xirui Biotech Inc., China). The cells were cultured for 5 days before harvesting. Images were captured using an inverted microscope.

In vitro validation of ECT2 expression and its impact on proliferation, migration, and invasion of CXPA cell lines

To directly evaluate cell viability, siECT2 SM-AP4 and control cells (1 × 10^5 cells/well) were seeded in 12-well plates and stained using a live/dead assay kit (Yeasen, China) following the manufacturer’s instructions.

The cell counting kit-8 (CCK-8; NCM, Suzhou, China) assay was used to assess the impact of ECT2 overexpression on cell proliferation. Following the manufacturer’s instructions, SM-AP1 and ECT2-overexpressing SM-AP1 cells were seeded in 96-well plates (1 × 10^4 cells/well), and 10 μL of CCK-8 reagent was added to each well. After 2 h of incubation, absorbance was measured spectrophotometrically at 450 nm (SpectraMax i3, Molecular Devices, USA). The CCK-8 assay was conducted after culturing the cells for 0, 24, 48, and 72 h.

Transwell assays (with or without Matrigel) were performed on SM-AP4 cells transfected with or without siECT2 according to standard protocols. The ability of the cells to migrate and invade was assessed after 24 h of culturing at 37 °C, followed by crystal violet staining for 15 min. For the wound healing assay, siECT2 SM-AP4 and control cells were seeded in 6-well plates. After the cells reached 100% confluence, the monolayer was scratched with a sterile 10-μL pipette tip. Images were captured at 0 and 24 h. The same assays were also performed on ECT2-overexpressing SM-AP1 cells and control cells.

Statistical analysis

Statistical analyses were conducted using R software (version 4.2.2) and GraphPad Prism software (version 9.0; GraphPad, San Diego, CA). The Mann–Whitney U-test, Student’s t-test, and two-way analysis of variance (ANOVA) were employed for statistical comparisons. Categorical variables of clinicopathological characteristics were compared using Fisher’s exact test. Kaplan–Meier survival analysis, with a log-rank test, was used to estimate OS. Univariate and multivariate logistic regression analyses were conducted to evaluate the associations between variables and prognosis/invasiveness. P < 0.05 was considered statistically significant.
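For reference, the Kaplan–Meier (product-limit) estimator used for the survival curves can be sketched in plain Python as follows. This is an illustrative implementation, not the R or GraphPad code used in the study, and the follow-up times in the example are hypothetical.

```python
def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct event time.

    times: follow-up durations (e.g., months).
    events: True if the patient died at that time, False if censored.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all patients who share the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += 1 if data[i][1] else 0
            removed += 1
            i += 1
        if deaths:
            # Multiply by the conditional probability of surviving past t.
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # Both deaths and censored patients leave the risk set after t.
        at_risk -= removed
    return curve

# Hypothetical follow-up (months): deaths at 5 and 12, censoring at 8 and 20.
curve = kaplan_meier([5, 8, 12, 20], [True, False, True, False])
```

Censored patients contribute to the number at risk until their last follow-up but do not produce a step in the curve, which is why the estimate drops only at 5 and 12 months in this example.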