Introduction

Pancreatic ductal adenocarcinoma (PDAC) remains one of the world’s most formidable and lethal malignancies. Its prognosis has improved only slightly, with a recently reported 5-year survival rate of 12.8%1. PDAC has remained largely refractory to available therapeutics, and heterogeneous chemotherapeutic responses across patient subsets are a hallmark of the disease2. Primary surgical resection is feasible in less than 20% of patients, while the majority present with advanced, nonresectable disease3,4. Even among patients deemed suitable for surgical intervention and given neoadjuvant therapy, an estimated 75% will experience recurrence within two years, and the five-year overall survival rate is close to 20%5,6. Immunotherapeutic strategies have been largely futile because of the immunologically “cold” tumor microenvironment (TME), which is characterized by significant myeloid cell infiltration and impeded T cell activation7. Therefore, it is necessary to further understand the heterogeneity and biological functions of the TME at different stages of malignant progression in PDAC in order to develop effective treatment strategies.

Single-cell RNA sequencing (scRNA-seq) reveals the complex heterogeneity of the PDAC microenvironment with unprecedented resolution8,9. With whole-genome expression data available, scRNA-seq can be used to identify new cell types and their marker genes and to infer the developmental trajectories of specific cells through pseudotime analysis (Monocle2)10. Combined with external resources, it also supports analysis of patient prognosis based on TCGA data, exploration of key signaling pathways between cells in the TME using CellChat, and analysis of drug sensitivity based on CellMiner. For example, scRNA-seq revealed many distinct immune cell subsets in different states in the breast TME, providing the new insight that different immune cell subsets play different roles in promoting and opposing tumor progression11.

Indeed, PDAC is characterized by an intricate immune microenvironment involving diverse types of inflammatory cells, including macrophages, T cells, and B cells. Numerous studies have demonstrated that regulatory T cells (Tregs), regulatory B cells (Bregs) and tumor-associated macrophages (TAMs) in PDAC play a vital role in tumorigenesis and tumor progression and are potential therapeutic targets12,13,14,15,16. Furthermore, the communication among tumor-infiltrating cells varies notably across the stages of PDAC malignant progression. Exploring the signaling pathways associated with these differences can uncover novel immune regulatory factors and immune checkpoints, thereby presenting potential targets for innovative immunotherapy strategies. Disrupting these signaling pathways has the potential to restore immune cell recognition of, and activity against, tumors17,18. Thus, it is imperative to deepen our understanding of the dynamic alterations in TME-infiltrating immune cell subsets, alongside the marker genes and signaling pathways implicated in intercellular communication, during the different stages of PDAC malignant progression.

In this work, we applied a single-cell RNA sequencing approach to systematically explore dynamic changes of the TME during PDAC malignant progression. A total of 31 samples (8 PDAC I, 9 PDAC II, 6 PDAC III and 8 PDAC IV, referring to patients with stage I through IV PDAC, respectively) were used to construct the expression matrix. After quality control, 40,542 cells were selected for subsequent analysis. We observed notable alterations in the proportions of ductal cells, immune cells and cancer-associated fibroblasts (CAFs), and in their respective gene expression profiles, from early to late PDAC. We used GO analysis to compare the biological processes and molecular functions of cell subsets at different clinical stages of PDAC. We also examined the distribution of immune cell types at each stage of PDAC and constructed a prognostic model by integrating immune cell risk characteristics with clinical features. Additionally, we performed cell communication analysis at each stage of PDAC to compare the interactions between cells in the TME, elucidating the main signaling pathway changes during tumor progression. Finally, we conducted drug prediction and molecular docking based on the genes associated with key signaling pathways, revealing potential therapeutic drugs and their affinity for relevant targets. These results contribute to a deeper understanding of the dynamic changes in TME components during PDAC malignant progression and may pave the way for the development of novel treatment approaches for PDAC.

Results

Single-cell analysis uncovers cellular constituents and evolutionary dynamics during PDAC progression

To enable a clinically integrated, comprehensive view of PDAC, we re-analyzed data from GSE242230 and GSE205013, which included 40,542 cells from 31 samples after quality control (Fig. 1, Supplementary Tables S1 and S2).

Fig. 1

Workflow of sample collection and data analysis in this study, including QC, normalization, PCA, differential gene analysis, pathway enrichment analysis, pseudotime analysis, survival analysis, cell communication analysis, drug prediction, etc.

We performed normalization and principal component analysis, and finally identified 8 distinct clusters: ductal cells, macrophages, B cells, endothelial cells, T cells, fibroblasts, plasma cells and erythrocytes (Fig. 2A-D and Supplementary Fig. S1A-D). Signature genes for each cluster were in accordance with well-known cell markers reported in the literature (Supplementary Fig. S1E-H)8,19.

Fig. 2

Late PDAC is characterized by EMT and CSC properties. (A-D) The UMAP plots showing clustering information in PDAC I, II, III and IV, respectively. (E) The proportion of each cell type from PDAC I, II and III to PDAC IV. (F) The UMAP plot showing the distribution of ductal cell subpopulations, color-coded by cluster. UMAP, uniform manifold approximation and projection. (G) The UMAP plots showing the distribution of ductal cell subpopulations, according to cancer stage (left) and patient sample (right). (H) Heatmap presenting the expression levels of the top 5 cell type-specific genes among ductal cell subpopulations. (I) The histogram showing the frequency of each cell type in each patient. (J) The volcano plot showing the differentially expressed genes (DEGs) of ductal cells between PDAC IV and PDAC I. Red and blue dots represent upregulated and downregulated genes, respectively. The names of the top 10 up/downregulated genes are marked. (K-L) GO analysis revealing molecular function terms for genes upregulated in late PDAC (PDAC IV) and early PDAC (PDAC I), respectively.

We then counted the cells of each cluster in the four groups, which represented samples at different clinical stages. The majority of cells in PDAC I were ductal cells (59%), whereas immune cells accounted for a smaller share (35.87%). Interestingly, the proportion of ductal cells in PDAC IV was 40.9%, markedly lower than in PDAC I. In contrast, the proportion of immune cells, including T cells, B cells, and macrophages, was 56.05%, markedly higher than in PDAC I. This suggests that ductal cells are no longer the major component of the TME as the tumor progresses toward higher malignancy, and that immune cells migrate into and accumulate in the tumor, reflecting the immune system’s response to the tumor lesion (Fig. 2E and Supplementary Table S3).
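
The per-stage proportions described above amount to counting annotated cells and dividing by the stage total. The following is a minimal sketch of that arithmetic, not the authors' pipeline; the function name and the toy labels are hypothetical and are not data from this study.

```python
from collections import Counter


def cell_type_proportions(labels):
    """Return each cell type's share (in percent) of the given label list."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {cell_type: 100.0 * n / total for cell_type, n in counts.items()}


# Hypothetical toy annotations for a single stage (illustration only).
toy_stage_labels = ["ductal"] * 6 + ["T cell"] * 3 + ["macrophage"] * 1
proportions = cell_type_proportions(toy_stage_labels)

for cell_type, pct in sorted(proportions.items()):
    print(f"{cell_type}: {pct:.1f}%")
```

Applying the same function to the labels of each stage separately would give the kind of stage-by-stage comparison summarized in Fig. 2E.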

Ductal cell heterogeneity during PDAC progression

To reveal changes in ductal cells during PDAC malignant progression, we conducted a subcluster analysis of ductal cells. A total of 15,101 ductal cells were analyzed, which were divided into different clusters with unique gene signatures upon UMAP analysis (Fig. 2F, H). Furthermore, the UMAP plot indicated that ductal cells from different tumor stages and different patients formed markedly different clusters (Fig. 2G, I), suggesting gradual changes in the gene expression pattern of ductal cells during PDAC malignant progression. The common ductal cell marker SOX9 and the epithelial cell marker EPCAM were expressed in all ductal cell subpopulations, confirming ductal cell identity (Supplementary Fig. S2A).

Subsequently, we compared the ductal cells from PDAC IV with those from PDAC I and identified 1,047 upregulated genes and 3,503 downregulated genes (Fig. 2J). GO analysis revealed that the upregulated genes in late PDAC (PDAC IV, representing primary tumors from metastatic patients) were significantly enriched in several cancer-related terms, such as cell adhesion, cytokine production, cell activation and phagocytosis (Fig. 2K), suggesting that tumor cells may enhance their aggressiveness and metastatic potential through increased cell adhesion and cytokine production, which promote invasion and immune escape. In contrast, the upregulated genes in early PDAC (PDAC I, representing primary tumors from resectable patients) were mainly associated with normal biological functions, such as histone modification, glycoprotein biosynthetic process and regulation of response to DNA damage stimulus (Fig. 2L).
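
Counting upregulated and downregulated genes, as in the volcano plot, comes down to thresholding each gene's fold change and adjusted p-value. The sketch below illustrates that rule; the cutoffs (|log2 fold change| ≥ 1, adjusted p < 0.05), the function name and the gene records are assumptions for illustration, since the thresholds actually used are not stated in this section.

```python
def classify_deg(log2_fc, adj_p, fc_cutoff=1.0, p_cutoff=0.05):
    """Label a gene 'up', 'down' or 'ns' (not significant).

    The default cutoffs are illustrative assumptions, not the study's settings.
    """
    if adj_p < p_cutoff and log2_fc >= fc_cutoff:
        return "up"
    if adj_p < p_cutoff and log2_fc <= -fc_cutoff:
        return "down"
    return "ns"


# Hypothetical differential-expression results: (gene, log2 fold change, adjusted p).
toy_results = [
    ("GENE_A", 2.3, 0.001),
    ("GENE_B", -1.8, 0.004),
    ("GENE_C", 0.4, 0.0001),  # significant p-value but small effect size
    ("GENE_D", 3.0, 0.2),     # large effect size but not significant
]

labels = {gene: classify_deg(fc, p) for gene, fc, p in toy_results}
n_up = sum(1 for label in labels.values() if label == "up")
n_down = sum(1 for label in labels.values() if label == "down")
print(labels, n_up, n_down)
```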

In addition, we analyzed the expression of an epithelial marker (EPCAM), mesenchymal markers (FN1, MMP7 and VIM) and cancer stem cell (CSC) related markers (CD44, ALDH1A1 and NOTCH2). The results showed that ductal cells in early PDAC mainly exhibited an epithelial expression profile, whereas ductal cells in late PDAC were enriched for mesenchymal markers and had higher expression levels of CSC-related genes (Supplementary Fig. S2B). These data support the notion that epithelial-mesenchymal transition (EMT) and CSC properties gradually accumulate with tumor progression, promoting tumor invasion and metastasis8.

Fibroblasts in the TME during PDAC progression

Next, we analyzed the mesenchymal cells in the TME, with an emphasis on fibroblasts. A total of 936 mesenchymal cells from PDAC with different clinical stages were analyzed, which formed three distinct subclusters with unique gene signatures upon UMAP analysis (Fig. 3A-B and Supplementary Fig. S3A). Common markers, such as ACTA2, TAGLN and FN1, were expressed in all subpopulations, confirming PDAC mesenchymal cell identity (Supplementary Fig. S3B).

Fig. 3

The evolution of fibroblast and T cell subpopulations during PDAC malignant development. (A) The UMAP plot showing the distribution of CAF subpopulations, color-coded by cell subtype. UMAP, uniform manifold approximation and projection. (B) The UMAP plots showing the distribution of CAF subpopulations, according to cancer stage (left) and patient sample (right). (C) The proportion of CAF subpopulations during PDAC progression. (D) The violin plot showing the ACTA2 (αSMA) expression levels of PSCs in PDAC I, II, III and IV. (E) Pseudotime analysis exploring the developmental trajectories of CAFs. Each dot corresponds to a single cell, and CAF subtypes are shown according to the expression level of specific markers. The dashed line shows the potential developmental trajectory. (F) The UMAP plot showing the distribution of T cell subpopulations, color-coded by cluster. (G) The UMAP plots showing the distribution of T cell subpopulations, according to cancer stage (left) and patient sample (right). (H) T cell subset proportions in PDAC at different clinical stages. (I) GO analysis showing biological process terms for marker genes in Treg/exhausted T cells (cluster 6). (J) Violin plot showing the specific expression pattern of TNFRSF4 in Treg/exhausted T cells (cluster 6). (K) Boxplot showing the expression of TNFRSF4 in PAAD based on the TCGA and GTEx databases. Red: tumor group, gray: normal control group, num (T): sample size of the tumor group, num (N): sample size of the normal control group. (L) Scatter plot showing that TNFRSF4 is significantly correlated with the canonical Treg/exhausted T cell marker FOXP3. (M) Kaplan–Meier curves depicting survival in PDAC patients profiled by TCGA, stratified according to the level of TNFRSF4 expression. (N) Pseudotime analysis exploring the developmental trajectories of T cells (naïve T cell → effector T cell → Treg/exhausted T cell).

In addition, three subclusters were identified according to their specific gene expression profiles. cCAFs (classical CAFs) expressed high levels of COL1A1 and LUM (extracellular matrix-related components), MMP11 (a matrix remodeling molecule), FAP (fibroblast activation protein) and SFRP2 (a modulator of Wnt signaling), which are involved in extracellular matrix deposition. csCAFs (complement-secreting CAFs) showed highly activated transcription of the complement system, including C3, C7, CFB, CFH and CFI, which may regulate immune and inflammatory responses within the tumor. PSCs (pancreatic stellate cells) were distinct from both cCAFs and csCAFs and expressed marker genes of stellate cells, such as RGS5, ADIRF, CRIP1, NDUFA4L2, NOTCH3 and PDGFA (Supplementary Fig. S3B). We then analyzed the proportion of these subpopulations in PDAC at different clinical stages. The proportion of PSCs decreased with advancing tumor stage, while the proportion of csCAFs gradually increased. PSCs expressed high levels of alpha-smooth muscle actin (encoded by ACTA2) in PDAC (Fig. 3C-D), consistent with the previous conclusion that PDAC is characterized by a dense stroma consisting of proliferating myofibroblasts (PSCs) and extracellular matrix components20.

We applied the Monocle 2 algorithm to explore the developmental trajectories of the three subpopulations based on transcriptional similarities10. The results indicated that the tree structure began with cCAFs (signature genes LUM and FAP) and csCAFs (signature genes C3, C7 and CFD) and ended with PSCs (signature genes RGS5, ADIRF, NDUFA4L2, CRIP1 and NOTCH3), which suggests a possible evolution from cCAFs and csCAFs towards PSCs (Fig. 3E). We also found a developmental trend of cells from early PDAC to late PDAC, in accordance with the direction of tumor progression (Supplementary Fig. S2C-F).

Landscape of tumor infiltrating immune cells in PDAC

Infiltrating immune cells within a tumor, such as regulatory T cells (Tregs), exhausted T cells, TAMs, and regulatory B cells (Bregs), significantly contribute to tumor development and metastasis21,22,23. Immunotherapies targeting the TME, like anti-CTLA-4 and anti-PD-1 treatments, have demonstrably enhanced patient outcomes in certain cancers. However, these therapeutic effects can be highly variable depending on the cancer type and the individual patient24. It is therefore crucial to identify predictive biomarkers for treatment response and to devise novel immunotherapy approaches through a deeper comprehension of the TME. To this end, we analyzed a total of 15,640 T cells present in PDAC, which formed 15 distinct subclusters with unique signature genes upon unsupervised clustering (Fig. 3F-G and Supplementary Fig. S3C). Based on specific cell markers, the relative proportions of subsets, including CD4+ T cells, CD8+ T cells, Th1/2 cells, Tregs, cytotoxic T cells, memory T cells and exhausted T cells, were investigated. PDAC III/IV presented more cytotoxic T cells and memory T cells than PDAC I. At the same time, more Tregs and exhausted T cells accumulated in advanced PDAC (Fig. 3H and Supplementary Fig. S3D). Thus, anti-tumor T cell activity is observed in PDAC, but it might be compromised during disease progression by the inhibitory influence of Tregs and exhausted T cells accumulating within the TME.

Interestingly, cluster 6 showed enriched expression of known regulatory and exhausted T cell markers (FOXP3, TNFRSF18, CTLA4, TIGIT and LAYN) and was therefore identified as Treg/exhausted T cells (Supplementary Fig. S2G). This subpopulation made up only a small part of the total T cells. A total of 791 genes were specifically expressed in this subcluster and were significantly enriched in biological processes including mRNA processing, activation of immune response and regulation of T cell activation (Fig. 3I). The signature genes of cluster 6 clearly converged with those previously reported for Tregs and exhausted T cells in studies of non-small cell lung cancer, colorectal cancer, breast cancer, melanoma, and hepatocellular carcinoma23,25,26,27.

Subsequently, we turned to key signature genes that are significant for the diagnosis and prognosis of PDAC. TNFRSF4 (encoding TNF receptor superfamily member 4, also known as OX40), through interaction with its ligand OX40L, plays a pivotal role in T cell activation and immune responses28,29,30. The expression level of TNFRSF4 in pancreatic cancer samples was significantly higher than in normal tissues, and we found a positive correlation between TNFRSF4 and the classical Treg marker FOXP3 (R = 0.58) using the TCGA database (Fig. 3J-L). In survival analysis using the TCGA database, patients with high TNFRSF4 expression had significantly better overall survival than those with low expression (OS: HR = 0.59, p = 0.014), with a similar trend for disease-free survival that did not reach significance (DFS: HR = 0.65, p = 0.058) (Fig. 3M).

To gain deeper insights into the functional states and interrelationships of T cells, we employed the Monocle 2 algorithm to elucidate their developmental trajectories. The pseudotime analysis revealed that the tree structure originated from naïve T cells, characterized by the signature genes CCR7 and LEF1. Subsequently, these cells differentiated into effector T cells, marked by GZMK and GNLY, and Treg/exhausted T cells, distinguished by ENTPD1, LAYN, FOXP3, and CTLA4 (Fig. 3N). Notably, regulatory and exhausted T cells were predominantly observed during the later stages of T cell development, indicating a transition from an activated state to a suppressive and exhausted state, aligning with previous research findings27.

Macrophages, another subset of tumor-infiltrating immune cells, play a vital role in tumor immunity. We next characterized the heterogeneity of macrophages during the progression of PDAC through unsupervised clustering. This analysis revealed 16 distinct macrophage subpopulations (Fig. 4A-B and Supplementary Fig. S4A). Using specific cell markers for macrophage subsets, we quantified the abundance of these subsets, encompassing TAMs, M1/M2 macrophages, CD169+ macrophages and TCR+ macrophages, across the clinical stages of PDAC. Late PDAC (PDAC II, III, and IV) exhibited a higher proportion of M1 macrophages, M2 macrophages and TAMs than early PDAC (PDAC I) (Fig. 4C-D). In this subset analysis, the increase in anti-tumor components (M1) was accompanied by the infiltration and accumulation of pro-tumor components (M2 and TAM), which together contribute to the establishment of an immunosuppressive environment. In addition, we compared late PDAC (PDAC II, III and IV) with early PDAC (PDAC I) and found 5 upregulated genes and 30 downregulated genes (Supplementary Fig. S4B). The upregulated genes in late PDAC were chiefly enriched for cytokine-mediated signaling pathway, response to molecule of bacterial origin, RAGE receptor binding, and cytokine and chemokine activity (Supplementary Fig. S4C-D).

Fig. 4

Transcriptional profiling of macrophages and B cells in PDAC and communication among immune-infiltrating cells. (A) The UMAP plot showing the distribution of macrophage subpopulations, color-coded by cluster. UMAP, uniform manifold approximation and projection. (B) The UMAP plots showing the distribution of macrophage subpopulations, according to cancer stage (left) and patient sample (right). (C) Macrophage subset proportions in PDAC at different clinical stages. (D) The UMAP plots showing the expression levels of marker genes for specific macrophage subsets. (E) The UMAP plot showing the distribution of B cell subpopulations, color-coded by cluster. (F) The UMAP plots showing the distribution of B cell subpopulations, according to cancer stage (left) and patient sample (right). (G) The violin plots showing the expression levels of the common B cell marker (CD19), Breg markers (CD1D, CD5) and the plasmacyte/memory B cell marker (CD27) among B cell subclusters. (H) Number (left) and communication strength (right) of receptor-ligand interactions between different cell types. (I) The heatmaps showing the input (left) and output (right) signaling patterns in different cell types. The color intensity indicates the frequency or strength of signaling.

Next, we analyzed B cells in PDAC. Through unsupervised clustering, a total of 1,010 B cells were evaluated, resulting in the identification of 7 distinct subclusters characterized by unique signature genes (Fig. 4E-F and Supplementary Fig. S4E). Based on transcriptional analysis, B cells in PDAC IV differed radically from those in PDAC I. We therefore speculate that B cells gradually infiltrate the TME during PDAC malignant development and evolve into different subpopulations with unique gene expression profiles and pro-tumor or anti-tumor functions. The marker CD19 was expressed in clusters 0, 1, 2, 3, 4 and 6, but not in cluster 5, indicating the heterogeneity of B cells in the TME (Fig. 4G). We found no regulatory B cells (Bregs) in the TME according to the expression of Breg markers such as CD1D and CD5. The main forms of B cells were plasmacytes and memory B cells, which expressed high levels of CD27. In addition, three subclusters in PDAC (clusters 0, 2 and 5) lacked CD27 expression and were identified as naïve B cells (Fig. 4G). Moreover, GO analysis showed that the upregulated genes in PDAC II were mainly enriched in activation and regulation of immune response, the upregulated genes in PDAC III were primarily enriched in cytoplasmic translation and B cell receptor signaling pathway, and the upregulated genes in PDAC IV were mainly enriched in response to endoplasmic reticulum stress (Supplementary Fig. S4F-H).

Inference of cell–cell communications in TME

To determine the potential interactions between different cell types, we analyzed the GEO dataset GSE242230 using the CellChat tool31. CellChat quantitatively infers intercellular communication from scRNA-seq data, which allowed us to identify cell–cell communications and to assess how this cellular crosstalk may be relevant to protumor activities.

We first inferred the number and strength of interactions between different cell populations in primary samples (Fig. 4H). The majority of interactions occurred among ductal cells, fibroblasts, and endothelial cells. When interaction strength was considered (represented by the interaction weights), ductal cells played a pivotal role in the TME, interacting strongly with fibroblasts, T cells, and macrophages. To reveal the signaling pathways contributing to these complex intercellular communications, we calculated the input and output interaction strengths for each signaling pathway (Fig. 4I). Notably, collagen signaling was the dominant input pathway in ductal cells, while APP (amyloid-beta precursor protein) signaling was the dominant output pathway in ductal cells, and both exhibited the strongest interaction intensities among all signaling pathways. In addition to collagen and APP signaling, we also observed increased extracellular matrix (ECM) signaling32,33 related to fibronectin 1 (FN1) and laminin (LAMININ), as well as signaling by proteins associated with the proliferation of PDAC cells34, such as midkine (MK). In summary, these results outline the intercellular communication network in pancreatic ductal adenocarcinoma.
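
The idea of input and output interaction strength can be illustrated with a sender-by-receiver weight matrix: a cell type's output strength is the sum of the weights it sends, and its input strength is the sum of the weights it receives. The sketch below shows only this row and column summation under that assumption. It is not CellChat's implementation (CellChat is an R package), and the function name and toy weights are hypothetical.

```python
def signaling_strengths(weights):
    """Sum outgoing and incoming weights per cell type.

    weights[sender][receiver] holds a non-negative communication weight.
    """
    outgoing = {}
    incoming = {}
    for sender, receivers in weights.items():
        for receiver, weight in receivers.items():
            outgoing[sender] = outgoing.get(sender, 0.0) + weight
            incoming[receiver] = incoming.get(receiver, 0.0) + weight
    return outgoing, incoming


# Hypothetical weights for one pathway (illustration only, not study data).
toy_weights = {
    "fibroblast": {"ductal": 0.5, "T cell": 0.25},
    "ductal": {"fibroblast": 0.25, "macrophage": 0.5},
    "macrophage": {"T cell": 0.25},
}

outgoing, incoming = signaling_strengths(toy_weights)
print("outgoing:", outgoing)
print("incoming:", incoming)
```

Repeating the summation for each pathway and comparing the totals per cell type would identify which pathway dominates a given cell type's input or output.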

A novel prognostic model was revealed by the differential genes of macrophages in PDAC IV

To effectively predict the prognosis of PDAC patients, we established a prognostic model based on immune genes. Specifically, we conducted differential gene expression analysis of macrophages between PDAC IV and the other PDAC stages in primary tumors, identifying 98 upregulated genes and 134 downregulated genes (Fig. 5A). GO enrichment analysis showed that the upregulated genes in PDAC IV were primarily associated with response to hypoxia, collagen-containing extracellular matrix, receptor ligand activity and cytokine activity, among others (Fig. 5B).

Fig. 5

Construction of the prognostic model and validation of model gene expression. (A) The volcano plot showing the DEGs of macrophages between PDAC IV and other stages. Red and blue dots represent upregulated and downregulated genes, respectively. The names of the top 10 up/downregulated genes are marked. (B) GO analysis showing biological process, molecular function and cellular component terms for upregulated genes in PDAC IV. (C-D) Screening for the best prognostic model genes through Lasso-Cox regression analysis with cross-validation. (E) The gene expression, risk score, survival time and survival status for the prognostic model genes. The top scatterplot shows the risk score from low to high, with different colors representing different groups. The middle scatterplot shows the survival time and survival status of each sample. The bottom panel is the gene expression heatmap. Patients with PDAC were divided into high-risk and low-risk groups at the median risk score. (F) Kaplan–Meier curves showing the overall survival of patients in the high-risk and low-risk groups. (G) The predictive efficiency of the risk score, verified by the ROC curve. (H) The protein expression profiles of the key genes in the Human Protein Atlas (HPA) database.

Subsequently, from the 98 upregulated genes we selected 29 genes that were specifically highly expressed in PDAC IV and performed Lasso-Cox regression analysis on them (Fig. 5C-D). The final selection comprised 14 genes: RPS10, MIF, MT-ATP6, CSTB, IFI30, NPC2, BTG1, CTSD, FCGR2A, SEC61G, IER3, HSPB1, HMOX1 and ZFP36L1. Based on these 14 genes, we constructed a prognostic model for all cancer samples. To validate whether the prognostic score based on the 14 selected genes could predict the prognosis of PDAC patients, we divided patients into high-risk and low-risk groups at the median risk score. Compared to the low-risk group, the high-risk group exhibited higher mortality and shorter survival times, and prognosis worsened as the score increased (Fig. 5E). Kaplan–Meier curves demonstrated that patients in the high-risk group had poorer prognoses (P < 0.05, Fig. 5F). Time-dependent ROC analysis indicated prognostic accuracy for overall survival (OS) of 0.673 at 1 year, 0.724 at 2 years, and 0.616 at 3 years (Fig. 5G). These findings indicate that the upregulated gene signature linked with PDAC IV in our model contributes to predicting the prognosis of PDAC.
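
A Lasso-Cox model of this kind typically scores each patient as a weighted sum of the selected genes' expression values, and the median of those scores then splits the cohort. The sketch below illustrates that scoring and the median split. The three gene names are taken from the 14-gene list, but the coefficients and expression values are placeholders invented for illustration, not the fitted model or study data.

```python
from statistics import median


def risk_score(expression, coefficients):
    """Weighted sum of expression values over the model genes."""
    return sum(coef * expression[gene] for gene, coef in coefficients.items())


def split_by_median(scores):
    """Assign 'high' to scores above the cohort median, otherwise 'low'."""
    cutoff = median(scores.values())
    return {pid: ("high" if score > cutoff else "low") for pid, score in scores.items()}


# Placeholder coefficients (NOT the fitted Lasso-Cox weights).
toy_coefficients = {"CSTB": 0.5, "SEC61G": 0.25, "MIF": 0.25}

# Hypothetical expression values for four hypothetical patients.
toy_patients = {
    "P1": {"CSTB": 2.0, "SEC61G": 4.0, "MIF": 4.0},
    "P2": {"CSTB": 1.0, "SEC61G": 0.0, "MIF": 0.0},
    "P3": {"CSTB": 4.0, "SEC61G": 8.0, "MIF": 0.0},
    "P4": {"CSTB": 0.0, "SEC61G": 4.0, "MIF": 0.0},
}

scores = {pid: risk_score(expr, toy_coefficients) for pid, expr in toy_patients.items()}
groups = split_by_median(scores)
print(scores)
print(groups)
```

Patients whose score equals the median fall into the low-risk group here; that tie rule is a design choice of this sketch.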

To further validate the importance of the marker genes in our prognostic model, we analyzed their expression at the protein level in cancer cells by examining immunohistochemical results from the HPA database. Interestingly, compared to normal pancreatic tissue, the protein expression levels of IFI30, SEC61G, CSTB, and ZFP36L1 were significantly higher in pancreatic cancer cells (Fig. 5H). We then selected two of these four genes (CSTB and SEC61G) for experimental validation. Notably, these two genes were upregulated in ductal cells of PDAC IV (Supplementary Table S4), with higher expression levels in PDAC cells than in normal pancreatic cells (Fig. 6A). We knocked down CSTB and SEC61G in human PDAC cells (SW1990) for subsequent functional experiments. We first assessed the knockdown efficiency of CSTB and SEC61G at the RNA level (Fig. 6B-C). CCK8 and EdU assays showed that knockdown of CSTB and SEC61G significantly inhibited the proliferation of SW1990 cells (Fig. 6D, F). Finally, wound healing assays showed that inhibiting CSTB and SEC61G expression significantly slowed the wound healing rate of the cells (Fig. 6E). To confirm the accuracy and reliability of these results, experiments were performed in triplicate in SW1990 cells. All data are presented as the mean ± standard deviation of independent experiments. *P < 0.05, **P < 0.01, ***P < 0.001.
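
The reporting convention above (mean ± standard deviation of triplicates, with asterisks for P < 0.05, 0.01 and 0.001) can be sketched as follows. The replicate values are hypothetical, the use of the sample standard deviation and the "ns" label for non-significant results are assumptions, and the statistical test that produces the p-value is outside the scope of this sketch.

```python
from statistics import mean, stdev


def summarize(replicates):
    """Return (mean, sample standard deviation) of replicate measurements."""
    return mean(replicates), stdev(replicates)


def significance_stars(p_value):
    """Map a p-value to the asterisk notation used in the figure legends."""
    if p_value < 0.001:
        return "***"
    if p_value < 0.01:
        return "**"
    if p_value < 0.05:
        return "*"
    return "ns"  # label for non-significant results is an assumption


# Hypothetical triplicate readings (illustration only, not study data).
toy_triplicate = [1.0, 2.0, 3.0]
avg, sd = summarize(toy_triplicate)
print(f"{avg:.2f} ± {sd:.2f} {significance_stars(0.004)}")
```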

Fig. 6

The role of CSTB and SEC61G in PDAC. (A) The box plots based on TCGA and GTEx databases show the expression of CSTB in PAAD and normal tissues (left) and the expression of SEC61G in PAAD and normal tissues (right). (B-C) RT-qPCR was used to verify the interference efficiency of CSTB and SEC61G in human PDAC cells SW1990. (D) CCK8 assay. (E) Wound healing assay. (F) EdU staining assay. Error bars are mean ± SD, *P < 0.05, **P < 0.01, ***P < 0.001.

Comparison of cellular communication in different stages of PDAC

To gain a deeper understanding of the interactions between cells in the TME at different stages of PDAC and decipher the dynamic changes in the major signaling pathways during various stages of tumor progression, we conducted cell–cell interaction analysis. The results from CellChat revealed signaling pathways for communication between cells at different stages of cancer during tumor progression (Supplementary Fig. S5A-C).

Compared to PDAC II, there were significantly different signaling pathways in PDAC I, including PTN, COMPLEMENT, SPP1, WNT, CX3C, and IL1. Most of these were related to tumor growth and development, participating in immune response, inflammatory reaction, and other bodily functions. Similarly, the signaling pathways in PDAC II included BAFF, RESISTIN, OX40, KIT, TRAIL, and APRIL, which were involved in cell proliferation, differentiation, survival, and apoptosis.

In PDAC III, the BTLA (B and T lymphocyte attenuator) signaling pathway became prominent. The role of BTLA in the cancer TME is complex, primarily involving the interaction between BTLA and its ligand HVEM. When BTLA binds to HVEM, it inhibits the activation, proliferation, and cytokine production of T cells and B cells35,36.

Finally, we analyzed the signaling pathways in PDAC IV, such as NGF, BMP, IGF, and LIGHT. Among them, tumor necrosis factor superfamily member 14 (LIGHT) is a protein primarily expressed on activated T cells, activated Natural Killer cells, and immature dendritic cells37,38. In addition to sensitizing tumor cells to IFNγ-mediated apoptosis, LIGHT induces tumor vasculature normalization, and drives the formation of high endothelial venules which subsequently encourage generation of tertiary lymphoid structures39,40,41. In addition, LIGHT stimulates effector cell function and antitumor CD8 + T cell entry into tumors, which aids in establishing anti-tumoral memory42,43,44,45.

SPP1, also known as osteopontin (OPN), is a multifunctional glycoprotein with various biological activities. The SPP1 signaling pathway refers to a series of intracellular and intercellular communication processes involving the SPP1 (OPN) protein, which plays crucial roles in multiple biological processes, including inflammatory responses, immune regulation, and tumorigenesis. In our analysis, we observed significant SPP1 signaling in PDAC I, mainly involving the interaction of macrophages with other immune cells (Supplementary Fig. S5D). Genes associated with the SPP1 signaling pathway showed specific expression in macrophages, with T cells being the primary target. Macrophages can secrete signaling molecules such as SPP1, which interact with receptors on the surface of T cells to regulate their activation, proliferation, and differentiation. This interaction forms part of a complex immune network that responds to the tumor46.

The CCL signaling pathway refers to the signal transduction pathway formed by the CC chemokine subfamily and its related receptors. Chemokines are a class of small cytokines or signaling proteins secreted by cells; they induce directional chemotaxis of nearby responsive cells and exert their biological effects through interaction with G protein-coupled transmembrane receptors. The CCL signaling pathway plays a crucial role in various biological processes, including immune response, inflammation, and tumor development. During our analysis, we observed significant activation of the CCL signaling pathway in PDAC II, revealing increased interactions between macrophages and T cells (Supplementary Fig. S5E). The CC chemokine subfamily comprises various members such as CCL2, CCL3, and CCL20, and each chemokine plays a distinct role in specific physiological and pathological processes. For instance, CCL2 is expressed by tumor cells and stromal cells in the TME. It can induce the proliferation, migration, and invasion of tumor cells, thereby promoting tumor growth and metastasis47.

The GDF signaling pathway is a signal transduction system involved in various biological processes. GDF is a protein belonging to the transforming growth factor-β (TGF-β) superfamily. The TGF-β superfamily comprises numerous growth factors, such as BMPs and GDFs, which play crucial roles in cell differentiation, proliferation, apoptosis, and immune regulation. During our analysis, we observed a significant GDF signaling pathway in PDAC III, highlighting interactions mainly between macrophages and ductal cells (Supplementary Fig. S5F). Within this pathway, GDF15, a member of the GDF family, actively participates in cancer progression and invasion and exerts its influence on the tumor immune environment through mitogen-activated kinase activities. The cytokine is actively secreted in tumors, produced either by the cancer cells themselves or by tumor-associated macrophages. GDF15 potentially has both protective and tumor-promoting activities, inhibiting tumor growth in the early stages while promoting tumor cell proliferation at later stages via metabolic and immunomodulatory mechanisms. Accumulating evidence indicates a positive association between GDF15 levels and cancer-induced cachexia48.

Finally, we identified the PTN signaling pathway in PDAC IV, mainly involving interactions among ductal cells (Supplementary Fig. S5G). The pleiotrophin (PTN) signaling pathway is an important area of study in cancer research. PTN is a multifunctional protein with high affinity for heparin and plays a pivotal role in tumor angiogenesis and tumor growth. PTN is overexpressed in several types of human cancers and regulates important cancer cell functions in vitro and cancer growth in vivo49,50,51.

Drug prediction based on CellMiner and molecular docking analysis of related drugs

CellMiner is a computational resource for analyzing drug sensitivity and gene expression in the sixty tumor cell lines (NCI-60) of the National Cancer Institute (NCI). Using the drug sensitivity data available in CellMiner, the efficacy of various drugs on specific cancer cell lines can be predicted52. To further predict potential therapeutic drugs, we analyzed the relevant genes of the above signaling pathways using the CellMiner database (Fig. 7A-C). We found that cases in the high-risk group might be more sensitive to XR-11576, Ixabepilone, and AMONAFIDE.

Fig. 7

Drug prediction based on CellMiner and molecular docking. (A-C) The left and middle panels display drug prediction based on CellMiner, and the right panel illustrates the binding mode of screened drugs to their targets by molecular docking.

To evaluate the affinity of the candidate drugs for their targets, we performed molecular docking analysis. The binding poses and interactions of the three drug candidates (XR-11576, Ixabepilone, AMONAFIDE) with three proteins (CCL5, TGFBR2, NCL) were obtained with AutoDock Vina v.1.2.2, and a binding energy was generated for each interaction (Fig. 7A-C and Supplementary Table S5). Each drug candidate bound to its protein target through visible hydrogen bonds and strong electrostatic interactions. Moreover, the hydrophobic pockets of the respective targets were successfully occupied by the three candidate drugs. For CCL5, the candidate XR-11576 exhibited a low binding energy of -7.812 kcal/mol. For TGFBR2, the candidate Ixabepilone showed a low binding energy of -45.525 kcal/mol. For NCL, the candidate AMONAFIDE demonstrated a low binding energy of -111.267 kcal/mol. These findings indicate that the binding of these candidates is highly stable.

Discussion

A growing body of research has demonstrated significant intratumoral heterogeneity in PDAC, leading to varied patient responses to the same treatments. Therefore, there is an urgent need for a deeper investigation into the intratumoral heterogeneity of PDAC and for more precise therapies directed at specific subgroups or gene targets, such as T cells, B cells, macrophages, signaling pathways, and transcription factors. In this study, we systematically analyzed the dynamic changes of the PDAC microenvironment throughout tumor progression using single-cell sequencing. Notably, we observed a shift away from ductal cells as the dominant component, with tumor-infiltrating immune cells gradually accumulating from early to late stages of PDAC. Our analysis of epithelial, mesenchymal, and CSC markers revealed that epithelial-mesenchymal transition (EMT) and CSC properties are key features of advanced PDAC. In line with prior research, although anti-tumor immune responses emerged, they were hindered by negative regulation from Tregs, exhausted T cells, and TAMs during tumor development53,54.

In our study, we identified the previously defined CAF subgroup of complement-secreting CAFs (csCAFs), but did not find iCAFs or apCAFs. To some extent, the previously reported iCAFs resemble csCAFs, as iCAFs are also enriched in expression of the complement system component C3. Pseudotime analysis of CAFs indicated the potential for interconversion between different CAF subgroups. These findings suggest that CAFs exhibit dynamic plasticity during PDAC progression. Harnessing this plasticity to convert pro-tumorigenic CAFs into anti-tumorigenic CAFs is a noteworthy strategy, consistent with previous research19,55.

Previous studies identified many regulatory and exhausted T cell markers by single-cell or bulk sequencing23,25,26,27. Among these markers, our analysis of the TCGA dataset showed that patients with high expression of TNFRSF4 have a significantly better prognosis. By interacting with its ligand OX40L (also known as CD252 or TNFSF4), TNFRSF4 transmits co-stimulatory signals that participate in a variety of physiological responses involving T cells and interactions with both lymphoid and non-lymphoid cells. Within the TME, expression of TNFRSF4 can enhance the activation and proliferation of effector T cells and modulate the function of Tregs, giving rise to intricate antitumor immune responses56,57,58,59. Together, these results suggest that TNFRSF4 is a candidate prognostic marker and potential therapeutic target.

TAMs are one of the most abundant immune cell populations in the pancreatic tumor stroma. Previous research has shown that TAMs in the TME tend toward M2 polarization, which promotes and supports tumor behaviors such as tumorigenesis, immune escape, metastasis, and chemotherapeutic resistance60,61. Based on these findings, our study focused on macrophages and identified a panel of 14 genes (RPS10, MIF, MT-ATP6, CSTB, IFI30, NPC2, BTG1, CTSD, FCGR2A, SEC61G, IER3, HSPB1, HMOX1, and ZFP36L1) as predictive biomarkers for PDAC prognosis. These genes were selected for their significant associations with PDAC prognosis, reflecting their roles in critical pathways within TAMs that influence tumor progression. Using these biomarkers, we constructed a prognostic model and stratified patients into high-risk and low-risk groups based on the median risk score. Notably, compared with the low-risk group, the high-risk group exhibited significantly higher mortality and shorter survival times. The upregulated genes identified in our study provide valuable insights for predicting prognosis in PDAC patients. Subsequently, we validated four of the prognostic marker genes. Using the HPA database, we observed elevated levels of CSTB and SEC61G in human PDAC cells. Additionally, functional experiments in SW1990 cells showed that knockdown of CSTB and SEC61G significantly reduced the invasion, migration, and proliferation of PDAC cells. However, our study has some limitations. First, it is based on the analysis of existing public databases and lacks validation in a large number of clinical samples. Second, further in vivo and in vitro experiments are needed to validate the mechanisms involved.

At present, immunotherapy for PDAC still faces numerous challenges, including the inherently immunosuppressive characteristics of PDAC and the side effects of treatment. One available agent, pembrolizumab, is an anti-PD-1 immune checkpoint inhibitor that has been approved by the Food and Drug Administration for the treatment of patients with specific types of advanced PDAC62. To explore further potential targeted drugs, we conducted drug sensitivity analysis using genes obtained from the cell communication analysis. Specifically, we used CellChat to examine the interactions between cells in the TME at different stages of PDAC and to decipher the dynamic changes of key signaling pathways during tumor progression. Based on the key signaling pathways at each tumor stage, we selected three relevant genes (CCL5, TGFBR2, NCL) for drug prediction and identified the three most sensitive drugs (XR-11576, Ixabepilone, AMONAFIDE). Finally, we used AutoDock Vina v.1.2.2 for molecular docking to evaluate the affinity of these drugs for their targets. The results showed that each candidate drug binds to its protein target through visible hydrogen bonds and strong electrostatic interactions, and that this binding is highly stable.

While our study provides important insights into the dynamic changes in the PDAC tumor microenvironment, we acknowledge that a key limitation is the unavailability of spatial transcriptomic data. This limitation restricts our ability to examine spatial relationships between different cell populations and may affect the interpretation of certain findings. Future studies will aim to incorporate spatial transcriptomic data to better understand the spatial heterogeneity and interactions within the tumor.

In summary, through in-depth analysis of PDAC scRNA-seq data, we have revealed a novel prognostic model and identified three potential therapeutic drugs. These findings contribute to prognostic prediction in PDAC patients and may potentially open new avenues for treating PDAC in the future.

Methods

Data collection

The expression matrices for single-cell transcriptome analysis were obtained from the Gene Expression Omnibus (GEO) database (https://www.ncbi.nlm.nih.gov/geo) under accession numbers GSE242230 and GSE205013. All patient samples came from these public datasets. A total of 31 samples were downloaded.

Processing and integration of single cell RNA-seq data

The Seurat (v4.3) R package and custom scripts were used for single-cell analysis. First, cells expressing fewer than 200 genes and genes expressed in fewer than 3 cells were filtered out of each sample to create a Seurat object. Next, the gene-by-cell matrix of raw transcript counts was filtered to remove cells with fewer than 500 transcripts, cells with a total UMI count below 1000, and cells with more than 10% mitochondrial gene content. The UMI count matrices were then log-normalized and scaled with Seurat’s NormalizeData and ScaleData functions. Technical variation between datasets was corrected with the Harmony R package to construct a well-integrated embedding across cell types and patients.
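As a rough illustration of the per-cell filtering and log-normalization described above, the following Python sketch applies the stated thresholds to precomputed per-cell summaries. It is not the Seurat code used in the study: the field names and example cells are hypothetical, the 500-transcript and 3-cell gene filters are omitted for brevity, and the scale factor of 10,000 is assumed to match Seurat's default for log-normalization.

```python
import math

# Thresholds taken from the text; field names and example cells are hypothetical.
MIN_GENES = 200            # minimum number of detected genes per cell
MIN_UMI = 1000             # minimum total UMI count per cell
MAX_MITO_FRACTION = 0.10   # maximum fraction of counts from mitochondrial genes


def passes_qc(cell):
    """Return True if a cell's summary metrics satisfy all three thresholds."""
    return (
        cell["n_genes"] >= MIN_GENES
        and cell["n_umi"] >= MIN_UMI
        and cell["mito_fraction"] <= MAX_MITO_FRACTION
    )


def log_normalize(counts, scale_factor=10_000):
    """Scale one cell's counts to a fixed library size, then apply log(1 + x)."""
    total = sum(counts)
    return [math.log1p(c / total * scale_factor) for c in counts]


cells = [
    {"id": "c1", "n_genes": 1500, "n_umi": 4200, "mito_fraction": 0.04},
    {"id": "c2", "n_genes": 150, "n_umi": 600, "mito_fraction": 0.02},   # too few genes
    {"id": "c3", "n_genes": 900, "n_umi": 2500, "mito_fraction": 0.25},  # high mito
]
kept = [c["id"] for c in cells if passes_qc(c)]
print(kept)
print(log_normalize([0, 5, 5]))
```

Only the first example cell survives the filter; the other two fail the gene-count and mitochondrial thresholds, respectively.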

Data analysis

The Seurat package was used to identify major cell types through dimension reduction followed by clustering. A total of 2,000 highly variable genes were selected for linear dimensional reduction by principal component analysis (PCA). The number of principal components used in subsequent analyses was chosen by inspecting Seurat’s PCHeatmap and ElbowPlot outputs. Finally, the unsupervised graph-based clustering provided by Seurat was run on the projected principal component space.

Dimension reduction and clustering analysis

We used Seurat to carry out graph-based unsupervised clustering and Uniform Manifold Approximation and Projection (UMAP) to visualize the data in two dimensions. The first 30 principal components were used as input for UMAP and for the FindNeighbors and FindClusters functions. The resolution parameter of FindClusters was set to 0.4. FindAllMarkers was used to validate the coarse cell types, and clusters were then labeled based on subtype markers previously described in the literature. All heat maps were drawn using Seurat’s DoHeatmap function with default settings. Subpopulation analysis of the major cell groups (ductal, fibroblast, lymphocyte, myeloid) was performed by applying the same variable feature selection, PCA, and dimensional reduction procedure to each cell type. Gene expression was visualized throughout the study with Seurat’s FeaturePlot function.

Gene ontology analysis

The clusterProfiler R package was used to analyze and visualize the functional profiles of gene clusters. A cutoff of 0.05 was applied to both the p value and the q value.
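Over-representation analysis of this kind is commonly based on a hypergeometric test. The Python sketch below computes such a p value for a single term under that assumption; the function name and the gene counts are illustrative only and are not taken from the study, and multiple-testing correction (the q value) is not shown.

```python
from math import comb


def enrichment_p_value(n_background, n_term, n_query, n_overlap):
    """Hypergeometric upper tail: P(overlap >= n_overlap).

    n_background: genes in the background set
    n_term:       background genes annotated to the term
    n_query:      genes in the query cluster
    n_overlap:    query genes annotated to the term
    """
    upper = min(n_term, n_query)
    total = comb(n_background, n_query)
    tail = sum(
        comb(n_term, k) * comb(n_background - n_term, n_query - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Illustrative numbers only: 40 of 200 query genes hit a term of 500 genes
# in a background of 20,000 genes.
p = enrichment_p_value(20_000, 500, 200, 40)
print(p < 0.05)
```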

TCGA database analysis

The differential, survival, and correlation analyses were performed with the GEPIA (Gene Expression Profiling Interactive Analysis) online tool, using a standard processing pipeline. These data were from the TCGA and GTEx PDAC samples.

Pseudotime analysis

The R package Monocle 2 was used to perform cellular trajectory analysis, under the assumption that a one-dimensional ‘pseudotime’ can effectively capture the complexity of high-dimensional expression values in single cells. The clusters identified as T cells and fibroblasts were loaded into the R environment. After log normalization and DDRTree dimensionality reduction, we visualized the cell trajectory and cell positions in a two-dimensional tree structure. Finally, the pattern of each cell in the plot was determined based on the expression levels of specific markers.

Construction and validation of the prognostic signature

In the dataset, highly expressed genes in PDAC stage IV macrophages were identified using Seurat’s FindMarkers function with default settings63. The subset of genes that best predicted survival outcomes was identified using Lasso-Cox regression, with the R package glmnet used to fit the Cox proportional hazards model. Hazard ratios were calculated using the Cox proportional hazards regression model, and covariate p-values were calculated using the Wald test. Kaplan–Meier curves were generated with the R package survminer based on the risk grouping for each patient (“High Risk” or “Low Risk”). The receiver operating characteristic (ROC) curve from RF models, together with the area under the curve (AUC), was used to assess the performance of the prognostic model.
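The risk grouping can be sketched as a linear risk score (coefficient times expression, summed over the signature genes) followed by a median split, which is how signatures of this type are typically scored. In the Python sketch below, the coefficients and expression values are hypothetical placeholders, not the fitted model, and only two of the signature genes are shown.

```python
from statistics import median

# Hypothetical coefficients for two signature genes; NOT the fitted values.
coefficients = {"CSTB": 0.5, "SEC61G": 0.25}

# Hypothetical normalized expression values per patient.
expression = {
    "p1": {"CSTB": 2.0, "SEC61G": 4.0},
    "p2": {"CSTB": 1.0, "SEC61G": 2.0},
    "p3": {"CSTB": 4.0, "SEC61G": 4.0},
    "p4": {"CSTB": 0.0, "SEC61G": 2.0},
}


def risk_score(profile, coefs):
    """Linear predictor: sum of coefficient * expression over signature genes."""
    return sum(coefs[gene] * profile[gene] for gene in coefs)


def stratify(scores):
    """Label each patient relative to the median risk score."""
    cutoff = median(scores.values())
    return {
        pid: ("High Risk" if score > cutoff else "Low Risk")
        for pid, score in scores.items()
    }


scores = {pid: risk_score(profile, coefficients) for pid, profile in expression.items()}
groups = stratify(scores)
print(scores)
print(groups)
```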

Expression of marker genes at the mRNA and protein level

We analyzed the expression of some of the above-mentioned marker genes at both the mRNA and protein levels. For the mRNA level, we performed RT-qPCR validation in human pancreatic cancer cells (SW1990). For the protein level, we obtained immunohistochemical (IHC) staining images of the marker genes from the Human Protein Atlas database (HPA; https://www.proteinatlas.org/; accessed on 10 November 2024)64.

Real-time quantitative polymerase chain reaction (RT-qPCR)

RT-qPCR was conducted using the SYBR® Premix Ex Taq™ II kit (Takara Bio, Kusatsu, Japan). The primer sequences were CSTB-F: GAGTGTTCCAATCTCTCCCTCA; CSTB-R: ACAGCCCGGAGATGAAGCTTAT; SEC61G-F: CGGCTGGTTAAAAGATGCACTA; SEC61G-R: ACCCACAAAACATACAGGCAAA. The experiment was repeated three times, and the relative RNA expression levels of CSTB and SEC61G were quantified using the 2^−ΔΔCt method, with GAPDH serving as the internal control.
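For reference, the 2^−ΔΔCt calculation can be written out as a short Python sketch. The Ct values below are hypothetical and serve only to show the arithmetic; the function and argument names are illustrative.

```python
def relative_expression(ct_target_sample, ct_ref_sample,
                        ct_target_control, ct_ref_control):
    """Fold change of a target gene by the 2^-ddCt method.

    dCt  = Ct(target) - Ct(reference gene, e.g. GAPDH)
    ddCt = dCt(sample) - dCt(control)
    """
    delta_sample = ct_target_sample - ct_ref_sample
    delta_control = ct_target_control - ct_ref_control
    return 2 ** -(delta_sample - delta_control)


# Hypothetical Ct values: knockdown sample versus control.
fold = relative_expression(
    ct_target_sample=26.0, ct_ref_sample=18.0,
    ct_target_control=24.0, ct_ref_control=18.0,
)
print(fold)
```

With these example values, ΔΔCt is 2, so the target is expressed at one quarter of the control level.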

Cell counting kit-8 proliferation assay

The cells were diluted and seeded into 96-well plates at a density of 1,000 cells per well, with three replicates at each of four time points (24 h, 48 h, 72 h, 96 h). Following the manufacturer’s instructions, 10 μL of CCK-8 reagent (Dojindo, Kumamoto, Japan) was added to each well, and the plates were incubated at 37 °C for 2 h. Absorbance (OD value) at 450 nm was then measured using an automated microplate reader (Tecan i-control, Tecan, Switzerland).

Wound healing assay

A sterile 200 μL pipette tip was used to create a smooth scratch on 90% confluent serum-starved cells. The cells were then incubated in complete growth medium (10% FBS) for 24 h. Cell migration into the wound area was observed under an inverted microscope at 0 and 24 h.

EdU assay

We performed this assay using the EdU Imaging Kit (APExBIO, Houston, TX, USA) according to the manufacturer’s instructions. Cells were cultured in 20 mm glass-bottomed dishes and exposed to 10 μmol/L EdU for 4 h. They were then fixed at room temperature for 15 min with PBS containing 4% paraformaldehyde and permeabilized with PBS containing 0.5% Triton X-100. Next, the prepared click reaction solution was added, and cells were incubated in the dark at room temperature for 30 min. After three washes with PBS (3 min each), cells were incubated with diluted Hoechst 33342 (1:2000) at room temperature in the dark for 30 min. Images were acquired using a laser scanning confocal microscope (Olympus, Tokyo, Japan), and the number of positively stained cells (red) was counted using ImageJ.

CellChat for cell–cell communication analysis

Cell–cell communication was predicted using the R package CellChat (v1.6.1). We first explored the correlation between cell types. We then loaded a gene expression matrix and its associated cell metadata to create a CellChat object for each of the four stages of cancer using the createCellChat function. We designated CellChatDB.human as the ligand-receptor interaction database and restricted the genes in each CellChat object to the signaling genes, thereby reducing the computational burden. Subsequently, we identified the overexpressed genes and overexpressed ligand-receptor interactions with the identifyOverExpressedGenes and identifyOverExpressedInteractions functions. After that, we estimated the probability of cell–cell communication with the computeCommunProb function. Lastly, we eliminated cell–cell communications that involved only a limited number of cells (defaulting to 3 in CellChat) within specific cell types.

Drug sensitivity analysis

CellMiner is a database and query tool designed for the cancer research community to facilitate the integration and study of molecular and pharmacological data for the NCI-60 cancer cell lines. We downloaded the RNA-seq and DTP NCI-60 drug datasets from the CellMiner™ database (https://discover.nci.nih.gov/cellminer/) and calculated the correlation between gene expression and drug sensitivity52.
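The correlation step can be illustrated with a small Python sketch. The type of correlation coefficient is not specified above, so Pearson correlation is an assumption here, and the expression and drug activity values are hypothetical stand-ins for per-cell-line data.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


# Hypothetical values across five cell lines: gene expression and a drug
# activity score for the same lines, in the same order.
gene_expression = [2.1, 3.4, 1.0, 4.2, 2.8]
drug_activity = [0.3, 0.9, -0.4, 1.2, 0.5]
r = pearson(gene_expression, drug_activity)
print(round(r, 3))
```

A positive coefficient would indicate that cell lines with higher expression of the gene tend to be more sensitive to the drug, given the sign convention of the activity score.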

Molecular docking

To analyze the binding affinities and modes of interaction between the drug candidates and their targets, AutoDock Vina 1.2.2, an in silico protein–ligand docking program, was employed65. The molecular structures of XR-11576, Ixabepilone, and AMONAFIDE were retrieved from PubChem Compound (https://pubchem.ncbi.nlm.nih.gov/)66. The 3D coordinates of CCL5 (PDB ID, 6AEZ), TGFBR2 (PDB ID, 1PLO), and NCL (PDB ID, 2FC8) were downloaded from the PDB (http://www.rcsb.org/pdb/home/home.do). For docking analysis, all protein and ligand files were converted into PDBQT format, with all water molecules removed and polar hydrogen atoms added. The grid box was centered to cover the domain of each protein and to accommodate free molecular movement. The grid box was set to 30 Å × 30 Å × 30 Å, and the grid point spacing was 0.05 nm. Molecular docking was performed with AutoDock Vina 1.2.2 (http://autodock.scripps.edu/).
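The grid settings above can be made concrete with a short Python sketch that converts the spacing to ångströms, counts grid points per axis, and writes a search-box configuration using Vina's standard `center_*` and `size_*` keys. The file names and box center are hypothetical, since the actual centers are not reported here.

```python
BOX_SIZE_ANGSTROM = (30.0, 30.0, 30.0)  # box edge lengths from the text
SPACING_NM = 0.05                       # grid point spacing from the text


def nm_to_angstrom(value_nm):
    """1 nm equals 10 angstroms."""
    return value_nm * 10.0


def grid_points_per_axis(edge_angstrom, spacing_angstrom):
    """Number of grid points along one edge, counting both end points."""
    return round(edge_angstrom / spacing_angstrom) + 1


def vina_config(receptor, ligand, center, size):
    """Build the text of a Vina configuration file for one docking run."""
    lines = [f"receptor = {receptor}", f"ligand = {ligand}"]
    lines += [f"center_{axis} = {c}" for axis, c in zip("xyz", center)]
    lines += [f"size_{axis} = {s}" for axis, s in zip("xyz", size)]
    return "\n".join(lines)


spacing = nm_to_angstrom(SPACING_NM)
points = grid_points_per_axis(BOX_SIZE_ANGSTROM[0], spacing)
config = vina_config(
    receptor="CCL5.pdbqt",       # hypothetical file name
    ligand="XR-11576.pdbqt",     # hypothetical file name
    center=(0.0, 0.0, 0.0),      # hypothetical box center
    size=BOX_SIZE_ANGSTROM,
)
print(spacing, points)
print(config)
```

A spacing of 0.05 nm corresponds to 0.5 Å, which gives 61 grid points along each 30 Å edge.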

Statistical analysis

R software (version 4.3.3) was used for data processing (https://www.r-project.org/). All data are presented as mean ± standard deviation. Spearman’s rank correlation was used to analyze the relationship between TNFRSF4 and Treg/exhausted T cell signature genes. The Kaplan–Meier method and the corresponding log-rank test were used to assess the prognostic value of marker genes. Student’s t-test was used to assess statistical significance. Statistical significance was defined as *p < 0.05, **p < 0.01, ***p < 0.001.
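The reporting conventions above can be summarized in a small Python sketch that maps a p value to its asterisk label and formats replicate measurements as mean ± standard deviation. The "ns" label for non-significant results and the example replicate values are assumptions added for illustration, and the sample standard deviation is assumed.

```python
from statistics import mean, stdev


def significance_label(p_value):
    """Map a p value to the asterisk notation defined in the text."""
    if p_value < 0.001:
        return "***"
    if p_value < 0.01:
        return "**"
    if p_value < 0.05:
        return "*"
    return "ns"  # assumption: label for non-significant results


def summarize(values):
    """Format replicates as mean ± sample standard deviation."""
    return f"{mean(values):.2f} ± {stdev(values):.2f}"


replicates = [1.0, 2.0, 3.0]  # hypothetical triplicate measurements
print(summarize(replicates))
print([significance_label(p) for p in (0.2, 0.03, 0.004, 0.0002)])
```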