Introduction

Pancreatic cancer (PC) is a particularly lethal malignancy that ranked fourth among causes of cancer-related death in the United States in 2021 and seventh worldwide1. Radical surgery combined with chemotherapy remains the only curative treatment for PC, but because of early metastasis, more than 70% of patients have already lost the opportunity for radical surgery at diagnosis, which has contributed to the increasing mortality rate among PC patients2,3. In recent years, immunotherapy has shown promising results in various cancers, but it has not met expectations in PC, where complex tumor heterogeneity means that patients often exhibit resistance to these treatments and have poor outcomes4. In addition, with the development of bioinformatics and multiomics research, more and more researchers have begun to develop prognostic models for PC, hoping to optimize clinical decision-making and individualized treatment for PC patients5,6. However, most of these models perform poorly in practice, which may be due to limitations of the algorithms and improper sample processing methods.

Toll-like receptors (TLRs) are a family of transmembrane pattern recognition receptors that are expressed in various immune cell types and are responsible for pathogen recognition and induction of innate immune responses7,8. At the same time, TLRs tightly link innate and adaptive immunity by regulating the activation of antigen-presenting cells and key cytokines9. Specifically, TLRs recognize conserved structures of various pathogens and activate the nuclear factor kappa B (NF-κB), type I interferon (IFN), and inflammasome signaling pathways, resulting in the production of corresponding antiviral factors and chemokines, which are crucial for inducing antigen-specific adaptive immune responses10,11,12. In recent years, it has been demonstrated that TLRs can mediate inflammatory reactions that drive the occurrence of cancer, and that they are likely to be involved in the activation of various abnormal pathways in cancer13, including promoting cancer growth and immune evasion, inhibiting tumor cell apoptosis, and inducing chemotherapy resistance14. Encouragingly, small-molecule inhibitors targeting TLRs have gradually emerged in the treatment of cancers including glioma, bladder cancer, colorectal cancer, breast cancer, cervical cancer, and melanoma15,16,17,18,19,20. A better understanding of TLR signaling pathways may therefore substantially improve future treatment strategies for cancer. However, the relationship between PC and TLR signaling pathways has not been fully clarified. Considering the limited application of immunotherapy in PC and the important role of TLR signaling pathways in anti-tumor immune regulation, further exploring the relationship between the two is particularly important.

Our study selected 102 genes related to the TLR signaling pathway, and conducted a deep analysis of the expression landscape of the TLR signaling pathway in PC tissue at the single-cell level. By using clustering analysis, we divided 945 PC samples according to TLR signaling pathway scores into three subgroups, exploring the correlation between TLR pathways and the survival status of PC patients, immune microenvironment, metabolic reprogramming, cell death, and targeted therapy. The aim was to accurately interpret the mechanism of action of the TLR signaling pathway in PC. Finally, we built a prognostic model for PC based on a series of machine learning algorithms that included four genes (NT5E, TGFBI, ANLN, and FAM83A), and verified the superior predictive performance of the model.

Methods and materials

Data sources

The study included 945 PC samples with complete follow-up information, which were collected from three platforms: TCGA, GEO, and ICGC. Among them, TCGA provided expression data for 177 PC samples, ICGC provided 295 samples, and GSE79668, GSE85916, GSE102238, GSE57495, GSE62452, GSE71729, and GSE28735 provided expression data for 51, 79, 48, 63, 65, 125, and 42 samples, respectively21,22,23,24,25,26,27. All expression data were processed with the “sva” package to reduce the impact of batch effects28, and expression matrices with large ranges of expression values were log-transformed. In addition, we collected single-cell sequencing data for 24 PC samples and 11 adjacent tissue samples from the CRA001160 cohort of the Genome Sequence Archive (GSA). The genes related to the TLR signaling pathway (Supplementary Table 1) were downloaded from the KEGG_TOLL_LIKE_RECEPTOR_SIGNALING_PATHWAY gene set of the Molecular Signatures Database (MSigDB), which included 102 genes29.

Processing of single cell expression data for PC

The CRA001160 cohort provides single-cell expression data for 57,530 cells and 24,005 genes30. We read these data with the CreateSeuratObject function, setting min.cells = 3 and min.features = 200 for data cleaning and quality control, which removes genes detected in fewer than 3 cells and cells with fewer than 200 detected genes. To obtain high-quality single-cell RNA expression data, we then used the R package “Seurat” for initial quality control with the following filtering criteria: (1) nCount_RNA >= 1,000; (2) nFeature_RNA >= 2,000 and nFeature_RNA <= 6,500; (3) percent.mt <= 10. After filtering, we obtained single-cell expression data for 57,024 cells and 23,376 genes for subsequent single-cell level analysis. We then normalized the single-cell data using the LogNormalize method to make gene expression levels comparable among different samples and reduce data dispersion, used the “FindVariableFeatures” function to identify the top 2,000 highly variable genes between cells, and used the “ScaleData” function to scale the gene expression profile. Harmony analysis was used for dimensionality reduction and removal of batch effects. Based on cell-specific marker genes, we divided all cells from the 24 PC samples and 11 adjacent samples into 9 cell types.
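
The per-cell filtering and normalization described above can be illustrated with a minimal Python sketch. It is not the Seurat implementation: the function names, the `QC` dictionary, and the `MT-` prefix used to recognize mitochondrial genes are assumptions made for illustration, while the thresholds are the ones quoted in the text. The normalization follows the usual description of LogNormalize (counts divided by the cell total, multiplied by a scale factor of 10,000, then log1p-transformed).

```python
import math

# Thresholds quoted in the text; the key names are illustrative only.
QC = {"min_counts": 1000, "min_features": 2000,
      "max_features": 6500, "max_percent_mt": 10.0}


def cell_metrics(counts, mito_prefix="MT-"):
    """Summarize one cell, given a dict mapping gene name -> raw count."""
    total = sum(counts.values())
    n_features = sum(1 for c in counts.values() if c > 0)
    mito = sum(c for g, c in counts.items() if g.startswith(mito_prefix))
    percent_mt = 100.0 * mito / total if total else 0.0
    return {"nCount_RNA": total, "nFeature_RNA": n_features,
            "percent_mt": percent_mt}


def passes_qc(metrics, qc=QC):
    """Apply the three filtering criteria to one cell's metrics."""
    return (metrics["nCount_RNA"] >= qc["min_counts"]
            and qc["min_features"] <= metrics["nFeature_RNA"] <= qc["max_features"]
            and metrics["percent_mt"] <= qc["max_percent_mt"])


def log_normalize(counts, scale_factor=10_000):
    """Library-size normalization followed by log1p, per cell."""
    total = sum(counts.values())
    return {g: math.log1p(c / total * scale_factor) for g, c in counts.items()}
```

A cell is kept only if `passes_qc(cell_metrics(counts))` is true; `log_normalize` would then be applied to the retained cells.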

Single-cell level signaling characteristics of TLR pathway-related genes

To study the activation and distribution of the TLR signaling pathway at the single-cell level in PC, we scored the TLR signaling gene set in the single-cell data with five algorithms (“AUCell”, “UCell”, “Singscore”, “ssGSEA”, and “Add”) and a sixth combined score, “scoring”, defined as the sum of the scores from the five algorithms. Using the wilcox.test function, we compared the TLR signaling gene set scores in each cell type and mapped them onto tSNE and UMAP plots. In many biological processes, cells do not develop completely synchronously. To respond to various stimuli, cells transition from one functional state to another, and cells in different states have different gene expression characteristics, resulting in dynamic changes in biological metabolites to perform their required work. This involves a large amount of gene silencing and activation. Monocle orders cells by their progress along a biological trajectory, enabling recognition of changes in gene expression associated with different cell fates31. In addition, cells communicate mainly through cell surface ligands and receptors, and CellChat can analyze the communication patterns between cells to obtain continuous states and interaction relationships along cell development trajectories32. We used the Monocle and CellChat R packages for pseudotime trajectory analysis of ductal cells in PC samples and for analysis of cell-to-cell communication, respectively. Finally, to distinguish malignant cells (aneuploid) from normal cells (diploid) in the tumor microenvironment, we used the CopyKAT method to calculate the copy number variation (CNV) level of each cell, and combined the classification results with gene set scores to further analyze the activation and distribution of the TLR signaling pathway in PC samples.

Construction of PC classifier based on TLR signaling pathway

To distinguish PC patients with different TLR characteristics clearly, we developed a novel PC classifier. First, we analyzed gene expression data containing 171 normal samples and 945 tumor samples, using the GSVA package to perform gene set variation analysis on tumor samples. This allowed us to assess the expression activity of predefined gene sets across different tumor samples. Next, we intersected the predefined gene set with the genes in the sample expression data, resulting in a set of 44 genes that could partially represent the activity of the gene set. In the normal samples, we calculated the upper and lower bounds of expression for these intersecting genes, with the upper bound set at the 98th percentile and the lower bound at the 2nd percentile of normal expression. Based on these thresholds, we categorized gene expression in the tumor samples as high, low, or normal. Hierarchical clustering of the samples produced three distinct clusters. A Kruskal-Wallis test yielded a significant P-value (p = 1.013804e-40) among these clusters, indicating significant differences between the groups. Finally, we used the “pheatmap” package to create a heatmap, providing a comprehensive view of the expression activity levels of each sample within the predefined gene set. This visualization highlighted the expression intensity relationships among samples within the three clusters and the intersecting genes. Additionally, the proportion of samples with upregulated expression for each gene was calculated and noted in parentheses alongside each gene on the heatmap. To verify the clinical prognostic significance of the classifier, we further evaluated the prognostic differences among the PC subtypes, with survival analysis performed using the survival and survminer packages.
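
The thresholding step above (bounds taken from normal samples, then applied to tumor samples) can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the authors' code: the function names are hypothetical, and the percentile uses linear interpolation between order statistics, which may differ slightly from the quantile definition used in the original analysis.

```python
def percentile(values, q):
    """Percentile with linear interpolation; q is in [0, 100]."""
    xs = sorted(values)
    if len(xs) == 1:
        return xs[0]
    pos = (len(xs) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def normal_bounds(normal_values, low_q=2, high_q=98):
    """Lower and upper expression bounds for one gene in normal samples."""
    return percentile(normal_values, low_q), percentile(normal_values, high_q)


def categorize(tumor_value, bounds):
    """Label one tumor measurement relative to the normal-tissue bounds."""
    lower, upper = bounds
    if tumor_value > upper:
        return "high"
    if tumor_value < lower:
        return "low"
    return "normal"


def fraction_upregulated(tumor_values, bounds):
    """Proportion of tumor samples labeled 'high' for one gene."""
    labels = [categorize(v, bounds) for v in tumor_values]
    return labels.count("high") / len(labels)
```

Applied gene by gene, `categorize` yields the high/low/normal matrix that is then clustered, and `fraction_upregulated` corresponds to the proportion reported in parentheses next to each gene on the heatmap.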

Identification of intrinsic molecular features of TLR signaling pathway-associated PC classifiers

To explain why different subtypes of PC have different clinical outcomes, we conducted a deeper exploration of their underlying molecular features. First, based on the KEGG database, we collected 11 classic cell death pathways, 42 classic metabolic pathways, and 23 classic immune pathways33. The GSVA package was then used to evaluate the cell death, metabolism, and immune signaling intensity of the 945 PC samples, and a heatmap was used to depict the distribution of signal intensity among the three clusters.

The immune microenvironment is a classic molecular marker for tumors, and different immune microenvironment characteristics may be the reason for different outcomes in patients with different subtypes of PC. The ESTIMATE algorithm uses gene expression data to predict the content of immune cells, stromal cells, and tumor cells in malignant tissues, resulting in four scores: stromal score, immune score, ESTIMATE score, and tumor purity34,35. Using the estimate package, we calculated these four scores for the 945 PC samples and compared them among subtypes. The TIMER2.0 online platform provides seven algorithms for predicting immune cell infiltration in tumor samples36: TIMER, CIBERSORT, CIBERSORT-ABS, QUANTISEQ, XCELL, EPIC, and MCPCOUNTER. Based on the results of all algorithms, we generated a heatmap to show the infiltration of immune cells in each sample among the three subtypes. In addition, we collected 31 classic immune checkpoints and compared the expression of immune checkpoint-related genes among the three subtypes. Based on the ssGSEA algorithm, we evaluated the immune cell and immune function scores of the 945 PC samples and used Spearman correlation tests to explore potential associations between each TLR signaling pathway-related gene and immune cell infiltration and immune function. Finally, we used the “ggscatterstats” function (ggstatsplot package) to construct scatter plots showing the associations between four immune cells or functions (CCR, TIL, Checkpoint, Parainflammation) and TLR signaling pathway scores.
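
Spearman correlation, used above to relate gene expression or pathway scores to immune scores, is the Pearson correlation of the ranked values. The sketch below is a plain-Python illustration with hypothetical function names; ties receive average ranks, and no p-value is computed.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

Because only ranks enter the calculation, any monotonic relationship between a pathway score and an immune score gives a coefficient of 1 or -1, regardless of scale.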

Drug sensitivity analysis based on PC classifier

oncoPredict is an R package that uses gene expression levels to predict drug sensitivity, which can be applied to various in vitro and in vivo settings for drug and biomarker development, enabling prediction of cancer patients’ response to targeted drugs37. In this study, we used the oncoPredict package to predict the treatment response of 945 PC patients to 184 targeted drugs, and present the 16 drugs most commonly used in clinical practice. Finally, we analyzed potentially beneficial drugs for patients with different subtypes of PC. Comparisons between two groups used the wilcox.test function, and comparisons among three groups used the kruskal.test function.
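
For readers unfamiliar with the three-group comparison, the Kruskal-Wallis statistic can be written out directly. The sketch below is illustrative, not a substitute for R's kruskal.test: the function names are hypothetical, ties are handled with the standard correction factor, and the p-value helper is valid only for three groups, where the chi-square reference distribution has 2 degrees of freedom and its upper tail is exp(-H/2).

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Tie-corrected Kruskal-Wallis H statistic for a list of groups."""
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * total - 3.0 * (n + 1)
    # Tie correction: divide by 1 - sum(t^3 - t) / (n^3 - n).
    tie_counts = {}
    for v in pooled:
        tie_counts[v] = tie_counts.get(v, 0) + 1
    ties = sum(t ** 3 - t for t in tie_counts.values())
    return h / (1.0 - ties / (n ** 3 - n))


def p_value_three_groups(h):
    """Upper-tail chi-square probability with 2 degrees of freedom."""
    return math.exp(-h / 2.0)
```

With predicted sensitivity values grouped by subtype (C1, C2, C3), `kruskal_wallis_h` followed by `p_value_three_groups` gives the asymptotic test for one drug.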

Development of a robust toll-like receptor-based prognostic model (TLR-PM) based on the machine learning network framework

In the initial stage of model building, the TCGA cohort was used as the training cohort for developing a prognostic model for PC, while the remaining 10 cohorts were used as validation cohorts. The MetaCohort cohort was the combination of the 10 validation cohorts, while the Meta_cohort cohort included all cohorts, including the TCGA cohort. This grouping aimed to maximize the robustness and applicability of the model. Based on Spearman correlation, we identified a gene network that was co-expressed with the 102 genes related to TLR signaling pathways. This network contained a total of 3,860 genes, and genes that were prognostic in at least 7 cohorts were considered key genes for PC prognosis. We identified 39 genes that met this criterion, including 6 genes with completely opposite prognostic contributions. We removed these 6 genes and ultimately retained 33 prognostic key genes for subsequent machine learning modeling using various algorithms, including generalized boosted regression modeling (GBM), Ridge, partial least squares regression for Cox (plsRcox), random survival forest (RSF), elastic net (Enet), stepwise Cox, CoxBoost, supervised principal components (SuperPC), Lasso, and survival support vector machine (survival-SVM). Specifically, we designed and employed a series of algorithms, and established multiple PC prognostic models using 10-fold cross-validation. For each model, we calculated Harrell's concordance index (C-index) for all validation datasets, and selected the model with the highest average C-index as the optimal choice. The final optimal model, the TLR-PM, was composed of 4 genes (NT5E, TGFBI, ANLN, and FAM83A).
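
The selection criterion above rests on Harrell's C-index: among comparable patient pairs, the fraction in which the patient with the shorter observed survival has the higher predicted risk. The following sketch is a simple quadratic-time illustration with hypothetical function names; ties in risk count as half-concordant, pairs with tied times are skipped, and the dictionary passed to `best_model` is assumed to hold one C-index per validation cohort for each candidate model.

```python
def harrell_c_index(times, events, risks):
    """Harrell's C-index; events are 1 (death observed) or 0 (censored)."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when i had the event strictly before
            # j's observed time (j may be censored or have a later event).
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


def best_model(cindex_by_model):
    """Pick the model with the highest mean C-index across cohorts."""
    means = {m: sum(v) / len(v) for m, v in cindex_by_model.items()}
    return max(means, key=means.get), means
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, which is why the mean across validation cohorts is a reasonable single number for ranking candidate models.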

Validation of predictive performance of PC prognosis model

Using the survival and survminer packages, survival curves were drawn for the above PC cohorts to test the clinical usability of the TLR-PM, with high- and low-risk groups defined based on the median expression of prognostic key genes. We also compared the C-index of several key clinical characteristics (such as age, gender, grade, and stage) with that of the risk index of our constructed TLR-PM, using boxplots to visualize the results. To compare the performance of our model with published models, we systematically searched the PubMed database and identified 8 previously published studies on the construction of PC prognostic models38,39,40,41,42,43,44,45. Based on the model genes and risk coefficients provided in these studies, risk scores were calculated for each PC cohort, and the prognostic performance of the TLR-PM relative to these models was evaluated by comparing C-index values.
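
For published signatures that report genes and coefficients, the risk score is typically a weighted sum of expression values, and patients are then split at the cohort median. The sketch below assumes that form; the gene names and coefficients in any call are placeholders, the function names are hypothetical, and samples exactly at the median are assigned to the low-risk group by convention here.

```python
from statistics import median


def risk_score(expression, coefficients):
    """Weighted sum of expression values for the signature genes."""
    return sum(coefficients[g] * expression[g] for g in coefficients)


def split_by_median(scores):
    """Label each sample 'high' or 'low' relative to the cohort median."""
    cutoff = median(scores.values())
    return {s: ("high" if v > cutoff else "low") for s, v in scores.items()}
```

For example, `split_by_median({s: risk_score(expr[s], coefs) for s in expr})` labels every sample in a cohort, where `expr` maps sample identifiers to per-gene expression dictionaries.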

Results

Preprocessing and initial quality control of single cell data

Based on the dataset CRA001160, we obtained a total of 24 PC samples and 11 adjacent tissue samples. We filtered out cells with fewer than 200 detected genes and genes detected in fewer than 3 cells. During the initial quality control process, we retained cells with a gene number between 200 and 6500, a sum of gene expression greater than 1000, and a mitochondrial gene expression ratio not exceeding 10% (Supplementary Fig. 1A-D). After filtering, we obtained 57,024 cells and 23,376 genes from normal and tumor tissues. The correlation coefficients between sequencing depth and gene number, mitochondrial genes, and ribosomal genes were 0.9, 0.01, and −0.07, respectively (Supplementary Fig. 1E-G). Subsequently, we identified the top 2,000 highly variable genes between cells for subsequent analysis, and marked the top 10 genes (Supplementary Fig. 1H). Harmony analysis was used to remove batch effects from raw single-cell data (Supplementary Fig. 2A). Based on cell-specific marker genes, we divided all cells from the 35 samples into 9 cell types, including acinar cell (PRSS1, CELA3A, CTRB1, CTRB2, and REG1B), B cell (MS4A1, CD79A, VPREB3, and CD19), ductal cell (TFF1, TFF2, FXYD2, FXYD3, KRT19, MMP7, KRT8, and KRT18), endocrine cell (CHGB, CHGA, INS, and IAPP), endothelial cell (CDH5, PLVAP, VWF, CLDN5, and RAMP2), fibroblast cell (LUM, DCN, COL1A1, and COL1A2), macrophage cell (AIF1, APOC1, CD14, and S100A9), stellate cell (RGS5), and T cell (CD3D and CD3E). A heatmap shows the enrichment of marker genes and functional pathways in biological processes of each cell cluster (Supplementary Fig. 2B). The horizontal volcano plot shows the expression of the marker genes of each cell cluster, with yellow indicating up-regulation and purple indicating down-regulation; the top five marker genes are shown (Supplementary Fig. 2C).

TLR signaling pathway related gene set scores of 9 cell subsets in PC tissue

We merged the 24 PC samples and 11 adjacent tissue samples and used two nonlinear dimensionality reduction algorithms, t-distributed stochastic neighbor embedding (tSNE) and Uniform Manifold Approximation and Projection (UMAP), to reduce the dimensionality of the merged data and annotate cell clusters, visualizing the distribution and clustering of the nine cell types in PC tissues and adjacent tissues. PC tissues showed significant enrichment of stellate cells, macrophages, T cells, B cells, and fibroblasts, while adjacent tissues showed more acinar cell enrichment (Fig. 1A, B). To observe the expression and distribution of TLR signaling pathway genes in PC tissues, we used five algorithms (AUCell, UCell, Singscore, ssGSEA, and AddModuleScore) to score the TLR signaling pathway gene set in the single-cell data; the combined Scoring was the sum of the five algorithm scores. Figure 1C shows the TLR signaling pathway gene set scores for the nine cell types: the signal was strong in macrophages and endothelial cells and weak in B cells, stellate cells, and acinar cells (Fig. 1C). To compare the distribution of TLR signaling pathway gene expression between tumor and normal tissues, we display the gene set scores for each cell cluster under each algorithm in Fig. 1D. Across algorithms, almost all cell types had higher gene set scores in tumor tissues than in normal tissues, with ductal cells, macrophages, endocrine cells, and T cells showing significantly higher scores in tumor tissues (Fig. 1D). The tSNE and UMAP plots of each cell cluster with the corresponding gene set scores showed that, under almost all algorithms, macrophages contained more cells with high gene set scores (Fig. 2). Figure 3A shows the expression differences of TLR signaling pathway-related genes among the nine cell types in normal and tumor tissues.

Fig. 1

Annotation and gene set scores of 9 cell clusters. (A, B) Dimensionality reduction and annotation of the 9 cell clusters in normal and tumor tissues using the tSNE and UMAP algorithms. (C) Gene set scores of the 9 cell clusters under the 5 algorithms and Scoring. The color of the circle represents the average representation, and the size of the circle represents the percentage representation. (D) Comparison of gene set scores of the 9 cell clusters between pancreatic cancer tissue and normal tissue.

Fig. 2

Gene set scores for normal and tumor tissues viewed from tSNE and UMAP perspectives.

Fig. 3

Pseudotime trajectory analysis and cell–cell communication. (A) Expression comparison of TLR signaling pathway-related genes in the 9 cell clusters (normal tissue and cancer tissue). (B) Differentiation trajectory tree of ductal cells annotated by pseudotime. (C) TLR signaling pathway gene set Scoring during ductal cell differentiation. (D) Heatmap of 102 TLR signaling pathway-related genes identified by branch expression analysis modeling.

Pseudotime trajectory analysis and cell–cell communication

Cell trajectory analysis (CTA) reconstructs the differentiation process of cells over time by building their trajectories, which helps infer the evolution of cells at the single-cell level. Specifically, based on the time-ordered gene expression of each cell, CTA arranges each cell along its corresponding trajectory to generate an intuitive lineage tree, which further predicts the differentiation and development trajectories of cells. We used Monocle to illustrate the evolution trajectory of ductal cells, an important cell component in PC tissues. The tree diagram clearly shows ductal cells at three different differentiation stages (Fig. 3B). During differentiation, the TLR signaling pathway gene set score did not change significantly, suggesting that the TLR signaling pathway is relatively conserved during the evolution of ductal cells (Fig. 3C). Supplementary Fig. 3 shows the other five TLR signaling pathway gene set scores during the differentiation of ductal cells. We also created a heatmap to display the expression of TLR signaling pathway-related genes in ductal cells at different stages of development (Fig. 3D). This figure shows that individual genes in the TLR signaling pathway change dynamically during the evolution of ductal cells: genes such as AKT3, NFKBIA, TLR5, and TLR1 are highly expressed at the early stage, while genes such as TOLLIP, FADD, TICAM1, and MAPK3 are highly expressed at the late stage. Because some genes in the pathway are upregulated at every stage of ductal cell evolution, this may explain why the overall gene set score did not change significantly.

A bubble chart shows the functional enrichment of the nine cell types within PC tissue, including immune, metabolism, signaling, and proliferation pathways (Fig. 4C). Because cells in the human body often interact with other cells when performing physiological functions, we further analyzed the intercellular communication patterns within PC tissue (Fig. 4A and B), with ductal cells (Ductal) as the main research object. Based on gene set scores, we divided the ductal cells into TLR high Ductal and TLR low Ductal, and the interaction networks between these two groups and other cell types are shown in Fig. 5B. Ductal cells (both TLR high Ductal and TLR low Ductal) interact with other cells primarily through ATP-CD74 as the communication medium (ligand-receptor or receptor-ligand), in addition to MIF-(CD74 + CD44), SPP1-CD44, MIF-(CD74 + CXCR4), etc. (Fig. 5A). Figure 5C details the communication patterns between TLR high Ductal or TLR low Ductal, acting as receptor or ligand cells, and several other cell types. For example, the ligand APP on the surface of endocrine cells has a strong binding strength with CD74 on the surface of TLR low Ductal (Fig. 5C).

Fig. 4

Cell-cell communication network among ductal cells and other cells. (A, B) Circle plots showing the number (left) and weights/strength (right) of cell interactions in tumor microenvironment. (C) Activity of 9 cell clusters in immune-related pathways. The color of the circles represents the value of logFC, while the size of the circles indicates the value of FDR.

Fig. 5

Ligand-receptor relationships between Ductal cell and other cells during intercellular communication.

The identification of ductal cell malignancy and gene set scores

Copy-number karyotyping of aneuploid tumors (CopyKAT) infers chromosomal copy number from single-cell transcriptome data and thereby determines whether a cell is a malignant aneuploid cell or a normal diploid cell. On this basis, we divided ductal cells from PC tissues into a malignant ductal cell group and a normal ductal cell group, and show their distribution in the tSNE plot (Fig. 6B). To determine the activity of the TLR signaling pathway in the two groups, we also analyzed their gene set scores. The expression of genes related to the TLR signaling pathway differed significantly between the malignant and normal ductal cell groups, with normal ductal cells having a higher TLR signaling pathway gene set score (Fig. 6A). Further, Fig. 6C shows the distribution of gene set scores for malignant and normal ductal cells under the six scoring algorithms. Because all ductal cells used for this classification came from PC tissue, one possible explanation is that the TLR signaling pathway is upregulated in normal ductal cells and induces their conversion to malignant ductal cells, after which the pathway activity is lost.

Fig. 6

Identification of ductal cell malignancy and gene set scores. (A) Comparison of six gene set algorithm scores of malignant ductal cell and normal ductal cell. (B) The distribution of malignant ductal cell and normal ductal cell in tSNE diagram. (C) Gene set scores for malignant ductal cell and normal ductal cell viewed from tSNE perspectives.

Construction of a classifier for PC patients based on TLR signaling pathway score

Based on the expression of genes related to the TLR signaling pathway (Fig. 7A), we classified 945 PC patient samples from three platforms (TCGA, GEO, and ICGC) into three subtypes (C1, C2, and C3). The violin plot shows the enrichment scores of the three subtypes: C2 samples had the highest TLR signaling pathway score, indicating higher activity of related genes, C3 samples had the lowest, and C1 represented normal TLR signaling pathway activity (Fig. 7B). The heatmap shows the expression of TLR signaling pathway-related genes among the three subtypes. With few exceptions, genes were expressed at higher levels in C2 than in C1 and C3, which again confirms that C2 has higher TLR signaling pathway activity (Fig. 7C). We also analyzed the prognostic differences among the three subtypes and found that patients with the C3 subtype had a better prognosis and a higher survival rate, indicating that low TLR signaling pathway activity is associated with a more favorable prognosis in PC patients (Fig. 7D).

Fig. 7

Cluster analysis of pancreatic cancer based on TLR signaling pathway. (A) Heat maps show pancreatic cancer samples clustered according to TOLL score. (B) Violin plot showing TOLL scores for 3 clusters, with C2 having the highest score and C3 having the lowest. (C) The expression of genes related to the TLR signaling pathway in the 3 clusters is shown, with red indicating upregulation and blue indicating downregulation. (D) Comparison of survival curves of 3 clusters. (E) Comparison of the activity of cell death-related signaling pathways among the 3 clusters, with red indicating upregulation and green indicating downregulation. (F) Comparison of the activity of immune signaling pathways between the 3 clusters, with red indicating upregulation and green indicating downregulation. (G) Comparison of the activity of the metabolic signaling pathways between the 3 clusters, with blue indicating upregulation and yellow indicating downregulation.

Characterization of cell death, metabolism, and immune-related pathways among three subtypes of PC

To fully explore the intrinsic molecular characteristics of PC patients in different subtypes, we collected 11 classic cell death pathways, 42 classic metabolic pathways, and 23 classic immune pathways from the KEGG database, and evaluated the signal intensity of cell death, metabolism, and immune signaling in 945 PC samples. The results are displayed in heatmaps (Fig. 7E-G). In subtype C2, the signaling of cell death pathways such as immunogenic cell death, apoptosis, phagocytosis, and disulfidptosis was enhanced, while that of necroptosis and necrosis was weakened; the opposite pattern was observed in subtype C3 (Fig. 7E). For immune-related pathways, the signaling of the T cell receptor signaling pathway, B cell receptor signaling pathway, cytokine receptor interaction, and primary immunodeficiency was enhanced in subtype C2, while that of base excision repair, homologous recombination, and spliceosome was enhanced in subtype C3 (Fig. 7F). For metabolic pathways, the signaling of butanoate metabolism, tyrosine metabolism, and pyruvate metabolism was enhanced in subtype C2, while that of arachidonic acid metabolism, ether lipid metabolism, and galactose metabolism was weakened; again, the opposite pattern was observed in subtype C3 (Fig. 7G).

In addition, we used ESTIMATE analysis to calculate the proportion of immune cells, stromal cells, and tumor cells in tumor tissues, and obtained immune scores, stromal scores, ESTIMATE scores, and tumor purities for each subtype (C1, C2, and C3). The results showed that the tumor tissues of subtype C2 had a higher proportion of stromal cells and immune cells with lower tumor purity (Fig. 8A). Subsequently, to further explore the differences in the immune microenvironment between the three subtypes, we analyzed the differences in immune cell infiltration between the three subtypes using seven algorithms provided by the TIMER2.0 platform for predicting the degree of infiltration of immune cells. The results showed that regardless of which algorithm was used, the degree of infiltration of immune cells in subtype C2 was significantly higher than that of the other two subtypes (Fig. 8B). Finally, Fig. 8C shows the expression of immune checkpoint-related genes in the three subtypes. The vast majority of genes such as PDCD1, CD27, CD48, CD8A, TNFRSF14, BTLA, JAK2, and LDHA were significantly upregulated in subtype C2 compared with subtype C3 (Fig. 8C), indicating that there was excessive activity of immune checkpoints in subtype C2, which may lead to a certain degree of tumor cell immune evasion and result in excessive accumulation of immune cells. This series of results suggests that the abnormal immune signaling in subtype C2 leads to abnormal accumulation of immune cells with high immune scores and low tumor purity. This abnormal immune signal may be an important reason for its poor survival status.

Fig. 8

Immune microenvironment analysis of pancreatic cancer based on 3 clusters. (A) The ESTIMATE analysis calculates the ratio of immune cells, stromal cells, and tumor cells within the cancer microenvironment of the 3 clusters. (B) Seven algorithms are used to analyze the differences in immune cell infiltration levels among 3 clusters. (C) The expression differences of immune checkpoint-related genes among the 3 clusters.

TLR signaling pathway and immune microlandscape in PC

Given that the TLR signaling pathway is an important component of immune signaling, we further evaluated the immune cell and immune function scores of 945 PC samples using the ssGSEA algorithm, and explored the potential association between TLR signaling pathway-related genes and 29 types of immune cells or functions using the Spearman correlation test. We also calculated the correlation between TLR signaling pathway scores and immune cell infiltration. Different TLR signaling pathway-related genes are associated with various immune cells or functions to varying degrees, among which RAB33A, ESR1, and ESR2 are positively correlated with most immune cells or functions, while TOMM20, DDB1, and CUL4A are negatively correlated (Fig. 9A). Bubble plots show the correlation between classical immune-infiltration-related cells and TLR signaling pathway scores, with most immune-infiltration cells showing a strong positive correlation with TLR signaling pathway scores (Fig. 9B). We selected the four immune cells or functions with the highest correlation (CCR, TIL, Check.Point, and Parainflammation) for correlation analysis with TLR scores, and the R values were 0.62, 0.65, 0.65, and 0.57 for CCR (Fig. 9C), TIL (Fig. 9D), Check.Point (Fig. 9E), and Parainflammation (Fig. 9F), respectively.

Fig. 9

Analysis of the correlation between TOLL score and immune cell infiltration. (A) Correlation between TLR signaling pathway-related genes and immune cells or functions. (B) The correlation between TOLL score and immune cell infiltration is represented by the color of the circles, with larger circles indicating greater statistical significance. (C-F) Correlation between TLR scores and the scores of 4 immune cells or functions (CCR, TIL, Check.Point and Parainflammation).

Significance of PC classifier in drug therapy

To further explore the potential value of the TLR signaling pathway in clinical drug treatment for PC patients, we predicted the response of the 945 PC patients to various targeted drugs using the oncoPredict software package, analyzed which drugs may benefit patients in the different subtypes, and drew box plots to assess how the TLR signaling pathway-related subtypes differ in sensitivity to 16 commonly used targeted drugs for PC. Specifically, patients in the C2 subtype showed high sensitivity to 12 drugs including PD173047, carmustine, CDK9-5038, and gefitinib, while patients in the C3 subtype showed high sensitivity to drugs including lapatinib, crizotinib, carmustine, and dabrafenib (Fig. 10). These results may provide scientific guidance for the development of individualized clinical treatment for PC patients.

Fig. 10

Sensitivity analysis of 3 clusters to 16 common pancreatic cancer targeting drugs.

A robust prognostic model of PC was constructed based on machine learning network framework

Based on Spearman correlation, we identified a gene network that has co-expression relationships with 102 genes related to the TLR signaling pathway. This network contains a total of 3,860 genes, and genes with prognostic value in at least 7 cohorts were considered key genes for PC prognosis. We identified 39 genes that met this criterion, 6 of which had completely opposite prognostic contributions (Supplementary Fig. 5). We removed these 6 genes and ultimately retained 33 prognostic key genes for subsequent machine learning modeling using various algorithms. It is worth noting that the 33 prognostic key genes used for modeling acted as risk factors in almost all PC cohorts (Fig. 11A). In the training cohort, we used 10 algorithms and designed a series of algorithm combinations to establish multiple PC prognostic models (Supplementary Table 2). For each model, we calculated the C-index for all validation datasets and displayed the average C-index on the rightmost side (Fig. 12A). We selected the RSF algorithm, which had the highest average C-index (0.637), to build the final model, resulting in an optimal PC prognostic model (TLR-PM) composed of 4 model genes (NT5E, TGFBI, ANLN, FAM83A).
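The model-selection criterion described above (computing a C-index per validation cohort and averaging it per model) can be sketched as follows. This is a simplified, standard-library illustration of Harrell's concordance index, not the implementation used in this study. The function names, the handling of tied survival times (such pairs are skipped), and the example C-index values are all assumptions.

```python
from statistics import mean


def concordance_index(times, events, risk):
    """Harrell's C-index: share of comparable pairs ranked correctly.

    A pair (i, j) is comparable when subject i had an observed event
    (events[i] == 1) strictly before subject j's follow-up time.
    Pairs with tied times are skipped in this simplified sketch.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0   # higher risk failed earlier
                elif risk[i] == risk[j]:
                    concordant += 0.5   # tied risk counts as half
    return concordant / comparable


def select_best_model(cindex_by_model):
    """Return (model name, mean C-index) with the highest mean C-index."""
    averages = {name: mean(vals) for name, vals in cindex_by_model.items()}
    best = max(averages, key=averages.get)
    return best, averages[best]


# Hypothetical C-indexes per validation cohort (not values from the study).
example = {
    "model_a": [0.61, 0.66, 0.63],
    "model_b": [0.58, 0.70, 0.59],
}
print(select_best_model(example))
```

Averaging over validation cohorts rather than picking the best single-cohort result favors models that generalize, which matches the stated aim of a model that is stable across cohorts.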

Fig. 11

Verification of TLR-PM accuracy. (A) Risk profiles of 33 prognostic key genes in 11 pancreatic cancer cohorts, with gray indicating no significance and orange indicating risk. (B-D) The prediction accuracy of TLR-PM was compared with several commonly used clinical measures in the GSE62452, GSE79668, and TCGA cohorts. (E) Predictive performance of TLR-PM compared with eight other published pancreatic cancer prognostic models.

Fig. 12

Construction of TLR-PM based on a series of machine learning algorithms. (A) The C-indexes of multiple machine-learning algorithm combinations in the 11 cohorts. The far right shows the average C-index of each algorithm combination across the cohorts. (B) Survival curves of patients in the high- and low-risk groups in the TCGA and Meta_cohort cohorts.

Validation of prognostic model for PC

Based on the expression profiles of the four genes included in the TLR-PM, we further calculated the score for each sample in the PC cohorts and divided the samples into high- and low-risk groups based on the median. To evaluate the prognostic ability of the TLR-PM, we downloaded the prognostic data and plotted Kaplan-Meier curves for the PC cohorts (Supplementary Fig. 4). The results showed that in the TCGA cohort and the Meta_cohort cohort, the survival time of the low-risk group was significantly longer, while the high-risk group had clearly unfavorable outcomes (Fig. 12B). To further verify the prediction accuracy of the TLR-PM, we compared it with several commonly used clinical indicators (age, grading, stage, etc.). The results showed that in the GSE62452, GSE79668, and TCGA cohorts, the TLR-PM had a higher C-index than the other clinical features (Fig. 11B-D), and the differences were statistically significant, indicating that the TLR-PM has superior predictive performance and can stably and reliably predict the prognosis of PC patients.
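The median split and the Kaplan-Meier estimate described above can be sketched with the Python standard library. This is an illustrative sketch rather than the code used in this study. The function names and example data are hypothetical, and assigning samples with a score exactly equal to the median to the low-risk group is an assumption.

```python
from statistics import median


def split_by_median(scores):
    """Label each sample 'high' or 'low' risk relative to the median score.

    Assumption: a score equal to the median is assigned to 'low'.
    """
    cutoff = median(scores)
    return ["high" if s > cutoff else "low" for s in scores]


def kaplan_meier(times, events):
    """Kaplan-Meier estimate as a list of (event time, survival probability).

    events[i] is 1 for an observed death and 0 for a censored sample.
    """
    survival = 1.0
    curve = []
    event_times = sorted({t for t, e in zip(times, events) if e == 1})
    for t in event_times:
        at_risk = sum(1 for tm in times if tm >= t)
        deaths = sum(1 for tm, e in zip(times, events) if tm == t and e == 1)
        survival *= 1 - deaths / at_risk
        curve.append((t, survival))
    return curve


# Hypothetical risk scores and follow-up data (not data from the study).
scores = [0.2, 0.9, 0.4, 0.7, 0.1, 0.8]
times = [30, 6, 24, 10, 36, 8]       # months of follow-up
events = [0, 1, 1, 1, 0, 1]          # 1 = death observed, 0 = censored

groups = split_by_median(scores)
for label in ("high", "low"):
    idx = [i for i, g in enumerate(groups) if g == label]
    curve = kaplan_meier([times[i] for i in idx], [events[i] for i in idx])
    print(label, curve)
```

A formal comparison of the two curves would normally add a log-rank test, which is omitted here to keep the sketch short.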

With the increasing sharing of large-scale data, a growing number of studies have built PC prognostic models using algorithms such as LASSO and CoxBoost. To identify the advantages of the TLR-PM, we collected eight published articles on PC prognostic model construction from PubMed and compared the predictive ability of the TLR-PM with that of these models based on the C-index (Fig. 11E). As shown in the figure, among the 12 PC cohorts, the TLR-PM had the highest C-index in 7 cohorts, especially in the Meta_cohort cohort and MetaCohort cohort. In addition, the C-index of the TLR-PM in cohorts such as GSE57495, GSE71729, GSE85916, GSE102238, and ICGC AU was also ranked fourth, third, second, and second respectively. However, the models that achieved a higher C-index in some cohorts did not maintain that advantage across all cohorts, whereas the TLR-PM was more stable and reliable. In summary, the TLR-PM, a prognostic model of PC built from four genes, has greater clinical application value than the previously published PC prognostic models compared here.

Discussion

PC, one of the deadliest cancers in the world, exhibits high tumor heterogeneity and is difficult to detect early, making it an aggressive and relatively difficult-to-treat tumor46,47,48. Toll-like receptors (TLRs) are pillars of the immune system and appear to play a role in the pathological evolution of PC49. Based on large-scale, multi-omics analysis, our study explored the relationship between the TLR signaling pathway and the complex tumor microenvironment of PC, and developed a robust prognostic model for PC based on TLR signaling pathway-related genes using a machine learning framework. This model provides important guidance for personalized clinical management and prognostic follow-up prediction for PC patients.

To characterize the cellular landscape within PC tissues, we first collected single-cell data and divided all cells from PC and adjacent samples into nine cell subpopulations based on specific genes. We then assessed the gene set scores for the TLR signaling pathway for each cell type, and found that all nine cell subpopulations, including ductal cells, exhibit active TLR signaling characteristics in PC tissues. In the context of epigenetic remodeling and transforming growth factor beta signaling mutations, ductal cells in the pancreas can serve as origin cells for pancreatic ductal adenocarcinoma, with their cell plasticity and reprogramming being central to epithelial transformation50. We found that ductal cells from PC tissues exhibit significant differences in expression of genes related to the TLR signaling pathway during evolution, indicating that changes in the activity of the TLR signaling pathway are an important part of the process of ductal cell transformation. It is worth noting that we found that ductal cells from PC tissues exhibit highly conserved TLR signaling features during evolution, which may be due to the fact that there are many types of genes involved in the TLR signaling pathway, and a certain number of genes are upregulated during the early, middle, and late stages of ductal cell evolution. This further emphasizes their persistent high activity status during tumor progression. Based on CopyKAT analysis, we further distinguished malignant ductal cells from normal ductal cells, with the latter exhibiting a higher gene set score for the TLR signaling pathway, further indicating that the TLR signaling pathway is an important participant in PC induction. Specifically, upregulation of the TLR signaling pathway in normal ductal cells leads to their conversion into malignant ductal cells, and then their activity is downregulated. 
We also characterized the intercellular communication network between ductal cells and other cells, and noticed that regardless of the expression levels of genes related to the TLR signaling pathway, there are strong signaling connections between ductal cells and other cells, although the signaling connection patterns differ between TLR-high and TLR-low ductal cells.

Next, to classify PC patients reasonably, we divided 945 PC patient samples into 3 subtypes based on the expression of genes related to the TLR signaling pathway. Survival curves showed that patients in C3 had better outcomes than those in C1 and C2, indicating that the activation of the TLR signaling pathway is associated with poor prognosis in PC, which is consistent with previous research conclusions51,52,53. On the other hand, there are significant differences in cell death, metabolism, and immune pathways between the C1, C2, and C3 subtypes, which may be an important factor for poor prognosis in patients with C2. Considering the important position of the TLR signaling pathway in the immune system and the fact that the progression of PC depends heavily on its immune microenvironment54,55, we further explored the potential connection between the TLR signaling pathway and the PC immune microenvironment. ESTIMATE is a tool that leverages gene expression data to predict tumor purity and the proportions of infiltrating stromal and immune cells within tumor tissue. Using this tool, we calculated the stroma proportion, immune cell infiltration proportion, and tumor purity for each of the three subtypes. The results indicated that subtype C2 contained more stromal and immune cells within tumor tissue, resulting in lower tumor purity. Using the algorithms provided by the TIMER2.0 platform, we found that the C2 subtype exhibited significant immune cell infiltration. This abnormal, excessive immune infiltration is a key factor contributing to the poor immune response and treatment resistance in PC56. Studies have shown that a hallmark of PC is an immunosuppressive stromal response, characterized primarily by abnormal changes in stromal cells and the extracellular matrix within the tumor microenvironment. 
This stromal response is typically accompanied by extensive immune cell infiltration, including neutrophils and macrophages, further supporting our findings. The stromal response and immune imbalance driven by abnormal TLR signaling are among the factors contributing to the progression of PC57,58,59. Additionally, tumor immune escape mechanisms pose a significant obstacle to effective PC therapy. Our results demonstrate that immune checkpoint-related genes in the C2 subtype are significantly upregulated, suggesting that abnormally activated TLR signaling contributes to immune deficiency in PC, which in turn leads to treatment challenges and poor prognosis60. In conclusion, the activation of TLR signaling and the upregulation of immune checkpoint-related genes lead to the suppression of immune cell activity, further exacerbating the immunosuppressive stromal response and recruiting additional immune cells to the tumor microenvironment. This type of immune microenvironment, characterized by immune suppression and excessive accumulation of immune cells, is a key factor contributing to the poor prognosis of pancreatic cancer patients61,62.

Currently, drugs such as carmustine, gefitinib, lapatinib, carmustine, and dabrafenib have been shown to have certain anti-cancer effects in PC63,64,65,66,67. Using the oncoPredict software package, we predicted drug sensitivity in PC and found that patients in the C2 subtype are highly sensitive to 12 drugs including PD173047, carmustine, CDK9-5038, and gefitinib, while patients in the C3 subtype are highly sensitive to drugs such as lapatinib, crizotinib, carmustine, and dabrafenib. This result may help guide personalized medication for PC patients.

In order to find accurate and effective prognostic biomarkers, we applied a large set of machine learning algorithms and developed an optimal PC prognostic model (TLR-PM). TLR-PM is composed of 4 genes (NT5E, TGFBI, ANLN, and FAM83A) whose expression is closely related to the occurrence and progression of PC, as shown in previous studies. NT5E has been repeatedly reported to be a potential biomarker for PC prognosis and immunotherapy68,69; TGFBI, an extracellular matrix-interacting protein, enhances glycolysis and promotes PC cell migration by binding to integrin αVβ5, which is associated with poor prognosis in PC patients70; in addition, the disruption of TGFBI inhibits the polarization of macrophages and enhances their stimulation of PC cell growth, significantly boosting anti-tumor immunity and making PC patients sensitive to chemotherapy71; ANLN is an actin-binding protein that plays an important role in many malignant tumors. Its expression is significantly upregulated in PC cell lines and is closely related to tumor size, differentiation, lymph node metastasis, and poor prognosis based on the EZH2/miR-218-5p/LASP1 pathway72; FAM83A (family with sequence similarity 83 member A) has carcinogenic effects in various cancer types. In PC, FAM83A binds to β-catenin and inhibits the assembly of the cytoplasm-destabilizing complex, thus inhibiting its phosphorylation and degradation. FAM83A is also a direct downstream target of the Wnt/β-catenin signaling pathway. Upregulated FAM83A is closely related to the growth and metastasis of PC73. These conclusions are consistent with our research results and indicate that the four TLR signaling pathway-related genes composing TLR-PM are all risk factors for the prognosis of PC patients. 
Finally, to verify the accuracy of the model, we carried out a series of validation analyses, which indicated that the TLR-PM is a reliable prognostic prediction tool with the potential to serve as a biomarker for PC patients.

However, there are several limitations to our study. First, all PC samples were obtained from public databases, and prospective clinical cohorts are needed to validate the reliability of the prognostic models. Second, the PC samples from the CRA001160 dataset we used were limited to pancreatic ductal adenocarcinoma, and other pathological subtypes of PC were not included, which reduces the rigor of our exploration of the single-cell landscape. Finally, the lack of causality remains a major limitation of machine learning. The algorithms we employed may not fully capture the critical relationship between model-related genes and PC prognosis. Therefore, in future studies, we plan to incorporate real PC patient samples from clinical settings for further exploration and enhance in vitro and in vivo experiments to validate the potential mechanisms of the genes associated with our model in PC.

Conclusion

This study reveals for the first time the TLR signaling landscape at single-cell resolution in PC, with the strongest TLR signaling observed in macrophages and endothelial cells. The TLR signaling in various cell types gradually increases with the development of PC, but gradually weakens with the increasing malignancy of ductal epithelial cells. Based on TLR signaling, a clinical classifier and a prognostic model for PC have been developed for the first time, which have the potential to help clinicians identify the clinical heterogeneity of PC patients in a timely manner and to develop personalized risk assessments and treatment plans informed by the tumor microenvironment.