Introduction

Pancreatic cancer, arising from pancreatic duct and acini epithelial cells, is highly malignant1,2,3. It is notorious for its insidious onset, challenging early diagnosis, rapid progression, and poor prognosis, and is often dubbed the “king of cancers”4,5,6. The most common subtype, pancreatic ductal adenocarcinoma (PAAD), accounts for 80–90% of cases according to the WHO classification7,8. Treatment options for PAAD include surgery, radiotherapy, drug therapy, targeted therapy, and immunotherapy9,10,11,12. Despite these treatments, PAAD is frequently diagnosed late, leading to high recurrence and metastasis rates. Patients face a grim prognosis, with a five-year survival rate of less than 10%13. Traditional prognostic indicators such as tumor stage and invasion depth are insufficient for the needs of precision medicine14. Pancreatic cancer displays diverse histological subtypes and genomic profiles, resulting in varied outcomes and responses to treatment. Hence, it is essential to develop a predictive algorithm that can identify high-risk patients preoperatively by analyzing clinical and genomic features, paving the way for personalized precision treatment.

Oxidative phosphorylation (OXPHOS), as an important process of cellular energy metabolism, has complex interrelationships with various tumors15,16,17. Recent studies have shown that treatment with oxidative phosphorylation inhibitors can improve the anti-tumor response of certain cancers, such as melanoma18, lymphoma19, colorectal cancer20, leukemia21, and PAAD22, by regulating the “Warburg effect”. Abnormal expression of AMP-activated protein kinase (AMPK)23, nuclear factor erythroid 2-related factor 2 (Nrf2)24, mammalian target of rapamycin (mTOR)25, and sirtuin (SIRT) family genes26 directly or indirectly affects the oxidative phosphorylation process, thereby affecting the growth, metabolism, and metastasis of PAAD. Therefore, inhibition of oxidative phosphorylation might offer a safe and effective avenue for pancreatic cancer treatment27. Routine genetic testing may hinder the clinical application and promotion of these genes and molecules due to their high cost. Hence, it is essential to have efficient methods for categorizing the molecular characteristics of pancreatic cancer.

Analyzing PAAD specimens pathologically is essential for classifying patient prognosis. The characteristics displayed in the diseased tissue indicate the collective influence of the tumor surroundings on the behavior of cancer cells. The field of pathology has undergone a major transformation due to the increasing implementation of artificial intelligence28. Pathomics involves utilizing artificial intelligence to transform pathological images into detailed, easily analyzable data, including quantitative aspects like texture, shape, edge sharpness, and biological attributes. It is used to measure pathological diagnosis, molecular expression, and disease prognosis. The integration of histopathology and genomics has been attempted in some cancers, resulting in more precise patient prognostic stratification29,30. However, there have been limited reports on the utilization of computer-aided histopathological analysis in the diagnosis and prognosis prediction of PAAD.

Building upon the aforementioned factors, this study presents a novel approach utilizing pathomics technology and unsupervised machine learning to establish the relationship between clinical pathology, genomic information, and patient outcomes. Furthermore, it integrates bioinformatics analysis to investigate the underlying molecular mechanisms associated with pathomic subtyping.

Materials and methods

Ethical approval

We conducted a secondary analysis of publicly available and accessible data. The research ethics committee at Lihuili Hospital, affiliated with Ningbo University (Approval No. KY2024ML027), granted approval for the implementation of research procedures involving human subjects. All patients provided written informed consent before enrolling, in accordance with the Helsinki Declaration of 1964 and relevant ethical standards.

Image acquisition and processing

We obtained a PAAD patient dataset from The Cancer Genome Atlas (TCGA) database, which includes clinical data, H&E-stained histopathological images, and biological information, to be used as the training set (https://tcga-data.nci.nih.gov/tcga/). A clinical dataset from Lihuili Hospital, comprising patients with pathologically confirmed pancreatic ductal adenocarcinoma diagnosed between 2016 and 2022, was included as an external validation set. All pathological images are panoramic scans of formalin-fixed, paraffin-embedded tissue sections, in svs format, with a maximum magnification of 20x or 40x31,32.

The OTSU algorithm was utilized to acquire the tissue regions from the pathological slides. The OTSU algorithm, also referred to as the maximum inter-class variance method, is a threshold algorithm for image binarization that divides the image into two sections: the desired tissue region and the unwanted background for analysis33. The 40x images underwent segmentation into multiple sub-images measuring 1024 × 1024 pixels, while the 20x images were divided into numerous smaller images measuring 512 × 512 pixels and then enlarged to 1024 × 1024 pixels. Following review by pathologists, sub-images with poor quality (e.g., contamination, blurriness, or blank areas exceeding 50%) were excluded. For additional analysis, ten sub-images were chosen at random from each pathological slide31,32,34.
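The thresholding and tiling steps described above can be sketched as follows. This is a minimal illustration in plain Python, not the pipeline used in the study: the function names (`otsu_threshold`, `tile_origins`, `pick_tiles`), the 8-bit greyscale input, the rule that darker pixels are tissue, and the fixed random seed are all assumptions made for the example.

```python
import random

def otsu_threshold(pixels, levels=256):
    """Return the grey level t that maximizes the between-class variance
    when pixels are split into {<= t} and {> t}."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels in the lower class
    sum0 = 0  # intensity sum of the lower class
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0:
            continue
        if w1 == 0:
            break
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        between = w0 * w1 * (mu0 - mu1) ** 2
        if between > best_var:
            best_t, best_var = t, between
    return best_t

def tile_origins(width, height, tile):
    """Top-left corners of all complete, non-overlapping tiles."""
    return [(x, y)
            for y in range(0, height - tile + 1, tile)
            for x in range(0, width - tile + 1, tile)]

def pick_tiles(tiles, k=10, seed=0):
    """Randomly choose up to k tiles (assumed: after quality review)."""
    return random.Random(seed).sample(tiles, min(k, len(tiles)))

# Toy image: dark "tissue" pixels (10) and bright background (200).
pixels = [10] * 50 + [200] * 50
t = otsu_threshold(pixels)
tissue_mask = [p <= t for p in pixels]  # assumption: tissue is darker
tiles_40x = tile_origins(2048, 3000, 1024)
```

For a 20x slide, the same `tile_origins` call with `tile=512` would apply, followed by upscaling each tile to 1024 × 1024 pixels as described in the text.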

The PyRadiomics open-source package35 was used to standardize and extract original features from the sub-images. A total of 93 first-order and second-order features were obtained. Additionally, higher-order features (original + wavelet (LL, LH, HL, HH)) were extracted, giving 465 features in total (93 features for each of the five image types). Features were extracted from the 10 sub-images of each patient's pathological slide, and each sample's value for a given histopathological feature was taken as the average across those sub-images. This information was utilized for subsequent analysis29,36,37.
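The feature count and the per-sample averaging can be checked with a short sketch. It assumes each sub-image yields a dictionary mapping feature names to values; the helper name `mean_features` and the example feature names are hypothetical and are not PyRadiomics identifiers.

```python
N_BASE_FEATURES = 93  # first-order and second-order features per image type
IMAGE_TYPES = ["original", "wavelet-LL", "wavelet-LH", "wavelet-HL", "wavelet-HH"]
n_features = N_BASE_FEATURES * len(IMAGE_TYPES)

def mean_features(tile_features):
    """Average every feature over the sub-images of one slide."""
    n = len(tile_features)
    names = tile_features[0].keys()
    return {name: sum(tile[name] for tile in tile_features) / n for name in names}

# Two toy sub-images with two hypothetical features each.
slide = [{"feat_a": 1.0, "feat_b": 2.0}, {"feat_a": 3.0, "feat_b": 4.0}]
sample_vector = mean_features(slide)
```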

Pathomic feature screening

After excluding four features with a variance of 0, we applied the interquartile range (IQR) normalization method to histopathological features (461 features extracted by the PyRadiomics package)38. The IQR normalization method was employed to eliminate the influence of variable dimensions and variation range. The calculation formula is as follows:

$$X' = \frac{X - X_{\min}}{X_{\max} - X_{\min}}$$
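A minimal sketch of this screening and scaling step is given below. It follows the formula as written, which rescales each feature to the range 0 to 1 using its minimum and maximum; the function names and the column-wise dictionary layout are assumptions for illustration.

```python
def drop_constant(columns):
    """Remove features whose values do not vary (variance of 0)."""
    return {name: vals for name, vals in columns.items() if max(vals) != min(vals)}

def rescale(values):
    """Apply X' = (X - Xmin) / (Xmax - Xmin) to one feature column."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

# Toy feature table: one informative feature and one constant feature.
features = {"feat_a": [2.0, 4.0, 6.0], "feat_const": [1.0, 1.0, 1.0]}
kept = drop_constant(features)
scaled = {name: rescale(vals) for name, vals in kept.items()}
```

Dropping constant features first also avoids a division by zero in the scaling step.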

Subsequently, the univariate COX regression analysis method was employed to identify histopathological features associated with prognosis. Features with a p-value < 0.05 in the univariate COX regression analysis were considered to be associated with prognosis.

Unsupervised cluster analysis of pathomic features

NMF (non-negative matrix factorization)39 was used for unsupervised clustering of the aforementioned histopathological features associated with prognosis, resulting in different sample categories and histopathological subtypes. To ensure the stability of the results and the convergence of the algorithm, the clustering method of Brunet was employed during the clustering process, with a rank value ranging from 2 to 6. The optimal rank value was determined based upon the steepest decline of the cophenetic value, with nrun parameter set to 10. Based on the histopathological subtypes clustering, the baseline data of different clinical variables were summarized in both the training and validation datasets.
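To illustrate the clustering idea, the sketch below implements the multiplicative updates for the Kullback-Leibler divergence on which Brunet's method is based, in plain Python. It is an assumption-laden toy, not the R workflow used in the study: it runs a single factorization rather than `nrun` repeated runs, omits the cophenetic rank survey, and uses hypothetical function names and a fixed seed. Each sample is assigned to the component with the largest coefficient after the basis columns are normalized.

```python
import math
import random

def _product(W, H):
    rank, m = len(H), len(H[0])
    return [[sum(row[a] * H[a][j] for a in range(rank)) for j in range(m)] for row in W]

def kl_divergence(V, W, H, eps=1e-9):
    """Generalized KL divergence between V and W*H."""
    WH = _product(W, H)
    return sum(v * math.log((v + eps) / (WH[i][j] + eps)) - v + WH[i][j]
               for i, row in enumerate(V) for j, v in enumerate(row))

def nmf_kl(V, rank, n_iter=300, seed=0, eps=1e-9):
    """Factorize V (features x samples) as W (features x rank) * H (rank x samples)."""
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    W = [[rng.random() + eps for _ in range(rank)] for _ in range(n)]
    H = [[rng.random() + eps for _ in range(m)] for _ in range(rank)]
    for _ in range(n_iter):
        WH = _product(W, H)
        for a in range(rank):
            col_sum = sum(W[i][a] for i in range(n))
            for j in range(m):
                num = sum(W[i][a] * V[i][j] / (WH[i][j] + eps) for i in range(n))
                H[a][j] *= num / (col_sum + eps)
        WH = _product(W, H)
        for a in range(rank):
            row_sum = sum(H[a][j] for j in range(m))
            for i in range(n):
                num = sum(H[a][j] * V[i][j] / (WH[i][j] + eps) for j in range(m))
                W[i][a] *= num / (row_sum + eps)
    # Normalize each basis column to sum to 1 so coefficients are comparable.
    for a in range(rank):
        s = sum(W[i][a] for i in range(n))
        for i in range(n):
            W[i][a] /= s
        H[a] = [h * s for h in H[a]]
    return W, H

def assign_clusters(H):
    """Label each sample by its dominant component."""
    return [max(range(len(H)), key=lambda a: H[a][j]) for j in range(len(H[0]))]

# Toy matrix: 4 features x 6 samples with two obvious sample groups.
V = [[5, 5, 5, 0.1, 0.1, 0.1]] * 2 + [[0.1, 0.1, 0.1, 5, 5, 5]] * 2
W, H = nmf_kl(V, rank=2)
labels = assign_clusters(H)
```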

Survival analysis

We performed overall survival (OS) analysis on patients grouped by different histopathological subtypes. Survival rates among various groups of variables were illustrated through Kaplan-Meier (K-M) survival curves generated by the “survival” package in R. The median survival time represents the survival time at which the survival rate is 50%. The significance of survival rate differences among groups was evaluated using the Log-rank test.
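The Kaplan-Meier estimate and the median survival time can be sketched in a few lines. The study used the “survival” package in R; the Python functions below (`km_curve`, `median_survival`) are hypothetical stand-ins that only illustrate the calculation, with `events` coded as 1 for death and 0 for censoring.

```python
def km_curve(times, events):
    """Return [(time, survival probability)] at each distinct event time."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= removed
    return curve

def median_survival(curve):
    """First time at which the estimated survival drops to 50% or below."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None  # median not reached

# Toy cohort (months); the second patient is censored at month 2.
curve = km_curve([1, 2, 3, 4, 5], [1, 0, 1, 1, 1])
median = median_survival(curve)
```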

Univariate and multivariate analysis

The Cox model is used to examine how one or more study variables are related to the hazard of a survival event40. Univariate Cox regression analyses were performed separately in the training set and validation set to explore factors influencing OS. Furthermore, we performed a multivariable Cox regression to simultaneously investigate the effects of multiple factors on OS. If the hazard ratio (HR) exceeds 1, the variable is regarded as a risk factor; if the HR is less than 1, the variable is regarded as a protective factor.
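The relationship between a Cox coefficient and its hazard ratio can be made concrete with a small worked example. In a Cox model the HR is the exponential of the regression coefficient, and the 95% CI is obtained by exponentiating the coefficient plus or minus 1.96 standard errors. The sketch below does not fit a model; the coefficient and standard error are back-calculated here from the univariate Cluster2 result reported for the training cohort (HR = 2.063, 95% CI = 1.220–3.486) and are therefore approximations, not values taken from the study.

```python
import math

def hazard_ratio(beta, se, z=1.96):
    """Return (HR, lower, upper) for a Cox coefficient and its standard error."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)

def interpret(hr):
    """Apply the reading rule: HR > 1 is a risk factor, HR < 1 protective."""
    if hr > 1:
        return "risk factor"
    if hr < 1:
        return "protective factor"
    return "no effect"

beta = math.log(2.063)  # back-calculated from the reported HR (assumption)
se = 0.2678             # back-calculated from the reported CI width (assumption)
hr, lower, upper = hazard_ratio(beta, se)
```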

Subgroup analysis and interaction test

We conducted univariate Cox regression to perform survival analysis on subgroups based on different pathological subtypes, aiming to explore the impact of various covariates on patient prognosis within different subgroups. A likelihood ratio test was employed to analyze the interaction effects between pathological subtypes and other covariates, and the results were visualized using forest plots.

Differentially expressed genes (DEGs) analysis

Transcriptional data related to the PAAD project was obtained from the TCGA (https://portal.gdc.cancer.gov/) database. The RNA-sequence data in Fragments Per Kilobase per Million (FPKM) format was logarithmically transformed to base 2 (log2). DEGs analysis was conducted between different groups based on different pathological subtype clusters. Genes exhibiting fold change (FC) greater than 1.2 and p less than 0.05 were classified as differentially expressed.
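The filtering rule can be sketched as below. Several details are assumptions, since the text does not specify them: a pseudocount of 1 before the log2 transform, fold change computed as the difference of group means on the log2 scale, and a symmetric cutoff so that both up-regulated and down-regulated genes are retained. The p-value is taken as given from whatever test is used.

```python
import math

LOG2_FC_CUTOFF = math.log2(1.2)  # FC > 1.2 expressed on the log2 scale

def log2_fpkm(fpkm, pseudocount=1.0):
    """log2 transform with a pseudocount (assumed) to avoid log2(0)."""
    return math.log2(fpkm + pseudocount)

def classify_gene(mean_log2_cluster2, mean_log2_cluster1, p, alpha=0.05):
    """Label a gene as 'up', 'down', or 'ns' in Cluster2 relative to Cluster1."""
    log2_fc = mean_log2_cluster2 - mean_log2_cluster1
    if p >= alpha or abs(log2_fc) <= LOG2_FC_CUTOFF:
        return "ns"
    return "up" if log2_fc > 0 else "down"
```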

Enrichment analysis

To elucidate the potential role of the target, a functional enrichment analysis of DEGs across distinct pathological subtypes was conducted. The Gene Ontology (GO) is a widely adopted resource for annotating gene functions, particularly encompassing molecular functions (MF), biological processes (BP), and cellular components (CC)41. The top 15 significantly enriched pathways in the enrichment analysis of BP, CC and MF were visualized, using a filtering criterion of corrected p < 0.05 with the Benjamini & Hochberg method.
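The Benjamini & Hochberg correction mentioned above can be written in a few lines. This sketch of the standard step-up procedure is for illustration only; the function name `bh_adjust` is hypothetical.

```python
def bh_adjust(pvals):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

adjusted = bh_adjust([0.01, 0.04, 0.03, 0.005])
significant = [q < 0.05 for q in adjusted]
```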

Immune infiltration analysis

PAAD samples’ gene expression matrix was submitted to the CIBERSORTx database (https://cibersortx.stanford.edu/)42 in order to compute the immune cell infiltration level for each sample. Differences in levels of immune cell infiltration between various pathological subtypes were analyzed using the Wilcoxon rank-sum test. Furthermore, the same statistical method was used to examine the variation in expression of 37 immune-related genes43 among different pathological subtypes, with statistical significance defined as p < 0.05.
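A minimal sketch of the Wilcoxon rank-sum comparison is shown below. It uses the large-sample normal approximation without continuity or tie correction, so its p-values will differ slightly from those of statistical software for small or heavily tied samples; the function name and the toy infiltration values are hypothetical.

```python
import math

def rank_sum_test(x, y):
    """Return (U statistic for x, two-sided p-value by normal approximation)."""
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        average_rank = (i + 1 + j) / 2  # mean of ranks i+1 .. j for ties
        for k in range(i, j):
            ranks[k] = average_rank
        i = j
    n1, n2 = len(x), len(y)
    r1 = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u1 = r1 - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mu) / sigma
    return u1, math.erfc(abs(z) / math.sqrt(2))

# Toy infiltration fractions for two clusters.
u, p = rank_sum_test([1, 2, 3, 4, 5], [6, 7, 8, 9, 10])
```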

Differential expression analysis of OXPHOS-related genes

From the Kyoto Encyclopedia of Genes and Genomes (KEGG) website, we retrieved 121 genes linked to oxidative phosphorylation. The Wilcoxon rank sum test was used to evaluate their expression variances among distinct pathological groups. Genes with p < 0.05 were deemed statistically significant and visualized using a box plot.

Gene mutation analysis

The gene mutation data in the Mutation Annotation Format (MAF) for PAAD samples was downloaded from the TCGA database. The intersection of the downloaded mutation data and pathological data contained 101 samples. The analysis of mutation data and calculation of mutation frequency for each gene in the samples was performed using the maftools R package44. Fisher’s exact test was applied to examine potential variations in gene mutations across various pathological subtypes, with statistical significance defined as p < 0.05. The mutation profile shows the frequency of mutations in cancer-causing genes and typical carcinogenic pathways, with rows representing the reported genes and columns representing patients of each pathological subtype45,46,47,48,49,50,51. The pathway mutation frequency was calculated as follows:

$$\text{Pathway mutation frequency} = \frac{\text{number of mutated samples in the pathway}}{\text{total number of samples}} \times 100\%$$
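The frequency formula and the two-sided Fisher's exact test can be sketched together. The example uses the CDKN2A counts reported in the Results (11 of 48 samples in Cluster2 and 4 of 53 in Cluster1); the functions are a plain-Python illustration, not the maftools implementation, and the two-sided rule used here (summing all tables no more probable than the observed one) is the common convention, assumed to match the study.

```python
from math import comb

def mutation_frequency(n_mutated, n_total):
    """Percentage of samples carrying a mutation."""
    return 100 * n_mutated / n_total

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(k):
        # Hypergeometric probability of k mutated samples in row 1.
        return comb(row1, k) * comb(row2, col1 - k) / denom

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(a)
    total = sum(prob(k) for k in range(lo, hi + 1) if prob(k) <= p_obs * (1 + 1e-9))
    return min(1.0, total)

# Rows: Cluster2, Cluster1; columns: mutated, not mutated.
freq_c2 = mutation_frequency(11, 48)
freq_c1 = mutation_frequency(4, 53)
p_cdkn2a = fisher_exact_two_sided(11, 48 - 11, 4, 53 - 4)
```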

Statistical methods

All statistical analyses and data calculations were carried out using the open-source language and environment for statistical computing and graphics, R software (version 4.1.0; available at https://www.r-project.org/). The threshold for statistical significance was set at a p < 0.05.

Results

Figure 1 demonstrates the whole design and procedure of this study.

Fig. 1

Study design flow chart. Pathological features of PAAD patients were processed for machine learning, survival analysis and bioinformatical analysis. PAAD pancreatic adenocarcinoma.

Data collection and processing

According to the inclusion and exclusion criteria, we obtained a total of 113 samples with complete pathological images, gene expression matrices, and clinical details from the TCGA-PAAD dataset, along with 75 samples with complete pathological images and clinical data from an external dataset (details are shown in Tables S1 and S2). The TCGA-PAAD dataset served as the training set, while the external dataset was used for validation. Baseline characteristics were summarized for each dataset separately using OS as the grouping variable. No significant differences were found between the groups for the eight covariates in the training dataset (p > 0.05) (Table 1). A significant difference in pathologic stage (p = 0.006) was detected between the groups in the validation dataset (Table 2).

Table 1 Baseline characteristics of training cohort.
Table 2 Baseline characteristics of validation cohort.

Pathological features clustering

Using univariate Cox regression analysis, we identified 45 pathological features associated with prognosis from a pool of 461 features (see Table S3 for details). Furthermore, during the unsupervised clustering process using NMF, we observed that the cophenetic value dropped most quickly when the number of clusters was two (Fig. 2A). Based on this observation, we divided the aforementioned pathological features into two main clusters, referred to as Cluster 1 and Cluster 2 (Fig. 2B-C).

Fig. 2

Unsupervised cluster analysis of pathomic characteristics. (A) NMF rank survey on pathomic features of TCGA-PAAD; (B) clustering of the 45 screened pathomic features shown as a consensus matrix; (C) clustering of the 45 screened pathomic features shown as a heatmap. NMF, non-negative matrix factorization; TCGA, The Cancer Genome Atlas.

Survival analysis and risk factor assessment

Patients in the training group were categorized into two clusters, Cluster1 (n = 57) and Cluster2 (n = 56), using NMF. Table 3 presented a summary of the baseline characteristics of clinical variables and significant differences were found in the distribution of age (p = 0.011), diabetes history (p < 0.001), and smoking history (p = 0.022) between the different pathological clusters (Table 3). Similarly, in the validation group, the patients were also categorized into Cluster1 (n = 25) and Cluster2 (n = 50) using NMF, but no notable differences were found in the distribution of clinical variables between the two pathological clusters (p > 0.05) (Table 4).

Table 3 Baseline characteristics of clinical variables (training cohort).
Table 4 Baseline characteristics of clinical variables (validation cohort).

Survival analysis was conducted, and we found that the median OS for patients in Cluster1 was 24.6 months, while for those in Cluster2, it was 16.03 months in the training cohort, suggesting a significant correlation between Cluster2 and disease progression (p = 0.006) (Fig. 3A). On the other hand, the validation cohort showed no notable disparity in OS between the two groups, with a median OS of 18 months for Cluster1 and 12 months for Cluster2 (p = 0.102) (Fig. 3B).

Fig. 3

Survival analysis and risk factor assessment on different pathological cluster groups. (A) Kaplan–Meier curves show a better OS in the cluster1 group than the cluster2 group in the training cohort; (B) Kaplan–Meier curves show no significant difference in OS between the cluster1 group and the cluster2 group in the validation cohort; (C) Univariate Cox regression analysis for the OS of training cohort; (D) multivariate Cox regression analysis for the OS of training cohort; (E) Univariate Cox regression analysis for the OS of validation cohort; (F) multivariate Cox regression analysis for the OS of validation cohort. OS, overall survival. *P < 0.05, **P < 0.01, ***P < 0.001.

Univariate Cox regression analysis revealed that Cluster2 (HR = 2.063, 95% confidence interval (CI) = 1.220–3.486, p = 0.007), pathologic stage of II-IV (HR = 3.703, 95% CI = 1.326–10.344, p = 0.013), and histologic grade of G3&G4 (HR = 1.808, 95% CI = 1.065–3.069, p = 0.028) were potential risk factors for OS in the training cohort (Fig. 3C). In contrast, radiotherapy (HR = 0.447, 95% CI = 0.212–0.943, p = 0.035) was found to be a protective predictor for OS in the training cohort. Consistent findings were observed after multivariate Cox regression analysis: Cluster2 (HR = 2.593, 95% CI = 1.377–4.882, p = 0.003), pathologic stage of II-IV (HR = 3.438, 95% CI = 1.188–9.953, p = 0.023), and histologic grade of G3&G4 (HR = 1.973, 95% CI = 1.125–3.461, p = 0.018) were significant risk factors, while radiotherapy (HR = 0.383, 95% CI = 0.172–0.848, p = 0.018) was found to be a protective factor for OS in the training cohort (Fig. 3D).

In the validation cohort, only pathologic stage of II-IV (HR = 2.425, 95% CI = 1.430–4.112, p = 0.001) was observed to be a risk factor for OS using univariate Cox regression analysis (Fig. 3E). However, after conducting multivariate Cox regression analysis, Cluster2 (HR = 1.773, 95% CI = 1.029–3.057, p = 0.039) and pathologic stage of II-IV (HR = 3.113, 95% CI = 1.737–5.580, p < 0.001) were identified as potential risk factors for OS in the validation cohort (Fig. 3F).

Subgroup risk factor assessment

The subgroup analysis in the training cohort revealed that Cluster2 was a significant risk predictor for OS in the following subgroups: age < 65 (HR = 3.943, 95% CI = 1.789–8.693, p < 0.001), male gender (HR = 2.986, 95% CI = 1.222–7.294, p = 0.016), pathologic stage of II-IV (HR = 1.851, 95% CI = 1.097–3.121, p = 0.021), histologic grade of G3&G4 (HR = 2.550, 95% CI = 1.092–5.955, p = 0.031), absence of radiotherapy (HR = 2.107, 95% CI = 1.222–3.631, p = 0.007), and no smoking history (HR = 3.028, 95% CI = 1.298–7.067, p = 0.01) (Fig. 4A). However, in the validation cohort, Cluster2 was only observed to be a potential risk factor for OS in the subgroups of pathologic stage of II-IV (HR = 2.421, 95% CI = 1.263–4.639, p = 0.008) and perineural invasion (HR = 1.907, 95% CI = 1.024–3.553, p = 0.042) (Fig. 4B).

Fig. 4

Subgroup risk factor assessment and co-variates interaction analysis. (A) Univariate Cox regression analysis for the OS of subgroups in training cohort; (B) Univariate Cox regression analysis for the OS of subgroups in validation cohort. *P < 0.05, **P < 0.01, ***P < 0.001.

DEGs and GO analysis

Following the predefined threshold criterion, we identified 207 DEGs between the Cluster2 and Cluster1 groups. In the Cluster2 group, there were 64 genes exhibiting increased expression and 143 genes showing decreased expression compared to the Cluster1 group (Fig. 5A). To explore the potential functions of the DEGs further, we performed GO enrichment analysis. Our findings revealed a significant enrichment of DEGs in OXPHOS-related pathways, including fatty acid metabolic process, mitochondrial ATP synthesis coupled electron transport, mitochondrial inner membrane, respiratory chain complex, and oxidoreduction-driven active transmembrane transporter activity (all p < 0.001) (Fig. 5B).

Fig. 5

DEGs screening and immune infiltration analysis. (A) Volcano plot of 207 DEGs; (B) GO enrichment analysis of the DEGs; (C) Infiltration abundance of 22 immune cell types between different pathological clusters; (D) Differential expression analysis of immune-related genes. DEGs, differentially expressed genes; GO, gene ontology. *P < 0.05, **P < 0.01, ***P < 0.001.

Immune-related analysis

We analyzed the infiltration of 22 immune cell types across the pathological subtypes. A significantly lower abundance of regulatory T cells (Tregs) was noted in the Cluster2 group of the TCGA-PAAD dataset (p < 0.01) (Fig. 5C). Further analysis of differential expression in 37 immune-related genes indicated a notable reduction in the expression of TMIGD2 (p < 0.05), TNFRSF4 (p < 0.01), and TNFRSF18 (p < 0.01) in the Cluster2 group (Fig. 5D).

OXPHOS-related genes analysis and gene mutation atlas

Through DEGs analysis, we identified significant expression differences in 28 out of 121 OXPHOS-related genes across different pathological subtypes. These genes exhibited down-regulated expression levels specifically in the Cluster2 group, including NDUFB2, ATP5F1D, NDUFB7, NDUFS7, NDUFA2, COX5B, NDUFS6, NDUFB11, ATP5ME, and UQCRFS1 (all p < 0.01) (Fig. 6A).

Fig. 6

OXPHOS-related genes analysis and gene mutation atlas. (A) Differential expression analysis of OXPHOS-related genes; (B) Association between different pathological clusters and gene mutation features. OXPHOS oxidative phosphorylation. *P < 0.05, **P < 0.01, ***P < 0.001.

We further compared the gene mutation differences between the two groups. In the TCGA-PAAD dataset, both TP53 and Ras genes exhibited mutation rates exceeding 50% in both groups, with missense mutation being the predominant type. Additionally, PI3K-Akt, Wnt, and p53 signaling pathways also showed mutation rates higher than 50% in both groups. Comparatively, the mutation frequency of the CDKN2A gene was higher in Cluster2 (11/48, 23%) than in Cluster1 (4/53, 8%) (p = 0.048) (Fig. 6B).

Discussion

Despite recent advances in PAAD diagnosis and treatment outcomes driven by the flourishing development of multi-omics studies14,52,53, their translation into clinical practice remains limited, presenting an ongoing challenge to the global burden of PAAD. Molecular subtyping categorizes pancreatic cancer into distinct molecular-level prognoses54, yet it has not yielded adequate benefits for PAAD clinical management. This study leveraged histopathological image features and employed machine learning algorithms to cluster and establish pertinent Cox prognostic models. We identified the poorer prognosis and pertinent potential clinical risk factors for PAAD patients exhibiting cluster 2 features. Additionally, we unveiled differential gene expression, mechanistic pathways, immune infiltration, and gene mutations linked to this prognosis. To the best of our knowledge, the specific associations between pathological characteristics, prognosis, and potential mechanisms in PAAD, as identified through machine learning (NMF) and deep histopathological image analysis methods, have not been previously reported. Our results indicate a significant correlation between the histopathological features-based pathomic model and OS in PAAD patients, facilitating precise prognosis prediction and risk factor assessment. Furthermore, based on these histopathological features, we identified co-expressed modules of OXPHOS-related genes and pathways, gene mutation patterns, and immune infiltrating cells. Integrating pathomics and transcriptomics could enhance prognosis prediction and uncover interactive mechanisms for personalized treatment and risk stratification in clinical practice.

Due to tumor heterogeneity, molecular and genetic testing is now standard for characterizing cancer, particularly in precision oncology14,52,53,54. Variations in molecular expression can affect tissue and nuclear morphology, providing quantitative insights that enhance diagnostic and prognostic accuracy31. Artificial intelligence has bolstered image feature extraction, revealing valuable hidden information linked to tumor characteristics and survival outcomes55,56. Current research has found that histopathological images can predict the genetic mutation status of colorectal cancer57, liver cancer58, lung cancer59, and ovarian cancer60 through machine learning. In this study, we automated the extraction and identification of quantitative morphological features of histopathological sections, including Zernike shape features, using whole slide imaging (WSI) technology and the computer OTSU algorithm. This efficient and accurate method not only confirms the potential practical value of histopathological image analysis in predicting the prognosis of PAAD patients but also significantly reduces human resource costs, making it suitable for routine practice. Therefore, we believe that, in situations where molecular genomic data is limited, pathomics may be an excellent alternative strategy for predicting the prognosis and common mutations of PAAD patients.

Although histopathological images have been successfully utilized in the establishment of prognostic models for various tumors, the application of this submicroscopic information in stratified prognosis prediction for PAAD is unprecedented. In this study, we innovatively classified PAAD patients into two subtypes based on submicroscopic image features, revealing that patients with cluster 2 pathological characteristics had significantly lower OS than those with cluster 1 pathological characteristics (p < 0.01) (Fig. 3A). Similar results were also observed in an external validation set, although the difference did not reach statistical significance possibly due to insufficient clinical sample size. Furthermore, through univariate and multivariate Cox regression models, we identified cluster 2 and pathologic stage II/III/IV as key risk factors for poor prognosis in PAAD patients, both in the training and validation sets. Subgroup analysis of these patients further demonstrated that pathologic stage II/III/IV was a synergistic risk factor with cluster 2, leading to worse prognosis for PAAD patients (p < 0.05) (Fig. 4). However, the correlation between clinical prognosis and histopathological submicroscopic features still requires further validation in larger cohort studies.

The close association between pathological and radiological features and molecular genetics carries a wealth of microscopic-to-clinical intermediary information. Perez-Johnston R et al. also studied the CT-based radiogenomic and pathomic information of clinical stage I lung cancer patients, demonstrating the existence of associations among these three types of information61. In our study, GO enrichment analysis revealed a strong correlation between the cluster 2 group and oxidative phosphorylation processes, such as ATP synthesis coupled electron transport, respiratory chain complex, and electron transfer activity (Fig. 5B). Previous research has indicated the downregulation of OXPHOS in pancreatic cancer15, which is consistent with our findings. Additionally, we found that PAAD patients with the cluster 2 subtype exhibited lower OXPHOS levels than those with the cluster 1 subtype, with 28 OXPHOS-related genes (NDUF subunits, ATP5 subunits, COX subunits, etc.) significantly downregulated (all p < 0.05) (Fig. 6A). These findings imply a direct link between the extent of OXPHOS downregulation and poorer OS in PAAD, and suggest that intervening in the OXPHOS pathway may be a promising therapeutic strategy for pancreatic cancer. Presently, efforts are being made to develop therapeutic agents for PAAD that target this pathway22,62.

CDKN2A mutation is associated with various primary tumors and is among the most common gene mutations in pancreatic cancer63,64. Baek et al.64 found that CDKN2A mutations were associated with poor prognosis in PAAD through gene mutation survival analysis. However, a study analyzing genomic alterations in a cohort of 608 Chinese PAAD patients did not support the prognostic value of CDKN2A mutations65. Nevertheless, our research found that CDKN2A mutations were not only associated with OS in PAAD patients but also closely linked to the poor OS of patients with cluster 2 histopathological subtyping, suggesting a dose-response relationship between the mutation of this gene and PAAD prognosis. However, further in-depth research is needed to investigate the prognostic value of this gene in pancreatic cancer due to limitations in sample sizes and cohorts.

On the other hand, Tregs have long been considered closely associated with the progression of PAAD, playing a crucial role in the tumor microenvironment (TME). However, whether Tregs promote or inhibit pancreatic cancer is complex and contentious66. Previous studies have indicated that high levels of FOXP3+ Treg infiltration may exacerbate the development of pancreatic cancer by inhibiting the function of effector T cells and modulating the immunosuppressive state within the tumor microenvironment66,67,68. However, there is also evidence suggesting that in certain cases, reducing Tregs levels may lead to worsening conditions, as this can trigger the activation of other immunosuppressive mechanisms, such as increased recruitment of myeloid-derived suppressor cells (MDSCs), thereby accelerating tumor progression69,70, or blocking the differentiation of specific types of cancer-associated fibroblasts, thereby promoting tumor growth71. Our findings are consistent with these, as analysis of 22 immune cell infiltrations revealed a significant downregulation of Tregs expression in PAAD patients with cluster 2 subtype (p < 0.05) (Fig. 5C), suggesting its association with poor prognosis. However, the potential reasons for this contradictory phenomenon may lie in the varying impacts of different Treg subpopulations on tumor prognosis72. This necessitates further foundational experiments to identify and validate the previously undisclosed Treg subpopulations associated with pancreatic cancer prognosis. Furthermore, we also observed a significant downregulation of TNFRSF14, TNFRSF18, and TMIGD2, which are closely associated with Tregs infiltration, in the cluster 2 group (p < 0.05) (Fig. 5D). These mutually supportive results indicate that the degree of Tregs infiltration in the tumor microenvironment of PAAD is significantly correlated with prognosis, and targeting this pathway may be an effective approach for anti-tumor immunotherapy73,74.

Although we have pioneered the use of pathomics in pancreatic cancer research, there are still some limitations. First, research on the correlation between histopathology and related molecular mechanisms is still in its infancy. Data from public databases showed considerable inter-individual image variability. Enhancing the alignment between TCGA and clinical datasets and refining prognostic accuracy are imperative. Achieving this necessitates enhanced submicroscopic image interpretation skills and iterative machine learning models. Second, this is a single-center retrospective study, inevitably affected by confounding factors and limited sample size. We plan to collaborate with multiple pathology centers to expand the sample size, further validate the prognostic value of pathomics in PAAD, and develop relevant prognostic models. Finally, only representative sub-images were selected in lesion delineation, which cannot reflect the overall characteristics of the WSI and results in selection bias. Addressing this requires artificial intelligence with stronger computing power to screen and analyze effective information.

In summary, the acquisition of submicroscopic image features of histopathological slides through deep learning is an important biomarker for predicting OS, molecular subtypes, gene mutations, and potential mechanisms in PAAD patients. Our study defined, for the first time, the association between cluster 2 histopathological features and worse OS in PAAD patients, as well as revealed the potential biological mechanisms of the OXPHOS pathway, Tregs depletion, and CDKN2A mutations. The rise of big data, advancements in histopathology, and the growing need for precision medicine will lead to a shift towards integrating pathomics with genomics and proteomics in future research.