Introduction

Pancreatic ductal adenocarcinoma (PDAC) is a malignant tumor with a poor prognosis and a 5-year survival rate of only 2–9%1. Various clinical and pathological factors affect the prognosis of patients with PDAC, including tumor size, grade, stage, lymph node metastasis, and surgical margin status2. In recent years, the search for new biomarkers has emerged as a promising avenue for improving the dismal prognosis of PDAC patients; candidates include carbohydrate antigen 19-9 (CA19-9)3, haemoglobin A1c4, laminin-5 gamma-2 (LAMC2)5, long non-coding RNAs (lncRNAs)6, extracellular vesicles (EVs)7, and collagen morphological features8. The tumor microenvironment (TME) is a hallmark of PDAC, comprising more than 50% of the tumor tissue, and is composed of tumor cells, immune cells, extracellular matrix, fibroblasts, blood vessels, nerve tissue, and cytokines, among other components. It forms a complex network of interactions that plays a critical role in disease progression and strongly influences the biological behavior of the tumor, ultimately affecting patient prognosis9.

The extracellular matrix (ECM) is composed mainly of collagen fibers and plays an important role in the TME10. Collagen fibers can either facilitate tumor invasion by providing “highways” along their orientation11 or impede tumor invasion by acting as a barrier against migration12. Collagen features have demonstrated prognostic value in different cancer types: Xi et al. used large-scale tumor-associated collagen signatures (TACSs) and their corresponding microscopic features for prognostic prediction of breast cancer13,14; Chen et al. constructed a competing risk nomogram including collagen features, tumor size, tumor differentiation status, and lymph node metastasis for individual prediction of postoperative peritoneal metastasis in gastric cancer with serosal invasion15; and Hu et al. characterized collagen fibers through texture analysis of SHG images, using an orientation-dependent gray level co-occurrence matrix method to differentiate between normal and cancerous human pancreatic tissues16. Compared with traditional gold standards for detecting collagen distribution in PDAC, such as hematoxylin and eosin (H&E) staining or Masson’s trichrome staining, multiphoton microscopy (MPM) offers more detailed structural information from unprocessed specimens by combining two-photon excited fluorescence (TPEF) and second-harmonic generation (SHG): TPEF imaging captures tumor cell information, while SHG imaging provides a direct, label-free way to visualize collagen structure17. This technique is therefore a powerful tool for basic and clinical researchers to observe morphological changes in both cells and their surrounding collagen fibers within biological tissues.

In this work, we explored new collagen-based biomarkers for predicting the prognosis of patients with PDAC using MPM combined with computerized image processing. We observed eight tumor-associated collagen signatures (TACS1-8) in multiphoton images of PDAC tissues and obtained a TACS-score for each patient from the combined TACS1-8 using ridge regression analysis. In addition, we used an automated image processing technique to extract 142 microscopic tumor-associated collagen signatures (M-TACS) from SHG images and then applied least absolute shrinkage and selection operator (LASSO) regression analysis to select the most robust features for constructing an M-TACS-score. Statistical results demonstrate that TACS-score and M-TACS-score have good ability to predict the overall survival (OS) of patients with PDAC and may be adopted by pathologists as independent indicators for prognostic prediction.

Materials and methods

Sample preparation

This study was approved by the Institutional Review Board of the First Affiliated Hospital of Fujian Medical University. A total of 149 formalin-fixed, paraffin-embedded (FFPE) pancreatic cancer tissue samples were collected from 149 patients diagnosed between 2010 and 2019. Tissue sections (5 μm thick) were prepared in the pathology laboratory using an ultra-thin, semi-automated microtome, yielding two consecutive sections per sample. Following deparaffinization with alcohol and xylene, one section was stained with hematoxylin and eosin (H&E), while the adjacent section was reserved for label-free MPM imaging.

Multiphoton microscopic imaging system

As in our previous study8, we used a commercial MPM imaging system composed of a laser scanning microscope (Zeiss LSM 880, Germany) and a mode-locked Ti:sapphire laser tunable from 690 nm to 1064 nm (Chameleon Ultra, Coherent, USA). An excitation wavelength of 810 nm and a 20× objective lens (Plan-Apochromat, NA = 0.8, Zeiss, Germany) were used to obtain high-contrast MPM images. Two independent channels simultaneously captured backward-directed SHG and TPEF signals: the SHG signal (displayed in green) was detected in the 395–415 nm range and the TPEF signal (displayed in red) in the 428–695 nm range. All images were automatically captured and stitched by the Zeiss software after two-fold averaging, at a pixel depth of 12 bits.

Statistical analysis

All statistical analyses were performed with R 4.2.2 and IBM SPSS Statistics 24. Overall survival (OS) was defined as the time from diagnosis to death or, for patients without a recorded death, from diagnosis to the end of follow-up. We employed LASSO-Cox regression to select the most useful prognostic markers for time-to-event analysis from high-dimensional features18, and used the R package “glmnet” to implement the LASSO-Cox regression analysis19. Other analytical methods are described in the Supplementary Materials. Statistical tests were two-sided, and P < 0.05 was considered statistically significant.
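For orientation, the LASSO-Cox estimator mentioned above is commonly written in the following generic form. The notation is introduced here for illustration and is not taken from the original analysis: β denotes the coefficient vector over p features, x_i the feature vector of patient i, δ_i the event indicator (1 for death, 0 for censoring), R(t_i) the set of patients still at risk at time t_i, and λ the penalty parameter. Tied event times are ignored in this sketch, and software implementations may scale the likelihood term differently.

$$\hat{\beta} = \arg\max_{\beta} \left\{ \sum_{i:\,\delta_i = 1} \left[ x_i^{\top}\beta - \log \sum_{j \in R(t_i)} \exp\left( x_j^{\top}\beta \right) \right] - \lambda \sum_{k=1}^{p} \left| \beta_k \right| \right\}$$

Larger values of λ drive more coefficients exactly to zero, which is why the method can be used for feature selection as well as for estimation.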

Results

Baseline clinical characteristics

The 149 patients were divided into a training cohort (107 cases) and a validation cohort (42 cases), and baseline clinicopathological data were collected for each patient, including age at diagnosis, sex, TNM stage, differentiation, tumor location, perineural invasion, and lymphovascular invasion. Table 1 shows the clinical characteristics of patients in the two cohorts. The mean age of patients at the diagnosis of PDAC was 60.9 years (range, 28–86 years), and the median follow-up for OS was 13 months (range, 1–72 months).

Table 1 Characteristics of patients with PDAC.

Definition of TACS1-8 and TACS-score

As shown in Fig. 1, we identified eight TACSs (Supplementary methods) in large-scale MPM images (~2.5 mm) of PDAC tissues, as in our previous study13. A pathologist first confirmed the presence of tumors and their borders in the H&E image of the tissue section from each sample; several non-overlapping regions of interest (ROIs) were then marked and numbered over the invasive margins and adjacent areas throughout the whole tissue section, and MPM imaging was performed on the adjacent section according to these ROIs. Subsequently, all MPM images were examined by three independent observers (Jikui Miao, Gangqin Xi, Xiwen Chen), and for each ROI, an individual TACS was recorded as present if at least two observers answered “yes”. Multiple TACSs may be present in one ROI, and one TACS may be present in multiple ROIs. Finally, as shown in Fig. S1, the TACS information was quantified as an 8-tuple vector reflecting the frequency with which TACSi (i = 1, 2, … or 8) was present across all ROIs. The coefficient of each TACS was then obtained from the quantified TACS and OS data of the training cohort using ridge regression analysis with cross-validation. These coefficients were fixed in a formula to calculate a patient-specific TACS-score, as follows:

$$\begin{aligned} \text{TACS-score} = & ( - 1.39851567*\text{TACS}1) + (2.36452242*\text{TACS}2) + (0.26478416*\text{TACS}3) \\ & + ( - 0.89202496*\text{TACS}4) + (0.03964443*\text{TACS}5) + (0.06859262*\text{TACS}6) \\ & + (0.22637345*\text{TACS}7) + (1.39906726*\text{TACS}8) \end{aligned}$$
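The scoring procedure described above (two-of-three observer consensus per ROI, frequency of each TACS across ROIs, then the weighted sum) can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the function names and the input layout are hypothetical, and "frequency" is assumed here to mean the fraction of a patient's ROIs in which a given TACS was recorded as present. Only the eight coefficients are taken from the formula.

```python
# Ridge-regression coefficients for TACS1..TACS8, copied from the formula above.
TACS_COEFFICIENTS = (
    -1.39851567, 2.36452242, 0.26478416, -0.89202496,
    0.03964443, 0.06859262, 0.22637345, 1.39906726,
)


def consensus_presence(votes, min_yes=2):
    """Return True if at least `min_yes` observers answered yes."""
    return sum(1 for vote in votes if vote) >= min_yes


def tacs_frequencies(roi_votes):
    """Turn observer votes into an 8-tuple of TACS frequencies.

    `roi_votes` is a list with one entry per ROI; each entry is a list of
    eight vote tuples (one per TACS), each holding the observers' answers.
    ASSUMPTION: frequency = fraction of ROIs in which the TACS is present.
    """
    if not roi_votes:
        raise ValueError("at least one ROI is required")
    counts = [0] * 8
    for roi in roi_votes:
        if len(roi) != 8:
            raise ValueError("each ROI needs votes for all eight TACSs")
        for index, votes in enumerate(roi):
            if consensus_presence(votes):
                counts[index] += 1
    return tuple(count / len(roi_votes) for count in counts)


def tacs_score(frequencies):
    """Weighted sum of the eight TACS frequencies (the TACS-score formula)."""
    if len(frequencies) != 8:
        raise ValueError("expected eight TACS frequencies")
    return sum(c * f for c, f in zip(TACS_COEFFICIENTS, frequencies))


if __name__ == "__main__":
    no = (False, False, False)
    # Hypothetical patient with two ROIs: TACS2 reaches consensus in the
    # first ROI only, TACS8 reaches consensus in the second ROI only.
    roi_1 = [no, (True, True, False), no, no, no, no, no, no]
    roi_2 = [no, (True, False, False), no, no, no, no, no, (True, True, True)]
    print(tacs_score(tacs_frequencies([roi_1, roi_2])))
```

Because the coefficients are fixed after training, scoring a new patient requires only the observers' annotations and this weighted sum.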
Fig. 1

MPM images of TACS 1–8 in PDAC. TACS1: characterized by curved collagen fibers wrapping around emergent tumor foci; TACS2: characterized by stretched collagen fibers wrapping around tumor foci due to tumor growth; TACS3: characterized by collagen fibers perpendicular to tumor foci that facilitate tumor cell migration; TACS4: characterized by a reticular distribution of collagen fibers adjacent to the expanding tumor, leading to a clear tumor boundary; TACS5: characterized by directionally distributed collagen fibers at the tumor boundary that enable tumor cell migration; TACS6: characterized by a chaotic distribution of collagen fibers at the tumor boundary that enables multidirectional tumor cell migration; TACS7: characterized by densely distributed collagen fibers at the tumor invasion front; TACS8: characterized by sparsely distributed collagen fibers at the tumor invasion front. Scale bar: 100 μm.

Definition of M-TACS and M-TACS-score

We further developed an automatic image processing method based on Gaussian mixture models to extract collagen microscopic features from SHG images14,20. A total of 142 microscopic TACSs (M-TACSs), including 8 morphologic features and 134 textural features, were extracted (Table S1). The values from all ROIs were averaged to generate 142 M-TACS values for each patient (Fig. S2). In this study, we selected a total of 75,647 images of 512 × 512 pixels (~0.3 × 0.3 mm2) from the SHG images of the ROIs with a TACS pattern for feature extraction, and used LASSO-Cox regression to choose the most robust features for establishing an M-TACS-score in the training cohort. LASSO applies an L1 penalty that shrinks the regression coefficients toward zero and sets some of them exactly to zero, depending on the penalty parameter (tuning constant) λ. Here, 10-fold cross-validation was used to determine the optimal value of the penalty parameter (λ = 0.02659, with log(λ) = −3.627), and 7 features with non-zero coefficients were finally selected based on the association between each M-TACS feature and OS. The M-TACS-score of each patient was calculated as a linear combination of the 7 selected features weighted by their respective regression coefficients.

$$\begin{aligned} \text{M-TACS-score} = & (0.19603462*\text{Length}) + (0.02646767*\text{Kurtosis of histograms}) \\ & + (0.37885910*\text{GLCM correlation}\_90^\circ\_1\;\text{pixel}) + (0.49624342*\text{GLCM correlation}\_90^\circ\_5\;\text{pixel}) \\ & + ( - 0.18949484*\text{Gabor variance}\_0^\circ\_1\;\text{scale}) + (0.08628374*\text{Gabor variance}\_120^\circ\_3\;\text{scale}) \\ & + ( - 0.35764771*\text{Gabor mean}\_150^\circ\_4\;\text{scale}). \end{aligned}$$
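As a rough illustration of how the formula above could be applied, the following Python sketch averages the seven selected features over a patient's ROIs and then forms the weighted sum. The dictionary keys and function names are hypothetical labels chosen for readability; only the coefficients come from the formula, and the feature extraction itself is not shown.

```python
# LASSO-Cox coefficients of the seven selected M-TACS features, copied from
# the formula above. The key names are hypothetical labels for this sketch.
M_TACS_COEFFICIENTS = {
    "length": 0.19603462,
    "histogram_kurtosis": 0.02646767,
    "glcm_correlation_90deg_1px": 0.37885910,
    "glcm_correlation_90deg_5px": 0.49624342,
    "gabor_variance_0deg_scale1": -0.18949484,
    "gabor_variance_120deg_scale3": 0.08628374,
    "gabor_mean_150deg_scale4": -0.35764771,
}


def average_over_rois(roi_features):
    """Average each selected feature over all ROIs of one patient."""
    if not roi_features:
        raise ValueError("at least one ROI is required")
    return {
        name: sum(roi[name] for roi in roi_features) / len(roi_features)
        for name in M_TACS_COEFFICIENTS
    }


def m_tacs_score(features):
    """Linear combination of the selected features (the M-TACS-score formula)."""
    return sum(coef * features[name] for name, coef in M_TACS_COEFFICIENTS.items())


if __name__ == "__main__":
    # Two hypothetical ROIs with made-up feature values.
    roi_a = {name: 1.0 for name in M_TACS_COEFFICIENTS}
    roi_b = {name: 3.0 for name in M_TACS_COEFFICIENTS}
    print(m_tacs_score(average_over_rois([roi_a, roi_b])))
```

Features whose LASSO coefficient was shrunk to zero do not appear in the dictionary, so they have no influence on the score.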

Prognosis predictive performance of TACS-score and M-TACS-score

The histograms of TACS-score and M-TACS-score (Fig. 2A) show similar data distributions in the two cohorts, and the heatmaps (Fig. 2B) indicate the relationship between the risk scores and OS. A lower score (TACS-score or M-TACS-score) is associated with a better prognosis (longer OS), while a higher score is associated with a worse prognosis (shorter OS). There is a relatively clear demarcation line at 6 months for the two scores in the two cohorts: most bars corresponding to an OS longer than 6 months are blue and indicate a good prognosis, while most bars corresponding to an OS shorter than 6 months are red and indicate a poor prognosis. As shown in Fig. 3, TACS-score and M-TACS-score are significantly associated with OS in univariate Cox proportional hazards regression analysis, and after adjusting for the 7 clinical variables, both scores remain independent prognostic indicators of OS in the whole cohort. Our previous studies also indicated that TACS-score and TCMF-score were key determinants of prognosis in breast cancer13,14. Moreover, the two scores demonstrate superior risk stratification capability (higher HR) compared with the clinicopathological factors. We further performed receiver operating characteristic (ROC) analysis in the training and validation cohorts. The TACS model (AUC at 1-year OS, 0.765; 95% CI, 0.673–0.858) and the M-TACS model (AUC at 1-year OS, 0.679; 95% CI, 0.577–0.782) perform better than the CLI model (AUC at 1-year OS, 0.630; 95% CI, 0.524–0.737) in the training cohort (Fig. 2C), where the CLI model includes seven clinical risk factors (age, sex, TNM stage, differentiation, perineural invasion, lymphovascular invasion, and tumor location). Similar results are observed in the validation cohort, demonstrating the stable prediction performance of the two scores.
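To make the AUC comparison above concrete, the following Python sketch computes an AUC for a binary 1-year outcome as the probability that a patient who died within the horizon has a higher score than one who survived beyond it. This is a simplified illustration under stated assumptions, not the procedure used in the study: this section does not describe how censored patients were handled, so patients censored before 12 months are simply excluded here, and the function names and record layout are hypothetical.

```python
def one_year_labels(records, horizon=12.0):
    """Build binary labels for death within `horizon` months.

    `records` is a list of (score, months, died) tuples.
    ASSUMPTION: patients censored before the horizon are dropped because
    their 1-year status is unknown.
    """
    scores, labels = [], []
    for score, months, died in records:
        if died and months <= horizon:
            scores.append(score)
            labels.append(1)  # died within the horizon
        elif months > horizon:
            scores.append(score)
            labels.append(0)  # known to be alive at the horizon
        # otherwise: censored before the horizon, excluded
    return scores, labels


def auc_from_scores(scores, labels):
    """AUC as the fraction of (event, non-event) pairs ranked correctly.

    Ties between an event score and a non-event score count as one half.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("both outcome classes are required")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


if __name__ == "__main__":
    # Hypothetical (score, months of follow-up, died) records.
    records = [
        (0.80, 5, True), (0.35, 10, True), (0.40, 20, False),
        (0.10, 30, True), (0.90, 6, False),
    ]
    scores, labels = one_year_labels(records)
    print(auc_from_scores(scores, labels))
```

An AUC of 0.5 corresponds to a score that ranks patients no better than chance, which gives a reference point for the values reported for the CLI, TACS, and M-TACS models.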

Fig. 2

(A) Distribution histograms of TACS-score and M-TACS-score in training and validation cohorts. (B) Heatmaps show the relationship between TACS-score and M-TACS-score of the two cohorts and overall survival (OS). (C) ROC curves of the CLI, TACS-score, M-TACS-score, and CLI + TACS + M-TACS models in predicting 1-year OS in two cohorts. OS, overall survival; mth, month.

Fig. 3

Univariate and multivariate Cox proportional hazards regression models for analyzing the relationship between prognostic factors and OS in patients with PDAC, and forest plot showing the hazard ratio of different prognostic factors. NA, not available; OS, overall survival; PDAC, pancreatic ductal adenocarcinoma.

Improved prognostic performance by combining TACS-score, M-TACS-score and CLI model

When the two scores (TACS + M-TACS) were incorporated into the CLI model to form the full model (CLI + TACS + M-TACS), the AUC further increased to 0.796 (95% CI, 0.710–0.881) in the training cohort and to 0.778 (95% CI, 0.624–0.932) in the validation cohort (Fig. 2C). Furthermore, a nomogram (Fig. 4A) combining the seven clinical factors (CLI model), TACS-score, and M-TACS-score was established from the training cohort to visually predict 1-year OS in patients with PDAC: each prognosticator is assigned points according to the corresponding point scale, and the total points of all prognosticators are used to predict the survival of the patient. In the nomogram, TACS-score and M-TACS-score carry considerably more weight than the clinical factors. We also used the optimal cutoff values of the four predictive models determined by the Youden index (CLI model, 0.1633; TACS-score, −0.0672; M-TACS-score, 3.9682; full model, −0.0651) to reclassify patients as high-risk or low-risk in both the training and validation cohorts, and then performed survival analysis using Kaplan-Meier curves. As shown in Fig. 4B, TACS-score, M-TACS-score, and the full model show an enhanced ability for risk stratification compared with the CLI model. We further evaluated the risk stratification and discrimination ability of the models in different subgroups; Table 2 shows that TACS-score and M-TACS-score also perform well and that the full model (CLI + TACS + M-TACS) achieves the best performance. The higher discrimination and risk stratification ability shown by these statistical results supports TACS-score and M-TACS-score as two strong prognosticators of OS.
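The Youden-index cutoff used above for risk reclassification can be illustrated with a short Python sketch. It scans every observed score as a candidate threshold and keeps the one maximizing J = sensitivity + specificity − 1. The function name, the convention that scores at or above the cutoff are classified as high risk, and the example data are assumptions made for illustration; they are not taken from the study.

```python
def youden_cutoff(scores, labels):
    """Return (cutoff, J) maximizing Youden's J = sensitivity + specificity - 1.

    `labels` holds 1 for patients with the event and 0 otherwise.
    ASSUMPTION: a score >= cutoff is classified as high risk.
    """
    positives = sum(1 for y in labels if y == 1)
    negatives = sum(1 for y in labels if y == 0)
    if positives == 0 or negatives == 0:
        raise ValueError("both outcome classes are required")
    best_cutoff, best_j = None, float("-inf")
    for cutoff in sorted(set(scores)):
        true_pos = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= cutoff)
        true_neg = sum(1 for s, y in zip(scores, labels) if y == 0 and s < cutoff)
        j = true_pos / positives + true_neg / negatives - 1.0
        if j > best_j:  # the lowest cutoff wins when several share the maximum
            best_cutoff, best_j = cutoff, j
    return best_cutoff, best_j


def stratify(scores, cutoff):
    """Label each patient as 'high' or 'low' risk relative to the cutoff."""
    return ["high" if s >= cutoff else "low" for s in scores]


if __name__ == "__main__":
    # Hypothetical scores and 1-year outcomes (1 = died within one year).
    scores = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
    labels = [0, 0, 1, 0, 0, 1, 1, 1]
    cutoff, j = youden_cutoff(scores, labels)
    print(cutoff, j, stratify(scores, cutoff))
```

The resulting high-risk and low-risk groups are the inputs that a Kaplan-Meier analysis and log-rank test would then compare.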

Fig. 4

(A) Nomogram combining TACS-score, M-TACS-score, age, sex, perineural invasion, lymphovascular invasion, TNM stage, differentiation and tumor location from the training cohort to predict 1-year OS in patients with PDAC. (B) Kaplan-Meier curves of OS according to the CLI, TACS, M-TACS, and CLI + TACS + M-TACS models in the training and validation cohorts. Red lines indicate high risk and blue lines indicate low risk. A two-sided log-rank test was performed to determine statistical significance. OS, overall survival; PDAC, pancreatic ductal adenocarcinoma; HR, hazard ratio.

Table 2 Performance comparison of the four models on overall survival for different clinicopathological subgroups.

Discussion

Pancreatic ductal adenocarcinoma (PDAC) has an extremely poor prognosis for several reasons, including high invasiveness and metastatic potential, molecular and genetic complexity, drug resistance, a unique tumor microenvironment, and the lack of effective early detection markers and treatment options. Many conventional clinical factors, such as tumor location, tumor size, lymph node status, perineural invasion, and vascular invasion, are known to be associated with prognosis, but their predictive value is unsatisfactory. More effective biomarkers for prognosis prediction are therefore needed to support more appropriate therapy planning for PDAC patients. Previous studies have shown that collagen fibers in the tumor microenvironment have an important influence on tumor development: Neesse et al. demonstrated that the ablation or modification of extracellular matrix components to reduce tissue tension and intra-tumoral pressure may result in improved tumor perfusion and therapeutic response21; Kashiwagi et al. confirmed that the gene expression of collagen in pancreatic cancer can promote or inhibit cancer progression, depending on the interaction between tumor cells and the tumor microenvironment22; and Bachy et al. revealed that collagen structures in PDAC can affect macrophage phenotype and function by enhancing the immunosuppressive characteristics of macrophages23. Currently, histological examination of resection specimens with Masson's trichrome or picrosirius red staining remains the standard method for observing collagen distribution, but it requires a significant amount of time and effort.

MPM combining SHG and TPEF imaging is a powerful tool that can image cells and collagen fibers in biological tissues without labeling, providing resolution comparable to that of H&E-stained tissue section images. Bodelon et al. used this tool to study mammary collagen architecture and found a correlation between collagen profiles and breast cancer risk24; Rogart et al. demonstrated the effectiveness of MPM for examining gastrointestinal mucosa at the cellular level25; and Cromey et al. assessed the feasibility of this technique for rapidly distinguishing pancreatic cancer from normal tissues with great accuracy26. Our previous studies focused on the correlation between collagen distribution in the breast tumor microenvironment and prognostic prediction, and developed a method combining multiphoton images with an automatic image processing algorithm for quantifying collagen features13,14. Here, we applied this method to PDAC. MPM images clearly show the spatial distribution of tumor cells and collagen fibers in the tumor microenvironment and allow TACS 1–8 to be identified effectively in PDAC tissues (Fig. 1), which may help in understanding tumor development. TACS 1–3 were observed during the tumor initiation stage, while TACS 4–8 were present at a large scale during the invasive stage. We assessed the frequency of occurrence of the eight types of TACS, rather than their density or number, and then employed ridge regression analysis with cross-validation to obtain the coefficient of each TACS. Subsequently, all TACS coefficients were incorporated into a formula to calculate a patient-specific collagen score, the TACS-score. Statistical results show that TACS-score is an independent prognostic factor for OS in PDAC patients and outperforms the CLI model (which combines seven clinical risk factors: age, sex, TNM stage, tumor location, differentiation, perineural invasion, and lymphovascular invasion) in discriminatory accuracy and risk stratification. These findings support the potential of extending TACS1-8 to the risk assessment of other cancer types.

It is well known that microscopic changes often precede macrostructural changes and usually occur at an early stage of disease, making them difficult to observe27. TACS 1–8 focus mainly on the macroscopic morphology of collagen fibers in the tumor microenvironment, but collagen microscopic features such as textural features also play a crucial role in biomedical research15,28,29,30. In this work, a computer-aided image processing technique was therefore used to automatically extract 142 microscopic features of collagen fibers (M-TACS) from SHG images, including morphologic features, histogram-based features, gray-level co-occurrence matrix (GLCM)-based features, and Gabor wavelet transformation features. An M-TACS-score, which is also an independent prognostic indicator in PDAC, was obtained by selecting the most robust features using LASSO regression analysis, where the Gabor wavelet transformation features were given the highest weights, highlighting their crucial role in the score system. Furthermore, statistical analyses show that the full model combining the two scores (TACS-score and M-TACS-score) with the CLI model further improves prognostic prediction in patients with PDAC, emphasizing the important role of collagen characteristics in the tumor microenvironment. This work has several limitations: first, the sample size is small; second, this is a single-center retrospective study; and third, interobserver variability may exist owing to manual annotation. In the future, more patients should be enrolled, and the results need to be further validated on multi-center datasets. Deep learning methods could also be explored for automatically identifying TACS patterns or extracting collagen features.

Conclusion

In this work, MPM combined with an automatic image processing method was used for the label-free study of collagen features in the PDAC tumor microenvironment, and two optical biomarkers, TACS-score and M-TACS-score, were presented for improving prognostic prediction in patients with PDAC. Statistical analyses show that TACS-score and M-TACS-score are independent prognostic factors for overall survival (OS) in PDAC patients and perform well in discrimination and risk stratification. The combination of multiphoton imaging with computer-aided image processing may offer a new approach for studying PDAC and providing more effective prognostic information.