Development of a serum protein biomarker panel for the diagnosis of pancreatic ductal adenocarcinoma using a machine learning approach

Shin, Dong Woo; Cho, Je-Yoel; Cho, Sukki; Youn, Yuna; Hwang, Jin-Hyeok

doi:10.1038/s41598-025-19631-1

Article
Open access
Published: 13 October 2025

Dong Woo Shin^1,2,
Je-Yoel Cho³,
Sukki Cho^4,5,
Yuna Youn⁶ &
…
Jin-Hyeok Hwang^6,7

Scientific Reports volume 15, Article number: 35659 (2025)

Abstract

Early detection of pancreatic ductal adenocarcinoma (PDA) remains a major clinical challenge due to the lack of reliable biomarkers. We developed and validated a machine learning (ML)-based serum protein biomarker panel to enhance PDA diagnosis. Serum concentrations of 47 protein biomarkers were measured in 355 individuals using a Luminex™ bead-based immunoassay. Multiple ML algorithms were employed to construct a diagnostic model, with SHapley Additive exPlanations (SHAP) analysis used to determine the importance of each biomarker. The diagnostic performance of the panel was assessed by the area under the receiver operating characteristic curve (AUROC), F1 score, sensitivity, specificity, and accuracy, and further validated in an independent cohort of 130 individuals. Among the tested models, CatBoost demonstrated the highest diagnostic accuracy. SHAP analysis identified CA19-9, GDF15, and suPAR as key biomarkers, and the combined panel significantly outperformed CA19-9 alone in detecting PDA across all stages (AUROC 0.992 vs. 0.952) and in early-stage PDA (AUROC 0.976 vs. 0.868). Validation in another cohort confirmed the robustness of the model, with AUROC values of 0.977 for all stages and 0.987 for early-stage PDA. These findings suggest that ML-integrated biomarker panels may enable earlier and more accurate PDA detection in clinical practice.

Introduction

Pancreatic ductal adenocarcinoma (PDA) is a highly aggressive and heterogeneous malignancy associated with substantial clinical and economic burdens¹. Early detection is critical for improving patient prognosis, as the 5-year survival rate (5YSR) declines sharply with disease progression: 44.0% for localized disease, 16.2% for regional spread, and only 3.1% for metastatic PDA². Despite palliative chemotherapy regimens, such as FOLFIRINOX or nab-paclitaxel combined with gemcitabine, patients with metastatic PDA typically survive for < 12 months^3,4. Surgical resection remains the only potentially curative treatment; however, only 10–20% of newly diagnosed patients with PDA are eligible owing to late-stage presentation^5,6. In contrast, patients diagnosed at an early stage and treated with adjuvant chemotherapy (e.g., modified FOLFIRINOX) can achieve a median survival of 54.4 months⁷. Among those with favorable prognostic factors, including complete resection (R0) and absence of lymph node metastasis (pN0), the 5YSR increases to 38.2%⁸. Given the aggressive nature of PDA, early detection and timely intervention are essential for improving survival rates⁹.

Early PDA diagnosis remains challenging because of several factors, including the lack of highly sensitive and specific screening biomarkers, vague early symptoms (e.g., epigastric pain, obstructive jaundice, weight loss), rapid disease progression, and the pancreas’s concealed anatomical location¹⁰. Carbohydrate antigen 19-9 (CA19-9) is the most widely used serum biomarker for PDA, with a pooled sensitivity of 79% (range: 70–90%) and specificity of 82% (range: 68–91%)¹¹. However, its low positive predictive value limits its utility for screening asymptomatic individuals¹². Recent advances in protein biomarker research have identified several promising candidates for diagnosing PDA¹³. Furthermore, machine learning (ML) provides a powerful approach for integrating multi-omics data (including genomics, transcriptomics, epigenomics, and proteomics) to identify optimal biomarker combinations for early detection¹⁴. Given the complex biology of PDA, ML-driven biomarker discovery is essential for addressing the limitations of traditional single-marker approaches and advancing early detection strategies. We aimed to develop an ML-based serum protein biomarker panel to improve the diagnostic accuracy of PDA, particularly in its early stages.

Materials and methods

Study design and cohort selection

Two independent cohorts were analyzed: Cohort A, comprising 355 individuals (174 with PDA and 181 healthy controls), served as the development set for identifying potential biomarkers and constructing an optimal biomarker panel. Cohort B, comprising 130 individuals (100 with PDA and 30 healthy controls), served as the validation set to assess the diagnostic performance and generalizability of the developed panel.

Biobank resources

Serum samples for Cohort A were obtained from the Human Bioresource Center of Seoul National University Bundang Hospital, while those for Cohort B were collected from regional university hospital–based biobanks in Daegu, Republic of Korea.

Inclusion criteria and sample collection

Patients aged ≥ 18 years with a histologically confirmed diagnosis of PDA were included. Blood samples were collected before any therapeutic intervention to ensure that biomarker levels reflected the untreated disease state. Control samples were collected from individuals with no history of malignancy.

Ethical approval and data confidentiality

This study adhered to the ethical principles outlined in the Declaration of Helsinki. The protocol was approved by the Institutional Review Board (IRB) of Seoul National University Bundang Hospital (IRB number X-1909-564-901). All participants provided written informed consent before sample collection. Additionally, the study complied with the Health Insurance Portability and Accountability Act regulations to ensure the confidentiality and security of participant data.

Study workflow

The study comprised six key steps (Fig. 1):

a.
Biomarker quantification: Serum levels of 47 candidate protein biomarkers were measured in Cohort A using Luminex™ bead-based multiplex immunoassays.
b.
ML-based biomarker analysis: Multiple ML algorithms analyzed serum biomarker data to identify key features associated with PDA classification.
c.
Model training and validation: A five-fold cross-validation approach was used to evaluate model performance. The dataset was divided into five equal folds, with the model trained on four folds and validated on the remaining fold. This process was repeated five times, and final performance metrics were calculated by averaging the results¹⁵.
d.
Feature importance analysis: SHapley Additive exPlanations (SHAP) were used to assess each biomarker’s contribution to the model’s predictions¹⁶. This method provided an interpretable ranking of biomarkers, highlighting proteins that significantly influenced PDA classification performance.
e.
Development of a biomarker panel: An optimal biomarker panel was constructed based on features selected through ML. Its diagnostic performance was evaluated using multiple metrics, including the area under the receiver operating characteristic curve (AUROC), F1 score, sensitivity, specificity, and accuracy.
f.
Independent validation: The developed biomarker panel was validated using an independent dataset (Cohort B) to assess its diagnostic accuracy and generalizability.

Serum protein biomarker quantification

Luminex™ multiplex assay procedure

Serum protein biomarkers were quantified using the Luminex™ 200 system (Austin, TX), a high-throughput multiplex platform that enables simultaneous analysis of multiple analytes within a single sample^17,18. The assay protocol followed these steps:

a.
Prewetting and plate preparation: Each well was prewetted with 100 µL of wash buffer and incubated for 10 min.
b.
Reagent loading: After removing the wash buffer, 25 µL of each standard, quality control sample, and assay buffer were added to the designated wells, followed by 25 µL of the matrix solution, following the manufacturer’s recommendations¹⁹.
c.
Bead incubation: A total of 25 µL of fluorescently labeled beads, conjugated with target-specific antibodies, was added to each well. Plates were incubated overnight at 4 °C on a plate shaker to facilitate antigen-antibody binding.
d.
Detection antibody binding: After two washes with 200 µL of wash buffer, 25 µL of biotinylated detection antibodies were added and incubated for 1 h at room temperature on a plate shaker.
e.
Signal amplification: 25 µL of streptavidin-phycoerythrin was added and incubated for 30 min at room temperature. The plate was subsequently washed twice with 200 µL of wash buffer to remove unbound reagents.
f.
Fluorescence detection: 100 µL of sheath fluid was added to each well, and the beads were resuspended on a plate shaker for 5 min at room temperature. Fluorescence intensity was measured using the Luminex xPONENT™ software, and biomarker concentrations were calculated using SoftMax Pro (version 5.4), applying a five-parameter logistic regression curve to logarithmically transformed data.
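The back-calculation of concentrations from a five-parameter logistic (5PL) standard curve can be illustrated with a minimal sketch. The parameterization below is one common form of the 5PL model, and the parameter values are hypothetical placeholders rather than values fitted in this study; the sketch also works on raw values and omits the logarithmic transformation mentioned above.

```python
def five_pl(conc, a, b, c, d, g):
    """Five-parameter logistic curve: expected signal at a given concentration.

    a: response at zero concentration
    d: response at infinite concentration
    c: concentration scale of the mid-range
    b: slope factor
    g: asymmetry factor (g = 1 reduces to the four-parameter model)
    """
    return d + (a - d) / (1.0 + (conc / c) ** b) ** g


def inverse_five_pl(signal, a, b, c, d, g):
    """Back-calculate a concentration from a measured signal."""
    low, high = min(a, d), max(a, d)
    if not low < signal < high:
        raise ValueError("signal lies outside the range of the fitted curve")
    ratio = (a - d) / (signal - d)
    return c * (ratio ** (1.0 / g) - 1.0) ** (1.0 / b)


# Hypothetical fitted parameters (illustrative only, not from the study).
PARAMS = dict(a=50.0, b=1.2, c=300.0, d=20000.0, g=0.9)

# Round trip: simulate the signal for a known concentration, then recover it.
signal = five_pl(150.0, **PARAMS)
recovered = inverse_five_pl(signal, **PARAMS)
```

In practice the five parameters would first be estimated from the standards on each plate; the inverse function is then applied to each sample's fluorescence reading.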

Forty-seven candidate protein biomarkers

This study employed bead-based xMAP™ immunoassays to analyze 47 circulating proteins, which were categorized into six distinct panels: (a) Human angiogenesis/growth factor panel 1 (Millipore, Billerica, MA; catalog number HANG2MAG-12K), including angiopoietin-2, G-CSF, endoglin, FGF1, and follistatin; (b) Human angiogenesis panel 2 (Millipore, Billerica, MA; catalog number HAGP1MAG-12K), including sAXL, sHER2, sE-selectin, TSP2, sEGFR, suPAR, sVEGFR1, sPECAM-1, and OPN; (c) Human cancer/metastasis biomarker panel 1 (Millipore, Billerica, MA; catalog number HCMBMAG-22K), including GDF15, DKK1, NSE, and OPG; (d) Human circulating cancer biomarker panel 1 (Millipore, Billerica, MA; catalog number HCCBP1MAG-58K), including CA15-3, CA19-9, MIF, leptin, IL-6, CEA, IL-8, HGF, sFas, TNFα, PRL, SCF, Cyfra21-1, FGF2, β-hCG, HE4, TGF-α, and VEGF; (e) Human circulating cancer biomarker panel 3 (Millipore, Billerica, MA; catalog number HCCBP3MAG-58K), including galectin-3, myeloperoxidase, SHBG, IGFBP3, and ferritin; and (f) Human circulating cancer biomarker panel 4 (Millipore, Billerica, MA; catalog number HCCB4MAG-58K), including mesothelin, midkine, kallikrein-6, ALDH1A1, EpCAM, and CD44.

Development of a diagnostic biomarker panel using ML model selection and training (Cohort A)

To develop a diagnostic biomarker panel, multiple ML algorithms were applied across the following categories: (a) Decision tree-based algorithms: Random Forest, Extreme Gradient Boosting (XGBoost), Light Gradient Boosting Machine (LightGBM), and CatBoost; (b) Other classifiers: Support Vector Machine (SVM) and k-Nearest Neighbors (KNN); and (c) Ensemble learning: A combined model integrating multiple classifiers to improve diagnostic accuracy.

The dataset was randomly split into training (80%) and testing (20%) subsets. When the data were randomly split into training and validation sets, stratification by gender and age was applied to ensure balanced distributions across sets and improve comparability. To obtain an unbiased estimate of model performance, five-fold cross-validation was used: the dataset was divided into five equal folds, with four folds used for training and one for validation, and the process was repeated five times so that each fold served as the validation set once. The final model performance was calculated by averaging the results across all iterations. During five-fold cross-validation, the variability in cutoff values across folds was minimal (within ± 2–3%), supporting the stability of the model.
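The fold assignment described above can be sketched in plain Python. This is a simplified illustration, not the study's code: the stratum keys, the function names `stratified_folds` and `cross_validate`, and the synthetic sample list are all assumptions, and including the diagnosis in the stratum key is a choice made here for the example.

```python
import random
from collections import defaultdict


def stratified_folds(strata, k=5, seed=0):
    """Assign each sample index to one of k folds, balancing every stratum.

    strata: one hashable key per sample, e.g. (diagnosis, sex).
    Returns a list of k lists of sample indices.
    """
    rng = random.Random(seed)
    by_stratum = defaultdict(list)
    for idx, key in enumerate(strata):
        by_stratum[key].append(idx)

    folds = [[] for _ in range(k)]
    offset = 0
    for key in sorted(by_stratum):
        members = by_stratum[key]
        rng.shuffle(members)
        # Deal members round-robin, continuing where the previous stratum stopped.
        for j, idx in enumerate(members):
            folds[(offset + j) % k].append(idx)
        offset += len(members)
    return folds


def cross_validate(folds, fit, score):
    """Train on k-1 folds, score on the held-out fold, and average the scores."""
    results = []
    for i, held_out in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        model = fit(train)
        results.append(score(model, held_out))
    return sum(results) / len(results)


# Synthetic example: 100 samples, 25 in each (diagnosis, sex) stratum.
strata = [(dx, sex) for dx in ("PDA", "control") for sex in ("F", "M") for _ in range(25)]
folds = stratified_folds(strata, k=5)
```

Here `fit` and `score` are caller-supplied callables, so any classifier can be plugged in; each sample appears in exactly one validation fold.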

Feature importance analysis using SHAP in cohort A

To assess the importance of each biomarker, SHAP analysis was applied to rank features based on their contribution to the model’s predictive output. The SHAP values provided an interpretable ranking of biomarker significance, facilitating the identification of the most diagnostically relevant biomarkers.
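To make the idea behind SHAP concrete, the sketch below computes exact Shapley values for a toy three-feature scoring function. It does not use the `shap` library or the study's CatBoost model; the function `toy_score`, the zero baseline, and the convention of replacing absent features with baseline values are assumptions chosen for illustration.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for a single sample.

    Features outside a coalition are replaced by their baseline value,
    which is one common convention (assumed here).
    """
    n = len(x)

    def value(coalition):
        mixed = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(mixed)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Weight of a coalition of this size in the Shapley average.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_score(f):
    """Hypothetical score with an interaction between features 1 and 2."""
    return 2.0 * f[0] + 1.0 * f[1] + 0.5 * f[1] * f[2]


x = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_score, x, baseline)
```

The contributions sum to the difference between the prediction for `x` and the prediction for the baseline, which is the additivity property that makes SHAP rankings interpretable. Exact enumeration grows exponentially with the number of features, so practical tools rely on model-specific algorithms or approximations.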

Validation of biomarker panel in independent cohort B

Cohort B comprised a completely independent set of 130 individuals, including 100 patients with PDA and 30 healthy controls, distinct from Cohort A. The diagnostic model was trained exclusively on Cohort A and directly applied to Cohort B without retraining, using the same thresholds. The cutoff values were determined in Cohort A based on the Youden Index, which balances sensitivity and specificity. These thresholds were then fixed and applied unchanged in Cohort B for independent validation.
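The cutoff procedure described above can be sketched as follows: the Youden index J = sensitivity + specificity − 1 is maximized on the development data, and the resulting threshold is then frozen and applied to new data. The scores and labels below are synthetic, and the function names are hypothetical.

```python
def youden_cutoff(scores, labels):
    """Return (cutoff, J) maximizing J = sensitivity + specificity - 1.

    A sample is called positive when score >= cutoff.
    labels: 1 = case, 0 = control.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best_cutoff, best_j = None, -1.0
    for cutoff in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= cutoff)
        tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < cutoff)
        j = tp / n_pos + tn / n_neg - 1.0
        if j > best_j:
            best_cutoff, best_j = cutoff, j
    return best_cutoff, best_j


def classify(scores, cutoff):
    """Apply a fixed cutoff without re-estimating it."""
    return [1 if s >= cutoff else 0 for s in scores]


# Synthetic development-set scores (illustrative only).
dev_scores = [0.05, 0.10, 0.20, 0.35, 0.40, 0.60, 0.70, 0.80, 0.90, 0.95]
dev_labels = [0, 0, 0, 1, 0, 0, 1, 1, 1, 1]
cutoff, j = youden_cutoff(dev_scores, dev_labels)

# The cutoff is reused unchanged on a separate (synthetic) validation set.
val_scores = [0.15, 0.45, 0.85]
val_calls = classify(val_scores, cutoff)
```

Keeping the threshold fixed is what makes the second cohort a genuine test: re-optimizing the cutoff on validation data would inflate the apparent sensitivity and specificity.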

Diagnostic performance metrics

Model classification performance was evaluated using confusion matrices, which quantify true positives (TP), false positives (FP), true negatives (TN), and false negatives (FN). Diagnostic performance was further assessed using the following metrics: (a) AUROC measures the ability of the biomarker panel to distinguish patients with PDA from healthy controls. A value of 1.0 indicates perfect discrimination, while a value of 0.5 suggests no discriminative power; (b) Accuracy reflects the overall proportion of correctly classified individuals. The formula is (TP + TN) / (TP + TN + FP + FN), where TP and TN represent correct classifications, and FP and FN represent misclassifications; (c) Sensitivity (Recall) measures the model’s ability to correctly identify patients with PDA. It is calculated as TP / (TP + FN), where TP represents correctly identified patients with PDA, and FN represents missed patients with PDA; (d) Specificity indicates the ability to correctly classify healthy controls. The formula is TN / (TN + FP), where TN represents correctly identified healthy controls, and FP represents healthy controls misclassified as patients with PDA; (e) Positive predictive value (PPV, Precision) is the proportion of true patients with PDA among those predicted as having PDA. It is calculated as TP / (TP + FP), reflecting the reliability of positive results; (f) Negative predictive value (NPV) measures the proportion of true healthy controls among those predicted as healthy. It is calculated as TN / (TN + FN), reflecting the reliability of negative results; and (g) F1 score balances precision and recall through their harmonic mean. The formula is 2 × (Precision × Recall) / (Precision + Recall), where 1 indicates perfect precision and recall and 0 indicates the worst performance. The diagnostic metrics were visualized using a radar plot to compare the performance of different biomarker panels.
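The formulas above translate directly into code. The sketch below computes the confusion-matrix metrics and a rank-based AUROC in plain Python; the counts and scores are synthetic and the function names are hypothetical.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Compute the confusion-matrix metrics defined in the text."""
    sensitivity = tp / (tp + fn)          # recall
    specificity = tn / (tn + fp)
    ppv = tp / (tp + fp)                  # precision
    npv = tn / (tn + fn)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    f1 = 2 * ppv * sensitivity / (ppv + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "accuracy": accuracy,
        "f1": f1,
    }


def auroc(scores, labels):
    """Probability that a random case scores above a random control.

    Ties count as one half. labels: 1 = case, 0 = control.
    """
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in cases for q in controls)
    return wins / (len(cases) * len(controls))


# Synthetic counts and scores (illustrative only).
metrics = diagnostic_metrics(tp=90, fp=5, tn=95, fn=10)
area = auroc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])
```

The pairwise AUROC loop is quadratic in the sample size, which is acceptable for cohorts of a few hundred individuals; a sorting-based version would scale better.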

Statistical analyses

Categorical variables were analyzed using the chi-squared or Fisher’s exact test, while continuous variables were assessed using the Student’s t-test. Continuous data were presented as means ± standard deviations, while categorical variables were expressed as frequencies (percentages). Statistical significance was determined using predefined thresholds: *p-value < 0.05 was considered statistically significant, suggesting the results were unlikely to occur by chance; **p-value < 0.01 indicated high significance, providing stronger evidence against the null hypothesis; ***p-value < 0.001 was considered strongly significant, reflecting very high confidence in the results; and ****p-value ≤ 0.0001 was considered extremely significant, indicating exceptional statistical certainty. Comparisons of diagnostic performance between models were conducted using appropriate statistical tests. DeLong’s test was employed to compare AUROCs, while McNemar’s test was applied for paired comparisons of sensitivity, specificity, PPV, and NPV. All statistical analyses were conducted using SPSS version 25.0 (IBM Corporation, Chicago, IL, USA) and R version 3.4.2 (R Foundation for Statistical Computing). ML analyses were conducted in Python v3.12.4 using libraries for data preprocessing, model development, and performance evaluation. Figures were generated using GraphPad Prism 8.0 (GraphPad Software, La Jolla, CA, USA).
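As an illustration of a paired comparison, the sketch below implements the exact (binomial) form of McNemar's test from the two discordant counts. Whether the study used the exact or the chi-squared form is not stated, so the exact form is an assumption here, and the function name and counts are hypothetical.

```python
from math import comb


def mcnemar_exact(b, c):
    """Two-sided exact McNemar p-value from the discordant pair counts.

    b: pairs classified correctly by test A only
    c: pairs classified correctly by test B only
    Under the null hypothesis, each discordant pair is equally likely
    to favor either test, so b follows Binomial(b + c, 0.5).
    """
    n = b + c
    if n == 0:
        return 1.0
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Synthetic example: among cases, 1 is detected only by the single marker
# and 9 are detected only by the panel.
p_value = mcnemar_exact(1, 9)
```

For a sensitivity comparison the pairs are restricted to patients with disease, and for specificity to controls; concordant pairs do not enter the statistic.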

Results

Baseline characteristics of the study cohorts

The baseline demographic and clinical characteristics of patients in Cohorts A and B are summarized in Supplementary Table 1. Cohort A comprised 355 individuals, including 174 patients diagnosed with PDA and 181 healthy controls. Among the patients with PDA, 87 (50.0%) were men, with a median age of 65.0 years. The disease stages were distributed as follows: stage I (19 patients, 10.9%), stage II (23 patients, 13.2%), stage III (48 patients, 27.6%), and stage IV (84 patients, 48.3%). Tumors were most frequently found in the pancreatic head/uncinate process (75 patients, 43.1%), followed by the pancreatic tail (51 patients, 29.3%), and the pancreatic body (48 patients, 27.6%). The median tumor size at diagnosis was 38.9 mm (± 17.1 mm), with tumor size significantly increasing with disease progression (stage I: 23.7 mm ± 10.3 mm; stage II: 33.7 mm ± 12.6 mm; stage III: 37.0 mm ± 15.4 mm; stage IV: 44.8 mm ± 17.6 mm; p < 0.001).

Cohort B, serving as an independent validation cohort, comprised 130 individuals, including 100 patients with PDA and 30 healthy controls. The disease stages were distributed as follows: stage II (54 patients, 54.0%), stage III (17 patients, 17.0%), and stage IV (29 patients, 29.0%). Tumors were most frequently found in the pancreatic head/uncinate process (53 patients, 53.0%), followed by the pancreatic body (27 patients, 27.0%) and pancreatic tail (20 patients, 20.0%). The median tumor size at diagnosis was 42.2 mm (± 25.4 mm).

Differential expression of proteomic biomarkers in patients with PDA and healthy controls (Cohort A)

Of the 47 biomarkers analyzed, 12 biomarkers (sHER2, sE-selectin, sVEGFR1, sPECAM-1, FGF-1, NSE, FGF2, β-hCG, HE4, ALDH1A1, EpCAM, and IGFBP3) showed no significant differences when comparing PDA patients with healthy controls. The remaining 35 biomarkers exhibited significant differential expression, suggesting their potential as discriminatory markers for PDA detection. Specifically, biomarkers were categorized according to significance levels as follows: (1) P ≤ 0.05: G-CSF, DKK-1, MIF, Leptin, VEGF, CD44, SHBG; (2) P ≤ 0.01: sAXL, sEGFR, CA15-3, CEA, sFas, TNFα, Cyfra21-1; (3) P ≤ 0.001: Endoglin, Prolactin, Kallikrein-6, Galectin-3, Ferritin; (4) P ≤ 0.0001: TSP-2, suPAR, OPN, Angiopoietin-2, Follistatin, GDF15, OPG, CA19-9, IL-6, IL-8, HGF, SCF, TGFα, Mesothelin, Midkine, Myeloperoxidase (Fig. 2a, b; Supplementary Table 2 and Supplementary Fig. 1).

Notably, CA19-9, GDF15, suPAR, HGF, and IL-8 were significantly higher in patients with PDA than in healthy controls across all disease stages, reinforcing their diagnostic potential (Fig. 2c, d). CA19-9 levels were significantly higher in the advanced PDA group than in the early group. However, levels of suPAR, GDF15, HGF, and IL-8 showed no significant differences between the early- and advanced-stage PDA groups.

ML-based biomarker panel development for diagnosing all-stage PDA (Cohort A)

The CatBoost model achieved the highest diagnostic accuracy with an AUROC of 0.993 (Fig. 3a, b; Supplementary Fig. 2). In contrast, SVM and KNN demonstrated poorer diagnostic performance (AUROC 0.904 and 0.850, respectively) with higher rates of false-positive and false-negative results, revealing limitations in differentiating between patients with PDA and healthy controls. The ensemble model combining CatBoost, LightGBM, and XGBoost also achieved an AUROC of 0.993, indicating no improvement over the single CatBoost classifier.

SHAP analysis using the CatBoost model identified CA19-9 as the most influential biomarker (Fig. 3c–f), followed by IL-8, GDF15, suPAR, and HGF. These biomarkers remained top contributors across Random Forest, XGBoost, and LightGBM models, confirming their significance in optimizing diagnostic performance. The diagnostic performance of the biomarker combination panels was further evaluated using the CatBoost model. The combination of CA19-9, GDF15, and suPAR outperformed CA19-9 alone in diagnosing all-stage PDA, achieving a higher AUROC (0.992 [0.985–0.997] vs. 0.952 [0.919–0.973]; p = 0.001, DeLong’s test), sensitivity (95.4 [91.5–98.4]% vs. 87.4 [82.2–91.6]%; p < 0.001, McNemar’s test), and PPV (97.1 [94.4–99.4]% vs. 94.4 [90.8–97.4]%; p < 0.001, McNemar’s test) (Table 1, Fig. 5a).

Table 1 Diagnostic performance of biomarker combination panels for diagnosing PDA in cohort A.

ML-based biomarker panel development for diagnosing early-stage (stage I/II) PDA (Cohort A)

The CatBoost model achieved the highest diagnostic performance with an AUROC of 0.981, demonstrating superior sensitivity and specificity while reducing FN and improving diagnostic precision. LightGBM (AUROC: 0.981) and XGBoost (AUROC 0.978) performed well, though they exhibited slightly higher FN rates than those of CatBoost. In contrast, SVM (AUROC 0.891) and KNN (AUROC 0.777) demonstrated suboptimal performance, suggesting limitations in capturing complex biomarker interactions for early-stage PDA detection. An ensemble model combining CatBoost, LightGBM, and XGBoost achieved an AUROC of 0.982, which represented only a negligible improvement compared to the CatBoost model alone (AUROC 0.981) in distinguishing early-stage PDA from healthy controls (Fig. 4a, b).

SHAP analysis using a CatBoost model identified CA19-9 as the most influential biomarker, followed by IL-8, suPAR, and GDF15, which provided strong complementary value in improving diagnostic accuracy (Fig. 4c–f). These biomarkers consistently ranked among the top predictors across multiple ML models, highlighting their significance in early-stage PDA detection. The diagnostic performance of the biomarker combination panels was further evaluated using the CatBoost model. The CA19-9, GDF15, and suPAR panel significantly outperformed CA19-9 alone in early-stage PDA detection, achieving superior AUROC (0.976 [0.957–0.994] vs. 0.868 [0.774–0.933]), sensitivity (85.7 [72.5–93.9]% vs. 66.7 [52.9–78.6]%), and PPV (90.0 [81.3–98.9]% vs. 84.8 [72.7–95.7]%), thereby improving diagnostic accuracy and reducing FNs (Table 1; Fig. 5b).

Subgroup analysis in patients with normal CA19-9 levels (≤ 37 U/mL) (Cohort A)

The distribution of CA19-9 levels (mean ± standard deviation, U/mL) by tumor stage was as follows: stage I, 255.3 ± 532.3; stage II, 285.0 ± 436.9; stage III, 292.0 ± 520.6; and stage IV, 1373.2 ± 2145.8 (Supplementary Table 2). Among the 174 patients with PDA in Cohort A, 25 (14.4%) had normal CA19-9 levels (≤ 37 U/mL). In this subgroup, CA19-9 alone showed limited diagnostic performance (AUROC 0.715, 95% CI: 0.590–0.808). By contrast, the combination of CA19-9, GDF15, and suPAR achieved markedly improved performance (AUROC 0.948, 95% CI: 0.910–0.979) (Supplementary Table 4).

Diagnostic performance of biomarker combination panels in stage I PDA patients (Cohort A)

Among the 174 patients with PDA in Cohort A, only 19 (10.9%) were diagnosed at stage I. Subgroup analysis restricted to stage I PDA patients demonstrated that CA19-9 alone provided limited diagnostic accuracy (AUROC 0.724, 95% CI: 0.538–0.914). By contrast, the biomarker panel consisting of CA19-9, GDF15, and suPAR achieved markedly improved performance, with an AUROC of 0.949 (95% CI: 0.906–0.989). These findings indicate that the multi-marker panel substantially outperformed CA19-9 alone for the early detection of PDA at stage I (Supplementary Table 5).

Age-stratified diagnostic performance of the biomarker panel (Cohort A)

Subgroup analyses stratified by age demonstrated that the biomarker panel retained robust diagnostic performance across both younger (< 65 years) and older (≥ 65 years) patients. For individuals < 65 years, the combination of CA19-9, GDF15, and suPAR achieved an AUROC of 0.990 (95% CI: 0.984–0.996), sensitivity of 90.9% (95% CI: 85.2–96.2), and specificity of 96.7% (95% CI: 94.6–99.2). In patients ≥ 65 years, the same panel achieved an identical AUROC of 0.990 (95% CI: 0.978–0.999), with sensitivity of 94.2% (95% CI: 89.5–98.7) and specificity of 98.9% (95% CI: 97.7–100.0). These findings indicate that the biomarker panel outperformed CA19-9 alone across all age groups, with no evidence of effect modification by age (Supplementary Table 6).

Gender-stratified diagnostic performance of the biomarker panel (Cohort A)

We further assessed diagnostic performance stratified by gender. The biomarker panel maintained robust accuracy in both male and female patients. In males, the combination of CA19-9, GDF15, and suPAR achieved an AUROC of 0.996 (95% CI: 0.992–0.998), sensitivity of 90.8% (95% CI: 85.7–95.7), and specificity of 98.3% (95% CI: 96.2–100.0). In females, the same panel demonstrated an AUROC of 0.987 (95% CI: 0.978–0.997), sensitivity of 89.7% (95% CI: 84.8–95.1), and specificity of 97.8% (95% CI: 96.2–99.5). These findings indicate that the biomarker panel outperformed CA19-9 alone across both sexes, with no evidence of effect modification by gender (Supplementary Table 7).

Validation of the diagnostic performance of the serum protein biomarker panel (Cohort B)

Serum protein biomarker levels in patients with PDA (by stage) and the healthy control group are presented in Supplementary Table 3. In all-stage PDA diagnosis, the combined CA19-9, GDF15, and suPAR biomarker panel outperformed CA19-9 alone, with higher AUROC (0.977 [0.954–0.991] vs. 0.829 [0.756–0.877]), sensitivity (95.0 [90.2–98.6]% vs. 83.0 [74.7–89.0]%), and PPV (93.1 [88.8–97.6]% vs. 83.8 [76.3–89.0]%). Similarly, for early-stage (stage I/II) PDA detection, the combined panel achieved superior performance compared with CA19-9 alone, with higher AUROC (0.987 [0.970–0.999] vs. 0.879 [0.806–0.937]), sensitivity (96.3 [90.8–100.0]% vs. 81.5 [72.8–91.0]%), and PPV (91.2 [84.0–98.2]% vs. 84.6 [75.6–92.8]%) (Table 2; Fig. 5c, d; Supplementary Figs. 3–4).

Table 2 Diagnostic performance of biomarker combination panels for PDA in cohort B.

Discussion

This study developed an ML-based serum biomarker panel to improve PDA diagnosis by analyzing 47 candidate proteins. SHAP analysis identified five key biomarkers (CA19-9, GDF15, suPAR, HGF, and IL-8) as the most relevant for distinguishing individuals with PDA from healthy controls. Among the ML models evaluated, CatBoost demonstrated the highest diagnostic performance. In Cohort A (n = 355), a diagnostic panel comprising CA19-9, GDF15, and suPAR demonstrated significantly better performance than that of CA19-9 alone for both all-stage (AUROC 0.992 vs. 0.952) and early-stage PDA (AUROC: 0.976 vs. 0.868). These findings were independently validated in Cohort B (n = 130), where the combined panel consistently outperformed CA19-9 alone in both all-stage (0.977 vs. 0.829) and early-stage PDA (0.987 vs. 0.879).

Early diagnosis of PDA remains a significant challenge because of its low incidence, rapid progression, nonspecific early symptoms, and absence of reliable biomarkers^20,21. Therefore, most patients are diagnosed at an advanced stage, significantly limiting treatment options and leading to poor prognosis⁹. PDA is an aggressive malignancy with a poor prognosis and is projected to become the second leading cause of cancer-related deaths by 2030^22,23. Surgical intervention significantly improves survival rates, particularly in early-stage cancers²⁴. Notably, the 5YSR for stage IA PDA has improved markedly, from 44.7% in 2004 to 83.7% in 2012²⁵. These trends underscore the critical importance of early detection of potentially curable PDA to improve long-term survival outcomes²⁶.

CA19-9, first identified by Koprowski in 1979, is the most widely used and validated biomarker for PDA. It demonstrates moderate diagnostic performance, with a sensitivity of 79% (70–90%) and specificity of 82% (68–91%)^11,27,28. However, the United States Preventive Services Task Force²⁹, American Society of Clinical Oncology³⁰, and European Group on Tumor Markers³¹ recommend against using CA19-9 for routine PDA screening in the general population because of its diagnostic limitations. To address these challenges, ongoing multi-omics studies are focused on identifying more reliable biomarkers for PDA⁹. Moreover, considerable progress has been made in the discovery, optimization, and clinical validation of diagnostic biomarkers derived from various biological sources, including blood (e.g., extracellular vesicles^32,33, circulating tumor DNA [ctDNA] or cell-free DNA [cfDNA]^34,35, mRNA, microRNA [miR]³⁶, small nuclear RNA³⁷, long noncoding RNA³⁸, proteins³⁹, and metabolites⁴⁰) and pancreatic fluid⁴¹. Recent advances in liquid biopsy-based technologies present a promising complement to traditional cancer screening by enabling the detection of multiple cancer types from a single blood sample. While assays based on cfDNA methylation, such as PATHFINDER and Galleri, demonstrate high specificity in detecting PDA, they are limited by low sensitivity for early-stage disease^42,43. In PDA, ctDNA represents only a small fraction of the total cell-free DNA circulating in the bloodstream, which complicates its detection and reduces its accuracy as a diagnostic tool⁴⁴. In contrast, proteins, being terminal products of the central dogma, integrate genomic, transcriptomic, and post-translational modifications, providing valuable insights into the dysregulated pathways driving PDA progression¹³.
Given the high molecular complexity and heterogeneity of PDA, proteomic analysis facilitates the identification of diverse proteins and their modifications, offering a detailed understanding of the molecular landscape of the disease^45,46. Serum proteins, such as CA19-9, GDF15, and suPAR, demonstrate substantially higher abundance in biofluids than that of ctDNA, enabling the detection of localized tumors with a smaller disease burden. CancerSEEK’s multi-analyte approach, combining protein markers with ctDNA analysis, achieved 70% sensitivity for early-stage gastrointestinal cancers by leveraging the biological amplification inherent to secretory pathways⁴⁷. This highlights the complementary role of protein biomarkers in overcoming the limitations of cfDNA-based methods while also capturing the dynamic tumor-stroma interactions critical for PDA pathogenesis.

We selected the Luminex™ bead-based immunoassay as the primary detection platform for its ability to simultaneously quantify multiple protein biomarkers in small serum volumes (25 µL per assay) with high analytical sensitivity and specificity. Compared to mass spectrometry, Luminex offers a shorter turnaround time, lower per-sample cost, and operational feasibility in routine clinical laboratories. Importantly, several FDA-cleared Luminex assays are currently in clinical use, supporting its translational potential. These characteristics make the platform well-suited for large-scale, multi-center validation studies and for eventual integration into high-throughput clinical screening workflows. At this discovery stage, we cannot determine the exact cost of the proposed panel since two components (GDF15 and suPAR) are not yet commercially available as diagnostic kits. The final per-test cost will depend on future commercialization, production scale, and supply chain factors.

In addition to our findings, several large-scale studies have evaluated biomarker panels for PDA in independent validation cohorts. Athanasiou et al. validated a serum biomarker panel for PDA in a large multicenter cohort of over 600 patients, highlighting the importance of extensive validation for clinical translation⁴⁸. Palma et al. demonstrated robust diagnostic performance of a proteomics-based biomarker panel for PDA in a large validation cohort, further supporting the utility of multi-marker approaches⁴⁹. Other studies have identified several novel protein biomarkers for PDA diagnosis, including tissue inhibitor of metalloproteinases 1 (TIMP1), leucine-rich alpha-2-glycoprotein 1 (LRG1), carcinoembryonic antigen (CEA), hepatocyte growth factor (HGF), osteopontin, thrombospondin-2 (THBS2), polymeric immunoglobulin receptor (PIGR), von Willebrand factor (vWF), and fibrinogen. Capello et al. analyzed plasma samples from 187 patients with PDA and 169 controls, demonstrating that a biomarker panel combining TIMP1, LRG1, and CA19-9 significantly improved early-stage PDA detection (AUROC: 0.949, 84% sensitivity at 95% specificity) with better performance than that of CA19-9 alone⁵⁰. Moreover, Cohen et al. reported that a multi-analyte liquid biopsy approach, integrating four protein biomarkers (CA19-9, CEA, HGF, and osteopontin) with Kirsten rat sarcoma viral oncogene homolog (KRAS) mutant circulating tumor DNA, detected resectable PDA with 64% sensitivity and 99% specificity in a cohort of 221 patients and 182 controls⁵¹. Kim et al. demonstrated that combining plasma THBS2 with CA19-9 significantly improved diagnostic performance, achieving an AUROC of 0.949, with sensitivity of 84% and specificity of 95% in a large cohort (n = 537)⁵². Similarly, Byeon et al. reported that the combination of PIGR, vWF, fibrinogen, and CA19-9 achieved higher diagnostic performance for early-stage PDA (AUROC: 0.979) than that of CA19-9 alone (AUROC: 0.911)⁵³. 
Consistent with previous research, our biomarker panel (CA19-9, GDF15, and suPAR) significantly outperformed CA19-9 alone, with AUROC values of 0.992 in Cohort A and 0.977 in Cohort B for all stages. This improvement can be attributed to the complementary nature of the biomarkers, which reflect distinct aspects of tumor biology and thereby cover more of the inherent heterogeneity of PDA than any single marker can.
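As a reading aid for the AUROC comparisons above, the following minimal sketch computes AUROC as the fraction of case-control pairs in which the case receives the higher score, with ties counted as one half. It is an illustration only: the function name `auroc`, the variable names, and all scores are hypothetical and synthetic, and this is not the analysis code used in the study.

```python
from typing import Sequence


def auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUROC as the probability that a random case outscores a random control.

    Ties count as half a win (the Mann-Whitney U formulation).
    """
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))


# Synthetic toy scores, NOT study data: 1 = PDA, 0 = control.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
single_marker = [0.9, 0.8, 0.3, 0.6, 0.4, 0.2, 0.5, 0.1]
panel_score = [0.9, 0.8, 0.6, 0.7, 0.4, 0.2, 0.5, 0.1]

print(auroc(single_marker, labels))
print(auroc(panel_score, labels))
```

On these toy values the single-marker score misorders 2 of the 16 case-control pairs (AUROC 0.875), whereas the combined score orders every pair correctly (AUROC 1.0).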

Growth differentiation factor 15 (GDF15, also written GDF-15), also known as macrophage inhibitory cytokine-1 (MIC-1), is a member of the transforming growth factor-beta (TGF-β) superfamily^{54,55}. GDF15 is most highly expressed in the placenta and prostate, with lower expression in the pancreas, liver, gallbladder, colon, stomach, bladder, kidney, and endometrium^{54,56}. Notably, previous studies found significantly higher serum GDF15 levels in patients with PDA than in healthy controls and those with chronic pancreatitis (7694.6 vs. 2247.9 pg/mL), highlighting its potential as a diagnostic biomarker^{57,58,59,60}. Beyond its diagnostic utility, GDF15 plays a role in immune evasion in PDA by suppressing macrophage activity through NF-κB-mediated inhibition of TNF and nitric oxide production, thereby contributing to tumor progression^{54,61,62}. Numerous studies have demonstrated that combining GDF15/MIC-1 with CA19-9 significantly improves both sensitivity and specificity for PDA diagnosis^{63,64,65}. A meta-analysis of 14 studies involving 2,826 participants reported that MIC-1 had higher sensitivity for PDA diagnosis than CA19-9 (80% vs. 71%)⁶⁶. Furthermore, serum MIC-1 levels were more effective than CA19-9 in distinguishing patients with resectable PDA from control patients^{67,68}. These findings underscore the potential of GDF15 as a valuable biomarker for PDA detection, particularly when used alongside CA19-9.

The urokinase plasminogen activator receptor (uPAR) binds urokinase-type plasminogen activator (uPA), promoting the conversion of plasminogen to plasmin and thereby driving fibrin degradation, tumor invasion, and cancer progression^{69,70,71,72,73}. While uPA and uPAR are minimally expressed in normal tissues, they are markedly overexpressed in malignancies, including PDA, in which uPA and uPAR levels are sixfold and fourfold higher, respectively, than in normal tissue^{74,75,76,77,78,79,80,81,82}. Soluble uPAR (suPAR), the circulating form of membrane-bound uPAR, plays a crucial role in cancer progression, immune activation, and inflammation⁸³. Plasma suPAR levels are also significantly higher in patients with PDA than in those with chronic pancreatitis. At a cutoff value of 2.8 ng/mL, suPAR showed a sensitivity of 88% and a specificity of 70%⁸⁴, whereas at a cutoff value of 3.2 ng/mL it showed a sensitivity of 82% and a specificity of 43%. However, when suPAR was combined with CA19-9, the specificity improved significantly to 86–88%. In another study, combining suPAR and CA19-9 further enhanced the ability to differentiate between patients with PDA and healthy controls, achieving 88.5% sensitivity and 98% specificity⁸⁵.
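Cutoff-dependent sensitivity and specificity figures such as those quoted above come from a simple confusion-matrix calculation. The sketch below assumes that values at or above the cutoff are called positive. The helper `sens_spec` is hypothetical and the concentrations are synthetic; none of the numbers are taken from the cited studies.

```python
from typing import Sequence, Tuple


def sens_spec(
    values: Sequence[float], labels: Sequence[int], cutoff: float
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) when values >= cutoff are called positive."""
    pairs = list(zip(values, labels))
    tp = sum(1 for v, y in pairs if y == 1 and v >= cutoff)  # cases flagged
    fn = sum(1 for v, y in pairs if y == 1 and v < cutoff)   # cases missed
    tn = sum(1 for v, y in pairs if y == 0 and v < cutoff)   # controls cleared
    fp = sum(1 for v, y in pairs if y == 0 and v >= cutoff)  # controls flagged
    return tp / (tp + fn), tn / (tn + fp)


# Synthetic suPAR-like concentrations in ng/mL, NOT study data.
supar = [4.1, 3.5, 3.0, 2.5, 3.3, 2.9, 2.0, 1.8]
labels = [1, 1, 1, 1, 0, 0, 0, 0]  # 1 = PDA, 0 = control

for cutoff in (2.8, 3.2):
    sens, spec = sens_spec(supar, labels, cutoff)
    print(f"cutoff {cutoff} ng/mL: sensitivity {sens:.2f}, specificity {spec:.2f}")
```

Within a single dataset, raising the cutoff can only lower sensitivity or leave it unchanged, and can only raise specificity or leave it unchanged. In this toy example the pair moves from (0.75, 0.50) at 2.8 ng/mL to (0.50, 0.75) at 3.2 ng/mL.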

Linear models, such as generalized linear models, were initially considered but performed worse, likely because of the complex, nonlinear interactions among biomarkers. Therefore, tree-based ML algorithms (CatBoost, LightGBM, and XGBoost) were selected for their ability to capture such nonlinear relationships. ML methods, including feature selection and classification algorithms, significantly enhance biomarker identification by efficiently extracting relevant features and classifying samples⁸⁶. In our study, several ML techniques (CatBoost, XGBoost, LightGBM, SVM, KNN, and Random Forest) were compared to identify an optimal biomarker combination for PDA diagnosis. Among these methods, the tree-based models, particularly CatBoost, achieved the highest AUROC, accuracy, sensitivity, and specificity. One key challenge in ML is the ‘black box’ nature of certain algorithms, which limits their interpretability. To address this, we used SHAP analysis to quantify the contribution of each biomarker, generating a ranked list and illustrating each marker’s influence on PDA classification. This approach enhanced the model’s transparency, supporting the clinical translation of our findings and facilitating integration into diagnostic workflows. Our study highlights the role of ML and SHAP in PDA biomarker discovery and validation. The identified panel offers a promising noninvasive method for early detection and improved screening, particularly for high-risk populations.
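To make the idea behind SHAP concrete, the sketch below computes exact Shapley values for a hypothetical three-marker scoring function by enumerating all feature subsets. It is a simplified illustration under stated assumptions: `toy_model`, its coefficients, the sample, and the all-zero baseline are invented for this example, and a single baseline stands in for the background data that SHAP implementations typically average over. It does not reproduce the CatBoost model or the SHAP analysis of this study.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Dict, List

FEATURES: List[str] = ["CA19-9", "GDF15", "suPAR"]


def toy_model(x: Dict[str, float]) -> float:
    """Hypothetical risk score with one interaction term (not the study's model)."""
    return 0.5 * x["CA19-9"] + 0.3 * x["GDF15"] + 0.2 * x["suPAR"] * x["GDF15"]


def shapley_values(
    model: Callable[[Dict[str, float]], float],
    sample: Dict[str, float],
    baseline: Dict[str, float],
) -> Dict[str, float]:
    """Exact Shapley attribution of model(sample) - model(baseline) to each feature."""
    n = len(FEATURES)

    def value(subset: set) -> float:
        # Features in `subset` take the sample's values; the rest stay at baseline.
        mixed = {f: (sample[f] if f in subset else baseline[f]) for f in FEATURES}
        return model(mixed)

    phi: Dict[str, float] = {}
    for f in FEATURES:
        others = [g for g in FEATURES if g != f]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size: |S|! (n-|S|-1)! / n!
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {f}) - value(s))
        phi[f] = total
    return phi


# Synthetic standardized marker values for one hypothetical sample.
sample = {"CA19-9": 2.0, "GDF15": 1.0, "suPAR": 1.0}
baseline = {"CA19-9": 0.0, "GDF15": 0.0, "suPAR": 0.0}

phi = shapley_values(toy_model, sample, baseline)
for name, contribution in sorted(phi.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {contribution:.3f}")
```

The attributions sum to the difference between the score for the sample and the score for the baseline, and the interaction term is split equally between the two markers involved. Sorting the attributions gives a ranked list analogous in spirit to a SHAP importance ranking.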

This study has some limitations. First, although our biomarker panel was validated using two independent cohorts, the number of patients with early-stage PDA remains limited. Neoadjuvant chemotherapy is now commonly used in patients with resectable or borderline resectable PDA to improve surgical outcomes, which reduces the availability of untreated patients with early-stage PDA for research. Therefore, external validation in a larger, multi-institutional cohort is essential to confirm the generalizability of our findings across diverse populations. Second, detecting low-abundance proteins or those with rapid turnover in circulation remains challenging. Some proteomic biomarkers may be present at undetectable levels, or their measurement could be masked by highly abundant proteins, potentially limiting their diagnostic utility. Third, certain biomarkers (e.g., suPAR, HGF, GDF15) exhibited inter-cohort variability, which likely reflects biological heterogeneity (such as demographic differences, comorbidities, or disease stage distribution) and pre-analytical variability. Future multicenter validation with harmonization strategies will be needed to minimize potential cohort-specific effects.

Despite these limitations, our study has several notable strengths. First, we systematically evaluated 47 serum proteins and employed an ML-based feature selection strategy to identify an optimal multi-marker panel, addressing the biological heterogeneity of PDA that cannot be captured by a single biomarker. The resulting three-biomarker panel (CA19-9, GDF15, and suPAR) achieved superior diagnostic performance and demonstrated consistent accuracy across both development and independent validation cohorts. Importantly, the model was applied to the validation cohort without retraining, confirming its external validity and robustness. Second, the panel retained strong diagnostic performance in clinically critical subgroups, including stage I PDA and patients with normal CA19-9 levels, underscoring its potential utility for early detection and in populations in which CA19-9 has limited value. Finally, unlike most prior studies that relied on CA19-9 alone or on small biomarker panels without rigorous external validation, our work integrates multiple complementary biomarkers with ML optimization and was validated in a multicenter independent cohort. These features emphasize the novelty, clinical significance, and translational potential of our biomarker panel, particularly for high-risk population screening.

Conclusion

This study developed an ML-based biomarker panel incorporating CA19-9, GDF15, and suPAR for PDA diagnosis. The panel significantly outperformed CA19-9 alone, particularly in early-stage detection. The CatBoost model achieved high accuracy, with AUROC values of 0.992 for all stages and 0.976 for early-stage PDA, and validation in an independent cohort confirmed its robustness. These findings highlight the potential of ML-driven biomarker panels in enhancing noninvasive PDA detection.

Data availability

The datasets generated and analyzed during the current study are publicly available in the Zenodo repository at: https://doi.org/10.5281/zenodo.15844304.

References

Khalaf, N., El-Serag, H. B., Abrams, H. R. & Thrift, A. P. Burden of pancreatic cancer: from epidemiology to practice. Clin. Gastroenterol. Hepatol. 19, 876–884. https://doi.org/10.1016/j.cgh.2020.02.054 (2021).
National Cancer Institute (NCI). Cancer StatFacts: pancreatic cancer. NCI website. https://seer.cancer.gov/statfacts/html/pancreas.html (Accessed May 2023).
Conroy, T. et al. FOLFIRINOX versus gemcitabine for metastatic pancreatic cancer. N Engl. J. Med. 364, 1817–1825. https://doi.org/10.1056/NEJMoa1011923 (2011).
Von Hoff, D. D. et al. Increased survival in pancreatic cancer with nab-paclitaxel plus gemcitabine. N Engl. J. Med. 369, 1691–1703. https://doi.org/10.1056/NEJMoa1304369 (2013).
Strobel, O., Neoptolemos, J., Jäger, D. & Büchler, M. W. Optimizing the outcomes of pancreatic cancer surgery. Nat. Rev. Clin. Oncol. 16, 11–26. https://doi.org/10.1038/s41571-018-0112-1 (2019).
Neoptolemos, J. P. et al. Therapeutic developments in pancreatic cancer: current and future perspectives. Nat. Rev. Gastroenterol. Hepatol. 15, 333–348. https://doi.org/10.1038/s41575-018-0005-x (2018).
Conroy, T. et al. FOLFIRINOX or gemcitabine as adjuvant therapy for pancreatic cancer. N Engl. J. Med. 379, 2395–2406. https://doi.org/10.1056/NEJMoa1809775 (2018).
Strobel, O. et al. Actual five-year survival after upfront resection for pancreatic ductal adenocarcinoma: who beats the odds? Ann. Surg. 275, 962–971. https://doi.org/10.1097/sla.0000000000004147 (2022).
Pereira, S. P. et al. Early detection of pancreatic cancer. Lancet Gastroenterol. Hepatol. 5, 698–710. https://doi.org/10.1016/s2468-1253(19)30416-9 (2020).
Kamisawa, T., Wood, L. D., Itoi, T. & Takaori, K. Pancreatic cancer. Lancet 388, 73–85. https://doi.org/10.1016/s0140-6736(16)00141-0 (2016).
Goonetilleke, K. S. & Siriwardena, A. K. Systematic review of carbohydrate antigen (CA 19-9) as a biochemical marker in the diagnosis of pancreatic cancer. Eur. J. Surg. Oncol. 33, 266–270. https://doi.org/10.1016/j.ejso.2006.10.004 (2007).
Ballehaninna, U. & Chamberlain, R. The clinical utility of serum CA 19-9 in the diagnosis, prognosis and management of pancreatic adenocarcinoma: an evidence based appraisal. J. Gastrointest. Oncol. 3, 105–119. https://doi.org/10.3978/j.issn.2078-6891.2011.021 (2012).
Vellan, C. J. et al. Application of proteomics in pancreatic ductal adenocarcinoma biomarker investigations: A review. Int. J. Mol. Sci. 23 https://doi.org/10.3390/ijms23042093 (2022).
Ozaki, Y., Broughton, P., Abdollahi, H., Valafar, H. & Blenda, A. V. Integrating omics data and AI for cancer diagnosis and prognosis. Cancers (Basel). 16 https://doi.org/10.3390/cancers16132448 (2024).
Benkeser, D., Ju, C., Lendle, S. & van der Laan, M. Online cross-validation-based ensemble learning. Stat. Med. 37, 249–260. https://doi.org/10.1002/sim.7320 (2018).
Hu, C., Gao, C., Li, T., Liu, C. & Peng, Z. Explainable artificial intelligence model for mortality risk prediction in the intensive care unit: a derivation and validation study. Postgrad. Med. J. 100, 219–227. https://doi.org/10.1093/postmj/qgad144 (2024).
Lee, H. et al. Serum protein profiling of lung, pancreatic, and colorectal cancers reveals alcohol consumption-mediated disruptions in early-stage cancer detection. Heliyon 8, e12359. https://doi.org/10.1016/j.heliyon.2022.e12359 (2022).
Qiu, C. et al. A luminex approach to develop an anti-tumor-associated antigen autoantibody panel for the detection of prostate cancer in racially/ethnically diverse populations. Cancers (Basel). 15 https://doi.org/10.3390/cancers15164064 (2023).
R&D Systems. Human Premixed Multi-Analyte Kit: Luminex^® Assay Protocol. https://resources.rndsystems.com/pdfs/datasheets/lxsah.pdf (R&D Systems Inc., 2017).
Fahrmann, J. F. et al. Lead-time trajectory of CA19-9 as an anchor marker for pancreatic cancer early detection. Gastroenterology 160, 1373–1383.e1376. https://doi.org/10.1053/j.gastro.2020.11.052 (2021).
Wood, L. D., Canto, M. I., Jaffee, E. M. & Simeone, D. M. Pancreatic cancer: pathogenesis, screening, diagnosis, and treatment. Gastroenterology 163, 386–402e381. https://doi.org/10.1053/j.gastro.2022.03.056 (2022).
Bray, F. et al. Global cancer statistics 2018: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J. Clin. 68, 394–424. https://doi.org/10.3322/caac.21492 (2018).
Rahib, L. et al. Projecting cancer incidence and deaths to 2030: the unexpected burden of thyroid, liver, and pancreas cancers in the United States. Cancer Res. 74, 2913–2921. https://doi.org/10.1158/0008-5472.Can-14-0155 (2014).
Shakeel, S., Finley, C., Akhtar-Danesh, G., Seow, H. Y. & Akhtar-Danesh, N. Trends in survival based on treatment modality in patients with pancreatic cancer: a population-based study. Curr. Oncol. 27, e1–e8. https://doi.org/10.3747/co.27.5211 (2020).
Blackford, A. L., Canto, M. I., Klein, A. P., Hruban, R. H. & Goggins, M. Recent trends in the incidence and survival of stage 1A pancreatic cancer: A surveillance, epidemiology, and end results analysis. J. Natl. Cancer Inst. 112, 1162–1169. https://doi.org/10.1093/jnci/djaa004 (2020).
Hur, C. et al. Early pancreatic ductal adenocarcinoma survival is dependent on size: positive implications for future targeted screening. Pancreas 45, 1062–1066. https://doi.org/10.1097/mpa.0000000000000587 (2016).
Koprowski, H., Herlyn, M., Steplewski, Z. & Sears, H. F. Specific antigen in serum of patients with colon carcinoma. Science 212, 53–55. https://doi.org/10.1126/science.6163212 (1981).
Luo, G. et al. Roles of CA19-9 in pancreatic cancer: biomarker, predictor and promoter. Biochim. Biophys. Acta Rev. Cancer. 1875, 188409. https://doi.org/10.1016/j.bbcan.2020.188409 (2021).
Owens, D. K. et al. Screening for pancreatic cancer: US preventive services task force reaffirmation recommendation statement. Jama 322, 438–444. https://doi.org/10.1001/jama.2019.10232 (2019).
Locker, G. Y. et al. ASCO 2006 update of recommendations for the use of tumor markers in gastrointestinal cancer. J. Clin. Oncol. 24, 5313–5327. https://doi.org/10.1200/jco.2006.08.2644 (2006).
Duffy, M. J. et al. Tumor markers in pancreatic cancer: a European group on tumor markers (EGTM) status report. Ann. Oncol. 21, 441–447. https://doi.org/10.1093/annonc/mdp332 (2010).
Yee, N. S., Zhang, S., He, H. Z. & Zheng, S. Y. Extracellular vesicles as potential biomarkers for early detection and diagnosis of pancreatic cancer. Biomedicines 8 https://doi.org/10.3390/biomedicines8120581 (2020).
Yu, S. et al. Plasma extracellular vesicle long RNA profiling identifies a diagnostic signature for the detection of pancreatic ductal adenocarcinoma. Gut 69, 540–550. https://doi.org/10.1136/gutjnl-2019-318860 (2020).
Grunvald, M. W., Jacobson, R. A., Kuzel, T. M., Pappas, S. G. & Masood, A. Current status of circulating tumor DNA liquid biopsy in pancreatic cancer. Int. J. Mol. Sci. 21 https://doi.org/10.3390/ijms21207651 (2020).
Mellby, L. D. et al. Serum biomarker signature-based liquid biopsy for diagnosis of early-stage pancreatic cancer. J. Clin. Oncol. 36, 2887–2894. https://doi.org/10.1200/jco.2017.77.6658 (2018).
Daoud, A. Z., Mulholland, E. J., Cole, G. & McCarthy, H. O. MicroRNAs in pancreatic cancer: biomarkers, prognostic, and therapeutic modulators. BMC Cancer. 19, 1130. https://doi.org/10.1186/s12885-019-6284-y (2019).
Baraniskin, A. et al. Circulating U2 small nuclear RNA fragments as a novel diagnostic biomarker for pancreatic and colorectal adenocarcinoma. Int. J. Cancer. 132, E48–57. https://doi.org/10.1002/ijc.27791 (2013).
Previdi, M. C., Carotenuto, P., Zito, D., Pandolfo, R. & Braconi, C. Noncoding RNAs as novel biomarkers in pancreatic cancer: what do we know? Future Oncol. 13, 443–453. https://doi.org/10.2217/fon-2016-0253 (2017).
Swietlik, J. J. et al. Cell-selective proteomics segregates pancreatic cancer subtypes by extracellular proteins in tumors and circulation. Nat. Commun. 14, 2642. https://doi.org/10.1038/s41467-023-38171-8 (2023).
Perazzoli, G. et al. Evaluating metabolite-based biomarkers for early diagnosis of pancreatic cancer: A systematic review. Metabolites 13 https://doi.org/10.3390/metabo13070872 (2023).
Suenaga, M. et al. Pancreatic juice mutation concentrations can help predict the grade of dysplasia in patients undergoing pancreatic surveillance. Clin. Cancer Res. 24, 2963–2974. https://doi.org/10.1158/1078-0432.Ccr-17-2463 (2018).
Neal, R. D. et al. Cell-free DNA-based multi-cancer early detection test in an asymptomatic screening population (NHS-Galleri): design of a pragmatic, prospective randomised controlled trial. Cancers (Basel). 14. https://doi.org/10.3390/cancers14194818 (2022).
Nadauld, L. D. et al. The PATHFINDER study: assessment of the implementation of an investigational multi-cancer early detection test into clinical practice. Cancers (Basel). 13. https://doi.org/10.3390/cancers13143501 (2021).
Huerta, M. et al. Circulating tumor DNA detection by digital-droplet PCR in pancreatic ductal adenocarcinoma: A systematic review. Cancers. 13. https://doi.org/10.3390/cancers13050994 (2021).
Ramalhete, L., Vigia, E., Araújo, R. & Marques, H. P. Proteomics-driven biomarkers in pancreatic cancer. Proteomes 11 https://doi.org/10.3390/proteomes11030024 (2023).
De Oliveira, G. et al. An integrated meta-analysis of secretome and proteome identify potential biomarkers of pancreatic ductal adenocarcinoma. Cancers. 12 https://doi.org/10.3390/cancers12030716 (2020).
Cohen, J. D. et al. Detection and localization of surgically resectable cancers with a multi-analyte blood test. Science 359, 926–930. https://doi.org/10.1126/science.aar3247 (2018).
Athanasiou, A. et al. Biomarker discovery for early detection of pancreatic ductal adenocarcinoma (PDAC) using multiplex proteomics technology. J. Proteome Res. 24, 315–322. https://doi.org/10.1021/acs.jproteome.4c00752 (2025).
Palma, N. A. et al. A high performing biomarker signature for detecting early-stage pancreatic ductal adenocarcinoma in high-risk individuals. Cancers (Basel). 17 https://doi.org/10.3390/cancers17111866 (2025).
Capello, M. et al. Sequential validation of blood-based protein biomarker candidates for early-stage pancreatic cancer. J. Natl. Cancer Inst. 109 https://doi.org/10.1093/jnci/djw266 (2017).
Cohen, J. D. et al. Combined circulating tumor DNA and protein biomarker-based liquid biopsy for the earlier detection of pancreatic cancers. Proc. Natl. Acad. Sci. U S A. 114, 10202–10207. https://doi.org/10.1073/pnas.1704961114 (2017).
Kim, J. et al. Detection of early pancreatic ductal adenocarcinoma with thrombospondin-2 and CA19-9 blood markers. Sci. Transl Med. 9 https://doi.org/10.1126/scitranslmed.aah5583 (2017).
Byeon, S. et al. Novel serum protein biomarker panel for early diagnosis of pancreatic cancer. Int. J. Cancer. 155, 365–371. https://doi.org/10.1002/ijc.34928 (2024).
Wischhusen, J., Melero, I. & Fridman, W. H. Growth/differentiation factor-15 (GDF-15): from biomarker to novel targetable immune checkpoint. Front. Immunol. 11, 951. https://doi.org/10.3389/fimmu.2020.00951 (2020).
Johnen, H. et al. Tumor-induced anorexia and weight loss are mediated by the TGF-beta superfamily cytokine MIC-1. Nat. Med. 13, 1333–1340. https://doi.org/10.1038/nm1677 (2007).
Böttner, M., Suter-Crazzolara, C., Schober, A. & Unsicker, K. Expression of a novel member of the TGF-β superfamily, growth/differentiation factor-15/macrophage-inhibiting cytokine-1 (GDF-15/MIC-1) in adult rat tissues. Cell Tissue Res. 297, 103–110. https://doi.org/10.1007/s004410051337 (1999).
Koopmann, J. et al. Serum macrophage inhibitory cytokine 1 as a marker of pancreatic and other periampullary cancers. Clin. Cancer Res. 10, 2386–2392. https://doi.org/10.1158/1078-0432.ccr-03-0165 (2004).
Chen, Y. Z. et al. Diagnostic performance of serum macrophage inhibitory cytokine-1 in pancreatic cancer: a meta-analysis and meta-regression analysis. DNA Cell. Biol. 33, 370–377. https://doi.org/10.1089/dna.2013.2237 (2014).
Wang, X. et al. Macrophage inhibitory cytokine 1 (MIC-1/GDF15) as a novel diagnostic serum biomarker in pancreatic ductal adenocarcinoma. BMC Cancer. 14, 578. https://doi.org/10.1186/1471-2407-14-578 (2014).
Hogendorf, P. et al. Growth differentiation factor (GDF-15) concentration combined with Ca125 levels in serum is superior to commonly used cancer biomarkers in differentiation of pancreatic mass. Cancer Biomark. 21, 505–511. https://doi.org/10.3233/cbm-170203 (2018).
Ratnam, N. et al. NF-κB regulates GDF-15 to suppress macrophage surveillance during early tumor development. J. Clin. Investig. 127, 3796–3809. https://doi.org/10.1172/JCI91561 (2017).
Ratnam, N. M. et al. NF-κB regulates GDF-15 to suppress macrophage surveillance during early tumor development. J. Clin. Invest. 127, 3796–3809. https://doi.org/10.1172/jci91561 (2017).
O’Neill, R. S., Emmanuel, S., Williams, D. & Stoita, A. Macrophage inhibitory cytokine-1/growth differentiation factor-15 in premalignant and neoplastic tumours in a high-risk pancreatic cancer cohort. World J. Gastroenterol. 26, 1660–1673. https://doi.org/10.3748/wjg.v26.i14.1660 (2020).
Kaur, S. et al. Potentials of plasma NGAL and MIC-1 as biomarker(s) in the diagnosis of lethal pancreatic cancer. PLoS One. 8, e55171. https://doi.org/10.1371/journal.pone.0055171 (2013).
Mohamed, A. A. et al. Evaluation of circulating ADH and MIC-1 as diagnostic markers in Egyptian patients with pancreatic cancer. Pancreatology 15, 34–39. https://doi.org/10.1016/j.pan.2014.10.008 (2015).
Yang, Y., Yan, S., Tian, H. & Bao, Y. Macrophage inhibitory cytokine-1 versus carbohydrate antigen 19-9 as a biomarker for diagnosis of pancreatic cancer: A PRISMA-compliant meta-analysis of diagnostic accuracy studies. Med. (Baltim). 97, e9994. https://doi.org/10.1097/md.0000000000009994 (2018).
Misek, D. E., Patwa, T. H., Lubman, D. M. & Simeone, D. M. Early detection and biomarkers in pancreatic cancer. J. Natl. Compr. Canc Netw. 5, 1034–1041. https://doi.org/10.6004/jnccn.2007.0086 (2007).
Koopmann, J. et al. Serum markers in patients with resectable pancreatic adenocarcinoma: macrophage inhibitory cytokine 1 versus CA19-9. Clin. Cancer Res. 12, 442–446. https://doi.org/10.1158/1078-0432.Ccr-05-0564 (2006).
Mahmood, N., Mihalcioiu, C. & Rabbani, S. Multifaceted role of the urokinase-type plasminogen activator (uPA) and its receptor (uPAR): diagnostic, prognostic, and therapeutic applications. Front. Oncol. 8 https://doi.org/10.3389/fonc.2018.00024 (2018).
Kjaergaard, M., Hansen, L. V., Jacobsen, B., Gardsvoll, H. & Ploug, M. Structure and ligand interactions of the urokinase receptor (uPAR). Front. Biosci. 13, 5441–5461. https://doi.org/10.2741/3092 (2008).
Koblinski, J. E., Ahram, M. & Sloane, B. F. Unraveling the role of proteases in cancer. Clin. Chim. Acta. 291, 113–135. https://doi.org/10.1016/s0009-8981(99)00224-7 (2000).
Martin, C. E. & List, K. Cell surface-anchored serine proteases in cancer progression and metastasis. Cancer Metastasis Rev. 38, 357–387. https://doi.org/10.1007/s10555-019-09811-7 (2019).
Hamada, M. et al. Urokinase-type plasminogen activator receptor (uPAR) in inflammation and disease: A unique inflammatory pathway activator. Biomedicines 12 https://doi.org/10.3390/biomedicines12061167 (2024).
Harvey, S. R. et al. Evaluation of urinary plasminogen activator, its receptor, matrix metalloproteinase-9, and von Willebrand factor in pancreatic cancer. Clin. Cancer Res. 9, 4935–4943 (2003).
Nielsen, T. O. et al. Expression of the insulin-like growth factor I receptor and urokinase plasminogen activator in breast cancer is associated with poor survival: potential for intervention with 17-allylamino geldanamycin. Cancer Res. 64, 286–291. https://doi.org/10.1158/0008-5472.can-03-1242 (2004).
Look, M. P. et al. Pooled analysis of prognostic impact of urokinase-type plasminogen activator and its inhibitor PAI-1 in 8377 breast cancer patients. J. Natl. Cancer Inst. 94, 116–128. https://doi.org/10.1093/jnci/94.2.116 (2002).
Märkl, B. et al. Tumour budding, uPA and PAI-1 are associated with aggressive behaviour in colon cancer. J. Surg. Oncol. 102, 235–241. https://doi.org/10.1002/jso.21611 (2010).
Halamkova, J. et al. Clinical relevance of uPA, uPAR, PAI 1 and PAI 2 tissue expression and plasma PAI 1 level in colorectal carcinoma patients. Hepatogastroenterology 58, 1918–1925. https://doi.org/10.5754/hge10232 (2011).
Brungs, D. et al. The urokinase plasminogen activation system in gastroesophageal cancer: A systematic review and meta-analysis. Oncotarget 8, 23099–23109. https://doi.org/10.18632/oncotarget.15485 (2017).
Kaneko, T., Konno, H., Baba, M., Tanaka, T. & Nakamura, S. Urokinase-type plasminogen activator expression correlates with tumor angiogenesis and poor outcome in gastric cancer. Cancer Sci. 94, 43–49. https://doi.org/10.1111/j.1349-7006.2003.tb01350.x (2003).
Gorantla, B., Asuthkar, S., Rao, J. S., Patel, J. & Gondi, C. S. Suppression of the uPAR-uPA system retards angiogenesis, invasion, and in vivo tumor development in pancreatic cancer cells. Mol. Cancer Res. 9, 377–389. https://doi.org/10.1158/1541-7786.Mcr-10-0452 (2011).
Hildenbrand, R. et al. Amplification of the urokinase-type plasminogen activator receptor (uPAR) gene in ductal pancreatic carcinomas identifies a clinically high-risk group. Am. J. Pathol. 174, 2246–2253. https://doi.org/10.2353/ajpath.2009.080785 (2009).
Loosen, S. et al. Soluble urokinase plasminogen activator receptor (suPAR) as a novel biomarker in patients undergoing resection of pancreatic adenocarcinoma. J. Clin. Oncol. https://doi.org/10.1200/JCO.2019.37.4_SUPPL.248 (2018).
Aronen, A. et al. Plasma suPAR may help to distinguish between chronic pancreatitis and pancreatic cancer. Scand. J. Gastroenterol. 56, 81–85. https://doi.org/10.1080/00365521.2020.1849383 (2021).
Loosen, S. H. et al. High baseline soluble urokinase plasminogen activator receptor (suPAR) serum levels indicate adverse outcome after resection of pancreatic adenocarcinoma. Carcinogenesis 40, 947–955. https://doi.org/10.1093/carcin/bgz033 (2019).
Ledesma, D., Symes, S. & Richards, S. Advancements within modern machine learning methodology: impacts and prospects in biomarker discovery. Curr. Med. Chem. 28, 6512–6531. https://doi.org/10.2174/0929867328666210208111821 (2021).

Acknowledgements

This study was conducted with bioresources provided by the Human Bioresource Center of Seoul National University Bundang Hospital (DT-2019-005) and by the regional university hospital–based biobank in Daegu, Republic of Korea (21061501-17-01).

Funding

This work was supported by the Seoul National University Bundang Hospital Research Fund [grant numbers 14-2021-0042, 06-2020-0113].

Author information

Authors and Affiliations

Department of Translational Medicine, Seoul National University College of Medicine, Seoul, 03080, Republic of Korea
Dong Woo Shin
Department of Internal Medicine, Hallym University Sacred Heart Hospital, Anyang, Gyeonggi-do, 14068, Republic of Korea
Dong Woo Shin
Department of Biochemistry, College of Veterinary Medicine, BK 21 PLUS, Seoul National University, Seoul, 08826, Republic of Korea
Je-Yoel Cho
Department of Thoracic and Cardiovascular Surgery, Seoul National University Bundang Hospital, Seongnam, Gyeonggi-do, 13620, Republic of Korea
Sukki Cho
Department of Thoracic and Cardiovascular Surgery, Seoul National University College of Medicine, Seoul, 03080, Republic of Korea
Sukki Cho
Department of Internal Medicine, Seoul National University Bundang Hospital, 82 Gumi-ro 173 Beon-gil, Bundang-gu, Seongnam, Gyeonggi-do, 13620, Republic of Korea
Yuna Youn & Jin-Hyeok Hwang
Department of Internal Medicine, Seoul National University College of Medicine, Seoul, 03080, Republic of Korea
Jin-Hyeok Hwang

Authors

Dong Woo Shin
Je-Yoel Cho
Sukki Cho
Yuna Youn
Jin-Hyeok Hwang

Contributions

Dong Woo Shin: conceptualization, software, validation, formal analysis, investigation, writing – original draft, visualization. Je-Yoel Cho: methodology, writing – review & editing. Sukki Cho: resources, investigation. Yuna Youn: validation, investigation, data curation. Jin-Hyeok Hwang: conceptualization, methodology, resources, supervision, project administration, writing – review & editing, funding acquisition. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Jin-Hyeok Hwang.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

The electronic supplementary material for this article comprises Supplementary Materials 1–9.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Shin, D.W., Cho, JY., Cho, S. et al. Development of a serum protein biomarker panel for the diagnosis of pancreatic ductal adenocarcinoma using a machine learning approach. Sci Rep 15, 35659 (2025). https://doi.org/10.1038/s41598-025-19631-1

Received: 06 July 2025
Accepted: 09 September 2025
Published: 13 October 2025
DOI: https://doi.org/10.1038/s41598-025-19631-1