Abstract
Early detection of pancreatic ductal adenocarcinoma (PDA) remains a major clinical challenge due to the lack of reliable biomarkers. We developed and validated a machine learning (ML)-based serum protein biomarker panel to enhance PDA diagnosis. Serum concentrations of 47 protein biomarkers were measured in 355 individuals using a Luminex™ bead-based immunoassay. Multiple ML algorithms were employed to construct a diagnostic model, with SHapley Additive exPlanations (SHAP) analysis used to determine the importance of each biomarker. The diagnostic performance of the panel was assessed by the area under the receiver operating characteristic curve (AUROC), F1 score, sensitivity, specificity, and accuracy, and further validated in an independent cohort of 130 individuals. Among the tested models, CatBoost demonstrated the highest diagnostic accuracy. SHAP analysis identified CA19-9, GDF15, and suPAR as key biomarkers, and the combined panel significantly outperformed CA19-9 alone in detecting PDA across all stages (AUROC 0.992 vs. 0.952) and in early-stage PDA (AUROC 0.976 vs. 0.868). Validation in another cohort confirmed the robustness of the model, with AUROC values of 0.977 for all stages and 0.987 for early-stage PDA. These findings suggest that ML-integrated biomarker panels may enable earlier and more accurate PDA detection in clinical practice.
Introduction
Pancreatic ductal adenocarcinoma (PDA) is a highly aggressive and heterogeneous malignancy associated with substantial clinical and economic burdens1. Early detection is critical for improving patient prognosis, as the 5-year survival rate (5YSR) declines sharply with disease progression: 44.0% for localized disease, 16.2% for regional spread, and only 3.1% for metastatic PDA2. Despite palliative chemotherapy regimens, such as FOLFIRINOX or nab-paclitaxel combined with gemcitabine, patients with metastatic PDA typically survive for < 12 months3,4. Surgical resection remains the only potentially curative treatment; however, only 10–20% of newly diagnosed patients with PDA are eligible owing to late-stage presentation5,6. In contrast, patients diagnosed at an early stage and treated with adjuvant chemotherapy (e.g., modified FOLFIRINOX) can achieve a median survival of 54.4 months7. Among those with favorable prognostic factors, including complete resection (R0) and absence of lymph node metastasis (pN0), the 5YSR increases to 38.2%8. Given the aggressive nature of PDA, early detection and timely intervention are essential for improving survival rates9.
Early PDA diagnosis remains challenging because of several factors, including the lack of highly sensitive and specific screening biomarkers, vague early symptoms (e.g., epigastric pain, obstructive jaundice, weight loss), rapid disease progression, and the pancreas’s concealed anatomical location10. Carbohydrate antigen 19-9 (CA19-9) is the most widely used serum biomarker for PDA, with a pooled sensitivity of 79% (range: 70–90%) and specificity of 82% (range: 68–91%)11. However, its low positive predictive value limits its utility for screening asymptomatic individuals12. Recent advances in protein biomarker research have identified several promising candidates for diagnosing PDA13. Furthermore, machine learning (ML) provides a powerful approach for integrating multi-omics data (including genomics, transcriptomics, epigenomics, and proteomics) to identify optimal biomarker combinations for early detection14. Given the complex biology of PDA, ML-driven biomarker discovery is essential for addressing the limitations of traditional single-marker approaches and advancing early detection strategies. We aimed to develop an ML-based serum protein biomarker panel to improve the diagnostic accuracy of PDA, particularly in its early stages.
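The dependence of positive predictive value on disease prevalence can be illustrated with a short calculation based on Bayes' theorem. This is a minimal sketch: the sensitivity and specificity are the pooled CA19-9 estimates quoted above, whereas the prevalence of 0.1% is a hypothetical value chosen only for illustration and is not taken from this study.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """PPV from Bayes' theorem for a test applied at a given prevalence."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


# Pooled CA19-9 estimates from the text; the prevalence is a hypothetical
# screening-population value used only to illustrate the effect.
ppv = positive_predictive_value(sensitivity=0.79, specificity=0.82, prevalence=0.001)
print(f"PPV at an assumed 0.1% prevalence: {ppv:.4f}")
```

Under these assumptions, fewer than 1 in 100 positive results would correspond to a true case, which is why a marker with moderate specificity performs poorly as a population screening test.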
Materials and methods
Study design and cohort selection
Two independent cohorts were analyzed: Cohort A, comprising 355 individuals (174 with PDA and 181 healthy controls), served as the development set for identifying potential biomarkers and constructing an optimal biomarker panel. Cohort B, comprising 130 individuals (100 with PDA and 30 healthy controls), served as the validation set to assess the diagnostic performance and generalizability of the developed panel.
Biobank resources
Serum samples for Cohort A were obtained from the Human Bioresource Center of Seoul National University Bundang Hospital, while those for Cohort B were collected from regional university hospital–based biobanks in Daegu, Republic of Korea.
Inclusion criteria and sample collection
Patients aged ≥ 18 years with a histologically confirmed diagnosis of PDA were included. Blood samples were collected before any therapeutic intervention to ensure that biomarker levels reflected the untreated disease state. Control samples were collected from individuals with no history of malignancy.
Ethical approval and data confidentiality
This study adhered to the ethical principles outlined in the Declaration of Helsinki. The protocol was approved by the Institutional Review Board (IRB) of Seoul National University Bundang Hospital (IRB number X-1909-564-901). All participants provided written informed consent before sample collection. Additionally, the study complied with the Health Insurance Portability and Accountability Act regulations to ensure the confidentiality and security of participant data.
Study workflow
The study comprised six key steps (Fig. 1):
a. Biomarker quantification: Serum levels of 47 candidate protein biomarkers were measured in Cohort A using Luminex™ bead-based multiplex immunoassays.
b. ML-based biomarker analysis: Multiple ML algorithms analyzed serum biomarker data to identify key features associated with PDA classification.
c. Model training and validation: A five-fold cross-validation approach was used to evaluate model performance. The dataset was divided into five equal folds, with the model trained on four folds and validated on the remaining fold. This process was repeated five times, and final performance metrics were calculated by averaging the results15.
d. Feature importance analysis: SHapley Additive exPlanations (SHAP) were used to assess each biomarker’s contribution to the model’s predictions16. This method provided an interpretable ranking of biomarkers, highlighting proteins that significantly influenced PDA classification performance.
e. Development of a biomarker panel: An optimal biomarker panel was constructed based on features selected through ML. Its diagnostic performance was evaluated using multiple metrics, including the area under the receiver operating characteristic curve (AUROC), F1 score, sensitivity, specificity, and accuracy.
f. Independent validation: The developed biomarker panel was validated using an independent dataset (Cohort B) to assess its diagnostic accuracy and generalizability.
Serum protein biomarker quantification
Luminex™ multiplex assay procedure
Serum protein biomarkers were quantified using the Luminex™ 200 system (Austin, TX), a high-throughput multiplex platform that enables simultaneous analysis of multiple analytes within a single sample17,18. The assay protocol followed these steps:
a. Prewetting and plate preparation: Each well was prewetted with 100 µL of wash buffer and incubated for 10 min.
b. Reagent loading: After removing the wash buffer, 25 µL of each standard, quality control sample, and assay buffer were added to the designated wells, followed by 25 µL of the matrix solution, following the manufacturer’s recommendations19.
c. Bead incubation: A total of 25 µL of fluorescently labeled beads, conjugated with target-specific antibodies, was added to each well. Plates were incubated overnight at 4 °C on a plate shaker to facilitate antigen-antibody binding.
d. Detection antibody binding: After two washes with 200 µL of wash buffer, 25 µL of biotinylated detection antibodies were added and incubated for 1 h at room temperature on a plate shaker.
e. Signal amplification: 25 µL of streptavidin-phycoerythrin was added and incubated for 30 min at room temperature. The plate was subsequently washed twice with 200 µL of wash buffer to remove unbound reagents.
f. Fluorescence detection: 100 µL of sheath fluid was added to each well, and the beads were resuspended on a plate shaker for 5 min at room temperature. Fluorescence intensity was measured using the Luminex xPONENT™ software, and biomarker concentrations were calculated using SoftMax Pro (version 5.4), applying a five-parameter logistic regression curve to logarithmically transformed data.
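The five-parameter logistic (5PL) calibration used in the final step can be sketched as follows. This is a minimal illustration of the general 5PL form and its inverse, not the SoftMax Pro implementation. The parameter names (a, b, c, d, g) and the numerical values are assumptions chosen for demonstration, and the logarithmic transformation applied in the study is omitted for brevity.

```python
def five_pl(x, a, b, c, d, g):
    """Five-parameter logistic curve: response as a function of concentration x.

    a: response at zero concentration, d: response at infinite concentration,
    c: mid-range concentration scale, b: slope factor, g: asymmetry factor.
    """
    return d + (a - d) / (1.0 + (x / c) ** b) ** g


def inverse_five_pl(y, a, b, c, d, g):
    """Back-calculate the concentration that produces a measured response y."""
    ratio = (a - d) / (y - d)
    return c * (ratio ** (1.0 / g) - 1.0) ** (1.0 / b)


# Hypothetical curve parameters (illustrative only, not fitted to study data).
params = dict(a=50.0, b=1.2, c=300.0, d=20000.0, g=0.8)
signal = five_pl(125.0, **params)
recovered = inverse_five_pl(signal, **params)
print(f"signal = {signal:.1f}, back-calculated concentration = {recovered:.3f}")
```

In practice the five parameters are estimated from the standards on each plate, and unknown samples are read off the fitted curve with the inverse function. Responses outside the range between a and d cannot be inverted and would need to be flagged.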
Forty-seven candidate protein biomarkers
This study employed bead-based xMAP™ immunoassays to analyze 47 circulating proteins, which were categorized into six distinct panels: (a) Human angiogenesis/growth factor panel 1 (Millipore, Billerica, MA; catalog number HANG2MAG-12 K), including angiopoietin-2, G-CSF, endoglin, FGF1, and follistatin; (b) Human angiogenesis panel 2 (Millipore, Billerica, MA; catalog number HAGP1MAG-12 K), including sAXL, sHER2, sE-selectin, TSP2, sEGFR, suPAR, sVEGFR1, sPECAM-1, and OPN; (c) Human cancer/metastasis biomarker panel 1 (Millipore, Billerica, MA; catalog number HCMBMAG-22 K), including GDF15, DKK1, NSE, and OPG; (d) Human circulating cancer biomarker panel 1 (Millipore, Billerica, MA; catalog number HCCBP1MAG-58 K), including CA15-3, CA19-9, MIF, leptin, IL-6, CEA, IL-8, HGF, sFas, TNFα, PRL, SCF, Cyfra21-1, FGF2, β-hCG, HE4, TGF-α, and VEGF; (e) Human circulating cancer biomarker panel 3 (Millipore, Billerica, MA; catalog number HCCBP3MAG-58 K), including galectin-3, myeloperoxidase, SHBG, IGFBP3, and ferritin; and (f) Human circulating cancer biomarker panel 4 (Millipore, Billerica, MA; catalog number HCCB4MAG-58 K), including mesothelin, midkine, kallikrein-6, ALDH1A1, EpCAM, and CD44.
Development of a diagnostic biomarker panel using ML model selection and training (Cohort A)
To develop a diagnostic biomarker panel, multiple ML algorithms were applied across the following categories: (a) Decision tree-based algorithms: Random Forest, Extreme Gradient Boosting (XGBoost), Light Gradient Boosting Machine (LightGBM), and CatBoost; (b) Other classifiers: Support Vector Machine (SVM) and k-Nearest Neighbors (KNN); and (c) Ensemble learning: A combined model integrating multiple classifiers to improve diagnostic accuracy.
The dataset was randomly split into training (80%) and testing (20%) subsets. To ensure unbiased model performance, a five-fold cross-validation approach was used. During the random split of data into training and validation sets, stratification was applied based on gender and age to ensure balanced distributions across cohorts and improve comparability. The dataset was divided into five equal folds, with four folds used for training and one for validation. This process was repeated five times, with each fold serving as the validation set once. The final model performance was calculated by averaging the results across all iterations, yielding a robust estimate of diagnostic accuracy. During five-fold cross-validation, the variability in cutoff values across folds was minimal (within ± 2–3%), supporting the robustness and stability of the model.
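The fold construction and score averaging described above can be sketched in plain Python. This is a minimal illustration rather than the study's pipeline: the function names are hypothetical, the `evaluate` callback stands in for model training and scoring, and the example stratifies on the class label only, whereas the study reports stratification by gender and age.

```python
import random
from collections import defaultdict


def stratified_folds(strata, n_folds=5, seed=0):
    """Assign sample indices to folds so each fold keeps the stratum balance."""
    rng = random.Random(seed)
    by_stratum = defaultdict(list)
    for idx, key in enumerate(strata):
        by_stratum[key].append(idx)
    folds = [[] for _ in range(n_folds)]
    for indices in by_stratum.values():
        rng.shuffle(indices)
        # Deal indices round-robin so per-fold counts differ by at most one.
        for position, idx in enumerate(indices):
            folds[position % n_folds].append(idx)
    return folds


def cross_validate(strata, evaluate, n_folds=5, seed=0):
    """Average a metric over folds; each fold is the validation set once.

    `evaluate(train_idx, val_idx)` is a hypothetical callback that fits a
    model on train_idx and returns a score computed on val_idx.
    """
    folds = stratified_folds(strata, n_folds, seed)
    scores = []
    for k, val_idx in enumerate(folds):
        train_idx = [i for j, fold in enumerate(folds) if j != k for i in fold]
        scores.append(evaluate(train_idx, val_idx))
    return sum(scores) / len(scores)


# Class sizes of Cohort A as reported in the Results (174 PDA, 181 controls).
labels = [1] * 174 + [0] * 181
folds = stratified_folds(labels)
fold_sizes = [len(fold) for fold in folds]
print(fold_sizes)
```

To stratify on several variables at once, the stratum key could be a tuple such as (class, sex, age group), at the cost of smaller strata.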
Feature importance analysis using SHAP in cohort A
To assess the importance of each biomarker, SHAP analysis was applied to rank features based on their contribution to the model’s predictive output. The SHAP values provided an interpretable ranking of biomarker significance, facilitating the identification of the most diagnostically relevant biomarkers.
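The idea behind SHAP can be illustrated by computing exact Shapley values for a small scoring function. This sketch enumerates all feature subsets, which is feasible only for a handful of features; it is not the algorithm used by SHAP libraries for tree models, and the scoring function, feature values, and baseline below are hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, instance, baseline):
    """Exact Shapley values for one instance by enumerating feature subsets.

    Features outside a subset are set to their baseline value.
    """
    names = list(instance)
    n = len(names)

    def value(subset):
        mixed = {k: (instance[k] if k in subset else baseline[k]) for k in names}
        return predict(mixed)

    phi = {}
    for name in names:
        others = [k for k in names if k != name]
        total = 0.0
        for size in range(n):
            # Shapley weight for subsets of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                gain = value(set(subset) | {name}) - value(set(subset))
                total += weight * gain
        phi[name] = total
    return phi


# Hypothetical risk score with an interaction term (not the study's model).
def toy_score(x):
    return 0.5 * x["CA19-9"] + 0.3 * x["GDF15"] + 0.2 * x["GDF15"] * x["suPAR"]


instance = {"CA19-9": 2.0, "GDF15": 1.0, "suPAR": 1.0}
baseline = {"CA19-9": 0.0, "GDF15": 0.0, "suPAR": 0.0}
phi = shapley_values(toy_score, instance, baseline)
ranking = sorted(phi, key=lambda k: abs(phi[k]), reverse=True)
print(phi, ranking)
```

The contributions sum to the difference between the prediction for the instance and the prediction at the baseline, and the interaction term is shared equally between the two features involved. Ranking features by mean absolute contribution across samples gives the kind of importance ordering described above.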
Validation of biomarker panel in independent cohort B
Cohort B comprised a completely independent set of 130 individuals, including 100 patients with PDA and 30 healthy controls, distinct from Cohort A. The diagnostic model was trained exclusively on Cohort A and directly applied to Cohort B without retraining, using the same thresholds. The cutoff values were determined in Cohort A based on the Youden Index, which balances sensitivity and specificity. These thresholds were then fixed and applied unchanged in Cohort B for independent validation.
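Threshold selection by the Youden Index, followed by application of the fixed threshold to new data, can be sketched as follows. The scores and labels are hypothetical, the function names are illustrative, and the convention that a score at or above the cutoff counts as positive is an assumption.

```python
def youden_cutoff(scores, labels):
    """Return (cutoff, J) maximizing J = sensitivity + specificity - 1.

    A sample is called positive when its score is >= cutoff.
    """
    positives = sum(1 for y in labels if y == 1)
    negatives = len(labels) - positives
    best_cutoff, best_j = None, -1.0
    for cutoff in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y == 1)
        tn = sum(1 for s, y in zip(scores, labels) if s < cutoff and y == 0)
        j = tp / positives + tn / negatives - 1.0
        if j > best_j:
            best_cutoff, best_j = cutoff, j
    return best_cutoff, best_j


def apply_cutoff(scores, cutoff):
    """Classify new samples with a threshold that was fixed beforehand."""
    return [1 if s >= cutoff else 0 for s in scores]


# Hypothetical development-set scores (illustrative, not study data).
dev_scores = [0.05, 0.10, 0.20, 0.35, 0.40, 0.60, 0.70, 0.80, 0.90, 0.95, 0.97]
dev_labels = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1, 1]
cutoff, j = youden_cutoff(dev_scores, dev_labels)

# The cutoff is then applied unchanged to an independent set.
val_scores = [0.15, 0.45, 0.65, 0.85]
val_pred = apply_cutoff(val_scores, cutoff)
print(cutoff, round(j, 3), val_pred)
```

Keeping the cutoff fixed between development and validation avoids the optimistic bias that would result from re-tuning the threshold on the validation cohort.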
Diagnostic performance metrics
Model classification performance was evaluated using confusion matrices, which quantify true positives (TP), false positives (FP), true negatives (TN), and false negatives (FN). Diagnostic performance was further assessed using the following metrics: (a) AUROC measures the ability of the biomarker panel to distinguish patients with PDA from healthy controls. A value of 1.0 indicates perfect discrimination, while a value of 0.5 indicates no discriminative power; (b) Accuracy reflects the overall proportion of correctly classified individuals. The formula is (TP + TN) / (TP + TN + FP + FN), where TP and TN represent correct classifications, and FP and FN represent misclassifications; (c) Sensitivity (Recall) measures the model’s ability to correctly identify patients with PDA. It is calculated as TP / (TP + FN), where TP represents correctly identified patients with PDA, and FN represents missed patients with PDA; (d) Specificity indicates the ability to correctly classify healthy controls. The formula is TN / (TN + FP), where TN represents correctly identified healthy controls, and FP represents healthy controls misclassified as patients with PDA; (e) Positive predictive value (PPV, Precision) is the proportion of true patients with PDA among those predicted as having PDA. It is calculated as TP / (TP + FP), reflecting the reliability of positive results; (f) Negative predictive value (NPV) is the proportion of true healthy controls among those predicted as healthy. It is calculated as TN / (TN + FN), reflecting the reliability of negative results; and (g) F1 score balances precision and recall through their harmonic mean. The formula is 2 × (Precision × Recall) / (Precision + Recall), where 1 indicates perfect precision and recall and 0 indicates the worst performance. The diagnostic metrics were visualized using a radar plot to compare the performance of different biomarker panels.
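The metrics defined above can be computed directly from the four confusion-matrix counts, and the AUROC from the ranks of the scores. The following is a minimal sketch with hypothetical counts and scores; it omits guards against empty classes, which would cause a division by zero.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Compute the confusion-matrix metrics defined in the text."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    ppv = tp / (tp + fp)
    npv = tn / (tn + fn)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    f1 = 2 * ppv * sensitivity / (ppv + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "accuracy": accuracy,
        "f1": f1,
    }


def auroc(scores, labels):
    """AUROC as the probability that a random positive outranks a random
    negative, counting ties as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical counts and scores (illustrative, not study data).
metrics = diagnostic_metrics(tp=90, fp=5, tn=95, fn=10)
area = auroc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])
print(metrics, area)
```

The pairwise AUROC above is quadratic in the number of samples, which is adequate for cohorts of a few hundred individuals; rank-based formulations scale better for larger datasets.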
Statistical analyses
Categorical variables were analyzed using the chi-squared or Fisher’s exact test, while continuous variables were assessed using the Student’s t-test. Continuous data were presented as means ± standard deviations, while categorical variables were expressed as frequencies (percentages). Statistical significance was determined using predefined thresholds: *p-value < 0.05 was considered statistically significant; **p-value < 0.01 was considered highly significant; ***p-value < 0.001 was considered strongly significant; and ****p-value ≤ 0.0001 was considered extremely significant. Comparisons of diagnostic performance between models were conducted as follows: DeLong’s test was employed to compare AUROCs, while McNemar’s test was applied for paired comparisons of sensitivity, specificity, PPV, and NPV. All statistical analyses were conducted using SPSS version 25.0 (IBM Corporation, Chicago, IL, USA) and R version 3.4.2 (R Foundation for Statistical Computing). ML analyses were conducted in Python v3.12.4 using libraries for data preprocessing, model development, and performance evaluation. Figures were generated using GraphPad Prism 8.0 (GraphPad Software, La Jolla, CA, USA).
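A paired comparison with McNemar’s test depends only on the discordant pairs, that is, the individuals classified correctly by one test and incorrectly by the other. The sketch below uses the exact binomial form of the test; the text does not state which variant (exact or chi-squared) was applied, so this choice is an assumption, and the counts are hypothetical.

```python
from math import comb


def mcnemar_exact(b, c):
    """Two-sided exact McNemar test.

    b: pairs where only test 1 was correct; c: pairs where only test 2 was
    correct. Under the null hypothesis, b follows Binomial(b + c, 0.5).
    """
    n = b + c
    if n == 0:
        return 1.0
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)


# Hypothetical discordant counts among patients with PDA: 2 detected only by
# the single marker, 12 detected only by the panel (not study data).
p_value = mcnemar_exact(2, 12)
print(f"p = {p_value:.4f}")
```

Concordant pairs do not enter the calculation, which is why two tests with similar overall accuracy can still differ significantly when their errors fall on different individuals.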
Results
Baseline characteristics of the study cohorts
The baseline demographic and clinical characteristics of patients in Cohorts A and B are summarized in Supplementary Table 1. Cohort A comprised 355 individuals, including 174 patients diagnosed with PDA and 181 healthy controls. Among the patients with PDA, 87 (50.0%) were men, with a median age of 65.0 years. The disease stages were distributed as follows: stage I (19 patients, 10.9%), stage II (23 patients, 13.2%), stage III (48 patients, 27.6%), and stage IV (84 patients, 48.3%). Tumors were most frequently found in the pancreatic head/uncinate process (75 patients, 43.1%), followed by the pancreatic tail (51 patients, 29.3%), and the pancreatic body (48 patients, 27.6%). The median tumor size at diagnosis was 38.9 mm (± 17.1 mm), with tumor size significantly increasing with disease progression (stage I: 23.7 mm ± 10.3 mm; stage II: 33.7 mm ± 12.6 mm; stage III: 37.0 mm ± 15.4 mm; stage IV: 44.8 mm ± 17.6 mm; p < 0.001).
Cohort B, serving as an independent validation cohort, comprised 130 individuals, including 100 PDA patients and 30 healthy controls, independent of Cohort A. The disease stages were distributed as follows: stage II (54 patients, 54.0%), stage III (17 patients, 17.0%), and stage IV (29 patients, 29.0%). Tumors were most frequently found in the pancreatic head/uncinate process (53 patients, 53.0%), followed by the pancreatic body (27 patients, 27.0%) and pancreatic tail (20 patients, 20.0%). The median tumor size at diagnosis was 42.2 mm (± 25.4 mm).
Differential expression of proteomic biomarkers in patients with PDA and healthy controls (Cohort A)
Of the 47 biomarkers analyzed, 12 (sHER2, sE-selectin, sVEGFR1, sPECAM-1, FGF1, NSE, FGF2, β-hCG, HE4, ALDH1A1, EpCAM, and IGFBP3) showed no significant differences between patients with PDA and healthy controls. The remaining 35 biomarkers exhibited significant differential expression, suggesting their potential as discriminatory markers for PDA detection. Specifically, biomarkers were categorized according to significance levels as follows: (1) P ≤ 0.05: G-CSF, DKK1, MIF, Leptin, VEGF, CD44, SHBG; (2) P ≤ 0.01: sAXL, sEGFR, CA15-3, CEA, sFas, TNFα, Cyfra21-1; (3) P ≤ 0.001: Endoglin, Prolactin, Kallikrein-6, Galectin-3, Ferritin; (4) P ≤ 0.0001: TSP2, suPAR, OPN, Angiopoietin-2, Follistatin, GDF15, OPG, CA19-9, IL-6, IL-8, HGF, SCF, TGF-α, Mesothelin, Midkine, Myeloperoxidase (Fig. 2a, b; Supplementary Table 2 and Supplementary Fig. 1).
Serum protein biomarker expression profiles in patients with PDA and healthy controls. (a) Heatmap illustrating the intensity and distribution of serum protein biomarkers in patients with PDA and healthy controls. Rows represent individual biomarkers, and columns represent individual samples. Higher expression levels are indicated in red, while lower levels are shown in blue. (b) Boxplots of Z-score distributions for 47 serum protein biomarkers across three groups: healthy controls (blue), early-stage (stage I/II, orange) PDA, and advanced-stage (stage III/IV, red) PDA. Each box represents the IQR, with whiskers extending to 1.5 times the IQR; individual dots indicate outliers. (c) Scatter plots comparing biomarker distributions in Cohort A. (d) Scatter plots comparing biomarker distributions in Cohort B.
Notably, CA19-9, GDF15, suPAR, HGF, and IL-8 were significantly higher in patients with PDA than in healthy controls across all disease stages, reinforcing their diagnostic potential (Fig. 2c, d). CA19-9 levels were significantly higher in the advanced PDA group than in the early group. However, levels of suPAR, GDF15, HGF, and IL-8 showed no significant differences between the early- and advanced-stage PDA groups.
ML-based biomarker panel development for diagnosing all-stage PDA (Cohort A)
The CatBoost model achieved the highest diagnostic accuracy with an AUROC of 0.993 (Fig. 3a, b; Supplementary Fig. 2). In contrast, SVM and KNN demonstrated poorer diagnostic performance (AUROC 0.904 and 0.850, respectively) with higher rates of false results, revealing limitations in differentiating between patients with PDA and healthy controls. The ensemble model combining CatBoost, LightGBM, and XGBoost also achieved an AUROC of 0.993, identical to that of the CatBoost model alone, indicating no improvement over the single CatBoost classifier.
Performance evaluation of the ML-based biomarker panel for all-stage PDA diagnosis. (a) ROC curve comparing the diagnostic performance of multiple ML models in cohort A. (b) Confusion matrices for different ML algorithms. (c) Feature importance analysis for all-stage PDA using Random Forest. (d) Feature importance analysis for all-stage PDA using XGBoost. (e) Feature importance analysis for all-stage PDA using LightGBM. (f) Feature importance analysis for all-stage PDA using CatBoost.
SHAP analysis using the CatBoost model identified CA19-9 as the most influential biomarker (Fig. 3c–f), followed by IL-8, GDF15, suPAR, and HGF. These biomarkers remained top contributors across Random Forest, XGBoost, and LightGBM models, confirming their significance in optimizing diagnostic performance. The diagnostic performance of the biomarker combination panels was further evaluated using the CatBoost model. The combination of CA19-9, GDF15, and suPAR outperformed CA19-9 alone in diagnosing all-stage PDA, achieving a higher AUROC (0.992 [0.985–0.997] vs. 0.952 [0.919–0.973]; p = 0.001, DeLong’s test), sensitivity (95.4 [91.5–98.4]% vs. 87.4 [82.2–91.6]%; p < 0.001, McNemar’s test), and PPV (97.1 [94.4–99.4]% vs. 94.4 [90.8–97.4]%; p < 0.001, McNemar’s test) (Table 1, Fig. 5a).
ML-based biomarker panel development for diagnosing early-stage (stage I/II) PDA (Cohort A)
The CatBoost model achieved the highest diagnostic performance with an AUROC of 0.981, demonstrating superior sensitivity and specificity while reducing FN and improving diagnostic precision. LightGBM (AUROC: 0.981) and XGBoost (AUROC 0.978) performed well, though they exhibited slightly higher FN rates than those of CatBoost. In contrast, SVM (AUROC 0.891) and KNN (AUROC 0.777) demonstrated suboptimal performance, suggesting limitations in capturing complex biomarker interactions for early-stage PDA detection. An ensemble model combining CatBoost, LightGBM, and XGBoost achieved an AUROC of 0.982, which represented only a negligible improvement compared to the CatBoost model alone (AUROC 0.981) in distinguishing early-stage PDA from healthy controls (Fig. 4a, b).
Performance evaluation of the ML-based biomarker panel for early-stage PDA diagnosis. (a) ROC curve comparing the diagnostic performance of multiple ML models in cohort A. (b) Confusion matrices for different ML algorithms. (c) Feature importance analysis for early-stage PDA using Random Forest. (d) Feature importance analysis for early-stage PDA using XGBoost. (e) Feature importance analysis for early-stage PDA using LightGBM. (f) Feature importance analysis for early-stage PDA using CatBoost.
SHAP analysis using a CatBoost model identified CA19-9 as the most influential biomarker, followed by IL-8, suPAR, and GDF15, which provided strong complementary value in improving diagnostic accuracy (Fig. 4c–f). These biomarkers consistently ranked among the top predictors across multiple ML models, highlighting their significance in early-stage PDA detection. The diagnostic performance of the biomarker combination panels was further evaluated using the CatBoost model. The CA19-9, GDF15, and suPAR panel significantly outperformed CA19-9 alone in early-stage PDA detection, achieving superior AUROC (0.976 [0.957–0.994] vs. 0.868 [0.774–0.933]), sensitivity (85.7 [72.5–93.9]% vs. 66.7 [52.9–78.6]%), and PPV (90.0 [81.3–98.9]% vs. 84.8 [72.7–95.7]%), thereby improving diagnostic accuracy and reducing FNs (Table 1; Fig. 5b).
Subgroup analysis in patients with normal CA19-9 levels (≤ 37 U/mL) (Cohort A)
The distribution of CA19-9 levels by tumor stage was as follows: stage I, 255.3 ± 532.3; stage II, 285.0 ± 436.9; stage III, 292.0 ± 520.6; and stage IV, 1373.2 ± 2145.8 (Supplementary Table 2). Among the 174 patients with PDA in Cohort A, 25 (14.4%) had normal CA19-9 levels (≤ 37 U/mL). In patients with normal CA19-9 levels (≤ 37 U/mL), CA19-9 alone showed limited diagnostic performance (AUROC 0.715, 95% CI: 0.590–0.808). By contrast, the combination of CA19-9, GDF15, and suPAR achieved markedly improved performance (AUROC 0.948, 95% CI: 0.910–0.979) (Supplementary Table 4).
Diagnostic performance of biomarker combination panels in stage I PDA patients (Cohort A)
Among the 174 patients with PDA in Cohort A, only 19 (10.9%) were diagnosed at stage I. Subgroup analysis restricted to stage I PDA patients demonstrated that CA19-9 alone provided limited diagnostic accuracy (AUROC 0.724, 95% CI: 0.538–0.914). By contrast, the biomarker panel consisting of CA19-9, GDF15, and suPAR achieved markedly improved performance, with an AUROC of 0.949 (95% CI: 0.906–0.989). These findings indicate that the multi-marker panel substantially outperformed CA19-9 alone for the early detection of PDA at stage I (Supplementary Table 5).
Age-stratified diagnostic performance of the biomarker panel (Cohort A)
Subgroup analyses stratified by age demonstrated that the biomarker panel retained robust diagnostic performance across both younger (< 65 years) and older (≥ 65 years) patients. For individuals < 65 years, the combination of CA19-9, GDF15, and suPAR achieved an AUROC of 0.990 (95% CI: 0.984–0.996), sensitivity of 90.9% (95% CI: 85.2–96.2), and specificity of 96.7% (95% CI: 94.6–99.2). In patients ≥ 65 years, the same panel achieved an identical AUROC of 0.990 (95% CI: 0.978–0.999), with sensitivity of 94.2% (95% CI: 89.5–98.7) and specificity of 98.9% (95% CI: 97.7–100.0). These findings indicate that the biomarker panel outperformed CA19-9 alone across all age groups, with no evidence of effect modification by age (Supplementary Table 6).
Gender-stratified diagnostic performance of the biomarker panel (Cohort A)
We further assessed diagnostic performance stratified by gender. The biomarker panel maintained robust accuracy in both male and female patients. In males, the combination of CA19-9, GDF15, and suPAR achieved an AUROC of 0.996 (95% CI: 0.992–0.998), sensitivity of 90.8% (95% CI: 85.7–95.7), and specificity of 98.3% (95% CI: 96.2–100.0). In females, the same panel demonstrated an AUROC of 0.987 (95% CI: 0.978–0.997), sensitivity of 89.7% (95% CI: 84.8–95.1), and specificity of 97.8% (95% CI: 96.2–99.5). These findings indicate that the biomarker panel outperformed CA19-9 alone across both sexes, with no evidence of effect modification by gender (Supplementary Table 7).
Validation of the diagnostic performance of the serum protein biomarker panel (Cohort B)
Serum protein biomarker levels in patients with PDA (by stage) and healthy controls are presented in Supplementary Table 3. In all-stage PDA diagnosis, the combined CA19-9, GDF15, and suPAR biomarker panel outperformed CA19-9 alone, with higher AUROC (0.977 [0.954–0.991] vs. 0.829 [0.756–0.877]), sensitivity (95.0 [90.2–98.6]% vs. 83.0 [74.7–89.0]%), and PPV (93.1 [88.8–97.6]% vs. 83.8 [76.3–89.0]%). Similarly, for early-stage (stage I/II) PDA detection, the combined panel achieved superior performance compared with CA19-9 alone, with higher AUROC (0.987 [0.970–0.999] vs. 0.879 [0.806–0.937]), sensitivity (96.3 [90.8–100.0]% vs. 81.5 [72.8–91.0]%), and PPV (91.2 [84.0–98.2]% vs. 84.6 [75.6–92.8]%) (Table 2; Fig. 5c, d; Supplementary Figs. 3–4).
Discussion
This study developed an ML-based serum biomarker panel to improve PDA diagnosis by analyzing 47 candidate proteins. SHAP analysis identified five key biomarkers (CA19-9, GDF15, suPAR, HGF, and IL-8) as the most relevant for distinguishing individuals with PDA from healthy controls. Among the ML models evaluated, CatBoost demonstrated the highest diagnostic performance. In Cohort A (n = 355), a diagnostic panel comprising CA19-9, GDF15, and suPAR demonstrated significantly better performance than that of CA19-9 alone for both all-stage (AUROC 0.992 vs. 0.952) and early-stage PDA (AUROC: 0.976 vs. 0.868). These findings were independently validated in Cohort B (n = 130), where the combined panel consistently outperformed CA19-9 alone in both all-stage (0.977 vs. 0.829) and early-stage PDA (0.987 vs. 0.879).
Early diagnosis of PDA remains a significant challenge because of its low incidence, rapid progression, nonspecific early symptoms, and absence of reliable biomarkers20,21. Therefore, most patients are diagnosed at an advanced stage, significantly limiting treatment options and leading to poor prognosis9. PDA is an aggressive malignancy with a poor prognosis and is projected to become the second leading cause of cancer-related deaths by 203022,23. Surgical intervention significantly improves survival rates, particularly in early-stage cancers24. Notably, the 5YSR for stage IA PDA has improved markedly, from 44.7% in 2004 to 83.7% in 201225. These trends underscore the critical importance of early detection of potentially curable PDA to improve long-term survival outcomes26.
CA19-9, first identified by Koprowski in 1979, is the most widely used and validated biomarker for PDA. It demonstrates moderate diagnostic performance, with a sensitivity of 79% (range: 70–90%) and specificity of 82% (range: 68–91%)11,27,28. However, the United States Preventive Services Task Force29, American Society of Clinical Oncology30, and European Group on Tumor Markers31 recommend against using CA19-9 for routine PDA screening in the general population because of its diagnostic limitations. To address these challenges, ongoing multi-omics studies are focused on identifying more reliable biomarkers for PDA9. Moreover, considerable progress has been made in the discovery, optimization, and clinical validation of diagnostic biomarkers derived from various biological sources, including blood (e.g., extracellular vesicles32,33, circulating tumor DNA [ctDNA] or cell-free DNA [cfDNA]34,35, mRNA, microRNA [miR]36, small nuclear RNA37, long noncoding RNA38, proteins39, and metabolites40) and pancreatic fluid41. Recent advances in liquid biopsy-based technologies present a promising complement to traditional cancer screening by enabling the detection of multiple cancer types from a single blood sample. While assays based on cfDNA methylation, such as PATHFINDER and Galleri, demonstrate high specificity in detecting PDA, they are limited by low sensitivity for early-stage disease42,43. In PDA, ctDNA represents only a small fraction of the total cell-free DNA circulating in the bloodstream, which complicates its detection and reduces its accuracy as a diagnostic tool44. In contrast, proteins, as terminal products of the central dogma, integrate genomic, transcriptomic, and post-translational modifications, providing valuable insights into the dysregulated pathways driving PDA progression13.
Given the high molecular complexity and heterogeneity of PDA, proteomic analysis facilitates the identification of diverse proteins and their modifications, offering a detailed understanding of the molecular landscape of the disease45,46. Serum proteins, such as CA19-9, GDF15, and suPAR, demonstrate substantially higher abundance in biofluids than that of ctDNA, enabling the detection of localized tumors with a smaller disease burden. CancerSEEK’s multi-analyte approach, combining protein markers with ctDNA analysis, achieved 70% sensitivity for early-stage gastrointestinal cancers by leveraging the biological amplification inherent to secretory pathways47. This highlights the complementary role of protein biomarkers in overcoming the limitations of cfDNA-based methods while also capturing the dynamic tumor-stroma interactions critical for PDA pathogenesis.
We selected the Luminex™ bead-based immunoassay as the primary detection platform for its ability to simultaneously quantify multiple protein biomarkers in small serum volumes (25 µL per assay) with high analytical sensitivity and specificity. Compared to mass spectrometry, Luminex offers a shorter turnaround time, lower per-sample cost, and operational feasibility in routine clinical laboratories. Importantly, several FDA-cleared Luminex assays are currently in clinical use, supporting its translational potential. These characteristics make the platform well-suited for large-scale, multi-center validation studies and for eventual integration into high-throughput clinical screening workflows. At this discovery stage, we cannot determine the exact cost of the proposed panel since two components (GDF15 and suPAR) are not yet commercially available as diagnostic kits. The final per-test cost will depend on future commercialization, production scale, and supply chain factors.
In addition to our findings, several large-scale studies have evaluated biomarker panels for PDA in independent validation cohorts. Athanasiou et al. validated a serum biomarker panel for PDA in a large multicenter cohort of over 600 patients, highlighting the importance of extensive validation for clinical translation48. Palma et al. demonstrated robust diagnostic performance of a proteomics-based biomarker panel for PDA in a large validation cohort, further supporting the utility of multi-marker approaches49. Other studies have identified several novel protein biomarkers for PDA diagnosis, including tissue inhibitor of metalloproteinases 1 (TIMP1), leucine-rich alpha-2-glycoprotein 1 (LRG1), carcinoembryonic antigen (CEA), hepatocyte growth factor (HGF), osteopontin, thrombospondin-2 (THBS2), polymeric immunoglobulin receptor (PIGR), von Willebrand factor (vWF), and fibrinogen. Capello et al. analyzed plasma samples from 187 patients with PDA and 169 controls, demonstrating that a biomarker panel combining TIMP1, LRG1, and CA19-9 significantly improved early-stage PDA detection (AUROC: 0.949, 84% sensitivity at 95% specificity) with better performance than that of CA19-9 alone50. Moreover, Cohen et al. reported that a multi-analyte liquid biopsy approach, integrating four protein biomarkers (CA19-9, CEA, HGF, and osteopontin) with Kirsten rat sarcoma viral oncogene homolog (KRAS) mutant circulating tumor DNA, detected resectable PDA with 64% sensitivity and 99% specificity in a cohort of 221 patients and 182 controls51. Kim et al. demonstrated that combining plasma THBS2 with CA19-9 significantly improved diagnostic performance, achieving an AUROC of 0.949, with sensitivity of 84% and specificity of 95% in a large cohort (n = 537)52. Similarly, Byeon et al. reported that the combination of PIGR, vWF, fibrinogen, and CA19-9 achieved higher diagnostic performance for early-stage PDA (AUROC: 0.979) than that of CA19-9 alone (AUROC: 0.911)53. 
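Several of the studies above report sensitivity at a fixed specificity (for example, 84% sensitivity at 95% specificity). As a minimal sketch of how such a figure is typically derived, the snippet below picks the cutoff from the control distribution and then counts the cases above it. All values are synthetic, and the function names `threshold_for_specificity` and `sensitivity_at` are illustrative assumptions, not code from any cited study.

```python
def threshold_for_specificity(control_values, target_specificity):
    """Smallest observed cutoff at which at least the target fraction of
    controls fall at or below the cutoff (i.e. are called negative)."""
    ordered = sorted(control_values)
    n = len(ordered)
    for cutoff in ordered:
        negatives = sum(1 for v in ordered if v <= cutoff)
        if negatives / n >= target_specificity:
            return cutoff
    return ordered[-1]


def sensitivity_at(case_values, cutoff):
    """Fraction of cases strictly above the cutoff (called positive)."""
    return sum(1 for v in case_values if v > cutoff) / len(case_values)


# Synthetic marker values for illustration only (not study data).
controls = list(range(1, 21))                       # 20 hypothetical controls
cases = [5, 18, 22, 25, 30, 40, 21, 19.5, 12, 50]   # 10 hypothetical cases

cutoff = threshold_for_specificity(controls, 0.95)
sensitivity = sensitivity_at(cases, cutoff)
print(f"cutoff = {cutoff}, sensitivity at 95% specificity = {sensitivity:.2f}")
```

Fixing the specificity first makes panels comparable at the same false-positive rate, which is why this operating point is commonly reported alongside the AUROC.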
Consistent with previous research, our biomarker panel, comprising CA19-9, GDF15, and suPAR, significantly outperformed CA19-9 alone in diagnostic performance, with AUROC values of 0.992 in Cohort A and 0.977 in Cohort B for all stages. This enhanced diagnostic performance can be attributed to the complementary nature of the multiple biomarkers, which reflect distinct aspects of tumor biology, thereby providing comprehensive coverage of PDA’s inherent heterogeneity and overcoming the limitations associated with single-marker diagnostics.
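For readers less familiar with the metric, the AUROC can be read as the probability that a randomly chosen patient with PDA receives a higher score than a randomly chosen control. The sketch below computes it directly from that definition (the Mann-Whitney formulation). The scores are synthetic and purely illustrative; they are not the study data, and the function name `auroc` is an assumption.

```python
def auroc(scores_pos, scores_neg):
    """AUROC as the probability that a random case outscores a random control.

    Ties count as half a win (Mann-Whitney U formulation).
    """
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Synthetic, illustrative scores only (not study data).
cases_single = [0.9, 0.8, 0.4, 0.7]      # hypothetical single-marker scores, PDA
controls_single = [0.3, 0.5, 0.2, 0.6]   # hypothetical single-marker scores, controls
cases_panel = [0.95, 0.85, 0.7, 0.8]     # hypothetical panel scores, PDA
controls_panel = [0.2, 0.4, 0.1, 0.3]    # hypothetical panel scores, controls

auc_single = auroc(cases_single, controls_single)
auc_panel = auroc(cases_panel, controls_panel)
print(f"single marker AUROC = {auc_single:.3f}")
print(f"panel AUROC = {auc_panel:.3f}")
```

In this toy example one case scores below two controls on the single marker, which lowers its AUROC, whereas the panel scores separate the two groups completely.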
Growth differentiation factor 15 (GDF-15), also known as macrophage inhibitory cytokine-1 (MIC-1), is a member of the transforming growth factor-beta (TGF-β) superfamily54,55. GDF-15 is most highly expressed in the placenta and prostate, with lower expression levels observed in the pancreas, liver, gallbladder, colon, stomach, bladder, kidney, and endometrium54,56. Notably, serum GDF-15 levels were significantly higher in patients with PDA than in healthy controls and those with chronic pancreatitis (7694.6 vs. 2247.9 pg/mL), highlighting its potential as a diagnostic biomarker57,58,59,60. Beyond its diagnostic utility, GDF-15 plays a role in immune evasion in PDA by suppressing macrophage activity through NF-κB-mediated inhibition of TNF and nitric oxide production, thereby contributing to tumor progression54,61,62. Several studies have demonstrated that combining GDF-15/MIC-1 with CA19-9 significantly improves both sensitivity and specificity for PDA diagnosis63,64,65. A meta-analysis of 14 studies involving 2,826 participants reported that MIC-1 exhibited higher sensitivity for PDA diagnosis than CA19-9 (80% vs. 71%)66. Furthermore, serum MIC-1 levels were more effective than CA19-9 levels in distinguishing patients with resectable PDA from controls67,68. These findings underscore the potential of GDF-15 as a valuable biomarker for PDA detection, particularly when used alongside CA19-9.
The urokinase plasminogen activator receptor (uPAR) binds urokinase-type plasminogen activator (uPA), promoting the conversion of plasminogen to plasmin and thereby driving fibrin degradation, tumor invasion, and cancer progression69,70,71,72,73. While uPA and uPAR are minimally expressed in normal tissues, they are markedly overexpressed in malignancies, including PDA, where uPA and uPAR levels are approximately sixfold and fourfold higher, respectively, than in normal tissues74,75,76,77,78,79,80,81,82. Soluble uPAR (suPAR), the circulating form of membrane-bound uPAR, plays a crucial role in cancer progression, immune activation, and inflammation83. Plasma suPAR levels are also significantly higher in patients with PDA than in those with chronic pancreatitis. At a cutoff value of 2.8 ng/mL, suPAR demonstrated a sensitivity of 88% and specificity of 70%84, while at a cutoff value of 3.2 ng/mL, it demonstrated a sensitivity of 82% and specificity of 43%. However, when suPAR was combined with CA19-9, the specificity improved significantly to 86–88%. In another study, combining suPAR and CA19-9 further enhanced the ability to differentiate between patients with PDA and healthy controls, achieving 88.5% sensitivity and 98% specificity85.
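To make the cutoff-based figures concrete, the following sketch computes sensitivity and specificity for a single marker at a fixed cutoff and then for a simple combination rule. The cited studies' combination methods are not described here, so the "both markers positive" rule is only one possible assumption, chosen because it shows why specificity can rise when a second marker is added. All values are synthetic, and the CA19-9 cutoff of 37 U/mL is an assumed illustrative threshold that is not stated in this section.

```python
def sens_spec(calls, labels):
    """Return (sensitivity, specificity) for boolean calls against 0/1 labels."""
    tp = sum(1 for c, y in zip(calls, labels) if y == 1 and c)
    fn = sum(1 for c, y in zip(calls, labels) if y == 1 and not c)
    tn = sum(1 for c, y in zip(calls, labels) if y == 0 and not c)
    fp = sum(1 for c, y in zip(calls, labels) if y == 0 and c)
    return tp / (tp + fn), tn / (tn + fp)


# Synthetic values for illustration only (not study data).
labels = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]                     # 1 = PDA, 0 = control
supar = [3.5, 4.1, 2.9, 3.0, 2.5, 3.1, 2.0, 2.9, 1.8, 2.2]  # ng/mL
ca199 = [120, 45, 80, 20, 60, 10, 50, 15, 8, 12]            # U/mL

SUPAR_CUTOFF = 2.8  # ng/mL, one of the cutoffs quoted in the text
CA199_CUTOFF = 37   # U/mL, assumed for illustration; not given in this section

supar_pos = [v > SUPAR_CUTOFF for v in supar]
ca199_pos = [v > CA199_CUTOFF for v in ca199]
both_pos = [a and b for a, b in zip(supar_pos, ca199_pos)]  # assumed AND rule

for name, calls in [("suPAR", supar_pos), ("CA19-9", ca199_pos), ("both", both_pos)]:
    sens, spec = sens_spec(calls, labels)
    print(f"{name}: sensitivity = {sens:.2f}, specificity = {spec:.2f}")
```

With an AND rule, a control must exceed both cutoffs to be a false positive, so specificity rises while sensitivity can fall. Model-based combinations, such as the ML classifiers used in this study, aim to improve both at once.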
Linear models, such as generalized linear models, were initially considered but demonstrated inferior performance due to the complex and nonlinear interactions among biomarkers. Therefore, tree-based ML algorithms (CatBoost, LightGBM, and XGBoost) were selected for their ability to capture such nonlinear relationships. ML methods, including feature selection and classification algorithms, significantly enhance biomarker identification by efficiently extracting relevant features and classifying samples86. In our study, several ML techniques, including CatBoost, XGBoost, LightGBM, SVM, KNN, and Random Forest, were employed to identify an optimal biomarker combination for PDA diagnosis. Among these methods, tree-based models, particularly CatBoost, achieved the highest AUROC, accuracy, sensitivity, and specificity. One key challenge in ML is the ‘black box’ nature of certain algorithms, which limits their interpretability. To address this, SHAP analysis was used to quantify the contributions of each biomarker, generating a ranked list and illustrating their influence on PDA classification. This approach enhanced the model’s transparency, supporting the clinical translation of our findings and facilitating integration into diagnostic workflows. Our study highlights the role of ML and SHAP in PDA biomarker discovery and validation. The identified panel offers a promising noninvasive method for early detection and improved screening, particularly for high-risk populations.
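The idea behind SHAP can be illustrated with exact Shapley values on a very small model. The sketch below is not the study's CatBoost model and does not use the SHAP library, which relies on faster tree-specific algorithms. It simply enumerates all feature coalitions for a hypothetical three-feature score with one interaction term, replacing absent features with a baseline value. The model, its coefficients, and the input values are all assumptions made for illustration.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley attribution of model(x) - model(baseline) to each feature.

    Features outside a coalition are set to their baseline value.
    """
    n = len(x)
    features = range(n)

    def value(coalition):
        mixed = [x[i] if i in coalition else baseline[i] for i in features]
        return model(mixed)

    phi = [0.0] * n
    for i in features:
        others = [j for j in features if j != i]
        for size in range(n):
            # Weight of a coalition of this size in the Shapley average.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_model(v):
    """Hypothetical nonlinear score with an interaction term (not the study model)."""
    ca199, gdf15, supar = v
    return 0.5 * ca199 + 0.3 * gdf15 + 0.2 * supar + 0.4 * ca199 * gdf15


x = [1.0, 1.0, 0.0]         # hypothetical scaled marker values for one sample
baseline = [0.0, 0.0, 0.0]  # hypothetical reference sample
phi = shapley_values(toy_model, x, baseline)
for name, contribution in zip(["CA19-9", "GDF15", "suPAR"], phi):
    print(f"{name}: {contribution:+.3f}")
```

The contributions sum to the difference between the model output for the sample and for the baseline, and the interaction term is split equally between the two features involved. Ranking features by the mean absolute contribution across samples yields the kind of importance list described above.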
This study has some limitations. First, although our biomarker panel was validated using two independent cohorts, the number of patients with early-stage PDA remains limited. Neoadjuvant chemotherapy is now commonly used in patients with resectable or borderline resectable PDA to improve surgical outcomes, which reduces the availability of untreated patients with early-stage PDA for research. Therefore, external validation in a larger, multi-institutional cohort is essential to confirm the generalizability of our findings across diverse populations. Second, detecting low-abundance proteins or those with rapid turnover in circulation remains challenging. Some proteomic biomarkers may be present at undetectable levels, or their measurement could be masked by highly abundant proteins, potentially limiting their diagnostic utility. In addition, certain biomarkers (e.g., suPAR, HGF, GDF15) exhibited inter-cohort variability. These discrepancies likely stem from biological heterogeneity (such as demographic differences, comorbidities, or disease stage distribution) and pre-analytical variability. Future multicenter validation with harmonization strategies will be needed to minimize potential cohort-specific effects.
Despite certain limitations, our study offers several notable strengths. First, we systematically evaluated 47 serum proteins and employed an ML-based feature selection strategy to identify an optimal multi-marker panel, addressing the biological heterogeneity of PDA that cannot be captured by a single biomarker. The resulting three-biomarker panel (CA19-9, GDF15, and suPAR) achieved superior diagnostic performance and demonstrated consistent accuracy across both development and independent validation cohorts. Importantly, the model was applied to the validation cohort without retraining, confirming its external validity and robustness. Second, the panel retained strong diagnostic performance in clinically critical subgroups, including stage I PDA and patients with normal CA19-9 levels, underscoring its potential utility for early detection and in populations where the diagnostic value of CA19-9 is limited. Finally, unlike most prior studies that relied on CA19-9 alone or small biomarker panels without rigorous external validation, our work integrates multiple complementary biomarkers with ML optimization, validated in a multicenter independent cohort. These features emphasize the novelty, clinical significance, and translational potential of our biomarker panel, particularly for high-risk population screening.
Conclusion
This study developed an ML-based biomarker panel incorporating CA19-9, GDF15, and suPAR for PDA diagnosis. The panel significantly outperformed CA19-9 alone, particularly in early-stage detection. The CatBoost model achieved high accuracy, with AUROC values of 0.992 for all stages and 0.976 for early-stage PDA, and validation confirmed its robustness. These findings highlight the potential of ML-driven biomarker panels in enhancing noninvasive PDA detection.
Data availability
The datasets generated and analyzed during the current study are publicly available in the Zenodo repository at: https://doi.org/10.5281/zenodo.15844304.
References
Khalaf, N., El-Serag, H. B., Abrams, H. R. & Thrift, A. P. Burden of pancreatic cancer: from epidemiology to practice. Clin. Gastroenterol. Hepatol. 19, 876–884. https://doi.org/10.1016/j.cgh.2020.02.054 (2021).
National Cancer Institute (NCI). Cancer StatFacts: pancreatic cancer. NCI website. https://seer.cancer.gov/statfacts/html/pancreas.html (Accessed May 2023).
Conroy, T. et al. FOLFIRINOX versus gemcitabine for metastatic pancreatic cancer. N Engl. J. Med. 364, 1817–1825. https://doi.org/10.1056/NEJMoa1011923 (2011).
Von Hoff, D. D. et al. Increased survival in pancreatic cancer with nab-paclitaxel plus gemcitabine. N Engl. J. Med. 369, 1691–1703. https://doi.org/10.1056/NEJMoa1304369 (2013).
Strobel, O., Neoptolemos, J., Jäger, D. & Büchler, M. W. Optimizing the outcomes of pancreatic cancer surgery. Nat. Rev. Clin. Oncol. 16, 11–26. https://doi.org/10.1038/s41571-018-0112-1 (2019).
Neoptolemos, J. P. et al. Therapeutic developments in pancreatic cancer: current and future perspectives. Nat. Rev. Gastroenterol. Hepatol. 15, 333–348. https://doi.org/10.1038/s41575-018-0005-x (2018).
Conroy, T. et al. FOLFIRINOX or gemcitabine as adjuvant therapy for pancreatic cancer. N Engl. J. Med. 379, 2395–2406. https://doi.org/10.1056/NEJMoa1809775 (2018).
Strobel, O. et al. Actual five-year survival after upfront resection for pancreatic ductal adenocarcinoma: who beats the odds? Ann. Surg. 275, 962–971. https://doi.org/10.1097/sla.0000000000004147 (2022).
Pereira, S. P. et al. Early detection of pancreatic cancer. Lancet Gastroenterol. Hepatol. 5, 698–710. https://doi.org/10.1016/s2468-1253(19)30416-9 (2020).
Kamisawa, T., Wood, L. D., Itoi, T. & Takaori, K. Pancreatic cancer. Lancet 388, 73–85. https://doi.org/10.1016/s0140-6736(16)00141-0 (2016).
Goonetilleke, K. S. & Siriwardena, A. K. Systematic review of carbohydrate antigen (CA 19-9) as a biochemical marker in the diagnosis of pancreatic cancer. Eur. J. Surg. Oncol. 33, 266–270. https://doi.org/10.1016/j.ejso.2006.10.004 (2007).
Ballehaninna, U. & Chamberlain, R. The clinical utility of serum CA 19-9 in the diagnosis, prognosis and management of pancreatic adenocarcinoma: an evidence based appraisal. J. Gastrointest. Oncol. 3, 105–119. https://doi.org/10.3978/j.issn.2078-6891.2011.021 (2012).
Vellan, C. J. et al. Application of proteomics in pancreatic ductal adenocarcinoma biomarker investigations: A review. Int. J. Mol. Sci. 23 https://doi.org/10.3390/ijms23042093 (2022).
Ozaki, Y., Broughton, P., Abdollahi, H., Valafar, H. & Blenda, A. V. Integrating omics data and AI for cancer diagnosis and prognosis. Cancers (Basel). 16 https://doi.org/10.3390/cancers16132448 (2024).
Benkeser, D., Ju, C., Lendle, S. & van der Laan, M. Online cross-validation-based ensemble learning. Stat. Med. 37, 249–260. https://doi.org/10.1002/sim.7320 (2018).
Hu, C., Gao, C., Li, T., Liu, C. & Peng, Z. Explainable artificial intelligence model for mortality risk prediction in the intensive care unit: a derivation and validation study. Postgrad. Med. J. 100, 219–227. https://doi.org/10.1093/postmj/qgad144 (2024).
Lee, H. et al. Serum protein profiling of lung, pancreatic, and colorectal cancers reveals alcohol consumption-mediated disruptions in early-stage cancer detection. Heliyon 8, e12359. https://doi.org/10.1016/j.heliyon.2022.e12359 (2022).
Qiu, C. et al. A luminex approach to develop an anti-tumor-associated antigen autoantibody panel for the detection of prostate cancer in racially/ethnically diverse populations. Cancers (Basel). 15 https://doi.org/10.3390/cancers15164064 (2023).
R&D Systems. Human Premixed Multi-Analyte Kit: Luminex® Assay Protocol. https://resources.rndsystems.com/pdfs/datasheets/lxsah.pdf (R&D Systems Inc., 2017).
Fahrmann, J. F. et al. Lead-time trajectory of CA19-9 as an anchor marker for pancreatic cancer early detection. Gastroenterology 160, 1373–1383.e1376. https://doi.org/10.1053/j.gastro.2020.11.052 (2021).
Wood, L. D., Canto, M. I., Jaffee, E. M. & Simeone, D. M. Pancreatic cancer: pathogenesis, screening, diagnosis, and treatment. Gastroenterology 163, 386–402e381. https://doi.org/10.1053/j.gastro.2022.03.056 (2022).
Bray, F. et al. Global cancer statistics 2018: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J. Clin. 68, 394–424. https://doi.org/10.3322/caac.21492 (2018).
Rahib, L. et al. Projecting cancer incidence and deaths to 2030: the unexpected burden of thyroid, liver, and pancreas cancers in the United States. Cancer Res. 74, 2913–2921. https://doi.org/10.1158/0008-5472.Can-14-0155 (2014).
Shakeel, S., Finley, C., Akhtar-Danesh, G., Seow, H. Y. & Akhtar-Danesh, N. Trends in survival based on treatment modality in patients with pancreatic cancer: a population-based study. Curr. Oncol. 27, e1–e8. https://doi.org/10.3747/co.27.5211 (2020).
Blackford, A. L., Canto, M. I., Klein, A. P., Hruban, R. H. & Goggins, M. Recent trends in the incidence and survival of stage 1A pancreatic cancer: A surveillance, epidemiology, and end results analysis. J. Natl. Cancer Inst. 112, 1162–1169. https://doi.org/10.1093/jnci/djaa004 (2020).
Hur, C. et al. Early pancreatic ductal adenocarcinoma survival is dependent on size: positive implications for future targeted screening. Pancreas 45, 1062–1066. https://doi.org/10.1097/mpa.0000000000000587 (2016).
Koprowski, H., Herlyn, M., Steplewski, Z. & Sears, H. F. Specific antigen in serum of patients with colon carcinoma. Science 212, 53–55. https://doi.org/10.1126/science.6163212 (1981).
Luo, G. et al. Roles of CA19-9 in pancreatic cancer: biomarker, predictor and promoter. Biochim. Biophys. Acta Rev. Cancer. 1875, 188409. https://doi.org/10.1016/j.bbcan.2020.188409 (2021).
Owens, D. K. et al. Screening for pancreatic cancer: US preventive services task force reaffirmation recommendation statement. Jama 322, 438–444. https://doi.org/10.1001/jama.2019.10232 (2019).
Locker, G. Y. et al. ASCO 2006 update of recommendations for the use of tumor markers in gastrointestinal cancer. J. Clin. Oncol. 24, 5313–5327. https://doi.org/10.1200/jco.2006.08.2644 (2006).
Duffy, M. J. et al. Tumor markers in pancreatic cancer: a European group on tumor markers (EGTM) status report. Ann. Oncol. 21, 441–447. https://doi.org/10.1093/annonc/mdp332 (2010).
Yee, N. S., Zhang, S., He, H. Z. & Zheng, S. Y. Extracellular vesicles as potential biomarkers for early detection and diagnosis of pancreatic cancer. Biomedicines 8 https://doi.org/10.3390/biomedicines8120581 (2020).
Yu, S. et al. Plasma extracellular vesicle long RNA profiling identifies a diagnostic signature for the detection of pancreatic ductal adenocarcinoma. Gut 69, 540–550. https://doi.org/10.1136/gutjnl-2019-318860 (2020).
Grunvald, M. W., Jacobson, R. A., Kuzel, T. M., Pappas, S. G. & Masood, A. Current status of circulating tumor DNA liquid biopsy in pancreatic cancer. Int. J. Mol. Sci. 21 https://doi.org/10.3390/ijms21207651 (2020).
Mellby, L. D. et al. Serum biomarker signature-based liquid biopsy for diagnosis of early-stage pancreatic cancer. J. Clin. Oncol. 36, 2887–2894. https://doi.org/10.1200/jco.2017.77.6658 (2018).
Daoud, A. Z., Mulholland, E. J., Cole, G. & McCarthy, H. O. MicroRNAs in pancreatic cancer: biomarkers, prognostic, and therapeutic modulators. BMC Cancer. 19, 1130. https://doi.org/10.1186/s12885-019-6284-y (2019).
Baraniskin, A. et al. Circulating U2 small nuclear RNA fragments as a novel diagnostic biomarker for pancreatic and colorectal adenocarcinoma. Int. J. Cancer. 132, E48–57. https://doi.org/10.1002/ijc.27791 (2013).
Previdi, M. C., Carotenuto, P., Zito, D., Pandolfo, R. & Braconi, C. Noncoding RNAs as novel biomarkers in pancreatic cancer: what do we know? Future Oncol. 13, 443–453. https://doi.org/10.2217/fon-2016-0253 (2017).
Swietlik, J. J. et al. Cell-selective proteomics segregates pancreatic cancer subtypes by extracellular proteins in tumors and circulation. Nat. Commun. 14, 2642. https://doi.org/10.1038/s41467-023-38171-8 (2023).
Perazzoli, G. et al. Evaluating Metabolite-Based biomarkers for early diagnosis of pancreatic cancer: A systematic review. Metabolites 13 https://doi.org/10.3390/metabo13070872 (2023).
Suenaga, M. et al. Pancreatic juice mutation concentrations can help predict the grade of dysplasia in patients undergoing pancreatic surveillance. Clin. Cancer Res. 24, 2963–2974. https://doi.org/10.1158/1078-0432.Ccr-17-2463 (2018).
Neal, R. D. et al. Cell-free DNA-based multi-cancer early detection test in an asymptomatic screening population (NHS-Galleri): design of a pragmatic, prospective randomised controlled trial. Cancers (Basel). 14. https://doi.org/10.3390/cancers14194818 (2022).
Nadauld, L. D. et al. The PATHFINDER study: assessment of the implementation of an investigational multi-cancer early detection test into clinical practice. Cancers (Basel). 13. https://doi.org/10.3390/cancers13143501 (2021).
Huerta, M. et al. Circulating tumor DNA detection by digital-droplet PCR in pancreatic ductal adenocarcinoma: A systematic review. Cancers. 13. https://doi.org/10.3390/cancers13050994 (2021).
Ramalhete, L., Vigia, E., Araújo, R. & Marques, H. P. Proteomics-driven biomarkers in pancreatic cancer. Proteomes 11 https://doi.org/10.3390/proteomes11030024 (2023).
De Oliveira, G. et al. An integrated meta-analysis of secretome and proteome identify potential biomarkers of pancreatic ductal adenocarcinoma. Cancers. 12 https://doi.org/10.3390/cancers12030716 (2020).
Cohen, J. D. et al. Detection and localization of surgically resectable cancers with a multi-analyte blood test. Science 359, 926–930. https://doi.org/10.1126/science.aar3247 (2018).
Athanasiou, A. et al. Biomarker discovery for early detection of pancreatic ductal adenocarcinoma (PDAC) using multiplex proteomics technology. J. Proteome Res. 24, 315–322. https://doi.org/10.1021/acs.jproteome.4c00752 (2025).
Palma, N. A. et al. A high performing biomarker signature for detecting early-stage pancreatic ductal adenocarcinoma in high-risk individuals. Cancers (Basel). 17 https://doi.org/10.3390/cancers17111866 (2025).
Capello, M. et al. Sequential validation of blood-based protein biomarker candidates for early-stage pancreatic cancer. J. Natl. Cancer Inst. 109 https://doi.org/10.1093/jnci/djw266 (2017).
Cohen, J. D. et al. Combined circulating tumor DNA and protein biomarker-based liquid biopsy for the earlier detection of pancreatic cancers. Proc. Natl. Acad. Sci. U S A. 114, 10202–10207. https://doi.org/10.1073/pnas.1704961114 (2017).
Kim, J. et al. Detection of early pancreatic ductal adenocarcinoma with thrombospondin-2 and CA19-9 blood markers. Sci. Transl Med. 9 https://doi.org/10.1126/scitranslmed.aah5583 (2017).
Byeon, S. et al. Novel serum protein biomarker panel for early diagnosis of pancreatic cancer. Int. J. Cancer. 155, 365–371. https://doi.org/10.1002/ijc.34928 (2024).
Wischhusen, J., Melero, I. & Fridman, W. H. Growth/differentiation factor-15 (GDF-15): from biomarker to novel targetable immune checkpoint. Front. Immunol. 11, 951. https://doi.org/10.3389/fimmu.2020.00951 (2020).
Johnen, H. et al. Tumor-induced anorexia and weight loss are mediated by the TGF-beta superfamily cytokine MIC-1. Nat. Med. 13, 1333–1340. https://doi.org/10.1038/nm1677 (2007).
Böttner, M., Suter-Crazzolara, C., Schober, A. & Unsicker, K. Expression of a novel member of the TGF-β superfamily, growth/differentiation factor-15/macrophage-inhibiting cytokine-1 (GDF-15/MIC-1) in adult rat tissues. Cell Tissue Res. 297, 103–110. https://doi.org/10.1007/s004410051337 (1999).
Koopmann, J. et al. Serum macrophage inhibitory cytokine 1 as a marker of pancreatic and other periampullary cancers. Clin. Cancer Res. 10, 2386–2392. https://doi.org/10.1158/1078-0432.ccr-03-0165 (2004).
Chen, Y. Z. et al. Diagnostic performance of serum macrophage inhibitory cytokine-1 in pancreatic cancer: a meta-analysis and meta-regression analysis. DNA Cell. Biol. 33, 370–377. https://doi.org/10.1089/dna.2013.2237 (2014).
Wang, X. et al. Macrophage inhibitory cytokine 1 (MIC-1/GDF15) as a novel diagnostic serum biomarker in pancreatic ductal adenocarcinoma. BMC Cancer. 14, 578. https://doi.org/10.1186/1471-2407-14-578 (2014).
Hogendorf, P. et al. Growth differentiation factor (GDF-15) concentration combined with Ca125 levels in serum is superior to commonly used cancer biomarkers in differentiation of pancreatic mass. Cancer Biomark. 21, 505–511. https://doi.org/10.3233/cbm-170203 (2018).
Ratnam, N. et al. NF-κB regulates GDF-15 to suppress macrophage surveillance during early tumor development. J. Clin. Invest. 127, 3796–3809. https://doi.org/10.1172/JCI91561 (2017).
Ratnam, N. M. et al. NF-κB regulates GDF-15 to suppress macrophage surveillance during early tumor development. J. Clin. Invest. 127, 3796–3809. https://doi.org/10.1172/jci91561 (2017).
O’Neill, R. S., Emmanuel, S., Williams, D. & Stoita, A. Macrophage inhibitory cytokine-1/growth differentiation factor-15 in premalignant and neoplastic tumours in a high-risk pancreatic cancer cohort. World J. Gastroenterol. 26, 1660–1673. https://doi.org/10.3748/wjg.v26.i14.1660 (2020).
Kaur, S. et al. Potentials of plasma NGAL and MIC-1 as biomarker(s) in the diagnosis of lethal pancreatic cancer. PLoS One. 8, e55171. https://doi.org/10.1371/journal.pone.0055171 (2013).
Mohamed, A. A. et al. Evaluation of circulating ADH and MIC-1 as diagnostic markers in Egyptian patients with pancreatic cancer. Pancreatology 15, 34–39. https://doi.org/10.1016/j.pan.2014.10.008 (2015).
Yang, Y., Yan, S., Tian, H. & Bao, Y. Macrophage inhibitory cytokine-1 versus carbohydrate antigen 19-9 as a biomarker for diagnosis of pancreatic cancer: A PRISMA-compliant meta-analysis of diagnostic accuracy studies. Med. (Baltim). 97, e9994. https://doi.org/10.1097/md.0000000000009994 (2018).
Misek, D. E., Patwa, T. H., Lubman, D. M. & Simeone, D. M. Early detection and biomarkers in pancreatic cancer. J. Natl. Compr. Canc Netw. 5, 1034–1041. https://doi.org/10.6004/jnccn.2007.0086 (2007).
Koopmann, J. et al. Serum markers in patients with resectable pancreatic adenocarcinoma: macrophage inhibitory cytokine 1 versus CA19-9. Clin. Cancer Res. 12, 442–446. https://doi.org/10.1158/1078-0432.Ccr-05-0564 (2006).
Mahmood, N., Mihalcioiu, C. & Rabbani, S. Multifaceted role of the urokinase-type plasminogen activator (uPA) and its receptor (uPAR): diagnostic, prognostic, and therapeutic applications. Front. Oncol. 8 https://doi.org/10.3389/fonc.2018.00024 (2018).
Kjaergaard, M., Hansen, L. V., Jacobsen, B., Gardsvoll, H. & Ploug, M. Structure and ligand interactions of the urokinase receptor (uPAR). Front. Biosci. 13, 5441–5461. https://doi.org/10.2741/3092 (2008).
Koblinski, J. E., Ahram, M. & Sloane, B. F. Unraveling the role of proteases in cancer. Clin. Chim. Acta. 291, 113–135. https://doi.org/10.1016/s0009-8981(99)00224-7 (2000).
Martin, C. E. & List, K. Cell surface-anchored serine proteases in cancer progression and metastasis. Cancer Metastasis Rev. 38, 357–387. https://doi.org/10.1007/s10555-019-09811-7 (2019).
Hamada, M. et al. Urokinase-type plasminogen activator receptor (uPAR) in inflammation and disease: A unique inflammatory pathway activator. Biomedicines 12 https://doi.org/10.3390/biomedicines12061167 (2024).
Harvey, S. R. et al. Evaluation of urinary plasminogen activator, its receptor, matrix metalloproteinase-9, and von Willebrand factor in pancreatic cancer. Clin. Cancer Res. 9, 4935–4943 (2003).
Nielsen, T. O. et al. Expression of the insulin-like growth factor I receptor and urokinase plasminogen activator in breast cancer is associated with poor survival: potential for intervention with 17-allylamino geldanamycin. Cancer Res. 64, 286–291. https://doi.org/10.1158/0008-5472.can-03-1242 (2004).
Look, M. P. et al. Pooled analysis of prognostic impact of urokinase-type plasminogen activator and its inhibitor PAI-1 in 8377 breast cancer patients. J. Natl. Cancer Inst. 94, 116–128. https://doi.org/10.1093/jnci/94.2.116 (2002).
Märkl, B. et al. Tumour budding, uPA and PAI-1 are associated with aggressive behaviour in colon cancer. J. Surg. Oncol. 102, 235–241. https://doi.org/10.1002/jso.21611 (2010).
Halamkova, J. et al. Clinical relevance of uPA, uPAR, PAI 1 and PAI 2 tissue expression and plasma PAI 1 level in colorectal carcinoma patients. Hepatogastroenterology 58, 1918–1925. https://doi.org/10.5754/hge10232 (2011).
Brungs, D. et al. The urokinase plasminogen activation system in gastroesophageal cancer: A systematic review and meta-analysis. Oncotarget 8, 23099–23109. https://doi.org/10.18632/oncotarget.15485 (2017).
Kaneko, T., Konno, H., Baba, M., Tanaka, T. & Nakamura, S. Urokinase-type plasminogen activator expression correlates with tumor angiogenesis and poor outcome in gastric cancer. Cancer Sci. 94, 43–49. https://doi.org/10.1111/j.1349-7006.2003.tb01350.x (2003).
Gorantla, B., Asuthkar, S., Rao, J. S., Patel, J. & Gondi, C. S. Suppression of the uPAR-uPA system retards angiogenesis, invasion, and in vivo tumor development in pancreatic cancer cells. Mol. Cancer Res. 9, 377–389. https://doi.org/10.1158/1541-7786.Mcr-10-0452 (2011).
Hildenbrand, R. et al. Amplification of the urokinase-type plasminogen activator receptor (uPAR) gene in ductal pancreatic carcinomas identifies a clinically high-risk group. Am. J. Pathol. 174, 2246–2253. https://doi.org/10.2353/ajpath.2009.080785 (2009).
Loosen, S. et al. Soluble urokinase plasminogen activator receptor (suPAR) as a novel biomarker in patients undergoing resection of pancreatic adenocarcinoma. J. Clin. Oncol. https://doi.org/10.1200/JCO.2019.37.4_SUPPL.248 (2018).
Aronen, A. et al. Plasma suPAR may help to distinguish between chronic pancreatitis and pancreatic cancer. Scand. J. Gastroenterol. 56, 81–85. https://doi.org/10.1080/00365521.2020.1849383 (2021).
Loosen, S. H. et al. High baseline soluble urokinase plasminogen activator receptor (suPAR) serum levels indicate adverse outcome after resection of pancreatic adenocarcinoma. Carcinogenesis 40, 947–955. https://doi.org/10.1093/carcin/bgz033 (2019).
Ledesma, D., Symes, S. & Richards, S. Advancements within modern machine learning methodology: impacts and prospects in biomarker discovery. Curr. Med. Chem. 28, 6512–6531. https://doi.org/10.2174/0929867328666210208111821 (2021).
Acknowledgements
This study was conducted with bioresources distributed from Human Bioresource Center of Seoul National University Bundang Hospital (DT-2019-005) and from the regional university hospital–based biobank in Daegu, Republic of Korea (21061501-17-01).
Funding
This work was supported by the Seoul National University Bundang Hospital Research Fund [grant numbers 14-2021-0042, 06-2020-0113].
Author information
Contributions
Dong Woo Shin: conceptualization, software, validation, formal analysis, investigation, writing – original draft, visualization. Je-Yoel Cho: methodology, writing – review & editing. Sukki Cho: resources, investigation. Yuna Youn: validation, investigation, data curation. Jin-Hyeok Hwang: conceptualization, methodology, resources, supervision, project administration, writing – review & editing, funding acquisition. All authors read and approved the final manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Shin, D.W., Cho, JY., Cho, S. et al. Development of a serum protein biomarker panel for the diagnosis of pancreatic ductal adenocarcinoma using a machine learning approach. Sci Rep 15, 35659 (2025). https://doi.org/10.1038/s41598-025-19631-1
DOI: https://doi.org/10.1038/s41598-025-19631-1