Fig. 2 | Scientific Reports

From: Serum peptide biomarkers by MALDI-TOF MS coupled with machine learning for diagnosis and classification of hepato-pancreato-biliary cancers

Multivariate analysis and classification performance based on 71 selected peptide features in the training and testing sets. (A) PCA score plot of the training set showing partial separation of the cholangiocarcinoma (CCA) and hepatocellular carcinoma (HCC) groups, with clustering of the healthy control, gallbladder cancer (GBC), and pancreatic ductal adenocarcinoma (PDAC) groups. (B) PLS-DA score plot based on the 71 selected peptides, showing a group distribution comparable to that of the PCA. (C) Top 15 peptide features ranked by variable importance in projection (VIP) score from the PLS-DA model. (D) PLS-DA cross-validation results showing optimal model performance (R² = 0.564, Q² = 0.502), with permutation testing (N = 2,000) indicating no overfitting in the training set. (E) Heatmap of average peptide expression levels across groups, with peptides ordered by increasing mass. (F) Random forest (RF) classification results in the training set, with an out-of-bag (OOB) error rate of 2.2%. (G,H) PCA and PLS-DA score plots of the independent testing set, showing clustering patterns consistent with those of the training set. (I) Top 15 VIP-ranked peptides in the testing set, 13 of which overlap with those identified in the training set. (J) Heatmap of average peptide expression levels across groups in the testing set, showing consistent expression trends. (K) RF classification performance in the testing set, with an OOB error rate of 3.5%.
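Panels C and I rank peptides by VIP score from the PLS-DA model. The caption does not state the VIP formula or the software used, so the sketch below assumes the commonly used definition VIP_j = sqrt(p · Σ_a SS_a (w_ja / ‖w_a‖)² / Σ_a SS_a), where p is the number of features, w_a is the X-weight vector of component a, and SS_a = ‖q_a‖² · (t_aᵀ t_a) is the Y variance explained by that component. The function names vip_scores and top_k are hypothetical, and the input matrices are assumed to come from an already fitted PLS-DA model (for example, the x_weights_, x_scores_ and y_loadings_ attributes of scikit-learn's PLSRegression). This is an illustrative sketch, not the authors' pipeline.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def vip_scores(W: Matrix, T: Matrix, Q: Matrix) -> List[float]:
    """VIP scores for a fitted PLS(-DA) model (assumed standard definition).

    W: X-weights, p features x A components
    T: X-scores, n samples x A components
    Q: Y-loadings, m response columns x A components
       (for PLS-DA, m = number of dummy-coded class columns)
    """
    p = len(W)
    A = len(W[0])
    # Y sum of squares explained by each component: SS_a = ||q_a||^2 * (t_a . t_a)
    ss = []
    for a in range(A):
        t_norm2 = sum(row[a] ** 2 for row in T)
        q_norm2 = sum(row[a] ** 2 for row in Q)
        ss.append(q_norm2 * t_norm2)
    total_ss = sum(ss)
    # Euclidean norm of each weight column, used to normalise w_ja
    w_norms = [math.sqrt(sum(W[j][a] ** 2 for j in range(p))) for a in range(A)]
    vip = []
    for j in range(p):
        acc = sum(ss[a] * (W[j][a] / w_norms[a]) ** 2 for a in range(A))
        vip.append(math.sqrt(p * acc / total_ss))
    return vip


def top_k(names: Sequence[str], scores: Sequence[float], k: int) -> List[str]:
    """Return the k feature names with the highest scores, best first."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [names[i] for i in order[:k]]


# Toy example with 3 hypothetical peptides and 2 PLS components
W = [[0.8, 0.1], [0.6, 0.0], [0.0, 1.0]]
T = [[1.0, 0.5], [-1.0, -0.5], [2.0, 0.0]]
Q = [[0.5, 0.2], [-0.5, -0.2]]
vip = vip_scores(W, T, Q)
ranked = top_k(["pep_a", "pep_b", "pep_c"], vip, 2)
```

Under this definition the squared VIP scores always sum to p (their mean is 1), which is why a VIP cutoff near 1 is a common, though not universal, selection heuristic. The overlap reported in panel I would correspond to len(set(train_top15) & set(test_top15)) for two such ranked lists.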

Back to article page