Extended Data Fig. 2: Proteomic features in PDAC and validation of abundance levels of modules using an external independent cohort.
From: Prospective observational study on biomarkers of response in pancreatic ductal adenocarcinoma

(a) Determination of soft-threshold power in WGCNA. Analysis of the scale-free index for various soft-threshold powers (β) and the mean connectivity for various soft-threshold powers. (b) Dendrogram of proteins based on the measurement of dissimilarity and identification of the 32 modules. (c) Evaluation of the stability of the 32 protein modules. For each calculation process, 80% of PDAC tumors were randomly selected for the identification of protein modules, employing a sample construction workflow and determining the soft-threshold power for the initial 32 protein modules. Taking ME01 as an example, the accuracy for ME01 was calculated using the formula [Number of proteins clustered in ME01] / [Total protein number of ME01]. This calculation process was repeated 20 times (n = 20 biologically independent calculations) with different random seeds, consistently yielding median accuracies exceeding 90%. This underscores the robustness and stability of module identification. In the boxplot, a black line within the box marks the median. The bottom and top of the box are located at the 25th and 75th percentiles, respectively. The bars represent values that are more than 1.5 times the interquartile range from the border of each box. (d) Receiver operating characteristic (ROC) curves illustrating the prediction accuracy of the 32 protein modules in ‘RJ-cohort 1’. A random forest algorithm was employed to build the model, with 80% of PDACs used as the training cohort and the remaining 20% as the validation cohort. Area under the curve (AUC) values were calculated for each module separately. (e) ROC curves depicting the prediction accuracy of the 32 modules in the Cao et al. cohort. The model was constructed based on ‘RJ-cohort 1’, and the Cao et al. cohort was used as the validation cohort. (f) Validation of the overlay of the significantly up/down-regulated proteins (left) and genes (right) between tumors and TATs on the weighted correlation network nodes from the Cao et al. cohort.
(g) Spearman’s correlation of module scores calculated using proteomic and transcriptomic data in ‘RJ-cohort 1’. Each dot represents one module, with red dots indicating significant positive correlations (P-value < 0.05, r > 0) between protein and RNA-seq, and blue dots indicating significant negative correlations (P-value < 0.05, r < 0). Two-sided P-values were calculated. (h) Kaplan-Meier survival curves comparing OS between patient subgroups stratified by RNA-seq. P-values were calculated using the log-rank test, and hazard ratios were calculated using univariable Cox regression analysis. (i) Kaplan-Meier survival curves comparing OS between patient RNA-seq subgroups stratified by both adjuvant therapy and the high/low abundance (median cutoff) of ME11. P-values were calculated using the log-rank test, and hazard ratios were calculated using univariable Cox regression analysis.
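
The per-module accuracy described in panel (c) can be illustrated with a minimal Python sketch. It is not the authors' code. The function name `module_accuracy`, the dictionary inputs, and the rule for matching a re-derived module to the original one (the re-derived module holding the most original members) are all assumptions made for illustration; the legend does not state how modules were matched across runs.

```python
from collections import Counter
from statistics import median


def module_accuracy(original, resampled, module):
    """Fraction of the original members of `module` that are still clustered
    together in a re-derived module assignment.

    original, resampled: dicts mapping protein ID -> module label.
    Assumption: the re-derived module containing the most original members
    is treated as the counterpart of `module`.
    """
    members = [p for p, m in original.items() if m == module]
    if not members:
        raise ValueError(f"no proteins assigned to {module!r}")
    # Labels the original members receive in the resampled run; proteins
    # absent from the resampled run count as not recovered.
    labels = Counter(resampled[p] for p in members if p in resampled)
    if not labels:
        return 0.0
    recovered = labels.most_common(1)[0][1]
    return recovered / len(members)


def median_accuracy(original, runs, module):
    """Median accuracy of `module` over repeated subsampling runs."""
    return median(module_accuracy(original, run, module) for run in runs)
```

Here `runs` would hold one assignment per repetition, for example the 20 subsampled clusterings, and `median_accuracy` summarises them in the way the boxplot median does.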