
Fig. 2: Performance distributions of workflows, leave-one-dataset-out cross-validation (LODOCV) test results, and top-ranked workflows.

From: Optimizing differential expression analysis for proteomics data via high-performing rules and ensemble inference


a The performance distribution of workflows for DDA data quantified with FragPipe-based (FG_DDA) workflows. b An example demonstrating the leave-one-dataset-out cross-validation process: the x-axis shows the averaged ranks of FG_DDA workflows (across five metrics) obtained from dataset HYEtims735_LFQ; the y-axis shows the corresponding ranks obtained by benchmarking against the mean performance of the remaining datasets. A Spearman correlation of 0.72 with p-value < 2.2e−16 (two-sided t-test) is obtained (N = 7852). c The distributions of LODOCV results under different quantification settings. In the boxplots, the mean Spearman correlations are marked by red triangles, the center line indicates the median, box limits indicate the upper and lower quartiles, and whiskers indicate 1.5× the interquartile range. The number of points (n) for each boxplot is shown above it. d Kruskal–Wallis (KW) test results assessing whether workflow ranks are sensitive to instrument type. The x-axis lists the top 30 workflows ranked by mean performance; the y-axis shows the log-transformed p-values of the KW tests. Most comparisons are non-significant, suggesting that workflow ranks are not sensitive to instrument type. e The top two workflows for each matrix type under the FG_DDA, MQ_DDA, DIANN_DIA, and spt_DIA settings. The color encodes the matrix type, and the labels encode the selections for DEA, normalization, and MVI. The overall ranks are shown in brackets. Source data for a–e are provided as a Source Data file.
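To make the LODOCV procedure in panels b–c concrete, the following is a minimal sketch (not the authors' code) of how the rank correlation could be computed, assuming a workflows × datasets matrix of performance scores already averaged across the five metrics; the names `perf` and `lodocv_rank_correlations`, the dataset count, and the instrument groupings are hypothetical.

```python
import numpy as np
import pandas as pd
from scipy.stats import spearmanr, rankdata, kruskal

def lodocv_rank_correlations(perf: pd.DataFrame) -> pd.Series:
    """For each held-out dataset (column), correlate workflow ranks on that dataset
    with ranks derived from the mean performance over the remaining datasets."""
    correlations = {}
    for held_out in perf.columns:
        # Ranks of workflows on the held-out dataset alone (higher score = better rank).
        test_ranks = rankdata(-perf[held_out].to_numpy())
        # Ranks from benchmarking on the mean performance of all other datasets.
        train_mean = perf.drop(columns=held_out).mean(axis=1)
        train_ranks = rankdata(-train_mean.to_numpy())
        rho, _ = spearmanr(test_ranks, train_ranks)
        correlations[held_out] = rho
    return pd.Series(correlations, name="spearman_rho")

# Example with random scores for 7852 workflows across 6 hypothetical datasets.
rng = np.random.default_rng(0)
perf = pd.DataFrame(rng.random((7852, 6)),
                    columns=[f"dataset_{i}" for i in range(6)])
print(lodocv_rank_correlations(perf))

# Panel d analogue: for one workflow, a Kruskal-Wallis test on its per-dataset ranks
# grouped by instrument type (three hypothetical groups of datasets).
ranks_by_instrument = [rng.integers(1, 7853, size=5) for _ in range(3)]
stat, p_value = kruskal(*ranks_by_instrument)
print(f"KW statistic = {stat:.2f}, p = {p_value:.3f}")
```

A LODOCV correlation near 1 for a held-out dataset indicates that workflow rankings learned from the other datasets transfer to it, which is the property panels b and c summarize.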
