Fig. 5: Evaluation of reliability of annotations predicted by METASPACE-ML.

A 2D binned plot showing the relationship between the increase in the number of annotations relative to MSM (Log10 absolute difference) and MAP scores. Each bin, with a width of 0.5 and length of 0.1, displays the count of animal testing datasets, with a color gradient indicating density. Bins within dotted lines represent datasets with ≤0 Log10 absolute difference scores. B Boxplot of MAP scores for animal testing datasets, grouped by whether METASPACE-ML yielded an equal, higher, or lower number of annotations than MSM. C Boxplot of Log2 fold changes in the number of annotations of METASPACE-ML relative to MSM across all animal testing datasets, for different FDR thresholds. Exact p-values from a two-tailed Wilcoxon rank-sum test in (B) and (C) are shown above each comparison. D Boxplot of reliability scores across all animal testing datasets, for optimal FDR thresholds. In (B–D), the bottom and top edges of each box represent the 25th and 75th percentiles, with the median (50th percentile) shown as a line inside the box. Whiskers extend to the most extreme values within 1.5 times the interquartile range from the quartiles; the overall minimum and maximum values are indicated by the extent of the jittered data points.
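The boxplot convention and the Log2 fold change described in this caption can be illustrated with a minimal sketch. This is not the authors' plotting code: the function names `box_stats` and `log2_fold_change` are hypothetical, the quartiles use linear interpolation between data points (one common convention, which the caption does not specify), and the fold change assumes both annotation counts are positive, since the caption does not say how zero counts are handled.

```python
import math
import statistics


def box_stats(values):
    """Return quartiles and whisker ends for a boxplot with 1.5 x IQR whiskers."""
    data = sorted(float(v) for v in values)
    # Quartiles by linear interpolation (assumed convention).
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    # Whiskers end at the most extreme data points still inside the fences.
    inside = [v for v in data if low_fence <= v <= high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "data_min": data[0],   # shown only by the jittered points
        "data_max": data[-1],  # shown only by the jittered points
    }


def log2_fold_change(n_ml, n_msm):
    """Log2 ratio of annotation counts; assumes both counts are > 0."""
    return math.log2(n_ml / n_msm)


if __name__ == "__main__":
    # Hypothetical values, not data from the figure.
    print(box_stats([1, 2, 3, 4, 5, 100]))
    print(log2_fold_change(200, 100))
```

In this sketch a value beyond the fences (such as 100 in the example) does not stretch the whisker; it would appear only as a jittered point, which is how the caption distinguishes the whisker ends from the overall minimum and maximum.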