Fig. 6: Assessment of the target-decoy separation and feature importance.

A Ridge plot showing the density of SHAP impact contribution scores (see “Methods”) for each of the five ion features used for METASPACE-ML model training, across all testing datasets. Features are shown on the y-axis and SHAP contribution scores on the x-axis. Quartile lines are drawn for each ridgeline, and colors mark the area under each ridgeline for each of the four quartiles. B Density heatmaps showing the distribution of SHAP impact contribution scores across datasets for each context in the animal-based testing datasets, faceted by each of the five features. Each column represents a context, which is described by its constituent metadata as bars colored by the classes of each metadata variable. The y-axis shows the SHAP impact contribution scores, and the color gradient represents their density. Columns are hierarchically clustered using a distance metric based on the Kolmogorov-Smirnov statistic. C UMAP of target and decoy ions for one of the brain datasets used for the LC-MS bulk validation (https://metaspace2020.eu/dataset/2016-09-22_11h16m09s). Each dot represents an ion and is colored by whether it is a target or a decoy (top left), by its MSM score (top right), and by its METASPACE-ML score (bottom left). D ROC curves for the same dataset as in (C), showing sensitivity against false positive rate (FPR) for the METASPACE-ML and MSM scores as well as for each of the five constituent ion features. Curves are colored by the score they correspond to.
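
The column ordering in panel B rests on a Kolmogorov-Smirnov-based distance between per-context SHAP score distributions. The caption does not say how that distance is built, so the following is only a minimal sketch, assuming the two-sample KS statistic itself is used as the pairwise distance. The function names (`ks_statistic`, `ks_distance_matrix`), the context labels, and the sample values are hypothetical and serve only as illustration.

```python
from bisect import bisect_right
from itertools import combinations


def ks_statistic(a, b):
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    # Empirical CDFs are step functions, so the largest gap is reached
    # at one of the observed values.
    return max(
        abs(bisect_right(a_sorted, x) / len(a_sorted)
            - bisect_right(b_sorted, x) / len(b_sorted))
        for x in a_sorted + b_sorted
    )


def ks_distance_matrix(samples):
    """Symmetric matrix (nested dict) of pairwise KS statistics."""
    names = list(samples)
    dist = {n: {m: 0.0 for m in names} for n in names}
    for n, m in combinations(names, 2):
        dist[n][m] = dist[m][n] = ks_statistic(samples[n], samples[m])
    return names, dist


# Hypothetical SHAP scores for three contexts (illustrative values only,
# not taken from the figure).
shap_by_context = {
    "context_1": [0.1, 0.2, 0.3, 0.4],
    "context_2": [0.1, 0.2, 0.3, 0.4],
    "context_3": [1.1, 1.2, 1.3, 1.4],
}
names, dist = ks_distance_matrix(shap_by_context)
```

In this toy input, identical distributions are at distance 0 and fully separated ones at distance 1. A matrix like this could then be condensed and passed to a hierarchical clustering routine such as `scipy.cluster.hierarchy.linkage`. The linkage method is not stated in the caption, so that choice would be a further assumption.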