Fig. 1: Workflow demonstrating key aspects of METASPACE-ML, model training, and evaluation.

A The workflow from the user's perspective, covering data acquisition, molecular database selection, and preparation of the target and decoy databases. B Scores are calculated for each ion and used as features for the machine learning model. C The machine learning principle: the features of target and decoy ions are scored by a single decision tree, which is fitted by maximizing the PairLogit loss function. D Training of Gradient Boosting Decision Trees on multiple datasets, and visualization of how metabolite annotations are generated for a given FDR threshold. E The public METASPACE datasets used for training and evaluation, and the measures used for evaluation. Created with Inkscape v1.2 and yEd Graph Editor v3.22.
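
To make the pairwise objective in panel C concrete, the following is a minimal sketch of a PairLogit-style quantity, assuming the usual pairwise logistic form: the log-sigmoid of the score difference for each (target, decoy) pair. The function name `pairlogit`, the unweighted averaging over all pairs, and the example scores are illustrative assumptions, not details taken from the caption or from the METASPACE-ML implementation.

```python
import math


def pairlogit(target_scores, decoy_scores):
    """Mean log-sigmoid of (target - decoy) over all target/decoy pairs.

    Assumption: every target is paired with every decoy and all pairs
    have equal weight. The value is always <= 0 and approaches 0 as
    targets are scored increasingly above decoys, so it is maximized.
    """
    total = 0.0
    n_pairs = 0
    for t in target_scores:
        for d in decoy_scores:
            # log(1 / (1 + exp(-(t - d)))) written as -log1p(exp(-(t - d)))
            total += -math.log1p(math.exp(-(t - d)))
            n_pairs += 1
    return total / n_pairs


# Hypothetical model scores: a better ranking yields a larger objective.
good_ranking = pairlogit([2.0, 1.5], [0.1, -0.3])
poor_ranking = pairlogit([0.1, -0.3], [2.0, 1.5])
```

In this sketch the objective grows toward zero when target ions outscore decoy ions, which is why the caption speaks of maximizing it rather than minimizing it.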