Fig. 1: Overview of the LC-MS2Struct workflow.

a, Input to LC-MS2Struct during the application phase. The LC-MS2 experiment results in a set of (MS2, RT) tuples, one per MS feature. The MS information is used to generate a molecular candidate set for each MS feature. b, The output of LC-MS2Struct is a ranked list of molecular candidates for each MS feature. c, A fully connected graph G models the pairwise dependencies between the MS features. Using a set of random spanning trees Tₖ and the SSVM, we predict a max-marginal score for each candidate, which is used for the ranking. d, The MS2 and retention order (RO) information is used to score the nodes and edges of the graph G. e, To train the SSVM models and evaluate LC-MS2Struct, we extract MS2 spectra and RTs from MassBank. We group the MassBank records such that their experimental set-ups match, simulating LC-MS2 experiments. f, Main objective optimized during SSVM training, where yᵢ ∈ Σᵢ is the ground-truth label sequence of example i and y, y′ ∈ Σᵢ are further possible label sequences.
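The inference step of panels c and d can be illustrated with a minimal sketch, assuming node scores come from MS2 matching, edge scores reward candidate pairs whose predicted retention order agrees with the observed RT order, and the max-marginals from several random spanning trees are simply averaged. All function names, the toy scores, the tree sampler (which is not uniform over spanning trees) and the raw averaging are hypothetical illustrations, not the authors' implementation; the SSVM training of panel f is not shown. A brute-force reference is included to check the message passing on small inputs.

```python
import itertools
import random
from functools import lru_cache


def random_spanning_tree(n, rng):
    """Hypothetical sampler: attach each node (in random order) to a random
    earlier node. Gives a spanning tree of the complete graph, NOT a uniform one."""
    order = list(range(n))
    rng.shuffle(order)
    return [(order[rng.randrange(pos)], order[pos]) for pos in range(1, n)]


def tree_max_marginals(node_score, edge_score, edges):
    """Max-sum message passing on a tree.

    node_score[i][a]: score of candidate a for MS feature i.
    edge_score(i, j, a, b): pairwise score (assumed symmetric under swapping
    (i, a) with (j, b)). Returns mm[i][a] = best total score with y_i = a."""
    n = len(node_score)
    nbrs = {i: [] for i in range(n)}
    for i, j in edges:
        nbrs[i].append(j)
        nbrs[j].append(i)

    @lru_cache(maxsize=None)
    def message(j, i):
        # m_{j->i}(a): best score of the subtree behind j, given y_i = a
        return tuple(
            max(node_score[j][b] + edge_score(j, i, b, a)
                + sum(message(k, j)[b] for k in nbrs[j] if k != i)
                for b in range(len(node_score[j])))
            for a in range(len(node_score[i])))

    return [[node_score[i][a] + sum(message(j, i)[a] for j in nbrs[i])
             for a in range(len(node_score[i]))] for i in range(n)]


def brute_force_max_marginals(node_score, edge_score, edges):
    """Reference: enumerate every label sequence (small inputs only)."""
    n = len(node_score)
    mm = [[float("-inf")] * len(c) for c in node_score]
    for y in itertools.product(*(range(len(c)) for c in node_score)):
        s = sum(node_score[i][y[i]] for i in range(n))
        s += sum(edge_score(i, j, y[i], y[j]) for i, j in edges)
        for i in range(n):
            mm[i][y[i]] = max(mm[i][y[i]], s)
    return mm


def averaged_max_marginals(node_score, edge_score, n_trees=8, seed=0):
    """Average the max-marginals over several random spanning trees."""
    rng = random.Random(seed)
    total = [[0.0] * len(c) for c in node_score]
    for _ in range(n_trees):
        tree = random_spanning_tree(len(node_score), rng)
        mm = tree_max_marginals(node_score, edge_score, tree)
        for i, row in enumerate(mm):
            for a, v in enumerate(row):
                total[i][a] += v / n_trees
    return total


# Toy, made-up inputs: 3 MS features with 3 candidates each.
RT = [2.1, 5.4, 7.9]                                       # observed RTs
RO = [[0.3, 0.9, 0.1], [0.5, 0.2, 0.8], [0.7, 0.6, 0.95]]  # predicted RO scores
MS2 = [[0.6, 0.5, 0.1], [0.2, 0.3, 0.4], [0.1, 0.5, 0.4]]  # MS2 match scores


def ro_edge(i, j, a, b):
    """1.0 if candidates a (at i) and b (at j) are ordered like the observed RTs."""
    return 1.0 if (RT[i] - RT[j]) * (RO[i][a] - RO[j][b]) > 0 else 0.0


scores = averaged_max_marginals(MS2, ro_edge, n_trees=8, seed=0)
for i, s in enumerate(scores):
    ranking = sorted(range(len(s)), key=lambda a: -s[a])
    print(f"feature {i}: candidates ranked {ranking}")
```

Averaging over trees is one simple way to combine the per-tree predictions; since every tree is exactly solvable by message passing, the cost grows linearly with the number of trees rather than exponentially with the number of MS features.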