Fig. 7: METASPACE-ML captures low-intensity ions and biologically relevant metabolites.

A Boxplots showing the median intensity distribution for each animal testing dataset, across total and exclusive annotations made by METASPACE-ML or MSM at FDR 10%. Each dot represents a dataset and is colored by the Log10 number of annotations. X-axis labels indicate the annotation approach, with “only” denoting annotations exclusive to that approach. The y-axis shows the median Log10 intensity of all ions per dataset. Significant p-values (p < 0.05) from a two-tailed Wilcoxon rank-sum test are shown above the corresponding comparisons. B Heatmap displaying the median Log10 intensity per context across animal testing datasets. Columns represent contexts described by metadata, with colors indicating their classes; rows show the approaches from (A). The color gradient corresponds to the median Log10 intensity. C Overrepresentation analysis, using a one-tailed Fisher’s exact test performed per dataset, to identify metabolite/lipid classes enriched among ions captured exclusively by METASPACE-ML at FDR 10%. The x-axis shows Log2 fold enrichment and the y-axis shows HMDB metabolite subclasses. Boxplots are colored by parent class, with the number of datasets given in parentheses. Only terms significantly enriched (p < 0.05) in at least 10% of datasets are shown. D Heatmap showing overrepresentation analysis results per context in animal testing datasets. Columns represent contexts described by metadata, with colors indicating their classes, and rows show significantly enriched metabolite classes. The color gradient corresponds to Log2 fold enrichment, and labels indicate the number of datasets per context. In (A) and (C), the bottom and top edges of each box represent the 25th and 75th percentiles, and the line inside the box marks the median (50th percentile). Whiskers extend to the most extreme values lying within 1.5 times the interquartile range of the quartiles; the overall minimum and maximum values are indicated by the extent of the jittered data points.
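The per-dataset overrepresentation test in (C) can be sketched as follows. This is a minimal illustration, not the authors' pipeline. It assumes that the "foreground" is the set of ions annotated only by METASPACE-ML and that this set is a subset of a "background" of all annotated ions in the same dataset. Under that assumption, the one-tailed Fisher's exact test reduces to an upper-tail hypergeometric probability. Fold enrichment is the fraction of foreground ions in a class divided by the corresponding fraction in the background. The function names, the background definition, and the example counts are all hypothetical.

```python
from math import comb, log2


def fisher_one_tailed_greater(k: int, n: int, K: int, N: int) -> float:
    """One-tailed (enrichment) Fisher's exact test p-value.

    k: foreground ions belonging to the class (e.g. METASPACE-ML-only ions)
    n: foreground size
    K: background ions belonging to the class
    N: background size (assumption: the foreground is a subset of it)
    """
    # P(X >= k) for X ~ Hypergeometric(N, K, n), summed with exact integers.
    # math.comb returns 0 when asked to choose more items than exist.
    upper = min(K, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)


def log2_fold_enrichment(k: int, n: int, K: int, N: int) -> float:
    """Log2 of (class fraction in foreground) / (class fraction in background)."""
    return log2((k / n) / (K / N))


def passes_dataset_filter(p_values, alpha=0.05, min_fraction=0.10):
    """Keep a term if it is significant in at least min_fraction of datasets.

    Hypothetical helper mirroring the 'p < 0.05 in >= 10% of datasets' rule;
    p_values holds one p-value per dataset for a single term.
    """
    hits = sum(p < alpha for p in p_values)
    return hits / len(p_values) >= min_fraction


# Hypothetical counts for one dataset and one HMDB subclass.
p = fisher_one_tailed_greater(k=8, n=30, K=20, N=200)
fe = log2_fold_enrichment(k=8, n=30, K=20, N=200)  # log2(8/3), about 1.415
```

Exact integer arithmetic via `math.comb` avoids the underflow that multiplying many small probabilities can cause. In practice, a library routine such as SciPy's `fisher_exact` with `alternative="greater"` computes the same one-sided p-value from the corresponding 2×2 table.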