Extended Data Fig. 2: UnitedMet’s metabolite level predictions are consistent across 4 ccRCC datasets.
From: UnitedMet harnesses RNA–metabolite covariation to impute metabolite levels in clinical samples

a) Correlation plots of metabolite-level prediction performance, measured as the Spearman correlation between true and predicted ranks, for all pairwise comparisons of the 4 ccRCC datasets. Each dot represents a metabolite. Significance was assessed with a two-sided Spearman correlation test. b) Prediction uncertainty is negatively correlated with prediction accuracy. Prediction uncertainty is estimated as the standard deviation of 1000 posterior draws of metabolite levels. For each metabolite, the standard deviation averaged across all samples is compared with its prediction accuracy, measured as the Spearman rho between true and predicted ranks. Each dot represents a metabolite and is colored by the proportion of censored measurements. Significance was assessed with a two-sided Spearman correlation test. c) Imputation performance for the TNBC dataset, assessed by the Spearman rho between predicted values and their ground truths across all simulated missing metabolites. Well-predicted metabolites, whose predicted ranks correlate significantly and positively with the actual ranks (two-sided FDR-adjusted p < 0.1 and Spearman rho > 0), are labeled in red. d) Predicted metabolite level changes in metabolic subtype C1 (n = 52 patient samples) vs. C2 (n = 119 patient samples) in the TNBC dataset. Top: Box plots showing predicted levels of metabolites in lipid metabolism. Bottom left: Box plots showing predicted levels of metabolites in glutathione metabolism. Bottom right: Box plots showing predicted levels of metabolites in sugar metabolism. P values were calculated with unpaired two-tailed parametric t-tests. In the box plots, the center line represents the median, the bounds of the box indicate the interquartile range (25th to 75th percentiles), and the whiskers extend to 1.5 times the interquartile range. Individual data points are shown as dots.
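The per-metabolite quantities named in panels a to c can be illustrated with a minimal sketch. It is not the authors' code. It shows how a Spearman rho between true and predicted values, a mean posterior standard deviation, and FDR-adjusted p values might be computed. The function names and toy numbers are hypothetical. The legend does not state which FDR procedure was used, so Benjamini-Hochberg is assumed here.

```python
import statistics


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman rho: Pearson correlation of the rank-transformed inputs.

    Undefined (division by zero) if either input is constant.
    """
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = statistics.fmean(rx), statistics.fmean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


def mean_posterior_sd(draws_per_sample):
    """Average, over samples, of the SD across posterior draws (panel b)."""
    return statistics.fmean(statistics.stdev(d) for d in draws_per_sample)


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p values, in input order (assumed method)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for pos, idx in enumerate(order):
        rank = m - pos  # 1-based rank of this p value in ascending order
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Toy usage for one hypothetical metabolite measured in four samples.
true_levels = [0.2, 1.5, 0.9, 2.1]
posterior_draws = [          # three draws per sample here; the legend uses 1000
    [0.1, 0.3, 0.2],
    [1.2, 1.6, 1.4],
    [1.0, 0.7, 1.1],
    [2.4, 1.9, 2.0],
]
predicted = [statistics.fmean(d) for d in posterior_draws]
accuracy = spearman_rho(true_levels, predicted)
uncertainty = mean_posterior_sd(posterior_draws)
print(accuracy, uncertainty)
```

Repeating the last step for every metabolite gives one (uncertainty, accuracy) pair per metabolite, and `spearman_rho` applied to those pairs corresponds to the association plotted in panel b. The p values passed to `benjamini_hochberg` would come from a two-sided Spearman test, which this sketch does not implement.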