Fig. 3: McMLP provides better predictive power than previously developed computational methods for predicting endpoint metabolomic profiles on synthetic data generated from microbial consumer-resource models. | Nature Communications


From: Predicting metabolite response to dietary intervention using deep learning

Fig. 3

Three computational methods are compared: Random Forest (RF), Gradient Boosting Regressor (GBR), and McMLP. Each method was run either with (“w/ b” label) or without (“w/o b” label) baseline metabolomic profiles as input variables, and each combination of method and input data is shown in the same color in all panels. Performance is compared using three metrics: the mean Spearman correlation coefficient (SCC) across metabolites, \(\bar{\rho}\); the fraction of metabolites with an SCC greater than 0.5, \(f_{\rho > 0.5}\); and the mean SCC of the top-5 predicted metabolites, \(\bar{\rho}_{5}\). In all panels, error bars (solid black vertical lines) or shaded bands around the means denote the standard error computed over fifty random train-test splits (n = 50). a1-a3, For synthetic data with an intervention dose of 3 and 50 training samples, McMLP performs best on all three metrics, whether or not baseline metabolomic profiles are included. b1-b3, At an intervention dose of 3, the performance of all methods improves and converges as the training sample size increases; including baseline metabolomic profiles further improves prediction. c1-c3, With 200 training samples, the performance gap between including and excluding baseline metabolomic profiles shrinks as the intervention dose increases. All statistical comparisons used the two-sided Wilcoxon signed-rank test. P values are divided into four groups: (1) \(p > 0.05\) (n.s.), (2) \(0.01 < p \le 0.05\) (*), (3) \(10^{-3} < p \le 0.01\) (**), and (4) \(10^{-4} < p \le 10^{-3}\) (***). Source data, including raw data points and p values, are provided as a Source Data file.
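As a rough illustration of how the three metrics, the standard error, and the p-value groups in this caption fit together, here is a minimal pure-Python sketch. It is not the authors' code. It assumes that each metabolite's SCC is computed between predicted and measured endpoint values across the test samples, that the "top-5 predicted metabolites" are the five metabolites with the highest SCCs, and that the standard error uses the sample standard deviation. All function names (`average_ranks`, `spearman`, `caption_metrics`, `standard_error`, `significance_label`) are hypothetical. The Wilcoxon signed-rank test itself is not reimplemented here; one option is to apply `scipy.stats.wilcoxon(x, y, alternative="two-sided")` to the paired per-split scores of two methods.

```python
import math
import statistics


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the average ranks.

    Raises ZeroDivisionError if either input is constant (SCC undefined).
    """
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = statistics.fmean(rx), statistics.fmean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


def caption_metrics(true_by_metabolite, pred_by_metabolite, top_k=5, threshold=0.5):
    """Return (mean SCC, fraction of SCCs > threshold, mean SCC of top_k).

    Each argument is a list with one entry per metabolite; each entry holds
    that metabolite's endpoint values across the test samples.
    ASSUMPTION: "top-k predicted metabolites" means the k metabolites with
    the highest SCCs.
    """
    sccs = [spearman(t, p) for t, p in zip(true_by_metabolite, pred_by_metabolite)]
    mean_scc = statistics.fmean(sccs)
    frac_above = sum(r > threshold for r in sccs) / len(sccs)
    mean_top_k = statistics.fmean(sorted(sccs, reverse=True)[:top_k])
    return mean_scc, frac_above, mean_top_k


def standard_error(split_scores):
    """Standard error of the mean over train-test splits (n = 50 in the figure).

    ASSUMPTION: uses the sample standard deviation (n - 1 denominator).
    """
    return statistics.stdev(split_scores) / math.sqrt(len(split_scores))


def significance_label(p):
    """Map a p value to the four groups defined in the caption."""
    if p > 0.05:
        return "n.s."
    if p > 0.01:
        return "*"
    if p > 1e-3:
        return "**"
    if p > 1e-4:
        return "***"
    # The caption defines no group for p <= 1e-4.
    raise ValueError("p <= 1e-4 is outside the caption's four groups")
```

For example, with three metabolites whose SCCs are 1.0, 0.6 and -1.0, `caption_metrics(..., top_k=2)` returns a mean SCC of about 0.2, a fraction above 0.5 of 2/3, and a top-2 mean SCC of about 0.8. Rather than guessing a label for \(p \le 10^{-4}\), `significance_label` raises an error, because the caption does not define that group.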
