Fig. 4: McMLP is superior to previous methods in terms of predicting endpoint metabolomic profiles on real data from six dietary intervention studies.
From: Predicting metabolite response to dietary intervention using deep learning

Three computational methods are compared: Random Forest (RF), Gradient Boosting Regressor (GBR), and McMLP. Each method was run either with (“w/ b” label) or without (“w/o b” label) baseline metabolomic profiles as input variables. Each combination of method and input data has the same color in all panels. Error bars (solid black vertical lines) in all panels denote the standard error computed over fifty random train-test splits (n = 50). Methods are compared using three metrics: the mean Spearman Correlation Coefficient (SCC) \(\bar{\rho }\), the fraction of metabolites with SCCs greater than 0.5 (denoted as \({f}_{\rho > 0.5}\)), and the mean SCC of the top-5 predicted metabolites \({\bar{\rho }}_{5}\). a1-a3, Comparison of performance in predicting short-chain fatty acids (SCFAs) on the data from the avocado intervention study (ref. 28). b1-b3, Comparison of performance in predicting bile acids on the data from the avocado intervention study (ref. 28). c1-c3, Comparison of predictive performance on the data from the grain intervention study (ref. 39). d1-d3, Comparison of predictive performance on the data from the walnut intervention study (ref. 27). e1-e3, Comparison of predictive performance on the data from the almond intervention study (ref. 40). f1-f3, Comparison of predictive performance on the data from the broccoli intervention study (ref. 41). g1-g3, Comparison of predictive performance on the data from the high-fiber food or fermented food intervention study (ref. 34). All statistical analyses were performed using the two-sided Wilcoxon signed-rank test. P values obtained from the test are divided into four groups: (1) \(p > 0.05\) (n.s.), (2) \(0.01 < p\le 0.05\) (*), (3) \({10}^{-3} < p\le 0.01\) (**), and (4) \({10}^{-4} < p\le {10}^{-3}\) (***). Raw data points and P values are provided as a Source Data file.
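
As an illustration of how the three metrics and the significance labels in this caption could be computed, the following is a minimal sketch, not the authors' code. The function names, the array layout (samples in rows, metabolites in columns), and the reading of "top-5" as the five metabolites with the highest SCC are assumptions made for this example. The caption defines no group for \(p\le {10}^{-4}\); the sketch labels such values "***" as a further assumption.

```python
import numpy as np
from scipy.stats import spearmanr, wilcoxon


def per_metabolite_scc(y_true, y_pred):
    """SCC between true and predicted values, one per metabolite (column).

    Assumed layout: rows are test samples, columns are metabolites.
    """
    sccs = []
    for j in range(y_true.shape[1]):
        rho, _ = spearmanr(y_true[:, j], y_pred[:, j])
        sccs.append(rho)
    return np.array(sccs)


def summary_metrics(scc, top_k=5, threshold=0.5):
    """Return (mean SCC, fraction of SCCs > threshold, mean of the top_k SCCs)."""
    mean_scc = float(np.mean(scc))
    frac_above = float(np.mean(scc > threshold))
    # Assumption: "top-5" means the five metabolites with the highest SCC.
    top = np.sort(scc)[::-1][:top_k]
    return mean_scc, frac_above, float(np.mean(top))


def significance_label(p):
    """Map a P value to the groups listed in the caption."""
    if p > 0.05:
        return "n.s."
    if p > 0.01:
        return "*"
    if p > 1e-3:
        return "**"
    # Caption group (4) is 1e-4 < p <= 1e-3; smaller values are also
    # labeled "***" here, which is an assumption of this sketch.
    return "***"


def compare_methods(metric_a, metric_b):
    """Two-sided Wilcoxon signed-rank test on paired per-split metric values."""
    _, p = wilcoxon(metric_a, metric_b, alternative="two-sided")
    return p, significance_label(p)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic stand-in data: 30 test samples, 8 metabolites.
    y_true = rng.normal(size=(30, 8))
    y_pred = y_true + rng.normal(scale=0.8, size=(30, 8))
    print(summary_metrics(per_metabolite_scc(y_true, y_pred)))
```

In practice, `summary_metrics` would be evaluated once per train-test split and per method, and `compare_methods` would then be applied to the two resulting sequences of fifty paired values.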