Introduction

Precision nutrition aims to provide personalized dietary recommendations based on an individual’s unique biological and lifestyle characteristics such as genetics, gut microbiota, metabolomic profiles, and anthropometric data1,2. In addition to the design and implementation of large-scale clinical studies, one of the critical components for achieving precision nutrition is the development of predictive models that incorporate diverse individual data types to achieve an accurate prediction of metabolomic profiles following dietary changes1,2,3. However, existing models are limited to traditional machine learning methods such as Random Forest (RF)4,5 and Gradient-Boosting Regressor (GBR)3. Deep learning techniques have not been leveraged to predict metabolite responses for precision nutrition.

Among the biological characteristics relevant for precision nutrition, the gut microbiota is an important factor that explains a large fraction of individual metabolite responses among populations4,5,6,7. Indeed, the human gut microbiota produces many metabolites through the microbial metabolism of nondigested food components such as dietary fibers, which are prevalent in grains, vegetables and fruits8. Therefore, microbiota-derived metabolites are important mediators of host health9,10,11,12,13. For example, short-chain fatty acids (SCFAs) are metabolites produced by intestinal microbes through anaerobic fermentation of indigestible polysaccharides such as dietary fiber and resistant starch9,10. SCFA concentrations have been linked to regulation of immune cell function14,15, gut-brain communication16, and cardiovascular diseases17,18. Among the SCFAs, butyrate has been shown to be negatively correlated with pro-inflammatory cytokines19,20. Hence, a high level of butyrate from the gut microbiota is believed to be beneficial due to its anti-inflammatory effects19,20,21. Boosting the levels of health-beneficial metabolites by modulating the gut microbiota appears to be a promising approach to improve host health22,23,24.

One possible way to modulate the gut microbiota is through dietary interventions6. Gut microbial composition is affected by the diet6,25,26,27,28. As a result, microbiota‐targeted dietary interventions have been proposed to modulate the gut microbiota to increase the production of metabolites beneficial to the host29,30,31. Recently, there has been a growing trend to exploit the tripartite relationship between food/nutrition, gut microbiota, and microbiota-derived metabolites to provide better dietary advice for each individual3,4,5,28,29,30,31,32. Indeed, accurate prediction of personalized metabolite responses to foods and nutrients based on the gut microbiota holds great promise for precision nutrition33.

Many dietary intervention studies have attempted to investigate the relationship between diet and microbial metabolism of the gut microbiota27,28,32,34. However, most of these studies only analyzed correlations between dietary treatments, microbes, or metabolites. A few studies have used different analytic approaches to predict postprandial responses of metabolite markers such as blood glucose3,4 and immune markers4,34. However, the personalized prediction of how important markers such as SCFAs and bile acids respond to long-term dietary interventions is under-investigated (Fig. 1).

Fig. 1: A typical dietary intervention study design.

Before the dietary intervention, the baseline gut microbial compositions and metabolomic profiles (of either fecal samples or blood samples) are measured. During the dietary intervention, one or a few dietary resources are introduced (represented here by avocado) in addition to the baseline diet. The task we intend to solve is to predict personalized metabolite responses after dietary intervention based on the baseline gut microbial compositions, baseline metabolomic profiles, and the dietary intervention strategy.

Our aim is to predict the post-dietary intervention (or “endpoint”) metabolite concentrations in fecal or blood samples based on the pre-dietary intervention (or “baseline”) microbial composition, metabolome data, and the dietary intervention strategy. This is conceptually different from existing studies on the inference of metabolomic profiles from microbial compositions measured at the same time35,36,37,38. Herein, we leveraged data from randomized, controlled dietary intervention studies27,28,34,39,40,41 and developed a deep-learning method: Metabolite response predictor using coupled Multilayer Perceptrons (McMLP) to predict endpoint metabolite concentrations based on baseline microbial compositions. We first generated synthetic data based on a microbial consumer-resource model that simulates the dietary intervention process and found that McMLP outperforms existing methods (RF and GBR), especially when the training sample size is small. We then applied all methods to real data from six dietary intervention studies27,28,34,39,40,41, finding that the predictive power of McMLP is higher than existing methods. Finally, based on the well-trained McMLP, we performed the sensitivity analysis to infer the tripartite food-microbe-metabolite relationship, supported by some literature evidence.

Results

Overview of McMLP

We hypothesized that in order to accurately predict post-dietary intervention metabolomic profiles, we first need to capture how the microbiome composition changes from the baseline to the endpoint. This is because metabolomic profiles reflect the microbial metabolism of a community7,42. To test our hypothesis, we proposed McMLP, which consists of two steps: (step-1) use the baseline microbiota and metabolome data (i.e., concentrations of targeted metabolites) and the dietary intervention strategy to predict the endpoint microbial composition; and (step-2) use the predicted endpoint microbial composition, the baseline metabolome data, and the dietary intervention strategy to predict the endpoint metabolomic profile (Fig. 2a; Supplementary Fig. 1a). For each step, we used a multilayer perceptron (MLP) with Rectified Linear Unit (ReLU) as the activation function to perform the prediction. We emphasize that, in principle, one can use a single MLP to directly predict endpoint metabolomic profiles based on baseline microbiota/metabolome data and the dietary intervention strategy (Supplementary Fig. 1b). Later, we confirmed that this one-step strategy has worse predictive power than our two-step strategy.
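The two-step data flow described above can be illustrated with a minimal sketch. It is not the authors' implementation: the weights are random and untrained, the hidden width is shrunk for speed, the count of six layers is interpreted here as six hidden layers, and all function names (`init_mlp`, `forward`, `mcmlp_predict`) are hypothetical. NumPy is assumed to be available.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_mlp(n_in, n_hidden, n_out, n_layers):
    """Random (untrained) weights for an MLP with `n_layers` hidden layers."""
    sizes = [n_in] + [n_hidden] * n_layers + [n_out]
    return [(rng.normal(0.0, np.sqrt(2.0 / a), size=(a, b)), np.zeros(b))
            for a, b in zip(sizes[:-1], sizes[1:])]

def forward(params, x):
    """ReLU activation on hidden layers, linear output layer."""
    for w, b in params[:-1]:
        x = np.maximum(x @ w + b, 0.0)
    w, b = params[-1]
    return x @ w + b

# Toy dimensions: 5 species, 3 dietary resources, 6 metabolites.
n_s, n_d, n_m = 5, 3, 6
n_hidden, n_layers = 32, 6  # illustrative width, far smaller than 2048

# Step 1: (baseline microbiome, baseline metabolome, diet) -> endpoint microbiome
mlp1 = init_mlp(n_s + n_m + n_d, n_hidden, n_s, n_layers)
# Step 2: (predicted endpoint microbiome, baseline metabolome, diet) -> endpoint metabolome
mlp2 = init_mlp(n_s + n_m + n_d, n_hidden, n_m, n_layers)

def mcmlp_predict(base_microbiome, base_metabolome, diet):
    """Chain the two MLPs: the output of step 1 becomes an input of step 2."""
    x1 = np.concatenate([base_microbiome, base_metabolome, diet], axis=1)
    endpoint_microbiome_hat = forward(mlp1, x1)
    x2 = np.concatenate([endpoint_microbiome_hat, base_metabolome, diet], axis=1)
    return endpoint_microbiome_hat, forward(mlp2, x2)

batch = 2
micro_hat, metab_hat = mcmlp_predict(
    rng.normal(size=(batch, n_s)),                          # CLR-like microbiome values
    rng.normal(size=(batch, n_m)),                          # log-scale metabolite values
    rng.integers(0, 2, size=(batch, n_d)).astype(float),    # binary diet encoding
)
```

The one-step alternative would correspond to a single network mapping the step-1 input directly to the endpoint metabolome, without the intermediate microbiome prediction.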

Fig. 2: The workflow of McMLP (Metabolite response predictor using coupled multilayer perceptrons).

We aim to predict endpoint metabolomic profiles (i.e., metabolomic profiles after the dietary interventions) based on the baseline microbial compositions (i.e., microbial compositions before the dietary intervention), dietary intervention strategy, and baseline metabolomic profiles. Here we used a hypothetical example with n = 5 training samples and 2 samples in the test set. For each sample, we considered \({N}_{{{{\rm{s}}}}}\) microbial species, \({N}_{{{{\rm{d}}}}}\) dietary resources, and \({N}_{{{{\rm{m}}}}}\) metabolites. Across three panels, microbial species and their relative abundances are colored blue, dietary resources and their intervention doses are colored green, and metabolites and their concentrations are colored red. Icons associated with baseline/endpoint data are bounded by solid black/dashed lines respectively. a The model architecture of McMLP. McMLP comprises two coupled MLPs (multilayer perceptrons). The first MLP at the top (step 1) predicts the endpoint microbial compositions based on the baseline data and the dietary intervention strategy. The predicted endpoint microbial compositions from the first MLP are then provided as input to the second MLP at the bottom (step 2). The second MLP combines the predicted endpoint microbial compositions, the dietary intervention strategy, and the baseline metabolomic profiles to finally predict the endpoint metabolomic profiles. The value of dietary intervention strategy is either binary to denote the presence/absence of each dietary resource or numeric to be proportional to the intervention dose. Details of both MLPs can be found in Supplementary Fig. 1 and “Methods”. b McMLP takes two types of baseline data (baseline microbial compositions and baseline metabolomic profiles) and the dietary intervention strategy as input variables and is trained to predict corresponding endpoint metabolomic profiles. During training, the endpoint microbial composition is needed to train the first MLP. 
By contrast, the second MLP directly takes the predicted endpoint microbial composition instead of the actual endpoint microbial composition. c The well-trained McMLP can generate predictions for metabolomic profiles for the test set. During testing, no endpoint microbial composition is needed because the second MLP directly takes the predicted endpoint microbial composition from the first MLP as the input.

From a practical standpoint, our goal is to predict an individual’s metabolite response (i.e., the change in concentrations of the targeted metabolite) to a potential dietary intervention to facilitate precision nutrition. To achieve this goal, we feed the baseline microbiota and metabolome profiles of this individual and the potential dietary intervention strategy to a well-trained McMLP to predict the endpoint metabolome profile. Note that in this application (or test) stage, because the dietary intervention is a thought experiment, no real endpoint data is available. The first MLP in McMLP will predict the endpoint microbiota profile, which will be fed into the second MLP to predict the endpoint metabolome profile.

During the training stage of McMLP, we need to collect not only baseline microbiota and metabolome profiles of different individuals, but also perform dietary interventions to collect actual endpoint microbiota and metabolome profiles. We emphasize that the actual endpoint microbiota data will only be used to train the first MLP (Fig. 2b). It shall not be used to train the second MLP. This is because we need to keep the consistency between the training and test stages. After all, during the application stage, it is the predicted endpoint microbiome profile that will be fed into the second MLP, and the actual endpoint microbiome profile does not exist at all.
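As a rough illustration of this training protocol, the sketch below fits step 1 against the actual endpoint microbiome and then fits step 2 on the step-1 predictions rather than on the actual endpoint microbiome. The least-squares regressor `LinearStandIn` is a hypothetical stand-in for each MLP, and the data are random toy values, so only the data flow (not the numbers) carries meaning.

```python
import numpy as np

rng = np.random.default_rng(1)

class LinearStandIn:
    """Least-squares regressor standing in for one MLP (illustration only)."""

    def fit(self, x, y):
        x1 = np.hstack([x, np.ones((len(x), 1))])  # append intercept column
        self.coef_, *_ = np.linalg.lstsq(x1, y, rcond=None)
        return self

    def predict(self, x):
        return np.hstack([x, np.ones((len(x), 1))]) @ self.coef_

# Toy data: 40 individuals, 5 species, 6 metabolites, 1 binary diet variable.
n, n_s, n_m, n_d = 40, 5, 6, 1
micro0 = rng.normal(size=(n, n_s))                     # baseline microbiome
metab0 = rng.normal(size=(n, n_m))                     # baseline metabolome
diet = rng.integers(0, 2, size=(n, n_d)).astype(float)
micro1 = micro0 + diet + 0.1 * rng.normal(size=(n, n_s))        # actual endpoint microbiome
metab1 = metab0 + (micro1 @ rng.normal(size=(n_s, n_m))) * 0.1  # actual endpoint metabolome

train, test = np.arange(0, 30), np.arange(30, 40)

# Step 1: the actual endpoint microbiome is used only here, as the training target.
x1 = np.hstack([micro0, metab0, diet])
step1 = LinearStandIn().fit(x1[train], micro1[train])

# Step 2: trained on the *predicted* endpoint microbiome, matching what is
# available at application time, when no real endpoint data exist.
micro1_hat = step1.predict(x1)
x2 = np.hstack([micro1_hat, metab0, diet])
step2 = LinearStandIn().fit(x2[train], metab1[train])

# Test stage: only baseline data and the diet encoding are needed.
metab1_hat_test = step2.predict(x2[test])
```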

Instead of fine-tuning hyperparameters such as the number of layers \({N}_{{{{\rm{l}}}}}\) and the hidden layer dimension \({N}_{{{{\rm{h}}}}}\) for each MLP, we overparameterized the MLPs by using a large, fixed number of layers and hidden layer dimension (\({N}_{{{{\rm{l}}}}}=6\) and \({N}_{{{{\rm{h}}}}}=2048\)). Overparameterized machine learning methods, especially deep learning models, often yield better performance owing to their high capacity (i.e., more model parameters). In fact, high-capacity models can yield smoother function approximations, which makes them effectively simpler and thus less likely to overfit43.

To illustrate the prediction task, we used a hypothetical example comprising \({N}_{{{{\rm{s}}}}}(=5)\) microbial species, \({N}_{{{{\rm{d}}}}}(=3)\) dietary resources being intervened, \({N}_{{{{\rm{m}}}}}(=6)\) metabolites, and 7 samples (Fig. 2b, c). We will use both the baseline data and the dietary intervention strategy as inputs for McMLP (Fig. 2a). We used the Centered Log-Ratio (CLR)-transformed microbial relative abundances as the microbial composition and log10 transformed metabolite concentrations as the metabolomic profile. We did not impose the constraint that the predicted relative abundances from the first MLP add up to one. The value of dietary intervention strategy is either binary to denote the presence/absence of each dietary resource or numeric to be proportional to the intervention dose. 5 samples are used as the training set (Fig. 2b) and the remaining 2 samples form the test set (Fig. 2c). To evaluate the regression performance, we employed three metrics based on the Spearman correlation coefficient (SCC) \(\rho\) between the predicted and true values of the concentration of one metabolite across all samples: (1) \(\bar{\rho }\): the mean SCC, (2) \({f}_{\rho > 0.5}\): the fraction of metabolites with \(\rho\) greater than 0.5, and (3) \({\bar{\rho }}_{5}\): the mean SCC of the top-5 best-predicted metabolites.
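The transformations and the three Spearman-based metrics can be sketched in plain Python as follows. This is a minimal sketch under stated assumptions: the pseudocount used to avoid taking the logarithm of zero is an assumption (the text above does not say how zeros are handled), and the helper names (`clr`, `average_ranks`, `spearman`, `summary_metrics`) are hypothetical.

```python
import math

def clr(relative_abundances, pseudocount=1e-6):
    """Centered log-ratio transform of one sample (pseudocount is an assumption)."""
    logs = [math.log(a + pseudocount) for a in relative_abundances]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]

def average_ranks(values):
    """Ranks starting at 1; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman correlation computed as the Pearson correlation of the ranks.

    Raises ZeroDivisionError if either input is constant.
    """
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)

def summary_metrics(true_by_metabolite, pred_by_metabolite, top_k=5):
    """Return (mean SCC, fraction of metabolites with SCC > 0.5, mean of top-k SCCs)."""
    rhos = [spearman(t, p) for t, p in zip(true_by_metabolite, pred_by_metabolite)]
    mean_rho = sum(rhos) / len(rhos)
    frac_above = sum(r > 0.5 for r in rhos) / len(rhos)
    top = sorted(rhos, reverse=True)[:top_k]
    return mean_rho, frac_above, sum(top) / len(top)

# Toy example: two metabolites measured across four test samples.
clr_sample = clr([0.5, 0.3, 0.2])
log_concentration = math.log10(250.0)  # log10 transform of one concentration
true_vals = [[1, 2, 3, 4], [1, 2, 3, 4]]
pred_vals = [[1.1, 1.9, 3.2, 3.8], [4, 3, 2, 1]]
mean_rho, frac_above, top_mean = summary_metrics(true_vals, pred_vals)
```

Each inner list holds one metabolite across all samples, because the correlation is computed per metabolite before the three summaries are taken.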

McMLP generates superior performance over existing methods on synthetic data

To validate the predictive power of McMLP, we applied it to synthetic data generated from the Microbial Consumer-Resource Model (MiCRM) which considers microbial interactions through both nutrient competition and metabolic cross-feeding44. We adapted MiCRM to simulate the dietary intervention. For simplicity, we considered 20 food resources, 20 microbes, and 20 metabolites in the modeling. Also, we assumed that food resources can only be consumed while metabolites can be either consumed or produced. Prior to the dietary intervention, one food resource (referred to as “food resource #1”) was not introduced, while the remaining 19 food resources were supplied. Dietary intervention was simulated by adding food resource #1 at a specific “dose” to microbial communities composed of surviving species before the dietary intervention and calculating the new ecological steady state. Here, the “dose” is defined as the ratio between the amount of the introduced food resource during the dietary intervention and the average amount of other food resources introduced before the dietary intervention. We split the synthetic data (with 250 samples) with 80/20 ratio fifty times to generate fifty train-test pairs that can be used to reflect the variation in predictive performance. Details on model simulation and synthetic data generation can be found in the Supplementary Information.
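The dose definition and the repeated 80/20 splitting can be sketched as below. This is only an illustration of the bookkeeping, assuming independent random shuffles for each repeat; the function names and the seed are hypothetical, and the consumer-resource simulation itself is not reproduced.

```python
import random

def dose(introduced_amount, other_amounts):
    """Ratio of the introduced resource to the mean amount of the other resources."""
    return introduced_amount / (sum(other_amounts) / len(other_amounts))

def random_splits(n_samples, test_fraction=0.2, n_repeats=50, seed=0):
    """Return `n_repeats` (train_indices, test_indices) pairs from random shuffles."""
    rng = random.Random(seed)
    n_test = round(n_samples * test_fraction)
    splits = []
    for _ in range(n_repeats):
        idx = list(range(n_samples))
        rng.shuffle(idx)
        splits.append((sorted(idx[n_test:]), sorted(idx[:n_test])))
    return splits

# 250 synthetic samples, 80/20 split, fifty repeats.
splits = random_splits(250)
train_idx, test_idx = splits[0]

# Food resource #1 introduced at three times the mean of the other 19 resources.
example_dose = dose(3.0, [1.0] * 19)
```

Because the splits are drawn from the same 250 samples, training sets overlap across repeats, so the fifty performance values are not fully independent.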

We compared the performance of McMLP with two classical methods (GBR: Gradient-Boosting Regressor3; RF: Random Forest4,5) in the prediction task defined in Fig. 2. For each method, we considered two sets of input variables: (1) without baseline metabolomic profiles (denoted as “w/o b” hereafter) and (2) with baseline metabolomic profiles (denoted as “w/ b” hereafter).

We first used the three metrics (\(\bar{\rho }\), \({f}_{\rho > 0.5}\), \({\bar{\rho }}_{5}\)) to benchmark the predictive performance of the different methods on synthetic data with 50 training samples and an intervention dose of 3. We found that McMLP generated the best performance (Fig. 3a1–a3), especially when baseline metabolomic profiles were included in the input. When predicting without baseline metabolomic profiles, McMLP is significantly better than RF and GBR (p value < 0.05 for 5/6 comparison cases, Wilcoxon signed-rank test applied; McMLP yields the highest \(\bar{\rho }\) of \(0.391\pm 0.008\), the highest \({f}_{\rho > 0.5}\) of \(0.197\pm 0.018\), and the highest \({\bar{\rho }}_{5}\) of \(0.536\pm 0.007\); the standard error is used to measure the variation in performance metrics across 50 train-test splits). Including baseline metabolomic profiles in the input significantly improves the performance of all methods, with McMLP still being the best (yielding the highest \(\bar{\rho }\) of \(0.595\pm 0.005\), the highest \({f}_{\rho > 0.5}\) of \(0.815\pm 0.014\), and the highest \({\bar{\rho }}_{5}\) of \(0.715\pm 0.006\)). We also introduced 5 food resources during the dietary intervention (instead of 1 previously; see Supplementary Information for details) and found that the performance of McMLP is still superior to other methods when the dietary intervention strategy is more complex (Supplementary Fig. 2).

Fig. 3: McMLP provides better predictive power than previously developed computational methods for predicting endpoint metabolomic profiles on synthetic data generated from microbial consumer-resource models.

Three computational methods are compared: Random Forest (RF), Gradient Boosting Regressor (GBR), and McMLP. For each method, we either included (“w/ b” label) or did not include (“w/o b” label) baseline metabolomic profiles as input variables. Each method with a particular combination of input data is colored the same way in all panels. Standard errors are computed based on fifty random train-test splits and shown in all panels (as solid black vertical lines or transparent areas around their means). To compare different methods, we adopted three metrics: the mean Spearman Correlation Coefficient (SCC) \(\bar{\rho }\), the fraction of metabolites with SCCs greater than 0.5 (denoted as \({f}_{\rho > 0.5}\)), and the mean SCC of the top-5 predicted metabolites \({\bar{\rho }}_{5}\). Error bars denote the standard error (n = 50). a1-a3, For the synthetic data with an intervention dose of 3 and 50 training samples, McMLP provides the best performance for all three metrics regardless of whether the baseline metabolomic profiles are included or not. b1-b3, When the intervention dose is 3, the predictive performance of all methods gets better and closer to each other as the training sample size increases. Including baseline metabolomic profiles also helps to improve the prediction. c1-c3, When 200 training samples are used, the performance gap between including and not including baseline metabolomic profiles shrinks as the intervention dose increases. All statistical analyses were performed using the two-sided Wilcoxon signed-rank test. P values obtained from the test are divided into four groups: (1) \(p > 0.05({{{\rm{n}}}}.{{{\rm{s}}}}.)\), (2) \(0.01 < p\le 0.05(*)\), (3) \({10}^{-3} < p\le 0.01(*\ast )\), and (4) \({10}^{-4} < p\le {10}^{-3}(*\ast*)\). Source data of raw data points and p values are provided as a Source Data file.

We further examined the effect of training sample size on model performance. While maintaining the same 50-sample test set used previously, we found that all performance metrics for all methods improved as the training sample size increased (Fig. 3b1–b3). More importantly, we found that the performance of McMLP is better than RF and GBR at small training sample sizes (20 or 50) and is close to RF and GBR at large training sample sizes (>50). This demonstrates the superior performance of McMLP with a limited number of samples, contrary to the traditional notion that deep learning methods tend to overfit at small sample sizes45.

We also examined the effect of intervention dose on model performance. By varying the concentration of the intervened food resource in MiCRM, we generated synthetic data with different intervention doses and subsequently trained all ML methods on them with 200 training samples. We found that the performance gap between methods using and not using baseline metabolomic profiles narrows as the intervention dose increases (Fig. 3c1–c3). We believe this is because a larger intervention dose significantly changes the endpoint metabolomic profile away from its baseline level, rendering the baseline metabolomic profile less useful.

The benchmarking above used training sets that overlapped across train-test splits. To explore the impact of non-overlapping training data on our benchmarking results, we created one independent synthetic dataset for each training run and used the same separate dataset (with 100 samples) as the test set for performance evaluation across all repeats. Under this new benchmarking protocol, we benchmarked all algorithms and again observed the superior predictive performance of McMLP (Supplementary Fig. 3).

McMLP accurately predicts metabolite responses on real human gut microbiota data

After validating McMLP using synthetic data, we analyzed real data from six dietary intervention studies to see if its performance on real data was consistently better than existing methods. The first dataset we collected was from a study investigating how avocado consumption alters gut microbial compositions and concentrations of fecal metabolites such as SCFAs and bile acids28. In this study all participants were divided into two groups based on the food components of the meals provided: (1) avocado group: 175 g (men) or 140 g (women) of avocado was provided as part of a meal once a day for 12 weeks and (2) control group: no avocado was included in their control meal28. Baseline (i.e., before the dietary intervention) and endpoint (i.e., during week 12 of the intervention) microbial compositions and concentrations of SCFAs and bile acids were quantified. The dataset is unique due to its relatively large sample size (66 for both avocado and control groups)28 compared to other dietary intervention studies27,32,34.

Because the amount of avocado consumed by participants in the avocado group was very similar and participants in the control group barely consumed avocado, for simplicity, we encoded the participant’s dietary intervention in McMLP and other methods as a binary variable in the input (green icons/symbols representing diets in Fig. 2) whose value equals 1 or 0 if the participant is in the avocado or control group, respectively. Note that in this study the concentrations of fecal SCFAs and bile acids were obtained from two separate targeted metabolomic assays. Hence, we separated the concentration prediction of SCFAs and bile acids to compare the predictability of the two metabolite classes. We found that for the concentration prediction of both SCFAs and bile acids, McMLP with the baseline metabolomic profiles consistently produces the best performance (Fig. 4a1-a3, b1-b3). Interestingly, the inclusion of baseline metabolomic profiles in the input of McMLP helps more with the prediction of bile acid concentrations than with the prediction of SCFA concentrations (\(\bar{\rho }\) increases from 0.182 to 0.346 for bile acids when metabolomic profiles are included; \(\bar{\rho }\) increases from 0.260 to 0.262 for SCFAs when metabolomic profiles are included). A potential explanation is that the correlation of SCFA concentrations between baseline and endpoint samples is weaker than that of bile acids (Supplementary Fig. 4).

Fig. 4: McMLP is superior to previous methods in terms of predicting endpoint metabolomic profiles on real data from six dietary intervention studies.

Three computational methods are compared: Random Forest (RF), Gradient Boosting Regressor (GBR), and McMLP. For each method, we either included (“w/ b” label) or did not include (“w/o b” label) baseline metabolomic profiles as input variables. Each method with a particular combination of input data is colored the same in all panels. Standard errors are computed based on fifty random train-test splits and shown in all panels (solid black vertical lines). To compare different methods, we adopted three metrics: the mean Spearman Correlation Coefficient (SCC) \(\bar{\rho }\), the fraction of metabolites with SCCs greater than 0.5 (denoted as \({f}_{\rho > 0.5}\)), and the mean SCC of the top-5 predicted metabolites \({\bar{\rho }}_{5}\). Error bars denote the standard error (n = 50). a1-a3, Comparison of the performance in predicting SCFAs on the data from the avocado intervention study28. b1-b3, Comparison of performance in predicting bile acids on the data from the avocado intervention study28. c1-c3, Comparison of predictive performance on the data from the grain intervention study39. d1-d3, Comparison of predictive performance on the data from the walnut intervention study27. e1-e3, Comparison of predictive performance on the data from the almond intervention study40. f1-f3, Comparison of predictive performance on the data from the broccoli intervention study41. g1-g3, Comparison of predictive performance on the data from the high-fiber food or fermented food intervention study34. All statistical analyses were performed using the two-sided Wilcoxon signed-rank test. P values obtained from the test are divided into four groups: (1) \(p > 0.05\) (n.s.), (2) \(0.01 < p\le 0.05\) (*), (3) \({10}^{-3} < p\le 0.01\) (**), and (4) \({10}^{-4} < p\le {10}^{-3}\) (***). Source data of raw data points and p values are provided as a Source Data file.

We checked the predictive performance of the one-step strategy that uses the same number of layers and nodes as one step in McMLP (\({N}_{{{{\rm{l}}}}}=6\) and \({N}_{{{{\rm{h}}}}}=2048\) in Supplementary Fig. 1b), finding that it is not as good as that of McMLP (Supplementary Fig. 5). It is worth noting that augmenting the one-step approach with additional data types through the two-step McMLP does not automatically guarantee enhanced predictive performance. The utility of the additional data hinges on its relevance and the model’s capacity to utilize it efficiently. Despite these potential uncertainties, we believe the enhanced performance of McMLP could be attributed to its two-step approach. This method allows for an initial capture of the endpoint microbial composition, which is presumably better associated with the endpoint metabolite concentrations. This may also explain why McMLP outperforms RF4,5 and GBR3, which employ a one-step approach and do not leverage the endpoint microbial compositions during method training. We also compared McMLP with mNODE38, the state-of-the-art method for predicting metabolomic profiles from microbial compositions measured at the same time, finding that it performs worse than McMLP (Supplementary Fig. 6). The worse performance of mNODE is likely due to the fact that it is not dedicated to predicting metabolomic profiles at different time points. More technical reasons can be found in the Supplementary Information.

We extended the method comparison to five additional datasets from independent dietary studies investigating how microbiota compositions and fecal metabolite concentrations were influenced by adding grains39, walnuts27, almonds40, broccoli41, and high-fiber or fermented foods34 (the number of fecal microbes and metabolites as well as the types of metabolites are summarized in Table 1; see Methods section for details of the studies). Each participant’s dietary intake was similarly encoded as either a binary variable or a vector whose value is proportional to the consumed amount of the added dietary component, depending on the complexity of the dietary intervention. Further details of the data processing and model architecture setup can be found in the Supplementary Information. As shown in Fig. 4, McMLP consistently produces the best performance across all datasets (p value < 0.05 for 47/84 comparison cases, Wilcoxon signed-rank test applied). The relatively poor performance of all methods on the data from the study that investigated fibers and fermented foods34 is likely due to the fact that a variety of foods within the fiber and fermented foods categories were consumed by the participants at will, while other studies were complete feeding trials34.

Table 1 Summary of key features of dietary intervention studies used in our method comparison. ASVs: Amplicon Sequence Variants

We noticed that the predictive performance of McMLP on real data is worse than that on synthetic data. We believe this discrepancy may be due to the influence of the human host, such as host metabolism46 and health status47. While \(\bar{\rho }\) appears to be low (~0.2 to 0.4), the top-5 best-predicted metabolites for each dataset are highly predictable, likely due to their strong association with the gut microbiome (Supplementary Fig. 7). We also compared the predictive performance of McMLP with that of a simple MLP with one hidden layer and everything else the same as in McMLP, finding that McMLP generates better performance (Supplementary Fig. 8).

We also explored whether incorporating covariates in the metadata can help further improve the predictive performance of McMLP. We only obtained the covariates for the avocado intervention study. For the avocado dataset, we have three covariates: gender, BMI, and age. We included these three covariates as additional variables in McMLP, finding that the incorporation of covariates significantly improves the predictive performance for most cases (Fig. 5). We also analyzed the permutation feature importance of the three covariates by shuffling the values of a covariate in the input and then measuring the reduction in the average Spearman Correlation Coefficients \(\bar{\rho }\). We found that all three covariates are important, except that gender is slightly less important than age when predicting the SCFAs (Supplementary Fig. 9).
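The permutation feature importance procedure can be sketched as follows. This is a simplified illustration: it scores a single output with a tie-free Spearman formula rather than averaging over all metabolites, and `toy_predict` is a hypothetical stand-in for a trained model in which only the first input column matters.

```python
import random

def ranks(values):
    """Ranks of distinct values (ties are not handled in this sketch)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for rank, i in enumerate(order):
        r[i] = rank
    return r

def spearman(x, y):
    """Spearman correlation for tie-free data via the rank-difference formula."""
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d2 / (n * (n * n - 1))

def permutation_importance(predict, rows, targets, column, n_repeats=20, seed=0):
    """Mean drop in Spearman correlation after shuffling one input column."""
    rng = random.Random(seed)
    baseline = spearman(targets, [predict(r) for r in rows])
    drops = []
    for _ in range(n_repeats):
        shuffled = [r[column] for r in rows]
        rng.shuffle(shuffled)
        permuted = [r[:column] + [v] + r[column + 1:] for r, v in zip(rows, shuffled)]
        drops.append(baseline - spearman(targets, [predict(r) for r in permuted]))
    return sum(drops) / len(drops)

def toy_predict(row):
    """Hypothetical model: the output depends on column 0 only."""
    return 2.0 * row[0]

# Toy inputs with two covariate-like columns and 30 samples.
rows = [[float(i), float((7 * i) % 30)] for i in range(30)]
targets = [toy_predict(r) for r in rows]

imp_relevant = permutation_importance(toy_predict, rows, targets, column=0)
imp_irrelevant = permutation_importance(toy_predict, rows, targets, column=1)
```

A column the model ignores yields an importance of exactly zero, while shuffling a column the model relies on lowers the correlation.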

Fig. 5: Including the covariates in metadata (age, BMI, and gender) in the input of McMLP improves it in terms of predicting endpoint metabolomic profiles on real data from the avocado intervention study.

All results are derived from McMLP. We either included (“w/ b” label) or did not include (“w/o b” label) baseline metabolomic profiles as input variables. Each method with a particular combination of input data is colored the same in all panels. Standard errors are computed based on fifty random train-test splits and shown in all panels (solid black vertical lines). To compare different methods, we adopted three metrics: the mean Spearman Correlation Coefficient (SCC) \(\bar{\rho }\), the fraction of metabolites with SCCs greater than 0.5 (denoted as \({f}_{\rho > 0.5}\)), and the mean SCC of the top-5 predicted metabolites \({\bar{\rho }}_{5}\). Error bars denote the standard error (n = 50). a1-a3, Comparison of the performance in predicting SCFAs on the data from the avocado intervention study28. b1-b3, Comparison of performance in predicting bile acids on the data from the avocado intervention study28. All statistical analyses were performed using the two-sided Wilcoxon signed-rank test. P values obtained from the test are divided into four groups: (1) \(p > 0.05\) (n.s.), (2) \(0.01 < p\le 0.05\) (*), (3) \({10}^{-3} < p\le 0.01\) (**), and (4) \({10}^{-4} < p\le {10}^{-3}\) (***). Source data of raw data points and p values are provided as a Source Data file.

We wondered whether the predictive performance of McMLP could be enhanced by using functional profiles generated from whole-metagenome shotgun (WMS) sequencing instead of microbial compositions derived from 16S rRNA gene sequencing. To test this, we leveraged the available WMS sequencing data for a subset of samples in the avocado study. In the end, only 45 individuals had paired baseline-endpoint data. Their functional profiles are represented by 375 pathway features (see Methods section for details). For these 45 individuals, we compared the predictive performance among three different input data types: (1) microbial compositions, (2) functional profiles, and (3) both microbial compositions and functional profiles combined. The three input data types showed no significant difference in performance (Supplementary Figs. 10, 11).

For the avocado dataset, we also grouped the ASV (Amplicon Sequence Variants) compositions from the 16S rRNA gene sequencing and the species-level microbial compositions from the WMS sequencing to the genus level. When analyzing the 16S sequencing data, predictions using the ASV-level compositions are generally more accurate than those using the genus-level compositions (Supplementary Fig. 12). For SCFAs, the predictive performances based on two types of compositions are comparable. Regarding the WMS data, we observed that predictions using the species-level compositions are slightly better than those using the genus-level compositions (Supplementary Fig. 13).

Inferring the tripartite food-microbe-metabolite relationship

It has been previously shown that an individual’s metabolite response depends on her/his gut microbial composition7,42,48. If we want to introduce a new dietary resource to boost the concentration of a health-beneficial metabolite mediated by gut microbes, we need to target “key” microbial species that meet two criteria: (1) the species can consume one or more dietary components in the introduced food resource; (2) the species can increase the metabolite concentration. If either criterion is not met, it is difficult to boost the metabolite concentration via this dietary intervention. Specifically, we identify these “key” species that satisfy both criteria by revealing the food-microbe consumption and microbe-metabolite production patterns, which can be summarized in a tripartite food-microbe-metabolite graph (Supplementary Fig. 14). To achieve this, we performed the sensitivity analysis of McMLP. In particular, we interpreted a potential relationship between an input variable \(x\) and an output variable \(y\) by perturbing \(x\) by a small amount (denoted as \(\Delta x\)) and then measuring the response of \(y\) (denoted as \(\Delta y\)). Following the notion of sensitivity in engineering sciences, we defined sensitivity \(s=\frac{\Delta y}{\Delta x}\) and used its sign (positive/negative) to reflect whether \(y\) changes in the same/opposite direction as \(x\). More technical details of this calculation can be found in the Methods section or in our previous study38.

We calculated sensitivities for step-1 and step-2 in McMLP to infer potential food-microbe consumption and microbe-metabolite production interactions, respectively (Fig. 6a). Specifically, in step-1, we perturbed the amount of food resource \(\alpha\) and measured the change in the relative abundance of species \(i\). The sensitivity of species \(i\) to food resource \(\alpha\) is \({s}_{i\alpha }=\frac{\Delta {y}_{i}}{\Delta {x}_{\alpha }}\), and its sign reflects the interaction between species \(i\) and food resource \(\alpha\). A positive sensitivity, \({s}_{i\alpha } > 0\), indicates that species \(i\) can consume some nutrient components of food resource \(\alpha\). Similarly, for step-2, we define the sensitivity of metabolite \(\beta\) to species \(i\) as \({s}_{\beta i}=\frac{\Delta {y}_{\beta }}{\Delta {x}_{i}}\). A positive sensitivity, \({s}_{\beta i} > 0\), reveals potential production of metabolite \(\beta\) by species \(i\).

Fig. 6: Applying sensitivity analysis of McMLP accurately infers food-microbe consumption interactions and microbe-metabolite production interactions in both synthetic and real data.

a The sensitivity of the relative abundance of species \(i\) to the supplied dietary resource \(\alpha\) is denoted as \({s}_{i\alpha }\). It is defined as the ratio between the change in the relative abundance of species \(i\) (\({\Delta y}_{i}\)) and a small perturbation in the supplied dietary resource \(\alpha\) (\({\Delta x}_{\alpha }\)). Similarly, the sensitivity of the concentration of metabolite \(\beta\) to the relative abundance of species \(i\) is denoted as \({s}_{\beta i}\). It is defined as the ratio between the change in the concentration of metabolite \(\beta\) (\({\Delta y}_{\beta }\)) and the perturbation in the relative abundance of species \(i\) (\({\Delta x}_{i}\)). b The sensitivity values for food-microbe consumption interactions (colored in green) and microbe-metabolite production interactions (colored in red) in the synthetic data. c The ground-truth food-microbe consumption rates (colored in green) and microbe-metabolite production rates (colored in red) in the synthetic data. d The Area Under the Receiver Operating Characteristic (AUROC) curve based on True Positive (TP) rates and False Positive (FP) rates which are obtained by using different sensitivity thresholds to classify interactions. e The sensitivity values for avocado-microbe consumption interactions (colored in green) and microbe-metabolite production interactions (colored in red) for the real data from the avocado intervention study. f The avocado-microbe-butyrate tripartite graph constructed based on the sensitivity values of avocado-microbe consumption interactions and microbe-butyrate production interactions for the real data from the avocado intervention study. The edge width and edge arrow sizes are proportional to the absolute values of the sensitivities. All microbes in the middle layer are arranged from left to right in the increasing order of the incoming edge width multiplied by the outgoing edge width. Source data are provided as a Source Data file.

We first evaluated our sensitivity method on the synthetic data for which we know the ground truth of food-microbe consumption and microbe-metabolite production interactions. We found that the inferred sensitivity values for all food-microbe and microbe-metabolite pairs (Fig. 6b) have a zero-nonzero pattern very similar to the ground-truth consumption and production rates assigned in MiCRM (Fig. 6c). We chose zero as the sensitivity threshold and kept only positive values for food-microbe pairs (green cells in Fig. 6b, c) and for microbe-metabolite pairs (red cells in Fig. 6b, c) to explore consumption and production interactions, respectively. To statistically verify the agreement between ground-truth interactions and inferred interactions based on sensitivity values, we computed the AUROC (Area Under the Receiver Operating Characteristic curve) based on the overlap between true and predicted interactions when the classification threshold is varied. More specifically, for each classification threshold \({s}_{{\mbox{thres}}}\), we predicted the consumption of food resource \(\alpha\) by species \(i\) (or production of metabolite \(\beta\) by species \(i\)) to be true only if \({s}_{i\alpha } > {s}_{{\mbox{thres}}}\) (or \({s}_{\beta i} > {s}_{{\mbox{thres}}}\)). We achieved excellent performance in inferring both food-microbe consumption interactions (green line and dots with AUROC = 0.9 in Fig. 6d) and microbe-metabolite production interactions (red line and dots with AUROC = 0.92 in Fig. 6d).
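The threshold sweep described above can be illustrated with a small, self-contained sketch. It is not the code used in this study: the function names `auroc` and `roc_points` and the toy sensitivity values are hypothetical, and the AUROC is computed through the standard pairwise-comparison (Mann-Whitney) identity rather than by integrating the curve.

```python
def auroc(scores, labels):
    """AUROC via pairwise comparison of true vs. false interactions (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one true and one false interaction")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def roc_points(scores, labels):
    """(FP rate, TP rate) pairs obtained by lowering the threshold step by step."""
    n_pos = sum(1 for y in labels if y)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]
    for thres in sorted(set(scores), reverse=True):
        # ">= thres" at each observed score is equivalent to "> s_thres"
        # with s_thres placed just below that score.
        tp = sum(1 for s, y in zip(scores, labels) if s >= thres and y)
        fp = sum(1 for s, y in zip(scores, labels) if s >= thres and not y)
        points.append((fp / n_neg, tp / n_pos))
    return points


# Made-up example: six species-resource pairs, three of them true consumers.
toy_sensitivities = [0.8, 0.1, -0.2, 0.5, 0.4, 0.3]
toy_ground_truth = [1, 0, 0, 1, 0, 1]
print(auroc(toy_sensitivities, toy_ground_truth))
print(roc_points(toy_sensitivities, toy_ground_truth))
```

In practice, the flattened sensitivity matrix and the binarized ground-truth rate matrix would take the place of the two toy lists.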

We then performed the same inference on real data from the avocado study28. The results are shown in Fig. 6e (inference results for the other studies are provided in the Supplementary Data). These results are in agreement with prior biological knowledge that Faecalibacterium prausnitzii is a stronger producer of butyrate49 than Ruminococcus callidus, and R. callidus is a stronger producer of acetate than F. prausnitzii50,51.

The inference results also enable us to construct the tripartite food-microbe-metabolite graph. For the sake of simplicity, here we visualize the avocado-microbe-butyrate subgraph (Fig. 6f). Note that increased butyrate levels have been shown to be beneficial to host health by enhancing immune status19,20,21. For the avocado-microbe-butyrate subgraph, we focused on the top-20 avocado-microbe consumption and top-20 microbe-butyrate production interactions ranked by their absolute sensitivity values. Only nodes and links associated with these interactions are shown in this subgraph. Widths of individual edges in this figure are proportional to the absolute values of the corresponding sensitivities, and node sizes for microbes are proportional to the products of the widths of the edges connecting each microbe to avocado at the top and butyrate at the bottom of this subgraph. We ordered microbial nodes in the middle layer in the increasing order of node sizes from left to right (Fig. 6f). This organization helps us identify the key species that serve as both strong consumers of avocado and strong producers of butyrate. F. prausnitzii emerged as the most important key species for butyrate production in response to avocado intervention. Our results are consistent with previous studies49. For example, F. prausnitzii levels have been previously shown to be elevated when avocado is supplied by diet52. In a separate study, F. prausnitzii has also been shown to produce butyrate as a metabolic byproduct49.
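The node ordering described above can be sketched as follows. This is an illustrative reconstruction, not the plotting code behind Fig. 6f: the function name `rank_key_species`, the placeholder species labels, and the toy sensitivities are hypothetical. The sketch also assumes that only species appearing in both top-k lists receive a node score, a detail the text does not specify.

```python
def rank_key_species(food_sens, metabolite_sens, top_k=20):
    """Order species by |s_(i,food)| * |s_(metabolite,i)|, smallest first.

    food_sens and metabolite_sens map a species name to its sensitivity value.
    """
    def top(sens):
        ranked = sorted(sens, key=lambda name: abs(sens[name]), reverse=True)
        return set(ranked[:top_k])

    consumers = top(food_sens)        # strongest food-microbe edges
    producers = top(metabolite_sens)  # strongest microbe-metabolite edges
    scores = {
        name: abs(food_sens[name]) * abs(metabolite_sens[name])
        for name in consumers & producers  # assumption: require both edges
    }
    # Increasing order mirrors the left-to-right layout of the middle layer.
    return sorted(scores.items(), key=lambda item: item[1])


# Made-up sensitivities for three placeholder species.
avocado_sens = {"sp_A": 0.9, "sp_B": -0.4, "sp_C": 0.05}
butyrate_sens = {"sp_A": 0.7, "sp_B": 0.2, "sp_C": -0.6}
for name, score in rank_key_species(avocado_sens, butyrate_sens, top_k=3):
    print(name, round(score, 3))
```

The last entry of the returned list corresponds to the rightmost node, that is, the candidate key species.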

Discussion

A highly accurate computational method for predicting metabolite responses based on baseline data and a potential dietary intervention strategy is a prerequisite for precision nutrition. In this paper, we developed a deep learning method, McMLP, which predicts metabolomic profiles after a dietary intervention better than existing methods. We first validated the superior performance of McMLP using synthetic data generated by a microbial consumer-resource model and investigated the influence of diet intervention doses and training sample sizes. We then demonstrated that McMLP produced the most accurate predictions across six different dietary intervention studies27,28,34,39,40,41. We proceeded with a biological interpretation of McMLP results using sensitivity analysis to infer the tripartite food-microbe-metabolite relationship, finding that the inferred relationship was quite accurate in synthetic data. Finally, we demonstrated that our sensitivity analysis applied to real data revealed key species whose metabolic capabilities were consistent with prior biological knowledge.

Currently available dietary intervention studies have many limitations for use in machine learning. First, the sample size (or number of participants) of these studies is typically small, on the order of dozens27,32,34,41. The relatively small sample size fundamentally limits the performance of any predictive model. While the cross-validation that we employed is a widely used method for assessing model robustness and preventing overfitting, its reliability is contingent upon the sample size. It has been well-documented that performance estimates derived from cross-validation can carry a significant degree of uncertainty when applied to small datasets such as dietary intervention with walnuts, almonds, and broccoli53. This uncertainty is attributed to the increased variability in training and validation splits, which can result in overestimated or underestimated model performance. This problem may be mitigated in ongoing large-scale research cohorts with many participants. One such cohort is the All of Us Research Program, which is attempting to build a diverse health database of more than one million people across the U.S. and then use the data to learn how human biology, lifestyle, and environment affect health. As part of this observational cohort, the recently announced Nutrition for Precision Health Study will recruit 10,000 participants to conduct precision dietary interventions54. Second, only a handful of dietary components have ever been explored in dedicated diet-microbiota studies. As a result, the computational approaches can only predict metabolite responses for the limited set of dietary components used in these studies. However, to realize the promise of precision nutrition to provide accurate personalized dietary recommendations, we need a predictive model that can accurately predict metabolite responses for a wide range of dietary components. 
Last, other baseline variables unavailable to us here (e.g., meal composition, age, sex, demographics, and anthropometric data) might help to improve the predictive performance. If such data are available, they can be incorporated into McMLP as extra input variables.

Consistent with most dietary intervention studies in the literature, the data available for this study were primarily 16S rRNA gene sequencing data. We acknowledge that 16S rRNA gene sequencing may restrict our taxonomic resolution to the genus level for certain taxa. Yet, a unique aspect of this work is that the 16S results could be further explored and validated using WMS data that were available for a subset of the avocado study. Our comparison of the predictive performance of McMLP between using the microbial compositions and the functional profiles in the input demonstrates the effectiveness of using the microbial compositions (at the ASV level) derived from the 16S rRNA gene sequencing data. Indeed, McMLP still yields highly promising results for identifying important interactions supported by the literature: (1) Faecalibacterium prausnitzii is a stronger producer of butyrate and (2) Ruminococcus callidus is a stronger producer of acetate. Both Faecalibacterium prausnitzii and Ruminococcus callidus are identified from the 16S data. These results showcase the power of performing the sensitivity analysis on a well-trained McMLP. Looking ahead, we believe the scarcity of WMS data will only be resolved by the emergence of more datasets from dietary intervention studies with paired metabolome and WMS sequencing data. Such datasets would open new opportunities for applying and refining our method. We recognize the inherent complexity of microbe-metabolite interactions and acknowledge that our approach primarily focuses on very simplified pairwise consumption/production interactions between microbes and metabolites. In reality, these interactions can extend beyond pairwise interactions. For example, the serum level of enterolactone, a metabolite produced by some gut microbes upon consumption of dietary fibers, is influenced by dietary fat intake55,56.
Additionally, McMLP currently relies on a limited selection of metabolites obtained from targeted metabolomics. In the future, as dietary intervention studies incorporate more comprehensive lists of metabolites, we anticipate that the prediction power of McMLP will be further improved.

Our McMLP architecture is quite generic—its input variables and their dimensions can be easily adapted to fit more complex datasets. For example, if a particular dietary intervention study documents an extensive list of dietary components, McMLP can be modified to include an input node for each dietary component to reflect the amount and frequency of its consumption. Similarly, the predicted output variables of McMLP need not be limited to metabolomic profiles measured in fecal samples. It can be generalized to predict other variables such as immune biomarkers or metabolite concentrations from blood samples. Moreover, McMLP can be interpreted through sensitivity analysis, revealing numerous interactions supported by existing literature evidence. If McMLP can successfully predict other data types, it might be feasible to infer other types of interactions.

Unlike other machine learning methods that typically require hyperparameter tuning to achieve the best performance for each dataset with a different set of hyperparameters, McMLP consistently outperformed existing machine learning methods across six real datasets even without hyperparameter tuning. We speculate that McMLP exploited the recently observed “double-descent” behavior for the risk curve43, which suggests that an overparametrized deep-learning model (i.e., one with an extremely large number of model parameters) can generate better and more consistent performance than models with less capacity and more carefully tuned hyperparameters. To reach this overparameterized regime, we used a large and fixed number of layers \({N}_{{{{\rm{l}}}}}=6\) and a large hidden layer dimension \({N}_{{{{\rm{h}}}}}=2048\), exceeding both the number of microbial species and the number of metabolites. One benefit of using such a model free of hyperparameter tuning is the shorter training time. Since the typical 5-fold cross-validation used to select the best set of hyperparameters is the most time-consuming part of a typical deep learning workflow, McMLP saves a significant amount of time required for hyperparameter tuning and thus has a shorter training time (~5 min for each run of McMLP on the avocado intervention study28).
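To give a sense of scale for the overparameterized regime, the number of trainable parameters implied by \({N}_{\rm{l}}=6\) and \({N}_{\rm{h}}=2048\) can be counted directly. The sketch below assumes fully connected layers with bias terms and a single input variable encoding the dietary intervention. Both are assumptions made for illustration, and `mlp_param_count` is a hypothetical helper name.

```python
def mlp_param_count(n_in, n_out, n_hidden_layers=6, hidden_dim=2048):
    """Weights plus biases of a fully connected MLP (assumed layout)."""
    dims = [n_in] + [hidden_dim] * n_hidden_layers + [n_out]
    return sum(d_in * d_out + d_out for d_in, d_out in zip(dims[:-1], dims[1:]))


# Avocado study sizes taken from the Methods: 278 ASVs and 6 SCFAs + 21 bile acids.
n_taxa, n_metabolites = 278, 27
n_diet = 1  # assumption: the intervention is encoded as one input variable

n_inputs = n_taxa + n_metabolites + n_diet
step1_params = mlp_param_count(n_inputs, n_taxa)         # predicts endpoint microbes
step2_params = mlp_param_count(n_inputs, n_metabolites)  # predicts endpoint metabolites
print(step1_params, step2_params)
```

Under these assumptions each of the two MLPs has on the order of 2 × 10^7 parameters, several orders of magnitude more than the 132 participants in the avocado study.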

Methods

Ethical compliance statement

In this study, we used publicly available datasets from previous studies, whose study procedures were administered in accordance with the Declaration of Helsinki and were approved by the University of Illinois Institutional Review Board.

Datasets

The datasets utilized herein were generated as part of work on bacterial56 and metabolite57 biomarkers of food intake, which provided anonymized microbial and metabolomic data. The dataset related to the fibers or fermented foods intervention study is available for download in the supplemental material of the original publication34. The main characteristics of the dietary intervention studies used above are summarized in Table 1. Across all studies, fecal or blood samples were collected before and after each dietary intervention period. Gut microbiota composition was determined by the 16S rRNA gene sequencing and metabolomic profiles of either fecal samples or blood serum samples were determined by tandem liquid chromatography-mass spectrometry (LC-MS/MS) and gas chromatography-mass spectrometry (GC-MS) metabolomics. For all machine learning tasks, the same fifty random 80/20 train-test splits were used to ensure a fair comparison of methods. Further details are described below:

Avocado intervention study. This dataset was reported by a dietary intervention study that investigated how avocado consumption altered the relative abundance of gut bacteria and concentrations of microbial metabolites in 132 adults with overweight or obesity28. All participants were assigned to the avocado treatment or no-avocado control group (66 per arm). They consumed isocaloric meals with or without avocado (175 g, men; 140 g, women) once daily for 12 weeks. For fecal samples collected before and after the dietary intervention, 278 ASVs (Amplicon Sequence Variants) were determined by the 16S rRNA gene sequencing and profiles of 6 SCFAs and 21 bile acids were generated by LC-MS/MS metabolomics. Out of 132 individuals, 45 individuals’ fecal samples (collected both before and after the dietary intervention) underwent whole-metagenome shotgun (WMS) sequencing. To obtain the taxonomic profiles, DIAMOND (double index alignment of next-generation sequencing data) v2.0.11.149 was used in conjunction with the National Center for Biotechnology Information (NCBI) non-redundant (nr) protein reference database (June 2021) to align translated DNA query sequences. Each sample’s sequences from the merged and cleaned FASTQ file were aligned against the NCBI-nr database, producing a corresponding output DIAMOND alignment archive file. DIAMOND was set to “sensitive” mode, targeting alignments with >40% identity with an e-value of 0.00001 (ref. 58). To generate the functional profiles, MEGAN (MEtaGenome ANalyzer) v6.12.2 Ultimate Edition was then used to perform functional analysis of the sequence alignments against the KEGG gene database58. For each sample, the sequence alignments produced by DIAMOND in the previous step were matched to a KEGG ortholog (KO) accession, producing a MEGAN file containing the total count of each KO across each sample58. NCBI taxonomy counts were also exported from MEGAN in a similar fashion58. Eventually, 859 species-level microbial taxa and 14,109 KOs were identified. Since the number of KOs (14,109) is too large to be included in our machine-learning algorithms, all KOs were grouped into pathways, yielding 375 pathway features in the functional profiles.

Grains intervention study. This dietary intervention study investigated how grain barley and oat consumption affects gut bacteria relative abundances and concentrations of microbial metabolites in 68 healthy adults39. All participants were randomly assigned to receive one of three treatments: (1) a control diet containing 0.8 daily servings of whole grain/1800 kcal, (2) a diet containing 4.4 daily servings of whole grain barley/1800 kcal or (3) a diet containing 4.4 daily servings of whole grain oats/1800 kcal. Fecal samples were collected before and after the dietary intervention.

Walnut intervention study. This dietary intervention study investigated how walnut consumption affects the gut microbiota and metabolite concentrations in 18 healthy adults27. All participants completed two 3-week treatment/intervention periods separated by a 1-week washout period. Fecal samples were collected before and after the dietary intervention period.

Almond intervention study. This dietary intervention study was conducted in 18 healthy adults40. All participants completed four 3-week treatment periods and one control period separated by a 1-week washout period. Fecal samples were collected before and after the dietary intervention period.

Broccoli intervention study. In this study, 18 healthy adults completed two 18-day treatment periods separated by a 24-day washout period41. Fecal samples were collected before and after the dietary intervention period.

Fibers or fermented foods intervention study. This dietary intervention study was designed to investigate how consumption of plant-based foods rich in dietary fibers or fermented foods alters gut bacteria and their associated metabolites in 36 healthy adults34. All participants were divided into the high-fiber or the high-fermented-foods arm (18 per arm). The entire dietary intervention lasted 17 weeks. Fecal and blood serum samples were collected before and after the dietary intervention period. Gut microbiota composition in fecal samples was determined by the 16S rRNA gene sequencing and metabolomic profiles of serum samples were generated by LC-MS metabolomics.
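The fifty random 80/20 train-test splits shared by all methods can be generated reproducibly by fixing the random seeds. The following is a minimal sketch under stated assumptions: one row per participant, a hypothetical seeding scheme (`base_seed + k`), and an illustrative function name `make_splits`. None of these details are taken from the original analysis code.

```python
import random


def make_splits(n_samples, n_splits=50, test_fraction=0.2, base_seed=0):
    """Return a list of (train_indices, test_indices) pairs, one per split."""
    n_test = max(1, round(n_samples * test_fraction))
    splits = []
    for k in range(n_splits):
        rng = random.Random(base_seed + k)  # independent, reproducible stream
        indices = list(range(n_samples))
        rng.shuffle(indices)
        test = sorted(indices[:n_test])
        train = sorted(indices[n_test:])
        splits.append((train, test))
    return splits


# Example with the 132 participants of the avocado study.
avocado_splits = make_splits(132)
first_train, first_test = avocado_splits[0]
print(len(avocado_splits), len(first_train), len(first_test))
```

Because the index lists are computed once and reused, every method is trained and evaluated on identical partitions, which is what makes the comparison fair.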

McMLP

McMLP consists of two coupled MLPs: (step-1) in the first step (using the MLP at the top in Supplementary Fig. 1a), we predict endpoint microbial compositions based on baseline microbial compositions, baseline metabolomic profiles, and dietary intervention strategy; (step-2) in the second step (using the MLP at the bottom in Supplementary Fig. 1a), we take the predicted endpoint microbial compositions from the first MLP, baseline metabolomic profiles, and dietary intervention strategy to predict endpoint metabolomic profiles.

  • Data processing: The CLR (Centered Log-Ratio) transformation is applied to microbial relative abundances and the log10 transformation is applied to metabolite concentrations.

  • Model detail: Each MLP model (for either the top or the bottom MLP in Supplementary Fig. 1) has 6 hidden layers in the middle, sandwiched by input and output variables. Each hidden layer has a fixed hidden layer dimension of 2048.

  • Training method: The Adam optimizer59 is used for gradient descent. Each dataset is first split 80/20 into training and test sets, and the training set is then further split 80/20 into training and validation sets. Training stops when (1) the Mean Squared Error (MSE) averaged over all metabolites on the training set is less than 0.1 and (2) the mean SCC (Spearman Correlation Coefficient) of annotated metabolites \(\bar{\rho }\) on the validation set starts to decrease within the last 20 epochs.

  • Activation function: ReLU (Rectified Linear Unit).
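The two transformations named under data processing can be sketched in a few lines. The pseudocount added before taking logarithms is an assumption, since the handling of zero abundances is not specified above, and the function names `clr` and `log10_concentrations` are illustrative.

```python
import math


def clr(relative_abundances, pseudocount=1e-6):
    """Centered log-ratio: log(x_i) minus the mean of log(x) over all taxa."""
    # The pseudocount (an assumed choice) keeps log() defined for zero abundances.
    logs = [math.log(x + pseudocount) for x in relative_abundances]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]


def log10_concentrations(concentrations, pseudocount=0.0):
    """log10 transform of metabolite concentrations (inputs must be positive)."""
    return [math.log10(c + pseudocount) for c in concentrations]


# Made-up sample with three taxa and three metabolites.
print(clr([0.5, 0.3, 0.2]))
print(log10_concentrations([1.0, 10.0, 100.0]))
```

By construction, the CLR-transformed values of a sample sum to zero, which removes the unit-sum constraint of relative abundances.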

Inference of food-microbe and microbe-metabolite interactions via sensitivity

The two MLP models in the well-trained McMLP can be interpreted separately. We first interpret the first MLP (step-1) in McMLP for food-microbe consumption interactions by perturbing the amount of food resource \(\alpha\) (\(\Delta {x}_{\alpha }\)) and then measuring the change in the relative abundance of species \(i\) (\(\Delta {y}_{i}\)). Mathematically, for sample \(m\) in the training set, we set the new value of this input variable to zero. As a result, the perturbation amount for this variable in sample \(m\) is \(\Delta {x}_{\alpha }^{(m)}=0-{x}_{\alpha }^{\left(m\right)}=-{x}_{\alpha }^{\left(m\right)}\), where \({x}_{\alpha }^{\left(m\right)}\) is the unperturbed value. We can measure the change in the relative abundance of species \(i\) for sample \(m\) (\(\Delta {y}_{i}^{(m)}\)) and define the sensitivity of species \(i\) to food resource \(\alpha\) for sample \(m\) as \({s}_{i\alpha }^{(m)}=\frac{\Delta {y}_{i}^{(m)}}{\Delta {x}_{\alpha }^{(m)}}\). Finally, we can average sensitivity values across samples to obtain the average sensitivity of species \(i\) to food resource \(\alpha\): \({s}_{i\alpha }=\frac{{\sum}_{m}{s}_{i\alpha }^{(m)}}{{N}_{\rm{train}}}\), where \({N}_{\rm{train}}\) is the number of training samples. Similarly, for the second MLP (step-2) in McMLP, we can define \({s}_{\beta i}^{(m)}=\frac{\Delta {y}_{\beta }^{(m)}}{\Delta {x}_{i}^{(m)}}\) and \({s}_{\beta i}=\frac{{\sum}_{m}{s}_{\beta i}^{(m)}}{{N}_{\rm{train}}}\) to infer microbe-metabolite interactions by perturbing the relative abundance of species \(i\) (\(\Delta {x}_{i}\)) and then measuring the change in concentration of metabolite \(\beta\) (\(\Delta {y}_{\beta }\)).
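The perturb-to-zero procedure can be sketched generically for any trained predictor. In the sketch below, `model` is a stand-in callable (here a toy linear map, not McMLP) and `average_sensitivity` is a hypothetical name. One further assumption is made: samples whose unperturbed input is already zero give \(\Delta x = 0\), so they are skipped and the average runs over the remaining samples, whereas the formula above divides by \({N}_{\rm{train}}\).

```python
def average_sensitivity(model, samples, input_index, output_index):
    """Mean of (delta y / delta x) when one input is set to zero in each sample."""
    values = []
    for x in samples:
        x_original = x[input_index]
        if x_original == 0:
            continue  # delta x = 0: ratio undefined, sample skipped (assumption)
        perturbed = list(x)
        perturbed[input_index] = 0.0
        delta_y = model(perturbed)[output_index] - model(x)[output_index]
        delta_x = 0.0 - x_original
        values.append(delta_y / delta_x)
    return sum(values) / len(values) if values else float("nan")


def toy_model(x):
    """Stand-in for a trained MLP: two inputs, two outputs, purely linear."""
    return [2.0 * x[0] - 3.0 * x[1], 0.5 * x[1]]


toy_samples = [[1.0, 2.0], [3.0, 4.0], [0.0, 5.0]]
print(average_sensitivity(toy_model, toy_samples, input_index=0, output_index=0))
print(average_sensitivity(toy_model, toy_samples, input_index=1, output_index=0))
```

For a linear model the sensitivity reduces to the corresponding coefficient, which makes the toy example easy to check by hand. For a nonlinear MLP the value depends on the samples being averaged.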

Statistics

To calculate correlations throughout the study, we used Spearman’s correlation coefficient. Wherever P values were used, the associated null distributions were computed from scratch. The non-parametric Wilcoxon signed-rank test was employed to evaluate differences in predictive performance among the algorithms. All statistical tests were performed using standard numerical and scientific computing libraries in the Python programming language (version 3.7.1) and Jupyter Notebook (version 6.1).
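For reference, Spearman’s correlation coefficient is the Pearson correlation of the rank-transformed values, with tied values receiving their average rank. The sketch below spells this out using only the Python standard library. It is illustrative rather than the code used in this study, which relied on standard scientific computing libraries, and the function names are hypothetical.

```python
def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2.0 + 1.0
        for position in range(start, end + 1):
            ranks[order[position]] = shared_rank
        start = end + 1
    return ranks


def pearson(a, b):
    """Pearson correlation; returns NaN when either input is constant."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return float("nan")
    return cov / (var_a * var_b) ** 0.5


def spearman(a, b):
    """Spearman correlation = Pearson correlation of the average ranks."""
    return pearson(average_ranks(a), average_ranks(b))


# Made-up predicted vs. measured values for one metabolite.
print(spearman([0.2, 1.5, 0.9, 3.1], [0.1, 1.1, 1.4, 2.8]))
```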

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.