Metagenomic estimation of dietary intake from human stool

An Author Correction to this article was published on 24 March 2025

This article has been updated

Abstract

Dietary intake is tightly coupled to gut microbiota composition, human metabolism and the incidence of virtually all major chronic diseases. Dietary and nutrient intake are usually assessed using self-reporting methods, including dietary questionnaires and food records, which suffer from reporting biases and require strong compliance from study participants. Here, we present Metagenomic Estimation of Dietary Intake (MEDI): a method for quantifying food-derived DNA in human faecal metagenomes. We show that DNA-containing food components can be reliably detected in stool-derived metagenomic data, even when present at low abundances (more than ten reads). We show how MEDI dietary intake profiles can be converted into detailed metabolic representations of nutrient intake. MEDI identifies the onset of solid food consumption in infants, shows significant agreement with food frequency questionnaire responses in an adult population and shows agreement with food and nutrient intake in two controlled-feeding studies. Finally, we identify specific dietary features associated with metabolic syndrome in a large clinical cohort without dietary records, providing a proof-of-concept for detailed tracking of individual-specific, health-relevant dietary patterns without the need for questionnaires.

Fig. 1: Constructing a metagenomic food database.
Fig. 2: Food genome quantification on simulated ground-truth data.
Fig. 3: MEDI recapitulates data from controlled-feeding studies.
Fig. 4: MEDI food abundances across infants and adults.
Fig. 5: MEDI dietary intake estimates were associated with metabolic health.

Data availability

Data for specific food items are available at https://foodb.ca. Individual matched genomic assemblies can be downloaded from GenBank or the Nucleotide Database and are listed at https://github.com/Gibbons-Lab/medi-paper/blob/main/db/data/manifest.csv. Metagenomic sequencing data for the studied cohorts are available on the NCBI SRA under accession numbers PRJNA473126 (infants), PRJNA398089 (iHMP), PRJEB37249 (METACARDIS), PRJNA947193 (MBD) and PRJNA1198318 (PATH). Source data are provided with this paper.

Code availability

All intermediate data files, metadata and analysis code have been uploaded to GitHub (https://github.com/Gibbons-Lab/medi-paper). The MEDI software package is available on GitHub (https://github.com/Gibbons-Lab/medi).

Acknowledgements

Research reported in this publication was supported by the National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK) of the NIH under award number R01DK133468 (to S.M.G.) and by a Global Grants for Gut Health Award from Nature Portfolio and Yakult (to S.M.G.). This research was funded in part by the Austrian Science Fund (FWF): grant Cluster of Excellence CoE7 (to C.D. and C.M.-E.) and SFB ImmunoMetabolism 10.55776/F8300 (to C.M.-E.). Computational resources for this work were provided by the MedBioNode High-Performance Computing cluster at the Medical University of Graz. H.D.H. acknowledges funding for the PATH study from the Foundation for Food and Agriculture Research (FFAR) New Innovator Award and Hass Avocado Board.

Author information

Contributions

C.D. and S.M.G. conceived of the study. C.D. wrote and tested the software. C.D., K.F. and S.M.G. performed analyses. H.D.H., K.D.C., C.M.-E. and S.M.G. provided datasets and resources. C.D. wrote the initial draft of the paper. C.D. and S.M.G. provided supervision. All authors contributed to writing and revising the paper.

Corresponding authors

Correspondence to Christian Diener or Sean M. Gibbons.

Ethics declarations

Competing interests

The authors report no financial or non-financial competing interests relevant to the work presented in this paper. S.M.G. received funding from a Global Grants for Gut Health Award from Nature Portfolio and Yakult. However, the funders were not involved in conducting the research, drafting the paper or reviewing the work.

Peer review

Peer review information

Nature Metabolism thanks Lars Dragsted and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Primary Handling Editor: Yanina-Yasmin Pesch, in collaboration with the Nature Metabolism team.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 MEDI benchmarks.

(a) Genomic distance (1 - ANI) vs. macronutrient distance (Euclidean, in g/100 g). The blue line denotes a smooth spline regression and the shaded area denotes the 95% confidence interval of the mean spline regression. (b) Benchmark of cached and batched processing using MEDI (6 CPUs per process, see Methods). 888 samples were divided into two batches of 500 and 388 FASTQ files and processed separately in parallel. Each point denotes a single FASTQ file and colors denote the batch. The vertical line denotes the median classification rate. (c) Relationship between (haploid) genome/assembly size and food abundance in the iHMP data set. Only genomes/assemblies with at least 1 million base pairs are shown.
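The two distances compared in panel (a) can be sketched as follows. This is a minimal illustration, not the authors' code: the function names and the example profiles are hypothetical, and ANI is assumed to be given as a fraction between 0 and 1.

```python
import math


def genomic_distance(ani: float) -> float:
    """Genomic distance defined as 1 - ANI (ANI assumed to be a fraction in [0, 1])."""
    if not 0.0 <= ani <= 1.0:
        raise ValueError("ANI must be a fraction between 0 and 1")
    return 1.0 - ani


def macronutrient_distance(a: dict, b: dict) -> float:
    """Euclidean distance between two macronutrient profiles given in g/100 g."""
    # Nutrients missing from one profile are treated as 0 g/100 g (an assumption).
    keys = sorted(set(a) | set(b))
    return math.sqrt(sum((a.get(k, 0.0) - b.get(k, 0.0)) ** 2 for k in keys))


# Hypothetical macronutrient profiles for two food items (g/100 g).
food_a = {"protein": 10.0, "fat": 2.0, "carbohydrate": 20.0}
food_b = {"protein": 7.0, "fat": 6.0, "carbohydrate": 20.0}

print(genomic_distance(0.95))
print(macronutrient_distance(food_a, food_b))
```

Plotting one distance against the other for every pair of food genomes would give a scatter of the kind described in the caption.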

Extended Data Fig. 2 Foods and nutrients in controlled feeding studies.

(a) Food abundances in the MBD cohort by diet group (n = 30). Boxplots show the 25%, 50% and 75% quantiles. The center denotes the median and whiskers extend to the smallest and largest data points within 1.5 interquartile ranges. (b) Correlation between MEDI estimates and ground truth for varying offsets between fecal samples and food diary entries. (c) MEDI predictions of total fiber content from fecal DNA (y-axis) and nutrient consumption of sugars, fibers and grains obtained from food diaries (x-axis) in a controlled-feeding study (PATH), where the dietary intake recorded in the daily food record precedes the stool sample by at least 48 h. Each point denotes a single individual. For the food diaries, points represent means over all measured intake amounts and error bars denote the standard error of the mean (sd/sqrt(n)), normalized to a 100 g portion (all samples within the offset, 38 individuals with 124 food record diary entries). For the MEDI data, the x-coordinate of each point represents a point estimate of intake, obtained by weighting the nutrient profiles of food items by food item relative abundance and assuming a 100 g portion. Blue lines denote regression slopes and gray areas represent 95% confidence intervals. Annotations denote the correlation coefficient (r) and p-value (p) from a Pearson product-moment correlation test.
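The abundance-weighted nutrient estimate and the standard error of the mean mentioned in panel (c) can be sketched as below. This is a hedged illustration only: the function names, the renormalization of relative abundances to sum to 1, and all example values are assumptions rather than details taken from the MEDI software.

```python
import math
import statistics


def weighted_nutrient_intake(rel_abundance: dict, nutrient_per_100g: dict) -> dict:
    """Weight per-food nutrient profiles (per 100 g) by food relative abundance."""
    total = sum(rel_abundance.values())
    if total <= 0:
        raise ValueError("relative abundances must sum to a positive value")
    estimate = {}
    for food, abundance in rel_abundance.items():
        weight = abundance / total  # assumption: renormalize over detected foods
        for nutrient, amount in nutrient_per_100g[food].items():
            estimate[nutrient] = estimate.get(nutrient, 0.0) + weight * amount
    return estimate


def standard_error(values: list) -> float:
    """Standard error of the mean, sd / sqrt(n), using the sample standard deviation."""
    return statistics.stdev(values) / math.sqrt(len(values))


# Hypothetical food relative abundances and fiber contents (g/100 g).
abundances = {"wheat": 0.75, "avocado": 0.25}
profiles = {"wheat": {"fiber": 12.0}, "avocado": {"fiber": 8.0}}

intake = weighted_nutrient_intake(abundances, profiles)
print(intake["fiber"])

# Hypothetical repeated food diary entries for one individual (g/100 g).
print(standard_error([2.0, 4.0, 6.0]))
```

Because every profile is expressed per 100 g, the weighted sum is itself a per-100 g estimate, which matches the 100 g portion assumed in the caption.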

Extended Data Fig. 3 Non-food reads in infant samples.

Relative abundance of bacterial and human reads across infant time series, colored by delivery route. Lines denote smooth spline regressions and shaded areas denote the 95% confidence intervals of the spline regressions.

Extended Data Fig. 4 MEDI dietary intake estimates were associated with metabolic health.

Abundances per 100 g portion for 1703 compounds across a cohort of 533 metabolically healthy and unhealthy individuals from the METACARDIS cohort. Fill colors denote abundance per standard portion (mg/100 g). Column annotations denote metabolic health status from the original METACARDIS cohort (HC - healthy cohort, MMC - IHD metabolically matched cohort, UMMC - untreated metabolically matched cohort). Here, MMC and UMMC denote disease-free but metabolically unhealthy groups. Row annotations denote the monomer mass of the compound (in g/mol).

Extended Data Fig. 5 Curation of FOODB data.

(a) Original energy content (x-axis) vs. energy content calculated by the Atwater method based on macronutrient content (Pearson r = 0.94, two-sided product-moment correlation test p < 2.2e-16). Colors denote detailed unique preparation types in the FOODB. (b) Cholesterol abundances across foods in the FOODB before adjustment.
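The energy calculation in panel (a) can be illustrated with the general Atwater factors (4 kcal/g for protein, 4 kcal/g for carbohydrate, 9 kcal/g for fat). The caption does not state which factor set the authors used, so the constants, the function name and the example values below are assumptions for illustration.

```python
# General Atwater factors in kcal per gram (assumed; the paper may use a refined set).
ATWATER_KCAL_PER_G = {"protein": 4.0, "carbohydrate": 4.0, "fat": 9.0}


def atwater_energy(macros_g_per_100g: dict) -> float:
    """Estimate energy (kcal/100 g) from macronutrient content (g/100 g)."""
    return sum(
        factor * macros_g_per_100g.get(name, 0.0)
        for name, factor in ATWATER_KCAL_PER_G.items()
    )


# Hypothetical food item: 10 g protein, 20 g carbohydrate, 5 g fat per 100 g.
example = {"protein": 10.0, "carbohydrate": 20.0, "fat": 5.0}
print(atwater_energy(example))  # 4*10 + 4*20 + 9*5 = 165.0
```

Comparing such calculated values with the energy content recorded in the database is one way to flag entries whose macronutrient and energy fields disagree.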

Extended Data Fig. 6 Hibiscus associations.

Significant associations between food frequency questionnaires (FFQs) and Hibiscus genus abundance in the iHMP cohort (see Methods, n = 361). Associations were run for all 19 FFQ questions. Circles denote the mean and error bars denote the standard deviation. p[lm] denotes the ANOVA p-value of a regression of log-transformed relative abundances and p[logit] denotes the p-value of a logistic regression of food occurrence against food frequency strata. Axis labels are common across all plots within this panel. Only food groups with a Bonferroni-adjusted p[lm] < 0.05 are shown.
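The Bonferroni adjustment used for filtering can be sketched as follows. The p-values are hypothetical, and treating the number of tests as the number of p-values supplied is an assumption; the caption does not spell out the exact size of the test family.

```python
def bonferroni(p_values: list) -> list:
    """Bonferroni adjustment: multiply each p-value by the number of tests, cap at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Hypothetical raw p-values for three FFQ questions.
raw = [0.001, 0.04, 0.5]
adjusted = bonferroni(raw)

# Keep only associations that remain below 0.05 after adjustment.
significant = [i for i, p in enumerate(adjusted) if p < 0.05]
print(adjusted, significant)
```

With 19 FFQ questions, a raw p-value would need to be below roughly 0.05 / 19 to pass this filter.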

Supplementary information

Reporting Summary

Tables 1 and 2

Table 1. Summary of metabolites in FOODB. Includes source type (nutrient or compound), monomer mass, abundance statistics and the number of foods in which the metabolite was measured. Table 2. Cohort characteristics.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Diener, C., Holscher, H.D., Filek, K. et al. Metagenomic estimation of dietary intake from human stool. Nat Metab 7, 617–630 (2025). https://doi.org/10.1038/s42255-025-01220-1

  • DOI: https://doi.org/10.1038/s42255-025-01220-1
