  • Brief Communication

Enhancing untargeted metabolomics using metadata-based source annotation

An Author Correction to this article was published on 18 October 2023

This article has been updated

Abstract

Human untargeted metabolomics studies annotate only ~10% of molecular features. We introduce reference-data-driven (RDD) analysis, which matches metabolomics tandem mass spectrometry (MS/MS) data against metadata-annotated source data used as a pseudo-MS/MS reference library. Applying this approach to food source data, we show that it increases MS/MS spectral usage 5.1-fold over conventional structural MS/MS library matches and allows empirical assessment of dietary patterns from untargeted data.
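The matching step at the core of reference-data-driven (RDD) analysis can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes centroided spectra given as (m/z, intensity) pairs and uses a simplified greedy cosine score with square-root intensity scaling. The thresholds (`min_cos=0.7`, `min_peaks=6`, 0.02 Da tolerances) and the names `Spectrum`, `cosine_score` and `match_to_reference` are hypothetical placeholders. Each reference spectrum carries a source-metadata label (for example a food category) instead of a compound name, which is what turns an ordinary spectral library into a pseudo-MS/MS reference library.

```python
import math
from dataclasses import dataclass


@dataclass
class Spectrum:
    precursor_mz: float
    peaks: list  # list of (m/z, intensity) pairs, assumed centroided


def cosine_score(a, b, tol=0.02):
    """Simplified cosine similarity with greedy one-to-one peak matching.

    Intensities are square-root scaled (an assumption); two peaks can
    pair up only if their m/z values differ by at most `tol`.
    """
    pa = [(mz, math.sqrt(i)) for mz, i in a.peaks]
    pb = [(mz, math.sqrt(i)) for mz, i in b.peaks]
    norm_a = math.sqrt(sum(i * i for _, i in pa))
    norm_b = math.sqrt(sum(i * i for _, i in pb))
    if norm_a == 0 or norm_b == 0:
        return 0.0, 0
    # All candidate peak pairs, strongest intensity product first
    candidates = sorted(
        ((ia * ib, x, y)
         for x, (mz_a, ia) in enumerate(pa)
         for y, (mz_b, ib) in enumerate(pb)
         if abs(mz_a - mz_b) <= tol),
        reverse=True,
    )
    used_a, used_b = set(), set()
    total, matched = 0.0, 0
    for product, x, y in candidates:
        if x in used_a or y in used_b:
            continue  # each peak may be used at most once
        used_a.add(x)
        used_b.add(y)
        total += product
        matched += 1
    return total / (norm_a * norm_b), matched


def match_to_reference(query, library, min_cos=0.7, min_peaks=6,
                       precursor_tol=0.02):
    """Return the metadata labels of all reference spectra matching `query`.

    `library` is a list of (Spectrum, label) pairs, where the label is
    source metadata (e.g. a food category), not a compound identity.
    """
    labels = []
    for ref, label in library:
        if abs(ref.precursor_mz - query.precursor_mz) > precursor_tol:
            continue  # precursor masses must agree before comparing fragments
        score, n_matched = cosine_score(query, ref)
        if score >= min_cos and n_matched >= min_peaks:
            labels.append(label)
    return labels
```

Because one query spectrum can match several reference spectra, the function returns every matching label rather than a single best hit; summarizing those labels across a study is a separate step.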

Fig. 1: The concept of an RDD-based analysis workflow.
Fig. 2: RDD with food reference data.
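Once spectra have been matched, the reference metadata can be tallied per sample or per study. The sketch below is a hypothetical illustration of such a tally and of a spectral-usage ratio of the kind behind the fold change quoted in the abstract. The input dictionaries, the rule that a spectrum counts once per category, the toy data and the function names `rdd_counts`, `spectral_usage` and `usage_fold_change` are all assumptions, not the published workflow.

```python
from collections import Counter


def rdd_counts(matches, ref_metadata):
    """Count query spectra matched to reference data of each category.

    matches: query spectrum ID -> list of matched reference sample IDs
    ref_metadata: reference sample ID -> category label (e.g. food type)
    """
    counts = Counter()
    for ref_ids in matches.values():
        # A spectrum counts once per category, however many reference
        # samples of that category it matched
        categories = {ref_metadata[r] for r in ref_ids if r in ref_metadata}
        counts.update(categories)
    return counts


def spectral_usage(matches, n_spectra):
    """Fraction of all query MS/MS spectra with at least one match."""
    used = sum(1 for ref_ids in matches.values() if ref_ids)
    return used / n_spectra if n_spectra else 0.0


def usage_fold_change(rdd_matches, library_matches, n_spectra):
    """Ratio of RDD spectral usage to structural-library spectral usage."""
    lib = spectral_usage(library_matches, n_spectra)
    if lib == 0:
        return float("inf")
    return spectral_usage(rdd_matches, n_spectra) / lib


# Hypothetical toy data: three reference food samples, four query spectra
ref_metadata = {"F1": "dairy", "F2": "dairy", "F3": "fruit"}
rdd_matches = {"q1": ["F1", "F2"], "q2": ["F3"], "q3": [], "q4": ["F1", "F3"]}
library_matches = {"q1": ["compound_A"]}
print(rdd_counts(rdd_matches, ref_metadata))
print(usage_fold_change(rdd_matches, library_matches, n_spectra=4))
```

Both usage values must be computed over the same set of query spectra for the ratio to be meaningful.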

Data availability

The following files are available in addition to the Global FoodOmics mzXML files on https://massive.ucsd.edu under MSV000084900: metadata as a .txt file; an image repository with between one and six images per sampled food item; a table of FDR-based parameters; a full-size PDF of the sleep restriction and circadian misalignment study; and the food reference data molecular network (excerpts are shown in Fig. 1). A metadata dictionary can also be accessed at https://docs.google.com/spreadsheets/d/1Ebn-TgMWEkd_7KOw9TCRvHGPsE7dGjVCr7dg28pwbmM/edit#gid=727944641. The accession numbers for the raw metabolomics data files are available in Supplementary Table 2. The GNPS-based molecular networking analysis jobs used in this study can be accessed online at the following links: sleep and circadian study (MSV000083759, https://gnps.ucsd.edu/ProteoSAFe/status.jsp?task=e0bf255bcb2e492bb0be3be1a691b5fb, https://gnps.ucsd.edu/ProteoSAFe/status.jsp?task=6fe434761daf4f9da540cf1fd90b3985, https://gnps.ucsd.edu/ProteoSAFe/status.jsp?task=9a90bd12f51e453e968656e6458e0da4); centenarian (MSV000084591, https://gnps.ucsd.edu/ProteoSAFe/status.jsp?task=8895b6e3445546c4a5bc3a726a920227, https://gnps.ucsd.edu/ProteoSAFe/status.jsp?task=981c9a7d39f742bda296d52f856981e5); impact of diet on rheumatoid arthritis (MSV000084556, https://gnps.ucsd.edu/ProteoSAFe/status.jsp?task=0794151fce2c4c18a7a0aa3a09140169); LP infant (MSV000083462, MSV000083463, https://gnps.ucsd.edu/ProteoSAFe/status.jsp?task=a7b222466ef844e69cdbd9835d2f6c39, https://gnps.ucsd.edu/ProteoSAFe/status.jsp?task=c756a9dfb5c34a2a8655f88114edf0a8, https://gnps.ucsd.edu/ProteoSAFe/status.jsp?task=4a322e640bb644068030949267fb4ea9); children with medical complexity (MSV000084610, https://gnps.ucsd.edu/ProteoSAFe/status.jsp?task=df24423835a341969342c2086b46275a); American Gut (MSV000081981, https://gnps.ucsd.edu/ProteoSAFe/status.jsp?task=4884483bcffe4f269819858c3fd4faef); fermented food consumption (MSV000081171, 
https://gnps.ucsd.edu/ProteoSAFe/status.jsp?task=5cca39e0ebab4066a56e41ded48b4466); Malawi legume supplement (MSV000081486, https://gnps.ucsd.edu/ProteoSAFe/status.jsp?task=93ba727aa9234727a73ae7860b2af3ca); Rotarix vaccine response (MSV000084218, https://gnps.ucsd.edu/ProteoSAFe/status.jsp?task=08e9b9e048f04ac4b416e574a073e8e6); IBD_1 (MSV000082431, https://gnps.ucsd.edu/ProteoSAFe/status.jsp?task=ec08eed8f186430d893c63111409baf4); IBD_individual (MSV000079115, https://gnps.ucsd.edu/ProteoSAFe/status.jsp?task=fad746939afd4184975a296436aebfb7); IBD_seed (MSV000082221, https://gnps.ucsd.edu/ProteoSAFe/status.jsp?task=907f2e0b7878417dbdb4c83f0df0e83a); IBD_biobank (MSV000079777, https://gnps.ucsd.edu/ProteoSAFe/status.jsp?task=a79fbd4c96124209adfd0ef84cb56dec); IBD_2 (MSV000084775, https://gnps.ucsd.edu/ProteoSAFe/status.jsp?task=07f855658c5342458045032ea70fc526); IBD_200 (MSV000084908, https://gnps.ucsd.edu/ProteoSAFe/status.jsp?task=55bef02250d744eb97c6040c379cbfb4); Alzheimer’s disease (MSV000085256, https://gnps.ucsd.edu/ProteoSAFe/status.jsp?task=aac78e9d23b84194ab2f768cb685c636); Alzheimer’s disease serum (MSV000086270, https://gnps.ucsd.edu/ProteoSAFe/status.jsp?task=570aacf2244948c7afa590631de5d345); omnivore versus vegan (MSV000086989, https://gnps.ucsd.edu/ProteoSAFe/status.jsp?task=74089e95b8df41b2af7c289869dc866f); COVID-19 (MSV000085505, MSV000085537, https://gnps.ucsd.edu/ProteoSAFe/status.jsp?task=9cbcb6b46fe24826bc56c9e893d0bd2b); IBD_biopsy (MSV000082220, https://gnps.ucsd.edu/ProteoSAFe/status.jsp?task=a83a279dad154f9ca7b549d40ce117ba); gout (MSV000084908, https://gnps.ucsd.edu/ProteoSAFe/status.jsp?task=55bef02250d744eb97c6040c379cbfb4); adult saliva (MSV000083049, https://gnps.ucsd.edu/ProteoSAFe/status.jsp?task=6dd6e5b1cf454d67b8a2b3c151c18f4a); legume supplementation (MSV000084663, https://gnps.ucsd.edu/ProteoSAFe/status.jsp?task=93ba727aa9234727a73ae7860b2af3ca); tomato seedling (MSV000083353, 
https://gnps.ucsd.edu/ProteoSAFe/status.jsp?task=3b6020d7034045c39969631894ae4c22); food only (MSV000084900, https://gnps.ucsd.edu/ProteoSAFe/status.jsp?task=d5adba7f67cc402396e9ba7cd85ce52b). Networking parameters were set on the basis of the MOLECULAR-LIBRARYSEARCH-FDR workflow on GNPS with the following task IDs: GFOP3500, a7bf6cc3f91d466bab923f2268d6f4fc; sleep deprivation, b55ab4004ed342d7b4ed1c488e935998; sleep study, 78bbfed8574748d1a77dc7c2f1a44d39; sleep study_SSF_test, b55ab4004ed342d7b4ed1c488e935998; centenarian, 265a9553c69e47499cca3de056b43178; centenarian_SSF_test, 265a9553c69e47499cca3de056b43178; American Gut, aee5dde3b2f84079a264e68ec981487e; fermented food consumption, a44d1b2e1b9d4612974d0b85021675a7; Malawi legume supplement, de7b55f8adaa4ad9b2a8430e30435bf3; children with medical complexity, f27243af071b43ab90d846bda959fc1c; Rotarix vaccine response, a2e02e3f97a54ca08e3866cc60f8d42b; impact of diet on rheumatoid arthritis, 62b8754e761549f3b94ffae83d7ab95a; LP infant, 532aba2ad3644fadba0e6e7ea063c7ee; IBD_1, bb10b1ce90a24f3a9cef1e85e88c3882; IBD_biopsy, c4cfda90933b4842a7154f5f2def139d; IBD_individual, 3ce8cc636ae944848b4ada322aaf12fe; IBD_seed, ebbb715fc605457ba5f7e910b79d6177; IBD_biobank, 9465c34cf5444e12b89318b1fb363714; IBD_2, 983fa9271136404fb5743b44a6a109f0; IBD_200, e5acf5726722486caa897b2b07d402e8; Alzheimer’s disease, 658103164325425981c097cecba840b0; Alzheimer’s disease serum, 67516099b37647f2a9c91f890366bef3; omnivore versus vegan, ba974d08cab04f77aaacdb7828baada6; gout, a478f419ae824378aa02e5e1b310cad2; adult saliva, 32980f95dbd5437aaa9e15d05c7246bb; LP infant, 8bfbdc1bf38c418fb223306cd42af897; LP infant, 3e414e13a4394bb78c07f7ca7f4d1be3; legume supplementation, 2ca007303b9c4bb3820f392b996eba27; COVID-19 Brazil, d16eb32276c84bdb9c35c5872e97a986; tomato seedling, f1c9cd79e0e94c66a367b6816b149750.
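The parameters above were chosen with an FDR-based library-search workflow. As a generic illustration of target-decoy FDR estimation (not the code of the GNPS MOLECULAR-LIBRARYSEARCH-FDR workflow), the sketch below assumes each query spectrum has already been scored against a target library and against a decoy library. It picks the lowest cosine threshold whose estimated FDR, the number of decoy hits divided by the number of target hits at or above that threshold, is at most `alpha`. The function names, the scan-from-lowest rule and the example scores are assumptions.

```python
def estimate_fdr(target_scores, decoy_scores, threshold):
    """Target-decoy FDR estimate: decoy hits / target hits at >= threshold."""
    n_target = sum(1 for s in target_scores if s >= threshold)
    n_decoy = sum(1 for s in decoy_scores if s >= threshold)
    # With no accepted target hits there are no false ones to report
    return n_decoy / n_target if n_target else 0.0


def min_threshold_at_fdr(target_scores, decoy_scores, alpha=0.01):
    """Lowest observed target score whose estimated FDR is <= alpha.

    Returns None if no threshold reaches the requested FDR.
    """
    for threshold in sorted(set(target_scores)):
        if estimate_fdr(target_scores, decoy_scores, threshold) <= alpha:
            return threshold
    return None


# Hypothetical cosine scores of the best target and best decoy hits
targets = [0.95, 0.9, 0.85, 0.8, 0.75, 0.7, 0.65, 0.6]
decoys = [0.72, 0.62, 0.55]
print(min_threshold_at_fdr(targets, decoys, alpha=0.2))  # prints 0.65
```

The estimated FDR is not guaranteed to fall monotonically as the threshold rises, which is why many tools report q-values instead of stopping at the first threshold that passes.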

Code availability

The code generated during this study is available at https://github.com/DorresteinLaboratory/GlobalFoodomics.

Change history

Acknowledgements

Funding sources: we thank the Crohn’s & Colitis Foundation (#675191), U19 AG063744 01, R01AG061066, 1 DP1 AT010885, P30 DK120515, Office of Naval Research MURI grant N00014-15-1-2809, NIH/NCATS Colorado CTSA Grant UL1TR002535, the Emch Fund and the C&D Fund. This work was also supported in part by the Chancellor’s Initiative in the Microbiome and Microbial Sciences, by Illumina through reagent donation and by Danone Nutricia Research in partnership with the Center for Microbiome Innovation at UCSD. We thank E. Sayyari, D. S. Nguyen, E. Wolfe and K. Sanders for sample processing, and J. DeReus for data handling, processing and maintaining the computational infrastructure. J.P.S. was supported by SD IRACDA (5K12GM068524-17), and in part by USDA-NIFA (2019-67013-29137) and the Einstein Institute GOLD project (R01MD011389). R.C. and M.G. were supported by the Krupp Endowed Fund; R.C. was also supported by a UCSD Rheumatic Diseases Research Training Grant from the NIH/NIAMS (T32AR064194). R.T. was supported by the VA Research Service and NIH/NIAMS AR060772 and AR075990. R.H.M. was supported through a UCSD training grant from the NIH/NIDDK Gastroenterology Training Program (T32 DK007202). M.F.O. was supported by the Brazilian National Council for Scientific and Technological Development (CNPq)-Brazil (245954/2012), and N.P.L. by FAPESP (2014/50265-3). D.W. was supported by an NIH/NHLBI Training Grant (NIH T32 HL149646). K.S. was supported by a PROMOS fund (DAAD). W.B. is a postdoctoral researcher of the Research Foundation–Flanders (FWO). R.J.D. was supported by NIH DP2 AT010401-01. We thank R. da Silva for his feedback and early bioinformatics analysis for the Global FoodOmics project. We further acknowledge all the individuals who contributed samples, as well as the companies and organizations that donated samples: D. Vargas, Townshend’s Tea Company, BDK Kombucha, Oregonian Tonic, Squirrel & Crow, Venissimo cheese, Fermenter’s Club San Diego, Good Neighbor Gardens, Sprouts Farmers Market, Ralphs, Whole Foods, Julian Ciderworks and San Diego Zoo and Safari Park. We specifically thank A. Durant for coordinating sampling at Fermentation Festivals, and the wonderful staff at San Diego Zoo Wildlife Alliance for coordinating and helping with sample collection: M. Gaffney, E. Galindo, K. Kerr, A. Fidgett, J. Stuart, D. Tanciatco and L. Pospychala. NIST acknowledges The Institute for the Advancement of Food and Nutrition Sciences (IAFNS) microbiome committee for providing support for the development of standardized fecal materials. Funding for the ADMC (Alzheimer’s Disease Metabolomics Consortium, led by Dr R.K.-D. at Duke University) was provided by the National Institute on Aging grants 1U01AG061359-01 and R01AG046171, a component of the Accelerating Medicines Partnership for AD (AMP-AD) Target Discovery and Preclinical Validation Project (https://www.nia.nih.gov/research/dn/ampad-target-discovery-and-preclinical-validation-project), and by the National Institute on Aging grant RF1 AG0151550, a component of the M2OVE-AD Consortium (Molecular Mechanisms of the Vascular Etiology of AD Consortium, https://www.nia.nih.gov/news/decoding-molecular-tiesbetween-vascular-disease-and-alzheimer). Additional support was provided by NIA grants 1RF1AG058942-01 and 3U01 AG024904-09S4. Data collection and sharing for the ADNI was supported by National Institutes of Health Grant U01 AG024904.
ADNI is funded by the National Institute on Aging, the National Institute of Biomedical Imaging and Bioengineering, and through generous contributions from the following: AbbVie; Alzheimer’s Association; Alzheimer’s Drug Discovery Foundation; Araclon Biotech; BioClinica, Inc.; Biogen; Bristol-Myers Squibb Company; CereSpir, Inc.; Cogstate; Eisai Inc.; Elan Pharmaceuticals, Inc.; Eli Lilly and Company; EuroImmun; F. Hoffmann-La Roche Ltd and its affiliated company Genentech, Inc.; Fujirebio; GE Healthcare; IXICO Ltd; Janssen Alzheimer Immunotherapy Research & Development, LLC; Johnson & Johnson Pharmaceutical Research & Development LLC; Lumosity; Lundbeck; Merck & Co., Inc.; Meso Scale Diagnostics, LLC; NeuroRx Research; Neurotrack Technologies; Novartis Pharmaceuticals Corporation; Pfizer Inc.; Piramal Imaging; Servier; Takeda Pharmaceutical Company; and Transition Therapeutics. The Canadian Institutes of Health Research is providing funds to support ADNI clinical sites in Canada. Private sector contributions are facilitated by the Foundation for the National Institutes of Health (www.fnih.org). The grantee organization is the Northern California Institute for Research and Education, and the study is coordinated by the Alzheimer’s Therapeutic Research Institute at the University of Southern California. ADNI data are disseminated by the Laboratory for Neuro Imaging at the University of Southern California. Further support was provided by a UCSD Academic Senate Research/Bridge Grant and by the Eunice Kennedy Shriver National Institute of Child Health and Human Development (K12-HD000850).

Author information

Contributions

P.C.D., R.K.D., R.J.D., and J.M.G. conceptualized the idea. M.J.M., M.B., M.P., F.D.O., K.C.W., C.M.A., E.B., K.S., P.C.D., R.J.D., R.K.D., N.C.S., A.D.S., K.D., G.A., D.M.D., N.P.L., M.B., and J.M.G. collected FoodOmics samples and performed metadata curation. M.J.M., M.P., F.D.O., F.V., C.M.A., E.B., N.C.S., and J.M.G. performed FoodOmics sample processing and MS data acquisition. A.J.J., P.B.F., E.D., Q.Z., D.N., D.M., J.P.S., and J.M.G. curated Global FoodOmics metadata to match the USDA Food and Nutrient Database for Dietary Studies (FNDDS). K.E.R., J.B.W., B.S.B., B.J.B., R.C., M.G.D.B., M.M.D., E.O.E., D.G., L.H., J.H.K., M.M., C.M., R.K., K.E.S., D.V.R., T.I.K., C.W., K.P.W.J., M.F.O., R.H.M., D.W., R.T., J.G.A., P.S.D., M.G., D.J.G., A.K.J., B.J.B., R.M.S., K.C.W., A.D.S., F.V., N.P.L., P.K.P., S.M.D.S., S.L.S., C.M.J., N.J.L., K.A.L., S.A.J., R.K.D. and J.M.G. provided samples, comparative datasets and/or detailed metadata. L.M.M.M. and T.M.C. performed COVID-19 patient and/or food sample preparation and analysis. P.L.J. was the physician responsible for the COVID-19 patients. R.D.R.O. was the physician responsible for collecting the plasma from COVID-19 patients. F.P.V. was responsible for tabulation of COVID-19 patient data. M.P., J.M.G., T.S., M.G.D.B., L.D.R.G. and G.H. prepared food samples. M.W. supported the GNPS computational infrastructure used in the study. C.L.W., W.B., A.K.J., K.A.W., E.S., A.T., N.P.L. and J.M.G. analyzed MS data. C.L.W., W.B., A.K.J., K.A.W., C.M., and J.M.G. generated figures. P.C.D., R.K., R.J.D., A.D.S., and J.M.G. supervised the work. P.C.D., R.K., C.L.W., K.A.W., W.B., and J.M.G. wrote the paper. All authors contributed feedback and edits to the manuscript.

Corresponding authors

Correspondence to Rob Knight or Pieter C. Dorrestein.

Ethics declarations

Competing interests

B.S.B. has a research grant from Prometheus Biosciences and has received consulting fees from Pfizer. P.C.D. is on the scientific advisory boards of Sirenas, Cybele Microbiome and Galileo, and is a founder and scientific advisor of Ometa Labs LLC and Enveda (with approval by UC San Diego). J.H.K. is a consultant for Medela and on the board of Innara Health; he owns shares in Astarte Medical and Nicolette. M.G. has research grants from Pfizer and Novartis. P.S.D. has received research support and/or consulting fees from Takeda, Pfizer, AbbVie, Janssen, Prometheus, Buhlmann and Polymedco. R.J.D. is a consultant for and owns shares in Impossible Foods Inc., and is on the Scientific Advisory Panel of Boost Biomes. A.J.J. has received consulting fees from Abbott Nutrition and Corebiome. D.M. is a consultant for BiomeSense, Inc., has equity and receives income. The terms of these arrangements have been reviewed and approved by the University of California, San Diego in accordance with its conflict of interest policies. D.G. is a consultant for Biogen, Fujirebio, vTv Therapeutics, Eisai and Amprion and serves on a DSMB for Cognition Therapeutics. K.P.W. reports, during the conduct of the study, receiving research support from SomaLogic, Inc., and receiving consulting fees from or serving as a paid member of scientific advisory boards for the Sleep Disorders Research Advisory Board–National Heart, Lung and Blood Institute, CurAegis Technologies, Philips, Inc., Circadian Therapeutics, Ltd. and Circadian Biotherapies Ltd. R.T. received a research grant from AstraZeneca and consulting fees from SOBI, Selecta, Horizon, Allena and AstraZeneca. A.D.S. and R.K. are directors at the Center for Microbiome Innovation at UC San Diego, which receives industry research funding for multiple microbiome initiatives, but no industry funding was provided for this project. R.K. is a scientific advisory board member of, and consultant for, BiomeSense, Inc., has equity and receives income. 
He is a scientific advisory board member and has equity in GenCirq. He is a consultant and scientific advisory board member for DayTwo, and receives income. He has equity in and acts as a consultant for Cybele. He is a co-founder of Biota, Inc., and has equity. He is a co-founder of Micronoma, has equity and is a scientific advisory board member. The terms of these arrangements have been reviewed and approved by the University of California, San Diego in accordance with its conflict of interest policies. M.W. is a co-founder of Ometa Labs LLC. K.D. is an inventor on a series of patents on the use of metabolomics for the diagnosis and treatment of central nervous system diseases and holds equity in Metabolon Inc., Chymia LLC and PsyProtix. The remaining authors declare no competing interests.

Peer review

Peer review information

Nature Biotechnology thanks Elaine Holmes and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Supplementary Figs. 1–3.

Reporting Summary

Supplementary Table 1

The metadata table for the foodomics project

Supplementary Table 2

Table of the study details of each of the 28 public projects

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Gauglitz, J.M., West, K.A., Bittremieux, W. et al. Enhancing untargeted metabolomics using metadata-based source annotation. Nat Biotechnol 40, 1774–1779 (2022). https://doi.org/10.1038/s41587-022-01368-1

  • DOI: https://doi.org/10.1038/s41587-022-01368-1
