Polyketide and nonribosomal peptide retro-biosynthesis and global gene cluster matching

Abstract

Polyketides (PKs) and nonribosomal peptides (NRPs) are profoundly important natural products, forming the foundations of many therapeutic regimes. Decades of research have revealed over 11,000 PK and NRP structures, and genome sequencing is uncovering new PK and NRP gene clusters at an unprecedented rate. However, only 10% of PKs and NRPs are currently associated with gene clusters, and it is unclear how many of these orphan gene clusters encode previously isolated molecules. Therefore, to efficiently guide the discovery of new molecules, we must first systematically de-orphan emergent gene clusters from genomes. Here we provide, to our knowledge, the first comprehensive retro-biosynthetic program, the generalized retro-biosynthetic assembly prediction engine (GRAPE), for PK and NRP families, and introduce a computational pipeline, global alignment for natural products cheminformatics (GARLIC), to uncover how observed biosynthetic gene clusters relate to known molecules, leading to the identification of gene clusters that encode new molecules.

Figure 1: Historical perspective of microbial polyketide and nonribosomal peptide natural-product discovery and associated genetic information.
Figure 2: Workflow of GRAPE.
Figure 3: Matching algorithm of GARLIC and examples of natural products matched.
Figure 4: Matching results of the test clusters (171 NRP, type 1 PK and hybrid PK-NRP) to in-house compound database (48,222 microbial natural products that can be processed via GRAPE).
Figure 5: Discovery of gene clusters for orphaned microbial natural products, and identification of new microbial natural products using the GARLIC pipeline.

Acknowledgements

This work was funded through a Natural Sciences and Engineering Research Council (NSERC) of Canada Discovery grant (RGPIN 371576-2014) (N.A.M.) and a Joint Programme Initiative on Antimicrobial Resistance funded through the Canadian Institutes of Health Research (CIHR) (grant 138739) (N.A.M.). C.W.J. is funded through a CIHR Doctoral Research Award. N.A.M. is supported by the Canada Research Chairs Program (grant 950228183). We thank J. Cao for rendering trees, A. Luo for curating sugar genes and structures, and B. Furman for valuable communications.

Author information

Contributions

C.A.D. and G.M.C. developed GRAPE and GARLIC, devised scoring strategies, contributed to study design, and wrote the manuscript. H.L. consulted on GRAPE and GARLIC's logic, devised scoring strategies, curated data sets, contributed to study design, and wrote the manuscript. C.W.J. isolated compounds and characterized structures. M.R.E. revised GARLIC, and designed and performed the optimization analysis of GARLIC scoring. M.A.S. developed PRISM, contributed to study design, and completed analysis for Figure 1. P.N.R. curated data sets, and contributed to study design. A.L.H.W. curated data sets. N.A.M. contributed to study design and wrote the manuscript.

Corresponding author

Correspondence to Nathan A Magarvey.

Ethics declarations

Competing interests

N.A.M. is a founder of Adapsyn Bioscience; H.L., C.A.D. and M.A.S. are consultants of Adapsyn Bioscience. Adapsyn Bioscience has licensed some aspects of the technologies relating to GARLIC.

Supplementary information

Supplementary Text and Figures

Supplementary Results, Supplementary Figures 1–16, Supplementary Tables 1–7 and Supplementary Note. (PDF 6564 kb)

Supplementary Dataset

Names and abbreviations of substrates used in the GARLIC pipeline. (XLSX 20 kb)

About this article

Cite this article

Dejong, C., Chen, G., Li, H. et al. Polyketide and nonribosomal peptide retro-biosynthesis and global gene cluster matching. Nat Chem Biol 12, 1007–1014 (2016). https://doi.org/10.1038/nchembio.2188

  • DOI: https://doi.org/10.1038/nchembio.2188
