Abstract
Polyketides (PKs) and nonribosomal peptides (NRPs) are profoundly important natural products, forming the foundations of many therapeutic regimes. Decades of research have revealed over 11,000 PK and NRP structures, and genome sequencing is uncovering new PK and NRP gene clusters at an unprecedented rate. However, only ∼10% of PKs and NRPs are currently associated with gene clusters, and it is unclear how many of these orphan gene clusters encode previously isolated molecules. Therefore, to efficiently guide the discovery of new molecules, we must first systematically de-orphan emergent gene clusters from genomes. Here we provide, to our knowledge, the first comprehensive retro-biosynthetic program for PK and NRP families, the generalized retro-biosynthetic assembly prediction engine (GRAPE), and introduce a computational pipeline, global alignment for natural products cheminformatics (GARLIC), to uncover how observed biosynthetic gene clusters relate to known molecules, leading to the identification of gene clusters that encode new molecules.
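As a rough illustration of the idea of matching gene clusters to known molecules by global alignment, the sketch below aligns a monomer sequence predicted from a gene cluster against monomer sequences derived from known molecules, using a plain Needleman-Wunsch score. It is not the GARLIC scoring scheme: the function names, the match, mismatch and gap values, the monomer labels and the toy library are all assumptions made for this example.

```python
from typing import Dict, List, Sequence, Tuple


def global_alignment_score(
    predicted: Sequence[str],
    observed: Sequence[str],
    match: int = 2,      # assumed reward for identical monomers
    mismatch: int = -1,  # assumed penalty for differing monomers
    gap: int = -2,       # assumed penalty for an unpaired monomer
) -> int:
    """Return the Needleman-Wunsch global alignment score of two monomer lists."""
    rows, cols = len(predicted) + 1, len(observed) + 1
    score = [[0] * cols for _ in range(rows)]

    # Aligning a prefix against nothing costs one gap per monomer.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    # Fill the table: pair two monomers, or leave one of them unpaired.
    for i in range(1, rows):
        for j in range(1, cols):
            same = predicted[i - 1] == observed[j - 1]
            diagonal = score[i - 1][j - 1] + (match if same else mismatch)
            score[i][j] = max(
                diagonal,
                score[i - 1][j] + gap,
                score[i][j - 1] + gap,
            )
    return score[-1][-1]


def rank_known_molecules(
    cluster_monomers: Sequence[str],
    library: Dict[str, List[str]],
) -> List[Tuple[str, int]]:
    """Rank hypothetical library entries by alignment score, best first."""
    scored = [
        (name, global_alignment_score(cluster_monomers, monomers))
        for name, monomers in library.items()
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Toy data: labels and entries are hypothetical, not taken from the paper.
cluster = ["Ser", "Gly", "Mal", "Val"]
library = {
    "molecule_A": ["Ser", "Gly", "Mal", "Val"],
    "molecule_B": ["Ser", "Mal", "Val"],
    "molecule_C": ["Leu", "Orn"],
}
ranking = rank_known_molecules(cluster, library)
```

In this toy setting, a cluster whose best score against every library entry stays low would be the kind of candidate the abstract describes as likely to encode a new molecule; where to place that threshold is left open here.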
Acknowledgements
This work was funded through a Natural Sciences and Engineering Research Council (NSERC) of Canada Discovery grant (RGPIN 371576-2014) (N.A.M.) and a Joint Programme Initiative on Antimicrobial Resistance funded through the Canadian Institutes of Health Research (CIHR) (grant 138739) (N.A.M.). C.W.J. is funded through a CIHR Doctoral Research Award. N.A.M. is supported by the Canada Research Chairs Program (grant 950228183). We thank J. Cao for rendering trees, A. Luo for curating sugar genes and structures, and B. Furman for valuable communications.
Author information
Contributions
C.A.D. and G.M.C. developed GRAPE and GARLIC, devised scoring strategies, contributed to study design, and wrote the manuscript. H.L. consulted on GRAPE and GARLIC's logic, devised scoring strategies, curated data sets, contributed to study design, and wrote the manuscript. C.W.J. isolated compounds and characterized structures. M.R.E. revised GARLIC, and designed and performed the optimization analysis of GARLIC scoring. M.A.S. developed PRISM, contributed to study design, and completed analysis for Figure 1. P.N.R. curated data sets, and contributed to study design. A.L.H.W. curated data sets. N.A.M. contributed to study design and wrote the manuscript.
Ethics declarations
Competing interests
N.A.M. is a founder of Adapsyn Bioscience; H.L., C.A.D. and M.A.S. are consultants of Adapsyn Bioscience. Adapsyn Bioscience has licensed some aspects of the technologies relating to GARLIC.
Supplementary information
Supplementary Text and Figures
Supplementary Results, Supplementary Figures 1–16, Supplementary Tables 1–7 and Supplementary Note. (PDF 6564 kb)
Supplementary Dataset
Names and abbreviations of substrates used in the GARLIC pipeline. (XLSX 20 kb)
About this article
Cite this article
Dejong, C., Chen, G., Li, H. et al. Polyketide and nonribosomal peptide retro-biosynthesis and global gene cluster matching. Nat Chem Biol 12, 1007–1014 (2016). https://doi.org/10.1038/nchembio.2188
DOI: https://doi.org/10.1038/nchembio.2188