Introduction

Terpenoids are the largest and most structurally diverse family of natural products, comprising over 100,000 known compounds, and are widely used as pharmaceuticals, herbicides, flavors, fragrances, and biofuels1,2,3,4,5. Plants, fungi, and marine organisms are well-known producers of terpenoids; however, less than 2% of all known terpenoids are of bacterial origin6. The disparity is even more pronounced for diterpenoids, the C20 subclass famous for the anticancer drug Taxol and the plant hormone gibberellin: less than 1% of diterpenoids come from bacteria (<200 of ~25,000)1.

Terpene synthases (TSs), the enzymes that begin specialized terpene biosynthesis by converting acyclic and achiral prenyl diphosphates into diverse polycyclic skeletons with numerous stereocenters, are prevalent throughout nature7,8. Despite the relatively small number of known bacterial terpenoids, TSs are widespread in bacterial genomes. In an early genome mining study, over 120 putative bacterial TSs were identified, mostly from 20 actinomycete genomes9. A few years later, a hidden Markov model (HMM) identified 262 presumptive TSs from a variety of bacteria10. Today, with ever-growing microbial genome databases, a simple search of the UniProt database for the ‘terpene cyclase-like 2 family’ (IPR034686) returns over 5000 bacterial proteins. This number does not include other subfamilies of TSs, such as the type II TSs11 or any of the noncanonical TSs12. Overall, this highlights the enormous potential to use TSs to characterize new terpene skeletons, discover new terpenoid natural products, investigate terpene enzymology and evolution, and develop biocatalysts to produce structurally and stereochemically complex hydrocarbons.

TSs act as chaperones that use carbocation chemistry to catalyze complex cyclization reactions7,8. They are divided into subclasses based on how they generate the initial carbocation8,11,12. Type I TSs use a trinuclear divalent metal cluster to abstract the diphosphate moiety of the substrate and initiate cyclization8. Although type I TSs, which are found in plants, fungi, bacteria, and, more recently, corals, share a conserved protein fold, they often have little sequence conservation outside of two metal-binding motifs: an Asp-rich motif (e.g., DDxxD) and an NSE/DTE triad (e.g., NxxxSxxxE)7. Because of the complexity of terpene cyclization, it is currently impossible to predict the substrates and products of TSs from protein sequence alone. TS sequence–function relationships are further complicated by the fact that a single amino acid change at a well-positioned site can dramatically alter the product profile of a TS13. In this study, we explore and expand the natural chemical space of bacterial diterpenes by screening 334 uncharacterized TSs for activity.
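Because the two metal-binding motifs are among the few conserved sequence features, a first-pass motif check is easy to automate. The Python sketch below is a hypothetical helper (`find_type1_motifs` is our name, not a published tool) that scans a protein sequence for the example patterns given above, DDxxD and NxxxSxxxE, plus a DxxxTxxxE pattern for the DTE variant that we assume by analogy. Real TS motifs are more degenerate than these strict patterns, so a missing match does not rule out TS activity.

```python
import re

# Example motif patterns from the text; "x" (any residue) becomes "." in regex.
# The DTE pattern is an assumption made by analogy with the NSE example.
TYPE1_MOTIFS = {
    "DDxxD": "DD..D",
    "NSE": "N...S...E",
    "DTE": "D...T...E",
}


def find_type1_motifs(seq, motifs=TYPE1_MOTIFS):
    """Return {motif_name: [(1-based start, matched text), ...]} for a protein sequence."""
    seq = seq.upper()
    hits = {}
    for name, pattern in motifs.items():
        # A zero-width lookahead lets overlapping occurrences be reported.
        hits[name] = [
            (m.start() + 1, m.group(1))
            for m in re.finditer(f"(?=({pattern}))", seq)
        ]
    return hits


if __name__ == "__main__":
    demo = "MAADDLLDAAANLLLSAAAEK"  # toy sequence, not a real TS
    for name, found in find_type1_motifs(demo).items():
        print(name, found)
```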

Results and discussion

Genome mining for bacterial terpene synthases

We set out to map the natural chemical space of bacterial diterpenes by screening a library of putative TSs. Of the 152 characterized canonical type I TSs from bacteria, >78% (119) are from the Actinomycetota phylum, 36 are diterpene synthases (di-TSs) that act on GGPP (as opposed to di-TSs that act on precyclized diphosphates, e.g., ent-copalyl diphosphate), and only 27 are associated with known natural products. We first investigated the overall distribution of putative TSs in bacteria. According to the Enzyme Function Initiative Taxonomy Sunburst tool14, 133,434 proteins in the UniProt database fall into the ‘isoprenoid synthase domain superfamily’ (IPR008949). Narrowing the search to IPR034686, a type I TS subfamily found mainly in bacteria and fungi, we identified 5802 putative TSs. Of the >40 formally recognized bacterial phyla15, 13 include TSs from the IPR034686 subfamily. Actinomycetota is by far the largest contributor, accounting for 79.5% of the proteins; other phyla with >1% include Bacteroidota (8.6%), Pseudomonadota (5.0%), Myxococcota (3.2%), and Cyanobacteriota (2.6%) (Fig. 1A, B). When normalized to the total number of genomes in each phylum, Actinomycetota, Myxococcota, and Cyanobacteriota have the highest number of TSs per genome (Supplementary Data 1). Interestingly, no type I di-TSs are known from myxobacteria or cyanobacteria (Fig. 1B).
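Normalizing by genome count is a simple division, but it can reorder the phyla, so a small helper makes the bookkeeping explicit. The Python sketch below is hypothetical: the function name and the toy counts are ours, and real inputs would be the per-phylum TS and genome tallies (see Supplementary Data 1).

```python
from typing import Dict, List, Optional, Tuple


def phylum_summary(
    ts_counts: Dict[str, int], genome_counts: Dict[str, int]
) -> List[Tuple[str, float, Optional[float]]]:
    """Return (phylum, % of all TSs, TSs per genome) rows, densest phyla first.

    Phyla without a genome count get None as density and are listed last.
    """
    total = sum(ts_counts.values())
    rows = []
    for phylum, n_ts in ts_counts.items():
        n_genomes = genome_counts.get(phylum, 0)
        density = n_ts / n_genomes if n_genomes > 0 else None
        rows.append((phylum, 100.0 * n_ts / total, density))
    # Sort by density, descending; rows with None sort after all numeric rows.
    rows.sort(key=lambda r: (r[2] is None, -(r[2] or 0.0)))
    return rows


if __name__ == "__main__":
    # Toy numbers for illustration only; not the counts behind Fig. 1.
    toy_ts = {"A": 80, "B": 15, "C": 5}
    toy_genomes = {"A": 100, "B": 50, "C": 5}
    for phylum, pct, per_genome in phylum_summary(toy_ts, toy_genomes):
        print(f"{phylum}: {pct:.1f}% of TSs, {per_genome:.2f} TSs/genome")
```

With these toy inputs, phylum C ranks first by density despite holding only 5% of all TSs, which is the kind of reordering the per-genome comparison reveals.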

Fig. 1: Terpene synthases in bacteria.

A TSs from the subfamily IPR034686 are found in 13 bacterial phyla. The phylogenetic tree of 43 phyla was created using 16S rRNA sequences of representative bacteria (Supplementary Table 32). Phyla with at least one putative TS from IPR034686 are bolded and colored, with the total number and overall percentage listed. B A sunburst chart highlighting the phylogenetic breakdown of TSs within the 13 bacterial phyla. Selected taxonomic descriptions are listed. C The overall TS library described in this study, categorized by phylum (color-coded as in A). The numbers in the pie chart represent the number of TSs that showed di-TS activity and the total number within each phylum. D Phylogenetic analysis of type I TSs from bacteria showing the broad distribution of di-TSs. Blue lines represent previously characterized TSs, red lines the 31 TSs characterized in this study, green lines the other 94 TSs that showed di-TS activity but were not studied further here, and black lines the remaining 209 TSs in our bacterial TS library. A larger tree is shown in Supplementary Fig. 1. E The total count of di-TSs from bacteria reported in the literature and the number of di-TSs characterized in this study.

Using a combination of genome mining, phylogenetics, sequence similarity networking, and biosynthetic gene cluster (BGC) analysis, we selected 334 TSs from 8 phyla, 17 classes, and 83 genera of bacteria for functional analysis (Fig. 1C, D, Supplementary Fig. 1). The selection was intentionally broad and paralleled the overall phylogenetic distribution of bacterial TSs (Fig. 1B, C). To expedite functional characterization, we used an E. coli heterologous expression system that detects products without the need for protein purification or substrate synthesis. The 334 TSs were synthesized as codon-optimized genes, cloned into pET28a, and expressed in an engineered E. coli strain that overproduces geranylgeranyl diphosphate (GGPP)16. Terpene products were initially detected by TLC or HPLC of organic extracts from the E. coli cultures. In this first-pass screen, with no optimization of the genetic system, expression, or culture conditions, we identified 125 positive hits (37%, Fig. 1C). We then prioritized 31 TSs for large-scale fermentation, diterpene isolation, and structural elucidation (Fig. 1E, Supplementary Fig. 2). TSs were chosen based on sequence and phylogenetic diversity (Supplementary Data 2), location in a unique BGC (Supplementary Data 3), and/or high initial yields. We isolated 28 diterpenes and determined their structures using NMR, GC-MS, and vibrational circular dichroism (VCD). We organize the bacterial diterpenes into the four categories below, although several fit into more than one category.
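Fig. 1C summarizes the screen as hits over total per phylum. A minimal Python sketch of that tally follows, assuming each screened TS is recorded as a hypothetical (phylum, active) pair; applied to the full library, 125 hits out of 334 TSs gives the 37% hit rate quoted above.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def tally_hits(
    records: Iterable[Tuple[str, bool]]
) -> Tuple[Dict[str, Tuple[int, int]], float]:
    """records: one (phylum, showed_di_TS_activity) pair per screened TS.

    Returns ({phylum: (hits, screened)}, overall hit rate in percent).
    """
    per_phylum = defaultdict(lambda: [0, 0])  # phylum -> [hits, screened]
    for phylum, active in records:
        per_phylum[phylum][1] += 1
        if active:
            per_phylum[phylum][0] += 1
    hits = sum(h for h, _ in per_phylum.values())
    total = sum(n for _, n in per_phylum.values())
    rate = 100.0 * hits / total if total else 0.0
    return {p: (h, n) for p, (h, n) in per_phylum.items()}, rate


if __name__ == "__main__":
    toy = [("Actinomycetota", True), ("Actinomycetota", False), ("Bacteroidota", True)]
    print(tally_hits(toy))
```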

Discovery and structural elucidation of three previously unreported diterpene skeletons

Tetraisoquinene (1), a diterpene with a 5/5/5/5-fused tetracyclic skeleton, was produced by a TS (TiqS) from the myxobacterium Melittangium boletus (Fig. 2). Myxobacteria are well-established producers of natural products, but only a few diterpenoids have been isolated from the phylum6. To the best of our knowledge, this is the first di-TS characterized from myxobacteria. The GC-MS spectrum of 1 showed a molecular ion peak at m/z 272.2499 (Supplementary Fig. 3), supporting a molecular formula of C20H32 and thus a diterpene with five degrees of unsaturation. Using 1D and 2D NMR (Supplementary Note 1, Supplementary Table 1, Supplementary Figs. 4–11), we determined that 1 possesses an angular tetraquinane skeleton. Diterpenes are particularly well suited to analysis by VCD, a powerful technique for determining absolute configuration in solution without derivatization17,18,19; we therefore used VCD to assign the absolute configuration of 1 as 1S,6S,9S,10S,13S,14S (Supplementary Fig. 12). The only other known natural tetraquinane diterpenes are the crinipellins, basidiomycete fungal diterpenoids with antibacterial, antifungal, anticancer, and anti-inflammatory properties20. However, the A and B rings of 1 are connected differently from those of the crinipellins (Fig. 2). To the best of our knowledge, the TS responsible for forming the crinipellin tetraquinane skeleton is unknown and no biosynthetic studies have been reported; only chemical syntheses of the crinipellin skeleton are known21.
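The molecular-formula argument can be checked numerically. Degrees of unsaturation follow from DBE = C − H/2 + N/2 + 1, and electron ionization produces the radical cation M+•, whose m/z is the neutral monoisotopic mass minus one electron. The Python sketch below (helper names are ours; only C, H, N, and O are handled) compares the observed peaks for 1 and for salbirenol (2) with the theoretical values, giving errors of roughly 0.2 and 1 ppm, respectively.

```python
import re

# Monoisotopic masses (u) of the most abundant isotopes, and the electron mass.
MONOISOTOPIC = {
    "C": 12.0,
    "H": 1.00782503207,
    "N": 14.0030740048,
    "O": 15.99491461956,
}
ELECTRON = 0.00054857990946


def parse_formula(formula):
    """'C20H34O' -> {'C': 20, 'H': 34, 'O': 1} (no parentheses or charges)."""
    counts = {}
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + (int(number) if number else 1)
    return counts


def degrees_of_unsaturation(counts):
    """Rings plus pi bonds; oxygen does not contribute, halogens are not handled."""
    return counts.get("C", 0) - counts.get("H", 0) / 2 + counts.get("N", 0) / 2 + 1


def radical_cation_mz(counts):
    """m/z of the EI molecular ion M+. (neutral monoisotopic mass minus one electron)."""
    neutral = sum(MONOISOTOPIC[el] * n for el, n in counts.items())
    return neutral - ELECTRON


def ppm_error(observed, theoretical):
    return (observed - theoretical) / theoretical * 1e6


if __name__ == "__main__":
    for name, formula, observed in [("1", "C20H32", 272.2499), ("2", "C20H34O", 290.2607)]:
        c = parse_formula(formula)
        theo = radical_cation_mz(c)
        print(f"{name}: {formula} DBE={degrees_of_unsaturation(c):g} "
              f"calc m/z {theo:.4f}, error {ppm_error(observed, theo):+.1f} ppm")
```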

Fig. 2: Previously unreported skeletons (1–3) isolated from three different phyla of bacteria.

3D representations of 1 and 2 are shown. The phyla encoding the responsible TSs are listed. The fungal crinipellins are structurally the closest known skeleton to 1.

Salbirenol (2), isolated from E. coli expressing SalS from Streptomyces albireticuli (Actinomycetota), is a diterpene alcohol with a 7/5/6-tricyclic skeleton (Fig. 2). The GC-MS (m/z 290.2607; Supplementary Fig. 13) and NMR spectra (Supplementary Note 1, Supplementary Table 2, Supplementary Figs. 14–20) of 2 established its planar structure. By comparing the experimental VCD spectrum with calculated spectra for nearly all 32 possible diastereomers (each calculation also describes its enantiomer), we determined the absolute configuration of 2 to be 1S,2S,5R,8R,9S,14S (Supplementary Fig. 21).
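Two pieces of bookkeeping sit behind this kind of assignment. With six stereocenters there are 2^6 = 64 stereoisomers, but a calculated VCD spectrum also describes its enantiomer (every band changes sign), so one calculation per enantiomeric pair, 32 in total, suffices. The Python sketch below enumerates those representatives and scores an experimental spectrum against a calculated one and its mirror image. It is a simplified illustration, not the workflow of Supplementary Method 1: the Lorentzian half-width, the cosine-overlap score, and all function names are our assumptions, and conformer Boltzmann weighting, frequency scaling, and ring-fusion feasibility checks are omitted.

```python
from itertools import product

import numpy as np


def diastereomer_representatives(centers):
    """One CIP descriptor set per enantiomeric pair (2**n / 2 sets for n centers).

    Geometric feasibility of ring fusions is NOT checked here.
    """
    flip = {"R": "S", "S": "R"}
    kept = set()
    reps = []
    for combo in product("RS", repeat=len(centers)):
        if tuple(flip[c] for c in combo) in kept:
            continue  # its mirror image is already represented
        kept.add(combo)
        reps.append(dict(zip(centers, combo)))
    return reps


def broaden(freqs, strengths, grid, hwhm=4.0):
    """Sum of Lorentzian bands (peak height = strength) on a wavenumber grid."""
    grid = np.asarray(grid, dtype=float)
    offsets = grid[:, None] - np.asarray(freqs, dtype=float)[None, :]
    bands = np.asarray(strengths, dtype=float)[None, :] * hwhm**2 / (offsets**2 + hwhm**2)
    return bands.sum(axis=1)


def cosine_overlap(a, b):
    """Normalized overlap in [-1, 1] between two spectra on the same grid."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def assign_enantiomer(exp_vcd, calc_vcd):
    """Compare with the calculated isomer and its mirror image (VCD sign-inverted)."""
    s = cosine_overlap(exp_vcd, calc_vcd)
    return ("as calculated", s) if s >= 0 else ("mirror image", -s)


if __name__ == "__main__":
    centers = ["C1", "C2", "C5", "C8", "C9", "C14"]  # the stereocenters of 2
    print(len(diastereomer_representatives(centers)))  # 32
    grid = np.linspace(1000, 1500, 501)
    calc = broaden([1100, 1250, 1400], [1.0, -0.5, 0.8], grid)  # toy bands
    print(assign_enantiomer(-calc, calc))
```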

Chitino-2,5(6),9(10)-triene (3), a 5/11-bicyclic diterpene, was one of the major products of ChtS from Chitinophaga japonensis (Bacteroidota; Fig. 2). The GC-MS (m/z 272.2498; Supplementary Fig. 22) and NMR spectra of 3 (Supplementary Note 1, Supplementary Table 3, Supplementary Figs. 23–29) revealed its bicyclic skeleton. After its relative configuration was established by 2D NOESY, the absolute configuration of 3 was determined to be 1R,11S,12R by VCD (Supplementary Figs. 30–31). At first glance, 3 resembles a dolabellatriene; however, the positions of the methyl groups on its 11-membered ring show that it is a previously unreported skeleton, one with no corresponding known natural products, and it is likely formed by a different cyclization mechanism from that of the dolabellatrienes.

Bacteria produce diterpene skeletons found in other organisms

The next five skeletons (represented by diterpenes 4–9) are all known but, to the best of our knowledge, have not previously been seen in bacteria, and they lack characterized TSs responsible for their formation, associated natural products, or both. Peysonnosene (4) was identified as the major product of a TS (PeyS) from the Chloroflexota bacterium Anaerolineaceae sp. De novo structural elucidation by GC-MS and 1D and 2D NMR spectroscopy (Supplementary Note 1, Supplementary Table 4, Supplementary Figs. 32–39) revealed that 4 shares its 5/6/3/6-fused tetracyclic skeleton with the peyssonnosides, unusual diterpene glycosides isolated from the marine red alga Peyssonnelia sp. (Fig. 3). The relative configuration of 4, supported by limited NOESY correlations, was proposed to be the same as that of the peyssonnosides, and we confirmed its absolute configuration, 1S,6S,7R,10S,11S,14S, by VCD (Supplementary Fig. 40). The peyssonnosides have low mM activities against MRSA, the malaria parasite Plasmodium berghei, and the marine fungus Dendryphiella salina, with no cytotoxicity against human keratinocytes22. Although total syntheses of the peyssonnosides have been completed23,24, their biosynthesis has not been reported. Interestingly, Anaerolineaceae bacteria are known to occur in marine sediments25, suggesting a potential link between their diterpene production and algal natural products.

Fig. 3: Diterpene skeletons (4–9), known in other organisms, now identified in bacteria.

Bacterial TSs are the first known to form 4 and 5; TSs in fungi and plants were known to produce 6 and 9, respectively. The phyla encoding the responsible TSs are listed.

Clavulara-1(15),17-diene (5), a 6/7/5-tricyclic diterpene, is produced by CvdS from Nocardia vulneris (Fig. 3). The structure of 5, including its absolute configuration, was determined by GC-MS, NMR, and VCD (Supplementary Note 1, Supplementary Table 5, Supplementary Figs. 41–50). We named 5 after the clavularanes, a small family of coral diterpenoids that share the same planar skeleton (Fig. 3)26. However, the third ring of the clavularanes and the related dolastanes is presumed to arise biosynthetically from oxidized dolabellanes (5/11-bicyclic skeletons)27,28, and no coral TSs are yet known to produce either the dolabellane or the clavularane skeleton29.

Variediene (6), a 5/5/9-tricyclic diterpene, was produced by two distinct bacterial TSs (Fig. 3, Supplementary Table 6, Supplementary Figs. 51–57). PsVS from Prauserella shujinwangii (Actinomycetota) and OdVS from Olivibacter domesticus (Bacteroidota) share only 20% sequence identity. Three fungal chimeric TSs from ascomycetes, containing both polyprenyl synthase and cyclase domains, are known to produce variediene30,31,32. To the best of our knowledge, PsVS and OdVS are the first reported bacterial variediene synthases, and they share essentially no sequence identity (<20%) with any of the fungal enzymes. The genuine natural products of the fungal variediene BGCs remain unknown.

The verticillenes are 6/12-bicyclic diterpenes that are biosynthetically related to the taxane and phomactin families of natural products33. Verticillene natural products are known from plants, corals, and insects, and verticillene isomers have been isolated from engineered taxadiene synthase variants34,35. Here, we identified two verticillene synthases, CnVrtS from Chitinophaga niabensis and SVrtS from Streptomyces sp. TLI_235, which produce 1S-verticilla-3,7,11(12)-triene (7) and 1S-verticilla-3,8(19),11(12)-triene (8), respectively (Fig. 3, Supplementary Tables 7–8, Supplementary Figs. 58–73). The NMR data of 7 matched those previously reported, confirming its structure. Comparison of the NMR data of 8 with those of 7 clearly revealed the exocyclic C-8=C-19 olefin of 8. The absolute configuration of 8 was confirmed by VCD (Supplementary Fig. 73).

While genome mining for unique TSs from bacteria, we identified a putative tridomain, bifunctional TS from Chitinophaga japonensis. Bifunctional TSs with both type I and type II activity are prominent in plants and fungi36, and although their existence in bacteria had been proposed on evolutionary grounds37, there were no known examples when we began this study. The unusually long sequence of this TS (785 amino acids) piqued our interest; the protein had been annotated as a prenyltransferase/squalene oxidase-like protein but showed similarity to plant bifunctional TSs. Expression in E. coli yielded (−)-sandaracopimaradiene (9, Fig. 3, Supplementary Table 9, Supplementary Figs. 74–80), indicating type II cyclization of GGPP to n-copalyl diphosphate followed by type I cyclization to 9. Discrete type II and type I TS pairs from Actinomycetota are known to form (iso)pimaradienes38, and at least one bacterial diterpenoid, gifhornenolone B39, is derived from this skeleton. At the time of our discovery, no natural bifunctional TS had been identified in bacteria; during preparation of this manuscript, the function of this TS, named ChjDCS, was reported40. This finding opens the door to understanding TS evolution in terrestrial organisms.

Isomers of known skeletons are prevalent in bacteria

We identified several diterpenes with known planar skeletons that either have alkenes at different positions or are diastereomers of previously known diterpenes (Fig. 4). Two Streptomyces TSs, SaVenA and SsVenA, produce the 5/5/6/7-tetracyclic scaffold seen in venezuelaene A (10)41. Given their ~58% sequence identity to venezuelaene A synthase (VenA), it is not surprising that SaVenA forms 10 (Supplementary Table 10, Supplementary Figs. 81–84) and SsVenA forms the nearly identical venezuelaene A2 (11; Supplementary Table 11, Supplementary Figs. 85–94). Interestingly, another homologous TS from Streptomyces alkaliterrae, SalkS, with 49% sequence identity to VenA, forms the 5/7/6-tricyclic odyverdiene B2 (12; Supplementary Table 12, Supplementary Figs. 95–104). Similar diterpenes were reported from Streptomyces sp. ND9042, but that di-TS shares only 24% identity with SalkS. Given the differences in their NMR spectra, the absolute configuration of 12, determined by VCD (Supplementary Fig. 104), likely differs from that of the previously reported odyverdiene B, whose absolute configuration was not reported42.
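Sequence identities such as the ~58%, 49%, and 24% values above depend on how the alignment is normalized, and tools differ in their choice of denominator. The hypothetical Python helper below computes percent identity from an existing pairwise alignment under three common conventions; which convention underlies the values reported here is not stated in this section, so treat the choice as an assumption.

```python
def percent_identity(aln_a: str, aln_b: str, denominator: str = "aligned") -> float:
    """Percent identity of two equal-length aligned sequences ('-' = gap).

    denominator:
      "aligned"   -> columns where both sequences have a residue
      "shorter"   -> ungapped length of the shorter sequence
      "alignment" -> all columns except those gapped in both sequences
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    cols = [
        (x, y)
        for x, y in zip(aln_a.upper(), aln_b.upper())
        if not (x == "-" and y == "-")
    ]
    # Gap-gap columns were removed above, so x == y implies two identical residues.
    matches = sum(1 for x, y in cols if x == y)
    if denominator == "aligned":
        n = sum(1 for x, y in cols if x != "-" and y != "-")
    elif denominator == "shorter":
        n = min(len(aln_a.replace("-", "")), len(aln_b.replace("-", "")))
    elif denominator == "alignment":
        n = len(cols)
    else:
        raise ValueError(f"unknown denominator: {denominator}")
    return 100.0 * matches / n if n else 0.0


if __name__ == "__main__":
    a, b = "MKAT-V", "MK-TLV"  # toy alignment
    for mode in ("aligned", "shorter", "alignment"):
        print(mode, round(percent_identity(a, b, mode), 1))
```

The same alignment scores 100%, 80%, or about 67% depending on the denominator, which is why identities from different tools are not always directly comparable.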

Fig. 4: Structural and stereochemical isomers of bacterial diterpenes (10–27).

The phyla encoding the responsible TSs and protein sequence identities to known TSs or TSs in this study are listed. Previously unreported and reported diterpenes are shown in blue and yellow boxes, respectively.

Dolabellatrienes, the 5/11-bicyclic precursors to the dolabellane family of natural products known from algae, fungi, and marine invertebrates27, are rare in bacteria. Only a few are known, and they are often implicated as intermediates in more complex cyclization cascades43. SaVenA, the venezuelaene A (10) synthase described above, also yields 1S,3E,7E,11R,12S-dolabella-3,7,18-triene (13; Supplementary Table 13, Supplementary Figs. 105–107), previously reported as a shunt product of several VenA variants44. CbDS from Chitinophaga barathri forms 14 (Supplementary Table 14, Supplementary Figs. 108–115), a diastereomer of 13; NOESY supports a relative configuration of 1R*,11S*,12S* for 14. Finally, we identified a di-TS from cyanobacteria: ShDS from Scytonema hofmannii forms 1R,7E,10E,12R-dolabella-4(16),7,10-triene (15; Supplementary Table 15, Supplementary Figs. 116–124), a possible precursor to the coral diterpenoids stolonitriene and stolonidiol45.

KpCdS from Kibdelosporangium phytohabitans (Actinomycetota), which shares 19%–28% identity with the dolabellatriene synthases above, formed a 5/8/5-tricyclic skeleton that likely originates from a dolabellyl intermediate. The planar structure of this diterpene (16; Supplementary Table 16, Supplementary Figs. 125–134) matched that of the previously reported cyclooctat-7(8),10(14)-diene (17) from Streptomyces lactacystinaeus OM-615942, but their NMR spectra were not identical. Because the absolute configuration of 17 had not been reported, we obtained slt18_1078, which encodes this di-TS (here named SlCdS), and isolated 17 (Supplementary Table 17, Supplementary Figs. 135–143). Comparison of the NOESY data and VCD analysis established their absolute configurations: 2S,3S,6R,7Z,10Z,11R-16 from KpCdS and 2R,3S,6R,7Z,10Z,11S-17 from SlCdS. These two enzymes share 27% identity yet construct diastereomeric 5/8/5-tricyclic diterpenes.

(+)-Monoprenyl-β-curcumene (18) was produced by StMpcS from Saccharopolyspora terrae (Supplementary Table 18, Supplementary Figs. 144–150). Monocyclic 18 is the C20 counterpart of β-curcumene, produced by plant sesqui-TSs and epi-isozizaene synthase mutants46,47, and of tetraprenyl-β-curcumene, produced by YtpB in Bacillus48. StMpcS shares 30% sequence identity with epi-isozizaene synthase but no significant sequence similarity with the plant sesqui-TSs or the noncanonical “large TS” YtpB. Tetraprenyl-β-curcumenes are often precursors for a second cyclization reaction, but the BGC encoding StMpcS has no nearby type II TS, suggesting different downstream tailoring modifications (Supplementary Data 3).

Distinct TSs construct the same diterpenes

As described above for the variediene (6) synthases from different bacterial phyla and fungi, it is not unprecedented for TSs with disparate sequences, often from different phyla or genera, to form identical products. Here, we describe several such examples among the known diterpenes isolated in this study (Fig. 4).

Benditerpetriene (19), a 2E,6E-cis-eunicellane known from benditerpenoic acid and aridacin biosynthesis49,50, was produced by two additional TSs, NBnd4 and MaBnd4, from Nocardia sp. SYP-A9097 and Mycobacteroides abscessus, respectively (Supplementary Table 19, Supplementary Figs. 151–153). NBnd4 and MaBnd4 share 40% and 37% sequence identity with Bnd4, respectively. While NBnd4 is found in a BGC similar to those of benditerpenoic acid and the aridacins49, it is intriguing to consider why the pathogenic M. abscessus encodes a eunicellane synthase.

Prehydropyrene (20), a diastereomer of benditerpetriene (19), was found to be the major product of SPhpS and CpPhpS (Supplementary Table 20, Supplementary Figs. 154–160). Although 20 is known as a neutral intermediate in the biosynthesis of hydropyrene51, to the best of our knowledge, this is the first report of a prehydropyrene synthase. SPhpS, from Streptomyces sp. TLI_053, shares no significant sequence identity with hydropyrene synthase (HpS) from Streptomyces clavuligerus; CpPhpS, from Chryseobacterium populi (Bacteroidota), shares 21% sequence identity with HpS.

Micromonocyclol (21), a diterpene alcohol with a rare 15-membered ring originally identified as the product of McS from Micromonospora marina52, was produced by three additional TSs (Supplementary Table 21, Supplementary Figs. 161–166). MpMcS and MbMcS are from two Micromonospora spp. and share 62% and 61% sequence identity with McS, respectively; SMcS is from Streptomyces sp. CB01580 and shares only 28% identity.

Several 14-membered monocyclic diterpenes were identified from bacterial TSs. MpMcS and MbMcS, as well as MlCAS from Micromonospora sp. Liam0 and BpCAS from Belliella pelovolcani (Bacteroidota), produced cembrene A (22; Supplementary Table 22, Supplementary Figs. 167–173); BpCAS also formed (+)-nephthenol (23; Supplementary Table 23, Supplementary Figs. 174–176) but shares no significant identity (<18%) with DtcycA, a producer of 2353. NyICS from Nocardia yunnanensis forms isocembrene C (24; Supplementary Table 24, Supplementary Figs. 177–182).

Spiroviolene (25), a spirocyclic triquinane diterpene from Streptomyces violens that is also known in fungi54,55, was produced by S130SvS, a TS with 50% identity to SvS from S. violens (Supplementary Table 25, Supplementary Figs. 183–189). Spata-13,17-diene (26) and cneorubin Y (27) were produced by a TS (SpS) with high homology to the known spatadiene synthase (Supplementary Tables 26–27, Supplementary Figs. 190–194)56. In some cases, we identified the GGPP elimination product β-springene (28), for example from the Streptomyces sp. Tü2975 TS (Supplementary Table 28, Supplementary Figs. 195–200). We hypothesized that this enzyme is not a di-TS and should be screened for activity on other prenyl diphosphates. Gratifyingly, this TS was recently characterized as the sesterterpene synthase StSS, responsible for producing sesterviolene57.

Diterpene products verified by in vitro characterization

To verify that the TSs catalyze the same reactions in vitro as in our heterologous expression system, we subcloned the genes encoding TiqS, SalS, PeyS, OdVS, SalkS, and SPhpS with N-terminal His6 tags and purified the proteins from E. coli (Supplementary Fig. 201). For each of the six TSs, the major products matched those produced in E. coli (Fig. 5), supporting that the isolated products are genuine bacterial diterpenes. It remains to be seen whether expression in the native hosts yields identical compounds, although recent characterization of AlbS, a trans-eunicellane synthase from Streptomyces albireticuli, in vitro and via expression in both E. coli and S. albireticuli confirmed the same enzymatic product58,59.

Fig. 5: HPLC analysis of in vitro reactions of six TSs with GGPP in comparison to extracts obtained from expressing these genes in our GGPP-production system in E. coli.

Absorbance was measured at 210 nm.

In conclusion, we used genome mining to assemble a large library of bacterial TSs and screened it for diterpene synthase activity. This screen nearly doubles the number of characterized bacterial di-TSs. It is highly probable that many of the remaining TSs are active but selective for other prenyl diphosphate chain lengths (i.e., monoterpene, sesquiterpene, or sesterterpene synthases); these studies are under way. Libraries like this one will be essential for advancing our understanding of how these enzymes control cyclization spatially and stereochemically, and for future approaches to predicting TS function.

The bacterial terpenome6, the collection of genomically encoded terpenoid natural products, is evidently much larger than previously thought. Diterpenes that either were previously unobserved in nature or, to the best of our knowledge, are reported here for the first time in bacteria are likely precursors to more complex diterpenoid natural products. We characterized type I diterpene synthases from myxobacteria and cyanobacteria and identified functional TSs in organisms, including human pathogens, that were not known to produce terpenoids. It is also tempting to speculate that bacterial symbionts may produce some of the terpenoids known from eukaryotic systems60; however, the discovery of coral and, very recently, arthropod TSs supports that at least some, if not all, of these are genuine eukaryotic natural products61,62,63. Very little is known about the ecological roles of bacterial terpenoids, although recent studies suggest that volatile terpenes enable interspecies communication64,65. Finding similar compounds in multiple phyla or organisms may provide insights into some of these difficult-to-answer questions.

Methods

General experimental procedures

All 1H, 13C, and 2D NMR (1H-13C HSQC, 1H-1H COSY, 1H-13C HMBC, and 1H-1H NOESY) experiments were run in CDCl3 or C6D6 on a Bruker AVANCE III Ultrashield 600, Bruker AVANCE III HD 600, or Bruker AVANCE III 800 spectrometer. All NMR chemical shifts were referenced to residual solvent peaks or to Si(CH3)4 as an internal standard. Terpene production was monitored by thin-layer chromatography (TLC) and high-performance liquid chromatography (HPLC). TLC was performed on 0.25 mm silica gel plates (60 F254), with short-wave UV light for visualization and I2 or KMnO4 with heating as developing agents. HPLC was performed on an Agilent 1260 Infinity LC equipped with a Restek Roc-C18 column (150 mm × 4.6 mm, 5 µm). Preparative HPLC was carried out on an Agilent 1260 Infinity LC equipped with an Agilent Eclipse XDB-C18 column (250 mm × 21.2 mm, 7 µm). GC-MS analysis was performed on a Thermo Scientific Orbitrap Exploris spectrometer with an Rxi-5MS column (Restek Corp, 30 m × 0.25 mm i.d. and 0.251 µm film). Optical rotations were measured on a JASCO P-2000 polarimeter. IR data were measured on a PerkinElmer Spectrum Two FT-IR spectrometer. VCD data were collected on a BioTools, Inc. (Jupiter, FL) ChiralIR 2X Dual PEM FT-VCD spectrometer.

Bacterial strains, plasmids, and chemicals

Strains, plasmids, and PCR primers used in this study are listed in Supplementary Tables 29–31. All genes encoding bacterial TSs screened in this study were synthesized by the U.S. Department of Energy Joint Genome Institute as part of a User Proposal for Functional Genomics (part of proposal: 10.46936/10.25585/60008123). Genes were codon-optimized for E. coli and cloned into pET28a via Gibson assembly. PCR primers were obtained from Sigma-Aldrich. Q5 high-fidelity DNA polymerase and restriction endonucleases were purchased from NEB and used according to the manufacturers’ protocols. DNA gel extraction and plasmid preparation kits were purchased from Omega Bio-Tek. DNA sequencing was conducted by Genewiz. Common chemicals and media components were purchased from standard commercial sources.

Bioinformatics

The phylogenetic trees were constructed with the phylogenetic tree builder in MEGA11, using Muscle for multiple sequence alignment, the neighbor-joining algorithm for tree construction, and bootstrap analysis with 1000 replicates66. Bootstrap values, which ranged from 0 (near the root) to 1, with half of the nodes >0.3, were omitted from the figures for clarity. The tree of bacterial phyla (Fig. 1A) was built from a multiple sequence alignment of the 16S rRNA sequences of 42 bacterial phyla (Supplementary Table 32). The tree of bacterial TSs was constructed from the amino acid sequences of 377 uncharacterized bacterial TSs and 141 characterized bacterial di-TSs from the literature. The phylogenetic trees were then visualized and annotated using iTOL67. Each TS sequence was also manually inspected for the highly conserved type I TS motifs (Supplementary Data 2). Protein sequence alignments were performed with ClustalW (http://www.clustal.org/), and the results were rendered with ESPript 3.0 (http://espript.ibcp.fr/ESPript/ESPript/).
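The tree-building step can be illustrated with the textbook neighbor-joining procedure of Saitou and Nei, which repeatedly joins the pair of clusters that minimizes the Q-criterion. The Python sketch below works from a precomputed distance matrix. It is not MEGA11's implementation and omits the alignment, the evolutionary distance model, and bootstrap resampling; the function name is ours, and the five-taxon matrix in the usage example is a toy input, not data from this study.

```python
from itertools import combinations
from typing import List, Sequence


def neighbor_joining(names: Sequence[str], dist: List[List[float]]) -> str:
    """Build an unrooted neighbor-joining tree and return it as a Newick string.

    dist is a symmetric matrix of pairwise distances in the same order as names.
    The last two clusters are joined with the full remaining distance on one side.
    """
    d = {a: {b: float(dist[i][j]) for j, b in enumerate(names)} for i, a in enumerate(names)}
    nodes = list(names)
    while len(nodes) > 2:
        n = len(nodes)
        # r[a] = sum of distances from a to every other active node
        r = {a: sum(d[a][b] for b in nodes if b != a) for a in nodes}
        # Pick the pair minimizing the Q-criterion (the first such pair wins ties)
        i, j = min(
            combinations(nodes, 2),
            key=lambda p: (n - 2) * d[p[0]][p[1]] - r[p[0]] - r[p[1]],
        )
        # Branch lengths from the new internal node to i and j
        li = 0.5 * d[i][j] + (r[i] - r[j]) / (2 * (n - 2))
        lj = d[i][j] - li
        u = f"({i}:{li:.3f},{j}:{lj:.3f})"
        d[u] = {u: 0.0}
        for k in nodes:
            if k not in (i, j):
                d[u][k] = d[k][u] = 0.5 * (d[i][k] + d[j][k] - d[i][j])
        nodes = [k for k in nodes if k not in (i, j)] + [u]
    a, b = nodes
    return f"({a}:{d[a][b]:.3f},{b}:0.000);"


if __name__ == "__main__":
    names = ["a", "b", "c", "d", "e"]
    toy = [[0, 5, 9, 9, 8],
           [5, 0, 10, 10, 9],
           [9, 10, 0, 8, 7],
           [9, 10, 8, 0, 3],
           [8, 9, 7, 3, 0]]
    print(neighbor_joining(names, toy))
```

The resulting Newick string can be loaded into a tree viewer such as iTOL; bootstrap support would require repeating the whole procedure on resampled alignments.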

Terpene synthase screen

Plasmids harboring synthesized TS genes in pET28a were isolated from NEB Turbo E. coli grown in lysogeny broth (LB) with kanamycin (50 mg L−1), following the manufacturer’s protocols. Each plasmid was individually transformed into E. coli BL21 Star (DE3) competent cells already containing pJR1064b, a pCDF-Duet vector harboring genes that produce GGPP in vivo via an alternative isoprenoid precursor pathway58. Transformants were cultivated in LB containing kanamycin (50 mg L−1) and streptomycin (50 mg L−1). Overnight cultures were then used to inoculate 50 mL of Terrific Broth (TB), which was grown at 37 °C with shaking at 250 rpm to an optical density at 600 nm (OD600) of 1.0. Isopropyl β-d-1-thiogalactopyranoside (IPTG, 0.5 mM final concentration) and isoprenol (4 mM final concentration) were added, and the culture was shaken at 250 rpm for 48 h at 28 °C. For analysis, 1 mL of culture was mixed with 0.5 mL of acetonitrile, saturated with solid NaCl, vortexed vigorously, and centrifuged at 21,300 × g for 5 min. The organic layer, containing the terpene products of each TS, was initially analyzed by TLC or HPLC. For HPLC analysis, products were detected at 210 nm using a linear gradient at a flow rate of 1 mL min−1 and 35 °C: 5% acetonitrile/water (0–5 min); 5% to 95% acetonitrile/water (5–35 min, 3% min−1); 95% acetonitrile/water (hold 35 min).
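Both LC gradients can be written as piecewise-linear programs of (time, % acetonitrile) breakpoints, which makes the ramp rates easy to check: the analytical ramp from 5% to 95% over 5–35 min corresponds to 90/30 = 3% min−1, and the preparative ramp over 5–15 min to 9% min−1. The Python sketch below transcribes both programs from the methods text; the helper names are ours.

```python
from typing import List, Tuple

Program = List[Tuple[float, float]]  # (time in min, % acetonitrile) breakpoints

# Breakpoints transcribed from the methods text (times must be strictly increasing).
ANALYTICAL: Program = [(0, 5), (5, 5), (35, 95), (70, 95)]   # 1 mL/min
PREPARATIVE: Program = [(0, 5), (5, 5), (15, 95), (80, 95)]  # 20 mL/min


def percent_acetonitrile(t: float, program: Program) -> float:
    """Linearly interpolate the mobile-phase composition at time t (min)."""
    for (t0, b0), (t1, b1) in zip(program, program[1:]):
        if t0 <= t <= t1:
            return b0 + (b1 - b0) * (t - t0) / (t1 - t0)
    raise ValueError("time outside the gradient program")


def ramp_rates(program: Program) -> List[float]:
    """Slope of each segment in % acetonitrile per minute."""
    return [(b1 - b0) / (t1 - t0) for (t0, b0), (t1, b1) in zip(program, program[1:])]


if __name__ == "__main__":
    print(ramp_rates(ANALYTICAL))   # [0.0, 3.0, 0.0]
    print(ramp_rates(PREPARATIVE))  # [0.0, 9.0, 0.0]
    print(percent_acetonitrile(20, ANALYTICAL))  # 50.0
```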

Isolation and purification of terpenes

Large-scale fermentations of each target TS were conducted as described above, but in 6–12 L of culture. To isolate the diterpene products, cells were harvested by centrifugation at 5000 × g for 10 min at 25 °C and transferred to a glass beaker. The cell pellets were extracted with acetone (2× the pellet volume), and the organic phase was extracted three times with an equal volume of hexanes. The combined hexanes extracts were concentrated in vacuo at room temperature. The resulting extract was redissolved in hexanes and purified by silica gel chromatography with isocratic hexanes as the mobile phase. Fractions containing the target products were combined and further purified by preparative LC. For preparative LC, products were detected at 210 nm using a linear gradient at a flow rate of 20 mL min−1: 5% acetonitrile/water (0–5 min); 5% to 95% acetonitrile/water (5–15 min, 9% min−1); 95% acetonitrile/water (hold 65 min). Isolated yields and spectroscopic data of 1–28 are reported in Supplementary Notes 2 and 3, respectively.

GC-MS analysis

For GC-MS analysis, purified diterpenes were dissolved in hexanes at 0.1 mg mL−1. The ion source, transfer line, and injection port were each set to 250 °C, and the carrier gas flow rate was 1.2 mL min−1. Products were analyzed by electron ionization at 70 eV over a mass scan range of m/z 30–500 at a resolution of 30,000, with the following oven temperature program: 50 °C (0–3 min), then a ramp to 280 °C at 10 °C min−1 (hold 5 min). Kovats retention index (RI) values were calculated for all products using these GC-MS conditions by comparison with C7–C30 saturated alkanes (Sigma).
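For temperature-programmed runs such as this one, retention indices are usually computed with the linear (van den Dool and Kratz) form of the Kovats index, RI = 100 [n + (t_x − t_n)/(t_{n+1} − t_n)], where t_n and t_{n+1} are the retention times of the n-alkanes eluting just before and after the analyte. The Python sketch below assumes that form (the isothermal Kovats index uses logarithms of adjusted retention times instead); the function name and the toy alkane retention times are hypothetical.

```python
import bisect
from typing import Dict


def linear_retention_index(t_x: float, alkane_times: Dict[int, float]) -> float:
    """Temperature-programmed (linear) retention index of an analyte.

    alkane_times maps n-alkane carbon number -> retention time (min).
    t_x must fall at or after the first alkane and before the last one.
    Bracketing alkanes need not be consecutive; the index is interpolated
    linearly between their carbon numbers.
    """
    ladder = sorted(alkane_times.items(), key=lambda kv: kv[1])
    times = [t for _, t in ladder]
    k = bisect.bisect_right(times, t_x)
    if k == 0 or k == len(ladder):
        raise ValueError("analyte elutes outside the alkane ladder")
    (n, t_n), (n_next, t_next) = ladder[k - 1], ladder[k]
    return 100.0 * (n + (n_next - n) * (t_x - t_n) / (t_next - t_n))


if __name__ == "__main__":
    toy_ladder = {18: 20.0, 19: 21.0, 20: 22.0}  # hypothetical retention times (min)
    print(linear_retention_index(20.5, toy_ladder))  # 1850.0
```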

VCD measurements and calculations

To a small vial containing 2.5–44 mg of diterpene was added 100–125 μL of CDCl3. The CDCl3 (Cambridge Isotope Labs Silver Foil) was passed through a small plug of activated basic alumina immediately before use. The resulting solution was transferred to a liquid IR cell (BaF2, 100 μm path length) and placed in the measurement chamber. The spectrometer was set to 4 cm−1 resolution with the maximum frequency of both PEMs (1 and 2) set to 1400 cm−1. The sample was then measured for 4–18 h in 1-h blocks. The IR data from the first block were corrected for solvent and water vapor and then offset to zero at 2000 cm−1. The VCD data blocks were averaged, and a previously measured solvent baseline (18-h average) was subtracted. Finally, the VCD spectrum was offset to zero at 2000 cm−1. The VCD noise data were block-averaged and used without further processing. For detailed methods on VCD calculations and calculation data, see Supplementary Method 1 and Supplementary Data 4, respectively.
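The VCD post-processing steps (block averaging, solvent-baseline subtraction, and a zero offset at 2000 cm−1) map onto a few array operations. The NumPy sketch below illustrates them under the assumption that all blocks and the baseline share one wavenumber grid; it is not the instrument vendor's software, and the function names are hypothetical.

```python
import numpy as np


def process_vcd(wavenumbers, blocks, solvent_baseline, zero_at=2000.0):
    """Average 1-h VCD blocks, subtract a solvent baseline, and zero at `zero_at`.

    wavenumbers: 1D array (cm^-1); blocks: 2D array (n_blocks, n_points);
    solvent_baseline: 1D array on the same grid.
    """
    wavenumbers = np.asarray(wavenumbers, dtype=float)
    averaged = np.asarray(blocks, dtype=float).mean(axis=0)
    corrected = averaged - np.asarray(solvent_baseline, dtype=float)
    # Use the grid point closest to the requested wavenumber as the zero reference.
    ref = int(np.argmin(np.abs(wavenumbers - zero_at)))
    return corrected - corrected[ref]


def block_noise(noise_blocks):
    """Block-average the VCD noise spectra (no further processing)."""
    return np.asarray(noise_blocks, dtype=float).mean(axis=0)


if __name__ == "__main__":
    wn = np.linspace(900, 2000, 1101)
    rng = np.random.default_rng(0)
    # Toy data: a smooth signal plus random noise in four blocks.
    blocks = 1e-5 * np.sin(wn / 50) + 1e-6 * rng.standard_normal((4, wn.size))
    baseline = np.full(wn.size, 2e-6)
    spectrum = process_vcd(wn, blocks, baseline)
    print(spectrum[-1])  # zero at 2000 cm^-1 by construction
```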

In vitro terpene synthase assays

For each in vitro enzyme reaction, 40 μM TS was incubated for 1 h at 37 °C in 50 mM Tris-HCl, pH 8.0, containing 10 mM GGPP and 10 mM MgCl2, in a total volume of 100 µL. Reactions were quenched with an equal volume of acetonitrile and saturated with solid NaCl. After vortexing and phase separation, the organic phase was analyzed by HPLC and/or GC-MS using the methods described above. For detailed methods on cloning design for protein production and protein purification, see Supplementary Methods 2 and 3, respectively.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.