Introduction

In the realm of flavonoid biosynthesis, more than 9,000 distinct compounds have evolved from a limited set of shared precursors, making flavonoids one of the most chemically diverse families of natural products1. This diversity is largely driven by tailoring enzymes2,3,4, which act downstream of core scaffold assembly to introduce context-dependent modifications such as methylation, hydroxylation, glycosylation, and cyclization. These post-assembly transformations critically shape a compound’s bioactivity, solubility, stability, and subcellular localization5,6. Beyond generating chemical diversity, this tailoring layer also adds a level of regulation over the biological activity of these compounds.

However, this tailoring complexity makes it difficult to reconstruct or engineer the biosynthesis of specific target compounds. Because tailoring enzymes are often promiscuous, their reactions form not a simple linear sequence but a metabolic network in which each step is interwoven with others and highly context-dependent7,8. This makes it particularly challenging to fully elucidate the biosynthetic pathways of specialized metabolites with multiple modifiable sites and obscure tailoring logic. One notable case is glabridin9, a naturally occurring prenylated isoflavan renowned for its skin-brightening, anti-inflammatory, and anti-aging properties9, which commands a multimillion-dollar global market in high-end skin-care cosmetics. The global supply of glabridin currently depends on extraction and purification from the roots of Glycyrrhiza glabra (licorice), with yields as low as 0.08–0.35% of root dry weight10; this reliance on root harvesting contributes to desertification and environmental degradation. Meanwhile, glabridin is difficult to synthesize chemically because of the challenges of stereoselectively constructing the C-3 chiral center and chemoselectively reducing the conjugated systems11. Microbial biosynthesis represents a promising and sustainable alternative; however, although glabridin was isolated from licorice in 1976, its native biosynthetic pathway in G. glabra remains poorly resolved.

In this study, we apply a metabolic pathway search algorithm, integrated with experimental metabolomic data, to comprehensively elucidate and map the maze-like biosynthetic network of glabridin. By leveraging a high-quality genome assembly together with extensive transcriptome datasets of Glycyrrhiza species, we identify all key tailoring enzymes involved in glabridin biosynthesis. We then use these enzymes to reconstruct a multi-route synthetic pathway, in which a single set of promiscuous enzymes supports multiple parallel biosynthetic routes. Through this reconstruction, we uncover distinct biosynthetic modes underlying glabridin formation and ultimately achieve de novo biosynthesis of glabridin in Saccharomyces cerevisiae, paving the way for a sustainable alternative to traditional plant extraction that also supports the conservation of wild plant populations.

Results

Glabridin pathway design and core node identification

To discover the biosynthetic pathway of glabridin, we employed a metabolic pathway search algorithm based on known enzymatic reactions in combination with a retrosynthetic algorithm derived from reaction rules12,13,14 (Supplementary Fig. 1a–c), which enables systematic pathway design by stepwise deconstruction of glabridin (15) into endogenous yeast metabolites. The reaction-rule-based search is designed to enumerate all biochemically plausible routes supported by known enzyme reaction types, including pathways that could represent the biosynthetic route in licorice as well as others that licorice may not actually use. To focus on routes relevant to licorice, the network was further refined by retaining only intermediates detected in licorice (Fig. 1a). Using this curated compound–conversion network, we constructed a directed graph and performed a systematic depth-first traversal, which identified 13 potential biosynthetic pathways to glabridin, with L-phenylalanine (1) as the precursor and 21 intermediate metabolites (Fig. 1a and Supplementary Fig. 1d). Among these, 9 compounds are located at pathway branch nodes, and over 69% of the predicted pathways pass through more than four branch nodes. Such numerous and diverse branch points greatly increase the complexity of glabridin biosynthesis. Interestingly, this complexity is governed by dynamic methylation–demethylation and dihydrofuran cyclization–decyclization processes, in which the order of demethylation and decyclization steps appears to direct metabolic flux through distinct biosynthetic routes.
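The filtering and depth-first enumeration described above can be illustrated with a small sketch. It assumes the curated compound–conversion network is available as a list of directed (substrate, product) edges; the function names, the toy node labels, and the definition of a branch node used here (an intermediate at which enumerated routes diverge or converge) are illustrative assumptions rather than the study's implementation.

```python
from collections import defaultdict


def keep_detected(edges, detected, source, target):
    """Keep only conversions whose intermediates were detected in the plant."""
    allowed = set(detected) | {source, target}
    return [(u, v) for u, v in edges if u in allowed and v in allowed]


def enumerate_paths(edges, source, target):
    """Depth-first enumeration of every simple path from source to target."""
    graph = defaultdict(list)
    for u, v in edges:
        graph[u].append(v)
    paths = []

    def dfs(node, path):
        if node == target:
            paths.append(list(path))
            return
        for nxt in graph[node]:
            if nxt not in path:  # skip revisits so cycles cannot recurse forever
                path.append(nxt)
                dfs(nxt, path)
                path.pop()

    dfs(source, [source])
    return paths


def branch_nodes(paths, source, target):
    """Intermediates at which the enumerated routes diverge or converge."""
    succ, pred = defaultdict(set), defaultdict(set)
    for p in paths:
        for u, v in zip(p, p[1:]):
            succ[u].add(v)
            pred[v].add(u)
    inner = {n for p in paths for n in p} - {source, target}
    return {n for n in inner if len(succ[n]) > 1 or len(pred[n]) > 1}


# Toy network: "P" = precursor, "T" = target, letters = hypothetical intermediates.
edges = [("P", "A"), ("P", "B"), ("A", "C"), ("B", "C"), ("A", "T"), ("C", "T")]
routes = enumerate_paths(keep_detected(edges, {"A", "B", "C"}, "P", "T"), "P", "T")
print(len(routes), sorted(branch_nodes(routes, "P", "T")))  # 3 ['A', 'C']
```

Dropping "B" from the detected set removes the route through it, which mirrors how restricting the network to observed intermediates prunes candidate pathways before traversal.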

Fig. 1: The maze-like glabridin biosynthesis pathway.

a Summary of predicted glabridin biosynthesis. b Metabolomic profiling of different organs and species of Glycyrrhiza, including G. glabra (Gg), G. uralensis (Gu), and G. inflata (Gi). c The key tailoring steps in glabridin biosynthesis. Dashed arrows in (a) and (c) represent uncharacterized steps, gray arrows represent characterized steps, and overlapping arrows represent multi-step reactions. The color bar in (b) indicates the log2-transformed peak area. Metabolite abundance is categorized into four levels based on accumulation in G. glabra roots: high (> 24), medium (> 18), low (> 12), and undetected. The bar chart in the upper right corner of each compound in (a) indicates its detection in G. glabra roots: the number of solid brown bars corresponds to the abundance level, and hollow bars indicate that the compound was not detected. Abbreviations not already defined in the text are as follows: isoflavone synthase (IFS), 2,7,4′-trihydroxyisoflavanone 4′-O-methyltransferase (4′OMT), 2-hydroxyisoflavanone dehydratase (HID), isoflavone 2′-hydroxylase (I2′H), isoflavone reductase (IFR), vestitone reductase (VR), pterocarpan synthase (PTS), and 2′-hydroxydihydrodaidzein synthase (THIS). Source data are provided as a Source Data file.
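The four abundance levels in this caption amount to a thresholding rule on log2-transformed peak areas. The sketch below assumes that signals at or below 12 are treated as undetected and that the number of solid bars (3, 2, 1, 0) follows the levels in order; both are interpretations of the caption, and the names `abundance_level` and `SOLID_BARS` are hypothetical.

```python
import math

# Thresholds on log2-transformed peak area, checked from highest to lowest.
LEVELS = [(24, "high"), (18, "medium"), (12, "low")]
# Assumed mapping from abundance level to the number of solid bars in Fig. 1a.
SOLID_BARS = {"high": 3, "medium": 2, "low": 1, "undetected": 0}


def abundance_level(peak_area):
    """Classify a raw peak area (None or <= 0 means not detected)."""
    if peak_area is None or peak_area <= 0:
        return "undetected"
    log2_area = math.log2(peak_area)
    for threshold, label in LEVELS:
        if log2_area > threshold:
            return label
    return "undetected"  # assumption: log2 area <= 12 is treated as not detected


print(abundance_level(2 ** 20), SOLID_BARS[abundance_level(2 ** 20)])  # medium 2
```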

To evaluate which of the predicted routes are feasible, we conducted metabolomic profiling of three Glycyrrhiza species: G. glabra, G. uralensis, and G. inflata (Fig. 1a, b). The major metabolites (compounds 5, 11, 13, 15, and 20) accumulated predominantly in roots rather than in leaves and stems. Glabridin was abundant only in G. glabra and was detected at low levels in G. inflata. Furthermore, several predicted intermediates (4, 8, 9, 10, 12, and 14) were also identified in G. glabra, supporting their roles as key nodes in the metabolic network15. Node-centric analysis of the metabolomic data thus indicated that most predicted routes pass through key intermediates such as medicarpin (8). To address this complexity, we grouped the unresolved steps in glabridin biosynthesis into four major classes of tailoring reactions: pterocarpan reduction, prenylation, prenyl cyclization, and demethylation (Fig. 1c), which are likely mediated by pterocarpan reductases (PTRs), prenyltransferases (PTs), oxidative cyclases (OCs), and demethylases (DMTs), respectively. PTRs, PTs, and OCs can broaden the range of intermediate metabolites, whereas DMTs catalyze critical branch-forming reactions that direct metabolic flux toward different outcomes. These four tailoring reactions can proceed in different orders through various branch nodes, ultimately converging on glabridin (Fig. 1a).

Systematic omics analysis for enzyme discovery

To enable systematic identification of the genes underlying glabridin biosynthesis, we first generated a high-quality, chromosome-level genome assembly of G. glabra as a foundational resource (Fig. 2a and Supplementary Fig. 2a). Wild G. glabra plants were collected from natural habitats in northwest China for genome sequencing. The assembled genome spans 415 Mb, and annotation predicted 34,941 high-confidence protein-coding genes. This assembly, together with extensive seasonal and organ-specific transcriptome datasets, enabled integrative genome mining and co-expression analyses to identify candidate enzymes involved in glabridin biosynthesis (Fig. 2). Using orthologous gene groupings, we constructed a phylogenetic tree comparing G. glabra with G. inflata, G. uralensis, and other related legume species, which showed that G. glabra is most closely related to G. inflata (Supplementary Fig. 2b, c). We then conducted a systematic genome-wide annotation focused on candidate tailoring genes. PTRs, VRs, and IFRs are NADPH-dependent reductases16,17,18 that share similar catalytic mechanisms and high sequence similarity18, so we assigned their functions by phylogenetic analysis (Supplementary Fig. 2d).

Fig. 2: Genomic and transcriptomic analysis of G. glabra resources.

a G. glabra pseudochromosome genome assembly and gene annotation statistics. b Genomic distribution of biosynthetic enzyme candidates for glabridin in G. glabra. The outermost layer indicates the density of candidate genes across the genome (measured as the number of genes per 1 Mb). c The transcriptional profiles of these genes across five organs of G. glabra: tap roots, horizontal roots, stems, leaves and sprouts (121 genes compared across 53 organ samples). The expression levels are represented as Z-scores derived from log2-normalized TPM. A specific cluster enriched in root-expressed genes is highlighted in red. Source data are provided as a Source Data file.
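As a minimal sketch of the Z-score transformation in panel c, assuming a pseudocount of 1 before log2 transformation and the population standard deviation across samples (neither is specified in the caption), each gene's profile is centered and scaled independently:

```python
import math
from statistics import mean, pstdev


def zscore_log2_tpm(tpm_by_gene, pseudocount=1.0):
    """Row-wise Z-scores of log2(TPM + pseudocount) across samples."""
    scores = {}
    for gene, tpm in tpm_by_gene.items():
        logged = [math.log2(v + pseudocount) for v in tpm]
        mu, sd = mean(logged), pstdev(logged)
        # A gene with identical expression in every sample has no spread to scale.
        scores[gene] = [0.0 if sd == 0 else (x - mu) / sd for x in logged]
    return scores


# Hypothetical TPM values for one variable gene and one flat gene across four samples.
z = zscore_log2_tpm({"geneA": [7, 3, 1, 0], "geneB": [5, 5, 5, 5]})
print([round(v, 2) for v in z["geneA"]])  # [1.34, 0.45, -0.45, -1.34]
```

Scaling per gene is what lets a heatmap compare organ patterns between genes whose absolute TPM values differ by orders of magnitude.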

To investigate the transcriptional patterns of genes involved in glabridin biosynthesis, we conducted a large-scale transcriptomic study, sampling different organs of five major Glycyrrhiza species at seven locations and four critical growth stages, for a total of 183 distinct samples (Supplementary Fig. 3a, b). Clustering analysis showed that the organ-specific transcriptome profiles fall into three main clusters: (1) tap roots and horizontal roots, (2) stems and sprouts, and (3) leaves. In contrast, overall transcriptomic differences among species were relatively minor (Supplementary Fig. 3c) and were confined to a limited set of genes. Candidate genes were identified through a multi-layered screening strategy that combined sequence-based functional annotation with subsequent refinement by phylogenetic clustering and conserved-motif analysis (Supplementary Table 1). This integrative process yielded 7 PTRs, 18 PTs, 39 OCs, and 6 DMTs for downstream analysis. These genes were clustered in the genome, mainly on chromosomes 5 and 6 (Fig. 2b)19.

To narrow down the genes involved in glabridin synthesis, we performed hierarchical clustering of transcript co-expression patterns across five organs. This revealed a root-enriched expression cluster containing 1 PTR, 9 PTs, 8 OCs, and 2 DMTs (Fig. 2c and Supplementary Data 1). To refine the candidates further, we used the previously reported key gene GgI2′H1 (ref. 20) as bait for co-expression analysis, which prioritized 1 PTR, 7 PTs, 2 OCs, and 2 DMTs (Table 1).
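Bait-guided co-expression ranking of this kind can be sketched as follows. The sketch assumes Pearson correlation on log-scale expression profiles measured over the same ordered samples; the study's actual correlation measure and cut-offs may differ, and the gene names and values are invented for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either profile is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def rank_by_bait(expr, bait, top=None):
    """Sort candidate genes by correlation with the bait gene, strongest first."""
    scores = [(g, pearson(expr[bait], v)) for g, v in expr.items() if g != bait]
    scores.sort(key=lambda gv: gv[1], reverse=True)
    return scores[:top] if top is not None else scores


# Invented log2 expression profiles over five samples; "bait" stands in for GgI2'H1.
expr = {
    "bait": [1, 2, 3, 4, 5],
    "candA": [2, 4, 6, 8, 10],
    "candB": [5, 4, 3, 2, 1],
    "candC": [1, 3, 2, 5, 4],
}
for gene, r in rank_by_bait(expr, "bait"):
    print(gene, round(r, 2))
```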

Table 1 Functional annotation and classification of candidate genes

Stepwise characterization of key glabridin biosynthetic enzymes

To characterize the first tailoring enzyme in the glabridin pathway, we focused on pterocarpan reductase (PTR), which catalyzes reductive cleavage of the dihydrofuran ring in the potential core scaffolds medicarpin (8) and demethylmedicarpin (12) (Fig. 3). We first tested GgPTR1, the PTR candidate identified through co-expression analysis. This enzyme converts medicarpin (8) and demethylmedicarpin (12) into vestitol (9) and demethylvestitol (13), with conversion rates of 90.6 ± 3.6% and 52.7 ± 0.9%, respectively (Fig. 3a), supporting co-expression analysis as an effective strategy for prioritizing candidate enzymes. To search for more efficient PTRs, we evaluated the remaining 6 PTR genes. Among them, the homolog GgPTR5 showed detectable but much weaker activity than GgPTR1. Because of its sequence similarity to GgPTR1, we attempted to improve GgPTR5 by introducing amino-acid substitutions in its substrate-binding pocket; although these mutations slightly increased activity, none brought GgPTR5 close to the activity of GgPTR1 (Supplementary Fig. 4a, b). Molecular dynamics simulations further revealed unstable substrate binding in GgPTR5, suggesting that its low activity arises from overall structural limitations rather than from defects at individual residues (Supplementary Fig. 4c, d). Notably, GgPTR2, which has a different expression pattern with high transcript levels in stems and leaves, is also active, with conversion rates of 58.8 ± 7.6% for medicarpin (8) and 4.7 ± 2.5% for demethylmedicarpin (12) (Fig. 3a). In addition, although it shares 71% sequence similarity with GgPTR1, GgPTR2 converts licoagrocarpin (20) to 4′-O-methylpreglabridin (10), whereas GgPTR1 shows no activity toward this substrate, indicating selective recognition of prenylated substrates by GgPTR2 (Fig. 3b). In vitro assays with GgPTR1 and GgPTR2 (Supplementary Fig. 5a–e) showed clear catalytic activity of both enzymes toward their respective substrates, consistent with the in vivo yeast assays and further confirming their functions. To explore the functional divergence, we modeled both enzymes in complex with medicarpin (8) and licoagrocarpin (20) (Fig. 3c and Supplementary Fig. 6). Electrostatic potential maps21 indicated that both enzymes possess a positively charged catalytic pocket that attracts the electronegative phenolic hydroxyl groups of the substrates. GgPTR2 bound 8 and 20 in similar orientations, consistent with the binding of 8 in GgPTR1. In contrast, compound 20 adopted a 180-degree flipped orientation in GgPTR1, which may explain its lack of catalytic activity toward this substrate (Fig. 3c).

Fig. 3: Characterization of the four tailoring enzymes.

a Catalytic activity characterization of PTRs. The organ-specific expression heatmap shows transcriptome data from tap root, horizontal root, stem, and leaf; expression levels are log2-transformed and normalized as Z-scores. b Functional characterization of PTRs catalyzing the conversion of the prenylated pterocarpan substrate (20). c Structural models of GgPTR1 and GgPTR2 in complex with substrates (8 and 20) generated using AlphaFold3 (ref. 61). Boxed regions highlight major differences in the catalytic pocket. Electrostatic potential was calculated using the PyMOL APBS tools; the Connolly surface is shown within ± 5 eV. Functional characterization and phylogenetic analysis of (d) PTs and (e) OCs. f Substrate profiling of DMTs catalyzing the demethylation of intermediate compounds. g Yeast confocal microscopy showing the subcellular localization of mCherry-GgPT1, GgOC1-mCherry, GgPTR1-mCherry, NhPDA1-mCherry, and GgDMT1-mCherry. Scale bar, 5 μm. Images are representative of three independent experiments. h Genomic locations of GgPT1, GgOC1, GgPTR1, and GgDMT1 across three Glycyrrhiza species and their corresponding evolutionary relationships. i Comparison of root transcript levels of GgPT1, GgOC1, GgPTR1, and GgDMT1 across three Glycyrrhiza species (G. glabra, n = 26; G. inflata, n = 12; G. uralensis, n = 39). Box plots in (i): lines are median, box limits are quartiles 1 and 3, whiskers are 1.5 × interquartile range, and points are outliers. Statistical significance in (i) was assessed using a two-sided Student’s t test. Exact P values are shown in the figure. The bar charts in (a) and (d–f) show molar conversion rates and are presented as mean ± SD from n = 3 independent biological replicates. Functional assays were performed by feeding 20 mg/L substrate in (a) and 50 mg/L substrate in (d), (e), and (f), and conversion rates were determined based on the relative peak areas of the corresponding products. Source data are provided as a Source Data file.
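The caption states that conversion rates were derived from relative product peak areas. A minimal sketch, assuming conversion is calculated as product / (product + residual substrate) after optional correction by detector response factors (the `rf` parameters are hypothetical), and summarizing replicates as mean ± sample SD:

```python
from statistics import mean, stdev


def conversion_rate(product_area, substrate_area, product_rf=1.0, substrate_rf=1.0):
    """Percent molar conversion from HPLC peak areas.

    rf values are detector response factors (peak area per unit amount); both
    default to 1, i.e. equal molar response is assumed unless calibrated.
    """
    product = product_area / product_rf
    substrate = substrate_area / substrate_rf
    return 100.0 * product / (product + substrate)


def mean_sd(values):
    """Mean and sample standard deviation over biological replicates."""
    return mean(values), stdev(values)


# Hypothetical (product, residual substrate) peak areas from three replicates.
replicates = [(90.0, 10.0), (80.0, 20.0), (85.0, 15.0)]
rates = [conversion_rate(p, s) for p, s in replicates]
m, sd = mean_sd(rates)
print(f"{m:.1f} ± {sd:.1f}%")  # 85.0 ± 5.0%
```

Without calibrated response factors, this ratio is only a proxy for molar conversion, because substrate and product can absorb differently at the detection wavelength.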

Next, to identify G. glabra prenyltransferases that act on isoflavan substrates, we tested the seven PT candidates and found that GgPT1 exhibits detectable activity, converting compounds 9 and 13 into 4′-O-methylpreglabridin (10) and preglabridin (14), respectively (Fig. 3d). The other PT candidates, despite some phylogenetic similarity to GgPT1, showed no catalytic activity toward the tested substrates (Fig. 3d). Amino-acid comparison revealed seven residues in GgPT1 that are absent from the other GgPTs, and mutational analysis showed that K242 is essential for its catalytic activity (Supplementary Fig. 7; Supplementary Table 2). Moreover, none of the seven GgPTs converted compound 8 to compound 20 (Supplementary Fig. 8), whereas the previously reported enzyme PcM4DT catalyzed this reaction (ref. 22).

After prenylation, intramolecular cyclization is required to form the dihydrobenzopyran moiety23,24,25,26. Previously reported enzymes that catalyze oxidative cyclization of prenylated isoflavonoids belong to the cytochrome P450 family, such as CYP71D8 and CYP82A2 (refs. 27,28). Co-expression analysis identified two candidate genes, GgOC1 and GgOC2. Interestingly, GgOC1 exhibits an asymmetric co-expression pattern (Supplementary Table 3): GgOC1 expression correlates with the transcriptional activity of upstream genes (e.g., GgI2′H1), whereas the upstream genes show a relatively weak association with GgOC1 expression. This suggests that GgOC1 is the more likely of the two candidates to participate in glabridin biosynthesis. Indeed, functional assays showed that only GgOC1 catalyzes the oxidative cyclization of 4′-O-methylpreglabridin (10) or preglabridin (14), yielding 4′-O-methylglabridin (11) or glabridin (15), respectively (Fig. 3e). We also tested GgOC3 and several other berberine bridge enzyme (BBE)-like enzymes and found that they were inactive as well. Mutational analysis showed that residues I293 and T295 of GgOC1 are essential for catalysis, explaining the loss of activity in both GgOC2 and GgOC3 (Supplementary Fig. 9; Supplementary Table 4). The catalytic activity of GgOC1 was further validated in vitro using yeast microsomes (Supplementary Fig. 5f–g), consistent with the in vivo yeast assays. Phylogenetic analysis placed GgOC1 in the BBE family, clustering with the previously reported MaOC1, which catalyzes the oxidative cyclization of prenylated chalcones29.
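Pearson correlation itself is symmetric, so an asymmetric co-expression pattern most plausibly reflects a rank-based measure: gene A can be the top-ranked partner of gene B while B ranks low among A's partners. The sketch below illustrates this effect under that assumption; the gene labels and expression values are invented, and Supplementary Table 3 may use a different metric.

```python
import math


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either profile is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return 0.0 if sxx == 0 or syy == 0 else sxy / math.sqrt(sxx * syy)


def partner_rank(expr, query, target):
    """1-based rank of `target` among all genes ordered by correlation with `query`."""
    ordered = sorted(
        (g for g in expr if g != query),
        key=lambda g: pearson(expr[query], expr[g]),
        reverse=True,
    )
    return ordered.index(target) + 1


# Invented profiles: the bait has two tightly co-expressed upstream partners,
# while the OC candidate tracks the bait only moderately.
expr = {
    "bait": [1, 2, 3, 4, 5],
    "up1": [1, 2, 3, 4, 6],
    "up2": [2, 2, 3, 4, 5],
    "oc": [1, 3, 2, 5, 4],
}
print(partner_rank(expr, "oc", "bait"), partner_rank(expr, "bait", "oc"))  # 1 3
```

The bait is the OC candidate's best partner (rank 1), yet the OC candidate is only third among the bait's partners. A mutual rank (the geometric mean of the two ranks) is one common way to symmetrize such scores.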

To date, no DMTs catalyzing isoflavonoid demethylation have been reported in plants. To find DMTs in licorice, we examined GgDMT1, the second-ranked candidate in the co-expression analysis. It encodes a 2-oxoglutarate-dependent dioxygenase and completely demethylated compound 11 (Fig. 3f). Substrate profiling further showed that GgDMT1 is highly promiscuous: it efficiently demethylates both vestitol (9) and 4′-O-methylpreglabridin (10), shows moderate activity toward formononetin (5), and displays very weak activity on medicarpin (8) (Fig. 3f). The activity of GgDMT1 was further confirmed in vitro using purified protein (Supplementary Fig. 5h–l). By contrast, GgDMT2 showed no detectable activity toward any of the tested methylated substrates (Supplementary Fig. 10), and docking analyses indicated that GgDMT2 misorients the substrate in the active site, impairing productive catalysis. Likewise, the N303R substitution in GgDMT1, which affects substrate positioning, abolished its activity (Supplementary Fig. 11). To further expand the enzyme toolbox for specific 4′-O-demethylation, we screened a panel of previously reported demethylases (Supplementary Table 5), including cytochrome P450 enzymes30,31,32,33,34 of plant, fungal, and human origin. Among these, only the fungal enzyme NhPDA1 showed demethylation activity, and it was likewise promiscuous toward multiple 4′-O-methylated substrates (5, 8, 9, 10, and 11) (Fig. 3f). These observations are consistent with prior reports that demethylases often exhibit broad substrate specificity29,34,35.

The strong substrate promiscuity and high catalytic activity of GgDMT1 raise the question of why methylated intermediates still accumulate in G. glabra. Analysis of seasonal transcriptional patterns revealed opposing trends in O-methyltransferase and demethylase expression, suggesting that precursor synthesis and final product accumulation may occur at temporally distinct stages (Supplementary Fig. 12). Furthermore, subcellular localization studies showed that GgDMT1 is cytoplasmic, whereas GgOC1 and GgPT1 co-localized with Sec61–mNeonGreen (ref. 36), indicating that both enzymes are localized to the endoplasmic reticulum (Fig. 3g, Supplementary Fig. 13, Supplementary Fig. 14). This spatial separation suggests that the four tailoring enzymes may be compartmentalized, potentially within specific organelles, thereby facilitating intermediate segregation and stepwise regulation of the biosynthetic pathway37,38,39.

In addition, we note that throughout the pathway, methylation occurs first and is later followed by demethylation. This raises another question: is such complexity essential for the pathway? When we examined alternative routes that bypass methylation, we found that the I2′H1-IFR1-VR1 module can catalyze the conversion of intermediate 4 to 12, but with extremely low efficiency (Supplementary Fig. 15a, b). Molecular dynamics simulations indicated that compound 17 binds VR less stably than compound 7 (Supplementary Fig. 15c, d). These results suggest that methylation steers metabolic flux away from the inefficient non-methylated routes (P2–P4) and toward the efficient methylated routes (P5–P13).

Interestingly, when we searched the other two Glycyrrhiza species for these key enzymes, we found that both harbor homologs with > 93% protein sequence identity at relatively conserved genomic loci (Fig. 3h, Supplementary Fig. 16 and Supplementary Data 2). Comparing transcript levels of the four key enzymes across the three species, we found that PT expression in G. glabra is significantly higher than in the other two species, and that PTR, OC, and DMT are also expressed at significantly higher levels in G. glabra than in G. inflata (Fig. 3i). These findings help explain the accumulation of glabridin in G. glabra observed in our metabolomic analysis (Fig. 1b).
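Percent identity values such as the > 93% threshold depend on how aligned columns are counted. For illustration, and as an assumption rather than the study's stated method, the sketch below computes identity over aligned columns in which neither sequence carries a gap; other conventions (for example, dividing by the full alignment length) give different numbers.

```python
def percent_identity(aligned_a, aligned_b):
    """Identity (%) over alignment columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must come from the same alignment")
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != "-" and b != "-"]
    if not columns:
        raise ValueError("alignment has no ungapped columns")
    matches = sum(a == b for a, b in columns)
    return 100.0 * matches / len(columns)


# Toy aligned fragments (invented), not real Glycyrrhiza sequences.
print(percent_identity("MKT-LA", "MKSALA"))  # 80.0
```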

De novo biosynthesis of glabridin in engineered yeast

To achieve glabridin biosynthesis in yeast from glucose, we designed the pathway as two modules: a core scaffold module (glucose to 8) and a tailoring module (8 to 15) (Fig. 4a). We first assembled the tailoring module using the characterized enzymes. Because all of the enzymes involved (PTR, PT, OC, and DMT) exhibit broad substrate promiscuity, we reconstructed the metabolic landscape from compound 8 to 15 and uncovered a complex, interconnected tailoring network. Depending on whether dihydrofuran decyclization (PTR) precedes or follows prenylation (PT), this network supports at least two major tailoring routes: PT-PTR-OC and PTR-PT-OC. We then explored different enzyme combinations to further unravel this “tailoring maze” and achieve glabridin production.
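The tailoring network can be encoded as enzyme-labeled conversions and walked in the same depth-first manner as the pathway search. The sketch below includes only the conversions reported in this section (compound numbers as in the text) and enumerates every route from medicarpin (8) to glabridin (15). It is a simplified illustration that ignores reaction rates and any conversions not described here.

```python
from collections import defaultdict

# (substrate, product, enzyme class) for conversions reported in this section.
REACTIONS = [
    (8, 9, "PTR"), (12, 13, "PTR"), (20, 10, "PTR"),    # GgPTR1 / GgPTR2
    (9, 10, "PT"), (13, 14, "PT"), (8, 20, "PT"),       # GgPT1 / PcM4DT
    (10, 11, "OC"), (14, 15, "OC"),                     # GgOC1
    (9, 13, "DMT"), (10, 14, "DMT"), (11, 15, "DMT"),   # GgDMT1 / NhPDA1
    (8, 12, "DMT"),                                     # NhPDA1 (GgDMT1 very weak)
]


def tailoring_routes(reactions, start, end):
    """Enumerate every acyclic route from start to end with its enzyme order."""
    graph = defaultdict(list)
    for substrate, product, enzyme in reactions:
        graph[substrate].append((product, enzyme))
    routes = []

    def walk(node, compounds, steps):
        if node == end:
            routes.append((tuple(compounds), tuple(steps)))
            return
        for nxt, enzyme in graph[node]:
            if nxt not in compounds:
                walk(nxt, compounds + [nxt], steps + [enzyme])

    walk(start, [start], [])
    return routes


for compounds, steps in tailoring_routes(REACTIONS, 8, 15):
    print(" -> ".join(map(str, compounds)), "|", "-".join(steps))
```

With these edges the walk returns six routes. Four place reduction (PTR) before prenylation (PT), and two place prenylation first, so the encoding recovers both the PTR-PT-OC and PT-PTR-OC modes with demethylation inserted at different positions.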

Fig. 4: Glabridin biosynthesis module assembly and de novo production in yeast.

a Scheme of the glabridin biosynthetic pathway reconstructed in S. cerevisiae. b Stepwise reconstitution of the glabridin biosynthetic pathway (PT–PTR–OC). Compound 8 (20 mg/L) and compound 12 (10 mg/L) were fed separately to engineered yeast strains expressing the PcM4DT–GgPTR2–GgOC1 combination. Squares at the bottom indicate enzyme combinations: gray, enzymes present in the chassis; blue, newly introduced enzymes; white, enzymes not introduced. c Ladder-like biosynthetic network of glabridin (PTR–PT–OC–DMT). d, e Metabolic dynamics of the ladder-like glabridin biosynthetic network incorporating either (d) NhPDA1 or (e) GgDMT1. Only the dominant conversions are shown because competing branches do not accumulate detectable intermediates under these pathway conditions. f Extracellular distribution ratio of key intermediates in the ladder-like glabridin biosynthetic network. Box plots in (f): lines are median, box limits are quartiles 1 and 3, whiskers are 1.5 × interquartile range, and points are outliers. Sample sizes: compound 9, n = 30; compound 13, n = 45; compound 10, n = 11; compound 14, n = 40; compound 11, n = 9; compound 15, n = 9. Each data point represents an independent biological replicate. Statistical significance was assessed using a two-sided Kruskal–Wallis test followed by Dunn’s multiple comparisons test. Exact P-values are shown in the figure. g Metabolic simulation of the multi-route biosynthetic network and the single-route design. h Metabolic simulation of the induced pathway. i De novo biosynthesis of glabridin in engineered S. cerevisiae. Seven genes (GgPAL1, GgC4H1, Gg4CL1, GgCHS1, GgCHR1, GgCHI1, and StsTAL) were incorporated to drive liquiritigenin biosynthesis from glucose. Liquiritigenin was subsequently converted to formononetin via three additional enzymes: GgIFS1, GgHID1, and GgOMT1. To complete formation of the core scaffold medicarpin, four more enzymes (GgI2′H1, GgIFR1, GgVR1, and GgPTS1) were expressed in yeast. Data shown in (d), (e), and (i) are presented as mean ± SD from n = 3 independent biological replicates. Source data are provided as a Source Data file.

We first tested the PcM4DT-GgPTR2-GgOC1 combination, which follows the expected PT-PTR-OC catalytic sequence and successively converts 8 into 11, or 12 into 15 (Fig. 4b, P9–P13). Owing to the promiscuity of GgPTR2, side products such as 9 and 13 were also detected (Supplementary Fig. 17a, b). In this design, intermediate 8 accumulated, indicating that the prenyltransfer reaction is inefficient, likely because of a limited supply of the prenyl donor dimethylallyl diphosphate (DMAPP). To address this, we dynamically regulated the ERG20 gene and overexpressed key rate-limiting enzymes (tHMG1 and IDI1) to enhance DMAPP supply (Supplementary Fig. 17c).

In an alternative configuration (PTR-PT-OC), GgPTR1, GgPT1, and GgOC1 were assembled into a single module. When supplemented with an additional demethylation step, this module generated a ladder-like metabolic network (Fig. 4c, P5–P8). In this architecture, the promiscuous enzymes (PTR, PT, and OC) act on different intermediates along two parallel metabolic routes, which are further interconnected by DMT, resulting in a flexible, interwoven biosynthetic network. The choice of DMT strongly influenced the network’s behavior. With NhPDA1, a moderately active DMT, metabolic flux is distributed across multiple routes, forming a multi-route architecture in which various intermediates converge on glabridin as the final product (Fig. 4d, P5–P7). This metabolite pattern closely resembles that detected in G. glabra (Fig. 1a, b), suggesting that G. glabra may also use a multi-route architecture. Although NhPDA1 can convert 8 to 12, this route was not observed because GgPTR1 rapidly converted 8 to 9. Flux preferentially flowed toward compound 10, while conversion to 13 remained low. After 48 hours, demethylation of 10 became dominant, initiating glabridin production, and this setup yielded 0.9 mg/L of glabridin (Fig. 4d). However, compound 14 accumulated over time, indicating a bottleneck at the OC step. In contrast, GgDMT1, with its exceptionally high demethylation activity, drives flux predominantly from compound 9 to 13, effectively creating a streamlined, single-route pathway (Fig. 4e, P7). In this mode, 9 is consumed preferentially, so intermediates 10, 11, and 12 do not accumulate to detectable levels. Surprisingly, this simplified single-route design achieved much lower glabridin productivity than the complex, multi-route “ladder” network (Fig. 4).

We analyzed the secretion of intermediates during fermentation (Fig. 4f). Among them, compound 13, a key metabolic node, showed the highest extracellular ratio, reaching 88.4 ± 4.2%. This extensive leakage explains the lower overall yield of the simplified single-route design. By contrast, despite strong efflux of 9 and 13, the multi-route network remained more robust than the single-route design. Overall, prenylation reduced extracellular secretion owing to increased hydrophobicity, whereas demethylation promoted secretion into the medium. This suggests that both methylation and prenylation act as self-retention strategies that minimize the loss of intermediates through secretion. To test these observations, we built a metabolic flux model, which reproduced the experimental trends and further supports the observed network behavior (Supplementary Fig. 18a and Fig. 4g). Interestingly, the model also indicates that shifting the demethylation step closer to the end of the pathway could substantially improve glabridin production (Fig. 4h and Supplementary Fig. 18b, c).
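A flux model of this kind can be sketched with first-order mass-action kinetics plus a secretion term for each compound. The rate constants below are invented placeholders, and the network is reduced to the ladder starting at vestitol (9). The extracellular ratio is taken here as medium / (medium + cell), and explicit Euler integration is used for simplicity. This is not the study's model. It only illustrates how secretion losses compete with conversion when the DMT rate is varied.

```python
# Ladder network from vestitol (9): (substrate, product, enzyme class).
LADDER = [
    (9, 10, "PT"), (13, 14, "PT"),
    (10, 11, "OC"), (14, 15, "OC"),
    (9, 13, "DMT"), (10, 14, "DMT"), (11, 15, "DMT"),
]
COMPOUNDS = (9, 10, 11, 13, 14, 15)


def simulate(k, k_sec, c0=1.0, t_end=96.0, dt=0.01):
    """Explicit-Euler integration of first-order conversion plus secretion.

    k: rate constant per enzyme class (1/h); k_sec: secretion constant per compound (1/h).
    Returns intracellular and cumulative extracellular amounts per compound.
    """
    cell = {c: 0.0 for c in COMPOUNDS}
    cell[9] = c0
    medium = {c: 0.0 for c in COMPOUNDS}
    for _ in range(int(round(t_end / dt))):
        delta = {c: 0.0 for c in COMPOUNDS}
        for substrate, product, enzyme in LADDER:
            flux = k[enzyme] * cell[substrate] * dt
            delta[substrate] -= flux
            delta[product] += flux
        for c, rate in k_sec.items():
            out = rate * cell[c] * dt
            delta[c] -= out
            medium[c] += out
        for c in COMPOUNDS:
            cell[c] += delta[c]
    return cell, medium


def extracellular_ratio(cell, medium, c):
    """Fraction of a compound found in the medium."""
    total = cell[c] + medium[c]
    return medium[c] / total if total else 0.0


# Placeholder constants: demethylated, non-prenylated 13 leaks fastest.
K_SEC = {9: 0.05, 10: 0.01, 11: 0.005, 13: 0.3, 14: 0.03, 15: 0.02}
for label, k_dmt in (("moderate DMT", 0.05), ("strong DMT", 1.0)):
    cell, medium = simulate({"PT": 0.2, "OC": 0.05, "DMT": k_dmt}, K_SEC)
    print(label, round(cell[15] + medium[15], 3),
          round(extracellular_ratio(cell, medium, 13), 2))
```

The step size is chosen so that each compound loses well under its full amount per step, which keeps every pool non-negative and conserves total mass between cells and medium.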

To achieve de novo biosynthesis of glabridin in S. cerevisiae, 14 enzymes were introduced to establish the core scaffold module (glucose to compound 8), enabling stepwise biosynthesis of liquiritigenin, formononetin, and ultimately medicarpin (Supplementary Fig. 19a, b). After the core pathway was established, the tailoring module (GgPTR1-GgPT1-GgOC1-GgDMT1) was introduced, with GgDMT1 controlled by the cyanamide-inducible DDI2 promoter (P5). To relieve further limitations in de novo glabridin biosynthesis, rate-limiting enzymes were overexpressed and NADPH supply was enhanced (Supplementary Fig. 19c). The fully engineered yeast strain produced glabridin de novo in shake-flask fermentation, reaching a final titer of 0.5 mg/L (Fig. 4i and Supplementary Fig. 19d, e).

Discussion

In this study, we combined a high-quality genome assembly with integrative transcriptome analyses to systematically map the maze-like biosynthetic network of glabridin and achieve its de novo production from glucose in S. cerevisiae. To our knowledge, this work provides a complete biosynthetic route to glabridin and achieves its total microbial biosynthesis. Within this pathway, we uncovered several transformations, including a plant demethylation step acting on O-methylated isoflavonoids. By assembling key tailoring enzymes, we revealed a ladder-like network architecture, mediated by a set of promiscuous enzymes, that enhances pathway robustness through multiple interconnected routes. Unlike classical linear metabolic pathways, the multi-route design tolerates fluctuations in metabolic flux40,41, contributing to improved overall productivity. In addition, contrary to the conventional expectation that plant enzymes eliminate the need for protecting groups during secondary metabolite biosynthesis, we find that multistep protection and deprotection appear to be essential for the synthesis of post-tailored isoflavan derivatives such as glabridin. This unexpected reliance on reversible modifications, such as methylation and demethylation, reveals a further layer of regulation in plant biosynthetic pathways. Elucidating these dynamic processes may enable more efficient biosynthetic production of valuable natural products and provide broader insights into the complexity and control of plant metabolic networks. Moreover, although our investigation used yeast as a model system, it also provides insights into the evolutionary adaptation of biosynthetic pathways in Glycyrrhiza species. The observed metabolic promiscuity and redundancy may represent an evolutionary strategy to maximize ecological adaptability while minimizing genetic burden, using a limited number of gene copies to ensure metabolic resilience and flexibility across diverse environmental conditions42,43.

While we have reconstructed the major biosynthetic routes, several challenges remain for further pathway optimization. The oxidative cyclization step catalyzed by GgOC1 exhibited low catalytic efficiency, emerging as the rate-limiting step for glabridin biosynthesis. In addition, although we attempted to reconstitute GgPT1 activity in vitro using yeast microsomes, no measurable activity was detected. As multi-pass aromatic prenyltransferases are prone to loss of activity outside their native membrane environment44, this likely reflects limitations of the current reconstitution system rather than a lack of catalytic function.

In addition, because our pathway search relies on known types of enzymatic reactions, the inferred routes reflect reasonable biosynthetic possibilities rather than a definitive reconstruction of the native pathway in licorice. Licorice may employ enzymes or reaction chemistries that have not yet been described, and such steps would not be captured by the computational model. Establishing whether the key enzymes identified here function in the endogenous pathway will ultimately require in planta genetic evidence. Although gene-silencing approaches have not yet been established in licorice for this purpose, developing such genetic tools is an important direction for future work, as it would enable targeted knockdown or knockout of PTR, PT, OC and DMT genes and direct evaluation of their roles in glabridin biosynthesis.

Moreover, although we established two major biosynthetic routes, certain branches of the network (Fig. 1 and Supplementary Fig. 1d) remain incomplete because the corresponding enzymes are missing or unidentified. Validating these reactions would further strengthen the proposed maze-like glabridin network; however, our inability to perform these assays reflects experimental constraints rather than evidence against the existence of these branches. Notably, the presence of key intermediates such as compounds 19 and 21 in G. glabra supports the biochemical plausibility of these branches45,46. Future efforts to discover and integrate these enzymes could fully realize the potential of the glabridin biosynthetic network, enabling more efficient and versatile production of glabridin and structurally related derivatives.

Overall, our work not only provides a blueprint for glabridin biosynthesis but also highlights a generalizable principle: maze-like, multi-route architectures can endow microbial cell factories with enhanced metabolic resilience, offering a strategy for the synthesis of complex plant natural products. In addition, such scalable and sustainable microbial production systems offer a highly promising alternative to direct plant extraction, which is essential for conserving rare wild G. glabra.

Methods

Chemical standards

All chemical standards used in this study (Supplementary Table 6) were obtained either from commercial suppliers or through custom synthesis. Preglabridin was obtained from TargetMol (Boston, MA, USA). 4′-O-methylglabridin and 4′-O-methylpreglabridin were custom-synthesized by Acme Biopharma Co., Ltd. (Shanghai, China). All other chemicals were purchased from YuanYe Biotechnology Co., Ltd. (Shanghai, China).

Retrobiosynthesis analysis

A bio-retrosynthesis algorithm was employed to predict the biosynthetic pathway of glabridin. First, the molecular fingerprint similarity method (FingerprintSimilarity) from the RDKit package in Python was used to identify structural analogs of glabridin among metabolites in the KEGG database (https://www.genome.jp/kegg/pathway.html). Second, the acyclic path search method (shortest_simple_paths function) from the NetworkX package in Python was applied to explore potential biosynthetic routes to these structural analogs. The pathway origins were set as both the identified glabridin analogs and 2808 endogenous metabolites from the S. cerevisiae genome-scale metabolic model YEAST8 (ref. 47). Subsequently, RetroRules12 (containing 65,688 reaction rules with diameters of 4 and 6) was used for retrosynthetic pathway prediction from glabridin to the structural analogs identified in the previous step. To address combinatorial explosion, Monte Carlo Tree Search (MCTS) was employed to establish a retrosynthetic route from glabridin to its structural analog medicarpin. This route was then expanded through interactive, human-guided single-step retrosynthetic analysis into a comprehensive retrosynthetic network.
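The similarity screen in the first step can be illustrated with a minimal sketch. RDKit's DataStructs.FingerprintSimilarity uses Tanimoto similarity by default; to keep the example runnable without RDKit, the metric is re-implemented below on fingerprints stored as sets of on-bit indices. The fingerprints, compound names and the 0.5 cut-off are hypothetical and do not reflect the settings used in this study.

```python
from typing import Dict, List, Set, Tuple


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto similarity of two binary fingerprints stored as sets of on-bit indices."""
    shared = len(a & b)
    union = len(a) + len(b) - shared
    return shared / union if union else 0.0


def rank_analogs(query: Set[int], library: Dict[str, Set[int]],
                 threshold: float = 0.5) -> List[Tuple[str, float]]:
    """Return library entries with similarity >= threshold, most similar first."""
    scored = [(name, tanimoto(query, fp)) for name, fp in library.items()]
    hits = [(name, score) for name, score in scored if score >= threshold]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Hypothetical toy fingerprints; real ones would be RDKit bit vectors (e.g. Morgan).
query_fp = {1, 4, 9, 15, 22, 31}
library = {
    "cpd_A": {1, 4, 9, 15, 22},
    "cpd_B": {1, 40, 41},
    "cpd_C": {1, 4, 9, 15, 22, 31, 50},
}
for name, score in rank_analogs(query_fp, library):
    print(f"{name}\t{score:.3f}")
```

Only the ranking logic is shown here; in practice the fingerprint type and radius strongly influence which KEGG metabolites pass the threshold.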

Pathway screening and curation

A comprehensive retrosynthesis-guided reaction network was generated through systematic reaction-rule expansion and MCTS-assisted manual extension, as described above. From this exhaustive compound-reaction space, we manually curated a subset of intermediates and transformations based on known or predictable enzymatic logic in plant specialized metabolism. Only reactions consistent with established biosynthetic mechanisms (e.g., O-methylation, hydroxylation, reduction, prenylation, cyclization, demethylation) were retained. To improve biochemical relevance, we further required that intermediates had been detected in or reported from licorice. Application of this criterion yielded the refined network. To enumerate all theoretically feasible routes in the network, we constructed a directed graph from its curated intermediates and transformations. To avoid redundant branching near the early-stage precursor 4, the conversion from 4 to 5 was excluded during graph construction. We then performed a depth-first search to traverse all possible paths from compound 1 to the final product 15, resulting in 13 candidate pathways.
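The exhaustive route enumeration can be sketched as a recursive depth-first search over a directed graph, with one edge removed beforehand in the same way that the 4 → 5 conversion was excluded. The toy adjacency list below is hypothetical and is not the curated glabridin network; NetworkX offers an equivalent enumeration through networkx.all_simple_paths.

```python
from typing import Dict, FrozenSet, List, Tuple

Edge = Tuple[int, int]


def enumerate_routes(graph: Dict[int, List[int]], source: int, target: int,
                     excluded: FrozenSet[Edge] = frozenset()) -> List[List[int]]:
    """Depth-first enumeration of all simple (acyclic) paths from source to target."""
    routes: List[List[int]] = []

    def dfs(node: int, path: List[int]) -> None:
        if node == target:
            routes.append(list(path))
            return
        for nxt in graph.get(node, []):
            # Skip edges removed from the graph and nodes already on the path (cycles).
            if (node, nxt) in excluded or nxt in path:
                continue
            path.append(nxt)
            dfs(nxt, path)
            path.pop()

    dfs(source, [source])
    return routes


# Hypothetical toy network (NOT the curated glabridin network); nodes are compound numbers.
toy = {1: [2], 2: [3], 3: [4], 4: [5, 6], 5: [7], 6: [7, 8], 7: [15], 8: [15]}
for route in enumerate_routes(toy, 1, 15, excluded=frozenset({(4, 5)})):
    print(" -> ".join(map(str, route)))
```

Removing an edge before the search, rather than filtering paths afterwards, keeps the enumeration from ever expanding the excluded branch.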

Genome sequencing, assembly and annotation

Samples of G. glabra were collected from the Alar region of Xinjiang, China. A PacBio sequencing library was constructed and sequenced on six SMRT cells on the PacBio Sequel platform (Pacific Biosciences) by BGI Genomics Co., Ltd. The long-read (third-generation) data were first corrected using Canu (v2.2) and then assembled with Smartdenovo (v1.0.0) to obtain initial contigs. To improve assembly quality, the initial contigs were sequentially polished with Racon (v1.4.3) using the long-read data and with Pilon (v1.24) using short-read (second-generation) sequencing data, yielding the polished preliminary assembly. Redundant haplotypic regions were removed using Purge Haplotigs (v1.1.2), and genome completeness was assessed using BUSCO (v5.8.3). A Hi-C library was also sequenced, and the Hi-C linkage signals were used to orient, cluster and anchor the contigs to chromosomes using Juicer (v1.6) and 3d-dna (v201008). Illumina and PacBio reads were mapped to the chromosome-level assembly with Bowtie2 (v2.5.4) and minimap2 (v2.28), respectively, using default parameters48,49. Sequencing coverage was then calculated with the ‘depth’ command of Samtools (v1.18) in ‘-aa’ mode and displayed in a Circos plot with TBtools (v2.007)50,51. Phylogenetic relationships among G. glabra, G. inflata, G. uralensis and thirteen related species in the Leguminosae were inferred from profiles of shared and unique orthologous genes using OrthoFinder (v2.5.4) with default parameters52.
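To summarize per-base output from samtools depth -aa (one line per position: chromosome, 1-based position, depth, including zero-coverage positions) into windows for a Circos track, a minimal stdlib-only sketch could look as follows. The 100 kb default window and the tab-separated three-column input are assumptions; the window size actually used for the plot is not stated.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def windowed_depth(lines: Iterable[str], window: int = 100_000) -> List[Tuple[str, int, int, float]]:
    """Mean depth per fixed-size window from `samtools depth -aa` output (chrom, pos, depth)."""
    sums: Dict[Tuple[str, int], int] = defaultdict(int)
    counts: Dict[Tuple[str, int], int] = defaultdict(int)
    for line in lines:
        chrom, pos, depth = line.rstrip("\n").split("\t")[:3]
        key = (chrom, (int(pos) - 1) // window)  # samtools positions are 1-based
        sums[key] += int(depth)
        counts[key] += 1
    # Emit BED-like rows: chrom, 0-based window start, window end, mean depth.
    return [(chrom, idx * window, (idx + 1) * window, sums[(chrom, idx)] / counts[(chrom, idx)])
            for chrom, idx in sorted(sums)]


example = ["chr1\t1\t10", "chr1\t2\t20", "chr1\t3\t30", "chr1\t4\t50"]
print(windowed_depth(example, window=2))
```

Because -aa reports every position, including those with zero depth, dividing by the number of lines gives the true mean depth per window.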

To annotate the SDR family enzymes identified from licorice, enzymes from the SDR406A family were retrieved from the UniProt database (https://www.uniprot.org). The BLAST tool (https://blast.ncbi.nlm.nih.gov/Blast.cgi) was used to search for homologs of the target enzymes. From the identified candidates, enzymes with known functions were selected, and a multiple sequence alignment was performed using MUSCLE53 (version 5.1). A maximum-likelihood (ML) phylogenetic tree was constructed using IQ-TREE54 (version 2.1.4-beta) under the JTT (Jones-Taylor-Thornton) amino acid substitution model, with 1000 ultrafast bootstrap55 replicates and an additional 1000 approximate likelihood ratio tests to assess branch support. The final tree was visualized using the iTOL web server (https://itol.embl.de), retaining only the clades that contain the enzymes to be annotated. Branch lengths represent evolutionary distances.
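The alignment and tree-inference steps could be scripted as below. This sketch assumes MUSCLE v5 (-align/-output) and IQ-TREE 2 command-line syntax, where -B sets ultrafast bootstrap replicates and --alrt sets SH-aLRT replicates; the executable name (iqtree2 vs iqtree) and all file names are hypothetical and depend on the local installation. The commands are only constructed and printed, not executed.

```python
import shlex
from typing import List


def muscle_align_cmd(fasta: str, aligned: str) -> List[str]:
    # MUSCLE v5 syntax: muscle -align <input.fa> -output <output.afa>
    return ["muscle", "-align", fasta, "-output", aligned]


def iqtree_cmd(alignment: str, prefix: str, ufboot: int = 1000, alrt: int = 1000) -> List[str]:
    # IQ-TREE 2: -m fixes the substitution model, -B sets ultrafast bootstrap replicates,
    # --alrt sets SH-aLRT replicates, --prefix names the output files.
    return ["iqtree2", "-s", alignment, "-m", "JTT",
            "-B", str(ufboot), "--alrt", str(alrt), "--prefix", prefix]


# Hypothetical file names; pass each list to subprocess.run(cmd, check=True) to execute.
for cmd in (muscle_align_cmd("sdr_homologs.fa", "sdr_homologs.afa"),
            iqtree_cmd("sdr_homologs.afa", "sdr_tree")):
    print(shlex.join(cmd))
```

Keeping commands as argument lists avoids shell-quoting problems when file names contain spaces.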

Total RNA isolation, RNA-Seq and gene expression quantification

Based on the geographical distribution of wild licorice resources, sampling sites were selected in Turpan, Urumqi, Emin, Tumxuk and Aksu in Xinjiang, and Longdong in Gansu. At each location, tap roots, horizontal roots, stems, leaves and sprouts were collected separately from the same individual plant. The collected organs were immediately flash-frozen in liquid nitrogen for subsequent metabolite analysis and RNA extraction. Aliquots for metabolomic analysis were stored on dry ice, whereas aliquots for transcriptomic analysis were preserved in an RNA stabilization solution. Licorice tissues were chopped and ground under liquid nitrogen. Total RNA was extracted using the TranZol kit (TransGen Biotech, China) according to the manufacturer’s instructions and subsequently used to synthesize first-strand complementary DNA (cDNA) with the TransScript® One-Step gDNA Removal and cDNA Synthesis SuperMix (TransGen Biotech, China). Transcriptome sequencing of various G. glabra organs was performed by BGI Genomics Co., Ltd. (Shenzhen, China). RNA quality was assessed using an Agilent 2100 Bioanalyzer, and mRNA was purified using oligo(dT) magnetic beads. The reverse-transcribed cDNA library was sequenced on BGI’s BGISEQ500RS platform (BGI Genomics Co., Ltd). The raw data and measured parameters of the transcriptome have been deposited in the Jingweikaiwu database (kw.ibc.tsinghua.edu.cn). For transcriptome analysis, RNA-seq reads were aligned to the genome sequence using Hisat2 (v2.2.1)56. Gene prediction was then performed using GeMoMa57, leveraging genomic and transcriptomic data from a closely related species, and the completeness of the gene set was evaluated using BUSCO. Transcript abundance was quantified using StringTie58.
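For reference, TPM first normalizes counts by transcript length and then rescales so that all values in a sample sum to one million. StringTie derives TPM from its own coverage estimates, so the sketch below illustrates only the standard definition, using hypothetical read counts and effective lengths.

```python
from typing import Dict


def tpm(counts: Dict[str, float], lengths_bp: Dict[str, float]) -> Dict[str, float]:
    """Transcripts per million: length-normalize first, then scale so values sum to 1e6."""
    rates = {tx: counts[tx] / (lengths_bp[tx] / 1_000) for tx in counts}  # reads per kb
    total = sum(rates.values())
    return {tx: rate / total * 1e6 for tx, rate in rates.items()}


# Hypothetical counts and effective lengths for two transcripts.
values = tpm({"tx1": 100, "tx2": 100}, {"tx1": 1_000, "tx2": 2_000})
print(values)
```

With equal counts, the transcript that is half as long receives twice the TPM, which is why TPM rather than raw counts is compared across genes.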

Candidate gene identification

Candidate pathway genes were identified through a stepwise sequence-based workflow. First, functionally characterized enzymes reported for related reactions were compiled from public databases and used as queries for homology searches against the G. glabra genome. Reductase candidates were drawn from IFR/VR/PLR/PTR families and classified by phylogenetic analysis. Prenyltransferase candidates were filtered by homology to known aromatic PTs and the presence of conserved UbiA-type motifs (NQ--D---D, KD--D--GD). Oxidative cyclase candidates were selected from BBE-like proteins and P450 subfamilies (CYP71D, CYP82A, CYP736). O-demethylase candidates were obtained from Fe2+/2OG-dependent dioxygenases and P450 families implicated in O-demethylation (CYP80G, CYP82D, CYP71BE). Final gene sets are summarized in Supplementary Table 1.
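The motif filter for prenyltransferase candidates can be expressed as simple regular expressions, reading each '-' in the motifs above as "any residue". The test sequences below are short hypothetical fragments, and a real screen would be combined with the homology criteria described above.

```python
import re
from typing import List, Sequence

# Conserved UbiA-type motifs as written in the text; '-' stands for any residue.
UBIA_MOTIFS = ("NQ--D---D", "KD--D--GD")


def motif_regex(motif: str):
    """Compile a dash-wildcard motif into a regular expression ('-' -> '.')."""
    return re.compile(motif.replace("-", "."))


def missing_motifs(protein: str, motifs: Sequence[str] = UBIA_MOTIFS) -> List[str]:
    """Motifs that do not occur anywhere in the protein sequence."""
    return [m for m in motifs if not motif_regex(m).search(protein.upper())]


def passes_ubia_filter(protein: str) -> bool:
    return not missing_motifs(protein)


# Hypothetical fragments: the second one carries E instead of D in the first motif.
print(passes_ubia_filter("MANQAADLLLDAAKDAADAAGDW"))
print(missing_motifs("MANQAAELLLDAAKDAADAAGDW"))
```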

Co-expression analysis of candidate genes

A heatmap was generated to visualize the transcriptional profiles of candidate genes across five organs of G. glabra—tap roots, horizontal roots, stems, leaves and sprouts. The heatmap was constructed using the R package pheatmap (v1.0.12), with row-wise scaling. Expression levels were shown as Z-scores calculated from log2-normalized TPM values. Pearson correlation analysis was performed using GgI2′H1 as the bait gene, based on TPM values across 53 G. glabra transcriptomes covering the five organs.
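A minimal sketch of the transformations behind the heatmap and the bait-gene correlation is given below. It assumes that "log2-normalized TPM" means log2(TPM + 1) and mirrors pheatmap's row scaling, which subtracts the row mean and divides by the sample standard deviation; the TPM values are hypothetical.

```python
import math
import statistics
from typing import List, Sequence


def log2_tpm(tpm_values: Sequence[float], pseudocount: float = 1.0) -> List[float]:
    # Assumption: "log2-normalized TPM" is taken as log2(TPM + 1).
    return [math.log2(v + pseudocount) for v in tpm_values]


def row_zscores(values: Sequence[float]) -> List[float]:
    # Row scaling as in pheatmap(scale = "row"): (x - mean) / sample SD (n - 1).
    mean, sd = statistics.mean(values), statistics.stdev(values)
    return [(v - mean) / sd for v in values]


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length profiles."""
    mx, my = statistics.mean(x), statistics.mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    norm = math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return cov / norm


bait = [12.0, 30.5, 8.1, 2.2, 20.0]      # hypothetical GgI2′H1 TPM values across samples
candidate = [6.0, 15.0, 4.5, 1.0, 9.8]   # hypothetical candidate-gene TPM values
print(round(pearson(bait, candidate), 3))
print([round(z, 2) for z in row_zscores(log2_tpm(bait))])
```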

Plasmid construction

Information on newly identified G. glabra genes is provided in Supplementary Data 3. Other genes encoding biosynthetic enzymes used in this study are listed by source and accession number in Supplementary Table 5. Codon-optimized synthetic genes were synthesized by Tsingke (Beijing, China). A complete list of plasmids used is available in Supplementary Table 7. Two classes of plasmids were constructed: functional assay plasmids and pathway integration plasmids. For functional assays, candidate genes were cloned between a galactose-inducible GAL2 promoter and the ECM10 terminator, flanked by 500–800 bp genomic homology arms to enable targeted integration into the yeast genome. All vector components except for the gene insert were synthesized in advance. A pair of inward-facing BsaI restriction sites was placed between the promoter and terminator to allow seamless gene insertion via Golden Gate assembly. Gene fragments (either PCR-amplified or synthesized) were digested with BsaI and ligated using T4 DNA ligase (both from New England Biolabs). Pathway integration plasmids were constructed using the same BsaI-based cloning strategy. In these plasmids, expression cassettes consisted of GAL2 or GAL1 promoters from various species and native yeast terminators. Each expression cassette was flanked by 200 bp synthetic orthogonal sequences that were manually designed to serve as overlap regions, enabling one-step assembly of multi-gene constructs.
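Because BsaI-based Golden Gate assembly relies on the BsaI sites placed around the insert, any internal BsaI site (GGTCTC, or GAGACC on the opposite strand) would also be cut during assembly. A quick pre-check of each coding sequence could look like the sketch below; the sequence is a hypothetical test string, and whether any inserts in this study required removal of internal sites is not stated.

```python
from typing import List

BSAI_SITE = "GGTCTC"  # BsaI recognition sequence; cleavage at GGTCTC(N1)/(N5)


def reverse_complement(seq: str) -> str:
    return seq.translate(str.maketrans("ACGTacgt", "TGCAtgca"))[::-1]


def internal_bsai_sites(insert: str) -> List[int]:
    """0-based start positions of BsaI sites on either strand of an insert."""
    seq = insert.upper()
    hits = []
    for site in (BSAI_SITE, reverse_complement(BSAI_SITE)):  # GGTCTC and GAGACC
        pos = seq.find(site)
        while pos != -1:
            hits.append(pos)
            pos = seq.find(site, pos + 1)
    return sorted(hits)


print(internal_bsai_sites("ATGGGTCTCAAAGAGACCTAA"))
```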

Yeast strain construction

Yeast genomic modifications were performed using a CRISPR-Cas9-based method59. DNA fragments intended for genomic integration were PCR-amplified using Q5 DNA polymerase (NEB) and co-transformed with a CRISPR-Cas9 plasmid targeting the desired locus. The primers used for yeast strain construction are listed in Supplementary Data 4. For yeast transformation, an overnight culture of the parental strain was inoculated into 4 mL of 2 × YPD medium in a 24-deep-well plate at an initial OD600 of 0.2 and incubated at 30 °C with shaking at 400 rpm until the OD600 reached 1.0. Cells corresponding to 1 OD unit were harvested by centrifugation at 3000 × g for 2 min and washed with an equal volume of sterile water. The cell pellet was then resuspended in a transformation mixture containing 1000 ng of DNA fragment, 500 ng of CRISPR-Cas9 plasmid, 260 μL of 50% PEG3350, 36 μL of 1 M LiOAc and 10 μL of denatured salmon sperm ssDNA60. The mixture was incubated at 42 °C for 30 min. After centrifugation at 3000 × g for 2 min, the cell pellet was resuspended in 600 μL of YPD and recovered at 30 °C for 90 min. Cells were then pelleted again, resuspended in 100 μL of sterile water and plated onto selective agar plates. Correct integration events were confirmed by colony PCR, and verified clones were used for subsequent engineering steps after curing of the CRISPR-Cas9 plasmid. Yeast culture media were obtained from Coolaber (Beijing, China). A list of all strains constructed in this study is provided in Supplementary Table 8.

Strains and culture conditions

Yeast colonies were inoculated in triplicate into 2 mL YPD and grown overnight to saturation at 30 °C and 400 rpm; 100 μL of each culture was then transferred into 4 mL deep-well 24-well plates sealed with Breathable Sealing Film (Biotss). Cultures were incubated for 120 h at 30 °C and 400 rpm in a high-speed orbital shaker (ZSZY-88BH; Shanghai Zhichu Instrument Co., Ltd.). After 24 h of incubation, galactose was added to a final concentration of 20 g/L to induce gene expression, together with substrate when required.

Metabolomic profiling of licorice

Dried roots, stems and leaves of G. glabra, and dried roots of G. uralensis and G. inflata were ground into coarse powder under liquid nitrogen and further disrupted using 0.5 mm and 3 mm zirconia beads in a cryogenic bead mill (Guangzhou Luka Sequencing Instrument Co., Ltd.) at 75 Hz for five cycles (each cycle: 60 s grinding, 30 s pause) to obtain fine plant powder. Accurately weighed samples were extracted three times with 25 mL of methanol and 25 mL of ethyl acetate, respectively, in an ultrasonic bath at 25 °C for 30 min. The combined extracts were concentrated to dryness, re-dissolved in methanol, and filtered through a 0.22 μm membrane. Four independent biological replicates were analyzed for each tissue type of G. glabra, and three independent biological replicates for the dried roots of G. uralensis and G. inflata.

Metabolite profiling was performed using a Thermo Scientific Ultimate 3000 UHPLC system coupled to a Q Exactive Plus Orbitrap high-resolution mass spectrometer (Thermo Fisher Scientific, USA). Chromatographic separation was carried out on a Waters ACQUITY UPLC HSS T3 C18 column (2.1 × 100 mm, 1.8 μm; Waters Corporation, USA), with 0.1% (v/v) formic acid in water as mobile phase A and 0.1% (v/v) formic acid in acetonitrile as mobile phase B. The column temperature was maintained at 35 °C, the flow rate was set to 0.2 mL/min, and the injection volume was 5 μL. The elution was performed using the following gradient: 0.00–10.00 min, 100% B; 10.00–20.00 min, 100–70% B; 20.00–25.00 min, 70–60% B; 25.00–30.00 min, 60–50% B; 30.00–40.00 min, 50–30% B; 40.00–45.00 min, 30–0% B; 45.00–60.00 min, 0% B; 60.00–60.10 min, 0–100% B; 60.10–70.00 min, 100% B. Mass spectrometry was conducted in both positive and negative ionization modes using a heated electrospray ionization (HESI) source. The sheath gas flow rate was set to 40 arbitrary units, the auxiliary gas to 15 units, the capillary temperature to 320 °C and the auxiliary gas heater temperature to 350 °C. The spray voltages were +3.2 kV (positive mode) and –3.0 kV (negative mode). The resolution was set to 70,000 for MS and 17,500 for MS/MS. Data were acquired in full-scan mode over an m/z range of 100–1500 in both ionization modes, followed by data-dependent MS/MS (ddMS2) acquisition using higher-energy collisional dissociation (HCD). Compounds were identified using Compound Discoverer 3.3 software with reference to the mzCloud and mzVault databases, applying a mass tolerance of ±5 ppm for precursor ion matching. Metabolites were confirmed using authentic standards. The peak area of each detected compound was normalized to the corresponding sample weight and used for metabolomic heatmap visualization.
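The ±5 ppm precursor tolerance and the weight normalization can be made concrete with a short sketch. It computes the theoretical [M+H]+ of glabridin (C20H20O4) from monoisotopic element masses and checks two hypothetical observed m/z values; the peak area and sample weight are likewise hypothetical.

```python
from typing import Dict

MONOISOTOPIC = {"C": 12.0, "H": 1.00782503207, "O": 15.99491461956}
PROTON = 1.007276467  # proton mass in Da


def monoisotopic_mass(formula: Dict[str, int]) -> float:
    return sum(MONOISOTOPIC[element] * n for element, n in formula.items())


def ppm_error(observed_mz: float, theoretical_mz: float) -> float:
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def within_tolerance(observed_mz: float, theoretical_mz: float, tol_ppm: float = 5.0) -> bool:
    return abs(ppm_error(observed_mz, theoretical_mz)) <= tol_ppm


def normalized_area(peak_area: float, sample_mg: float) -> float:
    return peak_area / sample_mg  # peak area per mg of sample


mz_mh = monoisotopic_mass({"C": 20, "H": 20, "O": 4}) + PROTON  # glabridin [M+H]+
for observed in (325.1440, 325.1460):  # hypothetical observed m/z values
    print(f"{observed}: {ppm_error(observed, mz_mh):+.2f} ppm, "
          f"match={within_tolerance(observed, mz_mh)}")
print(normalized_area(2.0e7, 50.0))  # hypothetical peak area and sample weight (mg)
```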

Analysis of metabolite production

Yeast cultures were centrifuged at 3000 × g for 5 min to separate the cell pellets and supernatants. Cell pellets were resuspended in saturated saline and mixed with 0.5 mm zirconia beads. Cell disruption was performed using a cryogenic bead mill (Guangzhou Luka Sequencing Instrument Co., Ltd.). Ethyl acetate was added to both the disrupted cell suspension and the culture supernatant for metabolite extraction. The organic phase was collected and evaporated to dryness under vacuum using a concentrator (Concentrator plus; Eppendorf). Residues were re-dissolved in methanol and filtered through a Nylon 66 membrane (Bioexploration, Guangdong, China) to obtain samples for subsequent HPLC analysis. Metabolites were analyzed using an Agilent 1260 Infinity Binary HPLC system. Chromatographic separation was performed on an InfinityLab Poroshell 120 EC-C18 column (3.0 × 100 mm, 2.7 μm; Agilent Technologies) with 0.1% (v/v) formic acid in water as mobile phase A and acetonitrile as mobile phase B. The flow rate was set to 0.5 mL/min, with a column temperature of 40 °C and an injection volume of 5 μL. The elution was carried out using the following gradient: 0.00–5.00 min, 10–20% B; 5.00–20.00 min, 20–60% B; 20.00–25.00 min, 60–100% B; 25.00–27.00 min, 100% B; 27.00–30.00 min, 100–10% B; 30.00–32.00 min, 10% B. LC–MS analysis was performed following the same procedure described in the “Metabolomic profiling of licorice” section. Products were purified, characterized by liquid chromatography–mass spectrometry (LC–MS) and structurally determined by NMR.
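The gradient tables above are piecewise-linear programs, so %B at any time point can be obtained by linear interpolation between breakpoints. This sketch encodes the analytical HPLC gradient given in this section; it is a convenience for checking the method, not instrument software.

```python
from bisect import bisect_right
from typing import List, Tuple

# (time in min, %B) breakpoints of the analytical HPLC gradient in this section
HPLC_GRADIENT: List[Tuple[float, float]] = [
    (0.0, 10), (5.0, 20), (20.0, 60), (25.0, 100), (27.0, 100), (30.0, 10), (32.0, 10),
]


def percent_b(t_min: float, program: List[Tuple[float, float]] = HPLC_GRADIENT) -> float:
    """Linearly interpolated %B at time t_min; held constant outside the program."""
    times = [t for t, _ in program]
    if t_min <= times[0]:
        return float(program[0][1])
    if t_min >= times[-1]:
        return float(program[-1][1])
    i = bisect_right(times, t_min) - 1  # segment whose start time is <= t_min
    (t0, b0), (t1, b1) = program[i], program[i + 1]
    return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)


for t in (2.5, 12.5, 26.0, 28.5):
    print(f"{t:5.1f} min -> {percent_b(t):.1f}% B")
```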

Protein structure modeling and docking

Protein structures and ligand docking poses were predicted using a local installation of AlphaFold 3 (ref. 61). The SMILES notations of the ligands were obtained from the PubChem database (https://pubchem.ncbi.nlm.nih.gov). The molecular visualization software PyMOL (version 2.5.7, http://www.pymol.org) was used for further analysis of the ligand-receptor complexes. The APBS Electrostatics plugin in PyMOL was used to calculate electrostatic potential maps of the proteins, using the pdb2pqr method with a grid spacing of 0.50 Å. Predicted isoprenoid-binding pockets were determined by comparing protein structures and sequence similarities.

Molecular dynamics simulation

The force field parameters of NADPH were taken from a previous report62. The ligand geometry was optimized using ORCA 6.0 (ref. 63) at the M06-2X level64 with the def2-TZVP basis set and Grimme’s D3 empirical dispersion correction (GD3)65. Force field parameters for the ligand were generated with the sobtop tool based on the General Amber Force Field (GAFF2)66, with atomic charges fitted using the RESP method67. The protein was described using the Amber ff14SB all-atom force field. Molecular dynamics (MD) simulations were conducted using the GROMACS 2023 software package. The protein-ligand complex was placed in a periodic boundary condition box filled with TIP3P water, and the system was neutralized and brought to 0.15 M NaCl. The system underwent energy minimization, followed by equilibration in the NVT ensemble at 303.15 K and then in the NPT ensemble. Three independent 500 ns simulations with a 2 fs time step were run for each complex, and trajectories were recorded every 1 ns. Temperature was maintained at 303.15 K using the V-rescale thermostat, and pressure was controlled at 1 bar using the C-rescale barostat. Non-bonded van der Waals interactions were computed using a switching function with a 1.0 nm cutoff, and long-range electrostatic interactions were calculated with the particle mesh Ewald (PME) method, also using a 1.0 nm cutoff. Root-mean-square deviation (RMSD) profiles of the ligands after fitting to the protein backbone were calculated using the “gmx rms” command. Simulation results were visually analyzed using PyMOL software (version 2.6.2).
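The simulation length, time step and output interval translate into integer step counts for a GROMACS .mdp file (500 ns at 2 fs gives 250,000,000 steps; saving every 1 ns gives an interval of 500,000 steps). The sketch below generates an illustrative fragment only: required keys such as tc-grps, tau_t, tau_p, compressibility and constraints are omitted, and the Potential-switch modifier with an rvdw-switch of 0.8 nm is an assumption, since the text specifies only a switching function with a 1.0 nm cutoff.

```python
def md_steps(duration_ns: float, dt_fs: float) -> int:
    """Number of integration steps covering duration_ns at a time step of dt_fs."""
    return round(duration_ns * 1_000_000 / dt_fs)  # 1 ns = 1e6 fs


def production_mdp(length_ns: float = 500, dt_fs: float = 2, save_every_ns: float = 1) -> str:
    """Illustrative (incomplete) production .mdp fragment."""
    options = {
        "integrator": "md",
        "dt": f"{dt_fs / 1000:g}",  # GROMACS time unit is ps
        "nsteps": str(md_steps(length_ns, dt_fs)),
        "nstxout-compressed": str(md_steps(save_every_ns, dt_fs)),
        "tcoupl": "V-rescale",
        "ref_t": "303.15",
        "pcoupl": "C-rescale",
        "ref_p": "1.0",
        "coulombtype": "PME",
        "rcoulomb": "1.0",
        "vdw-modifier": "Potential-switch",
        "rvdw-switch": "0.8",  # assumption: not given in the text
        "rvdw": "1.0",
    }
    return "\n".join(f"{key} = {value}" for key, value in options.items())


print(production_mdp())
```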

Subcellular co-localization analysis

For subcellular localization, the coding sequences of GgPT1 and GgOC1 were fused to the fluorescent tag mScarlet and cloned into 2μ-based multicopy plasmids under the control of the constitutive TDH3 promoter (PTDH3). The constructs were transformed into a Saccharomyces cerevisiae strain carrying an integrated Sec61–mNeonGreen fusion, a canonical endoplasmic reticulum (ER) marker. Co-localization was analyzed using a confocal fluorescence microscope.

Fluorescence image and analysis

Individual colonies of yeast strains transformed with plasmids encoding biosynthetic enzymes fused to fluorescent protein reporters were inoculated into 4 mL YPD medium and grown overnight (~12 h) at 30 °C with shaking at 400 rpm. Overnight cultures were then diluted 1:4 into fresh YPD medium and incubated for an additional 6–8 h at 30 °C, 400 rpm to allow proper folding of slow-maturing fluorescent proteins. Cells were harvested by centrifugation and resuspended in 100 μL of sterile PBS for imaging. For microscopy, 5–10 μL of cell suspension was spotted onto a glass microscope slide and covered with a glass coverslip (Shanghai Titan Scientific Co., Ltd.). Imaging was performed using a Nikon AXR multi-SIM multimodal super-resolution confocal microscope (Nikon Instruments Inc., Japan) equipped with a 100 × oil immersion objective. Fluorescence excitation was carried out using 488 nm for the green channel (GFP) and 561 nm for the red channel (mCherry). Images were acquired using NIS-Elements software and processed using ImageJ (v1.54f).

Protein expression and purification

Recombinant GgPTR1, GgPTR2 and GgDMT1 were cloned into the pET28a vector using Gibson assembly and transformed into E. coli BL21. Overnight seed cultures grown in TB medium supplemented with kanamycin were used to inoculate 1 L TB cultures, which were shaken at 37 °C to an OD600 of 0.6. Protein expression was induced with 0.5 mM IPTG, followed by incubation at 16 °C for 20 h. Cells were harvested by centrifugation and resuspended in lysis buffer (50 mM Tris-HCl, 300 mM NaCl, 5% glycerol, 10 mM imidazole, pH 7.5). After sonication, the lysate was clarified (10,000 × g, 20 min, 4 °C) and the supernatant was incubated with Ni-NTA resin for 20 min at 4 °C. Bound proteins were washed with lysis buffer and eluted with elution buffer containing 500 mM imidazole. Imidazole was removed by ultrafiltration (10 kDa cutoff) with two rounds of dialysis against storage buffer (20 mM HEPES, 150 mM NaCl, 20% glycerol, 1 mM DTT, pH 7.4). The purified proteins were used directly for enzymatic assays.

Microsomes preparation

Microsomal fractions containing GgOC1 were prepared from S. cerevisiae expressing the corresponding constructs. Seed cultures were grown in YPD medium at 30 °C overnight and used to inoculate 20 mL and subsequently 1 L YPD cultures, which were incubated for 2 days at 30 °C with shaking. Cells were harvested by centrifugation (4000 × g, 10 min, 4 °C), gently resuspended in 100 mL ice-cold extraction buffer (20 mM Tris-HCl pH 7.5, 0.5 M sucrose, 5 mM MgCl2, 1 mM DTT, 1 mM PMSF), incubated for 10 min at room temperature, and pelleted again under the same conditions. The cell pellet was washed once more with extraction buffer and subjected to high-pressure homogenization (three cycles at 1200 bar) on ice. The homogenate was clarified by centrifugation (10,000 × g, 10 min, 4 °C), and the resulting supernatant was ultracentrifuged at 100,000 × g for 1 h at 4 °C to collect the microsomal membrane fraction. The microsomal pellet was gently resuspended in 5 mL storage buffer (100 mM Tris-HCl, pH 7.5, 0.25 M sucrose, 10 mM MgCl2, 20% glycerol, 1 mM DTT), aliquoted, flash-frozen in liquid nitrogen, and stored at −80 °C until use.

Enzymatic assays

GgPTR1 and GgPTR2 were assayed in 50 mM Tris-HCl (pH 7.5) supplemented with 200 μM NADPH under initial-rate conditions. For GgPTR1, medicarpin (8, 10–1000 μM) and demethylmedicarpin (12, 10–1000 μM) were assayed using 0.5 μg of purified enzyme. For GgPTR2, medicarpin (8, 40–3000 μM), demethylmedicarpin (12, 10–1000 μM) and licoagrocarpin (20, 10–400 μM) were assayed. GgDMT1 assays were performed in 50 mM Tris-HCl (pH 7.5) supplemented with 100 mM NaCl, 1 mM 2-oxoglutarate, 1 mM ascorbate, 1 mM TCEP and 100 μM FeSO4. Five methylated substrates were evaluated under initial-rate conditions with the following substrate ranges and enzyme amounts: 4′-O-methylglabridin (11, 10–1000 µM, 0.5 µg enzyme); 4′-O-methylpreglabridin (10, 10–700 µM, 10 µg enzyme); vestitol (9, 10–1600 µM, 5 µg enzyme); medicarpin (8, 10–800 µM, 50 µg enzyme); and formononetin (5, 10–600 µM, 25 µg enzyme). Microsomal GgOC1 (5 μL) was assayed using 4′-O-methylpreglabridin (10, 30–1000 µM) or preglabridin (14, 10–1000 µM) as substrates in 50 mM Tris-HCl (pH 7.5). All kinetic assays were performed in 100 μL reaction mixtures at 30 °C; reactions were quenched with an equal volume of ice-cold methanol, clarified by centrifugation (10,000 × g, 10 min) and analyzed by UPLC using a modified gradient: 10% B (0 min), 10–20% B (0–2.5 min), 20–60% B (2.5–10 min), 60–100% B (10–12.5 min), 100% B (12.5–13.5 min), 100–10% B (13.5–15 min) and 10% B (15–16 min). Kinetic parameters (Km, Vmax, kcat and kcat/Km) were obtained by nonlinear regression of data from three independent replicates.
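The kinetic fit can be sketched with the standard library alone. Because the Michaelis-Menten model is linear in Vmax once Km is fixed, the sketch profiles out Vmax in closed form and minimizes the residual sum of squares over log10(Km) by golden-section search; kcat is then Vmax divided by the molar enzyme concentration. This is a swapped-in technique for illustration (the regression software used in this study is not specified), and the data and enzyme concentration are hypothetical, noise-free values.

```python
import math
from typing import Sequence, Tuple


def mm_rate(s: float, vmax: float, km: float) -> float:
    return vmax * s / (km + s)


def _best_vmax(s_vals: Sequence[float], v_vals: Sequence[float], km: float) -> float:
    # For fixed Km the model is linear in Vmax, so Vmax has a closed-form least-squares value.
    x = [s / (km + s) for s in s_vals]
    return sum(xi * vi for xi, vi in zip(x, v_vals)) / sum(xi * xi for xi in x)


def _sse(s_vals: Sequence[float], v_vals: Sequence[float], km: float) -> float:
    vmax = _best_vmax(s_vals, v_vals, km)
    return sum((v - mm_rate(s, vmax, km)) ** 2 for s, v in zip(s_vals, v_vals))


def fit_michaelis_menten(s_vals, v_vals, km_lo=1e-2, km_hi=1e5, iters=100) -> Tuple[float, float]:
    """Least-squares (Vmax, Km): golden-section search on log10(Km), Vmax profiled out."""
    a, b = math.log10(km_lo), math.log10(km_hi)
    g = (math.sqrt(5) - 1) / 2
    c, d = b - g * (b - a), a + g * (b - a)
    for _ in range(iters):
        if _sse(s_vals, v_vals, 10 ** c) < _sse(s_vals, v_vals, 10 ** d):
            b, d = d, c              # minimum lies in [a, old d]
            c = b - g * (b - a)
        else:
            a, c = c, d              # minimum lies in [old c, b]
            d = a + g * (b - a)
    km = 10 ** ((a + b) / 2)
    return _best_vmax(s_vals, v_vals, km), km


def kcat(vmax_uM_per_min: float, enzyme_uM: float) -> float:
    return vmax_uM_per_min / enzyme_uM  # units: 1/min


# Hypothetical noise-free data generated with Vmax = 2 uM/min and Km = 50 uM.
substrate = [10, 25, 50, 100, 250, 500, 1000]
rates = [mm_rate(s, 2.0, 50.0) for s in substrate]
vmax, km = fit_michaelis_menten(substrate, rates)
k = kcat(vmax, enzyme_uM=0.1)  # hypothetical enzyme concentration
print(f"Vmax={vmax:.3f} uM/min, Km={km:.1f} uM, kcat={k:.1f} 1/min, kcat/Km={k / km:.3f} 1/(uM*min)")
```

Golden-section search assumes a single minimum within the Km bounds; with noisy replicate data a standard nonlinear least-squares routine would normally be preferred.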

Ladder network dynamics measurement

Yeast colonies were initially inoculated in triplicate into 4 mL YPD medium and grown overnight at 30 °C with shaking at 400 rpm until saturation. Subsequently, 1 mL of each culture was transferred into 20 mL fresh YPD medium in 100 mL shake flasks. Cultures were incubated for 120 h at 30 °C and 250 rpm in a shaking incubator. After 24 h of incubation, galactose was added to a final concentration of 20 g/L to induce gene expression, along with substrate addition when applicable. Samples were collected every 24 h and processed using the same method described above.

Model framework and simplifications

A computational modeling approach was employed to describe metabolic networks formed by multiple promiscuous enzymes and their substrate compounds. To address the inherent complexity of cellular metabolic environments, essential simplifications were implemented throughout the simulations. Cellular compartmentalization and compound diffusion rates were neglected in the model formulation.

The Michaelis-Menten equation was adapted to simulate catalytic processes, accounting for competitive interactions between (1) different substrates for shared enzyme active sites and (2) multiple enzymes competing for identical substrates. This framework yielded:

$$v_{1i}=\frac{V_{\max 1,i}[S_1]}{K_{\mathrm{m}1,i}\left(1+\frac{[S_2]}{K_{\mathrm{m}2}}\right)+[S_1]+\sum_{j\ne i}\alpha_{ij}[S_1]}$$
(1)

Km values are listed in Supplementary Table 9 (ref. 68). Computational analysis showed that Km ≫ [S] in all cases: substrate concentrations ([S]) were much lower than the corresponding Km values, so the concentration-dependent terms in the denominator of Eq. (1) are negligible relative to Km1,i. This permitted simplification of the Michaelis-Menten equation to:

$$v_{1i}=\frac{V_{\max 1,i}[S_1]}{K_{\mathrm{m}1,i}}$$
(2)
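A quick numerical check of when Eq. (2) is an acceptable stand-in for Eq. (1) is shown below. All parameter values are hypothetical; with concentrations far below the Km values the two rates agree to within about 0.5%, whereas at concentrations comparable to Km the linear form strongly overestimates the rate.

```python
from typing import Sequence


def rate_eq1(s1: float, s2: float, vmax: float, km1: float, km2: float,
             alphas: Sequence[float] = ()) -> float:
    """Eq. (1): competitive term for S2 plus the sum of alpha_ij * [S1] over j != i."""
    return vmax * s1 / (km1 * (1 + s2 / km2) + s1 + sum(a * s1 for a in alphas))


def rate_eq2(s1: float, vmax: float, km1: float) -> float:
    """Eq. (2): first-order limit, valid when all concentrations are far below Km."""
    return vmax * s1 / km1


params = dict(vmax=1.0, km1=500.0)  # hypothetical values (uM-based units)
for s in (1.0, 400.0):
    ratio = rate_eq1(s, s, km2=800.0, alphas=(0.5,), **params) / rate_eq2(s, **params)
    print(f"[S] = {s:g} uM: Eq.(1)/Eq.(2) = {ratio:.3f}")
```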

To align with laboratory conditions for microbial cell factories, the model assumed constant cellular biomass and enzyme concentrations. Enzyme concentrations were fixed based on biomass measurements at 24 h post-inoculation, coinciding with the time of substrate supplementation in the experimental protocols. In addition, because transport of compounds across cellular membranes (influx/efflux) strongly affects this reaction network, we assumed that all compounds rapidly equilibrate by passive transport, represented by:

$$\frac{d[S_{\mathrm{out}}]}{dt}=-k_{\mathrm{trans}}\left([S_{\mathrm{out}}]-[S_{\mathrm{in}}]\right)$$
(3)

where the transport rate constants (\(k_{\mathrm{trans}}\)) were determined experimentally in separate permeability assays.

The resulting system of ordinary differential equations was solved numerically with the LSODA method (Livermore Solver for Ordinary Differential Equations with automatic switching between stiff and non-stiff methods) via the solve_ivp function of SciPy (v1.10.1) in Python (v3.8.5).
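A minimal sketch of how Eqs. (2) and (3) combine into an ODE system is given below for a single substrate S that equilibrates across the membrane and is converted intracellularly to a product P. It assumes equal intracellular and extracellular volumes (so the influx term appears with opposite sign in the intracellular balance) and uses hypothetical rate constants. To stay dependency-free it integrates with a fixed-step classical Runge-Kutta (RK4) scheme rather than LSODA; with SciPy, the same right-hand side could be wrapped as fun(t, y) and passed to scipy.integrate.solve_ivp with method="LSODA".

```python
from typing import List, Sequence, Tuple


def derivatives(y: Sequence[float], k_trans: float, k_eff: float) -> Tuple[float, float, float]:
    """Right-hand side for y = ([S_out], [S_in], [P])."""
    s_out, s_in, _p = y
    influx = k_trans * (s_out - s_in)  # Eq. (3): passive equilibration across the membrane
    v = k_eff * s_in                   # Eq. (2): first-order conversion, k_eff = Vmax/Km
    return (-influx, influx - v, v)


def rk4(y0: Sequence[float], t_end: float, dt: float, **params) -> List[Tuple[float, Tuple[float, ...]]]:
    """Fixed-step classical Runge-Kutta integration; returns [(t, (S_out, S_in, P)), ...]."""
    y = list(y0)
    traj: List[Tuple[float, Tuple[float, ...]]] = [(0.0, tuple(y))]
    for step in range(round(t_end / dt)):
        k1 = derivatives(y, **params)
        k2 = derivatives([yi + dt / 2 * ki for yi, ki in zip(y, k1)], **params)
        k3 = derivatives([yi + dt / 2 * ki for yi, ki in zip(y, k2)], **params)
        k4 = derivatives([yi + dt * ki for yi, ki in zip(y, k3)], **params)
        y = [yi + dt / 6 * (a + 2 * b + 2 * c + d)
             for yi, a, b, c, d in zip(y, k1, k2, k3, k4)]
        traj.append(((step + 1) * dt, tuple(y)))
    return traj


# Hypothetical run: 100 uM substrate fed at t = 0 (24 h post-inoculation), simulated for 96 h.
trajectory = rk4((100.0, 0.0, 0.0), t_end=96.0, dt=0.1, k_trans=0.5, k_eff=0.1)
for t, (s_out, s_in, p) in trajectory[::240]:
    print(f"t={t:5.1f} h  S_out={s_out:6.2f}  S_in={s_in:6.2f}  P={p:6.2f}")
```

Because the three rates sum to zero, total mass is conserved, which is a convenient sanity check for any larger network built the same way.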

Quantification and statistical analysis

Data were processed using Excel, Origin and R. Comparisons between two groups were assessed using a two-tailed unpaired Student’s t test, and significant comparisons are indicated in the graphs. P values are given in the figures. Graphs show means ± SD unless otherwise indicated in the figure legends.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.