Introduction

Furochromones are an important class of bioactive natural products. They demonstrate anti-inflammatory1,2, hepatoprotective3, and antiviral4 activities. For instance, khellin was used as a smooth muscle relaxant and cardioprotective drug in Europe5. While chromones are widely present in plants, furochromones have only been reported in a few families including Apiaceae, Ranunculaceae, and Leguminosae6. In Apiaceae, furochromones are the major bioactive compounds of Saposhnikovia divaricata7, Ammi visnaga8, and Cnidium monnieri9. Particularly, S. divaricata contains abundant prim-O-glucosylcimifugin (POG) and 5-O-methylvisamminoside (5-O-MVG), and their total content could be above 0.24% of dry weight10. These two compounds may contribute to the bioactivities of S. divaricata for the treatment of respiratory virus infection11, type I allergy12, colitis13, and aging-impaired endogenous tendon regeneration14.

The structures of POG and 5-O-MVG feature an isoprenyl substituent at C-6, which forms a fused dihydrofuran ring15,16 (Supplementary Fig. 1). The biosynthesis of simple chromones has been extensively studied. The chromone skeletons are generated by polyketide synthases, such as PECPS from Aquilaria sinensis and AaPCS from Aloe arborescens17,18. However, the biosynthesis of furochromones remains largely unexplored. In the early 1970s, researchers fed sodium [1-14C] acetate to the shoots of Ammi visnaga, and revealed that peucenin and visamminol were biosynthetic intermediates of furochromones19. For the biosynthesis of POG or 5-O-MVG, a prenyltransferase (PT) is responsible for introducing an isoprenyl group at C-6 of the chromone skeleton20. Thus far, very few enzymes have been reported to catalyze cyclization of an isoprenyl group to form a dihydrofuran ring. Although CYP76F112 from Ficus carica, PpDC and PpOC from Peucedanum praeruptorum, NiDC and NiOC from Notopterygium incisum, as well as AsDC and AsOC from Angelica sinensis have been reported to catalyze similar reactions to produce furocoumarins21,22,23,24, no enzymes have been shown to generate furochromones. On the other hand, glycosyl substitutions at hydroxyl groups attached to the quaternary C-3′ or the secondary C-11 are rare in natural products, and these reactions are usually catalyzed by uridine diphosphate-dependent glycosyltransferases (UGTs)25. Moreover, both POG and 5-O-MVG contain a methoxyl group at C-5, and the methylation reaction was proposed to be catalyzed by an O-methyltransferase (OMT)26. Although a large family of OMTs has been reported from plants, few OMTs can catalyze methylation at the less reactive 5-OH. Known examples include the isoflavone 5-O-methyltransferase from Lupinus luteus27 and CdFOMT5 from Citrus depressa28. For POG, a cytochrome P450 (CYP450) enzyme may introduce the extra primary hydroxyl group at C-11 (ref. 29).
Based on the above analysis, we hypothesized the biosynthetic pathway of 5-O-MVG (6) and POG (9) (Fig. 1a). While the enzyme categories catalyzing each step seem obvious, the specific enzymes with the expected functions are still elusive.

Fig. 1: A proposed biosynthetic pathway of furochromones and genomic statistics in S. divaricata.

a The proposed biosynthetic pathway and catalytic enzymes. PT prenyltransferase, PCS pentaketide chromone synthase, CYP450 cytochrome P450 enzyme, OMT O-methyltransferase, UGT uridine diphosphate-dependent glycosyltransferase. 1, malonyl-CoA; 2, noreugenin; 3, peucenin; 4, visamminol; 5, 5-O-methylvisamminol; 6, 5-O-methylvisamminoside; 7, norcimifugin; 8, cimifugin; 9, prim-O-glucosylcimifugin. b Image of the sampled S. divaricata. c Total ion currents (TICs) and extracted ion chromatograms (EICs) of the root, petiole, and leaf of S. divaricata by LC/MS analysis. EIC mass range: m/z 291.11–291.12 + 293.09–293.10. d Contents of 5, 6, 8 and 9 in different organs, calculated on the basis of dry weight (n = 3, three biologically independent samples were tested). e Genomic statistics of S. divaricata, showing eight chromosomes (Chr1–Chr8). i, pseudochromosomes; ii, gene density; iii, Gypsy LTR density; iv, Copia LTR density; v, Helitron density; vi, GC content.

In this work, we dissected the biosynthetic pathway of POG and 5-O-MVG in S. divaricata. The functions of seven enzymes were characterized, including SdPCS, SdPT, SdPC, SdCH, SdOMT, SdUGT1, and SdUGT2. Utilizing these gene elements, we realized the complete biosynthesis of POG and 5-O-MVG in Nicotiana benthamiana leaves. Moreover, we unravelled the genetic mechanisms for high abundances of POG and 5-O-MVG in S. divaricata among Apiaceae plants.

Results and discussion

Proposed biosynthetic pathway of furochromones in Saposhnikovia divaricata and gene mining

First, we analyzed the chemical constituents of three organs of S. divaricata (leaf, petiole, and root, Fig. 1b–c) by liquid chromatography coupled with mass spectrometry (LC/MS). At least five furochromones (5–9) could be detected, which supported the validity of our proposed biosynthetic pathway. Subsequently, the contents of major compounds 5, 6, 8 and 9 in five tissue samples (roots at three developmental stages, petiole, and leaf) were quantitatively determined (Supplementary Figs. 2–6). The results indicated that the roots contained more abundant furochromones, particularly the glycosides 6 and 9, than the petiole and leaf samples (Fig. 1d).

To obtain a complete list of candidate genes involved in the biosynthesis of POG and 5-O-MVG, we sequenced, assembled, and annotated a chromosome-level genome of S. divaricata. Based on 43.72 Gb of PacBio CCS long reads, we assembled the genome to 1.95 Gb (Supplementary Table 1), which was consistent with the estimate by flow cytometry (1.94 ± 0.02 Gb) (Supplementary Fig. 7) and the published assembly30. The assembly contig N50 was 2.22 Mb and the Benchmarking Universal Single-Copy Ortholog (BUSCO) score was 96.1%, indicating good genome continuity and completeness (Supplementary Tables 2–3). By Hi-C technology, 94.27% of contigs were anchored onto eight chromosomes (Fig. 1e, Supplementary Fig. 8 and Supplementary Table 4). Multiple-tissue RNA-Seq data (Supplementary Table 5), ab initio prediction, and homologous protein evidence were combined for genome annotation, which led to the identification of 38,704 high-confidence protein-coding genes and 65,734 transcripts. Finally, a total of 1,751,401 repetitive elements were annotated, accounting for 76.78% of the genome (Supplementary Table 6). With the high-quality genome and multiple-tissue RNA-Seq data, we quantified the gene expression abundance (fragments per kilobase of exon model per million mapped fragments, FPKM) of the five tissue samples mentioned above. Subsequently, we screened candidate genes according to genome annotation or local blastn search, and selected genes whose expression levels were correlated with the contents of downstream secondary metabolites in different organs for functional characterization.
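The FPKM unit used for expression quantification can be illustrated with a minimal sketch of its standard definition. The function name and the example counts are hypothetical, and the snippet is not the quantification software used in this work.

```python
def fpkm(fragments: int, exon_length_bp: int, total_mapped_fragments: int) -> float:
    """Fragments per kilobase of exon model per million mapped fragments.

    fragments: fragments mapped to the gene (hypothetical input)
    exon_length_bp: summed exon length of the gene model, in bp
    total_mapped_fragments: all mapped fragments in the library
    """
    if exon_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("length and library size must be positive")
    # Dividing by (length / 1e3) and (library size / 1e6) gives the 1e9 factor.
    return fragments * 1e9 / (exon_length_bp * total_mapped_fragments)


# Illustrative numbers only: 500 fragments on a 2 kb gene in a 20 M library.
example_fpkm = fpkm(500, 2000, 20_000_000)
```

Because FPKM normalizes by both gene length and library size, values from different tissues of the same species can be compared directly.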

Biosynthesis of the furochromone skeleton

The first step of the biosynthetic pathway converts malonyl-CoA (1) to noreugenin (2). The pentaketide chromone synthase AaPCS from Aloe arborescens is the only enzyme reported to catalyze this type of reaction18. Thus, we conducted a local blastn search using AaPCS as a query in the transcriptome of S. divaricata, and ten candidate genes with E-values < 10^−21 were discovered. The expression level (FPKM) of one gene, SdPCS, was highly correlated with the furochromone contents, with a Pearson correlation coefficient (PCC) > 0.95 (Supplementary Table 7). It was sub-cloned into the pET28a (+) vector for protein expression in E. coli BL21 (DE3) cells. The function was characterized by enzyme catalysis reactions with 1 as substrate. According to high-performance liquid chromatography (HPLC) and LC/MS analyses, SdPCS generated a new peak, which was identified as 2 by comparison with a reference standard. SdPCS showed the highest catalytic efficiency at 37 °C in Na2HPO4-NaH2PO4 buffer (pH 6.0). The kcat/Km value of SdPCS generating 2 was 22.84 M−1·s−1 (Supplementary Fig. 9). From the genome of S. divaricata, we further discovered and characterized SdPCS2 with the same function (Fig. 2a), though it was initially neglected due to its low expression level (Supplementary Data 1). SdPCS2 shows high amino acid sequence identity (91.33%) with SdPCS and lies a short distance from it in the genome (135,334 bp) (Supplementary Figs. 10–11).
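The correlation-based screening described above can be sketched as follows. The gene names, expression profiles, and metabolite contents are invented for illustration; only the PCC > 0.95 threshold comes from the text.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient; returns NaN for a constant profile."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")  # e.g. a gene not expressed in any tissue
    return sxy / sqrt(sxx * syy)


def screen_candidates(expression, contents, threshold=0.95):
    """Keep genes whose FPKM profile tracks metabolite content (PCC > threshold)."""
    return [gene for gene, profile in expression.items()
            if pearson(profile, contents) > threshold]


# Hypothetical data: five tissue samples (three root stages, petiole, leaf).
contents = [1.0, 2.0, 3.0, 0.2, 0.1]
expression = {
    "geneA": [10.0, 20.0, 30.0, 2.0, 1.0],  # tracks the contents
    "geneB": [5.0, 5.0, 6.0, 5.0, 4.0],     # weakly correlated
    "geneC": [0.0, 0.0, 0.0, 0.0, 0.0],     # constant profile, skipped
}
selected = screen_candidates(expression, contents)
```

A NaN correlation never exceeds the threshold, so unexpressed genes drop out without special handling.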

Fig. 2: Biosynthesis of furochromone skeleton.

Functional characterization of SdPCS (a), SdPT (b), and SdPC (c). Shown are HPLC/UV chromatograms of enzyme catalysis reactions (λ = 280 nm), together with (+)-ESI-MS and MS/MS spectra of the products. Control, reaction mixtures incubated with boiled enzymes or microsomes.

To discover the prenyltransferase (PT) converting 2 to peucenin (3), we obtained one candidate gene SdPT (PCC > 0.95, Supplementary Table 8) among the 20 annotated PT genes. SdPT was sub-cloned into the pESC-Leu vector and expressed in yeast WAT11 cells31. When the yeast microsomes were incubated with 2, DMAPP and MgCl2, HPLC analysis showed a new product, which exhibited an [M + H]+ ion at m/z 261.11 in LC/MS analysis. The MS/MS spectrum showed an abundant [M-56 + H]+ fragment at m/z 205.05, indicating a prenyl substitution at C-6 or C-8 (ref. 32) (Fig. 2b). SdPT showed the highest catalytic efficiency at 45 °C in Na2CO3-NaHCO3 buffer (pH 10.0), with a Km value of 29.08 μM (Supplementary Fig. 12). Then we purified 0.8 mg of the product from scaled-up enzymatic reactions under the optimal reaction conditions. The 1H-NMR spectrum showed a methylene signal at δH 3.17 (m, H-1′), one olefinic signal at δH 5.16 (t, J = 6.0 Hz, H-2′), and two methyl signals at δH 1.61 (H-4′) and 1.71 (H-5′), indicating the presence of an isoprenyl group. The HMBC cross peaks from H-1′ to C-5 (δC 158.1), C-6 (δC 111.1), and C-7 (δC 164.8) indicated that the isoprenyl group was located at C-6 (Supplementary Figs. 13–16). Thus, the product was identified as peucenin (3) (Supplementary Table 9). SdPT represents the first prenyltransferase reported to use chromones as substrates. SdPT2, with an amino acid sequence identity of 85.79% (Supplementary Fig. 17), exhibited the same catalytic function as SdPT.
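The reported ions can be checked against calculated monoisotopic masses. This is a minimal sketch assuming that peucenin has the molecular formula C15H16O4 and that the 56 Da neutral loss corresponds to C4H8; both assumptions are ours and are not stated in the text.

```python
# Monoisotopic masses (Da) of the most abundant isotopes.
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "O": 15.99491462}
PROTON = 1.00727646  # mass of a proton (Da)


def monoisotopic_mass(formula):
    """Sum of isotope masses for a formula given as {element: count}."""
    return sum(MONOISOTOPIC[element] * count for element, count in formula.items())


def mz_protonated(formula):
    """m/z of the singly charged [M + H]+ ion."""
    return monoisotopic_mass(formula) + PROTON


PEUCENIN = {"C": 15, "H": 16, "O": 4}  # assumed formula of peucenin (3)
C4H8 = {"C": 4, "H": 8}                # assumed identity of the 56 Da loss

parent = mz_protonated(PEUCENIN)
fragment = parent - monoisotopic_mass(C4H8)
print(f"{parent:.2f} {fragment:.2f}")  # prints: 261.11 205.05
```

Under these assumptions the calculated values agree with the observed [M + H]+ ion at m/z 261.11 and the fragment at m/z 205.05.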

Few enzymes are known to catalyze the oxidative cyclization of isoprenyl groups, except for several CYP450 enzymes involved in the biosynthesis of furocoumarins22,23,24. Since these enzymes belong to the CYP736 family, we screened candidates from the same family in S. divaricata, and chose four candidate genes whose expression levels were highly correlated with the furochromone contents (PCC > 0.90, Supplementary Table 10). When the microsomes of SdPC recombinant yeast WAT11 cells were incubated with 3 and NADPH, a new product was formed. LC/MS analysis showed an [M + H]+ ion at m/z 277, which could be fragmented into m/z 259 and m/z 205. Its structure was proposed to be visamminol (4). As no reference standard was available, we prepared 4 through hydrolysis of visamminol 3′-O-glucoside catalyzed by β-glucosidase (Supplementary Fig. 18), and confirmed its structure by NMR analysis. The 1H-NMR spectrum showed two methyl signals at δH 1.13 (s, H-4′) and δH 1.14 (s, H-5′), a tertiary proton signal at δH 4.71 (t, J = 8.6 Hz, H-2′), and a methylene signal at δH 3.02 (d, J = 8.6 Hz, H-1′), indicating the presence of a furan ring. The HMBC cross peaks from H-2′ (δH 4.71) to C-1′ (δC 26.6), C-7 (δC 166.4), and C-6 (δC 109.5) indicated that the furan ring was conjugated with the benzene ring (Supplementary Figs. 19–22, Supplementary Table 9). HPLC and LC/MS analyses indicated that the product had the same retention time and mass spectra as 4 (Fig. 2c). As the oxidative cyclization of isoprenyl phenolic compounds by chemical synthesis requires strong oxidizers like m-chloroperbenzoic acid33, SdPC represents an efficient enzyme catalyst for this reaction.

Post-modification steps for the biosynthesis of furochromones

C-11 of compounds 7–9 is hydroxylated, indicating the involvement of a CYP450 enzyme. However, very few enzymes have been reported to catalyze a similar reaction, so no suitable templates were available for BLAST searches. By analyzing the transcriptome data, we selected 12 candidate CYP genes whose expression levels were highly correlated with the total contents of 8 and 9 (PCC > 0.95, Supplementary Table 11). These genes were expressed in yeast WAT11 cells, and the microsomes were incubated with NADPH and 4 or 5 (Tris-HCl buffer, 50 mM) for functional characterization. LC/MS analysis indicated that SdCH could convert 4 and 5 (5-O-methylvisamminol) into 7 (norcimifugin) and 8 (cimifugin), respectively (Fig. 3a, Supplementary Fig. 23).

Fig. 3: Post-modification reactions for the biosynthesis of furochromones.

Functional characterization of SdCH (a), SdOMT (b), and SdUGT1/2 (c, d). Shown are HPLC/UV chromatograms of the enzyme catalysis reactions (λ = 280 nm), together with (+)-ESI-MS and MS/MS spectra of the products. e Kinetic parameters of SdUGT1 and SdUGT2. f Simulated binding modes of 8 in the crystal structure of SdUGT2 when catalyzing 11-O-glycosylation (left) and 3′-O-glycosylation (right). Hydrogen bonds and hydrophobic interactions were labeled with blue and orange dashes, respectively. STD, reference standard. Control, reaction mixtures incubated with boiled enzymes or microsomes.

Likewise, we discovered the 5-O-methyltransferase SdOMT which converted 4 and 7 into 5 and 8, respectively (PCC > 0.90, Supplementary Table 12). Its function was characterized by enzymatic reaction and LC/MS analysis (Fig. 3b, Supplementary Fig. 24).

Glycosylation is the final step in the biosynthetic pathway. A total of eight UGT genes with FPKM > 10 in the roots were chosen as candidate genes, and were cloned and expressed in E. coli BL21(DE3) (Supplementary Table 13). The functions were characterized by enzymatic catalysis with UDP-Glc (UDPG) as sugar donor, and 5 or 8 as sugar acceptor. SdUGT1 (UGT93BA1) could catalyze the glucosylation of 3′-OH of 5 (tertiary alcohol) and 11-OH of 8 (primary alcohol) to produce 6 (5-O-methylvisamminoside, 5-O-MVG) and 9 (prim-O-glucosylcimifugin, POG), respectively. The products could lose 162 Da in the MS/MS spectra, and their structures were identified by comparison with reference standards (Fig. 3c–d). Moreover, we discovered SdUGT2 (UGT93BB1), which exhibited a high amino acid sequence identity (54.93%) with SdUGT1 and showed the same catalytic activities (Supplementary Fig. 25). Interestingly, SdUGT1 and SdUGT2 only catalyzed 11-O-, but not 3′-O-glycosylation of 8. Moreover, they showed 31- and 2.8-fold higher catalytic efficiency (kcat/Km value), respectively, with 8 than with 5 as substrate (Fig. 3e, Supplementary Figs. 26–27).

To elucidate the mechanisms underlying the preference for 11-OH, we acquired the crystal structure of SdUGT2 in complex with UDP through X-ray diffraction (PDB ID: 8ZNK, 1.88 Å) (Fig. 3f, Supplementary Fig. 28, Supplementary Table 14). The structure of SdUGT2 showed a typical GT-B fold with two Rossmann-like β/α/β domains. The N-terminal domain (NTD, residues 1–261 and 454–480) and the C-terminal domain (CTD, residues 262–453) are primarily responsible for sugar acceptor and sugar donor binding, respectively. Then we simulated the SdUGT2/UDP-Glc complex structure according to the GgCGT/UDP-Glc structure, and docked 8 into the structure in two potential binding modes using AutoDock 4.2 software34,35. Alanine scanning of residues around the binding pocket led to remarkably decreased activities for most mutants, indicating the reliability of the docking results (Supplementary Fig. 29). In both binding modes, His32 is close to the glycosylation sites (11-OH or 3′-OH), with a distance below 3.1 Å. Thus, the hydroxyl groups could be easily deprotonated to initiate the glycosylation reaction. However, 8 could form more hydrogen bonds and hydrophobic interactions in the 11-O-glycosylation mode than in the 3′-O-glycosylation mode, as predicted by PLIP 2021 (ref. 36). Thus, SdUGT2 preferred to catalyze 11-O-glycosylation of 8. On the other hand, docking of 9 into SdUGT2 showed that the distance between 3′-OH and His32 was too great for the glycosylation reaction. This result was consistent with the absence of furochromone 3′,11-di-O-glycosides in S. divaricata7 (Supplementary Fig. 30). We also simulated the structure of SdUGT1 using AlphaFold2 (ref. 37), and docked 8 and UDP-Glc into the structure in the same way as described above. Comparing the binding modes of 8 in SdUGT1 and SdUGT2, we noticed that 8 was more stable in SdUGT1 as a result of extra hydrogen bonds and π-stacking interactions (Supplementary Fig. 31). This is probably the reason for the higher catalytic efficiency of SdUGT1 compared with SdUGT2.
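The distance criterion applied to the docking poses can be expressed as a simple geometric check. In this sketch the atom coordinates are hypothetical placeholders rather than values from the deposited structure; only the 3.1 Å cutoff comes from the text.

```python
from math import dist

CUTOFF_ANGSTROM = 3.1  # distance threshold mentioned in the text


def within_cutoff(atom_a, atom_b, cutoff=CUTOFF_ANGSTROM):
    """True if two atoms (x, y, z in Å) are closer than the cutoff."""
    return dist(atom_a, atom_b) < cutoff


# Hypothetical coordinates for a histidine nitrogen and two acceptor oxygens.
his_n = (0.0, 0.0, 0.0)
near_oxygen = (1.0, 2.0, 2.0)  # 3.0 Å away
far_oxygen = (3.0, 4.0, 0.0)   # 5.0 Å away

near_ok = within_cutoff(his_n, near_oxygen)
far_ok = within_cutoff(his_n, far_oxygen)
```

In practice the coordinates would be read from the docked complex, and a check like this would only flag whether the catalytic base is positioned to deprotonate the acceptor hydroxyl.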

Thus far, we have identified seven enzymes from S. divaricata that catalyze the biosynthesis of the two major furochromones 6 and 9. These genes are located on different chromosomes. Specifically, SdCH and SdUGT1 are located on Chr1, SdPCS and SdPC on Chr2, SdPT on Chr3, SdOMT on Chr6, and SdUGT2 on Chr8 (Fig. 4a). To our knowledge, this is the first report to unravel the complete biosynthetic pathway of furochromones. The expression levels of the identified genes, except for SdUGT1 and SdUGT2, are highly correlated with the distribution of major furochromones among different organs of S. divaricata.

Fig. 4: The dissected biosynthetic pathway of furochromones and their de novo biosynthesis in N. benthamiana leaves.

a Genomic location of biosynthetic genes in S. divaricata. Catalytic functions of biosynthetic genes responsible for formation (b) and post-modification (c) of the furochromone skeleton. Extracted ion chromatograms (EICs) of biosynthetic products in LC/MS analysis are shown. STD, reference standards. EV, Agrobacterium-mediated transient expression using a vector without any biosynthetic genes.

De novo biosynthesis of Saposhnikovia furochromones in Nicotiana benthamiana leaves

POG and 5-O-MVG are important bioactive compounds in S. divaricata. Their extraction and purification are time- and labor-consuming. It is imperative to engineer the biosynthetic pathway in chassis organisms. In this work, we realized de novo biosynthesis of furochromones in Nicotiana benthamiana leaves. Transient expression of the seven genes in N. benthamiana leaves revealed that all genes showed the expected catalytic activities (Fig. 4b–c). When all seven genes were infiltrated into N. benthamiana leaves, 6 and 9 could be detected. Given the low catalytic efficiency of SdPT, we increased the OD600 value of the SdPT strain to 0.40, and 6 and 9 were generated at yields of 17.48 μg/g and 3.82 μg/g (dry weight, Supplementary Fig. 32), respectively. The yields were calculated from six independent biological replicates.

The distribution of chromones and their biosynthetic genes in Apiaceae plants

To gain deeper insights into the evolution of the biosynthetic pathway of furochromones in Apiaceae, we incorporated another seven Apiaceous species (Coriandrum sativum, Apium graveolens, Angelica sinensis, Ligusticum chuanxiong, Daucus carota, Bupleurum chinense, Centella asiatica) into metabolic, comparative genomic and transcriptomic analyses (Supplementary Data 2–4, Supplementary Table 15). These species represented different evolutionary lineages, including the subfamilies Mackinlayoideae and Apioideae (including the tribes Bupleureae, Apieae, and Selineae, and the Sinodielsia Clade). We first determined and compared the contents of four typical furochromones (5, 6, 8, and 9) among the eight species (Supplementary Figs. 33–53). Unexpectedly, the furochromones did not show a stepwise accumulation along the phylogeny backbone but exhibited a drastic enrichment in S. divaricata. The contents of furochromones in the other species were generally low (Fig. 5a, Supplementary Table 16). This result implied substantial differences in furochromone biosynthesis between S. divaricata and the other Apiaceous plants.

Fig. 5: The distribution of chromones and their biosynthetic genes in Apiaceae plants.

a Contents of typical furochromones in various organs of Apiaceae plants, and syntenic gene analysis. b Contents of simple chromones in various organs of Apiaceae plants and the hypothesized causes (n = 3, three biologically independent samples were tested, and data are presented as mean values +/− SD).

To investigate the evolutionary shift in furochromone biosynthesis from the early diverged Apiaceous lineage to S. divaricata, we constructed a maximum likelihood (ML) phylogeny of Apiaceae species based on 398 strict single-copy orthologous genes (Fig. 5a). It revealed that S. divaricata belonged to the most recently diverged clade, which also includes C. sativum. Then, the conserved syntenic gene blocks containing each furochromone-biosynthetic gene were identified in each Apiaceous species (Supplementary Figs. 54–64). The syntenic and homologous genes of SdUGTs and SdCH were detected in all Apiaceous species, while PCSs, OMTs and PTs were limited to Apioideae, indicating a stepwise assembly of the furochromone biosynthetic pathway in Apiaceae (Fig. 5a). Remarkably, the last piece of the puzzle, SdPC, was detected only in S. divaricata (Fig. 5a). This observation led us to speculate that most Apiaceous species except S. divaricata may not contain any functional PC, thus leading to low furochromone content.

Additionally, we found that simple chromones (2 and 3) showed consistent distribution with furochromones (4, 5, 6, 8 and 9) in Apiaceae. They were also drastically enriched in S. divaricata (Fig. 5b, Supplementary Figs. 65–88). As 2 and 3 were generated before the catalysis of PC, their absence in other Apiaceous species was more likely caused by blocking of the initial step (PCS catalysis) (Fig. 5b). Moreover, most potential Apiaceous PTs involved in furochromone biosynthesis showed moderate or high expression abundances (Supplementary Table 17). Thus, apart from PC, we mainly focused on the upstream PCS, the housekeeping gene in the chromone biosynthetic pathway.

Proximal duplication of SdPCS promotes furochromone accumulation in S. divaricata

We retrieved all potential PKS III genes in S. divaricata and the other seven Apiaceae species, and constructed an ML tree. A strongly supported clade (bootstrap support value (BS) = 100) containing SdPCS and 20 potential Apiaceous PCSs was identified (Fig. 6a). Most genes in this clade were in the same syntenic region, implying that they shared the same ancestor (Supplementary Fig. 63). No PCS was detected in C. asiatica, one of the most basal species in Apiaceae, indicating that PCS may have first emerged in the Apioideae subfamily. Then we expressed and characterized all 20 genes, and compared their functions by enzymatic assays (Fig. 2a, Supplementary Figs. 89–94). Most of these enzymes showed similar catalytic abilities in converting 1 to 2 (Fig. 6a). However, the expression level of SdPCS in the root of S. divaricata was remarkably higher than those of the other homologous PCS genes (FPKM value, 162.05 vs 0–5.19) (Fig. 6a).

Fig. 6: High expression of SdPCS promotes the accumulation of furochromones in the roots of S. divaricata.

a Phylogenetic relationships, enzymatic activities, and expression abundances of Apiaceous PCSs. The PCS enzyme activity was quantified by HPLC/UV peak area of generated noreugenin (2) (λ = 280 nm, n = 3, three biologically independent samples were tested, and data are presented as mean values +/− SD). R/R, L, and P/S represent Root/Rhizome, Leaf, and Petiole/Stem, respectively. b Comparison of the FPKM between SdPCS and other potential Apiaceous PCSs in the genome-wide context. Each red line represents a log2 (FPKM ratio) between SdPCS and a potential Apiaceous PCS. Each grey density plot indicates the log2 (FPKM ratio) distribution of genome-wide orthologous genes of one Apiaceae species. c Syntenic regions containing Apiaceous PCSs. d Expression levels of the three PCS copies in S. divaricata (n = 3, three biologically independent samples were tested, data are presented as mean values +/− SD).

We are aware that direct inter-species comparison of FPKM might lead to misinterpretation, since the use of FPKM values is usually limited to the intra-species level. Instead, we compared the FPKM ratio between SdPCS and other potential Apiaceous PCSs against 9,470–27,214 pairs of orthologous genes as genomic background. The log2 (FPKM ratio) value of >95% of genome-wide ortholog pairs between S. divaricata and other Apiaceous species in root/rhizome is <5.00, with a mean near zero (−0.005). This result indicates that FPKM values of the investigated species are basically comparable. It is noteworthy that the log2 (FPKM ratio) between SdPCS and other Apiaceous PCSs in root/rhizome, ranging from 6.12 to 20.63 (mean = 14.15), was higher than that of 95% of genome-wide orthologous gene pairs (Fig. 6b). Thus, we deduced that the expression abundance of SdPCS was significantly higher than that of the other Apiaceous PCSs. As the initial step is usually the rate-limiting step in a biosynthetic pathway38, the exceptionally high expression of SdPCS may contribute to the furochromone accumulation in S. divaricata.
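The comparison against a genome-wide background can be sketched as follows. The pseudocount used to handle zero FPKM values and the toy background list are assumptions made for illustration; the text does not state how zero values were treated.

```python
from math import log2


def log2_fpkm_ratio(fpkm_a, fpkm_b, pseudocount=1e-4):
    """log2 ratio of two FPKM values; the pseudocount (an assumption) avoids
    division by zero when a gene is not expressed."""
    return log2((fpkm_a + pseudocount) / (fpkm_b + pseudocount))


def fraction_below(background, value):
    """Fraction of background log2 ratios that lie strictly below a value."""
    if not background:
        raise ValueError("background must not be empty")
    return sum(1 for b in background if b < value) / len(background)


# Toy background of ortholog-pair log2 ratios (hypothetical values).
background = [-1.0, 0.0, 0.5, 1.0, 6.0]
share_below_5 = fraction_below(background, 5.0)

# Ratio between a highly expressed gene and a weakly expressed ortholog.
target_ratio = log2_fpkm_ratio(162.05, 5.19)
```

A target ratio that exceeds the 95th percentile of the background indicates an expression difference larger than expected from species-level variation alone.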

Moreover, we traced the origin of SdPCS (SaDchr02G001054), and found that it might have originated from proximal duplication of the nearby SdPCS2 (SaDchr02G001052) (Fig. 6c). The PCS ML tree revealed that the three PCS copies in S. divaricata clustered in the same clade and that SdPCS2 diverged earlier, indicating that SdPCS originated from an S. divaricata-specific duplication event rather than being directly inherited from an ancestral species (Fig. 6a). Sequence analysis further supported this deduction. SdPCS2 showed the highest coding sequence (CDS) identity with other syntenic Apiaceous PCSs among the three SdPCSs (Supplementary Fig. 95). Although the chromosomal position of SdPCS3 (not anchored to any chromosome) is unknown, its lowest CDS identity with other Apiaceous PCSs indicated that it was younger than SdPCS and SdPCS2 (Supplementary Fig. 95). Therefore, SdPCS2 is most likely the direct progenitor of SdPCS (Fig. 6c). Notably, we found that SdPCS2 was barely expressed in any tissue (Fig. 6d), and its syntenic genes in other Apiaceous species were also barely expressed (Supplementary Table 18). Thus, the proximal duplication and high expression of SdPCS profoundly contributed to the biosynthesis of furochromones in the roots of S. divaricata.

The absence of functional PC gene leads to low furochromone contents in most Apiaceous plants

As shown in Fig. 7a, we did not detect syntenic genes of SdPC in the other Apiaceous plants. We retrieved all Apiaceous CYP736s and constructed an ML tree. The reported PpDC was also included22. A robust clade (BS = 100) containing SdPC and 18 other potential PCs was identified (Fig. 7b, Supplementary Fig. 64). Although DCAR_313054 and As05G02751 were not included in this clade, they were syntenic with several genes including SaDchr05G002066 and LCX7BG003145 (Supplementary Fig. 96). The 11 potential PCs from A. sinensis had lost one exon and were likely to have lost the cyclization activity, and thus were not analyzed further. Finally, we cloned and characterized the other 10 potential PCs. Strikingly, none of them except for SdPC was effective in producing visamminol (4) (Fig. 7c). This observation supported our hypothesis that the absence of functional PC genes may lead to the low contents of furochromones in most Apiaceous species. However, we cannot exclude the possibility that these or other potential PCs might weakly catalyze the reactions in vivo, as trace furochromones were detected in all Apiaceous plants.

Fig. 7: The absence of functional PC leads to low furochromone contents in most Apiaceous plants.

a Syntenic regions containing SdPC. The syntenic gene pairs are connected by grey lines. b Phylogenetic relationship and gene structure of Apiaceous PCs. As05G02748 (AsDC) is the same as As05G00644 in the initial annotation version. c HPLC/UV chromatograms showing the in vitro enzymatic activity of potential Apiaceous PCs (λ = 280 nm).

Since PpDC participates in the generation of furocoumarins in Peucedanum praeruptorum22, we tested the catalytic activities of homologous PCs using demethylsuberosin as substrate. SdPC4, AsDC, LcPC2 and DcPC showed cyclization activities (Supplementary Fig. 97). However, SdPC could not catalyze this reaction despite its high sequence similarity with SdPC4 (Supplementary Fig. 98). Interestingly, these four genes were located in the same syntenic block, which did not include SdPC (Supplementary Fig. 96). Thus, SdPC is a homologous enzyme with a novel function, and its evolutionary origin warrants further investigation.

In conclusion, this work dissected the complete biosynthetic pathway of prim-O-glucosylcimifugin and 5-O-methylvisamminoside, the major bioactive furochromone glucosides in S. divaricata. The functions of seven biosynthetic enzymes were characterized by enzymatic catalysis reactions, and the biosynthetic pathway was verified by de novo biosynthesis of major furochromones in Nicotiana benthamiana. Moreover, we explored the evolutionary mechanisms of furochromone biosynthesis in Apiaceae plants. Through comparative metabolic, genomic, and transcriptomic analyses of eight plant species, we found that proximal duplication and high expression of a pentaketide chromone synthase gene SdPCS, as well as the presence of a lineage-specific peucenin cyclase gene SdPC, contribute to the abundant and specific accumulation of furochromones in the roots of S. divaricata. This work provides critical insights into the biosynthesis of furochromones and serves as a platform for their production by metabolic engineering.

Methods

Materials and reagents

The sources of fresh plants of Saposhnikovia divaricata, Centella asiatica, Bupleurum chinense, Daucus carota, Angelica sinensis, Apium graveolens, and Coriandrum sativum are given in Supplementary Data 2. We sampled leaves, petioles, and roots of Saposhnikovia divaricata for both metabolic analyses and RNA-Seq, with roots collected at three different growth stages (Supplementary Table 15). For each tissue/stage, three replicates were sampled.

The chemical reference standards and sugar donors used in this study were purchased from YuanYe Biotechnology Co., Ltd. (Shanghai, China). Methanol and acetonitrile (Thermo Fisher Scientific, USA) were of HPLC grade. The conversion rates were determined by HPLC/UV analysis on an Agilent HPLC 1260 instrument. Samples were separated on a Zorbax SB-C18 column (4.6×250 mm, 5 μm, Agilent, USA). The column temperature was 30 °C. To calculate the conversion rates, peak areas of both substrate and product were integrated by Chromeleon® at a certain wavelength. LC/MS analysis was performed on a Q-Exactive quadrupole Orbitrap mass spectrometer (Thermo Fisher Scientific, USA).
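The conversion-rate calculation from integrated peak areas can be sketched as below. The formula assumes that substrate and product have equal UV response at the detection wavelength; this is our simplifying assumption and is not stated in the text, and the peak areas are hypothetical.

```python
def conversion_rate(substrate_area: float, product_area: float) -> float:
    """Percent conversion from HPLC/UV peak areas.

    Assumes equal detector response for substrate and product, so the
    product share of the total area approximates the molar conversion.
    """
    total = substrate_area + product_area
    if substrate_area < 0 or product_area < 0 or total == 0:
        raise ValueError("peak areas must be non-negative and not both zero")
    return 100.0 * product_area / total


# Hypothetical peak areas (arbitrary units) from one chromatogram.
rate = conversion_rate(substrate_area=250.0, product_area=750.0)
```

If the two compounds absorb differently at the chosen wavelength, each area would need to be corrected with a calibration curve before taking the ratio.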

Genome sequencing, assembly, and annotation

For the PacBio library construction, 15 μg of genomic DNA from the leaves of S. divaricata was fragmented to approximately 15 kb using g-TUBEs (Covaris, USA). After removing short fragments and single-strand overhangs, the retained fragments were converted into the proprietary SMRTbell library with the PacBio DNA Template Preparation Kit (Pacific Biosciences, CA, USA). Single Molecule Real Time (SMRT) sequencing was performed on a PacBio Sequel II sequencing platform. For Hi-C library construction, chromatin was first fixed in place with formaldehyde in the nucleus and then extracted. The extracted chromatin was digested with DpnII. The 5′ overhangs of the resulting fragments were then filled in with biotinylated nucleotides, and free blunt ends were ligated. After ligation, the DNA was purified from protein and processed following the manufacturer’s instructions for Illumina next-generation sequencing. The libraries were subsequently sequenced on Illumina HiSeq X, producing 166.99 Gb of 2 × 150 bp paired-end reads. The raw PacBio subreads were converted to HiFi reads by PBccs (v6.4.0) (https://github.com/PacificBiosciences/ccs), and subsequently assembled with Hifiasm (v0.16.0)39, which generated a pair of haplotype-resolved assemblies, hap1.p_ctg (1.95 Gb) and hap2.p_ctg (1.93 Gb). We selected the slightly larger hap1.p_ctg for subsequent BUSCO assessment, scaffolding, and annotation. The initial assembled contigs were anchored to chromosomes by the 3D-DNA pipeline (v201008)40 and further manually adjusted to produce a chromosome-level genome. BUSCO (v5.4.3)41 was used for benchmarking the genome with the “embryophyta_odb10” database.
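The contig N50 statistic reported for the assembly follows a standard definition, sketched here with toy contig lengths. The function is illustrative and is not part of the assembly pipeline described above.

```python
def n50(lengths):
    """Length L such that contigs of length >= L cover at least half the assembly."""
    if not lengths:
        raise ValueError("no contig lengths given")
    total = sum(lengths)
    running = 0
    # Walk from the longest contig down until half of the total is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length


# Toy contig lengths (arbitrary units), total = 32.
toy_n50 = n50([2, 2, 2, 3, 3, 4, 8, 8])
```

For the toy input, the two longest contigs already cover half of the total length, so the N50 equals the length of the second contig.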

EDTA (v2.0.0)42 was used to de novo identify, annotate, and classify the repetitive elements in the genome of S. divaricata. Prior to protein-coding gene annotation, the annotated repetitive elements in the genome were soft masked with bedtools (v2.28.0)43. RNA-Seq raw reads of S. divaricata were filtered with fastx-toolkits (v0.0.14) (http://hannonlab.cshl.edu/fastx_toolkit/index.html) and then assembled through Hisat2 (v2.2.1)44 and Stringtie (v2.2.0)45. The raw transcript assembly was further validated by PASA (v2.5.1)46 and then incorporated into the MAKER (v3.01.03)47 pipeline to automatically identify protein-coding genes. Finally, the gene models identified by MAKER were updated by PASA (v2.5.1)46. Some genes were identified and annotated by manual examination, and these were named differently from the genes annotated and named by software. To indicate the relative location of a manually annotated gene, we named it after its nearby upstream gene plus the suffix downstream_s1. For example, SaDchr03g003979_downstream_s1 (SdPT2) is a manually annotated gene located downstream of SaDchr03g003979. Functional annotation of the protein-coding genes was carried out by BLASTP searches against entries in both the NCBI non-redundant protein (NR) (https://www.ncbi.nlm.nih.gov/) and Swiss-Prot (https://www.uniprot.org/) databases. Conserved domains were predicted with InterProScan (v5.11-51.0)48. GO terms (http://geneontology.org/) and KEGG pathways (https://www.genome.jp/kegg/) were assigned to the genes with eggNOG-mapper (v2.1.10-0)49.

Total RNA isolation, RNA-Seq, and gene expression quantification

Total RNA was extracted with the TranZol™ kit (Transgen Biotech, China) following the manufacturer’s instructions, and was used to synthesize first-strand complementary DNA (cDNA) with TransScript one-step genomic DNA (gDNA) removal and cDNA synthesis SuperMix (Transgen Biotech, China). The transcriptomes of different tissues of S. divaricata were sequenced at Novogene Co., Ltd. (Beijing, China).

The raw RNA-seq reads were filtered in fastp50 with default parameters and then mapped to the reference genome of S. divaricata by Hisat2 (v2.2.1)44. The counts of reads mapping to exons of each gene were calculated by featureCounts51. The FPKM value of each gene was calculated in R.
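The FPKM calculation is stated above only as having been done in R. As an illustration of the standard formula (fragments per kilobase of exon per million mapped fragments), a minimal Python sketch follows. The function name, the use of the summed featureCounts counts as the library size, and the example values are assumptions made for illustration, not details taken from the study.

```python
def fpkm(counts, lengths_bp):
    """Return FPKM per gene from raw counts and total exon lengths (bp).

    FPKM = count * 1e9 / (exon_length_bp * total_assigned_fragments)
    Assumption: the library size is the sum of counts over all genes.
    """
    total = sum(counts.values())
    if total == 0:
        return {gene: 0.0 for gene in counts}
    return {
        gene: counts[gene] * 1e9 / (lengths_bp[gene] * total)
        for gene in counts
    }


# Hypothetical counts and exon lengths, for illustration only
example_counts = {"geneA": 500, "geneB": 1500}
example_lengths = {"geneA": 1000, "geneB": 3000}
example_fpkm = fpkm(example_counts, example_lengths)
```

In this example both genes have the same read density per base, so they receive the same FPKM even though their raw counts differ threefold.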

Genome-wide mining for furochromone biosynthetic genes

HMMER3 (v3.3.2)52 was used to identify PKSs, PTs, CYPs, OMTs and UGTs with an e-value of 1e−6. The Pfam profiles PF02797 and PF00195 were used for the type III PKS search. PF01040 and PF00067 were used to identify PTs and CYPs, respectively. PF00891 and PF08100 were employed to search for OMTs, and PF00201 was applied for UGT identification. Possible pseudogenes (predicted protein length < 200 amino acids) were discarded. Gene structures of all candidate genes were manually adjusted with IGV-GSAman (https://gitee.com/CJchen/IGV-sRNA).
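The two filters described above (an e-value threshold and a minimum length) can be sketched in Python as follows. This is a simplified illustration that assumes the HMMER hits have already been parsed into tuples. The function name, the inclusive treatment of the 1e−6 cutoff, and the example gene identifiers are hypothetical.

```python
E_VALUE_CUTOFF = 1e-6  # threshold stated in the text (treated as inclusive here)
MIN_LENGTH_AA = 200    # shorter predictions are treated as possible pseudogenes


def filter_candidates(hits, protein_lengths):
    """Keep genes with a significant Pfam hit and a sufficiently long protein.

    hits: iterable of (gene_id, pfam_id, e_value) tuples.
    protein_lengths: dict mapping gene_id to predicted protein length (aa).
    Returns a dict mapping each retained gene_id to its set of Pfam hits.
    """
    kept = {}
    for gene_id, pfam_id, e_value in hits:
        if e_value > E_VALUE_CUTOFF:
            continue  # hit is not significant enough
        if protein_lengths.get(gene_id, 0) < MIN_LENGTH_AA:
            continue  # too short, likely a pseudogene or fragment
        kept.setdefault(gene_id, set()).add(pfam_id)
    return kept


# Hypothetical hits, for illustration only
example_hits = [
    ("g1", "PF00201", 1e-50),  # kept
    ("g2", "PF00201", 1e-3),   # fails the e-value filter
    ("g3", "PF00067", 1e-20),  # fails the length filter
]
example_lengths = {"g1": 480, "g2": 450, "g3": 150}
example_candidates = filter_candidates(example_hits, example_lengths)
```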

Pearson correlation coefficients and P values between the contents of furochromones and the FPKM values of genes among different tissues of S. divaricata were calculated with the corr.test function in the R package psych (https://rdocumentation.org/packages/psych/versions/2.3.3). The correlation between the expression abundance of each gene and furochromone content was analyzed on identical sample sets, comprising 15 pairs of strictly matched samples. Unexpressed genes were excluded from the correlation analysis.
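For readers who want to see what the correlation step computes, here is a minimal Python sketch of the Pearson coefficient and the associated t statistic (n − 2 degrees of freedom), from which a P value would be obtained with a statistics library. It is not the psych implementation. The function names and example values are hypothetical, and skipping zero-variance genes is used here as a stand-in for excluding unexpressed genes.

```python
import math


def pearson_r(x, y):
    """Pearson correlation of two equal-length sequences.

    Returns None when either variable has zero variance, e.g. a gene
    whose FPKM is 0 in every sample (an unexpressed gene).
    """
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two sequences of equal length >= 3")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t statistic for testing r against 0, with n - 2 degrees of freedom."""
    if abs(r) >= 1.0:
        return math.copysign(math.inf, r)
    return r * math.sqrt((n - 2) / (1.0 - r * r))


# Hypothetical FPKM values and metabolite contents, for illustration only
example_fpkm = [1.0, 2.0, 3.0, 4.0]
example_content = [1.0, 3.0, 2.0, 4.0]
example_r = pearson_r(example_fpkm, example_content)
example_t = t_statistic(example_r, len(example_fpkm))
```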

Phylogenetic and microsynteny analyses

A maximum-likelihood (ML) phylogeny was constructed based on 398 strict single-copy orthologous genes identified by OrthoFinder (v2.5.4)53 to clarify the phylogenetic relationships among the eight Apiaceous species. The protein sequences were aligned by MUSCLE (v5.1.linux64)54 and subsequently concatenated by Phylosuite (v1.2.2)55. ModelTest-NG (v0.1.7)56 was used to detect the best-fit amino acid substitution model, based on which RAxML-NG (v1.1.0)57 was employed to construct the ML phylogeny with 1,000 bootstrap replicates. The phylogenies of biosynthetic genes were constructed with the same method.

The microsyntenic analyses generally followed the methods of Griesmann et al. (2018) and Yang et al. (2023). All-vs-all blastp (E-value = 1e−5) was conducted for the protein sequences among the eight Apiaceous genomes with BLAST (v2.13.0+)58. The output protein identity matrix was loaded into JCVI (v1.2.7) to produce collinear gene blocks. Subsequently, we identified the syntenic region containing the furochromone biosynthetic genes (±100 kb) in each species using the genome of S. divaricata as the reference. Because syntenic retention varied between species pairs, we compared the syntenic gene pairs for all species pairs and retained those gene pairs demonstrating consistent syntenic relationships. To eliminate bias introduced by annotation errors, we manually checked and corrected the gene structures in the syntenic regions and re-organized the microsyntenic gene pairs.
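One way to read the "±100 kb" criterion is as a flanking window around the reference gene cluster. The Python sketch below selects the genes overlapping such a window. The function name, coordinate layout, and example genes are hypothetical, and this interpretation of the window is an assumption rather than the authors' documented procedure.

```python
def genes_in_window(genes, chrom, anchor_start, anchor_end, flank_bp=100_000):
    """Return sorted IDs of genes overlapping the anchor region plus flanks.

    genes: dict mapping gene_id to (chromosome, start, end).
    The window spans [anchor_start - flank_bp, anchor_end + flank_bp].
    """
    lo = max(0, anchor_start - flank_bp)
    hi = anchor_end + flank_bp
    return sorted(
        gene_id
        for gene_id, (c, start, end) in genes.items()
        if c == chrom and end >= lo and start <= hi
    )


# Hypothetical coordinates, for illustration only
example_genes = {
    "a": ("chr03", 300_000, 302_000),  # inside the left flank
    "b": ("chr03", 400_000, 402_000),  # inside the anchor region
    "c": ("chr03", 700_000, 701_000),  # beyond the right flank
    "d": ("chr05", 400_000, 401_000),  # different chromosome
}
example_window = genes_in_window(example_genes, "chr03", 390_000, 410_000)
```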

Molecular cloning

The full-length candidate genes were amplified from cDNA with TransStart FastPfu DNA Polymerase (Transgen, China). Candidate genes for PCS, OMT and UGT were inserted into the pET-28a (+) vector (Invitrogen, USA) at the BamH I site. Candidate genes for PT, PC and CH were cloned into the pESC-Leu vector (Invitrogen, USA) at the BamH I site. Sequences of the primers used in this study are listed in Supplementary Table 19.

Expression of candidate biosynthetic genes

The recombinant plasmids for PCS, OMTs and UGTs were introduced into E. coli BL21 (DE3) (Transgen Biotech, China) for heterologous expression. The E. coli cells were grown in 500 mL Luria-Bertani medium (JS0666, JSENB, China) containing kanamycin (50 μg/mL) at 37 °C. After OD600 reached 0.4–0.6, the cells were induced with 0.1 mM IPTG at 18 °C. After 18–24 h, the cell pellets were harvested by centrifugation (5632 × g, 3 min at 4 °C), and then resuspended in 15 mL lysis buffer (50 mM NaH2PO4, 300 mM NaCl, 30 mM imidazole, pH 8.0). The cells were then disrupted by sonication on ice, and the cell debris was removed by centrifugation at 5632 × g for 50 min at 4 °C. The supernatant was collected, loaded onto a pre-equilibrated column (HisTrap™ HP, 5 mL, GE Healthcare), and eluted with elution buffer containing different concentrations of imidazole (50 mM NaH2PO4, pH 8.0, 300 mM NaCl, 30/300 mM imidazole)59. Approximately 0.5 mL glycerol (25%) was added to the purified protein solution, which was stored at −80 °C.

The recombinant plasmids for PT, PC and CH were introduced into Saccharomyces cerevisiae strain WAT11 for heterologous expression. The yeast cells were grown in synthetic dropout medium without leucine (SD-Leu). Liquid cultures of the recombinant strains were set up by picking a single colony and growing it in 50 mL of SD-Leu medium containing 20 g/L glucose at 28 °C overnight. The cells were collected by centrifugation (1000 × g, 2 min) and resuspended in 25 mL of SD-Leu medium containing 20 g/L galactose to induce target protein expression for 24–48 hours at 28 °C. The microsomes of yeast cells were prepared as reported31.

Enzyme activity assay

The purified proteins and prepared microsomes were used for functional characterization by in vitro enzymatic reactions. The reactions were conducted in 100 μL Tris-HCl buffer (50 mM, pH 8.0) containing 50 μg purified enzymes or 20 μL microsomes. The incubation mixtures included substrates (0.1 mM; malonyl-CoA for PCSs, noreugenin for PTs, peucenin for PCs, visamminol for OMTs and CHs, 5-O-methylvisamminol for CHs and SdUGT1/2, norcimifugin for OMTs, and cimifugin for SdUGT1/2) and donors/cofactors (0.5 mM; dimethylallyl pyrophosphate (DMAPP) and MgCl2 for PTs, nicotinamide adenine dinucleotide phosphate (NADPH) for PCs and CHs, S-adenosylmethionine (SAM) and dithiothreitol (DTT) for OMTs, and uridine diphosphate glucose (UDPG) for SdUGT1/2). The reactions were incubated in a shaking incubator for 2 hours (37 °C for OMTs and UGTs, 30 °C for PCSs, PTs, PCs and CHs). For PCSs, reactions were terminated by adding 10 μL 20% HCl followed by extraction with 300 μL ethyl acetate and redissolution in 100 μL MeOH. The other reactions were terminated by adding 100 μL ice-cold MeOH. The mixtures were then centrifuged at 21,130 × g for 20 min. The supernatants were analyzed by HPLC and LC/MS.

Samples were separated on a Zorbax SB-C18 column (4.6 × 250 mm, 5 μm, Agilent, USA). The HPLC methods are shown in Supplementary Table 20. LC/MS analysis was performed on a Q-Exactive hybrid quadrupole-Orbitrap mass spectrometer equipped with a heated ESI source (Thermo Fisher Scientific, USA). The MS parameters were as follows: sheath gas pressure 45 arb, aux gas pressure 10 arb, discharge voltage 4.5 kV, capillary temperature 350 °C. MS1 resolution was set as 70,000 FWHM, AGC target 1e6, maximum injection time 50 ms, and scan range m/z 100–1000. MS2 resolution was set as 17,500 FWHM, AGC target 1e5, maximum injection time 100 ms, NCE 35.

Biochemical properties assay

To optimize the pH value, reaction buffers with pH 4.0–6.0 (citric acid-sodium citrate buffer), 6.0–8.0 (Na2HPO4-NaH2PO4 buffer), 7.0–9.0 (Tris-HCl buffer), and 9.0–11.0 (Na2CO3-NaHCO3 buffer) were tested. To optimize the reaction temperature, the reactions were incubated at 4, 18, 30, 37, 45, or 60 °C. All enzymatic reactions (100 μL reaction mixtures, the same as those used for the enzyme activity assay) were conducted in three parallel experiments (n = 3). The reactions were terminated and centrifuged at 21,130 × g for 20 min for HPLC analysis as described above. The conversion rates (in percent) were calculated from the peak areas in HPLC/UV chromatograms (Agilent 1260, USA) as the peak area of the product divided by the total peak area of product and substrate. The catalytic efficiency of SdPCS was evaluated by the peak area of the target product (2).
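The conversion rate defined above, together with the mean and standard deviation over three parallel reactions, can be written as a short Python sketch. The function name and the peak areas are hypothetical illustration values. Like the definition in the text, the formula implicitly assumes that substrate and product have comparable UV response at the detection wavelength.

```python
import statistics


def conversion_rate(product_area, substrate_area):
    """Conversion rate in percent: product / (product + substrate) * 100."""
    total = product_area + substrate_area
    if total <= 0:
        raise ValueError("total peak area must be positive")
    return 100.0 * product_area / total


# Hypothetical (product, substrate) peak areas from three parallel reactions
example_triplicate = [(300, 700), (320, 680), (280, 720)]
example_rates = [conversion_rate(p, s) for p, s in example_triplicate]
example_mean = statistics.mean(example_rates)
example_sd = statistics.stdev(example_rates)  # sample standard deviation, n = 3
```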

Determination of kinetic parameters

Reactions were conducted in a final volume of 50 μL with 50 mM reaction buffer, a suitable concentration of protein or microsome, 1 mol/L of saturated donors/cofactors (UDPG for SdUGT1/2, DMAPP and MgCl2 for SdPT), and different concentrations of substrate (5 or 8 for SdUGT1/2, 2 for SdPT, 1 for SdPCS) (Supplementary Table 21, Supplementary Fig. 9d, Supplementary Fig. 11d). The reactions were quenched (by adding 70 μL pre-cooled methanol for SdUGT1/2 and SdPT, or by adding 5 μL 20% HCl followed by extraction with 300 μL ethyl acetate and redissolution in 100 μL MeOH for SdPCS) after incubating at the optimal temperature for a certain reaction time (15 min for SdUGT1/2, 60 min for SdPT and SdPCS), and then centrifuged at 21,130 × g for 20 min. The supernatants were used for HPLC analysis. All experiments were performed in triplicate. The conversion rates were calculated as described above. To determine the yields of 2 produced by SdPCS, we acquired its regression equation by testing calibration standard solutions (7.78 μM reference standard 2 diluted by 2, 4, 8, 16, 32, 64, 128 and 256 folds, Supplementary Fig. 99), and calculated the conversion rates of each reaction according to HPLC/UV analysis. The kinetic parameters were calculated from Michaelis-Menten plots fitted with GraphPad Prism 8.0 (ref. 60).
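To make the kinetic model concrete: the Michaelis-Menten equation is v = Vmax·[S] / (Km + [S]). The Python sketch below estimates Km and Vmax with the Hanes-Woolf linearization ([S]/v = [S]/Vmax + Km/Vmax), which needs only the standard library. This is a swapped-in technique for illustration and not the nonlinear regression that GraphPad Prism performs, so the two can give different estimates on noisy data. The function name and the synthetic data are hypothetical.

```python
def fit_michaelis_menten(substrate, velocity):
    """Estimate (Km, Vmax) by Hanes-Woolf linear regression.

    Regresses S/v on S: slope = 1/Vmax, intercept = Km/Vmax.
    """
    x = list(substrate)
    y = [s / v for s, v in zip(substrate, velocity)]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    vmax = 1.0 / slope
    km = intercept * vmax
    return km, vmax


# Synthetic, noise-free data generated with Km = 50 and Vmax = 2
# (arbitrary units), for illustration only
example_s = [10.0, 25.0, 50.0, 100.0, 200.0, 400.0]
example_v = [2.0 * s / (50.0 + s) for s in example_s]
example_km, example_vmax = fit_michaelis_menten(example_s, example_v)
```

Dividing Vmax by the molar enzyme concentration in the assay would then give kcat, and kcat/Km the catalytic efficiency.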

Scaled-up enzymatic reactions

To prepare the prenylated product, the reaction mixtures contained 100 μL buffer (50 mM Tris-HCl, pH 8.0), 0.2 mM noreugenin, 1.0 mM DMAPP, 2.0 mM MgCl2, and 20 μL microsome. A total of 1200 parallel tube reactions were conducted. The reactions were performed at 30 °C overnight and terminated by extraction with 4-fold volume of ethyl acetate. The organic solvent was removed under reduced pressure. The residue was dissolved in 1.5 mL of methanol. The products were then purified by reversed-phase semi-preparative HPLC. The structures were characterized by HRMS and extensive 1D and 2D NMR analyses.

To prepare the hydrolyzed product of visamminol-3’-O-glucoside, the reaction mixture contained 20 mL buffer (50 mM NaH2PO4-Na2HPO4, pH 6.0), 0.5 mM visamminol-3’-O-glucoside, and 200 mg β-glucosidase (Solarbio, Beijing, China). A total of 5 parallel tubes were used. The reactions were performed at 45 °C for 4 hours and terminated by extraction with 4-fold volume of ethyl acetate. The extract was treated as described above.

Crystallization and structural determination

The full-length cDNA of SdUGT2 was cloned into the pET-28a (+) vector. The S-tag of pET28a was removed. A TrxA-tag and a 6×His-tag followed by a thrombin site were added before the N-terminus of the target protein to facilitate purification. The TrxA-His-thrombin-SdUGT2 protein was expressed in E. coli (DE3) strain and purified by Ni-affinity chromatography (GE Healthcare). After purification, the recombinant protein was digested with thrombin to remove the tag (4 °C, 8 h). The sample was then mixed with Ni-NTA affinity beads a second time to separate the target protein from the cleaved tag. The flow-through was concentrated and then applied to size-exclusion chromatography on a Superdex™ 200 Increase 10/300 GL prepacked column (GE Healthcare) for further purification. The elution buffer was 20 mM Tris-HCl (pH 7.5) and 50 mM NaCl. Fractions containing SdUGT2 were collected and concentrated to 20 mg/mL, flash-frozen in liquid nitrogen, and then stored in a −80 °C freezer. The purified protein was incubated with 6 mM UDP for 2 h. The crystals of SdUGT2 were obtained after 5 days at 16 °C in hanging drops containing 1 μL of protein solution and 1 μL of reservoir solution (0.2 M lithium sulfate monohydrate, 0.1 M Bis-Tris pH 5.25, 28% w/v polyethylene glycol 3,350) (Supplementary Fig. 28). The crystals were flash-frozen in the reservoir solution supplemented with 25% (v/v) glycerol.

The diffraction data of the SdUGT2 crystals were collected at beamlines BL19U1 and BL02U1 of the Shanghai Synchrotron Radiation Facility (SSRF). The data were processed with XDS. The structures were solved by molecular replacement with Phaser. Crystallographic refinement was performed iteratively with Phenix and COOT. The refined structures were validated by Phenix and the PDB validation server (https://validate-rcsb-1.wwpdb.org/). The final refined structures were deposited in the Protein Data Bank. The diffraction data and structure refinement statistics are shown in Supplementary Table 14.

Molecular docking

Since the UDP-sugar binding domain is highly conserved among all reported UGT structures, we modeled the SdUGT2/UDPG complex structures by superimposing the UDP moiety of UDPG onto reported structures. With reference to the docking parameters of UGT71AP2, the grid box coordinates for SdUGT1/2 were x = −31.157, y = −21.476, and z = −11.443 (ref. 61). We then performed AutoDock analysis using the Lamarckian Genetic Algorithm with 250,000 evaluations in 100 cycles; all other parameters followed the default settings. We selected conformations for further structural analysis according to their binding energies and the likelihood that the glycosylation reaction could occur. For SdUGT2 in complex with UDPG and 8, we selected two conformations representing 11-O-glucosylation and 3′-O-glycosylation, respectively, among a set of 46 conformations with the lowest binding energy. For SdUGT2 in complex with UDPG and 9, we selected one conformation from a set of two conformations with the second lowest binding energy, since no conformation with lower binding energy was suitable for 3′-O-glycosylation (Supplementary Fig. 100).

De novo biosynthesis of furochromones in N. benthamiana

The full-length DNA sequences of SdPCS, SdPT, SdPC, SdOMT, SdCH and SdUGT1/2 were amplified with the primers given in Supplementary Table 19. The PCR products were sub-cloned into pDonr207 vectors with the Gateway BP Clonase II Enzyme Mix and then cloned into the pEAQ-HT-DEST1 vector with the Gateway LR Clonase II Enzyme Mix according to the manufacturer’s instructions62. The recombinant pEAQ-HT-DEST1-target gene vectors were transformed into Agrobacterium tumefaciens strain GV3101 by the chemical transformation method. Single colonies were inoculated into LB culture medium (50 μg/mL kanamycin and 50 μg/mL rifampicin) and shaken at 28 °C until OD600 = 0.6. After centrifugation, bacteria were re-suspended in MMA buffer to OD600 = 0.2 for each strain. Different strains were mixed for transformation. The infection solution was infiltrated into leaves of 5–6-week-old N. benthamiana. After 7 days, the samples were harvested and freeze-dried. The secondary metabolites were extracted with methanol and analyzed by LC/MS. The contents of compounds 5, 6, 8 and 9 were quantified by regression equations. Reference standards 5, 6, 8 and 9 were each dissolved in DMSO to make solutions of 1 mg/mL, which were mixed 1:1 to obtain the mixed stock solution. The stock solution was serially diluted with methanol containing 4 μg/mL bergenin as internal standard to obtain calibration standard solutions (diluted by 2, 4, 8, 16, 32, 64, 128, 256, 512, 1,024, 2,048, 4,096, 8,192, 16,384, 32,768, 65,536 and 131,072 folds, respectively). The regression equations of 5, 6, 8 and 9 are listed in Supplementary Figs. 101–104. The LC/MS method parameters are listed in Supplementary Table 20. The data were analyzed using Xcalibur™ 4.3 software. The yields of 6 and 9 in each group were the average contents of six independent biological replicates.
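The calibration procedure above amounts to a two-fold serial dilution followed by a linear regression of the analyte-to-internal-standard peak-area ratio against concentration. A minimal Python sketch follows. The function names, the assumed stock concentration of 250 μg/mL per compound (four 1 mg/mL standards mixed in equal volumes), and the synthetic response values are assumptions made for illustration. The actual regression equations are those in the Supplementary Figures.

```python
def dilution_series(stock_conc, n_steps, factor=2):
    """Concentrations after diluting the stock by factor**1 .. factor**n_steps."""
    return [stock_conc / factor ** k for k in range(1, n_steps + 1)]


def linear_fit(x, y):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def quantify(area_ratio, slope, intercept):
    """Invert the calibration line to get a concentration from an area ratio."""
    return (area_ratio - intercept) / slope


# 17 two-fold dilutions (2 to 131,072 fold) of an assumed 250 ug/mL stock
example_concs = dilution_series(250.0, 17)
# Synthetic analyte/internal-standard area ratios, for illustration only
example_ratios = [0.04 * c + 0.001 for c in example_concs]
example_slope, example_intercept = linear_fit(example_concs, example_ratios)
example_estimate = quantify(0.401, example_slope, example_intercept)
```

Normalizing each analyte peak area to the bergenin peak area before regression compensates for injection-to-injection variation in instrument response.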

Metabolite quantification

The secondary metabolites of different Apiaceae plants were extracted by methanol, and analyzed by LC/MS following the methods mentioned above.

Statistics and reproducibility

No data were excluded from the analyses. The experiments were not randomized.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.