Introduction

Benzylisoquinoline alkaloids (BIAs) are a diverse group of approximately 2500 compounds with significant pharmacological properties, including analgesic, antiseptic, and antineoplastic effects1. These compounds are primarily found in early-diverging eudicots (e.g., Ranunculales and Proteales) and basal angiosperm Magnoliids (e.g., Magnoliales, Laurales, and Piperales) (Supplementary Fig. 1a)2,3,4. BIA biosynthesis begins with the condensation of dopamine and 4-hydroxyphenylacetaldehyde (4-HPAA), catalyzed by norcoclaurine synthase, forming norcoclaurine (NCC)5. This intermediate serves as the foundation for diverse BIA classes, such as bisbenzylisoquinolines, aporphines, morphinans, protopines, berberines, and phthalideisoquinolines (Supplementary Fig. 1b)3,6,7. Although BIA biosynthesis can be complex, these classes largely stem from the mainstream pathway of 1-benzylisoquinolines (1-BIAs), which lack a direct C-C coupling between the phenolic group and the isoquinoline ring. Biosynthesis of 1-BIAs mainly involves methylation and hydroxylation of NCC. Briefly, NCC is 6-O-methylated and N-methylated by 6OMT and CNMT enzymes, respectively, forming (S)-N-methylcoclaurine (NMC). NMC is then converted by CYP80B and 4’OMT enzymes to (S)-reticuline (RTC), a central branch-point intermediate for synthesizing diverse BIA backbones through C-C coupling and other modifications (Supplementary Fig. 1c)8,9,10,11.

The diversification of BIAs in land plants likely proceeded in a step-wise fashion, with the early-diverging Magnoliids species accumulating structurally simple BIAs (e.g., 1-BIAs and aporphines) and the later-diverging eudicot Ranunculales species accumulating structurally more complex BIAs (e.g., sanguinarine, berberine and morphinan) (Supplementary Fig. 1a)2,3. While BIA pathways in Ranunculales (e.g., Papaver somniferum) and Proteales species (e.g., Nelumbo nucifera) have been extensively studied12,13,14,15, their biosynthesis and evolution in Magnoliids remain poorly understood. Recent years have seen an increasing number of studies on the early-diverging angiosperm group Magnoliids, revealing new insights into the diversification of BIAs. For instance, a study of Liriodendron chinense (Magnoliales) led to the discovery of not only a functionally conserved 6OMT (LcOMT1), but also a novel NMT (LcNMT1) capable of catalyzing both mono- and di-methylation of (S)-coclaurine (COC)16. In Aristolochia contorta (Aristolochiaceae, Magnoliids), researchers identified 6OMTs (AcOMT1 and AcOMT2) and a promiscuous CYP80Q enzyme (AcCYP80Q8) that converts (S,R)-NMC into (S,R)-glaziovine. This contrasts with the stereospecific CYP80Q (NnCYP80Q1) in Nelumbo nucifera (Proteales), which exclusively recognizes (R)-NMC as a substrate15,17,18. While these limited studies indicate both conservation and diversification of BIA biosynthesis between Magnoliids and eudicot plant families, a systematic analysis of the complete mainstream pathway for 1-BIA biosynthesis in Magnoliids is still lacking. This knowledge gap hinders a comprehensive understanding of BIA evolution across flowering plants.

Here, we focused on Houttuynia cordata (Saururaceae), a herbaceous Magnoliids species known as Yu Xing Cao and widely used in Traditional Chinese Medicine (Fig. 1a)19. We characterized a cascade of key enzymes (6OMT, NMT, CYP80B, and 4’OMT) responsible for RTC biosynthesis in H. cordata, providing compelling evidence for conservation of 1-BIA biosynthesis between Magnoliids and other angiosperm families. We also identified a neofunctionalized NMT (HcNMT5) that specifically produces di-methylated COC and the first reported isoboldine synthase (HcCYP80G2) in plants. Intriguingly, evolutionary analysis reveals that the CYP80B, 4’OMT and 6OMT genes are frequently clustered across diverse Magnoliids and Ranunculales species. This gene clustering correlates with coordinated chemical modifications on the phenol rings of RTC, enabling selective C-C coupling reactions to generate specific scaffolds for diverse BIAs in flowering plants. The chemistry-associated gene clustering may thus complement other widely debated scenarios explaining metabolic gene clustering, such as intermediate toxicity, coordinated gene co-expression and natural selection20,21,22,23. This work not only identifies the conservation and diversification of BIA biosynthesis in angiosperms, but also shows that biochemical reaction interdependency may be an overlooked evolutionary driver of gene cluster formation.

Fig. 1: Characterization of 6OMTs and NMTs in H. cordata.

a Images of H. cordata showing its leaves and rhizomes. b Detection of four major 1-BIAs (compared to standards) in plant tissues of H. cordata using UPLC-MS/MS. Extracted ion chromatograms are shown. c Enzyme assay of Hc6OMT1 or Hc6OMT2 supplied with norcoclaurine (NCC) in the transient N. benthamiana system. Coclaurine (COC) was detected by UPLC-MS exclusively in samples expressing 6OMTs and matched a chemical standard. d Enzyme assay of HcNMTs supplied with COC in the transient N. benthamiana system. Aligned extracted ion chromatograms from UPLC-MS analysis are shown for the standards and tobacco samples. Mono- or di-methylated products are shown. In both (c, d), tobacco leaves expressing an MNG fluorescent protein were used as the negative control. e Partial sequence alignment of HcNMTs and characterized CNMTs. An (S)-reticuline (RTC) N-methyltransferase, PsRNMT (Papaver somniferum; Ranunculales), with di-methylation functionality is included. The key amino acid residue is highlighted in a red box. f Proposed MGC biosynthetic pathways in H. cordata.

Results

Identification of genes for 1-BIA biosynthesis in H. cordata

In Magnoliids plants, the prevalent types of BIAs are structurally simple 1-BIAs and aporphines (Supplementary Fig. 1a)3. To gain insights into 1-BIA biosynthesis in H. cordata (Fig. 1a), we purchased four available 1-BIA standards (NCC, COC, NMC and RTC; Supplementary Table 1) and applied targeted metabolic analysis to two plant tissues: rhizomes and leaves (Fig. 1a). Ultra-performance liquid chromatography coupled with tandem mass spectrometry (UPLC-MS/MS) analysis showed that the four 1-BIAs were present in H. cordata, with predominant accumulation in rhizomes (Fig. 1b; Supplementary Fig. 2). To identify 1-BIA biosynthetic genes in H. cordata, we carried out homology searches in the recently published H. cordata genome24 for enzyme families involved in the RTC pathway, including the OMT, NMT and CYP80 families (see “Methods”; Supplementary Tables 2–4). This analysis identified two 6OMT, two 4’OMT, five NMT and seven CYP80 genes (Supplementary Figs. 3–5; Supplementary Table 5). Examination of their gene expression patterns across six plant tissues revealed that ~80% of these genes share similar expression profiles (Supplementary Fig. 6).

Functional characterization of 6OMTs in H. cordata

According to previous studies25, 6OMT catalyzes the rate-limiting step in 1-BIA production, which is also the first modification of the fundamental BIA backbone, NCC. To verify the functions of the two candidate 6OMT genes (Hc6OMT1 and Hc6OMT2), we first cloned them from a cDNA library and tested their activities using a hyper-trans expression system in Nicotiana benthamiana leaves26,27. Agrobacterium strains carrying Hc6OMT1 or Hc6OMT2 were infiltrated into tobacco leaves, which were left for about three days to enable protein synthesis, followed by infiltration of NCC (not natively synthesized in tobacco). Metabolites extracted from leaf tissues after three days were analyzed by UPLC-MS. Compared to the control expressing a fluorescent protein (instead of 6OMT), both Hc6OMT1 and Hc6OMT2 samples produced a new compound with retention time and MS spectra matching an authentic COC standard (Fig. 1c). Given that NCC and COC were detected in H. cordata tissues (Fig. 1b), these results confirm that Hc6OMT1 and Hc6OMT2 encode functional enzymes responsible for norcoclaurine 6-O-methylation in H. cordata.

Functional characterization of CNMT in H. cordata

The N-methylation of COC, catalyzed by COC N-methyltransferase (CNMT), was recently characterized in the Magnoliids species L. chinense16. To investigate the functions of the five candidate HcNMT genes, we tested them using the transient expression system in tobacco described earlier. When COC was supplied, samples expressing HcNMT2, HcNMT3, or HcNMT4 produced a compound identified as NMC based on its matching retention time and MS spectra with an authentic NMC standard (Fig. 1d). In contrast, no NMC-related molecules were identified in HcNMT1 or HcNMT5 samples, suggesting these paralogs may have lost NMC-producing activity. In L. chinense, the characterized CNMT (LcNMT1) exhibits dual functionality, producing both NMC (mono-methylated COC) and magnocurarine (MGC; di-methylated COC). This contrasts with Ranunculales CNMTs, which exclusively synthesize NMC10. Similarly, MGC was detected in tobacco samples overexpressing HcNMT2, HcNMT4, or HcNMT5, implying that Magnoliids CNMTs may have acquired broader or new catalytic roles. Strikingly, HcNMT5 exclusively produced MGC (Fig. 1d). A single E204G mutation in LcNMT1 has been shown to shift its activity to MGC-only production16. The glycine (G) residue is also present in PsRNMT (Papaver somniferum; Ranunculales), which has di-methylation functionality on (S)-RTC (Supplementary Fig. 1c, e)28,29. Consistent with this, sequence alignment revealed that a glycine occupies the equivalent position in HcNMT5 (Fig. 1e). Therefore, HcNMT5 functions as a natural COC di-methyltransferase, specifically generating MGC in H. cordata (Supplementary Fig. 7). Our work thus reveals a complex biosynthetic network for MGC production in H. cordata, underpinned by gene duplication and neofunctionalization (Fig. 1f).

To further test the results of enzymatic activities of 6OMTs and NMTs obtained from transient assays in N. benthamiana, we evaluated their functionality using crude protein extracts from tobacco leaves incubated with the substrates and the cofactor S-adenosyl methionine (SAM). These experiments confirmed their roles as methyltransferases (Supplementary Fig. 8).

Functional characterization of (S)-reticuline biosynthesis in H. cordata

Previous studies have shown that CYP80B and 4’OMT are involved in (S)-RTC biosynthesis8,9. CYP80B catalyzes the 3’-hydroxylation of NMC, followed by 4’OMT-mediated 4’-O-methylation to produce RTC (Supplementary Fig. 1c). Our homology search identified one CYP80B (HcCYP80B) and two 4’OMTs (Hc4’OMT1 and Hc4’OMT2) (Supplementary Figs. 3 and 5). Interestingly, HcCYP80B and Hc4’OMT1 are tightly linked on the chromosome and exhibit highly similar expression patterns across major plant tissues (Fig. 2a). While multiple CYP80B gene copies exist in the flanking region of Hc4’OMT2, these genes are likely undergoing pseudogenization (Fig. 2a). The three genes were successfully cloned and functionally tested in tobacco leaves. When NMC was supplied as a substrate, tobacco expressing HcCYP80B produced a compound (1) with m/z = 316, 16 units higher than NMC (m/z = 300). Co-expression of HcCYP80B with Hc4’OMT1 led to a sharp decline in 1 and the production of RTC, confirming 1 as 3’-hydroxy-NMC. Hc4’OMT2 displayed comparable catalytic capability, with seemingly less efficiency (Fig. 2b). Moreover, Hc4’OMT2 showed negligible expression across all tissue transcriptomes (Fig. 2a). Together, these results indicate that the HcCYP80B-Hc4’OMT1 mini-gene cluster drives NMC-to-RTC conversion in H. cordata (Fig. 2c).
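
The mass shifts used above to assign compound 1 can be checked with a short calculation. The following is a minimal sketch, not part of the original analysis: it assumes the standard molecular formulas of NMC (C18H21NO3), its mono-hydroxylated product (C18H21NO4) and RTC (C19H23NO4), and computes the protonated monoisotopic m/z expected in positive-ion mode. The function names are illustrative.

```python
import re

# Monoisotopic masses (u) of the most abundant isotopes.
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "N": 14.00307401, "O": 15.99491462}
PROTON = 1.00727646  # proton mass, added for [M+H]+ in positive-ion mode


def monoisotopic_mass(formula: str) -> float:
    """Sum element masses for a simple formula such as 'C18H21NO3'."""
    total = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += MONOISOTOPIC[element] * (int(count) if count else 1)
    return total


def mz_protonated(formula: str) -> float:
    """Expected m/z of the singly protonated ion [M+H]+."""
    return monoisotopic_mass(formula) + PROTON


# Formulas are assumptions based on the known structures of these compounds.
compounds = {
    "NMC": "C18H21NO3",
    "3'-hydroxy-NMC (compound 1)": "C18H21NO4",
    "RTC": "C19H23NO4",
}

for name, formula in compounds.items():
    print(f"{name}: [M+H]+ = {mz_protonated(formula):.4f}")
```

Under these assumptions the nominal values reproduce the m/z of 300, 316 and 330 monitored in the assay: hydroxylation adds one oxygen (+16) and O-methylation adds CH2 (+14).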

Fig. 2: Characterization of a gene cluster that determines (S)-reticuline biosynthesis in H. cordata.

a Expression patterns of the genes in the CYP80B-4’OMT cluster (Chromosome 11a) and its duplicated region (Chromosome 01a). Fragments Per Kilobase of transcript per Million mapped fragments (FPKM) values were derived from the published dataset24 and used to create the heatmap. The cluster is drawn based on the annotation file. Three truncated CYP80B sequences identified in Chr. 01a are shown with dashed arrows. b Enzyme assay of HcCYP80B and Hc4’OMTs supplied with N-methylcoclaurine (NMC) in the transient N. benthamiana system. Aligned extracted ion chromatograms (m/z = 316; m/z = 330) from UPLC-MS analysis are shown for the (S)-reticuline (RTC) standard and tobacco samples. c Schematic view of (S)-reticuline (RTC) biosynthesis determined by the HcCYP80B-Hc4’OMT1 gene cluster in H. cordata.

In summary, mainstream 1-BIA biosynthesis in flowering plants--including Magnoliids and Ranunculales--likely relies on conserved 6OMT, CNMT, CYP80B, and 4’OMT (sub)-families.

Identification of a novel aporphine synthase in H. cordata

In addition to 1-BIAs, H. cordata produces aporphine alkaloids likely derived from 1-BIAs30. For instance, aristolactam--a nephrotoxic compound--is an aporphine alkaloid frequently detected in H. cordata31,32. Studies indicate that CYP80Q and CYP80G, members of the CYP80 family, play key roles in aporphine alkaloid backbone formation via C–C coupling reactions15,33. To identify candidate aporphine alkaloid biosynthetic genes in H. cordata, we successfully cloned HcCYP80Q1 and two CYP80G genes (HcCYP80G1 and HcCYP80G2) and tested their activity in the tobacco transient expression system, co-infiltrated with putative substrates (NMC or RTC). We next screened for potential products exhibiting a 2-unit mass reduction relative to substrates, consistent with C–C coupling. While no targeted products were detected for CYP80Q (Supplementary Fig. 9), CYP80G assays with RTC yielded distinct products. UPLC-MS/MS analysis revealed that HcCYP80G1 expression generated a new compound (2) compared to the control, while HcCYP80G2 produced an additional compound (3) (Fig. 3a). In Coptis japonica (Ranunculales), CYP80G converts RTC to (S)-corytuberine (CTB) via a C2’-C8 phenol coupling reaction33. However, 2 and 3 did not co-elute with the CTB standard in the UPLC-MS/MS analysis (Fig. 3a). Several mechanistic studies indicate that CYP80G-mediated phenol coupling begins with hydrogen abstraction at the 3’-OH, followed by radical migration to reactive carbons (Supplementary Fig. 10)13,33,34. Chemically, ortho- and para-carbons in a hydroxyphenyl group are more reactive than meta-carbons due to electronic and steric effects. Beyond C2’–C8 coupling (as in CTB), alternative products such as isoboldine (IBD), a C6’–C8-coupled aporphine alkaloid, are plausible. Recent work has also proposed that IBD originates from RTC35. Using an IBD standard, we confirmed that 3 matched IBD in retention time and fragmentation patterns. Detectable IBD levels in H. cordata leaves and rhizomes further support HcCYP80G2 as an isoboldine synthase (Fig. 3a and Supplementary Fig. 11). Additionally, we directly determined the in vitro enzyme activity using crude protein extracts from tobacco leaves incubated with the substrate RTC (see “Methods”). Again, only HcCYP80G2, and not HcCYP80G1 or the negative control, was found to produce IBD (Supplementary Fig. 12). To the best of our knowledge, this is the first reported plant enzyme responsible for IBD biosynthesis. Compound 2, however, could not be purified or matched to known standards.
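
The screening step described above, searching for ions 2 mass units below the substrate, can be illustrated as follows. This is a minimal sketch under stated assumptions rather than the workflow actually used: the peak list, the 10 ppm tolerance, and the names `Peak` and `coupling_candidates` are all hypothetical, and the RTC [M+H]+ value of 330.1700 is the calculated monoisotopic value.

```python
from typing import List, NamedTuple

H_MASS = 1.00782503  # monoisotopic mass of a hydrogen atom (u)


class Peak(NamedTuple):
    mz: float      # measured m/z of the precursor ion
    rt_min: float  # retention time in minutes


def coupling_candidates(peaks: List[Peak], substrate_mz: float,
                        ppm: float = 10.0) -> List[Peak]:
    """Return peaks whose m/z equals the substrate m/z minus two hydrogens.

    Intramolecular C-C phenol coupling removes two H atoms, so the product
    ion is expected about 2.0157 units below the substrate ion.
    """
    target = substrate_mz - 2 * H_MASS
    tolerance = target * ppm / 1e6
    return [p for p in peaks if abs(p.mz - target) <= tolerance]


# Hypothetical peak list for illustration only (not measured data).
example_peaks = [
    Peak(316.1543, 6.1),
    Peak(328.1543, 7.9),
    Peak(328.1545, 9.4),
    Peak(330.1700, 10.2),
]

hits = coupling_candidates(example_peaks, substrate_mz=330.1700)
for peak in hits:
    print(f"candidate: m/z {peak.mz:.4f} at {peak.rt_min} min")
```

Candidates found this way still need confirmation against authentic standards by retention time and fragmentation, since isomers such as CTB and IBD share the same precursor mass.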

Fig. 3: Characterization of a CYP80G-mediated intramolecular C8-C6’ phenol coupling of RTC for isoboldine biosynthesis in H. cordata.

a Detection of HcCYP80G1 and HcCYP80G2 activity on (S)-reticuline (RTC) in the transient tobacco system by UPLC-MS/MS. STD Standard, CTB (S)-corytuberine, IBD isoboldine. RTC was supplied as the substrate. Ions with an m/z 2 units lower than that of RTC were searched. 2 and 3 were identified as products. Notably, the peaks co-eluting with CTB in plant tissue samples showed distinct parent ions in UPLC-MS/MS. b Molecular docking between the AlphaFold3-predicted HcCYP80G2 structure and the ligand RTC. c The proposed catalytic mechanism of HcCYP80G2. Based on the result in (b), it is likely that the first hydrogen abstraction occurs at the 3’-OH group, followed by the second hydrogen abstraction at the 7-OH group. The C8-C6’ linkage is then formed through two one-electron transfers and subsequent radical coupling.

The catalytic mechanism for CTB synthase (CjCYP80G2) was previously proposed33 (Supplementary Fig. 10). We next generated a 3D structure of IBD synthase (HcCYP80G2) using AlphaFold3 and docked it with the RTC substrate (Fig. 3b). In the simulation, the 3’-OH oxygen atom of RTC is positioned 4.5 Å from the Fe(II) center in the heme, while the distance for the 7-OH group is 6.2 Å (Fig. 3b). For CTB production in Coptis japonica, a two-step Fe=O-mediated hydrogen abstraction (first at the 3’-OH, then at the 7-OH) has been proposed, followed by radical migration to the C8 and C2’ carbons and C–C coupling33. Given the high similarity of the docking simulation results between the CTB and IBD synthases, it is likely that the radical resulting from hydrogen abstraction at the 3’-OH migrates to the para-carbon (C6’), giving rise to a distinct C–C coupling reaction and the subsequent formation of IBD (Fig. 3c).

Large-scale identification of “OMT-CYP” gene clustering across a wide range of BIA-producing plant species

Despite the wide BIA distribution in Magnoliids, biosynthesis pathways in this clade remain poorly characterized. To advance our understanding of BIA evolution, we investigated the conservation and diversification of the CYP80B-4’OMT mini-gene cluster across Magnoliids. We performed synteny analysis on genomes of seven BIA-producing species spanning five plant families (Fig. 4 and Supplementary Table 6). The CYP80B-4’OMT cluster was present in all examined genomes and localized to conserved syntenic genomic blocks, except in Annona cherimola (Annonaceae) (Fig. 4). Intriguingly, 6OMT genes were identified in the CYP80B-4’OMT flanking region in four species from three plant families (Annonaceae, Magnoliaceae and Aristolochiaceae) (Fig. 4). Notably, the 6OMT genes within syntenic blocks of L. chinense (Magnoliaceae) and Aristolochia contorta (Aristolochiaceae)--functionally characterized in recent studies16,17--share synteny with Hc6OMT in H. cordata. Given the phylogenetic diversity of our sampled genomes, we propose that the 6OMT-CYP80B-4’OMT gene cluster likely originated in early flowering plant ancestors prior to the divergence of Magnoliales and Piperales. In contrast, the CYP80B-4’OMT cluster occupies distinct genomic blocks relative to 6OMT gene in H. cordata, Saururus chinensis and Piper nigrum (Fig. 4), suggesting additional translocation of this module in Saururaceae and Piperaceae during evolution.

Fig. 4: Genomic conservation of the 6OMT-CYP80B-4’OMT gene clustering in Magnoliids.

The species relationship is shown as a tree on the left. The plant families within the Magnoliids are underscored. The collinearity of the 6OMT (green), CYP80B (red), and 4’OMT (blue) genes are shown on the right.

BIA-related gene clusters were first characterized in Papaver somniferum (Ranunculales) for morphine and noscapine production36. However, these gene clusters appear unique to P. somniferum, and earlier studies did not include 6OMT, CYP80B, and 4’OMT genes. Remarkably, reanalysis of the P. somniferum genome revealed gene clusters containing PsCYP80B paired with either Ps6OMT or Ps4’OMT (Supplementary Fig. 13). Similar CYP80B-6OMT or CYP80B-4’OMT gene clusters were recently identified in three additional Ranunculales species (Stephania japonica, Stephania yunnanensis, and Stephania cepharantha)37. Expanding this analysis to diverse BIA-producing Ranunculales species confirmed the widespread occurrence of such clusters, though the 6OMT-CYP80B type dominated in this order (Supplementary Fig. 13).

Collectively, our large-scale analysis of BIA-producing plants across basal angiosperms (Magnoliids) and early-diverging eudicots (Ranunculales) reveals dynamic gene clustering of 6OMT, CYP80B, and 4’OMT genes, which encode critical steps for BIA backbone biosynthesis. This conserved genomic architecture thus provides a potential marker for identifying BIA pathways in other plant lineages.

Discussion

Here, we have characterized a cascade of key enzymes critical for the core BIA backbone formation in H. cordata. While these catalyzing enzyme (sub)families are likely conserved between Ranunculales and Magnoliids (two major BIA-producing clades in flowering plants), functional diversification is evident through gene duplications in H. cordata. This is reflected by (1) variable catalytic activity between paralogs; (2) sub- or neo-functionalized gene cop(ies) such as HcNMT5 and HcCYP80G2. As a polyploid species, H. cordata’s whole-genome duplication (WGD) events may have driven this diversification, though mechanistic links require further investigation.

Our work also reveals widespread dynamic 6OMT-CYP80B-4’OMT gene clustering across species in Ranunculales and Magnoliids. Unlike most biosynthetic gene clusters (BGCs)21,38, which are typically lineage-restricted, these clusters occur dynamically across phylogenetically distant flowering plants (Supplementary Fig. 13). While metabolic gene clustering has emerged as a key framework for studying genome evolution and metabolic diversification, its evolutionary drivers remain poorly understood and debated21,23. A prevailing hypothesis posits that clustering facilitates coordinated spatiotemporal gene regulation via genetic or epigenetic modulations39,40,41. Others argue that natural selection—either purifying or positive—drives cluster assembly21,42. For instance, purifying selection may maintain the thalianol cluster in Arabidopsis thaliana, as mutations in these genes produce toxic intermediates38,43. Contrarily, positive selection likely preserves BGCs encoding defensive compounds, favoring co-inheritance of functionally linked genes22,44,45. While these models could theoretically explain 6OMT-CYP80B-4’OMT clustering, their prevalence across angiosperms suggests additional mechanisms may exist.

The three genes encode enzymes that modify phenyl groups to generate RTC—the central intermediate skeleton for other BIAs via C–C coupling reactions—resulting in a conserved structural feature: one methylated -OH and one free -OH group on its phenyl rings (Fig. 5). To investigate the functional necessity of these modifications, we revisited three downstream enzymes catalyzing RTC’s intramolecular C–C coupling: the sinoacutine synthase MdCYP80G10 in Menispermum dauricum13, the salutaridine synthase PsCYP719B1 in P. somniferum34, and the (S)-corytuberine synthase CjCYP80G2 in Coptis japonica33 (Supplementary Fig. 10). Despite differing evolutionary origins and coupling products, all enzymes require a free -OH group for catalysis (e.g., hydrogen abstraction) (Supplementary Fig. 10). Complete methylation of both -OH groups would block coupling, while unmethylated intermediates with multiple free -OH groups impair catalytic activity. Consistent with this, these enzymes exhibit strict substrate selectivity, accepting only intermediates with one free -OH group11,33. As noted by Jonathan E. Poulton in The Biochemistry of Plants (1981), “methylation … reduces the chemical reactivity of the phenolic groups…. [and] may play a crucial role in directing intermediates toward specific biosynthetic pathways (of BIAs).” In our newly proposed model, the CYP80B generates a hydroxyl group required for subsequent C–C coupling; and the 6OMT or 4’OMT-mediated methylation acts as ‘protecting groups’6, ensuring regio-selectivity by preventing aberrant reactions. Therefore, the interplay between free -OH availability and methylation-directed selectivity might provide a mechanistic rationale for the evolutionary retention of 6OMT-CYP80B-4’OMT dynamic gene clustering in diverse plant families (Fig. 5). The clustering of genes associated with interlinked biochemical reactions thus inspires new perspectives on the evolutionary forces driving the formation of metabolic gene clusters in plants.

Fig. 5: A proposed model for dynamic 6OMT–CYP80B–4’OMT gene clustering promoted by interlinked biochemical reactions in plants.

The coupled chemical reactions (phenyl hydroxylation and O-methylation) are essential for the intramolecular selective C–C coupling of RTC, which generates a series of specialized BIAs (three reported examples plus the HcCYP80G2 with distinct C–C couplings are illustrated; see details in Supplementary Fig. 10). We speculate that the interdependence between the two functional groups underpins the dynamic gene clustering of the 6OMT, CYP80B, and 4’OMT genes across a wide range of plant lineages.

Methods

Plant materials and chemical standards

The H. cordata and N. benthamiana plant samples were grown in a growth chamber (16 h of light/8 h of dark, 22 °C) at Shanghai Jiao Tong University. All chemical standards used in this study are listed in Supplementary Table 1.

Bioinformatics analysis

The genome sources used in this study are listed in Supplementary Table 6. Information on the reference proteins is listed in Supplementary Tables 2–4. For identification of OMTs, NMTs, and CYP80s, BLASTP was applied with cut-offs for E value (<1e−10) and sequence identity (>40%). The output was further filtered by sequence length (300–700 residues for CYPs and 200–500 residues for OMTs and NMTs)46,47,48. Multiple sequence alignment was generated with MUSCLE49. The maximum likelihood (ML) tree was constructed using FastTree50. The phylogenetic tree was visualized using iTOL (https://itol.embl.de). Microsynteny analysis was performed using the MCscan package with default parameters (https://github.com/tanghaibao/jcvi/wiki/MCscan-(Python-version)). We obtained the gene expression matrix using the published Houttuynia cordata transcriptome data24. The raw paired-end reads were processed for quality control and trimming using Trimmomatic 0.39 with default settings51. The cleaned reads were mapped to the reference genome using the HISAT2 software in directional mode52. Subsequently, for each sample, the mapped reads were used to assemble transcripts with StringTie through a reference-guided assembly approach to output the FPKM values53.
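
The homology filter described above can be sketched in a few lines. This is an illustrative example only, not the script used in this study: it assumes BLASTP results in the standard 12-column tabular format (identity in column 3, E value in column 11), with reference proteins as queries and H. cordata proteins as subjects, and it assumes protein lengths are supplied separately. The gene identifiers, the `filter_hits` function and the example rows are hypothetical.

```python
import csv
import io

# Length windows (residues) taken from the Methods text.
LENGTH_RANGES = {"CYP80": (300, 700), "OMT": (200, 500), "NMT": (200, 500)}


def filter_hits(tabular_text, protein_lengths, family,
                max_evalue=1e-10, min_identity=40.0):
    """Return sorted subject IDs passing E value, identity and length cut-offs.

    tabular_text: BLAST tabular output (12 tab-separated columns per hit).
    protein_lengths: dict mapping subject ID to protein length in residues.
    """
    low, high = LENGTH_RANGES[family]
    kept = set()
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank and comment lines
        subject, identity, evalue = row[1], float(row[2]), float(row[10])
        length = protein_lengths.get(subject)
        if length is None:
            continue  # no length information for this protein
        if evalue < max_evalue and identity > min_identity and low <= length <= high:
            kept.add(subject)
    return sorted(kept)


# Hypothetical hits for illustration (not real data).
rows = [
    ["ref6OMT", "geneA", "62.5", "350", "120", "3", "1", "350", "1", "348", "1e-120", "410"],
    ["ref6OMT", "geneB", "38.0", "340", "200", "5", "1", "340", "1", "338", "1e-30", "120"],
    ["ref6OMT", "geneC", "55.0", "150", "60", "2", "1", "150", "1", "150", "1e-12", "90"],
    ["ref6OMT", "geneD", "45.0", "300", "150", "4", "1", "300", "1", "299", "1e-5", "50"],
]
blast_text = "\n".join("\t".join(r) for r in rows)
lengths = {"geneA": 352, "geneB": 345, "geneC": 150, "geneD": 310}

candidates = filter_hits(blast_text, lengths, family="OMT")
print(candidates)
```

In this toy input only `geneA` survives: `geneB` fails the identity cut-off, `geneC` the length window, and `geneD` the E value cut-off.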

Docking simulation analysis

The structure of HcCYP80G2 with its heme ligand was predicted using AlphaFold3 (ref. 54). The RTC ligand structure was drawn with ChemDraw (v22.0.0 64-bit). Docking was performed using NRGSuite-Qt55, a PyMOL (v3.1.3; PyMOL Molecular Graphics System; Schrödinger) plugin based on FlexAID56, which relies on a genetic algorithm to predict binding sites and generate docking results. The plugin identified candidate binding sites, and the top-ranked one was selected for FlexAID docking. The docking parameters were set to 2000 for both the initial population (chromosomes) and the number of generations. The pose with the highest docking score was selected as the final result. PyMOL was also used to visualize the docking results.

Gene cloning and synthesis

Total RNA was extracted using Total RNA Extractor (Trizol) (Sangon BioTech, Shanghai, China), and the cDNA was synthesized using the HiScript II 1st Strand cDNA Synthesis Kit (Vazyme, Nanjing, China). The candidate genes were amplified by PCR using cDNA from H. cordata as the template, and the PCR products were inserted into the pEAQ-Xhol vector (digested with XhoI) using the ClonExpress II One Step Cloning kit (Vazyme, Nanjing, China). The primers are listed in Supplementary Table 7.

Transient expression and enzyme assay in N. benthamiana

Candidate genes were cloned into the pEAQ-Xhol vector and transformed into Agrobacterium tumefaciens (LBA4404). A positive colony was selected and resuspended in 10 mL of LB medium, incubated at 28 °C with agitation at 220 rpm until the OD600 nm reached 1, and then centrifuged at 4000 g for 10 min. The pellets were then resuspended in 5 mL MMA buffer (10 mM MES, 10 mM MgCl2, and 150 μM acetosyringone in Milli-Q water), and incubated at 28 °C in the dark for 1 h. The Agrobacterium suspension (final OD600 nm = 0.2 for each strain) was infiltrated into the abaxial side of 6 to 8-week-old N. benthamiana leaves (on a 14 h light-cycle) using a 5 mL syringe until the entire leaf was infiltrated. The injected N. benthamiana plants were then kept in the dark for 1 day, followed by 2 days of light exposure (14 h light-cycle). Next, 0.1% DMSO with or without 25 μM substrate was infiltrated into the abaxial side of the previously Agrobacterium-infiltrated leaves. Three days after feeding, leaves were harvested, lyophilized, ground into powder, and stored at −20 °C for later analysis. Each treatment consisted of 3 or 4 leaves, and the biological replicates consisted of 3 different plants.
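
Bringing each strain to a final OD600 of 0.2, including in co-infiltrations of several strains, is a simple dilution calculation (C1V1 = C2V2). The sketch below is only an illustration of that arithmetic: the stock OD600 values, the 5 mL final volume, and the function names are assumptions and are not taken from the protocol.

```python
def culture_volume_ml(stock_od: float, target_od: float, final_volume_ml: float) -> float:
    """Volume of resuspended culture needed to reach target_od in final_volume_ml."""
    if stock_od <= 0:
        raise ValueError("stock OD600 must be positive")
    if target_od > stock_od:
        raise ValueError("stock is too dilute to reach the target OD600")
    return target_od * final_volume_ml / stock_od


def infiltration_mix(stock_ods: dict, target_od: float = 0.2,
                     final_volume_ml: float = 5.0) -> dict:
    """Volumes of each strain and of MMA buffer for one infiltration mix."""
    volumes = {
        strain: culture_volume_ml(od, target_od, final_volume_ml)
        for strain, od in stock_ods.items()
    }
    buffer_ml = final_volume_ml - sum(volumes.values())
    if buffer_ml < 0:
        raise ValueError("strains do not fit into the final volume")
    volumes["MMA buffer"] = buffer_ml
    return volumes


# Hypothetical stock OD600 readings after resuspension in MMA buffer.
mix = infiltration_mix({"HcCYP80B": 2.0, "Hc4'OMT1": 1.0})
for component, volume in mix.items():
    print(f"{component}: {volume:.2f} mL")
```

With these example readings the mix would contain 0.5 mL and 1.0 mL of the two suspensions, topped up with 3.5 mL of MMA buffer.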

In vitro enzyme activity

For in vitro enzyme activity analysis, Agrobacterium tumefaciens strain LBA4404 carrying the target genes was used to infiltrate leaves of 4-week-old N. benthamiana plants. After 3 days, leaf samples were harvested and flash-frozen in liquid nitrogen, and crude protein was extracted using 400 µL of Plant Lysis Buffer (100 mM Tris-HCl pH 8.0, 150 mM NaCl, 10% glycerol, 1 mM EDTA, and 0.1% Triton X-100). The homogenate was centrifuged at 13,800 g for 20 min at 4 °C. The supernatant containing the crude protein extract was collected, and protein concentration was determined using the Bradford assay. For enzyme activity of CYPs, 100 µg of crude protein was incubated in a 100 µL reaction buffer containing 100 mM Tris-HCl pH 8.0, 1 mM NADPH, and 0.1 mM substrate (RTC) at 28 °C for 6 h. For the NMTs/OMTs, 100 µg of crude protein was incubated in a 100 µL reaction buffer containing 100 mM Tris-HCl pH 8.0, 1 mM SAM, and 0.1 mM substrate (COC/NCC) at 28 °C for 24 h. Reactions were terminated by adding 100 µL of methanol, followed by centrifugation at 13,800 g for 10 min to remove insoluble debris. Finally, 1 µL of the supernatant was injected into a UPLC-MS/MS system for product detection.

BIA extraction from plants

Lyophilized tissue powder (100 mg for H. cordata; 50 mg for tobacco leaves) was weighed into a 2 mL centrifuge tube and 1 mL methanol was added. The mixture was sonicated for 30 min at room temperature and then centrifuged at 12,000 g for 20 min. The supernatant was transferred into a new 1.5 mL centrifuge tube, dried in a Genevac (EZ-2 4.0), and then redissolved in 200 μL methanol by 10 min of ultrasonication. After centrifugation at 12,000 g for 20 min, 100 μL of the extract was transferred into vials for UPLC-MS/MS analysis.

UPLC–MS analysis

The UPLC-MS analysis was performed on a UHPLC-MS system (Thermo Scientific Vanquish UHPLC) equipped with a C18 column (1.8 μm, 2.1 mm × 50 mm). The mobile phase consisted of 0.1% formic acid in water (A) and 0.1% formic acid in acetonitrile (B), and the linear gradient elution program was as follows: 0–2 min, 2% B; 2–10 min, 2–10% B; 10–13 min, 10–50% B; 13–16 min, 50–95% B; 16–18 min, 95% B; 18–18.1 min, 95–2% B; 18.1–20 min, 2% B. The flow rate was 0.25 mL/min and the column temperature was 25 °C. The electrospray ionization (ESI) source was used for detection in a positive-ion mode with a vaporizer temperature of 350 °C and a m/z range of 200–800.
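
The gradient program above is piecewise linear, so the mobile-phase composition at any time point can be obtained by interpolation between the listed breakpoints. The following minimal sketch encodes the UPLC-MS gradient as (time, %B) pairs; the `percent_b` helper is illustrative and not part of any instrument software.

```python
# Breakpoints (time in min, % solvent B) from the UPLC-MS gradient program.
GRADIENT = [
    (0.0, 2.0), (2.0, 2.0), (10.0, 10.0), (13.0, 50.0),
    (16.0, 95.0), (18.0, 95.0), (18.1, 2.0), (20.0, 2.0),
]


def percent_b(t_min: float) -> float:
    """Linearly interpolate % solvent B at time t_min (minutes)."""
    if t_min < GRADIENT[0][0] or t_min > GRADIENT[-1][0]:
        raise ValueError("time is outside the 0-20 min gradient program")
    for (t0, b0), (t1, b1) in zip(GRADIENT, GRADIENT[1:]):
        if t0 <= t_min <= t1:
            if t1 == t0:
                return b1
            return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)
    raise RuntimeError("unreachable: breakpoints cover the whole program")


for t in (1.0, 6.0, 11.5, 17.0, 19.0):
    print(f"{t:5.1f} min: {percent_b(t):5.1f}% B, {100 - percent_b(t):5.1f}% A")
```

The same function can describe the UPLC-MS/MS method by swapping in that method's breakpoints.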

UPLC–MS/MS analysis

The UPLC-MS/MS analysis was performed on a Thermo Fisher Vanquish UPLC system coupled to a Q-Exactive Plus Hybrid Quadrupole Orbitrap Mass Spectrometer (Thermo Fisher). A UPLC BEH C18 column (2.1 mm i.d. × 100 mm, 1.7 µm, Waters) with a VanGuard BEH C18 precolumn (2.1 mm i.d. × 5 mm, 1.7 µm, Waters) was used for separation. The mobile phase consisted of 0.1% formic acid in water (A) and 0.1% formic acid in acetonitrile (B). The linear gradient elution program was as follows: 0–2 min, 5% B; 2–12 min, 5–15% B; 12–18 min, 15–50% B; 18–19 min, 50–100% B; 19–20 min, 100% B; 20–21 min, 100–5% B; 21–22 min, 5% B. The flow rate was 0.30 mL/min and the column temperature was 25 °C. The ESI source was used for detection in a positive-ion mode with a mass range of m/z 150–500.

Statistics and reproducibility

No statistical method was used to predetermine sample size. All functional experiments in transient N. benthamiana leaves were performed in triplicate unless otherwise stated.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.