Introduction

Styphnolobium japonicum (SJ; synonym: Sophora japonica L.) is a tree species of the legume family (Fabaceae), originating in China1. It is widely cultivated across various regions, particularly in northern China and the Loess Plateau. The species is also distributed in Japan, Vietnam, Korea, Europe, and several Western countries, including the United States2. The flower bud of SJ (FBSJ) is a major traditional medicinal material in China, Japan, and Korea, commonly used for hemostasis and cooling the blood to treat heat-related blood disorders. It is also frequently incorporated into medicinal diets and general food products3. Owing to its rich composition of bioactive compounds, FBSJ serves as a prominent active ingredient in Traditional Chinese Medicine (TCM)4. According to data from Tian Di Yun Tu, a big data platform dedicated to the TCM industry, China’s FBSJ production reached 1,800 tons in 20225. Its notably high rutin content further establishes FBSJ as the primary raw material for rutin extraction6. To date, an increasing number of studies have investigated FBSJ from various perspectives, including vegetative propagation7; the effects of environmental factors on growth and nitrogen metabolism8; and the characterization of bioactive components, such as proteins, gene expression profiles, and flavonoids including rutin and quercetin9,10,11. Additional research has examined its lectin activity12, its aqueous extract10, the antioxidant effects of its polysaccharides11, uric acid modulation12, and its pharmacological effects when combined with other traditional Chinese medicines13. Metabolites are highly responsive to environmental factors and vary accordingly; plants also produce them as adaptations to specific ecological conditions, and secondary metabolites are usually considered the material basis of bioactivity14. 
Previous studies have mainly focused on partial components of FBSJ, while research on full metabolite profiles (including non-flavonoid metabolites) at the molecular level, particularly in Sophora species, remains limited15,16. Comprehensive studies on the types of secondary metabolites and associated metabolic pathways of FBSJ across different geographic origins are also scarce, hindering the elucidation of origin-dependent quality variations. Furthermore, the therapeutic potential of FBSJ against 11 major diseases (e.g., cancer, diabetes) has yet to be systematically validated at the metabolite level17.

Currently, major diseases threatening human health worldwide include cancer18, diabetes19, hypertension20, cardiovascular diseases21, atherosclerosis22, and thrombotic diseases23. Modern clinical studies have shown that FBSJ exhibits various pharmacological effects, including hemostasis, hypoglycemia, antioxidant activity, stomach protection, immunity enhancement, antiviral effects, blood pressure reduction, and antitumor activity24. Based on the CancerHSP and TCMSP databases, six major diseases—namely cancer/tumor, diabetes, hypertension, cardiovascular diseases, atherosclerosis, and thrombotic disorders—were identified as primary targets associated with FBSJ. In addition, five other conditions potentially linked to FBSJ were recognized, including osteoporosis, liver ischemic injury, inflammation, infectious diseases, and hemorrhage. These disease categories reflect the spectrum of pharmacological activities that FBSJ-derived metabolites may exert. Due to variations in plant origin and extraction conditions, FBSJ-derived compounds may exhibit distinct biological activities and health-promoting effects across different countries. In China, Shandong, Anhui, Henan, and Hebei are the main production regions of FBSJ. Although the flowers from these four provinces exhibit no significant morphological differences, market feedback indicates that there are certain variations in quality. However, the specific differences remain unclear, and relevant studies are scarce.

In this context, a metabolomic analysis of FBSJ from these four regions is necessary to identify differences in metabolite composition, explore potential new pharmacological effects, predict variations in therapeutic properties, and provide a reference for the screening and deep processing of FBSJ. To characterize the metabolites of FBSJ more comprehensively, this study employs a widely targeted metabolomics approach using UPLC-MS/MS to analyse FBSJ samples collected from diverse geographic origins. This strategy enables precise identification of secondary metabolites at the molecular level. Additionally, key active ingredients and disease-resistant ingredients associated with the eleven diseases were identified based on network pharmacology databases; this step specifically addresses the lack of metabolite-disease linkage in existing research. Differential metabolite screening enabled the identification of potential marker metabolites in each group (e.g., 13 high-abundance metabolites in SJsd), providing theoretical support for the establishment of quality standards. Furthermore, an analysis of significant KEGG pathways (e.g., isoflavonoid biosynthesis) among FBSJ from different origins offers insights for future cultivation, material basis screening, and the pharmacological mechanisms of disease-resistant ingredients, directly advancing the field’s understanding of FBSJ’s origin-related metabolism. These findings contribute to human health promotion and support the development of FBSJ-based functional foods and pharmaceuticals.

Materials and methods

Materials

Methanol and acetonitrile were purchased from Shanghai Merck Chemical Technology Co., Ltd., while formic acid was obtained from Shanghai Aladdin Biochemical Technology Co., Ltd. All chemicals were of chromatographic grade and used without further purification. FBSJ samples were collected from four geographic origins in China: SJsd from Shandong (115°46′E, 35°16′N, Heze), SJah from Anhui (117°18′E, 31°30′N, Lujiang), SJhn from Henan (114°28′E, 33°23′N, Shangcai), and SJhb from Hebei (115°04′E, 38°59′N, Dingzhou). These samples were sourced from Yutang Food Department Store, Xinzhou High-tech Industrial Development Zone, Anhui Baitang Pharmaceutical Co., Ltd., Chengde Biological Technology Co., Ltd., and China E-commerce Co., Ltd., respectively. All samples were harvested in July 2022 and were identified by Assistant Professor Jihai Gao from Chengdu University of TCM as the dried flower buds of Styphnolobium japonicum L. (Sophora japonica L.). Voucher specimens of these materials were deposited in the State Bank of Chinese Drug Germplasm Resources (voucher numbers: 522230190125HM001, 522230190125HM002, 522230190125HM003, and 522230190125HM004). Climatic data (temperature, rainfall, and sunshine duration) for the four production regions from February to May 2022 were obtained from publicly available sources: temperature and rainfall data from https://www.tianqi24.com and sunshine duration from https://www.ceicdata.com/zh-hans/china/sunshine-hours.

Methods

Sample preparation and extraction

Each sample was freeze-dried and ground into a fine powder. A 50 mg aliquot of FBSJ powder was accurately weighed and extracted with 1.2 mL of pre-cooled (−20 °C) 70% methanol, which served as the internal standard extraction solution. The mixture was vortexed for 30 s every 30 min, repeated six times. The sample was then centrifuged at 1,200 rpm for 3 min, and the supernatant was filtered through a microporous membrane for subsequent UPLC-MS/MS analysis.

LC-MS/MS analysis conditions

Metabolite identification was performed using a UPLC-ESI-MS/MS system consisting of a UPLC unit (ExionLC™ AD, https://sciex.com.cn/) coupled with an MS (Applied Biosystems 4500 Q TRAP, https://sciex.com.cn/).

Chromatography-mass spectrometry acquisition conditions

Liquid-phase conditions

The data acquisition was conducted using a UPLC system coupled with MS/MS. The liquid phase conditions included an Agilent SB-C18 column (1.8 μm, 2.1 mm × 100 mm), with phase A consisting of ultra-pure water with 0.1% formic acid and phase B consisting of acetonitrile with 0.1% formic acid. The elution gradient started with 5% B at 0.0 min, increased linearly to 95% B within 9.0 min, maintained 95% B from 9.0 to 10.0 min, decreased to 5% B from 10.0 to 11.1 min, and maintained 5% B from 11.1 to 14.0 min. The flow rate was set to 0.35 mL/min, the column oven temperature was maintained at 40 °C, and the injection volume was 4 µL.
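The elution programme above can be written down as a small lookup, which is convenient when checking at what solvent composition a given retention time falls. The following is a minimal sketch only: the breakpoints are taken from the text, while the assumption that every ramp (including the return to 5% B between 10.0 and 11.1 min) is linear, and the function name `percent_b`, are illustrative choices rather than details confirmed by the method description.

```python
from bisect import bisect_right

# Gradient breakpoints as (time in min, % mobile phase B), taken from the text.
GRADIENT = [(0.0, 5.0), (9.0, 95.0), (10.0, 95.0), (11.1, 5.0), (14.0, 5.0)]


def percent_b(t_min: float) -> float:
    """Return % B at time t_min, assuming linear ramps between breakpoints."""
    if not GRADIENT[0][0] <= t_min <= GRADIENT[-1][0]:
        raise ValueError("time outside the 0 to 14 min run")
    times = [t for t, _ in GRADIENT]
    i = bisect_right(times, t_min) - 1       # index of the segment start
    if i == len(GRADIENT) - 1:               # exactly at the final breakpoint
        return GRADIENT[-1][1]
    (t0, b0), (t1, b1) = GRADIENT[i], GRADIENT[i + 1]
    return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)


for t in (0.0, 4.5, 9.5, 12.0):
    b = percent_b(t)
    print(f"{t:4.1f} min: {b:5.1f}% B / {100 - b:5.1f}% A")
```

Halfway through the 9 min ramp (4.5 min) the sketch gives 50% B, and from 11.1 min onward it stays at the 5% B re-equilibration level.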

MS conditions

The effluent was analysed using linear ion trap (LIT) and triple quadrupole (QQQ) scans. The analytical conditions were as follows: the electrospray ionization (ESI) source temperature was set to 550 °C, with an ion spray voltage of 5500 V (positive ion mode) or −4500 V (negative ion mode). Ion source gas I (GSI) was set to 50 psi, gas II (GSII) to 60 psi, and the curtain gas (CUR) to 25 psi. The collision-activated dissociation (CAD) parameter was set to “high”. The QQQ scan used multiple reaction monitoring (MRM) mode, with the collision gas (nitrogen) set to medium. The declustering potential (DP) and collision energy (CE) were optimized for each MRM ion pair. A specific set of MRM ion pairs was monitored during each period based on the metabolites eluted during that time.

Qualitative and quantitative determination of metabolites

For qualitative analysis of metabolites, the data analysis method was adapted from Wang et al.25. The substances were identified based on fragmentation patterns, retention time, and m/z values established in the Metware database (MWDB), provided by Metware Biotechnology Co., Ltd. (Wuhan, China). Using secondary mass spectra information, the obtained metabolite data were compared against MWDB to obtain structural information and classifications. Differential metabolites in the FBSJ samples were then studied. Isotopic signals, repetitive signals containing K⁺, Na⁺, and NH₄⁺, and fragment ions derived from larger molecular species were excluded from the analysis.

Quantitative analysis was performed using the multiple reaction monitoring (MRM) mode of triple quadrupole mass spectrometry. After obtaining the mass spectrometry data for the different samples, the peak area normalization method was applied to calculate the relative content of each metabolite in FBSJ samples from the different producing areas. Each sample was analysed in triplicate, and the mean values were used.
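The quantification step can be illustrated with a short sketch. It assumes that "peak area normalization" means dividing each metabolite's peak area by the summed peak area of the same injection, which is a common reading but is not spelled out in the text; the peak areas and the function names below are hypothetical.

```python
from statistics import mean


def relative_content(peak_areas: dict[str, float]) -> dict[str, float]:
    """Normalize one injection: each peak area divided by the total peak area."""
    total = sum(peak_areas.values())
    if total <= 0:
        raise ValueError("total peak area must be positive")
    return {name: area / total for name, area in peak_areas.items()}


def mean_relative_content(replicates: list[dict[str, float]]) -> dict[str, float]:
    """Average the normalized values over replicate injections.

    All replicates are expected to report the same metabolite names.
    """
    normalized = [relative_content(rep) for rep in replicates]
    return {name: mean(n[name] for n in normalized) for name in normalized[0]}


# Hypothetical peak areas for three replicate injections of one sample.
replicates = [
    {"rutin": 600.0, "quercetin": 300.0, "kaempferol": 100.0},
    {"rutin": 500.0, "quercetin": 300.0, "kaempferol": 200.0},
    {"rutin": 700.0, "quercetin": 200.0, "kaempferol": 100.0},
]
result = mean_relative_content(replicates)
for name, value in result.items():
    print(f"{name:12s} {value:.3f}")
```

Because each replicate is normalized before averaging, the averaged relative contents of a sample still sum to 1.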

Identification of key active ingredients in traditional Chinese medicine (TCM)

UPLC-MS/MS was used to analyse FBSJ, and all metabolites were queried in the TCM systems pharmacology database and analysis platform (TCMSP). Metabolites with oral bioavailability (OB) ≥ 5% and drug-likeness (DL) ≥ 0.14 were identified as key active ingredients. The relevant targets of the identified metabolites, along with associated disease information, were then obtained26.
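The screening rule can be expressed as a simple filter. This is a sketch under stated assumptions: the thresholds come from the text, the rutin values (OB = 3.2%, DL = 0.68) are those reported in the Results, and `compound_A` and `compound_B` with their values are hypothetical placeholders rather than TCMSP entries.

```python
from dataclasses import dataclass

OB_MIN = 5.0    # oral bioavailability threshold in percent (from the text)
DL_MIN = 0.14   # drug-likeness threshold (from the text)


@dataclass(frozen=True)
class Compound:
    name: str
    ob: float   # oral bioavailability, %
    dl: float   # drug-likeness index


def is_key_active(c: Compound) -> bool:
    """A compound passes only if it meets both thresholds."""
    return c.ob >= OB_MIN and c.dl >= DL_MIN


candidates = [
    Compound("rutin", 3.2, 0.68),        # values reported in the Results
    Compound("compound_A", 46.4, 0.28),  # hypothetical
    Compound("compound_B", 12.0, 0.05),  # hypothetical
]
key_actives = [c.name for c in candidates if is_key_active(c)]
print(key_actives)
```

Rutin fails the filter on OB alone, which is why the Results treat it as a manually retained exception rather than a compound that passes the screen.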

Identification of pharmaceutical ingredients for human disease resistance

To obtain disease-related annotation information, the TCMSP and CancerHSP databases (https://www.tcmsp-e.com/#/database) were used. Pharmaceutical ingredients corresponding to six major diseases (diabetes, cardiovascular diseases, cancer, hypertension, atherosclerosis, and thrombotic diseases) and five other related conditions (osteoporosis, liver ischemic injury, inflammation, infectious diseases, and hemorrhage) were identified. Finally, by comparing the metabolites identified through UPLC-MS/MS analysis with these disease-resistant components, the active drug ingredients in FBSJ were determined.

Mass spectrometry data analysis and metabolic mechanism analysis

Principal component analysis (PCA), hierarchical cluster analysis (HCA), and orthogonal partial least squares discriminant analysis (OPLS-DA) were performed on the identified metabolites using relevant software. Differential metabolites were selected by combining the variable importance in projection (VIP) obtained from the OPLS-DA model with the fold change (FC): a metabolite was retained when VIP ≥ 1 and either FC ≥ 2 or FC ≤ 0.5.

Kyoto encyclopedia of genes and genomes (KEGG) annotation and further enrichment analysis

Related metabolic pathways were screened for metabolites from the four FBSJ groups with different origins using the KEGG database27. Differential metabolites were identified based on two criteria: VIP ≥ 1 from the OPLS-DA model and P-value < 0.05 (ANOVA). These differential metabolites were aligned and annotated using the KEGG metabolic library and then mapped to the KEGG Pathway database28. Further enrichment analysis of the mapped pathways was conducted using the network-based server Metabolite Sets Enrichment Analysis (MSEA). Pathways with P ≤ 0.05 were considered significantly enriched29.
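To make the enrichment step concrete, the sketch below computes an over-representation P-value with the hypergeometric test. This is an assumption for illustration: the text does not state which statistic MSEA applied, and the hypergeometric test is only one common choice. The background of 382 KEGG-annotated metabolites and the 176 significant ones are taken from the Results; the pathway size and hit count are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(N: int, K: int, n: int, k: int) -> float:
    """P(X >= k) for X ~ Hypergeometric.

    N: annotated metabolites in the background
    K: background metabolites that belong to the pathway
    n: differential metabolites (the "draws")
    k: differential metabolites that fall in the pathway
    """
    upper = min(K, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)


N_ANNOTATED = 382      # from the Results
N_SIGNIFICANT = 176    # from the Results
PATHWAY_SIZE = 25      # hypothetical
PATHWAY_HITS = 20      # hypothetical

p = hypergeom_enrichment_p(N_ANNOTATED, PATHWAY_SIZE, N_SIGNIFICANT, PATHWAY_HITS)
print(f"P = {p:.3g}, enriched at P <= 0.05: {p <= 0.05}")
```

The exact integer arithmetic of `math.comb` avoids overflow and rounding issues that a factorial-based floating-point formula would run into at this background size.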

Data analysis method

Statistical analysis and figure generation were performed using the software listed in Table 1.

Table 1 Software list information.

Results and discussion

Overall analysis of the widely targeted metabolites of FBSJ

FBSJ samples were collected from four different geographic origins in China: SJsd from Shandong, SJah from Anhui, SJhn from Henan, and SJhb from Hebei. Figure 1A shows their shape, color, and relative size. Apart from the overall darker color of SJsd, there were no obvious differences in appearance among the samples.

Fig. 1

FBSJ from four different origins (A) and the Venn diagram of their total metabolites (B).

In this study, a widely targeted metabolomics approach based on UPLC-MS/MS was used to profile the metabolites of FBSJ. A total of 1,559 metabolites were identified across the four origins, and their distribution is summarized in the Venn diagram (Fig. 1B). This is the largest number of components detected in FBSJ to date, surpassing the previous detection of 331 secondary metabolites16. Among these, 1,557 metabolites were common to SJsd, SJah, SJhb, and SJhn. The remaining two metabolites were each absent from one origin: methyl syringate was detected in SJah, SJhn, and SJhb but not in SJsd, and His-Thr-Lys-Lys was detected in SJah, SJhn, and SJsd but not in SJhb.
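The Venn counts follow directly from set operations on the per-origin metabolite lists. The sketch below rebuilds them from the numbers reported in the text; the 1,557 shared metabolites are represented by placeholder identifiers, since the individual names are not listed here.

```python
ORIGINS = ("SJsd", "SJah", "SJhn", "SJhb")

# Placeholder IDs standing in for the 1,557 metabolites detected in every origin.
shared = {f"metabolite_{i}" for i in range(1557)}
detected = {origin: set(shared) for origin in ORIGINS}

# The two metabolites reported as absent from one origin each.
for origin in ("SJah", "SJhn", "SJhb"):
    detected[origin].add("methyl syringate")
for origin in ("SJah", "SJhn", "SJsd"):
    detected[origin].add("His-Thr-Lys-Lys")

total = set.union(*detected.values())
common = set.intersection(*detected.values())
partial = {
    name: [o for o in ORIGINS if name not in detected[o]]
    for name in sorted(total - common)
}

print(len(total), len(common))   # 1559 1557
print(partial)                   # origins in which each partial metabolite is missing
```

The union gives the 1,559 total, the four-way intersection gives the 1,557 shared metabolites, and the difference isolates the two origin-dependent compounds.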

Among all the metabolites detected, 399 flavonoids, 224 phenolic acids, 186 amino acids and their derivatives, 172 lipids, 86 alkaloids, 111 organic acids, 97 terpenoids, 74 nucleotides and their derivatives, 50 lignans and coumarins, 11 tannins, 10 quinones, and 139 other metabolites were identified. In terms of the number of metabolite species, flavonoids accounted for approximately 25.6%, phenolic acids for 14.4%, amino acids and their derivatives for 11.9%, and lipids for 11.0% (see Supplementary Fig. S1 online). In both variety and quantity, this analysis substantially exceeds the secondary metabolites previously detected in FBSJ. Previous studies identified 173 flavonoids, 53 phenolic acids, 29 organic acids, 23 alkaloids, 15 terpenoids, 11 lignans and coumarins, 6 tannins, and 21 other metabolites16.
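The class shares can be recomputed from the counts given in the text, which is a quick way to check that the categories add up to the reported total. The sketch below only uses the counts stated above; the dictionary keys are shortened labels.

```python
# Metabolite counts per class, as listed in the text.
CLASS_COUNTS = {
    "flavonoids": 399,
    "phenolic acids": 224,
    "amino acids and derivatives": 186,
    "lipids": 172,
    "organic acids": 111,
    "terpenoids": 97,
    "alkaloids": 86,
    "nucleotides and derivatives": 74,
    "lignans and coumarins": 50,
    "tannins": 11,
    "quinones": 10,
    "others": 139,
}

total = sum(CLASS_COUNTS.values())
shares = {name: 100 * count / total for name, count in CLASS_COUNTS.items()}

print(f"total = {total}")
for name, pct in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:30s} {CLASS_COUNTS[name]:4d} {pct:5.1f}%")
```

The twelve classes sum to the 1,559 metabolites reported for the full data set.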

Identification of the key active ingredients in TCM

FBSJ contains 11 types of ingredients, including flavonoids, phenolic acids, organic acids, amino acids and their derivatives, terpenoids, and tannins30,31. Although flavonoids such as rutin, quercetin, and kaempferol, along with certain saponins, are well-characterized constituents of FBSJ, other bioactive compounds with potential health-promoting effects remain insufficiently defined. As a result, the functional roles of non-flavonoid metabolites are not yet fully understood, despite their possible contributions to human health.

The identified metabolites were queried in the TCMSP database to identify key active components in FBSJ with health-promoting functions for the human body. Because the 1,559 metabolites identified here represent the largest number reported for FBSJ to date, this screening draws on broader metabolome coverage than previous research and provides a more comprehensive biochemical profile. Of the 1,559 metabolites, 294 were classified as active components of TCM. Using OB ≥ 5% and DL ≥ 0.14 as screening criteria, 135 key active components were identified (Table 2). Although rutin (OB = 3.2%, DL = 0.68) did not meet the OB threshold, it was nonetheless considered a key active component of FBSJ due to its well-established health-promoting properties.

Among the 135 key active ingredients of TCM, flavonoids were the most abundant, comprising 87 types and accounting for 64.4% of all key active ingredients. Additionally, there were 14 lipids, eight phenolic acids, seven terpenoids, six nucleotides and their derivatives, one tannin, one quinone, and five vitamins, flavonoids, and other bioactive substances. Apart from biochanin A, 3’-methoxydaidzein, quercetin, genistein, glycitein, acacetin, catechin, 6’’-O-acetylgenistin, and 3’,4’,7-trihydroxyflavone, which have been reported in previous studies, the remaining ingredients have not been documented in prior related metabolomics research16. Notably, 12 metabolites, despite lacking relevant protein target information, exhibited very high DL values (DL > 0.65). These included 6’-O-malonyl bebeoside, conazole H, rescinin A-7-O-glucoside (Indian santalin), pyrin, terpenoid tritol, kaempferol-7-O-rhamnoside, 3,24-dihydroxypier-12-ene-22 ketol (soy alcohol E), dehydrogated soy saponin I, and apigenin. This indicates that these metabolites may possess significant health-promoting potential and could be further explored for the development of new foods and medicines.

Table 2 Key active ingredients of TCM in FBSJ with OB ≥ 5% and DL ≥ 0.14.

Identification of active pharmaceutical ingredients for 11 human diseases

Given the substantial chemical diversity of FBSJ, directly predicting its pharmacological spectrum presents significant challenges32. To address this complexity, we systematically selected 11 disease targets for component screening through an integrative strategy. This approach combined three criteria: globally common diseases (such as cancer, cardiovascular disease, and diabetes mellitus)33, dominant flavonoid constituents (e.g., rutin and quercetin)34,35, and the TCM property-flavor theory documented in the Chinese Pharmacopoeia36. This tripartite strategy identified pathologies including hepatic dysfunction, hemorrhage, and inflammatory bowel diseases. The selection rationale was based on FBSJ’s meridian tropism (liver/large intestine) and its dual therapeutic functions: blood-cooling hemostasis and hepatic fire-clearing36. In the CancerHSP and TCMSP databases, the top six diseases (cancer/tumor, diabetes, hypertension, cardiovascular disease, atherosclerosis, and thrombotic diseases), along with five other related conditions (osteoporosis, liver ischemic injury, inflammation, infectious diseases, and hemorrhage), were selected to represent the pharmacological roles to which metabolites from FBSJ extract may contribute. The active ingredients in FBSJ that target these 11 diseases were identified to provide additional information for evaluating the preventive and therapeutic effects of FBSJ on these conditions. From the 294 TCM ingredients identified, a total of 193 metabolites were found to be associated with the diseases mentioned above. Flavonoids formed the largest class (90), followed by 37 phenolic acids, 15 organic acids, 12 lipids, nine coumarins and lignans, six terpenoids, four alkaloids, four amino acids and their derivatives, three nucleotides and their derivatives, three chromones, three vitamins, two proanthocyanidins, one quinone, one ketone, one alcohol, and two sugars (see Supplementary Tab. S1 online). 
On this basis, these metabolites are proposed as the potential material basis for the preventive and therapeutic effects of FBSJ on the 11 diseases. The 193 metabolites were associated with 306 target proteins linked to 335 diseases, indicating that FBSJ has broad pharmacological potential that may contribute to the treatment of the 11 diseases.

Differential metabolites in FBSJ from different origins

PCA and HCA cluster analysis

To identify and better understand the differences in metabolites of FBSJ from different origins, PCA and HCA were performed. As shown in Fig. 2A, PCA of FBSJ from the four origins revealed clear distinctions, with PC1 and PC2 contributing 49.78% and 14.17% of the variance, respectively. The PCA results effectively reflected the metabolic differences between Shandong, Anhui, Henan, and Hebei Provinces. SJsd from Shandong showed the greatest difference from the other three origins along PC1. Although SJhn from Henan and SJah from Anhui appeared relatively similar, the 3D plot in Fig. 2B indicated that these two groups were clearly separated along PC3. Biological replicates from the same origin clustered well together in the PCA 3D plot. However, because PCA is unsupervised, it cannot separate within-group variation and random errors unrelated to the research question from between-group differences37, making it less effective for identifying group differences. Supervised methods were therefore applied in the subsequent analysis. To minimize the influence of abundance magnitude on pattern recognition, the peak area of each metabolite was log-transformed before performing HCA. As shown in Fig. 2C, the HCA results clearly distinguished the samples, which are color-coded by origin. A noticeable separation was observed between SJsd and the other samples. SJhn and SJah clustered together first, then joined SJhb, and finally merged with SJsd. This clustering pattern reflects the differences among the four groups and is consistent with the PCA results.

Both PCA and HCA revealed that FBSJ samples from the four origins formed distinct clusters, each characterized by a unique metabolic signature. Among them, SJsd showed the most pronounced divergence from the other three origins.

Cluster analysis was performed to examine the differences in metabolites of FBSJ from various origins, resulting in a cluster heat map of the four sample groups (Fig. 2D). The Z-scores of the differential metabolites in FBSJ from each geographic origin showed notable variation. Interestingly, the flower buds from the Shandong origin (SJsd) exhibited significantly lower Z-scores for over 100 metabolites compared to the other three origins. Conversely, more than 100 metabolites in SJsd showed markedly higher Z-scores, indicating distinct metabolic enrichment patterns relative to the other samples.
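The Z-scores shown in a cluster heat map are typically obtained by standardizing each metabolite across samples, so that rows with very different absolute abundances become comparable. The sketch below illustrates that row-wise scaling; the use of the sample standard deviation and the example values are assumptions, as the text does not specify the scaling details.

```python
from statistics import mean, stdev


def row_z_scores(values: list[float]) -> list[float]:
    """Standardize one metabolite across samples: (x - mean) / sample SD."""
    mu = mean(values)
    sd = stdev(values)
    if sd == 0:
        return [0.0] * len(values)   # a constant row carries no contrast
    return [(v - mu) / sd for v in values]


# Hypothetical relative contents of two metabolites in SJsd, SJah, SJhn, SJhb.
SAMPLES = ("SJsd", "SJah", "SJhn", "SJhb")
matrix = {
    "metabolite_A": [9.0, 3.0, 3.0, 1.0],
    "metabolite_B": [0.2, 0.8, 0.9, 0.7],
}

for name, row in matrix.items():
    z = row_z_scores(row)
    print(name, {s: round(v, 2) for s, v in zip(SAMPLES, z)})
```

After scaling, a positive value marks a sample in which the metabolite is above its own cross-sample mean and a negative value marks one below it, regardless of the metabolite's absolute level.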

OPLS-DA analysis

OPLS-DA was used to filter out orthogonal variables unrelated to the categorical variables in the metabolites. This method allows for separate analysis of non-orthogonal and orthogonal variables, providing more reliable information about the inter-group differences in metabolites and their degree of correlation with the experimental groups38. To identify the specific differential metabolites responsible for the separation of FBSJ from four different origins in China, four comparative OPLS-DA models were established (Fig. 2E). The R2X and R2Y values reflect the model’s explanatory power for the X and Y matrices, respectively, while the Q2 value indicates the model’s predictive ability. A Q2 value > 0.9 indicates an excellent model, and a Q2 value > 0.5 indicates an effective model. The Q2 in this study was 0.727, with a P-value of 0.005, indicating that the OPLS-DA model was well-constructed, reliable, and meaningful. The R2X and R2Y values of the OPLS-DA models were 0.549 and 0.996, respectively, showing that the explanatory power for X was good, while that for Y was excellent. Furthermore, the P-value of R2Y was less than 0.005, indicating that no random grouping model in the permutation test explained the Y matrix better than the present OPLS-DA model.
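The model-quality figures can be related to their usual definitions with a short sketch. It assumes the standard formula Q2 = 1 − PRESS/TSS computed from cross-validated predictions, and an empirical permutation P-value equal to the share of permuted-label models that score at least as well as the real model. The number of permutations (200), the permuted scores, and the toy predictions are hypothetical; only the thresholds and the R2Y of 0.996 come from the text.

```python
def q2(y_true: list[float], y_cv_pred: list[float]) -> float:
    """Q2 = 1 - PRESS / TSS, using cross-validated predictions."""
    y_bar = sum(y_true) / len(y_true)
    press = sum((y - p) ** 2 for y, p in zip(y_true, y_cv_pred))
    tss = sum((y - y_bar) ** 2 for y in y_true)
    return 1 - press / tss


def rate_q2(value: float) -> str:
    """Apply the thresholds quoted in the text."""
    if value > 0.9:
        return "excellent"
    if value > 0.5:
        return "effective"
    return "weak"


def permutation_p(observed: float, permuted: list[float]) -> float:
    """Share of permuted-label models scoring at least as high as the real one."""
    return sum(score >= observed for score in permuted) / len(permuted)


# Toy example: class labels and hypothetical cross-validated predictions.
print(round(q2([1, 1, 0, 0], [0.9, 0.8, 0.2, 0.1]), 3))

# Reported Q2, rated with the thresholds from the text.
print(rate_q2(0.727))

# Reported R2Y against 200 hypothetical permuted-label scores.
permuted_r2y = [0.30 + 0.003 * i for i in range(200)]
print(permutation_p(0.996, permuted_r2y))
```

With 200 permutations, the smallest non-zero empirical P-value is 1/200 = 0.005, so a result in which no permuted model beats the real one would be reported as P < 0.005.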

The results (see Supplementary Fig. S2 online) demonstrated clear separation among the four sample groups, indicating substantial variation in the first principal predicted component (Y, 33.8%) across the groups. In contrast, no significant differences were observed between SJsd and SJah along the second principal component (Y, 21.2%), which was not associated with the primary discriminating axis. These findings are consistent with the clustering patterns revealed by both PCA and HCA analyses.

Fig. 2

2D PCA plot (A), 3D PCA plot (B), HCA diagram (C), Heatmap (D), and OPLS-DA model verification diagrams (E) of FBSJ from four different origins.

Screening of differential metabolites of FBSJ from different origins

Based on the results of OPLS-DA, pairwise comparisons of the producing areas were conducted under the conditions of VIP > 1 and p < 0.05 to identify the differential metabolites (Table 3). The analysis revealed that the number of differential metabolites identified between SJsd and each of the other origins was considerably greater than that observed in comparisons among the remaining origins. A total of 708 differential metabolites, classified into 12 major categories, were identified across the four sample groups. These included 183 flavonoids (comprising 51 flavonoids, 48 isoflavones, 43 flavonols, 10 other flavonoids, and additional subtypes), 108 phenolic acids, 88 amino acids and their derivatives, 81 lipids (including 46 free fatty acids, 11 lysophosphatidylethanolamines, 12 lysophosphatidylcholines, and others), 45 alkaloids (8 indole alkaloids, 3 piperidine alkaloids, 2 pyridine alkaloids, and others), 45 organic acids, 36 terpenoids (30 triterpenoid saponins and 6 triterpenoids), 26 nucleotides and their derivatives, 21 lignans and coumarins (11 lignans and 10 coumarins), 6 tannins, 5 quinones, and 64 metabolites classified under other categories.

Among these differential metabolites in FBSJ, flavonoids, phenolic acids, amino acids and their derivatives, and lipids accounted for 25.85%, 15.25%, 12.43%, and 11.44%, respectively. The SJsd group showed the most pronounced metabolic alterations: the largest number of downregulated metabolites (143) was observed in the SJsd vs. SJhb comparison, and the largest number of upregulated metabolites (160) in the SJsd vs. SJah comparison. These metabolic shifts establish SJsd as a key candidate for further mechanistic investigation, as its distinct metabolic profile may underlie the observed phenotypic differences. In contrast, intergroup variability among SJah, SJhn, and SJhb was minimal: only two metabolites were downregulated between SJah and SJhn, and only three were upregulated between SJhn and SJhb. The two-metabolite difference between SJah and SJhn may be attributable to technical variation rather than biological differences.

Table 3 Number of significantly different metabolites in pairwise comparisons of the four FBSJ groups.

Among the 708 differential metabolites in FBSJ, 135 metabolites were identified as key active ingredients of TCM. Of these, 102 metabolites were anti-disease active ingredients for the 11 diseases, including four phenolic acids, three nucleotides and their derivatives, 73 flavonoids, eight lipids, five lignans and coumarins, three terpenoids, two chromones, one quinone, one vitamin, one ketone, and one tannin. These metabolites represent potential key health-promoting compounds that differentiate FBSJ samples according to their geographic origin. The disease spectrum comprises the six major diseases (cancer/tumor, diabetes, hypertension, cardiovascular diseases, atherosclerosis, and thrombotic diseases) and the five additional conditions (osteoporosis, liver ischemic injury, inflammation, infectious diseases, and hemorrhage). To further elucidate the content variation trends of potential marker compounds in FBSJ across the four geographic origins, the dominant differential metabolites were screened based on the relative content ratios among the sample groups. Metabolites with a fold change (FC) ≥ 2 or FC ≤ 0.5 in a pairwise comparison were retained, resulting in a total of 46 dominant metabolites. These included 35 flavonoids, four coumarins and lignans, three nucleotides and their derivatives, two phenolic acids, one terpenoid, and one tannin. As shown in Table 3, SJsd exhibited a greater number of dominant upregulated metabolites, with 35, 17, and 30 metabolites showing elevated levels in comparisons with SJah, SJhb, and SJhn, respectively. These differential metabolites were predominantly flavonoids (82.9%), reflecting region-specific biosynthetic specialization. In contrast, SJah, SJhn, and SJhb displayed minimal intergroup variability, with fewer than five differentially abundant metabolites identified between any two groups. 
These low-magnitude differences reinforce the functional homogeneity among SJah, SJhn, and SJhb, sharply contrasting with SJsd’s distinct metabolic identity. These results strongly highlight the distinct metabolic profile of SJsd, which is likely driven by a combination of environmental stressors, genetic adaptations, and specific regulatory mechanisms. In contrast, the other three groups, particularly SJah and SJhn, exhibited minimal metabolic divergence, suggesting a higher degree of similarity in their biochemical composition39,40.

Fig. 3

Boxplots of the 13 dominant metabolites, comparing their relative contents in the flower buds of SJsd, SJah, SJhn and SJhb.

Moreover, the relative contents of 13 differential metabolites, which could serve as potential anti-disease active ingredients for the 11 diseases, were significantly higher in SJsd than in the other three groups (FC ≥ 2) (Fig. 3). These metabolites included 12 flavonoids and one terpenoid: diosmetin, 3’-methoxydaidzein, ononin, pectolinarigenin, 7,4’-di-O-methyldaidzein, apigenin-7,4’-dimethyl ether, luteolin, maackiain, isoliquiritin, medicarpin, liquiritigenin, 3,5,6,7,8,3’,4’-heptamethoxyflavone, and corosolic acid methyl ester. In the other three groups, the levels of these metabolites were largely consistent, with the exception of two metabolites, maackiain and medicarpin, in the SJah/SJhb comparison (FC < 0.5). Overall, the FC values for metabolites in SJah, SJhb, and SJhn did not exhibit significant variation.

These ingredients have been shown to possess notable pharmacological activities, including anti-oxidative, anti-inflammatory41,42,43,44,45, neuroprotective46,47, anticancer48,49,50, blood sugar-regulating43,46, cardiovascular and cerebrovascular vessel-protecting43,46,51, antiviral46, antifungal46,52, antibacterial53, anti-allergic54, anti-osteoporotic46,55,56, PDE4-inhibitory57, and analgesic58 effects, as well as preventive and therapeutic potential for skin disorders59. Because these metabolites are more abundant in SJsd, the broad spectrum of pharmacological activities reported for them suggests that SJsd may exert more favorable therapeutic effects than the other three groups in these respective aspects.

Differential metabolite KEGG metabolic pathway analysis

Pathway enrichment analysis of the 708 differential metabolites across the four FBSJ groups was performed using the KEGG database. In total, 382 metabolites were annotated by KEGG, with 176 showing significant differences. These metabolites were distributed across 87 metabolic pathways. Among these, the most significant enrichment was observed in the isoflavonoid biosynthesis pathway (p < 0.01), followed by neomycin, kanamycin, and gentamicin biosynthesis (p < 0.05). Starch and sucrose metabolism, aminoacyl-tRNA biosynthesis, linoleic acid metabolism, and indole alkaloid biosynthesis ranked next but did not reach significance (p > 0.05). The observed upregulation of key metabolites within the isoflavonoid biosynthesis pathway in SJsd suggests a potentially enhanced isoflavone synthesis capacity relative to other origins. However, transcriptomic or proteomic validation is required to confirm the underlying enzymatic mechanisms driving this metabolic distinction.

Based on the KEGG database, metabolic pathway enrichment analysis was conducted on the differential metabolites from three pairwise comparisons: SJsd vs. SJah, SJsd vs. SJhb, and SJsd vs. SJhn. This analysis helped to clarify how the differential metabolites change within metabolic pathways. The Differential Abundance (DA) Score is a pathway-based method that captures the overall change of all metabolites in a pathway60. Unlike the KEGG enrichment bubble map, the DA score map draws a line segment for each pathway, with the length of the segment representing the absolute value of the DA Score. The size of the dot at the end of each line segment indicates the number of differential metabolites in the pathway. When the dot lies to the left of the central axis, the overall expression of the pathway tends to be downregulated relative to the reference group, and the longer the line segment, the stronger this tendency.
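
The DA Score can be sketched as follows. The formula used here, (upregulated minus downregulated) divided by the number of measured metabolites in the pathway, is the commonly used definition and is an assumption, as this section does not spell it out. The function name and the counts in the example are hypothetical.

```python
def da_score(n_up, n_down, n_measured):
    """Differential Abundance score of one pathway.

    n_up:       metabolites in the pathway that are upregulated
    n_down:     metabolites in the pathway that are downregulated
    n_measured: all measured metabolites annotated to the pathway

    Returns a value in [-1, 1]; negative values mean the pathway as a whole
    tends to be downregulated in the comparison group.
    """
    if n_measured <= 0:
        raise ValueError("n_measured must be positive")
    if n_up < 0 or n_down < 0 or n_up + n_down > n_measured:
        raise ValueError("inconsistent metabolite counts")
    return (n_up - n_down) / n_measured


# Hypothetical pathway: 10 measured metabolites, 2 up and 8 down.
score = da_score(n_up=2, n_down=8, n_measured=10)
side = "left" if score < 0 else "right"
print(f"DA score = {score:+.2f} (dot plotted to the {side} of the central axis)")
```

A score of -1 would mean that every measured metabolite in the pathway is downregulated, and +1 that every one is upregulated, which matches the plot reading in which segment length reflects the absolute value of the score.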

In the SJsd vs. SJah comparison, a total of 260 differential metabolites were annotated across 51 metabolic pathways. Notably, the isoflavonoid biosynthesis pathway exhibited a high degree of metabolite enrichment and significant variation (p < 0.05) (Fig. 4A), highlighting its potential role in the metabolic divergence between these two origins. Twenty-one significant differential metabolites were involved in this pathway: 7,4’-dihydroxyflavone, isoformononetin, 6’’-O-malonyldaidzin, pseudobaptigenin, daidzein, 6’’-O-malonylglycitin, formononetin-7-O-glucoside (ononin), formononetin-7-O-(6’’-malonyl)glucoside, glycitein, genistein, daidzein-7-O-glucoside (daidzin), 2’-hydroxygenistein, calycosin, 4’,5,7-trihydroxyflavone (apigenin), biochanin A, liquiritigenin, maackiain, 5,4’-dihydroxy-7-methoxyisoflavone (prunetin), 3,9-dihydroxypterocarpan, medicarpin, and 7-hydroxy-4’-methoxyisoflavone (formononetin). The overall expression of this metabolic pathway was downregulated in SJah compared to SJsd, indicating a reduced biosynthetic activity associated with this metabolic route.

In the SJsd vs. SJhb comparison, a total of 253 differential metabolites were annotated in the KEGG database and mapped to 57 distinct metabolic pathways (Fig. 4B). The metabolic pathways exhibiting high metabolite enrichment and significant differences (p < 0.05) included pyrimidine metabolism, isoflavonoid biosynthesis, and purine metabolism. As in SJah, the overall expression of metabolites involved in isoflavonoid biosynthesis was downregulated relative to SJsd, and the downregulation trend was even greater in the SJsd vs. SJhb comparison. The differential metabolites involved in isoflavonoid biosynthesis included 7,4’-dihydroxyflavone, medicarpin, formononetin-7-O-glucoside (ononin), 3,9-dihydroxypterocarpan, daidzein, 6’’-O-malonylglycitin, maackiain, and liquiritigenin. In addition, the overall expression of SJhb in pyrimidine metabolism, purine metabolism, and nucleotide metabolism was relatively upregulated compared to that of SJsd. All three pathways are associated with nucleotide metabolism. The differential metabolites involved in pyrimidine metabolism included nucleotides and their derivatives, as well as organic acids. The organic acids consisted of malonic acid and 3-hydroxypropanoic acid, while the nucleotides and derivatives included 2’-deoxycytidine, uridine 5’-monophosphate, barbituric acid (malonylurea; 2,4,6-pyrimidinetrione), and cytidine 5’-monophosphate (cytidylic acid). The differential metabolites in purine metabolism were nucleotides and derivatives, including guanosine 5’-monophosphate, cyclic 3’,5’-adenylic acid, 2’-deoxyadenosine, 2’-deoxyinosine, guanosine, and guanosine 3’,5’-cyclic monophosphate. The same six compounds accounted for the differential metabolites involved in nucleotide metabolism.
These findings suggest that SJhb’s metabolic profile reflects elevated cellular activity61 (e.g., proliferation, stress adaptation), which may be influenced by environmental factors or regulated through genetic mechanisms involving enzymes like PRPP synthetase62.

Fig. 4
KEGG enrichment of common differential metabolites (DA score) between SJsd and SJah (A), SJsd and SJhb (B), and SJsd and SJhn (C).

For the shared differential metabolites between SJsd and SJhn, a total of 236 differential metabolites were mapped to 54 metabolic pathways (Fig. 4C). Among these pathways, isoflavonoid biosynthesis showed high metabolite enrichment and a significant difference (p < 0.05). In this pathway, 19 common differential metabolites were identified: 7,4’-dihydroxyflavone, genistein, formononetin-7-O-(6’’-malonyl)glucoside, 3,9-dihydroxypterocarpan, formononetin (7-hydroxy-4’-methoxyisoflavone), prunetin (5,4’-dihydroxy-7-methoxyisoflavone), formononetin-7-O-glucoside (ononin), maackiain, apigenin (4’,5,7-trihydroxyflavone), medicarpin, daidzein-7-O-glucoside (daidzin), 2’-hydroxygenistein, calycosin, 6’’-O-malonylglycitin, pseudobaptigenin, daidzein, liquiritigenin, biochanin A, and 6’’-O-malonyldaidzin. The overall expression of metabolites within this pathway was substantially downregulated in SJhn compared to SJsd, indicating a lower biosynthetic activity associated with this metabolic route.

Across the SJsd vs. SJhn, SJhb, and SJah comparisons, isoflavonoid biosynthesis emerged as a common differential metabolic pathway, encompassing a total of 21 distinct metabolites. A systematic analysis of the KEGG isoflavonoid biosynthesis pathway (ko00943) revealed that three components, namely 7,4’-dihydroxyflavone, liquiritigenin, and apigenin (4’,5,7-trihydroxyflavone), were significantly upregulated in SJsd compared to SJah, indicating enhanced pathway activity in the SJsd group. These components are primarily regulated by FNS I and CYP93B2_16 in flavonoid metabolism. The remaining 18 differentially accumulated compounds were identified as isoflavonoids, including isoformononetin, 6’’-O-malonyldaidzin, pseudobaptigenin, daidzein, 6’’-O-malonylglycitin, formononetin-7-O-glucoside (ononin), formononetin-7-O-(6’’-malonyl)glucoside, glycitein, genistein, daidzein-7-O-glucoside (daidzin), 2’-hydroxygenistein, calycosin, biochanin A, maackiain, 5,4’-dihydroxy-7-methoxyisoflavone (prunetin), 3,9-dihydroxypterocarpan, medicarpin, and 7-hydroxy-4’-methoxyisoflavone (formononetin). These isoflavonoid components mainly fall into four functional categories: precursors (e.g., genistein); O-methylated products (e.g., biochanin A); glycosylated derivatives (e.g., daidzin); and acylated modifications. Metabolic regulatory network analysis revealed that the synergistic action of key enzymes, including core skeleton synthases (CYP93C, HIDH), hydroxylation modification enzymes (CYP81E9, CYP81E1/E7), O-methyltransferases (7-IOMT, HI4OMT), glycosylation/acylation enzymes (IF7GT, IF7MAT), and specific product synthases (CYP93A1), was the core mechanism driving the differential accumulation of metabolites.

Fig. 5
Column chart of temperature difference (A), total rainfall (B), and sunshine duration (C) accumulation in four regions (Feb to Jun 2022). Note: sunshine duration data were based on data from nearby stations in Shandong Jinan, Hebei Baoding, Henan Zhengzhou, and Anhui Hefei.

Notably, the differential metabolites formononetin, pseudobaptigenin, 2’-hydroxygenistein, 6’’-O-malonyldaidzin, daidzein-7-O-glucoside (daidzin), formononetin-7-O-glucoside (ononin), calycosin, apigenin, genistein, prunetin, biochanin A, glycitein, and isoformononetin are common to the SJsd vs. SJah and SJsd vs. SJhn comparisons but absent from SJsd vs. SJhb. This differential trend aligns with variations in temperature, precipitation, and sunshine hours. Thus, it can be speculated that the distinct metabolic profiles of SJsd and SJhb may be associated with their geographic origins in Shandong and Hebei Provinces. Environmental stressors prevalent in these regions could drive adaptive phytochemical responses63. Greater fluctuations in average temperature (Fig. 5A), reduced precipitation (Fig. 5B), and prolonged sunshine duration (Fig. 5C) may serve as key factors influencing the biosynthesis of these components by isoflavone metabolic regulatory enzymes. This association is speculative and based on observed correlations between climatic variables and metabolite content; establishing a causal relationship between environmental factors and metabolite biosynthesis will require further validation through controlled experimental studies. Nevertheless, the interpretation is consistent with existing literature. For instance, controlled experiments have demonstrated the regulatory influence of temperature and drought stress on soybean isoflavone content, supporting the hypothesis that climate factors (such as temperature fluctuations and reduced precipitation) modulate isoflavone biosynthesis via metabolic enzyme regulation64,65. Additionally, prunetin and biochanin A levels in Sophora alopecuroides L. have been shown to increase significantly under mild drought stress, whereas isoflavonoid content decreases markedly under moderate and severe drought conditions66.

In contrast, while Hebei Province shares broadly similar environmental conditions with Shandong, SJhb specimens exhibited lower isoflavone content than SJsd, despite exceeding the levels observed in samples from Henan and Anhui Provinces (Fig. 5). Differential metabolites identified across the SJsd vs. SJhn, SJhb, and SJah comparisons include maackiain, medicarpin, 3,9-dihydroxypterocarpan, 6’’-O-malonylglycitin, 7,4’-dihydroxyflavone, daidzein, liquiritigenin, and formononetin-7-O-(6’’-malonyl)glucoside. All of them are associated with isoflavonoid biosynthesis and contribute to the metabolic divergence among these origins. These observations suggest that, beyond climatic influences, additional factors significantly affect the regulation of key enzymes within the isoflavone metabolic pathway, particularly core skeleton synthases (HIDH), glycosylation and acylation enzymes (IF7GT, IF7MAT), specific product synthases (CYP93A1), and flavone synthases I and II. These findings align with existing research. Under UV-B stress, the levels of total isoflavones and aglycones (genistein, daidzein, and glycitein) increased, and H2 treatment under UV-B irradiation suppressed the activity and gene expression of IF7GT and IF7MAT. Additionally, appropriate row spacing has been shown to enhance daidzein accumulation in soybean seeds67. Collectively, these environmental and agronomic factors offer a foundational reference for the cultivation, screening, development, and application of FBSJ.

However, beyond climatic influences, additional variables likely contribute to the differential accumulation of metabolites in FBSJ, potentially through modulation of enzymatic activity and regulatory pathways. These variables include the composition of the soil microbiome39,68, epigenetic regulation68, and anthropogenic influences69,70. These factors likely act synergistically to finely regulate isoflavone biosynthesis. In particular, the complex interactions between rhizosphere microorganisms and plants may influence the plant’s metabolic processes, thereby promoting the synthesis and accumulation of isoflavones71. The metabolic pathway of isoflavones is highly complex and involves the coordinated action of multiple enzymes. In addition to the previously mentioned enzymes, several upstream biosynthetic enzymes, including phenylalanine ammonia-lyase (PAL), cinnamate-4-hydroxylase (C4H), and 4-coumarate-CoA ligase (4CL), along with various transcription factors, also play critical roles in regulating isoflavone biosynthesis72. Moreover, genetic variation among different cultivars or individual plants can result in differential expression of isoflavone metabolism-related genes, differences in enzyme activities, and variability in the overall efficiency of the metabolic pathway73,74. Transcriptomic analysis further revealed that the upregulation of upstream synthetic genes (e.g., PAL, CHS, F3H) and transcription factors (e.g., MYB, bHLH) in flavonoid biosynthesis may be mediated by epigenetic regulation74. These genetic adaptations, combined with localized abiotic pressures, likely act synergistically to enhance the production of secondary metabolites, including bioactive compounds with potential health-promoting properties.

Conclusions

This study systematically characterized the chemical diversity of FBSJ from four geographic origins using UPLC-MS/MS-based widely targeted metabolomics. Multivariate analyses (PCA, HCA, and OPLS-DA) revealed significant geographic variation in metabolic profiles. Beyond flavonoids, the dominant class of compounds, other metabolite categories such as lipids, phenolic acids, lignans, coumarins, nucleotides, and terpenoids showed origin-specific patterns of accumulation. These findings suggest their potential as novel chemotaxonomic markers for quality control of FBSJ. Among the 1,559 identified metabolites, the largest number reported for FBSJ to date, 135 metabolites were recognized as key active ingredients used in TCM, screened via the TCMSP database, and 193 metabolites were annotated as active pharmaceutical components linked to 11 major diseases (e.g., cardiovascular disorders, diabetes) via the TCMSP and CancerHSP databases. Comparative metabolomic profiling revealed that SJsd specimens contained higher concentrations of both TCM-associated and disease-targeting metabolites compared to SJhb, SJah, and SJhn, suggesting superior therapeutic potential and biochemical richness. Notably, 13 critical pharmacological compounds (e.g., 3’-methoxydaidzein, maackiain, medicarpin) were found to be over twofold more abundant in SJsd, highlighting its potential as a superior source material for investigating the health-promoting properties of FBSJ, particularly in relation to analgesic, anti-inflammatory, and bone-conserving effects. This finding provides empirical support for the origin–quality correlation, addressing a gap in the literature. KEGG pathway analysis identified isoflavonoid biosynthesis (ko00943) as the most differentially regulated pathway. 
SJsd showed enriched expression of metabolites associated with key biosynthetic enzymes (e.g., CYP93A1, IF7GT), which may be influenced by Shandong’s favorable climate conditions, such as moderate temperature fluctuations, lower precipitation, and extended sunlight exposure. Combined with epigenetic factors, these environmental elements likely enhance isoflavone synthesis through effects on gene expression, metabolic regulation, and microbial community structure.

This work represents a comprehensive metabolic analysis of FBSJ, offering new insights into metabolite variability across geographic origins. These findings not only help clarify origin–quality correlations but also provide pharmacological assessments and elucidate metabolite–disease associations linked to differential metabolite accumulation in FBSJ. This work establishes a foundational framework for germplasm evaluation, geographic origin traceability, and the strategic development of functional food products derived from FBSJ. Future research could integrate transcriptomic data and soil metagenomics to elucidate gene–environment interactions contributing to chemotypic variation in FBSJ, thereby advancing our understanding of how genetic regulation and microbial ecology jointly influence secondary metabolite biosynthesis.