Introduction

Soybean (Glycine max L.) is a globally important crop, valued for its high protein content and important role in both human nutrition and animal feed1,2. In addition to its protein richness, soybeans are abundant in bioactive compounds like flavonoids and isoflavones, which are known for their health benefits, such as reducing the risk of cardiovascular diseases, osteoporosis, and certain cancers3,4,5. As a versatile crop, soybeans are becoming increasingly important in agriculture, industry, and food production, with their global demand surging continuously6. With the projected global population increase, food production must significantly rise to meet future needs7. One of the most effective ways to address this challenge is through the development of soybean cultivars rich in valuable metabolites, such as flavonoids and isoflavones8.

Metabolite profiling is essential for uncovering the biochemical composition, metabolic pathways, and physiological states of soybeans and for providing key insights into their complex metabolic networks during growth and environmental responses9. This comprehensive metabolomic approach has enabled the identification and characterization of a wide array of primary and secondary metabolites, including amino acids, carbohydrates, lipids, and various phenolic compounds like flavonoids and isoflavones. Recent advances in analytical techniques, such as gas chromatography and high-performance liquid chromatography coupled with mass spectrometry (MS), have facilitated more detailed metabolomic studies10. While research has primarily focused on soybean seeds due to their nutritional and economic importance11, soybean leaves also contain valuable metabolites, particularly flavonoids like kaempferol glycosides12,13. Flavonoids are a diverse group of secondary plant metabolites that play crucial roles in plant physiology and human health14. In plants, they are involved in pigmentation, UV protection, nitrogen fixation, and pathogen defense15,16. In soybeans, isoflavones, a subgroup of flavonoids, are notable for their estrogen-like effects, which can influence hormone-related health conditions in humans17. Studies have demonstrated that soybean leaf extracts can benefit conditions such as diabetes and obesity, and flavonoid metabolites such as genistein, daidzein, and glycitein have been identified18,19. Despite their significance, the metabolic profiles of soybean leaves at various growth stages remain underexplored20. Understanding these changes is essential for enhancing plant biology knowledge and optimizing agricultural practices, which could reduce reliance on synthetic inputs and promote sustainable farming methods21,22.

Recent research has employed principal component analysis (PCA) to monitor metabolite changes across different growth stages, identifying key flavonoids like kaempferol glycosides and genistein as important markers15. Additionally, 83 flavonoid derivatives have been identified in 4-week-old soybean leaves23. While Park et al. investigated metabolite changes across five soybean cultivars and identified both primary and secondary metabolites20, our study expands on this research by conducting a comprehensive analysis of flavonoids and isoflavones from ten soybean cultivars across six leaf developmental stages to determine optimal leaf harvest times, as well as in their seeds. The vegetative (V) stages studied included nitrogen fixation initiation (V2) by rhizobia and the development of four trifoliate leaves (V4). The reproductive (R) stages encompassed flower formation (R1) through seed maturity (R7). Specifically, R1 marks the beginning of flowering, R3 signifies the onset of pod development, R5 indicates seed development, and R7 represents seed maturity24.

Thus, this research provides a comprehensive understanding of metabolite dynamics throughout the soybean life cycle, particularly regarding flavonoid and isoflavone composition in leaves and seeds. These insights have significant implications for optimizing soybean cultivation, enhancing its nutritional value, and advancing agricultural biotechnology. The findings lay a foundation for future crop improvement and the development of functional foods.

Results and discussion

Phenotypic characteristics of mature seeds and developmental stages of leaves

This study used 10 soybean varieties: five landraces, Meju (MJ), Bancheongdu (BCD), KLS88035 (KLS), SLSB397-1 (SL), and Kangwon Sujib2-33 (KW), and five improved varieties, Hoseo (HS), Suwon98 (SW), Sohwang (SH), Bosug (BS), and Kwangan (KA). Seeds and leaves at various developmental stages are shown in Fig. 1. Among the landraces, MJ seeds were yellow, KLS seeds were green, and BCD, SL, and KW seeds were brown. All improved varieties (HS, SW, SH, BS, and KA) had yellow seeds. As shown in Fig. 1B, the vegetative (V2 and V4) and reproductive (R1, R3, R5, and R7) stages can be visually distinguished in the soybean plants.

Fig. 1

Phenotypic characteristics of seeds of 10 soybean core collection cultivars (A) and leaf developmental stages (V2, V4, R1, R3, R5, and R7) of variety Kwangan (B). (A. bar = 1 cm; B. bar = 5 cm).

Metabolites of soybean seeds and leaves

In this study, a total of 16 metabolites, including 11 flavonoids and 5 isoflavones, were identified in 10 soybean seed samples and in leaves at different developmental stages by ultra-performance liquid chromatography-tandem mass spectrometry (UPLC-MS/MS). The identified flavonoids were apigenin, kaempferol, luteolin, naringenin, quercetin, rutin, vitexin, and four kaempferol glycosides (KFG1, KFG2, KFG3, and KFG4). The identified isoflavones included daidzein, daidzin, genistein, and genistin. Peak identification was supported by the TargetLynx XS software and an in-house mass spectra library. Detailed information on compound names, retention times, wavelengths, and precursor/product ions is provided in Supplementary Table S1. Compounds with available authentic standards were quantified by absolute quantification using calibration curves (Supplementary Table S2). Supplementary Figure S1 shows chromatograms of both samples and standard mixtures, validating the accuracy and reproducibility of the quantification. For the kaempferol group, quantification was carried out using UV detection at 350 nm, with peak areas normalized to an internal standard to ensure accuracy. All values are expressed as mean ± SD from three biological replicates. A comprehensive dataset of all identified compounds across 10 germplasm lines, including six developmental stages of leaf samples and seed samples, is provided in Supplementary Table S3.

Comparative analysis of metabolite profiles in seeds of ten soybean cultivars

Of the 16 identified metabolites, seven were detected in the seeds—daidzein, daidzin, genistein, genistin, naringenin, kaempferol, and quercetin. Notably, genistin and daidzin levels were significantly higher in seeds than in leaves (Supplementary Table S3). This finding aligns with those of previous studies that have highlighted the distinct metabolite profiles of soybean seeds and leaves20,23. PCA was performed to evaluate the metabolite profiles of the seeds (Fig. 2A, 2B). The score plot revealed considerable variability in the metabolite profiles across the 10 soybean cultivars, as indicated by the spread of samples along the first two principal components. The first principal component (PC1) accounted for 65% of the variance and the second (PC2) explained 22.6%, collectively capturing 87.6% of the total variance. This high level of explained variance is consistent with other recent metabolomic analyses of soybean seeds11. Distinct clustering patterns were observed, with SL, KW, BCD, KA, MJ, and SW exhibiting tight clusters, indicating homogeneity in their metabolite profiles. In contrast, KLS, HS, and BS displayed relatively greater dispersion, suggesting higher variability in their metabolite profiles. These varietal differences in the metabolite profiles corroborate the findings of Park et al.20, who reported significant metabolite variations among soybean cultivars. The observed differences in seed metabolite profiles underscore the importance of cultivar selection in breeding programs aimed at enhancing specific metabolite content, as highlighted by recent studies4,5. Furthermore, the distinct metabolite profiles of seeds compared with those of leaves emphasize the tissue-specific nature of metabolite accumulation in soybeans, a phenomenon that has also been noted in recent comprehensive metabolomic analyses3.
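The percentages reported for PC1 and PC2 are the shares of total variance carried by the two largest eigenvalues of the data covariance matrix. The following is a minimal sketch of that calculation, not the MetaboAnalyst implementation used in this study. It assumes autoscaling (mean-centering and unit variance per metabolite), which the text does not specify, and the function name is hypothetical.

```python
import math

def explained_variance_ratio(rows, n_components=2, n_iter=500):
    """Fraction of total variance captured by each leading principal component.

    rows: list of samples, each a list of metabolite values.
    """
    n, p = len(rows), len(rows[0])
    cols = list(zip(*rows))
    means = [sum(c) / n for c in cols]
    # Autoscaling is an assumption here; constant columns keep sd = 1.
    sds = [math.sqrt(sum((v - m) ** 2 for v in c) / (n - 1)) or 1.0
           for c, m in zip(cols, means)]
    z = [[(row[j] - means[j]) / sds[j] for j in range(p)] for row in rows]
    # Covariance matrix of the scaled data.
    cov = [[sum(z[i][a] * z[i][b] for i in range(n)) / (n - 1) for b in range(p)]
           for a in range(p)]
    total = sum(cov[j][j] for j in range(p))
    ratios = []
    for _component in range(n_components):
        v = [1.0 + 0.1 * j for j in range(p)]  # arbitrary non-zero start vector
        for _step in range(n_iter):            # power iteration
            w = [sum(cov[a][b] * v[b] for b in range(p)) for a in range(p)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        # Rayleigh quotient gives the eigenvalue estimate for this component.
        eigval = sum(v[a] * cov[a][b] * v[b] for a in range(p) for b in range(p))
        ratios.append(eigval / total)
        # Deflation: remove the component just found before looking for the next.
        cov = [[cov[a][b] - eigval * v[a] * v[b] for b in range(p)] for a in range(p)]
    return ratios
```

Summing the first two returned ratios gives the cumulative figure quoted in the text (65% + 22.6% = 87.6%). Power iteration is chosen here only to keep the sketch dependency-free; a linear algebra library would normally be used.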

Fig. 2

Principal component analysis (PCA) score plot (A) and loading plot (B) of soybean seed in 10 cultivars. Partial least squares discriminant analysis score plot (C) and loading plot (D) of soybean leaf according to six growth stages, orthogonal partial least squares discriminant analysis score plot (E) and loading plot (F) of soybean leaf between vegetative stages and reproductive stages and variable importance in projection (VIP) scores associated with metabolites (G).

Variation in metabolite profiles across growth stages

PCA of metabolite profiles during the vegetative growth stages did not reveal any clustering based on cultivar, suggesting minimal varietal differences during this phase. However, distinct variations between the V2 and V4 stages were evident, indicating unique metabolomic shifts as the plants progressed through the vegetative phase (Supplementary Fig. 2). These results align with those of Park et al.20, who also observed substantial fluctuations in metabolite composition within soybean leaves as they progressed through various developmental stages. Notably, varieties such as KA, SH, and HS consistently displayed high concentrations of kaempferol glycosides, suggesting the presence of variety-specific metabolic traits (Fig. 3). In the reproductive stages, partial least squares discriminant analysis (PLS-DA) revealed clear distinctions in metabolite profiles across different developmental stages of soybean plants, with the first two components explaining 40.8% of the total variation (Supplementary Fig. 2). This method successfully categorized the reproductive stages into three distinct clusters, consistent with the stage-specific metabolite changes observed by Song et al.15.

Fig. 3

Heat maps comparing the levels of flavonoids (A–K) and isoflavones (L–P) in vegetative and reproductive stages of 10 soybean cultivars. Peak areas were normalized to construct a comparative heat map using minimum and maximum algorithms. The deeper the red color, the higher the peak area of the metabolites; the deeper the green color, the lower the peak area of the metabolites.

Further PLS-DA across the entire growth period showed that the V4 stage was markedly distinct from the other stages, indicating a unique metabolite profile (Fig. 2C, 2D). This is in line with the findings of Lee et al.23, who reported significant flavonoid variations in young soybean leaves. Key metabolites contributing to the differentiation of growth stages included naringenin, genistin, and daidzin, with rutin and quercetin also playing crucial roles (Fig. 2D).

Notably, our findings on isoflavone content differed from those reported by Song et al.15. While they observed a rise in daidzein and genistein synthesis during the R5–R6 stages, our study found that these isoflavones were most abundant during the vegetative stages and decreased by the R5 stage. This discrepancy highlights the complex nature of metabolite dynamics in soybeans, which can be influenced by factors such as cultivar type and environmental conditions. The variable importance in projection (VIP) scores further underscored the significance of certain metabolites across different growth stages, with naringenin emerging as the most critical metabolite (Fig. 2G). The use of VIP scores to identify key metabolites is consistent with the methods employed by Kim et al.25 in their metabolomic analysis of soybean cultivars. These findings enhance our understanding of the dynamic metabolite changes in soybean leaves during growth stages, complementing previous research on cultivar-specific metabolite variations and the relationship between growth stages and flavonoid composition.
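For reference, the VIP score of metabolite j in a PLS model with A components is commonly defined as shown below. This is the standard textbook definition, given as an aid to reading Fig. 2G; the symbols are introduced here and are not taken from the original text.

```latex
\mathrm{VIP}_j = \sqrt{\frac{p \sum_{a=1}^{A} \mathrm{SSY}_a \left( \dfrac{w_{ja}}{\lVert \mathbf{w}_a \rVert} \right)^{2}}{\sum_{a=1}^{A} \mathrm{SSY}_a}}
```

Here p is the number of metabolites, w_{ja} is the weight of metabolite j on component a, and SSY_a is the sum of squares of the response explained by component a. Because the squared VIP scores average to 1 across all metabolites, scores above 1 are often read as above-average contributions, although that cutoff is a convention and not a formal test.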

OPLS-DA analysis of metabolite profile differences between vegetative and reproductive growth stages

Orthogonal partial least squares discriminant analysis (OPLS-DA) was employed to distinguish the metabolite profiles between the V and R growth stages of the soybeans (Fig. 2E, 2F). The predictive component accounted for 21.8% of the variation, while the orthogonal component (to) accounted for 17.3%, collectively representing 39.1% of the total variance. This analysis revealed a clear separation between the V and R stages, which is consistent with previous studies showing distinct metabolic changes during soybean development20. The V stage was characterized by higher concentrations of isoflavones and flavonoids such as daidzin, genistin, apigenin, kaempferol, genistein, naringenin, luteolin, and quercetin. These findings align with those of Lee et al.23, who identified 83 flavonoid derivatives in young soybean leaves. In contrast, rutin was more prominent during the R stage, supporting the observations of Song et al.15 of stage-specific flavonoid accumulation. These findings extend our understanding of the metabolite dynamics in soybean leaves across growth stages, complementing previous research on cultivar-specific metabolite variations20 and the relationship between seed coat color and flavonoid composition26.

Heat map analysis

Our heat map analysis (Fig. 3), based on the peak areas of metabolites, revealed distinct patterns of flavonoid accumulation across seeds and leaf developmental stages in soybean. Notably, four KFGs were detected alongside apigenin, glycitein, luteolin, and vitexin in the leaf stages but were absent in the seeds. This pattern is consistent with the findings of Park et al.20, who also reported the stage-specific accumulation of flavonoids in soybean leaves. In contrast, isoflavones like daidzein, daidzin, genistein, and genistin were more abundant in seeds and in leaves at the V stages but showed reduced levels during the R stages. This trend aligns with the observations of Kim et al.25 regarding isoflavone variation across soybean developmental stages. Kaempferol and quercetin were more abundant in the V stages than in the R stages, while rutin concentrations were substantially higher during the R stages. This differential accumulation pattern supports the findings of Chen et al.27, who reported stage-specific metabolite changes in soybean, particularly shifts in flavonoid profiles between the vegetative and reproductive phases. These results emphasize the dynamic nature of secondary metabolite synthesis in soybeans, as highlighted by recent metabolomic studies.
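The caption of Fig. 3 states that peak areas were normalized using minimum and maximum algorithms. A minimal sketch of min-max scaling, applied separately to each metabolite across samples, is given below. The function names and the example peak areas are hypothetical, and the exact procedure used by the Chiplot tool may differ.

```python
def min_max_scale(values):
    """Rescale one metabolite's peak areas to the range 0..1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A flat profile carries no contrast; map every sample to 0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def scale_heatmap(peak_areas):
    """peak_areas: dict mapping metabolite name -> list of peak areas per sample."""
    return {name: min_max_scale(areas) for name, areas in peak_areas.items()}

# Hypothetical peak areas for two metabolites across three stages.
example = scale_heatmap({"rutin": [2, 4, 6], "kaempferol": [9, 3, 3]})
```

Scaling each metabolite independently means that colors compare samples within a metabolite, not abundances between metabolites.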

Variations in metabolite content across soybean varieties: a focus on Kwangan soybeans

Our metabolomic analysis of 10 soybean cultivars revealed distinct patterns of flavonoid content across different growth stages. Kwangan soybeans, a common variety in Korea, consistently showed higher levels of kaempferol throughout all growth stages than the other varieties studied. This result aligns with that of previous reports highlighting the variability in flavonoid content among soybean cultivars23. Kwangan is widely cultivated in Korea and recognized for its commercial importance and versatility. It is particularly valued for its adaptability to Korean growing conditions and its high yield potential, making it one of the most commonly produced soybean cultivars in the country. Rich in protein and oil, it contributes to various food products, including traditional Korean soy-based foods such as tofu, soy milk, and fermented soybean products.

Notably, for certain metabolites, particularly luteolin, quercetin, and rutin, the other cultivars exhibited higher concentrations than KA. Varieties such as BCD, KLS, SL, KW, SW, and BS contained significant amounts of luteolin and quercetin, whereas MJ, HS, and SH varieties, similar to KA, had low concentrations (Fig. 4). This variation in flavonoid profiles among cultivars has been reported previously20. Among the varieties with high levels of these metabolites, only SW and BS had yellow seed coats, whereas the others exhibited brown and green seed coats. This observation supports the findings of Kim et al.25, suggesting that seed coat color may influence flavonoid composition.

Fig. 4

Quantitative comparison of flavonoids (A, kaempferol, luteolin, quercetin, and rutin) and isoflavones (B, daidzein, daidzin, genistein, and genistin) across soybean cultivars and developmental stages. Flavonoids were predominantly detected in leaves, with minimal or no presence in seeds. Isoflavone glycosides (daidzin and genistin) showed peak concentrations in seeds, while aglycones (daidzein and genistein) were highest at the V4 stage. Data represent mean concentrations (μg/g dry weight).

Rutin was identified as the most prominent metabolite during the R stage, with significantly higher concentrations than during the V stage. This stage-specific accumulation of flavonoids has been reported previously15. The cultivars BCD, KLS, SL, KW, and BS contained substantial amounts of rutin, with KW exhibiting the highest concentration. Our findings support and extend previous research indicating that soybeans with non-yellow seed coats, including black, green, brown, and white, tend to have higher luteolin, quercetin, and rutin content than yellow-seeded varieties26. Notably, despite its yellow seed coat, BS contained high levels of these compounds, demonstrating that this correlation is not absolute.

Despite including both improved and landrace lines in our study, we did not observe significant differences in metabolite levels across growth stages between these types. Additionally, when examining the metabolite content differences among the five soybean clades identified by Kim et al.25, no notable variations were detected. These results suggest that factors beyond genetic lineage or improvement status may play a crucial role in determining the flavonoid profiles in soybeans.

Kwangan’s significance extends beyond its agronomic traits. It has been utilized as a model variety in genetic modification studies aimed at enhancing nutritional characteristics, such as the development of β-carotene-enhanced soybeans28. Its strong agronomic performance under local conditions has made it a popular choice among farmers and an important subject in breeding programs focused on improving disease resistance and other desirable traits. Additionally, KA's prominence in research is underscored by its frequent use in studies investigating metabolite profiles and genetic improvements29. Overall, KA plays a crucial role in both the agricultural landscape and food industry in Korea, and our study provides new insights into its flavonoid profile compared to other varieties.

Gene expression analysis of key polyphenolic biosynthetic genes

In addition to metabolomic profiling, we performed gene expression analysis of six key polyphenolic biosynthetic genes, namely chalcone synthase (CHS1), chalcone isomerase (CHI), flavonol synthase (FNS), 2-hydroxyisoflavanone dehydratase (HID), isoflavone O-methyltransferase (ISMAT), and isoflavone synthase (IFS1), which are central components of the flavonoid and isoflavonoid biosynthetic pathways in G. max (Fig. 5). These genes were analyzed through qRT-PCR across different developmental stages and in seeds. CHS and CHI are responsible for the initiation of flavonoid skeleton formation, FNS branches the intermediates toward flavone derivatives, HID and IFS1 drive isoflavone backbone formation, and ISMAT mediates late-stage methylation of isoflavones. These genes represent critical regulatory nodes that govern flux through the flavonoid/isoflavonoid metabolic network, and their expression patterns likely contribute to the observed variation in metabolite accumulation.

Fig. 5

Simplified polyphenolic metabolic pathway (A) and expression levels of flavonoid and isoflavone biosynthetic genes (B). CHS, chalcone synthase; CHI, chalcone isomerase; FNS, flavonol synthase; IFS, isoflavone synthase; ISMAT, isoflavone 7-O-methyltransferase; HID, 2-hydroxyisoflavanone dehydratase. Three biological replicates were performed, and error bars represent standard deviation (SD).

Integrating metabolomic data with gene expression data from qPCR provides a comprehensive understanding of the biological processes in soybeans, revealing the connections between metabolic pathways and gene regulation27. Studies have shown correlations between metabolite levels and the expression of key genes in specific pathways such as flavonoid biosynthesis and isoflavone production30. These correlations can be developmental stage-specific31 or vary in response to environmental factors32. Comparative analyses of different soybean cultivars have revealed how metabolite–gene expression correlations may differ, providing insights into breeding programs33. Recent research has demonstrated the correlations between primary metabolite content and seed protein and oil traits in near-isogenic soybean lines, with 28 annotated metabolites showing significant correlations with seed traits26. Additionally, integrated transcriptomic and metabolomic analyses have been used to investigate the regulatory networks controlling soybean seed metabolism34 and to classify soybeans from different habitats35.

We examined the CHS gene, a type III polyketide synthase that catalyzes the first step in flavonoid biosynthesis. In G. max, 14 unique CHS genes (GmCHS1-GmCHS14) have been identified36. From these genes, we selected CHS1 for correlation with our metabolite data, and observed its expression at the R7 leaf stage. Isoflavone-related genes, including IFS, HID, and ISMAT, were highly expressed in the seeds and R7 and V4 leaves, which corresponded with high levels of daidzin, genistin, and glycitin in our metabolite data. Additionally, FNS, which is involved in flavonol biosynthesis, showed high expression at the R1, V4, and R3 stages, particularly in the leaves, which aligned with the high flavonoid levels observed in our metabolomic data (Fig. 5). Finally, our gene expression results were compared to previously published transcriptomic data37, which are consistent with our findings, further validating the expression patterns observed in our study.

Correlation analysis integrating metabolite and gene expression data

To elucidate the regulatory relationships between metabolite accumulation and gene expression, we performed Pearson correlation analysis between 16 flavonoid/isoflavonoid metabolites and six key polyphenolic biosynthetic genes (CHS1, CHI, FNS, HID, ISMAT, and IFS1). The resulting correlation matrix revealed distinct positive and negative association patterns across developmental stages and varieties. Several metabolites, such as kaempferol-group compounds and daidzein derivatives, showed strong positive correlations with FNS and IFS1 expression, while others, including early flavanones, displayed weak or inverse correlations.
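The Pearson coefficient for one metabolite–gene pair can be sketched as follows. This is an illustrative, dependency-free version and not the R code used in this study; the function names are hypothetical. Turning the t statistic into a two-sided p-value requires the Student t distribution with n − 2 degrees of freedom, which is left to a statistics library.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equally long series of measurements."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / math.sqrt(sxx * syy)

def pearson_t(r, n):
    """t statistic for testing r against zero, with n - 2 degrees of freedom."""
    if abs(r) >= 1.0:
        return math.copysign(math.inf, r)
    return r * math.sqrt((n - 2) / (1.0 - r * r))
```

In this setting, x would hold one metabolite's levels and y one gene's relative expression, both measured over the same set of samples.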

We visualized these associations as a heatmap (Fig. 6A), highlighting the correlation coefficients (r) and their statistical significance following Benjamini–Hochberg false discovery rate (FDR) correction. To further examine the most robust associations, we filtered significant metabolite–gene pairs (FDR < 0.05) and constructed a correlation network (Fig. 6B). In this network, nodes represent metabolites or genes, and edges represent significant positive (red) or negative (blue) correlations, with edge width scaled to |r| and node size proportional to degree. This network clearly grouped metabolites and genes into coherent clusters, illustrating potential co-regulation modules underlying polyphenolic biosynthesis.
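The Benjamini–Hochberg adjustment applied to the correlation p-values can be sketched as below. This is a generic illustration of the procedure, not the authors' code, and the function name is hypothetical. Each p-value is multiplied by m/rank, and monotonicity is then enforced from the largest p-value downward.

```python
def benjamini_hochberg(pvalues):
    """Return BH-adjusted p-values (FDR) in the same order as the input."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices, smallest p first
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that adjusted values
    # never increase as the raw p-value decreases.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted
```

A metabolite–gene pair would be retained for the network when its adjusted value falls below 0.05, the threshold stated in the text.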

Fig. 6

Correlation between flavonoid/isoflavonoid metabolites and key polyphenolic biosynthetic genes. (A) Heatmap showing Pearson correlation coefficients (r) between 16 identified metabolites and six biosynthetic genes (CHS1, CHI, FNS, HID, ISMAT, and IFS1). Colors indicate correlation strength (blue = negative, red = positive). Statistical significance was assessed using two-sided tests and corrected using the Benjamini–Hochberg false discovery rate (FDR). (B) Network diagram showing significant metabolite–gene correlations (FDR < 0.05). Nodes denote metabolites (circles) and genes (diamonds); node size reflects degree, and node color indicates community grouping. Edge color represents correlation sign (red = positive, blue = negative) and edge width corresponds to |r|. Isolates (degree = 0; naringenin, daidzein, and genistein) are metabolites that were measured but had no significant associations at the chosen thresholds. This network highlights modules of tightly associated metabolites and genes.

Table 1 List of 10 soybean cultivars selected based on the genome coverage with subpopulation clade.

Conclusions

This study conducted a comprehensive metabolite profiling of 10 diverse soybean varieties across six leaf developmental stages and in seeds, with a particular focus on flavonoids and isoflavones. Using UPLC-MS/MS, we identified 16 key compounds that provided valuable insights into the dynamic metabolite landscape of soybeans. Through multivariate analyses, including PCA, PLS-DA, and OPLS-DA, we observed clear distinctions in metabolite profiles between the vegetative and reproductive stages, as well as among different soybean varieties. Notably, naringenin, genistin, and kaempferol were identified as the most significant metabolites for differentiating between varieties and developmental stages, based on VIP scores. This study establishes a basis for optimizing soybean cultivation and improving nutritional quality by revealing metabolic profiles across growth stages and varieties, providing insights for targeted breeding programs.

Materials and methods

Plant materials

From a core collection of 430 soybean cultivars38, ten were selected for flavonoid and isoflavone profiling. These included both landrace varieties (Meju, Bancheongdu, KLS88035, SLSB397-1, Kangwon Sujib2-33) and improved varieties (Hoseo, Suwon98, Sohwang, Bosug, Kwangan)25. The selection was based on a genomic coverage of over 80%, ensuring comprehensive genetic representation. The cultivars were provided by the National Institute of Crop Science in Wanju, Jeollabuk-do, Korea. Planting, cultivation, and sampling were conducted at the Metabolic Engineering Division, Department of Agricultural Biotechnology, National Institute of Agricultural Sciences, Rural Development Administration, 370 Nongsaengmyeong-ro, Jeonju 54874, Republic of Korea (35°49′51"N, 127°03′55"E). Each soybean cultivar was planted in a greenhouse on June 22, 2023. On July 3, 2023, seedlings were transferred to an isolated field. Field planting was conducted at 15-cm intervals between seedlings on six 80 × 0.8-m rows, with 20 seedlings planted for each cultivar. Leaves were collected from the most recently developed trifoliate leaves at six distinct growth stages (V2, V4, R1, R3, R5, and R7). Leaves were flash-frozen in liquid nitrogen immediately after collection to prevent metabolic changes. The frozen leaves were lyophilized, ground into a fine powder, and stored at -80 °C until further analysis. All experimental procedures involving plant material complied with RDA, Korean guidelines, and applicable legislation.

Flavonoid and isoflavone extraction

Flavonoids and isoflavones were extracted and analyzed as previously described20. In summary, 25 mg of ground leaf or seed powder was weighed and combined with 200 µL of 1.2 M hydrochloric acid in 80% ethanol (v/v). Internal standards were then added: 50 µL of 2.5 ppm galangin (Sigma-Aldrich, St. Louis, MO, USA; CAS.No: 92342) for flavonoids and 0.5 ppm of 6-methoxyflavone (Sigma-Aldrich; CAS.No: 419737) for isoflavones. The mixture was vigorously stirred at 1200 rpm for 2 h at 26 °C using a Thermomixer Comfort (Eppendorf AG, Hamburg, Germany). Subsequently, the samples were centrifuged at 13,000 × g for 10 min at 25 °C in a TOMY MX-307 centrifuge. The supernatant was passed through a 0.5-µm filter (Advantec Toyo Roshi Kaisha Ltd., Tokyo, Japan) at room temperature.

UPLC-MS/MS analysis of metabolites

Flavonoid and isoflavone compounds were separated and analyzed using an ACQUITY UPLC CSH C18 column (1.7 μm, 150 mm × 2.1 mm; Waters, Milford, MA, USA). The mobile phase consisted of 0.1% formic acid in water (solvent A) and 0.1% formic acid in acetonitrile (solvent B), at a flow rate of 0.3 mL/min. Gradient elution was applied as follows: 0–20 min, 95–75% A; 20–28 min, 75–60% A; 28–30 min, 60–10% A; 30–32 min, 10% A; 32–35 min, 10–95% A; and 35–40 min, 95% A. The injection volume was 1 μL, and the column temperature was maintained at 30 °C. The UPLC system was connected to a Xevo TQ-S micro mass spectrometer (Waters) operating in the positive ion mode, with nitrogen as the desolvation gas. The desolvation temperature was set to 500 °C at a flow rate of 1,000 L h⁻¹, with a source temperature of 120 °C. The capillary and cone voltages were 0.5 kV and 5 V, respectively. Collision energy for MS/MS was 15–30 V, and data were acquired in multiple reaction monitoring (MRM) mode. Raw MS data were processed using the TargetLynx XS applications manager.
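The gradient program above can be written as a small lookup that returns the mobile phase composition at any time point. This sketch assumes linear ramps between the stated breakpoints, which the text implies but does not state; the names are hypothetical.

```python
# (time in min, % solvent A) breakpoints taken from the gradient described above.
GRADIENT = [(0, 95), (20, 75), (28, 60), (30, 10), (32, 10), (35, 95), (40, 95)]

def percent_a(t_min):
    """Percent solvent A at time t_min, assuming linear ramps between breakpoints."""
    if not GRADIENT[0][0] <= t_min <= GRADIENT[-1][0]:
        raise ValueError("time is outside the 0-40 min program")
    for (t0, a0), (t1, a1) in zip(GRADIENT, GRADIENT[1:]):
        if t0 <= t_min <= t1:
            return a0 + (a1 - a0) * (t_min - t0) / (t1 - t0)

def percent_b(t_min):
    """Percent solvent B is the remainder of the binary mixture."""
    return 100 - percent_a(t_min)
```

For example, percent_a(10) falls halfway along the first ramp from 95% to 75% A.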

Prior to sample analysis, each target metabolite was confirmed using authentic reference standards, and individual calibration curves were constructed across appropriate concentration ranges (0.002–10 µg mL⁻¹). Standard solutions were analyzed under the same UPLC-MS conditions as the samples, and calibration curves were generated by plotting peak area versus concentration. All calibration curves showed excellent linearity (R² ≥ 0.994), with consistent slopes and intercepts, indicating high sensitivity and reproducibility. Retention times and precursor/product ion transitions were stable across injections, with retention time drift < 0.1 min. Calibration standards were injected immediately before the biological samples to verify system suitability and instrument stability. Subsequently, solvent blanks were run after the highest-concentration standards to ensure the absence of carryover.
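A calibration curve of this kind, and the R² used to judge its linearity, can be sketched as follows. This assumes an unweighted least-squares fit of peak area against concentration, since the text does not state the regression weighting; the function names are hypothetical.

```python
def fit_calibration(conc, area):
    """Unweighted least-squares line: area = slope * conc + intercept."""
    n = len(conc)
    mx, my = sum(conc) / n, sum(area) / n
    sxx = sum((x - mx) ** 2 for x in conc)
    sxy = sum((x - mx) * (y - my) for x, y in zip(conc, area))
    slope = sxy / sxx
    intercept = my - slope * mx
    # Coefficient of determination from residual and total sums of squares.
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(conc, area))
    ss_tot = sum((y - my) ** 2 for y in area)
    r_squared = 1.0 - ss_res / ss_tot
    return slope, intercept, r_squared

def concentration_from_area(area, slope, intercept):
    """Invert the calibration line to estimate concentration from a peak area."""
    return (area - intercept) / slope
```

With standards spanning 0.002–10 µg mL⁻¹, an unweighted fit is dominated by the highest concentrations, so a weighted fit may be preferable when low-level accuracy matters.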

The quantification of individual flavonoid metabolites was performed using the external calibration curves derived from authentic standards (Supplementary Table S2). For the kaempferol group, quantification was carried out by UV detection at 350 nm, and peak areas were normalized to an internal standard to correct for injection and instrumental variability. For all analytes, the ratio of the peak area of each metabolite to that of the internal standard was used for concentration estimation based on the selected ions. Representative chromatograms of the authentic standards and sample mixtures are shown in Supplementary Figure S1, confirming the selectivity and reproducibility of the quantification procedure.

RNA extraction and cDNA synthesis

Total RNA was extracted from 10 soybean cultivar leaves at six developmental stages (V2, V4, R1, R3, R5, and R7), and from their seeds, using a Spectrum Plant Total RNA Kit (Sigma-Aldrich), following the manufacturer’s guidelines. The purified RNA was dissolved in DEPC-treated water, and 1 µg of RNA was utilized for cDNA synthesis. This synthesis was conducted using a Bio-Rad cDNA synthesis kit (Bio-Rad Laboratories, Hercules, CA, USA) in a thermal cycler. The procedure commenced with an initial denaturation at 72 °C for 3 min, followed by cooling to 4 °C. Subsequently, the master mix was added to a pre-cooled tube and incubated at 42 °C for 4 h. The reaction was concluded by heating the mixture to 70 °C for 10 min. The samples were then stored at -20 °C until further analysis.

Gene expression analysis via quantitative real-time PCR

Gene expression levels were assessed via quantitative real-time polymerase chain reaction (qRT-PCR). This analysis was conducted on a CFX96 Real-Time PCR detection system using the SYBR Premix (Bio-Rad Laboratories). The PCR conditions included an initial denaturation at 95 °C for 5 min, followed by 45 cycles of 15 s at 95 °C for denaturation and 30 s at 60 °C for annealing. Primer pairs were designed using the PrimerQuest Tool from Integrated DNA Technologies (Coralville, IA, USA). Gene expression levels were normalized to the internal reference gene, Glycine max L. Actin (NM_001252731.3). Relative quantification was performed using the Livak and Schmittgen 2^−ΔΔCT method to evaluate quantitative differences between replicates39. The accession numbers and primer sequences used in this study are listed in Supplementary Table 4.
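The relative quantification step can be illustrated with a short sketch of the Livak and Schmittgen calculation. The choice of calibrator sample is not stated in the text, so the calibrator arguments below are placeholders, and the function name is hypothetical. The method assumes near-100% amplification efficiency for both target and reference genes.

```python
def relative_expression(ct_target, ct_ref, ct_target_cal, ct_ref_cal):
    """Fold change of a target gene relative to a calibrator sample.

    ct_target, ct_ref: Ct of the target and reference (actin) gene in the sample.
    ct_target_cal, ct_ref_cal: the same two Ct values in the calibrator sample.
    """
    delta_sample = ct_target - ct_ref             # normalize to the reference gene
    delta_calibrator = ct_target_cal - ct_ref_cal
    delta_delta = delta_sample - delta_calibrator
    return 2 ** -delta_delta
```

A sample whose normalized Ct is two cycles lower than the calibrator's therefore corresponds to a fourfold higher expression.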

Data analysis

The data were analyzed using MetaboAnalyst 6.0 (https://www.metaboanalyst.ca; accessed on September 5, 2024)40. PCA, PLS-DA, and OPLS-DA were used to discriminate between growth stages based on metabolite profiles, and VIP scores were used to assess the significance of each metabolite. Data normalization was performed to ensure robust and reliable results. Heatmaps and quantification plots of selected metabolites were created using the online tool Chiplot (https://www.chiplot.online/)41,42 to visualize the abundance of key metabolites across growth stages, highlighting stage-specific patterns in the distribution of flavonoids and isoflavones. Pairwise Pearson correlation coefficients (r) and p-values were computed using R (version 4.5.1). The resulting edge and node tables were exported from R and visualized as a correlation network using the Cytoscape platform. The experiments included three biological and three technical replicates to ensure reliable results.
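The edge and node tables passed to Cytoscape can be sketched as below. The authors built theirs in R; this Python version is only an illustration, and the column names, function name, and example values are hypothetical. Pairs are kept when their FDR is below 0.05, edges carry the sign and |r|, and node degree is the number of retained edges per node.

```python
from collections import Counter

def build_network(pairs, fdr_threshold=0.05):
    """pairs: iterable of (metabolite, gene, r, fdr) tuples.

    Returns (edges, degree): a list of edge records and a dict of node degrees.
    """
    edges = [
        {
            "source": metabolite,
            "target": gene,
            "sign": "positive" if r > 0 else "negative",  # edge color in Fig. 6B
            "weight": abs(r),                             # edge width in Fig. 6B
        }
        for metabolite, gene, r, fdr in pairs
        if fdr < fdr_threshold
    ]
    degree = Counter()
    for edge in edges:
        degree[edge["source"]] += 1
        degree[edge["target"]] += 1
    return edges, dict(degree)

# Hypothetical values for illustration only; not results from this study.
edges, degree = build_network([
    ("metabolite_A", "gene_X", 0.82, 0.003),
    ("metabolite_B", "gene_X", -0.61, 0.020),
    ("metabolite_C", "gene_Y", 0.10, 0.700),
])
```

Nodes that never appear in a retained edge have degree 0 and correspond to the isolates described in the legend of Fig. 6.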