Introduction

Soybean (Glycine max) was domesticated from wild soybean (G. soja) in China 5000–9000 years ago1,2. Soybean is the most economically important legume crop worldwide and provides abundant protein and vegetable oil for human consumption, animal feed, and industrial production3. Given the growing global population and ever-expanding consumer demand, global crop yields will need to double by 2050 to sustain current levels of food supply4,5. In contrast to the dramatic production increases of staple crops such as rice, wheat, and maize, which benefited greatly from the Green Revolution, soybean yield has increased only modestly over the past six decades6,7,8. Improving soybean yield is therefore a critical and urgent objective of soybean breeding9,10. Soybean yield is a complex trait determined by various factors, including plant height, internode formation, branch number, plant density, pod number, seed number per pod, and seed size, which together determine seed weight and the number of seeds per unit area11. Among these determinants, seed weight, which is frequently positively correlated with seed size, is a primary influence on soybean yield.

Seed weight increased significantly during domestication owing to the selection of superior alleles and the reduced frequency of negative alleles12. Consequently, introducing superior alleles into cultivars and knocking out or editing negative alleles offers a strategy for cultivar improvement, one that would benefit from a better understanding of the molecular characteristics of these alleles. Several major signaling pathways that control seed weight have been identified in model plants13,14. Hundreds of quantitative trait loci (QTLs) associated with seed weight/size have also been identified in soybean, although only a few genes underlying these QTLs have been functionally characterized15,16,17. In addition, many positive regulators of soybean seed weight/size18,19,20,21,22,23,24,25,26,27,28 and a few negative regulators7,29,30,31 have been identified. Moreover, a few genes21,22,23,25,30,31, including GmST05/GmMFT22,25, have been shown to affect both protein/oil content and seed size. However, the genetic and regulatory mechanisms controlling seed weight in soybean remain largely unknown, as do effective strategies for manipulating them to boost soybean yield.

The ubiquitin–proteasome pathway plays a critical role in controlling seed weight and size13,14. E3 ubiquitin ligases directly modulate protein ubiquitination and stability and have frequently been reported to regulate seed weight/size in various plants32,33,34,35,36,37. In addition to ubiquitination, transcriptional regulators, particularly transcription factors (TFs), are essential for seed development13,14; one example is the bZIP TF ABA INSENSITIVE 5 (ABI5)38. Meanwhile, 14-3-3 proteins in eukaryotes regulate various properties of their target proteins, including translocation, structure, activity, and stability, thereby providing platforms for cross-talk between signaling pathways and playing critical roles in abiotic stress responses and plant growth39,40. However, the roles and molecular mechanisms of 14-3-3 proteins in modulating soybean seed size remain unexplored. Moreover, how ubiquitination and 14-3-3 proteins cooperate to fine-tune TF protein homeostasis remains unknown in plants.

In this work, we characterize the soybean small seed 6 (sms6) mutant. Genetic, molecular, and biochemical analyses reveal that the causal gene, GmSMS6, encodes a 14-3-3 protein that negatively regulates seed weight by restricting cell expansion. Moreover, GmSMS6 cooperates with an E3 ubiquitin ligase (GmUBQ1) to coordinate post-translational regulation, thereby fine-tuning the transcriptional activation activity and protein homeostasis of GmbZIP151. Disrupting GmSMS6 increases soybean yield potential by about 10% and increases seed protein content but decreases oil accumulation. Nevertheless, loss-of-function (LoF) alleles of GmSMS6 have not been selected during domestication. This study provides insights into the determinants of seed weight during domestication and identifies a potential target for breeding high-yield, high-quality soybean.

Results

Positional cloning of the qSMS6 locus

An sms6 mutant was identified via a forward genetic screen of an ethyl methanesulfonate (EMS)-mutagenized population in the background of the staple soybean cultivar ‘Zhongpin661’ (Zp661)41. Compared with Zp661, sms6 plants were shorter and had more branches (Fig. 1a, b and Supplementary Fig. 1a–c). Seed size, including length, width, and thickness, was uniformly reduced in sms6 plants (Fig. 1c–j), resulting in a dramatically decreased 100-seed weight (Fig. 1k). However, other agronomic traits, including node and pod numbers, did not differ significantly between sms6 and Zp661 plants (Supplementary Fig. 1d,e). Seed yield per plant was significantly lower in sms6 than in Zp661 plants (Fig. 1l).

Fig. 1: Phenotypic characterization of the soybean sms6 mutant.

Images showing the architectures of Zp661 (a) and sms6 (b) plants. Bar = 10 cm. Images showing seed size of Zp661 (c) and sms6 (d) plants. Images showing seed length (e), width (f), and thickness (g) in Zp661 and sms6 plants. Bars = 10 mm. Quantification of seed length (h), seed width (i), seed thickness (j), 100-seed weight (k), and seed yield per plant (l) of Zp661 and sms6 plants. Seed length, width, and thickness were measured in at least 60 seeds (n = 78, 60, and 60 seeds for h, i, and j, respectively), while 100-seed weight and seed yield per plant were measured in 10 individuals (n = 10 for k and l). Data are shown as means ± SD. Statistically significant differences were determined by two-sided Student’s t-tests. ***p < 0.001. Source data are provided as a Source Data file.
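The two-sided Student’s t-tests in this caption compare two genotypes under an equal-variance assumption. A minimal sketch of the test statistic is shown below; the function name is hypothetical, and the p-value step is only described, since it needs a t-distribution survival function (for example, `scipy.stats.t.sf`).

```python
import math
from statistics import mean, variance


def student_t(a, b):
    """Pooled-variance (Student's) two-sample t statistic and degrees of freedom.

    Assumes both groups share a common variance, as Student's t-test does.
    The two-sided p-value is 2 * P(T_df >= |t|) for a t distribution with
    df degrees of freedom; it is not computed here.
    """
    na, nb = len(a), len(b)
    if na < 2 or nb < 2:
        raise ValueError("each group needs at least two observations")
    df = na + nb - 2
    # statistics.variance is the sample variance (denominator n - 1)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    se = math.sqrt(pooled * (1 / na + 1 / nb))
    return (mean(a) - mean(b)) / se, df
```

For 10 plants per genotype, as in Fig. 1k, l, the test has df = 18.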

The MutMap method42 was used to map the sms6 locus responsible for the reduction in seed weight, using an F2 population derived from a cross between Zp661 and sms6. Analysis of candidate single nucleotide polymorphisms (SNPs) and calculation of the ΔSNP-index revealed a major QTL, designated qSMS6, on the long arm of chromosome 2, based on the Williams 82 reference genome43 (Fig. 2a and Supplementary Fig. 2a). qSMS6 lies within a QTL-rich region for seed weight/size previously identified in several independent studies44,45,46,47,48,49 (Supplementary Fig. 2b). To identify the causal gene, molecular markers were developed within this interval for fine-mapping-based cloning of GmSMS6. For this purpose, an F2:3 population of 4120 phenotyped individuals was constructed from residual heterozygous lines (RHLs) that were heterozygous for the qSMS6 region (Fig. 2b). The causative region was narrowed down to a 308.5-kb interval between 39.35 and 39.66 Mb (Fig. 2b). One SNP (a G-to-A transition at position 39,388,884) was detected in the first exon of the candidate gene Glyma.02G208700, causing a non-synonymous substitution from arginine (R) in Zp661 to histidine (H) in sms6, hereafter referred to as the R49H substitution (Supplementary Fig. 2c). Glyma.02G208700 was therefore identified as the candidate gene for qSMS6.
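The ΔSNP-index scan used to locate qSMS6 can be sketched as follows. This is one common formulation, assumed here rather than taken from the study’s pipeline: reads are counted at each EMS-induced SNP in a bulk of small-seed F2 plants and a bulk of normal-seed F2 plants, the SNP-index is the fraction of reads carrying the mutant allele, and ΔSNP-index is the difference between the two bulks, averaged in sliding windows. The class and function names, window size, step, and read counts are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SnpCounts:
    """Allele read counts at one EMS-induced SNP in two F2 bulks (hypothetical)."""
    pos: int        # position on the chromosome (bp)
    mut_alt: int    # reads with the mutant allele in the small-seed bulk
    mut_depth: int  # total reads in the small-seed bulk
    wt_alt: int     # reads with the mutant allele in the normal-seed bulk
    wt_depth: int   # total reads in the normal-seed bulk


def delta_snp_index(s: SnpCounts) -> float:
    """SNP-index of the small-seed bulk minus that of the normal-seed bulk."""
    return s.mut_alt / s.mut_depth - s.wt_alt / s.wt_depth


def sliding_window(snps, window: int, step: int):
    """Mean delta SNP-index in windows of `window` bp advanced by `step` bp.

    Returns (window_start, mean_delta, n_snps) for each window holding >= 1 SNP.
    """
    snps = sorted(snps, key=lambda s: s.pos)
    if not snps:
        return []
    results = []
    start = (snps[0].pos // step) * step  # align the first window to the step grid
    last = snps[-1].pos
    while start <= last:
        values = [delta_snp_index(s) for s in snps if start <= s.pos < start + window]
        if values:
            results.append((start, sum(values) / len(values), len(values)))
        start += step
    return results
```

Windows whose mean ΔSNP-index stands out from the background mark candidate intervals; the thresholds actually used to call qSMS6 are not given in this section.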

Fig. 2: Map-based cloning and functional analysis of GmSMS6.

a GmSMS6 was mapped to the qSMS6 region on chromosome 2. b Fine-mapping of GmSMS6 to a 308.5-kb region. The green and blue bars represent homozygous alleles from Zp661 and sms6, respectively. The bar graph on the right shows 100-seed weight for each recombinant. GmSMS6 is highlighted by the red box, and the black boxes indicate the other genes within this genomic region. c 100-seed weight (n = 4 individuals) and indicated seed traits (n = 40 seeds) for Zp661, CR-SMS6Z1, and CR-SMS6Z2. d Comparison of seed length among Jack and the indicated CR-SMS6 lines. Bar = 10 mm. e 100-seed weight (n = 10 individuals) and indicated seed traits (n = 60 seeds) in Jack and CR-SMS6 lines. f Comparison of seed length among Jack, OE-SMS6G, and OE-SMS6A lines. g 100-seed weight (n = 10, 4, 4, 4, and 4 individuals for Jack, OE1-SMS6G, OE2-SMS6G, OE1-SMS6A, and OE2-SMS6A lines, respectively) and indicated seed traits (n = 60, 40, 40, 40, and 40 seeds for the same lines) in Jack and four OE-SMS6 lines. h Scanning electron microscopy (SEM) analysis of mature seed cotyledons (inner surface) from Jack, CR1-SMS6, OE1-SMS6G, and OE1-SMS6A lines. Bar = 10 µm (cells) and 2 mm (cotyledons). i Cell area, cotyledon area, and cell number in mature seeds (n = 4) of Jack, CR1-SMS6, OE1-SMS6G, and OE1-SMS6A plants; cell size was measured in 406, 457, 432, and 508 cells for each genotype, respectively. Data are shown as means ± SD. In all box plots, the center line indicates the median, the box edges represent the first and third quartiles, and the whiskers extend to the smallest and largest data points within 1.5 interquartile ranges of the box edges. Statistically significant differences (c, e) were determined by two-sided Student’s t-tests; ***p < 0.001. Different lowercase letters (g, i) indicate statistically significant differences (p < 0.05) determined by two-sided one-way ANOVA tests with Duncan’s multiple range tests. ns, not significant. Source data are provided as a Source Data file.

To test this hypothesis, a pair of near-isogenic lines (NILs) for qSMS6 was developed by marker-assisted selection from an RHL heterozygous for the qSMS6 region in the Zp661 background (Fig. 2b and Supplementary Fig. 3a, b). Among the measured agronomic traits, only seed weight and yield per plant differed significantly between the two NILs (Supplementary Fig. 3c–g and Supplementary Fig. 4). Seed developmental stages were similar between the two NILs (Supplementary Fig. 4a). However, NIL-SMS6, which carries the Zp661 allele, exhibited significantly greater seed weight/size and yield per plant than NIL-sms6, which harbors the sms6 allele (Supplementary Fig. 4b–g), implying a seed-specific function of GmSMS6.

GmSMS6 negatively influences soybean seed weight

To further investigate the role of GmSMS6 in seed weight, GmSMS6 was knocked out in Zp661 by CRISPR/Cas9 genome editing, generating two independent homozygous knockout lines (CR-SMS6Z1 and CR-SMS6Z2) (Supplementary Fig. 5a). These gene-edited plants produced larger seeds, with increased seed length, width, and thickness, and higher seed yield per plant than Zp661 (Fig. 2c and Supplementary Fig. 5b–d). The knockout lines did not differ significantly from Zp661 in other major agronomic traits (Supplementary Fig. 5e–h). These results strongly implicate GmSMS6 as a negative regulator of seed weight.

To further validate the effect of GmSMS6 on soybean morphology, three independent homozygous knockout lines (CR1-SMS6, CR2-SMS6, and CR3-SMS6) were generated in the cultivar ‘Jack’ background (Supplementary Fig. 6a). Seed weight, length, width, and thickness all increased significantly in these knockout lines, without changes in plant architecture or other traits (Fig. 2d, e and Supplementary Fig. 6b–g). Moreover, the CR-SMS6 lines had significantly higher seed yield per plant than Jack (Supplementary Fig. 6h).

These phenotypes are opposite to those of sms6, suggesting that the sms6 mutation is not a loss-of-function allele. To examine this further, the coding sequences (CDSs) of GmSMS6G (Zp661 allele) and GmSMS6A (sms6 allele) were each overexpressed in Jack under the control of the 35S promoter, generating three independent transgenic overexpression lines per allele (SMS6G-OE1, SMS6G-OE2, SMS6G-OE3, SMS6A-OE1, SMS6A-OE2, and SMS6A-OE3). Reverse transcription–quantitative PCR (RT–qPCR) assays showed that GmSMS6G and GmSMS6A transcript levels were significantly elevated in these lines (Supplementary Fig. 7a). In contrast to the knockout lines, the overexpression lines produced smaller, lighter seeds than Jack (Supplementary Fig. 7b–h), largely phenocopying sms6 and NIL-sms6. Moreover, SMS6A-OE lines showed greater reductions in seed weight, together with smaller seed length, width, and thickness, than SMS6G-OE lines (Supplementary Fig. 7b–h). The overexpression lines also had significantly lower seed yield per plant than Jack (Supplementary Fig. 7i) but did not differ from Jack in other agronomic traits (Supplementary Fig. 7j–m). Overall, these results indicate that GmSMS6 negatively regulates seed weight and size in soybean and that GmSMS6A may exert a stronger repressive effect than GmSMS6G.

This conclusion was further substantiated using independent Jack overexpression lines for GmSMS6G and GmSMS6A, each fused to a Flag epitope tag. Two independent lines per allele (OE1-SMS6G, OE2-SMS6G, OE1-SMS6A, and OE2-SMS6A) with comparable transcript and protein levels were selected for agronomic trait analysis (Supplementary Fig. 8a–c). All four lines showed specific and significant reductions in seed length, width, thickness, seed weight, and seed yield per plant (Fig. 2f, g and Supplementary Fig. 8d–j). Moreover, seed weight, seed length, width, and thickness, and seed yield per plant were all lower in OE-SMS6A than in OE-SMS6G plants (Fig. 2f, g and Supplementary Fig. 8d, e, j), indicating that GmSMS6A has a stronger negative effect on seed weight than GmSMS6G. Taken together, our findings demonstrate that GmSMS6 negatively regulates seed weight and size, and that the sms6 allele GmSMS6A is a function-enhancing mutation that represses seed size more strongly than the GmSMS6G allele.

GmSMS6 regulates seed size by repressing cellular expansion

To determine how GmSMS6 regulates seed weight and size, we first performed cellular analyses of sms6-related soybean lines. Scanning electron microscopy (SEM) revealed that mature cotyledons of the sms6 mutant had significantly smaller cells and slightly fewer cells than those of Zp661 (Supplementary Fig. 9a–d), suggesting that the small seed size of sms6 results primarily from reduced cell expansion. This interpretation was supported by NIL seeds: cell size in mature cotyledons was reduced in NIL-sms6 relative to NIL-SMS6, with no significant difference in cell number (Supplementary Fig. 9e–h). Cellular variation was then evaluated in the GmSMS6-modified transgenic lines in the Zp661 and Jack backgrounds. Mature seeds of CR-SMS6Z1 had a larger cell area than Zp661 but a similar cell number (Supplementary Fig. 9i–l). Likewise, mature seeds of CR1-SMS6 had a larger cell area than Jack but a comparable cell number (Fig. 2h, i). In contrast, the OE1-SMS6G and OE1-SMS6A lines in the Jack background had smaller cells, with a more pronounced decrease in OE1-SMS6A (Fig. 2h, i). Collectively, these results suggest that GmSMS6 regulates seed size primarily by repressing cell expansion, and that the GmSMS6A allele represses cell expansion more strongly than the GmSMS6G allele.
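The inference that seed size changes reflect cell expansion rather than cell proliferation rests on treating cotyledon area as cell number times mean cell area. Under that simplifying assumption, the log-ratio of cotyledon area splits additively into a cell-number term and a cell-size term, as in this sketch; the function name and input values are hypothetical.

```python
import math


def partition_area_change(cell_area_ref, cell_num_ref, cell_area_test, cell_num_test):
    """Split a change in organ area into cell-size and cell-number components.

    Assumes organ area = cell number x mean cell area, so that
    log(A_test / A_ref) = log(n_test / n_ref) + log(a_test / a_ref).
    Returns (log-ratio of organ area, fraction of that change due to cell size).
    """
    size_term = math.log(cell_area_test / cell_area_ref)
    number_term = math.log(cell_num_test / cell_num_ref)
    total = size_term + number_term
    # The share is undefined when organ area does not change at all.
    share_size = size_term / total if total != 0 else float("nan")
    return total, share_size
```

For example, an unchanged cell number combined with a 21% larger mean cell area attributes the whole area gain to cell size (share = 1.0), which is the pattern described for the knockout lines.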

The two isoforms of the 14-3-3 protein GmSMS6 interact with GmbZIP151 and GmUBQ1

GmSMS6 was predicted to encode a 14-3-3 protein (Supplementary Fig. 10a). RT–qPCR analysis indicated that GmSMS6 is constitutively expressed in various tissues and organs (Supplementary Fig. 10b). Its expression levels were comparable between the two NILs, indicating that the R49H substitution does not affect gene expression (Supplementary Fig. 10c). Both isoforms localized to the nucleus and cytoplasm (Supplementary Fig. 10d), indicating that the R49H substitution does not alter the subcellular localization of GmSMS6. Because 14-3-3 proteins are known to form dimers39,40, we used luciferase complementation imaging (LCI) and bimolecular fluorescence complementation (BiFC) assays to test whether the R49H substitution affects dimerization; it did not (Supplementary Fig. 10e, f). Protein structure prediction likewise revealed no differences in spatial structure between the two isoforms (Supplementary Fig. 10g). However, alignment of GmSMS6 homologs from various plants showed that residue 49 is a conserved arginine in 14-3-3 proteins of several species (Supplementary Fig. 11), implying that this residue is likely important for the biochemical and biological functions of 14-3-3 proteins.

To further assess how the R49H substitution affects GmSMS6-mediated regulation of seed weight, we performed yeast two-hybrid (Y2H) screens of a soybean cDNA library to identify GmSMS6-interacting proteins. Two proteins, GmbZIP151 and GmUBQ1, were identified. GmbZIP151 encodes an ortholog of Arabidopsis ABI5 (Supplementary Fig. 12a, b), a key TF in ABA signal transduction during seed germination and seedling growth50,51 that also negatively regulates seed weight38. To validate this protein–protein interaction (PPI), we first showed in Y2H assays that GmbZIP151 interacted comparably with GmSMS6G and GmSMS6A (Fig. 3a). LCI assays in N. benthamiana leaves further demonstrated that co-expression of GmbZIP151-nLUC with cLUC-GmSMS6G or cLUC-GmSMS6A produced strong luciferase (LUC) activity, whereas no such activity was observed in the controls (Supplementary Fig. 12c). Co-immunoprecipitation (Co-IP) assays confirmed that GmSMS6G-Flag and GmSMS6A-Flag each interacted with GmbZIP151-GFP (Fig. 3b). In BiFC assays, yellow fluorescent protein (YFP) signals were observed when GmbZIP151-cYFP was co-expressed with GmSMS6G-nYFP or GmSMS6A-nYFP (Fig. 3c), and pull-down assays further confirmed the interaction (Fig. 3d). Together, these results indicate that GmSMS6 physically interacts with GmbZIP151.

Fig. 3: GmSMS6 physically interacts with GmbZIP151 and GmUBQ1.

a Yeast two-hybrid (Y2H) assays showing interactions between GmSMS6 (GmSMS6G and GmSMS6A) and GmbZIP151. The pBD-53 + pAD-T combination was used as a positive control, while the empty vector (pBD and pAD) combinations and the pBD-CR1-SMS6 + pAD-GmbZIP151 combination served as negative controls. SD/-T-L, synthetic defined (SD) medium lacking Trp and Leu; SD/-T-L-H-A, SD medium lacking Trp, Leu, His, and adenine; pGBKT7, pBD; pGADT7, pAD. b Co-immunoprecipitation (Co-IP) analysis of interactions between GmSMS6 (GmSMS6G and GmSMS6A) and GmbZIP151. The input and co-immunoprecipitated proteins were detected with anti-Flag and anti-GFP antibodies, respectively. kDa, kilodalton; GFP, green fluorescent protein. Three independent experiments yielded similar results. c Bimolecular fluorescence complementation (BiFC) assays evaluating interactions of GmSMS6G and GmSMS6A with GmbZIP151. YFP, yellow fluorescent protein; nYFP, N-terminal fragment of YFP; cYFP, C-terminal fragment of YFP. Bars = 50 µm. d Pull-down assays confirming interactions of GmSMS6G and GmSMS6A with GmbZIP151. MBP, maltose-binding protein; GST, glutathione S-transferase. The input and pulled-down proteins were detected with anti-MBP or anti-GST antibodies, as indicated. Three independent experiments yielded similar results. e Y2H assays showing interactions between GmSMS6 and GmUBQ1. The pBD-53 + pAD-T combination was used as a positive control, while the empty vector (pBD and pAD) combinations and the pBD-CR1-SMS6 + pAD-GmUBQ1 combination served as negative controls. f Interaction strength of GmSMS6G and GmSMS6A with GmUBQ1 based on β-galactosidase activity assays in yeast. The empty vector (pBD and pAD) combinations served as negative controls. Data are shown as means ± SD (n = 3, 3, 3, 6, and 6 biological replicates, respectively). Different lowercase letters indicate statistically significant differences (p < 0.05) determined by two-sided one-way ANOVA tests with Duncan’s multiple range tests. g Co-IP analysis of interactions between GmSMS6 (GmSMS6G and GmSMS6A) and GmUBQ1. Immunoprecipitation was performed using anti-MYC magnetic beads. The input and co-immunoprecipitated proteins were detected with anti-Flag and anti-MYC antibodies. Three independent experiments yielded similar results. Source data are provided as a Source Data file.

The second interacting protein, GmUBQ1, was predicted to be a C3HC4 RING-type E3 ubiquitin ligase (Supplementary Fig. 13a, b). Y2H and LCI assays showed that GmSMS6 interacted with GmUBQ1 and that GmSMS6A interacted more strongly with GmUBQ1 than GmSMS6G did (Fig. 3e, f and Supplementary Fig. 13c, d). The in vivo and in vitro interactions of GmSMS6G and GmSMS6A with GmUBQ1 were further confirmed by Co-IP, BiFC, and pull-down assays (Fig. 3g and Supplementary Fig. 13e, f). Together, these results indicate that GmSMS6 physically interacts with GmUBQ1.
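Fig. 3f ranks interaction strength by β-galactosidase activity in yeast. The assay conditions are not described in this section; the sketch below assumes a liquid ONPG assay scored with the widely used Miller-type formula for yeast, units = 1000 × OD420 / (t × V × OD600), where t is the reaction time in minutes and V is the assayed culture volume in mL. The function names and inputs are hypothetical.

```python
from statistics import mean, stdev


def beta_gal_units(od420: float, od600: float, minutes: float, volume_ml: float) -> float:
    """Beta-galactosidase units for a liquid ONPG assay on yeast cells.

    units = 1000 * OD420 / (t * V * OD600), where t is the reaction time (min)
    and V is the culture volume assayed (mL, times any concentration factor).
    """
    if min(od600, minutes, volume_ml) <= 0:
        raise ValueError("OD600, reaction time and volume must be positive")
    return 1000.0 * od420 / (minutes * volume_ml * od600)


def summarize(replicates):
    """Mean and sample SD of units; each replicate is (od420, od600, minutes, volume_ml)."""
    values = [beta_gal_units(*r) for r in replicates]
    return mean(values), stdev(values)
```

Normalizing by OD600 and reaction time makes readings from cultures of different density comparable, which is what allows GmSMS6A and GmSMS6G combinations to be ranked on one scale.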

GmUBQ1 mediates the ubiquitination and degradation of GmbZIP151, not GmSMS6

To test the ubiquitin ligase activity of GmUBQ1, a self-ubiquitination assay was conducted. His-GmUBQ1 was autoubiquitinated when incubated with ubiquitin, the ubiquitin-activating enzyme (E1), and the ubiquitin-conjugating enzyme (E2), whereas no ubiquitination smear was detected in the absence of E1, E2, or GmUBQ1 (Fig. 4a). Thus, GmUBQ1 is a functional E3 ubiquitin ligase. We therefore hypothesized that GmUBQ1 might ubiquitinate GmSMS6. However, in an in vitro ubiquitination assay, no ubiquitination smear was detected with anti-GST antibodies for either recombinant GST-GmSMS6G or GST-GmSMS6A in the presence of ubiquitin, E1, E2, and GmUBQ1 (Supplementary Fig. 14a), indicating that GmSMS6 is not ubiquitinated by GmUBQ1.

Fig. 4: GmSMS6 coordinates the ubiquitination and degradation of GmbZIP151 that is mediated by GmUBQ1.

a E3 ligase activity of GmUBQ1. Ubiquitinated proteins were detected using both anti-His and anti-Ub antibodies. b Interaction between GmUBQ1 and GmbZIP151 shown by co-immunoprecipitation (Co-IP) assays using anti-GFP beads. kDa, kilodalton. c In vitro ubiquitination of GmbZIP151 mediated by GmUBQ1. Ubiquitinated proteins were detected using both anti-MBP and anti-Ub antibodies. d In vivo ubiquitination of GmbZIP151 detected by Co-IP assays. GmbZIP151-GFP and MYC-GmUBQ1 were co-expressed in N. benthamiana leaves. Ubiquitinated proteins were detected using both anti-GFP and anti-Ub antibodies. e GmSMS6 represses GmUBQ1-mediated ubiquitination of GmbZIP151. Ubiquitinated proteins were detected using anti-MBP, anti-GST, anti-Ub, and anti-His antibodies. The experiments in a–e were repeated three times independently with similar results. f–k GmSMS6 represses GmUBQ1-mediated ubiquitination and degradation of GmbZIP151 in a cell-free assay. f, g GmbZIP151 degradation measured and quantified by immunoblotting with anti-MBP antibody at the indicated time points (min). Equal amounts of MBP-GmbZIP151 were incubated with equal amounts of total protein extracted from WT ‘Jack’ (1), CR1-SMS6 (2), OE1-SMS6G (3), and OE1-SMS6A (4) soybean plants. 5, extract from (1) supplemented with MG132. 6, equal amounts of MBP-GmbZIP151 loaded without total protein extract, indicating the total amount of MBP-GmbZIP151 loaded in each lane. h, i Actin levels measured and quantified by immunoblotting with anti-actin antibodies at the indicated time points; actin was used as the loading control. j, k GmSMS6G and GmSMS6A levels in the overexpression lines OE1-SMS6G (3) and OE1-SMS6A (4), measured and quantified by immunoblotting with anti-Flag antibodies at the indicated time points. Data are shown as means ± SD (n = 3 biological replicates). Different lowercase letters in g indicate statistically significant differences (p < 0.05) determined by two-sided one-way ANOVA tests with Duncan’s multiple range tests. Source data are provided as a Source Data file.

We next evaluated whether GmUBQ1 ubiquitinates GmbZIP151. Co-IP, Y2H, LCI, pull-down, and BiFC assays first revealed that GmUBQ1 interacts with GmbZIP151 both in vivo and in vitro (Fig. 4b and Supplementary Fig. 14b–e), particularly in the nucleus (Supplementary Fig. 14e). In in vitro ubiquitination assays, a polyubiquitination smear of GmbZIP151 was detected with anti-MBP antibodies when GmbZIP151 was incubated with ubiquitin, E1, E2, and GmUBQ1 (Fig. 4c), indicating that GmUBQ1 ubiquitinates GmbZIP151 in vitro. Flag-tagged ubiquitin (Flag-Ub) and GmbZIP151-GFP were then co-expressed in N. benthamiana leaves by Agrobacterium-mediated transient expression; after immunoprecipitation with anti-GFP beads, a polyubiquitinated GmbZIP151-GFP smear was detected with anti-Flag antibodies (Supplementary Fig. 15a), showing that GmbZIP151 is ubiquitinated in planta. MYC-GmUBQ1 and GmbZIP151-GFP were also co-expressed in N. benthamiana leaves, followed by Co-IP with anti-GFP beads and immunoblotting with anti-ubiquitin (anti-Ub) antibodies. The polyubiquitination smear was markedly enhanced in the presence of GmUBQ1 (Fig. 4d). These results confirm that GmbZIP151 is ubiquitinated by GmUBQ1.

Moreover, in Arabidopsis protoplasts, the LUC activity of GmbZIP151-LUC was markedly lower in the presence of GmUBQ1 than with an empty vector control (Supplementary Fig. 15b). Additional LUC activity assays in N. benthamiana leaves showed that co-expression of GFP-GmUBQ1 with GmbZIP151-LUC sharply decreased LUC activity, whereas LUC activity was strong in leaf areas where GmbZIP151-LUC was co-expressed with the GFP empty-vector control (Supplementary Fig. 15c, d). In addition, when GmbZIP151-LUC was co-expressed with GFP-GmUBQ1, LUC activity was higher in the presence of the proteasome inhibitor MG132 than in its absence (Supplementary Fig. 15c, d). Thus, GmUBQ1 likely mediates the degradation of GmbZIP151 via the ubiquitin–26S proteasome system.
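Relative LUC activity in such transient assays is expressed against a control. The sketch below assumes a dual-luciferase design in which firefly LUC is normalized to a co-expressed Renilla luciferase (REN) internal control and then to the mean of the empty-vector control; this section does not state which normalization was used, and the function name and readings are hypothetical.

```python
from statistics import mean


def relative_luc(luc, ren, ctrl_luc, ctrl_ren):
    """Per-replicate LUC/REN ratios divided by the mean LUC/REN ratio of the control.

    `luc[i]` and `ren[i]` are firefly and Renilla readings from the same sample.
    A value of 1.0 means activity equal to the control mean.
    """
    if len(luc) != len(ren) or len(ctrl_luc) != len(ctrl_ren):
        raise ValueError("LUC and REN readings must be paired")
    ctrl_ratio = mean(f / r for f, r in zip(ctrl_luc, ctrl_ren))
    return [(f / r) / ctrl_ratio for f, r in zip(luc, ren)]
```

Under this scheme, GmbZIP151-LUC co-expressed with GmUBQ1 would score below 1.0 relative to the empty-vector control if the fusion protein is being degraded, and MG132 treatment would be expected to move the value back toward 1.0.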

GmSMS6 weakens the GmUBQ1-mediated degradation of GmbZIP151

We next investigated how GmSMS6 affects GmUBQ1-mediated ubiquitination of GmbZIP151 and whether the R49H substitution influences GmbZIP151 stability. To this end, equal amounts of recombinant GST-GmSMS6G or GST-GmSMS6A were co-incubated with MBP-GmbZIP151, E1, E2, and GmUBQ1. Ubiquitinated GmbZIP151 was detected with or without GmSMS6, but the extent of ubiquitination was reduced in the presence of GmSMS6, and more so with GmSMS6A than with GmSMS6G (Fig. 4e). These results suggest that GmSMS6 inhibits GmUBQ1-mediated ubiquitination of GmbZIP151, with GmSMS6A exerting a greater inhibitory effect than GmSMS6G.

The effect of GmSMS6 on GmbZIP151 stability was then examined in cell-free degradation assays. Total protein extracts from WT (Jack), CR1-SMS6, OE1-SMS6G, and OE1-SMS6A seedlings were separately incubated with purified MBP-GmbZIP151. Compared with WT extracts, MBP-GmbZIP151 was more stable in OE-SMS6 extracts but degraded more rapidly in CR1-SMS6 extracts (Fig. 4f, g). Degradation of MBP-GmbZIP151 was slower in OE1-SMS6A extracts than in OE1-SMS6G extracts (Fig. 4f, g), and addition of the proteasome inhibitor MG132 reduced the degradation rate of MBP-GmbZIP151 (Fig. 4f, g). In contrast, no significant degradation was detected for actin or GmSMS6-Flag in the extracts, or for MBP-GmbZIP151 incubated without protein extracts (Fig. 4h–k). Taken together, these results indicate that GmSMS6 enhances the stability and abundance of GmbZIP151 by inhibiting its GmUBQ1-mediated ubiquitination and degradation, and that GmSMS6A protects GmbZIP151 more strongly than GmSMS6G. Consistently, co-expression of GmbZIP151-LUC and GmUBQ1 with GmSMS6 resulted in higher LUC activity than in leaf tissue without GmSMS6, and co-expression with GmSMS6A yielded higher LUC activity than with GmSMS6G (Supplementary Fig. 15c, d).
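The degradation curves in Fig. 4f, g can be summarized by a single rate constant per extract. Assuming first-order decay and band intensities normalized to the 0-min time point, a least-squares fit of ln(I_t/I_0) = −k·t through the origin gives k and the half-life ln 2/k. The section does not state how the curves were quantified, so this model, the function name, and any input values are assumptions.

```python
import math


def first_order_decay(times, intensities):
    """Least-squares decay constant k and half-life under a first-order model.

    Fits ln(I_t / I_0) = -k * t through the origin; `times[0]` must be 0 and
    all intensities must be positive. k is in 1/(unit of `times`).
    """
    if len(times) != len(intensities) or len(times) < 2 or times[0] != 0:
        raise ValueError("need paired data starting at t = 0")
    i0 = intensities[0]
    ys = [math.log(i / i0) for i in intensities]
    # Slope of a zero-intercept regression: sum(t*y) / sum(t*t)
    k = -sum(t * y for t, y in zip(times, ys)) / sum(t * t for t in times)
    half_life = math.log(2) / k if k > 0 else math.inf
    return k, half_life
```

Slower degradation in OE1-SMS6A than in OE1-SMS6G extracts would appear as a smaller k (longer half-life), and a stable control such as actin would give k close to 0.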

GmSMS6 promotes the transcriptional activation activity of GmbZIP151

To identify seed weight regulatory pathways associated with GmSMS6, RNA-seq analysis was performed on seeds of Jack, CR1-SMS6, OE1-SMS6G, NIL-SMS6, and NIL-sms6 plants. A total of 1181 differentially expressed genes (DEGs) (676 upregulated/505 downregulated) were identified in the CR1-SMS6/Jack comparison, 2228 (726/1502) in the OE1-SMS6G/Jack comparison, and 1421 (555/866) in the NIL-sms6/NIL-SMS6 comparison in the Zp661 background (Fig. 5a, b). In total, 139 DEGs were shared by all three comparisons and considered candidate target genes associated with GmSMS6 (Fig. 5b and Supplementary Data 1). Most (112) of these DEGs changed in opposite directions in the knockout comparison relative to the other two comparisons, following one of two patterns: (1) upregulated in the CR1-SMS6/Jack comparison but downregulated in the OE1-SMS6G/Jack and NIL-sms6/NIL-SMS6 comparisons; or (2) downregulated in the CR1-SMS6/Jack comparison but upregulated in the OE1-SMS6G/Jack and NIL-sms6/NIL-SMS6 comparisons (Supplementary Fig. 16a). Expression changes of selected DEGs were validated by RT–qPCR (Supplementary Fig. 16b). These changes were more pronounced in OE-SMS6A than in OE-SMS6G lines, consistent with a stronger effect of GmSMS6A on soybean seed weight (Supplementary Fig. 16b). These DEGs included genes encoding UDP-glucosyltransferase, cytochrome P450, O-fucosyltransferase, glycosyl hydrolase, and polygalacturonase, all of which have been reported to regulate seed weight and size (Supplementary Fig. 16a, Supplementary Data 1 and 2). In particular, GmST05, a homolog of Arabidopsis MOTHER OF FT AND TFL1 (MFT), positively regulates seed weight and oil content in soybean22,25. Consistently, GmST05 expression was significantly increased in CR1-SMS6 plants and decreased in OE1-SMS6 plants (Fig. 5c and Supplementary Fig. 16a). By contrast, GmbZIP151 and GmUBQ1 expression did not differ in these comparisons (Supplementary Fig. 16b).
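The intersection and direction-of-change filter used to obtain the 139 shared DEGs and their two opposite patterns can be expressed with set operations. In this sketch, each comparison is a dictionary from gene ID to log2 fold change for the DEGs called in that comparison; the function name, gene IDs, and fold changes are hypothetical placeholders, and the significance thresholds used in the study are not given here.

```python
def shared_and_opposite(cr, oe, nil):
    """Find DEGs common to three comparisons and classify their directions.

    cr, oe, nil map gene ID -> log2 fold change for CR1-SMS6/Jack,
    OE1-SMS6G/Jack and NIL-sms6/NIL-SMS6, respectively.
    Pattern 1: up in CR1-SMS6/Jack, down in the other two comparisons.
    Pattern 2: down in CR1-SMS6/Jack, up in the other two comparisons.
    """
    shared = set(cr) & set(oe) & set(nil)
    pattern1 = {g for g in shared if cr[g] > 0 and oe[g] < 0 and nil[g] < 0}
    pattern2 = {g for g in shared if cr[g] < 0 and oe[g] > 0 and nil[g] > 0}
    return shared, pattern1, pattern2
```

Shared genes that fit neither pattern correspond to the remaining shared DEGs (139 − 112 = 27 in the study).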

Fig. 5: GmSMS6 affects the regulation of ABRE-containing genes via GmbZIP151.

a Volcano plot showing differentially expressed genes (DEGs) in the CR1-SMS6/Jack, OE1-SMS6G/Jack, and NIL-sms6/NIL-SMS6 comparisons. Total RNA from developing seeds of the indicated genotypes at 20 days after fertilization (DAF) was subjected to RNA sequencing to identify DEGs. b Venn diagram showing the total and shared numbers of DEGs in the CR1-SMS6/Jack, OE1-SMS6G/Jack, and NIL-sms6/NIL-SMS6 comparisons. c Relative expression levels of GmST05 in developing seeds at 20 DAF in Jack, CR1-SMS6, OE1-SMS6G, and OE1-SMS6A lines. Data are shown as means ± SD (n = 3 biological replicates). d GmSMS6 enhances transcriptional repression of GmST05 by GmbZIP151. Schematic representations of the effector and reporter constructs are shown. Relative luciferase (LUC) activity was quantified in protoplasts of N. benthamiana leaves. Data are shown as means ± SD (n = 3 biological replicates). Different lowercase letters in c and d indicate statistically significant differences (p < 0.05) determined by two-sided one-way ANOVA tests with Duncan’s multiple range tests. e Gene ontology (GO) enrichment analysis of the 139 DEGs common to the CR1-SMS6/Jack, OE1-SMS6G/Jack, and NIL-sms6/NIL-SMS6 comparisons. BP, biological process; CC, cellular component; MF, molecular function. Statistical significance for e was calculated using a two-sided hypergeometric test with Benjamini–Hochberg correction for multiple comparisons. Source data are provided as a Source Data file.
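GO enrichment of the 139 shared DEGs combines a hypergeometric test per GO term with Benjamini–Hochberg (BH) correction across terms. The sketch below computes the one-sided over-representation (upper-tail) hypergeometric p-value, the form most often used for enrichment; the caption reports a two-sided test, which this sketch does not reproduce. Function names and counts are hypothetical.

```python
from math import comb


def hypergeom_upper_tail(k: int, M: int, n: int, N: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(M, n, N).

    M: annotated background genes; n: background genes in the GO term;
    N: genes tested (e.g. the shared DEG set); k: tested genes in the term.
    """
    upper = min(n, N)
    # math.comb returns 0 when the lower argument exceeds the upper one.
    hits = sum(comb(n, i) * comb(M - n, N - i) for i in range(max(k, 0), upper + 1))
    return hits / comb(M, N)


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values (FDR), returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone adjusted values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

In practice, one p-value would be computed per GO term and the whole list passed to `benjamini_hochberg` before applying a significance cutoff.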

As a 14-3-3 protein, GmSMS6 is unlikely to regulate GmST05 expression directly. We therefore hypothesized that GmST05 is targeted by GmbZIP151, a GmSMS6-interacting protein, and several observations are consistent with this hypothesis. First, ABA-responsive elements (ABREs), the potential binding sites of bZIP TFs51, were present in the putative promoters of most identified DEGs, including GmST05 (Supplementary Fig. 17). Second, GmbZIP151 enhanced LUC activity compared with the control, implicating it as a transcriptional regulator. This transcriptional activation activity was dramatically enhanced by both GmSMS6 isoforms, with no significant difference between them (Supplementary Fig. 18a). To investigate whether the GmSMS6–GmbZIP151 interaction contributes to this enhancement, we constructed a pAD-GmbZIP151m vector lacking the 14-3-3 binding motif (RRNMSCPL) at the C-terminal end of GmbZIP151. This motif corresponds to a 14-3-3 binding motif previously characterized in the rice OsGF14f–OsbZIP23 interaction52. Y2H assays showed that GmbZIP151m failed to interact with GmSMS6 (Supplementary Fig. 18b). Deleting this motif did not affect the intrinsic transcriptional activation activity of GmbZIP151 (Supplementary Fig. 18a), but disrupting the GmSMS6–GmbZIP151 interaction abolished the enhancement of GmbZIP151 activity by GmSMS6 (Supplementary Fig. 18a), indicating that GmSMS6’s promotion of GmbZIP151 activity depends on their physical interaction. Furthermore, co-expression with GmSMS6A and GmUBQ1 resulted in higher transcriptional regulatory activity of GmbZIP151 than co-expression with GmSMS6G and GmUBQ1 (Supplementary Fig. 18a). This result agrees with the LUC activity assays (Supplementary Fig. 15c, d), which showed greater GmbZIP151 accumulation when co-expressed with GmSMS6A and GmUBQ1 than with GmSMS6G and GmUBQ1. Therefore, the stronger interaction between GmSMS6A and GmUBQ1 likely leads to greater stability and accumulation of GmbZIP151, thereby enhancing its transcriptional regulatory activity.

Moreover, yeast one-hybrid (Y1H) analysis confirmed that GmbZIP151 binds the GmST05 promoter (Supplementary Fig. 18c). To further validate this result, an electrophoretic mobility shift assay (EMSA) was conducted. The probes (P1, P2, and P3) containing ABREs in the GmST05 promoter were bound by MBP-GmbZIP151 but not by MBP alone (Supplementary Fig. 18d, e). Consistently, chromatin immunoprecipitation quantitative PCR (ChIP-qPCR) analysis of transgenic hairy roots expressing GmbZIP151-GFP or GFP revealed significant enrichment of the ABRE-containing GmST05 promoter fragments S1, S2, and S3 (Supplementary Fig. 18d, f). These findings indicate that GmST05 is a direct target of GmbZIP151. Further, co-expression of GmbZIP151 repressed GmST05 promoter-driven LUC activity, and this repression was further enhanced by both GmSMS6 isoforms compared with the control (Fig. 5d). Consistent with the previous results (Supplementary Fig. 18a), co-expression of GmSMS6A and GmUBQ1 repressed GmST05 promoter-driven LUC activity more strongly than co-expression of GmSMS6G and GmUBQ1 (Fig. 5d). Collectively, these results indicate that GmbZIP151 represses GmST05 transcription and that GmSMS6 enhances this repression. Additionally, Gene Ontology (GO) analysis revealed that the 139 common DEGs were enriched in terms associated with seed quality, including lipid storage, seed oil body biogenesis, lipid storage body, protein storage vacuole, aleurone grain, nutrient reservoir activity, and linoleate 9S-lipoxygenase activity (Fig. 5e), implying a potential role for GmSMS6 in modulating seed quality. Consistent with these results, CR-SMS6 lines showed a 4.0% to 7.4% increase in protein content and a 2.3% to 3.5% decrease in oil content compared with Jack (Supplementary Fig. 19a, b).
Similarly, the protein content of CR-SMS6Z seeds increased by 9.4% to 12.4%, while the oil content decreased by 13.3% to 14.5% compared with Zp661 seeds (Supplementary Fig. 19c, d). Conversely, the protein content of NIL-sms6 decreased by 4.82%, while its oil content increased by 6.56% compared with NIL-SMS6 (Supplementary Fig. 19e, f). These observations suggest a role for GmSMS6 in coordinating seed weight and quality.

GmSMS6 functions together with GmbZIP151 and GmUBQ1 to control seed weight

Given that GmSMS6 physically interacts with GmbZIP151 and GmUBQ1, we investigated whether GmSMS6 acts together with GmbZIP151 and GmUBQ1 to control seed weight. To test this hypothesis, we generated GmbZIP151 knockout lines and GmbZIP151/GmSMS6 double mutants using CRISPR/Cas9 genome editing. Representative mutant lines carrying frameshift mutations were selected for further analysis (Supplementary Fig. 20a). Knockout of GmbZIP151 (CR-bZIP151) significantly increased seed weight, indicating that GmbZIP151 acts as a negative regulator of seed weight (Fig. 6a–e and Supplementary Fig. 20b–f). Furthermore, CR1-SMS6 exhibited a 15.01% increase in seed weight relative to Jack (Fig. 6e), whereas the CR-bZIP151/CR1-SMS6 double mutant showed only a 6.07% increase relative to CR-bZIP151 (Fig. 6e). This suggests that GmSMS6 and GmbZIP151 function additively in determining soybean seed weight.

Fig. 6: GmSMS6 acts together with GmbZIP151 and GmUBQ1 to regulate seed weight.

a Comparison of seed length among Jack, CR1-SMS6, CR-bZIP151, and CR-bZIP151/CR1-SMS6 lines. Bars = 10 mm. Comparison of seed length (b), seed width (c), seed thickness (d), and 100-seed weight (e) among Jack, CR1-SMS6, CR-bZIP151, and CR-bZIP151/CR1-SMS6 lines (n = 50 seeds for b–d; n = 5 plants for e). f Comparison of seed length among Jack, CR1-SMS6, CR-UBQ1, and CR-UBQ1/CR1-SMS6 lines. Bars = 10 mm. Comparison of seed length (g), seed width (h), seed thickness (i), and 100-seed weight (j) among Jack, CR1-SMS6, CR-UBQ1, and CR-UBQ1/CR1-SMS6 lines (n = 50 seeds for g–i; n = 5 plants for j). Data are shown as means ± SD. Different lowercase letters indicate significant differences (p < 0.05) determined by two-sided one-way ANOVA with Duncan’s multiple range tests. Source data are provided as a Source Data file.

Similarly, we generated and analyzed GmUBQ1 single mutant (CR-UBQ1) lines as well as GmUBQ1/GmSMS6 double mutant (CR-UBQ1/CR1-SMS6) lines (Supplementary Fig. 20g). The CR-UBQ1 lines exhibited reduced seed weight compared with Jack (Fig. 6f–j and Supplementary Fig. 20h–l), supporting a role for GmUBQ1 as a positive modulator of seed weight. Knockout of GmUBQ1 in the CR1-SMS6 background partially suppressed the increased seed weight of CR1-SMS6 (Fig. 6j), implying that GmSMS6 and GmUBQ1 act, at least in part, within a shared genetic pathway. Furthermore, GmST05 expression was significantly upregulated in CR-bZIP151 plants but downregulated in CR-UBQ1 plants (Supplementary Fig. 20m, n), further indicating that GmbZIP151 and its associated complexes regulate GmST05 expression. Taken together, we propose that GmSMS6 functions additively with GmUBQ1 and GmbZIP151 to modulate seed weight.

Natural variation and yield increasing potential of GmSMS6

To evaluate the evolution of GmSMS6 during soybean domestication, we investigated selection signals in the genomic region of GmSMS6 by analyzing nucleotide diversity (π) and the fixation index (FST) among 4414 resequenced soybean accessions53. Both π and FST analyses revealed selection signatures, with significantly reduced diversity in this region in improved cultivars and landraces compared with wild soybeans (Supplementary Fig. 21a), indicating that GmSMS6 may have undergone artificial selection during soybean domestication. This inference is further supported by analyses of a core soybean germplasm collection54. GmSMS6 sequences amplified from G. max (landraces and cultivars) showed reduced nucleotide diversity compared with those from wild soybean (G. soja) (Fig. 7a and Supplementary Table 1). Haplotype analysis revealed 28 haplotypes, with GmSMS6Hap1 and GmSMS6Hap2 being predominant (Supplementary Fig. 21b and Supplementary Table 2). Wild soybeans harbored 27 haplotypes, whereas G. max harbored only 4 (Fig. 7a, Supplementary Fig. 21b, and Supplementary Table 2). The frequency of GmSMS6Hap1 increased from 28.9% in wild soybeans to 73.7% in landraces and 93.1% in improved cultivars (Fig. 7a). GmSMS6Hap1 and GmSMS6Hap2 were distinguished primarily by variation in the 5′ untranslated region (5′UTR) (Supplementary Table 2), and no LoF SMS6 allele was identified. Haplotype network analysis revealed that GmSMS6Hap1 and GmSMS6Hap2 are closely related (Supplementary Fig. 21b). However, the 100-seed weight of GmSMS6Hap1-harboring accessions was significantly greater than that of GmSMS6Hap2-harboring accessions in G. max, whereas no difference was observed within G. soja (Supplementary Fig. 21c).
These results suggest that LoF mutations of SMS6 were not selected during soybean domestication, whereas GmSMS6Hap1, which is associated with heavy seeds, possibly because of expression variation, has expanded in G. max but is not fully fixed.
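The diversity statistics used above can be illustrated on aligned toy sequences. The Python sketch below computes Nei's nucleotide diversity (π, mean pairwise differences per site), Hudson's FST estimator (1 - Hw/Hb), and haplotype frequencies. The software and windowing used for the genome-wide scan of 4414 accessions are not specified here, so this illustrates the statistics rather than reproducing the analysis; the sequences are hypothetical, gap-free, and of equal length.

```python
from collections import Counter
from itertools import combinations, product
from typing import Dict, Sequence


def pairwise_diff(a: str, b: str) -> int:
    """Number of mismatched sites between two aligned sequences."""
    return sum(x != y for x, y in zip(a, b))


def nucleotide_diversity(seqs: Sequence[str]) -> float:
    """Nei's pi: mean pairwise differences per site within one group."""
    length = len(seqs[0])
    pairs = list(combinations(seqs, 2))
    return sum(pairwise_diff(a, b) for a, b in pairs) / (len(pairs) * length)


def hudson_fst(pop1: Sequence[str], pop2: Sequence[str]) -> float:
    """Hudson's FST = 1 - Hw/Hb (mean within- vs between-group diversity per site)."""
    length = len(pop1[0])
    hw = (nucleotide_diversity(pop1) + nucleotide_diversity(pop2)) / 2
    hb = sum(pairwise_diff(a, b) for a, b in product(pop1, pop2)) / (
        len(pop1) * len(pop2) * length
    )
    return 1 - hw / hb


def haplotype_frequencies(seqs: Sequence[str]) -> Dict[str, float]:
    """Relative frequency of each distinct sequence (haplotype) in a group."""
    counts = Counter(seqs)
    return {hap: c / len(seqs) for hap, c in counts.items()}


# Hypothetical aligned fragments for two groups.
wild = ["ACGT", "TCGA", "ACTA", "TGTA"]
cultivated = ["ACGT", "ACGT", "ACGT", "ACGA"]

print(f"pi(wild) = {nucleotide_diversity(wild):.3f}")
print(f"pi(cultivated) = {nucleotide_diversity(cultivated):.3f}")
print(f"FST = {hudson_fst(wild, cultivated):.3f}")
print(haplotype_frequencies(cultivated))
```

In this toy example, the cultivated group has lower π and fewer haplotypes than the wild group, which is the qualitative pattern described for GmSMS6.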

Fig. 7: Haplotype combinations associated with soybean seed weight variation.

a Haplotype diversity and distribution of GmSMS6 within the soybean core germplasm54, which includes 97 wild soybeans, 198 landraces, and 29 cultivars. The proportions of the 28 identified haplotypes (designated GmSMS6Hap1–GmSMS6Hap28) are shown in the pie chart for each group. b, c GmSMS6 knockout enhances soybean seed yield. b Seed yield per plant of the wild type ‘Jack’ and the CR1-SMS6, CR2-SMS6, and CR3-SMS6 lines. Bar = 10 mm. c 100-seed weight (n = 19, 19, 22, and 20 plants for Jack, CR1-SMS6, CR2-SMS6, and CR3-SMS6 lines, respectively), seed weight per plant (n = 16, 19, 19, and 19 individuals for Jack, CR1-SMS6, CR2-SMS6, and CR3-SMS6 lines, respectively), and seed yield (n = 3) of Jack and the indicated CR-SMS6 lines. d Haplotype distribution of GmbZIP151 within the soybean core germplasm54. The proportions of the haplotypes are shown in the pie chart for each group. e Comparison of 100-seed weight between GmbZIP151C and GmbZIP151A in G. max. f Comparison of 100-seed weight among different haplotype combinations in G. max. In all box plots, the center line indicates the median, the box edges represent the first and third quartiles, and the whiskers extend to the smallest and largest data points within 1.5 interquartile ranges of the box edges. g Distribution of haplotype combinations in G. max and G. soja. Data are shown as means ± SD. Statistically significant differences were determined by two-sided Student’s t-tests. *p < 0.05; **p < 0.01; ***p < 0.001. Source data are provided as a Source Data file.

Comparison of GmSMS6 expression between G. max accessions with contrasting seed weight revealed that small-seeded accessions carrying GmSMS6Hap2 had higher GmSMS6 expression than heavy-seeded accessions carrying GmSMS6Hap1 at 25 days after fertilization (DAF) (Supplementary Fig. 22a, b). We speculated that variation in the 5′UTR may have led to functional changes in GmSMS6. To test this hypothesis, transient expression assays were conducted using a dual-luciferase reporter system. The 5′UTR of GmSMS6Hap1 (5′UTRHap1) produced lower LUC activity than 5′UTRHap2 (Supplementary Fig. 22c). Additionally, SMS6 expression showed heterochronic variation across floral and seed development, with higher expression in floral buds of G. soja than in those of G. max (Supplementary Fig. 22d). These results indicate that low GmSMS6 expression may contribute to soybean seed enlargement. Consistently, complete knockout of GmSMS6 significantly increased seed weight and boosted soybean yield potential in both investigated backgrounds (Fig. 2c, e, Supplementary Fig. 5 and Supplementary Fig. 6). To further demonstrate this potential for soybean breeding, Jack and CR-SMS6 lines were grown at 80,000 plants per hectare to evaluate seed yield. The 100-seed weight and yield per plant of the mutant lines were significantly higher than those of Jack, whereas plant height, branch number, pod number, and node number were unchanged (Fig. 7b, c and Supplementary Fig. 23). Plot yields of the CR-SMS6 lines were significantly greater than that of Jack, by 6.7%, 11.5%, and 12.3% (Fig. 7c). Thus, disrupting GmSMS6 represents a promising strategy for the genetic improvement of soybean yield.
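Dual-luciferase assays of this kind are commonly analyzed by normalizing the firefly luciferase (LUC) signal to a Renilla luciferase (REN) internal control and expressing the ratio relative to a reference construct. The Python sketch below illustrates that normalization. The use of REN as the internal control, the choice of reference construct, and all readings are assumptions for illustration, not values from Supplementary Fig. 22c.

```python
from statistics import mean, stdev
from typing import List, Sequence


def luc_ren_ratios(fluc: Sequence[float], rluc: Sequence[float]) -> List[float]:
    """Firefly (LUC) signal normalized to the Renilla (REN) internal control, per replicate."""
    return [f / r for f, r in zip(fluc, rluc)]


def relative_activity(ratios: Sequence[float], control_ratios: Sequence[float]) -> List[float]:
    """Express each LUC/REN ratio relative to the mean ratio of the reference construct."""
    ref = mean(control_ratios)
    return [x / ref for x in ratios]


# Hypothetical luminescence readings, three biological replicates per construct.
control = luc_ren_ratios([1000, 1100, 900], [500, 550, 450])
utr_hap1 = luc_ren_ratios([600, 690, 520], [500, 560, 440])
utr_hap2 = luc_ren_ratios([1200, 1300, 1150], [510, 540, 470])

for name, ratios in (("5'UTR-Hap1", utr_hap1), ("5'UTR-Hap2", utr_hap2)):
    rel = relative_activity(ratios, control)
    print(f"{name}: {mean(rel):.2f} +/- {stdev(rel):.2f} (mean +/- SD)")
```

Normalizing per replicate before averaging keeps transfection-efficiency differences between wells out of the comparison, which is the usual reason for including an internal control.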

GmSMS6, GmbZIP151, and GmUBQ1 haplotype combinations for soybean seed weight improvement

We further conducted haplotype analyses of GmbZIP151 and GmUBQ1 using SoyGVD, based on 4414 resequenced soybean accessions53. This analysis identified six major haplotypes for GmbZIP151 (Supplementary Fig. 24a). Among these, a unique SNP in the coding region (C to A), which causes a missense mutation, divided the haplotypes into two groups: GmbZIP151C and GmbZIP151A (Supplementary Fig. 24a, b). The frequency of GmbZIP151A increased from 54.7% in wild soybeans to 78.0% in landraces and 90.1% in cultivars (Supplementary Fig. 24b), suggesting that GmbZIP151A may have been selected during soybean domestication and modern breeding. A similar trend was observed in the core soybean germplasm collection, where the frequency of GmbZIP151A increased from 46.4% in wild soybeans to 89.4% in landraces and 89.3% in improved cultivars (Fig. 7d). Furthermore, GmbZIP151 expression did not differ significantly between G. max and G. soja (Supplementary Fig. 24c). In G. max, accessions carrying GmbZIP151C exhibited significantly higher 100-seed weight than those carrying GmbZIP151A (Fig. 7e and Supplementary Fig. 24d).

We performed a similar haplotype analysis for GmUBQ1 and identified eight haplotypes (Supplementary Fig. 25a). The frequency of GmUBQ1 Hap1 increased from 13.2% in wild soybeans to 78.8% in landraces and 79.3% in improved cultivars (Supplementary Fig. 25b), suggesting that GmUBQ1 Hap1 likely underwent selection during soybean domestication and breeding. GmUBQ1 expression did not differ significantly between G. max and G. soja (Supplementary Fig. 25c). The coding sequence of GmUBQ1 was highly conserved, with only one SNP (G/A) causing a missense mutation (Supplementary Fig. 25a). On this basis, we defined two major haplotypes, GmUBQ1G and GmUBQ1A, in the core soybean germplasm collection (Supplementary Fig. 25d). GmUBQ1G was completely fixed in cultivars (Supplementary Fig. 25d).

Because GmUBQ1G is fixed in cultivars, we focused the haplotype combination analysis on GmSMS6 and GmbZIP151 and found that soybean accessions carrying the GmSMS6Hap1/GmbZIP151C combination had the highest seed weight among all combinations (Fig. 7f, Supplementary Fig. 25e and Supplementary Table 3). Variance analysis revealed that GmSMS6 and GmbZIP151 explained 8.82% and 10.38% of the phenotypic variation in seed weight, respectively (Supplementary Table 4). Consistently, within both the GmSMS6Hap1 and GmSMS6Hap2 backgrounds in G. max, GmbZIP151C was associated with significantly greater seed weight than GmbZIP151A (Fig. 7f and Supplementary Table 3). These results demonstrate that specific haplotype combinations contribute to natural variation in soybean seed weight, with GmSMS6Hap1/GmbZIP151C representing a favorable allele combination for enhancing seed weight. However, this advantageous combination was present in only 9.3% of G. max accessions (Fig. 7g). These findings highlight the considerable breeding potential of the GmSMS6Hap1/GmbZIP151C combination for enhancing soybean seed weight and yield. To better understand its potential applications, future studies could examine these associations in a broader and more genetically diverse panel of soybean germplasm.
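One simple way to express how much seed-weight variation a locus explains is the between-haplotype sum of squares divided by the total sum of squares (η², as a percentage), and combination means can be tabulated in the same pass. The Python sketch below assumes this one-factor-at-a-time decomposition; the study's variance analysis may instead have used a multi-factor ANOVA model (with unbalanced data, sequential sums of squares depend on the order of terms). All seed weights and haplotype labels are hypothetical.

```python
from statistics import mean
from typing import Dict, Hashable, List, Sequence, Tuple


def percent_variance_explained(values: Sequence[float], groups: Sequence[Hashable]) -> float:
    """100 * SS_between / SS_total for a single grouping factor (eta squared)."""
    grand = mean(values)
    ss_total = sum((v - grand) ** 2 for v in values)
    by_group: Dict[Hashable, List[float]] = {}
    for v, g in zip(values, groups):
        by_group.setdefault(g, []).append(v)
    ss_between = sum(len(vs) * (mean(vs) - grand) ** 2 for vs in by_group.values())
    return 100 * ss_between / ss_total


def combination_means(
    values: Sequence[float], hap_a: Sequence[str], hap_b: Sequence[str]
) -> Dict[Tuple[str, str], float]:
    """Mean phenotype for each two-locus haplotype combination."""
    combos: Dict[Tuple[str, str], List[float]] = {}
    for v, a, b in zip(values, hap_a, hap_b):
        combos.setdefault((a, b), []).append(v)
    return {k: mean(vs) for k, vs in combos.items()}


# Hypothetical 100-seed weights (g) for eight accessions.
seed_weight = [22.1, 21.5, 18.9, 19.4, 20.2, 19.8, 17.6, 18.1]
sms6 = ["Hap1", "Hap1", "Hap1", "Hap1", "Hap2", "Hap2", "Hap2", "Hap2"]
bzip151 = ["C", "C", "A", "A", "C", "C", "A", "A"]

print(f"GmSMS6: {percent_variance_explained(seed_weight, sms6):.1f}%")
print(f"GmbZIP151: {percent_variance_explained(seed_weight, bzip151):.1f}%")
means = combination_means(seed_weight, sms6, bzip151)
print("Highest-mean combination:", max(means, key=means.get))
```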

Discussion

Identifying the mechanisms and key genes involved in seed yield and quality is critically important for guiding future plant breeding efforts and accelerating the development of improved crop varieties. Many genes controlling seed weight have been identified in soybean (see Introduction). However, our understanding of the genetic and molecular basis of soybean seed weight and size regulation remains limited, and an efficient strategy to increase soybean yield has not yet been identified. In this study, GmSMS6 was found to encode a 14-3-3 protein that negatively modulates seed weight and size, primarily by repressing cell expansion. A function-enhanced allele of this gene carrying a single amino acid substitution (R49H) was further investigated. Moreover, a GmSMS6-driven mechanism regulating seed traits was identified, providing insights into soybean yield potential and establishing a framework for high-yield, high-quality soybean breeding.

Roles for 14-3-3 genes in seed development have rarely been reported, although 14-3-3 proteins are highly conserved across eukaryotic organisms and play diverse roles in plant growth, development, and abiotic stress responses39,40,55. A few 14-3-3 proteins have been reported as potential negative modulators of seed size only in rice56,57,58,59, and the molecular mechanisms and specific functions of 14-3-3 proteins in seed weight modulation remain unclear. In this work, the causal mutation of the EMS-derived soybean sms6 mutant was located in a 14-3-3 gene, providing the first evidence linking a 14-3-3 gene to seed weight in soybean. Knockout of GmSMS6 increased seed weight and size without affecting other morphological traits, whereas overexpression of GmSMS6 resulted in small seeds. These results support the conclusion that GmSMS6 is a specific modulator of seed development. A single mutation in GmSMS6 (causing the R49H substitution) was sufficient to enhance its repressive role in seed weight without altering mRNA levels, thereby producing overexpression-like effects. These results suggest that the conserved R49 residue is key to the biochemical and biological functions of GmSMS6, although whether this holds for other 14-3-3 proteins remains unknown. Nevertheless, these results indicate that GmSMS6 encodes a 14-3-3 protein that is an indispensable component of the genetic repertoire regulating seed development in soybean.

In addition, we identified the molecular mechanism by which the 14-3-3 protein GmSMS6 modulates seed weight and size. Reciprocal physical interactions among GmSMS6, GmbZIP151, and GmUBQ1 may provide a mechanistic basis for this modulation (Fig. 8). GmbZIP151 encodes a homolog of Arabidopsis ABI5 that negatively regulates seed weight38, while GmUBQ1 encodes a RING-type E3 ligase. The ubiquitin-related pathway is known to play a crucial role in seed weight regulation14,32; however, few E3 ubiquitin ligases have been reported to specifically regulate seed weight in soybean. Increasing evidence suggests that ABI5 abundance is modulated by E3 ligase-mediated ubiquitination60,61,62,63,64, but how E3 ligases fine-tune optimal substrate levels through post-translational regulation remains poorly understood. In this study, GmUBQ1 was shown to specifically ubiquitinate GmbZIP151, thereby promoting its degradation via the 26S proteasome pathway. GmUBQ1 also interacts with GmSMS6 but does not ubiquitinate it. Thus, formation of the GmSMS6–GmUBQ1 complex may prevent GmUBQ1-mediated degradation of GmbZIP151, increasing the stability and abundance of GmbZIP151 (Fig. 8). Moreover, GmSMS6 enhanced the transcriptional activation activity of GmbZIP151, thereby regulating downstream genes (Fig. 8). DEGs identified by RNA-seq, including GmST05 and genes encoding UDP-glucosyl transferase, cytochrome P450, O-fucosyltransferase, glycosyl hydrolase, and polygalacturonase, were considered candidate targets of GmbZIP151 because they contain putative ABREs. Increased GmbZIP151 levels were also shown to cause greater repression of ABRE-containing genes, including GmST0522,25, thereby restricting cell size and reducing seed weight (Fig. 8). These results are consistent with the well-known functions of 14-3-3 proteins39,58,65,66.
In addition, genetic manipulation of GmSMS6 did not alter the transcript levels of GmbZIP151 or GmUBQ1. These results therefore suggest that GmSMS6 may bridge and fine-tune the balance between the ubiquitination and the transcriptional activation activity of a bZIP TF. Consistently, a single mutation in GmSMS6 (the R49H substitution) enhanced its biochemical and developmental roles sufficiently to phenocopy its overexpression without altering its mRNA levels. In both cases, GmUBQ1 is largely sequestered by the enhanced or overexpressed GmSMS6, leading to increased abundance and activity of GmbZIP151 and thereby repressing seed weight and size. Conversely, knockout (LoF) of GmSMS6 resulted in reduced abundance and activity of GmbZIP151, concomitant with significantly increased seed weight and size (Fig. 8). Therefore, the 14-3-3 protein GmSMS6 functions as a hub switch and cooperates with GmUBQ1 to fine-tune the ubiquitination and abundance of GmbZIP151, thereby modulating seed size and yield. This hypothesis was further supported by genetic analyses showing that GmSMS6 functions additively with GmUBQ1 or GmbZIP151 to regulate seed weight. Given the complexity of their functions, including diverse biochemical activities, interacting partners, and downstream targets13,14,39,40, it is unlikely that GmSMS6, GmbZIP151, and GmUBQ1 operate solely through a linear pathway or as a single unified complex. While our findings provide insights into this regulatory mechanism, further molecular and genetic studies are needed to fully elucidate this intricate network.

Fig. 8: A proposed working model of the mechanism by which GmSMS6 negatively modulates soybean seed weight.

a Molecular interactions in loss-of-function (LoF) or function-weakened mutants of GmSMS6 (SMS6). b Molecular interactions in the wild type. c Molecular interactions in overexpression lines or function-enhanced mutants of SMS6. Protein-protein interactions (PPIs) between SMS6 and GmbZIP151 (bZIP) promote the transcriptional activation activity of bZIP. The PPI between SMS6 and GmUBQ1 (UBQ1) prevents the PPI between UBQ1 and bZIP, thereby weakening the ubiquitination and degradation of bZIP. The PPI between UBQ1 and bZIP leads to ubiquitination of bZIP, mediating the degradation of this transcription factor. Thus, GmSMS6 cooperates with GmUBQ1 to balance and fine-tune the protein homeostasis and activity of GmbZIP151 (highlighted in the blue window). Solid lines between the three proteins indicate PPIs, and dashed lines indicate loss of PPIs in GmSMS6 knockouts. Line thickness indicates the strength of activity or interaction. ABRE-containing genes (ABREGs) may be regulated by functional GmbZIP151, thereby determining seed weight and size. For example, GmbZIP151 negatively regulates GmST05, a positive regulator of soybean seed weight22,25. Increased abundance and transcriptional activation activity of GmbZIP151 lead to reduced seed weight and size. The interaction between GmSMS6A and GmUBQ1 is stronger than that between GmSMS6G and GmUBQ1, thereby resembling GmSMS6 overexpression and resulting in high GmbZIP151 abundance and consequently reduced seed size. Knockdown or LoF of GmSMS6 enhances the degradation of GmbZIP151, leading to increased seed size. Consequently, the balance of functional bZIP and UBQ1, coordinated by SMS6 through its PPIs with both proteins, is essential for regulating seed weight variation. Blue and gray arrows indicate transcriptional activation activity and ubiquitination levels, respectively. The green arrow indicates activation of ABREGs, and the black T-bar indicates repression. Arrow thickness indicates the strength of activity or regulation.

14-3-3 ligand proteins exhibit distinct preferences for specific 14-3-3 isoforms39,40,67. However, the molecular basis of such isoform-specific interactions remains largely unclear. Site-directed mutagenesis studies have shown that mutations at certain residues of 14-3-3 proteins can influence interactions with their partners39,68,69,70. Although the overall structures of 14-3-3 isoforms are highly similar, subtle variations have been observed in the angle between the two subunits, the length of loop regions, and the length of α-helices, particularly helices H3 and H4 and the loop connecting them40,71,72. In this study, we found that the GmSMS6G-to-GmSMS6A mutation did not alter subcellular localization, homodimerization, or mRNA expression. However, GmSMS6A, which carries the R49H substitution relative to wild-type GmSMS6G, exhibited an enhanced interaction with GmUBQ1. This alteration may be attributable to the location of R49H within helix H3, a site that is highly conserved among 14-3-3 proteins. Because GmSMS6 encodes a repressor of seed weight and overexpression of GmSMS6A reduced seed weight more strongly than overexpression of GmSMS6G, the GmSMS6A allele carried by the sms6 mutant is a function-enhanced allele relative to GmSMS6G.

The influence of 14-3-3 genes on seed quality remains largely unexplored73,74. In this study, GmSMS6 was shown to modulate seed quality by coordinating oil and protein contents. RNA-seq analysis revealed that genes involved in lipid storage, seed oil body biogenesis, lipid storage body, protein storage vacuole, and linoleate 9S-lipoxygenase activity were affected by GmSMS6 manipulation. Further, gene-edited GmSMS6 soybean lines showed 4.0% to 12.4% higher protein content and 2.3% to 14.5% lower oil content. The trend in oil content is consistent with the effects of manipulating 14-3-3 genes in Arabidopsis73,74. However, the magnitudes and trends of protein and oil content variation differ from those observed in GmST05 genetically modified soybean plants22,25. Thus, more complex regulatory mechanisms likely govern seed quality in soybean and require further investigation. Nevertheless, GmSMS6 may coordinate seed quality with seed weight in soybean, potentially helping to address the low seed quality often associated with high-yield soybean varieties.

Genetic variation can cause developmental repatterning, which has played an essential role in the domestication of soybean seed traits1,2. Mutations arise randomly during evolution, but not all genes that significantly affect seed traits have been selected or fixed during crop domestication. Here, evaluation of natural variation and allele diversity in soybean germplasm revealed no LoF alleles of GmSMS6 but multiple haplotypes, of which two (GmSMS6Hap1 and GmSMS6Hap2) predominate. GmSMS6Hap1, associated with heavy seed weight, was significantly expanded during soybean domestication and improvement. GmSMS6Hap1 and GmSMS6Hap2 differ primarily in the 5′UTR, which might explain the negative correlation between GmSMS6 expression and seed size in G. max, as well as the heterochronic expression of GmSMS6 between G. max and G. soja during development. Further evaluation is needed to explore the mechanisms underlying these observations. Nevertheless, using a LoF allele of GmSMS6, or gene editing to disrupt or downregulate GmSMS6 function, may increase soybean seed weight and yield, as demonstrated in the Jack and Zp661 backgrounds. Moreover, because GmSMS6 is conserved in soybean, this strategy may be generalized to different genetic backgrounds. Additionally, we found that the GmbZIP151C haplotype, which was associated with higher seed weight, is relatively rare in cultivated varieties, whereas the GmUBQ1G haplotype has become completely fixed in cultivated soybean. The GmSMS6Hap1/GmbZIP151C combination represents a favorable allele combination for seed weight but appears not to have been efficiently utilized during soybean domestication. These findings highlight the breeding potential of the GmSMS6Hap1/GmbZIP151C combination for increasing soybean seed weight and yield.
Thus, genetic components that remain unidentified or only partially utilized during domestication and improvement await discovery and may have great potential for future breeding.

In summary, this study identified the 14-3-3 protein GmSMS6 as a key genetic component regulating seed weight and size in soybean and showed that the R49H mutation enhances these functions. GmSMS6 cooperates with GmUBQ1 to regulate the post-translational modification of GmbZIP151, thereby coordinating the activity, stability, and abundance of this TF. Furthermore, knockouts of GmSMS6 in multiple backgrounds stably enhanced soybean seed weight and yield without observable adverse effects, highlighting the value of exploiting gene loss. Genetic modification of GmSMS6 also modulates soybean seed quality, further strengthening the breeding value of this gene. In addition, no LoF allele of GmSMS6 was detected in available soybean resources. Thus, this study revealed a genetic regulatory mechanism driven by a 14-3-3 protein that has the potential to boost soybean yield and modulate seed quality. Specific haplotype combinations of GmSMS6, GmUBQ1, and GmbZIP151 may be useful for breeding soybean cultivars with increased seed yield and quality. Nevertheless, how GmbZIP151 and GmUBQ1 function with GmSMS6 to modulate seed development requires further investigation. Overall, this work provides insights into soybean domestication and improvement by revealing a post-translational mechanism underlying seed weight variation.

Methods

Plant materials and growth conditions

The sms6 mutant was isolated from a soybean EMS mutant library in the ‘Zhongpin661’ (Zp661) background41. Zp661 was crossed with sms6 to generate F1 plants, which were self-pollinated to produce F2 and F3 populations. A residual heterozygous line (RHL) from the F3 population that was heterozygous at the qSMS6 locus and homozygous in the Zp661 background was self-pollinated over successive generations to create a pair of NILs, referred to as NIL-SMS6 and NIL-sms6. For field experiments, materials were planted and evaluated in experimental fields at the Institute of Botany, Chinese Academy of Sciences (IBCAS, Beijing, China, 39.9°N, 116.3°E) during the growing seasons (May to October) of 2019–2021. Transgenic soybean plants were generated in the Zp661 and Jack backgrounds. For indoor experiments, soybean plants were grown in a growth chamber under a 16-h light/8-h dark cycle at 30/25 °C and a light intensity of 430–460 μmol m⁻² s⁻¹.

Evaluation of agronomic traits and field trials

Plant height, branch number, node number, pod number, seed length, seed width, seed thickness, and 100-seed weight were measured as described previously24,27. Cell size was measured by imaging the cotyledons of mature seeds with a cryo-scanning electron microscope (S-4800; Hitachi, Tokyo, Japan). Epidermal cells in the central part of the ventral surface were measured using ImageJ (National Institutes of Health, Bethesda, Maryland, USA). Protein and oil contents were measured using a near-infrared spectroscopy system (NIRS DS 2500F, FOSS, Denmark). The field trial was conducted under natural conditions in Beijing (39.9°N, 116.3°E)7,75,76,77. Jack, CR1-SMS6, CR2-SMS6, and CR3-SMS6 lines were planted at a density of 80,000 plants per hectare in three replicate plots. Each plot consisted of six 3-m rows spaced 50 cm apart. At maturity, at least 16 plants were randomly selected from each of the three plots to measure plant height, branch number, node number, and pod number. Threshed seeds were sun-dried for 3 days and weighed, and seed weight data were then converted to soybean yield per hectare7,75,76,77.
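The plot layout above fixes the harvested area, so converting plot seed weight to yield per hectare reduces to simple arithmetic. The Python sketch below assumes that each row occupies one row spacing in width (six rows × 3 m × 0.5 m = 9 m²) and that plot seed weight is recorded in grams. The exact conversion used in the cited studies may differ (for example, in moisture adjustment or border-row exclusion), and the example plot weights are hypothetical.

```python
def plot_area_m2(rows: int, row_length_m: float, row_spacing_m: float) -> float:
    """Harvested area, assuming each row occupies one row spacing in width."""
    return rows * row_length_m * row_spacing_m


def plants_per_plot(density_per_ha: float, area_m2: float) -> float:
    """Expected plant number at a given sowing density (1 ha = 10,000 m2)."""
    return density_per_ha * area_m2 / 10_000


def yield_kg_per_ha(plot_seed_weight_g: float, area_m2: float) -> float:
    """Scale plot seed weight (g) to kilograms per hectare."""
    return plot_seed_weight_g / 1000 / area_m2 * 10_000


def percent_change(treated: float, control: float) -> float:
    """Relative difference of treated vs control, in percent."""
    return 100 * (treated - control) / control


area = plot_area_m2(rows=6, row_length_m=3.0, row_spacing_m=0.5)
print(area, plants_per_plot(80_000, area))  # prints: 9.0 72.0
jack = yield_kg_per_ha(2700, area)  # hypothetical plot seed weights (g)
mutant = yield_kg_per_ha(3000, area)
print(f"{jack:.0f} vs {mutant:.0f} kg/ha ({percent_change(mutant, jack):+.1f}%)")
```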

Map-based cloning

For bulked segregant analysis (BSA), 30 plants with the wild-type phenotype and 30 with the mutant phenotype were selected from the F2 population derived from the cross between sms6 and Zp661. Genomic DNA was extracted from each individual and pooled in equal amounts to construct the wild-type (Bulk_L) and mutant (Bulk_S) DNA pools. Sequencing libraries for the two pools and for the parental lines Zp661 and sms6 were prepared according to the manufacturer's standard protocols and sequenced on the Illumina HiSeq2500 platform to an average depth of 30×. Reads from the pools and parental lines were mapped to the reference genome (Wm82.a2.v1) using BWA78 with default parameters, and SNPs were identified using GATK79. SNP-index and ΔSNP-index values were calculated, and their confidence intervals were used to define the candidate region42,80. For fine mapping, a large mapping population (n = 4120) was developed from F2 plants heterozygous at the qSMS6 locus. Recombinants were identified in this population using molecular markers developed from the resequencing data (Supplementary Data 3).
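In BSA-seq of this kind, the SNP-index of a bulk is the fraction of reads at a SNP carrying the non-reference allele, and the ΔSNP-index is the mutant-bulk value minus the wild-type-bulk value; values near 1 point to SNPs linked to a recessive mutant allele. The sketch below assumes read depths have already been extracted from the GATK calls; the window size and step are hypothetical placeholders (they are not stated here), and the confidence-interval procedure of refs. 42,80 is not reproduced.

```python
from statistics import mean


def snp_index(alt_depth: int, total_depth: int) -> float:
    """Fraction of reads supporting the non-reference allele at one SNP."""
    if total_depth <= 0:
        raise ValueError("total depth must be positive")
    return alt_depth / total_depth


def delta_snp_index(bulk_s: tuple[int, int], bulk_l: tuple[int, int]) -> float:
    """ΔSNP-index = SNP-index(Bulk_S, mutant) - SNP-index(Bulk_L, wild type).

    Each bulk is given as (alt_depth, total_depth) for the same SNP.
    """
    return snp_index(*bulk_s) - snp_index(*bulk_l)


def window_means(positions, values, window=1_000_000, step=10_000):
    """Average per-SNP values in sliding windows [start, start + window).

    Windows without SNPs are skipped. Defaults are hypothetical placeholders.
    This simple scan is quadratic and meant for illustration only.
    """
    if not positions:
        return []
    pairs = sorted(zip(positions, values))
    last = pairs[-1][0]
    out = []
    start = 0
    while start <= last:
        in_win = [v for p, v in pairs if start <= p < start + window]
        if in_win:
            out.append((start, mean(in_win)))
        start += step
    return out


# Hypothetical SNP: 28 of 30 reads alternative in Bulk_S, 10 of 30 in Bulk_L
print(round(delta_snp_index((28, 30), (10, 30)), 3))  # 0.6
```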

Vector construction and soybean transformation

For CRISPR/Cas9-mediated editing of GmSMS6, GmbZIP151, and GmUBQ1, single-guide RNAs (sgRNAs) were designed using the targetDesign server (http://skl.scau.edu.cn/targetdesign/). Two target sites were selected and cloned into the pCas9-adaptor vector81, and the recombinant constructs were introduced into Agrobacterium strain EHA105 for transformation. Transgene-positive plants were identified by detecting the Cas9 gene, and mutation types were determined by sequencing the genomic regions spanning the target sites with gene-specific primers (Supplementary Data 3). For overexpression, the coding sequences (CDSs) of GmSMS6G and GmSMS6A were amplified from Zp661 and sms6, respectively, and cloned into the pTF101 vector under the control of the cauliflower mosaic virus (CaMV) 35S promoter. These constructs were also introduced into Agrobacterium strain EHA105. Jack and Zp661 were then transformed using the cotyledonary node method82.
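The targetDesign server applies its own scoring and off-target filtering. The sketch below only illustrates the first step common to SpCas9 guide design: enumerating 20-nt protospacers immediately 5′ of an NGG PAM on both strands. It assumes the pCas9-adaptor system uses SpCas9 with an NGG PAM, and the function names are hypothetical.

```python
import re

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_spcas9_sites(seq: str, spacer_len: int = 20):
    """Return (strand, start, spacer, pam) for every N20-NGG site.

    `start` is the leftmost 0-based forward-strand coordinate of the
    23-nt site. A lookahead regex reports overlapping sites too.
    """
    seq = seq.upper()
    pattern = re.compile(r"(?=([ACGT]{%d})([ACGT]GG))" % spacer_len)
    site_len = spacer_len + 3
    hits = []
    for m in pattern.finditer(seq):
        hits.append(("+", m.start(), m.group(1), m.group(2)))
    rc = revcomp(seq)
    for m in pattern.finditer(rc):
        # rc[s : s + site_len] maps back to seq[n - s - site_len : n - s]
        hits.append(("-", len(seq) - m.start() - site_len, m.group(1), m.group(2)))
    return hits


print(find_spcas9_sites("A" * 20 + "TGG"))
```

Candidate spacers from such a scan would still need off-target and efficiency checks, which is what dedicated servers provide.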

RNA extraction and RT–qPCR

Total RNA was extracted from plant tissues using the RNAprep Pure Plant Plus Kit (Tiangen, Beijing, China), and first-strand cDNA was synthesized using a reverse transcription kit (TaKaRa, Shiga, Japan). RT–qPCR was performed with SYBR Premix Ex Taq (TaKaRa, Shiga, Japan) on an Mx3000P qPCR system (Agilent, Santa Clara, California, USA). GmActin11 (Glyma.18G290800) was used as the internal reference gene.
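The text names the reference gene but not the quantification model. A common choice is the 2^−ΔΔCt method, which assumes close to 100% amplification efficiency for both target and reference; the minimal sketch below implements it under that assumption, with hypothetical Ct values.

```python
from statistics import mean


def fold_change_ddct(ct_target, ct_ref, ct_target_cal, ct_ref_cal):
    """Relative expression by the 2^-ΔΔCt method.

    ct_target / ct_ref: replicate Ct values of the target gene and of the
    reference gene (e.g. GmActin11) in the sample of interest.
    *_cal: the same for the calibrator sample (e.g. wild type).
    """
    delta_sample = mean(ct_target) - mean(ct_ref)
    delta_calibrator = mean(ct_target_cal) - mean(ct_ref_cal)
    ddct = delta_sample - delta_calibrator
    return 2.0 ** (-ddct)


# Hypothetical: the target crosses threshold 2 cycles earlier in the sample
print(fold_change_ddct([24.0], [18.0], [26.0], [18.0]))  # 4.0
```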

RNA-seq analysis

Total RNA was isolated from young seeds (20 days after fertilization) of the Jack, CR1-SMS6, OE1-SMS6G, NIL-SMS6, and NIL-sms6 lines, with three biological replicates per line. In total, 15 libraries were constructed and sequenced on the DNBSEQ platform (BGI Genomics Co., Ltd, Shenzhen, China). Clean reads were mapped to the soybean Williams 82 reference genome (Wm82.a2.v1) using HISAT283, and reads mapped to each gene were counted using HTSeq 0.11.284. Differentially expressed genes (DEGs) were identified using the DESeq2 R package85. GO annotation files were generated using eggNOG-mapper v286. Expression data for GmSMS6 in G. max and G. soja were obtained from previously published RNA-seq analyses87.
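DESeq2 itself is an R package and is not re-implemented here. To illustrate only its default normalization step, the sketch below computes median-of-ratios size factors from an HTSeq-style count table: each sample's factor is the median, over genes with nonzero counts in every sample, of its count divided by that gene's geometric mean across samples. The function name and toy counts are hypothetical; dispersion estimation and the Wald test are not shown.

```python
import math
from statistics import median


def size_factors(counts: list[list[int]]) -> list[float]:
    """Median-of-ratios size factors.

    counts: one list of raw read counts per sample, genes in the same order.
    Genes with a zero count in any sample are excluded, as in DESeq2's default.
    """
    n_genes = len(counts[0])
    log_geo_means = []
    for g in range(n_genes):
        vals = [sample[g] for sample in counts]
        if all(v > 0 for v in vals):
            log_geo_means.append(sum(math.log(v) for v in vals) / len(vals))
        else:
            log_geo_means.append(None)  # gene not usable for normalization
    if all(lg is None for lg in log_geo_means):
        raise ValueError("no gene has nonzero counts in every sample")
    factors = []
    for sample in counts:
        log_ratios = [math.log(sample[g]) - lg
                      for g, lg in enumerate(log_geo_means) if lg is not None]
        factors.append(math.exp(median(log_ratios)))
    return factors


# Hypothetical: sample 2 was sequenced twice as deeply as sample 1
print([round(f, 4) for f in size_factors([[10, 20, 30], [20, 40, 60]])])  # [0.7071, 1.4142]
```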

Y2H assay

The full-length CDS of GmSMS6 was cloned into the pGBKT7 vector as bait and transformed into yeast strain Y2HGold, and the library screen was performed using the Matchmaker Gold Yeast Two-Hybrid System (Takara Bio USA, San Jose, California, USA). To verify interactions among GmSMS6, GmbZIP151, and GmUBQ1, the CDSs of GmbZIP151, GmbZIP151m (lacking the 14-3-3 binding motif), and GmUBQ1 were cloned into the pGADT7 vector, and the CDS of GmbZIP151 was cloned into pGBKT7. Paired plasmids were co-transformed into Y2HGold and grown on SD/-Trp-Leu (SD/-T-L) medium. After 3 days, positive clones were further tested on SD/-Trp-Leu-His-Ade (SD/-T-L-H-A) medium containing X-α-Gal. pGBKT7-53 (pBD-53) with pGADT7-T (pAD-T) served as the positive control, whereas combinations with empty pGBKT7 (pBD) or pGADT7 (pAD), as well as pBD-CR1-SMS6, served as negative controls. β-Galactosidase activity was measured in liquid assays of yeast (AH109) transformants according to the manufacturer's instructions (Takara Bio USA, San Jose, California, USA).
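Liquid β-galactosidase assays in yeast are commonly reported in Miller units, 1000 × OD420 / (t × V × OD600), where t is the reaction time in minutes and V is the assayed volume in mL multiplied by the cell concentration factor. The sketch below assumes this formula with a 0.1-mL assay of fivefold-concentrated cells; both defaults are assumptions, not values taken from the text.

```python
def beta_gal_units(od420: float, od600: float, minutes: float,
                   assay_volume_ml: float = 0.1,
                   concentration_factor: float = 5.0) -> float:
    """β-galactosidase activity in Miller units: 1000*OD420 / (t * V * OD600).

    V is the assayed cell-suspension volume (mL) times the factor by which
    the cells were concentrated before lysis.
    """
    v = assay_volume_ml * concentration_factor
    if minutes <= 0 or od600 <= 0 or v <= 0:
        raise ValueError("time, OD600, and volume must be positive")
    return 1000.0 * od420 / (minutes * v * od600)


# Hypothetical reading: OD420 0.5 after 10 min, culture OD600 1.0
print(round(beta_gal_units(0.5, 1.0, 10.0), 1))  # 100.0
```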

BiFC assay

The CDSs of GmSMS6G, GmSMS6A, GmbZIP151, and GmUBQ1 were cloned into vectors containing the C-terminal (cYFP) or N-terminal (nYFP) fragments of YFP to generate recombinant proteins fused with cYFP or nYFP. The recombinant constructs were transformed into Agrobacterium strain GV3101, and the appropriate pairs were co-transformed into N. benthamiana leaf epidermal cells. After 36–48 h of incubation, YFP signals were detected using the Olympus FV1000MPE confocal laser scanning microscope (Olympus, Tokyo, Japan).

LCI assay

The CDSs of GmSMS6G, GmSMS6A, GmbZIP151, and GmUBQ1 were cloned into the pCAMBIA-1300-nLUC or pCAMBIA-1300-cLUC vector to generate nLUC- or cLUC-fused proteins, respectively. The constructs were transformed into Agrobacterium strain GV3101, and equal volumes of Agrobacterium cultures carrying the appropriate pairs were co-infiltrated into N. benthamiana leaves. After 36–48 h, 1 mM luciferin was applied to the leaves, and luminescence was detected using a Tanon 5200 CCD camera (Tanon, Shanghai, China).

Pull-down assay

To examine interactions among GmSMS6, GmbZIP151, and GmUBQ1, recombinant GST-GmSMS6G, GST-GmSMS6A, MBP-GmbZIP151, and His-GmUBQ1 fusion proteins were expressed in E. coli strain BL21 (DE3) and purified using GSTSep Glutathione Agarose Resin (Yeason, Shanghai, China), MBPSep Dextrin Agarose Resin (Yeason, Shanghai, China), and Ni-NTA Agarose Resin (LABLEAD, Beijing, China), respectively. For the pull-down assay, beads pre-bound with equimolar amounts of GST, GST-GmSMS6G, or GST-GmSMS6A were incubated with equal amounts of purified MBP-GmbZIP151 or His-GmUBQ1 at 4 °C for 2 h. Similarly, beads pre-bound with equimolar amounts of MBP or MBP-GmbZIP151 were incubated with equal amounts of purified His-GmUBQ1. The protein–bead complexes were washed five times with washing buffer36,37, and eluted proteins were detected by immunoblotting with anti-GST, anti-MBP, and anti-His antibodies (TransGen, 1:10,000).
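Loading equimolar amounts of GST and the larger GST fusions means loading different masses. Because 1 pmol of a 1-kDa protein weighs 1 ng, the mass in µg is pmol × MW (kDa) / 1000. The sketch below applies this arithmetic; the ~26-kDa size of the GST tag is its usual approximate value, while the 70-kDa fusion mass, the 100-pmol target, and the stock concentrations are hypothetical.

```python
def micrograms_for(pmol: float, mw_kda: float) -> float:
    """Mass (µg) of `pmol` picomoles of a protein of `mw_kda` kDa."""
    # 1 pmol x 1 kDa = 1e-12 mol x 1e3 g/mol = 1e-9 g = 0.001 µg
    return pmol * mw_kda / 1000.0


def microlitres_for(pmol: float, mw_kda: float, conc_mg_per_ml: float) -> float:
    """Volume (µL) of a protein stock that delivers `pmol` picomoles."""
    # mg/mL is numerically equal to µg/µL
    return micrograms_for(pmol, mw_kda) / conc_mg_per_ml


# Hypothetical baits: (molecular mass in kDa, stock concentration in mg/mL)
baits = {"GST": (26.0, 1.3), "GST fusion (hypothetical)": (70.0, 2.0)}
for name, (mw, conc) in baits.items():
    print(name, round(micrograms_for(100, mw), 2), "µg,",
          round(microlitres_for(100, mw, conc), 2), "µL")
```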

Co-IP assay

The CDSs of GmSMS6G and GmSMS6A were cloned into the pCAMBIA2300-Flag vector to generate the GmSMS6G-Flag and GmSMS6A-Flag constructs, and the CDSs of GmbZIP151 and GmUBQ1 were cloned into the pCAMBIA1300-GFP and pCAMBIA2300-MYC vectors to generate the GmbZIP151-GFP and MYC-GmUBQ1 constructs, respectively. The constructs were transformed into GV3101 and co-infiltrated into N. benthamiana leaves in the appropriate combinations. Total proteins were extracted using extraction buffer36,37, and the supernatants were incubated with anti-GFP or anti-MYC magarose beads (Alpalifebio, Shenzhen, China) at 4 °C for 3 h and then washed four times with washing buffer. Bound proteins were detected by immunoblotting with anti-Flag, anti-GFP, and anti-MYC antibodies (TransGen, 1:10,000).

In vitro ubiquitination assay

The in vitro ubiquitination assay was performed using purified proteins88. GST-GmSMS6G, GST-GmSMS6A, MBP-GmbZIP151, and His-GmUBQ1 were expressed in E. coli BL21 (DE3) cells and purified. His-GmUBQ1, as the E3 ligase, was incubated either alone to test self-ubiquitination or with MBP-GmbZIP151 as the substrate, together with GST-GmSMS6G or GST-GmSMS6A where indicated. Reactions also contained E1 (UBE1, Yeason), E2 (UBE2N, Yeason), and ubiquitin (Ub, Yeason) and were incubated in reaction buffer at 37 °C for 2 h. Reactions were stopped by adding SDS loading buffer, and proteins were separated by SDS-PAGE and detected by immunoblotting with anti-ubiquitin (Santa Cruz, 1:1000), anti-MBP (TransGen, 1:10,000), and anti-His (TransGen, 1:10,000) antibodies.

In vivo ubiquitination assay

The in vivo ubiquitination assay was conducted to examine protein degradation and stability89,90. For the GmbZIP151 ubiquitination assay, Agrobacterium strain GV3101 cells harboring the Flag-Ub, GmbZIP151-GFP, and GmUBQ1-MYC constructs were co-infiltrated into N. benthamiana leaves. To prevent proteasomal degradation of GmbZIP151-GFP, 50 µM MG132 was included in the infiltration suspension. Leaf tissues were harvested 36 h after infiltration, and total proteins were extracted. The supernatants were incubated with anti-GFP magarose beads (Alpalifebio, Shenzhen, China) for 3 h at 4 °C, and the beads were washed four times with washing buffer. Bound proteins were detected by immunoblotting with anti-Flag (TransGen, 1:5000) and anti-GFP (TransGen, 1:5000) antibodies.

The CDS of GmbZIP151 was cloned into the pGreenII 0800-LUC plasmid, which carries the Renilla luciferase (REN) gene as an internal control. The GmbZIP151-LUC, GFP-GmUBQ1, GFP-empty, GmSMS6G-Flag, and GmSMS6A-Flag constructs were introduced into Arabidopsis protoplasts and N. benthamiana leaves in the appropriate combinations, with or without MG132. LUC and REN activities were measured using the Dual-Luciferase Reporter Assay System (Promega, Madison, Wisconsin, USA), and luminescence was imaged using a Tanon 5200 CCD camera (Tanon, Shanghai, China).
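In a reporter of this design, the GmbZIP151-LUC signal normalized to REN can be read as a proxy for GmbZIP151 protein abundance, so a lower LUC/REN ratio with GFP-GmUBQ1 than with GFP-empty would suggest reduced accumulation. The sketch below shows the usual normalization, assuming one paired LUC and REN reading per replicate and a control combination as the reference; the readings are hypothetical.

```python
from statistics import mean


def luc_ren_ratios(luc: list[float], ren: list[float]) -> list[float]:
    """Per-replicate firefly/Renilla luciferase ratios."""
    if len(luc) != len(ren):
        raise ValueError("LUC and REN readings must be paired")
    return [luc_v / ren_v for luc_v, ren_v in zip(luc, ren)]


def relative_luc(luc, ren, luc_ctrl, ren_ctrl) -> float:
    """Mean LUC/REN of a combination divided by mean LUC/REN of the control."""
    return mean(luc_ren_ratios(luc, ren)) / mean(luc_ren_ratios(luc_ctrl, ren_ctrl))


# Hypothetical readings, two replicates each (treatment vs. GFP-empty control)
print(relative_luc([200, 220], [100, 110], [400, 440], [100, 110]))  # 0.5
```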

Cell-free degradation assay

The cell-free degradation assay was performed using total plant protein extracts37,88. Total proteins were extracted from seedlings of WT (Jack), CR1-SMS6, OE1-SMS6G, and OE1-SMS6A. Equal amounts of purified recombinant MBP-GmbZIP151 were incubated with the total protein extracts in the presence of 10 mM ATP, with or without 50 μM MG132, at 30 °C for 0, 30, 60, 90, 120, and 150 min. Protein levels were detected by immunoblotting with anti-MBP (TransGen, 1:10,000), anti-Flag (TransGen, 1:10,000), and anti-Actin (Abclonal, 1:5000) antibodies.
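A time course like this can be summarized as a degradation rate. The minimal sketch below assumes band intensities have been quantified (for example in ImageJ), normalized to the Actin loading control and to the 0-min lane, and that decay is first order; it fits ln(level) against time by least squares and reports t1/2 = ln 2 / k. The intensities are simulated, not data from this study.

```python
import math


def relative_levels(target: list[float], loading: list[float]) -> list[float]:
    """Normalize band intensities to a loading control, then to the first time point."""
    ratios = [t / c for t, c in zip(target, loading)]
    return [r / ratios[0] for r in ratios]


def half_life_min(times_min: list[float], levels: list[float]) -> float:
    """Half-life from a first-order fit: ln(level) = ln(level0) - k * t."""
    if len(times_min) != len(levels) or len(times_min) < 2:
        raise ValueError("need at least two paired time points")
    ys = [math.log(v) for v in levels]
    n = len(times_min)
    mx, my = sum(times_min) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in times_min)
    sxy = sum((x - mx) * (y - my) for x, y in zip(times_min, ys))
    k = -sxy / sxx  # first-order rate constant (per min)
    return math.log(2) / k if k > 0 else math.inf  # inf: no decay detected


times = [0, 30, 60, 90, 120, 150]  # sampling times used in the assay
simulated = [2 ** (-t / 60) for t in times]  # hypothetical 60-min half-life
print(round(half_life_min(times, simulated), 1))  # 60.0
```

Comparing fitted half-lives across extracts (WT, CR1-SMS6, OE1-SMS6G, OE1-SMS6A) is one way to express the differences quantitatively; with MG132 the fit would be expected to approach no decay.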

Y1H assay

The CDS of GmbZIP151 was cloned into the pGADT7 vector to generate the pAD-GmbZIP151 construct, and a 2-kb promoter fragment of GmST05 was cloned into the pAbAi vector to generate the proGmST05-pAbAi construct. The construct pairs were co-transformed into the Y1HGold yeast strain, with empty pGADT7 used as the control. Positive transformants grown on SD/-Leu medium were then plated on SD/-Leu medium with or without Aureobasidin A (AbA) to detect binding of GmbZIP151 to the GmST05 promoter.

EMSA assay

Recombinant MBP-GmbZIP151 was expressed in E. coli BL21 (DE3) cells and purified as described for the pull-down assay. Synthesized probes were incubated with 10 μg of purified protein at 25 °C for 2 h, and the resulting complexes were resolved by electrophoresis on a 2% (w/v) agarose gel91,92.

ChIP-qPCR assay

The ChIP assay was performed using soybean hairy roots expressing GFP or GmbZIP151-GFP26. Five grams of hairy root tissue were harvested and crosslinked with 1% (v/v) formaldehyde to fix protein–DNA interactions before nuclear extraction. Immunoprecipitation with an anti-GFP antibody (Proteintech, Wuhan, China) and reverse crosslinking were performed according to the instructions of the Chromatin Immunoprecipitation Kit for Plant (Gene Create, Wuhan, China). Enrichment of DNA fragments was analyzed by qPCR.
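ChIP-qPCR enrichment is often expressed as percent input. The minimal sketch below assumes the input sample is a known fraction of the chromatin used per IP (1% here, a hypothetical value) and close to 100% primer efficiency: the input Ct is adjusted by log2 of the dilution factor, and GmbZIP151-GFP is compared with the GFP control at the same amplicon. Ct values are hypothetical.

```python
import math


def percent_input(ct_ip: float, ct_input: float, input_fraction: float = 0.01) -> float:
    """Percent of input recovered in the IP: 100 * 2^(Ct_input_adj - Ct_IP)."""
    # Rescale the input Ct to represent 100% of the chromatin used for the IP
    ct_input_adj = ct_input - math.log2(1.0 / input_fraction)
    return 100.0 * 2.0 ** (ct_input_adj - ct_ip)


def fold_enrichment(ct_ip_sample, ct_input_sample, ct_ip_ctrl, ct_input_ctrl,
                    input_fraction=0.01):
    """Ratio of % input for the tagged protein over the GFP-only control."""
    return (percent_input(ct_ip_sample, ct_input_sample, input_fraction)
            / percent_input(ct_ip_ctrl, ct_input_ctrl, input_fraction))


print(round(percent_input(28.0, 25.0), 3))  # 0.125
```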

Transcriptional activation activity assay

The transcriptional activation assay was performed in N. benthamiana26. The CDSs of GmSMS6, GmbZIP151, and GmbZIP151m were fused to the GAL4 DNA-binding domain (GAL4-BD) under the control of the 35S promoter to generate the GBD-GmSMS6, GBD-GmbZIP151, and GBD-GmbZIP151m constructs. A 2-kb promoter fragment upstream of the GmST05 start codon (ATG) was cloned into the pGreenII 0800-LUC vector to generate the reporter, and 35S:GmSMS6, 35S:GmbZIP151, and 35S:GmUBQ1 were used as effectors. The constructs were co-transformed into N. benthamiana leaf protoplasts in the appropriate combinations, with the REN gene as the internal control. LUC and REN activities were measured using the dual-luciferase reporter assay system according to the manufacturer's protocol (Promega, Madison, Wisconsin, USA).

Evolutionary analysis

The CDSs of GmSMS6, GmbZIP151, and GmUBQ1 were cloned from Zp661, and their homologs from Glycine max, Arabidopsis thaliana, Medicago truncatula, and Oryza sativa were retrieved from the NCBI and Phytozome (https://phytozome-next.jgi.doe.gov/) databases. Amino acid sequences were aligned using MUSCLE, and maximum likelihood phylogenetic trees were constructed in MEGA 11 (https://www.megasoftware.net/) with default parameters27. Resequencing data for 4414 accessions from a published soybean population53 were used to evaluate population-level characteristics. FST and π values were calculated with VCFtools using 100-kb sliding windows with a 5-kb step90. For a core soybean germplasm collection54, haplotype networks were constructed using PopART with the TCS method (https://popart.maths.otago.ac.nz/), and nucleotide diversity and Tajima's D were calculated using DnaSP (https://www.ub.edu/dnasp/). Phenotypic contribution rates were estimated by analysis of variance93.
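VCFtools and DnaSP compute these statistics internally (in VCFtools, windowed π is requested with the `--window-pi` and `--window-pi-step` options). To make the quantities concrete, the sketch below computes, for a small aligned haplotype set, the mean number of pairwise differences k, per-site nucleotide diversity π, the number of segregating sites S, and Tajima's D from the standard formula. It assumes gap-free, equal-length sequences with no missing data; the toy haplotypes are hypothetical.

```python
import math
from itertools import combinations


def mean_pairwise_differences(seqs: list[str]) -> float:
    """Average number of differing sites over all sequence pairs (k)."""
    pairs = list(combinations(seqs, 2))
    return sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs) / len(pairs)


def nucleotide_diversity(seqs: list[str]) -> float:
    """Per-site nucleotide diversity: π = k / alignment length."""
    return mean_pairwise_differences(seqs) / len(seqs[0])


def segregating_sites(seqs: list[str]) -> int:
    """Number of alignment columns carrying more than one allele (S)."""
    return sum(len(set(col)) > 1 for col in zip(*seqs))


def tajimas_d(seqs: list[str]) -> float:
    """Tajima's D = (k - S/a1) / sqrt(e1*S + e2*S*(S-1))."""
    n = len(seqs)
    if n < 4:
        raise ValueError("Tajima's D needs at least four sequences")
    s = segregating_sites(seqs)
    if s == 0:
        raise ValueError("Tajima's D is undefined without segregating sites")
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n ** 2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    k = mean_pairwise_differences(seqs)
    return (k - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))


haplotypes = ["AAAA", "AAAT", "AATT", "ATTT"]  # hypothetical toy alignment
print(segregating_sites(haplotypes), round(nucleotide_diversity(haplotypes), 4))  # 3 0.4167
print(round(tajimas_d(haplotypes), 3))
```

Positive D indicates more pairwise diversity than expected from the number of segregating sites under neutrality (for example, balancing selection or population structure), whereas negative D indicates an excess of rare variants, as expected after a selective sweep.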

Primer synthesis and DNA sequencing

All primers (Supplementary Data 3) were commercially synthesized. The obtained DNA fragments and resultant constructs were commercially sequenced by Sangon Biotech (Shanghai) Co., Ltd (Shanghai, China).

Statistics and reproducibility

Experimental data were analyzed using SPSS Statistics 22 (IBM Corp, Armonk, NY, USA) and GraphPad Prism 9.5 (GraphPad Software, Boston, MA, USA). Statistical significance was generally assessed using two-sided Student's t-tests or one-way ANOVA.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.