Abstract
Seed size/weight plays an important role in determining crop yield, yet only a few genes controlling seed size have been characterized in soybean. Here, we perform a genome-wide association study and identify a major quantitative trait locus (QTL), named GmSW17 (Seed Width 17), on chromosome 17 that determines soybean seed width/weight in natural populations. GmSW17 encodes a ubiquitin-specific protease, an ortholog of UBP22, belonging to the ubiquitin-specific protease (USPs/UBPs) family. Further functional investigations reveal that GmSW17 interacts with GmSGF11 and GmENY2 to form a deubiquitinase (DUB) module, which influences H2Bub levels and negatively regulates the expression of GmDP-E2F-1, thereby relieving inhibition of the G1-to-S transition. Population analysis demonstrates that GmSW17 underwent artificial selection during soybean domestication but has not been fixed in modern breeding. In summary, our study identifies a predominant gene related to soybean seed weight, providing potential advantages for high-yield breeding in soybean.
Introduction
The seed is one of the most important plant organs because it plays a central role in fitness and heredity and is also a major component of yield1,2. Seed development is a complex process that requires the coordinated integration of numerous genetic, metabolic, and physiological pathways, in addition to environmental cues3. In the initial proliferative phase of seed development, cell numbers increase while a relatively constant cell size is maintained. Subsequently, there is a significant increase in cell size until the predetermined size of the organ is attained4. Throughout this process, several regulators have been reported to function between various stages of the cell cycle, especially during the G1-to-S transition, to determine the final seed size5,6,7,8. For instance, overexpression of CYCB1;4 was shown to increase seed size by accelerating cell cycle progression in Arabidopsis9; D-type cyclins (CYCDs) and E2F were identified as the key cell cycle regulators involved in the G1-to-S transition5,7,10; and knocking out DP-E2F-LIKE1 (DEL1) could increase ploidy levels11 and result in an 11% increase in seed size in Arabidopsis12.
To date, several pathways that control seed size have been identified, including the HAIKU (IKU) pathway, the G-protein signaling pathway, the mitogen-activated protein kinase (MAPK) signaling pathway, the phytohormone perception and homeostasis pathway, and the transcriptional regulatory pathway, among which the ubiquitin-proteasome pathway is particularly important1,13,14,15. As a crucial protein modification, ubiquitination significantly affects various cellular processes, including hormone signaling, stress responses, and organ development16,17,18,19,20. Recently, several ubiquitin-related genes involved in seed size regulation have been functionally characterized. For instance, DA1 and DA1-related (DAR1), which encode predicted ubiquitin receptors4, and ENHANCER OF DA1 (EOD1)/BIG BROTHER (BB) and DA2, which encode RING-type proteins with E3 ubiquitin ligase activity21, were found to regulate final seed size by restricting the period of cell proliferation in Arabidopsis4. Additionally, homologs of DA2 also participate in controlling seed size in rice, maize and wheat22,23,24,25. Unlike the ubiquitin-proteasome pathway, which primarily involves polyubiquitination, monoubiquitination mainly contributes to DNA repair, gene expression and receptor endocytosis26. Additionally, it has been reported that monoubiquitination of histone H2B is typically associated with transcriptional activation27,28,29. For instance, OsUBR7 mediates H2Bub to regulate cell cycle-related genes, thereby influencing plant height in rice30. Deubiquitination is the reverse process of ubiquitination. Ubiquitin can be cleaved from modified substrates by deubiquitinating enzymes (DUBs) to maintain the ubiquitination of target proteins at a determined level. Ubiquitin-specific proteases (USPs/UBPs) are a major class of DUBs and constitute a large family with multiple members. It has been revealed that different UBPs play distinct, extensive and significant roles in plant development and stress responses31,32,33,34,35. However, studies on the function of deubiquitination in seed development have been limited. To date, UBP14 has been shown to regulate endoreplication and cell/organ growth in Arabidopsis by influencing the stability of CYCA2;3 and CDKB1;136, and OsUBP15 functions as a positive regulator of grain width and size in rice, although its substrates have not been identified thus far37. OTU1, a member of the OTU family, regulates DA1 and DA2 in an epigenetic manner, ultimately influencing seed size in Arabidopsis38. However, whether other UBPs are involved in seed size determination remains unclear.
Soybean [Glycine max (L.) Merr.] is an important oilseed crop that supplies the majority of plant protein and oil for humans and animals. With ongoing population growth and the improvement of people’s living standards, it is estimated that soybean yield will need to double by 205039. Soybean yield is a multifaceted trait that can be influenced by various components, and seed size plays a crucial role in high-yield soybean breeding. Identifying the key genes controlling seed size and elucidating the underlying regulatory mechanism are essential for high-yield soybean breeding. Through forward genetic methods, either by genome-wide association studies (GWAS) or by QTL mapping, several seed size-related genes, such as Phosphatase 2C-1 (PP2C-1, Glyma.17G221100)40, GmKIX8-1 (Glyma.17G112800)41, GmSWEET10a (Glyma.15G049200), GmSWEET10b (Glyma.08G183500)42, Dt2 (Glyma.18G273600)43 and GmST05 (Glyma.05G244100)44, have been identified in soybean. In addition, through homology analyses, several genes that have been shown to control seed size in other species were found to have conserved functions in soybean, such as GmCYP78A, which is a homolog of P450/CYP78A (cytochrome P450 family) from Arabidopsis45,46; GmCIF1 and GmC/VIF2, which are homologs of AtCIF147; and GmBS1 and GmBS2, which are homologs of BIG SEEDS1 (BS1) from Medicago48. However, the current understanding of seed size determination in soybean, in terms of both the genes discovered and the underlying mechanisms, is insufficient for molecular design breeding, especially in natural populations. Further functional studies of the regulation of seed size will provide important genetic resources for soybean breeding.
In this study, through a GWAS of more than 1800 soybean accessions, we identify GmSW17, a homolog of Arabidopsis UBP22, as a positive regulator of seed width and seed weight in soybean. GmSW17 regulates GmDP-E2F-1 through an epigenetic mechanism involving the posttranslational modification H2Bub. These findings will facilitate the breeding of high-yield soybean varieties.
Results
Natural variation in GmSW17 predominantly determines seed width in soybean
To identify the key genes that control seed width in soybean germplasms, we phenotyped 1853 accessions from our previous resequencing panel49 in 2016 and 2017. A GWAS performed using a mixed linear model revealed a stable association signal across the 2 years in a 93.7 kb interval block on chromosome 17 (Fig. 1a–c and Supplementary Fig. 1a–f), which was named as QTL Seed Width 17 (GmSW17). Within this 93.7 kb interval, a total of 12 protein-coding genes were annotated according to the reference genome ZH1350,51 (Fig. 1d). Transcriptome data demonstrated that among the 12 genes, 3 genes (SoyZH13_17G104400, SoyZH13_17G104500, and SoyZH13_17G105400) exhibited significantly higher expression during seed development (Fig. 1e). Haplotype investigation revealed no haplotype difference in SoyZH13_17G104400. Of the remaining 2 genes, SoyZH13_17G104500 encodes heparanase-like protein 3, and SoyZH13_17G105400 encodes a UBP that contains the Zf-UBP and the ubiquitin carboxyl-terminal hydrolase (UCH) domains (Fig. 1f). These two genes were considered candidates for GmSW17.
a Manhattan plot of GWAS results for seed width using 2 years of best linear unbiased prediction (BLUP) data from the 1853 accessions. b Quantile-quantile plot of the seed width. For the quantile-quantile plot, -Log10-transformed observed P values are plotted against -Log10-transformed expected P values. c Genome-wide Manhattan plot of the 8.3-8.7 Mb region on chromosome 17. The red dashed lines indicate the candidate region for the peak. The red plot indicates the nucleotide variation in GmSW17. d Linkage disequilibrium plot for SNPs in the 8.495-8.589 Mb region from a continuous association block. Candidate genes are depicted at the top of the panel, and GmSW17 is shown in red. The color key (white to red) represents the linkage disequilibrium value (r²) of the accessions. e An expression heatmap of candidate genes located in the candidate region. The color key (blue to red) represents gene expression (Transcripts per kilobase million, TPM). f Gene structure of GmSW17. The nonsynonymous locus (+988th C/T) in the second exon (UCH domain) is labeled on the gene sketch. Two domains (ZF-UBP and ubiquitin carboxyl-terminal hydrolase, namely, the UCH domain) are depicted at the top of the panel. g Haplotype analysis of GmSW17H1 (n = 295 accessions) and GmSW17H2 (n = 731 accessions). The edges of the box plot represent the first and third quartiles, the center line represents the median, and the whiskers extend to the smallest and largest data points within 1.5 interquartile ranges from the edges. Statistical significance is determined using a two-sided t-test. Source data are provided as a Source Data file.
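The quantile-quantile comparison described for panel b can be illustrated with a minimal sketch. It assumes the expected P values follow a uniform distribution under the null hypothesis and uses the midpoint plotting position (rank - 0.5)/n; the plotting position, the function name `qq_points` and the toy P values are illustrative assumptions, not details taken from this study.

```python
import math

def qq_points(p_values):
    """Return (expected, observed) -log10(P) pairs for a quantile-quantile plot."""
    observed = sorted(p_values)  # smallest P value first
    n = len(observed)
    points = []
    for rank, p in enumerate(observed, start=1):
        # Expected P under a uniform null; midpoint plotting position (assumption).
        expected_p = (rank - 0.5) / n
        points.append((-math.log10(expected_p), -math.log10(p)))
    return points

# Hypothetical P values, for illustration only.
toy_p = [0.5, 0.01, 0.2, 0.9, 1e-8]
for expected, observed in qq_points(toy_p):
    print(f"expected={expected:.2f}\tobserved={observed:.2f}")
```

Points that lie well above the diagonal at the right-hand end of such a plot correspond to SNPs whose association is stronger than expected by chance.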
In addition, we examined the seed weight of a recombinant inbred line (RIL) population consisting of 243 individuals derived from a cross between Dong Nong50 (DN50), which has an average 100-seed weight of 8 g, and Williams 82 (W82), which has an average 100-seed weight of 19 g. QTL mapping revealed a major-effect QTL on chromosome 17 (Supplementary Fig. 1g), which overlapped with the GWAS interval containing GmSW17 (Fig. 1a–d and Supplementary Fig. 1g). DN50 and W82 had the same genotype in SoyZH13_17G104500 but different genotypes in SoyZH13_17G105400. Previous studies have revealed that UBPs may be involved in seed development36,37. Therefore, SoyZH13_17G105400 was considered the candidate gene associated with seed width in this region and is subsequently referred to as GmSW17. Among the polymorphisms with a minor allele frequency (MAF) > 0.05, a C/T SNP in the second exon that changed the amino acid arginine to cysteine had an association value higher than the threshold (Fig. 1f) and was used to classify GmSW17 into two major haplotypes in the natural population (Fig. 1g). We found that the accessions harboring GmSW17H2 exhibited a significantly greater seed width than those harboring GmSW17H1 (Fig. 1g).
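The haplotype comparison above amounts to grouping accessions by haplotype and testing the difference in mean seed width. The sketch below computes Welch's t statistic and its approximate degrees of freedom using only the Python standard library. Whether the study used Welch's or Student's variant of the two-sided t-test is not stated, so the choice here is an assumption, and the seed widths are hypothetical values chosen purely for illustration.

```python
import math
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t statistic (b minus a) and Welch-Satterthwaite degrees of freedom."""
    va = variance(a) / len(a)  # squared standard error of group a
    vb = variance(b) / len(b)  # squared standard error of group b
    t = (mean(b) - mean(a)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

# Hypothetical seed widths (mm) grouped by haplotype; not data from the study.
seed_width = {
    "GmSW17H1": [5.1, 5.3, 5.2],
    "GmSW17H2": [5.9, 6.1, 6.0],
}

t, df = welch_t(seed_width["GmSW17H1"], seed_width["GmSW17H2"])
print(f"t = {t:.2f}, df = {df:.1f}")
```

A two-sided P value would then be read from the t distribution with `df` degrees of freedom, for example with a statistics package.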
Then, we generated a set of near-isogenic lines (NILs) from a residual heterozygous line (RHL) that was heterozygous in the GmSW17 region but homozygous across the rest of the genome. We observed that NIL-GmSW17H2 exhibited significantly greater seed width, seed length and seed thickness, which resulted in greater 100-seed weight, seed weight per plant and yield per plot than NIL-GmSW17H1 (Supplementary Fig. 2a–g). Using cryo-scanning electron microscopy (cryoSEM), we found that the cotyledon area, cell area, and cell number in the R6 stage cotyledons of NIL-GmSW17H2 were significantly greater than those in NIL-GmSW17H1 (Supplementary Fig. 2h–l). These results suggested that the large seed size phenotype of NIL-GmSW17H2 might result from increases in both cell expansion and cell division.
Functional validation of GmSW17 in controlling soybean seed width and weight
GmSW17 shares 63.2% identity with Arabidopsis AT5G10790 (UBP22) (Supplementary Fig. 3a). To determine the conservation of the catalytic site across different species and the differences in GmSW17H1/GmSW17H2 observed in soybean, we performed protein alignment of GmSW17 homologs from various species and found that the catalytic triad (marked by asterisks) is conserved in all examined species (Supplementary Fig. 3b). However, the residue that differs between GmSW17H1 and GmSW17H2 in soybean varied among species (Supplementary Fig. 3b). Subcellular localization analysis in N. benthamiana leaves revealed that both GmSW17H1 and GmSW17H2 were localized to the nucleus, indicating that the SNPs in GmSW17 did not affect protein localization (Supplementary Fig. 3c). To validate the function of GmSW17, we knocked out GmSW17 in DN50 (harboring GmSW17H1) via the CRISPR/Cas9 system, leading to frameshift mutations in GmSW17CR1 (causing a premature stop codon at the 398th amino acid) and GmSW17CR2 (causing a premature stop codon at the 397th amino acid), and amino acid alterations (substitution of one amino acid and deletion of two amino acids) in GmSW17CR3 (Supplementary Fig. 4a, b). Phenotypic analysis revealed that the GmSW17CR lines exhibited reduced seed width (Fig. 2a, b). In addition, the seed length, seed thickness, and 100-seed weight were also lower in the GmSW17CR lines than in the wild type DN50 (Fig. 2c and Supplementary Fig. 4c–e). Consequently, compared with DN50, the seed weight per plant and yield per plot significantly decreased (Fig. 2d, e). There are three GmSW17 homologs in the soybean genome, SoyZH13_13G144800, SoyZH13_14G104200 and SoyZH13_17G206600, which share similarities of 95.4%, 65.9%, and 66.3% with GmSW17, respectively (Supplementary Fig. 3a, b). We found that these four homologs exhibited diverse expression patterns (Supplementary Fig. 4f). GmSW17 was more highly expressed in seeds than the other three homologs, as shown in the source data from our previous publication52 (Supplementary Fig. 4f). We investigated the expression of the GmSW17 homologs in the GmSW17CR2 lines and did not observe significant changes (Supplementary Fig. 4g), suggesting that GmSW17 functions independently of the other homologs in controlling seed development.
a Comparison of seed width between DN50 and GmSW17CR lines. Scale bars, 1 cm. b Seed width of DN50 and GmSW17CR lines (n = 120 and 30 biologically independent plants for DN50 and GmSW17CR lines, respectively). c 100-seed weight of DN50 and GmSW17CR lines (n = 113, 30, 31, 26 biologically independent plants for DN50, GmSW17CR1, GmSW17CR2 and GmSW17CR3, respectively). d Seed weight per plant of DN50 and GmSW17CR lines (n = 215, 89, 68, 90 biologically independent plants for DN50, GmSW17CR1, GmSW17CR2 and GmSW17CR3, respectively). e Plot yield of DN50 and GmSW17CR lines (n = 3 biological replicates). The plot area is 7.5 m². f Comparison of the seed width between DN50 and GmSW17-OE lines. Scale bars, 1 cm. g Seed width of DN50 and GmSW17-OE lines (n = 120, 30 biologically independent plants for DN50 and GmSW17-OE lines, respectively). h 100-seed weight of DN50 and GmSW17-OE lines (n = 113, 28, 31, 31, 32, 29, 25 biologically independent plants for DN50, GmSW17H1-OE-1, GmSW17H1-OE-2, GmSW17H1-OE-3, GmSW17H2-OE-1, GmSW17H2-OE-2, GmSW17H2-OE-3, respectively). i Seed weight per plant of DN50 and GmSW17-OE lines (n = 215, 59, 89, 87, 79, 63, 53 biologically independent plants for DN50, GmSW17H1-OE-1, GmSW17H1-OE-2, GmSW17H1-OE-3, GmSW17H2-OE-1, GmSW17H2-OE-2, GmSW17H2-OE-3, respectively). j Plot yield of DN50 and GmSW17-OE lines (n = 3 biological replicates). The plot area is 7.5 m². The data in (e, j) are presented as the means ± SD. In all the box plots, the center line indicates the median, the edges of the box represent the first and third quartiles, and the whiskers extend to the smallest and largest data points within 1.5 interquartile ranges from the edges. Statistical significance is determined using a two-sided t-test. Source data are provided as a Source Data file.
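The box plot convention stated in the legend (quartile box, median line, whiskers reaching the most extreme data points within 1.5 interquartile ranges of the box edges) can be written out as a short sketch. The quartile interpolation rule (`method="inclusive"`), the function name `box_stats` and the toy values are assumptions made for illustration; plotting software may use a slightly different quartile definition.

```python
from statistics import quantiles

def box_stats(values):
    """Quartiles and whisker ends following the 1.5 x IQR convention."""
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    # Whiskers stop at the most extreme observations inside the fences.
    inside = [v for v in values if low_fence <= v <= high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
    }

# Hypothetical measurements with one outlier (30), for illustration only.
print(box_stats([1, 2, 3, 4, 5, 6, 7, 8, 9, 30]))
```

In this toy case the value 30 lies beyond the upper fence, so the upper whisker ends at 9 and the outlier would be drawn as a separate point.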
We further overexpressed the coding sequences (CDS) of GmSW17H1 and GmSW17H2 driven by the seed-specific promoter of the soybean β-conglycinin α subunit encoding gene (Glyma.20G148300)53 in DN50 and obtained six independent transgenic overexpression (OE) lines (named GmSW17H1-OE-1, GmSW17H1-OE-2, GmSW17H1-OE-3, GmSW17H2-OE-1, GmSW17H2-OE-2 and GmSW17H2-OE-3). qRT-PCR assays showed that the transcript level of GmSW17 was significantly higher in the GmSW17-OE lines than in the wild type DN50 (Supplementary Fig. 5a). As expected, the GmSW17-OE lines exhibited increased seed width (Fig. 2f, g). Moreover, the seed length, seed thickness, 100-seed weight, seed weight per plant and yield per plot were also greater in the GmSW17-OE lines (Fig. 2h–j and Supplementary Fig. 5b–d). Taken together, these results demonstrated that GmSW17 positively regulates seed size (seed width and seed weight) in soybean.
GmSW17 forms a deubiquitination module with GmSGF11 and GmENY2
To identify the interaction partners of GmSW17, we carried out a yeast two-hybrid (Y2H) assay by screening a cDNA library constructed from seeds at different stages of development. Among the interacting proteins, an ortholog of Arabidopsis SGF11, named GmSGF11 (SoyZH13_11G189300) in this study, was identified more frequently than the others and was of interest to us (Supplementary Table 1). GmSGF11 encodes a SAGA-associated factor 11, which is a component of the deubiquitination module (DUBm) of the SAGA complex54,55. DUBm has been found to play important roles in histone H2B deubiquitination, which confers the establishment of epigenetic patterns on chromatin and thereby regulates gene expression27,28,56. Arabidopsis UBP22 (like its yeast and human orthologs) is reported to interact with SGF11 and, together with SGF11 and ENY2, forms the DUB module of the SAGA complex to deubiquitinate H2B54. The interaction between GmSW17 (both GmSW17H1 and GmSW17H2) and GmSGF11 was further confirmed by yeast two-hybrid (Y2H), coimmunoprecipitation (Co-IP), and bimolecular fluorescence complementation (BiFC) assays (Fig. 3a, b and Supplementary Fig. 6a). A pull-down assay showed that GmSW17 (both GmSW17H1 and GmSW17H2) was able to directly interact with GmSGF11 in vitro (Supplementary Fig. 6b). Additionally, we found that GmSW17CR3 interacts with GmSW17 in a Y2H assay (Supplementary Fig. 6c). Furthermore, we assessed the interaction ability between the Zf-UBP and UCH domains of GmSW17 and GmSGF11 in yeast cells and found that only the Zf-UBP domain of GmSW17 was necessary for the physical interaction between GmSW17 and GmSGF11 (Supplementary Fig. 6d). This observation is consistent with the interaction of yeast Ubp8 (UBP22 homolog) with Sgf11, which is also mediated by the Zf-UBP domain57. As the natural variation is located in the UCH domain (Fig. 1f), it is likely that the variation does not result in differences in the interaction abilities of the two haplotypes, and both GmSW17H1 and GmSW17H2 are capable of interacting with GmSGF11.
a Interaction between different haplotypes of GmSW17 and GmSGF11 in a yeast two-hybrid assay. Transformed yeast cells are cultured on DDO (double dropout -Trp/-Leu) or QDO (quadruple dropout -Trp/-Leu/-His/-Ade) synthetic dropout medium. The number at the top indicates the different serial dilutions. AD, GAL4 activation domain. BD, GAL4 DNA-binding domain. b Co-IP analysis of the protein interactions between different haplotypes of GmSW17 and GmSGF11, using anti-Flag beads. c Interaction between GmENY2 and GmSGF11 in a yeast two-hybrid assay. d Co-IP analysis of the interactions between GmENY2 and GmSGF11, using anti-Flag beads. e Comparison of seed width between DN50 and GmSGF11-OE lines. Scale bars, 1 cm. f Seed width of DN50 and GmSGF11-OE lines (n = 120, 30 biologically independent plants for DN50 and GmSGF11-OE lines, respectively). g 100-seed weight of DN50 and GmSGF11-OE lines (n = 113, 72, 51, 53 biologically independent plants for DN50, GmSGF11-OE-1, GmSGF11-OE-2 and GmSGF11-OE-3, respectively). h Comparison of seed width between DN50 and GmENY2-OE lines. Scale bars, 1 cm. i Seed width of DN50 and GmENY2-OE lines (n = 120, 30 biologically independent plants for DN50 and GmENY2-OE lines, respectively). j 100-seed weight of DN50 and GmENY2-OE lines (n = 113, 42, 57 biologically independent plants for DN50, GmENY2-OE-1 and GmENY2-OE-2, respectively). The experiments in (b, d) were repeated two times with similar results. In all the box plots, the center line indicates the median, the edges of the box represent the first and third quartiles, and the whiskers extend to the smallest and largest data points within 1.5 interquartile ranges from the edges. Statistical significance is determined using a two-sided t-test. Source data are provided as a Source Data file.
ENY2 is another important component of DUBm that participates in conserved interactions with SGF11 in yeast and Arabidopsis54,55,58. We found that GmSGF11 could interact with GmENY2 (Fig. 3c, d). However, GmSW17 could not directly interact with GmENY2 (Supplementary Fig. 6e, f). To ensure the reliability of the Y2H assay, we confirmed the expression of the proteins by western blotting (Supplementary Fig. 6g). Taken together, these results demonstrated that GmSW17 (both GmSW17H1 and GmSW17H2) physically associates with GmSGF11 via the Zf-UBP domain, and that GmSGF11 serves as a mediator that interacts with GmENY2 to form a DUBm in soybean.
To investigate the involvement of DUBm in seed size regulation, we overexpressed the CDS of GmSGF11 and GmENY2 driven by the 35S promoter in DN50. We obtained three independent transgenic OE lines for GmSGF11 (named GmSGF11-OE-1, GmSGF11-OE-2, and GmSGF11-OE-3) (Supplementary Fig. 7a) and two OE lines for GmENY2 (named GmENY2-OE-1 and GmENY2-OE-2) (Supplementary Fig. 7b). The phenotypic analysis demonstrated that, compared with wild-type DN50, both the GmSGF11-OE and GmENY2-OE lines presented significantly greater seed size (including seed width, seed length, and seed thickness), 100-seed weight, and seed weight per plant (Fig. 3e–j, and Supplementary Fig. 7c–j). These results suggested that GmSW17, GmSGF11, and GmENY2 can form a DUBm and affect seed size/weight in soybean.
Natural variation in GmSW17 resulted in differential H2Bub deubiquitinating activity
It has been revealed that UBP22 deubiquitinates histone H2B in humans and Arabidopsis54,56,59. It has also been shown that Ubp8, the yeast homolog of Arabidopsis UBP22, does not possess H2B deubiquitinating activity on its own unless SGF11 and SUS1 are both present60. Given that GmSW17 is a homolog of human USP22, Arabidopsis UBP22 and yeast Ubp8, we wondered whether GmSW17 (GmSW17H1 and GmSW17H2) has a histone deubiquitination function. To investigate this possibility, the GmSW17 (GmSW17H1 and GmSW17H2), GmSGF11 and GmENY2 proteins were purified separately (Supplementary Fig. 8a) and incubated with bovine histones in vitro, after which H2B ubiquitination was detected via western blotting. We found that when GmSW17 (GmSW17H1 or GmSW17H2) was co-incubated with GmSGF11 and GmENY2 so that the DUBm was reconstituted in vitro, the level of monoubiquitinated histone H2B (H2Bub) decreased (Supplementary Fig. 8b). More importantly, we observed that GmSW17H2 had stronger H2Bub deubiquitination activity than GmSW17H1 (Fig. 4a), which was further confirmed using a ubiquitin-AMC cleavage assay (Fig. 4b).
a Effects of DUB module subunits on the histone H2B deubiquitination activity of different haplotypes of GmSW17, as determined by histone deubiquitination assay. Purified His-GmSW17, His-GmSGF11 and His-GmENY2 (4 μg each) are incubated with 20 mg of total purified bovine histones, after which the H2Bub level is detected by western blotting. H3 level is detected as a loading control. Unspecific bands are marked with a star. The bottom panel shows the quantitative analysis of H2Bub signals (n = 3 biological replicates) for samples 1, 2 and 3. Experiment repeated three times with similar results. b Catalytic activity of different haplotypes of GmSW17. Data are presented as the mean of three technical replicates. A representative result from two independent biological replicates is shown. c Comparison of H2Bub levels in the DN50 and different GmSW17 transgenic lines. Quantification of H2Bub signals is shown at the top (n = 3 biological replicates) using H3 as the loading control. Experiment repeated three times with similar results. d Comparison of the H2Bub levels between the DN50 and GmSGF11-OE lines. Quantification of H2Bub signals is shown on the right (n = 3 biological replicates) using H3 as the loading control. Experiment repeated three times with similar results. e Comparison of the H2Bub levels between the DN50 and GmENY2-OE lines. Quantification of H2Bub signals is shown on the right (n = 3 biological replicates) using H3 as the loading control. Experiment repeated three times with similar results. The data in (a, c–e) are presented as the means ± SD. Statistical significance is determined using a two-sided t-test. Source data are provided as a Source Data file.
To further validate the role of DUBm in ‘erasing’ H2Bub, we measured H2Bub levels in GmSW17CR, GmSW17-OE, GmSGF11-OE, GmENY2-OE and wild type DN50 in vivo. The results showed that the H2Bub level was significantly higher in the GmSW17 CRISPR lines (GmSW17CR2 and GmSW17CR3) than in DN50, while the H2Bub levels in multiple independent GmSW17 overexpression lines (GmSW17H1-OE-1, GmSW17H1-OE-3, GmSW17H2-OE-2 and GmSW17H2-OE-3) were significantly lower than in DN50 (Fig. 4c). In addition, the H2Bub level was decreased in the GmSGF11-OE lines (GmSGF11-OE-1 and GmSGF11-OE-2) (Fig. 4d) and the GmENY2-OE lines (GmENY2-OE-1 and GmENY2-OE-2) (Fig. 4e) compared to DN50. Taken together, the results revealed that different haplotypes of GmSW17, which form DUBm with GmSGF11 and GmENY2, exhibited divergent H2Bub deubiquitination activities in soybean.
GmSW17 affects the cell cycle by regulating the H2Bub levels of its target gene GmDP-E2F-1
To mine the downstream targets of GmSW17, transcriptome profiling via RNA-seq was performed using seed materials from WT (DN50) and a representative knockout line, GmSW17CR2. A total of 263 upregulated and 448 downregulated genes were identified as differentially expressed genes (DEGs) between the wild type and the GmSW17 knockout line (WT/GmSW17CR2) (Supplementary Fig. 9a). Gene Ontology (GO) enrichment analysis demonstrated that DEGs from WT/GmSW17CR2 were involved in multiple pathways in proliferating cells, such as DNA-templated transcription, initiation, and DNA replication (Supplementary Fig. 9b). Among the 263 upregulated genes, GmDP-E2F-1, a homolog of DP-E2F-LIKE1 (DEL1), exhibited significantly increased expression in GmSW17CR2 (Fig. 5a). Considering that DEL1 has been reported to function in regulating the G1-to-S transition and seed size in Arabidopsis5,7,11,12, GmDP-E2F-1 might be a target of GmSW17. To confirm that DP-E2F-1 controls seed weight, we screened a T-DNA insertion mutant of DP-E2F-LIKE1 (named e2f-1) in Arabidopsis (Supplementary Fig. 10a–c). Consistent with a previous report12, we found that e2f-1 had greater 1000-seed weight than wild-type Columbia (Col) (Supplementary Fig. 10d, e), indicating that DP-E2F-LIKE1 negatively regulates seed weight (seed size) in Arabidopsis. Subsequently, we performed qRT-PCR to assess the expression of GmDP-E2F-1 and found that it was significantly upregulated in GmSW17CR2 but significantly downregulated in GmSW17H1-OE-1 and GmSW17H2-OE-2 (Fig. 5b). These results indicate that GmSW17 negatively regulates the expression of GmDP-E2F-1.
a Volcano plot of DEGs in GmSW17CR2/WT. b Relative expression level of GmDP-E2F-1 in DN50 and GmSW17 transgenic lines (n = 3 biological replicates). c H2Bub ChIP-seq track of the GmDP-E2F-1 locus in DN50 and GmSW17CR2. d H2Bub ChIP-qPCR validation of the GmDP-E2F-1 locus in DN50 and GmSW17CR2 (P1, P2 and P3). P1 to P3 represent regions covered by the primers used to assess H2Bub level by ChIP-qPCR. The data are normalized to the input chromatin, with GmTubulin used as a negative control. H2Bub levels in the WT are set to 1 (n = 3 biological replicates). e Cell numbers in the cell cycle phases with 2C (left) and 4C (right) nuclei in WT and GmSW17CR lines, as measured by flow cytometry (n = 3 biological replicates). f Percentage of cells in different phases of the cell cycle in WT and GmSW17CR seedlings (n = 3 biological replicates). ***P < 0.001, **P < 0.01, *P < 0.05, ns, not significant. g cryoSEM image of the cotyledon surface of DN50 and GmSW17 transgenic lines. Scale bars, 20 μm. h Cotyledon areas of DN50 and GmSW17 transgenic lines. i Cell area in the cotyledons of DN50 and GmSW17 transgenic lines. j Cell numbers on the ventral surface of DN50 and GmSW17 transgenic lines. The data in (b, d–f) are presented as the means ± SD. In all the box plots, the center line indicates the median, the edges of the box represent the first and third quartiles, and the whiskers extend to the smallest and largest data points within 1.5 interquartile ranges from the edges. Statistical significance is determined using a two-sided t-test. Source data are provided as a Source Data file.
Furthermore, we performed chromatin immunoprecipitation sequencing (ChIP-seq) for H2Bub and observed a correlation between H2Bub levels and transcriptional changes (Supplementary Fig. 11a, b). The downregulated DEGs were accompanied by a decrease in H2Bub levels (Supplementary Fig. 11a); conversely, the upregulated DEGs exhibited an increase in H2Bub levels (Supplementary Fig. 11b). This result was consistent with previous reports that H2Bub plays a role in the transcriptional activation process27,28,29. Moreover, the H2Bub levels at the GmDP-E2F-1 locus were significantly increased upon knockout of GmSW17 (Fig. 5c), which was also validated by ChIP-qPCR (Fig. 5d).
In addition, we conducted ChIP-qPCR to compare H2Bub levels at the GmDP-E2F-1 locus between NIL-GmSW17H1 and NIL-GmSW17H2. Compared to NIL-GmSW17H1, NIL-GmSW17H2 exhibited a lower H2Bub level at GmDP-E2F-1 (Supplementary Fig. 12a). As anticipated, the expression of GmDP-E2F-1 was significantly lower in NIL-GmSW17H2 than in NIL-GmSW17H1 (Supplementary Fig. 12b). These results suggested that the lower GmDP-E2F-1 expression in NIL-GmSW17H2 is associated with a significantly reduced H2Bub level at the GmDP-E2F-1 locus compared with NIL-GmSW17H1, which is consistent with GmSW17H2 having stronger H2Bub deubiquitination activity than GmSW17H1 in vitro.
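The ChIP-qPCR values reported here are normalized to input chromatin and expressed relative to the reference line. A minimal sketch of one common way to perform such a calculation, the percent-input method, is given below. The exact formula, the input fraction (1%), the function names and the Ct values are all assumptions for illustration and are not taken from this study.

```python
import math

def fraction_of_input(ct_ip, ct_input, input_fraction=0.01):
    """IP signal as a fraction of total input chromatin (percent-input method)."""
    # Correct the input Ct for the fact that only a fraction of chromatin was saved.
    adjusted_input_ct = ct_input - math.log2(1 / input_fraction)
    return 2 ** (adjusted_input_ct - ct_ip)

def relative_enrichment(sample, reference):
    """Express a sample relative to the reference line, which is set to 1."""
    return sample / reference

# Hypothetical Ct values for one primer pair; not data from the study.
wt = fraction_of_input(ct_ip=27.0, ct_input=24.0)
knockout = fraction_of_input(ct_ip=26.0, ct_input=24.0)
print(f"relative H2Bub enrichment = {relative_enrichment(knockout, wt):.2f}")
```

With equal input Ct values, a one-cycle-earlier IP signal corresponds to a twofold higher enrichment relative to the reference.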
Phenotypically, we used flow cytometry to investigate the cell cycle progression of GmSW17CR and DN50 cells. There were fewer 4C cells but more 2C cells in the GmSW17CR lines than in DN50 (Fig. 5e). Cell cycle analysis revealed that GmSW17 knockout elicited a substantial increase in the number of cells in the G1 phase and a concomitant reduction in the number of cells in the S phase (Fig. 5f). Consistently, the depletion of USP22, which is a homolog of GmSW17, results in a specific G1-phase cell cycle arrest in humans and yeast59,61. Cell cycle progression also differed between NIL-GmSW17H1 and NIL-GmSW17H2: the NIL-GmSW17H1 lines contained fewer 4C cells but more 2C cells than NIL-GmSW17H2, with a significant increase in the G1 phase and a significant decrease in the S phase in NIL-GmSW17H1 (Supplementary Fig. 12c–e). As expected, we found that the average cell size and cell number were decreased in R6 stage seed cotyledons of GmSW17CR2; in contrast, they were increased in GmSW17H1-OE-1 and GmSW17H2-OE-2, indicating that GmSW17 can promote both cell expansion and cell division (Fig. 5g–j).
Therefore, our results suggested that GmSW17 negatively regulated the expression of GmDP-E2F-1 by influencing the H2Bub level at the GmDP-E2F-1 locus, thereby affecting the transition from the G1 phase to the S phase, and ultimately affecting seed size/weight by influencing cell expansion and cell division.
Selection of GmSW17 in geographical differentiation
The cultivated soybean (Glycine max [L.] Merr.) was domesticated from its wild relative (G. soja) in China 5000–9000 years ago49,62,63. Since seed size significantly influences soybean yield, it has been a crucial factor in soybean breeding. After investigating 2069 previously resequenced soybean accessions49, including 88 wild soybeans, 858 landraces, and 1123 improved cultivars, we observed that 94.32% of the wild soybeans, which exhibit smaller seed size, carried GmSW17H1. However, this percentage decreased to 30.77% and 18.43% in landraces and improved cultivars, respectively (Fig. 6a). Consequently, the proportion of soybeans with larger seeds, characterized by the presence of GmSW17H2, increased during the domestication of wild soybeans into landraces (Fig. 6a). π and FST analyses also revealed that wild soybeans exhibited higher diversity than landraces and improved cultivars, indicating artificial selection for GmSW17 during soybean domestication (Fig. 6b, c and Supplementary Fig. 13a).
Fig. 6. a Haplotype frequency distribution of GmSW17 among wild soybeans (n = 88 accessions), landraces (n = 858 accessions), and improved cultivars (n = 1123 accessions). b π values associated with GmSW17 in wild soybeans, landraces and improved cultivars. c FST values associated with GmSW17 in wild soybeans, landraces and improved cultivars. The thresholds of FST(Soja_Landrace) and FST(Landrace_Cultivar) are 0.702 and 0.089, respectively. The red dotted line represents GmSW17, and the dark dotted lines represent the other genes. d A proposed model elucidating the role of GmSW17 in regulating seed size in soybean. GmSW17, GmSGF11 and GmENY2 form a trimeric DUB module that functions to reduce H2Bub levels in soybean. As GmSW17H2 has stronger enzymatic activity in deubiquitinating H2Bub than GmSW17H1, the H2Bub level at GmDP-E2F-1 is decreased, resulting in lower expression of GmDP-E2F-1. Consequently, the constraint on the G1-to-S phase transition is alleviated, ultimately leading to increased seed size.
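As a rough illustration of the two population statistics mentioned above, the following sketch computes per-site nucleotide diversity (π) and a pairwise FST from alternate-allele counts at biallelic sites. The text does not state which estimator or software was used, so the Hudson-style ratio-of-averages FST here is an assumption, and the function names and any numbers passed to them are hypothetical.

```python
def site_pi(alt_count, n_chrom):
    """Unbiased nucleotide diversity at one biallelic site.

    alt_count: number of sampled chromosomes carrying the alternate allele.
    n_chrom:   total number of sampled chromosomes at this site.
    """
    if n_chrom < 2:
        raise ValueError("need at least two sampled chromosomes")
    p = alt_count / n_chrom
    return 2.0 * p * (1.0 - p) * n_chrom / (n_chrom - 1)


def mean_pi(alt_counts, n_chrom, n_sites):
    """Average pi over a window of n_sites (monomorphic sites contribute 0)."""
    return sum(site_pi(c, n_chrom) for c in alt_counts) / n_sites


def hudson_fst(counts1, n1, counts2, n2):
    """Hudson-style FST between two populations (assumed estimator).

    Numerators and denominators are summed over sites before dividing
    (ratio of averages), which is more stable than averaging per-site ratios.
    """
    num = 0.0
    den = 0.0
    for c1, c2 in zip(counts1, counts2):
        p1 = c1 / n1
        p2 = c2 / n2
        num += (p1 - p2) ** 2 - p1 * (1 - p1) / (n1 - 1) - p2 * (1 - p2) / (n2 - 1)
        den += p1 * (1 - p2) + p2 * (1 - p1)
    return num / den if den > 0 else float("nan")
```

Under this sketch, a window in which π is lower in landraces than in wild soybeans and FST between the two groups exceeds the genome-wide threshold would be the kind of pattern interpreted as a selection signal.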
By investigating the distribution of the variants among 1539 soybean accessions (including landraces and improved cultivars) collected from China, we found that accessions with GmSW17H1 were predominantly distributed in the southern regions (ecoregion III) of China, with fewer in the northern regions (ecoregion I) (Supplementary Fig. 13b). In contrast, accessions with GmSW17H2 were predominantly distributed in the northern regions of China, and their frequency decreased continuously from the northern regions to the Huanghuaihai region (ecoregion II) and then to ecoregion III (Supplementary Fig. 13b). These results suggest that the two haplotypes of GmSW17 have been utilized in soybean breeding in different regions but that neither has been fully fixed.
Discussion
Soybean is an important legume crop that is a great source of both protein and oil. To increase soybean production, it is essential to dissect and understand the mechanisms involved in the regulation of seed-related traits. Several quantitative trait loci (QTLs) associated with seed weight have been identified in soybean64, but the genes underlying these QTLs and their functions remain largely unknown. In this study, through GWAS, we identified and characterized GmSW17, which positively regulates seed-related traits. Moreover, we found that GmSW17CR1, GmSW17CR2 and GmSW17CR3 have smaller leaves than DN50, while NIL-GmSW17H2 produced bigger leaves than NIL-GmSW17H1 (Supplementary Fig. 14a–d). This suggests that GmSW17 not only affects seed-related traits but also influences leaf size. Reduced seed size and leaf size were also observed in Arabidopsis otu1 mutants38. Interestingly, GmSW17 is located in a hotspot region that has been associated with seed size by several independent studies, among which GmKIX8-1 was determined to be the causative gene responsible for qSW17-1 (refs. 41,63,65,66,67,68,69,70,71,72) (Supplementary Fig. 15). However, GmSW17 and GmKIX8 are located in two independent, non-overlapping regions (Supplementary Fig. 15). Therefore, GmSW17 and GmKIX8 may function independently from a quantitative trait genetic perspective. The functional “crosstalk”, dominance, and combination effects of these two genes in determining seed size/weight are unclear and need future investigation. Furthermore, besides GmSW17 and GmKIX8, other causative genes controlling seed size/weight may exist in this “hotspot” region, such as within the QTL for Seed weight 30-769 (Supplementary Fig. 15).
In this study, we showed that GmSW17, which is a component of DUBm, influenced seed size by regulating H2Bub levels at the GmDP-E2F-1 locus. Therefore, we proposed a possible working model for the network: GmSW17 forms a trimeric DUB module with GmSGF11 and GmENY2 to influence H2Bub levels at GmDP-E2F-1 and subsequently regulate the cell cycle during soybean seed development. We also revealed that GmSW17H1 had weaker H2Bub deubiquitination activity than GmSW17H2. The weaker deubiquitination activity of GmSW17H1 results in higher H2Bub levels and thus in higher expression of GmDP-E2F-1, ultimately leading to a reduction in seed size (Fig. 6d).
UBPs are among the major components of DUBm and have been found to have a relatively conserved functional mechanism in Arabidopsis and yeast. However, studies have also revealed that different UBP members play extensive and diverse roles in controlling different traits. Except for Arabidopsis UBP14 (ref. 36) and rice OsUBP15 (ref. 37), the roles of UBPs in seed development are seldom reported. Here, we identified GmSW17, a homolog of Arabidopsis UBP22, as a key regulator of seed size in soybean. This discovery revealed an important member of this gene family and sheds more light on the role of UBPs in seed development. Our results suggested that GmSW17 can form a DUB module together with GmSGF11 and GmENY2 (Fig. 3, Supplementary Fig. 6), which is consistent with previous studies54,55,57,58,59, indicating the conservation of DUBm. Our data also revealed that GmSW17 functions by regulating cell cycle arrest at the G1 phase (Fig. 5e, f), which aligns well with the outcomes of earlier investigations59,61,73. In humans, it was found that Myc could recruit USP22 to the promoters of target genes, such as those of CAD and MTA1 (ref. 59). It was also found that USP22 plays a direct role in stabilizing the crucial G1-cyclin CCND1 in humans61. Here, we found that GmSW17 regulated the cell cycle through GmDP-E2F-1. However, the mechanism by which GmSW17 is recruited to the downstream GmDP-E2F-1 locus has not been determined, and uncovering this process, while challenging, would be interesting.
Previous reports revealed that USP22 can remove the ubiquitin moiety from monoubiquitinated histones H2A and H2B, thereby altering chromatin structure and gene transcription61,74,75,76. As polyubiquitination of H2B has been found in yeast77 and the molecular mechanism of H2B ubiquitination is conserved from yeast to plants78, it is reasonable to speculate that both monoubiquitination and polyubiquitination of H2B may occur in plants. We observed that GmSW17 deubiquitinates H2Bub and influences GmDP-E2F-1 expression in soybean (Fig. 5). In addition, we found that the downregulated DEGs were associated with reduced H2Bub levels, while the upregulated DEGs exhibited increased H2Bub levels, indicating that the H2Bub level was correlated with transcriptional changes (Supplementary Fig. 11a, b). More efforts are required to identify the substrates of GmSW17 beyond histones H2A and H2B, aiming to provide a more detailed understanding of the function of GmSW17 in soybean.
In summary, we identified GmSW17 as a positive regulator influencing seed size/weight, and it could form a DUBm complex with GmSGF11 and GmENY2. The natural variation in GmSW17 leads to variations in the deubiquitination activity of H2Bub at the GmDP-E2F-1 locus, which contributes to the seed size/weight of soybean. Additionally, the overexpression of these genes has potential application for improving soybean yield. Taken together, our findings provide significant insight into the genetic basis for determining seed size in soybean and will be helpful for improving soybean yield through molecular breeding.
Methods
Plant materials and phenotyping
The 1853 soybean accessions used for the GWAS were planted at the experimental station of the Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, Beijing (40° 22′ N and 116° 23′ E), during the summer season in 2016 and 2017. In this study, the soybean (Glycine max (L.) Merr.) cultivar Dong Nong50 (DN50) served as the control and as the recipient for generating transgenic lines. CRISPR and overexpression transgenic lines were planted in three-row plots in a randomized complete block design with three replications for each environment. Each plot was 5 meters long, with a row spacing of 0.5 meters and a plant spacing of 0.14 meters. The space between plots was 0.5 meters.
QTL mapping
F1 plants derived from a cross between DN50 and Williams 82 (W82) were continuously self-pollinated to the F6 generation through single seed descent, ultimately producing 243 recombinant inbred lines (RILs). QTLs were detected using QTL IciMapping 4.2 software. The LOD threshold was determined by performing 1000 permutation tests (P < 0.05). A residual heterozygous line (RHL) within the RILs, exhibiting heterozygosity at the GmSW17 locus, was continually self-pollinated to generate F6 lines. This process was employed to construct a pair of near-isogenic lines (NILs), namely NIL-GmSW17H1 and NIL-GmSW17H2, for GmSW17.
All of the above seeds were stored for one month postharvest and subsequently utilized for seed trait determination. Seed width, seed length and seed thickness were measured for at least 30 representative dry seeds per accession. For 100-seed weight and seed weight per plant, a minimum of 20 replicates were examined.
GWAS for the seed width trait
A total of 31,870,983 single-nucleotide polymorphisms (SNPs) derived from the resequencing of 1853 soybean accessions were utilized in conducting GWAS for the seed width trait49. The population structure was assessed using the Bayesian clustering tool fastStructure79. Only SNPs with an MAF > 0.05 and a missing rate < 0.1 in the population were used in the GWAS. Association analysis was performed with a mixed linear model (MLM) implemented in the efficient mixed-model association expedited (EMMAX) software package80. The matrix of pairwise genetic distances, which was derived from simple matching coefficients and used as the variance-covariance matrix of the random effects, was also calculated by EMMAX. The threshold for GWAS was determined by permutation based on a previous report81. Briefly, we first randomly shuffled the observed phenotypes to disrupt the associations between these phenotypes and their corresponding genotypes. Then, we applied GWAS to the permuted phenotypes using the same model employed for the observed phenotypes, and the most significant P value across the entire genome was recorded. This random process was repeated 1000 times. The distribution of the most significant P values across the 1000 replicates was used to determine the threshold, defined as the P value corresponding to a 5% chance of a type I error.
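The permutation procedure described above can be sketched as follows. This is a minimal illustration only: it replaces the EMMAX mixed model with a simple per-SNP correlation test and approximates P values with the Fisher z transform, both of which are assumptions made to keep the example self-contained. All function names are hypothetical.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation between one SNP's genotype codes and the phenotype."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        return 0.0  # monomorphic SNP or constant phenotype: no signal
    return sxy / math.sqrt(sxx * syy)


def approx_p_value(r, n):
    """Two-sided P value from the Fisher z approximation (stand-in for the MLM test)."""
    r = max(min(r, 0.999999), -0.999999)  # keep atanh finite
    z = math.atanh(r) * math.sqrt(n - 3)
    return math.erfc(abs(z) / math.sqrt(2.0))


def min_p(genotypes, phenotype):
    """Most significant P value across all SNPs for one phenotype vector."""
    n = len(phenotype)
    return min(approx_p_value(pearson_r(snp, phenotype), n) for snp in genotypes)


def permutation_threshold(genotypes, phenotype, n_perm=1000, alpha=0.05, seed=0):
    """Genome-wide threshold: the alpha quantile of the permuted minimum P values."""
    rng = random.Random(seed)
    shuffled = list(phenotype)
    minima = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break the genotype-phenotype link
        minima.append(min_p(genotypes, shuffled))
    minima.sort()
    index = max(int(alpha * n_perm) - 1, 0)
    return minima[index]
```

Here `genotypes` is a list of SNPs, each a list of numeric genotype codes per accession, and `phenotype` is the matching list of trait values. An observed P value below the returned threshold would be called significant at a genome-wide type I error rate of roughly `alpha`.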
Vector construction and transformation
For the CRISPR/Cas9 system experiments, two sgRNAs were designed according to a previously described method82. The U6 promoter was used to drive each single guide RNA cassette, and the cassettes were cloned and inserted into the pMDC123 vector. These constructs were subsequently introduced into Agrobacterium tumefaciens strain EHA105, which was then used to transform DN50. The relevant primers used are listed in Supplementary Data 1.
To construct the GmSW17 (SoyZH13_17G105400/Glyma.17G109100) overexpression vector, the coding DNA sequences (CDSs) of GmSW17H1 and GmSW17H2 were amplified from DN50 and Williams 82 (W82), respectively. A 2 kb DNA sequence of a soybean seed-specific promoter from the β-conglycinin α subunit encoding gene Glyma.20G148300 (ref. 53) was amplified from W82 genomic DNA. The ligated promoter-cDNA fragment was cloned and inserted into the pTF101 vector. To construct the GmSGF11 (SoyZH13_11G189300/Glyma.11G168600) and GmENY2 (SoyZH13_16G052500/Glyma.16G056700) overexpression plasmids, the CDSs of GmSGF11 and GmENY2 were both amplified from W82 and ligated into the pFGC5941 vector. These constructs were subsequently introduced into Agrobacterium tumefaciens strain EHA101 or EHA105 and then transformed into DN50. All primers used to construct vectors are listed in Supplementary Data 1.
Preparation of protoplasts
The preparation of Arabidopsis protoplasts was performed according to standard procedures83. Leaves from 3–4-week-old Arabidopsis plants were cut into 0.5–1 mm strips using a sharp blade, followed by enzyme digestion in the dark for 3–4 h at room temperature. After digestion, the protoplasts were isolated using a 40 µm filter. Finally, the protoplasts were transformed with plasmid DNA and incubated in the dark for 12–16 h.
RNA extraction, PCR, and qRT-PCR
Total RNA was extracted from the seeds at the R5 stage using the HUAYUEYANG Quick RNA Isolation Kit v1.0 (HUAYUEYANG, Beijing) following the manufacturer’s protocol, and three biological replicates were performed for each experiment. Reverse transcription was carried out using a cDNA synthesis kit (TransGen, AE311). The cDNA sequence was subsequently used as the template for quantitative real-time PCR (qRT-PCR). qRT-PCR was performed using LightCycler 480 SYBR Green I Master Mix (TransGen, AQ101-01) on a LightCycler 480 instrument (Roche). Gene expression was normalized to the expression of the soybean gene GmACTIN11 (Glyma.18G290800). The relevant primers used are listed in Supplementary Data 1.
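Normalizing a target gene to a reference gene such as GmACTIN11 is commonly done with the comparative Ct approach. The sketch below shows that calculation; the text does not state which formula was used, so the 2^-ΔCt and 2^-ΔΔCt forms (which assume roughly 100% amplification efficiency for both primer pairs) are assumptions, and the function names are hypothetical.

```python
def relative_expression(ct_target, ct_reference):
    """Expression of the target relative to the reference gene (2^-dCt)."""
    return 2.0 ** -(ct_target - ct_reference)


def fold_change(ct_target_sample, ct_ref_sample, ct_target_control, ct_ref_control):
    """Fold change of a sample versus a control line (2^-ddCt)."""
    d_ct_sample = ct_target_sample - ct_ref_sample
    d_ct_control = ct_target_control - ct_ref_control
    return 2.0 ** -(d_ct_sample - d_ct_control)


def mean_relative_expression(ct_pairs):
    """Average 2^-dCt over biological replicates given (target, reference) Ct pairs."""
    values = [relative_expression(t, r) for t, r in ct_pairs]
    return sum(values) / len(values)
```

For example, a target that crosses threshold two cycles later than the reference has a relative expression of 0.25 under these assumptions.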
Subcellular localization
The CDSs of GmSW17H1 and GmSW17H2 were individually cloned and inserted into the 35S::eYFP vector. The subcellular localization of GmSW17 was determined by transient expression of GmSW17H1-eYFP and GmSW17H2-eYFP in N. benthamiana leaves. To minimize chlorophyll autofluorescence, we carefully selected appropriate fluorescence filters for excitation and emission wavelengths. The excitation and emission wavelengths for GFP are 470–490 nm and 510–530 nm, respectively, while for RFPs (e.g., mCherry), they are 540–560 nm and 580–610 nm. The relevant primers used are listed in Supplementary Data 1.
Phylogenetic tree analysis
Duplicates and orthologs with high similarity to GmSW17 from soybean, as well as from other representative species (rice, maize, Medicago and Arabidopsis), were obtained from Phytozome 13 (https://phytozome-next.jgi.doe.gov/blast-search) and utilized to construct a maximum likelihood tree in MEGA11 software.
Yeast two-hybrid assays
Yeast two-hybrid assays were conducted following the procedures outlined in the Yeast Protocols Handbook from Clontech. The CDSs of GmSW17H1, GmSW17H2, and GmENY2 were individually cloned and inserted into the bait vector pGBKT7. Similarly, the CDSs of GmSGF11 and GmENY2 were cloned and inserted into the prey vector pGADT7. Subsequently, equal amounts of paired plasmids were cotransformed into the yeast strain Y2HGold (Clontech), which was then subjected to selection on DDO (double dropout -Leu-Trp) and QDO (quadruple dropout -Ade-His-Leu-Trp) media. Empty AD or empty BD vectors were used as negative controls. The sequences of primers used are provided in Supplementary Data 1.
Bimolecular fluorescence complementation (BiFC)
For the construction of BiFC vectors, we employed the Gateway-compatible vectors pUGW2-nYFP and pUGW2-cYFP to generate BiFC assay vectors using Gateway cloning technology. pUGW2-nYFP serves as the vector for fusing the N-terminal region of yellow fluorescent protein (nYFP), while pUGW2-cYFP is the vector for fusing the C-terminal region of YFP (cYFP). The full-length CDSs of GmSW17H1 and GmSW17H2 were cloned and inserted into pUGW2-nYFP, and the full-length CDS of GmSGF11 was cloned and inserted into pUGW2-cYFP. Arabidopsis protoplasts were prepared for the expression assays. Various plasmid combinations were cotransformed into Arabidopsis protoplasts and incubated at 22 °C in the dark for 12–16 h. YFP fluorescence was visualized using a confocal laser scanning microscope (Zeiss LSM 985 NLO). The sequences of primers used are provided in Supplementary Data 1.
Co-immunoprecipitation (Co-IP) assays
To assess the interactions among GmSW17, GmSGF11 and GmENY2, a Co-IP assay was conducted in Arabidopsis protoplasts. The full-length CDSs of GmSW17H1, GmSW17H2, GmSGF11 and GmENY2 were cloned and inserted into the pUC19-35S-HA vector and the pUC19-35S-Flag vector. For example, to examine the interactions between different haplotypes of GmSW17 and GmSGF11, the GmSW17H1-Flag or GmSW17H2-Flag construct was cotransformed with GmSGF11-HA into Arabidopsis protoplasts and incubated in the dark overnight. Total proteins were extracted from Arabidopsis protoplasts after 12–16 h of incubation in extraction buffer (50 mM Tris-HCl (pH 7.5), 0.5 mM EDTA, 150 mM NaCl, 0.5% NP-40, 1 mM PMSF, and 1× complete protease inhibitor cocktail from Roche 04693132001). The protein lysates were incubated with Flag magnetic beads (MBL, M185-11R) for 2–3 h, after which the beads were washed four times with a buffer containing 50 mM Tris-HCl pH 7.5, 150 mM NaCl, 20% glycerol, 0.1% Triton X-100, 1 mM EDTA pH 8.0 and 1× complete protease inhibitor cocktail. The immunoprecipitates were separated via SDS-PAGE and subsequently transferred to a nitrocellulose membrane (GE Healthcare), after which the proteins were detected using anti-HA (MBL, M180-7, 1:5,000 dilution) or anti-DDDDK-tag mAb-HRP-DirectT (MBL, M185-7, 1:10,000 dilution) antibodies. The sequences of primers used are provided in Supplementary Data 1.
Flow cytometry analysis
Soybean leaves were chopped with a razor blade in 500 µl of GS buffer (20 mM MOPS, 30 mM sodium citrate, 45 mM MgCl2, 0.1% (v/v) Triton X-100) at 4 °C. Subsequently, nuclei were filtered through a 40 µm mesh sieve. The isolated nuclei were further stained with a 10 µM DAPI staining solution for 5 min on ice and then analyzed using a flow cytometer (BD FACSAria II; BD Biosciences). The cell cycle stage (G1, S, G2/M) of each cell was determined from the distribution of DNA content. Cells in the G1 phase have 2C DNA content, cells in the G2/M phase have 4C DNA content, and cells in the S phase have DNA content between 2C and 4C.
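To make the assignment of cell cycle stages from DNA content concrete, the sketch below gates nuclei by their fluorescence relative to the 2C peak, following the 2C/4C nomenclature used in the Results (G1 nuclei at 2C, G2/M nuclei at 4C, S-phase nuclei in between). The fixed tolerance windows are an assumption chosen for illustration; dedicated flow cytometry software usually fits peak models instead, and the function names are hypothetical.

```python
def classify_nucleus(signal, peak_2c, tolerance=0.15):
    """Assign one nucleus to G1, S, G2/M or 'other' from its DAPI signal.

    signal:    fluorescence intensity of the nucleus.
    peak_2c:   intensity at the centre of the 2C peak.
    tolerance: half-width of the 2C window as a fraction of the 2C peak
               (the 4C window is twice as wide in absolute terms).
    """
    ratio = signal / peak_2c
    if abs(ratio - 1.0) <= tolerance:
        return "G1"
    if abs(ratio - 2.0) <= 2 * tolerance:
        return "G2/M"
    if 1.0 + tolerance < ratio < 2.0 - 2 * tolerance:
        return "S"
    return "other"  # debris, or nuclei above 4C such as endoreduplicated cells


def cell_cycle_fractions(signals, peak_2c, tolerance=0.15):
    """Fraction of G1, S and G2/M nuclei among all classified nuclei."""
    counts = {"G1": 0, "S": 0, "G2/M": 0}
    for signal in signals:
        phase = classify_nucleus(signal, peak_2c, tolerance)
        if phase in counts:
            counts[phase] += 1
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no nuclei fell into the G1, S or G2/M windows")
    return {phase: count / total for phase, count in counts.items()}
```

Comparing the returned fractions between two lines gives the kind of summary reported as more 2C (G1) and fewer 4C cells.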
Cryo-scanning electron microscopy (cryo-SEM)
The cotyledons of seeds at the R6 stage (green seed filling the pod cavity) were sampled and observed using a Crossbeam 340 & VCT500 field emission scanning electron microscope (SEM, Carl Zeiss, Germany). The cryo-SEM workflow can be summarized as follows: First, the sample was mounted and placed in the correct orientation on a stub. Second, the sample was rapidly frozen. Third, the sample was transferred under vacuum to the SEM chamber. Fourth, a cryogenic environment was maintained under vacuum to prevent water loss and the formation of exogenous ice. The cotyledon areas of the NIL seeds and transgenic seeds on the ventral surface were estimated using Tomato Analyzer. The cells in the field of view were counted using ImageJ. The cell area was calculated as the ratio of the field area to the number of cells within the field. The number of cells in the flat surface of the cotyledon was estimated as the ratio of the area of the cotyledon to the cell size.
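The two ratios described above amount to the following arithmetic. The function names are hypothetical, and the sketch assumes that the field area, cotyledon area and cell area are all expressed in the same unit (for example µm²).

```python
def mean_cell_area(field_area, cells_in_field):
    """Average cell area: imaged field area divided by the cells counted in it."""
    if cells_in_field <= 0:
        raise ValueError("cells_in_field must be positive")
    return field_area / cells_in_field


def estimated_cell_number(cotyledon_area, cell_area):
    """Approximate number of cells across the flat cotyledon surface."""
    if cell_area <= 0:
        raise ValueError("cell_area must be positive")
    return cotyledon_area / cell_area
```

With toy values, a 10,000 µm² field containing 25 cells gives a mean cell area of 400 µm², and a cotyledon surface of 4 × 10^7 µm² would then hold about 100,000 cells.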
In vitro histone deubiquitination assay
Purified His-GmSW17, His-GmSGF11, and His-GmENY2 proteins (4 µg each) were incubated with 20 µg of total bovine histones (Roche), which are unfractionated and contain H2A, H2B, H3, H4, and their variants, in a buffer containing 50 mM Tris-HCl (pH 8.0) and 50 mM NaCl for 3–4 h at room temperature (25 °C), following a previously described protocol84 with minor modifications. For western blot analysis, the samples were separated by 12% SDS-PAGE, blocked with 5% milk, and probed with an anti-H2Bub antibody (Cell Signaling Technology, 5546 s, 1:5000 dilution) and an anti-H3 antibody (Abcam, Ab1791, 1:5000 dilution).
Ubiquitin-AMC hydrolysis assay
The Ubiquitin-AMC hydrolysis assay was performed following a previously described protocol with minor modifications85. Specifically, we used the DUB-Activity Assay Kit (701490), which uses a fluorescent universal ubiquitin substrate (Ubiquitin-AMC), according to the manufacturer’s instructions. Ub-AMC cleavage was monitored by measuring fluorescence in real time for at least 30 min at 365 nm excitation and 460 nm emission. Ub-AMC was used at a concentration of 500 nM. The GmSW17H1 and GmSW17H2 mutant complexes were used at a concentration of 20 nM.
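One common way to compare enzyme activities from a real-time Ub-AMC trace is to estimate the initial rate as the slope of fluorescence against time over the early, approximately linear part of the reaction. The sketch below does this with an ordinary least-squares fit; the text does not describe how the traces were analysed, so the method, the 300 s window and the function name are all assumptions.

```python
def initial_rate(times_s, fluorescence, window_s=300.0):
    """Least-squares slope of fluorescence versus time within the first window_s seconds."""
    points = [(t, f) for t, f in zip(times_s, fluorescence) if t <= window_s]
    if len(points) < 2:
        raise ValueError("need at least two readings inside the window")
    mean_t = sum(t for t, _ in points) / len(points)
    mean_f = sum(f for _, f in points) / len(points)
    sxx = sum((t - mean_t) ** 2 for t, _ in points)
    if sxx == 0.0:
        raise ValueError("time points inside the window are identical")
    sxy = sum((t - mean_t) * (f - mean_f) for t, f in points)
    return sxy / sxx  # fluorescence units per second
```

The ratio of the initial rates of two enzyme preparations, measured at the same enzyme and substrate concentrations, then gives a rough relative activity.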
In vivo histone deubiquitination assay (western blot assays)
The seeds at the R5 stage were sampled, and total histone proteins were extracted by using an EpiQuik Total Histone Extraction Kit (OP-0006-100). Subsequently, the total histone proteins were subjected to western blot analysis to assess the H2Bub levels in vivo. An anti-H3 immunoblot served as a loading control. The antibodies used were anti-H3 (Abcam, ab1791, 1:5000 dilution) and anti-H2Bub (Cell Signaling Technology, 5546 s, 1:5000 dilution). Immunoblotting was performed using the enhanced chemiluminescence (ECL) system.
RNA-seq sample preparation and data processing
Total RNA was extracted from the seeds of DN50 and GmSW17CR2 at the R5 stage for RNA-seq analysis, with three biological replicates performed for each sample. Paired-end libraries were constructed and sequenced using an Illumina NovaSeq 6000 instrument at BerryGenomics Company (China). Adapter sequences and low-quality reads from the RNA-seq library were removed using fastp (0.20.1)86. The cleaned reads were subsequently mapped to W82 Refseq v2 using hisat2 (2.1.0)87, and gene expression was quantified with featureCounts (2.0.1)88. Differentially expressed genes were identified using the DESeq2 package (1.34.0)89 in R with an adjusted p-value < 0.05 and |log2 fold change| > 1. Transcripts per million (TPM) values derived from the counts matrix were used to characterize gene expression and for hierarchical clustering analysis.
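The two numerical steps at the end of this workflow, converting counts to TPM and applying the differential expression cutoffs, can be sketched as below. The function names are hypothetical, and treating the fold change cutoff as an absolute value (so that both up- and downregulated genes are kept) is an assumption.

```python
def counts_to_tpm(counts, lengths_bp):
    """Convert raw read counts to TPM given gene lengths in base pairs."""
    rates = [count / (length / 1000.0) for count, length in zip(counts, lengths_bp)]
    total = sum(rates)
    if total == 0:
        raise ValueError("all counts are zero")
    return [rate / total * 1e6 for rate in rates]


def is_deg(log2_fold_change, padj, lfc_cutoff=1.0, padj_cutoff=0.05):
    """Apply the adjusted p-value and fold change cutoffs to one gene."""
    if padj is None:  # a missing adjusted p-value is never called significant
        return False
    return padj < padj_cutoff and abs(log2_fold_change) > lfc_cutoff
```

Because TPM first divides by gene length and then rescales, the values within one sample always sum to one million, which makes samples comparable for clustering.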
For functional enrichment, GO annotation files were generated from W82 Refseq v2 and an R package clusterProfiler (4.2.2)90 was utilized for enrichment analysis.
ChIP assay
The ChIP assay was performed according to standard procedures91. In brief, 1.5 g seeds of DN50 and GmSW17CR2 at the R5 stage were ground into fine powder in liquid nitrogen and then suspended in 20 ml of cold lysis buffer (0.4 M sucrose, 10 mM Tris-HCl (pH 8.0), 10 mM MgCl2, 5 mM dithiothreitol (DTT), 0.1 mM phenylmethylsulfonyl fluoride (PMSF), and 1× protease inhibitor cocktail). Formaldehyde (Sigma, F8775) was added to achieve a final concentration of 1% for crosslinking. After 15 min, glycine (Sigma, 50046) was added at 0.125 M to quench crosslinking. The nuclei were isolated by filtering the sample through two layers of Miracloth (Millipore, 475855) and centrifuging. The nuclei were washed 3–4 times with 10 ml of nuclear resuspension buffer I (0.25 M sucrose, 10 mM Tris-HCl (pH 8.0), 10 mM MgCl2, 1% Triton X-100, 5 mM DTT, 0.1 mM PMSF, and 1× protease inhibitor cocktail) and resuspended in 700 µl of nuclear resuspension buffer II (1.7 M sucrose, 10 mM Tris-HCl (pH 8.0), 2 mM MgCl2, 0.15% Triton X-100, 5 mM DTT, 0.1 mM PMSF, and 1× protease inhibitor cocktail). The suspension was then layered onto a further 700 µl of nuclear resuspension buffer II and centrifuged at 13,000 g for 1 h. The nuclei were resuspended in 200 µl of nuclear lysis buffer (50 mM Tris-HCl (pH 8.0), 10 mM EDTA pH 8.0, 1% SDS) and placed on ice for 30 min. Subsequently, 1800 μL of ChIP dilution buffer without Triton (167 mM NaCl, 16.7 mM Tris-HCl (pH 8.0), 1.2 mM EDTA, 0.01% SDS) was added. Chromatin was sonicated for 30 min with a 10 min ON/4 min OFF cycle to an average size of 200–500 bp using a Covaris device (M220). Then, 61 μL of 20% Triton X-100 was added to a final concentration of 1.1%.
Protein A/G magnetic beads were prewashed three times with ChIP dilution buffer containing Triton X-100 (167 mM NaCl, 16.7 mM Tris-HCl (pH 8.0), 1.2 mM EDTA, 0.01% SDS, and 1.1% Triton X-100). To preclear the chromatin, 50 μL of the prewashed beads was added to each tube, and the mixture was incubated for 2 h at 4 °C with rotation. The cleared solution was transferred to a new 2 ml tube, from which 2% was removed as an input sample and stored at −80 °C. Anti-H2Bub (CST, 5546 s, 1:150 dilution) was added to bind the protein-DNA complexes, and the mixture was incubated overnight at 4 °C with rotation. To capture the immunoprecipitated DNA-protein complexes, 50 μL of prewashed protein A/G magnetic beads was added to each tube, and the mixture was incubated for 4 h at 4 °C with rotation. The beads were washed sequentially with low-salt wash buffer (20 mM Tris-HCl (pH 8.0), 150 mM NaCl, 0.1% SDS, 1% Triton X-100, 2 mM EDTA), high-salt wash buffer (20 mM Tris-HCl (pH 8.0), 500 mM NaCl, 0.1% SDS, 1% Triton X-100, 2 mM EDTA), LiCl wash buffer (10 mM Tris-HCl (pH 8.0), 0.25 M LiCl, 1% NP-40, 1% sodium deoxycholate, 1 mM EDTA), and TE buffer (10 mM Tris-HCl (pH 8.0), 1 mM EDTA pH 8.0). Elution was performed in elution buffer (1% SDS, 0.1 M NaHCO3). After reverse crosslinking (65 °C overnight in 200 mM NaCl), the eluted DNA was digested with proteinase K (Invitrogen, 25530049) and purified for sequencing or qPCR. GmTubulin (SoyZH13_08G013700/Glyma.08G014200) was used as a negative control locus in the ChIP-qPCR assays. The sequences of primers used are provided in Supplementary Data 1. For ChIP-seq, chromatin-immunoprecipitated DNA was used to construct sequencing libraries following the protocol of the DNA SMART™ ChIP-Seq Kit User Manual (Takara Bio USA, Cat. No. 634866), and sequenced by Annoroad Gene Technology (China) with HiSeq-PE150 (Illumina).
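ChIP-qPCR signals are often expressed as a percentage of the input. Given the 2% input sample mentioned above, that calculation can be sketched as follows; the text does not state which quantification method was used, so the percent-input formula, the assumption of 100% amplification efficiency and the function name are all illustrative.

```python
import math


def percent_input(ct_input, ct_ip, input_fraction=0.02):
    """ChIP enrichment as a percentage of the starting chromatin.

    ct_input:       Ct of the input sample at a given locus.
    ct_ip:          Ct of the immunoprecipitated sample at the same locus.
    input_fraction: fraction of chromatin saved as input (2% here).
    """
    # Shift the input Ct to what 100% of the chromatin would have given.
    adjusted_input_ct = ct_input - math.log2(1.0 / input_fraction)
    return 100.0 * 2.0 ** (adjusted_input_ct - ct_ip)
```

An immunoprecipitate that reaches threshold at the same cycle as the 2% input therefore corresponds to 2% of input. Comparing this value at a target locus with the value at a negative control locus indicates enrichment.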
ChIP-seq analysis
The adapter sequences and low-quality reads from the ChIP-seq library were removed by fastp (0.20.1)86, and the cleaned reads were mapped to W82 Refseq v2 using the bwa mem algorithm (0.7.17)92. The mapped reads were further filtered with “samtools view -bS -F 1804 -f 2 -q 30” to remove low-quality alignments. The high-quality mapped reads were subsequently deduplicated using Picard (2.20.5-0). The deduplicated bam files from two biological replicates were merged with samtools (1.5)93, and the merged bam file was converted to bigwig files using bamCoverage provided by deeptools (3.3.0) with the parameters “-bs 10 --effectiveGenomeSize 978000000 --normalizeUsing RPKM --smoothLength 50”. The bigwig files were visualized using deeptools (3.3.0)94 and IGV (2.8.0.01)95.
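The samtools filter quoted above combines an exclusion mask (-F, read here as the integer 1804), a required mask (-f 2) and a mapping-quality cutoff (-q 30). The sketch below decodes those bit masks using the flag definitions from the SAM specification and mirrors the keep/discard decision for a single alignment; the function names are hypothetical.

```python
# Bit values and meanings follow the SAM format specification.
SAM_FLAGS = {
    0x1: "read paired",
    0x2: "read mapped in proper pair",
    0x4: "read unmapped",
    0x8: "mate unmapped",
    0x10: "read reverse strand",
    0x20: "mate reverse strand",
    0x40: "first in pair",
    0x80: "second in pair",
    0x100: "not primary alignment",
    0x200: "read fails platform/vendor quality checks",
    0x400: "read is PCR or optical duplicate",
    0x800: "supplementary alignment",
}


def decode_flag(value):
    """List the meanings of all bits set in a SAM flag value."""
    return [name for bit, name in sorted(SAM_FLAGS.items()) if value & bit]


def keep_read(flag, mapq, exclude=1804, require=2, min_mapq=30):
    """Mirror 'samtools view -F exclude -f require -q min_mapq' for one alignment."""
    no_excluded_bits = (flag & exclude) == 0      # -F: drop if ANY of these bits is set
    all_required_bits = (flag & require) == require  # -f: keep only if ALL are set
    return no_excluded_bits and all_required_bits and mapq >= min_mapq
```

Decoding 1804 shows that the filter drops unmapped reads, reads with an unmapped mate, secondary alignments, reads failing quality checks and duplicates, while -f 2 keeps only properly paired reads.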
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
Data availability
The raw sequence data of ChIP-seq and RNA-seq were deposited in the Genome Sequence Archive (https://bigd.big.ac.cn/gsa) under GSA accession number CRA015133; the expression data of genes within the GWAS interval are available under GSA accession number CRA017547. The sequence of GmSW17 was deposited in GenBase [https://ngdc.cncb.ac.cn/genbase/] under accession number C_AA082825.1. Source data are provided with this paper.
Code availability
Source code for the analysis is available at GitHub [https://github.com/Liangshan94/Soybean-histone-analysis].
Change history
20 September 2024
A Correction to this paper has been published: https://doi.org/10.1038/s41467-024-52624-8
References
Li, N. & Li, Y. Signaling pathways of seed size control in plants. Curr. Opin. Plant Biol. 33, 23–32 (2016).
Moles, A. T. et al. Factors that shape seed mass evolution. Proc. Natl. Acad. Sci. USA 102, 10540–10544 (2005).
Dante, R. A., Larkins, B. A. & Sabelli, P. A. Cell cycle control and seed development. Front. Plant Sci. 5, 493 (2014).
Li, Y., Zheng, L., Corke, F., Smith, C. & Bevan, M. W. Control of final seed and organ size by the DA1 gene family in Arabidopsis thaliana. Genes Dev. 22, 1331–1336 (2008).
Inze, D. & De Veylder, L. Cell cycle regulation in plant development. Annu. Rev. Genet. 40, 77–105 (2006).
Nakagami, H., Kawamura, K., Sugisaka, K., Sekine, M. & Shinmyo, A. Phosphorylation of retinoblastoma-related protein by the cyclin D/cyclin-dependent kinase complex is activated at the G1/S-phase transition in tobacco. Plant Cell 14, 1847–1857 (2002).
De Veylder, L., Joubes, J. & Inze, D. Plant cell cycle transitions. Curr. Opin. Plant Biol. 6, 536–543 (2003).
Vandepoele, K. et al. Genome-wide analysis of core cell cycle genes in Arabidopsis. Plant Cell 14, 903–916 (2002).
Ren, D. et al. A new regulator of seed size control in Arabidopsis identified by a genome-wide association study. New Phytol. 222, 895–906 (2019).
Collins, C., Dewitte, W. & Murray, J. A. H. D-type cyclins control cell division and development rate during Arabidopsis seed development. J. Exp. Bot. 63, 695–709 (2012).
Vlieghe, K. et al. The DP-E2F-like gene DEL1 controls the endocycle in Arabidopsis thaliana. Curr. Biol. 15, 59–63 (2005).
Van Daele, I. et al. A comparative study of seed yield parameters in Arabidopsis thaliana mutants and transgenics. Plant Biotechnol. J. 10, 488–500 (2012).
Li, N., Xu, R. & Li, Y. Molecular networks of seed size control in plants. Annu. Rev. Plant Biol. 70, 435–463 (2019).
Ren, D., Ding, C. & Qian, Q. Molecular bases of rice grain size and quality for optimized productivity. Sci. Bull. (Beijing) 68, 314–350 (2023).
Li, N. & Li, Y. Ubiquitin-mediated control of seed size in plants. Front. Plant Sci. 5, 332 (2014).
Sadanandom, A., Bailey, M., Ewan, R., Lee, J. & Nelis, S. The ubiquitin-proteasome system: central modifier of plant signalling. New Phytol. 196, 13–28 (2012).
Dong, C.-H., Agarwal, M., Zhang, Y., Xie, Q. & Zhu, J.-K. The negative regulator of plant cold responses, HOS1, is a RING E3 ligase that mediates the ubiquitination and degradation of ICE1. Proc. Natl. Acad. Sci. USA 103, 8281–8286 (2006).
Cho, S. K., Ryu, M. Y., Seo, D. H., Kang, B. G. & Kim, W. T. The Arabidopsis RING E3 ubiquitin ligase AtAIRP2 plays combinatory roles with AtAIRP1 in abscisic acid-mediated drought stress responses. Plant Physiol. 157, 2240–2257 (2011).
Liu, H. & Stone, S. L. Abscisic acid increases Arabidopsis ABI5 transcription factor levels by promoting KEG E3 Ligase self-ubiquitination and proteasomal degradation. Plant Cell 22, 2630–2641 (2010).
Linden, K. J. & Callis, J. The ubiquitin system affects agronomic plant traits. J. Biol. Chem. 295, 13940–13955 (2020).
Disch, S. et al. The E3 ubiquitin ligase BIG BROTHER controls Arabidopsis organ size in a dosage-dependent manner. Curr. Biol. 16, 272–279 (2006).
Su, Z., Hao, C., Wang, L., Dong, Y. & Zhang, X. Identification and development of a functional marker of TaGW2 associated with grain weight in bread wheat (Triticum aestivum L.). Theor. Appl. Genet. 122, 211–223 (2011).
Xia, T. et al. The ubiquitin receptor DA1 interacts with the E3 ubiquitin ligase DA2 to regulate seed and organ size in Arabidopsis. Plant Cell 25, 3347–3359 (2013).
Li, Q. et al. Relationship, evolutionary fate and function of two maize co-orthologs of rice GW2 associated with kernel size and weight. BMC Plant Biol. 10, 143 (2010).
Song, X. J., Huang, W., Shi, M., Zhu, M. Z. & Lin, H. X. A QTL for rice grain width and weight encodes a previously unknown RING-type E3 ubiquitin ligase. Nat. Genet. 39, 623–630 (2007).
Huang, T. T. & D’Andrea, A. D. Regulation of DNA repair by ubiquitylation. Nat. Rev. Mol. Cell Biol. 7, 323–334 (2006).
Morgan, M. T. et al. Structural basis for histone H2B deubiquitination by the SAGA DUB module. Science 351, 725–728 (2016).
Henry, K. W. et al. Transcriptional activation via sequential histone H2B ubiquitylation and deubiquitylation, mediated by SAGA-associated Ubp8. Genes Dev. 17, 2648–2663 (2003).
March, E. & Farrona, S. Plant deubiquitinases and their role in the control of gene expression through modification of histones. Front. Plant Sci. 8, 2274 (2017).
Zheng, Y. et al. Rice OsUBR7 modulates plant height by regulating histone H2B monoubiquitination and cell proliferation. Plant Commun 3, 100412 (2022).
Derkacheva, M. et al. H2A deubiquitinases UBP12/13 are part of the Arabidopsis polycomb group protein system. Nat. Plants 2, 16126 (2016).
Schmitz, R. J., Tamada, Y., Doyle, M. R., Zhang, X. & Amasino, R. M. Histone H2B deubiquitination is required for transcriptional activation of FLOWERING LOCUS C and for proper control of flowering in Arabidopsis. Plant Physiol 149, 1196–1204 (2009).
Wu, R. et al. Protein partners of plant ubiquitin-specific proteases (UBPs). Plant Physiol. Biochem. 145, 227–236 (2019).
Luo, R., Yang, K. & Xiao, W. Plant deubiquitinases: from structure and activity to biological functions. Plant Cell Rep 42, 469–486 (2023).
Skelly, M. J. The emerging roles of deubiquitinases in plant proteostasis. Essays Biochem 66, 147–154 (2022).
Xu, Y. et al. UBIQUITIN-SPECIFIC PROTEASE14 interacts with ULTRAVIOLET-B INSENSITIVE4 to regulate endoreduplication and cell and organ growth in Arabidopsis. Plant Cell 28, 1200–1214 (2016).
Shi, C. et al. Ubiquitin specific protease 15 has an important role in regulating grain width and size in rice. Plant Physiol 180, 381–391 (2019).
Keren, I., Lacroix, B., Kohrman, A. & Citovsky, V. Histone deubiquitinase OTU1 epigenetically regulates DA1 and DA2, which control Arabidopsis seed and organ size. iScience 23, 100948 (2020).
Ray, D. K., Mueller, N. D., West, P. C. & Foley, J. A. Yield trends are insufficient to double global crop production by 2050. PLoS One 8, e66428 (2013).
Lu, X. et al. A PP2C-1 allele underlying a quantitative trait locus enhances soybean 100-seed weight. Mol. Plant 10, 670–684 (2017).
Nguyen, C. X., Paddock, K. J., Zhang, Z. & Stacey, M. G. GmKIX8-1 regulates organ size in soybean and is the causative gene for the major seed weight QTL qSw17-1. New Phytol 229, 920–934 (2021).
Wang, S. et al. Simultaneous changes in seed size, oil content and protein content driven by selection of SWEET homologues during soybean domestication. Natl. Sci. Rev. 7, 1776–1786 (2020).
Liang, Q. et al. Natural variation of Dt2 determines branching in soybean. Nat. Commun. 13, 6429 (2022).
Duan, Z. et al. Natural allelic variation of GmST05 controlling seed size and quality in soybean. Plant Biotechnol. J. 20, 1807–1818 (2022).
Wang, X. et al. Evolution and association analysis of GmCYP78A10 gene with seed size/weight and pod number in soybean. Mol. Biol. Rep. 42, 489–496 (2015).
Zhao, B. et al. Arabidopsis KLU homologue GmCYP78A72 regulates seed size in soybean. Plant Mol. Biol. 90, 33–47 (2016).
Tang, X. et al. Suppression of extracellular invertase inhibitor gene expression improves seed weight in soybean (Glycine max). J. Exp. Bot. 68, 469–482 (2017).
Ge, L. et al. Increasing seed size and quality by manipulating BIG SEEDS1 in legume species. Proc. Natl. Acad. Sci. USA 113, 12414–12419 (2016).
Liu, Y. C. et al. Pan-genome of wild and cultivated soybeans. Cell 182, 162–176 (2020).
Shen, Y. T. et al. De novo assembly of a Chinese soybean genome. Sci. China Life Sci. 61, 871–884 (2018).
Shen, Y. et al. Update soybean Zhonghuang 13 genome to a golden reference. Sci. China Life Sci. 62, 1257–1260 (2019).
Shen, Y. et al. Global dissection of alternative splicing in paleopolyploid soybean. Plant Cell 26, 996–1008 (2014).
Yoshino, M., Tsutsumi, N. A. & Kanazawa, K. A. The regulatory function of the upstream sequence of the β-conglycinin α subunit gene in seed-specific transcription is associated with the presence of the RY sequence. Genes Genet. Syst. 81, 135–141 (2006).
Nassrallah, A. et al. DET1-mediated degradation of a SAGA-like deubiquitination module controls H2Bub homeostasis. Elife 7, e37892 (2018).
Wu, C. J. et al. Three functionally redundant plant-specific paralogs are core subunits of the SAGA histone acetyltransferase complex in Arabidopsis. Mol. Plant 14, 1071–1087 (2021).
Grasser, K. D., Rubio, V. & Barneche, F. Multifaceted activities of the plant SAGA complex. Biochim. Biophys. Acta Gene Regul. Mech. 1864, 194613 (2021).
Samara, N. L. et al. Structural insights into the assembly and function of the SAGA deubiquitinating module. Science 328, 1025–1029 (2010).
Cornelio-Parra, D. V., Goswami, R., Costanzo, K., Morales-Sosa, P. & Mohan, R. D. Function and regulation of the Spt-Ada-Gcn5-Acetyltransferase (SAGA) deubiquitinase module. Biochim. Biophys. Acta Gene Regul. Mech. 1864, 194630 (2021).
Zhang, X.-Y. et al. The putative cancer stem cell marker USP22 is a subunit of the human SAGA complex required for activated transcription and cell-cycle progression. Mol. Cell 29, 102–111 (2008).
Köhler, A., Zimmerman, E., Schneider, M., Hurt, E. & Zheng, N. Structural basis for assembly and activation of the heterotetrameric SAGA histone H2B deubiquitinase module. Cell 141, 606–617 (2010).
Gennaro, V. J. et al. Control of CCND1 ubiquitylation by the catalytic SAGA subunit USP22 is essential for cell cycle progression through G1 in cancer cells. Proc. Natl. Acad. Sci. USA 115, E9298–E9307 (2018).
Sedivy, E. J., Wu, F. & Hanzawa, Y. Soybean domestication: the origin, genetic architecture and molecular bases. New Phytol 214, 539–553 (2017).
Zhou, Z. et al. Resequencing 302 wild and cultivated accessions identifies genes related to domestication and improvement in soybean. Nat. Biotechnol. 33, 408–414 (2015).
Duan, Z., Li, Q., Wang, H., He, X. & Zhang, M. Genetic regulatory networks of soybean seed size, oil and protein contents. Front. Plant Sci. 14, 1160418 (2023).
Kato, S. et al. A major and stable QTL associated with seed weight in soybean across multiple environments and genetic backgrounds. Theor. Appl. Genet. 127, 1365–1374 (2014).
Hoeck, J. A. et al. Molecular marker analysis of seed size in soybean. Crop Sci 43, 68–74 (2003).
Panthee, D. R., Pantalone, V. R., West, D. R., Saxton, A. M. & Sams, C. E. Quantitative trait loci for seed protein and oil concentration, and seed size in soybean. Crop Sci 45, 2015–2022 (2005).
Liu, Y. L. et al. Identification of quantitative trait loci underlying plant height and seed weight in soybean. Plant Genome 6, 1–11 (2013).
Kim, H.-K. et al. Analysis of quantitative trait loci (QTLs) for seed size and fatty acid composition using recombinant inbred lines in soybean. J. Life Sci. 20, 1186–1192 (2010).
Karikari, B. et al. Utilization of interspecific high-density genetic map of RIL population for the QTL detection and candidate gene mining for 100-Seed weight in soybean. Front. Plant Sci. 10, 1001 (2019).
Yan, L. et al. Identification of QTL with large effect on seed weight in a selective population of soybean with genome-wide association and fixation index analyses. BMC Genomics 18, 529 (2017).
Wang, J. et al. Identification of quantitative trait loci for oil content in soybean seed. Crop Sci 55, 23–34 (2015).
Liu, Y. L. et al. The deubiquitinating enzyme activity of USP22 is necessary for regulating HeLa cell growth. Gene 572, 49–56 (2015).
Zhao, Y. et al. A TFTC/STAGA module mediates histone H2A and H2B deubiquitination, coactivates nuclear receptors, and counteracts heterochromatin silencing. Mol. Cell 29, 92–101 (2008).
Reyes-Turcu, F. E., Ventii, K. H. & Wilkinson, K. D. Regulation and cellular roles of ubiquitin-specific deubiquitinating enzymes. Annu. Rev. Biochem. 78, 363–397 (2009).
Bonnet, J., Romier, C., Tora, L. & Devys, D. Zinc-finger UBPs: regulators of deubiquitylation. Trends Biochem. Sci. 33, 369–375 (2008).
Geng, F., Tansey, W. P. & Sommer, T. Polyubiquitylation of Histone H2B. Mol. Biol. Cell 19, 3616–3624 (2008).
Wang, L., Cao, C., Wang, F., Zhao, J. & Li, W. H2B ubiquitination: Conserved molecular mechanism, diverse physiologic functions of the E3 ligase during meiosis. Nucleus 8, 461–468 (2017).
Raj, A., Stephens, M. & Pritchard, J. K. fastSTRUCTURE: variational inference of population structure in large SNP data sets. Genetics 197, 573–589 (2014).
Kang, H. M. et al. Variance component model to account for sample structure in genome-wide association studies. Nat. Genet. 42, 348–354 (2010).
Fang, C. et al. Genome-wide association studies dissect the genetic networks underlying agronomical traits in soybean. Genome Biol 18, 161 (2017).
Xie, S., Shen, B., Zhang, C., Huang, X. & Zhang, Y. sgRNAcas9: a software package for designing CRISPR sgRNA and evaluating potential off-target cleavage sites. PLoS One 9, e100448 (2014).
Yoo, S.-D., Cho, Y.-H. & Sheen, J. Arabidopsis mesophyll protoplasts: a versatile cell system for transient gene expression analysis. Nat. Protoc. 2, 1565–1572 (2007).
Ma, S. et al. Reversible histone H2B monoubiquitination fine-tunes abscisic acid signaling and drought response in rice. Mol. Plant 12, 263–277 (2019).
Morrow, M. E. et al. Active site alanine mutations convert deubiquitinases into high‐affinity ubiquitin‐binding proteins. EMBO Rep 19, e45680 (2018).
Chen, S., Zhou, Y., Chen, Y. & Gu, J. fastp: an ultra-fast all-in-one FASTQ preprocessor. Bioinformatics 34, i884–i890 (2018).
Kim, D., Paggi, J. M., Park, C., Bennett, C. & Salzberg, S. L. Graph-based genome alignment and genotyping with HISAT2 and HISAT-genotype. Nat. Biotechnol. 37, 907–915 (2019).
Liao, Y., Smyth, G. K. & Shi, W. featureCounts: an efficient general purpose program for assigning sequence reads to genomic features. Bioinformatics 30, 923–930 (2014).
Love, M. I., Huber, W. & Anders, S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol 15, 550 (2014).
Wu, T. et al. clusterProfiler 4.0: A universal enrichment tool for interpreting omics data. Innovation (Camb) 2, 100141 (2021).
Xu, L. et al. Regulation of rice tillering by RNA-directed DNA methylation at miniature inverted-repeat transposable elements. Mol. Plant 13, 851–863 (2020).
Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25, 1754–1760 (2009).
Danecek, P. et al. Twelve years of SAMtools and BCFtools. Gigascience 10, giab008 (2021).
Ramirez, F. et al. deepTools2: a next generation web server for deep-sequencing data analysis. Nucleic Acids Res. 44, W160–W165 (2016).
Robinson, J. T. et al. Integrative genomics viewer. Nat. Biotechnol. 29, 24–26 (2011).
Acknowledgements
This work was supported by the National Key Research and Development Program of China (grant no. 2021YFF1001201), the National Natural Science Foundation of China (grant nos. 32272107, 32388201 and 32090064), the Hainan Yazhou Bay Seed Laboratory Project (grant no. B21HJ0002), the “Strategic Priority Research Program” of the Chinese Academy of Sciences (grant no. XDA24030501), and the Xplorer Prize Award.
Author information
Contributions
Z.T. designed and supervised the project; S.Liang, Z.D., X.H., X.Y., Y.Y., Y.P., G.Z. and M.Z. performed the experiments; S.Liang, Q.L., S.Liu and Z.T. analyzed the data; and S.Liang, S.Liu and Z.T. wrote the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Communications thanks Bin Liu, Dawei Zhang and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. A peer review file is available.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Liang, S., Duan, Z., He, X. et al. Natural variation in GmSW17 controls seed size in soybean. Nat Commun 15, 7417 (2024). https://doi.org/10.1038/s41467-024-51798-5