Introduction

Transcription factors are proteins that typically possess DNA-binding domains, transcription regulatory domains, oligomerization sites, and nuclear localization signals1,2. To date, several transcription factors have been identified. The TCP (TEOSINTE BRANCHED1/CYCLOIDEA/PROLIFERATING CELL FACTOR) transcription factor family was named based on the first letters of three genes: TEOSINTE BRANCHED1 (TB1) from maize, CYCLOIDEA (CYC) from snapdragon, and PROLIFERATING CELL FACTORS 1 and 2 (PCF1 and PCF2) from rice3,4. It is a family of plant-specific transcription factors that contain conserved domains and encode structurally similar proteins5,6. TCP transcription factors are key regulators of plant growth and development and also regulate the expression of other genes7,8,9,10,11,12.

TCP transcription factors are involved in various biological processes in plants. Several recent studies have indicated that TCP genes play a role in the abiotic stress response13,14,15. Overexpression of OsTCP19 enhances plant tolerance to NaCl, whereas overexpression of OsTCP14 increases plant sensitivity to cold stress16,17. StTCP-silenced potato plants exhibit increased sensitivity to disease, suggesting that the TCP transcription factor family may play an important role in plant disease defense18. TCP transcription factors in grapes are associated with fruit development, and their expression can be inhibited by drought and water stress19. In Citrus, the expression levels of seven TCP transcription factors are significantly altered in response to drought stress20. TCP transcription factors in cassava are involved in distinct signaling pathways that contribute to the response to cold and drought stress21. Twenty-three CnTCP genes have been identified in Chrysanthemum nankingense, three of which exhibit a rapid response to cold stress22. In addition, cis-acting element analysis has shown that the PavTCP family in cherry has many light-responsive elements, and after shading treatment, the fruit color is significantly lighter, indicating that TCP transcription factors are related to shading stress23. These results suggest that the TCP transcription factor family is closely associated with plant stress responses.

Soybean originated in China and is cultivated globally. Soybean is an important and valuable protein source for humans24. In soybean, TCP transcription factors play an important role in growth and development25,26. GmTCP and GmNLP regulate soybean root development and nodulation under various nitrogen concentrations27. GmTCP40 can bind to the GmAP1a promoter and activate its transcription, thereby promoting soybean flowering under long-day conditions28. In addition, the TCP transcription factor genes GmFT2a and GmFT5a play a role in floral transition, whereas GmFT1a and GmFT4 suppress soybean flowering. Furthermore, GmFT7 is strongly expressed during floral transition in soybean29.

Several studies have demonstrated that TCP genes play crucial roles in the response of soybean to stress. The expression profile results indicated that GmTCPs respond to various abiotic stresses, including drought, salt, and heat, as well as to hormone treatments, such as ABA, BR, SA, and MeJA30. However, little is known about the functions and types of the TCP family in soybean.

GmTCP670 is a candidate gene related to α-subunit deletion. It was identified by map-based cloning within the fine-mapping interval of the gene responsible for deletion of the allergenic α-subunit. Among the 14 candidate genes in this interval, it was the only one annotated as a TCP family transcription factor; its gene identifier is Glyma20g28670 (hereafter referred to as GmTCP670). Studies have shown that GmTCP670 is a 7S α-subunit deletion-related gene that may be involved in the recombination and quality improvement of hypoallergenic soybean protein caused by deletion of the allergenic α-subunit.

The epigenetic regulation of methylation modifications is a prominent area of research at the forefront of life science. Here, we focused on GmTCP670 to verify its function and analyze its molecular mechanisms. We aimed to clarify the role of the soybean TCP family transcription factor gene GmTCP670 in subunit loss and the improvement of protein nutritional quality in ‘alpha-subunit deletion hypoallergenic soybean,’ as well as to elucidate the molecular mechanisms underlying its function. Using the soybean variety Dongnong50 (DN50), GmTCP670-OE lines, and GmTCP670-CC lines as materials, high-throughput epigenetic analysis (whole-genome bisulfite sequencing [WGBS]) and transcriptome analysis (RNA sequencing [RNA-seq]) were performed. In addition, a transcriptome-epigenome association analysis of differentially expressed genes (DEGs) was performed, and the relationship between epigenetic modifications and the regulation of GmTCP670 gene expression was verified. This study aimed to provide a comprehensive, systematic, multi-omics theoretical basis for analyzing the molecular mechanisms underlying improvements in hypoallergenic soybean protein quality.

Results

Bioinformatics analysis of GmTCP670

Phytozome analysis revealed that the full-length GmTCP670 sequence was 1128 bp, with no introns or untranslated regions, and encoded 375 amino acids. ExPASy predicted that the pI of GmTCP670 is 8.14 and its molecular weight is 42.49 kDa. The secondary and tertiary structures of GmTCP670 are shown in Fig. 1A and B, respectively. The primary elements of the secondary structure of GmTCP670 include α-helices and random coils. Furthermore, analysis using the TMHMM Server v. 2.0 indicated that the protein lacked a transmembrane domain, classifying it as a non-transmembrane protein (Fig. 1C). ProtScale analysis suggested that the protein was hydrophilic (Fig. 1D), and SignalP 5.0 analysis revealed no potential signal peptide sequence, suggesting that the protein was non-secretory (Fig. 1E). NetPhos 3.1 analysis suggested that the protein might be phosphorylated (Fig. 1F), and ePlant analysis of expression patterns indicated that GmTCP670 had the highest expression level in flowers (Fig. 1G). Moreover, subcellular localization prediction suggested that GmTCP670 might be localized in the nucleus. The experimental results showed that the fluorescence of the control GFP protein was distributed throughout the cell, whereas that of the GmTCP670-GFP fusion protein was exclusively located in the nucleus (Figure S1), indicating that the GmTCP670 protein was localized in the nucleus.
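
The reported sequence length and protein length can be cross-checked with simple arithmetic. The sketch below assumes that the 1128 bp figure is the intronless coding sequence including one terminal stop codon; the function name is illustrative and not part of the original analysis.

```python
# Consistency check between the reported CDS length and protein length.
# Assumption: the 1128 bp sequence is a complete CDS ending in one stop codon.
CDS_LENGTH_BP = 1128      # reported full length of GmTCP670
PROTEIN_LENGTH_AA = 375   # reported number of encoded amino acids


def expected_protein_length(cds_bp: int) -> int:
    """Return the residue count for a CDS that ends in a single stop codon."""
    if cds_bp % 3 != 0:
        raise ValueError("CDS length is not a multiple of three")
    # Every codon encodes one residue except the final stop codon.
    return cds_bp // 3 - 1


if __name__ == "__main__":
    print(expected_protein_length(CDS_LENGTH_BP) == PROTEIN_LENGTH_AA)
```

Under this assumption the two reported values agree: 376 codons, of which 375 encode residues.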

A phylogenetic tree was constructed using full-length amino acid sequences of the TCP transcription factor family from Arabidopsis and soybean (Fig. 2). TCP families were categorized into three groups: two groups for Class I TCP genes and one group for Class II CYC/TB1-type TCP genes. According to the markers in the phylogenetic tree, GmTCP670 is classified as a Class II CYC/TB1-type TCP gene. GmTCP670 contains an ‘R domain’ that is associated with arginine enrichment. Therefore, we speculated that GmTCP670 might be related to arginine metabolism.

Fig. 1

Bioinformatics analysis of GmTCP670. (A) Secondary structure of GmTCP670. (B) Tertiary structure of GmTCP670. (C) Prediction of transmembrane structural domains of GmTCP670. (D) Prediction of the hydrophobic regions of GmTCP670. (E) Prediction of signal peptide of GmTCP670. (F) Prediction of the phosphorylation sites of GmTCP670. (G) Prediction of expression patterns of GmTCP670.

Fig. 2

Phylogenetic relationships between TCP proteins from soybean and Arabidopsis.

Identification of transgene-positive plants

To investigate whether GmTCP670 plays a role in regulating plant development, the CDS of GmTCP670 was placed under the control of the cauliflower mosaic virus (CaMV) 35S promoter. This construct was then introduced into soybean to generate stable GmTCP670-OE transgenic soybean lines (Fig. 3A). We obtained five independent homozygous GmTCP670-OE lines: OE14, OE15, OE19, OE21, and OE22 (Fig. 3B and C). OE14, OE15, and OE19 plants showed a round leaf phenotype, whereas OE21 and OE22 plants showed a phenotype similar to that of wild-type (WT) plants. However, no differences were observed in seed storage proteins among the lines (Fig. 3D). Thus, OE14, OE15, and OE19 lines were used for further studies.

We used CRISPR/Cas9-mediated genome editing tools to knock out the GmTCP670 gene in soybeans. Two guide RNAs were designed to specifically target GmTCP670 (Fig. 4A). The genomic locus of GmTCP670 containing the target sites was amplified by PCR using specific primers (Table S1). Sequencing analysis revealed that six independent T0 lines exhibited CRISPR/Cas9-mediated fragment deletions, and the results of PCR detection for the bar and Cas9 genes were positive (Fig. 4B, C). The subunit components of the seeds from DN47, DN50, and T3 seeds from GmTCP670-CC transgenic plants were visualized using SDS-PAGE (Fig. 4D). However, there were no significant changes in the subunit composition of the GmTCP670-CC seeds. We did not observe significant differences in plant growth and development between the five homozygous mutants and WT plants. The homozygous mutants CC01, CC02, and CC04, which had a deleted fragment length of 388 bp, were designated as GmTCP670-CC lines and were used for further analysis.

Fig. 3

Construction and analysis of GmTCP670-OE lines. (A) Diagram of the recombinant expression vector, 35S-GmTCP670 construct. LB, left T-DNA border; RB, right T-DNA border; 35S, cauliflower mosaic virus 35S promoter; Tnos, nopaline synthase terminator. (B) PCR detection of the bar gene in transgenic T0 soybean plants. M, DNA Marker DL 2000; +, 35S-GmTCP670 recombinant plasmid; −, WT DN50 soybean plant lines; 1–5, transgenic plants. (C) LibertyLink strip analysis of the T0 transgenic plants. 1–5, transgenic plants. (D) SDS-PAGE analysis of protein extracts from mature T3 seeds harvested from GmTCP670-OE positive plants. Original gels are presented in Figs. S2 and S3.

Fig. 4

Target site in the GmTCP670 gene and results obtained from mutagenesis of GmTCP670 using CRISPR/Cas9 technology. (A) Gene structure of GmTCP670 at the two target sites. (B) Detailed sequences of the target site of GmTCP670 in T0 lines. The black and red dashed lines represent DNA fragment omission and deletion, respectively; d, deletion; s, substitution. (C) PCR detection of transgenic T0 soybean plant bar gene and cas9 gene. M, DNA Marker DL 2000; +, pCBSG015-GmTCP670 recombinant plasmid; −, WT DN50 soybean plant lines; 1–6, transgenic plants. (D) SDS-PAGE analysis of protein extracts from mature T3 seeds harvested from GmTCP670-CC plants. Original gels are presented in Figs. S4 and S5.

Assessment of the agronomic traits of transgenic GmTCP670-CC and GmTCP670-OE lines

The agronomic traits of transgenic plant lines GmTCP670-CC and GmTCP670-OE are shown in Fig. 5. During the same growth period, the GmTCP670-OE lines were taller than the DN50 and GmTCP670-CC lines (Fig. 5A). Although the DN50 and GmTCP670-CC lines had already progressed to the flowering stage, the GmTCP670-OE lines had not yet bloomed (Fig. 5B). This significant delay in flowering time for GmTCP670-OE resulted in a significant delay in maturity, approximately 13 days later than that of the other lines. Additionally, the leaf morphology of GmTCP670-OE transitioned from the sharp leaf type of WT to the round leaf type, with a leaf area significantly larger than those of DN50 and GmTCP670-CC. No significant differences were observed between DN50 and GmTCP670-CC (Fig. 5C). Furthermore, there were no significant differences in plant height, pod number per plant, or 100-seed weight between GmTCP670-CC and DN50. However, GmTCP670-OE showed significantly greater plant height and 100-seed weight than DN50, whereas the number of grains per plant was significantly lower than that of DN50 (Fig. 5D). In addition, the protein content in the mutant lines GmTCP670-CC and GmTCP670-OE was significantly lower than that in DN50, whereas the oil content was significantly higher than that in DN50 (Fig. 5E).

Fig. 5

Assessment of agronomic traits of knockout (GmTCP670-CC) and overexpressing (GmTCP670-OE) lines. (A) Plants during the vegetative growth stage. (B) Comparison of flowering and maturity status between GmTCP670-CC and GmTCP670-OE plants. (C) Representative pictures showing trifoliate leaf shapes and comparison of leaf areas. Bar = 5 cm. (D) Plant height, pod number per plant, seed number per plant, and 100-seed weight. (E) Seed protein and oil contents of T4-generation GmTCP670-CC, GmTCP670-OE, and DN50 plants grown under field conditions. Significant differences are indicated by different letters following Duncan’s multiple range test (P < 0.05).

Overexpression and knockout of GmTCP670 affect the amino acid content and quality of soybean seeds

The free and total amino acid contents of mature T4 seeds from the mutant GmTCP670-CC and GmTCP670-OE were evaluated (Table 1). Except for methionine (Met), which was present at a lower level than in DN50, the concentration of other amino acids in GmTCP670-CC was higher than that in both DN50 and GmTCP670-OE, with a significant difference (P < 0.05) compared with GmTCP670-OE. Specifically, methionine content in GmTCP670-CC was 2.27% lower than that in DN50, whereas cysteine (Cys) was 7.84% higher than that in DN50. This resulted in a sulfur-containing amino acid (Met and Cys) content that was 3.16% higher than that in DN50 and 16.67% higher than that in GmTCP670-OE.

The free essential amino acid content of GmTCP670-CC was significantly lower than that of DN50 but significantly higher than that of GmTCP670-OE. The total free amino acids in GmTCP670-CC were significantly higher than those in DN50 and GmTCP670-OE (P < 0.05), by 18.55% and 83.95%, respectively. The levels of free methionine, alanine, and arginine in GmTCP670-CC were significantly higher than those in DN50 and GmTCP670-OE. The free methionine content was 1.74-fold higher than that of DN50 and 2.26-fold higher than that of GmTCP670-OE, and free arginine content was 43.15% and 211.33% higher than that of DN50 and GmTCP670-OE, respectively. The contents of free phenylalanine, aspartic acid, and glycine in GmTCP670-CC were similar to those in DN50 but significantly higher than those in GmTCP670-OE (P < 0.05).
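
Because the comparisons above mix percent differences and fold values, it can help to put them on a common ratio scale. The following is a minimal sketch using only the two free arginine percentages reported above; the function names are illustrative, and the convention that "x% higher" means a ratio of 1 + x/100 is an assumption about how the values were computed.

```python
def ratio_from_percent_increase(percent: float) -> float:
    """Convert 'x % higher than the reference' into a value/reference ratio."""
    return 1.0 + percent / 100.0


def percent_increase(value: float, reference: float) -> float:
    """Percent by which `value` exceeds `reference`."""
    return (value - reference) / reference * 100.0


if __name__ == "__main__":
    # Free arginine in GmTCP670-CC, as reported in the text.
    cc_over_dn50 = ratio_from_percent_increase(43.15)    # CC relative to DN50
    cc_over_oe = ratio_from_percent_increase(211.33)     # CC relative to OE
    # Implied DN50/OE ratio, derived only from the two reported percentages.
    dn50_over_oe = cc_over_oe / cc_over_dn50
    print(round(cc_over_dn50, 4), round(cc_over_oe, 4), round(dn50_over_oe, 2))
```

The derived DN50/OE ratio is an inference from the two percentages, not a value stated in Table 1.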

Table 1 Comparison of free and total amino acid composition of mature seeds between the DN50 cultivar, overexpressing (GmTCP670-OE), and knockout (GmTCP670-CC) lines.

Transcriptome comparative analysis of overexpression and knockout of GmTCP670

A comparative analysis of differentially expressed genes (DEGs; q-value < 0.05, |log2 fold change| > 1) in the transcriptomes of GmTCP670-OE, GmTCP670-CC, and DN50 is presented in Fig. 6A. A total of 196 DEGs were identified between GmTCP670-CC and DN50, comprising 45 upregulated and 151 downregulated genes. Between GmTCP670-CC and GmTCP670-OE, 2317 DEGs were found, with 1163 upregulated and 1154 downregulated genes. Additionally, there were 7514 DEGs between GmTCP670-OE and DN50, including 4250 upregulated and 3264 downregulated genes. To validate the RNA-seq results, three significantly upregulated and three significantly downregulated genes in GmTCP670-CC and GmTCP670-OE were selected for mRNA quantification by qRT-PCR (Figure S6). The qRT-PCR results were consistent with those obtained from RNA-seq.
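
The DEG criteria stated above (q-value < 0.05 and |log2 fold change| > 1) can be expressed as a small classifier. This is a sketch of the thresholding step only, not the pipeline used in the study; the gene identifiers in the example are placeholders, while the count table repeats the numbers reported in the text.

```python
from typing import NamedTuple


class GeneResult(NamedTuple):
    gene_id: str
    log2_fold_change: float
    q_value: float


def classify_deg(result: GeneResult, q_cutoff: float = 0.05,
                 lfc_cutoff: float = 1.0) -> str:
    """Label a gene 'up', 'down', or 'not_significant' using the stated cutoffs."""
    if result.q_value >= q_cutoff or abs(result.log2_fold_change) <= lfc_cutoff:
        return "not_significant"
    return "up" if result.log2_fold_change > 0 else "down"


# DEG counts reported in the text: (total, upregulated, downregulated).
REPORTED_COUNTS = {
    "GmTCP670-CC vs. DN50": (196, 45, 151),
    "GmTCP670-CC vs. GmTCP670-OE": (2317, 1163, 1154),
    "GmTCP670-OE vs. DN50": (7514, 4250, 3264),
}

if __name__ == "__main__":
    # Placeholder results, not data from the study.
    examples = [
        GeneResult("gene_A", 2.3, 0.001),
        GeneResult("gene_B", -1.8, 0.010),
        GeneResult("gene_C", 0.4, 0.001),
        GeneResult("gene_D", 3.0, 0.200),
    ]
    for gene in examples:
        print(gene.gene_id, classify_deg(gene))
    # Up- and downregulated counts should add up to each reported total.
    print(all(total == up + down for total, up, down in REPORTED_COUNTS.values()))
```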

There were 24 common DEGs identified between GmTCP670-CC vs. DN50 and GmTCP670-CC vs. GmTCP670-OE, 149 common DEGs between GmTCP670-OE vs. DN50 and GmTCP670-CC vs. DN50, and 1997 common DEGs between GmTCP670-CC vs. GmTCP670-OE and GmTCP670-OE vs. DN50 (Fig. 6B). Among the three comparison groups, six common DEGs were distributed on different chromosomes: Glyma.01G142400, Glyma.07G163300, Glyma.09G027000, Glyma.13G289100, Glyma.15G244200, and Glyma.16G142200 (Fig. 6C). The enriched GO terms and KEGG pathways were also identified. The top 10 GO entries, along with the number of differentially expressed genes in each category, were plotted and analyzed in a bar chart (Figure S7). KEGG pathway analysis indicated that several important pathways, such as “Vitamin B6 metabolism,” “Biosynthesis of flavonoids and flavonols,” and “Linoleic acid metabolism” were enriched (Figure S8).
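
The shared DEG counts in a Venn diagram such as Fig. 6B correspond to set intersections of the DEG lists from each comparison. The sketch below shows that operation on toy gene identifiers (placeholders, not the study's gene lists); the function name is illustrative.

```python
from itertools import combinations
from typing import Dict, Set, Tuple


def venn_overlaps(deg_sets: Dict[str, Set[str]]) -> Dict[Tuple[str, ...], Set[str]]:
    """Return the genes shared by every pair, triple, etc. of comparisons."""
    overlaps = {}
    names = sorted(deg_sets)
    for size in range(2, len(names) + 1):
        for combo in combinations(names, size):
            # Intersect the DEG sets of all comparisons in this combination.
            overlaps[combo] = set.intersection(*(deg_sets[name] for name in combo))
    return overlaps


if __name__ == "__main__":
    # Toy DEG lists; real lists would come from the three comparisons.
    toy = {
        "CC_vs_DN50": {"g1", "g2", "g3"},
        "CC_vs_OE": {"g2", "g3", "g4"},
        "OE_vs_DN50": {"g3", "g4", "g5"},
    }
    for combo, genes in venn_overlaps(toy).items():
        print(" & ".join(combo), sorted(genes))
```

Note that each pairwise intersection computed this way includes the genes shared by all three comparisons.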

Fig. 6

Comparative analysis of DEGs in the transcriptome of GmTCP670-CC, GmTCP670-OE, and DN50. (A) Number of DEGs among GmTCP670-CC, GmTCP670-OE, and DN50. (B) Venn diagram of DEGs. (C) DEGs in GmTCP670-CC, GmTCP670-OE, and DN50.

Association analysis of transcriptome and methylation of GmTCP670-CC, GmTCP670-OE, and DN50

According to the transcriptome-methylation association analysis, genes related to growth and development, amino acid metabolism, and Arg metabolism that showed both differential expression in the transcriptome and changes in methylation modification were screened, as shown in Table 2.

In the GmTCP670-CC vs. DN50 comparison, two genes, Glyma.01G178800 and Glyma.14G010300, were identified and annotated as being involved in amino acid transport. This indicated that the amino acid transport genes exhibited higher activity, resulting in elevated levels of certain amino acids in GmTCP670-CC. In the GmTCP670-OE vs. DN50 comparison, four genes related to growth and development were identified. Among these, three genes were associated with leaf development (Glyma.10G067200, Glyma.11G008500, and Glyma.19G192700), whereas one gene was related to growth and transcriptional regulation (Glyma.11G110700). Additionally, four genes were related to amino acid metabolism, including two associated with the vacuolar amino acid transporter (Glyma.01G178800 and Glyma.20G181900) and two associated with the amino acid hydrolase ILR1-like 4 (Glyma.04G247600 and Glyma.18G132000). Furthermore, three genes were associated with arginine, annotated as having arginine transmembrane transporter activity (Glyma.03G083600), arginine N-methyltransferase activity (Glyma.09G000300), and bifunctional arginine biosynthesis protein activity (Glyma.10G044300).

In the GmTCP670-CC vs. GmTCP670-OE comparison, three genes associated with growth and development were identified. These included two genes involved in growth regulation and transcriptional regulation (Glyma.01G234400 and Glyma.11G008500) and one gene involved in growth regulation and root development (Glyma.17G232600). Additionally, four genes related to amino acid metabolism were identified, comprising one gene associated with the transport of neutral and acidic amino acids (Glyma.04G209100) and three genes involved in amino acid transmembrane transport protein activity (Glyma.06G088200, Glyma.06G090200, and Glyma.09G118900). Furthermore, three genes related to arginine were identified, all of which were also found in the GmTCP670-OE vs. DN50 comparison.

Table 2 Genes differentially expressed at the transcriptional level and altered methylation modifications simultaneously.

Interaction between GmTCP670 transcription factor and GGACCC element

The TCP domain region frequently binds to specific DNA regulatory elements, suggesting that TCP factors function as part of a regulatory complex to regulate gene expression. To verify the interaction between GmTCP670 and GGNCCC elements, we used a yeast one-hybrid (Y1H) system. The GmTCP670 transcription factor gene was cloned into the yeast vector pGADT7 (Fig. 7A). Three repeats of a sequence containing the GGNCCC core were placed upstream of the Pmin promoter and ligated into the pHIS2 vector (Fig. 7B). The pHIS2 plasmid carrying the target element GGNCCC and the pGADT7 plasmid carrying the target gene GmTCP670 were co-transformed into Y187 yeast. The transformed yeast cells were cultured on SD/-His/-Trp and SD/-His/-Trp/-Leu media supplemented with varying concentrations of 3-AT. The control co-transformed strains (pHIS2-GGNCCC + pGADT7 and pHIS2 + pGADT7-GmTCP670) failed to grow on SD/-His/-Trp/-Leu medium supplemented with 300 mM 3-AT.

Therefore, SD/-His/-Trp/-Leu medium supplemented with 300 mM 3-AT was used to further investigate the interaction between GmTCP670 and the element. As shown in Fig. 7C, the yeast strains co-expressing pHIS2-GGNCCC and pGADT7-GmTCP670, along with the positive control pHIS2-P53 + pGAD53m, exhibited normal growth on SD/-His/-Trp/-Leu + 300 mM 3-AT. However, the growth of the control groups pHIS2-GGNCCC + pGADT7 and pHIS2 + pGADT7-GmTCP670 was inhibited on the same medium. These results indicate that the GmTCP670 transcription factor interacts with the GGNCCC element and, specifically, that it can bind the GGACCC element.

Fig. 7

Y1H assay of the interaction between GmTCP670 transcription factor and GGNCCC. (A) Diagrams of pGADT7 + GmTCP670 prey vectors. (B) Diagram of pHIS2 + GGNCCC bait vectors. (C) Y1H validation of GmTCP670 binding to GGACCC.

Y2H screening of GmTCP670 interacting proteins

Y2H analysis was conducted to determine whether GmTCP670 exhibited transcriptional activation activity in yeast cells using expression constructs and reporter constructs (Fig. 9A). These results demonstrated that both full-length GmTCP670 and GmTCP670-II activated transcription in yeast. In contrast, GmTCP670-I (AAs 97–249) and GmTCP670-III (AAs 97–375) were unable to activate transcription (Fig. 9B). Consequently, GmTCP670-III was used to screen the libraries.

Fifty-eight blue colonies were characterized by sequence analysis using BLAST. Sixteen candidate genes encoding proteins that might interact with GmTCP670 are listed in Fig. 8. Homology analysis revealed that these candidate proteins were associated with seed storage proteins, domain-containing proteins, and phosphate dehydrogenase.

Fig. 8

Part of library screening results by Y2H.

Interaction of GmTCP670 with GmLOB40 in yeast

To confirm which proteins interact with GmTCP670-III, we selected four candidate genes (Gy1, Gy4, Gy5, and LOB40) for further investigation. These were predicted to encode 11S seed storage proteins (Gy1, Gy4, and Gy5) and a lateral organ boundary (LOB) domain-containing protein (LOB40). Full-length cDNAs of the four genes were cloned into the pGADT7 vector (Fig. 9C). We analyzed whether these proteins could activate transcription and interact with GmTCP670 in yeast cells. Our results showed that none of the four proteins activated transcription in yeast cells; however, GmLOB40 interacted with GmTCP670-III (Fig. 9D), whereas the other three candidate proteins did not interact with GmTCP670-III in yeast.

Fig. 9

Interaction between GmTCP670 and GmLOB40 in yeast cells. (A) Diagram of pGBKT7 + GmTCP670/GmTCP670-Ι/GmTCP670-II/GmTCP670-III bait vectors. (B) Full-length GmTCP670, GmTCP670-Ι, GmTCP670-II, GmTCP670-III transcription activation assays. (C) Reverse transcription polymerase chain reaction cloning of GmLOB40 (Glyma.18G297100), GmGy4 (Glyma.10G037100), GmGy5 (Glyma.13G123500), and GmGy1 (Glyma.03G163500) full-length cDNAs. Original gel is presented in Figure S9. (D) GmTCP670-III interacted with GmLOB40 in yeast cells.

Discussion

The TCP transcription factor family plays a vital role in plant development and stress responses31,32,33. Through systematic gene function verification, this study demonstrated that the soybean GmTCP670 transcription factor influences soybean growth and development in multiple ways. Overexpression of GmTCP670 significantly altered leaf type, plant height, flowering time, and growth period, all of which are closely related to soybean yield. This study provides valuable genetic resources for the molecular design breeding of high-quality, high-yield soybean varieties. The results of CRISPR/Cas9 gene editing in this study revealed that knocking out a large fragment of the GmTCP670 gene improved the amino acid quality of T4 seeds compared with the control DN50 and the overexpression lines. This indicates that the differential expression of GmTCP670 affects the amino acid quality of soybean proteins. Thus, the GmTCP670 transcription factor is a promising genetic resource for enhancing the amino acid quality of soybean protein.

Epigenetic modifications of nucleic acids, such as DNA and RNA, particularly methylation, play a crucial role in regulating cellular processes across all kingdoms of life. Third-generation sequencing technologies have identified numerous previously unknown modifications in diverse organisms34,35. In this study, a comprehensive analysis combining transcriptome and whole-genome methylation sequencing was conducted. The findings from different combinations of GmTCP670-OE, GmTCP670-CC, and the control DN50 were compared, and genes showing changes in both transcriptome expression and methylation modification were screened. Eighteen DEGs related to growth and development, amino acid synthesis, and arginine synthesis were identified, which are important for analyzing the molecular mechanism of GmTCP670 transcription factor expression at the epigenomic level. In particular, the gene Glyma.10G044300, which is associated with arginine synthesis and metabolism, exhibited significant differential expression in previous studies of the α-subunit deletion hypoallergenic soybean mutant. This study further confirms the presence of methylation modifications. Methylation modification may be an important molecular means to improve the quality of hypoallergenic soybean proteins. Using the transcriptome and genome-wide high-throughput methylation data obtained in this study, we conducted a comprehensive and systematic analysis of the molecular mechanisms by which the GmTCP670 transcription factor regulates soybean development and quality improvement. This will significantly contribute to an overall understanding of the molecular mechanisms involved in enhancing the quality of hypoallergenic soybean proteins.

Several studies have demonstrated that the TCP transcription factor can directly bind to the promoters of many genes via the GGNCCC/GGGNCC motif36. In this study, we used a Y1H system to investigate the relationship between the GmTCP670 and GGNCCC elements. The N in the GGNCCC element could be C, G, A, or T; therefore, we used four elements (GGCCCC, GGGCCC, GGACCC, and GGTCCC) combined with GmTCP670 to confirm the element type that could be recognized and bound by GmTCP670. According to the Y1H results (Fig. 7C), GGACCC was directly bound by GmTCP670.
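
The four candidate elements follow directly from expanding the degenerate base N in GGNCCC. The sketch below enumerates them and scans a sequence for matches; the bait sequence in the example (three direct repeats of GGACCC) is a hypothetical stand-in, since the exact construct sequence is not given here, and the function names are illustrative.

```python
import re
from itertools import product
from typing import List, Tuple

# Minimal IUPAC table covering only the symbols used in this motif.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "ACGT"}


def expand_motif(motif: str) -> List[str]:
    """List every concrete sequence matched by a degenerate motif."""
    return ["".join(bases) for bases in product(*(IUPAC[b] for b in motif))]


def find_motif(sequence: str, motif: str) -> List[Tuple[int, str]]:
    """Return (0-based start, matched sequence) for all hits, including overlaps."""
    pattern = "".join("[" + IUPAC[b] + "]" for b in motif)
    # A lookahead lets overlapping occurrences be reported as well.
    return [(m.start(), m.group(1))
            for m in re.finditer("(?=(" + pattern + "))", sequence.upper())]


if __name__ == "__main__":
    print(expand_motif("GGNCCC"))
    hypothetical_bait = "GGACCC" * 3  # assumed layout, for illustration only
    print(find_motif(hypothetical_bait, "GGNCCC"))
```

Only the forward strand is scanned here; the reverse complement of GGNCCC is GGGNCC, the second motif form mentioned above.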

Protein-protein interactions are ubiquitous in every cellular process, including DNA replication, transcription, splicing, and translation. Identifying and characterizing protein interactions is essential for understanding these processes at both molecular and biophysical levels. Among the various techniques developed to study protein-protein interactions, the Y2H system has proven to be a powerful tool for investigating molecular interactions37. In this study, we screened for proteins that interact with the soybean GmTCP670 transcription factor using a Y2H system. We selected four proteins, Gy1 (Glyma.03G163500), Gy4 (Glyma.10G037100), Gy5 (Glyma.13G123500), and LOB40 (Glyma.18G297100), for interaction validation. Three genes, Gy1, Gy4, and Gy5, encode 11S seed storage proteins. Glyma.18G297100 encodes an LOB domain-containing protein. LOB domain genes belong to a gene family that encodes plant-specific transcription factors that play crucial roles in plant growth and development38. LOB genes regulate leaf and root development in plants39. Seed storage proteins (SSPs) are essential for germination and early seedling growth. Glyma.18G297100 could interact with GmTCP670, suggesting that GmTCP670 affects soybean growth and development. These findings will contribute significantly to a comprehensive analysis of the molecular mechanisms underlying soybean improvement.

Materials and methods

Experimental materials

For this study, the soybean variety Dongnong50 (DN50), which served as both the transgenic receptor and control material, was provided by Inner Mongolia Agricultural University, China.

Bioinformatics analysis of GmTCP670

We used Phytozome (https://phytozome-next.jgi.doe.gov/) to analyze gene and protein sequences. The secondary and tertiary structures of the protein were predicted using SOPMA (https://npsa.lyon.inserm.fr/cgi-bin/npsa_automat.pl?page=/NPSA/npsa_sopma.html) and Swiss-Model (https://www.swissmodel.expasy.org/), respectively. We also used ExPASy (https://web.expasy.org/compute_pi/) for protein molecular weight (Mw) and isoelectric point (pI) prediction, ProtScale (https://web.expasy.org/protscale/) for protein hydrophobicity analysis, TMHMM Server v. 2.0 (https://services.healthtech.dtu.dk/service.php?TMHMM-2.0) for transmembrane domain prediction, SignalP 5.0 (https://services.healthtech.dtu.dk/services/SignalP-5.0/) for signal peptide prediction, NetPhos 3.1 (https://services.healthtech.dtu.dk/services/NetPhos-3.1/) for phosphorylation site prediction, and ePlant (https://bar.utoronto.ca/eplant_soybean/) for expression pattern prediction. CELLO (http://cello.life.nctu.edu.tw/) was used to predict the subcellular localization of the protein. A phylogenetic tree of TCP proteins from Glycine max and Arabidopsis thaliana was constructed using MEGA11.
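
For readers unfamiliar with hydrophobicity profiles of the kind ProtScale produces, the sketch below computes a sliding-window mean over a hydropathy scale. The text does not state which scale or window was used, so the Kyte-Doolittle scale and a window of nine residues are assumptions, and the peptide in the example is made up rather than taken from GmTCP670.

```python
from typing import List

# Kyte-Doolittle hydropathy values (positive = hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(sequence: str) -> float:
    """Grand average of hydropathy: mean scale value over all residues."""
    return sum(KYTE_DOOLITTLE[aa] for aa in sequence.upper()) / len(sequence)


def hydropathy_profile(sequence: str, window: int = 9) -> List[float]:
    """Mean hydropathy of each full window sliding along the sequence."""
    values = [KYTE_DOOLITTLE[aa] for aa in sequence.upper()]
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]


if __name__ == "__main__":
    made_up_peptide = "MKRSSDEEQLLAVIGGKR"  # placeholder, not GmTCP670
    print(round(gravy(made_up_peptide), 3))
    print([round(v, 2) for v in hydropathy_profile(made_up_peptide)])
```

A negative overall average on this scale is what is usually meant by calling a protein hydrophilic.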

Subcellular localization of GmTCP670 protein

The coding region of GmTCP670 was cloned into the pCAMBIA1302 vector to produce 35S::GmTCP670::GFP, whereas the empty vector 35S::GFP served as a control. The green fluorescent protein (GFP) fusion proteins were transiently expressed in Arabidopsis protoplasts. Transfected cells were observed under a confocal laser scanning microscope (Leica TCS SP2, Germany).

Generation of transgenic plants

The full-length coding sequence (CDS) of GmTCP670 was amplified by polymerase chain reaction (PCR) from a complementary DNA (cDNA) library of the soybean cultivar Dongnong 47 (DN47). This sequence was identical to the CDS cloned from the Williams 82 cultivar and was subsequently cloned into an overexpression vector. CRISPR/Cas9 gene knockout constructs were generated using the pCBSG015 (Basta) vector, which encodes the Cas9 protein. We designed two target sequences (5′-CCGCCATGCTTGAACACGATGCA-3′ and 5′-ATGGTGGTGATGCTTCCCGAGGG-3′) and cloned them into the pCBSG015 (Basta) vector. The constructs were individually introduced into Agrobacterium tumefaciens strain EHA105 using the freeze-thaw method. The EHA105 recombinant strain was used to transform the DN50 cultivar background through soybean embryo cotyledonary node transformation, following a previously described method40.
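
The orientation of the two target sites can be read from the sequences themselves. The sketch below assumes that each 23-nt sequence as written includes the protospacer-adjacent motif (PAM) and that the PAM is NGG, as for the commonly used Streptococcus pyogenes Cas9; neither point is stated in the text, and the function names are illustrative.

```python
from typing import Optional

COMPLEMENT = str.maketrans("ACGT", "TGCA")

# The two target sequences as given in the text (5' to 3').
TARGET_SITES = [
    "CCGCCATGCTTGAACACGATGCA",
    "ATGGTGGTGATGCTTCCCGAGGG",
]


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def pam_strand(site: str) -> Optional[str]:
    """Return '+' if the site ends in NGG, '-' if it starts with CCN, else None.

    Assumption: NGG PAM. A site matching both patterns is reported as '+'.
    """
    site = site.upper()
    if site[-2:] == "GG":
        return "+"
    if site[:2] == "CC":
        return "-"
    return None


def protospacer(site: str) -> str:
    """Return the 5'-to-3' guide sequence without the 3-nt PAM."""
    strand = pam_strand(site)
    if strand == "+":
        return site.upper()[:-3]
    if strand == "-":
        return reverse_complement(site)[:-3]
    raise ValueError("no NGG PAM found on either strand")


if __name__ == "__main__":
    for site in TARGET_SITES:
        print(site, pam_strand(site), protospacer(site), len(protospacer(site)))
```

Under these assumptions the first site is targeted on the opposite strand to the second, and both yield 20-nt protospacers.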

Identification of transformants and mutations

The expression cassette contains the marker gene bar, which encodes phosphinothricin acetyltransferase (PAT) and confers resistance to the herbicide glufosinate ammonium. To detect transgenic plants, the LibertyLink strip was used to measure PAT protein content according to the manufacturer’s instructions (Envirologix Inc., Portland, ME, USA). For transgenic plants, genomic DNA was extracted from the leaves using the cetyltrimethylammonium bromide (CTAB) method. The bar or cas9 gene was amplified to confirm T-DNA insertion, and the target gene fragment was amplified using the following primers: 5ʹ-GCAAGCCACACCCTTCTTGA-3ʹ and 5ʹ-CCACCAAAGTTAACGGCCCC-3ʹ. The PCR products were then sequenced to verify gene editing, and only transformed plants with successful gene editing were used for subsequent experiments.

Analysis of subunit composition of soybean seed proteins

Sodium dodecyl sulfate-polyacrylamide gel electrophoresis (SDS-PAGE) was used to analyze the protein composition of the seeds. A small portion of dry seeds, carefully removed manually to avoid damaging the embryonic axis, was used as a soy flour sample. Total seed proteins were extracted from soy flour using SDS sample buffer (2% SDS, 5% 2-mercaptoethanol, 10% glycerol, 5 M urea, and 62.5 mM Tris-HCl), followed by centrifugation at 15,000 × g. A 10 µL aliquot of the supernatant was separated via SDS-PAGE using 4.5% stacking and 12.5% separating polyacrylamide gels and stained with Coomassie Brilliant Blue R-250 dye. The gels were scanned using a SHARP JX-330 scanner (Amersham Biosciences, Canada).

Characterization of agronomic traits

T4 GmTCP670-OE lines, GmTCP670-CC lines, and the soybean variety ‘DN50’ were grown in Inner Mongolia, China, under standard growth conditions. The experiment used a randomized complete block design with three replicates. Seeds were sown in rows that were 3 m long and 0.65 m apart, with a spacing of 6 cm between plants. After reaching full maturity, the seeds were harvested from the plants and air-dried. In total, 10 DN50 plants and 10 individuals from each transgenic line were randomly sampled at the mature stage in each replicate for phenotypic analysis. Plant height, pod number, seed number per plant, and 100-seed weight were measured. The experiments were performed in triplicate. The leaves were scanned using a flatbed scanner (CanoScan LiDE 200, Canon Inc., Japan), and the leaf area (cm²) was measured using ImageJ.

Analysis of total seed protein and oil contents

Seed protein content was determined using the Dumas method41 (Rapid N Cube, Elementar Analytical, France), with total nitrogen converted into protein content using the formula N × 6.25. The seed oil content was analyzed using a near-infrared grain analyzer (Infratec 1241 Analyzer; Foss, Denmark). The total oil and protein contents were expressed as a percentage of the dry weight.
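
The nitrogen-to-protein conversion can be written as a one-line calculation. The sketch below is illustrative only; the function name and the example nitrogen value are hypothetical, and the only element taken from the text is the conversion factor of 6.25.

```python
# Sketch: convert total nitrogen (% of dry weight) to crude protein (% of dry weight).
# The factor 6.25 comes from the text; the input value below is a made-up example.

def protein_percent(nitrogen_percent: float, factor: float = 6.25) -> float:
    """Crude protein content as N x factor, on the same basis as the nitrogen value."""
    if nitrogen_percent < 0:
        raise ValueError("nitrogen content cannot be negative")
    return nitrogen_percent * factor


example_nitrogen = 6.4  # hypothetical % N on a dry-weight basis
print(f"{protein_percent(example_nitrogen):.1f}% protein")
```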

Amino acid analysis

Total amino acids were extracted by hydrolyzing the seed meal in 6 M hydrochloric acid (HCl) for 22 h in sealed evacuated tubes, which were then placed in boiling water at 110 °C. An L-8800 amino acid analyzer (Hitachi, Japan) was used to estimate the amino acid composition of the hydrolysate.

Free amino acids were extracted from 5.00 g of the seed meal. The seeds were sampled according to the sample quartering method, fully dried, ground using a mill grinder, passed through a 0.25-mm sieve, and thoroughly mixed. The meal was then finely homogenized in 30 mL sulfosalicylic acid (10 g per 100 mL) and disrupted ultrasonically for 30 min. The samples were centrifuged at 5,000 × g for 5 min. The supernatant was filtered using a 22-µm GD/X sterile disposable syringe filter. An L-8800 amino acid analyzer (Hitachi) was used to analyze the filtrate. Amino acid concentration was calculated using the following formula: g/16 g N in the test protein divided by g/16 g N in the scoring pattern.
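
The ratio described above can be sketched as follows. The function name and the numeric inputs are hypothetical and serve only to show the arithmetic; no scoring pattern values are given in the text.

```python
# Sketch: ratio of an amino acid in the test protein to the same amino acid
# in a scoring pattern, both expressed as g per 16 g N.
# All numbers below are hypothetical placeholders, not data from the study.

def amino_acid_ratio(test_g_per_16g_n: float, pattern_g_per_16g_n: float) -> float:
    """Return test / pattern for one amino acid (both in g per 16 g N)."""
    if pattern_g_per_16g_n <= 0:
        raise ValueError("scoring pattern value must be positive")
    return test_g_per_16g_n / pattern_g_per_16g_n


print(amino_acid_ratio(6.0, 5.0))  # hypothetical test and pattern values
```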

Sample preparation for RNA-seq and WGBS

Soybean seeds were collected from the control line DN50, the overexpression lines GmTCP670-OE, and the knockout lines GmTCP670-CC 50 days after flowering. The seeds were stored in liquid nitrogen, with three biological replicates per group. Half of the samples from each group were treated with TRIzol reagent (Invitrogen, Carlsbad, CA, USA) to extract RNA for RNA sequencing (RNA-seq), whereas DNA was isolated from the remaining samples using a modified CTAB method for whole-genome bisulfite sequencing (WGBS)42.

Transcriptome sequencing and data analyses

Nine seed samples were sent to OE Biotech Co., Ltd. (Shanghai, China) for sequencing and analysis. Sequencing libraries were sequenced on the Illumina HiSeq X Ten platform. After adapter clipping and quality filtering, the clean reads were aligned to the soybean DN50 reference transcriptome using HISAT43. Gene expression levels were evaluated as reads per kilobase per million mapped reads (RPKM) based on the number of reads mapped to the reference sequence. Differential expression analysis was performed using DEGseq44. The screening criteria for differentially expressed genes (DEGs) were set at q-value < 0.05 and |log2 fold change| > 1.
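
The expression unit and the screening thresholds above can be illustrated with a short sketch. It uses the standard RPKM definition and the two cutoffs from the text; the function names and example numbers are hypothetical, and the q-values themselves would come from the differential expression software rather than from this code.

```python
# Sketch: RPKM calculation and the DEG screening rule (q < 0.05, |log2FC| > 1).
# Function names and example inputs are hypothetical.

def rpkm(read_count: int, gene_length_bp: int, total_mapped_reads: int) -> float:
    """Reads per kilobase of transcript per million mapped reads."""
    if gene_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("gene length and total mapped reads must be positive")
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


def is_deg(q_value: float, log2_fold_change: float,
           q_cutoff: float = 0.05, lfc_cutoff: float = 1.0) -> bool:
    """Apply the screening criteria: q-value below cutoff and |log2FC| above cutoff."""
    return q_value < q_cutoff and abs(log2_fold_change) > lfc_cutoff


# Hypothetical gene: 500 reads, 2,000 bp long, 10 million mapped reads in the library.
print(rpkm(500, 2000, 10_000_000))
print(is_deg(q_value=0.01, log2_fold_change=-1.5))
```

Both thresholds are strict inequalities, so a gene with a log2 fold change of exactly 1 or a q-value of exactly 0.05 would not pass this filter.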

WGBS sequencing and data analysis

WGBS was performed by OE Biotech Co., Ltd. (Shanghai, China). DNA was fragmented by sonication using a Bioruptor (Diagenode, Brussels, Belgium) to achieve a mean size of approximately 250 base pairs (bp). This was followed by blunt-end repair, addition of dA to the 3ʹ end, and ligation of a methylated adaptor. Ligated DNA was bisulfite-converted using the EZ DNA Methylation-Gold kit (ZYMO, Tustin, CA, USA). DNA fragments of varying lengths were excised from a 2% agarose gel, purified, and amplified by PCR. Sequencing was performed using an Illumina HiSeq 4000 platform (Illumina, San Diego, CA, USA). The Bismark software (version 0.16.3) was used to align the reads to the reference genome. Fisher's exact test was used to identify significant differentially methylated regions (DMRs), with screening criteria set at a methylation difference ≥ 10% and a Q value ≤ 0.05. After the DMRs were identified, differentially methylated genes (DMGs) located in DMRs were characterized.
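
A minimal sketch of the DMR screening logic is given below, assuming each region is summarized as methylated and unmethylated read counts in two samples. It computes a two-sided Fisher's exact test p-value and the absolute methylation difference. Note that the text screens on a Q value (a multiple-testing-adjusted value), whereas this sketch uses the unadjusted p-value for a single region; the function names and counts are hypothetical.

```python
# Sketch: Fisher's exact test on a 2x2 table of methylated/unmethylated counts,
# combined with a minimum methylation difference of 10%.
# Assumption: the uncorrected p-value stands in for the Q value used in the text.
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of seeing x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one.
    total = sum(p for p in (prob(x) for x in range(lo, hi + 1))
                if p <= p_obs * (1 + 1e-9))
    return min(1.0, total)


def methylation_level(methylated: int, unmethylated: int) -> float:
    """Fraction of reads supporting methylation."""
    return methylated / (methylated + unmethylated)


def is_candidate_dmr(m1: int, u1: int, m2: int, u2: int,
                     min_diff: float = 0.10, alpha: float = 0.05) -> bool:
    """True if the methylation difference is >= min_diff and p <= alpha."""
    diff = abs(methylation_level(m1, u1) - methylation_level(m2, u2))
    return diff >= min_diff and fisher_exact_two_sided(m1, u1, m2, u2) <= alpha


# Hypothetical region: 80/100 methylated reads in one sample, 40/100 in the other.
print(is_candidate_dmr(80, 20, 40, 60))
```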

Yeast one-hybrid (Y1H) assay

The Y1H assay was performed according to the procedure described by Clontech (Takara). The cDNA fragment of GmTCP670 was cloned into the pGADT7 vector using the double-enzyme digestion method. Three copies of the GGNCCC fragment were inserted into the bait vector (pHIS2) to create pHIS2-GGNCCC (3 × GGNCCC). The bait vectors (pHIS2) were transformed into the yeast strain Y187 and cultured on SD/-Trp/-Leu medium at 30 °C for three days. pHIS2-GGNCCC was used to test transcriptional activation. pHIS2-P53 and pGAD53m were co-transformed into yeast cells as positive controls. Similarly, pHIS2 + pGADT7-GmTCP670 and pHIS2-GGNCCC + pGADT7 were co-transformed into yeast cells to serve as the experimental groups.

Single yeast colonies were randomly selected from each transformation plate and cultured on SD/-His/-Trp/-Leu medium with or without 3-amino-1,2,4-triazole (3AT; 0, 30, 60, 100, 150, 200, 300, and 400 mM) at 30 °C for three days. The concentration of 3AT used for screening positive colonies was determined from the self-activation test of the pHIS2-GGNCCC bait vector. Colonies carrying pHIS2-GGNCCC + pGADT7-GmTCP670 were randomly selected, spotted on SD/-Trp/-Leu/-Ura medium containing an appropriate concentration of 3AT, and cultivated for 3–5 days.

Transcription activation assays

Full-length GmTCP670 and three cDNA fragments (GmTCP670-I encoding amino acids 97–249, GmTCP670-II encoding amino acids 1–249, and GmTCP670-III encoding amino acids 97–375) were amplified by PCR using appropriate primers (GmTCP670-F/R, GmTCP670-I-F/R, GmTCP670-II-F/R, and GmTCP670-III-F/R; see Supplementary Table 1). Transcription activation assays were conducted using the yeast strain Y2HGold (Clontech), which contains HIS3 and ADE2 reporter genes regulated by distinct GAL4-responsive promoter elements. Purified PCR products were cloned into the pGBKT7 vector. The fusion plasmids and the pGADT7 vector were transformed into yeast strain Y2HGold. Yeast cells were selected by culturing on SD/-Trp-Leu and SD/-Trp-Leu-His-Ade media. The pGBKT7-53 and pGADT7-T plasmids were introduced into yeast Y2HGold cells as positive controls, whereas yeast cells containing the pGBKT7-Lam and pGADT7-T plasmids served as negative controls.

Yeast two-hybrid (Y2H) library assays

A cDNA library was constructed from α-subunit-deficient hypoallergenic soybean seeds at 18, 35, and 55 days after flowering using a Y2H library construction kit (Clontech). Screening for interacting proteins was performed according to the manufacturer’s instructions (Clontech). Transformants from the cDNA library were plated on SD/-Trp-Leu-His-Ade medium at 30 °C for 3–5 days. Yeast colonies were cultured on SD/-Trp-Leu-His-Ade medium containing X-α-Gal (20 µg/mL) and aureobasidin A (AbA; 125 mg/mL). Blue colonies were characterized by PCR and sequencing.

Y2H experiments were performed as previously described45. Full-length cDNAs of the candidate genes were amplified using PCR and subsequently cloned into the pGADT7 vector. The fusion plasmids and pGBKT7-GmTCP670-III were co-transformed into yeast strain Y2HGold. After selection at 30 °C, yeast colonies grown on SD/-Trp-Leu medium were transferred to SD/-Trp-Leu-His-Ade or SD/-Trp-Leu-His-Ade/X-α-Gal/AbA media. Yeast Y2HGold cells containing pGBKT7-53 and pGADT7-T served as positive controls, whereas cells co-expressing pGBKT7-Lam and pGADT7-T were used as negative controls.

Statistical analysis

Data are presented as mean ± standard deviation. After conducting a one-way analysis of variance, a post-hoc Duncan’s multiple range test was performed to evaluate the differences among the group means using IBM SPSS Statistics software (version 22.0), with a significance level set at P < 0.05.
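
For readers who want to see the arithmetic behind the first step of this analysis, the sketch below computes group means with standard deviations and the one-way ANOVA F statistic using only the Python standard library. It is not the SPSS procedure used in the study, it does not compute the P value, and it does not implement Duncan’s multiple range test; the function name and the example data are hypothetical.

```python
# Sketch: mean +/- SD per group and the one-way ANOVA F statistic.
# Not the SPSS workflow from the text; the post-hoc Duncan test is not implemented.
from statistics import mean, stdev


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of numeric groups."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    # Variation of group means around the grand mean, weighted by group size.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of observations around their own group mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical measurements for three groups (e.g. control, OE, and CC lines).
data = {"control": [1, 2, 3], "OE": [2, 3, 4], "CC": [5, 6, 7]}
for name, values in data.items():
    print(f"{name}: {mean(values):.2f} ± {stdev(values):.2f}")
print(one_way_anova_f(list(data.values())))
```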