Abstract
Flowering at the appropriate time is one of the critical traits to determine regional adaptation and yield of soybeans. Although several genes in regulating flowering time have been identified in soybeans, the epigenetic regulators are still largely unknown. Here, we identify an early flowering mutant (ef1) and clone the causal gene as lysine-specific demethylase like 2 (GmLDL2). Enzymatic activity assay shows that GmLDL2 has demethylase activity towards H3K4me1 and H3K4me2. Integrative ChIP-seq and RNA-seq analyses demonstrate that GmLDL2 directly inhibits the expression of GmFER, an ortholog of Arabidopsis FERONIA. Genetic analysis reveals that GmLDL2 regulates soybean flowering in a GmFER-dependent manner. Interestingly, both GmLDL2H1 and GmLDL2H2 haploids display distinct demethylase activity and likely facilitate the adaptation of soybeans to low- and high-latitude regions, respectively. Together, our findings reveal the GmLDL2-GmFER module in regulating soybean flowering and provide useful information for manipulating the regional adaptability of soybeans.
Similar content being viewed by others
Introduction
Flowering in angiosperms is referred to as the transition from vegetative to reproductive growth, which ensures the success of propagation and crop yield1. This process is strictly regulated by various endogenous and environmental cues, among which day length (photoperiod) change is one of the critical factors to induce flowering at a favorable time. Based on the response to photoperiod, plants can be classified into three types: long-day, short-day, and day-neutral2. The general process of photoperiod-induced flowering is conserved between long-day and short-day plants. First, the photoreceptor in leaves tracks the photoperiod and integrates the information into the circadian clock to generate a flowering signal called florigen, which then transports to the shoot apex and activates a set of floral identity genes, thus triggering flowering3,4,5,6. Aside from those, there are multiple additional layers of post-transcriptional modifications that regulate flowering time in a photoperiod-dependent manner6,7,8,9.
Soybean (Glycine max) is one of the most important crops in the world. It contains high protein and oil contents, as well as a potent nitrogen fixation ability. As a typical short-day crop, soybean flowering is strictly regulated by photoperiod, which has a great impact on the adaptation of elite cultivars to a broad range of latitudes10. Current evidence supports the idea that cultivated soybean has been domesticated and cultivated for more than 5000 years from the temperate regions in China11, whereas artificial selection in combination with natural variation of critical quantitative trait loci (QTLs) controlling flowering and maturity led to the adaption of soybeans to the regions ranging from 54˚ N to 35˚S around the world12. Among those major QTLs, several of them that contribute to the flowering and maturity have been identified, which include E1 to E11, J, LUXs (LUX ARRHYTHMO), and Tofs (Time of flowering)3. E1, E3, E4, Tof4, Tof5, Tof8, Tof11, Tof12, and Tof13 contribute to the adaptation of soybeans to high-latitudes13,14,15,16,17,18,19; while J, LUXs, Tof16, and Tof18 facilitate their adaptation to low-latitudes20,21,22,23,24,25. In particular, E1 is a leguminous-specific transcription factor and was thought to be a repressor of photoperiodic responses in soybeans26. The long-juvenile gene J is the ortholog of EARLY FLOWERING 3 (ELF3), which forms the circadian clock evening complex (EC) together with ELF4 and LUX to suppress the transcription of E120,22,24,25. FLOWERING LOCUS T 2a (FT2a) and FT5a are two key FT genes whose expressions are repressed by E1 family members27. E3 and E4 are the homologs of photoreceptor phytochrome A (phyA), which can physically interact with EC or E1 to modulate their protein stability20. In a word, the phyA-EC-E1-FT pathway is primarily responsible for the photoperiodic flowering of soybeans and shows a profound impact on their adaptation ability. On the other hand, epigenetic regulators for flowering in soybeans remain largely unknown when compared to the well-studied epigenetic regulation mechanisms in the model plant Arabidopsis thaliana.
Epigenetic regulations refer to the regulation of gene transcription without altering the DNA sequence, which includes DNA methylation, histone modification, histone variants, and noncoding RNAs28. Histone modification refers to the histone methylation and demethylation at various lysine residues of the histone tail. Histone methylation is regulated by a set of protein methyltransferases; while histone demethylation is catalyzed by histone demethylases29. Human lysine-specific demethylase 1 (HsLSD1) was the first histone demethylase to be identified and demethylates H3K4 mono-methylation (H3K4me1) and H3K4 di-methylation (H3K4me2)30,31,32. In the model plant Arabidopsis, there are four HsLSD1 homologs, including LSD-LIKE1 (LDL1), LDL2, LDL3, and FLOWERING LOCUS D (FLD). All of them have the activity to demethylate H3K4 and participate in regulating the transition to flowering33,34,35,36. In the fld mutant, the increased H3K4me1 at the FLC locus (FLOWERING LOCUS C) results in the upregulation of FLC, which in turn suppresses the flowering33. Similarly, H3K4 methylation of the FLC locus is also regulated by LDL1 and LDL2, which are functionally redundant with FLD. Moreover, LDL1 and LDL2 also regulate the H3K4 methylation at FLOWERING WAGENINGEN (FWA, a floral repressor)34. Unlike the delayed flowering of fld and ldl1/ldl2 mutants, the flowering time of the ldl3 mutant is expedited35, suggesting that they have divergent roles in regulating the flowering. Heading data 18 (Hd18), which is a rice homolog of FLD, induces rice flowering under both short- and long-day conditions37. It was demonstrated in a previous study that overexpression of GmFLD complemented the late-flowering phenotype of the Arabidopsis fld mutant38. GmLDL2 was identified as a homolog of Arabidopsis histone demethylase LDL238,39. However, the function and characteristics of GmLDL2 in soybeans is still largely unknown.
In this study, we perform a forward genetic screen and characterize an early-flowering soybean mutant ef1 (early flowering 1). We further demonstrate that GmLDL2 is the target gene of ef1. Enzymatic activity assays demonstrate that GmLDL2 displays demethylase activity to H3K4me1 and H3K4me2. Furthermore, RNA-seq and chromatin immunoprecipitation (ChIP)-seq analyses reveal that GmLDL2 directly binds to the ortholog of Arabidopsis FERONIA (GmFER) to regulate its transcription by altering the levels of H3K4me1 and H3K4me2 at 3’UTR and downstream regions. Consistently, genetic analyses indicate that GmLDL2 negatively regulates soybean flowering via repressing the expression of GmFER. Finally, haplotype analyses suggest that different alleles of GmLDL2 undergo artificial selection in adaptation to low- or high-latitudes. Together, the study offers a potential target for improving regional adaptability by manipulating molecular design breeding of soybeans.
Results
Screening an early-flowering soybean mutant ef1 and identification of the causal gene
To identify other genes involved in the flowering time control of soybeans, we performed a forward genetic screening of progeny from soybean cultivar Huaxia3 (HX3) mutagenized with 60Co γ-Ray. An early-flowering mutant, designated as ef1 was isolated. The flowering time of ef1 mutant is significantly earlier than HX3 under both short-day and long-day conditions (Fig. 1a, b). In the field experiment, compared with HX3, the maturity time of ef1 mutant is also earlier (Fig. 1c). To verify the earlier flowering phenotype of the ef1 mutant under natural conditions, HX3 and ef1 were planted from January to December in the field and it was found that the flowering time of ef1 is earlier than HX3 for all sowing dates (Fig. 1d). These data demonstrated that the early flowering phenotype of ef1 is very stable. We also measured several yield-related traits of the ef1 mutant, including plant height, node number, valid branch number, pod number, one hundred seed weight, and grain weight per plant. Compared with the wild type, the mutant exhibits a slight reduction in pod number and grain weight per plant (Supplementary Table 1).
a Phenotype of whole plants and flowers of HX3 and ef1 under short-day conditions. ef1 showed early flowering compared with HX3. White arrow, magnified view. Scale bars, 10 cm. b The flowering time of HX3 and ef1 under SD (11 h/13 h) and LD (15 h/9 h) conditions. c The maturity time of HX3 and ef1 in the field conditions of Guangzhou (Summer). d Flowering time of HX3 and ef1 relative to sowing dates throughout the year in field conditions. e Frequency distributions of flowering time in the F2 progeny of a cross between HX3 and ef1 under field conditions. f SNP-index distribution along chromosomes. Red regression lines were obtained by averaging SNP indices from a moving window of five consecutive SNPs and shifting the window 200 kb at a time. The Y-axis value of each average SNP index was set at a midpoint between the first and fifth SNP. g Fine mapping of ef1 using an F2 population from LNHM (a landrace form Guangdong) × ef1 candidate gene. Numbers of recombination refer to the number of heterozygous plants at a physical location. The candidate region was located in an 860 kb region spanned by SNP2 and SNP4, in which only Glyma.06G250100 harbors a deletion according to resequencing. The red square indicates ef1 has a single-nucleotide deletion at this site. ATG is the start codon; TAA is the stop codon; UTR untranslated region, CDS coding sequence. h Comparison of AtLDL2 and GmLDL2. Pink and gray squares represent the SWRIM and amine oxidase domains, respectively. Data represent means ± SD (n = 10 plants) in (b–d). The P value was generated from a two-sided Student’s t-test. Source data are provided as a Source Data file.
To identify the causative gene responsible for the early flowering in ef1, we crossed ef1 with HX3 to generate an F2 population with 119 individual plants. Among these plants, 87 showed the flowering time comparable to wildtype, while 32 exhibited early flowering phenotype (Fig. 1e). The segregation ratio of early and regular flowering time was about 1:3 (χ2 = 0.226, P = 0.63) in the F2 population, indicating that the ef1 phenotype is controlled by a single recessive allele. To map the locus, we re-sequenced the genome of the mutant and wildtype pooled from the F2 segregation population and calculated a SNP index using the bulk-segregant analysis (BSA) method40. Linkage analysis with SNP index distribution indicated that the 0.5 Mb region between 23,068,949 and 42,300,283 on chromosome 6 co-segregated with the early flowering phenotype of ef1 (Fig. 1f and Supplementary Table 2). To further narrow down the candidate interval in chromosome 6, we generated another F2 population with 95 plants by crossing ef1 with another soybean accession, LNHM (Liannanhuimao), which is a later-flowering landrace variety from Guangzhou, China. The flowering time of LNHM is significantly later than ef1 under the long-day conditions (Supplementary Fig. 1a, b), and the F2 progeny derived from LNHM × ef1 cross shows distinct comparable flowering time (Supplementary Fig. 1c). Through linkage analysis, we successfully narrowed down the candidate gene to an 850 kb region in chromosome 6 between maker SNP2 and SNP4 using the variations (SNPs or InDels) between LNHM and ef1 based on the resequencing (Fig. 1g). By comparing resequencing data between HX3 and ef1, we identified one gene Glyma.06G250100, with a 1 bp deletion at the first exon in ef1 (Fig. 1g). Sequence alignment showed that Glyma.06G250100 was previously named GmLDL2 encoding a homolog of Arabidopsis histone demethylase LDL2 (Supplementary Fig. 2)38,39. We therefore designated ef1 as GmLDL2 (named ef1 as Gmldl2-1) in soybean, and the 1 bp deletion resulted in a frameshift that led to a premature stop codon to produce a putative truncated protein lacking an amine oxidase-like domain (Fig. 1h). Taken together, these results provide evidence that the GmLDL2 mutation could be the causal lesion responsible for the early flowering phenotype observed in ef1.
To investigate the expression pattern of GmLDL2 in different tissues, we conducted reverse transcription quantitative PCR (RT-qPCR) to examine the transcription levels of GmLDL2 and found that the expression level of GmLDL2 in leaves is much higher than in other tissues tested (Supplementary Fig. 3a). Furthermore, we monitored the periodicity of GmLDL2 expression at a 4-h interval over a period of 48 h. No diurnal rhythm was observed for GmLDL2 expression either under short-day or long-day conditions (Supplementary Fig. 3b, c). In addition, the expression of GmLDL2 fused with GFP driven by the CaMV 35S promoter showed that the protein is localized in the nucleus (Supplementary Fig. 3d), which is consistent with the previous finding38.
Functional validation of GmLDL2 in controlling soybean flowering
To further verify that the early flowering of ef1 was indeed due to the loss-of-function of GmLDL2, we introduced the coding sequence (CDS) of GmLDL2 driven by the CaMV 35S promoter in the ef1 mutant background. Two independent representative transgenic lines were selected based on qRT-PCR and western blotting assays. Both the GmLDL2 transcript and its protein levels were significantly higher in the two transgenic lines compared to the ef1 mutant (Supplementary Fig. 4a, b). Consistently, the GmLDL2 complementary transgenic lines exhibited later flowering time compared to the ef1 mutant under either short-day or long-day conditions (Fig. 2a, b). Furthermore, we conducted CRISPR/Cas9 to generate two Gmldl2 alleles in the HX3 background. Two small guided RNA target regions (target 1 and 2) in the first exon of GmLDL2 were designed (Fig. 2c), which resulted in two independent GmLDL2 mutation lines (designated as Gmldl2-2 and Gmldl2-3), in which Gmldl2-2 contains a 5 bp deletion at target 1, and Gmldl2-3 contains a 2 bp deletion at target 2 (Fig. 2c). The Gmldl2-2 encodes a predicted protein with 23 amino acids because the 5 bp deletion led to a frameshift and premature stop codon. Gmldl2-3 harbors a truncated protein with 207 amino acids (Fig. 2d). Subsequently, HX3, ef1, Gmldl2-2, and Gmldl2-3 seeds were planted in the growth chamber under short-day and long-day conditions, and their flowering time were compared. Both Gmldl2-2 and Gmldl2-3 plants displayed early flowering phenotypes compared to HX3, similar to ef1 (Fig. 2e, f). We also examined the expression levels of soybean flowering marker genes GmFT2a and GmFT5a27, and found that the expression of both GmFT2a and GmFT5a were upregulated in Gmldl2 mutant compared to HX3 (Fig. 2g, h). Collectively, these data confirm that GmLDL2 is the causal gene for the ef1 mutant.
a The phenotype of whole plants and flowers of HX3, ef1, and two GmLDL2 overexpressed transgenic lines under short-day conditions. The two GmLDL2 complementary lines showed later flowering compared to ef1. White arrow, magnified view. Scale bars, 10 cm. b The flowering time of HX3, ef1, and two GmLDL2 overexpressed transgenic lines under SD (11 h/13 h) and LD (15 h/9 h) conditions. c The design of sgRNAs for CRISPR/Cas9 to knock out GmLDL2. Two sgRNAs targeting the CDS of GmLDL2 are labeled in red. Results of Sanger sequencing of target sites in T1 generations in HX3 and Gmldl2 mutants. sgRNA target and PAM sequences were shown in red and blue, respectively. UTR untranslated region. d Predicted amino acid sequences of GmLDL2 in HX3 and Gmldl2 mutants. Missense amino acids resulting from the frameshift caused by CRISPR/Cas9-induced mutations are shown in red. The asterisk indicates the termination of translation resulting from a stop codon. e The phenotypes of whole plants and flowers of HX3, ef1, Gmldl2-2, and Gmldl2-3 under short-day conditions. ef1, Gmldl2-2, and Gmldl2-3 showed early flowering compared to HX3. White arrow, magnified view. Scale bars, 10 cm. f The flowering time of all varieties examined under SD (11 h/13 h) and LD (15 h/9 h) conditions. g, h The expression levels of GmFT2a and GmFT5a in HX3, ef1, Gmldl2-2, and Gmldl2-3 at ZT4 in SD (11 h/ 13 h) conditions. Data represent means ± SD (n = 10 plants) in (b and f), and data represent means ± SD (n = 3 biological replicates) in (g, h). The P value was generated from a two-sided Student’s t-test. Source data are provided as a Source Data file.
The histone demethylase activity assay of GmLDL2
GmLDL2 is phylogenetically and structurally similar to the human HsLSD1 and Arabidopsis AtLDL2 (Fig. 1h and Supplementary Fig. 2), both of which were demonstrated to exhibit demethylase activity at H3K4me1 and H3K4me232,34. To test whether GmLDL2 also has histone demethylase activity, we measured the enzymatic activity of GmLDL2 in vitro. Similar to human HsLSD1, affinity-purified recombinant protein His-GmLDL2 exhibited demethylation activity for H3K4me1 and H3K4me2, but not for H3K4me3, H3K9me2, H3K27me1, or H3K36me3 (Fig. 3a, b). Furthermore, we transiently expressed 35Spro:GmLDL2-GFP, 35Spro:HsLSD1-GFP, and 35Spro:GFP in tobacco leaves and examined their levels of histone modification. It was found that both human LSD1 and soybean LDL2 displayed significant reduction in H3K4me1 and H3K4me2 signals as compared with that in the empty vector control, whereas di-, mono- and trimethylation levels at H3K9, H3K27, and H3K36 were unaffected compared to 35Spro:GFP expressing cells (Fig. 3c, d, Supplementary Fig. 5). We also examined the histone demethylase activity of GmLDL2 in vivo in HX3 and Gmldl2 mutant plants. Unexpectedly, western blot data showed no significant difference in H3K4me1/2/3, H3K9me2, H3K27me1, and H3K36me3 among HX3 and Gmldl2 mutants (Supplementary Fig. 6), indicating GmLDL2 does not affect the change of global level of H3K4me1/2 in vivo. Therefore, we speculate that mutation of GmLDL2 may only affect the H3K4me1/2 methylation levels at certain loci rather than global modification across the chromosome. Together, these results demonstrated that GmLDL2 acts as a histone demethylase for H3K4me1/2 in soybean.
a The affinity-purified recombinant His-GmLDL2 from E.coli was incubated with calf thymus histone for H3K4me1/2 demethylase activity assay, and the histone methylation status was determined using methylation-specific antibodies. His-HsLSD1 was used as the positive control, and anti-H3 was used as a loading control. The experiment was repeated three times independently with similar results. b Quantification analysis of (a) by ImageJ. Data represent means ± SD (n = 3 biological replicates). The P value was generated from a two-sided Student’s t-test. c GmLDL2-GFP exhibits H3K4me1/2 demethylase activity in N.benthamiana leaves. GmLDL2-GFP expression in N.benthamiana nuclei (arrows) was visualized by monitoring GFP fluorescence (green). The histone methylation status was analyzed by immunostaining with specific histone methylation antibodies (red). The location of nuclei was visualized using 4’,6-diamidine-2-phenylindole staining (blue). Scale bars, 50 μm. d Statistical analysis of the levels of fluorescent signals in (c) and Supplementary Fig. 5. Data represent means ± SD (n = 3 biological replicates). The P value was generated from a two-sided Student’s t-test. At least 200 nuclei expressing GFP, HsLSD1-GFP, or GmLDL2-GFP were analyzed in each biological replicate. Source data are provided as a Source Data file.
Transcriptomic analyses of GmLDL2-mediated gene expression
To further investigate gene transcription regulated by GmLDL2, two biological replicates of transcriptome sequencing were performed using leaves from HX3 and ef1. Briefly, clean reads were mapped to the Williams 82 reference genome using HISAT2 software41. There were 90.83% and 88.25% mapping ratios in two replicates of HX3, and 88.53% and 91.91% in two replicates of ef1 (Supplementary Table 3). RNA-seq data show a good correlation between replicates of both HX3 and ef1 (Correlation Coefficient: R2 = 0.9072 in HX3 and R2 = 0.9392 in ef1) (Supplementary Fig. 7). Analyses of differentially expressed genes using DESeq2 software42 showed that, compared with HX3, 440 and 372 genes were upregulated and downregulated in ef1 respectively (Fig. 4a and Supplementary Data 1). Gene Ontology (GO) enrichment analysis showed that upregulated genes were mainly involved in kinase activity, transferase activity, DNA-binding transcription factor activity, and transcription regulator activity (Fig. 4b). While the downregulated genes were mainly enriched in molecular function, binding, extracellular region, and cell wall (Fig. 4c). Meanwhile, the KEGG analysis showed that upregulated genes tend to associate with fatty acids, linoleic acid metabolism, monoterpene biosynthesis, zeatin biosynthesis, alkaloids biosynthesis, isoflavone biosynthesis, and photosynthesis antenna protein pathways. Downregulated genes were predominantly enriched in isoquinoline alkaloid biosynthesis, limonene and pinene degradation, and isoflavonoid biosynthesis pathways (Supplementary Fig. 8). Notably, there is no significant difference in expression levels between HX3 and ef1 for the principal flowering time-associated genes, such as E1, E3, E4, PRRs, and LUXs (Fig. 4d), whose expression was further validated by RT-qPCR analysis (Supplementary Fig. 9). These results suggest that GmLDL2 may regulate flowering time in the classical EC-E1 independent manner.
a Volcano plot showing differentially expressed genes in HX3 and ef1. Upregulated and downregulated genes in ef1 compared with HX3 are highlighted in red and blue, respectively. Differential expression analysis was performed using DESeq2 with its default Wald test (two-sided). P values were adjusted for multiple comparisons using the Benjamini–Hochberg method to control the false discovery rate (FDR). Genes with FDR < 0.05 were considered significantly differentially expressed. b and c Enriched Gene Ontology terms in the 440 genes upregulated or 372 genes downregulated in ef1, respectively. GO enrichment analysis was conducted using GOATOOLS with default parameters. Fisher’s exact test (two-sided) was used to assess enrichment significance, and P values were adjusted for multiple testing using the Benjamini–Hochberg correction. The outer circle indicates the classification of GO terms, and the middle circle represents gene number of each GO in the soybean genome background, and the inner circle represents the number of differentially expressed genes. P-values are presented in purple. d Heat map of expression profiles of known flowering control genes in HX3 and ef1. Source data are provided as a Source Data file.
ChIP-seq analyses of the GmLDL2-binding sites
To investigate the direct genome-wide binding sites of GmLDL2 in vivo, we performed chromatin immunoprecipitation followed by sequencing (ChIP-seq) with anti-GmLDL2 antibody using leaves of HX3 and ef1 (as a negative control) at the same stage as those in RNA-seq. The GmLDL2 antibody was verified by western blot, which can detect the GmLDL2 protein in HX3 rather than Gmldl2 mutants (Supplementary Fig. 10). Libraries from two biological replicates were sequenced, with highly reproducible confirmed by Spearman correlation (ρ > 0.90) and principal component analysis (PCA) (Supplementary Fig. 11a, b and Supplementary Fig. 12). In total, we identified 27,840 reliable peaks (q value < 0.01) entire the whole genome using merged anti-GmLDL2 ChIP-seq data from two biological replicates (Supplementary Table 4).
Given that GmLDL2 is a histone demethylase and can associate with chromatin, we performed ChIP-seq experiments using the anti-H3K4me1/2/3 antibodies to determine genome-wide H3K4 methylation levels in HX3 and ef1, with two replicates for each condition (the same stage sampled for RNA-seq). Similar to the anti-GmLDL2 ChIP-seq data, we observed a strong correlation between biological replicates (Supplementary Fig. 11c–h and Supplementary Fig. 12) and identified thousands of reliable peaks across the whole genome (Supplementary Table 4). Genomic annotations and statistical analysis of these peaks revealed that the mono-, di-, and tri-H3K4 methylation are highly enriched in the genic regions, but their distribution patterns are different. The tri-H3K4 methylation are enriched in the transcription start site (TSS) and gene body regions close to TSS, whereas the H3K4me1 and H3K4me2 are present at relatively low levels at the TSS and more broadly distributed across the gene body compared to H3K4me3, in both HX3 and ef1 (Fig. 5a and Supplementary Fig. 13), which is consistent with previous studies43,44. Especially, H3K4me2 was observed slightly higher levels around transcription end site (TES), in contrast to H3K4me1 and H3K4me3 (Fig. 5a). This is a possible reason for the lack of significant change in the global distribution of the three histone methylations on the coding genes in ef1, further supporting the hypothesis that GmLDL2 may only affect H3K4 methylation at specific loci. This hypothesis was also supported by the observation of a small proportion of genes with hyper H3K4me1/2 levels exhibiting exceptionally high enrichment of anti-GmLDL2 signals (Supplementary Fig. 14). We also analyzed both hyper- and hypo-methylated regions of H3K4me1 and H3K4me2 in the ef1 mutant, and identified the associated genes that overlapped with GmLDL2-bound genes. By comparing these differentially methylated gene sets (with increased or decreased H3K4me1/2 levels) with GmLDL2 ChIP-seq data, we found that the proportion of GmLDL2-bound genes was comparable between the hyper- and hypo-methylated groups (Supplementary Fig. 15).
a Normalized read density of H3K4me1, H3K4me2, and H3K4me3 signals at gene body regions, as well as upstream 1.5 kb and downstream 1.5 kb, lines show the average value of all genes in HX3 and ef1. b H3K4me1, H3K4me2, and H3K4me3 reads density among three groups of genes with different expression levels in HX3 and ef1. Gene body regions were divided into equal bins, upstream 1.5 kb of TSS and downstream of TES were calculated per 10 bp sliding window by read density. Metaplots represent the average of the normalized read density of genes in each group. Heatmaps show the distribution of read density of all genes. Genes in each group were sorted from high to low by gene expression levels (FPKM). TSS transcription start site, TES transcription end site. c Venn diagram showing the overlap among genes that are transcriptionally upregulated in ef1, H3K4me1 and H3K4me2 hypermethylated genes in ef1, and GmLDL2 binding genes. d Genome tracks of RNA-seq, anti-GmLDL2, anti-H3K4me1, and anti-H3K4me2 ChIP-seq data for GmFER locus in HX3 and ef1. Y-axis value means normalized read counts. Structure of GmFER. P1 to P4, specific regions used for ChIP-qPCR analysis. UTR untranslated region, CDS coding sequence. Gray block showing the higher H3K4me1 and H3K4me2 modification levels in ef1 than HX3. e qRT-PCR analysis of GmFER expression level. GmActin was used as an internal control. f, g ChIP-qPCR analysis of H3K4me1 and H3K4me2 methylation status at the GmFER locus. h ChIP-qPCR analysis of GmLDL2 associated with GmFER locus. Data represent means ± SD (n = 3 biological replicates) in (e–h). The P value was generated from a two-sided Student’s t-test. Source data are provided as a Source Data file.
Identification of GmLDL2-specific target genes
To identify the specific target genes of GmLDL2, we investigated the relationship between H3K4 methylation levels in the transcribed region and gene expression levels. A total of 52,872 genes annotated in the soybean genome were sorted from high to low by gene expression levels and genes with expression levels greater than 1 were then divided into three groups, composed of highly expressed genes (FPKM > 40), median expressed genes (10 < FPKM ≤ 40), low expressed genes (1 < FPKM ≤ 10). The results showed that the reads density of H3K4me1, H3K4me2, and H3K4me3 modifications all ranked from high to low among three groups of genes accompanied by gene expression values, indicating that H3K4me1/2/3 positively correlated with the levels of gene expression (Fig. 5b). To identify the target genes of GmLDL2, we screened the whole genome per 200 bp regions and identified regions that exhibited H3K4me1 and H3K4me2 changes that are more 1.5 folds higher in ef1 compared to HX3. In the ef1 mutant, H3K4me1 and H3K4me2 levels increased at least 0.5-fold in 3053 and 7946 regions overlapping with genes, respectively (Supplementary Data 2, 3). HyperH3K4me1 and hyperH3K4me2 peaks were both significantly overrepresented in the promoter and distal intergenic regions (Supplementary Fig. 16).
Genes overlapping with these hyperH3K4me1 and hyperH3K4me2 regions, as well as peaks from anti-GmLDL2 ChIP-seq data, were considered as candidate target genes of GmLDL2. Given that the H3K4me1 and H3K4me2 are strongly correlated with transcription levels (Fig. 5b), we compared these candidate target genes of GmLDL2 with upregulated genes in ef1 and found that only one gene (Glyma.18G270900) exhibited increased levels of H3K4me1 and H3K4me2, and also overlapped with GmLDL2 binding site (Fig. 5c, Supplementary Table 5, and Supplementary Data 4, 5). Glyma.18G270900 (GmFER) is a flowering-associated gene encoding a protein similar to Arabidopsis FERONIA45. Specifically, GmFER expression in ef1 exhibited a fivefold increase in the RNA-seq data, also showing a higher level of H3K4me1/2 at the 3’UTR and gene downstream regions (Fig. 5d). RT-qPCR and ChIP-qPCR validated the upregulated expression and increased H3K4me1/2 levels of GmFER in ef1 compared with the HX3 (Fig. 5e–g). Moreover, ChIP-qPCR assay using antibodies against GmLDL2 confirmed that GmLDL2 was directly associated with the chromatin region of GmFER (Fig. 5h). Together, these results indicate that GmLDL2 negatively regulates GmFER transcription by directly binding to its chromatin to change H3K4me1 and H3K4me2 levels.
GmLDL2 negatively regulates soybean flowering in a GmFER-dependent manner
To test the in vivo function of GmFER in soybeans, we first transformed the construction of GmFER fused to 3×Flag tags driven by the CaMV 35S promoter (35Spro:GmFER-Flag) into wild-type (Col) Arabidopsis plants. Three independent transgenic lines were selected (Supplementary Fig. 17a). Western blotting using an anti-Flag antibody demonstrated that the recombinant GmFER protein was expressed in all three lines (Supplementary Fig. 17b). In comparison to Col, the transgenic plants displayed significantly earlier flowering time under long-day conditions (Supplementary Fig. 17c, d). Such a phenomenon is consistent with the FER function in Arabidopsis flowering45, indicating that FER orthologs are relatively conserved in controlling the flowering time. Furthermore, we transformed the 35Spro:GmFER-Flag construct into soybean and obtained two independent transgenic lines (GmFER-OE) (Fig. 6a). These plants showed obviously increased levels of transcripts and protein of GmFER compared to HX3 (Fig. 6b, c). Subsequently, HX3 and transgenic plants were grown in climate chambers under short-day or long-day conditions. Both GmFER-OE transgenic lines showed significantly earlier flowering time relative to HX3 (Fig. 6d). Moreover, we searched the EMS-induced soybean mutant library46 and found three independent mutant lines of GmFER: NJAU0800, NJAU1719, and NJAU1817. NJAU0800 accommodates a G to A substitution in the coding region, resulting in an early stop at its 661 amino acid residue and the loss of a part of its kinase domain (Supplementary Fig. 18a, b). NJAU1719 harbors a single base substitution (G to A), and this missense SNP is located in the kinase domain, replacing glycine (Gly) with an arginine (Arg) (Supplementary Fig. 18a, b). NJAU1817 has the missense SNP (T to A) that is located in the malectin domain, substituting leucine (Leu) with a methionine (Met) (Supplementary Fig. 18a, b). To test the phenotypes of these soybean FER mutant alleles, we planted them in a growth chamber to measure their flowering time under long-day conditions, all three Gmfer mutant lines displayed delayed flowering time when compared to the control plants (Supplementary Fig. 18c, d). Taken together, these results indicate that GmFER promotes flowering in soybeans.
a The phenotypes of whole plants and flowers of HX3 and two GmFER-OE transgenic lines under short-day conditions. The two independent GmFER-OE transgenic plants showed early flowering compared to HX3. White arrow, magnified view. Scale bars, 10 cm. b The expression level of GmFER in HX3 and GmFER-OE transgenic plants. GmActin was used as an internal control. c Validation of GmFER expression in GmFER-OE transgenic plants by western blot with anti-Flag antibody. Actin serves as the loading control. d The flowering time of HX3 and GmFER-OE plants under SD (11 h/13 h) and LD (15 h/9 h) conditions. e The design of sgRNAs for CRISPR/Cas9 to knock out GmFER in the Gmldl2 mutant background. sgRNAs targeting CDS are labeled in red. Results of Sanger sequencing of target sites in T1 generations in HX3 and Gmldl2/fer double mutants. sgRNA target and PAM sequences were shown in red and blue, respectively. UTR untranslated region. f Predicted amino acid sequences of GmLDL2 and GmFER in HX3 and Gmldl2/fer double mutants. Missense amino acids resulting from the frameshift caused by CRISPR/Cas9-induced mutations are shown in red. The asterisk indicates the termination of translation resulting from a stop codon. g The phenotypes of whole plants and flowers of HX3, ef1, Gmldl2-2/fer, and Gmldl2-3/fer under long-day conditions. All Gmldl2/fer double mutants showed later flowering compared to ef1. White arrow, magnified view. Scale bars, 10 cm. h The flowering time of all varieties examined under SD (11 h/13 h) and LD (15 h/9 h) conditions. Data represent means ± SD (n = 3 biological replicates) in (b), and data represent means ± SD (n = 10 plants) in (d, h). The P value was generated from a two-sided Student’s t-test. Source data are provided as a Source Data file.
To determine whether the early flowering phenotype of the ef1 mutant is associated with increased GmFER expression, we generated Gmldl2/fer double mutant via CRISPR/Cas9 technology in the Gmldl2 mutant background. Two targets were designed: target 3 and 4 for editing GmFER (Fig. 6e). After stable transformation, two independent Gmldl2/fer double mutants were obtained (designated as Gmldl2-2/fer and Gmldl2-3/fer). Both Gmldl2-2/fer and Gmldl2-3/fer contain a 6 bp deletion at target 3, resulting in the loss of two amino acids at positions 146 and 147 in the malectin domain of GmFER (Fig. 6e, f). The alanine residue at position 146 is highly conserved among FER homologs (Supplementary Fig. 19), implying its functional importance. Then we planted HX3, ef1 mutant, and Gmldl2/fer double mutant plants under short-day or long-day conditions to measure their flowering. The Gmldl2-2/fer and Gmldl2-3/fer double mutants displayed a significantly delayed flowering time compared to the Gmldl2 single mutant under both SD and LD conditions (Fig. 6g, h), indicating that the Gmfer mutation suppressed the early flowering phenotype of the Gmldl2 mutant. In summary, these data strongly support the conclusion that GmLDL2 acts as a negative regulator of soybean flowering by directly modulating the transcription of GmFER.
Distinct GmLDL2 alleles drive soybean regional adaptation
Flowering time is a critical trait during soybean domestication and has enabled soybeans to expand from their origin to other regions. Several genes contributing to this trait have been demonstrated to undergo natural or artificial selection13,15,16,21,23,24, which led us to investigate whether GmLDL2 could be a domestication gene. First, we estimated the nucleotide diversity across the 2 Mb genomic region by sliding window per 5 kb spanning GmLDL2 using a public resequencing dataset consisting of 2898 soybean varieties from around the world, including 103 wild soybeans, 1048 landraces, and 1747 improved cultivars47. Specifically, we found that GmLDL2 is located in a region with a high signature of fixation index (FST = 0.26) between wild and landrace soybean populations (Fig. 7a), indicating that this gene might contribute to the early population differentiation. Meanwhile, we found that the wild soybean population exhibited a high nucleotide diversity (π) of 3.30e-3 in this region, whereas the landrace and improved cultivar populations showed markedly lower diversity, with π values of 2.32e-3 and 0.99e-3, respectively (Supplementary Fig. 20a), indicating that the region had low heterozygosity and the abundance of common alleles. However, positive Tajima’s D values of 2.24, 3.36, and 0.37 were also observed in the region containing GmLDL2 in wild soybeans, landraces, and cultivars (Supplementary Fig. 20b). The high Tajima’s D values suggested that the landrace population may shrink as a result of balancing selection. These results suggest that GmLDL2 was selected during soybean domestication, leading to different alleles rapid accumulation in landraces and cultivars.
a The nucleotide variation indicated by Fst in wild soybeans, landraces, and improved cultivars across the 2 Mb genomic regions surrounding GmLDL2. b Proportions of GmLDL2 haplotypes within each of the three germplasm groups. c The geographical distributions of cultivar accessions harboring GmLDL2H1 and GmLDL2H2 across China. NE, Northeastern region of China; NR, Northern region of China; HR, Huanghuai region of China; SR, Southern region of China. d The demethylase activity assay of GmLDL2H1 and GmLDL2H2 in N.benthamiana leaves. GmLDL2H1-GFP and GmLDL2H2-GFP expression in N.benthamiana nuclei (arrows) was visualized by monitoring GFP fluorescence (green). The histone methylation status was analyzed by immunostaining with specific histone methylation antibodies (red). The location of nuclei was visualized using 4’,6-diamidine-2-phenylindole staining (blue). Scale bars, 50 μm. e Statistical analysis of the levels of fluorescent signals in (d). Data represent means ± SD (n = 3 biological replicates). The P value was generated from a two-sided Student’s t-test. At least 200 nuclei expressed GmLDL2H1-GFP or GmLDL2H2-GFP were analyzed in each biological replicate. f The affinity-purified recombinant His-GmLDL2H1 and His-GmLDL2H2 from E.coli were incubated with calf thymus histone for H3K4me1/2 demethylase activity assay, and the histone methylation status was determined using methylation-specific antibodies. His was used as the negative control, and anti-H3 was used as a loading control. The experiment was repeated three times independently with similar results. g Quantification analysis of (f) by ImageJ. Data represent means ± SD (n = 3 biological replicates). The P value was generated from a two-sided Student’s t-test. h–j Statistics of flowering time, plant height, and grain weight of soybean accessions carrying GmLDL2H1 (n = 18 accessions) and GmLDL2H2 (n = 46 accessions) under natural short-day conditions in Guangzhou and Sanya. The P value was generated from a two-sided Student’s t-test. In all the box plots, the center line indicates the median, the edges of the box represent the first and third quartiles, and the whiskers extend to the smallest and largest data points within 1.5 interquartile ranges from the edges. Source data are provided as a Source Data file.
To explore the evolutionary origin of the different alleles in GmLDL2, we examined variations of the GmLDL2 coding region and classified six unique high-confidence haplotypes based on four SNPs in the exon: one nonsynonymous SNP resulting in an arginine-to-leucine substitution at position 699 (R699L) and three synonymous SNPs (Supplementary Fig. 21a). To further investigate the evolutionary history of GmLDL2, we analyzed the percentages of different alleles in wild soybeans, landraces, and improved cultivars. The frequency of GmLDL2H1 allele was present in 57.4% of wild soybeans, whereas GmLDL2H2 was only 7.9%, indicating that GmLDL2H1 may represent the ancestral haplotype (Fig. 7b). Furthermore, the minimum spanning tree (MST) analysis indicates that GmLDL2H1 is centrally located and connected to other haplotypes with the fewest mutational steps (Supplementary Fig. 21b), suggesting it probably was the original haplotype in soybean ancestors. Moreover, both rooted phylogenetic tree analyses and conserved sequence alignment patterns across closely related species consistently support GmLDL2H1 as the ancestral haplotype of the GmLDL2 locus (Supplementary Fig. 21c, d). Interestingly, the frequencies of GmLDL2H1 decreased to 17.1% and 7.4% in the landraces and cultivars respectively, while the GmLDL2H2 frequencies increased to 66.8% and 87.0% in the landraces and cultivars, respectively (Fig. 7b). These results suggested that GmLDL2 is likely under strong artificial selection during domestication.
Next, we assessed the geographical distribution of the two major alleles (GmLDL2H1 and GmLDL2H2) within each subset of wild soybeans, landraces, and improved cultivars, as described above. The results indicate that the GmLDL2H1 and GmLDL2H2 exhibit contrasting patterns between wild and landrace accessions. In the wild soybean accessions, the frequency of GmLDL2H1 increased from 40% in the Huanghuai region (central China) to 100% in the Northeast and Southern regions. Conversely, in landrace varieties, it decreased from 37.3% to 26.7% across these regions (Supplementary Fig. 22). In the cultivar subgroup, 27.7% of accessions from the Huanghuai regions carried the GmLDL2H1 allele, and 72.3% of accessions carried the GmLDL2H2 allele. Interestingly, the frequency of the GmLDL2H1 allele increased to 40.5% in southern China, whereas the frequency of the GmLDL2H2 allele rose to 97.2% in high-latitude regions (Fig. 7c), indicating that those two alleles of GmLDL2 might contribute to the adaptation of cultivated soybeans to low- and high-latitudes, respectively. To examine the functional significance of the two major alleles of GmLDL2H1 and GmLDL2H2, we first predicted the three-dimensional structure of GmLDL2H1 and GmLDL2H2 and found them to be similar to each other (Supplementary Fig. 23), suggesting that both alleles do not alter the protein structure. We then assessed the enzymatic activity of GmLDL2H1 and GmLDL2H2 by luminescence assays with transient expression in tobacco leaves and western blot with histone methylation antibodies. It was found that GmLDL2H1 has higher histone demethylase activity than GmLDL2H2, especially for H3K4me2 (Fig. 7d–g and Supplementary Fig. 24).
GmLDL2 is a specific H3K4 demethylase and negatively regulates soybean flowering, whereas the demethylase activity towards H3K4me2 of GmLDL2H2 is significantly lower than that of GmLDL2H1. Therefore, we wondered whether the flowering time of the soybean accessions carrying the GmLDL2H1 allele differs from those harboring the GmLDL2H2 allele. To further evaluate the phenotypic effects of these two major haplotypes, we minimized the effects of natural variations in known soybean flowering-time genes, including E1-E4, J, LUXs, and Tof16. First, we selected the accessions with no variations in the CDS of E3, E4, LUXs, Tof18, GmFT2a, and GmFT5a. Then, we excluded the soybean accessions carrying known functional alleles, including the e1-as allele (E1 transaction variant causing early flowering)26, the HT1m mutation (J splice variant conferring later flowering)25, Tof11-1 to Tof11-10 (loss-of-function alleles of Tof11), and Tof12-1 to Tof12-4 (loss-of-function alleles of Tof12)16. Next, we characterized coding sequence polymorphisms in E2 and Tof16. Two exonic nonsynonymous SNPs defined three E2 haplotypes, while three Tof16 haplotypes were classified by one exonic nonsynonymous SNP and an in-frame deletion. To evaluate the phenotypic effects of the three haplotypes of E2 and Tof16, we compared flowering time differences among different haplotypes. The results showed that haplotype 2 of E2 flowered significantly later than the other two haplotypes, while both haplotype 2 and haplotype 3 of Tof16 flowered later than haplotype 1 (Supplementary Fig. 25). Consequently, we also excluded accessions with E2H2, Tof16H2, or Tof16H3. Finally, a total of 64 soybean accessions with nearly identical genetic backgrounds were selected to assess the functions of GmLDL2H1 and GmLDL2H2. Eighteen soybean accessions harboring GmLDL2H1 allele and 46 varieties carrying GmLDL2H2 were planted in Guangzhou (23°16′N, 113°23′E) and Sanya (18°15′N, 109°30′E), China. Six individual plants were measured for each variety, and as expected, varieties carrying GmLDL2H1 showed significantly later flowering time than those carrying GmLDL2H2 (Fig. 7h), along with increased plant height and grain weight per plant (Fig. 7i, j). Considering that GmLDL2-mediated regulation of soybean flowering time depends on GmFER, we wonder whether the expression levels and H3K4me1/2 modifications of GmFER contribute to the differences in flowering time between soybean carrying GmLDL2H1 and those carrying GmLDL2H2. To test this hypothesis, we measured the expression and H3K4me1/2 levels of GmFER in both GmLDL2H1 and GmLDL2H2 varieties. The results showed that the expression levels and H3K4me1/2 modifications of GmFER in GmLDL2H2 varieties are significantly higher than those carrying GmLDL2H1 (Supplementary Fig. 26). Taken together, we speculated that the GmLDL2H1 allele could be selected and may contribute to the adaptation of soybeans to low latitudes.
Discussion
Flowering at the appropriate time has a strong impact on the geographical adaptation and grain yield of crops, including soybeans. Previous studies have identified several critical regulators for flowering time control in soybean3, but environmental factors such as epigenetic regulators are rarely reported. In this study, we identified a GmLDL2 locus that acts as a flowering repressor through forward genetic screening. Cloning and functional characterization revealed that GmLDL2 encodes a histone demethylase, and inhibits flowering and maturity through suppressing the expression of GmFER by triggering the H3K4me1 and H3K4me2 demethylation at this locus, which is independent of the canonical EC-E1 pathway. Moreover, genomic and population genetic analyses revealed that distinct GmLDL2 alleles may promote flowering and improve latitude adaptation of soybeans.
GmLDL2 encodes a member of the amine-oxide domain-containing histone demethylase. We previously demonstrated that its orthologs GmFLD, GmLDL1a, and GmLDL1b all exhibit H3K4 demethylase activity39. Here, it was shown that GmLDL2 can specifically demethylate H3K4me1 and H3K4me2 (Fig. 3), similar to its orthologs. However, the global H3K4 methylation levels were not accumulated in the Gmldl2 mutant (Fig. 5a and Supplementary Fig. 6). It is plausible that GmLDL2 is not a master regulator for global histone H3K4me1 and H3K4me2 modification, or due to functional redundancy with other histone demethylase family members. Histone demethylase contains a large number of members, including JmjC-containing proteins and other LSD-like proteins. Instead, GmLDL2 may only modulate a small fraction of the genomic loci. In Arabidopsis, AtLDL2 acts redundantly with AtLDL1 in repressing FLC and FWA expression. Hence, the ldl2 single mutant plants display no significant phenotypic change34. By contrast, the flowering time of Gmldl2 mutant is significantly earlier compared to HX3 (Fig. 2), suggesting that LDL2 is divergent between soybean and Arabidopsis. It was reported that Arabidopsis LDL2 controls flowering time by repressing FLC and FWA expression via demethylating H3K4me1/234, while the expression of FLC-like and FWA-like genes were unaltered and no change was observed in H3K4 modification at those locus in soybeans (Supplementary Data 1–3), suggesting that LDL2 may have different targets across species.
Unlike H3K4me3 which is widely considered as an active marker, the relationship between gene transcription levels and H3K4me1/2 remains largely unclear in plants36,44,48,49,50. Here, our data showed that H3K4me1/2 levels were positively correlated with gene expression in soybeans (Fig. 5b), in combination with the GmLDL2 binding sites identified by ChIP-seq, we identified only one GmLDL2-targeted candidate gene of GmFER, which is consistent with the findings that upregulated genes in ef1 are mainly overrepresented in kinase activity and post-embryonic development GO terms (Fig. 4b). This was also reflected by the fact that the Gmldl2 mutant plants does not show obvious developmental defects besides the change in flowering and maturity time. Specially, it was found that the GmFER gene had higher H3K4me1/2 levels around the TES region (Fig. 5d), which is in agreement with previous studies found that HsLSD2 primarily associates with gene bodies and downstream of genes instead of promoters, and in conjunction with the H3K9-specific G9a histone methyltransferase and elongation factors51,52,53. Additionally, H3K4me2 is usually complicated by lncRNA enrichment54, and HDA6-LDL1/2 is crucial for repressing the lncRNAs, and are non-expressed or show low expression in Arabidopsis55. Therefore, it is also worthwhile to investigate the lncRNAs related to the soybean FER gene in future studies. FER proteins are members of the CrRLK1L family and have been extensively studied. They are involved in a variety of signaling pathways, and play crucial roles in fertility development, cell growth and mechanical integrity, morphogenesis, and resistance to abiotic and biotic stresses56. In Arabidopsis, it has been demonstrated that RLF1-FER is capable of controlling vegetative reproduction and growth transition. AtFER is involved in the circadian clock pathway, as well as influencing flowering by modulating the expression and alternative splicing of FLC45. However, the homologs of GmFLC-like or the circadian clock-related genes, such as J, LUXs, and PRRs, were not significantly altered in ef1 compared to HX3 (Fig. 4d and Supplementary Fig. 9), suggesting that FERONIA is divergent in regulating flowering between Arabidopsis and soybean. In particular, our data identified a mechanism of histone modification to regulate FERONIA expression.
Crop progenitors exhibit high allelic diversity, which has largely been lost during domestication. However, natural variations from wild ancestors can be reintroduced into modern crops through breeding to enhance plant adaptation and yield. The UPA2 (Upright Plant Architecture 2), which regulates plant architecture by reducing the leaf angle in maize, originated from the wild progenitor of maize (teosinte). This allele was lost during domestication, but reintroducing it into modern maize hybrids has improved performance under high-density planting and increased yield57. In tomatoes, the HAK20 (HAK/KUP/KT transporter 20) is responsible for salt tolerance, but its functional allele has been lost during domestication. Integrating the salt-tolerant allele from wild varieties into cultivated tomatoes significantly improves their salt tolerance58. Soybean was domesticated from the temperate regions of China to expand to a wide range of latitudes around the world. Understanding the genetic basis of flowering time is of major significance for improving soybean regional adaptation. Here, our results demonstrate that soybean accessions carrying the GmLDL2H1 allele (the ancestral haplotype) flower significantly later than those carrying the GmLDL2H2 allele under natural short-day conditions, leading to enhanced yields (Fig. 7h–j). Thus, introgressing the GmLDL2H1 allele from wild soybean into cultivars can simultaneously improve soybean adaptation and yield-related traits. Since the interaction between genetics and environment determines phenotypes, the combination of GmLDL2 with other genetic factors could provide avenues to improve crop breeding.
Based on our data, we propose a model (Fig. 8) of how GmLDL2 regulates flowering time and improves low-latitude adaptation in soybeans. According to this model, GmLDL2 could undergo natural and artificial selection during soybean domestication. In the low-latitude regions, natural selection of GmLDL2H1 might have higher demethylase activity for the removal of methylated H3K4me1/2, thus resulting in the lower expression of target genes GmFER, which leads to late flowering. In contrast, the artificial selection of GmLDL2H2 in the high-latitude regions results in lower enzymatic activity to demethylate H3K4me1/2, thereby causing the higher expression of GmFER, which leads to early flowering. In summary, our study not only identified a flowering regulator GmLDL2 in soybeans, but also revealed unreported mechanistic insights for understanding the flowering of plants, hence providing important resources for improving genetic manipulation of soybean breeding.
The GmLDL2H1 allele under natural selection promotes soybean to low latitudes, whereas the GmLDL2H2 allele under artificial selection facilitates soybean to high-latitudes. GmLDL2 regulates flowering time by repressing the expression of GmFER through altering the H3K4 methylation status at this locus.
Methods
Plant materials and growth conditions
The soybean Huaxia3 (HX3) or Williams 82 (Wm82) cultivar was used as a wild-type control. The mutant early flowering 1 (ef1, M6 generation) was generated from HX3 treated with 60Co γ-rays. Liannanhuimao (LNHM) is a later-maturity landrace from Guangdong, China. The plants were grown under short-day (11 h light/13 h dark) or long-day (15 h light/9 h dark) conditions in the 25 °C growth chamber or the field in Guangzhou, China, in 2018.
The soybean accessions selected from a subset of 315 accessions59 were cultivated under field conditions at two sites: Guangzhou (23°16′N, 113°23′E) in the 2020 summer and Sanya (18°15′N, 109°30′E) in the 2019 winter. The flowering time was recorded as the number of days after emergence (DAE) when the first flower opened, and yield-related traits were recorded at the R8 stage. At least 6 plants were tested per variety.
All Arabidopsis seeds used in this study were in the Columbia (Col) background. Seeds of wild-type Col and transgenic lines (T2 generation) were grown on Murashige and Skoog (MS) plates under long-day conditions (16 h light/8 h dark) in a 22 °C growth chamber. The flowering time was investigated as days after sowing to the first flower opening.
Map-based cloning
For bulk-segregant analysis, 50 plants (25 with a wide-type phenotype and 25 with the mutant phenotype) were selected from the HX3 × ef1 F2 population. The phenotypes of F2:3 progeny were used to determine whether the F2 plants belonged to the wild-type or mutant category. If all F2:3 plants flowered earlier than the wildtype, the corresponding F2 line was classified as mutant. Conversely, if the flowering time of F2:3 progeny was variable or similar to that of the wildtype, the F2 line was classified as wildtype. Genomic DNA from individuals with early flowering phenotype and individuals with wild-type phenotype was bulked into mutant and wild-type pools in an equal ratio. Sequencing libraries were constructed for each pool and the parents, ef1 and HX3, through Illumina HiSeqX with an average coverage depth of 30× for each sample. Reads of each library were aligned to the Williams 82 genome (Wm82.a2.v1) using BWA 0.7.17-r1188 with default parameters. SNP calling was performed using GATK 1.9 software. Annotations of SNPs were performed based on the Wm82 genome using SnpEff 3.1. Statistical confidence intervals of Δ(SNP index) were calculated as the SNP index of the mutant pool minus the SNP index of the wild-type pool. An F2 population from ef1 and LNHM was used for fine mapping. Primer sequences of the markers are listed in Supplementary Data 6.
Phylogenetic tree
The protein sequences of HsLSD1 and LSD-like family numbers of Arabidopsis and soybean, the CDS sequences of LDL2-like genes, and FER protein sequences from various species were downloaded from Phytozome v13 (Phytozome (doe.gov)). Amino acid sequences were aligned by Clustal-W, and the phylogenetic tree was constructed and visualized in MEGA 11 with the neighbor-joining model, for 1000 bootstrap replications.
RNA isolation, RT-qPCR, and RNA-seq
The total RNA of different tissues of soybean was isolated using the HiPure Plant RNA Plus kit (Magen, R4150) according to the manufacturer’s instructions. The leaves, SAM, stem, and root were collected at 20 DAE. Flowers were harvested at the R1 stage, and pods and nodules were collected at the R4 stage. Leaves at 20 DAE were sampled from HX3 and ef1 under long-day (15 h light/ 9 h dark) and short-day (11 h light/ 13 h dark) conditions to examine the diurnal expression. Leaves at 20 DAE were sampled from soybean accessions carrying GmLDL2H1 or GmLDL2H2 under short-day (11 h light/ 13 h dark) conditions to examine the expression of GmFER. 1 μg of total RNA was used for first-strand cDNA synthesis (TaKaRa, RR047A). qRT-PCR analysis was carried out using the SYBR-GREEN kit (TaKaRa, RR820A) on a CFX96 real-time PCR system (Bio-Rad). Three technical replicates were used for each real-time qPCR sample and GmACT3 was used as an internal controls. The primers were listed in Supplementary Data 6.
Total RNA was extracted from fully expanded trifoliolate leaves at the last node of 55-day-old HX3 and ef1 plants at ZT4 under long-day conditions (15 h light/9 h dark) for RNA-seq analysis. Reads of each library were aligned to the Williams 82 genome (Wm82.a4.v1) using HISAT2 software version 2.1.0 with default parameters. The gene expression levels were estimated by fragments per kilobase per million mapped reads (FPKM) through StringTie2 software version 2.2 with parameters -e -G -A -B. FeatureCounts software version 2.0.3 was used to count the number of mapped reads for each gene to identify differentially expressed genes (DEGs). The DEGs were identified using the R package DESeq2 version 1.34.0 with a threshold of |log2(fold change)| ≥ 1.5 and FDR < 0.05. Then, the Python library GOATOOLS version 1.3.3 was used for GO analysis, and KOBAS-I was employed for KEGG enrichment analysis for upregulated and downregulated genes, respectively.
Subcellular localization
The full-length coding sequence of GmLDL2 from HX3 was cloned into pCAMBIA1300-GFP, and the construction was transformed into Agrobacterium tumefaciens GV3101. Overnight cultures of Agrobacteria were collected by centrifugation, resuspended in MES buffer (10 mM MES (pH 5,6), 10 mM MgCl2, and 200 mM acetosyringone), and grown to OD600 = 0.8–1.0. It was then injected into Nicotiana benthamiana. And the GFP fluorescence was assayed after 48 h using an LSM-710 confocal microscope (Zeiss). The primers were listed in Supplementary Data 6.
Soybean transformation
The coding sequences of GmLDL2 and GmFER were amplified from HX3 cDNA and cloned into the pTF101 vector, which uses Basta as the selection marker. The constructs were introduced into Agrobacterium tumefaciens EHA101 and ultimately transformed into the ef1 mutant or HX3 following the cotyledon-node method. The positive transgenic plants were verified by PCR amplification of Bar using primers as listed in Supplementary Data 6. Western blots were used to verify expression of the corresponding proteins.
To obtain the soybean Gmldl2 mutant and Gmldl2/fer double mutant, CRISPR-Cas9 target sequences of GmLDL2 and GmFER were designed using the CRISPR-P webtool. Two targets were selected and cloned into two different single-guide RNA (sgRNA) expression cassettes and integrated into the pYLCRISPR/Cas9P35S-B vector. The recombinant construct was introduced into Agrobacterium tumefaciens EHA101 and transformed into HX3 or the mutant using the cotyledonary method. To identify positive CRISPR-Cas9-induced mutants, genomic DNA was extracted from the leaves of each plant, and PCR analysis was performed to confirm the presence of the cas9 gene with specific primers. The regions spanning the target sites of the GmLDL2 and GmFER were amplified using KOD DNA Polymerase (Toyobo) with specific primers and sequenced. The primers were listed in Supplementary Data 6.
Arabidopsis transformation
The full-length coding sequence of GmFER from HX3 was cloned into pTF101, and the recombinant construct (35Spro:GmFER-3×Flag) was transformed into Agrobacterium tumefaciens GV3101 and introduced into Arabidopsis using the floral dip method. Positive T1 transformants were selected using 25 mg/mL Basta, and the positive T1 plants were self-fertilized to produce T2 seeds and verified by PCR amplification of the Bar gene using primers as listed in Supplementary Data 6. Western blots were used to verify expression of the corresponding proteins.
Protein extraction and western blot analysis
Total proteins were extracted from young leaves of HX3, ef1, GmLDL2-OE, and GmFER-OE transgenic lines with protein extraction buffer (50 mM Tris-HCl (pH 8.0), 150 mM NaCl, 5 mM MgCl2, 0.1% Nonidet P-40, 10% glycerol, 2 mM DTT, and 1×EDTA-free protease inhibitor cocktail). The samples were centrifuged at 16,000×g for 15 min, then the supernatant was mixed with 5×SDS loading buffer and boiled at 95 °C for 10 min. The mixture was separated by SDS-PAGE gels, followed by western blotting analysis with the appropriate antibodies, including anti-Actin and anti-Flag.
Demethylation activity assay in vitro
Briefly, His-HsLSD1 and His-GmLDL2H1 (the full-length coding sequence of GmLDL2H1 was amplified from the HX3 cDNA) and His-GmLDL2H2 (the full-length coding sequence of GmLDL2H2 was amplified from the Wm82 cDNA) were expressed in E.coli Rosetta (DE3) cells and affinity-purified with His Bind Resin (Roche, Swiss). A quantity of 7 μg of calf thymus histones (Sigma-Aldrich, H9250) were incubated with affinity-purified His-HsLSD1 (10 µg), His-GmLDL2 (10 µg), or His (10 µg) in 100 µL of reaction buffer (50 mM Tris-HCl (pH 8.5), 50 mM KCl, 5 mM MgCl2, 5% glycerol, and complete EDTA-free protease inhibitors) at 37 °C for 4 h. Nuclei were extracted from the leaves of HX3 and Gmldl2 mutants using the Plant Nuclei Isolation/Extraction kit (Sigma, CELLYTPN1). The reaction product or nuclei extracted from HX3 and Gmldl2 mutants was analyzed by western bolting with anti-H3K4me1 (Millipore, 07-436), anti-H3K4me2 (Millipore, 07-030), anti-H3K4me3 (Abcam, ab8580), anti-H3K9me2 (Abcam, ab1220), anti-H3K27me1 (Millipore, 07-448), anti-H3K36me3 (Abcam, ab9050), and anti-H3 (Abcam, ab1791). The signals of each modification were calculated as the ratio of itself to H3 and quantified by ImageJ 1.8.0.
Demethylation activity assay in vivo
Briefly, pCAMBIA1300-HsLSD1-GFP, pCAMBIA1300-GmLDL2H1-GFP (the full-length coding sequence of GmLDL2H1 was amplified from the HX3 cDNA), pCAMBIA1300-GmLDL2H2-GFP (the full-length coding sequence of GmLDL2H2 was amplified from the Wm82 cDNA), and pCAMBIA1300-GFP constructs were transformed into Nicotiana benthamiana leaves. To determine the specific histone methylation status, immunostaining was performed using histone methylation-specific antibodies (H3K4me1: Millipore, 07-436; H3K4me2: Millipore, 07-030; H3K4me3: Abcam, ab8580; H3K9me2: Abcam, ab1220; H3K27me1: Millipore, 07-448; H3K36me3: Abcam, ab9050) and Alexa FluorTM 488-conjugated goat anti-rabbit (Invitrogen, A11008) or goat anti-mouse (Invitrogen, A21428). The GFP and Alexa Fluor 488 fluorescent signals were recorded under a fluorescence microscope (Zeiss). ImageJ 1.8.0 software was used to quantify the fluorescent signals in the immunolabeled nuclei.
Chromatin immunoprecipitation (ChIP) and ChIP-seq experiment
Leaves were collected from 55 DAE plants at ZT4 under long-day conditions from HX3 and ef1 or 20 DAE plants at ZT4 under short-day conditions from soybean accessions carrying GmLDL2H1 or GmLDL2H2. About 2 g samples were crosslinked with 1% formaldehyde for 10 min under a vacuum, then quenched with 125 mM glycine for 5 min. Nuclei were isolated and sonicated to a size range of 300–500 bp. The solubilized chromatin was immunoprecipitated by anti-H3K4me1 (Millipore, 07-436), anti-H3K4me2 (Millipore, 07-030), anti-H3K4me3 (Abcam, ab8580), and anti-GmLDL2. GmLDL2 antibody was prepared by Shanghai Ango Biological Technology. Co-immunoprecipitated DNA was recovered by phenol-chloroform extraction. Immunoprecipitated DNA was used for ChIP-seq DNA library preparation or qPCR analysis.
Two biological replicates were prepared and sequenced for each ChIP-seq experiment. The libraries were prepared using VAHTS Multiples Oligos Set 4 for Illumina (Vazyme, N321). The sequencing was performed on the Illumina Hiseq X Ten platform. Raw reads were trimmed by Trim Galore software version 0.6.8 with parameter -q 25 --trim1 --stringency 5 -r1 31 -r2 31 --length 30 --paired. The clean reads were then aligned to the soybean reference genome Wm82.a4.v1. Duplicate reads were removed using Picard version 2.23.3, and multiple mapped reads were filtered out using the SAM file tag “XS:”. Unique mapped reads were then delivered for peak calling through MACS2 software version 2.2.9.1 with parameter --extsize 200 --nomodel -g 1e9 -B -q 0.05. The distributions of hyper H3K4me1 and H3K4me2 regions across the genome in ef1 mutant were annotated using the R package ChIPseeker. Software deepTools2 was used for ChIP-seq data visualization, bamCoverage subroutine was used to normalize reads per genomic content (1×normalization) across the entire genome with parameter --binSize 10 --normalizeUsing RPGC --effectiveGenomeSize 952,479,371 --extendReads --smoothLength 30. The multiBigwigSummary and plotCorrelation subroutines were used to calculate Spearman correlation for biological replicate data. Then the bamCompare subroutine was used to calculate the fold change in histone modification levels between HX3 and ef1 per 200 bp window. The plotHeatmap subroutine was used for visualizing ChIP-seq reads landscape with heatmaps. BigWig files from normalized reads were then imported into the Integrated Genome Viewer (IGV) version 2.13.2 for visualization. The precipitated DNA by H3K4me1, H3K4me2, and GmLDL2 antibodies were recovered and analyzed by qPCR assay. The percentage of input of each amplicon was calculated as the ratio between the immunoprecipitated DNA and input DNA. The primers used for amplification were listed in Supplementary Data 6.
Genetic diversity and haplotype analysis
Whole genome resequencing data of 2898 accessions (PRJCA002030) was downloaded from the National Genomics Data Center (https://ngdc.cncb.ac.cn/). Previously, filter-out SNP missing rates greater than 10% and minor allele frequency (MAF) less than 5% in VCF format file, were used for the genetic diversity analysis. BCFtools software version 1.14 was used to filter SNPs in the CDS region of the GmLDL2 gene for haplotype classification. VCFtools version 0.1.16 was used to calculate the pairwise genomic differentiation values for wild, landrace, and improved cultivar soybean populations per 5 kb sliding window with a step size of 2.5 kb, including the fixation index (FST) of population differentiation, nucleotide diversity (π) and Tajima’s D, which measures whether the population is expanding or shrinking. Accessions collected in the PRC were classified accordingly and mapped to their geographical origins. The geographic classification of soybeans was based on the previous study60. The map of the PRC was certified and downloaded from the Ministry of Natural Resources of the PRC (https://www.bzdt.ch.mnr.gov.cn).
Simulation of protein structure
The structural characteristics of GmLDL2H1 and GmLDL2H2 were simulated and analyzed using the online program Phyre2 (PHYRE2 Protein Fold Recognition Server (ic.ac.uk)). The structural superimpositions were visualized using PyMoL (version 2.5) (Schrodinger, Inc.).
Statistical analysis
In this study, the two-sided Student’s t-test was performed in SPSS 19. Welch’s t-test and hypergeometric test were performed in R. Fisher’s exact test of overlapped regions between hypermethylated regions and upregulated gene regions were performed using BEDTools Fisher version 2.30.0.
Gene accession
Sequence data for genes described in this article can be found in SoyBase (https://www.soybase.org/): GmLDL2 (https://www.soybase.org/tools/search/gene.html?identifier=Glyma.06G250100), GmFT2a (https://www.soybase.org/tools/search/gene.html?identifier=Glyma.16G150700), GmFT5a (https://www.soybase.org/tools/search/gene.html?identifier=Glyma.16G044100), GmActin (https://www.soybase.org/tools/search/gene.html?identifier=Glyma.18G290800), E1 (https://www.soybase.org/tools/search/gene.html?identifier=Glyma.06G207800), E2 (https://www.soybase.org/tools/search/gene.html?identifier=Glyma.10G221500), E3 (https://www.soybase.org/tools/search/gene.html?identifier=Glyma.19G224200), E4 (https://www.soybase.org/tools/search/gene.html?identifier=Glyma.20G090000), GmLUX2 (https://www.soybase.org/tools/search/gene.html?identifier=Glyma.12G060200), J (https://www.soybase.org/tools/search/gene.html?identifier=Glyma.04G050200), Tof11 (https://www.soybase.org/tools/search/gene.html?identifier=Glyma.U034500), Tof12 (https://www.soybase.org/tools/search/gene.html?identifier=Glyma.12G073900), and GmFER (https://www.soybase.org/tools/search/gene.html?identifier=Glyma.18G270900).
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
Data availability
The genome resequencing of HX3, ef1, wide-type pool, and mutant pool, along with RNA-Seq and ChIP-seq data of HX3 and ef1, have been deposited in the Genome Sequence Archive (GSA) at the National Genomics Data Center under accession CRA019775. Source data are provided with this paper.
References
Thomas, B. & Vince-Prue, B. Photoperiodism in Plants 2nd edn (Academic Press, 1997).
Garner, W. W. & Allard, H. A. Effect of the relative length of day and night and other factors of the environment on growth and reproduction in plants. Mon. Weather Rev. 48, 415–415 (1992).
Fang, C., Du, H., Wang, L., Liu, B. & Kong, F. Mechanisms underlying key agronomic traits and implications for molecular breeding in soybean. J. Genet. Genomics 51, 379–393 (2024).
Fornara, F., De Montaigu, A. & Coupland, G. SnapShot: control of flowering in Arabidopsis. Cell 141, 550–550 (2010).
Lin, X., Liu, B., Weller, J. L., Abe, J. & Kong, F. Molecular mechanisms for the photoperiodic regulation of flowering in soybean. J. Integr. Plant Biol. 63, 981–994 (2021).
Zhou, S. et al. Transcriptional and post-transcriptional regulation of heading date in rice. N. Phytol. 230, 943–956 (2021).
Liu, S., He, M., Lin, X. & Kong, F. Epigenetic regulation of photoperiodic flowering in plants. Plant Genome 16, e20320 (2023).
Shafiq, S., Berr, A. & Shen, W. Combinatorial functions of diverse histone methylations in Arabidopsis thaliana flowering time regulation. N. Phytol. 201, 312–322 (2014).
Wang, Y., Gu, X., Yuan, W., Schmitz, R. J. & He, Y. Photoperiodic control of the floral transition through a distinct polycomb repressive complex. Dev. Cell 28, 727–736 (2014).
Bandillo, N. B. et al. Dissecting the genetic basis of local adaptation in soybean. Sci. Rep. 7, 17195 (2017).
Sedivy, E. J., Wu, F. & Hanzawa, Y. Soybean domestication: the origin, genetic architecture and molecular bases. N. Phytol. 214, 539–553 (2017).
Zhang, S. R. et al. Photoperiodism dynamics during the domestication and improvement of soybean. Sci. China Life Sci. 60, 1416–1427 (2017).
Dong, L. et al. The genetic basis of high-latitude adaptation in wild soybean. Curr. Biol. 33, 252–262 (2023).
Li, H. et al. The variant AP2/ERF transcription factor TOE4b regulates photoperiodic flowering and grain yield per plant in soybean. Plant Biotechnol. J. 21, 1682–1694 (2023).
Li, H. et al. Natural variation of FKF1 controls flowering and adaptation during soybean domestication and improvement. N. Phytol. 238, 1671–1684 (2023).
Lu, S. et al. Stepwise selection on homeologous PRR genes controlling flowering and maturity during soybean domestication. Nat. Genet. 52, 428–436 (2020).
Tsubokura, Y. et al. Genetic variation in soybean at the maturity locus E4 is involved in adaptation to long days at high latitudes. Agronomy 3, 117–134 (2013).
Xu, M. et al. Genetic variation in four maturity genes affects photoperiod insensitivity and PHYA-regulated post-flowering responses of soybean. BMC Plant Biol. 13, 1–14 (2013).
Xu, M. et al. The soybean-specific maturity gene E1 family of floral repressors controls night-break responses through down-regulation of FLOWERING LOCUS T orthologs. Plant Physiol. 168, 1735–1746 (2015).
Bu, T. et al. A critical role of the soybean evening complex in the control of photoperiod sensitivity and adaptation. Proc. Natl Acad. Sci. USA 118, e2010241118 (2021).
Dong, L. et al. Genetic basis and adaptation trajectory of soybean from its temperate origin to tropics. Nat. Commun. 12, 5445 (2021).
Fang, X. et al. Modulation of evening complex activity enables north-to-south adaptation of soybean. Sci. China Life Sci. 64, 179–195 (2021).
Kou, K. et al. A functionally divergent SOC1 homolog improves soybean yield and latitudinal adaptation. Curr. Biol. 32, 1728–1742 (2022).
Lu, S. et al. Natural variation at the soybean J locus improves adaptation to the tropics and enhances yield. Nat. Genet. 49, 773–779 (2017).
Yue, Y. et al. A single nucleotide deletion in J encoding GmELF3 confers long juvenility and is associated with adaption of tropic soybean. Mol. Plant 10, 656–658 (2017).
Xia, Z. et al. Positional cloning and characterization reveal the molecular basis for soybean maturity locus E1 that regulates photoperiodic flowering. Proc. Natl Acad. Sci. USA 109, E2155–E2164 (2012).
Kong, F. et al. Two coordinately regulated homologs of FLOWERING LOCUS T are involved in the control of photoperiodic flowering in soybean. Plant Physiol. 154, 1220–1231 (2010).
Allis, C. D. & Jenuwein, T. The molecular hallmarks of epigenetic control. Nat. Rev. Genet. 17, 487–500 (2016).
Liu, C., Lu, F., Cui, X. & Cao, X. Histone methylation in higher plants. Annu. Rev. Plant Biol. 61, 395–420 (2010).
Chen, Y. et al. Crystal structure of human histone lysine-specific demethylase 1 (LSD1). Proc. Natl Acad. Sci. USA 103, 13956–13961 (2006).
Kim, S. A., Zhu, J., Yennawar, N., Eek, P. & Tan, S. Crystal structure of the LSD1/CoREST histone demethylase bound to its nucleosome substrate. Mol. Cell 78, 903–914 (2020).
Shi, Y. et al. Histone demethylation mediated by the nuclear amine oxidase homolog LSD1. Cell 119, 941–953 (2004).
He, Y., Michaels, S. D. & Amasino, R. M. Regulation of flowering time by histone acetylation in Arabidopsis. Science 302, 1751–1754 (2003).
Jiang, D., Yang, W., He, Y. & Amasino, R. M. Arabidopsis relatives of the human lysine-specific demethylase1 repress the expression of FWA and FLOWERING LOCUS C and thus promote the floral transition. Plant Cell 19, 2975–2987 (2007).
Martignago, D. et al. The four FAD-dependent histone demethylases of Arabidopsis are differently involved in the control of flowering time. Front. Plant Sci. 10, 669 (2019).
Mori, S. et al. Cotranscriptional demethylation induces global loss of H3K4me2 from active genes in Arabidopsis. EMBO J. 42, e113798 (2023).
Shibaya, T. et al. Hd1, encoding histone acetylase related to Arabidopsis FLOWERING LOCUS D, is involved in the control of flowering time in rice. Plant Cell Physiol. 57, 1828–1838 (2016).
Hu, Q., Jin, Y., Shi, H. & Yang, W. GmFLD, a soybean homolog of the autonomous pathway gene FLOWERING LOCUS D, promotes flowering in Arabidopsis thaliana. BMC Plant Biol. 14, 1–12 (2014).
Liu, M. et al. Functional characterization of the lysine-specific histone demethylases family in soybean. Plants 11, 1398 (2022).
Abe, A. et al. Genome sequencing reveals agronomically important loci in rice using MutMap. Nat. Biotechnol. 30, 174–178 (2012).
Kim, D., Paggi, J. M., Park, C., Bennett, C. & Salzberg, S. L. Graph-based genome alignment and genotyping with HISAT2 and HISAT-genotype. Nat. Biotechnol. 37, 907–915 (2019).
Love, M. I., Huber, W. & Anders, S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 15, 1–21 (2014).
Zhang, X., Bernatavichute, Y. V., Cokus, S., Pellegrini, M. & Jacobsen, S. E. Genome-wide analysis of mono-, di- and trimethylation of histone H3 lysine 4 in Arabidopsis thaliana. Genome Biol. 10, 1–14 (2009).
Inagaki, S. et al. Gene-body chromatin modification dynamics mediate epigenome differentiation in Arabidopsis. EMBO J. 36, 970–980 (2017).
Wang, L. et al. Receptor kinase FERONIA regulates flowering time in Arabidopsis. BMC Plant Biol. 20, 1–16 (2020).
Zhang, M. et al. iSoybean: a database for the mutational fingerprints of soybean. Plant Biotechnol. J. 20, 1435–1437 (2022).
Liu, Y. et al. Pan-genome of wild and cultivated soybeans. Cell 182, 162–176 (2020).
Liu, Y. et al. H3K4me2 functions as a repressive epigenetic mark in plants. Epigenetics Chromatin 12, 1–14 (2019).
Oya, S., Takahashi, M., Takashima, K., Kakutani, T. & Inagaki, S. Transcription-coupled and epigenome-encoded mechanisms direct H3K4 methylation. Nat. Commun. 13, 4521 (2022).
Wang, Y. et al. The Arabidopsis DREAM complex antagonizes WDR5A to modulate histone H3K4me2/3 deposition for a subset of genome repression. Proc. Natl Acad. Sci. USA 119, e2206075119 (2022).
Chen, F. et al. Structural insight into substrate recognition by histone demethylase LSD2/KDM1b. Cell Res. 23, 306–309 (2013).
Fang, R. et al. Human LSD2/KDM1b/AOF1 regulates gene transcription by modulating intragenic H3K4me2 methylation. Mol. Cell 39, 222–233 (2010).
Janardhan, A. et al. Prominent role of histone lysine demethylases in cancer epigenetics and therapy. Oncotarget 9, 34429–34448 (2018).
Mishra, K. & Kanduri, C. Understanding long noncoding RNA and chromatin interactions: what we know so far. Non-Coding RNA 5, 54 (2019).
Hung, F. et al. The expression of long non-coding RNAs is associated with H3Ac and H3K4me2 changes regulated by the HDA6-LDL1/2 histone modification complex in Arabidopsis. NAR Genom. Bioinform. 2, lqaa066 (2020).
Malivert, A. & Hamant, O. Why is FERONIA pleiotropic?. Nat. Plants 9, 1018–1025 (2023).
Tian, J. et al. Teosinte ligule allele narrows plant architecture and enhances high-density maize yields. Science 365, 658–664 (2019).
Wang, Z. et al. Loss of salt tolerance during tomato domestication conferred by variation in a Na+ /K+ transporter. EMBO J. 39, e103256 (2020).
Liu, G. et al. Genome-wide association study and haplotype analysis reveal novel candidate genes for resistance to powdery mildew in soybean. Front. Plant Sci. 15, 1369650 (2024).
Li, Y. et al. Genetic diversity in domesticated soybean (Glycine max) and its wild progenitor (Glycine soja) for simple sequence repeat and single-nucleotide polymorphism loci. N. Phytol. 188, 242–253 (2010).
Acknowledgements
We are grateful to Prof. Qingxin Song from Nanjing Agricultural University for providing the mutants of GmFER used in this study, Prof. Mei Hong from South China Agricultural University for critical reading of the manuscript. This work was supported by the open competition program of top ten critical priorities of Agricultural Science and Technology Innovation for the 14th Five-Year Plan of Guangdong Province (2022SDZG05 to C.Y.), National Science and Technology Major Project 2030 (2024ZD040780104 to Y.W.), the Program for Guangdong Laboratory for Lingnan Modern Agriculture (NG2022002 to Y.W.), the National Natural Science Foundation of China (Grant Nos. 32172061 to C.Y.), Guangdong Pearl River Talent Program (Introduction of Innovation and Entrepreneurship Teams: 2021ZT09N333 to Z.D.), Double first-class discipline promotion project of SCAU (2023B10564004 to Y.W. and C.Y.), and South China Agricultural University.
Author information
Authors and Affiliations
Contributions
Y.W., C.Y., and Z.D. designed research; M.L., H.C., X.X., B.Y., G.L., and C.L. performed research; J.J. and Y.F. analyzed data; M.L. and Y.F. wrote the original draft; Y.W. and C.Y. revised the manuscript.
Corresponding authors
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Communications thanks Fanjiang Kong and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. A peer review file is available.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Source data
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Liu, M., Fang, Y., Jiang, J. et al. Natural variation in histone demethylase GmLDL2 confers flowering time divergence for geographical expansion in soybean. Nat Commun 16, 5687 (2025). https://doi.org/10.1038/s41467-025-61328-6
Received:
Accepted:
Published:
DOI: https://doi.org/10.1038/s41467-025-61328-6