Abstract
Cereal seeds are vital for food, feed, and agricultural sustainability because they store and provide essential nutrients to human and animal food and feed systems. Unraveling molecular processes in seed development is crucial for enhancing cereal grain yield and quality. We analyze spatiotemporal transcriptome and metabolome profiles during sorghum seed development in the inbred line ‘BTx623’. Morphological and molecular analyses identify the key stages of seed maturation, specifying starch biosynthesis onset at 5 days post-anthesis (dpa) and protein at 10 dpa. Transcriptome profiling from 1 to 25 dpa reveals dynamic gene expression pathways, shifting from cellular growth and embryo development (1–5 dpa) to cell division, fatty acid biosynthesis (5–25 dpa), and seed storage compound synthesis in the endosperm (5–25 dpa). Network analysis identifies 361 and 207 hub genes linked to starch and protein synthesis in the endosperm, respectively, which will help breeders enhance sorghum grain quality. The availability of this data in the sorghum reference genome line establishes a baseline for future studies as new pangenomes emerge, which will consider copy number and presence-absence variation in functional food traits.
Introduction
Sorghum [Sorghum bicolor (L.) Moench] stands out as a versatile and climate-smart crop, ranking among the world’s top five cereals in terms of production. It plays a crucial role in providing dietary calories and essential nutrients for a substantial proportion of the global population1,2,3,4. The challenges posed by population growth, climate change, and the increasing demand for nutritious cereal crops underscore the need to enhance both the quantity and quality of sorghum grain production5,6,7,8,9,10. To meet these challenges, plant breeders need a comprehensive understanding of the molecular, biochemical, and physiological mechanisms governing sorghum seed development. Such insights will ensure an ample and nutritious food supply in the face of climate change.
The sorghum seed is a complex system comprised of genetically distinct tissues: a diploid embryo, a triploid endosperm, and diploid maternal tissues11,12,13. Following double fertilization, an evolutionarily conserved process in all flowering plants, the zygote develops into the embryo while the central cell transforms into the endosperm. The endosperm serves as a nutrient-rich storage tissue, supplying the energy required for the initial growth of the embryo and subsequent germination in monocots11,14,15. The developmental timeline from fertilization of the ovule to seed maturity in sorghum is typically 40–45 days16. Initially, from 3–5 days post-anthesis (dpa), there is limited growth, and no apparent development of the embryo or endosperm11,17. Subsequently, endoreduplication occurs, followed by starch accumulation. While starch accumulation in maize initiates at 10 dpa18,19, in sorghum, it commences at 5 dpa11. From 6–24 dpa, the caryopsis, embryo, and endosperm undergo rapid growth, accompanied by significant changes in seed size. However, from 24–35 dpa, the growth rate diminishes, and only slight alterations in the sizes of the caryopsis, embryo, and endosperm occur16. These observations indicate three primary developmental stages in the sorghum caryopsis: an early stage before 6 dpa, a middle stage spanning 6–24 dpa, and a late stage extending from 25–35 dpa20.
Starch metabolism is a dynamic physiological process that is required for energy storage and utilization21. It involves a sophisticated interplay between sucrose metabolism and various tightly regulated pathways governed by many enzymes22. Among those enzymes are ADP-glucose pyrophosphorylase (AGPase) and various starch synthases (SSs), starch branching enzymes (SBEs), and starch debranching enzymes (DBEs). In sorghum, kafirins are the predominant seed storage proteins, constituting 77 to 82% of endosperm protein and 68 to 73% of total protein in whole sorghum grain23,24. Non-prolamin proteins, namely albumins, globulins, and glutelins, make up the remaining 20% of protein. Kafirins have molecular weight-based classifications as α-kafirins (25–23 kDa), β-kafirins (20–16 kDa), γ-kafirins (28–50 kDa), and δ-kafirins (13 kDa)25,26,27,28. A total of 27 previously reported kafirin genes in the sorghum genome include 23 α-kafirins, 1 β-kafirin, 2 γ-kafirins, and 1 δ-kafirin24. Kafirins exist as monomeric proteins, small oligomeric protein complexes, and large polymeric protein complexes, held together by inter-protein disulfide bonds. Sorghum kafirins can be further classified as kafirin 1 and 2 based on solubility during protein extraction. Kafirin 1 comprises proteins not heavily cross-linked into large polymeric structures, while kafirin 2 is solubilized from the remaining large polymeric complexes29. The ratio of kafirin 1 to kafirin 2 is a crude measure of protein cross-linking in the sorghum seed30.
Seed development is a process that is notable for dynamic physiological and biochemical changes31,32. The chemical composition of mature seeds is shaped by complex gene expression networks. Recent years have seen a surge in transcriptomic analyses investigating seed development in diverse plant species, including Arabidopsis thaliana, Oryza sativa, Triticum, Zea mays, Paeonia, Medicago truncatula, Brassica napus, Hordeum vulgare, and Glycine max33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59. These studies have advanced our understanding of seed spatiotemporal gene expression patterns and their regulation, offering genetic insights that are applicable to breeding for quality traits. For instance, high-throughput RNA sequencing (RNA–Seq) in maize identified genes and transcription factors (TFs) strongly associated with amylose and amylopectin biosynthesis60. Similarly, transcriptome analyses of developing soybean seeds revealed hub genes implicated in oil and protein accumulation55,61. The integrated metabolomic and transcriptomic analyses of rice seeds identified candidate genes involved in the structural modification of anthocyanins62. These examples underscore the potential for integrating metabolomic and transcriptomic information during seed development to clarify the molecular mechanisms driving the accumulation of desirable chemical profiles. Consequently, integrating the transcriptomic and metabolomic profiles of developing sorghum seeds holds promise for generating new molecular breeding resources, thereby enhancing sorghum seed quality and yield for a hungry planet.
Despite the importance of sorghum in global food and feed systems, a significant gap remains in our understanding of gene expression dynamics during sorghum endosperm and embryo development. A recent study reported the transcriptome of developing sorghum seeds at various timepoints from 5 to 25 dpa, providing insights into overall seed development, but the transcriptomes of the embryo and endosperm were not differentiated63. Additionally, there are no published studies that comprehensively explore the transcriptomic and metabolomic networks governing carbon allocation tradeoffs driving the accumulation of starch and protein throughout seed development, which is a major target for plant breeding. Research is particularly needed to clarify stage- and tissue-specific gene expression and crosstalk within and among the embryo, endosperm, and whole seed64,65,66.
To address this knowledge gap, we conducted an in-depth transcriptomic analysis of developing sorghum seeds, dissecting the early whole seed, embryo, and endosperm tissues from fertilization through maturity. Complementing these efforts, metabolomic analyses were performed at five key seed developmental stages to gain insights into the accumulation of specific metabolites that drive nutritional quality, ultimately contributing to the mature grain quality profile. Our findings have unveiled hub genes and metabolites crucial for regulating mature sorghum seed chemistry profiles, broken down by their specificity to the embryo, endosperm, and/or the whole seed. This comprehensive analysis marks a significant step toward unraveling the intricate molecular mechanisms underlying sorghum seed development, with implications for enhancing nutritional quality and overall yield.
Results
Morphological analyses of sorghum seed development
To comprehensively characterize sorghum seed development, we collected developing seed samples from the reference genome line ‘BTx623’ spanning 1–25 dpa (Supplementary Data 1). Over this period, the seed coat transitioned from bright green (1–12 dpa) to light green (13–21 dpa) and ultimately to a yellowish-green hue (22–25 dpa) (Fig. 1a). The fresh weight of the seeds, depicted in Supplementary Fig. 1a, increased gradually post-pollination, reaching its peak at 22 dpa (average 1.23 g/50 grain). Notably, the rate of seed weight gain during early stages (1–15 dpa) surpassed that of later stages (16–25 dpa). By the final timepoint of the study at 25 dpa, sorghum seeds had entered the desiccation stage, making the separation of the embryo and endosperm challenging and justifying the termination of the experiment.
a Sequential morphological transitions of BTx623 sorghum whole seed, embryo, and endosperm at various developmental stages. b Cryo-scanning electron microscopy (cryo-SEM) images illustrating the ultrastructural changes in BTx623 sorghum endosperms at 5, 10, 15, 20, and 25 dpa. Bar: 20 μm. c Quantification of kafirin 1, kafirin 2, and total protein content in BTx623 sorghum seeds at 5, 10, 15, 20, and 25 dpa. Error bars denote the standard error of three replicates. n = 3.
Scanning electron microscopy (SEM) imaging revealed the presence of starch granules at 5 dpa (Fig. 1b), with the quantity and dimensions of these granules gradually increasing in subsequent developmental phases. These observations align with previous research indicating that endoreduplication precedes starch accumulation in sorghum11. Concurrently, kafirin 1 (monomeric proteins and small oligomeric complexes) and kafirin 2 (polymeric cross-linked complexes) emerged at low levels between 5 and 10 dpa but increased between 15 and 25 dpa. This suggests a crucial developmental shift in kafirin accumulation and crosslinking between 10 and 15 dpa. The abundance of kafirin 1 surpassed that of kafirin 2 at all stages (Fig. 1c). At 25 dpa, kafirin 1 constituted approximately 84% of the total protein content, while kafirin 2 comprised the remaining 16% (Fig. 1c). Consequently, the sampled timepoints in this study (5, 10, 15, 20, and 25 dpa) were representative stages of sorghum seed development based on SEM.
Dynamic metabolic changes during sorghum seed development
Differential metabolite accumulation was assessed across the five sampled timepoints (Supplementary Data 2; Fig. 2a). Among 7959 detected peaks, a total of 2073 metabolites were successfully identified and 955 were assigned to functionally annotated pathways (Supplementary Fig. 1b). The functionally annotated metabolites were grouped into 13 functional categories (Supplementary Fig. 1c; Supplementary Data 3). The top enriched pathways within these categories included the biosynthesis of secondary metabolites (20.23%), amino acid metabolism (18.75%), lipid metabolism (13.98%), and carbohydrate metabolism (13.09%) (Supplementary Fig. 1d). Pathway analysis of the six clusters of metabolites, reflecting different stages of development (Supplementary Fig. 2a), revealed that starch biosynthesis initiates at 5 dpa, transitioning to protein biosynthesis and degradation after 15 dpa (Supplementary Fig. 2b–g). This observation agreed with the SEM imaging of starch granules and kafirin quantification (Fig. 1b, c). Taken together, these results suggest that the starch and fatty acid contents in sorghum seeds were determined before final protein content, potentially contributing to the well-known negative correlation between starch and protein content.
a Morphological transitions in BTx623 sorghum seeds at five critical timepoints (5, 10, 15, 20, and 25 dpa) subjected to metabolome analysis. b Volcano plot displaying metabolite features, with numbers indicating significantly upregulated [log2 (fold-changes) ≥ 1.5; q-values ≤ 0.05] and downregulated [log2 (fold-changes) ≤ -1.5; q-values ≤ 0.05] metabolites in the specified comparisons. Gray dashed lines represent the q-value and fold-change filter. c–f Categorization of upregulated and downregulated metabolic pathways according to the Kyoto Encyclopedia of Genes and Genomes (KEGG) for different comparison groups (10 vs. 5 dpa, 15 vs. 5 dpa, 20 vs. 5 dpa, and 25 vs. 5 dpa). Node colors indicate p-values, with white and red denoting lower and higher p-values, respectively. Node radii correspond to pathway impact values, with smaller and larger radii indicating lower and higher impact values, respectively.
Principal component analysis (PCA) highlighted distinct variations in metabolite profiles across the five timepoints (Supplementary Fig. 3; Fig. 2a). A total of 1495 compounds exhibited differential accumulation throughout seed development (Fig. 2b; Supplementary Data 4), indicating a stage-specific accumulation pattern. Up-regulated metabolites between 10 and 5 dpa were associated with the biosynthesis of fatty acids, linoleic acid, sugar metabolism, and lysine, while down-regulated metabolites were linked to flavanol biosynthesis and the pentose phosphate pathway (Fig. 2c). This pattern suggested a resource reallocation favoring essential processes during the early stages of seed development, with upregulation of high-energy molecules and downregulation of metabolites related to secondary metabolism and nucleotide synthesis. Similar trends were observed in comparisons between the 15 and 5 dpa (Fig. 2d). In contrast, comparisons between the 20 and 5 dpa, as well as the 25 and 5 dpa, revealed a shift toward alanine, aspartate, glutamate, flavonoid, and linoleic acid biosynthesis, indicating an emphasis on protein synthesis during later stages of seed development (Fig. 2e & f). The 189 metabolites that were consistently up-regulated and the 234 consistently down-regulated metabolites across the five timepoints during seed development (Supplementary Fig. 4a–c) likely play roles in the accumulation of sorghum protein, starch, and oil. The consistently up-regulated metabolites were associated with the synthesis of lipids, phenolic acids, and flavonoids (Supplementary Fig. 4d), while the down-regulated metabolites were enriched in linoleic acid, monoterpenoid biosynthesis, and the biosynthesis of unsaturated fatty acids (Supplementary Fig. 4e). Collectively, these trends suggested that lipid metabolism, secondary metabolite production, and nucleotide biosynthesis undergo dynamic modulation throughout seed development.
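The volcano-plot thresholds stated in the Fig. 2b legend (log2 fold-change ≥ 1.5 or ≤ -1.5, with q ≤ 0.05) amount to a simple classification rule. The following is a minimal sketch of that rule, not the pipeline used in this study; the function name `classify_metabolite` and the example features are hypothetical.

```python
def classify_metabolite(log2_fc, q_value, fc_cutoff=1.5, q_cutoff=0.05):
    """Label a metabolite feature using the volcano-plot thresholds."""
    # A feature must pass the q-value filter before direction is considered.
    if q_value > q_cutoff:
        return "not significant"
    if log2_fc >= fc_cutoff:
        return "up"
    if log2_fc <= -fc_cutoff:
        return "down"
    return "not significant"


# Hypothetical features: (name, log2 fold-change, q-value)
features = [
    ("feature_A", 2.1, 0.01),
    ("feature_B", -1.8, 0.03),
    ("feature_C", 2.5, 0.20),
    ("feature_D", 0.4, 0.001),
]
labels = {name: classify_metabolite(fc, q) for name, fc, q in features}
```

Counting the "up" and "down" labels per comparison would give the kind of tallies shown on a volcano plot.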
The transcriptome landscape of sorghum seed development
Transcriptome profiling of sorghum seed development encompassed 45 samples (1–9 dpa for the early whole seed, 6–25 dpa for the endosperm, and 10–25 dpa for the embryo), each with two replicates (Supplementary Data 5 & 6). We obtained over 218.8 million high-quality reads, averaging 23.78 million reads per replicate (Supplementary Data 5). The robust correlation (average R² = 0.976) between the two replicates for each sample underscored the high quality of the data (Supplementary Data 5). Additionally, the qPCR results from three replicates of four randomly selected genes at the five major timepoints (5, 10, 15, 20, 25 dpa) closely matched the RNAseq data (Supplementary Fig. 5), further validating the reliability of our findings.
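Replicate agreement of this kind is commonly summarized as the squared correlation between the expression values of two replicates. The sketch below assumes a Pearson correlation, which the text does not specify; the function name `r_squared` and the example values are hypothetical.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two equal-length vectors."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two vectors of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov * cov / (var_x * var_y)


# Hypothetical expression values for the same five genes in two replicates
rep1 = [0.0, 1.2, 3.5, 7.9, 10.4]
rep2 = [0.1, 1.0, 3.8, 7.5, 10.9]
agreement = r_squared(rep1, rep2)
```

Averaging this value over all replicate pairs would give a single summary figure comparable to the one reported above.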
A total of 21,971 genes were expressed (FPKM ≥ 1) during sorghum seed development (Supplementary Data 7). More genes were expressed in early whole seeds and endosperms than in later stages (Supplementary Fig. 6a), indicating heightened gene activity during early seed development as tissue types and specialized cell layers initially diversify. The higher expression (average FPKM of all genes from 10–25 dpa) in the embryo compared to the endosperm suggests greater metabolic activity and more complex developmental processes in the embryo. The distinctiveness of the transcriptome landscapes between these tissues was further confirmed by the greater median gene expression level in the embryo compared to the endosperm (Supplementary Fig. 6b).
Among the 2049 genes specifically expressed in the 1–9 dpa whole seed (Supplementary Fig. 6c), 1558 were previously identified in the sorghum ovary cell wall using RNA-Seq data67. This was expected, as whole sorghum seeds (caryopses) are comprised of distinct maternal (pericarp, derived from the ovary cell wall) and daughter (embryo, endosperm) tissues (Supplementary Fig. 6d). Similarly, 795 embryo-specific genes were enriched for pathways related to embryogenesis (Supplementary Fig. 6e), while 397 endosperm-specific genes were enriched in metabolic and mitogen-activated protein kinase (MAPK) signaling pathways (Supplementary Fig. 6f).
A PCA of the transcriptome data effectively separated developing seeds into three groups based on their tissue identity, validating their distinct developmental activities (Fig. 3a). Early whole seed samples collected at 1–5 dpa formed a separate cluster from samples collected at 6–9 dpa. The latter cluster exhibited proximity to the endosperm samples, indicating shared gene activity within the whole seed and young endosperm. In the hierarchical clustering of gene expression within the embryo, the first and second clusters were associated with morphogenesis and maturation processes, respectively (Fig. 3b). This aligns with the embryo’s sequence of active DNA synthesis, cell division, and differentiation in early and middle phases, followed by the synthesis of storage reserves and desiccation processes in later phases17,20,46,68. The three endosperm clusters aligned with canonical stages guiding harvest times, encompassing the milk, soft dough, and hard dough phases (Fig. 3c). The milk phase coincided with cellularization, which follows the syncytial phase and involves the formation of cell walls and the partitioning of the endosperm into discrete cells. The soft dough and hard dough phases correlated with the grain-filling phase, characterized by the development of distinct cell types and the accumulation of storage reserves17,20.
a Principal Component Analysis of the RNA-seq data for the 45 seed samples (1–9 dpa of the early whole seed, 6–25 dpa of the endosperm, and 10–25 dpa of the embryo) revealing three distinct clusters corresponding to each tissue. Cluster dendrogram displaying the global transcriptome relationships among time series samples of the embryo (b) and endosperm (c). The lower row indicates the developmental phases, as per the cluster dendrogram of the time series data, with numbers indicating days post-anthesis.
Main pathways involved in sorghum seed development
To identify the active cellular processes in developing sorghum seeds, we employed a k-means clustering methodology, which revealed 12, 16, and 15 co-expression clusters in the early whole seed, embryo, and endosperm, respectively (Fig. 4a, b; Supplementary Fig. 7; Supplementary Data 6). In early whole seeds (1–9 dpa), clusters c1–c6 (1–5 dpa; early stage) were enriched in genes controlling cellular growth, proliferation, and fundamental structures essential for seed development (Supplementary Fig. 7). In contrast, clusters c7–c11 (5–9 dpa; middle stage) were enriched in processes related to embryo development and storage compound accumulation like starch biosynthesis, which aligned with SEM imaging showing starch granules emerging at 5 dpa. Genes constitutively expressed from 1–9 dpa (c12) were associated with endosomal vesicle fusion, vacuolar acidification, Nicotinamide Adenine Dinucleotide (NAD) biosynthesis, organelle fusion, vacuole organization, and embryo development.
a Expression patterns of co-expression clusters for the embryo. b Expression patterns of co-expression clusters for the endosperm. These patterns are organized according to the sample timepoints at which they peak. Functional categories enriched within different co-expression clusters for the embryo and endosperm are listed, reflecting various stages of sorghum seed development. For each gene, the RPKM value is displayed, normalized in relation to the maximum RPKM value observed for that gene across all timepoints.
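The per-gene scaling described in this legend (each value divided by that gene's maximum across timepoints) can be sketched as follows. This is an illustration only; the function name `scale_to_max`, the gene labels, and the expression values are hypothetical.

```python
def scale_to_max(profile):
    """Divide each value by the gene's maximum across timepoints."""
    peak = max(profile)
    # A gene with no detectable expression has no meaningful peak.
    if peak <= 0:
        return [0.0 for _ in profile]
    return [value / peak for value in profile]


# Hypothetical expression profiles across three timepoints
profiles = {"gene_1": [2.0, 8.0, 4.0], "gene_2": [50.0, 25.0, 5.0]}
scaled = {gene: scale_to_max(values) for gene, values in profiles.items()}
```

After scaling, every expressed gene peaks at 1.0, so clusters can be compared by the timing of their peaks rather than by absolute abundance.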
In the embryo samples, primary active pathways were related to cell division, fatty acid biosynthesis, and embryo development. The middle stage (c1–c7) of embryo development (10–18 dpa) was enriched in pathways regulating cellular processes, including the cell cycle, starch, and amino acid biosynthesis. The later stage (c8–c15; 19–25 dpa) indicated active embryo growth with enriched fatty acid and lipid biosynthesis pathways, suggesting a crucial role in providing energy and nutrition. Genes in c16, expressed across all stages, were enriched for embryo development, protein folding, membrane organization, and transport, indicating their fundamental roles across all timepoints (Fig. 4a).
In the endosperm, a distinct shift toward the activation of storage pathways (starch and storage proteins) occurred after initial cell division in early stages of development (Fig. 4b). Clusters c1–c7 (6–14 dpa) included genes related to cell cycle processes, metabolic processes, and seed storage compound biosynthesis, indicating their roles in regulating nutrient storage and energy metabolism. During middle and late stages, extensive growth and differentiation in the endosperm coincided with the accumulation of storage reserves for starch and protein. Genes in c15, expressed throughout endosperm development, highlighted an emphasis on cellular metabolism and function. Programmed cell death (PCD) is a crucial process for the cereal endosperm as it transitions from cell division to nutrient accumulation69,70. Ethylene is one of the major hormones involved in PCD71. Based on the ethylene biosynthesis enzymes identified in rice72, we found that most of the corresponding sorghum genes peaked in expression during the early stages of endosperm development (6–10 dpa; Supplementary Fig. 8a). Notably, the primary enzyme initiating ethylene synthesis, methionine adenosyltransferase (SAM synthetase), encoded by the Sobic.003G151600 and Sobic.009G033600 genes, exhibited a pronounced downregulation trend during sorghum endosperm development. This observation suggests that ethylene acts as a negative regulator of grain filling and PCD. Consistently, other documented PCD regulator genes in maize, such as ZmDEK4073, ZmDEK6674, ZmATR, and ZmATM75, showed a similar downregulation trend during sorghum endosperm development (Supplementary Fig. 8b).
Hub genes and key networks associated with starch and protein synthesis
Understanding the regulation of biosynthetic networks that govern seed nutrition is crucial for enhancing sorghum grain quality. The presented data highlighted that starch and protein biosynthesis in BTx623 sorghum seeds predominantly took place in the endosperm (Supplementary Fig. 9a, b; Supplementary Data 8), aligning with its role as a storage tissue for energy during germination and early seedling growth76.
In endosperm, the expression patterns of sorghum ortholog genes associated with starch and kafirin biosynthesis revealed distinct trends (Supplementary Fig. 9c, d). Starch biosynthesis was most active between 5–15 dpa, while protein biosynthesis primarily occurred during 15–25 dpa. Interestingly, kafirin genes constituted 44.77% of total endosperm transcripts from 6–25 dpa, with a notable increase from 24.67% (6–15 dpa) to 62.16% (16–25 dpa), indicating predominant kafirin synthesis post-15 dpa (Supplementary Fig. 10a). The most abundant kafirin gene transcripts during endosperm development were α-kafirins (34.20%), followed by γ-kafirins (6.99%), β-kafirins (3.29%), and δ-kafirins (0.277%) (Supplementary Fig. 10a). This agrees with the first observation of starch granules at 5 dpa in the SEM imaging and the distinct metabolite enrichments at the five major timepoints. Throughout all stages, the average expression level of kafirin and starch synthesis genes was higher in the endosperm than in the embryo (Supplementary Fig. 10b). For example, a significant proportion of kafirin genes (23 out of 27) ranked among the 100 most highly expressed genes in the endosperm, compared to only 9 out of 27 genes in the embryo (Supplementary Fig. 10c; Supplementary Data 9 & 10).
A co-expression network analysis using the 20,491 genes expressed in the endosperm (FPKM ≥ 1) was conducted to scrutinize the regulation of starch and protein biosynthesis. Soft clustering was employed to allow genes to belong to multiple clusters. Among the 12 co-expression modules (Supplementary Fig. 11; Supplementary Data 11), modules 8 and 12 exhibited significant enrichment (FDR < 0.05) in starch biosynthesis-related genes, as determined by Fisher’s Exact Test (Fig. 5a, Supplementary Fig. 12a, b). In addition, genes from the same modules were associated with diverse functional categories such as proteasome activity, N-glycan biosynthesis, the tricarboxylic acid (TCA) cycle, spliceosome activity, amino acid synthesis, oxidative phosphorylation, and DNA replication (Fig. 5b). Gene Network Analyzer analysis of these modules identified 361 hub genes based on two criteria: gene degree of connectivity ≥ 5 in the hub module and gene module membership > 0.8. Many hub genes encode proteins participating in the TCA cycle, ribosome biogenesis, oxidative phosphorylation, DNA replication, and starch and sucrose metabolism. For example, the top 10 highly connected genes (Supplementary Data 12) included genes that code for succinate dehydrogenase, metallopeptidase M24 family proteins, the translation initiation factor 3B family, and elongation factor 1-gamma 3. These results indicate that the core enzymes in starch synthesis are regulated by the identified hub genes. For instance, Sobic.007G023400, one of the hub genes, encodes the succinate dehydrogenase iron-protein subunit (SDHB), a crucial component of the succinate dehydrogenase enzyme complex that is essential for the TCA (Krebs) cycle and the electron transport chain during cellular respiration77. Notably, two genes involved in starch branching, Sobic.001G083900 (SbPHOL) and Sobic.003G358600 (SbPHOH), were also identified as hub genes, emphasizing their regulatory role in starch biosynthesis.
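The two hub-gene criteria stated above (degree of connectivity ≥ 5 within the module and module membership > 0.8) can be applied as a joint filter. The sketch below illustrates only that filtering step under those stated thresholds; it is not the Gene Network Analyzer implementation, and the function name `select_hub_genes`, the gene labels, and the values are hypothetical.

```python
def select_hub_genes(degree, membership, min_degree=5, min_membership=0.8):
    """Return genes with degree >= min_degree and membership > min_membership."""
    return sorted(
        gene
        for gene in degree
        if degree[gene] >= min_degree
        and membership.get(gene, 0.0) > min_membership
    )


# Hypothetical per-gene connectivity and module membership values
degree = {"gene_a": 12, "gene_b": 5, "gene_c": 4, "gene_d": 9}
membership = {"gene_a": 0.93, "gene_b": 0.81, "gene_c": 0.95, "gene_d": 0.80}
hubs = select_hub_genes(degree, membership)
```

Note that the membership threshold is strict while the degree threshold is inclusive, so a gene with membership exactly 0.8 is excluded and a gene with degree exactly 5 is retained.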
Clustering analysis using the Mfuzz package. The fuzzy c-means clustering algorithm uses a soft partitioning clustering method. Twelve co-expression modules were obtained. Yellow or green lines correspond to genes with a low membership value, whereas red and purple lines correspond to genes with a high membership value. Most genes showed a high membership value. Module numbers and the corresponding P-value are included above each cluster. a Modules 8 and 12 showed significant enrichment (FDR < 0.05) for starch biosynthesis-related genes using Fisher’s Exact Test. b GO-enrichment analysis of genes enriched in modules 8 and 12, visualized as a network. c Modules 4 and 10 showed enrichment (FDR < 0.05) for kafirin biosynthesis-related genes using Fisher’s Exact Test. d GO-enrichment analysis of genes enriched in modules 4 and 10, visualized as a network. Node colors represent enrichment significance (FDR < 0.05), node size indicates gene set size, and edge thickness signifies gene overlap. The analysis was performed using ShinyGO.
Among the 12 co-expression modules in the endosperm (Supplementary Fig. 11), modules 10 and 4 exhibited significant enrichment (FDR < 0.05) for kafirin genes (Fig. 5c; Supplementary Fig. 12c). Specifically, Module 4 encompasses 15 α-kafirin genes, while Module 10 includes six α-kafirin genes and two γ-kafirins. The β-kafirin and δ-kafirin genes were present in Modules 2 and 5, respectively. These findings indicated that different types of kafirins are synthesized at different stages of seed development. Notably, β- and δ-kafirins were expressed exclusively in the later stages of endosperm development (20–25 dpa), while α-kafirin expression spans from 15 to 25 dpa in wild-type sorghum.
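Module enrichment of this kind is typically tested by asking whether a module contains more genes from a set of interest than expected by chance. The sketch below computes a one-sided Fisher's exact test p-value from the hypergeometric distribution; the choice of a one-sided test is an assumption, the function name `enrichment_p_value` and the counts are hypothetical, and the multiple-testing (FDR) correction across modules is not shown.

```python
from math import comb


def enrichment_p_value(module_hits, module_size, total_hits, total_genes):
    """One-sided Fisher's exact test: P(X >= module_hits) under the null."""
    upper = min(module_size, total_hits)
    # Number of ways to draw a module of this size from all genes.
    denom = comb(total_genes, module_size)
    # Sum the hypergeometric probabilities for module_hits or more hits.
    tail = sum(
        comb(total_hits, k) * comb(total_genes - total_hits, module_size - k)
        for k in range(module_hits, upper + 1)
    )
    return tail / denom


# Hypothetical counts: 10 genes in total, 4 in the gene set of interest,
# and a module of 5 genes that contains all 4 of them.
p_value = enrichment_p_value(module_hits=4, module_size=5,
                             total_hits=4, total_genes=10)
```

A small p-value indicates that the overlap between the module and the gene set is unlikely under random assignment of genes to modules.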
The 1719 genes within modules 4 and 10, including 23 kafirin genes, were functionally implicated in crucial biological processes such as carbon metabolism, lipid metabolism, the MAPK signaling pathway, and seed storage protein processes, based on GO term analysis (Fig. 5d). The function of genes co-expressed with seed storage proteins could imply their involvement in the biosynthesis, accumulation, and/or mobilization of these proteins during seed development. Subsequently, we identified 207 hub genes related to kafirin biosynthesis from modules 4 and 10 (Supplementary Fig. 12d; Supplementary Data 13). These genes were significantly enriched in various biological processes such as lipid metabolism, fatty acid degradation, amino acid biosynthesis and degradation, MAPK signaling, carotenoid biosynthesis, and hormonal signaling. The top 10 most highly connected genes are presented in Supplementary Data 13. Some of these top hub genes have been investigated for roles in protein synthesis in other crops. For instance, extra-large GTP-binding proteins have been reported to play key roles in regulating panicle architecture, plant growth, development, grain weight, and disease resistance78. Similarly, bZIP TF is known to play a key role in regulating various biological pathways, including seed storage protein biosynthesis79,80,81,82. Further characterization of various alleles of these genes could provide valuable insights into the molecular mechanisms and regulatory networks underlying these processes and indicate gene targets for modifying protein content through molecular breeding.
Discussion
Sorghum is a major climate-smart cereal crop that will continue to play a significant role in adapting global food and feed systems to climate change. However, the molecular mechanisms governing sorghum seed development have not yet been explored as extensively as in other cereals44,45,83,84,85,86,87,88. Previous characterizations of the transcriptomic landscapes of endosperm development in wheat, rice, oat, barley, and maize have provided a solid foundation for the present study. Here, we employed RNA-Seq and untargeted metabolome profiling to capture dynamic transcriptome and metabolome profiles of sorghum seed development. Our findings provide significant insights into the interplay of gene expression and metabolite accumulation across five developmental timepoints, offering a comprehensive temporal perspective on functional and cellular specialization in the whole seed (maternal + daughter tissues), embryo, and endosperm (daughter tissues).
Our research has shed light on the dynamic landscape of transcriptomic and metabolomic activity during sorghum seed development, focusing on the reference genome line BTx623. This study lays the groundwork for clarifying the significant diversity observed in seed traits within global sorghum mapping and breeding populations. For instance, a genome-wide association study involving 837 varieties revealed 81 quantitative trait loci (QTLs) associated with grain size, which influences starch and protein harvestable yields on a per-area basis89. Other studies have documented the extensive diversity in grain quality traits across sorghum germplasm, including variations in seed color, starch, protein, and oil contents90,91,92. This diversity underscores the multitude of phenotypic variants impacted by sorghum seed development.
Future investigations are needed in diverse sorghum varieties to elucidate how genetic variation in starch, protein, and oil biosynthetic pathways results in differential carbon partitioning among them, and how that partitioning is impacted by whole-plant phenotypes and local adaptation. Comparative genomics studies should focus on differences in seed development processes among varieties having genetic variation in hub genes to shed light on tissue-specific metabolite accumulation patterns and contribute to improved seed quality, stress responses, and adaptation to local production environments. Hence, the baseline information presented herein about sorghum seed developmental programming holds promise for enhancing the utility of sorghum in breeding for climate-smart food and feed systems.
Tissue-specific (TS) genes play a pivotal role in unraveling the mechanisms governing tissue or organ identity and can be crucial in guiding the progression through seed developmental programs93,94. In our study, we identified 499 TS genes, including 41 TFs, expressed specifically in early whole seeds, embryos, and endosperms (Supplementary Data 6; Supplementary Fig. 13a). Analyses revealed variations in the numbers of TS genes among embryos (127 genes, including 14 TFs), endosperms (71 genes, including 6 TFs), and early whole seeds (79 genes, including 12 TFs), with the endosperm exhibiting the lowest number of TS genes (Supplementary Fig. 13a), consistent with previous findings in maize95 and wheat88. This may suggest the relatively less complex structure of the endosperm due to its role as a storage tissue, compared to embryos or early seeds, which must differentiate into multiple organs.
The dynamic expression patterns and functional enrichment of TS genes indicated their involvement in specific tissues and stages of seed development (Supplementary Fig. 13b, c). For instance, genes specific to early whole seeds were primarily associated with cell wall biosynthesis and structural integrity, emphasizing the importance of creating space inside maternal tissues for rapid expansion of the embryo and endosperm. Embryo-specific genes were predominantly expressed in later stages of embryo development, indicating an early focus on general growth and tissue differentiation, whereas endosperm-specific genes were expressed throughout its development, emphasizing its long-term programming (mediated through RNA processing and regulation) focused on starch and protein storage. Genes that were expressed only in the embryo and/or endosperm (nowhere else in the plant) appeared to coordinate developmental processes and respond to environmental cues, particularly through sterol biosynthesis and abscisic acid (ABA) responses. ABA in particular is a well-known player in seed maturation, dry-down, dormancy, germination, and both seed and whole-plant environmental responses96.
While the functions of most seed-specific TFs remain unknown, our analysis revealed enrichment with known regulator families of seed development (e.g., WOX, NF-YB, NAC, ERF, AP2, MYB, Myb_related). These TF families are recognized for their key roles in events unique to seeds, especially in the formation and maturation of the endosperm and embryo97,98,99. This suggests that the remaining TS genes and TFs we identified may also hold regulatory roles in seed development. Future endeavors should focus on elucidating their roles by leveraging the genetic variation present in mutant and natural populations. The identified TS genes and TFs, along with the newly mapped gene networks governing starch and protein in BTx623, provide a crucial starting point for understanding how gene activity and metabolite accumulation are coordinated during seed development and organogenesis, as determined by SEM. These processes collectively influence the yield, quality, and nutrient profiles of sorghum grain.
Materials and methods
Plant material and field experiments
The study utilized the Sorghum bicolor cultivar ‘BTx623,’ cultivated under field conditions at the Quaker Research Farm in Lubbock, TX (33°35'52.9"N 101°54'21.4"W, elevation 992 m) during the summer of 2022. The farm experiences a semi-arid climate with an average yearly precipitation of 469 mm, primarily from May to October, and features Amarillo sandy clay loam soil100. Irrigation was maintained at 1 inch of water per week.
Developing sorghum seeds were collected following successful pollination for detailed characterization (Supplementary Data 1). In brief, we conducted daily sampling from pollination until 30 dpa. Before flowering, panicles were covered with pollination bags to prevent cross-pollination. After pollination, these bags were replaced with mesh ones to safeguard seeds from birds while allowing light exposure and improved air circulation. From pollination to 15 dpa, we harvested the middle portion of 10 panicles that flowered on the same day for each replicate. Subsequently, for samples beyond 15 dpa, we collected the middle portion of 5 panicles per biological replicate. Three biological replicates for all data points were collected for transcriptome analysis and five replicates for 5, 10, 20, 25, and 30 dpa for metabolome analysis. Sampling was consistently conducted in the morning (between 9:00 AM and 11:00 AM) to minimize potential circadian influences. Following collection, samples were promptly transported to the laboratory, where they were dissected on ice using scalpels and tweezers to isolate embryos and endosperms.
For metabolome samples, 100 uniform seeds were isolated from the harvested panicles, flash-frozen in liquid nitrogen in 15 mL Falcon tubes, and stored at −80 °C. During seed dissection for embryo and endosperm RNA extraction, we isolated 50 uniform seeds from the harvested panicle. For early embryo samples (10–18 dpa), we isolated embryos from 40–50 seeds, whereas for later embryo samples, 20 seeds were sufficient for RNA extraction. Similarly, for endosperm samples (6–10 dpa), 40–50 seeds were used for isolation, while for later time points, 10 seeds were used for RNA extraction. Subsequently, embryo and endosperm samples were flash-frozen in liquid nitrogen and stored at −80 °C until further analysis.
Kafirin analysis
Kafirin content analysis involved a stepwise procedure to assess total kafirin levels and the degree of cross-linking (polymerization). Kafirin fractions were selectively extracted under non-reducing conditions (kafirin 1) and reducing conditions (kafirin 2) following the method outlined by Da Silva et al.30. Seeds from 5, 10, 15, 20, and 25 dpa were retrieved from −80 °C storage, immediately crushed using a mortar and pestle, and then returned to −80 °C. The coarsely crushed material was lyophilized and ground into a fine powder using a mortar and pestle. Kafirin 1 and kafirin 2 were then extracted as described in Da Silva et al.30, except that 50 mg of sample and 0.5 mL of solvent were used. Following extraction, beta-mercaptoethanol (BME) was added to kafirin 1 extracts to achieve a final concentration of 2%. After incubation with BME, samples underwent alkylation with 4-vinylpyridine (4-VP), as described in Bean et al.101. After kafirin 2 extraction, additional sample extraction solvent was added to equalize the total volume of kafirin 1 and kafirin 2. Kafirin 2 was then alkylated with 4-VP. Subsequently, kafirin 1 and kafirin 2 were subjected to analysis by RP-HPLC using C3 columns, following the procedure outlined in Bean et al.101.
Metabolomic analysis
Untargeted metabolomic profiling of sorghum seeds was performed on an LC-MS platform. Metabolite extraction and quantification were carried out by the service provider Innomics. For each replicate, whole seeds were shipped on dry ice. For metabolite extraction, 50 mg of each sample was weighed into 1.5 mL Eppendorf tubes and immersed in a pre-cooled extraction solution (methanol: H2O = 7:3, v/v), supplemented with 20 μL of Internal Standard 1. Homogenization was conducted using a weaving grinder at 50 Hz for 10 min, followed by water bath ultrasonication at 4 °C for 30 min. After being held at −20 °C for 1 h, the extracts were centrifuged at 14,000 rpm at 4 °C for 15 min. Then, 600 μL of the resulting supernatant was filtered using a 0.22 μm membrane, and 20 μL of the filtered solution from each sample was pooled into a mixed quality control (QC) sample to assess the repeatability and stability of the LC-MS analysis.
A Waters 2777c UPLC (Waters, USA) in series with a Q Exactive HF high-resolution mass spectrometer (Thermo Fisher Scientific, USA) was utilized for the separation and detection of metabolites. After acquisition, the off-line mass spectrometry data were imported into Compound Discoverer v3.3 (Thermo Fisher Scientific, USA) software. Analysis of the mass spectrometry data, in conjunction with the BGI metabolome database (bmdb), mzCloud database (https://www.mzcloud.org/), and ChemSpider online database (https://www.chemspider.com/), resulted in a data matrix containing metabolite peak areas and identification results. The identified metabolites were annotated using the Kyoto Encyclopedia of Genes and Genomes pathway database (KEGG; https://www.genome.jp/kegg/) and the Human Metabolome Database (HMDB; https://hmdb.ca/)102,103. Principal component analysis (PCA) and Partial Least Squares Discriminant Analysis (PLS-DA) were conducted with the metabolomics software MetaboAnalyst (https://www.metaboanalyst.ca/)104. Univariate analyses (t-tests) were used to calculate statistical significance (P-value). The following criteria were used to identify differentially expressed metabolites: Variable Importance in Projection (VIP) value > 1, P-value < 0.05, and log2 (fold change) ≥ 1.5 or ≤ −1.5.
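The three selection criteria can be combined into a single filter. The following is a minimal sketch, assuming each metabolite is summarized by its VIP value, P-value, and log2 fold change; the function name, record layout, and example values are hypothetical and are not taken from the study or from MetaboAnalyst.

```python
# Illustrative filter for the differential-metabolite criteria stated in the text.
# The record layout and the example values below are hypothetical.

def is_differential(vip, p_value, log2_fc,
                    vip_min=1.0, p_max=0.05, log2_fc_min=1.5):
    """Return True when VIP > 1, P < 0.05, and |log2 fold change| >= 1.5."""
    return vip > vip_min and p_value < p_max and abs(log2_fc) >= log2_fc_min


# Made-up example records (not data from this study).
metabolites = [
    {"name": "m1", "vip": 1.8, "p": 0.01, "log2_fc": 2.3},   # meets all criteria
    {"name": "m2", "vip": 0.7, "p": 0.01, "log2_fc": 2.3},   # VIP too low
    {"name": "m3", "vip": 1.4, "p": 0.20, "log2_fc": -3.0},  # P-value too high
    {"name": "m4", "vip": 1.2, "p": 0.03, "log2_fc": -1.5},  # meets all criteria (boundary)
]

selected = [m["name"] for m in metabolites
            if is_differential(m["vip"], m["p"], m["log2_fc"])]
print(selected)
```

Applying the absolute value to the log2 fold change treats increases and decreases symmetrically, which matches the "≥ 1.5 or ≤ −1.5" wording.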
RNAseq and data analysis
The RiboPure™ RNA Purification Kit (AM1924; Invitrogen) was utilized for total RNA isolation following the manufacturer’s instructions. We first used agarose gel electrophoresis and a Nanodrop to assess the quality of the extracted RNA samples and check for DNA and protein contamination. At least 2 μg of RNA per sample was shipped on dry ice to the sequencing service provider Innomics, which performed RNA quality control and sequencing. Briefly, the RNA integrity number (RIN) value was calculated and samples with RIN ≥ 7 were used for RNA sequencing (Supplementary Data 5). Standard DNBSEQ Eukaryotic Transcriptome Resequencing protocols were followed for the construction of RNA-seq libraries, which were subsequently sequenced to generate PE150 reads. For library construction, fragmented mRNA was reverse transcribed into first-strand cDNA using random primers, while the second-strand cDNA was synthesized with dUTP instead of dTTP. The synthesized cDNA was subjected to end-repair and 3’ adenylation. Adaptors were ligated to the ends of these 3’ adenylated cDNA fragments, followed by PCR amplification. The raw data from the DNBSEQ platform were filtered to remove adaptors, polyX, and low-quality data using SOAPnuke software105 with parameters: “-n 0.001 -l 20 -q 0.4 --adaMR 0.25 --ada_trim --polyX 50 --minReadLen 150”.
To ensure data quality, the cleaned data underwent QC assessment using FastQC106. High-quality reads were aligned to the sorghum reference genome (version 3.3.1)107,108 using STAR109. The FPKM values representing gene expression levels were calculated using StringTie110. Pearson correlation coefficients between biological replicates were calculated based on gene FPKM values, and replicates with a Pearson correlation > 0.8 were selected for further analysis.
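The replicate concordance check can be illustrated with a short sketch. It assumes each replicate is a list of FPKM values in the same gene order; the helper names and the toy values are hypothetical, and the text does not specify whether raw or transformed FPKM values were correlated.

```python
import math

# Illustrative replicate-concordance check using the Pearson correlation.
# Helper names and toy FPKM vectors are hypothetical.

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def concordant_pairs(replicates, threshold=0.8):
    """Return replicate pairs whose correlation exceeds the threshold."""
    names = sorted(replicates)
    return [(a, b)
            for i, a in enumerate(names)
            for b in names[i + 1:]
            if pearson(replicates[a], replicates[b]) > threshold]


# Toy FPKM vectors for four genes (not data from this study).
replicates = {
    "rep1": [1.0, 2.0, 3.0, 4.0],
    "rep2": [2.0, 4.0, 6.0, 8.5],
    "rep3": [4.0, 1.0, 3.0, 2.0],
}
print(concordant_pairs(replicates))
```

In this toy case only rep1 and rep2 correlate above 0.8, so rep3 would be flagged for exclusion.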
Genes were considered expressed at a specific stage if they met the following criteria: (1) a minimum of two reads were mapped to the gene in each of two replicates, and (2) the average FPKM at a timepoint was ≥ 1 in at least one sample. To mitigate the impact of transcriptional noise, genes with a minimum FPKM value ≥ 1 in at least one sample were included for downstream analysis, consistent with the approach used in several other studies34,35,46,99,111.
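The two expression criteria can be sketched as simple predicates. This is one reading of the criteria, assuming per-replicate read counts for a stage and per-sample lists of replicate FPKM values; the function names, sample labels, and values are hypothetical.

```python
# Illustrative implementation of the two expression criteria described in the text.
# Function names, sample labels, and values are hypothetical.

def detected_in_replicates(read_counts, min_reads=2, min_replicates=2):
    """Criterion 1: at least `min_reads` mapped reads in at least `min_replicates` replicates."""
    return sum(1 for count in read_counts if count >= min_reads) >= min_replicates


def passes_fpkm_cutoff(fpkm_by_sample, min_mean_fpkm=1.0):
    """Criterion 2: replicate-averaged FPKM >= cutoff in at least one sample."""
    return any(
        sum(values) / len(values) >= min_mean_fpkm
        for values in fpkm_by_sample.values()
        if values
    )


def is_expressed(read_counts, fpkm_by_sample):
    """A gene is kept only when both criteria hold."""
    return detected_in_replicates(read_counts) and passes_fpkm_cutoff(fpkm_by_sample)


# Toy example (not data from this study).
gene_a_reads = [5, 3, 0]
gene_a_fpkm = {"embryo_10dpa": [1.2, 0.9, 1.5], "endosperm_10dpa": [0.1, 0.2, 0.0]}
gene_b_reads = [1, 4, 0]

print(is_expressed(gene_a_reads, gene_a_fpkm))  # both criteria met
print(is_expressed(gene_b_reads, gene_a_fpkm))  # only one replicate has >= 2 reads
```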
PCA was employed to visually represent relationships among distinct seed tissue samples, utilizing the prcomp112 function within R with default settings. Hierarchical clustering was performed by k-means clustering with the pheatmap package113 using default settings. The elbow method114,115 was applied to determine the optimal cluster number. Transformed and normalized gene expression values with log2 (FPKM + 1) were used for PCA and hierarchical clustering. For hierarchical clustering, relative expression values of the genes were calculated by dividing their expression level at different timepoints by their maximum observed FPKM.
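The two transformations described above are simple enough to sketch directly. The helper names and the toy time course are hypothetical; the analysis itself was run in R, so this Python version only illustrates the arithmetic.

```python
import math

# Illustrative versions of the two transformations described in the text.
# Helper names and the example time course are hypothetical.

def log2_fpkm(values):
    """Apply the log2(FPKM + 1) transform used before PCA and clustering."""
    return [math.log2(v + 1.0) for v in values]


def relative_to_max(values):
    """Scale a gene's expression profile by its maximum observed value."""
    peak = max(values)
    if peak <= 0:
        return [0.0 for _ in values]  # gene not detected at any timepoint
    return [v / peak for v in values]


# Toy FPKM profile of one gene across four timepoints (not data from this study).
profile = [0.0, 3.0, 7.0, 15.0]
print(log2_fpkm(profile))
print(relative_to_max(profile))
```

Adding 1 before taking the logarithm keeps genes with zero FPKM defined, and scaling by the maximum places every gene on a 0 to 1 scale so that clusters reflect the shape of the profile rather than absolute abundance.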
Functional enrichment analysis was conducted based on a hypergeometric test using KEGG and ShinyGO (http://bioinformatics.sdstate.edu/go74/)116. Enriched KEGG pathways with an FDR < 0.05 were considered statistically significant, and selected KEGG pathways are presented. The co-expression network was generated through the STRING database (https://string-db.org/)117. Mean FPKM values were clustered using Fuzzy c-means clustering in the Mfuzz v2.42 R package (https://www.bioconductor.org/packages/release/bioc/html/Mfuzz.html)118. The optimal number of clusters was set to 12, and the fuzzifier coefficient was set to 2.01. Genes with a membership score of at least 0.5 were plotted and used as inputs for categorical enrichment analysis.
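For readers unfamiliar with the enrichment statistic, the following minimal sketch shows a one-sided hypergeometric test followed by a Benjamini-Hochberg adjustment. It is not the ShinyGO implementation; the function names and all numbers are illustrative assumptions, and Benjamini-Hochberg is assumed here as the FDR procedure.

```python
from math import comb

# Illustrative hypergeometric enrichment test with a Benjamini-Hochberg adjustment.
# Not the ShinyGO implementation; all numbers are toy values.

def hypergeom_sf(k, population, annotated, drawn):
    """P(X >= k) when `drawn` genes are sampled from `population` genes,
    of which `annotated` belong to the pathway."""
    upper = min(annotated, drawn)
    total = comb(population, drawn)
    hits = sum(comb(annotated, i) * comb(population - annotated, drawn - i)
               for i in range(k, upper + 1))
    return hits / total


def benjamini_hochberg(p_values):
    """Return BH-adjusted P-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest P-value down
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Toy example: 10 genes in the background, 4 in the pathway, 3 in the gene list,
# 2 of which fall in the pathway.
p = hypergeom_sf(2, population=10, annotated=4, drawn=3)
print(p)
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.5]))
```

Pathways would then be retained when the adjusted value falls below 0.05, as stated in the text.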
qRT-PCR
The expression levels of selected genes were validated by quantitative real-time polymerase chain reaction (qRT-PCR). Total RNA was reverse transcribed into complementary DNA (cDNA) using iScript™ Reverse Transcription Supermix for RT-qPCR (Bio-Rad), according to the manufacturer’s instructions. qRT-PCR was performed with two technical replicates for each of the three biological replicates using SsoAdvanced Universal SYBR Green Supermix (Bio-Rad) on a Bio-Rad CFX96 system. Data were processed using CFX Manager software. The relative transcript levels were normalized to the expression of the reference gene Serine/threonine-Protein Phosphatase (PP2A), which was selected based on a published evaluation of sorghum reference genes119. Oligonucleotides used for these experiments are listed in Supplementary Data 14.
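Normalization to a reference gene is often expressed with the 2^(−ΔCt) formula. The text does not state which quantification model was applied in CFX Manager, so the sketch below is an assumption: it presumes 100% amplification efficiency, averages technical replicates before subtraction, and uses hypothetical function names and Ct values.

```python
# Illustrative 2^(-dCt) normalization to a reference gene.
# Assumes 100% amplification efficiency; the quantification model actually
# used in the study is not stated, and the Ct values are hypothetical.

def mean(values):
    return sum(values) / len(values)


def relative_expression(target_ct, reference_ct):
    """Relative transcript level of a target gene versus the reference gene.

    Both arguments are lists of Ct values from technical replicates of the
    same biological sample.
    """
    delta_ct = mean(target_ct) - mean(reference_ct)
    return 2.0 ** (-delta_ct)


# Toy Ct values for one biological replicate (two technical replicates each).
target = [24.5, 25.5]     # hypothetical gene of interest
reference = [22.5, 23.5]  # hypothetical PP2A readings
print(relative_expression(target, reference))
```

A target that crosses the threshold two cycles later than the reference is estimated at one quarter of the reference abundance under these assumptions.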
Identification of embryo-, endosperm-, and whole seed-specific genes
A total of 42 previously published non-seed sorghum RNA-seq datasets were employed to identify the tissue-specific (TS) genes67,108,120,121,122,123,124,125,126,127. The TS genes were identified utilizing a TS scoring algorithm128,129 that compares the expression level of a gene in a given compartment with its maximal expression level in the other samples. TS scores range from 0 to 1, and the higher the TS score of a gene for a tissue, the more likely the gene is specifically expressed in that tissue95,128,129. This study defined TS genes as having a TS score > 0.5, following similar criteria used in other studies in maize95 and yam111.
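One way to realize such a score is sketched below. The exact formula is not given in the text, so the form used here, one minus the ratio of the highest expression elsewhere to the expression in the tissue of interest, clipped at zero, is an assumption chosen to match the stated 0 to 1 range. The function name, tissue labels, and values are hypothetical.

```python
# Illustrative tissue-specificity (TS) score.
# ASSUMPTION: TS = 1 - max(expression in other samples) / expression in target,
# clipped to the range 0..1. The text describes the comparison but not the formula.

def ts_score(expr_by_tissue, tissue):
    """Score how specific a gene's expression is to `tissue`."""
    target = expr_by_tissue[tissue]
    if target <= 0:
        return 0.0  # not expressed in the tissue of interest
    others = [value for name, value in expr_by_tissue.items() if name != tissue]
    max_other = max(others) if others else 0.0
    return max(0.0, 1.0 - max_other / target)


# Toy expression values for one gene (not data from this study).
gene = {"embryo": 40.0, "endosperm": 10.0, "leaf": 4.0, "root": 2.0}

embryo_score = ts_score(gene, "embryo")
endosperm_score = ts_score(gene, "endosperm")
print(embryo_score, embryo_score > 0.5)        # passes the TS > 0.5 cutoff
print(endosperm_score, endosperm_score > 0.5)  # does not pass
```

Under this form, a score above 0.5 means the gene is expressed at more than twice the level seen in any other sample.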
Statistics and reproducibility
Details of the statistical tests used in the study are provided in the respective methods sections and Supplementary Data. RNA-seq, metabolome profiling, and qPCR were performed with two, five and three independent biological replicates, respectively. Pearson correlations (r) among the replicates were calculated based on the expression levels (FPKM) of the genes. The assessment of GO term enrichment was conducted using Fisher’s Exact test, followed by adjustment for false discovery rate (FDR) as implemented in PANTHER 19.0. For metabolome pathway enrichment analysis, the Hypergeometric Test was utilized, as implemented in MetaboAnalyst.
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
Data availability
The source data behind the graphs in the main figures can be found in Supplementary Data 15. The RNA-seq data has been deposited into the Ensembl ArrayExpress collection in BioStudies under Accession number: E-MTAB-13406. The metabolomic data, encompassing compound names, formulas, exact Q1 (m/z) values, molecular weights, and peak intensities, are available in the supplementary data.
References
Hossain, M. S., Islam, M. N., Rahman, M. M., Mostofa, M. G. & Khan, M. A. R. Sorghum: A prospective crop for climatic vulnerability, food and nutritional security. J. Agriculture Food Res. 8, 100300 (2022).
Adebo, O. A. African sorghum-based fermented foods: past, current and future prospects. Nutrients 12, 1111 (2020).
Mundia, C. W., Secchi, S., Akamani, K. & Wang, G. A regional comparison of factors affecting global sorghum production: The case of North America, Asia and Africa’s Sahel. Sustainability 11, 2135 (2019).
Maikasuwa, A. & Ala, A. Trend analysis of area and productivity of sorghum in Sokoto state, Nigeria, 1993-2012. Eur. Sci. J. 9, 16 (2013).
Maiti, R. Sorghum science. (Science Publishers, Inc., 1996).
Tack, J., Lingenfelser, J. & Jagadish, S. K. Disaggregating sorghum yield reductions under warming scenarios exposes narrow genetic diversity in US breeding programs. Proc. Natl Acad. Sci. 114, 9296–9301 (2017).
Schlenker, W. & Lobell, D. B. Robust negative impacts of climate change on African agriculture. Environ. Res. Lett. 5, 014010 (2010).
Sultan, B. et al. Robust features of future climate change impacts on sorghum yields in West Africa. Environ. Res. Lett. 9, 104006 (2014).
Liu, B. et al. Similar estimates of temperature impacts on global wheat yield by three independent methods. Nat. Clim. Change 6, 1130–1136 (2016).
Mohammed, A. & Misganaw, A. Modeling future climate change impacts on sorghum (Sorghum bicolor) production with best management options in Amhara Region, Ethiopia. CABI Agriculture Biosci. 3, 22 (2022).
Kladnik, A., Chourey, P. S., Pring, D. R. & Dermastia, M. Development of the endosperm of Sorghum bicolor during the endoreduplication-associated growth phase. J. Cereal Sci. 43, 209–215 (2006).
Rooney, W. Sorghum: Origin, History, Technology, and Production (2000).
Benech-Arnold, R. L. & Rodríguez, M. V. Pre-harvest sprouting and grain dormancy in Sorghum bicolor: what have we learned? Front. Plant Sci. 9, 811 (2018).
Raghavan, V. Some reflections on double fertilization, from its discovery to the present. N. Phytologist 159, 565–583 (2003).
Tao, Y. et al. Integration of embryo–endosperm interaction into a holistic and dynamic picture of seed development using a rice mutant with notched-belly kernels. Crop J. 10, 729–742 (2022).
Zheng, Y., Xiong, F., Wang, Z. & Gu, Y. Observation and investigation of three endosperm transport tissues in sorghum caryopses. Protoplasma 252, 705–714 (2015).
Artschwager, E. & McGuire, R. C. Cytology of reproduction in Sorghum vulgare. J. Agric. Res. 78, 659–673 (1949).
Kowles, R. V. & Phillips, R. L. DNA amplification patterns in maize endosperm nuclei during kernel development. Proc. Natl Acad. Sci. 82, 7010–7014 (1985).
Schweizer, L., Yerk-Davis, G., Phillips, R., Srienc, F. & Jones, R. Dynamics of maize endosperm development and DNA endoreduplication. Proc. Natl Acad. Sci. 92, 7070–7074 (1995).
Zheng, Y. & Wang, Z. Structural character of sorghum endosperm transfer cells and their relationship with embryo and endosperm. Int. J. Plant Biol. 1, e15 (2010).
Leviczky, T. et al. E2FA and E2FB transcription factors coordinate cell proliferation with seed maturation. Development 146, dev179333 (2019).
Li, R., Tan, Y. & Zhang, H. Regulators of starch biosynthesis in cereal crops. Molecules 26, 7092 (2021).
Hamaker, B., Mohamed, A., Habben, J., Huang, C. & Larkins, B. Efficient procedure for extracting maize and sorghum kernel proteins reveals higher prolamin contents than the conventional method. Cereal Chem. 72, 583–588 (1995).
Duressa, D., Weerasoriya, D., Bean, S. R., Tilley, M. & Tesso, T. Genetic basis of protein digestibility in grain sorghum. Crop Sci. 58, 2183–2199 (2018).
Izquierdo, L. & Godwin, I. D. Molecular characterization of a novel methionine‐rich δ‐kafirin seed storage protein gene in sorghum (Sorghum bicolor L.). Cereal Chem. 82, 706–710 (2005).
Belton, P., Delgadillo, I., Halford, N. & Shewry, P. Kafirin structure and functionality. J. cereal Sci. 44, 272–286 (2006).
Laidlaw, H. et al. Allelic variation of the β-, γ-and δ-kafirin genes in diverse Sorghum genotypes. Theor. Appl. Genet. 121, 1227–1237 (2010).
Castro-Jácome, T. P., Alcántara-Quintana, L. E. & Tovar-Pérez, E. G. Optimization of sorghum kafirin extraction conditions and identification of potential bioactive peptides. BioResearch Open Access 9, 198–208 (2020).
Duodu, K. et al. Effect of grain structure and cooking on sorghum and maize in vitro protein digestibility. J. Cereal Sci. 35, 161–174 (2002).
Da Silva, L. S., Taylor, J. & Taylor, J. R. Transgenic sorghum with altered kafirin synthesis: kafirin solubility, polymerization, and protein digestion. J. Agric. food Chem. 59, 9265–9270 (2011).
Locascio, A., Roig-Villanova, I., Bernardi, J. & Varotto, S. Current perspectives on the hormonal control of seed development in Arabidopsis and maize: a focus on auxin. Front. plant Sci. 5, 412 (2014).
Kozaki, A. & Aoyanagi, T. Molecular aspects of seed development controlled by gibberellins and abscisic acids. Int. J. Mol. Sci. 23, 1876 (2022).
Ruuska, S. A., Girke, T., Benning, C. & Ohlrogge, J. B. Contrapuntal networks of gene expression during Arabidopsis seed filling. Plant Cell 14, 1191–1206 (2002).
Palovaara, J., Saiga, S. & Weijers, D. Transcriptomics approaches in the early Arabidopsis embryo. Trends Plant Sci. 18, 514–521 (2013).
Fait, A. et al. Arabidopsis seed development and germination is associated with temporally distinct metabolic switches. Plant Physiol. 142, 839–854 (2006).
Baud, S., Boutin, J.-P., Miquel, M., Lepiniec, L. & Rochat, C. An integrated overview of seed development in Arabidopsis thaliana ecotype WS. Plant Physiol. Biochem. 40, 151–160 (2002).
Lan, L. et al. Monitoring of gene expression profiles and isolation of candidate genes involved in pollination and fertilization in rice (Oryza sativa L.) with a 10K cDNA microarray. Plant Mol. Biol. 54, 471–487 (2004).
Furutani, I., Sukegawa, S. & Kyozuka, J. Genome‐wide analysis of spatial and temporal gene expression in rice panicle development. Plant J. 46, 503–511 (2006).
Jiang, S.-Y. & Ramachandran, S. Functional genomics of rice pollen and seed development by genome-wide transcript profiling and Ds insertion mutagenesis. Int. J. Biol. Sci. 7, 28 (2011).
Xue, L.-J., Zhang, J.-J. & Xue, H.-W. Genome-wide analysis of the complex transcriptional networks of rice developing seeds. PloS one 7, e31081 (2012).
Rangan, P., Furtado, A. & Henry, R. J. The transcriptome of the developing grain: a resource for understanding seed development and the molecular control of the functional and nutritional properties of wheat. BMC genomics 18, 1–9 (2017).
Guan, J. et al. Transcriptome Analysis of Developing Wheat Grains at Rapid Expanding Phase Reveals Dynamic Gene Expression Patterns. Biology 11, 281 (2022).
Jiang, L. et al. Dynamic transcriptome analysis suggests the key genes regulating seed development and filling in Tartary buckwheat (Fagopyrum tataricum Garetn.). Front. Genet. 13, 990412 (2022).
Yi, F. et al. High temporal-resolution transcriptome landscape of early maize seed development. Plant Cell 31, 974–992 (2019).
Chen, J. et al. Dynamic transcriptome landscape of maize embryo and endosperm development. Plant Physiol. 166, 252–264 (2014).
Li, X., Wu, J., Yi, F., Lai, J. & Chen, J. High temporal-resolution transcriptome landscapes of maize embryo sac and ovule during early seed development. Plant Mol. Biol. 111, 233–248 (2023).
Li, G. et al. Temporal patterns of gene expression in developing maize endosperm identified through transcriptome sequencing. Proc. Natl Acad. Sci. 111, 7582–7587 (2014).
Wang, X. et al. Integrated analysis of transcriptomic and proteomic data from tree peony (P. ostii) seeds reveals key developmental stages and candidate genes related to oil biosynthesis and fatty acid metabolism. Horticulture Res. 6, 111 (2019).
Fedorova, M. et al. Genome-wide identification of nodule-specific transcripts in the model legume Medicago truncatula. Plant Physiol. 130, 519–537 (2002).
Gallardo, K. et al. A combined proteome and transcriptome analysis of developing Medicago truncatula seeds: evidence for metabolic specialization of maternal and filial tissues. Mol. Cell. Proteom. 6, 2165–2179 (2007).
Ziegler, D. J., Khan, D., Kalichuk, J. L., Becker, M. G. & Belmonte, M. F. Transcriptome landscape of the early Brassica napus seed. J. Integr. Plant Biol. 61, 639–650 (2019).
Shahid, M. et al. Comparative transcriptome analysis of developing seeds and silique wall reveals dynamic transcription networks for effective oil production in Brassica napus L. Int. J. Mol. Sci. 20, 1982 (2019).
Li, F., Wu, X., Tsang, E. & Cutler, A. J. Transcriptional profiling of imbibed Brassica napus seed. Genomics 86, 718–730 (2005).
Watson, L. & Henry, R. J. Microarray analysis of gene expression in germinating barley embryos (Hordeum vulgare L.). Funct. Integr. Genomics 5, 155–162 (2005).
Qi, Z. et al. Meta‐analysis and transcriptome profiling reveal hub genes for soybean seed storage composition during seed development. Plant, Cell Environ. 41, 2109–2127 (2018).
Collakova, E. et al. Metabolic and transcriptional reprogramming in developing soybean (Glycine max) embryos. Metabolites 3, 347–372 (2013).
Yang, S. et al. Dynamic transcriptome changes related to oil accumulation in developing soybean seeds. Int. J. Mol. Sci. 20, 2202 (2019).
Sun, S. et al. Analysis of spatio-temporal transcriptome profiles of soybean (Glycine max) tissues during early seed development. Int. J. Mol. Sci. 21, 7603 (2020).
Huang, L., Tan, H., Zhang, C., Li, Q. & Liu, Q. Starch biosynthesis in cereal endosperms: An updated review over the last decade. Plant Commun. 2, 100237. https://doi.org/10.1016/j.xplc.2021.100237 (2021).
Qu, J. et al. Comparative transcriptomics reveals the difference in early endosperm development between maize with different amylose contents. PeerJ 7, e7528 (2019).
Du, J. et al. Identification of regulatory networks and hub genes controlling soybean seed set and size using RNA sequencing analysis. J. Exp. Bot. 68, 1955–1972 (2017).
Zhang, Z. et al. Integrated metabolomics and transcriptomics analyses reveal the metabolic differences and molecular basis of nutritional quality in landraces and cultivated rice. Metabolites 12, 384 (2022).
Xiao, Q. et al. Profiling of transcriptional regulators associated with starch biosynthesis in sorghum (Sorghum bicolor L.). Front. Plant Sci. 13, 999747 (2022).
Seebauer, J. R., Singletary, G. W., Krumpelman, P. M., Ruffo, M. L. & Below, F. E. Relationship of source and sink in determining kernel composition of maize. J. Exp. Bot. 61, 511–519 (2010).
Shen, S., Hou, H., Ding, C., Bing, D.-J. & Lu, Z.-X. Protein content correlates with starch morphology, composition and physicochemical properties in field peas. Can. J. plant Sci. 96, 404–412 (2016).
Kljak, K., Duvnjak, M. & Grbeša, D. Effect of starch properties and zein content of commercial maize hybrids on kinetics of starch digestibility in an in vitro poultry model. J. Sci. Food Agriculture 99, 6372–6379 (2019).
Wang, B. et al. A comparative transcriptional landscape of maize and sorghum obtained by single-molecule sequencing. Genome Res. 28, 921–932 (2018).
Doll, N. M., Depège-Fargeix, N., Rogowsky, P. M. & Widiez, T. Signaling in early maize kernel development. Mol. Plant 10, 375–388 (2017).
Young, T. E. & Gallie, D. R. Programmed cell death during endosperm development. Programmed cell death in higher plants, 39–57 (2000).
Domínguez, F. & Cejudo, F. J. Programmed cell death (PCD): an essential process of cereal seed development and germination. Front. plant Sci. 5, 99512 (2014).
Young, T. E., Gallie, D. R. & DeMason, D. A. Ethylene-mediated programmed cell death during maize endosperm development of wild-type and shrunken2 genotypes. Plant Physiol. 115, 737–751 (1997).
Ahmadizadeh, M., Chen, J.-T., Hasanzadeh, S., Ahmar, S. & Heidari, P. Insights into the genes involved in the ethylene biosynthesis pathway in Arabidopsis thaliana and Oryza sativa. J. Genet. Eng. Biotechnol. 18, 62 (2020).
Wang, G. et al. Dek40 encodes a PBAC4 protein required for 20S proteasome biogenesis and seed development. Plant Physiol. 180, 2120–2132 (2019).
Wei, Y. M. et al. Defective kernel 66 encodes a GTPase essential for kernel development in maize. J. Exp. Bot. 74, 5694–5708 (2023).
Pedroza-Garcia, J. A. et al. Maize ATR safeguards genome stability during kernel development to prevent early endosperm endocycle onset and cell death. Plant Cell 33, 2662–2684 (2021).
Yan, D., Duermeyer, L., Leoveanu, C. & Nambara, E. The functions of the endosperm during seed germination. Plant and Cell Physiology. 55, 1521–1533 (2014).
Huang, S. & Millar, A. H. Succinate dehydrogenase: the complex roles of a simple enzyme. Curr. Opin. plant Biol. 16, 344–349 (2013).
Biswal, A. K. et al. Novel mutant alleles reveal a role of the extra-large G protein in rice grain filling, panicle architecture, plant growth, and disease resistance. Front. Plant Sci. 12, 2821 (2022).
Zhang, C. et al. Pivotal roles of ELONGATED HYPOCOTYL5 in regulation of plant development and fruit metabolism in tomato. Plant Physiol. 189, 527–540 (2022).
Sun, Y. et al. Natural variation in the OsbZIP18 promoter contributes to branched‐chain amino acid levels in rice. N. Phytologist 228, 1548–1558 (2020).
Sun, Y. et al. OsbZIP18, a positive regulator of serotonin biosynthesis, negatively controls the UV-B tolerance in rice. Int. J. Mol. Sci. 23, 3215 (2022).
Kawakatsu, T. & Takaiwa, F. Differences in transcriptional regulatory mechanisms functioning for free lysine content and seed storage protein accumulation in rice grain. Plant cell Physiol. 51, 1964–1974 (2010).
Gao, Y., Xu, H., Shen, Y. & Wang, J. Transcriptomic analysis of rice (Oryza sativa) endosperm using the RNA-Seq technique. Plant Mol. Biol. 81, 363–378 (2013).
Gillies, S. A., Futardo, A. & Henry, R. J. Gene expression in the developing aleurone and starchy endosperm of wheat. Plant Biotechnol. J. 10, 668–679 (2012).
Gutierrez-Gonzalez, J. J., Tu, Z. J. & Garvin, D. F. Analysis and annotation of the hexaploid oat seed transcriptome. BMC genomics 14, 1–17 (2013).
Lu, X. et al. The differential transcription network between embryo and endosperm in the early developing maize seed. Plant Physiol. 162, 440–455 (2013).
Pellny, T. K. et al. Cell walls of developing wheat starchy endosperm: comparison of composition and RNA-Seq transcriptome. Plant Physiol. 158, 612–627 (2012).
Chi, Q. et al. Global transcriptome analysis uncovers the gene co-expression regulation network and key genes involved in grain development of wheat (Triticum aestivum L.). Funct. Integr. Genomics 19, 853–866 (2019).
Tao, Y. et al. Large‐scale GWAS in sorghum reveals common genetic control of grain size among cereals. Plant Biotechnol. J. 18, 1093–1105 (2020).
Boyles, R. E. et al. Genetic dissection of sorghum grain quality traits using diverse and segregating populations. Theor. Appl. Genet. 130, 697–716 (2017).
Kumar, N. et al. Development and characterization of a sorghum multi-parent advanced generation intercross (MAGIC) population for capturing diversity among seed parent gene pool. G3: Genes, Genomes, Genetics 13, jkad037 (2023).
Kimani, W., Zhang, L.-M., Wu, X.-Y., Hao, H.-Q. & Jing, H.-C. Genome-wide association study reveals that different pathways contribute to grain quality variation in sorghum (Sorghum bicolor). BMC genomics 21, 1–19 (2020).
Pattison, R. J. et al. Comprehensive tissue-specific transcriptome analysis reveals distinct regulatory programs during early tomato fruit development. Plant Physiol. 168, 1684–1701 (2015).
Han, B. et al. Epigenetic regulation of seed-specific gene expression by DNA methylation valleys in castor bean. BMC Biol. 20, 1–18 (2022).
Zhan, J. et al. RNA sequencing of laser-capture microdissected compartments of the maize kernel identifies regulatory modules associated with endosperm cell differentiation. Plant Cell 27, 513–531 (2015).
Ali, F., Qanmber, G., Li, F. & Wang, Z. Updated role of ABA in seed maturation, dormancy, and germination. J. Adv. Res. 35, 199–214 (2022).
Zhao, Y., Hu, Y., Dai, M., Huang, L. & Zhou, D.-X. The WUSCHEL-related homeobox gene WOX11 is required to activate shoot-borne crown root development in rice. Plant Cell 21, 736–748 (2009).
Yamamoto, A. et al. Arabidopsis NF‐YB subunits LEC1 and LEC1‐LIKE activate transcription by interacting with seed‐specific ABRE‐binding factors. Plant J. 58, 843–856 (2009).
Jiang, W., Zhang, X., Song, X., Yang, J. & Pang, Y. Genome-wide identification and characterization of APETALA2/ethylene-responsive element binding factor superfamily genes in soybean seed development. Front. plant Sci. 11, 566647 (2020).
Singh, M. et al. Biochar implications under limited irrigation for sweet corn production in a semi-arid environment. Front. Plant Sci. 13, 1032 (2022).
Bean, S., Ioerger, B. & Blackwell, D. Separation of kafirins on surface porous reversed-phase high-performance liquid chromatography columns. J. Agric. Food Chem. 59, 85–91 (2011).
Kanehisa, M. & Goto, S. KEGG: kyoto encyclopedia of genes and genomes. Nucleic Acids Res. 28, 27–30 (2000).
Wishart, D. S. et al. HMDB: the human metabolome database. Nucleic Acids Res. 35, D521–D526 (2007).
Xia, J. & Wishart, D. S. Using MetaboAnalyst 3.0 for comprehensive metabolomics data analysis. Current Protocols in Bioinformatics 55, 14.10.11–14.10.91 (2016).
Chen, Y. et al. SOAPnuke: a MapReduce acceleration-supported software for integrated quality control and preprocessing of high-throughput sequencing data. Gigascience 7, gix120 (2018).
Andrews, S. FastQC: A quality control tool for high throughput sequence data. Babraham Bioinformatics, Babraham Institute, Cambridge, UK (2010).
Paterson, A. H. et al. The Sorghum bicolor genome and the diversification of grasses. Nature 457, 551–556 (2009).
McCormick, R. F. et al. The Sorghum bicolor reference genome: improved assembly, gene annotations, a transcriptome atlas, and signatures of genome organization. Plant J. 93, 338–354 (2018).
Dobin, A. et al. STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29, 15–21 (2013).
Pertea, M. et al. StringTie enables improved reconstruction of a transcriptome from RNA-seq reads. Nat. Biotechnol. 33, 290–295 (2015).
Wu, Z.-G. et al. Morphological and stage-specific transcriptome analyses reveal distinct regulatory programs underlying yam (Dioscorea alata L.) bulbil growth. J. Exp. Bot. 71, 1899–1914 (2020).
Venables, W. N. & Ripley, B. D. Modern applied statistics with S-PLUS. (Springer Science & Business Media, 2013).
Kolde, R. & Kolde, M. R. Package ‘pheatmap’. R package 1, 790 (2015).
Bholowalia, P. & Kumar, A. EBK-means: A clustering technique based on elbow method and k-means in WSN. Int. J. Comput. Appl. 105, 17–24 (2014).
Yuan, C. & Yang, H. Research on K-value selection method of K-means clustering algorithm. J 2, 226–235 (2019).
Ge, S. X., Jung, D. & Yao, R. ShinyGO: a graphical gene-set enrichment tool for animals and plants. Bioinformatics 36, 2628–2629 (2020).
Szklarczyk, D. et al. The STRING database in 2023: protein–protein association networks and functional enrichment analyses for any sequenced genome of interest. Nucleic Acids Res. 51, D638–D646 (2023).
Kumar, L. & Futschik, M. E. Mfuzz: a software package for soft clustering of microarray data. Bioinformation 2, 5 (2007).
Sudhakar Reddy, P. et al. Evaluation of sorghum [Sorghum bicolor (L.)] reference genes in various tissues and under abiotic stress conditions for quantitative real-time PCR data normalization. Front. Plant Sci. 7, 172935 (2016).
Kebrom, T. H., McKinley, B. & Mullet, J. E. Dynamics of gene expression during development and expansion of vegetative stem internodes of bioenergy sorghum. Biotechnol. Biofuels 10, 1–16 (2017).
Emms, D. M., Covshoff, S., Hibberd, J. M. & Kelly, S. Independent and parallel evolution of new genes by gene duplication in two origins of C4 photosynthesis provides new insight into the mechanism of phloem loading in C4 species. Mol. Biol. Evol. 33, 1796–1806 (2016).
Makita, Y. et al. MOROKOSHI: transcriptome database in Sorghum bicolor. Plant Cell Physiol. 56, e6–e6 (2015).
Davidson, R. M. et al. Comparative transcriptomics of three Poaceae species reveals patterns of gene expression evolution. Plant J. 71, 492–502 (2012).
Turco, G. M. et al. DNA methylation and gene expression regulation associated with vascularization in Sorghum bicolor. New Phytol. 214, 1213–1229 (2017).
Varoquaux, N. et al. Transcriptomic analysis of field-droughted sorghum from seedling to maturity reveals biotic and metabolic responses. Proc. Natl Acad. Sci. 116, 27124–27132 (2019).
Gelli, M. et al. Identification of differentially expressed genes between sorghum genotypes with contrasting nitrogen stress tolerance by genome-wide transcriptional profiling. BMC Genomics 15, 1–16 (2014).
Dugas, D. V. et al. Functional annotation of the transcriptome of Sorghum bicolor in response to osmotic stress and abscisic acid. BMC Genomics 12, 1–21 (2011).
Ma, C., Xin, M., Feldmann, K. A. & Wang, X. Machine learning–based differential network analysis: a study of stress-responsive transcriptomes in Arabidopsis. Plant Cell 26, 520–537 (2014).
Ma, C. & Wang, X. Application of the Gini correlation coefficient to infer regulatory relationships in transcriptome analysis. Plant Physiol. 160, 192–203 (2012).
Acknowledgements
This research was supported by the intramural research program of the U.S. Department of Agriculture, National Institute of Food and Agriculture, Agriculture and Food Research Initiative (AFRI), under award number 2023-67013-39631. Y.J. and A.K. were also supported by the State of Texas’ Governor’s University Research Initiative (GURI). S.R.B. was funded by USDA-ARS project number 3020-43440-002. The findings and conclusions in this preliminary publication have not been formally disseminated by the U.S. Department of Agriculture and should not be construed to represent any agency determination or policy. Names are necessary to report factually on available data; however, the U.S. Department of Agriculture neither guarantees nor warrants the standard of the product, and use of the name by the U.S. Department of Agriculture implies no approval of the product to the exclusion of others that may also be suitable. USDA is an equal opportunity provider and employer.
Author information
Contributions
Y.J. conceived of and designed the project. A.K. and R.T. performed the experiments and analyzed the data. S.B. conducted protein analysis. A.K., Y.J., R.T. and M.K.Y. prepared the manuscript. All authors edited and approved the final version for publication.
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Communications Biology thanks Long Mao and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Primary Handling Editor: David Favero. A peer review file is available.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Khan, A., Tian, R., Bean, S.R. et al. Transcriptome and metabolome analyses reveal regulatory networks associated with nutrition synthesis in sorghum seeds. Commun Biol 7, 841 (2024). https://doi.org/10.1038/s42003-024-06525-7
DOI: https://doi.org/10.1038/s42003-024-06525-7