Transcriptome and metabolome analyses reveal regulatory networks associated with nutrition synthesis in sorghum seeds

Khan, Adil; Tian, Ran; Bean, Scott R.; Yerka, Melinda; Jiao, Yinping

doi:10.1038/s42003-024-06525-7

Article
Open access
Published: 10 July 2024

Communications Biology volume 7, Article number: 841 (2024)

Abstract

Cereal seeds are vital for food, feed, and agricultural sustainability because they store and provide essential nutrients to human and animal food and feed systems. Unraveling the molecular processes of seed development is crucial for enhancing cereal grain yield and quality. We analyze spatiotemporal transcriptome and metabolome profiles during seed development in the sorghum inbred line ‘BTx623’. Morphological and molecular analyses identify the key stages of seed maturation, placing the onset of starch biosynthesis at 5 days post-anthesis (dpa) and that of protein biosynthesis at 10 dpa. Transcriptome profiling from 1 to 25 dpa reveals dynamic gene expression pathways, shifting from cellular growth and embryo development (1–5 dpa) to cell division and fatty acid biosynthesis (5–25 dpa) and to the synthesis of seed storage compounds in the endosperm (5–25 dpa). Network analysis identifies 361 and 207 hub genes linked to starch and protein synthesis in the endosperm, respectively, which could help breeders enhance sorghum grain quality. Because these data were generated in the sorghum reference genome line, they establish a baseline for future studies that, as new pangenomes emerge, will consider copy number and presence-absence variation in functional food traits.

Introduction

Sorghum [Sorghum bicolor (L.) Moench] stands out as a versatile and climate-smart crop, ranking among the world’s top five cereals in terms of production. It plays a crucial role in providing dietary calories and essential nutrients for a substantial proportion of the global population^1,2,3,4. The challenges posed by population growth, climate change, and the increasing demand for nutritious cereal crops underscore the need to enhance both the quantity and quality of sorghum grain production^5,6,7,8,9,10. To meet these challenges, plant breeders need a comprehensive understanding of the molecular, biochemical, and physiological mechanisms governing sorghum seed development. Such insights will help ensure an ample and nutritious food supply in the face of climate change.

The sorghum seed is a complex system composed of genetically distinct tissues: a diploid embryo, a triploid endosperm, and diploid maternal tissues^11,12,13. Following double fertilization, an evolutionarily conserved process in all flowering plants, the zygote develops into the embryo while the central cell develops into the endosperm. The endosperm serves as a nutrient-rich storage tissue, supplying the energy required for the initial growth of the embryo and subsequent germination in monocots^11,14,15. The developmental timeline from fertilization of the ovule to seed maturity in sorghum is typically 40–45 days¹⁶. Initially, from 3 to 5 days post-anthesis (dpa), there is limited growth and no apparent development of the embryo or endosperm^11,17. Endoreduplication then occurs, followed by starch accumulation. While starch accumulation in maize begins at 10 dpa^18,19, in sorghum it begins at 5 dpa¹¹. From 6 to 24 dpa, the caryopsis, embryo, and endosperm undergo rapid growth, accompanied by significant changes in seed size. From 24 to 35 dpa, however, the growth rate diminishes, and the sizes of the caryopsis, embryo, and endosperm change only slightly¹⁶. These observations indicate three primary developmental stages in the sorghum caryopsis: an early stage before 6 dpa, a middle stage spanning 6–24 dpa, and a late stage extending from 25 to 35 dpa²⁰.

Starch metabolism is a dynamic physiological process required for energy storage and utilization²¹. It involves a sophisticated interplay between sucrose metabolism and various tightly regulated pathways governed by many enzymes²², including ADP-glucose pyrophosphorylase (AGPase) and various starch synthases (SSs), starch branching enzymes (SBEs), and starch debranching enzymes (DBEs). In sorghum, kafirins are the predominant seed storage proteins, constituting 77 to 82% of endosperm protein and 68 to 73% of total protein in whole sorghum grain^23,24. Non-prolamin proteins, namely albumins, globulins, and glutelins, make up the remainder (roughly 20%). Based on molecular weight, kafirins are classified as α-kafirins (23–25 kDa), β-kafirins (16–20 kDa), γ-kafirins (28–50 kDa), and δ-kafirins (13 kDa)^25,26,27,28. The 27 kafirin genes previously reported in the sorghum genome comprise 23 α-kafirins, 1 β-kafirin, 2 γ-kafirins, and 1 δ-kafirin²⁴. Kafirins exist as monomeric proteins, small oligomeric protein complexes, and large polymeric protein complexes held together by inter-protein disulfide bonds. Sorghum kafirins can be further classified as kafirin 1 and kafirin 2 based on solubility during protein extraction. Kafirin 1 comprises proteins not heavily cross-linked into large polymeric structures, while kafirin 2 is solubilized from the remaining large polymeric complexes²⁹. The ratio of kafirin 1 to kafirin 2 is therefore a crude measure of protein cross-linking in the sorghum seed³⁰.

Seed development involves dynamic physiological and biochemical changes^31,32, and the chemical composition of mature seeds is shaped by complex gene expression networks. Recent years have seen a surge in transcriptomic analyses of seed development in diverse plant species, including Arabidopsis thaliana, Oryza sativa, Triticum, Zea mays, Paeonia, Medicago truncatula, Brassica napus, Hordeum vulgare, and Glycine max^33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59. These studies have advanced our understanding of spatiotemporal gene expression patterns in seeds and their regulation, offering genetic insights that are applicable to breeding for quality traits. For instance, high-throughput RNA sequencing (RNA-Seq) in maize identified genes and transcription factors (TFs) strongly associated with amylose and amylopectin biosynthesis⁶⁰. Similarly, transcriptome analyses of developing soybean seeds revealed hub genes implicated in oil and protein accumulation^55,61. Integrated metabolomic and transcriptomic analyses of rice seeds identified candidate genes involved in the structural modification of anthocyanins⁶². These examples underscore the potential of integrating metabolomic and transcriptomic information during seed development to clarify the molecular mechanisms driving the accumulation of desirable chemical profiles. Integrating the transcriptomic and metabolomic profiles of developing sorghum seeds therefore holds promise for generating new molecular breeding resources, thereby enhancing sorghum seed quality and yield for a hungry planet.

Despite the importance of sorghum in global food and feed systems, a significant gap remains in our understanding of gene expression dynamics during sorghum endosperm and embryo development. A recent study reported the transcriptome of developing sorghum seeds at various timepoints from 5 to 25 dpa, providing insights into overall seed development, but the transcriptomes of the embryo and endosperm were not differentiated⁶³. Additionally, there are no published studies that comprehensively explore the transcriptomic and metabolomic networks governing carbon allocation tradeoffs driving the accumulation of starch and protein throughout seed development, which is a major target for plant breeding. Research is particularly needed to clarify stage- and tissue-specific gene expression and crosstalk within and among the embryo, endosperm, and whole seed^64,65,66.

To address this knowledge gap, we conducted an in-depth transcriptomic analysis of developing sorghum seeds, dissecting the early whole seed, embryo, and endosperm tissues from fertilization through maturity. Complementing these efforts, metabolomic analyses were performed at five key seed developmental stages to gain insights into the accumulation of specific metabolites that drive nutritional quality and ultimately shape the mature grain quality profile. Our findings reveal hub genes and metabolites likely to be crucial for regulating the chemistry of mature sorghum seeds, categorized by their specificity to the embryo, endosperm, and/or whole seed. This comprehensive analysis marks a significant step toward unraveling the intricate molecular mechanisms underlying sorghum seed development, with implications for enhancing nutritional quality and overall yield.

Results

Morphological analyses of sorghum seed development

To comprehensively characterize sorghum seed development, we collected developing seed samples from the reference genome line ‘BTx623’ spanning 1–25 dpa (Supplementary Data 1). Over this period, the seed coat changed from bright green (1–12 dpa) to light green (13–21 dpa) and ultimately to yellowish green (22–25 dpa) (Fig. 1a). Seed fresh weight (Supplementary Fig. 1a) increased gradually after pollination, peaking at 22 dpa (average 1.23 g per 50 grains). Notably, seed weight increased faster during early stages (1–15 dpa) than during later stages (16–25 dpa). By the final timepoint of the study at 25 dpa, sorghum seeds had entered the desiccation stage, making separation of the embryo and endosperm difficult and justifying the end of the experiment at this point.

**Fig. 1: Morphological changes, structural alterations, and kafirin accumulation during sorghum seed development.**

Scanning electron microscopy (SEM) imaging revealed starch granules at 5 dpa (Fig. 1b), and the number and size of these granules gradually increased in subsequent developmental phases. These observations align with previous research indicating that endoreduplication precedes starch accumulation in sorghum¹¹. Concurrently, kafirin 1 (monomeric proteins and small oligomeric complexes) and kafirin 2 (polymeric cross-linked complexes) were present at low levels between 5 and 10 dpa but increased between 15 and 25 dpa, suggesting a crucial developmental shift in kafirin accumulation and cross-linking between 10 and 15 dpa. Kafirin 1 was more abundant than kafirin 2 at all stages (Fig. 1c); at 25 dpa, kafirin 1 constituted approximately 84% of the combined kafirin 1 and kafirin 2 content, and kafirin 2 the remaining 16% (Fig. 1c). On the basis of these observations, the timepoints sampled in this study (5, 10, 15, 20, and 25 dpa) were considered representative stages of sorghum seed development.
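
The kafirin 1 to kafirin 2 ratio, described in the Introduction as a crude index of protein cross-linking, can be derived directly from the two fraction measurements. The minimal Python sketch below assumes both fractions are expressed on the same scale and plugs in the approximate 25 dpa proportions reported above; the helper name `kafirin_ratio` is hypothetical and not part of the study's workflow.

```python
def kafirin_ratio(kafirin1: float, kafirin2: float) -> tuple[float, float]:
    """Return (kafirin 1 : kafirin 2 ratio, kafirin 1 share of combined kafirin in %).

    Hypothetical helper: a higher ratio means a smaller fraction of kafirin is
    locked in large, disulfide-cross-linked polymers (kafirin 2).
    """
    if kafirin1 < 0 or kafirin2 <= 0:
        raise ValueError("kafirin 1 must be >= 0 and kafirin 2 must be > 0")
    ratio = kafirin1 / kafirin2
    share = 100 * kafirin1 / (kafirin1 + kafirin2)
    return ratio, share


# Approximate 25 dpa proportions from the text: ~84% kafirin 1, ~16% kafirin 2
ratio, share = kafirin_ratio(84.0, 16.0)
print(f"kafirin 1 : kafirin 2 = {ratio:.2f}; kafirin 1 share = {share:.0f}%")
```

With these inputs the ratio is 84/16 = 5.25, that is, roughly five parts of less cross-linked kafirin for every part of highly cross-linked kafirin.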

Dynamic metabolic changes during sorghum seed development

Differential metabolite accumulation was assessed across the five sampled timepoints (Supplementary Data 2; Fig. 2a). Among 7959 detected peaks, a total of 2073 metabolites were successfully identified, and 955 were assigned to functionally annotated pathways (Supplementary Fig. 1b). The functionally annotated metabolites were grouped into 13 functional categories (Supplementary Fig. 1c; Supplementary Data 3). The top enriched pathways within these categories included the biosynthesis of secondary metabolites (20.23%), amino acid metabolism (18.75%), lipid metabolism (13.98%), and carbohydrate metabolism (13.09%) (Supplementary Fig. 1d). Pathway analysis of six metabolite clusters reflecting different developmental stages (Supplementary Fig. 2a) revealed that starch biosynthesis initiates at 5 dpa, with a transition to protein biosynthesis and degradation after 15 dpa (Supplementary Fig. 2b–g). This observation agreed with the SEM imaging of starch granules and the kafirin quantification (Fig. 1b, c). Taken together, these results suggest that the starch and fatty acid contents of sorghum seeds are established before the final protein content, potentially contributing to the well-documented negative correlation between starch and protein content.

**Fig. 2: Overview of differentially expressed metabolites identified through pairwise comparisons at key timepoints during sorghum seed development.**

Principal component analysis (PCA) highlighted distinct variations in metabolite profiles across the five timepoints (Supplementary Fig. 3; Fig. 2a). A total of 1495 compounds exhibited differential accumulation during seed development (Fig. 2b; Supplementary Data 4), indicating a stage-specific accumulation pattern. Metabolites up-regulated at 10 dpa relative to 5 dpa were associated with the biosynthesis of fatty acids and linoleic acid, sugar metabolism, and lysine, while down-regulated metabolites were linked to flavanol biosynthesis and the pentose phosphate pathway (Fig. 2c). This pattern suggests a reallocation of resources toward essential processes during early seed development, with up-regulation of high-energy molecules and down-regulation of metabolites related to secondary metabolism and nucleotide synthesis. Similar trends were observed in the comparison between 15 and 5 dpa (Fig. 2d). In contrast, comparisons between 20 and 5 dpa and between 25 and 5 dpa revealed a shift toward alanine, aspartate, glutamate, flavonoid, and linoleic acid biosynthesis, indicating an emphasis on protein synthesis during later stages of seed development (Fig. 2e, f). The 189 metabolites that were consistently up-regulated and the 234 that were consistently down-regulated across the five timepoints (Supplementary Fig. 4a–c) likely play roles in the accumulation of sorghum protein, starch, and oil. The consistently up-regulated metabolites were associated with the synthesis of lipids, phenolic acids, and flavonoids (Supplementary Fig. 4d), while the down-regulated metabolites were enriched in linoleic acid-related pathways, monoterpenoid biosynthesis, and the biosynthesis of unsaturated fatty acids (Supplementary Fig. 4e). Collectively, these trends suggest that lipid metabolism, secondary metabolite production, and nucleotide biosynthesis are dynamically modulated throughout seed development.
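
Differential accumulation between a later stage and the 5 dpa baseline is commonly summarized as a log2 fold change. The sketch below is a simplified illustration only: the metabolite names and intensities are hypothetical, and the pseudocount and the two-fold (|log2FC| ≥ 1) cutoff are assumptions, since the thresholds used in this study are not stated here. Published metabolomics workflows usually pair a fold-change filter with a statistical test or a multivariate importance score, which this sketch omits.

```python
import math
from statistics import mean


def log2_fold_change(later: list[float], baseline: list[float], pseudocount: float = 1.0) -> float:
    """log2 ratio of mean intensities (later stage / 5 dpa baseline).

    The pseudocount (an assumption) avoids division by zero for absent peaks.
    """
    return math.log2((mean(later) + pseudocount) / (mean(baseline) + pseudocount))


def classify(lfc: float, threshold: float = 1.0) -> str:
    """'up' or 'down' if |log2FC| >= threshold (1.0 = two-fold, assumed), else 'unchanged'."""
    if lfc >= threshold:
        return "up"
    if lfc <= -threshold:
        return "down"
    return "unchanged"


# Hypothetical peak intensities, five replicates per timepoint
baseline_5dpa = {"linoleic_acid": [100, 110, 95, 105, 90], "flavanol_x": [400, 380, 420, 410, 390]}
stage_10dpa = {"linoleic_acid": [420, 400, 380, 410, 390], "flavanol_x": [90, 100, 95, 105, 110]}

for name in baseline_5dpa:
    lfc = log2_fold_change(stage_10dpa[name], baseline_5dpa[name])
    print(f"{name}: log2FC = {lfc:.2f} ({classify(lfc)})")
```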

The transcriptome landscape of sorghum seed development

Transcriptome profiling of sorghum seed development encompassed 45 samples (1–9 dpa for the early whole seed, 6–25 dpa for the endosperm, and 10–25 dpa for the embryo), each with two replicates (Supplementary Data 5 & 6). We obtained over 218.8 million high-quality reads, averaging 23.78 million reads per replicate (Supplementary Data 5). The high correlation between the two replicates of each sample (average R² = 0.976) underscored the quality of the data (Supplementary Data 5). Additionally, qPCR results from three replicates of four randomly selected genes at the five major timepoints (5, 10, 15, 20, and 25 dpa) closely matched the RNA-Seq data (Supplementary Fig. 5), further validating the reliability of our findings.
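
Replicate agreement of the kind summarized by the average R² above can be computed per sample as the squared Pearson correlation between the two replicates' expression vectors and then averaged over samples. In this minimal sketch, the log2(FPKM + 1) transform is an assumption (the scale used for the reported R² is not stated), and the FPKM values are hypothetical.

```python
import math


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two vectors of equal length >= 2")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a replicate has zero variance")
    return sxy / math.sqrt(sxx * syy)


def replicate_r2(rep1_fpkm: list[float], rep2_fpkm: list[float]) -> float:
    """R^2 between two replicates of one sample on an (assumed) log2(FPKM + 1) scale."""
    x = [math.log2(v + 1) for v in rep1_fpkm]
    y = [math.log2(v + 1) for v in rep2_fpkm]
    return pearson_r(x, y) ** 2


# Hypothetical per-gene FPKM values for two replicates of one sample
rep1 = [0.0, 2.5, 10.0, 55.0, 310.0, 1200.0]
rep2 = [0.2, 2.0, 12.0, 50.0, 290.0, 1350.0]
print(round(replicate_r2(rep1, rep2), 3))
```

The log transform keeps a handful of very highly expressed genes (such as kafirins in the endosperm) from dominating the correlation.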

A total of 21,971 genes were expressed (FPKM ≥ 1) during sorghum seed development (Supplementary Data 7). More genes were expressed at early stages of whole seed and endosperm development than at later stages (Supplementary Fig. 6a), indicating heightened gene activity during early seed development as tissue types and specialized cell layers begin to diversify. The higher average expression of all genes in the embryo than in the endosperm (mean FPKM, 10–25 dpa) suggests greater metabolic activity and more complex developmental processes in the embryo. The distinct transcriptome landscapes of these tissues were further supported by a higher median gene expression level in the embryo than in the endosperm (Supplementary Fig. 6b).
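
FPKM, the unit behind the expression threshold above, normalizes a gene's read (or, for paired-end data, fragment) count by gene length and by sequencing depth. A minimal sketch follows; treating a gene as expressed when it reaches FPKM ≥ 1 in at least one sample is our assumption, because the exact aggregation rule (per replicate or per sample mean) is not given here, and the gene IDs are hypothetical.

```python
def fpkm(read_count: int, gene_length_bp: int, total_mapped_reads: int) -> float:
    """Fragments (reads) per kilobase of transcript per million mapped reads.

    FPKM = count * 1e9 / (gene length in bp * total mapped reads)
    """
    if gene_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("gene length and library size must be positive")
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


def expressed_genes(fpkm_table: dict[str, list[float]], threshold: float = 1.0) -> set[str]:
    """Genes whose FPKM reaches the threshold in at least one sample (assumed rule)."""
    return {gene for gene, values in fpkm_table.items() if max(values) >= threshold}


# 500 reads on a 2,000 bp gene in a library of 20 million mapped reads
print(fpkm(500, 2_000, 20_000_000))  # 12.5

# Hypothetical gene IDs and FPKM values across three samples
table = {"geneA": [0.2, 0.5, 1.4], "geneB": [0.0, 0.3, 0.9], "geneC": [25.0, 40.0, 12.0]}
print(sorted(expressed_genes(table)))  # ['geneA', 'geneC']
```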

Among the 2049 genes specifically expressed in the 1–9 dpa whole seed (Supplementary Fig. 6c), 1558 were previously identified in the sorghum ovary cell wall using RNA-Seq data⁶⁷. This was expected, as whole sorghum seeds (caryopses) are composed of distinct maternal (pericarp, derived from the ovary cell wall) and daughter (embryo, endosperm) tissues (Supplementary Fig. 6d). In addition, 795 embryo-specific genes were enriched for pathways related to embryogenesis (Supplementary Fig. 6e), while 397 endosperm-specific genes were enriched in metabolic and mitogen-activated protein kinase (MAPK) signaling pathways (Supplementary Fig. 6f).
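
The rule used to call tissue-specific genes is not spelled out in this section, so the sketch below adopts a simple presence/absence definition as an assumption: a gene is specific to a tissue if it reaches FPKM ≥ 1 in at least one of that tissue's samples and in no sample of any other tissue. Tissue labels, gene IDs, and values are hypothetical; graded specificity indices such as tau are a common alternative.

```python
def tissue_specific(expr: dict[str, dict[str, list[float]]],
                    threshold: float = 1.0) -> dict[str, set[str]]:
    """Return, for each tissue, the genes expressed only in that tissue.

    expr[tissue][gene] holds FPKM values for that tissue's samples. A gene is
    'expressed' in a tissue if any sample reaches the threshold (assumed rule)
    and 'specific' if it is expressed in no other tissue.
    """
    expressed = {
        tissue: {g for g, vals in genes.items() if max(vals) >= threshold}
        for tissue, genes in expr.items()
    }
    specific = {}
    for tissue, genes in expressed.items():
        # Union of genes expressed in every other tissue
        elsewhere = set().union(*(s for t, s in expressed.items() if t != tissue))
        specific[tissue] = genes - elsewhere
    return specific


# Hypothetical FPKM values (two samples per tissue)
expr = {
    "early_whole_seed": {"g1": [5.0, 8.0], "g2": [0.1, 0.0], "g3": [3.0, 4.0], "g4": [0.0, 0.2]},
    "embryo": {"g1": [0.3, 0.2], "g2": [6.0, 2.0], "g3": [2.0, 2.5], "g4": [0.1, 0.4]},
    "endosperm": {"g1": [0.0, 0.1], "g2": [0.5, 0.2], "g3": [9.0, 7.0], "g4": [1.5, 3.0]},
}
print(tissue_specific(expr))
```

Here g3 is expressed in all three tissues and is therefore specific to none of them.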

A PCA of the transcriptome data separated developing seeds into three groups based on tissue identity, consistent with their distinct developmental activities (Fig. 3a). Early whole seed samples collected at 1–5 dpa formed a cluster separate from samples collected at 6–9 dpa. The latter cluster lay close to the endosperm samples, indicating shared gene activity within the whole seed and young endosperm. In the hierarchical clustering of gene expression within the embryo, the first and second clusters were associated with morphogenesis and maturation processes, respectively (Fig. 3b). This aligns with the embryo’s sequence of active DNA synthesis, cell division, and differentiation in early and middle phases, followed by the synthesis of storage reserves and desiccation in later phases^17,20,46,68. The three endosperm clusters aligned with the canonical stages that guide harvest timing: the milk, soft dough, and hard dough phases (Fig. 3c). The milk phase coincided with cellularization, which follows the syncytial phase and involves forming cell walls that partition the endosperm into discrete cells. The soft dough and hard dough phases corresponded to the grain-filling phase, characterized by the development of distinct cell types and the accumulation of storage reserves^17,20.

**Fig. 3: Global transcriptome relationships among different developmental stages and tissues.**

Main pathways involved in sorghum seed development

To identify the active cellular processes in developing sorghum seeds, we used k-means clustering, which revealed 12, 16, and 15 co-expression clusters in the early whole seed, embryo, and endosperm, respectively (Fig. 4a, b; Supplementary Fig. 7; Supplementary Data 6). In early whole seeds (1–9 dpa), clusters c1–c6 (1–5 dpa; early stage) were enriched in genes controlling cellular growth, proliferation, and fundamental structures essential for seed development (Supplementary Fig. 7). In contrast, clusters c7–c11 (5–9 dpa; middle stage) were enriched in processes related to embryo development and the accumulation of storage compounds such as starch, consistent with SEM images showing starch granules emerging at 5 dpa. Genes constitutively expressed from 1 to 9 dpa (c12) were associated with endosomal vesicle fusion, vacuolar acidification, nicotinamide adenine dinucleotide (NAD) biosynthesis, organelle fusion, vacuole organization, and embryo development.
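
The clustering step can be illustrated with a small, dependency-free k-means implementation. This is a sketch under stated assumptions, not the study's pipeline: each gene's profile is z-scored across timepoints (so clusters reflect the shape of the expression trajectory rather than its magnitude), centroids are seeded with k-means++, and the random seed is fixed for reproducibility. The profiles are hypothetical; applied to the real data, k would be 12, 16, or 15 for the early whole seed, embryo, and endosperm, respectively.

```python
import math
import random


def zscore(profile: list[float]) -> list[float]:
    """Standardize one gene's expression profile across timepoints."""
    m = sum(profile) / len(profile)
    sd = math.sqrt(sum((v - m) ** 2 for v in profile) / len(profile))
    return [0.0] * len(profile) if sd == 0 else [(v - m) / sd for v in profile]


def sq_dist(a: list[float], b: list[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(profiles: list[list[float]], k: int, seed: int = 0, max_iter: int = 100) -> list[int]:
    """Lloyd's k-means on z-scored profiles with k-means++ seeding; one label per gene."""
    rng = random.Random(seed)
    data = [zscore(p) for p in profiles]
    centroids = [list(rng.choice(data))]
    while len(centroids) < k:
        # k-means++: next centroid drawn with probability proportional to squared distance
        d2 = [min(sq_dist(x, c) for c in centroids) for x in data]
        if sum(d2) == 0:
            break  # fewer distinct profiles than k
        centroids.append(list(rng.choices(data, weights=d2, k=1)[0]))
    labels: list[int] = []
    for _ in range(max_iter):
        new_labels = [min(range(len(centroids)), key=lambda j: sq_dist(x, centroids[j]))
                      for x in data]
        if new_labels == labels:
            break  # assignments are stable: converged
        labels = new_labels
        for j in range(len(centroids)):
            members = [x for x, lab in zip(data, labels) if lab == j]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical expression profiles over four timepoints
profiles = [
    [1, 2, 3, 4], [2, 4, 6, 8], [4, 8, 12, 16],   # rising genes
    [4, 3, 2, 1], [8, 6, 4, 2], [16, 12, 8, 4],   # falling genes
]
print(kmeans(profiles, k=2))
```

Because of z-scoring, genes that rise at different absolute levels land in the same cluster, which matches the goal of grouping genes by temporal pattern.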

**Fig. 4: Gene expression patterns and functional transitions over the time course for the BTx623 sorghum embryo and endosperm.**

In the embryo, the primary active pathways were related to cell division, fatty acid biosynthesis, and embryo development. The middle stage of embryo development (c1–c7; 10–18 dpa) was enriched in pathways regulating cellular processes, including the cell cycle and starch and amino acid biosynthesis. The later stage (c8–c15; 19–25 dpa) indicated active embryo growth, with enriched fatty acid and lipid biosynthesis pathways suggesting a crucial role in providing energy and nutrition. Genes in c16, expressed across all stages, were enriched for embryo development, protein folding, membrane organization, and transport, indicating fundamental roles throughout embryo development (Fig. 4a).

In the endosperm, a distinct shift toward the activation of storage pathways (starch and storage proteins) occurred after initial cell division in early stages of development (Fig. 4b). Clusters c1–c7 (6–14 dpa) included genes related to cell cycle processes, metabolic processes, and seed storage compound biosynthesis, indicating their roles in regulating nutrient storage and energy metabolism. During middle and late stages, extensive growth and differentiation in the endosperm coincided with the accumulation of storage reserves for starch and protein. Genes in c15, expressed throughout endosperm development, highlighted an emphasis on cellular metabolism and function. Programmed cell death (PCD) is a crucial process in the cereal endosperm as it transitions from cell division to nutrient accumulation^69,70, and ethylene is one of the major hormones involved in PCD⁷¹. Examining the sorghum orthologs of ethylene biosynthesis enzymes characterized in rice⁷², we found that most of these genes peaked in expression during early stages of sorghum endosperm development (6–10 dpa; Supplementary Fig. 8a). Notably, the genes encoding methionine adenosyltransferase (S-adenosylmethionine [SAM] synthetase), which catalyzes the first step of ethylene biosynthesis, Sobic.003G151600 and Sobic.009G033600, showed a pronounced downregulation trend during sorghum endosperm development. This observation suggests that ethylene may act as a negative regulator of grain filling and PCD. Consistently, sorghum orthologs of other PCD regulator genes documented in maize, such as ZmDEK40⁷³, ZmDEK66⁷⁴, ZmATR, and ZmATM⁷⁵, showed a similar downregulation trend during sorghum endosperm development (Supplementary Fig. 8b).
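
A "downregulation trend" like the one described above can be quantified in several ways; one simple option, used here purely for illustration, is Spearman's rank correlation between a gene's expression and dpa, where a strongly negative value indicates a consistent decline. The study does not state which test, if any, was applied, and the expression values and the −0.5 cutoff below are hypothetical.

```python
import math


def average_ranks(values: list[float]) -> list[float]:
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman_rho(x: list[float], y: list[float]) -> float:
    """Spearman rank correlation = Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    if sxx == 0 or syy == 0:
        raise ValueError("constant input has no defined rank correlation")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical mean FPKM of one ethylene-biosynthesis gene across endosperm stages
dpa = [6, 8, 10, 12, 15, 18, 21, 25]
expression = [140.0, 150.0, 120.0, 80.0, 45.0, 30.0, 12.0, 5.0]
rho = spearman_rho(dpa, expression)
print(f"rho = {rho:.2f}", "downregulated" if rho < -0.5 else "no clear decline")
```

Rank correlation tolerates an early peak (here at 8 dpa) followed by a steady decline, which fits the pattern of expression peaking early and falling thereafter.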

Hub genes and key networks associated with starch and protein synthesis

Understanding the regulation of the biosynthetic networks that govern seed nutrition is crucial for enhancing sorghum grain quality. Our data show that starch and protein biosynthesis in BTx623 sorghum seeds took place predominantly in the endosperm (Supplementary Fig. 9a, b; Supplementary Data 8), consistent with its role as a storage tissue supplying energy during germination and early seedling growth⁷⁶.

In the endosperm, the expression patterns of sorghum orthologs of genes associated with starch and kafirin biosynthesis revealed distinct trends (Supplementary Fig. 9c, d). Starch biosynthesis was most active between 5 and 15 dpa, consistent with the first appearance of starch granules at 5 dpa in SEM images and with the distinct metabolite enrichments at the five major timepoints, whereas protein biosynthesis occurred primarily between 15 and 25 dpa. Kafirin genes accounted for 44.77% of total endosperm transcripts from 6 to 25 dpa, increasing from 24.67% (6–15 dpa) to 62.16% (16–25 dpa), which indicates that kafirin synthesis predominates after 15 dpa (Supplementary Fig. 10a). α-Kafirin transcripts were the most abundant during endosperm development (34.20%), followed by γ-kafirins (6.99%), β-kafirins (3.29%), and δ-kafirins (0.277%) (Supplementary Fig. 10a). Throughout all stages, the average expression level of kafirin and starch synthesis genes was higher in the endosperm than in the embryo (Supplementary Fig. 10b). For example, most kafirin genes (23 of 27) ranked among the 100 most highly expressed genes in the endosperm, compared with only 9 of 27 in the embryo (Supplementary Fig. 10c; Supplementary Data 9 & 10).
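
Transcript-share figures such as the 44.77% attributed to kafirin genes are proportions of summed expression. The sketch below shows the arithmetic under assumed inputs: per-gene expression already summed over the stages of interest and a mapping from gene IDs to kafirin classes, both hypothetical here. How the study aggregated replicates and stages is not stated; TPM values, which sum to the same total in every sample, make such shares easier to compare across samples than FPKM.

```python
def class_shares(expr: dict[str, float], gene_class: dict[str, str]) -> dict[str, float]:
    """Percentage of total expression contributed by each annotated gene class.

    expr maps gene ID -> expression (e.g. FPKM summed over the stages of interest);
    genes absent from gene_class count toward the total but toward no class.
    """
    total = sum(expr.values())
    if total <= 0:
        raise ValueError("total expression must be positive")
    shares: dict[str, float] = {}
    for gene, value in expr.items():
        cls = gene_class.get(gene)
        if cls is not None:
            shares[cls] = shares.get(cls, 0.0) + 100 * value / total
    return shares


# Hypothetical summed expression values and kafirin class labels
expr = {"kafA1": 300.0, "kafA2": 100.0, "kafG1": 50.0, "other1": 400.0, "other2": 150.0}
classes = {"kafA1": "alpha", "kafA2": "alpha", "kafG1": "gamma"}
shares = class_shares(expr, classes)
print(shares, "total kafirin share:", sum(shares.values()))
```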

A co-expression network analysis using the 20,491 genes expressed in the endosperm (FPKM ≥ 1) was conducted to examine the regulation of starch and protein biosynthesis. Soft clustering was used so that genes could belong to multiple clusters. Among the 12 co-expression modules (Supplementary Fig. 11; Supplementary Data 11), modules 8 and 12 exhibited significant enrichment (FDR < 0.05) in starch biosynthesis-related genes, as determined by Fisher’s exact test (Fig. 5a; Supplementary Fig. 12a, b). Genes in these modules were also associated with diverse functional categories such as proteasome activity, N-glycan biosynthesis, the tricarboxylic acid (TCA) cycle, spliceosome activity, amino acid synthesis, oxidative phosphorylation, and DNA replication (Fig. 5b). Gene Network Analyzer analysis of these modules identified 361 hub genes based on two criteria: a degree of connectivity ≥ 5 within the hub module and a module membership > 0.8. Many hub genes encode proteins participating in the TCA cycle, ribosome biogenesis, oxidative phosphorylation, DNA replication, and starch and sucrose metabolism. For example, the 10 most highly connected genes (Supplementary Data 12) included genes encoding succinate dehydrogenase, metallopeptidase M24 family proteins, the translation initiation factor 3B family, and elongation factor 1-gamma 3. These results suggest that the core enzymes of starch synthesis may be regulated by the identified hub genes. For instance, the hub gene Sobic.007G023400 encodes the iron-sulfur subunit of succinate dehydrogenase (SDHB), a crucial component of the succinate dehydrogenase complex, which is essential for both the TCA (Krebs) cycle and the electron transport chain during cellular respiration⁷⁷. Notably, two starch phosphorylase genes, Sobic.001G083900 (SbPHOL) and Sobic.003G358600 (SbPHOH), were also identified as hub genes, suggesting a regulatory role in starch biosynthesis.
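
The two hub-gene criteria stated above can be applied to a precomputed network as in the following sketch. It assumes an undirected co-expression edge list with each pair listed once and a precomputed module-membership score per gene (for example, a soft-clustering membership value or a correlation with the module's summary profile); how Gene Network Analyzer derives these quantities is not described here, and all gene IDs are hypothetical.

```python
from collections import Counter


def hub_genes(edges: list[tuple[str, str]], module_genes: set[str], membership: dict[str, float],
              min_degree: int = 5, min_membership: float = 0.8) -> set[str]:
    """Genes with within-module degree >= min_degree and module membership > min_membership.

    edges: undirected co-expression links, each pair listed once (assumption).
    membership: precomputed module-membership score per gene (assumption).
    """
    degree = Counter()
    for a, b in edges:
        # Count only links whose two endpoints both belong to the module
        if a != b and a in module_genes and b in module_genes:
            degree[a] += 1
            degree[b] += 1
    return {g for g in module_genes
            if degree[g] >= min_degree and membership.get(g, 0.0) > min_membership}


# Hypothetical module: g0 links to six genes; h links to five but has low membership
module = {"g0", "g1", "g2", "g3", "g4", "g5", "g6", "h"}
edges = [("g0", f"g{i}") for i in range(1, 7)] + [("h", f"g{i}") for i in range(1, 6)]
membership = {**{g: 0.85 for g in module}, "g0": 0.93, "h": 0.70}
print(hub_genes(edges, module, membership))
```

Requiring both criteria filters out genes that are well connected but only weakly tied to the module's overall expression pattern, as with `h` here.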

**Fig. 5: Identification of hub genes associated with starch and kafirin biosynthesis.**

Among the 12 co-expression modules in the endosperm (Supplementary Fig. 11), modules 4 and 10 exhibited significant enrichment (FDR < 0.05) for kafirin genes (Fig. 5c; Supplementary Fig. 12c). Module 4 contains 15 α-kafirin genes, while module 10 contains six α-kafirin genes and two γ-kafirin genes. The β-kafirin and δ-kafirin genes were assigned to modules 2 and 5, respectively. Notably, β- and δ-kafirins were expressed exclusively in the later stages of endosperm development (20–25 dpa), whereas α-kafirin expression spanned 15 to 25 dpa in wild-type sorghum. These findings suggest that different types of kafirins are synthesized at different stages of seed development.
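
Module enrichment for a gene set, tested here with Fisher's exact test, reduces to a one-sided hypergeometric tail probability when only over-representation is of interest. The sketch below computes that tail and applies the Benjamini–Hochberg correction to obtain FDR values; the one-sided test and the Benjamini–Hochberg procedure are assumptions, as are the module names and sizes. The 27 kafirin genes and 20,491 endosperm-expressed genes are taken from the text.

```python
from math import comb


def enrichment_p(overlap: int, module_size: int, set_size: int, universe: int) -> float:
    """One-sided Fisher's exact test (over-representation) as a hypergeometric upper tail.

    overlap: target-set genes (e.g. kafirins) found in the module
    module_size: genes in the module; set_size: target-set genes in the universe
    universe: all genes considered (e.g. genes expressed in the endosperm)
    """
    top = min(set_size, module_size)
    hits = sum(comb(set_size, x) * comb(universe - set_size, module_size - x)
               for x in range(overlap, top + 1))
    return hits / comb(universe, module_size)


def benjamini_hochberg(pvals: list[float]) -> list[float]:
    """Benjamini-Hochberg adjusted p-values (FDR), returned in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running = min(running, pvals[i] * m / rank)
        adjusted[i] = running
    return adjusted


# Hypothetical (kafirin genes in module, module size) for three modules
modules = {"M_a": (15, 900), "M_b": (8, 800), "M_c": (1, 1500)}
pvals = [enrichment_p(k, n, 27, 20_491) for k, n in modules.values()]
fdr = benjamini_hochberg(pvals)
for name, q in zip(modules, fdr):
    print(name, f"FDR = {q:.3g}", "enriched" if q < 0.05 else "not enriched")
```

Python's arbitrary-precision integers keep the binomial coefficients exact, so the only rounding occurs in the final division.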

The 1719 genes within modules 4 and 10, including 23 kafirin genes, were functionally implicated in crucial biological processes such as carbon metabolism, lipid metabolism, the MAPK signaling pathway, and seed storage protein processes, based on GO term analysis (Fig. 5d). Co-expression with seed storage protein genes suggests that these genes may be involved in the biosynthesis, accumulation, and/or mobilization of storage proteins during seed development. Subsequently, we identified 207 hub genes related to kafirin biosynthesis from modules 4 and 10 (Supplementary Fig. 12d; Supplementary Data 13). These genes were significantly enriched in various biological processes such as lipid metabolism, fatty acid degradation, amino acid biosynthesis and degradation, MAPK signaling, carotenoid biosynthesis, and hormonal signaling. The 10 most highly connected genes are presented in Supplementary Data 13. Homologs of some of these top hub genes have been characterized in other crops. For instance, extra-large GTP-binding proteins have been reported to play key roles in regulating panicle architecture, plant growth, development, grain weight, and disease resistance⁷⁸. Similarly, bZIP TFs are known to regulate various biological pathways, including seed storage protein biosynthesis^79,80,81,82. Further characterization of alleles of these genes could provide valuable insights into the molecular mechanisms and regulatory networks underlying these processes and identify gene targets for modifying protein content through molecular breeding.

Discussion

Sorghum is a major climate-smart cereal crop that will continue to play a significant role in adapting global food and feed systems to climate change. However, the molecular mechanisms governing sorghum seed development have not been explored as extensively as those in other cereals^44,45,83,84,85,86,87,88. Previous characterization of the transcriptomic landscapes of endosperm development in wheat, rice, oat, barley, and maize has provided a solid foundation for the present study. Here, we used RNA-Seq and untargeted metabolome profiling to capture dynamic transcriptome and metabolome profiles of sorghum seed development. Our findings provide significant insights into the interplay of gene expression and metabolite accumulation across five developmental timepoints, offering a comprehensive temporal perspective on functional and cellular specialization in the whole seed (maternal + daughter tissues), embryo, and endosperm (daughter tissues).

Our research has shed light on the dynamic landscape of transcriptomic and metabolomic activity during sorghum seed development, focusing on the reference genome line BTx623. This study lays the groundwork for clarifying the substantial diversity in seed traits observed within global sorghum mapping and breeding populations. For instance, a genome-wide association study of 837 varieties revealed 81 quantitative trait loci (QTLs) associated with grain size, which influences harvestable starch and protein yields per unit area⁸⁹. Other studies have documented extensive diversity in grain quality traits across sorghum germplasm, including variation in seed color and in starch, protein, and oil contents^90,91,92. This diversity underscores how many phenotypic variants are shaped by sorghum seed development.

Future investigations in diverse sorghum varieties are needed to elucidate how genetic variation in starch, protein, and oil biosynthetic pathways results in differential carbon partitioning among them, and how that partitioning is affected by whole-plant phenotypes and local adaptation. Comparative genomics studies should focus on differences in seed development among varieties carrying genetic variation in hub genes, to shed light on tissue-specific metabolite accumulation patterns and contribute to improved seed quality, stress responses, and adaptation to local production environments. Hence, the baseline information presented here on sorghum seed developmental programming holds promise for enhancing the utility of sorghum in breeding for climate-smart food and feed systems.

Tissue-specific (TS) genes are key to understanding the mechanisms governing tissue or organ identity and can be crucial in guiding progression through seed developmental programs^93,94. In our study, we identified 499 TS genes, including 41 TFs, expressed specifically in early whole seeds, embryos, and endosperms (Supplementary Data 6; Supplementary Fig. 13a). The number of TS genes varied among embryos (127 genes, including 14 TFs), endosperms (71 genes, including 6 TFs), and early whole seeds (79 genes, including 12 TFs), with the endosperm having the fewest TS genes (Supplementary Fig. 13a), consistent with previous findings in maize⁹⁵ and wheat⁸⁸. This may reflect the relatively simple structure of the endosperm as a storage tissue, compared with embryos or early seeds, which must differentiate into multiple organs.

The dynamic expression patterns and functional enrichment of TS genes indicated their involvement in specific tissues and stages of seed development (Supplementary Fig. 13b, c). For instance, genes specific to early whole seeds were primarily associated with cell wall biosynthesis and structural integrity, emphasizing the importance of creating space inside maternal tissues for the rapid expansion of the embryo and endosperm. Embryo-specific genes were predominantly expressed in later stages of embryo development, suggesting that early embryo development focuses on general growth and tissue differentiation, whereas endosperm-specific genes were expressed throughout endosperm development, emphasizing its long-term programming (mediated through RNA processing and regulation) toward starch and protein storage. Genes expressed only in the embryo and/or endosperm (and nowhere else in the plant) appeared to coordinate developmental processes and respond to environmental cues, particularly through sterol biosynthesis and abscisic acid (ABA) responses. ABA in particular is a well-known player in seed maturation, dry-down, dormancy, germination, and both seed and whole-plant environmental responses⁹⁶.

While the functions of most seed-specific TFs remain unknown, our analysis revealed enrichment of known regulator families of seed development (e.g., WOX, NF-YB, NAC, ERF, AP2, MYB, and MYB_related). These TF families are recognized for their key roles in seed-specific events, especially the formation and maturation of the endosperm and embryo^97,98,99. This suggests that the remaining TS genes and TFs we identified may also have regulatory roles in seed development. Future work should elucidate these roles by leveraging the genetic variation present in mutant and natural populations. The identified TS genes and TFs, together with the newly mapped gene networks governing starch and protein synthesis in BTx623, provide a starting point for understanding how gene activity and metabolite accumulation are coordinated during seed development and organogenesis (as characterized by SEM). Together, these processes influence the yield, quality, and nutrient profile of sorghum grain.

Materials and methods

Plant material and field experiments

The study used the Sorghum bicolor cultivar ‘BTx623’, grown under field conditions at the Quaker Research Farm in Lubbock, TX (33°35′52.9″N 101°54′21.4″W, elevation 992 m) during the summer of 2022. The farm has a semi-arid climate with an average annual precipitation of 469 mm, falling primarily from May to October, and Amarillo sandy clay loam soil¹⁰⁰. Irrigation was maintained at 1 inch (25.4 mm) of water per week.

Developing sorghum seeds were collected after successful pollination for detailed characterization (Supplementary Data 1). In brief, we sampled daily from pollination until 30 dpa. Before flowering, panicles were covered with pollination bags to prevent cross-pollination. After pollination, these bags were replaced with mesh bags to protect seeds from birds while allowing light exposure and improved air circulation. From pollination to 15 dpa, we harvested the middle portion of 10 panicles that flowered on the same day for each biological replicate; beyond 15 dpa, we collected the middle portion of 5 panicles per biological replicate. Three biological replicates were collected at every time point for transcriptome analysis, and five replicates at 5, 10, 20, 25, and 30 dpa for metabolome analysis. Sampling was consistently conducted in the morning (between 9:00 and 11:00 AM) to minimize potential circadian effects. Samples were then promptly transported to the laboratory and dissected on ice with scalpels and tweezers to isolate embryos and endosperms.

For metabolome samples, 100 uniform seeds were isolated from the harvested panicles, flash-frozen in liquid nitrogen in 15 mL Falcon tubes, and stored at −80 °C. For embryo and endosperm RNA extraction, 50 uniform seeds were isolated from the harvested panicles and dissected. Embryos from 40–50 seeds were pooled for early embryo samples (10–18 dpa), whereas 20 seeds were sufficient for later embryo samples. Similarly, endosperms from 40–50 seeds were used for early endosperm samples (6–10 dpa), and endosperms from 10 seeds for later time points. Embryo and endosperm samples were then flash-frozen in liquid nitrogen and stored at −80 °C until further analysis.

Kafirin analysis

Kafirin analysis used a stepwise procedure to assess total kafirin levels and the degree of cross-linking (polymerization). Kafirin fractions were selectively extracted under non-reducing conditions (kafirin 1) and reducing conditions (kafirin 2) following the method of Da Silva et al.³⁰. Seeds from 5, 10, 15, 20, and 25 dpa were retrieved from −80 °C storage, immediately crushed with a mortar and pestle, and returned to −80 °C. The coarsely crushed material was lyophilized and ground into a fine powder with a mortar and pestle. Kafirin 1 and kafirin 2 were then extracted as described by Da Silva et al.³⁰, except that 50 mg of sample and 0.5 mL of solvent were used. After extraction, β-mercaptoethanol (BME) was added to the kafirin 1 extracts to a final concentration of 2%. After incubation with BME, samples were alkylated with 4-vinylpyridine (4-VP) as described by Bean et al.¹⁰¹. After kafirin 2 extraction, additional extraction solvent was added to equalize the total volumes of kafirin 1 and kafirin 2, and kafirin 2 was then alkylated with 4-VP. Finally, kafirin 1 and kafirin 2 were analyzed by RP-HPLC on C3 columns following the procedure of Bean et al.¹⁰¹.
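
The two fractions can be reduced to summary values with simple arithmetic. The sketch below assumes that the RP-HPLC output is a total integrated peak area for each fraction. It uses the share of kafirin 2 in the total as the polymerization measure. Neither that metric nor the function name `kafirin_summary` comes from the source, so this illustrates the idea rather than reproducing the authors' calculation. Because the two fractions were brought to equal volumes, their peak areas can be added and compared directly.

```python
def kafirin_summary(k1_area: float, k2_area: float) -> dict:
    """Summarize RP-HPLC peak areas of the two kafirin fractions (hypothetical helper).

    k1_area: integrated peak area of kafirin 1 (extractable without reducing agent).
    k2_area: integrated peak area of kafirin 2 (extractable only after reduction).
    Both extracts are assumed to have been brought to the same volume.
    """
    if k1_area < 0 or k2_area < 0:
        raise ValueError("peak areas must be non-negative")
    total = k1_area + k2_area
    # Assumed polymerization proxy: percentage of kafirin that needed
    # reduction of disulfide bonds before it could be extracted.
    polymerized_pct = 100.0 * k2_area / total if total > 0 else 0.0
    return {"total_area": total, "polymerized_pct": polymerized_pct}


# Example with made-up peak areas
print(kafirin_summary(300.0, 100.0))
```

In this example, 25% of the total kafirin signal falls in the reduced fraction.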

Metabolomic analysis

Untargeted metabolomic profiling of sorghum seeds was performed on an LC-MS platform. Metabolite extraction and quantification were carried out by the service provider Innomics. For each replicate, whole seeds were shipped on dry ice. For metabolite extraction, 50 mg of each sample was weighed into a 1.5 mL Eppendorf tube and immersed in a pre-cooled extraction solution (methanol:H2O = 7:3, v/v) supplemented with 20 μL of Internal Standard 1. Samples were homogenized in a grinder at 50 Hz for 10 min, followed by ultrasonication in a water bath at 4 °C for 30 min. After being held at −20 °C for 1 h, the extracts were centrifuged at 14,000 rpm and 4 °C for 15 min. A 600 μL volume of supernatant was filtered through a 0.22 μm membrane, and 20 μL of the filtrate from each sample was pooled into a mixed quality control (QC) sample to assess the repeatability and stability of the LC-MS analysis.
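
A pooled QC sample is usually injected repeatedly during a run. One common way to quantify "repeatability and stability" is the relative standard deviation (RSD) of each feature across those QC injections. The source does not say which statistic or cutoff the service provider applied. The 30% threshold and the names `qc_rsd` and `stable_features` below are assumptions for illustration.

```python
import numpy as np


def qc_rsd(qc_areas) -> np.ndarray:
    """RSD (%) of each feature across repeated QC injections.

    qc_areas: array-like of shape (n_injections, n_features) holding peak areas.
    Features with a non-positive mean get NaN.
    """
    x = np.asarray(qc_areas, dtype=float)
    mean = x.mean(axis=0)
    sd = x.std(axis=0, ddof=1)  # sample standard deviation
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(mean > 0, 100.0 * sd / mean, np.nan)


def stable_features(qc_areas, max_rsd: float = 30.0) -> np.ndarray:
    """Indices of features whose QC RSD is at or below max_rsd (assumed cutoff)."""
    rsd = qc_rsd(qc_areas)
    # NaN compares as False, so features with an undefined RSD are dropped.
    return np.flatnonzero(rsd <= max_rsd)


qc = [[100.0, 10.0], [110.0, 30.0], [90.0, 20.0]]  # 3 injections x 2 features
print(qc_rsd(qc))           # feature 0 is about 10%, feature 1 about 50%
print(stable_features(qc))  # only feature 0 passes the assumed 30% cutoff
```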

A Waters 2777c UPLC system (Waters, USA) coupled in series with a Q Exactive HF high-resolution mass spectrometer (Thermo Fisher Scientific, USA) was used to separate and detect metabolites. After acquisition, the raw mass spectrometry data were imported into Compound Discoverer v3.3 (Thermo Fisher Scientific, USA). The data were searched against the BGI metabolome database (bmdb), the mzCloud database (https://www.mzcloud.org/), and the ChemSpider online database (https://www.chemspider.com/), yielding a data matrix of metabolite peak areas and identifications. The identified metabolites were annotated using the Kyoto Encyclopedia of Genes and Genomes (KEGG; https://www.genome.jp/kegg/) pathway database and the Human Metabolome Database (HMDB; https://hmdb.ca/)^102,103. Principal component analysis (PCA) and partial least squares discriminant analysis (PLS-DA) were conducted in MetaboAnalyst (https://www.metaboanalyst.ca/)¹⁰⁴. Univariate analyses (t-tests) were used to calculate statistical significance (P-values). Differentially accumulated metabolites were defined as those with a variable importance in projection (VIP) > 1, P < 0.05, and log₂(fold change) ≥ 1.5 or ≤ −1.5.
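
The three criteria are applied jointly, so a metabolite must pass all of them. The sketch below assumes that VIP, P-value, and fold change have already been exported for each metabolite, for example from MetaboAnalyst. The `MetaboliteStat` container and the `is_differential` function are hypothetical names, and the fold change is assumed to be a ratio of group means.

```python
import math
from dataclasses import dataclass


@dataclass
class MetaboliteStat:
    """Per-metabolite statistics for one pairwise comparison (hypothetical container)."""
    name: str
    vip: float          # variable importance in projection from the PLS-DA model
    p_value: float      # P-value from the univariate t-test
    fold_change: float  # ratio of group means, e.g. later stage / earlier stage


def is_differential(m: MetaboliteStat, vip_min: float = 1.0,
                    p_max: float = 0.05, log2fc_min: float = 1.5) -> bool:
    """Apply VIP > 1, P < 0.05 and |log2(fold change)| >= 1.5 together."""
    if m.fold_change <= 0:
        return False  # log2 is undefined for non-positive ratios
    log2fc = math.log2(m.fold_change)
    return m.vip > vip_min and m.p_value < p_max and abs(log2fc) >= log2fc_min


stats = [
    MetaboliteStat("up", vip=1.8, p_value=0.01, fold_change=4.0),      # log2FC = 2
    MetaboliteStat("down", vip=1.2, p_value=0.03, fold_change=0.25),   # log2FC = -2
    MetaboliteStat("small", vip=2.0, p_value=0.001, fold_change=2.0),  # log2FC = 1
    MetaboliteStat("noisy", vip=0.7, p_value=0.02, fold_change=8.0),   # VIP too low
]
print([m.name for m in stats if is_differential(m)])  # ['up', 'down']
```

Using the absolute value of log₂(fold change) expresses the "≥ 1.5 or ≤ −1.5" rule as a single comparison.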

RNAseq and data analysis

Total RNA was isolated with the RiboPure™ RNA Purification Kit (AM1924; Invitrogen) following the manufacturer’s instructions. We first assessed the quality of the extracted RNA, and checked for DNA and protein contamination, using agarose gel electrophoresis and a NanoDrop spectrophotometer. At least 2 μg of RNA per sample was shipped on dry ice to the sequencing service provider Innomics, which performed RNA quality control and sequencing. Briefly, the RNA integrity number (RIN) was calculated, and samples with RIN ≥ 7 were used for RNA sequencing (Supplementary Data 5). RNA-seq libraries were constructed following standard DNBSEQ Eukaryotic Transcriptome Resequencing protocols and sequenced by Innomics Inc. to generate PE150 reads. For library construction, fragmented mRNA was reverse-transcribed into first-strand cDNA using random primers, and second-strand cDNA was synthesized with dUTP instead of dTTP. The synthesized cDNA was end-repaired and 3′-adenylated, adaptors were ligated to the ends of the 3′-adenylated cDNA fragments, and the products were PCR-amplified. Raw data from the DNBSEQ platform were filtered with SOAPnuke¹⁰⁵ to remove adaptors, polyX sequences, and low-quality reads, using the parameters “-n 0.001 -l 20 -q 0.4 --adaMR 0.25 --ada_trim --polyX 50 --minReadLen 150”.
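
The sketch below makes the filtering parameters concrete. It re-implements the read-level checks that we understand `-n`, `-l`/`-q`, `--polyX`, and `--minReadLen` to perform. Under that reading, a read is discarded if any of the following holds:

- its N fraction exceeds 0.001;
- its fraction of bases below Phred quality 20 exceeds 0.4;
- it contains a single-base run of 50 or more;
- it is shorter than 150 bases.

This approximation is written from the parameter names, not from SOAPnuke's code. The strict-versus-inclusive comparisons and the Phred+33 encoding are assumptions, and adapter trimming (`--adaMR`, `--ada_trim`) is not modeled. With 150-base reads, the N threshold means that a single N is enough to reject a read (1/150 ≈ 0.0067 > 0.001).

```python
def phred_scores(qual: str, offset: int = 33) -> list:
    """Decode a FASTQ quality string, assuming Phred+33 encoding."""
    return [ord(c) - offset for c in qual]


def longest_homopolymer(seq: str) -> int:
    """Length of the longest run of one repeated base (polyX)."""
    best = run = 0
    prev = ""
    for base in seq:
        run = run + 1 if base == prev else 1
        prev = base
        best = max(best, run)
    return best


def keep_read(seq: str, qual: str, n_rate: float = 0.001, low_qual: int = 20,
              qual_rate: float = 0.4, poly_x: int = 50, min_len: int = 150) -> bool:
    """Approximate SOAPnuke-style read filter (hypothetical re-implementation)."""
    seq = seq.upper()
    if len(qual) != len(seq):
        raise ValueError("sequence and quality strings must have equal length")
    if not seq or len(seq) < min_len:
        return False
    if seq.count("N") / len(seq) > n_rate:
        return False
    scores = phred_scores(qual)
    low = sum(1 for q in scores if q < low_qual)
    if low / len(scores) > qual_rate:
        return False
    if longest_homopolymer(seq) >= poly_x:
        return False
    return True


good = "ACGT" * 37 + "AC"   # 150 bases, no long runs
qual = "I" * 150             # 'I' = Phred 40 under Phred+33
print(keep_read(good, qual))                  # True
print(keep_read(good[:-1] + "N", qual))       # False: one N exceeds 0.001
print(keep_read(good, "#" * 61 + "I" * 89))   # False: 61/150 bases below Q20
print(keep_read("A" * 50 + "CG" * 50, qual))  # False: polyA run of 50
```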

To ensure data quality, the cleaned data underwent QC assessment using FastQC¹⁰⁶. High-quality reads were aligned to the sorghum reference genome (version 3.3.1)^107,108 using STAR¹⁰⁹. The FPKM values representing gene expression levels were calculated using StringTie¹¹⁰. Pearson correlation coefficients between biological replicates were calculated based on gene FPKM values, and replicates with a Pearson correlation > 0.8 were selected for further analysis.
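
For reference, FPKM scales a gene's fragment count by gene length (in kilobases) and by sequencing depth (in millions of mapped fragments). The sketch below uses that textbook definition. StringTie estimates FPKM from its own transcript-assembly coverage model, so its values will not match this simple calculation exactly. The replicate check mirrors the r > 0.8 rule described above. The function names are hypothetical.

```python
import numpy as np


def fpkm(fragment_counts, gene_lengths_bp, total_mapped_fragments=None) -> np.ndarray:
    """Fragments per kilobase of gene per million mapped fragments.

    If total_mapped_fragments is omitted, the sum of the supplied counts is used,
    which is only appropriate when every mapped fragment is assigned to a gene.
    """
    counts = np.asarray(fragment_counts, dtype=float)
    length_kb = np.asarray(gene_lengths_bp, dtype=float) / 1e3
    if total_mapped_fragments is None:
        total_mapped_fragments = counts.sum()
    depth_millions = total_mapped_fragments / 1e6
    return counts / length_kb / depth_millions


def replicates_agree(fpkm_a, fpkm_b, min_r: float = 0.8):
    """Pearson r between two replicates' gene FPKM vectors, and whether r > min_r."""
    r = float(np.corrcoef(fpkm_a, fpkm_b)[0, 1])
    return r > min_r, r


print(fpkm([100, 200], [1000, 2000], total_mapped_fragments=1_000_000))  # [100. 100.]
ok, r = replicates_agree([1.0, 5.0, 20.0, 0.5], [1.2, 4.5, 22.0, 0.4])
print(ok, round(r, 3))
```

In the first example, a gene twice as long with twice as many fragments gets the same FPKM.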

Genes were considered expressed at a given stage if (1) at least two reads mapped to the gene in each of two replicates and (2) the average FPKM at that time point was ≥ 1. To reduce the impact of transcriptional noise, only genes meeting these criteria (FPKM ≥ 1) in at least one sample were included in downstream analyses, consistent with the approach used in several other studies^34,35,46,99,111.
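
One way to encode these criteria is shown below. It assumes that counts and FPKM are stored as arrays of shape (genes, samples, replicates), where a sample is one tissue at one time point. We read criterion (1) as at least two replicates each having two or more reads. We read criterion (2) as the replicate-mean FPKM being at least 1 within that sample. A gene is retained if both criteria hold in any sample. The function names are hypothetical.

```python
import numpy as np


def expressed_per_sample(counts, fpkm, min_reads=2, min_reps=2, min_mean_fpkm=1.0):
    """Boolean array (genes, samples): both expression criteria met in that sample.

    counts, fpkm: array-like of shape (n_genes, n_samples, n_replicates).
    """
    counts = np.asarray(counts)
    fpkm = np.asarray(fpkm, dtype=float)
    # (1) enough replicates with at least `min_reads` mapped reads
    enough_reads = (counts >= min_reads).sum(axis=-1) >= min_reps
    # (2) replicate-averaged FPKM at or above the noise floor
    high_enough = fpkm.mean(axis=-1) >= min_mean_fpkm
    return enough_reads & high_enough


def genes_to_keep(counts, fpkm, **criteria):
    """Keep a gene if it is expressed in at least one sample."""
    return expressed_per_sample(counts, fpkm, **criteria).any(axis=1)


# 2 genes x 2 samples x 3 replicates (made-up numbers)
counts = [[[3, 3, 0], [0, 0, 0]],
          [[5, 1, 1], [0, 0, 0]]]
fpkm = [[[2.0, 2.0, 0.0], [0.0, 0.0, 0.0]],
        [[5.0, 0.5, 0.5], [0.0, 0.0, 0.0]]]
print(genes_to_keep(counts, fpkm))  # [ True False]
```

The second gene has a high mean FPKM in the first sample but enough reads in only one replicate, so it is dropped.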

PCA was used to visualize relationships among the seed tissue samples, using the prcomp¹¹² function in R with default settings. Gene expression profiles were clustered by k-means clustering with the pheatmap package¹¹³ using default settings, and the elbow method^114,115 was applied to determine the optimal number of clusters. Gene expression values transformed as log₂(FPKM + 1) were used for PCA and clustering. For clustering, relative expression values were calculated by dividing each gene's expression level at each time point by its maximum observed FPKM.
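
The transformations above can be sketched in Python. `pca_scores` approximates what R's `prcomp` does with its defaults (column centering, no scaling) by taking a singular value decomposition of the centered matrix. As in R, the sign of each component is arbitrary. `relative_expression` divides each gene's profile by its maximum, so the peak time point equals 1. All names are hypothetical, and rows of all zeros are assumed to map to zeros.

```python
import numpy as np


def log_transform(fpkm):
    """log2(FPKM + 1): keeps zeros at zero and damps very high values."""
    return np.log2(np.asarray(fpkm, dtype=float) + 1.0)


def pca_scores(samples_by_genes, n_components=2):
    """PCA like R's prcomp(x) with defaults (centered columns, no scaling), via SVD."""
    x = np.asarray(samples_by_genes, dtype=float)
    centered = x - x.mean(axis=0)
    u, s, _vt = np.linalg.svd(centered, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]  # sample coordinates on the PCs
    explained = s**2 / np.sum(s**2)                  # proportion of variance per PC
    return scores, explained[:n_components]


def relative_expression(genes_by_timepoints):
    """Divide each gene's values by that gene's maximum across time points."""
    x = np.asarray(genes_by_timepoints, dtype=float)
    row_max = x.max(axis=1, keepdims=True)
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(row_max > 0, x / row_max, 0.0)


print(log_transform([0.0, 1.0, 3.0]))  # [0. 1. 2.]
# First row becomes 0.25, 0.5, 1; the all-zero row stays at 0
print(relative_expression([[1, 2, 4], [0, 0, 0]]))
scores, explained = pca_scores([[1, 1], [2, 2], [3, 3]])
print(explained)  # essentially all variance falls on PC1
```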

Functional enrichment analysis was conducted with a hypergeometric test using KEGG and ShinyGO (http://bioinformatics.sdstate.edu/go74/)¹¹⁶. KEGG pathways enriched at FDR < 0.05 were considered statistically significant, and selected pathways are presented. The co-expression network was generated with the STRING database (https://string-db.org/)¹¹⁷. Mean FPKM values were clustered by fuzzy c-means clustering with the Mfuzz v2.42 R package (https://www.bioconductor.org/packages/release/bioc/html/Mfuzz.html)¹¹⁸, with the optimal number of clusters set to 12 and the fuzzifier coefficient set to 2.01. Genes with a membership score of at least 0.5 were plotted and used as inputs for categorical enrichment analysis.
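
Two of the computations named above can be sketched compactly. The hypergeometric test gives the probability of observing at least the given overlap between a gene list and a pathway by chance. Benjamini–Hochberg adjustment converts a set of such P-values into FDR values. For fuzzy c-means, a gene's membership in cluster j is u_j = 1 / Σ_k (d_j / d_k)^(2/(m − 1)). Here d is the gene's distance to each cluster centre and m is the fuzzifier (2.01 here). We assume that ShinyGO and Mfuzz compute equivalent quantities. This sketch does not reproduce their code, takes the cluster centres as given instead of updating them iteratively, and omits any preprocessing. The function names are hypothetical.

```python
import numpy as np
from scipy.stats import hypergeom


def enrichment_p(overlap, pathway_size, list_size, background_size):
    """P(X >= overlap) for X ~ Hypergeometric(background, pathway genes, list size)."""
    # scipy's hypergeom(M, n, N): M = population size, n = successes, N = draws
    return hypergeom.sf(overlap - 1, background_size, pathway_size, list_size)


def bh_fdr(pvals):
    """Benjamini-Hochberg adjusted P-values (FDR), returned in the input order."""
    p = np.asarray(pvals, dtype=float)
    n = p.size
    order = np.argsort(p)
    ranked = p[order] * n / np.arange(1, n + 1)
    # make adjusted values non-decreasing with rank, working down from the largest P
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    q = np.empty(n)
    q[order] = np.minimum(ranked, 1.0)
    return q


def fcm_membership(x, centers, m=2.01):
    """Fuzzy c-means memberships of each row of x, given fixed cluster centres."""
    x = np.asarray(x, dtype=float)
    c = np.asarray(centers, dtype=float)
    d = np.linalg.norm(x[:, None, :] - c[None, :, :], axis=2)  # (n_genes, n_clusters)
    d = np.fmax(d, 1e-12)  # a gene sitting exactly on a centre would divide by zero
    ratio = (d[:, :, None] / d[:, None, :]) ** (2.0 / (m - 1.0))
    return 1.0 / ratio.sum(axis=2)  # each row sums to 1


print(enrichment_p(2, pathway_size=2, list_size=2, background_size=4))  # about 0.1667 (1/6)
print(bh_fdr([0.01, 0.04, 0.03]))  # approximately [0.03, 0.04, 0.04]
u = fcm_membership([[0.0, 0.0], [5.0, 0.0]], centers=[[0.0, 0.0], [10.0, 0.0]])
core = u.max(axis=1) >= 0.5  # genes that would be kept for plotting and enrichment
print(u.round(3), core)
```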

qRT-PCR

The expression levels of selected genes were validated by quantitative real-time PCR (qRT-PCR). Total RNA was reverse-transcribed into complementary DNA (cDNA) using iScript™ Reverse Transcription Supermix for RT-qPCR (Bio-Rad) according to the manufacturer’s instructions. qRT-PCR was performed with two technical replicates for each of three biological replicates using SsoAdvanced Universal SYBR Green Supermix (Bio-Rad) on a Bio-Rad CFX96 system, and data were processed with CFX Manager software. Relative transcript levels were normalized to the expression of the reference gene encoding serine/threonine-protein phosphatase (PP2A), which was selected based on a previous evaluation of sorghum reference genes¹¹⁹. Oligonucleotides used in these experiments are listed in Supplementary Data 14.
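
The source states only that transcript levels were normalized to PP2A. A widely used way to do this is the 2^(−ΔCq) calculation, or 2^(−ΔΔCq) against a calibrator sample. Both assume roughly 100% amplification efficiency for the target and reference assays. The sketch below illustrates that assumed calculation for one biological replicate, averaging its two technical replicates first. The function name and the use of a calibrator are hypothetical.

```python
from statistics import mean


def relative_quantity(target_cq, reference_cq, calibrator_delta_cq=None):
    """2^-dCq of a target gene relative to a reference gene (e.g. PP2A).

    target_cq, reference_cq: technical-replicate Cq values from one biological replicate.
    calibrator_delta_cq: optional dCq of a calibrator sample; if given, 2^-ddCq is returned.
    Assumes ~100% amplification efficiency for both assays.
    """
    delta_cq = mean(target_cq) - mean(reference_cq)
    if calibrator_delta_cq is not None:
        delta_cq -= calibrator_delta_cq
    return 2.0 ** (-delta_cq)


# Made-up Cq values: the target crosses threshold 3 cycles after PP2A in this sample
print(relative_quantity([24.5, 25.5], [21.75, 22.25]))                           # 0.125
print(relative_quantity([24.5, 25.5], [21.75, 22.25], calibrator_delta_cq=1.0))  # 0.25
```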

Identification of embryo-, endosperm-, and whole seed-specific genes

A total of 42 previously published non-seed sorghum RNA-seq datasets were used to identify tissue-specific (TS) genes^67,108,120,121,122,123,124,125,126,127. TS genes were identified with a TS scoring algorithm^128,129 that compares the expression level of a gene in a given compartment with its maximal expression level in the other samples. TS scores range from 0 to 1, and the higher a gene's TS score for a tissue, the more likely the gene is specifically expressed in that tissue^95,128,129. In this study, TS genes were defined as those with a TS score > 0.5, following criteria similar to those used in maize⁹⁵ and yam¹¹¹.
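
The exact TS formula is given in refs. 128 and 129 and is not reproduced in this section. As a hypothetical illustration only, the sketch below uses one simple score with the properties described here. It compares a gene's expression in the focal tissue with its maximum in any other tissue, is clipped to the range 0 to 1, and increases as expression becomes more restricted to the focal tissue. Under this assumed formula, a score above 0.5 means that expression in the focal tissue is more than twice the highest level elsewhere. The published algorithm may differ.

```python
import numpy as np


def ts_scores(expr):
    """Assumed tissue-specificity score per gene and tissue (hypothetical formula).

    expr: array-like (n_genes, n_tissues) of mean expression values (e.g. FPKM).
    Score = 1 - max(expression in other tissues) / expression in this tissue,
    clipped to [0, 1]; genes not expressed in a tissue score 0 there.
    """
    x = np.asarray(expr, dtype=float)
    if x.ndim != 2 or x.shape[1] < 2:
        raise ValueError("need a 2-D array with at least two tissues")
    scores = np.zeros_like(x)
    for t in range(x.shape[1]):
        others_max = np.delete(x, t, axis=1).max(axis=1)
        with np.errstate(divide="ignore", invalid="ignore"):
            raw = np.where(x[:, t] > 0, 1.0 - others_max / x[:, t], 0.0)
        scores[:, t] = np.clip(raw, 0.0, 1.0)
    return scores


# Columns: embryo, endosperm, leaf (made-up values)
expr = [[10.0, 2.0, 1.0],   # embryo-enriched gene
        [5.0, 5.0, 0.0],    # shared by embryo and endosperm
        [0.0, 0.0, 0.0]]    # not expressed
scores = ts_scores(expr)
print(scores.round(2))
print(np.argwhere(scores > 0.5))  # (gene, tissue) pairs called tissue-specific
```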

Statistics and reproducibility

Details of the statistical tests used in the study are provided in the respective Methods sections and Supplementary Data. RNA-seq, metabolome profiling, and qPCR were performed with two, five, and three independent biological replicates, respectively. Pearson correlations (r) among replicates were calculated from gene expression levels (FPKM). GO term enrichment was assessed using Fisher’s exact test with false discovery rate (FDR) adjustment, as implemented in PANTHER 19.0. Metabolome pathway enrichment analysis used the hypergeometric test, as implemented in MetaboAnalyst.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Data availability

The source data underlying the graphs in the main figures are available in Supplementary Data 15. The RNA-seq data have been deposited in the EMBL-EBI ArrayExpress collection in BioStudies under accession number E-MTAB-13406. The metabolomic data, including compound names, formulas, exact Q1 (m/z) values, molecular weights, and peak intensities, are available in the Supplementary Data.

References

Hossain, M. S., Islam, M. N., Rahman, M. M., Mostofa, M. G. & Khan, M. A. R. Sorghum: A prospective crop for climatic vulnerability, food and nutritional security. J. Agriculture Food Res. 8, 100300 (2022).
Adebo, O. A. African sorghum-based fermented foods: past, current and future prospects. Nutrients 12, 1111 (2020).
Mundia, C. W., Secchi, S., Akamani, K. & Wang, G. A regional comparison of factors affecting global sorghum production: The case of North America, Asia and Africa’s Sahel. Sustainability 11, 2135 (2019).
Maikasuwa, A. & Ala, A. Trend analysis of area and productivity of sorghum in Sokoto state, Nigeria, 1993-2012. Eur. Sci. J. 9, 16 (2013).
Maiti, R. Sorghum science. (Science Publishers, Inc., 1996).
Tack, J., Lingenfelser, J. & Jagadish, S. K. Disaggregating sorghum yield reductions under warming scenarios exposes narrow genetic diversity in US breeding programs. Proc. Natl Acad. Sci. 114, 9296–9301 (2017).
Schlenker, W. & Lobell, D. B. Robust negative impacts of climate change on African agriculture. Environ. Res. Lett. 5, 014010 (2010).
Sultan, B. et al. Robust features of future climate change impacts on sorghum yields in West Africa. Environ. Res. Lett. 9, 104006 (2014).
Liu, B. et al. Similar estimates of temperature impacts on global wheat yield by three independent methods. Nat. Clim. Change 6, 1130–1136 (2016).
Mohammed, A. & Misganaw, A. Modeling future climate change impacts on sorghum (Sorghum bicolor) production with best management options in Amhara Region, Ethiopia. CABI Agriculture Biosci. 3, 22 (2022).
Kladnik, A., Chourey, P. S., Pring, D. R. & Dermastia, M. Development of the endosperm of Sorghum bicolor during the endoreduplication-associated growth phase. J. Cereal Sci. 43, 209–215 (2006).
Rooney, W. Sorghum: Origin, History, Technology, and Production (2000).
Benech-Arnold, R. L. & Rodríguez, M. V. Pre-harvest sprouting and grain dormancy in Sorghum bicolor: what have we learned? Front. Plant Sci. 9, 811 (2018).
Raghavan, V. Some reflections on double fertilization, from its discovery to the present. N. Phytologist 159, 565–583 (2003).
Tao, Y. et al. Integration of embryo–endosperm interaction into a holistic and dynamic picture of seed development using a rice mutant with notched-belly kernels. Crop J. 10, 729–742 (2022).
Zheng, Y., Xiong, F., Wang, Z. & Gu, Y. Observation and investigation of three endosperm transport tissues in sorghum caryopses. Protoplasma 252, 705–714 (2015).
Artschwager, E. & McGuire, R. C. Cytology of reproduction in Sorghum vulgare. J. Agric. Res. 78, 659–673 (1949).
Kowles, R. V. & Phillips, R. L. DNA amplification patterns in maize endosperm nuclei during kernel development. Proc. Natl Acad. Sci. 82, 7010–7014 (1985).
Schweizer, L., Yerk-Davis, G., Phillips, R., Srienc, F. & Jones, R. Dynamics of maize endosperm development and DNA endoreduplication. Proc. Natl Acad. Sci. 92, 7070–7074 (1995).
Zheng, Y. & Wang, Z. Structural character of sorghum endosperm transfer cells and their relationship with embryo and endosperm. Int. J. Plant Biol. 1, e15 (2010).
Leviczky, T. et al. E2FA and E2FB transcription factors coordinate cell proliferation with seed maturation. Development 146, dev179333 (2019).
Li, R., Tan, Y. & Zhang, H. Regulators of starch biosynthesis in cereal crops. Molecules 26, 7092 (2021).
Hamaker, B., Mohamed, A., Habben, J., Huang, C. & Larkins, B. Efficient procedure for extracting maize and sorghum kernel proteins reveals higher prolamin contents than the conventional method. Cereal Chem. 72, 583–588 (1995).
Duressa, D., Weerasoriya, D., Bean, S. R., Tilley, M. & Tesso, T. Genetic basis of protein digestibility in grain sorghum. Crop Sci. 58, 2183–2199 (2018).
Izquierdo, L. & Godwin, I. D. Molecular characterization of a novel methionine-rich δ-kafirin seed storage protein gene in sorghum (Sorghum bicolor L.). Cereal Chem. 82, 706–710 (2005).
Belton, P., Delgadillo, I., Halford, N. & Shewry, P. Kafirin structure and functionality. J. cereal Sci. 44, 272–286 (2006).
Laidlaw, H. et al. Allelic variation of the β-, γ-and δ-kafirin genes in diverse Sorghum genotypes. Theor. Appl. Genet. 121, 1227–1237 (2010).
Castro-Jácome, T. P., Alcántara-Quintana, L. E. & Tovar-Pérez, E. G. Optimization of sorghum kafirin extraction conditions and identification of potential bioactive peptides. BioResearch Open Access 9, 198–208 (2020).
Duodu, K. et al. Effect of grain structure and cooking on sorghum and maize in vitro protein digestibility. J. Cereal Sci. 35, 161–174 (2002).
Da Silva, L. S., Taylor, J. & Taylor, J. R. Transgenic sorghum with altered kafirin synthesis: kafirin solubility, polymerization, and protein digestion. J. Agric. food Chem. 59, 9265–9270 (2011).
Locascio, A., Roig-Villanova, I., Bernardi, J. & Varotto, S. Current perspectives on the hormonal control of seed development in Arabidopsis and maize: a focus on auxin. Front. plant Sci. 5, 412 (2014).
Kozaki, A. & Aoyanagi, T. Molecular aspects of seed development controlled by gibberellins and abscisic acids. Int. J. Mol. Sci. 23, 1876 (2022).
Ruuska, S. A., Girke, T., Benning, C. & Ohlrogge, J. B. Contrapuntal networks of gene expression during Arabidopsis seed filling. Plant Cell 14, 1191–1206 (2002).
Palovaara, J., Saiga, S. & Weijers, D. Transcriptomics approaches in the early Arabidopsis embryo. Trends Plant Sci. 18, 514–521 (2013).
Fait, A. et al. Arabidopsis seed development and germination is associated with temporally distinct metabolic switches. Plant Physiol. 142, 839–854 (2006).
Baud, S., Boutin, J.-P., Miquel, M., Lepiniec, L. & Rochat, C. An integrated overview of seed development in Arabidopsis thaliana ecotype WS. Plant Physiol. Biochem. 40, 151–160 (2002).
Lan, L. et al. Monitoring of gene expression profiles and isolation of candidate genes involved in pollination and fertilization in rice (Oryza sativa L.) with a 10K cDNA microarray. Plant Mol. Biol. 54, 471–487 (2004).
Furutani, I., Sukegawa, S. & Kyozuka, J. Genome‐wide analysis of spatial and temporal gene expression in rice panicle development. Plant J. 46, 503–511 (2006).
Jiang, S.-Y. & Ramachandran, S. Functional genomics of rice pollen and seed development by genome-wide transcript profiling and Ds insertion mutagenesis. Int. J. Biol. Sci. 7, 28 (2011).
Xue, L.-J., Zhang, J.-J. & Xue, H.-W. Genome-wide analysis of the complex transcriptional networks of rice developing seeds. PloS one 7, e31081 (2012).
Rangan, P., Furtado, A. & Henry, R. J. The transcriptome of the developing grain: a resource for understanding seed development and the molecular control of the functional and nutritional properties of wheat. BMC genomics 18, 1–9 (2017).
Guan, J. et al. Transcriptome Analysis of Developing Wheat Grains at Rapid Expanding Phase Reveals Dynamic Gene Expression Patterns. Biology 11, 281 (2022).
Jiang, L. et al. Dynamic transcriptome analysis suggests the key genes regulating seed development and filling in Tartary buckwheat (Fagopyrum tataricum Garetn.). Front. Genet. 13, 990412 (2022).
Yi, F. et al. High temporal-resolution transcriptome landscape of early maize seed development. Plant Cell 31, 974–992 (2019).
Chen, J. et al. Dynamic transcriptome landscape of maize embryo and endosperm development. Plant Physiol. 166, 252–264 (2014).
Li, X., Wu, J., Yi, F., Lai, J. & Chen, J. High temporal-resolution transcriptome landscapes of maize embryo sac and ovule during early seed development. Plant Mol. Biol. 111, 233–248 (2023).
Li, G. et al. Temporal patterns of gene expression in developing maize endosperm identified through transcriptome sequencing. Proc. Natl Acad. Sci. 111, 7582–7587 (2014).
Wang, X. et al. Integrated analysis of transcriptomic and proteomic data from tree peony (P. ostii) seeds reveals key developmental stages and candidate genes related to oil biosynthesis and fatty acid metabolism. Horticulture Res. 6, 111 (2019).
Fedorova, M. et al. Genome-wide identification of nodule-specific transcripts in the model legume Medicago truncatula. Plant Physiol. 130, 519–537 (2002).
Gallardo, K. et al. A combined proteome and transcriptome analysis of developing Medicago truncatula seeds: evidence for metabolic specialization of maternal and filial tissues. Mol. Cell. Proteom. 6, 2165–2179 (2007).
Ziegler, D. J., Khan, D., Kalichuk, J. L., Becker, M. G. & Belmonte, M. F. Transcriptome landscape of the early Brassica napus seed. J. Integr. Plant Biol. 61, 639–650 (2019).
Shahid, M. et al. Comparative transcriptome analysis of developing seeds and silique wall reveals dynamic transcription networks for effective oil production in Brassica napus L. Int. J. Mol. Sci. 20, 1982 (2019).
Li, F., Wu, X., Tsang, E. & Cutler, A. J. Transcriptional profiling of imbibed Brassica napus seed. Genomics 86, 718–730 (2005).
Watson, L. & Henry, R. J. Microarray analysis of gene expression in germinating barley embryos (Hordeum vulgare L.). Funct. Integr. Genomics 5, 155–162 (2005).
Qi, Z. et al. Meta‐analysis and transcriptome profiling reveal hub genes for soybean seed storage composition during seed development. Plant, Cell Environ. 41, 2109–2127 (2018).
Collakova, E. et al. Metabolic and transcriptional reprogramming in developing soybean (Glycine max) embryos. Metabolites 3, 347–372 (2013).
Yang, S. et al. Dynamic transcriptome changes related to oil accumulation in developing soybean seeds. Int. J. Mol. Sci. 20, 2202 (2019).
Sun, S. et al. Analysis of spatio-temporal transcriptome profiles of soybean (Glycine max) tissues during early seed development. Int. J. Mol. Sci. 21, 7603 (2020).
Huang, L., Tan, H., Zhang, C., Li, Q. & Liu, Q. Starch biosynthesis in cereal endosperms: An updated review over the last decade. Plant Commun. 2, 100237. https://doi.org/10.1016/j.xplc.2021.100237 (2021).
Qu, J. et al. Comparative transcriptomics reveals the difference in early endosperm development between maize with different amylose contents. PeerJ 7, e7528 (2019).
Du, J. et al. Identification of regulatory networks and hub genes controlling soybean seed set and size using RNA sequencing analysis. J. Exp. Bot. 68, 1955–1972 (2017).
Zhang, Z. et al. Integrated metabolomics and transcriptomics analyses reveal the metabolic differences and molecular basis of nutritional quality in landraces and cultivated rice. Metabolites 12, 384 (2022).
Xiao, Q. et al. Profiling of transcriptional regulators associated with starch biosynthesis in sorghum (Sorghum bicolor L.). Front. Plant Sci. 13, 999747 (2022).
Seebauer, J. R., Singletary, G. W., Krumpelman, P. M., Ruffo, M. L. & Below, F. E. Relationship of source and sink in determining kernel composition of maize. J. Exp. Bot. 61, 511–519 (2010).
Shen, S., Hou, H., Ding, C., Bing, D.-J. & Lu, Z.-X. Protein content correlates with starch morphology, composition and physicochemical properties in field peas. Can. J. plant Sci. 96, 404–412 (2016).
Kljak, K., Duvnjak, M. & Grbeša, D. Effect of starch properties and zein content of commercial maize hybrids on kinetics of starch digestibility in an in vitro poultry model. J. Sci. Food Agriculture 99, 6372–6379 (2019).
Wang, B. et al. A comparative transcriptional landscape of maize and sorghum obtained by single-molecule sequencing. Genome Res. 28, 921–932 (2018).
Doll, N. M., Depège-Fargeix, N., Rogowsky, P. M. & Widiez, T. Signaling in early maize kernel development. Mol. Plant 10, 375–388 (2017).
Young, T. E. & Gallie, D. R. Programmed cell death during endosperm development. Programmed cell death in higher plants, 39–57 (2000).
Domínguez, F. & Cejudo, F. J. Programmed cell death (PCD): an essential process of cereal seed development and germination. Front. plant Sci. 5, 99512 (2014).
Young, T. E., Gallie, D. R. & DeMason, D. A. Ethylene-mediated programmed cell death during maize endosperm development of wild-type and shrunken2 genotypes. Plant Physiol. 115, 737–751 (1997).
Ahmadizadeh, M., Chen, J.-T., Hasanzadeh, S., Ahmar, S. & Heidari, P. Insights into the genes involved in the ethylene biosynthesis pathway in Arabidopsis thaliana and Oryza sativa. J. Genet. Eng. Biotechnol. 18, 62 (2020).
Wang, G. et al. Dek40 encodes a PBAC4 protein required for 20S proteasome biogenesis and seed development. Plant Physiol. 180, 2120–2132 (2019).
Wei, Y. M. et al. Defective kernel 66 encodes a GTPase essential for kernel development in maize. J. Exp. Bot. 74, 5694–5708 (2023).
Pedroza-Garcia, J. A. et al. Maize ATR safeguards genome stability during kernel development to prevent early endosperm endocycle onset and cell death. Plant Cell 33, 2662–2684 (2021).
Yan, D., Duermeyer, L., Leoveanu, C. & Nambara, E. The functions of the endosperm during seed germination. Plant Cell Physiol. 55, 1521–1533 (2014).
Huang, S. & Millar, A. H. Succinate dehydrogenase: the complex roles of a simple enzyme. Curr. Opin. plant Biol. 16, 344–349 (2013).
Biswal, A. K. et al. Novel mutant alleles reveal a role of the extra-large G protein in rice grain filling, panicle architecture, plant growth, and disease resistance. Front. Plant Sci. 12, 2821 (2022).
Zhang, C. et al. Pivotal roles of ELONGATED HYPOCOTYL5 in regulation of plant development and fruit metabolism in tomato. Plant Physiol. 189, 527–540 (2022).
Sun, Y. et al. Natural variation in the OsbZIP18 promoter contributes to branched‐chain amino acid levels in rice. N. Phytologist 228, 1548–1558 (2020).
Sun, Y. et al. OsbZIP18, a positive regulator of serotonin biosynthesis, negatively controls the UV-B tolerance in rice. Int. J. Mol. Sci. 23, 3215 (2022).
Kawakatsu, T. & Takaiwa, F. Differences in transcriptional regulatory mechanisms functioning for free lysine content and seed storage protein accumulation in rice grain. Plant cell Physiol. 51, 1964–1974 (2010).
Gao, Y., Xu, H., Shen, Y. & Wang, J. Transcriptomic analysis of rice (Oryza sativa) endosperm using the RNA-Seq technique. Plant Mol. Biol. 81, 363–378 (2013).
Gillies, S. A., Futardo, A. & Henry, R. J. Gene expression in the developing aleurone and starchy endosperm of wheat. Plant Biotechnol. J. 10, 668–679 (2012).
Gutierrez-Gonzalez, J. J., Tu, Z. J. & Garvin, D. F. Analysis and annotation of the hexaploid oat seed transcriptome. BMC genomics 14, 1–17 (2013).
Lu, X. et al. The differential transcription network between embryo and endosperm in the early developing maize seed. Plant Physiol. 162, 440–455 (2013).
Pellny, T. K. et al. Cell walls of developing wheat starchy endosperm: comparison of composition and RNA-Seq transcriptome. Plant Physiol. 158, 612–627 (2012).
Chi, Q. et al. Global transcriptome analysis uncovers the gene co-expression regulation network and key genes involved in grain development of wheat (Triticum aestivum L.). Funct. Integr. Genomics 19, 853–866 (2019).
Tao, Y. et al. Large‐scale GWAS in sorghum reveals common genetic control of grain size among cereals. Plant Biotechnol. J. 18, 1093–1105 (2020).
Boyles, R. E. et al. Genetic dissection of sorghum grain quality traits using diverse and segregating populations. Theor. Appl. Genet. 130, 697–716 (2017).
Kumar, N. et al. Development and characterization of a sorghum multi-parent advanced generation intercross (MAGIC) population for capturing diversity among seed parent gene pool. G3: Genes, Genomes, Genetics 13, jkad037 (2023).
Kimani, W., Zhang, L.-M., Wu, X.-Y., Hao, H.-Q. & Jing, H.-C. Genome-wide association study reveals that different pathways contribute to grain quality variation in sorghum (Sorghum bicolor). BMC genomics 21, 1–19 (2020).
Pattison, R. J. et al. Comprehensive tissue-specific transcriptome analysis reveals distinct regulatory programs during early tomato fruit development. Plant Physiol. 168, 1684–1701 (2015).
Han, B. et al. Epigenetic regulation of seed-specific gene expression by DNA methylation valleys in castor bean. BMC Biol. 20, 1–18 (2022).
Zhan, J. et al. RNA sequencing of laser-capture microdissected compartments of the maize kernel identifies regulatory modules associated with endosperm cell differentiation. Plant Cell 27, 513–531 (2015).
Ali, F., Qanmber, G., Li, F. & Wang, Z. Updated role of ABA in seed maturation, dormancy, and germination. J. Adv. Res. 35, 199–214 (2022).
Zhao, Y., Hu, Y., Dai, M., Huang, L. & Zhou, D.-X. The WUSCHEL-related homeobox gene WOX11 is required to activate shoot-borne crown root development in rice. Plant Cell 21, 736–748 (2009).
Yamamoto, A. et al. Arabidopsis NF‐YB subunits LEC1 and LEC1‐LIKE activate transcription by interacting with seed‐specific ABRE‐binding factors. Plant J. 58, 843–856 (2009).
Jiang, W., Zhang, X., Song, X., Yang, J. & Pang, Y. Genome-wide identification and characterization of APETALA2/ethylene-responsive element binding factor superfamily genes in soybean seed development. Front. Plant Sci. 11, 566647 (2020).
Singh, M. et al. Biochar implications under limited irrigation for sweet corn production in a semi-arid environment. Front. Plant Sci. 13, 1032 (2022).
Bean, S., Ioerger, B. & Blackwell, D. Separation of kafirins on surface porous reversed-phase high-performance liquid chromatography columns. J. Agric. Food Chem. 59, 85–91 (2011).
Kanehisa, M. & Goto, S. KEGG: Kyoto encyclopedia of genes and genomes. Nucleic Acids Res. 28, 27–30 (2000).
Wishart, D. S. et al. HMDB: the human metabolome database. Nucleic Acids Res. 35, D521–D526 (2007).
Xia, J. & Wishart, D. S. Using MetaboAnalyst 3.0 for comprehensive metabolomics data analysis. Curr. Protoc. Bioinformatics 55, 14.10.1–14.10.91 (2016).
Chen, Y. et al. SOAPnuke: a MapReduce acceleration-supported software for integrated quality control and preprocessing of high-throughput sequencing data. Gigascience 7, gix120 (2018).
Andrews, S. FastQC: A quality control tool for high throughput sequence data. Babraham Bioinformatics, Babraham Institute, Cambridge, UK (2010).
Paterson, A. H. et al. The Sorghum bicolor genome and the diversification of grasses. Nature 457, 551–556 (2009).
McCormick, R. F. et al. The Sorghum bicolor reference genome: improved assembly, gene annotations, a transcriptome atlas, and signatures of genome organization. Plant J. 93, 338–354 (2018).
Dobin, A. et al. STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29, 15–21 (2013).
Pertea, M. et al. StringTie enables improved reconstruction of a transcriptome from RNA-seq reads. Nat. Biotechnol. 33, 290–295 (2015).
Wu, Z.-G. et al. Morphological and stage-specific transcriptome analyses reveal distinct regulatory programs underlying yam (Dioscorea alata L.) bulbil growth. J. Exp. Bot. 71, 1899–1914 (2020).
Venables, W. N. & Ripley, B. D. Modern applied statistics with S-PLUS. (Springer Science & Business Media, 2013).
Kolde, R. & Kolde, M. R. Package ‘pheatmap’. R. package 1, 790 (2015).
Bholowalia, P. & Kumar, A. EBK-means: A clustering technique based on elbow method and k-means in WSN. Int. J. Comput. Appl. 105, 17–24 (2014).
Yuan, C. & Yang, H. Research on K-value selection method of K-means clustering algorithm. J 2, 226–235 (2019).
Ge, S. X., Jung, D. & Yao, R. ShinyGO: a graphical gene-set enrichment tool for animals and plants. Bioinformatics 36, 2628–2629 (2020).
Szklarczyk, D. et al. The STRING database in 2023: protein–protein association networks and functional enrichment analyses for any sequenced genome of interest. Nucleic Acids Res. 51, D638–D646 (2023).
Kumar, L. & Futschik, M. E. Mfuzz: a software package for soft clustering of microarray data. Bioinformation 2, 5 (2007).
Sudhakar Reddy, P. et al. Evaluation of sorghum [Sorghum bicolor (L.)] reference genes in various tissues and under abiotic stress conditions for quantitative real-time PCR data normalization. Front. Plant Sci. 7, 172935 (2016).
Kebrom, T. H., McKinley, B. & Mullet, J. E. Dynamics of gene expression during development and expansion of vegetative stem internodes of bioenergy sorghum. Biotechnol. Biofuels 10, 1–16 (2017).
Emms, D. M., Covshoff, S., Hibberd, J. M. & Kelly, S. Independent and parallel evolution of new genes by gene duplication in two origins of C4 photosynthesis provides new insight into the mechanism of phloem loading in C4 species. Mol. Biol. Evol. 33, 1796–1806 (2016).
Makita, Y. et al. MOROKOSHI: transcriptome database in Sorghum bicolor. Plant Cell Physiol. 56, e6 (2015).
Davidson, R. M. et al. Comparative transcriptomics of three Poaceae species reveals patterns of gene expression evolution. Plant J. 71, 492–502 (2012).
Turco, G. M. et al. DNA methylation and gene expression regulation associated with vascularization in Sorghum bicolor. New Phytol. 214, 1213–1229 (2017).
Varoquaux, N. et al. Transcriptomic analysis of field-droughted sorghum from seedling to maturity reveals biotic and metabolic responses. Proc. Natl Acad. Sci. 116, 27124–27132 (2019).
Gelli, M. et al. Identification of differentially expressed genes between sorghum genotypes with contrasting nitrogen stress tolerance by genome-wide transcriptional profiling. BMC Genomics 15, 1–16 (2014).
Dugas, D. V. et al. Functional annotation of the transcriptome of Sorghum bicolor in response to osmotic stress and abscisic acid. BMC Genomics 12, 1–21 (2011).
Ma, C., Xin, M., Feldmann, K. A. & Wang, X. Machine learning–based differential network analysis: a study of stress-responsive transcriptomes in Arabidopsis. Plant Cell 26, 520–537 (2014).
Ma, C. & Wang, X. Application of the Gini correlation coefficient to infer regulatory relationships in transcriptome analysis. Plant Physiol. 160, 192–203 (2012).

Acknowledgements

This research was supported by the intramural research program of the U.S. Department of Agriculture, National Institute of Food and Agriculture, Agriculture and Food Research Initiative (AFRI), under award number 2023-67013-39631. Y.J. and A.K. were also supported by the State of Texas’ Governor’s University Research Initiative (GURI). S.R.B. was funded by USDA-ARS project number 3020-43440-002. The findings and conclusions in this preliminary publication have not been formally disseminated by the U.S. Department of Agriculture and should not be construed to represent any agency determination or policy. Names are necessary to report factually on available data; however, the U.S. Department of Agriculture neither guarantees nor warrants the standard of the product, and the use of the name by the U.S. Department of Agriculture implies no approval of the product to the exclusion of others that may also be suitable. USDA is an equal opportunity provider and employer.

Author information

These authors contributed equally: Adil Khan, Ran Tian.

Authors and Affiliations

Institute of Genomics for Crop Abiotic Stress Tolerance, Department of Plant and Soil Science, Texas Tech University, Lubbock, TX, 79409, USA
Adil Khan, Ran Tian & Yinping Jiao
Grain Quality and Structure Research Unit, Center for Grain and Animal Health Research, USDA-ARS, 1515 College Ave, Manhattan, KS, 66502, USA
Scott R. Bean
Department of Agriculture, Veterinary & Rangeland Sciences, University of Nevada-Reno, Reno, NV, 89557, USA
Melinda Yerka

Contributions

Y.J. conceived of and designed the project. A.K. and R.T. performed the experiments and analyzed the data. S.R.B. conducted protein analysis. A.K., Y.J., R.T. and M.K.Y. prepared the manuscript. All authors edited and approved the final version for publication.

Corresponding author

Correspondence to Yinping Jiao.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Communications Biology thanks Long Mao and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Primary Handling Editor: David Favero. A peer review file is available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Peer Review File

Supplementary Information

Description of Additional Supplementary Files

Supplementary Data 1

Supplementary Data 2

Supplementary Data 3

Supplementary Data 4

Supplementary Data 5

Supplementary Data 6

Supplementary Data 7

Supplementary Data 8

Supplementary Data 9

Supplementary Data 10

Supplementary Data 11

Supplementary Data 12

Supplementary Data 13

Supplementary Data 14

Supplementary Data 15

Reporting Summary

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Khan, A., Tian, R., Bean, S.R. et al. Transcriptome and metabolome analyses reveal regulatory networks associated with nutrition synthesis in sorghum seeds. Commun Biol 7, 841 (2024). https://doi.org/10.1038/s42003-024-06525-7

Received: 22 December 2023
Accepted: 28 June 2024
Published: 10 July 2024
Version of record: 10 July 2024
DOI: https://doi.org/10.1038/s42003-024-06525-7

This article is cited by

Dynamic transcriptome landscape of oat grain development
- Ting Wang
- Bing Han
BMC Genomics (2025)
Temporal dynamics of metabolite accumulation and carbon–nitrogen reprogramming during wheat (Triticum aestivum L.) grain development
- Naimat Ullah
- Huma Qureshi
- Dilbar Bazarbayeva
Cereal Research Communications (2025)
An updated molecular toolkit for genomics-assisted breeding of waxy sorghum [Sorghum bicolor (L.) Moench]
- Melinda K. Yerka
- Zhiyuan Liu
- Yinping Jiao
Journal of Applied Genetics (2025)