Abstract
Successful seed development is essential for flowering plant reproduction and requires the coordination of three genetically distinct tissues: the embryo and endosperm, which are the products of fertilization, and the maternal seed coat. Our understanding of the transcriptional programs underlying tissue-specific functions and inter-tissue coordination in seeds remains incomplete. To address this, we performed single-nucleus RNA sequencing on Arabidopsis thaliana seeds at 3, 5 and 7 days after pollination. Here we characterized all major seed cell or nucleus types, further refined transcriptional states in the endosperm and mapped signatures of selection on cell-type-specific genes. Among other findings, our analyses reveal the compartmentalization of genes involved in brassinosteroid-responsive transcription factor activation, abundant endosperm expression of genes that encode short, secreted peptides, and expression enrichment of rapidly evolving genes in endosperm and seed coat subtypes, illuminating the cell type and species specificity of seed genes.
Main
The three genetically distinct tissues of the seed are developmentally coordinated to ensure the propagation of the next generation1. This process begins when the egg cell and central cell of the female gametophyte are fertilized by sperm to generate the embryo and endosperm, respectively. After fertilization, the growth, development and differentiation of the embryo and the more rapidly developing endosperm are accompanied by the growth and development of the ovule integuments, which become the seed coat2,3,4,5. The close synchronization of developmental transitions in the seed suggests widespread signalling between tissues, even though inter-tissue signals must cross cell walls and membranes as the embryo, endosperm and seed coat are isolated symplastic fields6,7. Mechanisms underlying coordination within and between tissues are beginning to be elucidated8. For example, seed coat development requires endosperm-derived auxin, and embryo morphogenesis relies on auxin from the early seed coat9,10,11. The complete mechanisms of these interactions are unclear, and they are only two components of an extensive molecular dialogue.
The embryo-nourishing endosperm is a dynamic tissue that has been implicated in several axes of inter-tissue signalling12,13,14. It begins as a coenocyte and then undergoes cellularization before embryo invasion and consumption during the final stages of seed development15. Arabidopsis endosperm was previously transcriptionally characterized throughout seed development by laser-capture microdissection followed by microarray analysis and was transcriptionally characterized at the single-nucleus level at 4 days after pollination (DAP)16,17. Endosperm nuclei that are at key tissue interfaces show the highest transcriptional distinction—namely, the embryo-proximal micropylar endosperm (MCE) and the chalazal endosperm (CZE), which sits at the maternal–offspring interface and acts as a gateway for maternal resources into the embryo sac16,17,18. How endosperm compartments locally coordinate processes at tissue interfaces is incompletely understood. However, some short, secreted peptides (SSPs) from the MCE are well-characterized inter-tissue signals. For example, the embryo-derived TWS1 SSP is processed in the endosperm by the ALE1 subtilase, and the mature peptide directs cuticle development in the embryo19,20. Given the symplastic isolation of seed compartments, SSP signalling could be a frequent conduit for inter-tissue signals.
Throughout seed development, maternal resources are deposited into and distributed from the chalazal seed coat (CZSC), a specialized region of the seed coat at the terminus of the vasculature5,21. Transporter- and channel-dense cells import nutrients and hormones and unload them into integument symplastic domains22. Recent evidence suggests that maternally synthesized auxin and abscisic acid are imported from the funiculus and control seed size and dormancy, respectively23,24. The CZSC is a morphologically complex region, and the degree of functional specialization among CZSC cell types is unknown.
We present a time-point-resolved single-nucleus RNA sequencing (snRNA-seq) atlas of early Arabidopsis seed development. Among other findings, our analyses reveal transcriptionally defined CZSC subtypes with complementary functions and the concentration of brassinosteroid (BR) biosynthesis in a micropylar seed coat subtype. In the endosperm, we report genes that underlie transcriptional polarity within the CZE cyst, many of which encode SSPs, consistent with a general enrichment of SSP expression in the CZE and MCE. Finally, we show that the peripheral endosperm (PEN) and CZE express the majority of seed-expressed genes that appear to be under positive selection. Our atlas is available for exploration at https://seedatlas.wi.mit.edu/.
Results
A transcriptional atlas of early Arabidopsis seed development
To capture the most dynamic period of endosperm development, we isolated and sequenced RNA from individual nuclei from 3, 5 and 7 DAP Col-0 seeds using the 10x Genomics platform (Fig. 1a and Methods). At 3 DAP, the embryo is at the globular stage, and the endosperm is coenocytic. At 5 DAP, the endosperm begins to cellularize at the micropylar pole, and at 7 DAP cellularization is complete and the embryo expands rapidly (Fig. 1a)15. For each time point, we collected two biological replicates, which showed high transcriptional correlation within time points (Extended Data Fig. 1). After raw snRNA-seq data filtering and correction (Methods), we identified an optimal clustering resolution on the basis of both the number of clusters and cluster neighbourhood purity (Supplementary Fig. 1). We assigned clusters to cell types by analysing the expression of known published marker genes, by using differential expression and Gene Ontology (GO) term enrichment analysis and by hybridization chain reaction (HCR) RNA-fluorescence in situ hybridization (RNA-FISH) of cluster marker genes (Supplementary Tables 1–4 and Methods). We further subclustered embryo and endosperm data to identify previously characterized cell types in the embryo and reveal putative novel nuclei types in the endosperm (Extended Data Fig. 2 and Supplementary Figs. 2–4). At the highest level of resolution, which we refer to as level 3 (L3) annotation, we identified 34, 33 and 25 clusters at 3, 5 and 7 DAP, respectively (Fig. 1b–d, Extended Data Fig. 3 and Supplementary Table 3). The L3 clusters were then assigned level 2 (L2) and level 1 (L1) annotations, which were harmonized across time points (Fig. 1b,c). Once each time point was annotated separately, the datasets were integrated into a final atlas dataset (Fig. 1b–d). 
In total, our atlas contains 54,210 profiles (24,024 at 3 DAP, 16,039 at 5 DAP and 14,147 at 7 DAP) and is approximately 10.5% embryo, 23.4% endosperm, 64.3% seed coat and 1.8% unfertilized ovule and funiculus (Fig. 1e).
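The composition figures above can be checked with a few lines of arithmetic. The sketch below uses only the counts and percentages stated in the text; the per-tissue nucleus numbers it derives are approximations implied by the rounded percentages, not values reported by the study.

```python
# Nuclei counts per time point, as reported in the text.
counts = {"3 DAP": 24_024, "5 DAP": 16_039, "7 DAP": 14_147}
total = sum(counts.values())

# Share of the atlas contributed by each time point (percent, one decimal).
timepoint_pct = {tp: round(100 * n / total, 1) for tp, n in counts.items()}

# Reported tissue composition (percent of all profiles).
tissue_pct = {
    "embryo": 10.5,
    "endosperm": 23.4,
    "seed coat": 64.3,
    "unfertilized ovule and funiculus": 1.8,
}

# Approximate nuclei per tissue implied by the rounded percentages
# (an estimate only; the exact per-tissue counts are not given here).
tissue_nuclei = {t: round(total * p / 100) for t, p in tissue_pct.items()}

print(total)
print(timepoint_pct)
print(tissue_nuclei)
```

The three time-point counts sum to the stated 54,210 profiles, and the four tissue percentages sum to 100%.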
a, Seed developmental stages profiled in this study and L1 and L2 annotations. oi2, outer integument 2; oi1, outer integument 1; ii2, inner integument 2; ii1′, inner integument 1′; ii1, inner integument 1; EMB, embryo; CPT, chalazal proliferating tissue; CZSC, chalazal seed coat; FUN, funiculus; PEN, peripheral endosperm; MCE, micropylar endosperm; ESR, embryo surrounding region; CZE, chalazal endosperm; OVL, ovule. b–d, L1 (b), L2 (c) and L3 (d) annotations across all datasets. See Supplementary Table 3 for complete descriptions of all L3 annotations. e, Cell type proportions for L2 annotations in the full atlas dataset. f, Morphological stages of the embryo. g, Merged 3–7 DAP embryo datasets coloured according to annotations from f (left) and time point (right). h, Left: HCR validation of the protoderm (PDF1+), inner cotyledon (XYL4+) and vascular primordium (JUL1+) in the embryo. Scale bars, 50 µm. The PDF1 and JUL1 images are representative of signal observed in at least six seeds in two independent experiments, and the image for XYL4 represents signal observed in four seeds in one experiment. Right: expression of genes corresponding to HCR probes in the integrated embryo dataset, split by L3 annotation. y-axis labels are coloured according to tissue types in f. i, Detection of the suspensor, a rare nucleus type in the merged embryo dataset. Left: module score analysis for 29 suspensor marker genes curated in Kao et al.27. Right: WOX8 is a suspensor marker gene enriched in the suspensor cluster and detected in 31 nuclei in the entire dataset. The black arrows point to the putative suspensor population.
To assess atlas completeness, we determined whether we had captured rare embryonic cell types (Fig. 1f). The majority of embryo nuclei were isolated from seeds at 7 DAP (5,106), with 178 nuclei from 3 DAP and 399 nuclei from 5 DAP (Fig. 1g and Supplementary Table 3). The shoot apical meristem (SAM) and suspensor represent rare embryonic cell types, which respectively specifically express CUC1 and WOX8 (Fig. 1f)25,26. A subclustering analysis revealed five clusters in the 3 DAP embryo and six in the 5 DAP embryo (Extended Data Fig. 2 and Methods). Using the embryo subtype markers curated in Kao et al.27, we annotated embryo subclusters with known subtypes, if possible. We identified upper and lower protoderm populations at each time point and a 3-DAP-specific WOX8+ suspensor population, among other subclusters (Fig. 1i and Extended Data Fig. 2). 7 DAP embryo subtypes were resolved at the de novo clustering resolution, and clusters corresponding to the inner cotyledon, vascular primordium, ground tissue initials, cortical initials and hypophysis were identified (Fig. 1g)27,28,29,30,31,32. Correspondence between clusters and cell types was validated by HCR RNA-FISH (Fig. 1h, Supplementary Fig. 5 and Supplementary Tables 1 and 2). CUC1 was detected in 12 protodermal nuclei at 5 and 7 DAP, and these nuclei were enriched with SAM-specific gene expression (Extended Data Fig. 2). However, other embryo clusters showed similar levels of SAM-specific gene expression, and WUS, a known marker for a subpopulation of the SAM expressed early in development, was not detected in any embryo nuclei (Extended Data Fig. 2)28,33. This indicates that ~54,000 nuclei from whole seeds enable the detection of some rare cell populations in the seed, such as the suspensor, but characterized SAM cell types appear to be absent. Overall, the clusters we defined include representatives of most previously anatomically and morphologically defined seed cell types (Fig. 1, Extended Data Fig. 3 and Supplementary Table 3).
A high-resolution census of the developing seed coat
The seed coat is the predominant seed tissue type at early stages (Fig. 1b,e), comprising the five-layered testa, which in past studies has been referred to as the general seed coat (GSC), and the CZSC, or the cell types near maternal vascular terminals (Fig. 2a)17,34. Each layer of the GSC is one cell thick, which has made comprehensive transcriptional profiling of these cell types difficult, although many layer-specific genes have been identified35,36,37,38,39,40. Using published markers, we identified all five layers of the seed coat in our L2 annotations across all time points, except for 7 DAP, for which there is a single inner integument 1′ (ii1′)/inner integument 2 (ii2) cluster (Extended Data Fig. 4 and Supplementary Tables 1 and 3).
a, A model of the distribution of lower seed coat nucleus types identified in this study at 5 DAP. Xylem and phloem terminate in the CZSC. PS, pigment strand. b, Expression patterns of published markers (Supplementary Table 1) and HCR-validated markers (SWEET10 and AGO3) across all L3 annotations for the lower seed coat. y-axis labels are coloured according to tissue types in a. See Supplementary Table 3 to match the abbreviated L3 names to their full descriptions. Avg. norm. expr., average normalized expression of a given gene for nuclei in a cluster; Pct. expr., percent of nuclei in a cluster that express a given gene; pers nuc, persistent nucellus; basal cont PS, basal contains pigment strand; plac phos, placentochalazal phosphate transport; plac call, placentochalazal callose-synthesizing; micr, micropylar; synth, synthesizing. c, AGO3 labels a subpopulation of the CPT at 3 DAP by HCR, while RALFL3 labels the CZE. Scale bar, 50 µm. The AGO3 and RALFL3 image is representative of signal observed in at least three independent experiments for each probe. d, AGO3 is highly specific to the CPT (left), and 11 nucleoside metabolism markers follow the same expression pattern (right). e, Differential expression between the 3 DAP CPT persistent and transient nuclei, with adjusted P values calculated through a two-sided Wilcoxon rank sum test with Bonferroni correction. f, SWEET10 is localized to the placentochalazal region by HCR and appears to be surrounding maternal vascular (Vasc) terminals. Scale bar, 50 µm. The SWEET10 image is representative of the signal observed in at least three seeds in three independent experiments. g, Top: TET5 and SWEET10 are the best markers for the L3 subtypes of the 3 DAP placentochalazal CZSC. Bottom: a subset of GO terms exclusively associated with DE genes (log2FC > 1, adjusted P ≤ 0.05, according to a hypergeometric test with Benjamini–Hochberg adjustment) for each placentochalazal cluster.
Whereas the upper seed coat contains the five principal layers, the lower seed coat is a densely heterogeneous region where maternal vasculature, the CZSC and layers of the GSC meet (Fig. 2a). It serves roles in hormonal signalling and nutrient uptake in the developing seed23,41,42,43. To identify lower seed coat subtypes, we used the published marker ARR22, which is enriched in the CZSC but is also detected in adjacent lower seed coat tissues44. We identified 18 ARR22+ clusters across all time points (Fig. 2b), including subtypes of the GSC, CZSC and the chalazal proliferating tissue (CPT), a small population derived from the nucellus that lies beneath the CZE cyst45. After fertilization, the CPT is partially degraded, characterized by ‘persistent’ and ‘transient’ CPT subtypes, and then fully degraded between 5 and 7 DAP46,47. We used the markers SWEET4 and PLL24 (ref. 48) to identify the persistent and transient populations, respectively. AGO3 promoter activity has been reported in the chalazal integument of seeds, and we defined its expression in the putative persistent CPT by HCR RNA-FISH (Fig. 2b,c)49. A GO term analysis using the top persistent CPT differentially expressed (DE) genes (log2(fold change (FC)) > 1, adjusted P < 0.05) indicated that it is a hotspot for nucleoside catabolism in the seed, consistent with ongoing programmed cell death in this region (Extended Data Fig. 4)46. Eleven nucleoside catabolism genes showed high specificity for the persistent CPT (Fig. 2d and Extended Data Fig. 4). Differential expression analysis between the 3 DAP persistent and transient CPT clusters implicated genes associated with cell death and metal response (Fig. 2e). 
Notably, DELTAVPE, a contributor to cell death in the inner integument39, is DE in the transient CPT, while MSL10, an ion channel that positively regulates programmed cell death in a mechanically sensitive manner50, is DE in the persistent population, suggesting differing cell death triggers in the transient and persistent CPT.
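The DE thresholds used throughout this section (log2FC > 1, Bonferroni-adjusted P < 0.05) can be illustrated with a minimal sketch. The gene names, expression means and raw P values below are invented placeholders. The small pseudocount is an assumption made only to avoid division by zero and is not a parameter taken from the Methods.

```python
import math

def bonferroni(p_values):
    """Multiply each raw P value by the number of tests, capping at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]

def select_de_genes(stats, lfc_min=1.0, alpha=0.05, pseudocount=1e-9):
    """Return genes passing both thresholds.

    stats maps gene -> (mean in cluster, mean in all other nuclei, raw P).
    """
    genes = list(stats)
    adjusted = bonferroni([stats[g][2] for g in genes])
    selected = []
    for gene, p_adj in zip(genes, adjusted):
        mean_in, mean_out, _ = stats[gene]
        lfc = math.log2((mean_in + pseudocount) / (mean_out + pseudocount))
        if lfc > lfc_min and p_adj < alpha:
            selected.append((gene, round(lfc, 2), p_adj))
    return selected

# Toy input (hypothetical values, not data from the atlas).
toy = {
    "geneA": (8.0, 1.0, 1e-6),  # large effect, strongly significant
    "geneB": (3.0, 2.0, 1e-6),  # significant, but log2FC below 1
    "geneC": (8.0, 1.0, 0.03),  # large effect, fails after correction
}
result = select_de_genes(toy)
print(result)
```

Only the first toy gene passes both filters. The third shows how a nominally significant P value can be lost once it is multiplied by the number of tests.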
Cell types in the placentochalazal region of the CZSC have complementary functions
The CZSC is appreciated for being the primary site of active nutrient transfer into the seed from maternal vascular terminals, facilitated by SWEET, UMAMIT and PHO1 transporters41,42,51,52. The cells closest to the maternal vasculature comprise the placentochalazal region and specifically express the gene encoding the UMAMIT14 amino acid transporter51 (Fig. 2a,b). On the basis of UMAMIT14 and high ARR22 expression, we identified six putative CZSC clusters across all time points, which were grouped into three subtypes on the basis of the expression of SWEET10, TET5 and AAP6 (Fig. 2b,f and Supplementary Table 1). Of these, we found two putative placentochalazal subtypes in the 3 and 5 DAP CZSC, which were labelled specifically by SWEET10 and TET5 and shared UMAMIT14 expression (Fig. 2b,g). We identified DE genes in these subtypes compared with all other clusters at 3 DAP (average log2FC > 1, adjusted P < 0.05) (Fig. 2g and Extended Data Fig. 5). TET5+ cluster DE genes are associated with the GO term ‘(1 → 3)-β-ᴅ-glucan (callose) metabolic process’. Callose deposits are known to regulate plasmodesmata activity and serve an insulating role between cell types of the developing ovule53. A callose-rich ‘phloem end’ has recently been described in the CZSC54. Module score analysis (Methods) using all genes associated with the GO term ‘callose biosynthesis’ revealed high, significant enrichment in TET5+ clusters across all time points (Extended Data Fig. 5). Several putative callose biosynthesis genes are specifically expressed in this population, the strongest being CALS8 (Fig. 2b and Extended Data Fig. 5).
In contrast, top exclusive GO terms for the SWEET10+ cluster DE genes include ‘phosphate ion transport’ and ‘phosphatase activity’, supported by the differential expression of PHO1;H1 and of TPPE, TPPB and TPPG, respectively (Fig. 2g and Extended Data Fig. 5). PHO1;H1 is a phosphate exporter with demonstrated activity in the Arabidopsis CZSC42. Additionally, trehalose phosphatase expression suggests that these cells participate in trehalose-6-phosphate signalling, which directs sugar utilization55,56,57,58. The invertase cwINV4 sugar transporter gene is also specifically expressed in this cluster (Extended Data Fig. 5). These results indicate that the SWEET10+ cluster is probably the primary site of nutrient transfer in the placentochalazal CZSC and that the TET5+ and SWEET10+ populations have complementary functions influencing the permeability of maternal vascular terminals and surrounding cells.
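GO term over-representation among cluster DE genes is commonly assessed with a one-sided hypergeometric test followed by Benjamini–Hochberg adjustment, the combination named in the Fig. 2g legend. The following is a generic sketch of that calculation, assuming an upper-tail test. The function names and toy numbers are illustrative and do not reproduce the study's pipeline.

```python
from math import comb

def hypergeom_upper_tail(k, K, n, N):
    """P(X >= k) when n genes are drawn from N, of which K carry the term."""
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)

def benjamini_hochberg(p_values):
    """Return BH-adjusted P values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

# Toy example: 20 background genes, 5 annotated with a term,
# 4 DE genes of which 2 carry the annotation.
p_term = hypergeom_upper_tail(k=2, K=5, n=4, N=20)
adj = benjamini_hochberg([0.01, 0.04, 0.03, 0.20])
print(p_term, adj)
```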
BR biosynthesis, homeostasis and response genes show concentrated expression in the micropylar region of the seed
BRs are hormones that act widely in plant physiology and are known to direct organ formation and cell expansion in reproductive tissues59,60,61,62,63,64,65. In the seed they promote endosperm proliferation by reducing the physical resistance of the seed coat through cell wall weakening65. Although maternal seed-coat-derived BRs are hypothesized to interact directly with the endosperm, a recent study indicates that BR controls endosperm proliferation through cell autonomous effects in the seed coat65. To identify sites of BR biosynthesis in the developing seed, we performed a module score analysis for the GO term gene sets ‘BR biosynthesis’ and ‘BR homeostasis’. Across all time points, the ii2 seed coat layer showed the greatest enrichment for BR biosynthesis, followed by the outer integument 1 (oi1) cluster (Fig. 3a,b). Additionally, partitioning by time point and L3 annotation revealed that a putative micropylar oi subtype (Fig. 3a–c) drives the L2 oi1 enrichment for BR biosynthesis gene expression and exhibits the highest enrichment for genes associated with BR homeostasis atlas-wide. We annotated an oi micropylar cluster on the basis of the specific expression of SWEET12 (Fig. 2b), which has previously been localized to the micropylar end of the seed coat, and because the cluster shows high transcriptional similarity to oi clusters, with a slight bias towards oi1 (Supplementary Fig. 6)52. An inspection of the genes underlying the micropylar oi enrichment scores revealed that the enzymes that catalyse the last two steps of BR biosynthesis, BR6OX1 and BR6OX2, are upregulated in this subtype (Fig. 3d). We propose that the micropylar oi is a key site for BR production at early to intermediate stages of seed development.
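The idea behind a module score can be sketched as the average expression of a gene set in each nucleus minus the average expression of a set of control genes. The version below is deliberately simplified and is an assumption about the general approach, not the procedure described in Methods, which may select controls differently. The control gene names and all expression values are hypothetical.

```python
from statistics import mean

def module_scores(expr, gene_set, control_genes):
    """Score each nucleus as mean(gene set) - mean(control genes).

    expr maps nucleus id -> {gene: normalized expression}; genes missing
    from a profile are treated as zero.
    """
    scores = {}
    for nucleus, profile in expr.items():
        target = mean(profile.get(g, 0.0) for g in gene_set)
        control = mean(profile.get(g, 0.0) for g in control_genes)
        scores[nucleus] = target - control
    return scores

# Toy input: two nuclei, a two-gene module and two hypothetical controls.
expr = {
    "nuc1": {"BR6OX1": 2.0, "BR6OX2": 4.0, "ctrl1": 1.0, "ctrl2": 1.0},
    "nuc2": {"BR6OX1": 0.0, "BR6OX2": 1.0, "ctrl1": 1.0, "ctrl2": 2.0},
}
scores = module_scores(expr, ["BR6OX1", "BR6OX2"], ["ctrl1", "ctrl2"])
print(scores)
```

A positive score means the module is expressed above the control background in that nucleus. Cluster-level enrichment is then assessed by comparing score distributions between groups of nuclei.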
a, A subset of 5 DAP L2 annotations with the L3 oi1 micropylar region highlighted in yellow. b, Gene set enrichment for all genes associated with BR biosynthesis (GO:0016132) in the L2 seed coat across all time points. c, Module score analysis for BR biosynthesis and homeostasis (GO:0010268) in a subset of L3 seed coat clusters. See Supplementary Table 3 to match the abbreviated L3 names to their full descriptions. d, Gene expression patterns for BR6OX1 and BR6OX2. e, Module score analysis for all six BZR/BES1 transcription factors detected in the atlas (BES1, BEH1, BEH2, BEH3, BEH4 and BZR1). BES1, BEH3, BZR1 and BEH4 show the strongest expression and are enriched in 5 DAP MCE. Components of the BR-independent BES1 activation pathway (EMS1 and SERK1/2) show overlapping expression with BES1, although TPD1 shows low, non-specific expression throughout seed development. f, Module score analysis for 128 BES1 targets (identified in O’Malley et al.72 and DE in the atlas MCE (log2FC > 1, adjusted P ≤ 0.05)), showing an increase in BES1 target genes through development. In b, c and e, adjusted P values are shown to the right above clusters with significantly high average positive module scores in a cluster-versus-all-other-nuclei comparison (sample size 54,210, treating individual nuclei as biological replicates). In f, adjusted P values were generated from pairwise comparisons (sample sizes: 524 for 3 DAP MCE, 426 for 5 DAP MCE and 1,929 for 7 DAP, treating individual nuclei as biological replicates). All P values are derived from a two-sided Wilcoxon rank-sum test with Bonferroni correction. See Supplementary Table 7 for the module scores, P values, and pairwise group sizes for all comparisons. For all box plots, the centre line corresponds to the median, the lower and upper hinges correspond to the 25th and 75th percentiles, respectively, and the whiskers extend to the lowest and highest values that are within 1.5 times the interquartile range.
BZR1/BES1 transcription factors are activated by BR via the BRI1 receptor-like kinase, although BR-independent activation of BES1 by the SSP TPD1 through EMS1–SERK1/2 signalling has also been observed66. Once activated, they promote the expression of genes involved in cell elongation, light-regulated development and cell wall remodelling66,67,68,69,70,71. The activity of some BZR1/BES1 family members has been characterized in the endosperm: constitutively active BES1 or BZR1 causes reduced proliferation of endosperm nuclei, whereas quintuple mutants of BZR1/BES1 family members (bzr1 bes1 beh1 beh3 beh4) show no endosperm phenotype, suggesting redundant mechanisms for BZR1/BES1-family contributions to endosperm development65. To map BZR1/BES1 activity, we performed a module score analysis for the six BZR1/BES1 genes detected in the atlas, revealing high and specific expression in the 5 DAP MCE, driven by BES1, BEH3, BZR1 and BEH4 (Fig. 3e). The transmembrane BR receptor BRI1 shows broad expression throughout the seed coat but is depleted in the 5 DAP MCE (Fig. 3e). To determine whether BZR1/BES1 may be activated in a BR-independent manner in the MCE, we mapped the expression of EMS1, TPD1 and SERK1/2. Although EMS1 and SERK1/2 show overlapping spatiotemporal expression patterns with MCE BZR1/BES1 transcription factors, TPD1 is nearly undetectable atlas-wide (Fig. 3e).
To test whether BES1 targets show increased expression after BES1 upregulation in the MCE, we performed a module score analysis of all BES1 targets that are DE in the MCE (log2FC > 1, adjusted P < 0.05), which indicated significant upregulation from 3 to 7 DAP (Fig. 3f)72. The co-occurrence of BR biosynthesis and response genes in proximal micropylar cell types, the absence of TPD1 and the timing of BES1 target upregulation support the hypothesis that BRs are transported from the micropylar region of oi1 to the endosperm. However, it is plausible that EMS1 and SERK1/2 may be activated by an alternative TPD1-like SSP, such as TPD1-like 1 (TDL1), which is expressed in the early embryo and is the seed SSP that shows the highest sequence similarity to TPD1 of all Arabidopsis SSPs (Supplementary Fig. 7). In that case, BZR1/BES1 activation might also occur cell non-autonomously, but in a BR-independent manner.
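Pairwise comparisons of module scores between time points, as in Fig. 3f, rely on a two-sided Wilcoxon rank-sum test with Bonferroni correction. The sketch below implements the test with a tie-corrected normal approximation and no continuity correction. That approximation is an assumption suited to large group sizes and is not necessarily the implementation used in the study. The toy scores are invented.

```python
import math
from itertools import combinations

def average_ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def rank_sum_p(x, y):
    """Two-sided rank-sum P value via the normal approximation."""
    pooled = list(x) + list(y)
    n1, n2, n = len(x), len(y), len(x) + len(y)
    ranks = average_ranks(pooled)
    u = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    tie_sizes = {}
    for v in pooled:
        tie_sizes[v] = tie_sizes.get(v, 0) + 1
    tie_term = sum(t**3 - t for t in tie_sizes.values())
    variance = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if variance == 0:
        return 1.0
    z = (u - n1 * n2 / 2) / math.sqrt(variance)
    return math.erfc(abs(z) / math.sqrt(2))

def pairwise_bonferroni(groups):
    """Adjust all pairwise P values for the number of comparisons."""
    pairs = list(combinations(groups, 2))
    return {
        (a, b): min(1.0, rank_sum_p(groups[a], groups[b]) * len(pairs))
        for a, b in pairs
    }

# Toy module scores per time point (hypothetical values).
groups = {
    "3 DAP": [0.1, 0.2, 0.3, 0.4, 0.5],
    "5 DAP": [0.3, 0.4, 0.5, 0.6, 0.7],
    "7 DAP": [0.6, 0.7, 0.8, 0.9, 1.0],
}
adjusted = pairwise_bonferroni(groups)
print(adjusted)
```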
The micropylar-to-embryo surrounding region endosperm shift is characterized by UMAMIT and nitrate transporter gene expression
On the basis of a correlation analysis of L2 endosperm clusters throughout development, a dramatic transcriptional shift occurs in the endosperm between 5 and 7 DAP. We observed that the 7 DAP endosperm is the most transcriptionally divergent endosperm cluster (Extended Data Fig. 6). Within 7 DAP endosperm clusters, the GLIP6+ MCE (referred to as the embryo surrounding region at this stage) is the most distinct, driven by the upregulation of UMAMITs and nitrate transporters (Extended Data Fig. 6). The embryo surrounding region is known to shape embryo viability by controlling the formation of two barriers that support successful germination: the embryonic cuticle and sheath, which respectively prevent embryo dehydration and endosperm adherence19,73. These processes are coordinated with programmed cell death through the transcription factor ZOU/RGE1 (refs. 73,74,75,76). At 7 DAP, we observed two clusters within the ZOU+ population: one expressing KRS, a signalling peptide that directs embryo sheath formation, and another specifically expressing NAC074 and enriched for NAC087, which directs programmed cell death in the endosperm (Extended Data Fig. 6)73,76. ZOU induces KRS expression74 and probably controls programmed cell death in partially parallel pathways with NAC transcription factors76, and these genes appear to be distributed on a continuum of MCE nucleus states at 7 DAP. Other genes follow this pattern, such as the nitrate transporters, which are enriched in the KRS+ cluster (Extended Data Fig. 6).
The developmental basis for transcriptional polarity in the CZE
The CZE is composed of an unusual population of nuclei that does not cellularize77. The CZE is important for maternal resource allocation in the developing seed due to its position at the maternal–filial interface, and it exhibits the strongest parent-of-origin allelic expression bias for imprinted genes of all endosperm subtypes16,78. In particular, paternally expressed imprinted genes are upregulated in the CZE (Supplementary Fig. 8)16. A corresponding enrichment of epigenetic and transcriptional regulator expression in the CZE has been described16,79. This atlas contains 2,941 CZE nuclei, enabling higher-powered differential expression analyses. We inspected the expression patterns of 460 chromatin-associated genes that showed variable endosperm subtype specificity in our previous snRNA-seq study of 4 DAP endosperm and found that most of these genes are enriched in early endosperm subtypes. Furthermore, some of the epigenetic regulators previously described are depleted in the endosperm and enriched in seed coat layers (Supplementary Fig. 8). To identify additional epigenetic regulators that vary between seed cell and nuclei types, we inspected DE genes associated with chromatin (GO:0000785), transcriptional regulation of gene expression (GO:0010468) and epigenetic regulation of gene expression (GO:0040029) and found 736 additional endosperm-variable genes. This updated gene set generally follows the enrichment pattern exhibited by the gene list from our previous study (Supplementary Fig. 8 and Supplementary Table 5)16.
A subset of CZE nuclei form the multi-nucleate cyst, a coenocytic sac isolated by the central vacuole, and another subset form the nodules, which lie above the cyst (Fig. 4a)80. Developmental time course and live imaging studies of Arabidopsis endosperm indicate that the cyst grows in part through fusion with proximal nuclei from the nodule3,80,81. The cyst and nodule were transcriptionally characterized in our previous snRNA-seq study of 4 DAP endosperm, which also described a ‘nodule-like’ chalazal population that was not morphologically defined16. We first identified CZE nucleus populations using marker gene sets for the chalazal cyst, nodule and nodule-like populations defined from 4 DAP, which include AT2G44240 (cyst), CYCD4;2 (nodule) and MEA (nodule-like) (Fig. 4d)16. These markers were not detected at 7 DAP, so NPF4.5 was used to define the 7 DAP CZE (Fig. 4d, Supplementary Fig. 5 and Supplementary Table 1)17.
a, L2 annotations of chalazal-proximal tissues, including the CZE, CPT, PEN and ii1. The chalazal nodules and cyst are morphologically characterized compartments of the CZE. b, Left: the CZE highlighted in the 3 DAP dataset. Middle and right: AT3G49307, a CZE marker, and RALFL3 (arrow), a marker for a subtype of the CZE at 3 DAP. c, HCR validation of RALFL3 (top) and AT3G49307 (bottom) expression in 3 DAP seeds. Scale bars, 50 µm. The images are representative of signal observed in at least four independent experiments for each probe. d, Expression patterns of CZE subtype markers from Picard et al.16 (MEA, CYCD4;2 and AT2G44240) and this study in all L3 clusters for CZE, PEN and CZE-proximal tissues at all time points. See Supplementary Table 3 to match the abbreviated L3 names to their full descriptions. y-axis labels are coloured according to tissue types in f. e, Clustered heat map of the Spearman correlation coefficients for aggregated expression of the 5 DAP L3 clusters for CZE, PEN and CZE-proximal maternal subtypes. f, A model for CZE cyst growth through fusion with proximal nuclei based on live imaging studies and our transcriptional characterization. g, Left: time point annotations for the integrated 3–5 DAP CZE and PEN dataset used for pseudotime analysis. Pseudotime is a proxy for progression along a developmental trajectory from the 3 DAP PEN. Middle: L3 annotations for CZE subtypes and L2 annotation for the PEN in the 3–5 DAP integrated datasets. Right: integrated 3–5 DAP datasets coloured according to 3-DAP-PEN-anchored pseudotime, with the monocle3 principal graph indicating developmental branches from the 3 DAP PEN. The green arrow indicates the putative ‘basal’ trajectory from the early PEN, while the red arrow denotes the ‘apical’ branch. h, Pseudotime distributions for CZE and 3 DAP PEN nuclei types. The nodule-like and nodule populations represent transitional states between the PEN and CZE. 
For all box plots, the centre line corresponds to the median, the lower and upper hinges correspond to the 25th and 75th percentiles, respectively, and the whiskers extend to the lowest and highest values that are within 1.5 times the interquartile range.
Differential expression analysis of the 3 DAP L2 clusters revealed that the SSP gene RAPID ALKALINIZATION FACTOR-LIKE 3 (RALFL3) was a highly specific marker for a subpopulation of CZE nuclei (Fig. 4b and Supplementary Figs. 2 and 3). RALFL3 HCR RNA-FISH indicated transcript accumulation either throughout or at the base of the chalazal cyst (Figs. 2c and 4c and Extended Data Fig. 7). We compared the RALFL3 HCR signal with that of AT3G49307, an SSP gene more broadly expressed in the CZE, and found the highest AT3G49307 signal in the apical region of the cyst and nodules throughout seed development (Fig. 4c and Extended Data Fig. 7).
We propose that the chalazal cyst, rather than being a structure of uniform gene expression, has a transcriptional apical–basal axis, with AT3G49307 and RALFL3 labelling the apical and basal regions, respectively. At 3 DAP, the RALFL3+ basal state predominates in the cyst, but at 5 DAP the apical/basal distinction is pronounced, and two cyst states are detectable as subclusters (Fig. 4d). A correlation analysis of aggregated gene expression for cell types within and adjacent to the CZE showed high transcriptional correlation between the basal cyst and proximal maternal tissues, the ii1 and CPT (Fig. 4a,e).
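The correlation analysis of aggregated expression can be sketched as follows: counts are summed per gene across the nuclei of each cluster, and clusters are then compared by Spearman correlation, which is the Pearson correlation of ranks. This is a generic illustration with hypothetical gene and cluster names. The aggregation method used for Fig. 4e may differ from the simple sum assumed here.

```python
from math import sqrt

def aggregate(profiles, genes):
    """Sum counts per gene across all nuclei of one cluster."""
    return [sum(p.get(g, 0) for p in profiles) for g in genes]

def average_ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def pearson(a, b):
    """Pearson correlation; NaN if either vector is constant."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return float("nan")
    return cov / sqrt(var_a * var_b)

def spearman(a, b):
    """Spearman correlation as the Pearson correlation of ranks."""
    return pearson(average_ranks(a), average_ranks(b))

# Toy clusters (hypothetical counts for four placeholder genes).
genes = ["g1", "g2", "g3", "g4"]
cluster_a = [{"g1": 1, "g2": 2, "g3": 5, "g4": 9},
             {"g1": 0, "g2": 3, "g3": 4, "g4": 7}]
cluster_b = [{"g1": 2, "g2": 5, "g3": 20, "g4": 90}]
cluster_c = [{"g1": 9, "g2": 6, "g3": 3, "g4": 1}]

agg_a = aggregate(cluster_a, genes)
rho_ab = spearman(agg_a, aggregate(cluster_b, genes))
rho_ac = spearman(agg_a, aggregate(cluster_c, genes))
print(agg_a, rho_ab, rho_ac)
```

Because Spearman correlation depends only on rank order, clusters a and b correlate perfectly despite very different count magnitudes. This is one reason rank-based measures suit comparisons between clusters of unequal size.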
On the basis of high RALFL3 HCR signal in CZE cysts when about three nuclei are visible and the transcriptional similarity of RALFL3+ basal cyst clusters to both CZE subtypes and proximal maternal tissues, we hypothesized that the RALFL3+ basal cyst nuclei represent the ‘founder’ nuclei that migrate to the chalazal region at early time points, to which subsequent nuclei fuse to generate the mature chalazal cyst (Fig. 4f and Extended Data Fig. 8). To characterize the developmental landscape of the CZE, we performed trajectory inference and pseudotime analysis on the 3 and 5 DAP PEN and CZE, anchored in the 3 DAP PEN (Methods). Pseudotime values are a proxy for progression on a developmental trajectory from the 3 DAP PEN. This analysis revealed two branches, one connecting the 3 DAP PEN with the 3–5 DAP basal cyst clusters and another connecting the 3 DAP PEN with the rest of the 3–5 DAP CZE subtypes. This suggests that the basal cyst has an independent developmental trajectory from the rest of the CZE (Fig. 4g). Pseudotime analysis positioned the nodule-like population closest to the early PEN, and this population might represent the initial commitment to the CZE-bound state (Fig. 4h). However, the majority of the 3 and 5 DAP nodule and nodule-like populations are both positioned on the non-basal cyst branch, suggesting that the transcriptional underpinnings of the PEN-to-CZE transition are distinct for basal cyst nuclei compared with the rest of the CZE (Fig. 4g and Extended Data Fig. 8). Pseudotimes within nodule and nodule-like subtypes are similar across 3 and 5 DAP, suggesting that although nuclei migrate through these transition states on their way to the CZE cyst, the states themselves are stable (Fig. 4h).
To identify DE transcription factors that might promote the PEN-to-CZE transition, we performed graph autocorrelation analysis using the 3–5 DAP PEN and CZE dataset (Methods). Focusing on the non-basal cyst branch, we identified 56 transcription factors that significantly vary in pseudotime and are DE (log2FC > 1, adjusted P < 0.05) in the nodule-like and nodule populations, such as HDG8 and GRF2 (Extended Data Fig. 8 and Supplementary Table 6).
Discrete families of SSPs are enriched in endosperm subtypes
We observed that many endosperm subtype marker genes are SSPs, such as RALFL3, AT3G49307 and KRS (Fig. 4 and Extended Data Figs. 6 and 7). SSPs are abundant in plant genomes and play roles in reproduction, development and innate immunity82,83,84,85,86,87. They are characterized by an amino-terminal secretory signal sequence and are less than 250 amino acids long (Fig. 5a)88,89,90. Arabidopsis seeds devote more of their transcriptomes to genes that encode SSPs than any other tissue, but little is known about the functions of seed-specific SSPs (Extended Data Fig. 9)91. This is in part due to their absence from existing stage-resolved transcriptional atlases of seed development. For example, a previous atlas is based on ATH1 microarray data, which had probes for only ~50% of SSP genes with conserved SSP motifs (Extended Data Fig. 9)17,88.
a, Conventional structure of SSPs and the criteria used for SSP detection. CS, cleavage site. b, Number of SSPs detected in the atlas assigned to characterized families on the basis of a motif analysis (Methods), and the number of each peptide family represented among the assigned SSPs. The grey area in the ‘assigned’ bar consists of all peptides in families with less than 15 SSPs detected in the atlas. c, Clustered heat maps of scaled gene expression for variable SSPs detected in the atlas. The row annotations ‘ATH1’ and ‘Fert’ indicate presence in or absence from the ATH1 microarray, and whether the SSP was upregulated after fertilization (adjusted P ≤ 0.05, log2FC > 0.5, limma t-test with Benjamini–Hochberg correction) in the bulk expression data from Figueiredo et al.11. d, Upset plot indicating the proportion of unique SSPs DE (adjusted P < 0.05, log2FC > 1, two-sided Wilcoxon rank-sum test with Bonferroni correction) at any time in development for the L2 clusters. e, The results of module score analysis for four SSP families containing members with signalling (TPD, CLE and LCRs) as well as inhibitory roles (PMEI). Adjusted P values are centred above clusters with significantly high average positive module scores in a cluster-versus-all-other-nuclei comparison (Wilcoxon rank-sum test with Bonferroni correction, sample size 54,210, treating individual nuclei as biological replicates). See Supplementary Table 7 for the module scores, P values and pairwise group sizes for all comparisons. For all box plots, the centre line corresponds to the median, the upper and lower hinges correspond to the 25th and 75th percentiles, and the whiskers extend to the highest and lowest values that are within 1.5 times the interquartile range.
To characterize the cell- and nuclei-type specificity of seed SSPs and SSP families, we performed differential expression analysis using genes previously annotated with SSP motifs88. This analysis revealed that SSPs containing defensin-like (DEFL), low-molecular-weight cysteine-rich (LCR) and lipid transfer (LTP) motifs are the three predominant SSP families in the atlas (Fig. 5b, Supplementary Table 10 and Methods)88. Although most have not been functionally characterized, members of these families have been implicated in defence and signalling; some play roles in pollen–pistil interactions85. We found that the 3 DAP MCE, 3 DAP CZE and 5 DAP CZE are hubs of SSP expression, both by expression level and by the number of unique SSPs expressed (Fig. 5c,d). The SSPs enriched in the MCE and CZE have motifs found in families with characterized roles in cell–cell signalling (tapetum determinant (TPD), CLAVATA3/embryo surrounding region-related (CLE) and LCR), as well as inhibitory roles (pectin methylesterase inhibitor (PMEI))92,93,94,95. Furthermore, many exhibit upregulation after fertilization (Fig. 5c and Methods). SSP enrichment and diversity in the CZE and MCE are compelling from a signalling point of view because these regions are important interfaces: the CZE is a gateway for maternal resources into the seed, and the MCE is the most embryo-proximal seed tissue (Fig. 1a).
Rapidly evolving single-copy orthologues are compartmentalized in the endosperm
Previous studies have suggested that seed genes show higher rates of rapid evolution than other tissue-specific genes, with those specifically expressed at maternal–offspring interfaces showing the highest evolutionary rates96. One explanation for this is that genes involved in maternal resource allocation are expected to rapidly evolve due to intrafamilial conflict. A previous study of seed-tissue-specific gene evolution analysed signatures of selection for gene sets but did not resolve individual rapidly evolving genes. To identify individual rapidly evolving seed genes, their protein domains under selection and their expression patterns in seed cell and nuclei types, we used codon-substitution site models of positive, negative and neutral selection implemented in the codeml program in the PAML package to calculate the likelihood of positive selection for all single-copy orthologues (SCOs) shared by Arabidopsis thaliana, Arabidopsis lyrata, Arabidopsis arenosa and Capsella grandiflora (Methods)97. This analysis produces a likelihood ratio test statistic (LRT) for each SCO, which indicates the goodness of fit of its phylogeny to a positive (M2a) or nearly neutral (M1a) model of selection. We generated LRTs for 7,187 SCOs and found that 141 seed genes have statistically high M2a/M1a LRTs (M2a/M1a-sig), 103 of which are DE among seed clusters.
An inspection of the 103 DE M2a/M1a-sig SCOs revealed that endosperm subtypes differentially express the highest number of M2a/M1a-sig SCOs (Fig. 6a and Supplementary Table 8). However, a module score analysis of M2a/M1a-sig SCOs showed that the ii1′ and ii2 seed coat layers have the highest expression enrichment, indicating that these subtypes highly express a small number of M2a/M1a-sig SCOs (Fig. 6a and Extended Data Fig. 10). Indeed, 17 genes DE in the ii1′ and ii2 seed coat layers are M2a/M1a-sig SCOs, but the M2a/M1a-sig expression enrichment appears to be largely driven by DELTAVPE, which has a rapidly evolving site in a carboxy-terminal legumain prodomain (Pfam ID: PF20985) (Extended Data Fig. 10). Intersecting all 359 high-confidence rapidly evolving sites (Bayes empirical Bayes posterior probability of dN/dS > 1 greater than 0.95) within M2a/M1a-sig SCOs with predicted protein domain coordinates revealed a functionally heterogeneous protein domain list, including 50 and 25 unique PANTHER and Pfam domains, respectively (Supplementary Table 8). A plurality of rapidly evolving sites were detected in extracellular domains or signal peptides of secreted proteins (147/359) (Fig. 6a). Intrinsically disordered regions (IDRs) were the second most prevalent selected domain (63/359) (Extended Data Fig. 10). There were no statistically significant shared GO terms among all DE M2a/M1a-sig SCOs, but genes implicated in protein degradation, secretion and transcriptional regulation recurred in the M2a/M1a-sig SCOs list (Supplementary Table 8). For example, AFA1 and HON1 encode an F-box protein (a putative E3 ubiquitin ligase adaptor) and histone H1.1 (which contains a winged-helix DNA-binding domain), respectively, and both are DE in the endosperm and CPT (Fig. 6b,c). XYN4, the most endosperm-specific M2a/M1a-sig SCO, unusually does not have secreted regions or IDRs with selected sites (Fig. 6b,c).
Taken together, this analysis supports the finding that endosperm subtypes are enriched for rapidly evolving genes and identifies sites in secreted extracellular domains and IDRs that are the targets of positive selection in seeds.
a, Left: the number of DE genes (adjusted P ≤ 0.05, log2FC > 1, two-sided Wilcoxon rank-sum test with Bonferroni correction) with significant M2a/M1a LRTs for timed atlas L2 clusters. Right: the same genes as in the left plot, but labelled if at least one selected site falls in an extracellular domain (ECD) or signal peptide (SP), as predicted by Phobius. b, The residues likely to be under positive selection in three select genes DE in the endosperm (adjusted P ≤ 0.05, log2FC > 1, two-sided Wilcoxon rank-sum test with Bonferroni correction). Coding sequences were translated, and individual residues are coloured according to the Bayes empirical Bayes (BEB) posterior probability of having a dN/dS > 1 under the M2a model. Informative protein domains near or containing selected sites are highlighted. Pfam, PANTHER and InterPro identifiers from InterProScan are shown in parentheses. Consensus disorder regions were predicted by MobiDB-lite via InterProScan. Domain coordinates were taken from InterProScan for the longest isoform. c, Expression patterns for the genes shown in b in timed L2 clusters.
Discussion
We present a comprehensive atlas of early seed development that illuminates aspects of cell–cell signalling, functional compartmentalization and gene diversification in transcriptionally distinct cell or nuclei types. This adds to a growing compendium of single-cell or single-nucleus transcriptional atlases for various stages of development in Arabidopsis and other plant species98,99,100,101,102,103. We revealed additional insights into nutrient transport within the seed as well as sites of callose and BR biosynthesis. Our high coverage of endosperm nuclei allowed for the identification of a rare nucleus population in the CZE, which may clarify the origins of the CZE cyst. Furthermore, this atlas will enable the identification of promoters with high spatial and temporal specificity and will serve as a community resource for seed research.
Our study has also revealed additional complexity within the CZE. We propose that RALFL3+ CZE nuclei may be the ‘founder’ CZE population to which subsequent nuclei fuse to create the cyst. A time-lapse live-cell imaging study of Arabidopsis coenocytic endosperm development described two nuclei that migrate to the CZE, divide once and persist at the chalazal pole81, and we hypothesize that these are the RALFL3+ founders. We further posit that the fusion of nodule nuclei to the early CZE generates an apical–basal gradient of gene expression in the CZE at early time points. Notably, cytological features following this polarity have been observed in the CZE cyst: mitochondria and thylakoid stacks are enriched in the basal and apical regions, respectively104. The transcriptional similarity of the basal cyst to the CPT and ii1 suggests congruence or coordination with maternal tissue transcriptomes. Interestingly, “tentacle-like processes” embedded in maternal tissues have been described at the base of the Arabidopsis cyst104, and in other species the CZE takes on haustorial properties, which could facilitate such coordination78. How the RALFL3+ nuclei are established at the chalazal pole and the extent to which their DE SSPs contribute to this process remain to be studied.
Prior to this work, we had only partial understanding of the extent of SSP expression and diversity in seed cell types. Although the MCE is a well-appreciated source of signalling SSPs, most of the MCE/CZE-specific SSPs were not assayed in existing transcriptional seed atlases, which may have led to an underestimation of the contribution of SSPs to seed development. Some of the MCE/CZE SSPs that are also expressed in the ovule could function in fertilization, but many are more highly expressed after fertilization, suggesting alternative functions. Many of the seed SSPs are annotated as defensins. This class of cysteine-rich peptides is typically thought of as acting as anti-microbial peptides in seeds, particularly against fungi105. However, few seed-expressed defensins have been specifically evaluated for anti-microbial activity. It is possible that defensins and other seed SSPs act as ligands that are perceived by receptor-like kinases or receptor-like proteins to activate a variety of signalling pathways. The atlas will further allow the evaluation of the expression of potential receptors in various seed compartments. The enrichment of SSP expression at the embryo–endosperm and maternal–offspring interfaces is consistent with cell non-autonomous functions, although this remains to be demonstrated for individual peptides.
Our study supports the finding that the endosperm is enriched for rapidly evolving genes and highlights signatures of rapid evolution in seed coat layers. The majority of positively selected sites fall in secreted extracellular domains and IDRs. The structural flexibility of IDRs allows them to engage in diverse interactions, and as a result, they are often implicated in cell signalling and gene expression regulation. Our findings thus suggest that protein–protein interactions involved in signal transduction within and outside of the cell might be sites of rapid evolution in the seed. However, it is unclear to what extent the rapid evolution observed in these IDRs is due to their putative functions or lack of structural constraint106. A limitation of our approach is that we restricted our analysis to SCOs to prevent evolutionary analyses on false orthologues, thus omitting genes that are members of expanded families. We provide the list of M2a/M1a-sig SCOs and the coordinates of their rapidly evolving protein domains to guide future hypotheses about the functions of rapidly evolving seed genes. Functional studies are needed to discern whether these genes underlie the roles and diverse morphologies of Brassicaceae seeds.
We have thus provided a transcriptional atlas with a high-resolution annotation that will serve as the basis for future studies of early seed development. To this end, we have created an online resource for exploring the data at https://seedatlas.wi.mit.edu/.
Methods
Generation of single-nucleus transcriptomes
Plant material
All plants (A. thaliana (L.) Heynh., Col-0) used in HCR and snRNA-seq experiments were grown at 22 °C in a glasshouse on soil under long-day conditions (16 h:8 h, light:dark). For timed crosses, we emasculated flower buds and, 2 days later, pollinated stigmas bearing mature papillae.
Tissue preparation and nucleus extraction for snRNA-seq
Two biological replicates (different plants hand-pollinated and processed for snRNA-seq on different days), each containing seeds isolated from 10–15 siliques (500–800 seeds), were collected for each time point, producing six replicates total. Seeds were dissected into 150 μl cold extraction buffer on ice (1× Partec CyStain UV Precise P nuclei extraction buffer (Sysmex no. 05-5002), 4% BSA, 1 mM DTT, 1:100 protease inhibitor cocktail for plants (Sigma no. P9599) and 1:30 Protector RNase inhibitor (Fisher Scientific no. NC1877809)). The seeds were mechanically dissociated in 1.5-ml tubes by grinding with an Axygen pestle (Corning no. PES-15-B-SI) for 10 turns. The nucleus suspension was filtered through a 30-μm cell strainer (Fisher Scientific no. NC9682496), prewet with Partec CyStain UV Precise P staining buffer (Sysmex no. 05-5002), into a 5-ml tube for fluorescence activated nuclei sorting (FANS). The strainer was rinsed with Partec CyStain UV Precise P staining buffer into the 5-ml tube to bring the final suspension volume to 1 ml. Then, 1 μl of 1 mg ml−1 DAPI (ThermoFisher Scientific no. 62248) was added to increase the nuclear signal. Nuclei were purified and concentrated by FANS on a BD FACS ARIA Cell Sorter using a 70-μm nozzle chip. We gated on 2C, 3C, 4C, 6C, 8C and 16C peaks (Supplementary Fig. 9). Nuclei were sorted into 30–50 μl collection buffer (PBS-4% BSA) in a 1.5-ml tube, and concentration was assessed using the ARIA nucleus count and by manual counting on a Neubauer Improved haemocytometer. Each biological replicate was processed individually using one Chromium Next GEM Single Cell 3′ Kit (v.3.1) reaction with a target nucleus recovery of 10,000, except for one 3 DAP biological replicate, which was split across two reactions. The libraries were sequenced on a NovaSeqS2 with 50 × 50 paired-end reads. In total, seven libraries were generated in this study.
Computational analysis of single-nucleus transcriptomes
Raw data preprocessing, integration and clustering
Raw sequencing data were processed using Cell Ranger v.7.1.0 (10x Genomics). The A. thaliana TAIR10 genome sequence (Athaliana_447_TAIR10.fa) and the Araport11 annotation (Araport11_GFF3_genes_transposons.filtered.201606.gtf)107 were used as inputs to cellranger mkref v.6.0.02. The ‘cellranger count’ pipeline was used with the default parameters in STAR v.2.7.1a.
All scripts for the following analysis steps are available via the Gehring Lab GitHub at https://github.com/Gehring-Lab/seed_atlas_2025.
The ‘filtered_feature_bc_matrix’ for each library was individually preprocessed with the ‘per_library_QCs.R’ script, which implements Seurat v.5.0.0 for object generation and SoupX v.1.6.2 for background RNA correction108,109. Genes detected in fewer than 10 nuclei were removed, and profiles with fewer than 250 genes were filtered out. Following the recommendation from Heumos et al.110, we further identified and removed ‘outlier’ nuclei, defined as those whose number of genes or transcripts per nucleus differs by more than five median absolute deviations from the sample median110. We then performed an initial clustering analysis and removed clusters that had significantly fewer genes/nuclei than the mean. After low-quality profile removal, we ran scDblFinder v.1.12.0 on each library in random mode to generate a per-nucleus doublet score and removed putative doublets on the basis of the recommended threshold generated by scDblFinder, as outlined in the ‘score_doublets.R’ script111.
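The median-absolute-deviation filter described above can be illustrated with a minimal Python sketch. This is not the authors' ‘per_library_QCs.R’ script: the function name, the toy counts and the choice to work on raw (untransformed) counts are assumptions made only for illustration.

```python
from statistics import median

def mad_outlier_flags(values, n_mads=5.0):
    """Flag values lying more than n_mads median absolute deviations from the median."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # No spread in the data: nothing can be called an outlier by this rule.
        return [False] * len(values)
    return [abs(v - med) > n_mads * mad for v in values]

# Hypothetical genes-per-nucleus counts for one library (not real data).
genes_per_nucleus = [900, 1100, 1000, 950, 1050, 1200, 980, 9000]

flags = mad_outlier_flags(genes_per_nucleus)
kept = [g for g, is_outlier in zip(genes_per_nucleus, flags) if not is_outlier]
print(kept)
```

In practice the same rule would be applied separately to the gene count and the transcript count of each nucleus, and a nucleus flagged by either would be removed.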
The preprocessed libraries were then merged and further quality controlled, as outlined in the ‘merge_libraries_and_batch_effects_harmony.R’ script, which is executed in two rounds. In the first round, libraries are merged by time point, and an initial clustering analysis is performed to generate cluster-level quality controls. Additionally, the JackStraw and ScoreJackStraw functions in Seurat are used to detect statistically significant principal components for downstream analysis111. The clusters are then visually assessed for batch effects. In the second round, we performed a final dimensionality reduction involving log normalization followed by a principal component analysis using 3,000 highly variable genes determined by the FindVariableFeatures function of Seurat, the number of principal components determined by JackStraw analysis, k-nearest neighbour graph construction and UMAP projection. Time points that exhibited clustering by biological replicate were integrated with Harmony v.1.2.0, which was required for 5 DAP (Supplementary Fig. 9)112. At this step, cell cycle genes from Picard et al.16 and Menges et al.113 were used to categorize nuclei into G0, G1, G1/S, S, G2 and G2/M-phases, if possible, using a modified version of the CellCycleScoring function in Seurat (Extended Data Fig. 1 and Supplementary Table 11)16,113.
To identify an appropriate initial clustering resolution for each time point, we performed a parameter sweep of the Seurat FindClusters function, varying the resolution parameter from 1 to 2 in increments of 0.1 (see the ‘clustering.R’ script). For each resolution, an average cluster purity and silhouette score were calculated using the neighborPurity and approxSilhouette bluster v.1.8.0 functions, respectively114. For each time point, we selected a resolution that preceded the greatest drop in neighbourhood purity on a resolution versus average neighbourhood purity plot. This clustering, referred to as the de novo clustering, served as the basis for the L3 annotation, the highest annotation resolution (Fig. 1d, Supplementary Fig. 1 and Supplementary Table 3).
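The selection rule used in the sweep (take the resolution that precedes the largest drop in average neighbourhood purity) can be sketched as follows. The purity values are invented for illustration, and the function name is hypothetical; the actual purity and silhouette scores come from bluster in the ‘clustering.R’ script.

```python
def pick_resolution(resolutions, purities):
    """Return the resolution immediately before the largest drop in mean purity."""
    drops = [purities[i] - purities[i + 1] for i in range(len(purities) - 1)]
    i_max = max(range(len(drops)), key=drops.__getitem__)
    return resolutions[i_max]

# Resolutions 1.0, 1.1, ..., 2.0, as in the parameter sweep.
resolutions = [round(1.0 + 0.1 * i, 1) for i in range(11)]

# Hypothetical average neighbourhood purity at each resolution (illustrative only).
purities = [0.95, 0.94, 0.94, 0.93, 0.92, 0.86, 0.85, 0.85, 0.84, 0.83, 0.83]

chosen = pick_resolution(resolutions, purities)
print(chosen)
```

With these toy values the largest drop occurs between resolutions 1.4 and 1.5, so 1.4 is selected.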
Cluster annotation
Using both published markers (Supplementary Table 1) and markers identified in this study with HCR RNA-FISH validation (Supplementary Table 2), we classified all clusters, giving each L3 a full name that includes a number, a descriptor and the most informative marker gene(s) (Supplementary Table 3 and Extended Data Fig. 3; see the ‘manual_annotation.R’ script). RALFL3+ nuclei, localized to the CZE cyst by HCR RNA-FISH, were manually annotated on the basis of RALFL3 expression in the 3–5 DAP datasets. We performed an endosperm-only clustering analysis to isolate nodule and nodule-like clusters at 3 and 5 DAP using markers identified in Picard et al.16. When we used the Seurat FindClusters function, the nodule and nodule-like populations separated at resolutions 1.6 and 2.4 in 3 and 5 DAP, respectively. We appended the basal cyst, nodule and nodule-like annotations to the full 3–5 DAP datasets (Supplementary Figs. 2 and 3). We also performed a subclustering analysis on the 3 and 5 DAP embryos to identify characterized embryo subpopulations. We selected a clustering resolution for the Seurat FindClusters function that separated upper and lower protoderm populations in each embryo dataset, which was 1.2 for 3 DAP and 1.5 for 5 DAP (Extended Data Fig. 2). Suspensor nuclei were annotated in the 3 DAP embryo on the basis of enrichment for suspensor markers curated in Kao et al.27: nuclei in the 90th percentile of suspensor gene enrichment were classified as suspensor nuclei (see the ‘subclustering_embryo_reviews.R’ script)27. We also re-annotated putative G2/M oi nuclei that initially clustered with the 3 DAP embryo in a separate subclustering analysis (Supplementary Fig. 4). 
To do this, we re-scaled the 3 DAP expression data with G2/M- and S-phase enrichment scores regressed out (for example, using Seurat::ScaleData(dap3_seurat_object, features = VariableFeatures(dap3_seurat_object), vars.to.regress = c("S.Score", "G2M.Score"))) and performed clustering with the same optimal resolution identified in the non-regressed object. We found that the embryo subclustered into two populations in the cell-cycle-regressed object: one that expressed the embryo marker PDF1 and one that expressed the oi1 marker MYB11. Both of these clusters had high G2/M scores. We subsequently re-annotated the MYB11+ embryo subcluster as G2/M oi1 in the non-cell-cycle-regressed dataset (Supplementary Fig. 4). In all time-point datasets, oi1 clusters with negligible differences in gene expression were merged into one cluster. We annotated time points separately and integrated them into a final atlas dataset using reciprocal principal component analysis (see the ‘atlas_merging_rpca.R’ script)109.
GO term and gene module analysis with statistical testing
On the basis of GO term analysis for all DE genes for each atlas cluster across all annotation levels using clusterProfiler v.4.7.1.002 (see the ‘clusterprofiler_intermediate.R’ script), we identified gene sets (‘modules’) for follow-up enrichment analysis115. We retrieved gene lists for GO terms from UniProt (https://www.uniprot.org/) using GO term IDs filtered by the A. thaliana taxon ID (taxonomy_id:3702). Gene lists were used as input to the Seurat AddModuleScore function with the default settings (number of bins, nbin = 24; number of control genes per bin, ctrl = 100), which implements the gene set enrichment approach described by Tirosh et al.109,116. The resulting ‘module score’ is the difference in average expression between the gene set of interest and a randomly generated gene set with matched expression level variability. All gene lists are deposited with the scripts used for generating scores, which include ‘signalling_transport_gene_enrichment.R’, ‘peptide_enrichment.R’ and ‘protein_catabolism_enrichment.R’. The same approach was used for curated gene lists of SSPs, BZR/BES1 transcription factors and embryo subtype marker genes.
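The logic of a module score can be illustrated with a simplified Python sketch. It is not the Seurat implementation: genes are binned by rank of mean expression, a few control genes are drawn from the same bin as each module gene, and the score is the mean module expression minus the mean control expression in each nucleus. The gene names, expression values, bin count and number of controls are all hypothetical.

```python
import random

def module_scores(expr, gene_set, n_bins=2, n_ctrl=2, seed=0):
    """expr maps gene -> list of per-nucleus expression values.
    Returns one score per nucleus: mean(module genes) - mean(matched controls)."""
    rng = random.Random(seed)
    # Rank genes by average expression and split the ranking into equal-sized bins.
    genes = sorted(expr, key=lambda g: sum(expr[g]) / len(expr[g]))
    bin_of = {g: i * n_bins // len(genes) for i, g in enumerate(genes)}
    # For each module gene, draw control genes from the same expression bin.
    controls = set()
    for g in gene_set:
        pool = [c for c in genes if bin_of[c] == bin_of[g] and c not in gene_set]
        controls.update(rng.sample(pool, min(n_ctrl, len(pool))))
    n_nuclei = len(next(iter(expr.values())))

    def mean_over(gs, j):
        return sum(expr[g][j] for g in gs) / len(gs)

    return [mean_over(gene_set, j) - mean_over(controls, j) for j in range(n_nuclei)]

# Toy matrix: six flat background genes and two module genes high in nucleus 0.
expr = {
    "H1": [1.0, 1.0, 1.0], "H2": [1.5, 1.5, 1.5], "H3": [2.5, 2.5, 2.5],
    "H4": [3.0, 3.0, 3.0], "H5": [3.5, 3.5, 3.5], "H6": [4.0, 4.0, 4.0],
    "SSP1": [5.0, 1.0, 0.0], "SSP2": [6.0, 1.0, 2.6],
}
scores = module_scores(expr, ["SSP1", "SSP2"])
print(scores)
```

Because the control genes are matched on expression level, a nucleus scores highly only when the module genes are expressed above what would be expected for genes of similar abundance.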
To statistically test module scores by cluster, we used the Wilcoxon rank-sum test in focal cluster versus all other nuclei comparisons. After performing this test for all clusters for a given module score, we adjusted P values using Bonferroni correction. See Supplementary Table 7 for the results of all statistical analyses of module scores performed in this study.
Correlation analysis of pseudobulked clusters
To quantify similarity between cell types, we pseudobulked clusters with the Seurat function AggregateExpression, using the default arguments (normalization.method, ‘LogNormalize’; scale.factor, 10,000) and the top 3,000 variable genes for the snRNA-seq datasets of interest109. We used the pseudoexpression matrices as input to generate correlation matrices using Spearman correlation. To characterize gene expression correlation between biological replicates, we used the same approach but pseudobulked all genes detected and used Pearson correlation.
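As a minimal illustration of this approach, the Python sketch below sums counts per cluster and computes a Spearman correlation between the resulting pseudobulk profiles. The cluster names are borrowed from the text, but the counts are invented and the normalization step is omitted; because Spearman correlation depends only on ranks, a monotonic per-sample normalization would not change the coefficient in this toy case.

```python
def pseudobulk(nuclei_by_cluster):
    """Sum per-nucleus count vectors within each cluster (genes in a fixed order)."""
    return {c: [sum(col) for col in zip(*rows)] for c, rows in nuclei_by_cluster.items()}

def ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=values.__getitem__)
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            out[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return out

def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman correlation is the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

# Hypothetical counts: two nuclei per cluster, four genes (illustrative only).
nuclei = {
    "basal_cyst": [[0, 3, 5, 1], [1, 2, 6, 0]],
    "CPT": [[1, 4, 9, 0], [0, 5, 8, 2]],
    "PEN": [[7, 1, 0, 3], [6, 0, 1, 4]],
}
bulk = pseudobulk(nuclei)
rho_cpt = spearman(bulk["basal_cyst"], bulk["CPT"])
rho_pen = spearman(bulk["basal_cyst"], bulk["PEN"])
print(round(rho_cpt, 3), round(rho_pen, 3))
```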
Pseudotime analysis
To define the transcriptional landscape of CZE development, we performed pseudotime analysis on integrated 3–5 DAP PEN and CZE nuclei. We merged 3 and 5 DAP PEN and CZE data and then performed dimensionality reduction analysis on the subset, regressing out cell cycle genes during data scaling and centring. The datasets were then integrated across time points using Harmony. To identify only one trajectory, we assigned a single partition to all nuclei in each dataset and manually specified the 3 DAP PEN as the root for trajectory and pseudotime inference. Nuclei were ordered and assigned pseudotime values using the learn_graph and order_cells functions in monocle3 v.1.3.7 (ref. 117). To identify genes that vary in pseudotime, we implemented graph autocorrelation analysis using the graph_test function in monocle3 v.1.3.7 (ref. 117). See the ‘level_2_pseudotime_merged_timepoint.R’ and ‘harmony_chalazal_endosperm_trajectory.R’ scripts for the details117.
Identifying DE genes
Differential expression analysis was performed on each time point and the integrated atlas object using the Seurat FindAllMarkers function with the default arguments, generating cluster versus all other nuclei results for each gene and P values from a Wilcoxon rank-sum test with Bonferroni correction (see the ‘differential_expression.R’ script for the details). To determine whether a gene is upregulated after fertilization (Fig. 5c), we re-analysed the published expression data from 4-day-after-emasculation unfertilized ovules and 2 DAP seeds (‘GSE85751_RMA_matrix.txt’) from Figueiredo et al.11, using limma118 with Benjamini–Hochberg correction to calculate the significance of DE genes. Genes with a log2FC greater than 0.5 with an adjusted P value of less than or equal to 0.05 in the fertilized condition were classified as upregulated.
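The thresholds used to call cluster markers elsewhere in the paper (adjusted P < 0.05, log2FC > 1, Bonferroni correction) can be sketched in Python as follows. The raw P values, fold changes and the total of 20,000 tested genes are hypothetical, and the function names are not taken from the authors' scripts.

```python
def bonferroni(pvals, n_tests):
    """Bonferroni adjustment: multiply each P value by the number of tests, capped at 1."""
    return [min(1.0, p * n_tests) for p in pvals]

def call_markers(results, n_genes, min_log2fc=1.0, alpha=0.05):
    """results maps gene -> (raw P, log2FC) for one cluster-versus-rest comparison."""
    genes = list(results)
    adjusted = bonferroni([results[g][0] for g in genes], n_genes)
    return [g for g, p in zip(genes, adjusted) if p < alpha and results[g][1] > min_log2fc]

# Hypothetical test results for four genes (illustrative only).
toy = {
    "geneA": (1e-9, 2.3),   # small P, large fold change
    "geneB": (1e-9, 0.4),   # small P, fold change below threshold
    "geneC": (1e-5, 1.8),   # does not survive correction for 20,000 tests
    "geneD": (1e-3, 3.0),   # does not survive correction
}
markers = call_markers(toy, n_genes=20000)
print(markers)
```

Both criteria must be met, so a gene with a very small P value but a modest fold change is not reported as a marker.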
Marker validation using HCR RNA-FISH
To localize clusters and marker genes in situ, we used HCR RNA-FISH based on a modified version of the whole-mount HCR protocol outlined in Huang et al.119 and the general HCR RNA-FISH protocol developed by Molecular Instruments: https://www.molecularinstruments.com/hcr-gold-rnafish. All HCR probes and fluorescent hairpins were synthesized by Molecular Instruments (Supplementary Table 2). Hand-pollinated or unstaged siliques were collected and fixed with 4% paraformaldehyde in 1× PBS through vacuum infiltration. After overnight fixation at 4 °C, the samples were washed in 1× PBS twice before embedding in 3% agarose for vibratome sectioning. Whole siliques were sectioned longitudinally to 60–150 μm thickness and stored in PBS on ice. To generate negative controls in parallel with each probe set, we stored sections in two tubes (one tube for the experimental and one for the negative control) during slicing, alternating tube placement every other section. Sections were subject to a second 4% paraformaldehyde fixation for 30 min at room temperature before two washes in PBS, then transferred to 100% methanol and stored at −20 °C. To permeabilize tissue before probe hybridization, the samples were subjected to alternating ethanol and methanol incubations, following Huang et al., with a clearing step using 50% Histoclear (National Diagnostics no. HS-200)/50% ethanol halfway through the permeabilization washes119. The samples were rehydrated with an ethanol/PBS-Tween-20 series (25%, 50%, 75%, 100%). Then, the samples were incubated in preheated Probe Hybridization Buffer (Molecular Instruments) for 30 min. Sections in the experimental tube were then exchanged into preheated Probe Hybridization Buffer (Molecular Instruments) with probes, and sections in the negative control tube were exchanged into preheated Probe Hybridization Buffer without probes. 
For targets with <5 probe sets, we used 64 μl of 1 μM probe in 400 μl Probe Hybridization Buffer, and for all other probes we used 12–24 μl of 1 μM probe in 400 μl. Probes were hybridized to samples in 1.5-ml tubes overnight in a 37 °C water bath. Probe washes closely followed the Molecular Instruments HCR protocol: we performed four 15-min buffer exchanges with Probe Wash Buffer (Molecular Instruments) preheated to 37 °C in a water bath, followed by two quick washes with 5× SSC-Tween-20. Samples in both experimental and negative control tubes were incubated in Amplification Buffer (Molecular Instruments) for 30 min at room temperature before exchange to Amplification Buffer containing snap-cooled hairpins (8 μl of 3 μM stock per hairpin per 400 μl reaction volume). We used hairpins containing B2 and B3 adapter sequences with Alexa-488 or Alexa-647 dyes (Molecular Instruments). After an overnight incubation in the dark, excess hairpins were washed off with four 5× SSC-Tween-20 exchanges. The samples were stored for up to 2 weeks at 4 °C before imaging.
To prepare samples for confocal microscopy, we counterstained nuclei with 1 μg ml−1 DAPI and mounted silique sections on thin 2% agar pads suspended in water on glass slides. We imaged Z-stacks of samples on a Zeiss LSM 710 confocal microscope and generated maximum-intensity projection images using Fiji 2.16.0.
Although our negative controls indicate that the presented HCR results were not derived from spurious signals, it is possible that our HCR probes bind non-specifically to other RNAs or tissue components. Sense probes designed against the mRNA targets could be used as an additional negative control to address this possibility.
SSP detection and annotation
To detect all SSPs in the Arabidopsis genome, we filtered all Araport11 protein sequences for those less than 250 residues and used this as input to SignalP6 to identify those with N-terminal secretory signal sequences (see the ‘signalp6_command.sh’ script for the details). We transferred the SSP annotations described in Ghorbani et al.88 to the SSPs detected by SignalP6 (refs. 88,120). For all SSPs detected by SignalP6 expressed in the atlas that did not have an annotation from Ghorbani et al.88, we used the Araport11 annotation to categorize these sequences into SSP families (Supplementary Table 10).
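The length filter that precedes signal peptide prediction can be sketched in Python as below. The FASTA records and identifiers are invented, the helper names are hypothetical, and the signal peptide prediction itself (SignalP6) is not reproduced; only the pre-filter for proteins shorter than 250 residues is shown.

```python
def read_fasta(text):
    """Parse FASTA-formatted text into {identifier: sequence}."""
    records, name = {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]   # identifier is the first token of the header
            records[name] = []
        elif name is not None:
            records[name].append(line)
    # Join wrapped lines and drop a trailing stop-codon asterisk if present.
    return {k: "".join(v).rstrip("*") for k, v in records.items()}

def short_proteins(records, max_len=250):
    """Keep proteins strictly shorter than max_len residues."""
    return {k: s for k, s in records.items() if len(s) < max_len}

# Hypothetical protein records (not real Araport11 entries).
toy_fasta = (
    ">pepA hypothetical short peptide\n"
    "MKTLLLAFLLVSSAAHA\nCCKCRCGSCYCPR*\n"
    ">protB hypothetical long protein\n"
    + "M" + "A" * 299 + "\n"
)
candidates = short_proteins(read_fasta(toy_fasta))
print(sorted(candidates))
```

Sequences passing this filter would then be written back to FASTA and passed to the signal peptide predictor.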
Maximum likelihood estimation of positive selection at sites in A. thaliana SCOs
To implement site models of codon evolution using codeml/PAML v.4.9 (ref. 97), we closely followed the procedure outlined in the Bioinformatics Workbook121, based on the analyses in Petersen et al.122. We identified orthologues between the translated coding sequences of A. thaliana (Araport11, Phytozome genome ID: 447), Arabidopsis lyrata (v.2.1, Phytozome genome ID: 384), Arabidopsis arenosa (AARE701a) and Capsella grandiflora (v.1.1, Phytozome genome ID: 266) using OrthoFinder v.2.5.4 (ref. 123). Arabidopsis SCOs that were found in all four species were used for subsequent analysis. Protein sequences from gene trees generated by OrthoFinder were aligned with clustalo124. The protein alignments and coding sequences from each gene tree were used as input to the pal2nal.pl script125 to produce codon alignments, omitting gaps in all sequence alignments (pal2nal.pl -nogap). Codon alignments and pruned gene trees were used as input to codeml, with arguments for performing maximum likelihood estimation of site models of codon evolution (runmode, 0; seqtype, 1; CodonFreq, 2; model, 0; NSsites, 0 1 2 7 8; fix_kappa, 0; kappa, 2; fix_omega, 0; omega, 0.4; cleandata, 1). We calculated both M2a/M1a and M8/M7 LRTs (2(lnLalt − lnLnull)) but proceeded with M2a/M1a for higher stringency. See the ‘orthofinder_to_codeml.sh’ script for more details. LRTs were significance tested using the chi-squared test function pchisq in R 4.2.1. All P values were adjusted within M2a/M1a comparisons using Benjamini–Hochberg correction. The Bayes empirical Bayes results from M2a/M1a analyses for individual sites were extracted from codeml outputs and evaluated only for genes with significant M2a/M1a LRTs.
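The likelihood ratio test and multiple-testing correction described above can be illustrated with a short Python sketch. The log-likelihood values are invented, and the use of two degrees of freedom for the M2a versus M1a comparison is an assumption (it is the conventional choice for this pair of models but is not stated in the text). With two degrees of freedom the chi-squared upper-tail probability has the closed form exp(−x/2), so no statistics library is needed.

```python
import math

def lrt_p_value(lnl_alt, lnl_null):
    """Return (LRT, P) with LRT = 2(lnL_alt - lnL_null) and P from chi-squared, 2 d.f."""
    lrt = max(0.0, 2.0 * (lnl_alt - lnl_null))
    return lrt, math.exp(-lrt / 2.0)   # survival function of chi-squared with 2 d.f.

def benjamini_hochberg(pvals):
    """Return Benjamini-Hochberg adjusted P values in the original order."""
    m = len(pvals)
    order = sorted(range(m), key=pvals.__getitem__)
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):       # walk from the largest P value downwards
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical (lnL under M2a, lnL under M1a) pairs for three orthologues.
fits = {"gene1": (-2340.1, -2350.1), "gene2": (-1799.5, -1800.0), "gene3": (-896.0, -900.0)}

genes = list(fits)
stats = {g: lrt_p_value(*fits[g]) for g in genes}
adj = benjamini_hochberg([stats[g][1] for g in genes])
significant = [g for g, q in zip(genes, adj) if q < 0.05]
print(significant)
```

Site-level Bayes empirical Bayes probabilities would then be inspected only for the genes in the significant list, as described in the text.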
To identify protein domains that overlap with positively selected sites in DE M2a/M1a genes, we ran InterProScan (https://www.ebi.ac.uk/interpro/) on all translated DE M2a/M1a coding sequences. We downloaded the resulting GFF and matched selected sites to protein domains using a custom R 4.2.1 script; see ‘analyzing_codeml.R’ for the details.
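Matching selected sites to domain annotations amounts to an interval lookup, sketched here in Python. This is not the ‘analyzing_codeml.R’ script. The data structures, the domain names and the posterior probability cutoff of 0.95 are assumptions for illustration; domain coordinates are treated as 1-based and inclusive, as in GFF.

```python
from collections import namedtuple

# 1-based, inclusive residue coordinates, as in GFF
Domain = namedtuple("Domain", "name start end")


def sites_in_domains(sites, domains, min_posterior=0.95):
    """Map each selected site to the names of the domains that contain it.

    sites: iterable of (residue_position, BEB_posterior) pairs.
    Assumption: sites below min_posterior are dropped (cutoff is hypothetical).
    """
    hits = {}
    for pos, posterior in sites:
        if posterior < min_posterior:
            continue
        hits[pos] = [d.name for d in domains if d.start <= pos <= d.end]
    return hits


# Invented domains and sites for a single protein
toy_domains = [Domain("domain_1", 10, 50), Domain("domain_2", 40, 80)]
toy_sites = [(5, 0.99), (45, 0.97), (60, 0.50), (80, 0.96)]
hits = sites_in_domains(toy_sites, toy_domains)
```

A site that passes the cutoff but lies outside every annotated domain is kept with an empty list, so sites in unannotated regions remain visible.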
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
Data availability
All sequencing data are available via NCBI GEO with accession code GSE295007. A browser for the data is available at https://seedatlas.wi.mit.edu/.
Code availability
Scripts for the analyses are available via GitHub at https://github.com/Gehring-Lab/seed_atlas_2025.
References
Nowack, M. K., Ungru, A., Bjerkan, K. N., Grini, P. E. & Schnittger, A. Reproductive cross-talk: seed development in flowering plants. Biochem. Soc. Trans. 38, 604–612 (2010).
Berger, F. Endosperm: the crossroad of seed development. Curr. Opin. Plant Biol. 6, 42–50 (2003).
Boisnard-Lorig, C. et al. Dynamic analyses of the expression of the HISTONE::YFP fusion protein in Arabidopsis show that syncytial endosperm is divided in mitotic domains. Plant Cell 13, 495–510 (2001).
Becker, M. G., Hsu, S.-W., Harada, J. J. & Belmonte, M. F. Genomic dissection of the seed. Front. Plant Sci. 5, 464 (2014).
Khan, D., Millar, J. L., Girard, I. J. & Belmonte, M. F. Transcriptional circuitry underlying seed coat development in Arabidopsis. Plant Sci. 219–220, 51–60 (2014).
Ingram, G. C. Family life at close quarters: communication and constraint in angiosperm seed development. Protoplasma 247, 195–214 (2010).
Stadler, R., Lauterbach, C. & Sauer, N. Cell-to-cell movement of green fluorescent protein reveals post-phloem transport in the outer integument and identifies symplastic domains in Arabidopsis seeds and embryos. Plant Physiol. 139, 701–712 (2005).
Doll, N. M. & Ingram, G. C. Embryo–endosperm interactions. Annu. Rev. Plant Biol. 73, 293–321 (2022).
Robert, H. S. et al. Maternal auxin supply contributes to early embryo patterning in Arabidopsis. Nat. Plants 4, 548–553 (2018).
Figueiredo, D. D. & Köhler, C. Auxin: a molecular trigger of seed development. Genes Dev. 32, 479–490 (2018).
Figueiredo, D. D., Batista, R. A., Roszak, P. J., Hennig, L. & Köhler, C. Auxin production in the endosperm drives seed coat development in Arabidopsis. eLife 5, e20542 (2016).
Lafon-Placette, C. & Köhler, C. Embryo and endosperm, partners in seed development. Curr. Opin. Plant Biol. 17, 64–69 (2014).
Xu, W., Sato, H., Bente, H., Santos-González, J. & Köhler, C. Endosperm cellularization failure induces a dehydration-stress response leading to embryo arrest. Plant Cell 35, 874–888 (2023).
Song, J. et al. LEAFY COTYLEDON1 expression in the endosperm enables embryo maturation in Arabidopsis. Nat. Commun. 12, 3963 (2021).
Brown, R. C., Lemmon, B. E., Nguyen, H. & Olsen, O.-A. Development of endosperm in Arabidopsis thaliana. Sex. Plant Reprod. 12, 32–42 (1999).
Picard, C. L., Povilus, R. A., Williams, B. P. & Gehring, M. Transcriptional and imprinting complexity in Arabidopsis seeds at single-nucleus resolution. Nat. Plants 7, 730–738 (2021).
Belmonte, M. F. et al. Comprehensive developmental profiles of gene activity in regions and subregions of the Arabidopsis seed. Proc. Natl Acad. Sci. USA 110, E435–E444 (2013).
van Ekelenburg, Y. S. et al. Spatial and temporal regulation of parent-of-origin allelic expression in the endosperm. Plant Physiol. 191, 986–1001 (2023).
Doll, N. M. et al. A two-way molecular dialogue between embryo and endosperm is required for seed development. Science 367, 431–435 (2020).
Royek, S. et al. Processing of a plant peptide hormone precursor facilitated by posttranslational tyrosine sulfation. Proc. Natl Acad. Sci. USA 119, e2201195119 (2022).
Millar, J. L. et al. Chalazal seed coat development in Brassica napus. Plant Sci. 241, 45–54 (2015).
Pegler, J. L., Grof, C. P. & Patrick, J. W. Sugar loading of crop seeds—a partnership of phloem, plasmodesmal and membrane transport. New Phytol. 239, 1584–1602 (2023).
Liu, H. et al. Biosynthesis- and transport-mediated dynamic auxin distribution during seed development controls seed size in Arabidopsis. Plant J. 113, 1259–1277 (2023).
Chen, X. et al. Adaptation of seed dormancy to maternal climate occurs via intergenerational transport of abscisic acid. Proc. Natl Acad. Sci. USA 122, e2519319122 (2025).
Hibara, K., Takada, S. & Tasaka, M. CUC1 gene activates the expression of SAM-related genes to induce adventitious shoot formation. Plant J. 36, 687–696 (2003).
Ueda, M., Zhang, Z. & Laux, T. Transcriptional activation of Arabidopsis axis patterning genes WOX8/9 links zygote polarity to embryo development. Dev. Cell 20, 264–270 (2011).
Kao, P., Schon, M. A., Mosiolek, M., Enugutti, B. & Nodine, M. D. Gene expression variation in Arabidopsis embryos at single-nucleus resolution. Development 148, dev199589 (2021).
Capron, A., Chatfield, S., Provart, N. & Berleth, T. Embryogenesis: pattern formation from a single cell. Arabidopsis Book 7, e0126 (2009).
Wysocka-Diller, J. W., Helariutta, Y., Fukaki, H., Malamy, J. E. & Benfey, P. N. Molecular analysis of SCARECROW function reveals a radial patterning mechanism common to root and shoot. Development 127, 595–603 (2000).
Welch, D. et al. Arabidopsis JACKDAW and MAGPIE zinc finger proteins delimit asymmetric cell division and stabilize tissue boundaries by restricting SHORT-ROOT action. Genes Dev. 21, 2196–2204 (2007).
Willemsen, V. et al. The NAC domain transcription factors FEZ and SOMBRERO control the orientation of cell division plane in Arabidopsis root stem cells. Dev. Cell 15, 913–922 (2008).
Zhang, T., Ge, Y., Cai, G., Pan, X. & Xu, L. WOX–ARF modules initiate different types of roots. Cell Rep. 42, 112966 (2023).
Su, Y. H. et al. Auxin-induced WUS expression is essential for embryonic stem cell renewal during somatic embryogenesis in Arabidopsis. Plant J. 59, 448–460 (2009).
Beeckman, T., De Rycke, R., Viane, R. & Inzé, D. Histological study of seed coat development in Arabidopsis thaliana. J. Plant Res. 113, 139–148 (2000).
Francoz, E. et al. Complementarity of medium-throughput in situ RNA hybridization and tissue-specific transcriptomics: case study of Arabidopsis seed development kinetics. Sci. Rep. 6, 24644 (2016).
Mizzotti, C. et al. SEEDSTICK is a master regulator of development and metabolism in the Arabidopsis seed coat. PLoS Genet. 10, e1004856 (2014).
Windsor, J. B., Symonds, V. V., Mendenhall, J. & Lloyd, A. M. Arabidopsis seed coat development: morphological differentiation of the outer integument. Plant J. 22, 483–493 (2000).
Creff, A., Brocard, L. & Ingram, G. A mechanically sensitive cell layer regulates the physical properties of the Arabidopsis seed coat. Nat. Commun. 6, 6382 (2015).
Nakaune, S. et al. A vacuolar processing enzyme, δVPE, is involved in seed coat formation at the early stage of seed development. Plant Cell 17, 876–887 (2005).
Müller, B. Characterization of UmamiTs in Arabidopsis: Amino Acid Transporters Involved in Amino Acid Cycling, Phloem Unloading and the Supply of Symplasmically Isolated Sink Tissues. PhD thesis, Univ. Regensburg (2016); https://epub.uni-regensburg.de/34834/1/Doktorarbeit%20Benedikt%20M%C3%BCller.pdf
Sanden, N. C. H. et al. An UMAMIT–GTR transporter cascade controls glucosinolate seed loading in Arabidopsis. Nat. Plants 10, 172–179 (2024).
Vogiatzaki, E., Baroux, C., Jung, J.-Y. & Poirier, Y. PHO1 exports phosphate from the chalazal seed coat to the embryo in developing Arabidopsis seeds. Curr. Biol. 27, 2893–2900.e3 (2017).
Li, M. et al. SHORT-ROOT specifically functions in the chalazal region to modulate assimilate partitioning into seeds. Plant J. 120, 2031–2044 (2024).
Horák, J. et al. The Arabidopsis thaliana response regulator ARR22 is a putative AHP phospho-histidine phosphatase expressed in the chalaza of developing seeds. BMC Plant Biol. 8, 77 (2008).
Lu, J. & Magnani, E. Seed tissue and nutrient partitioning, a case for the nucellus. Plant Reprod. 31, 309–317 (2018).
Xu, W. et al. Endosperm and nucellus develop antagonistically in Arabidopsis seeds. Plant Cell 28, 1343–1360 (2016).
Lu, J. et al. The nucellus: between cell elimination and sugar transport. Plant Physiol. 185, 478–490 (2021).
Xu, W. et al. A change in the cell wall status initiates the elimination of the nucellus in Arabidopsis. Preprint at bioRxiv https://doi.org/10.1101/2024.04.23.590775 (2024).
Jullien, P. E. et al. Functional characterization of Arabidopsis ARGONAUTE 3 in reproductive tissues. Plant J. 103, 1796–1809 (2020).
Veley, K. M. et al. Arabidopsis MSL10 has a regulated cell death signaling activity that is separable from its mechanosensitive ion channel activity. Plant Cell 26, 3115–3131 (2014).
Müller, B. et al. Amino acid export in developing Arabidopsis seeds depends on UmamiT facilitators. Curr. Biol. 25, 3126–3131 (2015).
Chen, L.-Q. et al. A cascade of sequentially expressed sucrose transporters in the seed coat and endosperm provides nutrition for the Arabidopsis embryo. Plant Cell 27, 607–619 (2015).
Pinto, S. C. et al. Germline β-1,3-glucan deposits are required for female gametogenesis in Arabidopsis thaliana. Nat. Commun. 15, 5875 (2024).
Liu, X. et al. Fertilization-dependent phloem end gate regulates seed size. Curr. Biol. 35, 2049–2063.e3 (2025).
Ponnu, J., Wahl, V. & Schmid, M. Trehalose-6-phosphate: connecting plant metabolism and development. Front. Plant Sci. 2, 70 (2011).
Lunn, J. E. et al. Sugar-induced increases in trehalose 6-phosphate are correlated with redox activation of ADPglucose pyrophosphorylase and higher rates of starch synthesis in Arabidopsis thaliana. Biochem. J. 397, 139–148 (2006).
Eastmond, P. J. SUGAR-DEPENDENT1 encodes a patatin domain triacylglycerol lipase that initiates storage oil breakdown in germinating Arabidopsis seeds. Plant Cell 18, 665–675 (2006).
Gómez, L. D., Gilday, A., Feil, R., Lunn, J. E. & Graham, I. A. AtTPS1-mediated trehalose 6-phosphate synthesis is essential for embryogenic and vegetative growth and responsiveness to ABA in germinating seeds and stomatal guard cells. Plant J. 64, 1–13 (2010).
Jiang, W.-B. et al. Brassinosteroid regulates seed size and shape in Arabidopsis. Plant Physiol. 162, 1965–1977 (2013).
Cai, H. et al. Brassinosteroid signaling regulates female germline specification in Arabidopsis. Curr. Biol. 32, 1102–1114.e5 (2022).
Liao, K. et al. Brassinosteroids antagonize jasmonate-activated plant defense responses through BRI1-EMS-SUPPRESSOR1 (BES1). Plant Physiol. 182, 1066–1082 (2020).
Lima, R. B. & Figueiredo, D. D. Sex on steroids: how brassinosteroids shape reproductive development in flowering plants. Plant Cell Physiol. 65, 1581–1600 (2024).
Jia, D. et al. Brassinosteroids regulate outer ovule integument growth in part via the control of INNER NO OUTER by BRASSINOZOLE-RESISTANT family transcription factors. J. Integr. Plant Biol. 62, 1093–1111 (2020).
Vogler, F., Schmalzl, C., Englhart, M., Bircheneder, M. & Sprunck, S. Brassinosteroids promote Arabidopsis pollen germination and growth. Plant Reprod. 27, 153–167 (2014).
Lima, R. B. et al. Seed coat-derived brassinosteroid signaling regulates endosperm development. Nat. Commun. 15, 9352 (2024).
Chen, W. et al. BES1 is activated by EMS1–TPD1–SERK1/2-mediated signaling to control tapetum development in Arabidopsis thaliana. Nat. Commun. 10, 4164 (2019).
Yin, Y. et al. BES1 accumulates in the nucleus in response to brassinosteroids to regulate gene expression and promote stem elongation. Cell 109, 181–191 (2002).
Zhu, T. et al. The BAS chromatin remodeler determines brassinosteroid-induced transcriptional activation and plant growth in Arabidopsis. Dev. Cell 59, 924–939.e6 (2024).
Wang, W. et al. Photoexcited CRYPTOCHROME1 interacts with dephosphorylated BES1 to regulate brassinosteroid signaling and photomorphogenesis in Arabidopsis. Plant Cell 30, 1989–2005 (2018).
Oh, E., Zhu, J.-Y., Ryu, H., Hwang, I. & Wang, Z.-Y. TOPLESS mediates brassinosteroid-induced transcriptional repression through interaction with BZR1. Nat. Commun. 5, 4140 (2014).
Wang, T. et al. Brassinosteroid transcription factor BES1 modulates nitrate deficiency by promoting NRT2.1 and NRT2.2 transcription in Arabidopsis. Plant J. 114, 1443–1457 (2023).
O’Malley, R. C. et al. Cistrome and epicistrome features shape the regulatory DNA landscape. Cell 165, 1280–1292 (2016).
Doll, N. M. et al. The endosperm-derived embryo sheath is an anti-adhesive structure that facilitates cotyledon emergence during germination in Arabidopsis. Curr. Biol. 30, 909–915.e4 (2020).
Moussu, S. et al. ZHOUPI and KERBEROS mediate embryo/endosperm separation by promoting the formation of an extracuticular sheath at the embryo surface. Plant Cell 29, 1642–1656 (2017).
Yang, S. et al. The endosperm-specific ZHOUPI gene of Arabidopsis thaliana regulates endosperm breakdown and embryonic epidermal development. Development 135, 3501–3509 (2008).
Doll, N. M. et al. Endosperm cell death promoted by NAC transcription factors facilitates embryo invasion in Arabidopsis. Curr. Biol. 33, 3785–3795.e6 (2023).
Olsen, O.-A. Nuclear endosperm development in cereals and Arabidopsis thaliana. Plant Cell 16, S214–S227 (2004).
Povilus, R. A. & Gehring, M. Maternal–filial transfer structures in endosperm: a nexus of nutritional dynamics and seed development. Curr. Opin. Plant Biol. 65, 102121 (2022).
Le, B. H. et al. Global analysis of gene activity during Arabidopsis seed development and identification of seed-specific transcription factors. Proc. Natl Acad. Sci. USA 107, 8063–8070 (2010).
Baroux, C., Fransz, P. & Grossniklaus, U. Nuclear fusions contribute to polyploidization of the gigantic nuclei in the chalazal endosperm of Arabidopsis. Planta 220, 38–46 (2004).
Ali, M. F. et al. Cellular dynamics of coenocytic endosperm development in Arabidopsis thaliana. Nat. Plants 9, 330–342 (2023).
Murphy, E., Smith, S. & De Smet, I. Small signaling peptides in Arabidopsis development: how cells communicate over a short distance. Plant Cell 24, 3198–3217 (2012).
Matsubayashi, Y. Posttranslationally modified small-peptide signals in plants. Annu. Rev. Plant Biol. 65, 385–413 (2014).
Higashiyama, T. Peptide signaling in pollen–pistil interactions. Plant Cell Physiol. 51, 177–189 (2010).
Marshall, E., Costa, L. M. & Gutierrez-Marcos, J. Cysteine-rich peptides (CRPs) mediate diverse aspects of cell–cell communication in plant reproduction and development. J. Exp. Bot. 62, 1677–1686 (2011).
Olsson, V. et al. Look closely, the beautiful may be small: precursor-derived peptides in plants. Annu. Rev. Plant Biol. 70, 153–186 (2019).
Takahashi, F., Hanada, K., Kondo, T. & Shinozaki, K. Hormone-like peptides and small coding genes in plant stress signaling and development. Curr. Opin. Plant Biol. 51, 88–95 (2019).
Ghorbani, S. et al. Expanding the repertoire of secretory peptides controlling root development with comparative genome analysis and functional assays. J. Exp. Bot. 66, 5257–5269 (2015).
Hellmann, E. MtSSPdb: a new database for the small secreted peptide research community. Plant Physiol. 183, 31–32 (2020).
Hu, X.-L. et al. Advances and perspectives in discovery and functional analysis of small secreted proteins in plants. Hortic. Res. 8, 130 (2021).
Mergner, J. et al. Mass-spectrometry-based draft of the Arabidopsis proteome. Nature 579, 409–414 (2020).
Takeuchi, H. & Higashiyama, T. A species-specific cluster of defensin-like genes encodes diffusible pollen tube attractants in Arabidopsis. PLoS Biol. 10, e1001449 (2012).
Coculo, D. & Lionetti, V. The plant invertase/pectin methylesterase inhibitor superfamily. Front. Plant Sci. 13, 863892 (2022).
Huang, J. et al. Control of anther cell differentiation by the small protein ligand TPD1 and its receptor EMS1 in Arabidopsis. PLoS Genet. 12, e1006147 (2016).
Song, X.-F., Hou, X.-L. & Liu, C.-M. CLE peptides: critical regulators for stem cell maintenance in plants. Planta 255, 5 (2021).
Geist, K. S., Strassmann, J. E. & Queller, D. C. Family quarrels in seeds and rapid adaptive evolution in Arabidopsis. Proc. Natl Acad. Sci. USA 116, 9463–9468 (2019).
Yang, Z. PAML 4: phylogenetic analysis by maximum likelihood. Mol. Biol. Evol. 24, 1586–1591 (2007).
Shahan, R. et al. A single-cell Arabidopsis root atlas reveals developmental trajectories in wild-type and cell identity mutants. Dev. Cell 57, 543–560.e9 (2022).
Ke, Y. et al. A single-cell and spatial wheat root atlas with cross-species annotations delineates conserved tissue-specific marker genes and regulators. Cell Rep. 44, 115240 (2025).
Guillotin, B. et al. A pan-grass transcriptome reveals patterns of cellular divergence in crops. Nature 617, 785–791 (2023).
Lee, T. A. et al. A single-cell, spatial transcriptomic atlas of the Arabidopsis life cycle. Nat. Plants 11, 1960–1975 (2025).
Guo, X. et al. An Arabidopsis single-nucleus atlas decodes leaf senescence and nutrient allocation. Cell 188, 2856–2871.e16 (2025).
Zhang, X. et al. A spatially resolved multi-omic single-cell atlas of soybean development. Cell 188, 550–567.e19 (2025).
Nguyen, H., Brown, R. C. & Lemmon, B. E. The specialized chalazal endosperm in Arabidopsis thaliana and Lepidium virginicum (Brassicaceae). Protoplasma 212, 99–110 (2000).
Vriens, K., Cammue, B. P. A. & Thevissen, K. Antifungal plant defensins: mechanisms of action and production. Molecules 19, 12280–12303 (2014).
Singleton, M. D. & Eisen, M. B. Evolutionary analyses of intrinsically disordered regions reveal widespread signals of conservation. PLoS Comput. Biol. 20, e1012028 (2024).
Cheng, C.-Y. et al. Araport11: a complete reannotation of the Arabidopsis thaliana reference genome. Plant J. 89, 789–804 (2017).
Young, M. D. & Behjati, S. SoupX removes ambient RNA contamination from droplet-based single-cell RNA sequencing data. GigaScience 9, giaa151 (2020).
Hao, Y. et al. Dictionary learning for integrative, multimodal and scalable single-cell analysis. Nat. Biotechnol. 42, 293–304 (2024).
Heumos, L. et al. Best practices for single-cell analysis across modalities. Nat. Rev. Genet. 24, 550–572 (2023).
Germain, P.-L., Lun, A., Garcia Meixide, C., Macnair, W. & Robinson, M. D. Doublet identification in single-cell sequencing data using scDblFinder. F1000Res. 10, 979 (2022).
Korsunsky, I. et al. Fast, sensitive and accurate integration of single-cell data with harmony. Nat. Methods 16, 1289–1296 (2019).
Menges, M., de Jager, S. M., Gruissem, W. & Murray, J. A. H. Global analysis of the core cell cycle regulators of Arabidopsis identifies novel genes, reveals multiple and highly specific profiles of expression and provides a coherent model for plant cell cycle control. Plant J. 41, 546–566 (2005).
bluster. Bioconductor http://bioconductor.org/packages/bluster/ (accessed 29 April 2025).
Xu, S. et al. Using clusterProfiler to characterise multi-omics data. Nat. Protoc. 19, 3292–3320 (2024).
Tirosh, I. et al. Dissecting the multicellular ecosystem of metastatic melanoma by single-cell RNA-seq. Science 352, 189–196 (2016).
Cao, J. et al. The single-cell transcriptional landscape of mammalian organogenesis. Nature 566, 496–502 (2019).
Ritchie, M. E. et al. limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Res. 43, e47 (2015).
Huang, T., Guillotin, B., Rahni, R., Birnbaum, K. D. & Wagner, D. A rapid and sensitive, multiplex, whole mount RNA fluorescence in situ hybridization and immunohistochemistry protocol. Plant Methods 19, 131 (2023).
Teufel, F. et al. SignalP 6.0 predicts all five types of signal peptides using protein language models. Nat. Biotechnol. 40, 1023–1025 (2022).
Positive, neutral, negative selection with Codeml using multiple genome annotations. Bioinformatics Workbook https://bioinformaticsworkbook.org/dataAnalysis/ComparativeGenomics/Finding_Positive_Selection_With_Codeml.html (accessed 29 April 2025).
Petersen, L., Bollback, J. P., Dimmic, M., Hubisz, M. & Nielsen, R. Genes under positive selection in Escherichia coli. Genome Res. 17, 1336–1343 (2007).
Emms, D. M. & Kelly, S. OrthoFinder: phylogenetic orthology inference for comparative genomics. Genome Biol. 20, 238 (2019).
Sievers, F. et al. Fast, scalable generation of high-quality protein multiple sequence alignments using Clustal Omega. Mol. Syst. Biol. 7, 539 (2011).
Shakya, M. et al. Standardized phylogenetic and molecular evolutionary analysis applied to species across the microbial tree of life. Sci. Rep. 10, 1723 (2020).
Acknowledgements
We thank S. Khouider and E. Hemenway for assistance with timed seed isolation for snRNA-seq. We thank T. Whitfield, S. Gupta and A. Dionisio for guidance on module score statistical analyses, snRNA-seq analysis and seed atlas app development, respectively. Additionally, we thank J. Love, S. Mraz III and A. Chilaka at the WIBR Genome Technology Core for all snRNA-seq library preparation and sequencing. We also thank P. Autissier and A. Rathee for performing FANS procedures at the WIBR flow core facility. This research was supported by the Manton Foundation, the Dr. Vincent J. Ryan Orphan Plant Project and a National Science Foundation Graduate Research Fellowship to C.A.M. M.G. is an Investigator at the Howard Hughes Medical Institute.
Author information
Contributions
C.A.M. and M.G. conceived this study. C.A.M. performed the snRNA-seq and HCR experiments and imaging with help from A.L.P. and K.R.C. C.A.M. performed all data analyses, and C.A.M. and M.G. interpreted the results. C.A.M. and M.G. prepared the paper.
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Plants thanks Tomokazu Kawashima and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data
Extended Data Fig. 1 General cluster characteristics and similarity.
a, Genes per nucleus distributions for level 2 (L2) annotations across all timepoints. Red dotted line drawn at the median. b, Cell cycle stage proportionality for L2 clusters across all timepoints based on cell cycle phase marker genes defined in Picard et al.16 and Menges et al.113. c, Clustered heatmaps of the Spearman correlations for the aggregated expression of 3000 highly variable genes across all L2 annotations and timepoints. d, Pearson correlation of aggregated gene expression between two biological replicates for each timepoint (3 DAP: 21755 genes, 5 DAP: 21227 genes, 7 DAP: 19705 genes). Axes are pseudo log-transformed.
Extended Data Fig. 2 Embryo subclustering analysis.
a-b, The most informative markers for clusters identified in the 3 DAP and 5 DAP embryo, which include markers with characterized embryonic expression patterns (PDF1, KAN1, AGP18, SHR, PIN4, XYL4, JKD, and PIN7), and those that are specific to embryo subclusters in this dataset (PME5, KRATOS, IMK2, and TOPII). c, Proportions of embryo subcluster categories at each timepoint. d, Module score analysis for 40 strong shoot apical meristem (SAM) markers curated in Kao et al.27 across all embryo L3 clusters and CUC1+ nuclei. p-values are centered above clusters with significantly high positive module scores in a cluster-vs-all other nuclei comparison. p-values are derived from a two-sided Wilcoxon Rank-Sum test with Bonferroni correction. See Supplementary Table 7 for the module scores and p-values for all clusters.
Extended Data Fig. 3 Level 3 annotation key.
Full names for level 3 (L3) annotations for all timepoints, with defining marker genes in parentheses. EMB, embryo; CZE, chalazal endosperm; MCE, micropylar endosperm; PEN, peripheral endosperm; FUN, funiculus; OVL, ovule 5 days after emasculation; CPT, chalazal proliferating tissue; CZSC, chalazal seed coat; ii1, inner integument 1; ii1’, inner integument 1’; ii2, inner integument 2; oi1, outer integument 1; oi2, outer integument 2; ESR, embryo-surrounding region; LTP, lipid transfer protein; PCD, programmed cell death.
Extended Data Fig. 4 Informative markers for seed coat layers and nucleoside metabolism enrichment in the chalazal proliferating tissue.
a, The most informative markers for L2 seed coat layers. All markers except MYB11 have previously characterized layer-specific expression patterns. MYB11, a regulator of flavonol biosynthesis, is the top marker for oi1 in this dataset. b, The most significant GO terms for the 3 DAP CPT, based on differentially expressed genes in a cluster vs. all 3 DAP clusters comparison (log2FC > 1, adj. p-value < 0.05). c, Module score results for all nucleoside metabolism genes in 3 DAP clusters. p-values are centered above clusters with significantly high average positive module scores in a cluster-vs-all other nuclei comparison. p-values are derived from a two-sided Wilcoxon Rank-Sum test with Bonferroni correction. d, The expression patterns of the top genes that underlie nucleoside metabolism enrichment in the 3 DAP persistent CPT.
Extended Data Fig. 5 Cell types with complementary functions in the 3 DAP CZSC.
a, GO term overlap for genes DE (log2FC > 1, adj. p-value < 0.05) in two putative placentochalazal clusters. Exclusive GO terms are those associated with only one of the clusters. b, A subset of shared GO terms for the two putative placentochalazal clusters at 3 DAP. c, Module score analysis for all genes associated with the GO term ‘callose biosynthesis’ shown across all L3 seed coat clusters in the atlas (Methods). p-values are centered above clusters with significantly high average positive module scores in a cluster-vs-all other nuclei comparison. p-values are derived from a two-sided Wilcoxon Rank-Sum test with Bonferroni correction; see Supplementary Table 7 for the module scores and p-values for all clusters. d, Expression patterns of genes that underlie exclusive GO terms for the two putative CZSC clusters at 3 DAP. See Supplementary Table 3 to match abbreviated L3 names to their full descriptions.
Extended Data Fig. 6 The 7 DAP MCE is the most distinctive endosperm subtype.
a, L2 endosperm subtypes at 7 DAP. b, A clustered heatmap of the Spearman correlation coefficient (ρ) of aggregated expression for L2 endosperm clusters split by time. c, GLIP6 is a highly specific marker for the ESR at 7 DAP, validated by HCR. Scale bar, 100 µm. d, Module score analysis for all UMAMITs (top) and nitrate transporters (bottom) detected in the atlas for all L2 endosperm clusters. p-values are centered above clusters with significantly high average positive module scores in a cluster-vs-all other nuclei comparison. p-values are derived from a two-sided Wilcoxon Rank-Sum test with Bonferroni correction; see Supplementary Table 7 for the module scores and p-values for all clusters. e, 7 DAP L3 MCE clusters show differential expression of KRS and two NAC transcription factors. f, Nitrate transporters follow the KRS-NAC074 transcriptional gradient while UMAMITs are more broadly expressed.
Extended Data Fig. 7 Differentially expressed genes along the apical/basal axis of the CZE.
a, Top row: HCR for AT3G49307 (red), a gene enriched in the apical CZE during early to intermediate stages of development based on snRNA-seq data. These samples were not staged. The last two images are from whole mount seed preparations (Methods). Bottom row: HCR validation of RALFL3 (green), a basal cyst marker, assayed on 3 DAP seeds. DAPI staining in white. Scale bar, 50 µm. b, Expression patterns for the top basal cyst markers, excluding those in Fig. 4d, through development within L3 CZE subtypes.
Extended Data Fig. 8 Genes that vary along the CZE nodule-like-to-cyst trajectory.
a, Spearman correlation coefficients of the pairwise similarity between aggregated expression profiles of L3 CZE subtypes. 7 DAP L3 clusters could not be assigned apical/basal states. See Supplementary Table 3 to match abbreviated L3 names to their full descriptions. b, Left: L3 annotations for 3–5 DAP PEN and CZE clusters used in pseudotime analysis. Right: only blue-highlighted nuclei were used in the pseudotime expression plots in c to show gene expression variation in pseudotime along the non-basal cyst PEN to CZE trajectory (see Fig. 4). c, Normalized expression of a subset of genes that significantly vary on 3–5 DAP PEN/CZE pseudotime (see Fig. 4) is plotted in pseudotime-ordered nuclei. Color key in a extends to the labeling in c.
Extended Data Fig. 10 Rapidly evolving single-copy ortholog gene expression enrichment in endosperm and seed coat subtypes.
a, Module score analysis for single-copy orthologs (SCOs) with significant M2a/M1a likelihood ratio tests (‘M2a/M1a-sig’) across L2 timed atlas clusters. p-values are derived from a two-sided Wilcoxon Rank-Sum test with Bonferroni correction in focal cluster vs. all other comparisons. Only p-values for clusters with positive, significant module scores are shown. See Supplementary Table 7 for the module scores and p-values for all clusters. b, The expression patterns for M2a/M1a-sig SCOs differentially expressed in ii1’ and ii2 subtypes (p-value ≤ 0.05, log2FC > 1). c, The residues likely under positive selection in the DELTAVPE protein sequence. Individual residues are colored by the Bayes Empirical Bayes (BEB) posterior probability of having a dN/dS > 1 under the M2a model. Informative protein domains near or containing selected sites are highlighted, with the Pfam identifier from InterProScan in parentheses. d, The number of DE M2a/M1a-sig genes (p-value ≤ 0.05, log2FC > 1) per L2 timed cluster. Individual genes are colored blue-green if at least one selected site falls in an intrinsically disordered region (IDR) as predicted by MobiDB-lite.
Supplementary information
Supplementary Information
Supplementary Figs. 1–9.
Supplementary Tables 1, 2, 5–7 and 9–11
Supplementary Table 1: All published markers used to annotate clusters in this study; Supplementary Table 2: Names and sequences for HCR probes used in this study; Supplementary Table 5: Transcriptional and epigenetic regulators that vary between atlas endosperm clusters; Supplementary Table 6: Transcription factors that vary along chalazal endosperm pseudotime; Supplementary Table 7: All module score statistical analyses presented in this study; Supplementary Table 9: Full names for the samples from ref. 91 displayed in Extended Data Fig. 9; Supplementary Table 10: SSP families used for peptide enrichment analysis; Supplementary Table 11: Cell cycle gene list compiled from refs. 16,113 used for nucleus cell cycle scoring.
Supplementary Table 3
Quality metrics and metadata for all cluster annotations.
Supplementary Table 4
All differential analysis for all datasets at all annotation levels.
Supplementary Table 8
Information about genes that show evidence for positive selection.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Martin, C.A., Cogdill, K.R., Pusey, A.L. et al. A transcriptional atlas of early Arabidopsis seed development suggests mechanisms for inter-tissue coordination. Nat. Plants (2026). https://doi.org/10.1038/s41477-026-02295-8
DOI: https://doi.org/10.1038/s41477-026-02295-8





