Abstract
Transcription factors (TFs) are proteins that bind DNA to control where and when genes are expressed. In plants, dozens of TF families interact with distinct sets of binding sites (TFBSs) that reflect each TF’s role in organismal function and species-specific adaptations. However, defining these roles and understanding broader patterns of regulatory evolution remain challenging, as predicted TFBSs may lack a clear impact on transcription, and experimentally derived TF binding maps to date are modest in scale or restricted to model organisms. Here we present a scalable TFBS assay that we leveraged to create an atlas of nearly 3,000 genome-wide binding site maps for 360 TFs in ten species spanning 150 million years of flowering plant evolution. We found that TF orthologues from distant species retain nearly identical binding preferences, while on the same timescales the gain and loss of TFBSs are widespread. Within lineages, however, conserved TFBSs are over-represented and found in regions harbouring signatures of functional regulatory elements. Moreover, genes with conserved TFBSs showed striking enrichment for cell-type-specific expression in 14 single-nucleus RNA atlases, providing a robust marker of each TF’s activity and developmental role. Finally, we compare distant lineages, illustrating how ancient regulatory modules were recruited and rewired to enable adaptations underlying the evolutionary success of grasses.
Data availability
The raw DAP-seq FASTQ sequence data files were submitted to the National Center for Biotechnology Information under BioProject no. PRJNA1177505. The raw snRNA-seq FASTQ sequence data files were submitted to the National Center for Biotechnology Information under BioProject no. PRJNA1262374. The processed DAP-seq data, including peak files (narrowPeak format), coverage tracks (bigWig format), c score tables for the four-species and ten-species datasets (.tsv format) and a readme, as well as the processed snRNA-seq data, including single-cell gene expression matrices (.mtx format) for each library and compiled Seurat objects for each species tissue, were submitted to the National Center for Biotechnology Information under GEO superseries accession no. GSE299028. A compiled list of A. thaliana cell-type marker genes used to assist with cell-type annotation was downloaded from scPlantDB (https://biobigdata.nju.edu.cn/scplantdb/marker).
Code availability
Scripts and example data files used for the DAP-seq and single-nucleus analyses are available in a Git repository at https://code.jgi.doe.gov/LBaumgart/plant-multidap-and-single-cell.
Acknowledgements
The work conducted by the US Department of Energy Joint Genome Institute (https://ror.org/04xm1d337), a DOE Office of Science User Facility, is supported by the Office of Science of the US Department of Energy operated under Contract No. DE-AC02-05CH11231. We also thank the following individuals: C. Beecroft and T. Reddy for assisting with the submission of raw data files; L. Gerard for illustrating plant cartoons; B. Cole, J. Humphries, A. Visel and Z. Zhang for editing the text and providing comments; M. Fix, N. Bassil and K. Hummer at the USDA ARS NCGR for providing F. vesca germplasm; J. Edwards and the Sundar lab at UC Davis for providing O. sativa germplasm; J. Schartner at the USDA ARS VCRU for providing S. tuberosum germplasm; and M. Harrison and T. Fields at the USDA ARS PGRCU for providing S. bicolor germplasm.
Author information
Contributions
L.A.B., R.C.O. and S.I.G. conceptualized the project. A.M.-C., D.J.D., L.A.B., P.W., R.C.O., S.I.G. and Y.Z. devised the methodology. A.M.-C., D.J.D., L.A.B., R.C.O., S.I.G. and Y.Z. validated the data. A.C.G., A.M.-C., L.A.B., L.Y., R.C.O. and S.I.G. carried out the formal analysis. C.C., E.S., G.H., N.G., P.W. and Y.Z. conducted the investigation. A.M.-C., L.A.B., P.W., R.C.O., S.I.G. and Y.Z. wrote the paper. A.M.-C., L.A.B., L.Y. and S.I.G. visualized the data. C.G.D., I.K.B., L.A.B., R.C.O., S.I.G. and Y.Y. supervised the project.
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Plants thanks the anonymous reviewers for their contribution to the peer review of this work.
Extended data
Extended Data Fig. 1 Comparisons of genome-wide binding signal between different TFs and technical replicates.
(a) An example 20-kilobase region of the A. thaliana genome shows similar TF-binding profiles for four example A. thaliana TFs and their corresponding orthologs from five other species. Numbers after TF IDs indicate percent amino acid identity compared to the A. thaliana ortholog. (b) All-vs-all genome-wide correlations of DAP-seq binding profiles between TFs from six species. Gray grid lines delineate individual groups of orthologous TFs and colored bars indicate TF families. (c) Correlations between different subsets of DAP-seq datasets. Boxplots are drawn according to standard convention, where the box center represents the median, upper and lower box bounds represent the upper and lower limits of the interquartile range, and whiskers extend to the furthest data point within 1.5× the interquartile range from the box. Numbers in parentheses indicate the number of pairwise comparison data points shown in each boxplot. FRIP = fraction of reads in peaks, which serves as a measure of signal-to-noise in each DAP-seq library and can be used as an estimate of overall performance of the assay. Performance in the assay is variable between different TFs, and also varies slightly between technical replicates of the same TF. The category labeled ‘replicate with closest FRIP’ partially controls for this confounding effect on correlations: for each comparison between an A. thaliana TF and an orthologous TF, one technical replicate of each TF was selected such that the difference in FRIP scores between the pair being compared was minimized. In the category labeled ‘same motif group’, motif groups are groups of A. thaliana TFs that fall within the same TF family and also show highly similar motifs (see Supplementary Table 1).
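The ‘replicate with closest FRIP’ selection in panel (c) can be illustrated with a minimal sketch. This is not the authors' code: the function names `frip` and `closest_frip_pair`, the replicate labels and the FRIP values are all hypothetical, and the sketch assumes FRIP is simply reads in peaks divided by total reads.

```python
from itertools import product


def frip(reads_in_peaks: int, total_reads: int) -> float:
    """Fraction of reads in peaks for one library (assumed definition)."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return reads_in_peaks / total_reads


def closest_frip_pair(
    reference_reps: dict[str, float], ortholog_reps: dict[str, float]
) -> tuple[str, str]:
    """Return one replicate per TF such that the FRIP difference is minimal."""
    return min(
        product(reference_reps, ortholog_reps),
        key=lambda pair: abs(reference_reps[pair[0]] - ortholog_reps[pair[1]]),
    )


# Hypothetical FRIP scores for two technical replicates of each TF.
ath_reps = {"ath_rep1": 0.30, "ath_rep2": 0.12}
ortholog_reps = {"orth_rep1": 0.10, "orth_rep2": 0.45}
pair = closest_frip_pair(ath_reps, ortholog_reps)
print(pair)
```

Comparing only the selected pair keeps differences in library quality from being mistaken for differences in binding preference.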
Extended Data Fig. 2 Genomic TF binding maps across 10 species generated by multiDAP.
(a) All-vs-all genome-wide correlations of DAP-seq binding profiles between four replicates for each TF using either 1, 4, or 10 multiplexed species. Gray grid lines delineate individual TFs and colored bars indicate TF families. Mean correlation between single-species replicates = 0.94, standard deviation = 0.08, n = 116; mean correlation between single-species and four-species experiments = 0.94, standard deviation = 0.06, n = 211; two-sample t-test p = 0.56. Mean correlation of 10-species vs. single-species experiments = 0.87, standard deviation = 0.1, n = 163. (b) Count of peaks in genomic regions normalized by the number of total genes per species. CDS: coding sequence; distal intergenic: >2,000 bp upstream of any start codon; downstream: within 2,000 bp downstream of a stop codon; proximal genic: within 500 bp downstream of a start codon; proximal intergenic: within 2,000 bp upstream of a start codon. (c) Number of TF target genes shared between A. thaliana and other Brassicaceae species. (d) Overrepresentation test for conserved sites. A subset of representative example TFs is displayed. All 244 TFs were significantly enriched (p < 0.01) at c scores ≥ 2.
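The region categories in panel (b) can be read as a rule set on the position of a peak relative to a gene's start and stop codons. The sketch below is an illustration under stated assumptions, not the authors' annotation procedure: it considers a single gene, ignores introns (everything between the start and stop codon beyond the first 500 bp is labeled CDS), and the order in which the rules are applied is assumed. The function name `classify_peak` and the `unassigned` label are hypothetical.

```python
def classify_peak(summit: int, start_codon: int, stop_codon: int, strand: str) -> str:
    """Assign a peak summit to a region category relative to one gene."""
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    sign = 1 if strand == "+" else -1
    # Positive values mean "downstream" in the direction of transcription.
    rel_start = (summit - start_codon) * sign
    rel_stop = (summit - stop_codon) * sign

    if rel_start < -2000:
        return "distal intergenic"      # >2,000 bp upstream of the start codon
    if rel_start < 0:
        return "proximal intergenic"    # within 2,000 bp upstream of the start codon
    if rel_stop > 0:
        # Past the stop codon; beyond 2,000 bp this single-gene view cannot decide.
        return "downstream" if rel_stop <= 2000 else "unassigned"
    if rel_start <= 500:
        return "proximal genic"         # within 500 bp downstream of the start codon
    return "CDS"                        # assumption: introns are not modeled


# Hypothetical plus-strand gene with start codon at 10,000 and stop codon at 12,000.
for summit in (7000, 9000, 10200, 11000, 13000):
    print(summit, classify_peak(summit, 10000, 12000, "+"))
```

In a genome-wide setting each peak would be evaluated against all nearby genes, which is why the caption defines distal intergenic relative to any start codon.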
Extended Data Fig. 3 Identification and labeling of cell types in snRNA-seq atlases of four Brassicaceae species.
(a) Integration of internal and external A. thaliana seedling snRNA-seq datasets (first row) and results of label transfer from three reference atlases (second row). Only cells receiving a label with confidence score >0.4 are shown in these plots. (b) UMAP representations of mature leaf and flower bud snRNA-seq atlases from the four profiled Brassicaceae species. (c) Dotplot of the top marker gene for each cell type, where dot size represents the percent of all nuclei with a specific cell type label showing non-zero expression of the gene, and color represents average normalized and scaled expression across these nuclei.
Extended Data Fig. 4 Conserved TFBSs are predictive of target gene expression patterns.
(a) TF-target co-expression across all nuclei in the A. thaliana seedling atlas as a function of multiDAP TF binding affinity and multiDAP peak conservation score. The y-axis represents the mean fold-change between the true correlation of normalized TF expression with target expression and the correlation with a matched control target gene. Point size represents the number of DAP-seq peaks considered, and error bars represent standard error of the mean. (b) Cell type specificity scores as a function of target conservation for 244 TFs in each of the four Brassicaceae seedling snRNA-seq atlases. Cell type specificity was defined as the F-statistic from an ANOVA test of the extent to which per-nucleus target gene expression varied systematically across labeled cell types. All distributions within a species were significantly different from each other (FDR < 0.05) using paired one-sided t-tests. Boxplots are drawn according to standard convention, where the box center represents the median, upper and lower box bounds represent the upper and lower limits of the interquartile range, and whiskers extend to the furthest data point within 1.5× the interquartile range from the box. (c) NAC007/VND4 target gene expression summary scores using target c scores from the extended 10-species multiDAP dataset. Note that two target genes are excluded as they were not expressed in any cells in the snRNA-seq dataset.
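The cell type specificity score in panel (b) is described as an ANOVA F-statistic. A minimal one-way ANOVA sketch follows, assuming each group holds per-nucleus summarized target expression for one labeled cell type. The function name `anova_f` and the example values are hypothetical, and how the authors summarized expression per nucleus is not specified here.

```python
from math import inf
from statistics import fmean


def anova_f(groups: list[list[float]]) -> float:
    """One-way ANOVA F-statistic: between-group over within-group variance."""
    groups = [g for g in groups if g]            # ignore empty cell types
    k = len(groups)
    n = sum(len(g) for g in groups)
    if k < 2 or n <= k:
        raise ValueError("need at least two groups and more values than groups")

    grand_mean = fmean([x for g in groups for x in g])
    group_means = [fmean(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)

    if ss_within == 0:
        return inf if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical per-nucleus target expression in three cell types.
specific = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [8.0, 9.0, 10.0]]
uniform = [[1.0, 2.0, 3.0], [1.0, 2.0, 3.0], [1.0, 2.0, 3.0]]
print(anova_f(specific), anova_f(uniform))
```

A larger F means expression differs more between cell types than within them, which is the sense in which the score measures cell type specificity.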
Extended Data Fig. 5 Conserved TFBSs are predictive of cell type-specific gene expression.
Network diagrams showing TFs with enriched activity scores (based on summarized expression of all c4 target genes) and the cell-type-specific marker genes they target for four example cell types: (a) dividing cells, (b) procambium, (c) suberized-endodermis, and (d) mature-atrichoblast. TFs are shown in pink and marker genes are yellow, with a gray line connecting them if a brassica-c4 TFBS for the TF was associated with the marker gene. Line thickness corresponds to relative binding strength (measured as DAP-seq peak height) and marker gene node size represents relative cell type specificity (measured as expression log2-fold enrichment). (e) Distribution of Pearson correlations between pairs of TF activity score profiles, calculated for all pairs of cell types identified in the four Brassicaceae species and binned by concordance of cell type, tissue, and species. The highest correlations mark corresponding cell types from the same tissue found in different species, indicating that cell-type-specific TF activity is generally conserved across species. The number of pairs summarized by each boxplot is indicated in the x-axis labels. Boxplots are drawn according to standard convention, where the box center represents the median, upper and lower box bounds represent the upper and lower limits of the interquartile range, and whiskers extend to the furthest data point within 1.5× the interquartile range from the box.
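The boxplot convention stated in these captions can be made concrete with a short sketch. It assumes quartiles computed by linear interpolation (the `inclusive` method of Python's `statistics.quantiles`); the plotting software used for the figures may apply a slightly different quartile rule. The function name `boxplot_stats` and the example data are hypothetical.

```python
from statistics import quantiles


def boxplot_stats(values: list[float]) -> dict[str, object]:
    """Median, box bounds, whisker ends and outliers for one boxplot."""
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    # Whiskers stop at the furthest data point that is still inside the fences.
    inside = [v for v in values if lower_fence <= v <= upper_fence]
    return {
        "median": median,
        "box": (q1, q3),
        "whiskers": (min(inside), max(inside)),
        "outliers": sorted(v for v in values if v < lower_fence or v > upper_fence),
    }


stats = boxplot_stats([1, 2, 3, 4, 5, 6, 7, 8, 30])
print(stats)
```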
Extended Data Fig. 6 Core sets of TFBSs are conserved across flowering plants.
Scatterplot showing enrichment of each TF’s core regulon (defined as target orthogroups conserved in both grasses and all four Brassicaceae species), for TFs assayed in the 10-species multiDAP experiment. The enrichment scores (y-axis) indicate the extent to which the number of TF target genes shared by both grasses and Brassicaceae species was more than expected by chance, given the number of TF target genes conserved within each lineage alone. The vast majority of TFs showed significant enrichment scores (blue dots), indicating that persistent core regulons are a common feature of TF regulatory networks. To ensure meaningful statistics, only the 70 TFs targeting at least 150 total brassica-c4 or grass-c2 target orthogroups were tested. Each point represents a single TF; the x-axis shows the total number of brassica-c4 orthogroups, dot size shows the total number of grass-c2 orthogroups, and the y-axis shows the enrichment of orthogroups shared between these two sets. To calculate enrichments, true counts of core regulon target orthogroups for each TF (shown in parentheses after TF name for top examples) were compared to background distributions generated by shuffling orthogroup labels for grass-c2 TFBSs 1,000 times. Significance (dot color) was calculated as the number of times the shuffled core regulon count met or exceeded the actual count, followed by FDR correction. Significance was assessed at FDR < 0.05. Enrichment score (y-axis) was calculated as the fold change between the actual count and the average shuffled count.
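The shuffling test described in this caption can be sketched as follows. This is a simplified illustration rather than the authors' implementation: label shuffling is approximated by drawing a random set of orthogroups of the same size as the grass target set, the exceedance count is divided by the number of shuffles to give an empirical p value, and FDR correction across TFs is omitted. The function name `core_regulon_enrichment` and the example orthogroup IDs are hypothetical.

```python
import random
from statistics import fmean


def core_regulon_enrichment(
    brassica_targets: set[str],
    grass_targets: set[str],
    all_orthogroups: set[str],
    n_shuffles: int = 1000,
    seed: int = 0,
) -> tuple[int, float, float]:
    """Return (actual shared count, fold enrichment, empirical p value)."""
    rng = random.Random(seed)
    universe = sorted(all_orthogroups)           # fixed order for reproducibility
    actual = len(brassica_targets & grass_targets)

    shuffled_counts = []
    for _ in range(n_shuffles):
        # Assumption: a random grass target set stands in for shuffled labels.
        random_grass = set(rng.sample(universe, len(grass_targets)))
        shuffled_counts.append(len(brassica_targets & random_grass))

    exceed = sum(count >= actual for count in shuffled_counts)
    mean_shuffled = fmean(shuffled_counts)
    enrichment = actual / mean_shuffled if mean_shuffled > 0 else float("inf")
    return actual, enrichment, exceed / n_shuffles


# Hypothetical example: 1,000 orthogroups, 100 targets per lineage, 50 shared.
universe = {f"OG{i}" for i in range(1000)}
brassica = {f"OG{i}" for i in range(100)}
grass = {f"OG{i}" for i in range(50)} | {f"OG{i}" for i in range(500, 550)}
actual, enrichment, p_value = core_regulon_enrichment(brassica, grass, universe)
print(actual, round(enrichment, 2), p_value)
```

With 100 targets in each lineage out of 1,000 orthogroups, about 10 shared orthogroups are expected by chance, so 50 shared orthogroups corresponds to roughly fivefold enrichment.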
Extended Data Fig. 7 Identification and labeling of cell types in snRNA-seq atlases of sorghum.
Dotplots showing expression of sorghum orthologs of experimentally validated cell type marker genes compiled from published literature in each cluster of the sorghum leaf (top) and root (bottom) snRNA-seq atlas. Column labels specify the cell type(s) where gene expression was experimentally observed/validated, followed by the gene alias. LRC = lateral root cap. For gene IDs and publication references, see Supplementary Table 10.
Extended Data Fig. 8 Comparison of TF activity scores in leaf cell types of A. thaliana and sorghum.
Sankey plot showing top activity score correlations for each sorghum leaf cell type (right) among all A. thaliana leaf cell types (left). A vector of TF activity scores was constructed for each sorghum and A. thaliana cell type, quantifying enrichment of each TF’s core regulon target genes (defined as conserved in both grasses and all four Brassicaceae species) using Cohen’s d. Then Pearson correlations were calculated between the TF activity score vectors of all cross-species cell type pairs. Ribbons in the diagram represent the top two strongest correlations for each sorghum cell type, after filtering to correlations with r > 0.3. Ribbon width corresponds to the Pearson r² value, and ribbons are labeled with TF families found in the top 5 enrichments for both the A. thaliana and sorghum cell type.
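The matching procedure in this caption combines an effect size (Cohen’s d), a Pearson correlation between activity vectors, and a top-two filter at r > 0.3. The sketch below illustrates those three steps under stated assumptions: the two groups compared by Cohen’s d are not specified in the caption, so the function simply takes two lists of values, and the cell type names and activity vectors are hypothetical.

```python
from math import sqrt
from statistics import fmean, variance


def cohens_d(group_a: list[float], group_b: list[float]) -> float:
    """Standardized mean difference using the pooled sample variance."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (fmean(group_a) - fmean(group_b)) / sqrt(pooled_var)


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation between two equal-length vectors."""
    mx, my = fmean(x), fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def top_matches(query, references, min_r=0.3, top_n=2):
    """Reference cell types most correlated with the query, filtered by min_r."""
    scored = [(name, pearson_r(query, vec)) for name, vec in references.items()]
    kept = [(name, r) for name, r in scored if r > min_r]
    return sorted(kept, key=lambda item: item[1], reverse=True)[:top_n]


# Hypothetical TF activity vectors (one Cohen's d value per TF).
sorghum_cell_type = [1.0, 0.2, -0.5]
arabidopsis = {
    "bundle_sheath": [0.9, 0.1, -0.4],
    "mesophyll": [-0.5, 0.8, 0.3],
    "epidermis": [0.5, 0.3, -0.2],
}
matches = top_matches(sorghum_cell_type, arabidopsis)
print([(name, round(r * r, 3)) for name, r in matches])  # r squared sets ribbon width
```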
Extended Data Fig. 9 MYB83 and NAC007 expression in roots.
Dotplots showing expression of TF genes MYB83 and NAC007 across all A. thaliana seedling cell types and all sorghum root cell types.
Extended Data Fig. 10 Differential involvement of MYB and NAC in driving cell type-specific gene expression in xylem.
Table showing NAC007 and MYB83 TFBSs associated with the top 50 marker genes for the mature-xylem cluster in the A. thaliana seedling atlas and their sorghum orthologs. Specificity of expression of each ortholog in the mature xylem cluster of the sorghum root atlas is also shown in the column after the sorghum ortholog ID. Green shading highlights brassica-c4 TFBSs and yellow highlights grass-c2 TFBSs. Outlined boxes highlight A. thaliana genes with a brassica-specific NAC007 TFBS and no MYB83 TFBS whose sorghum orthologs have the reverse: a grass-specific MYB83 TFBS but no NAC007 TFBS.
Supplementary information
Supplementary Information
Captions for Supplementary Tables 1–12.
Supplementary Tables
Supplementary Tables 1–12. See the Supplementary Information for the table captions.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Baumgart, L.A., Greenblum, S.I., Morales-Cruz, A. et al. Recruitment, rewiring and deep conservation in flowering plant gene regulation. Nat. Plants 11, 1514–1527 (2025). https://doi.org/10.1038/s41477-025-02047-0
DOI: https://doi.org/10.1038/s41477-025-02047-0