Recruitment, rewiring and deep conservation in flowering plant gene regulation

Abstract

Transcription factors (TFs) are proteins that bind DNA to control where and when genes are expressed. In plants, dozens of TF families interact with distinct sets of binding sites (TFBSs) that reflect each TF’s role in organismal function and species-specific adaptations. However, defining these roles and understanding broader patterns of regulatory evolution remain challenging, as predicted TFBSs may lack a clear impact on transcription, and experimentally derived TF binding maps to date are modest in scale or restricted to model organisms. Here we present a scalable TFBS assay that we leveraged to create an atlas of nearly 3,000 genome-wide binding site maps for 360 TFs in ten species spanning 150 million years of flowering plant evolution. We found that TF orthologues from distant species retain nearly identical binding preferences, while on the same timescales the gain and loss of TFBSs are widespread. Within lineages, however, conserved TFBSs are over-represented and found in regions harbouring signatures of functional regulatory elements. Moreover, genes with conserved TFBSs showed striking enrichment for cell-type-specific expression in 14 single-nucleus RNA atlases, providing a robust marker of each TF’s activity and developmental role. Finally, we compare distant lineages, illustrating how ancient regulatory modules were recruited and rewired to enable adaptations underlying the evolutionary success of grasses.

Fig. 1: Plant multiDAP identifies motif specificities and TFBSs across 150 million years of evolution.
Fig. 2: Expression of TF target genes across snRNA-seq seedling atlases enables the construction of a multi-tissue regulatory map of cell-type-specific gene expression programs.
Fig. 3: Single-nucleus transcriptomes of S. bicolor seedlings reveal conserved and grass-specific regulatory networks driving cell-type identity.

Data availability

The raw DAP-seq FASTQ sequence data files were submitted to the National Center for Biotechnology Information under BioProject no. PRJNA1177505. The raw snRNA-seq FASTQ sequence data files were submitted to the National Center for Biotechnology Information under BioProject no. PRJNA1262374. The processed DAP-seq data, including peak files (narrowPeak format), coverage tracks (bigWig format), c score tables for the four-species and ten-species datasets (.tsv format) and a readme, as well as the processed snRNA-seq data, including single-cell gene expression matrices (.mtx format) for each library and compiled Seurat objects for each species tissue, were submitted to the National Center for Biotechnology Information under GEO superseries accession no. GSE299028. A compiled list of A. thaliana cell-type marker genes used to assist with cell-type annotation was downloaded from scPlantDB (https://biobigdata.nju.edu.cn/scplantdb/marker).

Code availability

Scripts and example data files used for the DAP-seq and single-nucleus analyses are available in a Git repository at https://code.jgi.doe.gov/LBaumgart/plant-multidap-and-single-cell.

Acknowledgements

The work conducted by the US Department of Energy Joint Genome Institute (https://ror.org/04xm1d337), a DOE Office of Science User Facility, is supported by the Office of Science of the US Department of Energy operated under Contract No. DE-AC02-05CH11231. We also thank the following individuals: C. Beecroft and T. Reddy for assisting with the submission of raw data files; L. Gerard for illustrating plant cartoons; B. Cole, J. Humphries, A. Visel and Z. Zhang for editing the text and providing comments; M. Fix, N. Bassil and K. Hummer at the USDA ARS NCGR for providing F. vesca germplasm; J. Edwards and the Sundar lab at UC Davis for providing O. sativa germplasm; J. Schartner at the USDA ARS VCRU for providing S. tuberosum germplasm; and M. Harrison and T. Fields at the USDA ARS PGRCU for providing S. bicolor germplasm.

Author information

Contributions

L.A.B., R.C.O. and S.I.G. conceptualized the project. A.M.-C., D.J.D., L.A.B., P.W., R.C.O., S.I.G. and Y.Z. devised the methodology. A.M.-C., D.J.D., L.A.B., R.C.O., S.I.G. and Y.Z. validated the data. A.C.G., A.M.-C., L.A.B., L.Y., R.C.O. and S.I.G. carried out the formal analysis. C.C., E.S., G.H., N.G., P.W. and Y.Z. conducted the investigation. A.M.-C., L.A.B., P.W., R.C.O., S.I.G. and Y.Z. wrote the paper. A.M.-C., L.A.B., L.Y. and S.I.G. visualized the data. C.G.D., I.K.B., L.A.B., R.C.O., S.I.G. and Y.Y. supervised the project.

Corresponding authors

Correspondence to Leo A. Baumgart or Ronan C. O’Malley.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Plants thanks the anonymous reviewers for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Comparisons of genome-wide binding signal between different TFs and technical replicates.

(a) An example 20-kilobase region of the A. thaliana genome shows similar TF-binding profiles of four example A. thaliana TFs and their corresponding orthologs from five other species. Numbers after TF IDs indicate percent amino acid identity compared to the A. thaliana ortholog. (b) All-vs-all genome-wide correlations of DAP-seq binding profiles between TFs from six species. Gray grid lines delineate individual groups of orthologous TFs and colored bars indicate TF families. (c) Correlations between different subsets of DAP-seq. Boxplots are drawn according to standard convention, where the box center represents the median, upper and lower box bounds represent the upper and lower limits of the interquartile range, and whiskers extend to the furthest data point within 1.5× the interquartile range from the box. Numbers in parentheses indicate the number of pairwise comparison data points shown in each boxplot. FRIP = fraction of reads in peaks, which serves as a measure of signal-to-noise in each DAP-seq library and can be used as an estimate of overall performance of the assay. Performance in the assay is variable between different TFs, and also varies slightly between technical replicates of the same TF. The category labeled ‘replicate with closest FRIP’ partially controls for this confounding effect on correlations: for each comparison between an A. thaliana TF and an orthologous TF, one technical replicate of each TF was selected such that the difference in FRIP scores between the pair being compared was minimized. In the category labeled ‘same motif group’, a motif group is a set of A. thaliana TFs that fall within the same TF family and also show highly similar motifs (see Supplementary Table 1).
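The ‘replicate with closest FRIP’ selection described in panel (c) can be illustrated with a minimal sketch. This is not the authors' script: the function names, replicate labels and read counts below are hypothetical, and the only assumption is that FRIP is computed as reads in peaks divided by total reads for each library.

```python
from itertools import product


def frip(reads_in_peaks: int, total_reads: int) -> float:
    """Fraction of reads in peaks for one library (assumed definition)."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return reads_in_peaks / total_reads


def closest_frip_pair(ref_frips: dict, ortho_frips: dict) -> tuple:
    """Pick one replicate of each TF so that the FRIP difference is minimal."""
    return min(
        product(ref_frips, ortho_frips),
        key=lambda pair: abs(ref_frips[pair[0]] - ortho_frips[pair[1]]),
    )


# Hypothetical replicate libraries for an A. thaliana TF and one ortholog.
ath_reps = {"ath_rep1": frip(12_000, 100_000), "ath_rep2": frip(30_000, 100_000)}
ortho_reps = {"orth_rep1": frip(28_000, 100_000), "orth_rep2": frip(5_000, 100_000)}

best_pair = closest_frip_pair(ath_reps, ortho_reps)
print(best_pair)
```

Here the second A. thaliana replicate and the first ortholog replicate are paired because their FRIP scores differ least, so the correlation between them is less affected by differences in library quality.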

Extended Data Fig. 2 Genomic TF binding maps across 10 species generated by multiDAP.

(a) All-vs-all genome-wide correlations of DAP-seq binding profiles between four replicates for each TF using either 1, 4, or 10 multiplexed species. Gray grid lines delineate individual TFs and colored bars indicate TF families. Mean correlation between single species replicates = 0.94, standard deviation = 0.08, n = 116; mean correlation between single and four species experiments = 0.94, standard deviation = 0.06, n = 211; two-sample t-test p = 0.56. Mean correlation of 10 species vs. single species experiments = 0.87, standard deviation = 0.1, n = 163. (b) Count of peaks in genomic regions normalized by the number of total genes per species. CDS: coding sequence; distal intergenic: >2,000 bp upstream of any start codon; downstream: within 2,000 bp downstream of a stop codon; proximal genic: within 500 bp downstream of a start codon; proximal intergenic: within 2,000 bp upstream of a start codon. (c) Number of TF target genes shared between A. thaliana and other brassica species. (d) Overrepresentation test for conserved sites. A subset of representative TFs is displayed. All 244 TFs were significantly enriched (p < 0.01) at c scores ≥ 2.

Extended Data Fig. 3 Identification and labeling of cell types in snRNA-seq atlases of four Brassicaceae species.

(a) Integration of internal and external A. thaliana seedling snRNA-seq datasets (first row) and results of label transfer from three reference atlases (second row). Only cells receiving a label with confidence score >0.4 are shown in these plots. (b) UMAP representations of mature leaf and flower bud snRNA-seq atlases from four profiled brassica species. (c) Dotplot of the top marker gene for each cell type, where dot size represents the percent of all nuclei with a specific cell type label showing non-zero expression of the gene, and color represents average normalized and scaled expression across these nuclei.

Extended Data Fig. 4 Conserved TFBSs are predictive of target gene expression patterns.

(a) TF-target co-expression across all nuclei in the A. thaliana seedling atlas as a function of multiDAP TF binding affinity and multiDAP peak conservation score. Y-axis represents the mean fold-change between true correlation of normalized TF expression with target expression vs. correlation with a matched control target gene. Point size represents the number of DAP-seq peaks considered, and error bars represent standard error of the mean. (b) Cell type specificity scores as a function of target conservation for 244 TFs in each of the four brassica seedling snRNA-seq atlases. Cell type specificity was defined as the F-statistic from an ANOVA test of the extent to which per-nucleus target gene expression varied systematically across labeled cell types. All distributions within a species were significantly different from each other (FDR < 0.05) using paired one-sided t-tests. Boxplots are drawn according to standard convention, where the box center represents the median, upper and lower box bounds represent the upper and lower limits of the interquartile range, and whiskers extend to the furthest data point within 1.5× the interquartile range from the box. (c) NAC007/VND4 target gene expression summary scores using target c scores from the extended 10-species multiDAP dataset. Note that two target genes are excluded as they were not expressed in any cells in the snRNA-seq dataset.
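The cell type specificity score in panel (b) is an ANOVA F-statistic. As a rough illustration only, the sketch below computes a one-way ANOVA F-statistic from per-nucleus values grouped by cell type label. The grouping of nuclei, the summarized expression values and the function name are assumptions made for the example; the caption does not specify how per-nucleus target expression was summarized.

```python
from statistics import fmean


def anova_f(groups: list) -> float:
    """One-way ANOVA F-statistic: between-group over within-group variance."""
    groups = [list(g) for g in groups if len(g) > 0]
    k = len(groups)                      # number of cell types
    n = sum(len(g) for g in groups)      # number of nuclei
    if k < 2 or n <= k:
        raise ValueError("need at least two groups and more nuclei than groups")
    grand_mean = fmean([v for g in groups for v in g])
    ss_between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - fmean(g)) ** 2 for g in groups for v in g)
    if ss_within == 0:
        return float("inf")
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical summarized target expression per nucleus, grouped by cell type.
specific = {"xylem": [5.0, 6.0, 7.0], "cortex": [1.0, 2.0, 3.0], "epidermis": [1.0, 1.0, 4.0]}
flat = {"xylem": [1.0, 2.0, 3.0], "cortex": [1.0, 2.0, 3.0]}

f_specific = anova_f(list(specific.values()))
f_flat = anova_f(list(flat.values()))
print(f_specific, f_flat)
```

A target set expressed mainly in one cell type gives a large F, whereas a set with the same expression distribution in every cell type gives an F of zero.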

Extended Data Fig. 5 Conserved TFBSs are predictive of cell type-specific gene expression.

Network diagrams showing TFs with enriched activity scores (based on summarized expression of all c4 target genes) and the cell-type-specific marker genes they target for four example cell types: (a) dividing cells, (b) procambium, (c) suberized-endodermis, and (d) mature-atrichoblast. TFs are shown in pink and marker genes are yellow, with a gray line connecting them if a brassica-c4 TFBS for the TF was associated with the marker gene. Line thickness corresponds to relative binding strength (measured as DAP-seq peak height) and marker gene node size represents relative cell type specificity (measured as expression log2-fold enrichment). (e) Distribution of Pearson correlations between pairs of TF activity score profiles, calculated for all pairs of cell types identified in the four brassica species and binned by concordance of cell type, tissue, and species. The highest correlations mark corresponding cell types from the same tissue found in different species, indicating that cell-type-specific TF activity is generally conserved across species. Number of pairs summarized by each boxplot is indicated in the x-axis labels. Boxplots are drawn according to standard convention, where the box center represents the median, upper and lower box bounds represent the upper and lower limits of the interquartile range, and whiskers extend to the furthest data point within 1.5× the interquartile range from the box.

Extended Data Fig. 6 Core sets of TFBSs are conserved across flowering plants.

Scatterplot showing enrichment of each TF’s core regulon (defined as target orthogroups conserved in both grasses and all four brassica), for TFs assayed in the 10 species multiDAP experiment. The enrichment scores (y-axis) indicate the extent to which the number of TF target genes shared by both grasses and brassica species was more than expected by chance, given the number of TF target genes conserved within each lineage alone. The vast majority of TFs showed significant enrichment scores (blue dots), indicating that persistent core regulons are a common feature of TF regulatory networks. To ensure meaningful statistics, only the 70 TFs targeting at least 150 total brassica-c4 or grass-c2 target orthogroups were tested. Each point represents a single TF, and x-axis shows the total number of brassica-c4 orthogroups, dot size shows the total number of grass c2-orthogroups, and the y-axis shows the enrichment of orthogroups shared between these two sets. To calculate enrichments, true counts of core regulon target orthogroups for each TF (shown in parentheses after TF name for top examples) were compared to background distributions generated by shuffling orthogroup labels for c2-grass TFBSs 1,000 times. Significance (dot color) was calculated as the number of times the shuffled core regulon count met or exceeded the actual count, followed by FDR correction. Significance was assessed at FDR < 0.05. Enrichment score (y-axis) was calculated as the fold change between the actual count and the average shuffled count.
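The shuffling test described above can be sketched as follows. This is an illustrative approximation rather than the authors' code: it assumes that shuffling orthogroup labels for the grass TFBSs is equivalent to drawing a random set of the same size from a universe of orthogroups, and all identifiers and set sizes are hypothetical. The FDR correction across TFs is omitted.

```python
import random


def core_regulon_enrichment(brassica_ogs, grass_ogs, universe, n_shuffles=1000, seed=0):
    """Return (actual shared count, fold enrichment, empirical p value)."""
    rng = random.Random(seed)
    universe = sorted(universe)          # rng.sample needs a sequence
    brassica_ogs = set(brassica_ogs)
    grass_ogs = set(grass_ogs)
    actual = len(brassica_ogs & grass_ogs)

    shuffled_counts = []
    for _ in range(n_shuffles):
        # Assumed stand-in for label shuffling: same-sized random orthogroup set.
        shuffled_grass = rng.sample(universe, len(grass_ogs))
        shuffled_counts.append(sum(og in brassica_ogs for og in shuffled_grass))

    # Number of shuffles that met or exceeded the actual count, as a fraction.
    p_value = sum(c >= actual for c in shuffled_counts) / n_shuffles
    mean_shuffled = sum(shuffled_counts) / n_shuffles
    enrichment = actual / mean_shuffled if mean_shuffled > 0 else float("inf")
    return actual, enrichment, p_value


# Hypothetical orthogroups: 200 brassica targets, 150 grass targets, 100 shared.
universe = [f"OG{i}" for i in range(1000)]
brassica = universe[:200]
grass = universe[100:250]

actual, enrichment, p_value = core_regulon_enrichment(brassica, grass, universe)
print(actual, round(enrichment, 2), p_value)
```

With these made-up sets about 30 shared orthogroups are expected by chance, so the 100 observed correspond to roughly a threefold enrichment and no shuffle reaches the actual count.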

Extended Data Fig. 7 Identification and labeling of cell types in snRNA-seq atlases of sorghum.

Dotplots showing expression of sorghum orthologs of experimentally validated cell type marker genes compiled from published literature in each cluster of the sorghum leaf (top) and root (bottom) snRNA-seq atlas. Column labels specify the cell type(s) where gene expression was experimentally observed/validated, followed by the gene alias. LRC = lateral root cap. For gene IDs and publication references, see Supplementary Table 10.

Extended Data Fig. 8 Comparison of TF activity scores in leaf cell types of A. thaliana and sorghum.

Sankey plot showing top activity score correlations for each sorghum leaf cell type (right) among all A. thaliana leaf cell types (left). A vector of TF activity scores was constructed for each sorghum and A. thaliana cell type, quantifying enrichment of each TF’s core regulon target genes (defined as conserved in both grasses and all four brassica) using Cohen’s d. Then Pearson correlations were calculated between the TF activity score vectors of all cross-species cell type pairs. Ribbons in the diagram represent the top two strongest correlations for each sorghum cell type, after filtering to correlations with r > 0.3. Ribbon width corresponds to Pearson r² value, and ribbons are labeled with TF families found in the top 5 enrichments for both the A. thaliana and sorghum cell type.
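The two statistics named in this caption can be sketched in a few lines. The example below is an assumption-laden illustration, not the published pipeline: it uses the pooled-standard-deviation form of Cohen’s d, and the cell type names and activity score vectors are invented. Only the r > 0.3 filter and the top-two selection are taken from the caption.

```python
from math import sqrt
from statistics import fmean, variance


def cohens_d(target: list, background: list) -> float:
    """Cohen's d with a pooled standard deviation (assumed variant)."""
    n1, n2 = len(target), len(background)
    pooled_var = ((n1 - 1) * variance(target) + (n2 - 1) * variance(background)) / (n1 + n2 - 2)
    if pooled_var == 0:
        return 0.0
    return (fmean(target) - fmean(background)) / sqrt(pooled_var)


def pearson_r(x: list, y: list) -> float:
    """Pearson correlation between two equal-length score vectors."""
    mx, my = fmean(x), fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / (sx * sy)


def top_matches(query: list, references: dict, min_r: float = 0.3, top_n: int = 2) -> list:
    """Keep correlations above min_r and return the strongest top_n."""
    scored = [(name, pearson_r(query, vec)) for name, vec in references.items()]
    kept = [(name, r) for name, r in scored if r > min_r]
    return sorted(kept, key=lambda item: item[1], reverse=True)[:top_n]


# Hypothetical expression of one TF's targets vs. other genes in one cell type.
d = cohens_d([2.0, 4.0, 6.0], [1.0, 3.0, 5.0])

# Hypothetical TF activity score vectors (one entry per TF).
sorghum_mesophyll = [1.2, -0.3, 0.8, 0.1]
ath_cell_types = {
    "mesophyll": [1.0, -0.2, 0.9, 0.0],
    "epidermis": [-0.5, 1.1, -0.4, 0.3],
    "phloem": [0.9, 0.1, 0.2, 0.4],
}
matches = top_matches(sorghum_mesophyll, ath_cell_types)
print(d, matches)
```

In this toy case the sorghum vector correlates most strongly with the A. thaliana mesophyll vector, followed by phloem, while the negatively correlated epidermis vector is removed by the filter.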

Extended Data Fig. 9 MYB83 and NAC007 expression in roots.

Dotplots showing expression of TF genes MYB83 and NAC007 across all A. thaliana seedling cell types and all sorghum root cell types.

Extended Data Fig. 10 Differential involvement of MYB and NAC in driving cell type-specific gene expression in xylem.

Table showing NAC007 and MYB83 TFBSs associated with the top 50 marker genes for the mature-xylem cluster in the A. thaliana seedling atlas and their sorghum orthologs. Specificity of expression of each ortholog in the mature xylem cluster of the sorghum root atlas is also shown in the column after the sorghum ortholog ID. Green shading highlights brassica-c4 TFBSs and yellow highlights grass-c2 TFBSs. Outlined boxes highlight A. thaliana genes with a brassica-specific NAC007 TFBS and no MYB83 TFBS whose sorghum orthologs have the reverse: a grass-specific MYB83 TFBS but no NAC007 TFBS.

Supplementary information

Supplementary Information

Captions for Supplementary Tables 1–12.

Reporting Summary

Supplementary Tables

Supplementary Tables 1–12. See the Supplementary Information for the table captions.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Baumgart, L.A., Greenblum, S.I., Morales-Cruz, A. et al. Recruitment, rewiring and deep conservation in flowering plant gene regulation. Nat. Plants 11, 1514–1527 (2025). https://doi.org/10.1038/s41477-025-02047-0

  • DOI: https://doi.org/10.1038/s41477-025-02047-0
