Identification of plant transcriptional activation domains

Morffy, Nicholas; Van den Broeck, Lisa; Miller, Caelan; Emenecker, Ryan J.; Bryant, John A.; Lee, Tyler M.; Sageman-Furnas, Katelyn; Wilkinson, Edward G.; Pathak, Sunita; Kotha, Sanjana R.; Lam, Angelica; Mahatma, Saloni; Pande, Vikram; Waoo, Aman; Wright, R. Clay; Holehouse, Alex S.; Staller, Max V.; Sozzani, Rosangela; Strader, Lucia C.

doi:10.1038/s41586-024-07707-3

Article
Published: 17 July 2024

Nature volume 632, pages 166–173 (2024)

Abstract

Gene expression in Arabidopsis is regulated by more than 1,900 transcription factors (TFs), which have been identified genome-wide by the presence of well-conserved DNA-binding domains. Activator TFs contain activation domains (ADs) that recruit coactivator complexes; however, for nearly all Arabidopsis TFs, we lack knowledge about the presence, location and transcriptional strength of their ADs¹. To address this gap, here we use a yeast library approach to experimentally identify Arabidopsis ADs on a proteome-wide scale, and find that more than half of the Arabidopsis TFs contain an AD. We annotate 1,553 ADs, the vast majority of which are, to our knowledge, previously unknown. Using the dataset generated, we develop a neural network to accurately predict ADs and to identify sequence features that are necessary to recruit coactivator complexes. We uncover six distinct combinations of sequence features that result in activation activity, providing a framework to interrogate the subfunctionalization of ADs. Furthermore, we identify ADs in the ancient AUXIN RESPONSE FACTOR family of TFs, revealing that AD positioning is conserved in distinct clades. Our findings provide a deep resource for understanding transcriptional activation, a framework for examining function in intrinsically disordered regions and a predictive model of ADs.

**Fig. 1: High-throughput tiling of *Arabidopsis* TFs uncovers thousands of ADs.**

**Fig. 2: Using AD sequence features to create a predictive model.**

**Fig. 3: AD subtypes show distinct compositional biases.**

**Fig. 4: Validation of identified ADs.**

**Fig. 5: The position of ARF ADs has remained constant over evolutionary time.**

Data availability

Library sequencing data have been deposited in the NCBI’s Gene Expression Omnibus (GEO) and are accessible through the GEO series accession number GSE234215. Source data are provided with this paper.

Code availability

All scripts for the neural network training and validation and for making predictions are available on GitHub (https://github.com/LisaVdB/TADA).

References

1. Strader, L., Weijers, D. & Wagner, D. Plant transcription factors – being in the right place with the right company. Curr. Opin. Plant Biol. 65, 102136 (2022).
2. O’Malley, R. C. et al. Cistrome and epicistrome features shape the regulatory DNA landscape. Cell 165, 1280–1292 (2016).
3. Galli, M. et al. The DNA binding landscape of the maize AUXIN RESPONSE FACTOR family. Nat. Commun. 9, 4526 (2018).
4. Sanborn, A. L. et al. Simple biochemical features underlie transcriptional activation domain diversity and dynamic, fuzzy binding to Mediator. eLife 10, e68068 (2021).
5. Dyson, H. J. & Wright, P. E. Role of intrinsic protein disorder in the function and interactions of the transcriptional coactivators CREB-binding protein (CBP) and p300. J. Biol. Chem. 291, 6714–6722 (2016).
6. Ferreira, M. E. et al. Mechanism of transcription factor recruitment by acidic activators. J. Biol. Chem. 280, 21779–21784 (2005).
7. Hermann, S., Berndt, K. D. & Wright, A. P. How transcriptional activators bind target proteins. J. Biol. Chem. 276, 40127–40132 (2001).
8. Kim, J. Y. & Chung, H. S. Disordered proteins follow diverse transition paths as they fold and bind to a partner. Science 368, 1253–1257 (2020).
9. Staller, M. V. et al. Directed mutational scanning reveals a balance between acidic and hydrophobic residues in strong human activation domains. Cell Syst. 13, 334–345 (2022).
10. Kotha, S. R. & Staller, M. V. Clusters of acidic and hydrophobic residues can predict acidic transcriptional activation domains from protein sequence. Genetics 225, iyad131 (2023).
11. Hummel, N. F. C. et al. The trans-regulatory landscape of gene networks in plants. Cell Syst. 14, 501–511 (2023).
12. Staller, M. V. et al. A high-throughput mutational scan of an intrinsically disordered acidic transcriptional activation domain. Cell Syst. 6, 444–455 (2018).
13. Konishi, M. & Yanagisawa, S. The role of protein–protein interactions mediated by the PB1 domain of NLP transcription factors in nitrate-inducible gene expression. BMC Plant Biol. 19, 90 (2019).
14. Hahn, S. & Young, E. T. Transcriptional regulation in Saccharomyces cerevisiae: transcription factor regulation and function, mechanisms of initiation, and roles of activators and coactivators. Genetics 189, 705–736 (2011).
15. Emenecker, R. J., Griffith, D. & Holehouse, A. S. Metapredict: a fast, accurate, and easy-to-use predictor of consensus disorder and structure. Biophys. J. 120, 4312–4319 (2021).
16. Hope, I. A., Mahadevan, S. & Struhl, K. Structural and functional characterization of the short acidic transcriptional activation region of yeast GCN4 protein. Nature 333, 635–640 (1988).
17. Hope, I. A. & Struhl, K. Functional dissection of a eukaryotic transcriptional activator protein, GCN4 of yeast. Cell 46, 885–894 (1986).
18. Mitchell, P. J. & Tjian, R. Transcriptional regulation in mammalian cells by sequence-specific DNA binding proteins. Science 245, 371–378 (1989).
19. Mahatma, S. et al. Prediction and functional characterization of transcriptional activation domains. In 57th Annual Conference on Information Sciences and Systems (CISS) 1–6 (2023).
20. Erijman, A. et al. A high-throughput screen for transcription activation domains reveals their sequence features and permits prediction by deep learning. Mol. Cell 78, 890–902 (2020).
21. Lundberg, S. & Lee, S.-I. A unified approach to interpreting model predictions. In Proc. 31st International Conference on Neural Information Processing Systems 4768–4777 (2017).
22. Hussain, R. M. F., Sheikh, A. H., Haider, I., Quareshy, M. & Linthorst, H. J. M. Arabidopsis WRKY50 and TGA transcription factors synergistically activate expression of PR1. Front. Plant Sci. 9, 930 (2018).
23. Li, J. et al. Activation domains for controlling plant gene expression using designed transcription factors. Plant Biotechnol. J. 11, 671–680 (2013).
24. Cho, S. et al. Analysis of the C-terminal region of Arabidopsis thaliana APETALA1 as a transcription activation domain. Plant Mol. Biol. 40, 419–429 (1999).
25. Sakuma, Y. et al. Functional analysis of an Arabidopsis transcription factor, DREB2A, involved in drought-responsive gene expression. Plant Cell 18, 1292–1309 (2006).
26. Kotak, S., Port, M., Ganguli, A., Bicker, F. & von Koskull-Doring, P. Characterization of C-terminal domains of Arabidopsis heat stress transcription factors (Hsfs) and identification of a new signature combination of plant class A Hsfs with AHA and NES motifs essential for activator function and intracellular localization. Plant J. 39, 98–112 (2004).
27. Yoo, C. Y. et al. Direct photoresponsive inhibition of a p53-like transcription activation domain in PIF3 by Arabidopsis phytochrome B. Nat. Commun. 12, 5614 (2021).
28. Fernandez-Calvo, P. et al. The Arabidopsis bHLH transcription factors MYC3 and MYC4 are targets of JAZ repressors and act additively with MYC2 in the activation of jasmonate responses. Plant Cell 23, 701–715 (2011).
29. Tiwari, S. B., Hagen, G. & Guilfoyle, T. The roles of auxin response factor domains in auxin-responsive transcription. Plant Cell 15, 533–543 (2003).
30. Ulmasov, T., Hagen, G. & Guilfoyle, T. J. Activation and repression of transcription by auxin-response factors. Proc. Natl Acad. Sci. USA 96, 5844–5849 (1999).
31. Pierre-Jerome, E., Jang, S. S., Havens, K. A., Nemhauser, J. L. & Klavins, E. Recapitulation of the forward nuclear auxin response pathway in yeast. Proc. Natl Acad. Sci. USA 111, 9407–9412 (2014).
32. Powers, S. K. & Strader, L. C. Regulation of auxin transcriptional responses. Dev. Dyn. 249, 483–495 (2020).
33. Choi, H. S., Seo, M. & Cho, H. T. Two TPL-binding motifs of ARF2 are involved in repression of auxin responses. Front. Plant Sci. 9, 372 (2018).
34. Hiratsu, K., Matsui, K., Koyama, T. & Ohme-Takagi, M. Dominant repression of target genes by chimeric repressors that include the EAR motif, a repression domain, in Arabidopsis. Plant J. 34, 733–739 (2003).
35. Mutte, S. K. et al. Origin and evolution of the nuclear auxin response system. eLife 7, e33399 (2018).
36. DelRosso, N. et al. Large-scale mapping and mutagenesis of human transcriptional effector domains. Nature 616, 365–372 (2023).
37. Leydon, A. R. et al. Repression by the Arabidopsis TOPLESS corepressor requires association with the core mediator complex. eLife 10, e66739 (2021).
38. Holehouse, A. S., Das, R. K., Ahad, J. N., Richardson, M. O. & Pappu, R. V. CIDER: resources to analyze sequence-ensemble relationships of intrinsically disordered proteins. Biophys. J. 112, 16–21 (2017).
39. Kagale, S. & Rozwadowski, K. EAR motif-mediated transcriptional repression in plants: an underlying mechanism for epigenetic regulation of gene expression. Epigenetics 6, 141–146 (2011).
40. Boer, D. R. et al. Structural basis for DNA binding specificity by the auxin-dependent ARF transcription factors. Cell 156, 577–589 (2014).
41. Korasick, D. A. et al. Molecular basis for AUXIN RESPONSE FACTOR protein interaction and the control of auxin response repression. Proc. Natl Acad. Sci. USA 111, 5427–5432 (2014).
42. Havens, K. A. et al. A synthetic approach reveals extensive tunability of auxin signaling. Plant Physiol. 160, 135–142 (2012).
43. Hillson, N. J., Rosengarten, R. D. & Keasling, J. D. j5 DNA assembly design automation software. ACS Synth. Biol. 1, 14–21 (2012).
44. Garcia-Nafria, J., Watson, J. F. & Greger, I. H. IVA cloning: a single-tube universal cloning system exploiting bacterial in vivo assembly. Sci. Rep. 6, 27459 (2016).
45. Gietz, R. D. & Schiestl, R. H. High-efficiency yeast transformation using the LiAc/SS carrier DNA/PEG method. Nat. Protoc. 2, 31–34 (2007).
46. Li, H. & Durbin, R. Fast and accurate long-read alignment with Burrows–Wheeler transform. Bioinformatics 26, 589–595 (2010).
47. Li, H. et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078–2079 (2009).
48. Harris, C. R. et al. Array programming with NumPy. Nature 585, 357–362 (2020).
49. Pedregosa, F. et al. Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011).
50. Jumper, J. et al. Highly accurate protein structure prediction with AlphaFold. Nature 596, 583–589 (2021).
51. Kobak, D. & Berens, P. The art of using t-SNE for single-cell transcriptomics. Nat. Commun. 10, 5416 (2019).
52. Pierre-Jerome, E., Wright, R. C. & Nemhauser, J. L. Characterizing auxin response circuits in Saccharomyces cerevisiae by flow cytometry. Methods Mol. Biol. 1497, 271–281 (2017).
53. Wright, R. C., Bolten, N. & Pierre-Jerome, E. flowTime: annotation and analysis of biological dynamical systems using flow cytometry. R version 1.24.0 https://www.bioconductor.org/packages/release/bioc/html/flowTime.html (2023).
54. White, S. et al. FlowKit: a Python toolkit for integrated manual and automated cytometry analysis workflows. Front. Immunol. 12, 768541 (2021).
55. Lotthammer, J. M., Ginell, G. M., Griffith, D., Emenecker, R. J. & Holehouse, A. S. Direct prediction of intrinsically disordered protein conformational properties from sequence. Nat. Methods 21, 465–476 (2024).

Acknowledgements

This work was supported by the National Science Foundation (PGRP BIO-2112056 to L.C.S., PGRP BIO-2112057 to M.V.S. and PGRP BIO-2112058 to R.S.), the NSF Postdoctoral Research Program (IOS-1907098 to N.M.) and the National Institutes of Health (R35 GM136338 to L.C.S.).

Author information

Authors and Affiliations

Department of Biology, Duke University, Durham, NC, USA
Nicholas Morffy, Caelan Miller, Tyler M. Lee, Katelyn Sageman-Furnas, Edward G. Wilkinson, Sunita Pathak & Lucia C. Strader
Department of Plant and Microbial Biology, North Carolina State University, Raleigh, NC, USA
Lisa Van den Broeck, Saloni Mahatma, Vikram Pande, Aman Waoo & Rosangela Sozzani
Department of Biochemistry and Molecular Biophysics, Washington University School of Medicine, St. Louis, MO, USA
Ryan J. Emenecker & Alex S. Holehouse
Center for Biomolecular Condensates, Washington University in St. Louis, St. Louis, MO, USA
Ryan J. Emenecker & Alex S. Holehouse
Biological Systems Engineering, Virginia Tech, Blacksburg, VA, USA
John A. Bryant Jr. & R. Clay Wright
Center for Computational Biology, University of California, Berkeley, Berkeley, CA, USA
Sanjana R. Kotha, Angelica Lam & Max V. Staller

Contributions

N.M., M.V.S., R.S. and L.C.S. designed the study. N.M. and M.V.S. designed the pilot tiling libraries. N.M. designed the PADI tiling and ARF evolution tiling libraries. C.M. and N.M. cloned and integrated libraries into yeast. L.V.d.B., S.M., V.P., A.W. and R.S. designed and implemented the TADA network. R.J.E. and A.S.H. performed biophysical simulations and advised on Metapredict. J.A.B. and R.C.W. assessed ARF7 AD activity in the yeast synthetic auxin signalling system. N.M., T.M.L., K.S.-F., E.G.W., S.P. and L.C.S. tested ADs in protoplasts. S.R.K. and A.L. examined human TFs with TADA. N.M. and L.V.d.B. wrote the manuscript, with important contributions from R.S. and L.C.S., and contributions from all other authors. L.C.S. supervised the project, with contributions from R.S. and M.V.S.

Corresponding author

Correspondence to Lucia C. Strader.

Ethics declarations

Competing interests

L.C.S. is on the science advisory board of Prose Foods. R.S. is founder of Raleigh Biosciences. The remaining authors declare no competing interests.

Peer review

Peer review information

Nature thanks Jennifer Brophy and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data figures and tables

Extended Data Fig. 1 PADI workflow and quality control.

a, Extended depiction of the PADI assay. 1) DNA encoding 40-amino-acid fragments is synthesized and 2) cloned into a synthetic TF backbone in bulk. 3) Confirmed synthetic TF libraries are cloned into the URA3 locus of DHY211 yeast cells and positive clones are selected by G418 and 5-FOA resistance. 4) Positively cloned yeast TF libraries are mated to the MY435 reporter strain¹². Positively mated clones are selected by G418 (library) and CloNAT (reporter) resistance. 5) Pooled mated libraries and controls are grown overnight and subcultured 1:5 with 1 µM beta-estradiol to induce synthetic TF localization to the nucleus. 6) After 4 h of beta-estradiol treatment, mated yeast libraries are sorted into bins based on relative levels of GFP (reporter) to mCherry (synthetic TF) to determine AD activity. 7) Populations from each bin are grown overnight and sequenced to determine the distribution of tested fragments across bins. b,c, Correlation between PADI scores from all Arabidopsis TF libraries and scores from a pooled library in which cells were sorted on median GFP (b) or mCherry (c) values. Each fragment was given a GFP or mCherry score based on the weighted mean of its appearance across all GFP or mCherry bins and then normalized using Z-score normalization, consistent with how the PADI score was generated. The blue line represents the linear correlation of the data. There is a positive correlation between PADI score and GFP score, but not between PADI and mCherry scores. These results show that the PADI score is a robust measure of transcriptional activity regardless of the abundance of any TF. d, Scatter plot showing the correlation between two sorts of PADI library 3. Replicate 1 is included in all analyses. The blue line represents the linear regression of the two datasets. The linear regression model has an r-value of 0.657. e, Violin plots showing the PADI scores of four positive AD controls (n = 10 independent library experiments). The controls are found in all 10 PADI libraries and were consistently positive across libraries. The violin plot of Arabidopsis fragments (n = 69,347 fragments from 10 libraries) is also provided as a comparison. Box plots within the violin plot show the interquartile range and the median with whiskers that are 1.5 times the interquartile range. f, Box plots showing the PADI scores of tested control fragments across the 10 PADI libraries. Each point is the PADI score of the tested fragment and the colour of each point corresponds to the 10 PADI libraries (n = 10 independent experiments). All box plots show the interquartile range and the median. Whiskers are 1.5 times the interquartile range. g, Comparison of panels h–l from main text Fig. 1. The data from Fig. 1h–l (top; n = 3,576) are shown above the same analysis conducted on all positive fragments regardless of mean disorder (bottom; n = 6,207). The trends hold between the filtered data (top) and unfiltered data (bottom). h, Distribution of identified ADs across Arabidopsis TF families. i, Distribution of highest-scoring hits from each TF in each family. j, Distribution of the number of ADs identified per Arabidopsis TF. k, Distribution of the number of contiguous hits identified per identified AD. Contiguous hits could be indicative of a short AD contained in neighbouring fragments or of an extended AD for which a subset of residues is sufficient to activate transcription; our data cannot distinguish between these. l, The distribution of hit locations revealed a bias towards the amino and carboxy termini of proteins. All box plots represent the median and interquartile range. The whiskers are 1.5 times the interquartile range.
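
The scoring idea described for panels b and c (a weighted mean of a fragment's appearance across sorted bins, followed by Z-score normalization) can be sketched as follows. This is a minimal illustration and not the authors' pipeline: the function names, the use of the bin index as the weight, and the toy read counts are all assumptions made for the example.

```python
import numpy as np

def weighted_bin_score(counts):
    """Weighted mean bin index per fragment.

    counts: array of shape (n_fragments, n_bins) holding read counts per sorted bin.
    Assumption: each bin is weighted by its index (1 = lowest signal bin).
    Fragments with zero total reads are not handled in this sketch.
    """
    counts = np.asarray(counts, dtype=float)
    bin_index = np.arange(1, counts.shape[1] + 1)
    return (counts * bin_index).sum(axis=1) / counts.sum(axis=1)

def z_normalize(scores):
    """Z-score normalization: subtract the mean, divide by the standard deviation."""
    scores = np.asarray(scores, dtype=float)
    return (scores - scores.mean()) / scores.std()

# Hypothetical read counts for three fragments across four bins.
toy_counts = np.array([
    [90, 10, 0, 0],    # mostly in the lowest bin
    [25, 25, 25, 25],  # spread evenly
    [0, 0, 10, 90],    # mostly in the highest bin
])
raw = weighted_bin_score(toy_counts)
z = z_normalize(raw)
```

In this toy case the fragment concentrated in the highest bin receives the largest normalized score, and the evenly spread fragment sits at zero.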

Source Data

Extended Data Fig. 2 PADI hit characterization.

a–d, Box plots showing the number of D + E (a), R + K + H (b), A + I + L + M + V (c) and S + N + P + Q (d) residues in each subtype (n ≥ 625). Letters correspond to the statistical levels of each subtype based on the Tukey–Kramer HSD metric with an alpha-level of 0.05. e, Scatter plot showing the correlation between the percentage of TFs with at least one AD (defined as a PADI score of greater than or equal to 1 and from an IDR) and the mean of the highest-scoring AD from each TF in a family. The line represents the linear regression and the shaded area represents the 95% confidence interval. f, Box plots showing the net charge of hits from each of the six AD subtypes (n ≥ 625). g, Heat map showing the distribution of Rg values against PADI score for all tested fragments (n = 6,207). We used simulations to examine the radius of gyration (Rg), which is a measure of the volume that an IDR ensemble occupies. Rg is particularly relevant to the AD molecular mechanism, as exposure of interacting side chains is necessary for interaction with the transcriptional machinery. We found that the Rg of our identified ADs occupied a narrow range of radii, as compared to the tested library, raising the possibility that ADs must adopt sufficiently expanded conformations for activity. h, Box plots showing the Rg values of each subtype; Rg was similar across subtypes (n ≥ 625). i, Table describing the PADI fragments tested in the synthetic TFs in Fig. 3h. The fragment key, its Arabidopsis identifier, amino acid sequence, PADI score, and subtype are shown. j, Box plots showing the distribution of PADI scores for each of the six subtypes. The stars represent the PADI scores of the fragments tested for activity in protoplasts in Fig. 3h and shown in Extended Data Fig. 2i. The tested fragments span the range of PADI scores found in the six subtypes (n ≥ 625). k, Protein accumulation of synthetic TFs from Fig. 3h. Violin plots show the mScarlet-TF values of cells. The black lines mark the mean mScarlet-TF value of each sample (n ≥ 529 cells from 3 independent transfections). l, Protein accumulation of FrankenARF TFs from Fig. 4e. Violin plots show the mNEON-TF values of cells. The black lines mark the mean mNEON-TF value of each sample (n ≥ 2,212 cells from 4 independent transfections). All cells collected for reporter expression were gated on the presence of TF signal when compared to blank cells. Only positive cells were used to collect output data presented in Figs. 3h and 4e. m, Gating strategy for examination of AD activity in protoplasts. Cells were gated based on size and mScarlet (for presence of TF) signal as depicted. Untransfected cells did not display signal above the threshold for mScarlet (left) whereas control cells transfected with the TF lacking an AD (middle) and cells transfected with the TF carrying VP16 (right) were selected for assessment of mNeonGreen (transcriptional output). All box plots represent the median and interquartile range. The whiskers are 1.5 times the interquartile range.
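
The residue groupings counted in panels a–d and the net charge in panel f are simple sequence-composition features. A minimal sketch of how such counts could be computed is given below; the group labels, the charge convention (K and R as +1, D and E as −1, histidine treated as neutral) and the example peptide are assumptions for illustration and may differ from the tools the authors used.

```python
# Residue groups taken from the panel descriptions; the dictionary keys are
# hypothetical labels chosen for this example.
RESIDUE_GROUPS = {
    "D+E": "DE",
    "R+K+H": "RKH",
    "A+I+L+M+V": "AILMV",
    "S+N+P+Q": "SNPQ",
}

def group_counts(seq):
    """Count how many residues of each group occur in a peptide sequence."""
    seq = seq.upper()
    return {
        label: sum(seq.count(aa) for aa in residues)
        for label, residues in RESIDUE_GROUPS.items()
    }

def net_charge(seq):
    """Simple net charge. Assumption: K/R = +1, D/E = -1, H is neutral."""
    seq = seq.upper()
    return seq.count("K") + seq.count("R") - seq.count("D") - seq.count("E")

example = "DEDLLSPKRA"  # hypothetical 10-residue peptide
counts = group_counts(example)
charge = net_charge(example)
```

For the hypothetical peptide above, three acidic and two basic residues give a net charge of −1 under the stated convention.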

Source Data

Extended Data Fig. 3 Classification performance of TADA and effect of features on TADA’s prediction performance.

a, The loss of TADA during training and validation. b, TADA’s performance in terms of precision, recall, area under the receiver operating characteristic curve (AUC), accuracy, AUPR and F1 score. TADA was trained three distinct times using random peptides²⁰, PADI (referred to as “plant TFs”), and random peptides and PADI combined. c, TADA outperforms all published AD predictors. We compared the performance of TADA with three published AD predictors (ADpred, PADDLE and a composition model)⁴,¹⁰,²⁰. We used a hand-curated list of 599 ADs from 451 human TFs. For each TF, we predicted ADs using each predictor and considered predictions that overlapped a known annotation by > 10 amino acids to be true positives. TADA made the most predictions and had the highest sensitivity and the highest F1 score. d, Z-score normalized SHAP values leading to the selection of 8 features with a z-score above 1. e, Normalized SHAP values ranked from overall most important to least important for fragments scoring above 1 for each of the 6 identified AD subclasses.
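
The overlap criterion in panel c (a prediction counts as a true positive when it overlaps a known annotation by more than 10 amino acids) can be illustrated with a short sketch. The interval representation, the function names and the exact way sensitivity, precision and F1 are derived here are assumptions for the example; the legend does not specify how the authors computed them.

```python
def overlap_length(a, b):
    """Number of shared residues between two inclusive (start, end) intervals."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)

def evaluate(predicted, annotated, min_overlap=10):
    """Score predicted AD intervals against annotated ones for a single TF.

    A prediction is a true positive if it overlaps any annotation by more than
    min_overlap residues. Assumption: sensitivity is the fraction of annotated
    ADs recovered, precision is the fraction of predictions that are true positives.
    """
    tp = sum(any(overlap_length(p, a) > min_overlap for a in annotated) for p in predicted)
    recovered = sum(any(overlap_length(p, a) > min_overlap for p in predicted) for a in annotated)
    precision = tp / len(predicted) if predicted else 0.0
    sensitivity = recovered / len(annotated) if annotated else 0.0
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    return {"precision": precision, "sensitivity": sensitivity, "f1": f1}

# Hypothetical annotations and predictions (1-based residue coordinates).
annotated = [(1, 40), (100, 140)]
predicted = [(20, 60), (135, 170), (200, 240)]
result = evaluate(predicted, annotated)
```

Here only the first prediction overlaps an annotation by more than 10 residues (21 shared residues); the second shares just 6 residues with its nearest annotation and is not counted.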

Source Data

Extended Data Fig. 4 AD subtypes by TF family.

Heat map showing the percentage of hits (defined as a PADI score ≥ 1) from each subtype found in each family in Arabidopsis.

Source Data

Extended Data Fig. 5 Comparison of PADI hits to previous activators and distribution of hits across the middle regions of clade-A ARF subclades.

a, Hummel et al.¹¹ identified ADs in sixty-eight Arabidopsis TFs that could elicit a transcriptional response when transiently expressed in intact tobacco leaves. We identified fragments that could activate transcription in yeast from fifty-six (82%) of the sixty-eight TFs identified by Hummel et al. We did not identify fragments that could elicit yeast-based transcription from nine TFs in which Hummel et al. demonstrated transcriptional activity. An additional three TFs were untested in the PADI dataset. For the nine TFs in which Hummel et al. found activation activity but for which we did not identify a hit in our PADI screen, it is possible that either 1) they contain ADs that are active in plant cells but not in yeast or 2) the nearly intact TFs used by Hummel et al. recruited other coactivators in their system (for example, native TFs that contain an AD). b–e, Orange regions were used to define AD regions for alignment in Extended Data Figs. 7 and 8. b, ARF5 clade. c, ARF6 clade. d, ARF7 clade. e, ARF8 clade.

Source Data

Extended Data Fig. 6 Phylogeny of examined ARFs.

The maximum-likelihood tree was generated using MAFFT alignments of the conserved ARF DBD. Major ARF clades (bright blue, orange and green) and subclades (light blue, orange and green) are annotated. These annotations were used for categorizing sequences in Fig. 4.

Extended Data Fig. 7 ARF7 and ARF5 subclade AD alignments.

a–c, The highest-scoring fragment from each tested ARF within the defined ARF7 and ARF5 AD regions (a, ARF7 AD1; b, ARF7 AD2; c, ARF5 AD) (orange bars in Extended Data Fig. 5b,d) was used to generate alignments with MAFFT. Alignments were visualized with the ESPript 3.0 webserver. Boxes indicate regions in which 50% of amino acid residues share sequence similarity based on biochemical properties. Bolded residues are the amino acids with shared properties within the region. Black boxes represent sequence conservation.

Extended Data Fig. 8 ARF6 and ARF8 subclade AD alignments.

The highest-scoring fragment from each tested ARF within the defined AD regions (orange bars in Extended Data Fig. 5c,e) was used to generate alignments with MAFFT. Alignments were visualized with the ESPript 3.0 webserver. Boxes indicate regions in which 50% of amino acid residues share sequence similarity based on biochemical properties. Bolded residues are the amino acids with shared properties within the region. Black boxes represent sequence conservation.

Extended Data Fig. 9 MYB family ADs and prediction performance of TADA on the ARF evolution dataset.

a, Histogram of all AD hits (defined as a PADI score of greater than or equal to 1 and from an IDR) from the MYB family. Each bar represents the number of ADs found in each 5% interval of the protein length. These results show that MYB ADs are enriched in the final 15% of tested TFs. b, Representative gating strategy for all PADI libraries. Yeast cells were gated based on size to exclude doublets (R1 and R3). Single cells were then gated to exclude those with mCherry signal below background (R4) when compared to mCherry-negative cells. The mCherry-positive cells were then binned and sorted into twelve populations based on the GFP:mCherry ratio. c, Prediction performance of TADA and the TADAΔARF variant. TADA performance on the PADI data test set and the ARF evolution dataset in terms of precision, recall, area under the receiver operating characteristic curve (AUC), accuracy, AUPR and F1 score. We further validated the generalization of TADA by retraining TADA on the original training dataset but withholding the ARF sequences (2,046 of the 70,937 sequences), which we called TADAΔARF. This approach prevents TADA from memorizing/overfitting ARF sequences. d, Prediction performance of TADA, PADDLE, ADpred and the composition model in terms of area under the receiver operating characteristic curve (roc_auc), area under the precision recall curve (pr_auc), accuracy, F1 score, true positive rate (tpr), false positive rate (fpr), precision, and recall when tested on the ARF evolution dataset. Because each of these predictors subdivides sequences differently and used different fragment lengths for training, we compared their performance on full-length protein sequences from the evolution dataset.
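
The positional histogram in panel a (number of ADs per 5% interval of protein length) can be sketched as below. This is an illustrative reconstruction under stated assumptions: each hit is represented by hypothetical 1-based start and end coordinates plus the protein length, and its position is taken to be the fragment midpoint, which the legend does not specify.

```python
import numpy as np

def relative_position_histogram(hits, n_bins=20):
    """Count hits per fractional interval of protein length.

    hits: iterable of (fragment_start, fragment_end, protein_length), 1-based inclusive.
    Assumption: a hit's position is its midpoint divided by the protein length.
    With n_bins=20 each bin spans 5% of the protein.
    """
    rel = [((start + end) / 2) / length for start, end, length in hits]
    counts, edges = np.histogram(rel, bins=np.linspace(0.0, 1.0, n_bins + 1))
    return counts, edges

# Hypothetical 40-residue hits on a 400-residue protein.
toy_hits = [(1, 40, 400), (181, 220, 400), (341, 380, 400), (361, 400, 400)]
counts, edges = relative_position_histogram(toy_hits)
c_terminal_hits = counts[-3:].sum()  # hits whose midpoint lies in the final 15%
```

Summing the last three bins gives the number of hits in the final 15% of the protein, the region highlighted for MYB ADs in the legend.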

Extended Data Fig. 10 Arabidopsis TFs with identified ADs.

Waffle plots of the 1,918 Arabidopsis TFs analysed. Those with previously identified ADs are marked with a black box in the left waffle plot. The right waffle plot depicts those with activating fragments identified by PADI.

Supplementary information

Reporting Summary

Supplementary Table 1

This table contains the name, locus, gene family, amino acid sequence, PADI score and biochemical information used to generate Figs. 1e, 1f, 1h–l, 2a, 2b, 3b–g and 4a–c as well as associated extended figures. AD subtype and fragment type information are also included. Here we called any fragment that had a PADI score ≥ 1 and mean disorder > 0.5 an “AD” and any fragment that had a PADI score ≥ 1 and mean disorder ≤ 0.5 “Maybe.” All other fragments, which had a PADI score < 1, are “Not AD.”
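
The labelling rule stated for this table can be written as a small function. The thresholds come from the description above; the function and parameter names are hypothetical.

```python
def classify_fragment(padi_score, mean_disorder, score_cutoff=1.0, disorder_cutoff=0.5):
    """Label a fragment using the cut-offs described for Supplementary Table 1.

    "AD":     PADI score >= 1 and mean disorder > 0.5
    "Maybe":  PADI score >= 1 and mean disorder <= 0.5
    "Not AD": PADI score < 1
    """
    if padi_score >= score_cutoff:
        return "AD" if mean_disorder > disorder_cutoff else "Maybe"
    return "Not AD"

# Hypothetical fragments: (PADI score, mean disorder)
labels = [classify_fragment(s, d) for s, d in [(1.0, 0.6), (1.5, 0.5), (0.9, 0.9)]]
```

Note that a fragment exactly at the disorder cut-off of 0.5 falls into the “Maybe” class, because the “AD” label requires mean disorder strictly above 0.5.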

Supplementary Table 2

This table has names and amino acid sequences of PADI hits by AD subtype.

Supplementary Table 3

This table contains the clade, species, names, amino acid sequences and PADI scores for the ARF evolution dataset used to generate Fig. 5a,c,d.

Supplementary Table 4

This table contains the features used by TADA to predict activation domain activity.

Supplementary Data 1

This file has a graphical representation of PADI data for every Arabidopsis TF tested. For example, PADI (orange) and predicted disorder (white) scores for NLP7 show regions of strong activity in disordered regions as well as in ordered regions that overlap with the known PB1 domain. The orange (PADI = 1) and grey (Metapredict score = 0.5) dashed lines are considered cut-offs for activation and disorder, respectively.

Source data

Source Data Fig. 1

Source Data Fig. 3

Source Data Fig. 4

Source Data Extended Data Fig. 1

Source Data Extended Data Fig. 2

Source Data Extended Data Fig. 3

Source Data Extended Data Fig. 4

Source Data Extended Data Fig. 5

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Morffy, N., Van den Broeck, L., Miller, C. et al. Identification of plant transcriptional activation domains. Nature 632, 166–173 (2024). https://doi.org/10.1038/s41586-024-07707-3

Received: 26 June 2023
Accepted: 12 June 2024
Published: 17 July 2024
Version of record: 17 July 2024
Issue date: 01 August 2024
DOI: https://doi.org/10.1038/s41586-024-07707-3
