Abstract
Gene expression in Arabidopsis is regulated by more than 1,900 transcription factors (TFs), which have been identified genome-wide by the presence of well-conserved DNA-binding domains. Activator TFs contain activation domains (ADs) that recruit coactivator complexes; however, for nearly all Arabidopsis TFs, we lack knowledge about the presence, location and transcriptional strength of their ADs1. To address this gap, here we use a yeast library approach to experimentally identify Arabidopsis ADs on a proteome-wide scale, and find that more than half of the Arabidopsis TFs contain an AD. We annotate 1,553 ADs, the vast majority of which are, to our knowledge, previously unknown. Using the dataset generated, we develop a neural network to accurately predict ADs and to identify sequence features that are necessary to recruit coactivator complexes. We uncover six distinct combinations of sequence features that result in activation activity, providing a framework to interrogate the subfunctionalization of ADs. Furthermore, we identify ADs in the ancient AUXIN RESPONSE FACTOR family of TFs, revealing that AD positioning is conserved in distinct clades. Our findings provide a deep resource for understanding transcriptional activation, a framework for examining function in intrinsically disordered regions and a predictive model of ADs.
Data availability
Library sequencing data have been deposited in the NCBI’s Gene Expression Omnibus (GEO) and are accessible through the GEO series accession number GSE234215. Source data are provided with this paper.
Code availability
All scripts for the neural network training and validation and for making predictions are available on GitHub (https://github.com/LisaVdB/TADA).
References
Strader, L., Weijers, D. & Wagner, D. Plant transcription factors—being in the right place with the right company. Curr. Opin. Plant Biol. 65, 102136 (2022).
O’Malley, R. C. et al. Cistrome and epicistrome features shape the regulatory DNA landscape. Cell 165, 1280–1292 (2016).
Galli, M. et al. The DNA binding landscape of the maize AUXIN RESPONSE FACTOR family. Nat. Commun. 9, 4526 (2018).
Sanborn, A. L. et al. Simple biochemical features underlie transcriptional activation domain diversity and dynamic, fuzzy binding to Mediator. eLife 10, e68068 (2021).
Dyson, H. J. & Wright, P. E. Role of intrinsic protein disorder in the function and interactions of the transcriptional coactivators CREB-binding protein (CBP) and p300. J. Biol. Chem. 291, 6714–6722 (2016).
Ferreira, M. E. et al. Mechanism of transcription factor recruitment by acidic activators. J. Biol. Chem. 280, 21779–21784 (2005).
Hermann, S., Berndt, K. D. & Wright, A. P. How transcriptional activators bind target proteins. J. Biol. Chem. 276, 40127–40132 (2001).
Kim, J. Y. & Chung, H. S. Disordered proteins follow diverse transition paths as they fold and bind to a partner. Science 368, 1253–1257 (2020).
Staller, M. V. et al. Directed mutational scanning reveals a balance between acidic and hydrophobic residues in strong human activation domains. Cell Syst. 13, 334–345 (2022).
Kotha, S. R. & Staller, M. V. Clusters of acidic and hydrophobic residues can predict acidic transcriptional activation domains from protein sequence. Genetics 225, iyad131 (2023).
Hummel, N. F. C. et al. The trans-regulatory landscape of gene networks in plants. Cell Syst. 14, 501–511 (2023).
Staller, M. V. et al. A high-throughput mutational scan of an intrinsically disordered acidic transcriptional activation domain. Cell Syst. 6, 444–455 (2018).
Konishi, M. & Yanagisawa, S. The role of protein–protein interactions mediated by the PB1 domain of NLP transcription factors in nitrate-inducible gene expression. BMC Plant Biol. 19, 90 (2019).
Hahn, S. & Young, E. T. Transcriptional regulation in Saccharomyces cerevisiae: transcription factor regulation and function, mechanisms of initiation, and roles of activators and coactivators. Genetics 189, 705–736 (2011).
Emenecker, R. J., Griffith, D. & Holehouse, A. S. Metapredict: a fast, accurate, and easy-to-use predictor of consensus disorder and structure. Biophys. J. 120, 4312–4319 (2021).
Hope, I. A., Mahadevan, S. & Struhl, K. Structural and functional characterization of the short acidic transcriptional activation region of yeast GCN4 protein. Nature 333, 635–640 (1988).
Hope, I. A. & Struhl, K. Functional dissection of a eukaryotic transcriptional activator protein, GCN4 of yeast. Cell 46, 885–894 (1986).
Mitchell, P. J. & Tjian, R. Transcriptional regulation in mammalian cells by sequence-specific DNA binding proteins. Science 245, 371–378 (1989).
Mahatma, S. et al. Prediction and functional characterization of transcriptional activation domains. In 57th Annual Conference on Information Sciences and Systems (CISS) 1–6 (2023).
Erijman, A. et al. A high-throughput screen for transcription activation domains reveals their sequence features and permits prediction by deep learning. Mol. Cell 78, 890–902 (2020).
Lundberg, S. & Lee, S.-I. A unified approach to interpreting model predictions. In Proc. 31st International Conference on Neural Information Processing Systems 4768–4777 (2017).
Hussain, R. M. F., Sheikh, A. H., Haider, I., Quareshy, M. & Linthorst, H. J. M. Arabidopsis WRKY50 and TGA transcription factors synergistically activate expression of PR1. Front. Plant Sci. 9, 930 (2018).
Li, J. et al. Activation domains for controlling plant gene expression using designed transcription factors. Plant Biotechnol. J. 11, 671–680 (2013).
Cho, S. et al. Analysis of the C-terminal region of Arabidopsis thaliana APETALA1 as a transcription activation domain. Plant Mol. Biol. 40, 419–429 (1999).
Sakuma, Y. et al. Functional analysis of an Arabidopsis transcription factor, DREB2A, involved in drought-responsive gene expression. Plant Cell 18, 1292–1309 (2006).
Kotak, S., Port, M., Ganguli, A., Bicker, F. & von Koskull-Doring, P. Characterization of C-terminal domains of Arabidopsis heat stress transcription factors (Hsfs) and identification of a new signature combination of plant class A Hsfs with AHA and NES motifs essential for activator function and intracellular localization. Plant J. 39, 98–112 (2004).
Yoo, C. Y. et al. Direct photoresponsive inhibition of a p53-like transcription activation domain in PIF3 by Arabidopsis phytochrome B. Nat. Commun. 12, 5614 (2021).
Fernandez-Calvo, P. et al. The Arabidopsis bHLH transcription factors MYC3 and MYC4 are targets of JAZ repressors and act additively with MYC2 in the activation of jasmonate responses. Plant Cell 23, 701–715 (2011).
Tiwari, S. B., Hagen, G. & Guilfoyle, T. The roles of auxin response factor domains in auxin-responsive transcription. Plant Cell 15, 533–543 (2003).
Ulmasov, T., Hagen, G. & Guilfoyle, T. J. Activation and repression of transcription by auxin-response factors. Proc. Natl Acad. Sci. USA 96, 5844–5849 (1999).
Pierre-Jerome, E., Jang, S. S., Havens, K. A., Nemhauser, J. L. & Klavins, E. Recapitulation of the forward nuclear auxin response pathway in yeast. Proc. Natl Acad. Sci. USA 111, 9407–9412 (2014).
Powers, S. K. & Strader, L. C. Regulation of auxin transcriptional responses. Dev. Dyn. 249, 483–495 (2020).
Choi, H. S., Seo, M. & Cho, H. T. Two TPL-binding motifs of ARF2 are involved in repression of auxin responses. Front. Plant Sci. 9, 372 (2018).
Hiratsu, K., Matsui, K., Koyama, T. & Ohme-Takagi, M. Dominant repression of target genes by chimeric repressors that include the EAR motif, a repression domain, in Arabidopsis. Plant J. 34, 733–739 (2003).
Mutte, S. K. et al. Origin and evolution of the nuclear auxin response system. eLife 7, e33399 (2018).
DelRosso, N. et al. Large-scale mapping and mutagenesis of human transcriptional effector domains. Nature 616, 365–372 (2023).
Leydon, A. R. et al. Repression by the Arabidopsis TOPLESS corepressor requires association with the core mediator complex. eLife 10, e66739 (2021).
Holehouse, A. S., Das, R. K., Ahad, J. N., Richardson, M. O. & Pappu, R. V. CIDER: resources to analyze sequence-ensemble relationships of intrinsically disordered proteins. Biophys. J. 112, 16–21 (2017).
Kagale, S. & Rozwadowski, K. EAR motif-mediated transcriptional repression in plants: an underlying mechanism for epigenetic regulation of gene expression. Epigenetics 6, 141–146 (2011).
Boer, D. R. et al. Structural basis for DNA binding specificity by the auxin-dependent ARF transcription factors. Cell 156, 577–589 (2014).
Korasick, D. A. et al. Molecular basis for AUXIN RESPONSE FACTOR protein interaction and the control of auxin response repression. Proc. Natl Acad. Sci. USA 111, 5427–5432 (2014).
Havens, K. A. et al. A synthetic approach reveals extensive tunability of auxin signaling. Plant Physiol. 160, 135–142 (2012).
Hillson, N. J., Rosengarten, R. D. & Keasling, J. D. j5 DNA assembly design automation software. ACS Synth. Biol. 1, 14–21 (2012).
Garcia-Nafria, J., Watson, J. F. & Greger, I. H. IVA cloning: a single-tube universal cloning system exploiting bacterial in vivo assembly. Sci. Rep. 6, 27459 (2016).
Gietz, R. D. & Schiestl, R. H. High-efficiency yeast transformation using the LiAc/SS carrier DNA/PEG method. Nat. Protoc. 2, 31–34 (2007).
Li, H. & Durbin, R. Fast and accurate long-read alignment with Burrows–Wheeler transform. Bioinformatics 26, 589–595 (2010).
Li, H. et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078–2079 (2009).
Harris, C. R. et al. Array programming with NumPy. Nature 585, 357–362 (2020).
Pedregosa, F. et al. Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011).
Jumper, J. et al. Highly accurate protein structure prediction with AlphaFold. Nature 596, 583–589 (2021).
Kobak, D. & Berens, P. The art of using t-SNE for single-cell transcriptomics. Nat. Commun. 10, 5416 (2019).
Pierre-Jerome, E., Wright, R. C. & Nemhauser, J. L. Characterizing auxin response circuits in Saccharomyces cerevisiae by flow cytometry. Methods Mol. Biol. 1497, 271–281 (2017).
Wright, R. C., Bolten, N. & Pierre-Jerome, E. flowTime: annotation and analysis of biological dynamical systems using flow cytometry. R version 1.24.0 https://www.bioconductor.org/packages/release/bioc/html/flowTime.html (2023).
White, S. et al. FlowKit: a Python toolkit for integrated manual and automated cytometry analysis workflows. Front. Immunol. 12, 768541 (2021).
Lotthammer, J. M., Ginell, G. M., Griffith, D., Emenecker, R. J. & Holehouse, A. S. Direct prediction of intrinsically disordered protein conformational properties from sequence. Nat. Methods 21, 465–476 (2024).
Acknowledgements
This work was supported by the National Science Foundation (PGRP BIO-2112056 to L.C.S., PGRP BIO-2112057 to M.V.S. and PGRP BIO-2112058 to R.S.), the NSF Postdoctoral Research Program (IOS-1907098 to N.M.) and the National Institutes of Health (R35 GM136338 to L.C.S.).
Author information
Contributions
N.M., M.V.S., R.S. and L.C.S. designed the study. N.M. and M.V.S. designed the pilot tiling libraries. N.M. designed the PADI tiling and ARF evolution tiling libraries. C.M. and N.M. cloned and integrated libraries into yeast. L.V.d.B., S.M., V.P., A.W. and R.S. designed and implemented the TADA network. R.J.E. and A.S.H. performed biophysical simulations and advised on Metapredict. J.A.B. and R.C.W. assessed ARF7 AD activity in the yeast synthetic auxin signalling system. N.M., T.M.L., K.S.-F., E.G.W., S.P. and L.C.S. tested ADs in protoplasts. S.R.K. and A.L. examined human TFs with TADA. N.M. and L.V.d.B. wrote the manuscript, with important contributions from R.S. and L.C.S., and contributions from all other authors. L.C.S. supervised the project, with contributions from R.S. and M.V.S.
Ethics declarations
Competing interests
L.C.S. is on the science advisory board of Prose Foods. R.S. is founder of Raleigh Biosciences. The remaining authors declare no competing interests.
Peer review
Peer review information
Nature thanks Jennifer Brophy and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data figures and tables
Extended Data Fig. 1 PADI workflow and quality control.
a, Extended depiction of the PADI assay. 1) DNA encoding 40-amino-acid fragments is synthesized and 2) cloned into a synthetic TF backbone in bulk. 3) Confirmed synthetic TF libraries are cloned into the URA3 locus of DHY211 yeast cells and positive clones are selected by G418 and 5-FOA resistance. 4) Positively cloned yeast TF libraries are mated to the MY435 reporter strain12. Positively mated clones are selected by G418 (library) and CloNAT (reporter) resistance. 5) Pooled mated libraries and controls are grown overnight and subcultured 1:5 with 1 µM beta-estradiol to induce synthetic TF localization to the nucleus. 6) After 4 h of beta-estradiol treatment, mated yeast libraries are sorted into bins based on relative levels of GFP (reporter) to mCherry (synthetic TF) to determine AD activity. 7) Populations from each bin are grown overnight and sequenced to determine the distribution of tested fragments across bins. b,c, Correlation between PADI scores from all Arabidopsis TF libraries and scores from a pooled library in which cells were sorted on median GFP (b) or mCherry (c) values. Each fragment was given a GFP or mCherry score based on the weighted mean of its appearance across all GFP or mCherry bins, and the scores were then normalized using Z-score normalization, consistent with how the PADI score was generated. The blue line represents the linear correlation of the data. There is a positive correlation between PADI score and GFP score, but not between PADI and mCherry scores. These results show that the PADI score is a robust measure of transcriptional activity regardless of TF abundance. d, Scatter plot showing the correlation between two sorts of PADI library 3. Replicate 1 is included in all analyses. The blue line represents the linear regression of the two datasets. The linear regression model has an r-value of 0.657. e, Violin plots showing the PADI scores of four positive AD controls (n = 10 independent library experiments). 
The controls are found in all 10 PADI libraries and were consistently positive across libraries. The violin plot of Arabidopsis fragments (n = 69,347 fragments from 10 libraries) is also provided as a comparison. Box plots within the violin plot show the interquartile range and the median with whiskers that are 1.5 times the interquartile range. f, Box plots showing the PADI scores of tested control fragments across the 10 PADI libraries. Each point is the PADI score of the tested fragment and the colour of each point corresponds to the 10 PADI libraries (n = 10 independent experiments). All box plots show the interquartile range and the median. Whiskers are 1.5 times the interquartile range. g, Comparison of panels h–l from main text Fig. 1. The data presented from Fig. 1h–l (top) (n = 3,576) are presented above the same analysis conducted on all positive fragments regardless of mean disorder (bottom) (n = 6,207). The trends hold between the filtered data (top) and unfiltered data (bottom). h, Distribution of identified ADs across Arabidopsis TF families. i, Distribution of highest-scoring hits from each TF in each family. j, Distribution of the number of ADs identified per Arabidopsis TF. k, Distribution of number of contiguous hits identified per identified AD. Contiguous hits could be indicative of a short AD contained in neighbouring fragments or of an extended AD for which a subset of residues is sufficient to activate transcription; our data cannot distinguish between these. l, The distribution of hit locations revealed a bias towards the amino and carboxy termini of proteins. All box plots represent the median and interquartile range. The whiskers are 1.5 times the interquartile range.
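The bin-based scoring described for panels b and c can be illustrated with a short sketch. This is a minimal example, assuming per-fragment read counts for each sorting bin; the function names, the use of bin indices 1 to n as weights and the toy counts are hypothetical and are not taken from the authors' pipeline.

```python
import numpy as np

def weighted_bin_score(counts):
    """Weighted mean bin index for each fragment.

    counts: array of shape (n_fragments, n_bins) holding reads per bin.
    Weighting by bin index 1..n_bins is an assumption; the caption does
    not state the exact weights used.
    """
    counts = np.asarray(counts, dtype=float)
    bins = np.arange(1, counts.shape[1] + 1)
    totals = counts.sum(axis=1)
    return (counts * bins).sum(axis=1) / totals

def z_normalize(scores):
    """Z-score normalization: subtract the mean, divide by the s.d."""
    scores = np.asarray(scores, dtype=float)
    return (scores - scores.mean()) / scores.std()

# Toy read counts for three fragments across four bins (hypothetical).
toy_counts = np.array([
    [10, 0, 0, 0],   # all reads in the lowest bin
    [0, 0, 0, 10],   # all reads in the highest bin
    [5, 5, 0, 0],    # reads split between the two lowest bins
])
raw_scores = weighted_bin_score(toy_counts)
z_scores = z_normalize(raw_scores)
```

In this sketch a fragment with no reads in any bin would produce a division by zero, so such fragments would need to be filtered out beforehand.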
Extended Data Fig. 2 PADI hit characterization.
a–d, Box plots showing the number of D + E (a), R + K + H (b), A + I + L + M + V (c) and S + N + P + Q (d) residues of each subtype (n ≥ 625). Letters correspond to the statistical levels of each subtype based on the Tukey–Kramer HSD test with an alpha level of 0.05. e, Scatter plot showing the correlation between the percentage of TFs with at least one AD (defined as a PADI score of greater than or equal to 1 and from an IDR) and the mean of the highest-scoring AD from each TF in a family. The line represents the linear regression and the shaded area represents the 95% confidence interval. f, Box plots showing the net charge of hits from each of the six AD subtypes (n ≥ 625). g, Heat map showing the distribution of Rg values against PADI score for all tested fragments (n = 6,207). We used simulations to examine the radius of gyration (Rg), which is a measure of the volume that an IDR ensemble occupies. Rg is particularly relevant to the AD molecular mechanism, as exposure of interacting side chains is necessary for interaction with the transcriptional machinery. We found that the Rg of our identified ADs occupied a narrow range of radii, as compared to the tested library, raising the possibility that ADs must adopt sufficiently expanded conformations for activity. h, Box plots showing the Rg values of each subtype; Rg was similar across subtypes (n ≥ 625). i, Table describing the PADI fragments tested in the synthetic TFs in Fig. 3h. The fragment key, its Arabidopsis identifier, amino acid sequence, PADI score, and subtype are shown. j, Box plots showing the distribution of PADI scores for each of the six subtypes (n ≥ 625). Stars mark the PADI scores of the hits selected for testing in protoplasts in Fig. 3h and listed in Extended Data Fig. 2i. The tested fragments span the range of PADI scores found in the six subtypes. k, Protein accumulation of synthetic TFs from Fig. 3h. 
Violin plots show the mScarlet-TF values of cells. The black lines mark the mean mScarlet-TF value of each sample (n ≥ 529 cells from 3 independent transfections). l, Protein accumulation of FrankenARF TFs from Fig. 4e. Violin plots show the mNEON-TF values of cells. The black lines mark the mean mNEON-TF value of each sample (n ≥ 2,212 cells from 4 independent transfections). All cells collected for reporter expression were gated on the presence of TF signal when compared to blank cells. Only positive cells were used to collect output data presented in Figs. 3h and 4e. m, Gating strategy for examination of AD activity in protoplasts. Cells were gated based on size and mScarlet (for presence of TF) signal as depicted. Untransfected cells did not display signal above the threshold for mScarlet (left) whereas control cells transfected with the TF lacking an AD (middle) and cells transfected with the TF carrying VP16 (right) were selected for assessment of mNeonGreen (transcriptional output). All box plots represent the median and interquartile range. The whiskers are 1.5 times the interquartile range.
Extended Data Fig. 3 Classification performance of TADA and effect of features on TADA’s prediction performance.
a, The loss of TADA during training and validation. b, TADA’s performance in terms of precision, recall, area under the receiver operating curve (AUC), accuracy, AUPR and F1 score. TADA was trained three distinct times using random peptides20, PADI (referred to as “plant TFs”), and random peptides and PADI combined. c, TADA outperforms all published AD predictors. We compared the performance of TADA with three published AD predictors (ADpred, PADDLE and a composition model)4,10,20. We used a hand-curated list of 599 ADs from 451 human TFs. For each TF, we predicted ADs using each predictor and considered predictions that overlapped a known annotation by > 10 amino acids to be true positives. TADA made the most predictions and had the highest sensitivity and the highest F1 score. d, Z-score normalized SHAP values leading to the selection of 8 features with a z-score above 1. e, Normalized SHAP values ranked from overall most important to least important for fragments scoring above 1 for each of the 6 identified AD subclasses.
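The overlap criterion used in panel c can be sketched as follows. This is an illustration only: the half-open (start, end) interval convention, the function names and the way matches are counted per annotation and per prediction are assumptions, since the caption specifies only the > 10 amino acid overlap threshold.

```python
def overlap_length(a, b):
    """Number of residues shared by two half-open intervals (start, end)."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))

def score_predictions(predicted, known, min_overlap=10):
    """Sensitivity, precision and F1 for predicted versus known AD intervals.

    A known AD counts as recovered, and a prediction counts as a true
    positive, when the two overlap by MORE than min_overlap residues.
    Counting per annotation and per prediction is an assumed convention.
    """
    recovered = sum(
        any(overlap_length(k, p) > min_overlap for p in predicted) for k in known
    )
    true_pos = sum(
        any(overlap_length(p, k) > min_overlap for k in known) for p in predicted
    )
    sensitivity = recovered / len(known) if known else 0.0
    precision = true_pos / len(predicted) if predicted else 0.0
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    return sensitivity, precision, f1

# Hypothetical annotations and predictions for a single TF.
known_ads = [(0, 40), (100, 140)]
predicted_ads = [(20, 60), (300, 340)]
sensitivity, precision, f1 = score_predictions(predicted_ads, known_ads)
```

Here the first prediction overlaps the first annotation by 20 residues and is counted as a true positive, whereas the second prediction overlaps nothing.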
Extended Data Fig. 4 AD subtypes by TF family.
Heat map showing the percentage of hits (defined as a PADI score ≥ 1) from each subtype found in each family in Arabidopsis.
Extended Data Fig. 5 Comparison of PADI hits to previous activators and distribution of hits across the middle regions of clade-A ARF subclades.
a, Hummel et al.11 identified ADs in sixty-eight Arabidopsis TFs that could elicit a transcriptional response when transiently expressed in intact tobacco leaves. We identified fragments that could activate transcription in yeast from fifty-six (82%) of these sixty-eight TFs. We did not identify such fragments from nine TFs for which Hummel et al. demonstrated transcriptional activity, and an additional three TFs were untested in the PADI dataset. For the nine TFs without a hit in our PADI screen, it is possible that either 1) they contain ADs that are active in plant cells but not in yeast or 2) the nearly intact TFs used by Hummel et al. recruited other coactivators in their system (for example, native TFs that contain an AD). b–e, Orange regions were used to define AD regions for alignment in Extended Data Figs. 7 and 8. b, ARF5 clade. c, ARF6 clade. d, ARF7 clade. e, ARF8 clade.
Extended Data Fig. 6 Phylogeny of examined ARFs.
The maximum-likelihood tree was generated using MAFFT alignments of the conserved ARF DBD. Major ARF clades (bright blue, orange and green) and subclades (light blue, orange and green) are annotated. These annotations were used for categorizing sequences in Fig. 4.
Extended Data Fig. 7 ARF7 and ARF5 subclade AD alignments.
a–c, The highest-scoring fragment from each tested ARF within the defined ARF7 and ARF5 AD regions (a, ARF7AD1; b, ARF7AD2; c, ARF5 AD) (orange bars in Extended Data Fig. 5b,d) was used to generate alignments with MAFFT. Alignments were visualized with the ESPript 3.0 webserver. Boxes indicate regions in which 50% of amino acid residues share sequence similarity based on biochemical properties. Bolded residues are the amino acids with shared properties within the region. Black boxes represent sequence conservation.
Extended Data Fig. 8 ARF6 and ARF8 subclade AD alignments.
The highest-scoring fragment from each tested ARF within the defined AD regions (orange bars in Extended Data Fig. 5c,e) was used to generate alignments with MAFFT. Alignments were visualized with the ESPript 3.0 webserver. Boxes indicate regions in which 50% of amino acid residues share sequence similarity based on biochemical properties. Bolded residues are the amino acids with shared properties within the region. Black boxes represent sequence conservation.
Extended Data Fig. 9 MYB family ADs and prediction performance of TADA on the ARF evolution dataset.
a, Histogram of all AD hits (defined as a PADI score of greater than or equal to 1 and from an IDR) from the MYB family. Each bar represents the number of ADs found in each 5% interval of the protein length. These results show that MYB ADs are enriched in the final 15% of tested TFs. b, Representative gating strategy for all PADI libraries. Yeast cells were gated based on size to exclude doublets (R1 and R3). Single cells were then gated to exclude those with mCherry signal below background (R4) when compared to mCherry-negative cells. The mCherry-positive cells were then binned and sorted into twelve populations based on the GFP:mCherry ratio. c, Prediction performance of TADA and the TADAΔARF variant on the PADI test set and the ARF evolution dataset in terms of precision, recall, area under the receiver operating curve (AUC), accuracy, AUPR and F1 score. We further validated the generalization of TADA by retraining it on the original training dataset while withholding the ARF sequences (2,046 of the 70,937 sequences); we called this model TADAΔARF. This approach prevents TADA from memorizing or overfitting ARF sequences. d, Prediction performance of TADA, PADDLE, ADpred and the composition model in terms of area under the receiver operating curve (roc_auc), area under the precision recall curve (pr_auc), accuracy, F1 score, true positive rate (tpr), false positive rate (fpr), precision, and recall when tested on the ARF evolution dataset. Because each of these predictors subdivides sequences differently and used different fragment lengths for training, we compared their performance on full-length protein sequences from the evolution dataset.
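The positional histogram in panel a can be approximated with the sketch below. It assumes that each AD is placed by the midpoint of its fragment divided by the length of the full protein; that definition, the function name and the toy coordinates are hypothetical, as the caption states only that bars cover 5% intervals of protein length.

```python
import numpy as np

def relative_position_histogram(midpoints, protein_lengths, interval=0.05):
    """Count ADs in equal-width intervals of relative protein position.

    midpoints: residue index of each AD fragment midpoint (assumed
    definition of AD position). protein_lengths: length of the parent
    protein for each AD.
    """
    midpoints = np.asarray(midpoints, dtype=float)
    protein_lengths = np.asarray(protein_lengths, dtype=float)
    relative = midpoints / protein_lengths
    n_bins = int(round(1 / interval))
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    counts, _ = np.histogram(relative, bins=edges)
    return counts, edges

# Four hypothetical ADs in 300-residue proteins.
counts, edges = relative_position_histogram(
    midpoints=[6, 273, 291, 297],
    protein_lengths=[300, 300, 300, 300],
)
# Fraction of ADs falling in the final 15% (last three 5% intervals).
tail_fraction = counts[-3:].sum() / counts.sum()
```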
Extended Data Fig. 10 Arabidopsis TFs with identified ADs.
Waffle plots of the 1,918 Arabidopsis TFs analysed. Those with previously identified ADs are marked with a black box in the left waffle plot. The right waffle plot depicts those with activating fragments identified by PADI.
Supplementary information
Supplementary Table 1
This table contains the name, locus, gene family, amino acid sequence, PADI score and biochemical information used to generate Figs. 1e, 1f, 1h–l, 2a, 2b, 3b–g and 4a–c as well as associated extended data figures. AD subtype and fragment type information are also included. Here we label any fragment that had a PADI score ≥ 1 and mean disorder > 0.5 as an “AD” and any fragment that had a PADI score ≥ 1 and mean disorder ≤ 0.5 as “Maybe.” All other fragments, which had a PADI score < 1, are labelled “Not AD.”
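The fragment labels used in this table follow directly from the two thresholds stated above and can be written as a small function. This is a minimal sketch; the function and argument names are hypothetical.

```python
def classify_fragment(padi_score, mean_disorder):
    """Label a fragment using the thresholds stated for Supplementary Table 1.

    "AD":     PADI score >= 1 and mean disorder > 0.5
    "Maybe":  PADI score >= 1 and mean disorder <= 0.5
    "Not AD": PADI score < 1
    """
    if padi_score >= 1:
        return "AD" if mean_disorder > 0.5 else "Maybe"
    return "Not AD"

# Hypothetical (PADI score, mean disorder) pairs.
labels = [classify_fragment(s, d) for s, d in [(1.4, 0.8), (1.0, 0.5), (0.3, 0.9)]]
```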
Supplementary Table 2
This table has names and amino acid sequences of PADI hits by AD subtype.
Supplementary Table 3
This table contains the clade, species, names, amino acid sequences and PADI scores for the ARF evolution dataset used to generate Fig. 5a,c,d.
Supplementary Table 4
This table contains the features used by TADA to predict activation domain activity.
Supplementary Data 1
This file has a graphical representation of PADI data for every Arabidopsis TF tested. PADI (orange) and predicted disorder (white) scores for NLP7 show regions of strong activity in disordered regions as well as in ordered regions that overlap with the known PB1 domain. The orange (PADI = 1) and grey (Metapredict score = 0.5) dashed lines are considered cut-offs for activation and disorder, respectively.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Morffy, N., Van den Broeck, L., Miller, C. et al. Identification of plant transcriptional activation domains. Nature 632, 166–173 (2024). https://doi.org/10.1038/s41586-024-07707-3
DOI: https://doi.org/10.1038/s41586-024-07707-3