Identification of plant transcriptional activation domains

Abstract

Gene expression in Arabidopsis is regulated by more than 1,900 transcription factors (TFs), which have been identified genome-wide by the presence of well-conserved DNA-binding domains. Activator TFs contain activation domains (ADs) that recruit coactivator complexes; however, for nearly all Arabidopsis TFs, we lack knowledge about the presence, location and transcriptional strength of their ADs (ref. 1). To address this gap, here we use a yeast library approach to experimentally identify Arabidopsis ADs on a proteome-wide scale, and find that more than half of the Arabidopsis TFs contain an AD. We annotate 1,553 ADs, the vast majority of which are, to our knowledge, previously unknown. Using the dataset generated, we develop a neural network to accurately predict ADs and to identify sequence features that are necessary to recruit coactivator complexes. We uncover six distinct combinations of sequence features that result in activation activity, providing a framework to interrogate the subfunctionalization of ADs. Furthermore, we identify ADs in the ancient AUXIN RESPONSE FACTOR family of TFs, revealing that AD positioning is conserved in distinct clades. Our findings provide a deep resource for understanding transcriptional activation, a framework for examining function in intrinsically disordered regions and a predictive model of ADs.

Fig. 1: High-throughput tiling of Arabidopsis TFs uncovers thousands of ADs.
Fig. 2: Using AD sequence features to create a predictive model.
Fig. 3: AD subtypes show distinct compositional biases.
Fig. 4: Validation of identified ADs.
Fig. 5: The position of ARF ADs has remained constant over evolutionary time.

Data availability

Library sequencing data have been deposited in the NCBI’s Gene Expression Omnibus (GEO) and are accessible through the GEO series accession number GSE234215. Source data are provided with this paper.

Code availability

All scripts for the neural network training and validation and for making predictions are available on GitHub (https://github.com/LisaVdB/TADA).

References

  1. Strader, L., Weijers, D. & Wagner, D. Plant transcription factors—being in the right place with the right company. Curr. Opin. Plant Biol. 65, 102136 (2022).

  2. O’Malley, R. C. et al. Cistrome and epicistrome features shape the regulatory DNA landscape. Cell 165, 1280–1292 (2016).

  3. Galli, M. et al. The DNA binding landscape of the maize AUXIN RESPONSE FACTOR family. Nat. Commun. 9, 4526 (2018).

  4. Sanborn, A. L. et al. Simple biochemical features underlie transcriptional activation domain diversity and dynamic, fuzzy binding to Mediator. eLife 10, e68068 (2021).

  5. Dyson, H. J. & Wright, P. E. Role of intrinsic protein disorder in the function and interactions of the transcriptional coactivators CREB-binding protein (CBP) and p300. J. Biol. Chem. 291, 6714–6722 (2016).

  6. Ferreira, M. E. et al. Mechanism of transcription factor recruitment by acidic activators. J. Biol. Chem. 280, 21779–21784 (2005).

  7. Hermann, S., Berndt, K. D. & Wright, A. P. How transcriptional activators bind target proteins. J. Biol. Chem. 276, 40127–40132 (2001).

  8. Kim, J. Y. & Chung, H. S. Disordered proteins follow diverse transition paths as they fold and bind to a partner. Science 368, 1253–1257 (2020).

  9. Staller, M. V. et al. Directed mutational scanning reveals a balance between acidic and hydrophobic residues in strong human activation domains. Cell Syst. 13, 334–345 (2022).

  10. Kotha, S. R. & Staller, M. V. Clusters of acidic and hydrophobic residues can predict acidic transcriptional activation domains from protein sequence. Genetics 225, iyad131 (2023).

  11. Hummel, N. F. C. et al. The trans-regulatory landscape of gene networks in plants. Cell Syst. 14, 501–511 (2023).

  12. Staller, M. V. et al. A high-throughput mutational scan of an intrinsically disordered acidic transcriptional activation domain. Cell Syst. 6, 444–455 (2018).

  13. Konishi, M. & Yanagisawa, S. The role of protein–protein interactions mediated by the PB1 domain of NLP transcription factors in nitrate-inducible gene expression. BMC Plant Biol. 19, 90 (2019).

  14. Hahn, S. & Young, E. T. Transcriptional regulation in Saccharomyces cerevisiae: transcription factor regulation and function, mechanisms of initiation, and roles of activators and coactivators. Genetics 189, 705–736 (2011).

  15. Emenecker, R. J., Griffith, D. & Holehouse, A. S. Metapredict: a fast, accurate, and easy-to-use predictor of consensus disorder and structure. Biophys. J. 120, 4312–4319 (2021).

  16. Hope, I. A., Mahadevan, S. & Struhl, K. Structural and functional characterization of the short acidic transcriptional activation region of yeast GCN4 protein. Nature 333, 635–640 (1988).

  17. Hope, I. A. & Struhl, K. Functional dissection of a eukaryotic transcriptional activator protein, GCN4 of yeast. Cell 46, 885–894 (1986).

  18. Mitchell, P. J. & Tjian, R. Transcriptional regulation in mammalian cells by sequence-specific DNA binding proteins. Science 245, 371–378 (1989).

  19. Mahatma, S. et al. Prediction and functional characterization of transcriptional activation domains. In 57th Annual Conference on Information Sciences and Systems (CISS) 1–6 (2023).

  20. Erijman, A. et al. A high-throughput screen for transcription activation domains reveals their sequence features and permits prediction by deep learning. Mol. Cell 78, 890–902 (2020).

  21. Lundberg, S. & Lee, S.-I. A unified approach to interpreting model predictions. In Proc. 31st International Conference on Neural Information Processing Systems 4768–4777 (2017).

  22. Hussain, R. M. F., Sheikh, A. H., Haider, I., Quareshy, M. & Linthorst, H. J. M. Arabidopsis WRKY50 and TGA transcription factors synergistically activate expression of PR1. Front. Plant Sci. 9, 930 (2018).

  23. Li, J. et al. Activation domains for controlling plant gene expression using designed transcription factors. Plant Biotechnol. J. 11, 671–680 (2013).

  24. Cho, S. et al. Analysis of the C-terminal region of Arabidopsis thaliana APETALA1 as a transcription activation domain. Plant Mol. Biol. 40, 419–429 (1999).

  25. Sakuma, Y. et al. Functional analysis of an Arabidopsis transcription factor, DREB2A, involved in drought-responsive gene expression. Plant Cell 18, 1292–1309 (2006).

  26. Kotak, S., Port, M., Ganguli, A., Bicker, F. & von Koskull-Doring, P. Characterization of C-terminal domains of Arabidopsis heat stress transcription factors (Hsfs) and identification of a new signature combination of plant class A Hsfs with AHA and NES motifs essential for activator function and intracellular localization. Plant J. 39, 98–112 (2004).

  27. Yoo, C. Y. et al. Direct photoresponsive inhibition of a p53-like transcription activation domain in PIF3 by Arabidopsis phytochrome B. Nat. Commun. 12, 5614 (2021).

  28. Fernandez-Calvo, P. et al. The Arabidopsis bHLH transcription factors MYC3 and MYC4 are targets of JAZ repressors and act additively with MYC2 in the activation of jasmonate responses. Plant Cell 23, 701–715 (2011).

  29. Tiwari, S. B., Hagen, G. & Guilfoyle, T. The roles of auxin response factor domains in auxin-responsive transcription. Plant Cell 15, 533–543 (2003).

  30. Ulmasov, T., Hagen, G. & Guilfoyle, T. J. Activation and repression of transcription by auxin-response factors. Proc. Natl Acad. Sci. USA 96, 5844–5849 (1999).

  31. Pierre-Jerome, E., Jang, S. S., Havens, K. A., Nemhauser, J. L. & Klavins, E. Recapitulation of the forward nuclear auxin response pathway in yeast. Proc. Natl Acad. Sci. USA 111, 9407–9412 (2014).

  32. Powers, S. K. & Strader, L. C. Regulation of auxin transcriptional responses. Dev. Dyn. 249, 483–495 (2020).

  33. Choi, H. S., Seo, M. & Cho, H. T. Two TPL-binding motifs of ARF2 are involved in repression of auxin responses. Front. Plant Sci. 9, 372 (2018).

  34. Hiratsu, K., Matsui, K., Koyama, T. & Ohme-Takagi, M. Dominant repression of target genes by chimeric repressors that include the EAR motif, a repression domain, in Arabidopsis. Plant J. 34, 733–739 (2003).

  35. Mutte, S. K. et al. Origin and evolution of the nuclear auxin response system. eLife 7, e33399 (2018).

  36. DelRosso, N. et al. Large-scale mapping and mutagenesis of human transcriptional effector domains. Nature 616, 365–372 (2023).

  37. Leydon, A. R. et al. Repression by the Arabidopsis TOPLESS corepressor requires association with the core mediator complex. eLife 10, e66739 (2021).

  38. Holehouse, A. S., Das, R. K., Ahad, J. N., Richardson, M. O. & Pappu, R. V. CIDER: resources to analyze sequence-ensemble relationships of intrinsically disordered proteins. Biophys. J. 112, 16–21 (2017).

  39. Kagale, S. & Rozwadowski, K. EAR motif-mediated transcriptional repression in plants: an underlying mechanism for epigenetic regulation of gene expression. Epigenetics 6, 141–146 (2011).

  40. Boer, D. R. et al. Structural basis for DNA binding specificity by the auxin-dependent ARF transcription factors. Cell 156, 577–589 (2014).

  41. Korasick, D. A. et al. Molecular basis for AUXIN RESPONSE FACTOR protein interaction and the control of auxin response repression. Proc. Natl Acad. Sci. USA 111, 5427–5432 (2014).

  42. Havens, K. A. et al. A synthetic approach reveals extensive tunability of auxin signaling. Plant Physiol. 160, 135–142 (2012).

  43. Hillson, N. J., Rosengarten, R. D. & Keasling, J. D. j5 DNA assembly design automation software. ACS Synth. Biol. 1, 14–21 (2012).

  44. Garcia-Nafria, J., Watson, J. F. & Greger, I. H. IVA cloning: a single-tube universal cloning system exploiting bacterial in vivo assembly. Sci. Rep. 6, 27459 (2016).

  45. Gietz, R. D. & Schiestl, R. H. High-efficiency yeast transformation using the LiAc/SS carrier DNA/PEG method. Nat. Protoc. 2, 31–34 (2007).

  46. Li, H. & Durbin, R. Fast and accurate long-read alignment with Burrows–Wheeler transform. Bioinformatics 26, 589–595 (2010).

  47. Li, H. et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078–2079 (2009).

  48. Harris, C. R. et al. Array programming with NumPy. Nature 585, 357–362 (2020).

  49. Pedregosa, F. et al. Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011).

  50. Jumper, J. et al. Highly accurate protein structure prediction with AlphaFold. Nature 596, 583–589 (2021).

  51. Kobak, D. & Berens, P. The art of using t-SNE for single-cell transcriptomics. Nat. Commun. 10, 5416 (2019).

  52. Pierre-Jerome, E., Wright, R. C. & Nemhauser, J. L. Characterizing auxin response circuits in Saccharomyces cerevisiae by flow cytometry. Methods Mol. Biol. 1497, 271–281 (2017).

  53. Wright, R. C., Bolten, N. & Pierre-Jerome, E. flowTime: annotation and analysis of biological dynamical systems using flow cytometry. R version 1.24.0 https://www.bioconductor.org/packages/release/bioc/html/flowTime.html (2023).

  54. White, S. et al. FlowKit: a Python toolkit for integrated manual and automated cytometry analysis workflows. Front. Immunol. 12, 768541 (2021).

  55. Lotthammer, J. M., Ginell, G. M., Griffith, D., Emenecker, R. J. & Holehouse, A. S. Direct prediction of intrinsically disordered protein conformational properties from sequence. Nat. Methods 21, 465–476 (2024).

Acknowledgements

This work was supported by the National Science Foundation (PGRP BIO-2112056 to L.C.S., PGRP BIO-2112057 to M.V.S. and PGRP BIO-2112058 to R.S.), the NSF Postdoctoral Research Program (IOS-1907098 to N.M.) and the National Institutes of Health (R35 GM136338 to L.C.S.).

Author information

Contributions

N.M., M.V.S., R.S. and L.C.S. designed the study. N.M. and M.V.S. designed the pilot tiling libraries. N.M. designed the PADI tiling and ARF evolution tiling libraries. C.M. and N.M. cloned and integrated libraries into yeast. L.V.d.B., S.M., V.P., A.W. and R.S. designed and implemented the TADA network. R.J.E. and A.S.H. performed biophysical simulations and advised on Metapredict. J.A.B. and R.C.W. assessed ARF7 AD activity in the yeast synthetic auxin signalling system. N.M., T.M.L., K.S.-F., E.G.W., S.P. and L.C.S. tested ADs in protoplasts. S.R.K. and A.L. examined human TFs with TADA. N.M. and L.V.d.B. wrote the manuscript, with important contributions from R.S. and L.C.S., and contributions from all other authors. L.C.S. supervised the project, with contributions from R.S. and M.V.S.

Corresponding author

Correspondence to Lucia C. Strader.

Ethics declarations

Competing interests

L.C.S. is on the science advisory board of Prose Foods. R.S. is founder of Raleigh Biosciences. The remaining authors declare no competing interests.

Peer review

Peer review information

Nature thanks Jennifer Brophy and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data figures and tables

Extended Data Fig. 1 PADI workflow and quality control.

a, Extended depiction of the PADI assay. 1) DNA encoding 40-amino-acid fragments is synthesized and 2) cloned into a synthetic TF backbone in bulk. 3) Confirmed synthetic TF libraries are cloned into the URA3 locus of DHY211 yeast cells and positive clones are selected by G418 and 5-FOA resistance. 4) Positively cloned yeast TF libraries are mated to the MY435 reporter strain (ref. 12). Positively mated clones are selected by G418 (library) and CloNAT (reporter) resistance. 5) Pooled mated libraries and controls are grown overnight and subcultured 1:5 with 1 µM beta-estradiol to induce synthetic TF localization to the nucleus. 6) After 4 h of beta-estradiol treatment, mated yeast libraries are sorted into bins based on relative levels of GFP (reporter) to mCherry (synthetic TF) to determine AD activity. 7) Populations from each bin are grown overnight and sequenced to determine the distribution of tested fragments across bins. b,c, Correlation between PADI scores from all Arabidopsis TF libraries and scores from a pooled library in which cells were sorted on median GFP (b) or mCherry (c) values. Each fragment was given a GFP or mCherry score based on the weighted mean of its appearance across all GFP or mCherry bins; scores were then Z-score normalized, consistent with how the PADI score was generated. The blue line represents the linear correlation of the data. There is a positive correlation between PADI score and GFP score, but not between PADI and mCherry scores. These results show that the PADI score is a robust measure of transcriptional activity regardless of the abundance of any TF. d, Scatter plot showing the correlation between two sorts of PADI library 3. Replicate 1 is included in all analyses. The blue line represents the linear regression of the two datasets. The linear regression model has an r-value of 0.657. e, Violin plots showing the PADI scores of four positive AD controls (n = 10 independent library experiments). The controls are found in all 10 PADI libraries and were consistently positive across libraries. The violin plot of Arabidopsis fragments (n = 69,347 fragments from 10 libraries) is also provided as a comparison. Box plots within the violin plot show the interquartile range and the median with whiskers that are 1.5 times the interquartile range. f, Box plots showing the PADI scores of tested control fragments across the 10 PADI libraries. Each point is the PADI score of the tested fragment and the colour of each point corresponds to one of the 10 PADI libraries (n = 10 independent experiments). All box plots show the interquartile range and the median. Whiskers are 1.5 times the interquartile range. g, Comparison of panels h–l from main text Fig. 1. The data from Fig. 1h–l (top; n = 3,576) are shown above the same analysis conducted on all positive fragments regardless of mean disorder (bottom; n = 6,207). The trends hold between the filtered data (top) and unfiltered data (bottom). h, Distribution of identified ADs across Arabidopsis TF families. i, Distribution of highest-scoring hits from each TF in each family. j, Distribution of the number of ADs identified per Arabidopsis TF. k, Distribution of the number of contiguous hits per identified AD. Contiguous hits could be indicative of a short AD contained in neighbouring fragments or of an extended AD for which a subset of residues is sufficient to activate transcription; our data cannot distinguish between these. l, The distribution of hit locations revealed a bias towards the amino and carboxy termini of proteins. All box plots represent the median and interquartile range. The whiskers are 1.5 times the interquartile range.
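The bin-based scoring described for panels b,c (a weighted mean across sort bins followed by Z-score normalization) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: it assumes bins are numbered 1 to n in order of increasing signal and that raw read counts per bin serve as the weights. The function names and the toy count matrix are hypothetical.

```python
import numpy as np


def weighted_bin_scores(counts):
    """Weighted mean bin index per fragment.

    counts: array of shape (n_fragments, n_bins) holding read counts.
    Assumption: bins are indexed 1..n_bins in order of increasing signal.
    """
    counts = np.asarray(counts, dtype=float)
    totals = counts.sum(axis=1)
    if np.any(totals == 0):
        raise ValueError("every fragment needs at least one read")
    bin_index = np.arange(1, counts.shape[1] + 1)
    return counts @ bin_index / totals


def z_normalize(scores):
    """Z-score normalization across all fragments (population std)."""
    scores = np.asarray(scores, dtype=float)
    spread = scores.std()
    if spread == 0:
        raise ValueError("scores have zero variance")
    return (scores - scores.mean()) / spread


# Hypothetical counts: 3 fragments sorted into 3 bins.
toy_counts = np.array([[10, 0, 0], [2, 6, 2], [0, 1, 9]])
toy_scores = z_normalize(weighted_bin_scores(toy_counts))
```

After normalization a score of 1 means one standard deviation above the library mean, which is how a fixed cut-off such as "PADI score ≥ 1" can be applied across libraries.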

Source Data

Extended Data Fig. 2 PADI hit characterization.

a–d, Box plots showing the number of D + E (a), R + K + H (b), A + I + L + M + V (c) and S + N + P + Q (d) residues in each subtype (n ≥ 625). Letters correspond to the statistical levels of each subtype based on the Tukey–Kramer HSD test with an alpha level of 0.05. e, Scatter plot showing the correlation between the percentage of TFs with at least one AD (defined as a PADI score of greater than or equal to 1 and from an IDR) and the mean of the highest-scoring AD from each TF in a family. The line represents the linear regression and the shaded area represents the 95% confidence interval. f, Box plots showing the net charge of hits from each of the six AD subtypes (n ≥ 625). g, Heat map showing the distribution of Rg values against PADI score for all tested fragments (n = 6,207). We used simulations to examine the radius of gyration (Rg), which is a measure of the volume that an IDR ensemble occupies. Rg is particularly relevant to the AD molecular mechanism, as exposure of interacting side chains is necessary for interaction with the transcriptional machinery. We found that the Rg of our identified ADs occupied a narrow range of radii, as compared to the tested library, raising the possibility that ADs must adopt sufficiently expanded conformations for activity. h, Box plots showing the Rg values of each subtype; Rg was similar across subtypes (n ≥ 625). i, Table describing the PADI fragments tested in the synthetic TFs in Fig. 3h. The fragment key, its Arabidopsis identifier, amino acid sequence, PADI score and subtype are shown. j, Box plots showing the distribution of PADI scores for each of the six subtypes (n ≥ 625). The stars mark the PADI scores of the fragments selected for testing in protoplasts (Fig. 3h; listed in Extended Data Fig. 2i). The tested fragments span the range of PADI scores found in the six subtypes. k, Protein accumulation of synthetic TFs from Fig. 3h. Violin plots show the mScarlet-TF values of cells. The black lines mark the mean mScarlet-TF value of each sample (n ≥ 529 cells from 3 independent transfections). l, Protein accumulation of FrankenARF TFs from Fig. 4e. Violin plots show the mNEON-TF values of cells. The black lines mark the mean mNEON-TF value of each sample (n ≥ 2,212 cells from 4 independent transfections). All cells collected for reporter expression were gated on the presence of TF signal when compared to blank cells. Only positive cells were used to collect output data presented in Figs. 3h and 4e. m, Gating strategy for examination of AD activity in protoplasts. Cells were gated based on size and mScarlet (for presence of TF) signal as depicted. Untransfected cells did not display signal above the threshold for mScarlet (left) whereas control cells transfected with the TF lacking an AD (middle) and cells transfected with the TF carrying VP16 (right) were selected for assessment of mNeonGreen (transcriptional output). All box plots represent the median and interquartile range. The whiskers are 1.5 times the interquartile range.
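The residue-group counts in panels a–d and the net charge in panel f are simple sequence statistics. The sketch below shows one way to compute them for a single fragment. The residue groups come from the caption; the net-charge rule (K and R as +1, D and E as −1, histidine treated as neutral) is an assumption, because the caption does not say how charge was calculated. The function names and the example sequence are hypothetical.

```python
# Residue groups taken from the panel descriptions (a-d).
RESIDUE_GROUPS = {
    "D+E": "DE",
    "R+K+H": "RKH",
    "A+I+L+M+V": "AILMV",
    "S+N+P+Q": "SNPQ",
}


def composition_counts(sequence):
    """Count residues of each group in one amino acid sequence."""
    sequence = sequence.upper()
    return {
        name: sum(sequence.count(residue) for residue in residues)
        for name, residues in RESIDUE_GROUPS.items()
    }


def net_charge(sequence):
    """Net charge under a simple assumed rule: K/R = +1, D/E = -1, H = 0."""
    sequence = sequence.upper()
    positive = sequence.count("K") + sequence.count("R")
    negative = sequence.count("D") + sequence.count("E")
    return positive - negative


# Hypothetical 10-residue sequence, not taken from the dataset.
example = "DDEELLSSKR"
example_counts = composition_counts(example)
example_charge = net_charge(example)
```

For the example sequence this gives four acidic residues, two basic, two aliphatic and two from the S + N + P + Q group, with a net charge of −2.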

Source Data

Extended Data Fig. 3 Classification performance of TADA and effect of features on TADA’s prediction performance.

a, The loss of TADA during training and validation. b, TADA’s performance in terms of precision, recall, area under the receiver operating curve (AUC), accuracy, AUPR and F1 score. TADA was trained three separate times, using random peptides (ref. 20), PADI data (referred to as “plant TFs”), or random peptides and PADI data combined. c, TADA outperforms published AD predictors. We compared the performance of TADA with that of three published AD predictors (ADpred, PADDLE and a composition model; refs. 4,10,20). We used a hand-curated list of 599 ADs from 451 human TFs. For each TF, we predicted ADs using each predictor and considered predictions that overlapped a known annotation by > 10 amino acids to be true positives. TADA made the most predictions and had the highest sensitivity and the highest F1 score. d, Z-score normalized SHAP values leading to the selection of 8 features with a z-score above 1. e, Normalized SHAP values ranked from overall most important to least important for fragments scoring above 1 for each of the 6 identified AD subclasses.
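The overlap criterion in panel c (a prediction counts as a true positive if it overlaps a known annotation by more than 10 amino acids) can be illustrated with a short sketch. The threshold comes from the caption; everything else is an assumption made for illustration: intervals are half-open (start, end) residue coordinates, precision is computed over predictions, and sensitivity is computed over known annotations. The function names and coordinates are hypothetical.

```python
def overlap_length(a, b):
    """Number of residues shared by two half-open intervals (start, end)."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))


def evaluate_predictions(predicted, known, min_overlap=10):
    """Return (precision, sensitivity, F1) for one TF.

    A prediction is a true positive if it overlaps any known AD by more
    than min_overlap residues; a known AD is recovered if any prediction
    overlaps it by more than min_overlap residues.
    """
    true_predictions = sum(
        any(overlap_length(p, k) > min_overlap for k in known) for p in predicted
    )
    recovered_known = sum(
        any(overlap_length(p, k) > min_overlap for p in predicted) for k in known
    )
    precision = true_predictions / len(predicted) if predicted else 0.0
    sensitivity = recovered_known / len(known) if known else 0.0
    denominator = precision + sensitivity
    f1 = 2 * precision * sensitivity / denominator if denominator else 0.0
    return precision, sensitivity, f1


# Hypothetical coordinates for a single TF.
known_ads = [(100, 140), (300, 350)]
predicted_ads = [(90, 125), (200, 240), (335, 380)]
precision, sensitivity, f1 = evaluate_predictions(predicted_ads, known_ads)
```

In this toy case two of the three predictions overlap a known AD by more than 10 residues, and both known ADs are recovered, so precision is 2/3, sensitivity is 1 and F1 is 0.8.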

Source Data

Extended Data Fig. 4 AD subtypes by TF family.

Heat map showing the percentage of hits (defined as a PADI score ≥ 1) from each subtype found in each family in Arabidopsis.

Source Data

Extended Data Fig. 5 Comparison of PADI hits to previous activators and distribution of hits across the middle regions of clade-A ARF subclades.

a, Hummel et al. (ref. 11) identified ADs in sixty-eight Arabidopsis TFs that could elicit a transcriptional response when transiently expressed in intact tobacco leaves. We identified fragments that could activate transcription in yeast from fifty-six (82%) of the sixty-eight TFs identified by Hummel et al. We did not identify fragments that could elicit yeast-based transcription from nine TFs in which Hummel et al. demonstrated transcriptional activity. An additional three TFs were untested in the PADI dataset. For the nine TFs in which Hummel et al. found activation activity but our PADI screen identified no hit, it is possible that either 1) they contain ADs that are active in plant cells but not in yeast or 2) the nearly intact TFs used by Hummel et al. recruited other coactivators in their system (for example, native TFs that contain an AD). b–e, Orange regions were used to define AD regions for alignment in Extended Data Figs. 7 and 8. b, ARF5 clade. c, ARF6 clade. d, ARF7 clade. e, ARF8 clade.

Source Data

Extended Data Fig. 6 Phylogeny of examined ARFs.

The maximum-likelihood tree was generated using MAFFT alignments of the conserved ARF DBD. Major ARF clades (bright blue, orange and green) and subclades (light blue, orange and green) are annotated. These annotations were used for categorizing sequences in Fig. 4.

Extended Data Fig. 7 ARF7 and ARF5 subclade AD alignments.

a–c, The highest-scoring fragment from each tested ARF within the defined ARF7 and ARF5 AD regions (a, ARF7 AD1; b, ARF7 AD2; c, ARF5 AD) (orange bars in Extended Data Fig. 5b,d) was used to generate alignments with MAFFT. Alignments were visualized with the ESPript 3.0 webserver. Boxes indicate regions in which 50% of amino acid residues share sequence similarity based on biochemical properties. Bolded residues are the amino acids with shared properties within the region. Black boxes represent sequence conservation.

Extended Data Fig. 8 ARF6 and ARF8 subclade AD alignments.

The highest-scoring fragment from each tested ARF within the defined AD regions (orange bars in Extended Data Fig. 5c,e) was used to generate alignments with MAFFT. Alignments were visualized with the ESPript 3.0 webserver. Boxes indicate regions in which 50% of amino acid residues share sequence similarity based on biochemical properties. Bolded residues are the amino acids with shared properties within the region. Black boxes represent sequence conservation.

Extended Data Fig. 9 MYB family ADs and prediction performance of TADA on the ARF evolution dataset.

a, Histogram of all AD hits (defined as a PADI score of greater than or equal to 1 and from an IDR) from the MYB family. Each bar represents the number of ADs found in each 5% interval of the protein length. These results show that MYB ADs are enriched in the final 15% of tested TFs. b, Representative gating strategy for all PADI libraries. Yeast cells were gated based on size to exclude doublets (R1 and R3). Single cells were then gated to exclude those with mCherry signal below background (R4) when compared to mCherry-negative cells. The mCherry-positive cells were then binned and sorted into twelve populations based on the GFP:mCherry ratio. c, Prediction performance of TADA and the TADAΔARF variant. TADA performance on the PADI data test set and the ARF evolution dataset in terms of precision, recall, area under the receiver operating curve (AUC), accuracy, AUPR and F1 score. We further validated the generalization of TADA by retraining it on the original training dataset but withholding the ARF sequences (2,046 of the 70,937 sequences); we called this model TADAΔARF. This approach prevents TADA from memorizing or overfitting ARF sequences. d, Prediction performance of TADA, PADDLE, ADpred and the composition model in terms of area under the receiver operating curve (roc_auc), area under the precision–recall curve (pr_auc), accuracy, F1 score, true positive rate (tpr), false positive rate (fpr), precision and recall when tested on the ARF evolution dataset. Because each of these predictors subdivides sequences differently and used different fragment lengths for training, we compared their performance on full-length protein sequences from the evolution dataset.
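The positional histogram in panel a assigns each AD hit to a 5% interval of the protein length. A minimal sketch of that binning is given below. It assumes that each hit is located by the midpoint of its fragment, which the caption does not specify; the function name and the example coordinates are hypothetical.

```python
def relative_position_histogram(hits, bin_width_pct=5):
    """Count hits per interval of relative protein length.

    hits: iterable of (fragment_start, fragment_end, protein_length).
    Assumption: a hit is placed at the midpoint of its fragment.
    """
    n_bins = 100 // bin_width_pct
    counts = [0] * n_bins
    for start, end, protein_length in hits:
        midpoint = (start + end) / 2
        # Floor division avoids rounding surprises at interval boundaries;
        # min() keeps a midpoint at the very C terminus in the last interval.
        index = min(int(midpoint * n_bins // protein_length), n_bins - 1)
        counts[index] += 1
    return counts


# Hypothetical hits on a 400-residue protein.
example_hits = [(10, 50, 400), (340, 380, 400), (360, 400, 400)]
histogram = relative_position_histogram(example_hits)
c_terminal_hits = sum(histogram[-3:])  # hits in the final 15% of the protein
```

Summing the last three intervals, as in `c_terminal_hits`, gives the count of hits in the final 15% of the protein, the region highlighted for the MYB family.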

Extended Data Fig. 10 Arabidopsis TFs with identified ADs.

Waffle plots of the 1,918 Arabidopsis TFs analysed. Those with previously identified ADs are marked with a black box in the left waffle plot. The right waffle plot depicts those with activating fragments identified by PADI.

Supplementary information

Reporting Summary

Supplementary Table 1

This table contains name, locus, gene family, amino acid sequence, PADI score and biochemical information used to generate Figs. 1e, 1f, 1h–l, 2a, 2b, 3b–g and 4a–c as well as associated extended figures. AD subtype and fragment type information are also included. Here, any fragment with a PADI score ≥ 1 and mean disorder > 0.5 is called an “AD”, and any fragment with a PADI score ≥ 1 and mean disorder ≤ 0.5 is called “Maybe.” All other fragments (PADI score < 1) are “Not AD.”
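The fragment-type labels defined above reduce to two thresholds, which can be written as a small function. The cut-offs (PADI score of 1, mean disorder of 0.5) and the three labels come from the table description; the function and parameter names are hypothetical.

```python
def classify_fragment(padi_score, mean_disorder):
    """Label a fragment using the cut-offs stated for Supplementary Table 1."""
    if padi_score >= 1:
        # Active fragments are split by predicted disorder.
        return "AD" if mean_disorder > 0.5 else "Maybe"
    return "Not AD"


# Hypothetical (PADI score, mean disorder) pairs.
labels = [
    classify_fragment(1.8, 0.72),
    classify_fragment(1.2, 0.50),
    classify_fragment(0.4, 0.90),
]
```

Note that a mean disorder of exactly 0.5 falls into “Maybe”, because the “AD” label requires a value strictly greater than 0.5.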

Supplementary Table 2

This table has names and amino acid sequences of PADI hits by AD subtype.

Supplementary Table 3

This table contains the clade, species, names, amino acid sequences and PADI scores for the ARF evolution dataset used to generate Fig. 5a,c,d.

Supplementary Table 4

This table contains the features used by TADA to predict activation domain activity.

Supplementary Data 1

This file has a graphical representation of PADI data for every Arabidopsis TF tested. PADI (orange) and predicted disorder (white) scores for NLP7 show strong activity in disordered regions as well as in ordered regions that overlap with the known PB1 domain. The orange (PADI = 1) and grey (Metapredict score = 0.5) dashed lines are considered cut-offs for activation and disorder, respectively.

Source data

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Morffy, N., Van den Broeck, L., Miller, C. et al. Identification of plant transcriptional activation domains. Nature 632, 166–173 (2024). https://doi.org/10.1038/s41586-024-07707-3

  • DOI: https://doi.org/10.1038/s41586-024-07707-3
