Extended Data Fig. 1: PADI workflow and quality control.
From: Identification of plant transcriptional activation domains

a, Extended depiction of the PADI assay. 1) DNA sequences encoding 40-amino-acid fragments are synthesized and 2) cloned into a synthetic TF backbone in bulk. 3) Confirmed synthetic TF libraries are cloned into the URA3 locus of DHY211 yeast cells and positive clones are selected by G418 and 5-FOA resistance. 4) Positively cloned yeast TF libraries are mated to the MY435 reporter strain (ref. 12). Positively mated clones are selected by G418 (library) and CloNAT (reporter) resistance. 5) Pooled mated libraries and controls are grown overnight and subcultured 1:5 with 1 µM beta-estradiol to induce synthetic TF localization to the nucleus. 6) After 4 h of beta-estradiol treatment, mated yeast libraries are sorted into bins based on relative levels of GFP (reporter) to mCherry (synthetic TF) to determine AD activity. 7) Populations from each bin are grown overnight and sequenced to determine the distribution of tested fragments across bins. b,c, Correlation between PADI scores from all Arabidopsis TF libraries and scores from a pooled library in which cells were sorted on median GFP (b) or mCherry (c) values. Each fragment was given a GFP or mCherry score based on the weighted mean of its appearance across all GFP or mCherry bins; this score was then Z-score normalized, consistent with how the PADI score was generated. The blue line represents the linear regression of the data. PADI score correlates positively with GFP score but not with mCherry score. These results show that the PADI score is a robust measure of transcriptional activity regardless of the abundance of any TF. d, Scatter plot showing the correlation between two sorts of PADI library 3. Replicate 1 is included in all analyses. The blue line represents the linear regression of the two datasets (r = 0.657). e, Violin plots showing the PADI scores of four positive AD controls (n = 10 independent library experiments). 
The controls are present in all 10 PADI libraries and were consistently positive across libraries. A violin plot of Arabidopsis fragments (n = 69,347 fragments from 10 libraries) is provided for comparison. Box plots within the violin plots show the median and interquartile range, with whiskers extending to 1.5 times the interquartile range. f, Box plots showing the PADI scores of tested control fragments across the 10 PADI libraries. Each point is the PADI score of a tested fragment, and the colour of each point indicates which of the 10 PADI libraries it came from (n = 10 independent experiments). All box plots show the median and interquartile range; whiskers extend to 1.5 times the interquartile range. g, Comparison of panels h–l from main text Fig. 1. The data from Fig. 1h–l (top; n = 3,576) are shown above the same analysis conducted on all positive fragments regardless of mean disorder (bottom; n = 6,207). The trends hold between the filtered (top) and unfiltered (bottom) data. h, Distribution of identified ADs across Arabidopsis TF families. i, Distribution of highest-scoring hits from each TF in each family. j, Distribution of the number of ADs identified per Arabidopsis TF. k, Distribution of the number of contiguous hits per identified AD. Contiguous hits could indicate either a short AD contained in neighbouring fragments or an extended AD for which a subset of residues is sufficient to activate transcription; our data cannot distinguish between these. l, The distribution of hit locations revealed a bias towards the amino and carboxy termini of proteins. All box plots represent the median and interquartile range; whiskers extend to 1.5 times the interquartile range.
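
The scoring described for panels b,c (a weighted mean of each fragment's appearance across sorted bins, followed by Z-score normalization) can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names, the choice of bin values 1..n, the use of raw read counts as the measure of appearance, the population standard deviation and the example counts are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def weighted_bin_score(bin_counts, bin_values=None):
    """Weighted mean bin position of one fragment.

    bin_counts: abundance of the fragment in each sorted bin (assumed here
                to be read counts; the legend does not specify the unit).
    bin_values: numeric value assigned to each bin (assumed 1..n if omitted).
    """
    if bin_values is None:
        bin_values = range(1, len(bin_counts) + 1)
    total = sum(bin_counts)
    if total == 0:
        raise ValueError("fragment was not observed in any bin")
    return sum(c * v for c, v in zip(bin_counts, bin_values)) / total


def z_normalize(scores):
    """Z-score normalize a {fragment: score} mapping across all fragments."""
    values = list(scores.values())
    mu = mean(values)
    sigma = pstdev(values)  # population SD; an assumption, not from the source
    if sigma == 0:
        raise ValueError("all scores are identical; Z-score is undefined")
    return {name: (s - mu) / sigma for name, s in scores.items()}


# Hypothetical counts for three fragments across four bins (low -> high signal).
counts = {
    "frag_low": [90, 10, 0, 0],
    "frag_mid": [0, 50, 50, 0],
    "frag_high": [0, 0, 10, 90],
}
raw = {name: weighted_bin_score(c) for name, c in counts.items()}
z = z_normalize(raw)

for name in counts:
    print(f"{name}: weighted mean = {raw[name]:.2f}, Z-score = {z[name]:+.2f}")
```

In this toy example, a fragment found mostly in the highest bin receives a positive Z-score and one found mostly in the lowest bin a negative one. Applying the same two steps to GFP-sorted and mCherry-sorted bins would give the GFP and mCherry scores that panels b and c compare with the PADI score.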