Extended Data Fig. 6: Inferring redundant gene function and physical interactions from co-expression analysis.
From: Mass-spectrometry-based draft of the Arabidopsis proteome

a, Scatter plot of Pearson’s correlation coefficients (r), used as a measure of co-expression across tissues, for all pairs of proteins (x axis) and all pairs of transcripts (y axis) (core dataset only, n = 5,043), with marginal histograms. Colours denote the log10-normalized STRING scores of individual gene pairs as a measure of known or predicted direct (physical) or indirect (functional) associations. Gene pairs whose transcripts, proteins or both are strongly co-expressed are more strongly related (physically or functionally) than pairs that are not. b, Co-expression analysis of duplicated genes (pairs had to be detected in at least 10 matching tissues to be included in the analysis). The density plots show the distribution of Pearson’s correlation coefficients (r) of co-expressed transcripts (grey) or proteins (blue) for genes that arose by whole-genome duplication (WGD), local duplication or transposon-mediated duplication. Randomly selected gene pairs are shown as a control. Medians are given and shown as dotted lines. Duplicated genes are substantially co-expressed, indicating that they probably have redundant functions. c, Left, protein-level Pearson’s correlation coefficients (r) (from b) for all duplicate gene pairs (WGD, local, transposed) plotted against the protein abundance ratio of each pair (averaged across 30 tissues; Methods). Blue arrows mark examples of duplicate pairs with a high or a low ratio of protein production. Right, example of tissue-resolved proportions of protein intensity (top-3; Methods) for the duplicate pair MAC5A and MAC5B. In every tissue, MAC5A is expressed at much higher levels than MAC5B. Tissues are coloured as in Fig. 1. d, Top, ranked protein abundance ratios for selected duplicate pairs (mean ± s.d.; n = 30). Bottom, phenotypic effects (+) in the loss-of-function mutant of either duplicate 1 or duplicate 2.
Minus symbols denote absence of a phenotypic effect. Asymmetric protein production within duplicate pairs can be associated with a phenotype in the loss-of-function mutant of the more highly expressed duplicate, indicating a dominant functional role for that protein. Blue arrows highlight MAC5A–MAC5B and PHB3–PHB4 as examples. e, Inference of physical protein–protein interactions from co-expression data. Distribution of pairwise Pearson’s correlation coefficients (r), calculated across at least 10 tissues, for proteins that are subunits of selected protein complexes. A cut-off of r > 0.5 (shaded grey) was chosen for selecting proteins for subsequent analysis, to ensure that proteins in well-characterized protein complexes are retained. CSN, CONSTITUTIVE PHOTOMORPHOGENESIS9 SIGNALOSOME; CESA, CELLULOSE SYNTHASE. f, Recovery of annotated protein–protein interactions by co-expression analysis. Distribution of Pearson’s correlation coefficients (r) of pairs of transcripts (grey) or proteins (blue) annotated to interact physically in the AtPIN database (ref. 33) (pairs had to be detected in at least 10 matching tissues to be included in the analysis). Distributions are also shown for subsets of the AtPIN database: interactions detected by the yeast two-hybrid (Y2H) method, by affinity purification–mass spectrometry (AP–MS) or by both. Protein pairs with r > 0.5 are shaded blue. Dotted lines denote median values. Co-expression recovers only a minority of annotated physical interactions, and interactions supported by more than one line of experimental evidence tend to show stronger co-expression.
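The pairwise co-expression measure behind panels a, b, e and f can be sketched in Python. This is a minimal illustration, not the authors' pipeline. It assumes that each gene has a per-tissue abundance profile (for example, log-transformed intensities) with None for tissues where it was not detected, and it applies the legend's rule that a pair must be detected in at least 10 matching tissues, plus the r > 0.5 cut-off from panel e. The function names, the toy profiles and the choice to drop pairs with a constant profile are hypothetical.

```python
import math
from statistics import median

MIN_SHARED_TISSUES = 10  # pairs must be detected in at least 10 matching tissues
R_CUTOFF = 0.5           # cut-off used in panel e


def pearson(x, y):
    """Pearson's r for two equal-length sequences; None if either is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None  # r is undefined for a constant profile
    return sxy / math.sqrt(sxx * syy)


def pair_correlation(profile_a, profile_b, min_shared=MIN_SHARED_TISSUES):
    """r across tissues in which both genes were detected (None marks 'not detected').

    Returns None when the pair is seen in fewer than `min_shared` matching
    tissues, i.e. the pair is excluded from the analysis.
    """
    shared = [(a, b) for a, b in zip(profile_a, profile_b)
              if a is not None and b is not None]
    if len(shared) < min_shared:
        return None
    xs, ys = zip(*shared)
    return pearson(xs, ys)


def summarize(profiles, pairs, cutoff=R_CUTOFF):
    """Median r and fraction of retained pairs with r > cutoff."""
    rs = [r for r in (pair_correlation(profiles[a], profiles[b]) for a, b in pairs)
          if r is not None]
    if not rs:
        return {"n_pairs": 0, "median_r": None, "frac_above_cutoff": None}
    return {
        "n_pairs": len(rs),
        "median_r": median(rs),
        "frac_above_cutoff": sum(r > cutoff for r in rs) / len(rs),
    }


# Toy per-tissue profiles for 12 tissues (hypothetical values, not data from the paper).
profiles = {
    "geneA": [float(i) for i in range(1, 13)],
    "geneB": [2.0 * i for i in range(1, 13)],         # perfectly co-expressed with geneA
    "geneC": [13.0 - i for i in range(1, 13)],        # perfectly anti-correlated with geneA
    "geneD": [1.0, 2.0, 3.0, 4.0, 5.0] + [None] * 7,  # detected in only 5 tissues
}
pairs = [("geneA", "geneB"), ("geneA", "geneC"), ("geneA", "geneD")]
print(summarize(profiles, pairs))
```

In this toy run, the geneA–geneD pair is dropped because the two genes share only five detected tissues, so the median and the fraction above the cut-off are computed from the two remaining pairs.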
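The protein abundance ratio in panels c and d is defined in the paper's Methods, which are not reproduced in this legend. As a hypothetical stand-in, the sketch below takes the per-tissue intensity of duplicate 1 divided by that of duplicate 2, skips tissues where either protein is missing, and reports the mean and standard deviation across tissues, mirroring the "mean ± s.d." summary in panel d. The ratio formula, the function names and the rule for naming the dominant duplicate are all assumptions.

```python
from statistics import mean, stdev


def abundance_ratio(intensity_1, intensity_2):
    """Hypothetical ratio: per-tissue intensity of duplicate 1 / duplicate 2,
    summarized as (mean, s.d.) over tissues where both proteins were quantified.
    The paper's actual definition is in its Methods and may differ.
    """
    # Skip tissues where either protein is missing (None) or has zero intensity.
    ratios = [a / b for a, b in zip(intensity_1, intensity_2) if a and b]
    if len(ratios) < 2:
        raise ValueError("need at least two tissues with both proteins quantified")
    return mean(ratios), stdev(ratios)


def dominant_duplicate(intensity_1, intensity_2):
    """Name the duplicate that is more abundant on average (hypothetical rule)."""
    ratio_mean, _ = abundance_ratio(intensity_1, intensity_2)
    if ratio_mean > 1:
        return "duplicate 1"
    if ratio_mean < 1:
        return "duplicate 2"
    return "neither"


# Toy intensities for three tissues (hypothetical values).
m, s = abundance_ratio([8.0, 6.0, 10.0], [2.0, 2.0, 2.0])
print(f"ratio = {m:.2f} ± {s:.2f}")
print(dominant_duplicate([8.0, 6.0, 10.0], [2.0, 2.0, 2.0]))
```

A per-tissue ratio keeps the tissue-to-tissue spread visible, which is what the error bars in panel d summarize; a log-scale ratio would be a reasonable alternative when abundances differ by orders of magnitude.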