Fig. 4: Global and stage-specific patterns of transcript–protein correlations across Bombyx mori development.

a Heatmap displaying the Pearson correlation coefficient between transcript and protein expression (n = 6032 genes quantified in both the transcriptome and the proteome). b Violin plot showing the distribution of per-gene Pearson correlation coefficients between transcript and protein abundance (n = 5271 genes with dynamic expression at the transcript and/or protein level). Black, blue and gray dots represent negative (Pearson’s R < 0, P value < 0.05), positive (Pearson’s R > 0, P value < 0.05) and no transcript–protein correlation (P value ≥ 0.05), respectively. The box plot displays the distribution of Pearson’s R; the horizontal line represents the median and the upper and lower edges of the box depict the interquartile range (IQR). c Significantly enriched Gene Ontology (GO) terms and Pfam domains for genes with negative, no (zero) and positive transcript–protein correlation (Fisher’s exact test, FDR < 0.05). Color corresponds to enrichment −log10(FDR), and the circle size represents the number of genes per GO term or Pfam domain. The top five most significantly enriched terms for genes with negative, no and positive transcript–protein correlations are depicted (all terms in Supplementary Data S9). d Protein (dashed line) and transcript (solid line) expression levels of genes assigned to clusters with highly positive (cluster 1) or negative (cluster 9) transcript–protein correlations, generated by unsupervised SOM clustering (cluster-wise median z-score of mean CPM values and mean LFQ values). The box plots show the distribution of normalized expression levels; the black dot or triangle represents the median and the box edges indicate the IQR. For each cluster, Pearson’s R between protein and transcript expression levels and the corresponding P value are shown. e Violin plots illustrating the stage-specific transcript–protein index (representing the difference in means between transcript and protein expression).
Box plots display the transcript–protein index distribution; the horizontal line represents the median and the upper and lower edges of the box depict the IQR. The red line connects the stage-specific median transcript–protein indexes. f Line plots displaying the median stage-specific transcript–protein indexes of the 15 clusters assigned to four groups using k-means clustering. Box plots show the distribution of transcript–protein indexes; the horizontal lines represent the median and the box edges indicate the IQR. g Significantly enriched GO terms and Pfam domains (Fisher’s exact test, FDR < 0.05) for genes within each of the four groups depicted in (f). Color corresponds to enrichment −log10(FDR), and the circle size represents the number of genes per GO term or Pfam domain. The top five most significantly enriched terms per group are depicted (all terms in Supplementary Data S10).