Fig. 4: Features of de novo proteins. | Nature Communications


From: Uncovering de novo gene birth in yeast using deep transcriptomics


a Identification of transcripts containing putative translated ORFs using ribosome profiling data and annotations. Using ribosome profiling data from yeast grown in rich medium and under oxidative stress, together with all annotated CDS information, we identified 97 de novo, 147 genus-specific and 4297 conserved transcripts with at least one translated open reading frame (ORF). Translated ORFs were detected with RibORF on the basis of high 3-nucleotide periodicity and uniformity of ribosome profiling reads, using a score cut-off of 0.7 in one or both conditions. For panels b, c and d, we selected the longest translated ORF per transcript (Conserved n = 4297; Genus-specific n = 147; De novo n = 97). b Length of translated ORFs in different phylogenetic conservation classes. The length of the longest translated ORF per transcript shows a positive relationship with conservation level. The median length is indicated in the plot. Length is in amino acids (aa). Boxplot values: ‘Conserved’ min 5.81, 25th percentile 7.82, median 8.55, 75th percentile 9.16, max 11.13; ‘Genus specific’ min 3, 25th percentile 4.98, median 6.11, 75th percentile 7.02, max 9.5; ‘De novo’ min 3, 25th percentile 4.52, median 5.19, 75th percentile 5.94, max 7.55. c Coding score of translated ORFs in different phylogenetic conservation classes. Coding score was calculated with CIPHER, a previously developed hexamer-based metric that measures the codon usage bias of putatively coding sequences relative to non-coding sequences. Coding score shows a significant positive relationship with transcript conservation level. Boxplot values: ‘Conserved’ min 0.05, 25th percentile 0.19, median 0.23, 75th percentile 0.28, max 0.41; ‘Genus specific’ min −0.2, 25th percentile 0.04, median 0.13, 75th percentile 0.2, max 0.45; ‘De novo’ min −0.21, 25th percentile −0.01, median 0.06, 75th percentile 0.14, max 0.32.
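The ORF selection used for panels b to d, and the general idea behind a hexamer-based coding score, can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: the `OrfCall` record, its field names, the per-condition score fields and the `>=` comparison against the 0.7 cut-off are hypothetical. The `hexamer_score` function is a generic mean log-likelihood ratio of in-frame hexamer frequencies (similar in spirit to the hexamer score used by tools such as CPAT); it is not the CIPHER formula, which this caption does not specify.

```python
from collections import Counter
from dataclasses import dataclass
from math import log
from typing import Dict, Iterable, Optional

RIBORF_CUTOFF = 0.7  # score cut-off stated in the caption


@dataclass(frozen=True)
class OrfCall:
    """Hypothetical record for one candidate ORF and its per-condition RibORF scores."""
    transcript_id: str
    orf_id: str
    length_aa: int
    score_rich: Optional[float]       # None if not scored in rich medium
    score_oxidative: Optional[float]  # None if not scored under oxidative stress


def is_translated(orf: OrfCall, cutoff: float = RIBORF_CUTOFF) -> bool:
    """Translated if the score passes the cut-off in one or both conditions (>= assumed)."""
    scores = (orf.score_rich, orf.score_oxidative)
    return any(s is not None and s >= cutoff for s in scores)


def longest_translated_orf(orfs: Iterable[OrfCall]) -> Dict[str, OrfCall]:
    """Keep the longest translated ORF per transcript; on ties the first one seen is kept."""
    best: Dict[str, OrfCall] = {}
    for orf in orfs:
        if not is_translated(orf):
            continue
        current = best.get(orf.transcript_id)
        if current is None or orf.length_aa > current.length_aa:
            best[orf.transcript_id] = orf
    return best


def hexamer_counts(seqs: Iterable[str]) -> Counter:
    """Count in-frame hexamers (overlapping codon pairs, step 3) in coding-frame sequences."""
    counts: Counter = Counter()
    for s in seqs:
        for i in range(0, len(s) - 5, 3):
            counts[s[i:i + 6]] += 1
    return counts


def hexamer_score(seq: str, coding: Counter, noncoding: Counter, pseudo: float = 1.0) -> float:
    """Generic mean log-likelihood ratio of in-frame hexamers (coding vs non-coding).

    Positive values mean the sequence looks more coding-like. NOT the CIPHER formula.
    """
    n_coding = sum(coding.values())
    n_noncoding = sum(noncoding.values())
    vocab = 4 ** 6  # number of possible DNA hexamers, used for pseudocount smoothing
    terms = []
    for i in range(0, len(seq) - 5, 3):
        h = seq[i:i + 6]
        p_c = (coding[h] + pseudo) / (n_coding + pseudo * vocab)
        p_nc = (noncoding[h] + pseudo) / (n_noncoding + pseudo * vocab)
        terms.append(log(p_c / p_nc))
    return sum(terms) / len(terms) if terms else 0.0
```

In practice the coding and non-coding hexamer tables would be built from large reference sets (for example annotated CDSs versus non-coding sequence); how CIPHER builds its reference sets is not described in this caption.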
d Isoelectric point (IP) of translated ORFs in different phylogenetic conservation classes. IP was predicted with the R package ‘Peptides’ using the EMBOSS pKscale. Data are for the longest translated ORF per transcript. Boxplot values: ‘Conserved’ min 3.14, 25th percentile 5.26, median 7.08, 75th percentile 9.19, max 13.06; ‘Genus specific’ min 3.49, 25th percentile 6.43, median 8.61, 75th percentile 10.24, max 13.46; ‘De novo’ min 4.54, 25th percentile 8.12, median 9.33, 75th percentile 10.55, max 12.3. Differences between the distributions of the conservation classes were assessed with pairwise two-sided Wilcoxon tests. p-values: 4b: Gs-C < 2e-16, Dn-C < 2e-16, Dn-Gs < 6.3e-05; 4c: Gs-C < 2e-16, Dn-C < 2e-16, Dn-Gs < 5.8e-05; 4d: Gs-C 3.9e-06, Dn-C 8.5e-14, Dn-Gs 0.01; where Gs is Genus-specific, Dn is De novo and C is Conserved. Source data are provided as a Source Data file.
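The isoelectric point can be understood as the pH at which a peptide's net charge is zero. The sketch below illustrates that idea in Python with a Henderson-Hasselbalch net-charge function and bisection; it is not the code of the ‘Peptides’ package. The pK values are assumed to follow the EMBOSS scale (N-terminus 8.6, C-terminus 3.6, K 10.8, R 12.5, H 6.5, D 3.9, E 4.1, C 8.5, Y 10.1) and should be checked against the package's own pKscale table before relying on exact numbers.

```python
# Assumed EMBOSS-style pK values; verify against the 'Peptides' pKscale table.
PK_POSITIVE = {"Nterm": 8.6, "K": 10.8, "R": 12.5, "H": 6.5}
PK_NEGATIVE = {"Cterm": 3.6, "D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}


def net_charge(seq: str, ph: float) -> float:
    """Henderson-Hasselbalch net charge of a peptide at the given pH."""
    seq = seq.upper()
    # Termini contribute one positive and one negative ionizable group each.
    positive = 1.0 / (1.0 + 10 ** (ph - PK_POSITIVE["Nterm"]))
    negative = 1.0 / (1.0 + 10 ** (PK_NEGATIVE["Cterm"] - ph))
    for aa in ("K", "R", "H"):
        positive += seq.count(aa) / (1.0 + 10 ** (ph - PK_POSITIVE[aa]))
    for aa in ("D", "E", "C", "Y"):
        negative += seq.count(aa) / (1.0 + 10 ** (PK_NEGATIVE[aa] - ph))
    return positive - negative


def isoelectric_point(seq: str, tol: float = 1e-6) -> float:
    """Bisection on [0, 14] for the pH where net charge is zero.

    Net charge decreases monotonically with pH, so a positive charge at the
    midpoint means the isoelectric point lies above it.
    """
    lo, hi = 0.0, 14.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if net_charge(seq, mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2
```

For a single residue without ionizable side chain such as "G", only the termini count, so the result is the midpoint of their pK values (6.1 with the assumed scale). Lysine- or arginine-rich peptides land at high pH, which is the direction of the shift seen for de novo ORFs in panel d.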
