Fig. 8: Comparison of evolutionary rate distributions for top-ranking proteins from each method with the whole proteome.
From: Evolutionary sparse learning reveals the shared genetic basis of convergent traits

a The distributions of mammalian evolutionary rates are shown for the top 100 ranking proteins from ESL-PSC, CCS, CSUBST ωc, ASR-Codon, and HyPhy BUSTED, along with the distribution for all 14,509 orthologous protein alignments. The orange dashed line marks the median evolutionary rate of the whole-proteome dataset. Differences between each top 100 distribution and the whole-proteome distribution were assessed by a two-sided two-sample Kolmogorov–Smirnov (K–S) test, with P-values shown above each method. The exact P-values are: ESL-PSC, P = 0.4789346; CCS, P = 3.320747 × 10⁻¹²; CSUBST ωc, P = 6.009956 × 10⁻¹⁰; ASR-Codon, P = 1.241988 × 10⁻¹²; HyPhy BUSTED, P = 0.01697650. The distribution for the top 100 ESL-PSC proteins is not significantly different from that of the whole proteome (two-sample K–S P = 0.479), indicating that ESL-PSC results were not biased toward proteins with pervasively slower or faster evolution across the mammalian tree. HyPhy BUSTED is included for comparison; because it detects diversifying selection rather than convergent substitutions, it is expected to find somewhat faster-evolving proteins. Evolutionary rates are expressed in substitutions per site per billion years and are calculated for each gene alignment as the total branch length of its maximum likelihood gene tree, as reported by the OrthoMaM database (ref. 37), divided by the total time in the consensus species timetree (ref. 59) pruned to match the taxon sampling of that alignment. Violin plots show kernel density estimates of the evolutionary rate distributions. Embedded box plots indicate the median (center line), the interquartile range (bounds of the box), and 1.5 times the interquartile range (whiskers).
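The rate calculation and the two-sample K–S comparison in panel a can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function names are hypothetical, timetree branch lengths are assumed to be in billions of years, and the P-value uses the asymptotic Kolmogorov distribution with the small-sample correction given in Numerical Recipes, so it will not exactly reproduce the P-values listed above. In practice, `scipy.stats.ks_2samp` performs the same test.

```python
import math
from bisect import bisect_right


def evolutionary_rate(gene_tree_length, timetree_length_byr):
    """Rate in substitutions per site per billion years (hypothetical helper).

    gene_tree_length: total branch length of the ML gene tree (subs/site).
    timetree_length_byr: total branch length of the species timetree pruned
        to this alignment's taxa, assumed to be in billions of years.
    """
    return gene_tree_length / timetree_length_byr


def ks_statistic(a, b):
    """Two-sample K-S statistic D = max over x of |F_a(x) - F_b(x)|."""
    a, b = sorted(a), sorted(b)
    d = 0.0
    # The empirical CDFs are right-continuous step functions, so the
    # supremum is reached at one of the observed values.
    for x in a + b:
        fa = bisect_right(a, x) / len(a)
        fb = bisect_right(b, x) / len(b)
        d = max(d, abs(fa - fb))
    return d


def ks_pvalue_asymptotic(d, n, m, terms=100):
    """Approximate two-sided P-value from the Kolmogorov distribution."""
    en = math.sqrt(n * m / (n + m))
    lam = (en + 0.12 + 0.11 / en) * d
    if lam < 0.2:
        # Q(lam) differs from 1 by less than about 1e-12 here, and the
        # alternating series below converges poorly for small lam.
        return 1.0
    s = sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
            for k in range(1, terms + 1))
    return min(1.0, max(0.0, 2.0 * s))
```

For the comparison in the caption, `a` would hold the 100 rates for one method's top proteins and `b` the 14,509 whole-proteome rates, each computed with `evolutionary_rate`.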
b Differences in the median evolutionary rate between the top 100 proteins of each method and that of the whole proteome. Error bars indicate 95% bootstrap confidence intervals of the median difference for each method (see Supplementary Methods).
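The bootstrap confidence intervals in panel b could be computed along the lines of the sketch below. The exact procedure is given in the Supplementary Methods and is not reproduced here; this sketch assumes a percentile bootstrap in which the top 100 list and the whole-proteome set are resampled independently with replacement, and the function names and defaults (`n_boot`, `alpha`, `seed`) are hypothetical.

```python
import random
import statistics


def median_difference(top, background):
    """Median rate of the top list minus median rate of the background."""
    return statistics.median(top) - statistics.median(background)


def bootstrap_median_diff_ci(top, background, n_boot=2000, alpha=0.05, seed=0):
    """Point estimate and percentile bootstrap CI for the median difference.

    Assumption: both groups are resampled independently with replacement;
    the overlap between a top 100 list and the whole proteome is ignored.
    """
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_boot):
        t = rng.choices(top, k=len(top))
        b = rng.choices(background, k=len(background))
        diffs.append(median_difference(t, b))
    diffs.sort()
    # Percentile interval: the alpha/2 and 1 - alpha/2 empirical quantiles.
    lo = diffs[int(alpha / 2 * n_boot)]
    hi = diffs[int((1 - alpha / 2) * n_boot) - 1]
    return median_difference(top, background), (lo, hi)
```

Resampling the 14,509-protein background in pure Python is slow for large `n_boot`; a vectorized version (for example with NumPy) would be the practical choice at full scale.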