Fig. 6 | Scientific Data

Fig. 6

From: Homologous Pairs of Low and High Temperature Originating Proteins Spanning the Known Prokaryotic Universe

Comparison of protein pair quality between our data and the 1660 protein pairs of Hait et al. (ref. 52). (a) Empirical distribution of local alignment homology, measured as bit score normalised to the average length of the two proteins in each pair. Our data has a right-shifted score on average, with t-test probability = 1.94e−8. (b) Same as (a), but for percent identity. Our data has a right-shifted score on average, with t-test probability = 1.75e−6. (c) Empirical distribution of Jaccard score over Pfam annotations for our data (blue) compared to the baseline data (orange). Our full data has more annotation mismatches on average. When only the 25 million protein pairs with BLAST coverage >95% are considered, the Pfam annotations become indistinguishable from the baseline with t-test probability = 3.24e−13. (d) Cumulative distribution of FATCAT structural alignment P-value for bins of BLAST coverage uniformly sampled from our data, compared to the baseline structural alignments. Even low-coverage pairs are more likely to have a P-value below one in a thousand, with binomial confidence >99%.
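
The two per-pair quantities in panels (a) and (c) can be illustrated with a minimal Python sketch. It is not the authors' code: the function names `normalised_bit_score` and `pfam_jaccard` are hypothetical, the Pfam accessions in the usage note are only illustrative inputs, and returning NaN when neither protein has any annotation is an assumption made here rather than something the caption specifies.

```python
from typing import Iterable


def normalised_bit_score(bit_score: float, len_a: int, len_b: int) -> float:
    """Bit score divided by the mean length of the two proteins in a pair."""
    mean_len = (len_a + len_b) / 2
    if mean_len <= 0:
        raise ValueError("sequence lengths must be positive")
    return bit_score / mean_len


def pfam_jaccard(pfams_a: Iterable[str], pfams_b: Iterable[str]) -> float:
    """Jaccard score |A ∩ B| / |A ∪ B| over two sets of Pfam accessions."""
    set_a, set_b = set(pfams_a), set(pfams_b)
    union = set_a | set_b
    if not union:
        # Assumption: the score is undefined when neither protein is annotated.
        return float("nan")
    return len(set_a & set_b) / len(union)
```

For example, `normalised_bit_score(200.0, 90, 110)` divides the bit score by a mean length of 100 residues, and `pfam_jaccard({"PF00069", "PF00071"}, {"PF00071", "PF00072"})` compares two annotation sets that share one of three distinct accessions. A score of 1 means identical annotation sets, and lower values indicate annotation mismatches.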
