Table 4 Statistical comparison of learn2therm protein pairs to Hait et al.’s pairs.

From: Homologous Pairs of Low and High Temperature Originating Proteins Spanning the Known Prokaryotic Universe

| Score | Statistical Test | Probability (Cov > 75%) | Probability (Cov > 95%) |
|---|---|---|---|
| Normalised percent identity | left t | 1.75e−6 | 6.06e−155 |
| Normalised bit score | left t | 1.94e−8 | 9.47e−117 |
| Pfam Jaccard | left t | 0.99 | 3.24e−13 |
| Structural P score | binomial P < 0.001 | <0.001 | <0.001 |

  1. Results are shown for the base set of protein pairs (alignment coverage >75%) and for a subset (N = 24 million, coverage >95%). Percent identity, bit score (normalised to the average length of both proteins), and Pfam annotation Jaccard score were compared using t-tests, while the output P-value of the structural alignment was treated as a binary outcome (whether it was smaller than 0.001). For all metrics, our pairs meet the baseline on average for the subset with >95% alignment coverage, and most metrics still indicate good quality for the full set. Statistically significant values are shown in bold.
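The quantities named in the note can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the handling of pairs with no Pfam annotations, and the choice of a lower-tail binomial probability are all assumptions made for illustration, and the t-tests themselves are not reproduced here.

```python
from math import comb


def normalised_bit_score(bit_score: float, len_a: int, len_b: int) -> float:
    """Bit score divided by the average length of the two proteins."""
    return bit_score / ((len_a + len_b) / 2)


def pfam_jaccard(pfams_a: set[str], pfams_b: set[str]) -> float:
    """Jaccard score of two Pfam annotation sets: |A & B| / |A | B|."""
    union = pfams_a | pfams_b
    if not union:
        # Assumption: a pair with no annotations on either protein scores 0.
        return 0.0
    return len(pfams_a & pfams_b) / len(union)


def structural_hits(p_values: list[float], threshold: float = 0.001) -> int:
    """Count structural alignments whose P-value is below the threshold.

    This is the binary treatment described in the note.
    """
    return sum(1 for p in p_values if p < threshold)


def binomial_lower_tail(k: int, n: int, p: float) -> float:
    """Exact P(X <= k) for X ~ Binomial(n, p).

    Assumption: the direction of the tail and the null proportion p are
    illustrative; the note does not state them.
    """
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))
```

For example, `structural_hits([0.0001, 0.2, 0.0005])` counts two alignments as passing the 0.001 threshold; that count, together with the number of pairs and a hypothetical baseline proportion, could then be passed to `binomial_lower_tail`.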