Fig. 1 | Scientific Data

Fig. 1

From: Homologous Pairs of Low and High Temperature Originating Proteins Spanning the Known Prokaryotic Universe

The taxonomic and protein space covered by the homologous protein pairs (N = 69 million) within the dataset. Left) NCBI taxonomic breakdown of the dataset (ref. 88). The outer ring depicts superkingdom, phylum, class, and order, where the size of each wedge indicates the number of organisms in that classification that contain at least one protein in a learn2therm protein pair. Highly populated phyla and classes are labelled. Moving inward, the next ring is a histogram of the number of proteins participating in pairs per organism, followed by a colour mapping that labels organisms as mesophilic (blue) or thermophilic (red). Central connections indicate taxa pairs contributing to protein pairs. Right) Two-dimensional mapping (using t-SNE) of a sample of protein space as determined by Evolutionary Scale Model (ESM) embeddings (ref. 89). In blue, a sample of data from the ESM Atlas with the highest structural confidence; note that these data contain eukaryotic proteins. In yellow, our proteins in pairs, and in orange, the current largest set of protein pairs across temperature. Sample sizes conserve the relative sizes of our proteins versus the Atlas and the reference dataset. For details of the mapping procedure, see Supplementary Information S8.
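
As an illustration of the two per-organism quantities shown in the left panel (the histogram of proteins participating in pairs, and the central connections between taxa), the following is a minimal sketch. It assumes a hypothetical record layout of (mesophilic protein ID, mesophilic organism, thermophilic protein ID, thermophilic organism); the identifiers and function names are invented for illustration and are not the dataset's actual schema or the authors' plotting code.

```python
from collections import Counter

# Hypothetical records (not the real schema):
# (meso_protein_id, meso_organism, thermo_protein_id, thermo_organism)
pairs = [
    ("m1", "org_A", "t1", "org_X"),
    ("m1", "org_A", "t2", "org_Y"),
    ("m2", "org_A", "t1", "org_X"),
    ("m3", "org_B", "t2", "org_Y"),
]


def proteins_per_organism(pairs):
    """Count distinct proteins that take part in at least one pair, per organism."""
    seen = set()
    for meso_id, meso_org, thermo_id, thermo_org in pairs:
        seen.add((meso_org, meso_id))
        seen.add((thermo_org, thermo_id))
    return Counter(org for org, _ in seen)


def taxa_connections(pairs):
    """Count protein pairs contributed by each (mesophile, thermophile) organism pair."""
    return Counter((meso_org, thermo_org) for _, meso_org, _, thermo_org in pairs)


counts = proteins_per_organism(pairs)  # values for the histogram ring
links = taxa_connections(pairs)        # weights for the central connections
```

Here a protein that appears in several pairs is counted once for its organism, whereas every pair contributes to the connection between its two organisms.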
