Fig. 1: Clustering of BP-Pol sequences.
From: Diversity of sugar-diphospholipid-utilizing glycosyltransferase families

a The first step of the clustering: a sequence similarity network (SSN) with nodes representing individual proteins and edges representing pairwise alignment bit scores. Two proteins are linked by an edge if their pairwise bit score is above 110. The resulting SSN clusters are sorted by number of protein members, with the largest cluster in the upper left corner. b The second step of the clustering: a hidden Markov model (HMM) was built for each SSN cluster, and the HMMs were compared using HHblits. A network was built with nodes representing SSN clusters and edges representing HHblits scores. Two SSN clusters are linked by an edge if they have an HHblits score above 160, and by two edges if the score is above 160 in both directions. The resulting clusters are referred to as superclusters and are sorted by number of SSN clusters. Node size represents the number of members in the SSN cluster. The 14 largest superclusters (>150 GenBank members) define CAZy families GT122–GT135. Nodes are colored by CAZy family, consistently across a and b.
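
The two-step thresholded clustering described in this caption can be illustrated with a minimal sketch. It is not the authors' pipeline: the function name `threshold_clusters`, the protein labels, and all scores below are hypothetical toy values, and the sketch assumes that clusters correspond to connected components of the thresholded network. Only the cutoffs (110 for bit scores, 160 for HHblits scores) come from the caption.

```python
from collections import defaultdict


def threshold_clusters(scores, threshold):
    """Return connected components of the graph whose edges are the
    pairs scoring above `threshold`, largest component first."""
    parent = {}

    def find(x):
        # Union-find lookup with path halving; registers unseen nodes.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), score in scores.items():
        root_a, root_b = find(a), find(b)  # registers both nodes
        if score > threshold and root_a != root_b:
            parent[root_a] = root_b

    groups = defaultdict(set)
    for node in list(parent):
        groups[find(node)].add(node)
    return sorted(groups.values(), key=len, reverse=True)


# Step 1 (toy data): pairwise alignment bit scores between proteins.
bit_scores = {
    ("p1", "p2"): 150,
    ("p2", "p3"): 120,
    ("p3", "p4"): 90,
    ("p4", "p5"): 130,
    ("p5", "p6"): 40,
}
ssn_clusters = threshold_clusters(bit_scores, threshold=110)

# Step 2 (toy data): directional HMM-HMM scores between SSN clusters,
# keyed by (query cluster index, target cluster index). A score above
# the cutoff in either direction is enough to link two clusters.
hhblits_scores = {
    (0, 1): 170,
    (1, 0): 165,
    (1, 2): 100,
}
superclusters = threshold_clusters(hhblits_scores, threshold=160)

print([len(c) for c in ssn_clusters])
print(superclusters)
```

In this sketch a pair of SSN clusters scoring above the cutoff in both directions, such as clusters 0 and 1, would be drawn with two edges in panel b, but the second edge does not change supercluster membership.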