Fig. 4: RSA-based binding site clusters and examples.

a RSA profiles of the 293 binding sites, grouped into four clusters (C1-C4) by K-means based on the difference between their RSA profiles (UD). Each binding site is represented by a vector, plotted here as a bar. The elements of the vector correspond to the residues that form the binding site and are sorted by RSA, so buried residues are at the beginning of the vector (bottom) and more accessible residues towards the end (top). Each element of the vector, i.e. each section of the bar, is coloured according to its RSA using the matplotlib cividis colour palette. Within each cluster, binding sites are sorted by their number of amino acids. A line at RSA = 25% is drawn over each cluster. b Six example binding sites per cluster, shown on the protein structure. Examples were selected to represent the range of binding site sizes within each cluster. IDs are UniProt accession codes. Binding site residues are coloured according to their RSA using the cividis colour scheme. The rest of the protein is coloured white. Ligands binding to the site in question are coloured red.
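
The arrangement of the bars in panel a can be illustrated with a minimal Python sketch. It is not the authors' code: the toy data, the names `clusters`, `sorted_profile`, `arrange_cluster` and `n_below_line`, and the ascending order of sites within a cluster are all assumptions made for illustration. Colour mapping and plotting are left out.

```python
# Hypothetical toy data: cluster label -> {site ID -> per-residue RSA values (%)}.
# Site IDs and RSA values are invented for illustration only.
clusters = {
    "C1": {"site_a": [40.0, 5.0, 12.0], "site_b": [3.0, 30.0]},
    "C2": {"site_c": [80.0, 55.0, 20.0, 61.0]},
}

RSA_LINE = 25.0  # RSA value (%) at which the caption says a line is drawn


def sorted_profile(rsa_values):
    """Sort residues by RSA so buried residues come first (bottom of the bar)."""
    return sorted(rsa_values)


def arrange_cluster(sites):
    """Order the sites of one cluster by number of residues.

    Ascending order and tie-breaking by site ID are assumptions;
    the caption only states that sites are sorted by size.
    """
    profiles = {site_id: sorted_profile(v) for site_id, v in sites.items()}
    return sorted(profiles.items(), key=lambda item: (len(item[1]), item[0]))


def n_below_line(profile, line=RSA_LINE):
    """Count residues with RSA below the line, i.e. where the line would cross the bar."""
    return sum(1 for value in profile if value < line)


# One ordered list of (site ID, sorted RSA profile) per cluster.
layout = {label: arrange_cluster(sites) for label, sites in clusters.items()}

for label, bars in layout.items():
    for site_id, profile in bars:
        print(label, site_id, profile, n_below_line(profile))
```

Each sorted profile could then be drawn as a stacked bar with one segment per residue, coloured by RSA.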