Extended Data Fig. 9: Radial phylogenetic tree of S1_11 showing the conservation of the substrate recognition triad and N-sulfate specificity features.
From: Sulfated glycan recognition by carbohydrate sulfatases of the human gut microbiota

The tree comprises a total of 2178 sequences, of which 1190 are Bacteroidetes, 233 are Verrucomicrobia, 184 are Planctomycetes, 143 are Ascomycota (fungi) and 100 are Actinobacteria. The annotations next to the colour code indicate whether the following residues are conserved, in this order: R290, W273, D385, R387 and H471. These residues are required for substrate recognition by BT4656^6S-GlcNAc/GlcNS (acc-code Q89YS5). D385, R387 and H471 form the recognition triad, whilst a W at position 273 and an R at position 290 are the N-sulfate specificity features. Residue numbers have been omitted from the annotations for simplicity. The notation is as follows, taking R as an example: an R in black means that an equivalent arginine is present; a grey, bold letter at this position means that the corresponding residue is replaced by that amino acid; a grey, italic R means that the R-equivalent position is replaced by any type of amino acid; a bold grey R followed by one-letter codes in parentheses means that the R-equivalent position can be substituted by any of those amino acids; a dash at the R-equivalent position means that no equivalent amino acid can be deduced from the multiple alignment. Branches of the same colour share the corresponding pattern. Red filled diamonds mark sequences of S1_11 sulfatases from B. thetaiotaomicron. All sequences in the specific branch that contains BT4656^6S-GlcNAc/GlcNS are found within a conserved heparan sulfate PUL. For clarity, all labels and sequence accession codes have been omitted (see Supplementary Fig. 7 for the full tree).