Extended Data Fig. 10: Phylogenetic analysis of the AmGH20A CBM-like domain.

a Phylogenetic tree generated from a DELTA-BLAST (Domain Enhanced Lookup Time Accelerated BLAST) search of the non-redundant database, using the C-terminal CBM-like domain of AmGH20A (aa 555–665) as the query and restricting hits to 20–95% sequence identity and 90–100% sequence coverage. The search retrieved 549 sequences, mostly from other putative HexNAcases but also from putative M14 peptidases and Gfo/Idh/MocA-like oxidoreductases. The sequences were first aligned using MAFFT, and the alignment was trimmed to include only the segment corresponding to the CBM-like domain of AmGH20A. Sequence redundancy was then reduced using CD-HIT with a 95% identity cutoff and MaxAlign at default settings. A phylogenetic tree was generated from the remaining 253 sequences and annotated according to phylum affiliation. The cluster shaded in blue contains the AmGH20A CBM. b N-acetylgalactosamine (GalNAc)-binding residues presented by three loops; polar interactions are shown as yellow dotted lines. Sequence logos of the loops forming the GalNAc-binding site show their conservation within the AmGH20A CBM cluster and, for comparison, across all sequences in the tree. Black arrows mark the positions of the residues shown in the cartoon.
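The redundancy-reduction step in panel a and the conservation measure behind the logos in panel b can be illustrated with a minimal Python sketch. It assumes the sequences are already aligned and trimmed (as after the MAFFT step), and `percent_identity`, `greedy_nonredundant` and `information_content` are hypothetical helpers written only for illustration. The greedy clustering approximates CD-HIT's longest-first strategy but is not CD-HIT itself (CD-HIT defines identity differently and uses short-word filtering), and MaxAlign is not modelled. The per-column information content is the quantity commonly plotted as stack height in a sequence logo, here without small-sample correction.

```python
import math
from collections import Counter


def percent_identity(a: str, b: str) -> float:
    """Identity (%) over alignment columns where neither sequence has a gap.
    Assumption: an alignment-based definition, not CD-HIT's own formula."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


def greedy_nonredundant(seqs: dict[str, str], cutoff: float = 95.0) -> list[str]:
    """Visit sequences longest-first (ungapped length) and keep one only if its
    identity to every already-kept representative is below `cutoff`.
    A simplified stand-in for CD-HIT-style greedy clustering."""
    order = sorted(seqs, key=lambda k: len(seqs[k].replace("-", "")), reverse=True)
    kept: list[str] = []
    for name in order:
        if all(percent_identity(seqs[name], seqs[rep]) < cutoff for rep in kept):
            kept.append(name)
    return kept


def information_content(column: str, alphabet_size: int = 20) -> float:
    """Information content (bits) of one alignment column: log2(alphabet size)
    minus the Shannon entropy of the observed residues (gaps ignored)."""
    residues = [c for c in column if c != "-"]
    if not residues:
        return 0.0
    n = len(residues)
    entropy = -sum((k / n) * math.log2(k / n) for k in Counter(residues).values())
    return math.log2(alphabet_size) - entropy


# Toy aligned sequences (invented for illustration, not data from the figure).
aln = {
    "s1": "ACDEFGHIKL",
    "s2": "ACDEFGHIKM",  # 90% identical to s1
    "s3": "ACDEFGHIKL",  # identical to s1, removed at the 95% cutoff
    "s4": "WWDEFGHIKL",  # 80% identical to s1
}
print(greedy_nonredundant(aln))  # → ['s1', 's2', 's4']
```

A fully conserved column such as `"AAAA"` scores log2(20) ≈ 4.32 bits, while a column split evenly between two residues loses exactly 1 bit, which is why conserved binding-loop positions stand out as tall stacks in a logo.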