Fig. 2: Finding structural patterns at scale with GlyContact.

a Glycan 3D structures form distinct clusters. For all glycans on GlycoShape (n = 717), we used the superimpose_glycans function from GlyContact, with the flag “fast=True”, to calculate all pairwise alignments (always using the best-matching pair of conformers) via an SVD-based Kabsch algorithm with k-d spatial trees. Alignment distances were then used to cluster glycans via t-SNE initialized with PCA. Glycan class is indicated by color and glycan size (i.e., number of monosaccharides) by a colored halo. The correlation of glycan size with x-axis spread is reported as Pearson’s r, together with a two-tailed t-test of the correlation against zero (p = 0.0004). b Glycan epitopes affect distal structural properties. Using core fucose in N-glycans as an example, we used the glycontact.visualize.find_difference function to gather all ‘twins’ (pairs of sequences that differed only in the presence or absence of core fucose) for which structural data were available from GlycoShape (n = 43) and compared their average SASA values (excluding core fucose and its attachment site). Results are shown as box plots (line indicating the median, box edges indicating the 25th and 75th percentiles, whiskers indicating the 95% confidence interval, and black circles indicating outliers), overlaid with a scatter plot of the individual values, with horizontal jitter added for visibility. Statistical testing used a paired two-tailed t-test, with Cohen’s dz as the effect size for paired samples. c, d Glycan motifs exposed by different glycan classes exhibit different structural characteristics. For the various isoforms of sialyl-LacNAc (Neu5Acα2-?Galβ1-?GlcNAc) in GlycoShape glycans, we used the glycontact.visualize.find_difference function to sum the monosaccharide-level SASA values for each motif (c, p = 0.00001) and to average their flexibility (d).
We then grouped glycans by glycan class (using the glycowork.motif.processing.get_class function) and analyzed differences in SASA (c) or torsion-based flexibility (d) across classes using a one-way ANOVA, followed by Tukey’s honestly significant difference (HSD) post hoc test. The number of analyzed motif instances is provided under each box plot. Data are depicted as mean values, with box edges indicating quartiles and whiskers indicating the remaining data distribution up to the 95% confidence interval.
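
As an illustration of the paired statistics named for panel b, the following minimal sketch computes Cohen’s dz and the paired t statistic from per-twin differences. It is not the GlyContact implementation: the function name paired_effect and the SASA values are hypothetical and purely illustrative, and no p-value is computed because the Python standard library lacks a t-distribution (a library such as SciPy would be needed for that step).

```python
import math
import statistics


def paired_effect(with_fuc, without_fuc):
    """Return (mean difference, Cohen's dz, paired t statistic, degrees of freedom).

    Hypothetical helper for illustration only; both inputs are equal-length
    sequences in which position i holds the two members of one 'twin' pair.
    """
    if len(with_fuc) != len(without_fuc):
        raise ValueError("paired samples must have equal length")
    # Per-pair differences: the paired test operates on these alone.
    diffs = [a - b for a, b in zip(with_fuc, without_fuc)]
    n = len(diffs)
    if n < 2:
        raise ValueError("at least two pairs are required")
    mean_d = statistics.fmean(diffs)
    sd_d = statistics.stdev(diffs)  # sample standard deviation (n - 1 denominator)
    if sd_d == 0:
        raise ValueError("differences have zero variance; dz is undefined")
    dz = mean_d / sd_d            # Cohen's dz: standardized mean difference
    t_stat = dz * math.sqrt(n)    # identical to mean_d / (sd_d / sqrt(n))
    return mean_d, dz, t_stat, n - 1


# Made-up SASA values for four twin pairs (NOT data from the figure).
sasa_with = [10.0, 12.0, 14.0, 16.0]
sasa_without = [9.0, 10.0, 13.0, 14.0]
mean_d, dz, t_stat, dof = paired_effect(sasa_with, sasa_without)
print(f"mean diff = {mean_d:.3f}, dz = {dz:.3f}, t = {t_stat:.3f}, df = {dof}")
```

The sketch makes explicit that dz and the paired t statistic differ only by a factor of the square root of the number of pairs, so dz conveys the size of the effect independently of how many twins were available.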