Fig. 6: Glyco-motif-level statistics require half as many samples to reach the same level of statistical power as analysis with raw glycans.
From: Correcting for sparsity and interdependence in glycomics by accounting for glycan biosynthesis

a, b The use of glyco-motifs improves measures of regression robustness. The coefficient magnitude indicates the size of the measured effect, and the standard error indicates the confidence with which the coefficient can be estimated. In a, the boxplot shows the 25th, 50th, and 75th percentiles of regression coefficients for models trained on glycan data (Min = 0.5094, Q1 = 0.7206, median = 0.8416, Q3 = 1.2706, Max = 1.7166, n = 35) or glyco-motif data (Min = 0.5094, Q1 = 0.8365, median = 1.1403, Q3 = 1.5106, Max = 2.8357, n = 74). Distributions were compared using a one-sided Wilcoxon test (p = 0.0047). In b, the boxplot shows the 25th, 50th, and 75th percentiles of regression standard errors for models trained on glycan data (Min = 0.0182, Q1 = 0.1631, median = 0.2446, Q3 = 0.2832, Max = 0.4518, n = 35) or glyco-motif data (Min = 0.0053, Q1 = 0.1508, median = 0.2047, Q3 = 0.2747, Max = 0.5398, n = 74). Distributions were compared using a one-sided Wilcoxon test (p = 0.033). c R² describes the effect size of a regression; we used marginal R² (mR²) because it was appropriate for the regression models used (ref. 51). Distributions of mR² for regression models trained on glycan data (Min = 0.128, Q1 = 0.183, median = 0.331, Q3 = 0.441, Max = 0.737, n = 20) and glyco-motif data (Min = 0.0949, Q1 = 0.3185, median = 0.46, Q3 = 0.686, Max = 0.764, n = 40) were compared using a one-sided Wilcoxon test (p = 0.04). d We predicted power over a range of sample sizes (n = 5–200) given the median effect size (solid line) and the interquartile range (shaded region) for glyco-motif-trained regressions (mR²: Q1 = 0.31, median = 0.45, Q3 = 0.68) and for glycan-trained regressions (mR²: Q1 = 0.18, median = 0.33, Q3 = 0.44). Here, the use of GlyCompare and glyco-motif abundances (grey-blue) required approximately half the number of samples to achieve the same power as standard glycan measures (red). Source data are provided as a Source data file.
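
The relationship in panel d between effect size and required sample size can be illustrated with a rough sketch. This is not the authors' power calculation, which the caption does not specify. It assumes that mR² can be treated like an ordinary R², converts it to Cohen's f² = R² / (1 − R²), and uses a normal approximation to the power of a two-sided test of a single regression coefficient. The function names, the significance level (0.05), and the target power (0.8) are all hypothetical choices made for illustration; only the two median mR² values come from the caption.

```python
from math import ceil, sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()

# Median effect sizes quoted for panel d of the caption.
GLYCAN_MEDIAN_MR2 = 0.33
MOTIF_MEDIAN_MR2 = 0.45


def cohens_f2(r2: float) -> float:
    """Convert an R^2-type effect size into Cohen's f^2 = R^2 / (1 - R^2)."""
    if not 0.0 <= r2 < 1.0:
        raise ValueError("r2 must lie in [0, 1)")
    return r2 / (1.0 - r2)


def approx_power(r2: float, n: int, alpha: float = 0.05) -> float:
    """Normal-approximation power of a two-sided, single-coefficient test.

    Assumption: the test statistic is roughly normal with mean sqrt(f^2 * n).
    This is crude for very small n and is not an exact noncentral-F calculation.
    """
    z_crit = _STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)
    delta = sqrt(cohens_f2(r2) * n)
    # Probability of landing in either rejection tail.
    return _STD_NORMAL.cdf(delta - z_crit) + _STD_NORMAL.cdf(-delta - z_crit)


def approx_required_n(r2: float, power: float = 0.8, alpha: float = 0.05) -> int:
    """Smallest n whose approximate power meets the target (upper tail only)."""
    z_alpha = _STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)
    z_beta = _STD_NORMAL.inv_cdf(power)
    return ceil((z_alpha + z_beta) ** 2 / cohens_f2(r2))


if __name__ == "__main__":
    for label, r2 in (("glycan", GLYCAN_MEDIAN_MR2), ("glyco-motif", MOTIF_MEDIAN_MR2)):
        print(label, "f2 =", round(cohens_f2(r2), 3), "approx. n =", approx_required_n(r2))
```

Under this approximation the required sample size is inversely proportional to f², so the ratio of sample sizes depends only on the two effect sizes and not on the chosen significance level or target power. With the panel d medians, the ratio of f² values (glycan to glyco-motif) is about 0.6, which points in the same direction as the roughly twofold saving reported in the caption. An exact calculation would be needed to reproduce the published curves.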