Extended Data Fig. 2: Sensitivity analyses.

To rule out the possibility that the large number of differentially abundant proteins in AD is an artifact of the analyses (for example, of data harmonization across sites, site-related bias, or another hidden problem), we conducted several additional analyses (e.g., analyses using raw proteomic values, inclusion of site in the regression model, and joint vs. meta-analysis) to demonstrate the robustness of our results. Different QC approaches, performing QC either by site or on all samples together, do not lead to considerably different results. Additionally, the inclusion or exclusion of sodium citrate samples does not change the findings. Furthermore, the joint analyses do not introduce batch effects or other artifacts, as their effect sizes are highly correlated with those of the meta-analyses. Moreover, the joint analyses, which show effect sizes similar to those of the meta-analyses, provide more statistical power by narrowing the confidence intervals of the estimates. a-b, Sensitivity analysis performed using the extended set of AD and dementia patients. a, Volcano plot showing proteins significantly increased (right of the dashed vertical line on the x-axis) or decreased (left of the dashed vertical line on the x-axis) in AD and dementia patients compared with cognitively normal controls (CO). The dotted line on the y-axis represents the significance threshold (FDR < 0.05). In GNPC version 1.3, 1,638 individuals are classified as having dementia on the basis of a Clinical Dementia Rating (CDR) greater than 0.5 or a Mini-Mental State Examination (MMSE) score below 19, but do not have a confirmed final diagnosis. b, Scatterplot of effect sizes from the AD vs. CO analysis (included in the main results) against those from the AD and dementia vs. CO analysis (sensitivity analysis). Red and blue dots and regression lines represent proteins that passed FDR (FDR < 0.05) and nominal (p < 0.05) significance, respectively, in the main AD vs. 
CO analysis, and green dots represent proteins that are non-significant (p > 0.05). We observed a strong correlation (Pearson r² = 0.93) in effect size between the main and sensitivity analyses. c-e, Pairwise comparisons of effect size estimates from three models. c, Joint analysis using study-wide z-scores (All). d, Joint analysis using z-scores calculated by site (By Site). e, Meta-analysis using z-scores calculated by site (Meta). Each point represents a protein, with effect estimates shown for the corresponding models. Points are color-coded by significance in the respective models: grey indicates not significant in either model, green indicates significance in the x-axis model only, blue indicates significance in the y-axis model only, and red indicates significance in both models. Dashed lines indicate linear regression fits with shaded 95% confidence intervals. Spearman correlation coefficients (r) and associated P values are displayed in each panel. f, Correlation of effect sizes between the analyses with and without the site with citrate samples. Effect size estimates were highly consistent, with 100% directional concordance and a Pearson r² of 1. Scatter plot of effect sizes from the EDTA joint analysis (x-axis) versus effect sizes from the EDTA + Citrate joint analysis (y-axis) for differentially expressed proteins in AD. The number of differentially expressed proteins is 5,187. Each dot represents a protein, with blue indicating concordant direction and grey indicating discordant direction between the two analyses. The red line indicates the linear regression fit. g-h, Correlation of effect sizes between the by-site meta-analysis and the joint analysis. To perform a head-to-head comparison of the meta-analyses and the joint analyses, we first compared the results of a joint analysis with those of a meta-analysis in which only the sites with both cases and controls for a specific disease were included. In this way, the same samples are included in both the meta-analyses and the joint analyses. 
When doing these analyses, we found a high correlation (r² > 0.88) in effect size and an even higher concordance rate in the direction of effect between the two analyses (>95%). This correlation increases further when only the proteins that passed FDR correction in the joint analysis are considered. Scatter plots of joint analysis effect sizes (x-axis) vs. by-site meta-analysis effect sizes (y-axis) for differentially expressed proteins across three diseases: AD, PD, and FTD. Each dot represents a protein, with blue indicating concordant direction and grey indicating discordant direction between the two analyses. The red line indicates the linear regression fit. g, All proteins. h, Only proteins that pass FDR significance. i, Correlation of effect sizes between the joint analysis (fixed effect) and the joint analysis (random effect). Scatter plots of fixed-effect joint analysis effect sizes (x-axis) vs. random-effect joint analysis effect sizes (y-axis) for differentially expressed proteins across three diseases: AD, PD, and FTD. Each dot represents a protein, with blue indicating concordant direction and grey indicating discordant direction between the two analyses. The red line indicates the linear regression fit. The results from the two approaches were highly consistent. Specifically, the correlation of effect sizes is high (r² > 0.98), and there are minimal changes in the set of proteins that pass FDR, suggesting that the choice between random and fixed effects has little impact on the overall associations.
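
The comparisons summarized in panels b and f-i rest on two quantities per pair of analyses: the correlation of per-protein effect sizes and the fraction of proteins whose effects point in the same direction. The following is a minimal illustrative sketch of how such quantities could be computed, not the code used to produce the figure. All function names are hypothetical, and the inverse-variance fixed-effect pooling shown for the by-site meta-analysis is an assumed, standard choice; the legend does not specify the meta-analysis method.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of effect sizes."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def directional_concordance(x, y):
    """Fraction of proteins whose effect sizes share the same sign.

    An effect size of exactly zero in either analysis counts as discordant.
    """
    same_sign = sum(1 for a, b in zip(x, y) if a * b > 0)
    return same_sign / len(x)


def fixed_effect_meta(betas, ses):
    """Inverse-variance weighted pooled estimate for one protein across sites.

    ASSUMPTION: standard fixed-effect pooling; returns (pooled beta, pooled SE).
    """
    weights = [1.0 / se ** 2 for se in ses]
    pooled = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    return pooled, pooled_se


# Toy, made-up effect sizes for four proteins (not study data).
joint = [0.30, -0.12, 0.05, -0.40]
meta = [0.28, -0.10, -0.01, -0.35]
r_squared = pearson_r(joint, meta) ** 2
concordance = directional_concordance(joint, meta)
```

In this toy input the third protein changes sign between the two analyses, so it would be drawn in grey (discordant) under the color scheme described for panels f-i.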