Abstract
Alzheimer’s disease (AD) involves proteostasis dysregulation causing protein misfolding, but whether these structural changes manifest as plasma conformational biomarkers remains unclear. We profiled plasma protein structures from 520 participants including individuals with AD, individuals with mild cognitive impairment (MCI) and healthy controls. Using mass spectrometry and machine learning, we systematically characterized the structural proteome changes associated with ApoE variations and neuropsychiatric symptoms to identify AD-specific signatures. We developed a diagnostic panel using peptides from C1QA, CLUS and ApoB representing AD-associated structural changes. This three-marker panel achieved 83.44% accuracy in three-way classification (healthy versus MCI versus AD). Binary classification yielded area under the receiver operating characteristic curves of 0.9343 for healthy versus MCI and 0.9325 for MCI versus AD. Longitudinal samples were classified with 86.0% accuracy. This multi-marker panel based on plasma protein structural alterations represents a promising diagnostic approach that may enhance early AD detection and provide insights for clinical trials, improving therapeutic outcomes.
Main
Proper protein folding and the removal of misfolded proteins are essential for cellular function1. Despite meticulous cellular surveillance systems, approximately 30% of newly synthesized proteins are prone to misfolding2,3. Misfolded proteins cause cellular malfunctions, including mitochondrial dysfunction, calcium dysregulation and inflammation4. In the AD brain, declining proteostasis impairs the maintenance of proper protein folding. AD is closely correlated with aging, and declining proteostasis allows damaged organelles and misfolded proteins to accumulate, ultimately causing extensive protein aggregation5. Aggregated amyloid-β, an AD hallmark, accumulates between neurons approximately 15 years before clinical symptoms appear6. As understanding of AD’s complex biology has grown, research has expanded beyond the misfolding of individual proteins, as in amyloid plaques and tau tangles, to include protein conformations and interactions7,8,9. Bamberger et al.8 reported conformational differences in brain tissue proteins between AD and normal controls, suggesting conformational perturbations beyond amyloid-β misfolds. Comprehensive investigation of global structural changes in AD-associated proteins could reveal mechanisms underlying disease risk factors or symptoms and potentially identify plasma protein structural signatures.
APOE is a polymorphic gene encoding apolipoprotein E (ApoE), a plasma protein that transports cholesterol and other blood lipids. Three major allelic variants exist (ε2, ε3, ε4), with APOE ε3 being the most common (70–80% allele frequency)10. The ApoE isoforms differ by only 1–2 amino acids but show different binding interactions11. APOE ε4 and ε2 are associated with altered AD risk12: individuals with ε4/ε4 were 15 times more likely to develop AD than those with ε3/ε3, whereas ε2/ε2 carriers showed a 40% lower risk. Approximately 50–65% of individuals with AD carry the ε4 allele10,13. Studies have distinguished the protein and RNA expression profiles of APOE genotypes and identified their impact on protein networks11,14. However, despite evidence of differential binding, few studies have examined how the APOE genotype affects the structures of ApoE-interacting proteins.
Almost all individuals with AD develop neuropsychiatric symptoms (NPSs), and women reportedly progress to severe cognitive impairment more rapidly than men15,16,17,18. Studies have found that delusions are more frequent in women with AD, whereas apathy and agitation are more common in men19,20. Interest in characterizing the molecular profiles and pathway alterations related to NPSs and AD continues to grow21. However, the relationship between sex and NPSs remains unsubstantiated22, and providing molecular pathological evidence is challenging because of the substantial clinical heterogeneity of AD23,24.
In this study, we conducted an unbiased analysis to determine whether protein structural changes underlie AD post-onset symptoms. We hypothesized that AD-induced proteostasis reduction causes structural perturbations in AD-associated plasma proteins. We examined how ApoE-interacting proteins were structurally influenced by APOE allelic variations and identified proteins with structural changes correlating with NPS severity in male and female participants. We used deep learning to develop an optimized multi-marker panel from AD-associated conformational changes in three proteins (C1QA, CLUS and ApoB). This panel performed best among 18 candidate panels, each created with a distinct machine learning algorithm and evaluated using 470 blood samples from healthy controls, individuals with MCI and individuals with AD. The three-marker panel achieved 83.44% accuracy when simultaneously classifying healthy, MCI and AD in the independent test set, was unaffected by age-related health conditions and was validated using 50 longitudinal follow-up samples.
Results
Demographic characteristics of participants from two cohorts and profiles of protein structure
The protein structures of 520 plasma samples from two retrospective cohorts were profiled. The University of Kansas Alzheimer’s Disease Research Center (KU ADRC) cohort (n = 320) included medical history, cognitive testing results and neuropsychiatric symptoms. Of these individuals, 50 were followed longitudinally to track disease progression for up to 255 days. The University of California San Diego (UCSD) cohort (n = 200) included sex, age and cerebrospinal fluid (CSF) amyloid-β and tau measurements (Table 1). The presence of AD-related brain pathology was determined using CSF amyloid-β and tau measurements (Extended Data Fig. 1). Median amyloid-β 42 (Aβ42) concentrations were 408 pg ml−1 (AD), 553 pg ml−1 (MCI) and 701 pg ml−1 (healthy individuals). Median Aβ42/40 ratios were 0.0448 (AD), 0.0639 (MCI) and 0.0746 (healthy individuals). Median total-tau concentrations were 658 pg ml−1 (AD), 385 pg ml−1 (MCI) and 304 pg ml−1 (healthy individuals). Median concentrations of tau phosphorylated at threonine 181 (p-tau181) were 93.7 pg ml−1 (AD), 49.8 pg ml−1 (MCI) and 42.1 pg ml−1 (healthy individuals). CSF biomarker concentrations in individuals with AD differed significantly from those in individuals with MCI and healthy individuals (analysis of variance, adjusted P (Padj) < 0.05).
Age did not differ across diagnostic groups in either cohort (P = 0.7880 for KU ADRC, P = 0.2026 for UCSD); the only age difference involved the longitudinal follow-up samples, which came from slightly older participants than the cross-sectional samples. This difference did not affect the identification of disease-specific protein structural changes or the generation of the biomarker panel, because the longitudinal samples were used solely to validate the multi-marker panel. In the KU ADRC cohort, APOE genotypes were determined for 256 of 270 individuals (94.8%), none of whom had the APOE ε2/ε2 genotype. In total, 33 healthy individuals (32.4%), 35 individuals with MCI (45.3%) and 54 individuals with AD (55.9%) carried APOE ε4, while 2 healthy individuals (2.0%), 8 individuals with MCI (10.7%) and 17 individuals with AD (18.3%) had two ε4 copies. The average Mini-Mental State Examination (MMSE) and Clinical Dementia Rating Sum of Boxes (CDRSUM) scores were 29.0 ± 1.1 and 0.1 ± 0.4 (healthy individuals), 26.5 ± 2.6 and 1.7 ± 0.8 (individuals with MCI) and 22.4 ± 4.9 and 5.5 ± 3.2 (individuals with AD), respectively.
We globally profiled proteome structures in blood samples from all participants (n = 520) using covalent protein profiling (CPP), a chemical method for quantitative protein footprinting that distinguishes solvent-exposed from buried lysine residues (Fig. 1a). This method quantifies protein conformational changes on a proteome-wide scale based on dimethyl labeling of exposed lysine residues. CPP quantifies lysine exposure as accessibility (%), independent of protein abundance. Accessibility reflects the proportion of dimethylated lysine residues relative to the theoretical maximum. Higher accessibility indicates that a lysine site is more exposed to solvent and therefore more readily dimethylated, whereas lower accessibility indicates a buried or structurally protected lysine residue. Although CPP cannot resolve atomic-level structural changes, the accessibility score offers a quantitative, global perspective, reflecting structural changes across the entire proteome measured in a single experimental batch. We quantified 3,655 labeled lysine-containing peptides corresponding to 373 proteins from 520 blood samples. Subsequently, we focused on 879 labeled peptides (representing 109 proteins) quantified in more than 25% of all participants (n > 130) (Supplementary Fig. 1). This study aimed to identify AD markers based on structural changes rather than to conduct in-depth proteome profiling or achieve proteomic analysis with a large dynamic range. Prioritizing the potential utility of AD markers in clinical settings, we minimized sample preparation to reduce experimental errors. We assumed that if structural changes in relatively abundant proteins could effectively discriminate AD, these findings would offer advantages for clinical application through reliable and consistent measurements.
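The accessibility metric and the detection-rate filter described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: it assumes a two-step labeling design in which lysines exposed in the native protein carry one dimethyl label and lysines labeled only after denaturation carry a second, distinguishable label, so that their summed signal stands in for the theoretical maximum. The function names and inputs are hypothetical.

```python
from typing import Optional


def accessibility(native_intensity: float, denatured_intensity: float) -> Optional[float]:
    """Percent accessibility of one lysine site in one sample.

    Assumption (hypothetical design): native_intensity is the signal of the
    peptide labeled under native conditions (exposed lysines), and
    denatured_intensity is the signal of the same peptide labeled only after
    denaturation (buried lysines). Their sum stands in for the maximum.
    """
    total = native_intensity + denatured_intensity
    if total <= 0:
        return None  # site not quantified in this sample
    return 100.0 * native_intensity / total


def passes_detection_filter(values: list[Optional[float]], min_fraction: float = 0.25) -> bool:
    """Keep a peptide only if it was quantified in more than min_fraction of samples."""
    quantified = sum(v is not None for v in values)
    return quantified > min_fraction * len(values)


# Toy example: one peptide measured in 8 samples (zero signal = not quantified).
raw = [(90.0, 10.0), (0.0, 0.0), (45.0, 5.0), (0.0, 0.0),
       (0.0, 0.0), (0.0, 0.0), (0.0, 0.0), (0.0, 0.0)]
acc = [accessibility(n, d) for n, d in raw]
print(acc[:3])                       # [90.0, None, 90.0]
print(passes_detection_filter(acc))  # False: 2 of 8 is not more than 25%
```

With 520 samples, the threshold `min_fraction * len(values)` equals 130, matching the n > 130 criterion in the text.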
To monitor quality, we pooled individual samples to create 13 quality control samples, which were run at intervals during liquid chromatography–mass spectrometry (LC–MS) analysis of the individual samples (Supplementary Fig. 2a–c). The coefficient of variation was approximately 5% or less for both labeled peptides and non-lysine-containing peptides, indicating that technical variation was low and the analysis was reproducible.
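The per-peptide coefficient of variation used for quality control can be computed as below. This sketch assumes the CV is the sample standard deviation divided by the mean, expressed as a percentage; the peptide names and values are hypothetical.

```python
import statistics


def coefficient_of_variation(values: list[float]) -> float:
    """Percent CV: sample standard deviation divided by the mean, times 100."""
    return 100.0 * statistics.stdev(values) / statistics.mean(values)


def mean_cv(measurements: dict[str, list[float]]) -> float:
    """Average the per-peptide CVs across all peptides in a set of QC runs."""
    return statistics.mean(coefficient_of_variation(v) for v in measurements.values())


# Hypothetical accessibility (%) of two peptides across five QC injections.
qc = {
    "peptide_A": [92.0, 94.0, 93.0, 95.0, 91.0],
    "peptide_B": [80.0, 80.0, 80.0, 80.0, 80.0],
}
print(round(coefficient_of_variation(qc["peptide_A"]), 2))  # 1.7
print(round(mean_cv(qc), 2))  # 0.85
```

The same function applies unchanged to the intensities of non-lysine-containing peptides.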
a, Workflow of CPP using the blood samples. b, Averaged accessibility decreased significantly across groups (healthy versus MCI: *P = 0.019, MCI versus AD: *P = 0.046, two-sided t-test). The box represents the IQR (25th–75th percentiles); center line, median; whiskers, 1.5× the IQR; outliers are shown as individual points. c,d, Mean coefficient of variation (CV) for 879 labeled peptides (c) and 1,432 non-labeled peptides (d) in 520 individuals. The error bars indicate the standard deviation (*P = 0.0049, two-sided t-test). e, Peptides with significantly different accessibility (green) or intensity (blue) between groups (P < 0.05, two-sided t-test). Darker shading indicates the significant proportions. f, Protein conformational changes compared as fold change between disease groups. Early accessibility changes showed a stronger correlation with overall changes (R = 0.83) than late changes did (R = 0.71). R is Spearman’s correlation coefficient. The blue line represents the best-fit linear regression.
Assessing the potential of protein structure as a classifier via global profiling
To evaluate the potential of protein structure for disease classification, we compared interindividual variability in protein structure with variability in protein expression. Across the 879 labeled peptides, accessibility averaged 93.2% for healthy individuals (n = 227), 92.0% for individuals with MCI (n = 135) and 91.1% for individuals with AD (n = 158) (Fig. 1b). Overall, accessibility decreased across the three groups, suggesting that lysines become less exposed as the disease progresses, presumably because of dysregulated proteostasis. The differences were statistically significant for both healthy versus MCI (P = 0.019) and MCI versus AD (P = 0.046; two-sided t-test). To assess interindividual structural variability, we determined the coefficient of variation (CV) of accessibility for each peptide. The average CV within each group increased as the disease progressed: 7.4% (interquartile range (IQR) 2.6–8.3) for healthy individuals, 8.1% (IQR 3.2–9.9) for individuals with MCI and 8.8% (IQR 3.4–10.8) for individuals with AD (Fig. 1c). The decrease in accessibility and the increase in its variability across groups are consistent with the hypothesis that proteostasis declines progressively in the disease state, leading to protein misfolding25,26. The abundance of non-lysine-containing peptides (which typically conveys protein expression information) varied less, with CVs of 6.1% (IQR 4.5–7.3) for healthy individuals, 6.3% (IQR 4.6–7.5) for individuals with MCI and 6.1% (IQR 4.2–7.4) for individuals with AD (Fig. 1d); these CVs did not differ among the three groups. Thus, structural changes exhibited slightly higher variability than expression changes. When structural and expression changes were compared at the protein level, 55 proteins overlapped for MCI versus AD and 66 proteins overlapped for healthy individuals versus individuals with MCI (Supplementary Fig. 3).
Under physiological dysregulation, proteins undergo post-synthesis modifications that affect their function and stability, leading to protein misfolding. By repeatedly measuring the 13 quality control samples, we could also compare technical variability with the variability among heterogeneous individual samples. The structural variability of proteins in the quality control samples was slightly lower than that in the individual samples. The higher variability of the individual samples probably results from genetic variation (such as single-nucleotide polymorphisms) or environmental factors. Nevertheless, the average coefficient of variation for the individual samples remained below 10%, indicating considerable uniformity of proteomic conformations within the same group. These results suggest that protein structural changes can be used as signatures for disease classification.
To investigate structural alterations in proteins across groups, we analyzed the accessibility of labeled peptides. In parallel, we assessed differences in protein expression levels by examining the intensity of peptides lacking lysine residues (non-K-containing peptides). For healthy versus MCI, accessibility differed significantly in 42.3% (n = 372) of 879 labeled peptides, and intensity differed significantly in 31.8% (n = 456) of 1,431 non-lysine-containing peptides. For the later-stage comparison (MCI versus AD), significant differences were observed in the accessibility of 22.5% (n = 198) of 879 labeled peptides and in the intensity of 19.0% (n = 273) of 1,431 non-lysine-containing peptides (Fig. 1e). Despite the higher within-group variability of accessibility, protein structure appeared to be more influenced by disease than protein expression was. These observations suggest that protein structure may contain essential information for identifying disease state. Assuming that protein structures are dysregulated by AD, we examined how the structural changes at each stage correlated with the overall change across disease progression, comparing early (healthy to MCI) and late (MCI to AD) changes with the overall (healthy to AD) change. The early changes correlated more strongly with the overall changes (R = 0.83) than the late changes did (R = 0.71) (Fig. 1f). These observations suggest a nonlinear relationship between protein structural changes and disease stage: some changes occur early, whereas others emerge as the disease advances.
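One way to reproduce the stage-wise comparison is to compute, for each peptide, log2 fold changes of group-mean accessibility for the early (healthy to MCI), late (MCI to AD) and overall (healthy to AD) transitions, and then correlate them using Spearman's coefficient. The log2-ratio definition of fold change and the toy group means are assumptions for illustration; the section does not state exactly how the fold changes were computed.

```python
import math
import statistics


def average_ranks(values: list[float]) -> list[float]:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x: list[float], y: list[float]) -> float:
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x: list[float], y: list[float]) -> float:
    """Spearman's rho: the Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def stage_fold_changes(healthy, mci, ad):
    """Per-peptide log2 ratios of group-mean accessibility (hypothetical inputs)."""
    early = [math.log2(m / h) for h, m in zip(healthy, mci)]
    late = [math.log2(a / m) for m, a in zip(mci, ad)]
    overall = [math.log2(a / h) for h, a in zip(healthy, ad)]
    return early, late, overall


# Toy group-mean accessibilities (%) for five peptides.
healthy = [95.0, 90.0, 80.0, 70.0, 60.0]
mci = [94.0, 85.0, 78.0, 60.0, 58.0]
ad = [93.0, 84.0, 70.0, 58.0, 50.0]
early, late, overall = stage_fold_changes(healthy, mci, ad)
print(round(spearman(early, overall), 2), round(spearman(late, overall), 2))
```

Because log2 ratios are additive, early plus late equals overall for every peptide, so the two correlations indicate how much of the overall change each stage carries.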
Impact of the APOE genotype on the structure of the proteome
APOE ε4 poses a genetic risk for AD, whereas APOE ε2 has protective effects12,27,28. Studies have reported that ApoE function differs among isoforms29,30,31,32. We hypothesized that investigating structural changes in ApoE-interacting proteins according to APOE genotype could enhance our understanding of AD and help predict its onset and progression. To this end, we used β coefficients from multiple linear regression models for each labeled peptide to identify proteins structurally associated with ApoE, regardless of disease status. APOE ε3/ε3 (the most prevalent genotype) served as the reference. Ninety-one labeled peptides mapping to 43 proteins showed nominally significant accessibility differences (P < 0.05) for ε4/ε4, ε4/ε3 or ε2/ε3 (Fig. 2a). Of these 91 peptides, 73 had significantly lower accessibility in ε4/ε4; of these 73, 10 showed accessibility differences with only one ε4 allele copy. After multiple comparison correction (Benjamini–Hochberg, Padj < 0.05), seven peptides remained significantly different in ε4/ε4. Of the 43 proteins corresponding to these 91 peptides, 25 are linked to ApoE in the STRING database (Fig. 2b). Twenty-six proteins (including ApoE) were enriched for 17 Gene Ontology terms that can be grouped as binding activities, enzyme activities, lipid metabolism and protein-folding chaperone binding (Supplementary Fig. 4). Four proteins exhibited decreased expression with ApoE ε4 compared with ε3/ε3 (Supplementary Table 1).
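The genotype model and the multiple-testing step can be sketched with NumPy as follows. This is a hedged illustration rather than the study's code: it fits ordinary least squares for one peptide with indicator (dummy) columns for ε2/ε3, ε3/ε4 and ε4/ε4, so that each β is the accessibility shift relative to the ε3/ε3 reference, and it applies the standard Benjamini–Hochberg adjustment to a list of P values. Computing the P values themselves (for example, from t statistics) is omitted, and the optional covariates are hypothetical placeholders.

```python
import numpy as np

# Genotype labels written in ASCII; e3/e3 is the reference level, so it gets no column.
NON_REFERENCE = ["e2/e3", "e3/e4", "e4/e4"]


def genotype_betas(accessibility, genotypes, covariates=None):
    """OLS fit of one peptide's accessibility on APOE genotype indicators.

    covariates: optional (n,) or (n, k) array of extra adjustment terms
    (hypothetical; the section does not list the covariates used).
    """
    y = np.asarray(accessibility, dtype=float)
    dummies = np.array([[g == level for level in NON_REFERENCE] for g in genotypes], dtype=float)
    columns = [np.ones(len(y)), dummies]  # intercept = mean of the reference group
    if covariates is not None:
        columns.append(np.asarray(covariates, dtype=float))
    X = np.column_stack(columns)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return dict(zip(NON_REFERENCE, coef[1:1 + len(NON_REFERENCE)]))


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted P values (step-up, kept monotone)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest P value downwards
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical peptide whose accessibility shifts by genotype.
shift = {"e3/e3": 0.0, "e2/e3": 2.0, "e3/e4": -1.0, "e4/e4": -5.0}
genotypes = ["e3/e3", "e2/e3", "e3/e4", "e4/e4"] * 3
acc = [90.0 + shift[g] for g in genotypes]
betas = genotype_betas(acc, genotypes)
print({k: round(float(v), 3) for k, v in betas.items()})
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.5]))
```

In practice the regression is repeated for every labeled peptide, and the resulting P values for each genotype term are adjusted together.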
a, Left: heatmap representing P values of β coefficients from multiple linear regression analysis assessing peptide accessibility–APOE genotype associations. Gray represents nonsignificant values. Right: heatmap representing β coefficients, showing the direction and magnitude of accessibility changes across APOE genotypes. The black boxes indicate peptides that were significant after false discovery rate correction (Padj < 0.05). b, Based on the STRING database, 22 proteins (gray circles) interact directly with APOE; three proteins (white circles) do not. Confidence cutoff = 0.4. The line colors represent the APOE genotype significantly associated with each protein. c, Unsupervised clustering based on the accessibilities of 91 labeled peptides with APOE genotype associations. Both clusters mainly included diseased individuals but were not disease-specific. d, APOE ε4/ε4 carriers showed significant C1QA structural changes (decreased GFCDTTNLKGLF accessibility) compared with the other genotypes (n = 13, 64, 47, 17 for ε3/ε2, ε3/ε3, ε3/ε4, ε4/ε4). The error bars represent the s.d. Kruskal–Wallis test with Dunn’s post hoc test: ε3/ε2 showed higher accessibility than all other genotypes (P values: ε3/ε3, 0.037; ε3/ε4, 0.016; ε4/ε4, <0.001); ε4/ε4 showed lower accessibility than ε3/ε3 (P = 0.004) and ε3/ε4 (P = 0.012). *P < 0.05; **P < 0.005; ***P < 0.0001. e, DVFEEGTEASAATAVKITLL (SERPINA3) accessibility showed a bimodal distribution. The bar represents the percentage of individuals with accessibility <20% per APOE genotype. A Fisher’s exact test was used (P = 0.014); pairwise Bonferroni correction identified group differences (*Padj = 0.024).
Unsupervised clustering of the 91 ApoE-associated labeled peptides revealed two distinct clusters: one consisting mainly of healthy individuals and another of individuals with MCI or AD. These peptides could not differentiate between AD and MCI (Fig. 2c). We had anticipated that ApoE-related structural information might not effectively distinguish MCI from AD, because APOE ε4 is a common risk factor for both and MCI represents the transitional stage between healthy aging and AD. We identified 37 labeled peptides showing significant accessibility differences between individuals with the same disease in different clusters, but not between individuals with different diseases in the same cluster (Supplementary Fig. 5a). Among these, C1QA exhibited the most significant accessibility differences across APOE genotypes. APOE ε4/ε4 carriers showed the lowest accessibility, significantly reduced compared with ε2/ε3, ε3/ε3 and ε3/ε4 carriers (Fig. 2d, Padj < 0.05). We observed a bimodal accessibility of DVFEEGTEASAATAVKITLL (SERPINA3) based on APOE allelic variants (Fig. 2e and Supplementary Fig. 5b), indicating two distinct subpopulations with high or low accessibility rather than a continuous distribution. Notably, the proportion of individuals with low accessibility increased in APOE ε4 carriers (particularly ε4/ε4), potentially reflecting genotype-associated population shifts. In 58.7% (n = 34) of ε4 allele carriers, DVFEEGTEASAATAVKITLL accessibility was less than 20%.
To investigate the structural basis for APOE genotype-dependent changes in SERPINA3 accessibility, we performed computational structural analysis. To our knowledge, no study has reported direct experimental evidence for a physical ApoE–SERPINA3 interaction. However, haptoglobin (HP) has been experimentally shown to interact with both ApoE33,34,35,36,37,38 and SERPINA339,40,41. We therefore hypothesized that HP serves as a mediator in a tripartite complex, facilitating APOE genotype-dependent structural changes in SERPINA3. We constructed an ApoE–HP–SERPINA3 ternary complex model using AlphaFold multimer to explore potential indirect effects on SERPINA3 (DVFEEGTEASAATAVKITLL) across ApoE isoforms; the model showed a progressive reduction from ApoE ε2 (16.3 Å; Extended Data Fig. 2a,b) to ε3 (15.9 Å; Extended Data Fig. 2c,d) to ε4 (10.0 Å; Extended Data Fig. 2e,f). This ternary complex model provides computational support for the ApoE-dependent SERPINA3 accessibility changes observed in our CPP data. Furthermore, the predicted SERPINA3 structure aligned well with the experimentally determined structure (PDB 5OM2, X-ray diffraction, 1.47 Å resolution), with root mean square deviation (RMSD) values of 1.76 Å (ApoE ε2), 1.78 Å (ApoE ε3) and 1.65 Å (ApoE ε4). Interface analysis revealed distinct structural characteristics among ApoE isoforms, with ApoE ε4 showing the highest contact density (Extended Data Fig. 2g) and the most efficient molecular packing despite a moderate interface area (Extended Data Fig. 2h). Contact efficiency measurements showed that ApoE ε4 formed more intermolecular interactions per unit surface area than ApoE ε2 or ApoE ε3. These structural differences corresponded to a progressive increase in predicted binding affinity (Gibbs free energy change, ΔG; Extended Data Fig. 2i) from ApoE ε2 to ApoE ε4, suggesting that ApoE ε4 forms the most structurally optimized complex with SERPINA3.
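RMSD values such as those reported against 5OM2 are conventionally computed after optimal rigid-body superposition of matched atoms. The NumPy sketch below uses the Kabsch algorithm for that superposition; it is a generic illustration, not the tool used in this study, and it assumes the two coordinate arrays are already matched atom for atom (for example, Cα atoms of aligned residues).

```python
import numpy as np


def kabsch_rmsd(P, Q) -> float:
    """RMSD between two matched (N, 3) coordinate sets after optimal superposition."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    # Remove translation by centering both sets on their centroids.
    Pc = P - P.mean(axis=0)
    Qc = Q - Q.mean(axis=0)
    # Optimal rotation from the SVD of the 3x3 covariance matrix (Kabsch).
    H = Pc.T @ Qc
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))  # flip one axis to avoid a reflection
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    diff = Pc @ R.T - Qc
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))


# Toy check: rotate four points 90 degrees about z and translate them.
P = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 3.0]])
Rz = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
Q = P @ Rz.T + np.array([5.0, -2.0, 1.0])
print(round(kabsch_rmsd(P, Q), 6))  # 0.0
```

A pure rotation plus translation gives an RMSD of zero, so any nonzero value reflects genuine shape differences between the predicted and experimental structures.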
Our computational structural analyses showed close agreement with experimentally determined SERPINA3 structures, and the predicted ternary complex is consistent with the CPP-derived finding of reduced accessibility in ApoE ε4 carriers. Interface analysis provided orthogonal support for our experimental results. These findings provide discovery-level evidence for ApoE ε4-associated structural differences and lay the foundation for future biochemical studies elucidating the biological implications of these conformational changes.
Protein structural changes are differentially influenced by neuropsychiatric symptoms in the sexes
NPSs such as depression, anxiety, elation, hallucination and apathy are observed in more than 97% of individuals with AD18. To evaluate NPSs quantitatively, we summed the ratings assigned by patients for 12 NPSs (agitation, anxiety, apathy, appetite, delusions, depression, disinhibition, elation, hallucination, irritability, motor disturbance and nighttime behaviors), each rated 0 (none), 1 (mild), 2 (moderate) or 3 (severe). Summed scores ranged from 0 to 24 (maximum possible, 36). In healthy individuals and individuals with MCI, the NPS score did not differ between the sexes, whereas in the AD group, female participants scored higher than male participants, suggesting that female participants with AD had greater cognitive impairment and mood disturbance than male participants42,43 (Extended Data Fig. 3a,b). Based on these sex-dependent symptom phenotypes, we examined associations between NPSs and protein structural changes, assessing the linear relationship between labeled peptide accessibility and NPS score using linear regression. Most peptides showed a negative relationship between accessibility and NPS score (Fig. 3a). Among peptides with significant linear relationships, 74 of 79 (93.7%) were negative in male participants and 110 of 129 (85.3%) in female participants. These patterns indicate that more severe NPSs are associated with decreased peptide accessibility, mirroring the global accessibility trend across healthy individuals, individuals with MCI and individuals with AD. When testing for sex differences in the accessibility–NPS relationship for each peptide, we found that 45 peptides showed sex-influenced linearity (Fig. 3a). Of these 45 peptides, 17 (representing 13 proteins) showed significant linearity (β = −4.2 to −0.1) in male participants only. Of the 45 peptides, which mapped to 19 proteins, 28 showed significant linearity (β = −4.0 to 2.4) in female participants only (Supplementary Table 2).
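The NPS scoring and the per-sex regression described above can be sketched as follows. With 12 domains each rated 0 to 3, the maximum possible sum is 36. The sketch fits a separate simple regression of accessibility on NPS score for each sex and returns the slopes (β); it does not compute P values or the formal test of sex differences used in the study, and the record layout is hypothetical.

```python
import statistics

NPS_DOMAINS = (
    "agitation", "anxiety", "apathy", "appetite", "delusions", "depression",
    "disinhibition", "elation", "hallucination", "irritability",
    "motor disturbance", "nighttime behaviors",
)


def nps_total(ratings: dict[str, int]) -> int:
    """Sum of 0-3 severity ratings over the 12 domains (maximum 36)."""
    total = 0
    for domain in NPS_DOMAINS:
        rating = ratings[domain]  # raises KeyError if a domain is missing
        if rating not in (0, 1, 2, 3):
            raise ValueError(f"{domain}: rating must be 0-3, got {rating}")
        total += rating
    return total


def ols_slope(x: list[float], y: list[float]) -> float:
    """Slope (beta) of a simple least-squares regression of y on x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def slope_by_sex(records):
    """records: (sex, nps_score, accessibility) tuples; returns beta per sex."""
    out = {}
    for sex in sorted({r[0] for r in records}):
        xs = [r[1] for r in records if r[0] == sex]
        ys = [r[2] for r in records if r[0] == sex]
        out[sex] = ols_slope(xs, ys)
    return out


# Hypothetical records for one peptide: accessibility falls as NPS score rises.
records = [("F", 0, 92.0), ("F", 4, 89.0), ("F", 8, 86.0),
           ("M", 0, 91.0), ("M", 5, 89.0), ("M", 10, 87.0)]
print(slope_by_sex(records))  # {'F': -0.75, 'M': -0.4}
```

A negative β, as in both sexes here, corresponds to the pattern described above, in which lysines become less accessible as NPSs worsen.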
Meanwhile, 26 labeled peptides showed accessibility variation with NPS scores in both sexes without sex dependence. The β coefficients for these 26 peptides were highly correlated between male and female participants (R = 0.93, 95% confidence interval (CI) 0.85–0.97) (Fig. 3b). Of the 20 proteins mapped by these 26 peptides, 10 (APCS, ApoD, C1S, CLUS, FN1, GSN, ITIH4, KLKB1, PROS1 and SERPINA3) showed no sex-specific influence in any identified peptide region (Fig. 3c and Supplementary Table 3). These 10 proteins were enriched for amyloidosis and AD terms in the Human Disease Ontology database (Fig. 3c).
a, Simple linear regression analyses for male and female participants, plotted as β coefficient versus −log(P). The labeled dots represent peptides showing significant accessibility–NPS relationships (P < 0.05) with sex differences. Red and blue represent positive and negative relationships, respectively. b, Twenty-six peptides showed significant NPS–accessibility relationships without sex influence. β coefficients were highly correlated between the sexes. c, Protein interaction network showing nine of the ten proteins interacting with each other; PROS1 is not shown (no interactions). d, Four peptides from four proteins (CLUS, PROS1, ITIH2 and C3) in male participants and five peptides from five proteins (FN1, CFB, HPX, CLUS and PROS1) in female participants showed AUROCs > 0.7 for both healthy versus MCI and MCI versus AD. VTVEKGSYYPGSGIAQF (PROS1) and SVDCSTNNPSQAKL (CLUS) showed AUROCs > 0.7 in both sexes. Diamonds indicate β coefficients; nonsignificant values are not shown. e, SVDCSTNNPSQAKL (CLUS) accessibility was significantly related to NPSs in both sexes using linear regression. No sex differences were observed. Blue and red indicate male and female participants, respectively. The shaded area indicates the 95% CI.
NPS scores showed higher discriminatory power in female than in male participants. Area under the receiver operating characteristic curve (AUROC) values for healthy versus AD differed significantly between male (0.8435) and female (0.9321) participants (DeLong test, P = 0.0477), suggesting more severe cognitive impairment in women during AD progression. We then evaluated the discriminatory power of protein structural changes reflecting these differences in cognitive impairment. More peptides were discriminatory (AUROC > 0.7) at the early disease stage than at the advanced stage: we identified 160 and 115 labeled peptides for healthy versus MCI, and 54 and 80 for MCI versus AD, in male and female participants, respectively. Four peptides from four proteins (C3, CLUS, ITIH2 and PROS1) in male participants and five peptides from five proteins (CFB, CLUS, FN1, HPX and PROS1) in female participants showed AUROCs greater than 0.7 for both healthy versus MCI and MCI versus AD (Fig. 3d). ITIH2 conformation was significantly correlated with NPS severity, with large negative β coefficients (−5.24 and −3.29) for DMENFRTEVNVLPGAKVQF accessibility versus NPS score in male and female participants, respectively. This peptide showed a significant sex difference in discriminatory power for MCI versus AD, performing better in men (AUROC = 0.838) than in women (AUROC = 0.625) (Extended Data Fig. 3c). Studies have reported a beneficial effect of ITIH2 on neurological recovery after brain injury, suggesting that these proteins help maintain homeostatic balance and that genetic polymorphisms in them may contribute to the risk of neuropsychiatric disorders44,45,46. Clusterin (CLUS) has been associated with Aβ clearance or aggregation47. In our study, the SVDCSTNNPSQAKL (CLUS) lysine residue became increasingly concealed as NPSs worsened (β = −0.65 in male and −0.78 in female participants, simple linear regression) (Fig. 3e).
Additionally, SVDCSTNNPSQAKL accessibility distinguished disease states in early (healthy versus MCI) and late stages (MCI versus AD) with AUROCs greater than 0.7 for both sexes (Extended Data Fig. 3d).
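Per-peptide AUROC values like those used in the AUROC > 0.7 screen above can be computed without any plotting library through the rank (Mann–Whitney) formulation. This sketch assumes that lower accessibility indicates the later disease stage, as the observed trends suggest, and therefore scores each participant by negated accessibility; the data are hypothetical, the pairwise loop is chosen for clarity rather than speed, and the DeLong test used to compare AUROCs is not shown.

```python
def auroc(positive_scores, negative_scores):
    """AUROC as the probability that a random positive outranks a random negative.

    Equivalent to the normalized Mann-Whitney U statistic; ties count one half.
    """
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))


def peptide_auroc(acc_earlier_stage, acc_later_stage):
    """Accessibility tends to fall with disease (an assumption here), so negate it
    to obtain a score where higher values point to the later (positive) stage."""
    return auroc([-a for a in acc_later_stage], [-a for a in acc_earlier_stage])


# Hypothetical accessibility (%) of one peptide in healthy and MCI participants.
healthy = [95.0, 93.0, 94.0]
mci = [90.0, 92.0, 94.0]
print(round(peptide_auroc(healthy, mci), 4))  # 0.8333
```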
Multi-marker panel based on protein structural status
To mitigate cohort-specific bias in marker selection and better represent population heterogeneity, we combined the KUMC and UCSD samples and split them into discovery (training) and test sets. Of 470 samples, two-thirds (n = 313; 135 healthy, 83 with MCI, 95 with AD) were used to identify structural signatures that simultaneously differentiate healthy versus MCI versus AD. The remaining one-third (n = 157; 67 healthy, 42 with MCI, 48 with AD) was used to test the model generated from the training set independently (Supplementary Fig. 6). Additionally, longitudinal samples (n = 50) were used to validate the model’s discriminatory power. Before training, missing values were imputed using the k-nearest neighbor method, which preserves the patterns of the original dataset more effectively than mean or minimum value imputation48. Missing value imputation was validated as described in the Supplementary Data (‘Missing value imputation validation’) and Supplementary Fig. 7.
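The two-thirds/one-third partition and the imputation step can be sketched in plain Python as follows. The stratified split rounds two-thirds of each diagnostic group; for group sizes of 202, 125 and 143 (the sums of the training and test counts above) it yields training counts of 135, 83 and 95, as in the text. The random seed, k and the distance definition are assumptions. The imputer follows one common k-nearest-neighbor variant: a Euclidean distance over coordinates observed in both rows, rescaled by the fraction observed, with each missing value replaced by the mean of its k nearest donors that have that value.

```python
import math
import random


def stratified_split(labels, train_fraction=2 / 3, seed=0):
    """Return (train_indices, test_indices), splitting each class separately."""
    rng = random.Random(seed)
    by_class: dict[str, list[int]] = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    train, test = [], []
    for label in sorted(by_class):
        idxs = by_class[label][:]
        rng.shuffle(idxs)
        n_train = round(len(idxs) * train_fraction)
        train += idxs[:n_train]
        test += idxs[n_train:]
    return sorted(train), sorted(test)


def knn_impute(rows, k=5):
    """Fill None entries with the mean of the k nearest rows that observe them.

    Assumption: donors are drawn only from the rows passed in, so imputing the
    training and test sets separately keeps test information out of training.
    """
    def distance(a, b):
        shared = [(x - y) ** 2 for x, y in zip(a, b) if x is not None and y is not None]
        if not shared:
            return math.inf
        # Rescale so rows with fewer shared coordinates remain comparable.
        return math.sqrt(sum(shared) * len(a) / len(shared))

    imputed = [row[:] for row in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            donors = sorted(
                (distance(row, other), other[j])
                for o, other in enumerate(rows)
                if o != i and other[j] is not None
            )[:k]
            finite = [v for d, v in donors if d != math.inf]
            if finite:
                imputed[i][j] = sum(finite) / len(finite)
    return imputed


labels = ["healthy"] * 202 + ["MCI"] * 125 + ["AD"] * 143
train, test = stratified_split(labels)
print(len(train), len(test))  # 313 157
print(knn_impute([[1.0, 2.0], [1.1, None], [5.0, 9.0]], k=1))  # [[1.0, 2.0], [1.1, 2.0], [5.0, 9.0]]
```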
We systematically evaluated 18 established machine learning algorithms to identify the optimal approach: random forest, gradient-boosted trees, decision tree, deep learning, k-nearest neighbor, naive Bayes (kernel), rule induction, support vector machine, linear discriminant analysis, neural net, naive Bayes, generalized linear model, decision stump, random tree, one rule (single attribute), quadratic discriminant analysis, regularized discriminant analysis and AutoMLP49. Eleven algorithms (random forest, gradient-boosted trees, decision tree, deep learning, k-nearest neighbor, naive Bayes (kernel), rule induction, support vector machine, linear discriminant analysis, neural net, generalized linear model, AutoMLP) achieved 70% or greater accuracy on both the training and test sets (Fig. 4a). Deep learning, random forest and k-nearest neighbor exhibited superior test set performance, with accuracies of 83.4%, 81.5% and 80.2%, respectively. The deep-learning-based panel achieved the highest test set accuracy (83.4%), with no significant reduction from its training set accuracy (88.5%) in the independent test set, suggesting limited overfitting. The multi-marker model included GFCDTTNKGLF (C1QA), SVDCSTNNPSQAKL (CLUS) and AVLCEFISQSIKSF (ApoB).
a, Eighteen machine learning algorithms were tested. The red bars show algorithms with an accuracy greater than 70%. Deep learning was selected for the highest test set accuracy (83.4%). b, Accuracy was 88.49% on the training set and 83.44% on the test set. The numbers in the boxes indicate individuals classified using the deep-learning-based multi-marker model. c, GFCDTTNLKGLF (C1QA) and SVDCSTNNPSQAKL (CLUS) were commonly selected in ten multi-marker models with an accuracy greater than 70%. d, Accessibility distributions of GFCDTTNLKGLF (C1QA) and SVDCSTNNPSQAKL (CLUS) by group. A two-sided t-test was used. *P < 0.05; ***P < 0.001; ****P < 0.0001. e,f, The multi-marker model applied to binary classification: healthy versus MCI (e; AUROC = 0.9343) and MCI versus AD (f; AUROC = 0.9325). The blue area indicates the 95% CI. Right: the bar graphs show the performance indices. NPV, negative predictive value; PPV, positive predictive value.
Using this multi-marker model, 84 of 95 individuals with AD in the training set were correctly classified based on structural status. Similarly, 42 of 48 individuals with AD in the test set were correctly classified (Fig. 4b). In both sets, all false-negative AD cases were assigned to MCI, and all misclassified healthy individuals were classified as MCI. When individuals with AD were misclassified as MCI, the structural variation trends of all three proteins resembled those of the precursor condition (MCI). Misclassified individuals with MCI were assigned to both neighboring groups, more often to AD than to healthy. Notably, GFCDTTNKGLF (C1QA) and SVDCSTNNPSQAKL (CLUS) consistently emerged as selected features in multi-marker panels developed by ten different algorithms, each with more than 80% training set accuracy and more than 70% test set accuracy (Fig. 4c). As expected, the accessibility of these two peptides decreased consistently from healthy to MCI to AD, with significantly different distributions between groups (Fig. 4d). When the three-marker deep-learning-based panel was used to distinguish two groups (healthy versus MCI or MCI versus AD), disease state discrimination improved, with AUROCs of 0.9343 (95% CI 0.8874–0.9815) for healthy versus MCI (Fig. 4e) and 0.9325 (95% CI 0.8753–0.9898) for MCI versus AD (Fig. 4f). All performance indices (F1 score, accuracy, precision, recall) exceeded 0.75 for both discriminations. To benchmark our structural change-based approach against conventional expression-based approaches, we used deep learning to construct a multi-marker model using intensity data from the same protein set. This model, trained on the training set and validated on the independent test set, achieved 64.3% overall accuracy for simultaneous three-group classification (healthy, MCI or AD) (Extended Data Fig. 4a,b).
Furthermore, we developed a comprehensive expression-based model using intensity data from all available proteins to establish the maximum diagnostic potential of conventional approaches. With this complete intensity dataset, three-group classification achieved a modest 65.0% test set accuracy. Group-specific analysis showed the highest sensitivity for the healthy group (79.1%), followed by AD (60.4%) and MCI (47.6%) (Extended Data Fig. 4c,d). Collectively, these results indicate that incorporating structural protein changes into diagnostic models may improve classification performance compared with relying on expression-level data alone.
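The performance indices quoted above (overall accuracy and group-specific sensitivity, precision and F1) can all be derived from a three-group confusion matrix. The following is a minimal sketch of that arithmetic; the function name and the matrix values are illustrative assumptions, not the study's actual confusion matrix or code.

```python
from typing import Dict, List, Sequence

LABELS = ("healthy", "MCI", "AD")

def classification_metrics(cm: List[List[int]],
                           labels: Sequence[str] = LABELS) -> Dict[str, object]:
    """Overall accuracy plus per-class recall (sensitivity), precision and F1.

    cm[i][j] counts individuals whose true group is labels[i] and whose
    predicted group is labels[j].
    """
    n = len(cm)
    if n != len(labels) or any(len(row) != n for row in cm):
        raise ValueError("confusion matrix must be square and match the labels")
    total = sum(sum(row) for row in cm)
    per_class: Dict[str, Dict[str, float]] = {}
    for i, label in enumerate(labels):
        tp = cm[i][i]
        actual = sum(cm[i])                          # row sum: true members of the group
        predicted = sum(cm[r][i] for r in range(n))  # column sum: predicted members
        recall = tp / actual if actual else 0.0
        precision = tp / predicted if predicted else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        per_class[label] = {"recall": recall, "precision": precision, "f1": f1}
    accuracy = sum(cm[i][i] for i in range(n)) / total
    return {"accuracy": accuracy, "per_class": per_class}

# Illustrative (hypothetical) matrix: rows are true groups, columns predicted groups
example_cm = [
    [40, 5, 0],   # healthy
    [6, 30, 8],   # MCI
    [0, 6, 42],   # AD
]
metrics = classification_metrics(example_cm)
```

Group-specific sensitivity is the diagonal entry divided by its row sum, so a group that is often confused with its neighbors (as MCI is here) shows low sensitivity even when overall accuracy is reasonable.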
Evaluation of analytical confounders in protein marker discovery strategy
We validated the robustness and reliability of our marker identification approach under multiple analytical scenarios. First, we investigated whether data imputation or cohort-specific effects influenced the discriminatory power of the three-marker panel. To confirm that imputation had not distorted the dataset, we compared the panel's discriminatory power on the independent test set with and without imputation. Without imputation, simultaneous classification of the 157 test set individuals into three groups yielded an accuracy of 87.90%, comparable to that obtained with imputed data (83.44%). The AUROCs for the two binary comparisons without imputation (0.976 for healthy versus MCI and 0.939 for MCI versus AD) did not differ significantly from those obtained with imputed data (Supplementary Fig. 8a–c). These results suggest that k-nearest neighbor imputation did not alter the intrinsic distribution or characteristics of the original dataset.
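k-nearest neighbor imputation fills each missing value with the average of that feature in the k most similar samples, where similarity is computed over the features both samples have observed. The sketch below is a minimal pure-Python illustration under stated assumptions: Euclidean distance rescaled for the number of shared features, an unweighted mean of neighbors and an illustrative k. The implementation and parameters actually used in the study are not specified here, and the function names are hypothetical.

```python
import math
from typing import List, Optional

Row = List[Optional[float]]

def _distance(a: Row, b: Row) -> float:
    """Euclidean distance over features observed in both rows (assumption),
    rescaled to the full feature count so rows sharing fewer features stay comparable."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return math.inf
    squared = sum((x - y) ** 2 for x, y in shared)
    return math.sqrt(squared * len(a) / len(shared))

def knn_impute(data: List[Row], k: int = 3) -> List[List[float]]:
    """Return a copy of `data` in which each None is replaced by the mean of the
    same feature in the k nearest rows where that feature is observed.
    Donors are always taken from the original (unimputed) data."""
    imputed = []
    for i, row in enumerate(data):
        new_row = []
        for j, value in enumerate(row):
            if value is not None:
                new_row.append(value)
                continue
            # candidate donors: other rows with feature j observed, nearest first
            donors = sorted(
                (_distance(row, other), other[j])
                for r, other in enumerate(data)
                if r != i and other[j] is not None
            )[:k]
            if not donors:
                raise ValueError(f"feature {j} is missing in every row")
            new_row.append(sum(v for _, v in donors) / len(donors))
        imputed.append(new_row)
    return imputed
```

Comparing classifier performance on the imputed and unimputed versions of the same test set, as done above, is a direct check that this averaging does not create artificial group separation.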
To examine cohort-specific bias, we split the test dataset by cohort and evaluated classification performance for each cohort independently. Simultaneous classification of the three groups yielded accuracies of 89.55% and 82.22% for the UCSD and KUMC cohorts, respectively, similar to the performance in our original analysis (Extended Data Fig. 5a,b). For differentiating AD from MCI or healthy, the AUROCs exceeded 0.88 in both cohorts (Extended Data Fig. 5c), and no statistically significant difference in performance was observed between the cohorts (P > 0.05). The comparable discriminative performance in both cohorts suggests that cohort-specific differences do not substantially affect biomarker performance, supporting the generalizability and reliability of the AD-associated protein structural changes identified through the integrated cohort analysis pipeline.
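The AUROC values compared above have a simple interpretation: the probability that a randomly chosen case receives a higher score than a randomly chosen control, with ties counted as one half (the Mann–Whitney formulation of the empirical AUROC). The sketch below computes it directly from two lists of scores. It is illustrative only; the study's ROC analyses were performed with the R package pROC, and the function name and scores here are hypothetical.

```python
from typing import Sequence

def auroc(case_scores: Sequence[float], control_scores: Sequence[float]) -> float:
    """Empirical AUROC: P(case score > control score) + 0.5 * P(tie)."""
    if not case_scores or not control_scores:
        raise ValueError("both groups need at least one score")
    wins = 0.0
    for c in case_scores:
        for h in control_scores:
            if c > h:
                wins += 1.0
            elif c == h:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))

# Hypothetical AD confidence scores for MCI (controls) and AD (cases)
example = auroc(case_scores=[0.9, 0.3], control_scores=[0.5, 0.1])
```

The pairwise loop is O(n·m), which is unproblematic at cohort sizes of a few hundred; rank-based formulas give the same value more efficiently.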
In a parallel validation, we reasoned that if the proteins we initially identified were truly significant, they would be selected again when the model was trained on a single cohort. Using the KUMC cohort (102 healthy, 75 with MCI, 93 with AD) as the training dataset and the UCSD cohort (100 healthy, 50 with MCI, 50 with AD) as the independent test dataset, we found that CLUS and C1QA, the two key proteins from our original findings, were again selected, with the resulting model achieving an accuracy of 76.00% (Supplementary Fig. 9). This consistency across cohort configurations supports our methodological approach and the validity of our initial protein identification.
The power calculation details are shown in the Supplementary Data (‘Power/precision calculation: post hoc performance metrics’ section) and Extended Data Fig. 6. Age did not significantly confound the protein structural changes when evaluated using the multi-marker panel (see ‘Evaluation of age as a confounder’ in the Supplementary Data and Extended Data Fig. 7).
Correlation of the multi-marker panel with established screening tools and brain pathological change
The MMSE and CDRSUM scores strongly correlated with the AD confidence score from the multi-marker panel (R = −0.8000, 95% CI −0.8668 to −0.7049; and R = 0.8502, 95% CI 0.7781–0.9002, respectively) (Fig. 5a,b). The MMSE and CDRSUM scores of individuals with AD were distributed across the ‘normal-to-moderate impairment’ range, whereas the multi-marker panel classified all individuals with AD as having AD. Because multi-marker models are intended for disease screening, minimizing false negatives is critical for early detection.
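A 95% CI for a correlation coefficient is commonly obtained with the Fisher z-transformation, z = atanh(R), whose standard error is approximately 1/√(n − 3). The sketch below assumes this method and a Pearson-type coefficient; the text does not state how the reported intervals were computed, the function name is hypothetical and the sample size n must be supplied by the user.

```python
import math
from typing import Tuple

Z_975 = 1.959963984540054  # two-sided 95% standard normal quantile

def fisher_ci(r: float, n: int, z_crit: float = Z_975) -> Tuple[float, float]:
    """Approximate CI for a correlation r from n paired observations
    (assumes the Fisher z-transformation is appropriate)."""
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    if n <= 3:
        raise ValueError("need n > 3")
    z = math.atanh(r)              # variance-stabilizing transform
    se = 1.0 / math.sqrt(n - 3)    # approximate standard error of z
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)

# n = 80 is a hypothetical sample size, used only to show the call
lo, hi = fisher_ci(-0.80, n=80)
```

Because the interval is symmetric on the z scale and then back-transformed, it is asymmetric around R on the correlation scale, which matches the shape of the intervals reported above.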
a,b, AD confidence score from the deep-learning-based multi-marker model compared with the conventional AD screening tools MMSE (a) and CDRSUM (b). Top: the x axis shows the clinical cutoff values for AD staging. The dashed line represents the multi-marker model classification criteria. The yellow, green and red dots indicate healthy, MCI and AD, respectively. c, Correlation of ventricle volumes with the AD confidence score from the multi-marker panel. Statistical significance was assessed using a two-sided correlation test. The dark blue bars signify the R values; the gray bars indicate non-significance. P values: right inferior lateral ventricle = 0.0043, left inferior lateral ventricle = 0.0311, left lateral ventricle = 0.0332, third ventricle = 0.0372. d, AD confidence scores were significantly correlated with CSF biomarkers (n = 167, P < 0.0001, two-sided correlation test). R values were −0.3553, −0.3283, 0.3982 and 0.5986 for Aβ42, Aβ42/40 ratio, phosphorylated tau181 (p-tau181) and total tau (T-tau), respectively. The error bars indicate the 95% CIs.
We found that the multi-marker panel was associated with pathological changes in the brain. Ventricular expansion has received attention in AD research50 because ventricular enlargement resulting from brain tissue shrinkage is observed in neurodegenerative disorders. The AD confidence score from the multi-marker panel was moderately positively correlated with brain ventricle volumes measured by magnetic resonance imaging (Fig. 5c). Specifically, third ventricle volume showed the most significant correlation with the AD score (R = 0.48). Of the three proteins in the panel (CLUS, C1QA and ApoB), CLUS showed a moderate correlation with ventricle volume. The accessibility of the lysine residue in SVDCSTNNPSQAKL (CLUS) was negatively correlated with ventricle volume, consistent with its decreasing accessibility as AD progresses (Supplementary Fig. 10a). Although the correlation between our plasma-based multi-marker panel and CSF biomarkers was moderate (R = −0.24 to 0.51) (Fig. 5d and Supplementary Fig. 10b), this is consistent with the inherent biological differences between compartments. Because our multi-marker panel measures protein structural changes rather than absolute concentrations, it may provide information complementary to, but distinct from, that obtained with traditional concentration-based CSF biomarkers. The performance of the CSF markers is detailed in the Supplementary Data (‘Evaluation of the discriminatory power of the multi-marker panel depending on CSF biomarkers’) and Supplementary Figs. 11–13.
Reproducible performance of the multi-marker panel in longitudinal follow-up samples
Plasma samples from 50 individuals (healthy (n = 34), MCI (n = 4), AD (n = 12)) were collected longitudinally over a follow-up of up to 255 days to track progression to AD. Although the follow-up period was less than a year, our multi-marker panel detected not only conversion to AD but also progression from healthy to MCI (Fig. 6a). Paired samples fell into three categories: (1) healthy individuals who remained healthy at the re-visit (n = 25, healthy–healthy); (2) individuals with MCI or AD who received the same diagnosis at the re-visit (n = 14, MCI–MCI or AD–AD); and (3) healthy participants diagnosed with MCI or AD, or individuals with MCI diagnosed with AD, at the re-visit (n = 11, healthy–MCI, healthy–AD, MCI–AD). Among individual peptides, only SVDCSTNNPSQAKL of CLUS reflected disease progression, showing significantly decreased accessibility only in the group with a changed diagnosis (Extended Data Fig. 8). The other two labeled peptides (GFCDTTNKGLF of C1QA and AVLCEFISQSIKSF of ApoB) did not change significantly in accessibility even when the diagnostic status changed. The three-marker panel achieved an accuracy of 86% on the longitudinal samples. Its scores changed significantly when the disease status progressed to MCI or AD, but not when the diagnosis was unchanged between paired samples (Fig. 6b). Interestingly, when an MCI or AD diagnosis remained unchanged throughout the follow-up period, the AD confidence score from the three-marker panel increased, although not significantly.
a, Fifty longitudinal samples were classified into healthy, MCI or AD with 86.0% accuracy. b, Three scenarios were observed: remaining healthy, remaining in the same disease state (MCI or AD) and transitioning to a more advanced stage (healthy to MCI, healthy to AD or MCI to AD). A two-sided Wilcoxon test was used. *P = 0.0019.
We inferred that this trend might reflect physiological perturbations that worsened during the follow-up period even though the clinical diagnostic status remained unchanged. This suggests that, as individuals progress from healthy to AD, the three-marker panel can classify them as healthy, MCI or AD on the basis of the structural characteristics of C1QA, CLUS and ApoB, which change with these physiological perturbations.
Discussion
We identified a multi-marker panel comprising C1QA, CLUS and ApoB, proteins previously associated with AD pathophysiology and under active investigation51,52,53. However, most multi-marker studies have considered protein expression rather than structural changes as biomarker candidates. C1QA is a subunit of complement component 1q (C1q), the recognition component of the classical complement pathway. C1q binds to Aβ plaques and reportedly has a protective role in early AD, while reduced CSF complement protein levels have been implicated in AD progression54,55. The multi-marker panel includes peptide GFCDTTNKGLF from the C1q domain of C1QA, which is involved in protein folding and in the recognition of diverse molecules, including pathogen surface ligands56. GFCDTTNKGLF had the highest weight in the multi-marker panel, and its accessibility was related to ApoE genotype (higher in ApoE ε2/ε3 than in ApoE ε3/ε3). Our findings suggest that increased lysine exposure in GFCDTTNKGLF may be protective against AD, providing evidence that complement components undergo structural changes in AD. Recent cross-linking MS studies demonstrated dynamic structural alterations in complement component 3 during AD progression, particularly in its interactions with α-1-antitrypsin57. Our C1QA findings support the concept that multiple complement pathway proteins undergo conformational modifications in AD, suggesting that complement system dysfunction involves widespread structural perturbations across components. CLUS (also known as ApoJ), the second most highly weighted feature in the multi-marker panel, has been reported to bind Aβ and appears to alter its aggregation and promote its clearance, suggesting a neuroprotective role47,58. CLUS is abundant in CSF (~2–6 µg ml−1) and plasma (~100–200 µg ml−1)59 and is considered the third greatest genetic risk factor for AD after ApoE and BIN1 (ref. 59).
Aβ plaques colocalize with upregulated CLUS in the hippocampus and cortex of the AD brain51,59, and blood CLUS levels differ between patients with AD with and without cognitive impairment60. In our study, lysine exposure in SVDCSTNNPSQAKL (amino acids 310–323) of CLUS correlated with AD pathological changes. SVDCSTNNPSQAKL accessibility was linearly correlated with NPS severity (β = −0.65 and −0.78 for males and females, respectively), whereas CLUS expression did not correlate with MMSE or CDRSUM scores. SVDCSTNNPSQAKL accessibility alone showed strong discriminatory power, with AUROCs of 0.80 (healthy versus MCI) and 0.79 (MCI versus AD).
Current AD biomarkers include positron emission tomography (PET) imaging of amyloid deposition and CSF measurements of Aβ and tau61. However, PET is too costly for screening and is typically used to confirm a diagnosis, while structural changes detectable by magnetic resonance imaging appear relatively late in disease progression. Our multi-marker panel offers potential for early AD detection and stage classification, which could improve diagnostic precision and enable timely intervention. Previous studies achieved ~90% accuracy by measuring plasma Aβ1–40/Aβ1–42 ratios with immunoprecipitation-MS, using PET as the reference standard62,63. Multi-protein panels can outperform single biomarkers by better characterizing disease pathology64,65,66,67, and MS enables unbiased biomarker discovery through simultaneous multiplexed measurements without requiring prior knowledge of the target proteins64,65,66,67,68,69,70. However, most multi-marker studies focus on protein expression rather than structural changes. Notably, the three proteins in our panel (C1QA, CLUS and ApoB) were identified on the basis of quantitative structural changes, not expression changes.
A limitation of this study is the potential co-depletion of proteins bound to the 14 abundant proteins removed by immunoaffinity depletion (human serum albumin, IgG, IgA, IgM, IgD, IgE, kappa and lambda light chains, alpha-1-acid glycoprotein, alpha-1-antitrypsin, alpha-2-macroglobulin, apolipoprotein A1, fibrinogen, haptoglobin and transferrin), which could result in the loss of clinically relevant biomarkers. Immunoaffinity depletion fundamentally alters plasma matrix composition, potentially affecting protein conformation and interactions. Disease-stage-specific variations in protein–protein interactions may cause differential depletion effects across the healthy, MCI and AD groups. Altered binding patterns in disease states could therefore create stage-specific co-depletion biases, so that depletion artifacts might be misinterpreted as disease-specific alterations. Because immunoaffinity depletion is irreversible, direct validation against nondepleted samples is not possible, limiting our ability to distinguish genuine disease-related changes from depletion-related effects. Although enrichment strategies could enable comprehensive protein detection, they would substantially increase processing time and cost, making them impractical for large-scale studies. Despite these limitations, depleting the 14 most abundant proteins was essential to detect low-abundance proteins and to prevent their signals from being overwhelmed by highly abundant proteins.
Our longitudinal analysis was constrained by a 255-day follow-up period and limited sample size because of sequential sample availability from participating institutions. While our findings demonstrate significant protein accessibility changes within this timeframe, longer follow-up with larger cohorts would strengthen clinical utility and provide comprehensive insights into disease progression trajectories.
We investigated AD-associated protein structural alterations in a substantial cohort of human blood samples, focusing on molecular pathology. Using deep learning, we developed a multi-marker panel based on structural changes in proteins associated with causative factors and observable symptoms to classify progressive AD stages. Our findings highlight protein structural modifications as informative biomarkers for diagnosing and tracking AD. This structure-based multi-marker panel holds promise for early AD detection and stage classification.
Methods
Criteria for diagnosis and recruitment of blood samples
Whole-blood samples collected from participants of the UCSD and University of Southern California Alzheimer’s Disease Research Centers were used for this study. Blood was processed to plasma, which was aliquoted and frozen within 2 h of draw. The presence of AD-related brain pathology was determined by CSF analyses of Aβ and tau. Participants were also seen biannually for cognitive visits. The KU ADRC collects longitudinal data on a clinical cohort that includes participants with cognitive impairment and with normal cognition. The assessment protocol includes the Uniform Data Set (UDS). Cognitively normal individuals aged 60 and older were included. Cohort entry required written consent from each participant and from a study partner. The consent forms and processes were approved by the University of Kansas Medical Center’s institutional review board, and research activities were conducted in accordance with the 1975 Declaration of Helsinki. Participants underwent a CDR interview and completed the UDS evaluation defined by the National Institute on Aging (NIA) ADRC network, as well as additional cognitive tests, including letter–number sequencing, the Free and Cued Selective Reminding Test and the Stroop test. APOE genotype was determined at cohort entry, and plasma samples were collected, prepared and frozen. At weekly consensus diagnostic conferences attended by clinicians, a neuropsychologist, psychometricians and other staff involved in the evaluation process, the KU ADRC reviewed initial and subsequent annual CDR and neuropsychological test scores. Participants with a global CDR score of 0 had no ascertainable objective evidence of cognitive deficits and were classified as cognitively normal.
Participants with a global CDR score of 0.5 were designated as having MCI if there were objective deficits on cognitive testing but the participant was fully independent in daily function, and as having very mild dementia if there were objective deficits on cognitive testing and the participant was not fully independent in daily function. Those with a global CDR score of 1 or more were designated as having dementia (mild, moderate or severe). For those with MCI or dementia, the underlying basis of the syndrome (AD or an alternative condition that can cause cognitive impairment) was determined by consensus. The level of certainty that individuals with MCI had underlying AD pathology was determined on the basis of the etiological diagnosis. Individuals determined to have AD met the McKhann et al. AD diagnostic criteria71. AD samples were selected on the basis of a primary etiological diagnosis of AD in the NACC variable NACCETPR72. Written informed consent was obtained from all participants at the time of sample collection.
Plasma proteome sample preparation with dimethyl labeling
The order of sample preparation was randomized to avoid bias. The 14 most abundant proteins in 2 μl of plasma were depleted using a depletion spin column (cat. no. A36369, Thermo Fisher Scientific) and the depletion solutions were washed. The proteins were concentrated using a 3,000 Da cutoff filter. Protein concentration was determined using a bicinchoninic acid assay (cat. no. 23225, Thermo Fisher Scientific). Lysine residues in 50 μg of protein (30 μl) were dimethyl-labeled with 10 μl of labeling solution (final concentrations: 30 mM sodium cyanoborodeuteride (NaBD3CN) and 1% formaldehyde (13CD2O)), and the reaction was quenched by adding 10 μl of 250 mM ammonium bicarbonate (ABC) to a final concentration of 50 mM ABC. To precipitate the proteins, 200 μl of methanol, 50 μl of chloroform and 150 μl of water were added and mixed vigorously. After centrifugation at 15,000 rpm and 4 °C for 30 min, the pellet was washed with 800 μl of methanol, mixed vigorously and centrifuged again. After the supernatant was discarded, the pellet was air-dried. Proteins were denatured in 80 μl of solution (2% sodium deoxycholate, 10 mM Tris(2-carboxyethyl)phosphine) at 60 °C for 60 min on a shaker. The reduced disulfide bonds were alkylated with 20 μl of iodoacetamide at 25 °C for 30 min. The denatured proteins were digested with chymotrypsin (enzyme to substrate = 1:100 (w:w)) at 37 °C for 16 h. Digestion was quenched by adding 30 μl of 15% formic acid to a final concentration of 1% formic acid. The sample was incubated at 37 °C for 30 min and the supernatant was collected after centrifugation. The collected sample was centrifuged again at 13,000 rpm to obtain a clear sample. Sample cleanup and a second dimethyl labeling of the newly exposed lysine residues were performed in a 96-well plate format using C18 pipette tips (cat. no. 87782, Thermo Fisher Scientific).
Each C18 pipette tip was washed by aspirating and dispensing methanol and acetonitrile, and was equilibrated by aspirating and dispensing 0.1% formic acid. The sample was loaded into the C18 tip by aspirating and dispensing, and the tip was then washed by aspirating and dispensing 0.1% formic acid. To adjust the pH of the sample in the C18 tip, 20 mM HEPES was aspirated and dispensed. The labeling solution (300 mM sodium cyanoborohydride (NaBH3CN) and 1% formaldehyde (12CD2O)) was aspirated and dispensed through the C18 tip, and the reaction was quenched by aspirating and dispensing 50 mM ABC. After quenching, the pH was adjusted by aspirating and dispensing 20 mM HEPES, and the C18 tip was washed by aspirating and dispensing 0.1% formic acid. Peptides retained in the C18 tip were eluted sequentially with 40% acetonitrile in 0.1% formic acid and 60% acetonitrile in 0.1% formic acid. The peptide solution was lyophilized.
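The final quench concentration stated for the first labeling step follows from a simple mass balance, C_final = C_stock × V_added / V_total, where V_total combines the protein solution (30 μl), the labeling solution (10 μl) and the quench (10 μl). The minimal sketch below shows that arithmetic; the helper name is hypothetical, and it assumes volumes are additive.

```python
def final_concentration(stock: float, added_volume: float, total_volume: float) -> float:
    """C_final = C_stock * V_added / V_total (assumes additive volumes;
    the output keeps the units of `stock`)."""
    if total_volume <= 0 or not 0 <= added_volume <= total_volume:
        raise ValueError("added volume must be between 0 and the total volume")
    return stock * added_volume / total_volume

# First-labeling quench: 10 μl of 250 mM ABC added to 30 μl protein + 10 μl labeling solution
abc_mm = final_concentration(stock=250.0, added_volume=10.0, total_volume=30.0 + 10.0 + 10.0)
```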
LC–MS/MS analysis
LC–MS/MS analysis was performed as described previously73. EvoTips were loaded with samples according to the supplier’s instructions. Chromatographic separation and mass spectrometric analysis were performed using an Evosep One system coupled to a timsTOF Pro instrument (Bruker). Peptides were separated on a 15 cm × 150 μm inner diameter column packed with BEH C18 particles (1.7 μm, Waters) with an integrated electrospray emitter (manufactured in-house), using the 30 SPD gradient program. The aqueous and organic mobile phases were 0.1% formic acid in water and in acetonitrile, respectively. Data were acquired in PASEF mode, with each 1.1-s duty cycle comprising one TIMS–MS survey scan followed by PASEF MS/MS scans. Ion accumulation and ramp times in the dual TIMS analyzer were both set to 100 ms, with an ion mobility range of 0.6–1.6 Vs cm−2 (1/K0). Precursors were isolated with windows of 2 Th (m/z < 700) or 3 Th (m/z > 700) across m/z 100–1,700. Collision energy decreased linearly with ion mobility from 59 eV (1/K0 = 1.6 Vs cm−2) to 20 eV (1/K0 = 0.6 Vs cm−2). A polygon filter removed singly charged species, and MS/MS precursors required a minimum intensity of 2,500 counts to ensure adequate spectral quality for confident identification. A target ion count of 20,000 provided sufficient signal for effective fragmentation, and a 24-s dynamic exclusion window maximized proteome coverage by preventing repeated sampling of identical precursors.
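Two of these acquisition settings can be made concrete. With 100-ms TIMS ramps, a 1.1-s duty cycle corresponds to 11 ramps, that is, one survey scan plus ten PASEF MS/MS ramps; this count is an inference from the stated timings rather than a reported value. The ion-mobility-dependent collision energy is a straight line through (0.6 Vs cm−2, 20 eV) and (1.6 Vs cm−2, 59 eV). The sketch below assumes linear interpolation clamped at the window edges; the instrument software's behavior outside the window is not described in the text, and the function names are hypothetical.

```python
CE_LOW, CE_HIGH = 20.0, 59.0   # collision energy (eV) at the window edges
IM_LOW, IM_HIGH = 0.6, 1.6     # ion mobility window, 1/K0 in Vs cm^-2

def collision_energy(inv_k0: float) -> float:
    """Linear collision-energy ramp over ion mobility, clamped to the window (assumption)."""
    x = min(max(inv_k0, IM_LOW), IM_HIGH)
    slope = (CE_HIGH - CE_LOW) / (IM_HIGH - IM_LOW)   # about 39 eV per Vs cm^-2
    return CE_LOW + slope * (x - IM_LOW)

def ramps_per_cycle(cycle_s: float = 1.1, ramp_s: float = 0.100) -> int:
    """Number of TIMS ramps that fit in one duty cycle (1 survey + N PASEF ramps)."""
    return round(cycle_s / ramp_s)
```

Lower-mobility ions (smaller 1/K0) are typically smaller and lower in m/z, so they receive less collision energy under this ramp.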
MS data analysis
Raw files were processed with MSFragger (v.17.1) using mass calibration and parameter optimization against the human UniProt/Swiss-Prot database (downloaded 28 January 2022), including canonical and isoform sequences74. Peptide-spectrum matches were filtered with Philosopher, and quantification was performed with IonQuant75,76. Chymotrypsin was specified as the digestion enzyme, with up to two missed cleavages allowed. Peptides shorter than six amino acids were excluded. Variable modifications were methionine oxidation (15.9949 Da) and light (32.0564 Da) and heavy (36.0757 Da) dimethyl labeling of lysine residues and peptide N termini; carbamidomethylation of cysteine (57.0214 Da) was set as a fixed modification. Precursor and fragment mass tolerances were both set to 50 ppm, which accounts for mass distribution variability from chymotryptic digestion and dimethyl labeling while staying within the instrument’s mass accuracy range. Isotope errors of 0, 1 or 2 were considered. Peptide-spectrum matches required at least two fragment ion peaks for modeling and four for reporting. The 150 most intense peaks were used, and at least 15 fragment peaks were required to search a spectrum. A decoy database search was included, and identifications were filtered to a 1% peptide false discovery rate, a standard threshold in proteomics. Because our analysis targeted structural changes, we identified and quantified modified peptides to track site-specific lysine dimethylation.
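The two dimethyl modification masses follow from the labeling chemistry described in the sample preparation section. Reductive dimethylation converts a primary amine R-NH2 into R-N(CX3)2, so the net mass added is two methyl groups minus the two displaced amine hydrogens. With 13CD2O and NaBD3CN each methyl group is 13CD3 (first labeling); with CD2O and NaBH3CN each methyl group is CD2H (second labeling). The sketch below reproduces both values from standard monoisotopic masses; the conventional CH2O/NaBH3CN label is included only for comparison, and the helper name is hypothetical.

```python
# Monoisotopic masses (Da)
H = 1.00782503207
D = 2.01410177812
C12 = 12.0
C13 = 13.00335483507

def dimethyl_shift(carbon: float, n_d_per_methyl: int) -> float:
    """Net mass added when R-NH2 becomes R-N(CX3)2.

    Each methyl group carries `n_d_per_methyl` deuterium atoms (the rest are
    hydrogen); the two amine hydrogens that are replaced are subtracted.
    """
    methyl = carbon + n_d_per_methyl * D + (3 - n_d_per_methyl) * H
    return 2 * methyl - 2 * H

heavy_first = dimethyl_shift(C13, 3)    # 13CD3 groups: first, native-state labeling
light_second = dimethyl_shift(C12, 2)   # CD2H groups: second labeling after digestion
conventional = dimethyl_shift(C12, 0)   # CH3 groups, for reference only
```

The 4.0193 Da difference between the two labels is what lets the search engine distinguish, on the same lysine, the form labeled in the native state from the form labeled only after denaturation and digestion.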
Determination of accessibility of dimethyl labeling on lysine sites
Lysine residues are dimethylated with either a heavy or a light isotopic label depending on their solvent exposure: lysines exposed in the native protein react in the first labeling, whereas lysines buried in the native structure are labeled only in the second reaction after denaturation and digestion. By comparing the signal intensities of the two labeled forms of each peptide, we calculated the R ratio77. This ratio reflects the proportion of each lysine site that was available for the initial modification and is independent of the total protein concentration in the sample. R values were converted to accessibility percentages using the formula accessibility (%) = R/(1 + R) × 100, which indicates the degree of lysine exposure to the labeling reagent.
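The conversion from labeled-peptide intensities to accessibility can be sketched as follows. Consistent with the labeling order in the sample preparation section, the sketch assumes that R is the intensity of the heavy (first-labeling, 36.0757 Da) form divided by that of the light (second-labeling, 32.0564 Da) form of the same lysine site, so that accessibility equals the heavy fraction of the total signal. The function names are hypothetical.

```python
def r_ratio(heavy_intensity: float, light_intensity: float) -> float:
    """R = first-label (heavy) / second-label (light) intensity for one lysine site
    (assumed orientation of the ratio)."""
    if light_intensity <= 0:
        raise ValueError("light-labeled intensity must be positive")
    return heavy_intensity / light_intensity

def accessibility_percent(r: float) -> float:
    """Accessibility (%) = R / (1 + R) * 100, i.e. the heavy fraction of the signal."""
    if r < 0:
        raise ValueError("R cannot be negative")
    return r / (1.0 + r) * 100.0

# A site three times more abundant in its first-labeled form is 75% accessible
site_acc = accessibility_percent(r_ratio(3.0e6, 1.0e6))
```

Because both forms come from the same site in the same sample, scaling both intensities by a common factor (for example, a higher protein concentration) leaves the accessibility unchanged.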
Functional annotation and enrichment analysis
Functional enrichment was performed with ClueGO (a plug-in to Cytoscape) to identify the significant biological functions of the proteins. The analysis for protein–protein interactions was performed using the STRING database and the resulting networks were visualized on Cytoscape.
Machine learning methods
A comprehensive machine learning framework was developed for class prediction using MS data, with a feed-forward deep neural network as the primary model alongside 17 comparative algorithms. The dataset of 313 samples was partitioned using a stratified approach, with 90% (approximately 282 samples) forming the model development set and 10% (31 samples) reserved as an independent test set for final validation. Tenfold cross-validation was implemented to improve model generalization and to provide robust performance estimates by evaluating the model on multiple non-overlapping data subsets. Each fold preserved the proportions of control and case groups in the original dataset, and this stratified approach was applied consistently across all machine learning algorithms to ensure fair comparison. The deep neural network architecture was optimized by grid search, resulting in two hidden layers of 20 nodes each with rectifier (ReLU) activation functions. Training ran for 400 epochs with L2 regularization78 (penalty = 1.0 × 10−5) and dropout (ratio = 0.5 for hidden layers) to mitigate overfitting. The optimization covered ten hyperparameters spanning network architecture, activation functions, regularization coefficients and training dynamics. Class imbalance was addressed by stratified sampling throughout the analytical pipeline and by evaluating a range of metrics: class-specific accuracy, precision, recall and F1 scores were calculated to ensure balanced performance across minority and majority classes. The AUROC was used as the primary metric for model comparison because of its robustness to class distribution. Feature selection was performed using a stepwise approach that systematically evaluated each variable’s contribution to model performance.
This approach identified the most informative features while eliminating redundant or noisy variables. The resulting feature set was validated across all 18 algorithms to confirm generalizability. Model robustness was assessed using 100 bootstrap iterations with a 90:10 train–test split, providing CIs for all performance metrics. All analyses were performed using the H2O package (v.3.10.3.6) in R (v.3.3.3), chosen for its efficient handling of high-dimensional data and systematic hyperparameter optimization. The deep learning method was compared with 17 alternative machine learning techniques, including random forest, gradient-boosted trees, decision tree, AutoMLP, k-NN, naive Bayes, rule induction, support vector machine, linear discriminant analysis, neural net, generalized linear model, decision stump, random tree, One Rule and discriminant analysis methods79. The training and test data were processed in the same way as for the deep-learning approach, and hyperparameters for five of these methods were optimized by grid search in R80.
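Stratified tenfold cross-validation assigns samples to folds so that each fold keeps roughly the group proportions of the full set, and each fold then serves once as the held-out set while the other nine are used for training. The sketch below shows only this assignment step in pure Python. The analyses themselves were run with H2O; the shuffle-then-deal-round-robin scheme, the seed and the function names are illustrative assumptions, not H2O's exact procedure.

```python
import random
from collections import defaultdict
from typing import Dict, Iterator, List, Sequence, Tuple

def stratified_folds(labels: Sequence[str], k: int = 10, seed: int = 0) -> List[List[int]]:
    """Split sample indices into k folds that preserve group proportions."""
    if k < 2:
        raise ValueError("k must be at least 2")
    by_class: Dict[str, List[int]] = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    rng = random.Random(seed)
    folds: List[List[int]] = [[] for _ in range(k)]
    offset = 0
    for label in sorted(by_class):               # fixed class order for reproducibility
        members = by_class[label][:]
        rng.shuffle(members)
        # deal members round-robin, continuing from where the previous class stopped
        for pos, idx in enumerate(members):
            folds[(offset + pos) % k].append(idx)
        offset += len(members)
    return folds

def cv_splits(labels: Sequence[str], k: int = 10,
              seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (training indices, held-out indices) for each of the k folds."""
    folds = stratified_folds(labels, k, seed)
    for i, held_out in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, held_out
```

For example, 50 healthy, 30 MCI and 20 AD labels split into ten folds give exactly five, three and two members of each group per fold.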
Complex modeling with AlphaFold2-multimer
We used AlphaFold2-multimer to predict the protein–protein interaction motif of each complex. AlphaFold2-multimer modeling was performed with ColabFold81. Input multiple sequence alignment (MSA) features were generated with local ColabFold using the ‘MMseqs2 (Uniref + Environmental)’ MSA mode; by default, the constructed MSAs contain both unpaired (per-chain) and paired sequences. AlphaFold2-multimer was run with one or more of the following options: model type = alphafold2_multimer v3, num recycles = 3, recycle early stop tolerance = 0.5, max msa = auto, num seeds = 1. The models were ranked by confidence score, and the top-ranked model (rank 1) was selected for analysis. Distances between lysine residues were calculated using PyMOL2 v.2.5 (Schrödinger LLC).
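The lysine–lysine distance measurement can also be scripted. The sketch below reads ATOM records from a model in PDB format using the fixed-column layout of the wwPDB format specification, locates the side-chain NZ atoms of two lysines and returns their Euclidean distance in ångströms. Measuring between NZ atoms is an assumption, since the text does not state which lysine atoms were used in PyMOL; the chain and residue identifiers in the demo, and all function names, are hypothetical.

```python
import math
from typing import Dict, Iterable, Tuple

Coord = Tuple[float, float, float]
ResidueKey = Tuple[str, int]  # (chain ID, residue number)

def lysine_nz_coords(pdb_lines: Iterable[str]) -> Dict[ResidueKey, Coord]:
    """Map (chain, residue number) -> NZ coordinates for every lysine in ATOM records."""
    coords: Dict[ResidueKey, Coord] = {}
    for line in pdb_lines:
        if not line.startswith("ATOM"):
            continue
        atom_name = line[12:16].strip()   # columns 13-16
        res_name = line[17:20].strip()    # columns 18-20
        if atom_name != "NZ" or res_name != "LYS":
            continue
        chain = line[21]                  # column 22
        res_seq = int(line[22:26])        # columns 23-26
        xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
        coords[(chain, res_seq)] = xyz
    return coords

def lysine_distance(pdb_lines: Iterable[str], a: ResidueKey, b: ResidueKey) -> float:
    """Euclidean NZ-NZ distance (in ångströms) between two lysines, e.g. on different chains."""
    coords = lysine_nz_coords(pdb_lines)
    return math.dist(coords[a], coords[b])

def format_atom(serial: int, name: str, res_name: str, chain: str,
                res_seq: int, xyz: Coord) -> str:
    """Build a minimal fixed-column ATOM record (used here only for the demo)."""
    x, y, z = xyz
    return (f"ATOM  {serial:5d} {name:<4} {res_name:>3} {chain}{res_seq:4d}    "
            f"{x:8.3f}{y:8.3f}{z:8.3f}")

# Hypothetical two-chain model: lysine 10 on chain A and lysine 25 on chain B
demo = [
    format_atom(1, " NZ", "LYS", "A", 10, (0.0, 0.0, 0.0)),
    format_atom(2, " NZ", "LYS", "B", 25, (3.0, 4.0, 0.0)),
    format_atom(3, " CA", "LYS", "B", 25, (9.0, 9.0, 9.0)),
]
d = lysine_distance(demo, ("A", 10), ("B", 25))
```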
Statistics and reproducibility
No statistical methods were used to predetermine sample sizes. Sample sizes were chosen on the basis of established practices in discovery proteomics and to ensure sufficient representation and variability across study groups. Samples were processed using block randomization: random numbers were generated in Microsoft Excel and samples were ordered accordingly, and within each block of 24 samples the three groups were represented in equal proportions to ensure balanced group representation. Samples or data points with excessive missing values were excluded from the analyses. Global accessibility was compared between groups using an unpaired t-test. Relationships between fold changes were assessed using simple linear regression. Associations between accessibility and APOE genotype were examined using multiple linear regression, with each genotype treated as a categorical variable and dummy-coded as a binary (0 or 1) variable; the β coefficient was used as a measure of the effect of each genotype on accessibility. Analysis of covariance (ANCOVA) was used to assess whether sex influences the relationship between NPSs and accessibility. Relationships between NPSs and accessibility were assessed using simple linear regression, with the β coefficient as the effect size. For data analyzed using t-tests or ANCOVA, normality was assumed as required for these parametric tests but was not formally verified. Confidence scores of longitudinal samples were compared using a Mann–Whitney U-test because of the small number of samples. Diagnostic accuracy was assessed with ROC curve analysis, and a DeLong test was used to compare two ROC curves.
The association between the multi-marker model and each conventional tool (MMSE or CDRSUM) was tested with a nonlinear polynomial spline model, using the confidence score from the multi-marker model in this regression analysis. All analyses were performed using RStudio and R v.4.2.1 (packages pROC and splines). Data were visualized using either Prism 8 (GraphPad Software) or RStudio. An unadjusted two-sided P < 0.05 was considered statistically significant.
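The block-randomized processing order can be sketched as follows: samples from each group are shuffled separately, and each block of 24 processing slots is filled with eight samples per group in a random order. The sketch uses Python's seeded pseudo-random generator in place of the Excel random numbers described above. How samples left over from unequal group sizes were handled is not stated, so placing them in final mixed blocks is an assumption, as are the function name and sample identifiers.

```python
import random
from typing import Dict, List, Sequence

def block_randomize(groups: Dict[str, Sequence[str]], block_size: int = 24,
                    seed: int = 2024) -> List[List[str]]:
    """Return processing blocks holding block_size / n_groups samples per group."""
    n_groups = len(groups)
    if block_size % n_groups:
        raise ValueError("block size must be divisible by the number of groups")
    per_group = block_size // n_groups               # 8 per group for 24 slots and 3 groups
    rng = random.Random(seed)
    pools = {g: rng.sample(list(ids), len(ids)) for g, ids in groups.items()}
    blocks: List[List[str]] = []
    while all(len(pool) >= per_group for pool in pools.values()):
        block: List[str] = []
        for pool in pools.values():
            block.extend(pool[:per_group])
            del pool[:per_group]
        rng.shuffle(block)                           # random run order within the block
        blocks.append(block)
    # Assumption: samples left over from unequal group sizes form final mixed blocks
    leftovers = [s for pool in pools.values() for s in pool]
    rng.shuffle(leftovers)
    blocks.extend(leftovers[i:i + block_size] for i in range(0, len(leftovers), block_size))
    return blocks

# Hypothetical identifiers: 16 healthy, 16 MCI and 20 AD samples
example_groups = {
    "healthy": [f"H{i}" for i in range(16)],
    "MCI": [f"M{i}" for i in range(16)],
    "AD": [f"A{i}" for i in range(20)],
}
example_blocks = block_randomize(example_groups)
```

Balancing groups within each block keeps batch effects (for example, day-to-day instrument drift) from aligning with diagnostic group.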
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
Data availability
All data supporting the findings of this study, including processed data, are available from the corresponding author upon reasonable request. The mass spectrometry raw data have been deposited to the MassIVE repository at https://massive.ucsd.edu/ProteoSAFe/dataset.jsp?task=4d36129273c0401eb470f840d123c808.
References
Höhn, A., Tramutola, A. & Cascella, R. Proteostasis failure in neurodegenerative diseases: focus on oxidative stress. Oxid. Med. Cell. Longev. 2020, 5497046 (2020).
Sontag, E. M., Samant, R. S. & Frydman, J. Mechanisms and functions of spatial protein quality control. Annu. Rev. Biochem. 86, 97–122 (2017).
Mymrikov, E. V., Daake, M., Richter, B., Haslbeck, M. & Buchner, J. The chaperone activity and substrate spectrum of human small heat shock proteins. J. Biol. Chem. 292, 672–684 (2017).
Radford, S. E. & Dobson, C. M. From computer simulations to human disease: emerging themes in protein folding. Cell 97, 291–298 (1999).
Chiti, F. & Dobson, C. M. Protein misfolding, amyloid formation, and human disease: a summary of progress over the last decade. Annu. Rev. Biochem. 86, 27–68 (2017).
Hampel, H. et al. The amyloid-beta pathway in Alzheimer’s disease. Mol. Psychiatry 26, 5481–5503 (2021).
Rayaprolu, S. et al. Systems-based proteomics to resolve the biology of Alzheimer’s disease beyond amyloid and tau. Neuropsychopharmacology 46, 98–115 (2021).
Bamberger, C. et al. Protein footprinting via covalent protein painting reveals structural changes of the proteome in Alzheimer’s disease. J. Proteome Res. 20, 2762–2771 (2021).
Del Campo, M. et al. CSF proteome profiling across the Alzheimer’s disease spectrum reflects the multifactorial nature of the disease and identifies specific biomarker panels. Nat. Aging 2, 1040–1053 (2022).
Corder, E. H. et al. Protective effect of apolipoprotein E type 2 allele for late onset Alzheimer disease. Nat. Genet. 7, 180–184 (1994).
Huang, Y. et al. Relationships of APOE genotypes with small RNA and protein cargo of brain tissue extracellular vesicles from patients with late-stage AD. Neurol. Genet. 8, e200026 (2022).
Fernández-Calle, R. et al. APOE in the bullseye of neurodegenerative diseases: impact of the APOE genotype in Alzheimer’s disease pathology and brain diseases. Mol. Neurodegener. 17, 62 (2022).
Farrer, L. A. et al. Effects of age, sex, and ethnicity on the association between apolipoprotein E genotype and Alzheimer disease. A meta-analysis. APOE and Alzheimer Disease Meta Analysis Consortium. JAMA 278, 1349–1356 (1997).
Dai, J. et al. Effects of APOE genotype on brain proteomic network and cell type changes in Alzheimer’s disease. Front. Mol. Neurosci. 11, 454 (2018).
Filon, J. R. et al. Gender differences in Alzheimer disease: brain atrophy, histopathology burden, and cognition. J. Neuropathol. Exp. Neurol. 75, 748–754 (2016).
Irvine, K., Laws, K. R., Gale, T. M. & Kondel, T. K. Greater cognitive deterioration in women than men with Alzheimer’s disease: a meta analysis. J. Clin. Exp. Neuropsychol. 34, 989–998 (2012).
Terry, R. D. & Davies, P. Dementia of the Alzheimer type. Annu. Rev. Neurosci. 3, 77–95 (1980).
Eikelboom, W. S. et al. Sex differences in neuropsychiatric symptoms in Alzheimer’s disease dementia: a meta-analysis. Alzheimers Res. Ther. 14, 48 (2022).
Ott, B. R., Tate, C. A., Gordon, N. M. & Heindel, W. C. Gender differences in the behavioral manifestations of Alzheimer’s disease. J. Am. Geriatr. Soc. 44, 583–587 (1996).
Zuidema, S. U., de Jonghe, J. F., Verhey, F. R. & Koopmans, R. T. Predictors of neuropsychiatric symptoms in nursing home patients: influence of gender and dementia severity. Int. J. Geriatr. Psychiatry 24, 1079–1086 (2009).
Fusar-Poli, P. et al. Transdiagnostic psychiatry: a systematic review. World Psychiatry 18, 192–207 (2019).
Showraki, A. et al. Cerebrospinal fluid correlates of neuropsychiatric symptoms in patients with Alzheimer’s disease/mild cognitive impairment: a systematic review. J. Alzheimers Dis. 71, 477–501 (2019).
Nebel, R. A. et al. Understanding the impact of sex and gender in Alzheimer’s disease: a call to action. Alzheimers Dement. 14, 1171–1183 (2018).
Clayton, J. A. & Tannenbaum, C. Reporting sex, gender, or both in clinical research? JAMA 316, 1863–1864 (2016).
Labbadia, J. & Morimoto, R. I. The biology of proteostasis in aging and disease. Annu. Rev. Biochem. 84, 435–464 (2015).
Douglas, P. M. & Dillin, A. Protein homeostasis and aging in neurodegeneration. J. Cell Biol. 190, 719–729 (2010).
Martins, I. J. et al. Apolipoprotein E, cholesterol metabolism, diabetes, and the convergence of risk factors for Alzheimer’s disease and cardiovascular disease. Mol. Psychiatry 11, 721–736 (2006).
Jeong, W., Lee, H., Cho, S. & Seo, J. ApoE4-induced cholesterol dysregulation and its brain cell type-specific implications in the pathogenesis of Alzheimer’s disease. Mol. Cells 42, 739–746 (2019).
Zannis, V. I., Just, P. W. & Breslow, J. L. Human apolipoprotein E isoprotein subclasses are genetically determined. Am. J. Hum. Genet. 33, 11–24 (1981).
Havel, R. J. & Kane, J. P. Primary dysbetalipoproteinemia: predominance of a specific apoprotein species in triglyceride-rich lipoproteins. Proc. Natl Acad. Sci. USA 70, 2015–2019 (1973).
Rall, S. C. Jr., Weisgraber, K. H. & Mahley, R. W. Human apolipoprotein E. The complete amino acid sequence. J. Biol. Chem. 257, 4171–4178 (1982).
Mahley, R. W., Weisgraber, K. H. & Huang, Y. Apolipoprotein E: structure determines function, from atherosclerosis to Alzheimer’s disease to AIDS. J. Lipid Res. 50, S183–S188 (2009).
Cigliano, L., Pugliese, C. R., Spagnuolo, M. S., Palumbo, R. & Abrescia, P. Haptoglobin binds the antiatherogenic protein apolipoprotein E—impairment of apolipoprotein E stimulation of both lecithin: cholesterol acyltransferase activity and cholesterol uptake by hepatocytes. FEBS J. 276, 6158–6171 (2009).
Spagnuolo, M. S. et al. Haptoglobin interacts with apolipoprotein E and beta-amyloid and influences their crosstalk. ACS Chem. Neurosci. 5, 837–847 (2014).
Salvatore, A., Cigliano, L., Carlucci, A., Bucci, E. M. & Abrescia, P. Haptoglobin binds apolipoprotein E and influences cholesterol esterification in the cerebrospinal fluid. J. Neurochem. 110, 255–263 (2009).
Bai, H. et al. A haptoglobin (HP) structural variant alters the effect of APOE alleles on Alzheimer’s disease. Alzheimers Dement. 19, 4886–4895 (2023).
Spagnuolo, M. S. et al. Haptoglobin increases with age in rat hippocampus and modulates Apolipoprotein E mediated cholesterol trafficking in neuroblastoma cell lines. Front. Cell. Neurosci. 8, 212 (2014).
Meier, S. et al. Identification of novel tau interactions with endoplasmic reticulum proteins in Alzheimer’s disease brain. J. Alzheimers Dis. 48, 687–702 (2015).
Marcon, E. et al. Human-chromatin-related protein interactions identify a demethylase complex required for chromosome segregation. Cell Rep. 8, 297–310 (2014).
Huttlin, E. L. et al. Architecture of the human interactome defines protein communities and disease networks. Nature 545, 505–509 (2017).
Kim, K. et al. C1QBP is upregulated in colon cancer and binds to apolipoprotein A-I. Exp. Ther. Med. 13, 2493–2500 (2017).
Cohen, D. et al. Sex differences in the psychiatric manifestations of Alzheimer’s disease. J. Am. Geriatr. Soc. 41, 229–232 (1993).
Tao, Y. et al. Sex differences in the neuropsychiatric symptoms of patients with Alzheimer’s disease. Am. J. Alzheimers Dis. Other Demen. 33, 450–457 (2018).
Threlkeld, S. W. et al. Effects of inter-alpha inhibitor proteins on neonatal brain injury: age, task and treatment dependent neurobehavioral outcomes. Exp. Neurol. 261, 424–433 (2014).
Gaudet, C. M., Lim, Y. P., Stonestreet, B. S. & Threlkeld, S. W. Effects of age, experience and inter-alpha inhibitor proteins on working memory and neuronal plasticity after neonatal hypoxia-ischemia. Behav. Brain Res. 302, 88–99 (2016).
Goulding, D. R. et al. Inter-alpha-inhibitor deficiency in the mouse is associated with alterations in anxiety-like behavior, exploration and social approach. Genes Brain Behav. 18, e12505 (2019).
Foster, E. M., Dangla-Valls, A., Lovestone, S., Ribe, E. M. & Buckley, N. J. Clusterin in Alzheimer’s disease: mechanisms, genetics, and lessons from other pathologies. Front. Neurosci. 13, 164 (2019).
Beretta, L. & Santaniello, A. Nearest neighbor imputation algorithms: a critical evaluation. BMC Med. Inform. Decis. Mak. 16, 74 (2016).
Jovel, J. & Greiner, R. An introduction to machine learning approaches for biomedical research. Front. Med. 8, 771607 (2021).
Nestor, S. M. et al. Ventricular enlargement as a possible measure of Alzheimer’s disease progression validated using the Alzheimer’s disease neuroimaging initiative database. Brain 131, 2443–2454 (2008).
May, P. C. et al. Dynamics of gene expression for a hippocampal glycoprotein elevated in Alzheimer’s disease and in response to experimental lesions in rat. Neuron 5, 831–839 (1990).
Dejanovic, B. et al. Complement C1q-dependent excitatory and inhibitory synapse elimination by astrocytes and microglia in Alzheimer’s disease mouse models. Nat. Aging 2, 837–850 (2022).
Picard, C. et al. Apolipoprotein B is a novel marker for early tau pathology in Alzheimer’s disease. Alzheimers Dement. 18, 875–887 (2022).
Benoit, M. E. et al. C1q-induced LRP1B and GPR6 proteins expressed early in Alzheimer disease mouse models, are essential for the C1q-mediated protection against amyloid-beta neurotoxicity. J. Biol. Chem. 288, 654–665 (2013).
Li, M. et al. Associations of cerebrospinal fluid complement proteins with Alzheimer’s pathology, cognition, and brain structure in non-dementia elderly. Alzheimers Res. Ther. 16, 12 (2024).
Kishore, U. & Reid, K. B. C1q: structure, function, and receptors. Immunopharmacology 49, 159–170 (2000).
Zhu, Z., Zhong, X., Wang, B., Lu, H. & Li, L. Probing protein structural changes in Alzheimer’s disease via quantitative cross-linking mass spectrometry. Anal. Chem. 96, 7506–7515 (2024).
DeMattos, R. B. et al. ApoE and clusterin cooperatively suppress Abeta levels and deposition: evidence that ApoE regulates extracellular Abeta metabolism in vivo. Neuron 41, 193–202 (2004).
Nilselid, A.-M. et al. Clusterin in cerebrospinal fluid: analysis of carbohydrates and quantification of native and glycosylated forms. Neurochem. Int. 48, 718–728 (2006).
Mukaetova-Ladinska, E. B. et al. Plasma and platelet clusterin ratio is altered in Alzheimer’s disease patients with distinct neuropsychiatric symptoms: findings from a pilot study. Int. J. Geriatr. Psychiatry 30, 368–375 (2015).
Cohen, A. D. et al. Fluid and PET biomarkers for amyloid pathology in Alzheimer’s disease. Mol. Cell. Neurosci. 97, 3–17 (2019).
Nakamura, A. et al. High performance plasma amyloid-beta biomarkers for Alzheimer’s disease. Nature 554, 249–254 (2018).
Ovod, V. et al. Amyloid β concentrations and stable isotope labeling kinetics of human plasma specific to central nervous system amyloidosis. Alzheimers Dement. 13, 841–849 (2017).
Hu, T. W. et al. Plasma multianalyte profiling in mild cognitive impairment and Alzheimer disease. Neurology 79, 897–905 (2012).
Ray, S. et al. Classification and prediction of clinical Alzheimer’s diagnosis based on plasma signaling proteins. Nat. Med. 13, 1359–1362 (2007).
Marksteiner, J. et al. Five out of 16 plasma signaling proteins are enhanced in plasma of patients with mild cognitive impairment and Alzheimer’s disease. Neurobiol. Aging 32, 539–540 (2011).
Baird, A. L., Westwood, S. & Lovestone, S. Blood-based proteomic biomarkers of Alzheimer’s disease pathology. Front. Neurol. 6, 236 (2015).
Kononikhin, A. S. et al. Prognosis of Alzheimer’s disease using quantitative mass spectrometry of human blood plasma proteins and machine learning. Int. J. Mol. Sci. 23, 7907 (2022).
Richens, J. L. et al. Practical detection of a definitive biomarker panel for Alzheimer’s disease; comparisons between matched plasma and cerebrospinal fluid. Int. J. Mol. Epidemiol. Genet. 5, 53–70 (2014).
Inoue, M. Identification of plasma proteins as biomarkers for mild cognitive impairment and Alzheimer’s disease using liquid chromatography-tandem mass spectrometry. Int. J. Mol. Sci. 24, 13064 (2023).
McKhann, G. M. et al. The diagnosis of dementia due to Alzheimer’s disease: recommendations from the National Institute on Aging-Alzheimer’s Association workgroups on diagnostic guidelines for Alzheimer’s disease. Alzheimers Dement. 7, 263–269 (2011).
Wang, M. et al. A pragmatic dementia risk score for patients with mild cognitive impairment in a memory clinic population: development and validation of a dementia risk score using routinely collected data. Alzheimers Dement. 8, e12301 (2022).
Son, A. et al. Using in vivo intact structure for system-wide quantitative analysis of changes in proteins. Nat. Commun. 15, 9310 (2024).
Kong, A. T., Leprevost, F. V., Avtonomov, D. M., Mellacheruvu, D. & Nesvizhskii, A. I. MSFragger: ultrafast and comprehensive peptide identification in mass spectrometry-based proteomics. Nat. Methods 14, 513–520 (2017).
da Veiga Leprevost, F. et al. Philosopher: a versatile toolkit for shotgun proteomics data analysis. Nat. Methods 17, 869–870 (2020).
Yu, F. et al. Fast quantitative analysis of timsTOF PASEF data with MSFragger and IonQuant. Mol. Cell. Proteomics 19, 1575–1585 (2020).
Bamberger, C., Pankow, S., Park, S. K. & Yates, J. R. 3rd Interference-free proteome quantification with MS/MS-based isobaric isotopologue detection. J. Proteome Res. 13, 1494–1501 (2014).
Cook, D. Practical Machine Learning with H2O: Powerful, Scalable Techniques for Deep Learning and AI 1st edn (O’Reilly Media, 2017).
Zhang, Y. et al. Empirical study of seven data mining algorithms on different characteristics of datasets for biomedical classification applications. Biomed. Eng. Online 16, 125 (2017).
Kuhn, M. Building predictive models in R using the caret package. J. Stat. Softw. 28, 1–26 (2008).
Mirdita, M. et al. ColabFold: making protein folding accessible to all. Nat. Methods 19, 679–682 (2022).
Acknowledgements
We thank C. Delahunty for critically reading the manuscript. This work was supported by National Institutes of Health/NIA grant nos. RF1AG061846-01, 5R01AG075862, P30AG072973 and P30-AG066530. We thank members of the Rissman laboratory for technical support for this project.
Author information
Contributions
A.S., H.K. and J.R.Y. conceived the project. A.S. and H.K. performed the experiments. J.K.D. measured the samples on the mass spectrometer. A.S. and H.K. analyzed the data and the results. C.B. developed the original CPP protocol and provided scientific feedback. R.A.R. and R.H.S. provided the clinical samples and edited the manuscript. J.R.Y. supervised the project. A.S. wrote the manuscript and prepared the figures with help from all authors.
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Aging thanks Maaike Beuvink, Deeptarup Biswas, Simon Ekström, Lucilla Parnetti and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data
Extended Data Fig. 1 Levels of biomarkers in CSF.
Statistical significance was determined by one-way ANOVA with Tukey’s post hoc test; adjusted P values are shown. Aβ40: *P = 0.0377 (NOR vs AD); Aβ42: ****P < 0.0001 (NOR vs AD), ***P = 0.0007 (MCI vs AD); Aβ42/Aβ40 ratio: ****P < 0.0001 (NOR vs AD), ***P = 0.0005 (MCI vs AD); T-tau: ****P < 0.0001 (NOR vs AD and MCI vs AD); P-tau: ****P < 0.0001 (NOR vs AD and MCI vs AD). P values below 0.0001 are reported as P < 0.0001.
Extended Data Fig. 2 Complexes of SERPINA3, HP, and ApoE isoforms.
Complexes were predicted using AlphaFold-Multimer with the sequences of the ApoE isoforms ApoE ε2 (A), ApoE ε3 (C) and ApoE ε4 (E). Based on the ApoE ε2 complex, the HP residue closest to the lysine in the SERPINA3 sequence DVFEEGTEASAATAVKITLL was identified and fixed; the distance between these two residues was 16.3 Å. For ApoE ε3 and ε4, the distances between this HP residue and the SERPINA3 lysine were 15.9 Å and 10.0 Å, respectively. Zoomed views are presented for ApoE ε2 (B), ApoE ε3 (D) and ApoE ε4 (F). Predicted SERPINA3 is displayed in red, HP in cyan and ApoE in purple; green indicates the SERPINA3 sequence DVFEEGTEASAATAVKITLL. The experimentally determined structure of SERPINA3 is shown in grey. Contact density (G), interface area (H) and binding affinity (I) between the proteins were calculated.
Extended Data Fig. 3 Relationship between NPSs and protein structural changes.
Twelve NPSs were considered: agitation, anxiety, apathy, appetite, delusions, depression, disinhibition, elation, hallucinations, irritability, motor disturbance and nighttime behaviors. In each patient’s medical record, each symptom was scored from 0 (none) to 3 (severe), and the scores for the 12 symptoms were summed for each individual. Data are shown for 130 male (A) and 135 female (B) participants; the proportion is the number of individuals with a given summed score divided by the total number of individuals in the same disease group. ROC curves are presented for NOR vs. MCI (left) and MCI vs. AD (right) for the DMENFRTEVNVLPGAKVQF peptide of ITIH2 (C). Discriminatory power differed between males and females at the early stage (DeLong’s test, P = 0.0115). Numbers in square brackets are 95% confidence intervals (CIs), and shaded areas indicate 95% CIs. Blue indicates males and red indicates females. ROC curves are presented for NOR vs. MCI (left) and MCI vs. AD (right) for the SVDCSTNNPSQAKL peptide of CLUS (D); AUROCs for this peptide exceeded 0.7 for all classifications in both males and females. Blue indicates males and red indicates females.
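DeLong’s test rests on placement values: each case’s score is compared with every control’s score using the Mann–Whitney kernel, so the AUC equals the Mann–Whitney U statistic divided by the number of case–control pairs, and the variances of the placement values give the variance of the AUC. A minimal Python sketch, assuming NumPy and two independent groups of participants (for example, males versus females), is shown below. For independent groups the two AUC variances simply add; ROC curves estimated on the same participants, such as two markers on one test set, would also need the covariance term of the paired DeLong test, which this sketch omits. The data and function names are hypothetical; the study’s analyses used the R package pROC.

```python
import math

import numpy as np


def delong_auc_variance(case_scores, control_scores):
    """Return the AUC and its DeLong variance estimate."""
    x = np.asarray(case_scores, dtype=float)[:, None]     # shape (m, 1)
    y = np.asarray(control_scores, dtype=float)[None, :]  # shape (1, n)
    # Mann-Whitney kernel: 1 if case > control, 0.5 for ties, 0 otherwise.
    psi = (x > y) + 0.5 * (x == y)                         # shape (m, n)
    v10 = psi.mean(axis=1)  # placement value of each case
    v01 = psi.mean(axis=0)  # placement value of each control
    auc = v10.mean()
    variance = v10.var(ddof=1) / v10.size + v01.var(ddof=1) / v01.size
    return auc, variance


def compare_independent_aucs(cases_1, controls_1, cases_2, controls_2):
    """Two-sided z-test for AUCs from two independent groups of subjects."""
    auc_1, var_1 = delong_auc_variance(cases_1, controls_1)
    auc_2, var_2 = delong_auc_variance(cases_2, controls_2)
    z = (auc_1 - auc_2) / math.sqrt(var_1 + var_2)
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal P value
    return auc_1, auc_2, z, p


# Hypothetical peptide scores: (MCI cases, NOR controls) for each sex.
rng = np.random.default_rng(1)
male = (rng.normal(1.0, 1.0, 40), rng.normal(0.0, 1.0, 60))
female = (rng.normal(0.3, 1.0, 45), rng.normal(0.0, 1.0, 55))
auc_m, auc_f, z, p = compare_independent_aucs(*male, *female)
print(f"AUC male {auc_m:.3f}, female {auc_f:.3f}, z = {z:.2f}, P = {p:.4f}")
```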
Extended Data Fig. 4 Classification performance of the multi-protein panel based on intensity data.
The panel was developed from three proteins (C1QA, CLU and ApoB) using a stepwise approach (A, B). All intensity data were used to generate the multi-protein panel (C, D).
Extended Data Fig. 5 Evaluation of cohort-specific bias.
Cohort-specific bias was assessed by comparing the discriminatory power of the panel in the UCSD (33 NOR, 17 MCI, 17 AD) and KUMC (34 NOR, 15 MCI, 31 AD) cohorts. The 3-marker panel classified NOR, MCI and AD with accuracies of 89.55% (A) and 82.22% (B) in the UCSD and KUMC cohorts, respectively. ROC curves were used to assess classification of AD against MCI and against NOR (C). Discriminatory power did not differ significantly between UCSD and KUMC for either the NOR vs. AD (P = 0.31) or the MCI vs. AD (P = 0.24) comparison. Discriminatory-power metrics for each comparison are listed in the table.
Extended Data Fig. 6 ROC comparison on the independent test set.
Multi-marker panel (blue) vs amyloid-β 42/40 ratio (red). Shaded areas indicate 95% CIs; the gray diagonal indicates chance performance. AUCs: panel, 0.934; Aβ42/40, 0.742.
Extended Data Fig. 7 Assessing the confounding effect of age.
“Age group” indicates groups defined by age (young, middle, old), regardless of disease status (A). Statistical significance was determined by one-way ANOVA; exact P values are presented, and P values below 0.0001 are reported as P < 0.0001. Mean age differed significantly among the age groups. “Original group” indicates the disease-based grouping, which was used as the test set for the multi-marker panel. The multi-marker panel showed very low discriminatory power among the age groups (B). Confusion matrices show the classification performance of the multi-marker panel in the young (C; ≤70 years, n = 46) and old (D; >70 years, n = 111) groups. Numbers in each cell indicate the number of correctly or incorrectly classified samples; diagonal cells (highlighted in yellow) represent correct classifications. Overall accuracy is shown below each matrix.
Extended Data Fig. 8 Changes in accessibility of individual peptides of C1QA (GFCDTTNKGLF, first column), CLUS (SVDCSTNNPSQAKL, second column) and APOB (AVLCEFISQSIKSF, third column) in longitudinal samples.
Only SVDCSTNNPSQAKL showed a significant change in accessibility between paired samples (two-sided Wilcoxon test; *P = 0.0137).
Supplementary information
Supplementary Information
Supplementary text, Supplementary Figs. 1–13 and Supplementary Tables 1–3.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Son, A., Kim, H., Diedrich, J.K. et al. Structural signature of plasma proteins classifies the status of Alzheimer’s disease. Nat Aging (2026). https://doi.org/10.1038/s43587-026-01078-2
DOI: https://doi.org/10.1038/s43587-026-01078-2