Main

Amnestic dementia, often diagnosed as late-onset Alzheimer’s disease (LOAD), is increasingly recognized to be a heterogeneous clinical syndrome that may reflect multiple underlying heritable pathological processes1,2,3. LOAD genome-wide association studies (GWAS) have primarily used clinical diagnosis or proxy phenotypes based on family history of dementia4,5,6,7. While these GWAS have been immensely successful, identifying over 80 disease-associated genetic loci6, the use of clinical phenotypes complicates interpretation and partly obscures the complex and common reality of mixed neuropathologies in aged individuals3,8. To complement the successes from previous studies, GWAS using neuropathology endophenotypes (NPEs) is an essential next step to identify loci that drive specific Alzheimer’s disease and related dementias (ADRD)-associated pathologic mechanisms9.

Amyloid plaques and neurofibrillary tangles (NFT), together known as Alzheimer’s disease (AD) neuropathologic changes (ADNC), are present at autopsy in most brains from patients diagnosed with clinical LOAD, but ~20% of clinically diagnosed patients do not have ADNC, and >50% of those with ADNC have comorbid non-AD pathologies3,10. For example, transactive response (TAR) DNA-binding protein 43-kDa (TDP-43) pathology was found in >50% of elderly autopsied individuals11 in a community-based cohort study. Limbic-predominant age-related TDP-43 encephalopathy (LATE) is an amnestic dementia syndrome defined by a distinguishing pattern of LATE-neuropathological change (LATE-NC) characterized by TDP-43 proteinopathy that is most severe in the medial temporal lobes3,12,13. Hippocampal sclerosis of aging is characterized by neuronal death, gliosis and atrophy of the hippocampus beyond normal ranges based on levels of ADNC, commonly co-occurs with LATE-NC, and is associated with severe cognitive impairment12,14.

Cerebrovascular pathologies also contribute to cognitive decline and dementia and are prevalent among elderly autopsied research participants15. Cerebral amyloid angiopathy (CAA) is characterized by amyloid-β deposition in cerebral blood vessels16. CAA often co-occurs with ADNC but can independently contribute to cerebral injury15,16. Infarcts of both grossly visible arteries and microscopically examined vessels (the latter referred to as microinfarcts) are also common contributors to cognitive decline15,17. Cerebral large-vessel atherosclerosis and small-vessel/arteriolar thickening (arteriolosclerosis) are associated with infarcts, white matter rarefaction and hippocampal sclerosis18,19 and contribute to cognitive decline20,21. Collectively, these factors reveal a complex web of pathologies that contribute to cognitive impairment and dementia.

Examining the genetic risk factors of each subtype of neuropathology can provide an important and complementary approach to large GWAS of clinical- and family history-based outcomes for studying LOAD/ADRD risk. Previous GWAS of NPEs have confirmed known LOAD risk loci and have identified new neuropathology risk loci22,23,24,25,26,27. Some NPEs, particularly LATE-NC, have yet to be studied systematically using GWAS. Here we performed GWAS on 11 NPEs using three high-quality data sources with both autopsy and genotype data. We also performed downstream functional analyses to explore potential biological functional mechanisms of newly identified risk loci and provide insight into previously identified putative AD risk loci.

Results

Participant and NPE characteristics

Genotype and neuropathology data were analyzed from the following three autopsy data sources: (1) the National Alzheimer’s Coordinating Center (NACC; n = 5,940), (2) the Religious Orders Study and Rush Memory and Aging Project (ROSMAP; n = 1,183) and (3) the Adult Changes in Thought (ACT; n = 681) study (Fig. 1). In total, 7,804 unique participants were included in our analyses. The number of participants included in each GWAS ranged from 6,363 for amyloid-β plaques to 7,786 for Consortium to Establish a Registry for Alzheimer’s Disease (CERAD) neuritic amyloid plaque score, except for LATE-NC, which had a smaller sample due to the more recent discovery and evaluation of TDP-43 pathology (n = 3,112; Table 1).

Fig. 1: Overview of GWAS meta-analysis study design.

We performed GWAS meta-analyses of 11 NPEs across three data sources. White boxes represent data sources or summary statistics used in this study. Purple boxes represent individual steps throughout the genetic association analysis, and green boxes represent downstream functional analyses. The first stage of this analysis involved independent GWAS performed in parallel across the NACC neuropathology dataset, the ACT study and the combined ROSMAP. We then performed a meta-analysis using results from each individual GWAS using METAL. Variants reaching a suggestive threshold of association (P ≤ 1 × 10−5) in the meta-analysis were then carried forward for downstream analyses, including functional and colocalization analyses. Variants reaching the genome-wide significant threshold (P ≤ 5 × 10−8) and exhibiting ≥80% colocalization between two NPEs were followed up using existing methylation data to assess the association. All variants reaching genome-wide significance were considered associated with the respective NPE. We also report variants that reached a suggestive threshold (P ≤ 5 × 10−7) or reached the lower suggestive threshold (P ≤ 1 × 10−5) and were in a previously known disease-associated locus. GTEx, Genotype-Tissue Expression Project; QTL, quantitative trait locus; eQTL, expression QTL; sQTL, splicing QTL; mQTL, methylation QTL; AD, Alzheimer’s disease.

Table 1 Demographic and pathology characteristics of the NACC, ROSMAP, ACT cohorts and combined sample

The 11 studied NPEs included AD-related pathologies (CERAD score for neuritic amyloid plaques, amyloid-β plaques including diffuse plaques and Braak NFT staging28,29,30), non-AD neurodegenerative proteinopathies (LATE-NC and Lewy bodies12,31), cerebrovascular pathologies (CAA, gross infarcts, microinfarcts, circle of Willis atherosclerosis and arteriolosclerosis30,32) and hippocampal sclerosis32. NPEs commonly co-occurred, forming the following four identifiable clusters of pathologies: vascular, AD, LATE and Lewy body (Extended Data Fig. 1). Methods, Supplementary Methods and Supplementary Table 1 describe the applied phenotype definitions and harmonization approach.

GWAS meta-analysis of NPEs

We first performed GWAS on the 11 NPEs for the NACC, ROSMAP and ACT studies separately (Fig. 1). Genetic association analyses were performed with logistic or proportional-odds logistic regression mixed-effects models as appropriate (Methods). We then performed fixed-effects meta-analyses using METAL 2011-03-25 (ref. 33) on variants with minor allele frequencies ≥1% in at least one study. Quantile–quantile plots and corresponding estimates of genomic inflation (λ values) did not suggest systematic bias (Extended Data Fig. 2).
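As an illustration of the meta-analysis step, the sketch below combines per-study log odds ratios by inverse-variance weighting and computes a genomic inflation factor. It is not the study's pipeline: METAL offers both a sample-size-weighted and a standard-error-weighted scheme, and the scheme used is not stated in this section, so the inverse-variance form is an assumption. The function names and the example inputs are hypothetical.

```python
import math
import statistics

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def fixed_effects_meta(estimates):
    """Combine per-study (log_or, se) pairs by inverse-variance weighting.

    Assumption: standard-error-based fixed-effects weights; the scheme
    actually used in the study is not given in this section.
    """
    weights = [1.0 / se ** 2 for _, se in estimates]
    total = sum(weights)
    beta = sum(w * b for w, (b, _) in zip(weights, estimates)) / total
    se = math.sqrt(1.0 / total)
    z = beta / se
    # Two-sided P value from the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    # Cochran's Q statistic, a common measure of between-study heterogeneity.
    q = sum(w * (b - beta) ** 2 for w, (b, _) in zip(weights, estimates))
    return {"beta": beta, "se": se, "or": math.exp(beta), "z": z, "p": p, "q": q}


def genomic_inflation(z_scores):
    """Lambda: median observed chi-square over the expected 1-df median."""
    return statistics.median(z * z for z in z_scores) / CHI2_1DF_MEDIAN


# Hypothetical per-study estimates (log OR, standard error) for one variant.
example = fixed_effects_meta([(0.20, 0.05), (0.15, 0.10), (0.25, 0.12)])
```

The pooled estimate is pulled toward the most precise study, which is why the largest data source dominates a fixed-effects result when effect sizes are homogeneous.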

In total, the meta-analysis revealed six loci with at least one variant meeting genome-wide significance (P < 5 × 10−8) across eight NPEs (amyloid-β plaques, arteriolosclerosis, atherosclerosis, Braak NFT stage, CAA, CERAD plaque score, hippocampal sclerosis and LATE-NC), with a total of 12 associations between genomic loci and NPEs (Fig. 2). Four of the six loci were from genes previously associated with late-onset ADRD (the broader APOE region, TMEM106B, GRN and BIN1; Fig. 2a–c,f,i,k), while two loci were new, where the lead variant was in or closest to LZTS1 and COL4A1, respectively (Fig. 2d,e). Although most meta-analyses had no significantly different effect size estimates across the three data sources, there were three with significant tests for heterogeneity (Table 2), all of which were associations with APOE.

Fig. 2: Manhattan plots identify loci associated with each of the 11 NPEs included in this study.

a–k, Manhattan plots are shown for amyloid-β plaques (a), Braak NFT stage (b), CERAD score for neuritic plaques (c), arteriolosclerosis (d), atherosclerosis in the circle of Willis (e), CAA (f), gross infarcts (g), microinfarcts (h), LATE-NC (i), Lewy body (j) and hippocampal sclerosis (k). The y axes denote the −log10(P value of meta-analysis two-sided z test) of the variant–phenotype association, and the x axes outline the chromosomal position, with alternate chromosomes represented in black and blue. Labels indicate the nearest gene at a locus. The horizontal lines define the genome-wide significance level (solid black, P = 5 × 10−8), near genome-wide significance level (dotted gold, P = 5 × 10−7), and suggestive significance level (P < 5 × 10−5) in loci with evidence of AD association from a previous study (for example, ref. 6; dotted purple). Points and gene symbols are coded with the same colors. Gray gene symbols indicate genome-wide significant hits within the APOE region that did not survive conditional analysis. All GWAS are in cohorts of European ancestry and adjusted for age at death, sex, genotyping cohort and top ten genetic PCs. We identified seven genome-wide significant loci and 30 near genome-wide significant or suggestive loci. The genome-wide significant loci resulted in 12 associations with eight NPEs (amyloid-β plaques, arteriolosclerosis, atherosclerosis, Braak NFT stage, CAA, CERAD score, hippocampal sclerosis and LATE-NC). Four genes were previously associated with ADRD (APOE, BIN1, TMEM106B, GRN; a–c,f,i,k), while the three new loci were in or closest to LZTS1, COL4A1 and APOC2 (d–f). APOC2 is within the broader APOE region but remained significantly associated with CAA after adjusting for APOE ε diplotypes (f). Three NPEs (gross infarcts, microinfarcts and Lewy bodies) had zero genome-wide significant hits, but all three had near genome-wide significant and/or suggestive hits from either new or known loci.
APOE was associated with a range of NPEs, including LATE-NC, which is not pathognomonic of AD. On the other hand, neither GRN nor TMEM106B (recently identified in ADRD GWAS) was associated with the AD pathognomonic NPEs; both were specific to gross infarcts, LATE-NC and hippocampal sclerosis at either genome-wide or suggestive significance. sig., significant; assocs, associations; sugg., suggestive.

Table 2 Significant NPE-associated loci in GWAS meta-analysis of NACC, ROSMAP and ACT datasets

We subsequently discovered a new locus near APOC2 within the broader APOE region that is associated with CAA after adjusting for APOE ε diplotypes (Fig. 2f). No loci reached genome-wide significance with gross infarcts, microinfarcts or Lewy body pathology (Fig. 2g–j).
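The reporting tiers described in the Fig. 1 legend can be written as a small decision rule. The sketch below is illustrative only: the function name, the tier labels and the `in_known_locus` flag are hypothetical, and the inclusive (≤) boundaries follow the Fig. 1 legend rather than any stated implementation.

```python
# Thresholds as given in the Fig. 1 legend.
GENOME_WIDE = 5e-8        # genome-wide significance
NEAR_GENOME_WIDE = 5e-7   # suggestive threshold reported for any locus
SUGGESTIVE = 1e-5         # lower suggestive threshold, known loci only


def reporting_tier(p_value, in_known_locus=False):
    """Return the reporting tier for a lead variant's meta-analysis P value.

    `in_known_locus` is a hypothetical flag marking loci with previous
    evidence of disease association.
    """
    if p_value <= GENOME_WIDE:
        return "genome-wide significant"
    if p_value <= NEAR_GENOME_WIDE:
        return "near genome-wide significant"
    if p_value <= SUGGESTIVE and in_known_locus:
        return "suggestive (known locus)"
    return "not reported"


# P values quoted in the text for two lead variants.
tier_col4a1 = reporting_tier(2.7e-8)   # rs2000660, atherosclerosis
tier_pik3r5 = reporting_tier(1.3e-7)   # rs72807981, Braak NFT stage
```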

Known ADRD-associated loci

As expected, the APOE region (rs429358) was associated with multiple NPEs at genome-wide significance. Specifically, rs429358 was associated with (1) amyloid-β plaques (odds ratio (OR) = 1.98; P = 2.3 × 10−55; Table 2 and Fig. 2a), (2) Braak NFT stage (OR = 2.06, P = 9.7 × 10−89; Fig. 2b), (3) CERAD score (OR = 2.42, P = 4.7 × 10−103; Fig. 2c), (4) CAA (OR = 2.49, P = 4.4 × 10−138; Fig. 2f) and (5) LATE-NC (OR = 1.70, P = 1.7 × 10−14; Fig. 2i). Our results corroborate previous studies22,34 that described an association between APOE and CAA.

BIN1 was associated with Braak NFT stage (rs6733839; OR = 1.21; P = 1.6 × 10−9; Table 2 and Fig. 2b), and variants within TMEM106B were associated with both LATE-NC (rs2043539; OR = 0.70, P = 5.8 × 10−11; Fig. 2i) and hippocampal sclerosis (rs7805419; OR = 0.65; P = 3.2 × 10−13; Fig. 2k). A locus in GRN was also associated with hippocampal sclerosis (rs5848; OR = 1.40, P = 3.2 × 10−8; Fig. 2k).

New loci outside the APOE region

We also discovered two new loci outside the broader APOE region that are associated with three NPEs. Associations identified in the NPE GWAS meta-analysis included a suggestive PIK3R5 intronic locus associated with Braak NFT stage (rs72807981; OR = 0.71, P = 1.3 × 10−7; Supplementary Table 8 and Figs. 2b and 3a,b; see Supplementary Results for details on suggestive loci), an intronic LZTS1 locus associated with arteriolosclerosis (rs78909048; OR = 0.44, P = 5.8 × 10−10; Table 2 and Figs. 2d and 3d,e) and a variant 12 kilobase pairs (kbp) upstream of COL4A1 associated with the circle of Willis atherosclerosis (rs2000660; OR = 0.73, P = 2.7 × 10−8; Figs. 2e and 3g,h).

Fig. 3: New associations identified between PIK3R5 (suggestive), LZTS1 and COL4A1 and Braak NFT stage, arteriolosclerosis and cerebral atherosclerosis, respectively.

a, Braak stage association plot from NPE GWAS meta-analysis (n = 7,776) for the region around PIK3R5. Colored dots represent the chromosomal position (x axis, Mb) in hg38 coordinates and −log10(P value from meta-analysis two-sided z test; y axis) of each variant in the region. Dots are colored to represent the LD r2 with the lead variant (purple diamond) estimated with PLINK --r2 using 1000 Genomes phase 3 European-descended participants. The recombination rate was calculated using GRCh38 genetic map files downloaded from https://bochet.gcc.biostat.washington.edu/beagle/genetic_maps/ and taking the ratio of the difference in cM to the difference in Mb between positions. Boxes below data indicate the location of genes in the region. (Plot generated using LocusZoom73.) b, Association of PIK3R5 lead variant (rs72807981) with Braak stage for individual cohorts (NACC, n = 5,927 and ACT, n = 677; this variant was not present in ROSMAP) and meta-analysis (n = 6,604) using METAL (y axis). Points along the x axis represent OR of association, and error bars indicate 95% CI. c, Human brain cell-type expression profile of PIK3R5 in ref. 35. Columns represent mean FPKM. Error bars indicate the s.e. of measurement for each cell type based on the number of human samples sequenced for each type (fetal astrocytes, n = 6; mature astrocytes, n = 12; endothelial, n = 2; microglia, n = 3; neurons, n = 1 and oligodendrocytes, n = 5). PIK3R5 is primarily expressed in microglia. d, Arteriolosclerosis association plot from NPE GWAS meta-analysis (n = 6,668) for the region around LZTS1 (see a for interpretation). e, Association of LZTS1 lead variant (rs78909048) with arteriolosclerosis for individual cohorts (NACC, n = 4,930; ROSMAP, n = 1,163 and ACT, n = 575) and meta-analysis (n = 6,668) using METAL (y axis; see b for interpretation). f, Human brain cell-type expression profile of LZTS1 in ref. 35. LZTS1 is primarily expressed in fetal astrocytes and endothelial cells (see c for interpretation).
g, Cerebral atherosclerosis association plot from NPE GWAS meta-analysis (n = 7,340) for the region around COL4A1 (see a for interpretation). h, Association of COL4A1 lead variant (rs2000660) with cerebral atherosclerosis for individual cohorts (NACC, n = 5,496; ROSMAP, n = 1,175 and ACT, n = 669) and meta-analysis (n = 7,340) using METAL (y axis; see b for interpretation). i, Human brain cell-type expression profile of COL4A1 in ref. 35. COL4A1 is preferentially expressed in fetal astrocytes and endothelial cells with lower expression in neurons (see c for interpretation). Mb, megabase.

We next characterized which cell type(s) in the human brain express the new genes identified. According to brainrnaseq.org35,36, PIK3R5 is most highly expressed in microglia (Fig. 3c), LZTS1 is most highly expressed in fetal astrocytes and endothelial cells (Fig. 3f) and COL4A1 is most highly expressed in fetal astrocytes, endothelial cells and neurons (Fig. 3i).

New locus association within the APOE region

Based on the meta-analysis, we observed 12 genetic locus-phenotype associations within the broader APOE region (defined as less than 500 kbp from the start or end site of APOE transcription) across five NPEs (amyloid plaques, Braak stage, CAA, CERAD score and LATE-NC), where APOE itself (rs429358) was the top variant in the region for all five NPEs (Fig. 2a–c,f,i). We performed additional analyses in this region adjusting for APOE ε diplotypes to determine whether any of the genome-wide significant signals within the broader APOE region remained significant.

In the APOE-adjusted analysis, the lead variant from the nonadjusted analysis (rs429358), which tags the APOE ε4 allele (the well-known common variant with the strongest association with LOAD), was no longer associated with any of the five phenotypes. One locus with lead variant rs7247551 remained significantly associated with CAA (OR = 0.81; P = 8.0 × 10−12; Table 2 and Fig. 2f). rs7247551 is located between APOC2 and CLPTM1. No variants remained genome-wide significantly associated with any other APOE-associated NPE. Sensitivity analyses showed that the effect size of rs7247551 did not significantly differ based on APOE diplotype in NACC, ROSMAP or ACT (Supplementary Results and Extended Data Fig. 3).

APOC2 replicates in an independent cohort

We obtained data from a recent GWAS of CAA in 815 participants with dementia in the Mayo Clinic Brain Bank. Using their data, we replicated the association between rs7247551 and CAA while adjusting for APOE ε diplotypes (P = 0.0012). We also confirmed that rs7247551 was indeed new and not in linkage disequilibrium (LD) with the variant previously reported in ref. 26 (rs5117; r2 < 0.01). Together, these results provide evidence for a new locus within the broader APOE region that is independent of the APOE ε diplotypes and is associated with CAA pathology burden. They further suggest that the genetic risk for CAA in the broader APOE region may differ from the AD-specific neuropathologies (neuritic amyloid plaques and NFT).

Associations of clinical and proxy AD risk loci with NPE

We further tested whether LOAD-associated loci identified in a recent ADRD GWAS were associated with any evaluated NPEs6. Reference 6 identified a total of 83 distinct non-APOE loci (39 previously identified and 44 new) associated with ADRD (hereafter, ‘ADRD loci’), between 76 and 78 of which had lead variants that met inclusion criteria in our study for each NPE. In total, 26 NPE-locus associations from 19 loci had adjusted P (Q values) ≤0.05 across the 11 NPEs. Of the 26 associations, 24 had concordant directions of effect with ref. 6 (Table 3 and Supplementary Table 2).
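The adjusted P values (Q values) above control for testing many loci per NPE. The section does not name the adjustment procedure, so the sketch below assumes the widely used Benjamini–Hochberg step-up method; the function name and the example P values are hypothetical.

```python
def bh_q_values(p_values):
    """Benjamini-Hochberg adjusted P values, returned in input order.

    Assumption: the study's Q values may have been computed with a
    different false discovery rate procedure.
    """
    m = len(p_values)
    # Indices of the P values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q[i] = running_min
    return q


# Hypothetical P values for four loci tested against one NPE.
example_q = bh_q_values([0.01, 0.04, 0.03, 0.005])
significant = [q <= 0.05 for q in example_q]
```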

Table 3 Associations between NPEs (using NACC, ROSMAP and ACT datasets) and known ADRD loci

Notably, the pathognomonic AD pathologies, operationalized using Braak NFT stage and CERAD score, had concordant directions of effect with 68 and 59 of 77 ADRD loci, respectively, regardless of statistical significance. Five ADRD loci (BIN1, MME, HLA-DQA1, TMEM106B and CELF1/SPI1; Table 3) were significantly associated with the Braak NFT stage after multiple testing corrections. Two ADRD loci (PICALM and TPCN1) were associated with CAA. Eleven ADRD loci (CR1, BIN1, INPP5D, ZCWPW1/NYAP1, PTK2B, CELF1/SPI1, PICALM, SORL1, FERMT2, SNX1 and ABCA7) were significantly associated with the CERAD score after multiple testing corrections. Four ADRD loci (TMEM106B, IL34, GRN and MAPT) were significantly associated with hippocampal sclerosis, all of which except for IL34 (hippocampal sclerosis—OR = 0.74, P = 0.0019; AD—OR = 1.06, P = 5.6 × 10−6) were concordant in effect direction. Two ADRD loci (TMEM106B and GRN) were significantly associated with LATE-NC. Finally, two ADRD loci (TPCN1 and PLCG2) were significantly associated with microinfarcts. These results indicate that NPE studies largely corroborate the findings of large AD GWAS based on clinical and proxy phenotypes. Several ADRD loci, particularly MAPT (hippocampal sclerosis), TMEM106B and GRN (hippocampal sclerosis and LATE-NC), were associated only with non-AD pathology.
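Concordance of effect direction, as used above, amounts to asking whether two odds ratios fall on the same side of 1. A minimal sketch, assuming both ORs are expressed for the same effect allele (the function name is hypothetical):

```python
import math


def concordant(or_npe, or_reference):
    """True when both odds ratios point in the same direction.

    Assumes both ORs refer to the same effect allele; an OR of exactly 1
    is treated as non-concordant.
    """
    return math.log(or_npe) * math.log(or_reference) > 0


# IL34 values quoted in the text: hippocampal sclerosis OR = 0.74, AD OR = 1.06.
il34_concordant = concordant(0.74, 1.06)
```

For IL34 the two ORs lie on opposite sides of 1, which is why it is reported as the one discordant hippocampal sclerosis locus.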

Identifying potential genetic mechanisms of NPE development

We also assessed possible mechanisms through which identified NPE-associated variants may be involved in disease risk.

Gene-prioritization and enrichment analyses

Using gene-based, pathway, and enrichment analyses, we identified APOE as significantly associated with NFT, diffuse plaques, CAA, neuritic plaques and LATE-NC (Supplementary Table 3)37. TMEM106B was associated with both hippocampal sclerosis and LATE-NC. We further found that rs2000660 (associated with the circle of Willis atherosclerosis) is located within an enhancer region 13 kbp upstream of the COL4A1 transcription start site (Extended Data Fig. 4). Annotations from FAVOR identified a synonymous, exonic single-nucleotide polymorphism (SNP), rs650724, in high LD (r2 = 0.86) with this lead variant that is highly conserved across mammals (mamPhCons = 0.987; Supplementary Table 4)38.

Genetic colocalization analysis

We investigated whether loci associated with multiple NPEs show evidence for genetic colocalization39. We identified two NPE pairs exhibiting genetic colocalization, including (1) Braak NFT stage and CERAD score (BIN1; probability of colocalization (PrC) > 99%; Extended Data Fig. 5) and (2) hippocampal sclerosis and LATE-NC (TMEM106B; PrC = 90%; Fig. 4a–c).
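To make the probability of colocalization concrete, the sketch below re-implements, in plain Python, the approximate Bayes factor calculation that underlies Bayesian colocalization. It is an illustration, not the coloc package itself. The prior probabilities (p1, p2, p12) and the prior effect-size standard deviation are assumed values, since the settings used in the study are not given in this section, and the input effect sizes are hypothetical.

```python
import math


def log_sum(values):
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def log_diff(a, b):
    """log(exp(a) - exp(b)); requires a > b."""
    return a + math.log1p(-math.exp(b - a))


def log_abf(beta, se, prior_sd):
    """Wakefield's log approximate Bayes factor for one variant."""
    z = beta / se
    r = prior_sd ** 2 / (prior_sd ** 2 + se ** 2)
    return 0.5 * (math.log(1.0 - r) + r * z * z)


def coloc_posteriors(trait1, trait2, prior_sd=0.2,
                     p1=1e-4, p2=1e-4, p12=1e-5):
    """Posterior probabilities of hypotheses H0..H4 for one locus.

    trait1 and trait2 are lists of (beta, se) for the same variants in the
    same order; at least two variants are needed. The last element (H4) is
    the probability that both traits share one causal variant.
    """
    l1 = [log_abf(b, s, prior_sd) for b, s in trait1]
    l2 = [log_abf(b, s, prior_sd) for b, s in trait2]
    s1, s2 = log_sum(l1), log_sum(l2)
    s12 = log_sum([a + b for a, b in zip(l1, l2)])
    log_h = [
        0.0,                                                # H0: no association
        math.log(p1) + s1,                                  # H1: trait 1 only
        math.log(p2) + s2,                                  # H2: trait 2 only
        math.log(p1) + math.log(p2) + log_diff(s1 + s2, s12),  # H3: distinct variants
        math.log(p12) + s12,                                # H4: shared variant
    ]
    denom = log_sum(log_h)
    return [math.exp(v - denom) for v in log_h]


# Hypothetical five-variant locus; both traits peak at the second variant.
se = 0.1
trait_a = [(b, se) for b in (0.05, 0.60, 0.03, -0.02, 0.01)]
trait_b = [(b, se) for b in (0.04, 0.55, 0.02, 0.01, -0.03)]
posteriors = coloc_posteriors(trait_a, trait_b)
```

When the two traits peak at different variants in the same region, probability mass moves from H4 to H3, which is how the method separates a shared signal from two nearby independent signals.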

Fig. 4: Hippocampal sclerosis and quantitative trait loci all colocalize with LATE-NC on TMEM106B.

We investigated whether loci associated with multiple NPEs show evidence for genetic colocalization using a Bayesian colocalization analysis approach implemented in the coloc R package72. a, The TMEM106B lead variant (rs2043539) reached genome-wide significance with LATE-NC. b,c, Hippocampal sclerosis colocalized with LATE-NC on TMEM106B (PrC = 90%). d,e, TMEM106B expression colocalized with LATE-NC (PrC = 91%). f–i, Two methylation QTL (mQTLs), cg09613507 (PrC = 89%; f,g) and cg23422036 (PrC = 95%; h,i), also colocalized with LATE-NC. a, b, d, f and h show regional LocusZoom73 plots for each trait. Purple diamonds represent lead variants. c, e, g and i compare −log10(P) values for each trait with LATE-NC −log10(P) values across the TMEM106B rs2043539 locus (color legend same as in a). The TMEM106B expression and the methylation data were obtained from ROSMAP. j, Decreased TMEM106B expression was associated with more severe LATE-NC pathology (P = 0.043). Unless otherwise specified, for all boxplots, boxes outline the first quartile, median and third quartile. Whiskers extend up to 1.5× the distance between the first and third quartiles. k, Hypermethylation of cg09613507 was associated with more severe LATE-NC pathology (P = 0.0093). l, Methylation at cg23422036 was not significantly associated (P = 0.10). m, Unrelated to TMEM106B, BIN1 expresses eight distinct RNA isoforms simultaneously in the frontal cortex from six AD cases and six controls. To understand the complexities and nuances of ADRDs, we also need to understand the nuances of the genes purported to be driving disease. CDS, coding sequence; CTRL, control; CPM, counts per million; F, female; M, male; hipp. scler., hippocampal sclerosis; expr. cer., expression in cerebellum.

We also tested for colocalization between NPE loci and quantitative trait loci (QTL). The TMEM106B locus (associated with hippocampal sclerosis and LATE-NC) colocalized with TMEM106B expression in multiple tissues, including the cerebellar hemisphere (PrC = 91%; Fig. 4d,e). Two CpG sites located either within TMEM106B (cg09613507; Fig. 4f) or upstream (cg23422036; Fig. 4h) colocalized with both hippocampal sclerosis (cg09613507-hippocampal sclerosis PrC = 94%, cg23422036-hippocampal sclerosis PrC = 94%; Fig. 4b,f,g) and LATE-NC (cg09613507-LATE-NC PrC = 89% and cg23422036-LATE-NC PrC = 95%; Fig. 4a,f–i). A hippocampal sclerosis-associated locus within GRN strongly colocalized with both LATE-NC and GRN expression in multiple tissues (PrC > 99.9%; Extended Data Fig. 6).

APOC2 colocalized with several traits, including methylation QTL (mQTL) for four CpG sites in ROSMAP (cg04401876, cg10169327, cg13119609 and cg09555818; PrC = 96–99%; Fig. 5a–i). The APOC2 locus also colocalized with expression of multiple genes in the Genotype-Tissue Expression Project (GTEx), including APOE expression in the wall of the aorta (PrC = 94%), CLPTM1 expression in the skin of the leg and suprapubic region and APOC2 expression in 17 different tissues (for example, brain cortex, caudate, nucleus accumbens and cerebellum; PrC = 89–96%).

Fig. 5: Four mQTL colocalize with CAA on APOC2.

Using the same Bayesian colocalization analysis approach from Fig. 4 (coloc 5.2.2 R package72), we tested for colocalization between CAA and methylation sites using existing data from ROSMAP. a, Lead SNP rs7247551, near APOC2, reached genome-wide significance with CAA. b,d,f,h, The rs7247551 was also significantly associated with four mQTL. b–i, cg04401876 (PrC = 96%; b,c), cg10169327 (PrC = 96%; d,e), cg13119609 (PrC = 99%; f,g) and cg09555818 (PrC = 97%; h,i) all colocalized with CAA. a, b, d, f and h show regional LocusZoom73 plots for each trait. c, e, g and i compare −log10(P) values between each trait compared to CAA −log10(P) values across the APOC2 rs7247551 locus. Variants in LD with the lead variant (purple diamond in a–i) are shaded in a–i according to the color legend on the left-hand side of a. j, Plots of normalized methylation level versus CAA pathology severity. Hypomethylation at cg09555818 (OR = 0.82, P = 0.003) and cg13119609 (OR = 0.78, P = 0.0006) were significantly associated with more severe CAA pathology. Unless otherwise specified, for all boxplots, boxes outline the first quartile, median and third quartile. Whiskers extend up to 1.5× the distance between the first and third quartiles. k, Both cg09555818 (P = 0.0063; k) and cg13119609 (P = 0.0069; not shown) were significantly associated with APOC2 expression. l,m, The rs7247551 G allele was significantly associated with increased APOC2 expression in the frontal cortex in ROSMAP (β = 0.072, P = 0.00013; l); however, the direction of effect was opposite of that found in brain tissues in GTEx (P = 7.2 × 10−7; m). n,o, The rs7247551 was not associated with APOE (P = 0.81; n) or APOC2 (P = 0.89; o) expression in frontal cortex in ROSMAP. p, APOC2 is highly expressed, especially in microglia and oligodendrocytes. Columns represent mean FPKM. Error bars indicate the s.e. of measurement for each cell type based on the number of human samples sequenced for each type (fetal astrocytes, n = 6; mature astrocytes, n = 12; neurons, n = 1; oligodendrocytes, n = 5; endothelial, n = 2 and microglia, n = 3). expr., expression; norm., normalized.

Multiple suggestive NPE loci showed evidence of colocalization with expression QTL (eQTL) in GTEx. In total, 50 NPE loci (lead variant P < 1 × 10−5) colocalized with various QTL types (that is, expression, methylation or splicing QTL [sQTL]; PrC ≥ 80%). A total of 407 NPE–QTL pairs colocalized across 47 tissues (Supplementary Table 5), many giving credence to previously discovered associations and insight into potential mechanisms. For example, rs1643235 (ABCC9) colocalized with hippocampal sclerosis and gene expression in multiple tissues, including the cortex (PrC = 80%), corroborating previous studies27,40,41 because the ABCC9 SNP rs4148674 (r2 = 0.96 with rs1643235) was a robust eQTL for ABCC9 and the strongest ABCC9 region signal for association with hippocampal sclerosis (Extended Data Fig. 7).

Functional studies in ROSMAP

Using data from ROSMAP participants with DNA methylation and/or RNA-sequencing (RNA-seq) data available from the dorsolateral prefrontal cortex (DLPFC), we found that neither TMEM106B nor GRN expression was associated with hippocampal sclerosis (P > 0.05), while decreased TMEM106B expression was associated with more severe LATE-NC pathology (P = 0.043; Fig. 4j). Of the two CpG sites that colocalized with hippocampal sclerosis and LATE-NC, hypermethylation of cg09613507 was associated with more severe LATE-NC pathology (P = 0.0093; Fig. 4k), while cg23422036 was not significantly associated (P = 0.10; Fig. 4l).

Additionally, we tested for association between CAA pathology and methylation levels at four CpG sites (cg09555818, cg04401876, cg10169327 and cg13119609) that colocalized with the chromosome 19 (chr19) CAA risk locus (rs7247551). We first confirmed that all four CpG sites were significantly associated with rs7247551 (P < 0.0001) and had directions of effect consistent with those previously reported for ROSMAP42. Hypomethylation at cg09555818 (OR = 0.82, P = 0.003) and cg13119609 (OR = 0.78, P = 0.0006) were significantly associated with more severe CAA pathology (Fig. 5j). Both cg09555818 (P = 0.0063; Fig. 5k) and cg13119609 (P = 0.0069) were significantly associated with APOC2 expression.

Next, as APOC2 expression in multiple brain tissues colocalized with CAA in GTEx but not ROSMAP, we investigated whether there was a nominal association between APOC2 expression in the DLPFC and rs7247551. We found that the G allele of rs7247551 was nominally associated with increased APOC2 expression in the DLPFC (β = 0.072, P = 0.00013; Fig. 5l); however, the direction of effect was opposite of that found in brain tissues in GTEx (that is, the G allele of rs7247551 was associated with decreased APOC2 expression in GTEx; P = 7.2 × 10−7; Fig. 5m). Expression of APOC2 in the DLPFC was not associated with CAA in ROSMAP (OR = 0.98, P = 0.89). We performed an additional post hoc analysis for nominal APOE eQTL activity of rs7247551 in ROSMAP. We confirmed that rs7247551 was not associated with APOE expression in the DLPFC in ROSMAP (P = 0.81; Fig. 5n). APOC2 expression was also not significantly associated with the severity of CAA pathology in ROSMAP (P = 0.089; Fig. 5o). Notably, APOC2 is highly expressed, especially in microglia and oligodendrocytes (Fig. 5p).

Potential effects via differential RNA splicing

Many human genes undergo alternative splicing43,44,45. Recent work in ref. 46 demonstrated that many medically relevant genes express multiple RNA isoforms that result in unique proteins, including genes involved in ADRD. Specific examples include APP, MAPT and BIN1, which express five, five and eight distinct RNA isoforms above noise levels, respectively, in the human frontal cortex. Thus, we explored whether any genes associated with NPEs in this work express multiple RNA isoforms in the human frontal cortex.

While all loci that were associated with NPE have multiple annotated RNA isoforms, ranging from 4 (LZTS1) to 24 (PICALM) per Ensembl, most of the genes exhibited expression for only a single isoform above noise levels, per data from ref. 46. Some expressed multiple RNA isoforms, but the isoforms were not predicted to result in distinct proteins (for example, LZTS1)—although recent data suggest that alternative untranslated regions have direct functional consequences47,48,49. BIN1, however, actively transcribes eight distinct RNA isoforms in the frontal cortex (Fig. 4m). BIN1 is also expressed in multiple brain cell types in humans, according to brainrnaseq.org35,36. Given the diversity of BIN1 isoforms simultaneously expressed, we need to understand whether specific isoforms are involved in disease, and we propose differential RNA isoform expression as a potential mechanism through which disease genes may be affecting disease.

Discussion

The present study of 11 ADRD-related NPEs comprised 7,804 participants. These results provide an autopsy-based complement to previous studies based on clinical diagnoses and expand on the findings of previous genetic studies of dementias and neuropathologies2,6,22,23,24,26,41. In this work, we confirm previous findings that several loci associated with AD (including APOE) are also associated with non-ADNC pathogenesis. Additionally, some of the known ADRD loci (that is, TMEM106B and GRN) did not associate with any of the classical AD-defining NPEs in this study. Thus, our results improve our collective understanding of the complex nature of ADRD and its genetic bases.

Sharpening the endophenotypes enabled the discovery of new hit genes that did not reach genome-wide significance in previous studies oriented toward studying AD clinical phenotypes. We identified clear genetic associations between specific neuropathologies and loci already associated with ADRD and three new NPE-associated loci. Yet, many questions remain and will require deeper investigation. For example, three of the NPEs studied (gross infarcts, microinfarcts and Lewy body pathology) did not have any locus reach genome-wide significance in our study. Likely explanations for the lack of genome-wide associations for these three pathologies include (1) insufficient sample size, (2) the cohorts not being specifically designed to study these phenotypes, and (3) variability in the collection and scoring of these phenotypes over time and between research centers. Nonetheless, these three NPEs each had suggestive associations, including with known loci (Supplementary Results). Specifically, TMEM106B and APOE had suggestive associations with gross infarcts and Lewy body pathology, respectively. APOE ε4 was previously associated with Lewy body pathology by a study discussed in ref. 50. As study cohorts grow larger, researchers will be able to re-assess these suggestive associations.

As expected, APOE variation was associated with ADNC, CAA and LATE-NC. Although LATE-NC is diagnostically characterized by TDP-43 proteinopathy3,12,13, brains with a greater burden of neuritic amyloid plaques are more likely to have TDP-43 proteinopathy (and vice versa)12. Similarly, others have previously shown an association between APOE \(\epsilon\)4 status and TDP-43 pathology in ROSMAP11. Unraveling the complex interaction between APOE and proteinopathies (including the suggestive association with Lewy bodies) may provide crucial insight into properly treating patients with these often comorbid pathologies. A study using a mouse model suggests that TDP-43 directly interacts with amyloid-β and promotes worse pathology51, but further studies are needed to better understand this interaction.

By adjusting for APOE \(\epsilon\) diplotypes, we identified a new locus near APOC2 associated with CAA. Like APOE, APOC2 is part of the apolipoprotein family and is involved in lipoprotein metabolism. Both genes directly bind fat droplets (chylomicrons)52,53 and are implicated in heart disease54, which is itself linked to APOE and dementia. APOC2 and APOE expression is also coregulated through the same mechanisms in liver52,53. We replicated the association between APOC2 and CAA while adjusting for APOE in an independent sample of 815 Mayo Clinic Brain Bank participants used in ref. 26, providing additional evidence for the APOC2 locus being important for CAA pathology beyond the known effects of APOE \(\epsilon\) haplotypes.

Previously, a study discussed in ref. 55 reported that a variant proximal to rs7247551 (rs10413089; 822 bp away) showed evidence of association with clinical AD independently of APOE \(\epsilon\) status in their cohort but determined their results were inconclusive. Their original association did not survive multiple testing corrections, but the association replicated in an independent cohort. Summary statistics from ref. 6 show that rs7247551 was significantly associated with ADRD, but that study did not report results of an independent analysis within the broader APOE region. Thus, the present study is the first to confirm that this association is independent of the known effects of APOE alleles. Both the potential association with clinical AD status found by ref. 55 and the association with CAA we report herein should be followed up in future studies.

Several variants in the APOC2 locus were lead eQTLs for APOC2 brain expression in both GTEx and ROSMAP. Colocalization analysis confirmed that the new CAA risk locus shares a functional variant with both APOC2 eQTL and nearby brain cortex mQTL. We confirmed that two of the CpG sites affected by the CAA risk locus, cg09555818 and cg13119609, were, in turn, significantly associated with CAA pathology. Both CpG sites are located within the APOC4–APOC2 readthrough transcript region, overlapping APOC4 and APOC2. Our results are consistent with the hypothesis that the association between rs7247551 and CAA risk may be driven by hypomethylation of CpG sites in the APOC2 region. Previous studies in other human cohorts also implicate hypomethylation at cg09555818 and cg13119609 in AD56,57,58. Collectively, these results provide evidence that APOC2 may be the target gene of the rs7247551 CAA risk locus, although more research must be done for verification.

We also identified associations between known loci and multiple NPEs, including BIN1, APOE and TMEM106B. These associations provide context regarding their involvement in disease pathogenesis. One intronic locus of TMEM106B was significantly associated with both hippocampal sclerosis and LATE-NC, while a locus within GRN was associated with hippocampal sclerosis. Both genes are associated with frontotemporal lobar degeneration with TDP-43 inclusions59,60, hippocampal sclerosis61,62, and were recently associated with clinical AD5,6. We found that hippocampal sclerosis, LATE-NC, and clinical AD all colocalize at these two loci, suggesting that hippocampal sclerosis, LATE-NC and clinical AD likely share causal loci for these genes. Furthermore, hippocampal sclerosis (GRN and TMEM106B) and LATE-NC (TMEM106B) colocalized with brain eQTL and mQTL, and TMEM106B expression and methylation were associated with LATE-NC in downstream analyses. Notably, lead GRN and TMEM106B variants identified in GWAS of ADRD were not associated with the ADNC NPEs. Given that a substantial fraction of individuals diagnosed with dementia have LATE-NC or hippocampal sclerosis pathology (with estimates as high as ~50%)3, it is plausible that the associations found between these genes and clinical AD in recent GWAS are due to individuals with these non-AD pathologies who were diagnosed clinically with AD.

A locus ~30 kbp downstream of BIN1 on chr2q14 was significantly associated with Braak stage and suggestively associated with CERAD score for neuritic plaques. In previous GWAS, this locus was second only to APOE for strength of association with LOAD6. We verified through colocalization analysis that the same locus drives association signals with the Braak NFT stage and CERAD score. Interestingly, the lead variant in this locus, rs6733839, was not associated with either nonneuritic amyloid plaques or CAA, neither of which includes tau deposits. Previous research supports the hypothesis that BIN1 is associated with LOAD through its effect on NFT rather than amyloid pathology63,64.

We also discovered intriguing new loci mapped to COL4A1 and LZTS1 associated with atherosclerosis in the circle of Willis and brain arteriolosclerosis, respectively.

One locus on chr13q34 with lead variant rs2000660, located 12 kbp upstream of COL4A1, was significantly associated with atherosclerosis in the circle of Willis. Previous research found that the COL4A1/COL4A2 locus is associated with numerous vascular disease phenotypes, including peripheral artery disease, coronary artery disease, stroke, arteriolar stiffness and rare familial cerebrovascular diseases65,66,67. In a recent GWAS, rs2000660 was a lead risk variant for migraines68. The relevance of the COL4A1 locus to cerebral vascular traits is thus highly supported by previous research, and the biological role of collagen IV in vascular disease is possibly related to the disruption of the extracellular matrix65. COL4A1 is preferentially expressed in astrocytes and endothelial cells and codes for a component of collagen IV, an important component of the basal lamina. Endothelial cells are strongly implicated in atherosclerosis, and in recent years, researchers have suggested that astrocytes may also be directly involved in cerebrovascular disease69. The variant rs2000660 was not nominally associated with any other vascular NPE in our study, and a previous GWAS of circle of Willis atherosclerosis using ROSMAP participants did not identify COL4A1 as a risk locus24. The variant rs650724, which is in high LD with rs2000660 (r2 = 0.84), is a synonymous coding variant (p.Ser1600Ser in ENST00000375820.10; p.Ser319Ser in ENST00000650424.1) within COL4A1. In addition, rs2000660 is located within an enhancer for COL4A1, presenting possible molecular functional mechanisms driving the association at this locus.

An intronic variant within PIK3R5 (rs72807981; chr17p13) met our suggestive threshold of association with the Braak NFT stage. PIK3R5 codes for a phosphatidylinositol 3-kinase involved in cell growth, motility and survival. There is previous research suggesting that PIK3R5 is more highly expressed in aged adults with Braak NFT stages V and VI versus nondemented controls70. PIK3R5 is expressed preferentially in microglial cells in humans35, suggesting that its association with neurofibrillary pathology may be immune-mediated, although future work is needed to validate the association between PIK3R5 and NFT.

One new intronic locus in LZTS1 was found to be protective against brain arteriolosclerosis. The relatively modest literature regarding LZTS1 has focused mostly on cancers; however, LZTS1 is primarily expressed in endothelial cells and astrocytes, cell types relevant to vascular function and dysfunction. One paper suggests that LZTS1 is involved in neuronal delamination and development of glial-like cells during mammalian neocortical development71, but additional work related to LZTS1 and its function in the cerebrovasculature and brain parenchyma is needed.

In conclusion, we identified promising new loci associated with NPEs and replicated multiple known risk loci for ADRD using NPE-based GWAS. Our study demonstrates the importance of studying genetic risk factors of NPEs as a complement to studies of clinical and proxy phenotypes of LOAD.

Methods

Participants

An overview of our study design is presented in Fig. 1. Each participating study previously obtained informed consent from participants or caregivers for participants with substantial cognitive impairment. Parent study protocols were approved by the local institutional review boards. This study was approved by the University of Kentucky Office of Research Integrity Institutional Review Board.

NACC

The present study used NACC data from 36 National Institute on Aging (NIA)-funded Alzheimer’s Disease Research Centers (ADRCs). Individual ADRCs use different recruitment strategies and perform autopsies on-site, but neuropathology data at each ADRC are collected using a standard form (https://files.alz.washington.edu/documentation/np11-form.pdf) and submitted to NACC where they are aggregated and anonymized. The NACC Neuropathology dataset based on the first version of this form was originally implemented in 2001 (ref. 30), and this analysis uses data from then through the March 2023 freeze. Participants were excluded if they did not have autopsy data available or if they were noted in the NACC Neuropathology dataset to have at least one of 19 conditions that could potentially bias results. These conditions include brain tumors, severe head trauma and frontotemporal lobar degeneration (see Supplementary Table 6 for the full list of variables used for exclusion criteria).

ROSMAP

ROSMAP consists of harmonized data from the following two longitudinal cohort studies: the Religious Orders Study (ROS) and the Rush Memory and Aging Project (MAP)74. ROS and MAP were both approved by the Institutional Review Board of Rush University Medical Center. All participants signed the Uniform Anatomic Gift Act, as well as informed and repository consents. ROS began in 1994 and has recruited over 1,500 Catholic priests, nuns and brothers across the United States. MAP started in 1997 and has enrolled more than 2,300 community members in the greater Chicago area of northeastern Illinois. The ROSMAP NP data used in this study were received from Rush University Medical Center in January 2020. Using KING 2.2.7 (ref. 75), we found that several participants in ROSMAP also had neuropathology and genotype data available in NACC. In these cases, records in the NACC were preferentially kept.

ACT

The ACT study began in 1994 and recruited residents in the greater Seattle area aged 65 years and older without dementia at the time of enrollment76,77,78,79. The study has expanded to include three cohorts with continued enrollment using the original enrollment criteria and has a current total of 4,960 participants across all three cohorts. The ACT NP data used in this study were obtained from Kaiser Permanente in May 2023.

Genotype data and quality control

Genotype data for all cohorts underwent imputation using the Trans-Omics for Precision Medicine (TOPMed) Imputation Server 1.7.3 and the TOPMed reference panel using Minimac 4 (refs. 80,81,82). Postprocessing was performed with BCFtools 1.10.2 (ref. 83) and SAMtools 1.10 (ref. 84), PLINK 1.9 and 2.0 (ref. 85), R 4.2.1 and 4.2.2 (https://cran.r-project.org/), and R packages data.table 1.14.10 (https://CRAN.R-project.org/package=data.table) and stringi 1.8.3 (ref. 86). NACC and ACT raw genotype data were obtained from the September 2020 freeze of the Alzheimer’s Disease Genetics Consortium (ADGC) in March 2021 and subsequently imputed. Pre-imputed ROSMAP genotype data were received from collaborators in the Hohman Lab at Vanderbilt University in December 2021. Genetic variants with minor allele frequency (MAF) < 0.1% and imputation quality scores of <0.8 were removed before further quality control measures. Due to the small sample sizes of participants with substantial non-European ancestry (based on proximity to 1000 Genomes ‘EUR’ superpopulation cluster in principal component (PC) analysis), especially in replication cohorts, these participants were excluded from the analysis. Standard GWAS quality control procedures were followed for variant and participant inclusion (Supplementary Methods).

Defining and harmonizing NPEs for analysis

In total, we combined and/or harmonized 11 NPEs for analysis across the four studies. We note that there are differences in the way that some neuropathological data were collected across studies, and our strategy for harmonizing was informed by practical considerations for maximizing available sample sizes given the available endophenotypes. Thus, several synthetic NPEs were created by merging existing NPEs within a cohort or by harmonizing categorical variables from one cohort and continuous variables from another. Hippocampal sclerosis, microinfarcts and gross infarcts were recorded as binary case–control phenotypes. Arteriolosclerosis, atherosclerosis, CAA, CERAD score for neuritic plaques, amyloid plaques, LATE-NC and Lewy body pathology were recorded as four-stage ordinal variables that either measured progressive severity of pathology (‘none’ < ‘mild’ < ‘moderate’ < ‘severe’) or progressing anatomical distribution of pathology. Braak NFT was recorded as a seven-stage ordinal variable that followed the anatomical distributional stages originally characterized in ref. 87. We provide a deeper description of our harmonization approach in the Supplementary Methods, and a detailed listing of variables harmonized across data sources to construct NPEs for analysis is available in Supplementary Table 1.

To assess the co-occurrence of NPEs in our data, we estimated polychoric correlations (an approach that assumes that observed ordinal or binary variables reflect latent normally distributed variables) between NPE pairs, followed by hierarchical clustering using the polycor 0.8-1 (ref. 88), psych89 and pheatmap 1.0.12 (ref. 90) R packages (Extended Data Fig. 1).

DNA methylation data

Preprocessed and quality-controlled DNA methylation data for 740 ROSMAP participants were downloaded from Synapse.org (Synapse IDs: syn3157275 and syn3191087). DNA methylation preparation and quality control measures have been previously described91,92. Briefly, approximately 50 mg of frozen gray matter tissue from the DLPFC was sampled from each participant. DNA was then extracted and processed using the Illumina Infinium HumanMethylation450 BeadChip. Quality control measures included removing low-quality probes, removing participants with poor bisulfite-conversion efficiency and adjusting methylation levels by age, sex and batch, which adequately controlled for batch effects92. Missing methylation levels were imputed using 100-nearest neighbors91,92.

RNA-seq data

Preprocessed and quality-controlled bulk-tissue RNA-seq data from the DLPFC of ROSMAP participants were downloaded from Synapse.org (Synapse IDs: syn21088596, syn21323366, syn3505732 and syn3505724). As previously described, samples were prepared by sectioning approximately 100 mg of gray matter tissue from the DLPFC and RNA extracted using the Qiagen MiRNeasy Mini (217004) protocol and then submitted for transcriptome library construction using the dUTP protocol and Illumina sequencing92. A total of 634 participants in seven batches were sequenced with an average sequencing depth of 50 million paired reads per sample92. Subsequent quality control and batch corrections were performed, and the final output of the RNA-Seq pipeline was fragments per kilobase of transcript per million mapped reads (FPKM)92.

Statistical analyses

Single-variant GWAS

We analyzed ordinal endophenotypes using proportional-odds logistic mixed-effects models implemented in the POLMM 0.2.3 (refs. 93,94) and GRAB 0.1.1 R packages93 and analyzed binary variables similarly with logistic mixed-effects models implemented in the SAIGE R package95. Fixed-effect covariates included age at death, sex, cohort and the first ten genetic PCs created using the PCA in Related Samples (PC-AiR) method in the GENESIS 2.26.0 R package96. We included a genetic relationship matrix (GRM) as a random effect to account for relatedness between participants. An additive mode of inheritance was assumed in all analyses.
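For orientation, the general form of a proportional-odds logistic mixed model for an ordinal NPE with J levels can be sketched as follows. The notation is introduced here for illustration and does not appear in the original text; it follows the usual formulation of such models rather than any software-specific parameterization.

\[
\operatorname{logit} P(Y_i \le j \mid \mathbf{x}_i, g_i, b_i) = \alpha_j - \left(\mathbf{x}_i^{\top}\boldsymbol{\beta} + g_i\gamma + b_i\right), \quad j = 1, \dots, J-1, \qquad \mathbf{b} \sim N(\mathbf{0}, \tau\boldsymbol{\Psi})
\]

Here \(Y_i\) is the ordinal endophenotype for participant \(i\), \(\alpha_1 < \dots < \alpha_{J-1}\) are cut points, \(\mathbf{x}_i\) holds the fixed-effect covariates (age at death, sex, cohort and genetic PCs), \(g_i\) is the additively coded variant dosage, \(\boldsymbol{\Psi}\) is the GRM and \(\tau\) is a variance component. Under this sketch, the single-variant test corresponds to the null hypothesis \(\gamma = 0\); the binary case is the special case \(J = 2\).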

Analysis of individual data sources proceeded in two stages. In stage one, GRM was constructed using a pruned set of independent variants, defined as having a pairwise r2 < 0.2 within moving windows of 15 kbp. Null models, which included fixed covariates and the GRM, were then fitted using the GWASTools 1.42.1 (ref. 97), SNPRelate 1.30.1 (ref. 98), POLMM 0.2.3 (refs. 93,94), GRAB 0.1.1 (refs. 93,94) and/or SAIGE 1.1.3 (ref. 95) R packages. In stage two, score tests were performed on each variant with a saddle-point approximation used to calculate P values. We considered all variants with P < 5 × 10−8 to be genome-wide significant. To identify independent risk loci, we clumped results using the ‘--clump’ flag in PLINK 1.9 with the pairwise LD threshold set to r2 ≤ 0.05 (https://www.cog-genomics.org/plink/1.9/)85. Following analyses of individual cohorts, we performed fixed-effects meta-analyses using METAL software using inverse-variance weighting on variants with MAF > 1% in at least one cohort33. Variants with a total minor allele count <20 after meta-analysis were then excluded.
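The inverse-variance-weighted fixed-effects combination described above can be illustrated for a single variant with a minimal Python sketch. This is not the METAL implementation; the function name and the per-cohort effect sizes and standard errors are hypothetical values chosen only to show the arithmetic.

```python
import math

GENOME_WIDE_P = 5e-8  # significance threshold stated in the text


def inverse_variance_meta(betas, ses):
    """Combine per-cohort effect estimates for one variant.

    betas: per-cohort effect sizes; ses: matching standard errors.
    Returns (pooled beta, pooled SE, z score, two-sided P value).
    """
    weights = [1.0 / se ** 2 for se in ses]          # w_k = 1 / SE_k^2
    total_weight = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    se = math.sqrt(1.0 / total_weight)
    z = beta / se
    # Two-sided P value under a standard normal reference distribution.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return beta, se, z, p


# Hypothetical estimates from three cohorts (illustrative numbers only).
beta, se, z, p = inverse_variance_meta([0.20, 0.25, 0.15], [0.05, 0.08, 0.10])
print(f"beta={beta:.4f} se={se:.4f} z={z:.2f} genome-wide={p < GENOME_WIDE_P}")
```

Cohorts with smaller standard errors receive larger weights, so the pooled estimate is pulled toward the most precisely estimated cohort.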

Conditional analysis of the APOE region

The region surrounding the APOE gene on chr19 is consistently the single strongest genetic risk factor for LOAD in GWAS. Three common forms of the APOE gene, \(\epsilon\)2, \(\epsilon\)3 and \(\epsilon\)4, are present in our study populations (see Table 1 for distribution of APOE alleles in participants), and the \(\epsilon\)2 and \(\epsilon\)4 alleles are associated with lower and higher risk of LOAD, respectively, relative to the \(\epsilon\)3 allele99. We therefore expected that variants in the APOE region, defined as the region within 200 kbp from the start and end transcription sites of APOE, would be associated with multiple NPEs in our study. Moreover, we hypothesized that genetic variants in the APOE region may influence neuropathology risk independently of the effects of APOE \(\epsilon\) alleles. To test this hypothesis, we re-analyzed variants in chr19 while adjusting for APOE \(\epsilon\) diplotype. We limited re-analysis to endophenotypes with at least one genome-wide significant association signal within the APOE locus in the final meta-analysis of the three independent GWAS. APOE diplotypes were determined either using the rs7412 and rs429358 variants according to the SNPedia online reference (ref. 100) or taken directly from study data if available. Both rs7412 and rs429358 variants had high imputation quality scores (r2 = 0.997 and 0.975, respectively). The \(\epsilon 3/\epsilon 3\) diplotype was used as a reference, and we included fixed-effect indicator variables to adjust for \(\epsilon 2/\epsilon 2\), \(\epsilon 2/\epsilon 3\), \(\epsilon 2/\epsilon 4\), \(\epsilon 3/\epsilon 4\) and \(\epsilon 4/\epsilon 4\) diplotypes. We chose this approach rather than adjusting for counts of \(\epsilon\)2 and \(\epsilon\)4 alleles because it is robust to potential nonlinear effects of genotypes. We performed additional sensitivity analyses for loci identified through this approach (Supplementary Information).
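As an illustration of how diplotypes can be derived from the two variants and turned into indicator covariates, the following Python sketch uses the standard allele definitions (\(\epsilon\)4 carries the C allele of rs429358; \(\epsilon\)2 carries the T allele of rs7412). The function names, the string genotype encoding and the `e2/e3`-style labels are assumptions made for this example, and the sketch is not the study's actual pipeline. It produces one indicator for every non-reference diplotype.

```python
# Map (count of rs429358-C, count of rs7412-T) to an APOE diplotype.
# Haplotypes (rs429358, rs7412): e2 = (T, T), e3 = (T, C), e4 = (C, C).
DIPLOTYPE_BY_COUNTS = {
    (0, 0): "e3/e3",
    (0, 1): "e2/e3",
    (0, 2): "e2/e2",
    (1, 0): "e3/e4",
    (2, 0): "e4/e4",
    (1, 1): "e2/e4",  # assumes the very rare e1 haplotype is absent
}


def apoe_diplotype(rs429358, rs7412):
    """Return the diplotype for unphased genotypes such as 'CT' and 'CC'.

    Returns None when the genotype pair is not consistent with e2/e3/e4.
    """
    n_e4 = rs429358.upper().count("C")
    n_e2 = rs7412.upper().count("T")
    return DIPLOTYPE_BY_COUNTS.get((n_e4, n_e2))


def diplotype_indicators(diplotype, reference="e3/e3"):
    """One 0/1 indicator per non-reference diplotype (reference is all zeros)."""
    levels = sorted(set(DIPLOTYPE_BY_COUNTS.values()) - {reference})
    return {level: int(diplotype == level) for level in levels}


print(apoe_diplotype("CT", "CC"))                        # e3/e4
print(diplotype_indicators(apoe_diplotype("CT", "CC")))
```

Coding each diplotype separately, instead of entering allele counts, lets each genotype class have its own effect, which is the robustness to nonlinear effects mentioned above.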

Replication of known AD risk loci in NPE

We used the 83 ADRD loci presented in a recent large GWAS to investigate whether AD-associated loci were associated with NPE6. We restricted our comparison to AD loci with lead variants with MAF ≥ 0.01, leaving 76–78 loci for comparison for each NPE. LD for variants near the top-known AD-associated variants was evaluated using the R package LDLinkR 1.2.3 (ref. 101). We controlled the false-discovery rate for each NPE using the Benjamini–Hochberg procedure102. Variants with an adjusted Q value ≤ 0.05 were considered significant.
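The Benjamini–Hochberg adjustment applied to the lead variants for each NPE can be sketched in a few lines of Python. The function name and the example P values are hypothetical and serve only to illustrate the step-up procedure.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted values (Q values) in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending P
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value to the smallest so that adjusted values
    # are monotone: q_(k) = min over j >= k of p_(j) * m / j.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        qvalues[idx] = running_min
    return qvalues


# Hypothetical P values for four known loci tested against one NPE.
example_p = [0.01, 0.04, 0.03, 0.005]
example_q = benjamini_hochberg(example_p)
significant = [q <= 0.05 for q in example_q]
print(example_q, significant)
```

Running the adjustment separately for each NPE, as described above, controls the false-discovery rate within each endophenotype rather than across all of them jointly.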

FUMA and FAVOR annotation, gene-prioritization and functional enrichment pipeline

We mapped variants to genes and performed subsequent gene and gene-set analyses using the FUMA and FAVOR pipelines37,103. Variants were mapped to genes if they had P ≤ 1 × 10−5 in the GWAS meta-analysis and were located within 10 kbp of a protein-coding gene’s transcription start or end sites. Gene-based analyses were performed using MAGMA 1.10. The top variant PCs that accounted for 99.9% of the variance in a gene’s region were used to test for significance using an F test. We considered genes with resulting P ≤ 2.5 × 10−6 to be significantly associated with NPE. Gene-set enrichment analyses were performed using MAGMA104 gene-set analysis of Gene Ontology and curated gene sets from MSigDB105. Bonferroni P value corrections were made for each NPE individually.

Colocalization analyses

We used multiple sources of publicly available summary statistics from external studies as data sources for Bayesian colocalization analyses. First, we downloaded Genotype-Tissue Expression Project (GTEx) v8 European ancestry QTL analysis summary statistics, which contains summary statistics for significant gene expression and splicing QTL variants (eQTL and sQTL, respectively) in 48 body tissues106. We also used gene expression and DNA mQTL analysis summary statistics from studies using tissue from the DLPFC of ROSMAP participants42. These studies examined the associations of genetic variants with molecular traits and provided curated lists of significant QTL variants. Finally, we downloaded the summary statistics from a recent GWAS of LOAD for a targeted post hoc colocalization analysis in TMEM106B and GRN6.

For each NPE outcome in our study, we first created a list of genetic variants with P ≤ 1 × 10−5 in the GWAS meta-analysis. We then queried the lists of significant QTL variants in GTEx and ROSMAP using R (https://cran.r-project.org/) and Python 3.8.16 and 3.10.8 (https://www.python.org/) to identify neuropathology-associated QTL variants. For each genetic locus associated with NPEs that had at least one significant QTL in either GTEx or ROSMAP, we performed colocalization analysis using the ‘coloc.abf’ function in the coloc 5.2.2 R package39. For ordinal variables, we chose dichotomizing cut points to determine case–control proportions. We used coloc’s default prior probability of colocalization (PrC) of 1 × 10−5 and considered a posterior PrC > 80% as a threshold for evidence of colocalization.
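To make the role of the prior and the 80% posterior threshold concrete, the sketch below shows how per-variant log approximate Bayes factors for two traits are combined into posterior probabilities for the five standard colocalization hypotheses (H0: no association; H1 and H2: association with one trait only; H3: two distinct causal variants; H4: one shared causal variant). It is a simplified Python re-expression of the combination step, assuming precomputed log Bayes factors and coloc's default priors (p1 = p2 = 1 × 10−4, p12 = 1 × 10−5). The function names and input values are hypothetical, and this is not the coloc package itself.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def logdiffexp(a, b):
    """log(exp(a) - exp(b)); requires a > b."""
    return a + math.log1p(-math.exp(b - a))


def coloc_posteriors(labf1, labf2, p1=1e-4, p2=1e-4, p12=1e-5):
    """Posterior probabilities [H0, H1, H2, H3, H4] for one locus.

    labf1, labf2: per-variant log approximate Bayes factors for the two
    traits, in the same variant order (at least two variants are needed).
    """
    both = [x + y for x, y in zip(labf1, labf2)]
    s1, s2, s12 = logsumexp(labf1), logsumexp(labf2), logsumexp(both)
    log_h = [
        0.0,                                                     # H0
        math.log(p1) + s1,                                       # H1
        math.log(p2) + s2,                                       # H2
        math.log(p1) + math.log(p2) + logdiffexp(s1 + s2, s12),  # H3
        math.log(p12) + s12,                                     # H4
    ]
    total = logsumexp(log_h)
    return [math.exp(h - total) for h in log_h]


# Hypothetical three-variant locus in which the same variant is the
# strongest signal for both traits.
posteriors = coloc_posteriors([10.0, 0.0, 0.0], [10.0, 0.0, 0.0])
print(f"posterior for H4 = {posteriors[4]:.3f}; colocalized = {posteriors[4] > 0.8}")
```

When the strongest signals for the two traits fall on different variants, probability mass shifts from H4 to H3 and the 80% threshold is not met.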

To investigate whether shared GWAS signals drive association among multiple NPEs, we also performed colocalization analysis on loci with variants satisfying P < 1 × 10−4 and concordant effect direction for at least two NPEs in the GWAS meta-analysis. Due to the absence of associations in the region in APOE-adjusted analyses for NPEs other than CAA, we excluded that region for NPE–NPE colocalization analyses.

Association analyses using DLPFC DNA methylation and bulk RNA-seq data from ROSMAP

ROSMAP participants had postmortem bulk-tissue samples collected from the DLPFC, which underwent DNA methylation quantification using the Illumina Infinium HumanMethylation450 BeadChip and RNA-seq using the Illumina HiSeq 2000 (ref. 92). In total, 708 ROSMAP participants had DNA methylation data available for analysis. We restricted analyses involving DNA methylation or RNA-seq data to NPE-associated loci that reached the genome-wide significance threshold in the meta-analysis and also colocalized with mQTL or eQTL in brain tissue in either GTEx or ROSMAP.

In our APOE \(\epsilon\)-adjusted genetic association analysis, one locus near APOE remained significantly associated with CAA. This locus colocalized with DNA methylation levels at four CpG sites in ROSMAP. To investigate whether these CpG sites were in turn associated with CAA pathology, we combined individual-level DNA methylation and neuropathological data in ROSMAP for analysis. We used cumulative logit models using the ‘clm’ function implemented in the R package ordinal 2023.13.12-04 (ref. 107) with the semi-quantitative CAA variable described above as the outcome for analysis. We performed four analyses, with one of each of the four CpG sites tested as the independent variable of interest in each analysis. We adjusted for age, sex, ROS versus MAP study, bisulfite-conversion efficiency, postmortem interval and APOE \(\epsilon\) diplotype in each analysis. Similar models were used to test associations between hippocampal sclerosis and LATE-NC and methylation levels at CpG sites cg09613507 and cg23422036. Wald tests were performed on the resulting parameter estimates to test for statistical significance. We also performed post hoc analyses examining the association between these CpG sites and APOC2 expression in ROSMAP.

For genes with significant eQTL in GTEx or ROSMAP that colocalized with NPE, we performed additional targeted analyses to assess the association between gene expression and NPE. We first assessed the association between NPE lead variants and gene expression in ROSMAP to confirm nominal eQTL status. We then fitted generalized linear regression models of NPE outcomes on square-root or log-transformed mRNA expression, adjusting for age at death, sex, postmortem interval and RNA integrity number.

Plots from these analyses were generated using the R package ggplot2 (ref. 108).

Replication of CAA locus using Mayo Clinic neuropathology GWAS

We used data from Mayo Clinic Brain Bank participants, available from a previous study of the genetic risk factors of CAA (ref. 26; dataset hereafter referred to as MC-CAA), to attempt to replicate the new CAA locus from the present study in an independent sample. Neuropathology and genetic variant data were downloaded from Synapse (Synapse IDs: syn10930250, syn21499318, syn21522653 and syn21547862). Eight participants were identified as duplicates between batches or with NACC participants and removed. While CAA is graded on a four-level ordinal scale in the present study, CAA in MC-CAA is graded as an average of CAA burden across five brain regions26. We therefore used linear regression with sqrt(CAA) as the outcome variable and the number of G alleles of variant rs7247551 as the independent variable of interest. Covariates included APOE diplotype (\(\epsilon 3/\epsilon 3\), \(\epsilon 2/\epsilon 4\), \(\epsilon 2/\epsilon 3\), \(\epsilon 3/\epsilon 4\) or \(\epsilon 4/\epsilon 4\)), sex, age at death (truncated at 90 years) and the first three genetic PCs.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.