Abstract
Almost every recent Alzheimer’s disease (AD) genome-wide association study (GWAS) has performed meta-analysis to combine studies with clinical diagnosis of AD with studies that use proxy phenotypes based on parental disease history. Here, we report major limitations in current GWAS-by-proxy (GWAX) practices due to uncorrected survival bias and nonrandom participation in parental illness surveys, which cause substantial discrepancies between AD GWAS and GWAX results. We demonstrate that the current AD GWAX provide highly misleading genetic correlations between AD risk and higher education, which subsequently affects a variety of genetic epidemiological applications involving AD and cognition. Our study sheds light on potential issues in the design and analysis of middle-aged biobank cohorts and underscores the need for caution when interpreting genetic association results based on proxy-reported parental disease history.
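The following is a minimal sketch of how a GWAX phenotype can be coded from parental history, in the spirit of Liu et al. (ref. 3): a participant counts as a proxy case when at least one parent is reported to have AD. The `Participant` record and its fields are hypothetical. The published approaches (for example, those following Marioni et al. or Jansen et al.) differ in their coding details, and this sketch does not reproduce either of them.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Participant:
    """Hypothetical parental-history survey record (field names are illustrative).

    True  -> parent reported to have AD
    False -> parent reported not to have AD
    None  -> question skipped, "do not know", or survey not taken
    """
    father_ad: Optional[bool]
    mother_ad: Optional[bool]


def proxy_status(p: Participant) -> Optional[int]:
    """Return 1 (proxy case), 0 (proxy control) or None (excluded from the GWAX)."""
    reports = (p.father_ad, p.mother_ad)
    if any(r is True for r in reports):
        return 1  # at least one parent reported with AD
    if all(r is False for r in reports):
        # Both parents reported unaffected. A parent may simply have died
        # before the typical age of onset; nothing here adjusts for that
        # (survival bias).
        return 0
    # Any remaining missing answer excludes the participant. Who skips these
    # questions is not random, which is one way participation bias can enter.
    return None


if __name__ == "__main__":
    cohort = [
        Participant(father_ad=True, mother_ad=None),
        Participant(father_ad=False, mother_ad=False),
        Participant(father_ad=False, mother_ad=None),
    ]
    print([proxy_status(p) for p in cohort])  # [1, 0, None]
```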
Data availability
Summary statistics for the AD GWAX are freely available at http://qlu-lab.org/data.html and the GWAS Catalog (parental AD status: https://ftp.ebi.ac.uk/pub/databases/gwas/summary_statistics/GCST90448001-GCST90449000/GCST90448951/; parental AD status following the approach of Jansen et al. (ref. 6): https://ftp.ebi.ac.uk/pub/databases/gwas/summary_statistics/GCST90448001-GCST90449000/GCST90448949/; parental AD status following the approach of Marioni et al. (ref. 5): https://ftp.ebi.ac.uk/pub/databases/gwas/summary_statistics/GCST90448001-GCST90449000/GCST90448950/; parental health awareness in UKB: https://ftp.ebi.ac.uk/pub/databases/gwas/summary_statistics/GCST90448001-GCST90449000/GCST90448952/; parental health awareness in AllofUs: https://ftp.ebi.ac.uk/pub/databases/gwas/summary_statistics/GCST90448001-GCST90449000/GCST90448947/; participation in personal and family medical history survey in AllofUs: https://ftp.ebi.ac.uk/pub/databases/gwas/summary_statistics/GCST90448001-GCST90449000/GCST90448948/). The HRS genetic data were accessed through NIAGADS with accession number NG00119.v1. UKB individual-level data used in the present work were obtained under application no. 42148.
Code availability
The code used in this study is available from the following websites: GSUB, on GitHub (https://github.com/qlu-lab/GSUB) or Zenodo (ref. 50; https://doi.org/10.5281/zenodo.13845422); PLINK 1.9 (https://www.cog-genomics.org/plink2/) and PLINK 2.0 (https://www.cog-genomics.org/plink/2.0/); Regenie (https://github.com/rgcgithub/regenie); Hail (https://hail.is/docs/0.2/index.html); GNOVA (https://github.com/qlu-lab/GNOVA-2.0); LDSC (https://github.com/bulik/ldsc); METAL (https://github.com/statgen/METAL); PRS-CS (https://github.com/getian107/PRScs). We also used the following R packages: GenomicSEM v0.0.4, tidyverse v2.0.0, data.table v1.14.8, mcr v1.3.3, WinCurse v0.0.1, lme4 v1.1.35.3, TwoSampleMR v0.5.6, sandwich v3.0.2 and glmnet v4.1.6.
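The list above includes METAL (Willer et al., ref. 33) for meta-analysis. The Python below is a hedged, minimal sketch of fixed-effect, inverse-variance-weighted pooling of one SNP's estimates across studies, the kind of scheme such tools implement. It is not the authors' code or METAL's source. The `rescale_gwax` helper and its default factor of 2 are hypothetical. They illustrate the rough halving of proxy effect sizes when one parent is affected (Liu et al., ref. 3), not the procedure used by any particular study. All numbers in the usage block are made up.

```python
import math
from typing import Sequence, Tuple


def ivw_meta(betas: Sequence[float], ses: Sequence[float]) -> Tuple[float, float, float]:
    """Fixed-effect inverse-variance-weighted meta-analysis for one SNP.

    betas: per-study effect estimates (e.g. log odds ratios)
    ses:   matching standard errors
    Returns (pooled beta, pooled standard error, z-score).
    """
    if len(betas) != len(ses) or not betas:
        raise ValueError("need one standard error per estimate")
    weights = [1.0 / se ** 2 for se in ses]  # w_i = 1 / SE_i^2
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    se = math.sqrt(1.0 / total)
    return beta, se, beta / se


def rescale_gwax(beta: float, se: float, factor: float = 2.0) -> Tuple[float, float]:
    """Hypothetical rescaling of a GWAX estimate before meta-analysis.

    Scaling estimate and SE by the same factor leaves the GWAX z-score
    unchanged; it changes the scale of the GWAX estimate and its weight in
    the pooled result. It does nothing about survival or participation bias.
    """
    return beta * factor, se * factor


if __name__ == "__main__":
    gwas = (0.10, 0.02)              # made-up clinical case-control estimate
    gwax = rescale_gwax(0.04, 0.01)  # made-up proxy estimate -> (0.08, 0.02)
    beta, se, z = ivw_meta([gwas[0], gwax[0]], [gwas[1], gwax[1]])
    print(f"beta={beta:.3f} se={se:.4f} z={z:.2f}")  # beta=0.090 se=0.0141 z=6.36
```

Because the rescaling leaves the GWAX z-score untouched, any bias in the proxy association carries straight into the pooled estimate; rescaling only changes how the two sources are weighed against each other.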
References
1. Abdellaoui, A., Yengo, L., Verweij, K. J. & Visscher, P. M. 15 years of GWAS discovery: realizing the promise. Am. J. Hum. Genet. 110, 179–194 (2023).
2. Bycroft, C. et al. The UK Biobank resource with deep phenotyping and genomic data. Nature 562, 203–209 (2018).
3. Liu, J. Z., Erlich, Y. & Pickrell, J. K. Case–control association mapping by proxy using family history of disease. Nat. Genet. 49, 325–331 (2017).
4. McKhann, G. et al. Clinical diagnosis of Alzheimer’s disease: report of the NINCDS-ADRDA Work Group under the auspices of Department of Health and Human Services Task Force on Alzheimer’s Disease. Neurology 34, 939 (1984).
5. Marioni, R. E. et al. GWAS on family history of Alzheimer’s disease. Transl. Psychiatry 8, 99 (2018).
6. Jansen, I. E. et al. Genome-wide meta-analysis identifies new loci and functional pathways influencing Alzheimer’s disease risk. Nat. Genet. 51, 404–413 (2019).
7. Schwartzentruber, J. et al. Genome-wide meta-analysis, fine-mapping and integrative prioritization implicate new Alzheimer’s disease risk genes. Nat. Genet. 53, 392–402 (2021).
8. Wightman, D. P. et al. A genome-wide association study with 1,126,563 individuals identifies new risk loci for Alzheimer’s disease. Nat. Genet. 53, 1276–1282 (2021).
9. Bellenguez, C. et al. New insights into the genetic etiology of Alzheimer’s disease and related dementias. Nat. Genet. 54, 412–436 (2022).
10. Sherva, R. et al. African ancestry GWAS of dementia in a large military cohort identifies significant risk loci. Mol. Psychiatry 28, 1293–1302 (2023).
11. Escott-Price, V. & Hardy, J. Genome-wide association studies for Alzheimer’s disease: bigger is not always better. Brain Commun. 4, fcac125 (2022).
12. Grotzinger, A. D., Fuente, J., Privé, F., Nivard, M. G. & Tucker-Drob, E. M. Pervasive downward bias in estimates of liability-scale heritability in genome-wide association study meta-analysis: a simple solution. Biol. Psychiatry 93, 29–36 (2023).
13. Kunkle, B. W. et al. Genetic meta-analysis of diagnosed Alzheimer’s disease identifies new risk loci and implicates Aβ, tau, immunity and lipid processing. Nat. Genet. 51, 414–430 (2019).
14. Demange, P. A. et al. Investigating the genetic architecture of noncognitive skills using GWAS-by-subtraction. Nat. Genet. 53, 35–44 (2021).
15. Rietveld, C. A. et al. Common genetic variants associated with cognitive performance identified using the proxy-phenotype method. Proc. Natl Acad. Sci. USA 111, 13790–13794 (2014).
16. Larsson, S. C. et al. Modifiable pathways in Alzheimer’s disease: Mendelian randomisation analysis. BMJ 359, j5375 (2017).
17. Andrews, S. J. et al. Causal associations between modifiable risk factors and the Alzheimer’s phenome. Ann. Neurol. 89, 54–65 (2021).
18. Lambert, J.-C. et al. Meta-analysis of 74,046 individuals identifies 11 new susceptibility loci for Alzheimer’s disease. Nat. Genet. 45, 1452–1458 (2013).
19. Hujoel, M. L. A., Gazal, S., Loh, P.-R., Patterson, N. & Price, A. L. Liability threshold modeling of case–control status and family history of disease increases association power. Nat. Genet. 52, 541–547 (2020).
20. de la Fuente, J., Grotzinger, A. D., Marioni, R. E., Nivard, M. G. & Tucker-Drob, E. M. Integrated analysis of direct and proxy genome wide association studies highlights polygenicity of Alzheimer’s disease outside of the APOE region. PLoS Genet. 18, e1010208 (2022).
21. Liu, H. et al. Mendelian randomization highlights significant difference and genetic heterogeneity in clinically diagnosed Alzheimer’s disease GWAS and self-report proxy phenotype GWAX. Alzheimer’s Res. Ther. 14, 17 (2022).
22. European Alzheimer’s & Dementia Biobank Mendelian Randomization (EADB-MR) Collaboration. Genetic associations between modifiable risk factors and Alzheimer disease. JAMA Netw. Open 6, e2313734 (2023).
23. Thorp, J. G. et al. Genetic evidence that the causal association of educational attainment with reduced risk of Alzheimer’s disease is driven by intelligence. Neurobiol. Aging 119, 127–135 (2022).
24. Chen, Y. et al. Genomic atlas of the plasma metabolome prioritizes metabolites implicated in human diseases. Nat. Genet. 55, 44–53 (2023).
25. Grotzinger, A. D. et al. Genomic structural equation modelling provides insights into the multivariate genetic architecture of complex traits. Nat. Hum. Behav. 3, 513–525 (2019).
26. Wu, Y. et al. Estimating genetic nurture with summary statistics of multigenerational genome-wide association studies. Proc. Natl Acad. Sci. USA 118, e2023184118 (2021).
27. Nalls, M. A. et al. Identification of novel risk loci, causal insights, and heritable risk for Parkinson’s disease: a meta-analysis of genome-wide association studies. Lancet Neurol. 18, 1091–1102 (2019).
28. van Rheenen, W. et al. Genome-wide association analyses identify new risk variants and the genetic architecture of amyotrophic lateral sclerosis. Nat. Genet. 48, 1043–1048 (2016).
29. Ferrari, R. et al. Frontotemporal dementia and its subtypes: a genome-wide association study. Lancet Neurol. 13, 686–699 (2014).
30. Chia, R. et al. Genome sequencing analysis identifies new loci associated with Lewy body dementia and provides insights into its genetic architecture. Nat. Genet. 53, 294–303 (2021).
31. Mignogna, G. et al. Patterns of item nonresponse behaviour to survey questionnaires are systematic and associated with genetic loci. Nat. Hum. Behav. 7, 1371–1387 (2023).
32. Schoeler, T. et al. Participation bias in the UK Biobank distorts genetic associations and downstream analyses. Nat. Hum. Behav. 7, 1216–1227 (2023).
33. Willer, C. J., Li, Y. & Abecasis, G. R. METAL: fast and efficient meta-analysis of genomewide association scans. Bioinformatics 26, 2190–2191 (2010).
34. Phelan, J. C. & Link, B. G. Fundamental cause theory. In Medical Sociology on the Move 105–125 (Springer, 2013).
35. Pedersen, E. M. et al. Accounting for age of onset and family history improves power in genome-wide association studies. Am. J. Hum. Genet. 109, 417–432 (2022).
36. Tublin, J. M., Adelstein, J. M., del Monte, F., Combs, C. K. & Wold, L. E. Getting to the heart of Alzheimer disease. Circ. Res. 124, 142–149 (2019).
37. Stakos, D. A. et al. The Alzheimer’s disease amyloid-β hypothesis in cardiovascular aging and disease: JACC Focus Seminar. J. Am. Coll. Cardiol. 75, 952–967 (2020).
38. Kurki, M. I. et al. FinnGen provides genetic insights from a well-phenotyped isolated population. Nature 613, 508–518 (2023).
39. Fry, A. et al. Comparison of sociodemographic and health-related characteristics of UK Biobank participants with those of the general population. Am. J. Epidemiol. 186, 1026–1034 (2017).
40. Pirastu, N. et al. Genetic analyses identify widespread sex-differential participation bias. Nat. Genet. 53, 663–671 (2021).
41. Tyrrell, J. et al. Genetic predictors of participation in optional components of UK Biobank. Nat. Commun. 12, 886 (2021).
42. Mbatchou, J. et al. Computationally efficient whole-genome regression for quantitative and binary traits. Nat. Genet. 53, 1097–1103 (2021).
43. The All of Us Program Investigators. The “All of Us” Research Program. N. Engl. J. Med. 381, 668–676 (2019).
44. The All of Us Research Program Genomics Investigators. Genomic data in the All of Us Research Program. Nature 627, 340–346 (2024).
45. Turley, P. et al. Multi-trait analysis of genome-wide association summary statistics using MTAG. Nat. Genet. 50, 229–237 (2018).
46. Lu, Q. et al. A powerful approach to estimating annotation-stratified genetic covariance via GWAS summary statistics. Am. J. Hum. Genet. 101, 939–964 (2017).
47. Bulik-Sullivan, B. et al. An atlas of genetic correlations across human diseases and traits. Nat. Genet. 47, 1236–1241 (2015).
48. Ge, T., Chen, C.-Y., Ni, Y., Feng, Y.-C. A. & Smoller, J. W. Polygenic prediction via Bayesian regression and continuous shrinkage priors. Nat. Commun. 10, 1776 (2019).
49. Hemani, G. et al. The MR-Base platform supports systematic causal inference across the human phenome. eLife 7, e34408 (2018).
50. qlu-lab. qlu-lab/GSUB. Zenodo https://doi.org/10.5281/zenodo.13845422 (2024).
Acknowledgements
The authors gratefully acknowledge research support from National Institute on Aging (NIA) grant R21 AG085162 (Q.L.), NIA Center grant P30 AG017266 (Q.L.) and the University of Wisconsin–Madison Office of the Chancellor and the Vice Chancellor for Research and Graduate Education with funding from the Wisconsin Alumni Research Foundation (WARF; Q.L.). The All of Us Research Program is supported by the National Institutes of Health (NIH), Office of the Director, Regional Medical Centers: 1 OT2 OD026549, 1 OT2 OD026554, 1 OT2 OD026557, 1 OT2 OD026556, 1 OT2 OD026550, 1 OT2 OD026552, 1 OT2 OD026553, 1 OT2 OD026548, 1 OT2 OD026551, 1 OT2 OD026555; IAA AOD 16037; Federally Qualified Health Centers: HHSN 263201600085U; Data and Research Center: 5 U2C OD023196; Biobank: 1 U24 OD023121; The Participant Center: U24 OD023176; Participant Technology Systems Center: 1 U24 OD023163; Communications and Engagement: 3 OT2 OD023205; 3 OT2 OD023206; and Community Partners: 1 OT2 OD025277; 3 OT2 OD025315; 1 OT2 OD025337 and 1 OT2 OD025276. In addition, the All of Us Research Program would not be possible without the partnership of its participants. The Health and Retirement Study (HRS) genetic data were accessed through NIAGADS with accession no. NG00119.v1. These data were collected with financial support from NIH Director’s Opportunity for Research awards using American Recovery and Reinvestment Act funds (RC2 AG036495-01, RC4 AG039029-01). With these funds, the HRS has genotyped almost 20,000 respondents who provided DNA samples and signed consent forms in 2006–2012. The HRS data were produced and distributed by the University of Michigan under the directorship of D. R. Weir, with funding from the NIA (NIA U01AG009470). This research was conducted using the UKB Resource under application no. 42148. We acknowledge the participants and investigators of the FinnGen study.
We thank the members of the Social Genomics Working Group at the University of Wisconsin–Madison for helpful comments.
Author information
Contributions
Q.L. conceived and designed the study. Y.W. and Z.S. performed the analyses. Y.W. and Q.L. wrote the manuscript. S.D. implemented the software package for GWAS-by-subtraction. Q.Z. performed the GWAS on parental illness awareness in UKB. J.M. assisted with the mathematical derivations for GWAS-by-subtraction. S.M. advised on the genetics of AD and cognition. J.M.F. advised on the HRS cohort and social science issues. All authors revised and approved the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Genetics thanks Michelle Lupton and Bjarni Vilhjálmsson for their contribution to the peer review of this work. Peer reviewer reports are available.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
Supplementary Information
Supplementary Figs. 1–18.
Supplementary Tables
Supplementary Tables 1–21.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Wu, Y., Sun, Z., Zheng, Q. et al. Pervasive biases in proxy genome-wide association studies based on parental history of Alzheimer’s disease. Nat Genet 56, 2696–2703 (2024). https://doi.org/10.1038/s41588-024-01963-9
DOI: https://doi.org/10.1038/s41588-024-01963-9