Pervasive biases in proxy genome-wide association studies based on parental history of Alzheimer’s disease

Abstract

Almost every recent Alzheimer’s disease (AD) genome-wide association study (GWAS) has used meta-analysis to combine studies based on clinical diagnosis of AD with studies that use proxy phenotypes based on parental disease history. Here, we report major limitations in current GWAS-by-proxy (GWAX) practices due to uncorrected survival bias and nonrandom participation in parental illness surveys, which cause substantial discrepancies between AD GWAS and GWAX results. We demonstrate that current AD GWAX provide highly misleading estimates of the genetic correlation between AD risk and higher education, which in turn affect a variety of genetic epidemiological applications involving AD and cognition. Our study sheds light on potential issues in the design and analysis of middle-aged biobank cohorts and underscores the need for caution when interpreting genetic association results based on proxy-reported parental disease history.
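To make the meta-analysis step mentioned in the abstract concrete, the following is a minimal sketch of a fixed-effect, inverse-variance-weighted combination of two per-variant effect estimates, one from a clinical GWAS and one from a GWAX. It is not the authors' pipeline: the function name `ivw_meta` and all numbers are hypothetical, chosen only to show how a precise but biased proxy estimate can dominate the pooled result.

```python
import math


def ivw_meta(betas, ses):
    """Fixed-effect inverse-variance-weighted meta-analysis for one variant.

    betas: per-study effect estimates (e.g. log odds ratios)
    ses:   per-study standard errors
    Returns the pooled effect and its standard error.
    """
    weights = [1.0 / se ** 2 for se in ses]
    total = sum(weights)
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / total
    pooled_se = math.sqrt(1.0 / total)
    return pooled_beta, pooled_se


# Hypothetical inputs (not taken from the paper): a clinical GWAS with a
# positive effect and a larger, more precise GWAX with a small opposite effect.
gwas_beta, gwas_se = 0.10, 0.02
gwax_beta, gwax_se = -0.02, 0.01

pooled_beta, pooled_se = ivw_meta([gwas_beta, gwax_beta], [gwas_se, gwax_se])
print(f"pooled beta = {pooled_beta:.4f}, pooled SE = {pooled_se:.4f}")
```

Because the weights are proportional to 1/SE², the larger proxy study carries four times the weight of the clinical study in this toy case, so any systematic bias in the proxy estimate carries over almost directly into the pooled estimate.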

Fig. 1: Comparing top association findings and genetic correlation results of AD GWAS and GWAX.
Fig. 2: AD GWAX biases risk prediction and causal inference.
Fig. 3: Schematic diagram for GWAS-by-subtraction.
Fig. 4: Genetic correlation of the AD and non-AD factors in GWAX with other complex traits.
Fig. 5: Genetic correlation of AD GWAS and GWAX with EA and coronary artery disease.
Fig. 6: Genetic correlation of meta-analyzed AD with EA and coronary artery disease.

Data availability

Summary statistics for the AD GWAX are freely available at http://qlu-lab.org/data.html and the GWAS Catalog (parental AD status: https://ftp.ebi.ac.uk/pub/databases/gwas/summary_statistics/GCST90448001-GCST90449000/GCST90448951/; parental AD status following the approach of Jansen et al. (ref. 6): https://ftp.ebi.ac.uk/pub/databases/gwas/summary_statistics/GCST90448001-GCST90449000/GCST90448949/; parental AD status following the approach of Marioni et al. (ref. 5): https://ftp.ebi.ac.uk/pub/databases/gwas/summary_statistics/GCST90448001-GCST90449000/GCST90448950/; parental health awareness in UKB: https://ftp.ebi.ac.uk/pub/databases/gwas/summary_statistics/GCST90448001-GCST90449000/GCST90448952/; parental health awareness in AllofUs: https://ftp.ebi.ac.uk/pub/databases/gwas/summary_statistics/GCST90448001-GCST90449000/GCST90448947/; participation in personal and family medical history survey in AllofUs: https://ftp.ebi.ac.uk/pub/databases/gwas/summary_statistics/GCST90448001-GCST90449000/GCST90448948/). The HRS genetic data were accessed through NIAGADS with accession number NG00119.v1. UKB individual-level data used in the present work were obtained under application no. 42148.

Code availability

The code used in this study is available from the following websites: GSUB, GitHub (https://github.com/qlu-lab/GSUB) or Zenodo (ref. 50; https://doi.org/10.5281/zenodo.13845422); PLINK 1.9 (https://www.cog-genomics.org/plink2/) and PLINK 2.0 (https://www.cog-genomics.org/plink/2.0/); Regenie, https://github.com/rgcgithub/regenie; Hail, https://hail.is/docs/0.2/index.html; GNOVA, https://github.com/qlu-lab/GNOVA-2.0; LDSC, https://github.com/bulik/ldsc; METAL, https://github.com/statgen/METAL; PRS-CS, https://github.com/getian107/PRScs. We also used the following R packages: GenomicSEM v0.0.4, tidyverse v2.0.0, data.table v1.14.8, mcr v1.3.3, WinCurse v0.0.1, lme4 v1.1.35.3, TwoSampleMR v0.5.6, sandwich v3.0.2 and glmnet v4.1.6.

References

  1. Abdellaoui, A., Yengo, L., Verweij, K. J. & Visscher, P. M. 15 years of GWAS discovery: realizing the promise. Am. J. Hum. Genet. 110, 179–194 (2023).

  2. Bycroft, C. et al. The UK Biobank resource with deep phenotyping and genomic data. Nature 562, 203–209 (2018).

  3. Liu, J. Z., Erlich, Y. & Pickrell, J. K. Case–control association mapping by proxy using family history of disease. Nat. Genet. 49, 325–331 (2017).

  4. McKhann, G. et al. Clinical diagnosis of Alzheimer’s disease: report of the NINCDS-ADRDA Work Group under the auspices of Department of Health and Human Services Task Force on Alzheimer’s Disease. Neurology 34, 939 (1984).

  5. Marioni, R. E. et al. GWAS on family history of Alzheimer’s disease. Transl. Psychiatry 8, 99 (2018).

  6. Jansen, I. E. et al. Genome-wide meta-analysis identifies new loci and functional pathways influencing Alzheimer’s disease risk. Nat. Genet. 51, 404–413 (2019).

  7. Schwartzentruber, J. et al. Genome-wide meta-analysis, fine-mapping and integrative prioritization implicate new Alzheimer’s disease risk genes. Nat. Genet. 53, 392–402 (2021).

  8. Wightman, D. P. et al. A genome-wide association study with 1,126,563 individuals identifies new risk loci for Alzheimer’s disease. Nat. Genet. 53, 1276–1282 (2021).

  9. Bellenguez, C. et al. New insights into the genetic etiology of Alzheimer’s disease and related dementias. Nat. Genet. 54, 412–436 (2022).

  10. Sherva, R. et al. African ancestry GWAS of dementia in a large military cohort identifies significant risk loci. Mol. Psychiatry 28, 1293–1302 (2023).

  11. Escott-Price, V. & Hardy, J. Genome-wide association studies for Alzheimer’s disease: bigger is not always better. Brain Commun. 4, fcac125 (2022).

  12. Grotzinger, A. D., Fuente, J., Privé, F., Nivard, M. G. & Tucker-Drob, E. M. Pervasive downward bias in estimates of liability-scale heritability in genome-wide association study meta-analysis: a simple solution. Biol. Psychiatry 93, 29–36 (2023).

  13. Kunkle, B. W. et al. Genetic meta-analysis of diagnosed Alzheimer’s disease identifies new risk loci and implicates Aβ, tau, immunity and lipid processing. Nat. Genet. 51, 414–430 (2019).

  14. Demange, P. A. et al. Investigating the genetic architecture of noncognitive skills using GWAS-by-subtraction. Nat. Genet. 53, 35–44 (2021).

  15. Rietveld, C. A. et al. Common genetic variants associated with cognitive performance identified using the proxy-phenotype method. Proc. Natl Acad. Sci. USA 111, 13790–13794 (2014).

  16. Larsson, S. C. et al. Modifiable pathways in Alzheimer’s disease: Mendelian randomisation analysis. BMJ 359, j5375 (2017).

  17. Andrews, S. J. et al. Causal associations between modifiable risk factors and the Alzheimer’s phenome. Ann. Neurol. 89, 54–65 (2021).

  18. Lambert, J.-C. et al. Meta-analysis of 74,046 individuals identifies 11 new susceptibility loci for Alzheimer’s disease. Nat. Genet. 45, 1452–1458 (2013).

  19. Hujoel, M. L. A., Gazal, S., Loh, P.-R., Patterson, N. & Price, A. L. Liability threshold modeling of case–control status and family history of disease increases association power. Nat. Genet. 52, 541–547 (2020).

  20. de la Fuente, J., Grotzinger, A. D., Marioni, R. E., Nivard, M. G. & Tucker-Drob, E. M. Integrated analysis of direct and proxy genome wide association studies highlights polygenicity of Alzheimer’s disease outside of the APOE region. PLoS Genet. 18, e1010208 (2022).

  21. Liu, H. et al. Mendelian randomization highlights significant difference and genetic heterogeneity in clinically diagnosed Alzheimer’s disease GWAS and self-report proxy phenotype GWAX. Alzheimer’s Res. Ther. 14, 17 (2022).

  22. European Alzheimer’s & Dementia Biobank Mendelian Randomization (EADB-MR) Collaboration. Genetic associations between modifiable risk factors and Alzheimer disease. JAMA Netw. Open 6, e2313734 (2023).

  23. Thorp, J. G. et al. Genetic evidence that the causal association of educational attainment with reduced risk of Alzheimer’s disease is driven by intelligence. Neurobiol. Aging 119, 127–135 (2022).

  24. Chen, Y. et al. Genomic atlas of the plasma metabolome prioritizes metabolites implicated in human diseases. Nat. Genet. 55, 44–53 (2023).

  25. Grotzinger, A. D. et al. Genomic structural equation modelling provides insights into the multivariate genetic architecture of complex traits. Nat. Hum. Behav. 3, 513–525 (2019).

  26. Wu, Y. et al. Estimating genetic nurture with summary statistics of multigenerational genome-wide association studies. Proc. Natl Acad. Sci. USA 118, e2023184118 (2021).

  27. Nalls, M. A. et al. Identification of novel risk loci, causal insights, and heritable risk for Parkinson’s disease: a meta-analysis of genome-wide association studies. Lancet Neurol. 18, 1091–1102 (2019).

  28. van Rheenen, W. et al. Genome-wide association analyses identify new risk variants and the genetic architecture of amyotrophic lateral sclerosis. Nat. Genet. 48, 1043–1048 (2016).

  29. Ferrari, R. et al. Frontotemporal dementia and its subtypes: a genome-wide association study. Lancet Neurol. 13, 686–699 (2014).

  30. Chia, R. et al. Genome sequencing analysis identifies new loci associated with Lewy body dementia and provides insights into its genetic architecture. Nat. Genet. 53, 294–303 (2021).

  31. Mignogna, G. et al. Patterns of item nonresponse behaviour to survey questionnaires are systematic and associated with genetic loci. Nat. Hum. Behav. 7, 1371–1387 (2023).

  32. Schoeler, T. et al. Participation bias in the UK Biobank distorts genetic associations and downstream analyses. Nat. Hum. Behav. 7, 1216–1227 (2023).

  33. Willer, C. J., Li, Y. & Abecasis, G. R. METAL: fast and efficient meta-analysis of genomewide association scans. Bioinformatics 26, 2190–2191 (2010).

  34. Phelan, J. C. & Link, B. G. Fundamental cause theory. in Medical Sociology on the Move. 105−125 (Springer, 2013).

  35. Pedersen, E. M. et al. Accounting for age of onset and family history improves power in genome-wide association studies. Am. J. Hum. Genet. 109, 417–432 (2022).

  36. Tublin, J. M., Adelstein, J. M., del Monte, F., Combs, C. K. & Wold, L. E. Getting to the heart of Alzheimer disease. Circ. Res. 124, 142–149 (2019).

  37. Stakos, D. A. et al. The Alzheimer’s disease amyloid-β hypothesis in cardiovascular aging and disease: JACC Focus Seminar. J. Am. Coll. Cardiol. 75, 952–967 (2020).

  38. Kurki, M. I. et al. FinnGen provides genetic insights from a well-phenotyped isolated population. Nature 613, 508–518 (2023).

  39. Fry, A. et al. Comparison of sociodemographic and health-related characteristics of UK Biobank participants with those of the general population. Am. J. Epidemiol. 186, 1026–1034 (2017).

  40. Pirastu, N. et al. Genetic analyses identify widespread sex-differential participation bias. Nat. Genet. 53, 663–671 (2021).

  41. Tyrrell, J. et al. Genetic predictors of participation in optional components of UK Biobank. Nat. Commun. 12, 886 (2021).

  42. Mbatchou, J. et al. Computationally efficient whole-genome regression for quantitative and binary traits. Nat. Genet. 53, 1097–1103 (2021).

  43. The All of Us Program Investigators. The “All of Us” Research Program. N. Engl. J. Med. 381, 668–676 (2019).

  44. The All of Us Research Program Genomics Investigators. Genomic data in the All of Us Research Program. Nature 627, 340–346 (2024).

  45. Turley, P. et al. Multi-trait analysis of genome-wide association summary statistics using MTAG. Nat. Genet. 50, 229–237 (2018).

  46. Lu, Q. et al. A powerful approach to estimating annotation-stratified genetic covariance via GWAS summary statistics. Am. J. Hum. Genet. 101, 939–964 (2017).

  47. Bulik-Sullivan, B. et al. An atlas of genetic correlations across human diseases and traits. Nat. Genet. 47, 1236–1241 (2015).

  48. Ge, T., Chen, C.-Y., Ni, Y., Feng, Y.-C. A. & Smoller, J. W. Polygenic prediction via Bayesian regression and continuous shrinkage priors. Nat. Commun. 10, 1776 (2019).

  49. Hemani, G. et al. The MR-Base platform supports systematic causal inference across the human phenome. eLife 7, e34408 (2018).

  50. qlu-lab. qlu-lab/GSUB. Zenodo https://doi.org/10.5281/zenodo.13845422 (2024).

Acknowledgements

The authors gratefully acknowledge research support from National Institute on Aging (NIA) grant R21 AG085162 (Q.L.), NIA Center grant P30 AG017266 (Q.L.) and the University of Wisconsin−Madison Office of the Chancellor and the Vice Chancellor for Research and Graduate Education with funding from the Wisconsin Alumni Research Foundation (WARF; Q.L.). The All of Us Research Program is supported by the National Institutes of Health (NIH), Office of the Director, Regional Medical Centers: 1 OT2 OD026549, 1 OT2 OD026554, 1 OT2 OD026557, 1 OT2 OD026556, 1 OT2 OD026550, 1 OT2 OD 026552, 1 OT2 OD026553, 1 OT2 OD026548, 1 OT2 OD026551, 1 OT2 OD026555; IAA AOD 16037; Federally Qualified Health Centers: HHSN 263201600085U; Data and Research Center: 5 U2C OD023196; Biobank: 1 U24 OD023121; The Participant Center: U24 OD023176; Participant Technology Systems Center: 1 U24 OD023163; Communications and Engagement: 3 OT2 OD023205; 3 OT2 OD023206; and Community Partners: 1 OT2 OD025277; 3 OT2 OD025315; 1 OT2 OD025337 and 1 OT2 OD025276. In addition, the All of Us Research Program would not be possible without the partnership of its participants. The Health and Retirement Study (HRS) genetic data were accessed through NIAGADS with accession no. NG00119.v1. These data were collected with financial support from NIH Director’s Opportunity for Research awards using American Recovery and Reinvestment Act funds (RC2 AG036495-01, RC4 AG039029-01). With these funds, the HRS has genotyped almost 20,000 respondents who provided DNA samples and signed consent forms in 2006–2012. The HRS data were produced and distributed by the University of Michigan under the directorship of D. R. Weir, with funding from the NIA (NIA U01AG009470). This research was conducted using the UKB Resource under application no. 42148. We acknowledge the participants and investigators of the FinnGen study. We thank the members of the Social Genomics Working Group at the University of Wisconsin−Madison for helpful comments.

Author information

Contributions

Q.L. conceived and designed the study. Y.W. and Z.S. performed the analyses. Y.W. and Q.L. wrote the manuscript. S.D. implemented the software package for GWAS-by-subtraction. Q.Z. performed the GWAS on parental illness awareness in UKB. J.M. assisted with the mathematical derivations for GWAS-by-subtraction. S.M. advised on the genetics of AD and cognition. J.M.F. advised on the HRS cohort and social science issues. All authors revised and approved the manuscript.

Corresponding author

Correspondence to Qiongshi Lu.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Genetics thanks Michelle Lupton and Bjarni Vilhjálmsson for their contribution to the peer review of this work. Peer reviewer reports are available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Supplementary Figs. 1−18.

Reporting Summary

Peer Review File

Supplementary Tables

Supplementary Tables 1−21.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Wu, Y., Sun, Z., Zheng, Q. et al. Pervasive biases in proxy genome-wide association studies based on parental history of Alzheimer’s disease. Nat Genet 56, 2696–2703 (2024). https://doi.org/10.1038/s41588-024-01963-9

  • DOI: https://doi.org/10.1038/s41588-024-01963-9
