LDAK-KVIK performs fast and powerful mixed-model association analysis of quantitative and binary phenotypes

Abstract

Mixed-model association analysis (MMAA) is the preferred tool for performing genome-wide association studies. However, existing MMAA tools often have long runtimes and high memory requirements. Here we present LDAK-KVIK, an MMAA tool for analysis of quantitative and binary phenotypes. LDAK-KVIK is computationally efficient, requiring less than 10 CPU hours and 5 GB of memory to analyze genome-wide data for 350,000 individuals. Using simulated phenotypes, we show that LDAK-KVIK produces well-calibrated test statistics for both homogeneous and heterogeneous datasets. When applied to real phenotypes, LDAK-KVIK has the highest power among all tools considered. For example, across 40 quantitative UK Biobank phenotypes (average sample size 349,000), LDAK-KVIK finds 16% more independent, genome-wide significant loci than classical linear regression, whereas BOLT-LMM and REGENIE find 15% and 11% more, respectively. LDAK-KVIK can also be used to perform gene-based tests; across the 40 quantitative UK Biobank phenotypes, LDAK-KVIK finds 18% more significant genes than the leading existing tool. Last, LDAK-KVIK produces state-of-the-art polygenic scores.

Fig. 1: Type 1 errors and power of MMAA tools for quantitative phenotypes.
Fig. 2: Exploiting the dependency of per-SNP heritability on MAF.
Fig. 3: Performance of MMAA tools when analyzing 40 quantitative UK Biobank phenotypes.
Fig. 4: Accuracy of step 1 PGS and power of LDAK-KVIK-GBAT.

Data availability

Our study used data from UK Biobank, which we applied for and downloaded from www.ukbiobank.ac.uk. The UK Biobank has ethics approval from the North West Multi-centre Research Ethics Committee. The summary statistics and PGS SNP weightings from our enlarged analysis of 62 quantitative phenotypes can be downloaded from www.ldak-kvik.com/summaries.

Code availability

LDAK-KVIK is part of the software package LDAK. The latest version of LDAK can be downloaded from www.dougspeed.com and the version used in this work is available via Zenodo at https://doi.org/10.5281/zenodo.15747229 (ref. 46). Full instructions for running LDAK-KVIK are provided at www.ldak-kvik.com.

References

  1. Uffelmann, E. et al. Genome-wide association studies. Nat. Rev. Methods Prim. 1, 59 (2021).

  2. Visscher, P. M. et al. 10 years of GWAS discovery: biology, function, and translation. Am. J. Hum. Genet. 101, 5–22 (2017).

  3. Chen, Z. et al. China Kadoorie Biobank of 0.5 million people: survey methods, baseline characteristics and long-term follow-up. Int. J. Epidemiol. 40, 1652–1666 (2011).

  4. Keaton, J. M. et al. Genome-wide analysis in over 1 million individuals of European ancestry yields improved polygenic risk scores for blood pressure traits. Nat. Genet. 56, 778–791 (2024).

  5. Haines, J. L. et al. Complement factor H variant increases the risk of age-related macular degeneration. Science 308, 419–421 (2005).

  6. Kurki, M. I. et al. FinnGen provides genetic insights from a well-phenotyped isolated population. Nature 613, 508–518 (2023).

  7. Leitsalu, L. et al. Cohort profile: Estonian Biobank of the Estonian Genome Center, University of Tartu. Int. J. Epidemiol. 44, 1137–1147 (2014).

  8. Lippert, C. et al. FaST linear mixed models for genome-wide association studies. Nat. Methods 8, 833–835 (2011).

  9. Zhou, X. & Stephens, M. Efficient multivariate linear mixed model algorithms for genome-wide association studies. Nat. Methods 11, 407–409 (2014).

  10. Yang, J., Zaitlen, N. A., Goddard, M. E., Visscher, P. M. & Price, A. L. Advantages and pitfalls in the application of mixed-model association methods. Nat. Genet. 46, 100–106 (2014).

  11. Svishcheva, G. R., Axenovich, T. I., Belonogova, N. M., van Duijn, C. M. & Aulchenko, Y. S. Rapid variance components–based method for whole-genome association analysis. Nat. Genet. 44, 1166–1170 (2012).

  12. Loh, P.-R. et al. Efficient Bayesian mixed-model analysis increases association power in large cohorts. Nat. Genet. 47, 284–290 (2015).

  13. Mbatchou, J. et al. Computationally efficient whole-genome regression for quantitative and binary traits. Nat. Genet. 53, 1097–1103 (2021).

  14. Zhou, W. et al. Efficiently controlling for case-control imbalance and sample relatedness in large-scale genetic association studies. Nat. Genet. 50, 1335–1341 (2018).

  15. Loh, P. Mixed-model association for biobank-scale datasets. Nat. Genet. 50, 906–908 (2018).

  16. Campos, A. I. et al. Boosting the power of genome-wide association studies within and across ancestries by using polygenic scores. Nat. Genet. 55, 1769–1776 (2023).

  17. Bycroft, C. et al. The UK Biobank resource with deep phenotyping and genomic data. Nature https://doi.org/10.1038/s41586-018-0579-z (2018).

  18. Sudlow, C. et al. UK Biobank: an open access resource for identifying the causes of a wide range of complex diseases of middle and old age. PLoS Med. 12, e1001779 (2015).

  19. Jiang, L. et al. A resource-efficient tool for mixed model association analysis of large-scale data. Nat. Genet. 51, 1749–1755 (2019).

  20. Berrandou, T., Balding, D. & Speed, D. LDAK-GBAT: fast and powerful gene-based association testing using summary statistics. Am. J. Hum. Genet. 110, 23–29 (2023).

  21. MacKay, D. J. Information Theory, Inference and Learning Algorithms (Cambridge Univ. Press, 2003).

  22. Dey, R., Schmidt, E. M., Abecasis, G. R. & Lee, S. A fast and accurate algorithm to test for binary phenotypes and its application to PheWAS. Am. J. Hum. Genet. 101, 37–49 (2017).

  23. Ma, Y., Bi, W. & Zhang, J.-F. Empirical saddlepoint approximation and its application to genome-wide association studies. In 2021 40th Chinese Control Conference (CCC) 6380–6385 (IEEE, 2021).

  24. Bi, W., Fritsche, L. G., Mukherjee, B., Kim, S. & Lee, S. A fast and accurate method for genome-wide time-to-event data analysis and its application to UK Biobank. Am. J. Hum. Genet. 107, 222–233 (2020).

  25. Speed, D., Holmes, J. & Balding, D. Evaluating and improving heritability models using summary statistics. Nat. Genet. 52, 458–462 (2020).

  26. Zhang, Q., Privé, F., Vilhjálmsson, B. & Speed, D. Improved genetic prediction of complex traits from individual-level data or summary statistics. Nat. Commun. 12, 4192 (2021).

  27. Speed, D. et al. Reevaluation of SNP heritability in complex human traits. Nat. Genet. 49, 986–992 (2017).

  28. Speed, D. & Balding, D. SumHer better estimates the SNP heritability of complex traits from summary statistics. Nat. Genet. 51, 277–284 (2019).

  29. Zou, H. & Hastie, T. Regularization and variable selection via the elastic net. J. R. Stat. Soc. B 67, 301–320 (2005).

  30. The ICD-10 Classification of Mental and Behavioural Disorders: Clinical Descriptions and Diagnostic Guidelines, Vol. 1 (World Health Organization, 1992).

  31. de Leeuw, C. A., Mooij, J. M., Heskes, T. & Posthuma, D. MAGMA: generalized gene-set analysis of GWAS data. PLoS Comput. Biol. 11, e1004219 (2015).

  32. Bakshi, A. et al. Fast set-based association analysis using summary data from GWAS identifies novel gene loci for human complex traits. Sci. Rep. 6, 32894 (2016).

  33. Momin, M. M. et al. A method for an unbiased estimate of cross-ancestry genetic correlation using individual-level data. Nat. Commun. 14, 722 (2023).

  34. Zeng, J. et al. Signatures of negative selection in the genetic architecture of human complex traits. Nat. Genet. 50, 746–753 (2018).

  35. Schoech, A. et al. Quantification of frequency-dependent genetic architectures and action of negative selection in 25 UK Biobank traits. Nat. Commun. 10, 790 (2019).

  36. Nagai, A. et al. Overview of the BioBank Japan Project: study design and profile. J. Epidemiol. 27, S2–S8 (2017).

  37. Pruitt, K. D., Tatusova, T. & Maglott, D. R. NCBI reference sequences (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins. Nucleic Acids Res. 35, D61–D65 (2007).

  38. Pain, O. et al. Evaluation of polygenic prediction methodology within a reference-standardized framework. PLoS Genet. 17, e1009021 (2021).

  39. Pain, O., Al-Chalabi, A. & Lewis, C. M. The GenoPred pipeline: a comprehensive and scalable pipeline for polygenic scoring. Bioinformatics 40, btae551 (2024).

  40. Mak, T., Porsch, R., Choi, S., Zhou, X. & Sham, P. Polygenic scores via penalized regression on summary statistics. Genet. Epidemiol. 41, 469–480 (2017).

  41. Yang, S. & Zhou, X. Accurate and scalable construction of polygenic scores in large biobank data sets. Am. J. Hum. Genet. 106, 679–693 (2020).

  42. Privé, F., Arbel, J. & Vilhjálmsson, B. LDpred2: better, faster, stronger. Bioinformatics 36, 5424–5431 (2021).

  43. Ge, T., Chen, C.-Y., Ni, Y., Feng, Y.-C. A. & Smoller, J. W. Polygenic prediction via Bayesian regression and continuous shrinkage priors. Nat. Commun. 10, 1776 (2019).

  44. Pazokitoroudi, A. et al. Efficient variance components analysis across millions of genomes. Nat. Commun. 11, 4020 (2020).

  45. Tseng, P. Convergence of a block coordinate descent method for nondifferentiable minimization. J. Optim. Theory Appl. 109, 475–494 (2001).

  46. Hof, J. P. & Speed, D. LDAK 6.1. Zenodo https://doi.org/10.5281/zenodo.15747229 (2025).

Acknowledgements

This research was conducted using the UK Biobank Resource under application number 21432. The computing for this project was performed on the GenomeDK cluster (Aarhus University). We thank A. Halager and D. Søndergaard (Aarhus University) for programming suggestions; D. Balding (University of Melbourne) for helpful comments on the manuscript; J. Bajzik, A. Depope and M. Robinson (Institute of Science and Technology Austria) for assistance with the UK Biobank Research Analysis Platform; H. Heydari (University of Toronto) for advice on heritability estimation; and S. Jin (Karolinska Institute) for testing LDAK-KVIK. D.S. is supported by the Aarhus University Research Foundation (AUFF), by the Independent Research Fund Denmark (project no. 7025-00094B) and by a European Research Council Consolidator Grant (ID 101088901, acronym ClassifyDiseases).

Author information

Contributions

J.H. and D.S. jointly developed the software, performed the analysis and wrote the manuscript.

Corresponding author

Correspondence to Doug Speed.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Genetics thanks Longda Jiang, Po-Ru Loh and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Supplementary Notes 1–10, Figs. 1–46 and Tables 1–9.

Reporting Summary

Peer Review File

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Hof, J.P., Speed, D. LDAK-KVIK performs fast and powerful mixed-model association analysis of quantitative and binary phenotypes. Nat Genet 57, 2116–2123 (2025). https://doi.org/10.1038/s41588-025-02286-z

  • DOI: https://doi.org/10.1038/s41588-025-02286-z
