Abstract
Mixed-model association analysis (MMAA) is the preferred tool for performing genome-wide association studies. However, existing MMAA tools often have long runtimes and high memory requirements. Here we present LDAK-KVIK, an MMAA tool for analysis of quantitative and binary phenotypes. LDAK-KVIK is computationally efficient, requiring less than 10 CPU hours and 5 GB of memory to analyze genome-wide data for 350,000 individuals. Using simulated phenotypes, we show that LDAK-KVIK produces well-calibrated test statistics for both homogeneous and heterogeneous datasets. When applied to real phenotypes, LDAK-KVIK has the highest power among all tools considered. For example, across 40 quantitative UK Biobank phenotypes (average sample size 349,000), LDAK-KVIK finds 16% more independent, genome-wide significant loci than classical linear regression, whereas BOLT-LMM and REGENIE find 15% and 11% more, respectively. LDAK-KVIK can also be used to perform gene-based tests; across the 40 quantitative UK Biobank phenotypes, LDAK-KVIK finds 18% more significant genes than the leading existing tool. Last, LDAK-KVIK produces state-of-the-art polygenic scores.
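For orientation, the classical linear regression baseline against which the power gains above are reported can be sketched as follows. This is an illustrative sketch only, not LDAK-KVIK's implementation: the function name and arguments are hypothetical, and it assumes no covariates, no missing genotypes, polymorphic SNPs and a sample large enough that the test statistic can be compared with a chi-squared distribution on one degree of freedom.

```python
import math

import numpy as np


def linear_regression_gwas(genotypes, phenotype):
    """Test each SNP separately by regressing the phenotype on its genotype.

    genotypes: (n_individuals, n_snps) array of allele counts (0, 1 or 2).
    phenotype: (n_individuals,) array of quantitative trait values.
    Returns per-SNP effect estimates, chi-squared (1 df) statistics and p-values.
    """
    X = np.asarray(genotypes, dtype=float)
    y = np.asarray(phenotype, dtype=float)
    n = X.shape[0]

    # Centring both sides is equivalent to including an intercept.
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()

    # Least-squares slope for every SNP at once (assumes no SNP is constant).
    sxx = (Xc ** 2).sum(axis=0)
    beta = Xc.T @ yc / sxx

    # Residual variance and standard error of each slope.
    rss = (yc ** 2).sum() - beta ** 2 * sxx
    se = np.sqrt(rss / (n - 2) / sxx)

    # Wald statistic; large-sample p-value from chi-squared with 1 df.
    chisq = (beta / se) ** 2
    pvals = np.array([math.erfc(math.sqrt(c / 2.0)) for c in chisq])
    return beta, chisq, pvals
```

Mixed-model tools differ from this baseline mainly in that they also account for the joint contribution of SNPs across the genome when testing each one, which is the source of the additional power described in the abstract.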
Data availability
Our study used data from UK Biobank, which we applied for and downloaded from www.ukbiobank.ac.uk. The UK Biobank has ethics approval from the North West Multi-centre Research Ethics Committee. The summary statistics and PGS SNP weightings from our enlarged analysis of 62 quantitative phenotypes can be downloaded from www.ldak-kvik.com/summaries.
Code availability
LDAK-KVIK is part of the software package LDAK. The latest version of LDAK can be downloaded from www.dougspeed.com and the version used in this work is available via Zenodo at https://doi.org/10.5281/zenodo.15747229 (ref. 46). Full instructions for running LDAK-KVIK are provided at www.ldak-kvik.com.
References
Uffelmann, E. et al. Genome-wide association studies. Nat. Rev. Methods Prim. 1, 59 (2021).
Visscher, P. M. et al. 10 years of GWAS discovery: biology, function, and translation. Am. J. Hum. Genet. 101, 5–22 (2017).
Chen, Z. et al. China Kadoorie Biobank of 0.5 million people: survey methods, baseline characteristics and long-term follow-up. Int. J. Epidemiol. 40, 1652–1666 (2011).
Keaton, J. M. et al. Genome-wide analysis in over 1 million individuals of European ancestry yields improved polygenic risk scores for blood pressure traits. Nat. Genet. 56, 778–791 (2024).
Haines, J. L. et al. Complement factor H variant increases the risk of age-related macular degeneration. Science 308, 419–421 (2005).
Kurki, M. I. et al. FinnGen provides genetic insights from a well-phenotyped isolated population. Nature 613, 508–518 (2023).
Leitsalu, L. et al. Cohort profile: Estonian Biobank of the Estonian Genome Center, University of Tartu. Int. J. Epidemiol. 44, 1137–1147 (2014).
Lippert, C. et al. FaST linear mixed models for genome-wide association studies. Nat. Methods 8, 833–835 (2011).
Zhou, X. & Stephens, M. Efficient multivariate linear mixed model algorithms for genome-wide association studies. Nat. Methods 11, 407–409 (2014).
Yang, J., Zaitlen, N. A., Goddard, M. E., Visscher, P. M. & Price, A. L. Advantages and pitfalls in the application of mixed-model association methods. Nat. Genet. 46, 100–106 (2014).
Svishcheva, G. R., Axenovich, T. I., Belonogova, N. M., van Duijn, C. M. & Aulchenko, Y. S. Rapid variance components–based method for whole-genome association analysis. Nat. Genet. 44, 1166–1170 (2012).
Loh, P.-R. et al. Efficient Bayesian mixed-model analysis increases association power in large cohorts. Nat. Genet. 47, 284–290 (2015).
Mbatchou, J. et al. Computationally efficient whole-genome regression for quantitative and binary traits. Nat. Genet. 53, 1097–1103 (2021).
Zhou, W. et al. Efficiently controlling for case-control imbalance and sample relatedness in large-scale genetic association studies. Nat. Genet. 50, 1335–1341 (2018).
Loh, P.-R. et al. Mixed-model association for biobank-scale datasets. Nat. Genet. 50, 906–908 (2018).
Campos, A. I. et al. Boosting the power of genome-wide association studies within and across ancestries by using polygenic scores. Nat. Genet. 55, 1769–1776 (2023).
Bycroft, C. et al. The UK Biobank resource with deep phenotyping and genomic data. Nature https://doi.org/10.1038/s41586-018-0579-z (2018).
Sudlow, C. et al. UK Biobank: an open access resource for identifying the causes of a wide range of complex diseases of middle and old age. PLoS Med. 12, e1001779 (2015).
Jiang, L. et al. A resource-efficient tool for mixed model association analysis of large-scale data. Nat. Genet. 51, 1749–1755 (2019).
Berrandou, T., Balding, D. & Speed, D. LDAK-GBAT: fast and powerful gene-based association testing using summary statistics. Am. J. Hum. Genet. 110, 23–29 (2023).
MacKay, D. J. Information Theory, Inference and Learning Algorithms (Cambridge Univ. Press, 2003).
Dey, R., Schmidt, E. M., Abecasis, G. R. & Lee, S. A fast and accurate algorithm to test for binary phenotypes and its application to PheWAS. Am. J. Hum. Genet. 101, 37–49 (2017).
Ma, Y., Bi, W. & Zhang, J.-F. Empirical saddlepoint approximation and its application to genome-wide association studies. In 2021 40th Chinese Control Conference (CCC) 6380–6385 (IEEE, 2021).
Bi, W., Fritsche, L. G., Mukherjee, B., Kim, S. & Lee, S. A fast and accurate method for genome-wide time-to-event data analysis and its application to UK Biobank. Am. J. Hum. Genet. 107, 222–233 (2020).
Speed, D., Holmes, J. & Balding, D. Evaluating and improving heritability models using summary statistics. Nat. Genet. 52, 458–462 (2020).
Zhang, Q., Privé, F., Vilhjálmsson, B. & Speed, D. Improved genetic prediction of complex traits from individual-level data or summary statistics. Nat. Commun. 12, 4192 (2021).
Speed, D. et al. Reevaluation of SNP heritability in complex human traits. Nat. Genet. 49, 986–992 (2017).
Speed, D. & Balding, D. SumHer better estimates the SNP heritability of complex traits from summary statistics. Nat. Genet. 51, 277–284 (2019).
Zou, H. & Hastie, T. Regularization and variable selection via the elastic net. J. R. Stat. Soc. B 67, 301–320 (2005).
The ICD-10 Classification of Mental and Behavioural Disorders: Clinical Descriptions and Diagnostic Guidelines, Vol. 1 (World Health Organization, 1992).
de Leeuw, C. A., Mooij, J. M., Heskes, T. & Posthuma, D. MAGMA: generalized gene-set analysis of GWAS data. PLoS Comput. Biol. 11, e1004219 (2015).
Bakshi, A. et al. Fast set-based association analysis using summary data from GWAS identifies novel gene loci for human complex traits. Sci. Rep. 6, 32894 (2016).
Momin, M. M. et al. A method for an unbiased estimate of cross-ancestry genetic correlation using individual-level data. Nat. Commun. 14, 722 (2023).
Zeng, J. et al. Signatures of negative selection in the genetic architecture of human complex traits. Nat. Genet. 50, 746–753 (2018).
Schoech, A. et al. Quantification of frequency-dependent genetic architectures and action of negative selection in 25 UK Biobank traits. Nat. Commun. 10, 790 (2019).
Nagai, A. et al. Overview of the BioBank Japan Project: study design and profile. J. Epidemiol. 27, S2–S8 (2017).
Pruitt, K. D., Tatusova, T. & Maglott, D. R. NCBI reference sequences (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins. Nucleic Acids Res. 35, D61–D65 (2007).
Pain, O. et al. Evaluation of polygenic prediction methodology within a reference-standardized framework. PLoS Genet. 17, e1009021 (2021).
Pain, O., Al-Chalabi, A. & Lewis, C. M. The GenoPred pipeline: a comprehensive and scalable pipeline for polygenic scoring. Bioinformatics 40, btae551 (2024).
Mak, T., Porsch, R., Choi, S., Zhou, X. & Sham, P. Polygenic scores via penalized regression on summary statistics. Genet. Epidemiol. 41, 469–480 (2017).
Yang, S. & Zhou, X. Accurate and scalable construction of polygenic scores in large biobank data sets. Am. J. Hum. Genet. 106, 679–693 (2020).
Privé, F., Arbel, J. & Vilhjálmsson, B. LDpred2: better, faster, stronger. Bioinformatics 36, 5424–5431 (2021).
Ge, T., Chen, C.-Y., Ni, Y., Feng, Y.-C. A. & Smoller, J. W. Polygenic prediction via Bayesian regression and continuous shrinkage priors. Nat. Commun. 10, 1776 (2019).
Pazokitoroudi, A. et al. Efficient variance components analysis across millions of genomes. Nat. Commun. 11, 4020 (2020).
Tseng, P. Convergence of a block coordinate descent method for nondifferentiable minimization. J. Optim. Theory Appl. 109, 475–494 (2001).
Hof, J. P. & Speed, D. LDAK 6.1. Zenodo https://doi.org/10.5281/zenodo.15747229 (2025).
Acknowledgements
This research was conducted using the UK Biobank Resource under application number 21432. The computing for this project was performed on the GenomeDK cluster (Aarhus University). We thank A. Halager and D. Søndergaard (Aarhus University) for programming suggestions; D. Balding (University of Melbourne) for helpful comments on the manuscript; J. Bajzik, A. Depope and M. Robinson (Institute of Science and Technology Austria) for assistance with the UK Biobank Research Analysis Platform; H. Heydari (University of Toronto) for advice on heritability estimation; and S. Jin (Karolinska Institute) for testing LDAK-KVIK. D.S. is supported by the Aarhus University Research Foundation (AUFF), by the Independent Research Fund Denmark (project no. 7025-00094B) and by a European Research Council Consolidator Grant (ID 101088901, acronym ClassifyDiseases).
Author information
Contributions
J.H. and D.S. jointly developed the software, performed the analysis and wrote the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Genetics thanks Longda Jiang, Po-Ru Loh and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.
Additional information
Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
Supplementary Information
Supplementary Notes 1–10, Figs. 1–46 and Tables 1–9.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Hof, J.P., Speed, D. LDAK-KVIK performs fast and powerful mixed-model association analysis of quantitative and binary phenotypes. Nat Genet 57, 2116–2123 (2025). https://doi.org/10.1038/s41588-025-02286-z
DOI: https://doi.org/10.1038/s41588-025-02286-z


