Fig. 1: Assessment of calibration, power, and scalability of FastKAST. | Nature Communications

From: Fast kernel-based association testing of non-linear genetic effects for biobank-scale data

a Calibration of FastKAST under null simulations that include linear effects but no nonlinear effects (N = 50K individuals). We fixed SNP heritability at 0.50 while varying the proportion of causal variants ({0.001, 1}) and the minor allele frequency (MAF) range of the causal variants (ALL, COMMON, RARE). We applied FastKAST to test for nonlinear effects within 100 kb windows, after regressing out the linear effect in five windows centered around the tested window. The two-sided 95% confidence interval for the Q-Q plot was estimated using a beta distribution.

b Comparison of p values computed using FastKAST with those from an exact method. We analyzed body mass index and mean platelet volume (MPV) across 5000 unrelated white British individuals in the UK Biobank (UKBB). We tested each trait for nonlinear effects of SNPs on the UKBB genotyping array within non-overlapping 100 kb windows, using the exact RBF kernel and FastKAST (with approximation dimension D = 50M, that is, 50 times M, where M is the number of SNPs in the tested window, and kernel hyperparameter γ = 0.1). p denotes the p value computed using the exact kernel; \(\tilde{p}\) denotes the p value computed by FastKAST.

c Power of FastKAST as a function of the kernel variance component \(\sigma_{g}^{2}\), the kernel hyperparameter γ, and the approximation dimension D. For each parameter setting, the dot shows the average across 2000 repetitions and the bar shows the bootstrap standard error estimated from 1000 bootstrap replicates.

d Runtime of FastKAST and the exact method as a function of sample size (N) for a fixed number of SNPs (M = 30) and approximation dimension D = 50M (the default in this study). The exact method requires hours to analyze sample sizes larger than 50K, whereas FastKAST remains efficient for sample sizes as large as 500K.
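The caption does not state how the D-dimensional approximation of the RBF kernel is built. As an illustration only, the sketch below uses random Fourier features, one common way to approximate an RBF kernel with a finite feature map. It assumes the parameterization k(x, y) = exp(-γ‖x − y‖²); the function names (`rbf_kernel`, `make_rff_map`, `approx_kernel`) and the toy genotype vectors are hypothetical and are not taken from the FastKAST software.

```python
import math
import random


def rbf_kernel(x, y, gamma):
    """Exact RBF kernel, assuming the form exp(-gamma * ||x - y||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def make_rff_map(n_snps, n_features, gamma, seed=0):
    """Build a random Fourier feature map z with z(x).z(y) ~ rbf_kernel(x, y).

    Assumption: frequencies are drawn from N(0, 2 * gamma), which matches
    the exp(-gamma * ||x - y||^2) parameterization used above.
    """
    rng = random.Random(seed)
    scale = math.sqrt(2.0 * gamma)
    weights = [[rng.gauss(0.0, scale) for _ in range(n_snps)]
               for _ in range(n_features)]
    offsets = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n_features)]
    norm = math.sqrt(2.0 / n_features)

    def feature_map(x):
        # One cosine feature per random frequency vector and phase offset.
        return [norm * math.cos(sum(w_j * x_j for w_j, x_j in zip(w, x)) + b)
                for w, b in zip(weights, offsets)]

    return feature_map


def approx_kernel(zx, zy):
    """Approximate kernel value: inner product of two feature vectors."""
    return sum(a * b for a, b in zip(zx, zy))


# Toy example: M SNPs coded 0/1/2, D = 50 * M features, gamma = 0.1.
M = 5
D = 50 * M
gamma = 0.1
x = [0, 1, 2, 1, 0]
y = [0, 1, 1, 1, 0]
z = make_rff_map(M, D, gamma, seed=0)
exact_value = rbf_kernel(x, y, gamma)
approx_value = approx_kernel(z(x), z(y))
```

Under these assumptions the approximation error shrinks roughly as 1/√D, and working with an N × D feature matrix rather than an N × N kernel matrix is what makes a runtime comparison like panel d plausible.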