Introduction

Accelerated decline of kidney function is a serious health burden: it can lead to kidney failure, necessitating dialysis or kidney transplantation, with high risk of early mortality1,2 and otherwise limited therapeutic options. Kidney function is typically assessed as glomerular filtration rate estimated from serum creatinine (eGFR). Age-related decline of eGFR is on average −1 mL/min/1.73 m2/year in adult populations3, but varies considerably between individuals due to mechanisms that are still poorly understood4.

Deciphering the genetic make-up of kidney function decline by genome-wide association studies (GWAS) is a promising route to understand these mechanisms. Since genes in GWAS loci are candidates for drug development5,6, GWAS can also help identify therapeutic options. Hundreds of genetic loci have been identified for association with eGFR by large cross-sectional GWAS7,8. Cross-sectional associations may arise either through an allele associated with steeper eGFR-decline or through an allele associated with lower eGFR-levels that are stable over time and age (Fig. 1a). Genes in decline-associated loci might lead more directly to therapeutic options to decelerate progression9. So far, only a few genetic loci are known for genome-wide significant association with eGFR-decline: one locus (two variants in/near UMOD) in general populations10 (n = 343,339; seven further loci among pre-selected variants at Bonferroni-corrected significance) and three loci in patients with chronic kidney disease11 (CKD, eGFR < 60 mL/min/1.73 m2; n = 116,870).

Fig. 1: Conceptual illustration of genetic variant association with eGFR over time/age and phenotypic models.

a Genetic variant (SNP) associations with eGFR can arise through one allele (risk allele A) that accelerates eGFR-decline over time/age (left) or lowers eGFR in a constant fashion over time/age (right) as compared to the other allele (a). This suggests that genetic variants associated with eGFR-decline are found among genetic variants associated with eGFR cross-sectionally. Shown is a schematic for persons with A/a versus a/a. b Temporal change of eGFR can be modeled in longitudinal data in various ways (phenotypic models): as (i) difference between last and 1st eGFR value of a person (difference model; assessments in-between 1st and last unused and thus depicted as circles); (ii) eGFR over time via linear mixed model (LMM) with person-specific intercepts and slopes (LMM time model RI&RS; time = 0 corresponds to an individual’s 1st eGFR assessment); (iii) eGFR over age (LMM age model RI&RS); or (iv) eGFR over age without random slopes (LMM age model RI-only; time model RI-only possible, but not applied/shown). Shown is a schematic of the phenotypic modeling for two example persons.

This reflects a general imbalance between the well-studied genetics of cross-sectional disease-related traits12 and the less-studied genetics of temporal trait change using longitudinal data: there are only a few robustly identified genetic variants for the temporal change of any trait11,13,14. This is despite the high clinical relevance, as deteriorating quantitative biomarkers are typically linked to disease onset and progression. This imbalance arguably reflects not only the scarcity of large longitudinal data, but also substantial uncertainty about the appropriate statistical approach that simultaneously achieves controlled type I error, high power, unbiased effect estimation, and computational speed.

Emerging large-scale longitudinal data from biobanks that integrate electronic health records (eHRs) set the stage for a new era of longitudinal GWAS (“longGWAS”). LongGWAS can address multiple questions, including the genetics of trait variability15 or, as here, the genetics of temporal trait change.

There are various options to model temporal trait change (Fig. 1b): (i) a straightforward approach uses the difference between two eGFR assessments divided by the time between them (difference model); alternatively, linear mixed models (LMMs), a standard framework for longitudinal data16, can model the trait (ii) as a function of time-since-baseline (time model) or (iii) as a function of age (age model), with random intercepts and random slopes either accounting for their correlation (RI&RS)17 or ignoring it (RI&RS uncorrelated; to improve identifiability18), or (iv) with random intercepts only (RI-only; computationally easier). LMMs can be applied to test genetic variants directly (one-stage LMM) or as a computationally much faster two-stage approach (using the LMM to generate “best linear unbiased predictors”, BLUPs, of person-specific slopes, which are then evaluated via linear regression11,19; BLUPs&LinReg). Previous work applied the difference model10,20 or BLUPs&LinReg11,21, which are readily applicable for longGWAS with standard software, but cannot integrate individuals with only one trait assessment (“singletons”). One-stage LMMs can integrate singletons but are computationally challenging. So far, a systematic comparison between such approaches has been lacking.
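For orientation, the four phenotypic models can be written in a generic form. This is a sketch in our own notation, not a reproduction of the Methods equations: y_ij is the j-th eGFR value of person i, t_ij the time since that person's first assessment, a_ij the age, and g_i the allele dosage; covariates are omitted, and age is centered at 50 years (the centering referred to later in the Results). A one-stage test asks whether β_decline = 0; BLUPs&LinReg instead fits model (ii) or (iii) without the g_i terms and regresses the predicted person-specific slopes on g_i.

```latex
% Notation (ours): y_{ij} eGFR, t_{ij} time since first assessment, a_{ij} age,
% g_i allele dosage; covariates omitted.
\begin{aligned}
\text{(i) difference:}\quad
  & \frac{y_{i,\mathrm{last}} - y_{i,1}}{t_{i,\mathrm{last}} - t_{i,1}}
    = \alpha + \gamma\, g_i + \varepsilon_i \\
\text{(ii) time model RI\&RS:}\quad
  & y_{ij} = \beta_0 + \beta_t\, t_{ij} + \beta_{\mathrm{main}}\, g_i
    + \beta_{\mathrm{decline}}\, g_i t_{ij} + b_{0i} + b_{1i} t_{ij} + \varepsilon_{ij} \\
\text{(iii) age model RI\&RS:}\quad
  & y_{ij} = \beta_0 + \beta_a (a_{ij}-50) + \beta_{\mathrm{main}}\, g_i
    + \beta_{\mathrm{decline}}\, g_i (a_{ij}-50) + b_{0i} + b_{1i}(a_{ij}-50) + \varepsilon_{ij} \\
\text{(iv) age model RI-only:}\quad
  & y_{ij} = \beta_0 + \beta_a (a_{ij}-50) + \beta_{\mathrm{main}}\, g_i
    + \beta_{\mathrm{decline}}\, g_i (a_{ij}-50) + b_{0i} + \varepsilon_{ij} \\[4pt]
  & \begin{pmatrix} b_{0i} \\ b_{1i} \end{pmatrix} \sim
    \mathcal{N}\!\left(\mathbf{0},
    \begin{pmatrix} \sigma_0^2 & \rho\,\sigma_0\sigma_1 \\ \rho\,\sigma_0\sigma_1 & \sigma_1^2 \end{pmatrix}\right),
    \quad \rho = 0 \text{ for RI\&RS uncorrelated},
    \quad \varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2)
\end{aligned}
```

In the RI-only model, only b_0i ~ N(0, σ_0²) is present, so person-specific slope variability is absorbed nowhere, which is consistent with the type I error inflation reported below.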

Here, we set out to understand more about statistical approaches to test genetic association with temporal trait change, with eGFR-decline as role model, and about the genetics of eGFR-decline. We used simulated data and a UK Biobank (UKB) dataset on eGFR-trajectories combining creatinine values derived from study-center visits and eHRs22 (n ≈ 350K; >1.5 million eGFR assessments over up to 27 years). Specifically, we (1) compared seven approaches regarding type I error, power, and bias and (2) searched the UKB eGFR-trajectories data for association with eGFR-decline. Since we hypothesized that eGFR-decline genetics was a subset of cross-sectional eGFR genetics, we searched for eGFR-decline association (2a) among 595 independent variants across 424 loci known for association with eGFR from cross-sectional GWAS8,23 (“595-search”), (2b) followed by longGWAS to evaluate this hypothesis.

Results

UKB eGFR-trajectories exhibit an approximately linear decline of −1 mL/min/1.73 m2/year

We analyzed unrelated European-ancestry UKB individuals without acute kidney injury (AKI) or nephrectomy, excluding eGFR assessments after onset of dialysis, kidney transplant, or end-stage kidney disease (ESKD) (“Methods” section). Our analyzed UKB data consisted of 149,263 individuals with ≥2 eGFR assessments per person (“UKB 150K”; median follow-up time = 8.4 years; m = 1,321,370 eGFR assessments) or 348,275 individuals with ≥1 eGFR assessment (“UKB 350K”; m = 1,520,382; Supplementary Fig. 1). UKB 350K was similar to 150K regarding participant characteristics: 54% women, 1.2% CKD at baseline and 4.6% at any timepoint (eGFR < 60 mL/min/1.73 m2), baseline age 35–78 years, median baseline eGFR = 97 mL/min/1.73 m2 (Table 1). We used UK10K/HRC-imputed allele dosages of 11.3 million single-nucleotide polymorphisms (SNPs) and selected 595 variants known for association with cross-sectional eGFR23 (“Methods” section).

Table 1 Participant characteristics for UKB data on eGFR-trajectories

Before evaluating genetic variants, we explored a potentially non-linear relationship of eGFR with time and age, observing approximate linearity and a negligible difference by sex (Supplementary Fig. 2a–c). Assessing linearity was more challenging for individuals with CKD, primarily due to regression-to-the-mean effects at the start of trajectories and sparse data at their end (Supplementary Fig. 2d). Assuming linearity, mean annual eGFR-decline was comparable across approaches (−0.88 to −1.08 mL/min/1.73 m2/year), with high variability of person-specific slopes (standard deviation 0.66–0.95 mL/min/1.73 m2/year, Supplementary Table 1 and Supplementary Note 1).

LMM age model RI&RS is a powerful approach with unbiased genetic effect estimates

We considered seven approaches for genetic association analysis with eGFR-decline (Supplementary Table 2, “Methods” section Eqs. (1)–(4)): in data of individuals with ≥2 assessments over time, (i) the difference model, (ii–v) four one-stage LMMs (time model RI&RS, age model RI&RS, age model RI&RS uncorrelated, age model RI-only), and (vi) an LMM-based two-stage approach (BLUPs&LinReg); in data adding singletons (i.e., individuals with exactly one assessment), (vii) the age model RI&RS.
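Approach (i) is simple enough to sketch directly. The Python below (numpy only) computes each person's annualized change from the first and last assessment and regresses it on allele dosage by ordinary least squares. The function names are hypothetical and covariates are omitted; it illustrates why singletons drop out of the difference model and is not the pipeline used in this study.

```python
import numpy as np


def difference_model_slopes(person_ids, times, egfr):
    """Annualized change per person: (last - first eGFR) / (last - first time).

    Only the first and last assessment of each person are used; persons with
    a single assessment (singletons) cannot contribute and are skipped.
    """
    person_ids = np.asarray(person_ids)
    times = np.asarray(times, dtype=float)
    egfr = np.asarray(egfr, dtype=float)
    slopes = {}
    for pid in np.unique(person_ids):
        mask = person_ids == pid
        t, y = times[mask], egfr[mask]
        first, last = np.argmin(t), np.argmax(t)
        if t.size < 2 or t[last] == t[first]:
            continue  # singleton (or no elapsed time): change is undefined
        slopes[pid.item()] = float((y[last] - y[first]) / (t[last] - t[first]))
    return slopes


def regress_on_dosage(slopes, dosage):
    """OLS of per-person change on allele dosage; returns (beta, standard error)."""
    ids = sorted(slopes)
    y = np.array([slopes[i] for i in ids])
    g = np.array([dosage[i] for i in ids], dtype=float)
    X = np.column_stack([np.ones_like(g), g])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    sigma2 = resid @ resid / (len(y) - X.shape[1])
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return float(coef[1]), float(np.sqrt(cov[1, 1]))


# Toy data: person 2 is a singleton and is dropped.
slopes = difference_model_slopes(
    person_ids=[1, 1, 1, 2, 3, 3],
    times=[0.0, 5.0, 10.0, 0.0, 0.0, 4.0],
    egfr=[100.0, 97.0, 90.0, 80.0, 95.0, 93.0],
)
print(slopes)  # {1: -1.0, 3: -0.5}
```

Note that person 1's middle assessment (97 at year 5) is ignored, which is the information loss the LMM-based approaches avoid.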

We compared these approaches in simulated data using various scenarios (simulation parameters corresponding to: eGFR-trajectories as in UKB 350K, ~50% singletons; eGFR-trajectories in an external cohort study24, KORA-4, ~20% singletons; trajectories of another trait, body mass index (BMI), in KORA-4; “Methods” section, Supplementary Table 3). We found the following (Table 2 and Supplementary Table 4): (i) type I error was inflated for age model RI-only and age model RI&RS uncorrelated, indicating insufficient accounting for person-specific slope variability. (ii) Power was better for one-stage LMMs than for the difference model, but BLUPs&LinReg was the most powerful. When singletons were added, which is not possible with the difference model or BLUPs&LinReg, the age model RI&RS became nearly as powerful as BLUPs&LinReg in the UKB-based scenario. (iii) Biased effect estimates were observed for BLUPs&LinReg in all scenarios (11%–38% shrinkage), in line with the bias-variance trade-off known from regularization25 (Supplementary Note 2), while estimates from age model RI&RS were unbiased.
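The shrinkage in point (iii) can be illustrated with a deliberately simplified simulation, which is not the study's simulation design: balanced visit times, person intercepts treated as fixed, and variance components (tau for the slope SD, sigma for the residual SD) assumed known rather than estimated by an LMM. Each person's BLUP-type slope is the least-squares slope shrunk toward the population mean by tau² / (tau² + sigma²/Sxx), computed from a model that does not include the SNP; regressing these BLUPs on dosage then attenuates the SNP effect. All parameter values and the function name are hypothetical.

```python
import numpy as np


def slope_association(n=20_000, gamma=-0.05, mean_slope=-1.0, tau=0.7,
                      sigma=4.0, n_visits=5, span=10.0, seed=1):
    """Simulate eGFR trajectories and compare two estimates of the SNP effect on slope.

    Returns (beta_from_ols_slopes, beta_from_blups, shrinkage_factor).
    Simplifications (assumptions of this sketch): balanced visit times,
    person intercepts treated as fixed, variance components tau and sigma known.
    """
    rng = np.random.default_rng(seed)
    g = rng.binomial(2, 0.3, size=n).astype(float)            # allele dosage
    slope = mean_slope + gamma * g + rng.normal(0.0, tau, n)  # true person slopes
    intercept = rng.normal(95.0, 12.0, n)
    t = np.linspace(0.0, span, n_visits)                      # years since baseline
    y = intercept[:, None] + slope[:, None] * t + rng.normal(0.0, sigma, (n, n_visits))

    tc = t - t.mean()
    sxx = tc @ tc
    ols_slope = y @ tc / sxx  # per-person least-squares slope (tc sums to zero)

    # BLUP-type slope from a model WITHOUT the SNP: each person's slope is
    # shrunk toward the population mean by tau^2 / (tau^2 + sigma^2 / sxx).
    shrink = tau**2 / (tau**2 + sigma**2 / sxx)
    blup = ols_slope.mean() + shrink * (ols_slope - ols_slope.mean())

    X = np.column_stack([np.ones(n), g])
    beta_ols = np.linalg.lstsq(X, ols_slope, rcond=None)[0][1]
    beta_blup = np.linalg.lstsq(X, blup, rcond=None)[0][1]
    return float(beta_ols), float(beta_blup), float(shrink)


b_ols, b_blup, k = slope_association()
print(f"from OLS slopes: {b_ols:.4f}  from BLUPs: {b_blup:.4f}  shrinkage factor: {k:.3f}")
```

With these illustrative settings the shrinkage factor is about 0.66, i.e. roughly one third attenuation. Because every BLUP here is an affine function of the person's least-squares slope with a common factor, the attenuation equals the shrinkage factor exactly; with unbalanced visits and estimated variance components, as in real data, shrinkage varies between persons and the bias is not a single constant.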

Table 2 Performance of seven approaches to genetic association analyses for trait change in simulated and empirical longitudinal data

Empirical data (UKB 150K, or 350K when adding singletons) corroborated simulation findings regarding type I error (no control by age model RI-only and RI&RS uncorrelated, Supplementary Fig. 3), power (best for BLUPs&LinReg and age model RI&RS in UKB 350K), and bias (BLUPs&LinReg: 38.5% shrinkage; Table 2, Supplementary Note 2, Supplementary Fig. 4, Supplementary Data 1).

Altogether, among approaches with type I error control, BLUPs&LinReg showed the best power, but biased effect estimates. When jointly aiming for good power and unbiased effect estimates, the LMM age model RI&RS was preferable, particularly in the UKB 350K dataset. We thus used the LMM age model RI&RS in UKB 350K in the following.

Twelve genetic variants across ten loci identified for association with eGFR-decline

Due to our hypothesis that genetics of eGFR-decline is a subset of genetics of cross-sectional eGFR, we first focused on the 595 variants known for cross-sectional eGFR-association23 and tested these for association with eGFR-decline (“595-search”, LMM age model RI&RS in UKB 350K). We identified 12 variants (Pdecline < 0.05/595 = 8.4 × 10−5, 6 with Pdecline < 5 × 10−8, Fig. 2a and Table 3): (i) 7 variants known for eGFR-decline10 (near/in UMOD/PDILT (2), TPPP, C15orf54, FGF5, OVOL1, and PRKAG2) and (ii) 5 variants novel for eGFR-decline: 1 independent third UMOD/PDILT variant and 4 novel loci (near SDCCAG8, RRAGD, GGT7, PRAG1). We raised the number of variants with Pdecline < 5 × 10−8 from two (UMOD/PDILT) to six (four loci, adding loci around TPPP, C15orf54, SDCCAG8; Table 3). Results were robust in various sensitivity analyses (Supplementary Fig. 5 and “Methods” section).
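The two significance levels used in the 595-search can be made explicit with a small helper. The function names are hypothetical; the thresholds are those stated in the text (genome-wide 5 × 10−8 and Bonferroni 0.05/595, which the text rounds to 8.4 × 10−5; the same rule gives 0.05/23 ≈ 2.2 × 10−3 for the 23-variant analyses later on).

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance level controlling the family-wise error rate at alpha."""
    return alpha / n_tests


def classify(p: float, n_tests: int, alpha: float = 0.05, genome_wide: float = 5e-8) -> str:
    """Label a P value against the genome-wide and Bonferroni thresholds."""
    if p < genome_wide:
        return "genome-wide"
    if p < bonferroni_threshold(alpha, n_tests):
        return "Bonferroni"
    return "not significant"


print(f"{bonferroni_threshold(0.05, 595):.2e}")  # 8.40e-05
print(f"{bonferroni_threshold(0.05, 23):.2e}")   # 2.17e-03
```

Checking the genome-wide threshold first mirrors the text, where the six variants with Pdecline < 5 × 10−8 are a subset of the 12 Bonferroni-significant ones.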

Fig. 2: Twelve variants identified for eGFR-decline by focused search among 595 variants.

We selected 595 SNPs previously reported for association with eGFR in cross-sectional data23 and tested them for association with eGFR-decline using the one-stage LMM age model RI&RS 350K (UKB 350K; n = 348,275, m = 1,520,382). a Shown are P values (Pdecline) versus chromosomal position. We identified 12 variants (10 loci) for eGFR-decline at Bonferroni(595)-corrected significance (Pdecline < 0.05/595 = 8.4 × 10−5, brown dashed horizontal line; including 6 with Pdecline < 5 × 10−8, red dashed horizontal line), consisting of 5 novel and 7 known variants for eGFR-decline10 (blue or green, respectively). Also color-coded are two variants known for eGFR-decline not identified here (orange) and three variants known for not being associated with eGFR-decline (red)10. Variants with small minor allele frequency (MAF < 5%) are shown as circles. b Shown are genetic effect sizes for eGFR-decline (βdecline from LMM age model RI&RS 350K) versus effect sizes for association with eGFR cross-sectionally (βcross-sectional: eGFR~sex, age, SNP, PCs; eGFR from UKB baseline study-center assessment, n = 341,073). Color and symbol codes are as in (a), additionally highlighting 11 stable-effect variants (black; Pmain < 5 × 10−8, |βmain| > 0.50 mL/min/1.73 m2/allele; Pdecline ≥ 0.1; |βdecline| < 0.005 and SEdecline < 0.005 mL/min/1.73 m2/allele and year) that include the CPS1 variant (rs1047891; red in (a)). Effect allele was the cross-sectionally eGFR-lowering allele (unconditioned analyses in EUR23). The exact numerical values are provided in Supplementary Data 2.

Table 3 Twelve variants identified for association with eGFR-decline using LMM age model RI&RS in the UKB 350K dataset

The five novel variants were detected with a similar number of individuals as in previous work10 (n ~ 350,000; CKDGen, difference model); their detection was attributable to the age model, as they were not detected with the difference model in UKB or CKDGen, and not to different multiple-testing burdens (Table 3 and Supplementary Data 1).

Among the nine variants previously identified for eGFR-decline10, seven were identified here (Pdecline < 0.05/595), and one additional variant had Pdecline = 5.1 × 10−3 (directionally consistent; Supplementary Table 5). We also confirmed variants near CPS1, SHROOM3, and GATM as not associated with eGFR-decline (Pdecline ≥ 0.05, Supplementary Table 5).

Validation in external data

We obtained support in independent longitudinal data from three population-based cohort studies from Germany, for which we had previously reported an approximately linear relationship of eGFR over age26 (KORA-3: n = 2933, m = 3749; KORA-4: n = 3752, m = 9644; AugUR: n = 2397, m = 3442). Baseline age was 35–84, 25–74, or 70–95 years, with ~20 years (KORAs) or ~9 years (AugUR) of follow-up. The proportion of individuals with CKD was higher in these studies than in UKB: CKD prevalence at baseline (eGFR < 60 mL/min/1.73 m2) was 5.6%, 1.5%, and 21.5%, respectively, and at any timepoint 6.7%, 8.2%, and 26.1%. The 12-variant polygenic score in combined KORA&AugUR data was significantly associated with eGFR-decline (Pdecline = 0.013; age model RI&RS, “Methods” section).
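This section does not state how the 12 variants are weighted in the polygenic score. A common construction, assumed here, is the dosage sum weighted by per-allele effect estimates; the resulting score would then take the place of a single-variant dosage in the age model RI&RS. The function name and the weights below are hypothetical.

```python
import numpy as np


def polygenic_score(dosages, weights):
    """Weighted allele-dosage sum per person.

    dosages: (n_persons, n_variants) effect-allele dosages in [0, 2].
    weights: per-allele effect sizes, one per variant (hypothetical here).
    """
    dosages = np.asarray(dosages, dtype=float)
    weights = np.asarray(weights, dtype=float)
    if dosages.ndim != 2 or dosages.shape[1] != weights.shape[0]:
        raise ValueError("expected an (n_persons, n_variants) matrix and one weight per variant")
    return dosages @ weights


# Hypothetical example with three variants and two persons.
score = polygenic_score([[0, 1, 2], [2, 2, 2]], [-0.01, -0.02, -0.03])
```

An unweighted allele count (all weights equal to 1) is the other common choice; which one was used would need to be checked in the Methods.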

Decline-associated variants have little effect on eGFR for 40-year-old individuals and large effects on 70-year-old individuals in contrast to 11 stable-effect variants

When comparing the directionality and size of variants’ effects on eGFR-decline with their effects on cross-sectional eGFR (UKB study-center baseline, n = 341,073, aged 39–72 years), we found the 12 decline-accelerating alleles to coincide with cross-sectionally eGFR-lowering alleles (Fig. 2b, blue and green dots; Supplementary Data 2). Each “bad” allele was associated with an additional eGFR change of −0.012 to −0.060 mL/min/1.73 m2/year, compared to cross-sectional effects of −0.13 to −0.90 mL/min/1.73 m2 (Supplementary Data 3). We also observed variants with large cross-sectional effects that had no association with eGFR-decline (e.g., the CPS1 variant).

We extracted variants with large main effect on eGFR-levels and no association with eGFR-decline (Pmain < 5 × 10−8, |βmain| > 0.50 mL/min/1.73 m2 per allele, Pdecline ≥ 0.1, |βdecline| < 0.005 and SEdecline < 0.005 mL/min/1.73 m2 per allele and year), yielding 11 “stable-effect” variants (including CPS1; Supplementary Data 3). Their main effects, reflecting genetic effects on eGFR for 50-year-old individuals due to age-centering, were similar to cross-sectional effects (βcross-sectional = −0.50 to −0.74 mL/min/1.73 m2; Fig. 2b, black dots).
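The stable-effect criteria can be written as a filter over per-variant summary statistics. The record type, field names, and example values are hypothetical; the thresholds are those stated above.

```python
from dataclasses import dataclass


@dataclass
class VariantResult:
    rsid: str
    beta_main: float     # effect on eGFR at age 50, mL/min/1.73 m2 per allele
    p_main: float
    beta_decline: float  # effect on eGFR change, mL/min/1.73 m2 per allele and year
    se_decline: float
    p_decline: float


def is_stable_effect(v: VariantResult) -> bool:
    """Large main effect on eGFR-levels and no detectable effect on eGFR-decline."""
    return (
        v.p_main < 5e-8
        and abs(v.beta_main) > 0.50
        and v.p_decline >= 0.1
        and abs(v.beta_decline) < 0.005
        and v.se_decline < 0.005
    )


# Hypothetical summary statistics (not values from the study).
candidates = [
    VariantResult("rsA", beta_main=-0.62, p_main=1e-30, beta_decline=0.001, se_decline=0.003, p_decline=0.74),
    VariantResult("rsB", beta_main=-0.45, p_main=1e-12, beta_decline=-0.030, se_decline=0.004, p_decline=1e-14),
]
stable = [v.rsid for v in candidates if is_stable_effect(v)]
```

The SE criterion matters: without it, a variant with a tiny, imprecisely estimated decline effect could pass the P value and effect-size criteria merely for lack of power.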

We visualized the 12 + 11 SNP associations with eGFR-levels over age (βmain + (age-50)*βdecline): the 12 decline-associated variants showed age-dependent effects on eGFR, while the 11 stable-effect variants showed age-independent effects (Fig. 3a). The extent of age-dependency for decline-associated variants was remarkable: near-zero effects on eGFR-levels in 40-year-old individuals (even for UMOD/PDILT; except PRKAG2), but large effects in 70-year-old individuals, much larger than cross-sectionally (e.g., for UMOD/PDILT rs77924615: −1.59 versus −0.90 mL/min/1.73 m2 per “bad” allele, respectively; for rs854922 near RRAGD: −0.55 versus −0.28; Supplementary Data 3). This suggests that age-dependent associations with eGFR become effective mainly from around the age of 40 years onwards, while stable associations are already effective before the age of 40 years and age-independent thereafter.
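The age-specific effects follow directly from the age-centered model: the effect at a given age is βmain + (age − 50)·βdecline. The sketch below evaluates this for two hypothetical variants (effect sizes chosen for illustration, not study estimates): one with a decline effect, giving a near-zero effect at 40 that grows with age, and one stable-effect variant with βdecline = 0.

```python
def effect_at_age(beta_main: float, beta_decline: float, age: float, center: float = 50.0) -> float:
    """Genetic effect on eGFR at a given age under the age-centered linear model."""
    return beta_main + (age - center) * beta_decline


ages = [40, 50, 60, 70]
# Hypothetical variants (not study estimates):
decline_like = [effect_at_age(-0.60, -0.05, a) for a in ages]  # approx. -0.10, -0.60, -1.10, -1.60
stable_like = [effect_at_age(-0.60, 0.0, a) for a in ages]     # -0.60 at every age
```

Because the main effect is defined at the centering age, a decline-associated variant can look modest cross-sectionally while its effect at 70 is several times larger, which is the pattern described above.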

Fig. 3: Differential pattern between decline-associated versus stable-effect loci regarding age-dependency, clinical progression traits, and tissue-specific gene expression regulation.

We contrasted the 12 decline-associated variants versus 11 stable-effect variants and underlying loci. a Shown are genetic effects on eGFR for 40-, 50-, 60-, 70-year-old individuals using LMM age model RI&RS 350K (beta derived as βmain + (age-50)*βdecline) for decline-associated variants (left; blue: novel, green: known) and stable-effect variants (right; black). Effect allele was the cross-sectionally eGFR-lowering allele23 (Supplementary Data 3). b We tested the 12 + 11 variants for association with two clinical progression traits using UKB 150K, rapid decline (ncases = 1211, ncontrols = 63,392, logistic regression) and decline in CKD (nCKD = 13,116, mCKD = 116,944, LMM time model RI&RS; “Methods” section and Supplementary Table 6). Significant enrichment (Penrich < 0.05) of directionally consistent nominally significant associations was found among the 12 (left; 8/12, 4/12), but not among the 11 SNPs (right; 0/11, 1/11). c We evaluated genes in loci of the 12 + 11 variants regarding tissue-specific enrichment of differentially expressed genes (DEGs): shown are enrichment P values in decline-associated loci (left, among 256 genes) and stable-associated loci (right, among 182 genes; using FUMA, testing 54 tissue types, showing top 25; “Methods” section). Significant enrichment for DEGs (FDR < 0.05, red) was found for decline-associated loci only in kidney cortex (upregulated) and for stable-effect loci in various tissues (mostly downregulated, e.g., in liver, heart, muscle, pancreas, kidney cortex).

Robustness of findings regarding non-linear age effects and eGFR-variability

The approaches applied here and by others10,11,20,21 assume linearity in the global age effect on eGFR, the person-specific age effects on eGFR, and the age effect on the SNP-association with eGFR (i.e., modeling SNP-association with linear eGFR-decline). Allowing for non-linear relationships (adding quadratic terms; “Methods” section) did not alter results for the 12 + 11 SNP associations with linear eGFR-decline (Supplementary Fig. 6). Two variants, rs77924615 and rs13334589 in/around UMOD/PDILT, showed a small but significant association with over-linear eGFR-decline (Supplementary Fig. 7; PSNPxage² < 0.05/23 = 2.2 × 10−3; Supplementary Data 4). Further analyses for these two variants pointed to 50 years as breakpoint for accelerated decline (Pbreakpoint50 = 6.3 × 10−56 and 1.7 × 10−5, respectively; Pbreakpoint40 = 0.45 and 0.50, Pbreakpoint60 = 0.04 and 0.04; “Methods” section).
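The quadratic and breakpoint terms are not spelled out in this section. One common way to parameterize them, assumed here for illustration and not taken from the Methods, extends the SNP part of the age model with either a squared age term or a hinge term that adds extra slope only after a candidate breakpoint age c. Notation: y_ij eGFR, a_ij age, g_i allele dosage, with age centered at 50 years; the ellipses stand for the remaining fixed and random terms of the age model, including the corresponding global age terms.

```latex
\begin{aligned}
\text{quadratic:}\quad & y_{ij} = \dots + \beta_{\mathrm{decline}}\, g_i (a_{ij}-50)
  + \beta_{\mathrm{SNP}\times\mathrm{age}^2}\, g_i (a_{ij}-50)^2 + \dots \\
\text{breakpoint at } c:\quad & y_{ij} = \dots + \beta_{\mathrm{decline}}\, g_i (a_{ij}-50)
  + \beta_{c}\, g_i\, (a_{ij}-c)_{+} + \dots,
  \qquad (x)_{+} = \max(x, 0),\quad c \in \{40, 50, 60\}
\end{aligned}
```

Under this form, β_c is the additional per-allele, per-year change in eGFR after age c, and a breakpoint P value corresponds to testing β_c = 0 for each candidate c.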

Longitudinal data have also been used to test for SNP associations with trait variability15. When applying the model implemented in TrajGWAS15 (“Methods” section), all 12 decline-associated variants, but also 7 stable-effect variants were associated with eGFR-variability (P < 0.05/23 = 2.2 × 10−3; Supplementary Fig. 8). Thus, association with eGFR-variability answers a different question than association with eGFR-decline.

Decline-associated variants show SNP-by-age interaction in cross-sectional data

Decline-associated SNPs are expected to show SNP-by-age interaction in cross-sectional data (UKB study-center baseline, n = 341,073; linear regression adjusted for sex and 20 principal components (PCs)): 10 of 12 showed PSNPxage < 0.05; compared to effects on eGFR-decline in longitudinal data, interaction effects were similar (−0.010 to −0.048 mL/min/1.73 m2 per allele and year), whereas P values were larger, attributable to the reduced power of cross-sectional data (Supplementary Data 5). None of the 11 stable-effect variants had PSNPxage < 0.05 with a negative effect.
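The cross-sectional SNP-by-age test amounts to an ordinary least-squares regression with a dosage × age product term. The sketch below fits it on simulated, hypothetical data with a true interaction of −0.03 per allele and year; principal components, which the analysis above includes, are omitted, and the function name is ours.

```python
import numpy as np


def snp_by_age_ols(egfr, dosage, age, sex, center=50.0):
    """OLS of eGFR on sex, age, dosage and dosage x age.

    Returns the estimate and standard error of the interaction term.
    Principal components, included in the actual analysis, are omitted here.
    """
    a = np.asarray(age, dtype=float) - center
    g = np.asarray(dosage, dtype=float)
    X = np.column_stack([np.ones_like(a), np.asarray(sex, dtype=float), a, g, g * a])
    y = np.asarray(egfr, dtype=float)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    sigma2 = resid @ resid / (len(y) - X.shape[1])
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return float(coef[4]), float(np.sqrt(cov[4, 4]))


# Hypothetical cross-sectional data with a true interaction of -0.03 per allele and year.
rng = np.random.default_rng(7)
n = 200_000
age = rng.uniform(40, 70, n)
sex = rng.binomial(1, 0.54, n)
g = rng.binomial(2, 0.3, n).astype(float)
egfr = (97 - 1.0 * (age - 50) + 2.0 * sex - 0.5 * g
        - 0.03 * g * (age - 50) + rng.normal(0, 10, n))
est, se = snp_by_age_ols(egfr, g, age, sex)
print(f"interaction: {est:.4f} (SE {se:.4f})")
```

Each person contributes a single measurement here, so the interaction is identified only from differences between people of different ages; this is why, for the same true effect, cross-sectional P values are larger than longitudinal ones.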

The cross-sectional data also gave us the opportunity to explore whether the age-dependency of the 12 SNP associations with eGFR was explained by their interaction with diabetes, HbA1c, hypertension, or systolic blood pressure (SBP). The SNP-by-age interaction effects remained the same when including SNP-by-diabetes, SNP-by-HbA1c, SNP-by-hypertension, or SNP-by-SBP interaction terms (Supplementary Fig. 9 and Supplementary Data 5).

Differential pattern of association with clinical progression traits between decline-associated versus stable-effect loci

From a clinical perspective, rapid eGFR-decline or eGFR-decline in CKD are of particular interest as surrogates for CKD progression3,27. Previous work on the genetics of these progression traits identified SNPs around UMOD/PDILT, PRKAG2, and TPPP11,20,21,28, suggesting an overlap with the genetics of eGFR-decline in the general population. We tested the 12 + 11 SNPs for association with rapid decline (ncases = 1211, ncontrols = 63,392; “Methods” section) and with eGFR-decline in the subset of individuals with CKD (eGFR < 60 mL/min/1.73 m2, nCKD = 13,116, mCKD = 116,944; “Methods” section). The 12 decline-associated variants were enriched for directionally consistent nominally significant association with rapid decline and eGFR-decline in CKD (Penrich = 1.6 × 10−8 or 2.2 × 10−3, respectively), but the 11 stable-effect variants were not (Penrich = 1.0 or 0.43, respectively; Fig. 3b and Supplementary Table 6). Decline-associated variants contributing to these enrichments were near UMOD/PDILT (3), PRKAG2, and TPPP (confirmed for clinical progression traits) and near RRAGD, OVOL1, and C15orf54 (novel).
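As a consistency check, the reported enrichment P values are reproduced by a one-sided binomial test that gives each variant probability 0.05 of a directionally consistent, nominally significant association under the null (counts 8/12, 4/12, 0/11, and 1/11 from Fig. 3b). That p0 = 0.05 is the null used in the Methods is our inference from matching the numbers, not something stated in this section; the function name is hypothetical.

```python
from math import comb


def enrichment_p(k: int, n: int, p0: float = 0.05) -> float:
    """One-sided binomial tail P(X >= k) for X ~ Binomial(n, p0)."""
    return sum(comb(n, j) * p0**j * (1 - p0) ** (n - j) for j in range(k, n + 1))


for label, k, n in [("12 variants, rapid decline", 8, 12),
                    ("12 variants, decline in CKD", 4, 12),
                    ("11 variants, rapid decline", 0, 11),
                    ("11 variants, decline in CKD", 1, 11)]:
    print(f"{label}: {k}/{n} -> P = {enrichment_p(k, n):.2g}")
```

The test ignores which variants contribute and how strongly; it only asks whether more variants reach nominal significance in the expected direction than chance alone would produce.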

Both the 12 and 11 variants were enriched for association with the odds of having CKD (ncases = 16,147, ncontrols = 332,128; Penrich = 2.4 × 10−16 and 9.8 × 10−11, respectively). Thus, decline-associated versus stable-effect variants showed a similar relevance for having/developing CKD, but a differential pattern for clinical progression traits.

Differential pattern of tissue-specific gene expression regulation in decline-associated versus stable-effect loci

We were interested in likely causal genes and potentially differential mechanisms implicated by the 12 decline-associated variants (10 loci) versus the 11 stable-effect variants (9 loci).

We annotated biological and statistical features to 256 and 182 genes in these loci (“Methods” section; Supplementary Data 6). We found accumulated evidence with ≥3 features for six genes to be likely causal for decline-associated loci (UMOD, PRKAG2, SDCCAG8, RRAGD, TPPP, FGF5) and for four genes for stable-effect loci (CPS1, SLC22A2, SLC34A1, UNCX; Table 4 and Supplementary Note 3). For the highlighted 6 + 4 = 10 genes, the locus index variant was in or very near (<25 kb) to the mapped gene and statistically highly likely the association-driving variant (22%–100% probability). Common-variant effects for Mendelian disease genes were found for both decline-associated and stable-effect variants; two genes known for a role in creatinine metabolism (creatinine production or tubular reuptake29,30) mapped to stable-effect loci.

Table 4 Genes supported as likely causal genes in decline-associated or stable-effect loci

While pathway-enrichment analyses were inconclusive (using Panther31,32, “Methods” section and Supplementary Note 3), analysis of tissue-specific enrichment for differentially expressed genes (DEGs) showed a strikingly differential pattern (using FUMA33, “Methods” section): significant enrichment for DEGs (false discovery rate, FDR < 0.05) was found only in kidney cortex for decline-associated loci (upregulated), yet in various tissues for stable-effect loci (mostly downregulated; e.g., in heart, liver, muscle, pancreas, kidney cortex; Fig. 3c). This suggests that decline-associated versus stable-effect loci differentiate kidney-specific versus cross-organ regulation of gene expression.

LMM-based longGWAS identifies five loci with genome-wide significance highlighting MUC1 for eGFR-decline

We now applied the LMM age model RI&RS in UKB 350K using the GMMAT/MAGEE34,35 implementation, which implements this model in a more efficient way than lme4 (“Methods” section). We tested the 595 variants and corroborated that association statistics for both implementations, GMMAT/MAGEE versus lme4, were identical (Supplementary Fig. 10 and Supplementary Data 7).

We used GMMAT/MAGEE to conduct a longGWAS, testing ~11 million autosomal variants (UK10K/HRC-imputed36, “Methods” section). We obtained results within 5 days (256 cores, 1 TB RAM) with little evidence for population stratification (lambda = 1.06).
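The genomic inflation factor (lambda) and the GC correction applied to the longGWAS P values are usually computed as below. This sketch assumes the standard definition: lambda is the median observed 1-df chi-square statistic divided by its expected median (about 0.455), and GC correction divides each statistic by lambda. It is not taken from the Methods, and the function names are hypothetical; only the Python standard library is used.

```python
import math
from statistics import NormalDist, median

_CHI2_1_MEDIAN = NormalDist().inv_cdf(0.75) ** 2  # median of chi-square(1), about 0.455


def p_to_chi2(p: float) -> float:
    """Two-sided P value -> 1-df chi-square statistic (z squared)."""
    z = NormalDist().inv_cdf(p / 2.0)
    return z * z


def chi2_to_p(chi2: float) -> float:
    """1-df chi-square statistic -> two-sided P value (erfc keeps precision in the tail)."""
    return math.erfc(math.sqrt(chi2 / 2.0))


def genomic_inflation(pvalues) -> float:
    """Lambda: median observed chi-square divided by its expected median under the null."""
    return median(p_to_chi2(p) for p in pvalues) / _CHI2_1_MEDIAN


def gc_correct(p: float, lam: float) -> float:
    """Deflate the test statistic by lambda (no correction when lambda <= 1)."""
    return chi2_to_p(p_to_chi2(p) / max(lam, 1.0))


# Hypothetical uncorrected P value, corrected with the reported lambda of 1.06.
print(gc_correct(2e-9, 1.06))
```

Not correcting when lambda is at or below 1 is a common convention assumed here; with lambda = 1.06 the correction is mild and mainly matters for variants near the 5 × 10−8 threshold.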

We identified five loci associated with eGFR-decline at genome-wide significance (GC-corrected Pdecline < 5 × 10−8, “Methods” section, Fig. 4): the four loci already identified with Pdecline < 5 × 10−8 by the 595-search and one additional locus (MTX1/MUC1, novel for eGFR-decline compared to previous work10).

Fig. 4: LongGWAS is viable with GMMAT/MAGEE and identifies five loci with genome-wide significance for eGFR-decline.

We conducted a genome-wide search for genetic variant association with eGFR-decline (Pdecline, GC-corrected, lambda = 1.06) using the LMM age model RI&RS 350K implemented in GMMAT/MAGEE34,35 (UKB 350K; n = 348,275, m = 1,520,382; testing 11 million SNPs with MAF ≥ 0.5%, imputation quality INFO ≥ 0.6). a Shown are association P values versus chromosomal position. We identified five loci at genome-wide significance (Pdecline < 5 × 10−8; red dashed horizontal line). Coloring highlights the overall 11 loci identified for eGFR-decline: 10 loci around the 12 variants identified by 595-search (Pdecline < 0.05/595 = 8.4 × 10−5, brown dashed horizontal line; 4 novel and 6 known for eGFR-decline in blue or green, respectively), and one novel locus for eGFR-decline now identified by longGWAS (cyan; lead variant rs2075570 in the 424 loci, but not among the 595 variants). Loci were derived by clumping based on variant position (distance > 500 kb between loci, “Methods” section). b Shown is the Quantile–Quantile (QQ) plot comparing the distribution of observed Pdecline with the distribution of Pdecline expected under the null hypothesis of “no association with eGFR-decline” (green: all variants; cyan: excluding the 10 loci around the 12 decline-associated variants; black: excluding the 424 loci around the 595 variants).
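Position-based clumping with a 500 kb distance can be sketched as a greedy procedure: take the most significant remaining variant as a locus lead, assign every remaining variant within 500 kb on the same chromosome to that locus, and repeat. The greedy scheme, the tuple layout, and the function name are assumptions for illustration; the Methods may define loci differently.

```python
def clump_by_distance(hits, window=500_000):
    """Greedy position-based clumping.

    hits: iterable of (chrom, pos, p) tuples for significant variants.
    Repeatedly takes the smallest-P variant as a locus lead and assigns all
    remaining variants within +/- window bp on the same chromosome to it.
    Returns a list of loci, each a list of hits with the lead first.
    """
    remaining = sorted(hits, key=lambda h: h[2])
    loci = []
    while remaining:
        lead = remaining.pop(0)
        near = [h for h in remaining if h[0] == lead[0] and abs(h[1] - lead[1]) <= window]
        remaining = [h for h in remaining if not (h[0] == lead[0] and abs(h[1] - lead[1]) <= window)]
        loci.append([lead] + near)
    return loci


# Hypothetical hits: (chromosome, position in bp, P value).
hits = [(16, 20_000_000, 1e-30), (16, 20_300_000, 1e-9), (16, 21_000_000, 1e-8), (5, 1_000, 1e-10)]
loci = clump_by_distance(hits)
```

Because every variant within the window of a lead is removed before the next lead is chosen, the resulting lead variants are always more than 500 kb apart on the same chromosome.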

The lead variant of the MTX1/MUC1 locus, rs2075570 (Pdecline = 1.1 × 10−8), resided in the 424 loci known for cross-sectional eGFR, but was neither among nor correlated with the 595 variants (Pcross-sectional = 0.01 in Stanzick et al.23; Pcross-sectional = 0.80 in UKB; Supplementary Fig. 11a, b). Breakpoint analyses suggest a complex age-dependency of the rs2075570-association with eGFR (Supplementary Fig. 11c). rs2075570 modifies expression of MUC1 in tubulo-interstitial tissue37 (FDR < 5%), which suggests MUC1, a well-known gene for rare autosomal dominant tubulo-interstitial kidney disease38,39, as likely causal gene.

In total, we identified 13 independent variants (11 loci) for eGFR-decline: 7 variants (5 loci) with Pdecline < 5 × 10−8 by longGWAS and/or the 595-search and 6 variants (6 loci) by the 595-search (Pdecline < 0.05/595; Supplementary Table 7). LongGWAS results also enabled us to show full regional association signals for decline-associated loci, which align well with respective signals from cross-sectional analyses (Supplementary Fig. 12), except for the MTX1/MUC1 signal (Supplementary Fig. 11a).

Discussion

Based on UKB data on eGFR trajectories with >1.5 million datapoints and the one-stage LMM age model RI&RS, we identified known and novel SNP associations with eGFR-decline. Our results support the hypothesis that decline-associated variants reside in loci known for cross-sectional eGFR, but also that eGFR-decline associations can be masked in cross-sectional data by age effects. Methodologically, we showed that the one-stage LMM age model RI&RS was statistically advantageous for this task and, implemented in GMMAT/MAGEE, computationally viable for longGWAS. Importantly, it enabled us to link the genetics of eGFR-decline to age-dependent genetics of eGFR, with clinical and biological implications. Our work provides important insights into the genetics of kidney function decline and into pros and cons of statistical approaches for longGWAS.

With our results, we substantially raised the number of identified loci for eGFR-decline in the general population, from 8 loci10 to 11 (6 confirmed, 5 novel), and the number of genome-wide significant loci, from 1 (UMOD/PDILT) to 5. Biological annotation found evidence that three novel decline-associated loci capture common-variant effects for genes of rare Mendelian kidney diseases (SDCCAG8, RRAGD, and MUC1), in addition to the two such genes in known eGFR-decline loci (UMOD, PRKAG2). The TPPP locus (known) was found to include a gene (SLC9A3) encoding the target of an approved drug against CKD progression40, but TPPP was the statistically more likely causal gene21.

Our analyses also provide important insights into age-dependent versus age-independent genetics of eGFR: previously, one UMOD variant had been reported for age-dependent association with eGFR in cross-sectional data41 (n = 24,635). We found all but one decline-associated variant to have near-zero effects on eGFR in 40-year-old individuals (even for UMOD) and large effects in 70-year-old individuals, up to twice the size of cross-sectional effects (e.g., near RRAGD). The mechanisms underlying decline-associated variants thus appear to become effective mainly from the age of 40 years onwards, in line with physiological kidney aging42. In contrast, mechanisms underlying the 11 stable-effect variants apparently become effective before the age of 40 years and remain age-independent thereafter. This underscores the advantage of the LMM age model, which enables the derivation of age-specific genetic effects on eGFR, something not possible with the difference model, time model, or BLUPs&LinReg.

Age-dependent versus age-independent genetics of eGFR differentiate biological processes and clinical implications: the age-independent eGFR genetics identified here imply pathological or physiological processes affecting one’s predisposition to lower/higher eGFR in early adulthood that are stable over time. Stable-effect variants were associated with increased risk of CKD, but not with CKD progression. The underlying genes showed differential expression in numerous tissues including heart, liver, muscle, pancreas, and kidney, suggesting mechanisms that affect multiple organs. Stable-effect variants mapped to a Mendelian kidney disease gene (SLC34A1), but also to genes involved in creatinine metabolism29,30 (CPS1, SLC22A2), in line with differential expression in muscle.

Age-dependent eGFR genetics imply processes that are dynamic over age, which can be mechanisms of kidney aging43,44,45 or age-accumulating pathological events. In a dataset of rather healthy individuals that excludes individuals with AKI, like here in UKB46, such pathological events could stem from age-accumulating external stressors that are common on a population scale (such as diabetes and hypertension34, (poly-)medication intake, infections, or age-related decline of immune defense). However, in these UKB data, the age-dependency of genetic effects on eGFR was independent of interaction with diabetes or hypertension, which does not support a primary role of diabetes or hypertension. The observed kidney-specificity of gene expression regulation in decline-associated loci suggests kidney-inherent mechanisms. Causal genes in decline-associated loci might be compelling targets for the study of kidney aging mechanisms, such as physiological aging by nephron loss44 or subsequent remodeling of remaining nephrons to compensate function4,45. Our results suggest an overlap of eGFR-decline genetics in the general population with genetics of CKD progression, as many decline-associated variants were associated with rapid decline or decline in CKD. However, challenges in these analyses include potential index event bias47 when restricting to CKD, bias in BLUPs used to define rapid decline, and limited sample size for both. Future larger datasets may help understand the overlapping or discriminating processes of physiological kidney aging versus processes that lead to progressive disease, which is considered a promising route to identify therapeutic targets45.

Methodologically, we provide important insights into conducting longGWAS for eGFR-decline in adult populations that generalize to other datasets and traits. Our simulations revealed that BLUPs&LinReg had excellent power and calibrated type I error, but yielded biased effect estimates due to regularization25,48. This may be acceptable for locus identification, but it is disadvantageous when the aim is to interpret effect sizes or to use them in meta-analyses. When an unbiased estimator with calibrated type I error is needed, the LMM age model RI&RS is preferable. The computational burden of this model is relatively high, but its implementation in GMMAT/MAGEE makes it viable for longGWAS in large datasets, filling an important gap and complementing other longGWAS software that targets trait variability (e.g., TrajGWAS15).

A further generalizable methodological aspect of our study is modeling the longitudinal trait over age: this avoids the time model's distinction between temporal effects before and after baseline, which is unnecessary when baseline is a random timepoint that does not mark an intervention. We recommend the age model for longGWAS on trait change when the trajectory start is random, and the time model when the trajectory start is informative, e.g., when analyzing trait change in patients.

We acknowledge that we analyzed only individuals of European ancestry and thus missed the APOL1 locus, which others identified in analyses including individuals of African ancestry11. Also, we relied on serum creatinine as the biomarker to assess kidney function; creatinine depends on muscle mass, which declines with age49, and this might have masked some of the age-related eGFR-decline. Genes with a role in creatinine metabolism were captured by stable-effect loci (CPS129, SLC22A230). We did not account for informative loss-to-follow-up or competing risk of death; previous work using bivariate analyses found no impact of death as a second outcome17. Our primary LMM assumed a linear change in eGFR over age or time and derived SNP associations with linear eGFR-decline; we found this assumption reasonable in our data, but it requires evaluation in each setting.

Overall, our results provide important insights into the age-dependent genetics of kidney function, which can help understand processes in kidney aging. Our methodological considerations, with kidney function decline as a model case, inform future longGWAS about the pros and cons of statistical approaches. Computationally efficient longGWAS, together with the emerging large-scale longitudinal data from biobanks, offer a promising route to understand the dynamics of genetic associations for disease markers and their underlying mechanisms.

Methods

Ethics

This UKB project was conducted under the application number 20272. The AugUR study was approved by the Ethics Committee of the University of Regensburg, Germany (vote 12-101-0258). The KORA-S3 study was approved by the local authorities and conducted in accordance with the data protection regulations as part of the World Health Organization Monitoring Trends and Determinants in Cardiovascular Disease (MONICA) Project. All other KORA studies were approved by the Ethics Committee of the Bavarian Chamber of Physicians (KORA-F3 EC Number 03097, KORA-S4 EC Number 99186, KORA-F4/FF4 EC Number 06068, KORA-Fit EC Number 17040). All studies comply with the 1964 Declaration of Helsinki and its later amendments, and all participants provided written informed consent.

UKB eGFR-trajectories data

In UKB, an observational study of ~500,000 participants, we used serum creatinine measured in blood drawn at study-center visits (centralized measurement, enzymatic assay on a Beckman Coulter AU5800). We obtained further serum creatinine values and information on AKI, nephrectomy, dialysis, transplantation, and ESKD from general practitioner eHRs22 (GP CTV3 and Read v2 codes). We combined eHR and study-center data and computed eGFR using the CKD-EPI 2021 equation without ancestry term50.
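For reference, the ancestry-free CKD-EPI 2021 creatinine equation can be written as a short function. This is a minimal Python sketch, not the study's R code; it assumes creatinine is supplied in µmol/L and converts it to mg/dL with the factor 88.4, and the function name `ckd_epi_2021` is our own.

```python
CREATININE_UMOL_PER_MG_DL = 88.4  # conversion factor: µmol/L -> mg/dL


def ckd_epi_2021(creatinine_umol_l: float, age: float, female: bool) -> float:
    """eGFR (mL/min/1.73 m^2) by the race-free CKD-EPI 2021 creatinine equation."""
    scr = creatinine_umol_l / CREATININE_UMOL_PER_MG_DL  # equation is defined in mg/dL
    kappa = 0.7 if female else 0.9
    alpha = -0.241 if female else -0.302
    ratio = scr / kappa
    egfr = (142.0
            * min(ratio, 1.0) ** alpha       # applies below the sex-specific knot
            * max(ratio, 1.0) ** -1.200      # applies above the sex-specific knot
            * 0.9938 ** age)                 # age term
    return egfr * 1.012 if female else egfr
```

For a 50-year-old man with 88.4 µmol/L (1.0 mg/dL), this gives roughly 92 mL/min/1.73 m2; a woman of the same age and creatinine obtains a lower value because of the smaller κ.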

We included unrelated UKB participants of European ancestry51 without any eHR record of AKI or nephrectomy and without an eHR record of dialysis, kidney transplant, or ESKD prior to their first eGFR assessment. We excluded eGFR assessments (i) before the age of 35 years or before January 1, 1990, (ii) at or after an eHR record of dialysis, (iii) <6 months prior to, at, or after an eHR record of kidney transplant or ESKD, (iv) after a prior eGFR < 15 mL/min/1.73 m2, and (v) with extreme values (absolute residual > 10 residual SDs in the LMM age model RI&RS in UKB 350K). Remaining eGFR values <15 or >200 mL/min/1.73 m2 were winsorized. We analyzed individuals with ≥2 eGFR assessments ≥1 year apart (UKB 150K) and, where applicable, added individuals with exactly one eGFR assessment (UKB 350K).
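Part of the assessment-level filtering and the per-person selection can be sketched as below. This is a simplified, hypothetical illustration: the names `Assessment`, `prepare_trajectory`, and `analysis_set` are ours; criteria (ii), (iii), and (v) (eHR events and the 10-residual-SD rule) are omitted; applying criterion (iv) to raw values before winsorizing, and dropping people whose repeated assessments all fall within less than one year, are our assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Assessment:
    age: float   # age at assessment (years)
    year: float  # calendar year of assessment (decimal)
    egfr: float  # mL/min/1.73 m^2


def winsorize(egfr: float, lower: float = 15.0, upper: float = 200.0) -> float:
    """Clamp an eGFR value to [lower, upper]."""
    return min(max(egfr, lower), upper)


def prepare_trajectory(assessments: List[Assessment]) -> List[Assessment]:
    """Apply exclusions (i) and (iv), then winsorize; result is sorted by age."""
    # (i) drop assessments before age 35 or before 1 January 1990
    kept = sorted((a for a in assessments if a.age >= 35.0 and a.year >= 1990.0),
                  key=lambda a: a.age)
    out: List[Assessment] = []
    for a in kept:
        out.append(a)
        # (iv) assessments after a raw eGFR < 15 are excluded
        if a.egfr < 15.0:
            break
    return [Assessment(a.age, a.year, winsorize(a.egfr)) for a in out]


def analysis_set(trajectory: List[Assessment]) -> Optional[str]:
    """'longitudinal' (>=2 assessments >=1 year apart), 'singleton', or None."""
    if len(trajectory) == 1:
        return "singleton"
    # on an age-sorted trajectory, first-to-last span is the largest gap between any pair
    if len(trajectory) >= 2 and trajectory[-1].age - trajectory[0].age >= 1.0:
        return "longitudinal"
    return None  # empty, or all assessments within <1 year (handling assumed)
```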

Data processing and statistical analyses were performed using R-Software v4.0.452. All statistical tests applied were two-sided.

Genetic UKB data and pre-selection of genetic variants known for cross-sectional association with eGFR

We used UKB genomic data imputed to the HRC53,54 and UK10K haplotype reference panels55, and 20 genetic PCs from the Pan-UKB project51. We excluded variants with low imputation quality (Info < 0.6) or MAF < 0.5%, yielding allele dosages for 11,321,495 genetic variants. We selected 595 SNPs with genome-wide significant association with cross-sectional eGFR (CKDGen&UKB, n = 1,201,92923): (i) 594 independent index variants across 424 loci and (ii) one additional variant (rs28857283 near C15orf54; Pcross-sectional = 1.9 × 10−8) capturing a narrowly missed second signal in one of the 424 loci. The 595 SNPs included the 9 SNPs previously identified for association with eGFR-decline (n = 343,33910), either directly or via a proxy (r2 ≥ 0.8). The effect allele was the cross-sectionally eGFR-lowering allele (unconditioned analyses in EUR23).
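The variant filter and the allele orientation described above reduce to simple operations on dosages. A minimal sketch, with hypothetical function names, assuming dosages count the coded allele (0 to 2 per person) and that the Info score is supplied by the imputation output rather than computed here:

```python
from typing import List, Sequence, Tuple


def allele_frequency(dosages: Sequence[float]) -> float:
    """Frequency of the coded allele; each dosage ranges from 0 to 2."""
    return sum(dosages) / (2.0 * len(dosages))


def passes_qc(dosages: Sequence[float], info: float,
              min_info: float = 0.6, min_maf: float = 0.005) -> bool:
    """Keep variants with Info >= 0.6 and MAF >= 0.5%."""
    freq = allele_frequency(dosages)
    return info >= min_info and min(freq, 1.0 - freq) >= min_maf


def orient_to_egfr_lowering(dosages: Sequence[float],
                            beta_cross: float) -> Tuple[List[float], float]:
    """Recode so that the effect allele is the cross-sectionally eGFR-lowering one."""
    if beta_cross > 0:  # coded allele raises eGFR: count the other allele instead
        return [2.0 - d for d in dosages], -beta_cross
    return list(dosages), beta_cross
```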

Seven approaches to identify SNP associations with temporal trait change

The following is stated for eGFR, but generalizes to any quantitative trait. For all approaches, \(i\) denotes individuals (\(i=1,\ldots,n\)), \({n}_{i}\) the corresponding number of eGFR assessments (\(t=1,\ldots,{n}_{i}\)), \({ag}{e}_{i,t}\) and \({{eGFR}}_{i,t}\) the age and eGFR at the \(t\)th timepoint, and \({{SNP}}_{i}\) the allele dosage for a genetic variant (omitting indexing for the different SNPs). All SNP-association models were adjusted for 20 PCs (\({{PC}}_{1,i},\,\ldots,\,{{PC}}_{20,i}\)) (omitted in the following equations). Error terms \({\epsilon }_{i}\) or \({\varepsilon }_{i,t}\sim N\left(0,{\sigma }^{2}\right)\) are i.i.d. (and independent of RI&RS). We tested the SNPs for association with eGFR-decline using the following six approaches in data of individuals with ≥2 eGFR assessments:

  1. (i)

    difference model10,20,

    $$\frac{{{eGFR}}_{i,{n}_{i}}-{{eGFR}}_{i,1}}{{ag}{e}_{i,{n}_{i}}-{ag}{e}_{i,1}}={\beta }_{0}+{\beta }_{1}*{SN}{P}_{i}+{\epsilon }_{i}$$
    (1)
  2. (ii)

    LMM time model RI&RS (with RI \({\gamma }_{0i}\) and RS \({\gamma }_{1i}\) from a bivariate normal distribution, allowing for correlation) that models eGFR-levels as a function of time-since-baseline (\({tim}{e}_{i,t}\)) and the SNP association with eGFR-decline as the \({{time}}_{i,t}*{SN}{P}_{i}\) interaction, adjusting for age-at-baseline (\({ag}{e}_{i,1}\)),

    $$\begin{aligned}{eGFR}_{i,t}= \, & {\beta }_{0}+{\beta }_{1} * {sex}_{i}+{\beta }_{2} * {age}_{i,1}+{\beta }_{3} * {time}_{i,t}+{\beta }_{4} * {SNP}_{i} \\ & +{\beta }_{5} * {time}_{i,t} * {SNP}_{i}+{\gamma }_{0i}+{\gamma }_{1i} * {time}_{i,t}+{\varepsilon }_{i,t}\end{aligned}$$
    (2)
  3. (iii)

    LMM age model RI&RS, analogous to (2) but modeling eGFR as a function of age-at-exam (\({age}_{i,t}\)) and the SNP association with eGFR-decline as the \({age}_{i,t}*{SNP}_{i}\) interaction (without separate adjustment for age-at-baseline):

    $$\begin{aligned}{eGFR}_{i,t}= \, & {\beta }_{0}+{\beta }_{1} * {sex}_{i}+{\beta }_{2} * {age}_{i,t}+{\beta }_{3} * {SNP}_{i}+{\beta }_{4} * {age}_{i,t} * {SNP}_{i} \\ & +{\gamma }_{0i}+{\gamma }_{1i} * {age}_{i,t}+{\varepsilon }_{i,t}\end{aligned}$$
    (3)
  4. (iv)

    LMM age model RI&RS uncorrelated, as in (3) but with \({\gamma }_{0i}\) and \({\gamma }_{1i}\) from independent univariate normal distributions,

  5. (v)

    LMM age model RI-only, without RS term:

    $$\begin{aligned}{eGFR}_{i,t}= \, & {\beta }_{0}+{\beta }_{1} * {sex}_{i}+{\beta }_{2} * {age}_{i,t}+{\beta }_{3} * {SNP}_{i}+{\beta }_{4} * {age}_{i,t} * {SNP}_{i} \\ & +{\gamma }_{0i}+{\varepsilon }_{i,t}\end{aligned}$$
    (4)
  6. (vi)

    BLUPs&LinReg11,21, a two-stage approach that (a) estimates the RS terms, \({\hat{\gamma }}_{1i}\), as BLUPs from the LMM age model RI&RS (as in (3) without SNP as covariate) and (b) uses \({\hat{\gamma }}_{1i}\) as the outcome for SNP association via linear regression (as in (1)).
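To illustrate the outcome of the difference model (1) and why two-stage BLUPs&LinReg estimates are shrunk toward zero, the sketch below uses a deliberately simplified empirical-Bayes shrinkage of per-person OLS slopes, not BLUPs from a fitted LMM as in (3): the slope variance `tau2` and residual variance `sigma2` are treated as known, and the intercept-slope correlation is ignored. All names and parameter values are hypothetical. When every person has the same assessment ages, each slope is shrunk by the same factor λ, so the second-stage regression returns exactly λ times the estimate from unshrunk slopes.

```python
import random
from typing import Sequence, Tuple


def ols(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Simple linear regression y = b0 + b1*x; returns (b0, b1)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return my - b1 * mx, b1


def difference_outcome(ages: Sequence[float], egfrs: Sequence[float]) -> float:
    """Outcome of Eq. (1): annualized change between first and last assessment."""
    return (egfrs[-1] - egfrs[0]) / (ages[-1] - ages[0])


def shrunken_slopes(trajectories, tau2: float, sigma2: float):
    """Per-person OLS slopes and simplified BLUP-like shrunken slopes."""
    raw, sxx = [], []
    for ages, egfrs in trajectories:
        raw.append(ols(ages, egfrs)[1])
        m = sum(ages) / len(ages)
        sxx.append(sum((a - m) ** 2 for a in ages))
    mean_slope = sum(raw) / len(raw)  # stage 1 ignores the SNP
    shrunk = []
    for b, s in zip(raw, sxx):
        lam = tau2 / (tau2 + sigma2 / s)  # reliability of this person's raw slope
        shrunk.append(mean_slope + lam * (b - mean_slope))
    return raw, shrunk


def simulate(n=2000, beta_snp=-0.3, tau2=1.0, sigma2=36.0, maf=0.3, seed=1):
    """Hypothetical cohort: assessments at ages 50, 54, and 58 for everyone."""
    rng = random.Random(seed)
    ages = [50.0, 54.0, 58.0]
    snps, trajectories = [], []
    for _ in range(n):
        snp = sum(rng.random() < maf for _ in range(2))  # 0, 1, or 2 copies
        intercept = rng.gauss(90.0, 12.0)
        slope = -1.0 + beta_snp * snp + rng.gauss(0.0, tau2 ** 0.5)
        egfrs = [intercept + slope * (a - 50.0) + rng.gauss(0.0, sigma2 ** 0.5)
                 for a in ages]
        snps.append(snp)
        trajectories.append((ages, egfrs))
    return snps, trajectories


if __name__ == "__main__":
    snps, trajs = simulate()
    raw, shrunk = shrunken_slopes(trajs, tau2=1.0, sigma2=36.0)
    print("unshrunk slopes:", round(ols(snps, raw)[1], 3))
    print("BLUP-like slopes:", round(ols(snps, shrunk)[1], 3))
```

With these settings λ = 1/(1 + 36/32) ≈ 0.47, so the second-stage estimate is roughly half the size of the estimate from unshrunk slopes. In real data, λ varies with each person's number and spacing of assessments, so the bias is not a single constant factor.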

In a seventh approach, we repeated the age model RI&RS in extended data that also included individuals with exactly one eGFR assessment (age model RI&RS including singletons).

All approaches use the entire trajectories (\({n}_{i} \ge 2\); \({n}_{i} \ge 1\) for the seventh approach), except the difference model, which uses only two values per individual (e.g., the first and last). For analyses, we divided age and time by 10 and centered age at 50 years to ensure appropriate scaling for LMM optimization (results were rescaled for all presentations). LMMs were fitted using lmer() (R package lme456 v1.1.34; Powell's BOBYQA optimizer57).
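The scaling and its back-transformation can be made concrete with a few helper functions (hypothetical names; a sketch assuming age and time in years). With age centered at 50 and divided by 10, the SNP main effect refers to eGFR at age 50 and the SNP×age coefficient is per decade, so dividing estimate and standard error by 10 gives the per-year eGFR-decline without changing the test statistic.

```python
from typing import Tuple


def scale_age(age_years: float) -> float:
    """Model scale for age: decades relative to age 50."""
    return (age_years - 50.0) / 10.0


def scale_time(time_years: float) -> float:
    """Model scale for time: decades since baseline."""
    return time_years / 10.0


def per_year(beta_per_decade: float, se_per_decade: float) -> Tuple[float, float]:
    """Convert an age/time coefficient and its SE from per-decade to per-year units."""
    return beta_per_decade / 10.0, se_per_decade / 10.0


def snp_effect_at_age(beta_snp: float, beta_snp_x_age: float, age_years: float) -> float:
    """SNP difference in eGFR at a given age on the scaled model
    (beta_snp refers to age 50; beta_snp_x_age is per decade)."""
    return beta_snp + beta_snp_x_age * scale_age(age_years)
```

For example, with a SNP main effect of −2 and a SNP×age effect of −0.5 per decade, the eGFR difference per allele grows from −2 at age 50 to −3 at age 70, i.e., a decline effect of −0.05 mL/min/1.73 m2 per year.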

Evaluating type I error, power, bias in effect sizes, and detectability of eGFR-decline variants for the seven approaches

We simulated datasets for three phenotypic scenarios: (i) observed ages-at-exam of randomly sampled UKB 350K individuals with simulation parameters derived from UKB 350K (~50% singletons); (ii, iii) a cohort study scenario (~20% attrition between baseline and follow-up, 20% singletons) with simulation parameters from the independent KORA-4 study26 for eGFR or BMI, respectively (details on simulation parameters in Supplementary Table 3). For each scenario, genotypes, random effects, and residual errors were simulated (10,000 times), and phenotypes were then generated according to Eq. (3) without sex effects, with true SNP association βchange. For each approach, we computed type I error rates (proportion of nominally significant SNPs, Pchange < 0.05, when βchange = 0), power (proportion of nominally significant SNPs, Pchange < 0.05, when βchange ≠ 0), and bias (estimated genetic effect relative to the true βchange ≠ 0).
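One replicate of the data-generating step and the three performance metrics can be sketched as follows. This is our own illustration under stated assumptions (Hardy-Weinberg genotypes, ages already on the model scale, hypothetical function and parameter names, relative bias as the mean of (estimate − truth)/truth); it does not reproduce the parameters of Supplementary Table 3, and fitting each approach to the simulated data is omitted.

```python
import math
import random
from typing import List, Sequence, Tuple


def draw_genotype(rng: random.Random, maf: float) -> int:
    """Allele count under Hardy-Weinberg equilibrium (two independent draws)."""
    return sum(rng.random() < maf for _ in range(2))


def simulate_trajectory(rng: random.Random, ages: Sequence[float], snp: int,
                        beta: Tuple[float, float, float, float],
                        sd_ri: float, sd_rs: float, rho: float,
                        sd_eps: float) -> List[float]:
    """eGFR values following Eq. (3) without the sex term.

    beta = (beta0, beta_age, beta_snp, beta_change); the random intercept and
    slope are bivariate normal with SDs sd_ri, sd_rs and correlation rho.
    """
    b0, b_age, b_snp, b_change = beta
    z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    g0 = sd_ri * z1
    g1 = sd_rs * (rho * z1 + math.sqrt(1.0 - rho ** 2) * z2)  # corr(g0, g1) = rho
    return [b0 + b_age * a + b_snp * snp + b_change * a * snp
            + g0 + g1 * a + rng.gauss(0.0, sd_eps) for a in ages]


def rejection_rate(pvalues: Sequence[float], alpha: float = 0.05) -> float:
    """Type I error (if beta_change = 0) or power (if beta_change != 0)."""
    return sum(p < alpha for p in pvalues) / len(pvalues)


def relative_bias(estimates: Sequence[float], true_beta: float) -> float:
    """Mean of (estimate - truth) / truth; defined only for beta_change != 0."""
    if true_beta == 0:
        raise ValueError("relative bias requires a non-zero true effect")
    return sum((e - true_beta) / true_beta for e in estimates) / len(estimates)
```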

To evaluate empirical type I error, we generated 10,000 "null SNPs" for UKB individuals (permuting the allele dosages of 500 of the 595 SNPs, 20 times) and derived, for each approach, the proportion of null SNPs with Pchange < 0.05 as the type I error estimate. We computed empirical power and bias based on the nine SNPs known for eGFR-decline10: power as the proportion of SNPs with a directionally consistent association at Pchange < 0.05, and bias as the mean relative difference between observed genetic effects and the reference. Finally, we derived detectability by testing the 595 SNPs for association with eGFR-decline (judged at Pchange < 0.05/595 = 8.4 × 10−5).
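The permutation-based null SNPs, the empirical power criterion, and the Bonferroni threshold can be sketched as below. The function names are hypothetical, and shuffling each SNP's dosage vector across individuals is our reading of "permuting the allele dosages": it breaks any link to the phenotype while preserving allele frequencies.

```python
import random
from typing import List, Sequence, Tuple


def permuted_null_snps(dosages_by_snp: Sequence[Sequence[float]],
                       n_rounds: int, seed: int = 0) -> List[List[float]]:
    """Null SNPs: each SNP's dosages shuffled across individuals, n_rounds times."""
    rng = random.Random(seed)
    null_snps = []
    for _ in range(n_rounds):
        for dosages in dosages_by_snp:
            shuffled = list(dosages)
            rng.shuffle(shuffled)
            null_snps.append(shuffled)
    return null_snps


def empirical_power(results: Sequence[Tuple[float, float]],
                    reference_betas: Sequence[float], alpha: float = 0.05) -> float:
    """Share of SNPs with P < alpha and the same effect direction as the reference.

    results: (beta, p) per SNP; reference_betas: published effects, same order.
    """
    hits = sum(1 for (beta, p), ref in zip(results, reference_betas)
               if p < alpha and (beta > 0) == (ref > 0))
    return hits / len(results)


# Bonferroni threshold for detectability among 595 SNPs (about 8.4e-5)
DETECTABILITY_ALPHA = 0.05 / 595
```

With 500 SNPs and `n_rounds=20`, this yields the 10,000 null SNPs described above.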

Validation in external data

We used independent population-based longitudinal data from three studies in Germany: KORA-3, KORA-4, and AugUR26. Participants were recruited via population registries by inviting randomly selected inhabitants of Augsburg (KORA) or Regensburg (AugUR) within specific age ranges. We tested the joint effect of the identified decline-associated variants as a PGS (sum of eGFR-decline-accelerating alleles weighted by βdecline) for association with eGFR-decline (age model RI&RS including singletons; adjusted for study membership).
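A minimal sketch of the weighted PGS follows. The function name is hypothetical, dosages are assumed to be already oriented to the decline-accelerating allele, and the sign convention (βdecline < 0 for that allele, so a more negative score means faster predicted decline) is our assumption, since the text does not state it.

```python
from typing import List, Sequence


def polygenic_score(dosages: Sequence[Sequence[float]],
                    betas: Sequence[float]) -> List[float]:
    """Weighted allele sum per individual.

    dosages[i][j]: count of the decline-accelerating allele of variant j in person i.
    betas[j]: beta_decline of variant j for that allele (assumed negative).
    """
    if any(len(row) != len(betas) for row in dosages):
        raise ValueError("each dosage row needs one entry per variant")
    return [sum(d * b for d, b in zip(row, betas)) for row in dosages]
```

The resulting score is then used as the covariate of interest in the age model, in place of a single SNP dosage.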

Allowing for non-linear age effects

The LMM framework allows relaxing the linearity assumptions, e.g., by fitting 2nd-degree polynomials for the relationships of age with (i) global eGFR (adding age2), (ii) person-specific eGFR-trajectories (adding age2 to the random effects), or (iii) SNP associations with eGFR (adding SNP*age2). We added these quadratic terms to the original model (LMM age model RI&RS in UKB 350K; eGFR ~ SNP, age, SNP×age, sex, PCs, RI, RS) and explored their impact on the SNP-by-age effect (i.e., the SNP association with linear eGFR-decline). For SNPs with PSNP×age² < 0.05, we additionally conducted breakpoint analyses (allowing for interval-wise linear relationships with breakpoints at 40, 50, and 60 years of age).
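For the breakpoint analyses, one common parameterization, assumed here because the exact coding is not stated, is a continuous piecewise-linear spline in age with knots at 40, 50, and 60 years. The SNP×basis coefficients then add up to interval-specific SNP-by-age slopes. A sketch with hypothetical names and ages in years for readability:

```python
from typing import List, Sequence

KNOTS = (40.0, 50.0, 60.0)  # breakpoints in years of age


def spline_basis(age: float, knots: Sequence[float] = KNOTS) -> List[float]:
    """Linear-spline columns [age, (age-40)+, (age-50)+, (age-60)+]."""
    return [age] + [max(age - k, 0.0) for k in knots]


def design_row(age: float, snp: float) -> List[float]:
    """Age basis followed by its SNP interactions (the SNP-by-age terms)."""
    basis = spline_basis(age)
    return basis + [snp * b for b in basis]


def interval_slopes(snp_coefs: Sequence[float]) -> List[float]:
    """SNP-by-age slope per interval (<40, 40-50, 50-60, >=60 years),
    i.e., cumulative sums of the SNP x basis coefficients."""
    slopes, running = [], 0.0
    for c in snp_coefs:
        running += c
        slopes.append(running)
    return slopes
```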

For eGFR-variability analyses, we used a generalized additive model for location, scale and shape (GAMLSS)58 with µ(eGFR)~sex, age, SNP, PCs and log(σ(eGFR))~sex, age, SNP, PCs.

Follow-up of identified variants regarding association with clinical traits

Rapid-decline cases and controls were defined by an annual eGFR change < −3 or between −1 and +1 mL/min/1.73 m2/year, respectively (based on person-specific slopes estimated as BLUPs; Eq. (3) without SNP as covariate); SNPs were tested for association with rapid decline via logistic regression (adjusted for age-at-baseline, sex, and PCs). For eGFR-decline in CKD, we selected individuals with CKD (eGFR < 60 mL/min/1.73 m2) at one or more timepoints, removing the part of the eGFR-trajectory before the first such timepoint; SNPs were tested for association with eGFR-decline in these individuals using the LMM time model RI&RS (Eq. (2)), since the first timepoint is now informative. Both analyses used UKB 150K, since they required ≥2 eGFR values over time.

We also tested SNPs for association with CKD status (cases = CKD at any timepoint, controls = no CKD at any timepoint; UKB 350K) via logistic regression (adjusted for age-at-CKD-onset or age-at-baseline, sex, and PCs).
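The rapid-decline and CKD definitions above translate into simple rules. This sketch uses hypothetical function names and treats the control interval from −1 to +1 mL/min/1.73 m2/year as inclusive, which the text does not specify.

```python
from typing import List, Optional, Sequence, Tuple


def rapid_decline_status(slope: float) -> Optional[int]:
    """1 = case (< -3), 0 = control (-1 to +1), None = neither;
    slope in mL/min/1.73 m^2/year."""
    if slope < -3.0:
        return 1
    if -1.0 <= slope <= 1.0:
        return 0
    return None


def ckd_trajectory(ages: Sequence[float], egfrs: Sequence[float],
                   threshold: float = 60.0) -> Optional[Tuple[List[float], List[float]]]:
    """Trajectory from the first eGFR < threshold onward, with time since that point.

    Returns None if the person never has CKD.
    """
    for k, e in enumerate(egfrs):
        if e < threshold:
            onset = ages[k]
            return [a - onset for a in ages[k:]], list(egfrs[k:])
    return None


def ckd_status(egfrs: Sequence[float], threshold: float = 60.0) -> int:
    """1 if CKD at any timepoint, else 0."""
    return int(any(e < threshold for e in egfrs))
```

A trimmed trajectory may contain a single value; such individuals contribute no within-person change to the time model.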

Follow-up of identified variants regarding biological relevance

Using KidneyGPS23, we annotated genes in identified loci for features supporting them as likely causal: (i) Mendelian human kidney disease (OMIM59 and other sources39,60), (ii) drug target in registered clinical trials on kidney disease (Therapeutic Target Database61), (iii) nearest gene to the index variant62, and (iv) gene mapped to a variant that is statistically likely to be causal (posterior probability of association ≥10%) and alters the protein (e.g., "missense"), protein abundance (e.g., 5′ UTR), or gene expression in kidney tissue (eQTL, Neptune63, Susztak Lab37, GTExv864; FDR < 5%). Notably, we used cross-sectional fine-mapping, assuming that association signals for eGFR-decline coincide with cross-sectional association signals, as indicated previously10.

We tested genes in identified loci for pathway enrichment (Reactome version 85, released 2023-05-25, using PANTHER 18.031,32) and for tissue-specific enrichment of DEGs (MAGMA65 via GENE2FUNC in FUMA 1.5.233 with default parameters, which evaluates 54 tissue types).

LongGWAS on eGFR-decline in UKB

We tested 11,321,495 autosomal variants from UK10K/HRC-imputed UKB data36 using the LMM age model RI&RS in UKB 350K via GMMAT (v1.4.2)34 and MAGEE (v1.4.1)35. GMMAT/MAGEE provides an efficient implementation of an LMM RI&RS: the LMM-based phenotypic variance-covariance is estimated only once, under the null model without SNP effects (GMMAT), and is then reused by MAGEE to test each SNP association efficiently. Analyses were adjusted for 20 PCs, and results were corrected by the genomic-control (GC) lambda66. We selected genetic variants associated with eGFR-decline at GC-corrected Pdecline < 5 × 10−8. Independent loci were defined by the variant with the smallest Pdecline (lead variant) and all variants within ±250 kb (overlapping loci merged).
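Genomic-control correction and the ±250 kb locus definition can be sketched as follows. Assumptions: association statistics are given as z-scores (1-df chi-square = z²), λ is the median chi-square divided by its expected null median (about 0.455), and loci are built greedily from the most significant remaining variant before overlapping regions are merged. The function names are hypothetical, and this is not the GMMAT/MAGEE output processing itself.

```python
import math
from statistics import NormalDist, median
from typing import List, Sequence, Tuple

# Median of a 1-df chi-square (~0.4549): square of the N(0,1) 75% quantile
CHI2_1DF_MEDIAN = NormalDist().inv_cdf(0.75) ** 2


def gc_correct(z_scores: Sequence[float]) -> Tuple[float, List[float]]:
    """Return the GC lambda and GC-corrected two-sided p-values."""
    chi2 = [z * z for z in z_scores]
    lam = median(chi2) / CHI2_1DF_MEDIAN
    # 1-df chi-square tail probability: P(X > x) = erfc(sqrt(x / 2))
    p_corrected = [math.erfc(math.sqrt(c / lam / 2.0)) for c in chi2]
    return lam, p_corrected


Variant = Tuple[int, int, float]  # (chromosome, position in bp, p-value)


def define_loci(variants: Sequence[Variant], p_threshold: float = 5e-8,
                flank: int = 250_000):
    """Greedy lead-variant loci (+/- flank), then merging of overlapping loci.

    Returns tuples (chrom, start, end, lead_position, lead_p).
    """
    remaining = sorted((v for v in variants if v[2] < p_threshold), key=lambda v: v[2])
    regions = []
    while remaining:
        chrom, pos, p = remaining[0]  # most significant remaining variant = lead
        start, end = max(pos - flank, 0), pos + flank
        regions.append([chrom, start, end, pos, p])
        remaining = [v for v in remaining[1:]
                     if not (v[0] == chrom and start <= v[1] <= end)]
    regions.sort(key=lambda r: (r[0], r[1]))
    merged = []
    for r in regions:
        if merged and merged[-1][0] == r[0] and r[1] <= merged[-1][2]:
            last = merged[-1]
            last[2] = max(last[2], r[2])
            if r[4] < last[4]:  # keep the stronger lead variant
                last[3], last[4] = r[3], r[4]
        else:
            merged.append(r)
    return [tuple(r) for r in merged]
```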

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.