Abstract
Age at onset of walking is an important early childhood milestone which is used clinically and in public health screening. In this genome-wide association study meta-analysis of age at onset of walking (N = 70,560 European-ancestry infants), we identified 11 independent genome-wide significant loci. SNP-based heritability was 24.13% (95% confidence intervals = 21.86–26.40) with ~11,900 variants accounting for about 90% of it, suggesting high polygenicity. One of these loci, in gene RBL2, co-localized with an expression quantitative trait locus (eQTL) in the brain. Age at onset of walking (in months) was negatively genetically correlated with ADHD and body-mass index, and positively genetically correlated with brain gyrification in both infant and adult brains. The polygenic score showed out-of-sample prediction of 3–5.6%, confirmed as largely due to direct effects in sib-pair analyses, and was separately associated with volume of neonatal brain structures involved in motor control. This study offers biological insights into a key behavioural marker of neurodevelopment.
Main
In early childhood, the onset of walking is used as a simple yet robust clinical marker for brain and behavioural development. A major advantage of this milestone is that it is both memorable and clearly defined and therefore can be reliably identified and recalled by parents1. Moreover, while there is variability in the sequence and presence of some motor skills (for example, some children bottom shuffle but never crawl), walking is an exclusive and informative milestone for both typical and atypical development.
In current clinical practice, an inability to walk independently by age 18 months is used in national guidelines such as those outlined by the UK National Institute for Health and Care Excellence (NICE; https://www.nice.org.uk/) or by the US Centers for Disease Control and Prevention2 as a screening criterion for referral to a paediatrician for further assessment and investigation3. This is because delayed walking could represent an underlying motor-specific issue such as a primary muscle disorder or generalized issues such as global developmental delay4. The causes of these issues can be genetic or environmental, including genetic disorders and extreme prematurity5. However, historical data suggest that only a minority (about a third) of late walkers may have an underlying neurological abnormality or developmental disorder, and that variation in age at onset of walking within the typical range might not be strongly associated with IQ in childhood6. As such, late-walking children (later than 18 months) might either reflect an extreme of typical variation or relate to clinically meaningful conditions with a later age of onset.
Although most humans begin to walk independently by early childhood, typical attainment of this milestone can be achieved within a relatively wide developmental period, for most infants between 8 and 18 months old3. It is thought that age at onset of independent walking (hereafter, AOW) is a complex trait determined by multiple factors, including body dimensions, year of birth, gestational age and related neural maturation, opportunity to practice7,8, cultural context9 and nutrition10. Many of these factors are thought to influence the structure and function of a network of brain areas implicated in motor control, including the cortex, basal ganglia and cerebellum, with dysfunction in these brain regions resulting in movement disorders11. In addition to reflecting general developmental processes, the ability to walk independently may itself have cascading effects on other developmental domains12. When children transition from crawling to standing and walking, the perspective at which they perceive the world changes, as do their means of interacting with the world13. However, it remains unclear what are the causal influences underlying the wide variability in age at onset of walking or whether these causal influences are also associated with later health, neurodevelopmental and cognitive outcomes.
A greater understanding of the variability and causes of late walking has clear societal implications. It would inform the public health policies of the many countries that aim to screen children for delay14. Genetic information has the potential to offer greater understanding regarding the aetiology of this developmental milestone. Furthermore, it can contribute alongside screening tools to aid the prediction and early identification of clinically relevant conditions associated with early or delayed onset of walking, and avoid missing time for potentially beneficial physical training when appropriate.
There is substantial evidence for a genetic contribution to motor development. A recent meta-analysis of infant twin studies showed that the broad category of psychomotor function was one of the most heritable behavioural domains, with pooled heritability of 59%15. For AOW specifically, a study of 2,274 twin pairs in England and Wales reported a heritability of 84%16. Polygenic scores for autism spectrum disorder (ASD, hereafter autism), schizophrenia and bipolar disorder have been found to be associated with infant neuromotor characteristics such as muscle tone, reflexes and senses17. Further, the attention deficit/hyperactivity disorder (ADHD) polygenic score was associated with AOW18. As such, age at onset of walking appears to be an ideal candidate for genetic discovery research. Identification of specific genetic loci is an important step towards uncovering the biological mechanisms underlying this developmental milestone and deriving clinically informative insights with respect to childhood motor disorders. There have been no common gene discovery studies of AOW so far19.
In sum, there are several reasons for focusing on AOW. It is a marker of brain and behavioural development, it is easily measurable in large cohorts, reliably recalled by parents1 and varies substantially between children.
Here we present a genome-wide association study (GWAS) meta-analysis of AOW in a sample of 70,560 children from four European-ancestry cohorts. First, we aimed to quantify single-nucleotide polymorphism (SNP)-based heritability of AOW and the degree of polygenicity of this trait. Second, we aimed to identify independent genetic loci associated with AOW and their functional roles. Third, we estimated genetic correlations with physical health indicators, cognitive traits, neurodevelopmental conditions, psychiatric disorders and cortical phenotypes. Fourth, we evaluated the predictive power of the AOW polygenic score and tested whether it was associated with the volume of neonatal brain structures in an independent cohort.
Results
Genomic loci associated with age at onset of walking
We conducted a GWAS meta-analysis of AOW in 70,560 children including data from four European-ancestry cohorts: the Norwegian Mother, Father and Child Cohort Study20,21 (MoBa, N = 58,302), the Netherlands Twin Register22 (NTR, N = 6,251), the Lifelines multigenerational prospective population-based birth cohort study23 from the North of the Netherlands (N = 3,415) and the United Kingdom Medical Research Council National Study for Health and Development24 (NSHD, N = 2,592). Analyses were preregistered on OSF (https://doi.org/10.17605/OSF.IO/M2QV3). The quantile–quantile (QQ) plot for the MoBa GWAS (Supplementary Fig. 2) indicated a P value deviation from a normal distribution (λGC = 1.227). The observed inflation is probably explained by trait polygenicity (linkage disequilibrium score regression [LDSC] intercept = 1.008 (0.008)25,26; see Supplementary Note A for a detailed investigation of the observed inflation). The other smaller cohorts’ inflation factors were below the recommended threshold of 1.10 (NTR λGC = 0.975, Supplementary Fig. 4; Lifelines λGC = 1.001, Supplementary Fig. 6; NSHD λGC = 1.002, Supplementary Fig. 8), which is expected given the positive relationship between inflation and sample size27. Therefore, contrary to the preregistered plan, automatic correction for genomic control was not applied for all cohorts when performing the standard-error-weighted meta-analysis using the METAL tool28.
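The genomic inflation factor quoted above can be illustrated with a short calculation. The following is a minimal sketch, assuming the common definition of λGC as the median observed 1-d.f. χ2 statistic divided by its expected median under the null (about 0.455); the function name and the evenly spaced example P values are hypothetical and are not taken from the study data.

```python
from statistics import NormalDist, median

# Median of a chi-square distribution with 1 degree of freedom (~0.4549),
# obtained as the squared 75th-percentile z score.
CHI2_1DF_MEDIAN = NormalDist().inv_cdf(0.75) ** 2


def genomic_inflation(p_values):
    """Return lambda_GC for a list of two-sided P values (0 < p <= 1)."""
    z = NormalDist()
    # A two-sided P value maps to a 1-df chi-square statistic via z**2.
    chi2 = [z.inv_cdf(1.0 - p / 2.0) ** 2 for p in p_values]
    return median(chi2) / CHI2_1DF_MEDIAN


# Hypothetical input: evenly spaced P values mimic the null (uniform) case,
# for which lambda_GC should be very close to 1.
null_p = [(i + 0.5) / 1000 for i in range(1000)]
lambda_null = genomic_inflation(null_p)
```

A value above 1 on its own does not separate confounding from polygenicity, which is why the LDSC intercept is reported alongside λGC.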
We identified 2,525 genome-wide significant SNPs (P < 5 × 10−8), of which 11 were independent loci with one lead variant per locus in GCTA conditional and joint analysis (COJO)29 (Table 1 and Fig. 1, see also Supplementary Fig. 9 for the QQ plot and Supplementary Fig. 10 for the regional plots). All 11 lead SNPs remained significant after conditioning on the other significant SNPs on the same chromosome (Table 1, column ‘COJO P’). The most strongly associated SNP was located on chromosome 12 (rs7956202 near HECTD4, P = 2.045 × 10−11). The second most significant lead SNP was located on chromosome 16 (rs16952251, near RBL2, P = 2.637 × 10−11) (fine mapping of this locus is discussed later; see Results section ‘Co-localization with gene expression in the brain’). See Table 1 for a full list of significant loci, Supplementary Table 4 for previous associations with complex traits, and Supplementary Table 5 for which cohorts contributed to each locus.
The x axis shows genomic position (chromosomes 1–22) and the y axis shows statistical significance as −log10(P value). P values are two-sided and based on an inverse-variance standard-error-weighted fixed-effects meta-analysis. N = 70,560. The horizontal red line indicates the P-value threshold for genome-wide statistical significance (P = 5 × 10−8). P values were not adjusted for multiple comparisons. The lead SNP for each genome-wide significant locus is labelled and indicated with a yellow diamond. The inflation factor λGC for this GWAS was 1.27 and LDSC intercept was 1.00 (s.e. = 0.01), suggesting that inflation was due to polygenicity of AOW (see Supplementary Note A for a discussion). The meta-GWAS QQ plot by allele frequency is presented in Supplementary Fig. 9. SNPs with P-values < 0.001 (corresponding to −log10(P) > 3) are presented as data points.
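For readers unfamiliar with the weighting scheme named in the caption, a fixed-effects inverse-variance-weighted estimate for a single SNP can be sketched as follows. This is an illustrative sketch only: the function name and the per-cohort effect sizes are hypothetical, and the published analysis was run in METAL rather than with this code.

```python
import math
from statistics import NormalDist


def ivw_meta(betas, ses):
    """Fixed-effects inverse-variance-weighted meta-analysis of one SNP."""
    weights = [1.0 / se ** 2 for se in ses]
    beta = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    se = math.sqrt(1.0 / sum(weights))
    z = beta / se
    # Two-sided P value; loses precision for extremely small P values.
    p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return beta, se, p


# Hypothetical per-cohort effect estimates and standard errors.
beta, se, p = ivw_meta([0.10, 0.06, 0.12, 0.02], [0.02, 0.06, 0.08, 0.09])
```

Cohorts with smaller standard errors dominate the pooled estimate, so in a design like this one the largest cohort contributes most of the weight.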
Common genetic architecture of age at onset of walking
SNP-based heritability of AOW estimated with LDSC25 was h2SNP = 24.13% (95% CI = 21.856, 26.404). Heritability for the phenotype in males (N = 35,642) and females (N = 34,918) was estimated to be 23.06% (95% CI = 19.512, 26.608) and 23.06% (95% CI = 19.356, 26.764), respectively. The genetic correlation (rg) of the phenotype between males and females estimated with LDSC30 was 0.99 (95% CI = 0.872, 1.108).
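The intervals above are symmetric around the point estimates, so the underlying standard errors can be recovered directly. The sketch below assumes a normal-approximation 95% interval (estimate ± 1.96 × s.e.); the helper names are hypothetical.

```python
def se_from_ci(lower, upper, z=1.96):
    """Recover a standard error from a symmetric 95% confidence interval."""
    return (upper - lower) / (2.0 * z)


def ci_from_se(estimate, se, z=1.96):
    """Rebuild the interval from an estimate and its standard error."""
    return estimate - z * se, estimate + z * se


# Values from the text: h2SNP = 24.13% with 95% CI = 21.856, 26.404.
se_h2 = se_from_ci(21.856, 26.404)  # about 1.16 percentage points
low, high = ci_from_se(24.13, se_h2)
```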
The SNP-based heritability (h2SNP) estimated using LDSC25 for the MoBa sample was h2SNP = 25.11% (95% CI = 22.484, 27.736) and for the NTR sample, h2SNP = 19.09% (95% CI = 4.547, 33.633). Lower h2SNP estimates and larger standard errors were obtained for the smaller samples, namely: Lifelines (h2SNP = 9.52%, 95% CI = −15.921, 34.961) and NSHD (h2SNP = −3.02%, 95% CI = −36.673, 30.633), as LDSC cannot produce reliable estimates with samples <5,000 (ref. 25). Genetic correlation between MoBa and NTR was rg = 0.893 (95% CI = 0.558, 1.228, P = 1.803 × 10−7) and between NTR and Lifelines, rg = 0.463 (95% CI = −0.623, 1.549, P = 0.404). As expected, other genetic correlations were out of bound (MoBa–Lifelines rg = 1.168, 95% CI = −0.233, 2.569, P = 0.103) or non-estimable due to low reliability of the LDSC estimates, indicated by the large SNP-based heritability standard errors obtained for the smaller cohorts. Of note, the interval between AOW and parent report was not significantly correlated with the mean AOW difference between cohorts (r = 0.16, P = 0.76, two-tailed).
There was no genome-wide statistically significant heterogeneity (using the conventional P < 5 × 10−8 threshold) between cohorts as tested with the heterogeneity metric per SNP, I2; the maximum I2 was 95.3 for SNPs rs7864115 (χ2(1) = 21.453, P = 3.627 × 10−6) and rs148684045 (χ2(1) = 21.441, P = 3.648 × 10−6). This indicates that variation of individual SNP effects between individual GWASs was not due to heterogeneity between the cohorts31 (Supplementary Fig. 11). Overall, the M multiSNP heterogeneity metric across the independent lead SNPs32 associated with AOW indicated no systematically more or less influential study (see Supplementary Table 3, all Bonferroni-corrected Ps < 0.401).
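The reported maximum I2 can be related to the accompanying χ2 values. The sketch below assumes that each χ2(1) is a Cochran's Q statistic with 1 degree of freedom and uses the usual definition I2 = (Q − d.f.)/Q; the function names are hypothetical.

```python
def cochran_q(betas, ses):
    """Cochran's Q: weighted squared deviations from the pooled estimate."""
    weights = [1.0 / se ** 2 for se in ses]
    pooled = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    return sum(w * (b - pooled) ** 2 for w, b in zip(weights, betas))


def i_squared(q, df):
    """I2 as a percentage; truncated at 0 when Q does not exceed df."""
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100.0


# Chi-square values reported for rs7864115 and rs148684045.
i2_a = i_squared(21.453, 1)
i2_b = i_squared(21.441, 1)
```

Under this assumption both SNPs give an I2 of about 95.3, in line with the maximum quoted in the text.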
Biological annotation of associated loci and genes
Analyses on prioritized genes annotated to significant SNPs
The genome-wide significant SNPs were mapped to 233 genes on the basis of genomic position, expression quantitative trait loci (eQTLs) and chromatin interaction information in FUMA33 (Supplementary Table 6). We tested whether these prioritized genes were differentially expressed in the brain across BrainSpan34 developmental stages and GTEx (v.8)35 tissues. We observed a significant downregulation of the differentially expressed genes (DEGs) in multiple tissues including the brain (amygdala and hippocampus, both-sided DEG Bonferroni-adjusted P = 5.979 × 10−8, P = 5.165 × 10−7) and the heart left ventricle (Padj. = 3.124 × 10−7), and DEGs upregulation in fibroblasts (Padj. = 0.003) (Supplementary Fig. 12). The enrichment of upregulated or downregulated DEGs across BrainSpan developmental stages was not significant (all Bonferroni-adjusted P = 1, see Supplementary Table 7 for full results). Gene sets associated with AOW were enriched in the Gene Ontology36 neurogenesis and generation of neurons pathways (see Supplementary Table 8 for all significantly enriched gene sets and gene set–trait associations from previous studies).
Genes associated with age at onset of walking
The MAGMA37 gene-based test performed in FUMA on the meta-GWAS summary statistics indicated 50 genes that were associated with AOW at a Bonferroni-corrected genome-wide significance threshold of 2.664 × 10−6 (P = 0.05 divided by 18,766 genes; Supplementary Table 9). A full list of previously reported genome-wide associations with complex traits for the 50 AOW-associated genes is provided in Supplementary Table 10.
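The gene-based significance threshold follows from a simple Bonferroni division, shown here as a one-line check (the function name is hypothetical).

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under Bonferroni correction."""
    return alpha / n_tests


# 0.05 divided by the 18,766 genes tested.
gene_threshold = bonferroni_threshold(0.05, 18_766)
```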
Using the Genomics England PanelApp38, we found that, of the 47 MAGMA genes (out of 50) that had Ensembl IDs in PanelApp, 13 (27.7%) were associated with intellectual disability (ID, v.5.557); this is over double the proportion (2.10 times) of ID-associated genes in the panels as a whole (2,624 out of 19,950, 13.2%; χ2(1) = 7.45; P = 0.006, two-tailed). These genes include ATXN2, AUTS2, CUX2, FOXP1, KANSL1 and RBL2 (Supplementary Table 9). Furthermore, we found that 7 of the 47 genes were associated with autism (14.9%), which is over 4 times the proportion of autism-associated genes in the panel (v.0.36, largely based on SFARI gene39) as a whole (734 out of 19,950, 3.68%; χ2(1) = 13.7; P = 0.0002, two-tailed).
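The enrichment figures above can be checked from the counts given. The sketch below computes the fold enrichment and a one-sample test of a proportion with continuity correction. The exact test used by the authors is not stated in this section, so the choice of test is an assumption, and the function names are hypothetical.

```python
def fold_enrichment(hits, n, background_hits, background_n):
    """Ratio of the observed proportion to the background proportion."""
    return (hits / n) / (background_hits / background_n)


def one_sample_prop_chi2(hits, n, p0, continuity=True):
    """Chi-square (1 df) statistic testing an observed proportion against p0."""
    expected_hit = n * p0
    expected_miss = n * (1.0 - p0)
    diff = abs(hits - expected_hit)
    if continuity:
        diff = max(0.0, diff - 0.5)  # Yates continuity correction
    return diff ** 2 / expected_hit + diff ** 2 / expected_miss


# Intellectual disability: 13 of 47 genes versus 2,624 of 19,950 overall.
id_fold = fold_enrichment(13, 47, 2_624, 19_950)
id_chi2 = one_sample_prop_chi2(13, 47, 2_624 / 19_950)

# Autism: 7 of 47 genes versus 734 of 19,950 overall.
autism_fold = fold_enrichment(7, 47, 734, 19_950)
autism_chi2 = one_sample_prop_chi2(7, 47, 734 / 19_950)
```

Under this assumed test the statistics fall close to the reported χ2 values; small differences would be expected if a different test or background definition was used.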
To identify tissue specificity of AOW, MAGMA gene-property analyses performed in FUMA using gene-based association P values for all the 18,766 genes revealed that gene expression was primarily enriched in the brain cerebellar hemispheres (β = 0.017, 95% CI = −0.050, 0.084, P = 0.006) and cerebellum (β = 0.018, 95% CI = −0.052, 0.088, P = 0.007), although these results were not significant at a Bonferroni-corrected α level of 0.05 for 54 tissues (9.000 × 10−4; see Supplementary Fig. 13). Overall, expression of the genes associated with AOW was significantly enriched between 19 and 24 post-conceptional weeks (late mid-prenatal period, β = 0.041, 95% CI = 0.011, 0.070, P = 0.004; Supplementary Fig. 14). The MAGMA gene-set analysis yielded no significant results (Supplementary Table 11).
Analyses on the meta-GWAS summary statistics
Enrichment of AOW meta-GWAS signal by functional genomic annotation was tested using stratified LDSC40 analyses. These revealed that heritability of AOW was significantly enriched in genomic regions conserved in primates (16.142-fold enrichment, 95% CI = 10.421, 21.863, P = 0.309 × 10−6), mammals (13.053-fold enrichment, 95% CI = 8.239, 17.867, P = 0.287 × 10−5) and vertebrates (8.747-fold enrichment, 95% CI = 5.450, 12.044, P = 0.817 × 10−5; see Extended Data Fig. 1). Full results of partitioned heritability by functional genomic annotation can be found in Supplementary Table 12.
We then tested whether heritability was enriched in specific cell types using stratified LDSC41 and found significant enrichment in the brain, particularly in the basal ganglia (caudate: enrichment = 1.400 × 10−8, Bonferroni-adjusted P = 0.014, 95% CI = 6.062 × 10−9, 2.194 × 10−8, nucleus accumbens: enrichment = 1.760 × 10−8, Padj. = 0.001, 95% CI = 9.740 × 10−9, 2.546 × 10−8, putamen: enrichment = 1.470 × 10−8, Padj. = 0.006, 95% CI = 6.840 × 10−9, 2.256 × 10−8), cortex (enrichment = 1.370 × 10−8, Padj. = 0.003, 95% CI = 6.781 × 10−9, 2.062 × 10−8), amygdala (enrichment = 1.360 × 10−8, Padj. = 0.020, 95% CI = 5.682 × 10−9, 2.152 × 10−8) and cerebellum (enrichment = 1.320 × 10−8, Padj. = 0.014, 95% CI = 5.772 × 10−9, 2.023 × 10−8; Extended Data Fig. 2). Complete stratified LDSC estimates by cell type are reported in Supplementary Table 13.
Co-localization with gene expression in the brain
We investigated whether genes near the 11 genome-wide significant loci, as well as 50 genes significantly associated with AOW (Supplementary Table 9), were enriched for eQTLs in an independent dataset of post-mortem bulk RNA-seq from 261 samples of the human adult cerebellum42. We identified significant eQTLs for the gene RBL2 (which encodes a transcriptional regulator by the same name) in genomic locus 2 on chromosome 16 (Table 1). Comparing the statistical evidence of association with AOW (GWAS) against the statistical evidence of association with RBL2 expression, we noticed a distinct pattern: both the GWAS and eQTL P values had two groups of significantly associated SNPs distinguished by their linkage disequilibrium correlation with a lead GWAS SNP (rs17800727, Fig. 2a). Group 1 had the strongest evidence for GWAS association (min P = 2.95 × 10−11) but slightly weaker evidence of eQTL association (min P = 2.72 × 10−13 cerebellum eQTL), while Group 2 had weaker evidence for GWAS association (min P = 9.51 × 10−8) but stronger evidence of eQTL association (min P = 6.41 × 10−24 cerebellum eQTL, Fig. 2a). We investigated the probability that the same SNPs in this locus influence both AOW and RBL2 expression (co-localization, Fig. 2). Our co-localization analysis at this locus suggested an independent causal variant in the GWAS (rs17800727; chr16:53481010:A:G GRCh37; chr16:53447098:A:G GRCh38) and the eQTL data (rs7203132; chr16:53429775:G:A GRCh37; chr16:53395863:G:A GRCh38) with a posterior probability (PP) of 0.96 (ref. 43) that the causal SNP is distinct in each dataset. A similar co-localization pattern was observed using 1,433 samples of the human adult cortex (ref. 42) (Supplementary Note B and Fig. 15; PP = 0.97–0.99).
Genomic locus 2 overlaps with a region in which SNPs are predicted to alter RBL2 expression in the human brain (eQTLs). a, The GWAS evidence for association with age at onset of walking [−log10(P value), y axis] is plotted against the statistical evidence of being an eQTL for RBL2 in human adult cerebellum42 [−log10(P value), x axis] for each SNP (points) within a 2-Mb window around the GWAS peak. Points are coloured by linkage disequilibrium (LD) correlation with the lead SNP (rs17800727) and these values were used to define two groups. b, The SNPs from a are shown in the 2-Mbp genomic region (x axis, GRCh37) with protein-coding genes (top), GWAS evidence for association with age at onset [−log10(P value), middle] and statistical evidence for RBL2 expression in human cerebellum [−log10(P value), y axis, bottom]. Point colour matches a. c, A zoomed-in view of the peak indicated by dashed vertical lines in b shows the GWAS evidence for association with age at onset of walking [−log10(P value), y axis] by genomic position (x axis, GRCh37). Colour indicates the MAF of each SNP. The locations of protein-coding genes in the region are indicated at the top. An SNP (rs17800727) that results in a missense variant (p.Tyr210Cys) in RBL2 is marked. d, Swarm, violin and boxplots showing the distribution of RBL2 expression in the prefrontal cortex (transcripts per million (TPM), y axis). Each point represents the expression of RBL2 in 1 of 87 prenatal human cortices (BrainVar44) split by genotype into 3 groups on the basis of zygosity for the Group 2 50% MAF SNPs. The P value represents the difference between the homozygous alternate (N = 28) and homozygous reference (N = 30) groups. The centre is the median expression value. The lower and upper bounds of the box correspond to the first and third quartiles (the 25th and 75th percentiles). The upper/lower whiskers extend from the upper/lower bound to the largest/smallest value no further than 1.5× the interquartile range. 
Data beyond the end of the whiskers are outlying points and are plotted individually. Bars at the bottom indicate pairs of haplotypes (derived from the data shown in c making up each genotype). e, Structure of the RBL2 protein predicted by AlphaFold95 with the location of rs17800727, p.Tyr210Cys in red96.
To understand these two groups, we assessed their distribution across the 2-Mb genomic locus (±1 Mb around the gene) and observed that they overlapped throughout a 125-kb peak with well-defined margins for both the GWAS and RBL2 eQTL analysis (Fig. 2b). We next considered how these SNPs were distributed on the basis of minor allele frequency (MAF, Fig. 2c). The Group 1 SNPs (strongest GWAS evidence) had a MAF of 30%, while the Group 2 SNPs (strongest eQTL evidence) had a MAF of 50%. Using whole-genome sequencing data from 176 individuals with paired post-mortem RNA-seq data from prefrontal cortex44, we used the MAF distribution to identify five haplotypes (Fig. 2d) and each individual’s genotype. Group 2 SNPs (strongest eQTL evidence, MAF 50%) are found in three haplotypes (dark blue and red, dark blue and yellow, dark blue alone, Fig. 2d) resulting in the high MAF of 50%. Homozygous status for the Group 2 SNPs is associated with decreased expression of RBL2 (Wilcoxon rank-sum test, two-sided; W(56) = 249, P = 0.007, Hodges–Lehmann estimator = −2.105, 95% CI = −3.813, −0.610). We infer that one of the SNPs shown in dark blue (Fig. 2c) impacts RBL2 expression, although no clear candidate SNP was evident when considering epigenetic data.
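The two statistics quoted for the expression comparison can be sketched in a few lines. The Hodges–Lehmann estimator of a two-sample location shift is the median of all pairwise differences, and the rank-sum statistic can be expressed as a count of pairs in which one group exceeds the other. The function names and the small example vectors are hypothetical, and whether the reported W was computed with exactly this convention is an assumption.

```python
from statistics import median


def hodges_lehmann_shift(group_a, group_b):
    """Median of all pairwise differences a - b (two-sample location shift)."""
    return median(a - b for a in group_a for b in group_b)


def mann_whitney_u(group_a, group_b):
    """Number of (a, b) pairs with a > b, counting ties as one half."""
    return sum((a > b) + 0.5 * (a == b) for a in group_a for b in group_b)


# Hypothetical expression values (TPM) for two genotype groups.
hom_alt = [1.0, 2.0, 3.0]
hom_ref = [2.0, 3.0, 4.0]
shift = hodges_lehmann_shift(hom_alt, hom_ref)
u_stat = mann_whitney_u(hom_alt, hom_ref)
```

A negative shift, as in the reported estimate, indicates lower expression in the first group than in the second.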
Group 1 SNPs are only found on one haplotype (dark blue and red, Fig. 2d) resulting in a lower MAF of 30% than the Group 2 SNPs. We infer that one of the Group 1 SNPs has a functional impact above and beyond the decrease in RBL2 expression mediated by the Group 2 SNPs, to yield the stronger evidence of association with AOW. Annotation of the 125-kb locus with VEP45 identified rs17800727 as a likely candidate for this effect, since it results in a missense variant (MANE isoform: ENST00000262133.11, p.Tyr210Cys) (Fig. 2e) that is predicted to impact function by some severity metrics (for example, ‘Damaging’ based on PolyPhen2 (ref. 46), CADD47 score of 25) but not all (for example, ‘Tolerated’ based on SIFT). If the missense variant had a loss-of-function effect, it would be on a haplotype that magnifies the functional impact through decreased expression of RBL2; future functional studies would be required to validate this impact.
We also identified co-localization of SNPs associated with expression of several genes in both the cerebellum and cortex with SNPs associated with AOW in genomic locus 6 on chromosome 17 (Table 1). This region has a complex haplotype structure, including alternative contigs, which may explain this result. In cerebellum, we identified co-localization in KANSL1 (PP = 0.79), PLEKHM1 (PP = 0.78), SPPL2C (PP = 0.77) and STH (PP = 0.63). In the cortex, we also identified co-localization in STH (PP = 0.78) and SPPL2C (PP = 0.72), as well as in CRHR1 (PP = 0.74).
Polygenic score analysis
In a leave-one-out design, we calculated a polygenic score (PGS) on the basis of meta-analyses of all samples, leaving out either Lifelines, NTR or NSHD. In the Lifelines cohort, the PGS from the meta-GWAS of the other cohorts (MoBa, NTR and NSHD) was significantly associated with AOW (β = 0.185, 95% CI = 0.152, 0.217, P < 2 × 10−16, R2 = 0.034). Using the same method, the PGS was significantly associated with AOW in the NTR cohort (β = 0.185, 95% CI = 0.147, 0.223, P < 2 × 10−16, R2 = 0.031) and in the NSHD cohort (β = 0.175, 95% CI = 0.137, 0.213, P < 2 × 10−16, R2 = 0.030). The MoBa sample comprised a high proportion of the data such that it would be inappropriate as a ‘left out’ sample in a leave-one-out design. Therefore, we applied 5-fold cross-validation to this cohort, yielding 5 within-sample PGSs with a mean variance explained of R2 = 0.056 (s.e. = 0.001).
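The 5-fold design for the largest cohort implies near-equal test folds. A minimal sketch of how such fold sizes arise, assuming a plain split of the 58,302 MoBa children into five parts (the function name is hypothetical and the authors' actual partitioning procedure is not described here):

```python
def fold_sizes(n, k):
    """Sizes of k near-equal folds for n samples."""
    base, extra = divmod(n, k)
    # The first `extra` folds receive one additional sample.
    return [base + 1 if i < extra else base for i in range(k)]


sizes = fold_sizes(58_302, 5)
```

Each fold serves once as the held-out sample while the polygenic score weights are derived without it, so no child contributes to both training and evaluation of the same score.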
Genetic effects identified by GWAS can be confounded by indirect genetic effects, for example, through population structure, assortative mating and passive gene–environment correlation (prGE)48. To identify possible confounding from indirect genetic effects, we used a within- and between-sib-pair PGS analysis. We generated a PGS from a meta-analysis of the MoBa, Lifelines and NSHD GWAS summary statistics and used it to conduct within-family associations in the NTR dataset. Among 1,254 dizygotic twin pairs (N = 2,508 individuals), within- and between-family standardized regression coefficients in a linear mixed-effects model were not significantly different from each other (χ2(1) = 1.479, P = 0.224, two-tailed), indicating that the genetic signal is not biased by prGE, or effects such as stratification and assortative mating. Figure 3 shows the beta estimates of the AOW PGS prediction in all the cohorts, with the NTR within- and between- sib-pair estimates presented separately.
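The within- and between-sib-pair design rests on splitting each twin's polygenic score into the pair mean (between-family component) and the deviation from that mean (within-family component). The sketch below shows only this decomposition, with hypothetical names and scores; fitting the linear mixed-effects model itself is not shown.

```python
from statistics import mean


def split_within_between(pgs_by_pair):
    """Split each twin's PGS into a pair mean and a deviation from it."""
    rows = []
    for pair_id, scores in pgs_by_pair.items():
        between = mean(scores)  # shared by both members of the pair
        for s in scores:
            rows.append((pair_id, between, s - between))
    return rows


# Hypothetical standardized PGS values for two dizygotic pairs.
rows = split_within_between({"pair1": (0.4, -0.2), "pair2": (1.0, 0.6)})
```

Both components then enter the model as separate predictors of AOW. Because the within-pair deviation varies only through the random segregation of alleles between siblings, similar within- and between-family coefficients are read as evidence against substantial confounding.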
Data are presented as beta estimates ±s.e. of the beta estimate of a linear regression model testing the association between age at onset of walking and the polygenic score (two-tailed P values). N = 11,660 (MoBa-1, MoBa-2, MoBa-3), N = 11,661 (MoBa-4, MoBa-5), N = 3,415 (Lifelines), N = 2,592 (NSHD); N = 2,508, N pairs = 1,254 (NTR between- and NTR within-sib-pair).
Genetic correlations with other traits
Next, we tested for genetic correlations between AOW and a preregistered selection of physical health, neurodevelopmental, psychiatric, cognitive and cortical phenotypes. For physical health, AOW was negatively genetically correlated with childhood body-mass index (cBMI)49 (rg = −0.143, 95% CI = −0.217, −0.069, P = 1.553 × 10−4, Bonferroni-adjusted P = 0.004) and adult BMI50 (rg = −0.103, 95% CI = −0.142, −0.063, P = 2.858 × 10−7, Padj. = 8.00 × 10−6) but not with birth weight (rg = 0.068, 95% CI = −0.067, 0.202, P = 0.325). Of the six included psychiatric disorders, ADHD51 showed a significant genetic correlation with AOW (rg = −0.180, 95% CI = −0.242, −0.118, P = 1.299 × 10−8, Padj. = 3.64 × 10−7). In addition, AOW was positively genetically correlated with the cognitive phenotypes, educational attainment52 (rg = 0.119, 95% CI = 0.081, 0.157, P = 7.457 × 10−10, Padj. = 2.088 × 10−8) and cognitive performance53 (rg = 0.092, 95% CI = 0.041, 0.142, P = 3.967 × 10−4, Padj. = 0.011).
Among 13 adolescent and adult cortical phenotypes54, we observed a significant genetic correlation between AOW and folding index (rg = 0.136, 95% CI = 0.062, 0.209, P = 3.000 × 10−4, Padj. = 0.008). There were no significant genetic correlations with the other complex traits tested after correction for multiple testing (see Supplementary Table 14 and Fig. 4a). For motor phenotypes, non-preregistered exploratory analyses showed that AOW was genetically correlated with self-reported walking pace in adults55 (rg = 0.058, 95% CI = 0.006, 0.110, P = 0.029, Padj. = 0.820), although this result did not survive P-value correction for multiple testing (Supplementary Table 14).
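The adjusted P values reported in this section are consistent with multiplying each raw P value by a fixed number of tests. In the sketch below, the number of tests (28) is an assumption inferred from the ratio of adjusted to raw P values in the text, not a figure stated in this section; the function and variable names are hypothetical.

```python
def bonferroni_adjust(p, n_tests):
    """Bonferroni-adjusted P value, capped at 1."""
    return min(1.0, p * n_tests)


N_TESTS = 28  # assumption: inferred from the reported adjusted/raw P ratios

# Raw two-sided P values as reported in the text.
raw = {
    "childhood BMI": 1.553e-4,
    "adult BMI": 2.858e-7,
    "ADHD": 1.299e-8,
    "educational attainment": 7.457e-10,
    "cognitive performance": 3.967e-4,
    "folding index": 3.000e-4,
}
adjusted = {trait: bonferroni_adjust(p, N_TESTS) for trait, p in raw.items()}
```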
a, Genetic correlation between AOW and physical health (purple), cognitive traits (blue), neurodevelopmental conditions and psychiatric disorders (orange), cortical phenotypes (grey) and non-preregistered motor phenotypes (green). Data are presented as correlation coefficients ± 95% CIs. Filled circles indicate significant correlations based on CIs. Filled squares indicate the traits that remain significantly genetically correlated with age at onset of walking after adjusting the two-sided P values obtained from LDSC for multiple testing using Bonferroni correction. The maximum GWAS sample sizes for each of the traits included in the LDSC analysis are as follows: age at onset of walking N = 70,560; childhood BMI N = 61,111; birth weight N = 42,212; adult BMI N = 795,640; educational attainment N = 765,283; cognitive performance N = 269,867; autism Ncases = 18,382, Ncontrols = 27,969; ADHD Ncases = 38,691, Ncontrols = 186,843; schizophrenia Ncases = 67,390, Ncontrols = 94,015; cross-disorders Ncases = 232,964, Ncontrols = 494,162; major depression Ncases = 170,756, Ncontrols = 329,443; bipolar disorder Ncases = 41,917, Ncontrols = 371,549; cortical phenotypes (fractional anisotropy, mean diffusivity, intracellular volume fraction, orientation dispersion index, isotropic volume fraction, cortical thickness, folding index, Gaussian curvature, intrinsic curvature index, local gyrification index, mean curvature, cortical surface area, grey matter volume) N = 36,663; muscle weakness in the pincer grip Ncases = 48,596, Ncontrols = 207,927; self-reported walking pace N = 450,967; early motor coordination N = 31,797; Parkinson’s disease Ncases = 26,421, Ncontrols = 442,271. b, Venn diagrams representing MiXeR bivariate analyses between AOW and the 6 other phenotypes with which it has Bonferroni-significant genetic correlations. 
The size of the circles and the numbers within them represent the relative polygenicity of each trait (that is, how many genetic variants contribute to 90% of the SNP heritability). The overlap between each pair of circles represents the degree of genetic overlap between the two phenotypes, that is, the number of shared variants in thousands, along with the standard error. Numbers and standard errors in sections of the circles that do not overlap represent the number of variants unique to that phenotype. The corresponding rg, estimated using LDSC, is shown below each Venn diagram.
In light of our findings of a Bonferroni-significant genetic correlation between AOW and global folding index, we conducted further non-preregistered analyses, as requested by a reviewer, to gain more specific information about the brain regions implicated. We included regions involved in motor and/or somatosensory function and corrected for multiple testing using false discovery rate (FDR) correction. We found that later AOW was significantly genetically correlated with increased folding in the primary somatosensory cortex (regions of interest (ROIs) in Glasser parcellation56 1: rg = 0.160, 95% CI = 0.078, 0.242, FDR-adjusted P = 0.003 and 5m: rg = 0.182, 95% CI = 0.081, 0.283, Padj. = 0.005), premotor cortex (ROI 6r: rg = 0.152, 95% CI = 0.042, 0.262, Padj. = 0.045) and cingulate motor area (ROI 24dd: rg = 0.148, 95% CI = 0.053, 0.243, Padj. = 0.021). See Supplementary Table 16 for the full set of results.
The largest-magnitude genetic correlation was between AOW and ADHD. In light of the potential implications of this finding, we tested, in an exploratory non-preregistered analysis, whether the AOW–ADHD genetic correlation remained after controlling for the genetic influences of educational attainment, since the latter are also known to be associated with ADHD57. In a genetic multivariable regression performed with GenomicSEM58, we observed that the relationship between the genetic components of ADHD and AOW remained significant after conditioning on educational attainment (standardized β = −0.160, 95% CI = −0.248, −0.072, P = 3.8 × 10−4), while the conditional standardized association between educational attainment and AOW was non-significant (β = 0.038, 95% CI = −0.027, 0.103, P = 0.246; Supplementary Fig. 16).
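The logic of conditioning one genetic correlation on another can be illustrated with the standard two-predictor formula for standardized partial regression coefficients. This is only the point-estimate algebra and not the GenomicSEM procedure, which also models sampling covariance. The genetic correlation between ADHD and educational attainment used below (−0.5) is a hypothetical value chosen for illustration and is not reported in this section; the function name is likewise hypothetical.

```python
def conditional_betas(r_y1, r_y2, r_12):
    """Standardized partial regression of y on two correlated predictors."""
    denom = 1.0 - r_12 ** 2
    b1 = (r_y1 - r_y2 * r_12) / denom
    b2 = (r_y2 - r_y1 * r_12) / denom
    return b1, b2


RG_ADHD_EA = -0.5  # hypothetical value, not reported in this section

# rg(AOW, ADHD) = -0.180 and rg(AOW, educational attainment) = 0.119
# are taken from the text.
b_adhd, b_ea = conditional_betas(-0.180, 0.119, RG_ADHD_EA)
```

With this illustrative input the ADHD coefficient stays close to its unconditional value while the educational attainment coefficient shrinks towards zero, the same qualitative pattern as reported; this should not be read as a reproduction of the published estimates.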
We applied MiXeR univariate and bivariate Gaussian mixture modelling59, which calculates the polygenicity of AOW, defined as the number of SNPs that explain 90% of the h2SNP, and the genetic overlap between AOW and other phenotypes, including SNPs of both concordant and discordant effect directions. We applied bivariate mixture modelling to AOW with all other phenotypes with which there was a significant genetic correlation as calculated by LDSC after correction for multiple testing (based on Fig. 4a). In terms of Akaike information criterion (AIC) fit, we found support for the bivariate MiXeR models that estimated the optimal polygenic overlap between AOW and childhood and adult BMI, educational attainment, cognitive performance, ADHD and folding index (see Fig. 4b; AIC and Bayesian information criterion (BIC) values for all correlated phenotypes are provided in Supplementary Table 15). These models were supported over the ‘minimal model’, which explains the observed LDSC models using the minimal amount of polygenic overlap possible.
The polygenicity of AOW was 11,857 SNPs, confirming the hypothesis that the inflation observed in the QQ plot could be explained by trait polygenicity (Supplementary Note A). MiXeR presents the genetic overlap between two traits as Venn diagrams (Fig. 4b). In terms of the proportion of the SNPs contributing to the polygenicity of AOW that overlap with other phenotypes investigated, the traits investigated that showed the most overlap were cognitive performance (91.07%), educational attainment (82.44%), adult BMI (77.38%) and ADHD (64.87%). Of these overlapping SNPs, the fractions of SNPs that had concordant directions of effect were 55.10% and 53.71% for educational attainment and cognitive performance, respectively. On the contrary, little SNP overlap, despite significant genetic correlation, was found with childhood BMI (11.80%, of which 36.44% was concordant) and folding index (15.84%, of which 58.72% was concordant). A summary of all bivariate MiXeR analysis results can be found in Supplementary Table 15.
Polygenic score association with brain measures at birth
In an exploratory analysis, we tested whether the PGS for AOW was associated with measurable differences in infant brain volume and gyrification at birth. We used neonatal T2 imaging data from a European subsample of 264 term-born infants (137 male, 127 female), acquired as part of the Developing Human Connectome Project (dHCP)60.
The effect of the AOW PGS on brain volume was investigated across the whole brain at the voxel level using log-Jacobian determinants, calculated using nonlinear deformation fields between participants and the dHCP neonatal standardized atlas. In the resultant maps, higher values represent brain regions that contracted during image registration (that is, had larger brain volumes), while smaller values represent volume reductions61. We performed a tensor-based morphometry analysis, applying a general linear model (GLM) and permutation testing for statistical inference. We found a significant positive correlation between the AOW PGS and regional brain volume in the right basal ganglia, right posterior thalamus, bilateral anterior thalami, bilateral cerebellum and cerebellar peduncles, pons, medulla, primary visual cortex and superior temporal sulcus after correcting for multiple comparisons and thresholding at a corrected P < 0.05 (Fig. 5). Increased brain volume in these regions was associated with a higher PGS (predisposing to later AOW).
Thresholding t-statistic image at t > 0.95 (two-sided statistical test). Significant voxels were overlaid on the 40-week neonatal brain template in sagittal, coronal and axial planes. White arrows indicate significant brain structures involved in motor control. N = 264.
To explore whether the correlation between gyrification and common genetic variation linked to AOW was present in newborns, we fit a GLM testing for a significant effect of AOW PGS on the mean gyrification index in the left and right hemisphere of the dHCP infants. We found a significant positive association between AOW PGS and gyrification index in both hemispheres in newborn brains (left hemisphere β = 83,517.30, CI = 14,869.22–152,165.39, P = 0.017; right hemisphere β = 83,839.82, CI = 18,552.48–149,127.15, P = 0.007).
Finally, for those infants that had also been assessed using the Bayley-III Scales of Infant and Toddler Development62 at 18-month-age follow-up (N = 217), we explored the relationship between scaled gross motor score and the AOW PGS using a multiple linear regression model. Data distribution was assumed to be normal, but this was not formally tested. We found that higher AOW PGS was significantly associated with lower Bayley’s gross motor score, indicating worse/possibly delayed gross motor skills (t(201) = −2.305, β = −0.161, s.e. = 0.070, P = 0.022).
Discussion
The reasons for the high variability in AOW in young children are poorly understood, yet this milestone is used widely as a clinical marker to index overall development, with absence of walking at age 18 months prompting clinical referral to a paediatrician for further assessment and investigation. The present study reveals that AOW is a heritable polygenic trait with significant aetiological links to later health outcomes. Moreover, we identified 11 independent genome-wide significant loci associated with AOW, one of which co-localized with eQTLs and was located in a gene associated with rare disorders that include delayed or absent walking. We discuss four main conclusions from these results.
Past models of gross motor skills, as well as neurodevelopment more generally, have put a primary emphasis on environmental factors such as nutrition10 and cultural factors9,63. Our first conclusion is that AOW is also associated with common genetic variants operating in the brain. Significantly enriched cell-type tissues were exclusively brain-based tissues; moreover, the strongest signals included tissues in the basal ganglia, cortex and cerebellum. In line with these findings, the polygenic score for AOW was associated with neonatal brain volume of the basal ganglia, thalami, medulla, pons and cerebellum. This is consistent with the known role of these brain areas in motor function11,64. Also supporting this first conclusion, we found that gene sets involved in AOW are also involved in the generation of neurons. Further, we observed that genes associated with AOW are enriched in the brain between 19 and 24 weeks post conception (Supplementary Fig. 14).
Our second conclusion is that the novel loci that were discovered here involve genes of highly plausible biological relevance to the onset of walking. We identify common variant associations with AOW at a locus overlying RB transcriptional corepressor like protein 2 (RBL2, ENSG00000103479, genomic locus 2 in Table 1). RBL2 is also associated with an autosomal recessive neurodevelopmental disorder (eponym Brunet–Wagner)65,66. Homozygous loss of RBL2 has been observed in five individuals across three families, each with a different allele65,66. Affected individuals had infantile hypotonia, severe developmental delay, delayed/absent walking, and were minimally verbal. Seizures were reported in three cases. Three cases had microcephaly (−2.4 s.d. to −4.7 s.d.), while two had normal head circumference (65th and 50th centiles) but cerebral atrophy on magnetic resonance imaging (MRI). Height was normal for two cases, unreported for one and low for two (3rd centile, −3.4 s.d.). In Balb/c mice, homozygous loss of Rbl2 is embryonic lethal with a disorganized neural tube and neuronal loss64 (see also Supplementary Note B).
The third conclusion from our results is that AOW is partly influenced by the same genetic variants that influence individual variability of other complex traits measured at later ages. We found that common genetic variation associated with AOW is partly overlapping with common genetic variation associated with cognitive performance and years in education, likelihood of ADHD and cortical folding index. We note that the direction of these associations was consistent in the three largest individual cohorts (MoBa, NTR and Lifelines; the fourth cohort, NSHD, was not well-powered for genetic correlation estimates) as well as in the meta-analysed results, indicating robust findings. Interestingly, MiXeR analyses showed that a large proportion of variants explaining the heritability in AOW were shared with educational attainment and cognitive performance, with more than half of these variants having concordant effects on the two phenotypes (which explains the overall positive genetic correlations obtained with the LDSC method shown in Fig. 4a). Thus, results indicated that genetic predispositions to later onset of walking also contribute to high cognitive performance and more educational attainment. It is interesting to note that nearly half of the overlapping SNPs between AOW and cognitive performance and academic achievement have discordant effects.
The negative genetic correlation between AOW and ADHD might be surprising when considering that, at the phenotypic level, delayed walking, rather than earlier walking, is associated with increased likelihood of developmental disorders4. However, the ability to walk requires practice and movement63, and infants with higher activity levels or shorter attention spans may, on average, move about more, thus gaining more practice in movement, muscle strengthening and training, ultimately resulting in earlier walking onset. Thus, attention and activity levels may influence motor system training in young children, and this may relate to what we are observing at the level of common genetic variation. In support of the hypothesis that shorter attention span and higher activity levels would be associated with earlier walking, a recent study of over 25,000 children from MoBa found that the ADHD polygenic score was associated with earlier walking18. Further, the ADHD polygenic score was associated with better gross motor skills, such as walking, climbing stairs and jumping, in 7,498 18-month-old children from the Avon Longitudinal Study of Parents and Children (ALSPAC)67. At the same time, it should be noted that in our study, the negative genetic correlation between AOW and ADHD, while significant, is still relatively modest in magnitude (rg = −0.180).
Research on the timing of milestones in prenatal brain development across humans, primates and other mammals shows that longer duration (more prolonged development) is associated with larger brain volumes, and in particular, enlargement of later developing brain structures68. In line with this, within humans we found that the polygenic score predisposing to later onset of walking is associated with larger volumes of neonatal brain areas involved in the motor domain (Fig. 5). In addition, we found that gene sets associated with AOW are involved in neurogenesis, and that expression of genes associated with AOW is enriched in the brain between 19 and 24 weeks post conception (Supplementary Fig. 14). Last, we found that later AOW is genetically correlated with increased cortical folding in adolescence and adulthood in areas involved in the somatosensory processing of movement (ROIs 1 and 5m in Glasser parcellation56, located in the primary somatosensory cortex), including higher-order somatosensory integration of the lower limb representation (ROI 24dd in the cingulate motor area) and motor planning concerning the whole body (ROI 6r in the premotor cortex). Taken together, these findings may suggest that for children with genetically influenced protracted subcortical neurogenesis in the prenatal period, cortical regions involved in more complex motor behaviours may take longer to specialize69. This results in a later onset of walking. Since advantages and costs to early walking might vary on the basis of the individual’s environmental conditions, wide individual differences in the duration of the sensitive period to learn to walk might be the result of the ability of human beings to adapt to their local environment70.
Current public health policy employs late walking (>18 months) as a red flag for developmental delay which typically triggers referral for clinical assessment aimed to identify the reason for a departure from the normal range of achievement of this milestone3. A better understanding of the entire variation of AOW and of its shared biology with later medically relevant phenotypes could help in more precise intervention planning. Future research should test whether adding AOW PGS to clinical variables and/or rare variant information could improve prediction models that could be applied clinically. Historical data suggest that the majority of late walkers do not have a medically recognized developmental disorder6. In light of our findings, future research should explore whether early walking may also be a useful red flag that may offer early information about likelihood of ADHD or learning difficulties.
Our final conclusion is that the genetic signal identified through our AOW GWAS captures genetic effects that directly influence the phenotype48. This was tested by the within-family polygenic score analyses on fraternal twin siblings in the NTR cohort. We found that the variance explained by the between-pair PGS was not significantly greater than that explained by within-pair association. If the variance explained by between-pair PGS had been much larger than the within-pair PGS, it would have indicated that some of the AOW signal was coming from genetic effects that play a role on the phenotype in an indirect way, via mechanisms such as gene–environment correlation, assortative mating and stochastic effects48. Our results offer evidence that the polygenic score is picking up on direct genetic effects.
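The between-pair and within-pair components referred to above can be illustrated with a minimal sketch. It assumes the common decomposition in which the between-pair component is the mean PGS of the two twins and the within-pair component is each twin's deviation from that mean. The function name and input layout are hypothetical and are not taken from the NTR analysis code.

```python
def decompose_pgs(pairs):
    """Split each twin's PGS into between-pair and within-pair components.

    pairs: list of (pgs_twin1, pgs_twin2) tuples (hypothetical input layout).
    Returns one (between, within) tuple per individual, in input order.
    """
    rows = []
    for pgs_a, pgs_b in pairs:
        pair_mean = (pgs_a + pgs_b) / 2.0  # between-pair part, shared by both twins
        rows.append((pair_mean, pgs_a - pair_mean))  # within-pair deviation, twin 1
        rows.append((pair_mean, pgs_b - pair_mean))  # within-pair deviation, twin 2
    return rows


if __name__ == "__main__":
    # Illustrative values only, not study data.
    for between, within in decompose_pgs([(1.0, 3.0), (-0.5, 0.5)]):
        print(between, within)
```

Regressing AOW jointly on the two components would then give separate between-pair and within-pair coefficients, which is the comparison described in the paragraph above.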
In our study design, we took a comprehensive approach to the phenotype and samples. Relevant samples were searched for using multiple database resources, research council websites and bibliographies. Samples were only included if they had a highly similar phenotype (AOW in months) and a sample size greater than 1,000 to ensure reliable effect sizes in individual samples. Nevertheless, the potential attrition and participation biases present in population cohorts should be considered in relation to our findings71,72. Although there is evidence that AOW can be reliably recalled by parents retrospectively by the child’s second1 and third18 birthday, we acknowledge that it was not possible to measure the reliability of this phenotype as recorded in the Lifelines cohort, where it was collected between the children’s 3 and 18 years of age. It is possible that the Lifelines measure included the largest measurement error of the four cohorts, in light of the later age at which parents recalled the AOW in their children (although AOW difference between cohorts was not correlated with the interval between AOW and parent report). Systematic regional/national differences in body size and cultural factors might explain these differences. However, by conducting the GWAS in each individual cohort and then meta-analysing the effects, our approach captures SNP effects on the trait variance that are not confounded by differences between cohorts. An important limitation of this study is that our meta-analysis only included Western European cohorts, as at the time of conducting the study, information on AOW was not available in other sufficiently large genotyped cohorts to have the statistical power for a GWAS. Extending this investigation to a more diverse population is a vital next step.
Future work could also test the degree to which genetic correlations with AOW vary locally across the genome, and furthermore, how they vary when conditioned on third variables to delineate genetic associations with AOW within specific genomic locations73.
In summary, we demonstrate that the high variability in age at onset of walking is partly due to common genetic variation, with approximately a quarter of the variability explained by common genetic variants. The genetic variants identified were plausible contributors to individual variability in motor behaviour, as they were previously associated with disorders that disrupt the development of walking. AOW was shown to be an important milestone that links genetically to a range of later health, educational and behavioural outcomes.
Methods
Inclusion and ethics statement
This study complies with all relevant ethics regulations. The research and the related secondary data analysis were approved by the Departmental Ethics Committee of the Psychological Science Department of Birkbeck, University of London, on 27 October 2020 (reference number 2021007). Each cohort received ethics approval by the local ethics review committee. The current research was not conducted in resource-poor settings. Researchers responsible for the cohort data management in each of the three European countries (Norway, the Netherlands, United Kingdom) were involved in the research process and consulted regarding authorship and relevant citations. No biological materials were shared for the purpose of this study. This study did not involve animal data.
MoBa and the related data collection was authorized by a licence from the Norwegian Data Protection Agency and an approval from the Regional Committees for Medical and Health Research Ethics (REK). MoBa is regulated by the Norwegian Health Registry Act. Informed consent was provided by all participating parents at recruitment. The current study was approved by REK (2016/1702). An external collaborator form was signed for accessing this dataset, and MoBa genotype and phenotype data were shared within a secure server in Norway, in accordance with Collaboration and Data Processor Agreement 20220801 between the Norwegian Institute of Public Health and Birkbeck College, University of London.
Informed consent for NTR was obtained from parents or guardians. The study was approved by the Central Ethics Committee on Research Involving Human Subjects of the VU University Medical Centre, Amsterdam, an Institutional Review Board certified by the US Office of Human Research Protections (IRB number IRB00002991 under Federal-wide Assurance FWA00017598; IRB/institute codes, NTR 03-180). No application for accessing this dataset was needed because for this study, only summary statistics and no personal data were shared between collaborating authors.
Participants in Lifelines gave written consent before physical examination. The study was conducted according to the principles of the Declaration of Helsinki and in accordance with the University Medical Center Groningen (UMCG) research code, and was approved by the Medical Ethics Committee of UMCG (document number METC UMCG METc 2007/152). Data were accessed in accordance with Material Transfer and/or Data Access Agreement OV19_0511 between Medische Biobank Noord Nederland B.V. for Lifelines and Birkbeck College, University of London.
For NSHD, the collection of blood samples and DNA information from the participants was approved by ethics approval reference MREC no. 98/2/121. No application for accessing this dataset was needed because for this study, only summary statistics and no personal data were shared between collaborating authors.
The Developing Human Connectome Project was approved by the UK Health Research Authority (Research Ethics Committee reference number: 14/LO/1169), and written parental consent was obtained in every case for imaging and open data release of the anonymized data. No application for accessing this dataset was needed because no personal data were shared between collaborating authors.
Samples
The meta-analysis was conducted using data from four birth cohort samples of European ancestry. Full details of the samples are provided in Supplementary Note A.
Analyses were preregistered on the Open Science Framework on 24 February 2023 (https://doi.org/10.17605/OSF.IO/M2QV3).
The Norwegian Mother, Father and Child Cohort Study
MoBa is a population-based pregnancy cohort study conducted by the Norwegian Institute of Public Health20,21. Participants were recruited from all over Norway from 1999–2008. The women consented to participation in 41% of the pregnancies. Blood samples were obtained from both parents during pregnancy and from mothers and children (umbilical cord) at birth74. The cohort includes ~114,500 children, 95,200 mothers and 75,200 fathers. The current study is based on version 12 of the quality-assured data files released for research in January 2019. Phenotype information used in this study (year of birth and sex of the participants) was obtained from the Medical Birth Registry (MBRN), a national health registry containing information about all births in Norway.
After post-imputation quality control, the MoBa dataset included 207,569 individuals, of whom 76,577 were children75. The final sample size of children from MoBa with European genetic ancestry and good-quality genotype and phenotype information included in the GWAS was 58,302 (28,456 females, 29,846 males).
Netherlands Twin Register
The NTR consists of twins, multiples and their family members. NTR twins and multiples were recruited into the register as newborns up to a few months after birth starting in 1987 (ref. 76). There were no exclusion criteria. Genotyping was performed on 7,392 individuals for whom there was parent-report data in infancy77. For NTR, 6,251 children (3,399 females, 2,852 males) with good-quality genotype and available phenotype data were included in the GWAS.
Lifelines
Lifelines is a multigenerational prospective population-based birth cohort study examining the health and health-related behaviours of 167,729 persons living in the North of the Netherlands. It employs a broad range of investigative procedures in assessing the biomedical, socio-demographic, behavioural, physical and psychological factors that contribute to the health and disease of the general population, with a special focus on multimorbidity and complex genetics23. Individuals aged 25 to 50 were recruited from the Northern region of the Netherlands between 2006 and 2013 and, during their first study visit, were asked for consent for the study team to approach family members with an invitation to participate. This included any children (≥6 months) of cohort members. Questionnaires about children were answered by parents on the basis of retrospective recollection. The final sample size of Lifelines children with good-quality phenotype and genotype data included in the GWAS was 3,415 (1,768 females, 1,647 males).
MRC National Study for Health and Development
NSHD is a population-based prospective birth cohort study whose participants were infants from single births born in England, Scotland and Wales during 1 week in March 1946 (N = 5,362) to women with husbands24. The dataset included 2,939 genotyped individuals whose DNA was collected at age 53 (ref. 78). The sample was roughly representative of the national population of the same age at the time according to a comparison with census data. The final NSHD GWAS sample size including children with available genotype and phenotype was 2,592 (1,295 females, 1,297 males).
Phenotype coding
In all samples, individuals whose AOW was less than 6 months or greater than 36 months were excluded as outside the normative range3. MoBa, NSHD and NTR all recorded AOW in months as an integer variable. In the Lifelines sample, age at onset of walking was measured on an ordinal scale, using bins of months of age at onset of walking. These were recorded using the midpoint for each age bin. The lower and upper bins (‘10 months or younger’ and ‘24 months or older’, respectively) were winsorized, recoding them to 10 and 24 months, respectively. The phenotype descriptives for each cohort are reported in Supplementary Table 1. Normality and spread of the phenotype data distribution were formally tested. All four cohorts met the assumptions of normality in terms of symmetry of the distribution (skewness = 0.43–0.91, see Supplementary Table 1). NSHD (kurtosis = 3.88), MoBa (kurtosis = 3.26) and, to a lesser degree, Lifelines (kurtosis = 1.33) showed a peaked distribution, different from NTR (kurtosis = −0.12). Histograms for the phenotype data distributions are reported in the Supplementary Notes (Supplementary Figs. 1, 3, 5 and 7).
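The coding rules described above can be sketched as follows. The 6 and 36 month exclusion limits and the recoding of the open-ended Lifelines bins to 10 and 24 months come from the text; the function names and the representation of a bin as a (low, high) pair are assumptions made for illustration, and the intermediate bin edges used in the example are hypothetical.

```python
def code_aow(months):
    """Return AOW in months, or None if it falls outside the 6-36 month range."""
    if months < 6 or months > 36:
        return None  # excluded as outside the normative range
    return months


def bin_to_months(low, high, floor=10.0, ceiling=24.0):
    """Convert an ordinal age bin to a single value in months.

    low/high: bin edges in months; None marks an open-ended bin
    (hypothetical representation of the questionnaire categories).
    """
    if low is None:   # '10 months or younger' is recoded to 10
        return floor
    if high is None:  # '24 months or older' is recoded to 24
        return ceiling
    return (low + high) / 2.0  # closed bins are coded at their midpoint


if __name__ == "__main__":
    print(code_aow(5), code_aow(13))
    print(bin_to_months(None, 10), bin_to_months(11, 12), bin_to_months(24, None))
```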
Genotyping, imputation and quality control
Pre- and post-imputation quality control (QC) and imputation procedures were conducted for each cohort following individual study protocols and according to a common standard operating procedure (https://osf.io/jyk6d/), which was based on the Rapid Imputation for COnsortias PipeLIne (RICOPILI) pipeline79. In all the individual cohorts, samples were excluded from the GWAS if they presented excess autosomal heterozygosity, mismatch between self-reported and genetic sex, XXY genotype and other aneuploidies, and individual genotyping rate <90% in line with established GWAS analysis pipelines79,80. Duplicate samples and samples whose genetically determined ancestry did not overlay with the European-ancestry cluster based on a reference panel were also excluded to minimize confounding due to population stratification. Autosomal SNPs were excluded from the GWAS if they had MAFs < 0.5%, Hardy–Weinberg equilibrium exact test at P < 1 × 10−6 and call-rate <98%. Full details of the pre- and post-imputation QC are provided in Supplementary Note A and Table 2.
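As a rough illustration of the variant-level thresholds listed above, the sketch below flags a single autosomal SNP for exclusion. It assumes that a SNP is dropped when it fails any one of the three criteria; the function and argument names are hypothetical, and the cohort pipelines themselves followed the standard operating procedure cited above rather than this code.

```python
def snp_passes_qc(maf, hwe_p, call_rate):
    """Return True if a SNP survives the three variant-level filters.

    maf: minor allele frequency as a proportion (0.005 = 0.5%)
    hwe_p: P value of the Hardy-Weinberg equilibrium exact test
    call_rate: proportion of samples with a called genotype
    """
    if maf < 0.005:       # MAF < 0.5%
        return False
    if hwe_p < 1e-6:      # HWE exact test P < 1 x 10^-6
        return False
    if call_rate < 0.98:  # call rate < 98%
        return False
    return True


if __name__ == "__main__":
    # Illustrative values only.
    print(snp_passes_qc(maf=0.12, hwe_p=0.4, call_rate=0.995))
    print(snp_passes_qc(maf=0.003, hwe_p=0.4, call_rate=0.995))
```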
Genome-wide association analyses
GCTA81 fastGWA82 was used for association analyses in MoBa, Lifelines and NTR. PLINK83 1.9 was used for association analyses in NSHD, where all related individuals (PI-HAT > 0.2) were excluded from the analysis and the sample size was too small to use fastGWA.
Association analyses of the AOW, as a continuous variable, were carried out using a mixed linear model. Each primary GWAS included the first 10 ancestry principal components as continuous covariates, and sex and genotyping batch as discrete covariates. MoBa included year of birth, and NTR and Lifelines included age at data collection as continuous covariates. NTR included two dummy variables for the genotyping platform as covariates. In MoBa, Lifelines and NTR, where fastGWA was used, a sparse (0.05 cut-off) genetic relatedness matrix was included in the model to account for relatedness in the sample.
GWAS analyses were performed for each of the samples using the whole dataset and also with the samples stratified by sex.
GWAS meta-analysis
Summary statistics QC was performed using the GWASinspector84 R package on each of the cohorts’ summary statistics separately. Variants were excluded if they (1) presented invalid or missing values in the chromosome, position, effect and other allele, beta, standard error columns, and duplicated alleles; (2) were monomorphic (with allele frequency of 0 or 1 and variants with identical alleles), allosomal or mitochondrial; or (3) had imputation quality score <0.8. Results of the summary statistics QC are provided in Supplementary Note A and Table 3.
Summary statistics for the four samples were meta-analysed with a standard-error-weighted meta-analysis in METAL28 on SNPs with MAF > 1%. SNPs were matched between cohorts using rsIDs, which had been assigned according to their chromosome, base-pair positions and alleles on the basis of the 1000 Genomes85 reference panel in GWASinspector. Meta-analyses were performed separately for the whole sample and for sex-stratified samples. Finally, only SNPs with a minimum sample size of 10,000 (that is, SNPs available in the MoBa sample, in all three other cohorts, or in all four cohorts) were retained for further analyses (6,902,401 variants). The I2 heterogeneity metric per SNP was calculated in METAL. M multi-SNP heterogeneity statistics, indicating whether individual studies were systematically more influential or weaker than average based on their effects, were calculated using the getmstatistic R package for the independent lead SNPs (pairwise LD r2 < 0.1, P < 5 × 10−8, N SNPs = 16)31.
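For readers unfamiliar with standard-error weighting, the sketch below applies the usual inverse-variance fixed-effect formulas to one SNP and derives I2 from Cochran's Q. It is a minimal illustration of the textbook formulas, not METAL's source code; the function name and the example effect sizes are hypothetical.

```python
import math


def ivw_meta(betas, ses):
    """Inverse-variance weighted fixed-effect meta-analysis for one SNP.

    betas, ses: per-cohort effect estimates and standard errors.
    Returns (pooled_beta, pooled_se, i_squared_percent).
    """
    weights = [1.0 / se ** 2 for se in ses]  # weight = 1 / SE^2
    weight_sum = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / weight_sum
    se = math.sqrt(1.0 / weight_sum)
    # Cochran's Q: weighted squared deviations from the pooled estimate
    q = sum(w * (b - beta) ** 2 for w, b in zip(weights, betas))
    df = len(betas) - 1
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return beta, se, i2


if __name__ == "__main__":
    # Illustrative values only, not study data.
    beta, se, i2 = ivw_meta([0.10, 0.14, 0.08, 0.12], [0.02, 0.06, 0.08, 0.09])
    print(round(beta, 4), round(se, 4), round(i2, 1))
```

Because the weights are the inverse squared standard errors, the largest cohort dominates the pooled estimate, which is consistent with the sample-size rule above being satisfied by MoBa alone.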
Fine mapping and functional annotation
To identify significant independent SNPs associated with AOW at each locus at a P-value threshold of P < 5 × 10−8 (ref. 86), we conducted conditional and joint association analyses (COJO)29 in GCTA81. This analysis conditions on the lead SNP at a locus and tests for further independent significant SNPs within the same chromosome using a stepwise selection procedure. The MoBa genotype data were used to estimate linkage disequilibrium (LD), in line with the COJO guidelines.
Fine mapping, functional annotation and gene-based analyses were carried out in FUMA33 (v.1.5.2) and MAGMA37 (v.1.08), indicating the list of independent lead SNPs from the COJO analysis. We defined significant SNPs to be independent if they had pairwise LD r2 < 0.6. Lead SNPs were defined as having pairwise LD r2 < 0.1 (ref. 87). Loci were merged if LD blocks distance was <250 kb.
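The locus-merging rule (LD blocks closer than 250 kb are merged) amounts to a simple interval merge per chromosome. The sketch below is an illustrative approximation under that assumption and is not FUMA's implementation; the function name and the tuple layout are hypothetical.

```python
def merge_loci(blocks, max_gap=250_000):
    """Merge LD blocks on the same chromosome that lie < max_gap bp apart.

    blocks: list of (chromosome, start_bp, end_bp) tuples (hypothetical layout).
    Returns the merged loci sorted by chromosome and start position.
    """
    merged = []
    for chrom, start, end in sorted(blocks):
        same_chrom = bool(merged) and merged[-1][0] == chrom
        if same_chrom and start - merged[-1][2] < max_gap:
            prev_chrom, prev_start, prev_end = merged[-1]
            merged[-1] = (prev_chrom, prev_start, max(prev_end, end))  # extend locus
        else:
            merged.append((chrom, start, end))  # start a new locus
    return merged


if __name__ == "__main__":
    # Illustrative coordinates only.
    print(merge_loci([(1, 100, 200), (1, 100_000, 150_000), (1, 500_000, 600_000)]))
```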
For gene-mapping in FUMA, SNPs were mapped to genes at a maximum distance of 1 Mb33 on the basis of position, eQTL for selected relevant tissues such as the brain, lung, muscles, heart and adipose tissue, and chromatin interaction in the brain (see Supplementary Table 6). Annotation of genes was performed using ANNOVAR within FUMA (date of download 17 July 2017).
A subset of genes prioritized on the basis of mapping using only significant SNP–gene pairs at an FDR corrected P < 0.05 were tested for differential expression in 54 Genotype-Tissue Expression (GTEx) (v.8)35 and 11 BrainSpan34 tissues, and gene-set enrichment using GENE2FUNC in FUMA. The gene-set analysis in FUMA used one-sided hypergeometric tests to test whether the prioritized genes were over-represented in predefined gene sets obtained from the Molecular Signatures Database88,89 (MSigDB) v.7.0, WikiPathways90 (v.20191010) and GWAS Catalog91 (v.e0_r2022-11-29) databases, after excluding the MHC region and applying Bonferroni correction for multiple testing.
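The one-sided hypergeometric test used for gene-set over-representation can be written out directly. The sketch below computes the upper-tail probability for a single gene set and applies a Bonferroni adjustment across gene sets. It illustrates the statistical test only and is not FUMA's code; the function names and all counts in the example are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(overlap, n_prioritized, set_size, n_background):
    """Upper-tail hypergeometric P value, P(X >= overlap).

    overlap: prioritized genes that fall in the gene set
    n_prioritized: number of prioritized genes tested
    set_size: number of background genes in the gene set
    n_background: total number of background genes
    """
    total = comb(n_background, n_prioritized)
    upper = min(n_prioritized, set_size)
    tail = sum(
        comb(set_size, i) * comb(n_background - set_size, n_prioritized - i)
        for i in range(overlap, upper + 1)
    )
    return tail / total


def bonferroni(p_value, n_tests):
    """Bonferroni-adjusted P value, capped at 1."""
    return min(1.0, p_value * n_tests)


if __name__ == "__main__":
    # Illustrative counts only.
    p = hypergeom_enrichment_p(overlap=8, n_prioritized=100, set_size=300, n_background=18_000)
    print(p, bonferroni(p, n_tests=5_000))
```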
For MAGMA analyses, the MHC region was excluded and SNPs within 1 kb from a gene were assigned to each gene87. The MAGMA gene-based test identified genes associated with AOW from all 18,766 mapped genes using a Bonferroni correction to define statistical significance (Supplementary Table 9). The MAGMA gene-property analysis used 53 GTEx (v.8)35 and 11 BrainSpan34 RNA-seq datasets to test tissue specificity of genes associated with AOW, based on association one-tailed P values of all 18,766 genes mapped in FUMA.
Co-localization
We used coloc SuSiE43 to identify co-localization of GWAS and eQTL signals, using an LD reference panel of 1,444,196 HapMap3 SNPs with LD calculated in European-ancestry individuals from the UK Biobank92,93. Pairs of variants further than 3 cM apart were assumed to have 0 correlation. We used coloc SuSiE’s default priors (for more information on how these priors were estimated, see ref. 94). The eQTL data used in the co-localization analyses were from 261 post-mortem bulk RNA-seq samples of human cerebellum42. We replicated the co-localization signal observed in RBL2 (Fig. 2b) in the human cortex using eQTL data from 1,433 post-mortem bulk RNA-seq samples42 (Supplementary Fig. 15). To validate in an independent dataset whether genotype was indeed associated with RBL2 expression, we used bulk RNA-seq data of prefrontal cortex and individual-level genotypes from BrainVar44 (periods 4–6; Fig. 2d) (as no publicly available cerebellum RNA-seq with genotype on the same individual exists, to our knowledge). We used a two-sided Wilcoxon rank test to test for differences in RBL2 expression in the human cortex by genotype for GWAS and eQTL significant SNPs at MAF ≈ 50%. Missense variants in the chromosome 16 locus were annotated using the Variant Effect Predictor (VEP)45. The protein structure for RBL2 was predicted using AlphaFold95. Annotation of p.Tyr210Cys on RBL2 was done using the Genomics 2 Proteins Portal96.
LD score regression
LD score regression (LDSC25) was used to calculate h2SNP and bivariate genetic correlations30, using the 1000 Genomes Phase 3 (ref. 85) European-ancestry LD scores reference panel. Bivariate genetic correlations were calculated between AOW and multiple infant, psychiatric, neurodevelopmental and global cortical phenotypes, specifically: birth weight97, childhood body-mass index (cBMI)49, adult BMI50, autism98, ADHD51, educational attainment (EA)52, cognitive performance53, schizophrenia99, general loading for psychiatric disorders (cross-disorders)100, major depression101, bipolar disorder57 and 13 cortical phenotypes54 (see Fig. 4a). Genetic correlation was also calculated between the AOW in each of the cohorts.
In addition, LDSC was used to calculate h2SNP for the female and male meta-GWAS and the genetic correlation between the sex-stratified analyses. Statistical significance was evaluated on the basis of 95% confidence intervals as preregistered. As post-hoc analyses, which were not preregistered, we also used LDSC to test the genetic correlation between AOW and four other motor phenotypes: self-reported walking pace55, clinically ascertained muscle weakness in the pincer grip in elderly people102, motor coordination in childhood103 and Parkinson’s disease104. Bonferroni-adjusted P values correcting for 28 tests are reported in Supplementary Table 14.
To further investigate the significant genetic correlation between AOW and cortical folding index (FI), we ran non-preregistered genetic correlation analyses using 26 regional FI summary statistics from ref. 54. The 26 ROIs were defined following the Glasser parcellation and identified on the basis of their functional specialization as early somatosensory/motor areas according to ref. 56. Given that regional FI could not be assumed to be completely unrelated, we applied FDR correction for 26 simultaneous tests.
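The text does not state which FDR procedure was applied to the 26 regional tests. The sketch below assumes the Benjamini-Hochberg step-up procedure, which is a common choice; the function name is hypothetical and the P values in the usage note are illustrative only.

```python
def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted P values (assumed FDR procedure).

    Each P value is multiplied by m / rank, and monotonicity is enforced
    by taking a running minimum from the largest P value downwards.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

Calling `benjamini_hochberg` on a list of 26 regional P values returns 26 adjusted values in the original order, which can then be compared against the chosen FDR threshold.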
Stratified LDSC40 was conducted to obtain estimates of heritability partitioned by functional annotation and cell-type. HapMap3 (ref. 105) SNPs (excluding the HLA region) from the meta-GWAS summary statistics weighted by LD score obtained from a European 1000 Genomes85 reference panel were used in the regression, as recommended by ref. 40. To estimate the proportion of genome-wide h2SNP attributable to functional categories, we ran the stratified LDSC ‘full baseline model’ (described in ref. 40) that evaluates whether heritability in a functional category is greater than heritability outside the category. This was tested for 96 functional categories provided by the stratified LDSC developers, including coding, untranslated regions, promoter and intron annotations from UCSC106, genomic annotations for all cell types and fetal cell types only from ENCODE107 and the Roadmap Epigenomics Consortium108, regions conserved in mammals from ref. 109 and FANTOM5 enhancers from ref. 110. The P value for enrichment was adjusted for multiple testing using the Bonferroni method, as in similar previous research111.
To calculate whether heritability was enriched in specific cell types, we applied stratified LDSC to 53 sets of specifically expressed genes41 using multitissue gene expression data from the GTEx35 project. Bonferroni correction was applied to correct for multiple testing.
Genomic Structural Equation Modelling (SEM)
A non-preregistered Genomic SEM58 analysis was conducted to test whether the association of the genetic components of AOW with ADHD remained significant after conditioning on educational attainment. To this end, we performed a genetic multivariable regression using the same ADHD51 and EA52 summary statistics that were entered in the LDSC analysis. For ADHD, the sample size was defined as the effective sample size Neff = 4 × ν × (1 − ν) × (Ncases + Ncontrols), where ν was the sample prevalence, set to 50%, as indicated by the Genomic SEM developers (https://github.com/GenomicSEM/GenomicSEM/wiki/2.-Important-resources-and-key-information). The summary statistics were munged using HapMap3 SNPs. Both standardized and unstandardized results are reported in Supplementary Fig. 16.
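As a worked illustration of the effective sample size formula given above, the following sketch evaluates it for hypothetical case and control counts. The function name and the counts are assumptions for illustration and are not taken from the ADHD GWAS.

```python
def effective_n(n_cases, n_controls, v=0.5):
    """Effective sample size, Neff = 4 * v * (1 - v) * (n_cases + n_controls).

    v is the sample prevalence; the default of 0.5 follows the text above.
    """
    return 4.0 * v * (1.0 - v) * (n_cases + n_controls)


# Hypothetical counts, for illustration only.
n_cases, n_controls = 1000, 3000
neff_half = effective_n(n_cases, n_controls)
neff_observed = effective_n(n_cases, n_controls, v=n_cases / (n_cases + n_controls))
```

With ν = 0.5 the factor 4ν(1 − ν) equals 1, so the formula returns the total sample size. If ν is instead taken as the observed case fraction, the result is algebraically identical to 4/(1/Ncases + 1/Ncontrols), the form used for the MiXeR analyses.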
MiXeR
Univariate causal mixture models were applied using MiXeR59 to obtain estimates of polygenicity, defined as the proportion of variants that contribute to 90% of the h2SNP112. We fitted bivariate models in MiXeR to estimate the genetic overlap that was due to both concordant and discordant SNP effects between AOW and six other phenotypes that had a Bonferroni-significant genetic correlation with AOW (calculated using LDSC). For each pair of traits, the models were evaluated using differential BIC and AIC values between the ‘best’ bivariate model estimating the optimal amount of polygenic overlap between the two traits (grey areas in Fig. 4b) and two simpler models, namely, the ‘minimum’ and the ‘maximum’ overlap models. The ‘minimum’ overlap models used only the minimum number of SNPs to explain the genetic overlap from the LDSC genetic correlation estimate, while the ‘maximum’ overlap models assumed that all the variants associated with the least polygenic of the two traits overlapped with the other trait. Positive differential BIC and AIC values indicated that the ‘best’ MiXeR bivariate model outperformed the two simpler models. When the summary statistics for the second phenotype in these bivariate analyses came from a case-control GWAS, the Neff was calculated as 4/(1/Ncases + 1/Ncontrols). The MHC region (6:26,000,000–34,000,000) was excluded from MiXeR analyses due to its complex LD structure, in line with the programme recommendations. MiXeR v.1.3 was used for these analyses, and the data were prepared using scripts developed by the programme’s authors (https://github.com/precimed/python_convert).
We considered the bivariate MiXeR model to be supported when the differential AIC value comparing the ‘best’ vs ‘minimum’ overlap model was positive. This criterion ensures that the data support the polygenic overlap model despite the additional free parameters it includes.
Polygenic score analysis
Polygenic scores were calculated using PRS-cs113,114; a leave-one-out design was employed whereby additional GWAS meta-analyses were conducted, leaving out one of each of the smaller samples (NSHD, NTR and Lifelines) in turn to be used as a target dataset and meta-analysing the remaining samples as a training dataset for estimation of SNP weights. The MoBa sample comprises most of the overall sample size and thus could not be used as a target dataset, so a within-MoBa cross-validation was employed. The MoBa dataset was split randomly into five samples of roughly equal size by removing one-fifth of the data in turn (with no overlap in these fifths) from the whole dataset to create five new samples, each comprising four-fifths of the data. GWASs were then conducted on each of these five new samples and the summary statistics of the meta-analysis of four samples used for estimation of PGS SNP weights applied to the left-out fifth of the data. This was performed five times, using each of the fifths as target data in turn.
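The within-MoBa cross-validation can be pictured as a five-fold partition in which each fifth serves once as the target set and the remaining four-fifths form the training set. The sketch below shows only that bookkeeping, under the assumption of a simple random shuffle; the function name, the seed and the integer sample identifiers are hypothetical, and no GWAS or meta-analysis step is performed.

```python
import random


def five_fold_splits(sample_ids, n_folds=5, seed=0):
    """Return a list of (training_ids, target_ids) pairs.

    The samples are shuffled once, divided into n_folds non-overlapping
    folds of roughly equal size, and each fold is used once as the target.
    """
    ids = list(sample_ids)
    random.Random(seed).shuffle(ids)
    folds = [ids[i::n_folds] for i in range(n_folds)]
    splits = []
    for k, target in enumerate(folds):
        training = [s for j, fold in enumerate(folds) if j != k for s in fold]
        splits.append((training, target))
    return splits


# Hypothetical identifiers, for illustration only.
splits = five_fold_splits(range(103))
```

Each training set would be used to estimate SNP weights, which are then applied to the corresponding left-out target fifth, so that no individual contributes to the weights used to score them.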
For all leave-one-out PGS analyses, including the within-MoBa design, we derived weights for each chromosome using the 1000 Genomes phase 3 European panel85 as a reference for LD, and the following PRS-cs parameters: parameter a and b in the gamma-gamma prior = 1 and 0.5, respectively, global shrinkage parameter phi = 0.01, 1,000 MCMC iterations, 500 burn-ins and 5 as a thinning factor of the Markov chain. PLINK (2.0)115 was used to compute the PGS in the target sample. The proportion of variance explained by the PGS, scaled so that mean = 0 and s.d. = 1, was quantified in the NTR cohort by the squared beta-coefficient from a linear regression model between the scaled phenotype and the PGS, including 10 ancestry principal components (PCs), age, sex and genotyping platform in the model, and quantified in all other cohorts with adjusted R2 of the linear regression between the scaled phenotype regressed on 10 PCs and the genotype batch and the PGS.
Within- and between-family polygenic score analysis
Within- and between-family analyses were performed using the NTR cohort dataset. The method is described in ref. 48 and scripts from ref. 116 were used (https://github.com/PerlineDemange/GeneticNurtureNonCog/).
A PGS was generated from a meta-analysis of the MoBa, Lifelines and NSHD GWAS (calculated as above), and the predictive power of this PGS was quantified in the whole NTR sample using the above method. We used a random intercept mixed-effects linear model in R using the dizygotic twins-only subsample of NTR (N = 2,508 individuals in 1,254 twin pairs), after ensuring that a mixed-effects model was justified by calculating a bootstrapped intraclass correlation (ICC = 0.656) as indicated in ref. 48. PGS entered into the model were first scaled to mean = 0 and s.d. = 1. Within-family PGS effects were calculated by subtracting the family mean PGS from each individual PGS. Between-family effects were modelled using the mean PGS for each family. The linear model included age, sex, the first 10 PCs and a genotyping platform dummy variable as covariates. The within- and between-family standardized regression coefficients were compared using a χ2 test.
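The decomposition of the PGS into between- and within-family components can be sketched as below. This is an illustration of the two predictors only, not the scripts from ref. 116: the function names and example values are hypothetical, the sample standard deviation is assumed for scaling, and the mixed-effects model itself is not fitted here.

```python
from collections import defaultdict
from statistics import mean, stdev


def standardize(values):
    """Scale values to mean 0 and s.d. 1 (sample s.d., an assumption)."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def decompose_pgs(family_ids, pgs):
    """Split a standardized PGS into between- and within-family parts.

    between : the family mean PGS, repeated for every family member
    within  : each individual's deviation from their family mean
    """
    z = standardize(pgs)
    by_family = defaultdict(list)
    for fam, score in zip(family_ids, z):
        by_family[fam].append(score)
    family_mean = {fam: mean(scores) for fam, scores in by_family.items()}
    between = [family_mean[fam] for fam in family_ids]
    within = [score - family_mean[fam] for fam, score in zip(family_ids, z)]
    return between, within


# Two hypothetical twin pairs, for illustration only.
families = ["A", "A", "B", "B"]
scores = [1.0, 3.0, 2.0, 6.0]
between, within = decompose_pgs(families, scores)
```

Both components would then enter the random-intercept model as separate predictors alongside the covariates, and their coefficients would be compared.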
Polygenic score in the Developing Human Connectome Project
Genetic data
Infant saliva DNA was genotyped for SNPs genome-wide on the Illumina Infinium Omni5-4 array and standard quality control was performed. The dataset was imputed to the Haplotype Reference Consortium reference panel117 on the Michigan Imputation Server. The imputed data were used to compute an AOW PGS for each of the 264 unrelated European infants using summary statistics from the AOW meta-GWAS and the PRS-cs software113, as previously described.
Acquisition, processing and surface generation of imaging data
T2-weighted MRI data were acquired at term-equivalent age (median postmenstrual age = 41.9 weeks) as part of the dHCP60 in 264 term-born infants (137 male, 127 female) with available genotype data. The volumes were run through the neonatal-specific processing pipeline developed for the dHCP study, including bias field correction, brain extraction and image segmentation118,119,120. Segmentations were used to generate cortical, white matter and pial surfaces, and each subject was visually inspected to ensure accuracy before the local gyrification index was calculated at each vertex on the basis of the ratio of the pial and white matter surface areas121,122.
Image registration
T2 images were registered to the 40-week dHCP neonatal atlas (https://brain-development.org/brain-atlases/atlases-from-the-dhcp-project/)123 via an age-matched intermediate using Symmetric Diffeomorphic Image Registration, implemented in Advanced Normalization Tools (ANTs)124,125. As a measure of individual variation in brain volume, log-Jacobian determinant images were calculated by applying ANTs algorithms to the nonlinear transformation deformation tensor fields. Log-Jacobian maps were then smoothed using a 3-mm full-width half-maximum Gaussian filter and downsampled to 1-mm isotropic resolution (to increase computational efficiency). A 4D volume was created by merging the 1-mm log-Jacobian maps across all participants (N = 264) and was subsequently used as the input to the randomise algorithm (described below).
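Smoothing kernels are reported here as full-width half-maximum (FWHM), whereas many image-processing tools expect the Gaussian standard deviation. The conversion is a standard identity and can be sketched as follows; the function names are hypothetical and the snippet is not part of the dHCP pipeline.

```python
import math


def fwhm_to_sigma(fwhm_mm):
    """Convert a Gaussian FWHM to its standard deviation (same units).

    Uses FWHM = 2 * sqrt(2 * ln 2) * sigma.
    """
    return fwhm_mm / (2.0 * math.sqrt(2.0 * math.log(2.0)))


def gaussian_relative_height(distance_mm, sigma_mm):
    """Height of a Gaussian relative to its peak at a given distance."""
    return math.exp(-(distance_mm ** 2) / (2.0 * sigma_mm ** 2))


sigma_mm = fwhm_to_sigma(3.0)  # roughly 1.27 mm for a 3-mm FWHM kernel
```

By construction, the kernel falls to half its peak height at a distance of FWHM/2 from the centre, which is 1.5 mm for the 3-mm filter.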
Tensor-based morphometry of imaging data
Permutation testing using the randomise function, part of the FMRIB Software Library (FSL)126,127, was used with a general linear model, including gestational age, postmenstrual age at scan, sex, weight z-score and 10 ancestral PCs as covariates. Threshold-free cluster enhancement and family-wise error (FWE) rate were applied to correct for multiple comparisons between voxels. Significant areas were identified with permutation testing using 5,000 random permutations (two-sided test). In Fig. 5, we show results at a significance level of P < 0.05 in the FWE-corrected contrast.
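The logic of a two-sided permutation test can be illustrated for a single association, for example between a PGS and the log-Jacobian value at one voxel. The sketch below is a deliberate simplification and not the FSL procedure: it ignores covariates, threshold-free cluster enhancement and FWE correction across voxels, and all function names and data are hypothetical.

```python
import random


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def permutation_p_value(x, y, n_permutations=5000, seed=0):
    """Two-sided permutation P value for the correlation of x and y.

    y is shuffled n_permutations times; the P value is the share of
    permuted |r| values at least as large as the observed |r|, with the
    observed statistic counted as one permutation.
    """
    rng = random.Random(seed)
    observed = abs(pearson_r(x, y))
    shuffled = list(y)
    exceed = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if abs(pearson_r(x, shuffled)) >= observed:
            exceed += 1
    return (exceed + 1) / (n_permutations + 1)
```

Counting the observed statistic in both numerator and denominator keeps the estimated P value strictly above zero, so the smallest attainable value with 5,000 permutations is 1/5,001.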
Bayley’s gross motor analysis
For the European term-born infants in the dHCP cohort who were assessed using the Bayley-III Scales of Infant and Toddler Development at an 18-month follow-up (N = 217), we investigated the association between the scaled gross motor score and the PGS for AOW using a multiple linear regression model, implemented using the lm function in R (https://www.r-project.org/). The model included sex, gestational age at birth, birth weight z-score, home environment score (as a proxy for socioeconomic status) and 10 ancestral PCs as covariates to account for potential confounding. All continuous variables were standardized before analysis. Data distribution was assumed to be normal, but this assumption was not formally tested.
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
Data availability
The summary statistics of the genome-wide association study of age at onset of walking are available on figshare (https://doi.org/10.6084/m9.figshare.28071566)128. Data from the Norwegian Mother, Father and Child Cohort (MoBa) Study and the Medical Birth Registry of Norway used in this study are managed by the National Health Register Holders in Norway (Norwegian Institute of Public Health) and can be made available to researchers, with approval from the Regional Committees for Medical and Health Research Ethics (REC), compliance with the EU General Data Protection Regulation (GDPR) and approval from the data owners. The consent given by the participants is not open to storage of data on an individual level in repositories or journals. Researchers who want access to datasets for replication should apply through https://helsedata.no/. Access to datasets requires approval from The Regional Committee for Medical and Health Research Ethics in Norway and an agreement with MoBa. Data from the Netherlands Twin Register (NTR) are available upon request by researchers. Information is available at https://tweelingenregister.vu.nl/information_for_researchers/working-with-ntr-data. Lifelines data may be obtained from a third party and are not publicly available. Researchers can apply to use the Lifelines data used in this study. More information about how to request Lifelines data and the conditions of use can be found on their website at https://www.lifelines-biobank.com/researchers/working-with-us. National Survey of Health and Development (NSHD) data used in this publication are available to bona fide researchers upon request to the NSHD Data Sharing Committee via a standard application procedure. Further details can be found at http://www.nshd.mrc.ac.uk/data (https://doi.org/10.5522/NSHD/Q101). eQTL results for the ROSMAP, Mayo TCX, Mayo CER and cortical meta-analysis from ref. 42 are available through the AMP-AD Knowledge Portal: https://www.synapse.org/Synapse:syn2580853/wiki/409840. The accession number for the raw RNA-seq and WGS data from BrainVar, along with processed files, is PsychENCODE Knowledge Portal: syn21557948 on Synapse.org (https://www.synapse.org/#!Synapse:syn4921369). Developing Human Connectome project data are open access and data are available for download via https://nda.nih.gov/edit_collection.html?id=3955.
References
Langendonk, J. M. et al. Assessment of motor milestones in twins. Twin Res. Hum. Genet. 10, 835–839 (2007).
Help your child grow and thrive. Centers for Disease Control and Prevention https://www.cdc.gov/ncbddd/actearly/index.html (2025).
WHO Multicentre Growth Reference Study Group WHO Motor Development Study: windows of achievement for six gross motor development milestones. Acta Paediatr. 95, 86–95 (2006).
Bowler, A., Arichi, T., Austerberry, C., Fearon, P. & Ronald, A. A systematic review and meta-analysis of the associations between motor milestone timing and motor development in neurodevelopmental conditions. Neurosci. Biobehav. Rev. 167, 105825 (2024).
Kim, S. W. et al. Clinical characteristics of developmentally delayed children based on interdisciplinary evaluation. Sci. Rep. 10, 8148 (2020).
Chaplais, J. D. & Macfarlane, J. A. A review of 404 ‘late walkers’. Arch. Dis. Child. 59, 512–516 (1984).
Adolph, K. E., Vereijken, B. & Shrout, P. E. What changes in infant walking and why. Child Dev. 74, 475–497 (2003).
Hadders-Algra, M. Early human motor development: from variation to the ability to vary and adapt. Neurosci. Biobehav. Rev. 90, 411–427 (2018).
Karasik, L. B. & Robinson, S. R. Milestones or millstones: how standard assessments mask cultural variation and misinform policies aimed at early childhood development. Policy Insights Behav. Brain Sci. 9, 57–64 (2022).
Kariger, P. K. et al. Iron deficiency and physical growth predict attainment of walking but not crawling in poorly nourished Zanzibari infants. J. Nutr. 135, 814–819 (2005).
Fazl, A. & Fleisher, J. Anatomy, physiology, and clinical syndromes of the basal ganglia: a brief review. Semin. Pediatr. Neurol. 25, 2–9 (2018).
Adolph, K. E., Hoch, J. E. & Cole, W. G. Development (of walking): 15 suggestions. Trends Cogn. Sci. 22, 699–711 (2018).
Adolph, K. E. & Tamis-Lemonda, C. S. The costs and benefits of development: the transition from crawling to walking. Child Dev. Perspect. 8, 187–192 (2014).
The Global Research on Developmental Disabilities Collaborators Accelerating progress on early childhood development for children under 5 years with disabilities by 2030. Lancet Glob. Health 10, e438–e444 (2022).
Austerberry, C., Mateen, M., Fearon, P. & Ronald, A. Heritability of psychological traits and developmental milestones in infancy. JAMA Netw. Open 5, e2227887 (2022).
Smith, L. et al. Genetic and environmental influences on developmental milestones and movement: results from the Gemini Cohort Study. Res. Q. Exerc. Sport 88, 401–407 (2017).
Serdarevic, F. et al. Polygenic risk scores for developmental disorders, neuromotor functioning during infancy, and autistic traits in childhood. Biol. Psychiatry 87, 132–138 (2020).
Hannigan, L. J. et al. Developmental milestones in early childhood and genetic liability to neurodevelopmental disorders. Psychol. Med. 53, 1750–1758 (2023).
Ronald, A. & Gui, A. The potential and translational application of infant genetic research. Nat. Genet. 56, 1346–1354 (2024).
Magnus, P. et al. Cohort profile: the Norwegian Mother and Child Cohort Study (MoBa). Int. J. Epidemiol. 35, 1146–1150 (2006).
Magnus, P. et al. Cohort profile update: the Norwegian Mother and Child Cohort Study (MoBa). Int. J. Epidemiol. 45, 382–388 (2016).
Willemsen, G. et al. The Netherlands Twin Register biobank: a resource for genetic epidemiological studies. Twin Res. Hum. Genet. 13, 231–245 (2010).
Scholtens, S. et al. Cohort profile: LifeLines, a three-generation cohort study and biobank. Int. J. Epidemiol. 44, 1172–1180 (2015).
Wadsworth, M., Kuh, D., Richards, M. & Hardy, R. Cohort profile: the 1946 National Birth Cohort (MRC National Survey of Health and Development). Int. J. Epidemiol. 35, 49–54 (2006).
Bulik-Sullivan, B. et al. LD score regression distinguishes confounding from polygenicity in genome-wide association studies. Nat. Genet. 47, 291–295 (2015).
Winkler, T. W. et al. Quality control and conduct of genome-wide association meta-analyses. Nat. Protoc. 9, 1192–1212 (2014).
Yang, J. et al. Genomic inflation factors under polygenic inheritance. Eur. J. Hum. Genet. 19, 807–812 (2011).
Willer, C. J., Li, Y. & Abecasis, G. R. METAL: fast and efficient meta-analysis of genomewide association scans. Bioinformatics 26, 2190–2191 (2010).
Yang, J. et al. Conditional and joint multiple-SNP analysis of GWAS summary statistics identifies additional variants influencing complex traits. Nat. Genet. 44, 369–375 (2012).
Bulik-Sullivan, B. et al. An atlas of genetic correlations across human diseases and traits. Nat. Genet. 47, 1236–1241 (2015).
Higgins, J. P. T. & Thompson, S. G. Quantifying heterogeneity in a meta-analysis. Stat. Med. 21, 1539–1558 (2002).
Magosi, L. E., Goel, A., Hopewell, J. C. & Farrall, M. Identifying systematic heterogeneity patterns in genetic association meta-analysis studies. PLoS Genet. 13, e1006755 (2017).
Watanabe, K., Taskesen, E., van Bochoven, A. & Posthuma, D. Functional mapping and annotation of genetic associations with FUMA. Nat. Commun. 8, 1826 (2017).
Kang, H. J. et al. Spatio-temporal transcriptome of the human brain. Nature 478, 483–489 (2011).
Aguet, F. et al. The GTEx Consortium atlas of genetic regulatory effects across human tissues. Science 369, 1318–1330 (2020).
Ashburner, M. et al. Gene Ontology: tool for the unification of biology. Nat. Genet. 25, 25–29 (2000).
de Leeuw, C. A., Mooij, J. M., Heskes, T. & Posthuma, D. MAGMA: generalized gene-set analysis of GWAS data. PLoS Comput. Biol. 11, e1004219 (2015).
Martin, A. R. et al. PanelApp crowdsources expert knowledge to establish consensus diagnostic gene panels. Nat. Genet. 51, 1560–1565 (2019).
Banerjee-Basu, S. & Packer, A. SFARI Gene: an evolving database for the autism research community. Dis. Model. Mech. 3, 133–135 (2010).
Finucane, H. K. et al. Partitioning heritability by functional annotation using genome-wide association summary statistics. Nat. Genet. 47, 1228–1235 (2015).
Finucane, H. K. et al. Heritability enrichment of specifically expressed genes identifies disease-relevant tissues and cell types. Nat. Genet. 50, 621–629 (2018).
Sieberts, S. K. et al. Large eQTL meta-analysis reveals differing patterns between cerebral cortical and cerebellar brain regions. Sci. Data 7, 340 (2020).
Wallace, C. A more accurate method for colocalisation analysis allowing for multiple causal variants. PLoS Genet. 17, e1009440 (2021).
Werling, D. M. et al. Whole-genome and RNA sequencing reveal variation and transcriptomic coordination in the developing human prefrontal cortex. Cell Rep. 31, 107489 (2020).
McLaren, W. et al. The Ensembl Variant Effect Predictor. Genome Biol. 17, 122 (2016).
Adzhubei, I. A. et al. A method and server for predicting damaging missense mutations. Nat. Methods 7, 248–249 (2010).
Kircher, M. et al. A general framework for estimating the relative pathogenicity of human genetic variants. Nat. Genet. 46, 310–315 (2014).
Selzam, S. et al. Comparing within- and between-family polygenic score prediction. Am. J. Hum. Genet. 105, 351–363 (2019).
Vogelezang, S. et al. Novel loci for childhood body mass index and shared heritability with adult cardiometabolic traits. PLoS Genet. 16, e1008718 (2020).
Yengo, L. et al. Meta-analysis of genome-wide association studies for height and body mass index in ∼700000 individuals of European ancestry. Hum. Mol. Genet. 27, 3641–3649 (2018).
Demontis, D. et al. Genome-wide analyses of ADHD identify 27 risk loci, refine the genetic architecture and implicate several cognitive domains. Nat. Genet. 55, 198–208 (2023).
Okbay, A. et al. Polygenic prediction of educational attainment within and between families from genome-wide association analyses in 3 million individuals. Nat. Genet. 54, 437–449 (2022).
Savage, J. E. et al. Genome-wide association meta-analysis in 269,867 individuals identifies new genetic and functional links to intelligence. Nat. Genet. 50, 912–919 (2018).
Warrier, V. et al. Genetic insights into human cortical organization and development through genome-wide analyses of 2,347 neuroimaging phenotypes. Nat. Genet. 55, 1483–1493 (2023).
Timmins, I. R. et al. Genome-wide association study of self-reported walking pace suggests beneficial effects of brisk walking on health and survival. Commun. Biol. 3, 634 (2020).
Glasser, M. F. et al. A multi-modal parcellation of human cerebral cortex. Nature 536, 171–178 (2016).
Mullins, N. et al. Genome-wide association study of more than 40,000 bipolar disorder cases provides new insights into the underlying biology. Nat. Genet. 53, 817–829 (2021).
Grotzinger, A. D. et al. Genomic structural equation modelling provides insights into the multivariate genetic architecture of complex traits. Nat. Hum. Behav. 3, 513–525 (2019).
Frei, O. et al. Bivariate causal mixture model quantifies polygenic overlap between complex traits beyond genetic correlation. Nat. Commun. 10, 2417 (2019).
Edwards, A. D. et al. The Developing Human Connectome Project neonatal data release. Front. Neurosci. 16, 886772 (2022).
Avants, B. & Gee, J. C. Geodesic estimation for large deformation anatomical shape averaging and interpolation. Neuroimage 23, S139–S150 (2004).
Bayley, N. Bayley Scales of Infant and Toddler Development 3rd edn (Harcourt Assessment, 2006).
Hospodar, C. M., Hoch, J. E., Lee, D. K., Shrout, P. E. & Adolph, K. E. Practice and proficiency: factors that facilitate infant walking skill. Dev. Psychobiol. 63, e22187 (2021).
Doya, K. Complementary roles of basal ganglia and cerebellum in learning and motor control. Curr. Opin. Neurobiol. 10, 732–739 (2000).
Brunet, T. et al. Biallelic loss‐of‐function variants in RBL2 in siblings with a neurodevelopmental disorder. Ann. Clin. Transl. Neurol. 7, 390–396 (2020).
Samra, N. et al. RBL2 bi-allelic truncating variants cause severe motor and cognitive impairment without evidence for abnormalities in DNA methylation or telomeric function. J. Hum. Genet. 66, 1101–1112 (2021).
Riglin, L. et al. Early manifestations of genetic liability for ADHD, autism and schizophrenia at ages 18 and 24 months. JCPP Adv. 2, e12093 (2022).
Finlay, B. L. & Darlington, R. B. Linked regularities in the development and evolution of mammalian brains. Science 268, 1578–1584 (1995).
Blumberg, M. S. & Adolph, K. E. Protracted development of motor cortex constrains rich interpretations of infant cognition. Trends Cogn. Sci. 27, 233–245 (2023).
Panchanathan, K. & Frankenhuis, W. E. The evolution of sensitive periods in a model of incremental development. Proc. R. Soc. B 283, 20152439 (2016).
Nilsen, R. M. et al. Self-selection and bias in a large prospective pregnancy cohort in Norway. Paediatr. Perinat. Epidemiol. 23, 597–608 (2009).
Vejrup, K., Magnus, P. & Magnus, M. Lost to follow‐up in the Norwegian mother, father and child cohort study. Paediatr. Perinat. Epidemiol. 36, 300–309 (2022).
Werme, J., van der Sluis, S., Posthuma, D. & de Leeuw, C. A. An integrated framework for local genetic correlation analysis. Nat. Genet. 54, 274–282 (2022).
Paltiel, L. et al. The biobank of the Norwegian Mother and Child Cohort Study – present status. Nor. Epidemiol. https://doi.org/10.5324/nje.v24i1-2.1755 (2014).
Corfield, E. C. et al. The Norwegian Mother, Father, and Child cohort study (MoBa) genotyping data resource: MoBaPsychGen pipeline v.1. Preprint at bioRxiv https://doi.org/10.1101/2022.06.23.496289 (2022).
Boomsma, D. I. et al. Netherlands Twin Register: a focus on longitudinal research. Twin Res. 5, 401–406 (2002).
Ligthart, L. et al. The Netherlands Twin Register: longitudinal research based on twin and twin-family designs. Twin Res. Hum. Genet. 22, 623–636 (2019).
Rousseau, K. et al. MUC7 haplotype analysis: results from a longitudinal birth cohort support protective effect of the MUC7*5 allele on respiratory function. Ann. Hum. Genet. 70, 417–427 (2006).
Lam, M. et al. RICOPILI: rapid imputation for consortias pipeline. Bioinformatics 36, 930–933 (2020).
Marees, A. T. et al. A tutorial on conducting genome-wide association studies: quality control and statistical analysis. Int. J. Methods Psychiatr. Res. 27, e1608 (2018).
Yang, J., Lee, S. H., Goddard, M. E. & Visscher, P. M. GCTA: a tool for genome-wide complex trait analysis. Am. J. Hum. Genet. 88, 76–82 (2011).
Jiang, L. et al. A resource-efficient tool for mixed model association analysis of large-scale data. Nat. Genet. 51, 1749–1755 (2019).
Purcell, S. et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am. J. Hum. Genet. 81, 559–575 (2007).
Ani, A., van der Most, P. J., Snieder, H., Vaez, A. & Nolte, I. M. GWASinspector: comprehensive quality control of genome-wide association study results. Bioinformatics 37, 129–130 (2021).
The 1000 Genomes Project Consortium A global reference for human genetic variation. Nature 526, 68–74 (2015).
Dudbridge, F. & Gusnanto, A. Estimation of significance thresholds for genomewide association scans. Genet. Epidemiol. 32, 227–234 (2008).
Watanabe, K. et al. A global overview of pleiotropy and genetic architecture in complex traits. Nat. Genet. 51, 1339–1348 (2019).
Liberzon, A. et al. The Molecular Signatures Database hallmark gene set collection. Cell Syst. 1, 417–425 (2015).
Subramanian, A. et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc. Natl Acad. Sci. USA 102, 15545–15550 (2005).
Martens, M. et al. WikiPathways: connecting communities. Nucleic Acids Res. 49, D613–D621 (2021).
Sollis, E. et al. The NHGRI-EBI GWAS Catalog: knowledgebase and deposition resource. Nucleic Acids Res. 51, D977–D985 (2023).
Privé, F. LD reference for HapMap3+. figshare https://doi.org/10.6084/m9.figshare.21305061.v2 (2023).
Privé, F., Arbel, J. & Vilhjálmsson, B. J. LDpred2: better, faster, stronger. Bioinformatics 36, 5424–5431 (2021).
Wallace, C. Eliciting priors and relaxing the single causal variant assumption in colocalisation analyses. PLoS Genet. 16, e1008720 (2020).
Jumper, J. et al. Highly accurate protein structure prediction with AlphaFold. Nature 596, 583–589 (2021).
Iqbal, S. et al. Comprehensive characterization of amino acid positions in protein structures reveals molecular effect of missense variants. Proc. Natl Acad. Sci. USA 117, 28201–28211 (2020).
Beck, J. J. et al. Genetic meta-analysis of twin birth weight shows high genetic correlation with singleton birth weight. Hum. Mol. Genet. 30, 1894–1905 (2021).
Grove, J. et al. Identification of common genetic risk variants for autism spectrum disorder. Nat. Genet. 51, 431–444 (2019).
Trubetskoy, V. et al. Mapping genomic loci implicates genes and synaptic biology in schizophrenia. Nature 604, 502–508 (2022).
Lee, P. H. et al. Genomic relationships, novel loci, and pleiotropic mechanisms across eight psychiatric disorders. Cell 179, 1469–1482.e11 (2019).
Howard, D. M. et al. Genome-wide meta-analysis of depression identifies 102 independent variants and highlights the importance of the prefrontal brain regions. Nat. Neurosci. 22, 343–352 (2019).
Jones, G. et al. Genome-wide meta-analysis of muscle weakness identifies 15 susceptibility loci in older men and women. Nat. Commun. 12, 654 (2021).
Mountford, H. S., Hill, A., Barnett, A. L. & Newbury, D. F. Genome-wide association study of motor coordination. Front. Hum. Neurosci. 15, 669902 (2021).
Nalls, M. A. et al. Identification of novel risk loci, causal insights, and heritable risk for Parkinson’s disease: a meta-analysis of genome-wide association studies. Lancet Neurol. 18, 1091–1102 (2019).
The International HapMap 3 Consortium Integrating common and rare genetic variation in diverse human populations. Nature 467, 52–58 (2010).
Kent, W. J. et al. The Human Genome Browser at UCSC. Genome Res. 12, 996–1006 (2002).
The ENCODE Project Consortium A user’s guide to the Encyclopedia of DNA Elements (ENCODE). PLoS Biol. 9, e1001046 (2011).
Kundaje, A. et al. Integrative analysis of 111 reference human epigenomes. Nature 518, 317–330 (2015).
Lindblad-Toh, K. et al. A high-resolution map of human evolutionary constraint using 29 mammals. Nature 478, 476–482 (2011).
Andersson, R. et al. An atlas of active enhancers across human cell types and tissues. Nature 507, 455–461 (2014).
Wray, N. R. et al. Genome-wide association analyses identify 44 risk variants and refine the genetic architecture of major depression. Nat. Genet. 50, 668–681 (2018).
Hindley, G. et al. Charting the landscape of genetic overlap between mental disorders and related traits beyond genetic correlation. Am. J. Psychiatry 179, 833–843 (2022).
Ge, T., Chen, C.-Y., Ni, Y., Feng, Y.-C. A. & Smoller, J. W. Polygenic prediction via Bayesian regression and continuous shrinkage priors. Nat. Commun. 10, 1776 (2019).
Pain, O. et al. Evaluation of polygenic prediction methodology within a reference-standardized framework. PLoS Genet. 17, e1009021 (2021).
Chang, C. C. et al. Second-generation PLINK: rising to the challenge of larger and richer datasets. Gigascience 4, 7 (2015).
Demange, P. A. et al. Estimating effects of parents’ cognitive and non-cognitive skills on offspring education using polygenic scores. Nat. Commun. 13, 4801 (2022).
McCarthy, S. et al. A reference panel of 64,976 haplotypes for genotype imputation. Nat. Genet. 48, 1279–1283 (2016).
Makropoulos, A. et al. Automatic whole brain MRI segmentation of the developing neonatal brain. IEEE Trans. Med. Imaging 33, 1818–1831 (2014).
Makropoulos, A. et al. Regional growth and atlasing of the developing human brain. Neuroimage 125, 456–478 (2016).
Makropoulos, A., Counsell, S. J. & Rueckert, D. A review on automatic fetal and neonatal brain MRI segmentation. Neuroimage 170, 231–248 (2018).
Zilles, K., Armstrong, E., Schleicher, A. & Kretschmann, H.-J. The human pattern of gyrification in the cerebral cortex. Anat. Embryol. 179, 173–179 (1988).
Armstrong, E., Schleicher, A., Omran, H., Curtis, M. & Zilles, K. The ontogeny of human gyrification. Cerebral Cortex 5, 56–63 (1995).
Schuh, A. et al. Unbiased construction of a temporally consistent morphological atlas of neonatal brain development. Preprint at bioRxiv https://doi.org/10.1101/251512 (2018).
Avants, B., Epstein, C., Grossman, M. & Gee, J. Symmetric diffeomorphic image registration with cross-correlation: evaluating automated labeling of elderly and neurodegenerative brain. Med. Image Anal. 12, 26–41 (2008).
Avants, B. B. et al. A reproducible evaluation of ANTs similarity metric performance in brain image registration. Neuroimage 54, 2033–2044 (2011).
Winkler, A. M., Ridgway, G. R., Webster, M. A., Smith, S. M. & Nichols, T. E. Permutation inference for the general linear model. Neuroimage 92, 381–397 (2014).
Jenkinson, M., Beckmann, C. F., Behrens, T. E. J., Woolrich, M. W. & Smith, S. M. FSL. Neuroimage 62, 782–790 (2012).
Gui, A. & Ronald, A. Age at onset of walking GWAS summary statistics. figshare https://doi.org/10.6084/m9.figshare.28071566 (2025).
Acknowledgements
This work was funded by the Simons Foundation for Autism Research Initiative (SFARI, 724306) to A.R.; the Economic and Social Research Council (ES/P000592/1) to A. Hollowell; the South-Eastern Norway Regional Health Authority (2019097 and 2022083 to L.J.H.; 2021045 to E.C.C.; 2020022, 2024001 and 2019097 to A. Havdahl); the Research Council of Norway (274611 to E.C.C.; 274611 and 336085 to A. Havdahl; 324499, 324252, 223273 to O.A.A.); the UK Medical Research Council (MRC, MC_UU_00019/1 to A.W.; MR/V036874/1 and MR/N026063/1 to T.A.; MRC PG MR/T003057/1 to M.H.J.; MR/S037055/1 to F.D.); a UK National Institute for Health Research (NIHR) fund to H.C.; SFARI, Wellcome Trust (214322/Z/18/Z), Horizon-Europe R2D2-MH (101057385), and UKRI (10063472) to V.W.; KG Jebsen Stiftelsen, Nordforsk (164218), EU H2020 RIA grant 964874 REALMENT to O.A.A.; HORIZON-HLTH-2021 R2D2-MH (101057385) to B.S.P.; ERC-2017-COG (771057) WELL-BEING and NWO Vici scheme (VI.C.211.054 504 ‘The Power of Wellbeing’) to M.B.; a KNAW Academy Professor Award (PAH/6635) to D.I.B.; the European Research Council (FP/2007-2013 to A.D.E.); the National Institute of Mental Health (NIMH, R01MH129751 and U01MH122681) to S.J.S.; HDR UK QQ2 Molecules to Health Records Driver Programme to S.J.S.; and European Union’s Horizon Europe Research and Innovation programme Marie Skłodowska-Curie grant agreement European Social Science Genetics Network (101073237) to A. Havdahl. The funders had no role in study design, data collection and analysis, decision to publish or preparation of the manuscript. The views expressed are those of the authors and not necessarily those of the MRC, the NIHR or the Department of Health and Social Care. The Norwegian Mother, Father and Child Cohort Study is supported by the Norwegian Ministry of Health and Care Services, and the Ministry of Education and Research.
We thank all the participating families in Norway who take part in this ongoing cohort study, and the Norwegian Institute of Public Health (NIPH) for generating high-quality genomic data. This research is part of the HARVEST collaboration supported by the RCN (grant no. 229624). For providing genotype data, we also thank the NORMENT Centre (funded by the RCN (223273), the South-Eastern Norway Regional Health Authority (SENRHA) and Stiftelsen Kristian Gerhard Jebsen), in collaboration with deCODE Genetics, and the Center for Diabetes Research at the University of Bergen (funded by the ERC AdG project SELECTionPREDISPOSED, Stiftelsen Kristian Gerhard Jebsen, Trond Mohn Foundation, the RCN, the Novo Nordisk Foundation, the University of Bergen, and the Western Norway Regional Health Authority). This work was performed on the TSD (Tjeneste for Sensitive Data) facilities, owned by the University of Oslo, operated and developed by the TSD service group at the University of Oslo, IT-Department (USIT) (tsd-drift@usit.uio.no). The computations were performed on resources provided by Sigma2, the National Infrastructure for High Performance Computing and Data Storage in Norway. The Lifelines initiative has been made possible by subsidy from the Dutch Ministry of Health, Welfare and Sport, the Dutch Ministry of Economic Affairs, the University Medical Center Groningen (UMCG), Groningen University and the Provinces in the North of the Netherlands (Drenthe, Friesland, Groningen). We acknowledge the services of the Lifelines Cohort Study, the contributing research centres delivering data to Lifelines, and all the study participants. The MRC National Survey of Health and Development is funded by the UK Medical Research Council (MC_UU_00019/1). We also thank the study participants for their continuing participation in the National Survey of Health and Development, and also the study members from the MRC NSHD for their lifelong commitment to the study.
The Netherlands Twin Register acknowledges funding from the Netherlands Organization for Scientific Research (NWO), including NWO-Grants NWO/SPI 56-464-14192 and 480-15-001/674: Netherlands Twin Registry Repository and the Biobanking and Biomolecular Resources Research Infrastructure (BBMRI–NL, 184.021.007 and 184.033.111); Amsterdam Public Health (APH) and Neuroscience Campus Amsterdam (NCA); the European Community 7th Framework Program (FP7/2007-2013): ENGAGE (HEALTH-F4-2007-201413) and ACTION (9602768) and European Research Council (ERC-230374). We also acknowledge The Rutgers University Cell and DNA Repository cooperative agreement (NIMH U24 MH068457-06); the Collaborative Study of the Genetics of DZ twinning (NIH R01D0042157-01A1); the Developmental Study of Attention Problems in Young Twins (NIMH, RO1 MH58799-03); Major depression: stage 1 genome-wide association in population-based samples (MH081802); Determinants of Adolescent Exercise Behavior (NIDDK R01 DK092127-04); Grand Opportunity grants Integration of Genomics and Transcriptomics (NIMH 1RC2MH089951-01) and Developmental Trajectories of Psychopathology (NIMH 1RC2 MH089995); and the Avera Institute for Human Genetics, Sioux Falls, South Dakota (USA). We also thank the participants and parents for their voluntary participation in the research project of the Netherlands Twin Register. Data were also provided by the developing Human Connectome Project, KCL-Imperial-Oxford Consortium and the work was funded by ERC grant agreement no. 319456, the Wellcome EPSRC Centre for Medical Engineering at King’s College London (WT 203148/Z/16/Z) and by the National Institute for Health Research (NIHR) Biomedical Research Centre based at Guy’s and St Thomas’ NHS Foundation Trust and King’s College London.
We thank all the families who kindly agreed to participate in the project and recognize their particular commitment in remaining engaged with the programme during the COVID-19 pandemic; the Neonatal Intensive Care Unit and the Newborn Imaging Centre at Evelina London Children’s Hospital for the support; all the families who contributed with their data; and S. Medland, L. Jiang, E. Hagen, G. Hindley, J. Yang and R. Ma for useful advice at various stages of the research.
Author information
Contributions
A.G., J.-J.H., E.B.R., M.H.J., F.D., A. Havdahl and A.R. were involved in the conception or design of the work. A.W., V.W., O.A.A., M.B., D.I.B., C.M.M., A.D.E., C.A.H. and A. Havdahl contributed to the data acquisition. A.G., A. Hollowell, E.M.W., M.J.M., L.J.H., E.C.C., V.O., A.W., R.P., H.C., S.W., V.W., O.A.A., M.B., S.J.S. and A.R. were involved in the data preparation and analysis. A.G., A. Hollowell, E.M.W., L.J.H., E.C.C., R.P., H.C., S.W., E.ME., B.S.P., T.A., F.D., S.J.S., A. Havdahl and A.R. interpreted the data. A.G., A. Hollowell, E.M.W., M.J.M. and A.R. drafted the manuscript. All authors reviewed and revised the manuscript.
Ethics declarations
Competing interests
O.A.A. is a consultant to cortechs.ai and Precision Health, and receives speaker honoraria from Janssen, Lundbeck, Sunovion, Lilly, and Otsuka. S.J.S. receives research funding from BioMarin Pharmaceutical. The other authors declare no competing interests.
Peer review
Peer review information
Nature Human Behaviour thanks Sophie van der Sluis and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data
Extended Data Fig. 1 Partitioned heritability enrichment by functional annotation.
Enrichment of age at onset of walking GWAS signal by functional genomic annotation. Points represent heritability enrichment estimates ± standard errors, obtained in LDSC40 (two-sided test). The dashed horizontal line represents statistical significance based on Bonferroni correction for multiple testing (Supplementary Table 12). Genomic annotations with significant enrichment for age at onset of walking are labelled. Dots are colored using a spectrum of colors based on alphabetical order.
Extended Data Fig. 2 Partitioned heritability enrichment by cell type.
Tissue enrichment based on LDSC partitioned heritability analysis41. Statistically significant enrichments after correcting two-sided p-values for multiple comparisons using the Bonferroni method are highlighted as yellow bars.
Supplementary information
Supplementary Information
Supplementary Notes A and B, Table 1 and Figs. 1–16.
Supplementary Table
Supplementary Tables 1–16.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Gui, A., Hollowell, A., Wigdor, E.M. et al. Genome-wide association meta-analysis of age at onset of walking in over 70,000 infants of European ancestry. Nat Hum Behav 9, 1470–1487 (2025). https://doi.org/10.1038/s41562-025-02145-1
DOI: https://doi.org/10.1038/s41562-025-02145-1







