Abstract
Background/objectives
This study aimed to predict body mass index (BMI) trajectories from childhood to early adulthood using explainable artificial intelligence, integrating adult BMI polygenic scores (PGS), maternal, early-life, and familial factors to identify key predictors of obesity risk and inform prevention strategies.
Subjects/methods
We analyzed longitudinal data from the Raine Study Gen2 cohort, recruiting 2868 participants. This observational study, without randomization or case-control design, collected BMI measurements at ages 8, 10, 14, 17, 20, 23, and 27 years. We applied Kolmogorov–Arnold Networks (KAN) alongside conventional machine learning models, integrating epidemiological variables (maternal and paternal anthropometrics, parental education, early-life skinfold measurements) with seven BMI-related PGS. The analysis spanned from childhood to early adulthood, with no intervention administered.
Results
The KAN model, combining epidemiological and PGS data, achieved predictive performance with R² ranging from 0.81 for BMI at age 8 to 0.34 at age 27. BMI z-score at age 5 was the dominant predictor in early years, with adult BMI PGS influence increasing post-adolescence. Maternal and paternal anthropometry, parental education, and early-life skinfold measurements were significant contributors.
Conclusions
The interpretable KAN model revealed a dynamic interplay between childhood BMI z-score and PGS, which emerged as key drivers of BMI trajectories across life stages. The finding underscores the potential of BMI at a critical time in early childhood as a biomarker for obesity risk. Our interpretable model offers actionable insights for targeted obesity prevention strategies.
Introduction
Obesity, defined as an excessive accumulation of body fat that poses health risks, has reached epidemic proportions globally. Worldwide, adult obesity has more than doubled since 1990, and adolescent obesity has quadrupled (WHO) [1]. In Australia, nearly two-thirds of adults and one in four children are overweight or have obesity, underscoring the need for targeted interventions and prevention. Childhood obesity often persists into adulthood, with longitudinal studies showing that ~55% of children with obesity remain with obesity in adolescence, 80% of adolescents with obesity continue to have obesity as adults, and 70% maintain obesity beyond age 30 [2]. This highlights the long-term implications and importance of early prevention.
Key factors influencing obesity development from childhood to early adulthood are complex and multifaceted, including environmental, biological, pre- and postnatal, and psychosocial factors [3]. Children of parents with obesity face higher risks [4], with stronger associations in older children, reflecting genetic, epigenetic, and shared environmental influences [4]. Economic factors affect physical activity levels and access to healthy food [5]. Pre- and postnatal factors such as maternal pre-pregnancy obesity [6], excessive gestational weight gain, gestational diabetes [6, 7], and smoking [8] are significant contributors. Early childhood factors, including birth weight, rapid postnatal weight gain [9, 10], and breastfeeding duration [11], have also been linked to obesity risk.
Body mass index (BMI) serves as a surrogate measure of obesity risk at the population level [12]. Our study is motivated by the need to deepen our understanding of the longitudinal impact of a range of factors, identifiable early in life, that influence BMI development into adulthood, incorporating insights from early childhood, parental, environmental and genetic factors. Unlike previous studies, which often focus on isolated factors or lack comprehensive interpretability, we propose a novel machine learning (ML) model that balances predictive performance with robust interpretability. Additionally, we introduce a methodology to systematically identify and quantify the importance of influential factors, transforming these insights into precise BMI estimations.
Materials and methods
Study population
This study used data from the Raine Study Gen2 (the variables in the Raine Study are mostly harmonized with the LifeCycle Project-EU Child Cohort Network [13]). The Raine Study is a large, well-established longitudinal cohort designed to track health and developmental outcomes from pregnancy into adulthood [14]. The Raine Study initially recruited 2900 pregnant women (Generation 1, Gen1) and followed 2868 children (Generation 2, Gen2). The Gen2 cohort has been followed up longitudinally from birth into early adulthood [14]. Table 1 summarizes the key characteristics, including environmental, maternal, parental and early childhood factors across seven age groups (8–27 years) in the Raine Study.
Outcome measure
Gen2 underwent phenotyping for BMI at ages 8, 10, 14, 17, 20, 23, and 27. Weight was measured using standardized scales, and height was recorded using a stadiometer. BMI was calculated as weight (kg) divided by height squared (m²) and served as the primary outcome variable. Descriptive statistics for BMI across these age groups are presented in Supplementary 1.
Epidemiological predictors: maternal, early childhood and family variables (Epidemiology Dataset)
The Epidemiology Dataset comprises 201 raw variables (see descriptions and statistics in Supplementary 1) grouped into four categories: environmental factors (paternal characteristics, household socioeconomic conditions, family structure and environmental exposures), maternal factors (pregnancy health, anthropometrics and lifestyle), other parental factors and early-life anthropometrics (neonatal and early childhood anthropometrics at birth, 1 and 5 years). To ensure robust analysis, we cleaned the data and applied correlation-based clustering to group similar variables, then selected the most important predictors for BMI. Participants with missing data were removed to create an optimized dataset. See Supplementary 1 for variable descriptions and statistics, and Supplementary 2 for data flow and sample sizes.
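The correlation-based grouping described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the distance measure (1 minus the absolute Pearson correlation), the average-linkage method, the 0.3 threshold, and the toy data are all assumptions introduced here.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform


def cluster_correlated_columns(X, threshold=0.3):
    """Assign a cluster label to each column; highly correlated columns share a label.

    Distance is 1 - |r| and columns are merged by average linkage until the
    distance exceeds `threshold`. Both choices are illustrative assumptions.
    """
    corr = np.corrcoef(X, rowvar=False)
    dist = np.clip(1.0 - np.abs(corr), 0.0, None)
    np.fill_diagonal(dist, 0.0)
    dist = (dist + dist.T) / 2.0  # enforce exact symmetry before condensing
    condensed = squareform(dist, checks=False)
    tree = linkage(condensed, method="average")
    return fcluster(tree, t=threshold, criterion="distance")


# Hypothetical data: columns 0 and 1 are near-duplicates, column 2 is independent.
rng = np.random.default_rng(0)
base = rng.normal(size=(500, 2))
X = np.column_stack([
    base[:, 0],
    base[:, 0] + 0.05 * rng.normal(size=500),
    base[:, 2 - 1],
])
labels = cluster_correlated_columns(X)
```

One representative per cluster (for example, the variable with the fewest missing values) could then be carried forward to feature selection; how representatives were chosen in this study is not specified here.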
Genetic factors (Genetic Dataset)
Seven polygenic scores (PGS) for BMI (PGS002313 [15], PGS002161 [16], PGS000027 [17], PGS004150 [18], PGS003884 [19], PGS002853 [20], PGS000921 [21]) were sourced from the PGS Catalog [22]. Each score reflects a genetic predisposition to BMI, based on genetic variants identified in large-scale genome-wide association studies (GWAS). The seven PGSs were chosen for their large number of variants (over one million) and diverse development methods, as listed in Table 2.
The PGS were calculated using the pgsc_calc pipeline [23], which computes PGS by combining genetic data with scoring files from the PGS Catalog. The pipeline first lifts variant coordinates from their original genome build to the target build (GRCh38), then matches genetic variants (SNPs) from the target dataset to those in the PGS Catalog scoring file, using chromosome, position, and allele information. After removing the problematic variants, such as those with missing or unclear data, the pipeline multiplies the number of effect alleles (0, 1, or 2) by their effect weights (from GWAS) for each matched SNP and adds them up to get the PGS for each sample, which indicates an individual’s genetic risk for higher BMI. The pipeline finally adjusts the scores to make them comparable across individuals.
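The core arithmetic of the scoring step (effect-allele count times effect weight, summed over matched variants) can be sketched in a few lines. This is not the pgsc_calc implementation: the function names and toy numbers are hypothetical, and the simple z-score below is only a stand-in assumption for the pipeline's own score adjustment.

```python
import numpy as np


def polygenic_score(dosages, weights):
    """Weighted sum of effect-allele counts for each sample.

    dosages: (n_samples, n_variants) array holding 0, 1 or 2 effect alleles.
    weights: (n_variants,) array of per-variant effect weights.
    """
    dosages = np.asarray(dosages, dtype=float)
    weights = np.asarray(weights, dtype=float)
    return dosages @ weights


def standardize(scores):
    """Z-score the raw sums (an assumed stand-in for the pipeline's adjustment)."""
    scores = np.asarray(scores, dtype=float)
    return (scores - scores.mean()) / scores.std()


# Hypothetical toy data: 3 samples, 4 matched variants.
dosages = np.array([[0, 1, 2, 0],
                    [1, 1, 0, 2],
                    [2, 0, 1, 1]])
weights = np.array([0.10, -0.05, 0.20, 0.03])

raw = polygenic_score(dosages, weights)
z = standardize(raw)
```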
After preprocessing and clustering the Epidemiology and Genetic Datasets to address multicollinearity, we used the Recursive Feature Elimination method to select the top predictors that achieved the highest scores. These predictors were subsequently used to train the model and analyze the outcomes.
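As a rough illustration of Recursive Feature Elimination, the sketch below uses scikit-learn's `RFE` with an extremely randomized trees regressor. The choice of base estimator, the number of features to keep, and the synthetic data are assumptions made for the example; the study's actual configuration is not described in this section.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.feature_selection import RFE

rng = np.random.default_rng(42)
n_samples, n_features = 300, 8
X = rng.normal(size=(n_samples, n_features))
# Hypothetical outcome: only columns 0 and 3 carry signal.
y = 3.0 * X[:, 0] - 2.0 * X[:, 3] + 0.1 * rng.normal(size=n_samples)

# Repeatedly fit the estimator and drop the least important feature
# until only `n_features_to_select` remain.
selector = RFE(
    estimator=ExtraTreesRegressor(n_estimators=100, random_state=0),
    n_features_to_select=2,
    step=1,
)
selector.fit(X, y)
selected = np.flatnonzero(selector.support_)
```

`selector.ranking_` gives the elimination order, which can help when choosing how many predictors to retain.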
Machine learning models
This study utilized four conventional machine learning models: Extreme Random Forest [24] (ERF), Extreme Gradient Boosting [25] (XGB), Gradient Boosting Machines [26] (GBM), and Elastic Net [27] (EN). These were selected for their ability to capture complex relationships and to provide strong predictive performance and interpretability [28,29,30]. ERF, GBM and EN were implemented using the Python package scikit-learn (v1.7.2), and XGB used the Python package xgboost (v3.0.5).
Kolmogorov–Arnold Networks (KANs), based on the Kolmogorov–Arnold theorem, are a novel deep learning model [31, 32] that outperforms traditional Multi-Layer Perceptrons (MLPs) in interpretability [33, 34]. This theorem states that any multivariate continuous function on a bounded domain can be expressed as a finite composition of continuous univariate functions and additions. Unlike MLPs, which use fixed activation functions and linear weights, KANs employ learnable univariate functions (B-splines) along edges, enabling flexible modeling of complex, nonlinear relationships with smaller networks.
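For reference, the representation theorem and the corresponding KAN layer can be written out explicitly. The notation below follows the usual presentation in the KAN literature [31] rather than anything defined in this article, so the symbols should be read as assumptions: for a continuous function f on [0, 1]^n there exist continuous univariate functions Φ_q and φ_{q,p} such that the first identity holds, and a KAN layer generalizes the inner sum to arbitrary widths and depths with learnable spline functions on each edge.

```latex
% Kolmogorov-Arnold representation of a continuous f : [0,1]^n -> R
f(x_1, \dots, x_n) \;=\; \sum_{q=0}^{2n} \Phi_q\!\left( \sum_{p=1}^{n} \phi_{q,p}(x_p) \right)

% One KAN layer: node j of layer l+1 sums learnable univariate functions of layer-l nodes
x_{l+1,\,j} \;=\; \sum_{i=1}^{n_l} \phi_{l,j,i}\!\left( x_{l,\,i} \right)
```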
A defining feature of KANs is their intrinsically symbolic nature, which sets them apart from conventional machine learning models like Random Forests, GBMs, or MLPs. By representing activation functions as splines, KANs can be further symbolized into explicit mathematical expressions. This process, facilitated by grid extension and sparsity-inducing regularization, produces compact formulas that describe the model’s decision-making process. Unlike traditional models that often rely on post-hoc methods (e.g., SHapley Additive exPlanations [35] or Local Interpretable Model-agnostic Explanations [36]) for explanation or provide simple variable importance, KANs’ symbolic functions enable direct human understanding and visualization. For details on model development and application, see Chen et al. [37]. KAN was implemented using the Python package pykan (v0.2.8).
Model training and evaluation metrics
All five models were applied to the Epidemiology Dataset, the Genetic Dataset, and the combined dataset. A 5-fold cross-validation strategy with varying randomization was implemented to split the data into training and testing sets. Each model was tuned and trained on the training set and evaluated on the testing set, with predictions collected from all five testing folds for further analysis.
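A minimal sketch of collecting held-out predictions across five folds is given below. It assumes that "varying randomization" means repeating the shuffled split with different seeds, which is one possible reading; the Elastic Net settings and the synthetic data are likewise placeholders.

```python
import numpy as np
from sklearn.linear_model import ElasticNet
from sklearn.metrics import r2_score
from sklearn.model_selection import KFold


def out_of_fold_predictions(model, X, y, seed):
    """Fit on four folds, predict the held-out fold, and repeat for all five folds."""
    preds = np.empty_like(y, dtype=float)
    splitter = KFold(n_splits=5, shuffle=True, random_state=seed)
    for train_idx, test_idx in splitter.split(X):
        model.fit(X[train_idx], y[train_idx])
        preds[test_idx] = model.predict(X[test_idx])
    return preds


# Hypothetical data standing in for predictors and a BMI outcome.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
y = X @ np.array([1.5, 0.0, -1.0, 0.0, 0.5]) + 0.2 * rng.normal(size=200)

# One R² per random split, each computed on the pooled held-out predictions.
scores = [
    r2_score(y, out_of_fold_predictions(ElasticNet(alpha=0.01), X, y, seed))
    for seed in (0, 1, 2)
]
```

Hyperparameter tuning, if performed, would need to sit inside each training fold to keep the held-out predictions unbiased.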
Model performance was assessed using the Coefficient of Determination (R²) score, with additional metrics including Root Mean Square Error (RMSE), Mean Absolute Percentage Error (MAPE), and confusion matrices (for BMI classification into four categories at ages 17, 20, 23, and 27). The RMSE, MAPE and confusion matrices are provided in Supplementary 2. The best-performing model and dataset were selected for further analysis and results presentation. KAN’s symbolic regression produced explicit mathematical formulas to describe its decision-making process, using activation functions selected for optimal performance. Previously identified key predictors, such as year 5 BMI z-score (Y5BMIz) and polygenic scores (PGS002313, PGS002161, PGS000921), were analyzed for their symbolic relationships with BMI outcomes [37], justifying their roles in predictions.
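The metrics named above could be computed along the following lines. The four BMI categories are not defined in this section, so the cut-offs used here (18.5, 25 and 30 kg/m² for underweight, normal weight, overweight and obesity) are an assumption, as are the function names and example values.

```python
import numpy as np
from sklearn.metrics import (
    confusion_matrix,
    mean_absolute_percentage_error,
    mean_squared_error,
    r2_score,
)

# Assumed adult cut-offs (kg/m^2): underweight, normal weight, overweight, obesity.
BMI_EDGES = [18.5, 25.0, 30.0]


def bmi_category(bmi):
    """Map BMI values to category indices 0..3 using the assumed cut-offs."""
    return np.digitize(bmi, BMI_EDGES)


def evaluate(y_true, y_pred):
    """Return R², RMSE, MAPE (as a fraction) and a 4x4 confusion matrix."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return {
        "r2": r2_score(y_true, y_pred),
        "rmse": float(np.sqrt(mean_squared_error(y_true, y_pred))),
        "mape": mean_absolute_percentage_error(y_true, y_pred),
        "confusion": confusion_matrix(
            bmi_category(y_true), bmi_category(y_pred), labels=[0, 1, 2, 3]
        ),
    }


# Hypothetical measured and predicted BMI values.
y_true = [17.0, 22.0, 27.0, 33.0]
y_pred = [18.0, 23.0, 26.0, 29.0]
report = evaluate(y_true, y_pred)
```

Rows of the confusion matrix index the measured category and columns the predicted one.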
Results
Models and datasets comparison
Table 3 presents the R² values of five models trained on the Epidemiology Dataset, the Genetic Dataset, and their combination across the seven age groups. Among these models, the KAN models consistently outperformed the others across all age groups. When comparing the impact of datasets on BMI estimation, the combined use of the Epidemiology and Genetic Datasets yielded the best results, particularly for age groups over 17 years, a pattern shared by all five models, followed by the Epidemiology Dataset alone.
Regarding the influence of age, performance decreased gradually with age in all models, with R² ranging from 0.81 at age 8 to 0.34 at age 27. Notably, the models after symbolic regression exhibited performance that closely aligned with their original counterparts, showing minimal differences.
As the KAN models trained on the combined datasets yielded the best results, the subsequent results and analyses will focus on the KAN models utilizing both datasets.
Overall feature importance
The feature importance in the decision-making process of the KAN model was determined by multiplying the weights of each activation function from the leaf nodes (input variables) to the output (the BMI values of a target age group), with the visualized pruned tree plots shown in Supplementary 2.
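The path-product idea can be illustrated with a small sketch. It assumes that each edge of the pruned network carries a single non-negative score and that an input's importance is the sum, over all paths to the output, of the product of the edge scores along the path. How the per-edge weights were actually defined is not stated here, and the network and numbers below are hypothetical.

```python
import numpy as np


def path_product_importance(layer_scores):
    """Propagate edge scores from the output back to the inputs.

    layer_scores: list of arrays ordered input -> output, where
    layer_scores[k][j, i] scores the edge from node i in layer k to
    node j in layer k + 1. The last array must have a single output row.
    Returns input importances normalized to sum to 1.
    """
    attribution = np.ones((1, layer_scores[-1].shape[0]))
    for scores in reversed(layer_scores):
        attribution = attribution @ np.abs(scores)
    importance = attribution.ravel()
    return importance / importance.sum()


# Hypothetical pruned network: 3 inputs -> 2 hidden nodes -> 1 output.
hidden = np.array([[0.8, 0.1, 0.0],
                   [0.2, 0.0, 0.5]])
output = np.array([[1.0, 0.4]])
weights = path_product_importance([hidden, output])
```

In this toy case the first input receives the largest share because it feeds the most heavily weighted hidden node.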
As shown in Fig. 1, the weights exhibit significant variation across different age groups and variables. Overall, the most influential variable is the BMI z-score at 5 years of age (Y5BMIz), with weights of 50%, 34%, 16%, 28%, 15%, 22% and 18% across the seven age groups, respectively. It dominates in the early age groups. Although its importance declines, it retains a similar absolute weight to several of the polygenic scores at age 27. In contrast, polygenic scores (PGS), including PGS002313, PGS002161, PGS000027, PGS003884, PGS002853, and PGS000921, have a lower impact in early age groups but gradually catch up with Y5BMIz after age 17.
Fig. 1 shows only variables that appear in at least four age groups.
Other anthropometric measures at age 5, including skinfolds (abdominal, suprailiac, subscapular and triceps), and arm and chest circumference, follow a similar trend in the models, with their weights increasing across the age groups. Variables consistently predicting offspring BMI across all age groups include maternal occupation, pre-pregnancy weight, late-pregnancy weight, maternal height-to-weight ratio, paternal weight, and paternal height. Additional variables, selected as important for specific ages and not shown in Fig. 1, include pre-pregnancy and pregnancy smoking (ages 8, 10, 14), paternal occupation (ages 10, 14, 27), maternal age (ages 14, 17, 27) and childcare (ages 14, 17, 23). Interestingly, other factors that have been investigated in the literature for their association with subsequent offspring obesity [38,39,40,41,42,43] did not show consistent influence across the longitudinal models, each being selected at only a single age. These included delivery mode (age 14), placenta weight (age 14), birth BMI z-score (age 17), solid food introduction age (age 17), paternal education (age 20), sex (age 20), breastfeeding (age 23), birth anthropometrics (age 23), Apgar score (age 23), maternal height (age 23), paternal age (age 27), and birth month (age 27). Further details are provided in Supplementary 1.
Symbolic formulas
Our results demonstrate that the models maintain comparable performance after symbolic regression, as shown in Table 3 (“KAN-f*”). This consistency enables further exploration of mathematical relationships between key variables and their roles in predicting BMI. The symbolic formulas for BMI estimation, as a function of the selected variables across the seven age groups, are provided in Supplementary 1.
To identify and analyze the most influential factors across age groups, we examined the formulas associating Y5BMIz and the most heavily weighted polygenic scores PGS002313 [15], PGS002161 [16] and PGS000921 [21] along with predicted BMI values.
The BMI estimation and Y5BMIz formulas across the seven age groups are shown in Supplementary 2. Y5BMIz, one of the 10–20 key factors in the KAN model’s decision-making, follows an exponential or first-quadrant sine function before age 20, mirroring actual BMI values and model estimations. After age 20, this relationship weakens due to data sparsity, with many data points in the lower range of Y5BMIz and true BMI, limiting the KAN model’s ability to derive a stable formula. However, the model uses additional variables to maintain predictive performance. These findings are further explored in Section “Role of early-life anthropometry and environmental factors”.
The influence of PGS on predicted BMI increases with age, with consistent positive correlations observed for PGS002313, PGS002161, and PGS000921 across all age groups. The KAN model derived functional representations for the PGS pair (PGS002313, PGS002161) and PGS000921, shown in Supplementary 2 alongside BMI estimations. Most functions capture the positive correlation between these PGSs and BMI, except at age 17, where PGS000921 has lower weight. We further discuss the impact of PGSs in Section “Impact of polygenic scores (PGS)”.
Discussion
Key findings
This study explored over 200 variables, including maternal, early-childhood, familial, and genetic SNP data, to predict those at risk of future overweight and obesity, using KAN machine learning. Symbolic formulas from KAN address the “black box” issue, enabling transparent clinical decision-making with direct human visualization. Early-life factors like maternal weight during pregnancy, paternal height, and age 5 anthropometry strongly predict BMI in younger age groups. Their influence wanes in adolescence and early adulthood, where adult BMI polygenic scores (PGS002313, PGS002161 and PGS000921) gain significance. Further, many early-life factors that have been extensively investigated for their associations with subsequent offspring obesity, including birth weight [40], were not consistently present in these prediction models, suggesting a limited role in offspring BMI prediction at a population level. These findings align with literature highlighting the balanced role of several dominant early-life conditions in shaping long-term BMI outcomes [44], which can be mitigated by subsequent factors.
Role of early-life anthropometry and environmental factors
Across the early preschool ages explored (birth, 1 and 5 years), anthropometric measures at age 5 years, particularly Y5BMIz, were selected by the models as the most influential predictors of subsequent BMI. Y5BMIz showed a persistent relationship with BMI in later life, underscoring its potential as a clinical biomarker for assessing the risk of adult overweight and obesity. Removing Y5BMIz from the full models drastically reduces model effectiveness: R² drops from 0.81 to 0.70 at age 8, from 0.65 to 0.57 at age 10, and to near zero (0–0.01) for ages 14–27. This suggests that Y5BMIz captures early-life patterns that substantially drive longitudinal BMI tracking into adulthood. Without it, the model’s predictive ability for adolescent and adult BMI is largely lost. This finding is consistent with trajectory modeling of childhood obesity showing that, from age 5, obesity tracking is stable [45], while prior to this there is crossover and catch-up growth in some subsets. For example, some individuals exposed to adverse conditions in utero (e.g., malnutrition, smoking) may exhibit catch-up growth in those preschool years, which is associated with adult obesity risk [46], as outlined in the developmental origins of health and disease framework [47]. Further, early adiposity rebound (mean age 5.5 years) has been shown to be associated with adult obesity [48].
At older ages, skinfold thickness at age 5 surpasses Y5BMIz as a predictor, highlighting the value of body composition [49]. While some studies show BMI is as accurate as skinfolds for cardiovascular risk prediction [49,50,51], skinfolds better measure subcutaneous fat, which tracks in childhood [52]. This may reflect earlier subcutaneous fat saturation, leading to ectopic fat accumulation and metabolic dysfunction by the mid-twenties [53]. At a histopathological level, subcutaneous tissue in overweight children had greater adipocyte surface area and collagen content than that of normal-weight children [54].
Despite including many early-life variables (such as preschool exposures), none were selected in final KAN models, suggesting they may not reliably predict population-level obesity risk.
Impact of polygenic scores (PGS)
Polygenic scores (PGS) enhanced BMI prediction only when combined with epidemiological factors. In particular, three of the seven PGSs, PGS002313 [15], PGS002161 [16] and PGS000921 [21] were top predictors across multiple ages, showing consistent positive correlations with BMI (Supplementary 2). PGS002313 and PGS002161, both developed in the UK Biobank (BOLT-LMM and LDpred2, respectively), are strongly correlated (r > 0.9) and in turn, moderately correlated with PGS000921 (r > 0.6, Inter99 cohort, LDpred), suggesting shared SNP signals critical for adult BMI. In contrast, PGS000027 and PGS004150, also from UK Biobank, are correlated but less predictive, likely capturing less informative variation.
For the three most informative PGSs, the KAN model revealed non-linear relationships, such as exponential-like functions for PGS002313 and PGS002161 at ages 14, 17, and 20 (Supplementary 2).
The influence of PGSs tested in this study increased from childhood to adulthood. Notably, the PGSs were originally developed from adult cohorts, which may explain their stronger predictive performance for BMI in adulthood compared to childhood. Further, variability in BMI in childhood may need to be captured by different PGSs due to varying genetic influences across life stages [55,56,57]. To explore this, we assessed pediatric PGSs (PGS000716 [58], PGS004610 [59], PGS004900 [60]) from the UK Biobank, based on self-reported body size at age 10. However, they were excluded due to potential recall bias and imprecision.
We concur with evidence from previous studies [55, 56], which indicates that adult BMI PGSs can provide meaningful predictive power for adolescent BMI, implying some genetic overlap across ages. Nevertheless, we suggest that there is a need for PGSs developed specifically for childhood and adolescent BMI, where a slightly different set of influences is at play.
Influence of parental factors
Parental factors play a pivotal role in BMI outcomes, with variables such as parental education, household income, parental height and weight, maternal age at childbirth, and maternal weight gain during pregnancy showing consistent associations across all age groups. These factors likely capture both genetic and environmental influences. For instance, parental education and household income reflect socioeconomic and environmental conditions that influence health behaviors and resource access. Similarly, maternal weight gain during pregnancy may affect fetal development through nutritional and metabolic pathways. These associations highlight the need for a holistic framework integrating genetic, maternal, and environmental elements in BMI prediction models.
The influence of these factors evolves with age: maternal and paternal anthropometric measures have limited impact in early childhood but grow stronger in adolescence and early adulthood, aligning with the rising role of adult BMI PGSs in later stages.
Strengths and limitations
Overall, the strength of this study lies in integrating nearly three decades of prospectively collected genetic and epidemiological data using KAN machine learning, which can both predict and simultaneously provide mathematical transparency in predicting adult BMI utilizing information that is available at 5 years old and before.
However, there are also several limitations to this study, namely using BMI as a surrogate measure of adiposity, the fixed time points of data collection, inability to create sex-specific models, and inability to imply causation. Using BMI as the sole obesity indicator may not fully capture adiposity complexity. Alternative measures (e.g., waist circumference, waist-to-hip ratio) could complement BMI. A further limitation is the use of fixed time points for anthropometric data collection (birth, 1 year, and 5 years), preventing assessment of other time points. Individual risk, however, may be better captured by BMI velocity derived from repeated measures at additional time points.
We opted to develop unified models across sexes, as sex was a significant predictor only at age 20 with lower weightings than Y5BMIz or PGSs. Further, childhood and adolescent BMI z-scores already account for sex-related growth differences. However, sex-specific models may reveal further insights in larger samples, where halving the sample size would not limit statistical power. Finally, no causation is implied in this study, as selected variables may capture latent variance rather than direct effects.
While we categorized data into epidemiological and genetic datasets, these are not entirely distinct. Anthropometric measures like Y5BMIz, parental height, and weight likely reflect genetic influences due to heritability. This design effectively models combined effects but is less suited for separating genetic and environmental contributions, unlike twin studies [61], which show increasing BMI heritability from infancy to adulthood, consistent with our finding that PGS influence strengthens post-adolescence while early-life factors dominate in childhood.
The KAN model slightly outperformed other machine learning models in R² scores, but is computationally complex compared to simpler models like Elastic Net. Its strength lies in providing symbolic formulas that reveal variable relationships, feature importance, and effect directions (Supplementary 2). However, the symbolic regression attribute of KANs struggled with weak relationships or low feature weights, such as PGS000921 at age 17, PGS002313 and PGS002161 at age 8, and Y5BMIz and BMI at ages 20 and 23 (Supplementary 2, Supplementary Section 5). Despite these challenges, KAN’s interpretable insights make it valuable for avoiding the “black box” interpretation of machine learning and allowing transparency when using the information to drive clinical decision making. Its use of information that is available at 5 years old and before makes it suitable for early intervention in childhood and prevention of adult obesity.
Future directions and conclusion
The findings underscore the importance of aggregating the combined effects of early-life factors, fixed genetic variants, parental and maternal influences in shaping BMI trajectories from childhood to early adulthood. These insights pave the way for more targeted and effective interventions aimed at promoting healthy weight and preventing obesity over the life course.
Future studies should follow up these findings to ascertain whether different fixed genetic variants contribute to patterns of childhood growth and whether skinfold thickness or more sophisticated measures of body composition (subcutaneous fat thickness measurements) have increased importance.
Data availability
The datasets generated and/or analyzed during the current study are not publicly available. The Raine Study is committed to a high level of confidentiality of the data in line with the informed consent provided by participants. Requests for data should be directed to the Raine Study Executive.
Code availability
The machine learning model code is available upon reasonable request.
Notes
The variables in the Raine Study are mostly harmonized with the LifeCycle Project-EU Child Cohort Network [13].
References
GBD 2019 Risk Factors Collaborators. Global burden of 87 risk factors in 204 countries and territories, 1990-2019: a systematic analysis for the Global Burden of Disease Study 2019. Lancet. 2020;396:1223–49.
Simmonds M, Llewellyn A, Owen CG, Woolacott N. Predicting adult obesity from childhood obesity: a systematic review and meta-analysis. Obes Rev. 2016;17:95–107.
Lehmann F, Varnaccia G, Zeiher J, Lange C, Jordan S. Influencing factors of obesity in school-age children and adolescents - A systematic review of the literature in the context of obesity monitoring. J Health Monit. 2020;5:2–23.
Wang Y, Min J, Khuri J, Li M. A Systematic Examination of the Association between Parental and Child Obesity across Countries. Adv Nutr. 2017;8:436–48.
Safron M, Cislak A, Gaspar T, Luszczynska A. Micro-environmental characteristics related to body weight, diet, and physical activity of children and adolescents: a systematic umbrella review. Int J Environ Health Res. 2011;21:317–30.
Woo Baidal JA, Locks LM, Cheng ER, Blake-Lamb TL, Perkins ME, Taveras EM. Risk factors for childhood obesity in the first 1000 days: A systematic review. Am J Prev Med. 2016;50:761–79.
Tie H-T, Xia Y-Y, Zeng Y-S, Zhang Y, Dai C-L, Guo JJ, et al. Risk of childhood overweight or obesity associated with excessive weight gain during pregnancy: a meta-analysis. Arch Gynecol Obstet. 2014;289:247–57.
Schnurr TM, Ängquist L, Nøhr EA, Hansen T, Sørensen TIA, Morgen CS. Smoking during pregnancy is associated with child overweight independent of maternal pre-pregnancy BMI and genetic predisposition to adiposity. Sci Rep. 2022;12:3135.
Schellong K, Schulz S, Harder T, Plagemann A. Birth weight and long-term overweight risk: systematic review and a meta-analysis including 643,902 persons from 66 studies and 26 countries globally. PLoS ONE. 2012;7:e47776.
Zheng M, Lamb KE, Grimes C, Laws R, Bolton K, Ong KK, et al. Rapid weight gain during infancy and subsequent adiposity: a systematic review and meta-analysis of evidence. Obes Rev. 2018;19:321–32.
Yan J, Liu L, Zhu Y, Huang G, Wang PP. The association between breastfeeding and childhood obesity: a meta-analysis. BMC Public Health. 2014;14:1267.
Rubino F, Cummings DE, Eckel RH, Cohen RV, Wilding JPH, Brown WA, et al. Definition and diagnostic criteria of clinical obesity. Lancet Diabetes Endocrinol. 2025;13:221–62.
Jaddoe VWV, Felix JF, Andersen A-MN, Charles M-A, Chatzi L, Corpeleijn E, et al. The LifeCycle Project-EU Child Cohort Network: a federated analysis infrastructure and harmonized data of more than 250,000 children and parents. Eur J Epidemiol. 2020;35:709–24.
Straker L, Mountain J, Jacques A, White S, Smith A, Landau L, et al. Cohort Profile: The Western Australian Pregnancy Cohort (Raine) Study-Generation 2. Int J Epidemiol. 2017;46:1384–85j.
Weissbrod O, Kanai M, Shi H, Gazal S, Peyrot WJ, Khera AV, et al. Leveraging fine-mapping and multipopulation training data to improve cross-population polygenic risk scores. Nat Genet. 2022;54:450–8.
Privé F, Aschard H, Carmi S, Folkersen L, Hoggart C, O’Reilly PF, et al. Portability of 245 polygenic scores when derived from the UK Biobank and applied to 9 ancestry groups from the same cohort. Am J Hum Genet. 2022;109:12–23.
Khera AV, Chaffin M, Wade KH, Zahid S, Brancale J, Xia R, et al. Polygenic prediction of weight and obesity trajectories from birth to adulthood. Cell. 2019;177:587–96.e9.
Monti R, Eick L, Hudjashov G, Läll K, Kanoni S, Wolford BN, et al. Evaluation of polygenic scoring methods in five biobanks shows larger variation between biobanks than methods and finds benefits of ensemble learning. Am J Hum Genet. 2024;111:1431–47.
Shim I, Kuwahara H, Chen N, Hashem MO, AlAbdi L, Abouelhoda M, et al. Clinical utility of polygenic scores for cardiometabolic disease in Arabs. Nat Commun. 2023;14:6535.
Ma Y, Patil S, Zhou X, Mukherjee B, Fritsche LG. ExPRSweb: An online repository with polygenic risk scores for common health-related exposures. Am J Hum Genet. 2022;109:1742–60.
Borisevich D, Schnurr TM, Engelbrechtsen L, Rakitko A, Ängquist L, Ilinsky V, et al. Non-linear interaction between physical activity and polygenic risk score of body mass index in Danish and Russian populations. PLoS ONE. 2021;16:e0258748.
Lambert SA, Gil L, Jupp S, Ritchie SC, Xu Y, Buniello A, et al. The Polygenic Score Catalog as an open database for reproducibility and systematic evaluation. Nat Genet. 2021;53:420–5.
Lambert SA, Wingfield B, Gibson JT, Gil L, Ramachandran S, Yvon F, et al. Enhancing the Polygenic Score Catalog with tools for score calculation and ancestry normalization. Nat Genet. 2024;56:1989–94.
Geurts P, Ernst D, Wehenkel L. Extremely randomized trees. Mach Learn. 2006;63:3–42.
Chen T, Guestrin C. XGBoost: a scalable tree boosting system. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining - KDD ’16. New York: ACM Press; 2016. p. 785–94.
Friedman JH. Greedy function approximation: a gradient boosting machine. Ann Statist. 2001;29:1189–232.
Zou H, Hastie T. Regularization and variable selection via the elastic net. J Royal Statistical Soc B. 2005;67:301–20.
Konstantinov AV, Utkin LV. Interpretable machine learning with an ensemble of gradient boosting machines. Knowl Based Syst. 2021;222:106993.
Palkar A, Dias CC, Chadaga K, Sampathila N. Empowering glioma prognosis with transparent machine learning and interpretative insights using explainable AI. IEEE Access. 2024;12:31697–718.
Du H, Yang Q, Ge A, Zhao C, Ma Y, Wang S. Explainable machine learning models for early gastric cancer diagnosis. Sci Rep. 2024;14:17457.
Liu Z, Wang Y, Vaidya S, Ruehle F, Halverson J. KAN: Kolmogorov-Arnold networks. 2024. Preprint at https://arxiv.org/abs/2404.19756.
Liu Z, Ma P, Wang Y, Matusik W, Tegmark M. KAN 2.0: Kolmogorov-Arnold networks meet science. 2024. Preprint at https://arxiv.org/abs/2408.10205.
Somvanshi S, Javed SA, Islam MM, Pandit D, Das S. A survey on Kolmogorov-Arnold network. ACM Comput Surv. 2026;58:1–35.
Zhong K, Chen Y, Yang W, Chen J, Tang P, Wang P, et al. Interpretable disease prediction based on Kolmogorov-Arnold networks. In: 2024 IEEE International Conference on Medical Artificial Intelligence (MedAI). IEEE; 2024. p. 645–50.
Lundberg SM, Lee S-I. A unified approach to interpreting model predictions. In: Proceedings of the 31st International Conference on Neural Information Processing Systems. Curran Associates Inc.; 2017. p. 4768–77.
Ribeiro M, Singh S, Guestrin C. “Why should I trust you?”: explaining the predictions of any classifier. In: Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Demonstrations. Stroudsburg, PA, USA: Association for Computational Linguistics; 2016. p. 97–101.
Chen F, Melton P, Vinsen K, Mori TA, Beilin L, Huang R-C. Explainable AI to predict a complex multifactorial outcome, childhood obesity: application to clinical epidemiology. medRxiv. 2025. https://www.medrxiv.org/content/10.1101/2025.06.21.25330041v1.
Darmasseelane K, Hyde MJ, Santhakumaran S, Gale C, Modi N. Mode of delivery and offspring body mass index, overweight and obesity in adult life: a systematic review and meta-analysis. PLoS ONE. 2014;9:e87896.
Ouyang F, Parker MG, Luo Z-C, Wang X, Zhang H-J, Jiang F, et al. Maternal BMI, gestational diabetes, and weight gain in relation to childhood obesity: the mediation effect of placental weight. Obesity. 2016;24:938–46.
Yu ZB, Han SP, Zhu GZ, Zhu C, Wang XJ, Cao XG, et al. Birth weight and subsequent risk of obesity: a systematic review and meta-analysis. Obes Rev. 2011;12:525–42.
Papoutsou S, Savva SC, Hunsberger M, Jilani H, Michels N, Ahrens W, et al. Timing of solid food introduction and association with later childhood overweight and obesity: the IDEFICS study. Matern Child Nutr. 2018;14:e12471.
Eriksen W, Sundet JM, Tambs K. Paternal age at birth and the risk of obesity in young adulthood: a register-based birth cohort study of Norwegian males. Am J Hum Biol. 2013;25:29–34.
Hemati Z, Keikha M, Riahi R, Daniali SS, Goudarzi M, Kelishadi R. A systematic review on the association of month and season of birth with future anthropometric measures. Pediatr Res. 2021;89:31–45.
Reilly JJ, Armstrong J, Dorosty AR, Emmett PM, Ness A, Rogers I, et al. Early life risk factors for obesity in childhood: cohort study. BMJ. 2005;330:1357.
Huang R-C, de Klerk NH, Smith A, Kendall GE, Landau LI, Mori TA, et al. Lifecourse childhood adiposity trajectories associated with adolescent insulin resistance. Diabetes Care. 2011;34:1019–25.
Eriksson JG, Forsén T, Tuomilehto J, Winter PD, Osmond C, Barker DJ. Catch-up growth in childhood and death from coronary heart disease: longitudinal study. BMJ. 1999;318:427–31.
Hanson MA, Gluckman PD. Early developmental conditioning of later health and disease: physiology or pathophysiology? Physiol Rev. 2014;94:1027–76.
Whitaker RC, Pepe MS, Wright JA, Seidel KD, Dietz WH. Early adiposity rebound and the risk of adult obesity. Pediatrics. 1998;101:E5.
Santos S, Gaillard R, Oliveira A, Barros H, Abrahamse-Berkeveld M, van der Beek EM, et al. Associations of infant subcutaneous fat mass with total and abdominal fat mass at school-age: the generation R study. Paediatr Perinat Epidemiol. 2016;30:511–20.
Freedman DS, Katzmarzyk PT, Dietz WH, Srinivasan SR, Berenson GS. Relation of body mass index and skinfold thicknesses to cardiovascular disease risk factors in children: the Bogalusa Heart Study. Am J Clin Nutr. 2009;90:210–6.
Huang R-C, de Klerk N, Mori TA, Newnham JP, Stanley FJ, Landau LI, et al. Differential relationships between anthropometry measures and cardiovascular risk factors in boys and girls. Int J Pediatr Obes. 2011;6:e271–82.
Vogelezang S, Gishti O, Felix JF, van der Beek EM, Abrahamse-Berkeveld M, Hofman A, et al. Tracking of abdominal subcutaneous and preperitoneal fat mass during childhood. The Generation R Study. Int J Obes. 2016;40:595–600.
Gyllenhammer LE, Alderete TL, Toledo-Corral CM, Weigensberg M, Goran MI. Saturation of subcutaneous adipose tissue expansion and accumulation of ectopic fat associated with metabolic dysfunction during late and post-pubertal growth. Int J Obes. 2016;40:601–6.
Mujkić R, Šnajder Mujkić D, Ilić I, Rođak E, Šumanovac A, Grgić A, et al. Early childhood fat tissue changes-adipocyte morphometry, collagen deposition, and expression of CD163+ cells in subcutaneous and visceral adipose tissue of male children. Int J Environ Res Public Health. 2021;18:3627.
Choh AC, Lee M, Kent JW, Diego VP, Johnson W, Curran JE, et al. Gene-by-age effects on BMI from birth to adulthood: the Fels Longitudinal Study. Obesity. 2014;22:875–81.
Hohenadel MG, Baier LJ, Piaggi P, Muller YL, Hanson RL, Krakoff J, et al. The impact of genetic variants on BMI increase during childhood versus adulthood. Int J Obes. 2016;40:1301–9.
Brandkvist M, Bjørngaard JH, Ødegård RA, Åsvold BO, Smith GD, Brumpton B, et al. Separating the genetics of childhood and adult obesity: a validation study of genetic scores for body mass index in adolescence and adulthood in the HUNT Study. Hum Mol Genet. 2021;29:3966–73.
Richardson TG, Sanderson E, Elsworth B, Tilling K, Davey Smith G. Use of genetic variation to separate the effects of early and later life adiposity on disease risk: mendelian randomisation study. BMJ. 2020;369:m1203.
Hawkes G, Beaumont RN, Tyrrell J, Power GM, Wood A, Laakso M, et al. Genetic evidence that high BMI in childhood has a protective effect on intermediate diabetes traits, including measures of insulin sensitivity and secretion, after accounting for BMI in adulthood. Diabetologia. 2023;66:1472–80.
Wu Y, Lye S, Dennis C-L, Briollais L. Exclusive breastfeeding can attenuate body-mass-index increase among genetically susceptible children: a longitudinal study from the ALSPAC cohort. PLoS Genet. 2020;16:e1008790.
Silventoinen K, Li W, Jelenkovic A, Sund R, Yokoyama Y, Aaltonen S, et al. Changing genetic architecture of body mass index from infancy to early adulthood: an individual based pooled analysis of 25 twin cohorts. Int J Obes. 2022;46:1901–9.
Acknowledgements
We gratefully acknowledge all Raine Study participants and their families for their continued participation in the study, as well as the Raine Study team for study co-ordination and data collection. We also thank the NHMRC and the Raine Medical Research Foundation for their support. The core management of the Raine Study is funded by The University of Western Australia, Curtin University, The Kids Research Institute Australia, Women and Infants Research Foundation, Edith Cowan University, Murdoch University, The University of Notre Dame Australia and the Western Australian Future Health Research and Innovation Fund (2023-2024; Grant ID WACSOSP2023-2024). The Pawsey Supercomputing Centre provided the computational resources for the analyses, with funding from the Australian Government and the Government of Western Australia. Data collection for the Raine Study Gen1 and Gen2 1-, 2-, 5-, 8-, 10-, 14-, 17-, 20-, 22-, and 26-year follow-ups was funded by NHMRC project grants (211912, 003209, 572613, 403981, 353514, 572613, 403981, 1059711, 1084947) and the Raine Medical Research Foundation.
Author information
Contributions
FC designed and conducted the study, developed the model, aggregated and visualized the results, performed the analysis, and drafted the manuscript and supplementary materials. RH was responsible for data acquisition, results interpretation, analysis, and manuscript writing. PM contributed to data acquisition, model evaluation, and results interpretation and analysis. KV contributed to model evaluation and provided computational resources. TM and LB contributed to the results analysis. All authors reviewed and approved the final manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Ethics approval
This study, utilizing de-identified data from the Raine Study Gen2 cohort, received an exemption from ethics review by the University of Western Australia’s Human Research Ethics Office (Ref: 2025/ET000396, April 28, 2025), as no new data from human subjects were collected. Informed consent was obtained from participants’ guardians during the original Raine Study, adhering to institutional protocols. Data privacy and confidentiality were ensured through de-identification and secure storage, compliant with Australian data protection guidelines. No compensation was provided to participants, as the study relied on existing data.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Chen, F., Melton, P.E., Vinsen, K. et al. Longitudinal prediction of BMI using explainable AI: integrating polygenic scores, maternal, early-life and familial factors. Int J Obes 50, 1142–1149 (2026). https://doi.org/10.1038/s41366-026-02050-1
DOI: https://doi.org/10.1038/s41366-026-02050-1