Introduction

Psychiatric illnesses are among the top contributors to disability globally1 and arise from an interaction between genes and the environment (GxE)2,3 at critical, often early, periods of development (GxExTime).4 Mental illness can manifest in childhood with behavioural problems,5,6 and childhood behaviour is similarly subject to GxE interactions.7 The stratification of individuals based on genetics could potentially identify those eligible for early intervention in the form of environmental modifications.8 In a clinical context, the ever-decreasing cost of high-throughput analyses and the potential availability of genetic material from routine heel-prick sampling9 imply that such precision prevention could be within reach;8 however, potential genotypes and beneficial interventions are still subject to investigation.

One modifiable environmental exposure consistently associated with child behaviour is nutrition10,11 and evidence suggests that it is also possible to change early diet through individual12,13 and community intervention.14 In the Raine Study, both breastfeeding15 and Mediterranean diet16 have been associated with reduced problem behaviour in childhood and adolescence; however, traditional observational studies do not account for potential genetic confounding of perinatal risks (including poor nutrition) or GxE-dependent effects.17 For instance there is poor consistency between observational studies and results from the large breastfeeding trial PROBIT.18,19 Here breastfeeding support programs increased cognition at age 6.5 but did not reduce problem behaviour,12 which could suggest specific confounding in nutrition-behaviour correlates. Alternatively early nutritional effects on long-term outcomes may vary by underlying genotype as demonstrated for cardiometabolic health,20 meaning interventions should be directed to at-risk populations.

Studies of gene-nutrition interactions in determining child behaviour have been limited by sparse results in the genome-wide association studies (GWAS) of child behaviour, albeit the most recent efforts yielded two significant genetic variants associated with total problems;21 however the co-morbidity patterns and changing manifestations of psychopathology across childhood and adolescence have prompted research into cross-disorder prediction from psychiatric GWAS.22 A recent screening of genetic risk profiles demonstrated that polygenic scores (PGS) for attention-deficit hyperactivity disorder (ADHD),23 depression24 and chronic multisite pain (CMSP)25 had considerable influence on a construct of general child psychopathology26 derived from the parent-reported child behaviour checklist (CBCL).27 The utility of these four genetic risks in predicting early nutrition effects on behaviour has not been examined. Surprisingly, an association has also emerged between birth weight (BW) PGS and childhood behaviour28,29 with different effects in males and females,30 and our group previously demonstrated an interaction between BW-PGS and early nutrition in predicting adult cardiometabolic risk.20 The two exposures of interest (breastfeeding and early diet) were therefore screened for interaction with five PGS to predict behaviour problems in childhood and adolescence. We hypothesised that breastfeeding and diet effects on problem behaviour would be larger in genetically vulnerable individuals.

We show that a healthier year 1 diet is associated with reduced CBCLTOT in those with a lower PGS of ADHD, driven primarily by plant-based food intake, with a reduced diet effect size at age three; in contrast, a longer breastfeeding duration was associated with reduced CBCLTOT in those with a higher PGS of CMSP, albeit at a borderline significant level. These results highlight the complex interactions between genetics and nutrition in sensitive developmental windows to shape lasting behavioural traits.

Methods

Sample

The Raine Study is a well-characterised pregnancy cohort located in Western Australia.31,32,33 In brief, the recruitment of 2900 mothers during pregnancy began in 1989 from the King Edward Memorial Hospital and surrounding clinics. The Raine Study Gen1 and Gen2 antenatal data used in this paper included maternal questionnaire data from 18 weeks gestation and mother-baby dyads were followed through until birth. Subsequently, 2730 participating mothers (Generation 1/Gen1) and their 2868 offspring (Generation 2/Gen2) were followed up throughout childhood and adolescence with various phenotype assessments, including behaviour. The data for the present investigation were gathered from the Raine Study Gen2–1 year follow-up through to Gen2–17 year follow-up (the primary outcome being the longitudinal investigation of child behaviour in the Gen2 of the Raine Study at ages 2, 5, 8, 10, 14 and 17). For the present study, we used data from the sub-cohort that agreed to provide samples for genotyping. The study was conducted in accordance with the Declaration of Helsinki, and all participants provided written consent for their participation in the study at each follow-up. Ethics approvals were granted from the Human Research Ethics Committee of King Edward Memorial Hospital, Princess Margaret Hospital, the University of Western Australia, and the Health Department of Western Australia.

Outcomes

The primary outcome was the total problem T-score as assessed by the CBCL at ages 2, 5, 8, 10, 14 and 17 (T-scoreTOT). In brief, the CBCL is a clinically used psychometric instrument evaluating child behaviour wherein parents are asked to rate individual items (e.g. ‘Acts too young for age’) on a 3-point Likert scale (0 = ‘Not true’, 1 = ‘Somewhat or sometimes true’, 2 = ‘Very often or often true’). Items are summed to generate a hierarchy of scores: the highest level is the total problem score summing all items; factor analysis has demonstrated clustering of items signifying externalising (aggressive and delinquent behaviour) and internalising (anxious/depressed, withdrawn and somatic complaints) problems (CBCL T-scoreEXT and CBCL T-scoreINT).34 The age 2 assessment was done with the preschool CBCL ages 2–3 years35 and the age 5–17 assessments were done with the CBCL4/18.27 Questionnaires with more than 8 missing items were discarded. Missing items for those with 8 or fewer missing items were treated as zero-scores. As a secondary outcome and to confirm the main findings, we used a teacher report form (TRF) filled out at age 10. This is a questionnaire by the same provider, which has parallel questions modified for the school context, and which follows a similar hierarchical construct order.

From each of the total, internalising and externalising scales, a T-score was derived (age and sex standardised), which is clinically used to group behaviours as normal (up to and including 59), borderline (60–64) or clinically significant problems (65+).
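The missing-item rule and the clinical bands described above can be written down compactly. The following is a minimal sketch only: the function names are hypothetical, `None` is assumed to mark a missing item, and the conversion from raw score to T-score (which depends on age- and sex-specific norm tables) is deliberately not shown.

```python
from typing import Optional, Sequence

# Questionnaires with more than 8 missing items are discarded (from the text).
MAX_MISSING_ITEMS = 8


def raw_problem_score(items: Sequence[Optional[int]]) -> Optional[int]:
    """Sum 0-2 item ratings; None marks a missing item (an assumption).

    Returns None when the questionnaire would be discarded.
    """
    n_missing = sum(1 for item in items if item is None)
    if n_missing > MAX_MISSING_ITEMS:
        return None
    # Remaining missing items are treated as zero-scores, i.e. they add nothing.
    return sum(item for item in items if item is not None)


def classify_t_score(t_score: float) -> str:
    """Map a T-score to the bands: normal (<=59), borderline (60-64), clinical (65+)."""
    if t_score >= 65:
        return "clinical"
    if t_score >= 60:
        return "borderline"
    return "normal"
```

For instance, `classify_t_score(62)` falls in the borderline band, which is also the range captured by the T-scoreTOT > 59 cut point used in the sensitivity analysis.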

Exposures

Early nutrition

We chose two primary exposures of interest: breastfeeding duration and maternal diet report at year 1.

Total breastfeeding duration was derived from maternal recall at follow-ups in years 1, 2 and 3 of life. Mothers were asked ‘Did you breastfeed your baby?’ If they said ‘no’ this was recorded as ‘never breastfed’. If they said ‘yes’ they were asked ‘At what age did you stop breastfeeding?’ with the answer recorded in months. Previous studies have found excellent reliability of maternal breastfeeding duration recall even up to 6 years after delivery.36

Diet was assessed using 24-h maternal dietary recall37 and categorised with nutritionist supervision according to the categories of the Youth Healthy Eating Index, which reflects dietary guideline adherence in children.38 Parents were asked: ‘please describe what food and drink your child has eaten in the past 24 h (please specify type of food/drink and quantity)’ and space was provided for breakfast, morning snack, lunch, afternoon snack, dinner and evening snack. Because of varying questionnaire responses related to portion size, only the food quality at each meal was included in a quasi-quantitative score, the eating assessment in toddlers (EAT) score.37 In brief, seven food sub-categories were scored 0–10 points depending on how many times they were offered at meals during the day: wholegrain, vegetables, fruits, meat ratio, dairy, snack foods, and sweetened beverages. The meat ratio was defined as \(\frac{\mathrm{white\ meat}+\mathrm{egg}+\mathrm{other\ protein\ sources}}{\mathrm{red\ meat}+\mathrm{processed\ meat}}\). Our primary diet exposure, the total EAT score (range 0–70), summed these numbers, treating the first five components as positive and the latter two as negative (higher score proxying healthier diet). As a secondary measure to test the effects of dietary timing, a similar EAT score collected at age 3 years was included. In post hoc analysis, we also created three groups based on the food categories: plant-based (wholegrain, vegetables, and fruits), animal (meat ratio and dairy), and junk food products (snack foods and sweetened beverages). To facilitate effect size comparability for these categories, we used normalised scores.
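One way to read the scoring rule above is sketched below. It rests on an assumption the text does not spell out: each component is first recorded as 0–10 points with higher values meaning the food was offered more often, and the two negative components are reverse-scored (10 minus points) so that the total stays within 0–70. The dictionary keys and function names are hypothetical.

```python
# Components counted as positive and negative, as listed in the text.
POSITIVE = ("wholegrain", "vegetables", "fruits", "meat_ratio", "dairy")
NEGATIVE = ("snack_foods", "sweetened_beverages")

# Post hoc food groups described in the text.
GROUPS = {
    "plant_based": ("wholegrain", "vegetables", "fruits"),
    "animal": ("meat_ratio", "dairy"),
    "junk": ("snack_foods", "sweetened_beverages"),
}


def total_eat_score(points: dict) -> float:
    """Total EAT score (0-70); higher is taken to proxy a healthier diet."""
    for name in POSITIVE + NEGATIVE:
        if not 0 <= points[name] <= 10:
            raise ValueError(f"{name} must lie between 0 and 10")
    positive = sum(points[name] for name in POSITIVE)
    # Assumption: negative components are reverse-scored.
    negative = sum(10 - points[name] for name in NEGATIVE)
    return positive + negative


def group_points(points: dict) -> dict:
    """Unweighted sum of component points within each post hoc food group."""
    return {group: sum(points[n] for n in names) for group, names in GROUPS.items()}
```

Under this reading, a child offered every positive component at the maximum and no snack foods or sweetened beverages would score 70, and the opposite pattern would score 0.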

Genetics data and polygenic scores

A total of 1593 participants had genotype data of approximately 560,000 single-nucleotide polymorphisms (SNPs) and ~95,000 copy number variants, obtained from an Illumina 660W Quad Array at the Centre for Applied Genomics, Toronto, Canada, with quality control (QC) as per standard protocol. Plate controls and replicates with a higher proportion of missing data were excluded. We then assessed low genotyping success (>3% missing), excessive heterozygosity, gender discrepancies between the core and genotyped data, and cryptic relatedness (π > 0.1875, in between second- and third-degree relatives). The SNP data were cleaned using PLINK39 following the Wellcome Trust Case-Control Consortium protocol.40 The exclusion criteria for SNPs included: Hardy–Weinberg equilibrium p < 5.7 × 10⁻⁷; call rate < 95%; minor allele frequency < 1%; and SNPs of possible strand ambiguity (i.e. A/T and C/G SNPs). The cleaned GWAS data were imputed using MACH software41 across the 22 autosomes and the X chromosome against the 1000 Genomes Project Phase I version 3.42 After QC, 1494 participants had SNP data, and imputation resulted in 30,061,896 autosomal and 1,264,4493 X-linked SNPs. Principal components (PCs) analysis was then carried out, using SMARTPCA from v.3.0 of EIGENSOFT,43 and PCs were generated to adjust for population stratification.
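The SNP exclusion criteria listed above can be expressed as a simple per-SNP filter. This is an illustrative sketch, not the PLINK workflow used in the study: the `SnpSummary` record and its field names are hypothetical, and only the four stated thresholds are taken from the text.

```python
from dataclasses import dataclass

# Thresholds quoted in the text.
HWE_P_MIN = 5.7e-7      # exclude if Hardy-Weinberg p is below this
CALL_RATE_MIN = 0.95    # exclude if call rate is below 95%
MAF_MIN = 0.01          # exclude if minor allele frequency is below 1%
# Strand-ambiguous allele pairs (A/T and C/G).
AMBIGUOUS_PAIRS = {frozenset("AT"), frozenset("CG")}


@dataclass
class SnpSummary:
    """Hypothetical per-SNP summary record."""
    snp_id: str
    allele1: str
    allele2: str
    hwe_p: float
    call_rate: float
    maf: float


def passes_qc(snp: SnpSummary) -> bool:
    """Return True if the SNP survives all four exclusion criteria."""
    if snp.hwe_p < HWE_P_MIN:
        return False
    if snp.call_rate < CALL_RATE_MIN:
        return False
    if snp.maf < MAF_MIN:
        return False
    alleles = frozenset((snp.allele1.upper(), snp.allele2.upper()))
    return alleles not in AMBIGUOUS_PAIRS
```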

PGS were generated using the top hits from three GWASs likely to be associated with general psychopathology:26 for ADHD we had 27 out of 27 available SNPs,23 for depression we had 95 out of 103,24 and for CMSP we had 36 out of 39.25 In addition, we had 146 out of 146 SNPs associated with own BW.20,44 For the CBCL total problems GWAS, only one of two SNPs (rs10767094) was available, and the info score was borderline (0.30411).21 Participant SNP data were extracted and recoded to correspond with increasing psychiatric risk (or increasing BW). The recoded SNP data were then weighted using the beta-coefficients reported in the respective meta-analysis (for the depression GWAS, we used coefficients from the combined meta-analysis and 23andMe replication), summed and re-scaled to give each PGS for each study participant. We confirmed the PGS predictive value for measures with relevant target data (adult self-report of depressive symptoms, a teacher assessment of ADHD, and the measured BW (data not shown)). To ease comparability in effect size, we used normalised scores.
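The weighting and summing step described above amounts to a weighted sum of risk-allele dosages. A minimal sketch follows; it assumes dosages are already recoded so that higher values mean higher risk (or higher BW), that SNPs missing from the genotype data are simply skipped, and that "normalised" means a z-score. The last point is an assumption, as the text does not specify the re-scaling, and all names are hypothetical.

```python
from statistics import mean, pstdev
from typing import Dict, List


def polygenic_score(dosages: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum of risk-allele dosages over SNPs available in both inputs."""
    return sum(beta * dosages[snp] for snp, beta in weights.items() if snp in dosages)


def normalise(scores: List[float]) -> List[float]:
    """Z-score the raw scores across participants (assumed re-scaling)."""
    mu = mean(scores)
    sd = pstdev(scores)
    return [(score - mu) / sd for score in scores]
```

With normalised scores, a regression coefficient for a PGS reads as the change in outcome per 1 SD of genetic score, which is what makes effect sizes comparable across the five scores.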

Confounders and additional variables

Confounder selection for the relationship between early nutrition and child behaviour was based on an a priori search of the literature and visualised using directed acyclic graphs (Fig. 1A).45 To avoid reverse causality (i.e. postnatal variables being a result of aberrant child behaviour), we used prenatal variables collected during weeks 16–18 of pregnancy. For the relationship between early nutrition and behaviour, we decided on a confounder set consisting of maternal age at birth, education level (highest year of finished education), family income (5-level variable treated as continuous) and civil status (dichotomised into single or partnered). Detailed descriptions of gestation duration and questionnaire formulation can be found in the Supplementary Materials. Additionally, Gen2 participants' age at assessment, biological sex collected at birth (one participant had missing data and sex was derived from a later questionnaire), and relevant PCs were included as fixed effects.

Fig. 1: The variables used for analysis.

A Conceptual framework of analysis and B correlation table of the variables used in regression. EAT eating assessment in toddlerhood, ADHD attention deficit hyperactivity disorder, CMSP chronic multisite pain, PGS polygenic score, SNP single nucleotide polymorphism.

Statistics

Exposures, outcomes, and confounders were visually assessed and presented as summary statistics. We examined the Pearson correlation between variables, with particular focus on gene-environment correlations of nutritional measures and PGS that might bias the estimates.45 The family income variable had more missing values than the remaining confounders (n = 56). To avoid bias and loss of power, these missing values were assigned the family income mean value for primary analysis (see also sensitivity analysis). The crude associations of the PGS and nutritional measures with T-scoreTOT were then estimated by regression at each age.
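The handling of missing family income values can be illustrated with a single-value imputation helper. This is a sketch under stated assumptions: `None` marks a missing value, the function name is hypothetical, and the "min" and "max" strategies correspond to the sensitivity reanalysis rather than the primary analysis.

```python
from statistics import mean
from typing import List, Optional


def impute_income(values: List[Optional[float]], strategy: str = "mean") -> List[float]:
    """Replace missing income levels with the observed mean, minimum or maximum."""
    observed = [v for v in values if v is not None]
    fill = {"mean": mean, "min": min, "max": max}[strategy](observed)
    return [fill if v is None else v for v in values]
```

Comparing estimates across the three strategies gives a rough bound on how much the 56 imputed values could move the results.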

Given the two primary exposures of interest (breastfeeding and year 1 diet) and the five genetic profiles with potential interaction, we pragmatically chose a significance threshold of 0.025 to limit chance findings while preserving power within our limited sample. The primary model using repeated CBCL T-scores was a linear mixed effects model with clustering at the participant ID level and varying intercepts. The interaction term between PGS and breastfeeding or year 1 EAT scores was then assessed. Model residuals showed some deviation from normality; therefore, we obtained two-tailed p-values from Z-statistics based on SE derived by non-parametric bootstrap (5000 resamples) to confirm primary results. Using Johnson–Neyman intervals, we derived a cut point to categorise those with and without significant effects from EAT scores. Results were reanalysed, assigning ‘family income’ missing values maximum and minimum values (instead of mean). Supplementary/sensitivity analyses of the model included a generalised logistic mixed effects model of a T-scoreTOT > 59, age 10 TRF T-scoreTOT, specifically examining the age 2 rating, the exclusion of preterm births and using internalising and externalising outcomes. As we previously found sex-specific effects of the BW-PGS on behaviour30 we also investigated sex-stratified BW-PGS-by-nutrition interaction models.
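The bootstrap-based p-values described above follow a simple recipe: re-estimate the coefficient on resampled data, take the standard deviation of the resampled estimates as the SE, and refer estimate/SE to a standard normal distribution. The sketch below assumes that whole participants are resampled with replacement (so repeated measures stay together), which the text does not state explicitly. The `estimator` argument is a hypothetical stand-in for refitting the mixed model and extracting the interaction coefficient.

```python
import random
from statistics import NormalDist, stdev
from typing import Callable, List, Sequence, Tuple


def bootstrap_z_test(
    participants: Sequence,
    estimator: Callable[[List], float],
    n_boot: int = 5000,
    seed: int = 0,
) -> Tuple[float, float, float, float]:
    """Return (estimate, bootstrap SE, Z-statistic, two-tailed p-value)."""
    rng = random.Random(seed)
    estimate = estimator(list(participants))
    boot_estimates = []
    for _ in range(n_boot):
        # Resample whole participants with replacement (assumed cluster bootstrap).
        resample = rng.choices(participants, k=len(participants))
        boot_estimates.append(estimator(resample))
    se = stdev(boot_estimates)
    z = estimate / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return estimate, se, z, p_value
```

A result would then be called significant in this study's framework if `p_value` falls below the pre-specified 0.025 threshold.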

Post hoc, we sought to identify the importance of specific food items and timing. Individual EAT-score categories (plant-based, animal, and junk food products) were added to the same model to examine the specificity of effects. Having found an effect of age-1 plant-based food intake, we then made a final model including plant-based food intake at both ages 1 and 3. As the age 3 diet was collected after the first CBCL, only age 5–17 assessments were used for this analysis. For breastfeeding, we divided the sample into approximately equal portions by splitting the cohort into groups with 0–2, 3–6, 7–12 and 13–18 months of breastfeeding to explore potential breastfeeding-sensitive age windows and potential non-linear effects.

Results

Sample description

We identified 1395 Gen2 participants with available data (1393 vs 1310 for analysis of breastfeeding duration and year 1 diet, respectively; see Supplementary Fig. 1 for flowchart). Participants had an average of 5.2 behavioural assessments; 12 were assessed only once, and 717 were assessed all six times. No clear differences were seen in variable distributions for participants used in breastfeeding and diet models (Table 1). Additional variables are described in the supplement (sT1). The analytic sample had more favourable maternal baseline demographics than the excluded Gen2 participants, assessed by birthweight and socioeconomic variables (sT2). The prespecified confounder variables correlated with both breastfeeding duration and diet, whereas the correlations for the PGSs were negligible (r < 0.1) (Fig. 1). In isolation, PGS did not predict total problems in childhood and adolescence; in contrast, longer breastfeeding and higher EAT1 scores were associated with fewer behaviour problems even when accounting for confounders (sT3).

Table 1 Baseline demographics for the analytic sample (n = 1395)

Polygenic score by nutrition interactions

Continuing to the primary question, we did not find statistically significant interactions between breastfeeding duration and PGSs at our predetermined significance threshold (all p > 0.025; see Table 2); however, a borderline result suggested that a higher genetic risk for CMSP amplified the positive effects of breastfeeding (p = 0.03) (Table 2). Exploring this borderline result, we split the cohort at the 0.025 alpha with a Johnson–Neyman plot (Fig. 2), yielding 669 low- and 724 high-risk participants. This coincided with a pronounced effect difference between groups in models of externalising behaviour (Table 3, B: −0.0816, 95% CI: [−0.131, −0.0291], p = 0.002), meaning the high-risk group saw a 0.124 point T-scoreEXT reduction for every 1-month increase in breastfeeding. The primary results were present in the term-born cohort and did not diminish with increasing age (sT4). In the pre-specified sex stratification, there was no signal to suggest interaction between BW-PGS and breastfeeding (sT5). Family income missing values did not appear to influence effect estimates (sT6). Post hoc, we found that the differences in the primary linear mixed models coincided with lower T-scoreTOT for breastfeeding beyond 12 months in the genetic high-risk participants (Table 4).

Fig. 2: Splitting the cohort by genetic risk.

We see that the effects of age 1 diet on child behaviour total problems diminish with increasing ADHD polygenic score, whilst the effect of breastfeeding increases with increasing chronic multisite pain score. Error bands signify the 97.5% confidence interval.

Table 2 Interactions between genetic risk and early life nutrition for total behaviour problems
Table 3 Behavioural specificity of nutrition association
Table 4 Breastfeeding brackets

For diet, we found a significant interaction between ADHD-PGS and EAT1 score (p = 0.0005). Using a Johnson–Neyman plot and splitting at the 0.025 alpha (Fig. 2), we defined two groups of 516 vs 794 participants with high and low ADHD-PGS, respectively. The remaining interactions did not reach significance; however, a signal suggested that the year-1 diet only benefitted males with a higher BW PGS in sex-stratified models (sT5).
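The Johnson–Neyman split used here can be derived in closed form from the interaction model. If the effect of diet at PGS value m is b_x + b_int·m, its variance is var_x + 2m·cov + m²·var_int, and the boundaries are the PGS values at which the ratio of effect to SE equals the critical value. The sketch below uses a normal critical value and a two-sided alpha of 0.025 as assumptions (the classical method uses a t critical value), and all argument names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist
from typing import List


def johnson_neyman_bounds(
    b_x: float,
    b_int: float,
    var_x: float,
    var_int: float,
    cov_x_int: float,
    alpha: float = 0.025,
) -> List[float]:
    """Moderator values where the conditional effect is exactly at the alpha boundary.

    Solves (b_x + b_int*m)^2 = z^2 * (var_x + 2*m*cov_x_int + m^2*var_int) for m.
    Returns an empty list when no real boundary exists.
    """
    z = NormalDist().inv_cdf(1 - alpha / 2)
    a = b_int ** 2 - z ** 2 * var_int
    b = 2 * (b_x * b_int - z ** 2 * cov_x_int)
    c = b_x ** 2 - z ** 2 * var_x
    discriminant = b * b - 4 * a * c
    if a == 0 or discriminant < 0:
        return []
    root = sqrt(discriminant)
    return sorted([(-b - root) / (2 * a), (-b + root) / (2 * a)])
```

Participants whose normalised PGS lies on the significant side of a boundary would form one group and the remainder the other, which is how a single cut point can divide the cohort into high- and low-risk groups.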

In the ADHD low-risk group, an improved diet (i.e. a higher EAT1 score) was associated with lower T-scoreTOT (Table 2, B: −0.121, 95% CI [−0.171, −0.0704]), whereas there was no effect in the high-PGS group. When examining the effects on lower-order behavioural domains, we observed marginally larger effects on externalising than internalising behaviours (Table 3). In the supplementary analysis (sT4), there was a trend towards increasing diet effect size with increasing age. However, this was not confirmed in the continuous model’s three-way interaction, and the signal was already present at 2 years of age. Results were consistent in the term-only cohort and diminished but maintained directional similarity with an age-10 teacher assessment. The OR of meeting a clinically relevant cut point was also reduced with a higher EAT1 score. Family income missing values did not appear to influence effect estimates (sT6).

Post hoc, we further explored the diet effects in the low-ADHD PGS group. First, we split the components of the EAT score into three categories: plant-based, animal-based and junk food. We found that primarily plant-derived foods (Table 5, B: −1.01 per 1 SD, 95% CI: [−1.50, −0.516], P: <0.0001) drove the association, with minimal effect modification if including year 3 plant-based food consumption (B: −0.954 per 1 SD, 95% CI: [−1.44, −0.451], P: 0.0002). We found a 30% effect size reduction for the age 3 plant-based consumption (B: −0.664, 95% CI: [−1.20, −0.130], P: 0.014). Breaking down the three components of the plant-based diet, the effects were largest for wholegrain, then fruit and least for vegetables (sT7). No significant associations emerged with age 3 plant-based foods.

Table 5 Exploring diet

Discussion

Using longitudinal data throughout childhood and adolescence, we have demonstrated the interplay between early nutrition and psychiatric PGS in determining behaviour. Consistent with our a priori hypothesis, a borderline signal suggested that individuals with a high genetic risk for CMSP had larger benefits from longer breastfeeding; however, contrary to our a priori hypothesis, a strong signal suggested that individuals with a low ADHD risk benefitted from improved diet in the first year of life. We note that the conventionally viewed ‘good’ early life nutrition (i.e. higher EAT score and longer duration of breastfeeding) was not associated with worse outcomes in any genetic subgroup. Nevertheless, contingency on genetic risk was considerable.

For diet, the ADHD low-risk group saw a 0.121 point lower T-scoreTOT per 1 point higher EAT score (scale 0–70). This association would, in a causal framework, translate to an 8–9 point improvement (SD of T-scoreTOT = 10 points) from the worst to the best diet, whereas the association in the high-risk group would translate to only a 0–1 point improvement. The behaviour association was detectable at age two, was consistent across ages with no sign of effect size reduction with increasing age, and we saw non-significant but directionally similar results with the teacher assessments. A higher EAT score was also associated with lower odds of scoring in the ‘borderline’ or higher range (T-score > 59), hinting at clinical significance. We did not see pronounced differences for specific behavioural patterns, but the effect sizes were larger for externalising compared to internalising behaviour.

For breastfeeding, we highlight that the p-value of the primary interaction did not cross the pre-specified significance threshold. Nevertheless, a signal suggested that every month of breastfeeding was associated with 0.11 points lower T-scoreTOT in those with a higher genetic risk for CMSP. The WHO recommends 2 years of breastfeeding and, in a causal framework, going from 0 to 24 months would translate to a 2–3 point reduction in T-scoreTOT in this group, whereas the low-risk group would see around a 1 point decrease. Exploring brackets of breastfeeding, most breastfeeding benefits in the high-risk group were seen after 6 months, whereas the low-risk group saw no additional T-scoreTOT reductions beyond two months. Exploring breastfeeding further, models of T-scoreEXT showed a strong interaction, hinting at behaviour-specific effects. This is consistent with the increased hostility seen in the ‘never-breastfed’ group of the Young Finns study on a total population level.46 The interaction was robust across ages and in those born at term.
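The extrapolations for diet and breastfeeding in this discussion are simple products of a per-unit slope and the exposure range. The arithmetic can be checked as follows; this is only a back-of-the-envelope sketch that, like the text, assumes a linear and causal relationship across the whole range, and the function name is hypothetical.

```python
def implied_range_effect(slope_per_unit: float, exposure_range: float) -> float:
    """T-score change implied by moving across the full exposure range."""
    return slope_per_unit * exposure_range


# Diet, low ADHD-PGS group: -0.121 T-score points per EAT point, scale 0-70.
diet_effect = implied_range_effect(-0.121, 70)

# Breastfeeding, high CMSP-PGS group: -0.11 T-score points per month, 0-24 months.
breastfeeding_effect = implied_range_effect(-0.11, 24)

print(f"{diet_effect:.2f}")           # prints -8.47, i.e. an 8-9 point improvement
print(f"{breastfeeding_effect:.2f}")  # prints -2.64, i.e. a 2-3 point reduction
```

Set against an SD of 10 points for T-scoreTOT, these correspond to roughly 0.85 SD for diet and 0.26 SD for breastfeeding.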

The mechanisms underlying our associations are not clear. In psychiatry the diathesis-stress model proposes that vulnerability interplays with environmental stressors to provoke mental illness.3 We believe that the borderline interaction between CMSP and breastfeeding is consistent with this theory, as shorter breastfeeding could be considered a stressor.47 From a mechanistic point of view breastfeeding has been proposed to alter neurodevelopment through several mechanisms. The long-chain fatty acids in breast milk have been proposed to enhance white matter myelination in the first 2 years of life.48 Alternatively changes in the gut microbiome from breastfeeding could affect the gut-brain axis leading to behavioural changes. Finally increased breastfeeding could enhance maternal bonding and affect both child behaviour and maternal evaluation of child behaviour. The first year of life has been considered critical for microbiome establishment49 and white matter development.50 As breastfeeding duration up to 2 years influences maternal sensitivity51 and the associations in the present study differed primarily beyond 12 months, we propose that the effects relate to increased effects of child-maternal bonding in psychiatrically vulnerable offspring.

The diet-by-PGS results in ADHD conflict with the diathesis-stress model, and we see two potential explanations for this surprising result. First, the genetics of ADHD may determine behaviour to such an extent that high-risk individuals are not amenable to influences from early environmental exposures such as diet. Indeed, the heritability of ADHD has been estimated at 70–80%,23 leaving little room for the environment; however, our PGS was a poor predictor in and of itself and only showed a significant prediction of T-scoreTOT at age 2 years. As such, our instrument is likely insufficient to capture such an exposure-resistant genetic predisposition. Alternatively, we observed an age two increase in behaviour problems associated with the ADHD-PGS (sT3), which opens the possibility that the interaction stems from gene-environment correlation;52 i.e. if the early increase in problem behaviour manifested as a refusal to eat less palatable foods, only those offered the ‘healthy’ foods and with a lower genetic risk would eat them. From a mechanistic perspective, we found evidence that plant-based, especially wholegrain, foods at age 1 exerted strong effects. We postulate that dietary fibre is the more important constituent underlying the association (as compared to polyphenols or vitamins more abundant in fruits and vegetables). Consistent with this, year 1 of life is considered ‘crucial’53 in establishing a healthy gut microbiome, and the robustness of the year 1 estimate to including year 3 diet hints at an early window of opportunity for optimising diet. Supporting this notion, differences in gut-microbiome composition in adult ADHD have been shown in many studies.54 Alternatively, short-chain fatty acids generated from gut-microbe metabolism of dietary fibre could enhance white matter myelination, as explained above.

Our study had several strengths. The key strength was the long-term follow-up of the same cohort over many years with repeated assessments throughout childhood and adolescence. A further strength was the use of consistent validated questionnaires, which were completed by multiple raters. The repeated follow-up in early life allowed us to conduct analyses regarding the time sensitivity of nutritional exposures and minimise the recall bias risk. Collection of confounders before birth ensured that we were not adjusting for downstream effects of offspring behaviour, and the prospective nature should minimise the risk of reverse causality. In addition the effects of the age-1 diet were robust and largely unaltered even when the age-3 measures were included, suggesting that the association is not explained by the correlation of early and late diet.

The study also has some limitations. Recruitment to the Raine Study began in 1989, and our results are not certain to generalise to later cohorts, i.e. modern dietary and breastfeeding patterns could change the GxE interaction in a more contemporary setting. Furthermore, the diet exposure was a 24-h recall and could be an imprecise measure of general diet patterns, which would bias our results towards the null. Another limitation is that our hypothesis-driven approach to PGS selection is unlikely to have picked the most influential PGS. A broad screening of genetic instruments would require stringent significance thresholds and, given our limited sample size, we refrained from this. Our results should also be interpreted with caution as our p-value threshold for significance was pragmatically chosen to account for a limited sample size, rather than a strict Bonferroni correction. Although we had a good follow-up of the genotyped cohort in the Raine Study (93% of the 1494 genotyped participants), the need for genetic information introduced selection bias and our sample consisted of half (51.3%) of the original cohort. This could limit the generalisability of our exposure-outcome relationships; however, we did not see signs of selection bias in the PGS histograms and previous studies have suggested that exposure-outcome relationships in the Raine Study follow-ups vs dropouts are similar.33 We had excellent data quality for potential confounders; however, potential residual confounding limits any causal conclusions regarding nutrition effect estimates.45 In contrast, the underlying confounder architecture would have to differ between high- and low-risk groups to explain away the interaction estimates, as the genes are subject to random allocation.

Future basic research should be directed at understanding the impact of timing for early nutritional effects on the gut microbiome; furthermore, investigation of the impact of ADHD risk on early-life eating habits is warranted, given the surprising lack of diet-behaviour association in the high-risk ADHD-PGS group. For breastfeeding, our results need replication before drawing robust conclusions. The detectability of effects at age 2 years is encouraging for future early nutrition RCTs, and we hope interventionists studying early nutrition will include genetic profiling in trials for subgroup analysis.

In conclusion, we show for the first time that early nutrition and psychiatric genetic risk interact in shaping behaviour throughout childhood and adolescence.