Introduction

During early-life development, the pediatric immune system undergoes rapid changes, shifting from maternally derived protection to a mature state marked by changes in immune cell proportions and immunological function1. Although this transition is essential for both sexes, differences in immune responses are observed between males and females in early life. However, most immunological work investigating male-female differences focuses almost exclusively on the post-pubescent stage, where sex hormones are known to play a role in moderating immune system function2,3. While sex hormones are critical for fetal development and may contribute to sex differences occurring in immune cell composition, immune responsiveness, and disease incidence later in life, much remains unknown about prepubescent differences in immune system composition and the underlying molecular mechanisms that may drive them2. To address this gap, we investigated early-life molecular differences between the sexes within the immune system, focusing on shifts in immune cell composition and epigenetic changes during development.

In the first 3 months of life, all typically developing infants converge on a similar growth trajectory, suggesting that during development there is also a stereotypical path to immune system maturity4. Innate immune cells, such as neutrophils, eosinophils, basophils, and monocytes, form the first line of defense and mount a consistent response undergoing functional changes during early development1. Adaptive immune cells, including B and T cells, serve as a secondary line of defense and develop antigen-specific responses that improve with repeated exposure1. For example, in healthy children, monocytes are initially immature with impaired signaling capacity, natural killer (NK) cells exhibit only half the cytolytic function seen in adults at birth, and early B memory cells demonstrate reduced antibody affinity maturation compared to later life1. Although immune cell maturation trajectories occur in both males and females, sex-specific differences in the proportions of certain immune cell types have been observed as early as in cord blood samples5. Specifically, males exhibit a higher proportion of monocytes, and females have higher proportions of B cells with greater basal immunoglobulin levels6. It is plausible that in the absence of sex hormone expression during childhood, these sex-related biological differences could arise during gestation or the subsequent mini puberty that occurs during infancy7,8. Hormonal levels of progesterone and testosterone fluctuate during pregnancy, serving distinct roles during this gestational period and in the fetal developmental timeline8,9,10. After birth, sex hormone levels rise briefly in association with maturation of sex organs8,9,10. These sex hormones may also interact with environmental stressors and exposures, ranging from pollution to phthalates, resulting in altered hormonal levels in the fetus and disparities in later health outcomes between males and females11,12,13. 
This research on birth tissues, such as umbilical cord blood, suggests that sex-based health disparities in later life could be related to biological differences in the immune system that are established during gestation and further modified by the re-emergence of sex hormones in puberty5,7. Even with the observation that children converge on a common developmental trajectory overall, there is compelling evidence that sex differences are present in immune responses and health outcomes beginning at birth and proceeding throughout the life course.

Life course trajectories can be reflective of biological embedding of experience14. The Developmental Origins of Health and Disease (DOHaD) framework hypothesizes that the first 1000 days of life, conception to about two years of age, are characterized by rapid development. During this period, environmental exposures or intrinsic psychological states will impact the developing fetus and may have significant, long-lasting influence on the offspring’s health and wellbeing15,16. One such mechanism associated with biological embedding of early life experiences is epigenetics, mitotically heritable changes that occur around the genome with the potential for influencing gene expression while not altering the underlying sequence17. DNA methylation (DNAm), a well-characterized epigenetic mark in human population studies, involves the addition of a methyl group to the cytosine DNA residue. Importantly, DNAm is dynamic and can be reflective of the environment18,19,20. Differences in DNAm have been associated with age, sex, and other biological features, exposures, and disease states21,22,23,24. This chemical tag is also associated with transcriptional changes and involved in defining and maintaining cell type identity17. With this feature in mind, the choice of investigated tissue is imperative for the interpretation and findings of epigenetic studies. While blood is a commonly used peripheral tissue in DNAm research, it is directly relevant for observing immunological activity due to its constituent cell types25. Although blood, an immune-relevant tissue, is commonly used in DNAm studies, few have leveraged this epigenetic mark to explore molecular developmental differences specific to the immune system during early life.

Epigenetic research in pediatric populations has demonstrated that children experience more dynamic changes in DNAm compared to adults. However, the implications of DNAm changes in the immune system during early life have not been well explored26. Given the extensive DNAm remodeling observed in childhood, it is essential to investigate how these changes influence specific developmental pathways. Several studies have reported substantial autosomal DNAm changes in early childhood associated with biological sex and enriched for developmental processes27,28. Sex differences in autosomal DNAm have also been observed in adults where a number of co-methylated regions (CMRs) were found to be consistently different between the sexes from adolescence into older age29. While these findings suggest genetic-based sex differences that extend beyond the sex chromosomes, they have not specifically examined immune system development. As a result, little is known about how DNAm dynamically contributes to early immune maturation or whether these patterns relate to later-life sex differences in immune responses.

Our study aims to explore the dynamics of the immune system in early life, focusing on DNAm differences over time and between biological sexes in the pediatric population. Leveraging a deeply phenotyped cohort, the Canadian Healthy Infant Longitudinal Development (CHILD) study, we measured DNAm in venous whole blood collected at ages one and five. We first explored the developmental alterations of predicted immune cell type proportions between these ages and sexes. Next, we conducted age- and sex-specific DNAm association studies employing immune gene candidates measured on a high-dimensional microarray. We subsequently validated the majority of the sex differences in independent pediatric cohorts. Together, we found significant sex differences in both cell type proportions and immune-specific DNAm, noting that DNAm differences between the sexes are relatively stable during the first five years of life. These data fill the gap in knowledge regarding developmental and sex-specific cell type composition and DNAm changes within the developing pediatric immune system.

Results

Estimated cell type proportions significantly changed in the first five years of life and exhibited sex-associated differences

Immune cell composition dynamics in early life are reflective of development and exposures as children grow. Leveraging our longitudinal pediatric cohort (CHILD), we sought to validate early life immune cell changes previously established from empirical measures such as complete blood cell counts and flow cytometry using DNAm data30. Using DNAm-based deconvolution algorithms, we first estimated the proportions of 12 immune cell subtypes at age one and age five and next compared the changes between these time points (Fig. 1, Supplementary Data 1). Each cell type in both the innate and adaptive arm exhibited significant changes in their predicted cell type proportion between age one and age five (q < 0.05). Notably, the difference in neutrophils, B memory, B naive, CD4 memory, CD4 naive, CD8 memory, and CD8 naive cells between age one and age five had large effect sizes and exhibited high likelihood ratios as assessed by Cohen’s d and Bayes Factor, respectively, which provided further evidence to support that the two timepoints were significantly different from each other.
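The effect-size and multiple-testing steps described above can be illustrated with a minimal sketch. It assumes that Cohen's d for paired data is computed as the mean within-participant difference divided by the standard deviation of those differences, and that the Bonferroni adjustment multiplies each p-value by the number of tests; the function names and the example proportions are hypothetical and are not taken from the study.

```python
from statistics import mean, stdev


def paired_cohens_d(before, after):
    """Cohen's d for paired samples: mean difference / SD of the differences.

    Assumption: this is the paired (d_z) form; the study may have used
    a different variant.
    """
    diffs = [a - b for b, a in zip(before, after)]
    return mean(diffs) / stdev(diffs)


def bonferroni(p_values):
    """Multiply each p-value by the number of tests, capping at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Hypothetical proportions of one cell type for four children
# (illustrative values only, not study data).
age_one = [0.20, 0.25, 0.22, 0.30]
age_five = [0.35, 0.38, 0.40, 0.41]

d = paired_cohens_d(age_one, age_five)

# Hypothetical raw p-values for three cell types.
adjusted = bonferroni([0.001, 0.02, 0.2])
```

A large positive d here indicates that the proportion rose consistently within participants, which is the sense in which the text describes large effect sizes between the two time points.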

Fig. 1: Cell type proportions predicted using bioinformatic deconvolution methods and compared between time points showed changes associated with immune system maturation.

Bonferroni family wise error rate (q) was applied to paired t-tests to address the testing of multiple comparisons (threshold < 0.05). The box plot overlay represents the interquartile range of each dataset, N = 760 biologically independent samples collected at two time points, with the center line representing the median value and the box encompassing the 25th−75th percentile. Significant differences were observed in all 12 predicted cell type proportions between sample collections after adjusting for multiple tests reflecting immune system remodeling during early development.

As these were paired data, we next examined how the individual participants’ immune cell profiles changed between time points. Most children had a clear and consistent trajectory (Table 1). In neutrophils, as well as in Bnv, Bmem, CD4nv, CD4mem, and CD8nv subsets, fewer than 10% of children demonstrated a change in the direction opposite to the predominant trend. In contrast, within eosinophils, 43% of children exhibited a decrease in proportion, while the remaining 57% experienced an increase.
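The per-participant summary described above amounts to counting how many children moved against the predominant direction of change. A minimal sketch follows, assuming that participants with no change are kept in the denominator; the function name and example values are hypothetical.

```python
def opposing_fraction(before, after):
    """Fraction of participants whose change runs against the majority trend.

    Assumption: participants with exactly zero change count toward the
    denominator but toward neither direction.
    """
    diffs = [a - b for b, a in zip(before, after)]
    increases = sum(1 for x in diffs if x > 0)
    decreases = sum(1 for x in diffs if x < 0)
    # The smaller group is, by definition, the one opposing the trend.
    return min(increases, decreases) / len(diffs)


# Hypothetical proportions for four children: three increase, one decreases.
example_fraction = opposing_fraction([0.1, 0.2, 0.3, 0.4],
                                     [0.2, 0.3, 0.4, 0.3])
```

For a cell type such as eosinophils, where the reported split is 43% decreasing and 57% increasing, this quantity would be 0.43.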

Table 1 Summary of changes in estimated immune cell-type proportions between ages 1 and 5

In addition to significant cellular changes in the immune system during the first five years of life, immune differences between the sexes are well-documented across the lifespan4. In this study, sex was based on the biological sex determined at the time of birth for each participant. Given the limited exploration of these differences in early life, we next compared cell type proportions at ages one and five between males and females (Fig. 2). At both time points, basophils, CD4 memory cells, and T regulatory cells (Tregs) were significantly different between the sexes (q < 0.05) (Supplementary Data 2). By age five, additional significant differences in the pediatric immune profile were identified with males having a greater proportion of monocytes and females having a greater proportion of CD8 naive cells. These data not only confirm established changes in cell type proportions in pediatric populations but also reveal sex-associated differences in immune cell populations during early life.

Fig. 2: Predicted cell type proportions showed differences between sexes.

Cell type proportions were predicted using bioinformatic deconvolution methods and compared between the sexes at both collected time points to establish male (N = 404 independent samples) and female (N = 356 independent samples) differences in immune composition. Bonferroni family wise error rate (q) was applied to paired t-tests to address the testing of multiple comparisons (threshold < 0.05). The box plot overlay represents the interquartile range of each dataset with the center line representing the median value and the box encompassing the 25th−75th percentile.

Immune-specific epigenome association study revealed age- and sex-associated DNAm signatures in the first five years of life

Given the significant changes observed in all cell types between time points, and recognizing that functional changes occur during early development, we next employed a site-specific DNAm analysis of immune candidate genes to gain a detailed understanding of the molecular changes underlying immune system development.

We tested the associations between DNAm and time or sex by fitting robust linear regression models, which are less sensitive to potential outliers and to data with non-normal distributions than ordinary least squares. Statistically significant DNAm sites were identified based on a statistical cut-off FDR of 0.05 (p < 0.002) and a Δβ threshold of greater than 3% (|Δβ| > 0.03) (Supplementary Data 3). CpGs discovered in all presented regression models also met the same significance threshold in a sensitivity analysis corrected for immune-relevant estimated cell type proportions (Supplementary Data 4).
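The two-part significance rule (FDR < 0.05 and |Δβ| > 0.03) can be sketched as follows. The sketch assumes the Benjamini-Hochberg procedure for FDR control, which the text does not name explicitly, and it takes per-site p-values and Δβ values as given rather than fitting the robust regression itself. All identifiers and example values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def significant_sites(sites, fdr=0.05, min_delta_beta=0.03):
    """Keep sites passing both the FDR and the effect-size threshold.

    sites: list of (cpg_id, p_value, delta_beta) tuples.
    """
    q_values = benjamini_hochberg([p for _, p, _ in sites])
    return [cpg for (cpg, _, db), q in zip(sites, q_values)
            if q < fdr and abs(db) > min_delta_beta]


# Hypothetical sites (placeholder IDs, illustrative values only).
example = [("cg_a", 0.0001, 0.05), ("cg_b", 0.0004, -0.01),
           ("cg_c", 0.03, 0.04), ("cg_d", 0.5, 0.10)]
hits = significant_sites(example)
```

In this example, cg_b passes the FDR cut-off but is dropped for a small effect size, while cg_d has a large Δβ but fails the FDR cut-off, which shows why both criteria are applied together.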

We identified age-associated differential DNA methylation at 10 CpGs within nine genes out of 4968 CpGs across 628 immune system candidate genes, using robust linear regression analysis (Fig. 3, Table 2). Seven of the ten CpG sites were located within larger co-methylated regions (CMRs), two of which showed significant associations with age (Supplementary Data 5).

Fig. 3: Significant changes in immune system DNAm associated with age.

Site-by-site association studies between immune-specific candidate CpGs and age with |Δβ| > 0.03 and FDR < 0.05. (A) Cohort-wide age-specific analysis identified ten high confidence CpGs with differential DNAm associated with age. N = 760 individuals sampled at two time points. (B) Female (N = 356 individuals) stratified age analysis identified seven high confidence CpGs with differential DNAm associated with age. (C) Male (N = 404 individuals) stratified age analysis identified seven high confidence CpGs with differential DNAm associated with age. (D) Venn diagram of significant age-associated sites between all three analyses. Male and female age-associated sites were independent of each other, but shared commonalities with sites significant in the cohort-wide analysis.

Table 2 Top differentially methylated CpG sites associated with age across the whole cohort and stratified by sex

When stratifying the same analysis by sex, we noted age-specific associations in seven CpGs in males and a non-overlapping set of seven CpGs in females (Fig. 3d). Of these CpGs, four CpGs were consistent between all samples and female only associations, and five CpGs were consistent between all samples and male only associations. Although not statistically significant in the sex stratified analyses, all 10 significant CpGs from the cohort-wide analysis exhibited consistent Δβ directionality in males and females (Supplementary Data 6). Overall, we identified age-specific DNAm associations with the immune system during this window of development that were significant for all children.

Given that we observed cohort-wide DNAm differences in immune system genes with age in conjunction with cell type proportion differences with sex, we next explored potential sex-associated DNAm differences at each time point. Within immune candidate genes, at age one, we identified 46 CpGs significantly associated with sex, and at age five we identified 37 CpGs significantly associated with sex (Fig. 4, Supplementary Data 3). Of all significant sex-associated CpGs, 77% (36 out of 47) exhibited consistent Δβ of sex differences across ages in both direction and magnitude (Pearson correlation: 0.995, p < 2.2e-16), and the Δβs of all 47 CpG sites were also highly correlated at both time points (Pearson correlation: 0.98, p < 2.2e-16). Furthermore, 38 of the sex-associated CpGs at age 1 and 29 at age 5 were located within larger co-methylated regions (CMRs), comprising 17 and 15 regions, respectively. Of these, 10 CMRs at age 1 and 7 at age 5 showed significant associations with sex (Supplementary Data 5).
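The overlap arithmetic and the effect-size correlation reported above can be checked with a short sketch. The site counts are those stated in the text; the Δβ values are hypothetical and serve only to show how a Pearson correlation between the two ages would be computed.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Counts reported in the text.
n_age1, n_age5, n_shared = 46, 37, 36
n_union = n_age1 + n_age5 - n_shared   # distinct sex-associated CpGs
shared_fraction = n_shared / n_union   # share significant at both ages
only_age1 = n_age1 - n_shared          # significant at age one only
only_age5 = n_age5 - n_shared          # significant at age five only

# Hypothetical sex-associated delta-beta values at each age (not study data).
db_age1 = [0.05, -0.04, 0.08, -0.06, 0.035]
db_age5 = [0.048, -0.037, 0.081, -0.055, 0.029]
r = pearson_r(db_age1, db_age5)
```

The union of 47 sites and the 36 shared sites give the stated 77%, leaving ten sites specific to age one and one site specific to age five.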

Fig. 4: Immune system specific DNAm associated with sex.

Site-by-site association studies between immune-specific candidate CpGs and sex with |Δβ| > 0.03 and FDR < 0.05 in 760 individuals (404 males and 356 females). (A) Sex-specific analysis at age 1 identified 46 sites with significantly different DNAm. (B) Sex-specific analysis at age 5 identified 37 sites with significantly different DNAm. (C) Venn diagram of significant sex-associated sites identified at both time points. 36 sites had sex-associated differential DNAm at both time points. (D) Correlation of effect sizes of all significant hits, showing that even the non-overlapping sites were trending in the same direction at both time points. Grey dashed lines represent the technical threshold cut-off, red dots indicate sites significant at both time points (n = 36), and black dots indicate sites significant at one time point. Grey shading along the line corresponds to the 95% confidence interval of the fitted line.

Validation of sex differences in DNAm

Finally, to confirm the robustness of our sex-associated DNAm findings, we took advantage of two independent cohorts where samples were collected at approximately the same ages as the children in our discovery cohort, CHILD. Specifically, one dataset was obtained from the publicly available repository, GEO (GSE142512), consisting of the healthy control one-year-old individuals in a previously published study on DNAm associations with Type 1 Diabetes, and the other dataset comprised four-year-olds from a population-based prospective cohort: INMA31,32. As the samples in the validation datasets were profiled using an older Illumina platform, the 450k array that quantifies DNAm at approximately half the sites that are included on the EPICv1 array, only 35 of the original 47 significant sites were available to be tested. Using the same technical threshold as the previous analyses, 15 CpG sites validated at age one, and 16 validated at age four (|Δβ| > 0.03 and p value < 0.05), showing sex-associated differential methylation. Of the remaining sites that did not meet the effect size threshold (|Δβ| > 0.03), 18 and 17 CpGs at ages one and four/five, respectively, still had effect sizes trending in the same direction as the original analysis and passed the significance threshold. All of the sites tested exhibited high correlation between the discovery and validation cohorts; R² = 0.8, p = 7.3e-13 at age one and R² = 0.9, p = 7.4e-15 at age five (Fig. 5, Supplementary Data 7, Supplementary Data 8).
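The validation criteria described above can be expressed as a simple per-site rule. The sketch below assumes that a site must show the same direction of effect as in the discovery cohort to count as validated, which the text implies but does not state as a formal criterion; the function name, category labels, and example values are hypothetical.

```python
def classify_validation(discovery_db, validation_db, p_value,
                        min_delta_beta=0.03, alpha=0.05):
    """Classify one CpG site in a validation cohort.

    discovery_db / validation_db: sex-associated delta-beta in each cohort.
    Assumption: agreement in sign with the discovery cohort is required.
    """
    same_direction = discovery_db * validation_db > 0
    if p_value >= alpha or not same_direction:
        return "not replicated"
    if abs(validation_db) > min_delta_beta:
        return "validated"
    return "same direction, below effect-size threshold"


# Hypothetical sites (illustrative values only).
labels = [
    classify_validation(0.05, 0.04, 0.01),
    classify_validation(0.05, 0.02, 0.01),
    classify_validation(0.05, -0.04, 0.01),
]
```

Under this rule, the second category corresponds to the sites described in the text as passing the significance threshold and trending in the same direction without reaching |Δβ| > 0.03.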

Fig. 5: Strong correlation of sex-associated Δβ between original discovery and validation cohorts.

Both validations showed strong correlations between the Δβ in the original discovery cohort (CHILD) and the independent validation cohorts. Red dots correspond to sites that met significance thresholds in the validation analysis. Black dots met the statistical significance threshold (p < 0.05), but did not meet the technical threshold (|Δβ| > 0.03). Grey shading along the line corresponds to the 95% confidence interval of the fitted line. (A) Age one validation comparing the Δβ between CHILD and the GEO dataset (34 CpGs). (B) Age five validation comparing the Δβ between CHILD and INMA (29 CpGs).

Discussion

The early life developmental window is a critical period that may establish lifelong health trajectories15,16. The biological foundations of these trajectories can be better understood by examining the dynamics of the typically developing immune system as reflected in changing cell type proportions and molecular markers during early childhood14. Leveraging DNAm data from the longitudinal CHILD birth cohort, we characterized changes in whole blood immune cell proportions and DNAm patterns over the first five years of life. We observed system-wide changes in immune cell proportions with sex differences evident as early as age one, while DNAm patterns showed few age-specific changes but stable, consistent sex-associated differences. Collectively, these findings provide insight into immune composition and molecular changes underpinning immune system development, highlighting both the temporal dynamics and the prominent role of sex-associated DNAm patterns in this healthy pediatric population.

To begin exploring the immune system composition, we used a bioinformatic approach with an adult reference panel, allowing us to predict cell type proportions in this pediatric population. We were able to recapitulate early-life cell type changes previously established using other empirical methods. Our findings align with these prior observations, including the gradual increase of memory T cells across childhood identified through flow cytometry, the balance between B memory and B naïve cells observed via high-dimensional cytometry, and the inverse nature of neutrophil and lymphocyte proportions during pediatric development captured by complete blood cell counts30,33,34. By utilizing cell type specificity inferred from bioinformatic predictions and validating these estimates through replication of prior findings, we demonstrate the applicability and utility of this approach for studying immune cell proportions from DNAm data in pediatric populations, particularly when empirical measurement methods are not available. This framework provided the basis for examining how these inferred immune cell proportions change across the rapid developmental period of early childhood.

Given that the first five years of life are marked by rapid immune development, we anticipated observing significant changes in all cell type proportions, with larger effect sizes in the adaptive arm compared to the innate arm, reflecting the accumulation of immunological memory from early-life exposures35. In our study, we observed small effect sizes as assessed by Cohen’s d in most of the innate cell types (i.e., monocytes, eosinophils, and basophils). This was as we anticipated based on the known stability of the innate immune system over time, primarily in order to mount a consistent response to pathogenic exposures35. Alternatively, given the role of the adaptive immune system to mount a learned response over time, such as in the case of vaccinations, we hypothesized that the adaptive immune cell type proportions would not only change over time, but that these changes would be larger in magnitude35. As anticipated, the largest effect sizes were observed in changes to B and T cell proportions between one and five years old. These shifts reflected the dynamic developmental period during which children are increasingly exposed to environmental stimuli, prompting the transition from a predominantly naïve immune system to one characterized by acquired immunological memory1. The study design also allowed us to examine within-participant changes during this developmental window. While most children exhibited a consistent pattern of change, some deviations from the majority were observed. Notably, eosinophils showed the most striking variation, which, given their small population, may be linked to the future development of allergy or asthma and warrants further investigation. These findings highlight the dynamic nature of immune landscape in early childhood and underscore the importance of longitudinal studies in understanding individual variability and potential implications for future health outcomes.

We next investigated site-specific changes in DNAm patterns of the pediatric immune system between ages one and five to explore molecular features of immune development, which may reflect shifts in cell type proportions or functional changes associated with maturation. Owing to the fact that we observed large changes in cell type proportions with time, which was suggestive of a dynamic and developing immune system during this pediatric window, we expected to see immune-specific DNAm changes between one and five years old in a large set of immune genes. Instead, we identified a small number of significant DNAm associations annotated to nine genes, including NLRC5 and LAG3. As these genes are involved in antiviral immunity and antigen presentation, respectively, it is conceivable that we captured a signature of immune system maturation in whole blood that could be a result of vaccination or an overall increase of exposure to pathogens as children are entering daycare and increasing their social interactions with age36. While these age-associated findings offered some insight into immune development, they did not fully explain the biological variation observed in our cohort.

We therefore examined sex-associated differences as another potential contributor to this variation. Analyzing changes from age one to five separately in males and females revealed significant, yet non-overlapping, CpGs. In females, but not males, two CpGs annotated to NOD2, a gene involved in bacterial recognition and proper immune activation, exhibited significantly different DNAm between ages one and five. Females are typically seen to mount a stronger innate immune response than males in early life, and the observed sex-associated DNAm differences in NOD2 CpGs may be involved in the molecular underpinnings of that observation2. In males, but not females, two CpGs annotated to DHX58, a negative regulator of type 1 interferon, had significantly different DNAm between ages one and five. Type 1 interferon plays a key role in antiviral defense and may contribute to the higher incidence of viral infections reported in males37. These minor differences in observed immune system-related DNAm changes in males and females during the first five years of life align with sex differences observed in health outcomes and could play a role in the underlying mechanism of immune system function. Although not statistically significant, effect sizes for sex-stratified CpGs in one sex tended to be similar in direction and magnitude in the other, suggesting that limited power or interindividual variation may have influenced the results, warranting validation in larger independent cohorts.

We were ultimately surprised to find what seemed like a fairly small number of age-specific immune system DNAm changes in both the cohort-wide and sex stratified analysis. This limited number of changes may be partly attributable to the use of whole blood, a heterogeneous tissue in which the detectability of molecular changes can be constrained by cell type composition, particularly for DNAm differences of smaller magnitude (>3%), as noted in this study. Further, DNAm works in tandem with other epigenetic mechanisms, such as histone modification, to shape cell type identity and immune system function during childhood, indicating other epigenetic mechanisms should be investigated in the future38. Together, we demonstrated DNAm within the immune system is relatively stable during this developmental window and is likely not the primary molecular driver of the observed compositional and functional changes in whole blood-based analysis.

With sex differences in immune responses and health outcomes observed across the life course, we also compared the estimated cell type proportions of males and females in our pediatric cohort. Currently, literature exploring sex differences in the immune system has typically included older adults or post pubescent populations when sex hormones are actively produced and modulating the immune responses, yet similar studies are lacking in early pediatric populations3. To address this gap, we compared the estimated cell type proportions between males and females at age one, and observed significant differences in basophils, CD4 memory T cells, and Tregs that persisted through age five. Differences in Tregs have been reported between older men and women, but much of what is known about Treg proportions and their immunosuppressive function is hypothesized to be associated with sex chromosomes and hormonal interactions39,40. Our findings indicate these cell type proportional differences were present as early as one year old and bear further investigation across ages, as it may be plausible that hormones during gestation or the presence of sex chromosomes establish this difference prior to puberty. We also identified several sex-associated cell differences unique to a specific time point. For example, monocyte proportions were not significantly different between the sexes at age one but exhibited sex-associated differences by age five. In the literature, sex differences have been observed in monocytes, but these were from adults aged 25−71 years, where males were seen to have a consistently higher proportion of monocytes as compared to females, the same trend that we saw emerge in our study at age five41. Although multiple studies have highlighted significant sex differences in cell type proportions in adulthood, our study has provided compelling evidence of sex-associated cell type proportion differences that exist as early as one year old and persist through development.

While there were limited changes in immune-specific DNAm observed across early life in an age-associated manner, we further explored these molecular differences between the sexes at each time point. In contrast to the surprisingly small number of CpGs associated with immune maturation over time, we found four times more significant DNAm differences associated with sex at each time point, as well as more than a 75% overlap in sex-differential DNAm congruent in ages one and five. This consistency in sex-associated immune DNAm in the first five years of life suggests that these differences are present and stable throughout childhood, potentially indicating there may be different biological starting points for the male and female immune system even before age one that persist throughout early life immune development. The discovered CpGs were annotated to genes such as EGR2, CFD, SPIB, RUNX1, and ALOX15, which are involved in multiple elements of the immune response from the complement pathway in innate immunity to lymphoid-specific enhancers in the adaptive arm. While there was only one sex-associated CpG uniquely identified at age five, we identified ten CpGs that were different between the sexes at age one but not age five. These were annotated to ALOX5, IL1RL2 and RELB, which are genes involved in inflammation, suggesting that sex differences in inflammatory responses could be more pronounced at age one. Sex differences have been reported in other acute inflammatory conditions, supporting the notion that males and females exhibit distinct inflammatory responses during early life. Specifically, males experience sepsis, a systemic inflammatory response mounted by the immune system, 2.8 times more frequently than females in the first year of life, but by age 8 there is a convergence of disease rates between the sexes42. Aside from inflammation, we observed that the most significant difference in DNAm between the sexes at both time points was a CpG in CALM1.
As this gene is responsible for calmodulin regulation of ion channels, sex differences could be associated with calcium-based messaging in lymphocyte cell types43. We also identified significant sex differences in lymphocyte proportions at both time points, which may be linked to differential DNAm at immune-related loci. Collectively, our findings show that the majority of sex-associated DNAm differences were stable across ages one and five, reinforcing the idea that immune-related epigenetic profiles are established early in life, remain consistent during this developmental window, and may underlie observed sex-based disparities in immune-related health outcomes. Notably, two of the sex-associated CMRs identified in our analysis associated with ITGA2B and EGR2 overlapped with previously reported sex-specific differentially methylated regions in adults, further highlighting the persistence of these epigenetic differences across the lifespan2. These compelling results prompted us to evaluate whether these findings could be replicated in independent pediatric cohorts.

Replicating results in DNAm studies is often a challenge in the field44. Though we did not have access to a comparable longitudinal cohort with the same collection time points and sample type (whole blood), we leveraged two previously published independent pediatric DNAm cohorts with whole blood samples to validate our findings at each time point separately. At age one, we were able to use the healthy controls from a study focused on Type 1 Diabetes, and at age five we used INMA, a Spanish birth cohort with samples collected at age four31,32. At both time points, we were able to validate a significant subset of our sex-associated differences (approximately 30%) at the same statistical and biological thresholds in spite of smaller sample sizes and differences in genetic ancestry. In fact, for the CpG sites that did not meet the effect size cut-off, we still observed strong correlation between the effect sizes in the discovery and validation cohorts, regardless of the time point. Our validation confirmed the robustness, stability, and consistency of sex-based differences in early life immune system autosomal DNAm, despite platform disparities, limited sample sizes, and varied populations. These findings underscore the reliability of our approach and provide a foundation for future studies that continue to explore the nuances of immune development in early childhood.

The goal of this study was to characterize cell type changes and DNAm signatures of normative immune development within and between the sexes in the first five years of life. To investigate cell type changes in our pediatric population, we relied on bioinformatic deconvolution tools using an adult reference. Though this adult reference has been widely used in pediatric populations, future research would benefit from validation of these findings with empirical methods such as cell sorting. At this stage, we cannot determine the functional significance of the observed DNAm changes, as transcriptomic data were not available for integration. However, combining DNAm and gene expression data in future studies would be a valuable next step for elucidating the biological relevance of these epigenetic differences. We also observed inter-individual variation in immune cell type proportions, which is expected given the heterogeneous nature of blood. To ensure the robustness of our group comparisons, we employed additional statistical approaches to confirm meaningful differences, even in cases where the distributions appeared similar. Future translation or extrapolation of our findings could also be limited by the demographics of the current cohort. Specifically, we identified immune-specific DNAm changes in a normative population, yet, as these children have access to regular health care, are predominately of high socioeconomic status, and have a low incidence of allergy or severe illness, the findings may not translate to a population with more diversity or wider variety of health outcomes. We also observed significant differences between study subsets in study center distribution, self-reported maternal ethnicity, and mode of delivery. To account for these differences, genetic ancestry and study center were included in the models. However, mode of delivery was not included as a covariate, as it has not been shown to be significantly associated with DNAm in later life45. 
Furthermore, the two collection time points at one and five years of age allowed us to focus on the early life developmental window, where we were able to identify previously underreported immunological sex differences in the prepubescent period. However, in the absence of DNAm data extending beyond five years of age, we are currently unable to determine whether the identified sex-associated DNAm differences in immune system genes persist until, during, or after puberty, especially with the introduction of sex hormones. Future work utilizing larger, more diverse cohorts with additional time points to not only replicate but also expand on these findings may provide valuable insights for pediatric research, advancing understanding of the molecular basis of immune system development and informing clinical approaches to reduce sex-based health disparities46.

Our study sought to leverage a longitudinal DNAm cohort to characterize pediatric immune system development. Though we observed significant changes in immune cell type proportions with age, the site-by-site DNAm changes over time were small and suggest that DNAm in immune-related genes is relatively stable during this period of development. We also identified consistent sex-associated differences in both cell type proportions and autosomal DNAm in prepubescent children. It is reasonable to hypothesize that during gestation, when sex hormones are present, a divergence in the male and female immune system may be set in place. While there are notable distinctions between male and female immune cell composition and immune-related DNAm patterns, many of these differences persist throughout the first five years of life, suggesting males and females have distinctive biological starting points, but are on a similar immune system developmental trajectory. This study characterizes early-life immune changes in typically developing children and provides a foundation for future research aimed at understanding and addressing sex disparities in pediatric health.

Methods

Participant description

The Canadian Healthy Infant Longitudinal Development (CHILD) Study began enrollment in 2008, recruiting a total of 3621 pregnant women from four major cities in Canada (Vancouver, Edmonton, Winnipeg, Toronto). Eligible infants (n = 3,454) with no congenital abnormalities and born at a minimum of 35 weeks of gestation were also enrolled and have been followed from pregnancy to age five years and beyond47. Informed consent was obtained from parents at the time of enrollment. CHILD Study participants were followed prospectively, and thorough information on environmental exposures, clinical measurements, and assessments was gathered through a combination of questionnaires and in-person clinician appointments.

The samples used in the current study are a representative sub-cohort of infants (n = 760 paired, 1520 total) drawn from the CHILD study who had whole blood collected at ages one and five (Table 3). Biological sex was collected at birth.

Table 3 Study Demographics: Demographic characteristics of the full cohort, study subset, and participants stratified by sex

This study adheres to all applicable ethical guidelines and was developed and authorized by the University of British Columbia, University of Manitoba, University of Toronto, McMaster University, BC Children’s Hospital, The Hospital for Sick Children, and Simon Fraser University. The Research Ethics Board (#H07-03120) examined and authorized this study in accordance with the Tri-Council Policy Statement: Ethical Conduct for Human Research (TCPS2, 2018). All ethical regulations relevant to human research participants were followed.

DNA sample collection and microarray profiling

DNA was extracted from whole blood samples (n = 1616) using the DNeasy Blood & Tissue Kit (Qiagen, Venlo, The Netherlands) with concentration and quality assessed using a NanoDrop 8000 Spectrophotometer (Thermo Fisher Scientific, United States). For DNAm array profiling, purified DNA was bisulfite converted using EZ-96 DNA Methylation kit (Zymo Research, Irvine, CA, United States) and DNAm profiles of the samples were measured with the Infinium MethylationEPIC BeadChip array version 1 (EPICv1) (Illumina, San Diego, CA). Raw intensity IDAT files for 866,836 data points encompassing 863,904 CpG sites were produced and preprocessed for downstream analysis. This included 1616 individual samples plus 12 technical replicates randomized across 293 chips by consistent operators.

For whole genome single nucleotide polymorphism (SNP) profiling, purified DNA from all cohort samples was hybridized to the GSA v3 + Psych v1 array (Illumina, San Diego, CA) to measure SNP markers at 699,424 genotypes.

DNA methylation preprocessing

All available whole blood DNAm data from children collected at ages one and five (n = 1,628) were processed in RStudio (version 4.0.3). The ewastools R package was employed for sample quality control to assess multiple technical parameters including array staining, extension, hybridization, target removal, specificity, and bisulfite conversion based on 636 control probes48. Next, the minfi R package was used to assess methylated and unmethylated intensities and to check concordance between reported sex and sex inferred from sex probe intensities, and the lumi R package was used to detect potential outliers49,50,51. Poorly performing samples were defined as those with a high proportion of probes with a poor detection p value (p > 0.01 on > 1% of all probes), with DNAm intensity statistically different from the average intensity of the negative control probes, and/or with a bead count of less than 3 on > 1% of all probes. The 59 SNP probes included on the array were also leveraged to check sample identity, as the one- and five-year samples from each child should cluster together. The 12 technical replicates were removed from the final data set, and based on the quality control checks, 14 of the 1,616 independent samples were removed: seven due to sex mismatch, four due to missing bead count, and three due to lack of SNP agreement.
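The two per-sample cut-offs quoted above (detection p value and bead count) can be illustrated with a minimal sketch. This is not the ewastools, minfi, or lumi implementation; the function names and the toy data are hypothetical, and the negative-control intensity check is not modeled.

```python
def fraction_failing(values, is_bad):
    """Return the fraction of probes for which is_bad(value) is True."""
    if not values:
        raise ValueError("at least one probe is required")
    return sum(1 for v in values if is_bad(v)) / len(values)


def sample_fails_qc(detection_p, bead_counts,
                    p_cutoff=0.01, min_beads=3, max_fraction=0.01):
    """Flag a sample when more than max_fraction of its probes have a
    detection p value above p_cutoff or a bead count below min_beads."""
    bad_detection = fraction_failing(detection_p, lambda p: p > p_cutoff)
    low_beads = fraction_failing(bead_counts, lambda n: n < min_beads)
    return bad_detection > max_fraction or low_beads > max_fraction


# Toy sample with 200 probes (made-up values, not study data).
detection_p = [0.001] * 195 + [0.2] * 5   # 2.5% of probes exceed p = 0.01
bead_counts = [12] * 200                  # every probe has enough beads
print(sample_fails_qc(detection_p, bead_counts))
```

Both criteria are expressed as a fraction of probes per sample, so the same helper serves for each; only the per-probe failure rule changes.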

Next, probe filtering was conducted to remove poorly performing probes, probes on the sex chromosomes, and those predicted to be cross-hybridizing or containing SNPs52. Poorly performing probes were identified as those with a high detection p value (p value > 0.1) and missing data (beadcount < 3)53. Beta-Mixture Quantile (BMIQ) normalization was done in conjunction with normal-exponential out-of-band (noob) correction to account for probe type bias and to perform background correction54. Finally, to correct for known technical confounding factors, variation associated with batch (chip, row, and plate) was removed using the ComBat function from the sva R package55. Principal component analysis (PCA) was employed during each step of data preprocessing to confirm that normalization and batch correction were appropriately performed on the data (Supplementary Fig. 1). After all preprocessing steps, a total of 1602 samples and 786,363 probes were included in the final sample set. For subsequent analyses, samples were subset to those with matched data at both the age one and age five collection time points (760 children at each time point for 1520 total distinct samples).

DNAm data reduction

To further reduce the data and meet the assumption of equal likelihood of discovery underlying Benjamini–Hochberg false discovery rate (FDR) correction, all DNAm probes that passed preprocessing were subjected to interquartile range filtering to retain only variable probes, defined as those whose DNAm beta (β) values varied by at least 5% across all samples between the 5th and 95th percentiles. A total of 332,560 variable probes were included in the final object and used in downstream analyses and composition of the candidate list.
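As an illustration of the variability filter described above, the following sketch keeps a probe when the range between its 5th and 95th percentile β values is at least 0.05. The percentile interpolation rule, function names, and probe identifiers are assumptions made for the example; the text does not state which quantile definition was used.

```python
def percentile(values, q):
    """Percentile (q in [0, 100]) using linear interpolation between ranks."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("at least one value is required")
    position = (len(ordered) - 1) * q / 100.0
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (position - lower) * (ordered[upper] - ordered[lower])


def is_variable(betas, min_range=0.05):
    """True when the 5th-to-95th percentile range of beta values reaches min_range."""
    return percentile(betas, 95) - percentile(betas, 5) >= min_range


def filter_variable_probes(beta_by_probe, min_range=0.05):
    """Keep only probes whose beta values pass the variability rule."""
    return {probe: betas for probe, betas in beta_by_probe.items()
            if is_variable(betas, min_range)}


# Hypothetical probes with made-up beta values across five samples.
toy = {
    "cg_stable": [0.50, 0.51, 0.50, 0.52, 0.51],
    "cg_variable": [0.20, 0.35, 0.50, 0.65, 0.80],
}
print(list(filter_variable_probes(toy)))
```

Using a percentile range rather than the full minimum-to-maximum range keeps a single extreme sample from making an otherwise invariant probe look variable.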

Computational cell type proportion estimation

To identify developmental and sex differences in cell types within our pediatric cohort, we completed cell type deconvolution to predict cell type proportions from raw, unfiltered DNAm data. The DNAm-based cell type estimates of 12 immune cell populations (Bnv, Bmem, CD4nv, CD4mem, CD8nv, CD8mem, Treg, NK, Monocyte, Neutrophil, Basophil, and Eosinophil) were extracted from the extended Identifying Optimal DNA methylation Libraries (IDOL-ext) reference dataset which has previously reported high correlations between mixtures of known cell type proportions and predicted values56,57,58. We utilized the estimateCellCounts2 function from the FlowSorted.Blood.EPIC package to noob normalize our samples to the IDOL-ext reference and select the top 100 cell-type-specific probes for deconvolution56,59.

Genotyping preprocessing

Samples were genotyped on the Illumina Infinium Global Screening + Psych Array (GSA) which measures SNP markers at 699,424 genotypes. Using the genotyping module of Illumina’s GenomeStudio software (version 2.0.4), samples were assessed for call rate and GenCall score. All samples had a call rate > 0.97 and a 10% GenCall confidence score. As there are common SNP probes (59) between the EPICv1 array and the GSA, discordant SNP patterns at these probes were used to identify potential sample mismatches. No mismatches were identified during this check.

SNPs were filtered based on various Illumina recommended metrics including GenTrain score (<0.4), cluster separation score (<0.45), call frequency (<0.97), and intensity of heterozygote and homozygote clusters (AB R Mean ≤ 0.4, AB T Mean ≤ 0.2 or ≥ 0.8, AA Frequency = 1 & AA T Mean ≥ 0.2, AA Frequency = 1 & AA T Deviation ≥ 0.04, BB Frequency = 1 & BB T Mean ≤ 0.8, BB Frequency = 1 & BB T Deviation ≥ 0.04, AA Frequency or BB Frequency = 0 & AB T Deviation ≥ 0.5, AB Frequency = 0). SNPs were also excluded based on a heterozygote excess score (<-0.3 or >0.2). SNPs that deviated from Hardy–Weinberg equilibrium (determined through a chi-squared test, p ≤ 10⁻⁶) were also removed. Next, rare SNP genotypes with a minor allele frequency (MAF) < 1% were excluded as they may reflect inaccuracies in genotype calling. Imputation was conducted using The Enhancing Neuroimaging Genetics through Meta-Analysis (ENIGMA) Consortium imputation protocol based on the Michigan Imputation Server pipeline60. After imputation, SNPs with an R2 ≤ 0.8 and MAF ≤ 0.01 were filtered out using bcftools v1.16 14. To minimize redundancy of genetic information between SNPs and reduce the computational burden, SNP data were pruned in PLINK software (version 1.9) based on linkage disequilibrium (LD) using the pairwise comparison method with a 50 kb sliding window, a 5 variant count window shift, and a 0.5 pairwise R2 threshold.
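Two of the filters above, the Hardy–Weinberg chi-squared test and the minor allele frequency cut-off, can be sketched from genotype counts alone. This is an illustrative sketch rather than the GenomeStudio, bcftools, or PLINK procedure; the function names and genotype counts are hypothetical, and the Illumina cluster metrics, imputation quality filter, and LD pruning are not modeled.

```python
import math


def minor_allele_frequency(n_aa, n_ab, n_bb):
    """Minor allele frequency from counts of AA, AB, and BB genotypes."""
    total_alleles = 2 * (n_aa + n_ab + n_bb)
    freq_a = (2 * n_aa + n_ab) / total_alleles
    return min(freq_a, 1.0 - freq_a)


def hwe_chi_squared_p(n_aa, n_ab, n_bb):
    """P value of the 1-degree-of-freedom chi-squared test for
    Hardy-Weinberg equilibrium."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    if min(expected) == 0:
        return 1.0  # monomorphic SNP: the test is not informative
    chi2 = sum((obs - exp) ** 2 / exp
               for obs, exp in zip((n_aa, n_ab, n_bb), expected))
    # Survival function of a chi-squared variable with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2.0))


def keep_snp(n_aa, n_ab, n_bb, hwe_cutoff=1e-6, maf_cutoff=0.01):
    """Drop SNPs with HWE p <= hwe_cutoff or MAF < maf_cutoff."""
    return (hwe_chi_squared_p(n_aa, n_ab, n_bb) > hwe_cutoff
            and minor_allele_frequency(n_aa, n_ab, n_bb) >= maf_cutoff)


# Made-up genotype counts for three hypothetical SNPs.
print(keep_snp(360, 480, 160))   # genotype counts match HWE expectations
print(keep_snp(500, 0, 500))     # no heterozygotes at all
print(keep_snp(995, 5, 0))       # rare variant
```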

After obtaining the LD-pruned dataset with 1,260,703 SNPs, we confirmed sample independence, an assumption of the majority of statistical models. Sample relatedness was checked by calculating kinship coefficients (identity by descent) based on Maximum Likelihood Estimation (MLE) in the SNPRelate R package. Based on a kinship coefficient of 0.5, three pairs of related samples were identified; these were technical replicates and, as sample duplicates, were expected to be identical. Only one sample from each of the three pairs was retained for downstream analysis.

The final genotyping data set consisted of 1,260,703 SNPs and 822 samples.

Genetic ancestry estimation

We next inferred the genetic structure of the study population using the 2504 individuals across four major superpopulations in the 1000 Genomes Project as the reference. Using the SNPRelate R package, we performed a principal component analysis (PCA) including both sets of samples and the SNPs shared between the CHILD cohort and the 1000 Genomes Project that had been pruned for LD (Supplementary Fig. 2). The first 2 components from the PCA, representing 73% of the overall genetic variance, were included as covariates in the statistical models.

Immune candidate gene list

Leveraging established array-based assays, the genes included in the NanoString nCounter® immunology panel were selected as candidates for the immune-focused arm of this study. Based on Illumina’s annotation of the gene names as informed by the UCSC genome browser, of the 1033 genes included in the NanoString panel, 989 were present on the EPICv1 array. After subsetting to variable probes that were not located in gene regions with known splice variants, the final testing space included 4968 autosomal probes used in subsequent analysis (Supplementary Data 9). The chromatin structure associated with these locations was annotated using ChromHMM with the neutrophil and PBMC reference61,62.

Cell type proportion comparisons

To assess differences in cell type proportions between ages one and five and between the sexes, paired t-tests were performed with a Bonferroni-adjusted p value threshold of 0.05 using the t_test and adjust_pvalue functions in the rstatix package. We then assessed effect size using Cohen’s d in the lsr R package and the evidence supporting the null hypothesis using Bayes factors calculated with the BayesFactor R package63,64.

Technical replicate Delta Beta (Δβ) threshold calculation

For this study, the biological effect size threshold was determined using the root mean square error (RMSE) of the 12 age five technical replicates after preprocessing (0.0199) added to the maximum standard error across all CpGs (0.008386347). This value, 0.0283, was rounded up to two decimal places, yielding a threshold of |Δβ| > 0.03, which is considered to be representative of an effect size greater than technical noise.
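The arithmetic behind this threshold can be checked with a short sketch. The two input values are taken from the text; the function name is hypothetical, and the rounding step is assumed to be a ceiling at two decimal places, as "rounded up" suggests.

```python
import math


def effect_size_threshold(replicate_rmse, max_standard_error, decimals=2):
    """Sum the two noise terms and round the result up at `decimals` places."""
    noise = replicate_rmse + max_standard_error
    scale = 10 ** decimals
    return noise, math.ceil(noise * scale) / scale


# Values reported in the text: replicate RMSE and maximum standard error.
noise, threshold = effect_size_threshold(0.0199, 0.008386347)
print(round(noise, 4), threshold)
```

Rounding up rather than to the nearest value keeps the threshold conservative, so any Δβ that passes it is above the combined technical noise estimate.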

Delta beta (Δβ) calculations

Biological effect size for DNAm is quantified as the change in methylation value, referred to as Δβ65. In this study, the Δβs were calculated against continuous data (age) and categorical data (sex). For continuous data, Δβs were calculated by extracting the β coefficient of the time between sample collections (ΔAge) from the linear model CpG site ~ ΔAge. This coefficient, representing the change in β value on the y-axis divided by time elapsed on the x-axis was then multiplied by the cohort interquartile range of ΔAge. For categorical data, the average β for the test category (males) was subtracted from the average β from the reference category (females) for every CpG site. The resulting Δβs represent the observed change in β value at each CpG site in an unadjusted association with the variable of interest.
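A minimal sketch of the two Δβ calculations for a single CpG site follows. It takes the text literally: the continuous Δβ is the ordinary least-squares slope of β on ΔAge multiplied by the interquartile range of ΔAge, and the categorical Δβ is the reference (female) mean minus the test (male) mean. The function names, the quantile interpolation rule, and the toy values are assumptions for illustration.

```python
from statistics import mean


def quantile(values, q):
    """Quantile (q in [0, 1]) using linear interpolation between ranks."""
    ordered = sorted(values)
    position = (len(ordered) - 1) * q
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (position - lower) * (ordered[upper] - ordered[lower])


def ols_slope(x, y):
    """Slope of the simple linear regression y ~ x."""
    mean_x, mean_y = mean(x), mean(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx


def delta_beta_continuous(delta_age, betas):
    """Slope of beta on delta age, scaled by the interquartile range of delta age."""
    iqr = quantile(delta_age, 0.75) - quantile(delta_age, 0.25)
    return ols_slope(delta_age, betas) * iqr


def delta_beta_categorical(betas, groups, test="male", reference="female"):
    """Mean beta of the reference group minus mean beta of the test group,
    following the order of subtraction stated in the text."""
    reference_mean = mean([b for b, g in zip(betas, groups) if g == reference])
    test_mean = mean([b for b, g in zip(betas, groups) if g == test])
    return reference_mean - test_mean


# Made-up values for one hypothetical CpG site.
delta_age = [3.8, 3.9, 4.0, 4.1, 4.2]         # years between collections
betas_age = [0.38, 0.39, 0.40, 0.41, 0.42]    # beta rises 0.1 per year
print(delta_beta_continuous(delta_age, betas_age))
print(delta_beta_categorical([0.6, 0.7, 0.4, 0.5],
                             ["female", "female", "male", "male"]))
```

Scaling the slope by the interquartile range puts the continuous effect on the same β scale as the categorical difference, so a single threshold can be applied to both.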

Assessing statistical power

To determine whether our sample size had sufficient statistical power to detect meaningful differences, we employed the pwrEWAS package in R66. For the sex stratified analysis (male n = 404, female n = 356), a statistical power of 66% was achieved. For all associations, with the total sample (n = 760), we were well-powered at 84% to detect a technically meaningful effect of Δβ > 3% for the immune-relevant testing space of 4968 CpGs from 628 genes.

Candidate DNAm association studies with robust linear regression

A robust linear regression model using Huber M-estimation was used for site-by-site epigenome association analysis, as this method is robust to heteroskedasticity and outliers, both of which are present in these data. All models were run on variable immune candidate probes for a total of 4,968 linear regression tests. Model construction was tested for multicollinearity, and relations were assumed to be linear and normally distributed. Covariates known to contribute significantly to DNAm differences, i.e., sex, age, genetic ancestry, recruitment location, and prenatal smoke exposure, were included65. We did not adjust for cell type proportion as we were interested in the developing immune system, i.e., the changes that occur as a result of immune cell proportions adapting and maturing.
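To show why Huber M-estimation limits the influence of outliers, the sketch below fits a single-predictor line by iteratively reweighted least squares. It is a simplified illustration, not the multi-covariate model used in the study: the tuning constant k = 1.345, the MAD-based scale estimate, the function names, and the toy data are all assumptions.

```python
from statistics import median


def weighted_line(x, y, w):
    """Weighted least-squares fit of y = intercept + slope * x."""
    total = sum(w)
    mean_x = sum(wi * xi for wi, xi in zip(w, x)) / total
    mean_y = sum(wi * yi for wi, yi in zip(w, y)) / total
    sxx = sum(wi * (xi - mean_x) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mean_x) * (yi - mean_y)
              for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def huber_line(x, y, k=1.345, max_iter=50, tol=1e-10):
    """Huber M-estimate of a line via iteratively reweighted least squares."""
    intercept, slope = weighted_line(x, y, [1.0] * len(x))  # start from OLS
    for _ in range(max_iter):
        residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
        scale = median([abs(r) for r in residuals]) / 0.6745  # MAD-based scale
        if scale == 0:
            break
        # Huber weights: 1 inside the threshold, shrinking beyond it.
        weights = [1.0 if abs(r) <= k * scale else k * scale / abs(r)
                   for r in residuals]
        new_intercept, new_slope = weighted_line(x, y, weights)
        change = abs(new_intercept - intercept) + abs(new_slope - slope)
        intercept, slope = new_intercept, new_slope
        if change < tol:
            break
    return intercept, slope


# Made-up beta values on a line with slope 0.01, plus one outlying sample.
x = list(range(10))
y = [0.40 + 0.01 * xi for xi in x]
y[9] += 0.30
print(weighted_line(x, y, [1.0] * len(x)))  # ordinary least squares
print(huber_line(x, y))                     # Huber M-estimate
```

Observations with large residuals receive weights below one, so the outlying sample pulls the fitted slope less than it does under ordinary least squares.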

The linear model used for testing associations with age is as follows:

$$ {Candidate\; CpG}\,\Delta {DNAm} \sim \Delta {Age}+{Sex}+{GA}1+{GA}2\\ +{Study\; Center}+{any\; prenatal\; smoke\; exposure}+\varepsilon$$

Where the input CpG ΔDNAm was represented by a differential matrix of the beta values at five minus the beta values at one for all candidate CpGs, delta age was the amount of time between sample collections, and GA1 and GA2 are genetic ancestry PCs. Rather than using a linear mixed model to account for the longitudinal nature of our data, we incorporated the change in age as a variable to capture the effect of time elapsed between samples, recognizing that longer intervals may correspond to greater potential for change.

The linear model used for testing associations with sex is as follows:

$$ {Candidate\; CpG} \sim {Sex}+{Age\; at\; sample\; collection}+{GA}1\\ +{GA}2+{Study\; Center}+{any\; prenatal\; smoke\; exposure}+\varepsilon$$

Where the input CpG beta values were from either the age one or the age five time point.

Model fit was assessed by QQ plot and genomic inflation of the models was observed67,68. To address the inflation and reduce the incidence of false positives, the BACON package in R was employed to construct an empirical null distribution by which adjusted p values could be extracted (Supplementary Fig. 3)69.

Following BACON, all regressions were multiple test corrected using the Benjamini–Hochberg FDR control method70. A high confidence statistical threshold of FDR < 0.05, which corresponded to P ≤ 0.002 in our sample, and a biological effect size threshold of |Δβ| > 0.03 were used to determine significant associations.
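The combined statistical and biological criteria can be illustrated with a short sketch of Benjamini–Hochberg adjustment followed by the effect size filter. The input p values are assumed to be those already adjusted by BACON; the function names and the toy values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


def significant_sites(p_values, delta_betas, fdr=0.05, min_effect=0.03):
    """Indices of sites passing both FDR < fdr and |delta beta| > min_effect."""
    adjusted = benjamini_hochberg(p_values)
    return [i for i in range(len(p_values))
            if adjusted[i] < fdr and abs(delta_betas[i]) > min_effect]


# Made-up p values and effect sizes for four hypothetical CpG sites.
p_values = [0.001, 0.01, 0.03, 0.5]
delta_betas = [0.05, 0.01, -0.04, 0.10]
print(significant_sites(p_values, delta_betas))
```

Requiring both criteria means a site with a small p value but an effect within technical noise is excluded, as is a site with a large effect that does not survive multiple test correction.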

Given that cell type proportions are known to influence DNA methylation, we also conducted a sensitivity analysis by including cell type proportions as covariates in our models. The adjusted models were as follows:

Age Associations:

\({Candidate\; CpG}\,\Delta {DNAm} \sim \Delta {Age}+{Cell\; type\; proportions}+{Sex}+{GA}1+{GA}2+{Study\; Center}+{any\; prenatal\; smoke\; exposure}+\varepsilon\)

Sex Associations:

\({Candidate\; CpG} \sim {Sex}+{Cell\; type\; proportions}+{Age\; at\; sample\; collection}+{GA}1+{GA}2+{Study\; Center}+{any\; prenatal\; smoke\; exposure}+\varepsilon\)

Post hoc characterization of significant DNAm sites

CpG sites of interest were annotated to genes, genomic regions, and chromosomal locations using the UCSC Genome Browser with the GRCh37/hg19 human genome reference assembly52. To relate the resulting sets of genes to their biological functions and gene network interactions, the GeneCards (https://www.genecards.org) database and EWAS Atlas (https://ngdc.cncb.ac.cn/ewas) were used71,72,73. To evaluate chromatin structure, ChromHMM reference panels were employed61,62. Given the regional correlation of DNAm and its potential biological relevance, a region-based approach was implemented. CMRs were defined using the Co-Methylation with Genomic CpG Background (CoMeBack) algorithm, which clusters genome-wide CpG sites based on a Pearson correlation threshold of ≥ 0.4 and a maximum inter-probe distance of 1 kb74. Composite beta values for each CMR were computed using PCA. CMRs containing at least one significant CpG site were further analyzed for association with the variable of interest using the same robust linear regression models described in the section "Candidate DNAm association studies with robust linear regression", with the composite beta value as the input. Associations were considered significant if the |Δβ| of the CMR exceeded 0.03 and the false discovery rate (FDR) was below 0.05.

Validation cohorts and data preprocessing

To validate the sex-associated differences in DNAm, two separate pediatric whole blood cohorts were used. The publicly available dataset GSE142512 from GEO was used to validate the findings from age one (henceforth referred to as GEO) and the INfancia y Medio Ambiente (INMA) - Sabadell cohort was used to validate the findings from age five31,32. Both cohorts were preprocessed independently from raw IDAT files following the same steps as outlined for the discovery cohort. DNAm from both GSE142512 and INMA was measured using the Illumina 450k array.

Briefly, samples were assessed against 17 Illumina quality control metrics using the ewastools R package, and detection p value, bead count, and average methylated and unmethylated intensity metrics were evaluated using the minfi R package51. To confirm the biological sex of the samples, sex was inferred from the array probes targeting the X and Y chromosomes with the R package minfi (version 1.44.0) and matched with reported sex. All samples passed the QC metrics. Dye bias and background correction were performed, and the data were subsequently normalized by BMIQ54.

After preprocessing, 197 samples remained in INMA, and the GEO data set was subset to passing healthy control samples (n = 56) (Supplementary Data 10).

Validation of candidate DNAm association analysis with linear regression for sex

We performed the same analysis as described in the section "Candidate DNAm association studies with robust linear regression" independently in the GEO dataset and the INMA cohort using robust linear regression, including age, to assess the relations between changes in DNAm and biological sex. We tested only DNAm sites identified in the CHILD cohort analysis that were present in the datasets (n = 35 out of 47) (Supplementary Data 11). In accordance with the literature75, CpG sites were determined to be meaningful at a threshold of |Δβ| > 0.03 and a p value < 0.05.
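The validation rule and the comparison of effect sizes between cohorts can be sketched as follows. A site counts as validated when its validation-cohort |Δβ| exceeds 0.03 and its p value is below 0.05; agreement of effect sizes is summarized with a Pearson correlation. The function names and toy values are hypothetical, and concordance in the direction of effect is not checked because the text does not state it as a criterion.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def validated_sites(validation_delta_beta, validation_p,
                    min_effect=0.03, alpha=0.05):
    """Indices of sites meeting both thresholds in the validation cohort."""
    return [i for i, (delta, p) in
            enumerate(zip(validation_delta_beta, validation_p))
            if abs(delta) > min_effect and p < alpha]


# Made-up effect sizes for four hypothetical discovery CpG sites.
discovery = [0.05, -0.04, 0.06, 0.035]
validation = [0.045, -0.02, 0.05, 0.04]
validation_p = [0.01, 0.03, 0.20, 0.001]
print(validated_sites(validation, validation_p))
print(pearson(discovery, validation))
```

Reporting the correlation alongside the thresholded count shows whether sites that miss a cut-off still move in step with the discovery effect sizes.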

Statistics and reproducibility

The discovery cohort included 760 individuals with samples collected at two time points, comprising 404 males and 356 females for the stratified analysis. Validation of sex differences at age 1 was performed using 55 samples from GSE142512 (GEO; 36 males, 19 females), and validation at age 5 was performed using 197 samples from the INMA cohort (102 males, 95 females). All statistical analyses were conducted in RStudio v4.2.2 with Bioconductor 3.16. Multiple testing corrections (Benjamini–Hochberg and Bonferroni) were applied as appropriate. For comparisons of continuous variables (e.g., cell type proportions) between two groups, paired t-tests were used. Comparisons of categorical variables (e.g., birth mode) were assessed with chi-squared tests. Pearson correlation was used for correlation analyses. To establish the technical threshold for DNA methylation analysis, technical replicates distributed across the arrays were used to calculate the root mean square error (RMSE). The relationship between the variable of interest and DNAm was assessed using robust linear modeling76.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.