Introduction

The hippocampus is a highly adaptable archicortical structure that develops slowly throughout early life1,2. Cognitive and affective processes, including memory, spatial processing, emotion, motivation, and language are dependent on normal hippocampal development3,4,5,6, which is highly sensitive to experience and particularly early stress7, such as environmental adversity. Alterations in hippocampal structure and function have been linked to several developmental disorders including attention-deficit/hyperactivity disorder, autism spectrum disorder, and obsessive-compulsive disorder8. Importantly, the hippocampus is highly sensitive to experience; volumes in term-born infants increase exponentially after birth2, and both its structure and function are affected by stress and childhood adversity9,10,11. Nevertheless, the neurobiological mechanisms underlying this vulnerability remain unknown.

Anatomically, the hippocampus is a unique grey matter structure with a complex folding pattern. The hippocampal grey matter is made up of distinct subfields, comprising the cornu ammonis (CA1-4), the dentate gyrus (DG), and the subiculum, each with its own function, structure, and connectivity, with several subfields showing nonlinear growth in childhood12. In children, research has demonstrated variability in subfield volumes related to different developmental disorders and stressors7; however, results are mixed in terms of the specific subfields involved and the direction of the effects13,14,15. The bulk of the white matter of the hippocampus is contained within the subiculum as the perforant path and within the stratum radiatum lacunosum moleculare (SRLM) layers; however, it is also present in the dentate gyrus and hippocampus proper16,17, and studies have revealed varying patterns of myelination throughout the brain in neonates18,19. In contrast to volumes, although hippocampal myelination has been implicated in psychiatric disorders in adults20,21, it is understudied in general, especially in developmental samples. Hippocampal developmental timelines differ substantially between volume and myelination, with volumetric development occurring primarily in the first trimester and myelination occurring primarily in the third trimester21,22. Therefore, hippocampal subfield myelination may be more vulnerable to stress from early exposure to the extrauterine environment seen in preterm birth.

The postnatal period that corresponds to the third trimester in preterm infants has been characterized as a period of significant adversity23, with much disruption to the still maturing brain and its development. Evidence from humans and animal models has indicated that both sensory regions and areas sensitive to stress, such as the hippocampus, amygdala, and insula, are regulated by input from the environment, and premature exposure to the extrauterine environment can interrupt developmental trajectories in stress-sensitive networks24,25,26,27. Here, we investigated the effect of early life stress on volumetric and myelin development in the hippocampal subfields, using preterm birth as a model. We analyzed data from 520 newborn infants from the Developing Human Connectome Project28, 120 of whom were born preterm. We compared both grey matter volume and T1w/T2w ratio as a proxy for myelin between preterm- and term-born neonates to determine whether early stress differentially affects their development. We began by extending the existing HippUnfold automated subfield segmentation tool29 for use in newborns, by training new machine learning models and adapting the workflow for neonatal MRI data. Subfield volumes were extracted, and myelination was assessed by way of T1w/T2w ratio in the hippocampal gray matter30. By comparing the predictive power of each model, we determined whether volumes and myelination were sensitive to preterm birth. We then replicated our findings in a withheld sample of 25 preterm-born infants from the dHCP dataset who had both a scan at birth (that is, earlier than 37 weeks) and at term equivalent age (approximately 40 weeks). We compared both ages to 25 age- and sex-matched term-born infants, revealing differences in how predictive the stress models were, and how stress-exposed infants developed in terms of volumes and myelination.

Results

We successfully segmented the hippocampus using the T1-weighted images, using an incremental U-net training approach. An existing HippUnfold model trained on adult hippocampi with a contrast-agnostic approach (described in the online methods) was first applied to the neonatal T1-weighted data, also employing a neonatal T1-weighted template for initial linear image registration. Successfully segmented cases were then used to train subsequent models that employed the neonatal T1-weighted images. Final segmentations were graded for segmentation quality by multiple raters, and did not differ in quality with respect to hemisphere (χ²(1, N = 1250) = 1.41, p = 0.235). Volumes for each subfield were calculated in native space, and T1w/T2w ratio was averaged across each subfield in unfolded space29.
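
As a quick arithmetic check on the reported hemisphere comparison, the p-value of a chi-square statistic with one degree of freedom can be recovered from the complementary error function. The Python sketch below is illustrative only: the study's analyses were run in R, and the function name is hypothetical.

```python
import math


def chi2_sf_1df(statistic: float) -> float:
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom.

    With 1 df, the chi-square variable is the square of a standard normal
    variable, so P(X > x) = erfc(sqrt(x / 2)).
    """
    if statistic < 0:
        raise ValueError("chi-square statistic must be non-negative")
    return math.erfc(math.sqrt(statistic / 2.0))


# Statistic reported in the text for segmentation quality by hemisphere.
p_value = chi2_sf_1df(1.41)
print(f"p = {p_value:.3f}")  # p = 0.235
```

The shortcut only holds for one degree of freedom; larger tables would need the general chi-square survival function.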

Two linear mixed models were then constructed. The first included postmenstrual age (i.e., age since conception) alone, modeling linear development with age. The second included both time spent in utero and ex utero, accounting for preterm birth. Importantly, both models account for total age of the infant, which is expected to be an important predictor of both volumes and myelination. All subfield volumes showed a clear increase in size with postmenstrual age, with CA1, CA3, and CA4/DG having a quadratic fit (Fig. 1A). The stress model, which includes both time spent in and ex utero, was better at predicting hippocampal subfield volumes than the age model, which considered postmenstrual age since conception alone, with greater explained variance and smaller error values (Volumes, Actual R2 and MSE values, Table 1). However, differences in explained variance between models were small, ranging from less than 0.1% to 3%, which suggests that performance between the models was similar. Actual adjusted R2 values in the stress model varied greatly across subfields, ranging from 0.12 in CA2 to 0.45 in CA4/DG, showing that volumetric sensitivity to stress was not uniform across subfields. Importantly, we tested whether this model could accurately predict volumes in a withheld sample of preterm (which included two timepoints, one during the preterm period and one at term equivalency) and term-born infants. Surprisingly, the age model outperformed the stress model in this replication sample (Predicted R2 values, Table 1 and Fig. 1B), meaning that while prenatal and postnatal stress factors are relevant, postmenstrual age alone was a better predictor of volumes than stress in new, unseen data. Differences in explained variance between models were much larger, with the age model outperforming stress by 4–19%. 
Preterm volumes were notably smaller than those at term equivalency and in term-born infants (all p < 0.001, Table 2), but no significant differences were observed between term-equivalent and term-born volumes (all p > 0.05, Fig. 1C).

Fig. 1: Hippocampal subfield volumes showed greater sensitivity to age compared to early stress.

A The relationship between subfield volumes (y-axis) and age (x-axis). B Segmentation of hippocampal subfield volumes was performed using an iterative approach, with a group-based neonatal template for initial registration before being labeled and unfolded. Colours represent the R² values for the age (top) and stress (bottom) models, showing the model fit across the hippocampus. Warm colours indicate higher R² values, while cooler colors show lower model fit. C Subfield volumes in preterm-born, term-equivalent preterm-born, and term-born infants. Medians are indicated by thick black horizontal lines. The first and third quartiles are marked by the lower and upper edges of the boxes, respectively. Lower and upper whiskers extend to the smallest and largest value, respectively, within 1.5 times the interquartile range. Outlying values beyond these ranges are plotted individually. CA cornu ammonis, DG dentate gyrus, TE term equivalent.

Table 1 Actual and predicted adjusted R2 and MSE values of postmenstrual age and stress models on hippocampal subfield volume and myelination
Table 2 Estimated marginal effects comparing preterm- and term-born infant volumes and T1w/T2w ratio values in the longitudinal sample

T1w/T2w ratio in the subfields also increased significantly with age, with all subfields having a quadratic fit, indicating that myelination slowed at approximately term age (Fig. 2A). However, the results observed in levels of myelination of the subfields followed a very different pattern from volumes. The stress model led to higher adjusted R2 values in all subfields relative to the age model, ranging from 0.20 in CA2 to 0.33 in the subiculum, again indicating that the effect of stress on myelination is not uniform across the subfields (Myelination, Actual R2 and MSE values, Table 1). Differences in explained variance between models were moderate, with the stress model outperforming age by 3–9%. When using each model to predict the withheld data, the stress model still provided better predictive value of myelination, greatly outperforming the age model with improvements in explained variance ranging from 10 to 30% (Myelination, Predicted R2, Table 1 and Fig. 2B). Comparing subgroups of the withheld sample also revealed that term-born infants had significantly higher levels of myelination than preterm-born infants both at birth and at term equivalency (all p < 0.006, Fig. 2C and Table 2).

Fig. 2: Hippocampal subfield myelination showed greater sensitivity to early stress than age.

A The relationship between myelination (T1w/T2w ratio, y-axis) and age (x-axis). B The stress model showed better predictive power in a new dataset (R² values, warmer colours) than the age model, which showed a near-zero effect (R² values, cooler colours). C Subfield myelination in preterm-born, term-equivalent preterm-born, and term-born infants. Medians are indicated by thick black horizontal lines. The first and third quartiles are marked by the lower and upper edges of the boxes, respectively. Lower and upper whiskers extend to the smallest and largest value, respectively, within 1.5 times the interquartile range. Outlying values beyond these ranges are plotted individually. CA cornu ammonis, DG dentate gyrus, TE term equivalent.

Discussion

In examining the effects of early stress on hippocampal subfield maturation, two key points emerged: subfield volumes are relatively resilient to early life stress, while myelination is quite sensitive. Additionally, the pattern of early stress-related myelination changes was not uniform across subfields; while CA3 showed a relatively small (that is, 3%) difference between models, CA1 showed greater sensitivity, with upwards of 25% of its variance in myelination being explained by preterm birth (9% more than by age alone). We propose that the differences in stress susceptibility emerge from the structural properties of each subfield; prior work in adults has shown that CA3 has a greater myelin content than CA129,31. It therefore appears that subfields with less myelination are more vulnerable to stress. CA1 is heavily involved in episodic memory32, and variations in its structure have been associated with mood disorders33. The connection between episodic memory and mood disorders has been well established34, and both are known to be impacted by preterm birth35. Thus, the association of mood disorders with early life stress, including preterm birth, may reflect delayed or disrupted myelination of subfields mediating episodic memory. In contrast, volumes were more variably impacted by early stress, which may explain the conflicting results on the effects of preterm birth on subfield volumes in prior literature13,14,15.

Although the mechanisms remain unclear, the link between early life stress, the hippocampus, and enduring developmental outcomes into adulthood has been well established. The results presented here provide a potential mechanism for this association, particularly when considering both the structure and function of the hippocampal subfields and how early stress shows a variable relationship with each. Findings further suggest that hippocampal subfields exhibit unique myelination compared to other white matter tracts that support broader connectivity, and less variable myelination patterns19. The hippocampus’ specialized function and organization into subfields may result in different myelination needs and sensitivities, especially in response to early life stress.

Methods

Participants

This research was performed using preprocessed data from the third release of the Developing Human Connectome Project (dHCP; obtained from http://www.developingconnectome.org/)28,36. The dHCP is a large open science project of early life development led by teams at King’s College London, Imperial College London, and Oxford University, and includes imaging, genetic, demographic, and behavioural data for over 1000 infants and their parents. The infant imaging data used in the present study were collected between December 2014 and May 2020. Inclusion criteria were: gestational age between 23 and 44 weeks, estimated from the mother’s last menstrual period and confirmed where possible by early ultrasound scanning. Exclusion criteria were: infants with contraindication to MR imaging, including implanted metallic devices such as orthopedic devices or non-compatible clips to close patent ductus arteriosus, preterm infants who are too unwell to tolerate the scanning period, despite full intensive neonatal care, and language difficulties preventing proper communication about the study and the consent process. The original study was approved by the United Kingdom Health Research Authority (Research Ethics Committee reference number: 14/LO/1169) and written parental consent was obtained in every case for imaging and open data release of the anonymized data.

This release contained data for 887 sessions for 783 infants, and 638 infants had both T1- and T2-weighted anatomical images. Six MRI scans were performed under sedation, and 881 were not. After performing segmentation and data cleaning (described below) and removing participants with a diagnosis of growth restriction, gestational diabetes, pre-eclampsia, HELLP, or hypertension (78 infants), a final sample of 520 participants was analyzed (three acquired with sedation), and full participant demographics and clinical characteristics are shown in Table 3. Data were divided into two groups: (i) those participants with a single time point (the cross-sectional sample, 470 infants: 95 preterm, 375 term) and (ii) those with two time points (the longitudinal sample, 25 preterm infants), as well as a group of 25 term-born infants matched to the preterm group in sex and age at the term equivalent scan, who were withheld from the cross-sectional sample.

Table 3 Participant demographics and clinical characteristics

Magnetic resonance imaging

The T1-weighted anatomical images were acquired using an IR (Inversion Recovery) TSE sequence with TR = 4.8 s, TE = 8.7 ms, SENSE factor 2.26 (axial) and 2.66 (sagittal) with overlapping slices (resolution (mm) 0.8 × 0.8 × 1.6). T2 images were obtained using a Turbo Spin Echo (TSE) sequence with the same resolution, acquired in two stacks of 2D slices (in sagittal and axial planes), using parameters: TR = 12 s, TE = 156 ms, SENSE factor 2.11 (axial) and 2.58 (sagittal). Data were acquired on a Philips Achieva 3 T scanner at the Evelina Newborn Imaging Centre using a dedicated 32-channel neonatal head coil28. All anatomical volumes were collected as part of the dHCP and are described in detail in Makropoulos et al.37. The resulting images were motion corrected as described in Cordero-Grande et al.36 and super-resolution reconstruction was performed as in Kuklisova-Murgasova et al.38, resulting in 3D volumes resampled to 0.5 mm isotropic resolution. The resulting images were also corrected for bias field inhomogeneities.

Subfield segmentation with HippUnfold

HippUnfold is a novel tool for subfield segmentation, surface generation, and unfolding of the hippocampus. The first step of HippUnfold is a coarse affine registration to a template where a bounding box around the hippocampus is defined. Using this bounding box, the image is cropped around the hippocampus and rotated into a coronal oblique space. The cropped and coronal oblique images are then supplied to one of the core processes of HippUnfold: a deep neural network called nnU-Net39 to segment the hippocampal gray matter and the boundaries of the hippocampus along orthogonal directions in participant space. The boundary regions include the hippocampal-amygdala transition area, medial temporal lobe cortex, pial surface, stratum radiatum lacunosum moleculare (SRLM), and the indusium griseum. Using the boundaries of this participant-specific segmentation, a coordinate system is defined along the domain of the gray matter across the proximal-distal, anterior-posterior, and inner-outer hippocampal axes via solving Laplace’s equation for these three axes independently. These coordinates have been demonstrated to have topological correspondence between variably shaped hippocampi, and we can use these coordinates to generate non-linear transformations between the native space and a canonical unfolded hippocampal space. Thus, we can project subfields from this unfolded space defined via histology (using the BigBrain dataset) into each subject’s native space. We also generate surfaces at varying laminar depths in the unfolded space, and warp these into subject native space, enabling us to calculate measures of gyrification, thickness, and curvature on these surfaces, and perform vertex-wise analyses. Additionally, myelin maps can be generated if both a T1w and a T2w image are supplied. The subfield atlas includes the subiculum, cornu ammonis (CA) 1, CA2, CA3, CA4, and the dentate gyrus (DG). For more details refer to DeKraker et al.40.
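
To illustrate how solving Laplace's equation yields a smooth coordinate between two boundaries, the following Python sketch relaxes a potential field on a toy 2D strip, with one edge fixed at 0 and the opposite edge fixed at 1. It is a minimal illustration under stated assumptions, not HippUnfold's solver: the real tool works in 3D on the segmented tissue labels, and the function name, grid, and iteration count here are hypothetical.

```python
def solve_laplace(mask, source, sink, n_iter=2000):
    """Jacobi relaxation of Laplace's equation on a 2D grid.

    mask   : list of rows of bools, True inside the domain (toy "gray matter")
    source : set of (row, col) cells held at 0
    sink   : set of (row, col) cells held at 1
    Averaging only over in-mask neighbours acts as a no-flux boundary
    along the remaining edges of the domain.
    """
    n_rows, n_cols = len(mask), len(mask[0])
    fixed = {cell: 0.0 for cell in source}
    fixed.update({cell: 1.0 for cell in sink})
    field = [
        [fixed.get((r, c), 0.5) if mask[r][c] else None for c in range(n_cols)]
        for r in range(n_rows)
    ]
    for _ in range(n_iter):
        updated = [row[:] for row in field]
        for r in range(n_rows):
            for c in range(n_cols):
                if not mask[r][c] or (r, c) in fixed:
                    continue
                neighbours = [
                    field[rr][cc]
                    for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                    if 0 <= rr < n_rows and 0 <= cc < n_cols and mask[rr][cc]
                ]
                if neighbours:
                    # Each free cell becomes the mean of its neighbours.
                    updated[r][c] = sum(neighbours) / len(neighbours)
        field = updated
    return field


# Toy domain: a 3 x 7 strip with the source on the left edge and the sink on the right.
n_rows, n_cols = 3, 7
mask = [[True] * n_cols for _ in range(n_rows)]
source = {(r, 0) for r in range(n_rows)}
sink = {(r, n_cols - 1) for r in range(n_rows)}
coords = solve_laplace(mask, source, sink)
print([round(value, 3) for value in coords[1]])
```

On this rectangular strip the solution is a linear ramp from source to sink. In a curved domain the same relaxation gives a coordinate that follows the shape of the tissue, which is what allows correspondence between variably shaped hippocampi.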

The original implementation of HippUnfold was designed for adult human data and thus may not perform well out of the box on other datasets, including neonate data. Therefore, two critical adjustments were made in the current study. First, the original implementation of HippUnfold uses a template defined on adult humans, which would be inappropriate for use on neonatal data. Thus, for the initial registration step a custom T1w template was created using a groupwise average template of all 638 neonatal participants used in this study. Similarly, the original nnU-Net in HippUnfold was trained on adult human data, and thus the model is mismatched both in terms of image contrast (T1w images of neonates have grey and white matter contrast inverted), and anatomy. Performing manual segmentation of a large set of subjects from scratch to train a new model is a significant, arguably insurmountable, barrier; thus, we made use of an iterative and incremental training paradigm to produce a new model. The process was seeded by a HippUnfold model (synthseg_v0.2) trained using a SynthSeg approach41, which used a set of ex vivo adult hippocampi to generate a large set of synthetic contrast MRI images, with randomized tissue contrast, to produce a model that is inherently agnostic to tissue contrast. This model was first applied to a set of 210 dHCP participants, and outputs were assessed visually by a trained expert, classifying each hippocampus as a success or failure. Successes were further characterized using a 5-point qualitative grading scheme defined by the extent of manual correction required, where grade 1 required extensive manual correction, and grade 5 required no manual correction. In this first segmentation iteration, 405 hippocampi completed segmentation, and there were 108 failures, and passing grades 1–5 were 11, 19, 54, 88, and 125, respectively.
The grade 5 segmentations were used as-is, and the grade 3 and 4 segmentations were manually corrected, to provide a final set of 172 curated and corrected segmentations. The curated and corrected segmentations were used to train a new model from scratch using nnU-Net, using the dHCP T1w images, which was denoted as neonateT1w_v2 in HippUnfold. To evaluate the effect of re-training, we performed quality control on the neonateT1w_v2 outputs of the same participants. This iteration resulted in 77 failures, and passing grades 1–5 of 4, 5, 14, 25, and 295, respectively. This new model was applied to the remaining 428 infants, and the final success rate of the entire sample was 85%. The final neonate model is integrated and openly available within HippUnfold. Finally, T1w/T2w ratio maps were generated by HippUnfold using the --generate-myelin-maps flag, leveraging both the T1w and T2w images. While T1-weighted and T2-weighted MRI images do not convey information on myelination in isolation, taking the ratio of T1w to T2w signal intensity provides a proxy for myelin content within a tissue. This technique was first proposed and validated by Glasser and Van Essen30. We did not consider the resolution sufficient to label the DG independently, thus we combined DG and CA4 into one label.
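
The ratio itself is a simple voxelwise operation. The Python sketch below shows the idea on flattened toy intensity lists, followed by averaging within subfield labels. It is a simplified illustration, not HippUnfold's implementation (which maps the ratio onto hippocampal surfaces); the function names, intensities, and labels are hypothetical.

```python
def t1w_t2w_ratio(t1w, t2w, mask, eps=1e-6):
    """Voxelwise T1w/T2w ratio.

    Voxels outside the mask, or with near-zero T2w signal, are returned
    as None so that they do not contaminate later averages.
    """
    ratio = []
    for t1, t2, inside in zip(t1w, t2w, mask):
        if inside and abs(t2) > eps:
            ratio.append(t1 / t2)
        else:
            ratio.append(None)
    return ratio


def mean_by_label(values, labels):
    """Average the non-missing values within each label."""
    sums, counts = {}, {}
    for value, label in zip(values, labels):
        if value is None:
            continue
        sums[label] = sums.get(label, 0.0) + value
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}


# Toy intensities for five voxels (arbitrary units, not real data).
t1w = [100.0, 120.0, 90.0, 80.0, 50.0]
t2w = [200.0, 240.0, 300.0, 0.0, 100.0]
mask = [True, True, True, True, False]
labels = ["CA1", "CA1", "Sub", "Sub", "CA2"]

ratio = t1w_t2w_ratio(t1w, t2w, mask)
subfield_means = mean_by_label(ratio, labels)
print(subfield_means)
```

Because the ratio is unitless and depends on acquisition parameters, it is best read as a relative proxy within one dataset rather than an absolute myelin measurement.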

Statistics and reproducibility

All analyses were performed in R for statistical computing (https://www.r-project.org/), and code is available online at https://osf.io/8hte3/42. Both volumes and T1w/T2w ratio were analyzed using the same analytical method. The first analysis examined subfield measures in the cross-sectional sample (the “age” model), and a linear mixed model was constructed with postmenstrual age (i.e., age since conception) as a regressor, including a second-order polynomial to determine whether the fit followed a quadratic shape. Sex and brain hemisphere were included as nuisance variables, and participant was included as a random effect. Marginal coefficients of determination (that is, R2) were calculated for each subfield’s model, to estimate the variance explained by postmenstrual age. We then fit a second linear mixed model, using gestational age at birth and time since birth (that is, time spent both in utero and ex utero, the “stress” model accounting for preterm birth) as independent variables, again controlling for sex and brain hemisphere, with participant as a random effect. Marginal R2 was computed. To determine which model provided a better fit, both R2 values and tenfold cross-validated mean squared error (MSE) were numerically compared for each subfield, and the model with the higher amount of variance explained and the lower MSE was considered the better fit (Table 1). As some infants had long periods between birth and being scanned, a sensitivity analysis was conducted to determine whether eliminating outliers in this difference would produce similar results. The results are included in Table S1 in the Supplementary Information, and remain largely unchanged from the main analysis.
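
The logic of the model comparison can be sketched on synthetic data. The Python example below is a deliberately simplified stand-in for the analysis described above, not the authors' R code: it uses ordinary least squares instead of a linear mixed model, omits sex, hemisphere, and the participant random effect, and assumes linear terms for the stress model. All data are simulated, and every function and variable name is hypothetical.

```python
import random


def fit_ols(X, y):
    """Least-squares coefficients via the normal equations (Gauss-Jordan)."""
    p = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [
        [sum(row[i] * row[j] for row in X) for j in range(p)]
        + [sum(row[i] * target for row, target in zip(X, y))]
        for i in range(p)
    ]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(p):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][p] / A[i][i] for i in range(p)]


def predict(X, beta):
    return [sum(b * x for b, x in zip(beta, row)) for row in X]


def cv_mse(X, y, k=10, seed=0):
    """k-fold cross-validated mean squared error."""
    order = list(range(len(y)))
    random.Random(seed).shuffle(order)
    total_sq_err = 0.0
    for fold in (order[i::k] for i in range(k)):
        held_out = set(fold)
        train = [i for i in order if i not in held_out]
        beta = fit_ols([X[i] for i in train], [y[i] for i in train])
        preds = predict([X[i] for i in fold], beta)
        total_sq_err += sum((y[i] - pred) ** 2 for i, pred in zip(fold, preds))
    return total_sq_err / len(y)


# Simulated infants: weeks ex utero contribute less to the outcome than
# weeks in utero (an assumption made purely for this illustration).
rng = random.Random(42)
ga_birth = [rng.uniform(24.0, 42.0) for _ in range(200)]
postnatal = [rng.uniform(0.0, 8.0) for _ in range(200)]
outcome = [
    0.02 * ga + 0.005 * pn + rng.gauss(0.0, 0.02)
    for ga, pn in zip(ga_birth, postnatal)
]

# "Age" model: intercept, postmenstrual age (centred at 40 weeks), and its square.
age_X = [[1.0, ga + pn - 40.0, (ga + pn - 40.0) ** 2]
         for ga, pn in zip(ga_birth, postnatal)]
# "Stress" model: intercept, gestational age at birth (centred), time since birth.
stress_X = [[1.0, ga - 40.0, pn] for ga, pn in zip(ga_birth, postnatal)]

age_mse = cv_mse(age_X, outcome)
stress_mse = cv_mse(stress_X, outcome)
print(f"age model CV-MSE: {age_mse:.5f}; stress model CV-MSE: {stress_mse:.5f}")
```

In this simulation the stress model attains the lower cross-validated error because the outcome was constructed to depend differently on time in utero and ex utero. When the two periods contribute equally, postmenstrual age alone carries the same information and the two models perform similarly.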

Next, we used the age and stress linear mixed effects models fit to the cross-sectional sample to predict the longitudinal data, to determine whether the age or stress model better predicted new data. Adjusted R2 values were calculated for each subfield to determine how well both models fit the new data, and were again numerically compared to establish whether the age or stress model explained more variance (Table 1).
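
For reference, the explained variance of a fitted model on held-out data can be computed from the observed and predicted values alone. The Python sketch below uses the standard adjusted R2 formula as an assumption; the source does not state exactly how adjusted R2 was obtained for the mixed models, and the function names and toy values are hypothetical.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_observed = sum(observed) / len(observed)
    ss_residual = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_total = sum((o - mean_observed) ** 2 for o in observed)
    return 1.0 - ss_residual / ss_total


def adjusted_r_squared(observed, predicted, n_predictors):
    """Adjusted R2, penalising the number of predictors in the model."""
    n = len(observed)
    if n - n_predictors - 1 <= 0:
        raise ValueError("not enough observations for this many predictors")
    r2 = r_squared(observed, predicted)
    return 1.0 - (1.0 - r2) * (n - 1) / (n - n_predictors - 1)


# Toy held-out values and predictions from a model fit elsewhere.
observed = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted = [1.1, 1.9, 3.2, 3.8, 5.0]
adj_r2 = adjusted_r_squared(observed, predicted, n_predictors=2)
print(f"adjusted R2 = {adj_r2:.2f}")  # adjusted R2 = 0.98
```

When computed on new data, R2 can fall to zero or below if the model predicts worse than the mean of the held-out sample, so values near zero indicate that a model does not generalise.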

Finally, we analyzed the longitudinal sample. First, we examined the effect of term status on volumes and myelination, and whether preterm infants differed from term-born infants both at birth and once they reached full term. Term status refers to the age of the infant at the time of scan; if the infant was younger than 37 weeks, they were considered preterm. If the infant was born preterm but their scan was conducted after 37 weeks, they were considered term-equivalent. Finally, if the infant was born after 37 weeks, then they were considered term-born. In separate linear mixed effects models, subfield volumes and T1w/T2w ratios were analyzed, with term status as the independent variable, controlling for sex and hemisphere, with participant as a random effect.
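
The term-status definition above can be written as a small classification rule. The Python sketch below is illustrative; the function and constant names are hypothetical, and the treatment of an age of exactly 37 weeks as term is an assumption, since the text does not specify the boundary case.

```python
TERM_THRESHOLD_WEEKS = 37.0


def term_status(ga_at_birth_weeks: float, pma_at_scan_weeks: float) -> str:
    """Classify a scan as 'preterm', 'term-equivalent', or 'term-born'.

    ga_at_birth_weeks : gestational age at birth
    pma_at_scan_weeks : postmenstrual age at the time of the scan
    """
    if pma_at_scan_weeks < ga_at_birth_weeks:
        raise ValueError("scan age cannot be earlier than age at birth")
    if ga_at_birth_weeks >= TERM_THRESHOLD_WEEKS:
        return "term-born"
    if pma_at_scan_weeks < TERM_THRESHOLD_WEEKS:
        return "preterm"
    return "term-equivalent"


# An infant born at 30 weeks and scanned twice, plus a matched term-born infant.
print(term_status(30.0, 33.0))   # preterm
print(term_status(30.0, 40.5))   # term-equivalent
print(term_status(39.0, 40.0))   # term-born
```

In the longitudinal sample, each preterm-born infant therefore contributes one "preterm" and one "term-equivalent" observation, which is why participant is modelled as a random effect.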

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.