Abstract
Reproducibility of neuroimaging research on infant brain development remains limited by highly variable processing approaches. Progress towards reproducible pipelines is hindered by a lack of benchmarks such as gold-standard brain segmentations, which are scarce because infant brain segmentation requires extensive neuroanatomical knowledge and is time-consuming. To address this, we constructed the Baby Open Brains (BOBs) Dataset, an open-source resource of manually curated and expert-reviewed infant brain segmentations. Anatomical MRI data were segmented from 71 infant imaging visits across 51 participants, using both T1w and T2w images per visit. Images showed dramatic differences in myelination and intensity across 1–9 months of age, emphasizing the need for densely sampled gold-standard segmentations across early life. This dataset provides a benchmark for evaluating and improving pipelines that depend on segmentations in the youngest populations, and thus a vitally needed foundation for large-scale early-life studies such as HBCD.
Background & Summary
Processing pipeline variability is a critical factor contributing to reproducibility challenges in neuroimaging research. When the same functional imaging dataset is analyzed with different processing pipelines, different conclusions are drawn depending on the approaches used1. Many processing decisions affect final conclusions, including pipeline components on both the structural and functional side2,3. To support reproducible neuroimaging research, benchmarks must be established for best standards and practices. One necessary benchmark is a set of gold-standard, manually defined brain tissue segmentations4.
Nowhere are manually defined segmentations more needed than in studying the first 1000 days of life, a dynamically changing period of brain growth and development5,6. Eighty percent of brain growth occurs during the first 1000 days of life, including dramatic synaptogenesis, myelination, and other cellular processes7,8,9. Aggregating over 100,000 participants from over 100 MRI studies, Bethlehem et al. found that brain growth acceleration peaks at 7 months of age, with growth velocity highest during the first three years of life5. Work by Alex et al. confirmed this velocity peak, showed that these growth trajectories are linked to cognitive and motor outcomes at 2 years of age, and found that they differ by sociodemographic factors and adverse birth outcomes6. This dynamic period of growth complicates accurate cortical and subcortical segmentation10. The considerable myelination during the first year of life causes T1-weighted (T1w) scans (which enhance the signal of fatty tissue) and T2-weighted (T2w) scans (which enhance the signal of water) to show a contrast spin-inversion effect during this period11. Existing studies are further limited by processing protocols that vary considerably, including in the early-life segmentation atlases they use4,12,13. In this context, an atlas refers to a common set of labels for each brain structure within a whole-brain MRI scan; a set of atlases can be used to segment new MRI scans and to indicate where common structures lie across brains. An atlas thus allows a researcher to be confident that they are examining the same brain region in two different children’s brains.
Standardized infant segmentation atlases have become a critical need within research programs. The NIH has already invested more than $50 million, and plans to invest hundreds of millions more, in the HEALthy Brain and Child Development (HBCD) study. This study promises to elucidate neurodevelopmental trajectories with unprecedented precision and rigor14,15 and to overcome the sample size limitations highlighted by Marek et al.16, filling a critical need for measuring true effect sizes of brain-wide associations relevant to early-life outcomes. Correct structural brain segmentations are essential to this promise, especially during the first 9 months, given the dynamic growth and myelination occurring during this period17,18,19. An atlas is therefore needed that accommodates the dynamic changes within this period. Yet manually corrected segmentations of anatomical MRI data across infancy remain scarce4. Such corrections require considerable neuroanatomical expertise, including the ability to link MRI landmarks to neuroanatomical borders, and are time-intensive.
As field-wide momentum grows for reproducible research standards, open science has become a necessary component of research best practices20. Without transparent research, the factors that contribute to low reproducibility rates cannot be examined. Because underlying manual segmentations are an influential part of processing pipelines, it stands to reason that these segmentations should themselves be open and transparent. The primary objective of this resource was to construct a set of manually curated and expert-reviewed human infant brain segmentations that adhere to the FAIR21 data principles (Findable, Accessible, Interoperable, and Reusable). This dataset can be used to assess existing pipelines and/or develop new ones, such as the recently presented BIBSNet algorithm, which was trained on this dataset22. Early-life segmentation algorithms already exist in the literature12,23,24,25,26,27,28,29,30,31,32,33. However, many lack whole-brain coverage (e.g., ID-Seg24, MANTiS27, the iSEG challenges25, SDM U-net for subcortical structures23, ANUBEX30, SegSRGAN29), use only T1w or only T2w images as inputs (e.g., Infant FreeSurfer12, M-CRIB-S26, ID-Seg24), or are specific to the neonatal period (VINNA31) and are not reliable across the full first years of life4. In addition, the underlying training data for these algorithms are often unavailable to the scientific community (e.g., iBEAT32). Finally, substantial disagreement among researchers can exist even for well-established areas such as Wernicke’s area or the hippocampus34,35.
A lack of high-quality, publicly available training data is therefore a major limitation for improving infant segmentation pipelines, as the developers of these algorithms often point out themselves24,31. Making manual corrections available via open repositories would expose them to broader review, improving their rigor and fidelity. Such work has already been performed extensively in adults and even in fetal tissue (e.g.36,37,38,39,40), and numerous segmentations have been made publicly available via repositories like OpenNeuro41.
The Baby Open Brains (BOBs) dataset addresses the need for openly available, manually corrected segmentations of MRI data from the earliest periods of life4. Such a resource is critical for developers building accurate automated segmentation tools. Curating such a dataset requires considerable neuroanatomical expertise, including knowledge of the anatomical MRI landmarks that define accurate segmentations. Until now, the labor required for such work has left much of methods development without a ‘gold standard’ or benchmark dataset. This lack of proper benchmarks has limited the ability of pipeline developers to generalize infant processing pipelines and to verify their effectiveness across infant age groups, which has in turn led to constrained pipelines tuned for particular ages.
BOBs manual segmentations will provide a benchmark for evaluating and improving automated segmentations. As infant neuroimaging expands, the number of MRI segmentation approaches is likely to grow rapidly. At least a half dozen early-life segmentation algorithms already exist in the literature12,23,24,25,26,27,28,29,30; however, few have been tested across the early-life age span, and few cover the whole brain with a broad set of labels beyond gray matter, white matter, and CSF. Combining expert and community review, the BOBs dataset provides a unique foundational benchmark for evaluating and improving image segmentation methods, as well as for expanding their scope towards more comprehensive segmentations. Such benchmarks standardize methods development by letting methods researchers evaluate segmentation performance and validate tool capabilities, which in turn promotes best practices and standards in infant neuroimaging.
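One common way to score an automated segmentation against a benchmark such as BOBs is per-label Dice overlap. The sketch below is a minimal NumPy example, assuming the manual (reference) and automated (candidate) label maps are already loaded as integer arrays on the same voxel grid. The function name `dice_per_label` is hypothetical, and Dice is only one of several metrics (surface distances, for example) a developer might report.

```python
import numpy as np


def dice_per_label(reference, candidate, labels):
    """Return {label: Dice coefficient} for each requested integer label.

    Dice = 2 * |R & C| / (|R| + |C|), where R and C are the voxel sets
    carrying that label in the reference and candidate maps.
    """
    reference = np.asarray(reference)
    candidate = np.asarray(candidate)
    if reference.shape != candidate.shape:
        raise ValueError("label maps must share the same voxel grid")

    scores = {}
    for label in labels:
        ref_mask = reference == label
        cand_mask = candidate == label
        denom = int(ref_mask.sum()) + int(cand_mask.sum())
        if denom == 0:
            # Label absent from both maps: undefined, not a perfect score.
            scores[label] = float("nan")
        else:
            overlap = int(np.logical_and(ref_mask, cand_mask).sum())
            scores[label] = 2.0 * overlap / denom
    return scores


# Toy 2-D example; real BOBs label maps are 3-D volumes.
manual = np.array([[0, 1, 1], [2, 2, 0]])
automated = np.array([[0, 1, 0], [2, 2, 2]])
print(dice_per_label(manual, automated, labels=[0, 1, 2]))
# {0: 0.5, 1: 0.6666666666666666, 2: 0.8}
```

Labels absent from both maps return NaN rather than 1.0, so that structures missing from a scan do not inflate averaged scores.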
These algorithms will form a necessary foundation for large-scale early-life studies such as HBCD. Automated MR processing pipelines designed specifically for early development are needed for such studies to produce MR outputs that are not confounded by age. With the BOBs resource providing a foundational benchmark to evaluate and improve these pipelines, HBCD and future early-life neuroimaging studies will be well equipped to deliver the promised knowledge of nuanced neurodevelopmental trajectories and their complex environmental interactions.
Methods
The dataset comprises Baby Connectome Project (BCP) anatomical MRI data and segmentations
The data for the BOBs dataset were drawn from the Baby Connectome Project (BCP), a longitudinal neuroimaging study of children 0–5 years old. Detailed methodology has been described previously19. Briefly, infants were recruited from departmental research participant registries, based on both state-wide birth records and the broader communities around the University of North Carolina at Chapel Hill and the University of Minnesota. Infants were eligible for the BCP if they 1) were born at a gestational age of 37–42 weeks, 2) had a birth weight appropriate for gestational age, and 3) had no major pregnancy or delivery complications. Parents provided informed consent and permission for their child’s study participation and data sharing prior to participation. All procedures were approved by the University of North Carolina at Chapel Hill (Study #16-1943) and University of Minnesota (SITE00000093) Institutional Review Boards. For this dataset, 71 MRI visits with good-quality data from infants 1–9 months old scanned at the University of Minnesota were used. The images selected were the best-quality images based on visual review by the authors, which remains the gold standard for quality assurance compared with automated methods42,43. Specifically, images were inspected for signs of poor quality such as motion, ghosting, blurriness, ringing, signal drop-off, or image cut-offs. MRI data were collected using a 32-channel head coil on a Siemens 3 T Prisma scanner and included high-resolution T1w (MPRAGE: TR 2400 ms, TE 2.24 ms, TI 1600 ms, flip angle 8°, resolution = 0.8 × 0.8 × 0.8 mm³) and T2w (turbo spin-echo: turbo factor 314, echo train length 1166 ms, TR 3200 ms, TE 564 ms, resolution = 0.8 × 0.8 × 0.8 mm³, variable flip angle) structural scans collected during natural sleep.
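At 0.8 × 0.8 × 0.8 mm, each voxel covers 0.512 mm³, so a label map can be turned into structure volumes by counting voxels. The sketch below is a minimal illustration, assuming the segmentation shares the 0.8 mm grid of the anatomical images and has been loaded as an integer NumPy array. In practice the voxel size should be read from the image header (for example with nibabel's `header.get_zooms()`) rather than hard-coded; `label_volumes_mm3` and the toy label value 17 are hypothetical.

```python
import math

import numpy as np


def label_volumes_mm3(label_map, voxel_size_mm):
    """Return {label: volume in mm^3} by counting voxels per label."""
    voxel_volume = math.prod(voxel_size_mm)  # 0.8 * 0.8 * 0.8 = 0.512 mm^3
    values, counts = np.unique(np.asarray(label_map), return_counts=True)
    return {int(v): int(c) * voxel_volume for v, c in zip(values, counts)}


# Toy 2x2x2 volume: two voxels carry label 17, the rest are background 0.
toy = np.zeros((2, 2, 2), dtype=np.int16)
toy[0, 0, 0] = toy[1, 1, 1] = 17
volumes = label_volumes_mm3(toy, voxel_size_mm=(0.8, 0.8, 0.8))
```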
Segmentations were initialized using two different segmentation pipelines
As a starting point for manual review, images were run through one of two segmentation pipelines. The first segmentations were initialized with a joint label fusion (JLF) pipeline44 and then manually curated. However, this procedure required many hours of manual curation because the JLF initializations needed extensive, coarse edits. These initial manual segmentations were therefore used to train BIBSNet22, a deep neural network built using nnU-Net45 and SynthSeg33. The remaining segmentations were initialized with BIBSNet and then manually curated. Iteratively using BIBSNet prototypes as a starting point saved many hours of work, as the prototypes were much more accurate starting points than the JLF pipeline. In both pipelines, Advanced Normalization Tools (ANTs) was used to perform denoising and N4 bias field correction, and the T1w and T2w images underwent rigid-body realignment, to improve image quality and alignment for the reviewers. Preprocessing is described in detail in ref. 22 and on the BIBSNet GitHub repository (https://github.com/DCAN-Labs/BIBSnet).
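A rigid-body realignment has six degrees of freedom (three rotations and three translations) with no scaling or shearing, so distances and angles within the image are preserved. The sketch below shows only that transform model in NumPy; it does not estimate the parameters. ANTs finds them by optimizing an image-similarity metric (in ANTsPy, for instance, via `ants.registration` with `type_of_transform="Rigid"`), and the exact calls BIBSNet uses are documented in its repository rather than reproduced here. The function names and the x-then-y-then-z rotation order are illustrative choices.

```python
import numpy as np


def rigid_matrix(rx, ry, rz, tx, ty, tz):
    """4x4 homogeneous rigid-body transform.

    Rotations (radians) are applied about x, then y, then z; the
    translation (mm) is applied last. Six parameters in total.
    """
    cx, sx = np.cos(rx), np.sin(rx)
    cy, sy = np.cos(ry), np.sin(ry)
    cz, sz = np.cos(rz), np.sin(rz)
    rot_x = np.array([[1, 0, 0], [0, cx, -sx], [0, sx, cx]])
    rot_y = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
    rot_z = np.array([[cz, -sz, 0], [sz, cz, 0], [0, 0, 1]])
    transform = np.eye(4)
    transform[:3, :3] = rot_z @ rot_y @ rot_x
    transform[:3, 3] = (tx, ty, tz)
    return transform


def apply_transform(transform, points_mm):
    """Map an (N, 3) array of points through a 4x4 homogeneous transform."""
    points = np.asarray(points_mm, dtype=float)
    homogeneous = np.hstack([points, np.ones((len(points), 1))])
    return (homogeneous @ transform.T)[:, :3]


# Small head repositioning: a few degrees of rotation plus a shift in mm.
T = rigid_matrix(np.deg2rad(3), np.deg2rad(-2), np.deg2rad(5), 1.5, -0.8, 2.0)
moved = apply_transform(T, [[0.0, 0.0, 0.0], [10.0, 0.0, 0.0]])
```

Because the rotation block is orthonormal with determinant 1, the 10 mm separation between the two points survives the transform, which is the property that distinguishes rigid from affine or nonlinear registration.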
Markers curated segmentations according to a standard operating protocol
A schematic depicting the process of segmentation initialization, correction, and upload is shown in Fig. 1. Markers (the trained annotators who edited the segmentations) attended training sessions provided by expert reviewers and consulted with them regularly throughout the segmentation process. Marker segmentations were reviewed by expert reviewers (EF/SS/JW/DA) and modified as needed. Markers performed image segmentation edits in ITK-SNAP46, overlaying the initialized segmentations on the structural scans and editing them manually. Markers used both the T1w and T2w scans to determine correct segmentation boundaries, producing one segmentation per session. Because infant brains in this age range have increasing amounts of white matter myelination, referring to both T1w and T2w scans was critical to determining the extent of white matter. For each brain, the cortical surface and the gray/white matter boundary were edited and reviewed first. Subcortical regions were then edited, including the lateral ventricles, inferior lateral ventricles, cerebellum white matter, cerebellum cortex, thalamus, caudate, putamen, pallidum, amygdala, hippocampus, nucleus accumbens, third ventricle, fourth ventricle, and brainstem. These were segmented in phases: the lateral, third, and fourth ventricles first; the nucleus accumbens, caudate, putamen, and pallidum second; the brainstem, thalamus, and cerebellum third; and the amygdala and inferior lateral ventricles last. The hippocampus was segmented separately, either before or after the rest of the subcortical structures. Boundary definitions for these regions were taken from previously published protocols47,48,49. A full SOP of subcortical boundaries was created (see Supplemental Information) and is available on the OSF site50 and the ReadTheDocs page (https://bobsrepository.readthedocs.io).
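The phased subcortical editing order described above can be written down as a simple data structure, which makes it easy to check that every structure in the protocol is assigned to exactly one phase. This is an illustrative encoding only: the phase names and the `editing_order` helper are hypothetical, and the authoritative boundary definitions remain those in the SOP.

```python
# Editing phases for subcortical structures, in the order described in the text.
SUBCORTICAL_PHASES = [
    ("ventricles", ["lateral ventricles", "third ventricle", "fourth ventricle"]),
    ("striatum and pallidum", ["nucleus accumbens", "caudate", "putamen", "pallidum"]),
    ("brainstem, thalamus, cerebellum",
     ["brainstem", "thalamus", "cerebellum white matter", "cerebellum cortex"]),
    ("amygdala and inferior lateral ventricles", ["amygdala", "inferior lateral ventricles"]),
]

# The hippocampus was segmented separately, before or after the phases above.
SEPARATE = "hippocampus"

# Full list of subcortical regions edited, as enumerated in the protocol.
SUBCORTICAL_STRUCTURES = {
    "lateral ventricles", "inferior lateral ventricles", "cerebellum white matter",
    "cerebellum cortex", "thalamus", "caudate", "putamen", "pallidum", "amygdala",
    "hippocampus", "nucleus accumbens", "third ventricle", "fourth ventricle", "brainstem",
}


def editing_order(hippocampus_first=False):
    """Flatten the phases into one ordered list of structures to edit."""
    phased = [name for _, names in SUBCORTICAL_PHASES for name in names]
    return [SEPARATE] + phased if hippocampus_first else phased + [SEPARATE]


order = editing_order()
# Every structure appears exactly once across the phases plus the hippocampus.
assert len(order) == len(set(order)) and set(order) == SUBCORTICAL_STRUCTURES
```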
A schematic depicting the process of creating the dataset. Segmentations were initialized with an automated processing pipeline and then manually corrected using both the T1w and T2w MRI images. Segmentations were then reviewed by expert reviewers, who made revisions as necessary. The images were defaced, deidentified, and uploaded to OpenNeuro. OSF acts as a hub that links to the dataset images, protocols, and any future documentation created as the dataset expands.
Approved anatomic MRI data were deidentified and defaced
Final data were stripped of identifying information and organized according to the BIDS standard. To deface images, T1w and T2w images were run through PyDeface using MNI infant templates as well as a custom infant mask (https://cdnis-brain.readthedocs.io/deidentification/), which masked out facial features. The final deidentified and defaced images and segmentations were version-controlled with DataLad to track data provenance.
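The defacing and provenance steps can be scripted. The sketch below only builds the command lines (it does not run them), assuming PyDeface's `--outfile`, `--template`, and `--facemask` options and DataLad's `datalad save -m` command. The template, mask, subject/session labels, and output paths are hypothetical placeholders, not the files used for BOBs, which are described at the linked deidentification page.

```python
from pathlib import Path


def pydeface_command(infile, outdir, template, facemask):
    """Build a PyDeface call that writes a defaced copy of `infile` into `outdir`."""
    infile = Path(infile)
    outfile = Path(outdir) / infile.name  # keep the BIDS file name unchanged
    return [
        "pydeface", str(infile),
        "--outfile", str(outfile),
        "--template", str(template),  # hypothetical infant template path
        "--facemask", str(facemask),  # hypothetical custom infant face mask
    ]


def datalad_save_command(message):
    """Build a DataLad call that records the current state of the dataset."""
    return ["datalad", "save", "-m", message]


anat = Path("sub-01/ses-1mo/anat")  # hypothetical subject/session labels
commands = [
    pydeface_command(anat / f"sub-01_ses-1mo_{suffix}.nii.gz", "defaced",
                     "templates/infant_template.nii.gz", "templates/infant_facemask.nii.gz")
    for suffix in ("T1w", "T2w")
]
commands.append(datalad_save_command("Deface T1w and T2w images"))
```

Each list can be passed to `subprocess.run(cmd, check=True)`; keeping commands as argument lists avoids shell-quoting problems with paths.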
Data Records
The BOBs dataset is available on OpenNeuro, with 71 BCP visits spanning 1–9 months of age
The BOBs dataset is available on OSF50 and OpenNeuro51. In total, segmentations were manually curated from 71 imaging visits across 51 participants. Of the 51 participants, 34 contributed one scan visit, 14 contributed two, and 3 contributed three. Age at scan ranged from 1 to 9 months, with at least 6 scans at each month from 1 to 8 (Fig. 2). The demographics of the dataset participants skewed White, non-Hispanic, and well-resourced (Fig. 2): 82% of the sample identified as White, non-Hispanic, and 96% of mothers had at least a college degree. The demographics of the 51 participants did not differ significantly from those of the full BCP neuroimaging sample (N = 901 visits across 383 participants). Select neurodevelopmental measures, including the Mullen Scales of Early Learning, the Vineland Adaptive Behavior Scales, and subscales from the Infant Behavior Questionnaire-Revised, also showed no differences between dataset participants and the full BCP sample (Table 1), suggesting that participants in this dataset can be considered representative of the larger BCP sample.
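As a consistency check, the per-participant visit counts reported above sum to the stated totals:

```latex
\begin{aligned}
\text{participants: } & 34 + 14 + 3 = 51,\\
\text{visits: } & 34 \times 1 + 14 \times 2 + 3 \times 3 = 34 + 28 + 9 = 71.
\end{aligned}
```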
The current BOBs dataset comprises FreeSurfer-style segmentations for infants
These segmentations comprise cerebral gray and white matter and 23 subcortical structures. Uploaded segmentations went through several review stages before final approval, including at least one manual check by an expert reviewer. Using both the T1w and T2w images, care was taken to label white matter both affected and unaffected by the contrast spin-inversion effect. Diverging from FreeSurfer labels, the ventral thalamic boundary that separates the thalamus from the ventral diencephalon was defined by the hypothalamic sulcus52. The hippocampal label was used to define the hippocampus proper, excluding the formation at the tail along the lingual gyrus. Although evaluating whether the SOP is “right” or “wrong” is likely beyond the scope of this paper, we chose these definitions to be more consistent with prior infant MRI literature53. We welcome the community to inspect and refine existing segmentations so that the “gold standard” benchmarks reflect a community gold standard.
The BOBs dataset follows BIDS formatting standards
Data within the dataset follow the BIDS formatting standards54,55. Each subject folder contains one or more session folders. The “anat” subdirectory within each session folder contains the T1w and T2w image files, the associated segmentation file, and corresponding JSON files containing metadata for each file. In addition to the subject folders, the directory contains a “dataset_description.json” file describing the dataset, a “dseg.tsv” file containing a lookup table of segmentation label numbers and names, and a phenotype folder with “sessions.json” and “sessions.tsv” files that list participant ID numbers, sessions, chronological age, gestational age at birth, and sex. For ease of access, the dataset also includes two files that are not part of the BIDS standard: “index.html”, a list of links for downloading individual files, and “V1.0.zip”, a zipped version of the entire repository. File organization is also documented on the BOBs ReadTheDocs page.
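A minimal sketch of reading this layout with the Python standard library follows. It assumes that `dseg.tsv` uses the BIDS `index` and `name` columns and that anatomical files follow the usual `sub-<label>_ses-<label>_<suffix>.nii.gz` naming with `T1w`, `T2w`, and `dseg` suffixes; both are assumptions to check against the released files, and the helper names and example labels are hypothetical. Dedicated tools such as PyBIDS offer a fuller query interface.

```python
import csv
import io
from collections import defaultdict
from pathlib import PurePosixPath


def parse_dseg_tsv(text):
    """Map integer label index -> structure name from a BIDS dseg.tsv table."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return {int(row["index"]): row["name"] for row in reader}


def index_anat_files(rel_paths):
    """Group anat NIfTI files by (subject, session) and BIDS suffix.

    `rel_paths` are POSIX-style paths relative to the dataset root.
    """
    sessions = defaultdict(dict)
    for rel in rel_paths:
        path = PurePosixPath(rel)
        if path.parent.name != "anat" or not path.name.endswith(".nii.gz"):
            continue  # skip JSON sidecars and non-anatomical files
        stem = path.name[: -len(".nii.gz")]
        *entity_parts, suffix = stem.split("_")
        # Entities are key-value pairs such as "sub-01" or "ses-1mo".
        entities = dict(part.split("-", 1) for part in entity_parts if "-" in part)
        sessions[(entities.get("sub"), entities.get("ses"))][suffix] = str(path)
    return dict(sessions)
```

On a downloaded copy, the relative paths can come from `Path(root).rglob("*.nii.gz")`, each converted with `relative_to(root).as_posix()`.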
Technical Validation
Manual segmentations show massive qualitative improvement over initial Joint Label Fusion segmentations
Compared with the initial joint label fusion segmentations created by the DCAN infant-ABCD-BIDS pipeline56, the manual segmentations show dramatic qualitative improvements. The initial segmentations had three major types of errors that were corrected by markers (see Fig. 3). First, initial segmentations often contained major errors in cortical folding patterns (Fig. 3 top). The initial model may not account for differences between infant and adult image intensities, and this model failure may drive the folding-pattern errors. These errors required intensive edits to correct the basic gyral and sulcal patterns. Second, because of the contrast spin inversion caused by myelination at this age, labeling the full extent of unmyelinated white matter required extensive manual segmentation (Fig. 3 middle). Automated segmentations often miss unmyelinated white matter, especially along the lateral surface of the brain, where myelination occurs later in development. Finally, as exemplified in Fig. 4, subcortical regional intensities change dramatically over this period, so subcortical boundaries often needed refining (Fig. 3 bottom).
T1w and T2w images show dramatic developmental differences across the age range considered. (a) Images from the same participant at three different ages, clearly depicting the transition from unmyelinated to myelinated white matter and the differing contrast intensities in the T1w vs. T2w image at each age. Red arrows point out cortical gray/white matter changes, blue triangles point out internal capsule white matter changes, and green circles point out nucleus accumbens region changes. (b) Cohen’s d values of white/gray matter differentiation plotted for T1w and T2w MRI images. Considering both the T1w and the T2w images in this age group is critical to fully capture white matter and subcortical boundaries.
Dynamic brain development in infancy requires dense sampling and segmentations utilizing both T1w and T2w images
As infant brains in this age range have increasing amounts of myelination in the white matter17, referring to both T1w and T2w scans was critical to determining the extent of white matter. Fig. 4a illustrates this development in a single infant from our dataset at three ages. In this infant, image contrast within and across brain structures changes dramatically. This early period shows rapid myelination, such that older ages show much more myelinated white matter, especially along the major white matter tracts. Most dramatically, at 5 months this infant shows an abundance of unmyelinated white matter that is easily seen on the T2w image but could easily be missed on the T1w image. These developmental changes require considering both the T1w and the T2w images in this age group to fully capture white matter and subcortical boundaries. This was especially critical in subcortical regions such as the basal ganglia, where boundaries might be visible in only the T1w or only the T2w image. The symbols on each image in Fig. 4a mark such regions, where one contrast is more informative than the other.
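Fig. 4b summarizes white/gray matter differentiation as Cohen’s d. The text does not spell out the exact computation, so the sketch below is an assumption: it takes the standardized mean difference between white matter and cortical gray matter voxel intensities, using the pooled standard deviation, with the label numbers of FreeSurfer’s standard lookup table (2 and 41 for cerebral white matter, 3 and 42 for cerebral cortex). Check the dataset’s `dseg.tsv` before relying on those numbers; the helper names are hypothetical.

```python
import numpy as np

# FreeSurfer lookup-table indices (assumed to match the BOBs dseg.tsv).
CEREBRAL_WM = (2, 41)  # Left/Right-Cerebral-White-Matter
CEREBRAL_GM = (3, 42)  # Left/Right-Cerebral-Cortex


def cohens_d(group_a, group_b):
    """Standardized mean difference (a - b) / pooled standard deviation."""
    a = np.asarray(group_a, dtype=float)
    b = np.asarray(group_b, dtype=float)
    n_a, n_b = a.size, b.size
    pooled_var = ((n_a - 1) * a.var(ddof=1) + (n_b - 1) * b.var(ddof=1)) / (n_a + n_b - 2)
    return (a.mean() - b.mean()) / np.sqrt(pooled_var)


def wm_gm_contrast(image, label_map, wm_labels=CEREBRAL_WM, gm_labels=CEREBRAL_GM):
    """Cohen's d of white- versus gray-matter intensities within one image."""
    image = np.asarray(image, dtype=float)
    label_map = np.asarray(label_map)
    wm = image[np.isin(label_map, wm_labels)]
    gm = image[np.isin(label_map, gm_labels)]
    return cohens_d(wm, gm)
```

Under this convention, a positive d means white matter is brighter than cortex and a negative d means it is darker; computing it separately on the T1w and T2w images of the same session is one way to quantify the age-dependent contrast changes that made both images necessary for manual editing.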
As the largest manually curated human infant brain segmentation dataset for the critical 1–9 month age range, the BOBs dataset proved vital in developing BIBSNet22. BIBSNet is an automated segmentation pipeline necessary for HBCD MRI data preprocessing and critical for infant pipeline development. The BOBs dataset’s central role in developing BIBSNet provides further external technical validation of the dataset. Prior efforts to develop automated segmentation pipelines lacked densely sampled, manually labeled training data, which are critical for early-life longitudinal studies like HBCD4,14,17,18,19. For example, the Developing Human Connectome Project (dHCP) provides extensive anatomical segmentations that are largely restricted to neonatal and preterm infants28,57. These segmentations are derived from the T2w but not the T1w image; while they can be used to develop automated segmentation pipelines23,29,30, such pipelines may fail to generalize beyond the neonatal period. The Infant FreeSurfer dataset comprises data from a dozen infant sessions across the first two years of life58 and helped develop Infant FreeSurfer12, but it lacks the participant density of the BOBs dataset.
Usage Notes
In addition to the repository dataset on OpenNeuro, BOBs is available at https://bobsrepository.s3.amazonaws.com/index.html. More information and additional download links are available on our ReadTheDocs page. The dataset was also linked to BrainBox (https://brainbox.pasteur.fr/), which allows users to review the dataset online.
Code availability
All code used in this manuscript is available publicly as cited in the manuscript or available at https://github.com/sallystoyell/BOBs_manuscript.
References
Botvinik-Nezer, R. et al. Variability in the analysis of a single neuroimaging dataset by many teams. Nature 582, 84–88 (2020).
Glasser, M. F. et al. The minimal preprocessing pipelines for the Human Connectome Project. Neuroimage 80, 105–124 (2013).
Li, X. et al. Moving Beyond Processing and Analysis-Related Variation in Neuroscience. bioRxiv 2021.12.01.470790 https://doi.org/10.1101/2021.12.01.470790 (2024).
Dufford, A. J. et al. (Un)common space in infant neuroimaging studies: A systematic review of infant templates. Hum. Brain Mapp. 43, 3007–3016 (2022).
Bethlehem, R. A. I. et al. Brain charts for the human lifespan. Nature 604, 525–533 (2022).
Alex, A. M. et al. A global multicohort study to map subcortical brain development and cognition in infancy and early childhood. Nat. Neurosci. 27, 176–186 (2024).
Knickmeyer, R. C. et al. A structural MRI study of human brain development from birth to 2 years. J. Neurosci. 28, 12176–12182 (2008).
Stiles, J. & Jernigan, T. L. The basics of brain development. Neuropsychol. Rev. 20, 327–348 (2010).
Gao, W. et al. Temporal and spatial development of axonal maturation and myelination of white matter in the developing brain. AJNR Am. J. Neuroradiol. 30, 290–296 (2009).
Mhlanga, S. T. & Viriri, S. Deep learning techniques for isointense infant brain tissue segmentation: a systematic literature review. Front. Med. 10, 1240360 (2023).
Saunders, D. E. et al. Magnetic resonance imaging protocols for paediatric neuroradiology. Pediatric Radiology 37, 789–797 (2007).
Zöllei, L., Iglesias, J. E., Ou, Y., Grant, P. E. & Fischl, B. Infant FreeSurfer: An automated segmentation and surface extraction pipeline for T1-weighted neuroimaging data of infants 0–2 years. Neuroimage 218, 116946 (2020).
Gousias, I. S. et al. Magnetic resonance imaging of the newborn brain: manual segmentation of labelled atlases in term-born and preterm infants. Neuroimage 62, 1499–1509 (2012).
Volkow, N. D., Gordon, J. A. & Freund, M. P. The Healthy Brain and Child Development Study-Shedding Light on Opioid Exposure, COVID-19, and Health Disparities. JAMA Psychiatry 78, 471–472 (2021).
Dean, D. C. 3rd et al. Quantifying brain development in the HEALthy Brain and Child Development (HBCD) Study: The magnetic resonance imaging and spectroscopy protocol. Dev. Cogn. Neurosci. 70, 101452 (2024).
Marek, S. et al. Reproducible brain-wide association studies require thousands of individuals. Nature 603, 654–660 (2022).
Dubois, J., Hertz-Pannier, L., Dehaene-Lambertz, G., Cointepas, Y. & Le Bihan, D. Assessment of the early organization and maturation of infants’ cerebral white matter fiber bundles: a feasibility study using quantitative diffusion tensor imaging and tractography. Neuroimage 30, 1121–1132 (2006).
Shi, F. et al. Infant brain atlases from neonates to 1- and 2-year-olds. PLoS One 6, e18746 (2011).
Howell, B. R. et al. The UNC/UMN Baby Connectome Project (BCP): An overview of the study design and protocol development. Neuroimage 185, 891–905 (2019).
Nosek, B. A. et al. Promoting an open research culture. Science 348, 1422–1425 (2015).
Wilkinson, M. D. et al. The FAIR Guiding Principles for scientific data management and stewardship. Sci Data 3, 160018 (2016).
Hendrickson, T. J. et al. BIBSNet: A Deep Learning Baby Image Brain Segmentation Network for MRI Scans. bioRxiv https://doi.org/10.1101/2023.03.22.533696 (2023).
Chen, L. et al. An attention-based context-informed deep framework for infant brain subcortical segmentation. Neuroimage 269, 119931 (2023).
Wang, Y. et al. ID-Seg: an infant deep learning-based segmentation framework to improve limbic structure estimates. Brain Inform 9, 12 (2022).
Sun, Y. et al. Multi-Site Infant Brain Segmentation Algorithms: The iSeg-2019 Challenge. IEEE Trans. Med. Imaging 40, 1363–1376 (2021).
Adamson, C. L. et al. Parcellation of the neonatal cortex using Surface-based Melbourne Children’s Regional Infant Brain atlases (M-CRIB-S). Sci. Rep. 10, 4359 (2020).
Beare, R. J. et al. Neonatal Brain Tissue Classification with Morphological Adaptation and Unified Segmentation. Front. Neuroinform. 10, 12 (2016).
Makropoulos, A. et al. The developing human connectome project: A minimal processing pipeline for neonatal cortical surface reconstruction. Neuroimage 173, 88–112 (2018).
Delannoy, Q. et al. SegSRGAN: Super-resolution and segmentation using generative adversarial networks - Application to neonatal brain MRI. Comput. Biol. Med. 120, 103755 (2020).
Chen, J. V. et al. Automated neonatal nnU-Net brain MRI extractor trained on a large multi-institutional dataset. Sci. Rep. 14, 4583 (2024).
Henschel, L., Kügler, D., Zöllei, L. & Reuter, M. VINNA for Neonates–Orientation Independence through Latent Augmentations. arXiv [cs.CV] (2023).
Wang, L. et al. iBEAT V2.0: a multisite-applicable, deep learning-based pipeline for infant cerebral cortical surface reconstruction. Nat. Protoc. 18, 1488–1509 (2023).
Billot, B. et al. SynthSeg: Segmentation of brain MRI scans of any contrast and resolution without retraining. arXiv [eess.IV] (2021).
Tremblay, P. & Dick, A. S. Broca and Wernicke are dead, or moving past the classic model of language neurobiology. Brain Lang. 162, 60–71 (2016).
Wisse, L. E. M. et al. A harmonized segmentation protocol for hippocampal and parahippocampal subregions: Why do we need one and what are the key goals? Hippocampus 27, 3–11 (2017).
Caviness, V. S. Jr, Filipek, P. A. & Kennedy, D. N. Magnetic resonance technology in human brain science: blueprint for a program based upon morphometry. Brain Dev. 11, 1–13 (1989).
Kennedy, D. N., Filipek, P. A. & Caviness, V. S. Anatomic segmentation and volumetric calculations in nuclear magnetic resonance imaging. IEEE Trans. Med. Imaging 8, 1–7 (1989).
Goldstein, J. M. et al. Cortical abnormalities in schizophrenia identified by structural magnetic resonance imaging. Arch. Gen. Psychiatry 56, 537–547 (1999).
Seidman, L. J. et al. Thalamic and amygdala-hippocampal volume reductions in first-degree relatives of patients with schizophrenia: an MRI-based morphometric analysis. Biol. Psychiatry 46, 941–954 (1999).
Payette, K. et al. An automatic multi-tissue human fetal brain segmentation benchmark using the Fetal Tissue Annotation Dataset. Sci Data 8, 167 (2021).
Gorgolewski, K., Esteban, O., Schaefer, G., Wandell, B. & Poldrack, R. OpenNeuro—a free online platform for sharing and analysis of neuroimaging data. Organization for human brain mapping. Vancouver, Canada 1677 (2017).
Taylor, P. A., Etzel, J., Glen, D. R. & Reynolds, R. C. Demonstrating Quality Control (QC) Procedures in fMRI. (Frontiers Media SA, 2023).
White, T. et al. Automated quality assessment of structural magnetic resonance images in children: Comparison with visual inspection and surface-based reconstruction. Hum. Brain Mapp. 39, 1218–1231 (2018).
Wang, H. & Yushkevich, P. A. Groupwise segmentation with multi-atlas joint label fusion. Med. Image Comput. Comput. Assist. Interv. 16, 711–718 (2013).
Isensee, F., Jaeger, P. F., Kohl, S. A. A., Petersen, J. & Maier-Hein, K. H. nnU-Net: a self-configuring method for deep learning-based biomedical image segmentation. Nat. Methods 18, 203–211 (2021).
Yushkevich, P. A. et al. User-guided 3D active contour segmentation of anatomical structures: significantly improved efficiency and reliability. Neuroimage 31, 1116–1128 (2006).
Wedig, M. M., Rauch, S. L., Albert, M. S. & Wright, C. I. Differential amygdala habituation to neutral faces in young and elderly adults. Neurosci. Lett. 385, 114–119 (2005).
Filipek, P. A., Richelme, C., Kennedy, D. N. & Caviness, V. S. Jr. The young adult human brain: an MRI-based morphometric analysis. Cereb. Cortex 4, 344–360 (1994).
Rushmore, R. J. et al. Anatomically curated segmentation of human subcortical structures in high resolution magnetic resonance imaging: An open science approach. Front. Neuroanat. 16, 894606 (2022).
Feczko, E. et al. BOBsRepository. OSF https://doi.org/10.17605/OSF.IO/WDR78 (2024).
Feczko, E. et al. BOBsRepo. OpenNeuro https://openneuro.org/datasets/ds005450/ (2025).
Standring, S. Gray’s Anatomy: The Anatomical Basis of Clinical Practice. (Elsevier, 2020).
Morey, R. A. et al. A comparison of automated segmentation and manual tracing for quantifying hippocampal and amygdala volumes. Neuroimage 45, 855–866 (2009).
Gorgolewski, K. J. et al. BIDS apps: Improving ease of use, accessibility, and reproducibility of neuroimaging data analysis methods. PLoS Comput Biol 13, e1005209 (2017).
Poldrack, R. A. et al. The past, present, and future of the brain imaging data structure (BIDS). Imaging Neurosci (Camb) 2, 1–19 (2024).
Sturgeon, D. et al. DCAN-Labs Infant-Abcd-Bids-Pipeline. Zenodo https://doi.org/10.5281/zenodo.7683282.
Edwards, A. D. et al. The Developing Human Connectome Project Neonatal Data Release. Front. Neurosci. 16, 886772 (2022).
de Macedo Rodrigues, K. et al. A FreeSurfer-compliant consistent manual segmentation of infant brains spanning the 0–2 year age range. Front. Hum. Neurosci. 9 (2015).
Acknowledgements
The authors would like to thank additional markers and coordinators who contributed to this dataset, including Henrique A. Caldas, BettyAnn Chodkowski, Katie Day, Ekomobong Eyoh, Brayton Hall, Alexandra Harper, and Sarah Kuplic. This work was supported by Bill & Melinda Gates Foundation INV-015711 (M.D.R., J.T.E., C.D.S., D.A.F.). The UNC/UMN Baby Connectome Project was supported by NIMH R01 MH104324 and NIMH U01 MH110274. E.A.K. is supported by a National Research Service Award (NRSA) T32 training grant (T32-NS109604). S.M.S. and T.K.M.D. are supported by the National Science Foundation Graduate Research Fellowship Program (S.M.S.: 2237827; T.K.M.D.: 2020295366). D.A.F. is supported by NIDA U01DA041148, NIDA U24DA055330, NIMH R01MH096773, NIMH R01MH125829, and NIMH R37MH125829.
Author information
Contributions
S.M.S. and E.F. wrote the manuscript, with substantial writing contributions from J.T.E. and D.A.F. All authors reviewed and approved the manuscript. E.F., S.M.S., L.A.M., M.B., D.G., A.R.R., M.D.R. coordinated segmentation efforts. E.F., S.M.S., D.A., J.L.W. reviewed segmentations. E.F., S.M.S., D.A., K.B., B.B., A.C., T.A.C., T.K.M.D., L.H.-R., O.K., E.A.K., C.L., T.M., A.M., M.M., P.N., H.S., S.S. and B.Z. contributed to manual segmentations. L.A.M., G.C., T.K.M.D., A.G., T.J.H., A.H., E.G.L., J.T.L., A.J.P., P.R., M.S., S.S., B.T., E.Y., C.D.S. and M.D.R. contributed to pre-processing the data. M.D.R., C.D.S., D.A.F. and J.T.E. conceived of and coordinated the project.
Ethics declarations
Competing interests
D.A. Fair has an ownership interest in FIRMM software (Turing Medical Inc.). M. Bagonis, K. Barrett, B. Bower, D. Goradia, L. Heisler-Roman, C. Lucena, M. Myricks, and P. Narnur all worked for PrimeNeuro at the time of their contributions to this manuscript.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Feczko, E., Stoyell, S.M., Moore, L.A. et al. Baby Open Brains: An open-source dataset of infant brain segmentations. Sci Data 12, 1423 (2025). https://doi.org/10.1038/s41597-025-05404-y
DOI: https://doi.org/10.1038/s41597-025-05404-y






