Background & Summary

There is a growing awareness of the importance of early brain maturation on health later in life as the underlying complex, interconnected structural and functional processes can be altered by various genetic and environmental factors1,2,3,4,5,6,7,8,9. Accurate characterization of in utero development is therefore critical.

Magnetic resonance imaging (MRI) is an emergent adjunct to ultrasound (US) in cases of diagnostic ambiguity, and for comprehensive diagnostic, prognostic, and postnatal management planning10. MRI is well suited to exploring the developing fetal brain owing to its excellent soft-tissue contrast while being minimally invasive. Clinical guidelines recommend the acquisition of T2-weighted (T2w) fast spin echo (FSE) sequences to work around unpredictable fetal motion during the exam. In practice, at least three orthogonal series of two-dimensional (2D) thick slices are acquired to provide information on the whole brain volume with sufficient signal-to-noise ratio (SNR)11. Despite all countermeasures and optimizations, fetal MRI remains challenging due to motion artefacts and related signal drops, as well as low SNR in small structures within the maturing brain surrounded by the mother’s womb and the amniotic fluid. Since MRI is a second-line technique rather than the reference standard (i.e., US) for the follow-up of the fetus during pregnancy, large-scale clinical datasets are relatively scarce in this cohort of sensitive subjects. Besides, there is no standardized imaging protocol across sites, which has resulted in large variability between MR schemes across scanners, studies12, and even more so between vendors. Such discrepancies may result in highly variable MR contrasts and image quality. Indeed, most of the data available today are heavily post-processed and integrated into spatio-temporal MRI atlases of the fetal brain, either healthy10,13,14 or pathological15. This enables a fine representation of the developing brain throughout gestation. However, such atlases average brain scans across several fetuses at a given gestational age (GA), thus resulting in high-resolution (HR) images far from a realistic clinical set-up, with smoothed inter-individual heterogeneities and features.

Recently, the Fetal Tissue Annotations (FeTA) dataset has been proposed as a benchmark for automated multi-tissue fetal brain segmentation16,17,18,19. However, only super-resolution (SR) reconstructions20,21 of the fetal brain volume and their associated semi-automated annotations have been made publicly available, but not the original clinical acquisitions. In fact, to date, no database provides annotated low-resolution (LR) series of the fetal brain.

Thanks to their ability to provide a flexible and controlled environment that facilitates accurate, robust, and reproducible research, computer simulations are widely used for MR developments to mitigate data scarcity and post-processing complexity22,23,24,25,26,27,28.

While recent advances in generative artificial intelligence (AI, such as GANs, VAEs, and diffusion models) have shown promising results in the synthesis of medical images29,30, our physics-guided simulation approach was specifically chosen to provide precise control over the acquisition parameters of MRI and tissue properties throughout the maturation of the fetal brain. Although conditional generative models could potentially be trained to handle such parameters31, they would require extensive paired training data covering all possible parameter combinations across GA, and such data are particularly scarce in this sensitive population32. In this context, we demonstrated that synthetic, yet realistic data can efficiently complement scarce clinical datasets, providing valuable support for data-demanding deep learning (DL) models for fetal brain MRI tissue segmentation28,33,34, as well as the optimization of advanced reconstruction techniques28,35,36,37. These exploratory studies were based on the first Fetal Brain magnetic resonance Acquisition Numerical phantom (FaBiAN) that simulates as closely as possible the FSE sequences used in clinical routine for fetal brain examination to generate realistic T2w images of the fetal brain throughout maturation from a variety of segmented HR anatomical images of healthy and pathological subjects28. Despite a good tissue contrast, the synthetic T2w MR images used in this work were originally derived from a three-class model of the fetal brain (gray matter (GM), white matter (WM), and cerebrospinal fluid (CSF)) that cannot capture key maturation processes and metabolic changes occurring in WM tissues across gestation.

In the wake of this first prototype38, the proposed data descriptor showcases a full dataset of highly realistic in silico data composed of:

  • 594 synthetic T2w MR images corresponding to 78 developing fetal brains, derived from HR annotations of SR-reconstructed, real clinical data acquired on various MR scanners and following the different clinical protocols in place at Lausanne University Hospital (CHUV) and at University Children’s Hospital Zurich (Kispi);

  • automatically-generated brain masks and fetal brain annotations of the LR series;

  • a corresponding SR reconstruction for every subject.

This dataset is based on a numerical model of the developing fetal brain that accounts for the pronounced MR signal changes representative of tissue heterogeneity within WM structures throughout maturation39. Our methodological contribution (FaBiAN v2.040) thus identifies local spatial regions of variable water content within a WM mask to simulate more realistic in utero MR images.

Figure 1 provides a schematic overview of the study design presented in this work. We specifically validate the data as follows. First, we report a large-scale, independent, qualitative assessment of the realism of the newly simulated images (FaBiAN v2.0) compared to the former implementation (FaBiAN v1.2) by two neuroradiology and pediatric radiology experts. Second, we compare quantitatively the SR volumes reconstructed from real clinical cases and from their corresponding synthetic LR images generated using FaBiAN v1.2 and v2.0. Finally, we demonstrate a potential use-case of the dataset with an in silico augmentation of training samples for automated fetal brain multi-tissue segmentation of in utero MR acquisitions, through the simulation of a broad variety of sequence parameters using FaBiAN v2.0.

Fig. 1

Schematic overview of the dataset promoted throughout this work and of the study design to validate its relevance to the fetal brain MRI community.

This study aims at demonstrating the relevance of such an extensive dataset of synthetic, yet highly realistic T2w MR images of the fetal brain throughout maturation to a community struggling with data scarcity in this sensitive population that requires comprehensive ethical oversight to acquire new data as well as experienced MR technologists. As such, having access to multiple MR images generated from various settings and in different clinical scenarios has a great potential reuse value for further developments of advanced post-processing methods as well as cutting-edge artificial intelligence models.

Methods

Data

The input anatomical models used to generate all the LR series of the fetal brain with local WM changes across maturation for the Technical Validation and Reuse Potential sections were extracted from the publicly available FeTA Dataset with refined annotations41,42. The FeTA dataset gathers 90 subjects in total17, of which two were excluded after quality control due to the poor quality of the SR reconstruction; the corresponding segmentations have therefore not been manually corrected in the refined dataset42. The remaining 88 subjects (34 neurotypical and 54 pathological) are in the GA range of 20.0 to 34.8 weeks (27.0 ± 3.58 weeks). Diagnosis of the pathology, absent from the original anonymized FeTA dataset release, has been assessed by two radiologists and shared along with this updated dataset42: common pathological conditions such as spina bifida pre- and post-surgery (n = 36) and ventriculomegaly (n = 8), as well as heterotopia (n = 5), fetal intracranial cystic lesions (n = 3), high-flow dural sinus malformation (n = 1), aqueductal stenosis (n = 1), and cerebellar hemorrhage (n = 1) are included.

In this study, we provide simulated data based on 78 subjects from the FeTA Dataset with refined annotations41,42. For the sake of reproducibility, the corresponding subject IDs are provided as Supplementary Information. On the one hand, 29 subjects (neurotypical: 16, pathological: 13) were initially simulated for the technical validation, which consisted of a qualitative evaluation of the realism of the simulated LR series and a quantitative comparison between the simulated and original clinical images. These healthy and pathological subjects were randomly selected to span a broad range of GA, from 20.1 to 34.8 weeks. On the other hand, 70 subjects (neurotypical: 27, pathological: 43; GA range: 20.0-34.8 weeks) were simulated to form the training set of the Reuse Potential section that leverages in silico data for automated fetal brain multi-tissue segmentation with deep learning. Among them, we re-used 21 subjects from the technical validation and simulated 49 additional subjects. The remaining 18 subjects (neurotypical: 7, pathological: 11; GA range: 20.9-33.1 weeks) out of the 88 subjects from the FeTA dataset constituted the pure testing set (no simulated data are used for testing). Thus, our dataset encompasses 78 simulated subjects overall (neurotypical: 31, pathological: 47; GA range: 20.0-34.8 weeks). Details specific to the different subgroups used for the technical validation and the reuse potential experiment are given in the corresponding sections.

The original MR images were acquired at Kispi at 1.5T (Signa Discovery MR450, GE Healthcare) or 3T (Signa Discovery MR750, GE Healthcare), either using an eight-channel cardiac coil or body coil. Multiple series of 2D thick slices were scanned in orthogonal orientations (axial, coronal and sagittal) with respect to the fetal brain using a T2w Single-Shot FSE (SS-FSE) sequence (TR/TE, 2000–3500 ms/120 ms (minimum); flip angle, 90°; sampling percentage, 55%; slice thickness, 3.00–5.00 mm; field-of-view, from 200 × 200 mm² to 240 × 240 mm²; acquisition matrix, 1.5T: 256 × 224 voxels², 3T: 320 × 224 voxels²; isotropic in-plane resolution of 0.5 mm)16. Eight tissues are segmented: WM, intra-axial CSF, cerebellum, extra-axial CSF, cortical GM, deep GM, brainstem, and corpus callosum.

The data used in this study were acquired in earlier studies in accordance with the relevant guidelines and regulations, under the supervision of Ethics Boards composed of representatives at different levels (hospitals, cantons, and federal state). Mothers of all fetuses included in the current work provided written informed consent for the re-use of their data for research purposes.

Modeling of white matter heterogeneity and changes across maturation

Figure 2 gives an overview of FaBiAN v2.0, our latest developments to account for local WM heterogeneities throughout fetal brain maturation. In particular, Fig. 3 highlights the major changes implemented to modulate brain tissue relaxation times across GA.

Fig. 2

Workflow for simulating fast spin echo (FSE) acquisitions of the fetal brain which better capture local WM changes throughout maturation (FaBiAN v2.0). The major changes compared to the original implementation of the software are highlighted by the flag “v2.0”.

Fig. 3

Modulation of reference T1 and T2 values in the WM to depict smooth, local WM changes throughout fetal brain maturation.

Due to the lack of characterizations or normative maps of T1 and T2 relaxometry measurements of fetal brain structures, all tissues from the segmented fetal brain images are merged into three main classes: GM, WM and CSF. According to previous empirical relaxometry values provided in the literature43,44,45,46,47, an average T1, respectively T2 value is assigned to each of these classes, without consideration of the GA or spatial a priori on the finer location within the brain28 (see Fig. 3(A)).

We rely on the Gaussian Hidden Markov Random Field (GHMRF) model, fitted using the Expectation-Maximization algorithm, following the approach outlined in FMRIB’s Automated Segmentation Tool (FAST48) to integrate local spatial WM heterogeneities in our numerical representation of the developing fetal brain, and therefore capture biophysical changes that arise across maturation. Concretely, we automatically segment a WM mask into three classes using FAST 6.0.5.1 (as illustrated in Fig. 3(B)): hydrated WM areas that appear hyperintense on a T2w image, hypointense areas representing dense WM fibers, and a third class interpreted as an intermediate between hydrated WM and maturing, dense WM fibers. Parameters were set heuristically as follows: 0.1 MRF regularization weight; four iterations for bias field removal; 20.0 mm kernel in bias field smoothing.
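As an illustration only, the three-class FAST call described above could be assembled as follows. This is a minimal sketch under stated assumptions: the mapping of the "MRF regularization weight" to FAST's `-H` option, the T2 image type (`-t 2`), and the file names are guesses for illustration and are not taken from the FaBiAN scripts. The helper `build_fast_command` is hypothetical and only builds the argument list, so it can be inspected without FSL installed.

```python
from typing import List


def build_fast_command(wm_image: str, out_basename: str) -> List[str]:
    """Assemble an FSL FAST call for a three-class split of a masked WM image.

    All option choices mirror the parameters quoted in the text; which FAST
    option the authors used for each of them is an assumption.
    """
    return [
        "fast",
        "-t", "2",       # assumed: input treated as a T2-weighted image
        "-n", "3",       # three WM classes (hydrated, intermediate, dense fibers)
        "-H", "0.1",     # assumed mapping of the 0.1 MRF regularization weight
        "-I", "4",       # four iterations for bias field removal
        "-l", "20.0",    # 20.0 mm bias field smoothing kernel
        "-o", out_basename,
        wm_image,        # hypothetical input path
    ]


if __name__ == "__main__":
    # Print the command instead of running it; running requires FSL.
    print(" ".join(build_fast_command("sub-001_wm_T2w.nii.gz", "sub-001_wm")))
```

The list can be handed to `subprocess.run` on a machine where FSL is available; FAST then writes one partial volume map per class next to the output basename.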

The segmentation output consists of 3D partial volume (PV) concentration maps that give, for each voxel (i), its content in each of the three segmented classes. These PV maps are used to ultimately weight T1 and T2 relaxation times locally. Specifically, the intermediate WM class is considered as a baseline: the positive difference between the PV maps of hydrated, respectively dense WM and this baseline will be used to increase, respectively decrease the average reference T1 and T2 relaxation times in every voxel of the WM mask. The corresponding weight (w+, respectively w−) computed in every voxel of the WM mask according to Equation (1) can be displayed in so-called positive and negative reward maps (Fig. 3(B)).

$$w=\left\{\begin{array}{l}{w}^{+}[i]=1+\max \,(0,{PV}_{hydratedWM}^{i}-P{V}_{intermediateWM}^{i})\\ {w}^{-}[i]=1-\max \,(0,{PV}_{intermediateWM}^{i}-P{V}_{WMfibers}^{i})\end{array}\right.\forall i\in {{\mathbb{R}}}^{W\times H\times D}$$
(1)

where w+ and w− are respectively the positive and negative reward maps, PVhydratedWM, PVintermediateWM, and PVWMfibers stand for the PV concentration maps of each of the three segmented WM classes, respectively the hydrated WM, the intermediate WM class, and the dense WM fiber bundles. W, H, and D represent the width, height, and depth of the maps respectively.
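The voxel-wise computation of the reward maps can be sketched with NumPy as below. The sketch follows Equation (1) exactly as printed; the function name `reward_maps` and the toy PV values are illustrative assumptions, and the way both maps are later merged into a single weight is not specified here and is therefore left out.

```python
import numpy as np


def reward_maps(pv_hydrated, pv_intermediate, pv_fibers):
    """Return (w_plus, w_minus), computed voxel-wise as printed in Equation (1)."""
    pv_hydrated = np.asarray(pv_hydrated, dtype=float)
    pv_intermediate = np.asarray(pv_intermediate, dtype=float)
    pv_fibers = np.asarray(pv_fibers, dtype=float)
    # Positive reward: at least 1, grows where hydrated WM exceeds the baseline.
    w_plus = 1.0 + np.maximum(0.0, pv_hydrated - pv_intermediate)
    # Negative reward: at most 1, using the difference written in Equation (1).
    w_minus = 1.0 - np.maximum(0.0, pv_intermediate - pv_fibers)
    return w_plus, w_minus


if __name__ == "__main__":
    # Three toy voxels with made-up partial volume fractions.
    w_plus, w_minus = reward_maps([0.8, 0.1, 0.2], [0.2, 0.3, 0.6], [0.0, 0.6, 0.2])
    print(w_plus, w_minus)
```

Because PV fractions lie between 0 and 1, w+ stays within [1, 2] and w− within [0, 1] in this formulation.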

The adjustment of the mean reference T1 and T2 relaxation times should nevertheless be approached with caution, as substantial weights (w) may result in unrealistic relaxometric properties of the modeled fetal brain tissues. To address this concern, a sigmoid modulation characterized by a control value α (see Fig. 3(D)) is applied to the original T1 and T2 values in order to smooth local WM changes within brain structures and ensure that the corresponding tissue properties do not excessively deviate from their reference values. Because the range of WM intensity changes is not uniform across gestation (see Fig. 3(C)), α needs to be adjusted according to the GA. We rely on the normative spatiotemporal MRI atlas (STA) of the fetal brain, which includes 18 subjects spanning 21 to 38 weeks of GA10, to determine the optimal setting following:

$${\alpha }_{GA}\,:=\,\alpha \left[GA\right]=\left\{\begin{array}{ll}0.2 & \,{\rm{if}}\,GA=21\\ \alpha \left[GA-1\right]\left(2-\frac{{\sigma }_{WM}[GA]}{{\sigma }_{WM}[GA-1]}\right) & \,{\rm{if}}\,GA\in \{22,23,\ldots ,38\},\end{array}\right.$$
(2)

where σWM stands for the estimated standard deviation of WM intensity (see Fig. 3C). We empirically set α21 = 0.2, and iteratively compute α for each subsequent week of GA according to Equation (2) for the 17 remaining subjects of the STA. For continuous GA estimation, α(GA) is determined by shape-preserving piecewise cubic interpolation for GA between 20.0 and 38.0 weeks. Given Equation (2), we obtain the new T1 and T2 (see Fig. 3D) from the reference Tk as:

$${T}_{k}^{\,{\rm{new}}}={T}_{k}\cdot {f}_{GA}(w)\quad k\in \{1,2\},$$
(3)
$$\,{\rm{being}}\,\quad {f}_{GA}(w)={\alpha }_{GA}\left(\frac{2}{1+{e}^{-\frac{2w}{{\alpha }_{GA}}}}-1\right)+1,$$
(4)
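Equations (2) to (4) can be sketched in plain Python as follows. This is an illustration under stated assumptions rather than the FaBiAN implementation: the function names and the σWM values used in the example are made up, the recursion assumes consecutive integer weeks, and the interpolation of α between weeks is omitted. The sigmoid of Equation (4) is algebraically equal to α·tanh(w/α) + 1, so the scaling factor always stays within α of 1.

```python
import math


def alpha_by_week(sigma_wm, alpha_start=0.2):
    """Equation (2): sigma_wm maps consecutive integer GA (weeks) to sigma_WM."""
    weeks = sorted(sigma_wm)
    alpha = {weeks[0]: alpha_start}
    for prev, cur in zip(weeks, weeks[1:]):
        # alpha shrinks when the WM intensity spread grows, and vice versa.
        alpha[cur] = alpha[prev] * (2.0 - sigma_wm[cur] / sigma_wm[prev])
    return alpha


def f_ga(w, alpha_ga):
    """Equation (4) as printed; equals alpha_ga * tanh(w / alpha_ga) + 1."""
    return alpha_ga * (2.0 / (1.0 + math.exp(-2.0 * w / alpha_ga)) - 1.0) + 1.0


def modulated_relaxation_time(t_ref, w, alpha_ga):
    """Equation (3): scale a reference T1 or T2 (k in {1, 2}) by f_GA(w)."""
    return t_ref * f_ga(w, alpha_ga)


if __name__ == "__main__":
    # Made-up sigma_WM values for three consecutive weeks, for illustration only.
    alphas = alpha_by_week({21: 10.0, 22: 12.0, 23: 12.0})
    print(alphas)
    print(modulated_relaxation_time(100.0, 0.5, alphas[22]))
```

The bound on the scaling factor follows from tanh taking values strictly between −1 and 1, which is what keeps the modulated relaxation times close to their reference values.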

Simulated datasets

High-resolution multi-tissue annotations from clinical experts are used as input models of the developing brain to simulate typical fetal brain FSE acquisitions, either at 1.5T or 3T28. Overall, synthetic images of 78 subjects are provided out of the 88 subjects from the FeTA Dataset with refined annotations41,42. These subjects (n=78) were randomly selected to achieve a homogeneous distribution of disease condition and GA. Different subgroups are used in the sections Technical Validation and Reuse Potential: i) a qualitative evaluation of the realism of the simulated low-resolution series and ii) a quantitative comparison between simulated and original clinical images are based on n=29 simulations; the automated fetal brain multi-tissue segmentation with deep learning experiment is based on n=70 simulations used for training. Twenty-one subjects are shared across the simulations of the three experiments. For each subject, T2w series of 2D thick slices are simulated in the three orthogonal orientations (i.e., axial, coronal, sagittal) with respect to the fetal brain position. As routinely performed in clinical practice, two partially overlapping LR series are generated in every orientation for subsequent SR reconstruction of the fetal brain volume. Since the realism of the interslice, 3D random rigid movements of the fetus encoded during k-space sampling with FaBiAN v1.238 has already been validated28, the dataset showcased in this work is generated without additional movement, both to avoid biasing the radiologists’ qualitative evaluation of the realism of the generated LR series and to make it possible to align SR reconstructions from the simulated images with the ground truth label maps they originate from.

Table 1 reports the range of MR sequence parameters that determine the contrast, the geometry, and the resolution of representative clinical low-resolution series acquired in clinical routine on different MR systems (Half-Fourier Acquisition Single-shot Turbo spin Echo, HASTE for Siemens, SS-FSE for GE scanners), at either 1.5 or 3T, to screen the in utero developing brain. These parameters are therefore reproduced to simulate the T2w images of the fetal brain used throughout this study.

Table 1 Acquisition parameters at clinical magnetic field strength (1.5 and 3T) of two MR sequences (HASTE and SS-FSE) commonly used for fetal brain examination, and reproduced to simulate T2w images of the developing fetal brain.

Besides non-standardized MR sequence parameters, only a few studies have investigated T1 and T2 properties of the developing brain, either in utero44,45,47 or in preterm newborns43,46. We determined mean reference T146,47 and T2 values43,44,45 of GM and WM based on measurements performed at 1.5T, knowing that both T1 and T2 properties decrease across brain development (so from in utero to postnatal life) and that T2 > T2*. Since the biochemical composition of the CSF does not vary between childhood and adulthood, T1 and T2 values of this tissue were assumed to be similar in both populations, by extension also in fetuses. Besides, we extended these reference values to 3T by assuming that T2 relaxation time remains constant across clinical magnetic field strengths for a given structure, while T1 relaxation time increases by about 25% in GM and 10% in both WM and CSF49,50,51,52,53. As reported in Table 2, we then randomly generated T1 and T2 values using the RANDBETWEEN function of Microsoft Excel (version 2108) to increase the diversity in tissue relaxation times across GA and within the three main classes modeled in the simulated images, as well as to account for measurement uncertainty in previous work. In practice, the T2 value was computed as a random integer number in the range of the mean reference T2 relaxation time plus or minus its standard deviation for both GM and WM. The T1 value was computed as the difference (if the T2 relaxation time was set smaller than its reference value), respectively the sum (if the T2 relaxation time was set larger than its reference value) between the ratio between the estimated and the reference T2 values multiplied by the reference T1 value, and a random integer number between 0 and the standard deviation of the reference T1 value.
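The sampling rule described above can be sketched in Python as below, as a stand-in for the spreadsheet computation. The sketch rests on assumptions: the reference values and standard deviations passed in the example are placeholders rather than the values of Table 2, the case where the drawn T2 equals its reference is not covered by the description and is handled here without offset, and the function names are hypothetical.

```python
import random


def sample_t2_t1(t2_ref, t2_sd, t1_ref, t1_sd, rng=random):
    """Draw one (T2, T1) pair in ms for GM or WM, following the described rule."""
    # T2: random integer within the reference value plus or minus its std.
    t2 = rng.randint(t2_ref - t2_sd, t2_ref + t2_sd)
    # T1: reference T1 scaled by the T2 ratio, shifted by a random integer offset.
    offset = rng.randint(0, t1_sd)
    scaled_t1 = t2 / t2_ref * t1_ref
    if t2 < t2_ref:
        return t2, scaled_t1 - offset
    if t2 > t2_ref:
        return t2, scaled_t1 + offset
    return t2, scaled_t1  # assumption: no offset when T2 equals its reference


def t1_at_3t(t1_at_1p5t, tissue):
    """Extend a 1.5T reference T1 to 3T: about +25% in GM, +10% in WM and CSF."""
    increase = {"GM": 0.25, "WM": 0.10, "CSF": 0.10}[tissue]
    return t1_at_1p5t * (1.0 + increase)


if __name__ == "__main__":
    rng = random.Random(0)  # seeded for repeatability
    # Placeholder reference values, not taken from Table 2.
    print(sample_t2_t1(200, 20, 2000, 100, rng))
    print(t1_at_3t(2000.0, "WM"))
```

T2 is left unchanged between 1.5T and 3T, in line with the assumption stated in the text.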

Table 2 Range of fetal brain tissue properties simulated in this study at 1.5T and 3T.

Data Records

The simulated LR series of the 78 subjects included in this study have been released on Zenodo54 together with the corresponding brain masks and SR reconstructions. The LR multi-tissue annotations of the 29 subjects generated for the qualitative evaluation as well as the resampled parcellations of the SR-reconstructed subjects used as training set in the reuse potential experiment are also included. They are provided as compressed NIfTI images and organized in the Brain Imaging Data Structure (BIDS) format55. The MATLAB (MathWorks, R2019a) and Python scripts used to generate this dataset have also been made publicly available (FaBiAN v2.040). Moreover, the implementation of FaBiAN v1.2 has been slightly modified to be containerized into a Docker image (https://hub.docker.com/r/petermcgor/fabian-docker), and therefore facilitate the simulation of additional T2w MR images of the developing fetal brain from HR annotations. With a view to broadly disseminating this tool to the community, a similar Docker image for FaBiAN v2.0 is available on Docker Hub56.

Technical Validation

Qualitative evaluation of the realism of the simulated low-resolution series

Figure 4 illustrates how locally adjusting T1 and T2 values within WM tissues enhances the resemblance of simulated FSE images with real clinical MR acquisitions of the fetal brain across maturation. It shows a comparison between synthetic LR series generated using both versions of FaBiAN from the segmented, SR-reconstructed fetal brain volumes of three representative subjects and the corresponding clinical images acquired at CHUV at 1.5T, in both healthy and pathological fetuses of 21, 31, and 33 weeks of GA respectively. The modulation of T1 and T2 properties according to the water/myelin content of WM tissues results in local variations of the MR contrast that better depict the complexity of WM and the underlying maturation processes. For instance, the migration of neurons from the germinal matrix to the cortex during the first two trimesters of gestation is reflected by a multilayer aspect of WM, which is finely captured in synthetic images generated using FaBiAN v2.0. In contrast, WM tissues appear highly homogeneous in the LR series simulated by FaBiAN v1.2 (see Fig. 4, top row, in the coronal orientation).

Fig. 4

Comparison between synthetic low-resolution series generated in the different orthogonal orientations using FaBiAN v1.2 (left column) and v2.0 (middle column) and three representative clinical fast spin echo acquisitions (right column) at 1.5T in both healthy and pathological fetuses of 21 (top row), 31 (middle row), and 33 weeks (bottom row) of gestational age (GA) respectively scanned at CHUV. Red arrows highlight areas where white matter heterogeneities result in MR contrast variations that are representative of the ongoing maturation processes. These white matter changes are well captured by FaBiAN v2.0 simulations, which therefore look more realistic and closer to clinical acquisitions compared to images generated by FaBiAN v1.2.

Experimental design

A pediatric neuroradiologist and a neuroradiologist from CHUV, with 17 and 14 years of experience respectively, provided an independent, blind, qualitative assessment of the realism of the synthetic T2w LR series of the fetal brain generated for 29 subjects (16 neurotypical and 13 pathological) in the GA range of 20.1 to 34.8 weeks (27.0 ± 4.20 weeks) by both the original version of the software (FaBiAN v1.2)38 and our new numerical WM model (FaBiAN v2.0)40. Young fetuses diagnosed with spina bifida prior to surgery were excluded from this evaluation as the absence of extra-axial CSF substantially alters the quality of the clinical acquisitions and subsequent simulations. Simulations were run at 1.5T (25 subjects) and 3T (four subjects), the main difference being the higher spatial resolution and higher SNR that can be reached at 3T. High-quality LR series were simulated with high SNR and without motion so as not to bias the evaluation of the radiologists with corrupted images. Visual inspection and navigation throughout the different series of the fetal brain were made possible via ITK-SNAP57, displaying simulations in the same orientation plane but generated by both versions of the software in two different windows. Both experienced radiologists were asked to evaluate: i) which of the two series appears the most realistic (Experiment 1), ii) how realistic and close to MR images acquired in clinical routine each of these simulated series looks, taken independently, with a special focus on WM appearance, based on a five-point Likert scale (from 1: poorly realistic, to 5: highly realistic) (Experiment 2).

Experiment 1: Comparison between FaBiAN v1.2 and FaBiAN v2.0

This experiment aims to determine which version of our numerical phantom provides the most realistic LR series of the fetal brain throughout maturation in every orthogonal orientation. Figure 5 (top row) illustrates how often each expert rated each version of FaBiAN as providing the more realistic LR series across development. Both experts consistently found that FaBiAN v2.0 generated more realistic LR series over gestation than FaBiAN v1.2: for all subjects according to our neuroradiologist, and in more than 96.5% of the cases according to our pediatric neuroradiologist. Interestingly, the three out of 87 series generated by FaBiAN v1.2 that the latter assessed as more realistic than the ones simulated by FaBiAN v2.0 corresponded to images of young fetuses (i.e., below 26 weeks of GA). At this early stage of development, the differences in the simulated images between both versions of FaBiAN may be more subtle to distinguish.

Fig. 5

Qualitative evaluation of the realism of the low-resolution (LR) series simulated using FaBiAN v1.2 and FaBiAN v2.0 for 29 subjects, in the three orthogonal orientations, by two independent experts in pediatric neuroradiology (Expert #1) and in neuroradiology (Expert #2). Experiment 1 (top row) consisted in comparing images generated for every subject by both implementations of the simulation framework. The cumulative count of the most realistic synthetic LR series is represented over the gestational age, according to each expert, showing a net preference for FaBiAN v2.0 (between 96% and 100% of the images evaluated as the most realistic by Expert 1 and Expert 2 respectively) over FaBiAN v1.2 throughout fetal brain maturation. Experiment 2 (bottom row) further aimed at evaluating the realism of the simulated LR series independently. The violin plots show the distribution of the ratings (between 1-poorly realistic and 5-highly realistic) of every image by each expert. Overall, the LR series simulated by FaBiAN v2.0 were rated as more realistic than the ones generated using FaBiAN v1.2.

Experiment 2: Independent assessment

This experiment aims to independently evaluate the realism of the LR series of the developing fetal brain simulated by FaBiAN v1.2 and FaBiAN v2.0. The violin plots in Fig. 5 (bottom row) display the distribution of ratings by each expert of all the synthetic LR series generated using both phantom versions. They show variability in the perceived realism of the simulated images, although LR series generated by FaBiAN v2.0 were evaluated as the most realistic overall. These independent ratings of all the synthetic images provided, without comparison of both versions of the phantom to each other, further support the findings inferred from the first experiment, namely that our latest developments (FaBiAN v2.0) make it possible to generate even more realistic images than the original prototype (FaBiAN v1.2) by capturing local changes within WM tissues throughout maturation.

Both of these experiments validate that the proposed dataset generated using FaBiAN v2.0 provides highly realistic T2w MR images of the brain throughout in utero development, and not only in older fetuses where maturation processes are further advanced.

Quantitative comparison between simulated and original clinical images

This section aims at quantifying the similarity between simulated (FaBiAN v1.2, respectively FaBiAN v2.0) and real clinical data.

Since SR reconstructions are of great value for further quantitative analysis of the developing fetal brain, and because manual annotations of these data are available, we compare the similarity between SR reconstructions from simulated and real cases (n=29 subjects), with a special focus on WM tissues.

Partially overlapping orthogonal LR series (two in each acquisition plane) of the same 29 subjects inspected during qualitative evaluation by the radiologists were generated and combined to reconstruct an SR volume of the fetal brain of the corresponding clinical cases using two different pipelines (the Image Registration Toolkit SVRTK20, under Licence from Ixico Ltd., or MIALSRTK21), as in16. Overall, 21 subjects (nine neurotypical and 12 pathological, in the GA range of 20.1 to 34.8 weeks (26.4 ± 4.06 weeks)) were reconstructed using SVRTK, and eight subjects (seven neurotypical and one pathological, in the GA range of 21.2 to 33.4 weeks (28.5 ± 4.2 weeks)) using MIALSRTK. For every subject, mutual information (MI) was computed between the SR reconstructions from synthetic data (FaBiAN v1.2 against FaBiAN v2.0), and between the SR reconstructions from simulated (FaBiAN v1.2 or FaBiAN v2.0, respectively) and real clinical data, as a measure of similarity between them, with a focus on the WM mask. The higher the MI between two images, the closer they are.
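For readers unfamiliar with the metric, a histogram-based estimate of MI restricted to a mask can be sketched as follows. This is a generic illustration, not the authors' evaluation code: the number of bins, the natural-log units, and the function name `mutual_information` are assumptions, and the images are supposed to be already aligned on the same voxel grid.

```python
import numpy as np


def mutual_information(image_a, image_b, mask, bins=32):
    """Histogram-based MI (in nats) between two aligned images inside a mask."""
    mask = np.asarray(mask, dtype=bool)
    a = np.asarray(image_a, dtype=float)[mask]
    b = np.asarray(image_b, dtype=float)[mask]
    # Joint intensity histogram, normalized to a joint probability table.
    joint, _, _ = np.histogram2d(a, b, bins=bins)
    p_ab = joint / joint.sum()
    p_a = p_ab.sum(axis=1, keepdims=True)   # marginal of image_a, shape (bins, 1)
    p_b = p_ab.sum(axis=0, keepdims=True)   # marginal of image_b, shape (1, bins)
    nonzero = p_ab > 0                      # skip empty cells to avoid log(0)
    independent = (p_a @ p_b)[nonzero]      # product of the marginals
    return float(np.sum(p_ab[nonzero] * np.log(p_ab[nonzero] / independent)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    volume = rng.normal(size=(8, 8, 8))      # toy volume, not real data
    unrelated = rng.normal(size=(8, 8, 8))
    wm_mask = np.ones((8, 8, 8), dtype=bool)
    print(mutual_information(volume, volume, wm_mask, bins=8))
    print(mutual_information(volume, unrelated, wm_mask, bins=8))
```

In this toy example the MI of a volume with itself is larger than its MI with an unrelated volume, which matches the reading that a higher MI indicates closer images.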

Figure 6 quantifies to which extent SR reconstructions from simulated, respectively from real clinical acquisitions, are close to each other for the same subject and across GA. As shown by the left panel (blue dots), the differences between SR reconstructions from images generated by FaBiAN v1.2 and FaBiAN v2.0 are more pronounced in older fetuses, leading, for the same subject, to a decrease in MI between SR reconstructions from data simulated by FaBiAN v1.2 and FaBiAN v2.0, respectively, with GA. This trend can be explained by the growing proportion of WM in relation to other brain tissues over development, knowing that FaBiAN v2.0 accounts for local WM changes throughout brain maturation while WM appears rather uniform in simulations from the first prototype. The right panel of Fig. 6 further supports these findings by highlighting an increase in MI between SR reconstructions from data simulated by FaBiAN v2.0 and the corresponding clinical subjects across development (green dots) whereas, on the contrary, MI between SR reconstructions from images generated by FaBiAN v1.2 and real subjects decreases with GA (red dots). Linear regressions on these data (see solid lines) corroborate that SR reconstructions of subjects older than 23 weeks of GA simulated by FaBiAN v2.0 show a greater similarity with real clinical data than their twins reconstructed from images generated by FaBiAN v1.2, especially as the fetus gets older. We, therefore, assume that local WM changes implemented by FaBiAN v2.0 throughout brain maturation enhance the realism of the simulated images by accounting for spatial and temporal heterogeneities within WM tissues.

Fig. 6

Mutual information (MI) between SR reconstructions from simulated data (FaBiAN v2.0 versus v1.2), and between SR reconstructions from simulated data (FaBiAN v1.2 or v2.0, respectively) and the corresponding SR reconstruction obtained from real clinical acquisitions across gestational age (GA) for the 29 subjects included in the qualitative evaluation by the radiologists, with a focus on the white matter mask. MI between SR reconstructions from simulated data (FaBiAN v2.0 versus v1.2) decreases with GA, showing a higher difference between the images generated by FaBiAN v2.0 and v1.2 at later GA, in other words when white matter changes linked to brain maturation become more obvious. The right panel of the Figure shows an enlargement of the plots that compare SR reconstructions from simulated and clinical data. SR reconstructions from data generated by FaBiAN v2.0 lead to an increase in MI across GA compared to images simulated using FaBiAN v1.2, thus demonstrating a higher similarity between the SR reconstructions from clinical subjects and data generated by FaBiAN v2.0 compared to images simulated using FaBiAN v1.2 as the fetus gets older.

Reuse Potential

Leveraging in silico data for automated fetal brain multi-tissue segmentation with deep learning

We extended the technical validation of our proposed dataset as a valuable complement to scarce, annotated clinical data, in the context of fetal brain multi-tissue segmentation.

Methodology

Clinical data

Eighty-eight SR-reconstructed clinical subjects from the FeTA dataset16 were complemented with corresponding refined HR label maps41,42 that delineate eight classes: WM (excluding the corpus callosum), cortical GM, deep GM, extra-axial CSF, intra-axial CSF, brainstem, cerebellum, and corpus callosum. The dataset was split into training and test samples, including 70 (healthy: 27, pathological: 43) and 18 subjects (healthy: 7, pathological: 11), respectively, in the GA range of 20.0 to 34.8 weeks (27.1 ± 3.62 weeks) and 20.9 to 33.1 weeks (26.9 ± 3.53 weeks). Subjects were selected on a stratified random basis with disease condition and GA as factors.
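A stratified random split of this kind can be sketched as follows. This is only an illustration under stated assumptions: the function name `stratified_split`, the 4-week GA bins, the per-stratum rounding, and the random seed are hypothetical choices, and the sketch is not guaranteed to reproduce the exact 70/18 partition reported above.

```python
import random
from collections import defaultdict


def stratified_split(subjects, test_fraction=18 / 88, ga_bin_weeks=4.0, seed=0):
    """Split subjects into train/test lists, stratified by condition and GA bin.

    `subjects` is a list of (subject_id, condition, ga_weeks) tuples
    (hypothetical layout). GA is discretized into bins of `ga_bin_weeks`.
    """
    strata = defaultdict(list)
    for subject_id, condition, ga_weeks in subjects:
        strata[(condition, int(ga_weeks // ga_bin_weeks))].append(subject_id)

    rng = random.Random(seed)  # seeded for a reproducible split
    train, test = [], []
    for key in sorted(strata):
        ids = strata[key]
        rng.shuffle(ids)
        # Each stratum contributes roughly the same share to the test set.
        n_test = round(len(ids) * test_fraction)
        test.extend(ids[:n_test])
        train.extend(ids[n_test:])
    return train, test
```

Because the test share is rounded within each stratum, the total test-set size can deviate slightly from the target; a final adjustment step would be needed to enforce an exact count.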

In silico dataset

We generated realistic T2w MR images from the refined, clinical HR annotations based on the sequence parameters extracted from multiple LR series acquired in 15 representative subjects28, using our extended framework (FaBiAN v2.0) to account for WM heterogeneity and changes across maturation. For every subject, six partially overlapping 2D LR series were simulated in the three orthogonal orientations without motion and combined to reconstruct a single HR volume of the fetal brain. To match the corresponding clinical subject, reconstruction relied on either of two SR pipelines (MIALSRTK v2.1.0, docker container58, or SVRTK20), both with their default parameters.

Experimental design

The primary objective of this study is to assess the practicality and efficacy of using simulated data to train DL models for segmentation tasks, prioritizing practical applications over the pursuit of a marginal, residual performance gain. To this end, we employed the widely accessible nnU-Net implementation59, renowned for its robustness in similar tasks, as demonstrated by its strong performance in the FeTA Challenge 202118. To evaluate the impact of simulated data, we conducted a comparative analysis with the same nnU-Net model while varying the input data. We first trained a baseline model using 70 out of 88 subjects from the real dataset, including manual annotations (ground truth). We then trained a series of models using varying proportions of SR images reconstructed from synthetic LR series, along with the corresponding ground truth annotations for each subject, so as to match the size and characteristics (e.g., individual anatomical features, tissue distribution, common acquisition parameters) of the training dataset. Specifically, we trained models with zero (i.e., only SR reconstructions from synthetic data), six, nine, 12, 15, and 18 clinical SR images, complemented with simulated data to reach a total of 70 training subjects. In line with nnU-Net’s standard preprocessing, we performed skull stripping (SynthStrip, FreeSurfer60) on the images before feeding them to the different models. For all models, we employed the 3D full-resolution U-Net architecture, which emerged as optimal among the available options (2D, 3D low-resolution, 3D full-resolution, or their combinations) in the cross-validation of the baseline model.
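The composition of the successive training sets can be sketched as below. The sketch assumes that each of the 70 training subjects contributes either its clinical SR image or its simulated counterpart, and that the clinical subset is drawn at random; neither detail is specified above, and the function name `build_training_set` and the subject identifiers are hypothetical.

```python
import random


def build_training_set(subject_ids, n_real, seed=0):
    """Label each training subject as 'clinical' or 'simulated'.

    Exactly `n_real` subjects keep their clinical SR image; all others are
    represented by their simulated SR reconstruction (assumed design).
    """
    if not 0 <= n_real <= len(subject_ids):
        raise ValueError("n_real must lie between 0 and the number of subjects")
    rng = random.Random(seed)  # seeded for reproducibility
    real = set(rng.sample(list(subject_ids), n_real))
    return [(sid, "clinical" if sid in real else "simulated") for sid in subject_ids]
```

Calling the function with `n_real` set to 0, 6, 9, 12, 15, 18 and 70 in turn reproduces the seven configurations compared here, from the fully synthetic model to the baseline trained on real images only.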

Evaluation

The performance of fetal brain tissue segmentation models was assessed based on the Dice similarity coefficient (DSC61,62), which quantifies the overlap between the predicted segmentation and the ground truth manual tissue annotations. The performance metrics are reported for the test set (18 subjects) and are denoted as \(\widetilde{DSC_{n}^{tissue}}\), where n indicates the number of real (clinical) subjects used for training, the remaining training subjects being synthetic, and tissue is one of the following categories: WM, intra-axial CSF, cerebellum, extra-axial CSF, cortical GM, deep GM, brainstem, and corpus callosum. Statistical comparisons between the different models and the baseline were performed using the Wilcoxon rank sum test, both for individual fetal brain tissues and overall, with Bonferroni correction applied to adjust for multiple comparisons. Statistical significance was set at p < 0.05.
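For reference, the DSC between a predicted region P and a ground truth region G is 2|P ∩ G| / (|P| + |G|). The sketch below computes it per tissue label and applies a Bonferroni adjustment to a list of p-values. It is an illustration only: the function names, the flattened label lists, and the integer label codes are assumptions, and the p-values themselves would come from a separate Wilcoxon rank sum test that is not implemented here.

```python
def dice_per_label(pred, truth, labels):
    """Dice similarity coefficient for each label in `labels`.

    `pred` and `truth` are flat sequences of integer tissue labels of equal
    length (hypothetical layout). Returns NaN for a label absent from both.
    """
    scores = {}
    for label in labels:
        p = [v == label for v in pred]
        t = [v == label for v in truth]
        intersection = sum(1 for a, b in zip(p, t) if a and b)
        denominator = sum(p) + sum(t)
        scores[label] = 2 * intersection / denominator if denominator else float("nan")
    return scores


def bonferroni(p_values):
    """Bonferroni-adjusted p-values: multiply by the number of tests, cap at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]
```

An adjusted p-value below 0.05 would then be read as a statistically significant difference between a given model and the baseline.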

Feasibility study

Our experiment evaluates whether automated segmentation of the developing fetal brain can be achieved with good performance without employing real data.

The boxplots in Fig. 7a show that a model trained exclusively on synthetic images can achieve an acceptable overall performance in terms of tissue segmentation (\(\widetilde{DSC_{0}}=0.80_{0.21}\)), suggesting that simulated data contain sufficient semantics for the nnU-Net model to provide approximate annotations that could then be refined by a radiologist.

Fig. 7

Dice similarity coefficient (DSC) scores for 18 test subjects during multi-tissue segmentation using nnU-Net with synthetic image training. Model performance is shown as the number of synthetic images decreases from 70 to zero: (a) overall performance for all tissues and subjects, (b) performance per tissue across all subjects.

Notably, the DSC substantially improves with the inclusion of a small number of real images in the training set (e.g., \(\widetilde{DSC_{6}}=0.85_{0.14}\), \(\widetilde{DSC_{12}}=0.89_{0.12}\)), approaching the performance of a model trained solely on real images (\(\widetilde{DSC_{70}}=0.92_{0.11}\)), with these improvements being statistically significant (p < 0.05). This observation highlights the potential for reducing the reliance on real data for training purposes, which is especially crucial in very specific or sensitive populations for which data are scarce.

Furthermore, the differences in model performance are not consistent across all tissues, as depicted in Fig. 7b. The most significant differences (p < 0.05) between models are primarily attributed to tissues that include complex structures, or to small regions of interest like the corpus callosum, which are challenging to segment accurately (with or without synthetic subjects). Conversely, all models exhibit similar performance when segmenting WM areas (excluding the corpus callosum; \(\widetilde{DSC_{0}^{WM}}=0.89_{0.03}\) or \(\widetilde{DSC_{6}^{WM}}=0.92_{0.03}\) vs. \(\widetilde{DSC_{70}^{WM}}=0.95_{0.04}\)), which have been a key focus of this work and of the development of FaBiAN v2.0. This consistent reliability in the segmentation performance of WM further supports the potential for re-use of the dataset showcased in this work, as well as the validity of our new model of the developing fetal brain.

These findings provide valuable insights into the potential for using synthetic data to enhance the training of deep learning models for fetal brain tissue segmentation, particularly in scenarios where access to real data is limited or challenging.