Main

Voxel-based morphometry (VBM) is a widely used analytical approach in neuroimaging research that aims to measure differences in the local concentration of brain tissue across multiple brain magnetic resonance imaging (MRI) scans and to investigate their association with biological and psychometric variables1,2. Comparing neuroimaging data is challenging, because the intensity of MRI is not standardized, and brain structures differ across individuals. Standard VBM preprocessing addresses this by segmenting MRI scans into tissue classes and spatially normalizing the resulting tissue map to a template3,4,5,6,7,8. Finally, generalized linear models (GLMs) are fit for each voxel, modeling associations between spatially normalized tissue probabilities and the considered biological (for instance, age and sex) or psychometric variables (for instance, symptom severity or cognitive performance scores). If the GLM reveals a significant association, the corresponding voxel may be considered a region of interest (ROI), indicating a potential neural correlate of the variables under study.
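The voxel-wise statistics described above can be sketched as an ordinary least-squares GLM fit independently at every voxel. The snippet below is a minimal illustration with synthetic data; the variable names and the simulated age effect are ours and not taken from any study.

```python
import numpy as np

def voxelwise_glm_t(tissue, design, contrast_idx):
    """Fit an ordinary least-squares GLM independently at every voxel and
    return the t-statistic of one regressor.

    tissue       : (n_subjects, n_voxels) spatially normalized tissue values
    design       : (n_subjects, n_regressors) design matrix incl. intercept
    contrast_idx : column of `design` whose association is tested
    """
    n, p = design.shape
    beta, *_ = np.linalg.lstsq(design, tissue, rcond=None)
    residuals = tissue - design @ beta
    sigma2 = (residuals ** 2).sum(axis=0) / (n - p)   # per-voxel residual variance
    xtx_inv = np.linalg.inv(design.T @ design)
    se = np.sqrt(sigma2 * xtx_inv[contrast_idx, contrast_idx])
    return beta[contrast_idx] / se                    # one t-value per voxel

# Synthetic example: GM probability decreasing with age at 500 voxels
rng = np.random.default_rng(0)
age = rng.uniform(20, 70, size=100)
design = np.column_stack([np.ones(100), age])
tissue = 0.5 - 0.003 * age[:, None] + 0.01 * rng.standard_normal((100, 500))
t_map = voxelwise_glm_t(tissue, design, contrast_idx=1)
```

Voxels whose t-value survives multiple-comparison correction would form the ROIs mentioned above.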

Given that the effect sizes in VBM-based statistical analyses are typically small9, MRI datasets with thousands of participants are necessary to obtain accurate measurements10. To meet this demand, large-scale datasets have expanded to include more than 40,000 participants11, resulting in datasets that exceed 100,000 MRIs. However, preprocessing these large datasets with existing toolboxes, such as CAT127, can take weeks or even months on standard hardware, delaying scientific progress. Developing a more computationally efficient VBM preprocessing pipeline could alleviate these processing bottlenecks, allowing researchers to focus on the conceptual aspects of their studies and accelerating scientific discovery. Therefore, creating a faster VBM preprocessing pipeline is a critical step forward for structural neuroimaging research.

In recent years, deep learning has emerged as a highly effective approach for various tasks in medical image analysis12, providing state-of-the-art performance in a wide range of image-related applications, such as semantic segmentation13. The neuroimaging community has followed this trend, resulting in neural network-based tools for brain extraction14,15,16, tissue segmentation17,18, registration19,20,21,22 and other neuroimaging-specific tasks23.

However, the adoption of neural network-based methods for preprocessing still lags behind classical toolboxes, such as CAT127, SPM3 or FreeSurfer6, owing to two main reasons. First, deep learning tools often perform poorly in a realistic setting where they are applied to MRIs from scanner sites unseen during model training14. In line with ref. 24, this problem can be addressed by increasing the number of scanner sites25 and the extensive use of data augmentation15,20,24,26. Second, neural network-based tools are often specialized for only one processing step, while CAT12, SPM and FreeSurfer provide full processing pipelines for VBM and other methods such as surface-based morphometry (SBM). Tools, such as SynthMorph24, SynthStrip15 and EasyReg21, attempt to resolve this by being integrated into the FreeSurfer toolbox, serving as alternatives for parts of its preprocessing pipeline. However, to the best of our knowledge, there is no toolbox that has been developed from the ground up to fully harness the potential of neural networks across all preprocessing steps needed for VBM analysis.

We present deepmriprep, a preprocessing pipeline for VBM analysis of structural MRI data that is built to fully leverage deep learning. deepmriprep uses neural networks for the major VBM preprocessing steps: brain extraction, tissue segmentation and spatial registration with a template. Brain extraction is performed by deepbet14, the most accurate existing method to remove nonbrain voxels in T1-weighted (T1w) MRIs of healthy adults. To encompass the full VBM preprocessing, in this work, we additionally develop neural networks for tissue segmentation and registration. For tissue segmentation, we use a patch-based three-dimensional (3D) UNet approach, inspired by Isensee et al.23, which also exploits neuroanatomical properties, such as hemispheric symmetry. Nonlinear image registration is performed using a custom variant of SYMNet22, which uses a 3D UNet in conjunction with DARTEL shooting27 to predict smooth and invertible deformation fields.

The neural networks are trained on 685 MRIs compiled from 137 OpenNeuro datasets in a grouped cross-validation to ensure realistic validation performance. The worst predictions are visually inspected to identify potential weaknesses. In addition, deepmriprep is tested on 18 OpenNeuro datasets with pediatric healthy controls (HCs) and a dataset with synthetic atrophy and synthetic image artifacts. To investigate the effect of preprocessing on VBM-based statistical analyses, the VBM pipelines of deepmriprep and CAT12 are applied to 4,017 participants from three cohorts. In subsequent VBM analyses, associations with biological and psychometric variables are investigated, and the correlation of the resulting t-maps based on deepmriprep and CAT12 is analyzed. Finally, the correlations between deepmriprep- and CAT12-based tissue volume measurements are investigated in an ROI-based setting using the LPBA40 atlas28. In conclusion, our results indicate that deepmriprep is 37 times faster than CAT12 while achieving comparable accuracy in the individual preprocessing steps and strongly correlated results in the final VBM analysis.


Results

Tissue segmentation

OpenNeuro-HD

We first evaluated deepmriprep and CAT12 on 685 high-resolution adult MRIs with strict quality control (OpenNeuro-HD) using cross-dataset validation. deepmriprep demonstrated robust tissue segmentation, achieving a median Dice score DSCmedian of 95.0 across validation MRIs (Fig. 1a, Supplementary Fig. 1, Supplementary Table 1 (left) and Source Data Fig. 1). This high level of agreement with the ground truth—that is, CAT12 tissue segmentation maps derived from high-resolution MRIs (see ‘Preprocessing’ section in the Supplementary Information)—is an improvement compared with CAT12 (in original MRI resolution), which achieves a median Dice score DSCmedian of 93.1.

Fig. 1: Dice scores of tissue segmentations of deepmriprep and CAT12 in OpenNeuro-HD with worst-case segmentation examples.
figure 1

a, Dice scores of deepmriprep and CAT12 with respect to the CSF, GM, WM and foreground (mean of CSF, GM and WM) across 685 MRI scans from OpenNeuro-HD. Violin plot density traces terminate exactly at the observed minima and maxima, and the superimposed box plots represent 25th percentile (lower), median (center) and 75th percentile (upper) with whiskers extending to data points within 1.5× interquartile range (IQR) of the quartiles. b, Sagittal slice of the T1w MRI, the reference tissue map (CAT12 at 0.5 mm3 resolution) and the predicted tissue segmentation map in the sample that resulted in the lowest foreground Dice score for deepmriprep (first row) and CAT12 (second row).

Source data

The high agreement of deepmriprep’s tissue segmentation with the ground truth in terms of the Dice score is confirmed by the foreground probabilistic Dice score (pDSC 84.7) and Jaccard score (JSC 90.6) shown in Supplementary Figs. 2 and 3 and Supplementary Tables 2 and 3. The segmentation of cerebrospinal fluid (CSF) resulted in the lowest median Dice scores \({\mathrm{DSC}}_{\mathrm{median}}^{\mathrm{CSF}}\) of 91.1 for deepmriprep and 85.6 for CAT12 (Fig. 2). Furthermore, the CSF Dice scores showed the strongest outliers across all 685 validation MRIs, with minimal Dice scores \({\mathrm{DSC}}_{\min }^{\mathrm{CSF}}\) of 73.6 for deepmriprep and 62.9 for CAT12. In the tissue maps that resulted in the minimal foreground metrics \({\mathrm{DSC}}_{\min }\), \({\mathrm{pDSC}}_{\min }\) and \({\mathrm{JSC}}_{\min }\) for each method (Fig. 1b and Supplementary Figs. 2 and 3), deepmriprep and CAT12 produced a thicker outer layer of CSF than the ground truth. With respect to gray matter (GM) and white matter (WM), the tissue maps of both methods did not show notable visual differences compared with the reference maps.
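The overlap metrics reported here can be sketched as follows. The hard Dice and Jaccard scores operate on binarized maps; the probabilistic Dice shown is one common definition for continuous maps and may differ in detail from the variant used in this work.

```python
import numpy as np

def dice(pred, ref):
    """Hard Dice score (DSC) between two binary masks."""
    inter = np.logical_and(pred, ref).sum()
    return 2 * inter / (pred.sum() + ref.sum())

def jaccard(pred, ref):
    """Jaccard score (JSC); relates to Dice via JSC = DSC / (2 - DSC)."""
    return np.logical_and(pred, ref).sum() / np.logical_or(pred, ref).sum()

def prob_dice(p, q):
    """One common 'probabilistic Dice' (pDSC) for continuous maps in [0, 1]."""
    return 2 * (p * q).sum() / ((p ** 2).sum() + (q ** 2).sum())

# Per-class evaluation: binarize CSF/GM/WM probability maps via arg-max
rng = np.random.default_rng(1)
pred_probs = rng.dirichlet([1, 1, 1], size=(64, 64))  # toy 2D stand-in for 3D maps
ref_probs = pred_probs.copy()
pred_cls, ref_cls = pred_probs.argmax(-1), ref_probs.argmax(-1)
scores = {c: dice(pred_cls == i, ref_cls == i)
          for i, c in enumerate(["CSF", "GM", "WM"])}
```

The foreground score reported in the figures corresponds to the mean of the three per-class scores.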

Fig. 2: Dice scores of tissue segmentations of deepmriprep and CAT12 in OpenNeuro-Total with worst-case segmentation examples.
figure 2

a, Dice scores between deepmriprep and CAT12 across all 8,279 MRI scans from OpenNeuro with respect to the CSF, GM, WM and foreground (mean of CSF, GM and WM). Violin plot density traces terminate exactly at the observed minima and maxima, and the superimposed box plots represent 25th percentile (lower), median (center) and 75th percentile (upper) with whiskers extending to data points within 1.5× IQR of the quartiles. b, MRI input (first row) and tissue segmentation map (deepmriprep: second row, CAT12: third row) which resulted in the 0.0, 0.1, 0.2, 0.3 and 0.4 percentile foreground Dice scores across all 8,279 MRI scans.

Source data

OpenNeuro-Total

To test robustness in a realistic, heterogeneous setting, we compared deepmriprep and CAT12 on 8,279 scans from 208 datasets with only minimal quality control (OpenNeuro-Total). Despite this challenging setting, the median Dice score DSCmedian of 93.1 between deepmriprep and CAT12 tissue maps—not to be confused with Dice scores of each tool’s output compared with ground truths—showed high agreement for most of the respective tissue maps (Supplementary Fig. 4a). Again, GM and WM segmentation was most consistent with Dice scores of 96.0 for \({\mathrm{DSC}}_{\mathrm{median}}^{\mathrm{GM}}\) and 97.4 for \({\mathrm{DSC}}_{\mathrm{median}}^{\mathrm{WM}}\), while CSF segmentation resulted in a lower median Dice score of 85.9 for \({\mathrm{DSC}}_{\mathrm{median}}^{\mathrm{CSF}}\).

Despite the absence of ground truth, visually comparing tissue maps with low Dice scores—that is, low agreement between deepmriprep and CAT12—enables a qualitative assessment of each tool’s robustness.

Throughout the tissue maps with the 0.0th, 0.1th, 0.2th, 0.3th and 0.4th percentile foreground Dice scores DSC (Supplementary Fig. 4b), deepmriprep showed reasonable results with minor artifacts, while CAT12 was prone to errors. In the 0.0th and 0.1th percentile tissue maps, CAT12 produced unusable results with respective foreground Dice scores DSC of 0.0 and 1.9, compared with the reasonable tissue maps created by deepmriprep. In the 0.2th, 0.3th and 0.4th percentile tissue maps, CAT12 produced less detailed tissue maps than deepmriprep and misclassified tissue at the edge of the brain as background. deepmriprep properly classified the outer edge tissue, but misclassified areas of CSF as background in the 0.4th percentile tissue map. The same characteristic sources of errors could be found across the 16 tissue maps with the lowest agreement between deepmriprep and CAT12 (Supplementary Fig. 5), again measured by DSC. Finally, due to an error, CAT12 did not produce any tissue map for one MRI scan, while deepmriprep processed all 8,279 MRIs without any errors.

Image registration

The registration of tissue probability maps with deepmriprep resulted in a median mean squared error MSEmedian of 9.9 × 10−3 and a median linear elasticity LEmedian of 250 during cross-dataset validation (Supplementary Figs. 6, left, and 7). These metrics indicate that deepmriprep performs on par with CAT12 (MSEmedian 9.2 × 10−3, LEmedian 240). While CAT12 showed slightly better median metrics, the supervised SYMNet used within deepmriprep resulted in a smaller maximal linear elasticity across MRIs, with an \({\mathrm{LE}}_{\max }\) of 366 (CAT12: \({\mathrm{LE}}_{\max }\,386\)), and a smaller 95th percentile LE95p of 280 (CAT12: LE95p 283). This favorable linear elasticity indicates improved regularity of the deformation field for challenging probability maps—that is, maps that exhibit large voxel-wise differences from the template.
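The two registration metrics can be sketched as follows. The MSE is the voxel-wise mean squared difference to the template; for the linear elasticity, the symmetrized-Jacobian (strain) formulation and the weights below are assumptions of this sketch, and the exact LE definition used in the paper may differ.

```python
import numpy as np

def registration_mse(warped, template):
    """Voxel-wise mean squared error between a warped map and the template."""
    return np.mean((warped - template) ** 2)

def linear_elastic_energy(disp, mu=1.0, lam=0.0):
    """Simplified linear-elastic penalty of a displacement field.

    disp holds (3, X, Y, Z) displacements; the energy sums the squared
    symmetrized Jacobian (strain tensor) plus a divergence term.
    """
    grads = np.stack([np.stack(np.gradient(disp[i]), 0) for i in range(3)])
    strain = 0.5 * (grads + grads.transpose(1, 0, 2, 3, 4))   # (i, j, X, Y, Z)
    divergence = grads[0, 0] + grads[1, 1] + grads[2, 2]
    return mu * (strain ** 2).sum() + 0.5 * lam * (divergence ** 2).sum()
```

A constant displacement (pure shift) has zero gradient and therefore zero elastic energy, which is why the metric captures only the irregularity of the deformation, not its magnitude.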

For both registration methods, the same probability map resulted in the largest voxel-wise mean squared error (MSE) after registration (Supplementary Fig. 6, right). Visual inspection of this warped probability map uncovers a small misalignment at the upper edge of the ventricles for deepmriprep, indicating less rigor in aligning the map with the template. Based on the absolute voxel-wise difference to the template, no apparent differences between deepmriprep and CAT12 could be found.

VBM analyses

VBM analysis results for GM based on deepmriprep and CAT12 across all three datasets (Marburg-Münster Affective Disorders Cohort Study (MACS), Münster Neuroimaging Cohort (MNC) and BiDirect) demonstrated high similarity (Supplementary Fig. 8 and Supplementary Data 1), with strong correlation between the respective t-maps (Supplementary Table 7). The correlation of t-maps remained strong even for the psychometric variables—years of education, HC versus major depressive disorder (HC versus MDD) and intelligence quotient (IQ)—despite their smaller effects compared with the biological variables, namely age, sex and body mass index (BMI). The analyses that pooled all three datasets resulted in correlation coefficients of r > 0.8, with BMI (r = 0.75) as the only exception. The equivalence between deepmriprep- and CAT12-based analysis outcomes is also supported by their similar maximal, absolute t-scores \(| t{| }_{\max }\) (Supplementary Tables 8 and 9), especially for age and HC versus MDD. deepmriprep resulted in a larger maximal, absolute t-score for sex and age and smaller maximal, absolute t-scores for IQ, years of education and BMI. The difference in maximal values and the reduced t-map correlation for BMI was primarily driven by a large cluster in the outer cerebellum, which appeared only in the CAT12-preprocessed data (Supplementary Fig. 8).
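The t-map comparison reduces to a Pearson correlation over voxels; a minimal sketch, assuming the maps are restricted to a common brain mask before correlating:

```python
import numpy as np

def tmap_correlation(t_a, t_b, mask=None):
    """Pearson correlation between two t-maps, optionally restricted to a
    brain mask so that background voxels do not inflate the coefficient."""
    if mask is not None:
        t_a, t_b = t_a[mask], t_b[mask]
    return np.corrcoef(t_a.ravel(), t_b.ravel())[0, 1]
```

Masking matters: the large zero background shared by both maps would otherwise push r toward 1 regardless of agreement inside the brain.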

The correlation coefficients of r > 0.8 also hold for the analyses using the MACS dataset and BiDirect dataset individually, again with BMI being the exception due to CAT12-based large clusters in the outer cerebellum (Supplementary Figs. 9 and 11). For analyses using only the MNC dataset, BMI-based t-maps strongly correlated with r = 0.83, while the sex-based t-maps resulted in a correlation coefficient of r = 0.72 (Supplementary Fig. 10).

The VBM results in WM (Supplementary Figs. 12–15) also exhibit strong correlations of r > 0.8 between most t-maps. Again, the BMI-based t-maps showed the lowest correlation caused by CAT12-based large clusters in the cerebellum. In addition, sex-based t-maps for the MNC dataset, BiDirect dataset and the pooled analysis showed lower correlation coefficients of 0.71, 0.71 and 0.73, respectively, and the HC versus MDD analysis for the BiDirect dataset resulted in a correlation coefficient of r = 0.74.

Finally, ROI-based GM volume measurements of deepmriprep and CAT12 exhibit strong correlations of r > 0.9 across all of the 29 ROIs of the LPBA40 atlas, with the only exception being the GM volumes of the brainstem region with a correlation of r = 0.75 (Supplementary Figs. 16–21). For WM, the caudate (r = 0.80), hippocampus (r = 0.81) and cerebellar lobe (r = 0.89) showed the lowest correlation coefficients, with all remaining regions resulting in correlations of r > 0.9.

Processing time

deepmriprep achieved the highest processing speed on both low-end and high-end hardware (Supplementary Fig. 22) across the 8,279 scans from 208 datasets with only minimal quality control (OpenNeuro-Total). On high-end hardware, deepmriprep took an average of 4.6 s per MRI using the graphics processing unit, whereas CAT12, parallelized across all 16 cores of the high-end processor, required an average of 173 s per MRI. On low-end hardware, deepmriprep and CAT12 took 209 s and 1,096 s per MRI, respectively.

Discussion

We present deepmriprep, a neural network-based pipeline specifically built for VBM preprocessing. deepmriprep is 37 times faster than CAT12, a leading toolbox known to be the more efficient alternative to FreeSurfer for SBM preprocessing. For preprocessing a large dataset containing 100,000 MRI scans such as the UK Biobank11, this translates into a reduction of computation time from 6 months to 5 days. Despite this speed, deepmriprep delivers equivalent or better accuracy in tissue segmentation and registration. Most importantly, statistical maps based on deepmriprep preprocessing show strong correlations with the respective CAT12-based VBM results.

It should be highlighted that assigning reliability to any statistical VBM map in the absence of a gold standard or ground truth is inherently difficult29, aggravated by the large amounts of data required to achieve sufficient statistical power10. Therefore, the differences between the results of the deepmriprep- and CAT12-based VBM analyses should be interpreted with caution. Consequently, more research is needed to advance VBM from a scientific tool for detecting group-level differences to a reliable clinical application for individual patient diagnosis.

Furthermore, it should be noted that CAT12 is not solely a VBM toolbox but also offers SBM, while the current version of deepmriprep is limited to VBM, a shortcoming that should be addressed in a future version of deepmriprep to gain adoption in the broad neuroimaging community. Moreover, the nonlinear registration could be improved by optimizing the affine matrix in conjunction with the warping field, thereby avoiding any potential biases of the initial affine registration. Finally, the training data quality could be improved to further increase deepmriprep’s accuracy. One promising, straightforward approach would be to use more training data with higher image quality, for instance, by using MRIs acquired with increased scan time, increased matrix size and reduced slice thickness. In addition, human expert annotations can be used to generate high-quality tissue segmentation maps and deformation fields, which can then be combined with lower-quality MRIs from the same session as input data. This would train the neural networks to predict high-quality tissue segmentation maps and deformation fields, even if the input MRI is of lower quality.

Although deepmriprep’s high processing speed and user-friendly interface are its main advantages, its underlying software design may hold even greater implications for future development (https://github.com/wwu-mmll/deepmriprep)30. The software is structured into small, modular components, each comprising fewer than 1,000 lines of code. This streamlined design improves long-term maintainability and reduces the likelihood of potentially far-reaching bugs5,31. Most importantly, the straightforward software architecture of deepmriprep reduces the barrier for researchers in VBM and other neuroimaging domains, making it easier to understand, adapt and reuse the code for various neuroimaging pipelines. We anticipate that the broader adoption of deepmriprep into other neuroimaging pipelines will advance the underlying methods, thus fostering progress in the broader neuroscience community.

Methods

Datasets

This study uses existing data from 225 datasets published on the OpenNeuro platform32 downloaded via the openneuro-py version 2023.1.0 Python package (https://github.com/hoechenberger/openneuro-py). OpenNeuro data from adult HCs were used for training and validation with cross-validation, while OpenNeuro data from HCs aged 2–12 years were used for testing, along with a Synthetic Atrophy dataset and three patient datasets: the Münster Neuroimaging Cohort (MNC), the Marburg-Münster Affective Disorders Cohort Study (FOR2107/MACS) and the BiDirect study. Data availability is governed by the respective consortia. No new data were acquired for this study.

Training and validation datasets

OpenNeuro-Total

Out of the over 700 datasets available at OpenNeuro at the time of compilation (10 November 2021), each dataset that contained at least five T1w MRIs from at least five adult HCs was included, resulting in 208 datasets. Based on a successive visual quality check, 30 MRIs were excluded, mainly due to improper masking (Supplementary Fig. 23a) and erroneous orientation (Supplementary Fig. 23b). The remaining compilation of 8,279 T1w MRIs is used as the OpenNeuro-Total dataset (Supplementary Fig. 24 and Supplementary Table 10).

OpenNeuro-HD

The 8,279 MRIs from OpenNeuro-Total were preprocessed using the commonly used CAT12 toolbox (https://neuro-jena.github.io/cat/) with default parameters7. To ensure high quality of the training data, strict quality thresholds were set based on the preprocessing quality ratings provided by the toolbox. To be included, all ratings had to be at least a B− grade, resulting in the following thresholds: a surface Euler number below 25, a surface defect area under 5.0, a surface intensity root mean square error below 0.1, and a surface position root mean square error below 1.0. All OpenNeuro datasets that contained fewer than ten adult HCs after this quality control were excluded. In the remaining datasets, MRIs were ranked according to the surface defect number, and finally the top five MRIs per dataset that passed a visual quality check were included in the dataset. This results in a total of 685 MRIs from 137 datasets called OpenNeuro-HD (Supplementary Fig. 24, middle, Supplementary Table 10 and Source Data Fig. 2).
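The threshold-based inclusion can be sketched as a simple table filter. The column names and values below are illustrative placeholders, not CAT12's actual rating fields, and the ranking column stands in for the surface defect number.

```python
import pandas as pd

# Hypothetical quality table; column names are placeholders, not CAT12's
# actual output fields
qc = pd.DataFrame({
    "dataset": ["ds0001"] * 3 + ["ds0002"] * 2,
    "euler_number": [12, 30, 8, 20, 4],
    "defect_area": [2.1, 6.0, 1.0, 4.9, 0.5],
    "intensity_rmse": [0.05, 0.12, 0.03, 0.09, 0.02],
    "position_rmse": [0.6, 1.2, 0.4, 0.9, 0.3],
})
passed = qc[(qc.euler_number < 25) & (qc.defect_area < 5.0)
            & (qc.intensity_rmse < 0.1) & (qc.position_rmse < 1.0)]
# Rank within each dataset and keep the top five scans
top5 = passed.sort_values("defect_area").groupby("dataset").head(5)
```

In the actual pipeline, datasets with fewer than ten remaining adult HCs would additionally be dropped before the per-dataset top-five selection.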

Test datasets

OpenNeuro-Kids

Among the over 700 datasets available at OpenNeuro at the time of compilation (10 November 2021), each dataset that contained at least five T1w MRIs from at least five HCs in the age range from 2 to 12 years was included, resulting in 18 datasets. Based on a successive visual quality check, 300 MRIs were excluded, mainly due to strong motion artifacts (Supplementary Fig. 23c) and improper masking (Supplementary Fig. 23d). The remaining compilation of 867 T1w MRIs is used as the OpenNeuro-Kids dataset (Supplementary Fig. 24, right, Supplementary Table 10 and Source Data Fig. 1). The CAT12 preprocessing for OpenNeuro-Kids used the TMP_Age11.5 template.

Synthetic atrophy and synthetic artifacts

Published by Rusak et al.33, this dataset uses T1w MRIs of 20 HCs from the Alzheimer’s Disease Neuroimaging Initiative34 to synthetically introduce global neocortical atrophy. Simulating ten progressions of atrophy, ranging from 0.1 mm to 1 mm of global thickness reduction, the resulting dataset consists of 220 T1w MRIs (including the 20 originals) and their respective ground-truth tissue maps.

To additionally investigate the influence of scanner effects, we introduce artificial artifacts in the 20 original T1w MRIs using Rician noise, bias field, blurring, ghosting, motion, ringing and spike artifacts (Supplementary Fig. 25). Each of the seven artifacts is applied with medium and strong intensity, resulting in 480 synthetic MRIs.

VBM analysis datasets

For the VBM analyses, we use a total of 4,017 MRIs from three independent German cohorts (Supplementary Fig. 26): the Marburg-Münster Affective Disorders Cohort Study (MACS; N = 1,799), the Münster Neuroimaging Cohort (MNC; N = 1,194) and the BiDirect cohort (N = 1,024). All three cohorts include subsamples with both patients with MDD and HCs who are free from any lifetime mental disorder diagnoses according to DSM-IV criteria.

Marburg-Münster Affective Disorders Cohort Study (FOR2107/MACS)

Patients were recruited through psychiatric hospitals, while the control group was recruited via newspaper advertisements. Patients diagnosed with MDD showed varying levels of symptom severity and underwent various forms of treatment (inpatient, outpatient or none). The FOR2107/MACS was conducted at two scanning sites: University of Münster and University of Marburg. Inclusion criteria for the present study were availability of completed baseline MRI data with sufficient MRI quality. Further details about the structure of the FOR2107/MACS35 and MRI quality assurance protocol36 are provided elsewhere.

Münster Neuroimaging Cohort (MNC)

In MNC, patients were recruited from local psychiatric hospitals and underwent inpatient treatment due to a moderate or severe depressive disorder. Further information regarding this study can be found in refs. 37,38.

BiDirect

The BiDirect Study is a prospective project that comprises three distinct cohorts: patients hospitalized for an acute episode of major depression, patients up to 6 months after an acute cardiac event, and HCs randomly drawn from the population register of the city of Münster, Germany. Further details on the rationale, design and recruitment procedures of the BiDirect study have been described in refs. 39,40.

Preprocessing

All datasets are preprocessed using the VBM pipeline of version 12.8.2 of the CAT12 toolbox, which was the latest version available at the time of analysis, with default parameters7. The affine transformation calculated during this initial CAT12 preprocessing is used such that tissue segmentation (see ‘Tissue segmentation’ section in the Methods) and image registration (see ‘Image registration’ section in the Methods) are consistently applied in the template coordinate space. Image registration is based on GM and WM probability maps in the standard resolution of 1.5 mm (113 × 137 × 113 voxels).

For tissue segmentation, unprocessed MRIs are affinely registered to the template in a high resolution of 0.5 mm (339 × 411 × 339 voxels) using B-spline interpolation, and the CAT12 preprocessing is repeated on the basis of these high-resolution MRIs. This circumvents any potential image degradation caused by additional resizing of the CAT12 tissue map. Because no ground-truth tissue maps exist, these high-resolution tissue maps are used as reference maps for model training and validation. Because the MRIs are skull-stripped before tissue segmentation in deepmriprep’s prediction pipeline (see ‘Prediction pipeline’ section in the Methods), all voxels in the MRI that do not contain tissue in the respective tissue map are set to zero. Furthermore, the standard N4 bias correction41 is applied using the ANTS package4 to avoid interference with potential artificial bias fields introduced during data augmentation (see ‘Data augmentation’ section in the Methods). Finally, min–max scaling between the 0.5th and 99.5th percentile is used as proposed in ref. 23 with one modification: values above the maximum are not clipped to one but scaled via the function \(f(x)=1+{\log }_{10}x\) to prevent any loss of crucial information in areas with extreme intensity values (for example, blood vessels). The code for the input preprocessing is publicly accessible at https://github.com/wwu-mmll/deepmriprep-train (ref. 42).
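The modified min–max scaling can be sketched as follows; treating only nonzero (skull-stripped) voxels when computing the percentiles is an assumption of this sketch.

```python
import numpy as np

def scale_intensity(mri):
    """Min-max scaling between the 0.5th and 99.5th intensity percentile.
    Values above the maximum are log-compressed via f(x) = 1 + log10(x)
    instead of being clipped, so extreme intensities (for example, blood
    vessels) stay distinguishable."""
    lo, hi = np.percentile(mri[mri > 0], [0.5, 99.5])
    x = np.clip((mri - lo) / (hi - lo), 0, None)   # clip below, not above
    high = x > 1
    x[high] = 1 + np.log10(x[high])
    return x

# A single extreme voxel ends up slightly above 1 instead of being clipped
mri = np.concatenate([np.linspace(0.01, 1.0, 999), [10.0]])
scaled = scale_intensity(mri)
```

Note that f(x) = 1 + log10(x) is continuous at x = 1 (f(1) = 1), so the transition from linear to logarithmic scaling introduces no jump.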

Data augmentation

Data augmentation is used during training to artificially introduce image artifacts that may occur in real-world datasets. This increases model generalizability, because effects that are infrequent in the training data can be systematically oversampled with any desired intensity. Data augmentations for the image registration step would have to be consistent with equation (2), requiring specialized implementations. Hence, for the current version of deepmriprep, data augmentation is omitted during image registration model training.

The 12 different data augmentations used during model training (Supplementary Fig. 27) are implemented in the niftiai version 0.3.2 Python package (https://github.com/codingfisch/niftiai)43 and introduce artificial bias fields, motion artifacts, noise, blurring, ghosting, spike artifacts, downsampling, translation, flipping, brightness, contrast and Gibbs ringing. Bias fields are generated by applying an inverse Fourier transform to low-frequency Gaussian noise, whereas motion artifacts, ghosting44, spike artifacts45 and Gibbs ringing46 are achieved by introducing artifacts in the k-space of the T1w MRI. To be MRI-specific, noise is sampled out of a chi distribution47, a generalization of the Rician noise distribution48. Instead of using the full set of affine and nonlinear spatial transformations, only translation and flipping are applied via nearest-neighbor resampling to circumvent any potential image degradation.
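Two of these augmentations can be sketched in simplified form; the actual niftiai implementations may differ (for example, true Rician noise perturbs the complex k-space signal rather than being purely additive), and the cutoff and strength parameters below are illustrative.

```python
import numpy as np

def random_bias_field(shape, cutoff=4, strength=0.3, rng=None):
    """Smooth multiplicative bias field: inverse Fourier transform of
    low-frequency Gaussian noise."""
    rng = rng if rng is not None else np.random.default_rng()
    k = np.zeros(shape, dtype=complex)
    low = (slice(0, cutoff),) * len(shape)
    k[low] = rng.standard_normal((cutoff,) * len(shape)) \
        + 1j * rng.standard_normal((cutoff,) * len(shape))
    field = np.fft.ifftn(k).real
    field /= np.abs(field).max()            # normalize to [-1, 1]
    return 1 + strength * field

def chi_noise(shape, dof=2, sigma=0.05, rng=None):
    """Chi-distributed noise (Rayleigh for dof=2), simplified to an
    additive term for illustration."""
    rng = rng if rng is not None else np.random.default_rng()
    comps = sigma * rng.standard_normal((dof,) + tuple(shape))
    return np.sqrt((comps ** 2).sum(axis=0))

rng = np.random.default_rng(0)
mri = np.ones((16, 16, 16))
augmented = mri * random_bias_field(mri.shape, rng=rng) + chi_noise(mri.shape, rng=rng)
```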

Tissue segmentation

To achieve high-quality tissue segmentation, a cascaded 3D UNet approach, inspired by Isensee et al.23, is applied to a cropped region of 336 × 384 × 336 voxels in the high-resolution MRI (see ‘Preprocessing’ section in the Methods). This specific cropping is chosen to make the image dimensions divisible by 16 (required by UNet), without excluding voxels which potentially contain tissue. The first stage of the cascaded UNet processes the whole image with a reduced resolution of 0.75 mm (224 × 256 × 224 voxels). In the second stage, the original resolution of 0.5 mm is processed using a patchwise approach, which incorporates the prediction from the first stage in its model input. For each patch position, an individual UNet is trained (see ‘Training procedure’ section in the Methods). Both stages of the UNet architecture are identical with respect to the use of the rectified linear unit (ReLU) activation function, instance normalization49, a depth of 4, and the doubling of the number of channels with increasing depth, starting with 8 channels. The implementation of the model and the training procedure is publicly accessible via GitHub at https://github.com/wwu-mmll/deepmriprep-train (ref. 42).

Patchwise UNet

The second stage of the cascaded UNet subdivides the 336 × 384 × 336 voxels of the high-resolution MRI into 27 patches, each containing 128 × 128 × 128 voxels (Supplementary Fig. 28). For each of these 27 patches, a specific UNet model is trained (see ‘Training procedure’ section in the Methods). To minimize the number of voxels in a patch that do not contain any tissue, the positions of the patches are optimized based on the tissue segmentation masks of the 685 MRIs in OpenNeuro-HD. This iterative optimization starts from a regular grid of 3 × 3 × 3 patches that covers the total volume. Then, each patch is moved stepwise by one voxel toward the image center until this would cause a tissue voxel in one of the 685 MRIs not to be covered by the patch. To exploit the brain’s bilateral symmetry, each patch on the left hemisphere is moved in lockstep with its corresponding patch on the right hemisphere.
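The greedy patch placement can be illustrated with a one-dimensional sketch along a single axis; the real implementation operates on 3D masks and additionally pairs hemispheric patches.

```python
import numpy as np

def optimize_patch_starts(tissue_any, patch=128, n=3, size=336):
    """Greedily shift patch start positions toward the volume center until a
    further move would leave a tissue voxel uncovered (1D, single axis).

    tissue_any : boolean profile, True where any training map contains tissue
    """
    starts = np.linspace(0, size - patch, n).round().astype(int)
    center = size // 2
    for i in range(n):
        step = int(np.sign(center - (starts[i] + patch // 2)))
        while step != 0:
            cand = starts[i] + step
            if cand < 0 or cand + patch > size:
                break
            covered = np.zeros(size, dtype=bool)
            for j, s in enumerate(starts):
                s = cand if j == i else s
                covered[s:s + patch] = True
            if (tissue_any & ~covered).any():
                break                     # a tissue voxel would be uncovered
            starts[i] = cand
    return starts

# Tissue spans voxels 60-275: the outer patches move inward until they hit it
tissue = np.zeros(336, dtype=bool)
tissue[60:276] = True
starts = optimize_patch_starts(tissue)
```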

Before patches from the right hemisphere are applied to the UNet, they are flipped along the sagittal axis such that they resemble left-hemisphere patches. The resulting prediction is then flipped back. This approach reduces the number of effective patch positions for which individual UNets need to be trained from 27 to 18.

Close to the border of a patch, the accuracy of the prediction typically decreases. Therefore, predictions close to the border of a patch are weighted less via Gaussian importance weighting23 during accumulation of the final prediction containing 336 × 384 × 336 voxels.
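Gaussian importance weighting can be sketched as follows; the sigma of one-eighth of the patch size follows the nnU-Net convention, and the accumulation helper is our illustration of how the weighted patches are blended.

```python
import numpy as np

def gaussian_importance(patch_size=128, sigma_scale=0.125):
    """3D Gaussian weight map that down-weights predictions near patch
    borders (sigma = patch_size / 8, as in nnU-Net)."""
    ax = np.arange(patch_size)
    g = np.exp(-0.5 * ((ax - (patch_size - 1) / 2) / (sigma_scale * patch_size)) ** 2)
    w = g[:, None, None] * g[None, :, None] * g[None, None, :]
    return w / w.max()

def accumulate(patch_preds, starts, shape, patch=128):
    """Blend overlapping patch predictions into one full-volume map."""
    w = gaussian_importance(patch)
    out, norm = np.zeros(shape), np.zeros(shape)
    for pred, (x, y, z) in zip(patch_preds, starts):
        out[x:x + patch, y:y + patch, z:z + patch] += w * pred
        norm[x:x + patch, y:y + patch, z:z + patch] += w
    return out / np.maximum(norm, 1e-12)
```

In overlap regions, a voxel's final value is thus a weighted average that trusts whichever patch sees the voxel closest to its center.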

Multilevel activation function

Based on SPM3, the tissue maps produced by CAT12 contain continuous values ranging from 0 to 3. The values 1, 2 and 3 correspond to CSF, GM and WM, while 0 indicates that the respective voxel does not contain any tissue. The histogram of the template tissue map (Supplementary Fig. 29, right) shows that values close to 0, 1, 2 and 3 are more frequent than intermediate values. Furthermore, smaller peaks can be observed around the values 1.5 and 2.5, which correspond to the mixed classes CSF–GM and GM–WM that CAT12 introduces. To introduce an inductive bias toward this desired value distribution, the final layer of the tissue segmentation UNet utilizes a custom multilevel activation function inspired by Hu et al.50. This custom multilevel activation is achieved through a summation of sigmoid functions,

$$f(x)=S(\alpha x)+\mathop{\sum }\limits_{i\in [1.5,2.,2.5,3.]}\frac{S(\alpha (x-i))}{2}\,\,\,\,{\mathrm{with}}\,\,\,\,S(x)=\frac{1}{1+{{\rm{e}}}^{-x}},$$

with α being a parameter of the neural network that is optimized during model training. This function (Supplementary Fig. 29, left) successfully maps a normal distribution—that is, the typical output distribution of a neural network—to the desired value distribution with peaks at 0, 1, 1.5, 2, 2.5 and 3 (Supplementary Fig. 29, middle). In combination with an MAE loss, this multilevel activation function facilitates the training of the tissue segmentation model.
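
A NumPy sketch of this activation, with a fixed illustrative α in place of the learned parameter, follows the displayed equation term by term: one full sigmoid step from 0 to 1, plus four half-height sigmoids shifted to 1.5, 2, 2.5 and 3.

```python
import numpy as np

def multilevel_activation(x, alpha=10.0):
    """Multilevel activation: a full sigmoid plus four half-height
    sigmoids, yielding plateaus at 0, 1, 1.5, 2, 2.5 and 3 (alpha is
    learned during training; alpha=10 here is only for illustration)."""
    s = lambda z: 1.0 / (1.0 + np.exp(-z))
    return s(alpha * x) + sum(s(alpha * (x - i)) / 2
                              for i in (1.5, 2.0, 2.5, 3.0))
```

For large α the function saturates at the plateau values, so pre-activations well below 0 map near 0 and those well above 3 map near 3.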

Training procedure

The two stages of 3D UNets are trained in a cascaded fashion. The first stage model is trained for 60 epochs on full-view MRIs with a resolution of 0.75 mm (224 × 256 × 224 voxels) using a batch size of 1.

The training of the 3 × 3 × 3 patchwise approach in the second stage is more complex and results in 18 models, each dedicated to one of the 18 effective patch positions (3 × 3 × 3 = 27 minus the 9 flipped right-hemisphere patches; see ‘Patchwise UNet’ section in the Methods). First, a single UNet is trained on all effective patch positions for two epochs as a foundation model. This foundation model is then fine-tuned for each of the 18 patch positions, using solely the respective patch position for 20 epochs. For all patch-based training, the batch size is set to 2, and the flip augmentation is disabled for patches located on the left and right hemispheres. The input of the second-stage model consists of patches of the original MRI, concatenated with the respective patch of the first-stage predictions upsampled to the image resolution of 0.5 mm. All models are trained with the one-cycle learning rate schedule51 using a maximal learning rate of 0.001, which follows the default settings of the fastai library52.
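
For illustration, the one-cycle schedule can be sketched as a cosine warm-up followed by a cosine anneal (the warm-up fraction and the divisors for the initial and final learning rates assumed here follow fastai's documented defaults; this is a sketch, not the library's implementation):

```python
import numpy as np

def one_cycle_lr(step, total, max_lr=1e-3, pct_start=0.25,
                 div=25.0, final_div=1e5):
    """One-cycle schedule sketch: cosine warm-up from max_lr/div to
    max_lr over the first pct_start of training, then cosine annealing
    down to max_lr/final_div."""
    warm = pct_start * total
    if step <= warm:
        t = step / warm
        start = max_lr / div
        return start + (max_lr - start) * (1 - np.cos(np.pi * t)) / 2
    t = (step - warm) / (total - warm)
    end = max_lr / final_div
    return end + (max_lr - end) * (1 + np.cos(np.pi * t)) / 2
```

The learning rate thus peaks at the configured maximum of 0.001 a quarter of the way through training and decays to a near-zero value by the final step.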

Image registration

To motivate the neural network-based image registration used for deepmriprep, we first review standard image registration approaches. In standard image registration such as DARTEL27, given an input image I and a template J, the sum of the image dissimilarity D and a regularization metric R weighted by a regularization parameter Λ,

$$L({\bf{I}},{\bf{J}},{\boldsymbol{\Phi }})=D({\bf{I}}\cdot {\boldsymbol{\Phi }},{\bf{J}})+\Lambda R({\boldsymbol{\Phi }})$$

is minimized via the deformation field Φ. In the loss function L used in this standard approach, the regularization parameter Λ controls the trade-off between image similarity and the regularity of the deformation field. In CAT12, the default metric for image similarity is simply the MSE between the moving and the target image

$$D({\bf{I}}\cdot {\boldsymbol{\Phi }},{\bf{J}})=\mathrm{MSE}({\bf{I}}\cdot {\boldsymbol{\Phi }},{\bf{J}})=\frac{1}{| \varOmega | }\mathop{\sum }\limits_{{\bf{p}}\in \varOmega }| | {\bf{I}}\cdot {\boldsymbol{\Phi }}({\bf{p}})-{\bf{J}}({\bf{p}})| {| }^{2},$$

and the regularization term is the linear elasticity of the deformation field Φ

$$R({\boldsymbol{\Phi }})=\int \left(\mu \parallel \epsilon ({\boldsymbol{\Phi }}){\parallel }^{2}+\frac{\lambda }{2}{(\mathrm{tr}(\epsilon ({\boldsymbol{\Phi }})))}^{2}\right){\rm{d}}{\bf{x}},$$
(1)

where μ is the weight of the shearing elasticity and λ is the weight of the zoom (volume-change) elasticity.
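
A 2-D finite-difference sketch of this regularizer (a simplification for illustration; the actual implementations operate on 3-D deformation fields) computes the strain tensor ε(Φ) = (∇u + ∇uᵀ)/2 of the displacement field u and sums the two penalty terms over the grid:

```python
import numpy as np

def linear_elasticity_2d(u, mu=1.0, lam=1.0):
    """Linear elasticity of a 2-D displacement field u of shape
    (2, H, W): strain eps = (grad u + grad u^T) / 2, energy
    mu * ||eps||^2 + lam/2 * tr(eps)^2, summed over all grid points."""
    du0 = np.gradient(u[0])  # [d u0 / d axis0, d u0 / d axis1]
    du1 = np.gradient(u[1])
    e00, e11 = du0[0], du1[1]          # normal strains
    e01 = 0.5 * (du0[1] + du1[0])      # shear strain
    frob = e00**2 + e11**2 + 2 * e01**2
    trace = e00 + e11
    return np.sum(mu * frob + 0.5 * lam * trace**2)
```

A constant displacement (pure translation) has zero strain and thus zero penalty, whereas a uniform zoom is penalized through both terms.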

To guarantee that the deformations are invertible, registration frameworks27,53,54 consider the deformation as the solution of an initial value problem of the form

$$\frac{{\rm{d}}{\boldsymbol{\Phi }}(s;{\bf{x}})}{{\rm{d}}s}={\bf{v}}({\boldsymbol{\Phi }}(s;{\bf{x}}),s)\,\,\,\,\mathrm{with}\,\,\,\,{\boldsymbol{\Phi }}(0;{\bf{x}})={\bf{x}}.$$
(2)

The mapping x ↦ Φ(s; x) defines a family of diffeomorphisms for all times s ∈ [0, τ]. Hence, it is guaranteed that an inverse of the mapping exists, which can be computed through backward integration. As proposed in DARTEL (diffeomorphic anatomical registration through exponentiated lie algebra), using a stationary velocity field framework instead of the large deformation diffeomorphic metric mapping (LDDMM) model55,56 allows the velocity field v to be constant over time. With this simplification, the regularity of the deformation field—that is, its smoothness and invertibility—is automatically enforced via forward integration (also called shooting) of this constant velocity field. In this way, a smooth and invertible deformation field can be found by iteratively optimizing the velocity field v with respect to L using a gradient descent approach.

SyN registration57 additionally enforces symmetry between the forward (image to template) deformation field Φ and the backward (template to image) deformation field Φ−1. SyN considers the full forward and backward deformations to be compositions of the half deformations \({{\boldsymbol{\Phi }}}^{\frac{1}{2}}\) and \({{\boldsymbol{\Phi }}}^{-\frac{1}{2}}\) via

$${\boldsymbol{\Phi }}={{\boldsymbol{\Phi }}}^{\frac{1}{2}}\cdot {\left({{\boldsymbol{\Phi }}}^{-\frac{1}{2}}\right)}^{-1}\,\,\,\,{\mathrm{and}}\,\,\,\,{{\boldsymbol{\Phi }}}^{-1}={{\boldsymbol{\Phi }}}^{-\frac{1}{2}}\cdot {\left({{\boldsymbol{\Phi }}}^{\frac{1}{2}}\right)}^{-1}.$$

Based on this consideration, SyN adds the dissimilarity between the image and the backward deformed template D(I, J ⋅ Φ−1) and the dissimilarity between the half forward deformed image and the half backward deformed template \(D({\bf{I}}\cdot {{\boldsymbol{\Phi }}}^{\frac{1}{2}},{\bf{J}}\cdot {{\boldsymbol{\Phi }}}^{-\frac{1}{2}})\) to arrive at the loss function

$$L({\bf{I}},{\bf{J}},\varPhi )=D({\bf{I}}\cdot \varPhi ,{\bf{J}})+D({\bf{I}},{\bf{J}}\cdot {\varPhi }^{-1})+D\left({\bf{I}}\cdot {\varPhi }^{\frac{1}{2}},{\bf{J}}\cdot {\varPhi }^{\frac{-1}{2}}\right)+\Lambda R(\varPhi ).$$
(3)

Using the diffeomorphic mapping in equation (2), velocity fields \({{\bf{v}}}^{\frac{1}{2}}\) and \({{\bf{v}}}^{-\frac{1}{2}}\) are used to generate the half deformations \({{\boldsymbol{\Phi }}}^{\frac{1}{2}}\) and \({{\boldsymbol{\Phi }}}^{-\frac{1}{2}}\).

Model architecture and training

The neural network-based image registration framework used for deepmriprep is based on SYMNet22 and uses a UNet to predict the forward and backward velocity fields \({{\bf{v}}}^{\frac{1}{2}}\) and \({{\bf{v}}}^{-\frac{1}{2}}\). Analogous to the SyN registration, these velocity fields are integrated according to equation (2) to arrive at the half deformation fields \({{\boldsymbol{\Phi }}}^{\frac{1}{2}}\) and \({{\boldsymbol{\Phi }}}^{-\frac{1}{2}}\) via the scaling and squaring method27,53 with τ = 7 time steps (Supplementary Fig. 30).
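
Scaling and squaring can be illustrated in 1-D (a simplification of the 3-D grid-sampling implementation; function names are illustrative): the velocity field is divided by 2^τ, and the resulting small displacement field is composed with itself τ times, which corresponds to 2^τ integration steps.

```python
import numpy as np

def scaling_and_squaring_1d(v, tau=7):
    """Scaling-and-squaring sketch in 1-D: scale the velocity field v
    by 1 / 2**tau, then square (self-compose) the displacement field u
    tau times, giving the deformation phi(x) = x + u(x)."""
    n = v.shape[0]
    x = np.arange(n, dtype=float)
    u = v / 2 ** tau                      # scaling step
    for _ in range(tau):                  # squaring: u <- u + u(x + u)
        u = u + np.interp(x + u, x, u)    # linear interpolation of u at x + u
    return x + u
```

For a zero velocity field the identity deformation is recovered, and a constant velocity field yields a pure translation, as expected for a flow that is constant over time.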

Similar to the neural network architecture used for tissue segmentation (see ‘Tissue segmentation’ section in the Methods), the UNet uses instance normalization49, has a depth of 4, and doubles the number of channels with increasing depth, starting with 8 channels. However, we apply two modifications: (1) LeakyReLU58 instead of ReLU activation layers, and (2) a hyperbolic tangent (tanh) activation function in the final layer, ensuring that the UNet’s output conforms to the value range of −1 to 1 used for image coordinates by PyTorch59. The model is trained for 50 epochs using the one-cycle learning rate schedule with a maximal learning rate of 0.001.

During initial tests, training with the standard SyN loss function (equation (3)) led to major artifacts in the predicted velocity and deformation field (Supplementary Fig. 31b). To avoid these artifacts, we tested supervised approaches (Supplementary Fig. 31c–e) that utilize deformation fields created by CAT12 (Supplementary Fig. 31a). Using an iterative approach, we determined the velocity fields \({{\bf{v}}}_{{\rm{CAT}}}^{\frac{1}{2}}\) and \({{\bf{v}}}_{{\rm{CAT}}}^{-\frac{1}{2}}\) that produce these given deformation fields ΦCAT and \({{\boldsymbol{\Phi }}}_{{\rm{CAT}}}^{-1}\) in our PyTorch-based implementation and used these velocity fields as targets. Using the MSE, the resulting loss function Lv measures disagreements between the predicted velocity fields \({{\bf{v}}}^{\frac{1}{2}}\) and \({{\bf{v}}}^{-\frac{1}{2}}\) and the targets via

$${L}_{{\bf{v}}}\left({{\bf{v}}}^{\frac{1}{2}},{{\bf{v}}}^{-\frac{1}{2}}\right)=\frac{1}{| \varOmega | }\mathop{\sum }\limits_{{\bf{p}}\in \varOmega }| | {{\bf{v}}}_{\mathrm{CAT}}^{\frac{1}{2}}({\bf{p}})-{{\bf{v}}}^{\frac{1}{2}}({\bf{p}})| {| }^{2}+| | {{\bf{v}}}_{\mathrm{CAT}}^{-\frac{1}{2}}({\bf{p}})-{{\bf{v}}}^{-\frac{1}{2}}({\bf{p}})| {| }^{2}.$$

Using this loss function, the predicted velocity fields show fewer artifacts, but based on the resulting Jacobian determinant field JΦ, some inaccuracies remain (Supplementary Fig. 31c). The Jacobian determinant indicates the volume change caused by the deformation for each voxel. By explicitly adding the MSE between the predicted and ground-truth Jacobian determinant fields, the loss function

$$\begin{array}{l}{L}_{{\bf{v}},{{\bf{J}}}_{{\boldsymbol{\Phi }}}}\left({{\bf{v}}}^{\frac{1}{2}},{{\bf{v}}}^{-\frac{1}{2}}\right)=\frac{1}{|\varOmega |}\mathop{\sum }\limits_{{\bf{p}}\in \varOmega }||{{\bf{v}}}_{\mathrm{CAT}}^{\frac{1}{2}}({\bf{p}})-{{\bf{v}}}^{\frac{1}{2}}({\bf{p}})|{|}^{2}\\ +||{{\bf{v}}}_{\mathrm{CAT}}^{-\frac{1}{2}}({\bf{p}})-{{\bf{v}}}^{-\frac{1}{2}}({\bf{p}})|{|}^{2}+||{{\bf{J}}}_{{\boldsymbol{\Phi }},\mathrm{CAT}}({\bf{p}})-{{\bf{J}}}_{{\boldsymbol{\Phi }}}({\bf{p}})|{|}^{2}\end{array}$$

improves the regularity of the predicted velocity fields and of the resulting Jacobian determinant field (Supplementary Fig. 31d). Finally, we reintroduce the original loss function LSyN as

$${L}_{\mathrm{supervised}}\left({{\bf{v}}}^{\frac{1}{2}},{{\bf{v}}}^{-\frac{1}{2}}\right)={L}_{{\bf{v}},{{\bf{J}}}_{\Phi }}\left({{\bf{v}}}^{\frac{1}{2}},{{\bf{v}}}^{-\frac{1}{2}}\right)+\beta {L}_{\mathrm{SyN}}$$

with β set to 2 × 10−5. The fields predicted with this approach, called supervised SYMNet or sSYMNet, do not show any apparent artifacts (Supplementary Fig. 31e). The implementation of the model and the training procedure is publicly accessible via GitHub at https://github.com/wwu-mmll/deepmriprep-train (ref. 42).
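
A minimal NumPy sketch of the combined supervised loss (a simplification for illustration; the SyN term is assumed to be computed elsewhere and passed in, and shapes and names are hypothetical):

```python
import numpy as np

def supervised_loss(v_f, v_b, v_f_cat, v_b_cat, jac, jac_cat,
                    l_syn=0.0, beta=2e-5):
    """Supervised sSYMNet loss sketch: MSE on both predicted velocity
    fields against the CAT12-derived targets, MSE on the Jacobian
    determinant field, plus the original SyN loss weighted by beta."""
    mse_field = lambda a, b: np.mean(np.sum((a - b) ** 2, axis=-1))
    loss = mse_field(v_f, v_f_cat) + mse_field(v_b, v_b_cat)
    loss += np.mean((jac - jac_cat) ** 2)
    return loss + beta * l_syn
```

With perfect velocity and Jacobian predictions, only the β-weighted SyN term remains, reflecting its role as a small corrective regularizer.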

Cross-dataset validation and evaluation metrics

We follow best practices by applying a five-fold cross-dataset validation—that is, cross-validation with datasets grouped together—using the 137 datasets from OpenNeuro-HD (see ‘Training and validation datasets’ section in the Methods). This ensures realistic performance measures, because all reported results are obtained on datasets unseen during training of the respective model. We apply the same folds across tissue segmentation, image registration and GM masking (see ‘Gray matter masking’ in the Supplementary Information) to avoid data leakage between these processing steps. The test datasets (see ‘Test datasets’ section in the Methods) and the datasets in OpenNeuro-Total and OpenNeuro-Kids that are not part of OpenNeuro-HD (see ‘OpenNeuro-Total’ section in the ‘Results’) are evaluated with an additional model trained on the full OpenNeuro-HD dataset.
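
Grouping the folds by dataset can be sketched as follows (a hypothetical helper, not the actual implementation): every image inherits the fold of its dataset, so no dataset contributes to both training and validation.

```python
import numpy as np

def dataset_folds(dataset_ids, n_folds=5, seed=0):
    """Cross-dataset validation sketch: shuffle the unique dataset IDs,
    distribute them round-robin over n_folds groups, and assign every
    image to the fold of its dataset."""
    rng = np.random.default_rng(seed)
    unique = rng.permutation(np.unique(dataset_ids))
    fold_of = {ds: i % n_folds for i, ds in enumerate(unique)}
    return np.array([fold_of[ds] for ds in dataset_ids])
```

All images from one dataset always land in the same fold, which is exactly the grouping property that prevents leakage between training and validation splits.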

Given that the distribution of performance metrics across images is often skewed, the median is used as a measure of central tendency, complemented by a visual inspection of negative outliers.

To evaluate tissue segmentation and GM masking performance, we use the Dice score DSC, the probabilistic Dice score pDSC and the Jaccard score JSC. The image registrations are evaluated based on the regularity of the deformation field and the dissimilarity between the warped input and the template image. This dissimilarity is measured using the voxel-wise MSE between the images. The deformation field’s regularity—that is, its physical plausibility—is quantified via the linear elasticity LE (equation (1)).
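
For reference, the Dice and Jaccard scores for binary masks can be computed as follows (the probabilistic Dice score pDSC applies the same overlap formula to probability maps instead of binary masks):

```python
import numpy as np

def dice(a, b):
    """Dice score: twice the overlap divided by the total mask sizes."""
    inter = np.sum(a * b)
    return 2 * inter / (np.sum(a) + np.sum(b))

def jaccard(a, b):
    """Jaccard score: intersection over union of two binary masks."""
    inter = np.sum(np.logical_and(a, b))
    union = np.sum(np.logical_or(a, b))
    return inter / union
```

Both scores equal 1 for identical masks and 0 for disjoint ones; the Dice score is always at least as large as the Jaccard score.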

Prediction pipeline

The complete deepmriprep pipeline used before the GLM analysis in the ‘VBM analyses’ section in the ‘Results’ consists of six steps: brain extraction, affine registration, tissue segmentation, tissue separation, nonlinear registration and smoothing. After brain extraction using deepbet14 with default settings, affine registration is applied using the sum of the MSE (between image and template) and the Dice loss (between image brain mask and template brain mask). The affine registration is implemented in torchreg60 with zero padding—sensible after brain extraction—and the default two-stage setting with 500 iterations at 12 mm³ and successive 100 iterations at 6 mm³ image resolution. After tissue segmentation (see ‘Tissue segmentation’ section in the Methods) and before image registration (see ‘Image registration’ section in the Methods), we apply GM masking in the ventricles and around the brain stem to conform the probability masks to an undocumented step in the existing CAT12 preprocessing (see ‘Gray matter masking’ section in the Supplementary Information). After image registration of the GM and WM probability masks, Gaussian smoothing with a 6-mm full width at half maximum (FWHM) kernel is applied. In line with all previous steps, smoothing (a simple convolution operation) is implemented in PyTorch, enabling graphics processing unit acceleration throughout the entire prediction pipeline.
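
The FWHM-to-sigma conversion underlying the smoothing step can be sketched as follows (the 1.5-mm voxel size assumed here is illustrative; smoothing a 3-D volume then reduces to three separable 1-D convolutions with this kernel):

```python
import numpy as np

def gaussian_kernel_1d(fwhm_mm, voxel_mm=1.5, truncate=3.0):
    """Build a normalized 1-D Gaussian kernel from a FWHM in
    millimetres: sigma = FWHM / (2 * sqrt(2 * ln 2)), converted to
    voxel units, truncated at +/- truncate * sigma."""
    sigma = fwhm_mm / (2.0 * np.sqrt(2.0 * np.log(2.0))) / voxel_mm
    radius = int(truncate * sigma + 0.5)
    x = np.arange(-radius, radius + 1)
    k = np.exp(-0.5 * (x / sigma) ** 2)
    return k / k.sum()
```

Because the Gaussian is separable, applying this kernel along each of the three axes is equivalent to a full 3-D Gaussian convolution, which keeps the operation a cheap sequence of 1-D convolutions on the graphics processing unit.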

VBM analyses

To investigate the effect of different preprocessing pipelines on the VBM analyses, deepmriprep- and CAT12-preprocessed data are used to examine statistical associations with both biological variables (age, sex and BMI) and psychometric variables (years of education, MDD versus HC, and IQ). To ensure the reliability of results, each VBM analysis is repeated 100 times with a randomly selected 80% subset of the data. Finally, the median t-map across these 100 VBM analyses is used to compare the VBM results of deepmriprep and CAT12.
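
The repeated-subsampling scheme can be sketched as follows (`t_stat_fn` is a hypothetical callable that fits the GLM on one subset and returns a voxel-wise t-map; the helper itself is illustrative):

```python
import numpy as np

def median_t_map(t_stat_fn, data, n_repeats=100, frac=0.8, seed=0):
    """Repeated-subsampling sketch: compute the voxel-wise statistic on
    n_repeats random subsets of size frac * n (drawn without
    replacement) and return the voxel-wise median t-map."""
    rng = np.random.default_rng(seed)
    n = len(data)
    maps = [t_stat_fn(data[rng.choice(n, int(frac * n), replace=False)])
            for _ in range(n_repeats)]
    return np.median(maps, axis=0)
```

Taking the voxel-wise median across the repetitions damps the influence of individual subsets, yielding a more stable t-map for the comparison between the two preprocessing pipelines.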

Statistics and reproducibility

We follow best practices by applying a five-fold cross-dataset validation—that is, cross-validation with datasets grouped together—during training and extensive testing of the models across multiple test datasets. In addition, the number of datasets was maximized by collecting MRI data from OpenNeuro and performing quality checks, resulting in 225 datasets. Each VBM analysis is repeated 100 times with a randomly selected 80% subset of the data. No statistical method was used to predetermine sample size.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.