  • Article
Addressing artifactual bias in large, automated MRI analyses of brain development

Abstract

Large, population-based magnetic resonance imaging (MRI) studies of adolescents promise transformational insights into neurodevelopment and mental illness risk. However, youth MRI studies are especially susceptible to motion and other artifacts that introduce non-random noise. After visual quality control of 11,263 T1 MRI scans obtained at age 9–10 years through the Adolescent Brain Cognitive Development study, we uncovered bias in measurements of cortical thickness and surface area in 55.1% of the samples with suboptimal image quality. These biases impacted analyses relating structural MRI and clinical measures, resulting in both false-positive and false-negative associations. Surface hole number, an automated index of topological complexity, reproducibly identified lower-quality scans with good specificity, and its inclusion as a covariate partially mitigated quality-related bias. Closer examination of high-quality scans revealed additional topological errors introduced during image preprocessing. Correction with manual edits reproducibly altered thickness measurements and strengthened age–thickness associations. We demonstrate here that inadequate quality control undermines advantages of large sample size to detect meaningful associations. These biases can be mitigated through additional automated and manual interventions.
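
To illustrate how a quality-related artifact can create a spurious brain–behavior association, and how adding an imperfect quality index as a covariate reduces it only in part, here is a toy simulation. It is a sketch under stated assumptions, not the authors' analysis: all variable names, coefficients and noise levels are hypothetical, and the partial slope is obtained by residualizing both variables on the covariate rather than by the mixed models described in the figure legends.

```python
import random
import statistics


def slope(x, y):
    """Ordinary least-squares slope of y regressed on x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def residualize(y, z):
    """Residuals of y after removing its linear dependence on z."""
    b = slope(z, y)
    mz, my = statistics.fmean(z), statistics.fmean(y)
    return [yi - my - b * (zi - mz) for yi, zi in zip(y, z)]


def simulate(n=5000, seed=0):
    """Simulate data in which artifact, not biology, links symptoms to thickness."""
    rng = random.Random(seed)
    # Hypothetical latent artifact level (for example, in-scanner motion).
    artifact = [rng.gauss(0, 1) for _ in range(n)]
    # Hypothetical clinical score that rises with artifact.
    symptoms = [0.5 * a + rng.gauss(0, 1) for a in artifact]
    # Measured thickness is biased by artifact; no true symptom effect exists.
    thickness = [-0.3 * a + rng.gauss(0, 1) for a in artifact]
    # Noisy automated quality index (stand-in for an index such as SHN).
    index = [a + rng.gauss(0, 0.5) for a in artifact]
    return symptoms, thickness, index


symptoms, thickness, index = simulate()

# Unadjusted model: thickness ~ symptoms.
unadjusted = slope(symptoms, thickness)

# Adjusted model: partial slope of symptoms with the quality index as covariate.
adjusted = slope(residualize(symptoms, index), residualize(thickness, index))

print(f"unadjusted slope: {unadjusted:.3f}")
print(f"adjusted slope:   {adjusted:.3f}")
```

Because the simulated index measures the artifact with noise, the adjusted slope shrinks toward zero without reaching it, which is one way to read "partially mitigated" in this toy setting.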

Fig. 1: MQC protocol.
Fig. 2: Association between MQC ratings and sMRI indices (n = 10,261).
Fig. 3: Effects of SHN on sMRI indices and derivation of SHN tiers in conjunction with MQC ratings (n = 10,261).
Fig. 4: Effects of variable QC on applied analyses of sMRI data.
Fig. 5: Effects of increasingly stringent QC on statistical significance and effect size of externalizing symptoms–volume findings.
Fig. 6: Effects of manual edits on sMRI indices (n = 180).

Data availability

Data from all ABCD-related analyses were downloaded from the NIMH Data Archive (NDA) version 4.0 (https://nda.nih.gov/study.html?id=1299). Derived variables, including MQC ratings and SHN, as well as ROI-level data for cortical thickness, surface area and volume processed in FreeSurfer 7.1, have been uploaded to the NDA (https://nda.nih.gov/study.html?id=1944). Data from MGH analyses contain sensitive patient information that was obtained following a waiver of informed consent, and, as such, have not been uploaded to a publicly available repository. Contact the corresponding author for additional information.

Code availability

R code is available at https://doi.org/10.5281/zenodo.14872906. Source files are available at the NIMH Data Archive (https://nda.nih.gov/study.html?id=1944).

Acknowledgements

Presented in part at the Society for Biological Psychiatry 2022 Annual Meeting, New Orleans, and the Society for Neuroscience 2022 Annual Meeting, San Diego. A pre-review version of this paper was published on bioRxiv at https://doi.org/10.1101/2023.02.28.530498 on 1 March 2023. This work was supported by the National Institutes of Health (NIH) (R01MH124694, R01MH120402 and T32MH112485 to J.L.R.; R01MH113550, R01MH120482, R01MH112847, R37MH125829 and R01EB022572 to T.D.S.; K23DA057486 to B.T.-C.); the Harvard Medical School Dupont Warren Fellowship (to J.A.C.); the Louis V. Gerstner Scholar Award (to J.A.C.); the MQ Foundation (to J.L.R.); and the Mass General Hospital Early Brain Development Initiative (to J.L.R.). The authors are grateful to R. L. Buckner and E. C. Dunn for helpful comments on the paper and to S. Perdomo and A. Blum for conducting additional statistical analysis and assisting with final paper preparation. We thank the investigators and staff at the Adolescent Brain Cognitive Development (ABCD) sites and coordinating centers as well as study participants and their families for their essential contributions to this work. Data used in the preparation of this article were obtained from the ABCD study (https://abcdstudy.org), held in the NIMH Data Archive. This is a multisite, longitudinal study designed to recruit more than 10,000 children age 9–10 years and follow them over 10 years into early adulthood. The ABCD study is supported by the NIH and additional federal partners under award numbers U01DA041048, U01DA050989, U01DA051016, U01DA041022, U01DA051018, U01DA051037, U01DA050987, U01DA041174, U01DA041106, U01DA041117, U01DA041028, U01DA041134, U01DA050988, U01DA051039, U01DA041156, U01DA041025, U01DA041120, U01DA051038, U01DA041148, U01DA041093, U01DA041089, U24DA041123 and U24DA041147. A full list of supporters is available at https://abcdstudy.org/federal-partners.html. 
A listing of participating sites and a complete listing of the study investigators can be found at https://abcdstudy.org/consortium_members/. ABCD consortium investigators designed and implemented the study and/or provided data but did not necessarily participate in the analysis or writing of this report. This article reflects the views of the authors and may not reflect the opinions or views of the NIH or ABCD consortium investigators.

Author information

Contributions

Conception and experimental design: K.K., J.A.C., A.E.D., H.L., B.T.-C., H.E., T.D.S. and J.L.R. Data acquisition: C.E.H., H.E., R.L.H., K.F.D. and J.L.R. Data analysis: S.E., H.E., K.K., J.A.C., E.L., K.A.K., D.E.H., O.B., R.F.S., H.L., K.F.D. and J.L.R. Data interpretation: S.E., K.K., J.A.C., E.L., K.A.K., D.E.H., O.B., A.E.D., H.L., B.T.-C., R.L.H., D.M.B., T.D.S., K.F.D. and J.L.R. Drafting and revision of the paper: all authors. All authors approved the submitted version of the paper and have agreed to be personally accountable for their own contributions.

Corresponding author

Correspondence to Joshua L. Roffman.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Neuroscience thanks Eduard Klapwijk and Aristotle Voineskos for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Stability of manual quality control (MQC) ratings over time (n = 10,295).

Scans were assigned to deciles based on the sequence in which they received MQC ratings by a single trained rater. (a) Box and whisker plots show distribution of MQC ratings for each time period, after adjusting for age, gender, scanner manufacturer, and externalizing psychopathology. Adjacent marks show unadjusted mean ratings for the same period. (b) Box and whisker plots show distribution of the log of surface hole numbers (SHN), stratified by decile and MQC rating. For both plots, box indicates median and interquartile range (IQR), whiskers indicate 1.5 × IQR, circles indicate mild outliers (1.5 to 3 × IQR), and asterisks indicate extreme outliers (>3 × IQR).
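
The outlier convention in this legend can be sketched in a few lines. This is a minimal illustration and not the plotting code used for the figure: the function name and the example values are hypothetical, and quartiles are computed with the default method of Python's `statistics.quantiles`, which may differ slightly from the quartile rule of the software that drew the plots.

```python
import statistics


def classify_outliers(values):
    """Label each value as 'inside', 'mild' or 'extreme' relative to the box.

    mild:    more than 1.5 and up to 3 IQR beyond the nearest quartile
    extreme: more than 3 IQR beyond the nearest quartile
    """
    q1, _median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    labels = []
    for v in values:
        # Distance beyond the box edge (0 if the value lies within Q1..Q3).
        distance = max(q1 - v, v - q3, 0.0)
        if distance > 3 * iqr:
            labels.append("extreme")
        elif distance > 1.5 * iqr:
            labels.append("mild")
        else:
            labels.append("inside")
    return labels


# Made-up values: Q1 = 3, Q3 = 9, so IQR = 6.
example = [1, 2, 3, 4, 5, 6, 7, 8, 9, 20, 40]
labels = classify_outliers(example)
print(list(zip(example, labels)))
```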

Extended Data Fig. 2 Signal dropout in sMRI processing (n = 228).

(a) Examples of dropout regions where FreeSurfer segmentation failed and did not include a substantial portion of cortex. (b) Distribution of the approximate volume of the dropout area, estimated as an ellipsoid volume, and distributions of its (c) sagittal, (d) coronal, and (e) axial extent. (f) Distributions of dropout regions overlaid on an exemplar brain, thresholded at n = 10 subjects. Heat map represents number of overlapping subjects.
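
An ellipsoid approximation of dropout volume from three orthogonal extents might look like the sketch below. It assumes that each extent is the full diameter of the region along that axis, so the semi-axes are half of each extent; the legend does not state how the extents were measured, and the function name and units are hypothetical.

```python
import math


def ellipsoid_volume(sagittal_extent, coronal_extent, axial_extent):
    """Volume of an ellipsoid whose full diameters are the three extents.

    V = (4/3) * pi * a * b * c, with semi-axes a, b, c equal to half of
    each extent, which simplifies to (pi / 6) * d1 * d2 * d3.
    """
    return math.pi / 6.0 * sagittal_extent * coronal_extent * axial_extent


# Hypothetical dropout region measuring 20 x 30 x 10 mm.
volume_mm3 = ellipsoid_volume(20.0, 30.0, 10.0)
print(f"approximate dropout volume: {volume_mm3:.0f} mm^3")
```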

Extended Data Fig. 3 Comparison of manual quality control (MQC) and surface hole number (SHN) to other automated quality control metrics (QCMs) at Baseline (n = 10,294) and Year 2 (n = 999).

At Baseline, SHN values correlated more strongly with MQC ratings than did any other image quality metric (IQM) (a), and SHN tiers closely approximated MQC ratings in detecting variance in other IQMs (b). The same patterns were apparent using Year 2 scans (c, d). In (b) and (d), box plots embedded within violin plots indicate median and interquartile range (IQR), whiskers indicate 1.5 × IQR, and boxed numbers indicate the number of outliers >6 standard deviations from the mean. FBER: Foreground-background energy ratio; BG; EFC: Entropy-focus criterion; QI2: Mortamet’s quality index 2.
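
One way to compare an automated metric against ordinal manual ratings is a rank correlation that handles ties. The legend does not say which correlation coefficient was used, so the Spearman coefficient below is an assumption chosen because MQC ratings are ordinal; the function names and example values are hypothetical.

```python
import math


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the tie-averaged ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Made-up manual ratings (1 = best) and surface hole numbers for six scans.
mqc = [1, 1, 2, 2, 3, 4]
shn = [5, 8, 20, 15, 60, 120]
rho = spearman(mqc, shn)
print(f"Spearman rho: {rho:.3f}")
```

The same function could be applied to each candidate metric in turn, and the resulting coefficients ranked to see which tracks the manual ratings most closely.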

Extended Data Fig. 4 Comparison of SHN tier effects on sMRI indices at (a) Baseline (n = 10,295) and (b) Year 2 (n = 6,941); compare to Fig. 2.

Maps at left show linear associations of SHN tier (A to D) with cortical thickness, surface area, and volume. Maps at right contrast thickness, surface area, and volume in the highest quality images (SHN = A) with those in images assigned to lower quality ratings. Covariates included age, gender, estimated intracranial volume (fixed effects), site, and scanner manufacturer (random effects).

Extended Data Fig. 5 Unique contributions of SHN tiers versus MQC to variance in sMRI indices (n = 10,295).

(a) Linear association of MQC with cortical indices after controlling for SHN tiers. (b) Linear association of SHN tiers with cortical indices after controlling for MQC. Covariates included age, gender, estimated intracranial volume (fixed effects), site, and scanner manufacturer (random effects).

Extended Data Fig. 6 Included Year 2 follow-up scans.

Among 11,875 total participants at baseline, Year 2 T1 scans were available from 7,829; of these, 6,941 were eligible for processing with FreeSurfer, and 1,000 were semi-randomly selected for MQC ratings (see Methods for additional details).

Extended Data Fig. 7 Relationship of surface hole number (SHN) to manual quality control (MQC) in selected Year 2 follow-up scans (n = 999).

(a) Density plot of SHN values, stratified by MQC ratings. (b) Distribution of MQC ratings as related to SHN for each SHN tier.

Extended Data Fig. 8 Effects of manual edits on sMRI indices, stratified by MQC rating.

Edits were conducted on 150 scans with MQC = 1 and 30 scans with MQC = 2. Maps reflect effect sizes of pre-to-post edit changes in (a) cortical thickness, (b) cortical surface area, and (c) cortical volume. Note increased effects of edits in MQC = 2 relative to MQC = 1. (d) Post-edit thickness reduction along the superior sagittal sinus, which is frequently misattributed to the pial surface during preprocessing.

Extended Data Fig. 9 Composite maps showing location and direction of sMRI measurement errors detected by manual quality control and cortical edits, among MQC = 1 and 2 scans only.

Highlighted regions show either significant differences in sMRI indices between MQC = 1 and MQC = 2 scans, significant effects of cortical edits, or both. Note that, when co-occurring within the same region, errors due to poor scan quality (assessed by MQC) do not necessarily occur in the same direction as errors requiring manual edits.

Extended Data Fig. 10 Effects of manual edits on cortical thickness and age–thickness relationships in the MGH sample, stratified by age group (n = 292).

(a) Violin plots show effect size and related variance of manual edits on cortical thickness in the MGH sample, stratified by age group. The 18 included ROIs are those that also showed significant effects of edits on cortical thickness in the ABCD cohort, in the same direction. Regions are ordered by effect size in the 8- to 10-year-old group. Means are represented by black circles. Note that effect sizes and variance diminished with age. (b) Effects of edits on the magnitude of age-thickness relationships within the MGH sample across 68 cortical ROIs, stratified by age group. Each marker shows the age-thickness effect size for a given ROI. Edits strengthened age-thickness effects (that is, effect sizes became more negative, indicated by lower intercept of the best-fit line compared to the dashed unity line) at age 8-10, but not in other age groups.
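
The comparison against the unity line in panel (b) can be sketched as a least-squares fit of post-edit on pre-edit effect sizes across ROIs. On the unity line, post-edit equals pre-edit, so its intercept is 0 and its slope is 1; a fitted intercept below 0 with a slope near 1 indicates that effects became more negative after edits. The function name and the per-ROI values below are made up for illustration and are not data from the study.

```python
import math


def fit_line(x, y):
    """Least-squares fit y = intercept + slope * x; returns (intercept, slope)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return my - slope * mx, slope


# Hypothetical age-thickness effect sizes for four ROIs, before and after edits.
pre_edit = [-0.10, -0.20, -0.30, -0.40]
post_edit = [-0.15, -0.25, -0.35, -0.45]

intercept, slope = fit_line(pre_edit, post_edit)

# Unity line: intercept 0, slope 1. A negative intercept means the fitted
# line sits below unity, so post-edit effects are more negative.
strengthened = intercept < 0
print(f"intercept = {intercept:.3f}, slope = {slope:.3f}, strengthened = {strengthened}")
```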

Supplementary information

Reporting Summary

Supplementary Tables

Supplementary Tables 1–14.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Elyounssi, S., Kunitoki, K., Clauss, J.A. et al. Addressing artifactual bias in large, automated MRI analyses of brain development. Nat Neurosci 28, 1787–1796 (2025). https://doi.org/10.1038/s41593-025-01990-7

  • DOI: https://doi.org/10.1038/s41593-025-01990-7
