DeepCor: denoising fMRI data with contrastive autoencoders

Zhu, Yu; Aglinskas, Aidas; Anzellotti, Stefano

doi:10.1038/s41592-025-02967-x

Brief Communication
Published: 28 November 2025

Nature Methods volume 23, pages 334–337 (2026)

Abstract

Functional magnetic resonance imaging (fMRI) allows noninvasive measurement of neural activity with high spatial resolution. However, fMRI data are affected by noise. Here we introduce and evaluate a denoising method (DeepCor) that utilizes deep generative models to disentangle and remove noise. The method is applicable to data from single participants. DeepCor outperforms other state-of-the-art denoising approaches on a variety of simulated datasets. In real fMRI data, DeepCor enhances BOLD signal responses to face stimuli, outperforming CompCor by 215%.

**Fig. 1: Overview of DeepCor workflow and its performance on simple synthetic datasets.**

**Fig. 2: Evaluation of DeepCor performance on BrainIAK-simulated and real fMRI data.**

Data availability

The fMRI data used to extract noise parameters for realistic data simulations are available via DataSpace at http://arks.princeton.edu/ark:/88435/dsp01dn39x4181 and the StudyForrest fMRI data are available via the Study Forrest repository at https://www.studyforrest.org/. The ABIDE I dataset has been used for functional connectivity analysis (https://fcon_1000.projects.nitrc.org/), and the resulting seven networks are available via GitHub at https://github.com/ThomasYeoLab/CBIG/blob/master/stable_projects/brain_parcellation/Yeo2011_fcMRI_clustering/1000subjects_reference/Yeo_JNeurophysiol11_SplitLabels/MNI152/Centroid_coordinates/Yeo2011_7Networks_N1000.split_components.FSL_MNI152_2mm.Centroid_RAS.csv. Source data are provided with this paper.

Acknowledgements

This work was supported by an NSF CAREER grant (1943862) awarded to S.A. and by a startup grant from Boston College awarded to S.A. The funders had no role in the study design, data collection and analysis, decision to publish or preparation of the manuscript. We thank M. Gregas, F. Pari and W. Qiu for support with high-performance computing and A. Anzellotti for suggesting the inclusion of voxel coordinate information as input to the denoising model.

Author information

These authors contributed equally: Yu Zhu, Aidas Aglinskas.

Authors and Affiliations

Department of Psychology and Neuroscience, Boston College, Boston, MA, USA
Yu Zhu, Aidas Aglinskas & Stefano Anzellotti
Center for Computational Molecular Biology, Brown University, Providence, RI, USA
Yu Zhu

Contributions

S.A. conceived the project. Y.Z. and S.A. designed the model. Y.Z. developed the software, and A.A. and S.A. adapted it to task data. Y.Z. generated the simulated data. A.A. preprocessed the real data. Y.Z. performed the analysis on simulated data and functional connectivity analysis. A.A. performed the analysis on task fMRI. Y.Z. and A.A. prepared the figures. Y.Z., A.A. and S.A. wrote the manuscript.

Corresponding authors

Correspondence to Yu Zhu or Stefano Anzellotti.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Methods thanks Erica Busch and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Primary Handling Editor: Nina Vogt, in collaboration with the Nature Methods team. Peer reviewer reports are available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Comparison of denoising performance: CompCor vs DeepCor.

Box plots showing mean R-squared values (n=10) for DeepCor and CompCor denoising methods under six simulation conditions, categorized by Linear and Nonlinear noise types with standard deviations (SD) of 0.5, 1.0, and 2.0. Each box plot represents 10 independent simulation runs per condition and method. The central line within each box represents the median, while the upper and lower bounds correspond to the first (Q1) and third quartiles (Q3), capturing the interquartile range (IQR). Whiskers extend to the smallest and largest values within 1.5 times the IQR, while individual points beyond this range are plotted as outliers. Significance bars indicate statistical differences between methods, with p = 9.13 × 10⁻⁵ < 0.001 (***) for all conditions except Nonlinear (SD=2.0), where p = 0.07 (*).
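The box-plot convention described in this caption (median, quartiles, whiskers at 1.5 times the IQR, outliers beyond) can be illustrated with a minimal sketch. The function name `box_plot_stats` and the choice of the "inclusive" quartile method are assumptions made for illustration; plotting libraries differ in how they interpolate quartiles, and the source does not say which rule was used.

```python
from statistics import quantiles


def box_plot_stats(values):
    """Summarize values the way the captions describe a box plot.

    Assumption: quartiles use linear interpolation between order statistics
    (the "inclusive" method); the figures may have used a different rule.
    """
    data = sorted(values)
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    # Whiskers end at the most extreme data points that lie inside the fences.
    inside = [v for v in data if lower_fence <= v <= upper_fence]
    outliers = [v for v in data if v < lower_fence or v > upper_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": iqr,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": outliers,
    }


if __name__ == "__main__":
    # Toy values, not data from the paper.
    print(box_plot_stats([1, 2, 3, 4, 5, 6, 7, 8, 100]))
```

Note that the whiskers stop at actual data points rather than at the fences themselves, which is why a whisker can be shorter than 1.5 times the IQR.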

Extended Data Fig. 2 Robustness of DeepCor to random seed variability.

(a) Box plots showing the R-squared values (n=1500) of individual voxels for the simple simulation scenario, where the signal and noise are combined linearly with standard deviation (SD) = 1. (b) Box plots illustrating the R-squared distribution (n=2323) under the realistic simulation setting. Each box plot represents a different random seed. The central line within each box represents the median, while the upper and lower bounds correspond to the first (Q1) and third quartiles (Q3), capturing the interquartile range (IQR). Whiskers extend to the smallest and largest values within 1.5 times the IQR, while individual points beyond this range are plotted as outliers.
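As a rough illustration of a per-voxel R-squared score, the sketch below compares a denoised time series with its ground-truth signal. It assumes R-squared means the coefficient of determination (1 minus residual over total sum of squares); the captions do not state the exact definition, and a squared Pearson correlation would be another plausible reading. The function names are hypothetical.

```python
def r_squared(truth, denoised):
    """Coefficient of determination of a denoised series against ground truth.

    Assumption: R^2 = 1 - SS_res / SS_tot. The source does not spell out
    which R-squared definition was used.
    """
    mean_truth = sum(truth) / len(truth)
    ss_res = sum((t - d) ** 2 for t, d in zip(truth, denoised))
    ss_tot = sum((t - mean_truth) ** 2 for t in truth)
    return 1.0 - ss_res / ss_tot


def mean_r_squared(truth_voxels, denoised_voxels):
    """Average the per-voxel scores, as in the 'mean R-squared' summaries."""
    scores = [r_squared(t, d) for t, d in zip(truth_voxels, denoised_voxels)]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Toy voxels: one recovered perfectly, one replaced by its mean.
    truth = [[1.0, 2.0, 3.0], [1.0, 2.0, 3.0]]
    denoised = [[1.0, 2.0, 3.0], [2.0, 2.0, 2.0]]
    print(mean_r_squared(truth, denoised))
```

Under this definition a voxel whose denoised series is no better than a constant at the mean scores 0, and scores can go negative when denoising makes the fit worse.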

Extended Data Fig. 3 Robustness of DeepCor and CompCor to the number of time points in input data.

(a) Box plots showing the mean R-squared values (n=10) for DeepCor and CompCor in relationship to the number of time points in the simple simulation scenario, where the signal and noise are combined linearly with a standard deviation of 1. Each box plot represents results across 10 independent seeds, with significance bars indicating statistical differences between methods (p = 9.13 × 10⁻⁵ < 0.001, *** for all time points). (b) Box plots illustrating the mean R-squared values (n=10) for CompCor and DeepCor across different time points under the BrainIAK realistic simulation setting. Significance bars denote statistical differences between methods, with p = 9.13 × 10⁻⁵ < 0.001 (***) for all conditions. The central line within each box represents the median, while the upper and lower bounds correspond to the first (Q1) and third quartiles (Q3), capturing the interquartile range (IQR). Whiskers extend to the smallest and largest values within 1.5 times the IQR, while individual points beyond this range are plotted as outliers.

Extended Data Fig. 4 Robustness of DeepCor and CompCor to the number of training voxels in input data.

(a) Box plots showing the mean R-squared values (n=10) for CompCor and DeepCor in relationship to the number of training voxels in the simple simulation scenario, where the signal and noise are combined linearly with a standard deviation of 1. Each box plot represents results across 10 independent seeds, with significance bars indicating statistical differences between methods (p = 9.13 × 10⁻⁵ < 0.001, *** for all conditions). (b) Box plots illustrating the mean R-squared values (n=10) for CompCor and DeepCor across different training voxels under the BrainIAK realistic simulation setting. Significance bars denote statistical differences between methods, with p = 9.13 × 10⁻⁵ < 0.001 (***) for all conditions. The central line within each box represents the median, while the upper and lower bounds correspond to the first (Q1) and third quartiles (Q3), capturing the interquartile range (IQR). Whiskers extend to the smallest and largest values within 1.5 times the IQR, while individual points beyond this range are plotted as outliers.

Extended Data Fig. 5 Impact of latent dimension on DeepCor’s performance and final validation loss.

(a-b) Box plots illustrating the relationship between the latent dimension size and mean R-squared values (n=10) for DeepCor in both (a) the simple simulation scenario and (b) the realistic BrainIAK simulation scenario. The latent dimension indicates the output size of each encoder; the total latent dimension is twice this size, as it results from concatenating the outputs of two encoders. Each box plot represents results across 10 independent seeds. (c-d) Box plots depicting the validation loss (n=10) at the final training epoch as a function of latent dimension size in (c) the simple simulation scenario and (d) the realistic BrainIAK simulation scenario. Each box plot represents results across 10 independent seeds. The central line within each box represents the median, while the upper and lower bounds correspond to the first (Q1) and third quartiles (Q3), capturing the interquartile range (IQR). Whiskers extend to the smallest and largest values within 1.5 times the IQR, while individual points beyond this range are plotted as outliers.

Extended Data Fig. 6 Impact of number of principal components (PCs) on CompCor’s performance.

(a) Box plots of the mean R-squared score across 10 runs with different seeds (n=10), comparing the denoised testing voxels with their ground truth. The data come from the simple simulation dataset, a linear combination of signal and noise with the standard deviation of the noise proportion set to 1, and were evaluated using different numbers of principal components (PCs) in the CompCor method. The horizontal line represents the mean R-squared value achieved by DeepCor on the same simulated dataset. (b) Box plots (n=10), similar to (a), but evaluated on the BrainIAK simulation dataset. Simulation parameters were the same as the ones reported in the main text. The central line within each box represents the median, while the upper and lower bounds correspond to the first (Q1) and third quartiles (Q3), capturing the interquartile range (IQR). Whiskers extend to the smallest and largest values within 1.5 times the IQR, while individual points beyond this range are plotted as outliers.

Extended Data Fig. 7 Impact of latent dimension on DeNN’s performance.

Box plots showing DeNN’s mean R-squared scores across 10 runs with different random seeds (n=10), evaluating the denoised testing voxels against their ground truth from the realistic BrainIAK simulation dataset. The latent dimension is set to the default value of 16 (labeled as “Default” on the left) and doubled to 32 (labeled as “Double” on the right). Simulation and other model parameters were the same as the ones reported in the main text. The central line within each box represents the median, while the upper and lower bounds correspond to the first (Q1) and third quartiles (Q3), capturing the interquartile range (IQR). Whiskers extend to the smallest and largest values within 1.5 times the IQR, while individual points beyond this range are plotted as outliers.

Extended Data Fig. 8 Robustness of CompCor and DeepCor to the number of ROI voxels mixed in the RONI.

In real fMRI data, RONI voxels can contain some proportion of signal of neural origin. We tested DeepCor’s robustness to this by including different proportions of ROI voxels in the RONI. Box plots showing the mean R-squared values (n=10) for DeepCor and CompCor in relationship to the percent of ROI voxels included in the RONI. Each box plot represents results across 10 independent seeds, with significance bars indicating statistical differences between methods (p = 9.13 × 10⁻⁵ < 0.001, *** for all conditions). The central line within each box represents the median, while the upper and lower bounds correspond to the first (Q1) and third quartiles (Q3), capturing the interquartile range (IQR). Whiskers extend to the smallest and largest values within 1.5 times the IQR, while individual points beyond this range are plotted as outliers.

Extended Data Fig. 9 Functional connectivity analysis.

(a) Overview of the functional connectivity evaluation pipeline, incorporating a Yeo et al.-defined 51-region brain mask. The pipeline processes whole-brain fMRI data, applies denoising methods, extracts regional mean signals, and computes interregional Pearson correlations to generate functional connectivity matrices. (b) Functional connectivity matrices from a single participant, derived from raw data, as well as CompCor-, DeNN-, and DeepCor-denoised fMRI signals. (c) Bar plot comparing within-network and between-network correlation strengths (n=200) across the four methods (raw data, CompCor, DeNN, and DeepCor). Error bars represent standard errors. Statistical significance was assessed using a one-sided Wilcoxon signed-rank test, comparing each denoising method (CompCor, DeNN, and DeepCor) with Raw data for Within-Network and Between-Network correlations, and comparing DeepCor with Raw, CompCor, and DeNN for the Difference (Within - Between) measure, with significance levels denoted as p < 0.001 (***). Exact p-values: Within-Network (vs Raw) - CompCor p = 2.8 × 10⁻³³, DeNN p = 7.2 × 10⁻³⁵, DeepCor p = 9.0 × 10⁻³²; Between-Network (vs Raw) - CompCor p = 4.5 × 10⁻³⁴, DeNN p = 7.2 × 10⁻³⁵, DeepCor p = 7.2 × 10⁻³⁵; Difference (Within-Between; DeepCor vs others) - vs Raw p = 1.0 × 10⁻³⁴, vs CompCor p = 2.9 × 10⁻³⁴, vs DeNN p = 9.6 × 10⁻³⁴. (d) Mean correlation +/- SEM between the FEF and PCC, two regions exhibiting anticorrelated activity, across denoising methods. Background shading indicates the correlation values, scaled consistently with the connectivity matrices. FEF: Frontal Eye Field, PCC: Posterior Cingulate Cortex.
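The last steps of the pipeline in panel (a), regional mean signals followed by interregional Pearson correlations and a within-network versus between-network summary, can be sketched as follows. This is a simplified illustration on toy inputs, not the authors' code: the function names, the nested-list data layout, and the choice to average raw correlation values (rather than, say, Fisher-transformed ones) are all assumptions.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length time series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def region_mean(voxel_series):
    """Average the voxel time series of one region at each time point."""
    n_vox = len(voxel_series)
    n_time = len(voxel_series[0])
    return [sum(v[t] for v in voxel_series) / n_vox for t in range(n_time)]


def connectivity_matrix(regions):
    """Region-by-region matrix of Pearson correlations of mean signals."""
    means = [region_mean(r) for r in regions]
    k = len(means)
    return [[pearson(means[i], means[j]) for j in range(k)] for i in range(k)]


def within_between(fc, network_labels):
    """Mean correlation for region pairs in the same vs. different networks."""
    within, between = [], []
    for i in range(len(fc)):
        for j in range(i + 1, len(fc)):
            same = network_labels[i] == network_labels[j]
            (within if same else between).append(fc[i][j])
    return sum(within) / len(within), sum(between) / len(between)


if __name__ == "__main__":
    # Three toy regions; the first two are assigned to the same network.
    regions = [
        [[1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 4.0]],
        [[2.0, 4.0, 6.0, 8.0]],
        [[4.0, 3.0, 2.0, 1.0]],
    ]
    fc = connectivity_matrix(regions)
    print(within_between(fc, network_labels=[0, 0, 1]))
```

A larger gap between the within-network and between-network means corresponds to the "Difference (Within - Between)" measure reported in panel (c).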

Extended Data Fig. 10 Algorithm for computing the padding size.

The algorithm takes the number of time points as input and outputs the padding arguments for each layer of the encoder and decoder.
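The authors' algorithm is given in the figure and is not reproduced here. As a hedged illustration of why such a step is needed, the sketch below applies the standard length formulas for strided one-dimensional convolutions and transposed convolutions (dilation 1) to work out how much extra output padding each decoder layer needs to restore the original number of time points. The layer count, kernel size, stride, and padding values are hypothetical defaults, and the function names are invented for this example.

```python
def conv_out_len(n, kernel, stride, padding):
    """Output length of a strided 1D convolution (dilation 1)."""
    return (n + 2 * padding - kernel) // stride + 1


def plan_padding(n_timepoints, n_layers=3, kernel=3, stride=2, padding=1):
    """Return encoder lengths and the decoder output padding per layer.

    Hypothetical architecture: n_layers identical strided convolutions in
    the encoder, mirrored by transposed convolutions in the decoder.
    """
    lengths = [n_timepoints]
    for _ in range(n_layers):
        lengths.append(conv_out_len(lengths[-1], kernel, stride, padding))

    output_padding = []
    # Walk the decoder from the bottleneck back to the input length.
    for layer in range(n_layers, 0, -1):
        # Length a transposed convolution produces with no output padding.
        base = (lengths[layer] - 1) * stride - 2 * padding + kernel
        # The shortfall is always in [0, stride) because of the floor
        # division in the encoder.
        output_padding.append(lengths[layer - 1] - base)
    return lengths, output_padding


if __name__ == "__main__":
    lengths, output_padding = plan_padding(100)
    print(lengths)         # encoder lengths, input first
    print(output_padding)  # decoder output padding, bottleneck first
```

Because the floor division in the encoder discards a remainder, the required output padding depends on the input length, which is why the number of time points has to be an input to any such calculation.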

Supplementary information

Supplementary Information

Supplementary Tables 1–10 and Notes 1–3.

Reporting Summary

Peer Review File

Source data

Source Data Fig. 2: Statistical source data.
Source Data Extended Data Fig. 1: Statistical source data.
Source Data Extended Data Fig. 2: Statistical source data.
Source Data Extended Data Fig. 3: Statistical source data.
Source Data Extended Data Fig. 4: Statistical source data.
Source Data Extended Data Fig. 5: Statistical source data.
Source Data Extended Data Fig. 6: Statistical source data.
Source Data Extended Data Fig. 7: Statistical source data.
Source Data Extended Data Fig. 8: Statistical source data.
Source Data Extended Data Fig. 9: Statistical source data.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Zhu, Y., Aglinskas, A. & Anzellotti, S. DeepCor: denoising fMRI data with contrastive autoencoders. Nat Methods 23, 334–337 (2026). https://doi.org/10.1038/s41592-025-02967-x

Received: 29 November 2023
Accepted: 29 October 2025
Published: 28 November 2025
Version of record: 28 November 2025
Issue date: February 2026
DOI: https://doi.org/10.1038/s41592-025-02967-x
