Real-world unified denoising for multi-organ fast MRI: a large-scale prospective validation

Shao, Yuchen; Huang, Hongyan; Zhang, Lingyan; Li, Dongsheng; Ding, Zhiguang; Wang, Fan; Chen, Shengli; Lin, Shiwei; Gu, Yuning; Du, Mu; Li, Hongbing; Liang, Jiuping; Huang, Xiaoqian; Liu, Aowen; Zhong, Jiafu; Zhan, Yiqiang; Zhou, Xiang Sean; Shi, Feng; Liao, Shu; Sun, Kaicong; Shen, Dinggang; Qiu, Yingwei

doi:10.1038/s41746-026-02548-y


Article
Open access
Published: 19 March 2026

Yuchen Shao¹^na1,
Hongyan Huang²^na1,
Lingyan Zhang³^na1,
Dongsheng Li⁴^na1,
Zhiguang Ding²,
Fan Wang²,
Shengli Chen²,
Shiwei Lin²,
Yuning Gu⁵,
Mu Du⁶,
Hongbing Li⁷,
Jiuping Liang⁸,
Xiaoqian Huang⁵,
Aowen Liu⁵,
Jiafu Zhong⁵,
Yiqiang Zhan⁵,
Xiang Sean Zhou⁵,
Feng Shi⁵,
Shu Liao⁵,
Kaicong Sun¹,
Dinggang Shen^1,5,9 &
…
Yingwei Qiu²

npj Digital Medicine volume 9, Article number: 366 (2026)


Abstract

Lengthy acquisition time remains a key bottleneck for the widespread use of MRI in clinics. While accelerated MRI can reduce scan duration, it often introduces increased noise, compromising image quality and diagnostic reliability. In this study, we present a unified deep learning-based denoising model for multi-organ accelerated MRI, designed to operate directly on reconstructed images from commercial MRI systems. Our model was developed and validated on a prospectively collected, large-scale real-world dataset comprising 148,930 noisy-clean image pairs from six clinical centers and four major MRI vendors, spanning six organs and 96 MRI protocols. On a test set of 20,143 real-world image pairs, our model consistently outperforms state-of-the-art denoising methods. Importantly, downstream evaluation using tissue segmentation demonstrates a 7.05% improvement in Dice score across multiple organs compared to noisy images. The model further generalizes effectively to 46,870 external clinical images from four independent cohorts, highlighting its robustness across various scanners and acquisition protocols. To assess clinical utility, two experienced radiologists conducted blinded evaluations across multiple organs, focusing on overall image quality, diagnostic confidence, and disease diagnosis. The denoised images retained high visual fidelity and yielded diagnostic performance equivalent to clean images even at an acceleration factor of 3× relative to the clinical scanning setup, such that many acquisitions can be completed within one minute. This unified MRI denoising model holds great potential for various clinical applications.

Introduction

Magnetic resonance imaging (MRI) is a cornerstone of non-invasive diagnostic imaging, owing to its superior soft-tissue contrast. However, the clinical utility of MRI remains constrained by inherently prolonged data acquisition times. In the era of artificial intelligence (AI), acceleration of MRI acquisition can generally follow two algorithmic strategies: AI-driven MRI reconstruction from undersampled k-space data, and AI-driven denoising applied to routinely reconstructed MR images. The former approaches^1,2,3,4,5,6, while effective, necessitate large-scale, fully-sampled k-space datasets for training, which are challenging to obtain due to their limited clinical relevance, substantial storage requirements, and the complexities associated with data access and formatting across various MRI platforms. Consequently, such models are typically tailored for specific organs, imaging contrasts, or acceleration factors. In contrast, denoising approaches operating on magnitude images (DICOM) leverage data readily available from standard reconstruction pipelines such as those based on GRAPPA⁷ or SENSE⁸, particularly from the widely deployed 1.5 T scanners, and do not require access to multi-coil, complex k-space data. Thus, the development of robust, generalizable, and versatile denoising models for accelerated MRI bears significant clinical promise, offering a practical pathway to improving image quality within existing clinical workflows.

Traditionally, the development of denoising algorithms has relied on synthetically generated noisy images, created by superimposing Rician⁹ or mixed noise patterns onto clean images^10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25. While these strategies have enabled the advancement of denoising techniques, they rely on simplified noise models. The resulting gap between synthesized noise and real-world noise inherently limits these methods in clinical practice and may introduce unanticipated risks. Diffusion models (DMs)^26,27,28,29 represent the latest development in image denoising. Conditional diffusion models use the noisy input as the condition and sample the denoised output from the posterior distribution during the reverse diffusion process. However, these approaches can produce hallucinated anatomical structures, raising concerns regarding safety and reliability in clinical applications. More recently, there has been increasing interest in DM-based frameworks for addressing inverse problems^30,31,32,33. These methods typically embed a degradation model into the reverse diffusion process at inference, steering the reconstruction to be consistent with the observed measurements. Despite these advances, degradation models are commonly assumed to be linear for simplicity. Our observations of paired noisy and clean MR images indicate that real-world degradation in MRI is inherently non-linear, underscoring the need for denoising methods that more accurately reflect the complex nature of MRI noise for clinical utility.

To address the limitations posed by hallucinations in diffusion models and the oversimplified characterization of degradation processes across heterogeneous clinical environments, we prospectively assembled a large-scale dataset comprising 148,930 paired noisy-clean MRI slices from six organs, sourced from six medical centers using 1.5 T MRI scanners. Leveraging these real-world data, we developed a unified denoising framework for accelerated MR imaging, designed for seamless integration with standard commercial reconstruction algorithms. We performed extensive validation on internal (N = 102,060 slices) and external (N = 46,870 slices) cohorts from multiple centers. We systematically benchmarked our method against five state-of-the-art denoising models and further validated it through multi-reader assessments. Experimental results demonstrate that the proposed denoising model substantially extends the acceleration capacity of 96 routinely used MRI protocols, while remaining compatible with conventional reconstruction algorithms such as GRAPPA⁷ and SENSE⁸ on 1.5 T MR scanners. Notably, the acquisition time of these clinically representative MRI protocols can be reduced by 25% to 72% (30% on average), depending upon organ site and protocol (Supplementary Table 1). Importantly, many routinely used MRI protocols can be completed within one minute without compromising image quality and diagnostic accuracy, substantially enhancing the acquisition efficiency of MRI in clinical usage.

Results

Data sources and unified denoising network

Our unified denoising network was trained on a large-scale prospective dataset consisting of 5366 real-world noisy-clean volume pairs (N = 102,060 slice pairs). The dataset covers six organs, namely head (N = 37,482), knee (N = 8329), C-spine (N = 14,097), L-spine (N = 14,447), T-spine (N = 18,139), and shoulder (N = 9566), with 82 MRI protocols (e.g., T1w, T1-FLAIR, T2w, T2-FLAIR, DWI, PDw, DIXON) from three MRI manufacturers (Siemens, GE, and Philips), and was acquired from January 2024 to August 2024 in three hospitals in Shenzhen and Guangzhou, China. For external evaluation, we further collected 2157 volume pairs (N = 46,870 slice pairs) of healthy and non-healthy subjects on Siemens, GE, UIH, and Philips scanners at four data centers from October 2024 to March 2025, covering 96 MRI protocols in total (Siemens:29, GE:25, UIH:14, Philips:19). The noisy images were acquired by accelerating the clinically used MRI sequences through either k-space subsampling or deactivation of k-space averaging. The clean images were obtained by either full k-space sampling or k-space averaging. The acceleration rates of the clinically used sequences were mainly between 1.1× and 3.5×, depending on the MRI protocol and scanner. For convenience, we grouped these acceleration rates into three levels: 1.5×, 2×, and 3×. It is worth noting that the definition of “acceleration rate” used here differs from the one used in MRI reconstruction^{31,34,35,36,37,38}. In MRI reconstruction, the acceleration rate usually refers to the undersampling rate in k-space during acquisition, whereas in this work it is defined with respect to the number of excitations (averaged acquisitions). We summarize the population characteristics and imaging parameters in Supplementary Table 1.
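To make the averaging-based notion of acceleration concrete, the following minimal sketch relates an acceleration rate to the fraction of scan time saved and to the expected SNR relative to the averaged scan. It rests on two textbook idealizations that are assumptions here rather than measurements from this study: scan time is inversely proportional to the acceleration rate, and noise is uncorrelated between acquisitions, so SNR scales with the square root of the acquired data. The function names are illustrative.

```python
import math


def time_saved_fraction(acceleration: float) -> float:
    """Fraction of scan time saved, assuming time is proportional to 1 / acceleration."""
    if acceleration < 1.0:
        raise ValueError("acceleration rate must be >= 1")
    return 1.0 - 1.0 / acceleration


def relative_snr(acceleration: float) -> float:
    """SNR relative to the non-accelerated scan, assuming uncorrelated noise
    so that SNR scales with the square root of the amount of acquired data."""
    if acceleration < 1.0:
        raise ValueError("acceleration rate must be >= 1")
    return 1.0 / math.sqrt(acceleration)


# The three acceleration groups used in the text.
for rate in (1.5, 2.0, 3.0):
    saved = 100.0 * time_saved_fraction(rate)
    snr = 100.0 * relative_snr(rate)
    print(f"{rate}x: about {saved:.0f}% less scan time, about {snr:.0f}% of the reference SNR")
```

Under these assumptions, the SNR lost to acceleration is what the denoising model is expected to recover.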

As the key design principle of our model, the denoised MR images should combine high fidelity and good sharpness with suppressed noise for routinely used MRI sequences. Given the complexity and diversity of clinical scenarios, the model must handle various MRI contrasts and multiple organs, acquired on different MRI scanners under different acceleration factors. To this end, we propose a novel denoising framework that integrates a learnable real-world non-linear imaging model (describing the image degradation process) into a text-guided conditional DM framework, as shown in Fig. 1. The text-guided DM framework is employed to “inpaint” the missing anatomical structures in the noisy images by resorting to its generative nature, which is particularly effective at relatively high acceleration rates. Meanwhile, a degradation model is learned from real-world noisy-clean MR image pairs using the proposed multi-cycle (MC) training strategy. The non-linear degradation model is built on a non-generative model, which alleviates the hallucination issue of the diffusion model and hence promotes data fidelity. To make our denoising model more versatile, a pretrained CLIP text encoder is integrated into the diffusion model and finetuned with Low-Rank Adaptation (LoRA). The non-linear degradation model and the conditional DM equipped with the finetuned text encoder were trained on the internal dataset containing 3750 volume pairs (N = 71,780 slice pairs). At inference time, an objective function is constructed from the trained DM and degradation model with their parameters frozen, and it is solved iteratively along with the reverse diffusion process using a gradient-based algorithm. In terms of computational demands, with 10 sampling steps, our method has 248 M parameters in total and needs 6.31 GB of RAM for inference. A desktop computer with an NVIDIA GeForce RTX 3080 GPU (10 GB VRAM) processes a 256 × 256 DICOM image in only 0.39 s. More detailed descriptions of our model are given in the Methods section.
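The interplay between a frozen prior and a frozen degradation model at inference can be illustrated with a toy example. The sketch below is not the authors' implementation: the `tanh` degradation, the three-tap smoothing "prior", the linear noise schedule, and the step size are all hypothetical stand-ins chosen so that the loop runs on a 1D signal with NumPy alone. It only shows the general pattern of alternating a prior-based clean estimate with a gradient step on a data-fidelity term inside a reverse-diffusion-style loop.

```python
import numpy as np

GAIN = 1.5  # hypothetical parameter of the toy degradation


def degrade(x):
    """Toy non-linear degradation; a stand-in for a learned degradation model."""
    return np.tanh(GAIN * x)


def consistency_loss(x, y):
    """Data-fidelity term 0.5 * ||degrade(x) - y||^2."""
    return 0.5 * np.sum((degrade(x) - y) ** 2)


def consistency_step(x, y, step_size=0.2):
    """One gradient-descent step on the data-fidelity term (analytic gradient)."""
    residual = degrade(x) - y
    grad = residual * GAIN * (1.0 - np.tanh(GAIN * x) ** 2)
    return x - step_size * grad


def toy_prior(x):
    """Stand-in for the diffusion model's clean estimate: three-tap smoothing."""
    return np.convolve(x, np.array([0.25, 0.5, 0.25]), mode="same")


def guided_reverse(y, steps=10, seed=0):
    """Alternate prior estimate, data-consistency step, and re-noising."""
    rng = np.random.default_rng(seed)
    sigmas = np.linspace(1.0, 0.0, steps + 1)  # toy noise schedule ending at zero
    x = sigmas[0] * rng.standard_normal(y.shape)
    for i in range(steps):
        x0 = toy_prior(x)             # prior proposes a clean estimate
        x0 = consistency_step(x0, y)  # pull the estimate toward the measurement
        x = x0 + sigmas[i + 1] * rng.standard_normal(y.shape)  # noise for next level
    return x


# Synthetic 1D example: a smooth signal, degraded and corrupted by noise.
clean = 0.5 * np.sin(np.linspace(0.0, np.pi, 64))
y = degrade(clean) + 0.05 * np.random.default_rng(1).standard_normal(64)
x_hat = guided_reverse(y)
print("data-fidelity loss of the final estimate:", consistency_loss(x_hat, y))
```

The default of 10 steps mirrors the sampling-step setting quoted in the text; everything else in the loop is an assumption made for illustration.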

Internal evaluation

We conducted comprehensive evaluation of our model on the internal dataset, which consists of in total 102,060 real-world noisy-clean slice pairs of head, knee, C-spine, L-spine, T-spine, and shoulder. The internal dataset was split subject-wise into training (N = 71,780), validation (N = 10,137), and test (N = 20,143) sets. On the test dataset, we conducted systematic evaluation of our model from different aspects, including (1) benchmarking with five state-of-the-art (SOTA) denoising methods quantitatively and qualitatively; (2) analyzing tissue segmentation of multiple organs on denoised images; (3) evaluating the effectiveness of core components of our framework in the ablation study. Detailed descriptions are presented in the following sections.

To validate the denoising performance of our model, we compared it with five state-of-the-art denoising methods, including the CNN-based NBNet³⁹ and BME-X³³, and the diffusion model-based DDPM²⁶, RED-Diff³², and BlindDPS⁴⁰. BlindDPS is an extended version of DPS⁴¹ that employs a learned non-linear degradation model³⁹ instead of a pre-defined linear one. In the experiments, BlindDPS used the same degradation model as our method and, for a fair comparison, the same training strategy. We used three metrics for quantitative evaluation: the Peak Signal-to-Noise Ratio (PSNR), the Structural Similarity Index Measure (SSIM), and the Learned Perceptual Image Patch Similarity (LPIPS). LPIPS is a widely used metric that measures the perceptual similarity between two images by comparing features extracted by a DL model such as VGG⁴² or AlexNet⁴³. In our work, we employed the LPIPS module in Python based on VGG16.
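As a reference for the simplest of the three metrics, PSNR can be computed in a few lines. This is a minimal sketch that assumes both images are scaled to a common intensity range given by `data_range` (a parameter name chosen here); it is not the evaluation code used in the study.

```python
import numpy as np


def psnr(reference, estimate, data_range=1.0):
    """Peak signal-to-noise ratio in dB between a reference image and an estimate."""
    reference = np.asarray(reference, dtype=np.float64)
    estimate = np.asarray(estimate, dtype=np.float64)
    mse = np.mean((reference - estimate) ** 2)
    if mse == 0.0:
        return float("inf")  # identical images
    return 10.0 * np.log10(data_range ** 2 / mse)


# Synthetic example: a flat image compared with a copy offset by 0.1.
reference = np.zeros((8, 8))
estimate = reference + 0.1
print(f"PSNR: {psnr(reference, estimate):.2f} dB")
```

Higher PSNR and SSIM indicate a closer match to the clean image, whereas lower LPIPS indicates higher perceptual similarity.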

In Fig. 2a, we show a visual comparison with the other methods for representative MRI sequences. For each MRI sequence, we show both the denoised images and close-up views of the regions marked in green. Our model clearly outperforms the other methods in visual perception, especially in preserving structural fidelity and anatomical details. Moreover, our model effectively removes the noise in flat regions while retaining the fine details of lesion regions. This merit arises from our particular combination of the learned real-world non-linear degradation model and the text-guided denoising DM. We also validate this property in the External validation section. More visual comparisons can be found in Supplementary Figs. 2–9.

**Fig. 2: Comparison with state-of-the-art methods on the internal dataset.**

In Fig. 2b, we summarize the results of different methods in terms of SSIM, PSNR, and LPIPS for acceleration factors of 1.5×, 2×, and 3×. For each acceleration factor, our model (marked in orange) obtains the best performance in all three metrics. As the acceleration rate increases (fewer scans are averaged), the SNR drops due to increased noise. In this circumstance, a denoising algorithm plays a critical role in increasing diagnostic confidence. In addition, our model obtains a more evident performance gain in LPIPS than in PSNR and SSIM, indicating its advantage in visual perception.

In addition, we further evaluated our denoising model by tissue segmentation. To be specific, we adopted a widely used segmentation method named FAST⁴⁴ and segmented multiple organs. The segmentation results of different methods for acceleration factors of 1.5×, 2×, and 3× are illustrated in Dice score using barplots in Fig. 3a. We also performed statistical analysis for different organs, including white matter (WM), gray matter (GM), knee, C-spine, L-spine, and T-spine. We can see that our model outperforms the other methods significantly for all the investigated organs. The consistent performance gain in both the denoised images and their corresponding tissue segmentation verifies the superiority and effectiveness of our model.
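For reference, the Dice score used to compare segmentation maps measures the overlap between two binary masks. The sketch below is a minimal NumPy version for a single label; returning 1.0 when both masks are empty is a convention chosen here, not one stated in the text.

```python
import numpy as np


def dice_score(pred_mask, target_mask):
    """Dice coefficient 2|A ∩ B| / (|A| + |B|) between two binary masks."""
    pred = np.asarray(pred_mask, dtype=bool)
    target = np.asarray(target_mask, dtype=bool)
    denominator = pred.sum() + target.sum()
    if denominator == 0:
        return 1.0  # both masks empty: treated as perfect agreement (a convention)
    return 2.0 * np.logical_and(pred, target).sum() / denominator


# Synthetic example: two overlapping square masks.
mask_a = np.zeros((10, 10), dtype=bool)
mask_b = np.zeros((10, 10), dtype=bool)
mask_a[2:6, 2:6] = True  # 16 pixels
mask_b[4:8, 4:8] = True  # 16 pixels, 4 of them shared with mask_a
print(f"Dice: {dice_score(mask_a, mask_b):.3f}")
```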

**Fig. 3: Quantitative comparison of organ segmentation.**

Finally, we conducted ablation studies to evaluate the effectiveness of (1) the proposed MC strategy for learning the non-linear degradation model, (2) the conditional DM, and (3) the LoRA-finetuned CLIP text encoder. We conducted the experiments on the internal test dataset and list the results in Supplementary Table 2. The use of MC largely improves the denoising performance. In fact, from our empirical observations, MC stabilizes the training and improves the fidelity of the denoised images. In addition to the degradation model, the integration of the conditional DM further improves the denoising performance owing to its inherent generative nature, especially for highly accelerated MR images. Moreover, adopting the CLIP text encoder finetuned via LoRA further improves the denoising performance by explicitly providing information on the imaging contrast, organ, acceleration factor, and MRI manufacturer.
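For readers unfamiliar with LoRA, the idea is to keep a pretrained weight matrix frozen and learn only a low-rank update to it. The sketch below shows the standard formulation on a single linear layer with NumPy; the layer size, rank, and scaling are illustrative values, not the settings used for the CLIP text encoder in this work.

```python
import numpy as np


class LoRALinear:
    """Linear layer y = x W^T with a trainable low-rank update (alpha / r) * B A."""

    def __init__(self, weight, rank=4, alpha=8.0, seed=0):
        rng = np.random.default_rng(seed)
        d_out, d_in = weight.shape
        self.weight = weight                                # frozen pretrained weight
        self.A = 0.01 * rng.standard_normal((rank, d_in))   # trainable, small random init
        self.B = np.zeros((d_out, rank))                    # trainable, zero init
        self.scale = alpha / rank

    def __call__(self, x):
        base = x @ self.weight.T
        update = (x @ self.A.T) @ self.B.T
        return base + self.scale * update

    def trainable_params(self):
        return self.A.size + self.B.size


rng = np.random.default_rng(1)
W = rng.standard_normal((512, 512))  # illustrative layer size
layer = LoRALinear(W, rank=4)
x = rng.standard_normal((2, 512))
print("trainable:", layer.trainable_params(), "of", W.size, "frozen weights")
```

Because `B` starts at zero, the adapted layer initially reproduces the pretrained layer exactly, and only the two small matrices are updated during finetuning.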

External validation

In order to evaluate the generalizability of our model, we directly apply the pretrained model on a prospectively collected external dataset, which consists of 46,870 slice pairs from four cohorts. The external dataset covers slice pairs of head (N = 24,912), knee (N = 6778), C-spine (N = 4358), L-spine (N = 4284), T-spine (N = 2240), and shoulder (N = 4298) from Siemens (N = 8487), GE (N = 20,046), Philips (N = 17,702), and United Imaging Healthcare (UIH) (N = 635) with acceleration factors of 1.5× (N = 23,875), 2× (N = 16,359), and 3× (N = 6636). More descriptions of the population characteristics are detailed in the Supplementary Table 1. In this experiment, we not only conduct quantitative and qualitative evaluations on the denoised images and their corresponding segmentation maps, but also carry out extensive multi-round reader studies. The results of the multi-perspective evaluation are described in the following sections.

Similar to the evaluation on the internal dataset, we show the denoising performance of different methods on the external dataset in Fig. 4. Although the MR images were acquired on four different scanners (Siemens Aera 1.5T, GE SIGNA Voyager 1.5T, Philips Prodiva CX 1.5T, UIH uMR 660 1.5T) at four data centers, our model provides reliable denoised images of high fidelity, with an improvement of 1.09% in SSIM, 1.51 dB in PSNR, and 0.0218 in LPIPS compared to the other denoising methods, indicating its strong generalizability to unseen data. More visual comparisons can be found in Supplementary Figs. 2–9. In addition, we demonstrate across-slice consistency on a denoised 3D T1w image (UIH uMR 660, 1.5T, isotropic spacing of 0.67 mm) in Supplementary Fig. 10. Slice-wise denoising was performed in the axial view, and our model achieves good consistency and continuity across slices in the sagittal and coronal views.

**Fig. 4: Comparison with state-of-the-art methods on the external dataset.**

In addition, we perform tissue segmentation on the four external unseen cohorts (N = 46,870 slices) as well. The results are depicted in Dice score in barplots in Fig. 3b. It is shown that our method achieves consistent superiority in tissue segmentation for different organs compared to the other methods. Interestingly, the other DM-based methods obtain slightly worse segmentation performance than NBNet, which might be due to the lack of structural fidelity preservation.

Finally, reader studies were conducted by two radiologists with 5 and 3 years of specialty experience, respectively, to evaluate the clinical utility of our model. In the first reader study, one previously published denoising model (BME-X) was also included for comparison. As shown in Fig. 5a, b and Supplementary Fig. 13, our denoised images demonstrate statistically significant superiority over the noisy images and BME-X images in terms of both overall image quality and diagnostic confidence for head (n = 145), spinal (n = 50), and articular regions (n = 99) (all p < 0.05), achieving visual fidelity comparable to the ground-truth (GT) images (all p > 0.05). The second reader study assessed the diagnostic performance of our denoised images against the GT images across three clinical benchmarks: (1) Fazekas grading of white matter lesions, (2) signal intensity agreement in spinal disc, and (3) musculoskeletal pathology detection. Brain MRI analysis shows no significant differences in white matter lesion detection accuracy (Fig. 5c, p > 0.05). Cervical and lumbar spine MRI analyses reveal strong to near-perfect correlations in disc signal intensity indices (Spearman’s ρ = 0.84 for cervical, ρ = 0.98 for lumbar; both p < 0.001, Fig. 5d, e). Shoulder MRI evaluations (Fig. 5f) demonstrate promising diagnostic consistency for tendon integrity (supraspinatus/infraspinatus/subscapularis/teres minor), biceps tendon pathology, and cartilage and labral abnormalities. Knee MRI assessments (Fig. 5g) similarly achieve full diagnostic agreement with GT for meniscal tears, ligament injuries (ACL/PCL/MCL/LCL), and cartilage/bone marrow abnormalities. Notably, our denoising model exhibited superior performance compared to BME-X in detecting injuries of the infraspinatus tendon, teres minor tendon, and long head of the biceps tendon (Supplementary Fig. 1b). Good agreement was observed between the readers, with intraclass correlation coefficients ranging from moderate to excellent (0.67–0.99 [95% CI, 0.43–0.99]). Representative clinical cases are provided in Supplementary Fig. 1.
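Spearman's ρ, used above for the disc signal intensity indices, is the Pearson correlation of the ranks of the two measurement series. The following standard-library sketch assigns average ranks to ties; the numbers in the example are made up for illustration and are not study data.

```python
import math


def average_ranks(values):
    """1-based ranks of the values, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation computed on the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)


# Made-up paired measurements (for example, an index measured on two image types).
index_a = [0.42, 0.55, 0.31, 0.67, 0.58]
index_b = [0.40, 0.57, 0.35, 0.70, 0.52]
print(f"Spearman rho: {spearman_rho(index_a, index_b):.2f}")
```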

**Fig. 5: Multi-dimensional evaluation of imaging performance and clinical validity across anatomical regions.**

Public dataset validation

To verify the versatility of our denoising method on other MRI field strengths, organs, and international cohorts, we conducted additional experiments on public datasets: the M4Raw⁴⁵ low-field dataset and the fastMRI knee³⁴, prostate⁴⁶, and breast⁴⁷ datasets. The M4Raw dataset is a real-world low-field (0.3T) MRI dataset that contains repeated MRI scans of the brain. It comprises a training set of 128 subjects, a validation set of 30 subjects, and a test set of 25 subjects. Owing to the large domain gap, we fine-tuned our pretrained model on this 0.3T dataset. Following the data processing approach of existing methods²⁵, the training and validation datasets used three-repetition-averaged images as ground-truth images, while higher-SNR labels of the test dataset were created by averaging six repetitions for T1w and T2w, and four for FLAIR.

We further conducted comparison experiments on the 1.5T and 3.0T MR images of the fastMRI knee³⁴, prostate⁴⁶, and breast⁴⁷ datasets. Since noisy images are unavailable in fastMRI, we chose two noise synthesis approaches to simulate noisy images: (1) adding Rician noise with $\sigma = 13/255$, following the common setting for real-world MR images^24,25; (2) adding Rician noise with $\sigma \sim U[6, 25]$, covering a wide range of real-world scenarios. For the knee dataset, we performed both zero-shot and few-shot evaluation (using 50 knee volumes). For the prostate and breast datasets, we performed zero-shot evaluation. Results are shown in Supplementary Tables 3, 4 and 5. Our method significantly outperforms all the other compared methods in terms of PSNR and SSIM, indicating its versatility across different MRI field strengths, organs, noise distributions, and application settings such as zero-shot and few-shot.
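The two synthesis settings can be sketched as follows. Rician noise arises when independent Gaussian noise is added to the real and imaginary channels of a signal and the magnitude is taken. The sketch assumes magnitude images scaled to [0, 1] and assumes that the uniform range 6 to 25 is expressed on the same 0 to 255 intensity scale as the fixed setting; the latter is an interpretation, not something the text states explicitly. The phantom image is a placeholder.

```python
import numpy as np


def add_rician_noise(image, sigma, rng):
    """Magnitude of a complex signal whose real and imaginary parts
    each carry independent Gaussian noise of standard deviation sigma."""
    real = image + sigma * rng.standard_normal(image.shape)
    imag = sigma * rng.standard_normal(image.shape)
    return np.sqrt(real ** 2 + imag ** 2)


rng = np.random.default_rng(0)

# Placeholder phantom scaled to [0, 1]: a bright square on a dark background.
clean = np.zeros((256, 256))
clean[64:192, 64:192] = 0.8

# Setting (1): fixed noise level.
fixed_sigma = 13.0 / 255.0
noisy_fixed = add_rician_noise(clean, fixed_sigma, rng)

# Setting (2): noise level drawn per image; the division by 255 is an assumption.
random_sigma = rng.uniform(6.0, 25.0) / 255.0
noisy_random = add_rician_noise(clean, random_sigma, rng)

print(f"fixed sigma = {fixed_sigma:.4f}, sampled sigma = {random_sigma:.4f}")
```

In signal-free background regions this noise follows a Rayleigh distribution with a positive mean, which is one reason additive Gaussian noise is a poor substitute for magnitude MR images.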

Discussion

In this work, we have presented a unified MRI denoising framework, developed on 148,930 real-world noisy-clean image pairs covering 96 MRI protocols for six organs from Siemens, GE, Philips, and UIH. Our model is built on a text-guided diffusion model and integrates a learnable non-linear degradation model at inference time to ensure data consistency. We rigorously assess our model using an internal test set comprising 20,143 clinical image pairs and a multi-center external dataset of 46,870 image pairs, employing both quantitative and qualitative analyses across similarity metrics, tissue segmentation tasks, and blinded reader studies. Our model consistently surpasses the leading denoising approaches, particularly with respect to visual fidelity. Furthermore, when directly applied to real-world MR images exhibiting varying noise levels from accelerated clinical acquisitions, our approach maintains diagnostic performance on par with ground-truth images, underscoring its robustness and generalizability to previously unseen data.

Our empirical investigations and comprehensive assessments reveal that the proposed model exhibits two principal denoising merits: (1) superior fidelity preservation and (2) substantial perceptual enhancement. The high-fidelity outcome stems from a non-linear degradation model, learned from a large-scale, prospectively assembled dataset of real-world noisy-clean image pairs in conjunction with a multi-cycle learning strategy. Empirically, we found that using a degradation model, which generates noisy images from their clean counterparts, tends to yield sharper denoised images than using a restoration model, which restores clean images from their noisy counterparts. Our hypothesis is that, for non-generative models trained with an L2-norm loss, which drives predictions toward the mean, estimating noisy images is more difficult than estimating clean ones. Consequently, the degradation model requires a relatively noisier clean image to better align with its noisy reference image. Since the degradation model is harder to train to convergence than the restoration model, we introduce a restoration model to accompany the training of the degradation model in a multi-cycle fashion. These two models are treated as mutually inverse functions, and deep supervision is applied to each cycle. In this way, the degradation model can provide sharper denoised images with high fidelity in a stable manner. The second merit originates from the text-assisted diffusion model, whose generative nature facilitates denoising, especially for severely contaminated images. The degradation model and diffusion model, trained independently, are integrated during inference: the diffusion process is regularized by the degradation model, which constrains the denoising trajectory and further enhances denoising performance.

Our model achieves significant improvement compared to the state-of-the-art DM-based denoising methods on both the internal and external test data for diverse clinical scenarios. Notably, the superiority of our method is particularly pronounced in terms of visual quality. We observe that this advantage becomes increasingly apparent under stronger noise associated with increased acceleration factors. Quantitatively, our model yields an average PSNR improvement of 2.06 dB for head, 2.02 dB for knee, 1.87 dB for cervical spine, 1.77 dB for lumbar spine, 1.61 dB for thoracic spine, and 1.98 dB for shoulder, compared to the cutting-edge diffusion model-based baselines on the test dataset. Furthermore, subsequent tissue segmentation on the denoised images achieves a mean Dice coefficient of 85.57% across six organs, consistently surpassing alternative denoising methods and thereby further verifying the efficacy of our framework.

Moreover, our model shows promising generalizability to unseen cohorts without any finetuning. The large-scale evaluation on 2157 real-world volumes covering diverse clinical scenarios, including six organs, multiple noise levels, 96 MRI protocols, and four MRI vendors, demonstrates the denoising feasibility and reliability of our model. Specifically, we extensively evaluated the clinical impact of our model on external unseen data in multi-perspective reader studies. Our findings demonstrate that our model achieves promising overall image quality and diagnostic confidence comparable to the GT images across head, spine, and joint, significantly outperforming the noisy input in both evaluations. More importantly, our denoised images consistently achieve diagnostic performance equivalent to the GT images in three critical clinical assessments: (1) Fazekas grading for cerebral small vessel disease shows no statistically significant differences (p > 0.05); (2) signal intensity measurements in spinal disc exhibit strong correlations in both cervical (Spearman’s ρ = 0.84) and lumbar (ρ = 0.98) regions; and (3) key pathology detection rates in shoulder/knee examinations, including rotator cuff tears and meniscal injuries, reveal comparable diagnostic accuracy with no statistically significant differences (p > 0.05). These evaluations confirm the ability of our model to preserve clinically essential tissue contrast properties and anatomical integrity. The combination of highly accelerated acquisition and high-fidelity image quality makes our method a viable strategy for improving the efficiency of the MRI workflow without compromising diagnostic reliability relative to existing clinical setups.

Our denoising method is applied to the DICOM images reconstructed by the MR scanner. Therefore, it can be seamlessly integrated into the user interface of the MR scanner as a plugin or an optional image enhancement module. Our proposed method also has flexible practical applications. Existing clinical protocols on 1.5T MRI scanners usually use 1.2×–3× acceleration by undersampling in k-space. To achieve enhanced SNR, usually 2–3 averages of accelerated scans are required, which increases the overall acquisition time. Figure 5 illustrates a significant difference in diagnostic confidence between accelerated images without and with averaging (noisy image vs. GT image). Our denoising method achieves acceleration by eliminating the averaging without compromising diagnostic confidence. The acquisition time for clinically representative MRI protocols can be reduced by 30% on average. This shortens patient waiting time and improves equipment utilization. In addition, our method requires only a desktop computer with an NVIDIA GeForce RTX 3080 GPU (10 GB VRAM) to run inference within seconds, without encumbering the existing clinical workflow.

Despite the promising performance of our denoising method, our study has several limitations. First, our denoising model is a blind denoising method, applied to reconstructed DICOM images with no aliasing artifacts. That is, when using traditional reconstruction algorithms such as GRAPPA⁷ or SENSE⁸, the acceleration through subsampling in k-space is usually limited to at most 3× to avoid aliasing artifacts. Another line of work to reduce acquisition time relies on MRI reconstruction, which can deal with aliasing artifacts but needs access to the subsampled k-space data, which are practically difficult to collect. Recently, some physics-informed MRI reconstruction methods^36,37 have sought to alleviate the reliance on real-world k-space data and boost model generalizability for fast MRI reconstruction. These physics-informed frameworks rely on theoretical equations to synthesize data. In contrast, our degradation model is learned directly from large-scale real-world data, and it empirically captures complex, non-linear noise characteristics (e.g., system-specific electronic noise, physiological motion). A combination of real-world blind denoising approaches and synthesis-empowered MRI reconstruction approaches would be a potential research direction. Second, our method was developed and trained on real-world data of six anatomical regions, including the head, shoulder, knee, cervical spine, lumbar spine, and thoracic spine. Future efforts will extend the applicability of our method to more organs.

To summarize, we proposed a unified denoising model for real-world accelerated MRI, which integrates an elaborately designed non-linear degradation model into a text-assisted diffusion model, leveraging a large-scale, prospectively collected set of real-world noisy-clean image pairs. Extensive validation, including quantitative evaluation, qualitative assessments, tissue segmentation, and multi-center reader studies, demonstrates that our model achieves superior perceptual quality, promising diagnostic reliability, and strong generalizability across various imaging protocols. In a nutshell, by integrating our proposed denoising model, the acquisition time of clinically representative MRI protocols can be reduced by 30% on average, bringing the scan time below one minute without compromising diagnostic performance or diagnostic confidence compared to the routinely used scanning setups (typically 2–3 min or even longer). These findings highlight its promising potential for a broad range of clinical scenarios.

Methods

Ethical approval

The prospective data collection was approved by the institutional review board (IRB) at each institution with a waiver for informed consent: Huazhong University of Science and Technology Union Hospital (Nanshan Hospital, ky-2024-102301), The Second People’s Hospital of Panyu Guangzhou (py2y-xjsll-20250017), Longgang Central Hospital of Shenzhen (2025ECPJ170), Southern University of Science and Technology Hospital (SUCTH-014), Shenzhen Bao’an Songgang People’s Hospital (IRB-YJ-2025-043), and Shenzhen FuYong People’s Hospital (KY202603). In this study, patients were directly involved or recruited for the study. For all research involving human participants, informed consent to participate in the study has been obtained from participants.

Dataset collection and preprocessing

We collected internal data from three hospitals, namely Huazhong University of Science and Technology Union Hospital (Nanshan Hospital), The Second People’s Hospital of Panyu Guangzhou, and Longgang Central Hospital of Shenzhen in Guangdong Province, China, comprising both healthy and non-healthy subjects. The internal data were acquired on three 1.5T MRI scanners, i.e., Siemens MAGNETOM Amira 1.5T, GE SIGNA Voyager 1.5T, and Philips Ingenia 1.5T, over the period from January 2024 to August 2024. The internal dataset consists of a total of 5366 noisy-clean volume pairs with 102,060 paired slices overall, as shown in Fig. 1a. The ground-truth (GT) images were obtained by averaging the accelerated acquisitions or by using fully sampled acquisitions. Some subjects were scanned for multiple organs, including head, knee, cervical vertebra, lumbar vertebra, thoracic vertebra, and shoulder. Depending on the scanned organs, 18 imaging protocols such as T1-weighted (T1w), T2-weighted (T2w), proton density-weighted (PDw), fat-suppressed T1 fluid-attenuated inversion recovery (T1 FLAIR), fat-suppressed T2 FLAIR, and diffusion-weighted imaging (DWI) were acquired. More details of the imaging parameters can be found in Supplementary Table 1. In the experiments, the entire internal dataset was randomly divided into training, validation, and test sets in a ratio of 7:1:2.

Besides Nanshan Hospital, three additional hospitals, namely Southern University of Science and Technology Hospital, Shenzhen Bao’an Songgang People’s Hospital, and Shenzhen FuYong People’s Hospital, were involved in external validation (four hospitals in total), as shown in Supplementary Table 1. Specifically, 2157 volume pairs (46,870 slice pairs) of six organs were collected from Siemens (N = 8487), GE (N = 20,046), Philips (N = 17,702), and UIH (N = 635) scanners. The same imaging sequences as for training were used for external validation, covering accelerations of 1.5×, 2×, and 3×.

Because the noisy-clean image pairs were collected sequentially in clinical setups, the noisy and clean counterparts can differ in image size and object orientation. To address this issue, we first converted the images into the frequency domain by the fast Fourier transform, then applied zero-padding in the frequency domain, and transformed the result back to the image domain using the inverse Fourier transform. Subsequently, we used rigid registration to align the noisy images with the clean ones.
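The size-matching step described above can be illustrated with a minimal NumPy sketch. It is not the authors' preprocessing code: the function name `fourier_zero_pad`, the centered padding, and the intensity rescaling are assumptions made for illustration, and the rigid registration step is not shown.

```python
import numpy as np

def fourier_zero_pad(image, out_shape):
    """Resample a 2D image to a larger grid by zero-padding its centered spectrum."""
    in_h, in_w = image.shape
    out_h, out_w = out_shape
    if out_h < in_h or out_w < in_w:
        raise ValueError("out_shape must be at least as large as the input")
    # Centered spectrum: the DC component sits at index (in_h // 2, in_w // 2).
    spectrum = np.fft.fftshift(np.fft.fft2(image))
    padded = np.zeros((out_h, out_w), dtype=complex)
    top = out_h // 2 - in_h // 2
    left = out_w // 2 - in_w // 2
    # Copy the original spectrum so that its DC component lands on the new center.
    padded[top:top + in_h, left:left + in_w] = spectrum
    # Undo the shift, invert, and rescale so that the mean intensity is preserved.
    scale = (out_h * out_w) / (in_h * in_w)
    return np.real(np.fft.ifft2(np.fft.ifftshift(padded))) * scale

# Hypothetical example: bring a 4 x 4 image onto an 8 x 8 grid.
small = np.arange(16, dtype=float).reshape(4, 4)
resized = fourier_zero_pad(small, (8, 8))
```

Zero-padding the spectrum adds no new frequency content, so the image is interpolated onto the finer grid without changing its field of view.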

Overall model architecture and training

Our denoising framework contains two main modules, namely, a non-linear degradation model and a text-guided conditional diffusion model, as shown in Fig. 1b. These two modules were trained individually and used jointly in the inference phase. As mentioned above, diffusion models have strong generative ability and can provide plausible-looking results on severely degraded images. However, diffusion models are prone to hallucination, which can lead to serious consequences in the safety-critical medical imaging field. To mitigate this issue, we introduce a learnable degradation model (which, given a clean image and text information about the degradation, estimates the noisy counterpart) to strengthen the denoising fidelity and sharpness. From our empirical observations, directly training a degradation model on real-world image pairs is difficult due to the vast search space. We therefore propose a multi-cycle training strategy in which the degradation model is trained along with its inverse function, i.e., a restoration model. Specifically, we treat the degradation model and its restoration counterpart as one unit (referred to as a unity) and stack several such unities during training, each of which is deeply supervised. This training strategy encourages multi-cycle consistency and leads to more stable convergence and more reliable degradation. It is worth noting that the degradation and restoration models share the same NBNet³⁹ architecture, with channel numbers 1-128-64-32-16-8-16-32-64-128-1, but have different model weights. For the conditional diffusion model, the noisy image is employed as the condition of the denoiser, and a pretrained CLIP text encoder is finetuned using LoRA. We adopted the Dhariwal U-Net⁴⁸ as the denoiser of the diffusion model.

Multi-cycle training for non-linear degradation model

Denoising is regarded as an ill-posed inverse problem. Generally speaking, the inverse problem can be formulated as:

$$y=f\left(x\right)+\varepsilon$$

(1)

where $y$ is the noisy image (observation or measurement), $x$ is the expected clean image, and $f({\rm{\cdot }})$ is a system model or degradation model (for the denoising problem, $f(\cdot )$ is usually considered to be the identity function). $\varepsilon$ is usually simplified and assumed to be additive white Gaussian noise (AWGN). However, in reality, the recorded intensity value is not pixel-independent, and the noise is more complex than AWGN. For example, the noise in the magnitude images (DICOM) of MRI is intensity-related, which is usually modeled by a Rician distribution, and is more complex for multi-coil imaging. To better model the distribution of noise in real-world acquired MR images and deal with real-world scenarios, we learn a non-linear degradation function $f({\rm{\cdot }})$ by a neural network ${f}_{\varphi }$, aiming to better preserve the data fidelity and alleviate the hallucination effect.
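To illustrate why magnitude-image noise is intensity-related, the following sketch simulates the textbook single-coil Rician model. This model is not the learned degradation ${f}_{\varphi }$, and the function name and the values of `sigma` and the signal levels are arbitrary choices for illustration.

```python
import numpy as np

def add_rician_noise(signal, sigma, rng):
    """Magnitude of a complex signal whose real and imaginary channels carry Gaussian noise."""
    real = signal + rng.normal(0.0, sigma, size=signal.shape)
    imag = rng.normal(0.0, sigma, size=signal.shape)
    return np.sqrt(real**2 + imag**2)

rng = np.random.default_rng(0)
sigma = 10.0
background = add_rician_noise(np.zeros(200_000), sigma, rng)       # no true signal
tissue = add_rician_noise(np.full(200_000, 200.0), sigma, rng)     # high-SNR region

# In the background the magnitude follows a Rayleigh law with mean sigma * sqrt(pi / 2).
background_bias = background.mean() - 0.0
# At high SNR the bias shrinks to roughly sigma**2 / (2 * signal).
tissue_bias = tissue.mean() - 200.0
```

The same underlying Gaussian noise therefore produces a large positive bias in dark regions and almost none in bright tissue, which an additive, signal-independent $\varepsilon$ cannot describe.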

The straightforward way to approximate the real-world degradation function $f(\cdot )$ is to directly train an end-to-end model ${f}_{\varphi }$ using the clean-noisy image pairs. However, in the experiments, we found that this training paradigm leads to unstable results. To address this issue, we propose a novel training strategy based on multi-cycle consistency. Specifically, in addition to the degradation model ${f}_{\varphi }$, we introduce a restoration model ${f}_{\psi }$, which acts as the inverse function of the degradation model. The restoration model ${f}_{\psi }$ is cascaded with the degradation model ${f}_{\varphi }$ as a unity. A series of such unities is stacked and trained simultaneously with deep supervision. In this way, the errors of the later unities are backpropagated to the earlier ones, so that hierarchical constraints are imposed on the degradation model, which effectively reduces the solution space and stabilizes the performance of the degradation model. Note that different unities do not share model weights, and we set the number of unities to two in our experiments as a tradeoff between denoising performance and training resources.

Compared to the straightforward end-to-end training, the degradation model trained with our multi-cycle strategy improves the fidelity and sharpness of the denoised images. To further enhance the visual perception of the denoised images, we use an adversarial loss to train the stacked unities. Specifically, each of ${f}_{\psi }$ and ${f}_{\varphi }$ uses a weighted sum of the L1-norm, MS-SSIM, and adversarial losses as its loss function, and the overall loss is the sum of the losses for both ${f}_{\psi }$ and ${f}_{\varphi }$, as given below:

$${L}_{{\rm{Total}}}=\mathop{\sum }\limits_{n=1}^{N}\left[{L}_{n}\left({f}_{\psi }\left({x}_{n}^{{\rm{noisy}}}\right)\right)+{L}_{n}\left({f}_{\varphi }\left({x}_{n}^{{\rm{clean}}}\right)\right)\right]$$

(2)

$$\begin{array}{l}{L}_{n}\left({f}_{\psi }\left({x}_{n}^{\mathrm{noisy}}\right)\right)={\left|{f}_{\psi }\left({x}_{n}^{\mathrm{noisy}}\right)-{x}_{\mathrm{GT}}\right|}_{1}+{\lambda }_{1}{L}_{\mathrm{MS}-\mathrm{SSIM}}\left({f}_{\psi }\left({x}_{n}^{\mathrm{noisy}}\right),{x}_{\mathrm{GT}}\right)\\ \,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,+{\lambda }_{2}D\left({f}_{\psi }\left({x}_{n}^{\mathrm{noisy}}\right),{x}_{\mathrm{GT}}\right)\end{array}$$

(3)

$$\begin{array}{l}{L}_{n}\left({f}_{\varphi }\left({x}_{n}^{\mathrm{clean}}\right)\right)={\left|{f}_{\varphi }\left({x}_{n}^{\mathrm{clean}}\right)-{x}_{\mathrm{noisy}}\right|}_{1}+{\lambda }_{1}{L}_{\mathrm{MS}-\mathrm{SSIM}}\left({f}_{\varphi }\left({x}_{n}^{\mathrm{clean}}\right),{x}_{\mathrm{noisy}}\right)\\ \,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,+{\lambda }_{2}D\left({f}_{\varphi }\left({x}_{n}^{\mathrm{clean}}\right),{x}_{\mathrm{noisy}}\right)\end{array}$$

(4)

where ${x}_{n}^{{\rm{noisy}}}$ is the noisy input of the n-th restoration model, and ${x}_{n}^{{\rm{clean}}}$ is the clean (denoised) input of the n-th degradation model. ${x}_{{\rm{GT}}}$ is the clean GT image, ${x}_{{\rm{noisy}}}$ is the acquired noisy image, and N is the total number of unities. The function D(·) is the cross-entropy function serving as the discriminator, which distinguishes whether the outputs of the models ${f}_{\psi }$ and ${f}_{\varphi }$ resemble real-world clean or noisy images, respectively. The same set of weighting parameters ${\lambda }_{1},\,{\lambda }_{2}$ was used for ${L}_{n}\left({f}_{\psi }\left({x}_{n}^{{\rm{noisy}}}\right)\right)$ and ${L}_{n}({f}_{\varphi }({x}_{n}^{{\rm{clean}}}))$. Note that the degradation network trained in the first unity was used for inference. For better interpretation, we illustrate the output of the degradation model over the reverse diffusion process in Supplementary Fig. 11.
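The data flow behind Eq. 2 can be sketched as follows. The sketch is illustrative only and rests on stated assumptions: within each unity the restoration model runs first and feeds the degradation model, the re-degraded output becomes the noisy input of the next unity, and only the L1 term of Eqs. 3 and 4 is kept (the MS-SSIM and adversarial terms are omitted). The toy callables stand in for the NBNet-based networks.

```python
import numpy as np

def l1(a, b):
    """Mean absolute error between two images."""
    return float(np.mean(np.abs(a - b)))

def multi_cycle_loss(y_noisy, x_gt, unities):
    """Sum of deeply supervised losses over stacked (restore, degrade) unities.

    `unities` is a list of (restore, degrade) callables, one pair per unity;
    the pairs do not share weights.
    """
    total = 0.0
    x_in = y_noisy                       # x_1^noisy is the acquired noisy image
    for restore, degrade in unities:
        x_clean = restore(x_in)          # f_psi(x_n^noisy), supervised by the GT image
        total += l1(x_clean, x_gt)
        x_renoised = degrade(x_clean)    # f_phi(x_n^clean), supervised by the noisy image
        total += l1(x_renoised, y_noisy)
        x_in = x_renoised                # assumed input of the next unity
    return total

# Toy example: the "noise" is a constant offset of 0.5.
x_gt = np.ones((4, 4))
y_noisy = x_gt + 0.5
perfect = [(lambda v: v - 0.5, lambda v: v + 0.5)] * 2
imperfect = [(lambda v: v - 0.4, lambda v: v + 0.5)] * 2

loss_perfect = multi_cycle_loss(y_noisy, x_gt, perfect)
loss_imperfect = multi_cycle_loss(y_noisy, x_gt, imperfect)
```

In the imperfect case the error of the first restoration step propagates into the second unity and is penalized again there, which is the mechanism by which later unities constrain earlier ones.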

LoRA-finetuned text guidance for diffusion model

In this work, we aim to develop a unified denoising model that can handle multiple organs imaged with widely used clinical sequences on scanners from four MRI vendors. Since different organs, imaging sequences, and acceleration rates lead to large diversity in noise level and data distribution, we integrate a CLIP⁴⁹ text encoder to explicitly guide the denoising model. The text encoder embeds a text prompt constructed from the organ, contrast, acceleration rate, and vendor information, and is finetuned by LoRA⁵⁰ on the training data to adapt the pretrained CLIP to MRI data. Specifically, the template for the prompt is: “MR image acquired by facilities from [vendor name], organ in image is [organ name], acquisition protocol is [protocol name]”. Acceleration rate information is included in the protocol name. A 16-token prompt containing the above imaging metadata is fed into the pretrained CLIP text encoder of Stable Diffusion v2.1, which yields a 16 × 1024 text embedding. The image embedding is fused with the text embedding through cross attention, where the features of the denoiser bottleneck are flattened and used as the query (Q), and the text embeddings are used as the key (K) and value (V). With the text guidance, the score-based denoiser can better group features from different classes into clusters with enlarged inter-class distance and reduced intra-class distance. We show the effectiveness of the CLIP-based text encoder in Supplementary Fig. 12.
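The prompt template and the cross-attention fusion described above can be sketched in NumPy as follows. Only the template wording and the 16 × 1024 embedding shape come from the text. The bottleneck size, the head width, the single-head formulation, the random projection matrices, and the example protocol name are hypothetical, and random numbers stand in for the CLIP output.

```python
import numpy as np

def build_prompt(vendor, organ, protocol):
    """Fill the prompt template quoted in the text."""
    return (f"MR image acquired by facilities from {vendor}, "
            f"organ in image is {organ}, acquisition protocol is {protocol}")

def cross_attention(image_feats, text_emb, w_q, w_k, w_v):
    """Single-head cross attention: image tokens query the text tokens."""
    q = image_feats @ w_q                           # (n_pixels, d)
    k = text_emb @ w_k                              # (n_tokens, d)
    v = text_emb @ w_v                              # (n_tokens, d)
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)    # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over text tokens
    return weights @ v, weights

rng = np.random.default_rng(0)
n_pixels, c_img, d = 8 * 8, 32, 64                  # hypothetical bottleneck size and head width
image_feats = rng.normal(size=(n_pixels, c_img))    # flattened bottleneck features (Q source)
text_emb = rng.normal(size=(16, 1024))              # stand-in for the 16 x 1024 text embedding
w_q = rng.normal(size=(c_img, d)) / np.sqrt(c_img)
w_k = rng.normal(size=(1024, d)) / np.sqrt(1024)
w_v = rng.normal(size=(1024, d)) / np.sqrt(1024)

fused, weights = cross_attention(image_feats, text_emb, w_q, w_k, w_v)
prompt = build_prompt("Siemens", "knee", "PDw 2x")  # example values, not taken from the dataset
```

Each of the 64 image tokens receives a convex combination of the 16 text-token values, so the metadata modulates every spatial location of the bottleneck.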

Fidelity-enhanced optimization-based inference

Diffusion models have strong generative ability but are prone to generating unrealistic structures, which is fatal for medical diagnosis. To address this issue, we incorporate data consistency into the diffusion model framework following the inference paradigm of RED-diff³², which is a variational diffusion model:

$$\mathop{\min }\limits_{\varphi }{D}_{{\rm{KL}}}\left({q}_{\varphi }({x}_{0}{|y}){||p}({x}_{0}{|y})\,\right)=\mathop{\min }\limits_{\varphi }{{\rm{E}}}_{{x}_{0} \sim {q}_{\varphi }}\left[\log {q}_{\varphi }\left({x}_{0}|y\right)-\log p\left({x}_{0}|y\right)\right]$$

(5)

Here, $\varphi$ denotes the parameters of a learnable model that outputs the restoration result ${x}_{0}$ given the input $y$. Minimizing the KL objective in Eq. 5 is equivalent to the following minimization problem:

$$\mathop{\min }\limits_{\varphi }\,{{\rm{E}}}_{{x}_{0} \sim {q}_{\varphi }}[-\log p\left(y|{x}_{0}\right)]+{D}_{{\rm{KL}}}\left({q}_{\varphi }({x}_{0}{|y}){||p}({x}_{0})\,\right)$$

(6)
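The step from Eq. 5 to Eq. 6 follows from Bayes' rule. As a brief sketch in the notation above, the posterior factorizes as

$$\log p\left({x}_{0}|y\right)=\log p\left(y|{x}_{0}\right)+\log p\left({x}_{0}\right)-\log p\left(y\right)$$

and substituting this into Eq. 5 gives

$${D}_{{\rm{KL}}}\left({q}_{\varphi }({x}_{0}|y)\,||\,p({x}_{0}|y)\right)={{\rm{E}}}_{{x}_{0} \sim {q}_{\varphi }}\left[-\log p\left(y|{x}_{0}\right)\right]+{D}_{{\rm{KL}}}\left({q}_{\varphi }({x}_{0}|y)\,||\,p({x}_{0})\right)+\log p\left(y\right)$$

Since $\log p\left(y\right)$ does not depend on $\varphi$, it can be dropped from the minimization, which leaves the objective in Eq. 6.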

The reverse diffusion process involves solving the above objective using stochastic gradient descent (SGD), so that the intermediate state $\mu$ of the reverse diffusion step continuously approximates the clean image ${x}_{0}$ given the measurement $y$:

$${{\nabla }_{{\mu }_{t}}{\mathscr{L}}}_{{\mu }_{t}}={\nabla }_{{\mu }_{t}}\left[-\log p\left(y|{\mu }_{t}\right)+{D}_{{\rm{KL}}}\left({q}_{\varphi }({\mu }_{t}{|y}){||p}({\mu }_{t})\,\right)\right]$$

(7)

$${\mu }_{t-1}={\mu }_{t}-{\nabla }_{{\mu }_{t}}{{\mathscr{L}}}_{{\mu }_{t}}$$

(8)

The first term in the above equation, $-\log p\left(y|{\mu }_{t}\right)$, can be realized by the reconstruction (data fidelity) term, corresponding to ${\Vert y-{f}_{\varphi }\left({\mu }_{t}\right)\Vert }^{2}$ with ${f}_{\varphi }$ being the learned degradation model, to ensure the fidelity of the result. The distribution ${q}_{\varphi }\left({\mu }_{t}|y\right)$ can be expressed using the simple model ${q}_{\varphi }\left(\mu |y\right):=N(\mu ,{\sigma }^{2}{I}_{n})$, and the score function of $p({x}_{0})$ is modeled using the score-matching method^30,51. Ultimately, the gradient of the KL-regularization term in the above loss function can be simplified to the following equivalent form:

$${\nabla }_{{\mu }_{t}}{D}_{{\rm{KL}}}\left({q}_{\varphi }({\mu }_{t}{|y}){||p}({\mu }_{t})\,\right)=\lambda {\left[{\mu }_{t}-{g}_{\theta }\left({x}_{t},t\right)\right]}^{{\rm{T}}}$$

(9)

In the above formula, $\lambda$ is a constant, and ${g}_{\theta }\left({x}_{t},t\right)$ is the output of the score-matching denoiser that fits $p\left({x}_{0}\right)$. It is worth noting that the output of the score-matching denoiser is the clean image ${x}_{0}$ predicted from ${x}_{t}$, rather than the standard normal noise. For the complete mathematical proof of the above formula, please refer to Proposition 2 in RED-diff³². To ease interpretability, we illustrate the evolution of the denoised image over the reverse diffusion process in Supplementary Fig. 11.
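The update in Eqs. 7 to 9 can be illustrated with a toy NumPy sketch. It is not the authors' inference code: the degradation is a hypothetical elementwise scaling, the denoiser is a placeholder that returns a fixed image and ignores ${x}_{t}$ and $t$, and the explicit step size `step` is an assumption, since Eq. 8 is written without one.

```python
import numpy as np

def reverse_step(mu, y, degrade, degrade_grad, denoise, lam, step):
    """One gradient step on the data-fidelity term plus the regularizer of Eq. 9."""
    residual = degrade(mu) - y
    # Gradient of ||y - f(mu)||^2 for an elementwise degradation f.
    grad_fidelity = 2.0 * degrade_grad(mu) * residual
    # Eq. 9: pull the iterate towards the denoiser's clean-image prediction.
    grad_prior = lam * (mu - denoise(mu))
    return mu - step * (grad_fidelity + grad_prior)

a = 0.8                                    # hypothetical linear degradation f(mu) = a * mu
y = np.array([0.8, 1.6, 2.4])              # toy measurement
prior = np.array([1.2, 1.9, 2.8])          # toy stand-in for g_theta(x_t, t)
degrade = lambda m: a * m
degrade_grad = lambda m: a
denoise = lambda m: prior

mu = y.copy()
for _ in range(200):
    mu = reverse_step(mu, y, degrade, degrade_grad, denoise, lam=0.5, step=0.1)

# Closed-form fixed point of this toy problem, where the total gradient vanishes.
expected = (2 * a * y + 0.5 * prior) / (2 * a**2 + 0.5)
```

The iterate settles at a compromise between the measurement, mapped back through the degradation, and the denoiser prediction. In this simplified setting, that balance is how the fidelity term limits what the generative prior can add.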

Implementation details

The proposed framework was implemented using PyTorch. Both the degradation model and the diffusion model were trained on an NVIDIA A100 Tensor Core GPU with 80 GB of GPU memory. For the degradation model, we set the mini-batch size to 32. For both models, Adam was used as the optimizer with an initial learning rate of 10⁻⁴ and a weight decay of 0.998 applied every epoch. For the diffusion model, the denoiser was trained from scratch for 100 epochs, and the text encoder was finetuned by LoRA (rank = 8) for 100 epochs. We empirically set the weighting parameters to ${\lambda }_{1}={\lambda }_{2}=$ 0.5 and ${\lambda }_{3}=$ 0.25 in our experiments based on grid search. The reverse diffusion process runs for 200 iterations in total, and within each iteration one SGD-based optimization step is performed.
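If the per-epoch factor of 0.998 is read as a multiplicative learning-rate decay (an assumption, since the text calls it weight decay), the resulting schedule can be sketched as follows. The function name is hypothetical.

```python
def learning_rate(epoch, initial_lr=1e-4, decay=0.998):
    """Learning rate after `epoch` completed epochs under per-epoch exponential decay."""
    return initial_lr * decay ** epoch

# Schedule over the 100 training epochs mentioned in the text.
schedule = [learning_rate(e) for e in range(101)]
final_ratio = schedule[100] / schedule[0]   # fraction of the initial rate left at the end
```

Under this reading, the learning rate falls to roughly 82% of its initial value by the end of training, i.e., the decay is gentle.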

Reader studies

Reader studies were conducted by two radiologists with 5 and 3 years of specialty experience, respectively, to evaluate the clinical utility of our model. The first study used a four-point Likert scale to assess both image quality and diagnostic confidence. Image quality was graded from 1 (poor: inadequate signal-to-noise ratio, low spatial resolution, severe artifacts) to 4 (excellent: optimal signal-to-noise ratio, high resolution, minimal artifacts). Diagnostic confidence was similarly rated from 1 (inadequate pathological evaluation) to 4 (definitive diagnostic certainty with excellent lesion detection), with intermediate scores indicating increasing degrees of diagnostic uncertainty and morphological clarity. The second reader study systematically evaluated diagnostic performance across multiple anatomical regions. Cerebral small vessel disease was analyzed using the Fazekas scale (Grades 1–3) for white matter hyperintensities. Intervertebral disc degeneration was quantified by the disc signal intensity index, calculated as the normalized T2 signal ratio between the nucleus pulposus and cerebrospinal fluid. Shoulder MRI evaluations covered tendon integrity (supraspinatus, infraspinatus, subscapularis, teres minor, biceps long head), glenoid labral pathology, cartilage abnormalities, and bone marrow lesions. Knee MRI assessments included meniscal tear evaluation (medial/lateral), ligament integrity (ACL, PCL, MCL, LCL), cartilage pathology, and bone marrow abnormalities. All findings were classified using a four-category diagnostic certainty scale: 1 = definitely absent, 2 = probably absent, 3 = probably present, 4 = definitely present. To minimize bias, evaluations were distributed across three randomized sessions with ≥2-week intervals. Comprehensive blinding protocols included anonymization of patient identifiers and masking of acquisition methods (noisy images, Ours, and GT). Images from the same patient were not allowed to appear in the same evaluation session. Discordant interpretations were adjudicated through consensus review by both readers. Inter-rater reliability for qualitative ratings was assessed using a two-way mixed absolute-agreement intraclass correlation coefficient (Poor: <0.5; Moderate: 0.5 to <0.75; Good: 0.75 to <0.9; Excellent: ≥0.9). To benchmark our model against prior work, we further compared it with one previously published model (BME-X) in terms of both image quality and diagnostic performance.
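The agreement bands quoted above can be expressed as a small lookup. This helper is only an illustration of the stated thresholds. Its name is hypothetical, and it does not compute the intraclass correlation coefficient itself.

```python
def icc_category(icc):
    """Map an intraclass correlation coefficient to the agreement band used in the text."""
    if icc < 0.5:
        return "Poor"
    if icc < 0.75:
        return "Moderate"
    if icc < 0.9:
        return "Good"
    return "Excellent"

# Boundary values fall into the higher band, matching "0.5 to <0.75" and "0.75 to <0.9".
labels = [icc_category(v) for v in (0.49, 0.5, 0.75, 0.9)]
```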

Data availability

All public datasets involved in this study are available: (1) The M4Raw⁴⁵ low-field strength dataset is available at https://github.com/mylyu/M4Raw. (2) The fastMRI knee³⁴, prostate⁴⁶, and breast⁴⁷ datasets are available at https://fastmri.med.nyu.edu/. The large-scale prospective dataset consists of routinely collected private MR scanning images. Due to its sensitive nature and the risk of reidentification, this dataset is subject to controlled access through a structured application process. The application link is https://docs.google.com/forms/d/e/1FAIpQLScJslufmteDZBLuD76k697WUNUiEOykMZX-fewONAU2V0KvMg/viewform?usp=dialog.

Code availability

The code used in the current study to develop the algorithm is available at https://github.com/cgqzsyc/Real-world_MRI_Denoising.

References

Aggarwal, H. K., Mani, M. P. & Jacob, M. MoDL: model-based deep learning architecture for inverse problems. IEEE Trans. Med. Imaging 38, 394–405 (2018).
Sriram, A. et al. End-to-end variational networks for accelerated MRI reconstruction. In Proc. Medical Image Computing and Computer Assisted Intervention–MICCAI 2020 64–73 (Springer, 2020).
Sun, K., Wang, Q. & Shen, D. Joint cross-attention network with deep modality prior for fast MRI reconstruction. IEEE Trans. Med. Imaging 43, 558–569 (2023).
Liu, Y. et al. Srm-net: Joint sampling and reconstruction and mapping network for accelerated 3T brain multi-parametric MR imaging. IEEE Trans. Biomed. Eng. 72, 1811–1824 (2024).
Sun, K., Duan, C., Lou, X. & Shen, D. Mip-enhanced uncertainty-aware network for fast 7T time-of-flight MRA reconstruction. IEEE Trans. Med. Imaging 44, 2270–2282 (2025).
Yang, G. et al. DAGAN: deep de-aliasing generative adversarial networks for fast compressed sensing MRI reconstruction. IEEE Trans. Med. Imaging 37, 1310–1321 (2017).
Griswold, M. A. et al. Generalized autocalibrating partially parallel acquisitions (GRAPPA). Magn. Reson. Med. 47, 1202–1210 (2002).
Pruessmann, K. P., Weiger, M., Scheidegger, M. B. & Boesiger, P. SENSE: sensitivity encoding for fast mri. Magn. Reson. Med. 42, 952–962 (1999).
Gudbjartsson, H. & Patz, S. The Rician distribution of noisy MRI data. Magn. Reson. Med. 34, 910–914 (1995).
Jiang, D. et al. Denoising of 3d magnetic resonance images with multi-channel residual learning of convolutional neural network. Jpn. J. Radiol. 36, 566–574 (2018).
You, X., Cao, N., Lu, H., Mao, M. & Wanga, W. Denoising of MR images with Rician noise using a wider neural network and noise range division. Magn. Reson. Imaging 64, 154–159 (2019).
Ran, M. et al. Denoising of 3d magnetic resonance images using a residual encoder–decoder Wasserstein generative adversarial network. Med. Image Anal. 55, 165–180 (2019).
Yang, H. et al. Denoising of 3d mr images using a voxel-wise hybrid residual MLP-CNN model to improve small lesion diagnostic confidence. In Proc. Medical Image Computing and Computer Assisted Intervention – MICCAI 2022 292–302 (Springer, 2022).
Sun, Y., Wang, L., Li, G., Lin, W. & Wang, L. A foundation model for enhancing magnetic resonance images and downstream segmentation, registration and diagnostic tasks. Nat. Biomed. Eng. 9, 521–538 (2024).
Jiang, J., Zhang, L. & Yang, J. Mixed noise removal by weighted encoding with sparse nonlocal regularization. IEEE Trans. Image Process. 23, 2651–2662 (2014).
Sun, K. & Simon, S. Bilateral spectrum weighted total variation for noisy-image super-resolution and image denoising. IEEE Trans Signal. Process. 69, 6329–6341 (2021).
Mannam, V. et al. Real-time image denoising of mixed Poisson–Gaussian noise in fluorescence microscopy images using Image. Optica 9, 335–345 (2022).
Xiang, T., Yurt, M., Syed, A., Setsompop, K. & Chaudhari, A. Ddm²: self-supervised diffusion MRI denoising with generative diffusion models. In Proc. International Conference on Learning Representations (ICLR) (2023).
Zhou, L. et al. Neighboring slice noise2noise: self-supervised medical image denoising from single noisy image volume. Preprint at https://arxiv.org/html/2411.10831v1 (2024).
Shreyas, F., Batson, J. & Garyfallidis, E. Patch2self: denoising diffusion mri with self-supervised learning. Adv. Neural Inf. Process. Syst. 33, 16293–16303 (2020).
Shreyas, F., Chowdhury, A., Batson, J., Drineas, P. & Garyfallidis, E. Patch2self2: self-supervised denoising on coresets via matrix sketching. Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 27641–27651 (IEEE, 2024).
Wu, C., Kong, Q., Jiang, Z. & Zhou, S. Self-supervised diffusion MRI denoising via iterative and stable refinement. In Proc. International Conference on Learning Representations (ICLR) (2025).
Zhu, P., Liu, C., Fu, Y., Chen, N. & Qiu, A. Cycle-conditional diffusion model for noise correction of diffusion-weighted images using unpaired data. Med. image analysis (2025).
Chung, H., Lee, E. & Ye, J. MR image denoising and super-resolution using regularized reverse diffusion. IEEE Trans. Med. Imaging 42, 922–934 (2022).
Tu, J., Shi, Y. & Lam, F. Score-based self-supervised MRI denoising. In Proc. International Conference on Learning Representations (ICLR) (2025).
Ho, J., Jain, A. & Abbeel, P. Denoising diffusion probabilistic models. In Proc. 34th International Conference on Neural Information Processing Systems, 6840–6851 (2020).
Song, J., Meng, C. & Ermon, S. Denoising diffusion implicit models. In Proc. International Conference on Learning Representations (ICLR) (2020).
Liao, K., Yue, Z., Wang, Z. & Loy, C. Denoising as adaptation: noise-space domain adaptation for image restoration. In Proc. International Conference on Learning Representations (ICLR) (2025).
Martin, S., Gagneux, A., Hagemann, P. & Steidl, G. Pnp-flow: plug-and-play image restoration with flow matching. In Proc. International Conference on Learning Representations (ICLR) (2025).
Song, Y., Shen, L., Xing, L. & Ermon, S. Solving inverse problems in medical imaging with score-based generative models. In Proc. International Conference on Learning Representations (ICLR) (2022).
Ozturkler, B. et al. Smrd: sure-based robust MRI reconstruction with diffusion models. Int. Conf. on Med. Image Comput. Comput. Interv. Cham: Springer Nat. Switz. 199–209 (2023).
Mardani, M., Song, J., Kautz, J. & Vahdat, A. A variational perspective on solving inverse problems with diffusion models. In Proc. International Conference on Learning Representations (ICLR) (2024).
Ozturkler, B., Mardani, M., Vahdat, A., Kautz, J. & Pauly, J. Regularization by denoising diffusion process for MRI reconstruction. NeurIPS 2023 Work. on Deep. Learn. Inverse Probl. https://openreview.net/forum?id=NRGZmGbteB (ICLR, 2023).
Zbontar, J. et al. fastmri: an open dataset and benchmarks for accelerated MRI. Preprint at https://doi.org/10.48550/arXiv.1811.08839 (2018).
Jalal, A. et al. Robust compressed sensing MRI with deep generative priors. Adv. Neural Inf. Process. Syst. 34, 14938–14954 (2021).
Wang, Z. et al. One-dimensional deep low-rank and sparse network for accelerated MRI. IEEE Trans. Med. Imaging 42, 79–90 (2022).
Wang, Z. et al. One for multiple: physics-informed synthetic data boosts generalizable deep learning for fast MRI reconstruction. Med. Image Anal. 103, 103616 (2025).
Wang, Z. et al. A faithful deep sensitivity estimation for accelerated magnetic resonance imaging. IEEE J. Biomed. Health Inform. 28, 2126–2137 (2024).
Cheng, S. et al. Nbnet: noise basis learning for image denoising with subspace projection. In Proc. IEEE/CVF Conference on Computer Vision Pattern Recognition 4896–4906 (IEEE, 2021).
Chung, H., Kim, J., Kim, S. & Ye, J. Parallel diffusion models of operator and image for blind inverse problems. In Proc. IEEE/CVF Conf. on Comput. Vis. Pattern Recognit. 6059–6069 (IEEE, 2023).
Chung, H., Kim, J., Mccann, M., Klasky, M. & Ye, J. Diffusion posterior sampling for general noisy inverse problems. In Proc. International Conference on Learning Representations (ICLR) (2023).
Simonyan, K. & Zisserman, A. Very deep convolutional networks for large-scale image recognition. In Proc. International Conference on Learning Representations (ICLR) (2015).
Krizhevsky, A., Sutskever, I. & Hinton, G. E. Imagenet classification with deep convolutional neural networks. Adv. Neural Inf. Process. Syst. 25 (2012).
Zhang, Y., Brady, M. & Smith, S. Segmentation of brain MR images through a hidden markov random field model and the expectation-maximization algorithm. IEEE Trans. Med. Imaging 20, 45–57 (2001).
Lyu, M. et al. M4Raw: a multi-contrast, multi-repetition, multi-channel MRI k-space dataset for low-field MRI research. Sci. Data 10, 264 (2023).
Tibrewala, R., Dutt, T., Tong, A., Ginocchio, L. & Lattanzi, R. FastMRI Prostate: a public, biparametric MRI dataset to advance machine learning for prostate cancer imaging. Sci. Data 11, 404 (2024).
Zbontar, J. et al. FastMRI Breast: a publicly available radial k-space dataset of breast dynamic contrast-enhanced MRI. Radiol. Artif. Intell. 7, e240345 (2025).
Dhariwal, P. & Nichol, A. Diffusion models beat GANs on image synthesis. Adv. Neural Inf. Process. Syst. 34, 8780–8794 (2021).
Radford, A. et al. Learning transferable visual models from natural language supervision. In Proc. Conference International on Machine Learning 8748–8763 (2021).
Hu, E. J. et al. LoRA: low-rank adaptation of large language models. Int. Conf. Learn. Represent 1, 3 (2022).
Karras, T. et al. Elucidating the design space of diffusion-based generative models. Adv. Neural Inf. Process. Syst. 35, 26565–26577 (2022).

Acknowledgements

This work was supported in part by the National Natural Science Foundation of China (grant numbers U23A20295, 62131015, 82441023, 62301318, 82394432), the China Ministry of Science and Technology (STI2030-Major Projects-2022ZD0209000, STI2030-Major Projects-2022ZD0213100), the Shanghai Municipal Central Guided Local Science and Technology Development Fund (No. YDZX20233100001001), the Key R&D Program of Guangdong Province, China (grant number 2023B0303040001), the Natural Science Foundation of Guangdong Province (grant number 2024A1515013203), the Nature and Science Basic Research Foundation of Shenzhen (JCYJ20230807115916035), and the HPC Platform of ShanghaiTech University.

Author information

These authors contributed equally: Yuchen Shao, Hongyan Huang, Lingyan Zhang, Dongsheng Li.

Authors and Affiliations

School of Biomedical Engineering and State Key Laboratory of Advanced Medical Materials and Devices, ShanghaiTech University, Shanghai, China
Yuchen Shao, Kaicong Sun & Dinggang Shen
Department of Radiology, Shenzhen Nanshan People’s Hospital, Shenzhen University, Shenzhen, China
Hongyan Huang, Zhiguang Ding, Fan Wang, Shengli Chen, Shiwei Lin & Yingwei Qiu
Department of Radiology, Longgang Central Hospital, Shenzhen, China
Lingyan Zhang
Department of Radiology, Panyu Second People’s Hospital, Guangzhou, China
Dongsheng Li
Shanghai United Imaging Intelligence Co., Ltd, Shanghai, China
Yuning Gu, Xiaoqian Huang, Aowen Liu, Jiafu Zhong, Yiqiang Zhan, Xiang Sean Zhou, Feng Shi, Shu Liao & Dinggang Shen
Department of Radiology, Southern University of Science and Technology Hospital, Shenzhen, China
Mu Du
Department of Radiology, Shenzhen FuYong People’s Hospital, Shenzhen, China
Hongbing Li
Department of Radiology, Shenzhen Bao’an Songgang People’s Hospital, Shenzhen, China
Jiuping Liang
Shanghai Clinical Research and Trial Center, Shanghai, China
Dinggang Shen

Contributions

Y.S. conducted study design, algorithm implementation, experimental setup, data processing, and manuscript writing. K.S. participated in study design, experimental setup, and manuscript writing, revision, and editing. H.H. participated in experimental setup, manuscript writing, and editing. L.Z. and D.L. participated in data collection and manuscript editing. Z.D., F.W., S.C., S.L., Y.G., X.H., A.L., J.Z., Y.Z., X.Z., F.S., and S.L. participated in data collection and experimental setup. M.D., H.L., and J.L. participated in data collection. D.S. supervised the research, conducted funding acquisition, and provided detailed guidance for manuscript writing, revision, as well as editing. Y.Q. supervised the research, experimental setup, manuscript writing, revision, and editing.

Corresponding authors

Correspondence to Kaicong Sun, Dinggang Shen or Yingwei Qiu.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Shao, Y., Huang, H., Zhang, L. et al. Real-world unified denoising for multi-organ fast MRI: a large-scale prospective validation. npj Digit. Med. 9, 366 (2026). https://doi.org/10.1038/s41746-026-02548-y

Received: 15 September 2025
Accepted: 04 March 2026
Published: 19 March 2026
Version of record: 12 May 2026
DOI: https://doi.org/10.1038/s41746-026-02548-y