Abstract
Deep learning (DL) methods are increasingly applied to address the low signal-to-noise ratio (SNR) and contrast-to-noise ratio (CNR) of low-field MRI (LFMRI). This study evaluates the potential of diffusion models for LFMRI enhancement, comparing Super-Resolution via Repeated Refinement (SR3), a generative diffusion model, to traditional architectures such as CycleGAN and UNet for translating LFMRI to high-field MRI (HFMRI). Using synthetic LFMRI (64mT) FLAIR brain images generated from the BraTS 2019 dataset (3T), the models were assessed with traditional metrics, including the structural similarity index (SSIM) and normalized root-mean-squared error (nRMSE), alongside specialized structural error measurements such as gradient entropy (gEn), gradient error (GE), and the perception-based image quality evaluator (PIQE). SR3 significantly outperformed (p ≪ 0.05) the other models across all metrics, achieving SSIM scores over 0.97 and excelling in preserving pathological structures such as the necrotic core and edema, with lower gEn and GE values. These findings suggest diffusion models are a robust alternative to conventional DL approaches for LF-to-HF MRI translation. By preserving structural details and enhancing image quality, SR3 could improve the clinical utility of LFMRI systems, making high-quality MRI more accessible. This work demonstrates the potential of diffusion models in advancing medical image enhancement and translation.
Introduction
Low-field (LF) magnetic resonance imaging is emerging as a promising medical imaging modality due to its potential cost-effectiveness and logistic sustainability1,2,3. However, MRI at lower magnetic fields is limited by low signal-to-noise (SNR) and contrast-to-noise (CNR) ratios2,4,5,6. Recent advancements in deep learning (DL) have demonstrated the potential to learn transformations from low- to high-field images, thereby leveraging the higher image quality of the latter. Convolutional Neural Networks (CNNs) were among the first architectures to achieve SNR improvements7,8,9,10. Generative Adversarial Networks (GANs) were also successful in introducing further structural enhancement11,12,13,14. Vision Transformers (ViTs) marked a new era by utilizing self-attention mechanisms to capture long-range dependencies—relationships between distant image pixels/features15,16,17. Diffusion models have emerged along the way as a novel iterative approach, achieving remarkable enhancement results18,19.
Recently introduced diffusion model architectures have been used effectively for the enhancement of non-clinical20,21,22,23 and clinical24,25,26,27,28 images. These highly promising outcomes motivated this work to investigate these generative models for enhancing LFMRI as an image-to-image translation from the LF to the HF image domain. In this work we focus on the generative diffusion model Super-Resolution via Repeated Refinement (SR3)29. We deliberately chose SR3 not because it is the most advanced diffusion model, but because it allows us to isolate and evaluate the effectiveness of the iterative denoising process at the core of the diffusion framework. Our aim is to demonstrate that even a basic formulation like SR3 can outperform traditional DL methods under the constraints of LF MRI.
In assessing its potential value, SR3 is compared against the CycleGAN30 and the UNet31—two commonly used DL networks in medical image enhancement32,33,34,35,36,37,38. This comparison aims to determine whether a diffusion model’s iterative refinement, despite its longer runtime, yields significant LF image enhancement compared to traditional approaches, thereby justifying the exploration and adoption of diffusion model architectures. These models were not selected for their novelty, but rather as conceptual baselines representing GAN-based and CNN-based approaches, respectively. While more recent variants such as attention-UNet39 or UGAN40 introduce architectural complexity, they offer only incremental improvements in medical image translation tasks—particularly when operating on single-contrast, low-SNR data such as LF FLAIR. Rather than conducting an extensive architecture benchmark, this study focused on assessing the value introduced by the diffusion-based training dynamics, which we believe play a more central role under the constraints of LFMRI. While ViT-based architectures have shown recent success in medical image enhancement, our aim was not to benchmark state-of-the-art models across all paradigms. Instead, we aimed to isolate the gains derived from the core denoising mechanism of diffusion models under the severe signal degradation specific to LFMRI.
Comparative investigation of the performance of these machine-trainable methods requires acquiring scans from the same subject under both LF and HF conditions, in addition to performing co-registration between the corresponding images41,42,43,44,45,46. This complexity makes it challenging to find publicly available datasets for exploring this research question. As a solution, synthesis techniques have emerged to mimic actual LF image acquisition. Some approaches attempt to simulate low-field contrast by modeling tissue-specific relaxation properties (T1, T2) as a function of magnetic field strength2,6,47,48. Using such techniques in our study would require accurate relaxation data for pathological tissues—which are rarely available—and may yield unrealistic appearances when applied to tumor regions.
Another category of LF synthesis methods is based on iteratively reducing the signal and introducing noise into the HF data in an empirical way10,49,50. Such an approach neglects the crucial need to select the hyperparameters of each image degradation step via an end-to-end optimization from the HF to the LF space. These methods also lack expert validation to ensure that the simulated LF images faithfully represent authentic LF characteristics—the synthesized image may not represent real-world LF conditions, such as the specific noise patterns and unique artifacts associated with them. The use of inadequate synthesis pipelines for generating LF data propagates biases into the inference of the studied DL network, leading to different performance on synthetic data compared to real-world LF MRI conditions10,49,50,51.
In this work, we started with the 3T BraTS 2019 dataset52 as the HF data and created its corresponding LF pairs (at 64mT, the field strength used by the commercial Hyperfine Swoop system, at present the only FDA- and CE-approved portable very-low-field system) using a novel neurologist-validated image synthesis technique46. The hyperparameter optimization employed alongside histogram matching in this technique ensures a realistic representation of LF conditions. This method was specifically developed and neurologist-validated for FLAIR images, which is why we limited our study to this contrast—particularly given the lack of validated synthesis tools for other modalities in pathological cases.
Prior investigations have demonstrated the application of diffusion models for LFMRI enhancement, with results consistently showing the superior performance of diffusion-based approaches compared to conventional neural networks51,53. However, these studies were limited to healthy subject cohorts, while this work extends the application to pathological tissues, specifically tumor cases. Furthermore, while the authors of53 conducted their investigation at 0.2T field strength, this study extends to the ultra-low field strength of 64mT, wherein the inherently compromised SNR intensifies the enhancement complexity.
Materials and methods
The SR3 architecture
The studied SR3 architecture29 employs Denoising Diffusion Probabilistic Models (DDPMs)54 and integrates them with a UNet-based architecture to achieve state-of-the-art performance in image super-resolution. The core principle behind DDPMs is to learn a reverse denoising process that can recover clean data from a sequence of noisy versions. As shown in Fig. 1(a), the DDPM consists of two processes: the forward diffusion and the reverse denoising process.
Overview of the employed SR3 diffusion model. (a) The forward and inverse processes of the architecture. In the forward process, the original HF image (\({y}_{0}\)) undergoes iterative degradation by adding Gaussian noise over T = 2000 steps. The reverse denoising process progressively recovers the original HF image by removing the added noise step by step. (b) An illustration of the denoising process highlighting that the noisy image (\({y}_{t}\)), i.e., the outcome of the forward step in (a), is concatenated with the LF FLAIR (\(x\)).
Forward Diffusion Process: The DDPM framework, as in29, was formulated as a Markov chain; as illustrated in Fig. 1(a), the forward diffusion process gradually corrupts the 3T HF image by adding Gaussian noise over a fixed number of timesteps \(T\) (in this study T = 2000, as in the original work29). The operation of this process can be appreciated in the example images in Fig. 1(a); the rightmost image is \({y}_{2000}\). The forward diffusion transition \(q\), from timestep \(t-1\) to timestep \(t\), can be formulated as:

$$q\left({y}_{t}\mid{y}_{t-1}\right)=\mathcal{N}\left({y}_{t};\sqrt{1-{\beta }_{t}}\,{y}_{t-1},{\beta }_{t}I\right)$$

where \({y}_{t}\) represents the noisy image at timestep \(t\) of the forward process, \({\beta }_{t}\) is a variance schedule that controls the amount of noise added at each timestep \(t\), and \(I\) is the identity matrix.
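For illustration, a minimal PyTorch sketch of this forward process is given below. The linear variance schedule, its endpoints, and the variable names are illustrative assumptions, not the exact configuration of29; the closed-form sampling uses the standard DDPM identity \({y}_{t}=\sqrt{{\overline{\alpha }}_{t}}\,{y}_{0}+\sqrt{1-{\overline{\alpha }}_{t}}\,\varepsilon\) with \({\overline{\alpha }}_{t}={\prod }_{s\le t}(1-{\beta }_{s})\).

```python
import torch

T = 2000                                   # number of diffusion steps, as in this study
betas = torch.linspace(1e-6, 1e-2, T)      # assumed linear variance schedule beta_t
alphas = 1.0 - betas
alpha_bar = torch.cumprod(alphas, dim=0)   # cumulative products, enabling closed-form sampling

def q_sample(y0: torch.Tensor, t: int) -> torch.Tensor:
    """Draw y_t ~ q(y_t | y_0) in closed form instead of iterating t single steps."""
    eps = torch.randn_like(y0)             # Gaussian noise epsilon ~ N(0, I)
    return alpha_bar[t].sqrt() * y0 + (1.0 - alpha_bar[t]).sqrt() * eps
```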
Reverse Denoising Process: The reverse denoising process of the SR3 pipeline, parameterized by a neural network \(\theta\), aims to recover the HF image from the noisy version. As illustrated in Fig. 1(b), this is achieved by learning a sequence of reverse transitions:

$${p}_{\theta }\left({y}_{t-1}\mid{y}_{t},x\right)=\mathcal{N}\left({y}_{t-1};{\mu }_{\theta }\left(x,{y}_{t},t\right),{\Sigma }_{\theta }\left(x,{y}_{t},t\right)\right)$$

where \({\mu }_{\theta }\) and \({\Sigma }_{\theta }\) are the mean and variance predicted by the neural network, and \(x\) is the LF image. The neural network employed in this work is based on the UNet structure, which serves as the backbone for the reverse denoising process. The key components of the UNet architecture in our baseline include the following (a minimal code skeleton is provided after the list):
1. Input Conditioning: The noisy image at each timestep is concatenated with the LF image \(x\) and fed as input to the UNet model.
2. Downsampling: The input is downsampled using strided convolutions to a lower resolution, enabling the network to capture hierarchical features with a larger receptive field.
3. Bottleneck: At the bottleneck, self-attention55 and multiple residual blocks are applied to capture long-range dependencies and refine the features.
4. Upsampling: The features are then upsampled using nearest neighbor interpolation and convolutions to the desired output dimension (240 × 240 pixels), which enables the reconstruction of finer spatial details while maintaining structural accuracy.
5. Output: The final output is the predicted residual \(\varepsilon\) at the noise level \({\alpha }_{t}\).
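The sketch below condenses the five components above into a runnable PyTorch skeleton. It is a simplified stand-in rather than the exact SR3 backbone: the channel widths, the single down/up stage, and the omission of noise-level conditioning and self-attention are all simplifying assumptions.

```python
import torch
import torch.nn as nn

class ConditionalUNetSketch(nn.Module):
    """Illustrative SR3-style conditional UNet skeleton (assumptions noted above)."""
    def __init__(self, ch: int = 64):
        super().__init__()
        # (1) input conditioning: noisy y_t (1 channel) concatenated with LF image x (1 channel)
        self.stem = nn.Conv2d(2, ch, 3, padding=1)
        # (2) downsampling via strided convolution (240 x 240 -> 120 x 120)
        self.down = nn.Sequential(nn.Conv2d(ch, 2 * ch, 3, stride=2, padding=1), nn.SiLU())
        # (3) bottleneck: residual convolutions stand in for the residual/self-attention blocks
        self.mid = nn.Sequential(nn.Conv2d(2 * ch, 2 * ch, 3, padding=1), nn.SiLU(),
                                 nn.Conv2d(2 * ch, 2 * ch, 3, padding=1))
        # (4) upsampling: nearest-neighbor interpolation followed by convolution (back to 240 x 240)
        self.up = nn.Sequential(nn.Upsample(scale_factor=2, mode="nearest"),
                                nn.Conv2d(2 * ch, ch, 3, padding=1), nn.SiLU())
        # (5) output head predicting the noise residual epsilon
        self.head = nn.Conv2d(ch, 1, 3, padding=1)

    def forward(self, y_t: torch.Tensor, x: torch.Tensor) -> torch.Tensor:
        h = self.stem(torch.cat([y_t, x], dim=1))   # (B, 2, 240, 240) -> (B, ch, 240, 240)
        h = self.down(h)
        h = h + self.mid(h)                         # residual bottleneck
        h = self.up(h)
        return self.head(h)                         # predicted epsilon, same shape as y_t
```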
In this study, the SR3 diffusion model subjects the input LF image \(x\) to a \(T\)-step iterative process that progressively refines it to a predicted HF image \(y\). This process can be expressed as:

$${y}_{t-1}\sim {p}_{\theta }\left({y}_{t-1}\mid{y}_{t},x\right),\quad t=T,\dots ,1$$

where \({y}_{t-1}\) is a sample from \({p}_{\theta }\left({y}_{t-1}\mid{y}_{t},x\right)\), the conditional DDPM learned by the UNet model, and the chain is initialized with pure noise \({y}_{T}\sim \mathcal{N}\left(0,I\right)\).
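A compact sketch of this ancestral sampling loop is shown below, reusing the hypothetical T, betas, alphas, and alpha_bar objects from the earlier forward-process sketch; the posterior-mean update is the standard DDPM rule, with \(\sqrt{{\beta }_{t}}\) assumed as the sampling standard deviation.

```python
import torch

@torch.no_grad()
def sr3_sample(model: torch.nn.Module, x: torch.Tensor) -> torch.Tensor:
    """Iteratively refine pure noise into a predicted HF image, conditioned on LF image x."""
    y = torch.randn_like(x)                        # y_T ~ N(0, I)
    for t in reversed(range(T)):                   # t = T-1, ..., 0
        eps_hat = model(y, x)                      # UNet predicts the noise residual
        # posterior mean: remove the predicted noise contribution (DDPM update rule)
        y = (y - betas[t] / (1.0 - alpha_bar[t]).sqrt() * eps_hat) / alphas[t].sqrt()
        if t > 0:                                  # no noise is injected at the final step
            y = y + betas[t].sqrt() * torch.randn_like(y)
    return y                                       # predicted HF image y_0
```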
Objective function
The loss function employed in the training of the SR3 architecture was the same as in29:

$$\mathcal{L}\left(\theta \right)={\mathbb{E}}_{t,{y}_{0},{\varepsilon }_{t}}\left[{\Vert {\varepsilon }_{t}-{\varepsilon }_{\theta }\left(x,{y}_{t},{\alpha }_{t}\right)\Vert }_{1}\right]$$

where \({\varepsilon }_{t}\) is random Gaussian noise at time step \(t\), and \({\varepsilon }_{\theta }\) is the predicted residual output by the UNet module at the current noise level \({\alpha }_{t}\).
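A single hedged training step under this objective might look as follows; the L1 penalty, the uniform timestep sampling, and the reuse of the earlier hypothetical noise schedule are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def sr3_train_step(model: torch.nn.Module, x: torch.Tensor, y0: torch.Tensor) -> torch.Tensor:
    """One noise-prediction step: corrupt y0 to a random noise level, then predict the noise."""
    t = int(torch.randint(0, T, (1,)))             # random timestep t ~ Uniform{0, ..., T-1}
    eps = torch.randn_like(y0)                     # target noise epsilon_t
    y_t = alpha_bar[t].sqrt() * y0 + (1.0 - alpha_bar[t]).sqrt() * eps
    eps_hat = model(y_t, x)                        # predicted residual at this noise level
    return F.l1_loss(eps_hat, eps)                 # assumed L1 objective
```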
Training and testing dataset
This study used publicly available, fully anonymized data from the BraTS 2019 dataset52. All methods were carried out in accordance with relevant guidelines and regulations. Ethical approval was not required, as no experiments were conducted on human participants and no identifiable personal data were used. This dataset is a standard resource in diagnostic imaging research and consists of multi-modal MRI scans of brain tumors, including 369 cases of high-grade glioma, 183 cases of low-grade glioma, and 76 cases of meningioma. We focused on the 369 high-grade glioma cases as they encompass a wide spectrum of tumor types, ranging from simple to complex. High-grade gliomas, particularly glioblastomas, are known for their heterogeneity and diverse imaging characteristics, making them an ideal subset for a comprehensive performance analysis of our studied DL architectures.
To generate 64mT FLAIR images from the HF 3T BraTS data, we employed the validated synthesis pipeline originally proposed in46. This method was developed using a dataset of paired 3T and 64mT FLAIR scans from patients with hydrocephalus and models the LF degradation process empirically. First, HF images are rigidly registered to their 64mT counterparts and brain masks are applied to extract intracranial tissue. The resulting volumes are then downsampled to the spatial resolution of LFMRI. To simulate the loss of spatial detail, Gaussian smoothing is applied with a tunable kernel, and additional Gaussian noise is added to degrade the SNR. The standard deviation of the smoothing kernel and the noise amplitude are selected through an end-to-end optimization procedure that minimizes the discrepancy in histogram statistics, specifically the mean, standard deviation, and skewness, between the synthesized images and actual 64mT FLAIR images. This optimization is performed on a development dataset of real paired images. The validity of the synthesized LF images was assessed in two ways: (1) quantitatively, by comparing gradient entropy values between synthetic and real low-field scans, and (2) qualitatively, through blinded neuroradiologist evaluations of the images’ diagnostic quality. After validating the method, it was applied to the 3T BraTS 2019 dataset to create corresponding low-field FLAIR images for glioma patients. These generated images were used as inputs for training and testing our enhancement models.
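The core degradation steps can be sketched as follows; this is a simplified illustration of the pipeline in46, with sigma_blur, sigma_noise, and scale standing in for the end-to-end optimized hyperparameters, and resampling back to the HF grid assumed so that paired images share a common resolution.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, zoom

def synthesize_lf(hf_slice: np.ndarray, sigma_blur: float, sigma_noise: float,
                  scale: float = 0.5) -> np.ndarray:
    """Empirical HF -> LF degradation: downsample, smooth, and add Gaussian noise."""
    low = zoom(hf_slice.astype(np.float32), scale, order=1)  # downsample to LF resolution
    low = gaussian_filter(low, sigma=sigma_blur)             # mimic the loss of spatial detail
    low = low + np.random.normal(0.0, sigma_noise, low.shape)  # degrade SNR with Gaussian noise
    return zoom(low, 1.0 / scale, order=1)                   # resample back to HF grid (assumed)
```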
In these studies, we used FLAIR images from 369 subjects. To ensure no data leakage between training and testing, and that inference was performed on unseen subjects, we used 319 subjects for training and 50 for testing. Moreover, to satisfy the criterion that all FLAIR images contained brain tumors, from each patient we selected and used 10 consecutive slices (out of 155) that all included pathologic structures. As a result, this work used 3,190 FLAIR brain images (~86%) for training and 500 (~14%) for testing, all of which contained annotated pathologies.
Network implementation and training
Our implementation was developed in PyTorch56 on a high-performance computing cluster equipped with 16 GB of memory. Model optimization was performed using the Adam optimizer57, for 1000 epochs with a batch size of 4. To prevent gradient explosion during training, all input data were normalized to the range [-1, 1]. Table 1 outlines further training details, including the training and inference times of the studied networks. SR3 is the most computationally demanding at inference due to its iterative denoising across 2000 steps. While UNet and CycleGAN complete predictions in a single pass, CycleGAN requires more training time than UNet. This is attributed to its adversarial setup involving two generators and discriminators and the additional overhead from enforcing cycle consistency.
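The normalization and optimizer setup can be sketched as below; the learning rate shown is a placeholder (the values actually used are reported in Table 1), and ConditionalUNetSketch is the hypothetical backbone from the earlier sketch.

```python
import torch

model = ConditionalUNetSketch()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)  # lr is illustrative; see Table 1

def normalize(img: torch.Tensor) -> torch.Tensor:
    """Min-max scale intensities to [-1, 1] to stabilize training."""
    img = (img - img.min()) / (img.max() - img.min() + 1e-8)
    return 2.0 * img - 1.0
```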
Evaluation metrics
Quantitative assessment of the model outcomes was based on five metrics: the commonly used structural similarity index (SSIM)58,59 and normalized root-mean-squared error (nRMSE), as well as gradient entropy (gEn)60, gradient error (GE)60, and the perception-based image quality evaluator (PIQE)61. The latter three were motivated by the fact that the commonly used SSIM was not originally introduced to assess structural differences62,63, whereas they were designed for that specific purpose60,61.
Difference in gradient entropy (gEn)
Gradient entropy is a measure of the information or variation present in the gradient of an image, and is defined as60:

$$\text{gEn}=-\sum_{i,j}p\left({g}_{i,j}\right){\log }_{2}\,p\left({g}_{i,j}\right)$$

where \({g}_{i,j}\) is the gradient magnitude computed using a Sobel filter at pixel \((i, j)\), and \(p({g}_{i,j})\) is the probability distribution of the gradient magnitudes, calculated by creating a probability density function (PDF) from the gradient magnitude frequencies. In this work, to compare the performance of the different models, we used and reported the difference, i.e., the L1 distance, between the gEn of the network output (enhanced image) and the gEn of the ground truth (the original BraTS image).
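A direct implementation of this definition might look as follows; the histogram bin count is an assumption, as the definition above does not fix it.

```python
import numpy as np
from scipy.ndimage import sobel

def gradient_entropy(img: np.ndarray, bins: int = 256) -> float:
    """Entropy of the Sobel gradient-magnitude distribution (bin count assumed)."""
    img = img.astype(np.float64)
    gx, gy = sobel(img, axis=0), sobel(img, axis=1)   # Sobel spatial derivatives
    mag = np.hypot(gx, gy)                            # gradient magnitude g_{i,j}
    hist, _ = np.histogram(mag, bins=bins)
    p = hist / hist.sum()                             # PDF of gradient magnitudes
    p = p[p > 0]                                      # drop empty bins to avoid log(0)
    return float(-np.sum(p * np.log2(p)))

# metric reported in this work: |gEn(enhanced) - gEn(ground truth)|
```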
Gradient Error (GE)
Gradient Error is a way to compare the gradients of two images, in this case the network-output enhanced image and its ground truth. Specifically, we calculated the L1 distance between corresponding gradients and averaged these differences across the image. Gradients are computed using a 3 × 3 Sobel filter, which approximates the spatial derivatives of image intensities along the \(x\) and \(y\) directions. This metric is particularly sensitive to edge and texture discrepancies, indicating how well the enhanced image preserves the original’s gradient structure. GE between two images A and B is defined as60:

$$\text{GE}\left(A,B\right)=\frac{1}{MN}\sum_{i=1}^{M}\sum_{j=1}^{N}\left|{\Vert \nabla {A}_{\left(i,j\right)}\Vert }_{1}-{\Vert \nabla {B}_{\left(i,j\right)}\Vert }_{1}\right|$$

where \(\nabla\) is the gradient operator, \(M\times N\) is the image size, and \({\Vert \nabla {A}_{\left(i,j\right)}\Vert }_{1}\) and \({\Vert \nabla {B}_{\left(i,j\right)}\Vert }_{1}\) are the L1-norms of the gradients at each pixel \((i,j)\) in images A and B, respectively.
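Under the same conventions as the gEn sketch, GE reduces to a few lines:

```python
import numpy as np
from scipy.ndimage import sobel

def gradient_error(a: np.ndarray, b: np.ndarray) -> float:
    """Mean absolute difference between the per-pixel L1 gradient norms of two images."""
    def grad_l1(img: np.ndarray) -> np.ndarray:
        img = img.astype(np.float64)
        gx, gy = sobel(img, axis=0), sobel(img, axis=1)  # 3x3 Sobel derivatives along x and y
        return np.abs(gx) + np.abs(gy)                   # L1 norm of the gradient at each pixel
    return float(np.mean(np.abs(grad_l1(a) - grad_l1(b))))
```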
Perception-based Image Quality Evaluator (PIQE)
PIQE estimates image quality by dividing an image into blocks and analyzing their distortion. As described in61, it first applies the Mean Subtracted Contrast Normalized (MSCN) algorithm to each pixel in the image, which normalizes contrast and removes the local mean. Then, the image is divided into N × N blocks (N was set to 16, as in the original implementation), and those blocks with high variability in pixel values (high activity) are identified. These active blocks are used to create a mask that highlights areas of interest. Next, each block is evaluated for distortion caused by blocking artifacts and noise using the MSCN coefficients. The PIQE score is finally calculated as the average distortion of the distorted blocks, resulting in a value between 0 and 1. A lower score indicates higher perceived image quality. Our evaluation calculates the PIQE score for both the enhanced image and its ground truth, then computes the absolute difference between these scores as the final measure of image quality.
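The first stage, the MSCN transform, can be sketched as below; the Gaussian window width and the stabilizing constant c are assumptions taken from common practice rather than from61.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def mscn(img: np.ndarray, sigma: float = 7.0 / 6.0, c: float = 1.0) -> np.ndarray:
    """Mean Subtracted Contrast Normalized coefficients (first stage of PIQE)."""
    img = img.astype(np.float64)
    mu = gaussian_filter(img, sigma)                   # local mean
    var = gaussian_filter(img * img, sigma) - mu * mu  # local variance
    sd = np.sqrt(np.maximum(var, 0.0))                 # local standard deviation
    return (img - mu) / (sd + c)                       # contrast-normalized, mean-removed

# 16 x 16 blocks with high MSCN variance are then scored for noise and blocking
# distortion, and the scores are averaged into the final PIQE value.
```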
Statistical analysis
Our analysis was performed across seven different regions of interest (ROIs), examples of which are shown in Fig. 2: Entire brain, Healthy tissue, Entire tumor, Necrotic core, Edema, Non-enhancing core, and Enhancing core. This granular analysis highlights specific strengths and weaknesses in translating tumors of varying severity, unlike earlier works that perform the evaluation on whole-brain scans58,64,65,66, which offers a high-level view of model capabilities but may overlook variations in how different tumors are translated. In this work, we utilized the expert-annotated segmentation masks provided by the BraTS 2019 dataset52 for defining ROIs. These subject- and slice-specific masks removed the need for any additional segmentation and simplified the evaluation process.
To compare the performance of the different enhancement methods for each tissue type (model-to-model variation), we employed a linear mixed effects model (MEM). We chose MEM over traditional ANOVA due to the presence of repeated measurements in our data, which violates the assumption of independence required for ANOVA14,67,68,69,70. The enhancement model was the only fixed-effect factor in our MEMs, representing the choice of image enhancement method (SR3, UNet, or CycleGAN). Our model-to-model analysis conducted separate MEMs for each of the seven tissue types, as variations in tissue cannot be treated as originating from the same population. We also included a random effect accounting for the variability across images.
We conducted adequacy checks on our MEMs and observed violations of normality and/or homoscedasticity assumptions. We resolved these issues by applying transformations: a cubic transformation to SSIM and natural logarithm (ln(.)) to nRMSE, GE, gEn, and PIQE response variables. After applying these transformations, we re-ran the MEM analysis using the transformed variables. Subsequent model adequacy checks confirmed that the transformations successfully resolved the previous assumption violations, ensuring the validity of our statistical analysis.
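For concreteness, one such transformed MEM fit might be expressed as follows with statsmodels; the long-format table, its column names, and the file name are hypothetical.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# hypothetical long-format table: one row per (image, model) pair for one tissue type,
# with columns image_id, model, ssim, nrmse, ge, gen, piqe
df = pd.read_csv("metrics_entire_tumor.csv")
df["ssim_cubed"] = df["ssim"] ** 3          # cubic transform applied to SSIM
df["ln_nrmse"] = np.log(df["nrmse"])        # ln(.) transform used for the other metrics

# fixed effect: enhancement method (SR3 / UNet / CycleGAN); random intercept per image
mem = smf.mixedlm("ssim_cubed ~ C(model)", data=df, groups=df["image_id"]).fit()
print(mem.summary())                        # contrasts against the reference method
```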
Results
Figure 3 shows the predicted values from the mixed effects model analysis (circles) with 95% confidence intervals for each of the five studied evaluation metrics, comparing the three architectures (CycleGAN, UNet, and SR3) across the seven different ROIs/tissues (left-to-right across the rows). When evaluating these results, note that except for the SSIM, where higher values indicate higher image quality, the opposite holds for the other metrics: the lower their values, the higher the quality (indicating a closer match to the ground truth). Across the 35 studied cases, i.e., combinations of image evaluation metrics (× 5) and types of tissue (× 7), it is clear that SR3 outperforms both CycleGAN and UNet. It also appears that UNet performed better than CycleGAN, except for certain cases that, intriguingly, correspond only to the healthy-tissue and entire-brain ROIs and only to the GE and PIQE metrics.
Mixed effects model predictions (circles with 95% confidence intervals) comparing three deep learning architectures (CycleGAN, UNet, SR3) across the 7 studied brain ROIs. The results are displayed for five image quality metrics, organized by rows: (a) SSIM, (b) nRMSE, (c) gEn, (d) GE, and (e) PIQE. Columns represent the seven analyzed tissue types: Entire brain, Healthy tissue, Entire tumor, Necrotic core, Edema, Non-enhancing core, and Enhancing core.
Specifically, Fig. 3(a) demonstrates that when the SSIM is considered, the SR3 diffusion model excelled (notably p ≪ 0.05 in Table 2), with values approaching 0.97 for all analyzed tissue types. This signifies a near-perfect structural similarity to the ground-truth images across all categories. UNet consistently ranked second in performance, while images translated with the CycleGAN demonstrated the lowest SSIM throughout. Notably, all three models exhibited remarkable stability in their SSIM scores across the various tissue types, from edema to contrast-agent-enhancing core, and even in healthy tissue and the entire brain. Accordingly, Fig. 3(b) illustrates that the SR3 model consistently yields the lowest nRMSE values across all examined tissue types. In contrast, CycleGAN and UNet exhibit significantly higher nRMSE values (p ≪ 0.05, Table 2), with CycleGAN outperforming UNet.
A similar pattern was also observed with the gEn metric. Figure 3(c) clearly shows that SR3 consistently achieved the lowest gEn values (p ≪ 0.05, Table 2). Figure 3(d) shows that, similar to gEn, SR3 demonstrates significantly smaller GE values (p ≪ 0.05, Table 2), from healthy to complex pathological tissue categories, such as the contrast-agent-enhanced core area. The performance hierarchy exhibits a consistent order, with SR3 at the top, followed by UNet and CycleGAN, with CycleGAN occasionally outperforming UNet. Also, as shown in Fig. 3(e), the PIQE performance evaluation closely reflects the trend observed in the gEn, with SR3 achieving consistently smaller PIQE values across diverse tissue categories (p ≪ 0.05, Table 2). Notably, UNet consistently outperformed CycleGAN throughout all tissue types.
Figure 4 shows one example from the N = 500 tested subset; each of the six panels includes the synthetic 64mT FLAIR (upper left), the inference outcomes for the three tested models (columns two to four), followed by the corresponding BraTS 3T FLAIR (the ground truth from which the synthetic 64mT was generated). In each panel, the second row is the pixel-to-pixel difference between the inference outcome and the 3T ground truth. According to Fig. 4, the SR3 model consistently demonstrates better tumor enhancement across all tissue types. This is also evident in the error maps, i.e., the difference between the enhanced image and the ground truth, where SR3 outperforms the other models in accurately capturing the characteristics of all tumor types. Please note that the error maps are presented with increased brightness (50% increase; pixel values multiplied by 1.5) to emphasize the degree of the differences. The performance of SR3 can be appreciated, for example, in the contrast-agent-enhanced portion of the tumor. As highlighted by the arrows that point to significant areas of enhancement, the border of the contrast-enhanced core is better outlined in the SR3-enhanced images. These findings on the enhancing tumor regions showcase SR3’s ability to maintain low error despite complex textures. The error maps also reveal that SR3 achieves lower errors in healthy tissue and whole-brain reconstructions as well, preserving anatomical details more precisely than UNet and CycleGAN.
Representative outcomes from a test inference on a FLAIR image from the BraTS dataset. Arrows highlight areas with significant differences for each analyzed tissue. The pixel intensity of the error maps has been increased by 50% (multiplying the pixel values by a factor of 1.5) to improve error visibility.
Evaluating training and inference times across models, Table 1 shows that the diffusion-based SR3 model incurs the highest inference time due to its iterative denoising over 2000 timesteps. In contrast, UNet and CycleGAN rely on single-pass inference, resulting in significantly faster predictions. However, CycleGAN exhibits longer training time than UNet and SR3 because it involves adversarial training with two generators and two discriminators, as well as cycle consistency loss, all of which add computational overhead. The learning rates reflect optimal training configurations used for each model.
Discussion
While comparing DL architectures for computer vision tasks, especially when considering statistics, is a potentially useful endeavor, this study was instead motivated by the prospect of broader and more sustainable access to imaging using very-low-magnetic-field MRI scanners. This highly altruistic idea does have a critical barrier: nature itself, manifested in the low SNR and CNR of LFMRI2,4,6. To address this, DL methods for LF-to-HF MRI translation are emerging10,49,51,71. To this end, this work compared three such LF-to-HF translations based on the computer vision workhorse UNet and two generative architectures, the SR3 diffusion model and the CycleGAN.
The central outcome of this work is that the SR3 diffusion model manifested superior performance in translating/transforming LF-to-HF MRI for the particular architectures and the particular training and testing dataset. This observation was apparent for the different types of tissue that were characterized in the BraTS dataset. Most importantly, this superiority was statistically significant, with all p-values ≪ 0.05 (as reflected in Table 2). Moreover, these differences in performance are particularly evident in the challenging pathological tissues, as supported by the representative examples in Fig. 4. The decision to assess metrics at different tissues was driven by the obvious value of efficient enhancement of pathology-related regions, as improved tumor visibility also allows for better delineation of tumor boundaries, leading to more meticulous treatment planning and ultimately improved patient outcomes. These findings support the notion that a diffusion model, even one as simple as the SR3 architecture, may be suitable for handling complex pathological tissues, suggesting the potential of diffusion models in clinical applications that require intricate representations of structural details. However, extensive studies on a wider population are required to substantiate such a claim.
The superior performance of the SR3 diffusion model over UNet and CycleGAN can be attributed to its iterative denoising mechanism, which enables gradual refinement of anatomical structures. Unlike UNet and CycleGAN, which perform enhancement in a single forward pass and may struggle with recovering high-frequency details lost in acquisitions, SR3 progressively reverses the degradation process by predicting intermediate noise states. This stepwise restoration allows the model to handle severe information loss in LF images more effectively. We hypothesize that this is particularly beneficial in recovering complex structural information in pathological regions, where the signal is often weak, and the anatomical boundaries are ambiguous. For example, tumor subregions such as the enhancing core exhibit subtle intensity variations and require precise restoration of edges, something SR3 appears better equipped to handle due to its capacity to model complex conditional distributions over noise scales. This is supported by our region-wise analyses, which consistently showed SR3 yielding the lowest structural error in tumor tissues. In essence, SR3’s denoising mechanism acts as a learned prior that reinforces anatomical plausibility at each refinement step, an advantage that single-shot models inherently lack.
While this work focused on comparing SR3 to CNN and GAN models, future investigations could include transformer-based approaches such as ViT to broaden architectural comparisons. However, our goal here was to demonstrate that even a basic diffusion framework offers significant gains under low-field constraints, emphasizing the value of its iterative denoising process over architectural novelty.
Another observation concerns the pattern of changes in the metrics across the different types of tissue, for example, when comparing healthy tissue with the pathological structure of the enhancing core. This highlights the ongoing challenge for medical image translation methods to accurately capture variations in pathological tissue. SR3 consistently exhibited the lowest error rates compared to CycleGAN and UNet, especially in the rather severe tumor regions. This also suggests that the iterative denoising strategy of a diffusion model, i.e., SR3 in our case, may be better at learning complex distributions within the training data.
This work also included the use of gEn and GE, as they are specifically designed to highlight structural improvements in an enhanced image. These measures offer a direct quantification of structural error, providing a more targeted measure than traditional metrics like SSIM and nRMSE, which were not originally designed for this purpose62,63; however, SSIM was still reported in our study due to its widespread use in the literature. Interestingly, despite its limitations, SSIM yielded trends similar to GE and gEn in our results. We attribute this to the clear and substantial performance gap between models, i.e., when the enhancement difference is significant, even suboptimal metrics like SSIM may still reflect a meaningful ranking.
While this study demonstrates the potential of SR3 for LF-to-HF translations, it has certain limitations which do not impact the conclusions. This study was designed as a proof-of-concept investigation into whether the core iterative denoising mechanism of diffusion models can effectively enhance ultra-low-field (64mT) MR images, particularly in pathological contexts. Training used synthesized LFMRI data from the BraTS public dataset due to the logistic and ethical challenges of acquiring sufficient paired 64mT/3T tumor data. Our goal was to explore whether such models could enhance LF images for patients without access to 3T systems, such as those in rural, emergency, or low-resource settings where portable MRI scanners are more viable.
This study focused exclusively on FLAIR images because a validated synthesis method exists for this contrast, with both quantitative and neuroradiologist validation46. In contrast, synthesis methods for other modalities typically rely either on modeling tissue-specific relaxation properties2,6,47,48, which are not well-characterized for pathological tissues and may lead to unrealistic tumor representations in our study, or on simplistic image degradation techniques that fail to capture key characteristics of LFMRI such as noise behavior10,49,50. The pipeline we employed provides a more realistic simulation of LF pathology than these alternatives, especially given the absence of reliable relaxation data for tumors.
For evaluation, we relied on region-wise analysis using expert segmentations from BraTS to quantify model performance within clinically relevant tumor subregions. Automated tumor segmentation pipelines typically require multi-contrast input, which was not available in our single-contrast setting. As more multi-contrast and real 64mT data become available, we envision extending this work toward full clinical evaluation pipelines that include radiologist assessment of image quality and clinical value. Indeed, real LFMRI data, rather than synthetic, would further advance such studies.
Conclusion
In-silico studies demonstrated that the SR3 diffusion model outperforms CycleGAN and UNet in the image-to-image translation paradigm from synthetic 64mT FLAIR to 3T MRI. This was evident from the performance metrics, which showed p-values ≪ 0.05 for the pairwise comparisons between the models. These findings highlight the potential of even a simple diffusion model in enhancing LFMRI, suggesting promising applications in the possible clinical translation of LFMRI.
Data availability
The datasets analyzed during the current study are publicly available in the BraTS 2019 repository: https://www.med.upenn.edu/cbica/brats2019/data.html. The synthesized LF MRI data generated during this study are available from the corresponding author upon reasonable request.
References
Hori, M., Hagiwara, A., Goto, M., Wada, A. & Aoki, S. Low-Field Magnetic Resonance Imaging: Its History and Renaissance. Invest. Radiol. 56, 669–679. https://doi.org/10.1097/RLI.0000000000000810 (2021).
Marques, J. P., Simonis, F. F. J. & Webb, A. G. Low-field MRI: An MR physics perspective. J. Magn. Reson. Imaging 49, 1528–1542. https://doi.org/10.1002/jmri.26637 (2019).
Heiss, R., Nagel, A. M., Laun, F. B., Uder, M. & Bickelhaupt, S. Low-Field Magnetic Resonance Imaging: A New Generation of Breakthrough Technology in Clinical Imaging. Invest. Radiol. 56, 726–733. https://doi.org/10.1097/RLI.0000000000000805 (2021).
Arnold, T. C., Freeman, C. W., Litt, B. & Stein, J. M. Low-field MRI: Clinical promise and challenges. J. Magn. Reson. Imaging 57, 25–44. https://doi.org/10.1002/jmri.28408 (2023).
Kimberly, W. T. et al. Brain imaging with portable low-field MRI. Nat. Rev. Bioeng. 1, 617–630. https://doi.org/10.1038/s44222-023-00086-w (2023).
Rooney, W. D. et al. Magnetic field and tissue dependencies of human brain longitudinal 1H2O relaxation in vivo. Magnetic Resonance in Medicine: An Official Journal of the International Society for Magnetic Resonance in Medicine 57, 308–318 (2007).
Koonjoo, N., Zhu, B., Bagnall, G. C., Bhutto, D. & Rosen, M. S. Boosting the signal-to-noise of low-field MRI with deep learning image reconstruction. Sci. Rep. 11, 8248. https://doi.org/10.1038/s41598-021-87482-7 (2021).
Ayde, R., Senft, T., Salameh, N. & Sarracanie, M. Deep learning for fast low-field MRI acquisitions. Sci. Rep. 12, 11394. https://doi.org/10.1038/s41598-022-14039-7 (2022).
Su, J. et al. A CNN Based Software Gradiometer for Electromagnetic Background Noise Reduction in Low Field MRI Applications. IEEE Trans. Med. Imaging 41, 1007–1016. https://doi.org/10.1109/TMI.2022.3147450 (2022).
de Leeuw den Bouter, M. L. et al. Deep learning-based single image super-resolution for low-field MR brain images. Sci. Rep. 12, 6362. https://doi.org/10.1038/s41598-022-10298-6 (2022).
Yoo, D. et al. Signal Enhancement of Low Magnetic Field Magnetic Resonance Image Using a Conventional- and Cyclic-Generative Adversarial Network Models With Unpaired Image Sets. Front. Oncol. 11, 660284. https://doi.org/10.3389/fonc.2021.660284 (2021).
Chen, Z. et al. Deep Learning for Image Enhancement and Correction in Magnetic Resonance Imaging—State-of-the-Art and Challenges. J. Digit. Imaging 36, 204–230. https://doi.org/10.1007/s10278-022-00721-9 (2023).
Lucas, A. et al. Multi-contrast high-field quality image synthesis for portable low-field MRI using generative adversarial networks and paired data. MedRxiv https://doi.org/10.1101/2023.12.28.23300409 (2023).
Javadi, M. et al. Let UNet play an adversarial game: investigating the effect of adversarial training in enhancing low-resolution MRI. Journal of Imaging Informatics in Medicine https://doi.org/10.1007/s10278-024-01205-8 (2024).
Zou, B. et al. Multi-scale deformable transformer for multi-contrast knee MRI super-resolution. Biomed. Signal. Process. Control. 79, 104154. https://doi.org/10.1016/j.bspc.2022.104154 (2023).
Mahapatra, D. & Ge, Z. MR image super resolution by combining feature disentanglement CNNs and Vision transformers. PMLR 172, 858–878 (2022).
Shamshad, F. et al. Transformers in medical imaging: A survey. Med. Image Anal. 88, 102802. https://doi.org/10.1016/j.media.2023.102802 (2023).
Wu, Z., Chen, X., Xie, S., Shen, J. & Zeng, Y. Super-resolution of brain MRI images based on denoising diffusion probabilistic model. Biomed. Signal Process. Control 85, 104901. https://doi.org/10.1016/j.bspc.2023.104901 (2023).
Li, G. et al. Rethinking diffusion model for multi-contrast MRI super-resolution. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, (2024).
Shang, S. et al. ResDiff: Combining CNN and Diffusion Model for Image Super-Resolution. In: AAAI’24/IAAI’24/EAAI’24: Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence and Thirty-Sixth Conference on Innovative Applications of Artificial Intelligence and Fourteenth Symposium on Educational Advances in Artificial Intelligence 8975-8983 (2024). https://doi.org/10.1609/aaai.v38i8.28746
Li, H. et al. SRDiff: Single image super-resolution with diffusion probabilistic models. Neurocomputing 479, 47–59. https://doi.org/10.1016/j.neucom.2022.01.029 (2022).
Yue, Z., Wang. J., & Loy, C. C. ResShift: Efficient diffusion model for image super-resolution by residual shifting. Adv. Neural Inf. Process. Syst. 36, 13294–13307(2024).
Gao, S. et al. Implicit diffusion models for continuous super-resolution. Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, (2023).
Han, Z. & Huang, W. Arbitrary scale super-resolution diffusion model for brain MRI images. Comput. Biol. Med. 170, 108003. https://doi.org/10.1016/j.compbiomed.2024.108003 (2024).
Wu, Z., Chen, X. & Yu, J. 3D-SRDM: 3D super-resolution of MRI volumes based on diffusion model. Int. J. Imaging Syst. Technol. 34. https://doi.org/10.1002/ima.23093 (2024).
Han, Z. & Huang, W. Prostate MRI super-resolution using discrete residual diffusion model. Proceedings - 2023 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2023, Institute of Electrical and Electronics Engineers Inc., p. 1947–50. https://doi.org/10.1109/BIBM58861.2023.10385851 (2023).
Huang, G., Chen, X., Shen, Y. & Wang, S. MR image super-resolution using wavelet diffusion for predicting alzheimer’s disease. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 13974, p. 146–57. https://doi.org/10.1007/978-3-031-43075-6_13. (LNAI, Springer Science and Business Media Deutschland GmbH, 2023)
Mao, Y., Jiang, L., Chen, X. & Li C. DisC-Diff: disentangled conditional diffusion model for multi-contrast MRI super-resolution. International Conference on Medical Image Computing and Computer-Assisted Intervention. (Cham: Springer Nature Switzerland, 2023).
Saharia, C. et al. Image super-resolution via iterative refinement. IEEE Trans. Pattern. Anal. Mach. Intell. 45, 4713–4726 (2022).
Zhu, J-Y., Park, T., Isola, P. & Efros A. A. Unpaired image-to-image translation using cycle-consistent adversarial networks. Proceedings of the IEEE international conference on computer vision, p. 2223–32 (2017).
Ronneberger, O., Fischer, P. & Brox T. U-net: Convolutional networks for biomedical image segmentation. Medical image computing and computer-assisted intervention–MICCAI 2015: 18th international conference, Munich, Germany, October 5–9, 2015, proceedings, part III 18, p. 234–41 (2015).
Yang, G., Li, C., Yao, Y., Wang, G. & Teng, Y. Quasi-supervised learning for super-resolution PET. Comput. Med. Imaging Graph. 113, 102351 (2024).
Fang, W. et al. CycleINR: Cycle implicit neural representation for arbitrary-scale volumetric super-resolution of medical data. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, p. 11631–41. (2024).
Liu, H., Liu, J., Hou, S., Tao, T. & Han, J. Perception consistency ultrasound image super-resolution via self-supervised CycleGAN. Neural. Comput. Appl. 35, 1–11 (2023).
Du, Y. et al. Dual-Channel in Spatial-Frequency Domain CycleGAN for perceptual enhancement of transcranial cortical vascular structure and function. Comput. Biol. Med. 173, 108377 (2024).
Chatterjee, S., Sarasaen, C., Rose, G., Nürnberger, A. & Speck, O. Ddos-unet: Incorporating temporal information using dynamic dual-channel unet for enhancing super-resolution of dynamic mri. IEEE Access 12, 99122–99136 (2024).
Min, F., Wang, L., Pan, S. & Song, G. D 2 UNet: Dual decoder U-Net for seismic image super-resolution reconstruction. IEEE Trans. Geosci. Remote Sens. 61, 1–13 (2023).
El-Assiouti, H. S., El-Saadawy, H., Al-Berry, M. N. & Tolba, M. F. Lite-SRGAN and Lite-UNet: toward fast and accurate image super-resolution, segmentation, and localization for plant leaf diseases. IEEE Access 11, 67498–67517 (2023).
Oktay, O. et al. Attention u-net: Learning where to look for the pancreas. ArXiv Preprint ArXiv:180403999 (2018).
Ma, C. Uncertainty-aware GAN for single image super resolution. Proc. AAAI Conf. Artif. Intell. 38, 4071–4079 (2024).
Islam, K. T. et al. Improving portable low-field MRI image quality through image-to-image translation using paired low- and high-field images. Sci. Rep. 13, 21183. https://doi.org/10.1038/s41598-023-48438-1 (2023).
Iglesias, J. E. et al. Accurate super-resolution low-field brain mri. ArXiv Preprint ArXiv:220203564 (2022).
Iglesias, J. E. et al. Quantitative brain morphometry of portable low-field-strength MRI using super-resolution machine learning. Radiology 306, e220522 (2022).
Cooper, R. et al. Bridging the gap: improving correspondence between low-field and high-field magnetic resonance images in young people. Front. Neurol. 15, 1339223 (2024).
Lau, V. et al. Pushing the limits of low-cost ultra-low-field MRI by dual-acquisition deep learning 3D superresolution. Magn. Reson. Med. 90, 400–416 (2023).
Arnold, T. C., Baldassano, S. N., Litt, B. & Stein, J. M. Simulated diagnostic performance of low-field MRI: Harnessing open-access datasets to evaluate novel devices. Magn. Reson. Imaging. 87, 67–76 (2022).
Pohmann, R., Speck, O. & Scheffler, K. Signal-to-noise ratio and MR tissue parameters in human brain imaging at 3, 7 and 9.4 tesla using current receive coil arrays. Magn Reson Med. 75, 801–9. (2016).
Salehi, A. et al. Denoising low-field MR images with a deep learning algorithm based on simulated data from easily accessible open-source software. J. Magn. Reson. 370, 107812 (2025).
Lin, H. et al. Low-field magnetic resonance image enhancement via stochastic image quality transfer. Med. Image Anal. 87, 102807 (2023).
Kalluvila, A., Koonjoo, N., Bhutto, D., Rockenbach, M. & Rosen, M. S. Synthetic low-field MRI super-resolution via nested U-Net architecture. ArXiv Preprint ArXiv:221115047 (2022).
Kim, S., Alexander, D. C., Eldaly, A. K., Figini, M. & Tregidgo H. F. J. A 3D conditional diffusion model for image quality transfer-an application to low-field MRI. Deep generative models for health workshop NeurIPS ArXiv Preprint ArXiv:2311.06631 https://arxiv.org/abs/2311.06631 (2023).
Menze, B. H. et al. The multimodal brain tumor image segmentation benchmark (BRATS). IEEE Trans. Med. Imaging 34, 1993–2024 (2014).
Lin, X. et al. Zero-shot low-field mri enhancement via denoising diffusion driven neural representation. International Conference on Medical Image Computing and Computer-Assisted Intervention, p. 775–85. (2024).
Ho, J., Jain, A. & Abbeel, P. Denoising diffusion probabilistic models. Adv. Neural Inf. Process. Syst. 33, 6840–6851 (2020).
Vaswani, A. et al. Attention is All you Need. In: Adv Neural Inf Process Syst Vol. 30, (eds. Guyon, I. et al.) Curran Associates, Inc., (2017).
Paszke, A. et al. PyTorch: an imperative style, high-performance deep learning library. In: Adv Neural Inf Process Syst, vol. 32, (eds. Wallach, H.,et al.) Curran Associates, Inc., (2019).
Kingma, D., & Ba, J. Adam: A method for stochastic optimization. International conference on learning representations (ICLR), San Diega, CA, USA, (2015).
Guerreiro, J., Tomás, P., Garcia, N. & Aidos, H. Super-resolution of magnetic resonance images using generative adversarial networks. Comput. Med. Imaging Graph. 108, 102280. https://doi.org/10.1016/j.compmedimag.2023.102280 (2023).
Wang, Q. et al. DISGAN: Wavelet-informed discriminator guides GAN to MRI super-resolution with noise cleaning. Proceedings of the IEEE/CVF International Conference on Computer Vision, p. 2452–61. (2023).
McGee, K. P., Manduca, A., Felmlee, J. P., Riederer, S. J. & Ehman, R. L. Image metric-based correction (autocorrection) of motion effects: analysis of image metrics. Journal of Magnetic Resonance Imaging: An Official Journal of the International Society for Magnetic Resonance in Medicine 11, 174–181 (2000).
Venkatanath, N., Praneeth, D., Channappayya, S. S. & Medasani, S. S. Blind image quality evaluation using perception based features. 2015 twenty first national conference on communications (NCC), p. 1–6. (2015).
Nilsson, J. & Akenine-Möller, T. Understanding ssim. ArXiv Preprint ArXiv:200613846 (2020).
Andersson, P., Nilsson, J. & Akenine-Möller, T. FLIP: A Difference Evaluator for Alternating Images. Proc. ACM Comput. Graph. Interact. Tech. 33, 1–23 (2020).
Chen, Y. et al. Efficient and accurate MRI super-resolution using a generative adversarial network and 3D multi-level densely connected network. International conference on medical image computing and computer-assisted intervention, p. 91–9. (2018).
Lyu, Q. et al. Multi-Contrast Super-Resolution MRI through a Progressive Network. IEEE Trans. Med. Imaging 39, 2738–2749. https://doi.org/10.1109/TMI.2020.2974858 (2020).
Do, W. J. et al. Reconstruction of multicontrast MR images through deep learning. Med. Phys. 47, 983–997. https://doi.org/10.1002/mp.14006 (2020).
Sharma, R., Tsiamyrtzis, P., Webb, A. G., Leiss, E. L. & Tsekos, N. V. Learning to deep learning: statistics and a paradigm test in selecting a UNet architecture to enhance MRI. Magn. Reson. Mater. Phys. Biol. Med. 37(3), 507–528. https://doi.org/10.1007/s10334-023-01127-6 (2023).
Schading, S. et al. Reliability of spinal cord measures based on synthetic T1-weighted MRI derived from multiparametric mapping (MPM). Neuroimage 271, 120046. https://doi.org/10.1016/j.neuroimage.2023.120046 (2023).
Dror, R., Shlomov, S. & Reichart, R. Deep dominance-how to properly compare deep neural models. Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, p. 2773–85. (2019).
Raschka, S. Model evaluation, model selection, and algorithm selection in machine learning. ArXiv Preprint ArXiv:181112808 (2018).
Lin, H. et al. Deep learning for low-field to high-field MR: Image quality transfer with probabilistic decimation simulator. Machine Learning for Medical Image Reconstruction: Second International Workshop, MLMIR 2019, Held in Conjunction with MICCAI 2019, Shenzhen, China, October 17, 2019, Proceedings 2, p. 58–70 (2019).
Author information
Contributions
Conceptualization, M.J., R.G., and N.V.T.; Data curation, M.J. and R.G.; Experimentation, M.J. and R.G.; Formal analysis, M.J. and P.T.; Investigation, M.J., R.G., and N.V.T.; Methodology, M.J. and R.G.; Project administration, N.V.T.; Software, M.J. and R.G.; Supervision, N.V.T.; Validation, M.J., R.G., P.T., E.L., A.G.W., and N.V.T.; Visualization, M.J. and P.T.; Writing – original draft, M.J, R.G., P.T., E.L., A.G.W., and N.V.T.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Javadi, M., Griffin, R., Tsiamyrtzis, P. et al. In-silico comparison of a diffusion model with conventionally trained deep networks for translating 64mT to 3T brain FLAIR. Sci Rep 15, 38052 (2025). https://doi.org/10.1038/s41598-025-21806-9