Introduction

In recent years, deep learning-based reconstruction from under-sampled measurements has become an important topic in magnetic resonance imaging (MRI). Although MRI is a critical tool that provides invaluable diagnostic information, its long acquisition times lead to several limitations, including delays in diagnosis, limited availability for patients, and image degradation, among others. Multiple works have focused on convolutional neural network (CNN)-based models for under-sampled MRI reconstruction, and such deep learning-based models have achieved superior performance1,2,3,4,5,6,7,8.

Most recently, diffusion models have shown great potential and remarkable performance in various fields. Works such as denoising diffusion probabilistic models (DDPM)9 and score-based models10,11 have shown promising results in image generation, in-painting, super-resolution, image editing and more12,13,14,15,16,17,18,19,20,21. Diffusion models typically include two components: image degradation (the forward or diffusion process) and image generation (the reverse process). In the first part, noise is gradually added to the original data step by step. In the second part, a deep neural network is trained as a restoration operator to perform denoising and recover the original data distribution. Recently, multiple works have applied diffusion-based models to accelerated MRI reconstruction22,23,24,25. Chung et al.26 propose a diffusion model that performs accelerated MRI reconstruction directly from the under-sampled image. Xie et al.27 propose a measurement-conditioned denoising diffusion probabilistic model (MC-DDPM) for accelerated MRI reconstruction. Instead of performing the reverse process on the image, MC-DDPM performs reconstruction in the measurement space (k-space for MRI). During the diffusion process, MC-DDPM gradually adds Gaussian noise to the k-space data conditioned on the under-sampling mask. Then, during the reverse process, MC-DDPM recovers the k-space data step by step.

Conventional diffusion models use Gaussian noise during training and generation. They can be seen as a random walk around the image density function using Langevin dynamics. Cold diffusion models broaden this scope and consider models built around arbitrary image transformations such as blurring, down-sampling, in-painting, and snowification. In this way, the cold diffusion model can be considered a generalized diffusion model that requires no Gaussian noise during training or testing28. The deep neural networks in cold diffusion models are trained to invert such image degradations rather than to remove Gaussian noise. The degradations in cold diffusion can be randomized or deterministic and can be designed as needed. This framework provides a more diverse paradigm of diffusion models beyond Gaussian noise and gives more flexibility for image degradation and generation.

In this work, we present a k-space cold diffusion model for accelerated MRI reconstruction, illustrated in Fig. 1. Unlike previous diffusion-based models for MRI reconstruction, which use Gaussian noise, our model performs the degradation in k-space during the diffusion process. A deep neural network is trained to perform the reverse process and recover the original fully sampled image. In this way, the k-space sampling process is integrated directly into the image degradation process, enhancing the model’s generalizability, especially when the sampling process is similar. This allows for quicker application and better performance in zero-shot or few-shot learning scenarios. We compare against multiple deep learning-based MRI reconstruction models and perform tests on a well-known large open-source MRI dataset: fastMRI29. Our results show that this novel way of performing degradation can generate high-quality reconstructions for accelerated MRI. We hope that this work can inspire more generalized diffusion models for MRI reconstruction in the future.

Fig. 1. The overall pipeline of our k-space cold diffusion model.

Methods

Cold diffusion

Diffusion models are a class of latent variable models that use a Markov chain to convert a noise distribution into the data distribution. They have the form \(p_{\theta}\left(\text{x}_{0}\right) := \int p_{\theta}\left(\text{x}_{0:T}\right)d\text{x}_{1:T}\), where \(\text{x}_{1},\dots,\text{x}_{T}\) are latent variables of the same dimensionality as the data \(\text{x}_{0} \sim q\left(\text{x}_{0}\right)\). The reverse process \(p_{\theta}\left(\text{x}_{0:T}\right)\) is a joint distribution defined as a Markov chain with learned Gaussian transitions starting from \(p_{\theta}\left(\text{x}_{T}\right)=\mathcal{N}(\text{x}_{T};0,\mathbf{I})\):

$$p_{\theta}\left(\text{x}_{0:T}\right) := p_{\theta}\left(\text{x}_{T}\right)\prod\limits_{t=1}^{T} p_{\theta}\left(\text{x}_{t-1}|\text{x}_{t}\right),\quad p_{\theta}\left(\text{x}_{t-1}|\text{x}_{t}\right) := \mathcal{N}\left(\text{x}_{t-1};{\varvec{\mu}}_{\theta}\left(\text{x}_{t},t\right),\sigma_{t}^{2}\mathbf{I}\right)$$
(1)

The forward or diffusion process is the approximate posterior \(\:q\left({\text{x}}_{1:T}|{\text{x}}_{0}\right)\), which is fixed to a Markov chain that gradually adds Gaussian noise to the data according to a variance schedule \(\:{\beta\:}_{1},\dots\:,{\beta\:}_{T}\):

$$q\left(\text{x}_{1:T}|\text{x}_{0}\right) := \prod\limits_{t=1}^{T} q\left(\text{x}_{t}|\text{x}_{t-1}\right),\quad q\left(\text{x}_{t}|\text{x}_{t-1}\right) := \mathcal{N}\left(\text{x}_{t};\sqrt{1-\beta_{t}}\,\text{x}_{t-1},\beta_{t}\mathbf{I}\right)$$
(2)

In practice, the forward process is carried out by gradually adding Gaussian noise according to the variance schedule. This process involves no learnable parameters. The reverse process, on the other hand, is implemented with a learnable deep neural network.
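As a rough illustration of the forward process in Eq. (2), the following minimal NumPy sketch draws each \(\text{x}_{t}\) from \(\text{x}_{t-1}\). The function names and the linear schedule from \(10^{-4}\) to 0.02 over 1000 steps are assumptions chosen for illustration only; they are not taken from this work.

```python
import numpy as np

def forward_step(x_prev, beta_t, rng):
    """Draw x_t ~ N(sqrt(1 - beta_t) * x_prev, beta_t * I), as in Eq. (2)."""
    noise = rng.standard_normal(x_prev.shape)
    return np.sqrt(1.0 - beta_t) * x_prev + np.sqrt(beta_t) * noise

def forward_chain(x0, betas, rng):
    """Apply the forward step for each beta in turn; returns [x_0, ..., x_T]."""
    xs = [x0]
    for beta_t in betas:
        xs.append(forward_step(xs[-1], beta_t, rng))
    return xs

rng = np.random.default_rng(0)
x0 = np.ones((4, 4))                      # toy "image"
betas = np.linspace(1e-4, 0.02, 1000)     # hypothetical variance schedule
xs = forward_chain(x0, betas, rng)
```

Note that nothing in this chain is learned: the only inputs are the data, the schedule, and a random number generator.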

The cold diffusion model28 is a generalized diffusion model that provides more flexibility for image degradation and restoration. Given an image \(\text{x}_{0}\), consider a degradation operator \(D\) with severity \(t\); the degraded image \(\text{x}_{t}=D(\text{x}_{0},t)\) should vary continuously in \(t\), and the degradation should satisfy \(D\left(\text{x}_{0},0\right)=\text{x}_{0}\). In the standard diffusion model, \(D\) gradually adds Gaussian noise with a variance proportional to \(t\). Several works have studied more efficient noise scheduling along \(t\) and faster convergence30,31,32,33. To revert this process and generate an image, the restoration operator \(R\) (approximately) inverts \(D\) and satisfies \(R(\text{x}_{t},t)\approx\text{x}_{0}\). In practice, \(R\) is implemented as a deep neural network parameterized by \(\theta\). This network can then be trained via the minimization problem:

$${\mathop {\min }\limits_{\theta } {\mathbb{E}}\left\| {R_{\theta } \left( {D\left( {{\text{x}},t} \right),t} \right) - {\text{x}}} \right\|}$$
(3)
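The objective in Eq. (3) can be estimated on a batch as in the sketch below. This is only an illustration under stated assumptions: the severity \(t\) is drawn uniformly from \(1,\dots,T\), the norm is taken as a mean absolute (L1) error, and the toy `degrade`/`restore` pair (a linear fade and its exact inverse) stands in for a real degradation and a trained network.

```python
import numpy as np

def restoration_loss(restore, degrade, batch, T, rng):
    """Monte Carlo estimate of Eq. (3) using a mean absolute (L1) error."""
    total = 0.0
    for x in batch:
        t = int(rng.integers(1, T + 1))   # assumed: t uniform on 1..T
        x_t = degrade(x, t)               # D(x, t)
        total += np.abs(restore(x_t, t) - x).mean()
    return total / len(batch)

# Toy stand-ins (hypothetical): fade the image toward zero, and invert the fade.
T = 10
degrade = lambda x, t: x * (1.0 - t / (T + 1))
restore = lambda x_t, t: x_t / (1.0 - t / (T + 1))

rng = np.random.default_rng(0)
batch = [rng.standard_normal((8, 8)) for _ in range(4)]
loss_exact = restoration_loss(restore, degrade, batch, T, rng)
loss_identity = restoration_loss(lambda x_t, t: x_t, degrade, batch, T, rng)
```

With the exact inverse the loss is zero up to floating-point error, whereas a restoration operator that simply returns its input gives a strictly positive loss.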

Once the degradation is chosen and the network is properly trained to perform the restoration, the network can be used to sample images from a degraded image. Standard diffusion models can generate images from noise because the network is trained to perform reconstruction under Gaussian noise. They perform image generation (sampling from the model) by iteratively applying the denoising operation and adding the noise back:

$$\:{\widehat{\text{x}}}_{0}=R\left({\text{x}}_{t},t\right)$$
$$\:\begin{array}{c}{\text{x}}_{t-1}=D\left({\widehat{\text{x}}}_{0},t-1\right),\:\:t=T,T-1,\dots\:,1\end{array}$$
(4)

The sampling strategy above works well when the restoration operator is perfect, meaning \(\:R\left(D\left({\text{x}}_{0},t\right),t\right)={\text{x}}_{0}\) holds for all \(\:t\). However, if the restoration operator is an approximate inverse of the degradation, \(\:{\text{x}}_{t}\) can wander away from \(\:D\left({\text{x}}_{0},t\right)\) and lead to an inaccurate reconstruction. Cold diffusion proposed an improved sampling strategy. Instead of sampling \(\:{\text{x}}_{t}\) from \(\:{\widehat{\text{x}}}_{0}\) directly, it samples \(\:{\text{x}}_{t}\) via intermediate variables:

$$\:\begin{array}{c}{\text{x}}_{t-1}={\text{x}}_{t}-D\left({\widehat{\text{x}}}_{0},t\right)+D\left({\widehat{\text{x}}}_{0},t-1\right)\end{array}$$
(5)

This sampling strategy is especially beneficial when the higher-order terms in the Taylor expansion of the degradation \(D\left(\text{x},t\right)\) are non-negligible. It enables more reliable reconstructions for cold diffusion models with a smaller total step number \(T\) and for a variety of image restoration operations such as deblurring, inpainting, super-resolution, and snowification.
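The difference between the two sampling rules in Eqs. (4) and (5) can be seen on a toy problem. The sketch below is an illustration only: the degradation \(D(\text{x},t)=\text{x}+t\,\text{e}\) with a fixed direction \(\text{e}\) and the deliberately biased restoration operator are hypothetical choices, not the operators used in this work. For this linear degradation, the update in Eq. (5) reduces to \(\text{x}_{t-1}=\text{x}_{t}-\text{e}\) whatever \(R\) returns, so the error of \(R\) cancels.

```python
import numpy as np

def sample_naive(x_T, degrade, restore, T):
    """Eq. (4): restore, then re-degrade the estimate to severity t - 1."""
    x_t = x_T
    for t in range(T, 0, -1):
        x0_hat = restore(x_t, t)
        x_t = degrade(x0_hat, t - 1)
    return x_t

def sample_improved(x_T, degrade, restore, T):
    """Eq. (5): x_{t-1} = x_t - D(x0_hat, t) + D(x0_hat, t - 1)."""
    x_t = x_T
    for t in range(T, 0, -1):
        x0_hat = restore(x_t, t)
        x_t = x_t - degrade(x0_hat, t) + degrade(x0_hat, t - 1)
    return x_t

rng = np.random.default_rng(0)
x0 = rng.standard_normal((8, 8))
e = rng.standard_normal((8, 8))               # fixed degradation direction
T = 20
degrade = lambda x, t: x + t * e              # toy linear degradation
restore = lambda x_t, t: 0.9 * (x_t - t * e)  # imperfect inverse (10% bias)

x_T = degrade(x0, T)
x_naive = sample_naive(x_T, degrade, restore, T)
x_improved = sample_improved(x_T, degrade, restore, T)
err_naive = np.abs(x_naive - x0).max()
err_improved = np.abs(x_improved - x0).max()
```

Here `err_improved` is at the level of floating-point round-off, while `err_naive` is large because the bias of the restoration operator compounds at every step.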

Cold diffusion in k-space

An MR scanner performs imaging by acquiring measurements in the frequency domain (i.e., k-space) using receiver coils. The relationship between the underlying image \(\text{x}\) and the measured k-space \(\text{k}\) can be represented as:

$$\text{k} = {\mathcal{F}}\left(\text{x}\right) + \epsilon$$
(6)

where \(\mathcal{F}\) is the Fourier transform operator and \(\epsilon\) is the measurement noise. The MRI acquisition speed is limited by the amount of k-space data to be acquired. The acquisition can be accelerated by collecting only a portion of the k-space data, \({\tilde{{\text k}}} = M \circ {\text{k}}\), where \(M\) is a binary down-sampling mask that selects a subset of the overall k-space and \(\circ\) denotes the Hadamard product. Only the selected subset of k-space is then collected during the measurement. Thus, the accelerated image can be represented as:

$$\:\begin{array}{c}\widehat{\text{x}}={\mathcal{F}}^{-1}\left(M\circ\:\text{k}\right)\end{array}$$
(7)

This down-sampling means less k-space data to collect and therefore faster imaging. However, after applying the inverse Fourier transform to return to image space, the resulting image typically contains aliasing artifacts. Heavier down-sampling typically leads to more intense aliasing artifacts and worse image quality. This k-space down-sampling can therefore be considered an image degradation.
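The zero-filled reconstruction of Eq. (7) can be sketched in a few lines of NumPy. The helper names, the centered-FFT convention, the 64 × 64 toy size and the example column mask are assumptions made for illustration, and the measurement noise \(\epsilon\) is omitted.

```python
import numpy as np

def to_kspace(x):
    """Centered, orthonormal 2D FFT: image -> k-space."""
    return np.fft.fftshift(np.fft.fft2(np.fft.ifftshift(x), norm="ortho"))

def to_image(k):
    """Centered, orthonormal 2D inverse FFT: k-space -> image."""
    return np.fft.fftshift(np.fft.ifft2(np.fft.ifftshift(k), norm="ortho"))

def zero_filled(k, mask):
    """Eq. (7): inverse transform of the masked k-space (Hadamard product)."""
    return to_image(mask * k)

rng = np.random.default_rng(0)
x = rng.standard_normal((64, 64))   # toy image
k = to_kspace(x)

# Hypothetical column mask: every 4th column plus a fully sampled center band.
mask = np.zeros((64, 64))
mask[:, ::4] = 1
mask[:, 28:36] = 1

x_zf = zero_filled(k, mask)         # degraded (aliased) image
x_full = zero_filled(k, np.ones_like(mask))
```

With an all-ones mask the original image is recovered, while any mask that discards k-space entries yields a different, degraded image.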

Furthermore, consider a sampling mask \(\:{M}_{t}\) that changes along time steps \(\:t\). This image degradation can then be written as:

$$\text{x}_{t}=D\left(\text{x}_{0},t\right)={\mathcal{F}}^{-1}\left(M_{t}\circ\text{k}\right),\quad t=0,1,\dots,T$$
(8)

In the cold diffusion model, the degradation severity varies with \(t\), where \(t=0\) corresponds to the original image and \(t=T\) corresponds to the final degraded image. For k-space cold diffusion, \(\text{x}_{0}={\mathcal{F}}^{-1}\left(\text{k}\right)\) is the fully sampled image reconstruction and \(t\) determines the down-sampling proportion in k-space: a larger \(t\) corresponds to heavier down-sampling. We set \(M_{t=0}=\varvec{J}\) and \(M_{t=T}=M\), where \(\varvec{J}\) denotes the matrix of ones and \(M\) denotes the sampling mask used to acquire the measured k-space data. Thus, the un-degraded image at \(t=0\) is the fully sampled reconstruction from the k-space measurement, and the final degraded image at \(t=T\) is the zero-filled reconstruction from the sub-sampled measurement corresponding to the mask \(M\). The sampling proportions of the masks at intermediate steps are scheduled linearly in the step number \(t\).

We then train a model to reverse this k-space degradation:

$$\:\begin{array}{c}{\widehat{\text{x}}}_{0}=R\left({\text{x}}_{t},t\right)=R\left({\mathcal{F}}^{-1}\left({M}_{t}\circ\:\text{k}\right),t\right)\end{array}$$
(9)

Then, we use Eq. (5) to predict the fully sampled reconstruction \(\:{\widehat{\text{x}}}_{0}\). Figure 1 illustrates the overall pipeline of our k-space cold diffusion model.

Implementation details

We performed our k-space down-sampling degradation with two types of masks: Cartesian sampling mask29 and 2D Gaussian mask. Both are binary masks where 1 indicates that the corresponding k-space data is preserved and 0 indicates the corresponding k-space is masked out.

Let \(M^{c}=\varvec{J}-M\) denote the complement of the mask \(M\), where \(\varvec{J}\) is the matrix of ones. At each time step, we randomly select a subset \(M_{t}^{c}\) from \(M^{c}\) such that the portion selected is proportional to \((T-t)/T\), meaning \(M_{t}^{c}\) is scheduled linearly in \(t\). Then \(M_{t}=M+M_{t}^{c}\) and the corresponding image is \(\text{x}_{t}={\mathcal{F}}^{-1}\left(M_{t}\circ\text{k}\right)=D\left(\text{x}_{0},t\right)\). Fig. 2 shows examples of this k-space degradation process along the time steps.
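One possible way to build such a linear mask schedule is sketched below. It is not the authors' implementation: the function names are hypothetical, the complement entries are released in a single fixed random order so that the masks are nested (an assumption; the text only states that a random subset is selected at each step), and an uncentered orthonormal FFT is used for brevity. For a Cartesian mask, the same function could be applied to a 1D column indicator that is then broadcast over rows.

```python
import numpy as np

def mask_schedule(mask, T, rng):
    """Return [M_0, ..., M_T] with M_0 all ones and M_T equal to `mask`.

    Assumption: complement entries are added in one fixed random order,
    so the masks are nested (M_t contains M_{t+1}).
    """
    comp_idx = np.flatnonzero(mask == 0)        # positions in the complement
    order = rng.permutation(comp_idx)
    masks = []
    for t in range(T + 1):
        n_keep = int(round(len(order) * (T - t) / T))  # linear in t
        m_t = mask.copy().ravel()
        m_t[order[:n_keep]] = 1                 # M_t = M + M_t^c
        masks.append(m_t.reshape(mask.shape))
    return masks

def degrade(k, masks, t):
    """x_t = F^{-1}(M_t * k), with an uncentered orthonormal inverse FFT."""
    return np.fft.ifft2(masks[t] * k, norm="ortho")

rng = np.random.default_rng(0)
mask = (rng.random((16, 16)) < 0.25).astype(float)   # toy binary mask
T = 8
masks = mask_schedule(mask, T, rng)

x0 = rng.standard_normal((16, 16))
k = np.fft.fft2(x0, norm="ortho")
x_T = degrade(k, masks, T)                            # zero-filled image
```

By construction, `masks[0]` is the all-ones mask, so `degrade(k, masks, 0)` returns the fully sampled image, and `masks[T]` equals the acquisition mask.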

Fig. 2. K-space cold diffusion degradation process: (a) Cartesian sampling mask, (b) Gaussian sampling mask.

Following the cold diffusion model, we trained a U-Net34 for image restoration. It has four depth levels, with 64 channels in the first level and the channel number doubling at each depth. We performed experiments on a large-scale open-source MRI dataset: fastMRI29. All experiments were performed with the single-coil knee dataset. Our k-space cold diffusion model was trained with k-space data computed from \(320\times 320\) complex images. For both mask types, we performed 4-fold and 8-fold accelerated reconstruction, with the central fraction set to 0.08 for 4-fold and 0.04 for 8-fold, as suggested by the fastMRI work. We trained our k-space cold diffusion model by minimizing the L1 loss in Eq. (3). The network was trained to predict the fully sampled reconstruction \({\widehat{\text{x}}}_{0}\) given a k-space degraded image. All models were trained for 700,000 iterations using the Adam optimizer35, with a batch size of 6 and a learning rate of \(2\times 10^{-5}\).

To further demonstrate the effectiveness of our model, we compared it with multiple baseline deep learning-based MRI reconstruction models: U-Net, W-Net5 and the end-to-end variational network (E2E-VarNet)7. Note that although our model performs image degradation in k-space, the network itself works in image space, like the U-Net, whereas W-Net and E2E-VarNet work in k-space. All models were set to the same size for a fair comparison. We report the peak signal-to-noise ratio (PSNR) and the structural similarity index (SSIM)29 as our performance metrics, since the baseline models use them for evaluation and comparison.
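For reference, PSNR can be computed as in the minimal sketch below. The function name and the default choice of the data range (the maximum of the target image) are assumptions for illustration; the exact evaluation code used in this work is not reproduced here, and SSIM is omitted because it requires a windowed implementation. Real-valued magnitude images are assumed.

```python
import numpy as np

def psnr(target, recon, data_range=None):
    """Peak signal-to-noise ratio in dB between two real-valued images."""
    if data_range is None:
        data_range = target.max()     # assumed convention for the peak value
    mse = np.mean((target - recon) ** 2)
    return 10.0 * np.log10(data_range ** 2 / mse)

target = np.linspace(0.0, 1.0, 16).reshape(4, 4)
recon = target + 0.1                  # constant error of 0.1 on every pixel
value = psnr(target, recon)
```

In this toy case the mean squared error is 0.01 and the peak value is 1, so the PSNR is 20 dB. The function is undefined for identical images, where the mean squared error is zero.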

Results

All the experiments are performed with the large-scale open-source MRI dataset: fastMRI. We used the full single-coil knee dataset as our training set. For testing, we randomly selected 6 PD and PDFS volumes from the validation set.

Figure 3 shows reconstruction results for 4-fold and 8-fold Cartesian under-sampling masks. We compare our k-space cold diffusion model with U-Net, W-Net and E2E-VarNet. The corresponding error maps are shown under each reconstructed image, magnified five times for better visibility. We find that our k-space cold diffusion model preserves more image details and yields cleaner error maps than the other models. Table 1 provides evaluation metrics for the 4-fold and 8-fold Cartesian under-sampling reconstructions, showing that our model outperforms the others with a PSNR/SSIM of 30.58/0.7150 for 4-fold and 29.51/0.6414 for 8-fold. In Fig. 4 and Table 2, we present reconstruction results and evaluation metrics for Gaussian sampling masks with 4-fold and 8-fold acceleration. Our method outperforms the others with a PSNR/SSIM of 30.31/0.7059 and 29.59/0.6416 for 4-fold and 8-fold, respectively. As in the Cartesian sampling reconstruction tasks, we found that our model preserves more image details and textures and yields cleaner error maps.

Fig. 3. Reconstruction results for 4-fold and 8-fold accelerations with Cartesian sampling mask. (a) fully sampled target image, (b) zero-filled images and sampling masks, (c) U-Net reconstructions, (d) W-Net reconstructions, (e) E2E-VarNet reconstructions, (f) our proposed k-space cold diffusion reconstructions. Corresponding error maps are shown together with their reconstructions. The numbers on the lower left of each image indicate PSNR and SSIM, respectively.

Fig. 4. Reconstruction results for 4-fold and 8-fold accelerations with Gaussian sampling mask. (a) fully sampled target image, (b) zero-filled images and sampling masks, (c) U-Net reconstructions, (d) W-Net reconstructions, (e) E2E-VarNet reconstructions, (f) our proposed k-space cold diffusion reconstructions. Corresponding error maps are shown together with their reconstructions. The numbers on the lower left of each image indicate PSNR and SSIM, respectively.

Table 1 Evaluation metrics for 4-fold and 8-fold 1D Cartesian under-sampling reconstructions.
Table 2 Evaluation metrics for 4-fold and 8-fold 2D Gaussian under-sampling reconstructions.

To explore the effect of the total number of sampling time steps on the final reconstructions, we performed tests using Cartesian sampling masks with the total time step \(T\) set to 125, 250 and 1000. The corresponding evaluation metrics are shown in Table 3. We found that the PSNR and SSIM are close to each other across these settings. Note that in this Cartesian sampling case the k-space resolution is \(320\times 320\), so for larger \(T\) several neighboring time steps may map to the same sampling proportion, and the final performance may therefore stay the same. For higher resolutions or heavier down-sampling ratios, a larger \(T\) can still be beneficial. In our settings, we use \(T=125\) for all tests, as it requires fewer steps and gives faster reconstruction.
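The saturation effect described above can be illustrated with a short count. The sketch assumes, hypothetically, a column-wise Cartesian schedule in which a 4-fold mask keeps exactly 80 of the 320 columns, so that 240 complement columns are released linearly and the number released at step \(t\) is rounded to the nearest integer; the actual masks may differ.

```python
def distinct_mask_sizes(n_complement, T):
    """Count distinct values of round(n_complement * (T - t) / T), t = 0..T."""
    return len({round(n_complement * (T - t) / T) for t in range(T + 1)})

n_complement = 320 - 80   # hypothetical: 4-fold mask keeping 80 of 320 columns
counts = {T: distinct_mask_sizes(n_complement, T) for T in (125, 250, 1000)}
for T, n in counts.items():
    print(T, n)
```

Under these assumptions there can never be more than 241 distinct mask sizes (0 to 240 released columns), so once \(T\) exceeds the number of complement columns, additional time steps simply repeat masks that already exist in the schedule.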

Table 3 Evaluation metrics for different time step settings. We performed tests using Cartesian sampling masks with the total time step \(\:T\) set to 125, 250 and 1000.

To further study the generalization capabilities of our k-space cold diffusion model, we performed zero-shot transfer learning tasks using equispaced Cartesian sampling, 1D Gaussian sampling, and 2D Gaussian sampling masks. In each setup, we utilized the models originally trained with Cartesian sampling masks, as demonstrated in Table 1. All models were evaluated directly with the corresponding sampling masks without any fine-tuning. Reconstruction comparison results are illustrated in Table 4. Our model achieves superior performance with a PSNR/SSIM of 30.54/0.7139 for 4-fold and 29.56/0.6440 for 8-fold equispaced Cartesian sampling masks. In the 1D Gaussian sampling zero-shot reconstruction test, our model outperforms others with a PSNR/SSIM of 30.34/0.7090 and 29.46/0.6464 for 4-fold and 8-fold, respectively. Similarly, in the 2D Gaussian sampling zero-shot test, our model maintains superior and stable performance with a PSNR/SSIM of 30.04/0.6992 for 4-fold and 29.31/0.6341 for 8-fold. Additionally, we conducted a 4-fold reconstruction test using 8-fold pre-trained models, as demonstrated in Table 4. Notably, our k-space cold diffusion model achieves a PSNR/SSIM of 30.50/0.7138 for equispaced Cartesian sampling, 30.17/0.7061 for 1D Gaussian sampling, and 30.04/0.6999 for 2D Gaussian sampling, which is comparable to the performance of the 4-fold model. Interestingly, the performance of other models significantly drops for 2D Gaussian sampling due to the difference in sampling strategies between 2D and 1D. This demonstrates that our model benefits from the k-space cold diffusion process, which is naturally and inherently conditioned on the sampling mask, allowing the model to be applied in a broader scope.

Table 4 Zero-shot learning evaluation metrics for equispaced Cartesian under-sampling, 1D Gaussian under-sampling, and 2D Gaussian under-sampling reconstructions.

As image generation models, diffusion-based methods can output multiple reconstructions from a given starting point. The original noise-based diffusion models can generate multiple images from noise, and later conditional diffusion models can draw multiple samples from the same starting point. Here we also explore the effect of multiple samplings with our model. Examples are shown in Fig. 5. In the multiple-sampling settings, we take the pixel-wise average over the sampled images as the final output image. The uncertainty maps are calculated as the pixel-wise standard deviation over all samples. As the acceleration factor increases, the corresponding uncertainty increases as well. The corresponding PSNR/SSIM values are labelled on each image. We found that the improvement in the metrics is noticeable when 5–10 samples are used, and that it levels off as the number of samples increases further. Note that more samples lead to a longer reconstruction time, and averaging over multiple samples can smooth the final reconstruction, which degrades image details. Taking these into consideration, we use a single sample in our reconstructions.
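The averaging and uncertainty computation described above amounts to a pixel-wise mean and standard deviation over the stack of samples. The sketch below illustrates this; the function name is hypothetical and the population standard deviation (`ddof=0`, NumPy's default) is an assumption, since the text does not specify the estimator.

```python
import numpy as np

def aggregate_samples(samples):
    """Pixel-wise mean (final image) and standard deviation (uncertainty map)."""
    stack = np.stack(samples, axis=0)     # shape: (n_samples, H, W)
    return stack.mean(axis=0), stack.std(axis=0)

# Toy example with two "reconstructions" of a 4 x 4 image.
samples = [np.zeros((4, 4)), 2.0 * np.ones((4, 4))]
mean_image, uncertainty = aggregate_samples(samples)
```

In this toy example every pixel of `mean_image` is 1 and every pixel of `uncertainty` is 1, since the two samples differ by 2 everywhere.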

Fig. 5. Reconstructions with different sampling numbers. (a) Fully sampled image, (b) zero-filled images and sampling masks, (c–g) reconstructions using 1, 5, 10, 20 and 40 samples. Images are generated by taking the average of multiple samples. Corresponding uncertainty maps are shown under each image. The numbers on the lower left of each image indicate PSNR and SSIM, respectively.

Conclusion

Reconstruction from under-sampled data is an essential topic in imaging. Diffusion-based models have shown superior performance in multiple fields, including image generation, super-resolution, in-painting, editing, and text-to-image generation. Moreover, cold diffusion provides a framework for generalized diffusion models: the image degradation is no longer limited to Gaussian noise but can be any general image degradation operation, such as blurring, inpainting, down-scaling, or snowification. In this work, we present a novel k-space cold diffusion framework for accelerated MRI reconstruction. Instead of performing image degradation in image space, k-space cold diffusion degrades the image by progressively increasing the down-sampling ratio in k-space and uses a deep neural network to learn the restoration. This k-space degradation can accommodate different sampling masks. Furthermore, by embedding the k-space sampling process directly into the image degradation process, k-space cold diffusion is inherently conditioned on the sampling mask. A deep neural network is trained to reverse this process, thereby enhancing the model’s ability to generalize. Although we present our work on the MRI acceleration task, we note that this method can be applied to other medical imaging problems such as sparse-view CT reconstruction.

We tested our k-space cold diffusion method on a large-scale open-source dataset with different acceleration factors and sampling masks. Our results show that our method outperforms other typical reconstruction networks that work in image space, in k-space, or with an unrolled structure. We conducted generalizability tests for zero-shot transfer learning using unseen masks. Remarkably, our model demonstrated superior performance and maintained similar levels of accuracy in 4-fold reconstructions, even when using 8-fold pre-trained models. Furthermore, we explored the effects of the total number of time steps and the number of samples, and showed that the method is capable of generating good results even with a small total time step and a single sample.

There remains much potential for future work. First, the current k-space cold diffusion model uses linear scheduling for the k-space degradation, following the original cold diffusion work. However, lower-frequency regions in k-space correspond to the overall image content in image space, while higher-frequency regions correspond to image details. The degradation process could therefore be adapted to account for this difference. Second, the k-space degradation process could be further pixelized to enable a more refined restoration process. This can be especially beneficial if the acceleration factor is large and the sampling mask is unknown. Another interesting direction is latent-space cold diffusion for accelerated image reconstruction, which could further improve the efficiency of image restoration and enable prompt design for more general reconstruction tasks.