A generalized dual-domain generative framework with hierarchical consistency for medical image reconstruction and synthesis

Zhang, Jiadong; Sun, Kaicong; Yang, Junwei; Hu, Yan; Gu, Yuning; Cui, Zhiming; Zong, Xiaopeng; Gao, Fei; Shen, Dinggang

doi:10.1038/s44172-023-00121-z

Published: 12 October 2023

Jiadong Zhang (ORCID: 0000-0001-6305-4928), Kaicong Sun, Junwei Yang, Yan Hu, Yuning Gu, Zhiming Cui, Xiaopeng Zong, Fei Gao (ORCID: 0000-0001-7524-1499) & Dinggang Shen

Communications Engineering volume 2, Article number: 72 (2023)

Abstract

Medical image reconstruction and synthesis are critical for imaging quality, disease diagnosis and treatment. Most existing generative models ignore the fact that medical imaging data are usually acquired in an acquisition domain that is different from, but associated with, the image domain. Such methods exploit either single-domain or dual-domain information and suffer from inefficient information coupling across domains. Moreover, these models are usually task-specific and not general enough to handle different tasks. Here we present a generalized dual-domain generative framework that strengthens the connections within and across domains through elaborately designed hierarchical consistency constraints. A multi-stage learning strategy is proposed to construct these hierarchical constraints effectively and stably. We conducted experiments on representative generative tasks, including low-dose PET/CT reconstruction, CT metal artifact reduction, fast MRI reconstruction, and PET/CT synthesis. All these tasks share the same framework and achieve better performance than the compared methods, which validates the effectiveness of our framework. This technology is expected to be applied in clinical imaging to increase diagnostic efficiency and accuracy.

Introduction

Medical imaging plays a critical role in many clinical applications, such as screening, disease diagnosis, and treatment planning. Improving image quality has been a central topic in medical image processing for decades¹. Unlike natural images, which are usually captured directly in the image domain, medical imaging usually acquires data in a modality-specific domain, such as the k-space domain for magnetic resonance imaging (MRI)², and the sinogram domain for computed tomography (CT)³ and positron emission tomography (PET)⁴. To take advantage of this acquisition property, one should exploit the underlying information patterns in both the acquisition domain and the image domain. Moreover, one can apply more sophisticated constraints, such as cycle consistency within and across domains, to better regularize the solution space of generative tasks.

Medical image reconstruction and synthesis are typical generative tasks in medical imaging and can strongly benefit from the aforementioned dual-domain cycle-consistency scheme. Medical image reconstruction is one of the pillars of medical imaging. Reconstruction tasks can usually be categorized into two subgroups⁵: (1) reconstruction in the form of a forward/backward transform, such as low-dose CT^6,7,8 and fast MRI reconstruction^9,10,11; (2) reconstruction as post-processing to improve image quality, such as metal artifact reduction (MAR)^12,13,14 and super-resolution (SR)^15,16,17. Most recent dual-domain medical image reconstruction works exploit dual-domain information through individual sub-networks, which are connected either as parallel branches^18,19 or sequentially in a cascaded manner^20,21,22, and such models can further serve diagnostic tasks^23,24. The backbone of the sub-networks can be UNet-like^19,21,22,25, a Transformer²⁶, or a recently emerged diffusion model²⁷. In particular, Jun et al.¹⁹ utilize two parallel UNet-shaped networks serving as regularizers in the k-space and image domains, respectively, for MRI reconstruction. The authors of ref. 21 adopt sequentially cascaded sinogram and image networks for simultaneous metal artifact reduction and low-dose CT reconstruction. Although these existing reconstruction methods^20,28,29,30,31 have taken dual-domain information into account for better data consistency and overall performance, to the best of our knowledge, these networks are task-specific, and there is limited work that uses a generalized framework for dual-domain reconstruction. More importantly, no study has yet considered structured consistency constraints that cover multi-level constraints within and across domains for better regularization of the solution space.

Different from image reconstruction, medical image synthesis aims to infer a desired imaging modality without an actual scan, for example, synthesizing imaging modalities that are usually unavailable in routine clinical practice due to cost. Furthermore, since different modalities can reveal complementary physical characteristics of the underlying tissues, synthesizing missing or key modalities can lead to more accurate diagnosis and treatment planning^32,33. Medical image synthesis can be categorized into inter-modality synthesis and intra-modality synthesis³². Inter-modality synthesis denotes image synthesis between two imaging modalities, such as from PET to CT³⁴, while intra-modality synthesis refers to tasks such as transferring between different MRI sequences. Most existing learning-based models for medical image synthesis adopt VAE³⁵- or GAN³⁶-based network architectures and their variations^37,38,39. By resorting to adversarial learning, they obtain more plausible and realistic-looking images. For example, Dong et al.⁴⁰ propose a framework based on cycle-consistent generative adversarial networks (CycleGAN) to synthesize CT images from non-attenuation-corrected PET. Another method, MedGAN³⁸, is a non-application-specific framework that merges the adversarial framework with several feature-level similarity metrics to facilitate similarity matching. However, these representative methods, following work on natural images, purely manipulate information in the image domain, without considering the inherent differences in image acquisition between natural and medical images. Moreover, ignoring dual-domain cycle consistency limits their performance in medical image synthesis.

To cope with the above-mentioned issues, we present a generalized learning-based dual-domain framework for generative tasks of medical images, which employs hierarchical consistency constraints, including all possible directional constraints within and across domains, by means of a multi-stage learning strategy. Specifically, unlike the majority of existing generative models, such as CycleGAN⁴⁰, which operate in a single domain, we propose to exploit the underlying patterns in both domains. Moreover, we aim to build multi-level similarity constraints between the source and target images in dual domains to better regularize the solution space. As shown in Fig. 1a and b, without loss of generality, for reconstruction or synthesis tasks in medical imaging there exist two domains, namely the image domain and the acquisition domain. The source and target images can be transformed bidirectionally in each domain by their individual generative functions G. Depending on the application, G can be realized by learning-based or model-based functions. Cross-domain information is exchanged by means of the transform function F and the inverse transform function F⁻¹. The framework can be applied to various tasks, as shown in Fig. 1c. Inspired by CycleGAN⁴⁰, we introduce a multi-stage learning strategy S1–S3, whose stages account for intra-domain consistency, inter-domain consistency, and cycle consistency, respectively, as demonstrated in Fig. 1d. These hierarchical intra- and inter-domain consistency constraints construct and consolidate a comprehensive multi-level similarity match between the source and target images. In particular, the intra-domain consistency stage S1 imposes the primary consistency constraint within each individual domain. To preserve inter-domain consistency, in stage S2 we introduce a sequential constraint, marked by black arrows. Lastly, to further strengthen the similarity match, in stage S3 we adopt cycle-consistency constraints within and across domains, depicted by yellow and green arrows. For a detailed description, please refer to the “Methods” section. We have conducted quantitative and qualitative analyses of the proposed framework from different aspects. Our framework achieves consistent performance gains in many representative generative tasks in medical imaging, including low-dose PET and CT reconstruction, MAR, fast MRI reconstruction, and PET-CT synthesis.
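
The stage structure described above can be illustrated with a minimal sketch. It assumes one plausible reading of Fig. 1d, since the exact loss terms are defined in the “Methods” section and may differ: S1 compares each generator output with its counterpart in the same domain, S2 routes generation through the acquisition domain and maps the result back with F⁻¹, and S3 adds cycle terms within and across domains. The generator dictionary `G`, the L1 distance, the function names, and the omission of adversarial terms are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def l1(a, b):
    """Mean absolute error; np.abs also handles complex k-space data."""
    return float(np.mean(np.abs(a - b)))


def hierarchical_loss(x_s, x_t, G, F, F_inv, stage):
    """Hypothetical stage-wise loss for a dual-domain generative framework.

    x_s, x_t : paired source and target images (image domain I).
    G        : dict of generators (illustrative naming); G["t", "I"] maps
               source -> target in the image domain, G["s", "A"] maps
               target -> source in the acquisition domain A, and so on.
    F, F_inv : transform to and from the acquisition domain.
    stage    : 1, 2 or 3; each stage keeps the constraints of earlier stages.
    """
    a_s, a_t = F(x_s), F(x_t)  # acquisition-domain counterparts

    # S1: intra-domain consistency in both directions and both domains.
    loss = (l1(G["t", "I"](x_s), x_t) + l1(G["s", "I"](x_t), x_s)
            + l1(G["t", "A"](a_s), a_t) + l1(G["s", "A"](a_t), a_s))

    if stage >= 2:
        # S2 (assumed form): generate in A, then map back to I with F_inv.
        loss += l1(F_inv(G["t", "A"](a_s)), x_t)
        loss += l1(F_inv(G["s", "A"](a_t)), x_s)

    if stage >= 3:
        # S3 (assumed form): cycle within I, and a cycle that crosses into A.
        loss += l1(G["s", "I"](G["t", "I"](x_s)), x_s)
        loss += l1(G["s", "A"](F(G["t", "I"](x_s))), a_s)

    return loss
```

With identity generators and identical source and target images every term vanishes; replacing the entries of `G` with trained networks and `F` with a Radon or Fourier transform gives a PET/CT-like or MRI-like setting, respectively.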

**Fig. 1: Overview of our proposed dual-domain generative framework with hierarchical consistency for medical image reconstruction and synthesis.**

Results

To validate the effectiveness and generalizability of our generative framework, we carried out extensive quantitative and qualitative experiments on representative reconstruction and synthesis tasks in medical imaging, including low-dose PET/CT reconstruction, CT metal artifact reduction, accelerated MRI reconstruction, and PET-CT synthesis. Figs. 2 and 3 demonstrate the performance of our framework on these generative tasks for different cases. In Figs. 4 and 5, we perform an in-depth analysis of our framework using the noise power spectrum (NPS), calculated according to the work of Dobbins III et al.⁴¹. The investigation includes quantitative and qualitative evaluation, effectiveness analysis, and ablation studies. For quantitative assessment, we employ the structural similarity index measure (SSIM), peak signal-to-noise ratio (PSNR), normalized root-mean-square error (NRMSE), NPS, and standardized uptake values (SUV).
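
For reference, PSNR and NRMSE can be computed as in the following sketch. This section does not state the data range used for PSNR or the normalization used for NRMSE, so the sketch assumes a user-supplied `data_range` and normalization by the Euclidean norm of the reference; the function names are illustrative. SSIM is omitted; scikit-image's `skimage.metrics.structural_similarity` is one widely used implementation.

```python
import numpy as np


def psnr(reference, test, data_range):
    """Peak signal-to-noise ratio in dB: 10 * log10(data_range^2 / MSE)."""
    reference = np.asarray(reference, dtype=float)
    test = np.asarray(test, dtype=float)
    mse = np.mean((reference - test) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(data_range ** 2 / mse))


def nrmse(reference, test):
    """RMSE normalized by the RMS of the reference (Euclidean normalization)."""
    reference = np.asarray(reference, dtype=float)
    test = np.asarray(test, dtype=float)
    return float(np.linalg.norm(reference - test) / np.linalg.norm(reference))
```

For example, a constant error of 0.1 on an image of ones with `data_range=1.0` gives an MSE of 0.01, i.e. a PSNR of about 20 dB and an NRMSE of about 0.1.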

**Fig. 2: Application of the proposed framework on different reconstruction tasks.**

**Fig. 3: Visual comparison of synthesized positron emission tomography (PET) and computed tomography (CT) images for six typical cases by different methods.**

Low-dose PET/CT reconstruction

Both CT and PET are imaging techniques based on ionizing radiation. To reduce the risk of radiation exposure in clinical applications, low-dose CT and low-dose PET are often preferred. Compared with standard-dose CT and PET, low-dose reconstructions usually suffer from severe artifacts and noise, which may greatly affect a physician’s diagnosis^42,43. Hence, the development of effective reconstruction algorithms for low-dose PET/CT is a critical task in urgent demand in clinical practice. In this experiment, we evaluate the effectiveness of the proposed framework on low-dose PET/CT reconstruction, where the Radon transform is used as the transform function F.
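
To make the role of F concrete, the sketch below implements a simple parallel-beam Radon transform using nearest-neighbour ray sums and simulates a low-dose sinogram by adding Poisson noise to the transmitted photon counts, a common CT noise model. The geometry, unit pixel size, incident count `i0`, and function names are assumptions; the simulation procedure used in this work is not detailed here and may differ, and the sketch models neither scatter, electronic noise, nor PET count statistics.

```python
import numpy as np


def radon(image, angles_deg):
    """Parallel-beam Radon transform of a square image via nearest-neighbour ray sums."""
    n = image.shape[0]
    c = (n - 1) / 2.0
    t = np.arange(n) - c  # detector coordinate
    s = np.arange(n) - c  # coordinate along each ray
    T, S = np.meshgrid(t, s, indexing="ij")
    sino = np.zeros((len(angles_deg), n))
    for k, theta in enumerate(np.deg2rad(angles_deg)):
        # Rotate the (detector, ray) grid into image pixel coordinates.
        x = T * np.cos(theta) - S * np.sin(theta) + c
        y = T * np.sin(theta) + S * np.cos(theta) + c
        xi, yi = np.rint(x).astype(int), np.rint(y).astype(int)
        inside = (xi >= 0) & (xi < n) & (yi >= 0) & (yi < n)
        samples = np.zeros_like(x)
        samples[inside] = image[yi[inside], xi[inside]]
        sino[k] = samples.sum(axis=1)  # integrate along each ray
    return sino


def simulate_low_dose(sino, i0, rng):
    """Poisson noise on transmitted counts for an assumed incident count i0."""
    counts = rng.poisson(i0 * np.exp(-sino))
    counts = np.maximum(counts, 1)  # avoid log(0) for photon-starved rays
    return -np.log(counts / i0)
```

Lowering `i0` increases the relative noise in the simulated line integrals, which mimics a reduced tube current.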

Performance evaluation

To evaluate the performance of our framework for low-dose PET/CT reconstruction, we compare the proposed reconstruction model with several representative learning-based reconstruction methods, including UNet⁴⁴, RUNet⁴⁵, p2pGAN⁴⁶, CycleGAN⁴⁰, and MedGAN³⁸, as well as a recent dual-domain reconstruction method, iBP-Net⁴⁷. We use the acquired low-dose images as source data and the standard-dose images as target data. The experimental results are summarized in Table 1. p2pGAN and MedGAN achieve better performance than RUNet by means of adversarial learning, and CycleGAN slightly outperforms p2pGAN and MedGAN by using cycle consistency. However, these methods only use image-domain information without considering sinogram patterns. By combining the advantages of both domains, iBP-Net achieves better reconstruction performance than the image-domain-only comparison methods. However, iBP-Net reconstructs low-dose images using cascaded dual-domain networks, which cannot explicitly guarantee dual-domain consistency. In contrast, our method exploits dual domains with multi-level consistency constraints and achieves the best performance in terms of both SSIM and PSNR. For a clinical evaluation of the reconstructed PET uptake, we calculate the SUV bias (SUV_mean and SUV_max) with respect to the ground-truth standard-dose PET uptake and report the results in Table 1, where our method again outperforms the comparison methods.
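
The SUV bias can be sketched as follows. This section does not define the bias formula or the region of interest, so the sketch assumes body-weight SUV (decay correction already applied) and the relative absolute difference of SUV_mean and SUV_max inside a boolean ROI; `to_suv` and `suv_bias` are hypothetical helper names.

```python
import numpy as np


def to_suv(activity_bq_per_ml, injected_dose_bq, body_weight_g):
    """Body-weight SUV; assumes activity is already decay-corrected."""
    return np.asarray(activity_bq_per_ml, dtype=float) / (injected_dose_bq / body_weight_g)


def suv_bias(pred_suv, gt_suv, roi):
    """Relative absolute bias of SUV_mean and SUV_max inside a boolean ROI."""
    p, g = pred_suv[roi], gt_suv[roi]
    bias_mean = abs(p.mean() - g.mean()) / g.mean()
    bias_max = abs(p.max() - g.max()) / g.max()
    return float(bias_mean), float(bias_max)
```

A reconstruction that uniformly overestimates uptake by 10% yields a bias of about 0.1 for both SUV_mean and SUV_max.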

Table 1 Quantitative comparison with representative learning-based methods for low-dose PET/CT reconstruction.

Effectiveness of dual-domain hierarchical consistency in low-dose PET/CT reconstruction

The proposed generative framework contains multi-stage consistency constraints, consisting of intra-domain consistency (S1), inter-domain consistency (S2), and cycle consistency (S3), which build up a similarity match in a hierarchical manner between the source and target images within and across domains in both directions, i.e., from target to source and from source to target. In this experiment, we analyze in depth the effectiveness of the introduced dual-domain and hierarchical consistency scheme for low-dose PET/CT reconstruction. In particular, we evaluate the impact of each stage from S1 to S3 through an ablation study on both reconstruction tasks. We also conduct an ablation study comparing single-domain and dual-domain schemes.
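
The ablation rows S1, S1 + S2, and S1 + S2 + S3 can be thought of as capping how many stages are activated during training. A minimal sketch of such a schedule follows; the epoch boundaries `s2_start` and `s3_start`, the `max_stage` cap, and the function name are hypothetical, since this section does not state when each stage begins.

```python
# Hypothetical mapping from the highest active stage to the ablation row label.
STAGE_NAMES = {1: "S1", 2: "S1+S2", 3: "S1+S2+S3"}


def active_stage(epoch, s2_start, s3_start, max_stage=3):
    """Return the highest active stage (1-3) at a given epoch.

    Stages are integrated stepwise: S2 is added at s2_start and S3 at
    s3_start. max_stage caps the schedule, e.g. max_stage=2 reproduces
    an S1 + S2 ablation run.
    """
    stage = 1
    if epoch >= s2_start:
        stage = 2
    if epoch >= s3_start:
        stage = 3
    return min(stage, max_stage)
```

The returned value could be passed as the `stage` argument of a stage-wise loss, so that a single training loop serves both the full model and its ablations.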

Quantitative results are summarized in Table 2. The consistency constraints involved in stages S1 to S3 are shown in Fig. 1. The table shows a noticeable performance improvement for both tasks when the hierarchical consistency constraints are introduced by the multi-stage learning strategy. To evaluate the importance of the dual-domain scheme in the proposed generative framework, we carried out experiments on the PET/CT reconstruction tasks. As expected, Table 2 shows that the dual-domain scheme achieves better performance than the single-domain scheme. The dual-domain scheme allows the generative framework to better exploit the latent patterns in a complementary way. Moreover, it enables the integration of hierarchical consistency constraints into the framework.

Table 2 Ablation study of the proposed framework for low-dose PET/CT reconstruction including multi-stage and dual-domain schemes.

Metal artifact reduction

Metal artifacts caused by metal implants such as dental fillings can generate abnormal streaks across images and severely impede the detection and diagnosis of disease. Metal artifact reduction (MAR) has therefore been of great importance in clinical practice for decades^48,49,50,51. In this experiment, we evaluate the effectiveness of the proposed framework on MAR in CT images; accordingly, the Radon transform is used as the transform between domains.

Performance evaluation

To demonstrate the advantages of our framework for MAR, we compare it with representative MAR methods, including the conventional linear interpolation (LI)⁴⁸ and normalized metal artifact reduction (NMAR)⁴⁹ methods, and the learning-based RCN⁵⁰, CycleGAN³⁹, attention-based MAR⁵¹ (AttenMAR), and DudoNet²⁸, on two datasets (details are provided in the “Methods” section). It is worth noting that our MAR model does not depend on complex pre-processing steps, such as pre-segmentation of the implant, and it is easy to train and re-implement. Since metal-free images are available as references, we adopt the conventional metrics PSNR and SSIM for quantitative assessment. From the experimental results in Table 3, we observe that AttenMAR and the proposed model achieve much better average SSIM and PSNR than the other methods on both datasets. Compared with AttenMAR, the performance gain of our model comes mainly from the use of hierarchical consistency within and across domains.
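
As an illustration of the conventional LI baseline, the sketch below replaces sinogram bins inside the metal trace by linear interpolation along the detector direction, view by view. It assumes a sinogram laid out as (views, detector bins) and a precomputed boolean metal trace; in practice the trace is commonly obtained by segmenting metal in the image and forward projecting the mask. The function name is illustrative, and this is a simplified version of LI, not the exact variant used in the comparison.

```python
import numpy as np


def linear_interpolation_mar(sino, metal_trace):
    """Replace metal-affected detector bins by linear interpolation per view.

    sino        : array of shape (n_views, n_bins).
    metal_trace : boolean array of the same shape, True where rays hit metal.
    """
    corrected = np.array(sino, dtype=float, copy=True)
    bins = np.arange(corrected.shape[1])
    for k in range(corrected.shape[0]):
        bad = metal_trace[k]
        # Interpolate only when the view has both corrupted and clean bins.
        if bad.any() and (~bad).any():
            corrected[k, bad] = np.interp(bins[bad], bins[~bad], corrected[k, ~bad])
    return corrected
```

The corrected sinogram would then be reconstructed (for example, by filtered back-projection) to obtain the LI-corrected image; the interpolation removes metal streaks but can introduce secondary artifacts, which motivated NMAR and learning-based methods.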

Table 3 Quantitative comparison with representative methods for MAR on both in-house teeth CBCT dataset and public DeepLesion dataset.

Effectiveness of dual-domain hierarchical consistency in MAR

We have shown the effectiveness of hierarchical consistency and the dual-domain scheme on low-dose PET/CT reconstruction in the previous section. In this additional experiment, we validate the influence of hierarchical consistency within and across domains on MAR. We perform an ablation study and report the results in Table 4, which shows the impact of the hierarchical consistency constraints from S1 to S3 on both datasets. The multi-stage scheme not only improves the mean of the assessment metrics but also reduces their standard deviation, which indicates better robustness across test samples. In addition, the dual-domain scheme outperforms the single-domain scheme in terms of SSIM, PSNR, and NRMSE.

Table 4 Ablation study of the proposed framework for MAR on both in-house teeth CBCT dataset and public DeepLesion dataset.

Fast MRI reconstruction

MRI is a commonly used non-invasive and radiation-free medical imaging technique. Despite its advantages of high spatial resolution and multiple contrasts, a major limitation of MRI is its slow acquisition speed, since repeated radiofrequency (RF) excitations are required to fill k-space with spatial-frequency information, and each contrast has to be scanned separately. Consequently, such lengthy acquisitions can lead to patient discomfort and severe motion artifacts in the acquired images⁵². They also reduce scanner availability. Therefore, the development of fast MRI reconstruction algorithms can greatly improve the efficiency of data acquisition and subsequent diagnosis. In this experiment, we validate our framework for fast MRI reconstruction and use the Fourier transform as the transform function F.
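
The relation between acceleration rate and k-space subsampling can be sketched as follows. The sampling pattern used in the experiments is not specified in this section, so the sketch assumes a random Cartesian 1D column mask with a fully sampled low-frequency band (in the style of the fastMRI masks), centred k-space, and single-coil magnitude images, whereas the in-house data are multi-coil. Function names and the centre fractions in the example are hypothetical.

```python
import numpy as np


def cartesian_mask(n_cols, acceleration, center_fraction, rng):
    """1D column mask: fully sampled centre band plus random outer lines."""
    n_center = int(round(n_cols * center_fraction))
    mask = np.zeros(n_cols, dtype=bool)
    start = (n_cols - n_center) // 2
    mask[start:start + n_center] = True  # low frequencies always sampled
    n_total = n_cols // acceleration      # lines kept for this acceleration
    remaining = np.flatnonzero(~mask)
    n_extra = max(n_total - n_center, 0)
    mask[rng.choice(remaining, size=n_extra, replace=False)] = True
    return mask


def zero_filled_recon(image, mask):
    """Undersample centred k-space along columns and reconstruct by inverse FFT."""
    kspace = np.fft.fftshift(np.fft.fft2(image))
    kspace_us = kspace * mask[np.newaxis, :]  # zero out unsampled columns
    recon = np.fft.ifft2(np.fft.ifftshift(kspace_us))
    return kspace_us, np.abs(recon)
```

For example, `cartesian_mask(256, 4, 0.08, rng)` keeps 64 of 256 columns (4×) and `cartesian_mask(256, 8, 0.04, rng)` keeps 32 (8×); the zero-filled image shows the aliasing that the reconstruction network must remove.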

Performance evaluation

To demonstrate the effectiveness of the dual-domain and hierarchical consistency design, we evaluate the performance of our framework using two backbones, namely UNet⁵³ and E2EVarNet⁵⁴, as the generative functions. In particular, UNet acts as $G_{\mathrm{t}}^{I}$ and $G_{\mathrm{t}}^{A}$ in both domains with shared weights, and data-consistency mapping⁵⁵ is utilized as $G_{\mathrm{s}}^{I}$ and $G_{\mathrm{s}}^{A}$. The same setting is used for the E2EVarNet backbone. More sophisticated generative functions could be used in our framework for potentially better performance. We evaluate the proposed reconstruction model on an in-house dataset of T2-weighted (T2w) MR images from 62 subjects. We compare the images reconstructed by our model with those of the corresponding baseline methods at acceleration rates of 4× and 8×. Quantitative evaluation is provided in Table 5. In addition, we compare with the state-of-the-art MRI reconstruction method DuDoRNet²⁰. Owing to hardware restrictions, the number of recurrent blocks is set to 2 for a fair comparison, and the optimization is performed for 1000 epochs. As can be observed, compared with both representative baseline methods, our proposed model provides better performance in all evaluation metrics for both acceleration rates.
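
This section does not spell out what the data-consistency mapping⁵⁵ computes. A common meaning in unrolled MRI reconstruction, assumed here, is the noise-free (hard replacement) step: k-space locations that were actually measured are overwritten with the measured samples, and the remaining locations keep the values predicted from the current image. The sketch below uses centred k-space and a 1D column mask; the function name is hypothetical, and ref. 55 may use a weighted (soft) variant instead.

```python
import numpy as np


def data_consistency(pred_image, measured_kspace, mask):
    """Hard data consistency in centred k-space.

    pred_image      : current image estimate (real or complex).
    measured_kspace : centred k-space, zero where not sampled.
    mask            : boolean column mask, broadcast along rows.
    Returns the corrected k-space and the corresponding complex image.
    """
    pred_k = np.fft.fftshift(np.fft.fft2(pred_image))
    m = np.broadcast_to(mask, pred_k.shape)
    dc_k = np.where(m, measured_kspace, pred_k)  # trust measured samples
    return dc_k, np.fft.ifft2(np.fft.ifftshift(dc_k))
```

Because measured samples are kept exactly, applying this step to the ground-truth image returns it unchanged, while applying it to an all-zero estimate returns the zero-filled data.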

Table 5 Quantitative evaluation of our framework based on two representative backbones UNet and E2EVarNet using the in-house dataset for 4× and 8× acceleration rates.

Effectiveness of dual-domain hierarchical consistency in MRI reconstruction

By taking the native dual-domain representation of MRI data into account, our framework can exploit patterns in both domains and regularize the optimization in a structured manner. To demonstrate the effectiveness of the introduced dual-domain hierarchical consistency constraints, we conduct an ablation study and summarize the quantitative analysis in Table 6. The hierarchical consistency constraints improve MRI reconstruction performance stepwise, which is consistent with the aforementioned results for the PET/CT reconstruction and MAR tasks. In addition, using dual-domain information yields a noticeable improvement in reconstruction performance for both acceleration factors over using single-domain data alone.

Table 6 Ablation study of the proposed framework for MRI reconstruction based on the UNet backbone on the in-house dataset for 4× and 8× acceleration rates.

PET-CT synthesis

Both PET-to-CT and CT-to-PET synthesis have great potential in clinical applications. In routine clinical practice, CT provides anatomical localization for PET. Synthesizing CT from PET can avoid the additional radiation of a CT scan, which is important for reducing radiation dose. Conversely, since PET is more expensive and less readily available than CT, synthesizing PET from CT is also of great practical interest. In this experiment, we use PET-to-CT and CT-to-PET synthesis as case studies to evaluate and analyze the effectiveness of our proposed framework on synthesis tasks.

Performance evaluation

In this experiment, our network has the same structure as the one used for low-dose PET/CT reconstruction. We compare our synthesis model with representative approaches, including UNet⁴⁴, RUNet⁴⁵, p2pGAN⁴⁶, CycleGAN⁴⁰, and MedGAN³⁸, which are widely used for medical image synthesis, especially PET-CT synthesis. Note that only CycleGAN and our framework can jointly learn the two tasks, i.e., PET-to-CT synthesis and CT-to-PET synthesis. The other models perform single-direction synthesis and are thus trained independently for the two tasks. The quantitative results of all studied models are summarized in Table 7. For the PET-to-CT task, our framework achieves an SSIM of up to 0.9843 and outperforms the other methods by a large margin. For the CT-to-PET task, the proposed network also outperforms the other models. To evaluate the SUV bias of the synthesized PET images in the CT-to-PET task, we report the performance of each method in Table 7, which indicates the superiority and feasibility of our framework. To better visualize the performance improvement, we show a perceptual evaluation in Fig. 3, with PET-to-CT results in the left panel and CT-to-PET results in the right panel. For each task, we illustrate six locations in the human brain. In comparison with the other methods, the CT and PET images synthesized by the proposed model are more consistent with the ground-truth (GT) images, which agrees with the above quantitative assessment. Our results indicate that, by resorting to dual-domain cycle consistency, the proposed generative framework achieves promising performance on synthesis tasks. The noise power spectrum comparisons in Fig. 5 provide further evidence for the superior performance of our method: in both PET-to-CT and CT-to-PET synthesis, our method exhibits the lowest noise power across spatial frequencies. This further supports the advantage of our dual-domain and hierarchical consistency learning approach.
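
A common NPS estimator, in the spirit of Dobbins III et al.⁴¹, averages the squared magnitude of the 2D Fourier transform of mean-subtracted noise regions and scales it by pixel area over the number of pixels; a radial average then gives the 1D curves plotted against spatial frequency. The sketch below follows that recipe under stated assumptions: the noise ROIs are given (how they are extracted, e.g. from difference images or uniform regions, is not described here), only the ROI mean is removed rather than a fitted background, and the function names are hypothetical.

```python
import numpy as np


def nps_2d(rois, pixel_size=1.0):
    """2D noise power spectrum averaged over a stack of noise ROIs (n, ny, nx)."""
    rois = np.asarray(rois, dtype=float)
    _, ny, nx = rois.shape
    detrended = rois - rois.mean(axis=(1, 2), keepdims=True)  # remove ROI mean
    spectra = np.fft.fftshift(np.fft.fft2(detrended), axes=(1, 2))
    return (np.abs(spectra) ** 2).mean(axis=0) * pixel_size ** 2 / (nx * ny)


def radial_profile(nps, pixel_size=1.0, n_bins=16):
    """Average a centred 2D NPS over annuli of radial spatial frequency."""
    ny, nx = nps.shape
    fy = np.fft.fftshift(np.fft.fftfreq(ny, d=pixel_size))
    fx = np.fft.fftshift(np.fft.fftfreq(nx, d=pixel_size))
    FX, FY = np.meshgrid(fx, fy)
    fr = np.hypot(FX, FY)
    edges = np.linspace(0.0, fr.max(), n_bins + 1)
    idx = np.clip(np.digitize(fr.ravel(), edges) - 1, 0, n_bins - 1)
    sums = np.bincount(idx, weights=nps.ravel(), minlength=n_bins)
    counts = np.bincount(idx, minlength=n_bins)
    centers = 0.5 * (edges[:-1] + edges[1:])
    return centers, sums / np.maximum(counts, 1)
```

A useful sanity check is Parseval's theorem: integrating the 2D NPS over frequency (summing and multiplying by the frequency-bin area) recovers the pixel variance of the noise ROIs.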

Table 7 Quantitative comparison with other state-of-the-art methods for PET-CT synthesis based on the in-house dataset.

**Fig. 4: The noise power spectrum analysis (NPS) for different reconstruction tasks.**

**Fig. 5: The noise power spectrum (NPS) analysis of the synthesized images by different state-of-the-art methods.**

Effectiveness of dual-domain hierarchical consistency in PET-CT synthesis

To better understand the effectiveness of each stage for synthesis tasks, we perform an ablation study on PET-CT synthesis and provide the experimental results in Table 8. Comparing S1 + S2 against S1 clearly shows the impact of inter-domain consistency on synthesis performance. Integrating cycle consistency (S3) into S1 + S2 brings further improvement for both tasks, which indicates the importance of cycle consistency within and across domains. From this quantitative analysis, we conclude that the multi-stage consistency constraints are well designed for our framework and that each stage provides a different kind of supervision, improving overall performance in a complementary way. We also evaluate the dual-domain scheme for the PET-CT synthesis tasks. In particular, we compare performance with and without sinogram-domain information and list the quantitative results in Table 8. The sinogram-domain information substantially improves the average SSIM, PSNR, and NRMSE for both PET-to-CT and CT-to-PET synthesis.

Table 8 Ablation study of the proposed framework for PET-CT synthesis on in-house dataset.

Discussion

In this paper, we propose a generalized dual-domain framework for medical image reconstruction and synthesis based on hierarchical consistency constraints within and across domains. In particular, the two domains can be interpreted as the image domain and another domain of interest, such as the image acquisition domain. Unlike the conventional CycleGAN framework, where cycle consistency is enforced between the source and target images with an unsupervised learning scheme in a single domain, e.g., the image domain, our dual-domain generative framework adopts the principle of hierarchical consistency in dual domains based on supervised learning. Leveraging dual domains not only coincides with the inherent characteristics of medical imaging but also allows better exploitation of the underlying patterns in both the acquisition and image domains. Without loss of generality, involving four generative functions between the source and target images in dual domains enables bi-directional mappings across images and domains, although for certain tasks some of the generative functions need not be learned. More importantly, unlike most existing dual-domain generative methods, which adopt either sequentially cascaded or parallel-connected sub-networks to process the patterns of each domain, we explicitly impose hierarchical consistency, including intra-domain consistency, inter-domain consistency, and cycle consistency. These hierarchical consistency constraints are integrated stepwise into the three training stages of the generative framework to achieve a multi-level similarity match in a stable and structured way. In extensive experiments on multiple representative generative tasks, the proposed generative framework achieves superior performance compared with the corresponding state-of-the-art methods in different applications. Furthermore, we have performed comprehensive analyses and in-depth ablation studies from different perspectives to evaluate the effectiveness of the dual-domain and hierarchical consistency design in several representative generative tasks.

We first carried out experiments to evaluate our framework for low-dose PET and low-dose CT reconstruction. We collected 70 standard-dose PET volumes with a total of 1540 slices and 8 standard-dose CT volumes with a total of 5326 slices. Following the standard simulation procedure, we generated the corresponding low-dose counterparts. By using hierarchical dual-domain constraints, the proposed reconstruction model achieves substantial performance gains over representative reconstruction methods.

Furthermore, we have evaluated our framework for metal artifact reduction. We conducted experiments on two datasets. The first contains 100 dental cone-beam CT (CBCT) volumes from local hospitals with 5500 selected slices in total, and the second is a subset of the public DeepLesion dataset containing 4118 slices. Compared with competitive approaches such as AttenMAR⁵¹ and RCN⁵⁰, our proposed framework exhibits remarkable improvement in average PSNR and SSIM on both datasets.

To further validate the effectiveness of our proposed framework, we carried out experiments on MRI reconstruction. The development of MRI reconstruction algorithms is of clinical importance, since they can improve image quality by alleviating the severe aliasing caused by k-space subsampling. In this experiment, we employ an in-house dataset consisting of 24-coil T2w brain MR images of 62 subjects. We construct our reconstruction model on two representative backbones, namely UNet⁵³ and E2EVarNet⁵⁴. Experiments show that the proposed framework obtains noticeable performance gains on both backbones for 4× and 8× acceleration rates.

Besides reconstruction tasks, we also evaluate the framework on PET-CT synthesis using an in-house dataset of 65 paired PET/CT brain volumes. In comparison with existing state-of-the-art image synthesis methods such as MedGAN³⁸ and p2pGAN⁴⁶, our framework achieves substantial quantitative improvements, consistent with the qualitative visual comparison.

In summary, the above evaluations of different reconstruction and synthesis tasks show that a UNet-shaped network can serve as a baseline backbone for the individual generative functions G, and that application-specific network structures can further improve model performance. Moreover, although different generative functions can be selected for different applications, all models share the same dual-domain framework with hierarchical consistency constraints and achieve remarkable performance improvements in their respective applications. We therefore conclude that the proposed generative framework is general for medical image reconstruction and synthesis.

Methods