Figure 4

Overview of our end-to-end IMT network for cross-modality generation. The training set is denoted S = {(xi, yi), i = 1, 2, 3, …, n}, where xi is the ith input given-modality image and yi its corresponding target-modality image. Training involves two complementary objectives. On the one hand, given an input image xi and a random noise vector z, the generator G aims to produce translated images \({\hat{y}}_{i}\) that are indistinguishable from the real images yi. On the other hand, the discriminator D learns to distinguish the translated-modality images \({\hat{y}}_{i}\) generated by G from the real images yi; its output is 0 for synthesized images and 1 for real data. In the generation process, translated-modality images are synthesized by the optimized G.
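The adversarial training loop described above can be sketched in PyTorch as follows. This is a minimal illustration, not the paper's implementation: the tiny MLP `Generator` and `Discriminator`, the layer sizes, the noise dimension, and the `train_step` helper are all assumptions chosen for brevity (the actual networks would be convolutional image-to-image models). What it shows is the alternation the caption describes: D is trained to output 1 for real yi and 0 for synthesized \({\hat{y}}_{i}\), while G is trained to make D output 1 on its translations.

```python
import torch
import torch.nn as nn

class Generator(nn.Module):
    """Stand-in for G: maps (given-modality image x, noise z) to y_hat."""
    def __init__(self, img_dim=16, noise_dim=8):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(img_dim + noise_dim, 32),
                                 nn.ReLU(), nn.Linear(32, img_dim))

    def forward(self, x, z):
        return self.net(torch.cat([x, z], dim=1))

class Discriminator(nn.Module):
    """Stand-in for D: scores an image, ~1 for real, ~0 for synthesized."""
    def __init__(self, img_dim=16):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(img_dim, 32),
                                 nn.ReLU(), nn.Linear(32, 1))

    def forward(self, y):
        return torch.sigmoid(self.net(y))

def train_step(G, D, opt_G, opt_D, x, y, noise_dim=8):
    """One adversarial update on a batch of (x, y) image pairs."""
    bce = nn.BCELoss()
    real = torch.ones(x.size(0), 1)
    fake = torch.zeros(x.size(0), 1)
    z = torch.randn(x.size(0), noise_dim)

    # Update D: push D(y) toward 1 (real) and D(y_hat) toward 0 (synthesized).
    y_hat = G(x, z).detach()  # detach so this step does not update G
    loss_D = bce(D(y), real) + bce(D(y_hat), fake)
    opt_D.zero_grad(); loss_D.backward(); opt_D.step()

    # Update G: push D(G(x, z)) toward 1, i.e. fool the discriminator.
    loss_G = bce(D(G(x, z)), real)
    opt_G.zero_grad(); loss_G.backward(); opt_G.step()
    return loss_D.item(), loss_G.item()
```

After training, the generation step in the caption corresponds to simply calling the optimized `G(x, z)` on a new given-modality image, without D.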