Fig. 1
From: A latent diffusion approach to visual attribution in medical imaging

The counterfactual generation pipeline takes the abnormal image \(x^a\) as input. The VAE encoder (\(\epsilon\)) encodes it into the image latents \(Z\), which the diffusion process noises over \(T\) incremental steps to form the noised latents \(Z_T\). The fine-tuned conditional U-Net then denoises these into the conditioned latent \(Z\), which the VAE decoder \(D\) decodes into the final generated counterfactual \(x^n\). A visual attribution map \(M(x^n)\) is then generated from \(x^n\) by subtraction.
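
The data flow in this caption can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the authors' implementation: `encode`, `decode`, `forward_diffuse`, `toy_denoise`, and `generate_attribution` are hypothetical names, the pooling and upsampling functions stand in for a trained VAE, the denoiser is a placeholder for the fine-tuned conditional U-Net, the noise schedule is an assumed DDPM-style one, and the map is assumed to be the signed difference \(x^a - x^n\) (the caption says only that it is generated subtractively).

```python
import numpy as np


def encode(x):
    """Toy stand-in for the VAE encoder: 2x2 average pooling (assumption)."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))


def decode(z):
    """Toy stand-in for the VAE decoder: nearest-neighbour upsampling (assumption)."""
    return np.repeat(np.repeat(z, 2, axis=0), 2, axis=1)


def forward_diffuse(z, betas, rng):
    """Noise the latents step by step (assumed DDPM-style forward process)."""
    for beta in betas:
        noise = rng.standard_normal(z.shape)
        z = np.sqrt(1.0 - beta) * z + np.sqrt(beta) * noise
    return z


def toy_denoise(z_t, n_steps, healthy_level=0.0):
    """Placeholder for the conditional U-Net: pulls latents toward a
    hypothetical 'healthy' level instead of running a trained model."""
    for _ in range(n_steps):
        z_t = z_t + 0.5 * (healthy_level - z_t)
    return z_t


def generate_attribution(x_a, betas, rng):
    """Encode, noise, denoise, decode, then subtract to get the map."""
    z = encode(x_a)                          # image latents Z
    z_T = forward_diffuse(z, betas, rng)     # noised latents Z_T
    z_hat = toy_denoise(z_T, len(betas))     # conditioned latent
    x_n = decode(z_hat)                      # counterfactual x^n
    m = x_a - x_n                            # attribution map (assumed sign)
    return x_n, m


# Synthetic 8x8 "abnormal" image with one bright 2x2 patch.
x_a = np.zeros((8, 8))
x_a[2:4, 2:4] = 1.0

betas = np.linspace(1e-4, 0.02, 10)          # assumed noise schedule
x_n, m = generate_attribution(x_a, betas, np.random.default_rng(0))
print(m.round(2))
```

In this toy setting the placeholder denoiser drives the latents close to zero, so the map is largest over the bright patch. With a real model, the denoiser would be replaced by the trained network and its conditioning input.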