Fig. 2 | Scientific Reports


From: Irrelevant region preserving for counterfactual image manipulation


The main structure of MCA-CLIP. Given the latent code \(w\) of the source image and a text description, the Feature Fusion module uses 18 cross-attention blocks to compute \(t_w\), a representation with strong semantics. The Precision Region Editing module learns to produce the masks required by the generator and finally generates the edited image using \(\Delta w\), \(w\), and the masks.
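The sketch below illustrates, under stated assumptions, the two operations the caption names: fusing a latent code with text features through cross-attention, and using a mask to confine an edit to selected regions. It is a hypothetical illustration, not the MCA-CLIP implementation. Assumptions: \(w\) is an 18 × 512 W+ style code with one single-head cross-attention block per row (our reading of "18 cross-attention blocks"); the text description is already encoded as a matrix of token embeddings; the residual connection and the blending rule `mask * feat_edit + (1 - mask) * feat_src` are illustrative choices. The names `cross_attention`, `fuse_latent_with_text`, and `masked_blend` are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax: subtract the max before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(query, context, wq, wk, wv):
    """Single-head scaled dot-product cross-attention.

    query: (n_q, d) rows that attend; context: (n_c, d_c) rows attended to.
    wq: (d, d_k), wk: (d_c, d_k), wv: (d_c, d) projection matrices.
    """
    q = query @ wq                      # (n_q, d_k)
    k = context @ wk                    # (n_c, d_k)
    v = context @ wv                    # (n_c, d)
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v  # (n_q, d)


def fuse_latent_with_text(w, text_tokens, blocks):
    """Hypothetical fusion: row i of w attends to the text tokens via block i.

    w: (n_layers, d) latent code, assumed (18, 512) here.
    text_tokens: (n_tokens, d_text) encoded description (assumed given).
    blocks: one (wq, wk, wv) weight triple per latent row.
    Returns t_w with the same shape as w.
    """
    if len(blocks) != w.shape[0]:
        raise ValueError("need exactly one attention block per latent row")
    rows = []
    for i, (wq, wk, wv) in enumerate(blocks):
        attended = cross_attention(w[i:i + 1], text_tokens, wq, wk, wv)
        rows.append(w[i] + attended[0])  # residual connection (assumed)
    return np.stack(rows)


def masked_blend(feat_src, feat_edit, mask):
    """Keep source features where mask == 0 and edited features where mask == 1.

    Values in between interpolate linearly, so regions the mask leaves at 0
    are preserved exactly.
    """
    return mask * feat_edit + (1.0 - mask) * feat_src
```

In this sketch, `feat_src` would come from rendering \(w\) and `feat_edit` from rendering \(w + \Delta w\); because the blend is linear, any region where the mask is zero reproduces the source features unchanged, which is one simple way to preserve irrelevant regions.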
