Fig. 4: FID heatmaps comparing real MR images with generated images without LAM and with LAM.

Panels show (left) real images versus real images, (middle) real images versus images generated without the lesion-aware module (LAM), and (right) real images versus images generated with LAM. Lower FID values indicate higher similarity. The optimal FID is 0.0, indicating that the two image sets are indistinguishable in feature space; the values along the diagonal from bottom left to top right are 0. A Friedman test indicated significant differences among the three sets of FID values (χ² = 120.82, p < 0.001). Dunn's post hoc tests with Bonferroni correction showed significant pairwise differences: without LAM vs. real (−231.25, Cohen's d = −1.07, p < 0.001), with LAM vs. real (−283.99, Cohen's d = −1.31, p < 0.001), and with LAM vs. without LAM (−52.74, Cohen's d = −0.24, p < 0.001).
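For reference, FID is the Fréchet distance between Gaussians fitted to feature embeddings of two image sets: FID = ‖μ_a − μ_b‖² + Tr(Σ_a + Σ_b − 2(Σ_a Σ_b)^{1/2}). The sketch below computes this from precomputed feature arrays, which makes clear why identical sets score 0.0. It assumes the features come from some fixed extractor (standard FID uses Inception-v3 pooling features, which this caption does not specify). The function name `frechet_distance` and the synthetic data are hypothetical illustrations, not the authors' evaluation code.

```python
import numpy as np


def frechet_distance(feats_a: np.ndarray, feats_b: np.ndarray) -> float:
    """FID-style Frechet distance between two sets of feature vectors.

    feats_a, feats_b: arrays of shape (n_samples, n_features), assumed to be
    embeddings from a fixed feature extractor (hypothetical inputs here).
    """
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    # rowvar=False: rows are samples, columns are feature dimensions
    sigma_a = np.cov(feats_a, rowvar=False)
    sigma_b = np.cov(feats_b, rowvar=False)

    # Squared Euclidean distance between the mean feature vectors
    mean_term = float(np.sum((mu_a - mu_b) ** 2))

    # Tr((Sigma_a Sigma_b)^{1/2}) equals the sum of square roots of the
    # eigenvalues of Sigma_a @ Sigma_b, which are real and non-negative for
    # PSD covariances; drop tiny imaginary/negative parts from round-off.
    eigvals = np.linalg.eigvals(sigma_a @ sigma_b)
    tr_sqrt = float(np.sum(np.sqrt(np.clip(eigvals.real, 0.0, None))))

    cov_term = float(np.trace(sigma_a) + np.trace(sigma_b)) - 2.0 * tr_sqrt
    return mean_term + cov_term


# Usage on synthetic features (not data from the study)
rng = np.random.default_rng(0)
real = rng.normal(size=(500, 16))
shifted = real + 0.5  # same covariance, mean shifted by 0.5 in all 16 dims

print(frechet_distance(real, real))     # close to 0.0: identical sets
print(frechet_distance(real, shifted))  # close to 4.0 = 16 * 0.5**2
```

Computing the matrix square-root trace through eigenvalues avoids a dependency on `scipy.linalg.sqrtm`; many FID implementations use `sqrtm` instead, and the two are mathematically equivalent for positive semidefinite covariances.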