Fig. 5: Using TERP to explain and check the reliability of a ViT trained on the CelebA dataset.
From: Thermodynamics-inspired explanations of artificial intelligence

a A ViT predicts the presence of 'Eyeglasses' in this image with a probability of 0.998. b Superpixel definitions for the test image, following the 16 × 16-pixel definition of ViT patches. TERP results showing c \({\mathcal{U}}^{j}\), d \({\mathcal{S}}^{j}\), e \({\theta}^{j}\), and f \({\zeta}^{j}\) as functions of j, and g the corresponding TERP explanation. The maximal drop in \({\theta}^{j}\) occurs when going from j = 2 to j = 3. Defining the optimal temperature \({\theta}^{o}=\frac{{\theta}^{j=2}+{\theta}^{j=3}}{2}\) as discussed in the “Results” section, a minimum in \({\zeta}^{j}\) is observed at j = 3. Panels h–j show sanity checks63, i.e., tests that the result of an AI explanation scheme is sensitive to model parameter randomization (h, i) and data randomization (j). k Saliency map result as a baseline explanation for the 'Eyeglasses' prediction. Red highlights pixels with high absolute values of the class-probability gradient across RGB channels. High gradients at pixels not relevant to 'Eyeglasses' illustrate a limitation of the saliency-map explanation. l TERP and m saliency-map explanations for the class 'Male'. \({\mathcal{U}}^{j}\), \({\mathcal{S}}^{j}\), \({\zeta}^{j}\), and \({\theta}^{j}\) as functions of j for (l, m) are provided in the SI.
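Panel b defines superpixels that coincide with the ViT patch grid. The sketch below illustrates one way such a label map could be built; the 224 × 224 input size, the NumPy implementation, and the function name are assumptions for illustration, not the authors' code.

```python
# Minimal sketch (assumed implementation): assign every pixel to the 16 x 16
# ViT patch it belongs to, so each patch acts as one superpixel.
import numpy as np

def vit_patch_superpixels(height=224, width=224, patch=16):
    """Return an integer label map assigning every pixel to its ViT patch."""
    rows = np.arange(height) // patch      # patch-row index of each pixel row
    cols = np.arange(width) // patch       # patch-column index of each pixel column
    n_cols = width // patch
    # Each patch receives a unique superpixel label.
    return rows[:, None] * n_cols + cols[None, :]

labels = vit_patch_superpixels()
print(labels.shape, labels.max() + 1)      # (224, 224) and 196 superpixels
```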
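Panels e–f describe the selection rule: locate the largest drop in \({\theta}^{j}\), set \({\theta}^{o}\) to the midpoint of the two bracketing values, and keep the explanation size at the minimum of \({\zeta}^{j}\). The following sketch mimics only this selection step under those assumptions; the numerical profiles are placeholders, not data from the figure, and TERP's actual definitions of \({\mathcal{U}}^{j}\), \({\mathcal{S}}^{j}\), \({\theta}^{j}\), and \({\zeta}^{j}\) are given in the paper's Results section.

```python
# Minimal sketch of the selection step described in the caption (placeholder data).
import numpy as np

theta = np.array([5.0, 4.6, 1.2, 1.0, 0.9])   # theta^j for j = 1..5 (placeholder)
zeta  = np.array([0.9, 0.7, 0.2, 0.4, 0.6])   # zeta^j  for j = 1..5 (placeholder)

drops = theta[:-1] - theta[1:]                # decrease of theta going from j to j+1
k = int(np.argmax(drops))                     # location of the maximal drop (here j=2 -> j=3)
theta_opt = 0.5 * (theta[k] + theta[k + 1])   # theta^o = (theta^{j=2} + theta^{j=3}) / 2

j_star = int(np.argmin(zeta)) + 1             # explanation size where zeta^j is minimal
print(f"theta^o = {theta_opt:.2f}, optimal number of features j = {j_star}")
```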
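Panels k and m use a gradient saliency map as the baseline. A minimal sketch of such a map is given below, assuming a PyTorch model returning class logits; the channel aggregation (maximum absolute gradient over RGB) and all names are illustrative choices, not the paper's exact procedure.

```python
# Minimal PyTorch sketch (assumed, not the paper's code): absolute gradient of the
# predicted class probability with respect to the input pixels, reduced to one
# map by taking the maximum over the RGB channels.
import torch

def saliency_map(model, image, class_idx):
    """image: (1, 3, H, W) preprocessed tensor; returns an (H, W) saliency map."""
    model.eval()
    image = image.clone().requires_grad_(True)
    prob = torch.softmax(model(image), dim=-1)[0, class_idx]  # class probability
    prob.backward()                                           # d prob / d pixels
    # Aggregate the absolute gradient across the three color channels.
    return image.grad.abs().max(dim=1).values[0]

# Usage (placeholder names): sal = saliency_map(vit, x, eyeglasses_idx)
```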