Fig. 4: From concept-level explanations to model and data debugging.
From: From attribution maps to human-understandable explanations through Concept Relevance Propagation

a, Local analysis of the attribution map reveals several channels (203, 361, 483, 454, 414, 486 and more) in layer features.30 of a VGG-16 model with BatchNorm, pretrained on ImageNet, that encode a Clever Hans feature exploited by the model to detect the 'safe' class. Top left: input image and heatmap. Top right: reference samples \({\mathcal{X}_{8}^{*}}_{\text{sum}}^{\text{rel}}\) for the six most relevant channels in the selected region, in descending order of their relevance contribution. Bottom: relevance contributions of the 20 most relevant filters inside the region (bottom left). These filters are successively set to zero and the resulting change in prediction confidence for different classes is recorded (bottom right). b, The previously identified Clever Hans filter 361 also plays a role for samples of other classes (most relevant reference samples shown). Here, black arrows point to the location of a Clever Hans artefact, that is, a delicate white font overlaid on the images (best seen in a digital format). For the classes 'puma' and 'spiderweb', the channel is instead used to recognize the puma's whiskers or the web itself, respectively. Below the reference samples, the CRP heatmaps conditioned on filter 361 and the respective true class y illustrate which part of each sample's attribution map would result from filter 361. Credit: iStock.com/Andyworks, iStock.com/farakos, iStock.com/GP232, iStock.com/neamov, iStock.com/Stock Depot, iStock.com/t_kimura, shutterstock.com/Peter Zijlstra, shutterstock.com/Ground Picture.
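The filter-ablation experiment shown in the bottom right of panel a can be reproduced with a plain forward hook, without any CRP-specific tooling. The sketch below is a minimal illustration under several assumptions: it uses torchvision's pretrained VGG-16 with BatchNorm, a placeholder image path (safe_example.jpg), the channel indices listed in the figure, and ImageNet class index 771 for 'safe'; it is not the authors' released code.

```python
# Sketch: successively zero out selected channels in layer features.30 of a
# BatchNorm VGG-16 and record how the softmax confidence for a chosen class
# changes, mirroring the ablation curve in panel a (bottom right).
import torch
from torchvision import models, transforms
from PIL import Image

model = models.vgg16_bn(weights=models.VGG16_BN_Weights.IMAGENET1K_V1).eval()

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])
x = preprocess(Image.open("safe_example.jpg").convert("RGB")).unsqueeze(0)  # placeholder path

ablated = set()  # channel indices in features.30 currently set to zero

def zero_channels(module, inputs, output):
    # Forward hook: return a copy of the activation with ablated channels zeroed.
    if ablated:
        out = output.clone()
        out[:, sorted(ablated)] = 0.0
        return out

hook = model.features[30].register_forward_hook(zero_channels)

classes = {"safe": 771}                    # ImageNet index for 'safe' (assumed)
filters = [203, 361, 483, 454, 414, 486]   # most relevant filters from the figure

with torch.no_grad():
    baseline = torch.softmax(model(x), dim=1)

# Ablate filters cumulatively, most relevant first, and record confidence changes.
for f in filters:
    ablated.add(f)
    with torch.no_grad():
        probs = torch.softmax(model(x), dim=1)
    for name, idx in classes.items():
        delta = (probs[0, idx] - baseline[0, idx]).item()
        print(f"ablated {sorted(ablated)} -> change in p({name}) = {delta:+.4f}")

hook.remove()
```

The conditional heatmaps in panel b additionally require restricting relevance propagation to filter 361 and the target class y, which goes beyond this plain-PyTorch confidence-change sketch.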