Fig. 7 | Scientific Reports


From: An autoencoder and vision transformer based interpretability analysis on the performance differences in automated staging of second and third molars


Comparison of the mean stage images and attention maps for tooth 37. Top row: Mean stage images, giving a general representation of the tooth shape at each stage. Second row: Mean ViT attention maps, showing that the ViT's focus shifts from the tooth cusp toward the cervical region as the stages progress. Third row: Mean autoencoder (AE) reconstructions per stage, showing how, on average, the reconstructions represent the stage shape. Bottom row: Attention maps from the AE + ViT framework. The attention maps of the ViT model focus heavily on the tooth cusps and largely disregard the root region, whereas those of the AE + ViT framework incorporate much more root information, indicating that the AE preprocessing accentuated this useful anatomical feature.
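The per-stage averages in the first three rows amount to grouping samples by their stage label and taking a pixel-wise mean. The following is a minimal NumPy sketch of that step, not the article's code. It assumes that the images, reconstructions, or attention maps are already aligned and resized to a common H × W grid and that stages are integer labels. The function name `per_stage_means` is hypothetical.

```python
import numpy as np


def per_stage_means(maps, stages):
    """Pixel-wise mean of 2-D maps grouped by stage label (hypothetical helper).

    maps:   array-like of shape (N, H, W): images, AE reconstructions or
            attention maps, assumed to be aligned on a common grid.
    stages: array-like of shape (N,) with one integer stage label per map.
    Returns a dict {stage: mean map of shape (H, W)} for every stage present.
    """
    maps = np.asarray(maps, dtype=np.float64)
    stages = np.asarray(stages)
    if maps.ndim != 3 or maps.shape[0] != stages.shape[0]:
        raise ValueError("expected maps of shape (N, H, W) and one label per map")
    # Boolean mask selects all maps of one stage; average over the sample axis.
    return {int(s): maps[stages == s].mean(axis=0) for s in np.unique(stages)}


# Usage on toy data: six random 4x4 "images" spread over three stages.
rng = np.random.default_rng(0)
images = rng.random((6, 4, 4))
labels = np.array([0, 0, 1, 1, 2, 2])
means = per_stage_means(images, labels)
```

The same function can be applied unchanged to the stacks of attention maps or reconstructions, so that every row of the figure is produced by an identical averaging step.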
