Fig. 8

Comparison of mean stage images and attention maps for tooth 38. Top row: The mean stage images appear blurrier for each stage, indicating that the tooth shapes for 38 contain more intra-class variation. Second row: The mean ViT attention maps appear similar to those of tooth 37 and remain plausible; however, they do not incorporate information below the mid-region of the tooth and therefore do not explain the lower accuracy. Third row: The mean reconstructions are less visually similar to the mean stage images, indicating that the mean images are not an optimal representation of stage morphology. Bottom row: For all stages, the attention maps from the AE + ViT pipeline focus more on the lower region of the tooth than those of the ViT alone, indicating that root formation informed the classification.
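The per-stage averaging behind the top two rows, and the "lower region" comparison in the bottom row, can be sketched as follows. This is a minimal illustration under stated assumptions, not the pipeline used to produce the figure. It assumes the tooth images (or attention maps) are aligned, equally sized 2D NumPy arrays with one stage label each, and that the root lies toward the bottom rows of each image. The function names `per_stage_stats` and `lower_region_fraction` are hypothetical, and mean per-pixel variance is used here only as one simple proxy for the intra-class variation that appears as blur in a mean image.

```python
import numpy as np


def per_stage_stats(maps, stages):
    """Group aligned 2D maps (images or attention maps) by stage label.

    Returns a dict mapping each stage to (mean_map, spread), where mean_map is
    the pixel-wise average of that stage's maps and spread is the mean
    per-pixel (population) variance, a rough proxy for intra-class variation.
    """
    maps = np.asarray(maps, dtype=float)
    stages = np.asarray(stages)
    stats = {}
    for stage in np.unique(stages):
        group = maps[stages == stage]        # all maps labelled with this stage
        mean_map = group.mean(axis=0)        # the "mean stage image" for this stage
        spread = group.var(axis=0).mean()    # larger spread -> blurrier mean
        stats[stage.item()] = (mean_map, float(spread))
    return stats


def lower_region_fraction(attn):
    """Fraction of total attention mass in rows index >= height // 2.

    Assumes row 0 is the top of the image, so the lower half covers the root.
    """
    attn = np.asarray(attn, dtype=float)
    mid = attn.shape[0] // 2
    total = attn.sum()
    return float(attn[mid:].sum() / total) if total > 0 else 0.0
```

Comparing `lower_region_fraction` on the per-stage mean attention maps of the ViT and of the AE + ViT pipeline would give one rough numerical counterpart to the visual comparison between the second and bottom rows.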