Fig. 2

Detailed architecture of the ViT module in the VBG model. The pipeline takes a relic image as input, extracts features through linear projection, adds positional encodings, and applies multi-head self-attention to produce an enhanced visual feature representation.
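The stages named in the caption can be illustrated with a minimal NumPy sketch of a generic ViT front end. It is not the VBG implementation. The caption does not give the patch size, embedding width, head count, or positional-encoding type, so this sketch makes several assumptions. It uses 16×16 patches, a 64-dimensional embedding, 4 heads, and fixed sinusoidal positional encodings. The original ViT uses learned encodings, so the sinusoidal version is a deliberate swap. The sketch also uses random untrained weights and omits the class token, layer normalization, and MLP block. All function names (`patchify`, `vit_front_end`, etc.) are hypothetical.

```python
import numpy as np

def patchify(img, patch):
    """Split an (H, W, C) image into flattened non-overlapping patches: (N, patch*patch*C)."""
    h, w, c = img.shape
    assert h % patch == 0 and w % patch == 0, "image size must be divisible by patch size"
    x = img.reshape(h // patch, patch, w // patch, patch, c)
    x = x.transpose(0, 2, 1, 3, 4)  # (patch_rows, patch_cols, patch, patch, C)
    return x.reshape(-1, patch * patch * c)

def sinusoidal_positions(n, d):
    """Fixed sinusoidal positional encodings of shape (n, d); assumes d is even."""
    pos = np.arange(n)[:, None]
    i = np.arange(0, d, 2)[None, :]
    angles = pos / np.power(10000.0, i / d)
    pe = np.zeros((n, d))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_self_attention(x, wq, wk, wv, wo, heads):
    """Scaled dot-product self-attention over tokens x: (N, D) -> (N, D)."""
    n, d = x.shape
    dh = d // heads
    # Project tokens, then split the feature dimension into heads: (heads, N, dh)
    q = (x @ wq).reshape(n, heads, dh).transpose(1, 0, 2)
    k = (x @ wk).reshape(n, heads, dh).transpose(1, 0, 2)
    v = (x @ wv).reshape(n, heads, dh).transpose(1, 0, 2)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(dh)  # (heads, N, N)
    attn = softmax(scores, axis=-1)                  # each row sums to 1
    out = attn @ v                                   # (heads, N, dh)
    out = out.transpose(1, 0, 2).reshape(n, d)       # concatenate heads
    return out @ wo, attn

def vit_front_end(img, patch=16, d_model=64, heads=4, seed=0):
    """Hypothetical ViT front end: patch projection, positional encoding, then MHSA."""
    rng = np.random.default_rng(seed)
    patches = patchify(img, patch)                                  # (N, P*P*C)
    w_proj = rng.normal(0.0, 0.02, (patches.shape[1], d_model))     # untrained weights
    tokens = patches @ w_proj                                       # linear projection
    tokens = tokens + sinusoidal_positions(len(tokens), d_model)    # add positional encoding
    wq, wk, wv, wo = (rng.normal(0.0, 0.02, (d_model, d_model)) for _ in range(4))
    return multi_head_self_attention(tokens, wq, wk, wv, wo, heads)

# Usage: a random stand-in for a 224x224 RGB relic image
img = np.random.default_rng(1).random((224, 224, 3))
features, attn = vit_front_end(img)
print(features.shape, attn.shape)  # (196, 64) (4, 196, 196)
```

A 224×224 input with 16×16 patches yields 14 × 14 = 196 tokens. Each head therefore produces a 196 × 196 attention map, and every token can attend to every other patch of the relic image.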