Fig. 17

Qualitative examples of model robustness and attention under challenging visual conditions, including occlusion, low lighting, and complex backgrounds. From left to right: input images, attention heatmaps from the Vision Transformer module, and saliency maps from the hybrid CNN-ViT model. In the “Complex Background” example, the model correctly focuses on the hand shape despite visual clutter, which supports the effectiveness of the dual-path feature extraction and attention mechanism.