Fig. 17 | Scientific Reports

From: Recognizing American Sign Language gestures efficiently and accurately using a hybrid transformer model

Qualitative analysis of model attention under challenging visual conditions, including occlusion, low lighting, and complex backgrounds, demonstrating model robustness. From left to right: input images, attention heatmaps from the Vision Transformer module, and saliency maps from the hybrid CNN-ViT model. In the “Complex Background” example, the model correctly focuses on the hand shape despite visual clutter, confirming the effectiveness of the dual-path feature extraction and attention mechanism.
