Table 9 Performance comparison of the proposed model with other configurations.

From: Recognizing American Sign Language gestures efficiently and accurately using a hybrid transformer model

| Configuration | Accuracy (%) | Error rate (%) | Inference speed (FPS) | Computational complexity (GFLOPs) |
| --- | --- | --- | --- | --- |
| CNN-only | 86.0 | 14.00 | 130 | 3.9 |
| Transformer-only | 88.5 | 11.5 | 85 | 7.5 |
| CNN + Transformer (No fusion) | 89.7 | 10.30 | 95 | 6.0 |
| Proposed hybrid model | 99.97 | 0.03 | 110 | 5.0 |
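
The reported error rates appear to be the complement of the accuracies (error rate = 100 − accuracy). The source does not state this relation explicitly, so it is an inference from the figures. The short Python sketch below checks that reading against the four rows of Table 9 and computes the accuracy gap between two configurations in percentage points. The names `ROWS`, `error_rate_consistent`, `check_rows`, and `accuracy_gain` are hypothetical helpers introduced for illustration; they are not part of the paper.

```python
# Rows copied from Table 9:
# (configuration, accuracy %, error rate %, inference speed FPS, complexity GFLOPs)
ROWS = [
    ("CNN-only", 86.0, 14.00, 130, 3.9),
    ("Transformer-only", 88.5, 11.5, 85, 7.5),
    ("CNN + Transformer (No fusion)", 89.7, 10.30, 95, 6.0),
    ("Proposed hybrid model", 99.97, 0.03, 110, 5.0),
]


def error_rate_consistent(accuracy, error_rate, tol=1e-6):
    """Return True if error_rate equals 100 - accuracy within a small tolerance.

    Assumption: error rate is defined as the complement of accuracy.
    """
    return abs((100.0 - accuracy) - error_rate) <= tol


def check_rows(rows):
    """Map each configuration name to whether its two percentages agree."""
    return {
        name: error_rate_consistent(acc, err)
        for name, acc, err, _fps, _gflops in rows
    }


def accuracy_gain(rows, baseline_name, target_name):
    """Accuracy difference (target minus baseline) in percentage points."""
    accuracy_by_name = {name: acc for name, acc, *_rest in rows}
    return accuracy_by_name[target_name] - accuracy_by_name[baseline_name]


if __name__ == "__main__":
    for name, ok in check_rows(ROWS).items():
        print(f"{name}: error rate matches 100 - accuracy: {ok}")
    gain = accuracy_gain(ROWS, "CNN-only", "Proposed hybrid model")
    print(f"Gain over CNN-only: {gain:.2f} percentage points")
```

The comparison uses a small tolerance rather than exact equality because values such as 99.97 are not exactly representable in binary floating point.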