Table 11 Comparison of the model complexity of the MARNN-FRFICP technique based on FLOPs and GPU memory usage.

From: An innovative multi-head attention mechanism-driven recurrent neural network model with feature representation fusion for enhanced image captioning to assist individuals with visual impairments

| Model | FLOPs (G) | GPU memory (MB) |
|---|---|---|
| ConvNeXt v2 Base | 26.22 | 1513 |
| DenseNet 121 | 18.7 | 2189 |
| ResNetv2 50 | 24.65 | 1660 |
| Swin Tiny | 17.04 | 2748 |
| ViT Base | 25.48 | 2710 |
| MobileNetv3 S 50 | 19.57 | 2463 |
| MARNN-FRFICP | 9.54 | 913 |
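
To make the comparison in Table 11 easier to read, the relative saving of MARNN-FRFICP over each baseline can be computed directly from the reported values. The following Python sketch is illustrative only and is not part of the original study: the names `TABLE_11`, `relative_reduction`, and `reductions_vs_baselines` are hypothetical, and the GPU column is assumed to be in megabytes.

```python
# Values copied from Table 11: (FLOPs in G, GPU memory).
# Assumption: the table's "M" unit for GPU memory is read as megabytes.
TABLE_11 = {
    "ConvNeXt v2 Base": (26.22, 1513),
    "DenseNet 121": (18.7, 2189),
    "ResNetv2 50": (24.65, 1660),
    "Swin Tiny": (17.04, 2748),
    "ViT Base": (25.48, 2710),
    "MobileNetv3 S 50": (19.57, 2463),
    "MARNN-FRFICP": (9.54, 913),
}
PROPOSED = "MARNN-FRFICP"


def relative_reduction(baseline, proposed):
    """Fractional saving of `proposed` relative to `baseline` (0.25 means 25% lower)."""
    return 1.0 - proposed / baseline


def reductions_vs_baselines(table=TABLE_11, proposed=PROPOSED):
    """Map each baseline model to its (FLOPs saving, GPU memory saving)."""
    p_flops, p_mem = table[proposed]
    return {
        name: (relative_reduction(flops, p_flops), relative_reduction(mem, p_mem))
        for name, (flops, mem) in table.items()
        if name != proposed
    }


if __name__ == "__main__":
    for name, (flops_saving, mem_saving) in reductions_vs_baselines().items():
        print(f"{name:<18} FLOPs -{flops_saving:.1%}  GPU memory -{mem_saving:.1%}")
```

Under these assumptions, the saving is positive for every baseline in both columns, which matches the table: MARNN-FRFICP has the lowest FLOPs and the lowest GPU memory usage of the models listed.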