Table 8. Comparative study of the MARNN-FRFICP model with existing techniques on the MSCOCO dataset.

From: An innovative multi-head attention mechanism-driven recurrent neural network model with feature representation fusion for enhanced image captioning to assist individuals with visual impairments

MSCOCO Dataset

| Technique | BLEU1 | BLEU2 | BLEU3 | BLEU4 | METEOR | CIDEr |
|---|---|---|---|---|---|---|
| QPULM | 63.18 | 49.15 | 34.22 | 23.85 | 20.00 | 69.56 |
| YOLOv8 | 65.34 | 51.51 | 36.22 | 26.12 | 22.15 | 71.95 |
| ResNet-50 | 67.55 | 52.92 | 38.17 | 29.08 | 24.87 | 89.45 |
| Google NIC | 63.12 | 49.07 | 34.14 | 23.80 | 19.96 | 69.52 |
| Soft-Attention | 65.27 | 51.46 | 36.16 | 26.03 | 22.09 | 71.89 |
| m-RNN | 67.49 | 52.85 | 38.09 | 28.99 | 24.81 | 89.37 |
| SCA-CNN-VGG | 68.21 | 53.61 | 38.86 | 29.65 | 25.34 | 90.06 |
| GCN-LSTM | 69.95 | 56.27 | 40.82 | 32.09 | 26.88 | 106.84 |
| Injection-Tag | 76.73 | 58.98 | 43.29 | 33.21 | 30.07 | 118.12 |
| AIC-SSAIDL | 81.14 | 63.65 | 48.59 | 38.83 | 34.32 | 138.03 |
| MARNN-FRFICP | 83.36 | 70.85 | 55.59 | 47.86 | 41.69 | 150.62 |
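
The margin of MARNN-FRFICP over the strongest competing technique in the table, AIC-SSAIDL, can be worked out per metric with a few lines of Python. This is only an illustrative sketch: the names `METRICS`, `scores`, and `gains` are assumptions made for this example, not part of the source, and the values are copied directly from the two corresponding rows of Table 8.

```python
# Scores copied from Table 8 (MSCOCO). All names here are illustrative only.
METRICS = ["BLEU1", "BLEU2", "BLEU3", "BLEU4", "METEOR", "CIDEr"]
scores = {
    "AIC-SSAIDL": [81.14, 63.65, 48.59, 38.83, 34.32, 138.03],
    "MARNN-FRFICP": [83.36, 70.85, 55.59, 47.86, 41.69, 150.62],
}

# Absolute gain of MARNN-FRFICP over AIC-SSAIDL for each metric,
# rounded to two decimals to match the precision of the table.
gains = {
    metric: round(proposed - baseline, 2)
    for metric, proposed, baseline in zip(
        METRICS, scores["MARNN-FRFICP"], scores["AIC-SSAIDL"]
    )
}

for metric, gain in gains.items():
    print(f"{metric}: +{gain:.2f}")
```

Absolute differences are used rather than relative ones because the metrics sit on different scales (CIDEr exceeds 100 in this table, while the other columns do not), so a percentage gain would need to be interpreted separately for each column.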