Table 5 Comparative study of the MARNN-FRFICP model with existing techniques on the Flickr30K dataset.

From: An innovative multi-head attention mechanism-driven recurrent neural network model with feature representation fusion for enhanced image captioning to assist individuals with visual impairments

Flickr30K Dataset

| Technique | BLEU1 | BLEU2 | BLEU3 | BLEU4 | METEOR | CIDEr |
|---|---|---|---|---|---|---|
| QPULM | 60.49 | 50.82 | 40.30 | 29.14 | 23.46 | 39.58 |
| YOLOv8 | 62.91 | 53.21 | 43.16 | 30.77 | 25.67 | 41.68 |
| ResNet-50 | 64.61 | 55.02 | 46.10 | 33.08 | 27.55 | 44.17 |
| Google NIC | 60.45 | 50.74 | 40.23 | 29.09 | 23.39 | 39.51 |
| Soft-Attention | 62.82 | 53.14 | 43.12 | 30.73 | 25.60 | 41.59 |
| m-RNN | 64.55 | 54.94 | 46.02 | 33.01 | 27.47 | 44.13 |
| SCA-CNN-VGG | 66.45 | 57.71 | 49.01 | 35.90 | 30.24 | 47.04 |
| GCN-LSTM | 69.22 | 59.28 | 50.74 | 37.74 | 30.74 | 58.49 |
| Injection-Tag | 71.00 | 61.97 | 52.56 | 39.69 | 34.47 | 61.17 |
| AIC-SSAIDL | 73.21 | 64.35 | 55.47 | 42.82 | 36.86 | 63.42 |
| MARNN-FRFICP | 77.23 | 70.11 | 69.08 | 58.91 | 45.26 | 69.81 |
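
The comparison in Table 5 can be summarised as the gap between MARNN-FRFICP and the strongest competing technique on each metric. The following is a minimal sketch of that arithmetic, not part of the original study: the scores are transcribed from the table, and the names `METRICS`, `SCORES`, and `gains_over_best_baseline` are hypothetical helpers introduced here for illustration.

```python
# Scores transcribed from Table 5 (Flickr30K); column order follows the table.
METRICS = ["BLEU1", "BLEU2", "BLEU3", "BLEU4", "METEOR", "CIDEr"]

SCORES = {
    "QPULM":          [60.49, 50.82, 40.30, 29.14, 23.46, 39.58],
    "YOLOv8":         [62.91, 53.21, 43.16, 30.77, 25.67, 41.68],
    "ResNet-50":      [64.61, 55.02, 46.10, 33.08, 27.55, 44.17],
    "Google NIC":     [60.45, 50.74, 40.23, 29.09, 23.39, 39.51],
    "Soft-Attention": [62.82, 53.14, 43.12, 30.73, 25.60, 41.59],
    "m-RNN":          [64.55, 54.94, 46.02, 33.01, 27.47, 44.13],
    "SCA-CNN-VGG":    [66.45, 57.71, 49.01, 35.90, 30.24, 47.04],
    "GCN-LSTM":       [69.22, 59.28, 50.74, 37.74, 30.74, 58.49],
    "Injection-Tag":  [71.00, 61.97, 52.56, 39.69, 34.47, 61.17],
    "AIC-SSAIDL":     [73.21, 64.35, 55.47, 42.82, 36.86, 63.42],
    "MARNN-FRFICP":   [77.23, 70.11, 69.08, 58.91, 45.26, 69.81],
}


def gains_over_best_baseline(scores, proposed):
    """Map each metric to (best baseline name, absolute gain of `proposed`)."""
    result = {}
    for i, metric in enumerate(METRICS):
        # Collect every technique except the proposed one for this metric.
        baselines = {
            name: vals[i] for name, vals in scores.items() if name != proposed
        }
        best = max(baselines, key=baselines.get)
        gain = round(scores[proposed][i] - baselines[best], 2)
        result[metric] = (best, gain)
    return result


gains = gains_over_best_baseline(SCORES, "MARNN-FRFICP")
for metric, (name, gain) in gains.items():
    print(f"{metric}: {gain:+.2f} over {name}")
```

With the values as tabulated, AIC-SSAIDL is the strongest competing technique on all six metrics, and the absolute gap is smallest on BLEU1 (4.02) and largest on BLEU4 (16.09).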