Table 2. Comparative analysis of the MARNN-FRFICP model against recent models on the Flickr8k dataset.

From: An innovative multi-head attention mechanism-driven recurrent neural network model with feature representation fusion for enhanced image captioning to assist individuals with visual impairments

Flickr8k dataset

| Technique | BLEU1 | BLEU2 | BLEU3 | BLEU4 | METEOR | CIDEr |
|---|---|---|---|---|---|---|
| QPULM | 60.04 | 44.93 | 34.20 | 20.05 | 16.32 | 32.96 |
| YOLOv8 | 62.37 | 47.05 | 36.33 | 22.14 | 18.24 | 35.63 |
| ResNet-50 | 64.47 | 48.41 | 38.09 | 24.87 | 19.98 | 38.33 |
| Google NIC | 60.00 | 44.86 | 34.12 | 19.99 | 16.23 | 32.89 |
| Soft-Attention | 62.29 | 46.97 | 36.29 | 22.07 | 18.19 | 35.58 |
| m-RNN | 64.38 | 48.33 | 38.05 | 24.81 | 19.91 | 38.26 |
| SCA-CNN-VGG | 66.98 | 51.81 | 41.01 | 26.28 | 23.23 | 40.10 |
| GCN-LSTM | 69.16 | 53.79 | 43.12 | 28.47 | 25.84 | 42.89 |
| Injection-Tag | 68.71 | 59.01 | 50.68 | 37.59 | 30.15 | 58.27 |
| AIC-SSAIDL | 74.19 | 58.19 | 47.69 | 33.48 | 31.53 | 47.95 |
| MARNN-FRFICP | 80.10 | 63.55 | 56.64 | 45.78 | 43.54 | 63.97 |
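The strongest competitor is not the same model for every metric. AIC-SSAIDL leads the baselines on BLEU1 and METEOR, and Injection-Tag leads on the other four. To make the comparison concrete, the following sketch is not part of the original paper. It transcribes the reported scores and computes, for each metric, the best baseline and the margin of MARNN-FRFICP over it. The names `SCORES`, `METRICS`, and `margins_over_best_baseline` are illustrative only. The margins are plain differences in reported points and say nothing about statistical significance.

```python
# Illustrative helper (hypothetical names); values transcribed from Table 2 (Flickr8k).
METRICS = ["BLEU1", "BLEU2", "BLEU3", "BLEU4", "METEOR", "CIDEr"]

SCORES = {
    "QPULM":          [60.04, 44.93, 34.20, 20.05, 16.32, 32.96],
    "YOLOv8":         [62.37, 47.05, 36.33, 22.14, 18.24, 35.63],
    "ResNet-50":      [64.47, 48.41, 38.09, 24.87, 19.98, 38.33],
    "Google NIC":     [60.00, 44.86, 34.12, 19.99, 16.23, 32.89],
    "Soft-Attention": [62.29, 46.97, 36.29, 22.07, 18.19, 35.58],
    "m-RNN":          [64.38, 48.33, 38.05, 24.81, 19.91, 38.26],
    "SCA-CNN-VGG":    [66.98, 51.81, 41.01, 26.28, 23.23, 40.10],
    "GCN-LSTM":       [69.16, 53.79, 43.12, 28.47, 25.84, 42.89],
    "Injection-Tag":  [68.71, 59.01, 50.68, 37.59, 30.15, 58.27],
    "AIC-SSAIDL":     [74.19, 58.19, 47.69, 33.48, 31.53, 47.95],
    "MARNN-FRFICP":   [80.10, 63.55, 56.64, 45.78, 43.54, 63.97],
}


def margins_over_best_baseline(scores, proposed="MARNN-FRFICP"):
    """For each metric, return (best baseline, its score, proposed score minus that score)."""
    result = {}
    for i, metric in enumerate(METRICS):
        # Collect this metric's score for every model except the proposed one.
        baselines = {name: vals[i] for name, vals in scores.items() if name != proposed}
        best_name = max(baselines, key=baselines.get)
        best_score = baselines[best_name]
        # Round to two decimals to match the precision reported in the table.
        result[metric] = (best_name, best_score, round(scores[proposed][i] - best_score, 2))
    return result


if __name__ == "__main__":
    for metric, (name, score, gap) in margins_over_best_baseline(SCORES).items():
        print(f"{metric}: best baseline {name} ({score:.2f}), margin {gap:+.2f}")
```

Running the script shows the largest reported gap on METEOR (about 12 points over AIC-SSAIDL) and the smallest on BLEU2 (about 4.5 points over Injection-Tag).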