Table 3 Experimental results of text-image fusion models.

From: T-ECBM: a deep learning-based text-image multimodal model for tourist attraction recommendation

| Model | Top-1 accuracy (%) | Top-5 accuracy (%) | F1-score (%) |
| --- | --- | --- | --- |
| GoogLeNet + CA + RNN + MLP | 88.52 ± 1.15 | 98.25 ± 0.48 | 88.15 ± 1.15 |
| GoogLeNet + CA + BERT + MLP | 94.01 ± 1.24 | 99.12 ± 0.20 | 93.65 ± 1.24 |
| DenseNet-121 + CA + RNN + MLP | 91.51 ± 1.04 | 98.85 ± 0.47 | 91.10 ± 1.04 |
| DenseNet-121 + CA + BERT + MLP | 96.20 ± 0.71 | 99.69 ± 0.30 | 95.75 ± 0.73 |
| EfficientNet + CA + RNN + MLP | 94.69 ± 0.41 | 98.87 ± 0.25 | 94.25 ± 0.41 |
| EfficientNet + CA + BERT + MLP (T-ECBM, ours) | 96.71 ± 0.43 | 99.82 ± 0.26 | 96.70 ± 0.44 |
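
For readers unfamiliar with the metrics, the following is a minimal sketch of how a Top-k accuracy can be computed and how a "mean ± spread" entry might be formed. It is not the authors' evaluation code. The table does not state how the ± values were obtained; the sketch assumes they are sample standard deviations over repeated runs. The function names `top_k_accuracy` and `summarize` and all toy values are hypothetical.

```python
from statistics import mean, stdev


def top_k_accuracy(scores, labels, k):
    """Fraction of samples whose true label is among the k highest-scoring classes."""
    hits = 0
    for row, label in zip(scores, labels):
        # Class indices sorted by descending score, truncated to the top k.
        top_k = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
        hits += label in top_k
    return hits / len(labels)


def summarize(values):
    """Format per-run results (fractions in [0, 1]) as 'mean ± std' in percent.

    Assumption: the spread is the sample standard deviation across runs.
    """
    return f"{100 * mean(values):.2f} ± {100 * stdev(values):.2f}"


# Toy data: 4 samples, 3 classes (hypothetical values, not from the paper).
scores = [
    [0.7, 0.2, 0.1],
    [0.1, 0.3, 0.6],
    [0.4, 0.5, 0.1],
    [0.2, 0.3, 0.5],
]
labels = [0, 2, 0, 1]

top1 = top_k_accuracy(scores, labels, k=1)
top2 = top_k_accuracy(scores, labels, k=2)

# Hypothetical accuracies from three repeated runs.
run_summary = summarize([0.96, 0.97, 0.98])
```

With k = 1 the metric reduces to ordinary accuracy; larger k can only keep the score the same or raise it, which is consistent with every Top-5 value in the table exceeding the corresponding Top-1 value.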