Table 3 Experimental results of text-image fusion models.
From: T-ECBM: a deep learning-based text-image multimodal model for tourist attraction recommendation
| Model | Top-1 accuracy (%) | Top-5 accuracy (%) | F1-score (%) |
|---|---|---|---|
| GoogLeNet + CA + RNN + MLP | 88.52 ± 1.15 | 98.25 ± 0.48 | 88.15 ± 1.15 |
| GoogLeNet + CA + BERT + MLP | 94.01 ± 1.24 | 99.12 ± 0.20 | 93.65 ± 1.24 |
| DenseNet-121 + CA + RNN + MLP | 91.51 ± 1.04 | 98.85 ± 0.47 | 91.10 ± 1.04 |
| DenseNet-121 + CA + BERT + MLP | 96.20 ± 0.71 | 99.69 ± 0.30 | 95.75 ± 0.73 |
| EfficientNet + CA + RNN + MLP | 94.69 ± 0.41 | 98.87 ± 0.25 | 94.25 ± 0.41 |
| EfficientNet + CA + BERT + MLP (T-ECBM, ours) | 96.71 ± 0.43 | 99.82 ± 0.26 | 96.70 ± 0.44 |