Table 6 Benchmark comparison with state-of-the-art multimodal models.

From: DeepFusionNet for real-time classification in IoT-based cross-media art and design using multimodal deep learning

| Model | Accuracy (%) | MCC | AUC |
|---|---|---|---|
| ViLBERT [45] | 91.2 | 0.801 | 0.94 |
| MM-Transformer [23] | 92.4 | 0.823 | 0.95 |
| MISA [46] | 93.1 | 0.832 | 0.95 |
| MFN Zadeh [47] | 92.7 | 0.829 | 0.94 |
| Proposed DeepFusionNet | 94.2 | 0.846 | 0.96 |
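
For readers unfamiliar with the metrics reported in the table, the following is a minimal sketch of how accuracy and the Matthews correlation coefficient (MCC) can be computed from confusion-matrix counts. It assumes a binary classification setting purely for illustration; the table does not state whether the benchmark is binary or multiclass, and the function names and counts below are hypothetical, not taken from the paper.

```python
import math


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient for a binary confusion matrix."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Common convention: report 0 when a row or column of the matrix is empty.
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical counts chosen only to exercise the functions (not from the paper).
tp, tn, fp, fn = 90, 85, 15, 10
print(f"accuracy = {100 * accuracy(tp, tn, fp, fn):.1f}%")
print(f"MCC      = {mcc(tp, tn, fp, fn):.3f}")
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, so it stays informative when classes are imbalanced. This may be why it is often reported alongside accuracy and AUC.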