Table 3. Performance of the employed multimodal approaches: pre-trained multimodal models, multimodal ensembles using soft voting, and multimodal-based feature extraction for ML ensemble classifiers.

From: A triple pronged approach for ulcerative colitis severity classification using multimodal, meta, and transformer based learning

| Method | Models | Accuracy | F1 | Precision | Recall | Inference time (ms) |
|---|---|---|---|---|---|---|
| Pre-trained | CLIP B/16 [18] | 0.65 | 0.66 | 0.67 | 0.65 | 17.8 |
| Pre-trained | CLIP B/32 [18] | 0.66 | 0.67 | 0.69 | 0.66 | 21.5 |
| Pre-trained | CLIP L/14 [18] | 0.70 | 0.70 | 0.71 | 0.70 | 27.8 |
| Pre-trained | BLIP [22] | 0.65 | 0.66 | 0.66 | 0.65 | 54.3 |
| Pre-trained | FLAVA [28] | 0.65 | 0.67 | 0.72 | 0.65 | 32.7 |
| Multimodal ensemble | | | | | | |
| Soft voting ensemble | CLIP B/32, BLIP, FLAVA | 0.64 | 0.66 | 0.69 | 0.64 | 27.33 |
| Soft voting ensemble | CLIP B/32, CLIP L/14, CLIP B/16 | 0.73 | 0.74 | 0.76 | 0.73 | 22.36 |
| Soft voting ensemble | CLIP B/32, CLIP L/14, CLIP B/16, BLIP, FLAVA | 0.70 | 0.71 | 0.74 | 0.70 | 30.82 |
| Machine learning classifiers ensemble | | | | | | |
| Soft voting ensemble (encoder: CLIP B/32) | Base classifiers: KNN, SVM, RF; additional classifiers: LR, GB, GNB | 0.83 | 0.82 | 0.82 | 0.83 | 157.13 |
| Soft voting ensemble (encoder: EVA-CLIP B/16) | Base classifiers: KNN, SVM, RF; additional classifiers: LR, GB, GNB | 0.82 | 0.80 | 0.82 | 0.82 | 140.35 |
| Stacking ensemble (encoder: CLIP B/32) | Base classifiers: KNN, SVM, RF; meta classifier: LR | 0.82 | 0.81 | 0.81 | 0.82 | 240.1 |
| Stacking ensemble (encoder: EVA-CLIP B/16) | Base classifiers: KNN, SVM, RF; meta classifier: LR | 0.82 | 0.79 | 0.80 | 0.81 | 94.37 |
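
The soft voting rows in Table 3 combine several models into one prediction. As a minimal sketch of how soft voting generally works (not the authors' implementation), the snippet below averages per-class probabilities across models and picks the class with the highest mean. The function name `soft_vote`, the equal default weights, the three-class setup, and all probability values are assumptions made purely for illustration; the table does not state how the ensemble members were weighted.

```python
from typing import List, Optional, Sequence


def soft_vote(
    prob_sets: Sequence[Sequence[Sequence[float]]],
    weights: Optional[Sequence[float]] = None,
) -> List[int]:
    """Soft voting: weighted mean of class probabilities, then argmax.

    prob_sets[m][i][c] is the probability that model m assigns to class c
    for sample i. Equal weights are assumed when none are given.
    """
    n_models = len(prob_sets)
    if weights is None:
        weights = [1.0] * n_models
    total_weight = sum(weights)

    predictions = []
    for i in range(len(prob_sets[0])):
        n_classes = len(prob_sets[0][i])
        # Weighted average probability for each class of sample i.
        avg = [
            sum(w * probs[i][c] for w, probs in zip(weights, prob_sets)) / total_weight
            for c in range(n_classes)
        ]
        # Predicted class is the index of the largest averaged probability.
        predictions.append(max(range(n_classes), key=avg.__getitem__))
    return predictions


# Hypothetical outputs of three models for two samples and three classes.
model_a = [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3]]
model_b = [[0.3, 0.5, 0.2], [0.1, 0.3, 0.6]]
model_c = [[0.5, 0.4, 0.1], [0.2, 0.3, 0.5]]

equal = soft_vote([model_a, model_b, model_c])
weighted = soft_vote([model_a, model_b, model_c], weights=[3, 1, 1])
print(equal, weighted)
```

With equal weights the second sample goes to the class favored by the two more confident models; weighting the first model more heavily shifts that decision, which is one reason the composition of an ensemble can matter as much as its size.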