Table 3 Performance Comparison of Various Models Before and After Fine-Tuning on the CBL Dataset.

From: A Connected Building Landscape dataset for Instance Segmentation

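| Pre-trained on | Model       | Backbone | AP (Before Fine-tuning) | AP (After Fine-tuning) | Inference Time (s/img) |
|----------------|-------------|----------|-------------------------|------------------------|------------------------|
| COCO           | Mask R-CNN  | R50-FPN  | 0.0                     | 39.07                  | 0.058                  |
| COCO           | Mask R-CNN  | R101-FPN | 0.0                     | 40.67                  | 0.053                  |
| COCO           | Mask2Former | R50      | 0.0                     | 34.57                  | 0.060                  |
| COCO           | Mask2Former | R101     | 0.0                     | 31.52                  | 0.120                  |
| COCO           | Mask2Former | Swin-T   | 0.0                     | 45.12                  | 0.160                  |
| COCO           | Mask2Former | Swin-L   | 0.0                     | 48.81                  | 0.230                  |
| Cityscapes     | Mask R-CNN  | R50-FPN  | 0.0                     | 38.48                  | 0.046                  |
| Cityscapes     | Mask2Former | R101     | 0.0                     | 26.75                  | 0.150                  |
| Cityscapes     | Mask2Former | Swin-T   | 0.0                     | 41.24                  | 0.150                  |
| Cityscapes     | Mask2Former | Swin-L   | 0.0                     | 42.90                  | 0.240                  |
| ADE20k         | Mask2Former | R50      | 0.0                     | 21.30                  | 0.075                  |
| ADE20k         | Mask2Former | Swin-T   | 0.0                     | 32.89                  | 0.374                  |
| ADE20k         | Mask2Former | Swin-L   | 0.0                     | 30.50                  | 0.312                  |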

  1. FPN: The Feature Pyramid Network (FPN) is a widely used architecture for multi-scale feature representation. It is often combined with a ResNet backbone to merge features from different levels of the network, which strengthens the multi-scale representation.
  2. Swin-T, Swin-L: The suffixes ‘-T’ and ‘-L’ denote model scale. Swin-T (Tiny) is a more compact version with fewer parameters and lower computational cost, while Swin-L (Large) is a larger model that offers better performance.
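
The FPN note above describes merging features from different backbone levels. The following is a minimal sketch of that top-down merge, not the implementation used for the models in the table. It assumes single-channel feature maps stored as nested lists, and it omits the lateral 1x1 and smoothing 3x3 convolutions of a real FPN (treating them as identity) so that it runs without dependencies. The names `upsample_nearest_2x`, `add_maps`, `fpn_top_down` and the toy inputs `c3`, `c4`, `c5` are hypothetical.

```python
# Illustrative FPN-style top-down merge on single-channel feature maps.
# Assumption: the lateral 1x1 and smoothing 3x3 convolutions of a real FPN
# are omitted (treated as identity) to keep the sketch dependency-free.

def upsample_nearest_2x(fmap):
    """Double height and width by repeating each value (nearest neighbour)."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def add_maps(a, b):
    """Element-wise sum of two equally sized 2D maps."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def fpn_top_down(backbone_maps):
    """Merge backbone maps ordered fine-to-coarse (each level half the size).

    Returns the merged maps in the same fine-to-coarse order.
    """
    merged = [backbone_maps[-1]]  # coarsest level passes through unchanged
    for lateral in reversed(backbone_maps[:-1]):
        top_down = upsample_nearest_2x(merged[0])
        merged.insert(0, add_maps(lateral, top_down))
    return merged


# Hypothetical toy inputs: a 4x4 fine map, a 2x2 mid map, a 1x1 coarse map.
c3 = [[1] * 4 for _ in range(4)]
c4 = [[10, 20], [30, 40]]
c5 = [[100]]
p3, p4, p5 = fpn_top_down([c3, c4, c5])
```

In this toy run each finer output level carries the coarse value from `c5` added to its own local values, which is the sense in which FPN combines high-level context with high-resolution detail.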