Table 9. Comparison of model complexity, inference speed, and segmentation accuracy on the ISPRS Vaihingen dataset at \(256\times 256\) input resolution. Params and Size denote the number of learnable parameters (in millions) and the storage footprint (in MB), respectively; FPS denotes inference throughput in frames per second. OA, F1, and mIoU denote overall accuracy, F1 score, and mean intersection over union. Best results are shown in bold, and second-best results are underlined.

From: Projection Kernel regularization for diffusion-based multimodal remote sensing segmentation

| Type | Model | Params (M) | Size (MB) | FPS | OA (%) | F1 (%) | mIoU (%) |
|---|---|---|---|---|---|---|---|
| CNN-based | MANet | 35.86 | 137.05 | 55.55 | 90.05 | 88.55 | 79.91 |
| | ABCNet | 13.67 | 52.19 | 163.27 | 90.43 | 87.90 | 78.96 |
| | PSPNet | 49.07 | 187.42 | 35.38 | 89.31 | 86.23 | 76.39 |
| Transformer-based | FTransUNet | 203.40 | 775.93 | 10.96 | 88.45 | 85.68 | 75.55 |
| | ASMFNet | 83.48 | 321.60 | 32.43 | 88.14 | 78.51 | 67.82 |
| | CMFNet | 104.07 | 397.13 | 10.92 | 90.14 | 88.45 | 79.76 |
| | UNetFormer | 11.72 | 44.87 | 171.36 | 90.33 | 89.03 | 80.68 |
| Diffusion-based | SegDiff | 157.69 | 601.55 | 1.77 | 74.77 | 74.75 | 52.52 |
| | RNDiff | 6.29 | 24.09 | 7.17 | 90.89 | 88.70 | 80.19 |
| | PKDiff | 6.43 | 24.54 | 4.08 | 91.19 | 89.51 | 81.46 |
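The Params and Size columns of Table 9 appear mutually consistent with uncompressed 32-bit floating-point weights. The sketch below is a rough consistency check, assuming 4 bytes per parameter and reading "MB" as MiB (2^20 bytes); neither assumption is stated in the source, and the helper `estimated_size_mib` is hypothetical.

```python
# Rough consistency check between the Params and Size columns of Table 9.
# Assumptions (not stated in the source): weights are stored as 4-byte float32
# values, and "MB" means MiB (2**20 bytes).

BYTES_PER_PARAM = 4  # float32


def estimated_size_mib(params_millions: float, bytes_per_param: int = BYTES_PER_PARAM) -> float:
    """Hypothetical helper: approximate storage size in MiB for a parameter count given in millions."""
    return params_millions * 1e6 * bytes_per_param / 2**20


# (Params in M, reported Size in MB), copied from Table 9.
REPORTED = {
    "MANet": (35.86, 137.05),
    "ABCNet": (13.67, 52.19),
    "PSPNet": (49.07, 187.42),
    "FTransUNet": (203.40, 775.93),
    "ASMFNet": (83.48, 321.60),
    "CMFNet": (104.07, 397.13),
    "UNetFormer": (11.72, 44.87),
    "SegDiff": (157.69, 601.55),
    "RNDiff": (6.29, 24.09),
    "PKDiff": (6.43, 24.54),
}

for name, (params_m, size_mb) in REPORTED.items():
    est = estimated_size_mib(params_m)
    rel_gap = (size_mb - est) / size_mb  # positive: reported file is larger than the float32 estimate
    print(f"{name:<11} estimated {est:7.2f}  reported {size_mb:7.2f}  gap {rel_gap:+.2%}")
```

Under these assumptions every estimate falls within 1% of the reported size, with ASMFNet showing the largest gap (about 3 MB). The residual may come from non-learnable buffers or serialization overhead, which is likewise an assumption rather than something the table states.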
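The accuracy columns (OA, F1, mIoU) follow the standard confusion-matrix definitions. The sketch below computes them from a pixel-level confusion matrix, assuming the table's F1 is the unweighted average over classes; which Vaihingen classes enter the average (for instance, whether clutter is excluded) is not stated in this table. The function `segmentation_metrics` and the toy matrix are hypothetical illustrations, not the paper's evaluation code.

```python
from typing import Dict, Sequence


def segmentation_metrics(cm: Sequence[Sequence[int]]) -> Dict[str, float]:
    """Overall accuracy, class-averaged F1 and mIoU from a square confusion matrix.

    cm[i][j] is the number of pixels whose ground-truth class is i and whose
    predicted class is j.
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    correct = sum(cm[c][c] for c in range(n))  # diagonal = correctly labelled pixels

    f1_scores, ious = [], []
    for c in range(n):
        tp = cm[c][c]
        fp = sum(cm[i][c] for i in range(n)) - tp  # predicted as c, but belongs to another class
        fn = sum(cm[c]) - tp                        # belongs to c, but predicted as another class
        if tp + fp + fn == 0:
            continue  # class absent from both ground truth and prediction: skipped (one common convention)
        f1_scores.append(2 * tp / (2 * tp + fp + fn))
        ious.append(tp / (tp + fp + fn))

    return {
        "OA": correct / total,
        "mF1": sum(f1_scores) / len(f1_scores),
        "mIoU": sum(ious) / len(ious),
    }


# Toy 3-class example (made-up counts, not data from the paper).
toy_cm = [
    [50, 2, 3],
    [4, 40, 1],
    [1, 2, 47],
]
metrics = segmentation_metrics(toy_cm)
print({k: round(100 * v, 2) for k, v in metrics.items()})  # reported as percentages, as in Table 9
```

Per class, F1 and IoU are linked by F1 = 2·IoU / (1 + IoU), so the averaged F1 and mIoU usually rank methods similarly; the ranking need not match exactly, because the relation holds per class and is not preserved by averaging.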