Table 9. Comparison of model complexity, inference speed, and segmentation accuracy on the ISPRS Vaihingen dataset with \(256\times 256\) input resolution. Params and Size denote the number of learnable parameters (in millions) and the storage footprint (in MB), respectively; FPS denotes inference throughput in frames (images) per second; OA, F1, and mIoU denote overall accuracy, F1 score, and mean intersection over union, all in percent. Best results are shown in bold, and second-best results are underlined.
From: Projection Kernel regularization for diffusion-based multimodal remote sensing segmentation
| Type | Model | Params (M) | Size (MB) | FPS | OA (%) | F1 (%) | mIoU (%) |
|---|---|---|---|---|---|---|---|
| CNN-based | MANet | 35.86 | 137.05 | 55.55 | 90.05 | 88.55 | 79.91 |
| | ABCNet | 13.67 | 52.19 | 163.27 | 90.43 | 87.90 | 78.96 |
| | PSPNet | 49.07 | 187.42 | 35.38 | 89.31 | 86.23 | 76.39 |
| Transformer-based | FTransUNet | 203.40 | 775.93 | 10.96 | 88.45 | 85.68 | 75.55 |
| | ASMFNet | 83.48 | 321.60 | 32.43 | 88.14 | 78.51 | 67.82 |
| | CMFNet | 104.07 | 397.13 | 10.92 | 90.14 | 88.45 | 79.76 |
| | UNetFormer | 11.72 | 44.87 | 171.36 | 90.33 | 89.03 | 80.68 |
| Diffusion-based | SegDiff | 157.69 | 601.55 | 1.77 | 74.77 | 74.75 | 52.52 |
| | RNDiff | 6.29 | 24.09 | 7.17 | 90.89 | 88.70 | 80.19 |
| | PKDiff | 6.43 | 24.54 | 4.08 | 91.19 | 89.51 | 81.46 |
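The Params and Size columns are closely linked. The following is a minimal sketch that reproduces the Size column from the Params column. It assumes, without support from the source, that weights are stored as 32-bit floats (4 bytes each) and that "MB" means mebibytes (2^20 bytes). The names `estimated_size_mib` and `ROWS` are illustrative, not from the paper.

```python
# Sketch: relate "Params (M)" to "Size (MB)" in Table 9.
# Assumptions (not stated in the source): weights are stored as 32-bit floats
# (4 bytes each) and "MB" means mebibytes (2**20 bytes).
# estimated_size_mib and ROWS are hypothetical names used only for illustration.

BYTES_PER_FP32 = 4
MIB = 2 ** 20


def estimated_size_mib(params_millions: float, bytes_per_param: int = BYTES_PER_FP32) -> float:
    """Estimate storage size in MiB from a parameter count given in millions."""
    return params_millions * 1e6 * bytes_per_param / MIB


# (model, Params (M), reported Size (MB)) copied from Table 9.
ROWS = [
    ("MANet", 35.86, 137.05),
    ("FTransUNet", 203.40, 775.93),
    ("RNDiff", 6.29, 24.09),
    ("PKDiff", 6.43, 24.54),
]

if __name__ == "__main__":
    for name, params_m, reported in ROWS:
        est = estimated_size_mib(params_m)
        print(f"{name:<11} estimated {est:7.2f} MiB, reported {reported:7.2f} MB")
```

Under these assumptions, the estimates for the listed rows fall within about 0.3 MB of the reported sizes. For example, 6.43 M parameters give roughly 24.53 MiB, compared with the reported 24.54 MB. The small residual may come from non-learnable buffers or serialization overhead, which this sketch does not model.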
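FPS is the reciprocal of per-image latency. The sketch below times a generic single-image inference callable with Python's `time.perf_counter`. The helpers `measure_fps` and `latency_ms` and the default warm-up count are hypothetical choices for illustration. The excerpt does not state the benchmarking protocol (hardware, batch size, or number of diffusion sampling steps), so this is not the authors' measurement procedure.

```python
import time
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def measure_fps(infer: Callable[[T], object], inputs: Sequence[T], warmup: int = 5) -> float:
    """Return images processed per second for a single-image inference callable.

    Hypothetical helper, not the paper's benchmarking code. The first `warmup`
    calls are run but excluded from timing, so one-off initialization costs
    do not distort the measurement.
    """
    if len(inputs) <= warmup:
        raise ValueError("need more inputs than warm-up iterations")
    for x in inputs[:warmup]:
        infer(x)
    timed = inputs[warmup:]
    start = time.perf_counter()
    for x in timed:
        infer(x)
    elapsed = time.perf_counter() - start
    return len(timed) / elapsed


def latency_ms(fps: float) -> float:
    """Convert throughput (images per second) into per-image latency in milliseconds."""
    return 1000.0 / fps
```

Converted this way, PKDiff's 4.08 FPS corresponds to about 245 ms per \(256\times 256\) image, RNDiff's 7.17 FPS to about 139 ms, and UNetFormer's 171.36 FPS to under 6 ms. When timing a GPU model (for example in PyTorch), the callable should wait for the device to finish, e.g. via `torch.cuda.synchronize()`. Otherwise asynchronous execution will inflate the measured FPS.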
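OA, F1, and mIoU are usually derived from a pixel-level confusion matrix. The following minimal sketch implements the standard definitions; `segmentation_metrics` is a hypothetical helper. ISPRS Vaihingen has six classes: impervious surfaces, building, low vegetation, tree, car, and clutter. Many works average F1 and IoU over the five classes excluding clutter, but this excerpt does not say which convention Table 9 uses.

```python
from typing import Dict, Sequence


def segmentation_metrics(conf: Sequence[Sequence[int]]) -> Dict[str, float]:
    """Compute OA, mean F1, and mIoU (in percent) from a K x K confusion matrix.

    Hypothetical helper using the standard definitions, not the paper's code.
    conf[i][j] counts pixels whose ground-truth class is i and predicted class is j.
    Classes with no ground-truth and no predicted pixels are skipped in the means.
    Assumes the matrix is non-empty (at least one pixel counted).
    """
    k = len(conf)
    total = sum(sum(row) for row in conf)
    correct = sum(conf[i][i] for i in range(k))
    f1s, ious = [], []
    for c in range(k):
        tp = conf[c][c]
        fn = sum(conf[c]) - tp                        # class-c pixels predicted as something else
        fp = sum(conf[r][c] for r in range(k)) - tp   # other pixels predicted as class c
        if tp + fp + fn == 0:
            continue
        f1s.append(2 * tp / (2 * tp + fp + fn))       # harmonic mean of precision and recall
        ious.append(tp / (tp + fp + fn))
    return {
        "OA": 100.0 * correct / total,
        "F1": 100.0 * sum(f1s) / len(f1s),
        "mIoU": 100.0 * sum(ious) / len(ious),
    }
```

For example, `segmentation_metrics([[50, 10], [5, 35]])` gives OA = 85.0, F1 ≈ 84.65, and mIoU ≈ 73.46. To follow the five-class convention, drop the clutter row and column from the class loop, not from the OA total.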