Table 1 Comparison of performances on the NYU dataset.
From: A simple monocular depth estimation network for balancing complexity and accuracy
Method | Venue | Backbone | \(\delta_1\uparrow\) | \(\delta_2\uparrow\) | \(\delta_3\uparrow\) | AbsRel\(\downarrow\) | RMSE\(\downarrow\) | log10\(\downarrow\) | Params\(\downarrow\) |
---|---|---|---|---|---|---|---|---|---|
DORN [17] | CVPR 2018 | ResNet-101 | 0.828 | 0.965 | 0.992 | 0.115 | 0.509 | 0.051 | – |
BTS [58] | arXiv 2019 | ResNeXt-101 | 0.885 | 0.978 | 0.994 | 0.110 | 0.392 | 0.047 | 47.0M |
PWA [59] | AAAI 2021 | DenseNet-161 | 0.892 | 0.985 | 0.997 | 0.105 | 0.374 | 0.045 | – |
AdaBins [12] | CVPR 2021 | EfficientNet-B5 | 0.903 | 0.984 | 0.997 | 0.103 | 0.364 | 0.044 | 78.0M |
P3Depth [60] | CVPR 2022 | ResNet-101 | 0.898 | 0.981 | 0.996 | 0.104 | 0.364 | 0.043 | 94.2M |
NeWCRFs [19] | CVPR 2022 | Swin-L | 0.922 | 0.992 | 0.998 | 0.095 | 0.334 | 0.041 | 270.5M |
LifelongDepth [20] | TNNLS 2023 | ResNet-34 | 0.857 | 0.972 | 0.993 | 0.121 | 0.429 | 0.052 | 22.23M |
DepthFormer [61] | MIR 2023 | Swin-L+R-50-C1 | 0.923 | 0.989 | 0.997 | 0.094 | 0.329 | 0.040 | 273.0M |
IEBins [18] | NeurIPS 2023 | Swin-T | 0.893 | 0.984 | 0.996 | 0.108 | 0.375 | 0.046 | 90.7M |
TrapAttention [62] | CVPR 2023 | XCiT-M24 | 0.925 | 0.988 | 0.997 | 0.092 | 0.332 | 0.040 | 94.2M |
MDEUncertainty [63] | TCSVT 2024 | Swin-L | 0.879 | 0.977 | 0.994 | 0.112 | 0.420 | 0.048 | – |
ASNDepth [22] | TPAMI 2024 | HRNet-18 | 0.906 | 0.985 | 0.997 | 0.101 | 0.377 | 0.044 | – |
Metric3Dv2 [29] | ECCV 2024 | ConvNeXt-L | 0.925 | 0.983 | 0.994 | 0.092 | 0.341 | 0.040 | 203.24M |
SimMDE (Ours) | Ours | MSCAN-B | 0.925 | 0.990 | 0.997 | 0.091 | 0.331 | 0.039 | 30.9M |
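
The table does not define its metric columns. As a minimal sketch, assuming they follow the conventional monocular depth estimation definitions (threshold accuracy \(\delta_k\) with threshold \(1.25^k\), mean absolute relative error, root mean squared error, and mean absolute \(\log_{10}\) error), they could be computed as below. The function name `depth_metrics` and the flat-list input format are hypothetical choices for illustration, not taken from the paper, and any masking or depth capping used in the actual evaluation is not reproduced here.

```python
import math


def depth_metrics(pred, gt):
    """Compute conventional depth metrics over paired, strictly positive depths.

    Assumption: `pred` and `gt` are equal-length sequences of valid depth
    values (invalid pixels already filtered out by the caller).
    """
    if len(pred) != len(gt) or len(gt) == 0:
        raise ValueError("pred and gt must be non-empty and of equal length")
    if any(p <= 0 or g <= 0 for p, g in zip(pred, gt)):
        raise ValueError("depth values must be strictly positive")

    n = len(gt)
    pairs = list(zip(pred, gt))

    # Threshold accuracy: share of pixels with max(pred/gt, gt/pred) < 1.25^k.
    ratios = [max(p / g, g / p) for p, g in pairs]
    metrics = {
        f"delta_{k}": sum(r < 1.25 ** k for r in ratios) / n for k in (1, 2, 3)
    }

    # AbsRel: mean of |pred - gt| / gt.
    metrics["abs_rel"] = sum(abs(p - g) / g for p, g in pairs) / n

    # RMSE: square root of the mean squared error.
    metrics["rmse"] = math.sqrt(sum((p - g) ** 2 for p, g in pairs) / n)

    # log10: mean of |log10(pred) - log10(gt)|.
    metrics["log10"] = sum(
        abs(math.log10(p) - math.log10(g)) for p, g in pairs
    ) / n

    return metrics


if __name__ == "__main__":
    # Toy values for illustration only; these are not data from the table.
    for name, value in depth_metrics([1.0, 2.2, 3.0], [1.0, 2.0, 2.0]).items():
        print(f"{name}: {value:.3f}")
```

Under these definitions, higher \(\delta_k\) is better and lower AbsRel, RMSE, and log10 are better, which matches the arrows in the table header.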