Fig. 5
From: Efficient attention vision transformers for monocular depth estimation on resource-limited hardware

Pareto frontiers built from the RMSE and the mean of the three inference times of each model. The plots refer to the KITTI dataset, with the networks grouped by size: tiny (a), base (b), and large (c) models. The models in bold lie on the Pareto frontier and represent the optimal trade-offs.
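
As an illustration of how such a frontier can be extracted, the following is a minimal sketch, not the authors' code. It assumes that both RMSE and mean inference time are to be minimized, and that a model is on the frontier when no other model is at least as good on both criteria and strictly better on one. The names `Model`, `make_model`, and `pareto_frontier`, as well as all numeric values, are hypothetical placeholders and not results from the paper.

```python
from typing import List, NamedTuple, Sequence


class Model(NamedTuple):
    name: str
    rmse: float        # lower is better
    mean_time: float   # mean of the measured inference times, lower is better


def make_model(name: str, rmse: float, times: Sequence[float]) -> Model:
    """Build a Model from its RMSE and its individual inference times."""
    return Model(name, rmse, sum(times) / len(times))


def pareto_frontier(models: Sequence[Model]) -> List[Model]:
    """Return the non-dominated models, ordered by increasing mean time.

    After sorting by (mean_time, rmse), a model is kept only if its RMSE
    is strictly lower than that of every model already kept. Exact
    duplicates of a kept model are dropped.
    """
    ordered = sorted(models, key=lambda m: (m.mean_time, m.rmse))
    frontier: List[Model] = []
    best_rmse = float("inf")
    for m in ordered:
        if m.rmse < best_rmse:
            frontier.append(m)
            best_rmse = m.rmse
    return frontier


# Made-up values for illustration only (not taken from the paper).
example = [
    make_model("model_a", 3.0, (9.0, 10.0, 11.0)),
    make_model("model_b", 2.5, (18.0, 20.0, 22.0)),
    make_model("model_c", 3.2, (14.0, 15.0, 16.0)),
]
frontier_names = [m.name for m in pareto_frontier(example)]
# frontier_names == ["model_a", "model_b"]; model_c is dominated by model_a
```

In this sketch `model_c` is slower and less accurate than `model_a`, so it falls off the frontier, which corresponds to a model that would not be shown in bold in the plots.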