Fig. 6
From: Efficient attention vision transformers for monocular depth estimation on resource-limited hardware

Pareto Frontiers built considering the \(Abs_{rel}\) and the mean of the three inference times of the models. The networks in these plots refer to the NYU dataset and are grouped by dataset and size: tiny (a), base (b), and large (c) models. The models in bold represent the optimal trade-offs, lying on the Pareto Frontier.