Fig. 4: Interpretability and dataset size-dependent performance.
From: Deep learning-based robust positioning for all-weather autonomous driving

a, A game-theoretic visualization of GRAMME that interprets the depth predictions using SHAP values for sample frames (ref. 16). Pixels annotated with red points increase the depth-prediction accuracy, whereas pixels annotated with blue points lower it. Challenging conditions such as glare, poor illumination and adverse weather lead to concentrated blue regions around the occluded pixels. However, training with lidar and radar data helps the model focus on pixels that are more semantically invariant across diverse test conditions, as visualized by the red points around static objects and road edges. The distribution of the SHAP values illustrates the independent and uncorrelated failure modes of the proposed multimodal system. b, Dataset-size-dependent performance of GRAMME in terms of mean depth-prediction error (with standard deviation) with respect to the ground-truth depth. Although the lidar–camera (stereo) and radar–camera (stereo) fusions improve the overall performance, access to minimal data (for example, only 25% of the training set) results in worse performance than the camera-only model, owing to the increased model complexity required by the multimodal architecture. On the other hand, despite this complexity, the lidar- and radar-based models achieve good performance (compared with the baseline approaches) in all test conditions once at least 50% of the dataset is available.
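
The following is a minimal sketch of how per-pixel SHAP attributions like those in panel a could be produced with the shap library's GradientExplainer, assuming a PyTorch depth network `depth_net`. The scalar being attributed here (negative mean absolute error against a fixed ground-truth depth map, so that positive SHAP values correspond to pixels that raise accuracy) and all wrapper/function names are illustrative assumptions, not the authors' released code.

```python
import torch
import torch.nn as nn
import shap  # pip install shap

class DepthAccuracyHead(nn.Module):
    """Wraps a depth network so SHAP explains a scalar accuracy score
    (negative mean absolute error against a fixed ground-truth depth map)."""
    def __init__(self, depth_net, gt_depth):
        super().__init__()
        self.depth_net = depth_net
        self.register_buffer("gt_depth", gt_depth)   # (1, 1, H, W)

    def forward(self, images):                       # images: (N, 3, H, W)
        pred = self.depth_net(images)                # (N, 1, H, W) predicted depth
        err = (pred - self.gt_depth).abs().mean(dim=(1, 2, 3))
        return (-err).unsqueeze(1)                   # higher = more accurate

def shap_attributions(depth_net, gt_depth, background, frames, nsamples=200):
    """background, frames: image batches (torch tensors) on the same device."""
    model = DepthAccuracyHead(depth_net, gt_depth).eval()
    explainer = shap.GradientExplainer(model, background)
    # Per-pixel SHAP values: positive (red) pixels raise the accuracy score,
    # negative (blue) pixels lower it, as visualized in panel a.
    return explainer.shap_values(frames, nsamples=nsamples)
```

Panel b reports the mean depth-prediction error and its standard deviation with respect to the ground truth for each training-set fraction. One possible way to compute such statistics is sketched below; the choice of absolute-relative error over pixels with a valid lidar return is an assumption for illustration.

```python
import numpy as np

def depth_error_stats(pred_depths, gt_depths, min_depth=1e-3):
    """Mean and standard deviation of per-frame absolute-relative depth error.

    pred_depths, gt_depths: lists of (H, W) arrays; pixels without a valid
    ground-truth return (e.g. no lidar hit) are expected to be <= 0 in gt.
    """
    per_frame = []
    for pred, gt in zip(pred_depths, gt_depths):
        valid = gt > min_depth                        # keep only pixels with ground truth
        abs_rel = np.abs(pred[valid] - gt[valid]) / gt[valid]
        per_frame.append(abs_rel.mean())
    per_frame = np.asarray(per_frame)
    return per_frame.mean(), per_frame.std()
```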