Figure 9
From: RGB-D based multi-modal deep learning for spacecraft and debris recognition

Attention maps for a few samples on which the vision transformer succeeded (first five rows) or failed (last row) to focus attention on space objects.