Table 3 Vision Transformer (ViT) architecture.

From: RGB-D based multi-modal deep learning for spacecraft and debris recognition

Layer number    Layer type
1               vit_model with 12 layers
2               Fully connected layers with 512 nodes
3               ReLU activation
4               Fully connected layers with 512 nodes
5               ReLU activation
6               Fully connected layers with 11 nodes
7               Softmax activation
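
The classification head in rows 2 to 7 of the table can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation. The backbone output width (`BACKBONE_DIM = 768`, the usual ViT-Base size) is an assumption, since the table gives only the depth of `vit_model`. The helper names (`init_linear`, `head_forward`, `count_parameters`) and the random initialisation are likewise hypothetical, and a random vector stands in for the ViT features.

```python
import math
import random

# Assumption: the 12-layer ViT backbone emits one 768-dim feature vector per
# image (the ViT-Base width); the table does not state this size.
BACKBONE_DIM = 768
HIDDEN_DIM = 512   # rows 2 and 4 of the table
NUM_CLASSES = 11   # row 6 of the table


def init_linear(n_in, n_out, rng):
    """Return (weights, biases) for one fully connected layer (illustrative init)."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def linear(x, params):
    """Apply y = W x + b to a single feature vector."""
    weights, biases = params
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]


def relu(x):
    return [max(0.0, v) for v in x]


def softmax(x):
    m = max(x)  # subtract the maximum for numerical stability
    exps = [math.exp(v - m) for v in x]
    total = sum(exps)
    return [e / total for e in exps]


def build_head(rng):
    """Three fully connected layers: 768 -> 512 -> 512 -> 11."""
    return [
        init_linear(BACKBONE_DIM, HIDDEN_DIM, rng),
        init_linear(HIDDEN_DIM, HIDDEN_DIM, rng),
        init_linear(HIDDEN_DIM, NUM_CLASSES, rng),
    ]


def head_forward(features, head):
    fc1, fc2, fc3 = head
    h = relu(linear(features, fc1))   # table rows 2-3
    h = relu(linear(h, fc2))          # table rows 4-5
    return softmax(linear(h, fc3))    # table rows 6-7


def count_parameters(head):
    """Total number of weights and biases in the head."""
    return sum(len(w) * len(w[0]) + len(b) for w, b in head)


rng = random.Random(0)
head = build_head(rng)
# Stand-in for the backbone output; a real pipeline would use ViT features here.
features = [rng.gauss(0.0, 1.0) for _ in range(BACKBONE_DIM)]
probs = head_forward(features, head)
```

Here `probs` is an 11-entry probability vector, one entry per class. Under the assumed 768-dimensional input, the head alone holds 662,027 trainable parameters (393,728 + 262,656 + 5,643), which is small next to a 12-layer ViT backbone.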