Table 3 Vision Transformer (ViT) architecture.
From: RGB-D based multi-modal deep learning for spacecraft and debris recognition
| Layer number | Layer type |
|---|---|
| 1 | vit_model with 12 layers |
| 2 | Fully connected layers with 512 nodes |
| 3 | ReLU activation |
| 4 | Fully connected layers with 512 nodes |
| 5 | ReLU activation |
| 6 | Fully connected layers with 11 nodes |
| 7 | Softmax activation |
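
The classification head in rows 2 to 7 can be sketched in a few lines of dependency-free Python. This is only an illustration of the layer ordering, not the authors' implementation. The `vit_model` backbone is treated as a black box that yields one feature vector per image, and its output size (`FEATURE_DIM = 768`) is an assumption, since the table does not state it. The helper names (`make_linear`, `build_head`, `classify`, `count_parameters`) and the random weight initialisation are likewise hypothetical.

```python
import math
import random

FEATURE_DIM = 768   # ASSUMPTION: backbone output size; not given in the table
HIDDEN_DIM = 512    # from the table (rows 2 and 4)
NUM_CLASSES = 11    # from the table (row 6)


def make_linear(in_dim, out_dim, rng):
    """Create (weights, biases) for one fully connected layer (random init, illustrative only)."""
    scale = 1.0 / math.sqrt(in_dim)
    weights = [[rng.uniform(-scale, scale) for _ in range(in_dim)] for _ in range(out_dim)]
    biases = [0.0] * out_dim
    return weights, biases


def linear(x, layer):
    """Apply y = W x + b for a single input vector x."""
    weights, biases = layer
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]


def relu(x):
    """Element-wise ReLU."""
    return [max(0.0, v) for v in x]


def softmax(x):
    """Numerically stable softmax over a list of logits."""
    m = max(x)
    exps = [math.exp(v - m) for v in x]
    total = sum(exps)
    return [e / total for e in exps]


def build_head(seed=0):
    """Rows 2, 4 and 6 of the table: three fully connected layers."""
    rng = random.Random(seed)
    return [
        make_linear(FEATURE_DIM, HIDDEN_DIM, rng),
        make_linear(HIDDEN_DIM, HIDDEN_DIM, rng),
        make_linear(HIDDEN_DIM, NUM_CLASSES, rng),
    ]


def classify(features, head):
    """Rows 2 to 7: FC(512) -> ReLU -> FC(512) -> ReLU -> FC(11) -> softmax."""
    fc1, fc2, fc3 = head
    h = relu(linear(features, fc1))
    h = relu(linear(h, fc2))
    return softmax(linear(h, fc3))


def count_parameters(head):
    """Total number of weights and biases in the head."""
    return sum(len(w) * len(w[0]) + len(b) for w, b in head)
```

Calling `classify(features, build_head())` on a list of `FEATURE_DIM` floats returns 11 class probabilities that sum to 1. Under the assumed 768-dimensional input, `count_parameters` reports 662,027 parameters for the head. A different backbone width would change only the first layer's share of that total.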