Table 1 Details of the TSCA-Net Architecture. In the table, C represents the number of MEA channels, T represents the number of time points, and N represents the number of output classes.
From: Temporal-spatial cross attention network for recognizing imagined characters
| Modules | Layers | Output size | Details |
|---|---|---|---|
| TF | Input Size: \(\textit{T}\times \textit{C}\) | | |
| | BN | \(\textit{T}\times 256\) | Batch Normalization |
| | LSTM | \(\textit{T}\times 256\) | [\(\text{input dim}=256,\ \text{hidden dim}=256\)] \(\times 1\) |
| | BN | \(\textit{T}\times 256\) | Batch Normalization |
| | MSA | \(\textit{T}\times 2048\) | [\(heads=16,\ d_q=d_k=d_v=128\)] |
| | MLP | \(\textit{T}\times 256\) | 2 FC layers [2048, 256] |
| SF | Input Size: \(\textit{C}\times \textit{T}\) | | |
| | BN | \(\textit{C}\times 256\) | Batch Normalization |
| | MSA | \(\textit{C}\times 2048\) | [\(heads=16,\ d_q=d_k=d_v=128\)] |
| | MLP | \(\textit{C}\times 256\) | 2 FC layers [2048, 256] |
| TSCross-SingleT | Query Size: \(\textit{T}\times 256\) | | Time Vectors as Query |
| | Key, Value Size: \(\textit{C}\times \textit{T}\) | | Channel Vectors as Key |
| | MSA | \(\textit{C}\times 2048\) | [\(heads=16,\ d_q=d_k=d_v=128\)] |
| | MLP | \(\textit{C}\times 256\) | 2 FC layers [2048, 256] |
| TSCross-SingleC | Query Size: \(\textit{C}\times 256\) | | Time Vectors as Query |
| | Key, Value Size: \(\textit{T}\times \textit{C}\) | | Channel Vectors as Key |
| | MSA | \(\textit{T}\times 2048\) | [\(heads=16,\ d_q=d_k=d_v=128\)] |
| | MLP | \(\textit{T}\times 256\) | 2 FC layers [2048, 256] |
| Classifier | Input1 Size: \(\textit{C}\times 256\) | | The TSCross-SingleT output |
| | Input2 Size: \(\textit{T}\times 256\) | | The TSCross-SingleC output |
| | Concatenate | | Concatenate the input vectors |
| | AvgPooling | \(1\times 256\) | Global average pooling |
| | MLP | \(1\times \textit{N}\) | FC-1 layer [256, 256], FC-2 layer [256, N], softmax |
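
The shape bookkeeping in the table can be checked with a short sketch. It is not the authors' implementation: it only traces tensor shapes in pure Python. The function names `msa_mlp` and `classifier_shapes` and the example values of T, C and N are hypothetical. The table does not state the concatenation axis, so the sketch assumes the two classifier inputs are stacked along the token axis, which is consistent with the listed \(1\times 256\) pooled output.

```python
# Shape bookkeeping for the table above. Pure Python; no tensors are allocated.
# All names here are illustrative and do not come from the paper.

HEADS, D_HEAD, D_MODEL = 16, 128, 256  # from the table: heads=16, d_q=d_k=d_v=128


def msa_mlp(tokens):
    """Return the (MSA, MLP) output shapes for a sequence of `tokens` vectors."""
    msa_width = HEADS * D_HEAD  # per-head outputs concatenated: 16 * 128 = 2048
    return (tokens, msa_width), (tokens, D_MODEL)  # MLP: 2 FC layers [2048, 256]


def classifier_shapes(t, c, n):
    """Trace the classifier rows, assuming concatenation along the token axis."""
    in1, in2 = (c, D_MODEL), (t, D_MODEL)  # sizes listed as Input1 and Input2
    concat = (in1[0] + in2[0], D_MODEL)    # assumption: stack rows, keep width 256
    pooled = (1, concat[1])                # global average over the token axis
    logits = (1, n)                        # FC-1 [256, 256], then FC-2 [256, N]
    return concat, pooled, logits


T, C, N = 100, 96, 26  # hypothetical sizes, chosen only for illustration
print(msa_mlp(T))
print(classifier_shapes(T, C, N))
```

The 2048-wide MSA rows follow directly from 16 heads of width 128, and the MLP rows bring each token back to 256 features, which is why every branch ends at width 256 before the classifier.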