Table 2 The model architecture of the TCN + Transformer. The learning rate is \(5 \times 10^{-4}\); the batch size is 32; the loss is FocalLoss (\(\alpha = 0.8\), \(\gamma = 2.0\), reduction='mean').
| Layer | Type | Key Parameters |
|---|---|---|
| Input | window data | input_dim = (w \(\times\) 104) |
| TCN Block 1 | Conv1D + ReLU + Dropout + LayerNorm | kernel=7, dilation=1, dropout=0.2 |
| TCN Block 2 | Conv1D + ReLU + Dropout + LayerNorm | kernel=7, dilation=2, dropout=0.2 |
| TCN Block 3 | Conv1D + ReLU + Dropout + LayerNorm | kernel=7, dilation=4, dropout=0.2 |
| TCN Block 4 | Conv1D + ReLU + Dropout + LayerNorm | kernel=7, dilation=8, dropout=0.2 |
| Transformer | PositionalEncoding | maxlen=w, dim=104 |
| Transformer Encoder \(\times\) 2 | Multi-Head Attention | heads=8, dim per head=13 |
| Add + LayerNorm | residual connection | dropout=0.2 |
| Feed-Forward | ReLU + Linear | ff_dim=104 |
| Add + LayerNorm | residual connection | dropout=0.2 |
| Classifier | Linear | output dim = (w \(\times\) 1) |
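
The following is a minimal PyTorch sketch assembled from Table 2 and its caption, intended only to make the layer stack concrete. The class names (`TCNBlock`, `PositionalEncoding`, `FocalLoss`, `HybridTCNTransformer`), the symmetric "same" padding in the convolutions, the use of `nn.TransformerEncoder`, and the window length used in the usage example are assumptions for illustration, not the authors' released implementation.

```python
# Minimal sketch of the Table 2 stack. Class names, the symmetric "same" padding
# in the TCN, and the use of nn.TransformerEncoder are illustrative assumptions.
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

FEATURE_DIM = 104  # per-step feature dimension from Table 2


class TCNBlock(nn.Module):
    """Conv1D + ReLU + Dropout + LayerNorm with kernel=7 and a given dilation."""
    def __init__(self, dim=FEATURE_DIM, kernel=7, dilation=1, dropout=0.2):
        super().__init__()
        padding = (kernel - 1) // 2 * dilation  # keeps the sequence length w unchanged
        self.conv = nn.Conv1d(dim, dim, kernel, padding=padding, dilation=dilation)
        self.dropout = nn.Dropout(dropout)
        self.norm = nn.LayerNorm(dim)

    def forward(self, x):                       # x: (batch, w, dim)
        y = self.conv(x.transpose(1, 2)).transpose(1, 2)  # Conv1d expects (batch, dim, w)
        return self.norm(self.dropout(F.relu(y)))


class PositionalEncoding(nn.Module):
    """Sinusoidal positional encoding with maxlen=w and dim=104."""
    def __init__(self, dim=FEATURE_DIM, max_len=512):
        super().__init__()
        pos = torch.arange(max_len).unsqueeze(1).float()
        div = torch.exp(torch.arange(0, dim, 2).float() * (-math.log(10000.0) / dim))
        pe = torch.zeros(max_len, dim)
        pe[:, 0::2] = torch.sin(pos * div)
        pe[:, 1::2] = torch.cos(pos * div)
        self.register_buffer("pe", pe)

    def forward(self, x):                       # x: (batch, w, dim)
        return x + self.pe[: x.size(1)]


class FocalLoss(nn.Module):
    """Binary focal loss with alpha=0.8, gamma=2.0, reduction='mean' (caption)."""
    def __init__(self, alpha=0.8, gamma=2.0, reduction="mean"):
        super().__init__()
        self.alpha, self.gamma, self.reduction = alpha, gamma, reduction

    def forward(self, logits, targets):
        bce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
        p_t = torch.exp(-bce)                   # probability assigned to the true class
        alpha_t = self.alpha * targets + (1 - self.alpha) * (1 - targets)
        loss = alpha_t * (1 - p_t) ** self.gamma * bce
        return loss.mean() if self.reduction == "mean" else loss.sum()


class HybridTCNTransformer(nn.Module):
    """Four dilated TCN blocks, two Transformer encoder layers, per-step classifier."""
    def __init__(self, dim=FEATURE_DIM, heads=8, layers=2, ff_dim=104,
                 dropout=0.2, max_len=512):
        super().__init__()
        self.tcn = nn.Sequential(*[TCNBlock(dim, 7, d, dropout) for d in (1, 2, 4, 8)])
        self.pos = PositionalEncoding(dim, max_len)
        enc_layer = nn.TransformerEncoderLayer(
            d_model=dim, nhead=heads, dim_feedforward=ff_dim,  # 104 / 8 = 13 dims per head
            dropout=dropout, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers=layers)
        self.classifier = nn.Linear(dim, 1)     # output dim = (w x 1)

    def forward(self, x):                       # x: (batch, w, 104)
        x = self.tcn(x)
        x = self.encoder(self.pos(x))
        return self.classifier(x).squeeze(-1)   # per-step logits, shape (batch, w)


if __name__ == "__main__":
    model = HybridTCNTransformer()
    criterion = FocalLoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=5e-4)  # learning rate from caption
    x = torch.randn(32, 30, FEATURE_DIM)        # batch size 32; w = 30 is illustrative
    y = torch.randint(0, 2, (32, 30)).float()
    loss = criterion(model(x), y)
    loss.backward()
    optimizer.step()
```

Note that `nn.TransformerEncoderLayer` already bundles the Multi-Head Attention, Add + LayerNorm, and Feed-Forward sublayers listed in the table, so those rows do not appear as separate modules in the sketch.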