Table 2 The model architecture of TCN + Transformer. The learning rate is \(5 \times 10^{-4}\); the batch size is 32; the loss is FocalLoss (\(\alpha = 0.8\), \(\gamma = 2.0\), reduction='mean').

From: Daily insider threat detection with hybrid TCN transformer architecture

| Layer | Type | Key Parameters |
|---|---|---|
| Input | window data | input_dim = (w \(\times\) 104) |
| TCN Block 1 | Conv1D + ReLU + Dropout + LayerNorm | kernel=7, dilation=1, dropout=0.2 |
| TCN Block 2 | Conv1D + ReLU + Dropout + LayerNorm | kernel=7, dilation=2, dropout=0.2 |
| TCN Block 3 | Conv1D + ReLU + Dropout + LayerNorm | kernel=7, dilation=4, dropout=0.2 |
| TCN Block 4 | Conv1D + ReLU + Dropout + LayerNorm | kernel=7, dilation=8, dropout=0.2 |
| Transformer | PositionalEncoding | maxlen=w, dim=104 |
| Transformer Encoder \(\times\) 2 | Multi-Head Attention | heads=8, dim per head=13 |
| | Add + LayerNorm | residual connection, dropout=0.2 |
| | Feed-Forward | ff_dim=104, ReLU + Linear |
| | Add + LayerNorm | residual connection, dropout=0.2 |
| Classifier | Linear | output_dim = (w \(\times\) 1) |
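The layer stack in Table 2 maps directly onto a small PyTorch model. The following is a minimal sketch consistent with the listed shapes, not the authors' implementation: the class names `TCNBlock` and `TCNTransformer` are illustrative, and the 'same'-padded convolutions, sinusoidal positional encoding, and per-day logit head are assumptions inferred from the table (the TCN preserves the window length w; 8 heads \(\times\) 13 dims per head = 104; the classifier emits one output per day in the window).

```python
import math
import torch
import torch.nn as nn

D = 104  # per-day feature dimension from Table 2


class TCNBlock(nn.Module):
    """Conv1D + ReLU + Dropout + LayerNorm. Padding is chosen so the
    window length w is preserved (the padding scheme is an assumption)."""
    def __init__(self, dim=D, kernel=7, dilation=1, dropout=0.2):
        super().__init__()
        self.conv = nn.Conv1d(dim, dim, kernel_size=kernel, dilation=dilation,
                              padding=(kernel - 1) // 2 * dilation)
        self.act = nn.ReLU()
        self.drop = nn.Dropout(dropout)
        self.norm = nn.LayerNorm(dim)

    def forward(self, x):                      # x: (batch, w, D)
        h = self.conv(x.transpose(1, 2)).transpose(1, 2)
        return self.norm(self.drop(self.act(h)))


class TCNTransformer(nn.Module):
    def __init__(self, w, dim=D):
        super().__init__()
        # Four TCN blocks with dilations 1, 2, 4, 8 as in Table 2.
        self.tcn = nn.Sequential(*[TCNBlock(dim, dilation=d) for d in (1, 2, 4, 8)])
        # Fixed sinusoidal positional encoding: maxlen=w, dim=104.
        pe = torch.zeros(w, dim)
        pos = torch.arange(w).unsqueeze(1).float()
        div = torch.exp(torch.arange(0, dim, 2).float() * (-math.log(10000.0) / dim))
        pe[:, 0::2] = torch.sin(pos * div)
        pe[:, 1::2] = torch.cos(pos * div)
        self.register_buffer('pe', pe)
        # Two encoder layers: 8 heads (8 x 13 = 104), ff_dim=104, dropout=0.2;
        # Add + LayerNorm residuals are built into TransformerEncoderLayer.
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=8,
                                           dim_feedforward=dim, dropout=0.2,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.classifier = nn.Linear(dim, 1)    # one logit per day in the window

    def forward(self, x):                      # x: (batch, w, 104)
        h = self.encoder(self.tcn(x) + self.pe)
        return self.classifier(h).squeeze(-1)  # (batch, w), per-day logits


# Example shapes under the caption's batch size of 32 and a window of w = 30:
# model = TCNTransformer(w=30); logits = model(torch.randn(32, 30, 104))  # (32, 30)
```

Under the caption's training configuration these logits would be optimized at a learning rate of \(5 \times 10^{-4}\) with batch size 32; the table does not name the optimizer, so that choice is left open here.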
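For completeness, below is a common formulation of the binary focal loss with the caption's hyperparameters (\(\alpha = 0.8\), \(\gamma = 2.0\), reduction='mean'). That the paper uses exactly this \(\alpha\)-weighted variant on logits is an assumption; only the hyperparameter values come from Table 2.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class FocalLoss(nn.Module):
    """Binary focal loss; defaults match Table 2 (alpha=0.8, gamma=2.0,
    reduction='mean'). Expects raw logits and 0/1 float targets."""
    def __init__(self, alpha=0.8, gamma=2.0, reduction='mean'):
        super().__init__()
        self.alpha, self.gamma, self.reduction = alpha, gamma, reduction

    def forward(self, logits, targets):
        # Unreduced BCE so the focal term can modulate each element.
        bce = F.binary_cross_entropy_with_logits(logits, targets, reduction='none')
        p_t = torch.exp(-bce)                  # probability of the true class
        # alpha weights the positive (insider) class, 1 - alpha the negative.
        alpha_t = self.alpha * targets + (1 - self.alpha) * (1 - targets)
        loss = alpha_t * (1 - p_t) ** self.gamma * bce
        if self.reduction == 'mean':
            return loss.mean()
        if self.reduction == 'sum':
            return loss.sum()
        return loss
```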