Table 2 The model architecture of TCN + Transformer. The learning rate is \(5 \times 10^{-4}\); the batch size is 32; the loss is FocalLoss (\(\alpha = 0.8\), \(\gamma = 2.0\), reduction='mean').

From: Daily insider threat detection with hybrid TCN transformer architecture

| Layer | Type | Key Parameters |
|---|---|---|
| Input | window data | input_dim = (w \(\times\) 104) |
| TCN Block 1 | Conv1D + ReLU + Dropout + LayerNorm | kernel=7, dilation=1, dropout=0.2 |
| TCN Block 2 | Conv1D + ReLU + Dropout + LayerNorm | kernel=7, dilation=2, dropout=0.2 |
| TCN Block 3 | Conv1D + ReLU + Dropout + LayerNorm | kernel=7, dilation=4, dropout=0.2 |
| TCN Block 4 | Conv1D + ReLU + Dropout + LayerNorm | kernel=7, dilation=8, dropout=0.2 |
| Transformer | PositionalEncoding | maxlen=w, dim=104 |
| Transformer Encoder \(\times\) 2 | Multi-Head Attention | heads=8, dim per head=13 |
| | Add + LayerNorm | residual connection, dropout=0.2 |
| | Feed-Forward | ff_dim=104, ReLU + Linear |
| | Add + LayerNorm | residual connection, dropout=0.2 |
| Classifier | Linear | output_dim = (w \(\times\) 1) |
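The layer stack in Table 2 maps directly onto a small PyTorch model. The following is a minimal sketch consistent with the listed shapes, not the authors' implementation: the class names `TCNBlock` and `TCNTransformer` are illustrative, and the 'same'-padded convolutions, sinusoidal positional encoding, and per-day logit head are assumptions inferred from the table (the TCN preserves the window length w; 8 heads \(\times\) 13 dims per head = 104; the classifier emits one output per day in the window).

```python
import math
import torch
import torch.nn as nn

D = 104  # per-day feature dimension from Table 2


class TCNBlock(nn.Module):
    """Conv1D + ReLU + Dropout + LayerNorm. Padding is chosen so the
    window length w is preserved (the padding scheme is an assumption)."""
    def __init__(self, dim=D, kernel=7, dilation=1, dropout=0.2):
        super().__init__()
        self.conv = nn.Conv1d(dim, dim, kernel_size=kernel, dilation=dilation,
                              padding=(kernel - 1) // 2 * dilation)
        self.act = nn.ReLU()
        self.drop = nn.Dropout(dropout)
        self.norm = nn.LayerNorm(dim)

    def forward(self, x):                      # x: (batch, w, D)
        h = self.conv(x.transpose(1, 2)).transpose(1, 2)
        return self.norm(self.drop(self.act(h)))


class TCNTransformer(nn.Module):
    def __init__(self, w, dim=D):
        super().__init__()
        # Four TCN blocks with dilations 1, 2, 4, 8 as in Table 2.
        self.tcn = nn.Sequential(*[TCNBlock(dim, dilation=d) for d in (1, 2, 4, 8)])
        # Fixed sinusoidal positional encoding: maxlen=w, dim=104.
        pe = torch.zeros(w, dim)
        pos = torch.arange(w).unsqueeze(1).float()
        div = torch.exp(torch.arange(0, dim, 2).float() * (-math.log(10000.0) / dim))
        pe[:, 0::2] = torch.sin(pos * div)
        pe[:, 1::2] = torch.cos(pos * div)
        self.register_buffer('pe', pe)
        # Two encoder layers: 8 heads (8 x 13 = 104), ff_dim=104, dropout=0.2;
        # Add + LayerNorm residuals are built into TransformerEncoderLayer.
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=8,
                                           dim_feedforward=dim, dropout=0.2,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.classifier = nn.Linear(dim, 1)    # one logit per day in the window

    def forward(self, x):                      # x: (batch, w, 104)
        h = self.encoder(self.tcn(x) + self.pe)
        return self.classifier(h).squeeze(-1)  # (batch, w), per-day logits


# Example shapes under the caption's batch size of 32 and a window of w = 30:
# model = TCNTransformer(w=30); logits = model(torch.randn(32, 30, 104))  # (32, 30)
```

Under the caption's training configuration these logits would be optimized at a learning rate of \(5 \times 10^{-4}\) with batch size 32; the table does not name the optimizer, so that choice is left open here.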
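For completeness, below is a common formulation of the binary focal loss with the caption's hyperparameters (\(\alpha = 0.8\), \(\gamma = 2.0\), reduction='mean'). That the paper uses exactly this \(\alpha\)-weighted variant on logits is an assumption; only the hyperparameter values come from Table 2.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class FocalLoss(nn.Module):
    """Binary focal loss; defaults match Table 2 (alpha=0.8, gamma=2.0,
    reduction='mean'). Expects raw logits and 0/1 float targets."""
    def __init__(self, alpha=0.8, gamma=2.0, reduction='mean'):
        super().__init__()
        self.alpha, self.gamma, self.reduction = alpha, gamma, reduction

    def forward(self, logits, targets):
        # Unreduced BCE so the focal term can modulate each element.
        bce = F.binary_cross_entropy_with_logits(logits, targets, reduction='none')
        p_t = torch.exp(-bce)                  # probability of the true class
        # alpha weights the positive (insider) class, 1 - alpha the negative.
        alpha_t = self.alpha * targets + (1 - self.alpha) * (1 - targets)
        loss = alpha_t * (1 - p_t) ** self.gamma * bce
        if self.reduction == 'mean':
            return loss.mean()
        if self.reduction == 'sum':
            return loss.sum()
        return loss
```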