Table 1 Implementation details: Dropout is the dropout rate applied to the embedding and MLP layers, GRU dropout is the rate applied to the Bi-GRU layer, and Node dropout is the rate applied to the node transformation layer.
From: Hierarchical contrastive learning for multi-label text classification
| Description | Values | Description | Values | Description | Values |
|---|---|---|---|---|---|
| GRU depth | 1 | Learning rate | 0.0001 | Train batch size | 64 |
| GRU hidden units | 64 | Prediction threshold | 0.5 | Test batch size | 512 |
| CNN depth | 3 | Dropout | 0.5 | Momentum \(\beta _1\) | 0.9 |
| CNN filter region size | {2,3,4} | GRU dropout | 0.1 | Momentum \(\beta _2\) | 0.999 |
| Token length | 256 | Node dropout | 0.05 | Adam \(\epsilon\) | \(1\times 10^{-6}\) |
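The hyperparameters in Table 1 can be gathered into a single configuration object for reproducibility. A minimal sketch follows; the class and field names are illustrative and not taken from the paper's code:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class HyperParams:
    """Hyperparameters from Table 1 (names are illustrative)."""
    # Model architecture
    gru_depth: int = 1
    gru_hidden_units: int = 64
    cnn_depth: int = 3
    cnn_filter_region_sizes: tuple = (2, 3, 4)
    token_length: int = 256
    # Regularization (dropout rates per layer, as described in the caption)
    dropout: float = 0.5        # embedding and MLP layers
    gru_dropout: float = 0.1    # Bi-GRU layer
    node_dropout: float = 0.05  # node transformation layer
    # Optimization (Adam) and inference
    learning_rate: float = 1e-4
    beta1: float = 0.9
    beta2: float = 0.999
    epsilon: float = 1e-6
    prediction_threshold: float = 0.5
    train_batch_size: int = 64
    test_batch_size: int = 512

cfg = HyperParams()
```

With a framework such as PyTorch, the optimizer settings would then be passed along as, e.g., `torch.optim.Adam(model.parameters(), lr=cfg.learning_rate, betas=(cfg.beta1, cfg.beta2), eps=cfg.epsilon)`; \(\beta_1\) and \(\beta_2\) are Adam's momentum coefficients and \(\epsilon\) its numerical-stability constant.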