Table 1 Implementation details: Dropout is the dropout rate in the embedding and MLP layers, GRU dropout is the dropout rate in the Bi-GRU layer, and Node dropout is the dropout rate in the node transformation layer.

From: Hierarchical contrastive learning for multi-label text classification

Description                 Value
---------------------------------------------------
GRU depth                   1
Learning rate               0.0001
Train batch size            64
GRU hidden units            64
Prediction threshold        0.5
Test batch size             512
CNN depth                   3
Dropout                     0.5
Momentum \(\beta _1\)       0.9
CNN filter region size      {2,3,4}
GRU dropout                 0.1
Momentum \(\beta _2\)       0.999
Token length                256
Node dropout                0.05
Momentum \(\epsilon\)       \(1\times 10^{-6}\)
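For reference, the hyperparameters in Table 1 can be collected into a single configuration mapping. This is an illustrative sketch, not the paper's actual code: the key names are invented here, and the \(\beta _1\), \(\beta _2\), \(\epsilon\) entries are assumed to parameterise an Adam-style optimiser.

```python
# Hypothetical config collecting the Table 1 hyperparameters.
# Key names are illustrative; only the values come from the table.
CONFIG = {
    # Encoder architecture
    "gru_depth": 1,
    "gru_hidden_units": 64,
    "cnn_depth": 3,
    "cnn_filter_region_sizes": (2, 3, 4),
    "token_length": 256,
    # Regularisation (see the caption for which layer each applies to)
    "dropout": 0.5,        # embedding and MLP layers
    "gru_dropout": 0.1,    # Bi-GRU layer
    "node_dropout": 0.05,  # node transformation layer
    # Optimisation (assumed Adam-style optimiser)
    "learning_rate": 1e-4,
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-6,
    # Batching and inference
    "train_batch_size": 64,
    "test_batch_size": 512,
    "prediction_threshold": 0.5,  # per-label sigmoid cut-off
}
```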