Table 2 Transformer parameter settings in the dual-encoder and the substructure-level sequence-to-sequence model

From: Single-step retrosynthesis prediction by leveraging commonly preserved substructures

| Parameters                | Dual-encoder | Substructure-level seq-to-seq |
|---------------------------|--------------|-------------------------------|
| Embedding size            | 512          | 512                           |
| Hidden size               | 256          | 512                           |
| Feedforward hidden size   | 2048         | 2048                          |
| Encoder blocks            | 3            | 10                            |
| Encoder attention heads   | 8            | 8                             |
| Max total training steps  | 500,000      | 500,000                       |
| Warm-up steps             | 4000         | 8000                          |
| Dropout                   | 0.1          | 0.1                           |
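For readers reproducing these settings, the table can be sketched as a pair of configuration dictionaries. This is an illustrative sketch only: the key names (`embedding_size`, `warmup_steps`, etc.) are assumptions, not identifiers from the paper's codebase, and the values are copied directly from Table 2.

```python
# Hypothetical config sketch for the two Transformer models in Table 2.
# Key names are illustrative; values come from the table.
dual_encoder = {
    "embedding_size": 512,
    "hidden_size": 256,
    "feedforward_hidden_size": 2048,
    "encoder_blocks": 3,
    "encoder_attention_heads": 8,
    "max_training_steps": 500_000,
    "warmup_steps": 4_000,
    "dropout": 0.1,
}

# The substructure-level seq-to-seq model differs only in three settings.
substructure_seq2seq = {
    **dual_encoder,
    "hidden_size": 512,    # wider hidden state than the dual-encoder
    "encoder_blocks": 10,  # deeper encoder stack
    "warmup_steps": 8_000, # longer learning-rate warm-up
}

# All other parameters are shared between the two models.
shared = {k: v for k, v in dual_encoder.items()
          if substructure_seq2seq[k] == v}
```

Expressing the second model as overrides on the first makes the three differing settings (hidden size, encoder depth, warm-up steps) stand out at a glance.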