Table 2 Comparison of Macformer with different augmentation numbers and MacLS on ZINC test dataset

From: Macrocyclization of linear molecules by deep learning to facilitate macrocyclic drug candidates discovery

| Method | aug.^a | Recovery (%) | Validity (%) | Uniqueness (%) | Novelty_mol (%) | Novelty_linker (%) | Macrocyclization (%) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Macformer^b | None | 2.70 ± 1.31 | 72.91 ± 2.05 | 47.74 ± 8.98 | 96.10 ± 0.81 | 44.24 ± 2.05 | 96.39 ± 0.71 |
| | ×2 | 76.37 ± 3.23 | 81.97 ± 1.20 | 44.99 ± 5.37 | 99.31 ± 0.19 | 53.03 ± 0.65 | 99.48 ± 0.08 |
| | ×5 | 81.86 ± 0.75 | 84.73 ± 1.01 | 45.14 ± 4.60 | 99.39 ± 0.09 | 53.98 ± 1.00 | 99.53 ± 0.05 |
| | ×10 | 84.25 ± 0.84 | 85.35 ± 1.33 | 45.26 ± 0.46 | 99.43 ± 0.09 | 50.00 ± 0.95 | 99.27 ± 0.07 |
| MacLS_self^c | / | 0.00 ± 0.00 | 13.02 ± 0.79 | 83.68 ± 0.74 | 100 ± 0.00 | 0.00 ± 0.00 | 100 ± 0.00 |
| MacLS_extra^c | / | 4.52 ± 0.20 | 89.67 ± 0.07 | 95.04 ± 0.14 | 99.99 ± 0.00 | 0.00 ± 0.00 | 100 ± 0.00 |

  1. ^a The fold of augmentation of the ChEMBL training dataset.
  2. ^b Data are mean ± SD, n = 10 independent experiments using different source SMILES strings. Source data are provided as a Source Data file.
  3. ^c Data are mean ± SD, n = 3 independent experiments using the top 3 low-energy conformations. Source data are provided as a Source Data file.
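
The "mean ± SD" entries described in the footnotes can be computed from per-run metric values roughly as follows. This is a minimal sketch, not the authors' code: the function name `summarize` and the example values are hypothetical, and the use of the sample standard deviation (n − 1 denominator) is an assumption, since the table does not state which estimator was used.

```python
import statistics


def summarize(values, digits=2):
    """Format per-run metric values (in %) as 'mean ± SD'."""
    mean = statistics.mean(values)
    # Assumption: sample SD (n - 1 denominator). Swap in statistics.pstdev
    # for the population SD. A single run has no spread, so report 0.
    sd = statistics.stdev(values) if len(values) > 1 else 0.0
    return f"{mean:.{digits}f} ± {sd:.{digits}f}"


# Hypothetical validity values (%) from three runs; not taken from the paper.
runs = [89.6, 89.7, 89.8]
print(summarize(runs))
```

With n = 3 or n = 10 runs, the choice between sample and population SD changes the reported spread noticeably, so it is worth checking the Source Data file before comparing against the table.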