Table 1 Comparison of generated drug-like molecules on DUD-E targets (n = 101)

From: Generation of 3D molecules in pockets via a language model

 

| Metric | Random test | Pocket2Mol | TargetDiff | Lingo3DMol (ours) |
|---|---|---|---|---|
| Number of molecules generated | 100,195 | 98,332 | 92,727 | 100,428 |
| Mean QED (↑) | 0.69 | 0.46 | 0.50 | 0.53 |
| Mean SAS (↓) | 2.6 | 4.0 | 4.9 | 3.3 |
| Number of drug-like molecules | 98,432 | 59,936 | 45,210 | 82,637 |
| Drug-like molecules as % of total generated molecules (↑) | 98% | 61% | 49% | 82% |
| The comparison below involves only drug-like molecules | | | | |
| Mean molecular weight | 370 | 386 | 299 | 348 |
| ECFP_TS > 0.5 (↑) | 17% | 8% | 3% | 33% |
| Mean min-in-place GlideSP score (↓) | N/A | −6.7 | −6.2 | −6.8 |
| Mean GlideSP redocking score (↓) | −6.4 | −7.5 | −7.0 | −7.8 |
| Mean QED (↑) | 0.70 | 0.56 | 0.60 | 0.59 |
| Mean SAS (↓) | 2.6 | 3.5 | 4.0 | 3.1 |
| Diversity (↑) | 0.85 | 0.84 | 0.88 | 0.82 |
| Dice (↑) | 0.21 | 0.24 | 0.28 | 0.25 |
| Mean r.m.s.d. versus low-energy conformer (Å, ↓) | 4.0 | 1.1 | 1.1 | 0.9 |
For each method, we generated approximately 1,000 molecules per target. Molecules with a QED score greater than or equal to 0.3 and an SAS less than or equal to 5 were considered drug-like and included in the comparison. The metric ‘ECFP_TS > 0.5’ represents the percentage of targets with generated compounds that are similar to active compounds on the basis of the Tanimoto similarity of ECFP4 (ref. 51). The min-in-place GlideSP score and GlideSP redocking score were calculated using the Glide software. The r.m.s.d. value indicates the differences between the generated conformers and the low-energy conformers generated using ConfGen (ref. 30). For the random set, we randomly selected 1,000 molecules from our in-house commercial library for each target. As there are no ‘generated conformers’ for the random test molecules, the r.m.s.d. in this case represents the differences between the docked conformer and the low-energy conformer. More details of the molecular weight distribution can be found in Extended Data Fig. 6. Diversity reflects the average pair-wise Tanimoto similarity of molecules generated for the same target.

The Dice score was defined as the ratio of ‘intersection over union’ between the voxelized representations of the reference compounds observed in the crystal structure (that is, the PDB ID) and the generated molecules. To calculate the Dice score, we created a grid with points at 0.5 Å intervals to cover both molecules. Each grid point was evaluated to determine whether it fell within 1.2 times the covalent radius (referred to as the testing radius) of any atom in either molecule. Grid points within the testing radius of atoms in both molecules were considered intersected points, while grid points within the testing radius of any atom in either molecule were considered union points. The Dice score, calculated as the ratio of intersected points to union points, ranges from 0 to 1, with a value of 0 indicating no similarity or overlap between the molecules.

Bold face indicates the best performance.
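The voxel-overlap calculation described in this footnote can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`occupied`, `voxel_overlap_score`), the padding of the grid box, and the small table of covalent radii (approximate textbook values) are all hypothetical choices made for the example. Only the 0.5 Å grid spacing, the 1.2 scaling factor, and the intersection-over-union ratio come from the footnote.

```python
import numpy as np

# Approximate covalent radii in Å (assumed values for illustration only).
COVALENT_RADIUS = {"H": 0.31, "C": 0.76, "N": 0.71, "O": 0.66, "S": 1.05}


def occupied(grid, coords, elements, scale=1.2):
    """Mark grid points lying within scale * covalent radius of any atom."""
    mask = np.zeros(len(grid), dtype=bool)
    for xyz, element in zip(coords, elements):
        testing_radius = scale * COVALENT_RADIUS[element]
        mask |= np.linalg.norm(grid - xyz, axis=1) <= testing_radius
    return mask


def voxel_overlap_score(coords_a, elems_a, coords_b, elems_b,
                        spacing=0.5, scale=1.2):
    """Ratio of intersected grid points to union grid points for two molecules."""
    coords_a = np.asarray(coords_a, dtype=float)
    coords_b = np.asarray(coords_b, dtype=float)

    # Build a regular grid covering both molecules, padded by the largest
    # testing radius so that no occupied point falls outside the box.
    all_xyz = np.vstack([coords_a, coords_b])
    pad = scale * max(COVALENT_RADIUS.values())
    lo = all_xyz.min(axis=0) - pad
    hi = all_xyz.max(axis=0) + pad
    axes = [np.arange(l, h + spacing, spacing) for l, h in zip(lo, hi)]
    grid = np.stack(np.meshgrid(*axes, indexing="ij"), axis=-1).reshape(-1, 3)

    in_a = occupied(grid, coords_a, elems_a, scale)
    in_b = occupied(grid, coords_b, elems_b, scale)

    union = np.count_nonzero(in_a | in_b)
    intersection = np.count_nonzero(in_a & in_b)
    return intersection / union if union else 0.0
```

With this sketch, two identical molecules score 1 and two molecules placed far apart score 0, matching the range given in the footnote. In practice the atom coordinates and element symbols would come from the crystal-structure ligand and the generated pose; how those are loaded is outside the scope of this example.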