Fig. 3: Comparison of high-energy atom detector (HEAD) and PoseBusters (ref. 8) in assessing ligand-protein interactions.
From: Assessing conformation validity and rationality of deep learning-generated 3D molecules

Panels (a–d) compare HEAD with PoseBusters using F1 scores, while panels (e) and (f) compare HEAD with Ciepliński et al. (2023) using Pearson correlations. a Histogram showing the distribution of Molecular Mechanics-Generalized Born Surface Area (MM/GBSA; refs. 31, 32) binding free energies for AI-generated molecules in the GM-5K dataset. The red dashed line indicates the threshold used to label ligand-protein complexes as either “valid” or “invalid”. Molecules with MM/GBSA binding free energies below this threshold are labeled “valid”, while those above it are labeled “invalid”. Moving the threshold therefore changes which complexes are labeled valid and which invalid. b Weighted F1 scores of HEAD and PoseBusters across varying labeling thresholds based on the MM/GBSA binding free energies of molecules in the GM-5K dataset. The threshold was varied systematically from 0 to 2000 kcal/mol in increments of 1 kcal/mol, yielding one weighted F1 score per threshold. Panels (c) and (d) show two representative cases that were detected by HEAD but missed by PoseBusters: a ligand-protein clash involving hydrogens (c), and a lone-pair electron clash between two carbonyl groups (d), specifically between the ligand O4 atom and the backbone carbonyl oxygen of pocket residue E377. In each panel, the ligand-protein complex is shown as sticks, accompanied by a bar chart of the per-atom energy difference of the ligand upon binding to the protein (\(\Delta \mathrm{E} = \mathrm{E}_{\mathrm{ligand}}^{\mathrm{bound}} - \mathrm{E}_{\mathrm{ligand}}^{\mathrm{isolated}}\)). Atom names are shown on the horizontal axis and atomic energy differences on the vertical axis. Atoms with high energies upon binding are highlighted by red bars in the bar charts and indicated by red arrows in the structures, marking ligand-protein clashes (circled in red).
Details of how these invalid atoms are identified can be found in the Methods section (Evaluation of Ligand-Protein Interaction Validity). e Scatter plot of MM/GBSA binding free energies (log scale) versus Ciepliński’s Vinardo docking scores (ref. 33) on the GM-5K dataset. f Scatter plot of MM/GBSA binding free energies (log scale) versus HEAD’s \(\mathrm{E}_{\mathrm{bind}}\) values on the GM-5K dataset, where \(\mathrm{E}_{\mathrm{bind}} = \mathrm{E}_{\mathrm{complex}}^{\mathrm{bound}} - (\mathrm{E}_{\mathrm{ligand}}^{\mathrm{isolated}} + \mathrm{E}_{\mathrm{pocket}}^{\mathrm{isolated}})\). PCC stands for Pearson Correlation Coefficient.
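
The threshold sweep described for panel b can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `weighted_f1` and `sweep_thresholds` and the toy energies and predictions are hypothetical, and "weighted F1" is assumed here to mean the support-weighted mean of the per-class F1 scores for the "valid" and "invalid" classes.

```python
from typing import List, Sequence, Tuple


def weighted_f1(y_true: Sequence[bool], y_pred: Sequence[bool]) -> float:
    """Support-weighted mean of the per-class F1 scores (True = valid)."""
    n = len(y_true)
    if n == 0:
        return 0.0
    total = 0.0
    for cls in (True, False):
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == cls and p == cls)
        fp = sum(1 for t, p in pairs if t != cls and p == cls)
        fn = sum(1 for t, p in pairs if t == cls and p != cls)
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        total += (tp + fn) * f1  # weight by the number of true members
    return total / n


def sweep_thresholds(
    energies: Sequence[float],
    predicted_valid: Sequence[bool],
    thresholds: Sequence[float],
) -> List[Tuple[float, float]]:
    """Relabel complexes at each threshold and score a fixed set of predictions."""
    scores = []
    for thr in thresholds:
        # Binding free energies below the threshold are labeled "valid".
        y_true = [e < thr for e in energies]
        scores.append((thr, weighted_f1(y_true, predicted_valid)))
    return scores


# Hypothetical toy data (kcal/mol); not taken from GM-5K.
toy_energies = [-35.2, -12.0, 4.5, 150.0, 900.0, 1800.0]
toy_predictions = [True, True, True, False, False, False]

# Thresholds from 0 to 2000 kcal/mol in steps of 1 kcal/mol, as in the caption.
curve = sweep_thresholds(toy_energies, toy_predictions, range(0, 2001))
```

Each entry of `curve` pairs a threshold with the weighted F1 score obtained when the reference labels are regenerated at that threshold; the detector's predictions stay fixed while only the labels move.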