Fig. 5: Network-Derived Negatives.
From: Improving the generalizability of protein-ligand binding predictions with AI-Bind

a Protein-ligand bipartite network consisting of only binding (positive) annotations for drugs and natural compounds (green) to proteins (pink). b Degree distributions of ligands and proteins are fat-tailed in nature. c Shortest path length distribution capturing all possible protein-ligand pairs. We use protein-ligand pairs with shortest path distance of 7 for training, while absolute negatives obtained from BindingDB and pairs with shortest path distances ≥11 are used for validation and test. d Average experimental kinetic constant as a function of the shortest path distance. Higher path distance corresponds to higher Ki in BindingDB. Beyond 7 hops, the expected constant exceeds the binding threshold of 10⁶ nM (dashed line). e An example of a protein-ligand pair that is 7 hops apart and is used as a negative sample in the AI-Bind training set. Source data are provided as a Source Data file.
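
The selection rule in panels c and e can be illustrated with a minimal sketch. It is not the AI-Bind implementation: the function names (`build_graph`, `bfs_distances`, `network_negatives`) and the toy data are hypothetical, and it assumes the only input is a list of positive (ligand, protein) annotations. It measures hop counts with a plain breadth-first search, keeps pairs exactly 7 hops apart as training negatives, and keeps pairs at least 11 hops apart as candidates for validation and test. The absolute negatives from BindingDB are not modeled, and pairs with no connecting path are simply skipped.

```python
from collections import defaultdict, deque


def build_graph(positive_pairs):
    """Undirected bipartite graph from (ligand, protein) binding annotations."""
    adj = defaultdict(set)
    for ligand, protein in positive_pairs:
        # Tag each node with its type so ligand and protein IDs cannot collide.
        adj[("L", ligand)].add(("P", protein))
        adj[("P", protein)].add(("L", ligand))
    return adj


def bfs_distances(adj, source):
    """Hop counts from `source` to every node reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adj[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def network_negatives(positive_pairs, train_hops=7, eval_min_hops=11):
    """Split unannotated pairs by shortest path distance (thresholds from the caption)."""
    adj = build_graph(positive_pairs)
    train, evaluation = [], []
    for ligand_node in sorted(n for n in adj if n[0] == "L"):
        for node, hops in bfs_distances(adj, ligand_node).items():
            if node[0] != "P":
                continue  # only ligand-protein pairs are of interest
            pair = (ligand_node[1], node[1])
            if hops == train_hops:
                train.append(pair)
            elif hops >= eval_min_hops:
                evaluation.append(pair)
    return train, evaluation


# Toy chain l0-p0-l1-p1-...-l5-p5 (hypothetical data, not from BindingDB).
toy_pairs = [(f"l{i}", f"p{i}") for i in range(6)]
toy_pairs += [(f"l{i + 1}", f"p{i}") for i in range(5)]
train_negatives, eval_negatives = network_negatives(toy_pairs)
```

Because the network is bipartite, every ligand-protein path has an odd length, so the thresholds 7 and 11 both fall on distances that can actually occur. Running one breadth-first search per ligand is adequate for a sketch; a network of realistic size would likely need a more economical all-pairs strategy.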