Fig. 2: Superpositions of complexes highlighting train–test structural similarities before and after filtering. | Nature Machine Intelligence

From: Resolving data bias improves generalization in binding affinity prediction

a, Superpositions of the most prominent train–test similarities before applying the filtering algorithm. b, Superpositions of the same test complexes as in a, now shown with the most similar training complexes found in PDBbind CleanSplit. c, Superpositions of the closest train–test similarities that remained in PDBbind CleanSplit after filtering. Protein structures from the test and training datasets are depicted as grey and blue cartoons, respectively, with ligands shown in magenta (test) and green (train). Below each superposition, the Tanimoto score, TM score, ligand r.m.s.d. and affinity difference (ΔpK) are shown; these are combined into an overall similarity score S = TM score + Tanimoto + (1 − ligand r.m.s.d.) − ΔpK. This S score was computed for all possible train–test pairs and served to select representative complexes for this figure. For each training dataset, the depicted pairs were selected from the five pairs with the highest S, prioritizing good ligand visibility. In some superpositions (1O3F/1O3G and 3DD0/3DWC), one structure has been slightly shifted to improve visibility. Original PyMOL sessions of all superpositions are provided on GitHub.
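
The similarity score S defined in the caption can be illustrated with a minimal sketch. This is not the authors' code: the `PairMetrics` container, the function names and the identifiers in the usage example are hypothetical. It also assumes that the ligand r.m.s.d. is given in ångströms and that ΔpK is a non-negative absolute difference, neither of which the caption states explicitly.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PairMetrics:
    """Per-pair metrics for one train-test pair (hypothetical container)."""
    test_id: str
    train_id: str
    tm_score: float     # protein structural similarity
    tanimoto: float     # ligand fingerprint similarity
    ligand_rmsd: float  # assumed to be in angstroms
    delta_pk: float     # assumed to be an absolute affinity difference


def similarity_score(m: PairMetrics) -> float:
    """S = TM score + Tanimoto + (1 - ligand r.m.s.d.) - delta pK, as in the caption."""
    return m.tm_score + m.tanimoto + (1.0 - m.ligand_rmsd) - m.delta_pk


def top_pairs(pairs: List[PairMetrics], k: int = 5) -> List[PairMetrics]:
    """Return the k pairs with the highest S, highest first."""
    return sorted(pairs, key=similarity_score, reverse=True)[:k]


# Made-up values for illustration only; these are not data from the paper.
example_pairs = [
    PairMetrics("TEST_A", "TRAIN_A", 1.0, 1.0, 0.0, 0.0),
    PairMetrics("TEST_B", "TRAIN_B", 0.5, 0.5, 2.0, 1.0),
]
best = top_pairs(example_pairs, k=1)[0]
```

Under these assumptions S is at most 3, reached when both proteins and ligands are identical, the ligand poses coincide and the affinities are equal. Large r.m.s.d. or affinity differences push S below zero.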
