Extended Data Fig. 1: Simulating the effect of library size on ligand enrichment among the top 1,000 docked molecules.
From: Ultra-large library docking for discovering new chemotypes

a, b, The energy distributions of ligands (a) and decoys (b) from docking enrichment calculations against AmpC. Skew-normal fits are plotted as red lines. The fitted parameters (shape (α), location (loc) and scale) are shown. c, Heat maps of the number of active molecules in the top 1,000 docked molecules for 6 targets. For each library size and ligand-to-decoy ratio, the number of ligands in the top 1,000 docked molecules is coloured on a log10(number of ligands) scale ranging from 1 (blue) to 1,000 (red). Cells with zero ligands are shown in white. d, Large-library docking screens of AmpC (top, n = 99 million molecules) and D4 (bottom, n = 138 million molecules). Molecules that are known to bind to AmpC and D4, as well as their close analogues, are treated as ligands, and the remaining molecules are treated as decoys. Left, the energy distributions of decoys (grey) and of ligands defined by an ECFP4 Tanimoto coefficient (Tc) of ≥0.5 (blue), ≥0.6 (green) and ≥0.7 (orange) to ligands from ChEMBL. Middle, heat maps of the number of ligands in the top 1,000 docked molecules, based on fits to the full-library docking distributions of ligands (AmpC, Tc ≥ 0.5, green; D4, Tc ≥ 0.6, orange) and decoys (grey). Right, number of ligands in the top 1,000 docked molecules as the library grows, based on the actual distributions plotted in the left panel. Data are mean ± s.d. of 20 samples. See Supplementary Table 1 for retrospective performance on three more targets.
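
The simulation behind the heat maps can be illustrated with a minimal sketch. It is not the authors' code. It assumes that docking energies for ligands and decoys each follow a skew-normal distribution described by (α, loc, scale), that lower energy is better, and that a library of a given size contains a fixed fraction of ligands. The function names `sample_skew_normal` and `ligands_in_top` and all parameter values are hypothetical placeholders, not the fitted values shown in the figure.

```python
import heapq
import math
import random


def sample_skew_normal(rng, alpha, loc, scale):
    """Draw one skew-normal variate using Azzalini's stochastic representation."""
    delta = alpha / math.sqrt(1.0 + alpha * alpha)
    z0 = abs(rng.gauss(0.0, 1.0))  # half-normal component carries the skew
    z1 = rng.gauss(0.0, 1.0)
    return loc + scale * (delta * z0 + math.sqrt(1.0 - delta * delta) * z1)


def ligands_in_top(library_size, ligand_fraction, ligand_params, decoy_params,
                   top_n=1000, seed=0):
    """Count simulated ligands among the top_n best-scoring (lowest-energy) molecules.

    ligand_params and decoy_params are (alpha, loc, scale) tuples; the values
    passed in below are illustrative assumptions, not fitted parameters.
    """
    rng = random.Random(seed)
    n_ligands = round(library_size * ligand_fraction)
    n_decoys = library_size - n_ligands
    scored = [(sample_skew_normal(rng, *ligand_params), True)
              for _ in range(n_ligands)]
    scored += [(sample_skew_normal(rng, *decoy_params), False)
               for _ in range(n_decoys)]
    best = heapq.nsmallest(top_n, scored)  # assumption: lower energy ranks higher
    return sum(is_ligand for _, is_ligand in best)


if __name__ == "__main__":
    # Hypothetical parameters: ligands shifted towards more favourable energies.
    ligand_params = (-1.0, -40.0, 10.0)
    decoy_params = (-1.0, -30.0, 10.0)
    for size in (10_000, 50_000, 200_000):
        hits = [ligands_in_top(size, 0.001, ligand_params, decoy_params, seed=s)
                for s in range(5)]
        print(size, sum(hits) / len(hits))
```

Repeating the count over several seeds, as in the loop above, gives a mean and spread per library size in the spirit of the "mean ± s.d. of 20 samples" reported for panel d; sweeping `library_size` and `ligand_fraction` over a grid would fill one heat map.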