Extended Data Fig. 12: Analysis of the determinants of the success rate of de novo binder design.
From: Design of protein-binding proteins from the target structure alone

a, Correlation between success rate and root mean square deviation (RMSD) with scaffolds. In this experiment, the accuracy of the scaffold library was examined with an approach similar to that of Chevalier et al.1. The binding residues from known-good interfaces were copied onto scaffolds that closely resembled the known-good binders. If the scaffold folded properly and displayed these binding residues similarly to the original known-good interface, the hypothesis was that the scaffold would bind. This experiment sought to determine how accurately sidechains must be displayed to create a successful binder and to probe the accuracy of the scaffold library. If, for instance, the scaffold library were perfectly accurate, this graph would indicate that when the displayed sidechains deviate from the known-good conformation by 0.5 Å Cα RMSD, there is a 15% chance of binding, set by the intrinsic accuracy of sidechain placement required for binding. The scaffold library is likely not perfectly accurate, however; as such, the correct interpretation is: if the displayed sidechains according to the scaffold PDB model (which may not be perfectly correct) deviate by 0.5 Å Cα RMSD, there is a 15% chance of binding. This 15% chance of binding arises in part from the likelihood that the scaffold will fold correctly and in part from the intrinsic accuracy of sidechain placement required for binding. Notably, the RMSD reported in this graph is far lower than the determined crystallographic accuracy of the IL-7Rα binder when aligned by the receptor (the two interfacial helices are 1.5 Å Cα RMSD when aligned by the IL-7Rα receptor); however, if the two interfacial helices are aligned without regard for the receptor (that is, superimposed directly on each other, the same calculation performed in this figure), the Cα RMSD is 0.43 Å.
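The difference between the two Cα RMSD values quoted above (aligned by the receptor versus helices superimposed on each other) can be illustrated with a minimal sketch. It assumes NumPy and uses synthetic, idealized helix-like coordinates rather than any structure from this study. The function names `ca_rmsd` and `superimposed_ca_rmsd` are hypothetical, and the Kabsch algorithm is used here as a standard superposition method, not necessarily the procedure used by the authors.

```python
import numpy as np


def ca_rmsd(a, b):
    """RMSD between two (N, 3) coordinate arrays in a shared frame (no fitting)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.sqrt(((a - b) ** 2).sum(axis=1).mean()))


def superimposed_ca_rmsd(mobile, reference):
    """RMSD after optimal rigid-body superposition (Kabsch algorithm)."""
    P = np.asarray(mobile, dtype=float)
    Q = np.asarray(reference, dtype=float)
    # Remove the translational component by centring both point sets.
    Pc = P - P.mean(axis=0)
    Qc = Q - Q.mean(axis=0)
    # The SVD of the covariance matrix gives the optimal rotation.
    U, _, Vt = np.linalg.svd(Pc.T @ Qc)
    # Guard against an improper rotation (reflection).
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    return ca_rmsd(Pc @ R.T, Qc)


# Synthetic Calpha trace: idealized helix (assumed 2.3 A radius,
# 100 degrees and 1.5 A rise per residue). Illustration only.
t = np.arange(20)
angle = np.deg2rad(100.0) * t
helix = np.stack([2.3 * np.cos(angle), 2.3 * np.sin(angle), 1.5 * t], axis=1)

# The same helix after a rigid-body shift relative to a fixed reference frame
# (standing in for a binder that docks slightly differently on its receptor).
theta = np.deg2rad(10.0)
Rz = np.array([[np.cos(theta), -np.sin(theta), 0.0],
               [np.sin(theta), np.cos(theta), 0.0],
               [0.0, 0.0, 1.0]])
shifted = helix @ Rz.T + np.array([1.0, 0.5, 0.0])

in_frame = ca_rmsd(shifted, helix)            # analogous to "aligned by the receptor"
fitted = superimposed_ca_rmsd(shifted, helix)  # analogous to "helices superimposed"
print(f"shared-frame RMSD: {in_frame:.2f} A, superimposed RMSD: {fitted:.2e} A")
```

In this toy case the two helices are internally identical, so the superimposed RMSD is essentially zero, while the shared-frame RMSD reflects only the rigid-body displacement. This mirrors the situation described above, where a monomer can be accurate even when its docked position is not.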
As such, the best explanation for these data is as follows: although the predicted binding conformation of the complex structure was accurate only to 1.5 Å, the predicted monomer structure was correct to 0.43 Å. The comparison between scaffold and known-good interface was performed at the monomer level; these new binders were therefore successful because they assumed the correct monomer structure, which displayed the sidechains in the same way as the known-good binder, and so were able to bind even though the predicted complex structure of the known-good binder was less accurate. This graph continues to show increased signal below 0.43 Å, probably because the scaffolds at very low RMSD ended up being slightly structurally different for the same reason as the known-good binder (that is, if we crystallized one of the scaffolds that differed by only 0.2 Å, we would likely find that the scaffold model and the scaffold crystal structure deviate by about 0.43 Å and that the scaffold crystal structure and the known-good crystal structure are very similar). Method: 11 IL-7Rα SSM-validated interfaces were used as a starting point to create two-helix grafts. All grafts consisted of two helices joined by a loop; the scaffold library was superimposed onto these two helices and the RMSD of the match was assessed. If a good match was found, the sidechains making strong interactions with IL-7Rα were copied onto the scaffold and the remaining positions near the interface were allowed to redesign to avoid clashes. The x-axis shows the RMSD of the superposition of the two helices plus loop between the motif and the scaffold. The y-axis shows the fraction of binders with predicted SC50s <3 μM; the number on top is the denominator of that fraction. b, Target success rate versus hydrophobicity. The y-axis shows the percentage of tested binders against the indicated target that showed an SC50 below 4 μM. The x-axis shows the hydrophobicity of the target region in SAP89 units.
A greater Δsap_score indicates greater hydrophobicity. Although this comparison is not completely fair, because the authors improved the method over time, the trend is striking and can be used to estimate the difficulty of potential future targets. (The Δsap_score can be calculated from the target structure alone by observing the SAP score of all residues a potential binder would cover.)
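As a rough illustration of the parenthetical note above, the sketch below estimates a Δsap_score for a candidate target site. It assumes that per-residue SAP scores have already been computed elsewhere and that the site score is the sum over the covered residues; both the summation and the function name `delta_sap_score` are assumptions for illustration, and the residue numbers and values are made up.

```python
def delta_sap_score(per_residue_sap, covered_residues):
    """Sum per-residue SAP scores over the residues a binder would cover.

    per_residue_sap: dict mapping residue number -> SAP score (precomputed).
    covered_residues: iterable of residue numbers in the candidate site.
    Summation is an assumption of this sketch, not a confirmed definition.
    """
    return sum(per_residue_sap[res] for res in covered_residues)


# Hypothetical per-residue SAP scores for a small surface patch.
example_sap = {41: 1.8, 42: -0.4, 45: 2.6, 46: 0.9, 49: -1.1}

# Two hypothetical candidate sites on the same target.
site_a = [41, 45, 46]
site_b = [42, 49]

score_a = delta_sap_score(example_sap, site_a)
score_b = delta_sap_score(example_sap, site_b)
print(f"site A: {score_a:.1f}, site B: {score_b:.1f}")
```

Under the trend in panel b, the site with the larger score (here site A) would be expected to be the easier target, all else being equal.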