De novo protein design with a denoising diffusion network independent of pretrained structure prediction models

Abstract

The recent success of RFdiffusion, a method for protein structure design with a denoising diffusion probabilistic model, has relied on fine-tuning the RoseTTAFold structure prediction network for protein backbone denoising. Here, we introduce SCUBA-diffusion (SCUBA-D), a protein backbone denoising diffusion probabilistic model freshly trained by considering co-diffusion of sequence representation to enhance model regularization and adversarial losses to minimize data-out-of-distribution errors. While matching the performance of the pretrained RoseTTAFold-based RFdiffusion in generating experimentally realizable protein structures, SCUBA-D readily generates protein structures with not-yet-observed overall folds that are different from those predictable with RoseTTAFold. The accuracy of SCUBA-D was confirmed by the X-ray structures of 16 designed proteins and a protein complex, and by experiments validating designed heme-binding proteins and Ras-binding proteins. Our work shows that deep generative models of images or texts can be fruitfully extended to complex physical objects like protein structures by addressing outstanding issues such as the data-out-of-distribution errors.

Fig. 1: SCUBA-D uses a denoising diffusion network trained with adversarial losses to generate designable protein backbone structures.
Fig. 2: Structure generation without condition or with biased secondary structure distributions.
Fig. 3: Generating protein structures with sketched inputs.
Fig. 4: SCUBA-D for designing small-molecule-binding proteins.
Fig. 5: SCUBA-D for designing protein-binding proteins.

Data availability

Protein structures for training the models were downloaded from the PDB. The experimentally solved protein structures were deposited in the PDB under accession codes: 8K7Z (N1), 8K83 (N2), 8K84 (N3), 8KCJ (N7), 8KCK (N9), 8K8I (N14), 8KC4 (NA5), 8KA6 (NA7), 8KA7 (NB7), 8KC0 (NB8), 8KAC (NX1), 8KC1 (NX5), 8K7M (T01), 8KDQ (T03), 8WX8 (T09), 8KC8 (T11) and 8WWC (120-4). We referenced the structures 2ZDO and 4G0N from the PDB for the design of heme-binding proteins and Ras-binding proteins, respectively. The amino acid sequences and encoding DNA sequences of the experimentally examined proteins are available in Supplementary Tables 6–10 and Supplementary Data 1–3. The complete lists of proteins for training and testing the models, the data of experimental results (SEC, multi-angle light scattering, 15N-1H HSQC NMR, ITC, CD, validation reports for experimentally solved protein structures) and all in silico experimental results are available from Zenodo via https://doi.org/10.5281/zenodo.10911626 (ref. 45). Source data are provided with this paper.
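
For readers who want to retrieve the deposited models programmatically, the accession codes above can be collected into a lookup table. The following is a minimal sketch, not part of the published code: the dictionary and function names are hypothetical, and the URL pattern assumes the public RCSB file server (files.rcsb.org), which is one of several PDB mirrors.

```python
# Design name -> PDB accession code, copied from the Data availability statement.
DEPOSITED = {
    "N1": "8K7Z", "N2": "8K83", "N3": "8K84", "N7": "8KCJ", "N9": "8KCK",
    "N14": "8K8I", "NA5": "8KC4", "NA7": "8KA6", "NB7": "8KA7", "NB8": "8KC0",
    "NX1": "8KAC", "NX5": "8KC1", "T01": "8K7M", "T03": "8KDQ", "T09": "8WX8",
    "T11": "8KC8", "120-4": "8WWC",
}

# Natural structures referenced for the two functional design tasks.
REFERENCED = {"heme-binding design": "2ZDO", "Ras-binding design": "4G0N"}


def rcsb_url(pdb_id, fmt="cif"):
    """Build a download URL for one entry (assumed RCSB file-server pattern)."""
    return f"https://files.rcsb.org/download/{pdb_id.upper()}.{fmt}"


# Map every deposited design to its download URL; nothing is fetched here.
urls = {name: rcsb_url(code) for name, code in DEPOSITED.items()}
```

The sketch only builds URLs; downloading is left to the reader's preferred HTTP client.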

Code availability

Executable computer programs and source codes of SCUBA-D (version 1.0) and SCUBA-sketch (version 1.0) are publicly available from Zenodo via https://doi.org/10.5281/zenodo.10947360 (ref. 46) and can be freely used for noncommercial purposes. The source codes for SCUBA-D are also available from GitHub at https://github.com/liuyf020419/SCUBA-D.git/.

References

  1. Huang, P. S., Boyken, S. E. & Baker, D. The coming of age of protein design. Nature 537, 320–327 (2016).

  2. Kuhlman, B. et al. Design of a novel globular protein fold with atomic-level accuracy. Science 302, 1364–1368 (2003).

  3. Polizzi, N. F. & DeGrado, W. F. A defined structural unit enables de novo design of small-molecule–binding proteins. Science 369, 1227–1233 (2020).

  4. Gainza, P. et al. De novo design of protein interactions with learned surface fingerprints. Nature 617, 176–184 (2023).

  5. Yeh, A. H.-W. et al. De novo design of luciferases using deep learning. Nature 614, 774–780 (2023).

  6. Li, H., Helling, R., Tang, C. & Wingreen, N. Emergence of preferred structures in a simple model of protein folding. Science 273, 666–669 (1996).

  7. Kuhlman, B. & Baker, D. Native protein sequences are close to optimal for their structures. Proc. Natl Acad. Sci. USA 97, 10383–10388 (2000).

  8. Grigoryan, G. & DeGrado, W. F. Probing designability via a generalized model of helical bundle geometry. J. Mol. Biol. 405, 1079–1100 (2011).

  9. Huang, B. et al. A backbone-centred energy function of neural networks for protein design. Nature 602, 523–528 (2022).

  10. Anishchenko, I. et al. De novo protein design by deep network hallucination. Nature 600, 547–552 (2021).

  11. Eguchi, R. R., Choe, C. A. & Huang, P.-S. Ig-VAE: generative modeling of protein structure by direct 3D coordinate generation. PLoS Comput. Biol. 18, e1010271 (2022).

  12. Lee, J. S., Kim, J. & Kim, P. M. Score-based generative modeling for de novo protein design. Nat. Comput. Sci. 3, 382–392 (2023).

  13. Watson, J. L. et al. De novo design of protein structure and function with RFdiffusion. Nature 620, 1089–1100 (2023).

  14. Ho, J., Jain, A. & Abbeel, P. Denoising diffusion probabilistic models. Adv. Neural Inf. Process. Syst. 33, 6840–6851 (2020).

  15. Chen, N. et al. Wavegrad: estimating gradients for waveform generation. In Proc. International Conference on Learning Representations (ICLR, 2021).

  16. Ramesh, A., Dhariwal, P., Nichol, A., Chu, C. & Chen, M. Hierarchical text-conditional image generation with clip latents. Preprint at https://arXiv.org/abs/2204.06125 (2022).

  17. Baek, M. et al. Accurate prediction of protein structures and interactions using a three-track neural network. Science 373, 871–876 (2021).

  18. Anand, N. & Achim, T. Protein structure and sequence generation with equivariant denoising diffusion probabilistic models. Preprint at https://arXiv.org/abs/2205.15019 (2022).

  19. Wu, K. E. et al. Protein structure generation via folding diffusion. Nat. Commun. 15, 1059 (2022).

  20. Trippe, B. L. et al. Diffusion probabilistic modeling of protein backbones in 3d for the motif-scaffolding problem. In Proc. International Conference on Learning Representations (ICLR, 2023).

  21. Ingraham, J. et al. Illuminating protein space with a programmable generative model. Nature 623, 1070–1078 (2023).

  22. Yim, J. et al. SE(3) diffusion model with application to protein backbone generation. In Proc. International Conference on Machine Learning (ICML, 2023).

  23. Zhao, H., Gallo, O., Frosio, I. & Kautz, J. Loss functions for image restoration with neural networks. IEEE Trans. Comput. Imaging 3, 47–57 (2016).

  24. Blau, Y., Mechrez, R., Timofte, R., Michaeli, T. & Zelnik-Manor, L. The 2018 PIRM challenge on perceptual image super-resolution. In Proc. the European Conference on Computer Vision (ECCV) Workshops (eds Leal-Taixé, L. & Roth, S.) 334–355 (2019).

  25. Goodfellow, I. et al. Generative adversarial networks. Commun. ACM 63, 139–144 (2020).

  26. Liu, Y. et al. Rotamer-free protein sequence design based on deep learning and self-consistency. Nat. Comput. Sci. 2, 451–462 (2022).

  27. Dauparas, J. et al. Robust deep learning-based protein sequence design using ProteinMPNN. Science 378, 49–56 (2022).

  28. Popov, V., Vovk, I., Gogoryan, V., Sadekova, T. & Kudinov, M. Grad-TTS: a diffusion probabilistic model for text-to-speech. In Proc. International Conference on Machine Learning 8599–8608 (PMLR, 2021).

  29. Lee, S.-g. et al. PriorGrad: improving conditional denoising diffusion models with data-dependent adaptive prior. In Proc. International Conference on Learning Representations (ICLR, 2022).

  30. Jumper, J. et al. Highly accurate protein structure prediction with AlphaFold. Nature 596, 583–589 (2021).

  31. Rives, A. et al. Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences. Proc. Natl Acad. Sci. USA 118, e2016239118 (2021).

  32. Sillitoe, I. et al. CATH: increased structural coverage of functional space. Nucleic Acids Res. 49, D266–D273 (2021).

  33. Zhang, Y. & Skolnick, J. Scoring function for automated assessment of protein structure template quality. Proteins 57, 702–710 (2004).

  34. Xu, J. & Zhang, Y. How significant is a protein structure similarity with TM-score = 0.5? Bioinformatics 26, 889–895 (2010).

  35. Lin, Y. & AlQuraishi, M. Generating novel, designable, and diverse protein structures by equivariantly diffusing oriented residue clouds. In Proc. International Conference on Machine Learning (ICML, 2023).

  36. van der Maaten, L. & Hinton, G. Visualizing data using t-SNE. J. Mach. Learn Res. 9, 2579–2605 (2008).

  37. Wang, J. et al. Scaffolding protein functional sites using deep learning. Science 377, 387–394 (2022).

  38. Lee, W. C., Reniere, M. L., Skaar, E. P. & Murphy, M. E. Ruffling of metalloporphyrins bound to IsdG and IsdI, two heme-degrading enzymes in Staphylococcus aureus. J. Biol. Chem. 283, 30957–30963 (2008).

  39. Skaar, E. P., Gaspar, A. H. & Schneewind, O. IsdG and IsdI, heme-degrading enzymes in the cytoplasm of Staphylococcus aureus. J. Biol. Chem. 279, 436–443 (2004).

  40. Fetics, S. K. et al. Allosteric effects of the oncogenic RasQ61L mutant on Raf-RBD. Structure 23, 505–516 (2015).

  41. Evans, R. et al. Protein complex prediction with AlphaFold-Multimer. Preprint at bioRxiv https://doi.org/10.1101/2021.10.04.463034 (2021).

  42. Remy, I., Campbell-Valois, F. & Michnick, S. W. Detection of protein–protein interactions using a simple survival protein–fragment complementation assay based on the enzyme dihydrofolate reductase. Nat. Protoc. 2, 2120–2125 (2007).

  43. Jing, B., Eismann, S., Suriana, P., Townshend, R. J. & Dror, R. Learning from protein structure with geometric vector perceptrons. In Proc. International Conference on Learning Representations (ICLR, 2021).

  44. Wang, G. & Dunbrack, R. L. Jr PISCES: a protein sequence culling server. Bioinformatics 19, 1589–1591 (2003).

  45. Wang, S. Source data for manuscript: de novo protein design with a denoising diffusion network independent of pre-trained structure prediction models. Zenodo https://doi.org/10.5281/zenodo.10911626 (2024).

  46. Wang, S. De novo protein design with a denoising diffusion network independent of pre-trained structure prediction models. Zenodo https://doi.org/10.5281/zenodo.10947360 (2024).

Acknowledgements

We thank the staff from the BL18U1 and BL19U1 beamlines of the National Facility for Protein Science in Shanghai for their assistance during crystallographic data collection. We also thank X. Hu, R. Wu and L. Zhang for their help with experimental techniques, as well as M. Lv and H. Yu for their help with crystal collection. This work was supported by the National Key R&D Program of China (2022YFA1303700 to H.L. and 2022YFF1203100 to Q.C.), National Natural Science Foundation of China (T2221005, 92253302 and 22177107 to H.L.; 32371487 and 32171411 to Q.C.), CAS Strategic Priority Research Program (XDB0500201 to H.L.), CAS Project for Young Scientists in Basic Research (YSBR-072 to Q.C.), Anhui Provincial Natural Science Foundation (2308085J01 to Q.C.) and Research Funds of Center for Advanced Interdisciplinary Science and Biomedicine of IHM (QYPY20230035 to Q.C.). The funders had no role in study design, data collection and analysis, decision to publish or preparation of the manuscript.

Author information

Contributions

Y.L. developed computational models and codes with the assistance of L.C. S.W. carried out the experimental work with the help of J.D., X.W. and Y.W. L.W., F.L. and C.W. helped with analysis of crystal structural data. J.Z. collected the NMR data. S.W. participated in the discussion. H.L. and Q.C. supervised the project. H.L., Y.L., S.W. and Q.C. wrote the paper.

Corresponding authors

Correspondence to Quan Chen or Haiyan Liu.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Methods thanks Arne Elofsson and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available. Primary Handling Editor: Arunima Singh, in collaboration with the Nature Methods team.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Evaluation of the variant models ‘no ESM’, ‘compressed ESM’, ‘full ESM’ and ‘full ESM with GAN’.

(a) Distributions of TM-scores and RMSDs between the initial natural backbones and the denoised backbones generated by the variant models. For each model, 75 protein backbones were generated by considering 3 independent ‘denoising’ runs from each of 25 initial natural backbones. (b) Distributions of scTM-scores and scRMSDs between the denoised backbones and the AlphaFold2-predicted structures for amino acid sequences designed (with ABACUS-R) on corresponding denoised backbones. (c) Distributions of per-residue ABACUS-R logits scores of amino acid sequences designed by ABACUS-R for the various denoised backbones and for the initial natural backbones. Larger logits scores indicate better compatibility between the designed sequences and the corresponding backbone structures. (d) Left: backbones ‘denoised’ with the ‘full ESM’ model (orange) and the ‘full ESM with GAN’ model (green) from the same initial natural backbone 1e1qA01 (CATH domain ID); the RMSD between the two denoised backbones is indicated. Middle: the backbone denoised with the ‘full ESM’ model (orange) superimposed with the AlphaFold2-predicted structure (gray) for the amino acid sequence designed on this backbone by ABACUS-R; the corresponding scRMSD is indicated. Right: the backbone denoised with the ‘full ESM with GAN’ model (green) superimposed with the AlphaFold2-predicted structure (gray) for the amino acid sequence designed on this backbone by ABACUS-R; the corresponding scRMSD is indicated. (e) The distributions of the ABACUS-R logits of different amino acid sequences for 25 natural backbones. The pESM sequences were obtained by projecting the single representation parts from the SCUBA-D output using a residue type classifier network of ESM. The boxplots in a to c and e show median, interquartile range, and minimum and maximum values excluding outliers (>1.5 times the interquartile range beyond the box), with the sample sizes being 75 (for the denoised backbones) or 25 (for the natural backbones).
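
The boxplot convention stated at the end of this legend can be made concrete with a small sketch. The function name and the input values below are hypothetical, and the quartile interpolation rule (Python's "inclusive" method) is an assumption, since plotting libraries differ slightly in how they compute quartiles.

```python
import statistics


def box_stats(values):
    """Median, quartiles and whisker ends; points beyond 1.5 x IQR are outliers."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    # Whiskers extend to the most extreme values that are not outliers.
    inliers = [v for v in values if low_fence <= v <= high_fence]
    outliers = sorted(v for v in values if v < low_fence or v > high_fence)
    return {
        "median": median,
        "q1": q1,
        "q3": q3,
        "whisker_low": min(inliers),
        "whisker_high": max(inliers),
        "outliers": outliers,
    }


# Hypothetical values for illustration only, not data from the paper.
example = box_stats([1, 2, 3, 4, 5, 6, 7, 8, 100])
```

In this toy input the value 100 lies more than 1.5 times the interquartile range above the box, so the upper whisker stops at 8 and 100 is reported separately as an outlier.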

Source data

Extended Data Fig. 2 Comparisons between SCUBA-D and other DDPM models for unconditional backbone generation.

(a) Averaged metrics of various models. For each method, the averages over two groups (one group comprised 100 backbones of 100 residues in chain length and the other group comprised 300 backbones of 200 to 400 residues in chain length) are reported, with data in the parentheses reporting the total number of backbones with scRMSD below 2.5 Å or the total number of backbones of high overall structural novelty (the highest TM-score to PDB below 0.5). (b) Two example backbones of 100 residues (100-9 and 100-7) generated by SCUBA-D without condition and with their highest TM-scores to both PDB and AlphaFold2 database below or equal to 0.5. The generated backbones (in blue) and their superimpositions with the corresponding structures from PDB or AlphaFold2 database (in salmon) are shown. The respective TM-scores and PDB IDs (with chain IDs) are indicated. Here the scRMSD of a generated backbone was determined as the RMSD of the backbone from the AlphaFold2-predicted structure for the amino acid sequence designed (here with ProteinMPNN) for that backbone. (c) The structures of ten example backbones with RoseTTAFold2-based scRMSDs above 6.0 Å. Both the ESM prediction-based scRMSDs and the RoseTTAFold2 prediction-based scRMSDs are indicated.
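
The scRMSD defined in this legend is an RMSD after optimal superposition of the generated backbone onto the predicted structure. The sketch below shows one common way to compute such a value, using the Kabsch algorithm with NumPy. It is an illustration under stated assumptions rather than the authors' pipeline: the function names are hypothetical, the inputs are assumed to be matched (N, 3) arrays of backbone atom coordinates, and the choice of atoms (for example Cα only) is left to the caller.

```python
import numpy as np


def kabsch_rmsd(P, Q):
    """RMSD between two matched (N, 3) coordinate sets after optimal superposition."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    # Remove the translation by centering both sets on their centroids.
    P0 = P - P.mean(axis=0)
    Q0 = Q - Q.mean(axis=0)
    # Covariance matrix and its SVD give the optimal rotation.
    H = P0.T @ Q0
    U, _, Vt = np.linalg.svd(H)
    # Force a proper rotation (determinant +1) so mirror images are not matched.
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    diff = P0 @ R.T - Q0
    return float(np.sqrt((diff ** 2).sum() / len(P0)))


def count_below(sc_rmsds, cutoff=2.5):
    """Number of backbones whose scRMSD is strictly below the cutoff (in Å)."""
    return sum(1 for value in sc_rmsds if value < cutoff)
```

With per-backbone values from kabsch_rmsd in hand, count_below gives the kind of tally reported in parentheses in panel (a), here with the 2.5 Å cutoff named in the legend as the default.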

Source data

Extended Data Fig. 3 Example results of size-exclusion chromatography (SEC) experiments.

Proteins designed in the five different tasks as indicated were analyzed. For each task, three example results are shown in the same row. The protein IDs and the types of the SEC columns are indicated.

Source data

Extended Data Fig. 4 The deviations between the loops in designed structures and in solved crystal structures.

(a) The RMSDs between the loops. The analysis included the 6 crystal structures obtained for proteins of backbones generated by SCUBA-D without condition. Each point corresponds to a loop, with the loops grouped according to their lengths and those of the same length displayed in the same column. The RMSDs were calculated by superimposing the flanking secondary structure segments for a pair of compared loops. An example is shown of the superimposed structures, with the RMSD between the designed loop (blue) and the corresponding crystal structure (orange) indicated. (b) The same as a, but for the 6 experimentally determined structures of proteins generated for particular architectures.
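
The loop comparison described here differs from a global RMSD in that the superposition is fitted on the flanking secondary structure segments only and the deviation is then measured on the loop atoms without refitting. A minimal NumPy sketch of that idea follows. The function names and the index-list interface are hypothetical, and the inputs are assumed to be matched (N, 3) coordinate arrays for the design and the crystal structure.

```python
import numpy as np


def fit_rigid(P, Q):
    """Return (R, t) such that P @ R.T + t best matches Q in the least-squares sense."""
    Pc, Qc = P.mean(axis=0), Q.mean(axis=0)
    H = (P - Pc).T @ (Q - Qc)
    U, _, Vt = np.linalg.svd(H)
    # Keep the transformation a proper rotation (no reflection).
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = Qc - R @ Pc
    return R, t


def loop_rmsd(design, crystal, flank_idx, loop_idx):
    """RMSD over loop atoms after superimposing only the flanking segments."""
    design = np.asarray(design, dtype=float)
    crystal = np.asarray(crystal, dtype=float)
    # Fit the rigid-body transform on the flanking atoms alone.
    R, t = fit_rigid(design[flank_idx], crystal[flank_idx])
    # Apply it to the loop atoms and measure the remaining deviation.
    moved = design[loop_idx] @ R.T + t
    sq_dist = ((moved - crystal[loop_idx]) ** 2).sum(axis=1)
    return float(np.sqrt(sq_dist.mean()))
```

Because the fit ignores the loop atoms, a loop that is displaced relative to well-matched flanks yields a large value even when a global superposition would partly absorb the difference.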

Source data

Extended Data Fig. 5 Protein backbone generation without condition or with biased secondary structure (SS) distributions.

(a) The distribution of the mutual TM-scores among the set of backbones unconditionally generated by SCUBA-D. (b) Histograms of the proportions of residues in the α helix state (upper panel) and of residues in the β strand state (lower panel) for the set of unconditionally generated backbones with SCUBA-D (blue) and for a set of natural protein structures (salmon), which comprised PDB structures of resolutions better than 2.0 Å, of mutual sequence identities below 40%, and of 100 to 500 residues in length. The proportions were calculated on individual backbones. The histograms represent the normalized frequencies of backbones with proportions in specific bins. (c) Scatter plot of the recovery rates of the input secondary structure (SS) states versus the scRMSDs for the set of 225 backbones generated using the 25 input structures. Each input structure was composed according to the SS distribution of a natural backbone. The gray box indicates the region with scRMSD < 2.0 Å and SS recovery rate > 70%. (d) The scRMSDs of the backbones generated with biased SS distributions and the SS recovery rate. For each SS distribution, 9 backbones were generated and evaluated, with each data point in the plots corresponding to one designed backbone. The results for three different classes of SS distributions (all-α, all-β, and mixed αβ) are displayed in different plots. Within each plot, results biased towards the same SS distribution are numbered the same and displayed in the same column. Results for different SS distributions were arranged from left to right in an ascending order of the corresponding chain lengths.
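
The per-backbone quantities in this legend (fraction of residues in a given SS state, SS recovery rate, and membership of the gray box in panel c) are simple to compute once each backbone has a per-residue SS string. The sketch below assumes a three-letter alphabet (H for helix, E for strand, L for everything else) and equal-length strings; these conventions and all function names are assumptions for illustration, since the legend does not specify how SS states were assigned.

```python
def ss_fraction(ss, state):
    """Fraction of residues assigned to one SS state, e.g. 'H' or 'E'."""
    return ss.count(state) / len(ss)


def ss_recovery(target_ss, generated_ss):
    """Fraction of positions where the generated SS state equals the input state."""
    if len(target_ss) != len(generated_ss):
        raise ValueError("SS strings must have the same length")
    matches = sum(a == b for a, b in zip(target_ss, generated_ss))
    return matches / len(target_ss)


def in_gray_box(sc_rmsd, recovery, rmsd_cutoff=2.0, recovery_cutoff=0.70):
    """True if a backbone satisfies scRMSD < 2.0 Å and SS recovery > 70%."""
    return sc_rmsd < rmsd_cutoff and recovery > recovery_cutoff


# Hypothetical 10-residue example, not taken from the paper.
target = "HHHHLLEEEE"
generated = "HHHLLLEEEE"
recovery = ss_recovery(target, generated)
```

In the toy example one helical position is lost, so the recovery rate is 0.9; paired with a hypothetical scRMSD of 1.5 Å, that backbone would fall inside the gray box.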

Source data

Extended Data Fig. 6 Backbone generation with sketched input structures.

(a) Example scatter plots of scRMSD versus TM-score to the initial structure for the backbones generated from initial structures ‘sketched’ according to three architectures of different natural proteins. The examples were of different fold classes (all-α, all-β, and mixed αβ). For each architecture, backbones were generated by applying SCUBA-D to 60 independently ‘sketched’ input structures. The dashed boxes indicate regions with scRMSDs < 2.0 Å and TM-scores to the initial backbones > 0.5. (b) An example for which no generated backbone for the particular architecture meets the criteria of scRMSD < 2.0 Å and TM-score > 0.5. Left: the scatter plot of scRMSD versus the TM-score to the initial backbone for the architecture. Middle: an example of the initial backbone. Right: an example generated backbone.

Source data

Extended Data Fig. 7 Designing proteins of the (αβ)n-barrel and the (β4)n-propeller architectures.

(a) Left: an example initial structure ‘sketched’ according to the (αβ)15-barrel architecture. Right: example backbones (blue) generated for the (αβ)n-barrel architectures superimposed with structures predicted by AlphaFold2 (gray) for amino acid sequences designed for these backbones with ProteinMPNN. The scRMSDs of the superimpositions are indicated. For each value of the repeat number n from 9 to 15, one example is shown. (b) The same as A, but for the (β4)n-propeller architectures with n ranging from 7 to 11. (c) Left: the crystal structure (gold and salmon) and the designed backbone (blue) of the designed (αβ)9-barrel protein T01. The crystal structure presents a domain-swapped dimer, with the monomers colored differently. The designed backbone is superimposed with one of the monomers. Right: the results of SEC (black curve) and static light scattering (red curve) experiments on T01, which indicate that the protein exists in the monomeric state in solution. (d) Left: the crystal structure (gold, yellow, and salmon) and the designed backbone (blue) of the designed (αβ)9-barrel protein T11. The crystal structure presents a domain-swapped trimer, with the monomers colored differently. The designed backbone is superimposed with one of the monomers. Right: the results of SEC (black curve) and static light scattering (red curve) experiments on T11, which indicate that the protein exists in the monomeric state in solution.

Source data

Extended Data Fig. 8 Designed heme-binding proteins.

(a) Scatter plot and histograms of the scRMSD and pLDDT scores of the designed heme-binding backbones. Structure predictions with AlphaFold2 were performed for amino acid sequences designed with the ABACUS-R program. (b) UV-Visible absorbance spectra of 9 designed heme-binding proteins are shown with the topology diagrams of the corresponding proteins. ‘NC’ represents negative control. ‘IsdG’ represents the natural iron-regulated surface determinant G protein, which served as a positive control. Heme binding is indicated by the presence of the peak around 412 nm. (c) Experimental characterizations of the designed heme-binding protein H6. Left: SEC result. Middle: NMR 15N-1H HSQC spectrum. Right: ITC measurements on heme binding. (d) The same as c, but for H8. (e) The result of ITC experiments measuring the KD values of heme binding by the natural protein IsdG. (f) UV-Visible absorbance spectra showing the impacts of mutating the iron-coordinating histidine residues in the natural protein and the designed heme-binding proteins. Each panel shows the spectrum of a mutated protein together with the spectra of the corresponding original protein and of a non-heme-binding negative control protein (labeled as ‘NC’).

Source data

Extended Data Fig. 9 The designed structures and experimental characterizations of the Ras-binding proteins 90-4, 90-2 and 120-4.

(a) The designed proteins (90-4, 90-2 and 120-4, colored in blue) are each superimposed with the corresponding predicted structure (gray) in complex with Ras (green). For each designed protein, the residue to be mutated is shown with its surrounding residues in the predicted structure next to the overall superimposition. The scRMSD and ligand pLDDT are indicated. (b) NMR 15N-1H HSQC spectrum of the Ras-binding protein 90-2. (c) The results of ITC measurements on the Ras binding of 90-4, 90-2 and 120-4 and their mutated variants. (d) The results of ITC measurements on the Ras binding of Raf-RBD and the mutated variant of Raf-RBD (R89L).

Source data

Extended Data Fig. 10 Assessing the designed Ras-binding proteins with the dihydrofolate reductase (DHFR)-based protein complementarity analysis assay.

(a) The protein complementarity analysis results on 14 designed Ras-binding proteins. In these experiments, the peptide chain of DHFR is split into two parts. Ras and the protein to be assessed were separately fused with each part. Bacterial cells expressing the two fused peptides were diluted to different concentrations and titered on media containing different levels of trimethoprim (TMP), which can inhibit the endogenous DHFR activity of the cells. Possible binding between Ras and the protein to be assessed was detected through the resistance of the bacterial cells to the growth inhibition by TMP. The label ‘Raf-RBD’ represents the Ras-binding domain of Raf, which served as a positive control. The label ‘Raf-RBD R89L’ represents a mutant with abolished Ras binding activity, which served as a negative control. The stronger TMP resistance (relative to the negative control) exhibited by the cells expressing fusion peptides of the designed proteins indicated that the designed proteins examined here can bind Ras. (b) Results of competitive DHFR-PCA analysis of 4 designed proteins. In the experiments examining a designed protein, cells co-expressing isolated Raf-RBD and the DHFR-PCA system for the designed protein were analyzed. If the designed protein and Raf-RBD share binding sites on Ras, the expression of Raf-RBD, which is induced by L-arabinose, will lead to the competitive inhibition of the Ras binding of the designed protein, detected as reduced resistance to TMP.

Source data

Supplementary information

Supplementary Information

Supplementary Methods, Fig. 1, Tables 1–11 and References.

Reporting Summary

Peer Review File

Supplementary Data 1

Raw data for Supplementary Fig. 1.

Supplementary Data 2

Partial computational data and crystallographic data for Supplementary Tables 1–4.

Supplementary Data 3

DNA sequences of experimentally characterized proteins.

Source data

Source Data Fig. 2

Statistical source data.

Source Data Fig. 3

Statistical source data.

Source Data Fig. 4

Statistical source data and unprocessed NMR data.

Source Data Fig. 5

Statistical source data.

Source Data Extended Data Fig./Table 1

Statistical source data.

Source Data Extended Data Fig./Table 2

Statistical source data.

Source Data Extended Data Fig./Table 3

Statistical source data.

Source Data Extended Data Fig./Table 4

Statistical source data.

Source Data Extended Data Fig./Table 5

Statistical source data.

Source Data Extended Data Fig./Table 6

Statistical source data.

Source Data Extended Data Fig./Table 7

Statistical source data.

Source Data Extended Data Fig./Table 8

Statistical source data and unprocessed NMR data.

Source Data Extended Data Fig./Table 9

Statistical source data and unprocessed NMR data.

Source Data Extended Data Fig./Table 10

Unprocessed images.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Liu, Y., Wang, S., Dong, J. et al. De novo protein design with a denoising diffusion network independent of pretrained structure prediction models. Nat Methods 21, 2107–2116 (2024). https://doi.org/10.1038/s41592-024-02437-w

  • DOI: https://doi.org/10.1038/s41592-024-02437-w
