De novo protein design with a denoising diffusion network independent of pretrained structure prediction models

Liu, Yufeng; Wang, Sheng; Dong, Jixin; Chen, Linghui; Wang, Xinyu; Wang, Lei; Li, Fudong; Wang, Chenchen; Zhang, Jiahai; Wang, Yuzhu; Wei, Si; Chen, Quan; Liu, Haiyan

doi:10.1038/s41592-024-02437-w

Article
Published: 09 October 2024

Yufeng Liu^1,2,na1,
Sheng Wang^1,2,na1,
Jixin Dong^1,2 (ORCID: orcid.org/0009-0003-1268-8367),
Linghui Chen^3 (ORCID: orcid.org/0009-0009-4247-4128),
Xinyu Wang^1,2,
Lei Wang^2,
Fudong Li^2,4,
Chenchen Wang^2 (ORCID: orcid.org/0000-0002-3618-1670),
Jiahai Zhang^2,4,
Yuzhu Wang^2,
Si Wei^5,
Quan Chen^1,2,3,4 (ORCID: orcid.org/0000-0002-3301-3065) &
Haiyan Liu^2,3,4,6 (ORCID: orcid.org/0000-0002-5926-820X)

Nature Methods volume 21, pages 2107–2116 (2024)

Abstract

The recent success of RFdiffusion, a method for protein structure design with a denoising diffusion probabilistic model, has relied on fine-tuning the RoseTTAFold structure prediction network for protein backbone denoising. Here, we introduce SCUBA-diffusion (SCUBA-D), a protein backbone denoising diffusion probabilistic model freshly trained by considering co-diffusion of sequence representation to enhance model regularization and adversarial losses to minimize data-out-of-distribution errors. While matching the performance of the pretrained RoseTTAFold-based RFdiffusion in generating experimentally realizable protein structures, SCUBA-D readily generates protein structures with not-yet-observed overall folds that are different from those predictable with RoseTTAFold. The accuracy of SCUBA-D was confirmed by the X-ray structures of 16 designed proteins and a protein complex, and by experiments validating designed heme-binding proteins and Ras-binding proteins. Our work shows that deep generative models of images or texts can be fruitfully extended to complex physical objects like protein structures by addressing outstanding issues such as the data-out-of-distribution errors.

**Fig. 1: SCUBA-D uses a denoising diffusion network trained with adversarial losses to generate designable protein backbone structures.**

**Fig. 2: Structure generation without condition or with biased secondary structure distributions.**

**Fig. 3: Generating protein structures with sketched inputs.**

**Fig. 4: SCUBA-D for designing small-molecule-binding proteins.**

**Fig. 5: SCUBA-D for designing protein-binding proteins.**

Data availability

Protein structures for training the models were downloaded from the PDB. The experimentally solved protein structures were deposited in the PDB under accession codes: 8K7Z (N1), 8K83 (N2), 8K84 (N3), 8KCJ (N7), 8KCK (N9), 8K8I (N14), 8KC4 (NA5), 8KA6 (NA7), 8KA7 (NB7), 8KC0 (NB8), 8KAC (NX1), 8KC1 (NX5), 8K7M (T01), 8KDQ (T03), 8WX8 (T09), 8KC8 (T11) and 8WWC (120-4). We referenced the structures 2ZDO and 4G0N from the PDB for the design of heme-binding proteins and Ras-binding proteins, respectively. The amino acid sequences and encoding DNA sequences of the experimentally examined proteins are available in Supplementary Tables 6–10 and Supplementary Data 1–3. The complete lists of proteins for training and testing the models, the data of experimental results (SEC, multi-angle light scattering, ¹⁵N-¹H HSQC NMR, ITC, CD, validation reports for experimentally solved protein structures) and all in silico experimental results are available from Zenodo via https://doi.org/10.5281/zenodo.10911626 (ref. ⁴⁵). Source data are provided with this paper.

Code availability

Executable computer programs and source codes of SCUBA-D (version 1.0) and SCUBA-sketch (version 1.0) are publicly available from Zenodo via https://doi.org/10.5281/zenodo.10947360 (ref. ⁴⁶) and can be freely used for noncommercial purposes. The source codes for SCUBA-D are also available from GitHub at https://github.com/liuyf020419/SCUBA-D.git/.

References

Huang, P. S., Boyken, S. E. & Baker, D. The coming of age of protein design. Nature 537, 320–327 (2016).
Kuhlman, B. et al. Design of a novel globular protein fold with atomic-level accuracy. Science 302, 1364–1368 (2003).
Polizzi, N. F. & DeGrado, W. F. A defined structural unit enables de novo design of small-molecule–binding proteins. Science 369, 1227–1233 (2020).
Gainza, P. et al. De novo design of protein interactions with learned surface fingerprints. Nature 617, 176–184 (2023).
Yeh, A. H.-W. et al. De novo design of luciferases using deep learning. Nature 614, 774–780 (2023).
Li, H., Helling, R., Tang, C. & Wingreen, N. Emergence of preferred structures in a simple model of protein folding. Science 273, 666–669 (1996).
Kuhlman, B. & Baker, D. Native protein sequences are close to optimal for their structures. Proc. Natl Acad. Sci. USA 97, 10383–10388 (2000).
Grigoryan, G. & DeGrado, W. F. Probing designability via a generalized model of helical bundle geometry. J. Mol. Biol. 405, 1079–1100 (2011).
Huang, B. et al. A backbone-centred energy function of neural networks for protein design. Nature 602, 523–528 (2022).
Anishchenko, I. et al. De novo protein design by deep network hallucination. Nature 600, 547–552 (2021).
Eguchi, R. R., Choe, C. A. & Huang, P.-S. Ig-VAE: generative modeling of protein structure by direct 3D coordinate generation. PLoS Comput. Biol. 18, e1010271 (2022).
Lee, J. S., Kim, J. & Kim, P. M. Score-based generative modeling for de novo protein design. Nat. Comput. Sci. 3, 382–392 (2023).
Watson, J. L. et al. De novo design of protein structure and function with RFdiffusion. Nature 620, 1089–1100 (2023).
Ho, J., Jain, A. & Abbeel, P. Denoising diffusion probabilistic models. Adv. Neural Inf. Process. Syst. 33, 6840–6851 (2020).
Chen, N. et al. Wavegrad: estimating gradients for waveform generation. In Proc. International Conference on Learning Representations (ICLR, 2021).
Ramesh, A., Dhariwal, P., Nichol, A., Chu, C. & Chen, M. Hierarchical text-conditional image generation with clip latents. Preprint at https://arXiv.org/abs/2204.06125 (2022).
Baek, M. et al. Accurate prediction of protein structures and interactions using a three-track neural network. Science 373, 871–876 (2021).
Anand, N. & Achim, T. Protein structure and sequence generation with equivariant denoising diffusion probabilistic models. Preprint at https://arXiv.org/abs/2205.15019 (2022).
Wu, K. E. et al. Protein structure generation via folding diffusion. Nat. Commun. 15, 1059 (2024).
Trippe, B. L. et al. Diffusion probabilistic modeling of protein backbones in 3d for the motif-scaffolding problem. In Proc. International Conference on Learning Representations (ICLR, 2023).
Ingraham, J. et al. Illuminating protein space with a programmable generative model. Nature 623, 1070–1078 (2023).
Yim, J. et al. SE(3) diffusion model with application to protein backbone generation. In Proc. International Conference on Machine Learning (ICML, 2023).
Zhao, H., Gallo, O., Frosio, I. & Kautz, J. Loss functions for image restoration with neural networks. IEEE Trans. Comput. Imaging 3, 47–57 (2016).
Blau, Y., Mechrez, R., Timofte, R., Michaeli, T. & Zelnik-Manor, L. The 2018 PIRM challenge on perceptual image super-resolution. In Proc. the European Conference on Computer Vision (ECCV) Workshops (eds Leal-Taixé, L. & Roth, S.) 334–355 (2019).
Goodfellow, I. et al. Generative adversarial networks. Commun. ACM 63, 139–144 (2020).
Liu, Y. et al. Rotamer-free protein sequence design based on deep learning and self-consistency. Nat. Comput. Sci. 2, 451–462 (2022).
Dauparas, J. et al. Robust deep learning-based protein sequence design using ProteinMPNN. Science 378, 49–56 (2022).
Popov, V., Vovk, I., Gogoryan, V., Sadekova, T. & Kudinov, M. Grad-TTS: a diffusion probabilistic model for text-to-speech. In Proc. International Conference on Machine Learning 8599–8608 (PMLR, 2021).
Lee, S.-g. et al. PriorGrad: Improving conditional denoising diffusion models with data-dependent adaptive prior. In International Conference on Learning Representations (ICLR, 2022).
Jumper, J. et al. Highly accurate protein structure prediction with AlphaFold. Nature 596, 583–589 (2021).
Rives, A. et al. Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences. Proc. Natl Acad. Sci. USA 118, e2016239118 (2021).
Sillitoe, I. et al. CATH: increased structural coverage of functional space. Nucleic Acids Res. 49, D266–D273 (2021).
Zhang, Y. & Skolnick, J. Scoring function for automated assessment of protein structure template quality. Proteins 57, 702–710 (2004).
Xu, J. & Zhang, Y. How significant is a protein structure similarity with TM-score = 0.5? Bioinformatics 26, 889–895 (2010).
Lin, Y. & AlQuraishi, M. Generating Novel, Designable, and Diverse Protein Structures by Equivariantly Diffusing Oriented Residue Clouds. In Proc. International Conference on Machine Learning (ICML, 2023).
van der Maaten, L. & Hinton, G. Visualizing data using t-SNE. J. Mach. Learn Res. 9, 2579–2605 (2008).
Wang, J. et al. Scaffolding protein functional sites using deep learning. Science 377, 387–394 (2022).
Lee, W. C., Reniere, M. L., Skaar, E. P. & Murphy, M. E. Ruffling of metalloporphyrins bound to IsdG and IsdI, two heme-degrading enzymes in Staphylococcus aureus. J. Biol. Chem. 283, 30957–30963 (2008).
Skaar, E. P., Gaspar, A. H. & Schneewind, O. IsdG and IsdI, heme-degrading enzymes in the cytoplasm of Staphylococcus aureus. J. Biol. Chem. 279, 436–443 (2004).
Fetics, S. K. et al. Allosteric effects of the oncogenic RasQ61L mutant on Raf-RBD. Structure 23, 505–516 (2015).
Evans, R. et al. Protein complex prediction with AlphaFold-Multimer. Preprint at bioRxiv https://doi.org/10.1101/2021.10.04.463034 (2021).
Remy, I., Campbell-Valois, F. & Michnick, S. W. Detection of protein–protein interactions using a simple survival protein–fragment complementation assay based on the enzyme dihydrofolate reductase. Nat. Protoc. 2, 2120–2125 (2007).
Jing, B., Eismann, S., Suriana, P., Townshend, R. J. & Dror, R. Learning from protein structure with geometric vector perceptrons. In Proc. International Conference on Learning Representations (ICLR, 2021).
Wang, G. & Dunbrack, R. L. Jr PISCES: a protein sequence culling server. Bioinformatics 19, 1589–1591 (2003).
Wang, S. Source data for manuscript: de novo protein design with a denoising diffusion network independent of pre-trained structure prediction models. Zenodo https://doi.org/10.5281/zenodo.10911626 (2024).
Wang, S. De novo protein design with a denoising diffusion network independent of pre-trained structure prediction models. Zenodo https://doi.org/10.5281/zenodo.10947360 (2024).

Acknowledgements

We thank the staff from the BL18U1 and BL19U1 beamlines of the National Facility for Protein Science in Shanghai for their assistance during crystallographic data collection. We also thank X. Hu, R. Wu and L. Zhang for their help with experimental techniques, as well as M. Lv and H. Yu for their help with crystal collection. This work was supported by the National Key R&D Program of China (2022YFA1303700 to H.L. and 2022YFF1203100 to Q.C.), National Natural Science Foundation of China (T2221005, 92253302 and 22177107 to H.L.; 32371487 and 32171411 to Q.C.), CAS Strategic Priority Research Program (XDB0500201 to H.L.), CAS Project for Young Scientists in Basic Research (YSBR-072 to Q.C.), Anhui Provincial Natural Science Foundation (2308085J01 to Q.C.) and Research Funds of Center for Advanced Interdisciplinary Science and Biomedicine of IHM (QYPY20230035 to Q.C.). The funders had no role in study design, data collection and analysis, decision to publish or preparation of the manuscript.

Author information

These authors contributed equally: Yufeng Liu, Sheng Wang.

Authors and Affiliations

Department of Rheumatology and Immunology, The First Affiliated Hospital of USTC, Division of Life Sciences and Medicine, Hefei National Research Center for Physical Sciences at the Microscale, Center for Advanced Interdisciplinary Science and Biomedicine of IHM, University of Science and Technology of China, Hefei, China
Yufeng Liu, Sheng Wang, Jixin Dong, Xinyu Wang & Quan Chen
MOE Key Laboratory for Membraneless Organelles and Cellular Dynamics, School of Life Sciences, Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei, China
Yufeng Liu, Sheng Wang, Jixin Dong, Xinyu Wang, Lei Wang, Fudong Li, Chenchen Wang, Jiahai Zhang, Yuzhu Wang, Quan Chen & Haiyan Liu
Oristruct Biotech Co. Ltd, Hefei, China
Linghui Chen, Quan Chen & Haiyan Liu
Biomedical Sciences and Health Laboratory of Anhui Province, Anhui Basic Discipline Research Center of Artificial Intelligence Biotechnology and Synthetic Biology, University of Science and Technology of China, Hefei, China
Fudong Li, Jiahai Zhang, Quan Chen & Haiyan Liu
iFLYTEK Research, Hefei, China
Si Wei
School of Biomedical Engineering, Suzhou Institute for Advanced Research, University of Science and Technology of China, Hefei, China
Haiyan Liu

Contributions

Y.L. developed computational models and codes with the assistance of L.C. S.W. carried out the experimental work with the help of J.D., X.W. and Y.W. L.W., F.L. and C.W. helped with analysis of crystal structural data. J.Z. collected the NMR data. S.W. participated in the discussion. H.L. and Q.C. supervised the project. H.L., Y.L., S.W. and Q.C. wrote the paper.

Corresponding authors

Correspondence to Quan Chen or Haiyan Liu.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Methods thanks Arne Elofsson and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available. Primary Handling Editor: Arunima Singh, in collaboration with the Nature Methods team.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Evaluation of the variant models ‘no ESM’, ‘compressed ESM’, ‘full ESM’ and ‘full ESM with GAN’.

(a) Distributions of TM-scores and RMSDs between the initial natural backbones and the denoised backbones generated by the variant models. For each model, 75 protein backbones were generated by considering 3 independent ‘denoising’ runs from each of 25 initial natural backbones. (b) Distributions of scTM-scores and scRMSDs between the denoised backbones and the AlphaFold2-predicted structures for amino acid sequences designed (with ABACUS-R) on corresponding denoised backbones. (c) Distributions of per-residue ABACUS-R logits scores of amino acid sequences designed by ABACUS-R for the various denoised backbones and for the initial natural backbones. Larger logits scores indicate better compatibility between the designed sequences and the corresponding backbone structures. (d) Left: backbones ‘denoised’ with the ‘full ESM’ model (orange) and the ‘full ESM with GAN’ model (green) from the same initial natural backbone 1e1qA01 (CATH domain ID); the RMSD between the two denoised backbones is indicated. Middle: the backbone denoised with the ‘full ESM’ model (orange) superimposed with the AlphaFold2-predicted structure (gray) for the amino acid sequence designed on this backbone by ABACUS-R; the corresponding scRMSD is indicated. Right: the backbone denoised with the ‘full ESM with GAN’ model (green) superimposed with the AlphaFold2-predicted structure (gray) for the amino acid sequence designed on this backbone by ABACUS-R; the corresponding scRMSD is indicated. (e) The distributions of the ABACUS-R logits of different amino acid sequences for 25 natural backbones. The pESM sequences were obtained by projecting the single representation parts from the SCUBA-D output using a residue type classifier network of ESM. The boxplots in a to c and e show median, interquartile range, and minimum and maximum values excluding outliers (>1.5 times the interquartile range beyond the box) with the sample sizes being 75 (for the denoised backbones) or 25 (for the natural backbones).
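
The boxplot convention stated in this caption (median, interquartile range, and whiskers at the most extreme values that are not outliers) can be illustrated with a minimal Python sketch. The function name `boxplot_summary` and the use of NumPy's default linear-interpolation percentiles are assumptions made for illustration; the plotting software actually used by the authors is not stated here and may compute quartiles slightly differently.

```python
import numpy as np

def boxplot_summary(values):
    """Summarize a sample the way the caption describes the boxplots.

    Points lying more than 1.5 * IQR beyond the box (below Q1 or above Q3)
    are treated as outliers and excluded from the whisker ends.
    `boxplot_summary` is a hypothetical helper, not part of SCUBA-D.
    """
    x = np.asarray(values, dtype=float)
    # Quartiles with NumPy's default (linear interpolation) method.
    q1, median, q3 = np.percentile(x, [25, 50, 75])
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    inliers = x[(x >= lower_fence) & (x <= upper_fence)]
    return {
        "median": float(median),
        "q1": float(q1),
        "q3": float(q3),
        "whisker_min": float(inliers.min()),
        "whisker_max": float(inliers.max()),
        "n_outliers": int(x.size - inliers.size),
    }

if __name__ == "__main__":
    # Toy values only; these are not data from the paper.
    print(boxplot_summary([1, 2, 3, 4, 5, 100]))
```

With the toy input above, the value 100 falls beyond the upper fence, so it is counted as an outlier and the upper whisker ends at 5.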

Extended Data Fig. 2 Comparisons between SCUBA-D and other DDPM models for unconditional backbone generation.

(a) Averaged metrics of various models. For each method, the averages over two groups (one group comprised 100 backbones of 100 residues in chain length and the other group comprised 300 backbones of 200 to 400 residues in chain length) are reported, with data in the parentheses reporting the total number of backbones with scRMSD below 2.5 Å or the total number of backbones of high overall structural novelty (the highest TM-score to PDB below 0.5). (b) Two example backbones of 100 residues (100-9 and 100-7) generated by SCUBA-D without condition and with their highest TM-scores to both PDB and AlphaFold2 database below or equal to 0.5. The generated backbones (in blue) and their superimpositions with the corresponding structures from PDB or AlphaFold2 database (in salmon) are shown. The respective TM-scores and PDB IDs (with chain IDs) are indicated. Here the scRMSD of a generated backbone was determined as the RMSD of the backbone from the AlphaFold2-predicted structure for the amino acid sequence designed (here with ProteinMPNN) for that backbone. (c) The structures of ten example backbones with RoseTTAFold2-based scRMSDs above 6.0 Å. Both the ESM prediction-based scRMSDs and the RoseTTAFold2 prediction-based scRMSDs are indicated.
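
As a rough illustration of the two counts reported in panel (a), the sketch below computes an RMSD after optimal rigid-body superposition (the standard Kabsch algorithm) and then counts backbones under the stated cutoffs (scRMSD below 2.5 Å, highest TM-score to PDB below 0.5). It is a sketch under assumptions, not the authors' pipeline: the function names `kabsch_rmsd` and `count_hits` are hypothetical, the inputs are assumed to be matched coordinate arrays of equal length, and the TM-scores are assumed to have been computed elsewhere with a dedicated tool.

```python
import numpy as np

def kabsch_rmsd(P, Q):
    """RMSD between two (N, 3) coordinate sets after optimal superposition.

    Uses the Kabsch algorithm: center both sets, take the SVD of the
    covariance matrix, and correct for a possible reflection.
    """
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    if P.shape != Q.shape or P.ndim != 2 or P.shape[1] != 3:
        raise ValueError("P and Q must both have shape (N, 3)")
    P = P - P.mean(axis=0)
    Q = Q - Q.mean(axis=0)
    H = P.T @ Q                          # 3x3 covariance matrix
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so that R is a proper rotation.
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    diff = P @ R.T - Q                   # rotate P onto Q
    return float(np.sqrt((diff ** 2).sum() / P.shape[0]))

def count_hits(sc_rmsds, max_tm_to_pdb, rmsd_cutoff=2.5, tm_cutoff=0.5):
    """Count backbones passing each criterion named in the caption."""
    sc = np.asarray(sc_rmsds, dtype=float)
    tm = np.asarray(max_tm_to_pdb, dtype=float)
    return {
        "n_sc_rmsd_below_cutoff": int((sc < rmsd_cutoff).sum()),
        "n_novel": int((tm < tm_cutoff).sum()),
    }

if __name__ == "__main__":
    # Toy values only; these are not data from the paper.
    print(count_hits([1.0, 3.0, 2.4], [0.4, 0.6, 0.5]))
```

In practice the coordinates passed to `kabsch_rmsd` would be the same atom selection (for example, Cα atoms) taken from the generated backbone and from the predicted structure; which atoms the authors used is not specified in this caption.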

Extended Data Fig. 3 Example results of size-exclusion chromatography (SEC) experiments.

Proteins designed in the five different tasks as indicated were analyzed. For each task, three example results are shown in the same row. The protein IDs and the types of the SEC columns are indicated.

Extended Data Fig. 4 The deviations between the loops in designed structures and in solved crystal structures.

(a) The RMSDs between the loops. The analysis included the 6 crystal structures obtained for proteins of backbones generated by SCUBA-D without condition. Each point corresponds to a loop, with the loops grouped according to their lengths and those of the same length displayed in the same column. The RMSDs were calculated by superimposing the flanking secondary structure segments for a pair of compared loops. An example is shown of the superimposed structures, with the RMSD indicated, for a designed loop (blue) and the corresponding loop in the crystal structure (orange). (b) The same as a, but for the 6 experimentally determined structures of proteins generated for particular architectures.

Extended Data Fig. 5 Protein backbone generation without condition or with biased secondary structure (SS) distributions.

(a) The distribution of the mutual TM-scores among the backbones unconditionally generated by SCUBA-D. (b) Histograms of the proportions of residues in the α helix state (upper panel) and of residues in the β strand state (lower panel) for the set of unconditionally generated backbones with SCUBA-D (blue) and for a set of natural protein structures (salmon), which comprised PDB structures of resolutions higher than 2.0 Å, of mutual sequence identities below 40%, and of 100 to 500 residues in length. The proportions were calculated on individual backbones. The histograms represent the normalized frequencies of backbones with proportions in specific bins. (c) Scatter plot of the recovery rates of the input secondary structure (SS) states versus the scRMSDs for the set of 225 backbones generated using the 25 input structures. Each input structure was composed according to the SS distribution of a natural backbone. The gray box indicates the region with scRMSD < 2.0 Å and SS recovery rate > 70%. (d) The scRMSDs of the backbones generated with biased SS distributions and the SS recovery rate. For each SS distribution, 9 backbones were generated and evaluated, one data point in the plots corresponding to one designed backbone. The results for three different classes of SS distributions (all-α, all-β, and mixed αβ) are displayed in different plots. Within each plot, results biased towards the same SS distribution are numbered the same and displayed in the same column. Results for different SS distributions were arranged from left to right in an ascending order of the corresponding chain lengths.
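
The gray-box criterion in panel (c) (scRMSD < 2.0 Å and SS recovery rate > 70%) can be written as a small filter. The sketch below assumes that the SS recovery rate is the per-residue fraction of positions whose secondary structure state in the generated backbone matches the input state, and that states are given as equal-length strings of one-letter codes (here H, E and L); both are assumptions for illustration, as the exact definition is given in the paper's methods rather than in this caption. The function names are hypothetical.

```python
def ss_recovery_rate(input_ss, generated_ss):
    """Fraction of residues whose SS state matches the input SS state.

    Both arguments are strings of one-letter SS codes (assumed here:
    H = helix, E = strand, L = loop) with one character per residue.
    """
    if len(input_ss) != len(generated_ss):
        raise ValueError("SS strings must have the same length")
    if not input_ss:
        raise ValueError("SS strings must not be empty")
    matches = sum(a == b for a, b in zip(input_ss, generated_ss))
    return matches / len(input_ss)

def in_gray_box(sc_rmsd, recovery, max_sc_rmsd=2.0, min_recovery=0.70):
    """True if a backbone falls in the region marked by the gray box."""
    return sc_rmsd < max_sc_rmsd and recovery > min_recovery

if __name__ == "__main__":
    # Toy strings only; these are not data from the paper.
    rate = ss_recovery_rate("HHHHEEEELL", "HHHHEEELLL")
    print(rate, in_gray_box(1.5, rate))
```

For the toy strings above, nine of ten positions agree, so the recovery rate is 0.9 and a backbone with an scRMSD of 1.5 Å would fall inside the box.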

Extended Data Fig. 6 Backbone generation with sketched input structures.

(a) Example scatter plots of scRMSD versus TM-score to the initial structure for the backbones generated from initial structures ‘sketched’ according to three architectures of different natural proteins. The examples were of different fold classes (all-α, all-β, and mixed αβ). For each architecture, backbones were generated by applying SCUBA-D to 60 independently ‘sketched’ input structures. The dashed boxes indicate regions with scRMSDs < 2.0 Å and TM-scores to the initial backbones > 0.5. (b) An example for which no generated backbone for the particular architecture meets the criteria of scRMSD < 2.0 Å and TM-score > 0.5. Left: the scatter plot of scRMSD versus the TM-score to the initial backbone for the architecture. Middle: an example of the initial backbone. Right: an example generated backbone.

Extended Data Fig. 7 Designing proteins of the (αβ)ₙ-barrel and the (β₄)ₙ-propeller architectures.

(a) Left: an example initial structure ‘sketched’ according to the (αβ)₁₅-barrel architecture. Right: example backbones (blue) generated for the (αβ)ₙ-barrel architectures superimposed with structures predicted by AlphaFold2 (gray) for amino acid sequences designed for these backbones with ProteinMPNN. The scRMSDs of the superimpositions are indicated. For each value of the repeat number n from 9 to 15, one example is shown. (b) The same as a, but for the (β₄)ₙ-propeller architectures with n ranging from 7 to 11. (c) Left: the crystal structure (gold and salmon) and the designed backbone (blue) of the designed (αβ)₉-barrel protein T01. The crystal structure presents a domain-swapped dimer, with the monomers colored differently. The designed backbone is superimposed with one of the monomers. Right: the results of SEC (black curve) and static light scattering (red curve) experiments on T01, which indicate that the protein exists in the monomeric state in solution. (d) Left: the crystal structure (gold, yellow, and salmon) and the designed backbone (blue) of the designed (αβ)₉-barrel protein T11. The crystal structure presents a domain-swapped trimer, with the monomers colored differently. The designed backbone is superimposed with one of the monomers. Right: the results of SEC (black curve) and static light scattering (red curve) experiments on T11, which indicate that the protein exists in the monomeric state in solution.

Extended Data Fig. 8 Designed heme-binding proteins.

(a) Scatter plot and histograms of the scRMSD and pLDDT scores of the designed heme-binding backbones. Structure predictions with AlphaFold2 were performed for amino acid sequences designed with the ABACUS-R program. (b) UV-Visible absorbance spectra of 9 designed heme-binding proteins are shown with the topology diagrams of the corresponding proteins. ‘NC’ represents negative control. ‘IsdG’ represents the natural iron-regulated surface determinant G protein which served as a positive control. Heme binding is indicated by the presence of the peak around 412 nm. (c) Experimental characterizations of the designed heme-binding protein H6. Left: SEC result. Middle: NMR ¹⁵N-¹H HSQC spectrum. Right: ITC measurements on heme binding. (d) The same as c, but for H8. (e) The result of ITC experiments measuring the K_D values of heme binding by the natural protein IsdG. (f) UV-Visible absorbance spectra showing the impacts of mutating the iron-coordinating histidine residues in the natural protein and the designed heme-binding proteins. Each panel shows the spectrum of a mutated protein together with the spectra of the corresponding original protein and of a non-heme-binding negative control protein (labeled as ‘NC’).

Extended Data Fig. 9 The designed structures and experimental characterizations of the Ras-binding proteins 90-4, 90-2 and 120-4.

(a) The designed proteins (90-4, 90-2 and 120-4, colored in blue) are correspondingly superimposed with the predicted structures (gray) in complex with Ras (green). For each designed protein, the residues to be mutated are shown with their surrounding residues in the predicted structure next to the overall superimposition. The scRMSD and ligand pLDDT are indicated. (b) NMR ¹⁵N-¹H HSQC spectrum of the Ras-binding protein 90-2. (c) The results of ITC measurements on the Ras binding of 90-4, 90-2 and 120-4 and their mutated variants. (d) The results of ITC measurements on the Ras binding of Raf-RBD and the mutated variant of Raf-RBD (R89L).

Extended Data Fig. 10 Assessing the designed Ras-binding proteins with the dihydrofolate reductase (DHFR)-based protein complementarity analysis assay.

(a) The protein complementarity analysis results on 14 designed Ras-binding proteins. In these experiments, the peptide chain of DHFR is split into two parts. Ras and the protein to be assessed were separately fused with each part. Bacterial cells expressing the two fused peptides were diluted to different concentrations and titered on media containing different levels of trimethoprim (TMP), which can inhibit the endogenous DHFR activity of the cells. Possible binding between Ras and the protein to be assessed was detected through the resistance of the bacterial cells to the growth inhibition by TMP. The label ‘Raf-RBD’ represents the Ras-binding domain of Raf, which served as a positive control. The label ‘Raf-RBD R89L’ represents a mutant with abolished Ras binding activity, which served as a negative control. The stronger TMP resistance (relative to the negative control) exhibited by the cells expressing fusion peptides of the designed proteins indicated that the designed proteins examined here can bind Ras. (b) Results of competitive DHFR-PCA analysis of 4 designed proteins. In the experiments examining a designed protein, cells co-expressing isolated Raf-RBD and the DHFR-PCA system for the designed protein were analyzed. If the designed protein and Raf-RBD share binding sites on Ras, the expression of Raf-RBD, which is induced by L-arabinose, will lead to the competitive inhibition of the Ras binding of the designed protein, detected as reduced resistance to TMP.

Supplementary information

Supplementary Information

Supplementary Methods, Fig. 1, Tables 1–11 and References.

Reporting Summary

Peer Review File

Supplementary Data 1

Raw data for Supplementary Fig. 1.

Supplementary Data 2

Partial computational data and crystallographic data for Supplementary Tables 1–4.

Supplementary Data 3

DNA sequences of experimentally characterized proteins.

Source data

Source Data Fig. 2

Statistical source data.

Source Data Fig. 3

Statistical source data.

Source Data Fig. 4

Statistical source data and unprocessed NMR data.

Source Data Fig. 5

Statistical source data.

Source Data Extended Data Fig./Table 1

Statistical source data.

Source Data Extended Data Fig./Table 2

Statistical source data.

Source Data Extended Data Fig./Table 3

Statistical source data.

Source Data Extended Data Fig./Table 4

Statistical source data.

Source Data Extended Data Fig./Table 5

Statistical source data.

Source Data Extended Data Fig./Table 6

Statistical source data.

Source Data Extended Data Fig./Table 7

Statistical source data.

Source Data Extended Data Fig./Table 8

Statistical source data and unprocessed NMR data.

Source Data Extended Data Fig./Table 9

Statistical source data and unprocessed NMR data.

Source Data Extended Data Fig./Table 10

Unprocessed images.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Liu, Y., Wang, S., Dong, J. et al. De novo protein design with a denoising diffusion network independent of pretrained structure prediction models. Nat Methods 21, 2107–2116 (2024). https://doi.org/10.1038/s41592-024-02437-w

Received: 28 November 2023
Accepted: 30 August 2024
Published: 09 October 2024
Version of record: 09 October 2024
Issue date: November 2024
DOI: https://doi.org/10.1038/s41592-024-02437-w
