Fig. 2: Outstanding performance of RFdiffusion for monomer generation.
From: De novo design of protein structure and function with RFdiffusion

a, RFdiffusion can generate new monomeric proteins of different lengths (left 300, right 600) with no conditioning information. Grey, design model; colours, AF2 prediction. r.m.s.d. AF2 versus design (Å), left to right: 0.90, 0.98, 1.15, 1.67. b, Unconditional designs from RFdiffusion are new and not present in the training set as quantified by highest TM-score to the PDB; the divergence from previously known structures increases with length. c, Unconditional samples are closely repredicted by AF2 up to about 400 amino acids. d, RFdiffusion significantly outperforms Hallucination (with RF) at unconditional monomer generation (two-proportion z-test of in silico success: n = 400 designs per condition, z = 9.5, P = 1.6 × 10⁻²¹). Although Hallucination successfully generates designs up to 100 amino acids in length, in silico success rates rapidly deteriorate beyond this length. e, Ablating pretraining (by starting from untrained RF), RFdiffusion fine-tuning (that is, using original RF structure prediction weights as the denoiser), self-conditioning or m.s.e. losses (by training with FAPE) each notably decrease the performance of RFdiffusion. r.m.s.d. between design and AF2 is shown, for the unconditional generation of 300 amino acid proteins (Supplementary Methods). f, Two example 300 amino acid proteins that expressed as soluble monomers. Designs (grey) overlaid with AF2 predictions (colours) are shown on the left, alongside circular dichroism (CD) spectra (top) and melt curves (bottom) on the right. The designs are highly thermostable. g, RFdiffusion can condition on fold information. An example TIM barrel is shown (bottom left), conditioned on the secondary structure and block adjacency of a previously designed TIM barrel, PDB 6WVS (top left). Designs have very similar circular dichroism spectra to PDB 6WVS (top right) and are highly thermostable (bottom right). See also Extended Data Fig. 3 for further traces.
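The two-proportion z-test quoted for panel d can be sketched as follows. This is a minimal illustration of the standard pooled-variance form of the test with a two-sided P value, which is an assumption: the caption does not state which variant or sidedness the authors used. The function names and the success counts in the usage note are hypothetical and are not taken from the paper.

```python
import math


def two_sided_p_from_z(z: float) -> float:
    """Two-sided normal-tail P value for a z statistic.

    erfc keeps precision for very small tails, where 1 - cdf would underflow.
    """
    return math.erfc(abs(z) / math.sqrt(2.0))


def two_proportion_z_test(success_a: int, n_a: int, success_b: int, n_b: int):
    """Pooled two-proportion z-test (assumed variant). Returns (z, two-sided P)."""
    p_a = success_a / n_a
    p_b = success_b / n_b
    # Pooled success rate under the null hypothesis of equal proportions.
    p_pool = (success_a + success_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1.0 - p_pool) * (1.0 / n_a + 1.0 / n_b))
    if se == 0.0:
        raise ValueError("standard error is zero; proportions are degenerate")
    z = (p_a - p_b) / se
    return z, two_sided_p_from_z(z)


if __name__ == "__main__":
    # Hypothetical counts for illustration only (NOT the paper's data).
    z, p = two_proportion_z_test(200, 400, 100, 400)
    print(f"z = {z:.2f}, P = {p:.2e}")
    # Order-of-magnitude check on the rounded statistic reported in the caption.
    print(f"P for z = 9.5: {two_sided_p_from_z(9.5):.1e}")
```

Under these assumptions, a rounded z of 9.5 gives a two-sided P of order 10⁻²¹, the same order as the reported value; an exact match is not expected because the reported z is rounded.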
Boxplots represent median ± interquartile range; tails are minimum and maximum excluding outliers (±1.5× interquartile range).
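The boxplot convention described above can be sketched as follows. This is a minimal illustration, assuming the common rule in which points beyond 1.5× the interquartile range from the quartiles are treated as outliers; the quartile interpolation method (`method="inclusive"`) and the function name `boxplot_summary` are assumptions, since the caption does not specify how quartiles were computed.

```python
import statistics


def boxplot_summary(values):
    """Summarise data the way the caption describes the boxplots.

    Box: first quartile, median, third quartile.
    Whiskers: minimum and maximum of the points that are not outliers.
    Outliers: points beyond 1.5 x IQR from the box (assumed convention).
    """
    data = sorted(values)
    # Quartile interpolation is an assumption; plotting libraries differ here.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    inliers = [v for v in data if lower_fence <= v <= upper_fence]
    outliers = [v for v in data if v < lower_fence or v > upper_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": min(inliers),
        "whisker_high": max(inliers),
        "outliers": outliers,
    }


if __name__ == "__main__":
    # Made-up values for illustration only (not data from the figure).
    print(boxplot_summary([1, 2, 3, 4, 5, 6, 7, 8, 9, 100]))
```

With the made-up values in the usage example, the single extreme point falls outside the upper fence, so the upper whisker stops at the largest remaining value rather than at the extreme point.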