Fig. 1: Mask-prior-guided denoising diffusion (MapDiff) for inverse protein folding.

From: Mask-prior-guided denoising diffusion improves inverse protein folding

a, The mask-prior pretraining stage randomly masks residues in the amino acid (AA) sequence and pretrains an invariant point attention (IPA) network on the masked sequence together with the 3D backbone structure, using a BERT-like masked language modelling objective to learn prior structural and sequence knowledge.

b, The mask-prior-guided denoising network \({\phi }_{\theta }\) takes a noisy AA sequence \({{\bf{X}}}_{t}^{\rm{aa}}\) as input and predicts the native AA sequence \({{\bf{X}}}_{0}^{\rm{aa}}\) through three operations at every iterative denoising step. First, a structure-based sequence predictor, implemented as an equivariant graph neural network, denoises the noisy sequence conditioned on the given 3D backbone structure. Second, an entropy-based masking strategy combined with a mask-ratio adaptor identifies and masks low-confidence residues in the denoised sequence, producing a masked sequence \({{\bf{X}}}_{\rm{m}}^{\rm{aa}}\). Third, the masked sequence designer pretrained in a takes the masked sequence \({{\bf{X}}}_{\rm{m}}^{\rm{aa}}\) and its 3D backbone information and refines it (with fine-tuning) to better predict the native sequence \({{\bf{X}}}_{0}^{\rm{aa}}\).

c, The MapDiff denoising diffusion framework comprises two processes: diffusion and denoising. The diffusion process progressively adds random discrete noise to the native sequence \({{\bf{X}}}_{0}^{\rm{aa}}\) according to the cumulative transition matrix \({\overline{\bf{Q}}}_{t}\) at diffusion step t, so that the real data distribution gradually transitions to a uniform or marginal prior distribution. The denoising process samples an initial noisy AA sequence \({{\bf{X}}}_{T}^{\rm{aa}}\) from the prior distribution and iteratively applies the denoising network \({\phi }_{\theta }\) from b, learning to predict the native sequence \({{\bf{X}}}_{0}^{\rm{aa}}\) from \({{\bf{X}}}_{t}^{\rm{aa}}\) at each denoising step t. The prediction \({\hat{{\bf{X}}}}_{0}^{\rm{aa}}\) is then used to compute the posterior distribution \(q({{\bf{X}}}_{t-1}^{\rm{aa}}| {{\bf{X}}}_{t}^{\rm{aa}},{\hat{{\bf{X}}}}_{0}^{\rm{aa}})\), from which a less-noisy sequence \({{\bf{X}}}_{t-1}^{\rm{aa}}\) is predicted.
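To make the panel a objective concrete, here is a minimal NumPy sketch of BERT-like masked language modelling on an AA sequence. The IPA network itself is omitted, and names such as MASK_TOKEN and mask_prob are illustrative assumptions rather than details given in the caption.

```python
import numpy as np

rng = np.random.default_rng(0)
NUM_AA = 20       # standard amino acid alphabet
MASK_TOKEN = 20   # hypothetical extra index for the [MASK] symbol

def mask_sequence(seq, mask_prob=0.15):
    """BERT-style corruption: randomly replace a fraction of residues with
    a [MASK] token; the pretraining loss is computed only at masked positions."""
    seq = seq.copy()
    masked = rng.random(len(seq)) < mask_prob
    seq[masked] = MASK_TOKEN
    return seq, masked

def masked_lm_loss(logits, targets, masked):
    """Cross-entropy over masked positions only (logits: (L, 20))."""
    z = logits - logits.max(-1, keepdims=True)
    logp = z - np.log(np.exp(z).sum(-1, keepdims=True))  # stable log-softmax
    nll = -logp[np.arange(len(targets)), targets]
    return nll[masked].mean()
```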
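The panel b masking step can be sketched as follows. The entropy scoring follows the caption; the linear form of the mask-ratio adaptor (adaptive_mask_ratio) is a hypothetical choice, as the caption does not specify how the ratio adapts to the denoising step.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def adaptive_mask_ratio(t, T, base_ratio=0.5):
    """Hypothetical mask-ratio adaptor: mask more aggressively at noisier
    (larger) denoising steps t, less as t approaches 0."""
    return base_ratio * t / T

def entropy_mask(logits, t, T, mask_token=20):
    """Entropy-based masking: score each residue by the entropy of the
    denoiser's predictive distribution and mask the least-confident ones."""
    probs = softmax(logits)                               # (L, 20) per residue
    entropy = -(probs * np.log(probs + 1e-9)).sum(-1)     # per-residue entropy
    k = max(1, int(adaptive_mask_ratio(t, T) * len(entropy)))
    low_conf = np.argsort(entropy)[-k:]                   # k highest-entropy residues
    seq = probs.argmax(-1)                                # denoised sequence
    seq[low_conf] = mask_token                            # masked sequence X_m^aa
    return seq, low_conf
```

The masked sequence and the backbone structure then go to the pretrained masked sequence designer for refinement, as described in the caption.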
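For panel c, below is a minimal sketch of discrete diffusion with a uniform prior, assuming a standard D3PM-style transition kernel; the noise schedule betas and the helper names are illustrative, and MapDiff may instead use the marginal prior mentioned in the caption.

```python
import numpy as np

NUM_AA = 20

def q_t(beta_t):
    """One-step uniform transition matrix Q_t: keep a residue with
    probability 1 - beta_t, otherwise resample uniformly over the alphabet."""
    return (1 - beta_t) * np.eye(NUM_AA) + beta_t / NUM_AA * np.ones((NUM_AA, NUM_AA))

def q_bar(betas):
    """Cumulative matrices Qbar_t = Q_1 Q_2 ... Q_t for every step."""
    mats, acc = [], np.eye(NUM_AA)
    for b in betas:
        acc = acc @ q_t(b)
        mats.append(acc.copy())
    return mats

def diffuse(x0, q_bar_t, rng):
    """Forward process: sample X_t^aa ~ Cat(x0 . Qbar_t), one draw per residue."""
    probs = np.eye(NUM_AA)[x0] @ q_bar_t
    return np.array([rng.choice(NUM_AA, p=p) for p in probs])

def posterior(xt, x0_hat_probs, beta_t, q_bar_prev):
    """Reverse step: the standard discrete-diffusion posterior
    q(X_{t-1} | X_t, X0_hat) ~ (x_t Q_t^T) * (x0_hat Qbar_{t-1}), per residue."""
    left = np.eye(NUM_AA)[xt] @ q_t(beta_t).T
    right = x0_hat_probs @ q_bar_prev
    unnorm = left * right
    return unnorm / unnorm.sum(-1, keepdims=True)
```

At reverse step t, q_bar(betas)[t-2] supplies \({\overline{\bf{Q}}}_{t-1}\), and sampling from the posterior at each residue yields the less-noisy sequence \({{\bf{X}}}_{t-1}^{\rm{aa}}\).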
