Fig. 2: Summary of Deep Novel Mutation Search (DNMS). | Communications Biology

From: Paying attention to the SARS-CoV-2 dialect: a deep neural network approach to predicting novel protein mutations

DNMS starts with an input sequence being fed into ProtBERT. From ProtBERT, DNMS extracts the attention matrix \(\mathbf{A}\), a protein semantic embedding \(\mathbf{Z}\), and the output posterior probability. DNMS calculates Attention Change, Semantic Change, and Grammaticality for every possible single-point amino acid substitution in the input sequence. In this example, two mutations are visualized at position i = 4, where the input sequence has token L, \(x_{i}\) = L. The two mutations are L4A and L4E, denoted by \(\tilde{x}_{i}\). Grammaticality, denoted by \(p(\tilde{x}_{i} \mid \mathbf{X}_{k})\), is calculated for the two mutations from the posterior probability output by ProtBERT. Grammaticality is a measure of the statistical patterns learned by the fine-tuned ProtBERT model. For each mutation, we pass into ProtBERT the mutated sequence, \(\tilde{\mathbf{X}}_{k}[\tilde{x}_{i}]\), which represents the input sequence with the mutation introduced at position i. We obtain the attention matrix for the mutated sequence, \(\mathbf{A}[\tilde{x}_{i}]\), and calculate Attention Change (change from \(\mathbf{A}\)), \(\Delta\mathbf{A}\), which is a measure of similarity. We obtain a protein semantic embedding for the mutated sequence, \(\mathbf{Z}[\tilde{x}_{i}]\), and calculate Semantic Change (change from \(\mathbf{Z}\)), \(\Delta\mathbf{Z}\), which is an additional measure of similarity. DNMS combines the rankings of Semantic Change, Grammaticality, and Attention Change, prioritizing high Grammaticality and low Semantic Change and Attention Change. Future novel mutations are discovered using \(\mathtt{DNMS}(\tilde{x}_{i}; \mathbf{X}_{k})\).
