Fig. 2: A task-specific enzyme representation developed from the ESM-1b model.

From: A general model to predict small molecule substrates of enzymes based on machine and deep learning

a ESM-1b model. Amino acids of a protein sequence are represented as numerical vectors and passed through a transformer network; some amino acid representations are masked. All representations are iteratively updated 33 times, using information about both neighboring and distant amino acids, and the ESM-1b model is trained to predict the masked amino acids. The ESM-1b vector of a protein is computed as the element-wise mean of all amino acid representations in the last layer.

b Modified ESM-1b model. An additional representation for the whole enzyme is added to the amino acid representations. After all representations have been updated 33 times, the enzyme representation is concatenated with a small molecule representation, and the network is trained to predict whether the small molecule is a substrate of the given enzyme. After training, the enzyme representation is extracted immediately before concatenation with the small molecule representation; this task-specific vector is the ESM-1b_ts representation.
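As a concrete illustration of panel a, the sketch below computes an ESM-1b vector with the public fair-esm package: residue representations are taken from the last layer (layer 33) and mean-pooled over sequence positions. The sequence is a placeholder, and the snippet assumes fair-esm and PyTorch are installed.

```python
import torch
import esm

# Load the pretrained ESM-1b model (33 transformer layers, 1280-d representations)
model, alphabet = esm.pretrained.esm1b_t33_650M_UR50S()
batch_converter = alphabet.get_batch_converter()
model.eval()

sequence = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"  # placeholder enzyme sequence
_, _, tokens = batch_converter([("enzyme", sequence)])

with torch.no_grad():
    out = model(tokens, repr_layers=[33])  # request last-layer representations

# Mean-pool over residue positions (skip the BOS token at index 0 and the EOS token)
reps = out["representations"][33]
esm1b_vector = reps[0, 1 : len(sequence) + 1].mean(dim=0)  # shape: (1280,)
```

Panel b adds a learnable whole-enzyme representation and a substrate classification objective. The sketch below shows only the final step, concatenating the enzyme vector with a small molecule vector and scoring the pair. The input dimensions (1280 for the enzyme, 1024 for, e.g., a molecular fingerprint) and the two-layer head are illustrative assumptions; in the actual model, the enzyme representation is updated inside the transformer itself during training.

```python
import torch
import torch.nn as nn

class SubstrateHead(nn.Module):
    """Hypothetical classification head: enzyme vector + molecule vector -> substrate logit."""

    def __init__(self, enz_dim: int = 1280, mol_dim: int = 1024, hidden: int = 256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(enz_dim + mol_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, enzyme_vec: torch.Tensor, molecule_vec: torch.Tensor) -> torch.Tensor:
        # Concatenate the two representations and predict a single logit;
        # sigmoid(logit) is the probability that the molecule is a substrate.
        return self.mlp(torch.cat([enzyme_vec, molecule_vec], dim=-1))

head = SubstrateHead()
logit = head(torch.randn(1, 1280), torch.randn(1, 1024))  # random inputs for illustration
```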
