Fig. 1: Overview of dataset preparation, and GeoPoc model architecture. | Communications Biology

From: Accurately predicting optimal conditions for microorganism proteins through geometric graph learning and language model

a Data collection and dataset preparation. b Overall architecture of the GeoPoc model. ESM2.0 extracts sequence embeddings from the protein sequence, and the protein structure is taken from the AlphaFold2 database. After these are featurized as a protein graph, the graph is passed to the GeoFormer module to obtain hidden embeddings. Finally, a self-attention pooling layer pools the hidden embeddings, and the pooled representation is passed to the output MLP to predict the temperature, pH, and salt concentration. Note: SaltConc denotes salt concentration.
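
The last two stages in panel b (self-attention pooling followed by an output MLP) can be illustrated with a minimal pure-Python sketch. This is not the GeoPoc implementation: the function names, dimensions, and random weights are all hypothetical, the per-residue vectors stand in for GeoFormer outputs, and using a single head with three outputs is an assumption made for brevity.

```python
import math
import random


def attention_pool(node_embeddings, score_weights):
    """Collapse per-residue embeddings into one vector via a softmax-weighted sum."""
    # One scalar score per residue (a plain dot product; an assumed scoring function).
    scores = [sum(w * x for w, x in zip(score_weights, h)) for h in node_embeddings]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    alphas = [e / total for e in exps]
    dim = len(node_embeddings[0])
    pooled = [sum(a * h[d] for a, h in zip(alphas, node_embeddings)) for d in range(dim)]
    return pooled, alphas


def linear(vec, weights, bias):
    """Dense layer: one output per weight row."""
    return [sum(w * x for w, x in zip(row, vec)) + b for row, b in zip(weights, bias)]


def mlp_head(pooled, w1, b1, w2, b2):
    """Two-layer MLP with a ReLU in between."""
    hidden = [max(0.0, v) for v in linear(pooled, w1, b1)]
    return linear(hidden, w2, b2)


random.seed(0)
DIM, HIDDEN, N_RESIDUES = 8, 4, 5  # toy sizes, not the model's real dimensions
TARGETS = ("temperature", "pH", "SaltConc")

# Hypothetical stand-ins for the hidden embeddings of a 5-residue protein.
node_embeddings = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(N_RESIDUES)]
score_weights = [random.gauss(0, 1) for _ in range(DIM)]
w1 = [[random.gauss(0, 0.5) for _ in range(DIM)] for _ in range(HIDDEN)]
b1 = [0.0] * HIDDEN
w2 = [[random.gauss(0, 0.5) for _ in range(HIDDEN)] for _ in range(len(TARGETS))]
b2 = [0.0] * len(TARGETS)

pooled, alphas = attention_pool(node_embeddings, score_weights)
predictions = dict(zip(TARGETS, mlp_head(pooled, w1, b1, w2, b2)))
print(predictions)
```

Because the attention weights sum to one, the pooled vector has the same size regardless of protein length, which is what lets a fixed-size MLP follow a variable-length graph.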