Fig. 1: Overview of dataset preparation and the GeoPoc model architecture.

a The data collection and dataset preparation process. b The overall architecture of the GeoPoc model. ESM2.0 is used to extract sequence embeddings from the protein sequence, and the protein structure is taken from the AlphaFold2 database. These are featurized as protein graphs, which are input to the GeoFormer module to obtain hidden embeddings. Finally, the hidden embeddings are pooled by a self-attention pooling layer, and the pooled representation is fed to the output MLP to predict the temperature, pH, and salt concentration. Note: SaltConc denotes salt concentration.
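The pooling and prediction steps in panel b can be sketched as follows. This is a minimal sketch, assuming a common form of self-attention pooling (a learned scoring vector, a softmax over residues, and an attention-weighted sum of the node embeddings) followed by a two-layer ReLU MLP. The function names, weights, dimensions, and the three-output head are hypothetical and are not taken from GeoPoc. The GeoFormer module is not modeled; random vectors stand in for its per-residue hidden embeddings.

```python
import math
import random


def softmax(scores):
    # Subtract the max for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention_pool(node_embeddings, w_score):
    """Pool per-residue embeddings into a single protein-level vector.

    Assumed (hypothetical) formulation:
      score_i = w_score . h_i
      alpha   = softmax(score)
      pooled  = sum_i alpha_i * h_i
    """
    scores = [sum(w * x for w, x in zip(w_score, h)) for h in node_embeddings]
    alphas = softmax(scores)
    dim = len(node_embeddings[0])
    pooled = [0.0] * dim
    for a, h in zip(alphas, node_embeddings):
        for j in range(dim):
            pooled[j] += a * h[j]
    return pooled, alphas


def linear(x, weights, bias):
    # weights is a list of output rows, each the same length as x.
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(x):
    return [max(0.0, v) for v in x]


def output_mlp(pooled, params):
    # Hypothetical two-layer MLP mapping the pooled vector to the outputs.
    hidden = relu(linear(pooled, params["W1"], params["b1"]))
    return linear(hidden, params["W2"], params["b2"])


if __name__ == "__main__":
    random.seed(0)
    n_residues, dim, hidden_dim, n_out = 5, 8, 16, 3
    # Random stand-ins for the per-residue hidden embeddings.
    nodes = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(n_residues)]
    w_score = [random.gauss(0, 1) for _ in range(dim)]
    params = {
        "W1": [[random.gauss(0, 0.1) for _ in range(dim)] for _ in range(hidden_dim)],
        "b1": [0.0] * hidden_dim,
        "W2": [[random.gauss(0, 0.1) for _ in range(hidden_dim)] for _ in range(n_out)],
        "b2": [0.0] * n_out,
    }
    pooled, alphas = self_attention_pool(nodes, w_score)
    preds = output_mlp(pooled, params)  # three untrained outputs (illustrative only)
    print(len(pooled), round(sum(alphas), 6), len(preds))
```

The attention weights sum to one, so the pooled vector is a convex combination of residue embeddings whose size does not depend on protein length. This is what lets a fixed-size MLP head follow the pooling layer. Whether GeoPoc uses one head with three outputs or separate models per target is not stated in the caption; the three-output head here is only for illustration.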