Fig. 1
From: Neural network conditioned to produce thermophilic protein sequences can increase thermal stability

Overview of the presented work. (A) We fine-tuned an autoregressive encoder-decoder protein language model to recapitulate thermophilic variants of mesophilic proteins, using 4 million mesophilic-thermophilic homolog pairs of bacterial origin. The dataset is restricted to pairs for which the temperature of the thermophilic organism could be obtained, yet it still covers 2.7 k protein families as labeled by Pfam 34. (B) We used the trained model to score protein variants in thermal stability benchmark datasets containing both single- and multiple-mutant data. The model scores show statistically significant correlation with measured melting temperatures and with catalytic half-inactivation temperatures. (C) We used the trained model to redesign PDB entry 1ENH. The model suggested 14 changes, including insertions, which increased the melting temperature of the protein by 15.5 K.
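Panel (B) rests on correlating model scores with measured stability. The sketch below shows one way such a correlation and its significance could be computed. It assumes that per-variant scores have already been produced by the model, and that the statistic is Spearman's rank correlation with a permutation test. The caption specifies neither the scoring function nor the test, so treat both as illustrative choices. The score and melting-temperature values in the usage example are hypothetical and are not data from this work.

```python
import random
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i..j
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; assumes neither input is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman's rho, computed as the Pearson correlation of average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    return pearson(average_ranks(x), average_ranks(y))


def permutation_pvalue(x, y, n_perm=2000, seed=0):
    """Two-sided permutation p-value for Spearman's rho (shuffles y)."""
    rng = random.Random(seed)
    observed = abs(spearman(x, y))
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(spearman(x, shuffled)) >= observed:
            hits += 1
    # The +1 terms count the observed labeling and keep p strictly positive.
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Hypothetical model scores (e.g. log-likelihoods) and measured Tm in deg C.
    model_scores = [-1.2, -0.4, 0.3, 0.9, 1.5]
    measured_tm = [52.1, 55.0, 54.2, 60.3, 63.8]
    print(spearman(model_scores, measured_tm))  # 0.9
    print(permutation_pvalue(model_scores, measured_tm))
```

A rank correlation is used here because it only assumes that higher scores track higher stability, not that the relationship is linear. With only five points, even a rho of 0.9 is not significant at the 0.05 level, which is why benchmark datasets with many variants matter for claims like the one in panel (B).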