Fig. 1: Machine learning model to predict kcat from numerical enzyme representations and reaction fingerprints.
From: Turnover number predictions for kinetically uncharacterized enzymes using machine and deep learning

Experimentally measured kcat values are downloaded from three different databases. Enzyme information is represented with numerical vectors obtained from natural language processing (NLP) models that use the linear amino acid sequence as their input. Chemical reactions are represented using integer vectors. Concatenated enzyme-reaction representations are used to train a gradient boosting model to predict kcat. After training, the fitted model can be used to parameterize metabolic networks with kcat values.
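
The pipeline in this caption (concatenate an enzyme vector with a reaction vector, then fit a gradient boosting regressor) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not the authors' implementation: the random vectors stand in for the NLP-derived enzyme representations and the integer reaction fingerprints, the target `y` is a synthetic stand-in for measured kcat values, and the booster is a toy version built from one-split regression trees (stumps) in pure Python rather than whatever library the study used. The names `fit_stump`, `fit_boosting`, and `predict` are hypothetical.

```python
import random

def fit_stump(X, residuals):
    """Return (feature, threshold, left_mean, right_mean) for the single
    split that minimizes squared error, or None if no split exists."""
    best = None
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # candidate threshold between neighbouring values
            left = [r for row, r in zip(X, residuals) if row[j] <= t]
            right = [r for row, r in zip(X, residuals) if row[j] > t]
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
            if best is None or sse < best[0]:
                best = (sse, j, t, lm, rm)
    return None if best is None else best[1:]

def fit_boosting(X, y, n_rounds=30, learning_rate=0.3):
    """Gradient boosting for squared loss: each stump fits the current residuals."""
    base = sum(y) / len(y)  # initial prediction is the mean target
    pred = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        stump = fit_stump(X, residuals)
        if stump is None:
            break
        j, t, lm, rm = stump
        stumps.append(stump)
        pred = [p + learning_rate * (lm if row[j] <= t else rm)
                for row, p in zip(X, pred)]
    return base, learning_rate, stumps

def predict(model, row):
    base, lr, stumps = model
    return base + lr * sum(lm if row[j] <= t else rm for j, t, lm, rm in stumps)

rng = random.Random(0)
n = 60
# Placeholder enzyme representations (real ones would come from a sequence model).
enzyme_vecs = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(n)]
# Placeholder integer reaction fingerprints.
reaction_fps = [[rng.randint(0, 1) for _ in range(6)] for _ in range(n)]
# Concatenate enzyme and reaction vectors into one input per data point.
X = [e + r for e, r in zip(enzyme_vecs, reaction_fps)]
# Synthetic target depending on one enzyme feature and one reaction bit.
y = [2.0 * row[0] + 1.5 * row[4] + rng.gauss(0, 0.1) for row in X]

model = fit_boosting(X, y)
mse_baseline = sum((yi - model[0]) ** 2 for yi in y) / n
mse_model = sum((yi - predict(model, row)) ** 2 for row, yi in zip(X, y)) / n
print("training MSE, mean baseline:", mse_baseline)
print("training MSE, boosted stumps:", mse_model)
```

The sketch reports training error only, to show that the boosted model improves on the mean baseline; a real evaluation would hold out enzymes or reactions not seen during training.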