Fig. 1: Present state of the art in biocatalytic reaction discovery.
From: Connecting chemical and protein sequence space to predict biocatalytic reactions

a, Established methods for new biocatalytic reaction discovery. Known connections between chemical and protein sequence space can be exploited for new reaction discovery through local exploration. The known reaction between epichlorohydrin and epoxide hydrolase (EH) was used to enable the reaction on an epoxide analogue en route to the synthesis of GSK2330672 (ref. 10). Alternatively, local protein sequence space was explored through protein engineering to improve the transformation of the known substrate (Ar = p-biphenyl) by wild-type (wt) amine transaminase (ATA), resulting in ATA-r11 after 11 rounds of directed evolution (ref. 13; positions of mutations shown in purple). b, Limitations of present methods. Expansion of characterized biocatalytic reactivity is limited to local exploration of chemical and sequence space, which hinders larger, non-intuitive leaps between the two landscapes. A vast region of substrates and enzymes with unknown biocatalytic reactivity remains unexplored, which makes incorporating them as key steps in chemical synthesis riskier. There is at present no method to predict compatible enzymes or substrates in the non-haem iron (NHI) enzyme superfamily. c, Our approach to streamline biocatalytic reaction discovery. We examine diverse substrates and protein sequences for new biocatalytic reactions and use these data to build machine learning models that predict compatible enzymes and substrates.