Fig. 5: Use of the machine learning model for CATNIP. | Nature

From: Connecting chemical and protein sequence space to predict biocatalytic reactions

a, Demonstration of substrate-to-enzyme CATNIP with sparteine (16), matridine (18) and 6-methyleneandrost-4-ene-3,17-dione (20). The chemical space map shows the substrate of interest (open black circle), its nearest substrates over five dimensions (dark blue circles), unchosen substrates in BioCatSet1 (light blue circles) and substrates without known biocatalytic activity (grey circles). The sequence space shows all enzymes in the cluster (SSN at alignment score = 75), with the predicted compatible enzymes (k = 10) shown in decreasing shades of purple by rank. Enzymes not predicted in the top ten sequences are represented as grey nodes. The top ten predicted enzyme sequences were prepared in whole-cell E. coli and examined for relative product formation in triplicate. The x-axis shows the enzyme prediction rank, where X denotes the no-enzyme control. The y-axis shows the average relative extracted ion count (n = 3). Different products are represented by different shades of green. The enzyme generating the most product was then produced (1-l cultures in Terrific Broth) and used in 50-mg-scale biocatalytic reactions as clarified cell lysate. Oxidation products were isolated and characterized for the three substrates of interest, providing (4S)-hydroxysparteine (17), (12S)-hydroxymatridine (19) and androst-4-ene-3,6,17-trione (21) in 35%, 50% and 12% isolated yields, respectively.

b, Demonstration of the enzyme-to-substrate CATNIP model with NHI123, NHI177 and TqaL. Each enzyme was mapped to sequence space, which shows all enzymes in the cluster (SSN at alignment score = 75), with the ten most similar enzymes shown in decreasing shades of purple. Enzymes outside the top ten sequences are represented as grey nodes. The predicted compatible substrates (dark blue) are mapped to chemical space among all substrates in BioCatSet1 (light blue) and substrates outside the dataset (grey). The best-ranked substrates were tested with the enzyme of interest in triplicate, and the relative product conversion was measured. The x-axis shows the rank of the small-molecule substrate, in decreasing order. The y-axis shows the average relative conversion, normalized to the empty-vector control of each sample (n = 3). The structures of the best-ranked substrates for NHI123, NHI177 and TqaL are shown as 22, 12 and 23, respectively.
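The two ranking steps in this figure (finding the nearest substrates in a five-dimensional chemical space, then keeping the top k = 10 enzymes) and the normalization to a control can be illustrated with a generic sketch. This is not CATNIP's model: it assumes each substrate is a five-dimensional descriptor vector compared by Euclidean distance, that each enzyme-substrate pair carries a hypothetical compatibility score, and that normalized conversion is the mean triplicate signal divided by the mean empty-vector control signal. All function names and inputs are illustrative.

```python
import math
from statistics import mean


def nearest_substrates(query, library, n):
    """Return the names of the n library substrates closest to `query`.

    Assumption: `query` and every value in `library` are 5-dimensional
    descriptor vectors (hypothetical); distance is Euclidean.
    """
    return sorted(library, key=lambda name: math.dist(query, library[name]))[:n]


def rank_enzymes(neighbours, scores, k=10):
    """Rank enzymes by their best score across the neighbouring substrates.

    Assumption: `scores[substrate][enzyme]` is a hypothetical compatibility
    score (higher is better). Returns at most k enzyme names, best first.
    """
    best = {}
    for substrate in neighbours:
        for enzyme, score in scores.get(substrate, {}).items():
            best[enzyme] = max(score, best.get(enzyme, float("-inf")))
    return sorted(best, key=best.get, reverse=True)[:k]


def normalized_conversion(sample_replicates, control_replicates):
    """Mean replicate signal expressed relative to the mean control signal."""
    control = mean(control_replicates)
    if control == 0:
        raise ValueError("control mean is zero; cannot normalize")
    return mean(sample_replicates) / control
```

For example, with `library = {"s1": (0, 0, 0, 0, 0), "s2": (1, 0, 0, 0, 0), "s3": (5, 5, 5, 5, 5)}`, the call `nearest_substrates((0.1, 0, 0, 0, 0), library, 2)` returns `["s1", "s2"]`. Taking the maximum score per enzyme is only one simple way to merge evidence from several neighbours; a trained classifier would replace `rank_enzymes` in practice.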

Back to article page