Fig. 2: Prediction of AAS branch pathway enzymes to produce 4HPAA for norcoclaurine production.
From: Machine learning discovery of missing links that mediate alternative branches to plant alkaloids

a The aromatic acetaldehyde synthase (AAS) branch pathway (green) can produce 4HPAA directly from tyrosine, but this missing link is unreported in P. somniferum. b Structure-based curation of typical aromatic amino acid decarboxylase (AAAD), insect-type AAS and plant-type AAS, as represented by the active site configurations of Pseudomonas putida DDC (PpDDC, blue), Bombyx mori DHPAAS (DHPAAS, grey), and Petroselinum crispum 4HPAAS (Pc4HPAAS, deep teal). AAS candidate PsTyDC1 (green) has a unique active site, while AAS candidate PsTyDC6 (green) has an AAAD-like active site and could not be predicted by a homology-based or structure-based approach alone. c Cross-validation for correct assignment of AAAD and AAS training sequences is performed using SVM models (blue), random forest models (green), and by comparing the sequence homology of each training sequence to a consensus sequence of AAS training sequences and a consensus sequence of AAAD training sequences (grey), as described in the Methods section. d A two-dimensional plot of AAS SVM-based prediction is shown on the left, with positive and negative prediction spaces colored green and white, respectively. Principal component analysis (PCA) is used to compress the multi-dimensional data into two dimensions (PC1 and PC2) for this plot. Corresponding high-dimensional SVM decision scores from Supplementary Table 1 are shown on the right. Decision scores represent the distance from the SVM prediction boundary. PsTyDC1 and PsTyDC6 score highest for AAS prediction and are colored red. e LC-MS detection of products from Thalictrum flavum norcoclaurine synthase (TfNCS)-containing strains T1-01-DE3 (wild-type PsTyDC1 + TfNCS), T1-02-DE3 (PsTyDC1-L205H + TfNCS) and T1-03-DE3 (PsTyDC1-Y98F-F99Y-L205N + TfNCS) (Supplementary Table 2), grown in LB supplemented with 1 mM tyrosine and 0.5 mM dopamine, at 28 °C with 180 rpm shaking for 51 h. Selective in vivo production of the downstream AAS product norcoclaurine accompanies the expression of wild-type PsTyDC1 (green), as well as the triple variant of PsTyDC1 with an engineered active site based on that of insect DHPAAS (red). Tyramine is the major product of PsTyDC1-L205H (grey), which contains an engineered active site based on typical AAAD. Similar results are replicated in Supplementary Fig. 2 and Supplementary Fig. 3.
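
To make the notion of a decision score in panel d concrete, the following is a minimal sketch of how a signed distance from a prediction boundary can be computed and used to rank candidates. It assumes a linear boundary for simplicity; the kernel and features used for the models in the figure are not stated in this caption. The weights, bias, feature vectors, and candidate names are hypothetical toy values, not data from the study.

```python
import math

def decision_score(weights, bias, features):
    """Signed distance of a feature vector from the boundary w.x + b = 0.

    Assumes a linear boundary (an illustrative simplification).
    """
    raw = sum(w * x for w, x in zip(weights, features)) + bias
    norm = math.sqrt(sum(w * w for w in weights))
    return raw / norm

def predict_positive(weights, bias, features):
    """A candidate falls in the positive prediction space when its score is > 0."""
    return decision_score(weights, bias, features) > 0

# Hypothetical toy model and candidates (not values from the paper).
weights = [3.0, 4.0]
bias = -5.0
candidates = {
    "candidate_A": [3.0, 4.0],
    "candidate_B": [0.0, 0.0],
}

# Score every candidate, then rank from highest to lowest score.
scores = {name: decision_score(weights, bias, f) for name, f in candidates.items()}
ranked = sorted(scores, key=scores.get, reverse=True)

for name in ranked:
    print(name, round(scores[name], 3), predict_positive(weights, bias, candidates[name]))
```

In this toy setting, a larger positive score means the candidate lies further inside the positive prediction space, which is the sense in which the highest-scoring sequences are highlighted in panel d.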