Fig. 2: Scoring Galaxy-SynBioCAD predicted pathways with literature pathways and expert validation data. | Nature Communications

From: The automated Galaxy-SynBioCAD pipeline for synthetic biology design and engineering

a Pathways for different targets and different hosts are extracted from the literature (cf. Literature data benchmarking subsection); this is illustrated here for the production of phenol in E. coli. b Galaxy-SynBioCAD workflows are run on the literature targets and hosts. c A collection of Galaxy-SynBioCAD generated pathways is compiled. Pathway ‘A’, which produces phenol in E. coli from tyrosine, is highlighted. d The Galaxy-SynBioCAD generated pathways are compared with the literature pathways using a matching algorithm (cf. ‘Supplementary_Text’ file). The plot shows, for each literature pathway, the best-matching pathways among all Galaxy-SynBioCAD generated pathways. Pathways with a matching score above 0.5 are identical (similarity of 1) to literature pathways as far as main substrate and products are concerned. The raw data can be found in Supplementary file ‘Dataset 2’, tab ‘literature_matching_score’. e Galaxy-SynBioCAD generated pathways are evaluated by expert metabolic engineers, whose task is to select the valid pathways from batches of 5 generated pathways (cf. Expert validation trial benchmarking subsection). f Pathways judged valid by the experts and pathways matching the literature are added to a training set of labeled pathways. g The set of labeled pathways is used to train a classifier that outputs a machine learning score assessing whether a given pathway is valid (cf. Machine Learning Global Scoring in Methods section). The figure plots the results obtained for all pathways generated by Galaxy-SynBioCAD. The raw data, including the training set, can be found in the Supplementary file ‘Dataset 3’. Using a machine learning global score threshold of 0.5, the accuracy in retrieving literature or expert-labeled pathways is 0.91, with a false positive rate of 0.10 in 4-fold cross-validation (cf. Supplementary file ‘Dataset 3’, tab ‘Pathway_PredictedScore’). Source data are provided in the ‘Source Data’ file.
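
For readers who want to check how the two figures quoted for panel g relate to a score threshold, the following is a minimal sketch of computing accuracy and false positive rate from held-out classifier scores. It is not the authors' code: the function name `accuracy_and_fpr`, the toy scores and labels, and the convention that a score equal to the threshold counts as "valid" are all assumptions made for illustration. The actual values are in Supplementary file ‘Dataset 3’.

```python
from typing import Sequence, Tuple


def accuracy_and_fpr(
    scores: Sequence[float],
    labels: Sequence[bool],
    threshold: float = 0.5,
) -> Tuple[float, float]:
    """Return (accuracy, false positive rate) at a given score threshold.

    Assumption: a pathway is predicted valid when score >= threshold.
    """
    tp = fp = tn = fn = 0
    for score, is_valid in zip(scores, labels):
        predicted_valid = score >= threshold
        if predicted_valid and is_valid:
            tp += 1
        elif predicted_valid and not is_valid:
            fp += 1
        elif not predicted_valid and is_valid:
            fn += 1
        else:
            tn += 1

    total = tp + fp + tn + fn
    negatives = fp + tn
    accuracy = (tp + tn) / total if total else float("nan")
    fpr = fp / negatives if negatives else float("nan")
    return accuracy, fpr


# Hypothetical held-out scores and labels (True = labeled valid).
# These are made-up numbers, not data from the paper.
toy_scores = [0.92, 0.81, 0.35, 0.10, 0.65, 0.48, 0.05, 0.77]
toy_labels = [True, True, True, False, False, False, False, True]

accuracy, fpr = accuracy_and_fpr(toy_scores, toy_labels, threshold=0.5)
print(f"accuracy={accuracy:.2f}, false positive rate={fpr:.2f}")
```

In a 4-fold cross-validation setting, the same function would be applied to the scores predicted for each held-out fold, and the results pooled or averaged; which of the two the authors used is not stated in the caption.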
