Fig. 3: Clustering the strains by the chemical contribution to growth (K). | Communications Biology


From: Data-driven discovery of the interplay between genetic and environmental factors in bacterial growth


A Construction of the machine learning model and its use in clustering the strains. The feature importance of each chemical in each strain was predicted by a GBDT (gradient boosting decision tree) model constructed from the corresponding growth profiles. B Clustering dendrogram of the strains. The heatmap shows the feature importance of each chemical in each strain. The chemicals, which are the components of the 135 media, are arrayed vertically. The four clusters (C1–C4) are shown in green and orange. The number of strains in each cluster is given in brackets. C High-priority chemicals in the four clusters. The top five chemicals are shown. Boxplots are shown for the genes (gray dots) assigned to each cluster. D Participation of the clusters in the nine pathways. For each pathway, the number of participating genes among the 114 knockout genes assigned to the four clusters is shown. E Gene–chemical network constructed from the contributions to bacterial growth (K). A total of 204 gene–chemical combinations were used to build the network, which shows the chemicals of high importance in determining the growth of the 115 strains. The large transparent yellow nodes represent the determinative chemicals; the size of each node is the sum of its feature importance across all linked strains (genes). The small green and orange nodes represent the knockout genes categorized into the four clusters. Edge thickness reflects the magnitude of the feature importance.
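
The node and edge encoding in panel E can be illustrated with a minimal sketch. It assumes the feature importances are already available as a nested mapping, and it links each gene to its `top_k` most important chemicals. That selection rule, the function name `build_network`, and the toy gene and chemical labels are all hypothetical and are not taken from the article, which does not state how the 204 combinations were chosen.

```python
from collections import defaultdict


def build_network(importance, top_k=2):
    """Build a gene-chemical edge list from feature importances.

    importance: {gene: {chemical: feature_importance}}
    top_k: number of chemicals linked per gene (an assumed selection rule).
    Returns (edges, node_size), where each edge is (gene, chemical, weight)
    and node_size[chemical] is the summed weight of its edges.
    """
    edges = []
    node_size = defaultdict(float)
    for gene, chem_scores in importance.items():
        # Rank this gene's chemicals from most to least important.
        ranked = sorted(chem_scores.items(), key=lambda kv: kv[1], reverse=True)
        for chem, score in ranked[:top_k]:
            edges.append((gene, chem, score))  # edge thickness ~ importance
            node_size[chem] += score           # node size ~ summed importance
    return edges, dict(node_size)


# Toy values for illustration only; not data from the study.
toy_importance = {
    "gene_a": {"chem_1": 0.5, "chem_2": 0.25, "chem_3": 0.25},
    "gene_b": {"chem_1": 0.25, "chem_2": 0.125, "chem_3": 0.625},
}
edges, sizes = build_network(toy_importance, top_k=1)
```

With `top_k=1`, each toy gene is linked only to its single most important chemical, so a chemical's node size equals the importance it carries for the genes that selected it. A larger `top_k` produces a denser network in which a chemical shared by many genes accumulates a larger node.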
