Fig. 1: Flowchart of the data analysis. | Nature Communications

From: Nonlinear machine learning pattern recognition and bacteria-metabolite multilayer network analysis of perturbed gastric microbiome

To answer the five questions under investigation in our study, we implemented a workflow based on machine learning tools. Following the flowchart shown in the figure, we analysed three 16S rRNA gene sequencing datasets with information on PPI use in dyspeptic patients; for one of the datasets (Paroni Sterbini et al.22), patients were also classified as positive or negative for H. pylori infection. First, we performed unsupervised dimension reduction, both linear and nonlinear, and examined the first two dimensions of the embedding. Nonlinear dimension reduction can reveal hidden patterns in the form of sample groups. Second, nonlinear clustering was applied to confirm the well-posedness of the hidden patterns found by nonlinear dimension reduction. Finally, the workflow ends with the network analysis. This starts with the PC-corr algorithm, which reveals which combinations of bacteria (features) are responsible for the identified differences between the groups of samples. A fourth dataset (Parsons et al.29), which contains information on PPI treatment and H. pylori infection, is used only to validate the PC-corr network results. From the consensus bacteria found in each PC-corr network, we carried out a bacteria-metabolite multilayer analysis, which ends with a metabolite pathway enrichment analysis that provides evidence of possibly perturbed biological mechanisms.
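As a rough illustration of the first step only, the sketch below projects samples onto their first two linear principal components using plain Python. It is not the study's implementation: the caption does not name the specific linear or nonlinear methods used, so PCA by power iteration is an assumption made here for illustration. The function names (`center`, `covariance`, `top_eigenvector`, `pca_2d`) and the toy abundance table are hypothetical.

```python
import math

def center(rows):
    """Subtract each feature's mean so every column has mean zero."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]

def covariance(centered):
    """Sample covariance matrix (features x features) of centered data."""
    n = len(centered)
    cols = list(zip(*centered))
    return [[sum(a * b for a, b in zip(ci, cj)) / (n - 1) for cj in cols]
            for ci in cols]

def top_eigenvector(matrix, iterations=500):
    """Dominant eigenpair of a symmetric matrix via power iteration."""
    d = len(matrix)
    v = [float(i + 1) for i in range(d)]          # fixed, deterministic start
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    # Rayleigh quotient of the unit vector v gives the eigenvalue estimate.
    eigenvalue = sum(v[i] * sum(matrix[i][j] * v[j] for j in range(d))
                     for i in range(d))
    return v, eigenvalue

def pca_2d(rows):
    """Return (PC1, PC2) scores for each sample."""
    centered = center(rows)
    cov = covariance(centered)
    v1, lam1 = top_eigenvector(cov)
    # Deflate: remove the first component, then find the next one.
    d = len(cov)
    deflated = [[cov[i][j] - lam1 * v1[i] * v1[j] for j in range(d)]
                for i in range(d)]
    v2, _ = top_eigenvector(deflated)
    return [(sum(x * a for x, a in zip(row, v1)),
             sum(x * a for x, a in zip(row, v2))) for row in centered]

# Made-up abundances for three hypothetical taxa; rows are samples.
samples = [
    [10.0, 1.0, 5.0], [11.0, 2.0, 5.0], [9.0, 1.0, 6.0],   # hypothetical group A
    [1.0, 10.0, 5.0], [2.0, 11.0, 6.0], [1.0, 9.0, 5.0],   # hypothetical group B
]
scores = pca_2d(samples)
for pc1, pc2 in scores:
    print(f"{pc1:8.3f} {pc2:8.3f}")
```

In this toy table the two groups fall on opposite sides of the first component. A nonlinear embedding, as described in the caption, would be needed when the groups are not separable along a linear direction.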