Figure 1 | Scientific Reports

Figure 1

From: Predicting type 2 diabetes via machine learning integration of multiple omics from human pancreatic islets

MultiOmics integration workflow. (a) Illustration of the principle behind multiOmics integration: revealing patterns that are hidden in individual Omics. The classes of the data points cannot be reliably determined along either Omics axis alone, but become linearly separable when the Omics are plotted against each other. (b) Machine Learning methods for multiOmics integration, including supervised linear methods such as Partial Least Squares (PLS) regression, Orthogonal Partial Least Squares (OPLS), mixOmics, Least Absolute Shrinkage and Selection Operator (LASSO), Ridge regression and Elastic net regularization; supervised non-linear methods such as Neural networks, Random Forest and Bayesian networks; unsupervised linear methods such as Factor analysis and MultiOmics Factor Analysis (MOFA); and unsupervised non-linear methods such as autoencoders, Similarity Network Fusion (SNF), Uniform Manifold Approximation and Projection (UMAP) and Clustering of clusters. The choice of integrative multiOmics method depends on (1) the sample size and (2) the presence of a phenotype of interest. In this study we prioritized a supervised linear method (PLS) because we have a limited number of samples and T2D is a clear phenotype of interest. (c) Schematic overview of the aim of multiOmics integration: to boost predictive capacity compared with the predictions from each Omic individually. Because the Omics data are sampled from very different underlying probability distributions (illustrated at the top of the figure), we cannot simply concatenate the Omics into a single matrix without at least first converting them to a common space in which their technological memory is lost (left box).
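
The ideas in panels (a) and (c) can be sketched with a toy example. Everything in it is an assumption made for illustration: the data are synthetic, the names `omic_a` and `omic_b` are hypothetical, per-feature z-scoring stands in for "converting to a common space", and ordinary least squares stands in for a supervised linear method. It is not the PLS workflow used in the study.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
labels = np.repeat([0, 1], n // 2)

# Synthetic toy data (assumption): one feature per "omic". Both features share
# a strong common component; the class only shifts them in opposite directions.
shared = rng.normal(0.0, 3.0, size=n)
shift = np.where(labels == 1, 0.5, -0.5)
omic_a = shared + shift + rng.normal(0.0, 0.1, size=n)
# Second omic lives on a very different measurement scale.
omic_b = 1000.0 * (shared - shift) + rng.normal(0.0, 100.0, size=n)


def zscore(x):
    """Center and scale a feature so that both omics share a common scale."""
    return (x - x.mean()) / x.std()


def best_threshold_accuracy(score, y):
    """Best training accuracy of any single threshold on a 1-D score."""
    best = 0.0
    for t in np.sort(score):
        acc = ((score > t).astype(int) == y).mean()
        best = max(best, acc, 1.0 - acc)  # allow either direction
    return best


# Each omic on its own: the classes overlap heavily along a single axis.
acc_a = best_threshold_accuracy(omic_a, labels)
acc_b = best_threshold_accuracy(omic_b, labels)

# Both omics together: standardize, concatenate, fit a linear score.
X = np.column_stack([zscore(omic_a), zscore(omic_b)])
y_centered = labels - labels.mean()
weights, *_ = np.linalg.lstsq(X, y_centered, rcond=None)
acc_joint = best_threshold_accuracy(X @ weights, labels)

print(f"omic A alone: {acc_a:.2f}")
print(f"omic B alone: {acc_b:.2f}")
print(f"A and B jointly: {acc_joint:.2f}")
```

The accuracies are computed on the training data only, so they illustrate separability rather than predictive performance; a real analysis would evaluate on held-out samples.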
