Fig. 1: Overview of the simulation setup based on real datasets.
From: A systematic benchmark of integrative strategies for microbiome-metabolome data

(A) Three microbiome-metabolome datasets were selected, each exhibiting different data structures and correlations. For each dataset, we report the sample size (N) and the number of features (P) as N × P. (B) Realistic datasets were simulated using the "Normal-to-Anything" (NORtA) framework. First, we estimated sparse microbiome and metabolome correlation networks using SpiecEasi. Second, correlated multivariate Gaussian distributions were generated for both the microbiome and the metabolome datasets using the correlation structures estimated in the previous step. Third, the Gaussian distributions were converted into arbitrary distributions matching those of the original data. (C) Associations between species and metabolites were specified, mimicking the complex entanglement between the two omic layers. For each dataset, the proportion of associated features varies between 1% and 10%, with association strengths drawn at random from a Gaussian distribution.
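
The second and third steps of panel B can be illustrated with a minimal sketch. It is not the authors' pipeline: it assumes the correlation matrix is already available (in the study it is estimated with SpiecEasi, an R package), it uses the empirical quantiles of each observed feature as the target marginal distribution, and it skips the correlation adjustment that a full NORtA procedure may apply so that the output correlations match the target. The names `cholesky`, `empirical_quantile`, and `norta_sample`, and the toy values at the end, are hypothetical and chosen for illustration only.

```python
import math
import random
from statistics import NormalDist


def cholesky(corr):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(corr)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(corr[i][i] - s)
            else:
                low[i][j] = (corr[i][j] - s) / low[j][j]
    return low


def empirical_quantile(sorted_values, u):
    """Inverse empirical CDF: map u in [0, 1] to one of the observed values."""
    idx = min(int(u * len(sorted_values)), len(sorted_values) - 1)
    return sorted_values[idx]


def norta_sample(corr, observed_columns, n_samples, seed=0):
    """Draw samples whose marginals follow the observed columns and whose
    dependence comes from a Gaussian with correlation matrix `corr`."""
    rng = random.Random(seed)
    low = cholesky(corr)
    std_normal = NormalDist()
    sorted_cols = [sorted(col) for col in observed_columns]
    p = len(corr)
    samples = []
    for _ in range(n_samples):
        # Step 2: correlated standard normal vector z = L @ eps.
        eps = [rng.gauss(0.0, 1.0) for _ in range(p)]
        z = [sum(low[i][k] * eps[k] for k in range(i + 1)) for i in range(p)]
        # Step 3: normal -> uniform -> target marginal ("anything").
        u = [std_normal.cdf(zi) for zi in z]
        samples.append([empirical_quantile(sorted_cols[j], u[j]) for j in range(p)])
    return samples


# Toy inputs (made up): one zero-inflated count feature, one continuous feature.
corr = [[1.0, 0.6], [0.6, 1.0]]
counts = [0, 0, 0, 1, 2, 5, 9, 30]
conc = [0.1, 0.4, 0.5, 0.9, 1.3, 2.2, 3.1, 4.8]
sim = norta_sample(corr, [counts, conc], n_samples=500)
```

Because each simulated value is read off the empirical quantile function, the simulated features can only take values seen in the original data. A parametric marginal (for example a zero-inflated count distribution) could be substituted in the third step without changing the rest of the sketch.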