Extended Data Figure 2: Modelling development of the gut microbiota during the first 24 months of life in healthy twins.
From: Development of the gut microbiota and mucosal IgA responses in twins and gnotobiotic mice

a, To estimate the number of OTUs needed to maximize predictive accuracy, OTUs were iteratively added to a series of RF models, starting with the OTU with the highest feature importance score and adding further OTUs in order of decreasing feature importance. To evaluate performance of the model, members of the 40 twin cohort were randomly assigned to ‘training’, ‘co-twin of training’, and ‘test’ sets (red, green, and blue, respectively) ten times, and the Spearman’s correlation coefficient and adjusted r² of a linear model were calculated for each model size (n = 10 models for each data point; mean ± s.e.m. values are plotted). The dashed vertical line indicates performance of a 25-OTU model across the three different sets. b, Predicted age was calculated for all faecal microbiota samples with a sparse 25-OTU RF-generated model. Chronological versus model-predicted age is plotted for each of the three data subsets (n = 1,477 faecal samples). The inset shows mean ± s.d. values for predicted microbiota age of samples in each monthly age bin. c, Heatmap of mean abundances over the first 24 months of life for the 25 OTUs used to generate the sparse model. Taxa are normalized by row, with hierarchical clustering (complete linkage; n = 1,477 faecal samples).
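The model-size sweep and the two performance metrics described in panel a can be illustrated with a minimal sketch. This is not the authors' code: the function names (`nested_feature_sets`, `average_ranks`, `spearman`, `adjusted_r2`) and the example OTU labels are hypothetical, the random forest fitting step itself is omitted, and the adjusted r² is assumed to come from a simple one-predictor linear model of predicted age against chronological age.

```python
from statistics import mean


def nested_feature_sets(importance):
    """Return nested OTU subsets: the top-1, top-2, ... OTUs by decreasing importance."""
    ranked = sorted(importance, key=importance.get, reverse=True)
    return [ranked[:k] for k in range(1, len(ranked) + 1)]


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman's correlation: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


def adjusted_r2(x, y, n_predictors=1):
    """Adjusted r^2 of a least-squares fit y ~ x (assumes a single predictor)."""
    n = len(x)
    r2 = pearson(x, y) ** 2  # for simple linear regression, r^2 equals Pearson r squared
    return 1 - (1 - r2) * (n - 1) / (n - n_predictors - 1)


# Hypothetical importance scores and ages (months), for illustration only.
subsets = nested_feature_sets({"otu_a": 0.5, "otu_b": 0.9, "otu_c": 0.1})
chronological = [1, 3, 6, 9, 12, 18, 24]
predicted = [2.0, 2.5, 7.0, 8.0, 13.5, 16.0, 21.0]
rho = spearman(chronological, predicted)
adj = adjusted_r2(chronological, predicted)
```

In a full analysis, each subset in `subsets` would be used to fit a separate RF model, and `rho` and `adj` would be computed per model size and averaged over the ten random assignments.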