Shipped and shifted: modeling collection-induced bias in microbiome multi-omics using a tractable fermentation system

Meyer, Annina R.; Tan, Jan Patrick; Mihaila, Mihnea Paul; Neugebauer, Michelle; Nyström, Laura; Bokulich, Nicholas A.

doi:10.1038/s41522-025-00909-1


Article
Open access
Published: 09 January 2026


Annina R. Meyer¹,
Jan Patrick Tan²,
Mihnea Paul Mihaila¹,
Michelle Neugebauer¹,
Laura Nyström² &
Nicholas A. Bokulich¹

npj Biofilms and Microbiomes volume 12, Article number: 43 (2026)



Abstract

Large-scale, decentralized microbiome sampling surveys and citizen science initiatives often require periods of storage at ambient temperature, potentially altering sample composition during collection and transport. We developed a generalizable framework to quantify and model these biases using sourdough as a tractable fermentation system, with samples subjected to controlled storage conditions (4 °C, 17 °C, 30 °C, regularly sampled up to 28 days). Machine-learning models paired with multi-omics profiling (microbiome profiling, targeted and untargeted metabolome profiling, and cultivation) revealed temperature-dependent shifts in bacterial community structure and metabolic profiles, while fungal communities remained stable. Storage induced ecological restructuring, marked by reduced network modularity and increased centrality of dominant taxa at higher temperatures. Notably, storage duration and temperature were strongly encoded in the multi-omics data, with temperature exerting a more pronounced influence than time. Of the top 25 predictors of storage condition, 24 were metabolites, underscoring functional layers as both sensitive to and informative of environmental exposure. These findings demonstrate that even short-term ambient storage (<2 days) can substantially reshape microbiome, metabolome, and biochemical profiles, posing risks to data comparability in decentralized studies and emphasizing the need to recognize and address such biases. Critically, the high predictability of storage history offers a path toward bias detection and correction, particularly when standardized collection protocols are infeasible, as is common in decentralized sampling contexts. Our approach enables robust quantification and modeling of such storage effects across multi-omics datasets, unlocking more accurate interpretation of large-scale microbiome surveys.


Introduction

Complex microbial ecosystems (microbiomes) colonize nearly every corner of our planet, playing key roles in food production as well as human, animal, plant, and planetary health^1,2,3. Accurate characterization of microbiome and metabolome composition and function critically depends on standardized protocols for sample collection, storage, and handling. Even minor deviations in storage conditions can shift microbial community structure, altering metabolic outputs. These shifts can bias downstream analyses and compromise data comparability, as demonstrated in aquatic, soil, and gut microbiomes^4,5,6,7,8. While immediate freezing at −20 °C or −80 °C remains the gold standard to minimize microbial distortion in fecal microbiomes, as confirmed by both amplicon^9,10 and whole metagenome sequencing¹¹, it is often infeasible in decentralized, large-scale sampling efforts and in field studies. To limit compositional drift during room-temperature exposure, alternative stabilization strategies (e.g., sampling kits, storage conditions) have been benchmarked specifically for fecal samples^12,13. However, these tests were matrix-specific and do not necessarily generalize to diverse biological or food microbiomes, both of which ideally need to be biobanked and preserved for sequencing-based characterization.

This challenge is especially pronounced in citizen science and global-scale microbiome surveys, where samples are often shipped under heterogeneous ambient conditions without cold-chain logistics. In citizen science projects like the American Gut Project⁶, Global Sourdough Project¹⁴, and HealthFerm Project¹⁵, samples frequently experienced uncontrolled temperature exposure before processing and analysis in the laboratory. Although storage effects on individual omics layers—such as microbial community composition—have been investigated, and correction strategies have been proposed for fecal samples^6,16, the extent and nature of such distortions across integrated multi-omics data and in non-fecal matrices such as food fermentations remain underexplored.

In fermented foods, residual substrates sustain metabolism, enabling not only the outgrowth of individual taxa but also community-wide restructuring in response to environmental stressors^17,18. Refrigeration slows but does not halt these dynamics: psychrotrophic bacteria such as Hafnia alvei and Pseudomonas spp., for example, remain metabolically highly active even at 4 °C¹⁹. Warmer storage accelerates metabolism and acidification, which initially contributes to food safety and spoilage prevention^20,21. However, this can also accelerate community restructuring, particularly through the loss of lactic acid bacteria (LAB) viability^19,22,23,24, favoring more acid- and heat-tolerant taxa²⁰. These shifts undermine both microbiome stability and the functional attributes of fermentation ecosystems.

In parallel to taxonomic shifts, prolonged storage can induce changes in the metabolite landscape. Residual sugars and organic acids present in fresh fermentations are transformed by ongoing microbial metabolism, and can further reshape the biochemical fingerprint of a ferment before visible changes in microbial composition occur²⁵. Such transformations can obscure biological signals, compromise reproducibility, and confound interpretation in multi-omics studies.

Sourdough provides a tractable, ecologically relevant model to investigate these dynamics. Its microbiome, dominated by LAB, acetic acid bacteria, and yeasts, is highly responsive to environmental perturbations^14,26,27,28,29. Moreover, compared with highly diverse host-associated and environmental microbiomes such as gut or soil communities, sourdough starters are typically dominated by a limited number of bacterial and fungal taxa, making them a simple yet ecologically interesting model for mechanistic studies of community and metabolic resilience.

While fermentation conditions have been extensively studied^30,31,32,33, the impact of post-sampling storage on sourdough microbiome and metabolome integrity remains poorly defined. Addressing this gap is critical—not only to safeguard interpretability in decentralized multi-omics studies of fermentations, but also to understand how environmental stress can reshape fermentation ecosystems, with potential implications for food quality and safety^34,35.

Here, we used sourdough as a tractable model fermentation system to investigate how realistic post-sampling storage conditions affect microbiome and metabolome profiles (Fig. 1). To simulate common sample shipping scenarios, sourdough aliquots were incubated at 4 °C, 17 °C, or 30 °C for up to 28 days without substrate replenishment. A temporal multi-omics design—spanning 16S rRNA and ITS amplicon sequencing, cultivation, untargeted FIA-MS metabolomics, and HPLC profiling—was used to dissect the progression of microbial and biochemical changes across storage conditions. Using machine learning, we demonstrated that both storage time and temperature were strongly and predictably encoded in multi-omics profiles, revealing consistent microbial and metabolic signatures of post-sampling environmental exposure. These patterns underscore the need for caution when interpreting data from decentralized sampling efforts, where storage conditions may confound biological signals.

**Fig. 1: Overview of experimental design, multi-omics analysis, and main findings.**

Results

Temperature and time drive microbial viability loss and biochemical divergence in sourdough starters during simulated transport

Large-scale microbiome initiatives such as citizen science projects expand access to diverse ecosystems but often rely on ambient-temperature collection and transport, raising concerns about post-collection storage-induced bias. To quantify these effects under a controlled experimental setting, we simulated typical transport conditions using sourdough aliquots incubated at 4 °C, 17 °C, or 30 °C—representing seasonal ambient temperatures—over 28 days without substrate replenishment (Fig. 1).

At each time point, three biological replicates per temperature condition were analyzed. Microbiological analyses revealed rapid declines in viable lactic acid bacteria (LAB) and yeast at higher temperatures. By day 1, LAB colony forming units (CFUs) were already reduced at 30 °C relative to cooler conditions (Fig. 2a), with significant within-group declines from day 3 onward. Yeast viability declined from day 4–5 (Fig. 2b, Fig. S2c, d). For reference, measurements of sourdough characteristics (microbial and chemical) at day 0 are shown in the supplementary figures (e.g., Fig. S1), but were not included in the statistical comparisons because day 0 corresponds to the single mother sourdough batch prior to exposure to the experimental storage conditions (see Methods for details).

**Fig. 2: Simulated large-scale sourdough sampling reveals significant temperature- and time-dependent shifts in microbial and biochemical profiles.**

Storage temperature also impacted acidity and metabolite dynamics. pH declined gradually at 4 °C but increased over time at 17 °C and 30 °C (Fig. 2c, Fig. S2a), while total titratable acidity (TTA) rose across all conditions, with a sharper increase at elevated temperatures, doubling by day 14 (Fig. 2d, Fig. S2b). These shifts reflect accelerated microbial metabolism under non-cooled conditions, further evidenced by high-pressure liquid chromatography (HPLC)-based metabolite profiling (Figs. S1–S3). Acetic acid accumulated at 17 °C and 30 °C from day 6 onwards (Fig. S1e), whereas lactic acid, succinic acid, and ethanol remained comparatively stable (Figs. S1f–h, S2e, f, S3a, b). Sugars such as glucose, maltose, and sucrose were rapidly depleted at warmer temperatures (Fig. S1i, k, l), followed by glucose reappearance from day 5–7 onward at 30 °C (Fig. S3c). Fructose accumulation from day 14 at 30 °C likely reflects enhanced enzymatic hydrolysis of oligosaccharides at elevated temperatures, potentially coupled with reduced microbial uptake or conversion (Fig. S1j). The concurrent stability of mannitol levels under cold conditions, and their slight decrease at elevated temperatures (Figs. S1m, S3g), may suggest a temperature-dependent suppression of fructose-to-mannitol conversion, which may be linked to the activity or abundance of mannitol-producing heterofermentative LAB. We also observed substantial between-replicate variability in absolute acidity measurements and in sugar and acid concentrations at each time point (Figs. S2, S3). This variability reflects heterogeneity among independent semi-solid fermentations in separate containers, rather than technical noise or oscillations within a single dough, and is consistent with the inherent microscale heterogeneity of sourdough matrices.

Higher temperature catalyzes microbial succession and allows starvation-tolerant taxa to bloom

Microbial community trajectories were profiled using 16S rRNA (V4) and ITS amplicon sequencing. The baseline bacterial community consisted of five species: Fructilactobacillus sanfranciscensis, Levilactobacillus brevis, Furfurilactobacillus rossiae, Pediococcus parvulus, and Acetobacter cerevisiae|Acetobacter malorum (the two are indistinguishable in the 16S V4 region, and their type strains share 99.9% full-length 16S rRNA gene similarity³⁶; hereafter we refer to this ASV assignment as A. cerevisiae|malorum). Saccharomyces cerevisiae, out of 27 detected fungal species, dominated the fungal community across all samples (>97.5% relative abundance; Fig. 2e). At 4 °C, bacterial relative abundances remained stable throughout the one-month period under starvation. However, warm conditions favored rapid expansion of A. cerevisiae|malorum (at 17 °C from day 3, and at 30 °C from day 1), and also supported increased abundance of L. brevis and P. parvulus, while F. sanfranciscensis declined with increasing storage temperature and time (Fig. 2e).

To validate these trends and account for compositional bias, total bacterial load was quantified via 16S rRNA qPCR (Fig. S4a, d). Neither total abundance nor DNA yield varied significantly across time or temperature (Fig. S4b, c). Combining qPCR with relative abundance data enabled absolute quantification of species dynamics: F. sanfranciscensis remained dominant at 4 °C; at 17 °C, A. cerevisiae|malorum overtook by day 14; and at 30 °C, A. cerevisiae|malorum dominated by day 6 (Fig. S4e).
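The combination of qPCR totals with relative abundances described above can be sketched as a simple scaling step. This is a minimal illustration, assuming the total 16S copy number is multiplied by each taxon's share; the function name and the sample values are hypothetical and are not taken from the study.

```python
def absolute_abundance(rel_abund, total_copies):
    """Scale relative abundances by a qPCR-derived total (e.g., 16S copies per gram)."""
    total = sum(rel_abund.values())
    if total <= 0:
        raise ValueError("relative abundances must sum to a positive value")
    # Renormalize first so the result is valid for fractions, percentages, or read counts.
    return {taxon: total_copies * value / total for taxon, value in rel_abund.items()}


# Hypothetical sample; the numbers are illustrative, not measurements from the study.
sample = {
    "F. sanfranciscensis": 0.70,
    "A. cerevisiae|malorum": 0.20,
    "L. brevis": 0.10,
}
abs_counts = absolute_abundance(sample, total_copies=1.0e9)
for taxon, copies in abs_counts.items():
    print(f"{taxon}: {copies:.2e} copies")
```

Because the total load did not vary significantly across conditions in this experiment, absolute and relative trends should largely agree here; the scaling matters more in systems where total biomass changes.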

Correlational clustering of microbial, physicochemical, and metabolite data contextualized these shifts (Fig. 2f). A. cerevisiae|malorum, P. parvulus, and L. brevis co-clustered with temperature and pH, linking their activity to warming and acidification. This group was closely aligned with acetic acid, TTA, and fructose, indicating that elevated TTA is primarily driven by higher acetic acid accumulation relative to lactic acid. This is consistent with the higher pKₐ of acetic acid (≈4.75) compared to lactic acid (≈3.86). Conversely, F. sanfranciscensis clustered with readily fermentable sugars (glucose, maltose, sucrose), highlighting its stricter dependency on primary carbohydrate availability³⁷. LAB and yeast CFUs, DNA yields, and residual sugars and acids formed a final cluster reflecting shared responses to broader ecological restructuring (Fig. 2f).
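The pKₐ argument can be made concrete with the Henderson–Hasselbalch relation: at a given pH, the weaker acid (higher pKₐ) stays more protonated, so it adds to titratable acidity, which counts dissociated and undissociated acid alike, while releasing fewer free protons. The sketch below uses the two pKₐ values quoted in the text; the function name is an assumption for illustration, and the calculation ignores ionic strength and buffering by the dough matrix.

```python
PKA_ACETIC = 4.75  # value quoted in the text
PKA_LACTIC = 3.86  # value quoted in the text


def undissociated_fraction(ph, pka):
    """Fraction of a monoprotic acid present in protonated form (Henderson-Hasselbalch)."""
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


for ph in (3.5, 4.0, 4.5):
    acetic = undissociated_fraction(ph, PKA_ACETIC)
    lactic = undissociated_fraction(ph, PKA_LACTIC)
    print(f"pH {ph}: acetic {acetic:.2f} protonated, lactic {lactic:.2f} protonated")
```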

Compositional restructuring hints at competitive dynamics under varying storage conditions

Alpha-diversity analyses reinforced the temperature- and time-driven bacterial succession observed in abundance profiles. Richness did not change significantly under any condition for either bacteria or fungi, regardless of taxonomic resolution (ASV, OTU, k-mer; Fig. 3a, c; Fig. S5a, b, d). However, bacterial evenness increased significantly (by day 1 at 30 °C and from day 4 at 17 °C), indicating selective expansion of certain taxa under warmer conditions (Fig. 3b). Fungal evenness remained unchanged (Fig. 3d), consistent with S. cerevisiae’s near-total dominance across samples.
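The distinction between richness and evenness drawn above can be illustrated with a short sketch. It assumes Pielou's evenness (Shannon entropy divided by its maximum for the observed richness), which is one common choice; the text does not state which evenness index was used, and the count vectors are hypothetical.

```python
import math


def richness(counts):
    """Number of features observed at least once."""
    return sum(1 for c in counts if c > 0)


def shannon(counts):
    """Shannon entropy (natural log) of a count vector."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def pielou_evenness(counts):
    """Shannon entropy scaled by its maximum, ln(richness); 0.0 when fewer than two features."""
    s = richness(counts)
    if s < 2:
        return 0.0
    return shannon(counts) / math.log(s)


# Hypothetical communities with identical richness but different evenness.
dominated = [97, 1, 1, 1]
balanced = [25, 25, 25, 25]
print(richness(dominated), round(pielou_evenness(dominated), 3))
print(richness(balanced), round(pielou_evenness(balanced), 3))
```

Both vectors contain four taxa, so richness cannot distinguish them, whereas evenness rises as the dominant taxon loses its share. This is the pattern reported for bacteria at 17 °C and 30 °C.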

**Fig. 3: Storage temperature and duration drive bacterial, but not fungal, community restructuring in sourdough.**

To capture compositional imbalance, we applied ANCOM-BC2 differential abundance modeling. This revealed sharp declines in the log-ratio of enriched to outcompeted bacterial taxa between days 4–5 at 17 °C and days 5–6 at 30 °C (Fig. 3e), reflecting a community-wide turnover. No such restructuring occurred at 4 °C or in fungal communities at any temperature (Fig. 3f; Fig. S6), highlighting the higher sensitivity of bacterial communities to storage-induced metabolic stress in this model system. Strikingly, A. cerevisiae|malorum and F. sanfranciscensis exhibited strongly opposing trajectories across all timepoints and temperatures (Fig. 3g), supporting a model of contrasting ecological adaptations: F. sanfranciscensis thrives in cool, sugar-rich conditions, whereas A. cerevisiae|malorum is better adapted to warm, more nutrient-depleted and acidified environments.
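A log-ratio between two groups of taxa, as summarized above, can be computed per sample as follows. This sketch shows only the summary statistic, not ANCOM-BC2 itself (which estimates bias-corrected log-fold changes); the function name, the pseudocount, the taxon labels, and the read counts are all assumptions for illustration.

```python
import math


def log_ratio(sample, numerator, denominator, pseudocount=1.0):
    """Natural log of summed counts in `numerator` taxa over summed counts in `denominator` taxa."""
    top = sum(sample.get(t, 0) for t in numerator) + pseudocount
    bottom = sum(sample.get(t, 0) for t in denominator) + pseudocount
    return math.log(top / bottom)


# Hypothetical read counts for one replicate at two time points.
early = {"F_sanfranciscensis": 900, "A_cerevisiae_malorum": 100}
late = {"F_sanfranciscensis": 300, "A_cerevisiae_malorum": 700}

group_a = ["A_cerevisiae_malorum"]
group_b = ["F_sanfranciscensis"]
print(round(log_ratio(early, group_a, group_b), 3))
print(round(log_ratio(late, group_a, group_b), 3))
```

Log-ratios are unaffected by sequencing depth, which is why they are a common way to track turnover in compositional data.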

Temperature and time shape bacterial and fungal beta-diversity in opposite directions

To dissect microbial restructuring under storage, we analyzed beta-diversity in bacterial (Fig. 4a, b) and fungal (Fig. 4c, d) communities. Among bacteria, Bray–Curtis distances captured greater variance than Jaccard across all feature levels, with k-mer-based PCoA explaining the most (PCo1: 86.97%, PCo2: 6.54%), outperforming OTUs (79.44%/9.76%) and ASVs (78.0%/10.56%). This highlights the superior resolution of k-mers, even within a low-complexity system like sourdough.
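The gap between the two metrics follows from what each one measures: Bray–Curtis is abundance-weighted, while Jaccard sees only presence or absence. A minimal sketch with hypothetical count vectors (the function names are illustrative):

```python
def bray_curtis(a, b):
    """Abundance-weighted dissimilarity: sum |a_i - b_i| / sum (a_i + b_i)."""
    denom = sum(x + y for x, y in zip(a, b))
    if denom == 0:
        raise ValueError("both samples are empty")
    return sum(abs(x - y) for x, y in zip(a, b)) / denom


def jaccard(a, b):
    """Presence/absence dissimilarity: 1 - |shared features| / |features in either sample|."""
    present_a = {i for i, x in enumerate(a) if x > 0}
    present_b = {i for i, x in enumerate(b) if x > 0}
    union = present_a | present_b
    if not union:
        raise ValueError("both samples are empty")
    return 1.0 - len(present_a & present_b) / len(union)


# Two hypothetical samples containing the same three taxa in very different proportions.
cold = [90, 5, 5, 0]
warm = [10, 45, 45, 0]
print(bray_curtis(cold, warm))
print(jaccard(cold, warm))
```

When the same few taxa persist and only their proportions change, as the stable richness in this system suggests, the presence/absence distance stays near zero and the abundance-weighted distance carries the signal.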

**Fig. 4: Storage modulates microbial community structure via compositional shifts, captured most sensitively by k-mer-based profiling.**

Biplot overlays revealed that 4 °C samples clustered with higher glucose, maltose, sucrose, mannitol, ethanol, lactic acid, CFU counts, and F. sanfranciscensis, while 17 °C and 30 °C samples correlated with A. cerevisiae|malorum, elevated acetic acid levels, TTA, pH, and fructose (Fig. 4b).

Fungal beta-diversity was less temperature-dependent, though Bray–Curtis again explained more variance. Two S. cerevisiae ASVs associated with high temperature and long storage correlated with TTA, pH, acetic acid, and fructose, while 4 °C samples were enriched in minor taxa, DNA yield, CFUs, mannitol, and ethanol (Fig. 4d).

Beta-dispersion analyses revealed rising bacterial heterogeneity with storage duration and temperature, as shown by increasing mean distances to centroid: 0.049 (4 °C), 0.240 (17 °C), 0.337 (30 °C), and over time from 0.064 (day 1) to 0.218 (day 4) and 0.356 (day 21). PERMDISP confirmed significant dispersion increases (temperature: F_ASV = 40.262, F_OTU = 46.166, F_kmer = 42.082; day: F_ASV = 3.508, F_OTU = 3.413, F_kmer = 4.160; all p = 0.001). In contrast, fungal dispersion remained stable (mean distance to centroid: 0.016–0.018; PERMDISP p > 0.2), indicating higher resilience to storage-induced stress.

PERMANOVA supported these patterns: for bacteria, variance was primarily explained by temperature (R² = 0.36, p = 0.001), followed by nested effects (temperature(day), R² = 0.35, p = 0.001), and day (R² = 0.14, p = 0.001) (Fig. S7a); for fungi, day had stronger effects (R² = 0.14, p = 0.001) than temperature (R² = 0.04, p = 0.009) (Fig. S7d). Nesting by temperature within day explained more variance than the inverse in both kingdoms (Fig. S7c, f).
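For readers less familiar with PERMANOVA, the sketch below shows how a one-way pseudo-F, R², and permutation p-value are derived from a distance matrix. It is a simplified, unnested illustration; the nested design and software settings used in the study are not reproduced here. The toy distances, group labels, and the choice of 999 permutations are assumptions.

```python
import random


def sum_sq(dist, idx):
    """Sum of squared pairwise distances among samples `idx`, divided by the group size."""
    return sum(dist[i][j] ** 2 for a, i in enumerate(idx) for j in idx[a + 1:]) / len(idx)


def pseudo_f_and_r2(dist, labels):
    """One-way PERMANOVA statistics from a square distance matrix."""
    n = len(labels)
    groups = sorted(set(labels))
    ss_total = sum_sq(dist, list(range(n)))
    ss_within = sum(
        sum_sq(dist, [i for i, lab in enumerate(labels) if lab == g]) for g in groups
    )
    ss_between = ss_total - ss_within
    f_stat = (ss_between / (len(groups) - 1)) / (ss_within / (n - len(groups)))
    return f_stat, ss_between / ss_total  # R2 is the share of total variation explained


def permanova(dist, labels, permutations=999, seed=0):
    """Observed pseudo-F, R2, and permutation p-value from shuffled group labels."""
    f_obs, r2 = pseudo_f_and_r2(dist, labels)
    rng = random.Random(seed)
    shuffled = list(labels)
    hits = 0
    for _ in range(permutations):
        rng.shuffle(shuffled)
        if pseudo_f_and_r2(dist, shuffled)[0] >= f_obs:
            hits += 1
    return f_obs, r2, (hits + 1) / (permutations + 1)


# Toy one-dimensional "communities": two well-separated groups of three replicates.
points = [0.0, 0.1, 0.2, 1.0, 1.1, 1.2]
dist = [[abs(a - b) for b in points] for a in points]
labels = ["4C"] * 3 + ["30C"] * 3
f_stat, r2, p_value = permanova(dist, labels)
print(round(f_stat, 2), round(r2, 3), p_value)
```

With this estimator, the smallest attainable p-value under 999 permutations is 1/1000 = 0.001, and R² is a proportion of the total sum of squares.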

Notably, nested PERMANOVA revealed divergent temperature-dependence: bacterial communities showed greatest temporal variance at 4 °C (R² = 5.168, p = 0.006), decreasing at 17 °C (R² = 4.284, p = 0.001) and 30 °C (R² = 3.164, p = 0.006); fungi displayed the opposite trend, with variance increasing from 4 °C (R² = 1.155, p = 0.320) to 30 °C (R² = 2.632, p = 0.007). Taken together, this suggests that while bacterial communities at higher temperatures became more compositionally variable, these changes were also more stochastic across replicates. Fungal communities, by contrast, exhibited stable dispersion but increasing PERMANOVA R² at higher temperatures, pointing to a greater susceptibility to temporal restructuring than in bacterial communities, with this restructuring consistently shared across samples.

Untargeted metabolomics reveals temperature- and time-dependent metabolite clusters with distinct functional enrichments

To expand beyond targeted chemical analysis, we applied untargeted FIA-MS metabolomics, detecting 1097 ion features across all sourdough samples (Fig. S8). Elevated temperatures led to increased metabolic diversity: at 17 °C, the number of features doubled by day 7, and at 30 °C, this increase was already evident by day 2 compared to 4 °C and day 1 (Fig. S8a, c). In line with bacterial patterns, feature evenness remained stable at 4 °C but rose progressively at 17 °C and more rapidly at 30 °C (Fig. S8b, d).

Principal component analysis of FIA-MS profiles under different normalization schemes (TSS, log, z-score) revealed temperature-driven clustering analogous to 16S Bray–Curtis ordinations (Fig. 5a). TSS normalization provided the clearest separation (PC1 = 60.11%, PC2 = 21.54%), outperforming log (39.71%/5.26%) and z-score (51.89%/12.23%). Bi-plot overlays indicated strong correlations between 4 °C samples and fermentable sugars (sucrose, glucose, maltose), F. sanfranciscensis, CFUs, and ethanol. In contrast, warmer storage aligned with increased concentrations of acetic acid, pH, TTA, A. cerevisiae|malorum, and subdominant taxa (P. parvulus, L. brevis, F. rossiae) (Fig. 5a).
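The three normalization schemes compared above can be sketched as follows. The exact transforms used in the study (for example the pseudocount or the log base) are not given in this section, so the choices below are assumptions, as are the function names and the toy intensity matrix.

```python
import math
from statistics import mean, pstdev


def tss(row):
    """Total-sum scaling: divide each intensity in a sample by the sample total."""
    total = sum(row)
    return [v / total for v in row]


def log_transform(row, pseudocount=1.0):
    """Log10 transform with a pseudocount to keep zero intensities finite."""
    return [math.log10(v + pseudocount) for v in row]


def zscore_features(matrix):
    """Z-score each feature (column) across samples (rows); constant features map to 0."""
    scaled_cols = []
    for col in zip(*matrix):
        mu, sd = mean(col), pstdev(col)
        scaled_cols.append([(v - mu) / sd if sd > 0 else 0.0 for v in col])
    return [list(row) for row in zip(*scaled_cols)]


# Toy intensity matrix: rows are samples, columns are ion features.
intensities = [[200.0, 300.0, 500.0], [50.0, 50.0, 900.0]]
print([tss(row) for row in intensities])
print([log_transform(row) for row in intensities])
print(zscore_features(intensities))
```

TSS preserves within-sample proportions, which may explain why it aligns well with relative-abundance-based microbiome ordinations, whereas z-scoring gives every feature equal weight regardless of intensity.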

**Fig. 5: Storage conditions restructure the sourdough metabolome.**

Procrustes analysis demonstrated strong concordance between metabolomic and bacterial community structures: FIA_TSS vs. 16S PCoA (M² = 0.437, p = 0.001), FIA_TSS PCA vs. 16S PCoA (M² = 0.498, p = 0.001), and FIA z-score vs. 16S (M² = 0.591, p = 0.001), with TSS offering the best alignment (Table S7). Concordance was highest at 17 °C (M² = 0.438), followed by 30 °C (M² = 0.593), and was non-significant at 4 °C (M² = 0.987, p = 0.718). Temporally, microbiome–metabolome synchrony peaked between days 6–14 (M² = 0.241–0.277, p < 0.01), diminishing during early and late storage phases (Table S8).
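As a reading aid for the M² values above, the sketch below computes a symmetric Procrustes statistic for two two-dimensional ordinations. It assumes both configurations are centered and scaled to unit sum of squares, in which case M² = 1 − (σ₁ + σ₂)², where σ₁ and σ₂ are the singular values of the 2×2 cross-product matrix. The study's software and settings are not stated here, and the coordinates are hypothetical.

```python
import math


def _standardize(points):
    """Center a 2-D configuration and scale it to unit sum of squares."""
    n = len(points)
    cx = sum(p[0] for p in points) / n
    cy = sum(p[1] for p in points) / n
    centered = [(x - cx, y - cy) for x, y in points]
    norm = math.sqrt(sum(x * x + y * y for x, y in centered))
    if norm == 0:
        raise ValueError("configuration has no spread")
    return [(x / norm, y / norm) for x, y in centered]


def procrustes_m2(a, b):
    """Symmetric Procrustes M2 for paired 2-D configurations (0 = perfect match, 1 = none)."""
    a, b = _standardize(a), _standardize(b)
    # Entries of the 2x2 cross-product matrix A^T B.
    m00 = sum(p[0] * q[0] for p, q in zip(a, b))
    m01 = sum(p[0] * q[1] for p, q in zip(a, b))
    m10 = sum(p[1] * q[0] for p, q in zip(a, b))
    m11 = sum(p[1] * q[1] for p, q in zip(a, b))
    frobenius_sq = m00 ** 2 + m01 ** 2 + m10 ** 2 + m11 ** 2
    det = m00 * m11 - m01 * m10
    # For a 2x2 matrix, (s1 + s2)^2 = ||M||_F^2 + 2|det M|.
    return max(0.0, 1.0 - (frobenius_sq + 2.0 * abs(det)))


# Hypothetical ordination and a rotated, rescaled, shifted copy of it.
ordination = [(0, 0), (1, 0), (1, 1), (0, 2)]
transformed = [(5, 5), (5, 8), (2, 8), (-1, 5)]
print(procrustes_m2(ordination, transformed))
```

Lower M² means closer agreement after optimal rotation, reflection, and scaling, which is why the value of 0.987 at 4 °C corresponds to essentially no shared structure.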

K-means clustering partitioned the FIA-MS features into three distinct temporal profiles (Fig. 5b–d), each reflecting coordinated biochemical responses. Features were annotated against the Kyoto Encyclopedia of Genes and Genomes (KEGG) and the Human Metabolome Database (HMDB) within ±0.002 Da and categorized into fermentation-relevant metabolite classes following the HMDB metabolite taxonomy.

Cluster 0: Nucleotides, aromatic compounds, hormones, and toxin-like metabolites remained stable at 4 °C but increased at 17 °C and 30 °C (Fig. 5b), likely reflecting a combination of enhanced nucleotide turnover and passive release via cellular lysis under nutrient limitation and increased (auto-)acidification at elevated temperatures. Similar shifts in nucleotide pools have been reported in lactic acid bacteria under nutrient limitation, acid stress, and oxidative stress, all of which stimulate increased nucleotide salvage, DNA/RNA repair, and the production of stress-signaling nucleotides such as c-di-AMP^38,39.
Cluster 1: Enriched in amino acids, fatty acyls, glycerolipids, organoheterocycles, prenol lipids, and steroids. These catabolic metabolites surged at 30 °C during days 5–10, appeared later at 17 °C, and remained depleted at 4 °C (Fig. 5c). These dynamics align with pronounced proteolysis, lipid remodeling, and stringent-response-mediated amino acid recycling under starvation stress and elevated temperatures³⁹. However, given the absence of significant total bacterial biomass accumulation (qPCR, Fig. S4e), non-microbial sources such as endogenous wheat proteases and lipoxygenases may also contribute, particularly in response to thermal activation.
Cluster 2: Carbohydrate-related features (oligosaccharides, sugar alcohols) showed depletion at elevated temperatures and stability or slight enrichment in early phases at 4 °C (Fig. 5d), mirroring faster sugar consumption due to elevated enzymatic activity at warmer storage conditions with progressing time.
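Mass-based annotation within a fixed tolerance, as used for the cluster annotations above, amounts to a window lookup against reference masses. The sketch below assumes a ±0.002 Da window as stated in the text; the reference table, compound labels, and m/z values are placeholders, not entries from KEGG or HMDB.

```python
def annotate(mz, reference, tol_da=0.002):
    """Return all reference entries whose mass lies within +/- tol_da of the measured m/z."""
    return sorted(name for name, ref_mz in reference.items() if abs(mz - ref_mz) <= tol_da)


# Placeholder reference table; labels and masses are illustrative only.
reference = {
    "compound_A": 179.0561,
    "compound_B": 179.0600,
    "compound_C": 89.0244,
    "compound_D": 179.0585,
}

print(annotate(179.0570, reference))  # one candidate within the window
print(annotate(179.0575, reference))  # two candidates: annotation is ambiguous
print(annotate(100.0000, reference))  # no candidate
```

Flow-injection MS has no chromatographic separation, so features with several candidates within the window remain putative, which is consistent with interpreting the clusters at the level of metabolite classes.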

Together, these trajectories reflect a temperature-dependent modulation of metabolic activity, with carbohydrate catabolism occurring more rapidly from day 1 onwards at warmer temperatures, and more gradually at 4 °C. The subsequent accumulation of proteolytic and nucleotide-associated metabolites at 17 °C and 30 °C, and their delayed rise at 4 °C (e.g., Cluster 0), suggest that core metabolic processes proceed at different rates depending on storage temperature. Overall, the data support a model of progressive, rate-controlled metabolic layering, in which warmer conditions accelerate both the onset and magnitude of nutrient stress responses such as proteolysis and nucleotide turnover.

Storage temperature restructures microbial-metabolite networks

To investigate how storage conditions reshape ecosystem connectivity, we constructed abundance-weighted co-correlation networks (Fig. 6a–c) and sparse conditional co-occurrence networks (Fig. 6d–f) for each storage temperature (4 °C, 17 °C, 30 °C). In the co-correlation networks, two major modules consistently emerged: one containing dominant taxa and their metabolite partners, and a second harboring subdominant species. At 4 °C, F. sanfranciscensis, F. rossiae, and S. cerevisiae dominated the core cluster (Fig. 6a). At 17 °C, network modularity diminished as clusters became more interconnected, with overall stronger co-correlations (Fig. 6b). By 30 °C, A. cerevisiae|malorum had moved from the subdominant cluster (4 °C) through the inter-module bridge (17 °C) into the dominant hub (Fig. 6c). This trajectory was mirrored by increasing betweenness and eigenvector centrality for A. cerevisiae|malorum, and a concurrent decline in centrality for F. sanfranciscensis (Fig. S9).

**Fig. 6: Ecological networks in sourdough microbiomes under starvation at different temperatures.**

Sparse conditional networks uncovered temperature-specific rewiring of direct associations. At 4 °C, F. sanfranciscensis remained largely disconnected, with no significant positive microbiome or metabolome associations, while less dominant bacterial species exhibited stronger connectivity (Fig. 6d). Centrality analyses confirmed F. sanfranciscensis had the lowest centrality values at 4 °C (Fig. S10), whereas A. cerevisiae|malorum retained higher influence across all temperatures. At 30 °C, F. sanfranciscensis gained closeness centrality and established a positive edge with A. cerevisiae|malorum (Fig. 6f), suggesting increased ecological interdependence in the bacterial sourdough community under warmer fermentation conditions.

Together, these network analyses reveal that rising temperatures not only shift microbial abundances but also rewire co-occurrence topology—amplifying the integrative role of A. cerevisiae|malorum while diminishing the dominance of F. sanfranciscensis. This rewiring may reflect broader ecological adaptation across microbial and metabolic layers.

Functional metabolite profiles outperform taxonomic composition in predicting storage history

To identify which features most robustly reflect storage-induced perturbations, we benchmarked predictive performance of individual and integrated data layers—including amplicon-based microbiome profiles, untargeted FIA-MS metabolomics, and physicochemical metadata (HPLC sugars/acids, CFUs, pH, TTA)—using cross-validated random forest models (Fig. 7a, Fig. S11).

**Fig. 7: Storage-phase classification and predictive markers in sourdough microbiomes.**

Storage duration was most accurately predicted from HPLC and metadata (macro-average AUC = 0.92), with LAB CFUs, fructose, and mannitol as top-ranked features (Fig. S12a). Predictive performance for storage duration based on the microbiome was poor (AUC = 0.58) and was driven by low-abundance fungi (e.g., P. variabilis, D. fristingensis, A. pullulans) and A. cerevisiae|malorum (Fig. S13a), highlighting limited temporal specificity in community composition alone. Storage temperature was again best captured by HPLC features and metadata (AUC = 0.98), dominated by maltose, glucose, and TTA (Fig. S12b). The microbiome was more predictive of temperature (AUC = 0.84) than of time; the most discriminatory features included F. sanfranciscensis, P. parvulus, and P. variabilis (Fig. S13b). This likely reflects their differential resilience to temperature-induced accelerated acidification, contributing to temperature-dependent niche structuring and community dynamics^27,39.
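The macro-average AUC reported above can be read as the mean of one-vs-rest AUCs over the classes. The sketch below computes it from predicted class probabilities using the rank interpretation of AUC. It does not train a random forest, and the class names and probabilities are made up for illustration.

```python
def binary_auc(scores, is_positive):
    """AUC as the probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, is_positive) if y]
    neg = [s for s, y in zip(scores, is_positive) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def macro_auc(prob_rows, y_true, classes):
    """Unweighted mean of one-vs-rest AUCs, one per class."""
    aucs = []
    for k, cls in enumerate(classes):
        scores = [row[k] for row in prob_rows]
        aucs.append(binary_auc(scores, [y == cls for y in y_true]))
    return sum(aucs) / len(aucs)


# Hypothetical held-out predictions for three storage temperatures.
classes = ["4C", "17C", "30C"]
y_true = ["4C", "4C", "17C", "17C", "30C", "30C"]
probs = [
    [0.8, 0.1, 0.1],
    [0.6, 0.3, 0.1],
    [0.2, 0.7, 0.1],
    [0.3, 0.4, 0.3],
    [0.1, 0.2, 0.7],
    [0.2, 0.5, 0.3],
]
print(macro_auc(probs, y_true, classes))
```

Macro-averaging weights each class equally, so a model cannot reach a high score by doing well only on the most common storage condition.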

Classification of combined temperature–time phases (Phase 1: days 1–4, Phase 2: days 5–10, Phase 3: days 14–28) achieved near-perfect accuracy with FIA-MS (AUC = 0.99), HPLC + metadata (AUC = 0.99), and full multi-omics integration (AUC = 0.98). Microbiome data alone still classified phases well, but less accurately (AUC = 0.80) (Fig. S11). Strikingly, 24 of the top 25 predictive features originated from the metabolome, particularly FIA-MS Cluster 0, enriched in nucleotides, steroids, and stress-response metabolites with increasing abundance under warmer conditions and extended storage (Fig. 7b, Fig. S13f). Additional contributions came from Clusters 1 (amino acids) and 2 (carbohydrates), reflecting metabolic transitions from carbohydrate depletion to proteolysis and secondary metabolite accumulation. HPLC-measured maltose was the only non-FIA-MS predictor in the top 25.
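The phase boundaries given above translate directly into a labeling rule. In the sketch below, the day ranges come from the text, but the way temperature and phase are joined into a single class label is an assumption for illustration, as are the function names.

```python
def storage_phase(day):
    """Map a storage day to the phase bins stated in the text."""
    if 1 <= day <= 4:
        return "Phase 1"
    if 5 <= day <= 10:
        return "Phase 2"
    if 14 <= day <= 28:
        return "Phase 3"
    raise ValueError(f"day {day} falls outside the defined phases")


def phase_label(temperature_c, day):
    """Hypothetical combined class label, e.g. '30C_Phase 2'."""
    return f"{temperature_c}C_{storage_phase(day)}"


for temperature_c, day in [(4, 1), (17, 7), (30, 21)]:
    print(phase_label(temperature_c, day))
```

Days 11 to 13 fall between the stated bins, so the function raises an error for them rather than guessing a phase.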

These results highlight that metabolite profiles—reflecting the functional output of microbial activity—more robustly capture storage history than taxonomic composition alone. This decoupling of function from community structure underscores the central role of metabolic adaptation in driving fermentation dynamics, particularly under varying temperature conditions and extended storage durations.

Discussion

Decentralized microbiome sampling, which is crucial to citizen science and global biodiversity surveys, faces a persistent challenge: preserving data integrity in the absence of controlled post-collection storage. Understanding how environmental exposures can reshape microbial composition and function is therefore critical for elucidating ecological dynamics and eliminating bias in large-scale microbiome studies. Our multi-omics analysis revealed that even short-term ambient storage (17–30 °C) restructures microbial ecosystems across multiple omics layers. In a controlled sourdough model, bacterial communities, metabolite profiles, and ecological network topology underwent pronounced shifts with storage temperature and duration, while fungal communities remained stable. These results highlight the higher sensitivity of bacterial and functional layers to environmental perturbation as compared to the fungal community, as observed in our relatively low-complexity, closed sourdough ecosystems, and show that these shifts can be modeled with high accuracy. More broadly, they reinforce that post-collection storage is not a neutral holding phase, but a dynamic window of ecological change with profound implications for microbiome data integrity, reproducibility, comparability, and multi-omic data interpretation.

Importantly, shifts in metabolome profiles preceded microbial community restructuring, and temperature, not time, was the primary lever. In the sourdough model at 4 °C, the core sourdough bacterium Fructilactobacillus sanfranciscensis remained dominant throughout, reflecting its cold tolerance and efficient sugar utilization in mildly acidic environments³⁷, and metabolite profiles remained relatively stable over time. In contrast, higher temperatures accelerated substrate depletion (particularly of maltose and sucrose), triggering a succession from LAB to more acid- and storage-stress-tolerant taxa such as Acetobacter cerevisiae|malorum, consistent with their known higher acid tolerance and metabolic flexibility^40,41,42,43. This trajectory exemplifies an environmental filtering process⁴⁴, wherein acid accumulation and resource scarcity seem to shift the community from sourdough specialization toward resilience and higher generalism. In line with this, F. sanfranciscensis, with its smaller genome and narrower metabolic repertoire⁴⁵, declined with increasing storage temperature and time (Fig. 2e).

A methodological consideration is that our species labels are derived from short-read marker-gene data. Short-read 16S rRNA V4 amplicon sequencing provides limited species-level resolution, and taxonomic assignments are often most reliable at the genus or family level, depending on the clade⁴⁶. However, in our system, bacterial richness was low (maximum of eight OTUs/ASVs; Fig. 2a, Fig. S5a), and the classifier’s species-level labels matched canonical sourdough LAB species that have been described as prevalent and abundant across diverse sourdough ecosystems¹⁴. We further evaluated the robustness of our assignments by combining a SILVA 138.2 average-weighted Naïve Bayes classifier with manual blastn validation, against type-strain 16S rRNA sequences, of the Pediococcus and Acetobacter ASVs that had lower classification confidence (below c = 0.7) (Supplementary Fig. S16, Supplementary Tables S3–S4). Nevertheless, given the inherent limits of the 16S rRNA gene for bacterial classification and the dynamic nature of bacterial taxonomy, we treat the species labels as indicative but well supported within the sourdough niche and emphasize that our analyses and conclusions are robust to re-aggregation at the genus level, since the observed bacterial taxa belong to distinct genera.

These observed compositional transitions were likely driven by a combination of ecological stressors and metabolic interdependencies. As primary sugars (maltose, sucrose) were rapidly depleted, particularly under warmer storage, acidogenesis intensified and acetic acid accumulated (Fig. 2d, Fig. S1a). Acetic acid, due to its membrane permeability and higher pKa, imposes greater intracellular stress than lactic acid, which can promote the decline of acid-sensitive LAB such as F. sanfranciscensis and facilitate the expansion of more acid-tolerant taxa like A. cerevisiae|malorum^27,47. In parallel, metabolites produced by LAB and yeasts (such as lactic acid, ethanol, and amino acids) likely fostered cross-feeding by AABs, in line with established models of microbial succession involving AABs in other fermentation ecosystems⁴⁸.

In parallel, untargeted FIA-MS metabolomics revealed a temperature- and time-dependent enrichment of nucleotides and stress-associated lipid classes (e.g., glycerophospholipids), indicative of nucleic acid degradation, membrane remodeling, and microbial turnover (Fig. 5b–d). The accumulation of select amino acids likely reflects both proteolytic activity and metabolite recycling under nutrient-depleted, acidified conditions⁴⁹. Notably, glucose and fructose re-emerged at later timepoints (Fig. S3c), possibly released via endogenous carbohydrate breakdown or microbial lysis. These catabolic byproducts may have enabled the metabolic expansion of A. cerevisiae|malorum, which not only exhibits high stress tolerance but, as observed in kefir systems⁴⁸, may also represent a late-successional taxon in the sourdough ecosystem that capitalizes on nutrient landscapes shaped by earlier fermenters.

Despite pronounced metabolic and bacterial shifts, fungal community structure remained remarkably stable across conditions (Figs. 4, 5), highlighting a decoupling between taxonomic stability and functional activity. This supports the notion that community composition does not always predict ecosystem function, a phenomenon previously reported in, e.g., marine microbiomes⁵⁰. Importantly, our warmest and longest storage condition (30 °C for 28 days) was designed as an extreme upper-bound “stress test” for microbiome and metabolome stability rather than as a realistic or safe storage scenario for dough intended for consumption. The unchanged fungal evenness and the relatively stable fungal community structure under this condition should therefore not be interpreted as evidence of microbiological safety.

Under the studied storage conditions, the compositional decoupling was asymmetric: in bacterial communities, structural and functional profiles became more tightly coupled under warm, nutrient-limited conditions. Procrustes analyses revealed the strongest alignment between bacterial composition and metabolome profiles at 17 °C and 30 °C (M² ≈ 0.44–0.59; Fig. 5a), suggesting that under increased stress, microbial structure becomes a better predictor of function. This tighter coupling may arise from the erosion of functional redundancy under stress conditions.

Simultaneously, starvation stress exposure increased bacterial beta-dispersion (PERMDISP p < 0.001; Fig. 4), consistent with greater stochasticity and adaptive bottlenecks^51,52. These shifts imply that while function becomes more predictable from community structure under stress, the structure itself becomes more variable between replicates. Temperature effects explained progressively more variance in Bray-Curtis dissimilarity (nested PERMANOVA R²: 30 °C > 17 °C > 4 °C), underscoring the dominant role of environmental filtering in shaping microbial ecology during storage.

Multi-omics network analyses revealed profound rewiring of ecological interaction topologies in response to storage temperature. At 4 °C, networks exhibited a modular architecture with F. sanfranciscensis maintaining high abundance but lower connectivity, consistent with niche partitioning and community stability under cold, sugar-rich conditions (Fig. 6c, f; Fig. S9, S10). In contrast, elevated temperatures led to a collapse of modularity and emergence of centralized structures, where A. cerevisiae|malorum assumed a hub-like role with high betweenness and eigenvector centrality. This transition likely illustrates a stress-induced reorganization from distributed, niche-driven coexistence to centralized ecological dependence, with A. cerevisiae|malorum functioning as an interaction keystone—likely mediating metabolite fluxes or coordinating inter-kingdom dynamics, supporting its previously suggested metabolic importance when studied in synthetic sourdough co-cultures⁵³.

These shifts were uncovered through integrated, multi-layer network models that capture both positive and negative associations among bacteria, fungi, and metabolites. Unlike single-layer approaches, these frameworks reveal both abundance correlations and structural embedding, linking taxa to metabolite turnover and ecological roles. Notably, A. cerevisiae|malorum’s increased centrality did not merely reflect growth, but highlighted its functional anchoring as a metabolic and ecological mediator within the network at elevated temperatures. This underscores how community dominance, for example under stress conditions, can emerge not only from abundance but also from a taxon’s network position and embeddedness. Such multi-domain integration aligns with recent advances in gut microbiome research⁵⁴ and offers a powerful framework for investigating resilience and stability in fermented food ecosystems as well.

Building on this systems-level view, we next explored how predictive signatures of storage conditions are embedded within multi-omics profiles. Our supervised learning approach revealed that metabolite features were disproportionately sensitive to storage conditions: 24 of the top 25 predictors of storage temperature and duration were derived from the FIA-MS metabolome (Fig. 7b). This underscores the heightened vulnerability of functional layers to post-collection drift. Metabolomic data thus not only reflect system function but also encode metadata signatures of handling history and post-collection storage bias. Such sensitivity, while a source of distortion, also provides an opportunity—these molecular signatures could serve as proxies for sample integrity or environmental exposure.

These findings carry critical implications for microbiome studies relying on decentralized sampling and transport. Even brief exposure (1–2 days) to elevated temperatures triggered measurable changes in metabolite composition, bacterial evenness, and viability, while prolonged exposure (from 5 to 6 days onwards) induced divergence in community structure and ecological networks (Figs. 2–6). Notably, microbial succession lagged behind metabolome remodeling, underscoring the heightened sensitivity of the metabolome to handling conditions—and its potential utility as an early marker of post-collection perturbation.

Such storage-induced biases can distort ordination and clustering, mislead feature selection, and undermine the accuracy of downstream machine learning models—especially when unrecognized. Our findings echo concerns raised in other microbiome contexts, including human gut studies, where technical artifacts have been shown to dominate biological signals if uncorrected^16,55,56. These observations emphasize the urgent need for rigorous metadata tracking, awareness of post-collection shifts, and the development of standardized correction workflows—particularly in large-scale, decentralized food microbiome efforts. A limitation of our experimental design is that storage temperatures were held constant, whereas real-world shipping and home-storage conditions often involve repeated temperature fluctuations and short-term temperature increases or decreases⁵⁷. Our framework nevertheless captures the dominant directional effects of cooler versus warmer storage on community and metabolite trajectories and can be readily extended in future work to controlled fluctuating time–temperature profiles, as for example derived from empirical cold-chain measurements.

In conclusion, our study combines high-resolution storage tracking with machine learning–driven modeling in a controlled sourdough ecosystem to illuminate how microbial and metabolic profiles shift during post-collection storage. Through integrative, multi-omics analysis, we present a scalable, generalizable framework for quantifying and modeling collection-induced bias in microbiome research, and sourdough again proved its value as a tractable model for studying ecological and functional dynamics under environmental stress. Critically, the strong predictive signals encoded in storage-altered multi-omics profiles offer a new avenue for detecting, and potentially correcting, post-collection drift. This opens the door to more accurate, bias-aware interpretation and comparability of microbiome multi-omics data from decentralized, large-scale studies across diverse systems.

Methods

Fermentation experiments

To simulate transportation time and seasonal temperature variation, sourdough samples were stored at three constant temperatures (4 °C, 17 °C, and 30 °C) that bracket typical refrigerated and ambient storage conditions, and were sampled at 11 time points over one month (daily for the first 7 days, then on days 10, 14, 21, and 28). To prepare the sample sourdough, we first established a laboratory wheat sourdough starter. Briefly, 10 g organic wheat flour (L170524, Swissmill, CH-8037 Zurich) were mixed with 10 g autoclaved, deionized and filtered water in a sterile 100 mL glass beaker. Autoclaved, deionized and filtered water was used instead of tap water to minimize variability in initial microbial inputs from the water supply and to standardize water composition. The mixture was covered with a loose aluminum foil cap to allow air circulation and incubated at 20 °C and 80% relative humidity for 24 h in a climate chamber (Memmert Peltier incubator). The starter was then refreshed for 14 consecutive days by removing 10 g of fermented dough and adding 5 g fresh flour and 5 g autoclaved, deionized water. After these initial 14 days, the starter was back-slopped once per week, with each 24 h fermentation step followed by storage at 4 °C until the next refreshment.

This laboratory-prepared wheat sourdough starter was subsequently upscaled for use in the storage experiment. In a first upscaling step, the starter (dough yield 200) was expanded using 500 g of organic wheat flour and 0.5 L autoclaved, deionized water under the same fermentation conditions as above. After 24 h, this sourdough was further upscaled overnight to a total of 8 kg. The 8 kg batch was distributed across five containers (each inoculated with 200 g of the same mother sourdough, 700 g water, and 700 g flour; incubated at 20 °C and 80% relative humidity), homogenized, and then divided into 99 aliquots (Gosselin Straight Containers, Fisher Scientific AG, Cat. No. 15458824) of 75 g each to cover the three storage temperatures and 11 sampling points. Samples were incubated with closed lids until harvest. A sample of the mother sourdough was collected at day 0, immediately after the overnight upscaling at 20 °C and before aliquoting into individual containers. The measurements of this day 0 sample are included in the supplements (Fig. S1, Fig. S4, S5 and S8) for comparison, but were not included in the main statistical analyses, because the sample represents one batch (the ‘mother’ dough, no biological replicates), was fermented at 20 °C rather than at the three experimental storage temperatures, and has a different effective fermentation history than the subsequent 24 h storage intervals.

Sample collection and processing

At each time point, three aliquots per temperature condition were randomly selected for microbiological and chemical analyses. All samples (n = 99), including the baseline sourdough (day 0), were diluted 1:1 in molecular-grade water and stored at −80 °C for subsequent DNA and metabolite extraction. For microbiological analysis, 5 g of sourdough was homogenized with 45 g of 1% vegetable peptone (VWR International, Cat. Nr. OXOIVG0100B) and 0.85% NaCl solution, then serially diluted. Viable lactic acid bacteria (LAB) counts were determined by plating on De Man, Rogosa, and Sharpe 5 (MRS-5) agar⁵⁸, prepared following their exact recipe containing 0.1 g/L cycloheximide, except exchanging meat extract with Beef Extract 500 (Merck, Cat. Nr. B4888-500G), and peptone with vegetable peptone (VWR International, Cat. Nr. OXOIVG0100B). Viable yeast counts were determined by plating on Yeast Extract Peptone Glucose (YPG) agar supplemented with 0.2 g/L chloramphenicol. Plates were incubated aerobically at 30 °C and 80% relative humidity for 48 h before calculating the number of colony-forming units (CFU) per gram. For chemical analysis, 10 g of sourdough was homogenized with 100 g of non-sterile deionized water. The pH was measured immediately, and total titratable acidity (TTA) was quantified by titrating with 0.1 M NaOH to a pH endpoint of 8.5, following the exact procedure as reported in the HealthFerm sourdough citizen science initiative¹⁵.

Substrate removal from sourdough samples and DNA extraction

DNA was extracted using the MagMAX™ Microbiome Ultra Nucleic Acid Isolation Kit (Thermo Fisher Scientific) on the KingFisher Apex platform (Thermo Fisher Scientific), following the MagMAX Liquid Buccal protocol (Thermo Fisher Scientific) with minor modifications. Cryotubes containing sourdough aliquots were thawed on ice, randomized, and transferred to KingFisher 96 deep-well plates. ZymoBIOMICS Microbial Community Standard (37.5 μL diluted in 112.5 μL sterile water; Zymo Research, Cat. No. D6300) and sterile H₂O (150 μL) served as positive and negative extraction controls, respectively. To prevent yeast sedimentation, samples were briefly homogenized at 20 Hz using a TissueLyser III (Qiagen). To remove flour particles while retaining larger yeast cells, plates were centrifuged at 100 × g for 1 min. A 150 μL aliquot of the resulting supernatant was transferred to the MagMAX Bead Beating Plate. Bead beating was performed twice for 5 min at 30 Hz, and DNA was eluted in 50 μL of elution buffer. DNA concentrations were measured using the Qubit™ dsDNA Quantification Assay Kit (Thermo Fisher Scientific, Cat. No. Q32854), following the manufacturer’s instructions. DNA was stored at −20 °C until further processing.

Marker gene amplicon library preparation and sequencing

Amplicon libraries for bacterial (16S rRNA gene) and fungal (ITS) profiling were prepared using the HighALPS ultra-high-throughput protocol with unique dual indices (UDI)⁵⁹. Bacterial communities were amplified targeting the V4 region using UDI-linked primers 515F (5′-GTGYCAGCMGCCGCGGTAA-3′) and 806R (5′-GGACTACNVGGGTWTCTAAT-3′)^60,61. Reactions (25 μL) contained 5 μL of template DNA, 12.5 μL 2× KAPA HiFi HotStart ReadyMix (Roche, Cat. No. 07958935001), and 0.3 μM primers. PCR conditions were: 95 °C for 5 min; 35 cycles of 98 °C for 20 s, 55 °C for 15 s, and 72 °C for 25 s; followed by a final extension at 72 °C for 4 min.

Fungal communities were profiled via nested PCR targeting the ITS1 region using BITS (5′-ACCTGCGGARGGATCA-3′) and B58S3 (5′-GAGATCCRTTGYTRAAAGTT-3′) primers⁶². The first round used non-indexed primers (25 μL reactions with 2 μL DNA template), followed by barcoding PCR using UDI-tagged primers with 1 μL of the first reaction as input. All primers were ordered from Microsynth AG, Switzerland. Cycling conditions were: 95 °C for 5 min; 35 cycles of 98 °C for 20 s, 49 °C for 15 s, and 72 °C for 20 s; with a final extension at 72 °C for 4 min. Barcoding PCR consisted of 10 cycles under the same thermal profile.

Amplicons were purified using Agencourt AMPure XP magnetic beads (0.7× ratio; Beckman Coulter, Cat. No. A63882) on the KingFisher Apex platform. DNA concentrations were quantified in duplicate using the Qubit™ dsDNA High Sensitivity Assay (Thermo Fisher Scientific, Cat. No. Q32854) on a Tecan Spark microplate reader. 16S rRNA and ITS amplicons were pooled separately at equimolar concentrations using a liquid handling platform (Brand GmbH), and quality was assessed via high-sensitivity TapeStation (Agilent Technologies, High Sensitivity D1000 DNA ScreenTape assays, Cat. No. 5067-5584). Pooled libraries were combined and sequenced with 300 bp paired-end reads on the Illumina NextSeq 2000 (600-cycle kit), including a 20% PhiX spike-in, at the Functional Genomics Center Zürich.

Quantitative PCR

Quantitative PCR (qPCR) was used to determine total bacterial abundance (viable and non-viable cells; LAB and non-LAB) across all storage conditions. Each 25 μL reaction contained 1 μL of template DNA, 12.5 μL of 2× KAPA HiFi HotStart ReadyMix (Roche, Cat. No. 07958935001), 1 μL of 20× EvaGreen Dye (Brunschwig, Cat. No. BIO31000), and 0.3 μM of non-barcoded 515F/806R primers. Thermal cycling was performed on a LightCycler 480 (Roche) with the following conditions: 95 °C for 3 min; 40 cycles of 98 °C for 20 s, 55 °C for 15 s, and 72 °C for 15 s; followed by melt curve analysis (95 °C for 30 s, 60 °C for 15 s, ramping at 1 °C/min). Absolute DNA concentrations (ng/μL) were determined using a standard curve generated with the ZymoBIOMICS Microbial Community DNA Standard (Zymo Research, Cat. No. D6305), and applied to cycle threshold (Ct) values from sourdough samples. Bacterial cell counts were estimated assuming an average of five 16S rRNA gene copies per cell and 1.58 × 10⁹ 16S rRNA gene copies per ng of DNA (Table S1).
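The conversion from Ct values to estimated cell counts can be illustrated with a minimal sketch. It assumes a log-linear standard curve fitted by least squares; the dilution series, the idealized Ct values, and the function names are hypothetical and only serve to show the arithmetic, while the two conversion constants are the ones stated above. The result is expressed per μL of DNA extract, not per gram of sourdough.

```python
import numpy as np

COPIES_PER_NG = 1.58e9   # 16S rRNA gene copies per ng DNA (value stated in the text)
COPIES_PER_CELL = 5      # average 16S rRNA gene copies per cell (assumption stated in the text)

def fit_standard_curve(log10_conc, ct):
    """Least-squares fit of Ct = slope * log10(concentration) + intercept."""
    slope, intercept = np.polyfit(log10_conc, ct, 1)
    return slope, intercept

def ct_to_cells_per_ul(ct, slope, intercept):
    """Invert the standard curve, then convert ng/uL -> gene copies/uL -> cells/uL."""
    conc_ng_per_ul = 10 ** ((np.asarray(ct, dtype=float) - intercept) / slope)
    return conc_ng_per_ul * COPIES_PER_NG / COPIES_PER_CELL

# Hypothetical ten-fold dilution series of the DNA standard (ng/uL) with idealized Ct values
std_conc = np.array([1e0, 1e-1, 1e-2, 1e-3, 1e-4])
std_ct = 15.0 - 3.32 * np.log10(std_conc)

slope, intercept = fit_standard_curve(np.log10(std_conc), std_ct)
efficiency = 10 ** (-1.0 / slope) - 1.0   # amplification efficiency implied by the slope
cells = ct_to_cells_per_ul([18.0, 22.5], slope, intercept)
```

A slope near −3.32 corresponds to an amplification efficiency close to 100%, which is a common sanity check before applying the curve to samples.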

HPLC sample preparation and measurement

Pre-diluted sourdough samples stored at −80 °C were thawed on ice for 30–45 min. A 200 mg aliquot was diluted 1:10 in LC-MS grade water (Merck Supelco), homogenized in a Thermomix at 1500 rpm and 10 °C for 10 min, and centrifuged at 14,000 × g and 4 °C for 15 min. The resulting supernatant was filtered through a 0.45 μm polyvinylidene difluoride (PVDF) membrane (BGB Analytik AG) to remove particulates. For organic acid and ethanol analysis, 200 μL of filtered supernatant was transferred into glass HPLC vials with inserts. For sugar analysis, 100 μL of supernatant was further diluted 1:10 with Milli-Q water (MilliporeSigma) in plastic vials. Samples were stored at 4 °C for up to 48 h or at −20 °C for longer-term storage.

Organic acid and ethanol detection

Organic acids (succinic acid, lactic acid, acetic acid) and ethanol were quantified using an Agilent 1200 Series HPLC system equipped with an Aminex HPX-87H column (Bio-Rad). The column was maintained at 40 °C with a sample tray temperature of 10 °C and an injection volume of 10 μL. Separation was achieved under isocratic conditions using 5 mM sulfuric acid (H₂SO₄; Merck Titrisol) at a flow rate of 0.4 mL/min over 40 min. Ethanol was detected via refractive index (RI) detection at 40 °C, while organic acids were quantified using a diode array detector at 210 nm.

Carbohydrate detection

Carbohydrates (mannitol, sucrose, glucose, fructose, maltose) were analyzed using a Dionex ICS-5000+ system (Thermo Scientific) equipped with a Dionex CarboPac PA200 IC column maintained at 25 °C. Samples were held at 10 °C and injected at 10 μL. Separation was performed at a flow rate of 0.25 mL/min using a three-solvent system: (1) 225 mM sodium hydroxide (NaOH), (2) 1250 mM sodium acetate with 100 mM NaOH, and (3) Milli-Q water. A defined gradient program (Table S2) was applied, and carbohydrates were detected via pulsed amperometric detection. Total runtime per sample was 45 min.

FIA-MS untargeted metabolomics

All steps were performed on ice using ice-cold solvents unless otherwise specified. For metabolite fingerprinting via flow injection analysis-mass spectrometry (FIA-MS), 100 μL of water extract (from the HPLC organic acid analysis; see above) was diluted with 900 μL LC-MS grade methanol (Fisher Chemical), vortexed briefly, and centrifuged at 4500 × g for 30 min at 5 °C. The supernatant was further diluted 1:10 in 50% methanol, mixed, and 150 μL was transferred to a 96-well plate, heat-sealed with foil (Thermo Fisher Scientific), and analyzed within 24 h. FIA-MS was conducted using an Acquity I-Class UPLC system coupled to a Xevo G2-XS qToF mass spectrometer (Waters), following a modified protocol from Fuhrer et al.⁶³. Two microliters of extract were injected in 60% methanol containing 0.05% ammonium hydroxide (NH₄OH, Honeywell) and 2 mM ammonium fluoride (NH₄F; Sigma-Aldrich) as carrier solvent, at 0.2 mL/min for 1 min. Compounds were ionized using electrospray ionization in negative sensitivity mode with the following source parameters: capillary voltage 2.2 kV, cone voltage 40 V, source offset 80 V, source temperature 120 °C, desolvation gas at 250 °C and 850 L/h, and cone gas at 150 L/h. Mass spectra were acquired in extended dynamic range from m/z 50–1200 at a scan rate of 0.7 s. For online mass correction, 200 ng/μL Leucine-Enkephalin ([M–H]⁻ 556.2771; Waters) in 50:50:0.1 acetonitrile:water:formic acid was injected every 19 s (scan time 0.3 s, capillary voltage 2.2 kV).

Raw data were converted to mzML format using MSConvert v3.0 (ProteoWizard)⁶⁴, and processed in MZMine 4.1.0⁶⁵. Mass detection used centroid mode with a noise level of 1000. Chromatogram building used the ADAP algorithm (min scans: 4, min group intensity: 2000, min absolute height: 2000, m/z tolerance: 0.005 or 10 ppm). Chromatograms were smoothed with a Savitzky–Golay filter (RT window: 5). Carbon-13 deisotoping applied intra-sample tolerances of 0.002 m/z or 3.5 ppm and 1 min RT. Feature alignment used the Join Aligner (m/z tolerance: 0.005 or 15 ppm; m/z weight: 80; RT tolerance: 1 min; RT and mobility weights: 1). Gap filling used an intensity tolerance of 0.2, m/z tolerance of 0.005 or 15 ppm, and a minimum of 4 scans. Blank subtraction removed features with intensity <5× that of corresponding blanks. Final exported peak heights were used for statistical analysis.

Ions were annotated by exact mass (m/z) matching (±0.002 Da) against the KEGG metabolite⁶⁶ and HMDB databases⁶⁷, and subsequently classified using the HMDB metabolite taxonomy. Annotated features were then grouped into compound classes or subclasses relevant to sourdough fermentation (e.g., amino acids, organic acids, carbohydrates) for visualization. These groupings were used to calculate and plot the relative proportions of annotated metabolites within each temporal response cluster and temperature condition (Fig. 4).
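Exact-mass matching within a fixed tolerance can be sketched as follows. This is an illustration only: it assumes that ions are deprotonated species ([M–H]⁻), which is plausible for negative-mode acquisition but is not stated in the text, and it uses a three-compound toy reference table in place of the KEGG and HMDB databases. The function name `annotate` is hypothetical.

```python
PROTON_MASS = 1.007276  # Da, subtracted to obtain the [M-H]- m/z (adduct type is an assumption)

def annotate(mz, reference, tol=0.002):
    """Return reference compounds whose [M-H]- m/z lies within +/- tol Da of the observed m/z."""
    return [name for name, mono_mass in reference.items()
            if abs((mono_mass - PROTON_MASS) - mz) <= tol]

# Toy reference table of monoisotopic masses (Da); a real run would query KEGG/HMDB
reference = {
    "lactic acid": 90.03169,
    "succinic acid": 118.02661,
    "glucose": 180.06339,
}

hits_lactate = annotate(89.0244, reference)
hits_hexose = annotate(179.0561, reference)
hits_none = annotate(100.0, reference)
```

Because isomers such as glucose and fructose share the same exact mass, a match of this kind identifies a molecular formula rather than a unique compound.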

Bioinformatic marker gene amplicon processing

Fungal ITS amplicons

Raw paired-end internal transcribed spacer (ITS) sequences were processed using QIIME 2 version 2024.10⁶⁸. Primers and adapters were trimmed with the cutadapt trim-paired plugin⁶⁹. Denoising was performed via dada2 denoise-single plugin⁷⁰ with no truncation (--p-trunc-len 0), a maximum expected error of 4.0, and a minimum parent abundance fold-difference of 4.0, resulting in 96–98% non-chimeric reads. Reads shorter than 50 nt were filtered out, retaining 2,453,636 sequences across 33 samples per temperature group (4 °C, 17 °C, 30 °C), one day 0 mother control, and excluding one contaminated replicate at day 7. Taxonomy was assigned with the classify-sklearn action of the q2-feature-classifier plugin⁷¹ against the customized UNITE v10.99 reference database⁷² curated via RESCRIPt⁷³; initial classification followed a confidence threshold of 0 to retain low-confidence fungal reads (to enable filtering of off-target hits as described below), followed by classification with the default confidence threshold of 0.7 for high-confidence classification of on-target hits⁷¹. Species labels reported in the manuscript were assigned based on the default 0.7 confidence threshold. Reads assigned to non-fungal sequences, low-abundance features (<10 reads/sample), fruiting body–associated taxa, unclassified phyla, and extremely rare class-level taxa were removed, accounting for 1.4% of total reads and resulting in 2,419,268 reads. Operational taxonomic units (OTUs) were generated by mapping reads at 90% similarity using the vsearch cluster-features-closed-reference plugin⁷⁴, against the same UNITE v10.99 database, excluding singletons. Unmatched and poorly classified reads were discarded, yielding a final OTU table of 2,411,351 reads—a 0.33% reduction from the ASV set. Both ASV and OTU tables were rarefied to 3500 reads per sample, resulting in a final dataset of 95 samples (Fig. S14a, b).

Bacterial 16S rRNAv4 amplicons

Paired-end 16S rRNAv4 gene sequences were processed using the dada2 denoise-paired plugin in QIIME 2. Reads were truncated at 150 bp in both directions to remove low-quality regions, primers, and adapters. Denoising used dada2 (via the q2-dada2 plugin) with a maximum expected error threshold of 2.0 and a minimum fold-parent-over-abundance threshold of 4.0, yielding 84–86% non-chimeric reads and a total of 4,019,885 denoised, merged sequences with a mean length of 252.98 ± 3.48 bp (consistent with an expected length of ~250 bp for the EMP protocol). One sample (day 28 at 4 °C) failed, resulting in 32 samples for 4 °C, 33 for 17 °C, 33 for 30 °C, and 1 mother sourdough control. All retained reads were ≥50 bp, so no additional length filtering was required. Taxonomic classification was performed using the classify-sklearn plugin trained on a custom SILVA 138.2 SSU NR99 reference database⁷⁵, restricted to EMP 515F/806R amplicon regions and prepared using RESCRIPt, using a staged classification approach as described above. Initial indicative classifications for quality control used a confidence threshold of 0 to maximize bacterial read retention for quality filtering purposes, filtering out all non-bacterial sequences (e.g., mitochondria, chloroplasts, archaea, eukaryotes), which accounted for 13.46% of reads. An additional 492 ASVs were removed via contaminant filtering⁷⁵ (decontam-combined action of the q2-quality-control plugin) and removal of rare features (<10 reads in any sample), resulting in 3,478,383 high-quality reads. Reads were additionally clustered into OTUs at 99% identity using the vsearch cluster-features-closed-reference plugin against SILVA 138.2, yielding a final OTU table with 3,478,356 reads (only 27 fewer than the ASV table due to unmatched sequences). Both ASV and OTU tables were subsequently rarefied to 380 reads/sample, resulting in a final dataset of 93 samples (Fig. S14c, d).
Final taxonomic classification of the filtered ASVs/OTUs was performed using a pre-trained SILVA 138.2 SSU NR99 reference database⁷⁶ “average weighted” naive Bayes classifier, which uses ecological frequency information as class weights to improve taxonomic classification to most probable species⁷⁷ (Fig. S16a–c). Among the top 8 most abundant ASVs, which collectively encompass 100% of the total relative frequency across all samples, taxonomic classifications were manually checked to confirm species labels via local alignment with blastn⁷⁸ against the NCBI RefSeq 16S database⁷⁹ (Fig. S16d, e; Tables S3, S4); this approach was used to confirm identification of Pediococcus parvulus and Acetobacter cerevisiae|malorum (the latter share 99.9% full-length 16S rRNA gene similarity and hence the exact species cannot be resolved with the 16S v4 fragment). Nevertheless, given the limited resolution of the 16S rRNA gene for bacterial classification, and the dynamic nature of bacterial taxonomy, all species-level classifications included here should be interpreted as indicative rather than definitive.

Statistical analysis

Microbial diversity analyses

Diversity analyses, including calculations of alpha-diversity (richness, evenness and Shannon entropy) and beta-diversity (Jaccard distance and Bray–Curtis dissimilarity), were made using the q2-diversity and q2-kmerizer⁸⁰ plugins in QIIME 2. Beta diversity estimates were calculated based on ASV composition as well as the constituent k-mers in these ASVs, and OTU features.

FIA-MS features preprocessing and clustering

FIA-MS features were filtered with a prevalence threshold of 3 (triplicates) and missing values were imputed as zeros. Feature tables were processed using z-score normalization, log transformation, combined log-z transformation, or total sum scaling (TSS). To capture dynamic metabolite responses to storage, one-way ANOVA (α = 0.05) was performed across time points within each temperature group on z-score normalized data. Metabolites showing significant temporal variation in at least one temperature condition were retained, and non-responsive features were excluded prior to clustering. Filtered, z-score normalized metabolite features were clustered using k-means in scikit-learn^81,82. Optimal cluster number was determined via silhouette score, elbow method (WCSS), and Gap statistic. Final clusters were defined with the best-supported k (k = 3) across all evaluation metrics. For each cluster, temporal dynamics were visualized by plotting z-score trajectories per temperature. Kruskal–Wallis tests were performed per cluster and day to assess temperature-driven differences, with Benjamini-Hochberg correction applied to control for false discovery rate.
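The cluster-number selection step can be sketched with scikit-learn. The example below covers only the silhouette criterion (not the elbow method or Gap statistic) and runs on synthetic trajectories with three planted temporal shapes; the data, noise level, seeds, and function names are hypothetical and do not reproduce the study's feature table.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

def zscore_rows(X):
    """Z-score each feature (row) across its time points (columns)."""
    X = np.asarray(X, dtype=float)
    return (X - X.mean(axis=1, keepdims=True)) / X.std(axis=1, keepdims=True)

def best_k_by_silhouette(X, k_values=(2, 3, 4, 5, 6), seed=0):
    """Fit k-means for each candidate k and return the k with the highest silhouette score."""
    scores = {}
    for k in k_values:
        labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(X)
        scores[k] = silhouette_score(X, labels)
    return max(scores, key=scores.get), scores

# Synthetic data: 20 noisy features for each of three temporal shapes over 11 time points
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 11)
shapes = [t, -t, -(t - 0.5) ** 2]   # rising, falling, transient peak
X = np.vstack([zscore_rows(s[None, :]) + rng.normal(0, 0.2, (20, 11)) for s in shapes])
X = zscore_rows(X)

best_k, scores = best_k_by_silhouette(X)
```

In practice, agreement between several criteria, as described above, gives more confidence in the chosen k than any single score.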

Two-dimensional feature space statistics

Microbial beta-diversity feature spaces were visualized using principal coordinate analysis (PCoA), and FIA-MS features using principal component analysis (PCA), to reduce dimensionality. To aid interpretation of sample separation in beta-diversity PCoAs and FIA-MS PCAs, top correlated features and metadata variables were overlaid in feature space. Pearson correlations between each variable and the first two PCoA or PCA axes were used to determine vector direction and length. Top contributors were visualized as scaled vectors, highlighting features and metadata most associated with ordination structure. To evaluate the contributions of experimental factors and their nested or interacting effects on community and metabolomic dissimilarity structures, we applied the QIIME 2 diversity adonis plugin for global PERMANOVA modeling (Fig. S7a, d, g) and a custom two-level nested PERMANOVA implementation in scikit-bio (Fig. S7b, c, e, f, h, i)^83,84,85. For the adapted nesting, global variance was first assessed for primary factors (e.g., temperature or day), followed by separate within-group PERMANOVA tests on secondary factors (e.g., day nested within each temperature, or temperature nested within each day). p-values were corrected for multiple testing using the Benjamini–Hochberg false discovery rate (FDR) procedure.
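The Benjamini–Hochberg adjustment applied to the within-group p-values can be written in a few lines of NumPy. This is a generic sketch of the standard step-up procedure, not the authors' code, and the example p-values are hypothetical.

```python
import numpy as np

def benjamini_hochberg(pvals):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    p = np.asarray(pvals, dtype=float)
    n = p.size
    order = np.argsort(p)
    # Scale each sorted p-value by n / rank
    scaled = p[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity, working from the largest p-value downwards
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(n)
    adjusted[order] = np.clip(scaled, 0.0, 1.0)
    return adjusted

# Hypothetical p-values from four within-temperature PERMANOVA tests
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```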

Day-wise comparison with post hoc grouping

To assess temperature effects across time, one-way ANOVA was applied at each time point for continuous variables including alpha diversity (amplicon-based and FIA-MS), CFUs, pH, TTA, and HPLC-derived metabolite concentrations (excluding control samples). Post hoc comparisons were performed using Tukey’s Honest Significant Difference (HSD) test to identify temperature-specific differences per day. Prior to conducting ANOVAs, homogeneity of variance was evaluated using Levene’s test across temperatures for each time point and variable (Fig. S15). As no comparison showed significant heterogeneity at the p < 0.05 threshold, ANOVA assumptions were considered met despite small replicate numbers.
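The per-day testing sequence (Levene's test, one-way ANOVA, Tukey's HSD) can be sketched with SciPy. The example assumes SciPy 1.8 or later for `scipy.stats.tukey_hsd`; the pH values and the helper name `daywise_test` are hypothetical and stand in for one variable at one time point.

```python
from scipy import stats

def daywise_test(groups):
    """Run Levene, one-way ANOVA and Tukey HSD on one variable at one time point.

    groups: dict mapping a temperature label to its replicate measurements.
    """
    samples = list(groups.values())
    levene_p = stats.levene(*samples).pvalue       # homogeneity of variance
    anova_p = stats.f_oneway(*samples).pvalue      # overall temperature effect
    tukey_p = stats.tukey_hsd(*samples).pvalue     # matrix of pairwise p-values
    return {"levene_p": levene_p, "anova_p": anova_p, "tukey_p": tukey_p}

# Hypothetical triplicate pH readings at a single day
ph_day = {
    "4C": [3.90, 3.92, 3.88],
    "17C": [3.70, 3.72, 3.68],
    "30C": [3.50, 3.52, 3.48],
}
result = daywise_test(ph_day)
```

With only three replicates per group, Levene's test has limited power, so a non-significant result is weak evidence of equal variances rather than proof.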

Differential abundance analysis

Microbial compositional shifts were tested using ANCOM-BC2⁸⁶ in R v4.5.0. Amplicon data were rescaled to pseudo-counts and analyzed using a phyloseq-based workflow, excluding control (20 °C) samples. The model included day and temperature as fixed effects, and both pairwise and global tests were performed with FDR correction. Structural zeros and sampling fraction variation were accounted for. Taxa with log-fold changes exceeding a set threshold (|logFC| > 0.75 for 16S, |logFC| > 0.1 for ITS) were classified as enriched (numerator) or depleted, i.e., outcompeted (denominator). Sample-wise log-ratios were computed as the log of the geometric mean of enriched taxa over the geometric mean of depleted taxa at given temperatures and time points.

Integrations of metabolomics and amplicon data

To assess the concordance between metabolic and microbial community structures, Bray-Curtis dissimilarity matrices were computed on 16S OTU feature tables and ordinated by PCoA, whereas FIA-MS data (either z-score or TSS normalized) were either processed via Bray-Curtis dissimilarity and PCoA or ordinated directly by PCA. Subsequently, Procrustes analysis was used to align ordination spaces between microbiome and metabolome data. To evaluate statistical significance, permutation tests (n = 1000) were conducted by randomizing sample labels in one dataset and recomputing Procrustes disparities, yielding empirical p-values. Disparities and associated p-values were computed for global comparisons, as well as stratified by day and by temperature, either individually or in combination, to identify temporal or condition-specific shifts in metabolome–microbiome concordance.
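The permutation scheme can be sketched with `scipy.spatial.procrustes`, which requires both ordinations to have the same shape (for example, the first two axes of each). The simulated coordinates, the function name, and the "+1" correction in the empirical p-value are assumptions of this example.

```python
import numpy as np
from scipy.spatial import procrustes


def procrustes_permutation_test(ord_a, ord_b, n_perm=1000, seed=0):
    """Procrustes disparity with an empirical p-value from sample-label shuffling."""
    rng = np.random.default_rng(seed)
    ord_a = np.asarray(ord_a, dtype=float)
    ord_b = np.asarray(ord_b, dtype=float)
    observed = procrustes(ord_a, ord_b)[2]
    # Shuffle the rows (samples) of one ordination and recompute the disparity.
    permuted = np.array([
        procrustes(ord_a, ord_b[rng.permutation(len(ord_b))])[2]
        for _ in range(n_perm)
    ])
    # A lower disparity means a better fit, so count permutations at least as good.
    p_value = (np.sum(permuted <= observed) + 1) / (n_perm + 1)
    return observed, p_value


# Hypothetical ordinations: the "metabolome" is a rotated, rescaled, noisy copy.
rng = np.random.default_rng(42)
microbiome_axes = rng.normal(size=(24, 2))
theta = np.pi / 5
rotation = np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta), np.cos(theta)]])
metabolome_axes = (microbiome_axes @ rotation) * 3.0 + rng.normal(scale=0.1, size=(24, 2))
disparity, p_value = procrustes_permutation_test(microbiome_axes, metabolome_axes)
```

Stratified comparisons follow by subsetting both ordinations to the samples of one day or one temperature before calling the function.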

Multi-omics co-occurrence analysis and network inference

To investigate intra- and inter-kingdom associations in the sourdough microbiome and metabolome across storage temperatures, we applied two complementary network inference approaches based on centered log-ratio (CLR)–transformed data. In the first approach, microbial OTU tables (16S and ITS) and metabolomics data were CLR-transformed (after multiplicative replacement; delta = 1e-8; with TSS normalization for metabolite data), and pairwise associations were calculated using a novel abundance-weighted co-occurrence score (Eq. 1).

$$\mathrm{Co\text{-}occurrence}(x,y)=\frac{\sum_{i=1}^{n} x_i\, y_i\, w_i}{\sum_{i=1}^{n} w_i},\quad \text{where } w_i=\left|x_i\, y_i\right| \tag{1}$$

Equation (1) is inspired by compositionally aware correlation frameworks such as SparCC⁸⁷ and CoNet⁸⁸, and extends traditional co-occurrence metrics by explicitly upweighting high-abundance, co-varying features while suppressing noise from low-abundance artifacts. Unlike SparCC, which infers correlations via iterative pseudo-count-based covariance estimation, or CoNet, which combines multiple similarity metrics with heuristic edge selection, our score integrates abundance information directly into the association weight, offering greater sensitivity to biologically meaningful, high-confidence co-occurrence in complex multi-omics datasets. Associations with |Co-occurrence(x, y)| ≥ 0.2 were retained for downstream network construction. This approach supports robust detection of ecologically relevant interactions across kingdoms and omics layers, while maintaining interpretability and scalability for large datasets.
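A minimal NumPy sketch of Eq. 1 follows, assuming that x and y are the CLR values of two features across n samples and that multiplicative replacement is applied to closed (relative-abundance) rows. The function names and toy counts are hypothetical, and the fallback score of 0 when all weights vanish is an assumption of this example.

```python
import numpy as np


def multiplicative_replacement(table, delta=1e-8):
    """Close each row to 1, replace zeros by delta, rescale the non-zero parts."""
    table = np.asarray(table, dtype=float)
    comp = table / table.sum(axis=1, keepdims=True)
    zeros = comp == 0
    shrink = 1.0 - zeros.sum(axis=1, keepdims=True) * delta
    return np.where(zeros, delta, comp * shrink)


def clr(comp):
    """Centered log-ratio transform, applied row-wise (per sample)."""
    log_comp = np.log(comp)
    return log_comp - log_comp.mean(axis=1, keepdims=True)


def cooccurrence(x, y):
    """Abundance-weighted co-occurrence score of Eq. 1 with w_i = |x_i * y_i|."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    product = x * y
    weights = np.abs(product)
    total = weights.sum()
    return float((product * weights).sum() / total) if total > 0 else 0.0


def cooccurrence_matrix(clr_table):
    """Symmetric matrix of pairwise scores between feature columns."""
    n_features = clr_table.shape[1]
    scores = np.zeros((n_features, n_features))
    for i in range(n_features):
        for j in range(i + 1, n_features):
            scores[i, j] = scores[j, i] = cooccurrence(clr_table[:, i], clr_table[:, j])
    return scores


# Hypothetical count table: 4 samples x 3 features, containing zeros.
counts = np.array([[120, 30, 0], [80, 45, 5], [10, 200, 40], [0, 150, 60]])
replaced = multiplicative_replacement(counts)
clr_table = clr(replaced)
scores = cooccurrence_matrix(clr_table)
edges = np.argwhere(np.triu(np.abs(scores) >= 0.2, k=1))  # retained associations
```

Because the weights equal the magnitude of each per-sample product, samples in which both features deviate strongly from their CLR centre dominate the score. The score is not bounded to [-1, 1].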

In parallel, we applied a sparse inverse covariance estimation approach to capture conditional dependencies and partial correlations among features. CLR-transformed feature matrices were z-score standardized and concatenated, followed by model fitting using the Graphical Lasso with cross-validation in scikit-learn⁸²,⁸⁷. Non-zero entries in the resulting precision matrix defined network edges, representing regularized partial correlations. Feature identity (bacteria, fungi, metabolite) was used to classify intra- and inter-domain interactions. The resulting co-occurrence matrices were visualized as undirected networks in NetworkX (v2.8.8)⁸⁹, with node attributes encoding biological category (bacteria, fungi, metabolite), shape, and average abundance (used for node size scaling). Network layout was performed using Graphviz’s “neato” force-directed algorithm⁹⁰. Edge thickness and spring length were scaled by co-occurrence strength, and edge color represented co-occurrence polarity. Four topological metrics (degree, betweenness, closeness, and eigenvector centrality) were calculated to identify key taxa and metabolites in each network. Networks and centrality metrics were generated independently for each storage temperature.
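The graphical-lasso route can be sketched with scikit-learn's `GraphicalLassoCV` and NetworkX. The simulated features, node names, and function names below are hypothetical, default cross-validation settings are assumed, and eigenvector centrality is left out of the sketch because its convergence depends on the connectivity of the resulting graph.

```python
import networkx as nx
import numpy as np
from sklearn.covariance import GraphicalLassoCV
from sklearn.preprocessing import StandardScaler


def glasso_network(features, names, kinds):
    """Fit a cross-validated graphical lasso and return the edge graph."""
    z = StandardScaler().fit_transform(features)
    precision = GraphicalLassoCV().fit(z).precision_
    # Partial correlation: -p_ij / sqrt(p_ii * p_jj)
    scale = np.sqrt(np.diag(precision))
    partial_corr = -precision / np.outer(scale, scale)
    graph = nx.Graph()
    for name, kind in zip(names, kinds):
        graph.add_node(name, kind=kind)
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if precision[i, j] != 0:  # non-zero precision entry defines an edge
                graph.add_edge(names[i], names[j],
                               weight=float(partial_corr[i, j]),
                               inter_domain=kinds[i] != kinds[j])
    return graph


def centralities(graph):
    """Degree, betweenness and closeness centrality for every node."""
    return {
        "degree": nx.degree_centrality(graph),
        "betweenness": nx.betweenness_centrality(graph),
        "closeness": nx.closeness_centrality(graph),
    }


# Hypothetical CLR-like data: one bacterium and one metabolite co-vary strongly.
rng = np.random.default_rng(0)
base = rng.normal(size=(80, 1))
features = np.hstack([base, base + 0.3 * rng.normal(size=(80, 1)),
                      rng.normal(size=(80, 4))])
names = ["bact_A", "metab_X", "bact_B", "fungus_A", "fungus_B", "metab_Y"]
kinds = ["bacteria", "metabolite", "bacteria", "fungi", "fungi", "metabolite"]
graph = glasso_network(features, names, kinds)
metrics = centralities(graph)
```

Running the function once per storage temperature, on the samples of that temperature only, yields one network and one set of centrality values per condition.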

Supervised classification of storage conditions using multi-omics features

To predict sample storage temperature (4 °C, 17 °C, 30 °C), a supervised classifier was trained on combined microbial (16S OTU) and metabolomic (FIA-MS) feature matrices, together with HPLC, CFU, pH, TTA, and extracted-DNA concentration data. A nested pipeline was implemented using a random forest classifier⁹¹ (100 estimators) preceded by feature selection, which retained features with importance above the median. Model evaluation was performed using stratified 5-fold cross-validation to preserve class balance per fold. This approach allowed robust performance estimation while accounting for both model fitting and feature selection within each training fold.
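One way to express such a nested pipeline in scikit-learn is shown below. It is a sketch under stated assumptions: the synthetic three-class data stand in for the combined multi-omics matrix, and the use of a second random forest as the importance estimator inside `SelectFromModel`, the shuffling of folds, and the random seeds are choices made for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline


def build_pipeline(seed=0):
    """Median-importance feature selection followed by a random forest."""
    selector = SelectFromModel(
        RandomForestClassifier(n_estimators=100, random_state=seed),
        threshold="median",  # keep features at or above the median importance
    )
    classifier = RandomForestClassifier(n_estimators=100, random_state=seed)
    return Pipeline([("select", selector), ("classify", classifier)])


# Synthetic stand-in for the combined feature matrix; the three classes
# play the role of the three storage temperatures.
X, y = make_classification(
    n_samples=150, n_features=40, n_informative=8, n_redundant=0,
    n_classes=3, class_sep=2.0, random_state=0,
)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
# The whole pipeline is refit on each training fold, so feature selection
# never sees the held-out samples.
scores = cross_val_score(build_pipeline(), X, y, cv=cv, scoring="accuracy")
```

Placing the selector inside the `Pipeline` is what keeps the selection step within each training fold. Selecting features on the full dataset before cross-validation would leak information into the held-out folds.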

All statistical and computational analyses were performed in Python (v3.10.14), unless stated otherwise. Data preprocessing used pandas⁹² and numpy⁹³. Dimensionality reduction and the building and evaluation of random forest models were performed with scikit-learn⁸². Statistical modeling (e.g., ANOVA) used statsmodels⁹⁴. Visualizations were generated with Seaborn⁹⁵, Matplotlib⁹⁶ and NetworkX⁸⁹.

Data availability

All sequence data have been deposited in EBI-ENA under accession numbers PRJEB94514 (16S) and PRJEB94515 (ITS). Source data (metadata), along with processed HPLC and FIA-MS data, have been deposited together with all code notebooks on GitHub (see Code availability).

Code availability

All code notebooks for bioinformatic processing, statistical analyses, and machine-learning models have been deposited and are openly accessible on GitHub at https://github.com/bokulich-publications/shipped-and-shifted.

References

Banerjee, S. et al. Agricultural intensification reduces microbial network complexity and the abundance of keystone taxa in roots. ISME J. 13, 1722–1736 (2019).
Berg, G. et al. Microbiome definition re-visited: old concepts and new challenges. Microbiome 8, 103 (2020).
Gilbert, J. A. et al. Current understanding of the human microbiome. Nat. Med. 24, 392–400 (2018).
Burman, E. & Bengtsson-Palme, J. Microbial community interactions are sensitive to small changes in temperature. Front. Microbiol. 12, 672910 (2021).
Bokulich, N. A., Maldonado, J., Kang, D.-W., Krajmalnik-Brown, R. & Caporaso, J. G. Rapidly processed stool swabs approximate stool microbiota profiles. mSphere 4, e00208–e00219 (2019).
McDonald, D. et al. American gut: an open platform for citizen science microbiome research. mSystems 3, e00031–e00031 (2018).
Scofield, V., Jacques, S. M. S., Guimarães, J. R. D. & Farjalla, V. F. Potential changes in bacterial metabolism associated with increased water temperature and nutrient inputs in tropical humic lagoons. Front. Microbiol. 6, 310 (2015).
Silva, I. et al. Short-term responses of soil microbial communities to changes in air temperature, soil moisture and UV radiation. Genes 13, 850 (2022).
Bassis, C. M. et al. Comparison of stool versus rectal swab samples and storage conditions on bacterial community profiles. BMC Microbiol. 17, 78 (2017).
Tedjo, D. I. et al. The effect of sampling and storage on the fecal microbiota composition in healthy and diseased subjects. PLOS ONE 10, e0126685 (2015).
Momo Cabrera, P., Bokulich, N. A. & Zimmermann, P. Evaluating stool microbiome integrity after domestic freezer storage using whole-metagenome sequencing, genome assembly, and antimicrobial resistance gene analysis. Microbiol. Spectr. 13, e02278–24 (2025).
Song, S. J. et al. Preservation methods differ in fecal microbiome stability, affecting suitability for field studies. mSystems 1, e00021–16 (2016).
Teo, Y. et al. Evaluating long-term stool preservation methods for maximizing the recovery of viable human fecal microbiota. Gut Microbes Rep. 2, 2594958 (2025).
Landis, E. A. et al. The diversity and function of sourdough starter microbiomes. eLife 10, e61644 (2021).
Meyer, A. et al. Rising together: exploring sourdough fermentation diversity through Co-design in the HealthFerm Citizen Science Initiative. 2025.05.23.655785 Preprint at https://doi.org/10.1101/2025.05.23.655785 (2025).
Amir, A. et al. Correcting for microbial blooms in fecal samples during room-temperature shipping. mSystems 2, e00199–16 (2017).
Marco, M. L. et al. Health benefits of fermented foods: microbiota and beyond. Curr. Opin. Biotechnol. 44, 94–102 (2017).
Sawant, S. S., Park, H.-Y., Sim, E.-Y., Kim, H.-S. & Choi, H.-S. Microbial fermentation in food: impact on functional properties and nutritional enhancement—a review of recent developments. Fermentation 11, 15 (2025).
Wei, Q., Wang, X., Sun, D.-W. & Pu, H. Rapid detection and control of psychrotrophic microorganisms in cold storage foods: a review. Trends Food Sci. Technol. 86, 453–464 (2019).
Louw, N. L., Lele, K., Ye, R., Edwards, C. B. & Wolfe, B. E. Microbiome assembly in fermented foods. Annu. Rev. Microbiol. 77, 381–402 (2023).
Valentino, V. et al. Fermented foods, their microbiome and its potential in boosting human health. Microb. Biotechnol. 17, e14428 (2024).
Cabello-Olmo, M. et al. Influence of storage temperature and packaging on bacteria and yeast viability in a plant-based fermented food. Foods 9, 302 (2020).
Kim, E., Yang, S.-M. & Kim, H.-Y. Analysis of cultivable microbial community during kimchi fermentation using MALDI-TOF MS. Foods 10, 1068 (2021).
Kim, J. Y. et al. Long-term population dynamics of viable microbes in a closed ecosystem of fermented vegetables. Food Res. Int. 154, 111044 (2022).
De Filippis, F., Genovese, A., Ferranti, P., Gilbert, J. A. & Ercolini, D. Metatranscriptomics reveals temperature-driven functional changes in microbiome impacting cheese maturation rate. Sci. Rep. 6, 21871 (2016).
Martins, I. E. et al. Effect of packaging materials and storage conditions on the microbial quality of pearl millet sourdough bread. J. Food Sci. Technol. 58, 52–61 (2021).
Minervini, F., De Angelis, M., Di Cagno, R. & Gobbetti, M. Ecological parameters influencing microbial diversity and stability of traditional sourdough. Int. J. Food Microbiol. 171, 136–146 (2014).
Sanmartin C, A. G. The kinetics of fermentations in sourdough bread stored at different temperature and influence on bread quality. J. Bioprocess. Biotech. 3, 134–138 (2013).
Van Kerrebroeck, S., Maes, D. & De Vuyst, L. Sourdoughs as a function of their species diversity and process conditions, a meta-analysis. Trends Food Sci. Technol. 68, 152–159 (2017).
De Vuyst, L., Van Kerrebroeck, S. & Leroy, F. Microbial Ecology and Process Technology of Sourdough Fermentation. in Advances in Applied Microbiology (eds Sariaslani, S. & Gadd, G. M.) 100, 49–160 (Academic Press, 2017).
Ercolini, D. et al. Microbial ecology dynamics during rye and wheat sourdough preparation. Appl. Environ. Microbiol. 79, 7827–7836 (2013).
Reese, A. T., Madden, A. A., Joossens, M., Lacaze, G. & Dunn, R. R. Influences of Ingredients and Bakers on the Bacteria and Fungi in Sourdough Starters and Bread. mSphere 5, e00950–19 (2020).
Ripari, V., Gänzle, M. G. & Berardi, E. Evolution of sourdough microbiota in spontaneous sourdoughs started with different plant materials. Int. J. Food Microbiol. 232, 35–42 (2016).
Lim, J.-Y. et al. Microbial dynamics and metabolite profiles in different types of salted seafood (Jeotgal) during fermentation. ACS Omega 9, 35798–35808 (2024).
Tan, G. et al. Microbial community and metabolite dynamics during soy sauce koji making. Front. Microbiol. 13, 841529 (2022).
Cleenwerck, I., Vandemeulebroecke, K., Janssens, D. & Swings, J. Re-examination of the genus Acetobacter, with descriptions of Acetobacter cerevisiae sp. nov. and Acetobacter malorum sp. nov. Int. J. Syst. Evol. Microbiol. 52, 1551–1558 (2002).
Corsetti, A. & Settanni, L. Lactobacilli in sourdough fermentation. Food Res. Int. 40, 539–558 (2007).
Kilstrup, M., Hammer, K., Ruhdal Jensen, P. & Martinussen, J. Nucleotide metabolism and its control in lactic acid bacteria. FEMS Microbiol. Rev. 29, 555–590 (2005).
Papadimitriou, K. et al. Stress physiology of lactic acid bacteria. Microbiol. Mol. Biol. Rev. MMBR 80, 837 (2016).
Baig, M. A. et al. Potential probiotic pediococcus pentosaceus M41 modulates its proteome differentially for tolerances against heat, cold, acid, and bile stresses. Front. Microbiol. 12, 731410 (2021).
Gomes, R. J. et al. Acetic acid bacteria in the food industry: systematics, characteristics and applications. Food Technol. Biotechnol. 56, 139–151 (2018).
Han, N. R. et al. Evolution-aided improvement of the acid tolerance of Levilactobacillus brevis and its application in sourdough fermentation. Food Res. Int. 190, 114584 (2024).
Wang, B., Shao, Y., Chen, T., Chen, W. & Chen, F. Global insights into acetic acid resistance mechanisms and genetic stability of Acetobacter pasteurianus strains by comparative genomics. Sci. Rep. 5, 18330 (2015).
Marsland, R. et al. Available energy fluxes drive a transition in the diversity, stability, and functional structure of microbial communities. PLOS Comput. Biol. 15, e1006793 (2019).
Vogel, R. F. et al. Genomic analysis reveals Lactobacillus sanfranciscensis as stable element in traditional sourdoughs. Microb. Cell Factories 10, S6 (2011).
Johnson, J. S. et al. Evaluation of 16S rRNA gene sequencing for species and strain-level microbiome analysis. Nat. Commun. 10, 5029 (2019).
López, P. C., Peng, C., Arneborg, N., Junicke, H. & Gernaey, K. V. Analysis of the response of the cell membrane of Saccharomyces cerevisiae during the detoxification of common lignocellulosic inhibitors. Sci. Rep. 11, 6853 (2021).
Blasche, S. et al. Metabolic cooperation and spatiotemporal niche partitioning in a kefir microbial community. Nat. Microbiol. 6, 196–208 (2021).
Gänzle, M. & Follador, R. Metabolism of oligosaccharides and starch in lactobacilli: a review. Front. Microbiol. 3, 340 (2012).
Louca, S. et al. Function and functional redundancy in microbial systems. Nat. Ecol. Evol. 2, 936–943 (2018).
Shade, A. et al. Fundamentals of microbial community resistance and resilience. Front. Microbiol. 3, 417 (2012).
Stegen, J. C., Lin, X., Konopka, A. E. & Fredrickson, J. K. Stochastic and deterministic assembly processes in subsurface microbial communities. ISME J. 6, 1653–1664 (2012).
Rappaport, H. B., Senewiratne, N. P. J., Lucas, S. K., Wolfe, B. E. & Oliverio, A. M. Genomics and synthetic community experiments uncover the key metabolic roles of acetic acid bacteria in sourdough starter microbiomes. mSystems 0, e00537–24 (2024).
Akiyama, S. et al. Multi-biome analysis identifies distinct gut microbial signatures and their crosstalk in ulcerative colitis and Crohn’s disease. Nat. Commun. 15, 10291 (2024).
McLaren, M. R., Willis, A. D. & Callahan, B. J. Consistent and correctable bias in metagenomic sequencing experiments. eLife 8, e46923 (2019).
Sinha, R. et al. Assessment of variation in microbial community amplicon sequencing by the Microbiome Quality Control (MBQC) project consortium. Nat. Biotechnol. 35, 1077–1086 (2017).
Ndraha, N., Hsiao, H.-I., Vlajic, J., Yang, M.-F. & Lin, H.-T. V. Time-temperature abuse in the food cold chain: review of issues, challenges, and recommendations. Food Control 89, 12–21 (2018).
Harth, H., Van Kerrebroeck, S. & De Vuyst, L. Community dynamics and metabolite target analysis of spontaneous, backslopped barley sourdough fermentations under laboratory and bakery conditions. Int. J. Food Microbiol. 228, 22–32 (2016).
Flörl, L., Cabrera, P. M., Moccia, M. D., Plüss, S. & Bokulich, N. A. HighALPS: ultra-high-throughput marker-gene amplicon library preparation and sequencing on the illumina NextSeq and NovaSeq Platforms. Preprint at https://doi.org/10.1101/2024.10.10.617643 (2024).
Apprill, A., McNally, S. P., Parsons, R. & Weber, L. Minor revision to V 4 region SSU rRNA 806 R gene primer greatly increases detection of SAR 11 bacterioplankton. Aquat. Microb. Ecol. 75, 129–137 (2015).
Parada, A. E., Needham, D. M. & Fuhrman, J. A. Every base matters: assessing small subunit rRNA primers for marine microbiomes with mock communities, time series and global field samples. Environ. Microbiol. 18, 1403–1414 (2016).
Bokulich, N. A. & Mills, D. A. Improved selection of internal transcribed spacer-specific primers enables quantitative, ultra-high-throughput profiling of fungal communities. Appl. Environ. Microbiol. 79, 2519–2526 (2013).
Fuhrer, T., Heer, D., Begemann, B. & Zamboni, N. High-throughput, accurate mass metabolome profiling of cellular extracts by flow injection–time-of-flight mass spectrometry. Anal. Chem. 83, 7074–7080 (2011).
Chambers, M. C. et al. A cross-platform toolkit for mass spectrometry and proteomics. Nat. Biotechnol. 30, 918–920 (2012).
Schmid, R. et al. Integrative analysis of multimodal mass spectrometry data in MZmine 3. Nat. Biotechnol. 41, 447–449 (2023).
Kanehisa, M., Sato, Y., Kawashima, M., Furumichi, M. & Tanabe, M. KEGG as a reference resource for gene and protein annotation. Nucleic Acids Res. 44, D457–D462 (2016).
Wishart, D. S. et al. HMDB 4.0: the human metabolome database for 2018. Nucleic Acids Res. 46, D608–D617 (2018).
Bolyen, E. et al. Reproducible, interactive, scalable and extensible microbiome data science using QIIME 2. Nat. Biotechnol. 37, 852–857 (2019).
Martin, M. Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet.journal 17, 10–12 (2011).
Callahan, B. J. et al. DADA2: high-resolution sample inference from Illumina amplicon data. Nat. Methods 13, 581–583 (2016).
Bokulich, N. A. et al. Optimizing taxonomic classification of marker-gene amplicon sequences with QIIME 2’s q2-feature-classifier plugin. Microbiome 6, 90 (2018).
Abarenkov, K. et al. UNITE QIIME release for Fungi 2. UNITE Community https://doi.org/10.15156/BIO/2959337 (2024).
Robeson, M. S. et al. RESCRIPt: reproducible sequence taxonomy reference database management. PLOS Comput. Biol. 17, e1009581 (2021).
Rognes, T., Flouri, T., Nichols, B., Quince, C. & Mahé, F. VSEARCH: a versatile open source tool for metagenomics. PeerJ 4, e2584 (2016).
Davis, N. M., Proctor, D. M., Holmes, S. P., Relman, D. A. & Callahan, B. J. Simple statistical identification and removal of contaminant sequences in marker-gene and metagenomics data. Microbiome 6, 226 (2018).
Chuvochina, M. et al. SILVA in 2026: a global core biodata resource for rRNA within the DSMZ digital diversity. Nucleic Acids Res. 18, gkaf1247 (2025).
Kaehler, B. D. et al. Species abundance information improves sequence taxonomy classification accuracy. Nat. Commun. 10, 4643 (2019).
Zhang, Z., Schwartz, S., Wagner, L. & Miller, W. A greedy algorithm for aligning DNA sequences. J. Comput. Biol. 7, 203–214 (2000).
Tatusova, T. et al. NCBI prokaryotic genome annotation pipeline. Nucleic Acids Res. 44, 6614–6624 (2016).
Bokulich, N. A. Integrating sequence composition information into microbial diversity analyses with k-mer frequency counting. mSystems 10, e01550-24 (2025).
MacQueen, J. Some methods for classification and analysis of multivariate observations. In Proc. Fifth Berkeley Symposium on Mathematical Statistics and Probability, Volume 1: Statistics 5, 281–298 (University of California Press, 1967).
Pedregosa, F. et al. Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011).
Anderson, M. J. A new method for non-parametric multivariate analysis of variance. Austral Ecol. 26, 32–46 (2001).
Oksanen, J. et al. vegan: community ecology package. 2.7–1 https://doi.org/10.32614/CRAN.package.vegan (2001).
Rideout, J. R. et al. biocore/scikit-bio: scikit-bio 0.5.9: Maintenance release. Zenodo https://doi.org/10.5281/zenodo.8209901 (2023).
Lin, H. & Peddada, S. D. Multigroup analysis of compositions of microbiomes with covariate adjustments and repeated measures. Nat. Methods 21, 83–91 (2024).
Friedman, J., Hastie, T. & Tibshirani, R. Sparse inverse covariance estimation with the graphical lasso. Biostatistics 9, 432–441 (2008).
Faust, K. & Raes, J. CoNet app: inference of biological association networks using Cytoscape. [version 2; peer review: 2 approved]. F1000 Res. 5, 1519. https://doi.org/10.12688/f1000research.9050.2 (2016).
Hagberg, A. A., Schult, D. A. & Swart, P. J. Exploring network structure, dynamics, and function using networkX. scipy https://doi.org/10.25080/TCWV9851 (2008).
Gansner, E. R. & North, S. C. An open graph visualization system and its applications to software engineering. Softw. Pract. Exp. 30, 1203–1233 (2000).
Breiman, L. Random Forests. Mach. Learn. 45, 5–32 (2001).
The pandas development team. pandas-dev/pandas: Pandas. https://doi.org/10.5281/zenodo.13819579 (2024).
Harris, C. R. et al. Array programming with NumPy. Nature 585, 357–362 (2020).
Seabold, S. & Perktold, J. Statsmodels: Econometric and Statistical Modeling with Python. in 92–96 (Austin, Texas, 2010). https://doi.org/10.25080/Majora-92bf1922-011.
Waskom, M. L. seaborn: statistical data visualization. J. Open Source Softw. 6, 3021 (2021).
The Matplotlib Development Team. Matplotlib: Visualization with Python. https://doi.org/10.5281/zenodo.11201097 (2024).

Acknowledgements

The authors acknowledge financial support from the project HealthFerm, which is funded by the European Union under the Horizon Europe grant agreement No. 101060247 and by the Swiss State Secretariat for Education, Research and Innovation (SERI) under contract No. 22.00210. Views and opinions expressed are however those of the author(s) only and do not necessarily reflect those of the European Union or the European Research Executive Agency (REA). Neither the European Union nor REA can be held responsible for them. The authors thank Luisa Ferreira for support in data collection and the Genomic Diversity Center of ETH Zürich for their support with amplicon library preparation. The microbiome amplicon sequencing was performed at the Functional Genomics Center Zurich of University of Zurich and ETH Zurich.

Funding

Open access funding provided by Swiss Federal Institute of Technology Zurich.

Author information

Authors and Affiliations

Laboratory of Food Systems Biotechnology, Department of Health Sciences and Technology, ETH Zurich, Zurich, Switzerland
Annina R. Meyer, Mihnea Paul Mihaila, Michelle Neugebauer & Nicholas A. Bokulich
Laboratory of Food Biochemistry, Department of Health Sciences and Technology, ETH Zurich, Zurich, Switzerland
Jan Patrick Tan & Laura Nyström

Authors

Annina R. Meyer
Jan Patrick Tan
Mihnea Paul Mihaila
Michelle Neugebauer
Laura Nyström
Nicholas A. Bokulich

Contributions

Annina R. Meyer, Conceptualization, Formal analysis, Investigation, Visualization, Methodology, Writing - original draft, Writing - review and editing; Jan P. Tan, Formal analysis, Investigation, Methodology, Writing - review and editing; Mihnea P. Mihaila, Michelle Neugebauer, Investigation, Writing - review and editing; Laura Nyström, Resources, Supervision, Funding acquisition, Writing - review and editing; Nicholas A. Bokulich, Conceptualization, Resources, Supervision, Funding acquisition, Writing - original draft, Writing - review and editing.

Corresponding author

Correspondence to Nicholas A. Bokulich.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary information (download PDF )

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Meyer, A.R., Tan, J.P., Mihaila, M.P. et al. Shipped and shifted: modeling collection-induced bias in microbiome multi-omics using a tractable fermentation system. npj Biofilms Microbiomes 12, 43 (2026). https://doi.org/10.1038/s41522-025-00909-1

Received: 17 September 2025
Accepted: 31 December 2025
Published: 09 January 2026
Version of record: 12 February 2026
DOI: https://doi.org/10.1038/s41522-025-00909-1
