Abstract
The human oral microbiome has been associated with multiple inflammatory conditions including inflammatory bowel disease (IBD). Identifying functional changes in oral microbiome by metaproteomics helps understanding the factors driving dysbiosis related to intestinal diseases. However, enriching bacterial cells from oral samples (such as saliva and mouth rinse) rich in host proteins is challenging. Here, we present an Optimized Salivary MetaProteomic sample analysis workflow (OSaMPle) to enrich salivary bacteria and reduce host-derived interferences for in-depth analysis of the oral metaproteome. Compared to a conventional approach, OSaMPle improved the identification of bacterial peptides and proteins by 3.2 folds and 1.7 folds, respectively. Furthermore, applying OSaMPle to analyze mouth rinse samples from IBD patients revealed significant alterations in bacterial protein expressions under disease conditions. Specifically, proteins involved in the fatty acid elongation pathway in Peptostreptococcus were significantly less abundant in IBD patients, whereas proteins associated with the TCA cycle in Neisseria were significantly more abundant. The OSaMPle workflow is capable of processing small-volume oral samples and adaptable to high-throughput automation. It holds promise as a strategy for investigating the functional responses of oral microbiomes under disease conditions and identifying disease-associated microbes with their proteins, providing critical insights for detecting disease-related biomarkers within the oral microbiome.
Similar content being viewed by others
Introduction
The oral cavity harbors the second most diverse microbiota in the human body1. According to the expanded Human Oral Microbiome Database (eHOMD), the human oral microbiome comprises more than 700 species2 with 20 million nonredundant genes3. The oral microbiome serves as a crucial interface between the external environment and the gut along the oral-gut axis, influencing the gut microbiome and contributing to the overall health of the gastrointestinal tract. Oral microbiome dysbiosis has been associated with both local and systematic diseases, such as periodontitis4 and inflammatory bowel disease (IBD)5. IBD refers to a condition characterized by chronic, non-specific inflammation of the gastrointestinal tract. The role of intestinal microbiota alterations in IBD is an area of intense research and clinical interest. Environmental factors can disrupt gut microbiota balance, leading to reduced diversity and alterations in bacterial composition and function. These changes can ultimately lead to immune dysregulation and inflammation in the gut6. While this mechanism is widely recognized, the precise pathways through which environmental factors induce these imbalances remain unclear. Anatomically, exogenous microorganisms introduced through the oral cavity have been shown to play a crucial role in shaping and altering gut microbiota in germ-free mice5, highlighting the complex interactions between the oral and gut microbiomes through the oral-gut axis. Indeed, emerging evidence suggests that oral pathogens may translocate to the intestine and exacerbate intestinal inflammation5,7, highlighting a potential functional role for oral microbes in IBD pathogenesis. The notion that IBD may be influenced by oral bacteria is strongly supported by hypothesis-driven8 research, which demonstrates that as intestinal inflammation subsides in IBD patients, the dysbiotic oral microbiome tends to shift toward a healthier composition9. Additionally, IBD patients exhibit a significantly higher incidence of bacterial-associated oral diseases, particularly periodontitis10.
Our previous review emphasized the need for standardized methodologies in oral microbiome research, underscoring the importance of transitioning from technology-driven to hypothesis-driven approaches to better address research challenges and enhance clinical relevance8. Although studies have indicated a close relationship between IBD and the oral microbiome and highlighted its clinical significance, relying solely on metagenomics and metatranscriptomics sequencing methods9,11,12,13,14,15 is insufficient for a comprehensive understanding of the oral microbiome’s role in IBD pathogenesis, particularly when protein-level insights are required.
Since the presence of genes or transcripts does not necessarily correlate with protein expression levels, metaproteomics—which directly measures the abundance of expressed proteins—can provide a more accurate indication of microbiome functionality16. Although metaproteomics has emerged as a powerful tool for studying the functionality of human microbiome17,18,19, the oral microbiomes of IBD patients have not yet been investigated on the level of expressed metaproteomes. Exploring this could offer valuable insights into the correlation of microbial proteins to the pathophysiology of IBD. Human saliva is an ideal pool for studying the oral microbiome. Accordingly, one of our previous efforts has documented a set of standardized sampling methods for saliva collection20. However, the high abundance of host-derived biomolecules, which constitute approximately 95% of saliva, and the extremely low abundance of microbial components, around 3%, pose significant challenges in saliva metaproteome analysis21. While mass spectrometry-based proteomics approach22 has been foundational for saliva metaproteomics, the analytical approach remains underdeveloped due to the analytical challenges associated with enriching microbial cells from saliva. Only a limited number of studies have focused on analyzing salivary metaproteomes21,23,24,25,26,27,28. A recent study has addressed the importance of saliva sample pretreatment in enriching the microbiome, thereby enhancing the identification of oral metaproteomics26. In their work, the authors developed an free-flow isoelectric focusing (FFIEF) electrophoresis-based microbial enrichment methodology and identified 3647 bacterial proteins in a total of 44 raw files26. However, the experimental procedures were highly complex and did not yield metaproteomic identification results proportional to the effort required. Furthermore, the requirement for large sample volumes and intricate procedures makes these methods difficult to adapt for the high-throughput processing of clinical samples.
A dedicated workflow that involves cost-efficient, high-throughput compatible processing and effective enrichment of microbiota from saliva samples before LC-MS/MS analysis is indispensable for future in-depth, large-scale oral metaproteomic studies in health and disease. Since the oral microbiome samples contain substantial host-derived interferents, such as mucins and oral epithelial cells, it is crucial to first reduce the proportion of these substances and enrich microbial components before further protein analysis and enzymatic digestion. This pre-processing step ensures that, within the limited analysis gradient, MS/MS scans can be more effectively allocated toward detecting low-abundance bacterial peptides, thereby improving the identification of microbial components. However, the presence of mucins in saliva increases its viscosity, while microorganisms adhering to the surface of shed epithelial cells make it difficult to effectively separate microbial components through differential centrifugation. This challenge necessitates the development of more effective methods, that account for the physicochemical properties of the various sample components to selectively enrich microbial cells for downstream analysis. The human oral microbiota mainly consists of six phyla, namely Bacillota, Bacteroidota, Pseudomonadota, Actinomycetota, Spirochaetota, and Fusobacteriota. Among them, Bacillota and Actinomycetota are largely classified as Gram-positive (G+) bacteria29,30, though some exceptions exist, such as Gram-negative staining Bacillota (e.g., Megasphaera and Selenomonas) and Gram-variable Gardnerella within Actinomycetota. In contrast, Bacteroidota, Fusobacteriota, Spirochaetota, and Pseudomonadota are classified as Gram-negative (G-) bacteria31,32,33,34. Both G+ and G- bacteria have distinct cell wall structures that differ from the cell membrane of shed oral epithelial cells in saliva. This structural difference leads to varying lysis susceptibility, as human cells require milder membrane solubilization conditions, whereas microbial cells are more resistant and require harsher treatments for effective disruption. Mechanical disruption methods, such as bead beating, have been shown to significantly enhance protein yields by physically breaking down the tough cell walls of G+ bacteria35. Similarly, ionic detergents like sodium dodecyl sulfate (SDS) are indispensable for solubilizing bacterial membranes and efficiently releasing proteins from G- bacteria36. These differences in lysis susceptibility enable tailored approaches for selective protein extraction. We propose processing oral samples with a low concentration of denaturing agents to dissolve host proteins and non-ionic detergents to partially solubilize lipids in host cell membranes, facilitating the detachment of microorganisms while preserving microbial integrity.
In this study, we introduce the Optimized Salivary MetaProteomic sample analysis workflow (OSaMPle), an efficient method for isolating and enriching oral microbiota from human saliva or mouth rinse samples, significantly enhancing saliva metaproteomics identification. We developed and optimized an enrichment workflow that integrates a selective microbial enrichment buffer and a differential centrifugation regime to effectively enrich oral bacteria and reduce host components, followed by LC-MS/MS analysis and bioinformatics pipeline for microbial protein identification. Our approach greatly improved the identification of microbial peptides and protein groups with high reproducibility. This streamlined method was effectively adapted for processing oral rinse samples. Furthermore, we utilized this optimized workflow to investigate the oral microbiota in patients diagnosed with IBD, revealing dysregulated bacteria in both abundance and function associated with the condition.
Results
Establishing sample pretreatment workflow for microbiota enrichment
We used an iterative optimization strategy for establishing sample pretreatment workflow for microbiota enrichment. Briefly, at the first stage of proof-of-principle validation, we developed and tested a two-component processing workflow to isolate and enrich microbiota from saliva samples, which includes: (1) a selective enrichment buffer to process saliva samples, which facilitates the dissolution of host proteins; (2) differential centrifugations to separate and enrich microbiota (Fig. 1a). Next, at the second stage of method optimization, we further refined the composition of the selective enrichment buffer employing a full factorial experimental design (Fig. 1b).
a Proof-of-principle validation of the two-component processing workflow. Pelleted salivary aliquots were treated with one of four different buffers, followed by differential centrifugation to effectively isolate and enrich microbes from host debris. “PBS”, “U”, “T”, and “U + T” in panel (a) refer to “PBS”, “6 M urea in PBS”, “0.5% Tween 80 in PBS”, and “6 M urea, 0.5% Tween 80 in PBS”, respectively. b Experimental design and workflow for optimizing the selective enrichment buffer. Saliva samples from nine healthy human subjects were pooled and aliquoted into 18 portions for method optimization. These were divided into six groups, including a Baseline group (BL), a Control group (CT), and four experimental groups (GX, GT, UX and UT), with three technical replicates per group. The ‘+’ and ‘−’ signs indicate the presence (+) or absence (−) of specific buffer components in each group. In the BL group, sample pellets were suspended and washed in PBS but were not subjected to differential centrifugation. In the CT group and the four experimental groups, the sample pellets were treated with PBS or a combination of a detergent and a denaturant, followed by differential centrifugation (the detailed workflow of which is illustrated in the box within panel a). After pretreatment, all samples were washed by PBS buffer, followed by cell lysis, protein extraction, and digestion, and subsequently analyzed by LC-MS/MS. Logos for MetaLab and eHOMD are included in Fig. 1b, which originate from MetaLab software and https://www.homd.org, respectively.
In more details, during the first stage of proof-of-principle validation (Fig. 1a), we evaluated the efficiency of combining a denaturing agent and a detergent by comparing it to conditions using only a denaturing agent or only a detergent. A saliva sample collected from a single human subject was aliquoted into 2 mL portions per tube and then centrifuged at 14,000 × g for 20 min to collect the saliva pellet. Each pelleted aliquot was subsequently resuspended in one of four different buffer conditions—(1) PBS (as baseline control); (2) PBS with 0.5% Tween 80 (mild detergent); (3) PBS with 6 M urea (denaturing agent); or (4) PBS with a combination of 0.5% Tween 80 with 6 M urea (detergent and denaturing agent). In more detail, PBS served as a baseline buffer for comparison due to its isotonic and non-denaturing properties. Tween 80 (0.5%) was selected as a mild, non-ionic detergent that has been demonstrated to reversibly increase cell membrane permeability without causing significant cell damage37. Urea (6 M) was employed as a denaturing agent to solubilize host proteins, in accordance with the widespread application of 6–8 M urea for protein denaturation and solubilization38,39. Here, we selected the lower concentration of 6 M urea to dissolve salivary mucin proteins while preserving the structural integrity of microbial cells.
In this proof-of-principle test, we also validated the effect of the differential centrifugation regime (Fig. 1a). Briefly, the saliva pellets resuspended in the four different buffers were then subjected to low-speed centrifugation to pellet host cells and other salivary debris (such as food debris), referred to as “Debris” (Deb) in this procedure. For microbial enrichment, the resulting supernatant, named as “Supernatant” (Sup), was collected and subjected to high-speed centrifugation to obtain the pellet. If the selective enrichment buffer was effective, the final pellet was expected to contain a successfully enriched population of microbial cells.
Microscopic images of the samples collected after differential centrifugation revealed that the combination of 6 M urea and 0.5% Tween 80 effectively separated and enriched microbiota from saliva samples (Fig. 2a). Microscopic images of the supernatant indicated abundant presence of microorganisms, while the debris fraction primarily consisted of host cells. Notably, few intact host cells were observed in the supernatants of either the T group or the U + T group, but differences in bacterial enrichment were evident. The supernatant from the U + T group suggested bacterial enrichment, whereas the supernatant from the T group contained fewer microbial cells. Additionally, in the debris samples, more bacteria were observed adhering to host cells in the T group. In contrast, host cells were observed in both the debris and supernatant of the other two groups. These observations preliminarily provided proof of principle that the combination of denaturing agents and detergents, followed by differential centrifugation, enhances the separation and enrichment of the oral microbiome. Thus, we proceeded with the next step of optimizing the composition of denaturing agents and detergents.
a Microscopic images of samples from the four experimental groups during the method validation phase. Supernatant (left) and debris (right) samples were obtained after different buffer treatments followed by differential centrifugation, show at 10 × and 100 × magnification, respectively. b Hierarchical clustering analysis of all samples. Samples with the same label color indicate technical replicates from identical buffer treatments, controls, baseline, or QC. c Principal component analysis (PCA) of the metaproteomic profiles generated from different sample treatment workflows. The LFQ intensities of protein groups were log-transformed, and the analysis include all protein groups identified across samples. d Pearson correlation coefficient of quantified protein groups within different groups. The analysis was conducted on the samples from the QC group, the BL group, and the supernatant samples from the CT group, as well as the four experimental groups. Protein intensities (log2-transformed) were used to calculate the correlation. All groups were compared with the QC group. Statistical significance was assessed using one-way ANOVA with Tukey’s post hoc test. Significant differences between groups are denoted by asterisks: *adjusted-p < 0.05, ****p < 0.0001, while groups with no significant differences are marked without asterisks. Sup and Deb in panels (b–d) represent supernatant and debris, respectively; labels BL, CT, GX, GT, UX, UT correspond to the experimental design as shown in Fig. 1b. QC indicates MS quality control samples.
Optimized workflow enables comprehensive bacterial protein identification
At the second stage (Fig. 1b), we investigated the efficacy of different combinations of denaturing agents (6 M urea, 4 M guanidine hydrochloride) and detergents (0.5% Tween 80, 0.5% Triton X-100) to further optimize the workflow. As illustrated in Fig. 1b, pooled saliva samples from 9 subjects were aliquoted (3 mL per replicate) and used to optimize the composition of the selective enrichment buffer: (1) Baseline group (BL), direct centrifugation of 3 mL of a saliva sample to obtain the pellet, followed by washing it with PBS and protein extraction for metaproteomics analysis; (2) Control group (CT), saliva pellet was processed with PBS buffer, followed by differential centrifugation (see details in Methods); the four experimental groups followed the same experimental procedure as the CT group, with the only difference being the composition of the selective enrichment buffer: (3) GX group (4 M GuHCl, 0.5% Triton-X 100 in PBS); (4) GT group (4 M GuHCl, 0.5% Tween 80 in PBS); (5) UX group (6 M Urea, 0.5% Triton-X 100 in PBS); (6) UT group (6 M Urea, 0.5% Tween 80 in PBS). All processed samples underwent the same pre-processing steps (PBS washing, cell lysis, protein extraction and digestion) before metaproteomics analysis. N = 3 technical replicates were performed for each group to assess reproducibility.
To compare the effectiveness of microbial enrichment and host cell removal across different groups, we evaluated the separation of host and microbial components in the debris fractions and supernatants obtained from each experimental step through microscopic images, followed metaproteomics analysis to assess the host or microbial origin of the identified proteins and their relative abundances. Microscopic analysis indicated that all four experimental groups achieved effective separation consistent with previous results, as evidenced by the distinct characteristics of host and microbial components in the debris fractions and supernatants (Supplementary Fig. 1). All supernatant and debris samples in different groups were then subjected to cell washing, protein extraction, enzymatic digestion, desalting, following a previously established metaproteomic workflow40. LC-MS/MS analyses with 78 min gradient using an Orbitrap Fusion™ Lumos™ Tribrid™ mass spectrometer resulted in a total of 36 LC-MS/MS raw files, including three quality control (QC) samples generated by merging aliquots from each different sample.
Peptide and protein identification and quantification were performed using MetaLab 2.3 with the MetaPro-IQ iterative database search strategy40, based on a de-redundant database combining the expanded Human Oral Microbiome Database (eHOMD) collection of oral bacteria and human proteomes from UniProt. A total of 632,992 peptide-spectrum matches (PSMs) were identified, corresponding to 47,503 peptide sequences, which allowed the quantification of 8591 protein groups in this dataset. Hierarchical clustering with all quantified protein groups demonstrated that the major differences were observed between the Sup and Deb groups. Additionally, the samples treated with different enrichment buffers formed distinct clusters. The technical replicates showed high consistency, confirming the robustness of our experimental approach (Fig. 2b). To assess the impact of sample processing on protein composition, PCA analysis using the quantified protein groups from all samples was performed. The results showed that supernatant samples primarily contributed to variations in PC1 compared to debris samples, while PC2 captured differences between selective enrichment buffers (Fig. 2c). To achieve quantifiable reproducibility assessment and compare differences across treatment groups, we used the Pearson correlation coefficient (r) to evaluate the robustness of technical replicates across groups. All groups exhibited high reproducibility (r > 0.9) except for the UX_Sup group (r = 0.86 ± 0.03; mean ± SD, N = 3 technical replicates). The GX buffer-enriched microbial components yield the highest robustness of r = 0.95 ± 0.00 (Fig. 2d). Additionally, the correlation among the three QC samples, which were respectively loaded before, in the middle, and after the entire sample set analysis, was 0.96 ± 0.00 (N = 3, Supplementary Fig. 2), demonstrating the stability and reliability of the whole LC-MS/MS analysis process.
We compared the number of identified peptides from the four experimental groups’ Sup samples across all groups with those from the BL group. As described previously, the supernatant yielded from differential centrifugation is expected to contain enriched microbial cells. Among all Sub groups, the GX_Sup group achieved the highest number of identified peptides (18,014 ± 124 per sample), followed by GT_Sup (14,379 ± 714). For the UX_Sup and UT_Sup groups, the numbers were 11,105 ± 3,648 and 10,666 ± 366, respectively. Whereas the BL and CT_Sup groups identified 11,292 ± 413 and 9072 ± 1368 peptides, respectively (Supplementary Fig. 3a). Subsequently, we quantified an average of 5059 ± 59 protein groups per GX_Sup sample, while GT_Sup samples showed 4495 ± 433 protein groups. For the UX_Sup and UT_Sup groups, the numbers were 3771 ± 790 and 3,640 ± 147, respectively. In contrast, this number in the BL and CT_Sup were 2,922 ± 156 and 2866 ± 172, respectively (Supplementary Fig. 3b). Since the peptide and protein identification counts for the UT_Sup and UX_Sup groups were relatively low, and their reproducibility was not as high as that of the GX_Sup group, these two groups were excluded from subsequent comparative analyses.
Notably, the percentages of bacterial peptides in GX_Sup samples increased significantly from the control groups (one-way ANOVA with Tukey’s post hoc test, ****p < 0.0001, Fig. 3a), with bacterial peptides constituting 73.46% of the total peptide count and accounting for 51.46% of the total LFQ intensity, which quantifies accurately41. In the GT_Sup samples, the percentages were similarly high to those in GX_Sup, with no significant difference between the two groups. In comparison, the BL group contained only 27.76% bacterial peptides and 5.68% of the intensity, while the CT_Sup group yielded 34.71% and 6.45%, respectively. These results indicate that the GX buffer led to a 3.23- and 3.20-fold increase in bacterial peptide quantification relative to the BL and CT_Sup groups, respectively. Similarly, the GT buffer improved bacterial peptide quantification by 2.44- and 3.56-fold compared to the BL and CT_Sup groups, respectively.
a Proportions of bacterial-originated peptide intensity and count among all identified peptides. b Numbers of identified bacterial and human peptides. Panels a and b show comparisons across BL, CT_Sup, GX_Sup and GT_Sup groups. *p < 0.05, **p < 0.01, ***p < 0.001, ****p < 0.0001 indicate statistical significance levels determined by one-way ANOVA followed by Tukey’s post hoc test. Error bars represent mean ± SD, and individual values are shown as black dots. c UpSet plot illustrating the intersections of microbial protein sets among of the BL, CT_Sup, GX_Sup groups, the results are based on merged data from technical replicates of each group. d Volcano plot showing significantly differentially quantified proteins between GX_Sup and CT_Sup groups, with a fold change threshold of > 1.5 and p value < 0.05). e Stacked bar chart representing the originations of differentially quantified proteins in panel d).
Using GX buffer combined with differential centrifugation, we identified only 4783 human peptides, and the GT_Sup group identified even fewer, with just 3,607 human peptides. This represents a significant decrease of 41.17% and 55.64%, respectively, compared to the 8132 human peptides identified in the BL group (one-way ANOVA with Tukey’s post hoc test, *p < 0.05; ****p < 0.0001, Fig. 3b). Notably, the reduction in host interference peptides allowed a significant increase in the number of bacterial peptides. Specifically, in the GX_Sup group, the number of bacterial PSMs increased from 3,905 to 15,099 (3.87-fold) compared to the BL group (Supplementary Fig. 4a). Similarly, the number of bacterial peptides increased from 3,131 in the BL group to 13,231 in the GX_Sup group (4.23-fold, Fig. 3b). In GX_Sup group, 80% of the quantified proteins were derived from microbiota, with an average of 4076 ± 39 bacterial protein groups per sample (Supplementary Fig. 4b). In comparison, the GT_Sup group quantified 3665 ± 316 bacterial proteins, which accounted for 82% of the total protein content. The number of quantified bacterial protein groups in the GX_Sup group was 1.70 times higher than in the BL group and 1.50 times higher than in the CT_Sup group. Similarly, the GT_Sup group showed 1.43 times and 1.24 times increases in quantified bacterial protein groups compared to the BL the CT_Sup groups, respectively (Supplementary Fig. 4b).
Since the GX_Sup group was the group with the highest number of bacterial protein quantifications, we next compared the identity of quantified bacterial proteins in GX_Sup group with those in the BL group and CT_Sup group, who served as the baseline and control groups, respectively. A total of 2,267 bacterial protein groups were exclusively quantified in GX_Sup samples (Fig. 3c). We also explored the differentially quantified proteins between GX_Sup and the CT_Sup groups, using threshold at fold-change > 1.5 and p value < 0.05 (Wilcoxon signed-rank test, Fig. 3d). We identified 808 increased proteins and 58 decreased proteins, with 785 increased proteins being derived from the bacteria (Fig. 3e). Additionally, the volcano plot of significantly differentially quantified proteins (fold change > 1.5 and p value < 0.05, Wilcoxon signed-rank test) between GX_Sup and BL groups suggested 739 increased proteins and 64 decreased proteins (Supplementary Fig. 5a). Among the differential proteins, 731 were increased bacterial proteins (Supplementary Fig. 5b).
In summary, these results together show that GX_Sup group significantly increased the identification and robust quantification of bacterial protein groups. We thereafter name the sample processing approach using the GX buffer and the differential centrifugation regime as the “OSaMPle workflow”.
OSaMPle enables deep taxonomic and functional insights of oral microbiota
We next further explored the taxonomic and functional insights enabled by the OSaMPle workflow. Six phyla (Actinomycetota, Bacteroidota, Bacillota, Fusobacteriota, Pseudomonadota, and Campylobacterota) were significantly enriched by the OSaMPle workflow (two-sided unpaired t-test with Bonferroni correction. *q < 0.05, ***q < 0.001; Supplementary Fig. 6). The relative LFQ intensity of bacterial peptides annotated at the genus level, compared to all peptides annotated at this taxonomic level (including both host and bacterial peptides), reached 95% using the OSaMPle workflow (Fig. 4a). Taxonomic classification of all samples revealed a total of 62 genera, based on a threshold of at least 3 genus-specific peptides for identification, with 28, 34, 46, and 49 genera identified in the BL, CT_Sup, GT_Sup groups, and OSaMPle workflow, respectively (Fig. 4b). Compared to the BL and CT_Sup groups, 15 genera were uniquely identified in the OSaMPle workflow, including Aggregatibacter, Solobacterium, Lancefieldella, Peptoanaerobacter, Pseudoprevotella, Candidatus Nanosynbacter, Kingella, Mogibacterium, Phocaeicola, Paraprevotella, Fretibacterium, Eikenella, Paenibacillus, Bacteroides, and Candidatus Minimicrobia (Fig. 4b). At the species level, the relative abundance of bacterial peptides identified using the OSaMPle workflow and annotated to the species level reached 86.0% (Supplementary Fig. 7). Using a threshold of three species-specific peptides, 37, 48, 82 and 79 species were identified in the BL, CT_Sup, GT_Sup groups and OSaMPle workflow, respectively (Supplementary Fig. 8). Among the three technical replicates, the OSaMPle workflow demonstrated the highest consistency, with 89.3% of species consistently identified in all replicates, compared to 76.6% for the GT_Sup group. In contrast, the BL and CT_Sup groups had only 63–64% of species consistently identified across replicates. This indicates that the OSaMPle workflow exhibits much higher robustness in microbial identification (Fig. 4c). These findings underscore the superior robustness of the OSaMPle workflow in microbial identification. By effectively minimizing host interferences, enriching the low-abundant species, and enhancing microbiome identification sensitivity, the workflow preserves a significantly higher number of bacterial species for metaproteomics analysis.
a Comparison of genus-level quantification across BL group, CT_Sup group, OSaMPle workflow, GT_Sup group, UX_Sup group, and UT_Sup group using metaproteomics. b UpSet plot showing identified bacterial genera (genus-specific peptides≥3) across BL, CT_Sup, GX_Sup groups, and the OSaMPle workflow, the results are based on merged data from technical replicates of each group. c Venn diagram of the identified bacterial species (species-specific peptides ≥3) in these four groups. Each circle represents a technical replicate (n = 3). d Unweighted degree distribution of functions in the proteomic content network (PCN) of the four groups, the group layout in panel (d) is consistent with that in panel (c). e Sankey diagram illustrating the relationships between bacterial phyla and COG categories across the BL, CT_Sup groups, and the OSaMPle workflow. The line width and numbers in brackets represent functional protein LFQ intensity (COG category definition, J: Translation, ribosomal structure and biogenesis; K: Transcription; L: Replication, recombination and repair; D: Cell cycle control, cell division, chromosome partitioning; V: Defense mechanisms; T: Signal transduction mechanisms; M: Cell wall/membrane/envelope biogenesis; N: Cell motility; U: Intracellular trafficking, secretion, and vesicular transport; O: Posttranslational modification, protein turnover, chaperones; C: Energy production and conversion; G: Carbohydrate transport and metabolism; E: Amino acid transport and metabolism; F: Nucleotide transport and metabolism; H: Coenzyme transport and metabolism; I: Lipid transport and metabolism; P: Inorganic ion transport and metabolism; Q: Secondary metabolites biosynthesis, transport, and catabolism; R: General function prediction only; S: Function unknown).
We further compared the relative abundances of microbial species, focusing on the 18 species whose cumulative abundance accounted for 95% of the total metaproteome. Compared to the BL or CT_Sup group, OSaMPle workflow significantly increased the abundances of 15 of these microbial species, while the remaining three species showed no significant difference relative to the BL or CT_Sup group (Supplementary Fig. 9). Interestingly, a few microbial species were significantly enriched in other buffer groups compared to OSaMPle workflow. For example, the proportion of species such as Prevotella pallens, Prevotella melaninogenica, Neisseria flavescens, Neisseria sicca demonstrated a significant increase in GT_Sup group, and Selenomonas flueggei were significantly increased in UX_Sup group (*q < 0.05; **q < 0.01,Tukey’s post hoc test with Bonferroni correction). These results suggest that while the OSaMPle workflow offers comprehensive and robust microbial identification, specific buffer groups may be better suited for studies targeting certain microbial species.
We next annotated the quantified bacterial proteins in the BL group, CT group and OSaMPle workflow using Clusters of Orthologous Groups (COG) of proteins database. COG enrichment analysis, conducted using the hypergeometric distribution method42, revealed similar functional annotations across these three groups, with the OSaMPle workflow showing a higher enrichment level and a greater number of annotated bacterial proteins (Supplementary Fig. 10). In addition, we compared the subcellular localization of bacterial proteins quantified in the BL group, CT_Sup group, and the four experimental groups. Based on the Gram-staining classification of the corresponding microbial species, proteins from G- bacteria were categorized into five groups: Cytoplasmic, Periplasmic, Inner Membrane, Outer Membrane, and Extracellular. For G+ bacteria, the proteins were classified into four groups: Cytoplasmic, Extracellular, Cell Wall, and Membrane. Subcellular localization annotation revealed that the total LFQ intensity of proteins assigned to each subcellular localization in the OSaMPle workflow and GT_Sup group were higher compared to those in the BL and CT_Sup groups (Supplementary Fig. 11).
We further performed a taxon-function integration analysis to compare the ability of OSaMPle in addressing taxon-specific functionalities. Based on taxonomic and functional annotations, we generated proteomic content networks (PCN)43 of each sample, which can be defined as a bipartite graph connecting each microbial taxon to all expressed functions from the taxon’s proteome. By calculating the unweighted degree distribution of functions in the PCN of the BL, CT_Sup, GT_Sup groups, and OSaMPle workflow, we found that the degree distributions of functions in PCNs of the OSaMPle workflow and the GT_Sup group exhibit a fat tail pattern whereas the BL and CT_Sup groups don’t (Fig. 4d). This finding highlights that OSaMPle workflow and GT_Sup group significantly improved the depth of bacterial functional identification. We also performed phylum level taxonomy-function integration analysis for the OSaMPle workflow, CT_Sup and BL groups based on interconnected annotation information. In the OSaMPle workflow, the abundance of bacterial proteins from different phyla corresponding to COG categories increased by an order of magnitude compared to the BL and CT_Sup groups (Fig. 4e). These further evidenced the strength of OSaMPle in elucidating the taxon-specific functionality of the saliva microbiome.
OSaMPle discovers differentially abundant proteins in mouth-rinse metaproteomes between IBD patients and healthy controls
We next applied the OSaMPle workflow to explore the functional responses of oral microbiota in IBD patients (IBD group, N = 8) and healthy controls (HC group, N = 10). Each individual’s age, sex, and oral condition have been documented in detail (Supplementary Table 1). As previous studies have shown, mouth rinse samples can serve as a substitute for saliva samples as a resource for oral microbiome research44,45,46. Therefore, we examined the suitability of the OSaMPle workflow for mouth rinse samples. Results in Supplementary Fig. 12 show that a 5 mL physiological saline mouth rinse sample from a healthy subject achieved a comparable number of bacterial peptide identifications and protein quantifications to those obtained from saliva samples. Therefore, during the method application phase, we used a 5 mL physiological saline solution for a 30-s mouth rinse to collect samples, which was more time-saving and easier to perform for clinical sampling compared to saliva collection.
Mouth rinse samples were collected from both groups and processed through the OSaMPle workflow, resulting in a total of 20 LC-MS/MS raw files involving two QC samples generated by merging aliquots from each different sample. Overall, a total of 579,017 PSMs were identified, corresponding to 79,893 peptide sequences and resulting in 16,775 quantified protein groups in this dataset. On average, we quantified 19,929 ± 3,351 peptides and 5779 ± 991 protein groups per sample (N = 20 raw files). No significant differences were observed in the number of peptides, peptides abundance, and the number of proteins between the two groups (Supplementary Fig. 13). We also calculated the functional redundancy (FR) and normalized functional redundancy (nFR)43 of each individual sample but found no significant differences between the two groups (Supplementary Fig. 13).
Taxonomic analysis revealed the presence of five major phyla in all samples: Actinomycetota, Pseudomonadota, Fusobacteriota, Bacillota and Bacteroidota (Supplementary Fig. 14). Among these, Pseudomonadota showed a significant increase in the IBD group compared to the HC group (*q < 0.05, two-sided unpaired t-test with Bonferroni correction). Bacillota exhibited an overall decrease in the IBD group; however, due to the small sample size, the q-value was not significant (two-sided unpaired t-test with Bonferroni correction). The other phyla, including Actinomycetota, Fusobacteriota, and Bacteroidota, did not show significant differences, and thus were not emphasized in the statistical analysis. At the genus level, although the relative abundance of Neisseria appeared to increase in the IBD group, and the relative abundances of Streptococcus and Peptostreptococcus appeared to decrease in the IBD group compared to the HC group (Supplementary Fig. 15), these changes were not statistically significant (as determined by two-sided unpaired t-test with Bonferroni correction). These observations suggest compositional differences between the IBD group and the HC group, but further studies with larger sample sizes are needed to validate these findings.
To further explore protein-level differences in the oral metaproteome associated with IBD, we performed sparse partial least squares-discriminant analysis (sPLS-DA) using proteins quantified in ≥ 50% samples of both groups. The sPLS-DA demonstrated clear separation between the IBD and HC groups, with the IBD group exhibiting lower scores on Component 1 compared to the HC group (Fig. 5a). The analysis also demonstrated low classification error rates (reached 0% starting from Component 3), confirming its reliability and precision. Feature loadings analysis identified the top protein groups contributing to this separation, including one protein group with the top-ranking protein SEQF10188.1_02677 (belongs to the receptor family ligand-binding region) and a taxonomic LCA at the species level with Ralstonia pickettii; one protein group with the top-ranking protein SEQF10165.1_03816 (belongs to the aldehyde dehydrogenase family), sharing a taxonomic LCA at the phylum level of Pseudomonadota; and another protein group containing a single protein SEQF3078.1_02022 (ABC-type branched-chain amino acid transport system) from Streptococcus cristatus (Supplementary Fig. 16). Additionally, the PERMANOVA (Permutational Multivariate Analysis of Variance) analysis based on the sPLS-DA Component 1 and Component 2 scores and Euclidean distance revealed an R-squared value of 0.47332, an F-statistic of 14.373, and a p-value of 0.001, indicating significant group differences. To further identify proteins with significant changes in the IBD oral microbiome, differential abundance proteins between the two groups were analyzed using the Wilcoxon signed-rank test with a fold change threshold of > 1.5 and p-value < 0.05 (Fig. 5b). This analysis identified 71 proteins with significantly higher abundance and 54 proteins with significantly lower abundance in the IBD group. A heatmap of the top 120 differentially abundant proteins showed clear clustering based on disease or health status (Fig. 5c), with 108 of these proteins originating from the microbiome, highlighting distinct differences in the metaproteome between healthy individuals and IBD patients.
a Sparse partial least squares discriminant analysis (s-PLS-DA) of the metaproteomic response in IBD and HC groups. Right: Classification error rate derived from the sparse PLS-DA. b Volcano plot showing significantly differentially quantified proteins between IBD and HC groups (threshold: fold change > 1.5 and p value < 0.05). c Heatmap showing the highly distinctive structure of the oral metaproteome between two groups. rows represent the samples, and columns represent the 120 proteins. d Sankey plot shows taxonomy-function integration in more abundant proteins in IBD. e Sankey plot shows taxonomy-function integration in less abundant proteins in IBD. Letters represent different functional categories in the COGs database, and numbers represent functional protein LFQ intensity. f Pathways corresponding to significantly different COGs between the two groups. Orange lines indicate COGs more abundant in IBD group, blue lines indicate COGs less abundant in IBD group, and black lines indicate shared responses.
To further characterize the differentially abundant microbial proteins in IBD patients, we analyzed specific protein functions that showed significant changes. Among the more abundant microbial proteins, we identified a protein group with the top-ranking protein SEQF10188.1_02677 (as mentioned previously), which was found to be 104 times more abundant in IBD group compared to the HC group. Another protein group, with the top-ranking protein SEQF10165.1_03816 (also mentioned previously), was 44 times more abundant in IBD group.
In contrast, several proteins were less abundant in the IBD group. we identified two protein groups with reduced abundance in IBD group, with the top proteins SEQF3557.1_00099 (bacterial solute-binding protein 9 family) and SEQF1059.1_00280 (alcohol dehydrogenase 1 family), respectively. Both protein groups share a taxonomic LCA at the genus level with Streptococcus. Additionally, four other protein groups with reduced abundance in the IBD group were identified, each consisting of a single protein. These include SEQF3078.1_02022 (ABC-type branched-chain amino acid transport system) and SEQF2520.1_02072 (an ABC transporter, substrate-binding protein) from Streptococcus cristatus, SEQF3598.1_01181 (a bacterial extracellular solute-binding protein) from Streptococcus mitis, and SEQF2007.2_01041 (phosphoglycerate kinase family) from Streptococcus parasanguinis. We further connected microbial taxa and functional categories through a Sankey plot in Fig. 5d, illustrating proteins that are more abundant in the oral microbiome of IBD group. These proteins are predominantly produced by Neisseria and are enriched in pathways related to energy production and conversion, as well as inorganic ion transport and metabolism. In contrast, Fig. 5e shows that the less abundant proteins are mainly derived from streptococcus, with similar associations to energy production and conversion and inorganic ion transport and metabolism, along with additional contributions from Peptostreptococcus (Fig. 5e), which are linked to translation, ribosomal structure and biogenesis. These findings are consistent with the observed reduction in the relative abundance of these two genera (Supplementary Fig. 15).
Further enrichment analysis of these altered proteins under IBD conditions revealed distinct functional changes. Specifically, we annotated differentially expressed proteins with COG proteins and performed functional enrichment analysis based on these annotations. Proteins with increased abundance were significantly enriched in pathways associated with energy production and conversion, while those with decreased abundance were primarily linked to amino acid transport and metabolism, as well as carbohydrate transport and metabolism (Supplementary Fig. 17). Notably, the increased abundance of proteins involved in inorganic iron transport and metabolism was particularly pronounced, suggesting a potential disruption in the normal metabolic activities of the oral microbiome in IBD patients (Supplementary Fig. 17). Next, we mapped these COG proteins to microbial metabolism pathways using iPATH. By tracing these COG proteins back to their corresponding proteins and linking the proteins to their taxonomic origins following the “protein-peptide bridge” approach as previously described43, this integrated approach provides insights into how changes in protein expression may influence or reflect metabolic activities within the microbiota of IBD patients (Fig. 5f). Interestingly, we found that the more abundant proteins in IBD group (fold-change threshold of >1.5 and p-value < 0.05 by Wilcoxon signed-rank test), corresponding to specific COG proteins, are linked to pathways such as the TCA cycle, inositol phosphate metabolism, and sulfur metabolism. Tracing these COG proteins to their corresponding proteins and mapping them to their taxonomic origins revealed that they are associated with Neisseria, which showed a tendency of increased relative abundance, this suggests that Neisseria may increase their abundance in the oral microbiome of IBD patients by generating energy through the TCA cycle. Conversely, less abundant proteins in IBD patients, corresponding to specific COG proteins, are linked to pathways such as fatty acid elongation pathway associated with Peptostreptococcus, corresponded with the observed decrease in this genus (Supplementary Fig. 15b).
Discussion
The OSaMPle workflow is a method designed for salivary and mouth rinse samples, enabling in-depth metaproteomic identification and analysis of oral microbiota. This workflow combines two key steps: selective enrichment buffer treatment and differential centrifugation. We optimized the buffer formulation by testing different combinations of denaturants and detergents, finding that 4 M GuHCl with 0.5% Triton X-100 yielded the best results for host protein solubilization and microbial cell detachment. While GuHCl and urea are both effective denaturants commonly used in proteomic analysis, they function via distinct mechanisms. Urea’s interaction with water forms strong hydrogen bonds, increasing solubility of hydrophobic residues and forming hydrogen bonds with protein backbones, which destabilizes protein structures47,48. In contrast, GuHCl, an ionic denaturant, interacts with protein backbones and polar side chains, disrupting charge interactions and promoting protein unfolding49. In the oral cavity, mucins, particularly Mucin 5B—a heavily glycosylated, charged macromolecular glycoprotein50—pose significant challenges in sample preparation due to their high abundance in the saliva. We argue that GuHCl, due to its ionic nature, effectively solubilizes these complex mucins, enabling the separation of microorganisms from previously insoluble large mucin molecules through centrifugation. Additionally, we determined the optimal detergent by comparing 0.5% Tween 80 versus 0.5% Triton X-100, as each has been shown to affect cell permeability differently. In specific, 0.5% Tween 80 increases membrane permeability without causing significant cell damage37, while 0.5% Triton X-100 is widely used in immunofluorescence experiments to permeabilize cells51. After the experimental comparison, we selected 0.5% Triton X-100 in our buffer to enhance host cell permeabilization as it showed the capability to maintain the integrity of microbial cells and yielded high microbial protein abundances. Notably, while our selection of denaturant and detergent combinations effectively enriched microbial cells, further work refining the concentrations of buffer components may enhance microbial recovery efficiency even more. This optimization simplifies the procedure and eliminates the need for the specialized equipment required in other established approaches such as the FFIEF-MS-based metaproteomic methodology26, which rely on specified electrophoresis instruments and involve complex separation procedures. Additionally, previous methods are often incompatible with small-volume samples, while OSaMPle can start with as low as 3 mL saliva or 5 mL mouse rinse samples, further emphasizing the adaptability and practicality of the OSaMPle workflow, emphasizing the adaptability and practicality of the OSaMPle workflow for processing large-scale clinical samples.
The OSaMPle workflow reduced the proportion of host-derived peptides from 94% in the BL group to 49% in the OSaMPle-processed group, significantly enhancing the robust identification of bacterial peptides and proteins. Reducing host interference proved essential for improving the sensitivity and achieving more reliable microbiome identification by LC-MS/MS. By comparing metaproteomic data from both taxonomic and functional levels between groups, we demonstrated that the OSaMPle workflow enables comprehensive and in-depth characterization of microbial functions, offering a robust framework for advancing oral metaproteomic research.
However, a limitation of this work was due to the LC-MS/MS instrument selected in this study, the number of oral bacterial proteins quantified by this workflow still falls short compared to a recent study, in which Chen et al. identified approximately 15,000–17,000 microbial peptides and 7,000–8,000 microbial proteins on average per tongue coating sample using Pressure Cycling Technology coupled with Data-Independent Acquisition (PCT-DIA) MS52. To achieve a fair comparison, we analyzed a mouth rinse sample using timsTOF Pro 2 mass spectrometer and accurately quantified 60,218 microbial peptides and 8,564 protein groups (Supplementary Fig. 18). The difference in the peptide-to-protein ratio between our work and Chen’s is due to the fact that we analyzed only one sample; the number of protein identifications is expected to increase as the sample set size grows, enabling match between runs to discover more protein groups across the dataset. Therefore, it is evident that the current limitation in the overall identification number in our presented work was due to the mass spectrometry instruments used. In future clinical oral metaproteomics studies, utilization of high-resolution mass spectrometry instruments in combination of the OSaMPle processing workflow will facilitate an in-depth identification of the oral metaproteome.
We next applied OSaMPle workflow to analyze mouth rinse samples and investigate the functional response of the oral microbiota in IBD patients. Studies have shown that the oral microbiome of IBD patients differs from that of healthy individuals12,13,15,53. However, most research on the oral microbiome and IBD has relied on metagenomic sequencing, leaving a gap in the functional interpretation of the microbiota under pathological condition. With our established metaproteomics pipeline, several IBD-associated taxa identified in this study are consistent with findings from previous research12,15. At the genus level, for example, we observed an increase in Neisseria and a decrease in Peptostreptococcus and Streptococcus within the oral microbiota of IBD patients, aligning with findings from earlier studies. Although some changes did not reach statistical significance in our study due to a limitation of sample size, the existing results show considerable potential that future studies with larger sample sizes would observe more significant changes and explore their potential biological implications in the context of IBD. Despite insignificant changes in the overall abundance of microbial genera, sPLS-DA analysis on the protein level demonstrated reliable discrimination between IBD and HC groups. Differential protein analysis revealed significant increases in proteins predominantly belonging to Neisseria. As an anaerobic opportunistic pathogen in the oral cavity, Neisseria is closely related to periodontal inflammation54. These compositional changes in oral microbiota may partly explain the increased risk of periodontal disease in IBD patients10. By analyzing the differentially expressed proteins between the IBD and HC groups, we revealed the less abundant proteins were primarily associated with amino acid transport and metabolism, as well as carbohydrate transport and metabolism, suggesting that the normal metabolism of the oral microbiome in IBD patients may have been impaired. Moreover, the enrichment of more abundant proteins in inorganic iron transport and metabolism is more pronounced, consistent with changes in the oral cavity of IBD patients with periodontitis, which is characterized by an increase in available degraded host proteins and heme facilitates the further expansion of virulent bacteria55. Additionally, through taxonomy-function integrated analysis, we identified dysregulated protein functions in IBD patients corresponding to changes in specific genera. For instance, we found increased Neisseria were associated with increased proteins related to pathways such as the TCA cycle, inositol phosphate metabolism, and sulfur metabolism, this suggests that Neisseria may increase their abundance in the oral microbiome of IBD patients by generating energy through metabolic pathways. Hu et al. reported that the acetyl-CoA pathway of saliva microbiome was enriched in CD patients15, acetyl-CoA as a substrate for the TCA cycle, which aligns with the increase of TCA pathways-related proteins that we identified. Our results demonstrated that compared to analyzing based solely on taxonomic composition, integrating taxonomy and function provides a more comprehensive and robust insight into dysbiosis of the oral microbiome in patients with IBD.
One limitation of this study is its relatively small sample size, which could constrain the statistical power. Specifically, we determined the overall mean Cohen’s d for our metaproteomic dataset to be 1.08. With a total sample size of 18 (N_IBD = 8 and N_HC = 10) and a significance level of 0.1, the statistical power was calculated to be 0.70. While the significance level and statistical power are not particularly high, the primary goal of this experiment is to validate the sensitivity of our metaproteomics workflow and demonstrate its ability to detect differentially expressed proteins in the IBD oral microbiome. Due to the fact that this study does not aim for an in-depth mechanistic exploration of IBD oral microbiome dynamics but rather tests the sensitivity of the OSaMPle method, we believe that the relaxed criteria better support validating the robustness of the metaproteomics workflow. Furthermore, while it is well-known that Crohn’s disease (CD) and ulcerative colitis (UC), the two major subtypes of IBD, exhibit distinct dysbiotic patterns in the gut microbiome, distinguishing between them using oral microbiome data may not be feasible. For example, a study using 16S rRNA sequencing of the dorsum tongue microbiome in 156 treatment-naïve pediatric IBD patients (94 CD, 52 UC) and 102 healthy controls reported that, although PCA indicated more severe dysbiosis in CD, classifiers failed to effectively differentiate CD from UC based on oral microbiome data56. Consistent with this observation, our findings show significant oral microbial dysbiosis in IBD patients; however, the differences between UC and CD at the oral microbiome level are less distinct. Given the methodological focus of this study, we chose to analyze CD and UC as a combined IBD group. This approach allows us to highlight shared microbial disruptions across both subtypes and validate the sensitivity of the metaproteomics workflow. Nonetheless, we acknowledge that future studies with larger cohorts and advanced analytical techniques may better elucidate oral microbial differences between CD and UC. Additionally, variations in diet have been known to affect the oral microbiome’s composition57, which was not explored in the current study and should be addressed in future research. Despite these limitations, this study is the first time to explore the oral microbiome of IBD patients using metaproteomics. Based on our established oral metaproteomics workflow, we believe that large-scale studies on the gut-oral axis in IBD will become an effective approach for in-depth mechanistic discoveries in the future.
Methods
Oral microbiome sample collection
This study was approved by the Institutional Medical Ethics Review Board of Peking University People’s Hospital (Approval No. 2022PHB373-001). All participants signed informed consent forms. During the method proof-of-principle and optimization stage, unstimulated saliva was collected from one and nine different healthy adults according to the method described by Lu et al.20, respectively. Each subject was asked not to eat or drink for two hours after breakfast. Sampling was taken place late in the morning, subjects were asked to avoid swallow, concentrate the saliva on the floor of the mouth before spitting it into a 50 mL centrifuge tube, which was stored on ice immediately. During the methods application stage on clinical IBD patient samples, we chose saline mouth rinse sample to improve operational feasibility in a hospital setting. IBD patients were diagnosed at Peking University People’s Hospital. Healthy control subjects met the following criteria: without probiotics/antibiotics administration in months before sample collection, and with good physical status. All participants rinsed their mouths with 5 mL of saline solution for 30 s and then expelled the liquid into a sealed container. The samples were placed on ice and immediately after collection, they were manually transported to the laboratory for further processing. The time between sample collection and subsequent processing was no more than six hours, with the samples kept on ice throughout the entire period.
Sample pretreatment
All saliva samples were pooled together and then divided into 18 aliquots after homogenization with a vortex mixer, which was followed by centrifugation at 14,000 x g for 20 min at 4 °C to collect the pellet, this centrifugation parameters were adapted from gut metaproteomics workflow35, then the eighteen aliquots were randomly divided into six groups: a Baseline group, a Control group and four experimental groups, with each group consisting of three technical replicates (N = 3). Pellets in the baseline group were washed three times with precooled PBS. For the Control group and the four experimental groups, the pellets were resuspended in different treatment buffers. CT group: Resuspended in PBS buffer (Solarbio, P1020); GX group: Resuspended in GX buffer containing 4 M GuHCl (Sigma, G3272) and 0.5% Triton X-100 (Sigma, 688462) in PBS; GT group: Resuspended in GT buffer containing 4 M GuHCl and 0.5% Tween 80 (Sigma, P4780) in PBS; UX group: Resuspended in UX buffer containing 6 M urea (Sigma, U5378) and 0.5% Triton-X 100 in PBS; UT group: Resuspended in UT buffer containing 6 M urea and 0.5% Tween 80 in PBS. After resuspension, the samples were centrifuged at 300 × g for 5 min at 4 °C to collect the supernatant and debris, The debris was resuspended in cold PBS and washed three additional times. The resulting supernatant was transferred to a new tube for high-speed centrifugation at 14,000 × g for 20 min at 4 °C to pellet the microbial cells. The microbial cell pellets were then washed twice with PBS, with each washing step involving resuspension in cold PBS followed by high-speed centrifugation under the same conditions (14,000 × g, 4 °C for 20 min). Mouth rinse samples were processed separately but followed the same protocol as described above.
Protein extraction, tryptic digestion, desalting, and LC-MS/MS analysis
The pretreated samples were stored overnight at −80 °C before bacterial cell lysis and protein extraction. The lysis buffer was freshly prepared, containing 8 M urea in 100 mM Tris-HCl (LABLEAD, 0497-1) buffer (pH = 8.0), plus Roche cOmplete™ Mini tablets (Roche, 04693116001). Microbial cell pellets and host cell debris were then re-suspended in 200 μl lysis buffer and subjected to ultrasonication at 8 °C for 10 min (10 s on/off), using a sonicator (Qsonica Q700, USA) with a Cup Horn probe. Forty percent amplitude was used, and 10 min of 10 s ultrasonication and 10 s cooling down were performed. Protein lysates were centrifuged at 16,000 × g for 10 min at 8 °C to remove the cell debris. The resulting supernatants were precipitated by 5-fold volume of precooled precipitation solvent (50% v/v acetone, 50% v/v ethanol, and 0.1% v/v acetic acid; all reagents were purchased from Sinopharm Chemical Reagent Co., Ltd.) at −20 °C overnight. The precipitated proteins were pelleted by centrifugation at 16,000 × g for 25 min at 4 °C and washed three times by precooled acetone for desalting. Protein pellets were then resuspended in 6 M urea (in 100 mM Tris-HCl buffer) for protein quantitation using the Pierce™ BCA Protein Assay Kits (Thermo Fisher Scientific, 23227) according to the manufacturer’s instructions.
For tryptic digestion, we employed an equal-volume method due to the difficulty in achieving a microbial protein concentration of 1 μg/μl. Specifically, 50 μg (or for samples with less than 50 μg of protein, an equivalent volume of protein solution totaling 50 μl) of protein was reduced with 10 mM dithiothreitol (Sigma, 43185) at 56 °C and 850 rpm for 30 min using a thermomixer (IKA MATRIX Delta Plus). The samples were then cooled to room temperature and alkylated with 20 mM iodoacetamide (Sigma, I1149) for 40 min at room temperature in the dark. Alkylation was stopped by adding 9-fold volume of 100 mM Tris-HCl buffer. Subsequently, 1 μg of trypsin (Worthington Biochemical Corp., Lakewood, NJ) was added to each sample for digestion at 37 °C and 850 rpm for 20 h on the thermomixer. Digestion reaction was stopped by adding 10% v/v formic acid (purchased from Sinopharm Chemical Reagent Co., Ltd.). The tryptic digest was desalted with a 10-μm C18 column and the tryptic peptides were then eluted with 80% acetonitrile (v/v)/0.1% formic acid (v/v). After evaporation, the tryptic peptides were dissolved in 0.1% formic acid (v/v). The peptide concentration was determined by measuring absorbance at 280 nm using a NanoDrop One/Onec (Thermo Fisher Scientific, USA) prior to mass spectrometry analysis.
For mass spectrometry analysis, 1 μg peptide was loaded into an EASY-nLC1000 system (Thermo Fisher Scientific, USA) and separated by a 150 μm × 15 cm long column packing with 1.9 mm C18 packing particles. Peptides were eluted at a flow rate of 600 nL min−1 with mobile phases A (0.1% FA in water) and B (0.1% FA in ACN) using a 78 min gradient: 0–8 min, 5% B; 8–58 min, 10–24% B; 58–70 min, 24–32% B; 70–71 min, 32–95% B; and 71–78 min, 95% B. An Orbitrap Fusion Lumos Tribrid mass spectrometer (Thermo Fisher Scientific, USA) was utilized to analyse all samples. The m/z range of the mass analyser was 300–1400 Da, and the MS resolution was 120,000. The AGC target and maximum injection time were set at 5000 and 35 ms, respectively. The collision energy was set to 33%. Data-dependent acquisition mode was applied to acquire the spectra. The signal intensity threshold was set at 5000.
Metaproteomics data analysis
Database search for peptide/protein identification and quantification
Protein/peptide identification and quantification were done using the MetaLab software (version 2.3)58. MetaLab is a tool designed to execute an iterative database search strategy known as MetaPro-IQ40. The search was based on a de-redundant database that combined the expanded Human Oral Microbiome Database (eHOMD) (release 2023_1_22, version 10.1)59 and the Human Uniprot database (release 2020_04, 204063 entries, including Swiss-Prot and TrEMBL)60. In MetaLab, a spectral clustering strategy58 was employed to construct the database from all raw files. Subsequently, peptide and protein lists were generated by applying rigorous filtering with an FDR of 0.01 at both the peptide and protein levels. Protein grouping was performed using MaxQuant (version 1.5.3.30) based on the principle of parsimony, with unique peptides assigned to their respective leading protein group. Protein quantification was conducted using the maxLFQ algorithm in MaxQuant (version 1.5.3.30). The fixed modification was Carbamidomethyl (C), while variable modifications included oxidation (M) and N-terminal acetylation (Protein N-term). The search parameters specified a fragment ion tolerance of 0.5 Da (MS/MS, ITMS). The minimum peptide length was set to seven amino acids, and a maximum of two missed cleavages was allowed. The matching between runs (MBR) option was selected.
Taxonomic analysis
Phylogenetic assignment for each of the quantified peptide sequences was determined using the lowest common ancestor (LCA) algorithm in the Unipept desktop software (version 2.0.1)61,62. The “Equal I and L” and “Advanced miss cleavage handling” options were selected, and the results were exported in table format. The analysis was carried out at the phylum, genus, and species levels. The abundance of each taxon was represented as the sum intensity of all unique peptides annotated to that taxon.
Functional annotation with eggNOG functional categories
The quantified protein groups were annotated with eggNOG (evolutionary genealogy of genes: Non-supervised Orthologous Groups) functional categories63. For protein-based functional annotation, the identified protein sequences were extracted from the eHOMD database and used for COG annotation via an in-house Perl script. Functional enrichment analysis was conducted on iMetaLab.ca42 (https://shiny.imetalab.ca/), which employs the hypergeometric distribution method to calculate p-values and performs FDR correction within each dataset. Functions with at least one match meeting the p-value cutoff were retained in the final result list and used for bubble plot visualization via R scripts. The subcellular localization of microbial proteins was predicted by CELLO (version 2.5)64.
Statistical analysis
Principal component analysis, sparse partial least-squares discriminant analyses (sPLS-DA) and hierarchical clustering were performed using MetaboAnalyst (version 6.0)65. sPLS-DA model was evaluated by cross-validation of cross-validation (CV). The data are expressed as mean ± standard deviation (SD). The normality of the data was assessed using the Shapiro-Wilk test (α = 0.05). For normally distributed data, differences between two groups were assessed by an unpaired t-test, while for non-normally distributed data, the Mann-Whitney U test was used. Multiple taxa comparisons were adjusted for multiple testing using the Bonferroni correction. One-way analysis of variance (ANOVA) with post-hoc Tukey’s test was used to compare multiple groups. A difference with p < 0.05 was considered statistically significant66,67.
Data availability
The mass spectrometry proteomics data have been deposited to the ProteomeXchange Consortium (https://proteomecentral.proteomexchange.org) via the iProX partner repository with the dataset identifier PXD055269.
References
Kilian, M. et al. The oral microbiome—an update for oral healthcare professionals. Br. Dent. J. 221, 657–666 (2016).
Escapa, I. F. et al. New Insights into Human Nostril Microbiome from the Expanded Human Oral Microbiome Database (eHOMD): a Resource for the Microbiome of the Human Aerodigestive Tract. mSystems 3, https://doi.org/10.1128/msystems.00187-18 (2018).
Tierney, B. T. et al. The landscape of genetic content in the gut and oral human microbiome. Cell Host Microbe 26, 283–295.e8 (2019).
Yost, S., Duran-Pinedo, A. E., Teles, R., Krishnan, K. & Frias-Lopez, J. Functional signatures of oral dysbiosis during periodontitis progression revealed by microbial metatranscriptome analysis. Genome Med. 7, 27 (2015).
Atarashi, K. et al. Ectopic colonization of oral bacteria in the intestine drives TH1 cell induction and inflammation. Science 358, 359–365 (2017).
Lee, M. & Chang, E. B. Inflammatory Bowel Diseases (IBD) and the Microbiome—Searching the Crime Scene for Clues. Gastroenterology 160, 524–537 (2021).
Hu, S. et al. Ectopic gut colonization: a metagenomic study of the oral and gut microbiome in Crohn’s disease. Gut Pathog. 13, 13 (2021).
Gao, C. et al. Standardized studies of the oral microbiome: From technology-driven to hypothesis-driven. iMeta 1, e19 (2022).
Xun, Z., Zhang, Q., Xu, T., Chen, N. & Chen, F. Dysbiosis and ecotypes of the salivary microbiome associated with inflammatory bowel diseases and the assistance in diagnosis of diseases using oral bacterial profiles. Front. Microbiol. 9, 1136 (2018).
She, Y.-Y. et al. Periodontitis and inflammatory bowel disease: a meta-analysis. BMC Oral. Health 20, 67 (2020).
Said, H. S. et al. Dysbiosis of salivary microbiota in inflammatory bowel disease and its association with oral immunological biomarkers. DNA Res. Int. J. Rapid Publ. Rep. Genes Genomes 21, 15–25 (2014).
Zhang, T. et al. Dynamics of the salivary microbiome during different phases of Crohn’s disease. Front. Cell. Infect. Microbiol. 10, 544704 (2020).
Qi, Y. et al. High-throughput sequencing provides insights into oral microbiota dysbiosis in association with inflammatory bowel disease. Genomics 113, 664–676 (2021).
Somineni, H. K. et al. Site- and taxa-specific disease-associated oral microbial structures distinguish inflammatory bowel diseases. Inflamm. Bowel Dis. 27, 1889–1900 (2021).
Hu, S. et al. Oral microbiome of Crohn’s disease patients with and without oral manifestations. J. Crohns Colitis 16, 1628–1636 (2022).
Verberkmoes, N. C. et al. Shotgun metaproteomics of the human distal gut microbiota. ISME J. 3, 179–189 (2009).
Zhang, X. et al. Widespread protein lysine acetylation in gut microbiome and its alterations in patients with Crohn’s disease. Nat. Commun. 11, 1–12 (2020).
Zhang, X. et al. Metaproteomics reveals associations between microbiome and intestinal extracellular vesicle proteins in pediatric inflammatory bowel disease. Nat. Commun. 9, 2873 (2018).
Tanca, A. et al. Potential and active functions in the gut microbiota of a healthy human cohort. Microbiome 5, 79 (2017).
Lu, H. et al. The sampling strategy of oral microbiome. iMeta 1, e23 (2022).
Belstrøm, D. et al. Metaproteomics of saliva identifies human protein markers specific for individuals with periodontitis and dental caries compared to orally healthy controls. PeerJ 4, e2433 (2016).
Aebersold, R. & Mann, M. Mass spectrometry-based proteomics. Nature 422, 198–207 (2003).
Grassl, N. et al. Ultra-deep and quantitative saliva proteome reveals dynamics of the oral microbiome. Genome Med. 8, 44 (2016).
Rabe, A. et al. Metaproteomics analysis of microbial diversity of human saliva and tongue dorsum in young healthy individuals. J. Oral. Microbiol. 11, 1654786 (2019).
Rabe, A. et al. Impact of different oral treatments on the composition of the supragingival plaque microbiome. J. Oral. Microbiol. 14, 2138251 (2022).
Jiang, X. et al. In-depth metaproteomics analysis of oral microbiome for lung cancer. Research (Wash., DC) 2022, 9781578 (2022).
Xiao, X. et al. A qualitative and quantitative analysis of the human gingival crevicular fluid proteome and metaproteome. Proteomics 21, e2000321 (2021).
Arıkan, M. et al. Metaproteogenomic analysis of saliva samples from Parkinson’s disease patients with cognitive impairment. NPJ Biofilms Microbiomes 9, 86 (2023).
Davey, L., Halperin, S. A. & Lee, S. F. Thiol-disulfide exchange in gram-positive firmicutes. Trends Microbiol 24, 902–915 (2016).
Ventura, M. et al. Genomics of actinobacteria: tracing the evolutionary history of an ancient phylum. Microbiol. Mol. Biol. Rev.71, 495–548 (2007).
Larsen, N. et al. Gut microbiota in human adults with type 2 diabetes differs from non-diabetic adults. PLoS ONE 5, e9085 (2010).
Gupta, R. S. The phylogeny of proteobacteria: relationships to other eubacterial phyla and eukaryotes. FEMS Microbiol. Rev. 24, 367–402 (2000).
Bennett, K. W. & Eley, A. Fusobacteria: new taxonomy and related diseases. J. Med. Microbiol. 39, 246–254 (1993).
San Martin, F., Fule, L., Iraola, G., Buschiazzo, A. & Picardeau, M. Diving into the complexity of the spirochetal endoflagellum. Trends Microbiol. 31, 294–307 (2023).
Zhang, X. et al. Assessing the impact of protein extraction methods for human gut metaproteomics. J. Proteom. 180, 120–127 (2018).
Wu, J. et al. Development of an integrated pipeline for profiling microbial proteins from mouse fecal samples by LC-MS/MS. J. Proteome Res. 15, 3635–3642 (2016).
Staurouskaya, A. A., Potapova, T. V., Rosenblat, V. A. & Serpinskaya, A. S. The effect of non-ionic detergent tween 80 on colcemid-resistant transformed mouse cells in vitro. Int. J. Cancer 15, 665–672 (1975).
Sanyalukruechai, C., Watthanasakphuban, N., Khemthong, M., Surachetpong, W. & Rattanaporn, K. Expression and purification of recombinant tilapia lake virus segment 4 protein and its in-vitro biological activity for potential use in vaccine development. Sci. Rep. 14, 31529 (2024).
Wiśniewski, J. R., Zougman, A., Nagaraj, N. & Mann, M. Universal sample preparation method for proteome analysis. Nat. Methods 6, 359–362 (2009).
Zhang, X. et al. MetaPro-IQ: a universal metaproteomic approach to studying human and mouse gut microbiota. Microbiome 4, 31 (2016).
Cox, J. et al. Accurate proteome-wide label-free quantification by delayed normalization and maximal peptide ratio extraction, termed MaxLFQ. Mol. Cell. Proteom.13, 2513–2526 (2014).
Li, L. et al. iMetaLab Suite: a one-stop toolset for metaproteomics. iMeta 1, e25 (2022).
Li, L. et al. Revealing proteome-level functional redundancy in the human gut microbiome using ultra-deep metaproteomics. Nat. Commun. 14, 3428 (2023).
Fan, X., Peters, B. A., Min, D., Ahn, J. & Hayes, R. B. Comparison of the oral microbiome in mouthwash and whole saliva samples. PloS ONE 13, e0194729 (2018).
Yano, Y. et al. Evaluation of alcohol-free mouthwash for studies of the oral microbiome. PLoS ONE 18, e0284956 (2023).
Caselli, E. et al. Defining the oral microbiome by whole-genome sequencing and resistome analysis: the complexity of the healthy picture. BMC Microbiol. 20, 120 (2020).
Canchi, D. R. & García, A. E. Backbone and side-chain contributions in protein denaturation by urea. Biophys. J. 100, 1526–1533 (2011).
Bennion, B. J. & Daggett, V. The molecular basis for the chemical denaturation of proteins by urea. Proc. Natl Acad. Sci. USA 100, 5142–5147 (2003).
Monera, O. D., Kay, C. M. & Hodges, R. S. Protein denaturation with guanidine hydrochloride or urea provides a different estimate of stability depending on the contributions of electrostatic interactions. Protein Sci. Publ. Protein Soc. 3, 1984–1991 (1994).
Gabryel-Porowska, H. et al. Mucin levels in saliva of adolescents with dental caries. Med. Sci. Monit. 20, 72–77 (2014).
Yu, H. et al. HSP70 chaperones RNA-free TDP-43 into anisotropic intranuclear liquid spherical shells. Science 371, eabb4309 (2021).
Chen, J. et al. In-depth metaproteomics analysis of tongue coating for gastric cancer: a multicenter diagnostic research study. Microbiome 12, 6 (2024).
Abdelbary, M. M. H. et al. The oral-gut axis: Salivary and fecal microbiome dysbiosis in patients with inflammatory bowel disease. Front. Cell. Infect. Microbiol. 12, 1010853 (2022).
Abusleme, L., Hoare, A., Hong, B.-Y. & Diaz, P. I. Microbial signatures of health, gingivitis, and periodontitis. Periodontol. 86, 57–78 (2021).
E, R., Ma, C. & Jf, N. The role of oral bacteria in inflammatory bowel disease. Nat. Rev. Gastroenterol. Hepatol. 18, 731–742 (2021).
Elmaghrawy, K. et al. The oral microbiome in treatment-naïve paediatric IBD patients exhibits dysbiosis related to disease severity that resolves following therapy. J. Crohns Colitis 17, 553–564 (2023).
Kato, I. et al. Nutritional correlates of human oral microbiome. J. Am. Coll. Nutr. 36, 88–98 (2017).
Cheng, K. et al. MetaLab: an automated pipeline for metaproteomic data analysis. Microbiome 5, 157 (2017).
F. Escapa, I. et al. Construction of habitat-specific training sets to achieve species-level assignment in 16S rRNA gene datasets. Microbiome 8, 65 (2020).
UniProt Consortium, T. he UniProt: the universal protein knowledgebase in 2021. Nucleic Acids Res. 49, D480–D489 (2021).
Mesuere, B. et al. The Unipept metaproteomics analysis pipeline. Proteomics 15, 1437–1442 (2015).
Mesuere, B. et al. Unipept: tryptic peptide-based biodiversity analysis of metaproteome samples. J. Proteome Res. 11, 5773–5780 (2012).
Huerta-Cepas, J. et al. eggNOG 4.5: a hierarchical orthology framework with improved functional annotations for eukaryotic, prokaryotic and viral sequences. Nucleic Acids Res. 44, D286–D293 (2016).
Yu, C.-S., Chen, Y.-C., Lu, C.-H. & Hwang, J.-K. Prediction of protein subcellular localization. Proteins 64, 643–651 (2006).
Pang, Z. et al. MetaboAnalyst 6.0: towards a unified platform for metabolomics data processing, analysis and interpretation. Nucleic Acids Res. 52, W398–W406 (2024).
Ma, J. et al. iProX: an integrated proteome resource. Nucleic Acids Res. 47, D1211–D1217 (2019).
Chen, T. et al. iProX in 2021: connecting proteomics data sharing with big data. Nucleic Acids Res. 50, D1522–D1527 (2022).
Acknowledgements
This work was funded by the National Key R&D Program of China (2022YFA1206103 and 2022YFE0118300), the National Natural Science Foundation of China (82070566), and the Beijing Natural Science Foundation (J230029). L.L. received start-up support from the National Center for Protein Science (Beijing) and the State Key Laboratory of Medical Proteomics. The funders provided financial support only and were not involved in the study design, data analysis, or manuscript preparation. We are grateful to the National Center for Protein Science (Beijing) and the State Key Laboratory of Medical Proteomics for providing the resources and facilities necessary for this research. We would like to thank Yuping Xie and Fang Liu of Proteomics Technology Platform for their help in mass spectrometry analysis.
Author information
Authors and Affiliations
Contributions
L.Z., L.L., F.C., and N.C. supervised the study and provided training for graduate students involved in this study. Conceptualization and experimental design were performed by J.Y., L.L., F.C., and N.C. Participant recruitment, sample collection, and obtaining written consent were carried out by N.C., C.Y., J.Y. and L.L. The experiments were performed by J.Y., B.S., M.L., and C.Y. Data analyses were performed by J.Y., B.S., and L.L. The original draft was written by J.Y. and L.L. All co-authors reviewed and edited the manuscript, and approved the final version.
Corresponding authors
Ethics declarations
Competing interests
The authors declare no competing interests.
Ethical approval
All participants signed informed consent forms, and the study was approved by the Institutional Medical Ethics Review Board of Peking University People’s Hospital (Approval No. 2022PHB373-001).
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Yuan, J., Sun, B., Li, M. et al. OSaMPle workflow for salivary metaproteomics analysis reveals dysbiosis in inflammatory bowel disease patients. npj Biofilms Microbiomes 11, 63 (2025). https://doi.org/10.1038/s41522-025-00692-z
Received:
Accepted:
Published:
Version of record:
DOI: https://doi.org/10.1038/s41522-025-00692-z







