Harnessing vaginal inflammation and microbiome: a machine learning model for predicting IVF success

Bar, Ofri; Vagios, Stylianos; Barkai, Omer; Elshirbini, Joseph; Souter, Irene; Xu, Jiawu; James, Kaitlyn; Bormann, Charles; Mitsunami, Makiko; Chavarro, Jorge E.; Foessleitner, Philipp; Kwon, Douglas S.; Yassour, Moran; Mitchell, Caroline

doi:10.1038/s41522-025-00732-8

Article
Open access
Published: 05 June 2025

Ofri Bar^1,2,3^na1,
Stylianos Vagios⁴^na1,
Omer Barkai^3,5,6^na1,
Joseph Elshirbini⁷,
Irene Souter⁸,
Jiawu Xu⁷,
Kaitlyn James¹,
Charles Bormann⁸,
Makiko Mitsunami⁸,
Jorge E. Chavarro^8,9,10,
Philipp Foessleitner^1,3,11,
Douglas S. Kwon^3,7,10,
Moran Yassour^2,12^na1 &
Caroline Mitchell^1,3^na1

npj Biofilms and Microbiomes volume 11, Article number: 95 (2025)


Abstract

Humans are the only species with a commensal Lactobacillus-dominant vaginal microbiota. Reproductive tract microbes have been linked to fertility outcomes, as has intrauterine inflammation, suggesting that the immune response may mediate adverse outcomes. In this pilot study, we compared vaginal microbiota composition and immune marker concentrations between patients with unexplained infertility and patients with male factor infertility (MFI), who served as controls. We applied a supervised machine learning algorithm that integrated microbiome and inflammation data to predict pregnancy outcomes.

Twenty-eight participants provided vaginal swabs at three IVF cycle time points; 18 achieved pregnancy. Pregnant participants had lower microbial diversity and inflammation. Among them, MFI cases had higher diversity but lower inflammation than those with unexplained infertility. Our model showed the highest prediction accuracy at time point 2 of the IVF cycle. These findings suggest that vaginal microbiota and inflammation jointly impact fertility and can inform predictive tools in reproductive medicine.

Introduction

Infertility affects 7–15% of reproductive age women in the United States¹. In vitro fertilization (IVF) is one of the most common types of infertility treatment and involves oocyte retrieval, fertilization with sperm in the laboratory, and embryo transfer into the uterus. This process results in live birth in up to 40–50% of cases, depending on various factors such as age and infertility diagnosis^2,3. People with unexplained infertility have the lowest success rates (closer to 30%)⁴, and although success rates have increased over the past decades, there are still opportunities to optimize the efficacy of IVF.

Cervicovaginal microbiota are recognized to play an important role in women’s health⁵. In contrast to the gut, where a diverse community is considered healthy, in the vagina a low-diversity community dominated by Lactobacillus is a marker of health⁵. Dominance of the vaginal or endometrial microbiota by L. crispatus is associated with higher pregnancy rates after IVF, compared to dominance by anaerobic bacteria^6,7,8,9,10. One hypothesized reason for this is the association between Lactobacillus-dominance and lower concentrations of vaginal inflammatory chemokines and cytokines¹¹. Both immune and microbiome data form large (high-dimensional) data sets, making machine learning tools useful as they allow for the identification of complex microbial and inflammatory patterns and associations that may impact health outcomes. Recent reports highlight how these methods can be used to enhance our understanding of microbiome dynamics and their predictive value in clinical contexts^12,13.

In this pilot study, we prospectively collected vaginal samples at three time points during a treatment cycle from women with unexplained or male factor infertility (MFI) undergoing IVF. We aimed to assess associations between vaginal microbial composition, vaginal fluid inflammatory markers and chances of becoming pregnant. We hypothesized that people with unexplained infertility would be less likely to have Lactobacillus dominance throughout the cycle, and that lower concentrations of vaginal fluid pro-inflammatory markers would be associated with increased clinical pregnancy rates across diagnoses. Using a support vector machine (SVM) supervised machine learning model, we demonstrate that vaginal microbiome data, alone or combined with inflammatory markers, can effectively predict pregnancy potential with high performance^13,14. Furthermore, we leverage SHapley Additive exPlanations (SHAP) analysis to interpret feature importance and provide a detailed explanation of the key predictive factors within our model¹⁵ (Fig. 1).

**Fig. 1: Workflow for sample collections, data processing, analysis, and machine learning model construction.**

Results

Study population

We enrolled 30 people, and 29 completed an IVF cycle during the study period. One person was excluded from analysis because only 1 swab was collected, leaving 28 participants for the final analysis, 14 with unexplained infertility and 14 with MFI. All participants with unexplained infertility provided 3 vaginal swabs. Of the participants with MFI, eleven had 3 vaginal swabs and three had 2 vaginal swabs collected (2 participants were missing the second and one the first swab). Thus, the analysis included 81 samples.

18 of the 28 participants became pregnant. There were no significant differences in age, infertility diagnosis, race, BMI, ovarian reserve markers, cycle type, blastocysts produced, and number of transferred embryos between those who became pregnant and those who did not (Table 1).

Table 1 Patient demographics, cycle characteristics, and outcomes

Vaginal microbiome—Community state types and microbiome diversity

Presence of CST I was associated with clinical pregnancy (Fig. 2A, B). At the time of embryo transfer, 11 of 14 people with CST I became pregnant (79%), 2 of 2 with CST II became pregnant (100%), 4 of 6 with CST III became pregnant (66.7%), 1 of 4 with CST IV became pregnant (25%) and 0 of 2 with CST V became pregnant (0%) (p = 0.07). Most participants had the same CST assignment across all three time points (Fig. 2C). To assess the association between vaginal microbiome diversity and pregnancy outcome, we compared alpha diversity by pregnancy outcome, using the Shannon Diversity Index. Women who became pregnant were significantly more likely to have a less diverse vaginal microbiome compared to women who did not become pregnant (Fig. 2D, p = 0.041). When comparing microbial diversity between the two infertility diagnosis groups, participants with MFI had a significantly more diverse vaginal microbiome than those with unexplained infertility (Fig. 2E, p = 0.031).

**Fig. 2: Features of the vaginal microbiome by infertility diagnosis and pregnancy outcome.**

Within patients who became pregnant, patients with MFI had a more diverse vaginal microbiome than those with unexplained infertility (Supplementary Fig 1A, p = 0.086); however, the difference did not reach statistical significance, possibly due to the small sample size. Among the participants who did not become pregnant, there was no difference in alpha diversity between the two types of infertility (Supplementary Fig 1A, p = 0.397).

When we compared alpha diversity within each infertility group, we noted that unexplained infertility patients who became pregnant had lower vaginal microbiome diversity compared to those who did not become pregnant; however, this difference did not reach statistical significance (Supplementary Fig 1B, p = 0.210). Among patients with MFI, the difference in microbial diversity between those who became pregnant and those who did not was smaller (Supplementary Fig 1B, p = 0.324).

Genital inflammation and treatment outcome

We next examined whether genital inflammation was associated with treatment outcome. Among 20 analytes, 2 were undetectable across all samples. The remaining 18 were analyzed across 79 vaginal samples that had complete data for cytokines across all three time points. There were no notable differences in immune markers between infertility diagnoses or pregnancy outcomes (Fig. 3A).

**Fig. 3: Vaginal inflammatory markers, inflammation score and pregnancy outcome.**

We assigned an inflammation score for each of the 79 samples by tallying the number of values in the top quartile for 9 selected analytes in each sample (IL-1b, IL-1a, IP-10, IL-6, TNFa, IL-8, MIP-1a, MIP-1b, IL-17).

Participants who became pregnant had a significantly lower inflammation score than those who did not (Fig. 3B, p = 0.024). We then compared inflammation scores within each CST category to see if differences in the host inflammatory response to a similar microbial profile were associated with treatment outcome. Among participants with CST III vaginal microbiome (L. iners dominant), genital inflammation scores were higher in the participants who did not conceive compared to those who did (Fig. 3D). In participants with CST I microbiome (L. crispatus dominant), there was no such difference.

Within the MFI group, patients who conceived had a lower inflammation score compared to those who did not (Fig. 3C, p = 0.061); however, the difference did not reach statistical significance. Within the unexplained infertility group, this difference was not as pronounced (Fig. 3C, p = 0.296).

Machine learning approach for predicting pregnancy outcome using microbial community and genital inflammation

Our data show slightly different associations with pregnancy outcome between the two fertility diagnoses, which led to the question of whether host responses, microbial community, or a combination of the two, can be used for predicting pregnancy outcome in IVF. To this end, we developed and applied a supervised machine learning algorithm (Fig. 1). We trained a support vector machine (SVM) classification model with the subjects’ taxonomic or inflammatory data as features (‘X’), and their pregnancy outcomes as targets (‘y’). Prediction performance was assessed at each time point where swabs were collected during the IVF cycle, using microbiome data, cytokine data, or a combination of both. When using only bacterial features, the highest prediction performance with an F1-score of 0.9 was observed at time point 2 (Fig. 4A). With inflammatory features alone, the best prediction occurred at time point 3, during embryo transfer, with an F1-score of 0.86. When combining both bacterial and inflammatory features, the best prediction was at time point 2, with an F1-score of 0.87 (Fig. 4A). Model importance analysis suggests relative abundance of Gardnerella vaginalis to be of high impact in the models’ performances (Supplementary Fig 2). Gardnerella vaginalis, however, is often considered a marker of high microbial diversity. To confirm that the presence of Gardnerella was specifically associated with pregnancy outcome, and not an indicator for high microbial diversity, we asked whether adding a diversity index as a feature would benefit the model’s performance. We included the Shannon diversity index as a feature and retrained the classification model for the three feature sets over the three time points. Across all nine models, the F1-scores were lower when the Shannon index was included, while Gardnerella vaginalis remained the most important bacterial feature affecting pregnancy outcome prediction.
The model also resulted in lower performances when Gardnerella vaginalis was dropped from the Shannon index-included feature set, suggesting that Gardnerella vaginalis cannot be solely considered as a high microbial diversity measure. We further tested if infertility diagnosis can improve the model’s ability to predict pregnancy outcomes. The addition of the infertility diagnosis as a feature to our training dataset also did not improve the model’s performance. To assess whether our model’s performance was significantly better than random chance, we performed a permutation test where the pregnancy outcome labels were randomly shuffled 50 times for each model. F1 scores were computed for each of the 50 permutations. We found that the models trained on the original labels consistently outperformed those trained on shuffled labels. A one-sample t-test confirmed that this difference was statistically significant (Supplementary Fig 3).
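The label-shuffling check described above can be sketched in a few lines. This is a minimal illustration, not the authors' script: the one-feature toy data are hypothetical, and a nearest-centroid classifier stands in for the SVM so that the example needs only the Python standard library. It computes the leave-one-out F1 on the true labels, repeats the computation on 50 shuffled label vectors, and derives a one-sample t statistic comparing the shuffled scores with the observed score.

```python
import random
import statistics


def f1_score(y_true, y_pred):
    """F1 for the positive class (label 1); 0.0 when undefined."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def loo_f1(x, y):
    """Leave-one-out F1 of a 1-D nearest-centroid classifier (SVM stand-in)."""
    preds = []
    for i in range(len(x)):
        train = [(xi, yi) for j, (xi, yi) in enumerate(zip(x, y)) if j != i]
        pos = [xi for xi, yi in train if yi == 1]
        neg = [xi for xi, yi in train if yi == 0]
        if not pos or not neg:  # degenerate fold: predict the only class present
            preds.append(1 if pos else 0)
            continue
        c_pos, c_neg = statistics.mean(pos), statistics.mean(neg)
        preds.append(1 if abs(x[i] - c_pos) < abs(x[i] - c_neg) else 0)
    return f1_score(y, preds)


def permutation_test(x, y, n_perm=50, seed=0):
    """Return the observed score, the shuffled-label scores, and a t statistic."""
    rng = random.Random(seed)
    observed = loo_f1(x, y)
    shuffled_scores = []
    for _ in range(n_perm):
        y_perm = y[:]
        rng.shuffle(y_perm)
        shuffled_scores.append(loo_f1(x, y_perm))
    mean = statistics.mean(shuffled_scores)
    sd = statistics.stdev(shuffled_scores)
    # One-sample t statistic: do shuffled scores differ from the observed score?
    t_stat = (mean - observed) / (sd / n_perm ** 0.5) if sd > 0 else float("-inf")
    return observed, shuffled_scores, t_stat


# Hypothetical toy feature (for example, one scaled abundance) and outcomes
x = [0.1, 0.2, 0.3, 0.4, 0.5, 1.5, 1.6, 1.7, 1.8, 1.9]
y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
observed, shuffled_scores, t_stat = permutation_test(x, y)
```

A strongly negative `t_stat` indicates that models trained on shuffled labels score well below the model trained on the true labels.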

**Fig. 4: Computational model predicts pregnancy outcome using vaginal microbiome and inflammation data.**

Prediction explanation

We next sought to understand which of the features had highest importance in our prediction performance. To provide an explanation of how each feature contributes to the predicted outcome of our machine learning models, we used SHAP summary showing the contribution of the top-ten features for each subject in the three models (Fig. 4B, left) and their absolute importance (Fig. 4B, right).

In both models that included bacterial features, the presence of Gardnerella vaginalis was the most impactful bacterial variable in the model, with high relative abundance contributing to no pregnancy. Notably, L. crispatus appeared in the top ten ranking of the two bacteria feature-based models and is shown to be positively associated with pregnancy outcome, agreeing with our findings (Fig. 2A, B). Enterobacter also appears on both bacteria feature-based models, shown to have a negative impact on the pregnancy outcome predictions.

Several cytokines also had a substantial impact on pregnancy prediction: IL-1a and ITAC recurred among the top three features in the cytokine feature-based models, and in both cases high abundance pushed the models’ predictions toward a negative pregnancy outcome.

Discussion

In this pilot study including patients with unexplained and male factor infertility as a control assuming normal fertility for the female partner, we evaluated the relationship between the vaginal microbiota, genital inflammation, and pregnancy outcome. We show that the microbiota and inflammatory data can be utilized for the prediction of IVF pregnancy outcome with high accuracy. As previous studies have shown, we observed an inverse relationship between microbial diversity and the chance of clinical pregnancy. Interestingly, though, we found different patterns of association between these measures among couples with unexplained vs. male factor infertility (in which the female partner is presumed to have normal fertility). In our small cohort, people with unexplained infertility who became pregnant had lower vaginal microbial diversity than those with male factor infertility. These results could suggest that the vaginal microbiome has a greater impact on pregnancy outcome in people with unexplained infertility. Of note, the participants with male factor infertility were less likely to have a pattern of Lactobacillus crispatus dominance and had a significantly higher alpha diversity compared to participants with unexplained infertility.

“Optimal” vaginal microbial populations dominated by Lactobacillus species have been associated with higher rates of pregnancy with IVF; however, few studies have evaluated whether this association differs by indication for IVF⁵. In a study where the majority of people included had MFI (67%), those with Lactobacillus dominance were more likely to become pregnant⁷. In a study of 30 women who likely had ovulatory dysfunction or unexplained infertility, investigators showed that greater diversity of the vaginal microbial community on the day of embryo transfer was associated with a lower live birth rate¹⁶. A Danish cohort with similar proportions of male factor (36%) and unexplained (31%) infertility showed a 35% live birth rate in IVF patients with CST I, and a 41% live birth rate for CST III compared to 8% for CST IV¹⁷. In that cohort, the prevalence of CST I was similar between those with unexplained (43%) vs. male factor infertility (58%), though swabs could have been taken up to 2 months prior to embryo transfer. They also found that the total load of L. iners in CST III communities by qPCR was much higher than the load of L. crispatus in CST I communities and suggested that total abundance may be as important as relative abundance when assessing the role of the microbiome in IVF. None of these studies compared results between infertility diagnoses. We also observed an inverse relationship between microbial diversity and the chance of clinical pregnancy. Interestingly, though, among participants with clinical pregnancies, diversity was higher in the microbiome of patients with MFI compared to those with unexplained infertility, although this difference did not reach statistical significance (p = 0.086). This highlights the importance of including the infertility diagnosis in such studies.

In studies of the seminal microbiome, community composition differed between those with normal vs. abnormal sperm parameters^18,19. More participants in our study with MFI had CST III or CST IV communities. We did not evaluate seminal microbiome nor recency of sexual intercourse, but other work has shown a correlation between penile and vaginal microbiota in sexual partners²⁰. In a recent study of semen microbiome, men with abnormal sperm motility had a higher abundance of Lactobacillus iners, compared to those with normal sperm motility. This study also observed that men with abnormal sperm concentration showed a higher abundance of Pseudomonas stutzeri and Pseudomonas fluorescens²¹. It is possible that the vaginal microbiome in couples with MFI in our study is reflective of male partner semen dysbiosis.

Cytokines, growth factors, and adhesion molecules all participate in making implantation and pregnancy possible. Using a composite inflammation score to identify people with the highest concentrations of multiple markers we showed that the presence of vaginal fluid inflammation was associated with lower chance of pregnancy. Most studies of immune factors associated with IVF success measure serum concentrations of chemokines and cytokines and have shown that markers of systemic inflammation are associated with unexplained infertility and recurrent implantation failure^22,23. Fewer studies have evaluated endometrial or vaginal markers of inflammation. In one study of endometrial biopsies collected in the cycle before an IVF cycle, the presence of ≥ 5 CD138+ plasma cells in a high-powered microscopy field, indicative of chronic endometritis, was associated with a significantly lower pregnancy rate²⁴. In a separate study, endometrial fluid was aspirated immediately prior to embryo transfer in a cohort of patients where ~50% were being treated for male factor infertility, ~20% for unexplained infertility and ~20% for tubal factor. Elevated levels of TNF-ɑ and MIF (macrophage migration inhibitory factor) were associated with increased clinical pregnancy, while IL1b and MCP-1 (monocyte chemoattractant protein) were associated with lower pregnancy rates¹¹. These data indicate that complex immune interactions inform IVF outcomes: both TNF-a and IL1b are considered to have pro-inflammatory effects but have opposite associations.

Our machine learning model identified the vaginal microbiota, a combination of the vaginal microbiota and vaginal fluid immune markers at the second time point (which was egg retrieval for most participants), or vaginal fluid immune markers at the third time point which was embryo transfer as the best predictor of pregnancy outcomes. A study using endometrial biopsies found that when the endometrial microbiome had <90% lactobacilli, tissue had higher concentrations of the pro-inflammatory analytes IL6, IL1b and less of the anti-inflammatory IL10²⁵. Previous data have shown that a non-Lactobacillus dominant endometrial microbiome in the cycle prior to an IVF cycle is associated with lower rates of pregnancy^9,10. Interestingly, Corynebacterium emerged as an important factor in multiple comparisons within our model, despite not being among the most abundant taxa in vaginal samples. Given that the vaginal microbiome is typically of low diversity, it would be expected that the most predictive taxa would be among the more dominant species. However, machine learning approaches allow for the identification of taxa that may not be the most abundant but still hold biological significance. While Corynebacterium is generally considered a minority member of the vaginal microbiome, it has been more frequently reported in postmenopausal individuals and cases of microbial dysbiosis and has also been previously linked to preterm birth²⁶. Its predictive role in our model suggests that even subtle variations in its abundance may have functional relevance. While the clinical significance of Corynebacterium in this context remains unclear, this finding highlights the potential for non-dominant taxa to influence reproductive outcomes and underscores the need for further investigation in larger cohorts.

At the time of embryo transfer, our model showed that immune factors alone had the highest predictive value. This suggests that the impact of the microbiome on pregnancy outcomes is likely mediated through the host immune response—the microbiome at the second time point likely drives the immune response at the third time point, which may be reflective of endometrial receptivity.

The prospective nature of our study and the fact that our participants were consistently evaluated within the same center are major strengths of the study. All participants had the same evaluation to identify the cause of their infertility. Limitations of this study include the small size of our population, which limits our power to detect associations, precludes controlling for potential confounders in our analysis, and limits the generalizability of our findings and the ability to draw clinical conclusions that would affect practice. We included both fresh and frozen cycles in our analysis, which increased generalizability but also heterogeneity, and did not have sufficient numbers to control for different hormonal regimens. We assessed whether cycle type influenced the prediction model at mid-cycle and observed no differences in pregnancy prediction accuracy across all models using our features. At this time there are no interventions proven to reliably shift the vaginal microbiome toward a L. crispatus-dominant community type, however if and when these become available, our results suggest studying these in the context of IVF outcomes could be promising. The majority of our population identifies as White, further limiting the generalizability of our findings. Additionally, our participants diagnosed with male factor infertility could also have unexplained female infertility, as this is a diagnosis of exclusion. Finally, we did not measure endometrial microbiota, which may be of greater relevance to implantation, though less accessible for routine testing.

The findings of this pilot analysis suggest that larger studies of microbiota and IVF outcomes should plan for sub-analyses by type of infertility, as there may be differences in the magnitude of impact of microbial communities between different diagnoses. Our results also affirm that a low-diversity, Lactobacillus dominant vaginal microbiota is associated with greater success for a given IVF cycle. To our knowledge, this is the first published machine learning model that enables the prediction of IVF pregnancy outcome based on microbiome and inflammatory data. While it remains to be tested, it may pave the way for improving intervention in IVF cycle medical decision-making. Furthermore, our machine learning model’s feature importance explanation can be used as an exploratory guide map for a better understanding of the underlying factors affecting the chances of becoming pregnant in IVF.

Methods

Study design

This pilot study was approved by the Mass General Brigham Human Subjects Committee, IRB number 2015P000085. We enrolled participants under 40 years of age with unexplained infertility or MFI who underwent IVF between 10/2019 and 04/2021 at the Massachusetts General Hospital Fertility Center. All study participants provided written consent prior to enrollment. Participants did not receive any incentive or compensation for their participation. Unexplained infertility was defined as a couple with normal semen analysis, normal evaluation of uterus, tubes, and ovarian function (AMH > 0.8), who have been trying to conceive for 6–12 months (depending on age) without success. Male factor infertility was defined as a couple with normal evaluation of uterus, tubes, ovarian function (AMH > 0.8) and one or more abnormal semen analysis parameters on two separate samples produced at least 2 weeks apart. Normal semen analysis values were based on the World Health Organization 5th ed. Guidelines: concentration >20 million/mL, motility >40%, forward progression >3 and total motile count >15 million motile sperm/sample²⁷.

Both fresh and frozen embryo transfer cycles were included. For cycles with a planned fresh embryo transfer, participants provided vaginal swabs on day 3 of the stimulation cycle, on the day of egg retrieval and on the day of embryo transfer. For cryothaw cycles, participants provided swabs on the day of baseline ultrasound (day 3–5), on the day of the second ultrasound, and on the day of embryo transfer. Swabs were collected before clinical procedures. Swabs were stored at −80 °C until processed.

Data on age, race, peak serum estrogen levels, anti-Mullerian hormone (AMH) levels, follicle-stimulating hormone (FSH) levels, embryo quality, number of embryos transferred, stimulation protocol, pregnancy history and medical history were obtained through chart review. Information about treatment outcomes (i.e., no pregnancy, biochemical pregnancy, confirmed intrauterine gestation on ultrasound) was obtained from the medical record for the ART cycle in which the vaginal samples were collected.

Laboratory analyses

Swabs were eluted in 400 μL sterile saline, centrifuged, and the pellet submitted for DNA extraction, while the supernatant was aliquoted and frozen for Luminex analysis. Genomic DNA (gDNA) was extracted using a plate-based protocol that included a bead beating process and combined phenol-chloroform isolation with QIAamp 96 DNA QIAcube HT Kit (Qiagen) procedures. The amplicon library of bacterial 16S rRNA gene V4 region was prepared and sequenced on Illumina MiSeq with a 300-cycle sequencing kit²⁸. Taxonomic assignment was performed using GTDB and the microbial compositional analysis package dada2^29,30. In stacked bar plots, only the 20 most prevalent taxa are represented. The microbial compositional analysis was performed using R (Version 4.2.2). Taxonomy data was aggregated at the genus level, and low-abundance taxa (<0.5% prevalence) were filtered out. Samples were assigned to a microbial Community State Type (CST) using the VAginaL community state typE Nearest CentroId clAssifier (VALENCIA)³¹: CST I = Lactobacillus crispatus dominant, CST II = Lactobacillus gasseri dominant, CST III = Lactobacillus iners dominant, CST IV = non-Lactobacillus dominant, CST V = Lactobacillus jensenii dominant.

The concentrations of 20 cytokines/chemokines (MIG, IP10, IFN-γ, ITAC, IL1α, IL1β, TNFα, IL6, IL8, MIP-3α, IL12, MIP1α, MIP3β, IL13, IL 12, IL21, IL4, IL23, IL5, IL10) were measured in the vaginal supernatant using multiplexed ELISA assays (Luminex), as previously described^29,32. Values below the lower limit of detection in the assay were recorded as half of the lowest standard concentration for that analyte. Similarly, concentrations above the detectable limit were recorded as 1.1 times the highest standard concentration. In an attempt to identify participants with the most overall mucosal inflammation, we assigned each sample an “inflammation score”, which corresponded to the number of any of 9 inflammatory markers with a value in the highest quartile for this cohort. These markers were selected based on a panel associated with increased risk for STI acquisition³²: (interleukin (IL)-1α, IL-1β, IL-6, tumor necrosis factor (TNF)-α, IL-8, C-X-C motif chemokine 10 (CXCL10; also known as IP-10), IL-17, macrophage inflammatory protein (MIP)-1α, and MIP-1β). For each marker, we assigned a score of 1 if the concentration value was >= 75th percentile (of the cohort) and 0 if below that value. We tallied the score for all the markers across 79 samples to provide an inflammation score for each sample (the score ranging between 0-9 for each sample).
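The scoring rule above (one point per marker at or above the cohort's 75th percentile) can be sketched as follows. This is an illustration under stated assumptions: the function name, the two-marker toy cohort, and the quartile interpolation method are hypothetical choices, since the text does not specify how the percentile was computed.

```python
import statistics


def inflammation_scores(samples, markers):
    """Count, per sample, how many markers are at or above the cohort's 75th percentile."""
    cutoffs = {}
    for m in markers:
        values = [s[m] for s in samples]
        # Third quartile cut point; the interpolation method is an assumption
        cutoffs[m] = statistics.quantiles(values, n=4, method="inclusive")[2]
    return [sum(1 for m in markers if s[m] >= cutoffs[m]) for s in samples]


# Hypothetical toy cohort with two of the nine markers (arbitrary units)
toy_samples = [
    {"IL-6": 1, "IL-8": 10},
    {"IL-6": 2, "IL-8": 50},
    {"IL-6": 3, "IL-8": 20},
    {"IL-6": 4, "IL-8": 40},
    {"IL-6": 5, "IL-8": 30},
]
scores = inflammation_scores(toy_samples, ["IL-6", "IL-8"])
print(scores)  # [0, 1, 0, 2, 1]
```

With all nine markers supplied, each sample's score falls between 0 and 9, as in the text.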

Statistical analysis

This was a pilot, convenience sample. Metadata were compared between those who did vs. did not have an intrauterine pregnancy (IUP) and between patients with unexplained vs. MFI using chi square, t-test or Mann–Whitney U-test as appropriate. When evaluating the association between microbiome or inflammation and outcomes, we included data from all timepoints for all participants in a mixed effects linear regression model to control for the multiple samples per participant and for the different time points. Statistical analyses were performed using either Stata or R, with a p-value of less than 0.05 considered statistically significant.

Alpha diversity was assessed using the Shannon Diversity Index, calculated using the diversity function of the vegan package. Alpha diversity was compared between groups using mixed effects linear regression analysis to account for multiple samples from each participant.
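For readers who want to see the quantity being compared, the Shannon index is H = −Σ pᵢ ln pᵢ over the taxa present in a sample. The study computed it in R; the Python function below is only an equivalent sketch with hypothetical read counts, using the natural logarithm.

```python
import math


def shannon_index(counts):
    """Shannon diversity H = -sum(p_i * ln p_i) over taxa with non-zero counts."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must sum to a positive value")
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


# Hypothetical read counts per taxon
dominated = shannon_index([100, 0, 0, 0])  # one taxon only: no diversity
even = shannon_index([25, 25, 25, 25])     # four equally abundant taxa: ln(4)
```

A sample dominated by a single taxon has an index near zero, while an even community of four taxa reaches ln 4, which is why lower values correspond to Lactobacillus-dominated profiles.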

Machine learning model

Data acquisition, preprocessing, and target definition

The dataset used for this study included subjects with both cytokine and bacterial abundance data at each of the three time points: 1A (27 subjects), 2A (25 subjects), 3A (27 subjects), with a binary pregnancy outcome label (‘outcome’). As a preprocessing step, to ensure a high-quality feature matrix (X) for the subsequent modeling steps, feature columns with values below a 0.01 threshold were excluded, ensuring only significant features were included. Columns with missing or low-variance data, defined as less than 50% of non-zero values, were also removed. The target variable (y) was the binary pregnancy outcome per subject.
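The non-zero filter described above can be sketched as below. The function name, feature names, and toy values are hypothetical, and the sketch covers only the missing-value and 50% non-zero rules; the 0.01 threshold step is left out because the text does not say exactly how it was applied.

```python
def keep_informative_features(rows, feature_names, min_nonzero_fraction=0.5):
    """Keep features with no missing values and enough non-zero entries."""
    kept = []
    for name in feature_names:
        values = [row.get(name) for row in rows]
        if any(v is None for v in values):
            continue  # drop columns with missing data
        nonzero = sum(1 for v in values if v != 0)
        if nonzero / len(values) >= min_nonzero_fraction:
            kept.append(name)
    return kept


# Hypothetical toy table: one dict per subject
rows = [
    {"G_vaginalis": 0.0, "L_crispatus": 0.9, "IL-6": 1.2, "Enterobacter": 0.0},
    {"G_vaginalis": 0.0, "L_crispatus": 0.8, "IL-6": None, "Enterobacter": 0.1},
    {"G_vaginalis": 0.0, "L_crispatus": 0.0, "IL-6": 0.4, "Enterobacter": 0.2},
    {"G_vaginalis": 0.3, "L_crispatus": 0.6, "IL-6": 0.9, "Enterobacter": 0.0},
]
features = ["G_vaginalis", "L_crispatus", "IL-6", "Enterobacter"]
kept = keep_informative_features(rows, features)
print(kept)  # ['L_crispatus', 'Enterobacter']
```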

Data balancing using SMOTE

To address the class imbalance in the dataset (between pregnant and not pregnant), we applied the Synthetic Minority Over-sampling Technique (SMOTE) with a fixed, arbitrary random state³³. This technique generated synthetic samples for the minority class, ensuring balanced representation of both classes for training the classifier. With SMOTE over-sampling, 34 data points were used for time points 1A and 3A, and 32 were used for time point 2A.
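The core idea of SMOTE, interpolating between a minority sample and one of its nearest minority neighbours, can be sketched without any dependency. This is not the Imblearn implementation used in the study; the function name, the choice of `k`, and the toy points are hypothetical.

```python
import random


def smote_oversample(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points by interpolating between minority neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of `base` by squared Euclidean distance
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(p, base)),
        )[:k]
        mate = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to mate
        synthetic.append([a + gap * (b - a) for a, b in zip(base, mate)])
    return synthetic


# Hypothetical example: 17 majority vs 10 minority samples, so 7 synthetic points
minority = [[float(i), 0.5 * i] for i in range(10)]
synthetic = smote_oversample(minority, n_new=17 - len(minority))
balanced_total = 17 + len(minority) + len(synthetic)
```

In this hypothetical 17 versus 10 split, adding 7 synthetic minority points gives 34 training points in total, which is the kind of arithmetic behind the balanced set sizes reported above.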

Machine-learning model

A support vector machine (SVM) with C-Support Vector Classification with a linear kernel was selected due to its simplicity and superior performance. The classifier was trained on the scaled training data, and predictions were generated on the left-out test sample^34,35,36. The model was used with the default parameters provided by the Scikit-learn package. The prediction probability threshold which was considered positive was above 0.5, otherwise it was considered negative.

Model evaluation metrics

Due to the imbalanced nature of the data, to evaluate the model’s performance, we used F1-score, which is the harmonic mean of precision and recall, providing a balanced measure of a model’s prediction performance that accounts for both false positives and false negatives:

$$F1=2\cdot \frac{{precision}\cdot {recall}}{{precision}+{recall}}$$

Where

$${precision}=\frac{{true\; positive}}{{true\; positive}+{false\; positive}}$$

$${recall}=\frac{{true\; positive}}{{true\; positive}+{false\; negative}}$$

Accuracy, the proportion of correct predictions made by the model, was measured as follows:

$${Accuracy}=\frac{{true\; positive}+{true\; negative}}{{true\; positive}+{false\; positive}+{true\; negative}+{false\; negative}}$$

While training sets included both original and SMOTE-synthesized data, the evaluations of the models were computed only for the original dataset. Confusion matrices were generated to assess the classification performance.
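As a worked illustration of the formulas above, the following sketch computes all four metrics from confusion-matrix counts. The counts are hypothetical and chosen only to sum to 27 subjects; they are not results from this study, and the function name `classification_metrics` is an assumption.

```python
def classification_metrics(tp, fp, tn, fn):
    """Precision, recall, F1 and accuracy from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


# Hypothetical counts for 27 original subjects (not the study's results).
m = classification_metrics(tp=15, fp=3, tn=7, fn=2)
print({name: round(value, 3) for name, value in m.items()})
```

With these counts, precision is 15/18, recall is 15/17, F1 is 30/35, and accuracy is 22/27.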

Leave-one-out cross-validation

We employed leave-one-out cross-validation (LOO-CV) to rigorously evaluate the model’s performance. For each iteration, one sample was held out as the test set while the rest of the data was used as the training set. The number of iterations was equal to the number of samples, as this was repeated for every sample (Fig. 1). Evaluation was performed using the F1 and Accuracy metrics. The feature matrix was standardized using the ‘StandardScaler’ method, which standardizes each feature by subtracting the mean and scaling to unit variance, ensuring that all features contributed equally to the model’s predictions.
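The LOO-CV procedure can be sketched with Scikit-learn as follows. This is an illustrative outline rather than the authors' script: the data are synthetic, SMOTE is omitted for brevity, the scaler is assumed to be fitted on each training fold only, and `predict` is used, which thresholds the SVM decision function rather than a predicted probability at 0.5.

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import LeaveOneOut
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in data: 20 samples, 4 features, two well-separated classes.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (10, 4)), rng.normal(5.0, 1.0, (10, 4))])
y = np.array([0] * 10 + [1] * 10)

predictions = np.empty_like(y)
for train_idx, test_idx in LeaveOneOut().split(X):
    # Assumption: the scaler is fitted on the training fold only.
    scaler = StandardScaler().fit(X[train_idx])
    model = SVC(kernel="linear")  # remaining parameters left at defaults
    model.fit(scaler.transform(X[train_idx]), y[train_idx])
    predictions[test_idx] = model.predict(scaler.transform(X[test_idx]))

# One prediction per held-out sample, scored together at the end.
print("accuracy:", accuracy_score(y, predictions))
print("F1:", f1_score(y, predictions))
```

Fitting the scaler inside the loop keeps the held-out sample from influencing the mean and variance used to transform it.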

Feature importance

For feature importance, we relied on SHapley Additive exPlanations (SHAP) computed from the linear SVC model^37,38. Feature importance values were computed after each LOO-CV fold. These importance scores were averaged across folds to represent the overall contribution of each feature.
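For a linear model f(x) = w·x + b with features treated as independent, the SHAP value of feature i for a sample x reduces to w_i (x_i − E[x_i]). The sketch below applies that closed form and averages absolute values across folds. It is a simplified illustration under the independence assumption, not a call to the SHAP library and not the authors' code; the function names and the two toy folds are hypothetical, and averaging absolute values is one possible aggregation.

```python
def linear_shap(weights, background_mean, x):
    """SHAP values of one sample for a linear model f(x) = w.x + b,
    assuming independent features: phi_i = w_i * (x_i - mean_i)."""
    return [w * (xi - mu) for w, xi, mu in zip(weights, x, background_mean)]


def mean_abs_importance(per_fold_shap):
    """Average |SHAP| per feature across folds."""
    n_folds = len(per_fold_shap)
    n_features = len(per_fold_shap[0])
    return [
        sum(abs(fold[j]) for fold in per_fold_shap) / n_folds
        for j in range(n_features)
    ]


# Hypothetical output of two folds:
# (model weights, training-fold feature means, held-out sample).
folds = [
    ([2.0, -1.0], [0.0, 0.0], [1.0, 1.0]),
    ([1.0, -3.0], [0.5, 0.0], [1.5, -1.0]),
]
per_fold = [linear_shap(w, mu, x) for w, mu, x in folds]
print(per_fold)
print(mean_abs_importance(per_fold))
```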

Software and tools

Data analysis and statistics were performed using R. Machine learning analysis and modeling were conducted using Python. We used the following open-source packages: Pandas for data manipulation, Imblearn (SMOTE) for handling class imbalances, Scikit-learn for machine learning and Matplotlib for plotting.

Data availability

The sequences generated as part of this analysis were uploaded to the NCBI Short Read Archive (BioProject PRJNA1037556).

Code availability

Scripts to reproduce the machine learning algorithm are available in a GitHub repository: https://github.com/OmerBarkai/IVFOutcomeAI.

References

Thoma, M. E. et al. Prevalence of infertility in the United States as estimated by the current duration approach and a traditional constructed approach. Fertil. Steril. 99, 1324–1331.e1 (2013).
Gleicher, N., Kushnir, V. A. & Barad, D. H. Worldwide decline of IVF birth rates and its probable causes. Hum. Reprod. Open 2019, hoz017 (2019).
CDC. ART Success Rates | CDC. https://www.cdc.gov/art/success-rates/?CDC_AAref_Val=https://www.cdc.gov/art/artdata/index.html (2022).
Reindollar, R. H. et al. A randomized clinical trial to evaluate optimal treatment for unexplained infertility: the fast track and standard treatment (FASTT) trial. Fertil. Steril. 94, 888–899 (2010).
Anahtar, M. N., Gootenberg, D. B., Mitchell, C. M. & Kwon, D. S. Cervicovaginal microbiota and reproductive health: the virtue of simplicity. Cell Host Microbe 23, 159–168 (2018).
Schoenmakersa, S., Laven, N. J. & Schoenmakersa, S. The vaginal microbiome as a tool to predict IVF success. Curr. Opin. Obstet. Gynecol. 32, 169–178 (2020).
Koedooder, R. et al. The vaginal microbiome as a predictor for outcome of in vitro fertilization with or without intracytoplasmic sperm injection: a prospective study. Hum. Reprod. 34, 1042–1054 (2019).
Kong, Y. et al. The disordered vaginal microbiota is a potential indicator for a higher failure of in vitro fertilization. Front. Med. 7, 217 (2020).
Moreno, I. et al. Evidence that the endometrial microbiota has an effect on implantation success or failure. Am. J. Obstet. Gynecol. 215, 684–703 (2016).
Moreno, I. et al. Endometrial microbiota composition is associated with reproductive outcome in infertile patients. Microbiome 10, 1 (2022).
Boomsma, C. M. et al. Endometrial secretion analysis identifies a cytokine profile predictive of pregnancy in IVF. Hum. Reprod. 24, 1427–1435 (2009).
Hernández Medina, R. et al. Machine learning and deep learning applications in microbiome research. ISME Commun. 2, 1–7 (2022).
Pasolli, E., Truong, D. T., Malik, F., Waldron, L. & Segata, N. Machine learning meta-analysis of large metagenomic datasets: tools and biological insights. PLoS Comput. Biol. 12, e1004977 (2016).
Statnikov, A. et al. A comprehensive evaluation of multicategory classification methods for microbiomic data. Microbiome 1, 11 (2013).
Lundberg, S. M. & Lee, S. I. A unified approach to interpreting model predictions. Adv. Neural. Inf. Process. Syst. 2017, 4766–4775 (2017).
Hyman, R. W. et al. The dynamics of the vaginal microbiome during infertility therapy with in vitro fertilization-embryo transfer. J. Assist Reprod. Genet 29, 105–115 (2012).
Haahr, T. et al. Vaginal microbiota and in vitro fertilization outcomes: development of a simple diagnostic tool to predict patients at risk of a poor reproductive outcome. J. Infect. Dis. 219, 1809–1817 (2019).
Garcia-Segura, S. et al. Seminal microbiota of idiopathic infertile patients and its relationship with sperm DNA Integrity. Front. Cell Dev. Biol. 10, 937157 (2022).
Veneruso, I. et al. Metagenomics reveals specific microbial features in males with semen alterations. Genes 14, 1228 (2023).
Mehta, S. D. et al. The microbiome composition of a man’s penis predicts incident bacterial vaginosis in his female sex partner with high accuracy. Front. Cell Infect. Microbiol. 10, 433 (2020).
Osadchiy, V. et al. Semen microbiota are dramatically altered in men with abnormal sperm parameters. Sci. Rep. 14, 1068 (2024).
Topkara Sucu, S. et al. New immunological indexes for the effect of systemic inflammation on oocyte and embryo development in women with unexplained infertility: systemic immune response index and pan-immune-inflammation value. Am. J. Reprod. Immunol. 92, e13923 (2024).
Liang, P. Y. et al. The pro-inflammatory and anti-inflammatory cytokine profile in peripheral blood of women with recurrent implantation failure. Reprod. Biomed. Online 31, 823–826 (2015).
Li, Y. et al. Diagnosis of chronic endometritis: How many CD138+ cells/HPF in endometrial stroma affect pregnancy outcome of infertile women?. Am. J. Reprod. Immunol. 85, e13369 (2021).
Cela, V. et al. Endometrial dysbiosis is related to inflammatory factors in women with repeated implantation failure: a pilot study. J. Clin. Med. 11, 2481 (2022).
Ansari, A. Z. et al. Dysbiotic vaginal microbiota induces preterm birth cascade via pathogenic molecules in the vagina. Metabolites 14, 45 (2024).
World Health Organization. WHO Laboratory Manual for the Examination and Processing of Human Semen. Vol. 6, 1–276 (World Health Organization, Geneva, 2021).
Bloom, S. M. et al. Cysteine dependence of Lactobacillus iners is a potential therapeutic target for vaginal microbiota modulation. Nat. Microbiol. 7, 434 (2022).
Gosmann, C. et al. Lactobacillus-deficient cervicovaginal bacterial communities are associated with increased HIV acquisition in young South African women. Immunity 46, 29–37 (2017).
Anahtar, M. N. et al. Cervicovaginal bacteria are a major modulator of host inflammatory responses in the female genital tract. Immunity 42, 965–976 (2015).
France, M. T. et al. VALENCIA: a nearest centroid classification method for vaginal microbial communities based on composition. Microbiome 8, 1–15 (2020).
McKinnon, L. R. et al. Genital inflammation undermines the effectiveness of tenofovir gel in preventing HIV acquisition in women. Nat. Med. 24, 491 (2018).
Chawla, N. V., Bowyer, K. W., Hall, L. O. & Kegelmeyer, W. P. SMOTE: Synthetic Minority Over-sampling Technique. J. Artif. Intell. Res. 16, 321–357 (2011).
Raschka, S. & Mirjalili, V. Python Machine Learning: Machine Learning and Deep Learning with Python, Scikit-learn, and TensorFlow 2. (India, Packt Publishing, 2019).
Cortes, C. & Vapnik, V. Support-vector networks. Mach. Learn. 20, 273–297 (1995).
Cervantes, J., Garcia-Lamont, F., Rodríguez-Mazahua, L. & Lopez, A. A comprehensive survey on support vector machine classification: applications, challenges and trends. Neurocomputing 408, 189–215 (2020).
Nohara, Y., Matsumoto, K., Soejima, H. & Nakashima, N. Explanation of machine learning models using improved Shapley additive explanation. 546–546 https://doi.org/10.1145/3307339.3343255 (2019).
Nohara, Y., Matsumoto, K., Soejima, H. & Nakashima, N. Explanation of machine learning models using Shapley additive explanation and application for real data in hospital. Comput. Methods Prog. Biomed. 214, 106584 (2022).

Acknowledgements

This work was conducted with support from the UL1TR002541 award through Harvard Catalyst, The Harvard Clinical and Translational Science Center (National Center for Advancing Translational Sciences, National Institutes of Health), and financial contributions from Harvard University and its affiliated academic healthcare centers. The content is solely the responsibility of the authors and does not necessarily represent the official views of Harvard Catalyst, Harvard University and its affiliated academic healthcare centers, or the National Institutes of Health. Dr. Bar is supported by a grant from the US-Israel Binational Science Foundation to Drs. Yassour and Mitchell.

Author information

These authors contributed equally: Ofri Bar, Stylianos Vagios, Omer Barkai, Moran Yassour, Caroline Mitchell.

Authors and Affiliations

Department of Obstetrics and Gynecology, Massachusetts General Hospital, Boston, MA, USA
Ofri Bar, Kaitlyn James, Philipp Foessleitner & Caroline Mitchell
Department of Microbiology and Molecular Genetics, Faculty of Medicine, Hebrew University of Jerusalem, Jerusalem, Israel
Ofri Bar & Moran Yassour
Harvard Medical School, Boston, MA, USA
Ofri Bar, Omer Barkai, Philipp Foessleitner, Douglas S. Kwon & Caroline Mitchell
Department of Obstetrics & Gynecology, Tufts University Medical Center, Boston, MA, USA
Stylianos Vagios
F.M. Kirby Neurobiology Center, Boston Children’s Hospital, Boston, MA, USA
Omer Barkai
Department of Neurobiology, Harvard Medical School, Boston, MA, USA
Omer Barkai
Ragon Institute of MGH, MIT, and Harvard, Massachusetts General Hospital, Cambridge, MA, USA
Joseph Elshirbini, Jiawu Xu & Douglas S. Kwon
Division of Reproductive Endocrinology and Infertility, Department of Obstetrics and Gynecology, Harvard Medical School, Massachusetts General Hospital Fertility Center, Boston, MA, USA
Irene Souter, Charles Bormann, Makiko Mitsunami & Jorge E. Chavarro
Department of Nutrition, Harvard T.H. Chan School of Public Health, Boston, MA, USA
Jorge E. Chavarro
Division of Infectious Diseases, Massachusetts General Hospital, Boston, MA, USA
Jorge E. Chavarro & Douglas S. Kwon
Division of Obstetrics and Feto-Maternal Medicine, Department of Obstetrics and Gynecology, Medical University of Vienna, Vienna, Austria
Philipp Foessleitner
The Rachel and Selim Benin School of Computer Science and Engineering, Hebrew University of Jerusalem, Jerusalem, Israel
Moran Yassour

Contributions

Conceptualization, C.M. and D.K.; Methodology, O.B., S.V., O.B., C.M., M.Y., J.C., and D.K.; Data acquisition, S.V., O.B., I.S., C.B., M.M., J.C., and C.M.; Resources, O.B., S.V., O.B., C.M., M.Y., and D.K.; Data curation, O.B., O.B., J.X., J.E., and M.Y.; Statistical analysis, O.B. and K.J.; Writing (original draft preparation), O.B., S.V., and C.M.; Writing (review and editing), O.B., S.V., O.B., M.Y., F.P., and C.M.; Visualization, O.B., S.V., O.B., M.Y., and C.M.; Supervision, C.M. and M.Y.; Project administration, S.V. All authors critically revised and approved the final version of the manuscript.

Corresponding author

Correspondence to Ofri Bar.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Bar, O., Vagios, S., Barkai, O. et al. Harnessing vaginal inflammation and microbiome: a machine learning model for predicting IVF success. npj Biofilms Microbiomes 11, 95 (2025). https://doi.org/10.1038/s41522-025-00732-8

Received: 22 November 2024
Accepted: 15 May 2025
Published: 05 June 2025
Version of record: 05 June 2025
DOI: https://doi.org/10.1038/s41522-025-00732-8
