Abstract
Humans are the only species with a commensal Lactobacillus-dominant vaginal microbiota. Reproductive tract microbes have been linked to fertility outcomes, as has intrauterine inflammation, suggesting that the immune response may mediate adverse outcomes. In this pilot study, we compared vaginal microbiota composition and immune marker concentrations between patients with unexplained infertility and patients with male factor infertility (MFI), who served as controls. We applied a supervised machine learning algorithm that integrated microbiome and inflammation data to predict pregnancy outcomes.
Twenty-eight participants provided vaginal swabs at three IVF cycle time points; 18 achieved pregnancy. Pregnant participants had lower microbial diversity and inflammation. Among them, MFI cases had higher diversity but lower inflammation than those with unexplained infertility. Our model showed the highest prediction accuracy at time point 2 of the IVF cycle. These findings suggest that vaginal microbiota and inflammation jointly impact fertility and can inform predictive tools in reproductive medicine.
Introduction
Infertility affects 7–15% of reproductive-age women in the United States1. In vitro fertilization (IVF) is one of the most common infertility treatments; it involves oocyte retrieval, fertilization with sperm in the laboratory, and transfer of the resulting embryo into the uterus. IVF results in live birth in up to 40–50% of cases, depending on factors such as age and infertility diagnosis2,3. People with unexplained infertility have the lowest success rates (closer to 30%)4, and although success rates have increased over the past decades, there are still opportunities to optimize the efficacy of IVF.
Cervicovaginal microbiota are recognized to play an important role in women’s health5. In contrast to the gut, where a diverse community is considered healthy, in the vagina a low-diversity community dominated by Lactobacillus is a marker of health5. Dominance of the vaginal or endometrial microbiota by L. crispatus is associated with higher pregnancy rates after IVF than dominance by anaerobic bacteria6,7,8,9,10. One hypothesized reason is that Lactobacillus dominance is associated with lower concentrations of vaginal inflammatory chemokines and cytokines11. Both immune and microbiome data form large, high-dimensional data sets, which makes machine learning tools useful: they can identify complex microbial and inflammatory patterns and associations that may affect health outcomes. Recent reports highlight how these methods can be applied to improve our understanding of microbiome dynamics and their predictive value in clinical contexts12,13.
In this pilot study, we prospectively collected vaginal samples at three time points during a treatment cycle from women with unexplained infertility or MFI undergoing IVF. We aimed to assess associations between vaginal microbial composition, vaginal fluid inflammatory markers, and the chance of becoming pregnant. We hypothesized that people with unexplained infertility would be less likely to have Lactobacillus dominance throughout the cycle, and that lower concentrations of vaginal fluid pro-inflammatory markers would be associated with higher clinical pregnancy rates across diagnoses. Using a supervised support vector machine (SVM) model, we demonstrate that vaginal microbiome data, alone or combined with inflammatory markers, can predict pregnancy outcome with high performance13,14. We further use SHapley Additive exPlanations (SHAP) analysis to interpret feature importance and explain the key predictive factors in our model15 (Fig. 1).
The scheme illustrates the model’s workflow. Swab samples were collected at three time points of the IVF cycle, and vaginal microbiome and inflammation data were preprocessed (top). The vaginal microbiome was analyzed by 16S rRNA gene sequencing, and inflammation was analyzed with Luminex. The data were then used as a feature set, which was extracted and reduced before training an SVM model; the model was cross-validated with a leave-one-subject-out approach to construct a pregnancy prediction classifier with explanatory features. Created with BioRender.com.
Results
Study population
We enrolled 30 people, and 29 completed an IVF cycle during the study period. One person was excluded from analysis because only 1 swab was collected, leaving 28 participants for the final analysis, 14 with unexplained infertility and 14 with MFI. All participants with unexplained infertility provided 3 vaginal swabs. Of the participants with MFI, eleven had 3 vaginal swabs and three had 2 vaginal swabs collected (2 participants were missing the second and one the first swab). Thus, the analysis included 81 samples.
Eighteen of the 28 participants became pregnant. There were no significant differences in age, infertility diagnosis, race, BMI, ovarian reserve markers, cycle type, number of blastocysts produced, or number of transferred embryos between those who became pregnant and those who did not (Table 1).
Vaginal microbiome—Community state types and microbiome diversity
Presence of CST I was associated with clinical pregnancy (Fig. 2A, B). At the time of embryo transfer, 11 of 14 people with CST I became pregnant (79%), as did 2 of 2 with CST II (100%), 4 of 6 with CST III (67%), 1 of 4 with CST IV (25%), and 0 of 2 with CST V (0%) (p = 0.07). Most participants had the same CST assignment across all three time points (Fig. 2C). To assess the association between vaginal microbiome diversity and pregnancy outcome, we compared alpha diversity, measured with the Shannon Diversity Index, by pregnancy outcome. Women who became pregnant had significantly lower vaginal microbiome diversity than women who did not (Fig. 2D, p = 0.041). When comparing microbial diversity between the two infertility diagnosis groups, participants with MFI had a significantly more diverse vaginal microbiome than those with unexplained infertility (Fig. 2E, p = 0.031).
A total of 81 vaginal samples from 28 women were analyzed and categorized into Community State Type (CST). Samples are stratified by diagnosis (A Unexplained, B Male factor). Time point is noted in the sample name (-1 for early in cycle, -2 for egg retrieval/2nd ultrasound or -3 for embryo transfer) and pregnancy outcome denoted by a colored square across the bottom of the plots. C Sankey plot showing the CST assignment over time for the 25 participants with three samples. D Alpha diversity by pregnancy outcome. E Alpha diversity by diagnosis.
Among patients who became pregnant, those with MFI tended to have a more diverse vaginal microbiome than those with unexplained infertility (Supplementary Fig 1A, p = 0.086), but the difference did not reach statistical significance, possibly because of the small sample size. Among participants who did not become pregnant, there was no difference in alpha diversity between the two infertility diagnoses (Supplementary Fig 1A, p = 0.397).
When we compared alpha diversity within each infertility group, patients with unexplained infertility who became pregnant had lower vaginal microbiome diversity than those who did not, but this difference did not reach statistical significance (Supplementary Fig 1B, p = 0.210). Among patients with MFI, the difference in microbial diversity between those who did and did not become pregnant was smaller (Supplementary Fig 1B, p = 0.324).
Genital inflammation and treatment outcome
We next examined whether genital inflammation was associated with treatment outcome. Of the 20 analytes measured, 2 were undetectable in all samples. The remaining 18 were analyzed in the 79 vaginal samples, pooled across all three time points, that had complete cytokine data. There were no notable differences in immune markers between infertility diagnoses or pregnancy outcomes (Fig. 3A).
A Heatmap of all 18 analytes measured in 79 vaginal samples. Concentrations are shown on a log2 scale. Annotations for pregnancy outcome, infertility diagnosis, community state type (CST), and Shannon Diversity Index are shown at the top. B Comparison of inflammation scores between pregnancy outcomes. C Comparison of inflammation scores between pregnancy outcomes within each diagnosis group. D Comparison of inflammation scores between pregnancy outcomes within each CST.
We assigned an inflammation score to each of the 79 samples by counting how many of 9 selected analytes (IL-1β, IL-1α, IP-10, IL-6, TNF-α, IL-8, MIP-1α, MIP-1β, IL-17) had values in the top quartile of the cohort.
Participants who became pregnant had a significantly lower inflammation score than those who did not (Fig. 3B, p = 0.024). We then compared inflammation scores within each CST category to see whether differences in the host inflammatory response to a similar microbial profile were associated with treatment outcome. Among participants with a CST III (L. iners-dominant) vaginal microbiome, genital inflammation scores were higher in those who did not conceive than in those who did (Fig. 3D). Among participants with a CST I (L. crispatus-dominant) microbiome, there was no such difference.
Within the MFI group, patients who conceived had a lower inflammation score than those who did not (Fig. 3C, p = 0.061), but the difference did not reach statistical significance. Within the unexplained infertility group, this difference was less pronounced (Fig. 3C, p = 0.296).
Machine learning approach for predicting pregnancy outcome using microbial community and genital inflammation
Our data show slightly different associations with pregnancy outcome between the two infertility diagnoses, which raised the question of whether host responses, the microbial community, or a combination of the two can predict pregnancy outcome in IVF. To this end, we developed and applied a supervised machine learning algorithm (Fig. 1). We trained a support vector machine (SVM) classification model with subjects’ taxonomic or inflammatory data as features (‘X’) and their pregnancy outcomes as targets (‘y’). Prediction performance was assessed at each time point at which swabs were collected during the IVF cycle, using microbiome data, cytokine data, or both. With bacterial features alone, the highest prediction performance (F1-score 0.9) was observed at time point 2 (Fig. 4A). With inflammatory features alone, the best prediction occurred at time point 3, at embryo transfer (F1-score 0.86). With bacterial and inflammatory features combined, the best prediction was at time point 2 (F1-score 0.87; Fig. 4A). Feature importance analysis indicated that the relative abundance of Gardnerella vaginalis had a large impact on model performance (Supplementary Fig 2). Gardnerella vaginalis, however, is often considered a marker of high microbial diversity. To test whether Gardnerella was specifically associated with pregnancy outcome, rather than serving as an indicator of high microbial diversity, we asked whether adding a diversity index as a feature would improve the model’s performance. We added the Shannon diversity index as a feature and retrained the classification model for the three feature sets at the three time points. Across all nine models, F1-scores were lower when the Shannon index was included, and Gardnerella vaginalis remained the most important bacterial feature for pregnancy outcome prediction.
Model performance also decreased when Gardnerella vaginalis was dropped from the feature set that included the Shannon index, suggesting that Gardnerella vaginalis is not merely a proxy for high microbial diversity. We further tested whether adding infertility diagnosis as a feature would improve the model’s ability to predict pregnancy outcomes; it did not. To assess whether model performance was significantly better than chance, we performed a permutation test in which the pregnancy outcome labels were randomly shuffled 50 times for each model, and F1-scores were computed for each permutation. Models trained on the original labels consistently outperformed those trained on shuffled labels, and a one-sample t-test confirmed that this difference was statistically significant (Supplementary Fig 3).
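The permutation test described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the study’s code: the data are hypothetical toy values, the pipeline (scikit-learn `StandardScaler` plus a linear `SVC`, evaluated with leave-one-out predictions) follows the Methods, and SMOTE oversampling is omitted for brevity. The one-sample t-test compares the 50 permuted F1-scores with the observed F1-score using SciPy’s `ttest_1samp`.

```python
import numpy as np
from scipy.stats import ttest_1samp
from sklearn.metrics import f1_score
from sklearn.model_selection import LeaveOneOut, cross_val_predict
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def loo_f1(X, y):
    """F1-score of a standardized linear SVC under leave-one-out cross-validation."""
    model = make_pipeline(StandardScaler(), SVC(kernel="linear"))
    y_pred = cross_val_predict(model, X, y, cv=LeaveOneOut())
    return f1_score(y, y_pred, zero_division=0)


def permutation_test(X, y, n_permutations=50, seed=0):
    """Compare the observed F1 with F1-scores from models trained on shuffled labels."""
    rng = np.random.default_rng(seed)
    observed = loo_f1(X, y)
    permuted = np.array([loo_f1(X, rng.permutation(y)) for _ in range(n_permutations)])
    # One-sample t-test of the permuted F1-scores against the observed F1.
    _, p_value = ttest_1samp(permuted, popmean=observed)
    return observed, permuted, p_value


# Hypothetical toy data: 27 "subjects", 5 features, outcome driven by feature 0.
rng = np.random.default_rng(42)
X = rng.normal(size=(27, 5))
y = (X[:, 0] + 0.1 * rng.normal(size=27) > 0).astype(int)

observed, permuted, p_value = permutation_test(X, y)
print(f"observed F1 = {observed:.2f}, mean permuted F1 = {permuted.mean():.2f}, p = {p_value:.1e}")
```

With real data, `X` would be one time point’s feature matrix and `y` the pregnancy labels; the helper names `loo_f1` and `permutation_test` are hypothetical.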
A Confusion matrices for the three models: two models at time point 2 (egg retrieval/2nd ultrasound) using bacterial features alone or combined with inflammatory features, and one model at time point 3 (embryo transfer) using inflammatory features alone. F1-scores for each model are shown at the top. B Global feature importance for each model. Left: SHAP summary plots. Each observation is represented by a dot, colored by its original feature value according to the color map. The x-axis shows the influence of each feature on the prediction, with a vertical line marking the baseline; points to the right of the baseline indicate a positive influence on pregnancy outcome, and points to the left a negative influence. Features are ordered by global importance, with the most important at the top. Right: the absolute SHAP importance of each feature, normalized to the importance of the highest-ranked feature. Purple and orange bars represent bacterial and cytokine feature importances, respectively.
Prediction explanation
We next sought to understand which features were most important for prediction. To explain how each feature contributes to the predicted outcome of our machine learning models, we used SHAP summary plots showing the contribution of the top ten features for each subject in the three models (Fig. 4B, left) and their absolute importance (Fig. 4B, right).
In both models that included bacterial features, Gardnerella vaginalis was the most influential bacterial variable, with high relative abundance contributing to a prediction of no pregnancy. Notably, L. crispatus appeared among the top ten features of both bacteria-based models and was positively associated with pregnancy outcome, consistent with our CST findings (Fig. 2A, B). Enterobacter also appeared in both bacteria-based models, with a negative impact on pregnancy outcome predictions.
Several cytokines also had a substantial impact on pregnancy prediction: IL-1α and ITAC ranked among the top three features in the models that included cytokine features, and high concentrations of both were associated with a predicted negative pregnancy outcome.
Discussion
In this pilot study of patients with unexplained infertility and patients with male factor infertility, the latter serving as controls in whom the female partner was assumed to have normal fertility, we evaluated the relationship between the vaginal microbiota, genital inflammation, and pregnancy outcome. We show that microbiota and inflammatory data can be used to predict IVF pregnancy outcome with high accuracy. Consistent with previous studies, we observed an inverse relationship between microbial diversity and the chance of clinical pregnancy. Interestingly, however, we found different patterns of association between these measures among couples with unexplained vs. male factor infertility. In our small cohort, people with unexplained infertility who became pregnant had lower vaginal microbial diversity than people with male factor infertility who became pregnant. These results could suggest that the vaginal microbiome has a greater impact on pregnancy outcome in people with unexplained infertility. Of note, participants with male factor infertility were less likely to have a pattern of Lactobacillus crispatus dominance and had significantly higher alpha diversity than participants with unexplained infertility.
“Optimal” vaginal microbial populations dominated by Lactobacillus species have been associated with higher rates of pregnancy with IVF; however, few studies have evaluated whether this association differs by indication for IVF5. In a study in which most participants had MFI (67%), those with Lactobacillus dominance were more likely to become pregnant7. In a study of 30 women who likely had ovulatory dysfunction or unexplained infertility, greater diversity of the vaginal microbial community on the day of embryo transfer was associated with a lower live birth rate16. A Danish cohort with similar proportions of male factor (36%) and unexplained (31%) infertility showed a 35% live birth rate in IVF patients with CST I and a 41% live birth rate for CST III, compared with 8% for CST IV17. In that cohort, the prevalence of CST I was similar between those with unexplained (43%) and male factor infertility (58%), though swabs could have been taken up to 2 months before embryo transfer. The authors also found by qPCR that the total load of L. iners in CST III communities was much higher than the load of L. crispatus in CST I communities, and suggested that total abundance may be as important as relative abundance when assessing the role of the microbiome in IVF. None of these studies compared results between infertility diagnoses. In our cohort, microbial diversity was inversely related to the chance of clinical pregnancy, but among participants with clinical pregnancies, patients with MFI had higher vaginal microbiome diversity than those with unexplained infertility, although this difference did not reach statistical significance. This suggests the importance of accounting for infertility diagnosis in such studies.
In studies of the seminal microbiome, community composition differed between men with normal vs. abnormal sperm parameters18,19. In our study, more participants with MFI had CST III or CST IV communities. We did not evaluate the seminal microbiome or recency of sexual intercourse, but other work has shown a correlation between penile and vaginal microbiota in sexual partners20. In a recent study of the semen microbiome, men with abnormal sperm motility had a higher abundance of Lactobacillus iners than those with normal sperm motility, and men with abnormal sperm concentration had a higher abundance of Pseudomonas stutzeri and Pseudomonas fluorescens21. It is possible that the vaginal microbiome in couples with MFI in our study reflects dysbiosis of the male partner’s semen.
Cytokines, growth factors, and adhesion molecules all participate in making implantation and pregnancy possible. Using a composite inflammation score to identify people with the highest concentrations of multiple markers, we showed that vaginal fluid inflammation was associated with a lower chance of pregnancy. Most studies of immune factors associated with IVF success measure serum concentrations of chemokines and cytokines and have shown that markers of systemic inflammation are associated with unexplained infertility and recurrent implantation failure22,23. Fewer studies have evaluated endometrial or vaginal markers of inflammation. In one study of endometrial biopsies collected in the cycle before an IVF cycle, the presence of ≥ 5 CD138+ plasma cells per high-power field, indicative of chronic endometritis, was associated with a significantly lower pregnancy rate24. In a separate study, endometrial fluid was aspirated immediately before embryo transfer in a cohort in which ~50% of patients were being treated for male factor infertility, ~20% for unexplained infertility, and ~20% for tubal factor. Elevated levels of TNF-α and MIF (macrophage migration inhibitory factor) were associated with higher clinical pregnancy rates, while IL-1β and MCP-1 (monocyte chemoattractant protein-1) were associated with lower pregnancy rates11. These data indicate that complex immune interactions inform IVF outcomes: TNF-α and IL-1β are both considered pro-inflammatory but showed opposite associations.
Our machine learning models identified the vaginal microbiota, alone or combined with vaginal fluid immune markers, at the second time point (egg retrieval for most participants), and vaginal fluid immune markers alone at the third time point (embryo transfer), as the best predictors of pregnancy outcome. A study using endometrial biopsies25 found that when the endometrial microbiome contained <90% lactobacilli, the tissue had higher concentrations of the pro-inflammatory analytes IL-6 and IL-1β and lower concentrations of the anti-inflammatory IL-10. Previous data have shown that a non-Lactobacillus-dominant endometrial microbiome in the cycle before an IVF cycle is associated with lower rates of pregnancy9,10. Interestingly, Corynebacterium emerged as an important feature in several of our models, despite not being among the most abundant taxa in vaginal samples. Because the vaginal microbiome is typically of low diversity, one might expect the most predictive taxa to be among the dominant species. However, machine learning approaches can identify taxa that are not the most abundant but may still hold biological significance. Although Corynebacterium is generally considered a minority member of the vaginal microbiome, it has been reported more frequently in postmenopausal individuals and in cases of microbial dysbiosis, and has been linked to preterm birth26. Its predictive role in our model suggests that even subtle variations in its abundance may have functional relevance. While the clinical significance of Corynebacterium in this context remains unclear, this finding highlights the potential for non-dominant taxa to influence reproductive outcomes and underscores the need for further investigation in larger cohorts.
At the time of embryo transfer, immune factors alone had the highest predictive value in our models. This suggests that the impact of the microbiome on pregnancy outcomes may be mediated through the host immune response: the microbiome at the second time point may drive the immune response at the third time point, which in turn may reflect endometrial receptivity.
The prospective design of our study and the consistent evaluation of all participants within the same center are major strengths. All participants underwent the same evaluation to identify the cause of their infertility. Limitations include the small size of our population, which limits our power to detect associations, precludes controlling for potential confounders, and limits both the generalizability of our findings and the ability to draw conclusions that would affect clinical practice. We included both fresh and frozen cycles, which increased generalizability but also heterogeneity, and we did not have sufficient numbers to control for different hormonal regimens. We assessed whether cycle type influenced the prediction model at mid-cycle and observed no differences in pregnancy prediction accuracy across all models using our features. No interventions have yet been proven to reliably shift the vaginal microbiome toward an L. crispatus-dominant community type; if such interventions become available, our results suggest that studying them in the context of IVF outcomes could be promising. The majority of our population identifies as White, which further limits generalizability. Additionally, participants diagnosed with male factor infertility could also have unexplained female infertility, because unexplained infertility is a diagnosis of exclusion. Finally, we did not measure the endometrial microbiota, which may be more relevant to implantation, though it is less accessible for routine testing.
The findings of this pilot analysis suggest that larger studies of microbiota and IVF outcomes should plan for sub-analyses by type of infertility, as the magnitude of the impact of microbial communities may differ between diagnoses. Our results also support the association of a low-diversity, Lactobacillus-dominant vaginal microbiota with greater success in a given IVF cycle. To our knowledge, this is the first published machine learning model to predict IVF pregnancy outcome from microbiome and inflammatory data. Although it remains to be validated, such a model could eventually inform medical decision-making during IVF cycles. Furthermore, the feature importance explanations from our model can serve as an exploratory guide to the underlying factors affecting the chance of becoming pregnant with IVF.
Methods
Study design
This pilot study was approved by the Mass General Brigham Human Subjects Committee (IRB number 2015P000085). We enrolled participants under 40 years of age with unexplained infertility or MFI who underwent IVF between 10/2019 and 04/2021 at the Massachusetts General Hospital Fertility Center. All participants provided written consent before enrollment and received no incentive or compensation for participation. Unexplained infertility was defined as a couple with a normal semen analysis and normal evaluation of the uterus, tubes, and ovarian function (AMH > 0.8) who had been trying to conceive for 6–12 months (depending on age) without success. Male factor infertility was defined as a couple with normal evaluation of the uterus, tubes, and ovarian function (AMH > 0.8) and one or more abnormal semen analysis parameters on two separate samples produced at least 2 weeks apart. Normal semen analysis values were based on the World Health Organization guidelines, 5th edition: concentration >20 million/mL, motility >40%, forward progression >3, and total motile count >15 million motile sperm per sample27.
Both fresh and frozen embryo transfer cycles were included. For cycles with a planned fresh embryo transfer, participants provided vaginal swabs on day 3 of the stimulation cycle, on the day of egg retrieval, and on the day of embryo transfer. For frozen-thawed (cryothaw) cycles, participants provided swabs on the day of the baseline ultrasound (day 3–5), on the day of the second ultrasound, and on the day of embryo transfer. Swabs were collected before clinical procedures and stored at −80 °C until processed.
Data on age, race, peak serum estrogen levels, anti-Müllerian hormone (AMH) levels, follicle-stimulating hormone (FSH) levels, embryo quality, number of embryos transferred, stimulation protocol, pregnancy history, and medical history were obtained through chart review. Treatment outcomes (i.e., no pregnancy, biochemical pregnancy, or intrauterine gestation confirmed on ultrasound) were obtained from the medical record for the assisted reproductive technology (ART) cycle in which the vaginal samples were collected.
Laboratory analyses
Swabs were eluted in 400 μL sterile saline and centrifuged; the pellet was submitted for DNA extraction, and the supernatant was aliquoted and frozen for Luminex analysis. Genomic DNA (gDNA) was extracted using a plate-based protocol that included bead beating and combined phenol-chloroform isolation with QIAamp 96 DNA QIAcube HT Kit (Qiagen) procedures. An amplicon library of the bacterial 16S rRNA gene V4 region was prepared and sequenced on an Illumina MiSeq with a 300-cycle sequencing kit28. Sequence data were processed with the dada2 package, and taxonomy was assigned using GTDB29,30. Stacked bar plots show only the 20 most prevalent taxa. Microbial compositional analysis was performed in R (version 4.2.2). Taxonomy data were aggregated at the genus level, and low-abundance taxa (<0.5% prevalence) were filtered out. Samples were assigned to a microbial Community State Type (CST) using VALENCIA (VAginaL community state typE Nearest CentroId clAssifier)31: CST I, Lactobacillus crispatus dominant; CST II, Lactobacillus gasseri dominant; CST III, Lactobacillus iners dominant; CST IV, non-Lactobacillus dominant; CST V, Lactobacillus jensenii dominant.
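VALENCIA, as described by its authors, assigns each sample to the CST whose reference centroid it most resembles, using Yue and Clayton’s similarity index θ. The sketch below illustrates that nearest-centroid logic in Python. It is not VALENCIA itself: the five-taxon centroids are hypothetical toy values rather than the published reference centroids, and VALENCIA additionally resolves CST subtypes and uses many more taxa.

```python
import numpy as np

TAXA = ["L. crispatus", "L. gasseri", "L. iners", "L. jensenii", "other"]

# Hypothetical toy centroids (relative abundances over TAXA);
# NOT the VALENCIA reference centroids.
CENTROIDS = {
    "I":   np.array([0.90, 0.02, 0.03, 0.01, 0.04]),
    "II":  np.array([0.05, 0.85, 0.04, 0.02, 0.04]),
    "III": np.array([0.03, 0.02, 0.90, 0.01, 0.04]),
    "IV":  np.array([0.02, 0.01, 0.07, 0.01, 0.89]),
    "V":   np.array([0.04, 0.02, 0.05, 0.85, 0.04]),
}


def yue_clayton_theta(p, q):
    """Yue and Clayton similarity between two relative-abundance vectors (1 = identical)."""
    pq = np.dot(p, q)
    return pq / (np.dot(p, p) + np.dot(q, q) - pq)


def assign_cst(counts):
    """Return the CST whose centroid is most similar to the sample, plus all scores."""
    p = np.asarray(counts, dtype=float)
    p = p / p.sum()  # convert read counts to relative abundances
    scores = {cst: yue_clayton_theta(p, c) for cst, c in CENTROIDS.items()}
    return max(scores, key=scores.get), scores


cst, scores = assign_cst([120, 3, 10, 0, 7])  # hypothetical read counts per taxon
print(cst)  # I
```

The design choice here is a similarity score rather than a distance: the sample is assigned to the centroid with the largest θ.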
The concentrations of 20 cytokines/chemokines (MIG, IP10, IFN-γ, ITAC, IL1α, IL1β, TNFα, IL6, IL8, MIP-3α, IL12, MIP1α, MIP3β, IL13, IL 12, IL21, IL4, IL23, IL5, IL10) were measured in the vaginal supernatant using multiplexed ELISA assays (Luminex), as previously described29,32. Values below the lower limit of detection were recorded as half of the lowest standard concentration for that analyte, and values above the upper limit of detection were recorded as 1.1 times the highest standard concentration. To identify participants with the most overall mucosal inflammation, we assigned each sample an “inflammation score”: the number of 9 inflammatory markers with a value in the highest quartile for this cohort. These markers were selected from a panel associated with increased risk of sexually transmitted infection (STI) acquisition32: interleukin (IL)-1α, IL-1β, IL-6, tumor necrosis factor (TNF)-α, IL-8, C-X-C motif chemokine 10 (CXCL10; also known as IP-10), IL-17, macrophage inflammatory protein (MIP)-1α, and MIP-1β. For each marker, a sample scored 1 if its concentration was ≥ the 75th percentile of the cohort and 0 otherwise. Summing across the 9 markers gave each of the 79 samples an inflammation score between 0 and 9.
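The imputation and scoring rules above can be expressed compactly with pandas. This is a hedged sketch: the column names, the assumption that out-of-range readings arrive as numbers outside the standard curve, the percentile method (pandas’ default linear interpolation), and the toy data are all hypothetical.

```python
import pandas as pd

# The 9 markers used for the score (column names are hypothetical labels).
SCORE_MARKERS = ["IL-1a", "IL-1b", "IL-6", "TNFa", "IL-8", "IP-10", "IL-17", "MIP-1a", "MIP-1b"]


def impute_out_of_range(conc, lowest_std, highest_std):
    """Replace readings outside the standard curve.

    Below the lowest standard -> half the lowest standard concentration.
    Above the highest standard -> 1.1 x the highest standard concentration.
    Assumes out-of-range readings are reported as numbers outside the curve.
    """
    out = conc.copy()
    for col in out.columns:
        low, high = lowest_std[col], highest_std[col]
        out.loc[out[col] < low, col] = low / 2
        out.loc[out[col] > high, col] = high * 1.1
    return out


def inflammation_score(conc):
    """Per-sample count of markers at or above the cohort 75th percentile (0 to 9)."""
    markers = conc[SCORE_MARKERS]
    q75 = markers.quantile(0.75)          # cohort-level 75th percentile of each marker
    return (markers >= q75).sum(axis=1)   # one point per marker in the top quartile


# Hypothetical toy cohort of 4 samples with identical values for every marker.
toy = pd.DataFrame({m: [1.0, 2.0, 3.0, 4.0] for m in SCORE_MARKERS})
print(inflammation_score(toy).tolist())  # [0, 0, 0, 9]
```

In the toy cohort the 75th percentile of each marker is 3.25, so only the fourth sample is in the top quartile for all 9 markers.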
Statistical analysis
This was a pilot study of a convenience sample. Metadata were compared between participants who did vs. did not have an intrauterine pregnancy (IUP), and between patients with unexplained infertility vs. MFI, using the chi-square test, t-test, or Mann–Whitney U-test as appropriate. When evaluating associations between the microbiome or inflammation and outcomes, we included data from all time points for all participants in a mixed effects linear regression model to account for multiple samples per participant and for the different time points. Statistical analyses were performed in Stata or R, with p < 0.05 considered statistically significant.
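A mixed effects model with a random intercept per participant, as described above, could be fit in Python with statsmodels (the study used Stata or R). The sketch assumes a long-format table with hypothetical column names (`participant`, `timepoint`, `pregnant`, `shannon`) and simulated data; the study’s exact model specification is not stated, so treating time point as a categorical fixed effect is an assumption.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf


def fit_mixed_model(df):
    """Linear mixed model: fixed effects for outcome and time point, random intercept per participant."""
    model = smf.mixedlm("shannon ~ pregnant + C(timepoint)", data=df, groups=df["participant"])
    return model.fit()


# Hypothetical long-format data: 28 participants x 3 time points, 18 pregnant.
rng = np.random.default_rng(1)
n_participants, n_timepoints = 28, 3
participant = np.repeat(np.arange(n_participants), n_timepoints)
timepoint = np.tile(np.arange(1, n_timepoints + 1), n_participants)
pregnant = np.repeat(np.r_[np.ones(18), np.zeros(10)].astype(int), n_timepoints)
participant_effect = np.repeat(rng.normal(0, 0.3, n_participants), n_timepoints)
# Simulated Shannon index: lower in pregnant participants, plus per-participant and residual noise.
shannon = 1.0 - 0.5 * pregnant + participant_effect + rng.normal(0, 0.2, participant.size)

df = pd.DataFrame({"participant": participant, "timepoint": timepoint,
                   "pregnant": pregnant, "shannon": shannon})
result = fit_mixed_model(df)
print(result.params["pregnant"], result.pvalues["pregnant"])
```

The random intercept absorbs the correlation among the repeated samples from each participant, which is the reason the text gives for using a mixed model.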
Alpha diversity was assessed with the Shannon Diversity Index, calculated using the diversity() function of the R package vegan, and compared between groups using mixed effects linear regression to account for multiple samples from each participant.
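For reference, the Shannon index is H′ = −Σ p_i ln p_i, where p_i is the relative abundance of taxon i. A minimal Python equivalent of vegan’s default Shannon calculation (natural logarithm) is sketched below; the read counts are hypothetical.

```python
import numpy as np


def shannon_index(counts):
    """Shannon diversity H' = -sum(p_i * ln(p_i)) over taxa with non-zero counts.

    Uses the natural logarithm, matching the default base of vegan::diversity().
    """
    counts = np.asarray(counts, dtype=float)
    p = counts[counts > 0] / counts.sum()  # relative abundances; zero counts contribute nothing
    return float(-np.sum(p * np.log(p)))


# Hypothetical read counts for four taxa.
lactobacillus_dominated = [950, 20, 20, 10]
diverse = [300, 250, 250, 200]
print(shannon_index(lactobacillus_dominated) < shannon_index(diverse))  # True
```

A community dominated by a single taxon approaches H′ = 0, while k equally abundant taxa give the maximum value ln k, which is why Lactobacillus-dominant samples score low.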
Machine learning model
Data acquisition, preprocessing, and target definition
The dataset used for this study included subjects with both cytokine and bacterial abundance data at each of the three time points: 1A (27 subjects), 2A (25 subjects), and 3A (27 subjects), each with a binary pregnancy outcome label (‘outcome’). To ensure a high-quality feature matrix (X) for subsequent modeling, feature columns with values below a 0.01 threshold were excluded, retaining only informative features. Columns with missing or low-variance (sparse) data, defined as fewer than 50% non-zero values, were also removed. The target variable (y) was the binary pregnancy outcome of each subject.
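The two filtering rules above can be sketched with pandas. The text does not state exactly how the 0.01 threshold is applied, so this sketch assumes it applies to each column’s mean; dropping any column with a missing value and the taxon names are also assumptions.

```python
import numpy as np
import pandas as pd


def filter_features(X, min_value=0.01, min_nonzero_frac=0.5):
    """Drop uninformative feature columns before modeling.

    Assumptions (the text does not define them precisely):
    - columns with any missing value are dropped;
    - the 0.01 threshold is applied to the column mean;
    - a column is kept only if at least 50% of its values are non-zero.
    """
    X = X.dropna(axis=1)
    above_threshold = X.mean(axis=0) >= min_value
    dense_enough = (X != 0).mean(axis=0) >= min_nonzero_frac
    return X.loc[:, above_threshold & dense_enough]


# Hypothetical relative-abundance matrix: 4 subjects x 4 taxa.
X = pd.DataFrame({
    "taxon_a": [0.20, 0.30, 0.10, 0.40],     # kept
    "taxon_b": [0.001, 0.002, 0.001, 0.0],   # dropped: mean below 0.01
    "taxon_c": [0.50, 0.0, 0.0, 0.0],        # dropped: only 25% non-zero
    "taxon_d": [0.20, np.nan, 0.30, 0.10],   # dropped: missing value
})
print(filter_features(X).columns.tolist())  # ['taxon_a']
```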
Data balancing using SMOTE
To address class imbalance between pregnant and non-pregnant subjects, we applied the Synthetic Minority Over-sampling Technique (SMOTE) with a fixed random state33. This technique generates synthetic samples for the minority class so that both classes are equally represented when training the classifier. After SMOTE over-sampling, 34 data points were used for time points 1A and 3A, and 32 for time point 2A.
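To make the SMOTE step concrete, the sketch below is a minimal NumPy re-implementation of the technique’s core idea: each synthetic point is placed at a random position on the line segment between a minority-class sample and one of its k nearest minority-class neighbours. It is an illustration only; the text does not name the implementation used (imbalanced-learn’s `SMOTE` class is a common choice), and the toy data are hypothetical. With 17 majority and 10 minority samples, balancing reproduces the 27 to 34 count reported for time points 1A and 3A.

```python
import numpy as np


def smote(X, y, k=5, seed=0):
    """Minimal SMOTE: oversample the minority class until both classes are the same size.

    Each synthetic point lies at a random position on the segment between a
    minority-class sample and one of its k nearest minority-class neighbours
    (Euclidean distance). Requires at least two minority samples.
    """
    rng = np.random.default_rng(seed)
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    classes, counts = np.unique(y, return_counts=True)
    minority = classes[np.argmin(counts)]
    X_min = X[y == minority]
    n_new = counts.max() - counts.min()
    k = min(k, len(X_min) - 1)
    # Pairwise distances within the minority class; a point is not its own neighbour.
    dist = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)
    neighbours = np.argsort(dist, axis=1)[:, :k]
    base = rng.integers(0, len(X_min), n_new)              # random minority samples
    partner = neighbours[base, rng.integers(0, k, n_new)]  # one random neighbour each
    gap = rng.random((n_new, 1))                           # interpolation factor in [0, 1)
    synthetic = X_min[base] + gap * (X_min[partner] - X_min[base])
    X_res = np.vstack([X, synthetic])
    y_res = np.concatenate([y, np.full(n_new, minority, dtype=y.dtype)])
    return X_res, y_res


# Hypothetical toy data: 17 "pregnant" (1) and 10 "not pregnant" (0) subjects.
rng = np.random.default_rng(0)
X = rng.normal(size=(27, 4))
y = np.r_[np.ones(17, dtype=int), np.zeros(10, dtype=int)]
X_res, y_res = smote(X, y)
print(X_res.shape, np.bincount(y_res))  # (34, 4) [17 17]
```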
Machine-learning model
We selected a C-Support Vector Classification (SVC) support vector machine (SVM) with a linear kernel for its simplicity and superior performance. The classifier was trained on the scaled training data, and predictions were generated for the left-out test sample34,35,36. The model otherwise used the default parameters provided by the Scikit-learn package. Samples with a predicted probability above 0.5 were classified as positive; all others were classified as negative.
Model evaluation metrics
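Because the data were imbalanced, we evaluated model performance using the F1-score. The F1-score is the harmonic mean of precision and recall, so it accounts for both false positives and false negatives:
$$F_1 = 2 \cdot \frac{\mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$$
where
$$\mathrm{Precision} = \frac{TP}{TP + FP}, \qquad \mathrm{Recall} = \frac{TP}{TP + FN},$$
and TP, FP, and FN denote true positives, false positives, and false negatives, respectively.
Accuracy, the proportion of correct predictions, was calculated as:
$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN},$$
where TN denotes true negatives.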
Although training sets included both original and SMOTE-synthesized samples, model performance was evaluated only on original samples. Confusion matrices were generated to assess classification performance.
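The sketch below is a small, dependency-free Python illustration of how the F1-score and accuracy follow from confusion-matrix counts. Following the evaluation rule above, only original samples (for example, the held-out LOO-CV predictions) would be passed in. The labels below are hypothetical.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Confusion-matrix counts plus precision, recall, F1 and accuracy (binary task).

    Only original samples should be passed in, e.g. the held-out LOO-CV predictions.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    # Return 0.0 when a denominator is zero, matching scikit-learn's default behaviour
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / len(pairs)
    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn,
            "precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}

# Hypothetical predictions for 7 held-out samples (1 = pregnant)
m = classification_metrics([1, 1, 1, 1, 0, 0, 0], [1, 1, 1, 0, 1, 0, 0])
print(m["tp"], m["fp"], m["fn"], m["tn"])  # 3 1 1 2
print(round(m["f1"], 3), round(m["accuracy"], 3))  # 0.75 0.714
```

With one false positive and one false negative, precision and recall are both 0.75, so F1 is also 0.75. Accuracy (5 of 7) is slightly lower because it also weighs the true negatives.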
Leave-one-out cross-validation
We used leave-one-out cross-validation (LOO-CV) to evaluate model performance. In each iteration, one sample was held out as the test set and the remaining samples were used for training. This was repeated for every sample, so the number of iterations equaled the number of samples (Fig. 1). Performance was evaluated using the F1-score and accuracy. The feature matrix was standardized using the ‘StandardScaler’ method, which subtracts each feature's mean and scales it to unit variance. This prevents features measured on larger scales from dominating the model's predictions.
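The following scikit-learn sketch combines the LOO-CV loop with the linear SVC and the 0.5 probability threshold described above. It is not the authors' repository code, and the toy data are synthetic.

Several choices here are assumptions:
- labels are 0/1 with 1 = pregnant;
- the scaler is fit on each training fold only;
- probability=True is set, because SVC does not provide predict_proba by default.

Scikit-learn derives these probabilities by Platt scaling with internal cross-validation, so they can occasionally disagree with the sign of decision_function.

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import LeaveOneOut
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def loo_svm_predictions(X, y, threshold=0.5, random_state=0):
    """Leave-one-out predictions from a linear SVC (labels assumed 0/1, 1 = pregnant)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    y_pred = np.zeros_like(y)
    for train_idx, test_idx in LeaveOneOut().split(X):
        # Fit the scaler on the training fold only, then apply it to both parts
        scaler = StandardScaler().fit(X[train_idx])
        X_train, X_test = scaler.transform(X[train_idx]), scaler.transform(X[test_idx])
        y_train = y[train_idx]
        # Assumption: SMOTE, if used, would resample (X_train, y_train) here so that
        # the held-out sample is always an original, never a synthetic, point.
        clf = SVC(kernel="linear", probability=True, random_state=random_state)
        clf.fit(X_train, y_train)
        pos_col = list(clf.classes_).index(1)
        p_pregnant = clf.predict_proba(X_test)[0, pos_col]
        y_pred[test_idx] = int(p_pregnant > threshold)  # positive only if p > 0.5
    return y_pred

# Hypothetical, well-separated toy data (not study data): 15 negatives, 12 positives
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2, 1, size=(15, 5)), rng.normal(2, 1, size=(12, 5))])
y = np.array([0] * 15 + [1] * 12)
y_pred = loo_svm_predictions(X, y)
print(f1_score(y, y_pred), accuracy_score(y, y_pred))
```

Refitting the scaler inside each fold keeps the held-out sample's values out of the mean and variance used to scale the training data, which matters when only about 25 to 27 subjects are available.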
Feature importance
For feature importance, we used SHapley Additive exPlanations (SHAP) computed from the linear SVC model37,38. SHAP values were computed in each LOO-CV fold, and the resulting importance scores were averaged across folds to represent each feature's overall contribution.
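For a linear score f(x) = w·x + b with features treated as independent, SHAP values have a closed form: φ_j = w_j (x_j − E[x_j]) (Lundberg & Lee). The numpy sketch below uses that form, with the training fold as the background data. It then averages mean |SHAP| per feature across folds. The text does not state which SHAP explainer, output scale, or aggregation was used, so the decision-function scale and the mean-absolute aggregation here are assumptions.

```python
import numpy as np

def linear_shap(coef, X_background, X_explain):
    """SHAP values for a linear score f(x) = w.x + b, assuming independent features:
    phi_ij = w_j * (x_ij - mean_j), where mean_j is taken over the background data."""
    w = np.ravel(coef)  # SVC(kernel="linear").coef_ has shape (1, n_features) for binary y
    mean = np.asarray(X_background, dtype=float).mean(axis=0)
    return (np.asarray(X_explain, dtype=float) - mean) * w

def mean_abs_importance(per_fold_shap):
    """Mean |SHAP| per feature within each fold, then averaged across folds."""
    return np.mean([np.abs(s).mean(axis=0) for s in per_fold_shap], axis=0)

# Hypothetical example: three features, illustrative weights
w, b = np.array([0.5, -1.0, 2.0]), 0.3
rng = np.random.default_rng(0)
background = rng.normal(size=(20, 3))  # stands in for a scaled training fold
x = rng.normal(size=(1, 3))  # the held-out sample
phi = linear_shap(w, background, x)
# Local accuracy: SHAP values sum to f(x) minus the mean score on the background
print(np.isclose(phi.sum(), (x @ w + b - (background @ w + b).mean()).item()))  # True
print(mean_abs_importance([np.array([[1.0, -2.0]]), np.array([[3.0, 0.0]])]))  # [2. 1.]
```

Inside a LOO-CV loop, each fold would contribute linear_shap(clf.coef_, X_train, X_test). The per-fold arrays would then be passed to mean_abs_importance.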
Software and tools
Data analysis and statistics were performed using R. Machine learning analysis and modeling were conducted using Python. We used the following open-source packages: Pandas for data manipulation, Imblearn (SMOTE) for handling class imbalances, Scikit-learn for machine learning and Matplotlib for plotting.
Data availability
The sequences generated as part of this analysis were uploaded to the NCBI Sequence Read Archive (BioProject PRJNA1037556).
Code availability
Scripts to reproduce the machine learning algorithm are available in a GitHub repository: https://github.com/OmerBarkai/IVFOutcomeAI.
References
Thoma, M. E. et al. Prevalence of infertility in the United States as estimated by the current duration approach and a traditional constructed approach. Fertil. Steril. 99, 1324–1331.e1 (2013).
Gleicher, N., Kushnir, V. A. & Barad, D. H. Worldwide decline of IVF birth rates and its probable causes. Hum. Reprod. Open 2019, hoz017 (2019).
CDC. ART Success Rates | CDC. https://www.cdc.gov/art/success-rates/?CDC_AAref_Val=https://www.cdc.gov/art/artdata/index.html (2022).
Reindollar, R. H. et al. A randomized clinical trial to evaluate optimal treatment for unexplained infertility: the fast track and standard treatment (FASTT) trial. Fertil. Steril. 94, 888–899 (2010).
Anahtar, M. N., Gootenberg, D. B., Mitchell, C. M. & Kwon, D. S. Cervicovaginal microbiota and reproductive health: the virtue of simplicity. Cell Host Microbe 23, 159–168 (2018).
Schoenmakers, S. & Laven, J. The vaginal microbiome as a tool to predict IVF success. Curr. Opin. Obstet. Gynecol. 32, 169–178 (2020).
Koedooder, R. et al. The vaginal microbiome as a predictor for outcome of in vitro fertilization with or without intracytoplasmic sperm injection: a prospective study. Hum. Reprod. 34, 1042–1054 (2019).
Kong, Y. et al. The disordered vaginal microbiota is a potential indicator for a higher failure of in vitro fertilization. Front. Med. 7, 217 (2020).
Moreno, I. et al. Evidence that the endometrial microbiota has an effect on implantation success or failure. Am. J. Obstet. Gynecol. 215, 684–703 (2016).
Moreno, I. et al. Endometrial microbiota composition is associated with reproductive outcome in infertile patients. Microbiome 10, 1 (2022).
Boomsma, C. M. et al. Endometrial secretion analysis identifies a cytokine profile predictive of pregnancy in IVF. Hum. Reprod. 24, 1427–1435 (2009).
Hernández Medina, R. et al. Machine learning and deep learning applications in microbiome research. ISME Commun. 2, 1–7 (2022).
Pasolli, E., Truong, D. T., Malik, F., Waldron, L. & Segata, N. Machine learning meta-analysis of large metagenomic datasets: tools and biological insights. PLoS Comput. Biol. 12, e1004977 (2016).
Statnikov, A. et al. A comprehensive evaluation of multicategory classification methods for microbiomic data. Microbiome 1, 11 (2013).
Lundberg, S. M. & Lee, S. I. A unified approach to interpreting model predictions. Adv. Neural. Inf. Process. Syst. 2017, 4766–4775 (2017).
Hyman, R. W. et al. The dynamics of the vaginal microbiome during infertility therapy with in vitro fertilization-embryo transfer. J. Assist. Reprod. Genet. 29, 105–115 (2012).
Haahr, T. et al. Vaginal microbiota and in vitro fertilization outcomes: development of a simple diagnostic tool to predict patients at risk of a poor reproductive outcome. J. Infect. Dis. 219, 1809–1817 (2019).
Garcia-Segura, S. et al. Seminal microbiota of idiopathic infertile patients and its relationship with sperm DNA Integrity. Front. Cell Dev. Biol. 10, 937157 (2022).
Veneruso, I. et al. Metagenomics reveals specific microbial features in males with semen alterations. Genes 14, 1228 (2023).
Mehta, S. D. et al. The microbiome composition of a man’s penis predicts incident bacterial vaginosis in his female sex partner with high accuracy. Front. Cell Infect. Microbiol. 10, 433 (2020).
Osadchiy, V. et al. Semen microbiota are dramatically altered in men with abnormal sperm parameters. Sci. Rep. 14, 1068 (2024).
Topkara Sucu, S. et al. New immunological indexes for the effect of systemic inflammation on oocyte and embryo development in women with unexplained infertility: systemic immune response index and pan-immune-inflammation value. Am. J. Reprod. Immunol. 92, e13923 (2024).
Liang, P. Y. et al. The pro-inflammatory and anti-inflammatory cytokine profile in peripheral blood of women with recurrent implantation failure. Reprod. Biomed. Online 31, 823–826 (2015).
Li, Y. et al. Diagnosis of chronic endometritis: How many CD138+ cells/HPF in endometrial stroma affect pregnancy outcome of infertile women? Am. J. Reprod. Immunol. 85, e13369 (2021).
Cela, V. et al. Endometrial dysbiosis is related to inflammatory factors in women with repeated implantation failure: a pilot study. J. Clin. Med. 11, 2481 (2022).
Ansari, A. Z. et al. Dysbiotic vaginal microbiota induces preterm birth cascade via pathogenic molecules in the vagina. Metabolites 14, 45 (2024).
World Health Organization. WHO Laboratory Manual for the Examination and Processing of Human Semen. 6th edn, 1–276 (World Health Organization, Geneva, 2021).
Bloom, S. M. et al. Cysteine dependence of Lactobacillus iners is a potential therapeutic target for vaginal microbiota modulation. Nat. Microbiol. 7, 434 (2022).
Gosmann, C. et al. Lactobacillus-deficient cervicovaginal bacterial communities are associated with increased HIV acquisition in young South African women. Immunity 46, 29–37 (2017).
Anahtar, M. N. et al. Cervicovaginal bacteria are a major modulator of host inflammatory responses in the female genital tract. Immunity 42, 965–976 (2015).
France, M. T. et al. VALENCIA: a nearest centroid classification method for vaginal microbial communities based on composition. Microbiome 8, 1–15 (2020).
McKinnon, L. R. et al. Genital inflammation undermines the effectiveness of tenofovir gel in preventing HIV acquisition in women. Nat. Med. 24, 491 (2018).
Chawla, N. V., Bowyer, K. W., Hall, L. O. & Kegelmeyer, W. P. SMOTE: Synthetic Minority Over-sampling Technique. J. Artif. Intell. Res. 16, 321–357 (2002).
Raschka, S. & Mirjalili, V. Python Machine Learning: Machine Learning and Deep Learning with Python, Scikit-learn, and TensorFlow 2. (India, Packt Publishing, 2019).
Cortes, C. & Vapnik, V. Support-vector networks. Mach. Learn. 20, 273–297 (1995).
Cervantes, J., Garcia-Lamont, F., Rodríguez-Mazahua, L. & Lopez, A. A comprehensive survey on support vector machine classification: applications, challenges and trends. Neurocomputing 408, 189–215 (2020).
Nohara, Y., Matsumoto, K., Soejima, H. & Nakashima, N. Explanation of machine learning models using improved Shapley additive explanation. 546 https://doi.org/10.1145/3307339.3343255 (2019).
Nohara, Y., Matsumoto, K., Soejima, H. & Nakashima, N. Explanation of machine learning models using Shapley additive explanation and application for real data in hospital. Comput. Methods Prog. Biomed. 214, 106584 (2022).
Acknowledgements
This work was conducted with support from Harvard Catalyst, The Harvard Clinical and Translational Science Center (National Center for Advancing Translational Sciences, National Institutes of Health award UL1TR002541), and financial contributions from Harvard University and its affiliated academic healthcare centers. The content is solely the responsibility of the authors and does not necessarily represent the official views of Harvard Catalyst, Harvard University and its affiliated academic healthcare centers, or the National Institutes of Health. Dr. Bar is supported by a grant from the US-Israel Binational Science Foundation to Drs. Yassour and Mitchell.
Author information
Contributions
Conceptualization, C.M. and D.K.; Methodology, O.B., S.V., O.B., C.M., M.Y., J.C., and D.K.; Data acquisition, S.V., O.B., I.S., C.B., M.M., J.C., and C.M.; Resources, O.B., S.V., O.B., C.M., M.Y., and D.K.; Data curation, O.B., O.B., J.X., J.E., and M.Y.; Statistical analysis, O.B. and K.J.; Writing – original draft preparation, O.B., S.V., and C.M.; Writing – review and editing, O.B., S.V., O.B., M.Y., F.P., and C.M.; Visualization, O.B., S.V., O.B., M.Y., and C.M.; Supervision, C.M. and M.Y.; Project administration, S.V.; All authors critically revised and approved the final version of the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Bar, O., Vagios, S., Barkai, O. et al. Harnessing vaginal inflammation and microbiome: a machine learning model for predicting IVF success. npj Biofilms Microbiomes 11, 95 (2025). https://doi.org/10.1038/s41522-025-00732-8