Introduction

Recurrent pregnancy loss (RPL) is a distressing pregnancy disorder defined as two or more clinically recognized pregnancy losses before 20–24 weeks of gestation1. RPL affects about 2.5% of women who are trying to conceive. Recurrent miscarriage can be caused by a variety of conditions, including genetic, anatomical, endocrine, and immune-related disorders2. However, the cause of about 50% of RPL cases remains unknown, and the etiology of RPL has not yet been fully clarified3. As a result, progress toward accurate diagnosis and early prediction of recurrent miscarriage has been hindered.

The endometrial immune system is essential to the success of pregnancy because the embryo functions as a semi-allograft in the maternal host. Early in pregnancy, roughly 40% of decidual cells are endometrium-resident immune cells, which serve regulatory roles during embryo implantation to ensure maternal tolerance of the embryo4. Decidual lymphocytes play a crucial role in the early stages of pregnancy, among other things by removing apoptotic cells, detecting infections, promoting trophoblast invasion, and controlling decidualization. Activated natural killer (NK) cells release growth-promoting factors that support fetal development, and CD49a+ Eomes+ NK cells recognize HLA-G expressed on extravillous trophoblasts5,6. Regulatory T (Treg) cells promote tolerance between fetal and maternal cells, and depletion of CD4+CD25+ Treg cells results in pregnancy loss in mice4,7. Furthermore, tolerogenic dendritic cells (DCs), which induce the differentiation of Treg cells in the endometrium and other tissues, were markedly reduced in the endometrium of women with RPL during the mid-luteal phase8,9. Depletion of DCs in the endometrium also interferes with embryo implantation and leads to early embryo resorption, which is related to impaired decidualization and reduced vasodilation10. Circulating monocytes infiltrate the decidua under the influence of cytokines and chemokines. They differentiate into macrophages or DCs at the onset of pregnancy and participate in regulating maternal–fetal immunity11. Collectively, these findings support the hypothesis that compromised endometrial immunity is a common etiology of RPL.

Recently, major functional genes in several diseases have been discovered using microarray technology and thorough bioinformatics analysis, and these genes can then be employed as diagnostic and predictive biomarkers12,13,14. Machine learning (ML) techniques are frequently used to find disease biomarkers. Support Vector Machine-Recursive Feature Elimination (SVM-RFE) can account for the magnitude and direction of interactions between predictors and outcomes15. CIBERSORT, a gene expression-based deconvolution technique, is used to evaluate immune cell infiltration16. To the best of our knowledge, however, SVM-RFE, LASSO, Random Forest (RF), and CIBERSORT have not been combined to identify putative biomarkers of RPL and to estimate immune cell infiltration in RPL patients.

The purpose of this study was to screen for novel biomarkers in the endometrium associated with RPL using ML techniques. In addition, we used the CIBERSORT algorithm to assess immune cell infiltration in RPL and analyzed the relationship between biomarker expression and immune cell infiltration.

Materials and methods

Preprocessing and collection of data

The workflow of the study is shown in Fig. 1. GSE165004 and GSE26787 were downloaded from the Gene Expression Omnibus (GEO) database at NCBI17. GSE165004 was based on the GPL16699 platform and contained endometrial tissues from 24 RPL women and 24 controls. GSE26787 was based on the GPL570 platform and consisted of endometrial samples from 5 RPL patients and 5 controls. The two datasets were merged and batch-normalized with the R packages "limma" and "sva"18. Using R, the 58 merged samples (29 RPL and 29 controls) were then randomly divided into a training cohort (n = 40) and a testing cohort (n = 18), a ratio of roughly two to one, for the following analysis.
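
The random division into cohorts can be illustrated with a minimal Python sketch (not the authors' R code). The sample identifiers, the `split_cohort` helper, the seed, and the assumption that 20 samples per group are drawn for training (suggested by the training-set composition reported in the Results) are all illustrative.

```python
import random


def split_cohort(sample_ids, labels, n_train_per_group, seed=0):
    """Randomly split samples into training/testing, drawing the same
    number of training samples from every group (an assumed design)."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train, test = [], []
    for group in sorted(set(labels)):
        members = [s for s, g in zip(sample_ids, labels) if g == group]
        rng.shuffle(members)
        train.extend(members[:n_train_per_group])
        test.extend(members[n_train_per_group:])
    return train, test


# Hypothetical identifiers for the merged data: 24 + 5 RPL, 24 + 5 controls.
ids = [f"RPL_{i}" for i in range(29)] + [f"CTRL_{i}" for i in range(29)]
groups = ["RPL"] * 29 + ["control"] * 29

train_ids, test_ids = split_cohort(ids, groups, n_train_per_group=20)
print(len(train_ids), len(test_ids))  # 40 18
```

Drawing a fixed number per group keeps the training set balanced; a plain unstratified shuffle would also be "random" but would not guarantee 20 samples per group.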

Figure 1

The workflow of the study.

Calculation of differentially expressed genes

Using the "limma" R package, we identified genes differentially expressed between RPL and control tissues; DEGs were defined by |log2FC| > 1.0 and P-value < 0.05. To visualize the DEGs, heatmaps and volcano plots were produced with the "pheatmap" and "ggplot2" packages in R.
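
The thresholding step can be sketched as follows. This is an illustrative Python snippet, not the limma workflow itself; the gene names and statistics are made-up placeholders, and `classify_degs` is a hypothetical helper.

```python
def classify_degs(results, lfc_cutoff=1.0, p_cutoff=0.05):
    """Split genes into up- and down-regulated DEGs using
    |log2FC| > lfc_cutoff and P-value < p_cutoff."""
    up, down = [], []
    for gene, (log2fc, p_value) in results.items():
        if p_value < p_cutoff and abs(log2fc) > lfc_cutoff:
            (up if log2fc > 0 else down).append(gene)
    return sorted(up), sorted(down)


# Placeholder statistics: gene -> (log2 fold change, P-value).
toy_results = {
    "GENE_A": (1.8, 0.003),   # passes both cutoffs, up-regulated
    "GENE_B": (0.4, 0.001),   # significant but fold change too small
    "GENE_C": (-1.3, 0.020),  # passes both cutoffs, down-regulated
    "GENE_D": (2.1, 0.200),   # large fold change but not significant
}

up_genes, down_genes = classify_degs(toy_results)
print(up_genes, down_genes)  # ['GENE_A'] ['GENE_C']
```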

Enrichment assessment of DEGs

The Gene Ontology (GO), Disease Ontology (DO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways19 were assessed for functional enrichment using the "clusterProfiler" package of R to discover the underlying biological functions of DEGs.
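
Over-representation analysis of this kind rests on a hypergeometric test: given N background genes of which K belong to a term, how likely is it that n DEGs contain at least k members of that term? The sketch below shows that calculation in Python under these assumptions; the counts in the usage example are hypothetical, and the multiple-testing correction normally applied across terms is omitted.

```python
from math import comb


def hypergeom_enrichment_p(k, n, K, N):
    """Upper-tail probability P(X >= k) for X ~ Hypergeometric:
    N background genes, K annotated to the term, n DEGs drawn."""
    upper = min(n, K)
    total = comb(N, n)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / total


# Hypothetical counts: 42 DEGs, 5 of which fall in a 200-gene term,
# against a background of 20,000 genes.
p = hypergeom_enrichment_p(k=5, n=42, K=200, N=20000)
print(f"enrichment P-value: {p:.3g}")
```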

Selection of optimal feature genes (OFGs)

To identify OFGs, we applied three machine learning (ML) algorithms. A least absolute shrinkage and selection operator (LASSO) binary logistic regression model was constructed with the R package "glmnet"; the optimal penalty parameter was chosen as the value that minimized the tenfold cross-validation error20. SVM-RFE, a nonlinear support vector machine approach implemented with the R packages "e1071", "kernlab", and "caret", was also applied to determine the OFGs15. In addition, we used the R package "randomForest" to build a forest of 500 trees and retained the top 5 key genes21. Finally, a Venn diagram was used to show the OFGs shared by the three ML algorithms.
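
The final intersection step can be illustrated with a short Python sketch. The gene labels (`g1` to `g7`) and importance scores are placeholders, and `top_k` is a hypothetical helper; only the logic (rank by importance, keep the top genes, intersect the three selections) reflects the procedure described above.

```python
def top_k(importance, k):
    """Return the k genes with the highest importance score."""
    return set(sorted(importance, key=importance.get, reverse=True)[:k])


# Placeholder random-forest importance scores (e.g. MeanDecreaseGini).
rf_importance = {"g1": 4.1, "g2": 3.7, "g3": 3.2, "g4": 2.9,
                 "g5": 1.1, "g6": 2.5, "g7": 0.8}

lasso_genes = {"g1", "g2", "g3", "g4", "g5"}    # genes with non-zero LASSO coefficients
rf_genes = top_k(rf_importance, k=5)            # top 5 genes by importance
svm_genes = {"g1", "g2", "g3", "g4", "g6", "g7"}  # genes retained by SVM-RFE

# Genes selected by all three algorithms (the centre of the Venn diagram).
ofgs = lasso_genes & rf_genes & svm_genes
print(sorted(ofgs))  # ['g1', 'g2', 'g3', 'g4']
```

Requiring agreement across three selectors is conservative: a gene favoured by only one algorithm is dropped, which trades sensitivity for robustness.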

Diagnostic quality verification of OFGs

The screened OFGs were verified in the testing set. To visualize the expression of the OFGs in RPL and control women of the testing set, we constructed boxplots using the R packages "ggplot2" and "ggpubr". The predictive value of each OFG was assessed by the area under the curve (AUC) of the receiver operating characteristic (ROC) curve, computed with the "pROC" package in R22. OFGs with an AUC above 0.85 were regarded as prospective biomarkers with high predictive and diagnostic capability.
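
For a single gene, the AUC equals the probability that a randomly chosen RPL sample scores higher than a randomly chosen control (ties counted as one half). The Python sketch below computes it that way; it is not the pROC implementation, the expression values are invented, and the direction-agnostic variant is an assumption about how down-regulated genes would be handled.

```python
def roc_auc(scores, labels):
    """AUC as the fraction of (case, control) pairs in which the case
    has the higher score; ties contribute 0.5. labels: 1 = case, 0 = control."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((c > k) + 0.5 * (c == k) for c in cases for k in controls)
    return wins / (len(cases) * len(controls))


def roc_auc_any_direction(scores, labels):
    """Report the AUC for whichever direction separates the groups better,
    so a gene that is lower in cases still yields an AUC >= 0.5."""
    auc = roc_auc(scores, labels)
    return max(auc, 1.0 - auc)


# Invented expression values for a gene that is lower in cases (label 1).
expression = [6.2, 5.9, 6.5, 4.1, 4.8, 5.9]
is_case = [0, 0, 0, 1, 1, 1]

auc = roc_auc_any_direction(expression, is_case)
print(f"AUC = {auc:.3f}", "meets 0.85 cutoff" if auc > 0.85 else "below cutoff")
```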

Assessment of immune cell Infiltration

Using the CIBERSORT algorithm (https://cibersort.stanford.edu/), we estimated the infiltrating abundances of 22 immune cell types and compared them between groups. The results were visualized as heatmaps and violin plots produced with the "corrplot" and "ggplot2" packages in R16. The relationship between OFGs and immune cells was estimated with the Spearman correlation coefficient.
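
The Spearman coefficient is the Pearson correlation of the ranks of the two variables. A minimal standard-library Python sketch is given below; the expression values and cell fractions are invented for illustration, and no significance test is included.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # mean of the 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Invented data: gene expression vs. estimated monocyte fraction per sample.
gene_expression = [5.1, 4.8, 6.0, 5.5, 4.2]
monocyte_fraction = [0.10, 0.12, 0.03, 0.07, 0.15]
print(round(spearman(gene_expression, monocyte_fraction), 3))  # -1.0
```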

Statistical analysis

All statistical calculations and graphs were produced in R (version 4.2.2). A P-value of less than 0.05 was considered statistically significant.

Results

Identification of DEGs

The workflow of the study is shown in Fig. 1. DEGs were identified in the training dataset, which contained endometrial tissues from 20 RPL patients and 20 controls. A comparison between RPL and control women revealed 42 DEGs, of which 28 were upregulated and 14 were downregulated. Heatmaps and volcano plots were produced to display the results (Fig. 2).

Figure 2

Differentially expressed genes (DEGs) identified between RPL and control women. (A) Heatmap. (B) Volcano plot.

Enrichment assessment of DEGs

We performed GO, DO, and KEGG enrichment analysis of the DEGs with the "clusterProfiler" package in R. In the GO analysis, the most enriched biological process, cellular component, and molecular function in RPL patients were anion transport, collagen-containing extracellular matrix, and receptor-ligand activity, respectively (Fig. 3A). In the KEGG analysis, the DEGs were primarily enriched in the Ras signalling pathway and the PI3K–Akt signalling pathway (Fig. 3B). Furthermore, DO analysis revealed close associations of the DEGs with myocardial infarction, bone remodelling disease, and peptic ulcer disease in RPL women (Fig. 3C).

Figure 3

Functional enrichment analysis of DEGs. (A) GO analysis of the potential functions of the DEGs, covering CC, MF, and BP. (B) KEGG pathway analysis of the DEGs between RPL patients and controls. (C) DO analysis of the disease associations of the DEGs.

Screening and validating OFGs

Among the RPL-related DEGs, the LASSO and RF algorithms each selected five genes, and the SVM-RFE algorithm selected six genes. The four intersecting OFGs were zinc finger protein 90 (ZNF90), putative translationally controlled tumor protein-like protein TPT1P8 (TPT1P8), fibroblast growth factor 2 (FGF2), and family with sequence similarity 166, member B (FAM166B) (Fig. 4). In RPL, ZNF90, TPT1P8, and FGF2 were down-regulated and FAM166B was up-regulated. In the testing set, the expression of all OFGs except FAM166B was likewise noticeably decreased in RPL women (Fig. 5A–D). To assess their diagnostic effectiveness, we produced ROC curves for the OFGs in the testing cohort. All OFGs displayed excellent diagnostic performance, with AUCs exceeding 0.88 (Fig. 5E–H). Consequently, ZNF90, TPT1P8, FGF2, and FAM166B were identified as promising biomarkers for diagnosing RPL.

Figure 4

Screening underlying OFGs by machine learning. (A) Identifying biomarkers by LASSO algorithm. (B) Random Forest algorithm treated the top 5 genes in terms of MeanDecreaseGini score as OFGs. (C) SVM-RFE algorithm filters out 8 OFGs. (D) Venn diagram displaying four OFGs intersected by machine learning algorithms.

Figure 5

Validation of the OFGs. (A–D) Expression of ZNF90, TPT1P8, FGF2 and FAM166B in RPL patients compared to controls. (E–H) Diagnostic effectiveness of ZNF90, TPT1P8, FGF2 and FAM166B in ROC curves.

Assessment of immune cell infiltration

We next applied the CIBERSORT algorithm to estimate the relative proportions of 22 immune cell types. As shown in the bar chart, the proportions of immune cell subpopulations differed from sample to sample (Fig. 6A). The violin plot showed that monocyte infiltration was significantly higher in the RPL group, whereas γδ T cell infiltration was significantly higher in the control group (Fig. 6B). Moreover, the correlation analysis among immune cells revealed that regulatory T cells showed the strongest positive correlation with M0 macrophages, whereas CD8 T cells were negatively correlated with resting memory CD4 T cells (Fig. 6C).

Figure 6

Proportion and association of immune cell infiltration. (A) Proportions of 22 immune cell subtypes in RPL and control women. (B) Violin plot showing differences in immune cells between RPL and control women. (C) Correlation analysis among the 22 immune cell types.

Relationships between immune cells and biomarkers

Correlation analyses were performed to estimate the relationships between immune cells and biomarkers. The down-regulated genes ZNF90 and TPT1P8 were positively correlated with γδ T cells (ZNF90: R = 0.324, P = 0.041; TPT1P8: R = 0.328, P = 0.034) but negatively correlated with monocytes (ZNF90: R = −0.649, P < 0.001; TPT1P8: R = −0.418, P = 0.007). M2 macrophages and plasma cells were positively correlated with down-regulated FGF2 and negatively correlated with up-regulated FAM166B, whereas CD4 resting memory T cells were inversely related to these two genes. In addition, ZNF90 was associated with eosinophils and naive B cells, and FAM166B was associated with monocytes and resting dendritic cells (Fig. 7).

Figure 7

Visualization of Spearman correlation between 4 OFGs and immune cells in RPL patients. (A) ZNF90. (B) TPT1P8. (C) FGF2. (D) FAM166B.

Discussion

RPL remains a major health concern in reproductive medicine and places a significant psychological burden on affected individuals, because 50% of RPL is idiopathic and evidence-based therapy is limited. Machine learning algorithms are well suited to analyzing underlying relationships in high-dimensional data and to selecting the most informative genes among all biologically relevant DEGs. The discovery of new genes as potential biomarkers, together with the study of immune cell infiltration features, may substantially improve the early diagnosis and prediction of RPL. In this study, we identified a total of 42 DEGs, of which 28 genes were upregulated and 14 were downregulated, based on the gene expression datasets of RPL and normal controls. Functional enrichment analysis showed that these DEGs were related to MAPK signaling pathway, PI3K-Akt signaling pathway, inflammation and immune responses. Then, based on three machine learning algorithms (LASSO regression model, RF algorithm and SVM-RFE algorithm), we screened out 4 optimal feature genes (FGF2, FAM166B, ZNF90 and TPT1P8). Finally, we examined the relationship between the 4 OFGs and immune cells estimated with the CIBERSORT algorithm.

Fibroblast growth factor (FGF) regulates cell fate, angiogenesis, immunity, and metabolism by signalling through its receptors FGFR1, FGFR2, FGFR3, or FGFR4. Dysregulation of FGF signaling has been implicated in human diseases such as lung, breast, and stomach cancer, as well as achondroplasia23. Furthermore, FGF and its receptors are major factors in fetal and placental angiogenesis. FGF signaling dynamically regulates immunity as well as inflammation and tissue repair by immune cells. For example, FGF2 can promote endothelial cell proliferation and migration through FGFR1/2 signaling and VEGF/ANGPT2 secretion24,25. FGF1 and FGF2 promote neutrophil chemotaxis to damaged tissues through FGFR2 signaling26. In addition, Cox reported that implantation of exogenous FGF-2 into the quail embryonic environment induced angiogenic cells and patterned blood vessel formation27. Thus, low FGF2 expression may contribute to RPL by hindering embryonic angiogenesis. FAM166B has yet to be studied in depth. Previous studies involving FAM166B, which focused mainly on its expression in multiple symmetric lipidosis and skeletal muscle, showed that FAM166B is highly expressed in adrenal glands and ciliated cells, but its precise function remains unclear28. A recent study showed that FAM166B expression correlates with breast cancer prognosis. In that study, the expression level of FAM166B in breast cancer was negatively correlated with the level of macrophage infiltration and positively correlated with CD4+ T cells, suggesting that the recruitment and regulation of infiltrating immune cells may be mediated by FAM166B in breast cancer29. Zinc finger proteins (ZFPs) are the largest family of transcription factors, characterized by finger-like DNA-binding domains, and play an important role in metabolic processes, autophagy, apoptosis, immune responses, differentiation, and stem cell maintenance30.
However, only a few recent studies have reported the involvement of ZFPs in immune-related processes, such as immune response, immune homeostasis, and cytokine production31,32. ZFPs bind zinc, which is involved in oocyte development. Abu-Soud reported that zinc deficiency leads to high ROS production in oocytes, which affects oocyte quality and female fertility by interfering with physiological antioxidant mechanisms that act on biomolecular, protein and cellular processes33. In the present study, the low expression of ZFPs may have reduced zinc-binding capacity, which may contribute to RPL. Currently, there are few studies on TPT1P8, also known as FKSG2. A recent report found that significantly lower expression of TPT1P8 in the anterior cingulate cortex (ACC) of Cushing's disease (CD) patients was associated with immune function34. Accordingly, the OFGs screened in this study are involved in signal transduction, inflammation and immune responses, which may contribute to RPL occurrence and progression.

Based on the CIBERSORT analysis, we found that RPL and the control group had significantly different levels of immune cell infiltration, especially of monocytes and γδ T cells. RPL samples had higher levels of monocyte infiltration. Consistent with this finding, previous studies showed that women with RPL had higher monocyte concentrations in peripheral blood than normal fertile controls35. During normal pregnancy, immune cells at the fetal-maternal interface, such as uterine NK cells and macrophages, increase. Monocytes are short-lived cells that arise from monocyte precursors in the bone marrow and make up approximately 5–10% of the total number of circulating white blood cells36. Accumulating evidence suggests that circulating monocytes are recruited to the decidua at the onset of pregnancy to generate macrophages with essential immune functions. Thus, decidual macrophages contribute to maternal tolerance to fetal antigens11. Monocytes also present antigens to T cells, which regulate the adaptive immune response. In addition, they are involved in fundamental processes of a successful pregnancy, such as trophoblast invasion and tissue and vascular remodeling37. Our results also showed that γδ T cells were decreased in RPL samples compared with normal controls. According to the T cell receptor (TCR) they express, T cells are divided into αβ T cells and γδ T cells, which express αβ TCR and γδ TCR, respectively38. γδ T cells play numerous roles in establishing and maintaining immune tolerance in early pregnancy but are often overlooked. γδ T cells are increased in the early decidua of normal pregnancy. They secrete anti-inflammatory cytokines such as IL-10 and TGF-β and transduce negative signals by expressing regulatory molecules (PD-1, Tim-3, and CD160)39,40.
From this, we speculate that dysregulation of endometrial monocytes and γδ T cells in women with RPL biases the maternal immune system toward a pro-inflammatory state, which may ultimately lead to RPL.

In our study, according to the correlation analysis, the four characteristic genes screened are related to immune cell infiltration of RPL. In Fig. 7, the expression of four candidate genes showed strong correlation with monocytes, but weak correlation with other differentially infiltrated immune cells. Comins-Boo suggests that monocyte dysregulation is a major factor contributing to RPL. We therefore hypothesize that key genes interacting with monocytes may promote the development of RPL. However, the specific mechanism for the weak correlation of γδ T cells and plasma cells with key genes is unknown and may be related to the small sample size.

In recent years, the integration of microarray technology, bioinformatics analysis, and ML algorithms has become an active area of research for biomarker screening, diagnosis prediction, and prognosis evaluation in complex diseases. Moreover, computational biology methods can provide a basis for the design of subsequent experiments. In this study, the LASSO model, RF algorithm and SVM-RFE algorithm were combined to identify potential biomarkers of RPL, which has rarely been done before. This study, however, is based on limited data, and more external data, clinical samples, and prospective clinical trials are needed in the future to verify the results.

Conclusion

In this study, we found that ZNF90, TPT1P8, FGF2 and FAM166B could serve as candidate biomarkers for RPL, and we explored their correlations with immune cells in the pathogenesis of RPL. In addition, the differential infiltration of monocytes and γδ T cells is related to the onset and progression of RPL. Future studies with larger sample sizes and more predictive clinical measures are necessary for validating these results.