Background & Summary

Repeated implantation failure (RIF) is defined as the failure of good-quality embryos to achieve a clinical pregnancy after transfer in multiple in vitro fertilization (IVF) treatment cycles1,2. It is estimated that RIF occurs in approximately 5–15% of couples undergoing IVF3,4. The causes of RIF are complex and can be broadly categorized into maternal and embryonic factors5. Compared with the assessment of embryo quality, the assessment of maternal factors, particularly the endometrium, is far more limited. Therefore, a thorough understanding of endometrial function is critical for improving the success rate of IVF treatment6,7,8.

Abnormal gene expression in the endometrium is frequently associated with RIF. Bulk RNA sequencing and microarray analyses of the endometrium have identified numerous dysregulated biomarkers in patients with RIF, including key mRNA and long noncoding RNA (lncRNA) hub genes9,10,11. Furthermore, since endometrial receptivity is highly correlated with reproductive outcomes7, some of these biomarkers can be used in endometrial receptivity array (ERA) testing12. Among the differentially expressed genes (DEGs), some biomarkers are related to immune factors that contribute significantly to RIF through abnormal immune responses13, immune infiltration11, imbalances of immune cell types14,15, and endometrial disruption16. However, the results of these studies are notably inconsistent, and the underlying biological mechanisms remain poorly understood17.

Single-cell RNA sequencing (scRNA-seq) and spatial transcriptomics offer significant advantages in revealing cell type heterogeneity, spatial cell locations and communications, and cell-specific gene expression18,19,20. Recently, many scRNA-seq studies have focused on various aspects of the endometrium, including constructing cell maps of normal or disordered endometrial tissue21,22,23,24, elucidating cellular compositions and gene expression dynamics throughout the menstrual cycle25, and exploring cell-cell crosstalk between the embryo and the endometrium, particularly the role of the trophectoderm in embryo implantation25,26,27. Additionally, several scRNA-seq studies have focused on RIF and identified abnormal gene expression patterns28,29,30. To further investigate the communications among these cells, spatial transcriptomics has been employed in some studies. For instance, spatial and single-cell transcriptomics have been used to characterize the spatial gene expression features of the normal human endometrium throughout the menstrual cycle31 or in vivo and in vitro32. Other studies have aimed to create a spatiotemporal atlas of the human maternal-fetal interface33. While these advances offer a more comprehensive understanding of the endometrium, there remains a notable lack of spatial transcriptomics studies that characterize cell type profiles, cell co-localization and gene expression atlases in the context of RIF.

Therefore, in this study we performed 10x Visium spatial transcriptomics (ST) sequencing on 8 endometrial samples, 4 from normal individuals and 4 from patients with RIF. After quality control and unsupervised clustering, we divided the spots into 7 niches with specifically expressed genes. Furthermore, we characterized the cellular composition of the spots by integrating public scRNA data, which revealed that the dominant cell type is epithelial cells. To our knowledge, this is the first study employing ST technology in RIF. We believe that this dataset will provide a valuable reference for further studies on the molecular mechanisms and treatment biomarkers of RIF.

Methods

Patient enrollment and sample collection

In this study, endometrial samples were collected after obtaining written consent from each participant. The research protocol was approved by the Medical Ethics Committee of The 960th Hospital of the PLA Joint Logistics Support Force. The study included 4 RIF subjects with a history of failure to achieve clinical pregnancy after ≥ 3 transfers of good-quality embryos34. The control group (referred to as the CTR group) consisted of 4 multiparous women without a history of miscarriage. All participants were confirmed to be free of uterine pathologies (including endometriosis and adenomyosis), endocrine, metabolic and autoimmune diseases, and infection. Subjects were required to be ≤ 35 years of age with a BMI < 28 kg/m². Demographic details of the subjects recruited for spatial RNA sequencing in both the RIF and CTR groups are listed in Table 1. The luteinizing hormone (LH) surge (LH + 0) was detected by combining transvaginal ultrasound with urinary LH dipstick testing. Endometrial samples from both the CTR and RIF groups were collected from the fundal and upper part of the uterus by Pipelle endometrial biopsy at LH + 7.

Table 1 Clinical characteristics of subjects included in the control and RIF groups.

10x Visium spatial transcriptomics sequencing

Fresh endometrial tissues were obtained, rapidly frozen in isopentane pre-chilled with liquid nitrogen, and stored at −80 °C. The fresh-frozen tissues were then cut into sections. RNA quality was first assessed, and an RNA Integrity Number (RIN) greater than 7 was required to limit the degradation of RNA molecules. The optimal tissue permeabilization time was determined based on fluorescence imaging strength. The tissue sections were carefully placed onto designated capture areas on the 10x Visium Spatial Tissue Optimization Slide. Each slide contained four capture areas, each measuring 6.5 mm × 6.5 mm and containing about 5,000 barcoded spots. Tissue sections were fixed with methanol, stained with hematoxylin and eosin (H&E), and imaged by brightfield microscopy. The tissue was permeabilized to release mRNA molecules, which were then captured by the adjacent spots. The captured mRNA was reverse transcribed to generate cDNA. Subsequently, libraries were constructed according to the standard protocol. Sequencing was performed on the Illumina NovaSeq 6000 platform in PE150 mode.

Alignment, quality control, and clustering of Visium data

The Space Ranger count pipeline (version 2.0.0) automatically aligned the spatial transcriptome data to the human reference genome (GRCh38-2020-A), detected the tissue sections, and aligned the fiducials across all slices. To construct the Seurat object, we used the Load10X_Spatial function of Seurat (version 4.3.0) to import the spatial feature-spot expression matrices, the spatial locations of the spots and the tissue slices. For each slice, spots with a gene count below 500 or a mitochondrial gene percentage exceeding 20% were excluded. Spot expression data were normalized with the SCTransform function for each slice, and all slices were then merged using the merge function. Principal component analysis (PCA) was conducted, and the top 30 principal components (PCs) were used for dimension reduction and for clustering with a resolution of 0.6. Differential gene expression analysis among spatial clusters or niches was performed using the FindAllMarkers function. Spots with similar gene expression profiles were grouped into a single niche.
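The spot-level filter described above can be illustrated with a minimal Python sketch. This is not the Seurat code used in this study; it only restates the two thresholds (gene count of at least 500 and mitochondrial percentage of at most 20%) on toy records. The field names, barcodes and values are hypothetical, and the treatment of spots lying exactly on a threshold is an assumption.

```python
from typing import Dict, List

# Thresholds taken from the text: spots with fewer than 500 detected genes
# or more than 20% mitochondrial UMIs are excluded.
MIN_GENES = 500
MAX_MITO_PERCENT = 20.0


def filter_spots(spots: List[Dict]) -> List[Dict]:
    """Return the spots that pass both quality thresholds."""
    return [
        s for s in spots
        if s["n_genes"] >= MIN_GENES and s["percent_mito"] <= MAX_MITO_PERCENT
    ]


# Toy records; field names and values are hypothetical, not real data.
toy_spots = [
    {"barcode": "spot_a", "n_genes": 3100, "percent_mito": 5.2},
    {"barcode": "spot_b", "n_genes": 420, "percent_mito": 4.0},    # too few genes
    {"barcode": "spot_c", "n_genes": 2500, "percent_mito": 31.0},  # mito too high
    {"barcode": "spot_d", "n_genes": 500, "percent_mito": 20.0},   # on both thresholds
]
kept_spots = filter_spots(toy_spots)
kept_barcodes = [s["barcode"] for s in kept_spots]
print(kept_barcodes)
```

In this sketch a spot lying exactly on a threshold is retained, because the text excludes only spots below 500 genes or above 20% mitochondrial content.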

Public single-cell data processing

Public single-cell data (GSE183837)28 were downloaded from the GEO database. In total, 9 samples were obtained, from 3 normal individuals and 6 RIF patients. The expression matrices from these samples were used for analysis. Several preprocessing steps were performed to ensure data quality. First, poor-quality cells were filtered out if they met any of the following criteria: a number of detected genes outside the range 500–5000, a UMI count below 800, or a mitochondrial gene percentage above 20%. In addition, suspected doublets were removed using DoubletFinder (v2.0.3)35. After quality control, one sample, RIF1, contained far more cells (13917 cells) than the others (< 8200 cells per sample), so we removed this sample to mitigate potential technical variation. Finally, 31348 cells from 8 samples were retained for further analysis using Seurat (v4.3.0.1). The analysis comprised selecting highly variable genes (HVGs), removing batch effects with Harmony36, clustering cells, and annotating cell types or subtypes based on the expression of canonical markers.
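To make the three cell-level exclusion criteria concrete, the following Python sketch reports which criteria a cell violates and tallies the reasons over a toy set of cells. It is an illustration under stated assumptions, not the pipeline used here: the reason labels and the toy values are hypothetical, and doublet removal with DoubletFinder is not reproduced.

```python
from collections import Counter
from typing import List


def qc_failures(n_genes: int, n_umis: int, percent_mito: float) -> List[str]:
    """List the exclusion criteria (from the text) that a cell violates."""
    reasons = []
    if n_genes < 500 or n_genes > 5000:
        reasons.append("gene_count_out_of_range")
    if n_umis < 800:
        reasons.append("low_umi_count")
    if percent_mito > 20.0:
        reasons.append("high_mito_percent")
    return reasons


# Toy cells as (n_genes, n_umis, percent_mito); values are hypothetical.
toy_cells = [
    (1800, 4200, 6.5),    # passes all criteria
    (350, 900, 3.0),      # too few genes
    (6200, 30000, 4.0),   # too many genes
    (700, 650, 25.0),     # low UMI count and high mitochondrial percentage
]
failure_tally = Counter(r for cell in toy_cells for r in qc_failures(*cell))
n_pass = sum(1 for cell in toy_cells if not qc_failures(*cell))
print(dict(failure_tally), n_pass)
```

A cell is removed as soon as any one criterion is violated, so a single cell can contribute to more than one entry of the tally.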

Integration of spatial and single-cell data

The conditional autoregressive-based deconvolution (CARD) package (v1.1)37 was used to deconvolve the mixture of cell types within the spots of the spatial data. CARD employs a non-negative matrix factorization model to estimate the cell type proportions for each spot, and it requires single-cell reference data to perform the deconvolution. We computed the cellular composition of each sample by averaging the cell type proportions over all of its spots.
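The per-sample averaging step can be sketched as follows. The sketch assumes that the per-spot proportions have already been estimated (for example by CARD) and exported as one record per spot; the record layout, sample names and values are hypothetical, and no CARD function is called.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# One record per spot: (sample name, {cell type: estimated proportion}).
SpotRecord = Tuple[str, Dict[str, float]]


def mean_proportions_per_sample(spots: List[SpotRecord]) -> Dict[str, Dict[str, float]]:
    """Average each cell type's proportion over all spots of each sample."""
    sums: Dict[str, Dict[str, float]] = defaultdict(lambda: defaultdict(float))
    n_spots: Dict[str, int] = defaultdict(int)
    for sample, proportions in spots:
        n_spots[sample] += 1
        for cell_type, value in proportions.items():
            sums[sample][cell_type] += value
    # Dividing by the number of spots treats a cell type that is absent
    # from a spot's record as having proportion 0 in that spot.
    return {
        sample: {ct: total / n_spots[sample] for ct, total in per_type.items()}
        for sample, per_type in sums.items()
    }


# Toy proportions; sample names and values are hypothetical.
toy_records = [
    ("CTR1", {"epithelia": 0.8, "fibroblasts": 0.2}),
    ("CTR1", {"epithelia": 0.6, "fibroblasts": 0.4}),
    ("RIF1", {"epithelia": 0.5, "fibroblasts": 0.25, "SMCs": 0.25}),
]
sample_means = mean_proportions_per_sample(toy_records)
print(sample_means)
```

Because the proportions in each spot sum to 1, the averaged proportions of each sample also sum to 1 (up to floating-point error).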

Data Records

The raw and processed spatial transcriptomics data generated in this study have been deposited in the Gene Expression Omnibus (GEO) database38 under the accession number GSE287278.

Technical Validation

Quality metrics of the spatial transcriptomics data

10x Genomics Visium sequencing data and H&E images were processed automatically by Space Ranger (v2.0.0). Inspection of the FASTQ files showed that the read-pair number among the eight samples was about 3 × 10⁸ and that the sequencing saturation was over 90%, indicating that our sequencing depth was sufficient to detect gene expression. The sequencing data showed high accuracy, with Q30 values for the barcode, UMI, and RNA read all exceeding 90% (Table 2). Next, we summarized the quality metrics of the generated spatial libraries (Table 3). The number of spots under tissue varied from 751 to 2018. The median gene and UMI counts per spot across the samples were over 2000 and 4000, respectively. The rate of reads mapped to the genome was over 90% (Table 3).

Table 2 The detailed QC metrics of the spatial transcriptomics FASTQ files for the eight samples.
Table 3 The detailed QC metrics of the spatial transcriptomics libraries for the eight samples.

After applying the screening criteria (gene counts greater than 500 and mitochondrial gene percentages less than 20%), a total of 10131 spots were retained. For these high-quality spots from all eight samples, the median feature count was 3156, the average median UMI count was 6860, the average median gene count was 3074, and the average median percentage of UMI counts belonging to mitochondrial genes was 5.5%. There were no significant differences in the distributions of UMI counts (nCount_Spatial), gene counts (nFeature_Spatial) and mitochondrial gene percentages (percent_mito) between the control (CTR) and RIF samples (Fig. 1a). Moreover, as expected, the correlation between nCount_Spatial and percent_mito was very weak (R = −0.15), whereas the correlation between nCount_Spatial and nFeature_Spatial was very high (R = 0.98) (Fig. 1b). Finally, inspection of the H&E images revealed some morphological differences among the eight samples (Fig. 1c).
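For readers who wish to recompute the correlation coefficients from the per-spot metadata, a minimal Python sketch is given below. It assumes that R denotes the Pearson correlation coefficient, which is not stated explicitly in the text, and it uses hypothetical toy values rather than the real nCount_Spatial and nFeature_Spatial vectors.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / math.sqrt(var_x * var_y)


# Toy per-spot values (hypothetical): UMI counts and detected gene counts.
toy_umi_counts = [1000, 2000, 4000, 8000]
toy_gene_counts = [800, 1500, 2600, 4100]
r_umi_gene = pearson_r(toy_umi_counts, toy_gene_counts)
print(round(r_umi_gene, 3))
```

A strong positive value is expected for UMI counts versus gene counts, since spots that capture more transcripts generally detect more genes.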

Fig. 1

Quality control of the spatial transcriptomics dataset of the endometrium from four normal samples and four repeated implantation failure patients. (a) Violin plots illustrating the distributions of the number of unique molecular identifiers (UMIs; nCount_Spatial), the number of genes (nFeature_Spatial) and the percentage of mitochondrial genes (percent_mito) in each spot of the 8 samples. (b) Scatter plots showing the relationships between the UMI counts and the percentage of mitochondrial genes (left), and between the UMI counts and the gene counts (right). (c) Hematoxylin and eosin (H&E) staining images for the eight samples.

Unsupervised clustering of the spatial transcriptomics data

To construct the spatial transcriptomics atlas, we clustered all spots from the RIF and CTR groups based on their gene expression profiles. Following identification of highly variable genes in the spatial data, dimension reduction and unsupervised clustering, 12 clusters with distinct gene expression patterns were identified and visualized in a 2D UMAP plot (Fig. 2a). Sample and group information was also visualized in 2D UMAP plots, and batch effects were not observed in our data, as most clusters contained spots from multiple samples (Fig. 2b,c). We merged the clusters into 7 niches based on similar gene expression patterns (Fig. 2d), and visualized the top 6 marker genes for each niche in a heatmap (Fig. 2e). In detail, Niche1 was characterized by high expression of CFD, C1R, and DCN. Niche2 exhibited high expression of PAEP, CXCL14 and CLDN4. Niche3 was characterized by specific expression of SPP1, CLU and GPX3. Niche4 showed specific high expression of SFRP4. Niche5 was characterized by specific expression of MMP26, TAPI2, MAP2K6, and ENPP3. VTCN1, PTGS1, LGR5 and SERPINA5 were the top markers of Niche6. Niche7 was primarily marked by smooth muscle cell (SMC) markers such as ACTA2, RGS5 and COL4A2, indicating that the spots in Niche7 are mainly SMCs. Furthermore, the spatial distributions of these niches differed among samples and groups (Fig. 3a). Finally, we quantified the niche frequencies across all samples, which showed that Niche1 and Niche2 were the primary components in 7 samples, whereas RIF4 uniquely had Niche4 as its predominant component (Fig. 3b). The frequencies of other niches, such as Niche3, Niche5 and Niche6, also varied across the eight samples (Fig. 3b).
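The niche frequency calculation can be sketched in Python as follows, assuming that each spot has already been assigned a sample label and a niche label. The toy assignments are hypothetical and are chosen only to mimic the qualitative pattern described above; they are not the real spot counts.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple


def niche_frequencies(assignments: List[Tuple[str, str]]) -> Dict[str, Dict[str, float]]:
    """Fraction of each sample's spots that belongs to each niche."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for sample, niche in assignments:
        counts[sample][niche] += 1
    return {
        sample: {niche: n / sum(c.values()) for niche, n in c.items()}
        for sample, c in counts.items()
    }


def predominant_niche(frequencies: Dict[str, float]) -> str:
    """Niche with the highest frequency (first one encountered wins ties)."""
    return max(frequencies, key=frequencies.get)


# Toy (sample, niche) labels, one per spot; hypothetical values.
toy_assignments = (
    [("CTR1", "Niche1")] * 3
    + [("CTR1", "Niche2")]
    + [("RIF4", "Niche4")] * 2
    + [("RIF4", "Niche1"), ("RIF4", "Niche2")]
)
frequencies = niche_frequencies(toy_assignments)
top_niches = {s: predominant_niche(f) for s, f in frequencies.items()}
print(top_niches)
```

Frequencies are normalized within each sample, so samples with very different numbers of spots under tissue remain comparable.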

Fig. 2

Clustering of 10x Visium spots and construction of the spatial atlas. (a) Unsupervised clustering of 10131 spots into 12 clusters, visualized by 2D Uniform Manifold Approximation and Projection (UMAP), a non-linear dimension reduction method. The 2D UMAP plots colored by sample (b) and group (c). (d) The 12 clusters were merged into 7 niches and shown in the UMAP plot. (e) Heatmap of the top 6 marker genes for each niche, based on log2(Fold Change) values.

Fig. 3

The characteristics of the seven niches in the spatial data. (a) The spatial distributions of the seven niches in the sections from the eight samples. (b) The ratios of spots belonging to the niches among the eight samples.

Spatial cellular landscapes of the endometrium in RIF

Since the 10x Visium platform captures a mixture of several cells in each spot and does not achieve single-cell resolution, we integrated the spatial data with publicly available single-cell data to deconvolve the cell components within each spot. First, after quality control of the public single-cell data, we obtained a total of 31,348 high-quality single cells from eight samples. Using unsupervised clustering with Seurat, these cells were classified into 24 clusters (Supplementary Fig. 1a), and further annotated into 12 cell types based on the expression patterns of classical known markers21,22,23,25,28 (Supplementary Fig. 1b,c). The atlas showed that the major cell types in endometrial tissues are non-immune cells, including fibroblasts (DCN, LUM), proliferating fibroblasts (abbreviated as pFibroblasts, TOP2A, MKI67), smooth muscle cells (SMCs, ACTA2, RGS5), epithelial cells (EPCAM, KRT18, KRT19), and endothelial cells (VWF, CDH15). Furthermore, epithelial cells were subclustered into three categories: unciliated epithelia1 (MIF, HINT1, CHCHD2), unciliated epithelia2 (CTNNA2) and ciliated epithelia (FOXJ1, DYNLRB2, DYDC2). Immune cell types were all present at low frequencies in the tissue, including T cells (CD3D, CD3G), NK cells (GNLY, NKG7, NCAM1, KLRC1), B cells (MS4A1, CD79A), macrophages (CD14, CD68 and C1QA) and innate lymphoid cells (ILCs, PTPRC, IL7R, KIT, AQP3) (Supplementary Fig. 1c).

We then transferred the annotated cell type information from the scRNA data to our spatial transcriptome data using CARD. We present the spatial distributions of the cellular components in each spot (Fig. 4a) and the sample-specific mean cellular frequencies across the spots within each sample (Fig. 4b). The results showed that all cell types captured in the single-cell data were also detected in our data, and that the most abundant cell component was unciliated epithelia1, with an average proportion of approximately 73.94% across all spots (Fig. 4b). Moreover, unciliated epithelia1 cells were the predominant population in 10,103 of 10,131 spots. Other cell types with relatively high average proportions included fibroblasts (14.27%), SMCs (5.15%), macrophages (2.05%), endothelia (1.66%) and unciliated epithelia2 (1.2%) (Fig. 4b). To assess the accuracy of the prediction results, we also examined the expression of canonical known markers in our spatial data using a bubble plot (Fig. 4c). The analysis revealed robust expression of fibroblast and epithelial cell markers, with high percentages of expressing spots and high average expression (Fig. 4c) as well as widespread spatial expression (Fig. 5a,b), whereas markers of other cell types were expressed at low levels (Fig. 4c). In summary, the expression patterns of these markers validated the spatial deconvolution results.
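The count of spots in which a given cell type is predominant can be obtained by taking, for each spot, the cell type with the largest estimated proportion. The Python sketch below illustrates this on toy proportions; the cell-type keys and values are hypothetical, and ties are resolved in favor of the first cell type listed, which is an assumption rather than a rule stated in the text.

```python
from typing import Dict, List


def dominant_cell_type(proportions: Dict[str, float]) -> str:
    """Cell type with the largest estimated proportion in one spot."""
    return max(proportions, key=proportions.get)


def count_spots_dominated_by(spots: List[Dict[str, float]], cell_type: str) -> int:
    """Number of spots whose dominant cell type equals `cell_type`."""
    return sum(1 for p in spots if dominant_cell_type(p) == cell_type)


# Toy per-spot proportions; hypothetical values that sum to 1 in each spot.
toy_spot_proportions = [
    {"unciliated_epithelia1": 0.74, "fibroblasts": 0.14, "SMCs": 0.12},
    {"unciliated_epithelia1": 0.55, "fibroblasts": 0.40, "SMCs": 0.05},
    {"unciliated_epithelia1": 0.30, "fibroblasts": 0.60, "SMCs": 0.10},
]
n_dominated = count_spots_dominated_by(toy_spot_proportions, "unciliated_epithelia1")
print(n_dominated, "of", len(toy_spot_proportions))
```

Reporting this count alongside the mean proportion distinguishes a cell type that is moderately abundant everywhere from one that is overwhelmingly abundant in only a few spots.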

Fig. 4

The cell type spatial atlas of the endometrium. (a) Pie charts of the cell-type compositions of the ST data deconvolved by CARD for each sample. (b) The average cell-type composition frequencies across the spots within each of the eight samples. (c) A bubble plot showing the expression patterns of the marker genes in the spatial data of the eight samples.

Fig. 5

The spatial expression of marker genes. (a) The spatial expression of the epithelial cell marker KRT19 in the H&E anatomic images of the eight samples. (b) The spatial expression of the fibroblast marker DCN in the H&E anatomic images of the eight samples.