Introduction

Ovarian follicles are the functional units of the ovary. Within the follicle, bidirectional signaling between somatic cells and the oocyte changes continuously over time to synchronize follicle development and oocyte maturation. Integrated signaling cascades coordinate oocyte maturation and regulate granulosa cell (GC) proliferation and theca cell differentiation1. Yet the functions of the distinct types of intercellular communication remain poorly understood. Follicle maturation is a complex, dynamic process, tightly controlled by autocrine and paracrine regulatory factors produced principally by theca cells, granulosa cells and oocytes. In addition, it is controlled by hormones of the hypothalamic-pituitary-ovarian axis (including pituitary-derived FSH and LH) and by steroids secreted by the ovary. GCs play an essential role during oocyte maturation, providing an optimal microenvironment for follicle development2,3,4. The two major GC types, mural and cumulus, surround the follicular antrum and the oocyte, respectively. Mural granulosa cells (MGCs) have an endocrine function and promote follicle growth, whereas cumulus cells (CCs) support oocyte maturation and provide essential metabolic and signaling functions5,6. Although MGCs and CCs are closely related granulosa cell types, differences in their function and gene expression profile5 suggest that changes in gene expression coordinate follicle maturation7.

As oocyte competence is acquired only through bidirectional communication with GCs2, the closely surrounding CCs constitute an attractive biological material for molecular analyses aimed at inferring the developmental competence of the oocyte. This implies that transcriptomic profiling of CCs may serve to identify biomarkers for predicting oocyte quality8. Several studies have described gene expression in CCs (matured either in vivo or in vitro) from cumulus-oocyte complexes (COCs)9 or derived from pre-ovulatory follicles exposed to human chorionic gonadotrophin (hCG)5,10,11.

For example, Yerushalmi et al. (PRJNA216966)12 reported transcriptional differences between CCs derived from mid-antral follicles (compact cumulus cells) and expanded CCs from metaphase 2 (M2) COCs, identifying 1746 Differentially Expressed Genes (DEGs). In addition, they found 89 Differentially Expressed (DE) long noncoding RNAs, a few of them encoded within introns of genes required for GC function. Regarding gene expression profiles within the COC, Li et al. (PRJNA649934)13 performed RNA-seq experiments to investigate the transcriptional profile of CCs and oocytes derived from the same Polycystic Ovary Syndrome (PCOS) patient or from age-matched controls. In PCOS oocytes, they found altered expression of some genes involved in microtubule-based processes13. Overall, global gene expression in CCs and in oocytes from PCOS patients showed an imbalance in other essential signalling pathways underlying mitochondrial function and oxidative stress.

MicroRNAs (miRNAs) modulate gene expression post-transcriptionally, yet their expression and function within the GC populations remain poorly investigated. Recent reports have described the miRNA profile of MGCs and CCs isolated from human pre-ovulatory follicles of healthy women undergoing ovum pick-up for in vitro fertilization (IVF). Velthut-Meikas et al. (PRJNA200696)14 found 90 miRNAs differentially expressed between CCs and MGCs, 2 of which were of intronic origin. In MGCs, some of the targets of the DE miRNAs act specifically on extracellular matrix remodeling and on the Wnt/beta-catenin and Neurotrophin signaling pathways.

A more recent study by Andrei et al. (PRJNA417973)15 reported the miRNA expression profile of human primary MGCs and CCs cultured in vitro. In this study, the authors found 53 miRNAs differentially expressed between MGCs and CCs, whose targets are enriched in Ubiquitin-like Protein Conjugation genes. Interestingly, the two studies showed a limited overlap in terms of DE miRNAs, mostly because of differences in data analysis strategy.

Nevertheless, the overview of the published studies describing CCs’ transcriptome has provided an extremely limited consensus of “signatures”.

Conversely, by combining RNA-seq data from various cell types, we can capture a wide range of expression patterns, offering insights into the functional diversity and complexity of transcriptome expression and regulation across the GCs. In addition, comparative transcriptional profiling of CCs derived from immature or mature COCs would help identify genes that are expressed in the final stages of follicle maturation.

Through a survey of publicly available transcriptomics datasets, we re-analysed the raw RNA-seq data from different “sources”: two GC subpopulations, two CCs samples derived from mature and immature COCs, and from CCs and the oocyte. The original studies laid a solid foundation, but we saw an opportunity to delve deeper, asking new questions that bring out the latent possibilities within these datasets. Thereby, we also focused on Differential Alternative Splicing (DAS) analysis of CCs transcripts, to get insights into alternative splicing events occurring during follicle maturation. By revisiting and reinterpreting the data, we aimed to uncover additional layers of biological insight, providing a richer and more nuanced understanding of the biological landscapes under study. This approach not only maximizes the value of the existing data but also opens new avenues for discovery and innovation, such as the prediction of some biomarkers for measuring the oocyte competence and reproductive potential.

Results

We selected a cohort of samples according to the inclusion criteria detailed in the Methods section. The final core dataset consists of 51 samples (30 miRNA samples and 21 mRNA samples), for a total of 98.82 GB of input data (Table 1, Supplementary Table 1). Raw data from the selected BioProjects were downloaded from the Sequence Read Archive (SRA)16 and individually reanalyzed with the same bioinformatics pipeline, depicted in Fig. 1.

Table 1 The table shows the main characteristics of each BioProject, including the platform used for coding and non-coding RNA-sequencing, the reference paper and the type of cells analysed.
Fig. 1

Workflow diagram of the RNA-seq analysis. Overview of the data analysis pipeline. The pipeline begins with the acquisition of raw reads in FASTQ format. These reads are then assessed for quality using the FastQC program. Terminal low-quality bases and adapter sequences are subsequently trimmed off. Mapped files in BAM format are then generated from the processed reads and raw read counts are quantified. The pipeline proceeds with differential gene expression analysis. The resulting differentially expressed genes (DEGs) are analysed for Gene Ontology (GO) and Kyoto Encyclopaedia of Genes and Genomes (KEGG) enrichment. Starting from BAM input files, the pipeline also includes a parallel branch for identifying Differentially Alternatively Spliced Genes (DAS), complemented by a splicing event distribution analysis and a GO and KEGG enrichment analysis for DAS. Finally, miRNA analysis is conducted on the small RNA-seq data. This pipeline covers RNA-seq data analysis from raw reads to functional insights.

In the present study, three RNA-sequencing datasets from GCs were reanalyzed to identify DEGs (Differentially Expressed Genes) and DAS (Differentially Alternatively Spliced Genes). In addition, to evaluate the miRNA profile of granulosa cells, we re-analyzed and integrated two further BioProjects containing small RNA-seq data (Table 1).

Actin-binding proteins (ABPs) play crucial roles in granulosa cell function and oocyte maturation

For cumulus cells, three different bioprojects were analyzed for Gene Expression Analysis (Table 1).

Firstly, in PRJNA216966, we identified 3093 DEGs, nearly double the number of DEGs previously reported12, with 1662 genes up-regulated and 1431 down-regulated in Germinal Vesicle (GV) stage CCs compared to M2-expanded CCs (Supplementary Fig. 1a,b).

Secondly, in PRJNA200696, 2242 DEGs were identified, consisting of 1201 up-regulated and 1041 down-regulated genes in CCs compared to MGCs (Supplementary Fig. 2a,b).

Lastly, in PRJNA649934, the differential expression analysis detected 7253 DEGs, with 3609 genes up-regulated and 3644 down-regulated in oocytes compared to CCs (Supplementary Fig. 3a,b).
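The split of each DEG list into up- and down-regulated genes can be illustrated with a minimal Python sketch. The cut-offs (adjusted p-value below 0.05, absolute log2 fold change of at least 1), the function name and the toy gene table are assumptions made for illustration only; they are not the thresholds or data of the pipeline described here.

```python
# Illustrative only: these cut-offs are assumptions, not the study's thresholds.
PADJ_CUTOFF = 0.05
LOG2FC_CUTOFF = 1.0


def classify_degs(results, padj_cutoff=PADJ_CUTOFF, log2fc_cutoff=LOG2FC_CUTOFF):
    """Split a differential-expression table into up- and down-regulated genes.

    `results` maps a gene symbol to (log2 fold change, adjusted p-value).
    """
    up, down = [], []
    for gene, (log2fc, padj) in results.items():
        if padj is None or padj >= padj_cutoff:
            continue  # not significant, or filtered out upstream
        if log2fc >= log2fc_cutoff:
            up.append(gene)
        elif log2fc <= -log2fc_cutoff:
            down.append(gene)
    return sorted(up), sorted(down)


# Toy values, invented for the example.
toy_results = {
    "GENE_A": (2.3, 0.001),
    "GENE_B": (-1.8, 0.02),
    "GENE_C": (0.4, 0.0001),  # significant, but below the fold-change cut-off
    "GENE_D": (3.1, 0.20),    # large change, but not significant
}
up_genes, down_genes = classify_degs(toy_results)
print(len(up_genes) + len(down_genes), "DEGs:", up_genes, "up;", down_genes, "down")
```

With this convention, the total number of DEGs is simply the sum of the two lists, as in the counts reported above for each BioProject.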

As a first step, an intra-BioProject functional enrichment analysis was performed. This approach ensured that the biological context of each specific dataset was considered independently, minimizing the potential confounding effects of combining disparate conditions. These analyses are illustrated in Supplementary Figs. 1c,d, 2c,d and 3c,d.

Beyond the individual project re-analysis, we performed a comparative analysis to understand the distribution of DEGs across different cell types, including cumulus cells, oocytes, and mural cells. This comparative approach was carefully designed to highlight both the unique and the shared pathways/biological processes among the different cell types. This approach helped us identify critical pathways and mechanisms that might be overlooked if we only focused on one direction of gene regulation.

Therefore, an inter-bioprojects comparison of functional profiles was executed to gain insights into the biological roles of DEGs emerging from the three BioProjects. By including both up- and down-regulated genes in the gene set enrichment analysis, we could better understand how these regulatory changes impact the biological functions and communication networks essential for follicle development. This analysis allowed us both to find a functional consensus and, at the same time, to highlight any differential functional module specific for each data set (Fig. 2a).
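As a reminder of what an over-representation result means, the following Python sketch computes a one-sided hypergeometric p-value for a single term and applies the Benjamini-Hochberg correction across terms. It is a generic textbook formulation under stated assumptions, not the enrichment tool used in this study; the function names and all numbers in the example are hypothetical.

```python
from math import comb


def hypergeom_sf(k, population, annotated, drawn):
    """P(X >= k) for a hypergeometric draw.

    population: genes in the background universe
    annotated:  background genes annotated with the term
    drawn:      size of the DEG list
    k:          DEGs annotated with the term
    """
    upper = min(annotated, drawn)
    total = comb(population, drawn)
    favourable = sum(
        comb(annotated, i) * comb(population - annotated, drawn - i)
        for i in range(k, upper + 1)
    )
    return favourable / total


def benjamini_hochberg(pvalues):
    """Return BH-adjusted p-values in the same order as the input."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical numbers: 20,000 background genes, 200 annotated with a term,
# a list of 236 DEGs, 15 of which carry the annotation.
p_term = hypergeom_sf(15, 20000, 200, 236)
adjusted = benjamini_hochberg([0.01, 0.04, 0.03])
print(p_term, adjusted)
```

Including both up- and down-regulated genes in the tested list changes only the size and content of the drawn set, not the test itself.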

Fig. 2

Analysis and comparison of enriched GO terms for DEGs. (a) Dotplot showing enriched GO terms across the 3 BioProjects, highlighting the twelve most significantly over-represented GO terms in biological processes. The x-axis reports the BioProject name, with the number of DEGs involved in those GO processes in parentheses. The y-axis shows significantly enriched GO terms. The size of each dot indicates the number of genes associated with that GO term in the respective cluster (gene count). The color scale represents the adjusted p-value, with darker colors indicating higher statistical significance. Only GO terms with an adjusted p-value < 0.05 are shown. (b) Upset plot of the intersection of DEGs across BioProjects. The horizontal bar graph on the left shows the total number of DEGs for each BioProject. The upper black bar graphs show the number of DEGs for each overlapping combination, and black connected dots indicate which BioProjects are involved in each intersection. The plot shows the number of DEGs shared among all 3 BioProjects (n = 236, herein defined as “core set”), shared between exactly two BioProjects (n = 390, 521 and 1229) and BioProject-specific (n = 1095, 1236 and 5267). (c) PPI network of the core set of 236 shared DEGs. The network nodes represent proteins. Nodes highlighted in red (n = 160) are proteins annotated with “Alternative splicing”, a significantly enriched term (p-adj < 0.05).

The functional consensus analysis highlighted “Actin binding” as the GO term shared among all comparisons, while the other GO terms are more BioProject-specific or CC-specific. Of note, the comparison of CCs collected at two different developmental stages (GV vs M2 CCs) occupies a middle ground between pre-ovulatory CCs vs MGCs and expanded CCs vs oocyte (Fig. 2a).

To further elucidate the gene expression patterns within different cell types, we examined the distribution of up-regulated genes (Fig. S4), which informed our subsequent focus on CCs. We therefore performed a functional enrichment analysis solely on genes that were up-regulated in CCs (taken at different stages of follicle maturation); results are reported in Fig. S5. This targeted approach ensured that the functional analysis was cumulus-specific, without contamination from markers of mural GCs or oocytes. This more focused analysis highlighted roughly the same enriched categories, such as “actin binding” and several terms related to cell-cell communication and metabolic activity, which are known to be critical in CC function and oocyte maturation. Additionally, we identified both shared and unique pathways across different stages of cumulus cell development.

PPI analysis reveals a coreset of 236 DEGs potentially involved in alternative splicing processes

We were also interested in using an approach complementary to GO enrichment analysis to find a consensus list of proteins that could be key interactors and regulators of different developmental stages of CCs. As shown in the upset plot, there is a core set of 236 DEGs shared by the three BioProjects (Fig. 2b, first bar in the upset plot; the whole list of overlaps is supplied17).
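The categories shown in an upset plot (core set, pairwise-only and BioProject-specific genes) are exclusive intersections of the DEG lists. The Python sketch below shows one way to compute them; the function name and the toy gene identifiers are hypothetical, and only the BioProject accessions are taken from the text.

```python
from itertools import combinations


def exclusive_intersections(sets_by_name):
    """Map each combination of set names to the items found in exactly those sets."""
    names = list(sets_by_name)
    result = {}
    for size in range(len(names), 0, -1):
        for combo in combinations(names, size):
            inside = set.intersection(*(sets_by_name[n] for n in combo))
            others = [sets_by_name[n] for n in names if n not in combo]
            outside = set().union(*others)  # empty when combo covers every set
            result[frozenset(combo)] = inside - outside
    return result


# Toy DEG lists, invented for the example.
toy_degs = {
    "PRJNA216966": {"g1", "g2", "g3", "g4"},
    "PRJNA200696": {"g1", "g2", "g5"},
    "PRJNA649934": {"g1", "g3", "g6"},
}
regions = exclusive_intersections(toy_degs)
core_set = regions[frozenset(toy_degs)]
print("core set:", sorted(core_set))
```

Because the regions are mutually exclusive, the sizes of all regions that involve a given BioProject add up to the total number of DEGs of that BioProject.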

To this end, we used the STRING database18, which focuses on experimentally validated and literature-derived protein-protein interactions. It provides insights into how proteins interact and form functional complexes, highlighting biological processes and pathways that might not be as apparent in GO analysis. Intriguingly, in a PPI (Protein-Protein Interaction) network analysis of the core set, the “Alternative Splicing” term was strongly enriched, with an FDR of 9.96e-05 (Fig. 2c, red nodes), suggesting that, beyond the individual peculiarities of each BioProject, these 236 DEGs could be closely linked to alternative splicing variation in GCs.

Unraveling the impact of alternative splicing events on transcription regulators

Previous analyses of RNA-seq data in GC subpopulations have largely focused on gene expression rather than alternative splicing. The results of our PPI analysis strongly encouraged us to investigate the extent to which alternative splicing contributes to transcriptomic variation within each dataset.

In CCs from the GV cumulus-oocyte-complexes (COCs) and from the M2 expanded COCs (PRJNA216966), Differential Alternative Splicing analysis identified 1402 DAS (Supplementary Table 2), including 104 transcription factors (TFs).

In pre-ovulatory follicles (CCs vs MGCs; PRJNA200696), we identified 378 DAS events (Supplementary Table 2), involving 45 TFs. In stimulated vs unstimulated CCs (PRJNA649934), we detected 924 DAS events (Supplementary Table 2), 72 of which involve TFs.

Regarding the splicing event distribution of each BioProject (Fig. 3a), we found more similarity between PRJNA200696 and PRJNA216966, probably because both BioProjects derive from the same granulosa cell type. In particular, Intron Retention (IR) and Multiple Exon Skipping (MES) events appear to account for a large share of the AS events occurring in GCs. Conversely, in PRJNA649934 we found a greater enrichment of Alternative Last Exons (ALE).
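A splicing event distribution of this kind amounts to the percentage of DAS events falling in each category. The short Python sketch below shows the calculation on an invented list of event labels; the function name and the toy data are assumptions for illustration and do not reproduce the values in Fig. 3a.

```python
from collections import Counter


def event_distribution(event_types):
    """Return the percentage of events per category, most frequent first."""
    counts = Counter(event_types)
    total = sum(counts.values())
    return {etype: 100.0 * n / total for etype, n in counts.most_common()}


# Toy event labels, invented for the example.
toy_events = ["IR", "IR", "MES", "ALE", "IR", "MES", "alt3", "MXE"]
distribution = event_distribution(toy_events)
for etype, pct in distribution.items():
    print(f"{etype}: {pct:.1f}%")
```

Expressing each category as a percentage rather than a raw count makes BioProjects with very different numbers of DAS events directly comparable.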

Fig. 3

Analysis of splicing event distribution and comparison of enriched GO terms and KEGGs for DAS. (a) Splicing distribution events over the DAS detected in the three mRNA bioprojects showing cumulative percentage in the histogram. The categories are: IR (Intron Retention); ALE/AFE (Alternative Last/First Exon); MES (Multiple Exon skipping); MXE (Mutually Exclusive Exons); alt3′ (alternative 3′ exon); alt5′ (alternative 5′ exon). (b) A dotplot that shows the common enriched GO terms for DAS in the three bioprojects. The x-axis represents different CC clusters, while the y-axis shows significantly enriched GO terms. The size of each dot indicates the number of genes associated with that GO term in the respective cluster (gene count). The color scale represents the adjusted p-value, with darker colors indicating higher statistical significance. Only GO terms with an adjusted p-value < 0.05 are shown. (c) The cnet-plot shows common GO Terms and highlights the identity of the DAS that contributes to the enrichment of that specific category. (d) The cnet-plot shows common KEGG Pathways and highlights the identity of the DAS that contributes to the enrichment of that specific category.

Consistently, a cluster comparison of Gene Ontology and KEGG enrichment analysis of the DAS lists shows greater similarities between PRJNA200696 and PRJNA216966, while PRJNA649934 has specific enriched GO terms (Supplementary Table 3). In pre-ovulatory follicles (CCs versus MGCs) and in stimulated versus unstimulated CCs, alternative splicing affects genes involved in “retinoic acid receptor binding” (GO:0042974) and “nuclear hormone receptor binding” (GO:0035257) (Fig. 3b,c) and in the “Ubiquitin mediated proteolysis” (hsa04120) KEGG pathway (Fig. 3d).

On the other hand, the “Cadherin binding” GO term is strongly enriched for DAS in CCs from GV COCs versus M2 expanded COCs and in stimulated versus unstimulated CCs (Fig. 3b,c).

Terms uniquely enriched in PRJNA216966 are “Ubiquitin binding”, “DNA-binding transcription factor binding” and “RNA polymerase II-specific DNA-binding transcription factor binding” (Fig. 3b). Given the high impact that this last category of DAS (n = 104) could have on transcription, we analyzed the expression of downstream targets in greater detail. By applying a GO enrichment analysis to the DEGs targeted by the 104 TF-DAS, we found some terms in common with the GO analysis of DEGs (GO:0001816, cytokine production; GO:0030155, regulation of cell adhesion) and, intriguingly, some unique terms such as “female sex differentiation” (GO:0046660), “ovarian follicle development” (GO:0001541) and “reproductive structure development” (GO:0048608).

Given the impact on transcription highlighted in Fig. 3c, we wondered whether we could observe significant changes in log2FC for the target genes of DAS Transcription Regulators. Although we found some cases in which DAS Transcription Regulators seemed to affect the cumulative gene expression of their target genes (Supplementary Fig. 6a), our analysis indicated that differential splicing of TFs may not be the only event determining effects on downstream transcription.

Nevertheless, to identify potentially interesting genes for future studies, we used the VOILA web-based application to focus on the AS events affecting some of the TFs that stood out in Supplementary Fig. 6a. We observed that many DAS events fall in regions encoding functional protein domains associated with epigenetic modifications, notably histone modification.

We performed an additional GO enrichment analysis on DEG targets of DAS Transcription Factors for each Bioproject.

Interestingly, we found “Transcription coregulator activity” and “Transcription factor binding” largely enriched in PRJNA216966 (Fig. 3c). Alternative splicing is a fine-tuning regulatory mechanism that can vary significantly between similar cell types or different maturation stages. In fact, in the previously mentioned dataset comparing cumulus cells at the GV (germinal vesicle) and M2 (metaphase II) stages, we found numerous GO terms related to key processes such as “ovarian follicle development” and “response to follicle-stimulating hormone”, as well as several terms related to cell differentiation (Fig. S6b). These findings highlight the role of alternative splicing in regulating the genes involved in these critical biological processes and the importance of transcription regulators in orchestrating the complex events leading to ovulation. Enrichment results for PRJNA200696 and PRJNA649934 are shown in Fig. S6c,d. The DAS-TFs found in PRJNA216966 (CCs at two developmental stages, GV and M2) include GATA4 (GATA Binding Protein 4), GATA6 (GATA Binding Protein 6), NR5A1 (nuclear receptor subfamily 5 group A member 1) and NR5A2 (nuclear receptor subfamily 5 group A member 2), all of which play a crucial role in follicle maturation (Fig. 4).

Fig. 4

Splicegraph analysis and dPSI of functional splicing events occurring in transcriptional regulators in PRJNA216966. Each plot provides a visual representation of a splicegraph along with the differential Percent Spliced In (dPSI) values for a specific splice junction. The plot emphasizes splice junctions exhibiting significant dPSI values, with annotations indicating their associated biological functions within the highlighted region. The splice graph is a directed acyclic graph representing alternative splicing events in a gene. It illustrates connections between exons and introns: exons are denoted by numbered nodes enclosed in boxes, while intron retention events are represented by smaller boxes lacking numerical labels. Raw read counts for each specific splicing event are displayed on both nodes and edges. At the top of each plot, the genomic coordinates of the splicing junctions are indicated. At the bottom of the plot, a violin plot is presented, illustrating the distribution of differential Percent Spliced In (dPSI) values associated with the highlighted junction. The width of the violin plot conveys the probability density of different dPSI values, offering insights into the variability of splicing patterns. (a) GATA6. (b) GATA4. (c) NR5A1. (d) NR5A2.
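For readers unfamiliar with the metric, Percent Spliced In (PSI) for a junction can be approximated as the fraction of reads supporting that junction among all junctions of the same splicing event, and dPSI as the difference in PSI between two conditions. The Python sketch below is a naive point estimate on invented read counts. It is an assumption-laden simplification: the violin plots described above represent a distribution over dPSI values, which this sketch does not model, and the function names are hypothetical.

```python
def psi(junction_reads, all_junction_reads):
    """Fraction of reads supporting one junction within a splicing event."""
    total = sum(all_junction_reads)
    if total == 0:
        return None  # event not covered in this sample
    return junction_reads / total


def delta_psi(reads_a, reads_b, index):
    """PSI of junction `index` in condition A minus its PSI in condition B."""
    psi_a = psi(reads_a[index], reads_a)
    psi_b = psi(reads_b[index], reads_b)
    if psi_a is None or psi_b is None:
        return None
    return psi_a - psi_b


# Toy read counts for a two-junction event, invented for the example.
gv_reads = [30, 10]   # condition A
m2_reads = [10, 30]   # condition B
dpsi = delta_psi(gv_reads, m2_reads, index=0)
print("dPSI for junction 0:", dpsi)
```

A positive value indicates that the junction is used more in the first condition, a negative value that it is used more in the second.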

miRNA DE analysis reveals a core set of 12 DEMs which may influence cell cycle progression and nucleic acids metabolism

Both BioProjects (PRJNA417973; PRJNA200699) investigated the miRNA transcriptome landscape of CCs and MGCs. This allowed us to perform an integrated comparison, searching for meta-signatures of differentially expressed miRNAs (DEMs) and for common DEMs to use in GO and KEGG enrichment analysis.

In detail, the analysis of PRJNA417973 highlighted 46 DEMs, half of which (23) were up-regulated and the other half down-regulated (Fig. 5a), while in PRJNA200699 we found more DEMs in CCs compared to MGCs: 82 DEMs (Fig. 5b), 48 of them up-regulated and 34 down-regulated.

Fig. 5

PRJNA200699 and PRJNA417973 small RNA-seq analysis. (a) Heatmap of differentially expressed miRNAs in PRJNA200699. (b) Heatmap of differentially expressed miRNAs in PRJNA417973. (c) Venn diagram illustrating miRNAs that have the same expression pattern in the two BioProjects (intersection, herein indicated as DEM-core set, n = 12) and BioProject-specific DEMs. (d) Common enriched GO terms for the target DEGs of the DEM-core set. (e) PRJNA200699 network showing the DEM-core set and its differentially expressed targets. Spheres represent miRNAs colored by log2FC, from red (up-regulation) to blue (down-regulation).

Intersecting the two lists of DEMs, we found 12 common DEMs which, interestingly, share the same pattern of expression, unveiling cell type-specific “signatures”. In MGCs, hsa-miR-30a-5p, hsa-miR-30a-3p, hsa-miR-142-5p, hsa-miR-148a-3p, hsa-miR-10b-5p, hsa-miR-144-5p and hsa-miR-146a-5p are up-regulated, while hsa-miR-92a-1-5p, hsa-miR-145-5p, hsa-miR-149-5p, hsa-miR-129-5p and hsa-miR-196a-5p are all up-regulated in CCs.
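The intersection step, restricted to miRNAs with the same direction of change in both studies, can be sketched in Python as follows. The function name, the placeholder miRNA names and the log2FC values are all invented for illustration; the sketch also assumes that both studies express log2FC with the same orientation of the contrast.

```python
def concordant_dems(dems_a, dems_b):
    """miRNAs differentially expressed in both studies with the same sign of log2FC.

    Each argument maps a miRNA name to its log2 fold change in one study.
    """
    shared = dems_a.keys() & dems_b.keys()
    return {
        mirna: (dems_a[mirna], dems_b[mirna])
        for mirna in sorted(shared)
        if dems_a[mirna] * dems_b[mirna] > 0  # same direction in both studies
    }


# Placeholder names and values, invented for the example.
study_a = {"miR-x": 1.5, "miR-y": -2.0, "miR-z": 0.8}
study_b = {"miR-x": 2.1, "miR-y": 1.0, "miR-w": -1.2}
dem_core = concordant_dems(study_a, study_b)
print(dem_core)
```

In this toy case only one miRNA is retained: a second one is shared but changes in opposite directions, and the remaining ones are specific to a single study.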

Since the pattern of expression was highly coherent in the two granulosa cell types, we performed a GO enrichment analysis on the predicted target genes of the 12 shared DEMs to shed light on the downstream pathways modulated by this miRNA core set within the two GC populations. The GO analysis highlighted common, significantly enriched GO terms, chief among them “histone modification (GO:0016570)”, “regulation of mRNA metabolic process (GO:1903311)”, “regulation of DNA metabolic process (GO:0051052)”, “regulation of translation (GO:0006417)”, “response to decreased oxygen (GO:0036293)” and “mitotic cell cycle phase transition (GO:)” (Fig. 5d).

Moreover, access to an independent gene expression dataset (PRJNA200696) that compares mural and cumulus cells provided a significant advantage in our analysis. This dataset enabled direct evaluation and validation of whether the pathways regulated by miRNA in cumulus and mural cells do exhibit corresponding changes at the mRNA level. Our analysis revealed a significant overlap in Gene Ontology (GO) terms between the targets of the identified miRNAs and the mRNA expression data from the independent dataset (Fig. S8) thus indicating that the pathways regulated by these miRNAs in cumulus and mural cells are also reflected at the mRNA level in the independent dataset.

To further elucidate the regulatory network of miRNAs and their target genes, we constructed a miRNA-mRNA interaction network based on the results of our differential expression analyses (Fig. 5e). The resulting network was visualized using the Fruchterman-Reingold algorithm, a force-directed layout that positions nodes based on their connections. In this visualization, miRNA positioning reflects the number of edges (target connections), with more centrally located miRNAs typically having a higher number of target mRNAs. Notably, neighboring miRNAs in the network tend to share more mRNA targets, suggesting potential co-regulatory relationships. This network approach allowed us to identify key miRNA regulators, based on their connectivity, and to discern patterns of miRNA-mRNA interactions that may be biologically relevant to our experimental conditions.
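The two quantities that the layout is described as reflecting, namely the number of targets per miRNA and the overlap of targets between miRNAs, can be computed directly from the edge list. The Python sketch below does so on an invented bipartite edge list; it does not compute the force-directed layout itself, and the function names and toy data are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_targets(edges):
    """Group (miRNA, mRNA) edges into a miRNA -> set of target mRNAs mapping."""
    targets = defaultdict(set)
    for mirna, mrna in edges:
        targets[mirna].add(mrna)
    return dict(targets)


def degree_ranking(targets):
    """miRNAs sorted by decreasing number of targets (ties broken by name)."""
    return sorted(((m, len(t)) for m, t in targets.items()),
                  key=lambda item: (-item[1], item[0]))


def shared_target_jaccard(targets):
    """Jaccard index of the target sets for every pair of miRNAs."""
    scores = {}
    for a, b in combinations(sorted(targets), 2):
        union = targets[a] | targets[b]
        scores[(a, b)] = len(targets[a] & targets[b]) / len(union) if union else 0.0
    return scores


# Toy edge list, invented for the example.
toy_edges = [
    ("miR-x", "T1"), ("miR-x", "T2"), ("miR-x", "T3"),
    ("miR-y", "T2"), ("miR-y", "T3"),
    ("miR-z", "T4"),
]
toy_targets = build_targets(toy_edges)
ranking = degree_ranking(toy_targets)
overlap = shared_target_jaccard(toy_targets)
print(ranking)
print(overlap)
```

Ranking by degree gives a layout-independent way to nominate highly connected miRNAs, and the pairwise overlap quantifies the shared targets that visual proximity only suggests.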

Discussion

Our study investigates various cell types under their physiological conditions, with a particular focus on the interactions between mural/cumulus cells and the oocyte within the follicle. These cells communicate to coordinate follicle maturation and other essential processes. Consequently, we examined the entire follicle system, analyzing how genes are up-regulated or down-regulated within these cells, ultimately contributing to key biological processes involved in follicle maturation.

To identify specific signatures associated with each granulosa cell type, we reanalyzed raw data obtained from several selected publicly available BioProjects using a standardized transcriptomic computational workflow. We aimed to analyze different datasets to gain insights into the enrichment of differentially expressed genes between human MGCs and CCs and across distinct stages, namely Germinal Vesicle stage cumulus cells (GV-CCs) compared to M2 expanded cumulus cells (M2-CCs), and the oocyte compared to the CCs.

After obtaining the list of DEGs specific for each BioProject, a biological theme comparison was used to compare functional profiles from different conditions. Our rationale for including both up- and down-regulated genes in the GO enrichment analysis was to capture a comprehensive view of the regulatory mechanisms at play, which could be crucial for the maturation and function of the follicle. Up-regulated genes can indicate processes that are being promoted or activated, while down-regulated genes can indicate processes that are being repressed or modulated. However, many processes are more complex than that: up- and down-regulated genes can participate in the same process simultaneously. Together, they provide a holistic picture of the cellular environment and the interactions between different cell types within the follicle.

Results indicate that the functional consensus module “Actin binding” is the GO term shared among all comparisons. To address the concern about potential cross-contamination from other cell types, we also focused exclusively on up-regulated genes in cumulus cells and conducted a thorough functional enrichment analysis (Fig. S5). “Actin binding” was recovered in this analysis as well, supporting the attribution of this functional result to cumulus cells. Remarkably, actin filaments are involved in multiple cellular processes such as cytoskeleton organization, nuclear positioning, germinal vesicle breakdown, spindle migration, chromosome segregation and polar body extrusion in mammalian oocyte meiosis19.

ABPs modulate the polymerization and depolymerization of actin filaments, thereby controlling the structure and dynamics of the actin cytoskeleton, and the secretory and endocytic pathway20. This regulation is vital for maintaining the structural integrity and shape of granulosa cells and oocytes21. Furthermore, during folliculogenesis, the development of ovarian follicles, granulosa cells proliferate and differentiate. ABPs help in remodeling the actin cytoskeleton, which is necessary for the migration, proliferation, and differentiation of granulosa cells, thus supporting follicle development22.

ABPs are involved in the formation and maintenance of gap junctions, ensuring proper signaling between GCs and oocytes. At the same time, the secretion of signaling molecules ensures nutrient exchange, which is critical for oocyte growth and maturation1.

MGCs, when stimulated to differentiate through FSH and insulin-like growth factors23,24, are involved in the endocrine function necessary for supporting the development of the follicle.

Notably, our comparative analysis of DEGs from pre-ovulatory MGCs vs CCs (Supplementary Fig. 2c,d) has highlighted the following GO terms: “extracellular matrix structural constituent”, “extracellular matrix binding”, “glycosaminoglycan binding” and “collagen binding”17. This comparison reveals a clear correlation between these DEGs and the physiological events underlying the follicle maturation.

The extracellular matrix (ECM) plays a critical role in mammalian ovarian function and oogenesis25,26,27,28. While the MGCs line the antrum, the CCs surround the oocyte and are regulated by oocyte-secreted factors (OSFs)29. In the pre-ovulatory follicle maturation process, CCs produce hyaluronic acid that is deposited into the extracellular matrix and later stabilized by secreted proteins30,31. This newly formed ECM connects the oocyte and cumulus cells together. The growing oocyte then derives most of its substrates for energy metabolism and biosynthesis from the surrounding CCs32,33.

Gap junctions transmit metabolites, nutrients, and intracellular signalling molecules and all the OSFs that are required for the expansion of CCs34. In turn, CCs synchronize nuclear and cytoplasm maturation of the oocyte and regulate meiosis resumption by providing many factors and regulatory molecules to oocytes34. Cumulus expansion is an important aspect in the final steps of follicle maturation, which culminates in the formation of a mature cumulus-oocyte complex arrested at the metaphase of the second meiotic division (M2) and ready for ovulation and its subsequent fertilization35.

Among the key factors that affect the “oocyte competence” are the cytoplasmic synthesis/degradation of maternal mRNA together with an ordered distribution of organelles36,37,38,39. The outcome of both processes affects fertilization and embryonic development1,40,41,42,43,44. Abnormal mitochondrial rearrangement impairs the oocyte quality and maturation and chromosomal separation during meiosis45,46. Cytoplasmic oocyte development also includes maturation of cortical granules47,48 (membranous organelles derived from Golgi complexes) and cytoskeleton (microtubules and microfilaments network49) as well as of the endoplasmic reticulum (ER), the latter being responsible for the storage and release of free calcium, essential for the calcium reaction at fertilization50,51. Compelling evidence indicates that Actin, Microtubule and Organelle Dynamics as well as Golgi distribution are fundamental for oocyte maturation52,53. By comparing GV oocyte-vs-CCs and CCs GV-vs-M2, we found the following enriched GO terms “Microtubule binding”, “tubulin binding”, “catalytic activity, acting on DNA”. Conversely, GO terms like “histone binding”, and “transcription co-regulator activity” were enriched only when comparing GV_Oocyte versus CCs. The GO term “histone binding” points out that epigenetic modifications play important roles following meiotic maturation of mammalian oocytes54.

In our search for key regulators of different developmental stages in CCs, we identified a core set of 236 DEGs, shared by the three Bioprojects.

Because much of the complexity within cells arises from functional and regulatory interactions among proteins, we complemented the GO-term analysis with a Protein-Protein Interaction (PPI) analysis. Interestingly, PPI network analysis of the core set indicated that the “Alternative Splicing” term was significantly enriched. We therefore investigated the extent to which alternative splicing contributes to transcriptomic differences within all datasets.

To this end, we used the STRING database, which focuses on experimentally validated and literature-derived protein-protein interactions. It provides insights into how proteins interact and form functional complexes, highlighting biological processes and pathways that might not be as apparent in GO analysis. The enrichment of the “alternative splicing” term in STRING might indicate that the proteins involved in this process are highly interconnected and functionally significant within the shared core set of 236 DEGs, because many of them interact directly in this process.

On the other hand, GO enrichment analysis alone was not able to capture this important feature, probably because, while it provides a broad view of functional categories, it might not always capture specific processes if they are not the main factor distinguishing the clusters.

Because of this result, we were strongly encouraged to perform a Differential Alternative Splicing (DAS) analysis on these data, which is something that none of the authors of the three Bioprojects have investigated in their published papers.

For the DAS analysis, we first described the overall splicing changes in the three datasets, grouping them into 8 categories and reporting cumulative frequencies. We observed a similar distribution of splicing events in the two bioprojects (PRJNA200696 and PRJNA216966) that are both based on a common granulosa cell source. In contrast, DAS events in PRJNA649934 reveal a greater enrichment in Alternative Last Exons (ALE) and, at the same time, a lower percentage of Mutually Exclusive Exons and Intron Retention (IR) events. This observation could be linked to the mechanisms by which maternal mRNAs are stored in oocytes and gradually consumed with the initiation of meiotic resumption. Alternative 3′ UTR isoforms enable inclusion or exclusion of cis-regulatory elements, either for RNA-binding proteins or for microRNAs, that can influence transcript abundance, stability, subcellular localization and translation efficiency55. A large amount of maternal mRNA exists in mature oocytes at the GV stage, although as dormant transcripts until meiotic maturation56,57. Compelling evidence supports the existence of stringent mRNA-stabilizing mechanisms within GV oocytes, while the selective degradation of maternal mRNA is required for the activation of the zygotic genome39,58. Regulation of maternal mRNA translation and degradation occurs during oocyte maturation, and such events are essential for the oocyte competence necessary to accomplish the maternal-to-zygotic transition36,59. In maturing oocytes, cytoplasmic polyadenylation of the 3′-UTR is linked to mRNA stability and mRNA translation60. As such, we analyzed DAS events involving transcription regulators, including NR5A1, NR5A2, GATA4 and GATA6, and found that they affect domains of these TFs associated with epigenetic modification (i.e. histone modification).
We used differentially alternatively spliced transcription factors (DAS-TFs) and their targets, which are differentially expressed genes (DEGs) within each dataset, for GO enrichment analysis. Our analysis revealed that many of the enriched GO terms are directly related to follicle development. The analysis was conducted on each dataset individually. This approach was necessary because alternative splicing is a fine-tuning regulatory mechanism that can vary significantly between similar cell types or different maturation stages. In fact, in the dataset comparing cumulus cells at the GV (germinal vesicle) and M2 (metaphase II) stages, we found numerous GO terms related to key processes such as “ovarian follicle development” and “response to follicle-stimulating hormone”, as well as several terms related to cell differentiation. These findings highlight the role of alternative splicing in regulating the genes involved in these critical biological processes and the importance of transcription regulators in orchestrating the complex events leading to ovulation. Among the DAS-TFs identified by comparing cumulus cells at two developmental stages (GV_M2), we found GATA4, GATA6, NR5A1 and NR5A2 (see Fig. 4). Compelling evidence indicates that GATA4 and GATA6 play a crucial role in folliculogenesis61, although they do not contribute equally to ovarian function, as assessed by experiments performed with conditional knockout mice. Experimental results indicate that GATA4 regulates, directly or indirectly, more genes than GATA6. In particular, the expression of FSHR (which is essential for the differentiation of GCs) decreases only in GATA4 knockout mice. However, the role of GATA factors in promoting the expression of genes for extracellular matrix organization, steroid metabolism and ovulation has been clearly demonstrated using microarray analysis. NR5A1 and NR5A2 are master regulators of steroidogenesis62,63. Defects in steroid hormone production impair follicular development and fertility.
Experiments performed with conditional knockout mice indicate that the lack of expression of NR5A1 in postnatal ovary causes infertility64. Interestingly, also the overexpression of NR5A1 affects female fertility and metabolism, confirming the importance of a fine-tuned control of NR5A1 for female reproductive and metabolic functions62. Despite the structure similarity, NR5A1 and NR5A2 exert different effects in human granulosa cells. It has been proposed that the fine-tuning of steroidogenesis in the ovarian follicles is achieved through BMP-15 signaling that induces NR5A -target genes expression while suppressing NR5A2-linked genes63.

Interestingly, the “Cadherin binding” GO term is strongly enriched for DAS in stimulated versus unstimulated CCs and in CCs from GV-M2 COC. As mentioned above, folliculogenesis strictly depends on the contact between the surrounding granulosa cells and the oocyte. Cadherins are a group of membrane proteins involved in cell adhesion. They ensure adhesion between adjacent cells through homotypic interactions of cells exposing similar sets of cadherins at their surfaces65. Cadherins contribute to tissue integrity66, regulating cell migration, cell differentiation and the size of specific cell populations67. Cadherin-catenin complexes also act as receptors for signaling molecules. Compelling evidence indicates that E- and N-cadherin, together with associated catenins, take part in NF-κB-mediated signaling, RhoA GTPase signaling, and the Hippo, Yap1, RTK and Hedgehog pathways68,69,70,71. Interestingly, the Wnt4/beta-catenin cascade is crucial for ovary development in mice and other vertebrates72. Although it can be assumed that cadherins take part in follicle maturation, since they constitute the core of adherens junctions, little is known about the signaling pathways regulated by cadherins during oocyte development. Therefore, our findings call for a deeper investigation of alternative splicing events in cadherin-signaling-associated genes. Whereas GO enrichment analysis of DEGs alone gave only indirect indications of follicle development functions, the DAS analysis reported a significant enrichment of the ovarian follicle development category (GO:0001541)73.

How do maternal mRNAs remain stable during completion of meiosis or in the initial stages of embryo development, and how are dormant transcripts activated and degraded later? Likely, RNA-binding proteins accumulate in fully grown oocytes to stabilize and degrade mRNAs, together with other components required for regulating protein synthesis and degradation. Interestingly, we observed several DAS genes involved in mRNA surveillance pathways in the GV_Oocyte vs CCs comparison (Fig. 3d). We found genes encoding RNA-binding proteins that are components of the multiprotein Exon Junction Complex (EJC), which is placed at the splice junctions on mRNAs (MAGOHB, RNPS1, ACIN1), and that are involved in the nonsense-mediated decay (NMD) of mRNAs (MAGOHB, RNPS1). RNPS1 participates in mRNA 3′-end cleavage and mediates an increase in RNA abundance and translational efficiency. Other DAS genes encode proteins that bind with high affinity to nascent poly(A) tails and stimulate poly(A) addition (PABPN1, FIP1L1) and are involved in nucleocytoplasmic trafficking. Furthermore, the CSTF1 and CSTF3 genes (Cleavage Stimulation Factor Subunit 1 and 3, respectively) encode two of the three subunits forming the cleavage stimulation factor (CSTF), which is involved in polyadenylation and cleavage of the 3′ end of mammalian pre-mRNAs. Among DAS genes, we also found GSPT1 (G1 to S phase transition 1), which is involved in the regulation of translation termination and is predicted to be part of the translation release factor complex74.

Furthermore, we found regulatory components of phosphorylation-mediated signaling involved in the negative control of cell growth and division (Protein Phosphatase 2 Regulatory Subunit B gamma, PPP2R3C and Protein Phosphatase 2 Scaffold Subunit Abeta, PPP2R1B, both genes present alternative splice transcript variants).

Several DAS genes associated with ubiquitin-mediated proteolysis were shared between the GV_Oocyte vs CCs and the CCs GV vs M2 COC comparisons. The ubiquitination/deubiquitination process is important for protein degradation, transcriptional regulation and cell cycle progression75, which are also crucial for oocyte maturation. APC (anaphase-promoting complex) initiates the metaphase-to-anaphase transition by promoting cyclin B and securin degradation76. Among DAS genes, we found several core subunits of APC, such as ANAPC5, ANAPC7, ANAPC10 and ANAPC13 (APC being a large E3 ubiquitin ligase), which control cell cycle progression by targeting cyclin B for 26S proteasome-mediated degradation. Through comparison of DAS between CCs_vs M2_CCs we found ANAPC13, ANAPC7 and ANAPC5. Interestingly, multiple transcript variants have been described for the ANAPC5 gene, resulting in shorter isoforms due to downstream AUG sites. Compelling evidence supports that ubiquitin E3 ligases mediate specific protein degradation, which is crucial for the progress of both the meiotic and the mitotic cell cycle77. Cullin ring-finger ubiquitin ligase 4 is one of the E3 ligase members that play multiple roles both in oocyte survival and in meiotic cell cycle progression76. In line with this, among the DAS genes shared by both groups (GV_Oocyte vs CCs and CCs_vs M2_CCs), we found CUL4A, ELOC, TRIP12, MGRN1 and CDC27. The latter (together with CDC16) is a component of the anaphase-promoting complex (APC), which catalyzes the formation of the cyclin B-ubiquitin conjugate leading to the ubiquitin-mediated proteolysis of B-type cyclins. TRIP12 is an E3 ubiquitin ligase implicated in the degradation of the p19ARF/ARF isoform of CDKN2A and in the DNA damage response, where it regulates the stability of USP7 and, in turn, of p53. Interestingly, among the DAS genes shared by the CCs GV_M2 and GV_Oocyte vs CCs comparisons, we found several members of the ubiquitin-conjugating enzyme E2 family (UBE2D3, UBE3B, UBE2E2, UBE2N, UBE2G).
UBE2E2 is involved in positive regulation of the G1/S transition of the mitotic cell cycle. UBE2N can interact with UBE2V1/UBE2V2, catalyzing the synthesis of non-canonical Lys 63-linked poly-ubiquitin chains. This type of poly-ubiquitination does not lead to protein degradation but mediates transcriptional activation of target genes. UBE2G2 mediates endoplasmic reticulum-associated degradation (ERAD)78, such as the sterol-induced ubiquitination of 3-hydroxy-3-methylglutaryl CoA reductase79. Among the DAS genes derived from the GV oocyte-vs-CCs comparison we found several E3 ubiquitin ligases and related factors. FBXW11 (F-box and WD Repeat Domain Containing 11) is a substrate recognition component of an SCF (SKP1-CUL1-F-box protein) complex that mediates the ubiquitination of phosphorylated proteins, allowing transcriptional activation. BIRC2 (Baculoviral IAP Repeat Containing 2) is an E3 ubiquitin ligase that modulates mitogenic kinase signaling, cell proliferation and apoptosis. In addition, BIRC2 can stimulate the transcriptional activity of E2F1 and can also function as an E3 ligase for the NEDD8 conjugation pathway of effector caspases. SYVN1 (Synoviolin 1) is an E3 ubiquitin ligase and a component of the endoplasmic reticulum quality control system (ERAD), involved in ubiquitin-dependent degradation of misfolded endoplasmic reticulum proteins80,81. RPS27A (Ribosomal Protein 27a) is a fusion protein consisting of ubiquitin at the N-terminus and ribosomal protein 27a at the C-terminus, and RPS40 (Ribosomal Protein 40) is implicated in the maintenance of chromatin structure.

Compelling data support the role of SUMOylation in maturing oocytes: deletion of the SUMO-pathway component UBE2I disrupts meiotic maturation and causes defects in spindle organization82. Deletion of UBE2I also impaired the communication with granulosa cells and caused defective resumption of meiosis and meiotic progression83. As a DAS gene, we also found SAE1 (SUMO1 Activating Enzyme Subunit 1), which mediates ATP-dependent activation of SUMO proteins on a conserved cysteine residue of UBA2.

DAS genes from the CCs GV-vs-M2 comparison comprise several Small Ubiquitin-like Modifier (SUMO) ligases: PIAS2 (Protein Inhibitor of Activated STAT 2) and PIAS3 both stabilize the interaction between UBE2I and the substrate. PIAS2 plays a crucial role as a transcriptional co-regulator in the p53 pathway and the steroid hormone signalling pathway. PIAS3 directly binds to several transcription factors, blocking or enhancing their activity.

By comparing CCs GV-M2 we found several DAS E3 ubiquitin ligases: MDM2, STUB1, HUWE1, HERC2, TRIM37, CBLB and CUL4B. The pathways regulated by TRIM37 (Tripartite Motif Containing 37) include epigenetic transcriptional repression (mono-ubiquitination of histone H2A)84, centriole duplication85 and peroxisome biogenesis86. HERC2 (HECT and RLD Domain Containing E3 Ubiquitin Protein Ligase 2) regulates ubiquitin-dependent retention of repair proteins on damaged chromosomes87. DDB2 (damage specific DNA binding protein 2), in complex with CUL4, may ubiquitinate histones, facilitating their removal from the nucleosome and promoting subsequent DNA repair88,89. HUWE1 (HECT, UBA and WWE domain containing E3 ubiquitin protein ligase 1) targets the p53 tumour suppressor90, core histones91,92 and DNA polymerase beta, playing a role in base-excision repair93. STUB1 (STIP1 Homology and U-Box Containing Protein 1) mediates poly-ubiquitination of DNA polymerase beta (POLB), amplifying the HUWE1/ARF-BP1-dependent degradation of POLB by the proteasome93. In addition, STUB1 acts as a co-chaperone for the HSPA1A and HSPA1B chaperone proteins and promotes protein degradation94. CUL4B (Cullin 4B) is required for the ubiquitination of cyclin E, and consequently for normal G1 cell cycle progression95,96, and for the proteolysis of several regulators of DNA replication. CUL4B also regulates the mammalian target-of-rapamycin (mTOR) pathways implicated in the control of cell growth and metabolism97. Lastly, MDM2 (mouse double minute 2 homolog), a nuclear-localized E3 ubiquitin ligase, inhibits p53-mediated cell cycle arrest and promotes degradation of the retinoblastoma protein RB198. Together, these findings highlight that cumulus expansion is fundamental for oocyte maturation, as many DAS genes are involved in cell cycle control, DNA replication and repair.

Finally, the comparison of CCs GV-M2 reveals multiple DAS genes encoding spliceosome components. SRSF2, SRSF3, SRSF4, SRSF5 and SRSF7 (Serine and Arginine Rich Splicing Factor 2, 3, 4, 5 and 7) are splicing factors that promote exon inclusion during alternative splicing as well as mRNA nuclear export99,100,101. SNRPG (Small Nuclear Ribonucleoprotein Polypeptide G) is a component of spliceosome precursors such as the U1, U2, U4, U5 and U7 small nuclear ribonucleoprotein complexes. SNRNP70 (Small Nuclear Ribonucleoprotein U1 Subunit 70) is a component of the spliceosomal U1 snRNP102,103. THOC2 (THO Complex Subunit 2) is a component of the TREX complex, which is involved in mRNA processing and nuclear export.

PRPF38B (Pre-mRNA Processing Factor 38B), PRPF40A (Pre-mRNA Processing Factor 40A) and PRPF40B (Pre-mRNA Processing Factor 40B) are RNA-binding proteins involved in mRNA splicing and in several other processes, including cytoskeleton organization, regulation of cell shape and regulation of cytokinesis. TRA2A (Transformer 2 Alpha Homolog) and TXNL4A (Thioredoxin Like 4A) are RNA-binding proteins involved in pre-mRNA splicing. RBM17 (RNA Binding Motif Protein 17) is a splicing factor that binds to the single-stranded 3′ AG at the exon/intron border and is also involved in the regulation of alternative splicing and the utilization of cryptic splice sites. NCBP2 (Nuclear Cap Binding Protein Subunit 2) is a component of the cap-binding complex associated with pre-mRNA splicing, translational regulation, nonsense-mediated mRNA decay, RNA-mediated gene silencing by microRNAs and mRNA export. HNRNPA1 is a member of the heterogeneous nuclear ribonucleoproteins (hnRNPs), which associate with pre-mRNAs in the nucleus and take part in pre-mRNA packaging and processing. PQBP1 (Polyglutamine Binding Protein 1) is an intrinsically disordered protein that acts as a scaffold in different processes, such as pre-mRNA splicing and transcription regulation. Through its disordered region, PQBP1 regulates alternative splicing of specific pre-mRNA molecules104,105,106,107. DHX38 (DEAH-Box Helicase 38) is involved in pre-mRNA splicing as a component of the spliceosome. U2AF1 (U2 Small Nuclear RNA Auxiliary Factor 1) and U2AF1L4 (U2 Small Nuclear RNA Auxiliary Factor 1 Like 4) take part in the protein-protein and protein-RNA interactions required for accurate 3′ splice site selection, while TCERG1 (Transcription Elongation Regulator 1) is a nuclear protein that regulates transcriptional elongation and pre-mRNA splicing.

In the last part of our study, we focused on miRNAs and their target genes. In fact, it is known that several miRNAs are intricately involved in the regulation of granulosa cell processes, crucial for ovarian function and folliculogenesis108,109.

Interestingly, by comparing the lists of DEMs shared by the two GC types, we found 12 DEMs showing a similar expression profile. A further GO analysis indicated significant common enriched terms associated with cell division, proliferation and metabolism, revealing a complex and powerful regulatory network within the two GC subpopulations.

Lastly, we built a co-expression miRNA-mRNA network to infer potential regulatory relationships. A total of 12 common miRNAs were analyzed in the context of DE-mRNAs. In line with earlier studies, we found that miR-129-5p is the most connected miRNA in the miRNA-mRNA network109, while the two pairs miR-30a-5p/miR-148a and miR-30a-3p/miR-196a-5p may act on common target genes. Interestingly, miR-129 is down-regulated in GCs of PCOS patients, whereas the overexpression of miR-129 increases P4 (progesterone) and E2 (17-beta-estradiol) secretion, thus promoting GC proliferation in PCOS110.

As previously reported, the miR-146a-5p expression, higher in MGCs compared to CCs, promotes apoptosis by targeting IRAK1 in MGCs15.

MiR-30a-3p and miR-30a-5p, belonging to the same miR-30 family, are both up-regulated in MGCs, in contrast to miR-148a and miR-196a-5p. The miR-30 family is highly expressed in mammalian gonads and is associated with the Homeobox protein and Zn transport111 and with the regulation of ubiquitin-mediated pathways112. MiR-30a expression is significantly up-regulated in ovarian cancer113,114, and further bioinformatics analysis indicates that miR-30a plays a crucial role in ovarian cancer biology115. Alteration of miRNA expression profiles is associated with PCOS; miR-30a was detected in human follicular fluid and used as a biomarker (in tandem with miR-140 and let-7b) to discriminate between PCOS and normal ovarian reserve with high specificity and sensitivity116.

Recent data show a significant dysregulation of miR-144117 in ovarian tissues of animal model of PCOS and of miR-145 in granulosa cells118 and miR-142119 in follicular fluid, respectively, of PCOS patients.

Compelling evidence shows that miR-148a-3p modulates the Wnt/beta-catenin signaling pathway120 and Serpin family E member 1 (SERPINE1) expression121.

In granulosa cells, SERPINE1 expression can influence follicular development by regulating matrix degradation and tissue remodeling processes necessary for follicle growth and ovulation. Aberrant expression of SERPINE1 can disrupt these processes, leading to reproductive issues such as PCOS122. Furthermore, miR-148a-3p is implicated in the regulation of steroidogenesis and cell proliferation within chicken granulosa cells, targeting genes vital for these processes123.

In addition, hsa-miR-196a-5p regulates granulosa cell apoptosis, affecting cell survival pathways crucial for follicular development by repressing Rho Associated Coiled-coil Containing Protein Kinase 1 (ROCK1)124. ROCK1, a key regulator of the actin cytoskeleton and cell polarity, ensures that apoptotic signals are properly localized within the cell, facilitating the efficient execution of the apoptotic program125.

MiR-145 targets Crkl and through the JNK/p38 MAPK pathway promotes cell proliferation, differentiation, and steroidogenesis126 by regulating the cytoskeleton remodeling127.

The expression of miR-10b is higher in MGCs than in CCs, while the opposite holds for miR-92a-1-5p. Recent data demonstrate that miR-10b represses BDNF expression in granulosa cells by targeting its 3′ UTR128. The expression of miR-10b is controlled by hormones and growth factors in the follicle, such as FSH, FGF9 and some ligands of the TGF-beta pathway. On the other hand, the miR-10 family inhibits many key genes of the TGF-beta signaling axis, suggesting the existence of a negative feedback loop129. SMAD4 is a transcription factor of the CYP19A1 gene (encoding aromatase, a key enzyme in E2 synthesis), which in turn stimulates E2 release and inhibits cell apoptosis in porcine GCs. In contrast, miR-10b might act as a pro-apoptotic factor, directly interacting with the 3′-UTR of porcine CYP19A1 mRNA and inhibiting its expression and function130. On the other hand, recent data demonstrate that miR-92a inhibits GC apoptosis in pig ovaries by directly targeting SMAD7131. Very recent reports have highlighted the emerging role of miRNAs in regulating ovarian angiogenesis, follicular communication and the regulatory networks of the cumulus-oocyte complex (COC) linked to fertilization132,133. The common miRNA network identified here might work to fine-tune various aspects of follicle maturation (i.e. cumulus cell survival/expansion, cumulus-oocyte communication, follicular microenvironment, angiogenesis). Interestingly, miRNA profiling of human GCs and follicular fluid has revealed the presence of miRNAs involved in the regulation of the main pathways underlying folliculogenesis (such as the Wnt signaling134, mitogen-activated protein kinase135 and phosphatidylinositol 3 kinase-protein kinase B136 pathways, respectively). Among these miRNAs is miRNA-10b-5p, which we found upregulated in MGCs137. Furthermore, analysis of the follicular fluid from bovine ovaries unveiled a role of a few miRNAs in global DNA demethylation138.
By targeting DNMT, miRNA-148a, which we found upregulated in MGCs, may impact DNA methylation, thus improving embryonic development138. MiRNA profiles of exosomes derived either from follicular fluid (FF) or from human GCs seem to show a remarkable overlap137. Our study points in the same direction, since we have identified a shared miRNA signature within GCs. Emerging in-depth analyses of miRNA profiles from exosomes (derived from FF and GCs) should soon allow a better definition of the miRNA signature and of its positive effect on follicle development and activation.

In conclusion, these results have overcome the limitations of individual studies, enabling the identification of new candidates, genes, miRNAs, and signaling routes within the mature follicle. Such new molecular signatures can be used to study the mechanisms underlying GCs function, and as potential biomarkers of oocyte quality.

Methods

Data source

RNA-seq and small RNA-seq datasets related to granulosa cells and oocyte were retrieved by consulting the Sequence Read Archive (SRA) according to Search and Incorporation criteria detailed below.

Search criteria

A systematic search strategy was employed to identify relevant datasets in the SRA archive. The search was performed using keywords such as “granulosa cells”, “cumulus cells and oocyte” and “mural and granulosa cells” to ensure the specificity of the results. A filter for the species Homo sapiens was then applied. The SRA archive’s web interface or command-line tools (e.g., SRA Toolkit) were used to execute the search queries, which were constructed to include the relevant keywords and any additional filters or parameters deemed necessary.

Incorporation criteria

The identified datasets were evaluated against specific selection criteria to ensure their suitability for the research project. The criteria considered included data quality, experimental design, sample size (at least three replicates), tissue source and relevant metadata. Only standard RNA-seq data were considered; single-cell datasets were excluded. An important criterion was the sequencing platform: only sequencing data from the Illumina platform were chosen, so that the same pipeline could be set up and applied to the selected Bioprojects to harmonize all datasets before executing downstream integrated analyses.
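The incorporation criteria above can be read as a simple filter over dataset metadata. The following is a minimal sketch, not the authors' actual procedure: the `Dataset` record, its field names and the example accessions (`EXAMPLE_A`, `EXAMPLE_B`, `EXAMPLE_C`) are hypothetical and only illustrate how the stated criteria (human samples, Illumina platform, bulk RNA-seq, at least three replicates) might be encoded.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Dataset:
    # All fields are hypothetical names chosen for this illustration.
    accession: str
    organism: str
    platform: str
    library: str      # "bulk" or "single-cell"
    replicates: int   # smallest number of replicates per group


def is_eligible(ds: Dataset, min_replicates: int = 3) -> bool:
    """Apply the incorporation criteria stated in the text."""
    return (
        ds.organism == "Homo sapiens"
        and ds.platform.lower().startswith("illumina")
        and ds.library == "bulk"
        and ds.replicates >= min_replicates
    )


def select(datasets: List[Dataset]) -> List[str]:
    """Return the accessions of the datasets that pass every criterion."""
    return [ds.accession for ds in datasets if is_eligible(ds)]


# Placeholder records, not real SRA entries.
candidates = [
    Dataset("EXAMPLE_A", "Homo sapiens", "Illumina HiSeq 2000", "bulk", 3),
    Dataset("EXAMPLE_B", "Homo sapiens", "Illumina NovaSeq 6000", "single-cell", 5),
    Dataset("EXAMPLE_C", "Mus musculus", "Illumina HiSeq 2500", "bulk", 4),
]
print(select(candidates))
```

Criteria such as data quality and experimental design require manual review and are deliberately left out of this sketch.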

Data selection

Three bioprojects of interest have been found in the Sequence Read Archive (SRA) for mRNA sequencing experiments on human granulosa cells, and two bioprojects for small RNA sequencing experiments have been found to evaluate the miRNAs profile of these cells. Data selection was focused on experiments that included also the cumulus cells (CCs).

Briefly, Yerushalmi et al.12 analyzed CCs at two stages to gain insight into different maturation phases of the COC. Their article is linked to PRJNA216966, which has 2 samples of CCs from unstimulated germinal vesicle COC and 3 samples of M2 cumulus from expanded/stimulated COC.

Li et al.13 focused on Polycystic Ovary Syndrome. To discover what distinguishes cumulus cells from oocytes and to gain new insight into intercellular communication, our analysis considered only the control groups: from PRJNA649934, we used 6 control oocyte samples and 4 control cumulus cell samples.

The article by Velthut-Meikas et al.14 is linked to two bioprojects, PRJNA200696 and PRJNA200699, corresponding respectively to RNA-seq and small RNA-seq experiments on cumulus and mural cells at the preovulatory stage. All samples were treated with the standard stimulation protocol: GnRH antagonist, rFSH, hCG. The former has 6 samples in total, 3 for CCs and 3 for mural cells (MCs); the latter has 12 samples, 6 for CCs and 6 for MCs.

The small RNA-seq data on granulosa cells were completed with PRJNA41797315. A total of 24 runs were analyzed: 12 for CCs (3 samples, 4 runs per sample) and 12 for MCs (3 samples).

In our analysis, we observed a substantial disproportion in the number of PCOS datasets compared to control and standard treatment datasets. We did not prioritize the analysis of PCOS datasets (like PRJNA762274, PRJEB46048, PRJNA576435, PRJNA576231) since our primary objective was to investigate and characterize the transcriptome of Cumulus Cells under control conditions. To mitigate the inherent bias caused by the unequal distribution of datasets, we employed stringent criteria for dataset selection. We prioritized datasets with well-defined metadata, rigorous experimental design, and a balanced representation of PCOS cases, control subjects, and standard treatment groups. Datasets lacking essential information or exhibiting a skewed distribution were regrettably excluded from our analysis to ensure the robustness and validity of our findings.

Data download

The selected datasets were downloaded from the SRA archive using SRA Toolkit v3.0.0. The accession numbers or unique identifiers of the chosen datasets were recorded for future reference. The complete metadata are reported in Supplementary Table 1.
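As an illustration of the download step, the sketch below builds the two SRA Toolkit commands typically used to fetch a run and convert it to FASTQ (`prefetch` followed by `fasterq-dump`). The exact options used by the authors are not stated in the text, so the output directory and the run accession `SRRXXXXXXX` are placeholders, and the commands are only printed rather than executed.

```python
import shlex
from typing import List


def sra_download_commands(run_accession: str, out_dir: str = "fastq") -> List[List[str]]:
    """Build (but do not run) the SRA Toolkit commands for one run."""
    return [
        # Fetch the .sra archive for the run into the local cache.
        ["prefetch", run_accession],
        # Convert the archive to FASTQ files written under out_dir.
        ["fasterq-dump", run_accession, "--outdir", out_dir],
    ]


# Placeholder accession; replace with the run identifiers of interest.
for cmd in sra_download_commands("SRRXXXXXXX"):
    print(shlex.join(cmd))
```

Each command list can be passed to `subprocess.run(cmd, check=True)` on a machine where the SRA Toolkit is installed.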

mRNA-Seq data processing

Quality control and trimming

A first quality control was performed on the FASTQ files using FastQC (version 0.11.9). Low-quality reads and adapters were then removed using Trimmomatic v.0.39 (installed in g100). For adapter trimming of paired-end reads, the following TruSeq adapters were used:

>PrefixPE/1

TACACTCTTTCCCTACACGACGCTCTTCCGATCT

>PrefixPE/2

GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT

>PE1

TACACTCTTTCCCTACACGACGCTCTTCCGATCT

>PE1_rc

AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA

>PE2

GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT

>PE2_rc

AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC

For single end:

>TruSeq3_IndexedAdapter

AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC

>TruSeq3_UniversalAdapter

AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA
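The `_rc` entries in the paired-end adapter list are the reverse complements of the `PE1` and `PE2` sequences, and the two single-end adapters coincide with those reverse complements. A short sketch that checks this relationship on the sequences listed above (the helper name `reverse_complement` is ours):

```python
# Translation table mapping each base to its complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Complement every base, then reverse the sequence."""
    return seq.translate(COMPLEMENT)[::-1]


# Adapter sequences copied from the list above.
PE1 = "TACACTCTTTCCCTACACGACGCTCTTCCGATCT"
PE2 = "GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT"
PE1_RC = "AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA"
PE2_RC = "AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC"

print(reverse_complement(PE1) == PE1_RC)
print(reverse_complement(PE2) == PE2_RC)
```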

The Trimmomatic parameters used for both paired-end and single-end reads were LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36.
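To make the meaning of these parameters concrete, the sketch below applies the four steps to a list of per-base Phred scores: bases below quality 3 are removed from both ends, the read is cut at the first 4-base window whose mean quality drops below 15, and the read is discarded if fewer than 36 bases remain. This is a simplified re-implementation for illustration only; Trimmomatic's own sliding-window logic differs in some edge cases, and the function name `trim_read` is hypothetical.

```python
from typing import List, Optional, Tuple


def trim_read(
    quals: List[int],
    leading: int = 3,
    trailing: int = 3,
    window: int = 4,
    min_avg: float = 15.0,
    minlen: int = 36,
) -> Optional[Tuple[int, int]]:
    """Return the (start, end) slice to keep, or None if the read is dropped."""
    start, end = 0, len(quals)
    # LEADING: drop low-quality bases from the 5' end.
    while start < end and quals[start] < leading:
        start += 1
    # TRAILING: drop low-quality bases from the 3' end.
    while end > start and quals[end - 1] < trailing:
        end -= 1
    # SLIDINGWINDOW: scan 5' to 3' and cut at the first window whose
    # mean quality falls below the threshold (simplified behaviour).
    for i in range(start, end - window + 1):
        if sum(quals[i:i + window]) / window < min_avg:
            end = i
            break
    # MINLEN: discard reads that became too short.
    if end - start < minlen:
        return None
    return start, end


# Toy quality profiles, not real data.
good = [2, 2] + [30] * 40 + [2, 2, 2]
decaying = [30] * 20 + [5] * 20
print(trim_read(good))
print(trim_read(decaying))
```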

Alignment and annotation

The HISAT2 aligner (version 2.2.1)139 was employed to align the RNA-Seq reads to the Homo sapiens reference genome assembly (hg38). Prior to alignment, the reference genome assembly was indexed using HISAT2’s indexing utility. The alignment of the RNA-Seq reads to the reference genome was performed using the main HISAT2 command. Both paired and unpaired reads remaining after trimming were used.

The resulting alignment file in SAM format was converted to the binary BAM format for efficient storage. To facilitate downstream analysis, the BAM file was then sorted by genomic coordinates. Both steps were performed using SAMtools (version 1.13)140.

If the experiment had multiple runs for the same sample, the BAM files were merged using SAMtools. As a result, there was a BAM file for each sample.
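The alignment, conversion, sorting and merging steps described above can be sketched as a sequence of command lines. The example below only builds the argument lists; the file names, the index prefix and the helper names are hypothetical, and the authors' exact invocations (threads, strandedness options and so on) are not given in the text.

```python
import shlex
from typing import List


def index_command(genome_fasta: str, index_prefix: str) -> List[str]:
    """Build the HISAT2 index for the reference genome."""
    return ["hisat2-build", genome_fasta, index_prefix]


def alignment_commands(
    run: str, index_prefix: str, r1: str, r2: str, unpaired: List[str]
) -> List[List[str]]:
    """Align one run, convert SAM to BAM, and sort by coordinate."""
    sam, bam, sorted_bam = f"{run}.sam", f"{run}.bam", f"{run}.sorted.bam"
    return [
        # Paired reads via -1/-2, reads left unpaired by trimming via -U.
        ["hisat2", "-x", index_prefix, "-1", r1, "-2", r2,
         "-U", ",".join(unpaired), "-S", sam],
        ["samtools", "view", "-b", "-o", bam, sam],
        ["samtools", "sort", "-o", sorted_bam, bam],
    ]


def merge_command(sample: str, run_bams: List[str]) -> List[str]:
    """Merge the sorted BAM files of all runs belonging to one sample."""
    return ["samtools", "merge", f"{sample}.merged.bam", *run_bams]


# Placeholder file names for illustration.
steps = [index_command("hg38.fa", "hg38_index")]
steps += alignment_commands(
    "run1", "hg38_index", "run1_1P.fq.gz", "run1_2P.fq.gz",
    ["run1_1U.fq.gz", "run1_2U.fq.gz"],
)
steps.append(merge_command("sampleA", ["run1.sorted.bam", "run2.sorted.bam"]))
for cmd in steps:
    print(shlex.join(cmd))
```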

The quantification of gene expression levels, based on the aligned reads and a genome annotation file (GTF or GFF), was performed using StringTie (version 2.1.6)141. This step involved assigning reads to genes and counting the number of reads associated with each gene. The genome annotation file was obtained from hg3819 release and provided in GTF format. The prepDE.py Python script was used to generate the gene count matrix for differential expression analysis, as described in the StringTie handbook142.
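The StringTie documentation describes the count derivation used by the prepDE script as reads per transcript = coverage × transcript length / read length, with gene-level counts obtained by summing over the transcripts of a gene. The sketch below illustrates that arithmetic only. The default read length of 75 and the rounding up to an integer are assumptions of this sketch, and the transcript values are toy numbers.

```python
import math
from typing import Dict, List, Tuple


def coverage_to_count(coverage: float, transcript_len: int, read_len: int = 75) -> int:
    """Approximate read count for one transcript from its average coverage."""
    return int(math.ceil(coverage * transcript_len / read_len))


def gene_counts(
    transcripts: List[Tuple[str, float, int]], read_len: int = 75
) -> Dict[str, int]:
    """Sum transcript-level counts per gene.

    Each transcript is given as (gene_id, coverage, transcript_length).
    """
    counts: Dict[str, int] = {}
    for gene_id, coverage, length in transcripts:
        counts[gene_id] = counts.get(gene_id, 0) + coverage_to_count(coverage, length, read_len)
    return counts


# Toy values: two isoforms of one hypothetical gene and one single-isoform gene.
toy = [("geneA", 10.0, 1500), ("geneA", 2.0, 750), ("geneB", 0.5, 100)]
print(gene_counts(toy))
```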

Data quality control before dataset integration

The datasets coming from different bioprojects were processed using the shared pipeline described above. This consistent processing helps mitigate some of the variability introduced by differences in library preparation and sequencing depth. However, we recognize that such differences can influence gene detection efficiency and expression levels, potentially affecting downstream pathway analysis. To address this, we conducted several quality control measures to assess and account for these differences.

To provide a clearer understanding of dataset quality and sequencing depth, we conducted a FastQC analysis, which is reported under the Supplementary Dataset143. The MultiQC report provides a comprehensive overview of the raw FASTQ data quality, including metrics such as read quality scores, adapter content and duplication levels before and after trimming. Furthermore, we conducted a PCA on the count matrices of each bioproject (S7). This analysis helps visualize the similarities and differences in gene expression profiles. The PCA clearly highlighted distinct sample clustering at the dataset level, with samples segregating by cell type and not by potentially confounding batch effects. The subsequent gene-count normalization performed by the DESeq2 package ensured proper comparisons and a robust final list of DEGs.

mRNA post-processing data analysis

Most post-processing analyses were conducted in the R programming language with RStudio (version 2022.02.3 Build 492). The raw count data obtained from StringTie were used as input for differential expression analysis. The DESeq2 R package (version 1.36.0)144 was used to normalize the raw counts, estimate size factors, and perform statistical tests to identify differentially expressed genes. This analysis aimed to identify genes with significantly different expression levels between experimental conditions or cell types.
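To illustrate what size-factor estimation does, the following minimal Python sketch implements the median-of-ratios idea on a toy count matrix. It is not the DESeq2 code used in this study; the function names and the dictionary layout are assumptions made for the example, and only genes with non-zero counts in every sample contribute to the factors.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors.

    counts: dict mapping sample name -> list of raw counts,
            with genes in the same order for every sample (assumed layout).
    """
    samples = list(counts)
    n_genes = len(counts[samples[0]])

    # Log geometric mean of each gene across samples.
    # Genes with a zero count in any sample are skipped (None).
    log_geo_means = []
    for g in range(n_genes):
        values = [counts[s][g] for s in samples]
        if all(v > 0 for v in values):
            log_geo_means.append(sum(math.log(v) for v in values) / len(values))
        else:
            log_geo_means.append(None)

    # Size factor = median ratio of a sample's counts to the geometric means.
    factors = {}
    for s in samples:
        log_ratios = [
            math.log(counts[s][g]) - lgm
            for g, lgm in enumerate(log_geo_means)
            if lgm is not None
        ]
        factors[s] = math.exp(median(log_ratios))
    return factors


def normalize(counts):
    """Divide each sample's raw counts by its size factor."""
    factors = size_factors(counts)
    return {s: [c / factors[s] for c in counts[s]] for s in counts}


# Toy example: sample "b" was sequenced twice as deeply as sample "a".
toy = {"a": [10, 20, 30], "b": [20, 40, 60]}
print(size_factors(toy))
print(normalize(toy))
```

In the toy example the two samples differ only in depth, so their normalized counts coincide.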

Statistical analysis was conducted using DESeq2. The analysis considered relevant factors such as treatment condition, replicates, and experimental design. Genes were considered significantly differentially expressed, whether up- or down-regulated, using an adjusted p-value threshold of 0.05 and an absolute log2 fold change threshold of 2.
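The thresholding step can be sketched in a few lines of Python. This is an illustration only: the gene names are hypothetical, the input layout is an assumption, and whether each cutoff is inclusive or strict is also an assumption here (adjusted p-value strictly below 0.05, absolute log2 fold change of at least 2).

```python
def classify_degs(results, padj_cutoff=0.05, lfc_cutoff=2.0):
    """Split genes into up- and down-regulated DEG lists.

    results: dict mapping gene -> (log2_fold_change, adjusted_p_value).
             An adjusted p-value of None stands for a missing (NA) value.
    """
    up, down = [], []
    for gene, (lfc, padj) in results.items():
        # Skip genes without a usable adjusted p-value or above the cutoff.
        if padj is None or padj >= padj_cutoff:
            continue
        if lfc >= lfc_cutoff:
            up.append(gene)
        elif lfc <= -lfc_cutoff:
            down.append(gene)
    return sorted(up), sorted(down)


# Hypothetical results table.
example = {
    "GENE_A": (3.1, 0.001),   # up-regulated
    "GENE_B": (-2.5, 0.01),   # down-regulated
    "GENE_C": (4.0, 0.20),    # large change, not significant
    "GENE_D": (1.0, 0.001),   # significant, change too small
    "GENE_E": (2.2, None),    # no adjusted p-value
}
up, down = classify_degs(example)
print(up, down)
```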

Gene Ontology (GO) enrichment analysis was performed using the clusterProfiler R package (version 4.7.1.003)145 to gain insights into the functional annotation and enrichment of DEGs within bioprojects. The gene list used for GO analysis consisted of genes that were significantly differentially expressed, whether up- or downregulated, between experimental conditions or cell types, as determined by the differential expression analysis using DESeq2.

We acknowledge that DEGs from two of the three datasets might include markers specific to mural GCs and oocytes, which could influence the functional analysis results. To address this, we conducted a detailed analysis to show the up-regulated and down-regulated genes separately in each cell type across multiple datasets (Fig. S4).

Before GO analysis, the gene identifiers in the input gene list were converted, where necessary, to a common identifier system, such as Entrez Gene IDs or official gene symbols, using the org.Hs.eg.db annotation database146.

The gene list was annotated with GO terms using the enrichGO function from clusterProfiler, based on the human GO annotation database (org.Hs.eg.db). The enrichGO function identifies significantly enriched GO terms within the gene list by performing a hypergeometric test to determine whether a particular GO term is overrepresented in the input gene list compared to the genome background. The p-values were adjusted for multiple testing using the Benjamini-Hochberg correction.
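The statistics behind this step can be sketched in plain Python. The code below is not clusterProfiler's implementation; it is a minimal, self-contained illustration of a one-sided hypergeometric over-representation test followed by Benjamini-Hochberg adjustment, with function and parameter names chosen for the example.

```python
from math import comb


def hypergeom_pvalue(k, n, K, N):
    """Over-representation p-value P(X >= k).

    N: number of background genes
    K: background genes annotated to the GO term
    n: size of the input gene list
    k: input genes annotated to the GO term
    """
    upper = min(n, K)
    total = comb(N, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total


def benjamini_hochberg(pvalues):
    """Return BH-adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Toy example: 8 of 40 input genes hit a term that covers 100 of 10,000 genes.
p = hypergeom_pvalue(k=8, n=40, K=100, N=10_000)
print(p)
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```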

The results of the GO enrichment analysis were visualized using plotting functions from clusterProfiler. To remove redundancy among enriched GO terms, the simplify function from the clusterProfiler package was applied. This step provided a more concise and interpretable representation of the enriched GO terms.

KEGG pathway enrichment analysis was performed with clusterProfiler following the same steps as the GO enrichment analysis, except that the KEGG database was downloaded locally because of known problems with clusterProfiler connecting to the KEGG database. The local “KEGG.db” was generated with the create_kegg_db(“hsa”) command from the “createKEGGdb” R package and then loaded.

In parallel, to reveal hidden patterns of common biological processes, clusterProfiler was also run in compareCluster mode, with the complete lists of DEGs from each bioproject used simultaneously as input.

A protein-protein interaction network was built with the STRING web application (v12.0), using the core set of 236 shared DEGs as input.

Differential alternative splicing analysis

To investigate differential alternative splicing events between experimental conditions or treatments, MAJIQ (version 2.4) was employed147,148. The MAJIQ software was downloaded from the official website149 and installed via Conda. The required dependencies and reference annotations were obtained according to the MAJIQ documentation. A MAJIQ configuration file was created to specify the experimental design and sample information. This file included details such as the input BAM files for each condition or treatment, sample groupings, and experimental factors.

The MAJIQ “build” command was used to detect and quantify alternative splicing events from the aligned RNA-Seq data in BAM format. A gene annotation file in GFF3 format was also required; the file used for the analysis is available from GENCODE. The output is a .majiq file for each sample, which is used as input for the “deltapsi” command to identify differentially spliced events between conditions or cell types.

The output of the differential splicing analysis provided the differential splicing events, including their psi values (percent spliced-in) and statistical significance.

MAJIQ calculates differential alternative splicing over local splicing variations (LSVs), not over the entire gene. As a result, even relatively small LSV changes (20%) can strongly affect differential splicing. This change threshold, together with q < 0.05, was therefore used to produce filtered lists of LSVs.
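The filtering criterion can be sketched as follows. This is an illustration only: the record layout and field names (`lsv_id`, `dpsi`, `q`) are hypothetical and do not mirror the MAJIQ output format, and the rule that an LSV passes if any of its junctions reaches the change threshold is an assumption.

```python
def filter_lsvs(lsvs, min_abs_dpsi=0.20, max_q=0.05):
    """Keep LSVs with q < max_q and at least one junction
    whose absolute delta-PSI reaches min_abs_dpsi."""
    kept = []
    for lsv in lsvs:
        if lsv["q"] >= max_q:
            continue
        if any(abs(d) >= min_abs_dpsi for d in lsv["dpsi"]):
            kept.append(lsv["lsv_id"])
    return kept


# Hypothetical records, one per LSV, with one delta-PSI value per junction.
records = [
    {"lsv_id": "lsv_1", "dpsi": [0.35, -0.35], "q": 0.01},   # kept
    {"lsv_id": "lsv_2", "dpsi": [0.10, -0.10], "q": 0.001},  # change too small
    {"lsv_id": "lsv_3", "dpsi": [-0.25, 0.25], "q": 0.30},   # not significant
]
print(filter_lsvs(records))
```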

Results were visualized with Voila, a visualization package that combines the output of MAJIQ Builder (“build”) and MAJIQ Quantifier (“deltapsi”) using interactive D3 components and HTML5. The Modulizer tool was then used to categorize the different splicing event types. A custom script was written to obtain the distribution of each alternative splicing category in each bioproject.

To gain insights into the biological functions and pathways associated with differential alternative splicing (DAS), GO enrichment analysis was performed on DAS transcription regulators. DAS genes can be linked to their corresponding gene annotations or functional databases to determine enriched terms or pathways. The enrichGO function was used to identify significantly enriched GO terms within the gene list. An additional analysis of DAS transcription regulators and their targets was performed using the TRRUST database. Targets were restricted to differentially expressed genes, and a GO enrichment analysis was then performed on the filtered target list150.

The violin plot in Fig. S4 serves as an exploratory visualization. Its primary input consists of the lists of DEGs associated with each TF within individual bioprojects. The plot is further annotated to identify genes that are both TFs and DAS. This visualization was created using the ggplot2 R package.

Small RNA-seq analysis

For PRJNA200699, miRDeep2 (version 0.1.2)151 was used to trim reads and to detect and quantify miRNA expression.

For PRJNA417973, specific adapter trimming was required because of the library construction152. Cutadapt v4.2 was run with the following command: cutadapt --discard-untrimmed --minimum-length=18 --maximum-length=35 -a NNNNTGGAATTCTCGGGTGCCAAGGNNNN -o clipped_SAMN08013076.fastq -j 0 -O 11 SAMN08013076.fastq.
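The effect of these options can be illustrated with a simplified Python sketch. It is not Cutadapt's algorithm: it uses exact matching with N as a wildcard, takes the leftmost match, and ignores the mismatch tolerance that Cutadapt allows. The function names are hypothetical. The sketch mirrors the 3′ adapter removal, the minimum overlap of 11 bases, the discarding of untrimmed reads, and the 18 to 35 nt length window.

```python
ADAPTER = "NNNNTGGAATTCTCGGGTGCCAAGGNNNN"


def matches(adapter_part, read_part):
    """Exact comparison where N in the adapter matches any base."""
    return all(a == "N" or a == r for a, r in zip(adapter_part, read_part))


def trim_3prime(read, adapter=ADAPTER, min_overlap=11):
    """Return the read without the adapter and everything after it,
    or None if no adapter match of at least min_overlap bases exists."""
    for start in range(len(read)):
        # The adapter may be complete or cut off by the end of the read.
        overlap = min(len(adapter), len(read) - start)
        if overlap < min_overlap:
            break
        if matches(adapter[:overlap], read[start:start + overlap]):
            return read[:start]
    return None


def process(reads, min_len=18, max_len=35):
    """Trim reads, drop untrimmed ones, and apply the length window."""
    kept = []
    for read in reads:
        trimmed = trim_3prime(read)
        if trimmed is None:
            continue  # corresponds to --discard-untrimmed
        if min_len <= len(trimmed) <= max_len:
            kept.append(trimmed)
    return kept


insert = "TAGCTTATCAGACTGATGTTGA"  # 22-nt example insert
read = insert + "ACGT" + "TGGAATTCTCGGGTGCCAAGG" + "ACGT" + "AAAA"
print(process([read]))
```

Because the adapter starts with four N positions, the four random bases that precede the fixed adapter sequence are removed together with it.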

Gene Ontology enrichment analysis was performed as previously described, using as input the list of target genes (experimentally validated only) of the differentially expressed miRNAs. A Venn diagram representing the DEMs shared between the two bioprojects was then plotted using ggVenn from ggplot2 and the VennDiagram R package153.

An interaction network was built from the differentially expressed miRNAs shared by both projects and their target DEGs from PRJNA200696, using the igraph R package with the Fruchterman-Reingold algorithm for graph layout. miRNA targets were downloaded using the multiMiR R package154, applying the validated target filter.
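The selection of network edges can be sketched in Python as follows. This is not the igraph or multiMiR code used in the study; the function name, the input layout, and the miRNA and gene identifiers are hypothetical. The sketch only shows how shared DEMs and their validated targets are intersected with the DEG list to obtain miRNA-target edges; the layout step is left to a graph library.

```python
def build_edges(shared_dems, validated_targets, degs):
    """Return (miRNA, target) edges restricted to shared DEMs and DEGs.

    shared_dems:       set of miRNAs differentially expressed in both projects
    validated_targets: dict mapping miRNA -> set of validated target genes
    degs:              set of differentially expressed genes
    """
    edges = []
    for mirna in sorted(shared_dems):
        for target in sorted(validated_targets.get(mirna, ())):
            if target in degs:
                edges.append((mirna, target))
    return edges


# Hypothetical inputs.
shared = {"miR-X", "miR-Y"}
targets = {
    "miR-X": {"GENE_A", "GENE_B"},
    "miR-Y": {"GENE_B", "GENE_C"},
    "miR-Z": {"GENE_A"},  # not shared, so ignored
}
deg_set = {"GENE_A", "GENE_B"}
print(build_edges(shared, targets, deg_set))
```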

List of abbreviations and acronyms used in the paper

GCs (Granulosa Cells); CCs (Cumulus Cells); miRNAs (microRNAs); mRNAs (messenger RNAs); DEMs (Differentially Expressed miRNAs); DEGs (Differentially Expressed Genes); DAS (Differential Alternative Splicing); COC (cumulus-oocyte complex); hCG (human chorionic gonadotrophin); M2 (metaphase 2); PCOS (Polycystic Ovary Syndrome); DE (Differentially Expressed); SRA (Sequence Read Archive); GO (Gene Ontology); KEGG (Kyoto Encyclopedia of Genes and Genomes); TR (transcription regulator); ALE/AFE (Alternative Last/First Exon); MES (Multiple Exon Skipping); MXE (Mutually Exclusive Exons); alt3′ (alternative 3′ exon); alt5′ (alternative 5′ exon).