Introduction

Sex-sorted semen in bovines is a groundbreaking development in the dairy industry. This technology yields a greater proportion of milk-producing animals (~ 90%) than conventional (unsorted) semen (~ 50%), thereby increasing profits for dairy farmers1. Additionally, it enhances resource-use efficiency, accelerates genetic gain, and promotes animal welfare by reducing the number of undesired male calf births2. However, the current flow cytometer-based technology for sex sorting of bovine semen is proprietary, slow, expensive, and associated with lower conception rates3. Therefore, developing alternative technologies is crucial for advancing milk production as a global public good. Ongoing research and development efforts aim to refine this technology, making it more accessible and efficient and further solidifying its role in sustainable livestock production2,3. There are significant gaps in the omics knowledge bases concerning X- and Y-chromosome-bearing spermatozoa (X- and Y-sperm) of cattle. The lack of a comprehensive omics dataset hampers the ability to draw meaningful comparisons across livestock species and to develop robust biomarkers for sex-sorting of bovine semen with improved conception rates. While some progress has been made using advanced techniques such as RNA sequencing and mass spectrometry, these studies often focus on model organisms, leaving a gap in our knowledge of agriculturally important species. Additionally, the dynamic nature of sperm and their interactions with the female reproductive tract are not well understood, further complicating efforts to develop more efficient sex-sorting methods. A detailed understanding of the molecular biology of X- and Y-sperm will not only address these gaps but also contribute to the broader fields of reproductive health and biology4,5.

Recent studies on the transcriptome profiles of unsorted sperm from various species, including Bos indicus, have refuted the notion that sperm are transcriptionally dormant. Comparative male fertility has been the primary focus of these transcriptomics investigations6,7,8,9. However, there is a paucity of research on the differential expression of genes in bovine X- and Y-sperm. Suppression subtractive hybridization and cDNA microarray analysis revealed 31 upregulated genes, 27 in bovine X-sperm and four in Y-sperm10. The advent of advanced RNA sequencing technologies has revolutionized transcriptome studies by reducing the quantity of RNA required. An RNA sequencing study found that 492 genes are encoded by the mouse X chromosome, compared with only 15 genes by the Y chromosome11.

Earlier, we reported the proteomics of unsorted sperm and identified differential expression of plasma membrane-associated proteins between X- and Y-sperm of indicus cattle12. Here, we hypothesise that gene expression differs between bovine X- and Y-sperm and that these differences can serve as the basis for their segregation. Therefore, our objective was to study the comparative transcriptome of unsorted and sexed bovine X- and Y-sperm with gene ontology and pathway enrichment analysis, and to validate the differential gene expression. This report provides novel insights into the complexities of sex differences in dairy cattle.

Methods

General information

Unsorted, sex-sorted X and sex-sorted Y semen samples (n = 3 bulls each for RNA-Seq) were obtained from reputed bull semen stations. Semen from pedigreed Sahiwal bulls, a prized cattle breed of India and Pakistan, of 4–6 years of age and high genetic merit was used for the present study. The collected ejaculates met the minimum standards required for production of frozen semen as prescribed by the Department of Animal Husbandry & Dairying, Ministry of Fisheries, Animal Husbandry & Dairying, Government of India13. The sperm were sorted following the principle of Beltsville Sperm Sexing Technology14. A high-speed BD Influx cell sorter was used for the sperm sorting process. The purity of the sexed semen (both X- and Y-sorted) was approximately 90%. Six biological replicates (bulls) and three technical replicates were used for each of the groups for qPCR validation. A minimum of two ejaculates per bull were collected, and the amount of each ejaculate per bull was balanced for downstream applications. All methods were carried out in accordance with the relevant guidelines and regulations and are reported in accordance with ARRIVE guidelines (https://arriveguidelines.org).

Experimental design and sampling

Three biological replicates (bulls) for each group viz., Conventional/unsorted (C1, C2, C3), sex-sorted X-sperm (T1A, T1B, T1C) and sex-sorted Y-sperm (T2A, T2B, T2C) were used for RNA-Seq analysis. C versus T1, C versus T2 and T1 versus T2 were the experimental conditions.
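The design above can be written down as a small sample sheet. The sketch below uses only the group codes and replicate labels given in the text; the helper name pairwise_contrasts is hypothetical and is included purely for illustration.

```python
from itertools import combinations

# Group codes and replicate labels exactly as given in the text.
SAMPLES = {
    "C": ["C1", "C2", "C3"],       # conventional / unsorted sperm
    "T1": ["T1A", "T1B", "T1C"],   # sex-sorted X-sperm
    "T2": ["T2A", "T2B", "T2C"],   # sex-sorted Y-sperm
}


def pairwise_contrasts(groups):
    """Return every unordered pair of group codes, in listing order."""
    return list(combinations(groups, 2))


contrasts = pairwise_contrasts(SAMPLES)
for first, second in contrasts:
    print(f"{first} versus {second}: {SAMPLES[first]} vs {SAMPLES[second]}")
```

Three groups give exactly three pairwise comparisons, which matches the three experimental conditions listed.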

RNA extraction

RNA isolation was performed on pooled ejaculates from various bulls. The samples were first purified and washed with phosphate-buffered saline (PBS, pH 7.2). Total RNA was isolated via a protocol described previously by us15. In brief, sperm samples (10 million cells for each biological replicate) were homogenized and lysed with a lysis buffer cocktail [0.1 ml of lysis buffer from the RNeasy Plus Mini Kit (Qiagen, USA), 0.9 ml of Qiazol (Qiagen) and 0.1 ml of TCEP (Sigma, USA)]. This was followed by phase separation with 0.2 ml of chloroform. The remaining steps were performed according to the RNeasy Plus Mini Kit (Qiagen) protocol, including a DNase treatment step. To rule out contamination by somatic cells, we performed molecular screening of the RNA used for RNA-Seq and qPCR, following the methods outlined by Selvaraju et al.8. Spermatozoal gDNA contamination was checked using a set of intron-spanning primers for protamine 1 (PRM1). To confirm that the RNA was free from other types of contamination, cell-specific intron-spanning primers for the cadherin 1 gene (CDH1, for epithelial cells), protein tyrosine phosphatase receptor type C gene (PTPRC, for leukocytes) and KIT oncogene (KIT, for germ cells) were used. The primer sequences are provided in Supplementary Table 1. The RNA was stored at − 80 °C for subsequent applications.

RNA-seq analysis

The NEBNext Ultra II Directional RNA Library Prep Kit (New England Biolabs, USA) was used to prepare high-quality libraries according to the manufacturer’s protocols, and paired-end sequencing reads of 150 bp were generated with the Illumina HiSeq X sequencing platform. The raw reads were checked via FastQC16. Fastp, an ultrafast FASTQ preprocessor with useful quality control and data-filtering features, was used to filter the reads17. A cut-off of 30 was set for the Phred quality score, and only high-quality reads were retained.
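To make the Phred cut-off concrete, the following minimal sketch converts a Phred score to a base-calling error probability via Q = −10·log10(p) and decodes a FASTQ quality string. It assumes the common Phred+33 ASCII encoding; the function names are hypothetical, and the sketch does not reproduce how fastp applied the threshold, which the text does not specify.

```python
def phred_to_error_probability(q):
    """Convert a Phred score Q to the probability that the base call is wrong."""
    return 10 ** (-q / 10)


def decode_quality_string(quality_string, offset=33):
    """Decode a FASTQ quality string into Phred scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in quality_string]


def fraction_at_or_above(quality_string, cutoff=30, offset=33):
    """Fraction of bases in one read whose Phred score is at least `cutoff`."""
    scores = decode_quality_string(quality_string, offset)
    return sum(score >= cutoff for score in scores) / len(scores)


# A Phred score of 30 corresponds to one expected error in 1,000 base calls.
print(phred_to_error_probability(30))
# Hypothetical three-base quality string, for illustration only.
print(decode_quality_string("I?5"))
```

A score of 30 therefore means a base-call accuracy of 99.9%.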

Mapping and alignment of reads to the Bos indicus genome

The reads were aligned against the NCBI reference genome of Bos indicus (assembly Bos_indicus_1.0) using HISAT2, a fast and sensitive alignment program18. To assess alignment quality, we checked several parameters, including the percentage of mapped reads. The mapped reads were subsequently used for transcript assembly and quantification of transcript abundance by StringTie19.

Transcript classification

The transcript counts obtained from each sample were used to analyse the differential expression of transcripts between alternate conditions using DESeq220. Transcripts with both a p value and an adjusted p value below 0.05 were selected for further analysis. Log2 fold change cut-offs of ≥ + 2 for upregulated transcripts and ≤ − 2 for downregulated transcripts were used in all three conditions (C vs T1, C vs T2 and T1 vs T2). Transcripts with FPKMs present in only one group were considered unique. The differentially expressed spermatozoa transcripts between the alternate conditions were visualised as heatmaps, volcano plots and PCA plots using online tools.
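The selection rules above can be sketched as a small filter. This is an illustration, not the analysis code used in the study: the function names, the example records, and the reading of "present" as FPKM > 0 are all assumptions.

```python
def classify_transcript(log2_fc, p_value, adj_p_value, alpha=0.05, lfc_cutoff=2.0):
    """Label one transcript using the cut-offs stated in the text."""
    # Missing statistics (e.g. an undefined adjusted p value) are not selected.
    if p_value is None or adj_p_value is None:
        return "not selected"
    if p_value >= alpha or adj_p_value >= alpha:
        return "not selected"
    if log2_fc >= lfc_cutoff:
        return "upregulated"
    if log2_fc <= -lfc_cutoff:
        return "downregulated"
    return "not selected"


def unique_group(fpkm_by_group):
    """Return the only group with FPKM > 0, or None if zero or several groups have it."""
    expressed = [group for group, fpkm in fpkm_by_group.items() if fpkm > 0]
    return expressed[0] if len(expressed) == 1 else None


# Hypothetical records: (transcript id, log2 fold change, p value, adjusted p value)
examples = [
    ("tx_a", 3.1, 0.001, 0.01),
    ("tx_b", -2.0, 0.002, 0.03),
    ("tx_c", 2.5, 0.04, 0.20),   # fails the adjusted p value cut-off
    ("tx_d", 1.2, 0.001, 0.01),  # fails the fold-change cut-off
]
labels = {tx: classify_transcript(lfc, p, padj) for tx, lfc, p, padj in examples}
print(labels)
```

Both conditions must hold: a transcript with a large fold change but an adjusted p value of 0.05 or more is not selected, and neither is a significant transcript whose fold change falls inside the ± 2 band.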

Gene ontology and pathway analysis

The obtained spermatozoa transcripts were subjected to gene ontology (GO) classification via the Panther classification system (PANTHER 18.0) and the Database for Annotation, Visualization, and Integrated Discovery (DAVID) Bioinformatics Resources21. The transcripts were classified into four categories: molecular function (MF), biological process (BP), cellular component (CC), and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway. Highcharts was used to create a donut pie chart representing the BPs, CCs, and MFs of genes that are unique to bovine X-sperm and Y-sperm. Pathway enrichment was performed via enriched KEGG functions and clusterProfiler. Using the open-source Cytoscape (version 3.10.1) platform, the ClueGO (version 2.5.4) and CluePedia (version 1.5.4) plugins were employed to analyse gene interactions, comprehensive networks of integrated GO categories, and pathways22. We found several configurations representing a network of interactions between related genes. The Bos indicus genome served as the background for each analysis.

Validation of differential gene expression by qPCR

The top 22 differentially expressed genes (DEGs) in bovine Y-sperm (in comparison with X-sperm) were selected and validated by qPCR using SYBR Green chemistry. Six or more bulls for each of the experimental groups (unsorted, C; X-sorted, T1; and Y-sorted, T2) were used for qPCR analysis. GAPDH was used as the reference gene in the qPCR for determining relative expression levels. The choice of GAPDH was based on previous work23 that evaluated the expression stability of 10 commonly used housekeeping genes in bovine spermatozoa and identified GAPDH as the most stable internal control gene (ICG) on the basis of analysis by geNorm, NormFinder, Delta Ct and the comprehensive ranking using RefFinder. GAPDH was also found to be stable across the experimental groups on the basis of Ct values and melt curve analysis in the present study. The primer sequences for the 22 selected genes are provided in Supplementary Table 1. Statistical analysis of the gene expression data from the qPCR was performed by Student’s t-test using Microsoft Excel.
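For readers who want to see the statistic behind the comparison, the sketch below computes a two-sample Student's t statistic with pooled variance. The text does not state which variant of the test was run in Excel, so the unpaired, equal-variance form is an assumption, and the ΔCt values are hypothetical.

```python
import math
from statistics import mean, variance


def students_t(sample_a, sample_b):
    """Two-sample Student's t statistic (pooled variance) and degrees of freedom."""
    n_a, n_b = len(sample_a), len(sample_b)
    df = n_a + n_b - 2
    # Pooled variance weights each sample variance by its degrees of freedom.
    pooled_var = ((n_a - 1) * variance(sample_a) + (n_b - 1) * variance(sample_b)) / df
    standard_error = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(sample_a) - mean(sample_b)) / standard_error, df


# Hypothetical delta-Ct values for six bulls per group (not data from the study).
x_sperm = [6.1, 5.8, 6.4, 6.0, 5.9, 6.2]
y_sperm = [4.0, 4.3, 3.8, 4.1, 4.4, 3.9]
t_stat, dof = students_t(x_sperm, y_sperm)
print(f"t = {t_stat:.2f} with {dof} degrees of freedom")
```

The p value is then read from the t distribution with the returned degrees of freedom, which is the step a spreadsheet function performs internally.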

Results

RNA yield and quality

An average yield of 60 fg of RNA per sperm cell was obtained, and molecular screening confirmed that this RNA was free of gDNA and of contaminating RNA from somatic cells, leucocytes, and germ cells. DNA from sperm cells served as a positive control (Supplementary Fig. 1).
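As a quick unit check, the reported per-cell yield can be scaled to the 10 million cells used per biological replicate. The sketch below is arithmetic only; the resulting total is derived from the two reported figures and is not a measurement from the study.

```python
FG_PER_NG = 1_000_000  # 1 ng = 10^6 fg


def expected_rna_ng(cell_count, fg_per_cell=60):
    """Total RNA in nanograms implied by an average per-cell yield in femtograms."""
    return cell_count * fg_per_cell / FG_PER_NG


# 10 million cells per biological replicate, as stated in the RNA extraction section.
print(expected_rna_ng(10_000_000))
```

At 60 fg per cell, 10 million cells correspond to roughly 600 ng of total RNA per replicate.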

RNA-seq analysis

The total filtered sequence counts for each biological replicate in the three groups used for the study are shown in Supplementary Fig. 2. The average numbers of filtered reads obtained from unsorted, sex-sorted X- and sex-sorted Y-sperm were 53.39 ± 3.9, 47.75 ± 2.2, and 26.80 ± 2.6 million, respectively. The processed reads were aligned against the NCBI reference genome of Bos indicus (assembly Bos_indicus_1.0) via the ultrafast splice-aware aligner HISAT218. Details of the mapping read percentage per sample are given in Supplementary Table 2. The mapped reads were further considered for transcript assembly and quantification of transcript abundance via StringTie19. Transcript counts obtained from each sample were further used for analysis of the differential expression of transcripts between alternate conditions via DESeq220. The numbers of upregulated and downregulated transcripts found in the three alternate conditions are given in Table 1. The list of all the differentially expressed transcripts in Y-sperm vis-à-vis X-sperm is given in Supplementary Table 3, and the lists of differentially expressed transcripts in X- and Y-sperm of Bos indicus bulls compared with unsorted sperm are given in Supplementary Tables 4 and 5. The volcano plot generated from the DESeq2 dataset, with default log2 fold-change thresholds of − 2 and + 2 and an adjusted p value threshold of 0.05, is shown in Fig. 1. The top 50 differentially expressed sperm transcripts were plotted as a heatmap (Fig. 2). Principal component analysis (PCA) of differentially expressed genes was performed on biological replicates of each group to evaluate variance (Fig. 3). A Venn diagram of the unique gene expression data is shown in Fig. 4.
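A volcano plot places each transcript at (log2 fold change, −log10 of its p value). The sketch below computes those coordinates and flags points beyond the thresholds quoted above; the function names and example values are hypothetical, and the plotting itself is left out.

```python
import math


def threshold_height(alpha=0.05):
    """Height of the significance line on the -log10 axis for a given alpha."""
    return -math.log10(alpha)


def volcano_point(log2_fc, adj_p_value, lfc_cutoff=2.0, alpha=0.05):
    """Return (x, y, highlighted) for one transcript; adj_p_value must be > 0."""
    y = -math.log10(adj_p_value)
    highlighted = adj_p_value < alpha and abs(log2_fc) >= lfc_cutoff
    return log2_fc, y, highlighted


# Hypothetical transcripts: (log2 fold change, adjusted p value)
for lfc, padj in [(3.0, 0.001), (-2.4, 0.04), (0.5, 0.0001), (2.8, 0.3)]:
    print(volcano_point(lfc, padj))
print(f"significance line at y = {threshold_height():.3f}")
```

Points in the upper-left and upper-right corners are the ones that pass both thresholds; a transcript with a tiny p value but a small fold change sits high in the middle and is not highlighted.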

Table 1 The number of differential transcripts obtained via DESeq2 analysis for p and adjusted (adj.) p values less than 0.05 for each experimental condition.
Fig. 1

Visualization of the differential transcripts via a volcano plot. (a) Unsorted and X-sorted cattle sperm. (b) Unsorted and Y-sperm. (c) X- and Y-sperm. The X-axis represents the logarithmic fold change in expression (log2FC), whereas the Y-axis represents the negative decimal logarithm of the P values. The horizontal line marks the P value cut-off on the negative logarithmic scale (−log10 P = 2). The vertical lines mark the fold change cut-offs (|log2FC| > 2).

Fig. 2

Heatmap of the top 50 differentially expressed transcripts between (a) unsorted (C1, C2, C3) and X-sperm (T1A, T1B, T1C), (b) unsorted and Y-sperm (T2A, T2B, T2C), and (c) X- and Y-sperm.

Fig. 3

Principal component analysis (PCA) plots generated from DESeq2 showing variation within and between groups. (a) Unsorted (control) and X-sperm (treatment). (b) Unsorted (control) and Y-sperm (treatment). (c) X- (control) and Y-sperm (treatment).

Fig. 4

Venn diagram representing the sperm transcripts in (a) Unsorted (C) and X-sperm (T1), (b) Unsorted (C) and Y-sperm (T2), and (c) X- (T1) and Y-sperm (T2).

Transcript distribution in X- and Y-sperm

We found that 16% of the transcripts were unique to cattle X-sperm and 20.7% were unique to Y-sperm. The percentage of common transcripts between cattle X- and Y-sperm was 63.3% (Fig. 4c). A total of 175 transcripts were differentially expressed in cattle Y-sperm compared with X-sperm, of which nine were classified as noncoding (lncRNA) (five upregulated and four downregulated), 152 as protein-coding (67 upregulated and 85 downregulated), 10 as pseudogenes (four upregulated and six downregulated), and four as transcribed pseudogenes (one upregulated and three downregulated). Among the transcripts unique to Y-sperm, 53 were located on the Bos indicus Y chromosome, while among the transcripts unique to X-sperm, 200 were located on the Bos indicus X chromosome. The complete set of SRA data of sex-sorted and unsorted sperm of indicus cattle is available at NCBI (BioProject PRJNA976949).
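The category breakdown above can be cross-checked with a short tally. The counts and percentages are taken directly from the paragraph; the variable and function names are hypothetical, and the up/down totals are sums derived here rather than figures quoted in the text.

```python
# (upregulated, downregulated) counts in Y-sperm versus X-sperm, by biotype.
DE_TRANSCRIPTS = {
    "lncRNA": (5, 4),
    "protein-coding": (67, 85),
    "pseudogene": (4, 6),
    "transcribed pseudogene": (1, 3),
}

# Shares of transcripts in the X- versus Y-sperm Venn diagram (percent).
VENN_SHARES = {"unique to X-sperm": 16.0, "unique to Y-sperm": 20.7, "common": 63.3}


def tally(table):
    """Return (total upregulated, total downregulated, grand total)."""
    up = sum(u for u, _ in table.values())
    down = sum(d for _, d in table.values())
    return up, down, up + down


up_total, down_total, grand_total = tally(DE_TRANSCRIPTS)
print(up_total, down_total, grand_total)
print(round(sum(VENN_SHARES.values()), 1))
```

The four biotypes add up to the reported 175 differentially expressed transcripts, and the three Venn shares add up to 100%.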

Gene ontology (GO) analysis

The GO analysis of the upregulated genes revealed their involvement in various molecular functions (MFs), biological processes (BPs), cellular components (CCs), and KEGG pathways across different comparisons: for C versus T1, the genes were involved in 6 MFs, 3 BPs, 4 CCs, and 5 KEGG pathways (Fig. 5a); for C versus T2, they were involved in 8 MFs, 13 BPs, 11 CCs, and 15 KEGG pathways (Fig. 5b); and for T1 versus T2, they were involved in 7 MFs, 5 BPs, 4 CCs, and 5 KEGG pathways (Fig. 5c). Additionally, GO analysis of genes unique to X-sperm showed their involvement in 14 MFs, 11 BPs, and 6 CCs (Fig. 6a), while genes unique to Y-sperm were involved in 8 MFs, 10 BPs, and 4 CCs (Fig. 6b).

Fig. 5

Top Gene Ontology terms in different categories: (a) unsorted and X-sperm, (b) unsorted and Y-sperm, (c) X- and Y-sperm.

Fig. 6

Top Gene Ontology terms in different categories: (a) unique to X-sperm, and (b) unique to Y-sperm.

Pathway enrichment analysis

Pathway enrichment of the transcripts upregulated in cattle X-sperm compared with unsorted sperm indicated their involvement in the coronavirus disease (COVID-19) pathway (15 counts) and the ribosome pathway (11 counts). The transcripts upregulated in cattle Y-sperm compared with unsorted sperm were involved in the herpes simplex virus 1 infection pathway (88 counts) and the glycerophospholipid metabolism pathway (24 counts). The transcripts upregulated in cattle Y-sperm compared with X-sperm were involved in metabolic pathways (111 counts) and the regulation of actin cytoskeleton pathway (19 counts).

Pathway enrichment of the genes unique to cattle X-sperm revealed their involvement in pathways related to cancer (KEGG: 05200, 54 counts) and the MAPK signalling pathway (KEGG: 04010, 31 counts) (Fig. 7). The transcripts unique to Y-sperm were involved in purine metabolism (KEGG: 00230, 10 counts), the calcium signalling pathway (KEGG: 04020, 7 counts) and the cAMP signalling pathway (KEGG: 04024, 7 counts) (Fig. 8).
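Counts such as these are usually judged against a background by an over-representation test. The sketch below shows the hypergeometric tail probability that such tests are commonly built on; it is not the exact procedure of any tool used in the study. Only the count of 31 comes from the text, and the gene-list size, pathway size, and background size are hypothetical placeholders.

```python
from math import comb


def hypergeom_enrichment_p(hits, list_size, pathway_size, background_size):
    """P(X >= hits) when drawing list_size genes from a background that
    contains pathway_size pathway members (hypergeometric upper tail)."""
    upper = min(list_size, pathway_size)
    total = comb(background_size, list_size)
    tail = sum(
        comb(pathway_size, i) * comb(background_size - pathway_size, list_size - i)
        for i in range(hits, upper + 1)
    )
    return tail / total


# 31 MAPK-pathway genes were reported among the genes unique to X-sperm.
# The three sizes below are HYPOTHETICAL and serve only to make the call runnable.
p = hypergeom_enrichment_p(hits=31, list_size=1000, pathway_size=300, background_size=20000)
print(f"upper-tail probability under these assumed sizes: {p:.3g}")
```

The same count can be strongly enriched or unremarkable depending on the list and background sizes, which is why the choice of background genome matters for the result.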

Fig. 7

MAPK signalling pathway (KEGG: 04010, 31 counts; https://www.kegg.jp/pathway/map04010) with transcripts unique to X-sperm (ARAF, ARRB1, CACNG1, CACNG2, CRK, ELK1, FGF10, FGF13, FGF14, FGF2, FGFR1, FGFR3, IKBKG, IL1RAP, IRAK1, KIT, KRAS, MAP3K7, MAPK7, MAPK8IP3, MAPKAPK2, MECOM, MEF2C, PAK1, PDGFRA, PGF, PRKCB, RASA1, RPS6KA2, RPS6KA3, and VEGFC)57.

Fig. 8

cAMP signalling pathway (KEGG: 04024, 7 counts; https://www.kegg.jp/pathway/map04024) with transcripts unique to Y-sperm (ADCY10, CNGA1, FSHR, PDE4D, PLD1, PRKACB, and RAPGEF4)57.

Validation of differential gene expression in cattle X- and Y-sperm

The 22 differentially expressed genes (DEGs) were validated using qPCR. Ten upregulated and 12 downregulated genes in cattle Y-sperm compared with X-sperm were selected based on log2 fold changes from the DESeq2 analysis. The biological significance of the validated genes is presented in Table 2. The primers for the selected genes were designed with Primer-BLAST from NCBI (Supplementary Table 1) and validated using SYBR Green chemistry. The log2 fold change was calculated using the 2^−ΔΔCt method46. The means ± SEs of the fold changes in gene expression are shown in Fig. 9. The results revealed significant up- or downregulation of 21 of the 22 selected genes in bovine Y-sperm compared with X-sperm (p < 0.01, except for the NAPRT, VWC2 and MVP genes, p < 0.05).
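The 2^−ΔΔCt calculation can be sketched as follows, with GAPDH as the reference gene and X-sperm as the calibrator, as described in the text. The sketch assumes roughly 100% amplification efficiency, which the method itself presumes; the function names and Ct values are hypothetical.

```python
def delta_ct(ct_target, ct_reference):
    """Normalise the target gene Ct to the reference gene (GAPDH in the text)."""
    return ct_target - ct_reference


def delta_delta_ct(ct_target_test, ct_ref_test, ct_target_cal, ct_ref_cal):
    """Delta-Ct of the test group (Y-sperm) minus delta-Ct of the calibrator (X-sperm)."""
    return delta_ct(ct_target_test, ct_ref_test) - delta_ct(ct_target_cal, ct_ref_cal)


def fold_change(ct_target_test, ct_ref_test, ct_target_cal, ct_ref_cal):
    """Relative expression 2^(-ddCt); its log2 is simply -ddCt."""
    return 2.0 ** (-delta_delta_ct(ct_target_test, ct_ref_test, ct_target_cal, ct_ref_cal))


# Hypothetical Ct values: the target amplifies two cycles earlier in Y-sperm.
ddct = delta_delta_ct(24.0, 20.0, 26.0, 20.0)
print(ddct, fold_change(24.0, 20.0, 26.0, 20.0), -ddct)
```

A lower Ct means more template, so a negative ΔΔCt corresponds to upregulation in Y-sperm and a positive log2 fold change.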

Table 2 The biological significance of the differentially expressed genes validated through qPCR.
Fig. 9

Log2-fold change in the expression of selected (a) upregulated and (b) downregulated genes in bull Y-sperm (in comparison to X-sperm). *p < 0.05; **p < 0.01.

Discussion

The discernible difference between X chromosome-bearing sperm and Y chromosome-bearing sperm is the quantity of DNA in the sex chromosomes47. Since the X chromosome contains more DNA than the Y chromosome in mammals, there may be differences in the amount of resultant RNA. Previous work has demonstrated that transcripts or gene products are shared between X- and Y-sperm through intercellular bridges during spermatogenesis and may or may not be carried forward to mature sperm48. However, not all of these products are likely to be transferred across the intercellular bridge49. The current study reports differential and unique gene expression in X- and Y-sperm of bulls, providing original evidence of differences at the gene/transcript level between X- and Y-sperm. The protein-coding transcripts Cathepsin B, Histone H2A, Calmodulin and Glyceraldehyde-3-phosphate dehydrogenase, testis-specific were found to be unique to bovine X-sperm, whereas the protein-coding transcripts Elongation factor 1-alpha 1 and Chromodomain-helicase-DNA-binding protein 1 were upregulated in bovine Y-sperm. These findings agree with our proteomic studies12,50. Previously, a combination of suppression subtractive hybridization (SSH), cDNA microarray, and sequence-homology analysis identified 27 and four genes upregulated in bovine X- and Y-sperm (Bos taurus), respectively10. The abundance and diversity of small noncoding RNA (sncRNA) profiles were reported to vary between bull X- and Y-sperm51.

The GO analysis revealed that cellular component (60%) accounted for the major portion of the GO categories enriched with unique genes in X-sperm, while the unique genes in Y-sperm were enriched in molecular functions (67.3%). The same trend was observed for the genes whose expression was upregulated in Y-sperm compared with X-sperm. The top GO term for the genes unique to Y-sperm was G-protein coupled receptor activity (GO: 0004930, 24.5%), which is associated with MF. The G protein-coupled receptors (GPCRs) constitute the largest family of receptors with seven transmembrane domains and regulate various physiological processes. The known sperm-associated GPCRs are olfactory receptors. They are also involved in fertility and induce bovine sperm acrosome reactions52. Our findings suggest that certain olfactory receptors may be exclusive to cattle Y-sperm (Fig. 6b). These receptors may serve as targets for ligand-receptor interactions, which could be utilised for segregation of bovine Y-sperm.

The pathway analysis of genes unique to bovine Y-sperm revealed their involvement in signalling pathways (calcium signalling pathway (KEGG: 04020) and cAMP signalling pathway (KEGG: 04024)), which are important for sperm motility. Purine metabolism (KEGG: 00230, 10 counts), which has been shown to potentially influence bull fertility, also involves genes unique to Y-sperm53. cAMP signalling pathways play an essential role in the activation of sperm motility and in the induction of the vigorous asymmetrical movement (i.e., hyperactivated motility) necessary for fertilization. cAMP efflux through Multidrug resistance protein 4 (MRP4) regulates sperm motility in bull spermatozoa54,55.

Twenty-two DEGs were validated in the present study. The genes upregulated in Y-sperm are reported to be involved in energy metabolism (PKLR, D2HGDH), membrane transport (ATP13A2, TMEM143, TMEM168), and transcriptional regulation and RNA processing (HNRNPUL1, RUNX1T1, ELAVL4) (Table 2). This suggests that Y-sperm have a more robust metabolic rate, ion transport, or membrane dynamics than X-sperm. Upregulation of the above genes in Y-sperm may explain their higher velocity compared with X-sperm56. In contrast, genes downregulated in Y-sperm are associated with structural components (NUP214, EMILIN2, MINPP1, LRRC27, DZIP1L) and cellular stress response (NAPRT, MVP, CA14). This suggests that Y-sperm may be structurally less robust and have lower stress resilience, making them more vulnerable to unfavourable conditions58.

Conclusion

We deciphered the transcriptomes of X- and Y-chromosome-bearing sperm of indicus cattle (Bos indicus) and identified and validated the differential transcripts between them. Among the transcripts found unique to Y-sperm, 53 genes were located on the Bos indicus Y chromosome, while among the transcripts unique to X-sperm, 200 genes were located on the Bos indicus X chromosome. Gene Ontology (GO) analysis revealed that unique genes in cattle X-sperm are mostly associated with cellular components, whereas the unique Y-sperm genes are more involved in molecular functions. Pathway analysis revealed that unique cattle Y-sperm genes are involved in critical signalling pathways, such as calcium and cAMP signalling, which are vital for sperm motility and fertility. Our omics data will help biomarker discovery for sex sorting of bovine semen.