Abstract
Male infertility is fundamentally rooted in developmental defects of germ cells and associated molecular dysregulation, yet the underlying mechanisms remain poorly understood. Data-independent acquisition (DIA) has emerged as a powerful tool in discovery proteomics, enabling the identification of disease biomarkers, therapeutic targets, and molecular pathways with high precision and reproducibility. To analyse the aberrant regulatory events in human sperm proteins associated with male reproductive disorders in a high-throughput, reproducible, and reliable manner, we employed the Orbitrap Astral mass spectrometer, which independently operates Orbitrap Full Scan and Astral MS/MS to generate high-resolution full-scan spectra and high-quality secondary maps. Leveraging the principles of DIA technology, we constructed the most comprehensive human sperm proteomic expression profile reported to date, encompassing 9,309 proteins, 198,153 unique precursors, 154,062 modified peptides, and 145,355 peptides. Moreover, this data is derived from two types of sperm samples to facilitate the exploration of disease-specific proteins. Our data serve as a valuable resource for analysing spermatozoa protein content, facilitating deeper insights into male reproduction.
Similar content being viewed by others
Background & Summary
Male infertility, affecting ~7% of men globally, stems from germ cell defects and molecular dysregulation, yet its mechanisms remain unclear1. Infertility impacts 8–12% of couples worldwide and is predicted to become the 21st century’s third most prevalent disease2. In China, infertility rates rose from 11.9% (2007) to 15.5% (2010), with 25% of reproductive-age couples affected in 20173. Assisted reproductive technologies (ART), including IVF, have advanced since 1978, yet success rates remain suboptimal (30–40% pregnancy, 28.8% live birth)4,5. Infertility arises from female, male, combined, or unknown factors, influenced by genetics and environment.
Recent advances in omics technologies have revolutionized our understanding of reproductive biology. Multi-omics studies have elucidated the dynamic transcriptional and epigenetic landscapes of primordial germ cells (PGCs), gametes, and early embryos, identifying key genes and pathways critical for human reproduction6,7,8. According to a review by Amaral et al.9, a total of 6,198 unique proteins have been identified in human sperm to date, which are involved in diverse biological pathways, highlighting the potential of proteomics in advancing our understanding of male infertility9. Proteomics has emerged as a powerful tool for uncovering the molecular mechanisms behind spermatogenic defects and sperm dysfunction.
Data-independent acquisition (DIA) is a cutting-edge mass spectrometry (MS) technology revolutionizing label-free quantitative proteomics. By dividing the mass spectrometer’s scanning range into multiple windows, DIA systematically selects, fragments, and detects all precursor ions within each window, ensuring comprehensive and unbiased recording of fragment information10,11. This approach significantly reduces missing values compared to traditional methods, offering superior quantification accuracy, repeatability, and stability12,13,14. DIA-MS is particularly suited for large-scale studies, enabling reliable, reproducible results and the detection of low-abundance proteins, making it invaluable for biomarker discovery and mechanistic research.
In reproductive biology, proteomics has transformed human sperm research, providing molecular insights into sperm function, male infertility, and reproductive health. Sperm function relies on intricate protein networks and precise post-translational modifications (PTMs)15,16,17,18. While traditional studies focused on morphology and physiology, proteomics technologies like data-dependent acquisition (DDA) and DIA have enabled comprehensive analysis of the sperm proteome, uncovering key proteins and pathways related to motility, capacitation, and fertilization19,20. For instance, functional modules linked to oxidative stress, energy metabolism, and cytoskeletal remodeling have been identified, with their dysregulation closely associated with male infertility21. The integration of Astral-DIA proteomics has further advanced male fertility research by enabling high-resolution, high-throughput analysis of the sperm proteome. This technology identifies critical proteins and pathways, offering profound insights into male infertility mechanisms. Its exceptional accuracy and reproducibility support biomarker discovery and targeted therapy development, bridging basic research and clinical applications. Astral-DIA thus represents a transformative tool for advancing male reproductive health.
Male infertility is initially screened through traditional semen analysis22, but LC-MS/MS-based proteomics offers deeper molecular insights into sperm proteins critical for fertilization and embryonic development23. Emerging evidence shows that external factors like cryopreservation, ejaculatory abstinence, reactive oxygen species (ROS), and inflammation induce proteomic changes, significantly impacting semen quality24,25,26. These findings highlight proteomics’ potential to uncover asthenozoospermia mechanisms and identify novel biomarkers and therapeutic targets for male infertility.
Despite advances in reproductive medicine, the molecular mechanisms of asthenozoospermia, characterized by reduced sperm motility, remain poorly understood27,28,29,30. This gap hinders effective diagnostic and therapeutic development. Recent breakthroughs, such as the integration of data-independent acquisition (DIA) and high-resolution mass spectrometry (e.g., Orbitrap Astral), enable large-scale, precise protein identification and quantification. These tools provide comprehensive proteome coverage with exceptional sensitivity and reproducibility, offering unprecedented opportunities to explore the sperm proteome and advance reproductive medicine.
To characterize sperm-related proteins as comprehensively as possible, we analyzed 47 samples representing two groups in this study, while systematically evaluating two protein extraction methods. This study employed a Vanquish Neo UHPLC system coupled with an Orbitrap Astral mass spectrometer for liquid chromatography-tandem mass spectrometry (LC-MS/MS) analysis using DIA. High-resolution full-scan spectra and high-quality MS/MS spectra were generated, resulting in the most comprehensive human sperm proteome expression profile to date. The instrumentation delivers high-resolution and mass-accurate measurements, enabling comprehensive profiling of the sperm proteome. Final data analysis was performed using Spectronaut™ software (version 18.7) for both sample cohorts. This study establishes a standardized proteomic workflow—encompassing sample selection, peptide fractionation, and DIA quantification—that generates a valuable resource for mass spectrometry-based sperm protein profiling, with generalizability for future reproductive research applications.
Methods
Study design
A total of 47 human sperm samples were collected, including 24 from normal controls and 23 from patients with asthenozoospermia. Total protein lysates were extracted from sperm samples using T-PER and urea-assisted (UA) ultrasound methods, followed by tryptic digestion into peptides. Basic reversed-phase chromatography was employed to fractionate the digested peptides, generating 47 raw mass spectrometry data files (Fig. 1).
Sample preparation and proteomic analysis workflow. Semen samples from asthenozoospermic (n = 7 for T-PER, n = 16 for UA) and normozoospermic (n = 7 for T-PER, n = 17 for UA) individuals were lysed using T-PER (n = 14) or UA (n = 33) buffers. Proteins were digested with trypsin and processed via FASP (Filter-Aided Sample Preparation). Peptides were fractionated using a Vanquish Neo UHPLC system with a high-pH reversed-phase chromatograph (23-minute gradient). DIA-MS was performed on an Orbitrap Astral mass spectrometer, and data were analyzed using Spectronaut.
To comprehensively analyze the proteomic profiles of normal and asthenozoospermic sperm, DIA strategy was implemented. Liquid chromatography-tandem mass spectrometry (LC-MS/MS) analysis was performed using the Orbitrap Astral mass spectrometer in DIA mode31,32,33. Raw mass spectrometry files were processed and searched against a protein database using SpectronautTM software, resulting in the identification of 145,355 unique peptides and 9,309 corresponding proteins34,35. Functional annotation and pathway analysis were conducted using Gene Ontology (GO) and single-sample gene set enrichment analysis (ssGSEA) to elucidate the biological processes and molecular pathways associated with sperm quality and function.
Sample collection
The ethical approval for this study was provided by the Reproductive Study Ethics Committee of Peking University Third Hospital (2017SZ-035). All donors provided written informed consent for this study, which included agreement to provide their own anonymous information and semen samples for this research and give specific consent for sharing and publishing the data in public databases. A total of 47 semen samples were obtained from patients at the Reproductive Center of Peking University Third Hospital (Supplementary Table 1). Routine semen analysis was performed using computer-assisted sperm analysis (CASA) (HTM-IVOS, USA) in accordance with the World Health Organization (WHO, 2021) guidelines35,36. Following complete liquefaction, we evaluated: (1) total motility (%), (2) progressive motility (PR, %), (3) curvilinear velocity (VSL, µm/s), and (4) average path velocity (VAP, µm/s). For each sample, a minimum of 500 spermatozoa were analyzed across five randomized microscopic fields. According to WHO (2021) criteria, participants were stratified into two groups: normozoospermic (Normal, n = 24; total motility >40%, PR >32%) and asthenozoospermic (Astheno, n = 23; total motility ≤40%, PR ≤32%). Patients with conditions such as varicocele, leukocytospermia, or oligozoospermia were excluded from the study to ensure the homogeneity of the sample population and the reliability of the results. All participants were screened for coexisting confounding conditions, including genitourinary infections, alcohol abuse, and drug use. Semen samples were collected via masturbation after 3–5 days of sexual abstinence and deposited into sterile containers.
The semen samples were divided into two groups: a normal sperm group and an asthenozoospermia group. A total of 24 qualified healthy semen samples and 23 asthenozoospermia samples were collected. After complete liquefaction at 37 °C, the samples were washed three times with phosphate-buffered saline (PBS) and subsequently used for proteomic analysis. This standardized protocol ensured the integrity and consistency of the samples for downstream proteomic profiling.
Protein extraction and digestion
Fresh semen samples were incubated at 37 °C for 30 minutes in a cell culture incubator to allow liquefaction. The samples were then centrifuged at 1,000 × g for 10 minutes to remove seminal plasma. The resulting sperm pellet was washed three times with ice-cold phosphate-buffered saline (PBS). Subsequently, 500 μL of tissue protein extraction reagent (T-PER, Thermo Scientific, USA) or UA lysis buffer (8 M urea, 100 mM Tris-HCl, pH 8.0) was added to each sample to resuspend the sperm cells. The resuspended cells were subjected to sonication on ice (1 second on and 2 second off, 99 cycles, power 200 Watts) for protein extraction. After 10 minutes of lysis on ice, the lysate was centrifuged at 14,000 × g for 15 minutes, and the supernatant was collected. Protein concentration in the lysate was quantified using the Bradford assay.
Filter-aided sample preparation (FASP) and enzymatic digestion
For protein digestion, the sperm protein lysate was processed using the filter-aided sample preparation (FASP) method. The lysate was loaded into a 30 kDa molecular weight cut-off (MWCO) ultrafiltration device. To each ultrafiltration unit, 200 μL of UA solution (8 M urea in 50 mM Tris-HCl, pH 8.0) and 20 mM dithiothreitol (DTT) were added, and the samples were incubated at 37 °C for 4 hours to facilitate denaturation and reduction37,38. After centrifugation at 14,000 × g for 15 minutes, 100 μL of UA solution containing 50 mM iodoacetamide was added to each ultrafiltration unit for alkylation. The samples were incubated in the dark at room temperature for 30 minutes. Following alkylation, the samples were centrifuged at 14,000 × g for 15 minutes to remove excess reagents. Subsequently, 180 μL of 50 mM ammonium bicarbonate (NH4HCO3) was added to each ultrafiltration unit, and the proteins were digested with trypsin at a 50:1 protein-to-enzyme ratio. The digestion was carried out at 37 °C for 16 hours. The resulting peptides were collected by centrifugation, and peptide concentration was determined using a NanoDrop 2000C spectrophotometer at 280 nm absorbance, following established protocols (Jiang et al., Nature 2019)38.
Spectra acquisition by DIA
The liquid chromatography-tandem mass spectrometry (LC-MS/MS) system used for DIA analysis consisted of a Vanquish Neo UHPLC system (Thermo Fisher Scientific) coupled to an Orbitrap Astral mass spectrometer (Thermo Fisher Scientific)39. This high-performance system enables high-resolution separation and sensitive detection, making it well-suited for comprehensive proteomic profiling and quantitative analysis. Reverse-phase liquid chromatography was performed using mobile phase A (0.1% formic acid in water) and mobile phase B (0.1% formic acid in 80% acetonitrile). Peptides were separated at a flow rate of 1.3 µL/min with a 23-minute linear gradient programmed as follows: 0–0.5 min, 4% to 8% buffer B; 0.6–0.9 min, 8% to 8.5% buffer B; 0.9–13.9 min, 8.5% to 22.5% buffer B; 13.9–20.8 min, 22.5% to 35% buffer B; 20.8–21.2 min, 35% to 55% buffer B; 21.2–21.7 min, 35% to 55% buffer B; and 21.7–22.6 min, 99% buffer B. This gradient ensured optimal peptide separation and elution for subsequent high-resolution mass spectrometric analysis.
The Orbitrap Astral mass spectrometer was configured to perform DIA MS/MS scans within a mass range of 150 to 2000 m/z. The isolation window was set to 2 m/z, and the method included a total of 299 scan events. Higher-energy collisional dissociation (HCD) was employed with a collision energy of 25%. The precursor mass range was defined from 380 to 980 m/z. The injection time was fixed at 3 ms, and the normalized automatic gain control (AGC) target was set to 500%. Each scan was acquired with a single microscan. The maximum injection time mode was customized, and the DIA window type was set to automatic. The DIA window mode was configured to operate within the specified m/z range. Loop control was set to time, with a cycle time of 0.6 seconds. This configuration ensured high-resolution and high-accuracy mass spectral data acquisition for comprehensive proteomic analysis.
Database searching for DIA raw files
For data analysis, all DIA raw files were processed using Spectronaut™ software (version 18.7) to perform database searching and quantitative analysis of sperm proteins. The DIA raw data files were imported directly with default settings, and trypsin/P was selected as the digestion enzyme. The false discovery rate (FDR) for both proteins and peptides were set to <0.01. Quantification was performed using MS2-level peak areas. The maximum intensity method was applied for extracting MS1 and MS2 intensities, and the mass tolerance correction factors for both MS1 and MS2 were set to 1. Protein typing sequences and MS2-level peak areas were used for protein quantification, with the reference database set to the human UniProt database (updated at 2024-03-28 containing 20,435 protein groups). The maximum number of missed cleavages was set to 2, and post-translational modifications such as methionine oxidation, N-terminal acetylation, and carbamidomethylation of cysteine were included. The peptide length range was set to 7–25 amino acids, precursor charge states were limited to 1–4, precursor m/z range was set to 400–1200, and fragment ion m/z range was set to 200–2000. The output included protein identification and quantification results.
Data processing
For data processing and statistical analysis, the proteomic data were normalized using the quantile normalization method implemented in the limma package (v.3.24.15) within the R/Bioconductor framework40. The normalized data were used to calculate the coefficient of variation (CV) for each protein, defined as the standard deviation (SD) divided by the mean. Missing values were imputed using methods such as minimum value imputation. Following normalization, the intensity values were log2-transformed for subsequent analyses. Missing values, which could not be processed by certain algorithms, were imputed using the minimum value observed in the proteomic dataset. For pairwise comparisons, differentially expressed proteins were identified using a student’s t-test combined with fold change (FC) analysis. Proteins with a p-value < 0.05 and an absolute log2FC ≥ 0.4 (corresponding to a 1.32-fold change) were considered statistically significant.
Enrichment analysis
Bioinformatics analyses were conducted using R software (version 4.3.1). To ensure compatibility across different analytical methods and tools, the identified protein data were annotated and organized using official gene symbols approved by the HUGO Gene Nomenclature Committee (HGNC) and UniProtKB accession numbers as protein identifiers41. Pathway enrichment analysis of differentially expressed proteins was performed using gene sets obtained from the Molecular Signatures Database (MSigDB, version 7.4), which includes 50 hallmark gene sets, 186 Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways, and 7,481 Gene Ontology (GO) biological process terms. The clusterProfiler R package was utilized for GO and KEGG pathway analysis, with the enricher function implemented for enrichment testing42,43. Additional R packages employed in the analysis included topGO, Rgraphviz, pathview, org.Hs.eg.db, ggplot2, and ggrepel. Statistical significance was assessed using Fisher’s exact test, and multiple testing corrections were applied using the false discovery rate (FDR) method. GO terms were categorized into three domains: biological process (BP), molecular function (MF), and cellular component (CC). Gene Set Enrichment Analysis (GSEA) was performed using the GSEABase package in R, with the GSEA function implemented to evaluate enrichment patterns.
Data Records
The mass spectrometry proteomics data generated in this study, including instrument raw files, raw proteomic expression profiles of human spermatozoa, all search results of Spectronaut™ (.sne files), TSV-formatted search results, and original DIA files for identifying potential pathological targets in asthenozoospermia, have been deposited to the ProteomeXchange Consortium (https://proteomecentral.proteomexchange.org) via the iProX partner repository44 with the dataset identifier PXD06169845 and PXD06172546. DIA data analysed in Spectronaut (xlsx) is available from ProteomeXchange Consortium via the PRIDE partner repository with the dataset identifier PXD06660047.
Technical Validation
Proteomic profiling of human spermatozoa
To investigate the proteomic differences between normal and asthenozoospermic sperm samples, we collected semen samples from two distinct groups: normozoospermic individuals and those diagnosed with asthenozoospermia according to the World Health Organization (WHO)35,36 6th edition criteria. To assess the reliability of the proteomic data, we examined the raw MS/MS spectra and found an average of 12 matched peptides per protein, demonstrating high confidence in protein identification at both the protein and peptide levels (Fig. 2a). Compared to recent studies reported in Reproduction (Taylor Pini. et al.) which from five studies and generated a human sperm proteome of 9,296 proteins and Cells (Mengqi Luo et al.) which coverage of 5685 protein groups via DIA-MS/MS, our proteomic data uniquely identified 3,981 additional protein groups (Fig. 2b)48,49. In this study, protein extraction using T-PER lysis buffer combined with ultrasonication yielded a total of 92,467 peptides and 7,389 protein groups, with an average identification of 71,370 peptides and 6,730 proteins per sample. In contrast, extraction with UA lysis buffer and ultrasonication identified 145,355 peptides and 9,309 protein groups, with an average of 108,270 peptides and 8,367 proteins per sample (Supplementary Table 2). Overall, the UA lysis buffer extracted approximately 30% more proteins than the T-PER lysis buffer, with an average increase of nearly 20% in the number of protein species identified per sample (Fig. 2c,d). Heatmap showed the Pearson correlation coefficient plot of proteins identified in the two sample groups using T-PER lysis buffer (Fig. 2e).
Proteomic profiling and PCA. (a) The distribution of numbers of matched peptides for each protein. (b) Sperm protein groups identified currently compared with the previous studies. (c,d) Bar plots showing the total number of quantified peptides and proteins per group, with dots representing the quantification for each individual sample. (e) Pearson correlation coefficient plot of proteins identified in the two sample groups using T-PER lysis buffer. (f) Comparison of the number of proteins identified by two lysis buffers. (g) Venn diagram showing the number of proteins identified by the two extraction methods. (h) Hhclust analysis summary for the two sample groups.
We subsequently compared the efficacy of two distinct protein extraction methods in terms of the variety of proteins extracted from sperm samples (Fig. 2f). Specifically, the overlap between proteins identified by UA and T-PER was 7,272, while UA uniquely identified 2,037 additional proteins, as depicted in Fig. 2g. Visualization of the interaction correlation coefficients between the two groups showed that samples from different pathological states were largely divided into two clusters (Fig. 2h). These findings collectively demonstrate that the choice of extraction method significantly influences both the diversity and quantity of proteins obtained, with UA-based sonication yielding a markedly higher number of protein species compared to T-PER. Both extraction methods demonstrated comparable efficacy in isolating the majority of human sperm proteome components, yet exhibited significant divergence in both proreform selectivity and quantitative recovery profiles. These results underscore the robustness and depth of our proteomic profiling, further validating the methodological advancements and their application in male reproductive biology research.
Proteomic characteristics of human spermatozoa
To quantitatively analyze the proteomics data, protein intensities were extracted from Spectronaut output files for 47 samples, representing the final expression levels of specific proteins in each sample, thereby generating a protein expression matrix for the sperm samples. The expression matrix was normalized using the quantile normalization function implemented in the R/Bioconductor package limma, a widely adopted and optimal method for proteomics data analysis. Following normalization, protein expression intensities were log2-transformed and used for all subsequent quantitative analyses (Fig. 3a,b) (Supplementary Table 3, 4). For algorithms unable to handle missing values, these were imputed using the minimum value observed in the proteomics dataset. Differential expression analysis was performed separately for proteins identified using T-PER and UA lysis buffers. As shown in Fig. 3, 569 and 1,493 differentially expressed proteins (DEPs) were identified in T-PER and UA lysis buffers, respectively, with statistically significant differences in expression levels (log2 FC ≥ 0.4, p < 0.05) (Fig. 3c) (Supplementary Table 5). Notably, the UA lysis buffer detected a greater number of DEPs with higher sequence coverage compared to T-PER, highlighting its superior performance in protein identification.
Protein abundance and differential analysis. (a,b) Distribution of log2-transformed protein abundance in proteomic samples before and after normalization. (c) Number of differentially expressed proteins between the two groups. (d) PCA results of samples based on the CASA system, with each group color-coded for distinction. (e) PCA of protein expression matrices revealing distinct clustering between normozoospermic (green) and asthenozoospermic (red) groups. (f) Pearson correlation coefficient plot of proteins between the two sample groups.
Principal component analysis and K-means clustering segregated semen samples into two distinct groups (Fig. 3d). Subsequent proteomic analysis using PCA and Spearman-Euclidean algorithms demonstrated complete separation between normal and asthenozoospermic samples based on protein expression profiles (Fig. 3e). Correlation analysis based on the proteomics-identified proteins under the two pathological states revealed that data from samples within the same group clustered together, indicating stable clustering of DEPs between the two pathological conditions (Fig. 3f). The identified DEPs exhibited consistent clustering patterns across analytical approaches.
Retained germ cell features in asthenozoospermic sperm
Comparative proteomic analysis of normozoospermic and asthenozoospermic spermatozoa using a spectral library-based DIA approach (SpectronautTM default parameters) identified 1,493 differentially expressed proteins (Fig. 4a), while revealing conserved germ cell-derived reproductive signatures in the pathological state. Comprehensive proteomic characterization of asthenozoospermic spermatozoa through integrated KEGG and GO analyses (Fig. 4b–f) identified distinct molecular signatures while revealing preservation of core germ cell features. We found that the proteins expressed in normal sperm cells were enriched in pathways supporting cytoskeletal dynamics, signal transduction, and protein modification, which are essential for structural integrity and motility. In contrast, asthenozoospermic sperm proteins were primarily associated with processes related to germ cell biology, such as intraflagellar transport, RNA splicing, and microtubule-based transport. Additionally, pathways related to DNA replication, cell cycle regulation, and spliceosomal activity are prominently activated, suggesting incomplete maturation or aberrant retention of germ cell-like transcriptional and metabolic activity. These findings indicate that asthenozoospermia may involve a failure to fully transition from a germ cell-like state to a functionally mature spermatozoon, resulting in impaired motility and fertilization potential. GSEA profiling (Fig. 4c,d) confirmed retention of fundamental reproductive characteristics through conserved spermatogenesis gene sets, while simultaneously showing pathological alterations in inflammatory response pathways and IL2-STAT5 signaling networks (Supplementary Table 6). The proteomic landscape exhibited coordinated changes in cell cycle regulation and immune modulation, with enrichment of G2/M phase markers and inflammatory mediators alongside suppression of lymphocyte activation pathways. These findings collectively demonstrate that while asthenozoospermia involves widespread pathway dysregulation, critical germ cell-derived reproductive signatures remain preserved within the pathological sperm proteome.
Pathway and functional enrichment analysis of differentially expressed proteins. (a) Volcano plot of differentially expressed proteins. (b) KEGG analysis of differentially expressed proteins. (c) Immunologic signature gene set analysis. (d) Gene set analysis using the hallmark gene sets. The dot describes the enrichment (red) or depletion (green) of each hallmark gene set. (e,f) GO analysis of down-regulated or up-regulated proteins.
Application of DIA-based proteomic profiling in human spermatozoa research
To verify the proteomic expression profile of sperm in male germ cells, we processed a total of 47 samples from two physiological states, each sample was injected into the Orbitrap Astral mass spectrometer, generating 47 raw mass spectrometry data files. In total, we identified 10,581 proteins,19,8153 unique precursors, 154,062 modified peptides, and 145,355 peptides. This study represents the most extensive profiling of the male sperm proteome to date, reliably detecting and quantifying 45.60% of all reviewed human proteins in UniProtKB, establishing the most comprehensive sperm proteomic expression profile currently available. The significant expansion in proteome coverage can be attributed to the exceptional sensitivity and resolution of the Orbitrap Astral mass spectrometer, coupled with substantial improvements in sample preparation and data analysis pipelines.
Analysis of Hallmark gene sets based on GSEA scores revealed that differentially expressed genes between the two physiological states were predominantly enriched in the HALLMARK_SPERMATOGENESIS, HALLMARK_E2F_TARGETS, and HALLMARK_IL2_STAT5_SIGNALING pathways (Fig. 5a–c). We further analyzed proteins associated with asthenozoospermia-driven pathophysiological changes and observed downregulation of ATRN and TNNC1, along with upregulation of PIAS2 (Fig. 5d–f)40,41,42,43,44,45,46,47,48,49,50,51. These molecular changes were observed in association with immune regulation, angiogenesis, and spermatogenesis-related processes52,53,54,55. The sperm protein expression profile revealed coordinated patterns involving multiple biological systems. Specific protein clusters showed distinct expression characteristics between experimental groups, while maintaining associations with fundamental reproductive functions. This heat map reveals significant changes in protein expression, which is crucial for understanding the proteome of asthenospermia and improving disease identification, stratification, treatment and management strategies.
GSEA results and biomarker identification in asthenozoospermia. (a,c) GSEA results in the two datasets, highlighting enriched pathways and biological processes. (d–f) Expression of potential biomarkers in normozoospermic and asthenozoospermic samples. Box plots show the median (central bar), interquartile range (box), and whiskers extending to 1.5 times the interquartile range. (g) Heatmap of Signature Proteins: Relative abundance of signature proteins in asthenozoospermia. Biological functions associated with these proteins are indicated by the colored panel on the right.
Code availability
No custom computer codes were generated in this work.
References
Krausz, C. & Riera-Escamilla, A. Genetics of male infertility. Nature Reviews Urology 15(6), 369–384, https://doi.org/10.1038/s41585-018-0003-3 (2018).
Agarwal, A., Mulgund, A., Hamada, A. & Chyatte, M. R. A unique view on male infertility around the globe. Reproductive Biology and Endocrinology 13(1), 37, https://doi.org/10.1186/s12958-015-0032-1 (2015).
Sun, H. et al. Global, regional, and national prevalence and disability-adjusted life-years for infertility in 195 countries and territories, 1990–2017: Results from a global burden of disease study, 2017. Aging 11(23), 10952–10991, https://doi.org/10.18632/aging.102497 (2019).
Wang, J., Sauer, M. V. & Chen, Z. In vitro fertilization (IVF): A review of 3 decades of clinical innovation and technological advancement. Therapeutics and Clinical Risk Management 2(4), 355–364, https://doi.org/10.2147/tcrm.2006.2.4.355 (2006).
Zegers-Hochschild, F. et al. The International Glossary on Infertility and Fertility Care, 2017. H. uman Reproduction 32(9), 1786–1801, https://doi.org/10.1093/humrep/dex234 (2017).
Li, L. et al. Single-cell multi-omics sequencing of human early embryos. Nature Cell Biology 20, 847–858, https://doi.org/10.1038/s41556-018-0123-2 (2018).
Li, L., Yang, R., Yin, C. & Kee, K. Studying human reproductive biology through single-cell analysis and in vitro differentiation of stem cells into germ cell-like cells. Human reproduction update 26(5), 670–688, https://doi.org/10.1093/humupd/dmaa021 (2020).
Tan, K. & Wilkinson, M. F. A single-cell view of spermatogonial stem cells. Curr Opin Cell Biol. 67, 71–78, https://doi.org/10.1016/j.ceb.2020.07.005 (2020).
Amaral, A., Castillo, J., Ramalho-Santos, J. & Oliva, R. The combined human sperm proteome: cellular pathways and implications for basic and clinical science. Human reproduction update 20(1), 40–62, https://doi.org/10.1093/humupd/dmt046 (2014).
Li, J., Smith, L. S. & Zhu, H. J. Data-independent acquisition (DIA): An emerging proteomics technology for analysis of drugmetabolizing enzymes and transporters. Drug Discov Today Technol 39, 49–56, https://doi.org/10.1016/j.ddtec.2021.06.006 (2021).
Kitata, R. B., Yang, J. C. & Chen, Y. J. Advances in data-independent acquisition mass spectrometry towards comprehensive digital proteome landscape. Mass Spectrom Rev 42, 2324–2348, https://doi.org/10.1002/mas.21781 (2023).
Ludwig, C. et al. Data-independent acquisition-based SWATH-MS for quantitative proteomics: a tutorial. Molecular systems biology 14(8), e8126, https://doi.org/10.15252/msb.20178126 (2018).
Gillet, L. C. et al. Targeted data extraction of the MS/MS spectra generated by data-independent acquisition: A new concept for consistent and accurate proteome analysis. Molecular & Cellular Proteomics 11(6), O111.016717, https://doi.org/10.1074/mcp.O111.016717 (2012).
Kong, R., Qian, X. & Ying, W. Pancreatic cancer cells spectral library by DIA-MS and the phenotype analysis of gemcitabine sensitivity. Sci Data 9, 283, https://doi.org/10.1038/s41597-022-01407-1 (2022).
Martin-Hidalgo, D. et al. Human sperm phosphoproteome reveals differential phosphoprotein signatures that regulate human sperm motility. Journal of proteomics 215, 103654, https://doi.org/10.1016/j.jprot.2020.103654 (2020).
Urizar-Arenaza, I. et al. Phosphoproteomic and Functional Analyses Reveal Sperm-specific Protein Changes Downstream of Kappa Opioid Receptor in Human Spermatozoa. Molecular & cellular proteomics: MCP 18(Suppl 1), S118–S131, https://doi.org/10.1074/mcp.RA118.001133 (2019).
Shahat, A. M., Rizzoto, G. & Kastelic, J. P. Amelioration of heat stress-induced damage to testes and sperm quality. Theriogenology 158, 84–96, https://doi.org/10.1016/j.theriogenology.2020.08.034 (2020).
Xin, M. et al. Precision Glycoproteomics Reveals Distinctive N-Glycosylation in Human Spermatozoa. Molecular & cellular proteomics: MCP 21(4), 100214, https://doi.org/10.1016/j.mcpro.2022.100214 (2022).
Swain, N. et al. Aberrant Upregulation of Compensatory Redox Molecular Machines May Contribute to Sperm Dysfunction in Infertile Men with Unilateral Varicocele: A Proteomic Insight. Antioxidants & redox signaling 32(8), 504–521, https://doi.org/10.1089/ars.2019.7828 (2020).
Nascimento Gomes, S. et al. Imbalanced testicular metabolism induced by thyroid disorders: New evidences from quantitative proteome. Endocrine 67(1), 209–223, https://doi.org/10.1007/s12020-019-01989-8 (2020).
Agarwal, A. et al. Role of oxidative stress, infection and inflammation in male infertility. Andrologia 50(11), e13126, https://doi.org/10.1111/and.13126 (2018).
Carson, S. A. & Kallen, A. N. Diagnosis and Management of Infertility: A Review. JAMA 326(1), 65–76, https://doi.org/10.1001/jama.2021.4788 (2021).
Panner Selvam, M. K. & Agarwal, A. Update on the proteomics of male infertility: A systematic review. Arab journal of urology 16(1), 103–112, https://doi.org/10.1016/j.aju.2017.11.016 (2017).
Shen, Z. Q. et al. Characterization of the Sperm Proteome and Reproductive Outcomes with in Vitro, Fertilization after a Reduction in Male Ejaculatory Abstinence Period. Molecular & cellular proteomics: MCP 18(Suppl 1), S109–S117, https://doi.org/10.1074/mcp.RA117.000541 (2019).
Zhang, X. et al. Both protein and non-protein components in extracellular vesicles of human seminal plasma improve human sperm function via CatSper-mediated calcium signaling. Human reproduction (Oxford, England) 39(4), 658–673, https://doi.org/10.1093/humrep/deae018 (2024).
Manfrevola, F. et al. CRISP2, CATSPER1 and PATE1 Expression in Human Asthenozoospermic Semen. Cells 10(8), 1956, https://doi.org/10.3390/cells10081956 (2021).
Cao, X. et al. Proteomic profile of human spermatozoa in healthy and asthenozoospermic individuals. Reproductive biology and endocrinology: RB&E 16(1), 16, https://doi.org/10.1186/s12958-018-0334-1 (2018).
Murdica, V. et al. Proteomic analysis reveals the negative modulator of sperm function glycodelin as over-represented in semen exosomes isolated from asthenozoospermic patients. Human reproduction (Oxford, England) 34(8), 1416–1427, https://doi.org/10.1093/humrep/dez114 (2019).
Wu, Y. et al. Quantitative Proteomic Analysis of Human Seminal Plasma from Normozoospermic and Asthenozoospermic Individuals. BioMed research international. 2735038. https://doi.org/10.1155/2019/2735038 (2019).
Kuster, B., Tüshaus, J. & Bayer, F. P. A new mass analyzer shakes up the proteomics field. Nature biotechnology 42(12), 1796–1797, https://doi.org/10.1038/s41587-024-02129-y (2024).
Heil, L. R. et al. Evaluating the Performance of the Astral Mass Analyzer for Quantitative Proteomics Using Data Independent Acquisition. Journal of proteome research, 22(10), 3290-3300, https://doi.org/10.1101/2023.06.03.543570 (2022).
Guzman, U. H. et al. Ultra-fast label-free quantification and comprehensive proteome coverage with narrow-window data-independent acquisition. Nature biotechnology 42(12), 1855–1866, https://doi.org/10.1038/s41587-023-02099-7 (2024).
Baker, C. P. et al. Optimizing Spectronaut Search Parameters to Improve Data Quality with Minimal Proteome Coverage Reductions in DIA Analyses of Heterogeneous Samples. Journal of proteome research 23(6), 1926–1936, https://doi.org/10.1021/acs.jproteome.3c00671 (2024).
Lou, R. & Shui, W. Acquisition and Analysis of DIA-Based Proteomic Data: A Comprehensive Survey in 2023. Molecular & cellular proteomics: MCP 23(2), 100712, https://doi.org/10.1016/j.mcpro.2024.100712 (2024).
Finelli, R. et al. The validity and reliability of computer-aided semen analyzers in performing semen analysis: a systematic review. Translational andrology and urology 10(7), 3069–3079, https://doi.org/10.21037/tau-21-276 (2021).
Boitrelle, F. et al. The sixth edition of the WHO manual for human semen analysis: A critical review and SWOT analysis. Life (Basel) 11(12):1368. Published 2021 Dec 9. https://doi.org/10.3390/life11121368 (2021).
Wiśniewski, J. R., Zougman, A., Nagaraj, N. & Mann, M. Universal sample preparation method for proteome analysis. Nature methods 6(5), 359–362, https://doi.org/10.1038/nmeth.1322 (2009).
Jiang, Y. et al. Proteomics identifies new therapeutic targets of early-stage hepatocellular carcinoma. Nature. 2019;567(7747):257–261. https://doi.org/10.1038/s41586-019-0987-8 (2019).
Stewart, H. I. et al. Parallelized Acquisition of Orbitrap and Astral Analyzers Enables High-Throughput Quantitative Analysis. Analytical chemistry 95(42), 15656–15664, https://doi.org/10.1021/acs.analchem.3c02856 (2023).
Ritchie, M. E. et al. limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic acids research 43(7), e47, https://doi.org/10.1093/nar/gkv007 (2015).
Boutet, E. et al. UniProtKB/Swiss-Prot. Methods in molecular biology (Clifton, N.J.), 406, 89–112, https://doi.org/10.1007/978-1-59745-535-0_4 (2007).
Kanehisa, M. et al. KEGG: new perspectives on genomes, pathways, diseases and drugs. Nucleic acids research 45(D1), D353–D361, https://doi.org/10.1093/nar/gkw1092 (2017).
Subramanian, A. et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proceedings of the National Academy of Sciences of the United States of America 102(43), 15545–15550, https://doi.org/10.1073/pnas.0506580102 (2005).
Perez-Riverol, Y. et al. The PRIDE database at 20 years: 2025 update. Nucleic acids research 53(D1), D543–D553, https://doi.org/10.1093/nar/gkae1011 (2025).
Kong, R., Yan, K. & Liu, D. Astral-DIA Proteomics Deciphers the Molecular Landscape of Human Spermatozoa: A Comprehensive Quantitative Proteomic Characterization of Sperm Function and Fertility Potential. https://identifiers.org/pride.project:PXD061698 (2025).
Kong, R., Yan, K. & Liu, D. Astral-DIA Proteomics Deciphers the Molecular Landscape of Human Spermatozoa: A Comprehensive Quantitative Proteomic Characterization of Sperm Function and Fertility Potential. https://identifiers.org/pride.project:PXD061725 (2025).
Kong, R., Yan, K., Liu, D. & Wu, W. A Comprehensive Mass Spectrometry-Based Proteomic Dataset of Human Spermatozoa. https://identifiers.org/pride.project:PXD066600 (2025).
Luo, M. et al. Proteomic Landscape of Human Spermatozoa: Optimized Extraction Method and Application. Cells 11(24), 4064, https://doi.org/10.3390/cells11244064 (2022).
Pini, T. et al. Towards a kingdom of reproductive life - the core sperm proteome. Reproduction. 2025;169(6): e250105. https://doi.org/10.1530/REP-25-0105 (2025).
Cui, C. et al. Quantitative proteomic and phosphoproteomic analysis of chicken skeletal muscle during embryonic development. Animal biotechnology 34(2), 122–133, https://doi.org/10.1080/10495398.2021.1941071 (2021).
Sarkar, S. et al. Array-based DNA methylation profiling reveals peripheral blood differential methylation in male infertility. Fertility and sterility 112(1), 61–72.e1, https://doi.org/10.1016/j.fertnstert.2019.03.020 (2019).
Dai, C. L. et al. CXCL6: A potential therapeutic target for inflammation and cancer. Clinical and experimental medicine 23(8), 4413–4427, https://doi.org/10.1007/s10238-023-01152-8 (2023).
Liu, T. D. et al. Gene expression profiling of rat testis development during the early post-natal stages. Reproduction in domestic animals = Zuchthygiene 47(5), 724–731, https://doi.org/10.1111/j.1439-0531.2011.01950.x (2012).
Chaweewannakorn, W. et al. Ameloblastin attenuates RANKL-mediated osteoclastogenesis by suppressing activation of nuclear factor of activated T-cell cytoplasmic 1 (NFATc1). Journal of cellular physiology 234(2), 1745–1757, https://doi.org/10.1002/jcp.27045 (2019).
Bhattacharya, A. et al. MUC1-C Dictates JUN and BAF-Mediated Chromatin Remodeling at Enhancer Signatures in Cancer Stem Cells. Molecular cancer research: MCR 20(4), 556–567, https://doi.org/10.1158/1541-7786.MCR-21-0672 (2022).
Acknowledgements
This work was supported by the National Key R&D Program of China (grant number 2022YFC3401900) and the Peking University clinical medicine youth special fund (grant number PKU20222LCXQ42).
Author information
Authors and Affiliations
Contributions
R.K. and D.L. conceived and designed the study. R.K. and W.W. wrote the main manuscript and analysed the data. R.K. and K.Y. performed the experiments. The final manuscript was reviewed and approved by all authors without disagreement.
Corresponding authors
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Kong, R., Yan, K., Liu, D. et al. A Comprehensive Mass Spectrometry-Based Proteomic Dataset of Human Spermatozoa. Sci Data 12, 1509 (2025). https://doi.org/10.1038/s41597-025-05824-w
Received:
Accepted:
Published:
DOI: https://doi.org/10.1038/s41597-025-05824-w