Introduction

Infertility affects approximately 8–12% of reproductive-age couples globally, with male factors accounting for more than 50% of cases1. Recent findings published in Human Reproduction Update (2022) reported a significant decline in global sperm counts over the past five decades (1973–2018), raising substantial concerns about male reproductive health2. Current clinical evaluation of male fertility primarily relies on conventional semen analysis, which examines basic parameters including sperm concentration, motility, morphology, and ejaculate volume3,4. However, this traditional approach shows considerable limitations in predicting fertility potential due to inherent temporal variability and other confounding factors5. These limitations have driven growing interest in identifying novel molecular biomarkers in semen to enhance fertility assessments and improve outcomes in assisted reproductive technologies.

The emergence of high-throughput omics technologies, including genomics, proteomics, and metabolomics, has revolutionized infertility research. Compared to traditional diagnostic methods, omics approaches enable the simultaneous analysis of multiple biological pathways, offering deeper insights into the interaction between genetic, proteomic, and metabolic factors6,7. This integrative perspective not only enhances diagnostic precision but also facilitates the development of personalized treatments in reproductive medicine. Among omics techniques, metabolomics offers unique advantages as endogenous metabolites more directly reflect cellular phenotypes and exhibit higher sensitivity to biological changes compared to genes and proteins8.

Although both sperm and seminal plasma metabolomes are valuable in infertility research, previous studies have predominantly focused on seminal plasma due to its easier accessibility. However, this approach has inherent limitations as seminal plasma primarily serves as a supportive medium and does not directly participate in fertilization. In contrast, sperm cells possess unique metabolic pathways that directly affect motility, capacitation, acrosome reaction, and other fertilization processes9,10. Investigating these pathways through targeted metabolomics offers a dual advantage: it enhances understanding of the molecular basis of infertility and facilitates the identification of potential biomarkers for clinical interventions. Recent studies have yielded significant findings in this domain, such as the identification of age-related sperm metabolic biomarkers, smoking-induced sperm metabolic alterations, and sperm metabolic pathways associated with unexplained recurrent spontaneous abortion11,12,13. Nevertheless, a critical gap remains in distinguishing the distinct sperm metabolic profiles of healthy men from those with specific infertility phenotypes.

This study employs a comprehensive metabolomic analysis to investigate sperm samples from men with asthenozoospermia and teratozoospermia, compared to normozoospermia. By integrating targeted metabolomics with machine learning approaches, we aim to identify distinct metabolic signatures associated with these conditions, develop more accurate diagnostic models, and elucidate the underlying molecular mechanisms. This research not only improves our understanding of male infertility but also has the potential to identify novel therapeutic targets.

Results and discussion

Demographic characteristics and semen parameters

The comprehensive workflow of this study is illustrated in Fig. 1. Analysis of demographic characteristics revealed no significant differences in age distribution among the three groups. Similarly, basic semen parameters, including ejaculate volume, sperm concentration, and total sperm count, showed no statistically significant differences between the normozoospermia group and patients with asthenozoospermia or teratozoospermia (Table 1).

Sperm motility parameters differed markedly between the asthenozoospermia and normozoospermia groups. Total motility in the asthenozoospermia group was severely impaired at 31.00% (IQR: 23.25–40.00) compared with 79.00% (IQR: 72.75–82.00) in the normozoospermia group, representing a 60.8% reduction. Progressive motility showed an even more pronounced decrease of 76.2%, with values of 15.00% (IQR: 10.00–24.25) in the asthenozoospermia group compared with 63.00% (IQR: 55.75–68.25) in the normozoospermia group.

Morphological analysis revealed substantial differences between the teratozoospermia and normozoospermia groups. The normal sperm count in the teratozoospermia group was markedly reduced to 3.00 × 10⁶ (IQR: 2.00–5.00) compared with 14.00 × 10⁶ (IQR: 11.50–16.00) in the normozoospermia group, indicating a 78.6% reduction. The percentage of morphologically normal spermatozoa showed a similar pattern, with a 79.1% decrease from 6.70% (IQR: 5.38–7.83) in the normozoospermia group to 1.40% (IQR: 1.00–2.50) in the teratozoospermia group.
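The relative reductions quoted above can be rechecked from the reported central values. The following is a minimal Python sketch, assuming each reduction is computed as (reference − observed) / reference; the function and dictionary names are illustrative and not part of the study's analysis code.

```python
def percent_reduction(reference: float, observed: float) -> float:
    """Relative reduction of `observed` versus `reference`, in percent."""
    return (reference - observed) / reference * 100.0

# Central values as reported in the text: (normozoospermia, patient group).
reported = {
    "total motility (%)": (79.00, 31.00),
    "progressive motility (%)": (63.00, 15.00),
    "normal sperm count (x10^6)": (14.00, 3.00),
    "normal morphology (%)": (6.70, 1.40),
}

# Round to one decimal place, matching the precision used in the text.
reductions = {
    name: round(percent_reduction(ref, obs), 1)
    for name, (ref, obs) in reported.items()
}

for name, value in reductions.items():
    print(f"{name}: {value}% reduction")
```

Under this assumption the four computed values agree with the 60.8%, 76.2%, 78.6%, and 79.1% reductions stated in the text.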

These findings demonstrate clear and distinct phenotypic characteristics of asthenozoospermia and teratozoospermia, with substantial impairments in motility and morphology parameters, respectively. The marked differences in these key parameters validate our patient classification and provide a robust foundation for subsequent metabolomic analyses aimed at understanding the underlying molecular mechanisms of these conditions.

Fig. 1

Workflow of this study.

Table 1 Demographic characteristics and semen parameters of study population.

Sperm metabolomic differences among asthenozoospermia, teratozoospermia and normozoospermia

Using our established widely targeted metabolomics approach, we conducted comprehensive metabolomic profiling of sperm samples from all study groups. From the 417 targeted metabolites, we successfully detected 250 metabolites across all samples. After rigorous quality control, including removal of metabolites with excessive missing values (> 80% in QC samples) and exclusion of exogenous compounds, we obtained a final dataset of 89 endogenous metabolites. These metabolites were classified into four major categories: amino acids and derivatives, lipids, nucleic acids, and other compounds (Fig. 2A and Supplemental Table 2).

Principal Component Analysis (PCA) was initially performed to evaluate the overall metabolic profiles and analytical stability. The tight clustering of quality control samples in the PCA score plot demonstrated excellent instrumental stability throughout the analysis period. More importantly, the PCA revealed distinct clustering patterns among asthenozoospermia, teratozoospermia, and normozoospermia, indicating substantial differences in their metabolic profiles.

Comparative analysis between asthenozoospermia and normozoospermia revealed significant alterations in 47 metabolites (Fig. 2B and Supplemental Table 3). The most pronounced changes were observed in amino acids and their derivatives (20 metabolites) and lipid species (13 metabolites). The pathway enrichment analysis for asthenozoospermia highlighted significant perturbations in glycine, serine, and threonine metabolism, as well as linoleic acid metabolism. These metabolic alterations suggest fundamental disruptions in plasma membrane fluidity and mitochondrial bioenergetics, which likely contribute to the impaired sperm motility characteristic of asthenozoospermia.

The comparison between teratozoospermia and normozoospermia identified 25 significantly altered metabolites (Fig. 2C and Supplemental Table 4). These changes encompassed 10 amino acids and derivatives, 8 lipids, and 6 nucleic acids. Pathway analysis revealed significant enrichment in phenylalanine metabolism, linoleic acid metabolism, and purine metabolism. The involvement of these pathways, particularly those related to protein synthesis and membrane structure, provides mechanistic insights into the development of abnormal sperm morphology in teratozoospermia.

Fig. 2

Sperm metabolomic analysis. (A) Representative two-dimensional liquid chromatography-mass spectrometry chromatogram of sperm metabolome; Score plot of principal component analysis showing distinct clustering of quality controls (gray), normozoospermia (yellow), asthenozoospermia (blue), and teratozoospermia (red); Pathway enrichment plot of 89 metabolites among asthenozoospermia, teratozoospermia and normozoospermia. (B) volcano plot, bar chart and pathway enrichment plot of differential metabolites between asthenozoospermia and normozoospermia. (C) volcano plot, bar chart and pathway enrichment plot of differential metabolites between teratozoospermia and normozoospermia.

Diagnostic models for asthenozoospermia and teratozoospermia

To develop robust diagnostic models for asthenozoospermia and teratozoospermia, we implemented a comprehensive machine learning framework incorporating multiple algorithms and validation strategies (Fig. 3). Bootstrap resampling and repeated cross-validation minimized overfitting risks despite the limited sample size. Among the nine machine learning algorithms evaluated, the Glmnet model demonstrated promising performance (Table 2), achieving high diagnostic accuracy with AUC values of 0.99 (95% CI: 0.9572, 1) for asthenozoospermia and 0.9997 (95% CI: 1, 1) for teratozoospermia. The robustness of the model was validated through multiple validation methods, including bootstrap sampling, holdout validation, repeated cross-validation, and subsampling, with bootstrap sampling emerging as the optimal choice due to its reliability in handling limited sample sizes and providing stable variance estimates (Supplemental Tables 5 and 6).

Table 2 AUC values of machine learning models for distinguishing asthenozoospermia and teratozoospermia from normozoospermia.

We further conducted SHAP analysis to identify key sperm metabolic features associated with asthenozoospermia and teratozoospermia. Of the 89 metabolites under investigation, 18 had non-zero SHAP values (Supplemental Table 7). For patients with asthenozoospermia, SHAP analysis identified two key metabolic markers: corticosterone and arachidate. These metabolites had high mean absolute SHAP values, demonstrating their substantial contributions to the model’s predictions (Fig. 3A and Supplemental Table 7), and independent black-box analysis further validated these findings. In contrast, patients with teratozoospermia showed a distinct metabolic signature dominated by corticosterone alterations, as evidenced by its high mean absolute SHAP value (Fig. 3B and Supplemental Table 8) and confirmed by both SHAP summary plots and black-box analysis.

To ensure the reliability of these metabolic signatures, a dual-validation strategy was used, integrating SHAP values with permutation importance scores (Supplemental Tables 9 and 10). This approach corroborated corticosterone and arachidate as important diagnostic markers for asthenozoospermia, with a combined diagnostic accuracy of 0.9886. For teratozoospermia, corticosterone alone exhibited high diagnostic utility, with an AUC of 0.9999.

The identification of these distinct metabolic signatures not only provides insights into the underlying pathophysiology of asthenozoospermia and teratozoospermia but also offers practical diagnostic tools. Corticosterone’s emergence as a shared biomarker suggests common metabolic perturbations in both conditions, while arachidate’s specific association with asthenozoospermia indicates distinct metabolic disruptions in sperm motility disorders. These findings establish a foundation for developing targeted diagnostic approaches and potential therapeutic strategies for male infertility.

Fig. 3

Machine learning-based metabolic biomarker identification and validation for asthenozoospermia (A) and teratozoospermia (B), showing ROC curves, SHAP summary plots of metabolite contributions, black-box interpretation of key metabolites, and ROC curves of identified metabolic signatures.

Biological significance of differential metabolites in asthenozoospermia and teratozoospermia

Metabolomic profiling revealed distinct metabolic alterations in asthenozoospermia and teratozoospermia, highlighting disruptions in four major pathways: bioenergetic metabolism, lipid metabolism, amino acid regulation, and oxidative stress response. While our metabolomic analysis revealed a significant downregulation of corticosterone that may predispose to oxidative stress through its systemic stress-modulating effects, direct evidence from canonical antioxidant metabolites (L-glutathione, L-cysteine, L-methionine) was lacking, as they did not meet our stringent significance thresholds. However, we observed a consistent, though modest, elevation of hypoxanthine in both patient groups, whose catabolism may contribute to ROS generation. More importantly, the Glmnet algorithm identified perturbations in key amino acid pathways, including methionine (precursor for glutathione synthesis and methylation via S-adenosylmethionine), glutamate (direct glutathione precursor), and cystine (oxidized cysteine dimer), that collectively suggest impaired antioxidant defense mechanisms. These metabolic disturbances, particularly in glutathione-related pathways, align with established mechanisms of sperm dysfunction14,15,16,17, though we emphasize that the oxidative stress implications remain inferential without direct ROS measurements. The concurrent findings of corticosterone reduction, hypoxanthine elevation, and glutathione pathway alterations provide a coherent, though indirect, metabolic signature of potential redox imbalance in asthenozoospermia.

One of the most striking observations from our analysis was the significant alteration of energy metabolism-related metabolites, particularly those associated with mitochondrial function. Our metabolomic analysis revealed significant upregulation of L-carnitine and creatine in asthenozoospermia, indicative of compensatory bioenergetic adaptations to maintain ATP production and support sperm motility. These findings align with established mechanisms of energy homeostasis in spermatozoa, as demonstrated by previous studies18,19,20. Creatine contributes to ATP regeneration through the phosphocreatine shuttle system, a critical mechanism for sustaining energy levels in cells with high energy demands, such as spermatozoa. These adaptations involving L-carnitine and creatine possibly represent cellular attempts to counteract energy deficits that contribute to impaired sperm motility in asthenozoospermia. The Glmnet model further identified lipoamide as a key discriminant metabolite in energy metabolism pathways. As an essential mitochondrial cofactor in the pyruvate dehydrogenase and α-ketoglutarate dehydrogenase complexes, lipoamide’s altered levels provide direct evidence of impaired bioenergetics in asthenozoospermia, consistent with established biochemical mechanisms21,22. While our targeted metabolomic panel highlighted these specific changes, a comprehensive assessment of ATP production would involve a broader range of ATP cycle intermediates, some of which may not have been covered or reached statistical significance in this study. Taken together, our findings demonstrate significant alterations in energy-related metabolism and in the glycine-serine-threonine metabolic axis that directly supports mitochondrial bioenergetics, suggesting multi-level dysregulation of cellular energy homeostasis that potentially affects normal sperm motility in asthenozoospermia.

Metabolomic profiling also revealed significant disturbances in the metabolism of membrane phospholipids, which are crucial for sperm membrane integrity. Notably, squalene, an essential cholesterol precursor, was significantly downregulated in both asthenozoospermia and teratozoospermia, indicating alterations in sterol biosynthesis pathways and potentially compromised membrane fluidity23,24. The relevance of squalene’s alteration was underscored by its identification as a significant feature in the Glmnet models. This view of potentially disrupted lipid metabolism in asthenozoospermia is broadened by the Glmnet model’s further recognition of linoleate, an essential polyunsaturated fatty acid, and didecanoyl-glycerophosphocholine, a specific phosphatidylcholine species, as contributing metabolites. Linoleic acid metabolism was previously identified as a significantly perturbed pathway. Consequently, altered levels of linoleate and specific phospholipids such as didecanoyl-glycerophosphocholine might profoundly impact sperm membrane integrity, fluidity, and critical functions including capacitation and the acrosome reaction25,26. The concurrent dysregulation of these lipids pinpointed by the Glmnet model (squalene, linoleate, and specific glycerophosphocholines) suggests a possible complex disruption of lipid metabolism affecting sperm membrane architecture and function in asthenozoospermia. Specifically, squalene’s role in membrane fluidity is vital for the sperm’s ability to undergo capacitation and the acrosome reaction, both of which are essential for fertilization27. This downregulation further supports the hypothesis that disruptions in membrane lipid metabolism are central to the pathophysiology of sperm motility and morphology defects in both diseases.

The presence of corticosterone in sperm, although it is primarily synthesized in the adrenal glands, may be explained through several plausible mechanisms. As a lipophilic steroid hormone, circulating corticosterone could potentially diffuse across cell membranes into spermatozoa or be transported via specific carriers in the male reproductive tract. Although direct synthesis of corticosterone by sperm cells has not been established, its consistent downregulation in both asthenozoospermia and teratozoospermia (p < 0.01) strongly suggests a meaningful association with sperm dysfunction, as supported by recent literature17,28. Given corticosterone’s known role in modulating systemic stress responses, its dysregulation in sperm may reflect or contribute to an unfavorable microenvironment, potentially through modulation of redox balance or energy metabolism, with the increased oxidative stress and cellular damage discussed above. The Glmnet model for asthenozoospermia extended this perspective by recognizing other compounds with potential relevance to endocrine signaling or antioxidant defense as contributing features, namely β-carotene, vitamin D2, deoxycorticosterone acetate, and estradiol-17α. Vitamin D2 and β-carotene are acknowledged for their antioxidant capacities and established roles in male reproductive physiology29,30, suggesting that alterations in these micronutrients might reflect or contribute to diminished antioxidant defenses in asthenozoospermia and pointing to a potential interplay between steroid signaling and oxidative stress regulation in sperm pathology. Moreover, the model’s emphasis on steroid-related molecules such as deoxycorticosterone acetate and estradiol-17α, alongside corticosterone, may signify a more extensive dysregulation of steroid signaling or metabolism that influences sperm function, although their precise intra-spermatozoal roles require further elucidation. The precise mechanisms of corticosterone’s entry into sperm and its direct functional roles therein likewise remain to be elucidated and represent an area for future research. While corticosterone alterations suggested shared stress-related dysregulation, the Glmnet model for teratozoospermia also brought to light potential disturbances in energy-related pathways as contributing distinguishing features.

For teratozoospermia, the Glmnet model highlighted creatine, which is pivotal for ATP regeneration via the phosphocreatine shuttle31, and N,N,N-trimethyllysine, a precursor in L-carnitine biosynthesis essential for mitochondrial fatty acid oxidation32, as relevant metabolites. An adequate energy supply is considered indispensable not only for motility but also for the complex, energy-intensive morphogenetic events of spermiogenesis. Therefore, the Glmnet model’s emphasis on potential disruptions in these pathways suggests that compromised energy provision could conceivably impair normal sperm structural development, thereby contributing to teratozoospermia.

Furthermore, the recognition of melatonin as a key contributing metabolite by the Glmnet model for teratozoospermia may offer valuable insight. Melatonin is a well-established, potent antioxidant known to confer protection against oxidative cellular damage across various physiological systems, including the male reproductive tract. The emphasis placed by the Glmnet model on altered melatonin in teratozoospermic individuals might indicate a compromised intrinsic antioxidant defense system33,34. Such a deficiency could render developing spermatozoa more vulnerable to oxidative insults, which are recognized etiological factors for abnormal sperm morphology.

Complementing these observations, the Glmnet models for teratozoospermia also designated pipecolate, an intermediate of lysine catabolism, and norspermidine, a polyamine, as relevant differentiating features. Polyamines, including norspermidine, are considered integral to processes of cell growth, differentiation, and the stabilization of nucleic acids, all of which are fundamentally important during spermiogenesis. Consequently, the model’s identification of alterations in pathways involving lysine degradation products or polyamines could signify interference with normal sperm development, thereby potentially contributing to morphological defects35.

Analysis with the Glmnet model further highlighted arachidate (a C20:0 saturated fatty acid) as a particularly relevant and specific potential biomarker for asthenozoospermia, exhibiting significant upregulation. This finding aligns with, and is reinforced by, the Glmnet model’s concurrent identification of alterations in other key lipids such as linoleate and squalene. Collectively, these observations may point towards substantial perturbations in lipid metabolism within asthenozoospermic samples. The increased levels of arachidate may represent a compensatory attempt by sperm to enhance membrane stability in response to disrupted lipid metabolism36, particularly within the membrane, which is critical for sperm motility. However, while this adaptive mechanism might help stabilize membrane fluidity, it appears insufficient to fully restore normal sperm function, as reflected in the defining motility defects of asthenozoospermia. The precise functional relationship between arachidate and sperm motility remains to be fully elucidated, but its upregulation may serve as a specific biomarker of impaired lipid metabolism in spermatozoa, with potential functional consequences for sperm motility, the defining pathological feature of asthenozoospermia25,26.

Materials and methods

Chemicals and reagents

HPLC-grade acetonitrile (ACN) and methanol (MeOH) were purchased from Merck (Darmstadt, Germany). Formic acid and ammonium acetate were obtained from Anpel Laboratory Technologies Inc. (Shanghai, China). Ultrapure water (18.2 MΩ cm) was generated using a Milli-Q water purification system (Millipore, Bedford, MA, USA).

Study participants

This case-control study was conducted with approval from the Ethics Committee of the First Affiliated Hospital of Xiamen University (2022KY003). We recruited 131 Chinese men (age range: 22–48 years) from the Department of Reproductive Medicine between March and April 2022. Participants were categorized into three distinct groups based on routine semen analysis according to the World Health Organization (WHO) Laboratory Manual for the Examination and Processing of Human Semen (5th edition). Normozoospermia (n = 48): participants meeting all WHO criteria for normal semen parameters (e.g., sperm concentration ≥ 15 × 10⁶/mL, progressive motility ≥ 32%, normal morphology ≥ 4%). Asthenozoospermia (n = 40): participants with progressive sperm motility < 32%, but with normal sperm morphology (≥ 4%) and sperm concentration (≥ 15 × 10⁶/mL). Teratozoospermia (n = 43): participants with normal sperm morphology < 4%, but with normal progressive sperm motility (≥ 32%) and sperm concentration (≥ 15 × 10⁶/mL). Individuals presenting with oligozoospermia (sperm concentration < 15 × 10⁶/mL) or combined phenotypes (e.g., oligoasthenoteratozoospermia, or meeting criteria for both asthenozoospermia and teratozoospermia as defined above) were excluded from these groups to isolate the metabolic signatures primarily associated with either impaired motility or abnormal morphology. Data on the total number of individuals screened to achieve these cohort sizes were not prospectively collected for this study. Demographic characteristics and semen parameters of the study population are detailed in Table 1; sperm concentrations were not statistically different among the final study groups, which reduces the likelihood that this variable confounded the specific effects of motility or morphology defects. This stringent phenotypic separation was essential for our primary research objective.
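The grouping rules described above can be sketched as a small decision function. This is an illustrative Python sketch rather than the study's procedure: it uses only the three thresholds quoted in the text, whereas the full WHO criteria for normozoospermia cover additional parameters. The function name and the use of `None` for excluded participants are assumptions.

```python
from typing import Optional

# WHO 5th-edition lower reference limits as quoted in the text.
MIN_CONCENTRATION = 15.0          # x10^6 spermatozoa per mL
MIN_PROGRESSIVE_MOTILITY = 32.0   # percent
MIN_NORMAL_MORPHOLOGY = 4.0       # percent

def assign_group(concentration: float,
                 progressive_motility: float,
                 normal_morphology: float) -> Optional[str]:
    """Return the study group, or None if the participant would be excluded."""
    if concentration < MIN_CONCENTRATION:
        return None  # oligozoospermia is excluded from all groups
    low_motility = progressive_motility < MIN_PROGRESSIVE_MOTILITY
    low_morphology = normal_morphology < MIN_NORMAL_MORPHOLOGY
    if low_motility and low_morphology:
        return None  # combined phenotypes are excluded
    if low_motility:
        return "asthenozoospermia"
    if low_morphology:
        return "teratozoospermia"
    return "normozoospermia"

# Hypothetical participants (values chosen for illustration only).
print(assign_group(60.0, 15.0, 6.0))   # isolated motility defect
print(assign_group(60.0, 63.0, 1.4))   # isolated morphology defect
print(assign_group(10.0, 63.0, 6.7))   # low concentration, excluded
```

Returning `None` for oligozoospermia and combined phenotypes mirrors the exclusion rule in the text, so each retained participant carries exactly one defect or none.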

Semen analysis

Semen analysis was conducted according to the WHO Laboratory Manual for the Examination and Processing of Human Semen (5th edition). Following liquefaction at 37 °C for 30 min, semen volume was measured using standardized pipettes, and pH was assessed using indicator strips. Sperm concentration and motility (progressive, non-progressive, and immotile) were determined using the Suijia SSA CASA system (Changsha, China), analyzing ≥ 200 sperm per sample. Morphology was evaluated via Papanicolaou staining (≥ 200 spermatozoa counted at 1000× magnification). All assessments were performed by two independent embryologists; discrepancies > 10% triggered a third evaluation. For low-concentration samples (sperm concentration < 15 × 10⁶/mL), smears were prepared from centrifuged pellets. High-concentration or viscous samples were diluted with physiological saline to 40–70 × 10⁶/mL before thin- or drop-smear preparation.

Sperm metabolome analysis

First, sperm concentrations were quantified using the Suijia SSA system during routine semen analysis. Based on these measurements, the precise volume of liquefied semen required to obtain 1 × 10⁷ spermatozoa was calculated for each sample. Following gentle vortexing to ensure homogeneity, the calculated semen volume was transferred to a new tube.
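The volume calculation amounts to dividing the target cell number by the measured concentration. A minimal Python sketch follows, assuming the concentration is expressed in 10⁶ spermatozoa per mL as elsewhere in the text; the function name and the example concentration are hypothetical.

```python
TARGET_SPERM_COUNT = 1e7  # spermatozoa required per sample, as stated in the text

def semen_volume_ml(concentration_million_per_ml: float,
                    target_count: float = TARGET_SPERM_COUNT) -> float:
    """Volume of liquefied semen (mL) containing `target_count` spermatozoa."""
    if concentration_million_per_ml <= 0:
        raise ValueError("concentration must be positive")
    # Convert the concentration from 10^6/mL to cells/mL before dividing.
    return target_count / (concentration_million_per_ml * 1e6)

# Hypothetical sample at 50 x 10^6 spermatozoa/mL.
volume = semen_volume_ml(50.0)
print(f"{volume:.3f} mL")
```

For a sample at the 15 × 10⁶/mL inclusion threshold, the same formula gives roughly 0.67 mL, which is the largest volume any included sample would need.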

The subsequent sperm washing procedure involved three iterative steps. Samples were initially centrifuged at 12,000 × g for 15 min at 4 °C to pellet the sperm, after which the seminal plasma supernatant was discarded. The sperm pellet was then resuspended in 1 mL of pre-chilled PBS, briefly vortexed for uniform dispersion, and centrifuged again under identical conditions. This PBS washing cycle was repeated twice to effectively remove seminal plasma components.

To remove residual salts, a single wash with 1 mL ultrapure water was conducted, followed by centrifugation and complete supernatant removal.

Finally, for metabolite extraction, 1 mL of 80% methanol (pre-chilled to −20 to −80 °C as specified) was added to the sperm pellet along with zirconia beads. Homogenization was performed using a FastPrep®-24 system at 4 m/s, consisting of five 20-second cycles with 5-minute intervals on ice between each cycle to prevent thermal degradation. Following centrifugation, the supernatants were subjected to vacuum concentration and reconstituted in 200 µL of 1:1 (v/v) methanol/water solution for subsequent analysis.

Sperm metabolomic analysis was conducted using a Nexera LC-40 heart-cutting two-dimensional liquid chromatography coupled with an 8060NX triple quadrupole mass spectrometer (Shimadzu, Japan), according to previously established protocols37. The system employed a ZIC-cHILIC column for hydrophilic metabolites and an Acclaim C8 column for hydrophobic metabolites. Gradient elution conditions are detailed in Supplemental Table 1. Mass spectrometry parameters were optimized as follows: nebulizer gas 3 L/min, drying gas 10 L/min, heated gas 10 L/min, desolvation temperature 526 °C, desolvation line temperature 250 °C, heat block temperature 400 °C, and collision-induced dissociation gas pressure 270 kPa. Quality control (QC) samples were prepared by pooling 10 µL aliquots from each sample, and were injected every 15 samples to monitor analytical reproducibility.

Raw data were processed using LabSolutions (version 5.99.2, Shimadzu, Japan). Missing values were imputed with the minimum observed value divided by √2, and metabolites were excluded based on QC sample detection rates (< 80% detection in QC samples) and relative standard deviation (RSD > 30% in QC samples).
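The filtering and imputation rules can be illustrated with a short Python sketch. It assumes that missing values are represented as `None`, that RSD is the sample standard deviation divided by the mean of detected QC values, and that the minimum used for imputation is taken per metabolite; the text does not specify these details, and all function names are hypothetical.

```python
import math
import statistics

def passes_qc(qc_values, min_detection=0.80, max_rsd_percent=30.0):
    """Keep a metabolite if it is detected in >= 80% of QC injections
    and its RSD across detected QC values is <= 30%."""
    detected = [v for v in qc_values if v is not None]
    if len(detected) / len(qc_values) < min_detection:
        return False
    if len(detected) < 2:
        return False  # RSD is undefined for fewer than two values
    rsd = statistics.stdev(detected) / statistics.mean(detected) * 100.0
    return rsd <= max_rsd_percent

def impute_min_over_sqrt2(sample_values):
    """Replace missing values with (minimum observed value) / sqrt(2)."""
    observed = [v for v in sample_values if v is not None]
    fill = min(observed) / math.sqrt(2)
    return [fill if v is None else v for v in sample_values]

# Hypothetical peak areas for one metabolite.
print(passes_qc([100.0, 105.0, 95.0, 102.0, None]))   # stable, 80% detected
print(passes_qc([100.0, None, None, 90.0, 95.0]))     # only 60% detected
print(impute_min_over_sqrt2([4.0, None, 8.0]))
```

Dividing the minimum by √2 places imputed values below every observed value, which treats missing signals as being under the detection limit rather than absent.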

The classification of metabolites as endogenous or exogenous was based on the Human Metabolome Database (HMDB). This involved verifying their established presence and origin within human biological systems. Metabolite features categorized as exogenous, including drug metabolites unrelated to the study, known food additives, and common environmental contaminants not anticipated in sperm, were excluded from the dataset prior to statistical analysis.

Statistical analysis

Multivariate analyses including principal component analysis (PCA) and partial least-squares discriminant analysis (PLS-DA) were performed using Simca-P 14.1 (Umetrics, Sweden). Differential metabolites of asthenozoospermia and teratozoospermia were identified using criteria of FC > 1.5 or < 0.67 and FDR < 0.05.
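The differential-metabolite criteria combine a fold-change window with a false discovery rate cutoff. The sketch below is illustrative Python and assumes the Benjamini-Hochberg procedure for FDR adjustment, which the text does not name; the function names and the example p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

def is_differential(fold_change, fdr, fc_up=1.5, fc_down=0.67, fdr_cutoff=0.05):
    """Apply the stated criteria: FC > 1.5 or FC < 0.67, and FDR < 0.05."""
    return (fold_change > fc_up or fold_change < fc_down) and fdr < fdr_cutoff

# Hypothetical raw p-values and fold changes for four metabolites.
p_raw = [0.01, 0.04, 0.03, 0.20]
fold_changes = [2.0, 0.5, 1.2, 1.8]
fdr = benjamini_hochberg(p_raw)
flags = [is_differential(fc, q) for fc, q in zip(fold_changes, fdr)]
print(fdr)
print(flags)
```

In this toy input only the first metabolite is flagged: the second has a sufficient fold change but an adjusted p-value just above 0.05, showing how the two criteria act jointly.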

For identifying sperm metabolic biomarkers for asthenozoospermia and teratozoospermia, a comprehensive machine learning framework was implemented using the mlr3 package (version 0.18.0). The dataset was randomly split into training and testing sets using a 7:3 ratio. Nine machine learning algorithms were evaluated, including k-Nearest Neighbors (kNN), linear discriminant analysis (LDA), Naïve Bayes, neural network (nnet), support vector machine (SVM), extreme gradient boosting (XGBoost), Random Forest, least absolute shrinkage and selection operator (lasso, implemented via Glmnet), and logistic regression.

For our glmnet model optimization, we implemented a comprehensive hyperparameter tuning strategy using a well-defined search space. To differentiate asthenozoospermia from normozoospermia, we simultaneously optimized three critical parameters: alpha (elastic net mixing parameter ranging from 0 to 1), lambda (regularization strength ranging from 0.0001 to 1 on a logarithmic scale), and s (the proportion of lambda value used for prediction ranging from 0.01 to 1). The hyperparameter optimization was performed using grid search with a resolution of 10 points per dimension, exploring a total of 50 different parameter configurations as determined by our terminator criterion. To ensure robust performance estimation, we employed a bootstrap resampling strategy with 1000 repetitions and a 0.7 sampling ratio. Classification error rate was used as the optimization metric. Our tuning process yielded an optimal parameter configuration of alpha = 0.3333333, lambda = 0.005994843, and s = 0.67. Similarly, for distinguishing teratozoospermia from normozoospermia, the same tuning strategy was applied. The tuning process yielded an optimal parameter configuration of alpha = 1, lambda = 0.005994843, and s = 0.12. The final model validation utilized 1000-iteration bootstrap resampling.
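The search space described above can be reconstructed as a grid to see where the reported optima fall. This Python sketch assumes ten equidistant points per parameter, with lambda spaced evenly on a log10 scale; it does not call mlr3 or glmnet, and the helper names are hypothetical.

```python
import math

RESOLUTION = 10  # grid points per dimension, as stated in the text

def linear_grid(lower, upper, n=RESOLUTION):
    """n equidistant points from lower to upper, inclusive."""
    step = (upper - lower) / (n - 1)
    return [lower + i * step for i in range(n)]

def log10_grid(lower, upper, n=RESOLUTION):
    """n points from lower to upper, equidistant on a log10 scale."""
    return [10 ** e for e in linear_grid(math.log10(lower), math.log10(upper), n)]

alpha_grid = linear_grid(0.0, 1.0)     # elastic net mixing parameter
lambda_grid = log10_grid(1e-4, 1.0)    # regularization strength
s_grid = linear_grid(0.01, 1.0)        # value used at prediction time

def nearest(grid, value):
    """Grid point closest to a reported value."""
    return min(grid, key=lambda g: abs(g - value))

print(nearest(alpha_grid, 0.3333333))
print(nearest(lambda_grid, 0.005994843))
print(nearest(s_grid, 0.67), nearest(s_grid, 0.12))
print(len(alpha_grid) * len(lambda_grid) * len(s_grid))
```

Under these assumptions the reported values (alpha = 0.3333333 and 1, lambda = 0.005994843, s = 0.67 and 0.12) coincide with grid points. The full grid holds 1000 combinations, so the 50 evaluated configurations represent a subset of it.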

To assess robustness, three additional resampling methods, namely holdout validation, repeated cross-validation (CV), and subsampling, were applied, each with 1000 iterations. To enhance the interpretability of the machine learning models in the sperm metabolomics analysis, SHapley Additive exPlanation (SHAP) values were used to assess feature importance38,39,40. A larger absolute SHAP value indicates a more important feature, and the sign of the value indicates the direction in which a metabolite shifts the predicted risk. Corticosterone and arachidate were identified as crucial biomarkers based on their high SHAP importance scores. This finding was confirmed by permutation importance scores, which were calculated across 100 iterations using the iml package (version 0.11.3).
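The idea behind permutation importance can be sketched as below. This is a minimal Python illustration, not the iml implementation used in the study: importance is measured here as the mean increase in classification error after shuffling one feature, whereas a package may report a ratio or use a different loss. The toy classifier, data, and seeds are hypothetical.

```python
import random


def classification_error(predict, rows, labels):
    """Fraction of rows whose predicted label differs from the true label."""
    wrong = sum(predict(row) != label for row, label in zip(rows, labels))
    return wrong / len(rows)


def permutation_importance(predict, rows, labels, n_repeats=100, seed=1):
    """Mean increase in error after shuffling each feature column in turn."""
    rng = random.Random(seed)
    baseline = classification_error(predict, rows, labels)
    importances = []
    for j in range(len(rows[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in rows]
            rng.shuffle(column)  # break the link between feature j and labels
            permuted = [row[:j] + [value] + row[j + 1:]
                        for row, value in zip(rows, column)]
            error = classification_error(predict, permuted, labels)
            increases.append(error - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


def toy_predict(row):
    """Hypothetical classifier that uses only the first feature."""
    return 1 if row[0] > 0.5 else 0


# Hypothetical toy data: two features, labels driven by feature 0 only.
data_rng = random.Random(0)
rows = [[data_rng.random(), data_rng.random()] for _ in range(200)]
labels = [toy_predict(row) for row in rows]

importance = permutation_importance(toy_predict, rows, labels)
print(importance)
```

Because the toy classifier ignores the second feature, shuffling that feature leaves the error unchanged and its importance is zero, while shuffling the first feature degrades performance substantially.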

All statistical analyses were performed in R (version 4.3.1). Descriptive statistics were generated using the gtsummary package (version 1.7.1). Metabolic pathway enrichment analyses were conducted using MetaboAnalyst 6.0 (https://www.metaboanalyst.ca)41.

Limitations

While this study provides novel insights into sperm metabolic signatures, several limitations should be acknowledged. First, our machine learning models, despite showing high accuracy under internal validation, have not been validated on an external, independent cohort. Therefore, the generalizability of our findings requires confirmation, and the risk of overfitting, although mitigated by regularization and bootstrap resampling, cannot be entirely excluded. Second, the study’s cross-sectional design limits our ability to infer causality between metabolic changes and sperm dysfunction. Third, we did not collect detailed data on potential confounding factors such as diet, lifestyle (e.g., smoking, alcohol consumption), and environmental exposures, which may have influenced the results. Fourth, key mechanistic questions remain unresolved, particularly regarding corticosterone’s origin in sperm and its precise functional roles. Finally, while our findings suggest a link to oxidative stress, this was inferred from metabolic perturbations rather than from direct measurement of reactive oxygen species.

Conclusions

Our sperm metabolomic study revealed distinct metabolic signatures for asthenozoospermia and teratozoospermia. Machine learning, notably the glmnet model, identified corticosterone as a candidate biomarker common to both conditions and arachidate as a potential specific biomarker for asthenozoospermia, with their significance supported by alterations in related metabolic pathways. These metabolic signatures, which involve disruptions in energy metabolism, lipid metabolism, and amino acid regulation, offer insights into the potential pathophysiological mechanisms underlying sperm dysfunction. They also point to potential diagnostic biomarkers and therapeutic targets for male infertility, although extensive further validation in larger, independent cohorts is required before any clinical application can be considered.