Introduction

Aging is closely associated with fertility in women. Female fertility significantly declines after the age of 35 years and is closely linked to a decline in ovarian function due to aging1,2. Ovarian aging is associated with a reduction in the primordial follicle pool, a decline in oocyte quality, and poor ovarian responsiveness to hormonally controlled ovarian stimulation2,3. This has led to low fertilization success rates in clinical trials1,4. Several therapeutic options, including stem cell therapy5,6, growth hormone administration7, and platelet-rich plasma infusion8, have been used to improve ovarian function in human and rodent models of ovarian aging. Although these methods have shown promise in preventing and reversing ovarian aging, evidence regarding their safety and efficacy remains insufficient. Therefore, further exploration of alternative therapeutic approaches capable of regulating and preserving ovarian function in patients with age-related infertility is required.

Samul-tang (SM), an herbal formula comprising four herbs, Paeonia Radix, Cnidii Rhizoma, Rehmanniae Radix Preparata, and Angelicae Gigantis Radix in a 1:1:1:1 ratio, is a widely used herbal medicine that has been clinically prescribed for gynecologic disorders to improve blood circulation, such as menstrual irregularities, anemia, dysmenorrhea, and menopausal syndrome9,10. Recently, in relation to ovarian function, the therapeutic potential of SM has been reported, including its phytoestrogen effect, improvement in follicular maldevelopment, and relief from damage to ovarian function caused by cyclophosphamide11,12,13. We previously investigated the therapeutic effects of SM on fertility in older mice. SM prevents the loss of primordial ovarian follicles, improves oocyte quality, and enhances embryonic developmental competence and implantation potential in aged mice14. Analysis of differentially expressed genes (DEGs) between old mice administered vehicle or SM revealed that its therapeutic effects on ovarian function under aging conditions can be linked to the rat sarcoma virus (Ras) signaling pathway14. Building on this previous study, we hypothesized that SM exerts its pharmacological effects through concurrent regulation of multiple biomolecules by its active compounds. Although the representative compounds of the four major herbs of SM are known, the bioactive compounds of SM related to the improvement of ovarian function have not yet been reported. Identifying the active compounds in SM may provide valuable insights into potential treatments for age-related female infertility.

Deciphering bioactive compounds in herbal medicines is attractive because of their low toxicity and high bioactivity15,16. In silico methods have emerged as a systematic approach for identifying active compounds in medicinal herbs and their potential molecular targets17,18. This method typically leverages existing data derived from high-throughput screening, such as cell-based phenotypic assays and cell-free target-based assays19. However, despite the apparent efficiency of this method, there is a discrepancy in the performance of compounds predicted to be promising when applied to more complex biological systems. Cell-free and cell-based assays often lack information on the complex interplay between various cell types and tissues in organ systems with multifactorial diseases, including infertility20. Therefore, gaining insight into the pharmacodynamics of each compound in an in vivo disease model may be a key strategy for effectively identifying active compounds. Accordingly, we generated transcriptome profiles of SM compounds in aged female mice.

In this study, we present a systematic compound combination approach using ovarian transcriptomes induced by the individual components of SM in aged mice to identify the active compounds that closely mimic the pharmacological effects of SM. Initially, we assumed that the therapeutic efficacy of SM arose from complex interactions among several physiologically active compounds. We identified signature genes from SM-induced in vivo transcriptomic data and used them to identify combinations of compounds similar to SM. Compound combinations were determined using multiple linear regression (MLR), a supervised machine learning method that estimates the relationship between the given independent variables and the outcome through a relatively simple yet precise model of the variables21.

From these predicted compound combinations, we identified active compounds with biological effects similar to those of SM in improving ovarian function. Briefly, our framework consisted of five distinct steps: i) identification of key gene signatures responsible for driving the pharmacological effects of SM, ii) generation of gene signature profiles for the constituent compounds of SM, iii) prediction of compound combinations, iv) in vivo validation, and v) identification of active compounds. Therefore, the current study aimed to discover active compounds with therapeutic potential for age-related female infertility based on the application of transcriptional data induced by the compositional compounds of SM.

Results

Identification of 21 SM-induced Ras signature genes

To identify the molecular signatures indicative of the therapeutic effects of SM on age-related female infertility, we revisited ovarian transcriptome data (gene expression omnibus number: GSE264413) from a previous study that compared aged (OC) and young (YC) mice14. The DEGs between SM-treated and untreated OC, as well as between untreated YC and OC, were mapped onto a protein-protein physical interaction (PPI) network to identify core modules, including common gene neighbors closely linked to the DEGs (Fig. 1a). By employing PPI network-based DEG analysis for functional enrichment, we observed significant alterations in the Phosphatidylinositol 3-Kinase (PI3K)-serine/threonine protein kinase (Akt), Ras, and Ras-associated protein 1 (Rap1) signaling pathways between SM-treated OC (OC + SM) and untreated OC mice (Fig. 1b). PI3K and Rap1 are major effectors downstream of Ras and are crucial for cellular functions such as proliferation and apoptosis22. In the ovary, Ras signaling is essential for ovulation and luteinization, and its inappropriate activation leads to granulosa cell dysfunction and premature ovarian failure23. Based on these observations, we postulated that the therapeutic benefit of SM is mediated by the Ras signaling pathway. To identify key genes within the Ras pathway, we examined genes with differential expression in the ovaries of SM-administered OC mice and then refined this selection to a specific subset of 21 genes that showed the same direction of change (up- or downregulation) in YC mice compared to OC mice (Fig. 1c). We refer to this set of genes as the ‘SM-induced Ras signature’, which encompasses seven upregulated genes and 14 downregulated genes following SM treatment (Fig. 1c). We anticipated that this pattern could be approximated by the sum of the individual patterns attributed to the major constituent compounds that significantly contribute to the mode of action (MOA). To further explore this pattern, we profiled the gene expression patterns of the Ras signature in the ovarian tissues of OC mice perturbed by individual constituent compounds of SM.
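The selection rule described above, keeping Ras pathway genes whose expression changes in the same direction in SM-treated OC mice and in YC mice relative to untreated OC mice, can be sketched in a few lines of Python. This is a minimal illustration and not the pipeline used in the study: the function name `select_signature`, the dictionary inputs of log2 fold-changes, and the placeholder gene labels are assumptions introduced here.

```python
def select_signature(sm_vs_oc, yc_vs_oc, pathway_genes):
    """Keep pathway genes whose log2 fold-change has the same sign in both contrasts.

    sm_vs_oc / yc_vs_oc: hypothetical dicts mapping gene -> log2 fold-change
    (OC + SM versus OC, and YC versus OC). Returns gene -> "up" or "down".
    """
    signature = {}
    for gene in pathway_genes:
        sm_fc = sm_vs_oc.get(gene)
        yc_fc = yc_vs_oc.get(gene)
        if sm_fc is None or yc_fc is None:
            continue  # gene not measured in one of the contrasts
        if sm_fc * yc_fc > 0:  # same direction in both contrasts
            signature[gene] = "up" if sm_fc > 0 else "down"
    return signature


# Toy values with placeholder gene labels (not data from the study).
toy_sm = {"geneA": 1.2, "geneB": -0.5, "geneC": 0.8}
toy_yc = {"geneA": 0.4, "geneB": -1.0, "geneC": -0.3}
toy_signature = select_signature(toy_sm, toy_yc, ["geneA", "geneB", "geneC", "geneD"])
```

In this toy input, `geneC` is dropped because SM moves it in the opposite direction to the young state, and `geneD` is dropped because it has no measurement.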

Fig. 1: Identification of SM-induced Ras-related genes through network-based differential gene expression analysis.

a Volcano plots showing the overall gene expression differences in the ovarian transcriptome data between SM-treated OC mice and untreated OC mice (left) and between untreated YC mice and OC mice (right). b Top 10 pathways enriched for the core protein-protein interaction modules closely linked to DEGs between SM-treated OC and untreated OC (left) and between untreated YC and OC (right). c Expression changes in the 21 Ras-related genes in the ovaries of OC mice following SM treatment and YC mice compared to OC control mice. SM, Samul-tang; Ras, Rat sarcoma virus; OC, old control (42-week-old mice; sampling at this age); YC, young control (12-week-old mice; sampling at this age); OC + SM, Samul-tang treated OC mice; DEGs, differentially expressed genes.

Expression patterns of the SM-induced Ras signature genes elicited by the 30 SM compounds

We identified and characterized the constituent chemical compounds of SM using UHPLC-MS/MS. Thirty distinct compounds were detected in SM, and their presence was confirmed by comparing their retention times and mass spectra with those of existing reference standards24 (Supplementary Table 1) (Fig. 2a). Subsequently, OC mice were administered each compound for four weeks and then superovulated to induce a post-ovulatory state, as described in a previous study14. The ovaries were then subjected to mRNA sequencing (Fig. 2b). Figure 2c shows the expression profiles of SM-induced 21 Ras signature genes for the 30 individual compounds.

Fig. 2: Expression patterns of the 21 SM-induced Ras-related genes following treatment with SM and its constituent compounds.

a Negative and positive ion modes of the base peak chromatograms of SM extracted using UHPLC-Q-Exactive orbitrap-MS. The identified compounds are listed in Supplementary Table 1. b Experimental schematic for obtaining the compound-induced ovarian transcriptome. A group of three aged mice (38-week-old) received oral administration of one of the 30 compounds (5 mg/kg) or a vehicle of 0.5% DMSO in DW five times a week for four weeks. Following the administration of the compounds, the mice were superovulated using gonadotrophin (PMSG and hCG). Total RNA was isolated from the left ovary and subjected to RNA sequencing. c Expression changes in 21 Ras-related genes in the ovaries of OC mice after 30 compound treatments compared to OC mice without treatment. SM Samul-tang, RAS Rat sarcoma virus, DW Distilled water, OC old control (42-week-old mice; sampling at this age).

Prediction of optimal compound combinations mimicking expression patterns of the SM-induced Ras signature genes

We aimed to identify effective compound combinations that replicate the expression patterns of SM-induced Ras signature genes. To this end, we constructed a vector sum model using the MLR method on 21 Ras signature genes following treatment with 30 compounds. The Ras signature gene expression vector of SM was used as the dependent variable, and the Ras signature gene expression vector of each compound included in SM was used as the independent variable. The adjusted R-squared value of the linear regression prediction results was used to determine the rank of the compound combinations (Fig. 3a). The results of MLR, adjusted R-squared score, and compound coefficient values are shown in Supplementary Table 2. The three compound combinations that most accurately reproduced the expression patterns of Ras signature genes induced by SM using MLR were 1) butylphthalide and ferulic acid, 2) butylphthalide and oleanolic acid, and 3) isoacteoside and benzoyloxypeoniflorin. Butylphthalide and ferulic acid recapitulated 87.2% of the transcriptome pattern of SM, whereas butylphthalide and oleanolic acid recapitulated 86.9%, and isoacteoside and benzoyloxypeoniflorin recapitulated 83.5%. In other words, these combinations of compounds effectively mimicked the characteristic gene expression patterns of SM (Fig. 3b).
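The combination search described above can be sketched as follows. This is a minimal, dependency-free illustration under stated assumptions, not the authors' implementation: it fits ordinary least squares with an intercept (the source does not say whether an intercept was used), ranks combinations by adjusted R-squared, and omits the p-value and positive-coefficient filters mentioned in the Fig. 3 legend. The function names and the toy vectors are hypothetical.

```python
from itertools import combinations


def solve_linear_system(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular design matrix")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def fit_mlr(y, predictors):
    """Least-squares fit with intercept; returns (slope coefficients, adjusted R2)."""
    n, p = len(y), len(predictors)
    cols = [[1.0] * n] + [list(x) for x in predictors]
    # Normal equations: (X^T X) beta = X^T y
    xtx = [[sum(u * v for u, v in zip(ci, cj)) for cj in cols] for ci in cols]
    xty = [sum(u * v for u, v in zip(ci, y)) for ci in cols]
    beta = solve_linear_system(xtx, xty)
    fitted = [sum(b * c[i] for b, c in zip(beta, cols)) for i in range(n)]
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r2 = 1.0 - ss_res / ss_tot
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)
    return beta[1:], adj_r2


def rank_combinations(sm_vector, compound_vectors, max_size=2):
    """Fit every combination up to max_size compounds; best adjusted R2 first."""
    results = []
    for size in range(1, max_size + 1):
        for combo in combinations(sorted(compound_vectors), size):
            coefs, adj_r2 = fit_mlr(sm_vector, [compound_vectors[c] for c in combo])
            results.append((adj_r2, combo, coefs))
    return sorted(results, key=lambda item: item[0], reverse=True)


# Toy example: the target vector is exactly 2*a + 3*b (not data from the study).
toy_compounds = {
    "a": [1, 0, 2, 1, 3, 0],
    "b": [0, 1, 1, 2, 0, 3],
    "c": [1, 1, 0, 0, 1, 2],
}
toy_target = [2, 3, 7, 8, 6, 9]
toy_ranking = rank_combinations(toy_target, toy_compounds, max_size=2)
```

In the study the response vector would hold the 21 signature-gene values for SM and each predictor the corresponding 21 values for one compound. Raising `max_size` to four matches the stated multiset limit, at the cost of fitting many more models.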

Fig. 3: Prediction results of SM compound combinations based on multiple linear regression.

a Heatmap displaying adjusted R-squared values for compound combinations predicted using MLR. Because only two-compound combinations appeared in the top ranks of the MLR results, the heatmap shows only the two-compound combinations among the significant results. Color-mapped combinations used only data with a p-value < 0.1 and coefficient > 0 in the multiset. The data used for color mapping were limited to those with a coefficient of determination (R²) ≥ 0.6. The red boxes represent single compounds or combinations of two compounds with R² values greater than 0.8. b The top three compound combinations, with adjusted R-squared scores > 0.8, that best reproduced the expression patterns of the 21 SM-derived Ras genes. SM Samul-tang, MLR multiple linear regression, Ras Rat sarcoma virus.

In vivo validation of compound combination effects on oocyte quality in aged mice

We examined the efficacy of the predicted combinations of compounds in aged mice. During the four-week administration of the compound combinations, we observed no significant changes in body weight (Fig. 4a). The ovary weight of OC mice was significantly lower than that of YC mice. The combination of ferulic acid and butylphthalide increased ovary weight compared to the OC+vehicle group, whereas other compound combinations did not affect ovarian weight recovery in OC mice (Fig. 4b). Next, we examined the quantity and quality of ovulated oocytes. To assess chromosomal and spindle alignment in oocytes, we performed immunofluorescence staining for microtubules and DNA. Oocytes with barrel-shaped bipolar spindles and well-ordered chromosomes during metaphase were considered normal25,26 (Supplementary Fig. 1a). Overall, the number of oocytes retrieved from old mice was significantly lower than that retrieved from YC mice (Fig. 4c, Supplementary Fig. 1b). The number of mature (MII; metaphase II stage) oocytes in OC mice treated with a combination of butylphthalide and oleanolic acid was significantly higher than that in the OC + vehicle group (Fig. 4d). It is well established that spindle abnormalities in oocytes tend to increase with age27. In this treatment group, the number of MII oocytes with normal spindle alignment was significantly higher than that in the OC + vehicle group (Fig. 4e). In contrast, groups treated with combinations of ferulic acid and butylphthalide, as well as benzoyloxypeoniflorin and isoacteoside, showed no statistically significant differences compared to the OC + vehicle group (Fig. 4d, e). These results suggest that the combination of butylphthalide and oleanolic acid could act as an effector of SM to improve both oocyte quantity and quality in aged mice.

Fig. 4: Assessment of the quantity and quality of ovulated oocytes after oral administration of compound combinations in mice.

a Body weight changes and (b) weights of the left and right ovaries were assessed following the administration of either 0.5% DMSO (vehicle) or 5 mg/kg of each compound combination for four weeks. Nine mice were used in each group. The number of retrieved (c) total oocytes, (d) metaphase II (MII) oocytes, and (e) MII oocytes with normal spindle alignment were counted to assess oocyte quantity and quality. Data are presented as the mean ± standard deviation. Statistical analyses were performed using the one-tailed unpaired t-test (b, c) or the one-tailed Mann–Whitney U test (d, e). p < 0.05. YC young control (12-week-old mice; sampling at this age), OC old control (42-week-old mice; sampling at this age), Veh vehicle, Fer ferulic acid, But butylphthalide, Ole oleanolic acid, Ben Benzoyloxypeoniflorin, Iso Isoacteoside.

Identification of active compounds in SM that improve oocyte quality in aged mice

To identify the active compounds from the compound combination results, we evaluated the effectiveness of each compound by administering butylphthalide and oleanolic acid separately to aged mice. The rates of change in body and ovary weights were not affected by the administration of a single compound for four weeks (Fig. 5a, b). The total number of oocytes collected did not recover significantly after treatment with a single compound (Fig. 5c, Supplementary Fig. 1c). The number of MII oocytes and MII oocytes with normal spindles was significantly higher in OC mice treated with butylphthalide than in those treated with vehicle, but not in OC mice treated with oleanolic acid (Fig. 5d, e). These findings suggest that butylphthalide, but not oleanolic acid, enhances oocyte quality.

Fig. 5: Assessment of the quantity and quality of ovulated oocytes after oral administration of single compounds in mice.

a Body weight changes and b ovary weight were assessed in five mice in each group following the administration of either 0.5% DMSO (vehicle) or 5 mg/kg of the compound for four weeks. The number of retrieved (c) total oocytes, (d) MII oocytes, and (e) MII oocytes with normal spindle alignment were counted to assess oocyte quantity and quality. Data are presented as the mean ± standard deviation. Statistical analyses were performed using a one-tailed unpaired t-test. p < 0.05. YC young control (12-week-old mice; sampling at this age), OC old control (42-week-old mice; sampling at this age), Veh vehicle, But butylphthalide, Ole oleanolic acid.

To further investigate the effect of butylphthalide on gene expression, we analyzed the expression levels of 21 Ras signature genes in the ovaries of YC, OC, and butylphthalide-treated OC mice. Notably, 11 genes, namely Akt1, Rasgrf2, Rassf5, Fgfr2, Gng8, Insr, Akt3, Shc3, Htr7, Sos2, and Rasgrp3, exhibited a directional shift in expression from OC to YC after butylphthalide treatment (Supplementary Fig. 2). This suggests that butylphthalide partially restores the gene expression signatures associated with a more youthful ovarian state. Although the changes in individual gene expression were not statistically significant, the overall trend demonstrated a collective shift toward the YC expression profile. This result aligns with the logic of our vector-based prediction model, which identifies compounds that collectively modulate the Ras signature rather than targeting individual genes with strong effects.

Butylphthalide administration ameliorated primordial follicle loss in aged mice

With aging, the depletion of primordial follicles established early in life reduces the dynamic follicular reserve and increases oocyte abnormalities28. To evaluate the effects of butylphthalide on age-related ovarian reserves, we conducted a histological analysis of mouse ovaries following butylphthalide administration. Although the ovarian size and number of follicles in OC mice were reduced compared to those in YC mice, no significant difference in ovarian architecture was observed between the OC and OC + butylphthalide-treated groups (Fig. 6a). However, based on follicle count data, the number of primordial follicles in OC + butylphthalide mice was significantly higher than that in OC + vehicle mice (Fig. 6b).

Fig. 6: Assessment of ovarian follicle counts after butylphthalide administration in mice.

After administering either 0.5% DMSO (vehicle) or 5 mg/kg butylphthalide for four weeks, right ovaries obtained from each group were subjected to paraffin embedding and serially sectioned at 5 μm. The inner follicles in every 10th H&E section were counted by stage. a Representative histological images of ovaries from each group. Prl, primordial follicles; Pr, primary follicles; Sec, secondary follicles; A, antral follicles. b Quantitative analysis of ovarian follicles from YC (n = 4), OC (n = 7), and OC + butylphthalide (n = 7) mice. Data are presented as the mean ± standard deviation. Statistical analysis was performed using a one-tailed unpaired t-test. p < 0.05. YC young control (12-week-old mice; sampling at this age), OC old control (42-week-old mice; sampling at this age), Veh vehicle, But butylphthalide.

Discussion

In this study, we present an active compound mining strategy to explore effective compound combinations based on transcriptomic changes within the core MOA of SM in an animal model of age-related infertility. Our approach for identifying an active compound in herbal medicine differs from existing research in three ways. First, we identified the Ras signaling pathway using DEGs included in the core module of the PPI network rather than all DEGs between SM-treated and untreated OC (Fig. 1b). In age-related animal models, it is difficult to distinguish SM-induced differences from overall DEG analysis because of individual variations and mild changes in gene expression. Second, among the DEGs related to the Ras pathway, 21 genes were selected as core genes, considering the direction of gene expression in YC compared to OC (Fig. 1c). Compound candidates derived from DEGs, without considering the direction of gene expression, were ineffective in animal experiments (data not shown). Third, we used the MLR method to predict the compound combinations that mimicked the SM-induced expression patterns of 21 genes (Fig. 3). We hypothesized that compound combinations that mimic these gene expression patterns would reproduce the pharmacological effects of SM on OC. By focusing on these specific gene expression patterns, our combined approach to identify active compounds in herbal medicines may offer the advantage of efficiently reducing cost and time, particularly when dealing with a large number of candidate compounds.

Ras signaling plays a crucial role in regulating ovulation and luteinization, and mutations in Ras can lead to premature ovarian failure, ultimately resulting in fertility issues23. Concerning ovarian aging, a recent study utilizing single-cell RNA sequencing showed that Kirsten rat sarcoma viral oncogene homologue (KRAS) signaling was significantly correlated with aging in ovarian granulosa cells29. As shown in Fig. 1c, we focused on 21 Ras-related genes that were affected by SM. Insulin-like growth factor 1 (Igf1), fibroblast growth factor receptor 2 (Fgfr2), and serine/threonine kinase (Akt) levels were restored to YC levels after SM administration. Igf1 is known to promote oocyte maturation30 and reduce oxidative damage31. In human follicular fluid, the concentration of IGF1 has been implicated in ovarian reserve32. Fgfr2 exhibits dynamic expression patterns in both granulosa and cumulus cells and mediates communication between these cells33. Moreover, mutant-activated Fgfr2 is linked to p53-dependent senescence34. Akt regulates the primordial follicle pool, and hyperactivation causes its rapid depletion, which in turn promotes senescence35. SM downregulates colony-stimulating factor-1 (Csf-1), which is released from granulosa cells and plays a role in recruiting macrophages. Among the recruited macrophages, the M1 type contributes to the synthesis of steroid hormones in follicular cells, whereas the M2 type participates in endometrial receptivity by releasing granulocyte colony-stimulating factor (G-CSF)36. These macrophage pools diminish during aging37. Thus, Ras signaling may play a role in age-related ovarian dysfunction and may be of potential importance for understanding the mechanisms underlying this phenomenon.

MLR was used to predict the compound combination that best reproduced the effects of SM. Few attempts have been made to predict compound combinations in herbal medicines using MLR. MOA patterns expressed by various compounds in herbal medicines provide a wealth of information; however, elucidating the MOA of herbal medicines remains challenging38. We attempted to address the complex MOA problem of herbal medicines from a data science perspective, with experimental verification supporting the proposed method (Fig. 4). Initially, we expected several compounds to be required to replicate the MOA of complex herbal medicines. Therefore, when designing this study, we set the maximum number of compounds to be included in the multiset to four. Surprisingly, the prediction results were optimal when combining two compounds rather than four (Fig. 3, Supplementary Table 2). Although transcriptome expression patterns induced by herbal medicines appear complex, the effective dimensionality of the space spanned by these expression patterns may not be large because of gene interactions and shared dimensions. Future research could explore herbal medicine MOA using models that perform well in relatively low dimensions, such as non-negative matrix factorization or restricted Boltzmann machines39,40. Additionally, although three compound combinations were derived using the MLR method, only the combination of butylphthalide and oleanolic acid improved oocyte quantity and quality in the aging model (Fig. 4). Therefore, we plan to conduct additional analyses to explain why the other two combinations were ineffective.

Although the representative compounds of the four major herbs of SM are known, in this study, we detected 30 constituent compounds of SM and established a framework to identify the active compounds in an in vivo system using the transcriptomic data of these compounds. Through in vivo validation of the candidate compounds in combination, we identified butylphthalide as an active compound in SM associated with improved oocyte quality and number, as well as primordial follicle number (Figs. 5e and 6b). Butylphthalide, a major constituent of Cnidii, one of SM’s four herbs, was discovered in rat plasma after oral administration of Shimotsuto (a Japanese term for SM)41. Although it has not yet been linked to female infertility, it is clinically used to promote the expression of angiogenic growth factors and angiogenesis in ischemic stroke42. Several angiogenic factors, such as vascular endothelial growth factor (VEGF), VEGF receptor 2, and endothelial cadherin, play critical roles in promoting angiogenesis within ovarian follicles, and are associated with follicular development and successful ovulation43,44. Insufficient intrafollicular vasculature can adversely affect oocyte metabolism and increase chromosomal abnormalities in older women44. The ability of butylphthalide to enhance ovarian angiogenesis may offer new therapeutic avenues for treating female infertility associated with aging45. This study had some limitations, including the use of a single dose of compounds for administration without adjusting the appropriate dosage based on metabolite measurements in the serum or plasma using mass spectrometry. In addition, appropriate composition ratios for compound combinations are lacking. Given the scarcity of studies utilizing this research method, our study successfully presents a novel and challenging approach for deriving the optimal combination of compounds to identify the active compounds in herbal medicines. 
Further research is underway to elucidate the regulation of the 21 identified genes and to identify the key mechanisms of action of butylphthalide.

Materials and methods

Identification of the constituent chemical compounds of SM

SM powder was obtained from the National Institute for Korean Medicine Development (NIKOM). To determine the constituents of SM, ultra-high-performance liquid chromatography-tandem mass spectrometry (UHPLC-MS/MS) was performed using a Dionex UltiMate 3000 system equipped with a Thermo Q-Exactive mass spectrometer (Thermo Fisher Scientific, Waltham, MA, USA). The solution extracted from the SM powder dissolved in methanol at a concentration of 100 mg/mL was subjected to a subsequent process in accordance with a previous report24. Data acquisition and analysis were performed using Xcalibur 4.2 and Tracefinder 4.1.

Animal model and compound administration for ovarian RNA-seq analysis

Female BALB/c mice were purchased from Central Lab. Animal Inc. (Seoul, Korea). The mice were fed ad libitum and housed in a pathogen-free room under a 12-hour light-dark cycle. All experiments were approved by the Korea Institute of Oriental Medicine Institutional Animal Care and Use Committee (Approval No.: 20-090 and 21-101). Thirty-eight-week-old mice were randomly assigned to groups of three per cage and orally administered either vehicle [distilled water containing 0.5% dimethyl sulfoxide (DMSO; D2650; Sigma-Aldrich, St. Louis, MO, USA)] or one of the 30 single compounds at a dose of 5 mg/kg five times a week for four weeks. Hereafter, the vehicle-treated group is referred to as the old control (OC), and the compound-treated groups are referred to as OC + [name of the compound]. Information on the compounds administered to the mice is listed in Supplementary Table 3. After compound administration for 4 weeks, the mice were induced into a post-ovulatory state, as described in a previous study14, by intraperitoneal injection of 5 IU of pregnant mare serum gonadotropin (PMSG; HOR-272; Prospec, Rehovot, Israel), followed by an injection of 5 IU of human chorionic gonadotropin (hCG; HOR-250; Prospec) 48 h later. The ovaries were collected 14–16 h post-hCG administration, weighed, and immediately preserved in liquid nitrogen for RNA sequencing. The compound dosage was selected to minimize potential chronic toxicity, based on a study reporting that a lowest observed adverse effect level of 5 mg/kg/day provides the highest predictive accuracy for chronic toxicity46. To evaluate oocyte quality, the compound was administered for 4 weeks, which is the period required for primary follicles to mature into antral follicles47.

RNA extraction and sequencing

Total RNA was extracted using TRIzol reagent (Invitrogen, Carlsbad, CA, USA). The purity and integrity of the extracted RNA samples were analyzed using a NanoDrop ND-2000 spectrophotometer (Thermo Fisher Scientific, Waltham, MA, USA) and an Agilent 2100 Bioanalyzer (Agilent Technologies, Amstelveen, The Netherlands). All samples exhibited high purity (optical density (OD)260/OD280 > 1.80) and integrity (RNA integrity number; RIN > 7.0). Total RNA (1 µg) was sequenced using the NextSeq 500 System (Illumina, San Diego, CA, USA) by Ebiogen Inc. (Seoul, Korea). Raw data were deposited in the NCBI Gene Expression Omnibus (accession numbers GSE264002 and GSE264413)48,49. The detailed description of ovarian RNA-seq datasets of SM compounds in OC mice was published as a Data Descriptor, primarily focusing on providing a comprehensive and accessible dataset50.

RNA-seq data preprocessing and alignment

The quality of the raw RNA-seq FASTQ files was initially evaluated using FastQC (v1.9). To eliminate adapter sequences and correct sequencing artifacts, the BBDuk module in the BBMap toolkit (v38.95, accessed June 20, 2020) was run with the following parameters: k = 13, ktrim = r, useshortkmers = t, mink = 5, qtrim = t, trimq = 10, and minlength = 20. These processed reads were then aligned to the mouse reference genome (build GRCm39) using the STAR aligner (v2.7.9a) to establish their genomic positions. The aligned reads within the genome were quantified per gene using RSEM (v1.3.3) along with a gene structure file (GRCm39.104.gtf). This quantification yielded read counts and transcripts per million (TPM) values.
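RSEM reports TPM directly, so no separate calculation is needed in the pipeline above. The short sketch below only illustrates how TPM relates to read counts and transcript lengths, using the standard definition (length-normalized rates rescaled to sum to one million). The function name and the toy numbers are assumptions for illustration.

```python
def counts_to_tpm(counts, effective_lengths):
    """Convert read counts to TPM.

    counts: gene -> read count; effective_lengths: gene -> effective length in bases.
    Each count is divided by its length, then rates are rescaled to sum to 1e6.
    """
    rates = {gene: counts[gene] / effective_lengths[gene] for gene in counts}
    total = sum(rates.values())
    if total == 0:
        return {gene: 0.0 for gene in counts}
    return {gene: rate / total * 1e6 for gene, rate in rates.items()}


# Toy example (placeholder genes): equal lengths, so TPM follows the count ratio.
toy_tpm = counts_to_tpm({"geneA": 100, "geneB": 300}, {"geneA": 1000, "geneB": 1000})
```

Because TPM values sum to the same total in every sample, they are convenient for comparing expression profiles across samples, whereas the raw counts are what count-based differential expression tools expect as input.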

DEG analysis

To investigate the genome-wide perturbation effects of individual compound treatments on ovarian tissues in aged mice, we conducted statistical testing for differences in gene expression levels between groups (compound-treated versus vehicle-treated). This analytical procedure was executed in R (v4.2.2) using the voom, lmFit, and eBayes functions of the limma R package. In brief, we transformed the read count data, performed linear modeling considering experimental factors, including those associated with sequencing or experimental batches, and calculated moderated t-statistics to assess the significance of the differences in gene expression between the two groups. This computation furnished three essential metrics per gene: log2 fold-change, t-statistic, and corresponding p-value. Collectively, these metrics provide a comprehensive snapshot of gene-level responses to compound treatments.

Network-based DEG analysis for functional enrichment

RNA-seq data from a previous study14 were processed using the methods described above. Considering the complex interplay between genes with key regulatory roles in physiological systems, we employed a network-based approach using a well-established protein-protein interaction network. First, we selected DEGs between SM-treated and untreated 38-week-old mice (OC), as well as between untreated 8-week-old mice (YC) and OC, using a moderated t-test with a significance threshold of p < 0.001 (Fig. 1a). These DEGs were mapped onto a protein-protein physical interaction (PPI) network sourced from the STRING database (https://string-db.org/), focusing on interactions with a combined score > 700. Given that the DEGs were dispersed throughout the PPI network, we applied a random walk with restart (RWR) propagation algorithm to identify core modules, including common gene neighbors closely linked to DEGs. The largest and most interconnected modules were analyzed for functional enrichment using a hypergeometric test in the Gene Ontology (GO) database. The computational analyses for the RWR algorithm and functional enrichment were executed in R (v4.2.2) with the functions in “RandomWalkRestartMH” and “clusterProfiler” R packages, respectively, using their default settings.
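
The propagation step can be sketched as follows. This is a generic random walk with restart on a small undirected graph, not the RandomWalkRestartMH implementation used in the study; the function name `random_walk_with_restart`, the restart probability of 0.7, and the four-node toy network are assumptions made for illustration only.

```python
def random_walk_with_restart(adj, seeds, restart=0.7, tol=1e-10, max_iter=1000):
    """Return steady-state visiting probabilities for a walker restarting at the seeds."""
    n = len(adj)
    # Column-normalize the adjacency matrix so each column sums to 1.
    col_sums = [sum(adj[i][j] for i in range(n)) for j in range(n)]
    w = [[adj[i][j] / col_sums[j] if col_sums[j] else 0.0 for j in range(n)]
         for i in range(n)]
    # Initial distribution: uniform over the seed nodes (e.g., DEGs).
    p0 = [1.0 / len(seeds) if i in seeds else 0.0 for i in range(n)]
    p = p0[:]
    for _ in range(max_iter):
        # One propagation step: walk along edges, or jump back to the seeds.
        nxt = [(1.0 - restart) * sum(w[i][j] * p[j] for j in range(n))
               + restart * p0[i] for i in range(n)]
        change = sum(abs(a - b) for a, b in zip(nxt, p))
        p = nxt
        if change < tol:
            break
    return p


# Hypothetical toy network: a path 0 - 1 - 2 - 3, with node 0 as the only seed.
adjacency = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
scores = random_walk_with_restart(adjacency, seeds={0})
```

In this toy case the scores decrease with distance from the seed, which is the property used to rank non-DEG neighbors that lie close to many DEGs in the PPI network.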

Deriving SM compound combinations based on the drug-derived transcriptome

A vector sum model was used to predict the combination of compounds that best reproduced SM-derived transcript vectors51. To construct the vector, 21 genes whose transcript expression patterns in the YC and SM appeared in the same direction in the RAS signaling pathway of the KEGG pathway were used14. The SM feature vector uses data from previous studies. Additionally, transcriptome data extracted from mice administered each of the 30 compounds were used to generate compound feature vector information. For both datasets, vectors were generated using log FC values calculated based on OC from previous studies. Each vector consisted of 21 dimensions containing expression information for the 21 genes.
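
The construction of a feature vector can be sketched as below, under the assumption that "same direction" means that a gene's log fold-change relative to OC has the same sign in the YC and SM contrasts. The gene names and values are hypothetical placeholders, not data from the study, and the helper names `same_direction_genes` and `feature_vector` are introduced only for this illustration.

```python
def same_direction_genes(logfc_yc, logfc_sm):
    """Genes whose log fold-changes (versus OC) share the same sign in both contrasts."""
    return sorted(g for g in logfc_yc
                  if g in logfc_sm and logfc_yc[g] * logfc_sm[g] > 0)


def feature_vector(logfc, genes):
    """Order the log fold-changes of one treatment along a fixed gene list."""
    return [logfc[g] for g in genes]


# Hypothetical log fold-changes relative to OC (placeholder gene names).
logfc_yc = {"geneA": 1.2, "geneB": -0.8, "geneC": 0.5, "geneD": -0.3}
logfc_sm = {"geneA": 0.6, "geneB": -0.2, "geneC": -0.4, "geneD": -0.1}

genes = same_direction_genes(logfc_yc, logfc_sm)
sm_vector = feature_vector(logfc_sm, genes)
```

Applying `feature_vector` with the same gene list to each compound's log fold-changes gives vectors of equal dimension that can be compared with the SM vector.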

Multiple linear regression (MLR) was used to determine whether the vectors of individual compounds could be combined to reproduce the drug-induced SM feature vector52.

$$y={\beta }_{0}+{\beta }_{1}{X}_{1}+{\beta }_{2}{X}_{2}+\,\cdots \,+{\beta }_{p}{X}_{p}+\varepsilon$$
$$\text{if }{\beta }_{0}=0\text{ and }\varepsilon =0,\quad y={\beta }_{1}{X}_{1}+{\beta }_{2}{X}_{2}+\cdots +{\beta }_{p}{X}_{p}$$

If we assume that the intercept (\({\beta }_{0}\)) and the residual (\(\varepsilon\)) are 0 in the MLR formula, MLR plays the same role as vector summation. Thus, the MLR for the vector sum can be expressed as follows:

$$y=\left[\begin{array}{cccc}| & | & | & |\\ {X}_{1} & {X}_{2} & {X}_{3} & {X}_{4}\\ | & | & | & |\end{array}\right]\left[\begin{array}{c}{\beta }_{1}\\ {\beta }_{2}\\ {\beta }_{3}\\ {\beta }_{4}\end{array}\right]={\beta }_{1}{X}_{1}+{\beta }_{2}{X}_{2}+{\beta }_{3}{X}_{3}+{\beta }_{4}{X}_{4}$$
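
The equivalence between a regression without an intercept and a weighted vector sum can be illustrated with a small least-squares sketch. The study used statsmodels for this step; the pure-Python solver below (normal equations with Gaussian elimination) and the two toy vectors are assumptions chosen only to keep the example self-contained.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_vector_sum(vectors, y):
    """Least-squares coefficients for y ~ sum_k beta_k * vectors[k], with no intercept."""
    p = len(vectors)
    gram = [[sum(u * v for u, v in zip(vectors[i], vectors[j])) for j in range(p)]
            for i in range(p)]
    rhs = [sum(u * v for u, v in zip(vectors[i], y)) for i in range(p)]
    return solve(gram, rhs)


def weighted_sum(vectors, betas):
    """Compute beta_1 * X_1 + ... + beta_p * X_p element by element."""
    return [sum(b * v[i] for b, v in zip(betas, vectors))
            for i in range(len(vectors[0]))]


# Hypothetical toy vectors; y is built as exactly 2 * x1 + 0.5 * x2.
x1 = [1.0, 0.0, 2.0, 1.0]
x2 = [0.0, 1.0, 1.0, 3.0]
y = [2.0, 0.5, 4.5, 3.5]

betas = fit_vector_sum([x1, x2], y)
y_hat = weighted_sum([x1, x2], betas)
```

Here the fitted coefficients recover the weights used to build `y`, and `y_hat` is simply the weighted sum of the two vectors, which is the sense in which MLR without an intercept acts as a vector sum.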

In this study, we searched for the combination of ≤ 4 compounds that best mimicked the SM feature vector. Using the multiset coefficient, sampling four compound vectors with replacement from the 30 compound vectors yielded 40,920 multisets53.

$$\begin{array}{c}multiset\,coefficient=\left(\left(\begin{array}{c}n\\ k\end{array}\right)\right)=\left(\begin{array}{c}n+k-1\\ k\end{array}\right)=\frac{(n+k-1)!}{k!\,(n-1)!}=\,\frac{n(n+1)(n+2)\cdots (n+k-1)}{k!}\end{array}$$
$$n=30,k=4$$
$$\therefore \left(\left(\begin{array}{c}30\\ 4\end{array}\right)\right)=\left(\begin{array}{c}30+4-1\\ 4\end{array}\right)=\frac{33\times 32\times 31\times 30}{4\times 3\times 2\times 1}=40,920$$
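
The count above can be checked by direct enumeration with the Python standard library. In this sketch the compounds are represented by the hypothetical indices 0 to 29. Treating a multiset with repeated indices as a combination of fewer than four distinct compounds is an interpretation assumed here, not a procedure stated in the text.

```python
from itertools import combinations_with_replacement
from math import comb

n_compounds, k = 30, 4

# Closed-form multiset coefficient: C(n + k - 1, k).
count_formula = comb(n_compounds + k - 1, k)

# Explicit enumeration of all size-4 multisets drawn with replacement.
multisets = list(combinations_with_replacement(range(n_compounds), k))

# Assumed interpretation: repeated draws collapse to smaller combinations,
# so each multiset maps to a set of one to four distinct compounds.
distinct_combinations = {frozenset(m) for m in multisets}
```

Both `count_formula` and `len(multisets)` give 40,920, in agreement with the calculation above.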

MLR was performed on each of the 40,920 multisets of compound vectors, with y set to the SM feature vector and the intercept set to 0. Only multisets in which all compounds had p-values < 0.1 and coefficients > 0 were retained. The adjusted R-squared value of each regression was used to rank the retained multisets (Supplementary Table 2). The adjusted R-squared value, an adjusted form of the coefficient of determination, is a widely used metric for evaluating the performance of an MLR model54. In other words, the combination of compounds with the highest adjusted coefficient of determination best approximated the SM-derived transcriptome vector. We therefore selected the top three multisets containing at least two compounds as effective combinations of SM compounds. The MLR results were visualized using the adjusted R-squared value, which is a well-established predictor in scientific research, with a threshold of 0.655. Only single compounds or combinations of two compounds with an adjusted R-squared value of 0.6 or higher are displayed in the heatmap, and notable combinations are highlighted with boxes (Fig. 3a). Linear regression analysis and visualization of the results were performed using Python (version 3.8.12), Seaborn (version 0.13.2)56, and statsmodels (v0.13.1).
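
The filtering and ranking step can be sketched as follows. Two points are assumptions of this sketch rather than statements from the text: the adjusted R-squared is computed from the uncentered total sum of squares, a common convention for regressions without an intercept, and the fit records (`name`, `coefs`, `pvalues`, `adj_r2`) are hypothetical placeholders rather than results from the study.

```python
def adjusted_r2_no_intercept(y, y_hat, n_predictors):
    """Adjusted R-squared for a no-intercept model (uncentered total sum of squares)."""
    n = len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum(a ** 2 for a in y)  # uncentered: the mean of y is not subtracted
    r2 = 1.0 - ss_res / ss_tot
    # With no intercept, the residual degrees of freedom are n - n_predictors.
    return 1.0 - (1.0 - r2) * n / (n - n_predictors)


def select_and_rank(fits, p_cutoff=0.1):
    """Keep fits with all p-values < cutoff and all coefficients > 0, best first."""
    kept = [f for f in fits
            if all(p < p_cutoff for p in f["pvalues"])
            and all(c > 0 for c in f["coefs"])]
    return sorted(kept, key=lambda f: f["adj_r2"], reverse=True)


# Hypothetical regression summaries for four candidate combinations.
fits = [
    {"name": "combo_1", "coefs": [0.4, 0.3], "pvalues": [0.01, 0.05], "adj_r2": 0.71},
    {"name": "combo_2", "coefs": [0.5, -0.2], "pvalues": [0.02, 0.03], "adj_r2": 0.80},
    {"name": "combo_3", "coefs": [0.6, 0.1], "pvalues": [0.04, 0.20], "adj_r2": 0.75},
    {"name": "combo_4", "coefs": [0.2, 0.7], "pvalues": [0.03, 0.08], "adj_r2": 0.78},
]
ranked = [f["name"] for f in select_and_rank(fits)]
example_adj_r2 = adjusted_r2_no_intercept([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 3.0], 2)
```

In this toy input, `combo_2` is dropped for its negative coefficient and `combo_3` for a p-value above the cutoff, and the remaining combinations are ordered by adjusted R-squared.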

Assessment of oocyte quality and quantity after administration of compound combinations or single compounds

Eight- and 38-week-old female BALB/c mice were administered 0.5% DMSO, a compound combination (each compound at a dose of 5 mg/kg), or a single compound (5 mg/kg) five times a week for four weeks. To synchronize the estrous cycle, OC mice were housed in cages, each consisting of one mouse from each group57. To evaluate drug efficacy in terms of both the quantity and quality of ovulated oocytes, ovulation was induced by sequential intraperitoneal injections of PMSG (5 IU) and hCG (5 IU), as described above. The ovulated cumulus-oocyte complexes were retrieved from the oviducts 14–16 h post-hCG administration and treated with hyaluronidase (ART-4007-A; Coopersurgical, Trumbull, CT, USA) to remove the cumulus cells. Oocytes were visually inspected and counted under a stereomicroscope. To examine and quantify metaphase II (MII)-stage oocytes eligible for fertilization, oocytes were subjected to immunofluorescence anti-tubulin staining, as described previously14. Images were obtained using a fluorescence phase-contrast inverted microscope (Eclipse Ti-U; Nikon).

Histological assessment of ovarian follicle counts after administration of butylphthalide

After the administration of a single compound, ovaries from the YC, OC, and OC + butylphthalide groups were fixed in 4% paraformaldehyde in PBS, embedded in paraffin, and serially sectioned at 5 µm. Every 10th section was stained with Hematoxylin and Eosin (H&E) for follicle quantification. Follicles were counted according to the method described by Myers et al. 58: primordial follicles with a single flat layer of granulosa cells surrounding the oocyte, primary follicles with a single cuboidal granulosa cell layer, secondary follicles with at least two granulosa cell layers and a theca layer, and pre-ovulatory follicles with a complete antrum and theca layer. Images were scanned using a Panoramic Desk digital slide scanner (3D HISTECH, Hungary) and captured using CaseViewer 2.2 software (3D HISTECH).

Statistical analysis

GraphPad Prism 9 (GraphPad Software, La Jolla, CA, USA) was used for statistical analysis. All data are presented as mean ± standard deviation (SD). The Shapiro–Wilk test was used to test the normality of the data, and statistical significance was assessed using a one-tailed unpaired Student’s t-test or one-tailed Mann–Whitney U test. Statistical significance was set at p < 0.05.