Introduction

The reproductive system is essential for species survival. Damage to this system, including pathological lesions in reproductive organs or disruptions of the hypothalamic-pituitary-gonadal axis, often results in developmental impairment in offspring. Thus, chemicals suspected of inducing developmental and reproductive toxicity (DART) should be effectively assessed and prioritized for regulatory consideration.

However, studies conducted under the OECD test guidelines (TGs) for DART are time-intensive, expensive, and require expert evaluation because of the numerous and complex endpoints involved. The OECD TGs for DART include TG 414 (prenatal developmental toxicity study), TG 416 (two-generation study), TG 421 (DART screening test), TG 422 (combined repeated-dose toxicity study with the DART screening test), TG 443 (extended one-generation reproductive toxicity study), and TG 426 (developmental neurotoxicity study)1. It is therefore impractical to conduct DART studies on the numerous chemicals that require assessment.

To address this challenge, new approach methodologies (NAMs) have been described, including in vitro, in silico, and omics-based strategies, which aim to provide mechanistic insights into chemical toxicity2. These approaches are efficient, ethical, and mechanistically informative alternatives to conventional animal models. Among NAMs, quantitative structure-activity relationships (QSARs) are commonly applied3. The QSAR Toolbox, for example, has been increasingly utilized to predict toxicological properties based on chemical structures by assessing structure–activity relationships. However, because DART encompasses numerous and complex endpoints, QSAR predictions remain limited. To overcome these challenges, the adverse outcome pathway (AOP) framework has been proposed as a structured approach to integrate mechanistic information on toxicity4.

The Tox21 program, launched as a collaborative initiative by the United States National Institutes of Health, Environmental Protection Agency, and Food and Drug Administration, aims to implement high-throughput screening approaches using in vitro cell-based assays for over 10,000 chemicals. It focuses on mechanistic endpoints, including nuclear receptor activation, oxidative stress, and DNA damage, enabling rapid and cost-effective evaluation of large-scale chemicals5. Tox21 assay datasets have been used to construct databases for prediction models6,7.

In this study, we used the Tox21 assay datasets to investigate which molecular initiating events (MIEs) are associated with DART outcomes, with the aim of predicting DART chemicals.

Results

MIEs associated with OECD TG 414 data

Significant associations between the OECD TG 414 data (1,242 chemicals) and the MIE models are summarized in Table 1. A total of 1,242 SMILES were entered into the Toxicity Predictor and analyzed for prediction values in each model. Among 108 models, 41 were significant. Duplicates corresponding to the same entity across the normalized_1 and _40 models were treated as one. Thus, 34 significant MIEs were negatively associated with the OECD TG 414 data. Six stress- and damage-related response factors, namely activator protein-1 (AP1), antioxidant response element (ARE), histone variant H2AX (H2AX), hypoxia-inducible factor 1 (HIF1), nuclear factor kappa B (NFκB), and p53 agonists, showed negative associations with positive chemicals in the OECD TG 414 data (developmental toxicity). Additionally, two apoptosis-related caspase-3/7 inducer models (CaspH and CaspC) and heat shock response (HSR) activators showed negative associations with positive chemicals in the OECD TG 414 data.

Table 1 Association of Tox21 assay results with OECD TG 414 data.

Among endocrine system-related factors, positive developmental toxicity chemicals were negatively associated with chemicals binding to the thyrotropin-releasing hormone receptor (TRHR) and thyroid-stimulating hormone receptor (TSHR), and with aromatase (Arom) antagonists. Multiple nuclear receptors and metabolism-related factors were negatively associated with the OECD TG 414 data, including constitutive androstane receptor (CAR) antagonists, farnesoid X receptor (FXR) agonists, peroxisome proliferator-activated receptor delta (PPARδ) agonists and antagonists, PPAR gamma (PPARγ) agonists and antagonists, retinoid X receptor alpha (RXR) agonists, vitamin D receptor (VDR) agonists and antagonists, retinoic acid receptor (RAR) antagonists, retinoid-related orphan receptor gamma (ROR) antagonists, estrogen receptor alpha ligand-binding domain (ERlbd) antagonists, estrogen receptor beta (ERβ) agonists, estrogen-related receptor with PGC (ERRPGC) antagonists, endoplasmic reticulum stress response (ERsr) agonists, and glucocorticoid receptor (GR) agonists. Developmental signaling pathways, including Sonic hedgehog (Shh) antagonists and transforming growth factor beta (TGFβ) agonists, were also negatively associated with the OECD TG 414 data.

MIEs associated with OECD TG 416 data

Significant associations between the OECD TG 416 data (265 chemicals) and the MIE models are listed in Table 2. Ten models (9 MIEs) were positively associated with OECD TG 416 data. Significantly positive associations were observed for ARE, ERR, and PPARγ agonists, as well as CAR, ER full, FXR, histone deacetylase (HDAC), progesterone receptor (PR), and Shh antagonists. ARE agonists, CAR antagonists, and PPARγ agonists showed differing statistical association patterns between the OECD TG 414 (negative association) and TG 416 data (positive association), although the underlying biological basis for this difference cannot be determined from these analyses.

Table 2 Association of Tox21 assay results with OECD TG 416 data.

MIEs associated with OECD TG 421 data

Significant associations between OECD TG 421 outcomes (723 chemicals) and the MIE models are shown in Table 3. A total of 29 models (27 MIEs) showed negative associations with the OECD TG 421 data, whereas one model (PR agonists) showed a positive association. Odds ratios (ORs) of these models are shown in Fig. 1A. PR agonists showed an OR > 1, suggesting that these chemicals were associated with increased odds of DART occurrence. Proteins showing negative associations overlapped with the OECD TG 414 results. However, androgen receptor ligand-binding domain (ARlbd) agonists and antagonists, ER alpha with stimulator (ERfulls) antagonists, and FXR antagonists showed significantly negative associations only in the OECD TG 421 data. Chemical interactions with sex hormone receptor-related proteins were statistically associated with reproductive toxicity in the OECD TG datasets.

Table 3 Association of Tox21 assay results with OECD TG 421 data.
Fig. 1

Volcano plots of OECD TGs. Odds ratios of toxicity predictors and OECD TG 421 (A), TG 422 (B), and TG 443 data (C).

MIEs associated with OECD TG 422 data

Significant associations between the OECD TG 422 data (1,456 chemicals) and the MIE models are shown in Table 4. Proteins significantly associated with the OECD TG 422 data showed patterns similar to those for the OECD TG 414 and TG 421 data. No proteins were uniquely associated with the OECD TG 422 data. PR agonists, with an OR > 1 and a −log10 P value > 1.3 (i.e., P < 0.05), were suggested to be associated with DART (Fig. 1B).

Table 4 Association of Tox21 assay results with OECD TG 422 data.

MIEs associated with OECD TG 443 data

Significant associations between the OECD TG 443 data (201 chemicals) and the MIE models are shown in Table 5. Aryl hydrocarbon receptor (AhR) agonists, HDAC antagonists, and ERβ antagonists showed significantly positive associations with OECD TG 443-positive chemicals. An OR > 1 was observed for AhR agonists and HDAC antagonists, suggesting that these chemicals are associated with reproductive toxicity (Fig. 1C).

Table 5 Association of Tox21 assay results with OECD TG 443 data.

Networks of key proteins involved in MIE associated with DART chemicals

The MIEs associated with the OECD TG data (Tables 1, 2, 3, 4 and 5) were visualized using a Venn diagram (Fig. 2) to examine the overlap of MIEs among the different OECD TGs. Twenty-one MIEs were shared between the OECD TG 414 and TG 421 data, of which five (TGFβ agonists, ERsr agonists, PPARδ antagonists, TSHR antagonists, and FXR antagonists) were also common to the OECD TG 422 data. In addition, four MIEs were shared between OECD TG 416 and OECD TG 421, while OECD TG 414 and OECD TG 416 shared five MIEs, including three (FXR agonists, PPARγ agonists, and Shh antagonists) that overlapped. OECD TG 416 and OECD TG 443 had one common MIE, HDAC antagonists. Based on these findings, the interactions among the common MIEs were further analyzed using STRING.

Fig. 2

MIE models associated with OECD TGs. OECD TG 414, yellow; OECD TG 416, blue; OECD TG 421, purple; OECD TG 422, light green; OECD TG 443, green. Agonists are shown in red, antagonists in blue, and others (inducer or activator) in black. In OECD TG 421, PR, as indicated in the boxed area, showed an association opposite to that of the other MIEs.

Protein networks of key MIE-associated proteins statistically associated with DART chemicals are shown in Fig. 3. Protein-protein interaction enrichment was significant (p-value = 1.66e-07). In biological processes, the negative regulation of growth, cell growth, cell population proliferation, heart development, reproductive structure development, and hormone-mediated signaling pathways were enriched (false discovery rate < 0.00008). The nuclear receptor transcription pathway and SUMOylation of intracellular receptors were significantly associated with DART (false discovery rate, < 0.00002) (Supplementary Data_S1).

Fig. 3

Predicted protein-protein interactions generated by the STRING algorithm. Multiple protein queries were used to identify key factors commonly associated with both OECD TGs and MIEs. The heatmap presents normalized STRING interaction scores, with higher values indicating stronger predicted association. Average local clustering coefficient, 0.637. Protein-protein interaction enrichment p-value, 1.66e-07.

Discussion

Risk assessment of DART is crucial for the preservation of species in the environment. However, experimental validation is feasible for only a small fraction of the vast number of chemicals. Thus, NAMs are increasingly being applied to the prediction of DART, although reliable prediction requires careful experimental and contextual validation. Reproductive toxicity is linked to disruption of biological activities of the reproductive system, including the ER, PR, and AR pathways8,9,10. Developmental toxicity, in turn, is associated with hormones that regulate homeostasis, cell damage, and growth during organogenesis11,12. In this study, the association between these pathways and DART results based on in vivo experimental data from OECD TG assays was analyzed.

Our results broadly align with previously proposed predictive mechanisms and highlight several candidate factors that merit further investigation. In silico validation of the prediction was performed using ROC analysis, with AUC values ranging from 0.7 to 0.98 (Supplementary Data_S2). Fig. 2 shows the proteins mapped to each OECD TG assay. The overlap indicates putative associations between selected proteins and assay endpoints but does not alone establish causal relationships. OECD TG 414 primarily reflects developmental toxicity, while OECD TG 416 and TG 443 converge on reproductive toxicity; OECD TG 421 and TG 422, in turn, provide supporting evidence for DART.

As shown in Fig. 1, several proteins showed statistically significant associations with assay-derived endpoints. Interestingly, these commonly related proteins were negatively correlated with developmental toxicity in OECD TG 414 and positively correlated with reproductive toxicity in OECD TG 416.

In the prediction of reproductive toxicity, HDAC antagonists were significantly associated with both the OECD TG 416 and TG 443 data. HDACs regulate gene expression and chromatin stability; their inhibition has been linked to cell death, teratogenesis, and female reproductive toxicity13,14,15. Our results suggest that HDAC antagonists may be associated with reproductive rather than developmental toxicity. PR is a key receptor in the maintenance of pregnancy; PR antagonists and PR agonists were positively associated with the OECD TG 416 and OECD TG 421 data, respectively. This suggests that complex receptor-level regulatory effects, rather than a direct PR signaling pathway, are associated with DART. PR has also been incorporated into deep learning models of female reproductive toxicity16,17. AhR agonists and ERβ antagonists showed a significant positive association in OECD TG 443. AhR, activated by environmental pollutants, functions as a receptor for endocrine-disrupting chemicals, indicating its role as a key protein in male reproductive toxicity18,19.

ARE, FXR, and PPARγ agonists, as well as Shh antagonists, showed statistical associations with DART. These findings may indicate potential involvement of oxidative stress, metabolism, and inflammatory response pathways, although causal interpretation is beyond the scope of the current dataset. Additionally, at the molecular level, AP1, H2AX, p53, HIF1, NFκB, RXR, ROR, CaspC, and CaspH were significantly correlated with DNA damage, gene expression, and cell death pathways, suggesting the induction of developmental toxicity. These proteins are involved in the nuclear receptor signaling and SUMOylation pathways. Their association in this study may suggest potential links to transcriptional regulation and cellular homeostasis, although direct causality with DART cannot be established from association data alone.

TRHR and TSHR are known to be involved in thyroid hormone-related developmental processes, including development of the nervous11, cardiovascular, and skeletal systems20. In our results, TRHR and TSHR agonist and antagonist activities were negatively associated with OECD TG 414 and OECD TG 421 outcomes. However, these statistical associations may reflect complex receptor-level regulatory patterns rather than direct mechanistic effects.

In conclusion, our results obtained by binary classification of MIE models and OECD TG data are consistent with those of previous reports on reproductive and developmental toxicity and related factors. Although binary classification may reduce the granularity of the data and cannot fully capture dose–response relationships, potency, or partial activation, this approach was adopted to facilitate comparative screening across diverse endpoints. Our results suggest that this approach may be useful for preliminary prioritization of chemicals that could warrant further evaluation. However, our analysis could not fully address crosstalk between signaling pathways or fully establish the association between key proteins and DART endpoints. MIE-based models may also omit metabolism, pharmacokinetics, species differences, exposure timing, and mixture effects. Nevertheless, these results support the broader implementation of in silico and non-animal methods for DART assessment by linking in vitro and in vivo data through statistically derived associations. This approach may help prioritize candidate DART chemicals for further evaluation and may help inform early-stage safety considerations. However, any translation into regulatory decision-making would require substantial additional validation, contextual exposure information (pharmacokinetics), and integration with in vivo data.

Materials and methods

Data collection

Chemicals were collected from the database of test results included in the OECD QSAR Toolbox (version 4.6); these chemicals had been tested according to OECD TG 414, TG 416, TG 421, TG 422, or TG 443, depending on the specific endpoints. The chemicals were classified as positive or negative based on reference data from the database. Chemicals that lacked a definitive positive or negative classification or had missing CAS registry numbers were excluded from the prediction dataset in this study. The dataset included IUPAC names, CAS registry numbers, and the results from OECD TG 414, TG 416, TG 421, TG 422, and TG 443.

Construction of datasets for collecting and analyzing toxicity and structural information

The chemical structure of each substance was obtained from PubChem using Python scripts with the corresponding CAS numbers. Only compounds that contained at least one carbon atom, had a molecular weight below 1500 Da, and were not coordination complexes were retained for analysis. The simplified molecular input line-entry system (SMILES) strings were desalted and standardized to canonical SMILES forms using the Toxicity Predictor21 (http://mmi-03.my-pharm.ac.jp/tox1/prediction_groups/new) utility, as described below. Duplicate SMILES entries were merged to obtain the final dataset.
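The retention criteria above can be sketched in Python as follows. This is only an illustration, not the authors' script: the `Compound` record, its field names, and the `is_complex` flag are assumptions introduced here, and the SMILES strings are assumed to have been standardized already, so duplicates can be detected by string equality.

```python
import re
from dataclasses import dataclass

@dataclass(frozen=True)
class Compound:
    ident: str                 # hypothetical identifier (e.g., a CAS number)
    smiles: str                # assumed to be already standardized
    formula: str               # molecular formula, e.g. "C6H6"
    mol_weight: float          # molecular weight in Da
    is_complex: bool = False   # hypothetical flag for coordination complexes

ELEMENT = re.compile(r"[A-Z][a-z]?")

def has_carbon(formula: str) -> bool:
    """True if the formula contains the element C (not Ca, Cl, Cu, ...)."""
    return "C" in ELEMENT.findall(formula)

def filter_compounds(compounds, max_mw=1500.0):
    """Apply the retention criteria and merge duplicate SMILES entries."""
    kept, seen = [], set()
    for c in compounds:
        if not has_carbon(c.formula):
            continue          # no carbon atom
        if c.mol_weight >= max_mw:
            continue          # molecular weight not below the limit
        if c.is_complex:
            continue          # coordination complex
        if c.smiles in seen:
            continue          # duplicate SMILES entry
        seen.add(c.smiles)
        kept.append(c)
    return kept

# Toy records for illustration only.
records = [
    Compound("A", "c1ccccc1", "C6H6", 78.11),
    Compound("B", "[Na+].[Cl-]", "ClNa", 58.44),
    Compound("C", "c1ccccc1", "C6H6", 78.11),
    Compound("D", "CCO", "C2H6O", 46.07, is_complex=True),
]
retained = filter_compounds(records)
```

Splitting the formula into element symbols before testing for carbon avoids counting Ca, Cl, or Cu as carbon.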

MIE activity prediction with toxicity predictor

The Toxicity Predictor21, a QSAR platform built on the Tox21 10 K library22,23,24, was used to predict the agonistic or antagonistic activities of MIEs in the nuclear receptor and stress response pathways25,26. The Tox21 assay data were accessed from the U.S. EPA’s CompTox chemicals dashboard and the PubChem bioassay database.

PubChem activity scores (range: 0–100)27 were binarized prior to model training using two alternative activity score thresholds: 40 (PubChem default) and 1 (lower threshold). For each of the 59 MIE assays, a separate binary classification model was trained at each threshold, yielding 118 candidate models21. Models built using a threshold of 1 minimized false-negative errors. Models built with a threshold of 40 minimized the false-positive errors. Nine models had an area under the receiver operating characteristic (ROC) curve (AUC) below 0.70 and were excluded, leaving 109 models for prediction (Supplementary Data_S2).
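The two binarization thresholds and the model count can be illustrated with a short Python sketch. It assumes that a score equal to the threshold counts as active at both cutoffs (the text states this convention only for the cutoff of 40), and the example scores are invented for illustration.

```python
def binarize(scores, threshold):
    """Label a PubChem activity score as 1 (active) if score >= threshold, else 0."""
    return [1 if s >= threshold else 0 for s in scores]

# Illustrative activity scores on the 0-100 scale (not real assay data).
scores = [0, 1, 25, 40, 87]

labels_40 = binarize(scores, 40)  # stricter cutoff, fewer actives
labels_1 = binarize(scores, 1)    # lenient cutoff, more actives

# Model bookkeeping as described in the text.
n_assays = 59
n_candidate_models = n_assays * 2        # one model per assay per threshold
n_retained_models = n_candidate_models - 9  # nine models with AUC < 0.70 excluded
```

The lenient cutoff labels weakly active compounds as active, which is why those models trade more false positives for fewer false negatives.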

For each model, the decision boundary of the predictive probabilities was determined by maximizing the Youden index28. The probabilities were then rescaled such that the boundary was equal to 0.5, thereby establishing a common prediction probability cutoff. Compounds with a normalized predicted probability greater than 0.5 were labeled as “active,” and those with a value less than 0.5 were labeled as “inactive.” Each MIE assay was modeled separately for agonist and antagonist responses. The binary classification of “active” and “inactive” in each model represented functional activity specific to either agonism or antagonism, not a uniform biological activity. When experimental MIE measurements from the Tox21 10 K library were available, these data were obtained directly from the PubChem Bioassay repository and converted to binary active/inactive calls according to the PubChem activity score (≥ 40, active; <40, inactive), thereby superseding the model-based labels. All SMILES in the analysis dataset were submitted to the Toxicity Predictor and their normalized probabilities and activity labels were assigned as described above.
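The cutoff selection and rescaling step can be sketched as follows. The Youden index is J = sensitivity + specificity − 1; the piecewise-linear rescaling shown here is an assumption, since the text states only that the boundary was mapped to 0.5 and does not give the functional form. The labels and probabilities are toy values.

```python
def youden_cutoff(y_true, probs):
    """Return (cutoff, J) maximizing J = sensitivity + specificity - 1.

    A compound is predicted active when its probability exceeds the cutoff.
    """
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes must be present")
    best_cutoff, best_j = None, -1.0
    for t in sorted(set(probs)):
        tp = sum(1 for y, p in zip(y_true, probs) if y == 1 and p > t)
        tn = sum(1 for y, p in zip(y_true, probs) if y == 0 and p <= t)
        j = tp / n_pos + tn / n_neg - 1
        if j > best_j:
            best_cutoff, best_j = t, j
    return best_cutoff, best_j

def rescale(p, cutoff):
    """Assumed piecewise-linear map: [0, cutoff] -> [0, 0.5], [cutoff, 1] -> [0.5, 1]."""
    if not 0.0 < cutoff < 1.0:
        raise ValueError("cutoff must lie strictly between 0 and 1")
    if p <= cutoff:
        return 0.5 * p / cutoff
    return 0.5 + 0.5 * (p - cutoff) / (1.0 - cutoff)

# Toy training labels and predicted probabilities (illustrative only).
y_true = [0, 0, 0, 1, 1, 1]
probs = [0.1, 0.2, 0.3, 0.6, 0.8, 0.9]

cutoff, j = youden_cutoff(y_true, probs)
normalized = [rescale(p, cutoff) for p in probs]
labels = ["active" if q > 0.5 else "inactive" for q in normalized]
```

Any monotone map that sends the cutoff to 0.5 yields the same active/inactive labels; only the normalized probabilities themselves depend on the chosen form.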

Analysis of functional protein association networks

Protein-protein interactions were predicted from the Homo sapiens database using the Search Tool for Retrieving Interacting Genes (STRING, v12)29 (https://string-db.org). Proteins showing statistically significant associations (p < 0.05) between OECD TG endpoints and Tox21 MIEs were identified, and only those that were commonly observed in two or more OECD TGs were selected for network analysis. The list of common proteins identified across multiple OECD TGs was entered into the STRING database to derive the functional associations among these proteins. The edges indicated both functional and physical protein associations. The minimum required interaction score was set to 0.4 (medium confidence), and disconnected nodes were excluded. Active interaction sources included experimental data, curated databases, co-occurrence, neighborhood, gene fusion, and text mining.
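The selection of proteins observed in two or more OECD TGs can be sketched in Python as below. The function name and the input mapping are hypothetical, and the protein lists are a small illustrative subset rather than the full contents of Tables 1 to 5.

```python
from collections import Counter

def common_proteins(significant_by_tg, min_tgs=2):
    """Return proteins that are significant in at least `min_tgs` test guidelines."""
    counts = Counter(
        protein
        for proteins in significant_by_tg.values()
        for protein in set(proteins)  # count each protein once per TG
    )
    return sorted(p for p, n in counts.items() if n >= min_tgs)

# Illustrative subset only (not the complete result tables).
significant_by_tg = {
    "TG 416": ["HDAC", "PR", "CAR"],
    "TG 421": ["PR", "ARlbd"],
    "TG 443": ["HDAC", "AhR"],
}

query_list = common_proteins(significant_by_tg)
```

The resulting list is what would be submitted as a multiple-protein query; the interaction-score threshold and source filters are then applied within STRING itself.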

Statistical analysis

For each compound, the presence or absence of toxicity in the OECD TG 414, TG 416, TG 421, TG 422, and TG 443 studies served as response variables, and the predicted MIE activity labels served as explanatory variables. OECD TG outcomes represent expert-interpreted in vivo endpoints that are categorized as positive or negative in the reference database, while Tox21 assay results were modeled using the Toxicity Predictor to yield a binary active or inactive output for each assay. Because both datasets were represented in binary form, Fisher’s exact test was applied to evaluate associations between toxicity outcomes and individual MIEs. In this analysis, a right-tailed test corresponds to a positive association, where the co-occurrence of toxic and active outcomes exceeds random expectation, whereas a left-tailed test corresponds to a negative association, where toxic outcomes are less frequent among MIE-active compounds than expected by chance. For ease of interpretation, results are presented in the tables as “positive” or “negative association,” rather than “right-” or “left-tailed” tests. All analyses were performed using JMP Pro 18 (SAS Institute Inc., Cary, NC, USA) with a two-sided significance level of 0.05. The odds ratio (OR) was calculated using Eq. (1)30.

$$\mathrm{OR} = \frac{(\text{TG positive and MIE positive}) \times (\text{TG negative and MIE negative})}{(\text{TG negative and MIE positive}) \times (\text{TG positive and MIE negative})}$$
(1)
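As a worked illustration of Eq. 1 and the one-sided Fisher's exact tests, the following Python sketch computes both from a 2×2 table using the hypergeometric distribution. It is an independent sketch and not the JMP Pro procedure used in the study; the function names and the example counts are invented for illustration.

```python
from math import comb

def odds_ratio(a, b, c, d):
    """Eq. 1 with a = TG+ & MIE+, b = TG+ & MIE-, c = TG- & MIE+, d = TG- & MIE-."""
    if b == 0 or c == 0:
        raise ZeroDivisionError("odds ratio undefined when b or c is zero")
    return (a * d) / (c * b)

def fisher_one_sided(a, b, c, d):
    """Return (left_p, right_p) for the table [[a, b], [c, d]].

    left_p:  probability of a count <= a (negative association).
    right_p: probability of a count >= a (positive association).
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    k_min = max(0, row1 + col1 - n)
    k_max = min(row1, col1)
    denom = comb(n, col1)

    def pmf(k):
        # Hypergeometric probability of k TG-positive compounds among MIE-actives.
        return comb(row1, k) * comb(n - row1, col1 - k) / denom

    left_p = sum(pmf(k) for k in range(k_min, a + 1))
    right_p = sum(pmf(k) for k in range(a, k_max + 1))
    return left_p, right_p

# Toy counts (illustrative only): 3 TG+/MIE+, 1 TG+/MIE-, 1 TG-/MIE+, 3 TG-/MIE-.
a, b, c, d = 3, 1, 1, 3
or_value = odds_ratio(a, b, c, d)
left_p, right_p = fisher_one_sided(a, b, c, d)
```

For these toy counts the OR is 9 and the right-tailed P value is 17/70, so an OR above 1 on its own does not imply significance at the 0.05 level.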