Introduction

Infertility is increasing globally, affecting 1 out of 6 couples worldwide1. One of the major causes of infertility is delayed childbearing, which is associated with a profound decline in ovarian reserve, oocyte quality and embryo euploidy rates2. According to the CDC, diminished ovarian reserve (DOR) was the reason for undergoing in vitro fertilization (IVF) in 27% of IVF cycles in 2021, making it the third most frequent indication3.

Considering that the probability of live birth following IVF is associated with the number of oocytes retrieved4,5,6,7,8, numerous treatments have been proposed prior to IVF for women with DOR to increase oocyte yield and improve pregnancy rates9. Despite widespread use, these therapies frequently lack substantial evidence for their safety and efficacy9. One such proposed treatment is testosterone prior to ovarian stimulation.

Animal studies have shown that testosterone treatment promotes the initiation of follicular growth, increases FSH-receptor expression, augments responsiveness to FSH stimulation, and may prevent apoptosis and follicular atresia, thus facilitating oocyte maturation10. This has led to the off-label use of testosterone as a pre-treatment for women with DOR undergoing IVF to increase the number of oocytes retrieved and ultimately to improve live birth rates11,12.

Seven randomized controlled trials (RCTs)13,14,15,16,17,18,19 and several meta-analyses10,20,21,22,23,24 have investigated the effects of testosterone in women with DOR. However, the RCTs were small (50 to 159 participants each), and only one was placebo-controlled13. Additionally, there is substantial heterogeneity in the dose and duration of testosterone treatment used10,25,26, leading to inconclusive findings26. Although clinical practice guidelines either do not routinely recommend testosterone or lack specific guidance on its use9,27, approximately 50% of reproductive medicine specialists prescribe it off-label for women undergoing fertility treatment11,12.

To address this uncertainty, we performed the first multicenter, multinational, randomized, placebo-controlled study, the Testosterone Transdermal Gel for Poor Ovarian Responders Trial (T-TRANSPORT), to test the hypothesis that transdermal testosterone gel treatment in women with DOR would improve the clinical pregnancy rate compared to placebo, and to evaluate the efficacy and safety of this treatment.

Results

There were 290 eligible participants individually randomized between 24 April 2015 and 12 August 2022, 136 to testosterone and 154 to placebo (Figs. 1 and 2). Two participants in the testosterone group were excluded from the efficacy analyses as their treatment coincided with the start of the COVID-19 pandemic and their IVF cycles were cancelled; both subsequently re-entered the study after a wash-out period >3 months and were re-randomized. Therefore, 288 participants were included in the efficacy analysis. Overall, 255/288 (88.5%) completed the study according to the protocol: 134/154 (87.0%) in the placebo group and 121/134 (90.3%) in the testosterone group. All 290 participants who were randomized were included in the safety analysis. The number of participants included from each study site is shown in Supplementary Table 1.

Fig. 1: CONSORT flow diagram.

CONSORT diagram depicting participant flow through the trial.

Fig. 2: Schematic representation of the treatment arms.

Group A testosterone group, Group B placebo group, hpHMG highly purified human menopausal gonadotropin, rhCG recombinant human chorionic gonadotropin, OPU oocyte pick-up, ET embryo transfer, TDS three times daily.

The baseline participant characteristics were similar in the two groups (Table 1). The higher number of patients randomized to placebo was a result of the stratified block randomization and the small patient numbers in some age groups at some sites.

Table 1 Baseline characteristics

In the intention-to-treat analysis, the clinical pregnancy rate was 15.7% in the testosterone group and 14.9% in the placebo group (RR 1.05; 95% CI 0.61 to 1.81, p = 0.86; Table 2). There were no between-group differences in the pre-specified secondary endpoints of: cycle cancellation; cycles reaching embryo transfer; biochemical pregnancy rate; ongoing pregnancy rate; number of oocytes retrieved; number of mature (metaphase II) oocytes retrieved; number of top-quality embryos available on day 3; or number of cycles with supernumerary embryos available (Tables 2 and 3). The antral follicle count did not differ on the day of starting stimulation (5.0 ± 2.3 in the placebo group compared to 5.0 ± 2.8 in the testosterone group). There were no significant between-group differences in the rates of live birth, multiple pregnancy, miscarriage, stillbirth or ectopic pregnancy (Table 2). Obstetric and neonatal outcomes were similar between the two groups (Table 4).
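The primary-outcome estimate can be checked from the counts given in the text (21 clinical pregnancies among 134 testosterone-treated participants and 23 among 154 placebo-treated participants). The sketch below is an illustration, not the trial's analysis code: it assumes a log-scale (Katz) confidence interval for the risk ratio and an uncorrected Pearson chi-square test, neither of which the Methods specify at this level of detail, and the function names are hypothetical.

```python
import math
from statistics import NormalDist


def risk_ratio_ci(events_a, n_a, events_b, n_b, level=0.95):
    """Risk ratio of group A vs group B with a log-scale (Katz) confidence interval."""
    rr = (events_a / n_a) / (events_b / n_b)
    # Standard error of log(RR) for two independent binomial proportions
    se = math.sqrt(1 / events_a - 1 / n_a + 1 / events_b - 1 / n_b)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    lo = math.exp(math.log(rr) - z * se)
    hi = math.exp(math.log(rr) + z * se)
    return rr, lo, hi


def chi_square_2x2_p(events_a, n_a, events_b, n_b):
    """Two-sided p-value of the Pearson chi-square test (1 df, no continuity correction)."""
    pooled = (events_a + events_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (events_a / n_a - events_b / n_b) / se
    # For a 2x2 table the chi-square statistic equals z squared, so the p-values coincide
    return 2 * (1 - NormalDist().cdf(abs(z)))


if __name__ == "__main__":
    # Clinical pregnancies: 21/134 (testosterone) vs 23/154 (placebo)
    rr, lo, hi = risk_ratio_ci(21, 134, 23, 154)
    p = chi_square_2x2_p(21, 134, 23, 154)
    print(f"RR {rr:.2f}; 95% CI {lo:.2f} to {hi:.2f}, p = {p:.2f}")
```

Run as a script, it prints `RR 1.05; 95% CI 0.61 to 1.81, p = 0.86`, matching the values reported above.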

Table 2 Clinical outcomes
Table 3 Outcomes of ovarian stimulation
Table 4 Obstetric and neonatal outcomes

All 23 clinical pregnancies in the placebo group were singleton pregnancies; in the testosterone group, 4 out of 21 clinical pregnancies were twin pregnancies, and the remainder were singletons. Of the 6 miscarriages in the testosterone group, 1 occurred in a twin pregnancy. There was a single stillbirth in the study, of a singleton pregnancy in the testosterone group. There were no congenital anomalies reported in either group.

The safety analysis included all randomized participants (n = 290). Overall, 154 (53.1%) participants experienced at least one adverse event. There were no significant between-group differences in the incidence of adverse events, serious adverse events, withdrawal due to adverse events, application site reactions or androgenic events overall; however, more participants in the testosterone group reported increased hair growth (14.7% vs 7.1%; RR 2.05; 95% CI 1.02 to 4.14, p = 0.03) (Table 5). There was no difference in the Ferriman–Gallwey (FG) scores or the incidence of hirsutism (FG score ≥8) between groups (Table 5).

Table 5 Adverse events

Two participants in the placebo group reported pre-specified serious adverse events: an extrapyramidal reaction and an overdose of the placebo gel. The study medication was discontinued in the second participant. Neither participant was withdrawn from the study due to these side effects. There were no serious adverse events in the testosterone arm. Two participants discontinued the placebo gel due to adverse events. One reported acne and irritability/bad mood; the other reported per vaginal bleeding. There were no discontinuations for adverse events in the testosterone arm.

The planned subgroup analysis for participants aged <36, 36–39 and ≥40 years did not reveal any between-group differences; however, event rates were low in these analyses, and the study was underpowered to detect differences in these subpopulations (Table 2).

As expected, serum total testosterone levels were significantly higher in the testosterone group than the placebo group on the day of starting ovarian stimulation (at completion of the study drug), measuring 3.11 ± 2.61 nmol/L compared to 0.61 ± 0.36 nmol/L (MD 2.50; 95% CI 1.99 to 3.01, p < 0.001) (Fig. 3).
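The mean difference and its confidence interval can be approximated from the summary statistics alone. This sketch assumes that the group sizes are those listed for Visit 5 in the Fig. 3 caption (106 testosterone, 121 placebo) and uses an unpooled (Welch-type) standard error; the method actually used is not specified, and testosterone levels are clearly skewed, so the result is only an approximation. The function name is hypothetical.

```python
import math
from statistics import NormalDist


def mean_difference_ci(mean_a, sd_a, n_a, mean_b, sd_b, n_b, level=0.95):
    """Difference in means (A - B) with an unpooled (Welch-type) standard error.

    Uses a normal quantile for the interval and also returns the
    Welch-Satterthwaite degrees of freedom that a t-based interval would use.
    """
    var_a, var_b = sd_a**2 / n_a, sd_b**2 / n_b
    se = math.sqrt(var_a + var_b)
    df = (var_a + var_b) ** 2 / (var_a**2 / (n_a - 1) + var_b**2 / (n_b - 1))
    z = NormalDist().inv_cdf(0.5 + level / 2)
    md = mean_a - mean_b
    return md, md - z * se, md + z * se, df


if __name__ == "__main__":
    # Testosterone 3.11 +/- 2.61 (n = 106) vs placebo 0.61 +/- 0.36 (n = 121), in nmol/L
    md, lo, hi, df = mean_difference_ci(3.11, 2.61, 106, 0.61, 0.36, 121)
    print(f"MD {md:.2f}; 95% CI {lo:.2f} to {hi:.2f}; Welch df ~ {df:.0f}")
```

With a normal quantile the sketch gives approximately 2.00 to 3.00; a t quantile for the Welch–Satterthwaite degrees of freedom (about 108, t ≈ 1.98) widens this slightly, to about 1.99 to 3.01, in line with the reported interval.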

Fig. 3: Boxplot of serum total testosterone levels during the study.

Serum total testosterone levels (nmol/L) prior to, during, and after treatment with testosterone gel or placebo. Central bars represent the median; central boxes span the interquartile range (IQR: Q3–Q1); whiskers extend to the most extreme data point that falls within 1.5 × IQR from Q1 or Q3; dots represent outliers beyond this range. Visit 2 (Placebo: n = 139; Testosterone: n = 122): Day of starting treatment with testosterone or placebo (day 1 of menstrual cycle); Visit 3 (Placebo: n = 128; Testosterone: n = 111): Day 1 of the subsequent menstrual cycle; Visit 4 (Placebo: n = 120; Testosterone: n = 113): Day 21 of the subsequent menstrual cycle; Visit 5 (Placebo: n = 121; Testosterone: n = 106): Final day of treatment with testosterone or placebo and day of starting ovarian stimulation; Visit 10 (Placebo: n = 89; Testosterone: n = 77): Day of oocyte collection.
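The whisker rule in the caption (Tukey's 1.5 × IQR convention) can be expressed as follows. The data are hypothetical, and quartile conventions differ between software packages; the "inclusive" method used here (linear interpolation, the same rule as NumPy's default percentile) is an assumption, since the plotting software used for Fig. 3 is not stated. The function name is likewise hypothetical.

```python
from statistics import median, quantiles


def tukey_box(values):
    """Box-plot summary: median, quartiles, 1.5 x IQR whiskers and outliers."""
    data = sorted(values)
    q1, _, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence, upper_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [x for x in data if lower_fence <= x <= upper_fence]
    return {
        "median": median(data),
        "q1": q1,
        "q3": q3,
        # Whiskers stop at the most extreme observations still inside the fences
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": [x for x in data if x < lower_fence or x > upper_fence],
    }


if __name__ == "__main__":
    # Hypothetical serum testosterone values (nmol/L) for one group at one visit
    print(tukey_box([0.4, 0.5, 0.6, 0.6, 0.7, 0.8, 2.9]))
```

Here the quartiles are 0.55 and 0.75, so the upper fence is 1.05: the whisker ends at 0.8 and 2.9 is drawn as an outlier dot.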

The study was terminated for futility on the advice of the DSMB, based on the results of the pre-planned interim analysis. After 72.5% of the target sample size had been randomized, a conditional power calculation was performed; conditional power is the probability of rejecting the null hypothesis at the planned final analysis if the study were to continue. Assuming that the current trend would continue gave a conditional power of 0.08%; assuming the effect hypothesized at the design stage gave a conditional power of 6.2%28,29. Both values were well below the pre-specified futility threshold of 20%, indicating that the study was extremely unlikely to demonstrate a significant between-group difference in the primary outcome if continued. Based on these results, and as per the study protocol, the study group terminated the trial.
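Conditional power can be sketched with the widely used Brownian-motion (B-value) formulation, in which the interim z-statistic is projected to the final analysis under an assumed drift. This is an assumption: the exact method and software used by the independent statistician are not reported, and the interim z-statistic is not given, so the value 0.5 below is purely hypothetical and the output is not expected to reproduce the 0.08% and 6.2% quoted above. The information fraction is approximated by the proportion of the target sample size randomized (0.725); all function names are hypothetical.

```python
import math
from statistics import NormalDist

_N = NormalDist()


def conditional_power(z_interim, info_frac, drift, alpha=0.05):
    """P(final Z exceeds the critical value | interim Z), B-value formulation.

    z_interim : z-statistic at the interim look (positive favours treatment)
    info_frac : information fraction t (0 < t < 1)
    drift     : assumed expected value of the final z-statistic
    """
    z_crit = _N.inv_cdf(1 - alpha / 2)  # two-sided alpha, benefit direction only
    t = info_frac
    b_value = z_interim * math.sqrt(t)
    numerator = z_crit - b_value - drift * (1 - t)
    return 1 - _N.cdf(numerator / math.sqrt(1 - t))


def drift_current_trend(z_interim, info_frac):
    # Extrapolate the effect observed so far to the end of the trial
    return z_interim / math.sqrt(info_frac)


def drift_design(alpha=0.05, power=0.80):
    # The effect assumed at the design stage, expressed on the z-scale
    return _N.inv_cdf(1 - alpha / 2) + _N.inv_cdf(power)


if __name__ == "__main__":
    t, z = 0.725, 0.5  # z is a hypothetical interim statistic
    print(conditional_power(z, t, drift_current_trend(z, t)))
    print(conditional_power(z, t, drift_design()))
```

The current-trend drift extrapolates the interim effect to the end of the trial, whereas the design drift assumes the originally hypothesized effect; when the interim estimate is weaker than planned, the current-trend assumption gives the lower conditional power, as was the case in this trial.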

Discussion

This multicenter, multinational, randomized, triple-blind, placebo-controlled trial suggests that daily application of 5.5 mg of transdermal testosterone for ~9 weeks prior to IVF in women with DOR does not improve clinical pregnancy rates compared with placebo. Furthermore, there were no statistically significant between-group differences in the live birth rates or secondary outcomes. Subgroup analyses by age did not reveal any population of women who benefited from testosterone treatment. Women receiving testosterone reported increased hair growth twice as frequently as those receiving placebo.

Testosterone is frequently prescribed to women with DOR undergoing IVF11,12, despite conflicting evidence from small, predominantly uncontrolled, clinical trials26. The most recent Cochrane meta-analysis reported that although testosterone treatment was associated with a higher live birth rate overall, this benefit was not seen when studies at high risk of bias were excluded21; this left only a single trial of 53 subjects, with an odds ratio for live birth of 2.00 and a very uncertain effect estimate (95% CI 0.17–23.49)13. A subsequent systematic review and meta-analysis reported a benefit of testosterone treatment with regard to the number of oocytes retrieved (MD 0.94; 95% CI 0.46–1.42) and the number of mature oocytes retrieved (MD 0.62; 95% CI 0.00–1.25), as well as an increased likelihood of clinical pregnancy (RR 2.07; 95% CI 1.33–3.20)10. However, that review noted that the evidence was of low to medium quality, that existing studies were heterogeneous in design, and that additional high-quality data were urgently required to address this clinical question. T-TRANSPORT was designed to overcome these limitations: it is the largest randomized trial of testosterone for women with DOR to date, with a sample size equivalent to 44% of the combined sample size of all studies included in that meta-analysis.

An important consideration during the T-TRANSPORT study design was the selection of the androgen type, dose and duration of administration. Testosterone was chosen as it is a biologically active androgen that directly binds the androgen receptor. Some previous studies have used DHEA; however, DHEA is a relatively biologically inactive steroid that functions primarily as a testosterone precursor10, and previous studies have shown that treatment with DHEA does not increase intrafollicular testosterone concentrations30. Furthermore, well-designed randomized controlled trials and the most recent meta-analysis have not demonstrated any clinical benefit of pre-treatment with DHEA in poor ovarian responders10,31. The chosen duration of treatment was based on evidence that testosterone acts during the earlier stages of follicular development32. As the transition of human ovarian follicles from the pre-antral to antral stages occurs over approximately 70 days, the chosen treatment duration of ~9 weeks reflects the plausible physiological roles of testosterone in human follicular development10.

Finally, T-TRANSPORT used a daily testosterone dose of 5.5 mg based on robust evidence from pharmacokinetic studies showing that higher doses can lead to supraphysiological testosterone levels. In healthy post-menopausal women, 5 mg of transdermal testosterone daily resulted in mean serum total testosterone levels in the upper range of the premenopausal reference range (1.87 nmol/L), whilst 10 mg/day resulted in levels above the reference range (3.26 nmol/L)33. Similarly, 10 mg/day of testosterone via transdermal gel increases serum total testosterone levels in healthy young women to 4.3 nmol/L, more than double the upper limit of the reference range34. Thus, we adopted the dose most likely to provide a physiological effect whilst minimizing the risk of androgenic side effects, leading to mean testosterone levels in the testosterone group of 3.11 ± 2.61 nmol/L.

Reflecting this, no serious adverse events were reported by participants in the active treatment arm. While testosterone-treated participants were twice as likely to report increased hair growth, this did not lead to increased withdrawals. The study was not designed to evaluate the safety of longer durations or higher doses of testosterone in premenopausal women planning pregnancy; however, it is reassuring that no between-group differences were observed in obstetric or neonatal outcomes.

A major strength of T-TRANSPORT is its robust study design. This is the first multicenter RCT on testosterone supplementation for women with DOR, with 10 centers in four European countries. In addition, it is the only placebo-controlled study with a triple-blind design. The adaptive design, with a pre-specified interim analysis once 70% of the target sample size had been recruited, allowed the sample size to be re-calculated from the conditional power, with the option of increasing it if the results were promising. At this interim analysis the conditional power was well below the pre-specified futility threshold of 20%, and the trial was stopped on the advice of the DSMB, preventing unnecessary recruitment, randomization and treatment of additional participants.

Several limitations of this study should be considered. Only 53 participants (18.4%) were <36 years old, which may limit the applicability of the findings to younger patients. Furthermore, the study was powered to detect differences in clinical pregnancy rates, but not live birth rates or cumulative live birth rates. Despite this, the observed live birth rates were similar between groups, and there were no differences in the number of oocytes collected or embryos available. The study was powered to detect an 11.5% absolute increase in clinical pregnancy rate, which was chosen based on previous studies. Smaller increases than this could still be considered clinically meaningful, and there is a risk of type II error. Additionally, the early termination of the trial for futility reduces the power of the study and the precision of the effect estimate.

The study used day 3 embryo transfers, reflecting standard practice at the participating sites when the study began, so the findings may not be generalizable to blastocyst transfer. However, as both study arms were treated identically, this is unlikely to have biased the between-group comparison.

The testosterone dose chosen was intended to achieve serum levels in the upper end of the reference range, avoiding excessively supraphysiological levels. This decision was taken to minimize the risk of severe androgenic side effects; however, these results may not be applicable to patients receiving other testosterone doses or regimens, or indeed other androgens. It is important to consider that whilst we did not observe any increase in congenital anomalies, this study was not powered or designed to evaluate differences in this outcome. Finally, the study duration was substantially longer than anticipated, due in part to inactivity during the COVID-19 pandemic.

T-TRANSPORT is particularly relevant to the contemporary practice of reproductive medicine. Women with diminished ovarian reserve and low ovarian responders represent a significant clinical challenge, facing multiple barriers to successful pregnancy. In addition to low oocyte yield, they are commonly older and have higher aneuploidy and miscarriage rates. Identifying treatments that could improve clinical outcomes for these patients is therefore a priority. The recent ESHRE Good Practice Recommendations state that there is insufficient evidence to recommend any “adjuvants” during ovarian stimulation, including androgens, and that further research is required9. Each of these interventions carries risks, and usually costs, for the patient, and may raise unrealistic expectations in patients with a poor prognosis. To date, high-quality clinical trials supporting or refuting the efficacy of most of these proposed treatments have been lacking. The T-TRANSPORT trial adds substantially to the body of evidence guiding androgen treatment in poor ovarian responders.

In conclusion, in this study, the treatment of poor ovarian responders with 5.5 mg of transdermal testosterone per day for ~9 weeks prior to IVF treatment did not improve clinical pregnancy rates. Whilst the study was not powered to detect smaller differences in clinical outcomes, these findings do not support the routine use of testosterone in this population to improve reproductive outcomes.

Methods

Study design and setting

This multicenter, parallel, randomized, triple-blind, placebo-controlled clinical trial was conducted across 10 tertiary fertility clinics in 4 European countries (Spain, Belgium, Denmark and Switzerland) between April 2015 and November 2022, with the final participant randomized in August 2022. Participants were followed up until delivery if pregnant. The trial was approved by the Medical Ethics Committee of the Universitair Ziekenhuis Brussel (Ref: 2014/148) and by the ethics committees of each participating center (Belgium: Commissie Medische Ethiek UZ Brussel; Spain: Comité Ético de Investigación Clínica del Grupo Hospitalario Quironsalud; Denmark: Videnskabsetiske Komitéer for Region Midtjylland; Switzerland: Swissethics). The study was monitored by an independent Data Safety Monitoring Board (DSMB) as well as an independent Clinical Research Organization. The trial was prospectively registered at ClinicalTrials.gov (NCT02418572).

Study participants

Eligible participants were women aged 18–43 years with infertility and DOR who were planning ovarian stimulation for IVF. DOR was defined by the Bologna criteria35. Specifically, participants aged <40 years were required to have had ≤3 oocytes retrieved in a previous IVF cycle and an antral follicle count (AFC) < 7; or ovarian surgery/chemotherapy and an AFC < 7; or ≤3 oocytes in at least 2 previous cycles with ≥300 IU gonadotropins per day. Participants aged ≥40 years were required to have had ≤3 oocytes retrieved in a previous cycle or an AFC < 7.
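The DOR inclusion rule above branches on age, and a small function makes that logic explicit. The record fields and function names are hypothetical, the exclusion criteria listed below are not modelled, and the sketch encodes only the wording of this paragraph, not the full Bologna consensus.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical record holding the fields needed for the DOR inclusion rule."""
    age: int
    afc: int                        # antral follicle count
    poor_response_cycles: int       # previous IVF cycles with <=3 oocytes retrieved
    poor_cycles_high_dose: int = 0  # of those, cycles using >=300 IU/day gonadotropins
    ovarian_surgery_or_chemo: bool = False


def meets_dor_definition(c: Candidate) -> bool:
    low_afc = c.afc < 7
    prior_poor_response = c.poor_response_cycles >= 1
    if c.age < 40:
        # Under 40: a poor response or ovarian surgery/chemotherapy must be paired
        # with a low AFC, unless there were >=2 poor high-dose cycles
        return (
            (prior_poor_response and low_afc)
            or (c.ovarian_surgery_or_chemo and low_afc)
            or c.poor_cycles_high_dose >= 2
        )
    # Aged >=40: either a previous poor response or a low AFC suffices
    return prior_poor_response or low_afc


def is_eligible(c: Candidate) -> bool:
    """Age window plus DOR definition; exclusion criteria are not modelled."""
    return 18 <= c.age <= 43 and meets_dor_definition(c)


if __name__ == "__main__":
    print(is_eligible(Candidate(age=35, afc=5, poor_response_cycles=1)))  # True
    print(is_eligible(Candidate(age=35, afc=9, poor_response_cycles=1)))  # False
```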

Exclusion criteria included perimenopause, a basal FSH > 20 IU/L, uterine malformations, hydrosalpinx (unless clipped), recent history of an untreated endocrine abnormality, contraindication to the use of gonadotropins, recent history of a severe disease requiring regular treatment, use of androgens within the previous 3 months, SHBG <20 nmol/L or >160 nmol/L, or planning to use surgically retrieved sperm.

All participants provided written informed consent. Participants did not receive compensation in exchange for their participation.

Randomization and blinding

Potential participants were screened for eligibility, and consent was obtained at the pre-treatment visit. Randomization was performed on day 1 of the subsequent menstrual cycle. Participants were randomly assigned to 5.5 mg per day of transdermal testosterone as 1% gel or an identical placebo in a 1:1 ratio, both supplied in identical metered-dose pumps by Besins Healthcare. The placebo gel was identical in composition to the active gel except for the 1% testosterone, which was replaced with 1% water. The randomization sequence was prepared and held centrally by an independent statistician using a random number table in permuted blocks of 6 participants, stratified by age (<36, 36–39 and ≥40 years old) and study site. Participants, clinicians, outcome assessors and statisticians were all blinded to group allocation. Figure 1 shows the CONSORT Study Flow Diagram.
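The allocation scheme described above (1:1, permuted blocks of 6, stratified by age group and site) can be sketched as follows. The trial's sequence was prepared by an independent statistician from a random number table; the seeded pseudo-random generator, site labels and function names here are hypothetical stand-ins. Because each stratum has its own list, a stratum that stops recruiting part-way through a block leaves that block unbalanced, which is how small age-by-site strata can produce unequal arm totals such as the 136 versus 154 reported in the Results.

```python
import random
from collections import defaultdict


def age_stratum(age: int) -> str:
    """Age strata used for randomization: <36, 36-39 and >=40 years."""
    if age < 36:
        return "<36"
    if age <= 39:
        return "36-39"
    return ">=40"


def make_allocation_lists(sites, strata, n_blocks, block_size=6, seed=12345):
    """Build one independent permuted-block list per (site, age stratum).

    The seed and the software generator are hypothetical stand-ins for the
    random number table used in the trial.
    """
    rng = random.Random(seed)
    lists = {}
    for site in sites:
        for stratum in strata:
            sequence = []
            for _ in range(n_blocks):
                # Each block holds equal numbers of both arms (1:1) in random order
                block = ["testosterone", "placebo"] * (block_size // 2)
                rng.shuffle(block)
                sequence.extend(block)
            lists[(site, stratum)] = sequence
    return lists


class Allocator:
    """Hands out the next unused assignment in the participant's stratum."""

    def __init__(self, lists):
        self._lists = lists
        self._next_index = defaultdict(int)

    def assign(self, site: str, age: int) -> str:
        key = (site, age_stratum(age))
        arm = self._lists[key][self._next_index[key]]
        self._next_index[key] += 1
        return arm


if __name__ == "__main__":
    lists = make_allocation_lists(["Site A", "Site B"], ["<36", "36-39", ">=40"], n_blocks=5)
    allocator = Allocator(lists)
    print([allocator.assign("Site A", 38) for _ in range(6)])
```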

Procedures

Each participant was instructed to apply 0.55 g (one pump) of either 1% testosterone gel (5.5 mg of testosterone) or placebo gel to their thighs each morning from the day of randomization for approximately 9 weeks. Approximately 7 weeks after starting the study medication, on day 21 of the subsequent menstrual cycle (i.e. after one intervening menstrual period), pituitary downregulation commenced with 100 mcg subcutaneous triptorelin daily. After 14 days, and once pituitary downregulation was confirmed by a serum estradiol <70 ng/L, the study medication was ceased and ovarian stimulation with 300 IU/day of subcutaneous highly purified human menopausal gonadotropin commenced. Once two follicles ≥17 mm were observed on transvaginal ultrasound, 250 mcg of subcutaneous recombinant human chorionic gonadotropin was administered. Thirty-six hours later, oocytes were collected and fertilized by intracytoplasmic sperm injection (ICSI). On day 3 after oocyte retrieval, a maximum of 2 embryos were transferred, according to each center’s criteria for single versus double embryo transfer. Luteal phase support was provided with 200 mg intravaginal micronized progesterone three times daily, commencing the day after oocyte retrieval and continuing for at least 6 weeks, or until a negative pregnancy test ≥12 days after embryo transfer (Fig. 2). Compliance and adverse events were recorded in a patient diary completed daily.
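The dose and timing arithmetic in this paragraph can be checked with a short sketch. It assumes a regular 28-day menstrual cycle and that downregulation is confirmed after exactly 14 days; both are simplifications (cycle length varies and confirmation could take longer), and the constant and function names are hypothetical.

```python
from datetime import date, timedelta

GEL_PER_PUMP_G = 0.55         # one metered pump
TESTOSTERONE_FRACTION = 0.01  # 1% gel


def daily_dose_mg(pumps: int = 1) -> float:
    """Testosterone delivered per day, in mg."""
    return pumps * GEL_PER_PUMP_G * TESTOSTERONE_FRACTION * 1000  # g -> mg


def protocol_dates(randomization: date, cycle_length_days: int = 28) -> dict:
    """Key protocol dates, assuming a regular cycle and confirmation after exactly 14 days.

    `randomization` is day 1 of the first menstrual cycle.
    """
    next_cycle_day1 = randomization + timedelta(days=cycle_length_days)
    downregulation_start = next_cycle_day1 + timedelta(days=20)  # day 21 of that cycle
    stimulation_start = downregulation_start + timedelta(days=14)  # gel stopped here
    return {
        "downregulation_start": downregulation_start,
        "stimulation_start": stimulation_start,
        "weeks_on_gel": (stimulation_start - randomization).days / 7,
    }


if __name__ == "__main__":
    print(daily_dose_mg())  # approximately 5.5
    print(protocol_dates(date(2024, 1, 1)))
```

Under these assumptions, downregulation begins 48 days (about 7 weeks) after randomization, and stimulation, when the gel is stopped, begins on day 62 (about 8.9 weeks), consistent with the ~9-week treatment period.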

Outcomes

The primary outcome was clinical pregnancy rate, defined as the presence of an intrauterine gestational sac with an embryonic pole demonstrating cardiac activity at 7 weeks’ gestation. Secondary outcomes were defined a priori and included: cycle cancellation due to poor ovarian response; cycles reaching embryo transfer; ongoing pregnancy rate; biochemical pregnancy rate; number of cumulus-oocyte complexes retrieved; and number of metaphase II oocytes retrieved. We recorded live birth outcomes and maternal and neonatal outcomes. Adverse events and their severity were assessed at each visit; a Ferriman–Gallwey hirsutism score was completed prior to starting the study medication and on completion of the study medication. Serum total testosterone levels were measured at five timepoints for each participant, including prior to starting the study medication and at the completion of the study medication but prior to commencing ovarian stimulation (Fig. 3). Testosterone was measured using the Roche Elecsys® electrochemiluminescence immunoassay on a Roche cobas® e 411 Analyzer which has a coefficient of variation of 8.4% at a concentration of 0.33 nmol/L and of 3.2% at 2.40 nmol/L36.

Sample size calculation

Based on the results of earlier studies, the expected clinical pregnancy rate in the placebo group was estimated to be around 14.5%, and approximately twice this in the testosterone group10,14. Therefore, this superiority study was designed to have 80% power to detect an increase in clinical pregnancy rate from 14.5% in the placebo group to 26% in the testosterone group, with a two-sided α of 0.05. To achieve this, 190 participants were required for each group. Allowing for a dropout rate of 5%, the planned sample size was 200 participants per group.
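The reported sample size can be approximated with the standard normal-approximation formula for comparing two independent proportions without continuity correction. The Methods do not state which formula or software was used, so this is an assumption; a continuity-corrected formula would give a noticeably larger number. The function names are hypothetical.

```python
import math
from statistics import NormalDist


def n_per_group(p_control: float, p_treatment: float,
                alpha: float = 0.05, power: float = 0.80) -> float:
    """Per-group sample size for a two-sided comparison of two proportions
    (normal approximation, no continuity correction)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p_control + p_treatment) / 2
    term_null = z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
    term_alt = z_beta * math.sqrt(p_control * (1 - p_control) + p_treatment * (1 - p_treatment))
    return (term_null + term_alt) ** 2 / (p_control - p_treatment) ** 2


def inflate_for_dropout(n: int, dropout_percent: int) -> int:
    """Enrolment needed so that n participants remain after the expected dropout."""
    # Integer percentages keep the division exact (e.g. 19000 / 95 == 200.0)
    return math.ceil(n * 100 / (100 - dropout_percent))


if __name__ == "__main__":
    n = n_per_group(0.145, 0.26)
    print(round(n, 1))                  # roughly 190.5
    print(inflate_for_dropout(190, 5))  # 200
```

With these inputs the formula gives roughly 190.5 participants per group, in line with the 190 reported, and inflating 190 for 5% dropout gives 200 per group.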

Statistical analysis

All data were collected using standardized electronic case report forms in REDCap (Vanderbilt University).

Descriptive summary measures are expressed as mean (SD) or median (interquartile range) for continuous variables and frequency (percentage) for categorical variables. The analysis and reporting of the results of the clinical outcomes follows the CONSORT guidelines (www.consort-statement.org).

An intention-to-treat approach was used for the primary analysis. For the primary outcome (clinical pregnancy rate) and the secondary clinical outcomes (ongoing pregnancy and biochemical pregnancy), results were analyzed using the chi-square test, with a two-sided P value of less than 0.05 considered significant. A secondary per-protocol analysis of the primary outcome was performed among participants who adhered to the study protocol (i.e., excluding withdrawn patients, as shown in Fig. 1). Spontaneous pregnancies were included in the analysis.

Secondary outcomes were analyzed with the chi-square or Fisher exact tests if categorical (cycle cancellation due to poor ovarian response, cycles with embryo transfer, number of cycles with frozen supernumerary embryos) or with the independent Student’s t test or Mann–Whitney test if continuous (number of oocytes, number and quality of embryos and endocrine parameters). If a subject did not reach a certain stage in IVF treatment, zero values were imputed for the primary outcome (i.e. pregnancy outcomes were set to not pregnant). For other outcomes, missing values were not imputed. Results for the secondary outcomes, including the width of confidence intervals, were not adjusted for multiplicity.
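The imputation rule for the primary outcome means that participants who never reached the relevant stage stay in the intention-to-treat denominator, counted as not pregnant. The hypothetical sketch below contrasts this with a complete-case calculation; the outcome list is invented for illustration (None marks a participant who did not reach the stage at which pregnancy is assessed), and the function names are hypothetical.

```python
from typing import Optional, Sequence


def itt_rate(outcomes: Sequence[Optional[bool]]) -> float:
    """Intention-to-treat rate: a missing outcome (None) is imputed as not pregnant."""
    imputed = [o if o is not None else False for o in outcomes]
    return sum(imputed) / len(imputed)


def complete_case_rate(outcomes: Sequence[Optional[bool]]) -> float:
    """For contrast only: drop participants without an observed outcome."""
    observed = [o for o in outcomes if o is not None]
    return sum(observed) / len(observed)


if __name__ == "__main__":
    # Hypothetical arm of 10 participants; None = cycle cancelled before transfer
    arm = [True, False, None, None, True, False, False, None, False, False]
    print(itt_rate(arm), complete_case_rate(arm))  # 0.2 and about 0.286
```

Keeping non-completers in the denominator yields the more conservative rate and preserves the comparison between the randomized groups.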

Pre-specified subgroup analyses were performed by age group (<36, 36–39 and ≥40 years old). Analyses were performed with SPSS (Version 29.0.0.0, IBM Corporation) or R (Version 4.5.0, The R Foundation for Statistical Computing). The study statisticians were blinded to group assignment.

A planned interim analysis of the primary outcome was performed by an independent statistician once primary outcome data were available for 70% of the intended sample size, with the results reviewed by the DSMB. If the conditional power was ≥80% then we would continue with the planned sample size; if it was 40–80% the sample size would be increased to reach a conditional power of 80%; if the conditional power was 20–<40%, we would proceed with the planned sample size; if the conditional power was <20%, the study would be terminated for futility. This was included in the study design to prevent unnecessary continued recruitment and ongoing exposure of participants to an off-label medication in the event of futility. Trial statisticians were blinded to group allocation during the interim analysis.
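The pre-specified interim decision rule can be written as a simple mapping from conditional power to action. How values falling exactly on the 40% and 80% boundaries were handled is an assumption based on the wording above, and the function name is hypothetical.

```python
def interim_decision(conditional_power: float) -> str:
    """Map conditional power (as a fraction, 0-1) to the pre-specified action."""
    if not 0.0 <= conditional_power <= 1.0:
        raise ValueError("conditional power must be between 0 and 1")
    if conditional_power >= 0.80:
        return "continue with planned sample size"
    if conditional_power >= 0.40:
        return "increase sample size to reach 80% conditional power"
    if conditional_power >= 0.20:
        return "continue with planned sample size"
    return "stop for futility"


if __name__ == "__main__":
    # Conditional power values reported in the Results: 0.08% and 6.2%
    for cp in (0.0008, 0.062):
        print(cp, interim_decision(cp))
```

Both conditional power values reported in the Results fall in the futility band.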

The trial was prospectively registered at ClinicalTrials.gov (NCT02418572).

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.