Data-driven subtypes of polycystic ovary syndrome and their association with clinical outcomes

Gao, Xueying; Zhao, Shigang; Du, Yanzhi; Yang, Ziyi; Tian, Ye; Zhao, Junli; Yuan, Xi; Santos, Betania R.; Wei, Daimin; Cui, Linlin; Yan, Junhao; Qin, Yingying; Shi, Yuhua; Tang, Rong; Sun, Yun; Hu, Jingmei; Ding, Lingling; Song, Xueru; Ha, Lingxia; Li, Jingyu; Zhang, Heping; Spritzer, Poli Mara; Yildiz, Bulent O.; Stener-Victorin, Elisabet; Yong, Eu-Leong; Ou, Xiang-Hong; Legro, Richard S.; Zhao, Han; Chen, Zi-Jiang

doi:10.1038/s41591-025-03984-1

Article
Open access
Published: 29 October 2025

Xueying Gao ORCID: orcid.org/0000-0002-5868-5378^{1,2,3,4,5,6,7,8,9}^na1,
Shigang Zhao ORCID: orcid.org/0000-0002-7407-9220^{1,2,3,4,5,6,7}^na1,
Yanzhi Du^8,9^na1,
Ziyi Yang ORCID: orcid.org/0000-0002-4138-5598¹^na1,
Ye Tian^10,11^na1,
Junli Zhao¹²^na1,
Xi Yuan¹³,
Betania R. Santos¹⁴,
Daimin Wei^{1,2,3,4,5,6,7},
Linlin Cui ORCID: orcid.org/0000-0001-7659-9169^{1,2,3,4,5,6,7},
Junhao Yan^{1,2,3,4,5,6,7},
Yingying Qin ORCID: orcid.org/0000-0002-0319-7799^{1,2,3,4,5,6,7},
Yuhua Shi¹,
Rong Tang¹,
Yun Sun^8,9,
Jingmei Hu^{1,2,3,4,5,6,7},
Lingling Ding¹,
Xueru Song^10,11,
Lingxia Ha¹²,
Jingyu Li¹,
China Women’s Reproductive Metabolic Network,
Heping Zhang ORCID: orcid.org/0000-0002-0688-4076¹⁵,
Poli Mara Spritzer ORCID: orcid.org/0000-0002-6734-7688¹⁴,
Bulent O. Yildiz ORCID: orcid.org/0000-0003-1797-7662¹⁶,
Elisabet Stener-Victorin ORCID: orcid.org/0000-0002-3424-1502¹⁷,
Eu-Leong Yong ORCID: orcid.org/0000-0001-6511-770X¹³,
Xiang-Hong Ou¹⁸,
Richard S. Legro¹⁹,
Han Zhao ORCID: orcid.org/0000-0001-9515-7534^{1,2,3,4,5,6,7} &
…
Zi-Jiang Chen ORCID: orcid.org/0000-0001-6637-6631^{1,2,3,4,5,6,7,8,9}

Nature Medicine volume 31, pages 4214–4224 (2025)

An Author Correction to this article was published on 20 November 2025

This article has been updated

Abstract

Polycystic ovary syndrome (PCOS) is a common and heterogeneous endocrine disorder that affects 11%–13% of women worldwide, with profound implications for fertility and long-term metabolic health. Here we identify four reproducible subtypes—PCOS with hyperandrogen, with obesity, with high-sex hormone-binding globulin and with high-luteinizing hormone–anti-Müllerian hormone—through unsupervised clustering of 9 clinical variables in 11,908 affected women, validated across 5 international cohorts. Prospective 6.5-year follow-up and in vitro fertilization treatment data revealed distinct reproductive and metabolic trajectories: hyperandrogenic PCOS showed the highest risk of second trimester pregnancy loss and dyslipidemia incidence; PCOS with obesity exhibited the most severe metabolic complications, lowest live birth rates and highest PCOS remission rate; PCOS with high-sex hormone-binding globulin demonstrated favorable reproductive outcomes and the lowest incidence of diabetes and hypertension; and PCOS with high-luteinizing hormone–anti-Müllerian hormone had the greatest risk of ovarian hyperstimulation and the lowest PCOS remission rate. These findings advance understanding of PCOS heterogeneity and provide a framework for subtype-based risk stratification and personalized management.

Main

Polycystic ovary syndrome (PCOS) is a reproductive, metabolic and psychological condition with impacts across the lifespan, affecting an estimated 11% to 13% of women worldwide. From 1990 to 2019, its global age-standardized point prevalence increased by 30.4%, partially driven by the expansion and evolution of diagnostic criteria, which substantially broadened the definition of PCOS^1,2. PCOS not only disrupts reproductive function, but also has detrimental effects on metabolism³. Women with PCOS are three times more likely to experience obesity than their healthy counterparts⁴. Moreover, they have a higher prevalence of insulin resistance (26.7%)⁵ with a fourfold increased risk of type 2 diabetes (T2DM) before the age of 40⁶. In addition, PCOS is associated with an elevated risk of cardiovascular disease, metabolic dysfunction-associated steatotic liver disease (MASLD), psychological disorders, baldness and other comorbidities, thereby imposing a considerable societal burden throughout an individual’s lifespan⁶.

PCOS exhibits a heterogeneous clinical presentation characterized by a wide array of symptoms, leading to varying potential consequences for affected women⁷. This variation in clinical phenotypic presentation leads to challenges in diagnosis, treatment and prevention, contributing to dissatisfaction among both healthcare practitioners and patients alike⁸. The lack of agreement on clinical subtypes, specifically because of an absence of robust evidence for distinctive subtypes, exacerbates healthcare costs for PCOS subjects, estimated to reach US$8 billion annually in the USA alone⁹. Some studies attempt to classify PCOS based on specific symptoms and metabolic comorbidities¹⁰. However, in the era of precision medicine, it is important to establish more precise and interpretable subtypes of the disease enabling distinct diagnosis and personalized therapy¹¹. We hypothesize that the identification and validation of precise subtypes of women with PCOS in a broader population will improve understanding of their clinical consequences and the design of targeted interventions.

Our current study aimed to identify a set of commonly measured variables that could distinguish clinical subtypes of PCOS using unsupervised cluster analysis in a large discovery cohort, and to validate the subtypes in five independent cohorts of different ethnicities from China, the USA, Europe, Singapore and Brazil. In addition, we compared reproductive and metabolic variables among the subtypes over a 6.5-year follow-up, as well as in vitro fertilization (IVF) outcomes and pregnancy complications.

Results

Identification and validation of PCOS subtypes

In the discovery cohort, a total of 11,908 of the 47,071 women with PCOS who were not receiving therapy at the first visit were included in the analyses (Fig. 1 and Extended Data Table 1). To identify PCOS subtypes, we initially included 29 clinical variables relevant for PCOS. Following correlation analysis, principal component analysis and exploratory factor analysis, nine features were selected for subsequent clustering. Using unsupervised clustering, four subtypes were identified in the discovery cohort (Fig. 2a,b). The Jaccard scores for all four subtypes exceeded 0.79, suggesting good stability of clustering (Supplementary Table 1).
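As an illustration of the stability measure mentioned above, the following minimal sketch shows how a per-cluster Jaccard score can be computed by comparing a reference clustering with a clustering of resampled data. It assumes a bootstrap-style comparison in which each reference cluster is matched to its most similar resampled cluster; this section does not spell out the exact procedure used in the study, and the function names and participant IDs are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| of two collections of member IDs."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def cluster_stability(reference, resampled):
    """For each reference cluster, return its best Jaccard match
    among the clusters found in the resampled data."""
    return {
        label: max(jaccard(members, other) for other in resampled.values())
        for label, members in reference.items()
    }


# Toy example with hypothetical participant IDs (not study data).
reference = {"A": [1, 2, 3, 4], "B": [5, 6, 7, 8]}
resampled = {0: [1, 2, 3, 5], 1: [4, 6, 7, 8]}
scores = cluster_stability(reference, resampled)
```

In practice such scores would be averaged over many resamples; values close to 1 indicate that a cluster is recovered nearly unchanged.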

**Fig. 2: Classification and validation of PCOS subtypes.**

Each subtype exhibited distinct clinical characteristics (Fig. 2c and Extended Data Table 1). The hyperandrogenic subtype (HA-PCOS, 25%) was characterized by high testosterone–dehydroepiandrosterone sulfate (DHEA-S), along with mild metabolic disorders. The subtype with obesity (OB-PCOS, 26%) was characterized by higher body mass index (BMI), fasting glucose and fasting insulin level, with the highest prevalence of T2DM (7.9%), dyslipidemia (75.3%) and hypertension (28.7%). The high-sex hormone-binding globulin subtype (SHBG-PCOS, 26%) had the highest sex hormone-binding globulin (SHBG) level and lowest BMI among the four subtypes, and was primarily characterized by lower luteinizing hormone (LH) and testosterone levels. The high-LH–AMH subtype (LH-PCOS, 23%) was distinguished by elevated levels of LH, follicle-stimulating hormone (FSH) and anti-Müllerian hormone (AMH).

To validate our clustering results, we replicated them in 5 independent validation cohorts from China (3,081 women with PCOS; 1,476 with eligible data), USA (750 PCOS cases; 593 with eligible data), Europe (572 PCOS cases; 197 with eligible data), Singapore (428 PCOS cases; 127 with eligible data) and Brazil (100 PCOS cases; 85 with eligible data) (Extended Data Table 2). The same unsupervised clustering approach, using k-means with the same parameter setting as the discovery cohort, was applied for each validation cohort. These analyses identified the same four subtypes in each validation cohort (Fig. 2d–h).

To ensure the model could be applied to each woman with PCOS in clinic, we conducted ridge regression analysis in the discovery cohort, which generated four ridge regression equations to compute the probabilities that a given participant corresponded to each of those four subtypes. To verify the accuracy and reliability of these ridge regression equations, nine features from each participant in the validation cohorts were then used as inputs to these four equations. We then calculated area under the curve (AUC) values from receiver operating characteristic (ROC) analysis of each subtype to evaluate the accuracy of the predictions obtained by the ridge regression equations based on comparison with subtype labels obtained via k-means (as reference). In the Chinese validation cohort, the average AUC of the four subtypes was 0.88 (0.87 to 0.90) (Fig. 2i); an average AUC of 0.92 (0.83 to 0.95) was obtained in the US validation cohort (Fig. 2j), 0.88 (0.88 to 0.89) in the European cohort (Fig. 2k), 0.95 (0.90 to 0.98) in the Singapore cohort (Fig. 2l) and 0.82 (0.80 to 0.86) in the Brazilian cohort (Fig. 2m). The sensitivity and specificity in each validation cohort are shown in Supplementary Table 2. Considering the ethnic heterogeneity of the US cohort, we further stratified this cohort into sub-populations of European (n = 463) and African (n = 82) descent for separate analysis, which resulted in an AUC value of 0.88 for both sub-populations (Supplementary Fig. 1). These results thus demonstrated that the ridge regression equations could indeed predict PCOS subtype across different geographic and ethnic populations. We also repeated the above analysis with a random forest method, but obtained lower AUCs than with ridge regression in all validation cohorts.
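The per-subtype evaluation described above can be sketched as a one-vs-rest AUC calculation: for each subtype, the predicted probability is scored against a binary label indicating whether the reference (k-means) assignment is that subtype. The sketch below is a minimal illustration using the rank-based definition of AUC; the function names, probabilities and labels are hypothetical and are not taken from the study.

```python
def auc(scores, labels):
    """AUC as the probability that a random positive case is scored above
    a random negative case (ties count as 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def one_vs_rest_auc(prob_by_subtype, reference_labels):
    """Compute one AUC per subtype, treating that subtype as the positive class."""
    return {
        subtype: auc(probs, [int(lbl == subtype) for lbl in reference_labels])
        for subtype, probs in prob_by_subtype.items()
    }


# Toy example (hypothetical values): reference labels and predicted probabilities.
reference_labels = ["HA", "OB", "HA", "OB"]
prob_by_subtype = {
    "HA": [0.9, 0.2, 0.6, 0.7],
    "OB": [0.1, 0.8, 0.4, 0.3],
}
aucs = one_vs_rest_auc(prob_by_subtype, reference_labels)
mean_auc = sum(aucs.values()) / len(aucs)
```

Averaging the per-subtype values gives a single summary figure per cohort, analogous to the average AUCs reported above.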

Longitudinal follow-up for the four PCOS subtypes

To investigate the long-term complications and disease remission of the four subtypes, we performed longitudinal follow-up with a median duration of 6.5 years. A total of 4,542 women with PCOS diagnosed between 2014 and 2018 in the discovery cohort were followed up by telephone interviews (Fig. 1 and Extended Data Table 3) and 523 of them voluntarily underwent physical examinations (Fig. 3, Extended Data Figs. 1–3 and Extended Data Table 4). The average age of these participants was 34 years.

**Fig. 3: Follow-up outcomes for the four PCOS subtypes.**

First, remission of PCOS was evaluated at the time of the follow-up visit. Based on the telephone interview, the percentages of women who had received treatment for PCOS or were diagnosed with PCOS in the past six months among participants who had routine physical examinations were: 78.0% in the HA-PCOS subtype, 61.4% in OB-PCOS, 61.6% in SHBG-PCOS and 80.6% in LH-PCOS (Extended Data Table 3). Similarly, according to physical examination data at the follow-up visit, the percentages of women in the four subtypes who still met the Rotterdam criteria were: 67.2% for HA-PCOS, 50.9% for OB-PCOS, 52.8% for SHBG-PCOS and 74.8% for LH-PCOS (Fig. 3a).

Second, the hyperandrogenic, ovulatory and polycystic ovarian conditions were assessed at follow-up, and remission of these diagnostic features varied notably among the subtypes (Extended Data Fig. 2). The percentages of women with PCOS who remained hyperandrogenic were 56.0%, 31.6%, 44.4% and 55.6% for HA-PCOS, OB-PCOS, SHBG-PCOS and LH-PCOS, respectively. In terms of the other two clinical features, SHBG-PCOS had the lowest incidence of persistent oligo-ovulation or anovulation (57.7%) and polycystic ovaries (65.4%) among the four subtypes.

Third, the cumulative incidence of chronic metabolic complications was also compared among the four subtypes. HA-PCOS had the highest incidence of dyslipidemia (24.4%), whereas OB-PCOS exhibited the highest incidence of T2DM (16.0%). The incidence of hypertension was higher in HA-PCOS (11.1%) and OB-PCOS (14.6%) than in the other two subtypes. Remarkably, SHBG-PCOS demonstrated better metabolic characteristics, with the lowest incidence of T2DM and hypertension (Fig. 3b–d).

During the follow-up period, all subtypes showed an increase in BMI (Extended Data Fig. 1). OB-PCOS continued to exhibit the most unfavorable glucose metabolism, along with the highest rates of overweight and obesity. Notably, hepatic steatosis analysis revealed that OB-PCOS had the highest prevalence of MASLD (85.8%), followed by HA-PCOS (77.2%) (Extended Data Fig. 3).

IVF outcomes among the four PCOS subtypes

There were 5,418 women with PCOS who received IVF treatment in the discovery cohort. Additional information on the controlled ovarian hyperstimulation, embryo culture and transfer protocols for each subtype can be found in Supplementary Table 3.

To investigate the primary IVF outcomes among the four subtypes, we compared the live birth rate, pregnancy rate and pregnancy loss rate (Fig. 4, Table 1 and Supplementary Table 4). The live birth rates for HA-PCOS, OB-PCOS, SHBG-PCOS and LH-PCOS were 50.6%, 48.9%, 56.3% and 54.8%, respectively. Notably, women with SHBG-PCOS and LH-PCOS had higher live birth rates, even exceeding that of the control group (53.8%).

Table 1 IVF outcomes

**Fig. 4: Odds ratios of IVF outcomes for the four PCOS subtypes.**

The clinical pregnancy rates for the four subtypes were 66.0% in HA-PCOS, 62.9% in OB-PCOS, 67.4% in SHBG-PCOS and 66.7% in LH-PCOS, all higher than the control group (60.6%). The total pregnancy loss rates were also significantly higher in the four subtypes, at 31.5%, 32.5%, 24.7% and 27.7%, respectively, compared with the control (19.8%) (Table 1). HA-PCOS displayed the highest clinical pregnancy loss rate (23.3%), particularly in the second trimester (odds ratio (OR) 7.32, 95% confidence interval (95% CI) 4.94–10.85) (Fig. 4 and Supplementary Table 4). OB-PCOS had the poorest outcomes, with the lowest clinical pregnancy rate (62.9%) and the highest total pregnancy loss rate (32.5%). Conversely, SHBG-PCOS had the lowest total pregnancy loss (24.7%), which contributes to the best pregnancy outcomes among the four subtypes (Table 1). Because BMI is one of the variables used for clustering, a potential confounding effect of BMI on fertility outcomes cannot be excluded at this time.

Regarding maternal complications, all women with PCOS exhibited an increased risk of moderate to severe ovarian hyperstimulation syndrome (OHSS) (all OR >1) (Fig. 4 and Supplementary Table 4). Notably, LH-PCOS had the highest risk of OHSS compared with the control group (OR 7.44, 95% CI 4.63–11.96). OB-PCOS showed a higher risk of gestational diabetes (OR 1.70, 95% CI 1.20–2.39). Gestational hypertension was more prevalent in OB-PCOS and HA-PCOS, with ORs of 2.83 and 2.63, respectively. Moreover, HA-PCOS had the highest risk of premature rupture of membranes (OR 2.91, 95% CI 1.56–5.43).

Regarding neonatal complications, women with PCOS who underwent IVF generally had lower rates of small for gestational age (SGA) infants than the control group, except for HA-PCOS (OR 1.00, 95% CI 0.58–1.70). On the other hand, the incidence of large for gestational age (LGA) was higher in all four PCOS subtypes than in the control group, with rates exceeding 20%. OB-PCOS had the highest risk of LGA, with a rate reaching 38.1% and an OR of 2.14 (Table 1, Fig. 4 and Supplementary Table 4).
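For readers less familiar with the odds ratios and confidence intervals reported in this section, the sketch below shows how an unadjusted OR and its Wald 95% CI are obtained from a 2×2 table. This section does not state whether the reported ORs were adjusted for covariates, so this is only the simplest form of the calculation; the function name and counts are hypothetical.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Unadjusted odds ratio with a Wald confidence interval.

    a: subtype group with the event     b: subtype group without the event
    c: control group with the event     d: control group without the event
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) from the four cell counts.
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se)
    upper = math.exp(math.log(odds_ratio) + z * se)
    return odds_ratio, lower, upper


# Hypothetical counts, not study data.
or_value, ci_low, ci_high = odds_ratio_ci(a=20, b=80, c=10, d=90)
```

An interval that includes 1, as for SGA in HA-PCOS above, indicates no clear difference from the control group.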

Outcomes of different IVF strategies in each PCOS subtype

To provide clinicians with exploratory IVF strategies for each PCOS subtype, we conducted in-depth analyses focusing on key clinical concerns: embryo transfer strategies, endometrial preparation protocols and ovarian hyperstimulation strategies (Table 2 and Extended Data Table 5).

Table 2 IVF outcomes with different protocols in each subtype

Among embryo transfer strategies, the transfer of fresh embryos was associated with a lower live birth rate (OR 0.71, 95% CI 0.51–0.99, P = 0.042) and clinical pregnancy rate (OR 0.65, 95% CI 0.46–0.92, P = 0.016) compared to frozen embryo transfer in women with the HA-PCOS subtype (Table 2). No significant differences were observed in live birth rates, clinical pregnancy rates or total pregnancy loss rates between fresh and frozen embryo transfer in the other three subtypes, indicating that women with HA-PCOS may benefit from frozen embryo transfer.

Furthermore, when comparing ovarian stimulation (OS) cycles with natural cycle (NC) or hormone replacement therapy (HRT) for endometrial preparation before frozen embryo transfer, our analysis revealed that HRT yielded the lowest live birth rate and clinical pregnancy rate in women with the LH-PCOS subtype, as well as the highest pregnancy loss risk in both OB-PCOS and LH-PCOS groups (Table 2). A summary of the clinical features of each subtype is shown in Extended Data Table 6.

Discussion

Accurate and globally accepted classification of PCOS clinical subtypes is essential in addressing this complex disorder associated with reproductive, metabolic as well as psychological dysfunction, although the latter was not addressed in this study. We identified four distinct PCOS subtypes (HA-PCOS, OB-PCOS, SHBG-PCOS and LH-PCOS) in a large discovery cohort. These subtypes were validated in five independent cohorts with different ethnicities around the world. Distinct reproductive and metabolic comorbidities and IVF treatment outcomes were identified in the subtypes.

As with other complex conditions, PCOS has been linked to several comorbidities, including fertility and pregnancy complications, hyperinsulinemia, T2DM and cardiovascular disorders. Accordingly, PCOS has no single diagnostic marker to provide a gold standard¹². The Rotterdam criteria have indisputably improved prognostic capabilities for predicting reproductive outcomes in women with PCOS. However, they may overlook some key heterogeneities that result in potentially severe complications, which can be averted by incorporating clinical phenotypic data, and determining the women at risk of such complications based on their complex PCOS subtype. For the above purposes, we have developed PcosX (www.pcos.org.cn), a web-based tool designed to assign women with PCOS to specific subtypes, provided the necessary clinical variables have been measured.

Ref. ¹³ proposed a metabolic, a reproductive and an indeterminate sub-phenotype of PCOS. This classification system was established using 1,156 women from the USA who were diagnosed with PCOS according to the relatively strict National Institutes of Health (NIH) criteria. Here we established an unsupervised clustering model and defined four clinical subtypes of PCOS using the broader Rotterdam diagnostic criteria in a cohort ten times larger than that used for PCOS classification in ref. ¹³. We then validated the model in five PCOS cohorts of different ethnicities from different geographic locations. Comparison with this previously developed PCOS subtyping model showed that the metabolic subtype defined in ref. ¹³ largely overlapped with our HA-PCOS and, to a lesser extent, OB-PCOS subtype, which may be due to the hyperandrogenism requirement in the NIH criteria. However, our follow-up studies show that the HA-PCOS subtype clearly differs from a simple metabolic subtype. By contrast, the indeterminate group was largely distributed between our HA-PCOS and LH-PCOS subtypes, possibly because of the inclusion of AMH as a variable. Our clustering analysis included AMH among the nine features, serving as an alternative to antral follicle count for assessing polycystic ovarian morphology (PCOM), as recommended in the recently updated International Evidence-based Guideline of PCOS¹⁴. Furthermore, although the different subtypes of PCOS and components of the Rotterdam criteria can predict reproductive outcomes to some extent, key predictors such as insulin and AMH are not included in the Rotterdam criteria¹². We adopted a different approach by integrating AMH into our analysis and studying a diverse cohort of women with PCOS. This allowed us to identify four subtypes with unique reproductive, metabolic and IVF treatment outcomes on longitudinal follow-up.

Importantly, the four PCOS subtypes are closely associated with distinct clinical characteristics and outcomes. The HA-PCOS subtype is linked to metabolic diseases including obesity, MASLD, T2DM, hypertension and dyslipidemia during follow-up. Notably, HA-PCOS demonstrated a higher incidence of dyslipidemia and severe MASLD compared to the OB-PCOS subtype, which is in line with previous reports demonstrating that hyperandrogenism is associated with an increased risk of lipid dysfunction¹⁵. Moreover, the HA-PCOS subtype was associated with an increased risk of second trimester pregnancy loss and premature rupture of membranes compared to other subtypes. Indeed, previous reports have shown that higher maternal testosterone levels are linked to low birth weight in offspring and an increased risk of preterm delivery, accompanied by fetal membrane rupture and cervical dilatation¹⁶. Thus, higher androgen levels might increase maternal plasma estrogen, oxytocin and amnion fibronectin levels, leading to premature rupture of membranes.

OB-PCOS is a metabolic subtype with the highest rate of PCOS remission during follow-up. As the disease progresses, women with OB-PCOS primarily develop metabolic disorders such as T2DM and hypertension, while their reproductive endocrine abnormalities tend to become less pronounced. This suggests that metabolic triggers contribute to the reproductive alterations such as low IVF success rates due to poor oocyte quality, reduced implantation rates, high pregnancy loss rates, preterm birth and ultimately low live birth rates, all of which are linked to the OB-PCOS subtype. Therefore, the long-term complications of metabolic syndrome should be emphasized in the management of OB-PCOS. Women with OB-PCOS are also more prone to pregnancy complications, including hypertensive disorders, gestational diabetes, preterm birth and cesarean delivery¹⁷. Although it is difficult for these women to conceive or have a healthy delivery, medications used during IVF or throughout pregnancy may improve their chances of conception and better outcomes.

SHBG-PCOS represents the mildest form of PCOS and has the best IVF outcomes, although it often presents with irregular cycles or PCOM. It appears to exhibit relatively mild neuroendocrine, androgenic and metabolic features, with primary abnormalities related to ovulatory dysfunction. SHBG, produced by the liver, binds to circulating sex steroids, affecting their bioavailability by sequestering androgens and estrogens from biological action. Several clinical studies have highlighted the potential role of SHBG in maintaining glucose homeostasis, because low levels of SHBG are strongly associated with an increased risk of T2DM¹⁸. Hence, the clinical features and IVF outcomes of the SHBG-PCOS subtype are almost the opposite of those observed in the OB-PCOS subtype. It is also worth considering that the follicle cutoff of 12 used in this study is below the current recommendation threshold of 20 follicles, meaning that this subtype may include women who would not meet the diagnostic criteria of the current International Evidence-based Guideline of PCOS. In addition, the lack of continuous menstrual cycle data may have influenced the classification of participants in this subtype, highlighting a need for further investigation in future research.

LH-PCOS showed the worst disease remission of PCOS at follow-up, indicating that the effects of high LH and AMH levels on PCOS may not be ameliorated by pregnancy or IVF procedures. Moreover, special attention should be given to the LH-PCOS subtype, because it is associated with the most typical reproductive characteristics of PCOS and carries an exceptionally high risk of OHSS. This heightened risk is likely due to the significantly elevated AMH levels in LH-PCOS, which are strongly associated with antral follicle count and serve as a reliable predictor of OHSS in IVF cycles. Although LH levels are also elevated in LH-PCOS, their ability to predict OHSS is relatively limited. Instead, it is possible that variants or dysregulation of the LH receptor, rather than the absolute levels of LH, may play a key role in the increased likelihood of OHSS¹⁹.

Fertility outcomes and complications are a major concern for infertile women with PCOS. Different outcomes from various intervention strategies, particularly when tailored to specific PCOS subtypes, were assessed to potentially provide clinicians with more IVF options for managing different subtypes. Our previous study²⁰ showed that frozen embryo transfer results in a higher live birth rate than fresh embryo transfer in women with PCOS. In this study, we further found that the benefits of frozen embryo transfer are specific to women with the HA-PCOS subtype. For the other three PCOS subtypes, the more cost-effective fresh embryo transfer may serve as a preferred option.

In general, PCOS is a highly complex syndrome, exhibiting substantial heterogeneity in its etiology and clinical presentations. It is closely associated with a variety of adverse outcomes. To date, its etiology and symptoms have defied a singular explanation, making it challenging to apply trial data or individual clinical features for directly predicting outcomes. Moreover, no existing model has been proven to accurately forecast outcomes in PCOS. Our findings suggest that clustering provides a more comprehensive understanding of PCOS than examining isolated clinical features. This clustering model aligns with the multi-faceted metabolic and reproductive nature of the disease. Reliable subtypes serve as the foundation of precision medicine, and identifying these genuine subtypes requires analyzing data on a population scale, defining robust biomarkers for each subtype, refining diagnoses based on these subtype-specific markers and ultimately predicting treatment responses and disease remission through extensive research and validation.

Because this study represents a step in the long journey toward establishing and implementing clinical precision medicine practices in PCOS, several limitations should be considered when interpreting the results. First, forced patient clustering can create arbitrary groupings and discard valuable continuous information. In our study, we used k‑means clustering for this large dataset, while acknowledging that other unsupervised clustering or machine‑learning methods could also be applied for PCOS classification. Different techniques may yield varying results, especially for cases near cluster boundaries where sub‑phenotypic features overlap. By rigorously validating the clusters across international cohorts and assessing their clinical interpretability, we ensure that the resulting clusters are both statistically robust and clinically meaningful, and future methodological advances may further refine such classifications. In addition, we did not include categorical variables such as menstrual cycle regularity in the analysis. This decision was based on the self-reported nature of the data in our cohort, which introduced potential recall bias and reduced reliability. In addition, as a nonordered categorical variable, menstrual cycle regularity poses challenges for integration into current clustering algorithms without compromising interpretability. We acknowledge that this exclusion may have impacted the clustering results, because menstrual cycle regularity is a key feature of PCOS. Future research should prioritize the inclusion of accurate and objective measures of menstrual cycle regularity, such as data collected via digital tracking apps, to enhance the robustness of clustering analyses.

This study also has limitations related to recruitment methods. A substantial proportion of the participants were lost during the data completion phase. In addition, most participants—excluding those from the Turkish and Brazilian cohorts—were recruited from specialized reproductive centers. This may have introduced selection bias, favoring certain PCOS subtypes. The relatively small size of the validation cohorts from other countries further highlights the need for future validation in larger and more diverse populations to confirm the model’s applicability across different ethnicities. Moreover, follow-up data on disease remission were not available for the validation cohorts. For consistency across the study, we used the diagnostic threshold of 12 follicles per ovary for PCOM, in line with the 2003 Rotterdam criteria. This threshold is lower than the updated recommendations in more recent guidelines, which take into account improvements in ultrasound sensitivity. This discrepancy may limit the comparability of our findings with studies using updated diagnostic criteria. Also, follow-up via telephone interviews may introduce recall bias and lack clinical verification. To mitigate this, we incorporated in-person assessments, including blood draws, for a substantial proportion of participants to ensure objective data collection. However, we acknowledge that not all participants underwent in-person evaluations, which may limit the comprehensiveness of some follow-up data. Ongoing efforts will integrate additional in-person assessments to address this limitation and enhance the robustness of our findings. The control group was not included in the follow-up, precluding direct comparison of PCOS subtypes with controls. In addition, considering the dietary and geographical differences, the definition of hyperlipidemia in this study was based on Chinese criteria, which may differ in certain aspects from other international criteria. 
Lastly, it is important to note that when considering clinical applications, although our findings primarily focus on IVF outcomes, they do not include data on lifestyle interventions, ovulation induction or other first-line treatments for PCOS. Hence, the pregnancy outcomes reported here may not reflect the general outcomes for all women with PCOS²¹. Further research is needed to explore how this classification can be applied to other treatment strategies and their impact on patient outcomes.

Substantial work remains in both clinical and basic research to further develop and refine this classification. Because some of the variables used in the present model are not routinely recommended in current guidelines, sensitivity, specificity and cost evaluations need to be considered in the future application. Future studies should investigate continuous measures of ovarian or menstrual cycle dysfunction, genetic structures, epigenetic features and proteomic or metabolomic biomarkers for each subtype. Identifying additional variables could help refine cluster classifications, thereby enhancing our understanding of this complex yet often neglected syndrome, PCOS. Validation of these subtypes must be conducted in more diverse, unbiased and nonselected community-based populations to ensure generalizability. Furthermore, the statistical analysis of treatment outcomes should include more rigorous stratification and adjustments during follow-up. Incorporating contemporary diagnostic criteria, performing economic evaluations and developing robust translation tools will be essential for assessing the clinical utility of these clusters. These steps will ultimately support their integration into clinical practice and improve precision medicine approaches for PCOS.

This study, however, has multiple strengths, one of which is that it combines PCOS cohorts from around the world. We propose a clinically relevant disease classification model, validated across diverse populations from different regions globally. For each subtype, we have identified its characteristics, highlighting the similarities and differences in the risk of metabolic disease and pregnancy complications, as well as the variations in assisted reproductive outcomes. Furthermore, through long-term follow-up, we have revealed changes in the reproductive and metabolic features of each subtype. This provides crucial scientific evidence for early identification and enables early interventions for disease prevention.

In conclusion, our study identifies four PCOS subtypes, each with unique reproductive, metabolic and prognostic characteristics. These subtypes provide important insights into the heterogeneity of PCOS and highlight the potential for more personalized treatment approaches in clinical practice. However, continued international collaboration and research are essential to address the limitations identified and to ensure the reliability and effectiveness of these subtypes for personalized patient care. This collaborative effort will be crucial in refining classification systems and therapeutic strategies, ultimately paving the way for more tailored and effective clinical management of PCOS.

Methods

Ethics statement

Study protocols were approved by the relevant ethics review committees (China, [2021] IRB-No.140; Singapore, 2011/01716-SRF0012)^22,23,24,25. All participants provided written informed consent. For information purposes, all cohorts were registered as an international multicenter study on clinicaltrials.gov (NCT06124391).

Study populations

This study involved a discovery cohort and five validation cohorts of participants with PCOS. All participants in the cohorts were between 20 and 45 years old. No transgender participants were included. PCOS was diagnosed using the broader Rotterdam diagnostic criteria²⁶, which require the presence of any two of the following: (1) menstrual cycle length of <21 days or >35 days, and/or fewer than 8 cycles per year; (2) hyperandrogenism defined as an elevated total testosterone level according to local laboratory criteria, and/or a modified Ferriman–Gallwey score ≥5; (3) the presence of 12 or more follicles measuring 2–9 mm in diameter in each ovary and/or an ovarian volume >10 ml as determined by ultrasound. All clinical assessments were performed and recorded at the time of diagnosis.

The discovery cohort, assembled at the Center for Reproductive Medicine, Shandong University, Shandong Province, China, comprised 47,071 women with PCOS diagnosed using the broader Rotterdam diagnostic criteria between December 2013 and June 2020.

Among the full cohort, 11,908 women were not receiving any therapy that could alter hormone levels or the ovulation cycle (such as oral contraceptives, metformin or weight loss regimens) at the time of their first visit and were therefore included in the subsequent statistical analysis. Women with PCOS who had received such therapies at the time of enrollment, and therefore did not meet the Rotterdam diagnostic criteria in terms of clinical manifestations or test parameters, were excluded (n = 35,163).

The validation cohorts were from China, the USA, Europe, Singapore and Brazil. The China validation cohort comprised 3,081 PCOS cases from various regions in China, including East China, South China, North China, Northwest China and Southwest China, excluding cases from the Center for Reproductive Medicine, Shandong University, Shandong Province, China. The US cohort had 750 participants with PCOS from the Pregnancy in Polycystic Ovary Syndrome II (PPCOS II, NCT00719186) trial, including European, African American and other races and ethnicities based on their country or area of birth²³. The Europe cohort had 392 cases from Turkey²⁷ and 180 cases from Sweden^28,29,30,31. The Singapore cohort consisted of 428 cases³². The Brazil cohort had 100 cases^24,25.

Feature selection

A total of 29 baseline clinical features that are routinely tested in the clinic and associated with endocrine or metabolic parameters in PCOS were initially examined in the discovery cohort (detailed in Extended Data Table 1). Categorical variables were excluded, because they cannot be accommodated in most unsupervised clustering methods. Features with more than 30% missing data, which might be caused by the subjective bias of the physicians, were excluded from further analysis. K-nearest neighbor imputation with k = 10 was performed to impute missing values for the remaining features, using the DMwR (v.0.1.4) package in R³³. Spearman correlation was performed to analyze the correlations among the clinical features. Principal component analysis was then used to evaluate each feature’s contribution to the overall variance in the dataset, identifying those variables that provided the most distinctive information relevant to clustering. To minimize covariance and redundancy, for each pair of strongly correlated features (correlation coefficient >0.7), the feature with the lower contribution (contribution <10% in the first three principal components) was also excluded. Finally, an exploratory factor analysis was performed. Features with factor loadings <0.4 were excluded to ensure that only those strongly associated with the identified factors were retained. This approach resulted in the selection of nine continuous variables (BMI, LH, FSH, testosterone, SHBG, DHEA-S, AMH, fasting insulin and fasting glucose) for clustering analysis.
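The first two screening steps described above (the 30% missing-data cut-off and the Spearman correlation check at 0.7) can be illustrated with a minimal sketch. This is not the authors' R code: the function names, the toy table and the use of the absolute correlation are assumptions made for illustration, and the sketch computes correlations on complete pairs rather than on imputed data. It only flags correlated pairs; choosing which member to drop would still rely on the principal component contributions.

```python
from statistics import mean

def missing_fraction(column):
    """Share of entries recorded as None."""
    return sum(v is None for v in column) / len(column)

def ranks(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            out[order[k]] = avg
        i = j + 1
    return out

def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den

def spearman(x, y):
    """Spearman correlation over rows where both values are present."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    xs, ys = zip(*pairs)
    return pearson(ranks(xs), ranks(ys))

def screen_features(table, max_missing=0.30, max_corr=0.70):
    """Return (kept feature names, strongly correlated pairs to review)."""
    kept = [name for name, col in table.items()
            if missing_fraction(col) <= max_missing]
    # Assumption: the 0.7 threshold is applied to the absolute correlation.
    flagged = [(a, b) for i, a in enumerate(kept) for b in kept[i + 1:]
               if abs(spearman(table[a], table[b])) > max_corr]
    return kept, flagged

# Hypothetical toy table, not study data.
toy = {
    "weight": [50, 60, 70, 80, 90],
    "bmi": [19, 22, None, 30, 33],
    "lh": [9, 3, 7, 1, 5],
    "sparse": [None, None, 1, 2, 3],
}
kept, flagged = screen_features(toy)
```

In this toy table, "sparse" is dropped for exceeding the missing-data cut-off, and the weight and BMI columns are flagged as a redundant pair.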

Feature measurements

Blood samples were drawn at the first consultation, and the fasting plasma glucose and insulin were analyzed after overnight fasting. A total of 29 baseline clinical features that related to endocrine or metabolic disorders were included in the study, comprising age, height, weight, BMI, systolic pressure, diastolic pressure, LH, FSH, estradiol, testosterone, prolactin, progesterone, SHBG, AMH, DHEA-S, thyroid stimulating hormone, alanine transaminase, aspartate transaminase, gamma-glutamyl transferase, albumin, triglyceride, total cholesterol, high-density lipoprotein, low-density lipoprotein, fasting glucose, fasting insulin, ultrasound antral follicle counts, menstrual cycle history and age at menarche.

Reproductive steroid hormone levels were measured by chemiluminescence immunoassay during days 1–3 of menstruation (corresponding to the early follicular phase) for ovulatory women or at any time for anovulatory women. The levels of AMH and biochemical parameters were measured by enzyme-linked immunosorbent assay. The antral follicle count was assessed by transvaginal ultrasound.

Unsupervised clustering analysis

An initial cluster analysis was conducted in the discovery cohort of Chinese PCOS cases. All continuous variables were normalized using z-score transformation. K-means clustering analysis was chosen for unsupervised clustering based on methods in a previous study³⁴. To determine the optimal number of clusters, a range of k values (from 3 to 8) was first evaluated by average silhouette width (with 30 iterations) using the fpc package (v.2.2-10) in R v.4.0.3. We found that k = 4 resulted in the highest average silhouette width in sensitivity analyses assessing the fit of individual objects in the classification, thus indicating that four clusters provided the most stable classifications (Supplementary Fig. 2). Ultimately, k = 4 was selected, with Manhattan distance as the dissimilarity measure, using the cclust function in the flexclust package (v.1.4-0) in R. Cluster stability was assessed by computing the Jaccard similarities through bootstrap resampling with 1,000 iterations³⁵.
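The clustering steps above can be sketched in a few lines. This is an illustrative reconstruction, not the authors' R code. It assumes that clustering with Manhattan distance updates each centroid to the coordinate-wise median of its members (the usual pairing for that distance), that z-scores use the sample standard deviation, and that the starting centers are supplied by the caller. The toy data are hypothetical.

```python
from statistics import mean, median, stdev

def zscore_columns(rows):
    """Standardize each column to mean 0 and sample SD 1."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [stdev(c) or 1.0 for c in cols]  # guard against constant columns
    return [[(v - m) / s for v, m, s in zip(r, mus, sds)] for r in rows]

def manhattan(a, b):
    """Sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))

def cluster_manhattan(rows, centers, max_iter=100):
    """Alternate nearest-center assignment and median update until stable."""
    centers = [list(c) for c in centers]
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(len(centers)), key=lambda j: manhattan(r, centers[j]))
            for r in rows
        ]
        if new_labels == labels:
            break
        labels = new_labels
        for j in range(len(centers)):
            members = [r for r, lab in zip(rows, labels) if lab == j]
            if members:  # keep the old center if a cluster empties
                centers[j] = [median(c) for c in zip(*members)]
    return labels, centers

def jaccard(members_a, members_b):
    """Jaccard similarity between two sets of participant indices."""
    a, b = set(members_a), set(members_b)
    return len(a & b) / len(a | b) if a | b else 1.0

# Hypothetical two-feature toy data with two obvious groups.
raw = [[1, 1], [1, 2], [2, 1], [10, 10], [10, 11], [11, 10]]
z = zscore_columns(raw)
labels, centers = cluster_manhattan(z, [z[0], z[3]])
```

For a stability check in the spirit of the bootstrap procedure, each original cluster's member indices would be compared with the best-matching cluster from a resampled run using `jaccard`, and the similarities averaged over the resamples.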

Subtype validation

Likewise, cases with eligible data from the validation cohorts were used to assess the reproducibility of the clustering results, as in previous studies^34,36,37. The same nine clinical features in each validation cohort were first standardized using z-score transformation. Each cohort was individually clustered with the nine standardized features using k-means with the same parameter setting (k = 4) as the discovery cohort.

Ridge regression analysis

To maximize the potential of the classification for clinical application in PCOS populations, we performed ridge regression analysis in the discovery cohort using the glmnet R package (v.4.1-3) with nine normalized model variables and multinomial outcome prediction for four subtypes. The lambda value was determined by assessing the model performance using the ‘cv.glmnet’ function with tenfold cross-validation. The ridge regression analysis output comprised four ridge regression equations, each of which corresponded to a subtype, of the form ‘Y = β₁x₁ + β₂x₂ + … + β₉x₉’ (in which x₁ to x₉ are the nine normalized features, β₁ to β₉ are their coefficients and Y is the probability that a given individual belongs to this subtype, with values ranging from zero to one).

To verify the accuracy of the ridge regression equations derived for each subtype in the discovery cohort, we used the nine features from each participant in the validation cohorts. The features were used as inputs for each of the four equations to compute the probability of a participant belonging to that subtype. Each individual was then assigned the subtype label corresponding to the highest of the four calculated probabilities.

Following this process, the subtype labels obtained in each validation cohort via unsupervised clustering with k-means served as a reference for ROC curve analysis. Each individual’s subtype label, assigned by predictive values obtained with the ridge regression equations, was then used to generate ROC curves. The AUC for each subtype was subsequently calculated to evaluate the accuracy of the predictions made by the ridge regression equations. The closer the AUC is to 1, the better the consistency between the results of the ridge regression and the unsupervised results.
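The assignment and evaluation steps can be illustrated as follows. This is a minimal sketch under stated assumptions, not the patented model: the coefficients are placeholders, the per-subtype linear scores are assumed to be converted to probabilities with a softmax (the usual form of a multinomial logistic model), and the one-vs-rest AUC is computed from predicted probabilities using the rank-based definition.

```python
import math

def softmax(scores):
    """Convert linear scores to probabilities that sum to 1."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def subtype_probabilities(x, intercepts, coefs):
    """One linear score per subtype, then softmax (assumed model form)."""
    scores = [b0 + sum(b * v for b, v in zip(bs, x))
              for b0, bs in zip(intercepts, coefs)]
    return softmax(scores)

def assign_subtype(probs):
    """Index of the subtype with the highest probability."""
    return max(range(len(probs)), key=probs.__getitem__)

def auc_one_vs_rest(scores, is_positive):
    """Probability that a positive case outranks a negative one (ties 0.5)."""
    pos = [s for s, p in zip(scores, is_positive) if p]
    neg = [s for s, p in zip(scores, is_positive) if not p]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Placeholder coefficients: four subtypes, nine features, all zero.
probs = subtype_probabilities([0.0] * 9, [0.0] * 4, [[0.0] * 9] * 4)

# Hypothetical predicted probabilities for one subtype and the
# k-means reference labels (True = reference cluster is this subtype).
example_auc = auc_one_vs_rest([0.9, 0.2, 0.8, 0.1], [True, True, False, False])
```

Repeating `auc_one_vs_rest` once per subtype, with the k-means labels as the reference, yields the four per-subtype AUC values described in the text.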

Longitudinal follow-up

A total of 9,601 women with PCOS from the discovery cohort were diagnosed between 2014 and 2018. Of these, 4,542 women voluntarily participated in telephone follow-ups between March 2021 and August 2024. In addition, 523 participants volunteered to undergo a physical examination, which included blood sample collection at the hospital. The median follow-up period for this cohort was 6.5 years (interquartile range 6.16–6.85). Remission of PCOS and its associated endocrine features was defined as the percentage of people who still had the disease or features at the time of the follow-up. The diagnostic criteria of PCOS were still based on the same Rotterdam criteria as the baseline. Use of hormonal contraception in the past six months was adjusted for in the hyperandrogenic and oligo-ovulation assessments.

Content of follow-up

The follow-up telephone interview included: (1) current height, weight and menstrual cycle status; (2) number of pregnancies and births within the past years; (3) whether the participant had received IVF treatment during the follow-up years; and (4) current disease status of PCOS, T2DM, hypertension and dyslipidemia based on whether the participant had received treatment or been diagnosed through physical examination within the last six months.

Physical examination included the following measurements: (1) anthropometric measurements such as height, weight, blood pressure, waist and hip circumference; (2) clinical evaluation for signs of hyperandrogenism; (3) laboratory tests for endocrine and metabolic disorders (same as at baseline); (4) ultrasound examination of the ovaries (antral follicle number and ovarian volume) and liver (fat content); and (5) medication use within the past six months.

Outcomes of chronic metabolic diseases

BMI was calculated as weight (kg) divided by the square of height (m²) and was classified based on criteria for the Chinese population³⁸: (1) normal weight, BMI of 18.5–23.9 kg m⁻²; (2) overweight, BMI of 24–27.9 kg m⁻²; and (3) obesity, BMI ≥ 28 kg m⁻². Degree of MASLD was assessed using abdominal ultrasound performed by an experienced ultrasonographer. Hypertension was defined as a systolic blood pressure ≥140 mmHg and/or diastolic blood pressure ≥90 mmHg. T2DM was defined as a fasting glucose ≥7.0 mmol l⁻¹ (ref. ³⁹). Dyslipidemia was defined as the presence of any of the following abnormalities: (1) total cholesterol ≥5.2 mmol l⁻¹; (2) triglycerides ≥1.7 mmol l⁻¹; (3) high-density lipoprotein <1.0 mmol l⁻¹; and (4) low-density lipoprotein ≥3.35 mmol l⁻¹ (ref. ⁴⁰).
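The thresholds above translate directly into simple rules. The sketch below is illustrative only: the function names are hypothetical, the BMI cut-offs of 24 and 28 are treated as lower bounds (an assumption for values that fall between the printed ranges, such as 23.95), and values below 18.5 are returned as unclassified because the text does not name that category.

```python
def bmi(weight_kg, height_m):
    """Body mass index in kg per square metre."""
    return weight_kg / height_m ** 2

def bmi_category(value):
    """Classify BMI with the Chinese-population cut-offs quoted above."""
    if value >= 28:
        return "obesity"
    if value >= 24:
        return "overweight"
    if value >= 18.5:
        return "normal weight"
    return "unclassified"  # below 18.5 is not named in the text

def has_hypertension(systolic_mmhg, diastolic_mmhg):
    """Systolic >=140 mmHg and/or diastolic >=90 mmHg."""
    return systolic_mmhg >= 140 or diastolic_mmhg >= 90

def has_t2dm(fasting_glucose_mmol_l):
    """Fasting glucose >=7.0 mmol/l."""
    return fasting_glucose_mmol_l >= 7.0

def has_dyslipidemia(total_chol, triglycerides, hdl, ldl):
    """Any one lipid abnormality is sufficient (all values in mmol/l)."""
    return (total_chol >= 5.2 or triglycerides >= 1.7
            or hdl < 1.0 or ldl >= 3.35)

# Hypothetical participant, not study data.
example_bmi = bmi(70, 1.60)
example_category = bmi_category(example_bmi)
```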

Outcomes of IVF

In the discovery cohort, IVF data were available for 5,418 participants. To compare the IVF outcomes of different PCOS subtypes with those of a control population, we included a control group that underwent IVF treatment in the same clinic and met one of the following criteria: (1) infertility due to fallopian tubal adhesion or blockage, without any PCOS features; (2) infertility due to oligozoospermia, asthenospermia or abnormal spermatozoa in their male partner.

The live birth rate, pregnancy rate and pregnancy loss were calculated as the primary outcomes of IVF. Conception is diagnosed by serum human chorionic gonadotropin ≥10 mIU ml⁻¹. Clinical pregnancy is defined as detection of a gestational sac in the uterine cavity. We define first trimester pregnancy loss as pregnancy loss before the end of the 11th gestational week by miscarriage or stillbirth, and second trimester pregnancy loss as pregnancy loss during the 12th gestational week to the end of the 27th gestational week by miscarriage or stillbirth arising from fetal abnormalities or maternal factors, extreme spontaneous preterm birth or iatrogenic preterm birth.

Secondary outcomes were maternal and neonatal complications. Preterm delivery is defined as a live birth from the 28th gestational week to the end of the 36th gestational week, including iatrogenic and spontaneous preterm delivery. Premature rupture of membranes is defined as membrane rupture after the 28th gestational week, including preterm premature rupture of membranes, which is also counted as spontaneous preterm delivery. SGA and LGA were determined on the basis of birth weight reference percentiles for Chinese populations, which were adjusted for sex and gestational age⁴¹. Birth weight lower than the 10th percentile of the reference was defined as SGA and birth weight higher than the 90th percentile as LGA.

Logistic regression was used to calculate the ORs for each subtype compared to controls.
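As a point of reference for reading these ORs, the unadjusted special case can be computed by hand. With a single binary predictor (subtype versus control) and no covariates, the OR from logistic regression equals the cross-product ratio of the 2 × 2 table. The sketch below shows that calculation with a Wald 95% confidence interval. It is a simplification, not the adjusted models used in the study, and the counts are hypothetical.

```python
import math

def odds_ratio(case_events, case_nonevents, control_events, control_nonevents,
               z=1.96):
    """Unadjusted odds ratio with a Wald confidence interval.

    Returns (odds ratio, lower bound, upper bound).
    """
    a, b, c, d = case_events, case_nonevents, control_events, control_nonevents
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cell counts must be positive")
    estimate = (a * d) / (b * c)
    # Standard error of the log odds ratio.
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(estimate) - z * se)
    upper = math.exp(math.log(estimate) + z * se)
    return estimate, lower, upper

# Hypothetical counts: 20 of 100 in a subtype versus 10 of 100 controls
# experienced the outcome.
estimate, lower, upper = odds_ratio(20, 80, 10, 90)
```

Adjusting for covariates such as age or embryo transfer type requires fitting the full logistic model, so adjusted ORs will generally differ from this cross-product value.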

Statistical analysis

All statistical analyses were performed using SPSS v.26 and R v.4.0.3. Shapiro–Wilk tests were used to assess the normality of the variables. Continuous variables were compared using Student’s t-test or analysis of variance, with natural logarithmic transformation applied to non-normally distributed data. Post-hoc comparisons among subtypes were performed using either Bonferroni or Dunnett T3 correction. Categorical variables were compared with either the χ² test or Fisher’s exact test. ORs were calculated using logistic regression analysis, comparing cases with controls under two adjustment models: (1) age and ovarian stimulation methods, and (2) age and fresh or frozen embryo transfer.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Data availability

Individual-level patient records of the clinical trial (clinicaltrials.gov: NCT06124391) are not publicly available due to national legal restrictions. The de-identified data can be requested by contacting the corresponding authors (Z.-J.C. at chenzijiang@hotmail.com or H. Zhao at hanzh80@sdu.edu.cn). All data access requests will be reviewed. Data can be shared only for noncommercial academic purposes and will require a formal material transfer agreement. A formally signed data transfer agreement is required, in accordance with the applicable laws and regulations.

Code availability

The classification model is currently the subject of a patent application. The code can be requested from the corresponding authors for scientific research and noncommercial use.

Change history

20 November 2025
A Correction to this paper has been published: https://doi.org/10.1038/s41591-025-04113-8

References

1. Bozdag, G., Mumusoglu, S., Zengin, D., Karabulut, E. & Yildiz, B. O. The prevalence and phenotypic features of polycystic ovary syndrome: a systematic review and meta-analysis. Hum. Reprod. 31, 2841–2855 (2016).
2. Safiri, S. et al. Prevalence, incidence and years lived with disability due to polycystic ovary syndrome in 204 countries and territories, 1990–2019. Hum. Reprod. 37, 1919–1931 (2022).
3. Joham, A. E. et al. Polycystic ovary syndrome. Lancet Diabetes Endocrinol. 10, 668–680 (2022).
4. Lim, S. S., Davies, M. J., Norman, R. J. & Moran, L. J. Overweight, obesity and central obesity in women with polycystic ovary syndrome: a systematic review and meta-analysis. Hum. Reprod. Update 18, 618–637 (2012).
5. Legro, R. S., Castracane, V. D. & Kauffman, R. P. Detecting insulin resistance in polycystic ovary syndrome: purposes and pitfalls. Obstet. Gynecol. Surv. 59, 141–154 (2004).
6. Azziz, R. et al. Polycystic ovary syndrome. Nat. Rev. Dis. Primers 2, 16057 (2016).
7. Copp, T., Doust, J., McCaffery, K., Hersch, J. & Jansen, J. Polycystic ovary syndrome: why widening the diagnostic criteria may be harming women. BMJ 373, n700 (2021).
8. Dokras, A. et al. Gaps in knowledge among physicians regarding diagnostic criteria and management of polycystic ovary syndrome. Fertil. Steril. 107, 1380–1386 (2017).
9. Riestenberg, C., Jagasia, A., Markovic, D., Buyalos, R. P. & Azziz, R. Health care-related economic burden of polycystic ovary syndrome in the United States: pregnancy-related and long-term health consequences. J. Clin. Endocrinol. Metab. 107, 575–585 (2022).
10. Teede, H. J. et al. Recommendations from the international evidence-based guideline for the assessment and management of polycystic ovary syndrome. Hum. Reprod. 33, 1602–1618 (2018).
11. Dennis, J. M., Shields, B. M., Henley, W. E., Jones, A. G. & Hattersley, A. T. Disease progression and treatment response in data-driven subgroups of type 2 diabetes compared with models based on simple clinical features: an analysis using clinical trial data. Lancet Diabetes Endocrinol. 7, 442–451 (2019).
12. Wang, R. & Mol, B. W. The Rotterdam criteria for polycystic ovary syndrome: evidence-based criteria? Hum. Reprod. 32, 261–264 (2017).
13. Dapas, M. et al. Distinct subtypes of polycystic ovary syndrome with novel genetic associations: an unsupervised, phenotypic clustering analysis. PLoS Med. 17, e1003132 (2020).
14. Teede, H. J. et al. Recommendations from the 2023 International Evidence-based Guideline for the assessment and management of polycystic ovary syndrome. Fertil. Steril. 120, 767–793 (2023).
15. Sarkar, M. et al. Testosterone levels in pre-menopausal women are associated with nonalcoholic fatty liver disease in midlife. Am. J. Gastroenterol. 112, 755–762 (2017).
16. Zheng, B. K., Sun, X. Y., Xian, J. & Niu, P. P. Maternal testosterone and offspring birth weight: a mendelian randomization study. J. Clin. Endocrinol. Metab. 107, 2530–2538 (2022).
17. Hynes, J. S., Weber, J. M., Truong, T., Acharya, K. S. & Eaton, J. L. Body mass index is negatively associated with a good perinatal outcome after in vitro fertilization among patients with polycystic ovary syndrome: a national study. F. S. Rep. 4, 77–84 (2023).
18. Ding, E. L. et al. Sex hormone-binding globulin and risk of type 2 diabetes in women and men. N. Engl. J. Med. 361, 1152–1163 (2009).
19. Chambers, A. E., Nayini, K. P., Mills, W. E., Lockwood, G. M. & Banerjee, S. Circulating LH/hCG receptor (LHCGR) may identify pre-treatment IVF patients at risk of OHSS and poor implantation. Reprod. Biol. Endocrinol. 9, 161 (2011).
20. Chen, Z. J. et al. Fresh versus frozen embryos for infertility in the polycystic ovary syndrome. N. Engl. J. Med. 375, 523–533 (2016).
21. Bahri Khomami, M. et al. Systematic review and meta-analysis of birth outcomes in women with polycystic ovary syndrome. Nat. Commun. 15, 5592 (2024).
22. Aksun, S. et al. Alterations of cardiometabolic risk profile in polycystic ovary syndrome: 13 years follow-up in an unselected population. J. Endocrinol. Invest. 47, 1129–1137 (2024).
23. Legro, R. S. et al. The Pregnancy in Polycystic Ovary Syndrome II (PPCOS II) trial: rationale and design of a double-blind randomized trial of clomiphene citrate and letrozole for the treatment of infertility in women with polycystic ovary syndrome. Contemp. Clin. Trials 33, 470–481 (2012).
24. Mario, F. M., Graff, S. K. & Spritzer, P. M. Habitual physical activity is associated with improved anthropometric and androgenic profile in PCOS: a cross-sectional study. J. Endocrinol. Invest. 40, 377–384 (2017).
25. Di Domenico, K. et al. Cardiac autonomic modulation in polycystic ovary syndrome: does the phenotype matter? Fertil. Steril. 99, 286–292 (2013).
26. Rotterdam ESHRE/ASRM-Sponsored PCOS Consensus Workshop Group. Revised 2003 consensus on diagnostic criteria and long-term health risks related to polycystic ovary syndrome (PCOS). Hum. Reprod. 19, 41–47 (2004).
27. Yildiz, B. O., Bozdag, G., Yapici, Z., Esinler, I. & Yarali, H. Prevalence, phenotype and cardiometabolic risk of polycystic ovary syndrome under different diagnostic criteria. Hum. Reprod. 27, 3067–3073 (2012).
28. Stener-Victorin, E. et al. Are there any sensitive and specific sex steroid markers for polycystic ovary syndrome? J. Clin. Endocrinol. Metab. 95, 810–819 (2010).
29. Johansson, J. et al. Acupuncture for ovulation induction in polycystic ovary syndrome: a randomized controlled trial. Am. J. Physiol. Endocrinol. Metab. 304, E934–943 (2013).
30. Nilsson, E. et al. Transcriptional and epigenetic changes influencing skeletal muscle metabolism in women with polycystic ovary syndrome. J. Clin. Endocrinol. Metab. 103, 4465–4477 (2018).
31. Kataoka, J. et al. Prevalence of polycystic ovary syndrome in women with severe obesity – effects of a structured weight loss programme. Clin. Endocrinol. (Oxf.) 91, 750–758 (2019).
32. Indran, I. R. et al. Simplified 4-item criteria for polycystic ovary syndrome: a bridge too far? Clin. Endocrinol. (Oxf.) 89, 202–211 (2018).
33. de Goeij, M. C. et al. Multiple imputation: dealing with missing data. Nephrol. Dial. Transpl. 28, 2415–2420 (2013).
34. Ahlqvist, E. et al. Novel subgroups of adult-onset diabetes and their association with outcomes: a data-driven cluster analysis of six variables. Lancet Diabetes Endocrinol. 6, 361–369 (2018).
35. Hennig, C. Cluster-wise assessment of cluster stability. Comput. Stat. Data Anal. 52, 258–271 (2007).
36. Lin, Z. et al. Machine learning to identify metabolic subtypes of obesity: a multi-center study. Front. Endocrinol. (Lausanne) 12, 713592 (2021).
37. Zheng, R. et al. Data-driven subgroups of prediabetes and the associations with outcomes in Chinese adults. Cell Rep. Med. 4, 100958 (2023).
38. Zhou, B. & Coorperative Meta-Analysis Group of China Obesity Task Force. Predictive values of body mass index and waist circumference to risk factors of related diseases in Chinese adult population. Biomed. Environ. Sci. 15, 83–96 (2002).
39. US Preventive Services Task Force et al. Screening for prediabetes and type 2 diabetes: US Preventive Services Task Force recommendation statement. JAMA 326, 736–743 (2021).
40. Li, J. et al. Chinese guideline for lipid management. Front. Pharmacol. 14, 1190934 (2023).
41. Dai, L. et al. Birth weight reference percentiles for Chinese. PLoS ONE 9, e104779 (2014).


Acknowledgements

This study is supported by grants from the National Key Research and Development Program of China (grant no. 2021YFC2700400 to H. Zhao, grant no. 2024YFC2707300 to S.Z., grant no. 2017YFC1001000 to Z.-J.C.), the National Natural Science Foundation of China (grant no. 82421004 to H. Zhao, grant no. 32588201 to Z.-J.C., grant no. 82101707 to X.G., grant no. 32370916 to S.Z.), Shandong Provincial Key Research and Development Program (grant no. 2020ZLYS02 to Z.-J.C., grant no. 2024CXPT087 to H. Zhao), the Ningxia Hui Autonomous Region Key Research and Developmental Program (grant no. 2024BEG02019 to H. Zhao), the Natural Science Foundation of Shandong Province for Excellent Youth Scholars (grant no. ZR2023YQ061 to S.Z.), CAMS Innovation Fund for Medical Sciences (grant no. 2021-I2M-5-001 to Z.-J.C.), the Taishan Scholars Program of Shandong Province (grant no. ts20190988 to H. Zhao) and Fundamental Research Funds of Shandong University (grant no. 2023QNTD004 to S.Z.); Innovative research team of high-level local universities in Shanghai (grant no. SHSMU-ZLCX20210200 to Z.-J.C. and Y.D.); NIH grant (grant nos UG3/UH3 HL162971, UL1 TR002014, 1R01HD100630, R01AT009484-A1, 1R43HD11427-01, R01 HD091350-04 to R.S.L.); Swedish Medical Research Council (grant no. 2022-00550 to E.S.-V.); Distinguished Investigator Grant – Endocrinology and Metabolism, Novo Nordisk Foundation (grant no. NNF22OC0072904 to E.S.-V.); Diabetes Foundation (grant nos DIA2021-633 and DIA2022-708 to E.S.-V.); Conselho Nacional de Desenvolvimento Científico e Tecnológico, Brazil (grant no. INCT/CNPq 465482/2014-7 to P.M.S.). We thank all participants for their support and willingness to participate in the study. We also thank J. Zhang of the Ministry of Education and Shanghai Key Laboratory of Children’s Environmental Health, Xinhua Hospital, Shanghai Jiao Tong University School of Medicine, T. Zhang of the Department of Biostatistics, School of Public Health, Shandong University, W. Bao of the Institute of Public Health Sciences, Division of Life Sciences and Medicine, University of Science and Technology of China, and R. Azziz of the Department of Medicine, Heersink School of Medicine, University of Alabama at Birmingham for their suggestions and the revision of our project. We also thank the NICHD and NICHD DASH for data sharing.

Author information

These authors contributed equally: Xueying Gao, Shigang Zhao, Yanzhi Du, Ziyi Yang, Ye Tian, Junli Zhao.

Authors and Affiliations

State Key Laboratory of Reproductive Medicine and Offspring Health, Center for Reproductive Medicine, Institute of Women, Children and Reproductive Health, Shandong University, Jinan, China
Xueying Gao, Shigang Zhao, Ziyi Yang, Daimin Wei, Linlin Cui, Junhao Yan, Yingying Qin, Yuhua Shi, Rong Tang, Jingmei Hu, Lingling Ding, Jingyu Li, Yuehong Bian, Xin Liu, Shumin Li, Chuanxin Zhang, Han Zhao & Zi-Jiang Chen
National Research Center for Assisted Reproductive Technology and Reproductive Genetics, Shandong University, Jinan, China
Xueying Gao, Shigang Zhao, Daimin Wei, Linlin Cui, Junhao Yan, Yingying Qin, Jingmei Hu, Honghui Zhang, Han Zhao & Zi-Jiang Chen
Key Laboratory of Reproductive Endocrinology (Shandong University), Ministry of Education, Jinan, China
Xueying Gao, Shigang Zhao, Daimin Wei, Linlin Cui, Junhao Yan, Yingying Qin, Jingmei Hu, Yuqing Zhang, Han Zhao & Zi-Jiang Chen
Shandong Technology Innovation Center for Reproductive Health, Jinan, China
Xueying Gao, Shigang Zhao, Daimin Wei, Linlin Cui, Junhao Yan, Yingying Qin, Jingmei Hu, Xin Zhang, Han Zhao & Zi-Jiang Chen
Shandong Provincial Clinical Research Center for Reproductive Health, Jinan, China
Xueying Gao, Shigang Zhao, Daimin Wei, Linlin Cui, Junhao Yan, Yingying Qin, Jingmei Hu, Yu Tian, Han Zhao & Zi-Jiang Chen
Shandong Key Laboratory of Reproductive Research and Birth Defect Prevention, Jinan, China
Xueying Gao, Shigang Zhao, Daimin Wei, Linlin Cui, Junhao Yan, Yingying Qin, Jingmei Hu, Han Zhao & Zi-Jiang Chen
Research Unit of Gametogenesis and Health of ART-Offspring, Chinese Academy of Medical Sciences (No.2021RU001), Jinan, China
Xueying Gao, Shigang Zhao, Daimin Wei, Linlin Cui, Junhao Yan, Yingying Qin, Jingmei Hu, Han Zhao & Zi-Jiang Chen
Shanghai Key Laboratory for Assisted Reproduction and Reproductive Genetics, Shanghai, China
Xueying Gao, Yanzhi Du, Yun Sun & Zi-Jiang Chen
Department of Reproductive Medicine, Renji Hospital, School of Medicine, Shanghai Jiao Tong University, Shanghai, China
Xueying Gao, Yanzhi Du, Yun Sun & Zi-Jiang Chen
Department of Gynecology and Obstetrics, Tianjin Medical University General Hospital, Tianjin, China
Ye Tian & Xueru Song
Tianjin Key Laboratory of Female Reproductive Health and Eugenics, Tianjin Medical University General Hospital, Tianjin, China
Ye Tian & Xueru Song
Department of Reproductive Medicine, General Hospital of Ningxia Medical University, Ningxia, China
Junli Zhao & Lingxia Ha
Department of Obstetrics and Gynaecology, National University Hospital, National University of Singapore, Singapore, Singapore
Xi Yuan & Eu-Leong Yong
Gynecological Endocrinology Unit, Division of Endocrinology, Hospital de Clínicas de Porto Alegre, Federal University of Rio Grande do Sul, Porto Alegre, Brazil
Betania R. Santos & Poli Mara Spritzer
Department of Biostatistics, Yale University School of Public Health, New Haven, CT, USA
Heping Zhang
Division of Endocrinology and Metabolism, Hacettepe University School of Medicine, Ankara, Turkey
Bulent O. Yildiz
Department of Physiology and Pharmacology, Karolinska Institutet, Stockholm, Sweden
Elisabet Stener-Victorin
Guangdong-Hong Kong Metabolism and Reproduction Joint Laboratory, Reproductive Medicine Center, The Affiliated Guangdong Second Provincial General Hospital of Jinan University, Guangzhou, China
Xiang-Hong Ou
Department of Obstetrics and Gynecology, Penn State College of Medicine, Hershey, PA, USA
Richard S. Legro
Sichuan Jinxin Xinan Women and Children’s Hospital, Chengdu, China
Lin Zhou
State Key Laboratory of Reproductive Medicine and Offspring Health, Nanjing Medical University, Nanjing, China
Qiang Wang
State Key Laboratory of Female Fertility Promotion, Center for Reproductive Medicine, Department of Obstetrics and Gynecology, Peking University Third Hospital, Beijing, China
Yue Zhao
Reproductive Medicine Research Center, The Sixth Affiliated Hospital, Sun Yat-sen University, Guangzhou, China
Xiaoyan Liang & Jingjie Li
State Key Laboratory of Reproductive Medicine, Clinical Center of Reproductive Medicine, First Affiliated Hospital, Nanjing Medical University, Nanjing, China
Jiayin Liu & Xiang Ma
Ministry of Education Key Laboratory of Metabolism and Molecular Medicine, Department of Endocrinology and Metabolism, Zhongshan Hospital, Fudan University, Shanghai, China
Xiaoying Li & Mingfeng Xia
Department of Obstetrics and Gynecology, Renji Hospital, School of Medicine, Shanghai Jiaotong University, Shanghai, China
Zhuowei Gu
Reproductive Medicine Center, Xiangya Hospital of Central South University, Changsha, China
Yanping Li
Department of Obstetrics and Gynecology, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
Shixuan Wang & Yan Li
Key Laboratory of Reproductive Genetics, Ministry of Education, Department of Reproductive Endocrinology, Women’s Hospital, Zhejiang University School of Medicine, Hangzhou, China
Yuli Qian
Institute of Genetics, International School of Medicine, Zhejiang University, Hangzhou, China
Jun Ma & Feng He
Department of Reproductive Medicine, The Second Hospital, Cheeloo College of Medicine, Shandong University, Jinan, China
Shanshan Gao & Yue Liu
Department of Obstetrics and Gynecology, Qilu Hospital of Shandong University, Jinan, China
Yonghui Jiang
Department of Obstetrics and Gynecology, Shandong Key Laboratory of Reproductive Medicine, Shandong Provincial Hospital, Shandong First Medical University, Jinan, China
Shuai Zhao & Hui Zhao

Authors

Xueying Gao
Shigang Zhao
Yanzhi Du
Ziyi Yang
Ye Tian
Junli Zhao
Xi Yuan
Betania R. Santos
Daimin Wei
Linlin Cui
Junhao Yan
Yingying Qin
Yuhua Shi
Rong Tang
Yun Sun
Jingmei Hu
Lingling Ding
Xueru Song
Lingxia Ha
Jingyu Li
Heping Zhang
Poli Mara Spritzer
Bulent O. Yildiz
Elisabet Stener-Victorin
Eu-Leong Yong
Xiang-Hong Ou
Richard S. Legro
Han Zhao
Zi-Jiang Chen

Contributions

X.G., S.Z., H. Zhao and Z.-J.C. contributed to the conception of the work. Z.-J.C. sponsored and organized the establishment of the international PCOS cohorts and coordinated global collaborations. H. Zhao led the China Women’s Reproductive Metabolic Network, integrating multiple regional cohorts across China, with S.Z. overseeing national data harmonization and quality control. Y.D., Z.Y., Y.T., J.Z., X.Y., B.R.S., D.W., L.C., J.Y., Y.Q., Y. Shi, R.T., Y. Sun, J.H., L.D., X.S., L.H., J.L., P.M.S., B.O.Y., E.S.-V., E.-L.Y., X.-H.O., R.S.L. and H. Zhao contributed to the data collection. X.G., Z.Y. and H. Zhang contributed to the data analysis. X.G. and S.Z. accessed and verified the data. X.G. and S.Z. drafted the article. Under the supervision of Z.-J.C. and H. Zhao, S.Z. led X.G. and Z.Y. in preparing detailed responses to the reviewers’ comments and completing the manuscript revision. Z.-J.C., H. Zhao, E.S.-V., R.S.L., E.-L.Y., X.-H.O., B.O.Y. and P.M.S. provided critical input and supervision throughout the manuscript development. All authors gave final approval of the version to be published.

Corresponding authors

Correspondence to Poli Mara Spritzer, Bulent O. Yildiz, Elisabet Stener-Victorin, Eu-Leong Yong, Xiang-Hong Ou, Richard S. Legro, Han Zhao or Zi-Jiang Chen.

Ethics declarations

Competing interests

R.S.L. reports receiving grants or contracts from the NIH, Guerbet and the Pennsylvania Department of Health; consulting fees from Monsanto, Organon, Celmatix, Covis Pharma GmbH and Novo Nordisk; payment or honoraria from Shandong University as an honorary professor, from the Scientific Center of Family Health and Human Reproduction as a member of the steering committee, and from the International Advisory Panel as a member; membership on the Medical–Scientific Advisory Board for PCOS Challenge and on the Investment Subcommittee of the Endocrine Society; editorial roles as Editorial Editor for Fertility and Sterility and Associate Editor for Global Reproductive Health; stock ownership in Biodesix; and participation in the NIH Loan Repayment Program. E.S.-V. reports serving as the Chief Scientific Officer for the Androgen Excess and PCOS Society (unpaid), outside the submitted work. The classification model method in this paper is currently the subject of a patent application (application number: 2023102506962). All other authors declare no competing interests.

Peer review

Peer review information

Nature Medicine thanks Frank Harrell and Helena Teede for their contribution to the peer review of this work. Primary Handling Editor: Ashley Castellanos-Jankiewicz, in collaboration with the Nature Medicine team.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Changes in clinical features of the four PCOS subtypes over the 6.5-year follow-up.

Changes in testosterone (a), BMI (b), SHBG (c) and LH (d) in the four PCOS subtypes over the 6.5-year follow-up (n = 128 in HA-PCOS; n = 110 in OB-PCOS; n = 142 in SHBG-PCOS; n = 143 in LH-PCOS). Data are presented as mean ± SD. Abbreviations: BMI, body mass index; SHBG, sex hormone-binding globulin; LH, luteinizing hormone.

Extended Data Fig. 2 Remission of PCOS-associated endocrine features over the follow-up.

Remission of the PCOS-associated endocrine features hyperandrogenism (a), oligo/anovulation (b) and polycystic ovarian morphology (c) over the follow-up (n = 128 in HA-PCOS; n = 110 in OB-PCOS; n = 142 in SHBG-PCOS; n = 143 in LH-PCOS). Error bars represent the confidence intervals of the overall rate estimates.

Extended Data Fig. 3 Percentage of participants with obesity and MASLD at follow-up.

Percentage of participants with obesity (a) and MASLD (b) at follow-up (n = 128 in HA-PCOS; n = 110 in OB-PCOS; n = 142 in SHBG-PCOS; n = 143 in LH-PCOS). Abbreviations: MASLD, metabolic dysfunction-associated steatotic liver disease.

Extended Data Table 1 Baseline characteristics of the discovery cohort

Extended Data Table 2 Baseline characteristics of the validation cohorts

Extended Data Table 3 Telephone interview of PCOS women at the follow-up

Extended Data Table 4 Clinical examination of PCOS women at the follow-up

Extended Data Table 5 Total live birth, clinical pregnancy, and total loss outcomes with different COH methods for fresh embryo transfer in each subtype

Extended Data Table 6 Summary of the four subtypes

Supplementary information

Supplementary Information

Supplementary Figs. 1 and 2 and Tables 1–4.

Reporting Summary

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Gao, X., Zhao, S., Du, Y. et al. Data-driven subtypes of polycystic ovary syndrome and their association with clinical outcomes. Nat Med 31, 4214–4224 (2025). https://doi.org/10.1038/s41591-025-03984-1

Received: 10 June 2025
Accepted: 27 August 2025
Published: 29 October 2025
Version of record: 29 October 2025
Issue date: December 2025
DOI: https://doi.org/10.1038/s41591-025-03984-1
