Clustering of > 145,000 symptom logs reveals distinct pre, peri, and menopausal phenotypes

Aras, Shravan G.; Grant, Azure D.; Konhilas, John P.

doi:10.1038/s41598-024-84208-3

Article
Open access
Published: 03 January 2025

Scientific Reports volume 15, Article number: 640 (2025)

Abstract

The transition to menopause is associated with disappearance of menstrual cycle symptoms and emergence of vasomotor symptoms. Although menopausal women report a variety of additional symptoms, it remains unclear which emerge prior to menopause, which occur in predictable clusters, how clusters change across the menopausal transition, or if distinct phenotypes are present within each life stage. We present an analysis of symptoms in premenopausal to menopausal women using the MenoLife app, which includes 4789 individuals (23% premenopausal, 29% perimenopausal, 48% menopausal) and 147,501 symptom logs (19% premenopausal, 39% perimenopausal, 42% menopausal). Clusters generated from logs of 45 different symptoms were assessed for similarities across methods: hierarchical clustering analysis (HCA), K-Means clustering of principal components of symptom reports, and binomial network analysis. Participants were further evaluated based on menstrual cycle regularity or natural versus medically induced menopause. Menstrual cycle-associated symptoms (e.g., cramps, breast swelling), digestive, mood, and integumentary symptoms were characteristic of premenopause. Vasomotor symptoms, pain, mood, and cognitive symptoms were characteristic of menopause. Perimenopausal women exhibited both menstrual cycle-associated and vasomotor symptoms. Subpopulations across life stages presented with additional correlated mood and cognitive, integumentary, digestive, nervous, or sexual complaints. Symptoms also differed among women depending on the reported regularity of their menstrual cycles or the way in which they entered menopause. Notably, we identified a set of symptoms that were very common across life stages: fatigue, headache, anxiety, and brain fog. Finally, we identified a lack of predictive power of hot flashes for any symptom except night sweats. 
Together, premenopausal women reported menstrual cycle-associated symptoms and menopausal women reported vasomotor symptoms, while perimenopausal women reported both. All groups reported high rates of fatigue, headache, anxiety, and brain fog. Limiting the focus of menopausal treatment to vasomotor symptoms, or to premenstrual syndrome in premenopausal women, neglects a large proportion of overall symptom burden. Future interventions targeting mood and cognition, digestion, and the integumentary system are needed across stages of female reproductive life.

Introduction

The symptoms attributed to any life stage must be viewed against an ever-evolving background of population health, and those associated with the menstrual cycle and menopause are in need of an update. Menopausal symptoms received increasing medical attention in the mid-nineteenth century¹, and premenstrual syndrome (PMS) has been medically acknowledged since at least 1935². Although the baseline frequency of metabolic and female reproductive health problems (e.g., polycystic ovarian syndrome, obesity) has increased greatly since these eras, the clinical description of cycle-associated to climacteric symptoms has changed minimally. Health in America has degraded rapidly in the twenty-first century, with well-documented increases in overweight and obesity^3,4, metabolic disease^5,6, and potentially poorer coping with perceived stress⁷, all of which can impact reproductive function^8,9 and menopause^10,11,12,13. In particular, the recent years of the COVID-19 pandemic comprise a uniquely stressful and isolating time, the health effects of which are still being described^14,15. This changing background of general “health” in the population suggests that the experience of female reproductive life stages may differ from the reports of the 1990s or even the early 2000s. Because life expectancy of women has now reached 80 (almost 35 years longer than at the turn of the twentieth century¹⁶), a greater proportion (40%) of a woman’s lifespan is spent in menopause and, consequently, with menopausal symptoms. The interaction of the changing baseline health environment with aging physiology in a population in which menopausal women will become an estimated half of females by 2030¹⁷ creates a strong need for a “living profile” of symptoms and symptom clusters. These clusters can provide an “update” for clinicians as well as context for evaluating efficacy of new treatments for women of all ages.

Today, methods like symptom clustering have yielded objective diagnostic criteria and more specialized treatment in fields ranging from psychiatry¹⁸ to cardiology^19,20, gastroenterology²¹, and female reproductive health^10,11,22,23,24,25,26. Numerous efforts have been made to cluster symptoms among premenopausal to menopausal women using longitudinal clinical datasets and surveys^10,11,24,27,28,29, and a few studies have employed self-collected data from menstrual tracking apps to evaluate premenopausal women^30,31. Despite these efforts, consensus is still lacking as to: (1) what symptoms present-day women experience, (2) what symptoms consistently predict the occurrence of other symptoms, (3) what symptoms cluster to each hormonal life stage, and (4) what distinct phenotypes exist within life stages that may be served by different treatments.

Large-scale efforts have described potential sub-groups of menopausal experience. Analyses are frequently directed at the open-access Study of Women’s Health Across the Nation (SWAN) dataset³², which followed 3289 patients at biannual office visits across 16 years, therefore capturing the full menopausal transition of many women. Although these infrequent visits relied heavily on recall at the time of doctors’ appointments, the dataset yielded many insights into the variety of symptoms experienced in menopause, potential trajectories, and influential factors. However, it has yet to generate a consistent picture of “the menopausal experience” or even consistent “menopausal subtypes”. For example, Harlow et al.¹⁰ used latent transition analysis to evaluate symptom relatedness in the SWAN dataset, identifying symptom severity clusters ranging from relatively asymptomatic to highly symptomatic. They additionally reported two distinct symptom cluster types: fatigue and psychosocial, versus vasomotor (VMS), sleep, and fatigue. A more recent analysis of a 557-woman subgroup with metabolic syndrome¹¹ used latent class growth analysis to identify very different symptom clusters: sleep and urinary problems; VMS and vaginal dryness; and psychological, joint, and sexual dysfunction.

A separate menopausal cohort study of 971 women²⁴, the Women’s Wellness Research Study, identified a further set of symptom clusters using a single timepoint survey: psychological, fatigue, and sleep; VMS; pain and numbness; and panic attacks and racing heart. Moreover, the authors identified more numerous and more severe symptoms in women with a history of breast cancer, who reported nearly double the rate of VMS and low libido. Woods et al.²⁹ drew on the Seattle Midlife Women’s Health Study, which collected annual surveys of 508 menopausal and perimenopausal women. This effort identified only mildly symptomatic, moderately symptomatic, and highly symptomatic clusters. Finally, studies of populations around the world have suggested that culture and genetic background impact menopausal experience. An early study conducted via phone interview of 1,900 Chinese women identified lower prevalence of VMS, and a peak of symptoms during perimenopause²⁷. Five symptom clusters were identified: muscular and GI pain; psychological; respiratory; VMS and sleep disturbance; and non-specific somatic (fatigue, dizziness, headache). Subsequent studies have confirmed lower incidence of VMS and higher pain reporting in Asian women³³. Additional differences may be present, with African American women exhibiting higher rates of VMS^34,35 and Caucasian and Asian women reporting higher rates of psychological symptoms²⁸.

Considering the above-named studies, totaling over 7000 women collectively, it remains unclear if consistent subtypes of menopause exist, and what physiological factors are responsible. Woods et al.²⁹, Harlow et al.¹⁰, and Min et al.¹¹ all make clear that external health and socioeconomic factors (breast cancer history, financial stress, ethnicity, and obesity/metabolic syndrome) can worsen the number and severity of symptoms, but do not provide consensus otherwise. Such different findings may be amplified by differences in data collection, analytical methods, number and range of reportable symptoms, symptom severity metrics, and changing perception of symptoms collected from different cultures in different decades.

In an attempt to reconcile results across analysis methods and life stages, we analyzed the characteristics of user-reported symptoms collected from a smartphone application designed to capture up to 45 symptoms in daily life. Analyzing symptom reporting across the continuum from pre- to peri- to post-menopause enabled us to distinguish between symptoms that depend on life stage and those that do not. In addition, we aimed to avoid the bias inherent in any single analytical method by employing several standard clustering methods: hierarchical clustering analysis (HCA) of symptom covariance, K-Means clustering of principal components generated from symptoms, and binomial network analysis. We hypothesized that the most common symptoms among premenopausal women would be associated with the menstrual cycle (e.g., cramps, ovulation pain, breast swelling, spotting). We hypothesized that VMS would emerge in perimenopause, and menstrual-associated symptoms would disappear by menopause. Finally, we hypothesized that symptom patterning would vary by cycle regularity and by type of menopause. Comparison of premenopausal through menopausal populations enabled us to distinguish between symptoms that depend on life stage and symptoms common to present-day women independent of life stage.

Results

Study population

Using a smartphone-based application where participants choose from 45 climacteric conditions/symptoms, 25,369 users recorded a total of 447,802 symptoms. In addition to self-reported symptoms, participants also self-reported menopausal status using a series of onboarding questions in order to determine how menopause was entered and menstrual cycle regularity (if applicable). Using inclusion criteria outlined in Fig. 1, a total of 4789 out of the 25,369 total users and 147,501 symptoms out of the 447,802 total symptoms were included in the analysis. All symptoms were collected from Fall 2021 through Spring 2023 (Supplemental Fig. 1). Of the 4789 total women included in the analysis, 1115 (23%) women met the criteria for premenopause, reporting a total of 27,731 symptoms (Table 1) with a median of 17 symptoms and median absolute deviation (MAD) of ± 5 reported per user; 1388 (29%) women met the criteria for perimenopause, reporting a total of 57,964 symptoms with an increased and more variable median symptom rate of 23 (± 10); 2286 (48%) women met the criteria for menopause, reporting a total of 61,806 symptoms. Despite the increased number of menopausal users and symptoms, symptoms per user were remarkably consistent, with a median of 19 symptoms (± 6). Note that some users did not answer if or how they had entered menopause (n = 124) or logged chemotherapy (n = 6). These users were excluded from the analysis (see Fig. 1, Table 1). Distribution of symptom counts did not vary by group (Fig. 2).
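For readers unfamiliar with the spread statistic quoted above, the following is a minimal sketch of how a median and median absolute deviation (MAD) of per-user symptom counts could be computed. The function name and the toy counts are hypothetical and are not taken from the study data or the authors' code.

```python
from statistics import median

def median_and_mad(values):
    """Return (median, median absolute deviation) of a list of numbers."""
    center = median(values)
    # MAD: the median of the absolute deviations from the median
    return center, median(abs(v - center) for v in values)

# Hypothetical per-user symptom counts, for illustration only
toy_counts = [12, 15, 17, 20, 30]
print(median_and_mad(toy_counts))
```

Because both quantities are medians, a single heavy logger (30 in the toy list) shifts them far less than it would shift a mean and standard deviation.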

Table 1 Number of women (N) comprising premenopausal, perimenopausal, and menopausal groups and their respective symptom counts (Sym).

Symptom prevalence

The most common logs, presented as a percent of total logs, in premenopausal users were fatigue (7.94%), spotting (7.44%), cramps (6.55%), bloating (6.16%) and headaches (5.78%). By contrast, among menopausal users hot flashes (22.3%) greatly outweighed any other log, followed by fatigue (5.13%), night sweats (4.31%), anxiety (3.96%), joint pain (3.52%), and bloating (3.45%). Perimenopausal women exhibited a combination of the most prevalent symptoms from the pre- and post-menopausal cohorts, with the following log prevalence: hot flashes (14.8%), fatigue (6.33%), headaches and night sweats (each 4.77%), and cramps (4.38%) (Fig. 3A).

The percentage of users reporting symptoms depended on self-reported life stage. Premenopausal women were most likely to report fatigue (74.4%), followed by bloating (60.6%), cramps (57.3%), headaches (52.6%), and anxiety (52.2%), which closely mimicked the most common logs.

Perimenopausal women exhibited the highest rate of hot flashes (83.4%) and night sweats (62.2%), followed by fatigue (74.8%), headaches (58.9%), and bloating (57.1%), all comparable rates to premenopausal women (Fig. 3B). Even though hot flashes were the most reported symptom in the perimenopausal and menopausal cohorts, more menopausal users reported fatigue (75.0%) than hot flashes (73.1%), followed by reports of anxiety (58.7%), joint pain (56.1%), and brain fog (56.1%).
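The two prevalence measures used above (percent of all logs versus percent of users reporting a symptom at least once) can diverge, which is why hot flashes can dominate the logs while fatigue is reported by more users. A minimal sketch of both calculations follows; the function names and the toy log list are hypothetical and do not reproduce the study data.

```python
from collections import Counter

def log_prevalence(logs):
    """Percent of all logs accounted for by each symptom."""
    counts = Counter(symptom for _, symptom in logs)
    total = sum(counts.values())
    return {s: 100 * n / total for s, n in counts.items()}

def user_prevalence(logs):
    """Percent of users who logged each symptom at least once."""
    users = {user for user, _ in logs}
    # set(logs) keeps one (user, symptom) pair per user, however often it was logged
    reporters = Counter(symptom for _, symptom in set(logs))
    return {s: 100 * n / len(users) for s, n in reporters.items()}

# Toy data: user "a" logs hot flashes three times; both users log fatigue once
toy_logs = [("a", "hot flashes")] * 3 + [("a", "fatigue"), ("b", "fatigue")]
print(log_prevalence(toy_logs))   # hot flashes dominate the logs
print(user_prevalence(toy_logs))  # fatigue is reported by more users
```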

Total symptom counts by user exhibited statistical differences between premenopausal, perimenopausal, and menopausal women. Aside from symptoms known to explicitly relate to either menopause or the menstrual cycle (i.e., vasomotor symptoms, ovulation and ovulation pain, cramps, spotting), several symptoms differed. Premenopausal and perimenopausal women logged the largest differences in fatigue, bloating, headaches, diarrhea, and mood swings (p < 0.01, chi-sq > 48.7) compared to menopausal women. Menopausal women showed the greatest differences in the form of elevated rates of painful sex, insomnia, vaginal dryness, memory lapse, low sex drive, and UTI (p < 0.01, chi-sq > 26). No differences were observed in anxiety or vertigo.

Hierarchical clustering of symptom covariance

Premenopausal

Premenopausal symptoms fell into mood/cognitive and digestive groups, with most other symptoms minimally covarying (Supplemental Fig. 3, top). Brain fog and memory lapse were grouped more closely among regular cyclers, as were digestive and menstrual cycle-associated symptoms (e.g., breast pain, cramps, insomnia, bloating, constipation, ovulation pain) (See Supplemental Fig. 4). By contrast, fatigue and a variety of mood and cognitive symptoms were more closely related in irregular cyclers (See Supplemental Fig. 4).

Perimenopausal

Perimenopausal women exhibited 3 large symptom branches (Supplemental Fig. 3, Middle). The highest covarying symptom group included digestive and mood/cognitive problems. Hot flashes were unrelated to any other symptoms.

Menopausal

Menopausal symptoms exhibited different structure from premenopausal or perimenopausal women (Supplemental Fig. 3, Bottom), and further differed by type of menopause (natural vs. medical/surgical) (Supplemental Fig. 5). In the menopause cohort as a whole, most data fell into a large, moderately correlated cluster including integumentary and mood/cognitive problems. Mood/cognitive and integumentary problems were even more related in medical/surgically entered menopause (Supplemental Fig. 5, Top). Notably, hot flashes and night sweats were unrelated to any other symptoms within these hierarchies.

K-Means clustering of symptom covariance PCA

Premenopausal

All observed premenopausal clusters shared a baseline phenotype resembling premenstrual syndrome. 81% of users fell into a cluster reporting fatigue, bloating, cramps, anxiety, and headache. A further 13% were differentiated by bloating. The remaining 6% exhibited a variety of integumentary complaints alongside spotting and additional digestive symptoms. The only observed difference in top symptoms for regular cyclers was the presence of brain fog rather than headache in the most common cluster (76% of regular cyclers). Fifteen principal components (PCs) were needed to capture 87% of the variance in the premenopausal dataset. PC1 captured 23% of the variance, PC2 a further 11%, PC3 9%, and PC4 7%. All remaining PCs were ≤ 5%. Relatively few symptoms contributed to the top PCs: spotting in PC1; fatigue, headaches, anxiety, bloating, cramps, and breast pain in PC2; breast pain and cramps in PC3; and headaches in PC4 (See Supplemental Fig. 2).

Perimenopausal

The top symptoms characterizing each main segment were as follows: 81% of users were placed in a segment characterized by VMS alongside fatigue, cramps, and bloating. 12% were characterized by their lack of night sweats, and presence of spotting and headaches. The remainder exhibited additional digestive symptoms or muscular pain. Fifteen PCs were needed to capture 90% of the variance: 55% for PC1, 11% for PC2, and 5% for PC3 (See Supplemental Fig. 2). Remaining PCs captured ≤ 3% of the variance (data not shown). PC1 was exclusively determined by hot flashes, PC2 was a mixture of many symptoms, PC3 was largely driven by night sweats and, notably, the next PC was driven by the residual menstrual symptoms spotting and cramps.

Menopausal

The large majority (91%) of menopausal women were clustered by hot flashes and, similar to premenopausal women, fatigue, anxiety, bloating, and joint pain. An additional 6% included night sweats. The remainder reported the above symptoms alongside insomnia, chills, and irregular heartbeat. In menopausal women, 90% of the variance was captured in the first 3 PCs, the vast majority by PC1: 86% for PC1, 3% for PC2, and 3% for PC3 (See Supplemental Fig. 2). Remaining PCs captured ≤ 3% of the variance (data not shown). PC1 was almost exclusively determined by hot flashes, whereas PC2 was a mixture of fatigue, night sweats, mood/cognitive, and integumentary problems. PC3 was dominated by night sweats.

Symptom networks

Symptom networks varied greatly by hormonal stage of life. Premenopausal symptoms were linked more sparsely than perimenopausal or menopausal symptoms, and fell into 6 groups of 3 or more symptoms, with all remaining symptoms in singletons or pairs. The groups comprised (1) cognitive/mood, (2) integumentary, (3) flu-like/digestive, (4) nervous/muscular pain, (5) sexual symptoms, and (6) menstrual cycle-associated symptoms (See Supplemental Table 1 for symptoms comprising these groups). The nodes with highest degree centrality were brain fog, mood swings, dry skin, cramps, and nausea. For symptom networks by type of cycler, see Supplemental Fig. 6.

Perimenopausal symptoms also clustered into 7 similar groups of 3 or more symptoms, with remaining symptoms in singletons or pairs (Fig. 4): (1) cognitive/mood, (2) integumentary, (3) dizziness/vertigo/irregular heartbeat, (4) sexual symptoms, (5) pain, and (6, 7) two groups of menstrual cycle-associated symptoms (ovulation vs. premenstrual). Notably, hot flashes and night sweats clustered together but not with any other symptoms. Mood swings, constipation, hair loss, vaginal dryness, and depression exhibited the greatest degree centrality.

Menopausal symptoms displayed a denser network than premenopausal symptoms, with similarly structured clusters to those found in HCA and K-Means clustering and some overlap with the premenopausal symptom network. Menopausal women exhibited a highly connected network of overlapping symptoms: (1) cognitive/mood, (2) digestive and integumentary, (3) dizziness/vertigo/irregular heartbeat, similar to perimenopausal women, (4) sexual, (5) flu-like, and (6) hot flashes, night sweats, and chills. Joint pain, fatigue, itchy and dry skin, and memory lapse exhibited the highest degree centrality. For symptom networks by type of menopause, see Supplemental Fig. 7. All symptom summaries across methods are described in Table 2.

Table 2 Symptom clustering summary.

Methods

Data collection and inclusion criteria

All procedures were approved by the Western Institutional Review Board-Copernicus Group (registration number, OHRP and FDA, IRB0000053; parent organization number, IORG0000432), Study Number: 1284093. Anonymized data were drawn from the MenoLife mobile app created by MenoLabs (https://app.menolabs.com) and collected between Fall 2021 and Spring 2023. As part of the onboarding process, users provided informed consent for use of de-identified data in this research. Users completed an onboarding questionnaire that included whether menstruation occurred in the last twelve months, description of menstrual periods, how the user entered menopause (if they noted absence of menstrual periods or > 12 months since last menstruation) and, finally, selection of most common symptoms. Following onboarding, women used the app at will to enter symptom logs from a list of 45 available symptoms. Retrospective analysis of the entire app cohort indicates that approximately 94% of users were from the United States and ~ 4% from the United Kingdom.

Onboarding data were used to estimate user status as premenopausal, perimenopausal, or menopausal. These groups were separated for further analysis. Briefly, if users indicated that they had not entered menopause, and further specified that they had had a menstrual period within 12 months, and did not record vasomotor symptoms (i.e., hot flashes or night sweats), they were classified as premenopausal. Users were further grouped by whether they reported regular or irregular cycles. If users stated that they had entered menopause, and further confirmed that they had not had a period in more than 12 months, they were classified as menopausal. Menopausal users were further grouped by whether they reported entering menopause naturally, or via medical/surgical interventions. Individuals reporting chemotherapy were not included. Perimenopause does not have a strict clinical definition using symptoms alone³⁸. Here we chose to estimate the perimenopausal population as users who indicated they had not entered menopause, who reported irregular cycles within the past year, and who experienced vasomotor symptoms. Users who selected conflicting answers were omitted from further analysis (e.g., self-identified as menopausal but selected that their periods were regular). As individual onboarding questions could be skipped, and multiple answers could be selected in some cases to each question, users of indeterminate status were also omitted from further analysis.
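The grouping rules in this paragraph can be sketched as a small decision function. This is an illustrative reading of the text, not the authors' code: the function name and argument names are hypothetical, skipped answers are represented as None, and the handling of combinations the text does not spell out (for example, vasomotor symptoms with regular cycles) is an assumption made here.

```python
from typing import Optional

def classify_life_stage(entered_menopause: Optional[bool],
                        period_within_12_months: Optional[bool],
                        regular_cycles: Optional[bool],
                        logged_vasomotor: bool,
                        logged_chemotherapy: bool = False) -> Optional[str]:
    """Return 'premenopausal', 'perimenopausal', 'menopausal', or None (excluded)."""
    # Chemotherapy and skipped onboarding answers lead to exclusion
    if logged_chemotherapy or entered_menopause is None or period_within_12_months is None:
        return None
    if entered_menopause:
        # Conflicting answers: menopausal but a recent period or regular periods
        if period_within_12_months or regular_cycles:
            return None
        return "menopausal"
    if not period_within_12_months:
        return None  # assumption: indeterminate status is excluded
    if not logged_vasomotor:
        return "premenopausal"  # regular and irregular cyclers are sub-grouped later
    if regular_cycles is False:
        return "perimenopausal"  # irregular cycles plus vasomotor symptoms
    return None  # assumption: vasomotor symptoms with regular or unknown cycles

print(classify_life_stage(False, True, False, True))  # perimenopausal
```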

Finally, as many users were minimally interactive with the app, logging only a few symptoms, we opted to include only users who had logged at least 10 symptoms for further analysis of relationship among symptoms. To minimize the impact of “super-users” on symptom covariance, we opted to omit individuals who had logged > 300 symptoms.
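As an illustration, the activity thresholds above could be applied as follows. The function and constant names are hypothetical, and the input is assumed to be a flat list of (user, symptom) log records; only the cut-offs of 10 and 300 come from the text.

```python
from collections import Counter

MIN_LOGS = 10   # from the text: at least 10 logged symptoms
MAX_LOGS = 300  # from the text: users with > 300 logged symptoms are omitted

def eligible_users(logs):
    """logs: iterable of (user_id, symptom) pairs. Return the set of users kept."""
    counts = Counter(user_id for user_id, _ in logs)
    return {user for user, n in counts.items() if MIN_LOGS <= n <= MAX_LOGS}

# Toy data: one user below the minimum, one "super-user" above the maximum
toy_logs = ([("low", "fatigue")] * 9 + [("ok", "fatigue")] * 10
            + [("super", "fatigue")] * 301)
print(eligible_users(toy_logs))
```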

Self-collected data and analytical methods

This analysis relied on self-collected, rather than clinician-collected data, providing several advantages and limitations in terms of accuracy, detail, and timeliness. In contrast to in-clinic data collection, individuals in the MenoLife data set have the opportunity to collect repeat data over time. Most users interact with the app for ~ 1 week (data not shown), providing a representative view of their everyday life during this time as opposed to a recollected snapshot over many years. Self-collected data close to the time symptoms are experienced is likely to be more accurate than in-clinic recall once a year³⁹. Self-collected data may be particularly accurate for symptoms which recur multiple times a day (e.g., hot flashes, chills), and which would otherwise be difficult to count and evaluate separately^39,40. Finally, self-collected data using a mobile app on a personal smartphone may alleviate hesitation to report more personal symptoms (indeed, some previous studies omit questions about urogenital or sexual symptoms altogether in their surveys²⁷).

Symptom categories

Menstrual cycle-associated symptoms were here defined as ovulation and ovulation pain, menstrual cramps, breast pain and swelling (See Supplemental Table 1). Although fatigue and mood changes are commonly considered premenstrual symptoms, we considered that these were characteristic of both pre- and post-menopausal women. Vasomotor symptoms were defined as hot flashes and night sweats. Although chills may be defined as VMS, they can also result from illness/infection, and so were considered separately. Integumentary symptoms refer to symptoms affecting hair, skin, and nails.

Data analysis and statistics

Data were securely organized in Amazon Web Services (AWS) S3 and queried through AWS Athena. Custom Python and R code was written for all the analysis methods. Ranksum tests (non-parametric ANOVA) were used to avoid assumptions of normality in comparisons of symptom count by individual across all 45 measured symptoms. Prior to computing the covariance matrix, symptom counts were standardized across symptoms rather than across users, meaning that users who experienced more symptoms contributed more strongly to HCA (Python: clustermap() from seaborn, linkages generated using linkage() from scipy.cluster.hierarchy).

Hierarchical clustering analysis

HCA combines independent forests of clusters that are not part of an existing hierarchy by using a distance metric to grow clusters. It starts by treating each symptom as a single node “forest”, maintaining a distance matrix between all clusters. This distance matrix is updated at each bottom-up iteration, with the algorithm converging at the formation of a single cluster. The distance metric used is \(\min_{i,j} \operatorname{dist}(u[i], v[j])\), where u and v are two clusters, and i and j index the elements within each cluster. This is repeated for all pairs of clusters up the hierarchical chain. We use Euclidean distance for the dist() function. Dendrograms representing the hierarchical structure of symptom data were generated using the unweighted pair group method using arithmetic mean (UPGMA) applied to the covariance matrix of normalized symptom counts, and silhouette score was used to determine clusters reported here.
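The bottom-up merging described above can be illustrated with a short, dependency-free sketch. It is not the authors' implementation, which is stated elsewhere in the text to use linkage() from scipy.cluster.hierarchy. The sketch uses the average (UPGMA) cluster distance named for the dendrograms; swapping the mean for min() would give the minimum-distance rule written in the formula. The function name and the toy points are hypothetical.

```python
from itertools import combinations
from math import dist  # Euclidean distance between two points

def average_linkage(points, n_clusters):
    """Merge clusters bottom-up until n_clusters remain; return lists of point indices."""
    clusters = [[i] for i in range(len(points))]  # each point starts as its own cluster

    def cluster_distance(a, b):
        # UPGMA: mean of all pairwise distances between members of a and b
        return sum(dist(points[i], points[j]) for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > n_clusters:
        # Find the closest pair of clusters and merge them
        a, b = min(combinations(range(len(clusters)), 2),
                   key=lambda pair: cluster_distance(clusters[pair[0]], clusters[pair[1]]))
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return [sorted(c) for c in clusters]

# Toy profiles: two tight pairs and one isolated point
toy_points = [(0.0, 0.1), (0.1, 0.0), (5.0, 5.1), (5.1, 5.0), (10.0, 0.0)]
print(sorted(average_linkage(toy_points, 3)))
```

In the toy data the isolated point stays a singleton, much as hot flashes remain unlinked to other symptoms in the hierarchies reported above.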

Principal components analysis

We used PCA (Python: PCA() from sklearn.decomposition) to reduce the dimensionality of the symptom dataset prior to K-Means clustering. K-Means clusters were generated using the principal components needed to capture up to 90% of the variance in each group’s symptom data, and elbow of the sum of squared errors (SSE) and silhouette score were used to determine optimal cluster number.
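As a worked example of the variance threshold, the sketch below counts how many leading components are needed to reach a target share of variance. The function name is hypothetical; the input ratios are the rounded menopausal values reported in the Results (86%, 3%, 3%), used purely for illustration.

```python
from itertools import accumulate

def components_needed(explained_ratios, target=0.90):
    """Return the number of leading PCs whose cumulative variance reaches target."""
    for n, cumulative in enumerate(accumulate(explained_ratios), start=1):
        if cumulative >= target:
            return n
    return None  # target not reached with the ratios supplied

# Rounded menopausal PC1-PC3 ratios from the Results section
print(components_needed([0.86, 0.03, 0.03]))
```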

K-Means clustering

K-Means clustering of principal components generated from normalized symptom count was used to evaluate potential consensus with hierarchical clustering. Recall that HCA is a bottom-up approach in which each symptom begins as its own cluster, and clusters are iteratively merged until all points have been accounted for. By contrast, K-Means partitions data into a set number of clusters and aims to place data points into the group with the nearest centroid. We aimed to compare results generated under these methods to identify what symptom clusters were identified in both, as well as any small but notable groupings identified by HCA (e.g., the grouping of mood and cognitive symptoms).
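The nearest-centroid partitioning described above can be sketched without external libraries. This is a plain Lloyd's-algorithm illustration, not the authors' pipeline: the PCA step is omitted, the initial centroids are passed in explicitly (an assumption made here to keep the example deterministic), and all names and toy points are hypothetical. The sse() helper computes the sum of squared errors that an elbow plot would track across cluster numbers.

```python
from math import dist

def k_means(points, centroids, max_iter=100):
    """Lloyd's algorithm: return (labels, centroids) after convergence or max_iter."""
    centroids = [tuple(c) for c in centroids]
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid
        labels = [min(range(len(centroids)), key=lambda k: dist(p, centroids[k]))
                  for p in points]
        # Update step: each centroid moves to the mean of its members
        new_centroids = []
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centroids.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centroids.append(centroids[k])  # keep the centroid of an empty cluster
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels, centroids

def sse(points, labels, centroids):
    """Sum of squared distances from each point to its assigned centroid."""
    return sum(dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))

toy_points = [(0, 0), (0, 1), (10, 10), (10, 11)]
labels, centers = k_means(toy_points, centroids=[(0, 0), (10, 10)])
print(labels, centers, sse(toy_points, labels, centers))
```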

Network analysis

Network analysis was performed in R using the package IsingFit^41,42. The network estimation procedure used, called “eLasso” and based on the Ising model, pairs regularized logistic regression with model selection based on the Extended Bayesian Information Criterion (EBIC), a measure of fit that identifies variable relationships of interest. The resulting network consists of a symmetric (undirected) weight adjacency matrix. Each value above (below) the diagonal represents an edge (relationship) between a variable in a given row to the variable in that column.

Input data to IsingFit were one-hot encodings of the symptom matrices from each group. The presence of a symptom in any count in each individual was converted to a 1, and absence of a symptom remained a zero. Symptoms were set to null when their values were either (a) all blank or (b) rare enough that IsingFit reported an error due to lack of covariance. These were nipple discharge in all groups; hot flashes and night sweats for premenopausal women; and ovulation, ovulation pain, spotting, and vomiting for menopausal women.
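A minimal sketch of this encoding step is given below. The function names and toy counts are hypothetical. Dropping columns with no variation is used here as a simple stand-in for the exclusion rule; the actual exclusions in the study were driven by errors reported by the network package, which this sketch does not reproduce.

```python
def to_presence_matrix(counts_by_user, symptoms):
    """Convert per-user symptom counts into rows of 0/1 presence flags."""
    return [[1 if user_counts.get(s, 0) > 0 else 0 for s in symptoms]
            for user_counts in counts_by_user]

def drop_constant_columns(matrix, symptoms):
    """Remove symptoms whose column is all 0 or all 1 (no variation to model)."""
    keep = [j for j in range(len(symptoms)) if len({row[j] for row in matrix}) > 1]
    return [[row[j] for j in keep] for row in matrix], [symptoms[j] for j in keep]

# Toy counts for three users; nobody logged nipple discharge
toy_counts = [{"fatigue": 3, "cramps": 1}, {"fatigue": 0}, {"cramps": 2}]
toy_symptoms = ["fatigue", "cramps", "nipple discharge"]
matrix = to_presence_matrix(toy_counts, toy_symptoms)
print(drop_constant_columns(matrix, toy_symptoms))
```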

Networks were then exported and visualized in Python using the package iGraph. The Walktrap algorithm (Python: walktrap() from iGraph) was used to identify relevant communities within the network⁴³. Negative correlations could not be used as inputs, were rare in the networks estimated by IsingFit, and were removed. Walktrap was tested with 3–10 steps, and the number of steps minimally impacted the estimated communities. Graphs displayed use 4 steps. Plots shown depict detected communities as shaded and nodes belonging to those communities in the same color. For a list of symptom names and abbreviations displayed on the graphs, see Supplemental Table 2.

Node degree, betweenness, closeness, and strength were calculated using iGraph (Python: degree(), betweenness(), closeness(), and strength() from iGraph) and used to identify the most important nodes in the network. Degree is the number of edges connected to a given node, betweenness is the extent to which one node lies along the shortest path between other nodes, closeness is a measure of the average path length between one node and the others in the network, and strength is the sum of weights attached to ties belonging to a given node.
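Two of these measures, degree and strength, follow directly from the weighted edge list and can be sketched in a few lines. The function name, symptom pairs, and edge weights below are hypothetical and are not values from the estimated networks; betweenness and closeness require shortest-path computation and are not shown.

```python
from collections import defaultdict

def degree_and_strength(edges):
    """edges: {(node_a, node_b): weight}, one entry per undirected edge."""
    degree = defaultdict(int)
    strength = defaultdict(float)
    for (a, b), weight in edges.items():
        for node in (a, b):
            degree[node] += 1         # number of edges touching the node
            strength[node] += weight  # sum of the weights of those edges
    return dict(degree), dict(strength)

# Hypothetical edge weights, for illustration only
toy_edges = {("brain fog", "anxiety"): 0.8, ("brain fog", "fatigue"): 0.5}
print(degree_and_strength(toy_edges))
```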

Data availability

Python and R code used for analysis is openly available as Jupyter notebooks at the following GitHub repository: https://github.com/OpensciAI/clustering-public. The raw datasets used and/or analysed during the current study are available from the corresponding author on reasonable request.

Funding

This work was supported in part by research funding from Amyris, Inc. provided to Opensci, LLC. JPK and SGA are founding partners of Opensci, LLC. No other funding supported this work. ADG reports no conflicts of interest. Amyris played no role in study design, data collection, analysis and interpretation of data, or the writing of this manuscript. Amyris did not have the opportunity to review this manuscript prior to its submission for peer review.

Author information

Shravan G. Aras and Azure D. Grant contributed equally to this work.

Authors and Affiliations

Center for Biomedical Informatics and Biostatistics, University of Arizona Health Sciences, Tucson, AZ, 85750, USA
Shravan G. Aras
Opensci, LLC, Tucson, AZ, 85750, USA
Shravan G. Aras & John P. Konhilas
People Science Inc., Los Angeles, CA, 90291, USA
Azure D. Grant
Department of Physiology, University of Arizona College of Medicine, Tucson, AZ, 85750, USA
John P. Konhilas

Contributions

Study conception and data collection: S.A., J.K. Hypothesis generation: S.A., A.G., J.K. Data analysis: S.A. and A.G. Writing: A.G., J.K., S.A. All authors read and approved the final manuscript.

Corresponding author

Correspondence to John P. Konhilas.

Ethics declarations

Competing interests

The authors declare no competing interests.

Ethical approval

This research was conducted in accordance with the principles outlined in the Declaration of Helsinki. All procedures have been approved by Western Institutional Review Board-Copernicus Group (registration number, OHRP and FDA, IRB0000053; parent organization number, IORG0000432) Study Number: 1284093. As part of the onboarding process, users provided informed consent for use of de-identified data in this research.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Supplementary Information.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Aras, S.G., Grant, A.D. & Konhilas, J.P. Clustering of > 145,000 symptom logs reveals distinct pre, peri, and menopausal phenotypes. Sci Rep 15, 640 (2025). https://doi.org/10.1038/s41598-024-84208-3

Received: 04 October 2024
Accepted: 20 December 2024
Published: 03 January 2025
Version of record: 03 January 2025
DOI: https://doi.org/10.1038/s41598-024-84208-3
