Introduction

Leiomyomas, also known as uterine fibroids, are benign neoplasms originating in the smooth muscle of the uterus. They are the most common pelvic tumor in females, developing in up to 80% of females by menopause, and account for up to $34 billion in annual health care costs in the United States1,2,3,4. Fibroids are the leading indication for hysterectomy4. Symptomatic fibroids have a range of reproductive health effects, including heavy and painful menses, anemia, pelvic pain, and pregnancy complications5. However, up to 50% of females with fibroids remain asymptomatic, which complicates research on fibroid etiology because asymptomatic cases can be misclassified as controls without pelvic imaging2.

Current understanding of the clinical risk factors of fibroids is limited to a small number of candidate risk factors identified primarily from self-reported fibroids or prospective cohorts of imaging-confirmed fibroids. Self-reported Black race is the most well-established risk factor for fibroids, with Black females having 2-fold higher odds of developing fibroids relative to White females2,4,6. Black females also develop more numerous and larger fibroids at younger ages7,8. Other factors associated with increased fibroid risk include higher body mass index (BMI), family history, a history of hypertension, increasing age, nulliparity, and earlier age at menarche2,3,9,10,11,12,13,14. Smoking has also been shown to be protective in some studies13,15.

Phenome-wide association studies (PheWAS) offer a unique way to interrogate comorbidity and risk relationships on a large scale. PheWAS is a data-mining approach that tests for associations between an exposure (such as a genotype or a disease diagnosis) and many available disease phenotypes in a systematic, high-throughput, and reproducible way16,17. The method uses phecodes, which group related International Classification of Diseases (ICD) codes into clinically meaningful phenotypes and allow researchers to rapidly define phenotypes and query associations. Combining PheWAS approaches with the long-term information in patient electronic health records (EHRs) can capture relationships not typically collected in traditional cohort studies. PheWAS has been used successfully across several topics, ranging from a study evaluating the relationship between the Neanderthal genome and contemporary human phenotypes to studies evaluating the comorbidities associated with systemic lupus erythematosus and leukodystrophies18,19,20,21. Although traditional PheWAS approaches cannot establish causation because they do not account for temporality, PheWAS offers the opportunity to uncover novel comorbidity associations that typical candidate risk factor studies would miss and to assess the overall comorbidity burden. Observed associations could then be used to prioritize risk factors for treatment and modification within and across groups.
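To make the phecode idea concrete, the following Python sketch collapses a patient's ICD codes into a set of phecode indicators. The mapping dictionary is a hypothetical excerpt written for illustration, not the official phecode map, and the `min_count` rule (how many mapped codes are needed to call a phecode present) is an assumed parameter rather than a setting reported in this study.

```python
from collections import Counter

# Hypothetical excerpt of an ICD-9 -> phecode map, for illustration only.
# Real analyses use the published phecode map distributed with PheWAS tools.
ICD9_TO_PHECODE = {
    "218.0": "218",  # leiomyoma codes rolling up to one phecode (illustrative)
    "218.1": "218",
    "218.9": "218",
    "617.0": "615",  # endometriosis codes (illustrative)
    "617.9": "615",
}


def phecode_indicators(icd_codes, icd_map, min_count=1):
    """Return the set of phecodes considered present for one patient.

    A phecode is present when at least `min_count` of the patient's ICD
    codes map to it; codes absent from `icd_map` are ignored.
    """
    counts = Counter(icd_map[code] for code in icd_codes if code in icd_map)
    return {phecode for phecode, n in counts.items() if n >= min_count}


# Two distinct leiomyoma codes collapse into a single phenotype;
# "401.9" is not in this excerpt, so it is ignored.
print(phecode_indicators(["218.1", "218.9", "401.9"], ICD9_TO_PHECODE))
# Under a stricter (hypothetical) two-code rule, one code is not enough.
print(phecode_indicators(["617.0"], ICD9_TO_PHECODE, min_count=2))
```

Grouping many raw codes into one phenotype is what keeps the number of tests in the low thousands rather than the tens of thousands of individual ICD codes.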

We used PheWAS to systematically investigate the clinical context of fibroids, to understand broader disease associations, and to explore the clinical phenome. We hypothesized that fibroid status would be associated with known fibroid symptoms and that individuals with fibroids would demonstrate an increased burden of comorbidities. Using two large clinical cohorts, we conducted PheWAS analyses with a previously published and validated phenotyping algorithm that required imaging confirmation to define fibroid cases and controls22. The study used a two-stage design: discovery analyses were performed in Vanderbilt University Medical Center’s Synthetic Derivative database, and the Geisinger Health System EHR database was used to validate discovery results.

Methods

Study cohorts

We used Vanderbilt University Medical Center’s (VUMC) Synthetic Derivative (SD) for our discovery analyses. The SD is a de-identified mirror of the VUMC EHRs containing longitudinal data, including demographic and clinical information, for over 3 million individuals who have received care in the VUMC healthcare system23. Non-Hispanic Black and White females 18 years or older were eligible for inclusion, with race and ethnicity either self-reported or recorded by providers. Cases and controls were identified using our previously published algorithm, which has been shown to have positive and negative predictive values of 96% and 98%, respectively22. Briefly, cases had at least one International Classification of Diseases, 9th Revision (ICD-9) or Current Procedural Terminology (CPT) code for pelvic imaging and at least one ICD-9 or CPT code indicating a fibroid diagnosis. Controls had at least two procedural codes for pelvic imaging, no fibroid diagnosis at the time of the last pelvic exam, and no history of hysterectomy, myomectomy, or uterine artery embolization.
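The inclusion rules above can be sketched as a simple decision function. This is a simplified illustration, not the published algorithm: it assumes code counts and procedure-history flags have already been extracted from the EHR, uses hypothetical field names, and folds the timing requirement (no fibroid diagnosis at the time of the last pelvic exam) into a single precomputed count.

```python
from dataclasses import dataclass


@dataclass
class PatientCodes:
    # Hypothetical, precomputed summaries of a patient's ICD-9/CPT history.
    n_pelvic_imaging: int  # codes for pelvic imaging procedures
    n_fibroid_dx: int      # fibroid diagnosis codes (up to the last pelvic exam)
    hysterectomy: bool = False
    myomectomy: bool = False
    uterine_artery_embolization: bool = False


def classify(p: PatientCodes) -> str:
    """Simplified case/control assignment following the rules in the text."""
    # Case: >= 1 pelvic imaging code and >= 1 fibroid diagnosis code.
    if p.n_pelvic_imaging >= 1 and p.n_fibroid_dx >= 1:
        return "case"
    # Control: >= 2 imaging codes, no fibroid diagnosis, and no
    # hysterectomy, myomectomy, or uterine artery embolization.
    no_uterine_procedure = not (
        p.hysterectomy or p.myomectomy or p.uterine_artery_embolization
    )
    if p.n_pelvic_imaging >= 2 and p.n_fibroid_dx == 0 and no_uterine_procedure:
        return "control"
    # Everyone else (e.g., a fibroid code without any imaging) is excluded.
    return "excluded"


print(classify(PatientCodes(n_pelvic_imaging=1, n_fibroid_dx=2)))  # case
print(classify(PatientCodes(n_pelvic_imaging=3, n_fibroid_dx=0)))  # control
print(classify(PatientCodes(n_pelvic_imaging=3, n_fibroid_dx=0, hysterectomy=True)))  # excluded
```

Requiring imaging for everyone, and two imaging codes for controls, is what keeps undiagnosed asymptomatic fibroids from leaking into the control group.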

The Geisinger Health System (GHS) database was used as the validation cohort, with cases and controls identified by the same algorithm. GHS is a fully integrated health system serving three million residents of north-central and northeastern Pennsylvania. The database draws on GHS’s physician group practices, which include a network of 1000 physicians across 75 sites, including 41 community care clinics. As all data were de-identified, this study was deemed non-human subjects research and approved by the Vanderbilt University Medical Center Institutional Review Board. All methods were carried out in accordance with relevant guidelines and regulations.

Statistics and reproducibility

PheWAS analyses, adjusted for age and BMI, were performed with uterine fibroids as the outcome and each diagnosis, condition, or clinical characteristic (phecode) as an exposure. Analyses were performed using the PheWAS package (v0.99.5-3) in R (version 4.3.2)24. Discovery PheWAS was first performed in the SD, and statistically significant results were then tested for validation in the GHS cohort (Fig. 1a, Supplementary Fig. 1). A Bonferroni correction based on the number of tests in each discovery cohort was used to determine significance (p-value ≤ 2.98 × 10⁻⁵ for non-Hispanic Black individuals, 2.87 × 10⁻⁵ for non-Hispanic White individuals, and 2.99 × 10⁻⁵ for the multi-population analysis). Phecodes with statistically significant associations in the discovery cohorts were carried forward for testing in the validation cohorts. Inverse-variance weighted fixed-effects meta-analyses were performed using METAL software for associations that were statistically significant in the discovery cohorts and had the same direction of effect in the discovery and validation analyses25. Race-stratified and multi-population meta-analyses were performed across the cohorts (Supplementary Data 1–3). Bonferroni thresholds for the meta-analyses were based on the number of statistically significant discovery phecodes that were available and had the same direction of effect in the validation cohorts (non-Hispanic Black: 2.55 × 10⁻⁴; non-Hispanic White: 1.33 × 10⁻⁴; multi-population: 1.28 × 10⁻⁴). Secondary analyses adjusting only for age were also performed (Supplementary Data 4).
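The inverse-variance weighted fixed-effects scheme can be sketched in a few lines of Python. This is a textbook illustration of the method, not METAL's implementation; it assumes each cohort contributes a log odds ratio and its standard error from the age- and BMI-adjusted logistic model, and the input values in the usage line are hypothetical. Each cohort is weighted by 1/SE², so the more precise cohort dominates the pooled estimate.

```python
import math


def ivw_fixed_effects(betas, ses):
    """Inverse-variance weighted fixed-effects meta-analysis.

    betas: per-cohort log odds ratios; ses: their standard errors.
    Returns (pooled log OR, pooled SE, z statistic, two-sided p-value).
    """
    if not betas or len(betas) != len(ses):
        raise ValueError("need at least one cohort and one SE per estimate")
    weights = [1.0 / se ** 2 for se in ses]  # w_i = 1 / SE_i^2
    total_w = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, betas)) / total_w
    pooled_se = math.sqrt(1.0 / total_w)
    z = pooled / pooled_se
    # Two-sided normal p-value; underflows to 0.0 once |z| exceeds roughly 38.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return pooled, pooled_se, z, p


# Hypothetical discovery (SD) and validation (GHS) estimates for one phecode.
beta, se, z, p = ivw_fixed_effects([math.log(1.6), math.log(1.4)], [0.05, 0.08])
print(f"pooled OR = {math.exp(beta):.2f}, SE = {se:.4f}, z = {z:.1f}, p = {p:.1e}")
```

p-values as small as some of those reported in the Results (e.g., on the order of 10⁻⁷⁷⁸) lie below the range of double-precision floats, so they would have to be computed on the log scale rather than with `math.erfc` directly.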

Fig. 1: Manhattan plots of PheWAS results from meta-analyzed cohorts.

Manhattan plots of phenome-wide association study results for the multi-population [N max = 79,213] (a), non-Hispanic Black [N max = 11,342] (b), and non-Hispanic White [N max = 67,871] (c) meta-analyses. Panels display validated, statistically significant results that were meta-analyzed across discovery and validation cohorts. Each triangle represents a phenotype; the triangle points up for increased odds and down for decreased odds of fibroids. The red line indicates the statistical significance threshold, and triangles above this line are significantly associated with fibroids.

Using previously published methods in the R PheWAS package, phecodes were classified into 16 disease groups based largely on organ systems and/or biologic processes17,24. A binomial (sign) test was used to assess whether significant associations were enriched for one direction of effect, both across all tests and within each disease category, in the race-stratified and multi-population meta-analyses. A Bonferroni correction was used to determine significance for the within-category tests.
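The directional (sign) test amounts to asking whether the number of positive associations among the significant ones departs from a 50:50 split. A minimal exact two-sided version in Python follows; it is an illustrative sketch rather than the PheWAS package's code, and the counts in the usage line are made up.

```python
from math import comb


def sign_test(n_positive, n_total):
    """Exact two-sided binomial (sign) test of H0: P(positive effect) = 0.5."""
    if not 0 <= n_positive <= n_total:
        raise ValueError("n_positive must lie between 0 and n_total")
    if n_total == 0:
        return 1.0
    # Under H0 the null distribution is symmetric, so the two-sided p-value
    # is twice the upper tail beyond the larger of the two counts.
    k = max(n_positive, n_total - n_positive)
    upper_tail = sum(comb(n_total, i) for i in range(k, n_total + 1)) / 2 ** n_total
    return min(1.0, 2.0 * upper_tail)


# Hypothetical disease group: 18 of 20 significant associations are positive.
print(sign_test(18, 20))
```

Python's arbitrary-precision integers keep the binomial sums exact even for several hundred associations, which is why no floating-point approximation is needed here.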

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Results

Study Populations

We identified 52,295 females in the SD database with complete covariate (age and BMI) information for discovery analyses (9022 cases, 43,273 controls; Table 1). In the GHS validation cohort, 26,918 females (10,232 cases, 16,686 controls) had complete covariate information (Fig. 1a). Non-Hispanic Black individuals, classified using EHR-reported race and ethnicity, made up 19.88% and 3.51% of the discovery and validation cohorts, respectively. Average age at diagnosis was lower in non-Hispanic Black cases (SD: 39.2 years, GHS: 45.0 years) than in non-Hispanic White cases (SD: 44.7 years, GHS: 52.7 years) in both cohorts. Average BMI was higher in cases than in controls in both racial groups (SD non-Hispanic Black and White individuals: 33.2 and 28.9 kg/m² in fibroid cases and 31.6 and 27.8 kg/m² in controls; GHS non-Hispanic Black and White individuals: 32.3 and 30.8 kg/m² in fibroid cases and 31.8 and 29.9 kg/m² in controls). In the discovery cohort, a higher proportion of non-Hispanic Black individuals had hypertension and type 2 diabetes (5% and 11% in cases, 3% and 6% in controls) than non-Hispanic White individuals (2% and 6% in cases, 2% and 4% in controls). Both cases and controls in the GHS cohort had a higher burden of diabetes and hypertension than those in the SD cohort, likely reflecting the older age and higher obesity burden of GHS patients (Table 1)26.

Table 1 Demographics of study populations

SD database discovery PheWAS

In discovery analyses using the SD, 1678 and 1743 traits were tested for association with uterine fibroids in non-Hispanic Black and White individuals, respectively (Supplementary Data 1, 2). After correction for multiple testing, 208 associations in non-Hispanic Black and 425 in non-Hispanic White individuals were statistically significant. One hundred and ninety phecodes were statistically significant in both races in the discovery dataset (Supplementary Data 1, 2). Seventeen were statistically significant only in non-Hispanic Black females, whereas 234 were statistically significant only in non-Hispanic White females. One association (285, other anemias) was statistically significant in both non-Hispanic Black and non-Hispanic White individuals but had opposite directions of effect (positive in non-Hispanic Black and negative in non-Hispanic White individuals). In the multi-population analysis, 1671 traits were tested for association with fibroids, of which 482 were statistically significant (Supplementary Data 3).
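Assuming a family-wise α of 0.05 (the text does not state α explicitly), dividing by the number of traits tested in each discovery PheWAS should reproduce the Bonferroni thresholds given in the Methods. A short Python check:

```python
ALPHA = 0.05  # assumed family-wise error rate (not stated explicitly in the text)


def bonferroni_threshold(n_tests, alpha=ALPHA):
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests


# Number of traits tested in each discovery PheWAS (from the Results).
discovery_tests = {
    "non-Hispanic Black": 1678,
    "non-Hispanic White": 1743,
    "multi-population": 1671,
}
for group, n in discovery_tests.items():
    print(f"{group}: {n} tests -> p <= {bonferroni_threshold(n):.2e}")
```

This prints thresholds of 2.98e-05, 2.87e-05, and 2.99e-05, matching the reported values and consistent with α = 0.05.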

GHS database validation PheWAS

In validation analyses, 197 of the phecodes that were statistically significant in the non-Hispanic Black discovery cohort were available in the non-Hispanic Black validation cohort. One hundred and sixty-nine of these had the same direction of effect as in the discovery cohort (Supplementary Data 1), though only nine were also statistically significant in the validation PheWAS. In non-Hispanic White females, 377 of the 420 available phecodes that were statistically significant in the discovery cohort had the same direction of effect in the GHS database (Supplementary Data 2), and 211 of these were also statistically significant in the non-Hispanic White GHS cohort. Of the 437 available phecodes that were statistically significant in the discovery multi-population analysis, 392 replicated, i.e., had a consistent direction of effect across EHR databases (Supplementary Data 3). Two hundred and eleven of the replicated codes in the multi-population cohort were also statistically significant in the validation PheWAS.
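Replication here means that a discovery-significant phecode is available in the validation cohort with the same sign of effect; it does not require validation significance. The Python sketch below illustrates that filter and the resulting meta-analysis threshold. The phecode IDs and estimates are hypothetical, and α = 0.05 is an assumption; with it, 377 and 392 carried-forward phecodes give 1.33 × 10⁻⁴ and 1.28 × 10⁻⁴, matching the thresholds in the Methods.

```python
def replicated_phecodes(discovery_hits, validation_betas):
    """Discovery-significant phecodes that replicate by direction of effect.

    discovery_hits: phecode -> log OR for phecodes significant in discovery.
    validation_betas: phecode -> log OR in the validation cohort; phecodes
    missing here were not available for testing.
    """
    return sorted(
        code
        for code, beta in discovery_hits.items()
        if code in validation_betas and beta * validation_betas[code] > 0
    )


def meta_threshold(n_carried_forward, alpha=0.05):
    """Bonferroni threshold over the phecodes carried into meta-analysis."""
    return alpha / n_carried_forward


# Hypothetical phecode IDs and estimates, for illustration only.
discovery = {"phe_a": 0.8, "phe_b": -0.4, "phe_c": 0.3, "phe_d": 0.5}
validation = {"phe_a": 0.6, "phe_b": 0.1, "phe_c": 0.2}  # phe_d unavailable
print(replicated_phecodes(discovery, validation))  # ['phe_a', 'phe_c']

# Carried-forward counts for the non-Hispanic White and multi-population analyses.
for n in (377, 392):
    print(n, f"{meta_threshold(n):.2e}")
```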

Multi-population SD and GHS meta-analyses

In the multi-population meta-analysis, almost all of the 392 phecode associations meta-analyzed across the discovery and validation cohorts (389, 99.23%) were statistically significant at the Bonferroni level (Fig. 1a, Tables 2, 3). Three phecodes, atherosclerosis of native arteries of the extremities with ulceration or gangrene (p-value = 3.03 × 10⁻⁴), fracture of the foot (p-value = 4.66 × 10⁻⁴), and open wounds of head; neck; and trunk (p-value = 6.25 × 10⁻⁴), had suggestive significance but did not reach the Bonferroni threshold. Demonstrating the performance of our algorithm for defining cases and controls, the association with the largest odds ratio (OR) was benign neoplasm of the uterus (ORmulti = 4625.78, 95% confidence interval [CI] = 3507.46–6100.66, p-value = 3.87 × 10⁻⁷⁷⁸). This association was observed within individual races, in multi-population meta-analyses, and across the discovery and validation cohorts (Tables 2, 4). Several known fibroid symptoms and risk factors were also associated with fibroid status, including disorders of menstruation and other abnormal bleeding from female genital tract (ORmulti = 6.45, 95% CI = 6.13–6.78, p-value = 5.60 × 10⁻¹¹³⁰), endometriosis (ORmulti = 7.89, 95% CI = 7.02–8.88, p-value = 2.59 × 10⁻²⁵⁸), diagnoses of overweight, obesity, and other hyperalimentation (ORmulti = 1.52, 95% CI = 1.45–1.60, p-value = 2.21 × 10⁻⁵⁸), disorders of lipid metabolism (ORmulti = 1.47, 95% CI = 1.40–1.54, p-value = 4.61 × 10⁻⁵⁹), and vitamin D deficiency (ORmulti = 1.43, 95% CI = 1.35–1.51, p-value = 4.49 × 10⁻³⁶). In addition to validating known associations with fibroids (Table 2), we discovered several associations within the multi-population meta-analyses that, to the best of our knowledge, have not previously been linked to fibroids (Table 3). Many of these associations, which we herein refer to as novel, were also statistically significant within the race-stratified meta-analyses.
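Because each reported OR and 95% CI summarizes a logistic-regression coefficient on the log scale, the standard error, and hence the z statistic behind each p-value, can be approximately recovered from the published numbers. The Python sketch below assumes Wald-type intervals (estimate ± 1.96 SE on the log-odds scale), which the text does not state explicitly; rounding of the reported bounds limits the precision of the recovered values.

```python
import math

Z_975 = 1.959963984540054  # 97.5th percentile of the standard normal


def se_from_ci(lower, upper, z=Z_975):
    """SE of log(OR) implied by a symmetric (Wald) 95% CI on the log scale."""
    return (math.log(upper) - math.log(lower)) / (2.0 * z)


def or_from_ci(lower, upper):
    """Point estimate implied by such a CI: the geometric mean of its bounds."""
    return math.sqrt(lower * upper)


# Endometriosis, multi-population meta-analysis (values from the text).
odds_ratio, lower, upper = 7.89, 7.02, 8.88
se = se_from_ci(lower, upper)
z = math.log(odds_ratio) / se
print(f"implied OR {or_from_ci(lower, upper):.2f}, SE(log OR) {se:.4f}, z {z:.1f}")
```

The implied point estimate agrees with the reported OR to within rounding, which is a quick consistency check on transcribed CIs.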

Table 2 Multi-population associations of known symptoms and diagnoses with odds of uterine fibroids diagnosis
Table 3 Select multi-population associations of novel symptoms and diagnoses with odds of developing uterine fibroids
Table 4 Select associations of symptoms and diagnoses with increased and decreased odds of developing uterine fibroids within race

The most statistically significant associations within and across races and datasets were genitourinary diagnoses and other symptoms typically associated with fibroids (Fig. 2a, Tables 3, 4), including irregular menstrual cycle (ORmulti = 6.64, 95% CI = 6.31–7.00, p-value = 5.60 × 10⁻¹¹¹¹), excessive or frequent menstruation (ORmulti = 12.1, 95% CI = 11.40–12.96, p-value = 1.87 × 10⁻¹²⁶²), dysmenorrhea (ORmulti = 10.16, 95% CI = 9.19–11.23, p-value = 2.17 × 10⁻⁴⁴⁸), pain and other symptoms of female genital organs (ORmulti = 3.02, 95% CI = 2.85–3.20, p-value = 9.09 × 10⁻³¹¹), and malaise and fatigue (ORmulti = 1.42, 95% CI = 1.36–1.49, p-value = 2.02 × 10⁻⁵⁶). An array of other gynecological or reproductive diseases (Tables 2, 3) were also associated with increased odds of fibroids, including endometriosis (ORmulti = 7.89, 95% CI = 7.02–8.88, p-value = 2.59 × 10⁻²⁵⁸), inflammatory diseases of female pelvic organs (ORmulti = 2.40, 95% CI = 2.27–2.55, p-value = 5.03 × 10⁻¹⁹⁰), noninflammatory disorders of female genitals (ORmulti = 3.72, 95% CI = 3.48–3.97, p-value = 1.37 × 10⁻³³⁷), endometrial hyperplasia (ORmulti = 6.94, 95% CI = 5.81–8.29, p-value = 1.76 × 10⁻¹⁰¹), and ovarian cysts (ORmulti = 6.65, 95% CI = 6.21–7.12, p-value = 6.52 × 10⁻⁶³⁶).

Fig. 2: Breakdown of significant, replicated PheWAS outcomes by diagnosis group for meta-analyses.

Pie charts illustrating the percentage of statistically significant, replicated associations within each of the 16 disease groups. The percentage of significant results in each category is displayed for the multi-population [N max = 79,213] (a), non-Hispanic Black [N max = 11,342] (b), and non-Hispanic White [N max = 67,871] (c) meta-analyses.

Hypotension not otherwise specified (ORmulti = 0.53, 95% CI = 0.46–0.62, p-value = 4.74 × 10⁻¹⁸) was associated with reduced odds of fibroids, consistent with hypertension being a known risk factor for fibroids. Several other circulatory system diseases and symptoms, not previously linked to fibroids, were also associated with fibroid status. Ischemic heart disease (ORmulti = 0.66, 95% CI = 0.61–0.71, p-value = 6.84 × 10⁻²⁵), pulmonary heart disease (ORmulti = 0.66, 95% CI = 0.59–0.74, p-value = 8.36 × 10⁻¹³), non-hypertensive congestive heart failure (ORmulti = 0.48, 95% CI = 0.43–0.53, p-value = 2.38 × 10⁻⁴³), and peripheral vascular disease (ORmulti = 0.70, 95% CI = 0.62–0.80, p-value = 2.61 × 10⁻⁷) were associated with reduced odds of fibroid diagnosis. Hemorrhoids (ORmulti = 1.68, 95% CI = 1.54–1.84, p-value = 6.73 × 10⁻³⁰) and palpitations (ORmulti = 1.55, 95% CI = 1.45–1.65, p-value = 9.66 × 10⁻³⁹) were the only two circulatory diagnoses associated with increased odds (Tables 2, 3).

Uterine fibroids were also associated with neoplastic growths in both the genitourinary and neoplasm diagnosis categories. Outside of benign neoplasm of the uterus (the phecode under which the code for uterine fibroids falls), the association with the largest odds ratio was malignant neoplasm of the uterus (ORmulti = 247.70, 95% CI = 182.98–335.30, p-value = 9.15 × 10⁻²⁷⁹). Polyps of female genital organs (ORmulti = 7.88, 95% CI = 7.04–8.82, p-value = 6.19 × 10⁻²⁸¹) were associated with increased odds of fibroids. Consistent with previous evidence linking keloids and fibroids, we found a positive association between other hypertrophic skin conditions (phecode 701, which includes scars and keloids) and uterine fibroids (ORmulti = 2.02, 95% CI = 1.86–2.19, p-value = 1.13 × 10⁻⁶¹). Other benign growths, such as benign neoplasms of the ovary (ORmulti = 25.32, 95% CI = 20.42–31.39, p-value = 7.22 × 10⁻¹⁹¹) and mammary dysplasia (ORmulti = 3.00, 95% CI = 2.76–3.27, p-value = 9.96 × 10⁻¹⁴⁵), were also positively associated with uterine fibroids.

Respiratory diagnoses were also significantly associated with fibroids. Most respiratory diagnoses were associated with increased odds of fibroids, including acute (ORmulti = 2.18, 95% CI = 2.07–2.29, p-value = 5.55 × 10⁻¹⁹²) and chronic sinusitis (ORmulti = 1.94, 95% CI = 1.79–2.10, p-value = 1.14 × 10⁻⁶¹), acute bronchitis and bronchiolitis (ORmulti = 1.65, 95% CI = 1.54–1.76, p-value = 2.05 × 10⁻⁴⁹), and allergic rhinitis (ORmulti = 1.95, 95% CI = 1.85–2.05, p-value = 3.52 × 10⁻¹⁴⁴). However, a few respiratory diagnoses, including pleurisy, pulmonary collapse, and respiratory failure, were associated with lower odds of fibroids (ORmulti range = 0.35–0.50; 95% CIs spanning 0.32–0.55; p-values < 5.00 × 10⁻³⁶).

Racially Stratified Meta-Analyses

All 169 phecodes that were meta-analyzed in the non-Hispanic Black cohort were significantly associated with uterine fibroids after adjustment for multiple testing (Fig. 1b). Of the 377 phecodes meta-analyzed in the non-Hispanic White cohort, 369 (~98%) were significantly associated with fibroids, and only eight did not reach significance after adjustment for multiple testing (Fig. 1c). Known risk factors, including overweight/obesity, other hypertrophic and atrophic conditions of skin, and pelvic inflammatory disease, were significantly associated with fibroids in both groups (Table 4). Symptoms often associated with fibroids, such as dysuria, pain and other symptoms associated with female genital organs, ovarian cysts, and malaise and fatigue, were also associated with fibroid diagnosis in both groups. In general, associations between fibroid status and several known and novel factors were stronger in non-Hispanic Black females than in non-Hispanic White females (e.g., vitamin D deficiency ORblack = 2.00, 95% CI = 1.71–2.33, p-value = 7.74 × 10⁻¹⁹, ORwhite = 1.36, 95% CI = 1.28–1.44, p-value = 1.57 × 10⁻²³; endometriosis ORblack = 9.51, 95% CI = 6.91–13.08, p-value = 1.30 × 10⁻⁴³, ORwhite = 7.66, 95% CI = 6.75–8.70, p-value = 5.84 × 10⁻²¹⁷), though in a few instances the opposite was true (e.g., benign neoplasm of ovary ORblack = 20.23, 95% CI = 13.04–31.38, p-value = 4.86 × 10⁻⁴¹, ORwhite = 27.17, 95% CI = 21.24–34.76, p-value = 5.10 × 10⁻¹⁵²). Several novel associations were observed in both non-Hispanic Black and White cohorts, including genitourinary diagnoses such as genital prolapse and polyps of female genital organs (Table 4, Supplementary Data 1, 2).

There was an overall enrichment for positive associations in both race-stratified meta-analyses, indicating a marked increase in comorbidities in individuals with uterine fibroids compared to those without fibroids (Table 5). Within non-Hispanic Black and White females and in the multi-population analysis, genitourinary diagnoses represented the highest proportion of statistically significant, replicated associations (Fig. 2a–c). Diagnoses in the circulatory, endocrine/metabolic, respiratory, neoplasm, and musculoskeletal groups followed, although their exact rank varied by race. All diagnoses within the sense organ and musculoskeletal groups were positively associated with fibroids, suggesting that fibroids tend to co-occur with musculoskeletal and sense organ diagnoses. Both within and across races, diagnoses in the genitourinary group tended to be positively associated with fibroids, whereas circulatory diagnoses tended to be associated with decreased odds of fibroids (Tables 4, 5). Diagnoses in the dermatologic and sense organ groups were significantly enriched for positive associations only in non-Hispanic White females (Table 5).

Table 5 Sign tests for directionality within and across disease groups

Discussion

Using a validated, multi-stage PheWAS, we found statistically significant associations between fibroid status and multiple disease categories, with the strongest associations in the genitourinary, musculoskeletal, and neoplasm categories. Importantly, known fibroid risk factors and symptoms, such as inflammatory diseases of female pelvic organs, disorders of menstruation and other abnormal bleeding from the female genital tract, dysmenorrhea, hyperlipidemia, and vitamin D deficiency, were among the most strongly associated diagnoses, supporting the validity of our method. Beyond these known risk factors, we also identified several novel diagnoses associated with uterine fibroids, including neoplasms and diagnoses linked to autoimmunity. In general, females with uterine fibroids had a significantly higher number of comorbid diagnoses than controls. Across disease groups, diagnoses tended to be positively associated with fibroids within and across race groups, suggesting that fibroids are associated with increased comorbidity in many disease groups.

Genitourinary diagnoses, such as symptoms related to menstruation (frequent, irregular, excessive), dysmenorrhea, pain in female genital organs, disorders of the urinary system, and early menopause, showed the strongest associations. These are the symptoms most frequently reported by individuals with symptomatic fibroids, further supporting the validity of our phenotyping algorithm and PheWAS approach27,28,29. Our results also confirm previously identified relationships between other genitourinary diagnoses and uterine fibroids. For example, there was a strong relationship between leiomyoma and endometriosis. Previous studies have identified a positive relationship between endometriosis and fibroids, as well as evidence of a common genetic basis for the two conditions30,31,32,33.

Known risk factors and related conditions outside the genitourinary group were also observed. For example, diagnoses related to BMI, an established risk factor for fibroids, were more common in individuals with fibroids regardless of race. These diagnoses, including obesity and disorders of lipid metabolism, are also established risk factors for leiomyoma6,34,35,36. Because our models were adjusted for BMI, these associations may reflect pathways other than BMI, or nonlinear relationships with BMI that a linear adjustment does not capture.

Vitamin D deficiency was significantly associated with fibroids. Case-control studies have found lower vitamin D levels in females with uterine fibroids10,37,38,39, and lower vitamin D levels have also been observed in Black females compared to White females40. In our study, the association between vitamin D deficiency and fibroids was stronger in non-Hispanic Black individuals than in non-Hispanic White individuals. In vitro studies have identified a role for vitamin D in reducing the expression of key genes related to extracellular matrix production in fibroid cells41. Diagnoses of atrophic skin conditions were also associated with increased odds of uterine fibroid diagnosis42,43.

Our study identified several novel associations. Diagnoses such as inflammatory pelvic disease, non-inflammatory pelvic disease, and benign mammary dysplasia, which have not been well documented as associated with fibroids, were associated with increased odds of fibroids. As with leiomyomas, many of these genitourinary diagnoses are related to estrogen or other hormone dysregulation (endometrial hyperplasia, endometriosis, pelvic inflammatory disease, cystic mastopathy)44,45,46,47,48,49,50,51. These findings suggest a plausible etiologic mechanism shared across genitourinary diseases: hormone dysregulation52.

Fibroids are generally considered benign neoplasms3,5,53,54. However, our findings suggest that fibroids may share underlying biology with both benign and malignant neoplasms. The second-largest association, after benign neoplasm of the uterus (the parent code for fibroids), was malignant neoplasm of the uterus. Cervical cancer, cervical intraepithelial dysplasia, abnormal Papanicolaou smear of the cervix, cervical and genital polyps, and benign neoplasms of the ovary and breast were also positively associated with fibroids. If fibroids or polyps develop before genitourinary and reproductive malignancies, these conditions could be risk factors and may be useful in screening.

Diagnoses in the circulatory system category, such as ischemic heart disease, peripheral vascular disease, and congestive heart failure, were consistently associated with reduced odds of uterine fibroid diagnosis. Hypotension not otherwise specified was also negatively associated with fibroids, consistent with reports that high blood pressure and hypertension increase the risk of fibroids11,55,56. These results suggest an underlying mechanism linking cardiovascular function and uterine fibroids. Estrogen, which has been associated with fibroid development, has known protective effects against cardiovascular disease57. Interestingly, obesity and metabolic disorders, which were positively associated with fibroids in our study, are typically associated with increased risk of vascular disease58,59,60. The links among leiomyoma, obesity, metabolic disease, and cardiovascular outcomes suggest complex relationships between these conditions that require further research.

We did not observe a consistent pattern of increased or decreased odds of fibroids within other disease groups. However, across the remaining disease groups, diagnoses linked to immune or inflammatory pathways tended to be associated with increased odds of fibroids. For example, respiratory diagnoses such as acute bronchiolitis, chronic and acute sinusitis, and upper respiratory infections were associated with increased odds of fibroids. Similarly, diagnoses in other groups that are linked with immunity and/or inflammation, including symptoms (cervical radiculitis, thoracic neuritis/radiculitis, cervicalgia61), digestive (irritable bowel syndrome62), musculoskeletal (synovitis/tenosynovitis, pain and stiffness in joint63), and dermatological (dyschromia and vitiligo64, alopecia65) diagnoses, were also associated with increased odds of fibroids in our study. Inflammatory processes and dysregulation have previously been implicated in the development of uterine fibroids66, endometrial disorders67, and cardiovascular disease via metabolic syndrome68.

In general, non-Hispanic Black and White females showed similar patterns of association. Race-specific associations tended to have related diagnoses that were statistically significant in the multi-population meta-analysis or in non-Hispanic White females. For example, fluid overload was significant only in non-Hispanic Black females; however, the related diagnosis “disorders of fluid, electrolyte, and acid base balance” was statistically significant in both races and in the multi-population meta-analysis. Differences in associations between non-Hispanic Black and White females may reflect genetic and/or non-genetic factors. However, such a large disparity may also result from lower statistical power, given the smaller number of non-Hispanic Black cases relative to non-Hispanic White cases. Further research is needed to replicate these differences and characterize any racial disparities in comorbid diagnoses.

PheWAS provides a unique way to test for comorbidities and patterns of disease systematically. This method depends on EHR diagnostic and billing codes, which do not always correspond to true disease presence. Reliance on these codes could lead to bias due to misclassification. However, validation in two different EHR databases, where clinical practice and coding are likely to vary, lessens the probability that significant results are due to bias or chance. Furthermore, fibroid cases and controls were identified using our previously published algorithm, which has been shown to have high performance22. One component of this algorithm was the requirement that all individuals have pelvic imaging to be eligible for inclusion. This requirement reduces the misclassification of cases and controls, and the resulting bias, that asymptomatic disease would otherwise introduce. Controls also had to have an intact uterus, making them a more appropriate comparison group than individuals without a uterus (i.e., those who had hysterectomies), who cannot have fibroids. A stringent Bonferroni-corrected significance threshold was also adopted to further reduce the chance of false associations. The successful identification of well-known risk factors indicates that these methods can detect real relationships. However, the associations identified in this study do not imply causality. A well-controlled longitudinal study may provide more insight into the causal direction between fibroids and other diseases.

We validated previously reported risk factors and identified novel diagnoses not previously linked to uterine fibroids. In general, females with uterine fibroids bear a larger burden of comorbid traits across most disease-diagnosis groups. We detected novel significant associations of fibroids with malignant neoplasms of the uterus and cervix, as well as negative associations with cardiovascular diagnoses and positive associations with inflammation-related diseases. By leveraging large-scale EHR databases and PheWAS methodology, this study provides the most detailed systematic investigation of fibroids and their comorbidities to date and demonstrates a novel approach to identifying previously uncharacterized comorbidities of uterine fibroids.