Gene regulatory activity associated with polycystic ovary syndrome revealed DENND1A-dependent testosterone production

Sankaranarayanan, Laavanya; Brewer, Kelly J.; Morrow, Shauna; Johnson, Graham D.; Barrera, Alejandro; Venukuttan, Revathy; Sisk, Ryan; Dunaif, Andrea; Reddy, Timothy E.

doi:10.1038/s41467-025-62884-7


Article
Open access
Published: 18 August 2025

Nature Communications volume 16, Article number: 7697 (2025)

Abstract

Polycystic ovary syndrome (PCOS) is among the most common disorders, affecting up to 15% of the menstruating population globally. It is the leading cause of anovulatory infertility and a major risk factor for type 2 diabetes. Elevated testosterone levels are a core endophenotype. Despite this prevalence, the underlying causes remain unknown. PCOS genome-wide association studies (GWAS) have reproducibly mapped a number of susceptibility loci, including one encompassing DENND1A, a gene that regulates androgen biosynthesis. Identifying the causal variants within these loci will provide fundamental insight into the biological pathways that are disrupted in PCOS. Here, we report the discovery of gene regulatory mechanisms that help explain the genetic associations with PCOS at the GATA4, FSHB and DENND1A loci, using a combination of high-throughput reporter assays, CRISPR-based epigenome editing, and genetic association analysis in PCOS case and control populations. In addition, we find that increasing endogenous DENND1A expression by perturbing candidate regulatory elements elevates testosterone levels in an adrenal cell model. These results further highlight the potential for combining genetic variant analyses with experimental approaches to fine-map genetic associations with disease risk.

Introduction

Polycystic Ovary Syndrome (PCOS) is one of the most common disorders affecting people who menstruate, with prevalence rates of 5% to 15%¹ depending on the diagnostic criteria applied². It is the leading cause of anovulatory infertility. PCOS is commonly associated with insulin resistance and obesity, disorders that confer increased risk for type 2 diabetes as well as for other serious cardiometabolic morbidities across the lifespan^{3,4}. However, the cause(s) of PCOS remain unknown, and the disorder is relatively understudied compared with other common medical conditions affecting women⁵.

Genetic factors are a major contributor to PCOS. Twin studies estimate that the narrow-sense heritability of PCOS is ~79%⁶. There are currently 30 genomic loci that are associated with altered PCOS risk. One locus, ZBTB16, is associated with decreased PCOS risk (odds ratio = 0.8), while the rest of the loci are associated with increased PCOS risk, with odds ratios between 1.1 and 5.6 (GWAS catalog accessed 20 Oct 2023)^{7,8,9,10,11,12,13,14,15,16}. The associated regions encompass genes involved in neuroendocrine, reproductive, and metabolic pathways. The functional consequences of noncoding genetic variants associated with complex traits such as PCOS have been exceptionally difficult to elucidate^{17,18}. One challenge in fine-mapping GWAS signals is distinguishing the causal genetic variant(s) from other variants in regions of strong linkage disequilibrium (LD). In general, the lead GWAS SNPs are not themselves causal but tag regions of the genome containing non-coding pathogenic variants^{17,19} that contribute to common disease risk by altering regulatory element activity and downstream gene expression^{20,21}. Nevertheless, GWAS have provided considerable insight into PCOS causal pathways. DENND1A was first identified as a PCOS candidate gene in GWAS¹⁴. DENND1A was subsequently shown to be an important regulator of theca cell androgen biosynthesis, in which its ectopic overexpression led to increased androgen production^{22,23,24,25,26,27,28}. Collectively, rare variants in DENND1A were associated with PCOS quantitative traits in 50% of affected families²⁴. Taken together with previous studies indicating that elevated testosterone levels were a consistent endophenotype in sisters of women with PCOS²⁹, these genetic analyses implicate DENND1A as a core gene³⁰ in PCOS pathogenesis. However, a mechanistic link between the noncoding genome, altered DENND1A expression, and testosterone production has yet to be demonstrated.

In this work, we evaluate the effects of non-coding genomic regions associated with PCOS risk on gene regulatory element activity. To measure regulatory element activity across PCOS-associated genomic loci, we use high-throughput reporter assays built from bacterial artificial chromosomes and fosmids spanning PCOS GWAS risk loci. High-throughput reporter assays such as STARR-seq can quantify the regulatory activity of millions of genomic fragments at once^{31,32,33,34,35}. This scale enables systematic studies of the effects of non-coding variants across megabases of the genome and in many different cell types^{31,36,37,38,39}. To prioritize variants, we test for genetic associations within the identified regulatory elements and adapt STARR-seq into a locus-specific, allele-specific reporter assay. Together, these data highlight the role of DENND1A in PCOS risk. As proof of concept that regulatory elements in PCOS risk loci affect gene expression, we perturb PCOS-associated regulatory elements near DENND1A using CRISPR-based epigenome editing^{40,41,42}. We find that epigenetic activation of those regulatory elements in an androgen-producing adrenocortical cell model increases both DENND1A expression and testosterone production. Together, these findings suggest an endogenous gene-regulatory mechanism contributing to PCOS and demonstrate an approach for identifying additional molecular mechanisms of PCOS.

Results

Measuring the regulatory activity of PCOS-associated regulatory elements

To identify gene regulatory elements in which genetic variation can contribute to PCOS risk, we analyzed 14 genetic associations that had been identified in cohorts of European and Han Chinese ancestry at the time of this study^{9,10,11,13,14} (Table 1). These 14 associations included several genes involved in hormone synthesis and signaling along the hypothalamic-pituitary-ovarian axis, including FSHR, FSHB, LHCGR and DENND1A. We focused on two human cell models: H295R, a testosterone-producing adrenal cell line; and COV434, an estradiol-producing ovarian cell line^{26,43,44,45,46,47}.

Table 1 List of PCOS GWAS loci selected for STARR-seq experiments

To measure regulatory activity in these two cell lines, we used a high-throughput reporter assay known as STARR-seq^{33,34} (Fig. 1a), which can assay millions of DNA fragments for regulatory activity. STARR-seq relies on two key libraries, termed in this study the 'assay library' (the input) and the 'reporter library' (the readout of regulatory effects). Briefly, the assay library consists of plasmid reporters containing diverse DNA fragments of interest. When transfected into cells, each DNA fragment regulates its own transcription into mRNA. Thus, by sequencing the resulting mRNA (the reporter library) and comparing the abundance of each fragment with its abundance in the assay library, one can estimate the regulatory activity of each DNA fragment.
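
The RNA-versus-DNA comparison described above can be sketched in a few lines. This is a minimal illustration, not the statistical model used in this study: it assumes per-fragment read counts are already tabulated as dictionaries, scales each library to counts per million, and reports log2(RNA/DNA) with a pseudocount. The function name `starr_activity` and the pseudocount value are illustrative choices.

```python
import math

def starr_activity(dna_counts, rna_counts, pseudocount=1.0):
    """Naive per-fragment activity: log2(reporter CPM / assay CPM).

    dna_counts: fragment ID -> read count in the assay (plasmid DNA) library.
    rna_counts: fragment ID -> read count in the reporter (mRNA) library.
    """
    dna_total = sum(dna_counts.values())
    rna_total = sum(rna_counts.values())
    activity = {}
    for frag, dna in dna_counts.items():
        rna = rna_counts.get(frag, 0)
        # Scale to counts per million so sequencing depth differences cancel.
        dna_cpm = dna / dna_total * 1e6
        rna_cpm = rna / rna_total * 1e6
        # The pseudocount avoids log(0) for fragments with no reporter reads.
        activity[frag] = math.log2((rna_cpm + pseudocount) / (dna_cpm + pseudocount))
    return activity

# Toy counts: fragment A is over-represented in RNA (enhancer-like),
# fragment B is under-represented (repressor-like).
print(starr_activity({"A": 100, "B": 100}, {"A": 150, "B": 50}))
```

Positive values indicate enhancer-like activity and negative values repressor-like activity; dedicated tools additionally model count noise and assay-specific biases, which this sketch ignores.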

**Fig. 1: Measuring the regulatory activity in PCOS GWAS loci.**

We constructed a STARR-seq assay library that spans 14 PCOS GWAS loci and encompasses 2.9 Mb of the human genome (Supplementary Data 1, 2). The assay library includes 179 open chromatin regions identified in H295R and COV434 cells (Supplementary Fig. 1). The median fragment length was 320 bp in the assay library and 260 bp in the reporter library (Supplementary Fig. 2). The assay library covered the target region at a median of >300x (Supplementary Fig. 3), and replicates were highly correlated (Pearson's r > 0.95, Supplementary Fig. 4).

We called 956 regulatory elements in the 14 PCOS GWAS loci across the two cell models (Supplementary Data 3, 4) at a false discovery rate (FDR) ≤ 0.5%⁴⁸. Between replicates in the same cell model, estimated regulatory element activity was highly correlated (0.84 ≤ r ≤ 0.90, Supplementary Fig. 5). Much of the observed variation in effect sizes could be attributed to differences between assay and reporter libraries and to differences between cell lines (Supplementary Fig. 6). The strong correlation suggested that the targeted STARR-seq approach robustly estimated regulatory activity within the PCOS GWAS loci in both cell types.
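
Calling elements at a fixed FDR typically involves a multiple-testing step such as the Benjamini-Hochberg procedure. This section does not describe the calling procedure beyond reference 48, so the sketch below is a generic, hypothetical illustration of Benjamini-Hochberg step-up control at FDR = 0.5%, assuming one p-value per candidate element.

```python
def benjamini_hochberg(pvalues, fdr=0.005):
    """Indices of hypotheses rejected by the Benjamini-Hochberg step-up rule."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Find the largest rank k such that p_(k) <= (k / m) * fdr.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank / m * fdr:
            cutoff_rank = rank
    # Every hypothesis ranked at or below that cutoff is rejected.
    return sorted(order[:cutoff_rank])

# Toy p-values for four candidate elements.
print(benjamini_hochberg([1e-6, 0.002, 0.01, 0.5]))  # → [0, 1]
```

Because the rule is step-up, an element whose own p-value misses its rank threshold can still be called if a higher-ranked element passes.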

We identified 464 and 585 regulatory elements in COV434 and H295R cells, respectively (Supplementary Fig. 7). In both cell models, about half of the identified regulatory elements had enhancer activity, and half had repressor activity⁴⁹ (Fig. 1b, c). There were 93 regulatory elements identified in both cell lines. The regulatory activity of those commonly identified elements was highly concordant. The effect sizes in shared regulatory elements were substantially correlated (Pearson’s r = 0.81, p < 2 × 10⁻¹⁶), and the direction of effects was the same for 85% of shared elements (Fig. 1d). The concordance in the direction of effect increased to 93% when we required the regulatory element calls to overlap in the genome by at least 50% (Supplementary Fig. 8, Pearson’s r = 0.85, p < 2 × 10⁻¹⁶). To our knowledge, this data set is the largest reporter-assay screen for enhancers in adrenal and ovarian cell models.
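
The shared-element comparison reduces to two summary statistics: the fraction of pairs whose effects have the same sign, and Pearson's r between effect sizes. The sketch below computes both, optionally keeping only pairs whose calls overlap by a minimum fraction. How the 50% overlap was defined is not stated here, so measuring overlap relative to the shorter interval is an assumption, as are the toy coordinates and effect sizes.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def overlap_fraction(a, b):
    """Overlap of half-open intervals (start, end), relative to the shorter one."""
    inter = max(0, min(a[1], b[1]) - max(a[0], b[0]))
    return inter / min(a[1] - a[0], b[1] - b[0])

def concordance(pairs, min_overlap=0.0):
    """pairs: list of ((interval_1, effect_1), (interval_2, effect_2)).

    Returns (fraction of kept pairs with the same effect sign, Pearson r).
    """
    kept = [(e1, e2) for (i1, e1), (i2, e2) in pairs
            if overlap_fraction(i1, i2) >= min_overlap]
    same_sign = sum(1 for e1, e2 in kept if (e1 > 0) == (e2 > 0))
    x, y = zip(*kept)
    return same_sign / len(kept), pearson_r(x, y)

# Toy calls for the same element in two cell lines (interval, log2 effect).
pairs = [
    (((0, 100), 1.0), ((50, 150), 0.8)),
    (((200, 300), -0.5), ((210, 310), -0.7)),
    (((400, 500), 0.3), ((490, 600), -0.2)),
]
print(concordance(pairs))                   # all shared calls
print(concordance(pairs, min_overlap=0.5))  # calls overlapping by >= 50%
```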

Regulatory element activity in PCOS GWAS regions corresponds to regions of chromatin accessibility

Enrichment of genetic associations in tissue-specific sites of increased chromatin accessibility can predict causal tissues of disease^{50,51}. Of the PCOS-associated SNPs in the GWAS catalog across the 14 loci we tested, seven overlapped DNase I hypersensitive sites (DHS) from the ENCODE consortium: rs2479106, rs4385527, rs705702, rs2272046, rs6022786, rs11225154 and rs13164856⁵². To increase confidence that the regulatory elements identified by STARR-seq are active in H295R and COV434 cells, we evaluated whether STARR-seq regulatory elements corresponded to chromatin accessibility in the same cell lines. Using ATAC-seq and MACS2 peak calling (FDR < 0.1), we identified ~73,000 and ~66,000 open chromatin sites in H295R and COV434 cells, respectively. Between 40% and 50% of the open chromatin sites identified in each cell line overlapped sites in the other cell line (Supplementary Figs. 9, 10, 11), revealing that a substantial number of chromatin-accessible sites were shared between cell lines.

We identified 116 chromatin-accessible sites in COV434 and H295R cells within the 16 PCOS risk genomic regions we assayed with STARR-seq. Of these, 39 (34%) and 37 (32%) had regulatory activity in H295R and COV434 cells, respectively, according to STARR-seq (Supplementary Data 5, 6). For H295R cells, the overlap between chromatin accessibility and STARR-seq activity was ~4-fold greater than expected if STARR-seq sites were randomly distributed across the genomic regions; for COV434, it was ~6-fold greater than expected (Fisher's exact test, p < 2 × 10⁻⁴ for each cell line). Regulatory activity was also significantly greater in regions of open chromatin in the same cell type or tissue than in regions with less chromatin accessibility (Fig. 2a; Mann-Whitney U, p < 10⁻¹⁰ for H295R, p < 0.01 for COV434). Conversely, chromatin accessibility was greater in regions where we identified regulatory element activity (Fig. 2b).
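
Enrichment of this kind is commonly tested with a one-sided Fisher's exact test on a 2 × 2 table of accessible/inaccessible versus active/inactive units. The sketch below computes the hypergeometric upper-tail p-value and a simple fold enrichment using only the standard library. The table layout, the unit of analysis (for example, sites versus base pairs), and the fold-enrichment definition are assumptions rather than the study's exact procedure, and the counts are toy values.

```python
from math import comb

def fisher_greater(a, b, c, d):
    """One-sided Fisher's exact test for enrichment (odds ratio > 1).

    2 x 2 table:           active   inactive
        open chromatin        a         b
        closed chromatin      c         d
    Returns P(X >= a) under the hypergeometric null with fixed margins.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, col1)
    # math.comb returns 0 when k > n, so impossible tables contribute nothing.
    return sum(comb(row1, x) * comb(n - row1, col1 - x)
               for x in range(a, min(row1, col1) + 1)) / denom

def fold_enrichment(a, b, c, d):
    """Active fraction among open units divided by the overall active fraction."""
    return (a / (a + b)) / ((a + c) / (a + b + c + d))

# Hypothetical counts, not the study's data.
print(fisher_greater(30, 70, 40, 860), fold_enrichment(30, 70, 40, 860))
```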

**Fig. 2: Characterizing candidate regulatory elements.**

We also investigated similarities and differences in regulatory activity between H295R and COV434 cells. We first identified regions with both STARR-seq activity and increased chromatin accessibility in H295R and COV434. There were 69 genomic regions with significant regulatory activity and significant chromatin accessibility in either cell model; of these, seven had regulatory activity in both. We attribute this small overlap to differences in statistical power rather than to cell-type-specific activity: quantitative regulatory activity was similar across both cell types (ρ = 0.65, Supplementary Fig. 12), and there was no strong evidence of elements with opposing regulatory activity between cell types. Taken together, the concordance of regulatory effect sizes across STARR-seq in H295R and COV434 suggested that regulatory activity was largely similar between the two steroidogenic cell lines across the PCOS GWAS loci tested.

To relate cell line observations to the corresponding primary tissues, we evaluated whether STARR-seq regulatory activity was enriched in chromatin-accessible sites in adrenal and ovarian tissues^{52,53}. Approximately 18% of the H295R regulatory elements overlapped open chromatin from primary adrenal tissue, and 24% of the COV434 regulatory elements overlapped open chromatin from primary ovarian tissue. These overlaps represented 2.8- and 3.1-fold enrichments in H295R and COV434, respectively, over what would be expected if regulatory elements were randomly distributed across the assayed regions (Fisher's exact test, p < 10⁻⁷ for each cell line). As in H295R and COV434 cells, regulatory activity was greater in regions of accessible chromatin in primary tissue than in regions without accessible chromatin (Fig. 2c; Mann-Whitney U, p < 10⁻⁹ for each cell line). This result indicated that regulatory activity measurements in H295R and COV434 cells corresponded to activity in primary adrenal and ovarian cells, respectively.
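
The Mann-Whitney U comparisons above ask whether activity values in accessible regions tend to exceed those in inaccessible regions, without assuming normality. A minimal standard-library sketch follows. It assigns average ranks to ties and uses the normal approximation for the two-sided p-value without a tie correction to the variance, so it is an illustration rather than a reproduction of the reported p-values; the input values are toy data.

```python
import math

def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test (normal approximation, no tie correction).

    Returns (U statistic for sample x, approximate two-sided p-value).
    """
    values = sorted(list(x) + list(y))
    # Average rank for each distinct value; ranks start at 1.
    rank_of = {}
    i = 0
    while i < len(values):
        j = i
        while j < len(values) and values[j] == values[i]:
            j += 1
        rank_of[values[i]] = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        i = j
    n1, n2 = len(x), len(y)
    u1 = sum(rank_of[v] for v in x) - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mu) / sigma
    return u1, math.erfc(abs(z) / math.sqrt(2))

# Toy activity values in open versus closed chromatin.
print(mann_whitney_u([1.2, 0.9, 1.5, 2.0], [0.1, -0.3, 0.4, 0.2]))
```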

Regulatory activity in H295R and COV434 cells also corresponded to chromatin accessibility in other tissues. About 50% of the regulatory elements we identified via STARR-seq (n = 296 for H295R, n = 304 for COV434) overlapped chromatin-accessible sites identified in diverse tissues by the ENCODE project^{52,53}. This overlap was 1.7- and 2.7-fold enriched in H295R and COV434, respectively, over what would be expected if regulatory activity were randomly distributed across the assayed regions (Supplementary Fig. 13, Mann-Whitney U, corrected p-value < 10⁻⁴). STARR-seq regulatory activity was also greater in ENCODE DNase hypersensitive sites (Fisher's exact test p < 10⁻¹², Fig. 2d, Supplementary Fig. 14). We observed similar results when focusing on enhancer-like regions defined across diverse cells and tissues by the ENCODE project⁵⁴. Specifically, ~30% of the regulatory elements we identified overlapped proximal or distal enhancers defined by ENCODE (n = 158 for H295R; n = 207 for COV434), and quantitative estimates of regulatory activity were greater in regions identified as enhancer-like sequences (Fig. 2e).

PCOS-associated genetic variants fine-mapped to within regulatory elements

To discover genetic variants that may alter regulatory activity and gene expression, we performed genetic association analyses focused on the regulatory elements we identified (Fig. 3a). Specifically, we tested for association between PCOS and single nucleotide polymorphisms (SNPs) with minor allele frequency (MAF) > 1% located within those regulatory elements. Across a cohort of 983 PCOS cases and 2951 controls⁹, we tested 759 SNPs in H295R regulatory elements (433 in elements with increased activity and 326 in elements with decreased activity) and 486 SNPs in COV434 regulatory elements (290 in elements with increased activity and 196 in elements with decreased activity; Supplementary Data 7). Overall, we identified 19 variants significantly associated with PCOS. In H295R cells, 7 variants in enhancers (Bonferroni threshold P < 1.15 × 10⁻⁴) and 11 variants in repressive regulatory elements (Bonferroni threshold P < 1.53 × 10⁻⁴) were significantly associated with PCOS. In COV434 cells, 1 variant in an enhancer (Bonferroni threshold P < 1.72 × 10⁻⁴), and none in repressive regulatory elements, was significantly associated with PCOS. Of the 19 associated variants, four were in the follicle stimulating hormone subunit beta (FSHB) locus, six were in the neighboring ARL14EP-DT locus, and two were in the GATA4/NEIL2 locus (Fig. 3b, c, Table 2, Supplementary workbook). Four previously identified PCOS-associated risk variants fell within the regulatory elements we assayed: rs6022786, an intergenic variant near SUMO1P1; rs2268361, a variant in an intron of FSHR; rs11225154, a variant in an intron of YAP1; and rs10835638, a variant in an intron of ARL14EP-DT^{8,11,13}. Of these, only rs6022786 was tested in this analysis, and it was not significantly associated with PCOS in our cohort.
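
The per-class significance thresholds above are consistent with a Bonferroni correction applied separately to each cell line and element class. The family-wise α = 0.05 is our inference (it reproduces the reported thresholds) rather than a value stated in the text. The sketch below recomputes the thresholds from the SNP counts given above.

```python
def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test p-value cutoff that controls the family-wise error rate at alpha."""
    return alpha / n_tests

# SNPs tested per cell line and regulatory-element class (from the text).
tests = {
    ("H295R", "enhancer"): 433,
    ("H295R", "repressor"): 326,
    ("COV434", "enhancer"): 290,
    ("COV434", "repressor"): 196,
}
for (cell, kind), n in tests.items():
    print(f"{cell} {kind}: {n} SNPs, P < {bonferroni_threshold(n):.3g}")
```

The first three lines of output match the thresholds reported above (1.15, 1.53 and 1.72 × 10⁻⁴); the COV434 repressor class, for which no significant variant was found, would have a threshold of about 2.55 × 10⁻⁴ under the same assumption.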

**Fig. 3: Prioritizing PCOS-associated variants within functional regulatory elements.**

Table 2 Top variants associated with PCOS within STARR-seq regulatory elements

To relate PCOS-associated non-coding variants to gene expression, we tested for colocalization⁵⁵ between PCOS-associated genetic variation in STARR-seq regulatory elements and expression quantitative trait loci (eQTLs) from GTEx⁵⁶. Using significant single-tissue eQTL associations, we identified seven variants across seven loci at which PCOS and gene expression associations colocalized (posterior probability > 0.6; Table 3, Supplementary Data 8). We also performed colocalization restricted to adrenal and ovarian tissues, given their relevance to our cell models; this analysis recovered four of the same colocalized variants. However, interpretation of these tissue-restricted results is limited by the small GTEx sample sizes for these tissues and by the substantial cell-type heterogeneity of donor tissues. We therefore prioritized the colocalization results from the full GTEx dataset for downstream analysis (Table 3, PP.AllTissue; Supplementary Fig. 16).
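
Colocalization asks whether the PCOS and eQTL signals in a region are better explained by one shared causal variant (H4) than by distinct variants (H3), by association with only one trait (H1, H2), or by no association (H0). The sketch below re-implements the approximate Bayes factor logic popularized by the coloc R package from per-SNP effect sizes and standard errors; it is not that package's code. The priors p1 = p2 = 10⁻⁴ and p12 = 10⁻⁵ follow the package's usual defaults, the prior effect standard deviations (0.2 and 0.15) are conventional choices applied here to raw effect sizes as a simplification, and the z-scores are toy values.

```python
import math

def log_sum(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def log_diff(a, b):
    """log(exp(a) - exp(b)); requires a > b."""
    return a + math.log1p(-math.exp(b - a))

def wakefield_labf(beta, se, prior_sd):
    """Log approximate Bayes factor for association at one SNP (Wakefield)."""
    v = se ** 2          # sampling variance of the effect estimate
    w = prior_sd ** 2    # prior variance of the true effect
    r = w / (w + v)
    z = beta / se
    return 0.5 * (math.log(1 - r) + r * z * z)

def coloc_posteriors(labf1, labf2, p1=1e-4, p2=1e-4, p12=1e-5):
    """Posterior probabilities of H0..H4 for two traits measured at the same SNPs.

    Assumes at most one causal variant per trait in the region and at least
    two SNPs (with a single SNP, H3 is undefined).
    """
    s1 = log_sum(labf1)
    s2 = log_sum(labf2)
    s12 = log_sum([a + b for a, b in zip(labf1, labf2)])
    log_h = [
        0.0,                                                   # H0: neither trait
        math.log(p1) + s1,                                     # H1: trait 1 only
        math.log(p2) + s2,                                     # H2: trait 2 only
        math.log(p1) + math.log(p2) + log_diff(s1 + s2, s12),  # H3: distinct variants
        math.log(p12) + s12,                                   # H4: one shared variant
    ]
    total = log_sum(log_h)
    return [math.exp(h - total) for h in log_h]

# Toy example: z-scores at five SNPs, both traits peaking at the third SNP.
SE = 0.1
pcos_z = [0.5, 1.0, 8.0, 1.0, 0.3]
eqtl_z = [0.2, 1.2, 7.0, 0.8, 0.1]
labf_pcos = [wakefield_labf(z * SE, SE, prior_sd=0.2) for z in pcos_z]
labf_eqtl = [wakefield_labf(z * SE, SE, prior_sd=0.15) for z in eqtl_z]
pp = coloc_posteriors(labf_pcos, labf_eqtl)
print("PP.H4 =", round(pp[4], 3))
```

Moving the eQTL peak to a different SNP shifts nearly all posterior mass from H4 to H3, which is the distinction a posterior probability threshold such as the 0.6 used above is meant to capture.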

Table 3 Colocalization of PCOS-associated variants with eQTL data from GTEx

The candidate genes identified have plausible biological roles in PCOS pathogenesis. GATA4 encodes a transcription factor essential for embryonic development and ovarian function; its deletion impairs granulosa and theca cell proliferation⁵⁷. FSHB encodes the beta subunit of follicle-stimulating hormone, a key regulator of follicular development. DENND1A, involved in clathrin-mediated endocytosis^58,59, has been implicated in androgen biosynthesis^22,23,24, including through studies in H295R cells²⁷. Together, these analyses fine-map regulatory variants that influence expression of genes with established or plausible roles in PCOS.

Active STARR-seq regions have increased conservation scores

Evolutionary conservation is another indicator of biological function that is complementary to chromatin accessibility and STARR-seq analyses. We anticipated that regulatory elements controlling genes that affect fertility would be under strong evolutionary constraint. In support of this notion, previous studies have reported that conservation of regulatory elements corresponds to a greater functional role in the organism⁶⁰. We therefore compared conservation scores across 20 vertebrate species⁶¹ for the regulatory elements identified by STARR-seq. STARR-seq regulatory elements with enhancer activity had higher conservation scores than GC- and length-matched regions on the same chromosome (Supplementary Fig. 14b; Mann-Whitney U, p < 0.001). Accessible chromatin regions identified by ATAC-seq in COV434 and H295R cells also had higher conservation scores than similarly matched genomic regions from the same chromosome (Supplementary Fig. 15). These results further corroborated the functional importance of the regulatory elements we identified.
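
The GC- and length-matched background can be sketched as rejection sampling: draw windows of the same length from the same chromosome and keep those whose GC fraction lies within a tolerance of the element's. The GC tolerance, the number of draws, and skipping windows that contain N are assumptions; the sketch also does not exclude windows overlapping the element itself, which a real analysis would likely do.

```python
import random

def gc_fraction(seq):
    """Fraction of G and C bases in a DNA sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)

def matched_background(chrom_seq, element_seq, n=10, gc_tol=0.02,
                       max_tries=10000, seed=0):
    """Sample length- and GC-matched windows from the same chromosome.

    Returns up to n (start, end) pairs, 0-based and half-open.
    """
    rng = random.Random(seed)
    length = len(element_seq)
    target_gc = gc_fraction(element_seq)
    picks = []
    for _ in range(max_tries):
        if len(picks) == n:
            break
        start = rng.randrange(0, len(chrom_seq) - length + 1)
        window = chrom_seq[start:start + length]
        if "N" in window.upper():
            continue  # skip assembly gaps
        if abs(gc_fraction(window) - target_gc) <= gc_tol:
            picks.append((start, start + length))
    return picks

# Toy chromosome and element; real use would read sequences from a FASTA file.
print(matched_background("ACGT" * 500, "GCAT" * 10, n=3))
```

Conservation scores for the element and its matched windows can then be compared with a rank-based test such as Mann-Whitney U.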

Allele-specific regulatory variants identified in DENND1A locus

As proof of concept that the regulatory elements we identified are relevant to PCOS pathogenesis, we investigated mechanisms that alter gene expression by identifying allele-specific regulatory variants and by experimental perturbation. For these focused experiments, we chose the DENND1A locus. The top three genes nominated by the STARR-seq, colocalization, and genetic association analyses were GATA4, FSHB, and DENND1A (Supplementary Fig. 16, Supplementary Data 9). We focused on DENND1A for mechanistic studies because it is known to contribute to testosterone production, which is straightforward to measure, and because human steroidogenic adrenal cells that express DENND1A are readily available. In contrast, FSHB expression is limited to pituitary cells, which remain challenging to culture⁶²; and GATA4 is a developmental transcription factor that controls the expression of many genes, so further studies are needed to establish which of those genes and associated phenotypes are most relevant to PCOS^{63,64}. DENND1A is a guanine nucleotide exchange factor involved in clathrin-mediated endocytosis^{58,59}, and its expression has been implicated in androgen biosynthesis^{22,23,24} in ovarian theca cells and in the H295R cell model²⁷. We therefore focused on understanding the role of regulatory elements and genetic variants in controlling DENND1A expression and activity in H295R cells.

The DENND1A locus has been reproducibly associated with PCOS in Han Chinese and European cohorts^{9,10,24,65,66,67}. However, the functional variants within DENND1A have not been identified. We mapped 38 candidate regulatory elements between the second and sixth introns of DENND1A, spanning ~180 kb of the genome. Several of these regulatory elements overlapped candidate cis-regulatory elements (cCREs) defined by ENCODE, or were in regions of increased chromatin accessibility in H295R and COV434 cells (Fig. 4a). The lead GWAS risk variants did not overlap the regulatory elements we identified in this study. However, most of these candidate regulatory elements contained common variants in linkage disequilibrium with the lead GWAS SNPs (Supplementary Fig. 17). Taken together, these results suggested that regulatory variants within candidate regulatory elements could contribute to PCOS pathogenesis by affecting expression of the target genes of those elements.

**Fig. 4: Fine-mapping variants identified four regulatory variants that are also eQTLs for DENND1A.**

To measure the effects of genetic variation across DENND1A on regulatory activity, we captured genomic DNA spanning the entire DENND1A gene region from three individuals of European ancestry and two of Han Chinese ancestry. We then measured allele-specific regulatory activity in H295R cells using STARR-seq³⁷ (Fig. 4b, Supplementary Data 10). In total, we assayed ~700,000 unique DNA fragments of ~160 bp (Supplementary Figs. 18, 19). The assay library covered the DENND1A gene locus at a median coverage of 140x. The assay libraries were highly concordant among replicates, whereas the log fold change between the reporter and assay libraries was moderately concordant (Pearson's r > 0.72, Supplementary Fig. 20).

To estimate allele-specific regulatory effects, we used BIRD, a Bayesian approach that identifies differences in the relative abundance of alleles between the assay library and the expressed reporter library⁶⁸. Of the 623 variants we assayed in the targeted locus, 62 had allele-specific regulatory activity with a posterior probability Preg > 0.90 (Supplementary Data 11). On average, the identified variants altered regulatory activity by 40% (Supplementary Fig. 21), and the minor alleles more often had lower regulatory activity (chi-squared = 6.9, p = 0.009). We observed a modest negative correlation between the absolute effect size of the selected variants and their minor allele frequency in the 1000 Genomes Project⁶⁹ (ρ = −0.36, p = 0.005, Fig. 4c).
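
BIRD is a Bayesian model and is not reproduced here. As a much simpler frequentist illustration, the sketch below computes an allelic effect as the alt/ref ratio in the reporter library divided by the alt/ref ratio in the assay library, and tests whether minor alleles decrease activity more often than expected under a 50:50 split. The per-variant counts are hypothetical, and treating the reported statistic as a one-degree-of-freedom goodness-of-fit test is our assumption; under it, χ² = 6.9 corresponds to p ≈ 0.009, matching the value above.

```python
import math

def allelic_log2_effect(dna_ref, dna_alt, rna_ref, rna_alt, pseudocount=0.5):
    """log2 of (alt/ref ratio in reporter RNA) / (alt/ref ratio in assay DNA)."""
    rna_ratio = (rna_alt + pseudocount) / (rna_ref + pseudocount)
    dna_ratio = (dna_alt + pseudocount) / (dna_ref + pseudocount)
    return math.log2(rna_ratio / dna_ratio)

def chisq_1df_pvalue(chisq):
    """Upper-tail p-value of a chi-squared statistic with one degree of freedom."""
    # P(chi2_1 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chisq / 2))

def direction_test(n_minor_lower, n_minor_higher):
    """Goodness-of-fit test of a 50:50 split in the direction of minor-allele effects."""
    expected = (n_minor_lower + n_minor_higher) / 2
    chisq = ((n_minor_lower - expected) ** 2
             + (n_minor_higher - expected) ** 2) / expected
    return chisq, chisq_1df_pvalue(chisq)

# Hypothetical read counts for one heterozygous variant.
print(allelic_log2_effect(dna_ref=120, dna_alt=110, rna_ref=300, rna_alt=150))
print(round(chisq_1df_pvalue(6.9), 4))
```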

Of the 62 regulatory variants we identified, 24 were eQTLs⁵⁶ for DENND1A (n = 11) or for the flanking genes CRB2, RABGAP1 or STRBP (n = 14). Of those variants, 12 also overlapped open chromatin sites or candidate cis-regulatory elements identified by ENCODE (Fig. 4d, Supplementary Fig. 22, Table 4, Supplementary Data 12). Furthermore, the lead variant from the colocalization analyses, rs10117940 (Table 3), was also identified in the allele-specific analysis, with an effect size of 1.299 (p = 0.731). rs10117940 was in LD with two STARR-seq regulatory variants (rs28441318 and rs73665345) and with a PCOS-associated rare variant²⁴ (rs78012023) (0.32 < r² < 0.65; 0.5 D' > 0.9). These findings suggested that several loci within the DENND1A gene contribute to PCOS phenotypes by altering DENND1A gene expression.
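
The LD measures quoted above follow standard definitions. For two biallelic SNPs, with haplotype frequency p₁₁ and allele frequencies p_A and p_B, D = p₁₁ − p_A·p_B, r² = D² / (p_A(1 − p_A)·p_B(1 − p_B)), and D′ scales D by its maximum attainable magnitude given the allele frequencies. The sketch below computes both from haplotype frequencies, which in practice would be estimated from phased genotypes such as a 1000 Genomes reference panel; the example frequencies are invented.

```python
def ld_stats(p11, p12, p21, p22):
    """r^2 and |D'| between two biallelic SNPs from haplotype frequencies.

    pXY is the frequency of the haplotype carrying allele X at SNP 1 and
    allele Y at SNP 2 (X, Y in {1, 2}); the four values must sum to 1.
    """
    pa = p11 + p12  # frequency of allele 1 at SNP 1
    pb = p11 + p21  # frequency of allele 1 at SNP 2
    d = p11 - pa * pb
    r2 = d * d / (pa * (1 - pa) * pb * (1 - pb))
    # The bound on D depends on its sign.
    if d >= 0:
        d_max = min(pa * (1 - pb), (1 - pa) * pb)
    else:
        d_max = min(pa * pb, (1 - pa) * (1 - pb))
    return r2, abs(d) / d_max

# A rarer allele can be in complete LD (D' = 1) yet have modest r^2.
print(ld_stats(0.1, 0.0, 0.3, 0.6))  # r^2 ≈ 0.167, D' ≈ 1.0
```

This asymmetry is why a rare variant such as rs78012023 can show high D′ with a common variant while r² remains moderate.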

Table 4 Candidate allele-specific regulatory variants identified

Endogenous perturbation of PCOS-associated regulatory elements impacted DENND1A expression

Estimating the effect of regulatory elements on gene expression can provide insight into the underlying mechanisms that contribute to the development of PCOS. While reporter assays like STARR-seq can functionally test for allele-specific regulatory activity, they do not identify the target genes of those regulatory elements because the plasmids are not integrated into the genome. One approach to identifying the target genes of candidate regulatory elements is epigenomic perturbation of those elements. Specifically, a fusion of catalytically inactive Cas9 (dCas9) and the histone acetyltransferase domain of P300 is targeted to candidate regulatory elements, and the effects on the expression of nearby genes are measured³⁹. Several studies have demonstrated that dCas9-P300 can act over tens of kilobases, allowing the identification of distal gene regulatory elements^{41,70,71}. In addition to epigenetic activation using P300, we used dCas9 fused to the Krüppel-associated box (KRAB) domain. The dCas9-KRAB fusion can act as a repressor to silence target gene expression when targeted to gene promoters or gene regulatory elements^{72,73,74}.

To identify target genes of PCOS-associated gene regulatory regions, we created both dCas9-P300-expressing and dCas9-KRAB-expressing H295R cells. We targeted the dCas9 fusion proteins to four candidate regulatory elements within the introns of the DENND1A gene, labeled "element 1–4", and to the DENND1A promoter (Fig. 5a, Supplementary Fig. 24). To prioritize regions for perturbation studies, we first selected candidate regulatory elements with positive effect sizes in our CRADLE-DESeq2 analysis (n = 12). From those 12 regions, we selected four for perturbation based on two factors: (i) overlap with a chromatin accessibility track and (ii) the ability to design > 5 targeting guides with GuideScan2, given the genomic sequence, PAM restrictions, and a specificity score > 0.2 (Fig. 5a).

**Fig. 5: Perturbation of regulatory elements in DENND1A impacts testosterone levels.**

We designed 5–7 guide RNAs (gRNAs) for each of the four regulatory elements and the promoter region (Supplementary Data 13, 14). As a negative control, we also designed five gRNAs that do not target any location in the human genome. We made lentiviral pools for each of the four targeted regions and for the negative controls. We then transduced each lentiviral pool into three cell lines: HEK293T cells modified to express dCas9-P300, and H295R cells expressing dCas9-P300 or dCas9-KRAB. DENND1A was already expressed in both cell types (average TPM: 20.4 for HEK293T and 15.4 for H295R)^{75,76}, indicating that the gene was not in heterochromatin and thus could be targeted effectively by dCas9-P300 or dCas9-KRAB. Finally, we measured DENND1A expression via qPCR and testosterone levels at two time points (Fig. 5b).

In the established HEK293T-dCas9-P300 cells⁴¹, targeting dCas9-P300 to the DENND1A promoter increased DENND1A expression by 3.2-fold. Targeting dCas9-P300 to the intronic regulatory elements increased DENND1A expression between 2.1-fold and 2.6-fold. The increase in DENND1A expression was statistically significant compared to the effect of the non-targeting gRNAs for the promoter and all four of the regulatory elements after Bonferroni correction for multiple hypothesis testing (Supplementary Fig. 24, α < 0.05, t-test).
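
The quantification model behind these qPCR fold changes is not described in this section. A common choice is the 2^−ΔΔCt method, which normalizes the target gene to a reference gene and then to the non-targeting gRNA control; the sketch below assumes that method, and the Ct values and reference gene are hypothetical. Under this model, a 3.2-fold increase such as the promoter effect reported above would correspond to ΔΔCt ≈ −1.68.

```python
from statistics import mean

def fold_change_ddct(target_ct, ref_ct, ctrl_target_ct, ctrl_ref_ct):
    """Relative expression by the 2^-ddCt method.

    Each argument is a list of Ct values across replicates:
    target/reference gene in the perturbed sample, then in the control.
    """
    dct_sample = mean(target_ct) - mean(ref_ct)            # normalize to reference gene
    dct_control = mean(ctrl_target_ct) - mean(ctrl_ref_ct)
    return 2 ** -(dct_sample - dct_control)                # normalize to control

# Hypothetical Ct values: DENND1A vs. a housekeeping gene, on-target vs. non-targeting gRNAs.
print(fold_change_ddct([24.0, 24.1], [18.0, 18.1], [25.7, 25.8], [18.0, 18.1]))
```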

To test the effect of DENND1A perturbation in H295R cells, we used both an epigenetic activator (P300) and an epigenetic repressor (KRAB). We measured changes in gene expression via RT-qPCR in cells treated with forskolin or with DMSO as the vehicle control. Forskolin is an adenylate cyclase activator that has been shown to affect testosterone production⁷⁷.

In the activation experiments using H295R-dCas9-P300 cells, we observed a trend of increased DENND1A expression with on-target CRISPR perturbation compared with the non-targeting gRNAs. Under DMSO treatment, targeting dCas9-P300 to the DENND1A promoter increased DENND1A expression 2.8-fold, and targeting it to the intronic regulatory elements increased DENND1A expression between 1.2-fold and 2-fold. In forskolin-treated cells, targeting dCas9-P300 to the DENND1A promoter increased DENND1A expression 4-fold, and targeting regulatory elements 2, 3 and 4 increased it between 1.8-fold and 2-fold. The increase in DENND1A expression was statistically significant compared with the non-targeting gRNAs under the respective drug treatment for the promoter of regulatory elements after Bonferroni correction (Fig. 5c, α < 0.05, t-test). However, forskolin treatment itself did not significantly change DENND1A expression.

In the repression experiments using H295R-dCas9-KRAB cells, we observed decreased DENND1A expression with on-target CRISPR perturbation compared with the non-targeting gRNAs. Under DMSO treatment, targeting dCas9-KRAB to the DENND1A promoter decreased DENND1A expression 1.4-fold, and targeting it to the intronic regulatory elements decreased DENND1A expression between 1.1-fold and 1.3-fold. In forskolin-treated cells, targeting dCas9-KRAB to the DENND1A promoter decreased DENND1A expression 1.5-fold, and targeting regulatory elements 2, 3 and 4 decreased it between 1.07-fold and 1.2-fold. The decrease in DENND1A expression was statistically significant compared with the non-targeting gRNAs for the promoter after Bonferroni correction (Fig. 5d, α < 0.05, t-test). As in the activation experiments, forskolin treatment itself did not significantly change DENND1A expression. In summary, we found several regulatory elements that affect DENND1A expression both with and without forskolin treatment.

We also tested the effects of the epigenetic perturbations of DENND1A on CYP17A1 expression via RT-qPCR. CYP17A1 acts in the steroid synthesis pathway upstream of the production of steroid hormones such as estrogen and testosterone. Across all conditions, forskolin treatment yielded a 4–4.5-fold increase in CYP17A1 expression compared with DMSO treatment (p = 1.6 × 10⁻⁷, α < 0.05, t-test). We observed a statistically significant increase in CYP17A1 expression following epigenetic activation of the DENND1A promoter under DMSO treatment (p = 0.03, t-test), but not when targeting the regulatory elements. Across all perturbations (activation and repression) under DMSO conditions, DENND1A and CYP17A1 expression levels were moderately correlated (Pearson's r = 0.462; p = 0.00095). No correlation was observed under forskolin treatment, consistent with the dominant transcriptional effects of cAMP pathway activation. Additionally, in experiments involving DENND1A activation using dCas9-P300, we observed a moderate positive correlation between DENND1A and CYP17A1 expression across both DMSO- and forskolin-treated conditions (Pearson's r = 0.462; p = 5.76 × 10⁻⁷). These findings support a context-dependent relationship in which increases in DENND1A expression are associated with increased CYP17A1 expression. The effect of forskolin on CYP17A1 expression is expected to be substantially greater than any effect of the moderate decrease in DENND1A expression induced by dCas9-KRAB.

As an alternative approach, we also used RNA-seq to assess changes in DENND1A and CYP17A1 expression after forskolin treatment, using DESeq2 to estimate fold changes of differentially expressed genes^{49,78}. CYP17A1 was one of the top 10 differentially expressed genes following forskolin treatment in H295R cells (Supplementary Data 15). These results closely mirrored our qPCR results: DENND1A expression did not change significantly (Supplementary Data 15, p.adj = 0.6, Benjamini-Hochberg correction, Supplementary Fig. 26c), whereas CYP17A1 expression increased significantly with forskolin treatment (Supplementary Data 15, p = 1.67 × 10⁻⁷, Benjamini-Hochberg correction, Supplementary Fig. 26d).
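DESeq2 reports Benjamini-Hochberg (BH) adjusted p-values in its `padj` column. The short Python function below sketches the BH step-up adjustment itself, for readers checking adjusted values such as those in Supplementary Data 15. It is a generic illustration rather than the DESeq2 code, which by default also applies independent filtering before adjustment; the example p-values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Step up from the largest p-value so adjusted values stay monotone and <= 1.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```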

To test for off-target effects on genes in the DENND1A locus, we also measured expression changes in the DENND1A flanking genes LHX2, CRB2 and STRBP. In all three cell lines, CRB2 was not expressed, and LHX2 and STRBP expression was not affected (Supplementary Figs. 27, 28). We therefore inferred that the effects we observed were specific to DENND1A, and that genetic variation in the region likely contributes to PCOS via effects on DENND1A expression.

Endogenous perturbation of PCOS-associated regulatory elements impacted hormone production in steroidogenic adrenal cells

Changes in gene expression levels might alter physiologically relevant phenotypes that could contribute to disease pathogenesis⁷⁹. Average circulating testosterone levels in PCOS patients are 2–3-fold higher than in controls when measured in the early follicular phase, and elevated testosterone may play a causal role in PCOS pathogenesis²⁹. Furthermore, androgen exposure in animals results in PCOS phenocopies^{80,81,82,83,84}. To test whether endogenous overexpression of DENND1A could alter testosterone production in H295R cells, we overexpressed DENND1A by targeting dCas9-p300 to the DENND1A promoter or to distal regulatory elements, and measured the testosterone concentration in the cell culture media four days later. Increasing DENND1A expression by activating the promoter caused a 3.2-fold increase in testosterone concentration, while individually activating three of the four distal regulatory elements increased testosterone concentration between 1.7-fold and 2.2-fold (Fig. 5e). These increases were statistically significant (α < 0.05, t-test) and are comparable to the clinically measured increase in testosterone levels among people with PCOS compared to controls²⁹. Complementing these overexpression experiments, we repressed DENND1A in H295R-dCas9-KRAB cells and measured the testosterone concentration in the cell culture media four days after transduction with the guide RNAs. Decreasing DENND1A expression by targeting the DENND1A promoter yielded a 2-fold decrease in testosterone production, while targeting the three regulatory elements decreased testosterone concentration between 1.5-fold and 1.8-fold (Fig. 5f, α < 0.05, t-test). In both the activation and repression experiments, the addition of forskolin significantly increased testosterone levels (Fig. 5e, f, p = 0.0006).

Overall, modulating endogenous DENND1A expression altered testosterone production by the cells, and DENND1A activation increased testosterone production independently of the forskolin-induced increase.

As a complementary analysis, we measured the rate of increase in testosterone concentration over the four days after transduction (Supplementary Fig. 29). Overall, the rate of change in testosterone concentration mirrored the levels measured at day four. Specifically, cells with increased DENND1A expression produced substantially more testosterone between 2 and 4 days post-transduction than control-treated samples (Supplementary Fig. 30). These results indicate that increased expression of endogenous DENND1A was sufficient to increase androgen biosynthesis in steroidogenic cells.

Some studies have indicated a role for altered estradiol levels in PCOS^{85,86}, and estradiol is produced from testosterone and other androgens. We therefore also measured changes in estradiol production. Under DMSO treatment, increasing DENND1A expression by targeting the DENND1A promoter in H295R-dCas9-p300 cells increased estradiol production 1.4-fold compared to the non-targeting control (p = 0.032, t-test; Supplementary Fig. 31). Across perturbation conditions, forskolin treatment also significantly increased estradiol production (1.5–2.3-fold, p < 10⁻¹², t-test), and DENND1A activation further increased estradiol production in H295R-dCas9-p300 cells. We did not observe significant changes in estradiol levels when repressing DENND1A in H295R-dCas9-KRAB cells (Supplementary Fig. 32). These results indicate that increased DENND1A expression leads to increased estradiol levels, likely downstream of increased testosterone production.

Discussion

One of the central challenges of complex trait genetics is identifying the causal variants within GWAS susceptibility loci and determining their functional consequences. Here, we fine-mapped PCOS genetic associations to specific gene regulatory elements using a combination of high-throughput reporter assays and genetic association analyses. Specifically, we mapped candidate regulatory elements by testing the regulatory activity of millions of DNA fragments across 14 PCOS GWAS loci comprising about 3 Mb of the human genome. We further demonstrated a scalable approach to fine-map genetic variants within candidate regulatory elements, identifying PCOS-associated genetic variants by performing genetic association tests across the genomic regions that we identified as candidate regulatory elements. Together, these steps constitute a generalizable strategy for fine-mapping genetic association loci for complex traits by focusing on genetic variants within experimentally identified functional regulatory elements. As a proof of concept, we focused on DENND1A, a PCOS GWAS candidate gene reported to regulate androgen biosynthesis²². We showed that manipulating the epigenome of DENND1A-proximal regulatory elements increased DENND1A expression and, subsequently, androgen production in human adrenal cells. These results extend previous studies identifying a role for DENND1A in testosterone production in theca cells, while also pinpointing specific gene regulatory elements where genetic variation can alter DENND1A expression. Our results demonstrate the advantage of combining high-throughput reporter assays, fine-mapping genetic analyses, and targeted epigenome editing to discover novel gene regulatory mechanisms contributing to common human diseases.

The experimental approaches we used have several advantages and limitations. Because the targeted STARR-seq approach assayed fewer fragments, it was better suited to cell models that cannot be grown at large scale. The targeted approach also allowed us to test for regulatory activity outside the context of genetic linkage³⁷. Furthermore, capturing the natural genetic variation present in a pool of genomes allowed us to test for allele-specific regulatory activity across one locus in depth. Non-coding variants with individually weak effects are thought to contribute to a phenotype through coordinated regulation across several regulatory elements⁸⁷. Thus, this approach allowed us to identify regulatory elements that may contribute to an organismal phenotype through their effects on gene expression.

A limitation of the STARR-seq assay is that DNA fragments are tested outside their native chromatin environment. Although the two cell lines used in this study differ in their steroidogenic capacity, we observed broadly similar regulatory activity across fragments. This is likely because STARR-seq measures the intrinsic regulatory potential of DNA sequences, independent of chromatin structure or epigenetic state. Additionally, reporter assays do not provide information about the endogenous target genes regulated by each element. To address these limitations, we incorporated complementary CRISPR-based epigenetic perturbation approaches to assess the functional consequences of activating or repressing candidate regulatory elements in their native genomic context.

However, these perturbation studies presented their own technical challenges. Specifically, we were unable to obtain homogeneous populations of modified H295R cells expressing dCas9-KRAB or dCas9-p300 and the gRNAs, because the cells were highly sensitive to both antibiotic selection and flow cytometric sorting, which led to extensive cell death. As a result, we relied on targeted qPCR to measure gene expression changes, which allowed us to evaluate regulatory effects in viable cultures while avoiding confounding stress responses. We acknowledge the limitation of not performing transcriptome-wide analysis under these conditions. Future studies that leverage improved cell engineering techniques or single-cell approaches will be essential for enabling broader transcriptional profiling following regulatory element perturbation, and will build upon the findings reported here to advance our understanding of gene networks contributing to PCOS pathogenesis.

Identifying the underlying mechanisms by which GWAS loci contribute to disease pathogenesis will be essential for translating these findings to benefit human health. The effect of regulatory elements and non-coding variants has been elucidated for several disease phenotypes. For example, one study identified a SNP that regulates SORT1 in a liver-specific manner within a GWAS risk locus for low-density lipoprotein cholesterol and myocardial infarction (MI)⁷⁹. Another study focused on maternal hyperglycemia identified variants spanning multiple enhancers that have a coordinated effect on HKDC1 expression⁸⁸. Other studies focused on post-GWAS functional analyses have used different methods, including statistical^{55,89,90,91,92,93,94,95} and experimental^{96,97,98,99,100,101} approaches, to fine-map GWAS signals and identify functional variants. Nevertheless, detailed cellular or molecular studies are often needed to connect the identified gene regulatory effects to a disease-relevant phenotype^{79,99}.

Another challenge is molecular follow-up of putative causal genes, which depends on the relevant cell type, the function of each gene, and the availability of assays that relate gene function to the disease phenotype. PCOS, however, is particularly amenable to experimental perturbation, because hormone responses are readily modeled in cell systems and allow direct testing of one of the main clinical phenotypes of PCOS. Our results extend the knowledge of non-coding genetic mechanisms of PCOS pathogenesis. Previous experimental studies characterized a highly conserved enhancer regulating FSHB expression in mouse pituitary cells^{62,102}, as well as non-coding variants intronic to AMHR2, a receptor for anti-Müllerian hormone¹⁰³. Previous statistical approaches have also nominated common and rare genetic variants altering the expression of DENND1A²⁴, FSHB, ZFP36L2, ERBB3, RPS26 and RAD50¹⁰⁴ as potentially contributing to PCOS. Here, we add a specific gene regulatory mechanism controlling DENND1A expression to that body of knowledge, while also demonstrating a general strategy for identifying analogous mechanisms for other PCOS genes.

The candidate regulatory elements that we identified can serve as a framework to identify functional non-coding regions that might contribute to PCOS risk by harboring causal variants. Our findings add to growing empirical evidence of regulatory regions contributing to complex traits^{101,105,106,107}. We expect that future evaluation of the regulatory elements from this study will provide new insights into the mechanisms leading to PCOS phenotypes. Broadly, our results demonstrate a scalable approach to study disease-associated regulatory regions implicated not only in PCOS, but also in the pathogenesis of common, complex disorders in general.

Methods

STARR-seq assay library construction

Selection of GWAS regions for targeted STARR-seq assays

We compiled a list of the PCOS genome-wide association studies published when we designed the STARR-seq experiments in 2019^{9,10,11,13,14}. At that time there were 27 genetic risk loci. We further prioritized risk loci that reached genome-wide significance (p < 5 × 10⁻⁸), which yielded 19 risk loci (Supplementary Data 1). To then select PCOS-associated genomic regions for STARR-seq assays, we identified all genetic variants in linkage disequilibrium (LD, r² > 0.8) with the 19 genetic variants most strongly associated with PCOS or its clinical phenotypes^{9,11,13,14}. We then selected bacterial artificial chromosomes (BACs) and fosmids that encompassed all the identified genetic variants. At this step, we removed the genomic regions at ZBTB16, MAPRE and ERBB3 because of BAC recombination or the unavailability of suitable BACs. We obtained 18 BACs and 2 fosmids spanning the 16 remaining genetic risk loci. All BAC and fosmid clones were sourced from BACPAC Genomics, Inc., and originated from the Children’s Hospital & Research Center at Oakland (CHRCO). The BACs and fosmids are detailed in Supplementary Data 2.

All BACs and fosmids were obtained as clones in E. coli. We propagated each bacterial clone under selective conditions. We isolated BAC DNA using NucleoBond Xtra BAC (Macherey-Nagel) and fosmid DNA using FosmidMAX (Lucigen), following the manufacturers’ protocols. To validate that the BACs and fosmids were intact and covered the target regions, we created Illumina high-throughput sequencing libraries from the isolated DNA using NEBNext Ultra II FS DNA Library Prep. We barcoded the sequencing library for each BAC or fosmid independently and pooled the resulting libraries for sequencing. We sequenced the pooled libraries on an Illumina MiSeq instrument and aligned the reads to the human genome. For two of the 16 target regions, the BACs had either recombined or yielded reads that aligned to a different genomic region, suggesting contamination with another BAC. We removed those two regions from subsequent analysis. The BACs and fosmids for the remaining 14 target regions span ~3 Mb of the human genome (Supplementary Data 2).

STARR-seq reporter plasmid construction

To create STARR-seq assay libraries from the BACs and fosmids, we cloned sheared DNA from each BAC and fosmid into the STARR-seq plasmid. We sheared each BAC or fosmid into ~400 bp DNA fragments using a Covaris S220 sonication instrument. We then ligated custom universal adapters to the resulting DNA fragments using the NEBNext DNA Library Prep protocol (#E6040L) (Supplementary Data 16, SS_Adaptor_1 and SS_Adaptor_2). Using PCR, we amplified the adapted DNA fragments and added sequences for Gibson assembly into the STARR-seq plasmid. For the PCR, we used the KAPA HiFi HotStart kit (Roche) and the primers TS2SS-F and TS2SS-R (Supplementary Data 16). The PCR cycling conditions were: 98 °C for 30 s, followed by 10 cycles of 98 °C for 15 s, 64 °C for 30 s, 72 °C for 30 s, with a final extension at 72 °C for 5 min.

We cloned the fragment libraries into the STARR-seq ORI vector (Addgene #99296). To do so, we first linearized the plasmid using AgeI and SalI (NEB R3552L and NEB R3138L). We analyzed the digested plasmid on a 1% agarose gel, confirmed that the linear plasmid was the expected ~3600 bp size, and isolated it using either the QIAquick Gel Extraction Kit (#28704) or the GeneJET Gel Extraction Kit (#K0691). We cloned the adapted and amplified DNA fragments from the BACs and fosmids into the linearized STARR-seq ORI vector using the NEBuilder HiFi DNA Assembly kit (#E2621). We then ethanol precipitated the products by adding 0.1X volume 3 M NaOAc and 2.5X volume cold 100% ethanol and storing the mixture at –20 °C overnight. We pelleted the DNA by centrifugation at 16,000 RCF for 30 min at 4 °C, washed the pellets with 5 ml cold 70% ethanol, and resuspended them in water. To amplify the resulting plasmid libraries, we electroporated them into E. cloni 10G SUPREME Electrocompetent Cells using the manufacturer’s recommended settings for a 1.0 mm cuvette (10 μF, 600 Ohms, 1800 Volts). We grew each library in a separate 1 L volume of LB with carbenicillin for antibiotic selection at 37 °C overnight. We isolated the resulting PCOS GWAS STARR-seq assay plasmids using NucleoBond PC 10000 EF (Macherey-Nagel).

To make the final PCOS GWAS STARR-seq assay library, we pooled the individual BAC and fosmid STARR-seq plasmids in equimolar concentration. We validated the size of the plasmid library using the Agilent TapeStation, and quantified the resulting pool using Qubit (Invitrogen).
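Pooling the plasmid libraries in equimolar amounts requires converting each mass concentration into a molar one. The sketch below assumes the common approximation of 650 g/mol per base pair of double-stranded DNA and uses hypothetical library names, concentrations and plasmid lengths (roughly a ~3.6 kb vector plus a ~400 bp insert). Because 1 nM equals 1 fmol/µL, the volume to add is the target amount in fmol divided by the concentration in nM.

```python
AVG_BP_MASS = 650.0  # g/mol per base pair of dsDNA (approximation)


def to_nanomolar(ng_per_ul, length_bp):
    """Convert a dsDNA concentration from ng/uL to nM."""
    return ng_per_ul * 1e6 / (length_bp * AVG_BP_MASS)


def equimolar_volumes(libraries, fmol_each):
    """Volume (uL) of each library that contributes fmol_each femtomoles.

    libraries maps a library name to (concentration in ng/uL, length in bp).
    """
    return {name: fmol_each / to_nanomolar(conc, length)   # 1 nM = 1 fmol/uL
            for name, (conc, length) in libraries.items()}


# Hypothetical library concentrations (ng/uL) and plasmid lengths (bp).
libraries = {"BAC_A": (120.0, 4000), "BAC_B": (85.0, 4000), "fosmid_C": (60.0, 4000)}
for name, vol in equimolar_volumes(libraries, fmol_each=2000).items():
    print(f"{name}: {vol:.1f} uL")
```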

PCOS GWAS STARR-seq assay library sequencing

To estimate the abundance of fragments from each selected region in the assay library, we sequenced it on an Illumina NextSeq 2000 using a 50 bp paired-end protocol. To prepare the sequencing libraries, we first amplified the STARR-seq inserts from 20 ng of the pooled plasmid library using the KAPA HiFi HotStart kit (Roche) and 208-F Index7 primers (Supplementary Data 16). The PCR cycling conditions were: 98 °C for 30 s, followed by 15 cycles of 98 °C for 15 s, 64 °C for 20 s, 72 °C for 30 s, with a final extension at 72 °C for 5 min. To isolate the final library, we used Axygen SPRI beads (AxyPrep™ Mag PCR Clean-Up Kit) at the bead ratio recommended in the manufacturer’s manual for a 400 bp insert size.

Cell culture protocol

We obtained NCI-H295R cells from ATCC. The cells were cultured in DMEM/F-12 medium (Gibco #21041025) supplemented with 2.5% Nu-Serum (Corning #355100) and 1% ITS+ Premix (Corning #354352) and grown as a monolayer at 37 °C, 5% CO₂. For drug treatments, forskolin (Stemcell Technologies, #72112) was dissolved in DMSO and added to the cell culture media at 10 µM; an equivalent volume of DMSO was used as the control. We validated testosterone production by cells stimulated with 10 µM forskolin using an ELISA (Cayman Chemicals #582701) following the manufacturer’s protocol.
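ELISA concentrations are read off a standard curve. Assuming a four-parameter logistic (4PL) fit, a frequent choice for competitive immunoassays although the section does not state the curve model, the sketch below shows the 4PL model and its inverse. The parameter values are hypothetical and would come from fitting the kit standards, for example with `scipy.optimize.curve_fit`.

```python
def four_pl(x, a, b, c, d):
    """4PL response: a at zero dose, d at infinite dose, c the midpoint, b the slope."""
    return d + (a - d) / (1 + (x / c) ** b)


def concentration_from_signal(y, a, b, c, d):
    """Invert the 4PL curve; y must lie strictly between a and d."""
    if not min(a, d) < y < max(a, d):
        raise ValueError("signal outside the quantifiable range of the curve")
    return c * ((a - d) / (y - d) - 1) ** (1 / b)


# Hypothetical fitted parameters for a competitive assay (signal falls with dose).
params = dict(a=1.8, b=1.1, c=250.0, d=0.05)   # c in pg/mL
signal = four_pl(400.0, **params)
print(round(concentration_from_signal(signal, **params), 3))
```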

We obtained COV434 cells from ECACC (Sigma–Aldrich #07071909). The cells were cultured in DMEM (Gibco #11965092) supplemented with 2 mM glutamine and 10% foetal bovine serum (FBS) and grown as a monolayer at 37 °C, 5% CO₂. We validated estradiol production by these cells after treatment with 100 ng/mL follicle-stimulating hormone (FSH) and 2.9 μg/mL androstenedione (A4) using an ELISA (Cayman Chemicals #501890).

All experiments were performed between passages 5 and 12.

Nucleofection optimization

To transiently introduce the PCOS GWAS STARR-seq assay library into the cell lines, we used electroporation via the Lonza 4D-Nucleofector System. To optimize the electroporation settings for H295R and COV434, we used the Cell Line Optimization 4D-Nucleofector™ X Kit (Lonza #V4XC-9064) following manufacturer’s protocol. Based on this optimization, we chose SF-CM-138 for COV434 cells and SF-DN-100 for the H295R cells, with 2 μg of plasmid to every 1 million cells transfected.

Transfection of cells

To test the regulatory potential of PCOS GWAS targeted regions, we first transfected the PCOS GWAS STARR-seq library into both H295R and COV434 cells, and isolated and sequenced the resulting RNA. We isolated the RNA from the cells 6 h post transfection. We transfected the PCOS GWAS STARR-seq plasmid library into H295R and COV434 based on the nucleofection optimization settings we described using SF Cell Line 4D-Nucleofector® LV Kit L (Lonza #V4LC-2002) following manufacturer’s protocol. All the experiments were performed in triplicate for each cell line. For each replicate for each of the cell lines, we used 50 million cells transfected with 100 μg of the PCOS GWAS STARR-seq plasmid library.

PCOS GWAS STARR-seq reporter library construction

To isolate the PCOS GWAS reporter RNA, we first isolated total RNA six hours post transfection followed by enriching for cDNA produced from the PCOS GWAS STARR-seq library plasmid pool¹⁰⁸.

Six hours after transfecting the cells with the STARR-seq library, we rinsed the cells with PBS and dissociated them using 0.25% Trypsin-EDTA (Life Technologies). We lysed the cell pellets in RLT buffer (Qiagen) with 2-mercaptoethanol (Sigma), passed the lysates through an 18-gauge needle ten times, and stored them at −80 °C until RNA extraction.

RNA extraction

We isolated total RNA using the Qiagen RNeasy Midi kit, including the on-column DNase I digestion step. We treated the isolated total RNA with 1 μL RNase Block (Agilent). We then isolated poly-A RNA using Dynabeads Oligo(dT)25 (Life Technologies) according to the manufacturer’s recommended protocol. We treated the poly-A RNA with TURBO DNase (Invitrogen) and 1 μL RNase Block at 37 °C for 30 min before halting the reaction with the DNase inactivation reagent. We then synthesized PCOS GWAS reporter cDNA by reverse transcription using Superscript III (800 U, Life Technologies), following the manufacturer’s protocol, with a STARR-seq-specific primer (SSRT-UMI, Supplementary Data 16).

PCOS GWAS STARR-seq reporter construction

Following synthesis, we treated the cDNA with RNase A (Sigma) at 37 °C for 1 h. We purified the PCOS GWAS reporter cDNA with SPRI beads (1.5X) and amplified it using the index-PCR primer and indexed PostSS-Index-5 primers (Supplementary Data 16), which barcode each sample for multiplexing, under the following conditions: 98 °C for 30 s, followed by 10–12 cycles of 98 °C for 10 s, 64 °C for 30 s, 72 °C for 30 s, with a final extension at 72 °C for 5 min. We split each sample into 7 individual PCR amplification reactions at this step. We determined the number of amplification cycles for each sample by running a small portion of it in a qPCR and choosing the cycle at which the signal reached one quarter of the maximum plateau. We cleaned the amplified PCR products using SPRI beads (1.0X) and validated the length distribution of the PCOS GWAS reporter library on an Agilent TapeStation.
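The cycle-number rule above can be written as a short function: take the plateau (maximum) fluorescence from the qPCR of a small aliquot and pick the first cycle whose signal reaches one quarter of it. The Python sketch below assumes baseline-subtracted fluorescence values, one per cycle; the function name and readings are hypothetical.

```python
def cycles_for_library_pcr(fluorescence, fraction=0.25):
    """First cycle (1-based) whose signal reaches `fraction` of the plateau.

    fluorescence: baseline-subtracted qPCR readings, one per cycle.
    """
    if not fluorescence:
        raise ValueError("no qPCR readings supplied")
    target = max(fluorescence) * fraction
    return next(cycle for cycle, value in enumerate(fluorescence, start=1)
                if value >= target)


# Hypothetical baseline-subtracted readings from a test qPCR.
readings = [0, 0, 1, 2, 4, 8, 16, 32, 60, 80, 85, 86]
print(cycles_for_library_pcr(readings))
```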

PCOS GWAS STARR-seq reporter library sequencing

Final PCOS GWAS reporter libraries from each replicate experiment were pooled at equimolar 2 nM concentrations. We sequenced the PCOS GWAS reporter libraries on Illumina NextSeq 2000 using 50 bp PE sequencing.

Alignments and STARR-seq analysis

To estimate regulatory activity in the targeted PCOS GWAS regions, we compared the abundance of fragments expressed as RNA in the reporter library with their abundance in the assay library, as outlined in Figure S7. First, we aligned the PCOS GWAS assay library and the PCOS GWAS reporter library individually to the human genome (hg38) using bowtie2. We retained reads with a quality score of Q ≥ 30 that fell outside the centromeres and blacklisted regions for downstream analysis. We used Picard tools¹⁰⁹ to mark duplicates. We computed RPKM-normalized STARR-seq read density at single-base-pair resolution using the deepTools¹¹⁰ utility bamCoverage. We used the CRADLE⁴⁸ package to correct biases and call peaks with the following options. We then estimated differential STARR-seq activity across the regions as fold change using DESeq⁴⁹. For visualizing the sequencing data, we have included the raw and RPKM-normalized bigwigs on the UCSC genome browser link in the data availability statement (Figure S7). To represent the CRADLE normalizations, we included CRADLE-corrected bigwigs for the input and output STARR-seq libraries separately, as well as for the input library subtracted from the output library (Figure S7). For each region identified by DESeq as having STARR-seq regulatory activity, we compared the STARR-seq results from the COV434 and H295R cell lines to ATAC-seq datasets generated for these cell lines. We also compared the STARR-seq results to the regulatory regions across ENCODE (V4) for both cell lines and primary tissues.
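The core quantity behind STARR-seq activity is a region's abundance in the reporter RNA relative to its abundance in the input plasmid library. The Python sketch below computes a simple per-region log2(RNA/DNA) ratio after counts-per-million scaling. It is a simplified illustration that omits the CRADLE bias correction and the DESeq modeling across replicates used here; the region names, counts, and pseudocount are hypothetical.

```python
import math


def counts_per_million(counts):
    """Scale raw read counts per region to counts per million."""
    total = sum(counts.values())
    return {region: n * 1e6 / total for region, n in counts.items()}


def starr_log2_activity(rna_counts, dna_counts, pseudocount=1.0):
    """Per-region log2 ratio of reporter RNA to input DNA, both scaled to CPM.

    Regions absent from the RNA library are treated as having zero reads.
    """
    rna = counts_per_million(rna_counts)
    dna = counts_per_million(dna_counts)
    return {region: math.log2((rna.get(region, 0.0) + pseudocount)
                              / (dna[region] + pseudocount))
            for region in dna}


rna = {"element_1": 300, "element_2": 100}
dna = {"element_1": 200, "element_2": 200}
print(starr_log2_activity(rna, dna))
```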

PCOS case-control variant association testing within candidate regulatory regions

To identify any association between PCOS and genetic variants within functional STARR-seq regulatory elements, we performed genetic association analyses. The selection criteria, clinical features and genotyping of the 983 PCOS cases and 2951 controls from our previous stage 1 GWAS discovery cohort have been reported (Hayes et al., 2015). In brief, genotyping was performed using the Illumina OmniExpress (HumanOmniExpress-12v1_C) array⁹. Genotype imputation was performed with minimac4¹¹¹ on the Michigan Imputation Server¹¹², with phasing via Eagle¹¹³, using the TOPMed freeze 8 reference panel^{114,115}. We removed SNPs with imputation quality (R²) below 0.8 and restricted the analysis to STARR-seq regions with regulatory activity. The adjusted p-value threshold was calculated based on the regulatory elements identified in each cell line as having increasing or decreasing regulatory activity (Supplementary Data 7). Single-variant association tests within these candidate regulatory regions were carried out in PLINK¹¹⁶ on common variants (minor allele frequency [MAF] > 1%) using logistic regression with PCOS as the outcome variable and age, BMI and five principal components (PCs)¹¹⁷ as covariates. To control false-positive discoveries, we applied Bonferroni-corrected significance thresholds. The number of PCOS cases was 983 rather than the 984 reported in Hayes et al. 2015, because we updated the IBD exclusion criterion from 3rd-degree to 4th-degree relatives, which excluded 1 additional PCOS case.

Colocalization testing

To test whether PCOS-associated variants and expression quantitative trait loci (eQTLs) are likely to share a causal SNP, we used a Bayesian colocalization method⁵⁵. For the PCOS-associated variants, we used the variants and their summary statistics from the association analysis above for all variants with P < 0.3 (Table 3). For the eQTL dataset, we used publicly available eQTL data from the GTEx consortium, GTEx Analysis V8 (dbGaP Accession phs000424.v8.p2, accessed on June 7, 2020). The GTEx dataset contains cis-eQTL data from ~900 American donors of varying ages and mostly European ancestry (~85%) across 49 tissues. We applied coloc, a Bayesian test for colocalization, to estimate the probability of a shared causal signal between the PCOS regulatory element-associated variants and eQTL variants. We used the coloc.abf() function in the coloc R package with its default prior probabilities for a SNP being associated with each trait, and otherwise standard options. All analyses with a colocalization posterior probability (PP.4) > 0.3 using eQTL data from all tissues, adrenal tissue and ovarian tissue are reported in Table 4.
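To make the colocalization output concrete, the Python sketch below reimplements the approximate Bayes factor (ABF) logic that underlies `coloc.abf()`. It computes per-SNP Wakefield ABFs for each trait and combines them with the package's default priors (p1 = p2 = 10⁻⁴, p12 = 10⁻⁵) into posterior probabilities for hypotheses H0 to H4, where PP.4 is the probability of one shared causal variant. This is an illustrative sketch, not the coloc package. It assumes at most one causal variant per trait, effect estimates on a scale where prior standard deviations of 0.2 (case-control) and 0.15 (standardized quantitative trait) apply, and hypothetical summary statistics. H3 is summed directly over SNP pairs, which is O(n²) but numerically stable for a small locus.

```python
import math


def log_abf(beta, se, prior_sd):
    """Wakefield log approximate Bayes factor for one SNP's association."""
    r = prior_sd ** 2 / (prior_sd ** 2 + se ** 2)
    z = beta / se
    return 0.5 * (math.log(1 - r) + r * z * z)


def logsumexp(values):
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))


def coloc_posteriors(l1, l2, p1=1e-4, p2=1e-4, p12=1e-5):
    """Posterior probabilities [PP0..PP4] from per-SNP log ABFs of two traits.

    Requires at least two SNPs so that H3 (two distinct causal SNPs) is defined.
    """
    n = len(l1)
    log_h = [
        0.0,                                           # H0: neither trait
        math.log(p1) + logsumexp(l1),                  # H1: trait 1 only
        math.log(p2) + logsumexp(l2),                  # H2: trait 2 only
        math.log(p1) + math.log(p2) + logsumexp(       # H3: distinct SNPs
            [l1[i] + l2[j] for i in range(n) for j in range(n) if i != j]),
        math.log(p12) + logsumexp([a + b for a, b in zip(l1, l2)]),  # H4: shared
    ]
    norm = logsumexp(log_h)
    return [math.exp(h - norm) for h in log_h]


# Hypothetical two-SNP locus: both traits peak at the first SNP.
pcos = [log_abf(0.4, 0.05, 0.2), log_abf(0.0, 0.05, 0.2)]    # case-control
eqtl = [log_abf(0.4, 0.05, 0.15), log_abf(0.0, 0.05, 0.15)]  # expression
print([round(pp, 4) for pp in coloc_posteriors(pcos, eqtl)])
```

Swapping the eQTL signal to the second SNP moves nearly all posterior mass to H3, the distinct-variants hypothesis.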

DENND1A-enriched STARR-seq assay library construction

To test for allele-specific regulatory activity of common variants, we modified the targeted STARR-seq assay to use a pool of five different human genomes instead of the BACs and fosmids described above. We selected genomes from healthy individuals identified as female in the 1000 Genomes Project⁶⁹: three of European ancestry and two of Han Chinese ancestry. We pooled these genomes in the targeted STARR-seq experiment to identify allele-specific regulatory effects. Four candidate regulatory elements were identified from the targeted STARR-seq results; their coordinates are listed in Supplementary Data 13.

Enrichment for the targeted DENND1A region

We focused on variants in the DENND1A locus, a region spanning the entire DENND1A gene and 100 kb upstream and downstream of it. To enrich for the DENND1A locus, we used targeting oligonucleotide probes. We first sheared each genome separately to ~200 bp using a Covaris S220. We then used Agilent SureSelect Custom DNA Target Enrichment Probes to enrich the region around DENND1A (hg38: chr9:123279654-124030107). We followed the Agilent SureSelectXT2 custom protocol (Cat# 5190-4846) to enrich the target regions in each genome, but modified the adaptor ligation steps: we used a custom adaptor (SS_Adaptor) and amplified the resulting enriched fragments using the TS2SS-F and TS2SS-R primers (Supplementary Data 16).

DENND1A locus STARR-seq reporter plasmid construction

To create the DENND1A-locus STARR-seq assay libraries from the five genomes, we cloned the sheared, DENND1A-locus-enriched and amplified DNA fragments into the linearized STARR-seq vector using NEBuilder HiFi DNA Assembly (#E2621). We then ethanol precipitated the products by adding 0.1X volume 3 M NaOAc and 2.5X volume cold 100% ethanol and storing the mixture at –20 °C overnight. We pelleted the DNA by centrifugation at 16,000 RCF for 30 min at 4 °C, washed the pellets with 5 ml cold 70% ethanol, and resuspended them in water.

We then pooled the plasmids from each genome in equimolar concentrations. We amplified the pooled plasmids by electroporating them into E. cloni 10G Electrocompetent Cells using the manufacturer’s recommended settings (1.0 mm cuvette, 10 μF, 600 Ohms, 1800 Volts). We subsequently isolated the plasmids using the Qiagen Plasmid GigaPrep Kit (Qiagen #12191), quantified them using Qubit, and validated the length of the pooled library on a 1% agarose gel. This purified, pooled plasmid library is the DENND1A-locus STARR-seq library used for the DENND1A-locus STARR-seq experiments.

DENND1A locus STARR-seq assay library sequencing

To estimate the abundance of reads covering each selected variant in the assay library, we sequenced it on an Illumina NextSeq 2000 using a 50 bp paired-end protocol. We sequenced three replicates of the DENND1A-locus STARR-seq assay library, each prepared by amplifying the STARR-seq inserts from the pooled library with 208-F Index7 primers (Supplementary Data 16).

DENND1A locus enriched STARR-seq assay

To test for effects of variants in the targeted DENND1A locus, we transfected the DENND1A-locus STARR-seq assay library into H295R cells and isolated and sequenced the resulting RNA as described above. All experiments were performed in triplicate. For each replicate, we transfected 70 million H295R cells with 140 μg of the DENND1A-locus STARR-seq assay library using the Lonza Nucleofector (setting SF-DN-100), and isolated the RNA 6 h post-transfection.

DENND1A locus STARR-seq reporter library construction

To isolate the DENND1A locus reporter RNA, we first isolated total RNA followed by enriching for cDNA produced from the DENND1A-locus STARR-seq plasmid library. We used the same protocol as described for the PCOS GWAS reporter library construction. We pooled the DENND1A locus reporter libraries from each replicate at equimolar 2 nM concentrations. We then sequenced the DENND1A locus reporter libraries on Illumina NextSeq 2000 using 75 bp PE sequencing.

Candidate regulatory variant identification

To identify variants that have allele specific regulatory activity, we compared the ratio of reads mapping to the alternate allele versus reference allele in each assay library and reporter library. If the ratio of reads mapping to alternate allele versus reference allele was higher in the reporter library compared to the assay library, that variant was called as having increased regulatory activity of the alternate allele.
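The ratio comparison described above can be written as a single log2 allelic-skew statistic: the alternate-to-reference read ratio in the reporter RNA divided by the same ratio in the assay DNA. The Python sketch below is a simple point estimate with a hypothetical pseudocount to avoid division by zero; the analysis here instead used BIRD, which models counting uncertainty. The function name and read counts are hypothetical.

```python
import math


def allelic_skew(rna_ref, rna_alt, dna_ref, dna_alt, pseudocount=0.5):
    """log2[(alt/ref in reporter RNA) / (alt/ref in assay DNA)].

    Positive values suggest higher regulatory activity of the alternate allele.
    """
    rna_ratio = (rna_alt + pseudocount) / (rna_ref + pseudocount)
    dna_ratio = (dna_alt + pseudocount) / (dna_ref + pseudocount)
    return math.log2(rna_ratio / dna_ratio)


# Hypothetical read counts at one variant site.
print(round(allelic_skew(rna_ref=50, rna_alt=150, dna_ref=100, dna_alt=100), 3))
```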

To do so, we first obtained a list of the variants in the DENND1A locus present in the pool, using the VCF for these samples from the 1000 Genomes Project⁶⁹. We filtered the variants in the targeted DENND1A locus to include only SNPs that were polymorphic within the pool of five genomes (that is, with both the reference and alternate alleles present). The final list of ~600 variants was used for the regulatory variant analysis.
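Filtering for sites where both alleles are present in the pool can be done directly from the VCF genotype (GT) fields of the five individuals. The Python sketch below assumes biallelic SNPs and GT strings such as `0|1`; the site names and genotype lists are hypothetical, and in practice a dedicated VCF tool such as bcftools would typically be applied to the full file.

```python
def alleles_observed(genotypes):
    """Allele indices seen across the pooled individuals' GT strings."""
    observed = set()
    for gt in genotypes:
        for allele in gt.replace("|", "/").split("/"):
            if allele != ".":                 # skip missing calls
                observed.add(allele)
    return observed


def polymorphic_in_pool(genotypes):
    """True if the reference allele and at least one alternate allele occur."""
    observed = alleles_observed(genotypes)
    return "0" in observed and len(observed) > 1


# Hypothetical GT fields for five pooled genomes at three sites.
pool_sites = {
    "site_het_in_one": ["0|0", "0|1", "0|0", "0|0", "0|0"],
    "site_all_alt": ["1|1"] * 5,
    "site_all_ref": ["0|0"] * 5,
}
print([site for site, gts in pool_sites.items() if polymorphic_in_pool(gts)])
```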

To compare the reads mapping to each allele in the reporter and assay libraries, we first aligned each DENND1A-locus enriched STARR-seq library (assay library and reporter library) individually to the human genome (hg38) using WASP¹¹⁸ and bowtie2¹¹⁹. Reads with a quality score of Q ≥ 30 that fell outside centromeres and blacklisted regions were used for downstream analysis. We used Picard tools to mark duplicates¹⁰⁹. RPKM-normalized STARR-seq read density was computed at single-base-pair resolution using the deepTools utility bamCoverage¹¹⁰. We then counted the reads supporting each allele of each variant in each sequenced sample using samtools mpileup¹²⁰.
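Counting alleles from samtools mpileup output amounts to parsing the read-base column. In that column, '.' and ',' denote reference matches on the forward and reverse strands, letters denote mismatches, '^' plus one mapping-quality character marks a read start, '$' marks a read end, and '+n' or '-n' introduces an indel sequence of length n. The sketch below assumes a single-sample pileup with the read bases in column 5. It ignores base qualities, which in practice would be handled by mpileup's own quality filters, and the function names are hypothetical.

```python
def count_alleles(read_bases: str, alt: str) -> tuple[int, int]:
    """Count reference and alternate-allele reads in one mpileup read-base string."""
    ref_n = alt_n = 0
    alt = alt.upper()
    i = 0
    while i < len(read_bases):
        ch = read_bases[i]
        if ch == "^":
            i += 2  # skip the read-start marker and its mapping-quality character
            continue
        if ch in "+-":
            j = i + 1
            while j < len(read_bases) and read_bases[j].isdigit():
                j += 1
            i = j + int(read_bases[i + 1:j])  # skip the inserted/deleted sequence
            continue
        if ch in ".,":
            ref_n += 1  # match to the reference on either strand
        elif ch.upper() == alt:
            alt_n += 1  # mismatch equal to the alternate allele
        # '$', '*', '>', '<' and other bases are not counted
        i += 1
    return ref_n, alt_n


def parse_pileup_line(line: str, alt: str) -> tuple[str, int, int, int]:
    """Return (chrom, pos, ref_count, alt_count) for a single-sample mpileup line."""
    chrom, pos, _ref, _depth, bases = line.rstrip("\n").split("\t")[:5]
    ref_n, alt_n = count_alleles(bases, alt)
    return chrom, int(pos), ref_n, alt_n
```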

To estimate the regulatory effect of each variant, we used BIRD⁶⁸, a Bayesian statistical framework for the analysis of regulatory variants. BIRD uses Bayesian priors to estimate allele-specific regulatory effects, expressed as an effect size θ, and identifies variants with a high probability of being regulatory. We used BIRD's standard options and set the regulatory effect threshold to 1.2.
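BIRD's model is more elaborate than can be reproduced here, but the quantity it targets can be illustrated with a simplified stand-in. Place independent Beta(1, 1) priors on the alternate-allele fraction in the plasmid (DNA) and reporter (RNA) libraries, draw from the resulting Beta posteriors, and summarize θ as the odds ratio of the two fractions. This is not BIRD's implementation. The Beta-binomial model, the Monte Carlo approach and the reading of the 1.2 threshold as "θ > 1.2 or θ < 1/1.2" are all assumptions made for illustration.

```python
import random
import statistics


def theta_posterior(dna_ref: int, dna_alt: int, rna_ref: int, rna_alt: int,
                    threshold: float = 1.2, n_draws: int = 20000,
                    seed: int = 0) -> tuple[float, float]:
    """Return (posterior median of theta, P(theta > threshold or theta < 1/threshold)).

    Simplified stand-in for a Bayesian allelic-effect model; NOT the BIRD algorithm.
    """
    rng = random.Random(seed)
    thetas = []
    for _ in range(n_draws):
        # Beta(1, 1) prior + binomial read counts -> Beta posterior on the alt fraction
        p_dna = rng.betavariate(dna_alt + 1, dna_ref + 1)
        p_rna = rng.betavariate(rna_alt + 1, rna_ref + 1)
        # theta: odds of the alternate allele in RNA relative to DNA
        thetas.append((p_rna / (1 - p_rna)) / (p_dna / (1 - p_dna)))
    p_effect = sum(t > threshold or t < 1 / threshold for t in thetas) / n_draws
    return statistics.median(thetas), p_effect


if __name__ == "__main__":
    print(theta_posterior(500, 500, 200, 800))  # strong alt-allele skew in RNA
    print(theta_posterior(500, 500, 500, 500))  # balanced alleles
```

With 500/500 plasmid reads and 200/800 reporter reads, θ concentrates near 4 and the effect probability is close to 1; balanced counts give θ near 1 and a small probability. Unlike the naive ratio, the posterior widens automatically when read depth is low.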

ATAC-Seq

To identify accessible chromatin in H295R and COV434 cells, we performed ATAC-seq¹²¹ in duplicate as described below.

We harvested 50,000 viable cells for each replicate. COV434 cells were additionally incubated with TURBO DNase (Invitrogen, #AM2238) for 1 h at 37 °C. We then incubated the cells in 50 μL cold ATAC-RSB containing 0.1% NP40, 0.1% Tween 20 and 0.01% digitonin on ice for 3 min, washed them with 1 mL cold ATAC-RSB containing 0.1% Tween 20, and pelleted them. We resuspended the cell pellets in a transposition mixture of 25 μL TD buffer, 2.5 μL transposase, 16.5 μL PBS, 0.5 μL 1% digitonin, 0.5 μL 10% Tween 20 and 5 μL H2O and incubated them in a thermomixer at 37 °C for 30 min. We cleaned up the DNA using the MinElute Reaction Cleanup Kit (Qiagen, #28204), amplified it using an ATAC-Universal primer and an ATAC-barcode primer (Supplementary Data 16), and cleaned it up with SPRI beads. We sequenced the ATAC-seq libraries on an Illumina NextSeq 550 (50 bp paired-end) at the Duke Genomics core.

ATAC-seq preprocessing and alignment

ATAC-seq libraries for the H295R and COV434 cell lines were individually aligned to the human genome (hg38). Each cell line had two biological replicates, and >40 million reads were generated per sample. Sequencing data quality was assessed with FastQC, and adapters were trimmed with Trimmomatic. Trimmed reads were aligned to the GRCh38 genome using Bowtie¹¹⁹, reporting only alignments with no more than two mismatches and discarding multi-mapping reads (-v 2 --best --strata -m 1). Reads mapping to the ENCODE hg38 blacklisted regions (https://www.encodeproject.org/files/ENCFF356LFX; manually curated regions with anomalous signal across multiple genomic assays and cell types) were removed using bedtools2 intersect¹²² (v2.25.0). Properly paired reads were then filtered to exclude presumed PCR duplicates using Picard MarkDuplicates (v1.130; http://broadinstitute.github.io/picard/). The remaining reads were used to generate reads-per-million (RPM) normalized bigWig files for visualization using deepTools bamCoverage¹²³ (v3.0.1). Peaks were called using MACS2 with an FDR cutoff of 0.1. We analysed the dataset according to the ENCODE ATAC-seq standards. We computed transcription start site (TSS) enrichment values using the GRCh38 RefSeq TSS annotation and used a cutoff of >7 for high-quality data (Figure S11).
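TSS enrichment compares Tn5 insertion density at transcription start sites with the density in the flanks. The sketch below is a simplified, unsmoothed version of the ENCODE-style calculation. It assumes a ±2,000 bp window normalized by the mean of the outermost 100 bp on each side. ENCODE's own pipeline also smooths the aggregate profile, so values from this sketch will not match it exactly. The input structures (sorted insertion positions per chromosome and a list of strand-aware TSSs) are hypothetical.

```python
from bisect import bisect_left, bisect_right


def tss_enrichment(insertions: dict[str, list[int]],
                   tss_list: list[tuple[str, int, str]],
                   flank: int = 2000, edge: int = 100) -> float:
    """Peak of the aggregate insertion profile around TSSs, divided by flank background.

    insertions: chrom -> sorted insertion positions; tss_list: (chrom, position, strand).
    """
    profile = [0] * (2 * flank + 1)
    for chrom, tss, strand in tss_list:
        positions = insertions.get(chrom, [])
        lo = bisect_left(positions, tss - flank)   # first insertion in the window
        hi = bisect_right(positions, tss + flank)  # one past the last insertion
        for pos in positions[lo:hi]:
            # Orient offsets so that downstream of the TSS is always positive.
            offset = pos - tss if strand == "+" else tss - pos
            profile[offset + flank] += 1
    background = (sum(profile[:edge]) + sum(profile[-edge:])) / (2 * edge)
    if background == 0:
        raise ValueError("no insertions in the flanking background windows")
    return max(value / background for value in profile)
```

A sample would then be considered high quality if the returned value exceeds the >7 cutoff used here.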

Generating cell lines for CRISPRa perturbation studies

Guide RNA (gRNA) design and gRNA plasmid synthesis

Four candidate regulatory elements were identified from the targeted STARR-seq results (coordinates in Supplementary Data 13). The regions were selected based on STARR-seq effect, chromatin accessibility, and the feasibility of designing guides given the genomic sequence and PAM restrictions.

To design the guide oligos, we used Guidescan2¹²⁴ with a specificity filter of > 0.2. In total, we designed 21 gRNAs across the four candidate regulatory elements and the DENND1A promoter region (Supplementary Data 14), with each element targeted by 5–7 guides. As a negative control, we designed a set of five guides with no target in the human genome. Each gRNA was synthesized as an individual oligo, and the oligos were then processed as described below to make pooled gRNA plasmids.
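Guide design starts from enumerating SpCas9 target sites, meaning 20-nt protospacers immediately followed by an NGG PAM on either strand, and then discarding those with poor predicted specificity. The sketch below does the enumeration only. It does not compute specificity, so the score dictionary is a hypothetical stand-in for scores exported from Guidescan2, filtered with the same > 0.2 cutoff.

```python
import re

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_ngg_sites(seq: str, spacer_len: int = 20) -> list[tuple[str, str, int]]:
    """Return (protospacer, strand, start) for every spacer+NGG site on both strands.

    `start` is the 0-based forward-strand coordinate of the spacer+PAM site.
    """
    seq = seq.upper()
    # The zero-width lookahead lets overlapping sites be reported.
    pattern = re.compile(r"(?=([ACGT]{%d})[ACGT]GG)" % spacer_len)
    site_len = spacer_len + 3
    sites = []
    for strand, s in (("+", seq), ("-", revcomp(seq))):
        for m in pattern.finditer(s):
            start = m.start() if strand == "+" else len(seq) - m.start() - site_len
            sites.append((m.group(1), strand, start))
    return sites


def keep_specific(sites, specificity: dict[str, float], cutoff: float = 0.2):
    """Keep sites whose externally computed specificity score exceeds the cutoff."""
    return [site for site in sites if specificity.get(site[0], 0.0) > cutoff]
```

Regions with few NGG sites, or with sites that all fail the specificity cutoff, cannot be targeted, which is one reason guide feasibility constrains which elements can be tested.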

To make the gRNA plasmids, we followed the outline of the CROP-seq protocol¹²⁵. First, we prepared the gRNA plasmid backbone by digesting the CROPseq-Guide-Puro plasmid (Addgene #86708) with BsmBI. We ran the digested product on a 1% agarose gel and purified the 8.3 kb fragment using the GeneJET Gel Extraction Kit (#K0691). To prepare each synthesized gRNA oligo for insertion into the plasmid, we first converted it to a double-stranded oligo using primers ssds-F and ssds-R (Supplementary Data 16). We then cloned each double-stranded gRNA oligo into the digested CROPseq-Guide-Puro vector using the NEBuilder HiFi DNA Assembly kit (#E2621). The plasmid products were purified with the QIAquick PCR Purification Kit (Qiagen #28104).

To make the pooled plasmids, we pooled the plasmid products for each regulatory element, the promoter region, or the negative control in equimolar amounts. To amplify the plasmid pools, we electroporated each pool into Lucigen Endura cells (Lucigen #60242-2) following the manufacturer's recommended settings for a 1.0 mm cuvette (25 μF, 200 Ω, 1.5 kV). We grew each pool overnight at 37 °C in 25 mL LB with carbenicillin for antibiotic selection and isolated the gRNA plasmid pools using the Qiagen Midi Prep kit (Qiagen #12143) following the manufacturer's protocol. Each purified plasmid pool was then used to prepare lentiviral particles.

Lentivirus production

To identify the target genes of the selected STARR-seq regulatory elements, we used CRISPRa to perturb the candidate regulatory elements. First, we generated a stable cell line expressing a Cas protein. To do so, we used a catalytically inactive Cas9 (dCas9) fused to the catalytic core of the histone acetyltransferase p300 (dCas9-p300). dCas9-p300 can act as a transcriptional activator when combined with a targeting guide RNA⁴¹.

To make stable dCas9-p300 cell lines, we generated lentivirus expressing dCas9-p300. Briefly, we combined the dCas9-p300 (Addgene #83889), pMD2.G (Addgene #12259) and psPAX2 (Addgene #12260) plasmids with Lipofectamine 3000 (Invitrogen #L3000001) and lipofected them into HEK293T cells (ATCC #CRL-3216™) according to the manufacturer's protocol. After 14 to 20 h, the transfection medium was replaced with fresh medium. We harvested the viral supernatant at 24 and 48 h after lipofection and concentrated it to 1/100 of its original volume using Lenti-X Concentrator (Clontech #631232) following the manufacturer's protocol.

To make stable dCas9-KRAB cell lines, we generated lentivirus expressing dCas9-KRAB. Briefly, we combined the dCas9-KRAB (gift from the Gersbach lab, Duke University), pMD2.G (Addgene #12259) and psPAX2 (Addgene #12260) plasmids with Lipofectamine 3000 (Invitrogen #L3000001) and lipofected them into HEK293T cells (ATCC #CRL-3216™) according to the manufacturer's protocol. After 14 to 20 h, the transfection medium was replaced with fresh medium. We harvested the viral supernatant at 24 and 48 h after lipofection and concentrated it to 1/100 of its original volume using Lenti-X Concentrator (Clontech #631232) following the manufacturer's protocol.

To produce lentivirus for the gRNA pools, we transfected HEK293T cells with the equimolar gRNA plasmid pool for each regulatory element, together with psPAX2 and pMD2.G, using Lipofectamine 3000 following the manufacturer's instructions. We harvested the lentivirus-containing medium 24 and 48 h later and concentrated it to 1/100 of its original volume using Lenti-X Concentrator (Clontech #631232) following the manufacturer's protocol.

HEK293T cell line with stable dCas9-P300 expression

We received the HEK293T-dCas9-p300 cell line⁴¹ from Dr. Charles Gersbach and followed the published culture and growth conditions for HEK293T cells.

Generating stable H295R-dCas9-P300 and H295R-dCas9-KRAB cell lines

To make stable H295R cells expressing dCas9-p300, we transduced H295R cells with the concentrated dCas9-p300 lentiviral particles at a multiplicity of infection (MOI) of 5.0 in the presence of 6 μg/mL polybrene (EMD Millipore Corporation #TR-1003-G). We then selected transduced cells with 0.5 μg/mL puromycin (Gibco #A1113803) for 7 days⁴¹ to generate a polyclonal H295R-dCas9-p300 cell line. We confirmed dCas9-p300 expression in H295R cells using qRT-PCR.

To make stable H295R cells expressing dCas9-KRAB, we transduced H295R cells with the concentrated dCas9-KRAB lentiviral particles at a multiplicity of infection of 5.0 in the presence of 6 μg/mL polybrene (EMD Millipore Corporation #TR-1003-G). We then selected transduced cells with 0.5 μg/mL blasticidin (Gibco #A1113903) for 10 days to generate a polyclonal H295R-dCas9-KRAB cell line. We confirmed dCas9-KRAB expression in H295R cells using qRT-PCR.

Transduction of gRNA into dCas9-P300 or dCas9-KRAB expressing cell lines

To test the effect of dCas9-p300 or dCas9-KRAB on the targeted regulatory elements, we transduced each lentiviral pool (regulatory elements, DENND1A promoter region and negative control) into HEK293T and H295R cells stably expressing dCas9-p300 or dCas9-KRAB. We used a high MOI (~15) because the dCas9-modified H295R cells were harder to transduce¹²⁶. We transduced the cells at seeding in 12-well or 6-well plates supplemented with polybrene (EMD Millipore Corporation #TR-1003-G; 6 μg/mL for H295R cells and 4 μg/mL for HEK293T cells), with 6 replicates for each pool. We changed the medium 24 h after transduction.
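The volume of concentrated virus needed for a target MOI follows from the functional titer, and a Poisson model gives the expected fraction of cells receiving at least one transducing unit. The titer and cell number below are hypothetical. The Poisson expectation is an idealization that assumes the titer was measured on the cells being transduced; for hard-to-transduce cells such as the dCas9-modified H295R lines, the effective MOI can fall well below the nominal one, which is consistent with the choice of a higher nominal MOI here.

```python
import math


def virus_volume_ml(moi: float, n_cells: float, titer_tu_per_ml: float) -> float:
    """Volume of virus (mL) delivering `moi` transducing units per cell."""
    return moi * n_cells / titer_tu_per_ml


def fraction_transduced(moi: float) -> float:
    """Poisson expectation of the fraction of cells receiving >= 1 transducing unit."""
    return 1.0 - math.exp(-moi)


if __name__ == "__main__":
    titer = 1e7  # hypothetical functional titer after concentration (TU/mL)
    cells = 2e5  # hypothetical number of cells per well
    for moi in (5.0, 15.0):
        print(f"MOI {moi:g}: {virus_volume_ml(moi, cells, titer) * 1000:.0f} uL virus, "
              f"{fraction_transduced(moi):.4f} expected transduced")
```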

RNA isolation and qRT-PCR to measure gene expression levels

To measure changes in gene expression caused by the dCas9-p300 and dCas9-KRAB perturbations, we used qRT-PCR. First, we harvested RNA from each replicate 4 days after transduction with the gRNA lentivirus pool using the RNeasy Mini Kit (Qiagen #4004) following the manufacturer's protocol, including the DNase treatment. Before harvesting, we confirmed that each replicate was at least 80% viable using 0.4% trypan blue stain and a Countess Cell Counter (ThermoFisher). We quantified the RNA using Qubit (Invitrogen) and used 500 ng RNA per sample for cDNA synthesis with SuperScript III (800 U, Life Technologies) and oligo(dT) primers following the manufacturer's protocol (Thermo Fisher #18418012). We then performed qRT-PCR using this cDNA, TaqMan™ Fast Advanced Master Mix for qPCR (Thermo Fisher #4444556), and TaqMan™ Gene Expression Assays for DENND1A, CRB2, LHX2, STRBP and GAPDH. Each sample was measured in triplicate. Expression was analysed with the 2^(−ΔΔCT) method in R using GAPDH as the internal control, and all fold changes are reported as log(2^(−ΔΔCT)) relative to the negative (non-targeting gRNA) control.
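The 2^(−ΔΔCT) calculation can be spelled out in a few lines. The study performed this analysis in R; the sketch below uses Python for consistency with the other examples here. It averages the technical triplicates before computing ΔCT (one common choice) and assumes a base-2 logarithm for the reported log fold change, since the base is not stated. Under that assumption the log2 fold change is simply −ΔΔCT.

```python
import math
from statistics import mean


def ddct(target_ct, reference_ct, control_target_ct, control_reference_ct):
    """Return (delta-delta-Ct, fold change, log2 fold change) for one sample.

    target/reference: Ct replicates for the gene of interest and GAPDH in the
    perturbed sample; control_*: the same for the non-targeting gRNA control.
    """
    delta_sample = mean(target_ct) - mean(reference_ct)                # normalize to GAPDH
    delta_control = mean(control_target_ct) - mean(control_reference_ct)
    dd = delta_sample - delta_control                                  # relative to control
    fold = 2.0 ** (-dd)
    return dd, fold, math.log2(fold)


if __name__ == "__main__":
    # Hypothetical Ct values: the target reaches threshold ~2 cycles earlier than in the control.
    print(ddct([24.0, 24.1, 23.9], [18.0, 18.0, 18.0], [26.0, 26.0, 26.0], [18.0, 18.0, 18.0]))
```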

RNA sequencing of H295R cells with forskolin treatment

To measure changes in gene expression caused by forskolin treatment, we performed RNA sequencing. First, we cultured H295R cells with 10 μM forskolin, or an equivalent volume of DMSO as a control, for 3 days. Before harvesting, we confirmed that each replicate was at least 90% viable using 0.4% trypan blue stain and a Countess Cell Counter (ThermoFisher). The cells were then harvested using trypsin, and the cell pellets were flash-frozen before RNA extraction and sequencing. Standard unstranded bulk RNA sequencing was performed by Azenta.

RNA sequencing analysis

To quantify changes in gene expression in H295R cells as a result of forskolin treatment, we performed differential expression analysis on RNA-seq data from forskolin-treated and DMSO-treated H295R cells. Raw reads were trimmed to remove adapters and bases with an average quality score (Q) (Phred33) of <20 using Trimmomatic v0.32¹²⁷. Trimmed reads were then aligned to the primary assembly of the GRCh38 human genome using STAR v2.4.1¹²⁸. We performed differential expression analysis using DESeq2⁴⁹ in R with a Benjamini-Hochberg false discovery rate (FDR) threshold of 0.05, and log fold changes were estimated using apeglm⁷⁸. To estimate transcript abundance, we computed transcripts per million (TPM) using the rsem-calculate-expression function in the RSEM v1.2.21 package¹²⁹.
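DESeq2 reports Benjamini-Hochberg adjusted p-values (after its own independent-filtering step) in the `padj` column, and genes with `padj` < 0.05 pass the threshold used here. A plain-Python version of the BH step-up adjustment is sketched below for readers unfamiliar with the procedure; it is illustrative only and omits DESeq2's independent filtering.

```python
def benjamini_hochberg(pvalues: list[float]) -> list[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity of the adjusted values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    padj = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
    print([round(x, 3) for x in padj], [x < 0.05 for x in padj])
```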

ELISA for measuring testosterone production

To measure changes in testosterone production, we collected the supernatant from gRNA-pool-transduced H295R cells two and four days after transduction. We diluted the supernatant 300-fold and measured testosterone using an ELISA kit (Cayman Chemicals #582701) according to the manufacturer's protocol with the supplied standard. All samples were measured in duplicate. Absorbance was measured at 405–420 nm using the GloMax Discover System (Promega). Fold changes are reported relative to the negative (non-targeting gRNA) control.
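Competitive ELISAs such as this one produce a signal that falls as analyte concentration rises. Standards are therefore usually expressed as B/B0, the blank-corrected absorbance (A − NSB) divided by that of the maximum-binding wells (B0 − NSB), and fitted on a logit-log scale; a four-parameter logistic fit is a common alternative. The sketch below takes the logit-log route, then corrects for the dilution and normalizes to the non-targeting control. The fitting choice, function names and standard values are assumptions, not the authors' analysis.

```python
import math
from statistics import mean


def logit(x: float) -> float:
    return math.log(x / (1.0 - x))


def fit_logit_log(std_conc, std_abs, b0_abs, nsb_abs):
    """Least-squares line of logit(B/B0) against log10(concentration); returns (slope, intercept)."""
    xs = [math.log10(c) for c in std_conc]
    ys = [logit((a - nsb_abs) / (b0_abs - nsb_abs)) for a in std_abs]
    x_bar, y_bar = mean(xs), mean(ys)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    return slope, y_bar - slope * x_bar


def concentration(absorbance, slope, intercept, b0_abs, nsb_abs, dilution=300.0):
    """Interpolate a sample from the standard curve and undo the dilution."""
    y = logit((absorbance - nsb_abs) / (b0_abs - nsb_abs))
    return 10 ** ((y - intercept) / slope) * dilution


def fold_change(sample_conc, control_concs):
    """Concentration relative to the mean of the non-targeting gRNA controls."""
    return sample_conc / mean(control_concs)
```

With the default dilution of 300, a sample read at 10 (in the standard's units) on the curve corresponds to 3,000 in the undiluted supernatant. The same functions apply to the estradiol assay with `dilution=10.0`.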

ELISA for measuring estradiol production

To measure changes in estradiol production, we collected the supernatant from gRNA-pool-transduced H295R cells two and four days after transduction. We diluted the supernatant 10-fold and measured estradiol using an ELISA kit (Cayman Chemicals #501890) according to the manufacturer's protocol with the supplied standard. All samples were measured in duplicate. Absorbance was measured at 405–420 nm using the GloMax Discover System (Promega). Fold changes are reported relative to the negative (non-targeting gRNA) control.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Data availability

The ATAC-seq, STARR-seq and RNA-seq data generated in this study have been deposited in the NCBI Gene Expression Omnibus database under accession code GSE262471. VCF files from the 1000 Genomes Project⁶⁹ are available from the project's data portal [https://www.internationalgenome.org/data-portal/data-collection/30x-grch38]. Summary statistics for the association analyses are available with this manuscript as the Supplementary Workbook file. Open chromatin regions (ATAC-seq) and STARR-seq results for the studied regions can be visualized on a UCSC Genome Browser track hub [https://genome.ucsc.edu/s/laavatar/2024%2DPCOS%2DSS%2Dpaper]. Source data are provided as a Source Data file with this paper.

Code availability

Source data for the qPCR and ELISA experiments are provided in the Source Data file. The code used to generate the plots is available on GitHub.

References

Dapas, M. & Dunaif, A. Deconstructing a Syndrome: Genomic Insights Into PCOS Causal Mechanisms and Classification. Endocr. Rev. 43, 927–965 (2022).
Chang, S. & Dunaif, A. Diagnosis of Polycystic Ovary Syndrome: Which Criteria to Use and When?. Endocrinol. Metab. Clin. North Am. 50, 11–23 (2021).
Rubin, K. H., Glintborg, D., Nybo, M., Abrahamsen, B. & Andersen, M. Development and Risk Factors of Type 2 Diabetes in a Nationwide Population of Women With Polycystic Ovary Syndrome. J. Clin. Endocrinol. Metab. 102, 3848–3857 (2017).
Diamanti-Kandarakis, E. & Dunaif, A. Insulin resistance and the polycystic ovary syndrome revisited: an update on mechanisms and implications. Endocr. Rev. 33, 981–1030 (2012).
Brakta, S. et al. Perspectives on Polycystic Ovary Syndrome: Is Polycystic Ovary Syndrome Research Underfunded?. J. Clin. Endocrinol. Metab. 102, 4421–4427 (2017).
Vink, J., Sadrzadeh, S., Lambalk, C. & Boomsma, D. Heritability of Polycystic Ovary Syndrome in a Dutch Twin-Family Study. J. Clin. Endocrinol. Metab. 91, 2100–2104 (2006).
Sollis, E. et al. The NHGRI-EBI GWAS Catalog: knowledgebase and deposition resource. Nucleic Acids Res. 51, D977–D985 (2022).
Dapas, M. et al. Distinct subtypes of polycystic ovary syndrome with novel genetic associations: An unsupervised, phenotypic clustering analysis. PLoS Med. 17, e1003132 (2020).
Hayes, G. M. et al. Genome-wide association of polycystic ovary syndrome implicates alterations in gonadotropin secretion in European ancestry populations. Nat. Commun. 6, 7502 (2015).
Day, F. et al. Large-scale genome-wide meta-analysis of polycystic ovary syndrome suggests shared genetic architecture for different diagnosis criteria. PLOS Genet. 14, e1007813 (2018).
Shi, Y. et al. Genome-wide association study identifies eight new risk loci for polycystic ovary syndrome. Nat. Genet. 44, 1020 (2012).
Tyrmi, J. S. et al. Leveraging Northern European population history: novel low-frequency variants for polycystic ovary syndrome. Hum. Reprod. 37, 352–365 (2021).
Day, F. R. et al. Causal mechanisms and balancing selection inferred from genetic associations with polycystic ovary syndrome. Nat. Commun. 6, 8464 (2015).
Chen, Z.-J. et al. Genome-wide association study identifies susceptibility loci for polycystic ovary syndrome on chromosome 2p16.3, 2p21 and 9q33.3. Nat. Genet. 43, 55 (2011).
Zhang, Y. et al. A genome-wide association study of polycystic ovary syndrome identified from electronic health records. Am. J. Obstet. Gynecol. 223, 559.e1–559.e21 (2020).
Lee, H. et al. Genome-wide association study identified new susceptibility loci for polycystic ovary syndrome. Hum. Reprod. 30, 723–731 (2015).
Visscher, P. M. et al. 10 Years of GWAS Discovery: Biology, Function, and Translation. Am. J. Hum. Genet. 101, 5–22 (2017).
Vockley, C. M., Barrera, A. & Reddy, T. E. Decoding the role of regulatory element polymorphisms in complex disease. Curr. Opin. Genet. Dev. 43, 38–45 (2017).
Uffelmann, E. et al. Genome-wide association studies. Nat. Rev. Methods Prim. 1, 59 (2021).
Pai, A. A., Pritchard, J. K. & Gilad, Y. The Genetic and Mechanistic Basis for Variation in Gene Regulation. PLoS Genet. 11, e1004857 (2015).
Roussos, P. et al. A Role for Noncoding Variation in Schizophrenia. Cell Rep. 9, 1417–1429 (2014).
McAllister, J. M. et al. Overexpression of a DENND1A isoform produces a polycystic ovary syndrome theca phenotype. Proc. Natl Acad. Sci. 111, E1519–E1527 (2014).
Teves, M. E. et al. Human DENND1A.V2 Drives Cyp17a1 Expression and Androgen Production in Mouse Ovaries and Adrenals. Int J. Mol. Sci. 21, 2545 (2020).
Dapas, M. et al. Family-based quantitative trait meta-analysis implicates rare noncoding variants in DENND1A in polycystic ovary syndrome. J. Clin. Endocrinol. Metab. https://doi.org/10.1210/jc.2018-02496 (2019).
Waterbury, J. S. et al. The PCOS GWAS Candidate Gene ZNF217 Influences Theca Cell Expression of DENND1A.V2, CYP17A1, and Androgen Production. J. Endocr. Soc. 6, bvac078 (2022).
McAllister, J. M. et al. MicroRNA Profiling Reveals miRNA-130b-3p Mediates DENND1A Variant 2 Expression and Androgen Biosynthesis. Endocrinology 160, 1964–1981 (2019).
Tee, M. K. et al. Alternative splicing of DENND1A, a PCOS candidate gene, generates variant 2. Mol. Cell Endocrinol. 434, 25–35 (2016).
Kulkarni, R., Teves, M. E., Han, A. X., McAllister, J. M. & Strauss, J. F. Co-Localization of Polycystic Ovary Syndrome Candidate Gene Products in Theca Cells Suggests Novel Signaling Pathways. J. Endocr. Soc. 3, 2204–2223 (2019).
Legro, R. S., Driscoll, D., Strauss, J. F., Fox, J. & Dunaif, A. Evidence for a genetic basis for hyperandrogenemia in polycystic ovary syndrome. Proc. Natl Acad. Sci. 95, 14956–14960 (1998).
Boyle, E. A., Li, Y. I. & Pritchard, J. K. An Expanded View of Complex Traits: From Polygenic to Omnigenic. Cell 169, 1177–1186 (2017).
Tewhey, R. et al. Direct Identification of Hundreds of Expression-Modulating Variants using a Multiplexed Reporter Assay. Cell 165, 1519–1529 (2016).
Kheradpour, P. et al. Systematic dissection of regulatory motifs in 2000 predicted human enhancers using a massively parallel reporter assay. Genome Res. 23, 800–811 (2013).
Arnold, C. D. et al. Genome-wide quantitative enhancer activity maps identified by STARR-seq. Science 339, 1074–1077 (2013).
Muerdter, F. et al. Resolving systematic errors in widely used enhancer activity assays in human cells. Nat. Methods 15, 141 (2017).
Johnson, G. D. et al. Human genome-wide measurement of drug-responsive regulatory activity. Nat. Commun. 9, 5317 (2018).
Muerdter, F., Boryń, ŁM. & Arnold, C. D. STARR-seq — Principles and applications. Genomics 106, 145–150 (2015).
Vockley, C. M. et al. Massively parallel quantification of the regulatory effects of noncoding genetic variation in a human cohort. Genome Res. 25, 1206–1214 (2015).
Shen, S. Q. et al. Massively parallel cis-regulatory analysis in the mammalian central nervous system. Genome Res. 26, 238–255 (2016).
Long, E. et al. Massively parallel reporter assays and variant scoring identified functional variants and target genes for melanoma loci and highlighted cell-type specificity. Am. J. Hum. Genet. https://doi.org/10.1016/j.ajhg.2022.11.006 (2022).
Thakore, P. I., Black, J. B., Hilton, I. B. & Gersbach, C. A. Editing the epigenome: technologies for programmable transcription and epigenetic modulation. Nat. Methods 13, 127–137 (2016).
Klann, T. S. et al. CRISPR-Cas9 epigenome editing enables high-throughput screening for functional regulatory elements in the human genome. Nat. Biotechnol. 35, 561–568 (2017).
Canver, M. C., Bauer, D. E. & Orkin, S. H. Functional interrogation of non-coding DNA through CRISPR genome editing. Methods 121, 118–129 (2017).
Yu, J. et al. Baicalin inhibits recruitment of GATA1 to the HSD3B2 promoter and reverses hyperandrogenism of PCOS. J. Endocrinol. https://doi.org/10.1530/joe-18-0678 (2019).
Lin, J., Huang, H., Lin, L., Li, W. & Huang, J. MiR-23a induced the activation of CDC42/PAK1 pathway and cell cycle arrest in human cov434 cells by targeting FGD4. J. Ovarian Res. 13, 90 (2020).
Rainey, W. E., Bird, I. M. & Mason, J. I. The NCI-H295 cell line: a pluripotent model for human adrenocortical studies. Mol. Cell. Endocrinol. 100, 45–50 (1994).
Zhang, H. et al. Characterization of an immortalized human granulosa cell line (COV434). Mol. Hum. Reprod. 6, 146–153 (2000).
Karnezis, A. N. et al. Re-assigning the histologic identities of COV434 and TOV-112D ovarian cancer cell lines. Gynecol. Oncol. 160, 568–578 (2021).
Kim, Y.-S. et al. Correcting signal biases and detecting regulatory elements in STARR-seq data. Genome Res. 31, gr.269209.120 (2021).
Love, M. I., Huber, W. & Anders, S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 15, 550 (2014).
Tehranchi, A. et al. Fine-mapping cis-regulatory variants in diverse human populations. eLife 8, e39595 (2019).
Currin, K. W. et al. Genetic effects on liver chromatin accessibility identify disease regulatory variants. Am. J. Hum. Genet. 108, 1169–1189 (2021).
Dunham, I. et al. An integrated encyclopedia of DNA elements in the human genome. Nature 489, 57–74 (2012).
Luo, Y. et al. New developments on the Encyclopedia of DNA Elements (ENCODE) data portal. Nucleic Acids Res. 48, D882–D889 (2019).
Abascal, F. et al. Expanded encyclopaedias of DNA elements in the human and mouse genomes. Nature 583, 699–710 (2020).
Giambartolomei, C. et al. Bayesian test for colocalisation between pairs of genetic association studies using summary statistics. PLoS Genet. 10, e1004383 (2014).
GTEx Consortium. The GTEx Consortium atlas of genetic regulatory effects across human tissues. Science 369, 1318–1330 (2020).
Efimenko, E. et al. The transcription factor GATA4 is required for follicular development and normal ovarian function. Dev. Biol. 381, 144–158 (2013).
Yoshimura, S., Gerondopoulos, A., Linford, A., Rigden, D. J. & Barr, F. A. Family-wide characterization of the DENN domain Rab GDP-GTP exchange factors. J. Cell Biol. 191, 367–381 (2010).
Marat, A. L. & McPherson, P. S. The Connecdenn Family, Rab35 Guanine Nucleotide Exchange Factors Interfacing with the Clathrin Machinery*. J. Biol. Chem. 285, 10627–10637 (2010).
Berthelot, C., Villar, D., Horvath, J. E., Odom, D. T. & Flicek, P. Complexity and conservation of regulatory landscapes underlie evolutionary resilience of mammalian gene expression. Nat. Ecol. Evol. 2, 152–163 (2018).
Siepel, A. et al. Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes. Genome Res. 15, 1034–1050 (2005).
Bohaczuk, S. C., Thackray, V. G., Shen, J., Skowronska-Krawczyk, D. & Mellon, P. L. FSHB Transcription is Regulated by a Novel 5’ Distal Enhancer with a Fertility-Associated Single Nucleotide Polymorphism. Endocrinology https://doi.org/10.1210/endocr/bqaa181 (2020).
Kiiveri, S. et al. Transcription Factors GATA-4 and GATA-6 in Human Adrenocortical Tumors. Endocr. Res. 30, 919–923 (2004).
Flück, C. E. & Miller, W. L. GATA-4 and GATA-6 Modulate Tissue-Specific Transcription of the Human Gene for P450c17 by Direct Interaction with Sp1. Mol. Endocrinol. 18, 1144–1157 (2004).
Tian, Y. et al. PCOS-GWAS Susceptibility Variants in THADA, INSR, TOX3, and DENND1A Are Associated With Metabolic Syndrome or Insulin Resistance in Women With PCOS. Front. Endocrinol. 11, 274 (2020).
Crespo, R. P. et al. High-throughput Sequencing to Identify Monogenic Etiologies in a Preselected Polycystic Ovary Syndrome Cohort. J. Endocr. Soc. 6, bvac106 (2022).
Liu, Q. et al. A genome-wide cross-trait analysis identifies shared loci and causal relationships of type 2 diabetes and glycaemic traits with polycystic ovary syndrome. Diabetologia 65, 1483–1494 (2022).
Majoros, W. H. et al. Bayesian Estimation of Genetic Regulatory Effects in High-throughput Reporter Assays. Bioinformatics https://doi.org/10.1093/bioinformatics/btz545 (2019).
1000 Genomes Project Consortium et al. A global reference for human genetic variation. Nature 526, 68–74 (2015).
Chen, L.-F. et al. Enhancer Histone Acetylation Modulates Transcriptional Bursting Dynamics of Neuronal Activity-Inducible Genes. Cell Rep. 26, 1174–1188.e5 (2019).
Hilton, I. B. et al. Epigenome editing by a CRISPR-Cas9-based acetyltransferase activates genes from promoters and enhancers. Nat. Biotechnol. 33, 510–517 (2015).
Thakore, P. I. et al. Highly specific epigenome editing by CRISPR-Cas9 repressors for silencing of distal regulatory elements. Nat. Methods 12, 1143–1149 (2015).
Gilbert, L. A. et al. CRISPR-Mediated Modular RNA-Guided Regulation of Transcription in Eukaryotes. Cell 154, 442–451 (2013).
Gilbert, L. A. et al. Genome-Scale CRISPR-Mediated Control of Gene Repression and Activation. Cell 159, 647–661 (2014).
Scholl, U. I. et al. CLCN2 chloride channel mutations in familial hyperaldosteronism type II. Nat. Genet. 50, 349–354 (2018).
Malm, M. et al. Evolution from adherent to suspension: systems biology of HEK293 cell line development. Sci. Rep. 10, 18996 (2020).
Haggard, D. E. et al. High-Throughput H295R Steroidogenesis Assay: Utility as an Alternative and a Statistical Approach to Characterize Effects on Steroidogenesis. Toxicol. Sci. 162, 509–534 (2017).
Zhu, A., Ibrahim, J. G. & Love, M. I. Heavy-tailed prior distributions for sequence count data: removing the noise and preserving large differences. Bioinformatics 35, 2084–2092 (2018).
Musunuru, K. et al. From noncoding variant to phenotype via SORT1 at the 1p13 cholesterol locus. Nature 466, 714–719 (2010).
Dumesic, D. A., Abbott, D. H., Eisner, J. R. & Goy, R. W. Prenatal exposure of female rhesus monkeys to testosterone propionate increases serum luteinizing hormone levels in adulthood. Fertil. Steril. 67, 155–163 (1997).
McGee, W. K. et al. Elevated androgens during puberty in female rhesus monkeys lead to increased neuronal drive to the reproductive axis: a possible component of polycystic ovary syndrome. Hum. Reprod. 27, 531–540 (2012).
Pinilla, L. et al. Changes in pituitary secretion during the early postnatal period and anovulatory syndrome induced by neonatal oestrogen or androgen in rats. Reproduction 97, 13–20 (1993).
Lu, H. et al. Dissecting the Impact of Maternal Androgen Exposure on Developmental Programming through Targeting the Androgen Receptor. Adv. Sci. 11, 2309429 (2024).
Eisner, J. R., Barnett, M. A., Dumesic, D. A. & Abbott, D. H. Ovarian hyperandrogenism in adult female rhesus monkeys exposed to prenatal androgen excess. Fertil. Steril. 77, 167–172 (2002).
Homer, M. V., Rosencrantz, M. A., Shayya, R. F. & Chang, R. J. The effect of estradiol on granulosa cell responses to FSH in women with polycystic ovary syndrome. Reprod. Biol. Endocrinol. 15, 13 (2017).
Article PubMed PubMed Central Google Scholar
Marie, C. et al. Dysfunction of Human Estrogen Signaling as a Novel Molecular Signature of Polycystic Ovary Syndrome. Int. J. Mol. Sci. 24, 16689 (2023).
Article CAS PubMed PubMed Central Google Scholar
Corradin, O. et al. Combinatorial effects of multiple enhancer variants in linkage disequilibrium dictate levels of gene expression to confer susceptibility to common traits. Genome Res. 24, 1–13 (2014).
Article CAS PubMed PubMed Central Google Scholar
Guo, C. et al. Coordinated regulatory variation associated with gestational hyperglycaemia regulates expression of the novel hexokinase HKDC1. Nat. Commun. 6, 6069 (2015).
Article ADS CAS PubMed Google Scholar
Hormozdiari, F., Kostem, E., Kang, E. Y., Pasaniuc, B. & Eskin, E. Identifying causal variants at loci with multiple signals of association. Genetics 198, 497–508 (2014).
Article CAS PubMed PubMed Central Google Scholar
Maurano, M. T. et al. Systematic Localization of Common Disease-Associated Variation in Regulatory DNA. Science 337, 1190–1195 (2012).
Article ADS CAS PubMed PubMed Central Google Scholar
Ma, M. et al. Disease-associated variants in different categories of disease located in distinct regulatory elements. BMC Genom. 16, S3 (2015).
Google Scholar
Trynka, G. et al. Chromatin marks identify critical cell types for fine mapping complex trait variants. Nat. Genet. 45, 124–130 (2013).
Article CAS PubMed Google Scholar
Nicolae, D. L. et al. Trait-Associated SNPs Are More Likely to Be eQTLs: Annotation to Enhance Discovery from GWAS. PLoS Genet. 6, e1000888 (2010).
Article PubMed PubMed Central Google Scholar
Hormozdiari, F. et al. Colocalization of GWAS and eQTL Signals Detects Target Genes. Am. J. Hum. Genet. 99, 1245–1260 (2016).
Article CAS PubMed PubMed Central Google Scholar
Wallace, C. A more accurate method for colocalisation analysis allowing for multiple causal variants. PLoS Genet. 17, e1009440 (2021).
Article CAS PubMed PubMed Central Google Scholar
Kircher, M. et al. Saturation mutagenesis of twenty disease-associated regulatory elements at single base-pair resolution. Nat. Commun. 10, 3583 (2019).
Article ADS PubMed PubMed Central Google Scholar
Myint, L. et al. A screen of 1,049 schizophrenia and 30 Alzheimer’s-associated variants for regulatory potential. Am. J. Med. Genet. Part B: Neuropsychiatr. Genet. 183, 61–73 (2020).
Liu, S. et al. Systematic identification of regulatory variants associated with cancer risk. Genome Biol. 18, 194 (2017).
van Ouwerkerk, A. F. et al. Identification of Functional Variant Enhancers Associated With Atrial Fibrillation. Circ. Res. 127, 229–243 (2020).
Zhang, P. et al. High-throughput screening of prostate cancer risk loci by single nucleotide polymorphisms sequencing. Nat. Commun. 9, 2022 (2018).
Kneppers, J. et al. Extensive androgen receptor enhancer heterogeneity in primary prostate cancers underlies transcriptional diversity and metastatic potential. Nat. Commun. 13, 7367 (2022).
Bohaczuk, S. C., Cassin, J., Slaiwa, T. I., Thackray, V. G. & Mellon, P. L. Distal Enhancer Potentiates Activin- and GnRH-Induced Transcription of FSHB. Endocrinol. 162, bqab069 (2021).
Gorsic, L. K., Dapas, M., Legro, R. S., Hayes, M. & Urbanek, M. Functional Genetic Variation in the Anti-Müllerian Hormone Pathway in Women With Polycystic Ovary Syndrome. J. Clin. Endocrinol. Metab. 104, 2855–2874 (2019).
Censin, J. C., Bovijn, J., Holmes, M. V. & Lindgren, C. M. Colocalization analysis of polycystic ovary syndrome to identify potential disease-mediating genes and proteins. Eur. J. Hum. Genet. 1–9 https://doi.org/10.1038/s41431-021-00835-8 (2021).
Soldner, F. et al. Parkinson-associated risk variant in distal enhancer of α-synuclein modulates target gene expression. Nature 533, 95–99 (2016).
Abell, N. S. et al. Multiple causal variants underlie genetic associations in humans. Science 375, 1247–1254 (2022).
Brandt, M. M. et al. Chromatin Conformation Links Distal Target Genes to CKD Loci. J. Am. Soc. Nephrol. 29, 462–476 (2018).
Neumayr, C., Pagani, M., Stark, A. & Arnold, C. D. STARR-seq and UMI-STARR-seq: Assessing Enhancer Activities for Genome-Wide-, High-, and Low-Complexity Candidate Libraries. Curr. Protoc. Mol. Biol. 128, e105 (2019).
Broad Institute. Picard Toolkit. https://broadinstitute.github.io/picard/ (2019).
Ramírez, F. et al. deepTools2: a next generation web server for deep-sequencing data analysis. Nucleic Acids Res. 44, W160–W165 (2016).
Fuchsberger, C., Abecasis, G. R. & Hinds, D. A. minimac2: faster genotype imputation. Bioinformatics 31, 782–784 (2015).
Das, S. et al. Next-generation genotype imputation service and methods. Nat. Genet. 48, 1284–1287 (2016).
Loh, P.-R. et al. Reference-based phasing using the Haplotype Reference Consortium panel. Nat. Genet. 48, 1443–1448 (2016).
Taliun, D. et al. Sequencing of 53,831 diverse genomes from the NHLBI TOPMed Program. Nature 590, 290–299 (2021).
Das, S., Abecasis, G. R. & Browning, B. L. Genotype Imputation from Large Reference Panels. Annu. Rev. Genom. Hum. Genet. 19, 1–24 (2018).
Purcell, S. et al. PLINK: A Tool Set for Whole-Genome Association and Population-Based Linkage Analyses. Am. J. Hum. Genet. 81, 559–575 (2007).
Price, A. L. et al. Principal components analysis corrects for stratification in genome-wide association studies. Nat. Genet. 38, 904–909 (2006).
van de Geijn, B. et al. WASP: allele-specific software for robust molecular quantitative trait locus discovery. Nat. Methods 12, 1061–1063 (2015).
Langmead, B. & Salzberg, S. L. Fast gapped-read alignment with Bowtie 2. Nat. Methods 9, 357–359 (2012).
Danecek, P. et al. Twelve years of SAMtools and BCFtools. GigaScience 10, giab008 (2021).
Corces, M. R. et al. An improved ATAC-seq protocol reduces background and enables interrogation of frozen tissues. Nat. Methods 14, 959–962 (2017).
Quinlan, A. R. & Hall, I. M. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26, 841–842 (2010).
Ramírez, F., Dündar, F., Diehl, S., Grüning, B. A. & Manke, T. deepTools: a flexible platform for exploring deep-sequencing data. Nucleic Acids Res. 42, W187–W191 (2014).
Perez, A. R. et al. GuideScan software for improved single and paired CRISPR guide RNA design. Nat. Biotechnol. 35, nbt.3804 (2017).
Datlinger, P. et al. Pooled CRISPR screening with single-cell transcriptome readout. Nat. Methods 14, 297–301 (2017).
Gasperini, M. et al. A Genome-wide Framework for Mapping Gene Regulation via Cellular Genetic Screens. Cell 176, 1516 (2019).
Bolger, A. M., Lohse, M. & Usadel, B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 30, 2114–2120 (2014).
Dobin, A. et al. STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29, 15–21 (2012).
Li, B. & Dewey, C. N. RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome. BMC Bioinform. 12, 323 (2011).

Acknowledgements

This work was funded by grant R01HD100812 (A.D.). The Genotype-Tissue Expression (GTEx) Project was supported by the Office of the Director of the National Institutes of Health, and by NCI, NHGRI, NHLBI, NIDA, NIMH, and NINDS. The data used for the analyses described in this manuscript were obtained from the GTEx Portal on 7 June 2020. We received dCas9-KRAB as a generous gift from Dr. Charles Gersbach, Duke University. We thank the Duke University School of Medicine for the use of the Sequencing and Genomic Technologies Shared Resource, which provided sequencing services for some of the ATAC-seq and STARR-seq studies. Figures 1a, 4b and 5b and Supplementary Figs. 7 and 16 were created with BioRender.com.

Author information

Graham D. Johnson
Present address: Duke Human Vaccine Institute, Duke University, Durham, NC, USA
These authors jointly supervised this work: Andrea Dunaif, Timothy E. Reddy.

Authors and Affiliations

University Program in Genetics and Genomics, Duke University, Durham, NC, USA
Laavanya Sankaranarayanan & Timothy E. Reddy
Department of Biostatistics and Bioinformatics, Duke University Medical School, Durham, NC, USA
Laavanya Sankaranarayanan, Shauna Morrow, Graham D. Johnson, Alejandro Barrera, Revathy Venukuttan & Timothy E. Reddy
Center for Advanced Genomic Technologies, Duke University, Durham, NC, USA
Laavanya Sankaranarayanan, Shauna Morrow, Graham D. Johnson, Alejandro Barrera, Revathy Venukuttan & Timothy E. Reddy
Division of Endocrinology, Diabetes and Bone Disease, Icahn School of Medicine at Mount Sinai, New York, NY, USA
Kelly J. Brewer & Andrea Dunaif
Division of Endocrinology, Metabolism, and Molecular Medicine, Department of Medicine, Northwestern University Feinberg School of Medicine, Chicago, IL, USA
Ryan Sisk
Altius Institute for Biomedical Science, Seattle, WA, USA
Timothy E. Reddy

Contributions

L.S., T.E.R., and A.D. conceptualized the study. L.S., G.D.J., and T.E.R. developed the experimental methodology of this study. L.S. and S.M. performed the experiments. L.S., K.J.B., T.E.R., and A.D. designed the analyses. L.S., K.J.B., A.B., R.V., and R.S. carried out formal analysis. T.E.R. and A.D. supervised and funded the research. L.S., T.E.R., and A.D. wrote the original draft. L.S., K.J.B., T.E.R., and A.D. reviewed and edited the manuscript. All authors reviewed and provided feedback on this manuscript.

Corresponding authors

Correspondence to Andrea Dunaif or Timothy E. Reddy.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Communications thanks Jenny Fung, and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. A peer review file is available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Description of Additional Supplementary Files

Supplementary Data 1–16

Supplementary Data 17

Reporting Summary

Transparent Peer Review File

Source data

Source Data

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Sankaranarayanan, L., Brewer, K.J., Morrow, S. et al. Gene regulatory activity associated with polycystic ovary syndrome revealed DENND1A-dependent testosterone production. Nat Commun 16, 7697 (2025). https://doi.org/10.1038/s41467-025-62884-7

Received: 27 March 2024
Accepted: 01 August 2025
Published: 18 August 2025
Version of record: 18 August 2025
DOI: https://doi.org/10.1038/s41467-025-62884-7
