Introduction

Polycystic Ovary Syndrome (PCOS) is one of the most common disorders affecting people who menstruate, with prevalence rates of 5% to 15%1, depending on the diagnostic criteria applied2. It is the leading cause of anovulatory infertility. PCOS is commonly associated with insulin resistance and obesity, disorders that confer increased risk for type 2 diabetes as well as for other serious cardiometabolic morbidities across the lifespan3,4. However, the cause(s) of PCOS remain unknown, and the disorder is relatively understudied compared to other common medical conditions affecting women5.

Genetic factors are a major contributor to PCOS. Twin studies estimate that the narrow-sense heritability of PCOS is ~79%6. There are currently 30 genomic loci that are associated with altered PCOS risk. One locus, ZBTB16, is associated with decreased PCOS risk (odds ratio = 0.8), while the rest of the loci are associated with increased PCOS risk with odds ratios between 1.1 and 5.6 (GWAS catalog accessed 20 Oct 2023)7,8,9,10,11,12,13,14,15,16. The associated regions encompass genes involved in neuroendocrine, reproductive, and metabolic pathways. The functional consequences of noncoding genetic variants associated with complex traits such as PCOS have been exceptionally difficult to elucidate17,18. One challenge of fine-mapping GWAS signals is distinguishing causal genetic variant(s) from other genetic variants in regions of strong linkage disequilibrium (LD). In general, the lead GWAS SNPs are not the causal variants but instead tag regions of the genome containing non-coding pathogenic variants17,19 that contribute to common disease risk by altering regulatory element activity and downstream gene expression20,21. Nevertheless, GWAS have provided considerable insight into PCOS causal pathways. DENND1A was first identified as a PCOS candidate gene in GWAS14. DENND1A was subsequently shown to be an important regulator of theca cell androgen biosynthesis, where ectopic overexpression led to increased androgen production22,23,24,25,26,27,28. Collectively, rare variants in DENND1A were associated with PCOS quantitative traits in 50% of affected families24. Taken together with previous studies indicating that elevated testosterone levels were a consistent endophenotype in sisters of women with PCOS29, these genetic analyses implicate DENND1A as a core gene30 in PCOS pathogenesis. However, a mechanistic link between the noncoding genome, altered DENND1A expression, and testosterone production has yet to be demonstrated.

In this work, we evaluate the effects of non-coding genomic regions associated with PCOS risk on gene regulatory element activity. To measure regulatory element activity across PCOS-associated genomic loci, we use high-throughput reporter assays with bacterial artificial chromosomes and fosmids spanning PCOS GWAS risk loci. High-throughput reporter assays such as STARR-seq can quantify the regulatory activity of millions of genomic fragments at once31,32,33,34,35. This scale enables systematic studies of the effects of non-coding variants across megabases of the genome and in many different cell types31,36,37,38,39. To prioritize variants, we perform targeted genetic association analyses within the identified regulatory elements and modify STARR-seq to implement a locus-specific, allele-specific reporter assay. Together, these data highlight the contribution of DENND1A to PCOS risk. As proof of concept that regulatory elements in PCOS risk loci impact gene expression, we perturb PCOS-associated regulatory elements near DENND1A using CRISPR-based epigenome editing40,41,42. We find that epigenetic activation of those regulatory elements in an androgen-producing adrenocortical cell model increased both DENND1A expression and testosterone production. Together, these findings suggest an endogenous gene-regulatory mechanism contributing to PCOS and demonstrate an approach for identifying additional molecular mechanisms of PCOS.

Results

Measuring the regulatory activity of PCOS-associated regulatory elements

To identify gene regulatory elements in which genetic variation can contribute to PCOS risk, we analyzed 14 genetic associations identified in cohorts of European and Han Chinese ancestry at the time of this study9,10,11,13,14 (Table 1). Those 14 associations included several genes involved in hormone synthesis via the hypothalamic-pituitary-ovarian axis including FSHR, FSHB, LHCGR and DENND1A. We focused on two human cell models: a testosterone-producing adrenal cell line, H295R; and an estradiol-producing ovarian cell line, COV43426,43,44,45,46,47.

Table 1 List of PCOS GWAS loci selected for STARR-seq experiments

To measure regulatory activity in these two cell lines, we used a high-throughput reporter assay known as STARR-seq33,34 (Fig. 1a). STARR-seq can assay millions of DNA fragments for regulatory activity. STARR-seq assays work through two key libraries – an input library termed ‘assay library’ and a library of the regulatory effect readout termed ‘reporter library’ in this study. Briefly, the assay library consists of plasmid reporter assays containing diverse DNA fragments of interest. When transfected into cells, the DNA fragments regulate their own transcription into mRNA molecules. Thus, by sequencing the reporter library of the resulting mRNA fragments, one can estimate the regulatory activity of each DNA fragment in the assay library.
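
As a rough illustration of the readout described above, the regulatory activity of a fragment can be approximated as the log2 ratio of its depth-normalized abundance in the reporter (RNA) library to its abundance in the assay (plasmid) library. The Python sketch below makes that assumption explicit. The fragment names, read counts, and pseudocount are hypothetical, and the calculation is a simplification rather than the CRADLE and DESeq2 analysis used in this study.

```python
import math

def log2_activity(assay_counts, reporter_counts, pseudocount=1.0):
    """Per-fragment log2(reporter / assay) after scaling each library to counts per million.

    assay_counts and reporter_counts map fragment IDs to read counts.
    The pseudocount (an illustrative choice) avoids division by zero.
    """
    assay_total = sum(assay_counts.values())
    reporter_total = sum(reporter_counts.values())
    activity = {}
    for frag, a in assay_counts.items():
        r = reporter_counts.get(frag, 0)
        a_cpm = (a + pseudocount) / assay_total * 1e6
        r_cpm = (r + pseudocount) / reporter_total * 1e6
        activity[frag] = math.log2(r_cpm / a_cpm)
    return activity

# Hypothetical read counts for three fragments (not data from this study)
assay = {"frag_A": 100, "frag_B": 100, "frag_C": 100}
reporter = {"frag_A": 400, "frag_B": 100, "frag_C": 25}
activity = log2_activity(assay, reporter)
```

Under this simplified measure, fragments with positive values are over-represented among reporter transcripts (enhancer-like), and fragments with negative values are under-represented (repressor-like). Because both libraries are scaled to their totals, the values are relative to the other fragments in the library.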

Fig. 1: Measuring the regulatory activity in PCOS GWAS loci.

a Overview of targeted STARR-seq method: We selected bacterial artificial chromosomes (BACs) or fosmids spanning 14 PCOS GWAS loci and sheared them to ~400 bp. The sheared fragments were inserted into the digested STARR-seq backbone (Addgene#99296). The resulting plasmid library was sequenced to form the control assay library. For measuring regulatory activity, the plasmid pool was transfected into the respective cell lines (2 μg plasmid pool / 1 million cells). Six hours post transfection, RNA was isolated from the cells, and the STARR-seq transcripts were enriched and sequenced as the output reporter library. Candidate regulatory elements were called using CRADLE48 and effect sizes estimated with DESeq249. Created using BioRender. b STARR-seq effect size for H295R cells: The effect size is estimated as pseudo log2 (fold change) using DESeq2 on CRADLE-corrected STARR-seq peak calls. Significant peak calls (Benjamini-Hochberg adjusted p-values, FDR ≤ 0.5%) are highlighted in purple. c STARR-seq effect size for COV434 cells: The effect size is estimated as pseudo log2 (fold change) using DESeq2 on CRADLE-corrected STARR-seq peak calls. Significant peak calls (Benjamini-Hochberg adjusted p-values, FDR ≤ 0.5%) are highlighted in teal. d Comparing effect sizes of shared regulatory elements: About 93 of the regulatory elements were shared between the two cell lines. The adjusted correlation coefficient, r2 is 0.55.

We constructed a STARR-seq assay library that spanned 14 PCOS GWAS loci and encompassed 2.9 Mb of the human genome (Supplementary Data 1, 2). The assay library included 179 open chromatin regions identified in H295R and COV434 (Supplementary Fig. 1). The median fragment length was 320 bp in the assay library and 260 bp in the reporter library (Supplementary Fig. 2). The assay library covered the target regions at a median of >300x (Supplementary Fig. 3), and replicates were highly correlated (Pearson’s r > 0.95, Supplementary Fig. 4).

We called 956 regulatory elements in the 14 PCOS GWAS loci across the two cell models (Supplementary Data 3, 4) at a false discovery rate (FDR) ≤ 0.5%48. Between replicates in the same cell model, the estimated regulatory element activity was highly correlated (0.84 ≤ r ≤ 0.90, Supplementary Fig. 5). Much of the observed variation in effect sizes can be attributed to differences between assay and reporter libraries, and differences between cell lines (Supplementary Fig. 6). The strong correlation suggested that the targeted STARR-seq approach robustly estimated regulatory activity for the cell types within the PCOS GWAS loci.

We identified 464 and 585 regulatory elements in COV434 and H295R cells, respectively (Supplementary Fig. 7). In both cell models, about half of the identified regulatory elements had enhancer activity, and half had repressor activity49 (Fig. 1b, c). There were 93 regulatory elements identified in both cell lines, and their regulatory activity was highly concordant. The effect sizes in shared regulatory elements were substantially correlated (Pearson’s r = 0.81, p < 2 × 10−16), and the direction of effects was the same for 85% of shared elements (Fig. 1d). The concordance in the direction of effect increased to 93% when we required the regulatory element calls to overlap in the genome by at least 50% (Supplementary Fig. 8, Pearson’s r = 0.85, p < 2 × 10−16). To our knowledge, this data set is the largest reporter-assay screen for enhancers in adrenal and ovarian cell models.

Regulatory element activity in PCOS GWAS regions corresponds to regions of chromatin accessibility

Enrichment of genetic associations in tissue specific sites of increased chromatin accessibility can predict causal tissues of disease50,51. Of the PCOS associated SNPs in the GWAS catalog across the 14 loci we tested, seven of those variants overlapped DNaseI hypersensitive sites (DHS) from the ENCODE consortium including rs2479106, rs4385527, rs705702, rs2272046, rs6022786, rs11225154 and rs1316485652. To increase confidence that the regulatory elements identified by STARR-seq are active in H295R and COV434 cells, we evaluated whether STARR-seq regulatory elements correspond to chromatin accessibility in the same cell lines. We identified ~73,000 and ~66,000 open chromatin sites in H295R and COV434, respectively, using ATAC-seq and MACS2 peak calling with false discovery rate <0.1. Between 40 and 50% of the open chromatin sites identified in each cell line overlapped sites in the other cell line (Supplementary Figs. 9, 10, 11). Those results revealed that a substantial number of chromatin accessible sites were shared between cell lines.

We identified 116 chromatin accessible sites within the 16 PCOS risk genomic regions we assayed with STARR-seq across the COV434 and H295R cell lines. Of those sites, 39 (34%) and 37 (32%) had regulatory activity in H295R and COV434 cells, respectively, according to STARR-seq assays (Supplementary Data 5, 6). For H295R cells, the overlap between chromatin accessibility and STARR-seq activity was ~4-fold greater than would be expected if STARR-seq sites were randomly distributed across the genomic regions. For COV434, the overlap was ~6-fold greater than expected by chance (Fisher’s exact test, p < 2 × 10−4 for each cell line). There was also significantly greater regulatory activity in open chromatin regions in the same cell type or tissue than in regions with less chromatin accessibility (Fig. 2a) (Mann-Whitney U, p < 10−10 for H295R, p < 0.01 for COV434). Conversely, there was more chromatin accessibility in regions where we identified regulatory element activity (Fig. 2b).
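
To make the enrichment calculation concrete, the sketch below computes a fold enrichment and a one-sided Fisher's exact test from a 2 × 2 contingency table. It assumes the assayed regions are divided into equal-sized bins that are each classified as accessible or not and as STARR-seq active or not. That binning and the counts are hypothetical and are not taken from this study.

```python
from math import comb

def fold_enrichment(a, b, c, d):
    """Observed overlap divided by the overlap expected under independence.

    Table layout: a = accessible and active, b = accessible only,
    c = active only, d = neither.
    """
    n = a + b + c + d
    expected = (a + b) * (a + c) / n
    return a / expected

def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact test: P(overlap >= a) with fixed margins."""
    row1 = a + b          # accessible bins
    col1 = a + c          # active bins
    n = a + b + c + d
    upper = min(row1, col1)
    # Hypergeometric tail probability summed over overlaps at least as large as observed
    tail = sum(comb(row1, k) * comb(n - row1, col1 - k) for k in range(a, upper + 1))
    return tail / comb(n, col1)

# Hypothetical bin counts
a, b, c, d = 30, 70, 100, 800
fold = fold_enrichment(a, b, c, d)
p_value = fisher_exact_greater(a, b, c, d)
```

With these illustrative counts, 13 overlapping bins would be expected by chance, so 30 observed overlaps correspond to a roughly 2.3-fold enrichment.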

Fig. 2: Characterizing candidate regulatory elements.

a, c, e Candidate regulatory elements in H295R cells and COV434 cells with increasing activity correspond to regions with increased evidence of functionality. STARR-seq regulatory activity is measured across overlap with the respective cell line ATAC-seq (a, Mann-Whitney U, p < 10−10 for H295R, p < 0.01 for COV434), GTEx primary tissue ATAC-seq (c, Mann-Whitney U, p < 10−9 for H295R and COV434) and ENCODE candidate cis-regulatory elements (e, Mann-Whitney U, p < 10−9 for H295R and p < 10−6 for COV434). For (a, c, e); the box ranges from the 25th to the 75th percentile, and the median value (middle line). The whiskers extend from each quartile to the minimum and maximum values within 1.5 × interquartile range (IQR) of the box. Data points beyond this range are considered outliers (single points) and shown on these plots. Number of regions corresponding to each category, n, is specified on the plots. b Aggregate profile plots of chromatin accessibility based on ATAC-seq on the respective cell lines centred on the candidate regulatory elements (with increasing and decreasing effect sizes) across 400 bp windows for both cell lines (H295R in purple, COV434 in teal). The colour scale represents the aggregated signal (i.e., fragment coverage) within each defined STARR-seq element region, averaged as a single value. d Aggregate profile plots of chromatin accessibility based on ENCODE DNaseI Hypersensitive sites (DHS) centred on the active candidate regulatory elements across 400 bp windows for both cell lines (H295R in purple, COV434 in teal). Control regions (grey) are randomly generated genomic regions that are chromosome-, length- and GC-matched to the STARR-seq elements. The colour scale represents the aggregated signal (i.e., fragment coverage) within each defined DHS region, averaged as a single value.

We also investigated similarities and differences in regulatory activity between H295R and COV434 cells. We first identified regions with both STARR-seq activity and increased chromatin accessibility in H295R and COV434. There were 69 genomic regions that had significant regulatory activity and significant chromatin accessibility in either cell model. Of these, seven had regulatory activity in both cell models. The small overlap was due to differences in statistical power. Specifically, regulatory activity was similar across both cell types (ρ = 0.65, Supplementary Fig. 12). There was also no strong evidence of elements with opposing regulatory activity between cell types. Taken together, the high concordance of regulatory effect size across STARR-seq in H295R and COV434 suggested that regulatory activity was largely similar between the two steroidogenic cell lines across the PCOS GWAS loci tested.

To relate cell line observations to the corresponding primary tissues, we evaluated if STARR-seq regulatory activity was enriched in chromatin accessible sites in adrenal and ovarian tissues52,53. Approximately 18% of the identified H295R regulatory elements overlapped with open chromatin from primary adrenal tissue, and 24% of the identified COV434 regulatory elements overlapped with open chromatin from primary ovarian tissue. The overlap was a 2.8 and 3.1-fold enrichment in H295R and COV434, respectively, over what would be expected if regulatory elements were randomly distributed across the assayed regions (Fisher’s exact test, p-value < 10−7 for each cell line). As with our observations in H295R and COV434 cells, regulatory activity was greater in regions of accessible chromatin in primary tissue compared to those without accessible chromatin (Fig. 2c, Mann-Whitney U, p < 10−9 for each cell line). This result indicated that regulatory activity measurements in H295R and COV434 cells corresponded to activity in primary adrenal and ovarian cells, respectively.

Regulatory activity in H295R and COV434 cells also corresponded to chromatin accessibility in other tissues. About 50% of the regulatory elements we identified via STARR-seq (n = 296 for H295R, n = 304 for COV434) overlapped chromatin accessible sites identified in diverse tissues as part of the ENCODE project52,53. The overlap was 1.7- and 2.7-fold enriched over what would be expected if regulatory activity was randomly distributed across the assayed regions in H295R and COV434, respectively (Supplementary Fig. 13, Mann-Whitney U, corrected p-value < 10−4). ENCODE DNase hypersensitive sites also had increased activity in STARR-seq regulatory elements (Fisher’s exact test p < 10−12, Fig. 2d, Supplementary Fig. 14). We observed similar results when focusing on enhancer-like regions defined across diverse cells and tissues by the ENCODE project54. Specifically, ~30% of the regulatory elements we identified overlapped proximal or distal enhancers defined by ENCODE (n = 158 for H295R; n = 207 for COV434), and quantitative estimates of regulatory activity were greater in regions identified as enhancer-like sequences (Fig. 2e).

PCOS-associated genetic variants fine-mapped to within regulatory elements

To discover genetic variants that may alter regulatory activity and gene expression, we completed genetic association analyses focused on the regulatory elements we identified (Fig. 3a). To identify additional risk variants within these functional regulatory elements, we tested for genetic associations between PCOS and single nucleotide polymorphisms (SNPs) with minor allele frequency (MAF) > 1% located within the regulatory elements. Across a cohort of 983 PCOS cases and 2951 controls9, we tested 759 SNPs in H295R regulatory elements (433 in elements showing increasing activity and 326 in elements showing decreasing activity) and 486 SNPs in COV434 regulatory elements (290 in elements showing increasing activity and 196 in elements showing decreasing activity) (Supplementary Data 7). Overall, we identified 19 variants with significant PCOS association. In H295R cells, 7 variants in enhancers (Bonferroni adjusted P < 1.15 × 10−4) and 11 variants in repressive regulatory elements (Bonferroni adjusted P < 1.53 × 10−4) were significantly associated with PCOS. In COV434 cells, 1 variant in an enhancer was significantly associated with PCOS (Bonferroni adjusted P < 1.72 × 10−4), and none were in repressive regulatory elements. Of the 19 associated variants, four were in the follicle stimulating hormone subunit beta (FSHB) locus, six were in the neighboring ARL14EP-DT locus and two were in the GATA4/NEIL2 locus (Fig. 3b, c, Table 2, Supplementary workbook). There were four previously identified PCOS-associated risk variants in the regulatory elements we assayed: rs6022786 is an intergenic variant near SUMO1P1; rs2268361 is a variant in an intron of FSHR; rs11225154 is a variant in an intron of YAP1; and rs10835638 is a variant in an intron of ARL14EP-DT8,11,13. Of those, only rs6022786 was tested in this analysis, and there was not a significant association with PCOS in our cohort.
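
The Bonferroni thresholds quoted above are consistent with dividing a family-wise significance level of 0.05 by the number of SNPs tested in each category. The short Python sketch below reproduces that arithmetic. The value α = 0.05 is inferred from the reported thresholds rather than stated in the text, and the category labels are ours.

```python
ALPHA = 0.05  # assumed family-wise error rate, inferred from the reported thresholds

# Number of SNPs tested per cell line and element class, as given in the text
snps_tested = {
    "H295R, increasing activity": 433,
    "H295R, decreasing activity": 326,
    "COV434, increasing activity": 290,
    "COV434, decreasing activity": 196,
}

# Bonferroni threshold: alpha divided by the number of tests in the category
thresholds = {label: ALPHA / n for label, n in snps_tested.items()}

for label, threshold in thresholds.items():
    print(f"{label}: P < {threshold:.2e}")
# H295R, increasing activity: P < 1.15e-04
# H295R, decreasing activity: P < 1.53e-04
# COV434, increasing activity: P < 1.72e-04
# COV434, decreasing activity: P < 2.55e-04
```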

Fig. 3: Prioritizing PCOS-associated variants within functional regulatory elements.

a Association analysis to identify PCOS-associated variants in regulatory elements. We use the candidate regulatory elements from STARR-seq experiments to define the genomic regions of interest. We then performed an association analysis to identify variants associated with PCOS using a cohort of 983 PCOS cases and 2951 controls (results in Table 2) adjusted using Bonferroni correction. We then colocalized the association analysis results with GTEx eQTL SNPs to identify SNPs and genes as those likely involved in PCOS pathogenesis (results in Table 3). b, c Regional locuszoom plots for single nucleotide polymorphisms (SNPs) in the FSHB/ARL14EP (b) and GATA4/NEIL2 locus (c), showing P-values obtained by logistic regression within candidate regulatory elements and PCOS case-control samples (fixed-effects). SNPs are colored by r2 linkage disequilibrium (LD) and lead SNP is colored purple.

Table 2 Top variants associated with PCOS within STARR-seq regulatory elements

To relate PCOS-associated non-coding variants to gene expression, we tested for colocalization55 between the PCOS-associated genetic variation in STARR-seq regulatory elements and expression quantitative trait loci (eQTLs) from GTEx56. Using significant single-tissue eQTL associations, we identified seven variants across seven loci where PCOS and gene expression associations colocalized (posterior probability >0.6; Table 3, Supplementary Data 8). We also performed colocalization restricted to adrenal and ovarian tissues, given their relevance to our cell models. This analysis recovered four of the same colocalized variants. However, interpretation of these results is limited by the small sample size of these tissues in GTEx and their substantial cell-type heterogeneity in donor tissues. Therefore, we prioritized the colocalization results from the full GTEx dataset for downstream analysis (Table 3 - PP.AllTissue; Supplementary Fig. 16).

Table 3 Colocalization of PCOS-associated variants with eQTL data from GTEx

The candidate genes identified have plausible biological roles in PCOS pathogenesis. GATA4 encodes a transcription factor essential for embryonic development and ovarian function; its deletion impairs granulosa and theca cell proliferation57. FSHB encodes the beta subunit of follicle-stimulating hormone, a key regulator of follicular development. DENND1A, involved in clathrin-mediated endocytosis58,59, has been implicated in androgen biosynthesis22,23,24, including through studies in H295R cells27. Together, these analyses fine-map regulatory variants that influence expression of genes with established or plausible roles in PCOS.

Active STARR-seq regions have increased conservation score

Evolutionary conservation is another indicator of biological function that is complementary to chromatin accessibility and STARR-seq analyses. We anticipated that genes affecting fertility would have strong evolutionary consequences. In support of this notion, previous studies have reported that conservation of regulatory elements corresponds to a greater functional role in the organism60. Therefore, we investigated patterns of conservation across the regulatory elements we identified. We compared conservation scores of regulatory elements that we identified by STARR-seq across 20 vertebrate species61. The STARR-seq regulatory elements with enhancer activity had increased conservation scores when compared to GC- and length-matched regions on the same chromosome (Supplementary Fig. 14b, Mann-Whitney U, p < 0.001). We also observed that the accessible chromatin regions identified by ATAC-seq within COV434 and H295R cells had higher conservation scores (Supplementary Fig. 15) when compared to similarly matched genomic regions from the same chromosome. These results further corroborated the functional importance of the regulatory elements we identified.

Allele-specific regulatory variants identified in DENND1A locus

As proof of concept that the regulatory elements we identified were relevant to PCOS pathogenesis, we focused on mechanisms contributing to altered gene expression by identifying allele-specific regulatory variants and by experimental perturbation. For the focused experiments, we chose the DENND1A locus. The top three genes nominated by STARR-seq, colocalization analysis, and genetic association studies were GATA4, FSHB, and DENND1A (Supplementary Fig. 16, Supplementary Data 9). We focused on DENND1A for mechanistic studies because it is known to contribute to testosterone production, which is straightforward to measure, and because human steroidogenic adrenal cells that express DENND1A are readily available. In contrast, FSHB expression is limited to pituitary cells, which remain challenging to culture62, and GATA4 is a developmental transcription factor that controls the expression of many genes, so further studies are needed to establish which of those genes and associated phenotypes are most relevant to PCOS63,64. DENND1A is a guanine nucleotide exchange factor involved in clathrin-mediated endocytosis58,59. DENND1A expression has been implicated in androgen biosynthesis22,23,24 in ovarian theca cells and in the H295R cell model27. Therefore, we focused on understanding the role of regulatory elements and genetic variants in controlling DENND1A expression and activity in H295R cells.

The DENND1A locus has been reproducibly associated with PCOS in Han Chinese and European cohorts9,10,24,65,66,67. However, the functional variants within DENND1A have not been identified. We mapped 38 candidate regulatory elements between the second and sixth introns of DENND1A, spanning ~180 kb of the genome. Several of these regulatory elements overlapped regions called as candidate cis-regulatory elements (cCREs) by ENCODE, or were in regions with increased chromatin accessibility in H295R and COV434 (Fig. 4a). The lead GWAS risk variants did not overlap the regulatory elements we identified in this study. However, for most of these candidate regulatory elements, there were common variants in linkage disequilibrium (Supplementary Fig. 17) with the lead GWAS SNPs. Taken together, these results suggested that regulatory variants within candidate regulatory elements could contribute to PCOS pathogenesis by affecting expression of the target gene of that regulatory element.

Fig. 4: Fine-mapping variants identified four regulatory variants that are also eQTLs for DENND1A.

a Candidate regulatory elements in the DENND1A locus identified in H295R (purple track) and COV434 (teal track) cell lines. Each STARR-seq track is reported as the assay (input)-subtracted reporter (output) signal. Zoomed-in genomic regions show the candidate regulatory elements, ATAC-seq data, conservation score and ENCODE cCREs of that region in detail. b Overview of the enriched DENND1A STARR-seq method. Genomes from five individuals from the 1000 Genomes Project were sheared to 200 bp. We enriched the target DENND1A locus using RNA probes (Agilent SureSelect system). The custom probes were designed to span the DENND1A locus at 2x tiling density. These enriched fragments were then subjected to the STARR-seq protocol described in Fig. 1a. Allele-specific regulatory effects were estimated using the BIRD model. Created using BioRender. c Distribution of allele-specific effect sizes as estimated by BIRD against their minor allele frequencies. The SNPs with significant allele-specific regulatory activity (posterior probability, P > 0.9) are in blue. Individual probabilities from the regulatory variant assessment are detailed in Supplementary Data 11. d Four variants identified to have allele-specific regulatory activity in H295R. SNPs in red are examples where the alternate allele has increased regulatory activity, while SNPs in blue are examples where the reference allele has increased regulatory activity.

To measure the effects of genetic variation across DENND1A on gene expression, we captured genomic DNA spanning the entire DENND1A gene region from three individuals of European ancestry and two of Han Chinese ancestry. We then measured allele-specific regulatory activity in H295R cells using STARR-seq37 (Fig. 4b, Supplementary Data 10). In total, we assayed ~700,000 unique ~160 bp DNA fragments (Supplementary Figs. 18, 19). The assay library covered the DENND1A gene locus at a median coverage of 140x. The assay libraries were highly concordant, while the measure of log fold change between the reporter and assay libraries was moderately concordant among replicates (Pearson’s r > 0.72, Supplementary Fig. 20).

To estimate the allele specific regulatory effects, we used a Bayesian approach, BIRD, that identifies differences in the relative abundance of alleles in the assay library and in the expressed reporter library68. Of the 623 variants we assayed in the targeted locus, 62 had allele specific regulatory activity with a posterior probability, Preg > 0.90 (Supplementary Data 11). On average, the identified variants altered regulatory activity by 40% (Supplementary Fig. 21), and the minor alleles more often had less regulatory activity (chi-squared = 6.9, p-value = 0.009). We observed a modest correlation between the absolute effect size and the minor allele-frequency of the selected variants as determined by the 1000 Genomes project (ρ = −0.36, p = 0.005, Fig. 4c)69.
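
The quantity underlying allele-specific reporter analysis is the shift in allele ratio between the assay (DNA) library and the reporter (RNA) library. The Python sketch below computes a simple point estimate of that shift. It is not the BIRD model, which is Bayesian and reports posterior probabilities. The function name, read counts, and pseudocount are hypothetical.

```python
import math

def allelic_log2_ratio(ref_dna, alt_dna, ref_rna, alt_rna, pseudocount=0.5):
    """log2 of the alt/ref ratio in the reporter library relative to the assay library.

    Positive values mean the alternate allele is over-represented among reporter
    transcripts compared with its share of the input plasmid pool.
    """
    dna_ratio = (alt_dna + pseudocount) / (ref_dna + pseudocount)
    rna_ratio = (alt_rna + pseudocount) / (ref_rna + pseudocount)
    return math.log2(rna_ratio / dna_ratio)

# Hypothetical read counts per allele
balanced_input = allelic_log2_ratio(ref_dna=500, alt_dna=500, ref_rna=300, alt_rna=600)
skewed_input = allelic_log2_ratio(ref_dna=200, alt_dna=800, ref_rna=100, alt_rna=400)
```

The second example shows why the assay library is needed as a reference: the alternate allele dominates the reporter reads four to one, but it was equally over-represented in the input, so the estimated allelic effect is close to zero.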

Of the 62 regulatory variants we identified, 24 were eQTLs for DENND1A (n = 11) or flanking genes CRB2, RABGAP1 or STRBP (n = 14)56. Of those variants, 12 also overlapped open chromatin sites or candidate cis-regulatory elements identified by ENCODE (Fig. 4d, Supplementary Fig. 22, Table 4, Supplementary Data 12). Furthermore, the lead variant from colocalization analyses, rs10117940 (Table 3), was also identified in the allele-specific analysis with an effect size of 1.299 (p = 0.731). The variant, rs10117940, was in LD with two STARR-seq regulatory variants (rs28441318 and rs73665345) and a PCOS-associated rare variant24 (rs78012023) (0.32 < r2 < 0.65; 0.5 D’ > 0.9). These findings suggested that several loci within the DENND1A gene contributed to PCOS phenotypes by altering DENND1A gene expression.
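
For readers less familiar with the two LD measures quoted above, the following Python sketch computes r2 and D’ for a pair of biallelic variants from the standard definitions. The haplotype and allele frequencies are hypothetical and are not those of the variants discussed here.

```python
def ld_stats(p_ab, p_a, p_b):
    """Return (r2, D') given the frequency of the A-B haplotype and of alleles A and B."""
    d = p_ab - p_a * p_b  # coefficient of linkage disequilibrium
    r2 = d ** 2 / (p_a * (1 - p_a) * p_b * (1 - p_b))
    # D' rescales D by its maximum possible magnitude given the allele frequencies
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    return r2, d_prime

# Hypothetical frequencies: haplotype A-B at 0.4, allele A at 0.5, allele B at 0.6
r2, d_prime = ld_stats(p_ab=0.4, p_a=0.5, p_b=0.6)
```

In this example r2 is about 0.17 while D’ is 0.5, which illustrates how the two measures can differ: D’ reflects historical recombination between the sites, whereas r2 also depends on how closely the allele frequencies match.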

Table 4 Candidate allele-specific regulatory variants identified

Endogenous perturbation of PCOS-associated regulatory elements impacted DENND1A expression

Estimating the effect of regulatory elements on gene expression can provide insight into the underlying mechanisms that contribute to the development of PCOS. While reporter assays like STARR-seq can functionally test for allele-specific regulatory activity, the approach does not identify the target genes of those regulatory elements because the plasmids are not integrated in the genome. One approach to identify target genes of a candidate regulatory element is epigenomic perturbation of that element. Specifically, a fusion of catalytically inactive Cas9 (dCas9) and the histone acetyltransferase domain of P300 is targeted to candidate regulatory elements to measure the effects on the expression of nearby genes39. Several studies have demonstrated that dCas9-P300 can act over tens of kilobases, thus allowing the identification of distal gene regulatory elements41,70,71. In addition to epigenetic activation using P300, we used dCas9 fused to the Krüppel-associated box (KRAB). The dCas9-KRAB fusion can act as a repressor to silence target gene expression when targeting gene promoters or gene regulatory elements72,73,74.

To identify target genes of PCOS-associated gene regulatory regions, we created both dCas9-P300-expressing H295R cells and dCas9-KRAB-expressing H295R cells. We targeted the dCas9 fusion proteins to four candidate regulatory elements within the introns of the DENND1A gene, labeled “elements 1-4”, and to the DENND1A promoter (Fig. 5a, Supplementary Fig. 24). To prioritize regions for perturbation studies, we first selected candidate regulatory elements based on positive effect size as measured by our CRADLE-DESeq2 analysis (n = 12). From those 12 regions, we selected the four candidate regulatory elements for perturbation based on the following two factors: i) overlap with the chromatin accessibility track and ii) the ability to design > 5 targeting guides using guideScan2 given the genomic sequence, PAM restrictions, and specificity score > 0.2 (Fig. 5a).

Fig. 5: Perturbation of regulatory elements in DENND1A impacts testosterone levels.

a Perturbation loci: We targeted four candidate regulatory elements (Elements 1-4, also Supplementary Fig. 23) and the promoter region of DENND1A. We designed between 5 and 7 guides per region. The STARR-seq activity (purple) and chromatin accessibility (grey) for these regions are shown. b H295R cells that stably express dCas9-p300 or dCas9-KRAB were transduced with lentiviral pools of guide RNAs for each regulatory element. The cell media supernatant was collected 2 and 4 days post transduction for measuring the testosterone concentration produced by the cells. RNA was harvested from the cells 4 days post transduction to measure changes in gene expression. Created using BioRender. c, d Log-fold change of DENND1A expression (GAPDH as control) for H295R-dCas9-p300 (c) and H295R-dCas9-KRAB (d) cells with 10 μM forskolin or DMSO control. A set of 5 non-targeting control guides, designed to not target any part of the human genome, was used for the control cell population. Each individual data point is a biological replicate (n = 4) and represents data from 3 technical replicates. e, f Testosterone concentration (ng/ml) measured in the cell media 4 days post transduction in H295R-dCas9-p300 (e) and H295R-dCas9-KRAB (f) cells targeted with the specific guide RNA. Cells were cultured in the presence or absence of 10 μM forskolin. For each sample, the respective drug-treated scrambled gRNA sample was used as the control. Each individual data point is a biological replicate (n = 4) and represents data from 2 technical replicates. For (c–f), the box ranges from the 25th to the 75th percentile, with the median value shown as the middle line. The whiskers extend from each quartile to the minimum and maximum values within 1.5 × interquartile range (IQR) of the box. The individual data points are also plotted. Significance of perturbations was assessed using a two-sided t-test comparing each perturbation to the negative control for the respective DMSO/forskolin condition. If the p-value is less than 0.05, it is specified on the plot. Source data are provided as a Source Data file.

We designed 5–7 guide RNAs (gRNAs) for each of the four regulatory elements and the promoter region (Supplementary Data 13, 14). As a negative control, we also designed five guide RNAs that do not target any location in the human genome. We made lentiviral pools for each of the four targeted regions and for the negative controls. We then transduced each lentiviral pool into three cell lines: HEK293T cells modified to express dCas9-P300, and H295R cells that expressed dCas9-P300 or dCas9-KRAB. DENND1A was already expressed in both cell lines (average TPM: 20.4 for HEK293T and 15.4 for H295R)75,76, indicating that the gene was not in heterochromatin and thus could be targeted by dCas9-P300 or dCas9-KRAB effectively. Finally, we measured the effects on DENND1A expression via qPCR, and levels of testosterone at two time points (Fig. 5b).

In the established HEK293T-dCas9-P300 cells41, targeting dCas9-P300 to the DENND1A promoter increased DENND1A expression by 3.2-fold. Targeting dCas9-P300 to the intronic regulatory elements increased DENND1A expression between 2.1-fold and 2.6-fold. The increase in DENND1A expression was statistically significant compared to the effect of the non-targeting gRNAs for the promoter and all four of the regulatory elements after Bonferroni correction for multiple hypothesis testing (Supplementary Fig. 24, α < 0.05, t-test).

To test for the effect of DENND1A perturbation in H295R cells, we used both an epigenetic activator (P300) and an epigenetic inhibitor (KRAB). Specifically, we measured changes in gene expression via RT-qPCR for cells treated with forskolin or with DMSO as the control. Forskolin is an adenylate cyclase activator and has been shown to impact testosterone production77.

In the activation experiments using H295R-dCas9-P300 cells, we observed a trend of increased DENND1A gene expression for on-target CRISPR perturbation compared to the effect of the non-targeting gRNAs. We observed a 2.8-fold increase in DENND1A expression when targeting dCas9-P300 to the DENND1A promoter under the DMSO treatment. Targeting dCas9-P300 to the intronic regulatory elements increased DENND1A expression between 1.2-fold and 2-fold under DMSO treatment. In the forskolin-treated cells, we observed a 4-fold increase in DENND1A expression when targeting dCas9-P300 to the DENND1A promoter, and a 1.8-fold to 2-fold increase in DENND1A expression when targeting regulatory elements 2, 3 and 4. The increase in DENND1A expression was statistically significant compared to the effect of the non-targeting gRNAs under the respective drug treatment for the promoter of regulatory elements after Bonferroni correction (Fig. 5c, α < 0.05, t-test). However, there was no significant change in DENND1A expression levels as a result of forskolin treatment.

In the repression experiments using H295R-dCas9-KRAB cells, we observed decreased DENND1A gene expression for on-target CRISPR perturbation compared to the effect of the non-targeting gRNAs. We observed a 1.4-fold decrease in DENND1A expression when targeting dCas9-KRAB to the DENND1A promoter under the DMSO treatment. Targeting dCas9-KRAB to the intronic regulatory elements decreased DENND1A expression between 1.1-fold and 1.3-fold under DMSO treatment. In the forskolin-treated cells, we observed a 1.5-fold decrease in DENND1A expression when targeting dCas9-KRAB to the DENND1A promoter, and a 1.07- to 1.2-fold decrease in DENND1A expression when targeting regulatory elements 2, 3 and 4. The decrease in DENND1A expression was statistically significant compared to the effect of the non-targeting gRNAs for the promoter after Bonferroni correction (Fig. 5d, α < 0.05, t-test). However, there was no significant change in DENND1A expression levels as a result of forskolin treatment. In summary, we found several regulatory elements that impact DENND1A expression both with and without forskolin treatment.
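The fold changes reported above come from RT-qPCR with GAPDH as the reference gene. The text does not state the exact quantification formula, so the following is only a minimal sketch assuming the common 2^(-ΔΔCt) approach with 100% amplification efficiency. The function names and the Ct values are hypothetical and are not taken from the study.

```python
def relative_expression(ct_target_test, ct_ref_test, ct_target_ctrl, ct_ref_ctrl):
    """Return the 2^(-ddCt) expression ratio of a test sample vs. a control sample.

    Assumes 100% amplification efficiency (an assumption, not stated in the text).
    """
    d_ct_test = ct_target_test - ct_ref_test  # normalize target to reference gene
    d_ct_ctrl = ct_target_ctrl - ct_ref_ctrl
    dd_ct = d_ct_test - d_ct_ctrl             # test relative to control
    return 2.0 ** (-dd_ct)


def as_fold_decrease(ratio):
    """Express a ratio below 1 as an 'x-fold decrease' (for example 0.5 -> 2.0)."""
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    return 1.0 / ratio


# Hypothetical Ct values, for illustration only.
ratio = relative_expression(22.0, 18.0, 24.0, 18.0)
print(ratio)  # 4.0, i.e. a 4-fold increase over the control
```

Under this convention, a ratio of about 0.71 relative to the non-targeting control corresponds to the kind of 1.4-fold decrease reported for promoter repression.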

We also tested the effects of the epigenetic perturbations of DENND1A on CYP17A1 gene expression via RT-qPCR. CYP17A1 is involved in the steroid synthesis pathways and is upstream of the production of steroid hormones, such as estrogen and testosterone. Forskolin treatment yielded a 4- to 4.5-fold increase in CYP17A1 gene expression levels compared to the DMSO treatment (p = 1.6 × 10⁻⁷, α < 0.05, t-test) across all conditions. Specifically, we observed a statistically significant increase in CYP17A1 expression following epigenetic activation of the DENND1A promoter under DMSO treatment (p = 0.03, t-test), but not when targeting the regulatory elements. Across all perturbations (activation and repression) under DMSO conditions, DENND1A and CYP17A1 expression levels were moderately correlated (Pearson’s r = 0.462; p = 0.00095). No correlation was observed under forskolin treatment, consistent with the dominant transcriptional effects of cAMP pathway activation. Additionally, in experiments involving DENND1A activation using dCas9-p300, we observed a moderate positive correlation across both DMSO- and forskolin-treated conditions (Pearson’s r = 0.462; p = 5.76 × 10⁻⁷). These findings support a context-dependent relationship in which increases in DENND1A expression are associated with increased CYP17A1 expression. The effects of forskolin on increasing CYP17A1 expression are expected to be substantially greater than any impact resulting from the moderate decrease in DENND1A expression induced by dCas9-KRAB.
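For readers who want to reproduce this kind of correlation from the source data, Pearson's r can be computed as in the minimal sketch below. The function name and input values are hypothetical, and the sketch does not compute the associated p-value.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical paired expression values (arbitrary units), for illustration only.
gene_a = [1.0, 2.0, 3.0, 4.0]
gene_b = [2.0, 4.0, 6.0, 8.0]
print(pearson_r(gene_a, gene_b))
```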

As an alternative approach, we also used RNA-seq to assess changes in DENND1A and CYP17A1 after forskolin treatment. We identified CYP17A1 as one of the top 10 differentially expressed genes due to forskolin treatment in H295R cells (Supplementary Data 15) using DESeq2 for estimating the fold change of differentially expressed genes49,78. Those results closely mirrored our qPCR results: no significant change for DENND1A (Supplementary Data 15, p.adj = 0.6, Benjamini-Hochberg correction, Supplementary Fig. 26c) and a significant increase in CYP17A1 expression with forskolin treatment (Supplementary Data 15, p = 1.67 × 10⁻⁷, Benjamini-Hochberg correction, Supplementary Fig. 26d).

To test for off-target effects for genes in the DENND1A locus, we also measured gene expression changes for DENND1A flanking genes LHX2, CRB2, and STRBP. In all three cell lines, we found CRB2 was not expressed and that expression of LHX2 and STRBP was not affected (Supplementary Figs. 27, 28). We, therefore, inferred that the effects we observed were specific to DENND1A, and that genetic variation in the region likely contributes to PCOS via effects on DENND1A expression.

Endogenous perturbation of PCOS-associated regulatory elements impacted hormone production in steroidogenic adrenal cells

Changes in gene expression levels might alter physiologically relevant phenotypes that could contribute to disease pathogenesis79. Average circulating testosterone levels in PCOS patients are 2- to 3-fold higher than in the control population when measured in the early follicular phase, and may play a causal role in PCOS pathogenesis29. Furthermore, androgen exposure in animals results in PCOS phenocopies80,81,82,83,84. To test if endogenous overexpression of DENND1A could alter testosterone production in H295R cells, we overexpressed DENND1A by targeting dCas9-P300 to the DENND1A promoter or distal regulatory elements. We then measured the concentration of testosterone in the cell culture media four days later. Increasing DENND1A expression via activating the promoter caused a 3.2-fold increase in testosterone concentration, while activating three of the four distal regulatory elements individually increased testosterone concentration by between 1.7-fold and 2.2-fold (Fig. 5e). The increases in testosterone concentration were statistically significant (α < 0.05, t-test). This increase in testosterone concentration by activating DENND1A is comparable to the clinically measured increase in testosterone levels among people with PCOS compared to the control group29. Complementary to the overexpression of DENND1A, repression of DENND1A in the H295R-dCas9-KRAB cells also impacted testosterone levels. We measured the concentration of testosterone in the cell culture media four days after transduction with the guide RNAs. Decreasing DENND1A expression by targeting the DENND1A promoter yielded a 2-fold decrease in testosterone production, while targeting the three regulatory elements decreased testosterone concentration between 1.5- and 1.8-fold (Fig. 5f, α < 0.05, t-test). In both the activation and repression experiments, the addition of forskolin significantly increased levels of testosterone (Fig. 5e, f, p = 0.0006). Overall, we were able to alter the testosterone levels produced by the cells by altering endogenous DENND1A gene expression levels, and we showed that DENND1A activation increases testosterone production independent of the forskolin-induced increase in testosterone production.

As a complementary analysis, we measured the rate of increase in testosterone concentration over the four days post transduction (Supplementary Fig. 29). Overall, the rate of change of testosterone concentration mirrored the levels measured after four days. Specifically, cells with increased DENND1A expression had substantially increased testosterone production between 2- and 4-days post-transduction compared to control-treated samples (Supplementary Fig. 30). These results indicated that altered expression of endogenous DENND1A was sufficient to increase androgen biosynthesis in steroidogenic cells.

Some studies have indicated a role of altered estradiol in PCOS85,86, and estradiol is produced from testosterone and other androgens. Thus, we also measured changes in estradiol production. Increasing DENND1A expression via targeting the DENND1A promoter in the H295R-dCas9-p300 cells increased levels of estradiol produced by the cells compared to the non-targeting control under DMSO (p = 0.032, t-test, 1.4-fold, Supplementary Fig. 31). Across perturbation conditions, forskolin treatment also significantly increased estradiol produced by the cells (p < 10⁻¹², t-test, 1.5–2.3-fold). DENND1A activation further increased estradiol production in H295R-dCas9-p300 cells. We did not observe significant changes in estradiol levels when repressing DENND1A in H295R-dCas9-KRAB (Supplementary Fig. 32). These results indicate that increased DENND1A expression leads to increased estradiol levels, likely downstream of increased testosterone production.

Discussion

One of the central challenges of complex trait genetics is identifying the causal variants within GWAS susceptibility loci and determining their functional consequences. Here, we have fine mapped PCOS genetic associations to specific gene regulatory elements using a combination of high-throughput reporter assays and genetic association analyses. Specifically, we have mapped candidate regulatory elements by testing for the regulatory activity of millions of DNA fragments across 14 PCOS GWAS loci comprising about 3 Mb of the human genome. We further demonstrated a scalable approach to fine map genetic variants within candidate regulatory elements. We identified PCOS-associated genetic variants by performing genetic association tests across genomic regions that we identified as candidate regulatory elements. Together, we demonstrated a generalizable strategy for identifying genetic variants within experimentally identified functional regulatory elements to fine map genetic association loci for complex genetic traits. As proof-of-concept of the strengths of this approach, we focused on DENND1A, a PCOS GWAS candidate gene reported to regulate androgen biosynthesis22. We showed that manipulating the epigenome of DENND1A-proximal regulatory elements caused increased DENND1A expression and, subsequently, increased androgen production in human adrenal cells. These results extend previous studies identifying a role for DENND1A in testosterone production in theca cells, while also demonstrating specific gene regulatory elements wherein genetic variation can alter DENND1A expression. Our results demonstrate the advantage of combining high-throughput reporter assays, fine mapped genetic analyses, and targeted epigenome editing to discover novel gene regulatory mechanisms contributing to common human diseases.

The experimental approaches we used have several advantages and limitations. Because the targeted STARR-seq approach assayed fewer fragments, it was more amenable to use in cell models that cannot be grown at large scales. The targeted approach also allowed us to test for regulatory activity outside the context of genetic linkage37. Furthermore, the ability to capture natural genetic variation present in a pool of genomes allowed us to test for allele-specific regulatory activity across one locus in depth. It is understood that weak effects of non-coding variants contribute to a phenotype through coordinated regulation across several regulatory elements87. Thus, this approach allowed us to identify regulatory elements that contribute to an organismal phenotype through gene expression patterns.

A limitation of the STARR-seq assay is that DNA fragments are tested outside their native chromatin environment. Although the two cell lines used in this study differ in their steroidogenic capacity, we observed broadly similar regulatory activity across fragments. This is likely because STARR-seq measures the intrinsic regulatory potential of DNA sequences, independent of chromatin structure or epigenetic state. Additionally, reporter assays do not provide information about the endogenous target genes regulated by each element. To address these limitations, we incorporated complementary CRISPR-based epigenetic perturbation approaches to assess the functional consequences of activating or repressing candidate regulatory elements in their native genomic context.

However, these perturbation studies presented their own technical challenges. Specifically, we were unable to obtain homogeneous populations of modified H295R cells expressing dCas9-KRAB or dCas9-p300 and the gRNAs, as the cells were highly sensitive to both antibiotic selection and flow cytometric sorting, leading to significant cell death. As a result, we relied on targeted qPCR to measure gene expression changes, which allowed us to evaluate regulatory effects in viable cultures while avoiding confounding stress responses. We acknowledge the limitation of not performing transcriptome-wide analysis under these conditions. Future studies that leverage improved cell engineering techniques or single-cell approaches will be essential for enabling broader transcriptional profiling following regulatory element perturbation, and will build upon the findings reported here to advance our understanding of gene networks contributing to PCOS pathogenesis.

Identifying the underlying mechanisms by which GWAS loci contribute to disease pathogenesis will be essential for translating these findings to benefit human health. The effect of regulatory elements and non-coding variants has been elucidated for several disease phenotypes. For example, one study identified a SNP that regulates SORT1 in a liver-specific manner within a GWAS risk locus for low-density lipoprotein cholesterol and myocardial infarction (MI)79. Another study focused on maternal hyperglycemia identified variants spanning multiple enhancers that have a coordinated effect on HKDC1 expression88. Other studies focused on post-GWAS functional analyses have used different methods, including statistical55,89,90,91,92,93,94,95 and experimental96,97,98,99,100,101 approaches to fine map GWAS signals and identify functional variants. Nevertheless, detailed cellular or molecular studies are often needed to connect the identified gene regulatory effects to a disease relevant phenotype79,99.

A challenge is the molecular follow-up on putative causal genes, which depends on the cell type, the function of the genes, and the assays available to measure gene function with respect to the disease phenotype. PCOS, however, is particularly amenable to experimental perturbation since hormone responses are easy to model in cell systems and offer a potential for testing one of the main clinical phenotypes of PCOS. Our results extend the knowledge of non-coding genetic mechanisms of PCOS pathogenesis. Previous experimental studies characterized a highly conserved enhancer regulating FSHB expression in mouse pituitary cells62,102; and non-coding variants intronic to AMHR2, a receptor for anti-Müllerian hormone103. Previous statistical approaches have also nominated common and rare genetic variants altering the expression of DENND1A24, FSHB, ZFP36L2, ERBB3, RPS26, RAD50104 as potentially contributing to PCOS. Here, we add a specific gene regulatory mechanism controlling DENND1A expression to that body of knowledge, while also demonstrating a general strategy for identifying analogous mechanisms for other PCOS genes.

The candidate regulatory elements that we identified can serve as a framework to identify functional non-coding regions that might contribute to PCOS risk by harboring causal variants. Our findings add to growing empirical evidence of regulatory regions contributing to complex traits101,105,106,107. We expect that future evaluation of the regulatory elements from this study will provide new insights into the mechanisms leading to PCOS phenotypes. Broadly, our results demonstrate a scalable approach to study disease-associated regulatory regions implicated not only in PCOS, but also in the pathogenesis of common, complex disorders in general.

Methods

STARR Seq assay library construction

Selection of GWAS regions for targeted STARR-seq assays

We compiled a list of PCOS genome-wide association studies9,10,11,13,14 published at the time of designing the STARR-seq experiments in 2019. At that time there were 27 genetic risk loci. We further prioritized risk loci that had a p-value < 5 × 10⁻⁸. That yielded 19 risk loci (Supplementary Data 1). To then select PCOS-associated genomic regions for STARR-seq assays, we identified all genetic variants in linkage disequilibrium (LD, r² > 0.8) with the 19 genetic variants that were most strongly associated with PCOS or its clinical phenotypes9,11,13,14. We then selected bacterial artificial chromosomes (BACs) and fosmids that encompassed all the identified genetic variants. We removed the genomic regions at ZBTB16, MAPRE and ERBB3 at this step due to recombination or availability of BACs. We obtained 18 BACs and 2 fosmids spanning the 16 remaining genetic risk loci. All BAC and fosmid clones were sourced from BACPAC Genomics, Inc.; the original source of these clones is Children’s Hospital & Research Center at Oakland (CHRCO). The BACs and fosmids are detailed in Supplementary Data 2.

All BACs and fosmids were obtained as clones in E. coli. We propagated each bacterial clone in selective conditions. We isolated the BAC DNA using NucleoBond Xtra BAC (Macherey-Nagel), and we isolated fosmid DNA using FosmidMAX (Lucigen), following the manufacturers’ protocols. To validate that the BACs and fosmids were intact and covered the target region, we created Illumina high-throughput sequencing libraries from the isolated DNA using NEBNext Ultra II FS DNA Library Prep. We barcoded the sequencing library for each BAC or fosmid independently, and pooled the resulting libraries for sequencing. We sequenced the pooled libraries on an Illumina MiSeq instrument, and aligned the reads to the human genome. For two of the 16 target regions, the BACs either recombined or the sequencing reads from the BAC aligned to a different genomic region, suggesting contamination with another BAC. We removed those two regions from subsequent analysis. The BACs and fosmids for the remaining 14 target regions span ~3 Mb of the human genome (Supplementary Data 2).

STARR-seq reporter plasmid construction

To create STARR-seq assay libraries from the BACs and fosmids, we cloned sheared DNA from each BAC or fosmid into the STARR-seq plasmid. We sheared each BAC or fosmid to ~400 bp DNA fragments using a Covaris S220 sonication instrument. We then ligated custom universal adapters to the resulting DNA fragments using the NEBNext DNA Library Prep protocol (#E6040L) (Supplementary Data 16 - SS_Adaptor_1 & SS_Adaptor_2). We amplified the adapted DNA fragments and added sequences for Gibson assembly into the STARR-seq plasmid using PCR. For the PCR, we used KAPA HiFi HotStart kit (Roche) and the primers TS2SS-F and TS2SS-R (Supplementary Data 16). The PCR cycling conditions were: 98 °C for 30 s, followed by 10 cycles of 98 °C for 15 s, 64 °C for 30 s, 72 °C for 30 s, with a final extension at 72 °C for 5 min.

We cloned the fragment libraries into the STARR-seq ORI vector (Addgene#99296). To do so, we first linearized the plasmid using AgeI and SalI (NEB R3552L and NEB R3138L). We analyzed the digested plasmid on a 1% agarose gel, confirmed that the linear plasmid was the expected ~3600 bp size, and isolated the linearized plasmid using either the QIAquick Gel Extraction Kit (#28704) or GeneJET Gel Extraction Kit (#K0691). We cloned the adapted and amplified DNA fragments from the BACs and fosmids into the linearized STARR-seq ORI vector using the NEBuilder HiFi DNA Assembly (#E2621) kit. We ethanol precipitated the products. To do so, we added 0.1X volume 3 M NaOAc and 2.5X volume cold 100% ethanol and stored the mixture at –20 °C overnight. We then pelleted the DNA via centrifugation at 16,000 RCF for 30 min at 4 °C. We washed the pellets with 5 ml cold 70% ethanol, and resuspended them in water. To amplify the resulting plasmid libraries, we electroporated them into E. cloni 10G SUPREME Electrocompetent Cells following the manufacturer’s protocol for optimal settings in a 1.0 mm cuvette (10 μF, 600 Ohms, 1800 Volts). We grew the plasmids in individual 1 L volumes of LB with carbenicillin for antibiotic selection at 37 °C overnight. We isolated the resulting PCOS GWAS STARR-seq assay plasmids using NucleoBond PC 10000 EF (Macherey-Nagel).
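The ethanol precipitation step scales reagent volumes to the sample volume. As a worked example, the sketch below computes the volumes to add. It assumes both ratios are taken relative to the starting sample volume, which the text does not specify; the function name and the 100 μL example are hypothetical.

```python
def precipitation_volumes(sample_ul, salt_ratio=0.1, ethanol_ratio=2.5):
    """Volumes (in microliters) of 3 M NaOAc and cold 100% ethanol to add.

    Assumption: both ratios are relative to the starting sample volume.
    """
    if sample_ul <= 0:
        raise ValueError("sample volume must be positive")
    return {
        "naoac_ul": sample_ul * salt_ratio,
        "ethanol_ul": sample_ul * ethanol_ratio,
    }


# Hypothetical 100 uL assembly reaction, for illustration only.
volumes = precipitation_volumes(100.0)
print(volumes)  # about 10 uL NaOAc and 250 uL ethanol
```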

To make the final PCOS GWAS STARR-seq assay library, we pooled the individual BAC and fosmid STARR-seq plasmids in equimolar concentration. We validated the size of the plasmid library using the Agilent TapeStation, and quantified the resulting pool using Qubit (Invitrogen).

PCOS GWAS STARR-seq assay library sequencing

To estimate the abundance of reads mapping to the regions selected in the assay library, we used Illumina high-throughput sequencing on a NextSeq 2000 with a 50 bp paired-end sequencing protocol. To prepare the sequencing libraries, we first amplified the STARR-seq regions from 20 ng of the pooled plasmid library using KAPA HiFi HotStart kit (Roche) and 208-F Index7 primers (Supplementary Data 16). The PCR cycling conditions were: 98 °C for 30 s, followed by 15 cycles of 98 °C for 15 s, 64 °C for 20 s, 72 °C for 30 s, with a final extension at 72 °C for 5 min. To isolate the final library, we used Axygen SPRI beads (AxyPrep™ Mag PCR Clean-Up Kit) at appropriate concentrations based on the manufacturer’s manual for an insert size of 400 bp.

Cell culture protocol

We obtained NCI-H295R cells from ATCC. The cells were cultured in DMEM/F-12 medium (Gibco #21041025) supplemented with 2.5% Nu-Serum (Corning #355100) and 1% ITS+Premix (Corning #354352) and grown as a monolayer at 37 °C, 5% CO2. For drug treatments, forskolin (Stemcell Technologies, #72112) was dissolved in DMSO. Forskolin was added to the cell culture media at 10 µM, and an equivalent volume of DMSO was used as the control. We validated testosterone produced by the cells stimulated with 10 µM forskolin using ELISA following manufacturer’s protocol (Cayman Chemicals #582701).

We obtained COV434 cells from ECACC (Sigma–Aldrich #07071909). The cells were cultured in DMEM (Gibco #11965092) supplemented with 2 mM Glutamine and 10% Foetal Bovine Serum (FBS) and grown as a monolayer at 37 °C, 5% CO2. We validated estradiol produced by these cells treated with 100 ng/mL follicle stimulating hormone (FSH) and 2.9 μg/mL androstenedione (A4) using ELISA (Cayman Chemicals #501890).

All experiments were performed between passages 5 and 12.

Nucleofection optimization

To transiently introduce the PCOS GWAS STARR-seq assay library into the cell lines, we used electroporation via the Lonza 4D-Nucleofector System. To optimize the electroporation settings for H295R and COV434, we used the Cell Line Optimization 4D-Nucleofector™ X Kit (Lonza #V4XC-9064) following manufacturer’s protocol. Based on this optimization, we chose SF-CM-138 for COV434 cells and SF-DN-100 for the H295R cells, with 2 μg of plasmid to every 1 million cells transfected.

Transfection of cells

To test the regulatory potential of PCOS GWAS targeted regions, we first transfected the PCOS GWAS STARR-seq library into both H295R and COV434 cells, and isolated and sequenced the resulting RNA. We isolated the RNA from the cells 6 h post transfection. We transfected the PCOS GWAS STARR-seq plasmid library into H295R and COV434 based on the nucleofection optimization settings we described using SF Cell Line 4D-Nucleofector® LV Kit L (Lonza #V4LC-2002) following manufacturer’s protocol. All the experiments were performed in triplicate for each cell line. For each replicate for each of the cell lines, we used 50 million cells transfected with 100 μg of the PCOS GWAS STARR-seq plasmid library.

PCOS GWAS STARR-seq reporter library construction

To isolate the PCOS GWAS reporter RNA, we first isolated total RNA six hours post transfection followed by enriching for cDNA produced from the PCOS GWAS STARR-seq library plasmid pool108.

Six hours after transfecting the cells with the STARR-seq library, we rinsed the cells with PBS and dissociated the cells using Trypsin-EDTA 0.25% (Life Technologies). We lysed the cell pellets using RLT buffer (Qiagen) with 2-mercaptoethanol (Sigma). We passed the lysates through an 18-gauge needle ten times and stored them at −80 °C before RNA extraction.

RNA extraction

We isolated total RNA using the Qiagen RNeasy Midi kit including the on-column DNaseI digestion step. We treated the isolated total RNA with 1 μL RNase Block (Agilent). We then isolated poly-A RNA using Dynabead Oligo-dT25 beads (Life Technologies) according to the manufacturer’s recommended protocol. We treated the poly-A RNA with DNase (TURBO DNase, Invitrogen) and 1 μL RNase Block at 37 °C for 30 min before halting the reaction with the DNase inactivation reagent. We then synthesized PCOS GWAS reporter cDNA by reverse transcription using Superscript III (800 U, Life Technologies) following manufacturer’s protocol and a STARR-seq specific primer (SSRT-UMI, Supplementary Data 16).

PCOS GWAS STARR-seq reporter construction

Following synthesis, we treated the cDNA with RNaseA (Sigma) at 37 °C for 1 hour. We purified the PCOS GWAS reporter cDNA with SPRI beads (1.5X) and amplified it using index-PCR primer and indexed PostSS-Index-5 primers (Supplementary Data 16) to allow barcoding for sample multiplexing under the following conditions: 98 °C for 30 s, followed by 10–12 cycles of 98 °C for 10 s, 64 °C for 30 s, 72 °C for 30 s, with a final extension at 72 °C for 5 min. We split each sample into 7 individual PCR amplification reactions in this step. We determined the total number of cycles for amplification by running a small portion of each sample in a qPCR protocol and taking the cycle number at which the signal reached one quarter of the maximum plateau observed in the qPCR. We cleaned the amplified PCR products using SPRI beads (1.0X) and then validated the length distribution of the PCOS GWAS reporter library on an Agilent TapeStation.
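The cycle-number rule described above can be sketched as follows. This is a minimal illustration that assumes one fluorescence reading per cycle, no baseline subtraction, and that the last cycles have reached the plateau. The function name and the example curve are hypothetical.

```python
def cycles_for_quarter_plateau(fluorescence):
    """Return the first cycle (1-indexed) whose signal reaches 1/4 of the plateau.

    `fluorescence` holds one reading per qPCR cycle; the plateau is taken as
    the maximum observed reading (an assumption of this sketch).
    """
    if not fluorescence:
        raise ValueError("need at least one fluorescence reading")
    threshold = max(fluorescence) / 4.0
    for cycle, signal in enumerate(fluorescence, start=1):
        if signal >= threshold:
            return cycle
    raise ValueError("threshold never reached")  # unreachable for non-empty input


# Hypothetical amplification curve (arbitrary fluorescence units).
curve = [1, 2, 4, 8, 16, 32, 64, 100, 100]
print(cycles_for_quarter_plateau(curve))  # 6
```

Stopping amplification in the exponential phase in this way is generally intended to limit PCR duplicates and over-amplification artifacts.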

PCOS GWAS STARR-seq reporter library sequencing

Final PCOS GWAS reporter libraries from each replicate experiment were pooled at equimolar 2 nM concentrations. We sequenced the PCOS GWAS reporter libraries on Illumina NextSeq 2000 using 50 bp PE sequencing.
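Pooling at an equimolar 2 nM concentration requires converting each library's mass concentration to molarity. The sketch below uses the common approximation of 660 g/mol per base pair of double-stranded DNA. The function names and example values are hypothetical, and the text does not state how the authors performed this conversion.

```python
AVG_BP_MASS = 660.0  # approximate g/mol per base pair of double-stranded DNA


def library_nanomolar(conc_ng_per_ul, mean_fragment_bp):
    """Convert a dsDNA library concentration from ng/uL to nM."""
    if conc_ng_per_ul < 0 or mean_fragment_bp <= 0:
        raise ValueError("invalid concentration or fragment length")
    return conc_ng_per_ul / (AVG_BP_MASS * mean_fragment_bp) * 1e6


def stock_volume_ul(stock_nm, target_nm, final_volume_ul):
    """Volume of stock to dilute to `final_volume_ul` at `target_nm`."""
    if stock_nm < target_nm:
        raise ValueError("stock is more dilute than the target")
    return final_volume_ul * target_nm / stock_nm


# Hypothetical library: 1.32 ng/uL with a 500 bp mean fragment length.
stock = library_nanomolar(1.32, 500)
print(stock)                              # about 4 nM
print(stock_volume_ul(stock, 2.0, 20.0))  # about 10 uL of stock in 20 uL total
```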

Alignments and STARR-seq analysis

To estimate regulatory activity in the targeted PCOS GWAS regions, we used the abundance of the fragments expressed as RNA in the reporter library relative to their abundance in the assay library, as outlined in Figure S7. First, we aligned the PCOS GWAS assay library and the PCOS GWAS reporter library individually to the human genome (hg38) using bowtie2. We retained reads with a quality score of Q ≥ 30 that fell outside the centromeres and blacklisted regions. These reads were used for the downstream analysis. We used picardtools109 to mark and call duplicates. RPKM-normalized STARR-seq read density was computed at single base pair resolution using the deepTools110 utility bamCoverage. We used the CRADLE48 package to correct biases and call peaks with the following options. We then estimated differential STARR-seq activity across the regions as fold change using DESeq49. For visualizing the sequencing data, we have included the raw and RPKM-normalized bigwigs on the UCSC genome browser link in the data availability statement (Figure S7). To represent the CRADLE normalizations, we included CRADLE-corrected bigwigs for input and output STARR-seq libraries separately as well as for input library subtracted from the output library (Figure S7). For each region identified to have STARR-seq regulatory activity measured by DESeq, we compared PCOS STARR-seq results from both COV434 and H295R cell lines to ATAC-seq datasets generated for these cell lines. We also compared the STARR-seq results to the regulatory regions across ENCODE (V4) for both cell lines and primary tissues.
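The idea of comparing reporter (output) abundance to assay (input) abundance can be illustrated with a simplified calculation. The sketch below computes RPKM for one region and a log2 output-to-input ratio with a pseudocount. It is not the authors' pipeline, which used deepTools, CRADLE and DESeq; the function names, the pseudocount and the example counts are assumptions made for illustration.

```python
import math


def rpkm(read_count, region_length_bp, total_mapped_reads):
    """Reads per kilobase of region per million mapped reads."""
    if region_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("length and library size must be positive")
    return read_count * 1e9 / (region_length_bp * total_mapped_reads)


def log2_activity(output_rpkm, input_rpkm, pseudocount=1.0):
    """log2 ratio of reporter (RNA) to assay (plasmid) signal.

    The pseudocount avoids division by zero and is an assumption of this sketch.
    """
    return math.log2((output_rpkm + pseudocount) / (input_rpkm + pseudocount))


# Hypothetical counts for one 1 kb region, for illustration only.
out_signal = rpkm(read_count=1500, region_length_bp=1000, total_mapped_reads=1_000_000)
in_signal = rpkm(read_count=500, region_length_bp=1000, total_mapped_reads=1_000_000)
print(out_signal, in_signal)
print(log2_activity(out_signal, in_signal))  # positive values suggest activation
```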

PCOS case-control variant association testing within candidate regulatory regions

To identify any association between genetic variants within functional STARR-seq regulatory elements and PCOS, we performed genetic association analyses. The selection criteria, clinical features and genotyping in 983 PCOS cases and 2951 controls from our previous stage 1 GWAS discovery cohort have been reported (Hayes, 2015). In brief, genotyping was performed using the Illumina OmniExpress (HumanOmniExpress-12v1_C) array9. Genotype imputation was performed using minimac4111 on the Michigan Imputation Server112, with phasing via Eagle113, using the TOPMED freeze 8 reference panel114,115. Variants were filtered to remove any SNPs with imputation quality (R²) less than 0.8 and restricted to STARR-seq regions of regulatory activity. The adjusted p-value threshold was calculated based on the regulatory elements identified in each cell line as having increasing or decreasing regulatory activity (Supplementary Data 7). PCOS association was tested within candidate regulatory regions. Single variant associations were carried out using PLINK116 on common variants (minor allele frequency [MAF] > 1%) using logistic regression with PCOS as the outcome variable and age, BMI, and five principal components (PCs)117 as covariates. To control for false-positive discoveries, results were evaluated against Bonferroni-corrected thresholds. The number of PCOS cases was 983 rather than the 984 reported in Hayes et al. 2015, because we updated the IBD exclusion criterion from 3rd degree relatives to 4th degree relatives, resulting in the exclusion of 1 PCOS case.
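The variant filters and the multiple-testing threshold described above can be sketched as follows. The `Variant` record, the region tuples and the number of tests are hypothetical; in the study the filtering and association testing were done with imputation-server output and PLINK, not with this code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    chrom: str
    pos: int
    imputation_r2: float
    maf: float


def in_any_region(variant, regions):
    """True if the variant lies in any (chrom, start, end) region, inclusive."""
    return any(
        chrom == variant.chrom and start <= variant.pos <= end
        for chrom, start, end in regions
    )


def filter_variants(variants, regions, min_r2=0.8, min_maf=0.01):
    """Keep well-imputed common variants that fall inside regulatory regions."""
    return [
        v for v in variants
        if v.imputation_r2 >= min_r2 and v.maf > min_maf and in_any_region(v, regions)
    ]


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold for `n_tests` comparisons."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


# Hypothetical variants and one hypothetical regulatory region.
regions = [("chr9", 1000, 2000)]
variants = [
    Variant("chr9", 1500, 0.95, 0.20),   # kept
    Variant("chr9", 1500, 0.50, 0.20),   # dropped: poor imputation quality
    Variant("chr9", 1500, 0.95, 0.005),  # dropped: rare variant
    Variant("chr9", 5000, 0.95, 0.20),   # dropped: outside the region
]
kept = filter_variants(variants, regions)
print(len(kept), bonferroni_threshold(0.05, len(kept)))
```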

Colocalization testing

To test whether two traits share a likely causal SNP, we used a Bayesian colocalization method55. For the PCOS-associated variants, we used the list of variants and their associated statistics from the above result for all variants with P < 0.3 (Table 3). We used the standard options for the colocalization testing. For the eQTL dataset, we used publicly available expression quantitative trait locus data from the GTEx consortium GTEx Analysis V8 (dbGaP Accession phs000424.v8.p2, accessed on June 7, 2020). The GTEx dataset contains cis-eQTL data from ~900 American donors of mostly European ancestry (~85%) across 49 tissues and of varying ages. We applied coloc, a Bayesian test for colocalization, to identify the probability of a shared causal signal between the PCOS-regulatory element-associated variant and eQTL variants. We used the coloc.abf() function in the coloc R package with the default assignment of prior probabilities for a SNP being associated with each trait. All analyses with a colocalization posterior probability (PP.4) > 0.3 using eQTL data from all tissues, adrenal tissue and ovarian tissue were reported in Table 4.
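To make the PP.4 quantity more concrete, the sketch below shows how per-SNP log approximate Bayes factors for two traits can be combined into posterior probabilities for the five colocalization hypotheses, following the published description of the method. It is an illustrative Python re-implementation and not the coloc R package itself. It assumes the per-SNP log Bayes factors are already computed, and the prior values shown are the ones we understand to be the package defaults. The function names and example inputs are hypothetical.

```python
import math


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def log_diff_exp(a, b):
    """log(exp(a) - exp(b)); requires a > b."""
    return a + math.log1p(-math.exp(b - a))


def coloc_posteriors(labf1, labf2, p1=1e-4, p2=1e-4, p12=1e-5):
    """Posterior probabilities of hypotheses H0-H4 from per-SNP log ABFs.

    labf1/labf2: log approximate Bayes factors for trait 1 and trait 2,
    one entry per SNP, in the same SNP order. Needs at least two SNPs.
    """
    if len(labf1) != len(labf2) or len(labf1) < 2:
        raise ValueError("need matching lists with at least two SNPs")
    both = [a + b for a, b in zip(labf1, labf2)]
    ls1, ls2, ls12 = log_sum_exp(labf1), log_sum_exp(labf2), log_sum_exp(both)
    log_h = {
        "H0": 0.0,                                  # no association
        "H1": math.log(p1) + ls1,                   # trait 1 only
        "H2": math.log(p2) + ls2,                   # trait 2 only
        # H3: both traits associated, different causal SNPs
        "H3": math.log(p1) + math.log(p2) + log_diff_exp(ls1 + ls2, ls12),
        "H4": math.log(p12) + ls12,                 # shared causal SNP
    }
    norm = log_sum_exp(list(log_h.values()))
    return {h: math.exp(v - norm) for h, v in log_h.items()}


# Hypothetical log ABFs: both traits peak at the first of three SNPs.
shared = coloc_posteriors([10.0, 0.0, 0.0], [10.0, 0.0, 0.0])
# Hypothetical log ABFs: the traits peak at different SNPs.
distinct = coloc_posteriors([10.0, 0.0, 0.0], [0.0, 10.0, 0.0])
print(shared["H4"], distinct["H4"])
```

In the first example the posterior for H4 dominates, whereas in the second most of the posterior mass shifts to H3, which is the distinction the PP.4 threshold is meant to capture.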

DENND1A-enriched STARR seq assay library construction

To test for the allele-specific regulatory activity of common variants, we modified the targeted STARR-seq assay to use a pool of 5 different human genomes instead of BACs and fosmids as described above. We selected genomes from the 1000 Genomes Project69 from individuals identified as female and healthy. We pooled genomes from three individuals of European ancestry and two individuals of Han Chinese ancestry for the targeted STARR-seq experiment to identify allele-specific regulatory effects. Four candidate regulatory elements were identified from the targeted STARR-seq results with coordinates listed in Supplementary Data 13.

Enrichment for the targeted DENND1A region

We focused on variants present in the DENND1A locus, a region that spans the entire DENND1A gene and 100 kb upstream and downstream of the gene. For target enrichment of the DENND1A locus, we used targeting oligonucleotide probes. We first sheared each genome separately to ~200 bp using a Covaris S220. We then used Agilent SureSelect Custom DNA Target Enrichment Probes to enrich the region around DENND1A (hg38: chr9:123279654-124030107). We followed the Agilent SelectXT2 custom (Cat# 5190-4846) protocol to enrich the target regions in each genome; however, we modified the protocol at the adaptor ligation steps. We used a custom adaptor (SS_Adaptor) and amplified the resulting oligo fragments using TS2SS-F and TS2SS-R primers (Supplementary Data 16).

DENND1A locus STARR-seq reporter plasmid construction

To create the DENND1A STARR-seq assay libraries from the five genomes, we cloned the sheared and DENND1A-locus enriched DNA fragments. We cloned the amplified and enriched fragments into the linearised STARR-seq vector using NEBuilder HiFi DNA Assembly (#E2621). We ethanol precipitated the products. To do so, we added 0.1X volume 3 M NaOAc and 2.5X volume cold 100% ethanol, and stored the mixture at –20 °C overnight. We then pelleted the DNA via centrifugation at 16,000 RCF for 30 min at 4 °C. We washed the pellets with 5 ml cold 70% ethanol, and resuspended in water.

We then pooled the plasmids from each genome in equimolar concentrations. We amplified the pooled plasmids by electroporating them into E. cloni 10G Electrocompetent Cells following the manufacturer's protocol for optimal settings (1.0 mm cuvette, 10 μF, 600 Ohms, 1800 Volts). We subsequently isolated the plasmids using the Qiagen Plasmid Kit, GigaPrep (Qiagen #12191), quantified them using Qubit, and validated the length of the pooled library on a 1% agarose gel. This purified, pooled plasmid preparation is the DENND1A-locus STARR-seq library that was used for DENND1A-locus STARR-seq experiments.

DENND1A locus STARR-seq assay library sequencing

To estimate the abundance of reads mapping to the variant loci selected in each assay library, we used Illumina NextSeq 2000 high-throughput sequencing with a 50 bp paired-end protocol. We amplified the STARR-seq assay fragments from the pooled library using 208-F Index7 primers (Supplementary Data 16) and sequenced 3 replicates of the DENND1A-locus STARR-seq assay library.

DENND1A locus enriched STARR-seq assay

To test for effects of variants in the targeted DENND1A locus, we transfected the DENND1A locus STARR-seq assay library into H295R cells, and isolated and sequenced the resulting RNA similar to the methods described previously. All experiments were performed in triplicate for each cell line. For each replicate for each of the cell lines, we used 70 million H295R cells transfected with 140 μg DENND1A locus STARR-seq assay library using the Lonza Nucleofector (setting SF-DN-100). We isolated the RNA from the cells 6 h post transfection.

DENND1A locus STARR-seq reporter library construction

To isolate the DENND1A locus reporter RNA, we first isolated total RNA followed by enriching for cDNA produced from the DENND1A-locus STARR-seq plasmid library. We used the same protocol as described for the PCOS GWAS reporter library construction. We pooled the DENND1A locus reporter libraries from each replicate at equimolar 2 nM concentrations. We then sequenced the DENND1A locus reporter libraries on Illumina NextSeq 2000 using 75 bp PE sequencing.

Candidate regulatory variant identification

To identify variants that have allele specific regulatory activity, we compared the ratio of reads mapping to the alternate allele versus reference allele in each assay library and reporter library. If the ratio of reads mapping to alternate allele versus reference allele was higher in the reporter library compared to the assay library, that variant was called as having increased regulatory activity of the alternate allele.
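The comparison described above can be sketched in Python as a difference of log ratios. This is an illustrative calculation only: statistical calling in the study was performed with BIRD, as described later in this section. The function names and the pseudocount of 0.5 (added to avoid division by zero at low coverage) are assumptions of this sketch.

```python
import math


def allelic_log2_ratio(ref_reads, alt_reads, pseudocount=0.5):
    """log2 of alternate-allele reads over reference-allele reads."""
    return math.log2((alt_reads + pseudocount) / (ref_reads + pseudocount))


def allelic_shift(assay_ref, assay_alt, reporter_ref, reporter_alt,
                  pseudocount=0.5):
    """log2[(alt/ref in reporter library) / (alt/ref in assay library)].

    Positive values mean the alternate allele is over-represented in the
    reporter (RNA) library relative to the assay (plasmid DNA) library.
    """
    reporter = allelic_log2_ratio(reporter_ref, reporter_alt, pseudocount)
    assay = allelic_log2_ratio(assay_ref, assay_alt, pseudocount)
    return reporter - assay


def call_direction(shift):
    """Label the direction of the alternate-allele effect from the shift."""
    if shift > 0:
        return "alt_increased"
    if shift < 0:
        return "alt_decreased"
    return "no_change"
```

For instance, a variant with equal allele counts in the assay library but three times as many alternate-allele reads in the reporter library would have a positive shift and be labelled `alt_increased` in this sketch.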

To do so, we first obtained a list of the variants present in the DENND1A locus in the pool. We obtained the VCF for these samples from the 1000 Genomes Project69. We filtered the variants in the targeted DENND1A locus to include only those SNPs present as heterogeneous within the pool of five genomes we used. The final list of ~600 variants was then used for the regulatory variant analysis.

To compare reads mapping to each allele in both the reporter and assay libraries, we first aligned the DENND1A locus enriched STARR-seq libraries (assay library and reporter library) individually to the human genome (hg38) using WASP118 and bowtie2119. Reads with a quality score of Q ≥ 30 that fell outside the centromeres and blacklisted regions were used for downstream analysis. We used picardtools to mark duplicates109. RPKM-normalized STARR-seq read density was computed at single base pair resolution using the deepTools utility bamCoverage110. We then assigned reads mapping to each variant for each sequenced sample using samtools mpileup120.
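For reference, the RPKM normalization applied per bin can be written out as a short Python function. This is a sketch of the standard definition (reads per bin divided by the number of mapped reads in millions and the bin length in kilobases), not a reimplementation of bamCoverage; the function name `rpkm` is hypothetical.

```python
def rpkm(reads_in_bin, total_mapped_reads, bin_length_bp):
    """Reads per kilobase per million mapped reads for a single bin."""
    if total_mapped_reads <= 0 or bin_length_bp <= 0:
        raise ValueError("total_mapped_reads and bin_length_bp must be positive")
    mapped_millions = total_mapped_reads / 1e6
    bin_length_kb = bin_length_bp / 1e3
    return reads_in_bin / (mapped_millions * bin_length_kb)
```

At single base pair resolution the bin length is 1 bp, so a position covered by 10 reads in a library of 20 million mapped reads would have a value of 500 under this definition.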

To estimate the regulatory effect of variants, we used BIRD68. BIRD is a Bayesian statistical framework for the analysis of regulatory variants that uses Bayesian priors to identify allele-specific regulatory effects and identifies variants that have a high probability of being a regulatory variant with an effect size, theta. We used the standard options for BIRD and set the regulatory effect threshold to 1.2.

ATAC-Seq

To identify the accessible chromatin within the H295R and COV434 cells, we performed ATAC-Seq121 in duplicate as described below.

We harvested 50,000 viable cells for each replicate. COV434 cells were additionally incubated in TURBO DNase (Invitrogen, #AM2238) for 1 h at 37 °C. We then resuspended the cells in 50 μL cold ATAC-RSB with 0.1% NP40, 0.1% Tween 20 and 0.01% digitonin and incubated them on ice for 3 min. We washed the cells with 1 ml cold ATAC-RSB with 0.1% Tween 20 and pelleted them. We resuspended the cell pellets in the transposition mixture comprising 25 μL TD buffer, 2.5 μL transposase, 16.5 μL PBS, 0.5 μL 1% digitonin, 0.5 μL 10% Tween 20 and 5 μL H2O and incubated them in a thermomixer at 37 °C for 30 min. We cleaned up the DNA using the MinElute Reaction Cleanup Kit (Qiagen, #28204). We amplified the resulting DNA using an ATAC-Universal primer and an ATAC-barcode primer (Supplementary Data 16) and cleaned it using SPRI beads. We sequenced the ATAC-seq libraries on an Illumina NextSeq 550 with 50 bp PE sequencing at the Duke Genomics core.

ATAC-seq preprocessing and alignment

ATAC-seq libraries for H295R and COV434 cell lines were individually aligned to the human genome (hg38). Each cell line had 2 biological replicates, and >40 million reads were generated per sample. Sequencing data quality was assessed with FastQC, and adapters were trimmed with Trimmomatic. Trimmed reads were aligned to the GRCh38 genome using Bowtie119, reporting only alignments having no more than two mismatches and discarding multi-mapping reads (-v 2 --best --strata -m 1). Reads mapping to the ENCODE hg38 blacklisted regions (https://www.encodeproject.org/files/ENCFF356LFX; manually curated regions with anomalous signal across multiple genomic assays and cell types) were removed using bedtools2 intersect122 (v2.25.0). Properly paired reads were then filtered to exclude presumed PCR duplicates using Picard MarkDuplicates (v1.130; http://broadinstitute.github.io/picard/). The remaining reads were used to generate reads per million (RPM)-normalized bigWig files for visualization using deeptools bamCoverage123 (v3.0.1). Peaks were called using MACS2 with an FDR cutoff of 0.1. We used the ENCODE ATAC-Seq standards for analysing the dataset we generated. We generated Transcription Start Site enrichment values using the GRCh38 Refseq TSS annotation and used a cutoff of >7 for high-quality data (Figure S11).

Generating cell lines for CRISPRa perturbation studies

Guide RNA (gRNA) design and gRNA plasmid synthesis

Four candidate regulatory elements were identified from the targeted STARR-seq results with coordinates listed in Supplementary Data 13. The regions were selected based on STARR-seq effect, chromatin accessibility and ability to design guides considering genomic sequence and PAM restrictions.

To design the guide oligos, we used Guidescan2124, with a “specificity” filter > 0.2. We had a total of 21 gRNAs across the four candidate regulatory elements and the DENND1A promoter region (Supplementary Data 14), with 5–7 guides targeting each regulatory element. For the negative control, we designed a set of five guides that did not have any targets in the human genome. Each gRNA was synthesized as an individual oligo and then processed as described below to make pooled gRNA plasmids.

To make the gRNA plasmids, we followed the outline of the CROP-Seq protocol125. First, we prepared the gRNA plasmid backbone by digesting CROPseq-Guide-Puro plasmid from Addgene (#86708) using BsmBI. We ran the digested product on 1% agarose gel, and we purified the 8.3 kb fragment using GeneJET Gel Extraction Kit (#K0691). To prepare the gRNA oligos for insertion into the plasmid, for each gRNA oligo synthesized, we first converted it to a double stranded oligo using Primers ssds-F and ssds-R (Supplementary Data 16). We then cloned each double-stranded gRNA oligo into the digested CROPseq-Guide-Puro vector using NEBuilder HiFi DNA Assembly (#E2621) kit. The plasmid products were purified with QIAquick PCR Purification Kit (Qiagen #28104).

To make the pooled plasmids, we pooled (equimolar) each plasmid product for each regulatory element, or promoter region, or negative control. To amplify the plasmid pools, we electroporated each pool into Lucigen Endura Cells (Lucigen #60242-2) following manufacturer protocol for optimal settings in 1.0 mm cuvette (25 μF, 200 Ω, 1.5 kV). We grew the plasmids in individual 25 mL volumes of LB with carbenicillin for antibiotic selection at 37 °C overnight and isolated the gRNA plasmid pools using Qiagen Midi Prep (Qiagen #12143) following manufacturer’s protocols. Each purified plasmid pool was then used to prepare lentiviral particles.

Lentivirus production

To identify the target gene of the STARR-seq regulatory elements, we used CRISPRa to perturb the selected candidate regulatory elements. First, we generated a stable cell line expressing a Cas protein. To do so, we used a catalytically inactive Cas9 (dCas9) fused to the histone acetyltransferase domain of p300 (dCas9-P300). dCas9-P300 can act as a transcriptional activator when combined with a targeting guide RNA41.

To make stable dCas9-P300 cell lines, we generated lentivirus expressing dCas9-P300. Briefly, we combined the following plasmids: dCas9-p300 (Addgene #83889), pMD2.G (Addgene #12259) and psPAX2 (Addgene #12260) with Lipofectamine 3000 (Invitrogen #L3000001) and lipofected them into HEK293T cells (ATCC #CRL-3216™) according to the manufacturer’s protocol. After 14 to 20 h, the transfection media was exchanged with fresh media. We then harvested viral supernatant at 24 and 48 h post lipofection. We concentrated the viral supernatant at 1/100x using LentiX Concentrator (Clontech #631232) following the manufacturer’s protocols.

To make stable dCas9-KRAB cell lines, we generated lentivirus expressing dCas9-KRAB. Briefly, we combined the following plasmids: dCas9-KRAB (gift from Gersbach lab, Duke University), pMD2.G (Addgene #12259) and psPAX2 (Addgene #12260) with Lipofectamine 3000 (Invitrogen #L3000001) and lipofected them into HEK293T cells (ATCC #CRL-3216™) according to the manufacturer’s protocol. After 14 to 20 h, the transfection media was exchanged with fresh media. We then harvested viral supernatant at 24 and 48 h post lipofection. We concentrated the viral supernatant at 1/100x using LentiX Concentrator (Clontech #631232) following the manufacturer’s protocols.

To produce lentivirus for individual gRNAs, we transfected HEK293T cells with an equimolar pool of gRNA plasmids for each regulatory element, psPAX2, and pMD2.G using Lipofectamine 3000 following the manufacturer’s instructions. We harvested media containing the produced lentivirus 24 and 48 h later and concentrated the viral supernatant at 1/100x using LentiX Concentrator (Clontech #631232) following the manufacturer’s protocols.

HEK293T cell line with stable dCas9-P300 expression

We received the HEK293T-dCas9-P300 cell line41 from Dr. Charles Gersbach. We followed the published culture and growth conditions for 293T cells.

Generating stable H295R-dCas9-P300 and H295R-dCas9-KRAB cell lines

To make stable H295R cells expressing dCas9-P300, we transduced the concentrated lentiviral particles containing dCas9-p300 into H295R cells with a multiplicity of infection of 5.0 using 6 μg/ml of polybrene (EMD Millipore Corporation #TR-1003-G). Additionally, we selected for the transduced cells using 0.5 μg/mL of puromycin (Gibco #A1113803) for 7 days41 to generate a polyclonal H295R-dCas9-p300 cell line. We confirmed the expression of dCas9-p300 in H295R cells using qRT-PCR.

To make stable H295R cells expressing dCas9-KRAB, we transduced the concentrated lentiviral particles containing dCas9-KRAB into H295R cells with a multiplicity of infection of 5.0 using 6 μg/ml of polybrene (EMD Millipore Corporation #TR-1003-G). Additionally, we selected for the transduced cells using 0.5 μg/mL of blasticidin (Gibco #A1113903) for 10 days to generate a polyclonal H295R-dCas9-KRAB cell line. We confirmed the expression of dCas9-KRAB in H295R cells using qRT-PCR.

Transduction of gRNA into dCas9-P300 or dCas9-KRAB expressing cell lines

To test the effect of dCas9-P300 or dCas9-KRAB on the targeted regulatory elements, we transduced each lentiviral pool for the regulatory elements, the DENND1A promoter region and the negative control into two cell lines (HEK293T and H295R) with stable dCas9-P300 or stable dCas9-KRAB expression. We used a high multiplicity of infection (MOI of ~15) to account for the fact that the dCas9-modified H295R cells were harder to transduce126. We transduced the cells during seeding in a 12-well or 6-well plate supplemented with polybrene (EMD Millipore Corporation #TR-1003-G) at 6 μg/ml for H295R cells and 4 μg/mL for HEK293T cells, with 6 replicates for each pool. We changed the media on the cells 24 h after transduction.

RNA isolation and qRT-PCR to measure gene expression levels

To measure any changes in gene expression levels due to the CRISPRa perturbation, we used qRT-PCR. Prior to harvesting the cells, we confirmed that each replicate was at least 80% viable using 0.4% trypan blue stain and a Countess Cell Counter (ThermoFisher). We then harvested RNA from each replicate 4 days post transduction with the gRNA lentivirus pool using the RNeasy Mini Kit (Qiagen #4004) following the manufacturer’s protocol, including the DNase treatment. We quantified the RNA using Qubit (Invitrogen) and used 500 ng of RNA from each sample for subsequent cDNA synthesis. For the cDNA synthesis, we used Superscript III (800 U, Life Technologies) with Oligo dT primers following the manufacturer’s protocol (Thermo Fisher #18418012). Following cDNA synthesis, we performed qRT-PCR using that cDNA, TaqMan™ Fast Advanced Master Mix for qPCR (Thermo Fisher #4444556), and TaqMan™ Gene Expression Assays (for the genes DENND1A, CRB2, LHX2, STRBP and GAPDH). The qPCR analysis was performed using the 2^(-ΔΔCT) method in R, using GAPDH as the internal control. All fold changes are reported as log(2^(-ΔΔCT)) compared to the negative (non-targeting gRNA) control. Each sample was measured in triplicate for the qRT-PCR.
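The ΔΔCT calculation can be illustrated with a short Python sketch (the study performed this analysis in R). The function names are hypothetical, technical replicates are averaged before subtraction, and the logarithm is taken in base 2, which is an assumption since the base is not stated above.

```python
import math
from statistics import mean


def delta_ct(target_ct, reference_ct):
    """Mean CT of the target gene minus mean CT of the internal control."""
    return mean(target_ct) - mean(reference_ct)


def fold_change(sample_target, sample_ref, control_target, control_ref):
    """Relative expression 2^(-ddCT) of a sample versus the control condition.

    Each argument is a list of technical-replicate CT values; *_ref holds the
    internal control gene (GAPDH in the text above), and control_* holds the
    non-targeting gRNA condition.
    """
    ddct = (delta_ct(sample_target, sample_ref)
            - delta_ct(control_target, control_ref))
    return 2.0 ** (-ddct)


def log2_fold_change(sample_target, sample_ref, control_target, control_ref):
    """log2 of the 2^(-ddCT) fold change (base 2 is an assumption here)."""
    return math.log2(
        fold_change(sample_target, sample_ref, control_target, control_ref)
    )
```

As a worked example, a target gene with a mean CT of 24 in the perturbed sample and 25 in the control, with GAPDH at 18 in both, gives ΔΔCT = -1, a fold change of 2 and a log2 fold change of 1.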

RNA sequencing of H295R cells with forskolin treatment

To measure any changes in gene expression levels due to forskolin treatment, we performed RNA sequencing. First, we cultured H295R cells in the presence of forskolin at 10 μM or an equivalent volume of DMSO as a control for 3 days. Prior to harvesting the cells, cell viability for each replicate was measured to be at least 90% using 0.4% trypan blue stain and a Countess Cell Counter (ThermoFisher). The cells were then harvested using trypsin, and cell pellets were flash frozen prior to RNA extraction and sequencing. RNA sequencing was performed at Azenta using standard, unstranded, bulk RNA sequencing.

RNA sequencing analysis

To quantify changes in gene expression in H295R cells as a result of forskolin treatment, we performed a differential analysis on RNA-seq data generated from forskolin-treated and DMSO-treated H295R cells. Raw reads were trimmed to remove adapters and bases with an average quality score (Q) (Phred33) of <20 using Trimmomatic v0.32127. Trimmed reads were subsequently aligned to the primary assembly of the GRCh38 human genome using STAR v2.4.1128. We performed differential expression analysis using DESeq249 in R with a Benjamini-Hochberg false discovery rate (FDR) of less than 0.05, and log fold changes were calculated using apeglm78. For estimating transcript abundance, transcripts per million (TPMs) were computed using the rsem-calculate-expression function in the RSEM v1.2.21 package129.
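The Benjamini-Hochberg adjustment behind the FDR threshold can be sketched in a few lines of Python. This only illustrates the step-up procedure itself; it does not reproduce DESeq2, which applies additional steps such as independent filtering. The function name `benjamini_hochberg` is hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    n = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(n, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted
```

Genes whose adjusted value falls below 0.05 would be called differentially expressed under the threshold stated above.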

ELISA for measuring testosterone production

To measure changes in testosterone production, we collected the supernatant from the gRNA pool-transduced H295R cells two and four days post transduction. First, we diluted the supernatant 300-fold. Then, we measured the amount of testosterone produced using ELISA (Cayman Chemicals #582701) according to the manufacturer’s protocols using the given standard. All samples were measured in duplicate. The absorbance was measured at 405–420 nm using the GloMax Discover System (Promega). The fold change reported is relative to the negative (non-targeting gRNA) control.

ELISA for measuring estradiol production

To measure changes in estradiol production, we collected the supernatant from the gRNA pool-transduced H295R cells two and four days post transduction. First, we diluted the supernatant 10-fold. Then, we measured the amount of estradiol produced using ELISA (Cayman Chemicals #501890) according to the manufacturer’s protocols using the given standard. All samples were measured in duplicate. The absorbance was measured at 405–420 nm using the GloMax Discover System (Promega). The fold change reported is relative to the negative (non-targeting gRNA) control.
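The fold-change calculation for both hormone ELISAs can be sketched as follows. This is an illustration under stated assumptions: concentrations are taken to have already been interpolated from the kit standard curve, duplicate wells are averaged, and the function names are hypothetical. The dilution factor would be 300 for testosterone and 10 for estradiol, as described above.

```python
from statistics import mean


def supernatant_concentration(measured_duplicates, dilution_factor):
    """Back-calculate the undiluted concentration from duplicate wells."""
    if dilution_factor <= 0:
        raise ValueError("dilution_factor must be positive")
    return mean(measured_duplicates) * dilution_factor


def fold_change_vs_control(sample_duplicates, control_duplicates,
                           dilution_factor):
    """Hormone level in a sample relative to the non-targeting gRNA control."""
    control = supernatant_concentration(control_duplicates, dilution_factor)
    if control == 0:
        raise ValueError("control concentration is zero")
    sample = supernatant_concentration(sample_duplicates, dilution_factor)
    return sample / control
```

When the sample and control are diluted equally, the dilution factor cancels in the ratio; it is kept in the sketch so that absolute concentrations can also be reported.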

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.