Introduction

Enhancers are non-coding cis-regulatory elements that, along with the transcription factors (TFs) that bind to them, are central in regulating spatiotemporal gene expression. A traditional way to describe enhancers is that they are 100–1000 base pair (bp) sequences capable of increasing gene expression independent of their orientation or distance from the target gene promoter. However, discoveries over the last decade highlight the architectural and functional diversity among enhancers1,2. For example, enhancers can span over three kilobases (kb)3 or be organized into large regulatory clusters such as locus control regions or super-enhancers4. Within regulatory clusters, individual enhancer elements may work additively, synergistically, or redundantly to regulate gene expression4,5,6,7. While enhancers can function at various distances from their target promoters, the distance between the enhancer and the target promoter can influence gene expression2. Certain enhancers, known as dual enhancer-silencers, can function as either enhancers or silencers depending on the developmental stage8,9, cellular context9,10,11 or the specific genes being regulated12. Relatedly, some enhancers contain binding sites for both activating and repressing TFs, where regulation of these enhancers can also be dependent on the cellular context13,14,15,16. The complexity of the architecture and functional variability of enhancers pose significant challenges for their identification and characterization.

Next-generation sequencing approaches can be used to identify enhancers at a genome-wide level. For example, chromatin immunoprecipitation followed by sequencing (ChIP-seq) can be used to profile histone modifications like H3K27ac, which has been associated with active enhancers, and H3K4me1, which has been associated with poised enhancers17. ChIP-seq identifies putative enhancers but does not confirm enhancer functionality. Another technique, self-transcribing active regulatory region sequencing (STARR-seq), combines plasmid reporter assays with deep sequencing to assess enhancer activity for millions of candidate sequences18. However, STARR-seq results in many false positives or false negatives, and neither ChIP-seq nor STARR-seq associates enhancers with target genes19.

To characterize individual enhancers, DNA segments can be placed upstream of a minimal promoter driving a reporter gene (e.g., luciferase or green fluorescent protein) in a plasmid. The plasmids are then introduced into cells to test the activity of each segment20. However, since the enhancers are taken out of their genomic context in these plasmid-based assays, they only provide information about enhancer potential and, again, do not associate the activity with a specific gene’s expression. To circumvent this limitation, CRISPR-Cas9 can be used to delete enhancer elements in their native genomic context, and target gene expression levels can be measured. CRISPR-Cas9 has been used to characterize multiple disease-gene enhancers, including those for the β-globin locus21, myelocytomatosis proto-oncogene (MYC)22,23, and sonic hedgehog (SHH)24. However, many disease-associated genes have enhancers that are not well studied.

Matrix metalloproteinase-9 (MMP9) is a proteolytic enzyme that degrades the extracellular matrix to facilitate cell invasion and tissue remodeling in physiological and pathological conditions. MMP9 has been implicated in many diseases, including cancers, cardiovascular disease, neurodegenerative disorders, inflammatory diseases, and pregnancy complications25. In cancers, MMP9 is expressed in tumor-associated neutrophils and macrophages, where it helps in the proteolytic breakdown of interstitial collagen, contributing to extracellular matrix degradation and promoting angiogenesis in a pro-tumorigenic environment26. During the first trimester of pregnancy, trophoblast cells of the fetal placenta express MMP927 and migrate and invade extracellular matrix components of the maternal decidua, establishing blood flow to the placenta28. Reduced expression and activity of MMP9 in hypertensive pregnancies are associated with increased collagen deposition in the placenta29. Reduced expression of MMP9 has been seen in placentas of women with preeclampsia30, whereas elevated expression of placental MMP9 has been seen in cases of placenta accreta31. Increased MMP9 expression is also observed in trophoblasts of women who have experienced recurrent pregnancy loss, indicating MMP9 dysregulation as a contributing factor32. Thus, proper regulation of MMP9 expression and activity in trophoblasts may be critical for healthy placental development.

Previous studies have focused on understanding how the MMP9 promoter regulates its expression in trophoblast cells30,33, but an MMP9 enhancer has never been characterized in trophoblasts. We previously identified a putative enhancer upstream of Mmp9 in the mouse placenta at embryonic day (e) 7.534. In this study, we used luciferase reporter assays and CRISPR-Cas9 mediated knockout to determine if the putative enhancer could regulate MMP9 expression in a human trophoblast cell line. We identified adjacent segments within the putative enhancer that either positively or negatively regulated MMP9 expression, revealing a complex mechanism of cis-regulation for MMP9. More generally, our results highlight the importance of carefully dissecting putative enhancer elements to uncover their role in the fine-tuned regulation of gene expression.

Materials and methods

Cell culture

HTR-8/SVneo cells (ATCC, CRL-3271), a human trophoblast cell line, were cultured and maintained by following the ATCC-recommended protocol35,36. Mouse trophoblast stem cells (mTSCs) were obtained as a gift from Dr. Emin Maltepe at the University of California San Francisco. The mTSCs were maintained on mitomycin C-treated mouse embryonic fibroblast cells (MEFs) (Fisher Scientific, NC1178705)37. Briefly, basal mTSC culture media consisted of RPMI 1640 GlutaMAX (ThermoFisher, 61870127) supplemented with 20% fetal bovine serum (FBS) (VWR, MP1300500), 2 mM L-glutamine (ThermoFisher, 25030081), 1 mM sodium pyruvate (Fisher Scientific, 11360070), 100 µM beta-mercaptoethanol (BME) (Sigma, M3148-25ML), 100 U/ml penicillin, and 100 µg/ml streptomycin (ThermoFisher, 15140122). Basal mTSC culture media was further supplemented with 1 µg/ml heparin (Fisher Scientific, 501657295) and 25 ng/ml fibroblast growth factor 4 (FGF4; Sigma Aldrich, F8424-25UG) for mTSC cell culture maintenance. The cell line was cultured at 37 °C in a humidified incubator containing 5% CO2, and the media was replenished every 2 days.

Differential plating was used to obtain mTSCs from the MEF feeder layer. Cells were washed briefly with Dulbecco’s phosphate-buffered saline (DPBS) (ThermoFisher, 14190250), detached with 0.05% trypsin-EDTA (ThermoFisher, 25300054) at 37 °C for 3 min, and gently dislodged using basal mTSC culture media. The resulting cell suspension was collected and centrifuged at 200×g for 3 min. The pellet containing mTSCs was resuspended in fresh mTSC culture media (containing FGF4 and heparin) and plated onto a new culture dish. Cells were incubated at 37 °C for 1 h, allowing any remaining MEFs to adhere. Non-adherent mTSCs were subsequently collected and centrifuged again at 200×g for 3 min. The mTSCs were then replated onto freshly plated MEFs to continue passaging.

For differentiation into mouse trophoblast giant cells (mTGCs), the mTSCs obtained from differential plating were seeded into differentiation media in a culture dish without MEFs. The differentiation media consisted of the basal mTSC media described above without FGF4 and heparin but supplemented with 5 µM retinoic acid (Sigma Aldrich, R2625-100MG)38. Cells were cultured under differentiation conditions for four days, with media replaced every two days, after which the mTGCs were collected and re-plated for additional experiments.

Ligation independent cloning

Ligation independent cloning (LIC) was performed as previously described39. The pGL4.23 vector was linearized by sequential digestion with EcoRV (NEB, R0195S), at a concentration of 10 U/µl, at 37 °C overnight, followed by Nt.BspQ1 (ThermoFisher Scientific, 50-995-264) at a concentration of 5 U/µl at 50 °C for 1 h. Digested products were resolved on a 1% agarose gel, and the linearized vector was gel-extracted (Qiagen Mini Elute Gel Extraction Kit) and eluted in nuclease-free water to a final concentration of 10–20 ng/µL. Human genomic DNA (Fisher Scientific, 69-237-3) or mouse genomic DNA (Fisher Scientific, 69-239-3) was used as the template for amplification reactions. Each putative regulatory segment was amplified using primers with LIC attachment sites and Q5 DNA polymerase, following the manufacturer’s instructions (NEB, M0491L) (Supplementary Table S1). Amplified inserts were gel-purified and treated with T4 DNA polymerase (Fisher Scientific, M0203S) in the presence of 10 mM dGTP by following the manufacturer’s protocol. Annealing was then performed by combining 35 ng of the amplified and treated insert with 30 ng of linearized pGL4.23 vector in TE buffer at room temperature for 30 min, followed by chilling on ice. Annealed vector and insert were transformed into E. coli DH5α cells and plated on 100 µg/mL ampicillin-containing Luria Broth agar plates overnight at 37 °C. Positive colonies were identified by colony PCR and grown overnight in 5 ml Luria Broth media (Fisher Scientific, BP1426500) containing 100 µg/mL ampicillin (Fisher Scientific, BP176025) at 37 °C. Plasmids were isolated using the QIAprep Spin Miniprep Kit (Qiagen, 27106) and eluted in distilled water (ThermoFisher, 10977023). The LIC plasmid constructs were validated for the presence of the regulatory region using Sanger sequencing, and these sequenced regions were mapped to the respective mouse or human genome (mm9 or hg38) in the UCSC genome browser40,41. Primers and genomic coordinates for the LIC plasmid constructs are included in Supplementary Tables S1 and S2, respectively. Sequenced and verified LIC plasmid constructs containing the target segments were then used for luciferase assays.

Luciferase assays

LIC plasmid construct transfections and luciferase assays were performed using the Dual Glo luciferase assay system kit (Promega, E2940) and following the protocols previously described42,43 with minor modifications. The HTR-8/SVneo cells were seeded at 30,000 cells/well, and mTGCs were seeded at 20,000 cells/well in a 24-well plate. After 48 h, the media was changed and both cell lines were transfected. The HTR-8/SVneo cells were transfected with 500 ng of the LIC plasmid construct, 20 ng of pRL-TK (Promega, E2241), and 1 µl of jetprime reagent/well (VWR, 89129-924). The mTGCs were transfected with 750 ng of the LIC plasmid construct, 75 ng of pRL-TK, and 1.5 µl of jetprime reagent/well. The empty pGL4.23 vector was transfected as a control. The renilla luciferase control vector, pRL-TK, was used as an internal control for normalization. The cell culture media was changed 4 h after transfection, and the cells were grown for an additional two days before luciferase assays were performed for both cell lines. Luciferase and renilla readings were recorded by the BioTEK Synergy HTX plate reader using the manufacturer-recommended luminescence settings. The firefly values were normalized to renilla values, and the luciferase reporter activity was calculated relative to the empty pGL4.23 vector for three biological replicates. The raw values for luciferase and renilla, normalization calculations, standard errors, and p-values are included in Supplementary File 1.
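The normalization described above can be illustrated with a minimal Python sketch. It assumes that each firefly reading is divided by its matched renilla reading and that the resulting ratios are then divided by the mean ratio of the empty pGL4.23 vector; the exact calculations used in this study are those in Supplementary File 1. The function name and all readings below are hypothetical and for illustration only.

```python
from statistics import mean


def relative_luciferase(firefly, renilla, empty_firefly, empty_renilla):
    """Return per-replicate reporter activity relative to the empty vector."""
    # Step 1: normalize each firefly reading to its matched renilla reading.
    construct_ratios = [f / r for f, r in zip(firefly, renilla)]
    empty_ratios = [f / r for f, r in zip(empty_firefly, empty_renilla)]
    # Step 2 (assumption): divide by the mean empty-vector ratio.
    baseline = mean(empty_ratios)
    return [ratio / baseline for ratio in construct_ratios]


# Made-up luminescence readings for three biological replicates.
activity = relative_luciferase(
    firefly=[1200.0, 1500.0, 900.0],
    renilla=[100.0, 100.0, 100.0],
    empty_firefly=[100.0, 120.0, 80.0],
    empty_renilla=[100.0, 100.0, 100.0],
)
print(activity)
```

With these illustrative inputs, the empty vector has a mean firefly/renilla ratio of 1, so the relative activities equal the construct ratios themselves.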

Site-directed deletions (SDDs)

Site-specific deletion primers were designed using the NEBaseChanger tool (https://nebasechanger.neb.com/, Supplementary Table S3). Deletion primers were designed based on the location of predicted TF binding sites in the JASPAR Transcription Factor Binding Site Database track—“JASPAR CORE 2022—Predicted Transcription Factor Binding Sites” from the UCSC genome browser40,41,44. The SDDs were performed using the 10X Kinase Ligase DpnI (KLD) enzyme mix (Fisher Scientific, NC1322344) following the manufacturer’s protocol. Regulatory segments with SDDs were cloned into the pGL4.23 vector using LIC cloning. The SDD constructs were validated for the deletion using Sanger sequencing and these sequenced regions were mapped to the human genome hg38 in UCSC genome browser40,41. Primers and genomic coordinates for the SDD constructs HD1-11 are included in Supplementary Tables S3 and S4, respectively. The SDD constructs were subsequently used to perform luciferase assays in the HTR-8/SVneo cell line.

CRISPR-Cas9 mediated knockout using ribonucleoprotein complexes (RNPs)

CRISPR-Cas9 mediated knockout using RNPs was done according to a previously published protocol with several modifications45. Two guide RNAs (gRNA) with high on-target and low off-target scores were selected using the CRISPOR tool46 and were purchased from IDT (Supplementary Table S5).

Fluorescence labelling of gRNAs: First, the trans-activating CRISPR RNA (tracrRNA) and gRNA were annealed at a 1:1 ratio in 40 µl nuclease-free duplex buffer (IDT, 11-04-02-01) for 30 min. Each gRNA was fluorescently labeled with labelIT Cy5 or MFP488 (Fisher Scientific, MIR3725 and MIR7125) following the manufacturer’s instructions. Briefly, 40 µl of the annealed tracrRNA:gRNA complex was mixed with 10X labelling buffer A (final concentration 1X), 4 µl nuclease-free water, and 1 µl of the labelIT reagent, and incubated at 37 °C for 1 h. The labeled nucleic acids were purified using ethanol precipitation and resuspended in 20 µl of duplex buffer.

Transfections: The gRNA-Cas9 RNP complex was formed by combining 40 µl of 1.5 µM labeled tracrRNA:gRNA with 40 µl of 1.5 µM Cas9 (IDT, 1081058) and 40 µl of Opti-MEM media (ThermoFisher, 31985062) and incubating at room temperature for 30 min. Next, 1.25 µl of the RNAiMAX transfection reagent (Thermo Fisher Scientific, 13778150) was added and the tube was incubated at room temperature for another 15 min. The RNP complexes (40 nM gRNA per well of a 96-well plate) were reverse-transfected with 8000 HTR-8/SVneo cells and grown for 24 h. Reverse transfection is a method where the RNP complexes are prepared inside the wells, after which cells and medium are added. Each single tracrRNA:gRNA:Cas9 complex was transfected into 4 wells (these cells were used for gating the 2 dyes separately on the cell sorter). For the dual gRNA transfections, both tracrRNA:gRNA:Cas9 complexes were reverse-transfected together into 4 wells. Finally, another 4 wells were seeded with non-transfected cells (also used for gating) for a total of 16 wells per pair of gRNAs.

Cell sorting using the S3e Cell Sorter (Bio-Rad Laboratories, 1451006): The single-labeled cells and the non-transfected cells were used to gate the individual dye signals. The dual-labeled cells containing Cy5 and MFP488 fluorescent dyes were gated and sorted, and one cell per well was added to a 96-well plate containing growth media with antibiotics (penicillin streptomycin solution, ThermoFisher, 15140122). The media was replaced every other day with growth media without antibiotics, and the clones were grown for 10–14 days. Each clone was expanded into a cell line, and genomic DNA was isolated (Thermo Fisher Scientific, K182001) to confirm knockout by polymerase chain reaction (PCR) and Sanger sequencing. The sequenced regions were mapped to the human genome hg38 in the UCSC genome browser40,41. Genomic coordinates and nucleotide sequences for the CRISPR-KO cell lines are included in Supplementary Table S6 and Supplementary File 2, respectively. Primers that were used to confirm knockout by PCR are listed in Supplementary Table S7.

TOMTOM verification: CRISPR-Cas9-mediated knockouts in E1-KO_c1, E1-KO_c2, E1.2-KO_c1 and E1.2-KO_c2 were checked for the introduction of any new transcription factor binding sites using the TOMTOM tool from the MEME Suite website47. We provided input sequences spanning 5 bp upstream and downstream of the knockout regions in all four cell lines. For the analysis, we used the JASPAR CORE 2022 (Vertebrates) database and the JASPAR NON-REDUNDANT option for motif comparison, and an E < 0.05 threshold. We only considered aligned nucleotides for motif comparison47. A TF motif identified by the tool would be considered a strong motif if it met the criteria of E-value < 0.05 and q-value < 0.0547.
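The strong-motif criterion above (E-value < 0.05 and q-value < 0.05) amounts to a simple filter over the motif comparison results. The following Python sketch shows that filter under the assumption that each hit has already been parsed into a record; the field names (`motif`, `e_value`, `q_value`), the function name, and the example hits are hypothetical placeholders and do not reproduce the TOMTOM output format or any result from this study.

```python
def strong_matches(hits, e_cutoff=0.05, q_cutoff=0.05):
    """Keep only hits that pass both the E-value and q-value cutoffs."""
    return [
        hit for hit in hits
        if hit["e_value"] < e_cutoff and hit["q_value"] < q_cutoff
    ]


# Made-up hits: only motif_A satisfies both thresholds.
hits = [
    {"motif": "motif_A", "e_value": 0.01, "q_value": 0.02},
    {"motif": "motif_B", "e_value": 0.03, "q_value": 0.20},
    {"motif": "motif_C", "e_value": 1.50, "q_value": 0.04},
]
strong = strong_matches(hits)
print([hit["motif"] for hit in strong])
```

Requiring both thresholds is stricter than either alone, so a hit with a low E-value but a high q-value (motif_B here) is not counted as a strong motif.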

RNA isolation, cDNA preparation, and RT-qPCR

RNA was isolated from HTR-8/SVneo cells using the Invitrogen RNA Minikit following the manufacturer’s instructions (Thermo Fisher Scientific, 12183018 A). The RNA concentration was measured using a nanodrop (Fisher Scientific, NDNDLUSCAN), and reverse transcribed into cDNA following the manufacturer’s instructions (Thermo Fisher Scientific, 4368814). Target gene expression was measured by RT-qPCR using 6 ng of cDNA/well. Primers that were used are listed in Supplementary Table S8. GAPDH was used to normalize gene expression, and changes in gene expression were calculated as previously described, with details provided herein48. Mean Ct values were calculated from technical replicates, and relative quantities (RQ) were computed using the formula RQ = (1 + E)^ΔCt, where E is the PCR efficiency for each primer pair and ΔCt is the difference between Ct of each sample and the average control group Ct. MMP9 expression was normalized to GAPDH by dividing the RQ of MMP9 by that of GAPDH. Group means, standard deviation (SD), and standard error of the mean (SEM) were calculated from the normalized values. Raw Cts and calculations for RT-qPCR are included in Supplementary File 3. MIQE guidelines were followed, and details are provided in Supplementary File 4.
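As an illustration of the relative quantification described above, the Python sketch below computes RQ = (1 + E)^ΔCt for a target and a reference gene and then takes their ratio. It assumes the sign convention ΔCt = mean control Ct minus sample Ct, so that a lower Ct than the control gives RQ > 1; the text does not state the sign explicitly, and the actual calculations are in Supplementary File 3. Function names, efficiencies, and Ct values are hypothetical.

```python
from statistics import mean


def relative_quantity(sample_ct, control_cts, efficiency):
    """Efficiency-corrected relative quantity, RQ = (1 + E) ** delta_ct."""
    # ASSUMPTION: delta Ct = mean control Ct - sample Ct.
    delta_ct = mean(control_cts) - sample_ct
    return (1.0 + efficiency) ** delta_ct


def normalized_expression(target_ct, ref_ct, target_control_cts,
                          ref_control_cts, target_eff, ref_eff):
    """Target RQ divided by reference-gene RQ (e.g., MMP9 over GAPDH)."""
    rq_target = relative_quantity(target_ct, target_control_cts, target_eff)
    rq_ref = relative_quantity(ref_ct, ref_control_cts, ref_eff)
    return rq_target / rq_ref


# Made-up mean Ct values; an efficiency of 1.0 means perfect doubling per cycle.
expr = normalized_expression(
    target_ct=27.0, ref_ct=18.0,
    target_control_cts=[25.0, 25.0, 25.0],
    ref_control_cts=[18.0, 18.0, 18.0],
    target_eff=1.0, ref_eff=1.0,
)
print(expr)  # 0.25, i.e. a 4-fold decrease relative to the control group
```

In this example the target Ct is two cycles later than the control mean while the reference gene is unchanged, so the normalized expression is 2^-2 = 0.25.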

Western blots

Whole lysates were collected from HTR-8/SVneo cells, and western blots were performed as previously described, with details provided herein35. A consistent number of cells was used across all samples loaded on a single blot. Based on the experiment, either 300,000 or 400,000 cells were pelleted (Supplementary Table S9). Pelleted cells were resuspended in 4x Laemmli protein sample buffer (Bio-Rad Laboratories, 1610747) and then heated at 95 °C for 10 min before running them on a 10% SDS-PAGE gel for 65–70 min. Resolved proteins were transferred to a 0.2 μm nitrocellulose membrane (Bio-Rad Laboratories, 1704270) using the Trans-Blot Turbo transfer system following manufacturer instructions (Bio-Rad Laboratories, 1704150). After protein transfer, membranes were trimmed. Membranes were then blocked in 5% non-fat milk prepared in TBS-T (0.1% Tween-20 in Tris-buffered saline) for 1 h at room temperature, followed by overnight incubation and antibody hybridization at 4 °C with primary antibodies diluted in TBS-T. After four washes with TBS-T, membranes were incubated with secondary antibody diluted in 1% non-fat milk prepared in TBS-T for 1 h at room temperature. Signal detection was performed using Pierce ECL Western Blotting Substrate (Thermo Fisher Scientific, 32209) using manufacturer’s instructions, and membranes were imaged using the Gel Doc XR + Imaging System (Bio-Rad). Band intensities corresponding to MMP9 (82 kDa), TFAP2C (49 kDa) and GAPDH (36 kDa) were quantified for each sample using the ImageJ software49. Images obtained from the Bio-Rad Gel Doc XR + imaging system were opened in ImageJ. Images were inverted, background was subtracted with light background off and sliding paraboloid on, then inverted again. For lane quantification, the three bands were first defined as lanes and then plots were generated using the plot lanes option. 
Lanes were separated with lines in the plot and peaks were selected with the wand tool in the sample order to obtain the final background subtracted band intensities. Background-subtracted MMP9 and TFAP2C values were exported and normalized to the GAPDH intensity. These values were then further normalized to the average of the control group (WT) to obtain the relative protein expression in knockout or knockdown cell lines. Membranes shown in Figs. 2d,h, 4d, 5c and 6c were uniformly cropped to retain the target protein band. Membranes shown in Supplementary Fig. S2, S3, S5, S6 and S7 were uniformly cropped and include multiple bandwidths above and below the target protein band for the samples used in this study. Trimmed membranes with visible edges are available at: https://osf.io/37eg2/. Antibodies used, dilutions, and incubation conditions are listed in Supplementary Table S9. Raw values for MMP9, TFAP2C and GAPDH band intensity, step-by-step normalization for each replicate, standard errors and p-values are included in Supplementary File 5.

siRNA knockdown (KD) of TFAP2C

siRNA KD of TFAP2C in the HTR-8/SVneo cell line was done according to protocols previously described, with details provided herein35,43. HTR-8/SVneo cells were seeded in six-well plates at 150,000 cells/well. After 24 h, cells were transfected with two different siRNAs targeting TFAP2C (Qiagen, SI02630649 and SI02630656) and one negative control siRNA (IDT, 270532307) at a concentration of 30 nM siRNA diluted in Lipofectamine RNAiMAX Transfection Reagent at a ratio of 1:1, adhering to the manufacturer-recommended protocol (Fisher Scientific, 13778150). The media was replaced 24 h post-transfection, and cells were harvested 48 h post-transfection for RNA isolation, cDNA preparation, RT-qPCR and western blotting as described above. Gene and protein expression after KD were calculated by normalizing to GAPDH and negative control siRNA treated cells as described above. Changes in expression were log2-transformed for graphing and statistical analyses.
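The knockdown calculation above can be sketched in Python as follows: each value is normalized to GAPDH, divided by the mean of the negative control siRNA group, and log2-transformed. This is a minimal sketch that assumes normalization to the control group mean; the function name and all numbers are hypothetical, and the inputs could be either relative quantities from RT-qPCR or band intensities from western blots.

```python
import math
from statistics import mean


def log2_relative_expression(target, gapdh, ctrl_target, ctrl_gapdh):
    """Log2 of GAPDH-normalized expression relative to the control mean."""
    # Normalize the negative-control group to GAPDH and take its mean.
    ctrl_mean = mean([t / g for t, g in zip(ctrl_target, ctrl_gapdh)])
    # Normalize each knockdown replicate the same way, then log2-transform.
    return [math.log2((t / g) / ctrl_mean) for t, g in zip(target, gapdh)]


# Made-up values for three biological replicates.
log2_fc = log2_relative_expression(
    target=[0.5, 1.0, 0.25],
    gapdh=[1.0, 1.0, 1.0],
    ctrl_target=[2.0, 2.0, 2.0],
    ctrl_gapdh=[2.0, 2.0, 2.0],
)
print(log2_fc)  # [-1.0, 0.0, -2.0]
```

On the log2 scale, a value of -1 corresponds to a 2-fold decrease and 0 to no change, which makes decreases and increases symmetric around zero for graphing and testing.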

Statistical analyses

All experiments were performed with three biological replicates. Results are displayed as the mean ± standard error (SE), and p-values were calculated using the two-tailed Student’s t-test. n.s. indicates not significant, (*) indicates p-value < 0.05, (**) indicates p-value < 0.01, (***) indicates p-value < 0.001.
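For readers who want to see how these summary statistics are obtained, the Python sketch below computes the standard error and a two-tailed Student’s t-test using only the standard library. It assumes an unpaired, equal-variance (pooled) test, which the text does not specify, and it obtains the p-value by numerically integrating the t-distribution density with Simpson’s rule rather than calling a statistics package. All function names and data are hypothetical.

```python
import math
from statistics import mean, stdev


def standard_error(values):
    """Sample standard deviation divided by sqrt(n)."""
    return stdev(values) / math.sqrt(len(values))


def t_pdf(x, df):
    """Probability density of Student's t-distribution."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def two_tailed_p(t_stat, df, steps=20000):
    """p = 1 - 2 * integral of the density from 0 to |t| (Simpson's rule)."""
    upper = abs(t_stat)
    if upper == 0:
        return 1.0
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return max(0.0, 1.0 - 2.0 * total * h / 3)


def students_t_test(a, b):
    """ASSUMPTION: unpaired test with pooled (equal) variance."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    pooled_var = ((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2) / df
    t_stat = (mean(a) - mean(b)) / math.sqrt(pooled_var * (1 / na + 1 / nb))
    return t_stat, two_tailed_p(t_stat, df)


# Made-up relative expression values for three biological replicates each.
wt = [1.0, 1.1, 0.9]
ko = [0.25, 0.2, 0.3]
t_stat, p_value = students_t_test(wt, ko)
print(mean(ko), standard_error(ko), t_stat, p_value)
```

With three replicates per group the test has four degrees of freedom, so only fairly large and consistent differences reach the significance thresholds listed above.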

Results

Putative enhancer upstream of Mmp9 drives luciferase reporter activity in trophoblast cell lines

We previously used ChIP-seq for the enhancer-associated histone modification, H3K27ac, in the e7.5 mouse ectoplacental cone and identified a 6820 bp putative enhancer that is 7025 bp upstream of Mmp934 (Fig. 1a). To determine if parts of the identified region could drive enhancer activity, we divided it into seven 1 kb segments (mE1-mE7), and each segment was cloned into the pGL4.23 reporter vector to measure luciferase reporter activity (Fig. 1a). In mTGCs and the HTR-8/SVneo cell line, the mE1 segment had significantly higher luciferase reporter activity than mE2-mE7 (p-value < 0.001) (Fig. 1b and c). To further localize reporter activity, we divided mE1 into two roughly equal segments, mE1.1 and mE1.2 (Supplementary Fig. S1). mE1.1 and mE1.2 were designed to have a high percentage of base pairs conserved between the mouse and human genomes (Supplementary Table S10). The mE1.2 segment had 12.47-fold higher luciferase reporter activity than mE1.1 in mTGCs (p-value < 0.001) (Fig. 1d) and 10.85-fold higher luciferase reporter activity than mE1.1 in the HTR-8/SVneo cell line (p-value < 0.01) (Fig. 1e). We also designed corresponding constructs using the human genome (E1, E1.1, and E1.2). The human constructs included base pairs conserved in the mouse sequence and were designed using the liftover tool from the UCSC genome browser40,41,50 (Supplementary Fig. S1, Supplementary Table S2). The human constructs were transfected in the HTR-8/SVneo cell line to confirm that constructs designed using the human genome had similar reporter activity trends to those designed using the mouse genome. We did indeed observe higher activity for E1.2 compared to E1.1 in the HTR-8/SVneo cell line (3.27-fold, p-value < 0.01) (Supplementary Fig. S1). We also note that the mouse constructs (mE1, mE1.2) exhibited considerably higher luciferase reporter activity when transfected in the HTR-8/SVneo cell line compared to the human constructs (E1, E1.2) in the HTR-8/SVneo cell line (Fig. 1c and e and Supplementary Fig. S1).

These results demonstrate that part of the putative enhancer identified through H3K27ac ChIP-seq (mE1) can drive luciferase reporter activity, and half of the conserved portion of mE1 (mE1.2) had significantly higher luciferase reporter activity than the other half (mE1.1) in both mouse and human trophoblast cell lines (Fig. 1b,c,d,e). The corresponding human sequence constructs (E1.1 and E1.2) showed similar trends in luciferase reporter activity in the HTR-8/SVneo cell line (Supplementary Fig. S1).

Fig. 1

Putative enhancer identified upstream of Mmp9 drives luciferase reporter activity in mouse and human trophoblast cell lines. (a) H3K27ac ChIP-seq data (in red) was used to identify a putative enhancer (light blue bar) 7025 bp upstream of Mmp9 in e7.5 mouse ectoplacental cones. The distance was measured from the middle of the enhancer to the gene transcription start site (determined using the eukaryotic promoter database51). Black bars below the putative enhancer show the E1-E7 (~ 1 kb length each) segments that were cloned into the pGL4.23 reporter vector. Data was visualized on the mm9 genome using the UCSC genome browser40,41. (b,c) E1 has significantly higher luciferase reporter activity than all other constructs in mTGCs (p-value < 0.001) in (b) and the HTR-8/SVneo cell line (p-value < 0.01) in (c). (d,e) When E1 is divided into E1.1 and E1.2, E1.2 luciferase reporter activity is significantly higher than E1.1 in mTGCs (d) and the HTR-8/SVneo cell line (e). Primers and genomic coordinates used for the design of mE1-mE7, mE1.1 and mE1.2 are in Supplementary Table S1, S2 and Supplementary Fig. S1. For Fig. b-e: the y-axis shows luciferase activity relative to the empty pGL4.23 vector. (**) indicates p-value < 0.01 and (***) indicates p-value < 0.001. The black dots represent individual biological replicate values for the experiment. All experiments were performed with three biological replicates. Results are displayed as the mean +/- standard error (SE), and p-values were calculated using the two-tailed student’s t-test.

E1 and E1.2 segments regulate MMP9 gene and protein expression

To determine if E1 and E1.2 regulate MMP9 gene and protein expression, we used CRISPR-Cas9 mediated knockout in the HTR-8/SVneo cell line (Fig. 2a). First, we designed gRNAs to knockout the E1 segment (Fig. 2b). We obtained two E1 knockout lines, E1-KO_c1 and E1-KO_c2, both of which were heterozygous and confirmed by Sanger sequencing (Supplementary Fig. S2). Genomic coordinates and nucleotide sequences for the E1 knockout lines (E1-KO_c1 and E1-KO_c2) are included in Supplementary Table S6 and Supplementary File 2. E1 knockout resulted in a 3.29-fold and 3.47-fold decrease in MMP9 gene expression in E1-KO_c1 and E1-KO_c2 cell lines, respectively (p-value < 0.01, both) (Fig. 2c). We also observed a decrease in MMP9 protein expression by 8.45-fold in the E1-KO_c1 cell line and 5.74-fold in the E1-KO_c2 cell line (p-value < 0.01, both) (Fig. 2d and e, Supplementary Fig. S2).

Next, we designed gRNAs to knockout the E1.2 segment (Fig. 2f). We obtained two E1.2 knockout lines, E1.2-KO_c1 and E1.2-KO_c2, both of which were heterozygous and confirmed by Sanger sequencing (Supplementary Fig. S3). Genomic coordinates and nucleotide sequences for the E1.2 knockout lines (E1.2-KO_c1 and E1.2-KO_c2) are included in Supplementary Table S6 and Supplementary File 2. To our surprise, the knockout of E1.2 resulted in a 4.78-fold (p-value < 0.001) and 2.27-fold (p-value < 0.01) increase of MMP9 gene expression in E1.2-KO_c1 and E1.2-KO_c2 cell lines, respectively (Fig. 2g). Similarly, we observed an increase in MMP9 protein expression by 8.06-fold (p-value < 0.001) in the E1.2-KO_c1 cell line and 3.05-fold (p-value < 0.01) in the E1.2-KO_c2 cell line (Fig. 2h and i, Supplementary Fig. S3).

To confirm the knockout cell lines did not create new TF binding sites, we analyzed the edited genomic sequences using the TOMTOM tool in the MEME Suite47 (see methods). None of the 841 TF motifs from the JASPAR 2022 vertebrates database was determined to be a strong match (defined as E-value < 0.05 and q-value < 0.0547) to the edited genomic sequences. This provides evidence that a newly formed motif is not causing repression in E1-KO cell lines or de-repression in E1.2-KO cell lines.

These results demonstrate that E1 positively regulates MMP9 gene and protein expression in the HTR-8/SVneo cell line. Although E1.2 showed high activity in luciferase assays (Supplementary Fig. S1), CRISPR-Cas9 mediated knockout of E1.2 showed that it negatively regulates MMP9 gene and protein expression in the HTR-8/SVneo cell line.

Fig. 2

The E1 and E1.2 segments regulate MMP9 gene and protein expression. (a) Outline of the CRISPR-Cas9 mediated knockout method using Ribonucleoprotein complexes (RNPs). Two guide RNAs (gRNA) were designed, and each gRNA was fluorescently labeled with either MFP488 or Cy5 and fused with the Cas9 protein to assemble the CRISPR/Cas9-gRNA RNP complex. This complex was then reverse-transfected into the HTR-8/SVneo cell line. Dual-labeled cells were sorted and screened to confirm the knockout of E1 and E1.2. (b) The location of the gRNAs used to knockout the E1 segment (E1-KO). (c) The E1 knockout cell lines (E1-KO_c1 and E1_KO_c2) have significantly decreased MMP9 gene expression compared to the wild-type (WT) cell line. (d) Images and (e) quantification of western blots showing that MMP9 protein expression was significantly decreased in the E1-KO_c1 and E1-KO_c2 cell line compared to the WT cell line. (f) The location of the gRNAs used to knockout the E1.2 segment (E1.2-KO). (g) The E1.2 (E1.2-KO_c1 and E1.2-KO_c2) knockout cell lines have significantly increased MMP9 gene expression compared to the WT cell line. (h) Images and (i) quantification of western blots showing that MMP9 protein expression was significantly increased in the E1.2-KO_c1 and E1.2-KO_c2 cell lines compared to the WT cell line. In (b) and (f), the black bar represents the E1 and E1.2 segments respectively, and the inverted triangles are the nucleotide locations where the gRNA was designed to cut. The colors of the triangles correspond to the different fluorescent labels, MFP488 (green) or Cy5 (red). The black dots in (c), (e), (g) and (i) are individual biological replicate values for the experiment. Details of antibody concentration and cell counts used for western blots are provided in Supplementary Table S9. (a), (b) and (f) were created with BioRender.com. (**) indicates p-value < 0.01, and (***) indicates p-value < 0.001. All experiments were performed with three biological replicates. 
Results are displayed as the mean +/- standard error (SE) and p-values were calculated using the two-tailed student’s t-test.

Small deletions identify activating and repressing segments within E1

To better understand the unexpected increase in MMP9 expression upon knockout of E1.2, we designed eleven constructs (HD1-11) with small deletions across E1. The deletions were based on the location of predicted TF binding sites from the JASPAR Transcription Factor Binding Site Database track, specifically the “JASPAR CORE 2022 - Predicted Transcription Factor Binding Sites” from the UCSC genome browser40,41,44 (Fig. 3a, Supplementary Fig. S4). Luciferase assays showed that, compared to E1, deletion of HD1 decreased luciferase reporter activity by 1.74-fold (p-value < 0.01), deletion of HD9 decreased luciferase reporter activity by 2.23-fold (p-value < 0.01), and deletion of HD10 decreased luciferase reporter activity by 3.28-fold (p-value < 0.001) (Fig. 3b). On the other hand, the deletion of HD11 increased luciferase reporter activity by 3.32-fold compared to E1 (p-value < 0.01) (Fig. 3b), which supports the data showing increased MMP9 expression in E1.2 knockout cells (Fig. 2g–i).

Fig. 3

SDDs across E1 identify segments that change luciferase reporter activity. (a) HD1-HD11 deletion constructs were designed in the human E1-pGL4.23 construct based on the pattern of predicted TF binding sites in the JASPAR track of the UCSC genome browser40,41,44 (Supplementary Fig. S4). (b) HD1, HD9, and HD10 (blue bars) have significantly decreased luciferase reporter activity compared to E1, and HD11 (orange bar) has significantly increased luciferase reporter activity compared to E1. The x-axis shows luciferase activity relative to the empty pGL4.23 vector. The black dots in (b) are individual biological replicate values for the experiment. (**) indicates p-value < 0.01, and (***) indicates p-value < 0.001. All experiments were performed with three biological replicates. Results are displayed as the mean +/- standard error (SE), and p-values were calculated using a two-tailed Student’s t-test.

Transcription factor activator protein 2 C (TFAP2C) binding site is in HD11 and negatively regulates MMP9 expression

We used the JASPAR track from the UCSC genome browser40,41,44 to identify TFs with predicted binding sites in HD11 that could account for the increased activity we observed, and we identified a binding site for transcription factor activator protein 2 C (TFAP2C), a well-known regulator of placental development (Fig. 4a)52,53,54,55. We also confirmed that TFAP2C is expressed in the HTR-8/SVneo cell line using two published RNA-seq datasets. Total RNA-seq data from the HTR-8/SVneo cell line show an average expression of 12.68 fragments per kilobase of transcript per million fragments mapped (FPKM) for TFAP2C56. In another published RNA-seq dataset from the HTR-8/SVneo cell line, where expression is reported in transcripts per million (TPM), the average expression level of TFAP2C is 3.67 TPM43. To determine if the TFAP2C binding site is important for reporter gene activity, we designed primers to delete it from the E1-pGL4.23 construct. Deleting the TFAP2C binding site did indeed increase luciferase reporter activity compared to E1 (1.68-fold, p-value < 0.05, Fig. 4b).

We next used two siRNAs targeting TFAP2C, TFAP2C siRNA#1 and TFAP2C siRNA#2, to knock down its expression. TFAP2C knockdown resulted in a 4.54-fold and a 3.68-fold decrease in TFAP2C gene expression in TFAP2C siRNA#1 and TFAP2C siRNA#2 treated cells, respectively (p-value < 0.05, both) (Fig. 4c). We observed a significant 3.21-fold (p-value < 0.05) increase in MMP9 gene expression in the TFAP2C siRNA#2 treated cells. MMP9 gene expression also increased in TFAP2C siRNA#1 treated cells, but the increase was not statistically significant (2.06-fold, n.s., Fig. 4c). TFAP2C protein expression decreased by 6.05-fold (p-value < 0.01) and 2.90-fold (p-value < 0.001) in TFAP2C siRNA#1 and TFAP2C siRNA#2 treated cells, respectively (Fig. 4d and e, Supplementary Fig. S5). MMP9 protein expression increased by 3.04-fold (p-value < 0.01) and 6.62-fold (p-value < 0.01) in TFAP2C siRNA#1 and TFAP2C siRNA#2 treated cells, respectively (Fig. 4d and e, Supplementary Fig. S5). These results together indicate that TFAP2C, a transcription factor that is expressed in the trophoblast cell line, potentially binds at HD11 to negatively regulate MMP9 expression at the protein level.
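Because the fold changes in this section are quoted as "X-fold" values while Fig. 4c and e are plotted on a log₂ scale, the conversion between the two can be sketched in Python as follows. The sketch assumes that an "X-fold decrease" means control divided by treated equals X, which is our reading of the convention and is not stated explicitly; the function names are hypothetical.

```python
import math


def fold_decrease_to_log2fc(fold_decrease):
    """Convert an 'X-fold decrease' (control / treated = X) to log2(treated / control)."""
    return -math.log2(fold_decrease)


def fold_increase_to_log2fc(fold_increase):
    """Convert an 'X-fold increase' (treated / control = X) to log2(treated / control)."""
    return math.log2(fold_increase)


def percent_remaining(fold_decrease):
    """Expression remaining after an X-fold decrease, as a percentage of control."""
    return 100.0 / fold_decrease


# Fold changes quoted in the text for gene expression (Fig. 4c).
tfap2c_log2fc = {
    "siRNA#1": fold_decrease_to_log2fc(4.54),
    "siRNA#2": fold_decrease_to_log2fc(3.68),
}
mmp9_log2fc = {
    "siRNA#1": fold_increase_to_log2fc(2.06),
    "siRNA#2": fold_increase_to_log2fc(3.21),
}

for label, value in tfap2c_log2fc.items():
    print(f"TFAP2C {label}: log2FC = {value:.2f}")
for label, value in mmp9_log2fc.items():
    print(f"MMP9 {label}: log2FC = {value:.2f}")
```

Under this convention, a 4.54-fold decrease corresponds to roughly 22% of control expression remaining, that is, a knockdown of about 78%.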

Fig. 4

TFAP2C binding site regulates luciferase reporter activity and TFAP2C negatively regulates MMP9 expression. (a) A TFAP2C binding site was identified in HD11. The TFAP2C binding motif is from the JASPAR database44. (b) The TFAP2C binding site shown in (a) was deleted from the E1-pGL4.23 vector (TFAP2C-del), and luciferase assays showed significantly higher activity than E1. The y-axis shows luciferase activity relative to the empty pGL4.23 vector. (c) TFAP2C gene expression was significantly decreased after treatment with both siRNA#1 and siRNA#2. TFAP2C siRNA#1 (blue) did not significantly increase MMP9 gene expression, whereas TFAP2C siRNA#2 (pink) treated cells show a significant increase in MMP9 gene expression. (d) Images and (e) quantification of western blots showing that TFAP2C protein expression was significantly decreased after treatment with both siRNA#1 and siRNA#2. MMP9 protein expression was significantly increased in TFAP2C siRNA#1 and siRNA#2 treated cells. In (c) and (e), the y-axis shows the log₂-transformed fold change values compared to the negative control siRNA treatment, for gene expression and western blot quantification, respectively. The black dots in (b), (c) and (e) are individual biological replicate values for the experiment. Details of antibody concentration and cell counts used for western blots are provided in Supplementary Table S9. n.s. indicates not significant, (*) indicates p-value < 0.05, (**) indicates p-value < 0.01, and (***) indicates p-value < 0.001. All experiments were performed with three biological replicates. Results are displayed as the mean +/- standard error (SE), and p-values were calculated using a two-tailed Student’s t-test.

CRISPR-Cas9 mediated knockout of HD1 and HD9-10 segments decreases MMP9 expression

To determine if the activating segments (HD1, HD9, and HD10) identified within E1 regulate MMP9 expression, we performed CRISPR-Cas9 mediated knockout using the method described in Fig. 2a. We first designed gRNAs to knock out HD1 (Supplementary Table S6, Fig. 5a) and obtained two knockout lines, HD1-KO_c1 and HD1-KO_c2, both of which were heterozygous. HD1 knockout was confirmed by Sanger sequencing, and genomic coordinates and nucleotide sequences are included in Supplementary Table S6 and Supplementary File 2. CRISPR knockout of HD1 resulted in a 3.97-fold and a 10.25-fold decrease in MMP9 gene expression in HD1-KO_c1 and HD1-KO_c2 cell lines, respectively (p-value < 0.001, both) (Fig. 5b). We also observed a decrease in MMP9 protein expression by 3.22-fold (p-value < 0.001) in the HD1-KO_c1 cell line and 3.54-fold (p-value < 0.01) in the HD1-KO_c2 cell line (Fig. 5c and d, Supplementary Fig. S6). Using the JASPAR database track from the UCSC genome browser40,41,44, we identified a binding site for TEA domain transcription factor 4 (TEAD4) within HD1 (Fig. 5e). TEAD4 is an essential mammalian TF that regulates human trophoblast stem cell maintenance57. We also confirmed that TEAD4 was expressed in the HTR-8/SVneo cell line using two published RNA-seq datasets, showing an average expression of 31.22 FPKM56 and 9.75 TPM43. Next, to determine if the TEAD4 binding site is important for reporter gene activity, we deleted it from the E1-pGL4.23 construct. This deletion resulted in a 1.76-fold (p-value < 0.05) decrease in luciferase reporter activity compared to E1 (Fig. 5f).

We next designed gRNAs to knock out HD9 and HD10 together (HD9-10) due to their proximity (Supplementary Fig. S7, Fig. 6a). We generated two knockout lines, HD9-10-KO_c1 and HD9-10-KO_c2, both of which were heterozygous. We confirmed the HD9-10 knockout using Sanger sequencing, and genomic coordinates and nucleotide sequences are included in Supplementary Table S6 and Supplementary File 2. CRISPR-Cas9 mediated knockout of HD9-10 resulted in a 4.37-fold and a 5.73-fold decrease in MMP9 gene expression in HD9-10-KO_c1 and HD9-10-KO_c2 cell lines, respectively (p-value < 0.05, both) (Fig. 6b). We also observed a decrease in MMP9 protein expression by 2.75-fold (p-value < 0.001) in the HD9-10-KO_c1 cell line and 3.13-fold (p-value < 0.001) in the HD9-10-KO_c2 cell line (Fig. 6c and d, Supplementary Fig. S7). Using the JASPAR database track in the UCSC genome browser40,41,44, we identified transcription factor AT-rich interaction domain 3 A (ARID3A) binding sites in HD9 and HD10 (Fig. 6e). ARID3A is a TF crucial for mouse placental development, driving trophoblast differentiation and maintaining the structural integrity of the placenta58. We again confirmed, using two published RNA-seq datasets, that ARID3A was expressed in the HTR-8/SVneo cell line at an average level of 7.37 FPKM56 and 4.23 TPM43. Next, to determine if the ARID3A binding sites were important for reporter gene activity, we deleted the ARID3A binding site in HD9 (ARID3A_1-del) and in HD10 (ARID3A_2-del). ARID3A_1-del decreased luciferase reporter activity by 2.89-fold compared to E1, and ARID3A_2-del decreased luciferase reporter activity by 2.66-fold compared to E1 (p-value < 0.05, both) (Fig. 6f). Finally, we confirmed that the deletions in the knockout cell lines did not create new TF binding sites using TOMTOM, as described above.

In summary, HD1, HD9, and HD10 were confirmed as activating segments within E1 using CRISPR-Cas9 mediated knockout, and we associated binding sites for placental TFs TEAD4 and ARID3A with the reporter gene activity.

Fig. 5

The HD1 segment regulates MMP9 gene and protein expression. (a) The location of the gRNAs used to knock out the HD1 segment (HD1-KO). The HD1 segment is the blue bar. The inverted triangles are the nucleotide locations where the gRNA was designed to cut, and the colors of the triangles correspond to the different fluorescent labels, MFP488 (green) or Cy5 (red). Created with BioRender.com. (b) The HD1 knockout cell lines (HD1-KO_c1 and HD1-KO_c2) have significantly decreased MMP9 gene expression compared to the WT cell line. (c) Images and (d) quantification of western blots showing that MMP9 protein expression was significantly decreased in the HD1-KO_c1 and HD1-KO_c2 cell lines compared to the WT cell line. Details of antibody concentration and cell counts used for western blots are provided in Supplementary Table S9. (e) A TEAD4 binding site was identified in HD1. The TEAD4 binding motif is from the JASPAR database. (f) The TEAD4 binding site was deleted from the E1-pGL4.23 vector (TEAD4-del), and luciferase assays showed significantly decreased activity for TEAD4-del compared to E1. The y-axis shows luciferase activity relative to the empty pGL4.23 vector. The black dots in (b), (d) and (f) are individual biological replicate values for the experiment. (*) indicates p-value < 0.05, (**) indicates p-value < 0.01, and (***) indicates p-value < 0.001. All experiments were performed with three biological replicates. Results are displayed as the mean +/- standard error (SE), and p-values were calculated using a two-tailed Student’s t-test.

Fig. 6

The HD9-10 segment regulates MMP9 gene and protein expression. (a) The location of the gRNAs used to knock out the HD9-10 segment (HD9-10-KO). The HD9-10 segment is the blue bar. The inverted triangles are the nucleotide locations where the gRNA was designed to cut, and the colors of the triangles correspond to the different fluorescent labels, MFP488 (green) or Cy5 (red). Created with BioRender.com. (b) The HD9-10 knockout cell lines (HD9-10-KO_c1 and HD9-10-KO_c2) have significantly decreased MMP9 gene expression compared to the WT cell line. (c) Images and (d) quantification of western blots showing that MMP9 protein expression was significantly reduced in the HD9-10-KO_c1 and HD9-10-KO_c2 cell lines compared to the WT cell line. Details of antibody concentration and cell counts used for western blots are provided in Supplementary Table S9. (e) Two ARID3A binding sites were identified in HD9-10. The ARID3A binding motif is from the JASPAR database44. (f) The ARID3A binding sites were each deleted from the E1-pGL4.23 vector (ARID3A_1-del and ARID3A_2-del), and luciferase assays showed significantly decreased activity compared to E1. The y-axis shows luciferase activity relative to the empty pGL4.23 vector. The black dots in (b), (d) and (f) are individual biological replicate values for the experiment. (*) indicates p-value < 0.05, and (***) indicates p-value < 0.001. All experiments were performed with three biological replicates. Results are displayed as the mean +/- standard error (SE), and p-values were calculated using a two-tailed Student’s t-test.

Discussion

In this study, we systematically dissected a regulatory element upstream of MMP9 and identified activating and repressing segments using luciferase assays and CRISPR-Cas9 mediated knockout in a human trophoblast cell line. We further identified TFs that may bind to the activating and repressing segments to regulate MMP9 gene expression.

While many studies have investigated how the MMP9 promoter30,59 or regions proximal to it60,61 (defined as < 2 kb from the transcription start site62) regulate MMP9 expression, few studies have investigated distal enhancer mediated MMP9 regulation using molecular genetic approaches. Distal enhancers, located >2 kb from the MMP9 transcription start site, have been identified in a human colon cancer cell line and in mouse macrophages60,63. However, these distal enhancers do not overlap with the E1 region identified in the present study, and different TFs that may regulate the enhancer were identified, emphasizing the need to investigate MMP9 gene regulation in different cellular contexts. Interestingly, both studies provided strong evidence that enhancer RNAs (eRNAs) could be transcribed from the enhancers to regulate MMP9 gene expression60,63. While RNA-seq data in the HTR-8/SVneo cell line56 does not indicate the presence of eRNAs overlapping with the E1 region we identified, RNA-seq may not always accurately measure the presence of eRNAs due to their instability. Future studies could use GRO-seq to determine if the E1 region is transcribed into an eRNA to regulate MMP9 gene expression in trophoblast cells. Additionally, techniques such as chromosome conformation capture or investigating phase condensate models could yield important insights into how the E1 region interacts with the MMP9 promoter. The present study remains significant because it is the first to demonstrate that the deletion of a distal DNA segment regulates MMP9 expression, and that activating and repressing TFs may bind to fine-tune MMP9 gene expression.

The concept of activating and repressing TFs binding to the same regulatory element in the same context is similar to what has been described in Drosophila. In the Drosophila Kc cell line, an enhancer containing adjacent activator and repressor sequences that control target gene expression was identified64. While the Kc cell line contains a homogeneous cell population65, it is important to note that the HTR-8/SVneo cell line is somewhat heterogeneous, containing both epithelial and mesenchymal cells66. It is possible that the repressing TF binds to reduce MMP9 expression in epithelial cells, whereas the activating TFs bind in mesenchymal cells and increase MMP9 expression. This mode of regulation would be more similar to that of dual enhancer-silencers, which have been well described9,10,11,12. In the future, using FACS based on cell size or markers to separate the subpopulations of the heterogeneous HTR-8/SVneo cell line will further advance our understanding of how the MMP9 gene is regulated in different cellular contexts.

MMP9 is thought to have a role in both the mouse and human placenta27,29,30,31,32,33,67, but whether the gene is regulated by conserved mechanisms is unknown. The mouse mE1 segment we tested has 60.2% of its bases conserved in the human genome. Among the segments mE1–mE7, mE1 drives the strongest luciferase activity in both mTGCs and the HTR-8/SVneo cell line. However, it should be noted that the luciferase activity of mE1 is much higher in mTGCs than in the HTR-8/SVneo cell line. The human E1.2 segment, a subsegment of E1, also showed lower activity in the HTR-8/SVneo cell line compared to mE1.2 activity in mTGCs. However, the relative trend of enhancer activity across constructs was consistent between the mTGCs and the HTR-8/SVneo cell line, despite the sequences not being fully conserved (Supplementary Table S10), as seen in previous studies68.

While we did knock down TFAP2C and confirm it as a negative regulator of MMP9, we did not perform experiments to confirm TEAD4 or ARID3A as positive regulators of MMP9 expression in the HTR-8/SVneo cell line. Experiments using siRNA-mediated knockdown or CRISPR-KO approaches to knock down or delete TEAD4 and ARID3A in the HTR-8/SVneo cell line, followed by molecular assays, would help further support their potential contribution as TFs that activate MMP9 expression in the placenta.

Another limitation of our study is that we used a cell culture system, providing only a controlled system to measure enhancer potential. Such in vitro systems do not fully recapitulate the complexity of in vivo regulation, where other placental cell types, cell–cell interactions, and extracellular matrix components may influence MMP9 expression. These in vivo factors could either amplify or dampen enhancer activity compared to our observations in the cell lines. Future work using transgenic animal models would be important to validate regulatory effects in their native chromatin and tissue context.

Our study highlights the importance of dissecting putative enhancers into smaller segments when investigating the mechanisms of gene expression. The CRISPR-Cas9 mediated knockouts of both the HD1 segment (a small part of E1.1) and the HD9-10 segment (a small part of E1.2) resulted in significant reductions in MMP9 expression. At this time, it is unclear if the segments function independently or together to regulate MMP9. One possibility is that simultaneous knockout of HD1 and HD9-10 would result in an even greater reduction in MMP9 expression. This result would indicate enhancer synergy, in which multiple elements coordinate to drive total enhancer activity69,70,71. The fact that E1.1 alone does not drive reporter gene activity as strongly as the complete E1 segment provides some evidence for enhancer synergy. However, the presence of the repressive HD11 segment, also in E1.2 and adjacent to HD9-10, complicates our understanding of the regulatory element’s function. Unlike the HD1 and HD9-10 regions, we could not design specific gRNAs with high on-target and low off-target effects for the HD11 segment due to its proximity to HD9-10. Nevertheless, luciferase assays revealed that HD11 is a repressive element within the E1.2 region. A more detailed dissection into the coordination of these segments will be necessary to fully confirm possible synergy between HD1 and HD9-10, as the repressive influence of HD11 likely impacts the overall activating capacity of this regulatory element.

In addition to understanding basic mechanisms of gene expression, our study may have clinical relevance. The TFAP2C binding site we identified in HD11 overlaps with a SNP, rs11697325. The A-8202 allele at this SNP was previously associated with a 2.8-fold increased risk of first-trimester pregnancy loss72. Furthermore, it has been suggested that the A-8202 allele is associated with increased MMP9 gene expression73. Consistent with these findings, immunohistochemical analyses of placentas from recurrent miscarriages have shown elevated MMP9 expression in trophoblasts32. These observations suggest that the rs11697325 SNP could interfere with TFAP2C-mediated repression, increasing MMP9 expression and potentially contributing to recurrent pregnancy loss. Further investigation into the A-8202 allele in this region is necessary to determine whether it directly affects MMP9 cis-regulatory activity and gene expression.

In this study, we identified a distal cis-regulatory element that contains both activating and repressing segments and is responsible for fine-tuned control of MMP9 expression in a human trophoblast cell line. This architecture may be generalizable, but its prevalence is unknown, and determining how common it is will require characterizing additional enhancers with functional approaches similar to those used in this study. This discovery underscores the complexity of enhancer architecture and suggests that further dissection of other putative enhancers is crucial for uncovering precise regulatory mechanisms, and may reveal similar activating and repressing segments at other gene loci.