Multimeric transcription factor BCL11A utilizes two zinc-finger tandem arrays to bind clustered short sequence motifs

Horton, John R.; Yu, Meigen; Zhou, Jujun; Tran, Melody; Anakal, Rithvi R.; Lu, Yue; Blumenthal, Robert M.; Zhang, Xiaotian; Huang, Yun; Zhang, Xing; Cheng, Xiaodong

doi:10.1038/s41467-025-58998-7

Article
Open access
Published: 17 April 2025

Nature Communications volume 16, Article number: 3672 (2025)

Abstract

BCL11A, a transcription factor, is vital for hematopoiesis, including B and T cell maturation and the fetal-to-adult hemoglobin switch. Mutations in BCL11A are linked to neurodevelopmental disorders. BCL11A contains two DNA-binding zinc-finger arrays, low-affinity ZF2-3 and high-affinity ZF4-6, separated by a 300-amino-acid linker. ZF2-3 and ZF4-5 share 73% identity, including five out of six DNA base-interacting residues. These arrays bind similar short sequence motifs in clusters, with the linker enabling a broader binding span. Crystallographic structures of ZF4-6, in complex with oligonucleotides from the β-globin locus region, reveal DNA sequence recognition by residues Asn756 (ZF4), Lys784 and Arg787 (ZF5). A Lys784-to-Thr mutation, linked to a neurodevelopmental disorder with persistent fetal globin expression, reduces DNA binding over 10-fold but gains interaction with a variable base pair. BCL11A isoforms may form oligomers, enhancing chromatin occupancy and repressor functions by allowing multiple copies of both low- and high-affinity ZF arrays to bind DNA. These distinctive properties, apparently conserved among vertebrates, provide essential functional flexibility to this crucial regulator.

Introduction

DNA-binding transcription factors (TFs) are central to regulating gene expression across the biological kingdoms^1,2,3. TFs typically bind to specific genomic locations in a DNA sequence-specific manner. Approximately 1500 sequence-specific DNA-binding TFs have been annotated in the human genome^4,5. Among these, three structural classes account for the majority of TFs: helix-loop-helix proteins (~100), homeodomain proteins (~250), and Cys2-His2 (C2H2) zinc-finger proteins (~700). Many regulatory promoter and enhancer elements are controlled by multiple TFs. These DNA-binding proteins often use a protein-protein interaction domain to associate with identical molecules, or other family members, to form functional dimers (homo- or heterodimers), tetramers, or higher-order oligomers^6,7. Such interactions enhance their binding affinity, extend the length of the recognition sequence (thus boosting specificity), and increase chromatin residency time, all of which contribute to stably controlling transcriptional output. Specific multimeric examples include the AP-1 (Fos/Jun) dimer⁸, Myc/Max⁹, NF-κB¹⁰, and p53 tetramer¹¹.

Among the ~700 C2H2 zinc-finger (ZF) proteins, many include an additional protein-protein interaction domain at their N-terminus: KRAB (Krüppel-associated box)¹², BTB (broad-complex, tramtrack, and bric-a-brac; also known as POZ for poxvirus and zinc finger)¹³, and SCAN (SRE-ZBP, Ctfin51, AW-1 (ZNF174), Number 18 cDNA (ZNF20))¹⁴. There are an estimated ~350 KRAB-ZF proteins¹⁵ (about half of all C2H2 ZF proteins), ~50 BTB-ZF proteins¹⁶, and ~70 SCAN-ZF proteins^17,18 (about 10% of C2H2 ZF proteins). Some proteins (~24) contain both SCAN and KRAB domains. While the KRAB domain is almost exclusively found in C2H2-ZF proteins, the SCAN and BTB domains are also present in non-ZF proteins^13,19. Some BTB-ZF proteins polymerize into filaments to regulate chromatin occupancy^20,21. The remaining ~200 C2H2-ZF proteins lack a recognizable protein-protein interaction domain, leaving their regulatory mechanisms unclear.

Specific mutations in the human adult β-globin gene result in sickle cell disease and β-thalassemia, both of which can be alleviated by increased expression of fetal globin (HbF)²². Using CRISPR-Cas9 screens of ~1500 DNA binding proteins in Human Umbilical cord blood-Derived Erythroid Progenitor (HUDEP-2) cells, which express adult hemoglobin, two studies identified several groups of candidate regulators (including ZNF410) of HbF gene expression^23,24. These direct or indirect regulators include several C2H2 ZF proteins, some of which were already known (prior to the CRISPR screens) to silence HbF genes in adult human erythroid cells: BCL11A²⁵, ZBTB7A²⁶, and KLF1^27,28. Among these, ZBTB7A is a BTB/POZ-ZF protein, while the others lack a recognizable protein-protein interaction domain²⁹. Manipulating TF activities can have therapeutic value^30,31,32,33,34.

In this study, we investigate the DNA binding ability of B-cell leukemia/lymphoma 11A (BCL11A). BCL11A (i) is highly conserved among vertebrates²⁹, (ii) plays a critical role in lymphocyte development during normal hematopoiesis, (iii) directly suppresses HBG1/2 expression (the genes for fetal γ-globin) in adult erythroid cells, (iv) controls the fetal-to-adult hemoglobin switch³⁵, and (v) has a number of pathogenic human variants^36,37,38. BCL11A³⁹ contains two ZF tandem arrays (ZF2-3 and ZF4-6) separated by a 300-amino-acid spacer (Fig. 1A). Our findings show that each ZF array in BCL11A can bind a similar six-base pair consensus motif. After reexamining published ChIP-seq peaks⁴⁰, we noticed that multiple copies of these homotypic motifs are present in BCL11A binding sites. We propose that DNA binding by the two ZF tandem arrays significantly extends the recognized sequence, thereby stabilizing transcriptional silencing of HBG1/2, together with BGLT3 and HBBP1, in HUDEP-2 erythroid cells. We tested this hypothesis through structural and DNA-protein interaction analyses.

Results

BCL11A contains two ZF tandem arrays predicted to bind similar sequences

Human BCL11A has 22 isoforms listed in GenBank (Supplementary Table 1), which range in size from 130 to 835 residues. Of these, four are major splicing isoforms (Fig. 1A; bolded lines in Supplementary Table 1). The largest isoform (XL) includes 835 amino acids and contains six classic C2H2-ZF units. These six ZFs are in three groups (ZF1, ZF2-ZF3, and ZF4-ZF5-ZF6), separated by long stretches of residues (175 between ZFs 1 and 2, and 300 between ZFs 3 and 4). The L isoform lacks the C-terminal three-finger array, the S isoform is missing two-thirds of its C-terminal fragment (including five of the six fingers), and the XS isoform contains only 142 residues. The two ZF-containing, longer isoforms (XL and L) are expressed in primary adult human erythroid cells at both the RNA and protein levels²⁵.

The two ZF cluster arrays (ZF2-3 and ZF4-6) in BCL11A XL are conventional DNA-binding fingers. Each finger coordinates a zinc ion through two cysteine and two histidine residues, following the typical spacing pattern (Cx₂C-x₁₂-Hx₃₋₄H), with twelve residues between the last zinc-coordinating Cys and the first zinc-coordinating His (reviewed in ref. ⁴¹). Relative to the first Zn-coordinating His (Fig. 1B), the amino acids at positions -1, -4 and -7 of each finger are the principal determinants for recognizing three consecutive DNA bases, primarily on one DNA strand (referred to as the recognition strand; detailed in the structural analysis section).
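The spacing pattern described above can be expressed as a simple pattern match. The following is a minimal sketch, not the prediction method of ref. ⁴⁴: the helper `find_fingers` is hypothetical, and the peptide is an illustrative string whose base-contacting positions were chosen to read Asn, Asn and Val; it is not a verified BCL11A sequence.

```python
import re

# Cx2C - x12 - Hx3-4H: two Cys two residues apart, twelve residues,
# then two His separated by three or four residues.
C2H2 = re.compile(r"C.{2}C.{12}H.{3,4}H")

def find_fingers(protein):
    """Return (start, res -7, res -4, res -1) for each C2H2-like match.

    Positions are counted relative to the first zinc-coordinating His.
    Hypothetical helper for illustration only.
    """
    hits = []
    for m in C2H2.finditer(protein):
        first_his = m.start() + 16  # C + 2 + C + 12 residues precede the first His
        hits.append((m.start(),
                     protein[first_his - 7],
                     protein[first_his - 4],
                     protein[first_his - 1]))
    return hits

# Illustrative peptide (assumption): short flanks around one finger-like unit.
toy = "AA" + "CEYCGKVFKNCSNLTVHRRSH" + "TGEKP"
fingers = find_fingers(toy)
print(fingers)
```

A pattern match of this kind only flags candidate fingers; it says nothing about whether a finger actually binds DNA, which is why the atypical ZF0 and ZF1 need separate treatment.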

In addition to the five canonical fingers, BCL11A also contains two atypical, isolated fingers near its N-terminus, referred to as ZF0 and ZF1 (Fig. 1B). ZF0 differs from the canonical fingers in three respects. First, in ZF0, the last His is substituted by a Cys, resulting in C3H1 (or CCHC) zinc coordination, similar to that seen in ZF4 of ZBTB7A⁴² and ZF11 of CTCF⁴³. Second, ZF0 has 14 residues, instead of the usual 12, between the last Zn-coordinating Cys and the first Zn-coordinating His. Third, ZF0, like ZF1, has a stretch of five hydrophobic residues in place of the polar and charged residues in conventional ZF units (underlined sequences in Fig. 1B). Using a prediction method for C2H2-ZFs⁴⁴, the resulting bit scores for the BCL11A fingers ranged from 17.5 (ZF1) to 28 (ZF2-3) and 30 (ZF4-6) (Fig. 1B; higher scores indicate higher confidence). No score could be calculated for ZF0. While ZF0 is present in all four major isoforms of BCL11A (Fig. 1A), previous studies suggested that ZF0 and ZF1 are dispensable for HbF silencing by the XL isoform in mouse erythroleukemia cells³⁵. However, more recent research indicated that ZF0 and ZF1 are required for BCL11A tetramer or dimer formation, which is essential for maintaining steady-state protein levels and properly silencing HbF in HUDEP-2 cells⁴⁵. Additionally, introducing three pathogenic mutations found in patients with intellectual disability (T47P, C48F, and H66Q) into ZF0 of the L isoform of BCL11A prevented nuclear localization in HEK293T cells³⁸.

Pairwise comparison between ZF2-3 and ZF4-5 shows a significantly higher percentage of identical amino acids (38/52, 73%) (Fig. 1C) than within each array. Specifically, aside from the zinc-coordinating Cys2His2 residues themselves, ZF2 and ZF3 share only two identical residues (Ser at position -5 and Leu at -3), while ZF4 and ZF5 share just three residues (including a Glu between the two zinc-coordinating cysteine residues) (Fig. 1B). The higher sequence identity between ZF2-3 and ZF4-5, particularly the 5/6 base-interacting residues at positions -1, -4 and -7, suggests that these two ZF arrays might bind closely related DNA sequences. Indeed, DNA sequences of six or nine base pairs were predicted to bind with ZF2-3 or ZF4-6, respectively (Fig. 1D), following the traditional rule of one finger recognizing three base pairs. In an in vitro protein binding microarray experiment, N-terminal tagged GST-ZF2-3 and GST-ZF4-6 fragments bind a similar consensus sequence (Fig. 1E)³⁵. The predicted binding sequences for both ZF arrays (ZF2-3 and ZF4-6) partially align with experimentally determined motifs at DNA sequence positions 2, 5 and 6, as identified by ChIP-seq⁴⁰ and CUT&RUN³⁵ in HUDEP-2 cells⁴⁶. However, the predicted and experimentally determined sequence specificities do not match at two adjacent positions: the actual guanine at position 3 and a predicted thymine at position 4 (compare Fig. 1D and F).

To better understand DNA recognition by BCL11A, we generated two protein fragments of BCL11A, residues 373-433 (ZF2-3) and residues 731-835 (ZF4-6) (Fig. 1G). Additionally, in the ZF4-6 fragment we created the mutant K784T (which reflects a pathogenic human mutation³⁷), which alters the base-interacting position -4 in ZF5 (Fig. 1B).

BCL11A binding peaks contain multiple copies of the binding motif

Given that the experimentally determined sequence motif bound by BCL11A (XL or L) is relatively short (six nucleotides) compared to the ChIP-seq peak width of over 200-300 nucleotides, we investigated the number of motifs present under each peak. We analyzed five datasets from two independent experiments: two replicates of the endogenously inducible BCL11A-ER-V5 in HUDEP-2 cells using V5 antibody⁴⁰ and three replicates of the BCL11A CUT&RUN³⁵. The results from replicate 1 of Martyn et al.⁴⁰ are shown in Fig. 2 and are consistent across replicates/experiments (Supplementary Fig. 1).

**Fig. 2: BCL11A ChIP-seq peaks contain multiple copies of the binding motif.**

We searched for the 6-mer motif TGNCCN, allowing variations at positions 3 and 6 (where N is any nucleotide), or its complementary sequence NGGNCA. Since only four base pair positions (1, 2, 4 and 5) are constant, we expect to find one occurrence of this motif approximately every 256 (= 4⁴) base pairs. Among the 4694 peaks examined, 3882 (82.7%) have at least two motifs, with a median of 4 (Fig. 2A). We used the peak width to calculate a theoretical expected motif count for each peak and determined that the observed mean motif count of four was 1.6× higher than the mean expected motif count (2.45). These differences were significant as indicated by the non-parametric Mann–Whitney U test (p = 1.3 × 10^-266) (Fig. 2B). We observed that motif count was positively correlated with peak width (Spearman correlation coefficient = 0.42, p = 2.2 × 10^-195) (Fig. 2C).
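The motif search and the 1-in-256 expectation can be sketched in a few lines. This is an illustrative reconstruction, not the analysis pipeline used for Fig. 2: the function names and the example sequence are hypothetical, and counting overlapping sites and collapsing the palindromic TGGCCA to a single position are assumptions.

```python
import re

# TGNCCN on the given strand, or its reverse complement NGGNCA.
# Lookaheads allow overlapping occurrences to be found.
FORWARD = re.compile(r"(?=TG[ACGT]CC[ACGT])")
REVERSE = re.compile(r"(?=[ACGT]GG[ACGT]CA)")

def motif_positions(seq):
    """Return sorted 0-based start positions of motif matches on either strand.

    A site matching both patterns at the same start (TGGCCA) is reported once.
    """
    seq = seq.upper()
    starts = {m.start() for m in FORWARD.finditer(seq)}
    starts |= {m.start() for m in REVERSE.finditer(seq)}
    return sorted(starts)

def expected_motifs(width, fixed_positions=4):
    """Rough expectation from the text: one occurrence per 4**fixed_positions bp."""
    return width / 4 ** fixed_positions

peak = "AATGACCAGTTTGGTCATT"  # hypothetical sequence, not a real peak
positions = motif_positions(peak)
print(len(positions), expected_motifs(len(peak)))
```

Applied per peak, the observed counts and the width-based expectations give the two distributions that a rank-based test such as Mann–Whitney U would then compare.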

Next, we focused our analysis on the β-globin gene cluster on chromosome 11, where BCL11A naturally binds (Fig. 2D). The human β-globin cluster includes an embryonic ε gene, two fetal γ genes (HBG2 and HBG1), a non-coding BGLT3 (beta globin locus transcript 3), a pseudogene HBBP1 (hemoglobin beta subunit pseudogene 1), and the adult δ and β genes⁴⁷. The order of genes in the cluster reflects their temporal activation order during development: the first gene switch occurs in early gestation when the embryonic ε-gene is silenced and the fetal γ-genes are activated (ε-to-γ); and the second switch occurs around birth, when the adult β-globin gene is activated, replacing fetal γ-globin gene expression (γ-to-β)⁴⁸. In human erythroid HUDEP-2 cells, which express adult globins, BCL11A binding peaks were observed at the promoters of the γ-globin genes (HBG1 and HBG2), as well as the upstream locus control region (LCR), and downstream at BGLT3 and HBBP1^35,40 (Fig. 2D).

We analyzed each BCL11A binding peak in the β-globin locus for the presence of the TGNCCN motif. The locus control region (LCR) peak #1 contains six copies of this motif, peaks #2, #3 and #5 occupy the regulatory regions of HBG2, HBG1, and HBBP1, while peaks #4 and #6 occupy the intragenic regions of BGLT3 and HBB (Fig. 2D). The BGLT3 long non-coding RNA (ncRNA) induces γ-globin expression^49,50. Positioned ~1-kb downstream of BGLT3 is the HBBP1 pseudogene (Fig. 2D), and overexpression of HBBP1 mRNA increases γ-globin expression in HUDEP-2 cells⁵¹. On the other hand, the HBBP1 genomic region might contain a repressive regulatory element for γ-globin silencing in adult erythroblasts, as deleting the HBBP1 region together with the intergenic region between BGLT3 and HBBP1 produces effects similar to those of BCL11A deletion⁵². HBBP1 mRNA associates with the RNA-binding protein HNRNPA1⁵³. Additionally, a genome-wide association study revealed that the HBBP1 SNP rs2071348 (g.5264146 A > C) was associated with β-thalassemia severity^54,55. Another SNP, rs10128556 (g.5220259 C > T), was found in the second intron of HBBP1⁵⁶. Given these associations between BCL11A and the HBBP1 region, and the implication of BCL11A in expression of the HBBP1 region, we focus here on the BCL11A binding site (peak #5) at HBBP1, which has six motifs (Fig. 2F).

ZF2-3 and ZF4-6 fragments bind TGNCCA

We synthesized two 20-bp DNA oligonucleotides (oligos #1 and #2), containing the TGACCA or TGCCCA motif, derived from the BCL11A binding site between BGLT3 and HBBP1 (Fig. 3A). The two oligos are nearly identical in their 5’ halves, differing only at one position between oligos #1 and #2, and are variable in their 3’ halves, conserved only at one position (Fig. 3B). We used three biophysical methods to characterize the binding affinities of BCL11A’s ZF2-3 and ZF4-6 arrays with these oligos: (a) fluorescence polarization (FP), (b) electrophoretic mobility shift assay (EMSA), and (c) isothermal titration calorimetry (ITC).

**Fig. 3: ZF2-3 and ZF4-6 bind TGNCCA element.**

Using FP, we measured the dissociation constants (K_D). ZF2-3 bound oligos #1 and #2 weakly, with a K_D in the low micromolar range (>2 μM) (Fig. 3C, D). Under the same conditions, ZF4-6 showed 20–50-fold higher binding affinity, with K_D values of 110 nM for oligo #2 and 43 nM for oligo #1 (Fig. 3E, F). A mutation in ZF4-6 (K784T; shown in red in ZF5 in Fig. 1B) reduced binding affinity by about 10-fold, to >1 μM for oligo #2 and 0.4 μM for oligo #1 (Fig. 3G, H), though binding was still significant. [The K784T mutation is discussed further below; it is associated with a distinct neurodevelopmental disorder that is also characterized by persistent HbF expression³⁷.]
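The fold differences quoted above follow directly from the K_D values. The sketch below is illustrative only: it assumes a single-site isotherm with free protein approximately equal to total protein, treats the ">2 μM" value for ZF2-3 as a lower bound of 2000 nM, and uses a hypothetical helper `fraction_bound`; it is not the fitting procedure used for Fig. 3.

```python
def fraction_bound(protein_nM, kd_nM):
    """Single-site isotherm: fraction of DNA bound at a given protein concentration.

    Assumes free protein is approximately equal to total protein.
    """
    return protein_nM / (kd_nM + protein_nM)

# K_D values quoted in the text, in nM. The ZF2-3 entry is a lower bound (>2 uM).
kd = {
    "ZF2-3 (lower bound)": 2000.0,
    "ZF4-6 / oligo #1": 43.0,
    "ZF4-6 / oligo #2": 110.0,
}

# Ratio of K_D values = fold difference in affinity.
fold_oligo1 = kd["ZF2-3 (lower bound)"] / kd["ZF4-6 / oligo #1"]  # roughly 47
fold_oligo2 = kd["ZF2-3 (lower bound)"] / kd["ZF4-6 / oligo #2"]  # roughly 18

# By definition, half of the DNA is bound when [protein] equals K_D.
half = fraction_bound(43.0, 43.0)
print(round(fold_oligo1, 1), round(fold_oligo2, 1), half)
```

Because the ZF2-3 value is only a lower bound, these ratios are minimum estimates of the affinity gap between the two arrays.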

The EMSA revealed distinctive binding modes not seen in the FP assays. For the high-affinity interaction between ZF4-6 and oligo #1, a single shifted band appeared at or above the K_D value of ~40 nM (Fig. 3L). For oligo #2, the first shift occurred around 40 nM, and a second shift became visible at ZF4-6 protein concentrations above 0.6 μM (Fig. 3K). Similar two-band shifts were observed for the K784T mutant (Fig. 3M, N), which has reduced binding affinity for both oligos. For the ZF2-3 fragment, which has the lowest binding affinity, super-shifted bands were seen at higher protein concentrations for both oligos (Fig. 3I, J). These additional retardations in protein-DNA complex mobility at increased protein concentrations suggest that multiple ZF proteins can bind to a single oligo.

ITC provides the stoichiometry of the interaction, in addition to measuring binding affinity. We used ITC to repeat the binding experiments of ZF4-6 with oligos #1 and #2 by titrating the oligos into the protein sample. For oligo #1, the DNA-to-protein stoichiometry was 1.1, with a K_D value of 22 nM (Fig. 3O), consistent with results from FP and EMSA. For oligo #2, the DNA-to-protein stoichiometry was 0.5 (indicating one DNA molecule bound by two protein molecules), with a K_D value of 1.1 μM (Fig. 3P).

Notably, the three different biophysical methods captured different aspects of the binding process in the case of ZF4-6 binding oligo #2: FP detected the first binding event with a lower K_D of 0.11 μM (Fig. 3E), ITC revealed the second binding event (with a DNA-to-protein stoichiometry of 0.5) and a higher K_D of 1.1 μM (Fig. 3P), while EMSA visually demonstrated the two binding events as distinct band shifts (Fig. 3K). As discussed below, the binding of two ZF4-6 polypeptides to oligo #2, but only single binding to oligo #1, may reflect the presence of a partial second match to the binding motif in oligo #2 (TGCCCA, and acCCCA; Fig. 3A, B).
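The two binding events can be pictured with a simple sequential two-site model. The sketch below is illustrative only: it assumes that the FP value (0.11 μM) describes the first site and the ITC value (1.1 μM) the second, that free protein approximates total protein, and that the two sites do not influence each other. `two_site_populations` is a hypothetical helper, not part of the reported analysis.

```python
def two_site_populations(protein_uM, kd1_uM=0.11, kd2_uM=1.1):
    """Fractions of DNA that are free, singly bound and doubly bound.

    Sequential model: DNA + P <-> DNA.P (kd1), DNA.P + P <-> DNA.P2 (kd2).
    """
    w1 = protein_uM / kd1_uM          # statistical weight of the singly bound state
    w2 = w1 * protein_uM / kd2_uM     # statistical weight of the doubly bound state
    total = 1.0 + w1 + w2
    return 1.0 / total, w1 / total, w2 / total

# Protein concentrations chosen to bracket the EMSA observations (assumption).
for p in (0.04, 0.6, 5.0):
    free, single, double = two_site_populations(p)
    print(f"{p:>5} uM  free={free:.2f}  single={single:.2f}  double={double:.2f}")
```

Under these assumptions the doubly bound species becomes prominent only once the protein concentration approaches the second K_D, which is qualitatively in line with a second EMSA shift appearing at higher protein concentrations.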

Structure of ZF4-6 binding TGACCA-containing oligo #1 specifically and non-specifically

To understand how BCL11A ZF arrays bind DNA, and why multiple protein molecules sometimes occupy the same DNA, we structurally characterized ZF4-6 in complex with DNA. Previous studies have analyzed ZF4-6 of BCL11A bound to a 12-bp DNA sequence, centered at the -115 position of the HBG1/2 promoter⁵⁷. More recently, ZF4-6 was reported in complex with a longer 19-bp DNA⁵⁸, though the sequence used did not exactly match any in the human genome (Fig. 4A).

**Fig. 4: ZF4-6 binding TGACCA-containing DNA oligo #1 specifically and non-specifically.**

We co-crystallized ZF4-6 with oligo #1, derived from human chromosome 11 (5,243,701-5,243,720), and determined its structure at 2.09 Å resolution (Supplementary Table 2). In our binding studies (Fig. 3F, L and O), both FP and EMSA showed K_D values of ~40 nM (ITC gave 22 nM), while both EMSA and ITC indicated binding of a single ZF4-6 molecule (no supershift in EMSA; 1:1 stoichiometry in ITC). However, unlike previous structures, we found one double-stranded DNA molecule and three protein molecules (A, B and C) in the crystallographic asymmetric unit (Fig. 4B). Pairwise superimposition showed a root-mean-square deviation (rmsd) of 0.7 Å across 51 aligned residues of ZF4-5 between molecule A and molecule B, and 0.3 Å across 59 aligned residues of ZF5-6 between molecule A and molecule C. The largest movement (5.9 Å) was observed at Gly796, located in the linker between ZF5 and ZF6 (Fig. 4C). As described below, molecule A is involved in DNA-sequence specific interactions, while molecule B engages in non-specific DNA interactions, likely reflecting the very high protein concentration during crystallization (mM during crystallization versus µM during EMSA and ITC). Molecule C is probably an artifact of the crystal lattice, where two DNA molecules are joined end-to-end.
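For readers less familiar with the rmsd values quoted above, the quantity is the square root of the mean squared distance between paired atoms after superposition. The following is a minimal sketch under stated assumptions: the coordinates are hypothetical, the two sets are taken to be already superimposed and paired one-to-one, and `rmsd` is an illustrative helper rather than the software used for the comparison.

```python
import math

def rmsd(coords_a, coords_b):
    """Root-mean-square deviation over paired atoms that are already superimposed."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two equal-length, non-empty coordinate lists")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))

# Hypothetical coordinates in angstroms: every atom displaced by 1 A along z.
a = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0)]
b = [(0.0, 0.0, 1.0), (3.8, 0.0, 1.0)]
value = rmsd(a, b)
print(value)
```

Because the value is an average over all aligned residues, a low overall rmsd is compatible with a large local displacement at a single residue, such as the one reported for Gly796.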

Molecule A

Molecule A contained the intact three-finger array. Consistent with previous findings⁵⁷, ZF4 and ZF5 of molecule A make base-specific contacts in the DNA major groove (Fig. 4D). For clarity, we numbered the DNA base pair positions from 1 to 20 (Fig. 4D). Although we used TGACCA as the motif sequence, it is the complementary strand that is recognized. Specifically, Asn753 of ZF4 forms water-mediated contacts with the T:A base pair at position 3 (Fig. 4E), while Asn756 of ZF4 forms bidentate hydrogen bonds with adenine at position 4 (A4) (Fig. 4F). Val759 of ZF4 is too distant from C5 to interact, but Ser783 of the next finger, ZF5, makes a cross-strand H-bond with G5 (Fig. 4G). Gln781 of ZF5 bridges two base pairs at positions 5 and 6, and also forms an H-bond with Lys784 (Fig. 4H). That same lysine (Lys784), as well as Arg787, both from ZF5, each make two H-bonds with G7 and G8, respectively (Fig. 4I and J). Additionally, that arginine (Arg787 of ZF5) forms a van der Waals contact with the methyl group of T9 (Fig. 4K). Rather than continuing along the DNA major groove, ZF6 crosses to the minor groove, where it makes a DNA phosphate contact via Thr814 and three water-mediated contacts with base pairs at 11, 12, and 13 from the minor groove side (Fig. 4L).

Significantly, among the above-described interactions, only three result in completely sequence-specific recognition: Asn756 with A4, Lys784 with G7, and Arg787 with G8. Other interactions, such as those involving Asn753 of ZF4 (water-mediated), Ser783 of ZF5 (cross-strand), and Gln781 of ZF5 (bridging two neighboring base pairs), are more versatile and can accommodate alternative base pairs (as reviewed in ref. ⁴¹). Compared to the experimentally derived ChIP-seq motif, the specificity for guanine in TGnCCA is not explained by the current structural data (Fig. 4M). The residue positioned for this recognition, Val759 of ZF4 (see Fig. 4G), is a small hydrophobic amino acid, not typically associated with base-specific interactions.

The most unexpected interaction involves Gln781, as the pairing of Gln (or Asn) with adenine is a common mechanism for adenine recognition⁵⁹, consistent with the predicted sequence motif T(t/g)TCC(a/c) (Fig. 1D). We compared the binding affinities of four oligos with variation at the TGnCCA site (where n = A, C, T and G) (Supplementary Fig. 2A). ZF4-6 binds most strongly to a T:A pair (oligo #3), followed by A:T and G:C pairs, both showing approximately a 2× reduction in binding affinity, and C:G, with more than a 4× reduction (Supplementary Fig. 2B–E). This selectivity can be attributed to the hydrogen bonding pattern of Gln781, which donates a proton to the N7 nitrogen of adenine and accepts a proton from its N6 amino group (Supplementary Fig. 2F).

Molecule B

Molecule B had ZF4 and ZF5, but lacked electron density for ZF6. ZF4 and ZF5 approach the DNA from the minor groove of the second half of the DNA, making contacts with seven phosphate groups, three on one strand and four on the other (Fig. 4N). Three residues from ZF4 and three from ZF5, all polar or charged, are involved in these phosphate contacts. Notably, of these six phosphate-contacting residues, Asn753 and Asn756 of ZF4, and Gln781 and Arg787 of ZF5, are the same residues that participate in base-specific interactions in molecule A. Therefore, molecule B and molecule A appear to represent two states along the substrate-recognition pathway caught in one structure: non-specific and specific. A similar phenomenon has been observed with the bacteriophage T4 DNA methyltransferase⁶⁰ and the E. coli lac repressor DNA binding domain⁶¹, where the same protein residues shift from electrostatic interaction with the DNA backbone in a non-specific complex to base-specific binding in cognate complexes. These findings suggest that ZF4-ZF5 of BCL11A can detect local variations in DNA shape (minor versus major grooves) and electrostatic potential. In a binding study with the shorter 12-bp DNA, deletion of ZF6 resulted in 3.6× reduced binding affinity relative to that of ZF4-6⁵⁷.

Molecule C

Finally, molecule C had ZF5 and ZF6, but lacked electron density for ZF4. Molecule C occupies the junction between two neighboring DNA molecules in the crystal lattice (Supplementary Fig. 3). The oligo #1 used for crystallization contains two C:G base pairs at positions 1 and 2, which partially match the consensus sequence recognized by ZF5. Indeed, Lys784 and Arg787 of ZF5 in molecule C interact with G1 and G2 (Supplementary Fig. 3) in the same manner as in molecule A. Interestingly, ZF6 adopts the same conformation in both the two-finger array of molecule C and the three-finger array of molecule A, inserting its Lys817 into the minor groove. In a binding study with the 12-bp shorter DNA, deletion of ZF4 resulted in 100X reduced binding affinity relative to that of ZF4-6⁵⁷.

Structure of K784T variant: two protein molecules bind one oligo #1 DNA

In addition to the wild-type ZF4-6, we also determined the structure of the K784T mutant³⁷ in complex with the same DNA (oligo #1) (Supplementary Table 2; see also Fig. 3G, H, M and N). In this structure, two K784T variant proteins (termed molecules D and E) traverse the major groove of DNA, making specific contacts with each half of the DNA molecule (Fig. 5A). The two mutant structures are highly similar, with an rmsd of 0.76 Å across the three fingers and 75 aligned residues (Fig. 5B). Each molecule recognizes a different half of the DNA, with molecule D interacting with base pairs 3-9, and molecule E with base pairs 10-14 (Fig. 5C, D). Asn756 of ZF4 in each molecule specifically recognizes A4 and A11, respectively (Fig. 5E, F).

ZF5 makes the most DNA base contacts. In molecule D, Gln781 and Arg787 each interact with two bases: C5 and T6, and G8 and T9, respectively (Fig. 5G). Thr784, which replaces the larger Lys784, is too short to reach the G7 base (Fig. 5G), suggesting that various base pairs can be accommodated at position 7. In molecule E, when Arg787 encounters adenine at position 15, its side chain becomes disordered, allowing Thr784 to move closer to T14 and form a van der Waals contact with its methyl group (Fig. 5H).

Compared to the wild-type structure described in Fig. 4, molecule D of the mutant closely resembles molecule A of the wild-type in recognizing the TGACCA sequence. In the wild-type, Lys784 in ZF5 occupies a base-interacting position (Fig. 1B), recognizing guanine at base pair position 7 (Fig. 4I). In contrast, the shorter side chain of Thr784 in the mutant is unable to touch the base (Fig. 5G). Interestingly, molecule E of the mutant adopts a specific conformation (Fig. 5H) that has apparently transitioned from the non-specific conformation observed in molecule B of the wild-type (Fig. 4N). This shift is likely due to the reduced specificity of the Thr784-Arg787 pairing (relative to Lys784-Arg787), which loosens the recognition from GG to NG (N = any nucleotide). However, when Arg encounters adenine, its side chain can adopt different conformations⁶² or become disordered, allowing the neighboring Thr784 to gain an interaction with T14. This flexibility allows the mutant to recognize a variety of different sequences.

Structure of ZF4-6 specifically binding TGCCCA-containing oligo #2

Next, we crystallized the three-finger array ZF4-6 with oligo #2, which contains a TGCCCA element (Supplementary Table 2). As indicated by the EMSA and ITC data (Fig. 3K and P), two molecules of ZF4-6 bind to oligo #2 (molecules F and G in Fig. 6A). Molecule F binds to the 5’ TGCCCA sequence, while molecule G binds to the 3’ half, which contains an overlapping CCCA sequence (Fig. 6B and C). Both molecules use a pair of Lys784 and Arg787 from ZF5 to make sequence-specific contacts with guanine bases at G7 and G8, and G14 and G15, respectively (Fig. 6D, E). In addition, molecule F uses Asn756 from ZF4 to form base-specific bidentate H-bonds with A4 (Fig. 6F). In both molecules, polar residues Asn753 and Ser755 from ZF4 (Fig. 6G), as well as Gln781 and Ser783 from ZF5 (Fig. 6H), are capable of forming H-bonds as both hydrogen donors and acceptors. These polar residues form cross-base pair and cross-strand interactions with variable base pairs at positions 2 (C:G) and 3 (T:A) (Fig. 6G), positions 5 (G:C) and 6 (C:G) (Fig. 6D), positions 9 (A:T) and 10 (A:T) (Fig. 6H), and positions 12 (C:G) and 13 (C:G) (Fig. 6E). In summary, our structural analyses suggest that ZF4-5 is capable of recognizing TGACCA (molecule A in Fig. 4), TGCCCA (molecule F), or ACCCCA (molecule G in Fig. 6).

**Fig. 6: ZF4-6 binding TGCCCA-containing DNA oligo #2 specifically.**

AlphaFold3 prediction

Since we do not yet have recombinant full-length, or larger segments of, BCL11A to study the coordinated binding events of its two ZF arrays, we used AlphaFold3⁶³ to model BCL11A onto a 70-bp DNA sequence containing both oligo #1 and oligo #2 (Fig. 7A). We aimed to answer three questions (Fig. 7B): what are the binding modes of (i) the isolated ZF2-3 (residues 373–433), (ii) the isolated ZF4-6 (residues 731-835), and (iii) a larger fragment encompassing residues 373–835. To simplify the model, we only considered two binding sites.

AlphaFold3 predicted, consistent with our experimental data, that the two isolated ZF arrays can each bind to both the TGCCCA site (oligo #2) and TGACCA site (oligo #1) (Fig. 7C, D), with both ZF arrays traveling along the DNA in the same direction. For the large fragment containing both ZF arrays, the AlphaFold3 model placed ZF2-3 at the 3’ oligo #1 site and ZF4-6 at the 5’ oligo #2 site, with the ~300-residue linker between the two ZF arrays looping back and forth twice (Fig. 7E). Interestingly, AlphaFold3 also generated an alternative model, where ZF2-3 binds to the oligo #2, and ZF4-6 to the oligo #1, from two separate polypeptides (Fig. 7F). The AlphaFold3 models have a higher confidence (pLDDT values between 70 and 90 on a scale of 100) for the ZF domains, but very low model confidence (pLDDT <50) for the flexible linker (Fig. 7E). Our aim was to demonstrate that the long, 300-residue loop is sufficient to position the two zinc finger arrays in place. The shortest linker, at ~50 residues, would still allow for this placement (Supplementary Fig. 4).

It is noteworthy that immediately following ZF2-3 is a region abundant in serine and threonine residues, which are potential sites for protein phosphorylation, as well as acidic residues such as aspartate and glutamate (Fig. 7B and E). A similar feature has been observed in ZNF410, where an acidic and serine/threonine-rich loop precedes its five-finger array⁶⁴. This loop has been shown to offer an additional layer of regulation to ZNF410’s DNA-binding ability, beyond just sequence specificity⁶⁴, by acting as a cis-inhibitory DNA analog. This composition, which is conserved among vertebrate orthologs (Supplementary Fig. 5), suggests that the corresponding region in BCL11A might function similarly, where phosphorylation or interactions with acidic residues may modulate its DNA binding activity or interactions with other proteins, providing a mechanism of regulation beyond the DNA sequence specificity.

An oligomer model of BCL11A mediated by the N-terminal ZF0 and ZF1

Inspired by the capabilities of AlphaFold3, we investigated whether the N-terminal region of BCL11A could self-interact to form dimers, tetramers or larger oligomers. During the review of this manuscript, Zheng et al. published a study presenting the crystal structure of the tetramer formed by isolated ZF0, and an AlphaFold3-predicted dimeric formation of isolated ZF1⁴⁵. In addition, the corresponding ZF0 in BCL11B, the paralog of BCL11A that has identical ZFs to those in BCL11A, has been shown to mediate the homo-multimerization of BCL11B⁶⁵ as well as hetero-multimerization with BCL11A⁶⁶. The modeling presented in our work provides an important confirmation of their most recent findings, and also provides insight and context (ZF2-3 and ZF4-6) to our evolving understanding of BCL11A’s role in HbF regulation.

We modeled residues 1-211, a fragment containing both ZF0 and ZF1, which is shared by the BCL11A-S, -L, and -XL isoforms (Fig. 1A and Supplementary Fig. 6A). The model predicts that the non-canonical ZF0 can dimerize and tetramerize, while ZF1 forms a second dimerization interface (Supplementary Fig. 6B). ZF0 is predicted to adopt two conformations, with either Gln70 or Cys71 as the fourth zinc ligand, resulting in a helix with different length (Supplementary Fig. 6C). Dimerization of ZF0 occurs via its β1 strand, extending the 2-strand β-sheet into a 4-strand β-sheet (Supplementary Fig. 6D). This dimer can further assemble into a tetramer through interactions involving the αA helix (Supplementary Fig. 6E). The N-terminal portion of this helix contains five hydrophobic residues, such as Ile60, which forms a hydrophobic core at the tetramer interface (Supplementary Fig. 6F).

Additionally, the two helices of ZF1 pack against each other to form a second dimer interface (αB in Supplementary Fig. 6G and 6H). The five-residue hydrophobic stretches at the beginning of the helices αA and αB are unique to the non-canonical ZF0 and ZF1, whereas the canonical ZFs at these positions contain polar and charged residues that interact with DNA bases (see Fig. 1B). Alternatively, this ZF1-mediated second dimer interface could form between two tetramers (Supplementary Fig. 6I), potentially allowing BCL11A to polymerize.

Discussion

We analyzed the structures of the three-finger array (ZF4-6) of BCL11A, in complex with two different oligonucleotides derived from the regulatory region between BGLT3 and HBBP1 (chr11: 5,243,691-5,243,760). In total, we observed seven ZF4-6 molecules (five wild-type and two K784T mutants), in three crystal structures with different binding modes. Of these, five molecules (A, D, E, F and G) make sequence-specific contacts with DNA bases (Figs. 4, 5 and 6). Molecule B forms non-specific contacts with the DNA phosphate backbone and is likely seen only at very high concentrations (Fig. 4N). Molecule C is positioned across the end-to-end junction of two DNA molecules in the crystal lattice, which is likely an artifact of crystallization (Supplementary Fig. 3). Each molecule of ZF4-6 interacts with DNA independently, as there is no obvious protein-protein interaction observed within the crystallographic asymmetric unit. We made the following observations.

Basis for sequence specificity

First, DNA sequence-specific recognition occurs mainly through three key residues: Asn756 in ZF4, and Lys784 and Arg787 in ZF5. These residues occupy the base-interacting positions of their respective ZF unit: position -4 of ZF4 and positions -4 and -1 of ZF5 (Fig. 8A). The pairing of Arg or Lys with guanine and Asn or Gln with adenine is a common mechanism for purine (guanine or adenine) recognition^59,67. BCL11A follows this rule, recognizing adenine A2 (via Asn756) and guanines G5 and G6 (via Lys784 and Arg787) in the consensus sequence motif (Fig. 8A).

**Fig. 8: Summary of BCL11A interaction with DNA.**

The Lys784-to-Thr (K784T) pathogenic mutation reduced DNA binding by over 10-fold for both oligos #1 and #2 (Figs. 3G and 3H). The structure of K784T bound to oligo #1 shows that the altered protein interacts with a variable base pair (C:G or A:T) at position 5 (Fig. 5). A similar mutation (K365T) in CTCF has been observed, where the mutant binds all four base pairs with consistent affinities (2–4 × variation)⁴³.

The importance of Asn756 and Arg787 in DNA binding has been previously evaluated through mutagenesis studies targeting both the protein and DNA sides⁵⁷. Alanine substitutions of Asn756 (N756A) and Arg787 (R787A) resulted in 7.5X and ~130X reduction in DNA binding affinity, respectively. Similarly, altering the Asn756-interacting A:T base pair to T:A caused approximately 6X reduction in binding affinity. Changing the Arg787-interacting G:C base pair to any of the three alternative base pairs (A:T, T:A or C:G) led to a much more severe reduction of ~50-100X. These naturally occurring mutations at positions -118 and -114 in the fetal globin promoter are associated with fetal globin derepression and the benign genetic condition of hereditary persistence of fetal hemoglobin (HPFH)^40,68,69. In addition, a recent analysis of 42 individuals with BCL11A-related intellectual developmental disorder identified a heterozygous Asn756-to-Lys missense variant in ZF4⁷⁰.

Second, Asn753 of ZF4 and Gln781 of ZF5 occupy the base-interacting position -7 in their respective ZF unit (Fig. 8A). Theoretically, they can recognize adenine at position 1 and position 4 of the sequence motif. Indeed, a T:A base pair is predicted at position 4 (Fig. 1D). However, when ZF4 encounters a TTG sequence, the side chain of Asn753 is too short to reach adenine A1 (Fig. 6G). In the helical structure of DNA, base pairs are stacked like coins, with each rotated relative to the next, creating the helical shape. When adenine A2 interacts with Asn756 at the -4 position, A1 is rotated away from Asn753 at the -7 position. As a result, the shorter side chain of Asn753 makes two cross-strand interactions instead. Both the predicted and the experimentally derived consensus motifs show variability at position 1 of the sequence motif (Figs. 1D and 1F). A similar observation has been made with PRDM9, where an Asn at position -7 in two ZF units is too distant to make specific contacts⁷¹.

For Gln781 of ZF5, which has a longer side chain, it was predicted to bind adenine at position 4, as observed with Gln264 in ZNF410²³ and Gln536 in CTCF⁴³. Indeed, our binding assay, which tested the four possible base pairs at position 4, showed that the ZF4-6 array binds most strongly to T:A, followed by A:T or G:C base pairs, with the least binding to C:G (Supplementary Fig. 1). In the absence of a preferred adenine, Gln781’s side chain rotates, allowing it to engage in hydrogen bonding (as both a H-donor and H-acceptor simultaneously) across two base pairs (positions 3 and 4) and two strands. Examples of these interactions include GpA/TpC in Fig. 3D, ApG/CpT in Fig. 5D, GpC/GpC in Fig. 6B, and CpC/GpG in Fig. 6C.

Third, the conservation of guanine at position 3 (Fig. 1F) is not explained by this study. The corresponding residue, Val759 in ZF4, is located too far from the base - 5.7 Å from G or 6.2 Å from C (Fig. 6G and 6H). Additionally, the hydrophobic valine residue is not typically involved in base-specific interactions. This may be a case of indirect sequence recognition^72,73.

Fourth, the preference for an A:T base pair at position 7 can be explained by the interaction between the methyl group of thymine T7 and Arg787 in the TpG dinucleotide (Figs. 4K, 5G, 6D and 6E). The methyl group of T7 makes van der Waals contacts with Arg787, which also interacts with the 3’ Gua of the same strand, forming what we call ‘methyl-Arg-G triads’^74,75.

Role of ZF6

Unexpectedly, ZF6 adopts the same conformation in six out of seven molecules (with molecule B lacking electron density for ZF6; Fig. 1C). Instead of continuing along the DNA major groove, ZF6 crosses to the minor groove, where it makes contacts with DNA phosphate and water-mediated interactions from the minor groove side (Fig. 4L). This behavior is not predicted by current knowledge of C2H2 ZF models, as the bit score for ZF6 is as high as for ZF4 and ZF5 (Fig. 1B). Other C2H2 ZF units acting as spacers across the minor groove have been observed before, such as ZF4 and ZF6 in TFIIIA⁷⁶, ZF2 in Zfp568⁶², ZF2 in HIC2⁷⁷, and ZF8 in CTCF^43,78,79. These spacer fingers do not share clear sequence similarity (Fig. 8B), and features that predict spacer functionality remain unknown. However, the presence of ZF6 strengthens the DNA-binding affinity of the ZF4-6 array, by contributing a positive entropy⁵⁸, while the ZF2-3 array, lacking a similar spacer, binds more weakly (Fig. 3). We note that the conformation of the C-terminal ZF6 relative to ZF5 remains fixed when ZF5 binds DNA specifically (Supplementary Fig. 7A). However, ZF6 becomes disordered when ZF5 binds DNA non-specifically (as seen in molecule B in Fig. 4) or adopts varied conformations in a solution NMR structure in the absence of DNA⁵⁸ (Supplementary Fig. 7B). Alignment of ZF4-6 in the presence or absence of DNA indicates that the inter-domain orientations between ZF4 and ZF5, as well as between ZF5 and ZF6, are not the same (Supplementary Fig. 7B). This suggests that the flexibility of the linkers between the two neighboring fingers allows the three-finger array to explore a broader range of DNA duplexes with varied sequences. A structure of ZF4-5, where ZF6 has become disordered, binds DNA similarly to that of ZF4-6 (Supplementary Fig. 7C, D).

Roles of ZF0 and ZF1

An isolated single ZF unit sometimes plays more complex roles than the DNA-binding ZF arrays. In the case of BCL11A, mutations in ZF0 (Fig. 1A) are pathogenic and destabilize BCL11A³⁸. The role of ZF0 in tetramerization (ref. ⁴⁵ and described here) may explain the effects of these mutations on BCL11A stability. Furthermore, if ZF0 is truly mediating BCL11A multimerization, then the shortest splicing variants (S and XS in Fig. 1A) may play a role in modulating dimer and/or tetramer formation⁸⁰, similar to what has been seen (for example) with human glucuronosyltransferase UGT1A⁸¹. In addition, ZF1 may allow higher-order multimerization (Supplementary Fig. 6I). All four splice variants have ZF0, while the three larger ones have ZF1 as well, so the modulation of multimerization could be quite refined, driven by the absolute and relative levels of all four variants.

Finally, the BCL11A ZF0 and ZF1 sequences, including the hydrophobic patches, are highly conserved among BCL11A orthologs from six vertebrate classes, ranging from humans to whale sharks (Supplementary Fig. 5). Even whale shark BCL11A, with just 63% overall amino acid identity to the human ortholog, is 79% identical in ZF0 and 90% in ZF1. Indeed, the only significant difference between the human and other BCL11A sequences shown is the spacer between ZF0 and ZF1, which is 50% longer than in the nonmammalian vertebrates.

Role of ZF2-3

Notably, ZF2-3 and ZF4-5 are highly similar, sharing 73% sequence identity (Fig. 1C), including 5 out of 6 residues shown to be directly involved in DNA base interactions (Fig. 8A), enabling the ZF2-3 array to recognize similar sequences. In our study, we estimated the K_D values for ZF2-3 binding to oligos #1 and #2 using FP and EMSA, which were in the low micromolar range (>2 μM). This is comparable to the findings by Viennel et al., where the two-finger array of ZF4-5, upon ZF6 deletion, exhibited a binding affinity of 2.15 μM in an AlphaScreen assay (see supplementary Fig. 5 of ref. ⁵⁸). Nevertheless, the lower-affinity two-finger array of ZF4-5 is still able to bind DNA (Supplementary Fig. 7C, D).

The importance of the ZF2-3 domain of BCL11A can be understood through its discovery and functional studies, particularly in relation to the L-isoform (Fig. 1A), which contains the ZF2-3 DNA binding domain but lacks the C-terminal ZF4-6 array. BCL11A was initially identified as CTIP1 in a yeast two-hybrid screen for proteins expressed in the brain⁸², with CTIP1 later renamed as BCL11A-L and its homolog CTIP2 renamed as BCL11B-XL. BCL11A-L and BCL11A-S are the two major isoforms expressed in neurons⁸³, and when co-expressed in HEK293 cells, have transcriptional regulatory activity³⁸. BCL11A-L binds to DNA directly through the DNA-binding domain mapped to ZF2-3 (referred to as ZF3 and ZF4 in ref. ⁸⁴), with a consensus binding sequence of GGCCGGAGC. The underlined sequence is a variant of the TGRCCR motif (R = G/A). In a conditional mouse model, BCL11A-L (CTIP1) directly suppresses the expression of Semaphorin 3c (Sema3c), which regulates cortical projection neuron polarization and migration⁸⁵. The Sema3c promoter contains six repeats of the sequence TGGGCCGG. Additionally, BCL11A-L has been suggested to act as a transcriptional repressor in adult erythrocytes^86,87. These earlier studies also proposed the idea of BCL11A self-association, including dimer and/or tetramer formation among its isoforms^80,84. Furthermore, using CRISPR-Cas9 editing, BCL11A expression was evaluated in a transgenic mouse model where ZF4-6 was deleted from the endogenous mouse BCL11A locus⁵⁸; the truncated BCL11A protein lacking ZF4-6 (equivalent to the L-isoform) is more stable than wild-type BCL11A-XL (see supplementary Fig. 5F of ref. ⁵⁸).

Finally, in BCL11B, the BCL11A paralog that contains identical DNA-binding zinc fingers, a germline missense mutation yields N441K in ZF2. This ZF2-altered BCL11B allele was initially found in a newborn with severe combined immunodeficiency (SCID)⁸⁸ and isolated from a patient with T cell deficiency and neurological disorders⁶⁶. The mutant acts in a dominant-negative manner through the formation of a non- or dys-functional dimer⁶⁵, resulting in reduced binding to known canonical promoters as well as abnormal binding to novel DNA sites not recognized by wild-type BCL11B⁶⁶.

In summary, enhanced chromatin occupancy at clusters of binding sites is crucial for transcription factors to stably regulate gene expression. Several studies have demonstrated that low-affinity DNA binding directs transcriptional outcomes during tissue development, in ways that are distinct from and cannot be replaced by high-affinity interactions^89,90,91,92,93. One notable example is the master myeloid transcription factor PU.1, which binds DNA with affinities spanning multiple orders of magnitude, covering a ~1000-fold range in dissociation constants⁹⁴. In this study, we demonstrate that BCL11A uses two tandem zinc-finger arrays, low-affinity ZF2-3 and high-affinity ZF4-6, to bind clustered short sequence motifs. For instance, binding two six-base-pair motifs can extend across a 70-bp regulatory region between BGLT3 and HBBP1 (Fig. 7). The ~300-residue flexible linker between the two ZF arrays allows the motifs to be spaced farther apart, effectively increasing the binding span. Along with recently described N-terminal self-interactions⁴⁵, different BCL11A isoforms may form dimers, tetramers, or larger polymers (Supplementary Fig. 6), enabling BCL11A to bind multiple clustered sites, thereby significantly enhancing chromatin occupancy and its repressor functions. Accordingly, it will be important to determine the relative expression of different BCL11A isoforms under different physiological conditions. These results deepen our understanding of this essential regulator.

Methods

BCL11A plasmid constructs

The human BCL11A ZF4-6 fragment (residues 730-835; pXC2391) was synthesized by GenScript and cloned into the pGEX-6P-1 vector, resulting in a construct with an N-terminal glutathione S-transferase (GST) tag followed by a recognition sequence (LEVLFQGP) of human rhinovirus (HRV) type-14 3C protease. The K784T mutant within the context of ZF4-6 (pXC2397) was generated by site-directed mutagenesis. The BCL11A ZF2-3 fragment (residues 373-433; pXC2394) was generated using overlapping PCR and cloned into a modified pET-28b vector, with an N-terminal His-SUMO tag.

BCL11A ZF arrays expression and purification

The three plasmids described above were transformed into the Escherichia coli strain BL21 (DE3) Gold+ for protein overexpression. Bacteria were grown in LB medium in the presence of 50 µg/ml ampicillin (ZF4-6 plasmids) or 25 µg/ml kanamycin (ZF2-3 plasmid). Cells were grown at 28 °C to an approximate OD600 of 1.0, at which point 25 µM ZnCl₂ was added to the cell culture and the shaker temperature was reduced to 15 °C. Thirty minutes later, expression was induced with 1 mM isopropyl β-d-1-thiogalactopyranoside (IPTG) and cultures were allowed to grow overnight (~16 h).

All harvested cells were suspended in 500 mM NaCl Buffer A (20 mM HEPES, pH 7.5, 500 mM NaCl, 5% glycerol, 25 µM ZnCl₂, and 0.5 mM Tris(2-carboxyethyl)phosphine (TCEP)) containing 1 mM phenylmethyl sulfonyl fluoride (PMSF). The cells were lysed by French Press followed by brief sonication. Lysates were cleared of cell debris by centrifugation at 70,000 × g.

For GST-ZF4-6, cleared lysate from cultures was loaded onto a 4 ml glutathione-Sepharose column, which was then washed with Buffer A until Coomassie-stained 5 µl sampling spots on blotting paper displayed little color. Protein was eluted with Buffer A containing 20 mM glutathione. Pooled fusion protein was digested overnight with the in-house-produced 3C protease to release the GST tag.

For His-SUMO-ZF2-3, cleared lysate was loaded onto a 5-ml HiTrap Ni²⁺ column which was then washed by Buffer A containing 20 mM imidazole until there was little absorbance at 280 nm. The protein was eluted with a 20-column volume gradient with increasing amounts of imidazole (up to 300 mM) in Buffer A. Protein from pooled fractions was digested overnight with in-house-produced Ulp1 protease⁹⁵ to cleave the His-SUMO tag.

Purification proceeded similarly for the three proteins. Pooled fractions were diluted to 100 mM NaCl in Buffer A (20 mM HEPES, pH 7.5, 100 mM NaCl, 5% glycerol, 25 µM ZnCl₂, and 0.5 mM TCEP) and loaded onto a 5-ml HiTrap SP HP column. Proteins were eluted with a 20-column volume gradient with an increasing amount of NaCl (up to 1 M). Peak fractions containing the protein of interest with the smallest amount of other protein impurities were pooled and concentrated to ~2 ml volume and then loaded onto a sizing column (HiLoad 16/60 Superdex S200) equilibrated with 150 mM NaCl in Buffer A. Peak fractions were pooled, and some was immediately concentrated for crystallization trials while the rest was distributed into 1.5 ml Eppendorf tubes, flash frozen in liquid nitrogen and stored at -80 °C for later use.

Crystallography

Oligonucleotides were purchased from Integrated DNA Technologies (IDT) and were annealed in 50 mM NaCl and 10 mM Tris-HCl pH 7.5. All three proteins were concentrated to ~30-35 mg/ml (~2.6 mM) and the annealed DNA was added at a 1:1.3 molar ratio of protein to DNA such that the final concentration of protein was ~16 mg/ml (~1.3 mM), and the mixture was incubated overnight at 4 °C. Crystallization trials using commercially available screens were conducted with an Art Robbins Gryphon Crystallization Robot, which was used to set up sitting drops (0.2 µl sample + 0.2 µl screen solution). Crystallization plates were incubated at ~19 °C. Crystals of ZF4-6 WT and its K784T mutant with oligo #1 (5’-CCT TGA CCA ATA GAT TCA T-3’ and 3’-GGA ACT GGT TAT CTA AGT AA-5’) grew in many conditions over 1–7 days, such as 24% (v/v) ethanol, 0.1 M HEPES pH 7.8, 40 mM magnesium chloride hexahydrate or 18% (w/v) polyethylene glycol (PEG) 3350, 17% (v/v) PEG 400, 4.8% (v/v) 2-propanol, 0.1 M 3-(cyclohexylamino)-2-hydroxy-1-propanesulfonic acid (CAPSO) pH 9.0. Crystals of ZF4-6 WT with oligo #2 (5’-CCT TGC CCA AAC CCC ACC C-3’ and 3’-GGA ACG GGT TTG GGG TGG GG-5’) grew in only one condition (15% (w/v) PEG 1000, 30% (v/v) PEG 300, 0.1 M Bis-Tris propane pH 7.0) and appeared after several weeks. Crystals were harvested in a nylon or Litho loop and flash frozen in liquid nitrogen after momentary soaking in reservoir solution supplemented with 20% (v/v) ethylene glycol if the original condition did not contain any cryoprotectant.

All birefringent crystals were sent to beamline 17-ID-1 (AMX) at the National Synchrotron Light Source (NSLS) II of Brookhaven National Laboratory and placed in a 100 K cryostream for screening and, if adequate diffraction was observed, for subsequent data collection. Diffraction data were processed with the autoPROC toolbox⁹⁶ utilizing XDS⁹⁷ for data reduction. Intensities from images representing 210° crystal rotation were merged with POINTLESS/AIMLESS^98,99.

BCL11A ZF4-6 and DNA complex structures were determined by molecular replacement with the PHENIX PHASER module¹⁰⁰, using PDB 6U9Q⁵⁸ as the initial model. Subsequently, as additional electron densities with the shape of ZFs were observed, second or third ZF molecules were modeled. All structure refinements were performed by PHENIX Refine^101,102. Randomly chosen reflections (5%) were used for validation by R-free values^103,104, with careful manual (re)building using COOT^105,106 within refinement cycles. During each round of PHENIX refinement, structure quality was analyzed, and validation was performed using the PDB validation server¹⁰⁷ (Supplementary Table 2). PyMol (Schrödinger, LLC) was used for generating molecular graphics.

AlphaFold3 modeling

Protein and DNA sequences, along with the number of protein monomers and the expected number of zinc ions for the ZFs, were submitted to the AlphaFold3 server (https://golgi.sandbox.google.com)⁶³. Models were evaluated using AlphaFold scoring and our structural knowledge of ZFs, in particular those of BCL11A and their potential sequence-specific binding to DNA. Molecular graphics of these models were generated using PyMol (Schrödinger, LLC).

Fluorescence polarization-based DNA binding assay

A 40 µL reaction mixture was prepared containing 20 nM FAM-labeled DNA duplex and varying concentrations of BCL11A protein in a buffer consisting of 20 mM Tris-HCl (pH 7.5), 150 mM NaCl, 5% glycerol, and 0.5 mM TCEP. The mixture was incubated on ice for 20 min. Fluorescence polarization was measured using the Synergy Neo2 Multi-mode Reader (BioTek). Data analysis was performed using GraphPad Prism 10, and the binding affinity (K_D) was determined using the equation: [mP] = [maximum mP] × [C] / (K_D + [C]) + [baseline mP] where milli-polarization [mP] is the measured signal, and [C] represents the protein concentration.
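The one-site binding equation above can be illustrated with a minimal Python sketch. This is not the GraphPad Prism procedure used in this study: the function names, the grid-search fitting strategy and the synthetic data points are all assumptions introduced purely for illustration.

```python
def fp_signal(conc, max_mp, kd, baseline):
    """One-site binding model: mP = max_mP * C / (K_D + C) + baseline."""
    return max_mp * conc / (kd + conc) + baseline


def fit_kd(concs, signals, kd_grid):
    """Hypothetical fitting helper (not the Prism algorithm).

    For each candidate K_D the model is linear in max_mP and baseline,
    so those two are solved by ordinary least squares; the K_D with the
    smallest sum of squared errors is returned.
    """
    n = len(concs)
    best = None
    for kd in kd_grid:
        x = [c / (kd + c) for c in concs]  # fraction bound at each concentration
        mean_x = sum(x) / n
        mean_y = sum(signals) / n
        sxx = sum((xi - mean_x) ** 2 for xi in x)
        sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, signals))
        max_mp = sxy / sxx
        baseline = mean_y - max_mp * mean_x
        sse = sum(
            (fp_signal(c, max_mp, kd, baseline) - y) ** 2
            for c, y in zip(concs, signals)
        )
        if best is None or sse < best[0]:
            best = (sse, kd, max_mp, baseline)
    return best[1], best[2], best[3]


# Synthetic, noise-free example data (assumed values, not measurements):
# protein concentrations in µM and a "true" K_D of 0.5 µM.
concs = [0.01, 0.03, 0.1, 0.3, 1.0, 3.0, 10.0, 30.0]
signals = [fp_signal(c, 150.0, 0.5, 50.0) for c in concs]

# Candidate K_D values from 0.01 to 100 µM, evenly spaced on a log scale.
kd_grid = [10 ** (e / 100) for e in range(-200, 201)]
kd_fit, max_fit, base_fit = fit_kd(concs, signals, kd_grid)
print(f"K_D ~ {kd_fit:.3f} µM, max mP ~ {max_fit:.1f}, baseline ~ {base_fit:.1f}")
```

With real titration data, a nonlinear least-squares routine would normally replace the grid search; the grid is used here only to keep the sketch dependency-free.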

Electrophoretic mobility shift assay (EMSA)

Following the fluorescence polarization assay, 10 µL of each reaction sample was subjected to electrophoresis on an 8% native polyacrylamide gel at a constant voltage of 150 V in ice-cold 0.5× TBE buffer for 25 min. The gel was imaged using a Typhoon 9410 imager (GE Healthcare).

Isothermal titration calorimetry (ITC)

Stored fractions of purified BCL11A ZF4-6 fragment were slowly thawed on ice. Double-stranded oligonucleotides (oligo #1 and #2) were annealed in 10 mM Tris-HCl pH 7.5 and 50 mM NaCl. The concentrations of the protein and DNA were confirmed using a ThermoScientific Nanodrop before both were diluted with the same buffer (20 mM HEPES pH 7.5, 150 mM NaCl, 5% glycerol, and 0.5 mM TCEP) to concentrations appropriate for ITC. The ZF4-6 domain (20 µM) was loaded in the sample cell and the annealed DNA (200 µM) was injected via the syringe for experiments at 25 °C using a MicroCal Auto ITC200 (Malvern Corporation). During continuous stirring at a rate of 750 rpm, 18 injections of 2 μL followed an initial injection of 0.2 μL of the DNA into the cell. Each injection lasted for 4 s, and there was a 300 s interval between injections to allow the system to reach equilibrium. The reference power was set to 8 μcal/s. Binding constants (K_D) were calculated by fitting the data using the MicroCal PEAQ-ITC Analysis software.
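As a rough check of the titration design described above, the cumulative injected volume and the approximate DNA:protein molar ratio reached at the end of the run can be sketched as follows. The 200 µL cell volume is an assumption (the nominal value for this class of instrument, not stated in the text), displacement of cell contents during injection is ignored, and the function names are hypothetical.

```python
def cumulative_volumes(first_ul, n_rest, rest_ul):
    """Cumulative injected volume (µL) after each injection."""
    totals = []
    running = 0.0
    for volume in [first_ul] + [rest_ul] * n_rest:
        running += volume
        totals.append(running)
    return totals


def molar_ratio(injected_ul, syringe_um, cell_um, cell_volume_ul):
    """Approximate titrant:titrand molar ratio, ignoring cell displacement."""
    return (injected_ul * syringe_um) / (cell_um * cell_volume_ul)


# Values from the text: one 0.2 µL injection followed by 18 injections of 2 µL,
# 200 µM DNA in the syringe and 20 µM ZF4-6 in the cell.
ASSUMED_CELL_VOLUME_UL = 200.0  # assumption, not given in the text

totals = cumulative_volumes(0.2, 18, 2.0)
final_ratio = molar_ratio(totals[-1], 200.0, 20.0, ASSUMED_CELL_VOLUME_UL)
print(f"{len(totals)} injections, {totals[-1]:.1f} µL total, "
      f"final DNA:protein ratio ~ {final_ratio:.2f}")
```

Under these assumptions the titration ends well past a 1:1 molar ratio, which is what a fit of a single-site binding isotherm generally needs.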

Sequence motif analysis

ChIP sequencing datasets (GSM2771529 and GSM2771530)⁴⁰ were downloaded from Cistrome 3.0¹⁰⁸. CUT&RUN datasets³⁵ were downloaded from GEO (GSE104676; specifically, the GSM2805366, GSM2805367, and GSM2805368 NarrowPeak files). Peak sequences were extracted using the Fasta and SeqIO modules from Pyfaidx and Biopython, respectively. Human genome sequence Hg38 was used for ChIP-seq and Hg19 was used for CUT&RUN, in accordance with the original published analyses. The TGNCCN sequence and its reverse complement NGGNCA were searched for to derive motif counts. Outliers were identified and removed from the datasets using the interquartile range (IQR) method. Differences between expected and observed motif counts were quantified using the Mann-Whitney U test. The expected motif frequency was calculated as 1/128, based on the chance of TGNCCN motifs occurring randomly in the genome: the four fixed bases give (1/4)⁴ = 1/256 per strand, and searching both strands doubles this to 1/128. Motif counts were normalized by the peak width minus the length of the TGNCCN motif. Spearman correlation analysis was used to assess correlations.
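The motif counting and normalization steps described above can be sketched in Python as follows. This is an illustrative reconstruction, not the analysis script used in this study: the function names, the 1.5 × IQR fence and the choice to count a palindromic site (such as TGGCCA) once per strand are assumptions. The statistical tests (Mann-Whitney U, Spearman) are omitted.

```python
import re
import statistics

MOTIF_LEN = 6
# Lookaheads let overlapping sites be counted.
FORWARD = re.compile(r"(?=TG[ACGT]CC[ACGT])")   # TGNCCN
REVERSE = re.compile(r"(?=[ACGT]GG[ACGT]CA)")   # NGGNCA (reverse complement)

# Four fixed bases: (1/4)**4 per strand, two strands searched -> 1/128.
EXPECTED_FREQUENCY = 2 / 4 ** 4


def count_motifs(seq):
    """Count TGNCCN sites on either strand of a peak sequence."""
    seq = seq.upper()
    return len(FORWARD.findall(seq)) + len(REVERSE.findall(seq))


def motif_density(seq):
    """Motif count normalized by (peak width - motif length), as in the text."""
    return count_motifs(seq) / (len(seq) - MOTIF_LEN)


def remove_iqr_outliers(values, k=1.5):
    """Drop values outside [Q1 - k*IQR, Q3 + k*IQR]; k = 1.5 is an assumed fence."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if low <= v <= high]


# Top strands of oligo #1 and oligo #2 from the crystallography section.
oligo1 = "CCTTGACCAATAGATTCAT"
oligo2 = "CCTTGCCCAAACCCCACCC"
print(count_motifs(oligo1), count_motifs(oligo2))
print(f"density oligo #1: {motif_density(oligo1):.3f} "
      f"vs expected {EXPECTED_FREQUENCY:.4f}")
```

Each oligo contains a single TGNCCN site, so the density on these short sequences is far above the random expectation of 1/128; on genomic peaks the same comparison would be made across many sequences.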

Statistics and reproducibility

X-ray crystallographic data were collected quantitatively and processed. Structure refinements were carried out with PHENIX Refine, using 5% randomly selected reflections for cross-validation by R-free values. Supplementary Table 2 provides statistics of data collection and refinement. Structure quality was assessed throughout refinements and validated using the PDB validation server. Statistics details for binding experiments are provided in the legends of Fig. 3 and Supplementary Fig. 2.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Data availability

The coordinates and structure factor files of the X-ray structures of the BCL11A ZF4-6 in complexes with DNA have been deposited to PDB and released under accession numbers 9E6R for ZF4-6 and oligo #1, 9E6S for K784T and oligo #1, and 9E6T for ZF4-6 and oligo #2. Source data are provided with this paper.

References

Latchman, D. S. Transcription factors: an overview. Int J. Exp. Pathol. 74, 417–422 (1993).
Wolberger, C. How structural biology transformed studies of transcription regulation. J. Biol. Chem. 296, 100741 (2021).
Boumpas, P., Merabet, S. & Carnesecchi, J. Integrating transcription and splicing into cell fate: transcription factors on the block. Wiley Interdiscip. Rev. RNA 14, e1752 (2023).
Vaquerizas, J. M., Kummerfeld, S. K., Teichmann, S. A. & Luscombe, N. M. A census of human transcription factors: function, expression and evolution. Nat. Rev. Genet 10, 252–263 (2009).
Lambert, S. A. et al. The human transcription factors. Cell 172, 650–665 (2018).
Amoutzias, G. D., Robertson, D. L., Van de Peer, Y. & Oliver, S. G. Choose your partners: dimerization in eukaryotic transcription factors. Trends Biochem Sci. 33, 220–229 (2008).
Tsuji, A., Yamashita, H., Hisatomi, O. & Abe, M. Dimerization processes for light-regulated transcription factor photozipper visualized by high-speed atomic force microscopy. Sci. Rep. 12, 12903 (2022).
Karin, M., Liu, Z. & Zandi, E. AP-1 function and regulation. Curr. Opin. Cell Biol. 9, 240–246 (1997).
Blackwood, E. M. & Eisenman, R. N. Max: a helix-loop-helix zipper protein that forms a sequence-specific DNA-binding complex with Myc. Science 251, 1211–1217 (1991).
Oeckinghaus, A. & Ghosh, S. The NF-kappaB family of transcription factors and its regulation. Cold Spring Harb. Perspect. Biol. 1, a000034 (2009).
Gencel-Augusto, J. & Lozano, G. p53 tetramerization: at the center of the dominant-negative effect of mutant p53. Genes Dev. 34, 1128–1146 (2020).
Huntley, S. et al. A comprehensive catalog of human KRAB-associated zinc finger genes: insights into the evolutionary history of a large family of transcriptional repressors. Genome Res. 16, 669–677 (2006).
Stogios, P. J., Downs, G. S., Jauhal, J. J., Nandra, S. K. & Prive, G. G. Sequence and structural analysis of BTB domain proteins. Genome Biol. 6, R82 (2005).
Williams, A. J., Khachigian, L. M., Shows, T. & Collins, T. Isolation and characterization of a novel zinc-finger protein with transcription repressor activity. J. Biol. Chem. 270, 22143–22152 (1995).
Thomas, J. H. & Emerson, R. O. Evolution of C2H2-zinc finger genes revisited. BMC Evol. Biol. 9, 51 (2009).
Siggs, O. M. & Beutler, B. The BTB-ZF transcription factors. Cell Cycle 11, 3358–3369 (2012).
Sander, T. L. et al. The SCAN domain defines a large family of zinc finger transcription factors. Gene 310, 29–38 (2003).
Edelstein, L. C. & Collins, T. The SCAN domain family of zinc finger transcription factors. Gene 359, 1–17 (2005).
Bonchuk, A., Balagurov, K. & Georgiev, P. BTB domains: a structural view of evolution, multimerization, and protein-protein interactions. BioEssays: N. Rev. Mol., Cell. Develop. Biol. 45, e2200179 (2023).
Mance, L. et al. Dynamic BTB-domain filaments promote clustering of ZBTB proteins. Mol. Cell 84, 2490–2510 e2499 (2024).
Park, P. M. C. et al. Polymerization of ZBTB transcription factors regulates chromatin occupancy. Mol. Cell 84, 2511–2524 e2518 (2024).
Orkin, S. H. & Bauer, D. E. Emerging genetic therapy for sickle cell disease. Annu Rev. Med. 70, 257–271 (2019).
Lan, X. et al. ZNF410 Uniquely activates the NuRD component CHD4 to silence fetal hemoglobin expression. Mol. Cell 81, 239–254 e238 (2021).
Vinjamur, D. S. et al. ZNF410 represses fetal globin by singular control of CHD4. Nat. Genet 53, 719–728 (2021).
Sankaran, V. G. et al. Human fetal hemoglobin expression is regulated by the developmental stage-specific repressor BCL11A. Science 322, 1839–1842 (2008).
Masuda, T. et al. Transcription factors LRF and BCL11A independently repress expression of fetal hemoglobin. Science 351, 285–289 (2016).
Borg, J. et al. Haploinsufficiency for the erythroid transcription factor KLF1 causes hereditary persistence of fetal hemoglobin. Nat. Genet 42, 801–805 (2010).
Zhou, D., Liu, K., Sun, C. W., Pawlik, K. M. & Townes, T. M. KLF1 regulates BCL11A expression and gamma- to beta-globin gene switching. Nat. Genet 42, 742–744 (2010).
Zhang, X., Xia, F., Zhang, X., Blumenthal, R. M. & Cheng, X. C2H2 Zinc finger transcription factors associated with hemoglobinopathies. J. Mol. Biol. 436, 168343 (2024).
Dinkova-Kostova, A. T. & Copple, I. M. Advances and challenges in therapeutic targeting of NRF2. Trends Pharm. Sci. 44, 137–149 (2023).
Bonazzi, S. et al. Discovery and characterization of a selective IKZF2 glue degrader for cancer immunotherapy. Cell Chem. Biol. 30, 235–247 e212 (2023).
Li, J., Lv, A., Chen, M., Xu, L. & Huang, H. Activating transcription factor 4 in erythroid development and beta -thalassemia: a powerful regulator with therapeutic potential. Ann. Hematol. 103, 2659–2670 (2024).
Pant, T., Uche, N., Juric, M., Zielonka, J. & Bai, X. Regulation of immunomodulatory networks by Nrf2-activation in immune cells: Redox control and therapeutic potential in inflammatory diseases. Redox Biol. 70, 103077 (2024).
Zhang, X. et al. ATF family members as therapeutic targets in cancer: from mechanisms to pharmacological interventions. Pharmacol. Res. 208, 107355 (2024).
Liu, N. et al. Direct promoter repression by BCL11A controls the fetal to adult hemoglobin switch. Cell 173, 430–442 e417 (2018).
Zha, J. et al. Identification of novel BCL11A variant in a patient with developmental delay and behavioural differences. Int. J. Dev. Neurosci. 84, 727–73 (2024).
Shen, Y. et al. Pathogenic BCL11A variants provide insights into the mechanisms of human fetal hemoglobin silencing. PLoS Genet 17, e1009835 (2021).
Dias, C. et al. BCL11A haploinsufficiency causes an intellectual disability syndrome and dysregulates transcription. Am. J. Hum. Genet 99, 253–274 (2016).
Zheng, G. & Orkin, S. H. Transcriptional repressor BCL11A in erythroid cells. Adv. Exp. Med. Biol. 1459, 199–215 (2024).
Martyn, G. E. et al. Natural regulatory mutations elevate the fetal globin gene via disruption of BCL11A or ZBTB7A binding. Nat. Genet 50, 498–503 (2018).
Zhang, X., Blumenthal, R. M. & Cheng, X. Updated understanding of the protein-DNA recognition code used by C2H2 zinc finger proteins. Curr. Opin. Struct. Biol. 87, 102836 (2024).
Yang, Y. et al. Structural basis for human ZBTB7A action at the fetal globin promoter. Cell Rep. 36, 109759 (2021).
Yang, J. et al. Structures of CTCF-DNA complexes including all 11 zinc fingers. Nucleic Acids Res. 51, 8447–8462 (2023).
Persikov, A. V. & Singh, M. De novo prediction of DNA-binding specificities for Cys2His2 zinc finger proteins. Nucleic Acids Res. 42, 97–108 (2014).
Zheng, G. et al. A tetramer of BCL11A is required for stable protein production and fetal hemoglobin silencing. Science 386, 1010–1018 (2024).
Kurita, R. et al. Establishment of immortalized human erythroid progenitor cell lines able to produce enucleated red blood cells. PLoS ONE 8, e59890 (2013).
Orkin, S. H. & Higgs, D. R. Medicine. Sickle cell disease at 100 years. Science 329, 291–292 (2010).
Peschle, C. et al. Embryonic-Fetal Hb switch in humans: studies on erythroid bursts generated by embryonic progenitors from yolk sac and liver. Proc. Natl Acad. Sci. USA 81, 2416–2420 (1984).
Ivaldi, M. S. et al. Fetal gamma-globin genes are regulated by the BGLT3 long noncoding RNA locus. Blood 132, 1963–1973 (2018).
Takase, S. et al. A specific G9a inhibitor unveils BGLT3 lncRNA as a universal mediator of chemically induced fetal globin gene expression. Nat. Commun. 14, 23 (2023).
Ma, S. P. et al. Long noncoding RNA HBBP1 enhances gamma-globin expression through the ETS transcription factor ELK1. Biochem. Biophys. Res. Commun. 552, 157–163 (2021).
Huang, P. et al. Comparative analysis of three-dimensional chromosomal architecture identifies a novel fetal hemoglobin regulatory element. Genes Dev. 31, 1704–1713 (2017).
Ma, Y. et al. Genome-wide analysis of pseudogenes reveals HBBP1’s human-specific essentiality in erythropoiesis and implication in beta-thalassemia. Dev. Cell 56, 478–493 e411 (2021).
Nuinoon, M. et al. A genome-wide association identified the common genetic variants influence disease severity in beta0-thalassemia/hemoglobin E. Hum. Genet 127, 303–314 (2010).
Giannopoulou, E. et al. A single nucleotide polymorphism in the HBBP1 gene in the human beta-globin locus is associated with a mild beta-thalassemia disease phenotype. Hemoglobin 36, 433–445 (2012).
Galarneau, G. et al. Fine-mapping at three loci known to affect fetal hemoglobin levels explains additional genetic variation. Nat. Genet 42, 1049–1051 (2010).
Yang, Y. et al. Structural insights into the recognition of gamma-globin gene promoter by BCL11A. Cell Res. 29, 960–963 (2019).
Viennet, T. et al. Structural insights into the DNA-binding mechanism of BCL11A: the integral role of ZnF6. Structure. 19, 2024.01.17.576058 (2024).
Luscombe, N. M., Laskowski, R. A. & Thornton, J. M. Amino acid-base interactions: a three-dimensional analysis of protein-DNA interactions at an atomic level. Nucleic Acids Res. 29, 2860–2874 (2001).
Horton, J. R., Liebert, K., Hattman, S., Jeltsch, A. & Cheng, X. Transition from nonspecific to specific DNA interactions along the substrate-recognition pathway of dam methyltransferase. Cell 121, 349–361 (2005).
Kalodimos, C. G. et al. Structure and flexibility adaptation in nonspecific and specific protein-DNA complexes. Science 305, 386–389 (2004).
Patel, A. et al. DNA conformation induces adaptable binding by tandem zinc finger proteins. Cell 173, 221–233 e212 (2018).
Abramson, J. et al. Accurate structure prediction of biomolecular interactions with AlphaFold 3. Nature 630, 493–500 (2024).
Kaur, G. et al. Allosteric autoregulation of DNA binding via a DNA-mimicking protein domain: a biophysical study of ZNF410-DNA interaction using small angle X-ray scattering. Nucleic Acids Res. 51, 1674–1686 (2023).
Grabarczyk, P. et al. The N-terminal CCHC zinc finger motif mediates homodimerization of transcription factor BCL11B. Mol. Cell. Biol. 38, e00368–17 (2018).
Okuyama, K. et al. A mutant BCL11B-N440K protein interferes with BCL11A function during T lymphocyte and neuronal development. Nat. Immunol. 25, 2284–2296 (2024).
Choo, Y. & Klug, A. Physical basis of a protein-DNA recognition code. Curr. Opin. Struct. Biol. 7, 117–125 (1997).
Wienert, B. et al. KLF1 drives the expression of fetal hemoglobin in British HPFH. Blood 130, 803–807 (2017).
Martyn, G. E. et al. A natural regulatory mutation in the proximal promoter elevates fetal globin expression by creating a de novo GATA1 site. Blood 133, 852–856 (2019).
Peron, A. et al. BCL11A intellectual developmental disorder: defining the clinical spectrum and genotype-phenotype correlations. Eur. J. Hum. Genet. 33, 312–324 (2024).
Patel, A., Zhang, X., Blumenthal, R. M. & Cheng, X. Structural basis of human PR/SET domain 9 (PRDM9) allele C-specific recognition of its cognate DNA sequence. J. Biol. Chem. 292, 15994–16002 (2017).
Paillard, G., Deremble, C. & Lavery, R. Looking into DNA recognition: zinc finger binding specificity. Nucleic Acids Res. 32, 6673–6682 (2004).
Mitra, R. et al. Geometric deep learning of protein-DNA binding specificity. Nat. Methods 21, 1674–1683 (2024).
Liu, Y., Zhang, X., Blumenthal, R. M. & Cheng, X. A common mode of recognition for methylated CpG. Trends Biochem. Sci. 38, 177–183 (2013).
Zhang, X., Blumenthal, R. M. & Cheng, X. DNA-binding proteins from MBD through ZF to BEN: recognition of cytosine methylation status by one arginine with two conformations. Nucleic Acids Res. 52, 11442–11454 (2024).
Nolte, R. T., Conlin, R. M., Harrison, S. C. & Brown, R. S. Differing roles for zinc fingers in DNA recognition: structure of a six-finger transcription factor IIIA complex. Proc. Natl Acad. Sci. USA 95, 2938–2943 (1998).
Huang, P. et al. HIC2 controls developmental hemoglobin switching by repressing BCL11A transcription. Nat. Genet 54, 1417–1426 (2022).
Hashimoto, H. et al. Structural basis for the versatile and methylation-dependent binding of CTCF to DNA. Mol. Cell 66, 711–720 e713 (2017).
Yin, M. et al. Molecular mechanism of directional CTCF recognition of a diverse range of genomic sites. Cell Res. 27, 1365–1377 (2017).
Liu, H. et al. Functional studies of BCL11A: characterization of the conserved BCL11A-XL splice variant and its interaction with BCL6 in nuclear paraspeckles of germinal center B cells. Mol. Cancer 5, 18 (2006).
Bellemare, J., Rouleau, M., Harvey, M. & Guillemette, C. Modulation of the human glucuronosyltransferase UGT1A pathway by splice isoform polypeptides is mediated through protein-protein interactions. J. Biol. Chem. 285, 3600–3607 (2010).
Avram, D. et al. Isolation of a novel family of C(2)H(2) zinc finger proteins implicated in transcriptional repression mediated by chicken ovalbumin upstream promoter transcription factor (COUP-TF) orphan nuclear receptors. J. Biol. Chem. 275, 10315–10322 (2000).
Kuo, T. Y., Hong, C. J., Chien, H. L. & Hsueh, Y. P. X-linked mental retardation gene CASK interacts with Bcl11A/CTIP1 and regulates axon branching and outgrowth. J. Neurosci. Res. 88, 2364–2373 (2010).
Avram, D., Fields, A., Senawong, T., Topark-Ngarm, A. & Leid, M. COUP-TF (chicken ovalbumin upstream promoter transcription factor)-interacting protein 1 (CTIP1) is a sequence-specific DNA binding protein. Biochem. J. 368, 555–563 (2002).
Wiegreffe, C. et al. Bcl11a (Ctip1) controls migration of cortical projection neurons through regulation of Sema3c. Neuron 87, 311–325 (2015).
Senawong, T. et al. Involvement of the histone deacetylase SIRT1 in chicken ovalbumin upstream promoter transcription factor (COUP-TF)-interacting protein 2-mediated transcriptional repression. J. Biol. Chem. 278, 43041–43050 (2003).
Senawong, T., Peterson, V. J. & Leid, M. BCL11A-dependent recruitment of SIRT1 to a promoter template in mammalian cells results in histone deacetylation and transcriptional repression. Arch. Biochem. Biophys. 434, 316–325 (2005).
Punwani, D. et al. Multisystem anomalies in severe combined immunodeficiency with mutant BCL11B. N. Engl. J. Med. 375, 2165–2176 (2016).
Wang, J., Malecka, A., Troen, G. & Delabie, J. Comprehensive genome-wide transcription factor analysis reveals that a combination of high affinity and low affinity DNA binding is needed for human gene regulation. BMC Genom. 16, S12 (2015).
Farley, E. K. et al. Suboptimization of developmental enhancers. Science 350, 325–328 (2015).
Farley, E. K., Olson, K. M., Zhang, W., Rokhsar, D. S. & Levine, M. S. Syntax compensates for poor binding sites to encode tissue specificity of developmental enhancers. Proc. Natl Acad. Sci. USA 113, 6508–6513 (2016).
Delker, R. K., Ranade, V., Loker, R., Voutev, R. & Mann, R. S. Low affinity binding sites in an activating CRM mediate negative autoregulation of the Drosophila Hox gene Ultrabithorax. PLoS Genet. 15, e1008444 (2019).
Kribelbauer, J. F., Rastogi, C., Bussemaker, H. J. & Mann, R. S. Low-affinity binding sites and the transcription factor specificity paradox in eukaryotes. Annu. Rev. Cell Develop. Biol. 35, 357–379 (2019).
Terrell, J. R. et al. DNA selection by the master transcription factor PU.1. Cell Rep. 42, 112671 (2023).
Lan, F. et al. Recognition of unmethylated histone H3 lysine 4 links BHC80 to LSD1-mediated gene repression. Nature 448, 718–722 (2007).
Vonrhein, C. et al. Data processing and analysis with the autoPROC toolbox. Acta Crystallogr. D. Biol. Crystallogr. 67, 293–302 (2011).
Kabsch, W. Xds. Acta Crystallogr. D. Biol. Crystallogr. 66, 125–132 (2010).
Evans, P. Scaling and assessment of data quality. Acta Crystallogr. D. Biol. Crystallogr. 62, 72–82 (2006).
Evans, P. R. & Murshudov, G. N. How good are my data and what is the resolution? Acta Crystallogr. D. Biol. Crystallogr. 69, 1204–1214 (2013).
McCoy, A. J. et al. Phaser crystallographic software. J. Appl. Crystallogr. 40, 658–674 (2007).
Headd, J. J. et al. Use of knowledge-based restraints in phenix.refine to improve macromolecular refinement at low resolution. Acta Crystallogr. D. Biol. Crystallogr. 68, 381–390 (2012).
Afonine, P. V. et al. Towards automated crystallographic structure refinement with phenix.refine. Acta Crystallogr. D. Biol. Crystallogr. 68, 352–367 (2012).
Brunger, A. T. Free R value: cross-validation in crystallography. Methods Enzymol. 277, 366–396 (1997).
Brunger, A. T. Free R value: a novel statistical quantity for assessing the accuracy of crystal structures. Nature 355, 472–475 (1992).
Emsley, P., Lohkamp, B., Scott, W. G. & Cowtan, K. Features and development of Coot. Acta Crystallogr. D. Biol. Crystallogr. 66, 486–501 (2010).
Emsley, P. & Cowtan, K. Coot: model-building tools for molecular graphics. Acta Crystallogr. D. Biol. Crystallogr. 60, 2126–2132 (2004).
Read, R. J. et al. A new generation of crystallographic validation tools for the protein data bank. Structure 19, 1395–1412 (2011).
Taing, L. et al. Cistrome data browser: integrated search, analysis and visualization of chromatin data. Nucleic Acids Res. 52, D61–D66 (2024).

Acknowledgements

The work is supported by grants R35GM134744 (to X.C.) and R01DK132286 (to Y.H. and X.C.) from the U.S. National Institutes of Health (NIH). M.Y. is supported by a training fellowship from the Gulf Coast Consortia on the National Library of Medicine Training Program in Biomedical Informatics and Data Science (T15LM007093). M.T. is an undergraduate at the University of Houston and R.R.A. is an undergraduate at the University of Texas at Austin; both were supported by the Catalyst Summer Research Training Program. M.T. was supported by a generous gift from H-E-B directed to the Partnership for Careers in Cancer Science and Medicine Program, Division of Education & Training. R.R.A. was supported by the MD Anderson Summer Program in Cancer Research (R25CA181004). Xiaotian Z. is supported by NIH grant R01HL170115. X.C. is a CPRIT Scholar in Cancer Research. We thank Dr. Xiangpeng Kong (New York University) for assistance with access to 17-ID-1 beamtime, and the beamline scientists of the National Synchrotron Light Source (NSLS) II, Brookhaven National Laboratory. The use of NSLS II resources 17-ID-1 was supported by the U.S. Department of Energy under contract DE-SC0012704.

Author information

Authors and Affiliations

Department of Epigenetics and Molecular Carcinogenesis, University of Texas MD Anderson Cancer Center, Houston, TX, 77030, USA
John R. Horton, Meigen Yu, Jujun Zhou, Melody Tran, Rithvi R. Anakal, Yue Lu, Xing Zhang & Xiaodong Cheng
Department of Medical Microbiology and Immunology, and Program in Bioinformatics, The University of Toledo College of Medicine and Life Sciences, Toledo, OH, 43614, USA
Robert M. Blumenthal
Department of Biochemistry and Molecular Biology, The University of Texas Health Science Center Houston, McGovern Medical School, Houston, TX, 77030, USA
Xiaotian Zhang
Center for Epigenetics and Disease Prevention, Institute of Biosciences and Technology, College of Medicine, Texas A&M University, Houston, TX, 77030, USA
Yun Huang

Contributions

J.R.H. performed protein purification, crystallization, and X-ray diffraction experiments. M.Y., with the help of Y.L., analyzed published ChIP-seq and CUT&RUN data. M.T. and R.R.A., under the guidance of J.Z., performed FP and EMSA binding assays. Xiaotian Z. and Y.H. provided advice on BCL11A biology and participated in discussion. R.M.B. participated in discussions, sequence analysis, and in preparing the manuscript. Xing Z. and X.C. designed and organized the scale of the study.

Corresponding author

Correspondence to Xiaodong Cheng.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Communications thanks Andrew Perkins, Dong Wang and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. A peer review file is available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Horton, J.R., Yu, M., Zhou, J. et al. Multimeric transcription factor BCL11A utilizes two zinc-finger tandem arrays to bind clustered short sequence motifs. Nat Commun 16, 3672 (2025). https://doi.org/10.1038/s41467-025-58998-7

Received: 01 November 2024
Accepted: 08 April 2025
Published: 17 April 2025
Version of record: 17 April 2025
DOI: https://doi.org/10.1038/s41467-025-58998-7