Abstract
BCL11A, a transcription factor, is vital for hematopoiesis, including B and T cell maturation and the fetal-to-adult hemoglobin switch. Mutations in BCL11A are linked to neurodevelopmental disorders. BCL11A contains two DNA-binding zinc-finger arrays, low-affinity ZF2-3 and high-affinity ZF4-6, separated by a 300-amino-acid linker. ZF2-3 and ZF4-5 share 73% identity, including five out of six DNA base-interacting residues. These arrays bind similar short sequence motifs in clusters, with the linker enabling a broader binding span. Crystallographic structures of ZF4-6, in complex with oligonucleotides from the β-globin locus region, reveal DNA sequence recognition by residues Asn756 (ZF4), Lys784 and Arg787 (ZF5). A Lys784-to-Thr mutation, linked to a neurodevelopmental disorder with persistent fetal globin expression, reduces DNA binding over 10-fold but gains interaction with a variable base pair. BCL11A isoforms may form oligomers, enhancing chromatin occupancy and repressor functions by allowing multiple copies of both low- and high-affinity ZF arrays to bind DNA. These distinctive properties, apparently conserved among vertebrates, provide essential functional flexibility to this crucial regulator.
Similar content being viewed by others
Introduction
DNA-binding transcription factors (TFs) are central to regulating gene expression across the biological kingdoms1,2,3. TFs typically bind to specific genomic locations in a DNA sequence-specific manner. Approximately 1500 sequence-specific DNA-binding TFs have been annotated in the human genome4,5. Among these, three structural classes account for the majority of TFs: helix-loop-helix proteins ( ~ 100), homeodomain proteins ( ~ 250), and Cys2-His2 (C2H2) zinc-finger proteins ( ~ 700). Many regulatory promoter and enhancer elements are controlled by multiple TFs. These DNA-binding proteins often use a protein-protein interaction domain to associate with identical molecules, or other family members, to form functional dimers (homo- or heterodimers), tetramers, or higher-order oligomers6,7. Such interactions enhance their binding affinity, extend the length of the recognition sequence (thus boosting specificity), and increase chromatin residency time, all of which contribute to stably controlling transcriptional output. Specific multimeric examples include the AP-1 (Fos/Jun) dimer8, Myc/Max9, NF-κB10, and p53 tetramer11.
Among the ~700 C2H2 zinc-finger (ZF) proteins, many include an additional protein-protein interaction domain at their N-terminus: KRAB (Krüppel-associated box)12, BTB (broad-complex, tramtrack, and bric-a-brac; also known as POZ for poxvirus and zinc finger)13, and SCAN (SRE-ZBP, Ctfin51, AW-1 (ZNF174), Number 18 cDNA (ZNF20))14. There are an estimated ~350 KRAB-ZF proteins15 (about half of all C2H2 ZF proteins), ~50 BTB-ZF proteins16, and ~70 SCAN-ZF proteins17,18 (about 10% of C2H2 ZF proteins). Some proteins ( ~ 24) contain both SCAN and KRAB domains. While the KRAB domain is almost exclusively found in C2H2-ZF proteins, the SCAN and BTB domains are also present in non-ZF proteins13,19. Some BTB-ZF proteins polymerize into filaments to regulate chromatin occupancy20,21. The remaining ~200 C2H2-ZF proteins lack a recognizable protein-protein interaction domain, leaving their regulatory mechanisms unclear.
Specific mutations in the human adult β-globin gene result in sickle cell disease and β-thalassemia, both of which can be alleviated by increased expression of fetal globin (HbF)22. Using CRISPR-Cas9 screens of ~1500 DNA binding proteins in the Human Umbilical cord blood-Derived Erythroid Progenitor (HUDEP-2) cells, which expresses adult hemoglobin, two studies identified several groups of candidate regulators (including ZNF410) of HbF gene expression23,24. These direct or indirect regulators include several C2H2 ZF proteins, some of which are already known (prior to the CRISPR screens) to silence HbF genes in adult human erythroid cells: BCL11A25, ZBTB7A26, and KLF127,28. Among these, ZBTB7A is a BTB/POZ-ZF protein, while the others lack a recognizable protein-protein interaction domain29. Manipulating TF activities can have therapeutic value30,31,32,33,34.
In this study, we investigate the DNA binding ability of B-cell leukemia/lymphoma 11 A (BCL11 A). BCL11 A (i) is highly conserved among vertebrates29, (ii) plays a critical role in lymphocyte development during normal hematopoiesis, (iii) directly suppresses HBG1/2 expression (the genes for fetal γ-globin) in adult erythroid cells, (iv) controls the fetal-to-adult hemoglobin switch35, and (v) has a number pathogenic human variants36,37,38. BCL11 A39 contains two ZF tandem arrays (ZF2-3 and ZF4-6) separated by a 300-amino-acid spacer (Fig. 1A). Our findings show that each ZF array in BCL11A can bind a similar six-base pair consensus motif. After reexamining published ChIP-seq peaks40, we noticed that multiple copies of these homotypic motifs are present in BCL11A binding sites. We propose that DNA binding by the two ZF tandem arrays significantly extends the recognized sequence, thereby stabilizing transcriptional silencing of HBG1/2, together with BGLT3 and HBBP1, in HUDEP-2 erythroid cells. We tested this hypothesis through structural and DNA-protein interaction analyses.
A The four major splicing isoforms of BCL11A are depicted. The length of each isoform and its corresponding UniProt accession number is provided. Each C2H2-ZF unit is numbered and highlighted in blue, with relevant sequence Logos under the XL-isoform ZF clusters (see panel D). The acidic region is labeled E and shown within a red box; proline-rich and serine/threonine-rich regions are also labeled. The alternative amino acids unique to each splicing isoform are indicated by black boxes at the carboxyl ends. The mutations of BCL11A linked to neurodevelopmental disorders, derived from refs. 37,38,70, are labeled in ZF0, ZF1, ZF4 and ZF5 in the XL isoform. B Sequence alignment of the C2H2 fingers, showing variations at the DNA base-interacting positions −1, −4, and −7. On the right, the HMMER algorithm has calculated the bit score for ZF1-ZF6 (http://zf.princeton.edu/index.php), though no score could be calculated for ZF0. The stretch of five hydrophobic residues in ZF0 and ZF1 are underlined; four out of five are polar/charged in typical ZFs, including ZF2-6 of BCL11A. Red letters indicate residues associated with pathogenic mutations in ZF0, ZF1, ZF4 and ZF5. C Pairwise sequence alignment of ZF2-ZF3 and ZF4-ZF5, with the DNA-base-interacting positions labeled at −1, −4, and −7 for each finger. D Predicted DNA-binding specificities for ZF2–3 and ZF4-6 were determined via a polynomial Support-Vector-Machine44, and are shown to illustrate their similarity. E Consensus binding sequences of GST-ZF2-3 and GST-ZF4-6 derived from a protein binding microarray experiment35 (UniPROBE: LIU18A). F Consensus of binding by BCL11A derived from ChIP-seq40 and CUT&RUN35 experiments in HUDEP-2 erythroid cells. Notable similarities include DNA sequence positions 2, 5 and 6, but there are differences at positions 3 and 4. G Examples of recombinant proteins of ZF2-3 and ZF4-6 used in the study (N > 5 times).
Results
BCL11A contains two ZF tandem arrays predicted to bind similar sequences
Human BCL11A has 22 isoforms listed in GenBank (Supplementary Table 1), that range in size from 130 to 835 residues. Of these, BCL11A has four major splicing isoforms (Fig. 1A; bolded lines in Supplementary Table 1). The largest isoform (XL) includes 835 amino acids and contains six classic C2H2-ZF units. These six ZFs are in three groups (ZF1, ZF2-ZF3, and ZF4-ZF5-ZF6), separated by long stretches of residues (175 between ZFs 1 and 2, and 300 between ZFs 3 and 4). The L isoform lacks the C-terminal three-finger array, the S isoform is missing two-thirds of its C-terminal fragment (including five of the six fingers), and the XS isoform contains only 142 residues. The two ZF-containing, longer isoforms (XL and L) are expressed in primary adult human erythroid cells at both the RNA and protein levels25.
The two ZF cluster arrays (ZF2-3 and ZF4-6) in BCL11A XL are conventional DNA-binding fingers. Each finger coordinates a zinc ion through two cysteine and two histidine residues, following the typical spacing pattern: (Cx2C - x12 - Hx3-4H) with twelve residues between the last zinc-coordinating Cys and the first zinc-coordinating His (reviewed in ref. 41). Relative to the first Zn-coordinating His (Fig. 1B), the amino acids at positions -1, -4 and -7 of each finger are the principal determinants for recognizing three consecutive DNA bases, primarily on one DNA strand (referred to as the recognition strand; detailed in the structural analysis section).
In addition to the five canonical fingers, BCL11A also contains two atypical, isolated fingers near its N-terminus, referred to as ZF0 and ZF1 (Fig. 1B). ZF0 differs from the canonical fingers in three respects. First, in ZF0, the last His is substituted by a Cys, resulting in C3H1 (or CCHC) zinc coordination, similar to that seen in ZF4 of ZBTB7A42 and ZF11 of CTCF43. Second, ZF0 has 14 residues, instead of the usual 12, between the last Zn-coordinating Cys and the first Zn-coordinating His. Third, ZF0, like ZF1, has a stretch of five hydrophobic residues in place of the polar and charged residues in conventional ZF units (underlined sequences in Fig. 1B). Using a prediction method for C2H2-ZFs44, the resulting bit score for BCL11A finger ranged from 17.5 (ZF1) to 28 (ZF2-3) and 30 (ZF4-6) (Fig. 1B; higher scores indicate higher confidence). No score could be calculated for ZF0. While ZF0 is present in all four major isoforms of BCL11A (Fig. 1A), previous studies suggested that ZF0 and ZF1 are dispensable for HbF silencing by the XL isoform in mouse erythroleukemia cells35. However, more recent research indicated that ZF0 and ZF1 are required for BCL11A tetramer or dimer formation, which is essential for maintaining steady-state protein levels and properly silencing HbF in HUDEP-2 cells45. Additionally, introducing three pathogenic mutations found in patients with intellectual disability (T47P, C48F, and H66Q, respectively) into ZF0 of the L isoform of BCL11A prevented nuclear localization in HEK293T cells38.
Pairwise comparison between ZF2-3 and ZF4-5 shows a significantly higher percentage of identical amino acids (38/52, 73%) (Fig. 1C), than within each array. Specifically, aside from the zinc-coordinating Cys2His2 residues themselves, ZF2 and ZF3 share only two identical residues (Ser at position -5 and Leu at -3), while ZF4 and ZF5 share just three residues (including a Glu between the two zinc-coordinating cysteine residues) (Fig. 1B). The higher sequence identity between ZF2-3 and ZF4-5, particularly the 5/6 base-interacting residues at positions -1, -4 and -7, suggests that these two ZF arrays might bind closely-related DNA sequences. Indeed, DNA sequences of six or nine base pairs were predicted to bind with ZF2-3 or ZF4-6, respectively (Fig. 1D), following the traditional rule of one finger recognizing three base pairs. In an in vitro protein binding microarray experiment, N-terminal tagged GST-ZF2-3 and GST-ZF4-6 fragments bind a similar consensus sequence (Fig. 1E)35. The predicted binding sequences for both ZF arrays (ZF2-3 and ZF4-6) partially align with experimentally-determined motifs at DNA sequence positions of 2, 5 and 6, as identified by ChIP-seq40 and CUT&RUN35 in HUDEP-2 cells46. However the predicted and experimentally-determined sequence specificities do not match at two adjacent positions: the actual guanine at position 3 and a predicted thymine at position 4 (compare Fig. 1D and F).
To better understand DNA recognition by BCL11A, we generated two protein fragments of BCL11A, residues 373-433 (ZF2-3) and residues 730-835 (ZF4-6) (Fig. 1G). Additionally, we created the mutant K784T (which reflects a pathogenic human mutation37), that alters the base-interacting position -4 in ZF5, in the ZF4-6 fragment (Fig. 1B).
BCL11A binding peaks contain multiple copies of the binding motif
Given that the experimentally-determined sequence motif bound by BCL11A (XL or L) is relatively short (six nucleotides) compared to the ChIP_seq peak width of over 200-300 nucleotides, we investigated the number of motifs present under each peak. We analyzed five datasets from two independent experiments: two replicates of the endogenously inducible BCL11A-ER-V5 in HUDEP-2 cells using V5 antibody40 and three replicates of the BCL11A CUT&RUN35. The results from replicate 1 of Martyn et al. 40 are shown in Fig. 2 and are consistent across replicates/experiments (Supplementary Fig. 1).
A Histogram showing the genome-wide distribution of ChIP-seq peaks based on the number of TGNCCN motifs present (GSM2771529)40. B Violin plot showing the distribution of observed and expected motif counts. C A regression line showing the positive correlation between BCL11A ChIP-seq peak widths (nt) and the counts of associated motif counts. D Visualization of BCL11A ChIP-seq binding peaks at the β-globin locus. E Summary of BCL11A peaks and number of motifs in the β-globin locus. F Sequences of ChIP-seq peaks corresponding to panel D.
We searched for the 6-mer motif of TGNCCN, allowing variations at positions 3 and 6 (where N is any nucleotide), or its complementary sequence NGGNCA. Since only four base pair positions (1, 2, 4 and 5) are constant, we expect to find one occurrence of this motif approximately every 256 ( = 44) base pairs. Among the 4694 peaks examined, 3882 (82.7%) have at least two motifs, with a median of 4 (Fig. 2A). We used the peak width to calculate a theoretical expected motif count for each peak and determined that the observed mean motif count of four was 1.6X higher than the mean expected peak count (2.45). These differences were significant as indicated by the non-parametric Mann–Whitney U test (p = 1.3 × 10-266) (Fig. 2B). We observed that motif count was positively correlated with peak width (Spearman correlation coefficient = 0.42, p = 2.2 x 10-195) (Fig. 2C).
Next, we focused our analysis on the β-globin gene cluster on chromosome 11, where BCL11A naturally binds (Fig. 2D). The human β-globin cluster includes an embryonic ε gene, two fetal γ genes (HBG2 and HBG1), a non-coding BGLT3 (beta globin locus transcript 3), a pseudogene HBBP1 (hemoglobin beta subunit pseudogene 1), and the adult δ and β genes47. The order of genes in the cluster reflects their temporal activation order during development: the first gene switch occurs in early gestation when the embryonic ε-gene is silenced and the fetal γ-genes are activated (ε-to-γ); and the second switch occurs around birth, when the adult β-globin gene is activated, replacing fetal γ-globin gene expression (γ-to-β)48. In human erythroid HUDEP-2 cells, which express adult globins, BCL11A binding peaks were observed at the promoters of the γ-globin genes (HBG1 and HBG2), as well as the upstream locus control region (LCR), and downstream at BGLT3 and HBBP135,40 (Fig. 2D).
We analyzed each BCL11A binding peak in the β-globin locus for the presence of the TGNCCN motif. The locus control region (LCR) peak #1 contains six copies of this motif, peaks #2, #3 and #5 occupy the regulatory regions of HBG2, HBG1, and HBBP1, while peaks #4 and #6 occupy the intrageneric regions of BGLT3 and HBB (Fig. 2D). The BGLT3 long non-coding RNA (ncRNA) induces γ-globin expression49,50. Positioned ~1-kb downstream of BGLT3 is the HBBP1 pseudogene (Fig. 2D) and overexpression of HBBP1 mRNA increases γ-globin expression in HUDEP-2 cells51. On the other hand, the HBBP1 genomic region might contain a repressive regulatory element for γ-globin silencing in adult erythroblasts, as deleting the HBBP1 region together with the intergenic region between BGLT3 and HBBP1 produces effects similar to those of BCL11A deletion52. HBBP1 mRNA associates with the RNA-binding protein HNRNPA153. Additionally, a genome-wide association study revealed that the HBBP1 SNP rs2071348 (g.5264146 A > C) was associated with β-thalassemia severity54,55. Another SNP, rs10128556 (g.5220259 C > T), was found in the second intron of HBBP156. Given these associations between BCL11A and the HBBP1 region, and the implication of BCL11A in expression of the HBBP1 region, we focus here on the BCL11A binding site (peak 5) at HBBP1, which has six motifs (Fig. 2F).
ZF2-3 and ZF4-6 fragments bind TGNCCA
We synthesized two 20-bp DNA oligonucleotides (oligos #1 and #2), containing the TGACCA or TGCCCA motif, derived from the BCL11A binding site between BGLT3 and HBBP1 (Fig. 3A). The two oligos are nearly identical in their 5’ halves, differing only at one position between oligos #1 and #2, and are variable in their 3’ halves, conserved only at one position (Fig. 3B). We used three biophysical methods to characterize the binding affinities of BCL11A’s ZF2-3 and ZF4-6 arrays with these oligos: (a) fluorescence polarization (FP), (b) electrophoretic mobility shift assay (EMSA), and (c) isothermal titration calorimetry (ITC).
A BCL11A binding site at the regulatory region between BGLT3 and HBBP1. The top line of sequence shows the 190 nt region (chr11: 5,342,627-5,243,816) that contains three copies of TGNCCA (in red) and three copies of NGGNCA (in blue). Under that is a pairwise alignment with the well-studied promoter regions of the two γ-globin genes (HBG2 and HBG1). Every 10th nucleotide position upstream of the TSS of HBG2/1 is underlined and numbered. Four 20-nt oligos, #1, #2 and #3 of HBBP1 (highlighted in yellow and boxed) and #4 of HBG2/1, were synthesized for binding assays (Supplementary Fig. 2). B Sequence alignment of oligos #1 and #2, showing conservation in the 5’ half. C, D FP assays of ZF2-3 with oligos #1 and #2. The number of replicates is indicated by N and error bars represent SEM. For binding curves that did not reach saturation, the lower limit of the binding affinity was estimated. E, F FP assays of ZF4-6 with oligos #1 and #2. G, H FP assays of the K784T variant with oligos #1 and #2. I, J EMSA of ZF2-3 binding with oligos #1 and #2. The red asterisks indicate putative protein-DNA complexes. K, L EMSA of ZF4-6 binding with oligos #1 and #2. M, N EMSA of K784T binding with oligos #1 and #2. O ITC measurement of ZF4-6 with oligo #1. P ITC measurement of ZF4-6 with oligo #2. Source data are provided as a Source Data file.
Using FP, we measured the dissociation constants (KD). ZF2-3 bound oligos #1 and #2 weakly, with a KD in the low micromolar range ( > 2 μM) (Fig. 3C, D). Under the same conditions, ZF4-6 showed 20–50-fold higher binding affinity, with KD values of 110 nM for oligo #2 and 43 nM for oligo #1 (Fig. 3E, F). A mutation in ZF4-6 (K784T; shown in red in ZF5 in Fig. 1B) reduced binding affinity by 10-fold, to >1 μM for oligo #2 and 0.4 μM for oligo #1 (Fig. 3G, H), though binding was still significant. [The K784T mutation is discussed more below, but it is associated with a distinct neurodevelopmental disorder that is also characterized by persistent HbF expression37.]
The EMSA revealed distinctive binding modes not seen in the FP assays. For the high-affinity interaction between ZF4-6 and oligo #1, a single shift band appeared at or above KD value of ~40 nM (Fig. 3L). For oligo #2, the first shift occurred around 40 nM, and a second shift became visible at ZF4-6 protein concentrations above 0.6 μM (Fig. 3K). Similar two-band shifts were observed for the K784T mutant (Fig. 3M, N), which has reduced binding affinity for both oligos. For the ZF2-3 fragment, which has the lowest binding affinity, super-shift bands were seen at higher protein concentrations for both oligos (Fig. 3I, J). These additional retardations in protein-DNA complex mobility at increased protein concentrations suggest that multiple ZF proteins can bind to a single oligo.
ITC provides the stoichiometry of the interaction, in addition to measuring binding affinity. We used ITC to repeat the binding experiments of ZF4-6 with oligos #1 and #2 by titrating the oligos into the protein sample. For oligo #1, the DNA-to-protein stoichiometry was 1.1, with a KD value of 22 nM (Fig. 3O), consistent with results from FP and EMSA. For oligo #2, the DNA-to-protein stoichiometry was 0.5 (indicating one DNA molecule bound by two protein molecules), with a KD value of 1.1 μM (Fig. 3P).
Notably, the three different biophysical methods captured different aspects of the binding process in the case of ZF4-6 binding oligo #2: FP detected the first binding event with a lower KD of 0.11 μM (Fig. 3E), ITC revealed the second binding event (with a DNA-to-protein stoichiometry of 0.5) and a higher KD of 1.1 μM (Fig. 3P), while EMSA visually demonstrated the two binding events as distinct band shifts (Fig. 3K). As discussed below, the binding of two ZF4-6 polypeptides to oligo #2, but only single binding to oligo #1, may reflect the presence of a partial second match to the binding motif in oligo #2 (TGCCCA, and acCCCA; Fig. 3A, B).
Structure of ZF4-6 binding TGACCA-containing oligo #1 specifically and non-specifically
To understand how BCL11A ZF arrays bind DNA, and why multiple protein molecules sometimes occupy the same DNA, we structurally characterized ZF4-6 in complex with DNA. Previous studies have analyzed ZF4-6 of BCL11A bound to a 12-bp DNA sequence, centered at the -115 position of the HBG1/2 promoter57. More recently, ZF4-6 was reported in complex with a longer 19-bp DNA58, though the sequence used did not exactly match any in the human genome (Fig. 4A).
A List of currently available structures of BCL11A’s ZF4-6 and the DNA sequences used. The common sequences are shaded. B Three protein molecules, A, B and C, are observed in the asymmetric unit. C Superimposition of the three protein molecules. D Summary of specific interactions between the TGACCA sequence and ZF4-6 in molecule A. The last T:A base pair at position 20 was not observed in the current structure. E N753 interacts with T3:A3 base pair. F N756 interacts with A4. G S782 interacts with G5. H Q781 bridges between two oxygen atoms of G5 and T6. I K784 interacts with G7. J R787 interacts with G8. K R787 forms van der Waals contacts with T9. L K817 of ZF6 interacts DNA in the minor groove. M The conservation of guanine in TGNCCA is not explained (marked by a question mark) by the current structural data. N ZF4 and ZF5 of molecule B interact with DNA nonspecifically via phosphate contacts in the minor groove.
We co-crystallized ZF4-6 with oligo #1, derived from human chromosome 11 (5,342,701-5,243,720), and determined its structure at 2.09 Å resolution (Supplementary Table 2). In our binding studies (Fig. 3F, L and O), both FP and EMSA showed KD values of ~40 nM (ITC gave 22 nM), while both EMSA and ITC yielded single ZF4-6 binding (EMSA – no supershift; ITC – 1:1 stoichiometry). However, unlike previous structures, we found one double-stranded DNA molecule and three protein molecules (A, B and C) in the crystallographic asymmetric unit (Fig. 4B). Pairwise superimposition showed a root-mean-square deviation (rmsd) of 0.7 Å across 51 aligned residues of ZF4-5 between molecule A and molecule B, and 0.3 Å across 59 aligned residues of ZF5-6 between molecule A and molecule C. The largest movement (5.9 Å) was observed at Gly796, located in the linker between ZF5 and ZF6 (Fig. 4C). As described below, molecule A is involved in DNA-sequence specific interactions, while molecule B engages in non-specific DNA interactions, likely reflecting the very high concentration during crystallization (mM during crystallization and µM during EMSA and ITC). Molecule C is probably an artifact of the crystal lattice, where two DNA molecules are joined end-to-end.
Molecule A
Molecule A contained the intact three-finger array. Consistent with previous findings57, ZF4 and ZF5 of molecule A make base-specific contacts in the DNA major groove (Fig. 4D). For clarity, we numbered the DNA base pair positions from 1 to 20 (Fig. 4D). Although we used TGACCA as the motif sequence, it is the complementary strand that is recognized. Specifically, Asn753 of ZF4 forms water-mediated contacts with the T:A base pair at position 3 (Fig. 4E), while Asn756 of ZF4 forms bidentate hydrogen bonds with adenine at position 4 (A4) (Fig. 4F). Val759 of ZF4 is too distant from C5 to interact, but Ser782 of the next finger, ZF5, makes a cross-strand H-bond with G5 (Fig. 4G). Gln781 of ZF5 bridges two base pairs at positions 5 and 6, and also forms a H-bond with Lys784 (Fig. 4H). That same lysine (Lys784), as well as Arg787, both from ZF5, each make two H-bonds with G7 and G8, respectively (Fig. 4I and J). Additionally, that arginine (Arg787 of ZF5) forms a van der Waals contact with the methyl group of T9 (Fig. 4K). Rather than continuing along the DNA major groove, ZF6 crosses to the minor groove, where it makes a DNA phosphate contact via Thr814 and three water-mediated contacts with base pairs at 11, 12, and 13 from the minor groove side (Fig. 4L).
Significantly, among the above-described interactions, only three result in completely sequence-specific recognition: Asn756 with A4, Lys784 with G7, and Arg787 with G8. Other interactions, such as those involving Asn753 of ZF4 (water-mediated), Ser784 of ZF5 (cross-strand), and Gln781 of ZF5 (bridging two neighboring base pairs), are more versatile and can accommodate alternative base pairs (as reviewed in ref. 41). Compared to the experimentally derived ChIP-seq motif, the specificity for guanine in TGnCCA is not explained by the current structural data (Fig. 4M). The residue responsible for this recognition, Val759 of ZF4 (see Fig. 4G), is a small hydrophobic amino acid, not typically associated with base-specific interactions.
The most unexpected interaction involves Gln781, as the pairing of Gln (or Asn) with adenine is a common mechanism for adenine recognition59, consistent with the predicted sequence motif T(t/g)TCC(a/c) (Fig. 1D). We compared the binding affinities of four oligos with variation at the TGnCCA site (where n = A, C, T and G) (Supplementary Fig. 2A). ZF4-6 binds most strongly to a T:A pair (oligo #3), followed by A:T and G:C pairs, both showing approximately a 2× reduction in binding affinity, and C:G, with more than a 4× reduction (Supplementary Fig. 2B–E). This selectivity can be attributed to the hydrogen bonding pattern of Gln781, which donates a proton to the N7 nitrogen of adenine and accepts a proton from its N6 amino group (Supplementary Fig. 2F).
Molecule B
Molecule B had ZF4 and ZF5, but lacked electron density for ZF6. ZF4 and ZF5 approach the DNA from the minor groove of the second half of DNA, making contacts with seven phosphate groups - three on one strand and four on the other (Fig. 4N). Six residues from ZF4 and three from ZF5 – all polar and charged - are involved in these phosphate contacts. Notably, of these six phosphate-contacting residues, Asn753 and Asn756 of ZF4, and Gln781 and Arg787 of ZF5, are the same residues that participate in base-specific interactions in molecule A. Therefore, molecule B and molecule A appear to represent two states along the substrate-recognition pathway caught in one structure: non-specific and specific. A similar phenomenon has been observed with the bacteriophage T4 DNA methyltransferase60 and E. coli lac repressor DNA binding domain61, where the same protein residues shift from electrostatic interaction with the DNA backbone in a non-specific complex to base-specific binding in cognate complexes. These findings suggest that ZF4-ZF5 of BCL11A can detect local variations in DNA shape (minor versus major grooves) and electrostatic potential. In a binding study with the 12-bp shorter DNA, deletion of ZF6 resulted in 3.6X reduced binding affinity relative to that of ZF4-657.
Molecule C
Finally, molecule C had ZF5 and ZF6, but lacked electron density for ZF4. Molecule C occupies the junction between two neighboring DNA molecules in the crystal lattice (Supplementary Fig. 3). The oligo #1 used for crystallization contains two C:G base pairs at positions 1 and 2, which partially match the consensus sequence recognized by ZF5. Indeed, Lys784 and Arg787 of ZF5 in molecule C interact with G1 and G2 (Supplementary Fig. 3) in the same manner as in molecule A. Interestingly, ZF6 adopts the same conformation in both the two-finger array of molecule C and the three-finger array of molecule A, inserting its Lys817 into the minor groove. In a binding study with the 12-bp shorter DNA, deletion of ZF4 resulted in 100X reduced binding affinity relative to that of ZF4-657.
Structure of K784T variant: two protein molecules bind one oligo #1 DNA
In addition to the wild-type ZF4-6, we also determined the structure of the K784T mutant37 in complex with the same DNA (oligo #1) (Supplementary Table 2; see also Fig. 3G, H, M and N). In this structure, two K784T variant proteins (termed molecules D and E) traverse the major groove of DNA, making specific contacts with each half of the DNA molecule (Fig. 5A). The two mutant structures are highly similar, with an rmsd of 0.76 Å across the three fingers and 75 aligned residues (Fig. 5B). Each molecule recognizes a different half of the DNA, with molecule D interacting with base pairs 3-9, and molecule E with base pairs 10-14 (Fig. 5C, D). Asn756 of ZF4 in each molecule specifically recognizes A4 and A11, respectively (Fig. 5E, F).
A Two molecules of K784T bind one DNA molecule. B Superimposition of two mutant molecules. C Molecule D interacts with base pairs 3-9. D Molecule E interacts with base pairs 10-14. (E) N756 of molecule D interacts with A4. (F) N756 of molecule F interacts with A11. G ZF5 of molecule D (including Thr784) recognizes C5 to T9. H ZF5 of molecule E (also including Thr784) recognizes T12 to T14. We observed that the AlphaFold3-predicted K784T mutant binds in a similar way to molecule D, and the interactions involving Asn756-A4 and Arg787-G8 are shifted by one base pair, now occurring at A3 and G7. There is no equivalent to molecule E is predicted.
ZF5 makes the most DNA base contacts. In molecule D, Gln781 and Arg787 each interact with two bases: C5 and T6, and G8 and T9, respectively (Fig. 5G). Thr784, which replaces the larger Lys784, is too short to reach the G7 base (Fig. 5G), suggesting that various base pairs can be accommodated at position 7. In molecule E, when Arg787 encounters adenine at position 15, its side chain becomes disordered, allowing Thr784 to move closer to T14 and form a van der Waals contact with its methyl group (Fig. 5H).
Compared to the wild-type structure described in Fig. 4, molecule D of the mutant closely resembles molecule A of the wild-type in recognizing the TGACCA sequence. In the wild-type, Lys784 in ZF4 occupies a base-interacting position (Fig. 1B), recognizing guanine at base pair position 7 (Fig. 4I). In contrast, the shorter side chain of Thr784 in the mutant is unable to touch the base (Fig. 5G). Interestingly, molecule E of the mutant adopts a specific conformation (Fig. 5H) that has apparently transitioned from the non-specific conformation observed in molecule B of the wild-type (Fig. 4N). This shift is likely due to the reduced specificity of the Thr784-Arg787 pairing (relative to Lys784-Arg787), which loosens the recognition from GG to NG (N = any nucleotide). However, when Arg encounters adenine, its side chain can adopt different conformations62 or become disordered, allowing the neighboring Thr784 to gain an interaction with T14. This flexibility allows the mutant to recognize a variety of different sequences.
Structure of ZF4-6 specifically binding TGCCCA-containing oligo #2
Next, we crystallized the three-finger array ZF4-6 with oligo #2, which contains a TGCCCA element (Supplementary Table 2). As indicated by the EMSA and ITC data (Fig. 3K and P), two molecules of ZF4-6 bind to oligo #2 (molecules F and G in Fig. 6A). Molecule F binds to the 5’ TGCCCA sequence, while molecule G binds to the 3’ half, which contains an overlapping CCCA sequence (Fig. 6B and C). Both molecules use a pair of Lys784 and Arg787 from ZF5 to make sequence-specific contacts with guanine bases at G7 and G8, and G14 and G15, respectively (Fig. 6D, E). In addition, molecule F uses Asn756 from ZF4 to form a base-specific bidentate H-bonds with A4 (Fig. 6F). In both molecules, polar residues Asn753 and Ser755 from ZF4 (Fig. 6G), as well as Gln781 and Ser783 from ZF5 (Fig. 6H), are capable of forming H-bonds as both hydrogen donors and acceptors. These polar residues form cross-base pair and cross-strand interactions with variable base pairs at positions 2 (C:G) and 3 (T:A) (Fig. 6G), positions 5 (G:C) and 6 (C:G) (Fig. 6D), positions 9 (A:T) and 10 (A:T) (Fig. 6H), and positions 12 (C:G) and 13 (C:G) (Fig. 6E). In summary, our structural analyses suggest that ZF4-5 is capable of recognizing TGACCA (molecule A in Fig. 4), TGCCCA (molecule F), or ACCCCA (molecule G in Fig. 6).
A Two protein molecules, F and G, bind one dsDNA of oligo #2. B Summary of interactions between the TGCCCA sequence and ZF4-6 in molecule F (green). C Summary of interactions between the ACCCCA sequence and ZF4-6 in molecule G (cyan). D Four polar and charged residues from ZF5 of molecule F interact with five base pairs at positions 5–9. E Four residues from ZF5 of molecule G interact with five base pairs at positions 12–16. F N756 of molecule F interacts with A4. G N753 from ZF4 of molecule F interacts with two base pairs at positions 2 and 3. V759 is too distant from C5. H N753 and S755 from ZF4 of molecule G interacts with two base pairs at positions 9 and 10, and N756 interacts with T11. V759 is too distant from G12.
AlphaFold3 prediction
Since we do not yet have recombinant full-length, or larger segments of, BCL11A to study the coordinated binding events of its two ZF arrays, we used AlphaFold363 to model BCL11A onto a 70-bp DNA sequence containing both oligo #1 and oligo #2 (Fig. 7A). We aimed to answer three questions (Fig. 7B): what are the binding modes of (i) the isolated ZF2-3 (residues 373–433), (ii) the isolated ZF4-6 (residues 731-835), and (iii) a larger fragment encompassing residues 373–835. To simplify the model, we only considered two binding sites.
A A 70-bp DNA sequence containing two potential binding sites for BCL11A. B Three fragments modeled: central ZF2-3, C-terminal ZF5-6 and two-thirds of the C-terminal portion of BCL11A. The S/T-rich regions immediately after ZF2-3 or prior to ZF5-6 are indicated. Panels C to F are displayed in a similar orientation. C ZF2-3 can bind to both binding sites. D ZF-4-6 can bind to both binding sites. E In one predicted configuration, ZF2-3 (cyan) binds to the 3’ TGACCA site while ZF4-6 (green) binds to the 5’ TGCCCA site. The ~300-residue linker region loops back and forth twice. The AlphaFold3 model has a higher confidence (pLDDT values between 70 and 90 in a scale of 100) for the ZF domains (colored green and cyan), a low confidence (between 50 and 70) for the two additional helices after ZF3 and prior to ZF4 (colored gray), and a very low confidence (pLDDT <50) for the flexible linker. F In another predicted configuration, ZF2-3 and ZF4-6 respectively, from two different polypeptides, bind 5’ TGCCCA and 3’ TGACCA. For clarity, the non-structured loops are not shown.
AlphaFold3 predicted, consistent with our experimental data, that the two isolated ZF arrays can each bind to both the TGCCCA site (oligo #2) and TGACCA site (oligo #1) (Fig. 7C, D), with both ZF arrays traveling along the DNA in the same direction. For the large fragment containing both ZF arrays, the AlphaFold3 model placed ZF2-3 at the 3’ oligo #1 site and ZF4-6 at the 5’ oligo #2 site, with the ~300-residue linker between the two ZF arrays looping back and forth twice (Fig. 7E). Interestingly, AlphaFold3 also generated an alternative model, where ZF2-3 binds to the oligo #2, and ZF4-6 to the oligo #1, from two separate polypeptides (Fig. 7F). The AlphaFold3 models have a higher confidence (pLDDT values between 70 and 90 on a scale of 100) for the ZF domains, but very low model confidence (pLDDT <50) for the flexible linker (Fig. 7E). Our aim was to demonstrate that the long, 300-residue loop is sufficient to position the two zinc finger arrays in place. The shortest linker, at ~50 residues, would still allow for this placement (Supplementary Fig. 4).
It is noteworthy that immediately following ZF2-3 is a region abundant in serine and threonine residues, which are potential sites for protein phosphorylation, as well as acidic residues such as aspartate and glutamate (Fig. 7B and E). A similar feature has been observed in ZNF410, where an acidic and serine/threonine-rich loop precedes its five-finger array64. This loop has been shown to offer an additional layer of regulation to ZNF410’s DNA-binding ability, beyond just sequence specificity64, by acting as a cis-inhibitory DNA analog. This composition, which is conserved among vertebrate orthologs (Supplementary Fig. 5), suggests that the corresponding region in BCL11 A might function similarly, where phosphorylation or interactions with acidic residues may modulate its DNA binding activity or interactions with other proteins, providing a mechanism of regulation beyond the DNA sequence specificity.
An oligomer model of BCL11A mediated by the N-terminal ZF0 and ZF1
Inspired by the capabilities of AlphaFold3, we investigated whether the N-terminal region of BCL11A could self-interact to form dimers, tetramers or larger oligomers. During the review of this manuscript, Zheng et al. published a study presenting the crystal structure of the tetramer formed by isolated ZF0, and an AlphaFold3-predicted dimeric formation of isolated ZF145. In addition, the corresponding ZF0 in BCL11B, the paralog of BCL11A that has identical ZFs to those in BCL11A, has been shown to mediate the homo-multimerization of BCL11B65 as well as hetero-multimerization with BCL11A66. The modeling presented in our work provides an important confirmation of their most recent findings, and also provides insight and context (ZF2-3 and ZF4-6) to our evolving understanding of BCL11A’s role in HbF regulation.
We modeled residues 1-211, a fragment containing both ZF0 and ZF1, which is shared by the BCL11A-S, -L, and -XL isoforms (Fig. 1A and Supplementary Fig. 6A). The model predicts that the non-canonical ZF0 can dimerize and tetramerize, while ZF1 forms a second dimerization interface (Supplementary Fig. 6B). ZF0 is predicted to adopt two conformations, with either Gln70 or Cys71 as the fourth zinc ligand, resulting in a helix with different length (Supplementary Fig. 6C). Dimerization of ZF0 occurs via its β1 strand, extending the 2-strand β-sheet into a 4-strand β-sheet (Supplementary Fig. 6D). This dimer can further assemble into a tetramer through interactions involving the αA helix (Supplementary Fig. 6E). The N-terminal portion of this helix contains five hydrophobic residues, such as Ile60, which forms a hydrophobic core at the tetramer interface (Supplementary Fig. 6F).
Additionally, the two helices of ZF1 pack against each other to form a second dimer interface (αB in Supplementary Fig. 6G and 6H). The five-residue hydrophobic stretches at the beginning of the helices αA and αB are unique to the non-canonical ZF0 and ZF1, whereas the canonical ZFs at these positions contain polar and charged residues that interact with DNA bases (see Fig. 1B). Alternatively, this ZF1-medicated second dimer interface could form between two tetramers (Supplementary Fig. 6I), potentially allowing BCL11A to polymerize.
Discussion
We analyzed the structures of the three-finger array (ZF4-6) of BCL11A, in complex with two different oligonucleotides derived from the regulatory region between BGLT3 and HBBP1 (chr11: 5,243,691-5,243,760). In total, we observed seven ZF4-6 molecules (five wild-type and two K784T mutants), in three crystal structures with different binding modes. Of these, five molecules (A, D, E, F and G) make sequence-specific contacts with DNA bases (Figs. 4, 5 and 6). Molecule B forms non-specific contacts with the DNA phosphate backbone and is likely seen only at very high concentrations (Fig. 4N). Molecule C is positioned across the end-to-end junction of two DNA molecules at the crystal lattice, that is likely an artifact of crystallization (Supplementary Fig. 3). Each molecule of ZF4-6 interacts with DNA independently, as there is no obvious protein-protein interaction observed within the crystallographic asymmetric unit. We made the following observations.
Basis for sequence specificity
First, DNA sequence-specific recognition occurs mainly through three key residues: Asn756 in ZF4, and Lys784 and Arg787 in ZF5. These residues occupy the base-interacting positions of their respective ZF unit: position -4 of ZF4 and positions -4 and -1 of ZF5 (Fig. 8A). The pairing of Arg or Lys with guanine and Asn or Gln with adenine is a common mechanism for purine (guanine or adenine) recognition59,67. BCL11A follows this rule, recognizing adenine A2 (via Asn756) and guanines G5 and G6 (via Lys784 and Arg787) in the consensus sequence motif (Fig. 8A).
A Schematic representation of two triplet elements (three adjacent DNA base pairs) bound to ZF4-5 via major groove contacts. The DNA sequence of the recognition strand (bottom in magenta) is oriented right to left from 5’ to 3’. The complementary strand (top) is colored in gray and the numbers 1–7 match to the consensus sequence motif defined in Fig. 1D-1F. X:Y is variable base pair. The protein sequence is from N-to-C termini (left-to-right) and amino acids at positions −1, −4, and −7 (highlighted) relative to the first Zn-associated histidine interact specifically with the DNA bases shown above. The traditional residue numbers of ZF4-5 and ZF2-3 are provided for comparison. B Sequence alignment of four minor-groove spacer ZFs including the pre- and post-linker regions. Positively charged residues (K and R) are counted (“charge”).
The Lys784-to-Thr (K784T) pathogenic mutation reduced DNA binding by over 10-fold for both oligos #1 and #2 (Figs. 3G and 3H). The structure of K784T bound to oligo #1 shows that the altered protein interacts with a variable base pair (C:G or A:T) at position 5 (Fig. 5). A similar mutation (K365T) in CTCF has been observed, where the mutant binds all four base pairs with consistent affinities (2–4 × variation)43.
The importance of Asn756 and Arg787 in DNA binding has been previously evaluated through mutagenesis studies targeting both the protein and DNA sides57. Alanine substitutions of Asn756 (N756A) and Arg787 (R787A) resulted in 7.5X and ~130X reduction in DNA binding affinity, respectively. Similarly, altering the Asn756-interacting A:T base pair to T:A caused approximately 6X reduction in binding affinity. Changing the Arg787-interacting G:C base pair to the three other alternative base pairs of A:T, T:A or G:C led to a much more severe reduction of ~50-100X. These naturally occurring mutations at positions -118 and -114 in the fetal globin promoter are associated with fetal globin derepression and the benign genetic condition of hereditary persistence of fetal hemoglobin (HPFH)40,68,69. In addition, a recent analysis of 42 individuals with BCL11A-related intellectual developmental disorder identified a heterozygous Asn756-to-Lys missense variant in ZF470.
Second, Asn753 of ZF4 and Gln781 of ZF5 occupy the base-interacting position -7 in their respective ZF unit (Fig. 8A). Theoretically, they can recognize adenine at positions 1 and position 4 of the sequence motif. Indeed, a T:A base pair is predicted at position 4 (Fig. 1D). However, when ZF4 encounters a TTG sequence, the side chain of Asn753 is too short to reach adenine A1 (Fig. 6G). In the helical structure of DNA, base pairs are stacked like coins, with each rotated relative to the next, creating the helical shape. When adenine A2 interacts with Asn756 at the -4 position, A1 is rotated away from Asn753 at the -7 position. As a result, the shorter side chain of Asn753 makes two cross-strand interactions instead. Both the prediction and experimental derived consensus motifs show variability at position 1 of the sequence motif (Figs. 1D and 1F). A similar observation has been made with PRDM9, where an Asn at position -7 in two ZF units is too distant away to make specific contacts71.
For Gln781 of ZF5, which has a longer side chain, it was predicted to bind adenine at position 4, as observed with Gln264 in ZNF41023 and Gln536 in CTCF43. Indeed, our binding assay, which tested the four possible base pairs at position 4, showed that the ZF4-6 array binds most strongly to T:A, followed by A:T or G:C base pairs, with the least binding to C:G (Supplementary Fig. 1). In the absence of a preferred adenine, Gln781’s side chain rotates, allowing it to engage in hydrogen bonding (as both a H-donor and H-acceptor simultaneously) across two base pairs (positions 3 and 4) and two strands. Examples of these interactions include GpA/TpC in Fig. 3D, ApG/CpT in Fig. 5D, GpC/GpC in Fig. 6B, and CpC/GpG in Fig. 6C.
Third, the conservation of guanine at position 3 (Fig. 1F) is not explained by this study. The corresponding residue, Val759 in ZF4, is located too far from the base - 5.7 Å from G or 6.2 Å from C (Fig. 6G and 6H). Additionally, the hydrophobic valine residue is not typically involved in base-specific interactions. This may be a case of indirect sequence recognition72,73.
Fourth, the preference for an A:T base pair at position 7 can be explained by the interaction between the methyl group of thymine T7 and Arg787 in the TpG dinucleotide (Figs. 4K, 5G, 6D and 6E). The methyl group of T7 makes van der Waals contacts with Arg787, which also interacts with the 3’ Gua of the same strand, forming what we call ‘methyl-Arg-G triads’74,75.
Role of ZF6
Unexpectedly, ZF6 adopts the same conformation in six out of seven molecules (with molecule B lacking electron density for ZF6; Fig. 1C). Instead of continuing along the DNA major groove, ZF6 crosses to the minor groove, where it makes contacts with DNA phosphate and water-mediated interactions from the minor groove side (Fig. 4L). This behavior is not predicted by current knowledge of C2H2 ZF models, as the bit score for ZF6 is as high as for ZF4 and ZF5 (Fig. 1B). Other C2H2 ZF units acting as spacers across the minor groove have been observed before, such as ZF4 and ZF6 in TFIIIA76, ZF2 in Zfp56862, ZF2 in HIC277, and ZF8 in CTCF43,78,79. These spacer fingers do not share clear sequence similarity (Fig. 8B), and features that predict spacer functionality remain unknown. However, the presence of ZF6 strengthens the DNA-binding affinity of the ZF4-6 array, by contributing a positive entropy58, while the ZF2-3 array, lacking a similar spacer, binds more weakly (Fig. 3). We note that the conformation of the C-terminal ZF6 relative to ZF5 remains fixed when ZF5 binds DNA specifically (Supplementary Fig. 7A). However, ZF6 becomes disordered when ZF5 binds DNA non-specifically (as seen in molecule B in Fig. 4) or adopts varied conformations in a solution NMR structure in the absence of DNA58 (Supplementary Fig. 7B). Alignment of ZF4-6 in the presence or absence of DNA indicates that the inter-domain orientations between ZF4 and ZF5, as well as between ZF5 and ZF6, are not the same (Supplementary Fig. 7B). This suggests that the flexibility of the linkers between the two neighboring fingers allows the three-finger array to explore a broader range of DNA duplexes with varied sequences. A structure of ZF4-5, where ZF6 has become disordered, binds DNA similarly to that of ZF4-6 (Supplementary Fig. 7C, D).
Roles of ZF0 and ZF1
An isolated single ZF unit sometimes plays more complex roles than the DNA-binding ZF arrays. In the case of BCL11 A, mutations in ZF0 (Fig. 1A) are pathogenic and destabilize BCL11A38. The role of ZF0 in tetramerization (ref. 45 and described here) may explain the effects of these mutations on BCL11A stability. Furthermore, if ZF0 is truly mediating BCL11A multimerization, then the shortest splicing variants (S and XS in Fig. 1A) may play a role in modulating dimer and/or tetramer formation80, similar to what has been seen (for example) with human glucuronosyltransferase UGT1A81. In addition, ZF1 may allow higher-order multimerization (Supplementary Fig. 6I). All four splice variants have ZF0, while the three larger ones have ZF1 as well, so the modulation of multimerization could be quite refined, driven by the absolute and relative levels of all four variants.
Finally, the BCL11A ZF0 and ZF1 sequences, including the hydrophobic patches, are highly-conserved among BCL11A orthologs from six vertebrate classes, ranging from humans to whale sharks (Supplementary Fig. 5). Even whale shark BCL11A, with just 63% overall amino acid identity to the human ortholog, is 79% identical in ZF0 and 90% in ZF1. Indeed, the only significant difference between the human and other BCL11A sequences shown is the spacer between ZF0 and ZF1, which is 50% longer than in the nonmammalian vertebrates.
Role of ZF2-3
Notably, ZF2-3 and ZF5-6 are highly similar, sharing 73% sequence identity (Fig. 1C), including 7 out of 8 residues shown directly involved in DNA base interactions (Fig. 8A), enabling the ZF2-3 array to recognize similar sequences. In our study, we estimated the KD values for ZF2-3 binding to oligos #1 and #2 using FP and EMSA, which were in the low micromolar range ( > 2 μM). This is comparable to the findings by Viennel et al., where the two-finger array of ZF4-5, upon ZF6 deletion, exhibited a binding affinity of 2.15 μM in an AlphaScreen assay (see supplementary Fig. 5 of ref. 58). Nevertheless, the lower-affinity two-finger array of ZF4-5 is still able to bind DNA (Supplementary Fig. 7C, D).
The importance of the ZF2-3 domain of BCL11 A can be understood through its discovery and functional studies, particularly in relation to the L-isoform (Fig. 1A), which contains the ZF2-3 DNA binding domain but lacks the C-terminal ZF4-6 array. BCL11A was initially identified as CTIP1 in a yeast two-hybrid screen for proteins expressed in the brain82, with CTIP1 later renamed as BCL11A-L and its homolog CTIP2 renamed as BCL11B-XL. BCL11A-L and BCL11A-S are the two major isoforms expressed in neurons83, and when co-expressed in HEK293 cells, have transcriptional regulatory activity38. BCL11A-L binds to DNA directly through the DNA-binding domain mapped to ZF2-3 (referred to as ZF3 and ZF4 in ref. 84), with a consensus binding sequence of GGCCGGAGC. The underlined sequence is a variant of TGRCCR motif (R = G/A). In a conditional mouse model, BCl11A-L (CTIP1) directly suppresses the expression of Semaphorin 3c (Sema3c), which regulates cortical projection neuron polarization and migration85. The Sema3c promoter contains six repeats of the sequence TGGGCCGG. Additionally, BCL11A-L has been suggested to act as a transcriptional repressor in adult erythrocytes86,87. These earlier studies also proposed the idea of BCl11A self-association, including dimer and/or tetramer formation among its isoforms80,84. Furthermore, using CRISPR-Cas9 editing, BCL11A expression was evaluated in a transgenic mouse model where ZF4-6 was deleted from the endogenous mouse BCL11A locus58, BCL11A truncated protein lacking ZF4-6 (equivalent to the L-isoform) is more stable than wild-type BCL11A-XL (see supplementary Fig. 5F of ref. 58).
Finally, in BCL11B – the BCL11A paralog that contains the identical DNA binding zinc fingers — a germ line missense mutation yields N441K in ZF2. This ZF2-altered BCL11B allele was initially found in a newborn with severe combined immunodeficiency (SCID)88 and isolated from a patient with T cell deficiency and neurological disorders66. The mutant acts in a dominant-negative manner through the formation of a non- or dys-functional dimer65 and results in reduced binding to known canonical promoters, and also abnormal added binding to novel DNA sites not recognized by wild-type BCL11B66.
In summary, enhanced chromatin occupancy at clusters of binding sites is crucial for transcription factors to stably regulate gene expression. Several studies have demonstrated that low-affinity DNA binding directs transcriptional outcomes during tissue development, in ways that are distinct from and cannot be replaced by high-affinity interactions89,90,91,92,93. One notable example is the master myeloid transcription factor PU.1, which binds DNA with affinities spanning multiple orders of magnitude – covering a ~ 1000-fold range in dissociation constants94. In this study, we demonstrate that BCL11A uses two tandem zinc-finger arrays, low-affinity ZF2-3 and high-affinity ZF4-6, to bind clustered short sequence motifs. For instance, binding two six-base-pair motifs can extend across a 70-bp regulatory region between BGLT3 and HBBP1 (Fig. 7). The ~300-residue flexible linker between the two ZF arrays allows the motifs to be spaced farther apart, effectively increasing the binding span. Along with recently described N-terminal self-interactions45, different BCL11A isoforms may form dimers, tetramers, or larger polymers (Supplementary Fig. 6), enabling BCL11A to bind multiple clustered sites, thereby significantly enhancing chromatin occupancy and its repressor functions. Accordingly, it will be important to determine the relative expression of different BCL11A isoforms under different physiological conditions. These results deepen our understanding of this essential regulator.
Methods
BCL11A plasmid constructs
The human BCL11A ZF4-6 fragment (residues 730-835; pXC2391) was synthesized by GenScript and cloned into the pGEX-6P-1 vector, resulting in a construct with an N-terminal glutathione S-transferase (GST) tag followed by a recognition sequence (LEVLFGP) of human rhinovirus (HRV) type-14 3 C protease. The K784T mutant within the context of ZF4-6 (pXC2397) was generated by site-directed mutagenesis. The BCL11A ZF2-3 fragment (residues 373-433; pXC2394) was generated using overlapping PCR and cloned into a modified pET-28b vector, with a N-terminal His-SUMO tag.
BCL11A ZF arrays expression and purification
The three plasmids described just above were transformed into the Escherichia coli strain BL21 (DE3) Gold+ for protein overexpression. Bacteria were grown in LB medium in the presence of 50 µg/ml of ampicillin (ZF4-6 plasmids) or 25 µg/ml kanamycin (ZF2-3 plasmid). Cells were grown at 28 °C to an approximate OD600 of 1.0 at which 25 µM ZnCl2 was added to the cell culture and shaker temperature was reduced to 15 °C. Thirty minutes later expression was induced by 1 mM isopropyl β-d-1-thiogalactopyranoside (IPTG) and cultures were allowed to grow overnight ( ~ 16 h).
All harvested cells were suspended in 500 mM NaCl Buffer A - 20 mM HEPES, pH 7.5, 500 mM NaCl, 5% glycerol, 25 µM ZnCl2, and 0.5 mM Tris(2-carboxyethyl)phosphine (TCEP) - containing 1 mM phenylmethyl sulfonyl fluoride (PMSF). The cells were lysed by French Press followed by brief sonication. Lysates were cleared of cell debris by centrifugation of 70,000Gs.
For GST-ZF4-6, cleared lysate from cultures was loaded onto a 4 ml glutathione-Sepharose column and was then washed with Buffer A until Coomassie-stained 5 ml sampling spots on blotting paper displayed little color. Protein was eluted with Buffer A containing 20 mM glutathione. Pooled fusion protein was digested overnight with the in-house-produced 3C protease to release the GST tag.
For His-SUMO-ZF2-3, cleared lysate was loaded onto a 5-ml HiTrap Ni2+ column which was then washed by Buffer A containing 20 mM imidazole until there was little absorbance at 280 nm. The protein was eluted with a 20-column volume gradient with increasing amounts of imidazole (up to 300 mM) in Buffer A. Protein from pooled fractions was digested overnight with in-house-produced Ulp1 protease95 to cleave the His-SUMO tag.
Purification proceeded similarly for the three proteins. Pooled fractions were diluted to 100 mM NaCl in Buffer A (20 mM HEPES, pH 7.5, 100 mM NaCl, 5% glycerol, 25 µM ZnCl2, and 0.5 mM TCEP) and loaded onto a 5-ml HiTrap SP HP column. Proteins were eluted with a 20-column volume gradient with an increasing amount of NaCl (up to 1 M). Peak fractions containing the protein of interest with the smallest amount of other protein impurities were pooled and concentrated to ~2 ml volume and then loaded onto a sizing column (HiLoad 16/60 Superdex S200) equilibrated with 150 mM NaCl in Buffer A. Peak fractions were pooled, and some was immediately concentrated for crystallization trials while the rest was distributed into 1.5 ml Eppendorf tubes, flash frozen in liquid nitrogen and stored at -80 °C for later use.
Crystallography
Oligonucleotides were purchased from Integrated DNA Technologies (IDT) and were annealed in 50 mM NaCl and 10 mM Tris-HCl pH 7.5. All three proteins were concentrated to ~30-35 mg/ml ( ~ 2.6 mM) and the annealed DNA was added at a 1:1.3 molar ratio of protein to DNA such that the final concentration of protein was ~16 mg/ml ( ~ 1.3 mM) and incubated overnight at 4 °C. Crystallization trials using commercially available screens were conducted with an Art Robbins Gryphon Crystallization Robot, which was used to set up sitting drops (0.2 ml sample + 0.2 ml screen solution). Crystallization plates were incubated at ~19 °C. Crystals of ZF4-6 WT and its K784T mutant with oligo #1 (5’-CCT TGA CCA ATA GAT TCA T-3’ and 3’-GGA ACT GGT TAT CTA AGT AA-5’) grew in many conditions over 1–7 days, such as 24% (v/v) ethanol, 0.1 M HEPES pH 7.8, 40 mM magnesium chloride hexahydrate or 18% (w/v) polyethylene glycol (PEG) 3350, 17% (v/v) PEG 400, 4.8% (v/v) 2-propanol, 0.1 M 3-(cyclohexylamino)-2-hydroxy-1-propanesulfonic acid (CAPSO) pH 9.0. ZF4-6 WT with oligo #2 (5’-CCT TGC CCA AAC CCC ACC C-3’ and 3’-GGA ACG GGT TTG GGG TGG GG-5’) grew in only one condition (15% (w/v) PEG 1000, 30% (v/v) PEG 300, 0.1 M Bis-Tris propane pH 7.0) and appeared after several weeks. Crystals were harvested in a nylon or Litho loop and flash frozen in liquid nitrogen after momentary soaking in reservoir solution supplemented with 20% (v/v) ethylene glycol if the original condition did not contain any cryoprotectant.
All birefringent crystals were sent to beamline 17-ID-1 (AMX) at National Synchrotron Light Source (NSLS) II of Brookhaven National Laboratory and placed in a in a 100 K cryostream for screening and if adequate diffraction observed, for subsequent data collection. Diffraction data were processed with the autoPROC toolbox96 utilizing XDS97 for data reduction. Intensities from images representing 210° crystal rotation were merged with POINTLESS/AIMLESS98,99.
BCL11A ZF4-6 and DNA complex structures were determined by molecular replacement in the PHENIX PHASER module100 utilized PDB 6U9Q58 as the initial model. After which, as additional electron densities that have the shape of ZFs were observed, second or third ZF molecules were modeled. All structure refinements were performed by PHENIX Refine101,102. Randomly chosen reflections (5%) were used for validation by R-free values103,104, with careful manual (re)building using COOT105,106 within refinement cycles. During each round of PHENIX refinement, structure quality was analyzed, and validation was performed using the PDB validation server107 (Supplementary Table 2). PyMol (Schrödinger, LLC) was used for generating molecular graphics.
Alphafold3 modeling
Protein and DNA sequences along the number of protein monomers with the expected number of zinc ions for the ZFs were submitted to the Alphafold3 server (https://golgi.sandbox.google.com)63. Model evaluation proceeded by Alphafold scoring and use of our structural knowledge of ZFs, in particular those of BCLl11A and their potential sequence specific binding to DNA. Molecular graphics of these models were generated using PyMol (Schrödinger, LLC).
Fluorescence polarization-based DNA binding assay
A 40 µL reaction mixture was prepared containing 20 nM FAM-labeled DNA duplex and varying concentrations of BCL11A protein in a buffer consisting of 20 mM Tris-HCl (pH 7.5), 150 mM NaCl, 5% glycerol, and 0.5 mM TCEP. The mixture was incubated on ice for 20 min. Fluorescence polarization was measured using the Synergy Neo2 Multi-mode Reader (BioTek). Data analysis was performed using GraphPad Prism 10, and the binding affinity (KD) was determined using the equation: [mP] = [maximum mP] × [C] / (KD + [C]) + [baseline mP] where milli-polarization [mP] is the measured signal, and [C] represents the protein concentration.
Electrophoretic mobility shift assay (EMSA)
Following the fluorescence polarization assay, 10 µL of each reaction sample was subjected to electrophoresis on an 8% native polyacrylamide gel at a constant voltage of 150 V in ice-cold 0.5× TBE buffer for 25 min. The gel was imaged using a Typhoon 9410 imager (GE Healthcare).
Isothermal titration calorimetry (ITC)
Stored fractions of purified BCL11A ZF4-6 fragment were slowly thawed on ice. Double-stranded oligonucleotides (oligo #1 and #2) were annealed in 10 mM Tris-HCl pH 7.5 and 50 mM NaCl. The concentrations of the protein and DNA were confirmed using a ThermoScientific Nanodrop before both were diluted with the same buffer (20 mM HEPES pH 7.5, 150 mM NaCl, 5% glycerol, and 0.5 mM TCEP) to concentrations appropriate for ITC. The ZF4-6 domain (20 µM) were loaded in the sample cell and the annealed DNA (200 µM) was injected via the syringe for experiments at 25 °C using a MicroCal Auto ITC200 (Malvern Corporation). During continuous stirring at a rate of 750 rpm, 18 injections of 2 μL followed an initial injection of 0.2 μL of the DNA into the cell. Each injection lasted for 4 s, and there was a 300 s interval between injections to allow the system to reach equilibrium. The reference power was set to 8 μcal/s. Binding constants (KD) were calculated by fitting the data using the MicroCal PEAQ-ITC Analysis ITC software.
Sequence motif analysis
ChIP sequencing datasets (GSM2771529 and GSM2771530)40 were downloaded from Cistrome 3.0108. CUT&RUN datasets35 were downloaded from GEO (GSE104676; specifically, the GSM2805366, GSM2805367, and GSM2805368 NarrowPeak files). Peak sequences were extracted using the Fasta and SeqIO modules from Pyfaidx and Biopython, respectively. Human genome sequence Hg38 was used for ChIP-seq and Hg19 was used for CUT&RUN, in accordance with the original published analyses. TGNCCN sequence, or its complementary sequence NGGNCA, were searched to derive motif counts. Outliers were identified and removed from the datasets using the interquartile range (IQR) method. Differences between expected and observed motif counts were quantified using Mann Whitney U test. Expected motif counts were calculated as 1/128, based on the chance of TGNCCN motifs occurring randomly in the genome. Motif counts were normalized by the length of the TGNCCN motifs subtracted from the peak width. Spearman correlation analysis was used to assess correlations.
Statistics and reproducibility
X-ray crystallographic data were collected quantitatively and processed. Structure refinements were carried out with PHENIX Refine, using 5% randomly selected reflections for cross-validation by R-free values. Supplementary Table 2 provides statistics of data collection and refinement. Structure quality was assessed throughout refinements and validated using the PDB validation server. Statistics details for binding experiments are provided in the legends of Fig. 3 and Supplementary Fig. 2.
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
Data availability
The coordinates and structure factor files of the X-ray structures of the BCL11A ZF4-6 in complexes with DNA have been deposited to PDB and released under accession numbers 9E6R for ZF4-6 and oligo #1, 9E6S for K784T and oligo #1, and 9E6T for ZF4-6 and oligo #2. Source data are provided with this paper.
References
Latchman, D. S. Transcription factors: an overview. Int J. Exp. Pathol. 74, 417–422 (1993).
Wolberger, C. How structural biology transformed studies of transcription regulation. J. Biol. Chem. 296, 100741 (2021).
Boumpas, P., Merabet, S. & Carnesecchi, J. Integrating transcription and splicing into cell fate: transcription factors on the block. Wiley Interdiscip. Rev. RNA 14, e1752 (2023).
Vaquerizas, J. M., Kummerfeld, S. K., Teichmann, S. A. & Luscombe, N. M. A census of human transcription factors: function, expression and evolution. Nat. Rev. Genet 10, 252–263 (2009).
Lambert, S. A. et al. The human transcription factors. Cell 172, 650–665 (2018).
Amoutzias, G. D., Robertson, D. L., Van de Peer, Y. & Oliver, S. G. Choose your partners: dimerization in eukaryotic transcription factors. Trends Biochem Sci. 33, 220–229 (2008).
Tsuji, A., Yamashita, H., Hisatomi, O. & Abe, M. Dimerization processes for light-regulated transcription factor photozipper visualized by high-speed atomic force microscopy. Sci. Rep. 12, 12903 (2022).
Karin, M., Liu, Z. & Zandi, E. AP-1 function and regulation. Curr. Opin. Cell Biol. 9, 240–246 (1997).
Blackwood, E. M. & Eisenman, R. N. Max: a helix-loop-helix zipper protein that forms a sequence-specific DNA-binding complex with Myc. Science 251, 1211–1217 (1991).
Oeckinghaus, A. & Ghosh, S. The NF-kappaB family of transcription factors and its regulation. Cold Spring Harb. Perspect. Biol. 1, a000034 (2009).
Gencel-Augusto, J. & Lozano, G. p53 tetramerization: at the center of the dominant-negative effect of mutant p53. Genes Dev. 34, 1128–1146 (2020).
Huntley, S. et al. A comprehensive catalog of human KRAB-associated zinc finger genes: insights into the evolutionary history of a large family of transcriptional repressors. Genome Res. 16, 669–677 (2006).
Stogios, P. J., Downs, G. S., Jauhal, J. J., Nandra, S. K. & Prive, G. G. Sequence and structural analysis of BTB domain proteins. Genome Biol. 6, R82 (2005).
Williams, A. J., Khachigian, L. M., Shows, T. & Collins, T. Isolation and characterization of a novel zinc-finger protein with transcription repressor activity. J. Biol. Chem. 270, 22143–22152 (1995).
Thomas, J. H. & Emerson, R. O. Evolution of C2H2-zinc finger genes revisited. BMC Evol. Biol. 9, 51 (2009).
Siggs, O. M. & Beutler, B. The BTB-ZF transcription factors. Cell Cycle 11, 3358–3369 (2012).
Sander, T. L. et al. The SCAN domain defines a large family of zinc finger transcription factors. Gene 310, 29–38 (2003).
Edelstein, L. C. & Collins, T. The SCAN domain family of zinc finger transcription factors. Gene 359, 1–17 (2005).
Bonchuk, A., Balagurov, K. & Georgiev, P. BTB domains: a structural view of evolution, multimerization, and protein-protein interactions. BioEssays: N. Rev. Mol., Cell. Develop. Biol. 45, e2200179 (2023).
Mance, L. et al. Dynamic BTB-domain filaments promote clustering of ZBTB proteins. Mol. Cell 84, 2490–2510 e2499 (2024).
Park, P. M. C. et al. Polymerization of ZBTB transcription factors regulates chromatin occupancy. Mol. Cell 84, 2511–2524 e2518 (2024).
Orkin, S. H. & Bauer, D. E. Emerging genetic therapy for sickle cell disease. Annu Rev. Med. 70, 257–271 (2019).
Lan, X. et al. ZNF410 Uniquely activates the NuRD component CHD4 to silence fetal hemoglobin expression. Mol. Cell 81, 239–254 e238 (2021).
Vinjamur, D. S. et al. ZNF410 represses fetal globin by singular control of CHD4. Nat. Genet 53, 719–728 (2021).
Sankaran, V. G. et al. Human fetal hemoglobin expression is regulated by the developmental stage-specific repressor BCL11A. Science 322, 1839–1842 (2008).
Masuda, T. et al. Transcription factors LRF and BCL11A independently repress expression of fetal hemoglobin. Science 351, 285–289 (2016).
Borg, J. et al. Haploinsufficiency for the erythroid transcription factor KLF1 causes hereditary persistence of fetal hemoglobin. Nat. Genet 42, 801–805 (2010).
Zhou, D., Liu, K., Sun, C. W., Pawlik, K. M. & Townes, T. M. KLF1 regulates BCL11A expression and gamma- to beta-globin gene switching. Nat. Genet 42, 742–744 (2010).
Zhang, X., Xia, F., Zhang, X., Blumenthal, R. M. & Cheng, X. C2H2 Zinc finger transcription factors associated with hemoglobinopathies. J. Mol. Biol. 436, 168343 (2024).
Dinkova-Kostova, A. T. & Copple, I. M. Advances and challenges in therapeutic targeting of NRF2. Trends Pharm. Sci. 44, 137–149 (2023).
Bonazzi, S. et al. Discovery and characterization of a selective IKZF2 glue degrader for cancer immunotherapy. Cell Chem. Biol. 30, 235–247 e212 (2023).
Li, J., Lv, A., Chen, M., Xu, L. & Huang, H. Activating transcription factor 4 in erythroid development and beta -thalassemia: a powerful regulator with therapeutic potential. Ann. Hematol. 103, 2659–2670 (2024).
Pant, T., Uche, N., Juric, M., Zielonka, J. & Bai, X. Regulation of immunomodulatory networks by Nrf2-activation in immune cells: Redox control and therapeutic potential in inflammatory diseases. Redox Biol. 70, 103077 (2024).
Zhang, X. et al. ATF family members as therapeutic targets in cancer: from mechanisms to pharmacological interventions. Pharmacol. Res. 208, 107355 (2024).
Liu, N. et al. Direct promoter repression by BCL11A controls the fetal to adult hemoglobin switch. Cell 173, 430–442 e417 (2018).
Zha, J. et al. Identification of novel BCL11A variant in a patient with developmental delay and behavioural differences. Int. J. Dev. Neurosci. 84, 727–73 (2024)
Shen, Y. et al. Pathogenic BCL11A variants provide insights into the mechanisms of human fetal hemoglobin silencing. PLoS Genet 17, e1009835 (2021).
Dias, C. et al. BCL11A haploinsufficiency causes an intellectual disability syndrome and dysregulates transcription. Am. J. Hum. Genet 99, 253–274 (2016).
Zheng, G. & Orkin, S. H. Transcriptional repressor BCL11A in erythroid cells. Adv. Exp. Med. Biol. 1459, 199–215 (2024).
Martyn, G. E. et al. Natural regulatory mutations elevate the fetal globin gene via disruption of BCL11A or ZBTB7A binding. Nat. Genet 50, 498–503 (2018).
Zhang, X., Blumenthal, R. M. & Cheng, X. Updated understanding of the protein-DNA recognition code used by C2H2 zinc finger proteins. Curr. Opin. Struct. Biol. 87, 102836 (2024).
Yang, Y. et al. Structural basis for human ZBTB7A action at the fetal globin promoter. Cell Rep. 36, 109759 (2021).
Yang, J. et al. Structures of CTCF-DNA complexes including all 11 zinc fingers. Nucleic Acids Res. 51, 8447–8462 (2023).
Persikov, A. V. & Singh, M. De novo prediction of DNA-binding specificities for Cys2His2 zinc finger proteins. Nucleic Acids Res. 42, 97–108 (2014).
Zheng, G. et al. A tetramer of BCL11A is required for stable protein production and fetal hemoglobin silencing. Science 386, 1010–1018 (2024).
Kurita, R. et al. Establishment of immortalized human erythroid progenitor cell lines able to produce enucleated red blood cells. PLoS ONE 8, e59890 (2013).
Orkin, S. H. & Higgs, D. R. Medicine. Sickle cell disease at 100 years. Science 329, 291–292 (2010).
Peschle, C. et al. Embryonic-Fetal Hb switch in humans: studies on erythroid bursts generated by embryonic progenitors from yolk sac and liver. Proc. Natl Acad. Sci. USA 81, 2416–2420 (1984).
Ivaldi, M. S. et al. Fetal gamma-globin genes are regulated by the BGLT3 long noncoding RNA locus. Blood 132, 1963–1973 (2018).
Takase, S. et al. A specific G9a inhibitor unveils BGLT3 lncRNA as a universal mediator of chemically induced fetal globin gene expression. Nat. Commun. 14, 23 (2023).
Ma, S. P. et al. Long noncoding RNA HBBP1 enhances gamma-globin expression through the ETS transcription factor ELK1. Biochem. Biophys. Res. Commun. 552, 157–163 (2021).
Huang, P. et al. Comparative analysis of three-dimensional chromosomal architecture identifies a novel fetal hemoglobin regulatory element. Genes Dev. 31, 1704–1713 (2017).
Ma, Y. et al. Genome-wide analysis of pseudogenes reveals HBBP1’s human-specific essentiality in erythropoiesis and implication in beta-thalassemia. Develop. cell 56, 478–493 e411 (2021).
Nuinoon, M. et al. A genome-wide association identified the common genetic variants influence disease severity in beta0-thalassemia/hemoglobin E. Hum. Genet 127, 303–314 (2010).
Giannopoulou, E. et al. A single nucleotide polymorphism in the HBBP1 gene in the human beta-globin locus is associated with a mild beta-thalassemia disease phenotype. Hemoglobin 36, 433–445 (2012).
Galarneau, G. et al. Fine-mapping at three loci known to affect fetal hemoglobin levels explains additional genetic variation. Nat. Genet 42, 1049–1051 (2010).
Yang, Y. et al. Structural insights into the recognition of gamma-globin gene promoter by BCL11A. Cell Res. 29, 960–963 (2019).
Viennet, T. et al. Structural insights into the DNA-binding mechanism of BCL11A: the integral role of ZnF6. Structure. 19, 2024.01.17.576058 (2024)
Luscombe, N. M., Laskowski, R. A. & Thornton, J. M. Amino acid-base interactions: a three-dimensional analysis of protein-DNA interactions at an atomic level. Nucleic Acids Res. 29, 2860–2874 (2001).
Horton, J. R., Liebert, K., Hattman, S., Jeltsch, A. & Cheng, X. Transition from nonspecific to specific DNA interactions along the substrate-recognition pathway of dam methyltransferase. Cell 121, 349–361 (2005).
Kalodimos, C. G. et al. Structure and flexibility adaptation in nonspecific and specific protein-DNA complexes. Science 305, 386–389 (2004).
Patel, A. et al. DNA conformation induces adaptable binding by tandem zinc finger proteins. Cell 173, 221–233 e212 (2018).
Abramson, J. et al. Accurate structure prediction of biomolecular interactions with AlphaFold 3. Nature 630, 493–500 (2024).
Kaur, G. et al. Allosteric autoregulation of DNA binding via a DNA-mimicking protein domain: a biophysical study of ZNF410-DNA interaction using small angle X-ray scattering. Nucleic Acids Res. 51, 1674–1686 (2023).
Grabarczyk, P. et al. The N-terminal CCHC zinc finger motif mediates homodimerization of transcription factor BCL11B. Mol. Cell. Biol. 38, e00368–17 (2018)
Okuyama, K. et al. A mutant BCL11B-N440K protein interferes with BCL11A function during T lymphocyte and neuronal development. Nat. Immunol. 25, 2284–2296 (2024).
Choo, Y. & Klug, A. Physical basis of a protein-DNA recognition code. Curr. Opin. Struct. Biol. 7, 117–125 (1997).
Wienert, B. et al. KLF1 drives the expression of fetal hemoglobin in British HPFH. Blood 130, 803–807 (2017).
Martyn, G. E. et al. A natural regulatory mutation in the proximal promoter elevates fetal globin expression by creating a de novo GATA1 site. Blood 133, 852–856 (2019).
Peron, A. et al. BCL11A intellectual developmental disorder: defining the clinical spectrum and genotype-phenotype correlations. Eur. J. Hum. Genet. 33, 312–324 (2024)
Patel, A., Zhang, X., Blumenthal, R. M. & Cheng, X. Structural basis of human PR/SET domain 9 (PRDM9) allele C-specific recognition of its cognate DNA sequence. J. Biol. Chem. 292, 15994–16002 (2017).
Paillard, G., Deremble, C. & Lavery, R. Looking into DNA recognition: zinc finger binding specificity. Nucleic Acids Res. 32, 6673–6682 (2004).
Mitra, R. et al. Geometric deep learning of protein-DNA binding specificity. Nat. Methods 21, 1674–1683 (2024).
Liu, Y., Zhang, X., Blumenthal, R. M. & Cheng, X. A common mode of recognition for methylated CpG. Trends Biochem. Sci. 38, 177–183 (2013).
Zhang, X., Blumenthal, R. M. & Cheng, X. DNA-binding proteins from MBD through ZF to BEN: recognition of cytosine methylation status by one arginine with two conformations. Nucleic Acids Res. 52, 11442–11454 (2024)
Nolte, R. T., Conlin, R. M., Harrison, S. C. & Brown, R. S. Differing roles for zinc fingers in DNA recognition: structure of a six-finger transcription factor IIIA complex. Proc. Natl Acad. Sci. USA 95, 2938–2943 (1998).
Huang, P. et al. HIC2 controls developmental hemoglobin switching by repressing BCL11A transcription. Nat. Genet 54, 1417–1426 (2022).
Hashimoto, H. et al. Structural basis for the versatile and methylation-dependent binding of CTCF to DNA. Mol. Cell 66, 711–720 e713 (2017).
Yin, M. et al. Molecular mechanism of directional CTCF recognition of a diverse range of genomic sites. Cell Res. 27, 1365–1377 (2017).
Liu, H. et al. Functional studies of BCL11A: characterization of the conserved BCL11A-XL splice variant and its interaction with BCL6 in nuclear paraspeckles of germinal center B cells. Mol. Cancer 5, 18 (2006).
Bellemare, J., Rouleau, M., Harvey, M. & Guillemette, C. Modulation of the human glucuronosyltransferase UGT1A pathway by splice isoform polypeptides is mediated through protein-protein interactions. J. Biol. Chem. 285, 3600–3607 (2010).
Avram, D. et al. Isolation of a novel family of C(2)H(2) zinc finger proteins implicated in transcriptional repression mediated by chicken ovalbumin upstream promoter transcription factor (COUP-TF) orphan nuclear receptors. J. Biol. Chem. 275, 10315–10322 (2000).
Kuo, T. Y., Hong, C. J., Chien, H. L. & Hsueh, Y. P. X-linked mental retardation gene CASK interacts with Bcl11A/CTIP1 and regulates axon branching and outgrowth. J. Neurosci. Res. 88, 2364–2373 (2010).
Avram, D., Fields, A., Senawong, T., Topark-Ngarm, A. & Leid, M. COUP-TF (chicken ovalbumin upstream promoter transcription factor)-interacting protein 1 (CTIP1) is a sequence-specific DNA binding protein. Biochem. J. 368, 555–563 (2002).
Wiegreffe, C. et al. Bcl11a (Ctip1) controls migration of cortical projection neurons through regulation of Sema3c. Neuron 87, 311–325 (2015).
Senawong, T. et al. Involvement of the histone deacetylase SIRT1 in chicken ovalbumin upstream promoter transcription factor (COUP-TF)-interacting protein 2-mediated transcriptional repression. J. Biol. Chem. 278, 43041–43050 (2003).
Senawong, T., Peterson, V. J. & Leid, M. BCL11A-dependent recruitment of SIRT1 to a promoter template in mammalian cells results in histone deacetylation and transcriptional repression. Arch. Biochem. Biophys. 434, 316–325 (2005).
Punwani, D. et al. Multisystem anomalies in severe combined immunodeficiency with mutant BCL11B. N. Engl. J. Med. 375, 2165–2176 (2016).
Wang, J., Malecka, A., Troen, G. & Delabie, J. Comprehensive genome-wide transcription factor analysis reveals that a combination of high affinity and low affinity DNA binding is needed for human gene regulation. BMC Genom. 16, S12 (2015).
Farley, E. K. et al. Suboptimization of developmental enhancers. Science 350, 325–328 (2015).
Farley, E. K., Olson, K. M., Zhang, W., Rokhsar, D. S. & Levine, M. S. Syntax compensates for poor binding sites to encode tissue specificity of developmental enhancers. Proc. Natl Acad. Sci. USA 113, 6508–6513 (2016).
Delker, R. K., Ranade, V., Loker, R., Voutev, R. & Mann, R. S. Low affinity binding sites in an activating CRM mediate negative autoregulation of the Drosophila Hox gene Ultrabithorax. PLoS Genet. 15, e1008444 (2019).
Kribelbauer, J. F., Rastogi, C., Bussemaker, H. J. & Mann, R. S. Low-affinity binding sites and the transcription factor specificity paradox in eukaryotes. Annu. Rev. Cell Develop. Biol. 35, 357–379 (2019).
Terrell, J. R. et al. DNA selection by the master transcription factor PU.1. Cell Rep. 42, 112671 (2023).
Lan, F. et al. Recognition of unmethylated histone H3 lysine 4 links BHC80 to LSD1-mediated gene repression. Nature 448, 718–722 (2007).
Vonrhein, C. et al. Data processing and analysis with the autoPROC toolbox. Acta Crystallogr. D. Biol. Crystallogr. 67, 293–302 (2011).
Kabsch, W. Xds. Acta Crystallogr. D. Biol. Crystallog.r 66, 125–132 (2010).
Evans, P. Scaling and assessment of data quality. Acta Crystallogr. D. Biol. Crystallogr. 62, 72–82 (2006).
Evans, P. R. & Murshudov, G. N. How good are my data and what is the resolution? Acta Crystallogr. D. Biol. Crystallogr. 69, 1204–1214 (2013).
McCoy, A. J. et al. Phaser crystallographic software. J. Appl. Crystallogr. 40, 658–674 (2007).
Headd, J. J. et al. Use of knowledge-based restraints in phenix.refine to improve macromolecular refinement at low resolution. Acta Crystallogr. D. Biol. Crystallogr. 68, 381–390 (2012).
Afonine, P. V. et al. Towards automated crystallographic structure refinement with phenix.refine. Acta Crystallogr. D. Biol. Crystallogr. 68, 352–367 (2012).
Brunger, A. T. Free R value: cross-validation in crystallography. Methods Enzymol. 277, 366–396 (1997).
Brunger, A. T. Free R value: a novel statistical quantity for assessing the accuracy of crystal structures. Nature 355, 472–475 (1992).
Emsley, P., Lohkamp, B., Scott, W. G. & Cowtan, K. Features and development of Coot. Acta Crystallogr. D. Biol. Crystallogr. 66, 486–501 (2010).
Emsley, P. & Cowtan, K. Coot: model-building tools for molecular graphics. Acta Crystallogr. D. Biol. Crystallogr. 60, 2126–2132 (2004).
Read, R. J. et al. A new generation of crystallographic validation tools for the protein data bank. Structure 19, 1395–1412 (2011).
Taing, L. et al. Cistrome data browser: integrated search, analysis and visualization of chromatin data. Nucleic Acids Res. 52, D61–D66 (2024).
Acknowledgements
The work is supported by the grants R35GM134744 (to X.C.) and R01DK132286 (to Y.H. and X.C.) from the U.S. National Institutes of Health (NIH). M.Y. is supported by a training fellowship from the Gulf Coast Consortia on the National Library of Medicine Training Program in Biomedical Informatics and Data Science (T15LM007093). M.T. is an undergraduate at University of Houston and R.R.A. is an undergraduate at University of Texas at Austin, both were supported by the Catalyst Summer Research Training Program. M.T. was supported by a generous gift from H-E-B directed to the Partnership for Careers in Cancer Science and Medicine Program, Division of Education & Training. R.R.A. was supported by the MD Anderson Summer Program in Cancer Research (R25CA181004). Xiaotian Z. is supported by NIH grant R01HL170115. X.C. is a CPRIT Scholar in Cancer Research. We thank Dr. Xiangpeng Kong (New York University) for assistance of access to 17-ID-1 beamtime and the beamline scientists of the National Synchrotron Light Source (NSLS) II, Brookhaven National Laboratory. The use of NSLS II resources 17-ID-1 was supported by the U.S. Department of Energy under contract DE-SC0012704.
Author information
Authors and Affiliations
Contributions
J.R.H. performed protein purification, crystallization, and X-ray diffraction experiments. M.Y., with the help of Y.L., analyzed published ChIP-seq and CUT&RUN data. M.T. and R.R.A., under the guidance of J.Z., performed FP and EMSA binding assays. Xiaotian Z. and Y.H. provided advice on BCL11A biology and participated in discussion. R.M.B. participated in discussions, sequence analysis, and in preparing the manuscript. Xing Z. and X.C. designed and organized the scale of the study.
Corresponding author
Ethics declarations
Competing interests
The authors declare no competing interest.
Peer review
Peer review information
Nature Communications thanks Andrew Perkins, Dong Wang and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. A peer review file is available.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Source data
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Horton, J.R., Yu, M., Zhou, J. et al. Multimeric transcription factor BCL11A utilizes two zinc-finger tandem arrays to bind clustered short sequence motifs. Nat Commun 16, 3672 (2025). https://doi.org/10.1038/s41467-025-58998-7
Received:
Accepted:
Published:
Version of record:
DOI: https://doi.org/10.1038/s41467-025-58998-7










