Structural basis for genome-wide site-specific DNA recognition by Nuclear Factor IA

Zhu, Ci; Xiao, Ding; Luo, Zhipu; Zhang, Jie; Liu, Shuang; Wang, Yueyu; Chen, Xi; Xiao, Hejin; Li, Xiaofeng; Tang, Juyu; Fang, Xianyang; Shen, Jie; Song, He

doi:10.1038/s41467-025-67641-4

Article
Open access
Published: 15 December 2025

Ci Zhu¹^na1,
Ding Xiao ORCID: orcid.org/0009-0009-9664-4815^2,3^na1,
Zhipu Luo ORCID: orcid.org/0000-0003-0685-0754⁴^na1,
Jie Zhang^5,6,
Shuang Liu⁷,
Yueyu Wang¹,
Xi Chen¹,
Hejin Xiao⁸,
Xiaofeng Li⁷,
Juyu Tang^2,3,
Xianyang Fang ORCID: orcid.org/0000-0001-9432-9736⁶,
Jie Shen ORCID: orcid.org/0000-0003-4131-1211⁹ &
He Song¹

Nature Communications volume 17, Article number: 917 (2026)

Abstract

Nuclear Factor IA, a member of the long-studied Nuclear Factor I family of DNA-binding proteins, plays pivotal roles in development and metabolism. Dysregulation or loss of Nuclear Factor IA is associated with severe neurological defects in humans and disruptions in fatty acid metabolism linked to conditions such as osteoarthritis. Despite extensive study, the DNA recognition mechanism of Nuclear Factor I family proteins remains unresolved. Previous studies have proposed that these proteins dimerize via their DNA-binding domains to bind TGGCA-containing dyad sequences. In this study, we demonstrate that both full-length Nuclear Factor IA and its isolated DNA-binding domain are monomeric in solution, challenging dimer models. Genome-wide ChIP-Seq analysis shows TGGCA half-sites are enriched among Nuclear Factor IA binding motifs. We determine Nuclear Factor IA’s crystal and solution structures bound to half-site and dyad-symmetric DNA motifs, providing a structural basis for its monomeric DNA recognition. Furthermore, functional binding assays show that key residues in Nuclear Factor IA, which facilitate base-specific interactions, are critical for DNA sequence recognition and binding. These findings establish the DNA-binding mechanism of Nuclear Factor IA and provide a detailed molecular framework for understanding the functions of this classic transcription factor family.

Introduction

The Nuclear Factor I (NFI) family comprises site-specific DNA-binding transcription factors (TFs) that are critical for regulating gene expression and viral DNA replication^1,2,3. Initially identified as essential activators of the adenovirus origin of replication, NFI proteins were later shown to bind promoter and enhancer regions of diverse genes, regulating transcription in both cellular and viral systems^4,5,6,7. In vertebrates, the NFI family comprises four members—NFIA, NFIB, NFIC, and NFIX—each exhibiting distinct, though sometimes overlapping, expression patterns during embryogenesis and in adult stem cells^8,9,10. These proteins are involved in various developmental and regulatory pathways, including cytokine-mediated differentiation and cerebellum development, underscoring their broad functional versatility^11,12,13. Dysregulation of NFI proteins has been associated with significant changes in gene expression, leading to diverse physiological and pathological consequences, including tumor development in diverse cancers^14,15.

The NFI family member NFIA has emerged as a critical regulator of both neural development and metabolism. In conjunction with NFIB, NFIA is indispensable for proper brain development and central nervous system function^16,17,18. Nfia-knockout mice exhibit severe neurological abnormalities, including corpus callosum dysgenesis, hippocampal malformations, and enlarged ventricles^19,20,21. Consistent with these findings, mutations or deletions of NFIA in humans are associated with intellectual disabilities and structural brain defects^22,23,24,25. Beyond its well-established role in neural development, our recent work identified elevated NFIA expression in both human and murine osteoarthritic (OA) articular chondrocytes²⁶. In this context, NFIA acts as a key transcription factor that upregulates genes involved in fatty acid metabolism, including the rate-limiting enzymes ACACA and CPT2. This transcriptional activation enhances fatty acid metabolism, leading to disrupted cellular homeostasis in OA articular chondrocytes. Notably, NFIA inhibition reversed these effects by suppressing metabolic enzyme overexpression, which subsequently alleviated cartilage destruction, joint pain responses, and disease progression in the murine OA model. These findings have motivated us to pursue the structural characterization of NFIA as a foundation for developing mechanism-based therapies for OA, particularly given the current lack of disease-modifying treatments^27,28,29.

In terms of domain organization, NFIA comprises a well-defined N-terminal DNA-binding domain (DBD), which mediates sequence-specific DNA recognition, and a proline-rich C-terminal region that remains largely unstructured and variable^1,15. The C-terminal domain is subject to alternative splicing and post-translational modifications, contributing to its role in transcriptional activation or repression³⁰. The DBD is highly conserved across the NFI protein family yet shares no significant sequence homology with other DNA-binding domains^31,32. Previous studies suggested that NFI proteins dimerize via their DBDs and thereby recognize dyad-symmetric sequences, such as TGGCA(N3)TGCCA^4,32,33,34,35. Meanwhile, half-sites containing the TGGCA motif have also been reported as sufficient for DNA binding and transcriptional activation, implying that DNA recognition might be more flexible than previously anticipated^32,34,36. Despite these findings, NFI proteins, including the isolated DBD, have not previously been purified to the level required for rigorous biophysical analysis, leaving the molecular mechanism underlying their DNA recognition unresolved.

To address these gaps, we expressed and purified full-length NFIA and its isolated DBD, facilitating a comprehensive analysis of their oligomerization and DNA-binding properties. Remarkably, structural analysis of the NFIA_DBD revealed that it adopts a monomeric state in solution, in contrast to the previously proposed dimeric model. Genome-wide ChIP-Seq analysis revealed that TGGCA half-sites are enriched within NFIA peak regions, while symmetric TGGCA(N3)TGCCA full-sites are also bound by NFIA. Functional binding assays showed comparable binding affinities for both motifs. To further elucidate the structural basis of DNA recognition, we determined the crystal structures of NFIA_DBD in complex with DNA containing these motifs. Our results validate the monomeric DNA-binding mode indicated by our biophysical analyses and provide detailed molecular mechanistic insights into sequence-specific recognition by NFIA. Since this family shows high sequence conservation, our data may fundamentally define the structural paradigm for NFI-DNA interactions.

Results

Characterization of NFIA and its DBD in solution

Previous studies have proposed that NFI proteins form a dimer via their double-stranded DNA-binding domains that facilitate DNA recognition and binding^33,34,37,38. However, these conclusions were largely derived from analyses conducted in cell lysates or using crosslinking techniques, both of which may overlook other DNA-binding configurations. In cell lysates, unpurified NFI proteins can interact with other cellular components, such as proteins, nucleic acids, or cofactors, potentially creating artifacts that complicate accurate determination of the oligomeric state of a protein. Yet, detailed analyses of purified NFI family proteins or their isolated DBDs are limited, leaving a gap in the reliable characterization of their oligomerization properties and allowing the dimer theory to persist.

To address these limitations, we systematically expressed and purified both full-length human NFIA and its DNA-binding domain (NFIA_DBD, Fig. 1a), enabling rigorous analysis of their oligomeric states under controlled conditions. Optimized expression systems yielded full-length NFIA from HEK293 cells and NFIA_DBD from E. coli in sufficient quantities for downstream biophysical analysis. Both constructs included an N-terminal His₆–MBP tag to facilitate initial purification via nickel-affinity chromatography, with a TEV protease cleavage site introduced to minimize interference from the tag. After TEV cleavage, size-exclusion chromatography (SEC) was employed for further purification. The SEC profiles for both NFIA and NFIA_DBD revealed single, symmetric Gaussian peaks consistent with well-folded, monodisperse proteins, and elution volumes corresponding to monomeric molecular weights (Fig. 1b). Non-denaturing gel electrophoresis further confirmed that both NFIA and NFIA_DBD exist as monomers in solution (Fig. 1c). These results do not align with the previously proposed dimeric model and demonstrate that neither full-length NFIA nor its DBD undergoes dimerization under the tested conditions.

**Fig. 1: Characterization of full-length NFIA and its isolated DNA-binding domain.**

The structure of NFIA_DBD

Given that the NFIA DNA-binding domain is highly conserved and structurally ordered, whereas the remainder of the protein is largely unstructured, we focused on the NFIA_DBD for structural characterization. We successfully crystallized NFIA_DBD and determined its structure at a resolution of 3.2 Å (Supplementary Table 1). The crystal exhibits P4₁2₁2 symmetry, with the asymmetric unit containing three polypeptide chains of NFIA, encompassing residues 13–173. Each NFIA_DBD molecule forms a single domain characterized by a distinct fold comprising six α-helices and two short, two-stranded β-sheets positioned on one side of the domain (Fig. 1d). The β-sheet region coordinates a zinc ion via three cysteines and one histidine, a feature reminiscent of the SMAD MH1 DNA-binding domain (Supplementary Fig. 1). However, the overall architecture of NFIA_DBD diverges significantly from the MH1 domain, particularly in its extended α-helical region, defining a distinct structural framework for the NFI DNA-binding domain family.

Despite the presence of three NFIA_DBD molecules in the asymmetric unit, SEC analysis confirmed that NFIA_DBD exists as a monomer in solution. To further characterize the solution conformation, we utilized small-angle X-ray scattering (SAXS). The scattering intensity I(q) versus momentum transfer q and the pair-distance distribution function (PDDF) (Fig. 1e) indicated a molecular mass of 24.8 kDa (Supplementary Table 2), consistent with the monomeric state observed in the crystal structure. The maximum dimension (D_max) of 55 ± 3 Å aligns closely with the longest dimension observed in the crystal structure (53 Å). Moreover, the ab initio shape calculated from the SAXS data matched the monomeric crystal structure precisely (Fig. 1f). These findings validate the solution structure of NFIA_DBD and corroborate the architecture observed in the crystal structure, providing a consistent view of its structural properties using distinct methodologies.

Genome-wide NFIA DNA-binding specificity

To elucidate the genome-wide DNA-binding specificity of NFIA and gain further insight into its mechanistic binding patterns, we analyzed its preferred DNA motifs. Chromatin immunoprecipitation sequencing (ChIP-Seq) data for NFIA were processed using the HOMER algorithm to identify enriched sequence motifs³⁹. Mapping of NFIA-bound peaks revealed significant enrichment in promoter regions (Supplementary Fig. 2a), consistent with its function as a transcription factor. Among these binding peaks, TGGCA-containing motifs emerged as the predominant binding motif, present in 37.4% of NFIA-bound genomic regions, with similar counts observed when using 10- or 18-nt motif windows (Fig. 2a and Supplementary Fig. 2c). This result is consistent with the sequence motif for NFIA reported in the JASPAR database and corresponds to motifs identified in genes associated with fatty acid metabolism^26,40. Additionally, the dyad-symmetric consensus sequence TGGCA(N3)TGCCA, which was not previously reported in JASPAR, was identified in 6.4% of NFIA-bound regions in our analysis (Fig. 2a and Supplementary Fig. 2b). Notably, TGGCA-containing regions showed greater enrichment in genes associated with cellular metabolism and developmental processes (Supplementary Fig. 2d), underscoring the critical role of NFIA in these biological pathways. These findings provide insights into the cellular DNA-binding specificity of NFIA, expanding our understanding of NFIA beyond the traditionally recognized dyadic site.
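The distinction between half-site and full-site regions can be illustrated with a minimal Python sketch. It is not HOMER's algorithm and does not reproduce the reported percentages; it only shows how a peak sequence could be labelled by exact string matching on both strands. The function names and the mutually exclusive categories are assumptions made for illustration.

```python
import re

# Half-site and its reverse complement (TGGCA read from the opposite strand).
HALF_SITE_FWD = "TGGCA"
HALF_SITE_REV = "TGCCA"

# Dyad-symmetric full site: TGGCA, any three bases, then TGCCA.
# The pattern is its own reverse complement, so scanning one strand suffices.
FULL_SITE = re.compile(r"TGGCA[ACGT]{3}TGCCA")


def classify_region(seq: str) -> str:
    """Label one peak sequence as 'full-site', 'half-site', or 'none'."""
    seq = seq.upper()
    if FULL_SITE.search(seq):
        return "full-site"
    if HALF_SITE_FWD in seq or HALF_SITE_REV in seq:
        return "half-site"
    return "none"


def motif_fractions(regions: list[str]) -> dict[str, float]:
    """Fraction of regions in each (mutually exclusive) category."""
    if not regions:
        raise ValueError("at least one region is required")
    counts = {"full-site": 0, "half-site": 0, "none": 0}
    for seq in regions:
        counts[classify_region(seq)] += 1
    return {label: n / len(regions) for label, n in counts.items()}


if __name__ == "__main__":
    # Hypothetical toy input: the two oligonucleotide sequences used in the
    # binding assays plus two illustrative sequences.
    toy_regions = [
        "AGTTGGCAAGATGCCATC",
        "AGTTGGCAAGTC",
        "GACTTGCCAACT",
        "AGTTTTTTAGTC",
    ]
    print(motif_fractions(toy_regions))
```

Exact matching is stricter than a position weight matrix, so counts from a sketch like this would be expected to differ from de novo motif discovery results.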

**Fig. 2: Identification of NFIA DNA-binding motifs.**

To further investigate NFIA’s DNA-binding mechanism, we employed biolayer interferometry (BLI) to evaluate its DNA-binding properties⁴¹. Based on the two DNA sequences uncovered by our NFIA ChIP-Seq data, DNA oligonucleotides were synthesized for functional binding assays: a 12-bp DNA-S containing the TGGCA half-site motif and an 18-bp DNA-L, incorporating the dyad-symmetric TGGCA(N3)TGCCA full-site motif (Fig. 2b). Both substrates were chemically synthesized with a biotin tag at the 5′ end of the forward strand to enable immobilization on an Octet SA biosensor for BLI-based analysis. Upon exposure to NFIA during the association phase, a significant wavelength shift was observed for both DNA-S and DNA-L, even at low protein concentrations, indicating strong and specific binding (Fig. 2c, e). BLI analysis revealed a 1:1 binding stoichiometry for both substrates, with dissociation constants (K_d) of 20.4 nM (R² = 0.9994) for DNA-S and 27.4 nM (R² = 0.9986) for DNA-L, demonstrating similar binding affinities. Additional assays with the isolated NFIA_DBD yielded comparable binding profiles and dissociation constants (Fig. 2d, f, Supplementary Table 4), and independent measurements using microscale thermophoresis (MST) produced similar K_d values for both DNA-S and DNA-L (Supplementary Fig. 3), further validating the robustness of the affinity measurements and confirming that the DNA-binding domain alone is sufficient for sequence-specific DNA recognition. Importantly, substitution of the TGGCA motif in DNA-S with either a polyT stretch or an unrelated CAGAC sequence completely abolished binding (Supplementary Fig. 4), supporting the sequence specificity of NFIA-DNA interactions. Collectively, these results reveal that NFIA binds DNA containing either the half-site or full-site motif with similar affinity in a sequence-specific manner.
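To put the two dissociation constants in perspective, the following Python sketch evaluates a simple 1:1 binding isotherm and the corresponding free-energy difference. It assumes that free protein is in excess over immobilized DNA and uses the 30 °C assay temperature stated in the Methods; the function names and the chosen protein concentrations are illustrative only and are not taken from the fitting software.

```python
from math import log

R_KCAL = 1.987204e-3   # gas constant, kcal mol^-1 K^-1
TEMP_K = 303.15        # 30 degrees C, the BLI assay temperature

KD_DNA_S_NM = 20.4     # reported K_d for DNA-S (nM)
KD_DNA_L_NM = 27.4     # reported K_d for DNA-L (nM)


def fraction_bound(protein_nM: float, kd_nM: float) -> float:
    """Equilibrium occupancy of a 1:1 site, assuming protein is in excess."""
    return protein_nM / (kd_nM + protein_nM)


def delta_delta_g(kd_a_nM: float, kd_b_nM: float, temp_k: float = TEMP_K) -> float:
    """Binding free-energy difference (kcal/mol) between ligands b and a."""
    return R_KCAL * temp_k * log(kd_b_nM / kd_a_nM)


if __name__ == "__main__":
    for conc in (5.0, 20.0, 100.0, 500.0):  # illustrative concentrations, nM
        s = fraction_bound(conc, KD_DNA_S_NM)
        l = fraction_bound(conc, KD_DNA_L_NM)
        print(f"{conc:6.1f} nM  DNA-S {s:.2f}  DNA-L {l:.2f}")
    ddg = delta_delta_g(KD_DNA_S_NM, KD_DNA_L_NM)
    print(f"ddG (DNA-L vs DNA-S): {ddg:.2f} kcal/mol")
```

Under these assumptions the free-energy difference between the two substrates is well below 1 kcal/mol, which is one way of expressing that the affinities are similar.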

Structure of NFIA_DBD bound to the TGGCA motif

To elucidate the structural basis of DNA recognition by NFIA, we initially conducted crystallization trials using the NFIA_DBD in the presence of the functional DNA-S (Fig. 3a). We obtained crystals of the protein-DNA complex and determined its structure at a resolution of 2.3 Å (Supplementary Table 1). The structure revealed that the complex consisted of a single NFIA_DBD molecule bound to a DNA-S duplex (NFIA_DBD:DNA-S), consistent with the SEC profile (Fig. 3b). In Fig. 3a, the right panel shows a view perpendicular to the axis of the dsDNA, highlighting the precise fit of the DNA within the NFIA_DBD. The left panel presents a view along the dsDNA axis, illustrating how one NFIA_DBD molecule binds the entire length of one helical turn of B-form DNA with a 1:1 binding stoichiometry. The crystal symmetry is C2, and the asymmetric unit contains one polypeptide chain (residues 13–173), two DNA strands, and one zinc ion. The NFIA_DBD:DNA-S structure demonstrated that the protein recognizes the 12-bp DNA duplex by inserting a loop into the major groove of the DNA, inducing conformational changes within the loop region while leaving the rest of the protein structure largely unaltered (Fig. 3c). The overall structure forms a compact complex, with a buried surface area of 736.2 Å², indicative of specific protein-DNA binding. These observations provide direct structural evidence of the NFIA DNA recognition mechanism and its role in gene regulation.

**Fig. 3: Overall structure of NFIA_DBD in complex with DNA-S.**

To further confirm the formation of the NFIA_DBD:DNA-S complex in solution, we examined it using SAXS. Analysis of the SAXS data revealed a molecular mass of 25.3 kDa, which agrees with the crystal structure composition (26.2 kDa). Additionally, the D_max value of 55 ± 3 Å matched the longest dimension observed in the crystal structure (Supplementary Table 2). To elucidate the solution structure of NFIA_DBD:DNA-S, an ab initio shape envelope was generated using the program DAMMIN. The back-calculated scattering profile for the crystal structure closely aligned with the experimental data (Fig. 3d, χ² = 0.90), and the crystal structure fit remarkably well within this ab initio shape envelope (Fig. 3e). These findings confirm that the crystal structure accurately represents the assembly of the NFIA_DBD:DNA-S complex.

Molecular mechanism of DNA recognition by NFIA

Structural analysis of the NFIA_DBD:DNA-S complex revealed both sequence-specific and backbone interactions with the DNA, providing critical insights into the molecular basis of NFIA function as a transcription factor. The detailed protein-DNA interactions are summarized in Supplementary Table 3. Specifically, the sidechains of R38, K78, K81, Q110, and R121 interact with the phosphate backbone of the DNA (Fig. 4a), contributing to non-specific DNA binding. Sequence-specific recognition is primarily mediated by residues R116, A123, and K125. Among these, the sidechain of R116 forms two base-specific hydrogen bonds with the G6 base of strand 2, while A123’s carbonyl oxygen establishes a hydrogen bond with C8 of the same strand (Fig. 4a). Additionally, K125 plays a pivotal role by forming base-specific hydrogen bonds with G5 and G6 bases of strand 1, and its carbonyl oxygen interacts with C7 on the complementary strand (Fig. 4a). The schematic representation of key interactions is shown in Fig. 4b. These residues collectively create a positively charged surface on NFIA_DBD that engages both strands of the dsDNA in a 1:1 stoichiometric complex (Fig. 4c), targeting the TGGCA base-pairs positioned at the center of the major groove.

**Fig. 4: Sequence-specific recognition of double-stranded DNA by the NFIA_DBD.**

To gain deeper insights into the functional mechanism of DNA binding, we performed site-directed mutagenesis coupled with BLI binding assays. Three key residues mediate DNA recognition: A123 interacts with the DNA through its carbonyl oxygen on the main chain, while R116 and K125 establish base-specific interactions via their sidechains. Accordingly, we introduced individual and combined mutations in R116 and K125 (Supplementary Fig. 5). Mutations in either residue significantly disrupted DNA-binding activity, as evidenced by a substantial reduction in wavelength shift during the association phase compared to wild-type NFIA_DBD (Fig. 4d, Supplementary Fig. 5). Quantitative analysis revealed that the R116A and K125A mutations each completely abolished detectable binding (Fig. 4e, Supplementary Fig. 5, Supplementary Table 4), highlighting the critical roles of these residues in sequence-specific DNA recognition. To further validate the functional significance of this binding mechanism, we performed dual-luciferase reporter assays using the pGL3 vector system, in which the firefly luciferase gene is driven by the native promoters of NFIA target genes ACACA and CPT2²⁶. Overexpression of wild-type NFIA significantly activated reporter expression, whereas the R116A/K125A mutant failed to induce transcriptional activation, consistent with its inability to bind DNA (Supplementary Fig. 6). Together, our structural and functional analyses provide a detailed understanding of the molecular mechanism by which NFIA engages with its target DNA.

Structural basis of TGGCA(N3)TGCCA motif recognition by NFIA

BLI assays demonstrated that NFIA binds both DNA-S and DNA-L with comparable affinity (Supplementary Table 4). To elucidate the mechanism underlying NFIA’s recognition of the dyad-symmetric sequence TGGCA(N3)TGCCA, we crystallized the NFIA_DBD in complex with DNA-L and determined its structure at a resolution of 2.7 Å (Supplementary Table 1). The crystal symmetry is P2₁, with the asymmetric unit containing two NFIA_DBD:DNA-L complexes. As the two complexes were structurally similar, and SEC data confirmed a 1:1 stoichiometry for NFIA_DBD binding to DNA-L, we focused on analyzing a single NFIA_DBD:DNA-L complex (Fig. 5a–c).

**Fig. 5: Crystal structure of NFIA_DBD in complex with DNA-L.**

The detailed interactions between NFIA_DBD and DNA-L are illustrated in Fig. 5b and summarized in Supplementary Table 3. Similar to the interactions observed in the DNA-S complex, residues R38, K78, K81, Q110, R121, and T145 mediate non-specific interactions with the DNA backbone. Sequence-specific recognition of the TGGCA base-pairs is facilitated by residues R116, A123, and K125. While the symmetric TGGCA motif was present within the DNA-L crystal structure, no protein-DNA interactions were observed involving this region (Fig. 5d). The symmetric motif was positioned within the major groove but remained unengaged by NFIA.

To investigate whether NFIA could simultaneously recognize both halves of the dyad-symmetric TGGCA(N3)TGCCA motif, we created a structural model aligning the first TGGCA base-pairs with the symmetric TGGCA motif. Although this alignment suggested the theoretical possibility of two NFIA_DBD molecules binding the same DNA duplex with a three-nucleotide spacer, the structural model revealed significant steric clashes between the two NFIA molecules if bound simultaneously (Fig. 5e). These clashes occur because the two major grooves of the DNA duplex are located on the same side, preventing simultaneous binding of two NFIA molecules (Fig. 5e). These findings are consistent with our functional assays, supporting the conclusion that NFIA binds the TGGCA(N3)TGCCA motif in a monomeric 1:1 binding mode. The structural evidence presented here resolves ambiguities surrounding DNA-binding by NFIA and provides a foundational framework for future investigations into its transcriptional regulatory roles.

Discussion

The molecular characterization of transcription factors is pivotal for elucidating their roles in development, physiological processes, and pathological conditions. Despite being studied for over three decades, the molecular mechanisms underlying DNA binding and transcriptional regulation by the NFI family of TFs remain incompletely understood. Here, we demonstrate that full-length human NFIA and its DBD function as monomers and exhibit specific DNA-binding activity. Through structural analyses of NFIA_DBD bound to DNA substrates containing TGGCA and dyad-symmetric TGGCA(N3)TGCCA recognition motifs, we elucidated the molecular basis of its sequence-specific recognition of the TGGCA motif within the major groove. Our findings establish NFIA as a TGGCA-binding protein. Moreover, the conserved nature of DNA-interacting residues across the NFI family suggests a shared mechanism of DNA recognition (Supplementary Fig. 7). These insights represent a substantial advance in our understanding of the NFI family and address a long-standing gap in their mechanistic characterization.

Our data establish the NFI DBD as a unique type of DNA-binding domain with distinct structural and functional properties. Although the JASPAR database places the NFI DBD into a family similar to the SMAD MH1 domain⁴⁰, our results reveal significant differences between these DBDs. The overall structure of the NFI DBD is very different from MH1 since it contains an extended α-helical region that mediates DNA interactions (Supplementary Fig. 1). Additionally, NFIA_DBD inserts a loop into the DNA major groove for sequence recognition, while MH1 relies on a β-hairpin for DNA binding (Supplementary Fig. 1). Furthermore, the sequence recognized by NFIA_DBD differs significantly from that of MH1 domains. NFI DBD recognizes the TGGCA motif, while MH1 recognizes the CAGAC Smad binding element, further supporting the classification of NFI as a unique DNA-binding domain family⁴². Interestingly, SMAD proteins have also been shown to interact with 5GC-rich sequences⁴³. The reported sequences all contain a central GGC motif within the major groove, alongside other GC-rich elements, aligning with the base-specific contacts observed in our NFIA–DNA complex. Consistent with this structural similarity of the DNAs, we tested a representative sequence from PDB entry 5MEY and found that NFIA binds it with an affinity comparable to its canonical TGGCA motif (Supplementary Fig. 4c), suggesting a broader sequence tolerance centered around GGC recognition.

Two structures of the human NFIX DNA-binding domain have been deposited in the Protein Data Bank (PDB IDs: 7QQD and 7QQE); these entries are not supported by published functional data. The apo structure of NFIX (PDB ID: 7QQE), resolved at 3.5 Å, shows a high degree of similarity to the NFIA structure, with a root mean square deviation (RMSD) of 0.76 Å (Supplementary Fig. 8a). The NFIX structure in complex with dsDNA (5′-TTGGCAGGCAGCCAG-3′) is available at 2.7 Å resolution (PDB ID: 7QQD). However, this complex does not appear to be functionally relevant, as the dsDNA is positioned outside the protein and lacks base-specific interactions (Supplementary Fig. 8b, c). Notably, we did not observe such non-specific or peripheral DNA contacts in any of our NFIA–DNA structures, all of which exhibit clear and reproducible base-specific interactions. These limitations in prior structural data emphasize the importance of our findings, which provide functionally validated structural insights into the interaction of NFIA with DNA.

To validate the importance of the NFIA_DBD-mediated DNA recognition mechanism we uncovered, we employed AlphaFold to predict the structure of NFIA_DBD and its DNA-S complex. AlphaFold accurately predicted the overall structure of NFIA_DBD, except for the DNA-binding loop, likely due to its flexibility in the absence of DNA (Supplementary Fig. 9a). However, AlphaFold failed to predict the NFIA:DNA-S complex accurately, as the predicted protein-DNA interactions did not correspond to our experimental data (Supplementary Fig. 9b, c). This discrepancy indicates that the dsDNA recognition mechanism revealed in our study is not captured by current structure prediction, so the experimentally determined structures provide the reference framework for NFI proteins. This insight highlights the value of experimental validation in understanding protein-DNA interactions.

Our findings suggest an alternative to the previously assumed dimeric DNA-binding model for NFI proteins. The dyad-symmetric TGGCA(N3)TGCCA sequence observed in earlier studies and in our ChIP-Seq analysis suggested that a dimeric mode of binding might exist. However, our structural analyses demonstrated that NFIA cannot simultaneously bind two TGGCA base-pairs separated by a three-nucleotide spacer due to steric hindrance. Instead, the symmetric motif may enhance the DNA recognition efficiency of NFIA or serve as a binding site for this protein in conjunction with cellular cofactors. Future studies are needed to explore the potential roles of cofactors in NFIA-mediated DNA binding. By providing a structural basis for NFIA recognition of DNA containing TGGCA motifs, our work lays the groundwork for drug design and screening efforts targeting NFIA, particularly in therapeutic applications such as osteoarthritis treatment.

Methods

Expression and purification of human NFIA and its DBD

The cDNA encoding full-length human NFIA was synthesized and cloned into the pcDNA 3.1 vector, incorporating an N-terminal His₆-MBP tag followed by a TEV protease cleavage site. Protein was expressed using the Expi293 expression system (Thermo Fisher Scientific). Five days post-transfection, the cells were harvested by centrifugation and stored at −80 °C for subsequent protein purification.

The DNA-binding domain of NFIA (residues 13–175) was cloned into the pMAL-c5X vector with a His₆-MBP tag at the N terminus, followed by a TEV cleavage site. The clone was overexpressed in Escherichia coli strain BL21 (DE3) (Beyotime). Cultures were grown to mid-log phase at 37 °C in Luria-Bertani (LB) medium and induced by the addition of isopropyl β-D-1-thiogalactopyranoside (IPTG) to a final concentration of 1 mM. The cells were incubated for an additional 16 h at 18 °C, harvested by centrifugation, and stored at −80 °C until protein purification.

Both NFIA and its DNA-binding domain (DBD) were purified by the same procedure at room temperature on an ÄKTA chromatography system with prepacked columns (GE Healthcare). Cells were suspended in buffer A [30 mM Tris-HCl (pH 7.5), 200 mM NaCl, 10% (v/v) glycerol, 1 mM TCEP, and 25 mM imidazole], lysed by sonication, and centrifuged at 26,916 × g for 30 min. The supernatant was applied onto a 5-ml HisTrap FF column pre-equilibrated in buffer A. The column was washed with buffer A to baseline, and the fusion protein was eluted with buffer B [30 mM Tris-HCl (pH 7.5), 200 mM NaCl, 10% (v/v) glycerol, 1 mM TCEP, and 300 mM imidazole]. The fusion protein was digested overnight with 0.2 mg ml⁻¹ TEV protease. The digestion solution was buffer-exchanged to buffer A and applied onto a 5-ml HisTrap FF column pre-equilibrated in buffer A. The target protein was collected in the column flow-through and concentrated to 5 ml, which was then applied onto a Superdex 75 gel filtration column pre-equilibrated in buffer C [25 mM Tris-HCl (pH 7.5), 200 mM NaCl, and 1 mM TCEP]. The protein was collected from peak fractions, and its quality was analyzed by SDS gel electrophoresis.

ChIP-sequencing data analysis

NFIA ChIP-Seq data (ENCSR226QQM) were obtained from the ENCODE database. The data were analyzed against the human reference genome assembly (hg38) using the findMotifsGenome.pl and annotatePeaks.pl programs within the HOMER software with default parameters³⁹. Significantly enriched de novo binding motifs were identified, and motif logos were generated for visualization using the WebLogo software⁴⁴. The enrichment p-values reported in Supplementary Fig. 2a were calculated automatically by HOMER’s annotatePeaks.pl script during the peak annotation step. Briefly, HOMER evaluates the statistical significance of peak enrichment across distinct genomic annotations (e.g., promoters, exons, introns, intergenic regions) by comparing the observed number of peaks within each annotation category to the expected number, based on the fraction of the genome occupied by that category. Statistical significance is assessed using a hypergeometric distribution, and the resulting p-values are log-transformed and reported as logP scores³⁹. ChIP-Seq peak sets containing either the TGGCA half-site motif or the dyad-symmetric motif are provided as Supplementary Data files.
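The enrichment test described above can be sketched in Python as a hypergeometric upper-tail probability. This is an illustration of the statistic, not HOMER's internal code: the mapping of peaks and annotation categories onto equal-sized genome bins, the function names, and the use of the natural logarithm for the logP score are assumptions, and the numbers are toy values.

```python
from math import comb, log


def hypergeom_upper_tail(k: int, population: int, successes: int, draws: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(population, successes, draws).

    population: total number of genome bins (assumed equal-sized)
    successes:  bins belonging to the annotation category (e.g. promoters)
    draws:      number of peaks
    k:          peaks observed inside the category
    """
    upper = min(successes, draws)
    favourable = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(k, upper + 1)
    )
    return favourable / comb(population, draws)


def log_p(k: int, population: int, successes: int, draws: int) -> float:
    """Natural-log p-value; more negative means stronger enrichment."""
    return log(hypergeom_upper_tail(k, population, successes, draws))


if __name__ == "__main__":
    # Toy numbers: 20 bins, 5 of them promoter bins, 4 peaks, 2 in promoters.
    p = hypergeom_upper_tail(2, 20, 5, 4)
    print(f"p = {p:.4f}, logP = {log_p(2, 20, 5, 4):.3f}")
```

With these toy values the expected number of promoter peaks is 4 × 5/20 = 1, so observing two peaks gives only a modest p-value; genome-scale counts would produce far smaller values.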

Biolayer interferometry binding assay

The DNA-binding activities of NFIA and its mutants were assessed using biolayer interferometry (BLI) on an Octet-K2 device. Biotin-labeled DNA and its complementary strand were chemically synthesized (General Biol), annealed, and loaded onto activated Octet SA biosensors. The DNA sequences used were 5′-Biotin-AGTTGGCAAGTC-3′ and 5′-GACTTGCCAACT-3′ for DNA-S, and 5′-Biotin-AGTTGGCAAGATGCCATC-3′ and 5′-GATGGCATCTTGCCAACT-3′ for DNA-L. The DNA substrates were loaded at a concentration of 0.5 ng/μL for DNA-L and 1 ng/μL for DNA-S in BLI buffer (20 mM Tris-HCl (pH 7.5), 100 mM NaCl, 1 mM ZnCl₂, pH 7.4) until saturation. The loaded biosensors were then incubated with serial dilutions of NFIA or its mutants in BLI buffer. Association and dissociation were measured at 30 °C and fitted using ForteBio Data Analysis software.
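As a consistency check on the oligonucleotides listed above, the short Python sketch below confirms that each reverse strand is the exact reverse complement of its forward strand and locates the TGGCA half-site in each forward strand. The helper names are illustrative; the sequences are those given in the text, without the biotin label.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Forward and reverse strands as listed for the BLI assay (5' to 3').
DNA_S = ("AGTTGGCAAGTC", "GACTTGCCAACT")
DNA_L = ("AGTTGGCAAGATGCCATC", "GATGGCATCTTGCCAACT")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def forms_blunt_duplex(forward: str, reverse: str) -> bool:
    """True if the two strands anneal into a fully paired, blunt-ended duplex."""
    return reverse_complement(forward) == reverse.upper()


def half_site_start(forward: str, motif: str = "TGGCA") -> int:
    """1-based position of the first motif occurrence, or 0 if absent."""
    return forward.upper().find(motif) + 1


if __name__ == "__main__":
    for name, (fwd, rev) in (("DNA-S", DNA_S), ("DNA-L", DNA_L)):
        print(name, len(fwd), "bp",
              "duplex" if forms_blunt_duplex(fwd, rev) else "mismatch",
              "TGGCA starts at position", half_site_start(fwd))
```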

Microscale thermophoresis (MST) assay

MST experiments were performed using a Monolith NT.115 instrument (NanoTemper Technologies) with standard capillaries in a total reaction volume of 10 μL. Cy5-labeled DNA substrates were diluted to a final concentration of 20 nM. Serial dilutions of purified NFIA_DBD were prepared by performing 15 two-fold serial dilutions starting from a 2 μM stock concentration, as shown in Supplementary Fig. 3. Samples were incubated for 60 min to allow equilibrium binding prior to loading into capillaries. Measurements were conducted at 30 °C with 20% LED excitation power, 40% MST power, and detection via the NanoRed fluorescence channel. Data were analyzed using MO Affinity Analysis Software v2.3, and dissociation constants (K_d) were calculated using the standard model based on the law of mass action.
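The titration design and the mass-action binding model can be sketched in Python as follows. This is an illustration, not the MO Affinity Analysis implementation: the function names and the K_d value are hypothetical, and the series is assumed to consist of the 2 μM starting concentration plus 15 two-fold dilutions (16 points in total). The fraction of labeled DNA bound is the physical root of the quadratic that follows from K_d = [P][D]/[PD] together with conservation of protein and DNA.

```python
import math

def dilution_series(start_nM, n_dilutions, factor=2.0):
    """Starting concentration followed by n_dilutions successive dilutions."""
    return [start_nM / factor**i for i in range(n_dilutions + 1)]

def fraction_bound(protein_nM, dna_nM, kd_nM):
    """Fraction of labeled DNA in complex for 1:1 binding (law of mass action)."""
    s = protein_nM + dna_nM + kd_nM
    complex_nM = (s - math.sqrt(s * s - 4.0 * protein_nM * dna_nM)) / 2.0
    return complex_nM / dna_nM

if __name__ == "__main__":
    DNA_NM = 20.0   # Cy5-labeled DNA concentration from the text
    KD_NM = 100.0   # hypothetical K_d, for illustration only
    for conc in dilution_series(2000.0, 15):
        print(f"{conc:10.3f} nM  bound = {fraction_bound(conc, DNA_NM, KD_NM):.4f}")
```

Fitting K_d in practice means adjusting it (together with the bound and unbound signal levels) until this curve matches the measured response across the concentration series.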

Dual-luciferase reporter assay

To evaluate transcriptional activation by NFIA, a dual-luciferase reporter assay was performed using HEK293 cells (ATCC, CRL-1573). Briefly, cells were seeded in 24-well plates at a density chosen to reach approximately 60% confluence on the day of transfection. The 2-kb upstream promoter regions of the human ACACA and CPT2 genes were cloned into the pGL3-Basic firefly luciferase reporter vector (Servicebio). Cells were co-transfected with either wild-type NFIA, mutant NFIA (R116A/K125A), or an empty pcDNA 3.1 control vector using Lipofectamine 3000 (Thermo Fisher Scientific), along with the pRL-TK Renilla luciferase vector (Promega), which served as an internal control for normalization. Each condition was transfected in triplicate. After 48 h, cells were lysed, and luciferase activity was measured using the Dual-Luciferase Reporter Assay System (Yeasen) following the manufacturer’s protocol. Firefly luciferase activity was normalized to Renilla luciferase activity to account for variation in transfection efficiency.
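The normalization step can be illustrated with a short Python sketch. The readings below are invented for illustration, and expressing the result as a fold change over the empty-vector control is an assumption, since the text specifies only the firefly-to-Renilla normalization.

```python
from statistics import mean

def normalized_activity(firefly, renilla):
    """Per-well firefly/Renilla ratio, correcting for transfection efficiency."""
    return [f / r for f, r in zip(firefly, renilla)]

def fold_change(sample_ratios, control_ratios):
    """Mean normalized activity of a sample relative to the control mean."""
    return mean(sample_ratios) / mean(control_ratios)

if __name__ == "__main__":
    # Hypothetical triplicate luminescence readings (arbitrary units).
    empty_vector = normalized_activity([1000, 1100, 900], [500, 520, 480])
    wild_type = normalized_activity([5200, 4800, 5000], [510, 490, 500])
    print(fold_change(wild_type, empty_vector))
```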

Crystal structure determination

Freshly purified NFIA_DBD protein in buffer C was concentrated to 10 mg ml⁻¹ and used for crystallization immediately. Crystals were grown by mixing the protein solution with an equal volume of reservoir solution [0.1 M CHES (pH 9.5), and 30% (w/v) PEG 3000], using the sitting-drop vapor-diffusion method. For crystallization of the protein-DNA complexes, DNA oligos (Fig. 2b) were purchased (General Biol) and annealed without further purification. The purified NFIA_DBD protein (10 mg ml⁻¹) was mixed with either DNA-S or DNA-L at a molar ratio of 1:1.2 and incubated on ice for 30 min. Crystals of the NFIA_DBD:DNA-S complex were grown by mixing the protein-DNA solution with an equal volume of reservoir solution [0.1 M Bis-Tris Propane (pH 7.0), and 1.5 M Ammonium Sulfate], while crystals of the NFIA_DBD:DNA-L complex were grown using a reservoir solution [0.1 M Tris (pH 7.0), and 20% (w/v) PEG 2000]. For data collection, the crystals were soaked in a reservoir solution containing 20% (v/v) ethylene glycol and flash-frozen in liquid nitrogen. X-ray (λ = 0.9795 Å) diffraction data were collected at −173 °C using a PILATUS3 S 6M detector at the BL18U1 beamline of the Shanghai Synchrotron Radiation Facility (SSRF) and processed using the program HKL-3000⁴⁵. For NFIA_DBD, the structure was solved by molecular replacement using the PHASER program⁴⁶ with the NFIA AlphaFold model as the search model. For the NFIA_DBD:DNA complexes, the structures were solved by molecular replacement using the NFIA_DBD structure, and DNA molecules were built de novo based on the electron density map. Model building and structure refinement were performed using Coot⁴⁷ and Phenix⁴⁸. The final structures were evaluated using the wwPDB⁴⁹ validation server. Data collection and structure statistics are summarized in Supplementary Table 1.

Small-angle X-ray scattering analysis

SAXS data were recorded at the SSRF beamline BL19U2 using a PILATUS3 X 2M detector. The wavelength (λ) of X-ray radiation was 1.033 Å, and the momentum transfer q was recorded in the range 0.006–0.39 Å⁻¹ [q = (4π/λ)sinθ, where 2θ is the scattering angle]. The NFIA_DBD and NFIA_DBD:DNA-S complex samples were each concentrated to 4 mg ml⁻¹ and pre-equilibrated in a SAXS buffer [25 mM Tris-HCl (pH 7.5), 150 mM NaCl, 1 mM TCEP, and 5% (v/v) glycerol] for data collection. For each sample, scattering profiles were measured at three solute concentrations (4-fold dilution, 2-fold dilution, and stock solution) to remove the scattering contribution due to inter-particle interactions and to extrapolate the data to infinite dilution. A set of 20 two-dimensional images was recorded for each buffer or sample solution using a flow cell, with an exposure time of one second per image to minimize radiation damage and optimize the signal-to-noise ratio. The images were reduced on site to one-dimensional scattering profiles using BioXTAS RAW.
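The relation q = (4π/λ)sinθ can be made concrete with a small Python sketch that converts between the scattering angle 2θ and the momentum transfer at the wavelength quoted above. The function names are illustrative only.

```python
import math

WAVELENGTH_A = 1.033  # X-ray wavelength from the text, in angstroms

def momentum_transfer(two_theta_deg, wavelength_A=WAVELENGTH_A):
    """q in inverse angstroms for a scattering angle 2-theta given in degrees."""
    theta = math.radians(two_theta_deg) / 2.0
    return 4.0 * math.pi / wavelength_A * math.sin(theta)

def scattering_angle(q, wavelength_A=WAVELENGTH_A):
    """Scattering angle 2-theta in degrees for a given q in inverse angstroms."""
    return 2.0 * math.degrees(math.asin(q * wavelength_A / (4.0 * math.pi)))

if __name__ == "__main__":
    # Angular span corresponding to the recorded q range.
    for q in (0.006, 0.39):
        print(q, scattering_angle(q))
```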

The SAXS data were analyzed using IGOR-Pro (WaveMetrics) and the ATSAS suite⁵⁰, similar to previously described protocols^51,52. Briefly, the forward scattering intensity I(0) and the radius of gyration (R_g) were calculated from the infinite dilution data at low q values in the range of qR_g < 1.3, using the Guinier approximation in PRIMUS⁵³. These parameters were also estimated from the scattering profile with a broader q range of 0.006–0.30 Å⁻¹ using the indirect Fourier transform method implemented in the program GNOM, along with the pair-distance distribution function (PDDF) and the maximum dimension of the protein (D_max). The molecular weight was calculated using the SAXS MoW method, which is independent of protein concentration. Low-resolution ab initio shape envelopes were calculated using DAMMIN, which generates models represented by an ensemble of densely packed beads, using scattering profiles within the q range of 0.006–0.30 Å⁻¹. A total of 20 independent models were created and averaged by DAMAVER⁵⁴, superimposed by SUPCOMB based on the normalized spatial discrepancy criteria, and filtered using DAMFILT to generate the final model. The theoretical scattering intensity of the crystal structure was calculated and fitted to the experimental scattering intensity using CRYSOL⁵⁵. Data collection and scattering-derived parameters are summarized in Supplementary Table 2.
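The Guinier analysis mentioned above rests on the approximation ln I(q) ≈ ln I(0) − (R_g² q²)/3, valid at low q. The following Python sketch shows the idea as an ordinary least-squares fit of ln I against q². It is not the PRIMUS implementation; the function name and the synthetic data (R_g = 20 Å, I(0) = 1) are hypothetical and chosen so that qR_g stays below 1.3.

```python
import math

def guinier_fit(q, intensity):
    """Estimate (R_g, I(0)) from low-q data via a linear fit of ln I vs q^2."""
    x = [qi * qi for qi in q]
    y = [math.log(i) for i in intensity]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    if slope >= 0:
        raise ValueError("non-negative Guinier slope; R_g is undefined")
    intercept = mean_y - slope * mean_x
    return math.sqrt(-3.0 * slope), math.exp(intercept)

if __name__ == "__main__":
    # Synthetic, noise-free profile for illustration only.
    RG_TRUE, I0_TRUE = 20.0, 1.0
    q = [0.006 + 0.002 * i for i in range(30)]  # max q * R_g = 1.28
    intensity = [I0_TRUE * math.exp(-(RG_TRUE * qi) ** 2 / 3.0) for qi in q]
    print(guinier_fit(q, intensity))
```

With real data the fitting window is usually refined iteratively, because the upper limit qR_g < 1.3 depends on the R_g being estimated.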

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Data availability

Structure factors and atomic coordinates have been deposited in the Protein Data Bank [www.rcsb.org] with the accession codes 9JH4 (https://doi.org/10.2210/pdb9JH4/pdb), 9JH2 (https://doi.org/10.2210/pdb9JH2/pdb), and 9LC2 (https://doi.org/10.2210/pdb9LC2/pdb). ChIP-seq data analyzed in this study were obtained from ENCODE (www.encodeproject.org) with the accession code ENCSR226QQM (https://doi.org/10.17989/ENCSR226QQM), and processed peak sets annotated for the TGGCA half-site and the TGGCA(N3)TGCCA dyad-symmetric motif are provided in Supplementary Data 1–2. Author-generated structural models (model for two DBDs on DNA-L and AlphaFold3 models for NFIA_DBD and NFIA_DBD:DNA-S) are provided as Supplementary Data 3–5. All other data supporting the findings of this study are available within the paper and its Supplementary Information. Source data are provided with this paper.

References

1. Gronostajski, R. M. Roles of the NFI/CTF gene family in transcription and development. Gene 249, 31–45 (2000).
2. Pjanic, M. et al. Nuclear factor I revealed as family of promoter binding transcription activators. BMC Genomics 12, 1–10 (2011).
3. Borgmeyer, U., Nowock, J. & Sippel, A. E. The TGGCA-binding protein: a eukaryotic nuclear protein recognizing a symmetrical sequence on double-stranded linear DNA. Nucleic Acids Res. 12, 4295–4311 (1984).
4. Gronostajski, R. M., Adhya, S., Nagata, K., Guggenheimer, R. A. & Hurwitz, J. Site-specific DNA binding of nuclear factor I: analyses of cellular binding sites. Mol. Cell. Biol. 5, 964–971 (1985).
5. Jones, K. A., Kadonaga, J. T., Rosenfeld, P. J., Kelly, T. J. & Tjian, R. A cellular DNA-binding protein that activates eukaryotic transcription and DNA replication. Cell 48, 79–89 (1987).
6. Nagata, K., Guggenheimer, R. A., Enomoto, T., Lichy, J. H. & Hurwitz, J. Adenovirus DNA replication in vitro: identification of a host factor that stimulates synthesis of the preterminal protein-dCMP complex. Proc. Natl. Acad. Sci. USA 79, 6438–6442 (1982).
7. Nagata, K., Guggenheimer, R. A. & Hurwitz, J. Specific binding of a cellular DNA replication protein to the origin of replication of adenovirus DNA. Proc. Natl. Acad. Sci. USA 80, 6177–6181 (1983).
8. Harris, L., Genovesi, L. A., Gronostajski, R. M., Wainwright, B. J. & Piper, M. Nuclear factor one transcription factors: divergent functions in developmental versus adult stem cell populations. Dev. Dyn. 244, 227–238 (2015).
9. Qian, F., Kruse, U., Lichter, P. & Sippel, A. E. Chromosomal localization of the four genes (NFIA, B, C, and X) for the human transcription factor nuclear factor I by FISH. Genomics 28, 66–73 (1995).
10. Rupp, R. A. et al. Chicken NFI/TGGCA proteins are encoded by at least three independent genes: NFI-A, NFI-B and NFI-C with homologues in mammalian genomes. Nucleic Acids Res. 18, 2607–2616 (1990).
11. Alevizopoulos, A. et al. A proline-rich TGF-beta-responsive transcriptional activator interacts with histone H3. Genes Dev. 9, 3051–3066 (1995).
12. Alevizopoulos, A. & Mermod, N. Antagonistic regulation of a proline-rich transcription factor by transforming growth factor β and tumor necrosis factor α. J. Biol. Chem. 271, 29672–29681 (1996).
13. Wang, W. et al. Nuclear factor I coordinates multiple phases of cerebellar granule cell development via regulation of cell adhesion molecules. J. Neurosci. 27, 6115–6127 (2007).
14. Bunt, J. et al. Combined allelic dosage of Nfia and Nfib regulates cortical development. Brain Neurosci. Adv. 1, 2398212817739433 (2017).
15. Chen, K.-S., Lim, J. W., Richards, L. J. & Bunt, J. The convergent roles of the nuclear factor I transcription factors in development and cancer. Cancer Lett. 410, 124–138 (2017).
16. Deneen, B. et al. The transcription factor NFIA controls the onset of gliogenesis in the developing spinal cord. Neuron 52, 953–968 (2006).
17. Lu, W. et al. NFIA haploinsufficiency is associated with a CNS malformation syndrome and urinary tract defects. PLoS Genet. 3, e80 (2007).
18. Mason, S., Piper, M., Gronostajski, R. M. & Richards, L. J. Nuclear factor one transcription factors in CNS development. Mol. Neurobiol. 39, 10–23 (2009).
19. das Neves, L. et al. Disruption of the murine nuclear factor IA gene (Nfia) results in perinatal lethality, hydrocephalus, and agenesis of the corpus callosum. Proc. Natl. Acad. Sci. USA 96, 11946–11951 (1999).
20. Piper, M. et al. NFIA controls telencephalic progenitor cell differentiation through repression of the Notch effector Hes1. J. Neurosci. 30, 9127–9139 (2010).
21. Shu, T., Butz, K. G., Plachez, C., Gronostajski, R. M. & Richards, L. J. Abnormal development of forebrain midline glia and commissural projections in Nfia knock-out mice. J. Neurosci. 23, 203–212 (2003).
22. Gobius, I. et al. Astroglial-mediated remodeling of the interhemispheric midline is required for the formation of the corpus callosum. Cell Rep. 17, 735–747 (2016).
23. Koehler, U. et al. A novel 1p31.3p32.2 deletion involving the NFIA gene detected by array CGH in a patient with macrocephaly and hypoplasia of the corpus callosum. Eur. J. Pediatr. 169, 463–468 (2010).
24. Lu, W., Quintero-Rivera, F., Fan, Y., Alkuraya, F. & Donovan, D. NFIA haploinsufficiency is associated with a CNS malformation syndrome and urinary tract. PLoS Genet. 3, e80 (2007).
25. Negishi, Y. et al. Truncating mutation in NFIA causes brain malformation and urinary tract defects. Hum. Genome Var. 2, 1–4 (2015).
26. Wang, C. et al. NFIA regulates articular chondrocyte fatty acid metabolism and joint homeostasis. Sci. Transl. Med. 17, eadm9488 (2025).
27. Chen, D. et al. Osteoarthritis: toward a comprehensive understanding of pathological mechanism. Bone Res. 5, 16044 (2017).
28. Tong, L. et al. Current understanding of osteoarthritis pathogenesis and relevant new approaches. Bone Res. 10, 60 (2022).
29. Li, X. et al. Current cutting-edge omics techniques on musculoskeletal tissues and diseases. Bone Res. 13, 59 (2025).
30. Altmann, H., Wendler, W. & Winnacker, E.-L. Transcriptional activation by CTF proteins is mediated by a bipartite low-proline domain. Proc. Natl. Acad. Sci. USA 91, 3901–3905 (1994).
31. Mermod, N., O’Neill, E. A., Kelly, T. J. & Tjian, R. The proline-rich transcriptional activator of CTF/NF-I is distinct from the replication and DNA binding domain. Cell 58, 741–753 (1989).
32. Roulet, E. et al. Experimental analysis and computer prediction of CTF/NFI transcription factor DNA binding sites. J. Mol. Biol. 297, 833–848 (2000).
33. Armentero, M.-T., Horwitz, M. & Mermod, N. Targeting of DNA polymerase to the adenovirus origin of DNA replication by interaction with nuclear factor I. Proc. Natl. Acad. Sci. USA 91, 11537–11541 (1994).
34. Gounari, F. et al. Amino-terminal domain of NF1 binds to DNA as a dimer and activates adenovirus DNA replication. EMBO J. 9, 559–566 (1990).
35. Gronostajski, R. M. Analysis of nuclear factor I binding to DNA using degenerate oligonucleotides. Nucleic Acids Res. 14, 9117–9132 (1986).
36. Wuarin, J., Mueller, C. & Schibler, U. A ubiquitous CCAAT factor is required for efficient in vitro transcription from the mouse albumin promoter. J. Mol. Biol. 214, 865–874 (1990).
37. Kruse, U. & Sippel, A. E. Transcription factor nuclear factor I proteins form stable homo-and heterodimers. FEBS Lett. 348, 46–50 (1994).
38. Goyal, N., Knox, J. & Gronostajski, R. M. Analysis of multiple forms of nuclear factor I in human and murine cell lines. Mol. Cell. Biol. 10, 1041–1048 (1990).
39. Heinz, S. et al. Simple combinations of lineage-determining transcription factors prime cis-regulatory elements required for macrophage and B cell identities. Mol. Cell 38, 576–589 (2010).
40. Rauluseviciute, I. et al. JASPAR 2024: 20th anniversary of the open-access database of transcription factor binding profiles. Nucleic Acids Res. 52, D174–D182 (2024).
41. Sultana, A. & Lee, J. E. Measuring protein-protein and protein-nucleic acid interactions by biolayer interferometry. Curr. Protoc. Protein Sci. 79, 19.25.1–19.25.26 (2015).
42. Shi, Y. et al. Crystal structure of a Smad MH1 domain bound to DNA: insights on DNA binding in TGF-β signaling. Cell 94, 585–594 (1998).
43. Martin-Malpartida, P. et al. Structural basis for genome wide recognition of 5-bp GC motifs by SMAD transcription factors. Nat. Commun. 8, 2070 (2017).
44. Crooks, G. E., Hon, G., Chandonia, J.-M. & Brenner, S. E. WebLogo: a sequence logo generator. Genome Res. 14, 1188–1190 (2004).
45. Minor, W., Cymborowski, M., Otwinowski, Z. & Chruszcz, M. HKL-3000: the integration of data reduction and structure solution–from diffraction images to an initial model in minutes. Acta Crystallogr. Sect. D Biol. Crystallogr. 62, 859–866 (2006).
46. McCoy, A. J. et al. Phaser crystallographic software. J. Appl. Crystallogr. 40, 658–674 (2007).
47. Emsley, P. & Cowtan, K. Coot: model-building tools for molecular graphics. Acta Crystallogr. Sect. D Biol. Crystallogr. 60, 2126–2132 (2004).
48. Adams, P. PHENIX: a comprehensive Python-based system for macromolecular structure solution. Acta Crystallogr. Sect. D Biol. Crystallogr. 65, 1074 (2009).
49. Berman, H. M. et al. The protein data bank. Nucleic Acids Res. 28, 235–242 (2000).
50. Petoukhov, M. V. et al. New developments in the ATSAS program package for small-angle scattering data analysis. J. Appl. Crystallogr. 45, 342–350 (2012).
51. Song, H. et al. The functional cycle of Rnt1p: five consecutive steps of double-stranded RNA processing by a eukaryotic RNase III. Structure 25, 353–363 (2017).
52. Song, H. et al. Structure and activity of PPX/GppA homologs from Escherichia coli and Helicobacter pylori. FEBS J. 287, 1865–1885 (2020).
53. Konarev, P. V., Volkov, V. V., Sokolova, A. V., Koch, M. H. & Svergun, D. I. PRIMUS: a Windows PC-based system for small-angle scattering data analysis. J. Appl. Crystallogr. 36, 1277–1282 (2003).
54. Volkov, V. V. & Svergun, D. I. Uniqueness of ab initio shape determination in small-angle scattering. J. Appl. Crystallogr. 36, 860–864 (2003).
55. Svergun, D., Barberato, C. & Koch, M. H. CRYSOL–a program to evaluate X-ray solution scattering of biological macromolecules from atomic coordinates. J. Appl. Crystallogr. 28, 768–773 (1995).


Acknowledgements

We gratefully acknowledge Prof. Xin Chen (University of Macau) for providing valuable resources and support. This work was supported by the Macau Science and Technology Development Fund (0068/2023/ITP2 and 0143/2025/ITP2, H.S.; 0007/2022/AKP, X.C.), University of Macau (MYRG-GRG2024-00283-ICMS-UMDF, H.S.; SRG2023-00054-ICMS, H.S.), The State Key Laboratory of Mechanism and Quality of Chinese Medicine (005/2023/SKL, H.S.), Basic Research Program of Jiangsu (BK20243030, Z.L.), National Natural Science Foundation of China (No. 82272508, J.T.), Hunan Health Commission Key Clinical Specialty Major Research Project (No. Z2023033, J.T.) and Chinese National Key Clinical Specialty (Orthopedics, J.T.). The X-ray crystallography experiments were carried out with the support of Shanghai Synchrotron Radiation Facility, Beamline BL18U1 (proposal 2024-NFPS-JJ-501348, H.S.), with expert assistance from beamline staff member Yijun Gu.

Author information

These authors contributed equally: Ci Zhu, Ding Xiao, Zhipu Luo.

Authors and Affiliations

State Key Laboratory of Mechanism and Quality of Chinese Medicine, Institute of Chinese Medical Sciences, University of Macau, Macau, China
Ci Zhu, Yueyu Wang, Xi Chen & He Song
National Clinical Research Center for Geriatric Disorders, XiangYa Hospital, Changsha, Hunan, China
Ding Xiao & Juyu Tang
Department of Hand and Microsurgery, XiangYa Hospital, Central South University, Changsha, Hunan, China
Ding Xiao & Juyu Tang
MOE Key Laboratory of Geriatric Diseases and Immunology, Institute of Molecular Enzymology, School of Life Sciences, Soochow University, Suzhou, China
Zhipu Luo
School of Life Sciences, Tsinghua University, Beijing, China
Jie Zhang
State Key Laboratory of RNA Innovation, Science and Engineering, Institute of Biophysics, Chinese Academy of Sciences, Beijing, China
Jie Zhang & Xianyang Fang
Department of Orthopedics, Shanghai Municipal Hospital of Traditional Chinese Medicine, Shanghai University of Traditional Chinese Medicine, Shanghai, China
Shuang Liu & Xiaofeng Li
School of Acupuncture, Moxibustion and Tuina, Beijing University of Chinese Medicine, Beijing, China
Hejin Xiao
Department of Orthopaedic Surgery, Washington University, St. Louis, MO, USA
Jie Shen

Authors

Ci Zhu
Ding Xiao
Zhipu Luo
Jie Zhang
Shuang Liu
Yueyu Wang
Xi Chen
Hejin Xiao
Xiaofeng Li
Juyu Tang
Xianyang Fang
Jie Shen
He Song

Contributions

J.S. and H.S. conceived the project. C.Z., Y.W., and X.C. prepared protein samples. C.Z. conducted binding assays and analyzed the experimental data. D.X., H.X., and J.T. performed luciferase assays. C.Z., Z.L., and H.S. performed X-ray crystallography experiments and carried out subsequent data analysis. S.L. and X.L. conducted ChIP-Seq analysis and designed the DNA constructs. J.Z. and X.F. performed SAXS experiments and analysis. J.S. and H.S. supervised the project. J.S. and H.S. wrote the manuscript with input from all authors.

Corresponding authors

Correspondence to Jie Shen or He Song.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Communications thanks the anonymous reviewer(s) for their contribution to the peer review of this work. A peer review file is available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Description of Additional Supplementary Files

Supplementary Data

Reporting Summary

Transparent Peer Review file

Source data

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.


About this article

Cite this article

Zhu, C., Xiao, D., Luo, Z. et al. Structural basis for genome-wide site-specific DNA recognition by Nuclear Factor IA. Nat Commun 17, 917 (2026). https://doi.org/10.1038/s41467-025-67641-4


Received: 02 February 2025
Accepted: 04 December 2025
Published: 15 December 2025
Version of record: 23 January 2026
DOI: https://doi.org/10.1038/s41467-025-67641-4