Multiple overlapping binding sites determine transcription factor occupancy

Abstract

Transcription factors (TFs) regulate gene expression by interacting with DNA in a sequence-specific manner. High-throughput in vitro technologies, such as protein-binding microarrays (refs. 1–6) and HT-SELEX (high-throughput systematic evolution of ligands by exponential enrichment; refs. 7,8), have revealed the DNA-binding specificities of hundreds of TFs. However, they have limited ability to reliably identify lower-affinity DNA binding sites, which are increasingly recognized as important for precise spatiotemporal control of gene expression (refs. 9–19). Here, to address this limitation, we developed protein affinity to DNA by in vitro transcription and RNA sequencing (PADIT-seq), with which we comprehensively assayed the binding preferences of six TFs to all possible ten-base-pair DNA sequences, detecting hundreds of novel, lower-affinity binding sites. The expanded repertoire of lower-affinity binding sites revealed that nucleotides flanking high-affinity DNA binding sites create overlapping lower-affinity sites that together modulate TF genomic occupancy in vivo. We propose a model in which TF binding is not determined by individual binding sites, but rather by the sum of multiple, overlapping binding sites. The overlapping binding model explains how competition between paralogous TFs for shared high-affinity binding sites is determined by flanking nucleotides that create differential numbers of overlapping, lower-affinity binding sites. Critically, the model transforms our understanding of noncoding-variant effects, revealing how single nucleotide changes simultaneously alter multiple overlapping sites to additively influence gene expression and human traits, including diseases.

Fig. 1: PADIT-seq detects hundreds of lower-affinity interactions that are undetected by uPBM and HT-SELEX.
Fig. 2: Nucleotides flanking high-affinity binding sites create overlapping lower-affinity binding sites.
Fig. 3: ChIP-nexus footprinting provides direct evidence for independent TF binding to consecutive, overlapping binding sites.
Fig. 4: Competition between the paralogous TFs Pho4 and Cbf1 is determined by differential numbers of overlapping binding sites.
Fig. 5: Noncoding variants alter TF binding by perturbing multiple overlapping binding sites.

Data availability

PADIT-seq data have been deposited in the Gene Expression Omnibus (GEO) database under accession number GSE250601. MITOMI Kd data for EGR1 were downloaded from the supporting information of ref. 41. The UniPROBE database (http://thebrain.bwh.harvard.edu/pbms/UniPROBE_staging/browse.php) was used to access uPBM data (refs. 74,75). HT-SELEX data were downloaded using the ENA accession code ERP001826. ChIP–seq and ChIP-nexus data analysed in the paper were downloaded using accession codes GSE81356, GSE89457, GSE85628, GSE67482 and GSE207001. BET-seq data were downloaded using the accession code GSE111936. Competition gcPBM data were downloaded from the supplementary materials of ref. 64. Processed SNP-SELEX data were downloaded from the supplementary data of ref. 65. Allele-specific EGR1 ChIP–seq data were downloaded from the ADASTRA database: https://adastra.autosome.org/mabel. Processed MPRA data were downloaded from the supplementary tables of refs. 70–73.

Code availability

Code and processed data for generating the figures are available at https://github.com/BulykLab/PADIT-seq.

References

  1. Bulyk, M. L., Gentalen, E., Lockhart, D. J. & Church, G. M. Quantifying DNA–protein interactions by double-stranded DNA arrays. Nat. Biotechnol. 17, 573–577 (1999).

  2. Mukherjee, S. et al. Rapid analysis of the DNA-binding specificities of transcription factors with DNA microarrays. Nat. Genet. 36, 1331–1339 (2004).

  3. Berger, M. F. et al. Compact, universal DNA microarrays to comprehensively determine transcription-factor binding site specificities. Nat. Biotechnol. 24, 1429–1435 (2006).

  4. Berger, M. F. & Bulyk, M. L. Universal protein-binding microarrays for the comprehensive characterization of the DNA-binding specificities of transcription factors. Nat. Protoc. 4, 393–411 (2009).

  5. Badis, G. et al. Diversity and complexity in DNA recognition by transcription factors. Science 324, 1720–1723 (2009).

  6. Weirauch, M. T. et al. Determination and inference of eukaryotic transcription factor sequence specificity. Cell 158, 1431–1443 (2014).

  7. Jolma, A. et al. Multiplexed massively parallel SELEX for characterization of human transcription factor binding specificities. Genome Res. 20, 861–873 (2010).

  8. Jolma, A. et al. DNA-binding specificities of human transcription factors. Cell 152, 327–339 (2013).

  9. Driever, W., Thoma, G. & Nüsslein-Volhard, C. Determination of spatial domains of zygotic gene expression in the Drosophila embryo by the affinity of binding sites for the bicoid morphogen. Nature 340, 363–367 (1989).

  10. Gaudet, J. & Mango, S. E. Regulation of organogenesis by the Caenorhabditis elegans FoxA protein PHA-4. Science 295, 821–825 (2002).

  11. Rowan, S. et al. Precise temporal control of the eye regulatory gene Pax6 via enhancer-binding site affinity. Genes Dev. 24, 980–985 (2010).

  12. Crocker, J. et al. Low affinity binding site clusters confer hox specificity and regulatory robustness. Cell 160, 191–203 (2015).

  13. Farley, E. K. et al. Suboptimization of developmental enhancers. Science 350, 325–328 (2015).

  14. Zandvakili, A., Campbell, I., Gutzwiller, L. M., Weirauch, M. T. & Gebelein, B. Degenerate Pax2 and Senseless binding motifs improve detection of low-affinity sites required for enhancer specificity. PLoS Genet. 14, e1007289 (2018).

  15. Tanay, A. Extensive low-affinity transcriptional interactions in the yeast genome. Genome Res. 16, 962–972 (2006).

  16. Segal, E., Raveh-Sadka, T., Schroeder, M., Unnerstall, U. & Gaul, U. Predicting expression patterns from regulatory sequence in Drosophila segmentation. Nature 451, 535–540 (2008).

  17. Giorgetti, L. et al. Noncooperative interactions between transcription factors and clustered DNA binding sites enable graded transcriptional responses to environmental inputs. Mol. Cell 37, 418–428 (2010).

  18. Horton, C. A. et al. Short tandem repeats bind transcription factors to tune eukaryotic gene expression. Science 381, eadd1250 (2023).

  19. Lim, F. et al. Affinity-optimizing enhancer variants disrupt development. Nature 626, 151–159 (2024).

  20. Bartlett, A. et al. Mapping genome-wide transcription-factor binding sites using DAP-seq. Nat. Protoc. 12, 1659–1672 (2017).

  21. Stormo, G. D., Zuo, Z. & Chang, Y. K. Spec-seq: determining protein-DNA-binding specificity by sequencing. Brief. Funct. Genom. 14, 30–38 (2015).

  22. Fordyce, P. M. et al. De novo identification and biophysical characterization of transcription-factor binding sites with microfluidic affinity analysis. Nat. Biotechnol. 28, 970–975 (2010).

  23. Isakova, A. et al. SMiLE-seq identifies binding motifs of single and dimeric transcription factors. Nat. Methods 14, 316–322 (2017).

  24. Meng, X., Brodsky, M. H. & Wolfe, S. A. A bacterial one-hybrid system for determining the DNA-binding specificity of transcription factors. Nat. Biotechnol. 23, 988–994 (2005).

  25. Stringham, J. L., Brown, A. S., Drewell, R. A. & Dresch, J. M. Flanking sequence context-dependent transcription factor binding in early Drosophila development. BMC Bioinformatics 14, 298 (2013).

  26. Levo, M. et al. Unraveling determinants of transcription factor binding outside the core binding site. Genome Res. 25, 1018–1029 (2015).

  27. Dror, I., Golan, T., Levy, C., Rohs, R. & Mandel-Gutfreund, Y. A widespread role of the motif environment in transcription factor binding across diverse protein families. Genome Res. 25, 1268–1280 (2015).

  28. Chaudhari, H. G. & Cohen, B. A. Local sequence features that influence AP-1 cis-regulatory activity. Genome Res. 28, 171–181 (2018).

  29. Cohen, D. M., Lim, H.-W., Won, K.-J. & Steger, D. J. Shared nucleotide flanks confer transcriptional competency to bZip core motifs. Nucleic Acids Res. 46, 8371–8384 (2018).

  30. Le, D. D. et al. Comprehensive, high-resolution binding energy landscapes reveal context dependencies of transcription factor binding. Proc. Natl Acad. Sci. USA 115, E3702–E3711 (2018).

  31. Yang, M. G., Ling, E., Cowley, C. J., Greenberg, M. E. & Vierbuchen, T. Characterization of sequence determinants of enhancer function using natural genetic variation. eLife 11, e76500 (2022).

  32. Reiter, F., de Almeida, B. P. & Stark, A. Enhancers display constrained sequence flexibility and context-specific modulation of motif function. Genome Res. 33, 346–358 (2023).

  33. Rudnizky, S. et al. Single-molecule DNA unzipping reveals asymmetric modulation of a transcription factor by its binding site sequence and context. Nucleic Acids Res. 46, 1513–1524 (2018).

  34. Gordân, R. et al. Genomic regions flanking E-box binding sites influence DNA binding specificity of bHLH transcription factors through DNA shape. Cell Rep. 3, 1093–1104 (2013).

  35. Aditham, A. K., Shimko, T. C. & Fordyce, P. M. BET-seq: binding energy topographies revealed by microfluidics and high-throughput sequencing. Methods Cell Biol. 148, 229–250 (2018).

  36. Jung, C. et al. True equilibrium measurement of transcription factor-DNA binding affinities using automated polarization microscopy. Nat. Commun. 9, 1605 (2018).

  37. Aditham, A. K., Markin, C. J., Mokhtari, D. A., DelRosso, N. & Fordyce, P. M. High-throughput affinity measurements of transcription factor and DNA mutations reveal affinity and specificity determinants. Cell Syst. 12, 112–127.e11 (2021).

  38. Götzke, H. et al. The ALFA-tag is a highly versatile tool for nanobody-based bioscience applications. Nat. Commun. 10, 4403 (2019).

  39. Hussey, B. J. & McMillen, D. R. Programmable T7-based synthetic transcription factors. Nucleic Acids Res. 46, 9842–9854 (2018).

  40. Love, M. I., Huber, W. & Anders, S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 15, 550 (2014).

  41. Geertz, M., Shore, D. & Maerkl, S. J. Massively parallel measurements of molecular interaction kinetics on a microfluidic platform. Proc. Natl Acad. Sci. USA 109, 16540–16545 (2012).

  42. Toko, H. et al. Csx/Nkx2-5 is required for homeostasis and survival of cardiac myocytes in the adult heart. J. Biol. Chem. 277, 24735–24743 (2002).

  43. Moskowitz, I. P. G. et al. The T-Box transcription factor Tbx5 is required for the patterning and maturation of the murine cardiac conduction system. Development 131, 4107–4116 (2004).

  44. Zhou, X. & O’Shea, E. K. Integrated approaches reveal determinants of genome-wide binding and function of the transcription factor Pho4. Mol. Cell 42, 826–836 (2011).

  45. Ogawa, N. & Oshima, Y. Functional domains of a positive regulatory protein, PHO4, for transcriptional control of the phosphatase regulon in Saccharomyces cerevisiae. Mol. Cell. Biol. 10, 2224–2236 (1990).

  46. Cai, M. & Davis, R. W. Yeast centromere binding protein CBF1, of the helix-loop-helix protein family, is required for chromosome stability and methionine prototrophy. Cell 61, 437–446 (1990).

  47. Payne, J. L. & Wagner, A. The robustness and evolvability of transcription factor binding sites. Science 343, 875–877 (2014).

  48. Jaeger, S. A. et al. Conservation and regulatory associations of a wide affinity range of mouse transcription factor binding sites. Genomics 95, 185–195 (2010).

  49. Grant, C. E., Bailey, T. L. & Noble, W. S. FIMO: scanning for occurrences of a given motif. Bioinformatics 27, 1017–1018 (2011).

  50. Rube, H. T. et al. Prediction of protein–ligand binding affinity from sequencing data with interpretable machine learning. Nat. Biotechnol. 40, 1520–1527 (2022).

  51. Sheth, R. et al. Distal limb patterning requires modulation of cis-regulatory activities by HOX13. Cell Rep. 17, 2913–2926 (2016).

  52. Sun, Z. et al. EGR1 recruits TET1 to shape the brain methylome during development and upon neuronal activity. Nat. Commun. 10, 3892 (2019).

  53. Anderson, D. J. et al. NKX2-5 regulates human cardiomyogenesis via a HEY2 dependent transcriptional network. Nat. Commun. 9, 1373 (2018).

  54. Ang, Y.-S. et al. Disease model of GATA4 mutation reveals transcription factor cooperativity in human cardiogenesis. Cell 167, 1734–1749.e22 (2016).

  55. Alexandari, A. M. et al. De novo distillation of thermodynamic affinity from deep learning regulatory sequence models of in vivo protein–DNA binding. Preprint at bioRxiv https://doi.org/10.1101/2023.05.11.540401 (2023).

  56. Markstein, M., Markstein, P., Markstein, V. & Levine, M. S. Genome-wide analysis of clustered Dorsal binding sites identifies putative target genes in the Drosophila embryo. Proc. Natl Acad. Sci. USA 99, 763–768 (2002).

  57. Lifanov, A. P., Makeev, V. J., Nazina, A. G. & Papatsenko, D. A. Homotypic regulatory clusters in Drosophila. Genome Res. 13, 579–588 (2003).

  58. Gotea, V. et al. Homotypic clusters of transcription factor binding sites are a key component of human promoters and enhancers. Genome Res. 20, 565–577 (2010).

  59. Rohs, R. et al. The role of DNA shape in protein–DNA recognition. Nature 461, 1248–1253 (2009).

  60. Li, J., Chiu, T.-P. & Rohs, R. Predicting DNA structure using a deep learning method. Nat. Commun. 15, 1243 (2024).

  61. He, Q., Johnston, J. & Zeitlinger, J. ChIP-nexus enables improved detection of in vivo transcription factor binding footprints. Nat. Biotechnol. 33, 395–401 (2015).

  62. De Masi, F. et al. Using a structural and logics systems approach to infer bHLH-DNA binding specificity determinants. Nucleic Acids Res. 39, 4553–4563 (2011).

  63. Grove, C. A. et al. A multiparameter network reveals extensive divergence between C. elegans bHLH transcription factors. Cell 138, 314–327 (2009).

  64. Zhang, Y., Ho, T. D., Buchler, N. E. & Gordân, R. Competition for DNA binding between paralogous transcription factors determines their genomic occupancy and regulatory functions. Genome Res. 31, 1216–1229 (2021).

  65. Yan, J. et al. Systematic analysis of binding of transcription factors to noncoding variants. Nature 591, 147–151 (2021).

  66. Coetzee, S. G., Coetzee, G. A. & Hazelett, D. J. motifbreakR: an R/Bioconductor package for predicting variant effects at transcription factor binding sites. Bioinformatics 31, 3847–3849 (2015).

  67. Landrum, M. J. et al. ClinVar: public archive of relationships among sequence variation and human phenotype. Nucleic Acids Res. 42, D980–D985 (2014).

  68. Lettice, L. A. et al. Disruption of a long-range cis-acting regulator for Shh causes preaxial polydactyly. Proc. Natl Acad. Sci. USA 99, 7548–7553 (2002).

  69. Abramov, S. et al. Landscape of allele-specific transcription factor binding in the human genome. Nat. Commun. 12, 2751 (2021).

  70. Tewhey, R. et al. Direct identification of hundreds of expression-modulating variants using a multiplexed reporter assay. Cell 165, 1519–1529 (2016).

  71. Khetan, S. et al. Functional characterization of T2D-associated SNP effects on baseline and ER stress-responsive β cell transcriptional activation. Nat. Commun. 12, 5242 (2021).

  72. Abell, N. S. et al. Multiple causal variants underlie genetic associations in humans. Science 375, 1247–1254 (2022).

  73. McAfee, J. C. et al. Systematic investigation of allelic regulatory activity of schizophrenia-associated common variants. Cell Genom. 3, 100404 (2023).

  74. Newburger, D. E. & Bulyk, M. L. UniPROBE: an online database of protein binding microarray data on protein–DNA interactions. Nucleic Acids Res. 37, D77–D82 (2009).

  75. Hume, M. A., Barrera, L. A., Gisselbrecht, S. S. & Bulyk, M. L. UniPROBE, update 2015: new tools and content for the online database of protein-binding microarray data on protein–DNA interactions. Nucleic Acids Res. 43, D117–D122 (2015).

  76. Shahein, A. et al. Systematic analysis of low-affinity transcription factor binding site clusters in vitro and in vivo establishes their functional relevance. Nat. Commun. 13, 5273 (2022).

  77. Yin, Y. et al. Impact of cytosine methylation on DNA binding specificities of human transcription factors. Science 356, eaaj2239 (2017).

  78. Slattery, M. et al. Cofactor binding evokes latent differences in DNA binding specificity between Hox proteins. Cell 147, 1270–1282 (2011).

  79. Riley, T. R. et al. SELEX-seq: a method for characterizing the complete repertoire of binding site preferences for transcription factor complexes. Methods Mol. Biol. 1196, 255–278 (2014).

  80. Hammal, F., de Langen, P., Bergon, A., Lopez, F. & Ballester, B. ReMap 2022: a database of human, mouse, Drosophila and Arabidopsis regulatory regions from an integrative analysis of DNA-binding sequencing experiments. Nucleic Acids Res. 50, D316–D325 (2022).

  81. Langmead, B. & Salzberg, S. L. Fast gapped-read alignment with Bowtie 2. Nat. Methods 9, 357–359 (2012).

  82. Quinlan, A. R. & Hall, I. M. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26, 841–842 (2010).

  83. Siepel, A. et al. Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes. Genome Res. 15, 1034–1050 (2005).

  84. Foulk, M. S., Urban, J. M., Casella, C. & Gerbi, S. A. Characterizing and controlling intrinsic biases of lambda exonuclease in nascent strand sequencing reveals phasing between nucleosomes and G-quadruplex motifs around a subset of human replication origins. Genome Res. 25, 725–735 (2015).

  85. Hon, J., Martínek, T., Zendulka, J. & Lexa, M. pqsfinder: an exhaustive and imperfection-tolerant search tool for potential quadruplex-forming sequences in R. Bioinformatics 33, 3373–3379 (2017).

Acknowledgements

The authors thank S. Gisselbrecht for help with calculating PBM 9-mer E-scores for EGR1; K. Struhl, L. Mariani and X. K. Liu for critical reading of the manuscript; and members of the Bulyk laboratory for helpful discussions. This work was supported by grants from the National Institutes of Health (R21 HG010200 and R01 HG010501 to M.L.B.; K99 HG013675 to S.K.) and the American Heart Association (24POST1192017 to S.K.).

Author information

Contributions

M.L.B. and S.K. conceived and designed the research project. S.K. performed PADIT-seq experiments, and S.K. and B.S.C. performed custom PBM experiments. S.K. conducted the analyses and prepared the figures. M.L.B. supervised the research. M.L.B. and S.K. wrote the manuscript. All authors approved the final version of the manuscript.

Corresponding author

Correspondence to Martha L. Bulyk.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature thanks Matthew Weirauch and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer review reports are available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data figures and tables

Extended Data Fig. 1 PADIT-seq demonstrates high reproducibility and correlates with orthogonal binding assays.

(a-b) First experiment with HOXD13, EGR1, and ‘NoDBD’ controls (R1-3). (a) Heatmap showing pairwise Pearson correlations between replicates with unsupervised row and column clustering. (b) PCA plot explaining ~93% of variation. (c) Second experiment with Pho4, Cbf1, TBX5, NKX2.5, and additional NoDBD controls (R4-7). Previous ‘NoDBD’ controls (R1-3) included for comparison. PCA plot explaining ~71% of variation. (d-e) PADIT-seq activities for HOXD13 (d) and EGR1 (e) from the all-10mers library and the small-scale library are compared. Red TFBS are active in both libraries. Black TFBS are not active in either library. Blue TFBS are active only in the small-scale library. (f) PADIT-seq activity from the small-scale library and MITOMI-derived dissociation constants (Kd) for EGR1 are compared. Red TFBS are active, whereas black TFBS are inactive. (g) Comparison of binding preferences measured by uPBM Z-scores (x-axis) and PADIT-seq (y-axis) for 4 human TFs, HOXD13, NKX2.5, TBX5, and EGR1, and 2 S. cerevisiae TFs, Pho4 and Cbf1. Active k-mers are colored red, and inactive k-mers are colored black. (inset) AUROC comparing the ability of uPBM E-scores (orange) and Z-scores (purple) to discriminate between PADIT-seq active and inactive k-mers.

Extended Data Fig. 2 Custom PBM confirms PADIT-seq active k-mers represent genuine TF binding sites.

(a) Schematic of custom PBM design showing PADIT-seq active k-mers (8-mers for HOXD13 and 9-mers for EGR1) embedded within constant flanking sequences. FLAG-tagged HOXD13 or EGR1 binding was detected using Alexa Fluor 488 conjugated anti-FLAG antibodies. (b) Scatter plots comparing PADIT-seq activity (y-axis) and custom PBM signal (x-axis) for HOXD13 at three protein concentrations (300 nM, 500 nM, and 800 nM). Points are colored by PADIT-seq false discovery rate (FDR): red (FDR < 0.01), cyan (0.01 ≤ FDR < 0.05), and orange (0.05 ≤ FDR < 0.10). (c) Corresponding analysis for EGR1 at the same three protein concentrations. (d) ROC curves comparing the predictive performance of uPBM E-scores (orange) and PWM scores from FIMO (blue) in distinguishing between PADIT-seq active and inactive k-mers. In cases where PWM models were longer than PADIT-seq active k-mers, we scanned all possible relative positions (registers) of the PWM model against each k-mer and retained the highest affinity score for our analysis. (e) ROC curves comparing the predictive performance of uPBM E-scores (orange) and ProBound-predicted affinities (blue) in distinguishing between PADIT-seq active and inactive k-mers. (insets) Due to the large imbalance between active and inactive k-mers, even seemingly small false positive rates translate to substantial numbers of false predictions. For example, a 5% FPR corresponds to 1,556 false positives for HOXD13 and even a 1% FPR corresponds to 1,305 false positives for EGR1.
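
The arithmetic behind the insets can be sketched as follows. This is an illustrative calculation, not the authors' code. It assumes that the number of false positives is the false positive rate multiplied by the number of inactive k-mers, and that a k-mer and its reverse complement are counted once; both are assumptions, and the function names are hypothetical.

```python
def unique_kmers(k: int) -> int:
    """Count distinct DNA k-mers when a k-mer and its reverse complement count once."""
    total = 4 ** k
    # Reverse-complement palindromes exist only for even k; there are 4**(k/2) of them.
    palindromes = 4 ** (k // 2) if k % 2 == 0 else 0
    return (total + palindromes) // 2


def implied_negatives(false_positives: int, fpr: float) -> int:
    """Approximate number of inactive k-mers implied by a false-positive count at a given FPR."""
    return round(false_positives / fpr)


print(unique_kmers(8))                  # 32896 possible 8-mers
print(unique_kmers(9))                  # 131072 possible 9-mers
print(implied_negatives(1556, 0.05))    # 31120 inactive 8-mers implied for HOXD13
print(implied_negatives(1305, 0.01))    # 130500 inactive 9-mers implied for EGR1
```

Under these assumptions, nearly all possible k-mers are inactive, which is why a seemingly small false positive rate still produces more than a thousand false predictions.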

Extended Data Fig. 3 Lower-affinity binding sites increase TF genomic occupancy at ChIP-seq and ChIP-nexus peaks.

(a) ROC curves comparing the performance of PADIT-seq, PWM FIMO, and ProBound in distinguishing ChIP-seq peaks from background genomic regions for HOXD13, NKX2.5, TBX5 and EGR1. For HOXD13, in addition to random, length-matched background genomic intervals, false positives were also determined with the background sequences defined to be embryonic forelimb bud ATAC-seq peaks not overlapping HOXD13 ChIP-seq peaks. For Pho4 and Cbf1, ROC curves compare the performance of PADIT-seq and PWM FIMO in distinguishing ChIP-nexus peaks from background genomic regions (ProBound motifs were not available). The numbers of foreground and background genomic intervals were equal for all 6 TFs. (b) The sum of PADIT-seq activities of all the active k-mers in ChIP-seq and ChIP-nexus peaks is plotted against the corresponding read counts normalized to peak length for each TF. Pearson correlation coefficients and significance values are shown. (c) Pearson correlation coefficient between normalized ChIP-seq and ChIP-nexus read counts and PADIT-seq predictions when varying the number of top active k-mers included. Red horizontal line indicates the maximum correlation achieved using PADIT-seq. Blue horizontal line shows the correlation achieved using PWM FIMO log-likelihood scores summed across peaks, which yielded higher correlation coefficients than using maximum PWM scores alone.
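
The per-peak quantity in panel b can be illustrated with a minimal sketch. It is not the authors' implementation: the activity table, the 4-mer toy values, and the fallback to the reverse complement when a k-mer is missing from the table are all assumptions made for illustration.

```python
from typing import Dict

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(kmer: str) -> str:
    """Return the reverse complement of a DNA string."""
    return kmer.translate(_COMPLEMENT)[::-1]


def summed_activity(seq: str, activity: Dict[str, float], k: int) -> float:
    """Sum the activities of all active k-mers in seq, sliding one base at a time.

    `activity` is a hypothetical table mapping active k-mers to their activity;
    k-mers absent in both orientations contribute zero.
    """
    seq = seq.upper()
    total = 0.0
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        score = activity.get(kmer)
        if score is None:
            # Assumption: the table may list only one strand of each k-mer.
            score = activity.get(reverse_complement(kmer), 0.0)
        total += score
    return total


# Toy table with made-up values (4-mers keep the example short).
toy_activity = {"CACG": 1.0, "ACGT": 2.0}
print(summed_activity("CACGTG", toy_activity, 4))  # 4.0: CACG + ACGT + CGTG (reverse complement of CACG)
```

Each peak's summed score would then be compared with its length-normalized read count, as described for panel b.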

Extended Data Fig. 4 ChIP-seq and ChIP-nexus peaks have significantly more consecutive active k-mers, irrespective of how background genomic sequences were defined.

(a) Histograms showing the distribution of consecutive active k-mers in peaks (red) for six TFs: HOXD13, NKX2.5, TBX5, and EGR1 (ChIP-seq), and Pho4 and Cbf1 (ChIP-nexus). Background regions were generated by selecting genomic sequences flanking each ChIP peak. Statistical significance was determined by two-sided Wilcoxon rank sum tests. For EGR1, the large effect size resulted in a P value below computational precision. (b) The vast majority of binding sites are detected with high statistical confidence. Across all six TFs, 56–76% of active k-mers are found at FDR < 0.01.
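
Counting consecutive active k-mers, as plotted in panel a, can be sketched as below. This is one plausible reading of the statistic rather than the authors' code: it assumes a hypothetical set of active k-mers that lists both orientations, and it reports the longest run of adjacent active windows in a sequence.

```python
from typing import Set


def longest_active_run(seq: str, active: Set[str], k: int) -> int:
    """Return the largest number of consecutive (1-bp offset) active k-mers in seq."""
    seq = seq.upper()
    best = current = 0
    for i in range(len(seq) - k + 1):
        if seq[i:i + k] in active:
            current += 1
            best = max(best, current)
        else:
            current = 0  # the run is broken by an inactive window
    return best


def run_span_bp(n_consecutive: int, k: int) -> int:
    """Length in bp covered by n consecutive overlapping k-mers."""
    return k + n_consecutive - 1 if n_consecutive > 0 else 0


# Toy set of active 4-mers (made-up, both orientations listed explicitly).
toy_active = {"CACG", "ACGT", "CGTG"}
n = longest_active_run("TTCACGTGTT", toy_active, 4)
print(n, run_span_bp(n, 4))  # 3 consecutive 4-mers spanning 6 bp (CACGTG)
```

Because adjacent windows share k − 1 bases, n consecutive active k-mers cover k + n − 1 bp, which is how a run of overlapping sites defines an extended recognition sequence.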

Extended Data Fig. 5 ChIP-seq and ChIP-nexus peaks are significantly enriched for consecutive active k-mers irrespective of FDR threshold.

Distribution of consecutive active k-mers within ChIP-seq and ChIP-nexus peaks (red) compared to random, length-matched genomic regions (black) for each TF at three different FDR thresholds, 1% (left panels), 5% (middle panels), and 10% (right panels). Statistical significance was determined by two-sided Wilcoxon rank sum tests.

Extended Data Fig. 6 Minor groove width (MGW) and Propeller twist (ProT) at the extended recognition sequences bound by TFs are distinct from flanking genomic regions.

(a-b) Predicted MGW (a) and ProT (b) are shown for genomic regions containing consecutive active k-mers within ChIP-seq peaks (red) and their 4-bp flanking regions (blue): HOXD13 (13–15 bp with consecutive active 8-mers), NKX2.5 (11–13 bp with consecutive active 8-mers), TBX5 (10–12 bp with consecutive active 8-mers), and EGR1 (11–13 bp with consecutive active 9-mers). Adjusted P values < 0.05 from paired two-sided Wilcoxon rank sum tests are indicated by *.

Extended Data Fig. 7 ChIP-nexus footprint expansion persists after controlling for core motif positioning.

(a) For Pho4, the increased number of significant cuts and expanded footprint size observed for 5 overlapping binding sites is not because CACGTG occurs at more variable registers. Strand-specific distribution of ChIP-nexus 5′ ends (red: positive strand; black: negative strand) averaged across peaks containing exactly 5 consecutive overlapping active 8-mers for Pho4. (b) For Cbf1, an expanded footprint size was observed even after constraining the position of CACGTG at the 3rd register. Strand-specific distribution of ChIP-nexus 5′ ends (red: positive strand; black: negative strand) averaged across peaks containing exactly 3 (top) or 5 (bottom) consecutive overlapping active 8-mers with CACGTG constrained to be exclusively at the 3rd register. Peaks with 4 consecutive overlapping binding sites are not included because no significant cuts above background were observed due to low statistical power, which makes it difficult to objectively determine the size of footprints. (a-b) Dashed blue lines indicate the boundaries of the consecutive overlapping binding sites. Letters indicate cuts occurring significantly above background in the flanking 15-bp genomic regions (permutation test adjusted P < 0.05). Sequence logos above each plot show the relative frequency at which the 4 nucleotides occur at each position in the genomic sequences containing the indicated category of ChIP-nexus peaks. For these sequence logos, the y-axis represents information content ranging from a minimum of 0 to a maximum of 2.
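
The 0-to-2 range of the logo y-axis follows from the standard definition of per-position information content for four nucleotides, 2 bits minus the Shannon entropy of the observed base frequencies. The sketch below uses that textbook definition without a small-sample correction; whether the logos in the figure apply such a correction is not stated in the legend, so this is an assumption.

```python
import math
from collections import Counter


def information_content(column: str) -> float:
    """Information content (bits) of one alignment column of DNA bases.

    Computed as 2 - H, where H is the Shannon entropy of the base frequencies.
    """
    counts = Counter(column.upper())
    n = len(column)
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return 2.0 - entropy


print(information_content("AAAA"))  # 2.0: a fully conserved position
print(information_content("ACGT"))  # 0.0: all four bases equally frequent
```

A position that is invariant across the aligned genomic sequences reaches the maximum of 2 bits, whereas a position with no base preference scores 0.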

Extended Data Fig. 8 Overlapping binding sites additively increase TF occupancy in vitro.

(a) The fraction of HT-SELEX reads (y-axis) with consecutive overlapping PADIT-seq active k-mers after 0–4 rounds of selection (x-axis). (b) Box plots showing uPBM signal intensities for 60-bp probes (n ≈ 42,000) categorized by the number of consecutive overlapping active k-mers. The bounds of the box plots define the 25th, 50th and 75th percentiles, and whiskers extend to the furthest data points within 1.5× the interquartile range. Two-sided Wilcoxon tests for all pairwise comparisons have an adjusted P < 0.05 (not indicated). (c) HOXD13, NKX2.5, TBX5, Pho4, and Cbf1 10-mer PADIT-seq activity versus mean uPBM E-scores of constituent 8-mers, centered around a fixed 8-mer in the center (green). EGR1 PADIT-seq activity of 10-mers containing ‘GCGTGGGTG’ (green) versus uPBM E-scores of constituent 9-mers. (d) For all the PADIT-seq active HOXD13, NKX2.5, TBX5, Pho4, and Cbf1 8-mers: distribution of Pearson correlation coefficients between 10-mer PADIT-seq activities and mean uPBM E-scores of constituent 8-mers. For all the PADIT-seq active EGR1 9-mers: correlations between 10-mer PADIT-seq activity and uPBM E-scores of constituent 9-mers.
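Panels (c) and (d) compare each 10-mer's PADIT-seq activity with the mean uPBM E-score of its constituent 8-mers and summarize the agreement as a Pearson correlation. A minimal sketch of that comparison follows; the dictionary `escore`, the function names, and the toy values are hypothetical, and the E-score lookup is assumed to be keyed on the k-mer exactly as written.

```python
from math import sqrt


def constituent_kmers(seq: str, k: int = 8) -> list:
    """All k-mers contained in seq, in register order (3 per 10-mer for k=8)."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def mean_constituent_score(ten_mer: str, escore: dict, k: int = 8) -> float:
    """Mean score of the k-mers that make up one longer sequence."""
    kmers = constituent_kmers(ten_mer, k)
    return sum(escore[m] for m in kmers) / len(kmers)


def pearson(xs: list, ys: list) -> float:
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)


# Toy values only; real E-scores would come from the uPBM data.
escore = {"ACGTACGT": 0.3, "CGTACGTA": 0.4, "GTACGTAC": 0.5}
mean_score = mean_constituent_score("ACGTACGTAC", escore)
```

In practice one would compute `mean_constituent_score` for every 10-mer containing a given central 8-mer and pass those means, together with the matching 10-mer activities, to `pearson`.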

Extended Data Fig. 9 PADIT-seq outperforms MotifBreakR in predicting variant effects on TF binding and gene expression.

(a-b) PADIT-seq outperforms MotifBreakR in predicting SNP effects on TF binding. Representative variants, rs62523478 (a) and rs2914146 (b), whose effects on HOXD13 and EGR1 binding, respectively, cannot be explained by PWM models. (Right) Box plots show custom PBM signals from 16 probes for the reference and alternate alleles. (Left) PADIT-seq k-mers tiled across the reference and alternate alleles. Red points represent k-mers that are active in both alleles; green points represent k-mers that are uniquely active in the allele with higher TF binding. The bounds of the box plots define the 25th, 50th and 75th percentiles, and whiskers extend to the furthest data points within 1.5× the interquartile range. (c) Comparison of PADIT-seq and MotifBreakR predictions with ADASTRA EGR1 ChIP-seq allelic skew measurements. (d-f) Noncoding variants alter multiple overlapping EGR1 binding sites to influence gene expression. (d) Allelic skew in MPRA activity of 149 SNPs with differential EGR1 PADIT-seq activity. Shapes represent the different studies from which allelic skew in MPRA activity was obtained. Red shapes correspond to concordant directions of effect between differential EGR1 PADIT-seq activity and MPRA allelic skew. (e) Euler diagram of variants with MPRA allelic skew predicted by MotifBreakR to alter EGR1 binding (green) and with differential PADIT-seq activity (blue). (Bottom) Box plots comparing MotifBreakR (left) and PADIT-seq (right) effect sizes. The bounds of the box plots define the 25th, 50th and 75th percentiles, and whiskers extend to the furthest data points within 1.5× the interquartile range. * indicates two-sided Wilcoxon rank sum test P value < 0.05. (f) Number of active overlapping 9-mers altered by the 149 variants with differential PADIT-seq activity. Brown variants are predicted by MotifBreakR to alter EGR1 binding, whereas blue variants are not.
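Tiling k-mers across the reference and alternate alleles, as in panels (a-b) and (f), can be illustrated with a short sketch. It enumerates every k-mer register that spans the variant position and counts registers whose active status differs between alleles. The binary active/inactive comparison, the function names, and the toy sequences are assumptions for illustration; the quantitative differential-activity measure used in the study is not reproduced here.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def kmers_spanning(seq: str, pos: int, k: int) -> list:
    """All k-mers in seq that cover the 0-based position pos."""
    start = max(0, pos - k + 1)
    end = min(pos, len(seq) - k)
    return [seq[i:i + k] for i in range(start, end + 1)]


def count_altered_sites(ref_seq: str, pos: int, alt_base: str,
                        active: set, k: int = 9) -> tuple:
    """Return (lost, gained): registers active only in ref / only in alt.

    Assumption: a k-mer is active if it or its reverse complement is
    in `active` (a hypothetical set of active k-mers).
    """
    alt_seq = ref_seq[:pos] + alt_base + ref_seq[pos + 1:]

    def is_active(kmer: str) -> bool:
        return kmer in active or reverse_complement(kmer) in active

    ref_flags = [is_active(m) for m in kmers_spanning(ref_seq, pos, k)]
    alt_flags = [is_active(m) for m in kmers_spanning(alt_seq, pos, k)]
    lost = sum(r and not a for r, a in zip(ref_flags, alt_flags))
    gained = sum(a and not r for r, a in zip(ref_flags, alt_flags))
    return lost, gained


# Toy example with k=3: a G>T change at position 3 of AACGTAA.
lost, gained = count_altered_sites("AACGTAA", 3, "T", {"ACG", "CGT"}, k=3)
```

In the toy case the single substitution removes two overlapping active registers at once, which is the kind of multi-site effect that a single best-match PWM score would not capture.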

Extended Data Fig. 10 ‘Weavability’ of binding sites is an inherent property of TFs from different DBD classes.

(a) Schema to demonstrate the logic of network construction. All incoming and outgoing edges to and from the HOXD13 active 8-mer ‘ACTTTACT’ are shown. Active 8-mers are colored red, inactive 8-mers are colored black. Edges occur between active k-mers only. (b) (left) Network representation of HOXD13 active 8-mers and reverse complements (n = 3,536), connected by directed edges (arrows not shown). 3,446 out of 3,536 nodes (97.5%) form the largest, single connected component. (right) HOXD13 PADIT-seq activity of active 8-mers is plotted against the total number of incoming and outgoing edges per node. The bounds of the box plots define the 25th, 50th and 75th percentiles, and whiskers extend to the furthest data points within 1.5× the interquartile range. * Adjusted P < 0.05, two-sided Wilcoxon tests. (c) (left) Network representation of randomly selected 8-mers and reverse complements (n = 3,536) connected by directed edges (arrows not shown). Only 7 out of 3,536 nodes (0.2%) form the largest, single connected component. (right) HOXD13 PADIT-seq activity of the 3,536 random 8-mers is plotted against the total number of incoming and outgoing edges per node. (d-h) Network representations and activity distributions for NKX2.5, TBX5, EGR1, Pho4, and Cbf1. For each TF, the fraction of active k-mers in the largest, single connected component is indicated. (i) Among the top 500 uPBM 8-mers and reverse complements, the fraction of nodes in the largest, single connected component is plotted for 200 TFs from 9 different families of DBDs. 1,000 random samplings of 500 8-mers and reverse complements are also shown, and were used to perform the permutation test.
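The network construction described in this legend can be approximated with a small sketch. The legend does not spell out the edge rule, so the code assumes a directed edge from k-mer u to k-mer v when v is u shifted by one base (the last k−1 bases of u equal the first k−1 bases of v), and it measures the largest component while ignoring edge direction. Both choices, along with the function names and toy 4-mers, are assumptions for illustration only.

```python
from collections import defaultdict, deque


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def build_overlap_graph(active: set) -> dict:
    """Map each node to the nodes it points to (one-base-shift overlaps).

    Nodes are the active k-mers plus their reverse complements.
    """
    nodes = set(active) | {reverse_complement(m) for m in active}
    by_prefix = defaultdict(list)
    for m in nodes:
        by_prefix[m[:-1]].append(m)
    # Edge u -> v when suffix(u, k-1) == prefix(v, k-1); self-loops dropped.
    return {u: sorted(v for v in by_prefix.get(u[1:], []) if v != u)
            for u in nodes}


def largest_component_fraction(edges: dict) -> float:
    """Fraction of nodes in the largest component, ignoring edge direction."""
    undirected = {u: set() for u in edges}
    for u, targets in edges.items():
        for v in targets:
            undirected[u].add(v)
            undirected[v].add(u)
    seen, best = set(), 0
    for source in undirected:
        if source in seen:
            continue
        seen.add(source)
        queue, size = deque([source]), 0
        while queue:  # breadth-first search over one component
            u = queue.popleft()
            size += 1
            for v in undirected[u] - seen:
                seen.add(v)
                queue.append(v)
        best = max(best, size)
    return best / len(undirected) if undirected else 0.0


# Toy 4-mers: adding the palindrome GATC bridges the two strands' chains.
split = largest_component_fraction(build_overlap_graph({"ACGA", "CGAT"}))
joined = largest_component_fraction(build_overlap_graph({"ACGA", "CGAT", "GATC"}))
```

Here `split` is 0.5 and `joined` is 1.0. Applying the same two functions to a set of active 8-mers and to an equally sized random set would give the kind of comparison shown in panels (b) and (c), under the stated edge assumption.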

Supplementary information

Supplementary Figure 1

Conservation analysis of ChIP–seq peaks containing varying numbers of overlapping, PADIT-seq active k-mers.

Reporting Summary

Peer Review File

Supplementary Tables 1–6

Supplementary Table 1: List of primers used. Supplementary Table 2: List of gBlock and vector DNA sequences. Supplementary Table 3: Small-scale library design and sequence. Supplementary Table 4: PADIT-seq library sequencing statistics. Supplementary Table 5: PADIT-seq active and inactive k-mers. Supplementary Table 6: Processed custom PBM data.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Khetan, S., Carroll, B.S. & Bulyk, M.L. Multiple overlapping binding sites determine transcription factor occupancy. Nature (2025). https://doi.org/10.1038/s41586-025-09472-3

  • DOI: https://doi.org/10.1038/s41586-025-09472-3
