Abstract
Distant-acting enhancers are central to human development (ref. 1). However, our limited understanding of their functional sequence features prevents the interpretation of enhancer mutations in disease (ref. 2). Here we determined the functional sensitivity of human developmental enhancers to mutagenesis in vivo. Focusing on seven enhancers that are active in the developing brain, heart, limb and face, we created over 1,700 transgenic mice for over 260 mutagenized enhancer alleles. Systematic mutation of 12-base-pair blocks collectively altered each sequence feature in each enhancer at least once. We show that 69% of all blocks are required for normal in vivo activity, with mutations more commonly resulting in loss (60%) than in gain (9%) of function. Using predictive modelling, we annotated critical nucleotides at base-pair resolution. The vast majority (88%) of motifs predicted by these machine learning models coincided with changes in in vivo function, and the models showed considerable sensitivity, identifying 59% of all functional blocks. Taken together, our results reveal that human enhancers contain a high density of sequence features that are required for their normal in vivo function and provide a rich resource for further exploration of human enhancer logic.
Data availability
Machine learning models are available on the ENCODE or Synapse data portal (www.encodeproject.org, https://synapse.org) under the IDs listed in Supplementary Table 5. Transgenic experimental data are at VISTA Enhancer Browser (https://enhancer.lbl.gov) under the IDs listed in Supplementary Table 2.
Code availability
Code for ChromBPNet and DeepLIFT is available at GitHub (https://github.com/kundajelab).
References
1. Long, H. K., Prescott, S. L. & Wysocka, J. Ever-changing landscapes: transcriptional enhancers in development and evolution. Cell 167, 1170–1187 (2016).
2. Kvon, E. Z. et al. Comprehensive in vivo interrogation reveals phenotypic impact of human enhancer variants. Cell 180, 1262–1271 (2020).
3. Lambert, S. A. et al. The human transcription factors. Cell 172, 650–665 (2018).
4. Fickett, J. W. Quantitative discrimination of MEF2 sites. Mol. Cell. Biol. 16, 437–441 (1996).
5. Gotea, V. et al. Homotypic clusters of transcription factor binding sites are a key component of human promoters and enhancers. Genome Res. 20, 565–577 (2010).
6. Johnson, D. S., Mortazavi, A., Myers, R. M. & Wold, B. Genome-wide mapping of in vivo protein-DNA interactions. Science 316, 1497–1502 (2007).
7. Dickel, D. E. et al. Ultraconserved enhancers are required for normal development. Cell 172, 491–499 (2018).
8. Hong, J.-W., Hendrix, D. A. & Levine, M. S. Shadow enhancers as a source of evolutionary novelty. Science 321, 1314 (2008).
9. Osterwalder, M. et al. Characterization of mammalian in vivo enhancers using mouse transgenesis and CRISPR genome editing. Methods Mol. Biol. 2403, 147–186 (2022).
10. Snetkova, V. et al. Ultraconserved enhancer function does not require perfect sequence conservation. Nat. Genet. 53, 521–528 (2021).
11. Attanasio, C. et al. Fine tuning of craniofacial morphology by distant-acting enhancers. Science 342, 1241006 (2013).
12. Inoue, F. & Ahituv, N. Decoding enhancers using massively parallel reporter assays. Genomics 106, 159–164 (2015).
13. Visel, A. et al. ChIP-seq accurately predicts tissue-specific activity of enhancers. Nature 457, 854–858 (2009).
14. Visel, A. et al. A high-resolution enhancer atlas of the developing telencephalon. Cell 152, 895–908 (2013).
15. Visel, A., Minovitsky, S., Dubchak, I. & Pennacchio, L. A. VISTA Enhancer Browser—a database of tissue-specific human enhancers. Nucleic Acids Res. 35, D88–D92 (2007).
16. Spurrell, C. H. et al. Genome-wide fetalization of enhancer architecture in heart disease. Cell Rep. 40, 111400 (2022).
17. May, D. et al. Large-scale discovery of enhancers from human heart tissue. Nat. Genet. 44, 89–93 (2011).
18. Dickel, D. E. et al. Genome-wide compendium and functional assessment of in vivo heart enhancers. Nat. Commun. 7, 12923 (2016).
19. Pampari, A. et al. ChromBPNet: bias factorized, base-resolution deep learning models of chromatin accessibility reveal cis-regulatory sequence syntax, transcription factor footprints and regulatory variants. Preprint at bioRxiv https://doi.org/10.1101/2024.12.25.630221 (2025).
20. Shrikumar, A., Greenside, P. & Kundaje, A. Learning important features through propagating activation differences. In Proc. 34th International Conference on Machine Learning (eds Precup, D. & Teh, Y. W.) Vol. 70, 3145–3153 (PMLR, 2017).
21. Firulli, B. A., Redick, B. A., Conway, S. J. & Firulli, A. B. Mutations within helix I of Twist1 result in distinct limb defects and variation of DNA binding affinities. J. Biol. Chem. 282, 27536–27546 (2007).
22. Selleri, L. & Rijli, F. M. Shaping faces: genetic and epigenetic control of craniofacial morphogenesis. Nat. Rev. Genet. 24, 610–626 (2023).
23. Prescott, S. L. et al. Enhancer divergence and cis-regulatory evolution in the human and chimp neural crest. Cell 163, 68–83 (2015).
24. Kim, S. et al. DNA-guided transcription factor cooperativity shapes face and limb mesenchyme. Cell 187, 692–711 (2024).
25. Olson, E. N. Gene regulatory networks in the evolution and development of the heart. Science 313, 1922–1927 (2006).
26. Smith, R. P. et al. Massively parallel decoding of mammalian regulatory sequences supports a flexible organizational model. Nat. Genet. 45, 1021–1028 (2013).
27. Lettice, L. A. et al. Opposing functions of the ETS factor family define Shh spatial expression in limb buds and underlie polydactyly. Dev. Cell 22, 459–467 (2012).
28. Lettice, L. A., Devenney, P., De Angelis, C. & Hill, R. E. The conserved Sonic Hedgehog limb enhancer consists of discrete functional elements that regulate precise spatial expression. Cell Rep. 20, 1396–1408 (2017).
29. Spitz, F. & Furlong, E. E. M. Transcription factors: from enhancer binding to developmental control. Nat. Rev. Genet. 13, 613–626 (2012).
30. Gorkin, D. U. et al. An atlas of dynamic chromatin landscapes in mouse fetal development. Nature 583, 744–751 (2020).
31. ENCODE Project Consortium et al. Expanded encyclopaedias of DNA elements in the human and mouse genomes. Nature 583, 699–710 (2020).
32. Roadmap Epigenomics Consortium et al. Integrative analysis of 111 reference human epigenomes. Nature 518, 317–330 (2015).
33. Rebboah, E. et al. The ENCODE mouse postnatal developmental time course identifies regulatory programs of cell types and cell states. Preprint at bioRxiv https://doi.org/10.1101/2024.06.12.598567 (2024).
34. Ameen, M. et al. Integrative single-cell analysis of cardiogenesis identifies developmental trajectories and non-coding mutations in congenital heart disease. Cell 185, 4937–4953 (2022).
35. Taskiran, I. I. et al. Cell-type-directed design of synthetic enhancers. Nature 626, 212–220 (2024).
36. de Almeida, B. P., Reiter, F., Pagani, M. & Stark, A. DeepSTARR predicts enhancer activity from DNA sequence and enables the de novo design of synthetic enhancers. Nat. Genet. 54, 613–624 (2022).
37. Rafi, A. M. et al. A community effort to optimize sequence-based deep learning models of gene regulation. Nat. Biotechnol. https://doi.org/10.1038/s41587-024-02414-w (2024).
38. Karbalayghareh, A., Sahin, M. & Leslie, C. S. Chromatin interaction-aware gene regulatory modeling with graph attention networks. Genome Res. 32, 930–944 (2022).
39. Tan, J. et al. Cell-type-specific prediction of 3D chromatin organization enables high-throughput in silico genetic screening. Nat. Biotechnol. 41, 1140–1150 (2023).
40. Farley, E. K., Olson, K. M., Zhang, W., Rokhsar, D. S. & Levine, M. S. Syntax compensates for poor binding sites to encode tissue specificity of developmental enhancers. Proc. Natl Acad. Sci. USA 113, 6508–6513 (2016).
41. Farley, E. K. et al. Suboptimization of developmental enhancers. Science 350, 325–328 (2015).
42. Jindal, G. A. et al. Single-nucleotide variants within heart enhancers increase binding affinity and disrupt heart development. Dev. Cell 58, 2206–2216 (2023).
43. Crocker, J. et al. Low affinity binding site clusters confer hox specificity and regulatory robustness. Cell 160, 191–203 (2015).
44. Boisclair Lachance, J.-F., Webber, J. L., Hong, L., Dinner, A. R. & Rebay, I. Cooperative recruitment of Yan via a high-affinity ETS supersite organizes repression to confer specificity and robustness to cardiac cell fate specification. Genes Dev. 32, 389–401 (2018).
45. Borok, M. J., Tran, D. A., Ho, M. C. W. & Drewell, R. A. Dissecting the regulatory switches of development: lessons from enhancer evolution in Drosophila. Development 137, 5–13 (2010).
46. Galupa, R. et al. Enhancer architecture and chromatin accessibility constrain phenotypic space during Drosophila development. Dev. Cell 58, 51–62 (2023).
47. Fuqua, T. et al. Dense and pleiotropic regulatory information in a developmental enhancer. Nature 587, 235–239 (2020).
48. Le Poul, Y. et al. Regulatory encoding of quantitative variation in spatial activity of a Drosophila enhancer. Sci. Adv. 6, eabe2955 (2020).
49. Museridze, M. et al. Entangled and non-modular enhancer sequences producing independent spatial activities. Sci. Adv. 10, eadr9856 (2024).
50. Swanson, C. I., Schwimmer, D. B. & Barolo, S. Rapid evolutionary rewiring of a structurally constrained eye enhancer. Curr. Biol. 21, 1186–1196 (2011).
51. Trevino, A. E. et al. Chromatin and gene-regulatory dynamics of the developing human cerebral cortex at single-cell resolution. Cell 184, 5053–5069 (2021).
52. Feng, J., Liu, T., Qin, B., Zhang, Y. & Liu, X. S. Identifying ChIP-seq enrichment using MACS. Nat. Protoc. 7, 1728–1740 (2012).
53. Lundberg, S. & Lee, S.-I. A unified approach to interpreting model predictions. Preprint at https://doi.org/10.48550/arXiv.1705.07874 (2017).
54. Gupta, S., Stamatoyannopoulos, J. A., Bailey, T. L. & Noble, W. S. Quantifying similarity between motifs. Genome Biol. 8, R24 (2007).
55. Kosicki, M. et al. Massively parallel reporter assays and mouse transgenic assays provide correlated and complementary information about neuronal enhancer activity. Nat. Commun. 16, 4786 (2025).
56. Ma, X. R. et al. Molecular convergence of risk variants for congenital heart defects leveraging a regulatory map of the human fetal heart. Preprint at medRxiv https://doi.org/10.1101/2024.11.20.24317557 (2024).
57. Tenney, A. P. et al. Noncoding variants alter GATA2 expression in rhombomere 4 motor neurons and cause dominant hereditary congenital facial paresis. Nat. Genet. 55, 1149–1163 (2023).
58. Lee, A. S. et al. A cell type-aware framework for nominating non-coding variants in Mendelian regulatory disorders. Nat. Commun. 15, 8268 (2024).
Acknowledgements
This work was supported by a US National Institutes of Health (NIH) grant to L.A.P. (R01HG003988). Research was conducted at the E.O. Lawrence Berkeley National Laboratory and performed under US Department of Energy Contract DE-AC02-05CH11231, University of California. We acknowledge funding support from NIH grants 5U24HG007234, U01HG009431 and U01HG012069 to A.K. A.P. was supported by a Stanford Bio-X Fellowship. We thank E. Kvon, O. Patange and F. Darbellay for reading the manuscript.
Author information
Contributions
M. Kosicki designed the study, collected embryos, analysed the data and wrote the manuscript. B.Z., V.H. and A.P. prepared the machine learning models and ran the predictions. L.E.C. collected embryos and provided comments on the manuscript. N.S. collected embryos. I.P.-F., C.S.N., S.T. and M. Kato performed microinjections and surgical embryo transfers. R.D.H., K.v.M., S.B., E.B. and Y.Z. prepared the constructs and genotyped the embryos. J.A.A. genotyped and imaged the embryos, and supervised the technical team. D.E.D. designed the study and provided general supervision. A.K. supervised B.Z. and A.P. and obtained funding for running the machine learning predictions. A.V. designed the study, provided general supervision, obtained funding and contributed substantially to writing the manuscript. L.A.P. designed the study, provided general supervision and obtained funding.
Ethics declarations
Competing interests
A.K. is on the scientific advisory board of SerImmune, AINovo, TensorBio and OpenTargets. A.K. was a scientific co-founder of RavelBio, a paid consultant with Illumina, was on the scientific advisory board of PatchBio and owns shares in DeepGenomics, Immunai, Freenome and Illumina. The other authors declare no competing interests.
Peer review
Peer review information
Nature thanks the anonymous reviewers for their contribution to the peer review of this work.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data figures and tables
Extended Data Fig. 1 Choice of mutagenesis strategy.
(A) Size distribution of all JASPAR TF binding motifs. (B) Visualization of in silico mutagenesis schemes. (C) Relative score of matches between the original TF PWM and the mutagenized sequence. (D) Match score min-max normalized to the scores of the best- and worst-matching sequences for a given TF PWM. See text for details. Observations are ordered on the x axis by score, so each x-axis position does not necessarily correspond to the same TF PWM.
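The min-max normalization in panel D can be sketched as follows. This is a minimal illustration that assumes a probability-format PWM, a uniform 0.25 background and log-odds scoring; the study's exact scoring function is not given here, and every name and the toy motif are hypothetical.

```python
import math

# A PWM as one {base: probability} dict per motif position.
# Assumption: probabilities were already derived from JASPAR counts (with pseudocounts).
PWM = list[dict[str, float]]
BACKGROUND = 0.25  # assumed uniform background frequency per base


def position_scores(column: dict[str, float]) -> dict[str, float]:
    """Log-odds score of each base at one motif position."""
    return {base: math.log2(p / BACKGROUND) for base, p in column.items()}


def log_odds_score(pwm: PWM, seq: str) -> float:
    """Sum of per-position log-odds scores for a sequence as long as the motif."""
    if len(seq) != len(pwm):
        raise ValueError("sequence length must equal motif length")
    return sum(position_scores(col)[base] for col, base in zip(pwm, seq))


def minmax_normalized(pwm: PWM, seq: str) -> float:
    """Rescale the score so the best-matching sequence is 1 and the worst is 0."""
    best = sum(max(position_scores(col).values()) for col in pwm)
    worst = sum(min(position_scores(col).values()) for col in pwm)
    return (log_odds_score(pwm, seq) - worst) / (best - worst)


# Toy 3-position motif with consensus "AGT" (illustrative, not a JASPAR entry).
toy = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
]
print(minmax_normalized(toy, "AGT"))  # consensus -> 1.0
print(minmax_normalized(toy, "CAA"))  # worst possible match -> 0.0
```

Normalizing against the best and worst achievable sequences puts motifs of different lengths and information content on a common 0 to 1 scale, which is what allows scores for many PWMs to be compared in one panel.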
Extended Data Fig. 2 Validation of transition mutagenesis scheme.
(A) Three blocks with suspected gain-of-binding events, or with mismatched results between adjacent blocks overlapping the same predicted binding motif, were tested using an alternative mutagenesis scheme or targeted 2 bp mutations. In all cases, the results confirmed gain of binding. (B) Unbiased testing using alternative mutagenesis schemes. Blocks were mutagenized using both a deterministic transition scheme (the default in this study) and a GC-preserving transversion scheme, with selected blocks also mutagenized by random scrambling. Embryos positive for H11 insertion and carrying tandem insertions are displayed unless indicated otherwise (see Methods for genotype definitions). White arrowheads indicate regions in which the results of alternative mutagenesis do not match those of transition mutagenesis (blocks marked with an asterisk). See Supplementary Note 2 for details. Related to Fig. 1.
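The three mutagenesis schemes compared in panel B can be sketched as per-block rewrites. This is a hypothetical illustration: it assumes 12 bp blocks tiled from the start of the enhancer and a seeded shuffle for scrambling, and it does not check whether a mutated block creates new TF binding motifs, the kind of gain-of-binding artefact that panel A tests for.

```python
import random
from typing import Optional

# Transition: purine <-> purine and pyrimidine <-> pyrimidine (A<->G, C<->T).
TRANSITION = str.maketrans("ACGTacgt", "GTACgtac")
# GC-preserving transversion: A<->T and C<->G, so every G/C stays G/C.
GC_TRANSVERSION = str.maketrans("ACGTacgt", "TGCAtgca")


def mutate_block(seq: str, block_index: int, scheme: str,
                 block_size: int = 12, rng: Optional[random.Random] = None) -> str:
    """Return a copy of seq with one 0-based block mutated by the chosen scheme."""
    start = block_index * block_size
    if not 0 <= start < len(seq):
        raise IndexError("block lies outside the sequence")
    end = min(start + block_size, len(seq))
    block = seq[start:end]
    if scheme == "transition":
        new_block = block.translate(TRANSITION)
    elif scheme == "gc_transversion":
        new_block = block.translate(GC_TRANSVERSION)
    elif scheme == "scramble":
        # Shuffling keeps base composition; it can occasionally return
        # the original order, which a real design would reject.
        bases = list(block)
        (rng or random.Random(0)).shuffle(bases)
        new_block = "".join(bases)
    else:
        raise ValueError(f"unknown scheme: {scheme}")
    return seq[:start] + new_block + seq[end:]


enhancer = "AAAACCCCGGGG" + "TTTTTTTTTTTT"  # toy 24 bp sequence, two blocks
print(mutate_block(enhancer, 0, "transition"))       # GGGGTTTTAAAATTTTTTTTTTTT
print(mutate_block(enhancer, 0, "gc_transversion"))  # TTTTGGGGCCCCTTTTTTTTTTTT
```

The transition rewrite is deterministic but changes GC content (A/T become G/C and vice versa), whereas the transversion variant keeps GC content fixed and scrambling keeps base composition fixed. Comparing them is one way to separate motif loss from composition effects.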
Extended Data Fig. 3 Conservation score normalization and analysis including flanking wild-type blocks.
(A) Conservation score boxplots by block mutagenesis result. Centre line, median; hinges, interquartile range (IQR); whiskers extend from each hinge to the most extreme value no further than 1.5 × IQR from the hinge. (B) Same as A, but coloured by enhancer, with a linear regression line. (C) Density of conservation scores, coloured by enhancer. Each dot in A and B is a 12 bp block (N = 167). Top panels use the raw mammalian conservation score (phyloP241); bottom panels use the raw score normalized to the median of the functional core of each enhancer. After median normalization, minor loss, major loss and gain blocks were each more conserved than either wild-type flanking blocks or all wild-type blocks combined. Only major loss blocks were more conserved than wild-type core blocks (FDR < 0.05; 9 comparisons, 7 significant; two-sided Mann-Whitney U-tests). (D) Correlation between the density of gnomAD variants and the fraction of functional blocks in the functional core (Pearson R² = 68%, p < 0.05, cor.test in R). Related to Fig. 1. See Supplementary Tables 1 and 3 for source data.
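The boxplot convention described in panel A (hinges at the quartiles, whiskers reaching the most extreme value within 1.5 × IQR of each hinge) can be reproduced with the Python standard library. This sketch assumes quartiles computed by linear interpolation (`statistics.quantiles` with `method="inclusive"`, the same rule as R's default type 7 quantiles); the input values are toy numbers, not data from the study.

```python
import statistics


def tukey_box(values):
    """Return (lower whisker, Q1, median, Q3, upper whisker) for a boxplot.

    Quartiles use linear interpolation ("inclusive" method); whiskers reach
    the most extreme data point within 1.5 * IQR of the nearer hinge.
    Points beyond the whiskers would be plotted as outliers.
    """
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_whisker = min(v for v in values if v >= q1 - 1.5 * iqr)
    upper_whisker = max(v for v in values if v <= q3 + 1.5 * iqr)
    return lower_whisker, q1, median, q3, upper_whisker


scores = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]  # toy conservation-like scores
print(tukey_box(scores))  # (1, 3.25, 5.5, 7.75, 9); 100 falls beyond the whisker
```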
Extended Data Fig. 4 Machine learning model selection and validation.
(A) Correlations between model predictions and in vivo results. Dots = mutagenized constructs. Black fit line is linear regression. R² is Spearman correlation. N = 35, 25, 35, 30, 26 (FL, NEU3, HT1, HT2, HT3). See Supplementary Data 2 for source data. (B) Remaining predicted motif disruptions. Related to Fig. 2.
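Panel A summarizes agreement between model predictions and in vivo results with a Spearman correlation. A minimal standard-library sketch of Spearman's ρ (the Pearson correlation of average ranks, so tied in vivo scores are handled) is below; the inputs are assumed to be paired per-construct values, and the function names and toy data are illustrative only.

```python
def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def pearson(x, y):
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences of at least 2 values")
    return pearson(average_ranks(x), average_ranks(y))


# Toy paired values (hypothetical): predicted log2 signal change vs in vivo score.
predicted = [-1.2, -0.8, -0.1, 0.0, 0.4]
observed = [0, 1, 1, 2, 3]
print(round(spearman(predicted, observed), 3))  # 0.975
```

Rank-based correlation suits this comparison because in vivo outcomes are ordinal categories rather than continuous measurements, so only the ordering of constructs is compared.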
Extended Data Fig. 5 Rejected best-fit machine learning models for enhancers NEU1 and NEU2.
(A) Correlations between model predictions and in vivo results. Dots = mutagenized constructs. Black fit line is linear regression. R² is Spearman correlation. N = 35, 45 (NEU2, NEU1). Asterisk = non-significant (FDR > 0.01). See Supplementary Data 2 for source data. (B) Final TF binding motif and activity map including verified binding motifs discovered through in silico saturation mutagenesis (NEU2 PITX/OTX2 site marked with asterisks). (C) Predicted motif disruptions. Note that validation of the GATA motif in blocks 19–20 did not succeed. (D) Discovery and validation of an additional PITX/OTX2 site in enhancer NEU2.
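In silico saturation mutagenesis, named in panel B, scores every possible single-nucleotide substitution with a model. The sketch below shows only the enumeration and the delta computation; `predict` is a hypothetical stand-in for a trained sequence-to-signal model such as the ChromBPNet models used here, whose real interface is not reproduced, and the toy GATA counter is purely illustrative.

```python
from typing import Callable, Dict, Iterator, Tuple

BASES = "ACGT"


def single_nucleotide_mutants(seq: str) -> Iterator[Tuple[int, str, str, str]]:
    """Yield (position, ref, alt, mutant); 3 * len(seq) mutants for an ACGT sequence."""
    for pos, ref in enumerate(seq):
        for alt in BASES:
            if alt != ref:
                yield pos, ref, alt, seq[:pos] + alt + seq[pos + 1:]


def saturation_scan(seq: str,
                    predict: Callable[[str], float]) -> Dict[Tuple[int, str, str], float]:
    """Predicted effect of each substitution: predict(mutant) - predict(wild type)."""
    wild_type = predict(seq)
    return {(pos, ref, alt): predict(mutant) - wild_type
            for pos, ref, alt, mutant in single_nucleotide_mutants(seq)}


# Toy stand-in for a trained model: counts exact GATA core matches.
def toy_model(s: str) -> float:
    return float(s.count("GATA"))


effects = saturation_scan("TTGATATT", toy_model)
disruptive = sorted({pos for (pos, _, _), delta in effects.items() if delta < 0})
print(disruptive)  # [2, 3, 4, 5]
```

Positions where many substitutions lower the prediction mark candidate binding sites; that pattern is what would then be tested with targeted mutations in vivo.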
Extended Data Fig. 6 ChromBPNet validation in other enhancers.
(A) Predictive (Ma 2024 medRxiv, Pampari 2025 bioRxiv) and post hoc (remaining examples) analysis of mutational impact predictions in enhancers tested in vivo (refs. 19,55–58). In all cases, a prediction consistent with the in vivo impact was obtained. (B) Model-predicted signal change and observed in vivo impact for 147 mutagenized ZRS constructs (PPV = 83%, sensitivity = 25%) and 137 mutagenized brain enhancer elements (PPV = 78%, sensitivity = 51%) (refs. 2,10). See Supplementary Note 4 for the same cutoff-based scoring method applied to block mutagenesis, at |log2FC| > 0.32 (25% signal increase).
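The PPV and sensitivity in panel B depend on calling a construct "predicted to have an effect" when its predicted change passes a cutoff. A minimal sketch, assuming the |log2FC| > 0.32 rule mentioned for Supplementary Note 4 and one binary in vivo outcome per construct; the function name and toy data are hypothetical.

```python
CUTOFF = 0.32  # |log2FC| threshold; 2 ** 0.32 is about 1.25


def ppv_and_sensitivity(predicted_log2fc, observed_effect, cutoff=CUTOFF):
    """Score binary calls from predicted log2 fold changes against in vivo outcomes.

    predicted_log2fc: model-predicted log2 signal change per mutant construct.
    observed_effect: True if that construct changed activity in vivo.
    A construct is called 'predicted to have an effect' when |log2FC| > cutoff.
    """
    calls = [abs(x) > cutoff for x in predicted_log2fc]
    pairs = list(zip(calls, observed_effect))
    tp = sum(1 for c, o in pairs if c and o)
    fp = sum(1 for c, o in pairs if c and not o)
    fn = sum(1 for c, o in pairs if not c and o)
    ppv = tp / (tp + fp) if tp + fp else float("nan")
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    return ppv, sensitivity


# Toy data (hypothetical), not values from the study.
ppv, sens = ppv_and_sensitivity([0.5, -0.4, 0.1, 0.0, 0.9],
                                [True, True, True, False, False])
print(round(ppv, 3), round(sens, 3))  # 0.667 0.667
```

Because the threshold is applied to the absolute value, it fires for a predicted increase of about 25% (2^0.32 ≈ 1.25) or a decrease of about 20% (2^−0.32 ≈ 0.80). Raising the cutoff generally trades sensitivity for PPV.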
Extended Data Fig. 7 Degenerate TFs and alternative models.
(A) Contribution score tracks for wild-type sequences and for in silico mutated constructs that were predicted both to increase the open chromatin signal by at least 25% (log2 fold change > 0.32) and to feature a novel cluster of high positive scores. Three of six discovered sites were validated experimentally. Two unverified sites that overlapped wild-type blocks were classified as false-positive predictions (MEF2 in HT2 and MEIS in HT3). (B) Validation of a double NFI site in blocks 16–17 of enhancer HT2, predicted by in silico saturation mutagenesis. Combined 1 bp mutations in the SRF site (T160C) and in the predicted double NFI site (C189T) led to a more pronounced loss of function than the SRF mutation alone. This validated the double NFI site and led to reassessment of block 16 as (at least) minor loss. Supplement to Fig. 3a. (C) Exploration of alternative models for enhancer FL. Block mutations overlapping the same binding motif show very similar activity impacts, with the exception of blocks 4 and 5 (see Extended Data Fig. 2 and Supplementary Note 2). (D) Example of total site count for enhancer HT3 (all functional blocks shown). Total site count = all predicted sites − predicted sites in wild-type blocks + blocks without site predictions (4 = 3 − 1 + 2 in this case).
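The total site count defined in panel D can be written as a small function. This sketch rests on an interpretation that the caption does not spell out: each predicted site is represented by the set of blocks it overlaps, "blocks without site predictions" is taken to mean functional blocks not overlapped by any predicted site, and a site counts as "in wild-type blocks" when it overlaps no functional block. All names and the toy block layout are hypothetical, not the actual HT3 data.

```python
def total_site_count(sites, functional_blocks):
    """Total site count = all predicted sites
                          - predicted sites in wild-type blocks
                          + functional blocks without site predictions.

    sites: mapping of site label -> set of block indices the site overlaps.
    functional_blocks: set of block indices whose mutation changed activity.
    Assumption: a site is 'in wild-type blocks' when none of its blocks is functional.
    """
    in_wild_type = sum(1 for blocks in sites.values() if not blocks & functional_blocks)
    covered = set().union(*sites.values())
    unexplained = len(functional_blocks - covered)
    return len(sites) - in_wild_type + unexplained


# Toy layout (hypothetical block indices) reproducing 4 = 3 - 1 + 2.
sites = {"site_a": {1}, "site_b": {3, 4}, "site_c": {7}}
print(total_site_count(sites, functional_blocks={1, 3, 5, 6}))  # 4
```

Counting each functional block that lacks a prediction as one additional site gives a conservative lower bound on the number of functional features, since such a block may contain more than one unrecognized site.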
Extended Data Fig. 8 Patterns of multi-tissue in vivo responses to mutations.
(A) Activity of single-block mutants of enhancer HT1, scored across four cardiac substructures. Flanking wild-type blocks not shown. (B) Activity of all mutated HT1 constructs, scored across four cardiac substructures, arranged by overall expression (Methods). (C) Activity of mutated FL constructs, scored across three branchial arches and arranged by structure-specific full loss of function. Only mutants with partial loss of function in one of the arches were included. OFT = outflow tract, LV = left ventricle, RV = right ventricle, atr. = atrium, (r) = random scrambling mutagenesis, (tv) = GC-content-preserving transversion mutagenesis, 1;11 = combinatorial mutagenesis of blocks 1 and 11, A190G = 1 bp A to G mutation at position 190. Arrowheads: black = gain of function, blue = minor loss, white = full loss.
Extended Data Fig. 9 Additional examples of multi-tissue responses to mutations.
(A) Illustration of paired block mutagenesis outcomes for all possible combinations of loss and gain mutations. Bars represent ranges of possible outcomes that would be classified as cumulative or non-cumulative. "Redundant" is a special case of non-cumulative in which combined mutagenesis of two blocks resulted in an outcome exactly as severe as the more severe of the two individual block outcomes. (B) Additional cumulative pair examples. (C) Remaining three non-cumulative pairs. White arrowheads indicate loss of function. Black arrowhead indicates gain of function.
Extended Data Fig. 10 Additional results of paired block mutagenesis.
(A) Illustration of paired block mutagenesis outcomes for all possible combinations of loss and gain mutations. Bars represent ranges of possible outcomes that would be classified as cumulative or non-cumulative. (B) Additional cumulative pair examples. (C) Remaining non-cumulative pairs. Combined mutagenesis of enhancer FL blocks 7 and 12 resulted in higher branchial arch 3 activity, while no change in activity in these structures was observed in constructs with single block mutations (see also hindbrain activity). Combined mutagenesis of enhancer FL blocks 12 and 18 resulted in lower activity in branchial arch 2 compared to constructs mutated in block 12 only, while mutation of block 18 in isolation did not appreciably change the activity of this structure (compare also hindbrain activity). White arrowheads highlight structures of interest. Related to Fig. 4.
Supplementary information
Supplementary Information
Supplementary Notes 1–5.
Supplementary Tables
Supplementary Tables 1–6.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Kosicki, M., Zhang, B., Hecht, V. et al. In vivo mapping of mutagenesis sensitivity of human enhancers. Nature 643, 839–846 (2025). https://doi.org/10.1038/s41586-025-09182-w
DOI: https://doi.org/10.1038/s41586-025-09182-w