Abstract
Exonic enhancers (EEs) occupy an under-appreciated niche in gene regulation. By integrating transcription factor binding, chromatin accessibility, and high-throughput enhancer-reporter assays, we demonstrate that many protein-coding exons possess enhancer activity across species. These candidate EEs (cEEs) exhibit characteristic epigenomic signatures, form long-range interactions with gene promoters, and can be altered by both nonsynonymous and synonymous variants. CRISPR-mediated inactivation confirmed that cEEs contribute to the cis-regulation of both host and distal genes. Through large-scale cancer genome analyses, we show that cEE mutations correlate with dysregulated target-gene expression and clinical outcomes, highlighting their potential relevance to disease. Evolutionary comparisons show that cEEs exhibit both strong sequence constraint and lineage-specific plasticity, suggesting that they serve ancient regulatory functions while also contributing to species divergence. Our findings expand the landscape of functional elements by establishing cEEs as a component of gene regulation, and reveal how coding regions can fulfil protein-coding and cis-regulatory roles simultaneously.
Data availability
Data supporting the findings of this study are available in a Zenodo repository (https://doi.org/10.5281/zenodo.17208730), organised by analysis block following the structure of the paper. All datasets used in this study are publicly available or have been deposited in appropriate repositories. Annotated coding exons and transcripts (human, mouse, fly, thale cress) were retrieved from UCSC and Ensembl; FANTOM5, modENCODE, and Arabidopsis TSS data are detailed in the Methods. ChIP-seq data (ReMap 2022), DNase-seq and ATAC-seq data (ENCODE, ChIP-Atlas, PlantRegMap), and STARR-seq catalogues (Supplementary Data 2) were integrated to identify cEEs. G-quadruplex sequencing data (GSE110582) and the newly generated EE STARR-seq dataset (GEO accession GSE292804) were also incorporated. Cancer mutation data were obtained from the TCGA PanCanAtlas. Genomic interactions (promoter capture Hi-C), eQTLs (GTEx v8), and ENCODE-rE2G mappings were used to define EE–gene relationships, while phyloP conservation scores (UCSC, PlantRegMap) and gene age classifications (Trigos et al.) further contextualised EE evolution. A UCSC Genome Browser track hub containing all exonic enhancers identified in this study is available in the UCSC public hubs (https://genome.ucsc.edu/cgi-bin/hgHubConnect) and as a public session (https://genome.ucsc.edu/cgi-bin/hgPublicSessions). Plasmids, reporter constructs, and CRISPRi guide sequences generated in this study are available from the corresponding author. Source data are provided with this paper.
Code availability
Code and bioinformatics environments are deposited on GitHub (https://github.com/benoitballester/ExonEnhancer) and Zenodo (https://doi.org/10.5281/zenodo.18255062). All data and code needed to replicate the whole study are publicly available.
References
Ong, C.-T. & Corces, V. G. Enhancer function: new insights into the regulation of tissue-specific gene expression. Nat. Rev. Genet. 12, 283–293 (2011).
Neznanov, N., Umezawa, A. & Oshima, R. G. A regulatory element within a coding exon modulates keratin 18 gene expression in transgenic mice. J. Biol. Chem. 272, 27549–27557 (1997).
Lampe, X. et al. An ultraconserved Hox-Pbx responsive element resides in the coding sequence of Hoxa2 and is active in rhombomere 4. Nucleic Acids Res. 36, 3214–3225 (2008).
Dong, X. et al. Exonic remnants of whole-genome duplication reveal cis-regulatory function of coding exons. Nucleic Acids Res. 38, 1071–1085 (2010).
Tümpel, S., Cambronero, F., Sims, C., Krumlauf, R. & Wiedemann, L. M. A regulatory module embedded in the coding region of Hoxa2 controls expression in rhombomere 2. Proc. Natl. Acad. Sci. USA 105, 20077–20082 (2008).
Lin, M. F. et al. Locating protein-coding sequences under selection for additional, overlapping functions in 29 mammalian genomes. Genome Res. 21, 1916–1928 (2011).
Stergachis, A. B. et al. Exonic transcription factor binding directs codon choice and impacts protein evolution. Science 342, 1367 (2013).
Birnbaum, R. Y. et al. Coding exons function as tissue-specific enhancers of nearby genes. Genome Res. 22, 1059–1068 (2012).
Birnbaum, R. Y. et al. Systematic dissection of coding exons at single nucleotide resolution supports an additional role in cell-specific transcriptional regulation. PLOS Genet. 10, e1004592 (2014).
Ritter, D. I., Dong, Z., Guo, S. & Chuang, J. H. Transcriptional enhancers in protein-coding exons of vertebrate developmental genes. PLoS ONE 7, e35202 (2012).
Chen, J. et al. Prevalent use and evolution of exonic regulatory sequences in the human genome. Nat. Sci. 3, e20220058 (2023).
Ahituv, N. Exonic enhancers: proceed with caution in exome and genome sequencing studies. Genome Med. 8, 14 (2016).
Hammal, F., de Langen, P., Bergon, A., Lopez, F. & Ballester, B. ReMap 2022: a database of human, mouse, Drosophila and Arabidopsis regulatory regions from an integrative analysis of DNA-binding sequencing experiments. Nucleic Acids Res. 50, D316–D325 (2022).
ENCODE Project Consortium et al. Expanded encyclopaedias of DNA elements in the human and mouse genomes. Nature 583, 699–710 (2020).
Meuleman, W. et al. Index and biological spectrum of human DNase I hypersensitive sites. Nature 584, 244–251 (2020).
Zou, Z., Ohta, T. & Oki, S. ChIP-Atlas 3.0: a data-mining suite to explore chromosome architecture together with large-scale regulome data. Nucleic Acids Res. 52, W45–W53 (2024).
Forrest, A. R. R. et al. A promoter-level mammalian expression atlas. Nature 507, 462–470 (2014).
Andersson, R. et al. An atlas of active enhancers across human cell types and tissues. Nature 507, 455–461 (2014).
Heintzman, N. D. et al. Distinct and predictive chromatin signatures of transcriptional promoters and enhancers in the human genome. Nat. Genet. 39, 311–318 (2007).
Pekowska, A., Benoukraf, T., Ferrier, P. & Spicuglia, S. A unique H3K4me2 profile marks tissue-specific gene regulation. Genome Res. 20, 1493–1502 (2010).
Thurman, R. E. et al. The accessible chromatin landscape of the human genome. Nature 489, 75–82 (2012).
Arnold, C. D. et al. Genome-wide quantitative enhancer activity maps identified by STARR-seq. Science 339, 1074–1077 (2013).
Mercer, T. R. et al. DNase I–hypersensitive exons colocalize with promoters and distal regulatory elements. Nat. Genet. 45, 852–859 (2013).
Varshney, D., Spiegel, J., Zyner, K., Tannahill, D. & Balasubramanian, S. The regulation and functions of DNA and RNA G-quadruplexes. Nat. Rev. Mol. Cell Biol. 21, 459–474 (2020).
Esnault, C. et al. G4access identifies G-quadruplexes and their associations with open chromatin and imprinting control regions. Nat. Genet. 55, 1359–1369 (2023).
Laverré, A., Tannier, E. & Necsulea, A. Long-range promoter–enhancer contacts are conserved during evolution and contribute to gene expression robustness. Genome Res. 32, 280–296 (2022).
Gschwind, A. R. et al. An encyclopedia of enhancer-gene regulatory interactions in the human genome. bioRxiv 11, 563812 (2023).
Karczewski, K. J. et al. The mutational constraint spectrum quantified from variation in 141,456 humans. Nature 581, 434–443 (2020).
Castro-Mondragon, J. A. et al. JASPAR 2022: the 9th release of the open-access database of transcription factor binding profiles. Nucleic Acids Res. 50, D165–D173 (2022).
Santiago-Algarra, D. et al. Epromoters function as a hub to recruit key transcription factors required for the inflammatory response. Nat. Commun. 12, 6660 (2021).
Lonsdale, J. et al. The genotype-tissue expression (GTEx) project. Nat. Genet. 45, 580–585 (2013).
Hoadley, K. A. et al. Cell-of-origin patterns dominate the molecular classification of 10,000 tumors from 33 types of cancer. Cell 173, 291–304.e6 (2018).
Schmidt, D. et al. Five-vertebrate ChIP-seq reveals the evolutionary dynamics of transcription factor binding. Science 328, 1036–1040 (2010).
Ballester, B. et al. Multi-species, multi-transcription factor binding highlights conserved control of tissue-specific biological pathways. eLife 3, e02626 (2014).
Perez, G. et al. The UCSC genome browser database: 2025 update. Nucleic Acids Res. 53, D1243–D1249 (2025).
Jumper, J. et al. Highly accurate protein structure prediction with AlphaFold. Nature 596, 583–589 (2021).
Piovesan, D. et al. MobiDB in 2025: integrating ensemble properties and function annotations for intrinsically disordered proteins. Nucleic Acids Res. 53, D495–D503 (2025).
Frankish, A. et al. GENCODE: reference annotation for the human and mouse genomes in 2023. Nucleic Acids Res. 51, D942–D949 (2023).
Howe, K. L. et al. Ensembl 2021. Nucleic Acids Res. 49, D884–D891 (2021).
modENCODE Consortium et al. Identification of functional elements and regulatory circuits by Drosophila modENCODE. Science 330, 1787–1797 (2010).
Bailey, M. H. et al. Comprehensive characterization of cancer driver genes and mutations. Cell 173, 371–385.e18 (2018).
Schon, M. A., Kellner, M. J., Plotnikova, A., Hofmann, F. & Nodine, M. D. NanoPARE: parallel analysis of RNA 5’ ends from low-input RNA. Genome Res. 28, 1931–1942 (2018).
Thieffry, A. et al. Characterization of Arabidopsis thaliana promoter bidirectionality and antisense RNAs by inactivation of nuclear RNA decay pathways. Plant Cell 32, 1845–1867 (2020).
Tian, F., Yang, D.-C., Meng, Y.-Q., Jin, J. & Gao, G. PlantRegMap: charting functional regulatory maps in plants. Nucleic Acids Res. 48, D1104–D1113 (2020).
de Langen, P. et al. Characterizing intergenic transcription at RNA polymerase II binding sites in normal and cancer tissues. Cell Genom. 3, 100411 (2023).
Ramírez, F., Dündar, F., Diehl, S., Grüning, B. A. & Manke, T. deepTools: a flexible platform for exploring deep-sequencing data. Nucleic Acids Res. 42, W187–W191 (2014).
Marsico, G. et al. Whole genome experimental maps of DNA G-quadruplexes in multiple species. Nucleic Acids Res. 47, 3862–3874 (2019).
Einson, J., Minaeva, M., Rafi, F. & Lappalainen, T. The impact of genetically controlled splicing on exon inclusion and protein structure. PLoS ONE 19, e0291960 (2024).
Rodriguez, J. M. et al. APPRIS: selecting functionally important isoforms. Nucleic Acids Res. 50, D54–D59 (2022).
GTEx Consortium. The GTEx Consortium atlas of genetic regulatory effects across human tissues. Science 369, 1318–1330 (2020).
Uhlen, M. et al. A genome-wide transcriptomic analysis of protein-coding genes in human blood cells. Science 366, eaax9198 (2019).
Uhlén, M. et al. Transcriptomics resources of human tissues and organs. Mol. Syst. Biol. 12, 862 (2016).
Steinhaus, R., Robinson, P. N. & Seelow, D. FABIAN-variant: predicting the effects of DNA variants on transcription factor binding. Nucleic Acids Res. 50, W322–W329 (2022).
Li, H. et al. The sequence alignment/map format and SAMtools. Bioinformatics 25, 2078–2079 (2009).
Long, E. et al. Massively parallel reporter assays and variant scoring identified functional variants and target genes for melanoma loci and highlighted cell-type specificity. Am. J. Hum. Genet. 109, 2210–2229 (2022).
Nei, M. & Gojobori, T. Simple methods for estimating the numbers of synonymous and nonsynonymous nucleotide substitutions. Mol. Biol. Evol. 3, 418–426 (1986).
Concordet, J.-P. & Haeussler, M. CRISPOR: intuitive guide selection for CRISPR/Cas9 genome editing experiments and screens. Nucleic Acids Res. 46, W242–W245 (2018).
Quinlan, A. R. & Hall, I. M. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26, 841–842 (2010).
Ellrott, K. et al. Scalable open science approach for mutation calling of tumor exomes using multiple genomic pipelines. Cell Syst. 6, 271–281.e7 (2018).
Muzellec, B., Teleńczuk, M., Cabeli, V. & Andreux, M. PyDESeq2: a Python package for bulk RNA-seq differential expression analysis. Bioinformatics 39, btad547 (2023).
Cheng, X. et al. cSurvival: a web resource for biomarker interactions in cancer outcomes and in cell lines. Brief. Bioinform. 23, bbac090 (2022).
Sollis, E. et al. The NHGRI-EBI GWAS catalog: knowledgebase and deposition resource. Nucleic Acids Res. 51, D977–D985 (2023).
1000 Genomes Project Consortium et al. A global reference for human genetic variation. Nature 526, 68–74 (2015).
Paten, B., Herrero, J., Beal, K., Fitzgerald, S. & Birney, E. Enredo and Pecan: genome-wide mammalian consistency-based multiple alignment with paralogs. Genome Res. 18, 1814–1828 (2008).
Crusoe, M. R. et al. The khmer software package: enabling efficient nucleotide sequence analysis. F1000Research 4, 900 (2015).
Trigos, A. S., Pearson, R. B., Papenfuss, A. T. & Goode, D. L. Altered interactions between unicellular and multicellular genes drive hallmarks of transformation in a diverse range of solid tumors. Proc. Natl. Acad. Sci. USA 114, 6406–6411 (2017).
Li, Y., Aguilar-Martinez, E. & Sharrocks, A. D. Geno2proteo, a tool for batch retrieval of DNA and protein sequences from any genomic or protein regions. J. Integr. Bioinform. 16, 20180090 (2019).
Corces, M. R. et al. The chromatin accessibility landscape of primary human cancers. Science 362, eaav1898 (2018).
Acknowledgements
The authors thank Robin Steinhaus for his assistance with fabian-tools and for raising the VCF query limit. We also thank Science/AAAS for granting permission to reproduce and modify the PanCanAtlas schema in Fig. 5c. This work was supported by a PhD fellowship awarded to J.-C.M. by the French Ministry of Higher Education and Research (MESR), and by the Institut National de la Santé et de la Recherche Médicale (INSERM), the Core Cluster of the Institut Français de Bioinformatique (IFB; ANR-11-INBS-0013), and the Agence Nationale de la Recherche (ANR; grant ANR-23-CE12-0008-01). A Marie Skłodowska-Curie Actions postdoctoral fellowship (Eprom-101065610) supported A.V.O. We acknowledge the contribution of the AniRA lentivector production facility of the CELPHEDIA Infrastructure and SFR Biosciences (UAR3444/CNRS, US8/Inserm, ENS de Lyon, UCBL), especially Gisèle Froment, Aurélie Thibaut and Caroline Costa. We thank the Marseille-Luminy cell biology platform for managing cell culture and Nori Sadouni of HL BIOPROCESS (Marseille, France) for STARR-seq preprocessing. The results presented here are based on data generated by the TCGA Research Network, the GTEx project, the ENCODE Consortium and its production laboratories, and independent laboratories that submitted raw ChIP-seq and other omics datasets to public repositories (GEO). We thank Andreas Zanzoni for a helpful discussion about protein disorder.
Author information
Contributions
B.B. conceived and supervised the project. J.-C.M. developed computational methods, curated ATAC-seq, DNase-seq, and STARR-seq datasets, and performed data analysis. M.T. and F.G. carried out luciferase reporter assays. I.M. and M.T. conducted CRISPRi experiments. A.V.O. performed STARR-seq assays and designed and selected CRISPRi guides. S.S. supervised STARR-seq and CRISPRi experiments. J.-C.M. and B.B. prepared the figures, and J.-C.M., S.S., and B.B. wrote the manuscript with input from all authors.
Ethics declarations
Competing interests
All authors declare no competing interests.
Peer review
Peer review information
Nature Communications thanks the anonymous reviewer(s) for their contribution to the peer review of this work. A peer review file is available.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Mouren, JC., Torres, M., van Ouwerkerk, A. et al. Exonic enhancers are a widespread class of dual-function regulatory elements. Nat Commun (2026). https://doi.org/10.1038/s41467-026-71220-6
DOI: https://doi.org/10.1038/s41467-026-71220-6


