Abstract
Genetic variants associated with colorectal cancer (CRC) are primarily noncoding and reside in cis-regulatory elements (CREs), yet their underlying mechanisms remain elusive. Here we established a dynamic epigenetic atlas using multiomics data from 533 colorectal tissues spanning normal to advanced adenoma to cancer, identifying 7,492 differential CREs linked to 5,490 target genes. High-throughput CRISPR interference screening revealed 265 functional CREs involved in CRC cell proliferation. A polygenic risk score (PRS) based on functional CRE variants effectively predicted CRC and precancerous lesions among 476,770 individuals. Notably, the functional variant rs10871066 was significantly associated with increased risk of precancerous lesions and CRC (odds ratio = 1.27, P = 1.03 × 10−13). Mechanistically, rs10871066 triggers silencer-to-enhancer switching mediated by FOXP1 and TCF7L2, distally upregulating KLF5 to activate oncogenic pathways and PIBF1 to suppress natural killer cell cytotoxicity. Our study provides a dynamic epigenomic atlas as a comprehensive resource, a functionally informed PRS for risk prediction and insights into the epigenetic mechanisms underlying CRC development.
Data availability
Sequencing data (ATAC-seq, H3K27ac ChIP-seq, RNA-seq and Hi-C-seq) generated in this study were deposited in the Genome Sequence Archive at the BIG Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences (accession number HRA006866). The raw data are available under controlled access to comply with data protection regulations. Access can be requested by completing the application form through the Genome Sequence Archive Human System and is granted by the corresponding Data Access Committee. Additional guidance can be found on the Genome Sequence Archive Human System website (https://ngdc.cncb.ac.cn/gsa-human/document/GSA-Human_Request_Guide_for_Users_us.pdf). Responses to requests are expected within 15 working days. The primary and processed data used to generate the analyses presented here are available online. Summary-level data for the full set of Asian and European GWASs are available through Biobank Japan (http://jenger.riken.jp/en/result), UK Biobank (https://github.com/Wittelab/pancancer_pleiotropy), CORECT (phs001499.v1.p1 and phs001856.v1.p1), and FinnGen (https://www.finngen.fi/en/access_results). For individual-level data, GECCO and PLCO data were deposited in dbGaP (accession numbers phs001315.v1.p1, phs001415.v1.p1 and phs001078.v1.p1; phs000346.v2.p2, phs001554.v2.p1, phs001286.v4.p2 and phs001524.v3.p2). Individual-level data from the UK Biobank will be made available subject to standard institutional agreements. Genotype data from the UK Biobank (http://www.ukbiobank.ac.uk) were obtained under application number 94939. We used the GRCh37 reference genome for data analysis (https://ftp.ensembl.org/pub/grch37/release-87/fasta/homo_sapiens/dna/). Epigenomic data were obtained from TCGA (https://gdc.cancer.gov/about-data/publications/ATACseq-AWG) and ENCODE (https://www.encodeproject.org/).
Regulatory element datasets for CRC were obtained from EnhancerAtlas (http://enhanceratlas.org/), FANTOM5 (https://fantom.gsc.riken.jp/5/) and Li et al. (ref. 15). Loss-of-function knockout screens and RNAi screens covering thousands of genes in CRC cell lines were obtained from DepMap (https://depmap.org/portal/download/all/). CRC-relevant expression data were obtained from the NCBI Gene Expression Omnibus under accession numbers GSE14333 and GSE103512. RNA-seq profiles (level 3) and genotype data (level 2) for CRC were obtained from the TCGA data portal (https://portal.gdc.cancer.gov). The datasets and analysis results are described in the Supplementary Information. Source data are provided with this paper.
Code availability
The code used for data analysis and figure generation in this study was deposited to GitHub (https://github.com/Miaolab2025/Functional_CRE).
References
Sung, H. et al. Global Cancer Statistics 2020: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J. Clin. 71, 209–249 (2021).
Fernandez-Rozadilla, C. et al. Deciphering colorectal cancer genetics through multi-omic analysis of 100,204 cases and 154,587 controls of European and east Asian ancestries. Nat. Genet. 55, 89–99 (2023).
Law, P. J. et al. Association analyses identify 31 new risk loci for colorectal cancer susceptibility. Nat. Commun. 10, 2154 (2019).
Hindorff, L. A. et al. Potential etiologic and functional implications of genome-wide association loci for human diseases and traits. Proc. Natl Acad. Sci. USA 106, 9362–9367 (2009).
Maurano, M. T. et al. Large-scale identification of sequence variants influencing human transcription factor occupancy in vivo. Nat. Genet. 47, 1393–1401 (2015).
Maurano, M. T. et al. Systematic localization of common disease-associated variation in regulatory DNA. Science 337, 1190–1195 (2012).
Tian, J. et al. Risk SNP-mediated enhancer–promoter interaction drives colorectal cancer through both FADS2 and AP002754.2. Cancer Res. 80, 1804–1818 (2020).
Chatterjee, S. et al. Enhancer variants synergistically drive dysfunction of a gene regulatory network in Hirschsprung disease. Cell 167, 355–368 (2016).
Zhang, Z. et al. Genetic analyses support the contribution of mRNA N6-methyladenosine (m6A) modification to human disease heritability. Nat. Genet. 52, 939–949 (2020).
Tian, J. et al. N6-methyladenosine mRNA methylation of PIK3CB regulates AKT signalling to promote PTEN-deficient pancreatic cancer progression. Gut 69, 2180–2192 (2020).
Noonan, J. P. & McCallion, A. S. Genomics of long-range regulatory elements. Annu. Rev. Genomics Hum. Genet. 11, 1–23 (2010).
He, Y. et al. Spatiotemporal DNA methylome dynamics of the developing mouse fetus. Nature 583, 752–759 (2020).
The ENCODE Project Consortium. An integrated encyclopedia of DNA elements in the human genome. Nature 489, 57–74 (2012).
Lizio, M. et al. Update of the FANTOM web resource: expansion to provide additional transcriptome atlases. Nucleic Acids Res. 47, D752–D758 (2019).
Li, Q. L. et al. Genome-wide profiling in colorectal cancer identifies PHF19 and TBC1D16 as oncogenic super enhancers. Nat. Commun. 12, 6407 (2021).
Bergeron, B. P. et al. Epigenomic profiling of glucocorticoid responses identifies cis-regulatory disruptions impacting steroid resistance in childhood acute lymphoblastic leukemia. Leukemia 36, 2374–2383 (2022).
Ahn, J. H. et al. Phase separation drives aberrant chromatin looping and cancer development. Nature 595, 591–595 (2021).
Strum, W. B. Colorectal adenomas. N. Engl. J. Med. 374, 1065–1075 (2016).
Fodde, R., Smits, R. & Clevers, H. APC, signal transduction and genetic instability in colorectal cancer. Nat. Rev. Cancer 1, 55–67 (2001).
Leslie, A., Carey, F. A., Pratt, N. R. & Steele, R. J. The colorectal adenoma–carcinoma sequence. Br. J. Surg. 89, 845–860 (2002).
Larson, M. H. et al. CRISPR interference (CRISPRi) for sequence-specific control of gene expression. Nat. Protoc. 8, 2180–2196 (2013).
Alerasool, N., Segal, D., Lee, H. & Taipale, M. An efficient KRAB domain for CRISPRi applications in human cells. Nat. Methods 17, 1093–1096 (2020).
Ahmed, M. et al. CRISPRi screens reveal a DNA methylation-mediated 3D genome dependent causal mechanism in prostate cancer. Nat. Commun. 12, 1781 (2021).
Fulco, C. P. et al. Systematic mapping of functional enhancer–promoter connections with CRISPR interference. Science 354, 769–773 (2016).
Fulco, C. P. et al. Activity-by-contact model of enhancer–promoter regulation from thousands of CRISPR perturbations. Nat. Genet. 51, 1664–1669 (2019).
Li, K. et al. Noncoding variants connect enhancer dysregulation with nuclear receptor signaling in hematopoietic malignancies. Cancer Discov. 10, 724–745 (2020).
Pomerantz, M. M. et al. The 8q24 cancer risk variant rs6983267 shows long-range interaction with MYC in colorectal cancer. Nat. Genet. 41, 882–884 (2009).
Ogawa, R. et al. Loss of SMAD4 promotes colorectal cancer progression by recruiting tumor-associated neutrophils via the CXCL1/8–CXCR2 axis. Clin. Cancer Res. 25, 2887–2899 (2019).
Zhang, B. et al. Large-scale genetic study in East Asians identifies six new loci associated with colorectal cancer risk. Nat. Genet. 46, 533–542 (2014).
Wu, M. et al. Spatiotemporal transcriptomic changes of human ovarian aging and the regulatory role of FOXP1. Nat. Aging 4, 527–545 (2024).
De Silva, P. et al. FOXP1 negatively regulates tumor infiltrating lymphocyte migration in human breast cancer. EBioMedicine 39, 226–238 (2019).
Meyers, R. M. et al. Computational correction of copy number effect improves specificity of CRISPR–Cas9 essentiality screens in cancer cells. Nat. Genet. 49, 1779–1784 (2017).
Jiang, Y. Y. et al. TP63, SOX2, and KLF5 establish a core regulatory circuitry that controls epigenetic and transcription patterns in esophageal squamous cell carcinoma cell lines. Gastroenterology 159, 1311–1327 (2020).
Huang, B. et al. Interleukin-33-induced expression of PIBF1 by decidual B cells protects against preterm labor. Nat. Med. 23, 128–135 (2017).
Li, X. et al. PIBF1 regulates multiple gene expression via impeding long-range chromatin interaction to drive the malignant transformation of HPV16 integration epithelial cells. J. Adv. Res. 57, 163–180 (2023).
Kyurkchiev, D. et al. Cells isolated from human glioblastoma multiforme express progesterone-induced blocking factor (PIBF). Cell. Mol. Neurobiol. 34, 479–489 (2014).
González-Arenas, A., Valadez-Cosmes, P., Jiménez-Arellano, C., López-Sánchez, M. & Camacho-Arroyo, I. Progesterone-induced blocking factor is hormonally regulated in human astrocytoma cells, and increases their growth through the IL-4R/JAK1/STAT6 pathway. J. Steroid Biochem. Mol. Biol. 144 Pt B, 463–470 (2014).
Sheffer, M. et al. Genome-scale screens identify factors regulating tumor cell responses to natural killer cells. Nat. Genet. 53, 1196–1206 (2021).
Sordo-Bahamonde, C., Lorenzo-Herrero, S., Payer, Á. R., Gonzalez, S. & López-Soto, A. Mechanisms of apoptosis resistance to NK cell-mediated cytotoxicity in cancer. Int. J. Mol. Sci. 21, 3726 (2020).
Perdigoto, C. N. Epigenetic cancer evolution, one cell at a time. Nat. Rev. Genet. 20, 434–435 (2019).
Rodriguez-Martin, B. et al. Pan-cancer analysis of whole genomes identifies driver rearrangements promoted by LINE-1 retrotransposition. Nat. Genet. 52, 306–319 (2020).
Cohen, A. J. et al. Hotspots of aberrant enhancer activity punctuate the colorectal cancer epigenome. Nat. Commun. 8, 14400 (2017).
Jing, Z. et al. NCAPD3 enhances Warburg effect through c-myc and E2F1 and promotes the occurrence and progression of colorectal cancer. J. Exp. Clin. Cancer Res. 41, 198 (2022).
LaFave, L. M. et al. Epigenomic state transitions characterize tumor progression in mouse lung adenocarcinoma. Cancer Cell 38, 212–228 (2020).
Phillips, R. E., Soshnev, A. A. & Allis, C. D. Epigenomic reprogramming as a driver of malignant glioma. Cancer Cell 38, 647–660 (2020).
Zhao, J. et al. Tumor-derived CXCL5 promotes human colorectal cancer metastasis through activation of the ERK/Elk-1/Snail and AKT/GSK3β/β-catenin pathways. Mol. Cancer 16, 70 (2017).
Zhao, G. et al. DDX39B drives colorectal cancer progression by promoting the stability and nuclear translocation of PKM2. Signal Transduct. Target. Ther. 7, 275 (2022).
Zhu, Z. et al. ELK4 promotes colorectal cancer progression by activating the neoangiogenic factor LRG1 in a noncanonical SP1/3-dependent manner. Adv. Sci. 10, e2303378 (2023).
Ying, Y. et al. Zinc finger protein 280C contributes to colorectal tumorigenesis by maintaining epigenetic repression at H3K27me3-marked loci. Proc. Natl Acad. Sci. USA 119, e2120633119 (2022).
Choi, S. W., Mak, T. S. & O’Reilly, P. F. Tutorial: a guide to performing polygenic risk score analyses. Nat. Protoc. 15, 2759–2772 (2020).
Amariuta, T. et al. Improving the trans-ancestry portability of polygenic risk scores by prioritizing variants in predicted cell-type-specific regulatory elements. Nat. Genet. 52, 1346–1354 (2020).
Gerstein, M. B. et al. Architecture of the human regulatory network derived from ENCODE data. Nature 489, 91–100 (2012).
Ernst, J. et al. Mapping and analysis of chromatin state dynamics in nine human cell types. Nature 473, 43–49 (2011).
Xie, W. et al. Epigenomic analysis of multilineage differentiation of human embryonic stem cells. Cell 153, 1134–1148 (2013).
Erceg, J. et al. Dual functionality of cis-regulatory elements as developmental enhancers and Polycomb response elements. Genes Dev. 31, 590–602 (2017).
Hua, J. T. et al. Risk SNP-mediated promoter–enhancer switching drives prostate cancer through lncRNA PCAT19. Cell 174, 564–575 (2018).
Wei, R. et al. Ketogenesis attenuates KLF5-dependent production of CXCL12 to overcome the immunosuppressive tumor microenvironment in colorectal cancer. Cancer Res. 82, 1575–1588 (2022).
Xu, Z. et al. CCL7 and TGF-β secreted by MSCs play opposite roles in regulating CRC metastasis in a KLF5/CXCL5-dependent manner. Mol. Ther. 30, 2327–2341 (2022).
Kozma, N. et al. Progesterone-induced blocking factor activates STAT6 via binding to a novel IL-4 receptor. J. Immunol. 176, 819–826 (2006).
Hara, T. et al. Interactions between cancer cells and immune cells drive transitions to mesenchymal-like states in glioblastoma. Cancer Cell 39, 779–792 (2021).
Zhang, M. et al. Genetic control of alternative splicing and its distinct role in colorectal cancer mechanisms. Gastroenterology 165, 1151–1167 (2023).
Cai, Y. et al. An atlas of genetic effects on cellular composition of the tumor microenvironment. Nat. Immunol. 25, 1959–1975 (2024).
Lu, Y. et al. Large-scale genome-wide association study of East Asians identifies loci associated with risk for colorectal cancer. Gastroenterology 156, 1455–1466 (2019).
Houlston, R. S. et al. Meta-analysis of three genome-wide association studies identifies susceptibility loci for colorectal cancer at 1q41, 3q26.2, 12q13.13 and 20q13.33. Nat. Genet. 42, 973–977 (2010).
Li, B. et al. Genetic variants that impact alternative polyadenylation in cancer represent candidate causal risk loci. Cancer Res. 83, 3650–3666 (2023).
Gong, J. et al. PancanQTL: systematic identification of cis-eQTLs and trans-eQTLs in 33 cancer types. Nucleic Acids Res. 46, D971–D976 (2018).
Gong, J. et al. Pancan-meQTL: a database to systematically evaluate the effects of genetic variants on methylation in human cancer. Nucleic Acids Res. 47, D1066–D1072 (2019).
Tian, J. et al. CancerSplicingQTL: a database for genome-wide identification of splicing QTLs in human cancer. Nucleic Acids Res. 47, D909–D916 (2019).
Concordet, J. P. & Haeussler, M. CRISPOR: intuitive guide selection for CRISPR/Cas9 genome editing experiments and screens. Nucleic Acids Res. 46, W242–W245 (2018).
Yang, X. et al. Functional characterization of Alzheimer’s disease genetic variants in microglia. Nat. Genet. 55, 1735–1744 (2023).
Schmidt, R. et al. CRISPR activation and interference screens decode stimulation responses in primary human T cells. Science 375, eabj4008 (2022).
Li, W. et al. MAGeCK enables robust identification of essential genes from genome-scale CRISPR/Cas9 knockout screens. Genome Biol. 15, 554 (2014).
Tian, J. et al. Systematic functional interrogation of genes in GWAS loci identified ATF1 as a key driver in colorectal cancer modulated by a promoter–enhancer interaction. Am. J. Hum. Genet. 105, 29–47 (2019).
Chen, C. et al. Single-cell eQTL mapping reveals cell subtype-specific genetic control and mechanism in malignant transformation of colorectal cancer. Cancer Discov. https://doi.org/10.1158/2159-8290.CD-24-1561 (2025).
Acknowledgements
We thank the many participants, their families and the referring physicians who took part in these studies. This work was supported by the Key Program of the National Natural Science Foundation of China (NSFC-82130098, to X.M.), the National Science Fund for Excellent Young Scholars (NSFC-82322058, to J.T.), the National Science Fund for Distinguished Young Scholars of China (NSFC-81925032, to X.M.), the Fundamental Research Funds for the Central Universities (2042025kf0027, to J.T.), the Noncommunicable Chronic Diseases National Science and Technology Major Project (2023ZD0501400, to J.T.), the National Key R&D Program of China (2022YFA0806601 and 2024YFC3405804, to X.M.), the Program of the National Natural Science Foundation of China (NSFC-82273713, to J.T.), the Leading Talent Program of the Health Commission of Hubei Province, the Knowledge Innovation Program of Wuhan (2023020201010060, to X.M.), the Fundamental Research Funds for the Central Universities (2042022rc0026 and 2042024kf1007, to X.M.), the Young Elite Scientists Sponsorship Program by CAST (2022QNRC001, to J.T.), the National Science Fund for Distinguished Young Scholars of Hubei Province of China (2023AFA046, to J.T.), the Knowledge Innovation Program of Wuhan (2023020201010073, to J.T.), the National Key R&D Program of China (2024YFC3405803, to Y.Z.), the Program of the National Natural Science Foundation of China (NSFC-82373663, to Y.Z.), the Natural Science Foundation of Hubei Province of China (2024AFB777, to Y.Z.), the Program of the Health Commission of Hubei Province (WJ2023M045, to Y.Z.) and the Knowledge Innovation Program of Wuhan (2023020201020244, to Y.Z.). The funders had no role in study design, data collection and analysis, decision to publish or preparation of the manuscript. The numerical calculations in this paper were performed on the supercomputing system at the Supercomputing Center of Wuhan University.
Author information
Contributions
J.T. and X.M. were the overall principal investigators who conceptualized the study and obtained financial support. Y.Z., J.T. and X.M. were responsible for the study design and supervised the entire study. Z.L. and C.C. performed statistical analyses, interpreted the results and drafted the initial paper. Z.L., C.C., H.Z. and B.L. revised the paper. K.S., J.G. and R.X. assisted with data analysis. Y. Liu, Q.M., K.H., M.Z., Y.C., J.H., H.G., L.F., C.N., Y. Li, S.C., Meng J. and W.T. performed laboratory experiments. H.L., X.Y., C.H., Y.W., X.Z., X.L., Z.X., M.C., X.W., S.Z., H.C., M.D., Mingjuan J. and K.C. performed data curation and investigation. J.T. and X.M. were responsible for participant recruitment and sample preparation. All authors approved the final report for publication.
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Cancer thanks Ronald Arnold and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data
Extended Data Fig. 1 Epigenomic reprogramming accompanies the development of cancer.
a. Overview of the 130 individuals with paired colorectal samples used for the omics measurements (CRC, n = 110; adenoma, n = 20). Clinical characteristics, ATAC-seq, H3K27ac ChIP-seq, RNA-seq and Hi-C are colored blue, brown, purple, red and light blue, respectively. For each omics dataset, panels show, from top to bottom, normal, advanced adenoma and cancer tissues. b. Principal component analysis (PCA) of each omics dataset showed that colorectal cancer tissues were largely distinguishable from advanced adenoma and normal tissues. c. Genomic distribution of ATAC-seq peaks, H3K27ac ChIP-seq peaks and CREs in colorectal normal, advanced adenoma and cancer tissues. d. Correlations between H3K27ac and ATAC signals for each CRE across multistage tissues. Two-sided P values were calculated by Pearson correlation analysis. e. Circle maps showing the H3K27ac, ATAC and RNA-seq signals of CREs in colorectal normal, advanced adenoma and cancer tissues. f-l. DNA motifs of TFs obtained from the JASPAR database, and enrichment of each TF motif among C1 (f), C2 (g), C3 (h), C4 (i), C5 (j), C6 (k) and C7 (l) CREs. P values were calculated by a two-sided hypergeometric test using HOMER.
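The motif enrichment in panels f-l rests on a hypergeometric test. As a minimal sketch (not HOMER's own implementation, whose background selection and other details are not reproduced here), the probability of seeing at least k motif-containing sequences among n target CREs, when K of all N sequences carry the motif, can be computed exactly from binomial coefficients. This is the one-sided upper tail, the natural form for an enrichment test; the two-sided variant named in the legend is not shown. The function names and the toy counts are hypothetical.

```python
from math import comb


def hypergeom_upper_tail(k: int, n_target: int, K: int, N: int) -> float:
    """Return P(X >= k) for X ~ Hypergeometric(N, K, n_target).

    N: total sequences (target + background)
    K: sequences carrying the motif
    n_target: target sequences (e.g. CREs of one type)
    k: target sequences carrying the motif
    """
    if not (0 <= k <= n_target <= N and 0 <= K <= N):
        raise ValueError("inconsistent counts")
    # comb() returns 0 when the lower index exceeds the upper one, so
    # impossible tables contribute nothing to the sum.
    hits = sum(comb(K, i) * comb(N - K, n_target - i)
               for i in range(k, min(K, n_target) + 1))
    return hits / comb(N, n_target)


def fold_enrichment(k: int, n_target: int, K: int, N: int) -> float:
    """Motif frequency among targets divided by its frequency overall."""
    return (k / n_target) / (K / N)


# Hypothetical counts: 40 of 200 CREs of one type carry a motif that occurs
# in 500 of 10,000 sequences overall.
print(fold_enrichment(40, 200, 500, 10_000), hypergeom_upper_tail(40, 200, 500, 10_000))
```

Exact integer arithmetic avoids underflow in the tail even for tens of thousands of sequences, at the cost of speed; a log-space implementation would be the usual choice at genome scale.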
Extended Data Fig. 2 Functional characterization for differential CREs.
a. Circle maps showing H3K27ac ChIP-seq, ATAC-seq and RNA-seq signals of gradually gained CREs. b. Circle maps showing H3K27ac ChIP-seq, ATAC-seq and RNA-seq signals of gradually lost CREs. c. Box plots show the expression level and chromatin activity of MYC; scatter plots show the correlation between chromatin activity and MYC expression in colorectal normal (n = 110, left), advanced adenoma (n = 30, middle) and cancer tissues (n = 110, right). d. Box plots show the expression level and chromatin activity of SMAD4; scatter plots show the correlation between chromatin activity and SMAD4 expression in colorectal normal (n = 110, left), advanced adenoma (n = 30, middle) and cancer tissues (n = 110, right). e. Probability density of the distance between differential CREs and their corresponding TSSs. Each dot (red, H3K27ac ChIP signal; blue, ATAC signal) indicates a CRE plotted according to its distance to the corresponding TSS and its statistical significance in differential analysis (two-sided Wald test with Benjamini-Hochberg correction). f. ORs for enrichment of CREs within previously identified regulatory elements compared with non-CREs. g. ORs for enrichment of CREs in CRC GWAS loci (LD r2 ≥ 0.2 with the index variant) compared with non-CREs. h. ORs for enrichment of recurrent CREs in CRC GWAS signals (LD r2 ≥ 0.2 with the index variant) compared with non-CREs. i. Pie charts show the proportion (red) of CRC GWAS loci overlapping CRE variants from colorectal normal, advanced adenoma and cancer tissues. j-m. Venn diagrams showing the overlap of eQTLs (j), sQTLs (k), apaQTLs (l) and meQTLs (m) with variants in CREs. For c and d, boxes indicate medians and interquartile ranges (IQRs) and whiskers indicate the minimum and maximum values; two-sided P values were calculated by one-way ANOVA and Pearson correlation analysis.
For f-h, each dot represents the OR and bars indicate 95% CIs; P values were calculated by two-sided Fisher's exact test.
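Each OR in panels f-h compares CREs with non-CREs in a 2x2 table. A minimal sketch, assuming rows are CRE/non-CRE and columns are inside/outside the annotation (our labeling; the legend does not spell out the table): the OR with a Woolf (log-scale) 95% CI, which is one common interval choice and not necessarily the one used here, and a two-sided Fisher's exact P value that sums the probabilities of all tables with the same margins that are no more likely than the observed one. Function names are hypothetical.

```python
import math
from math import comb


def odds_ratio_woolf(a: int, b: int, c: int, d: int, z: float = 1.959964):
    """OR and Woolf (log-scale) 95% CI for the 2x2 table [[a, b], [c, d]].

    Rows: CRE / non-CRE; columns: inside / outside the annotation (our labeling).
    All cells must be positive; no continuity correction is applied.
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all cells must be positive for the Woolf interval")
    or_ = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of log(OR)
    return or_, math.exp(math.log(or_) - z * se), math.exp(math.log(or_) + z * se)


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact P: total probability of all tables with the
    observed margins whose probability is <= that of the observed table."""
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability that the top-left cell equals x.
        return comb(row1, x) * comb(n - row1, col1 - x) / denom

    p_obs = prob(a)
    lo_x, hi_x = max(0, col1 - (n - row1)), min(row1, col1)
    # A small relative tolerance keeps floating-point ties in the sum.
    total = sum(prob(x) for x in range(lo_x, hi_x + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, total)


print(odds_ratio_woolf(8, 2, 1, 5), fisher_exact_two_sided(8, 2, 1, 5))
```

The "sum of tables no more likely than observed" rule is the usual definition of the two-sided Fisher P value; other two-sided conventions (e.g. doubling the smaller tail) give different numbers for asymmetric tables.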
Extended Data Fig. 3 Functional annotations for CRE genes and gradually changed CRE genes.
a-c. GO analysis of CRE target genes in normal (a), advanced adenoma (b) and cancer (c) tissues. Significant terms in biological process (BP), molecular function (MF) and cellular component (CC) are shown. d. Differentially expressed genes across the three stages were classified into nine patterns (C1-C9). The x axis indicates the pattern and the y axis the number of genes in each pattern. e. GO analysis of target genes of the 7,492 differential CREs. Significant terms in biological process (BP), molecular function (MF) and cellular component (CC) are shown. f. DNA motifs of TFs obtained from the JASPAR database, and enrichment of each TF motif among gradually changed CREs. Two-sided P values were calculated by hypergeometric test using HOMER. g. −log10 P values (x axis) and ORs (y axis) from enrichment analysis of gradually changed CREs within binding sites of each TF (N = 60) from the ENCODE database. The dashed blue lines indicate OR = 1 and P = 0.05/60 = 8.33 × 10−4 (Bonferroni-corrected threshold for the 60 TFs tested). h. ORs for enrichment of gradually changed CREs among regulatory elements compared with non-CREs. i. KEGG pathway enrichment of target genes of gradually changed CREs. Circle color represents the significance of enrichment and circle size the number of CRE genes in each pathway. j. GO analysis of target genes of gradually changed CREs. Terms were ranked by P values from a two-sided hypergeometric test. k. Proportion of target genes of gradually changed CREs associated with immune cell infiltration estimated by EPIC in TIMER. l. Representative correlations between the expression of target genes of gradually changed CREs and the infiltration of immune cells (Pearson correlation, two-tailed test). Circle color represents the correlation coefficient and circle size the significance.
P values were calculated by two-tailed Fisher's exact test (g, h). Each dot represents the OR and bars indicate 95% CIs (h).
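The threshold in panel g is a Bonferroni correction: the family-wise level 0.05 is divided by the 60 TFs tested, giving 8.33 × 10−4 per test. A minimal sketch of that filter, assuming results arrive as a mapping from TF name to (OR, P value); the TF names and values below are invented for illustration.

```python
def bonferroni_hits(results, alpha=0.05):
    """Return (threshold, enriched TFs) under a Bonferroni correction.

    results: dict mapping a TF name to (odds_ratio, p_value). The per-test
    threshold is alpha / m, with m the number of TFs tested; a TF is kept
    when its P value is below that threshold and its OR exceeds 1.
    """
    if not results:
        raise ValueError("no tests supplied")
    threshold = alpha / len(results)
    hits = sorted(tf for tf, (odds_ratio, p) in results.items()
                  if p < threshold and odds_ratio > 1)
    return threshold, hits


# Toy input: 60 hypothetical TFs, three of them with OR > 1 and small P values.
toy = {f"TF_{i}": (1.0, 0.5) for i in range(60)}
toy["TF_0"] = (2.1, 1e-6)
toy["TF_1"] = (1.4, 2e-4)
toy["TF_2"] = (1.8, 1e-3)  # nominally significant, but above 0.05 / 60
threshold, hits = bonferroni_hits(toy)
print(f"{threshold:.3g}", hits)  # prints: 0.000833 ['TF_0', 'TF_1']
```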
Extended Data Fig. 4 High-throughput CRISPRi screening identified functional CREs essential for CRC cell proliferation and polygenic risk stratification of CRC and precancerous lesions.
a. Relative mRNA expression of Cas9 in different cell lines. Data are shown as the median; box limits indicate upper and lower quartiles and whiskers indicate the maximum and minimum; n = 9 biologically independent samples. P values were calculated by unpaired two-sided Student's t-test. b. Relative protein expression of Cas9 in different cell lines. c-d. Correlation of sgRNA counts between replicates at two time points in SW480 (c) and HCT116 (d) cells. The x and y axes denote the sgRNA counts of replicates at day 0 and day 16. Two-sided P values were calculated by Pearson correlation analysis. e-f. Correlation of epigenomic signals with the negative RRA score in SW480 (e) and HCT116 (f) cells. Two-sided P values were estimated by Pearson correlation test. Box color denotes the correlation coefficient. g. Number of functional CREs across the three stages in each of the seven CRE types (C1-C7). h. HRs for incident CRC in each PRS group compared with the bottom 20% under PRS323 in the UK Biobank CRC cohort (n = 380,525). i-j. HRs for incident non-advanced adenoma (i) and advanced neoplasm (j) in each PRS group compared with the bottom 20% under PRS323 in the PLCO cohort (n = 14,071). k. HRs for incident CRC in each PRS group compared with the bottom 20% under PRS224 in the UK Biobank CRC cohort (n = 380,525). l-m. HRs for incident non-advanced adenoma (l) and advanced neoplasm (m) in each PRS group compared with the bottom 20% under PRS224 in the PLCO cohort (n = 14,071). Participants were divided into low-risk (bottom 20%), intermediate-risk (middle 60%) and high-risk (top 20%) groups. Two-sided P values and HRs for CRC and precancerous lesions were calculated using Cox proportional hazards regression models (h-m).
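Panels h-m compare individuals grouped by PRS percentile. A minimal sketch of the scoring and grouping steps, assuming the conventional additive PRS (each variant weighted by its per-allele log odds ratio and multiplied by the effect-allele dosage). The legends do not give the weighting scheme behind PRS323 or PRS224, treating the abstract's OR of 1.27 for rs10871066 as a per-allele OR is our assumption, and "rs_other" is a hypothetical variant. The Cox regression that yields the HRs is not covered.

```python
import math
from typing import Dict, List, Sequence


def weights_from_odds_ratios(odds_ratios: Dict[str, float]) -> Dict[str, float]:
    """Per-allele log odds ratios, used as additive PRS weights."""
    return {snp: math.log(or_) for snp, or_ in odds_ratios.items()}


def polygenic_score(dosages: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum of effect-allele dosages (0-2). A missing variant raises
    KeyError; imputing missing genotypes is outside this sketch."""
    return sum(w * dosages[snp] for snp, w in weights.items())


def risk_groups(scores: Sequence[float]) -> List[str]:
    """Label individuals low (bottom 20%), intermediate (middle 60%) or
    high (top 20%) by rank in the score distribution (ties broken by order)."""
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    labels = [""] * n
    for rank, i in enumerate(order):
        frac = rank / n  # share of the cohort ranked below this individual
        labels[i] = "low" if frac < 0.2 else "high" if frac >= 0.8 else "intermediate"
    return labels


# rs10871066 with the OR reported in the abstract; "rs_other" is hypothetical.
w = weights_from_odds_ratios({"rs10871066": 1.27, "rs_other": 0.90})
print(polygenic_score({"rs10871066": 2, "rs_other": 1}, w))
print(risk_groups([0.3, 1.2, 0.8, 0.1, 0.5, 0.9, 1.5, 0.2, 0.7, 1.1]))
```

Grouping by rank rather than by fixed score cut-offs makes the 20/60/20 split cohort-specific, which matches the way the legend defines the groups.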
Extended Data Fig. 5 The functional CRE variant rs10871066 regulates PIBF1 and KLF5 expression by interacting with FOXP1 and TCF7L2.
a. Target genes of CRE1 predicted by the ABC model in the three stages. b. Two-dimensional heatmap of normalized Hi-C interaction frequencies in small bowel, showing the potential interaction between rs10871066 and its target genes (PIBF1, KLF5) in the CRE1 region, from the 3D Genome browser (GRCh37/hg19; http://kobic.kr/3divv1/intro). Genomic tracks represent significant loops. c. Box plots show the expression levels (left: PIBF1; middle: KLF5) and the corresponding chromatin activity; two-sided P values were calculated by one-way ANOVA (top). Scatter plots show the correlation between chromatin activity and the expression of KLF5 and PIBF1 in colorectal normal (n = 110, left), advanced adenoma (n = 30, middle) and cancer tissues (n = 110, right); correlations were calculated by two-sided Pearson correlation. d. eQTL analyses of the association between rs10871066 genotypes and the expression of PIBF1 and KLF5 in CRC samples from TCGA (nAA = 55, nAG = 111, nGG = 159); two-sided P values were calculated by linear regression analysis. e. Relative luciferase activity of pGL3-promoter vectors containing the rs10871066[A] or rs10871066[G] allele in HCT116 cells; n = 9 biologically independent samples. P values were calculated by unpaired two-sided Student's t-test. f-g. Footprint profiles of FOXP1 (f) and TCF7L2 (g) at ATAC-seq peaks across all three stages of CRC. h. EMSAs with biotin-labeled probes containing rs10871066 in HCT116 cells. Arrows indicate allele-specific bands formed by binding of nuclear proteins. i. Supershift EMSA using FOXP1 or TCF7L2 antibody in HCT116 cells. ‘I’ marks the allele-specific binding band and ‘II’ the supershifted band. j-k. DNase I footprinting assay. DNA fragments containing either rs10871066[A] (j) or rs10871066[G] (k) were labeled with 5’-FAM and incubated with different concentrations of FOXP1 or TCF7L2.
A black dotted box marks the region protected by the protein from DNase I cleavage. The DNA sequence is shown at the bottom, and the sequence marked in blue is the protein-binding motif. Data are shown as the median; box limits indicate upper and lower quartiles and whiskers indicate the maximum and minimum (c-e).
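Panel d tests each gene with a linear regression of expression on genotype. A minimal sketch, assuming an additive model in which genotype is coded as a dosage (0, 1 or 2 copies of one allele) and no covariates are included; the legend does not state the coding or any covariates, and counting G alleles below is an arbitrary choice on our part. The function returns the slope, its standard error and the t statistic; the two-sided P value would come from a t distribution with n − 2 degrees of freedom (for example, 2 * scipy.stats.t.sf(abs(t), n - 2)).

```python
import math
from typing import Sequence, Tuple

# Arbitrary coding for this sketch: number of G alleles.
GENOTYPE_TO_DOSAGE = {"AA": 0, "AG": 1, "GG": 2}


def additive_eqtl(dosage: Sequence[float], expression: Sequence[float]) -> Tuple[float, float, float]:
    """Ordinary least squares fit of expression ~ b0 + b1 * dosage.

    Returns (b1, standard error of b1, t statistic with n - 2 df).
    """
    n = len(dosage)
    if n != len(expression) or n < 3:
        raise ValueError("need paired observations, n >= 3")
    mx, my = sum(dosage) / n, sum(expression) / n
    sxx = sum((x - mx) ** 2 for x in dosage)
    if sxx == 0:
        raise ValueError("genotype is monomorphic in this sample")
    sxy = sum((x - mx) * (y - my) for x, y in zip(dosage, expression))
    b1 = sxy / sxx
    b0 = my - b1 * mx
    sse = sum((y - b0 - b1 * x) ** 2 for x, y in zip(dosage, expression))
    se_b1 = math.sqrt(sse / (n - 2) / sxx)
    if se_b1 == 0:  # perfect fit: t is undefined (b1 = 0) or infinite
        t = math.nan if b1 == 0 else math.copysign(math.inf, b1)
    else:
        t = b1 / se_b1
    return b1, se_b1, t


# Toy data (not from the study): genotypes and normalized expression values.
genotypes = ["AA", "AG", "GG", "AA", "AG", "GG"]
expr = [1.0, 2.0, 3.0, 2.0, 3.0, 4.0]
print(additive_eqtl([GENOTYPE_TO_DOSAGE[g] for g in genotypes], expr))
```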
Extended Data Fig. 6 FOXP1 and TCF7L2 preferentially bind to the rs10871066[A] and rs10871066[G] alleles to affect the expression of PIBF1 and KLF5, respectively.
a. Effect of FOXP1 or TCF7L2 overexpression on the relative luciferase activity of vectors containing the rs10871066[A] or rs10871066[G] allele in HCT116 cells. Cells were first transfected with plasmids carrying varying doses of TCF7L2 or FOXP1 cDNA and then with luciferase plasmids carrying the rs10871066[A] or [G] allele. Results were normalized to the group co-transfected with pcDNA3.1 and the rs10871066[A] luciferase plasmid. n = 9 biologically independent samples. b. Correlations between TCF7L2 and KLF5 expression and between FOXP1 and PIBF1 expression in the GSE14333 cohort (n = 290). c-d. Scatter plots show the correlations between PIBF1 and FOXP1 expression (top) and between KLF5 and TCF7L2 expression (bottom), stratified by rs10871066 genotype, in our CRC samples (nAA+AG = 103, nGG = 51) (c) and TCGA CRC samples (nAA+AG = 166, nGG = 159) (d). e-f. Expression of TCF7L2 and FOXP1 in tumor and normal tissues from our CRC samples (nTumor = 154, nNormal = 154) (e) and GSE103512 (nTumor = 57, nNormal = 38) (f). g. Genome-wide CRISPR–Cas9 loss-of-function screens indicate that FOXP1 (left) suppresses, whereas TCF7L2 (right) promotes, CRC growth. h-i. qRT-PCR validation of TCF7L2 (h) and FOXP1 (i) overexpression in SW480 (top) and HCT116 (bottom) cells. n = 9 biologically independent samples per group. j-k. qRT-PCR validation of TCF7L2 (j) and FOXP1 (k) knockdown in CRISPR-edited SW480 (top) and HCT116 (bottom) cells. n = 9 biologically independent samples per group. Data are shown as the median; box limits indicate upper and lower quartiles and whiskers indicate the maximum and minimum (a, f). P values were calculated by unpaired two-sided Student's t-test (a, e, f).
Two-sided P values and correlation coefficients were calculated by Pearson correlation analysis (b-d). Data are presented as the mean ± SD; P values were calculated by unpaired two-sided Student's t-test (h-k).
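Panel a normalizes every group to the pcDNA3.1 + rs10871066[A] group and compares groups with an unpaired Student's t-test. A minimal sketch of those two steps, assuming each replicate value is already an internally normalized reporter readout (for example a firefly/Renilla ratio; the legend does not name the internal control). Group names and values are hypothetical.

```python
import math
from statistics import mean, stdev
from typing import Dict, List, Sequence


def relative_activity(groups: Dict[str, Sequence[float]], reference: str) -> Dict[str, List[float]]:
    """Express every replicate relative to the mean of the reference group (= 1)."""
    ref_mean = mean(groups[reference])
    return {name: [v / ref_mean for v in values] for name, values in groups.items()}


def student_t(x: Sequence[float], y: Sequence[float]) -> float:
    """Unpaired Student's t statistic with pooled variance (equal variances assumed);
    df = len(x) + len(y) - 2."""
    nx, ny = len(x), len(y)
    pooled = ((nx - 1) * stdev(x) ** 2 + (ny - 1) * stdev(y) ** 2) / (nx + ny - 2)
    if pooled == 0:
        raise ValueError("zero pooled variance")
    return (mean(x) - mean(y)) / math.sqrt(pooled * (1 / nx + 1 / ny))


# Hypothetical readouts (already reporter/control ratios), three replicates each.
groups = {
    "pcDNA3.1 + A": [1.9, 2.1, 2.0],  # reference group
    "pcDNA3.1 + G": [2.9, 3.2, 3.0],
}
rel = relative_activity(groups, "pcDNA3.1 + A")
print(rel, student_t(rel["pcDNA3.1 + G"], rel["pcDNA3.1 + A"]))
```

Because the reference mean is a single constant, rescaling does not change the t statistic; normalization only puts all groups on a common axis.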
Extended Data Fig. 7 KLF5 levels in CRC cell lines and correlations of PIBF1 or FOXP1 expression with immune infiltration.
a. Representative H&E staining and immunohistochemical analysis (Ki67 and KLF5) of mouse tumors derived from lentivirus-transduced SW480 cells (left) and HCT116 cells (right) (n = 5 per group). b-c. qRT-PCR validation of KLF5 overexpression (b) and knockdown (c) in SW480 cells (top) and HCT116 cells (bottom). Data are presented as the mean ± SD; n = 9 biologically independent samples. P values were calculated by unpaired two-sided Student's t-test. d-f. Western blot validation of KLF5 overexpression (d) and knockdown (e-f) in SW480 cells (top) and HCT116 cells (bottom). g-h. Correlations between PIBF1 or FOXP1 expression and immune cell infiltration in the TCGA database (g, n = 325) and the GSE14333 dataset (h, n = 290). i. Scatter plots show the correlations of PIBF1 (top) and FOXP1 (bottom) expression with CCL4, CCL5, FCGR3A and CD69 expression in TCGA CRC samples (n = 325). j. Scatter plots show the correlations of PIBF1 (top) and FOXP1 (bottom) expression with CCL4, CCL5, FCGR3A and CD69 expression in GSE14333 CRC samples (n = 290). For g-j, two-sided P values and correlation coefficients were calculated by Pearson correlation analysis. Circle color represents the correlation coefficient and circle size the significance of the correlation (g, h).
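Panels i and j correlate one gene with a panel of NK-related markers. A minimal sketch of Pearson's r and its t statistic (t = r·√((n − 2)/(1 − r²)), with n − 2 degrees of freedom, from which the two-sided P value follows), applied to one gene against several markers. The marker names come from the legend; the expression values are toy numbers, and the function names are hypothetical.

```python
import math
from typing import Dict, Sequence, Tuple


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two paired samples."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need paired observations, n >= 3")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a variable has zero variance")
    return sxy / math.sqrt(sxx * syy)


def correlate_with_markers(gene: Sequence[float],
                           markers: Dict[str, Sequence[float]]) -> Dict[str, Tuple[float, float]]:
    """Map each marker to (r, t) for its correlation with `gene`."""
    n = len(gene)
    out = {}
    for name, values in markers.items():
        r = pearson_r(gene, values)
        t = r * math.sqrt((n - 2) / (1 - r * r)) if abs(r) < 1 else math.copysign(math.inf, r)
        out[name] = (r, t)
    return out


# Toy values for one gene (e.g. PIBF1) and two markers.
print(correlate_with_markers([1.0, 2.0, 3.0, 4.0],
                             {"CCL5": [1.0, 3.0, 2.0, 4.0], "CD69": [4.0, 3.0, 2.0, 1.0]}))
```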
Extended Data Fig. 8 PIBF1 promotes CRC pathogenesis by suppressing perforin release from NK cells.
a-b. qRT-PCR validation of PIBF1 overexpression (a) and knockdown (b) in SW480 (top) and HCT116 (bottom) cells. n = 9 biologically independent samples per group. c-d. Western blot validation of PIBF1 overexpression (c) and knockdown (d) in SW480 (top) and HCT116 (bottom) cells. e-f. The effect of PIBF1 overexpression (e) and knockdown (f) on cell proliferation and colony formation ability of HCT116 cells. n = 4 biologically independent samples per group for cell proliferation and n = 9 for colony formation. g-h. The effect of PIBF1 overexpression (g) and knockdown (h) on tumor cell lysis (%) of HCT116 cells co-cultured with NK cells at NK:HCT116 ratios of 1:1, 2:1 and 5:1. n = 9 biologically independent samples per group. i. Representative images and growth curves of PIBF1-knockout MC38 tumors in C57BL/6 mice. n = 5 per group. j. Representative H&E staining and immunohistochemical analysis (Ki67 and PIBF1). k-l. The effect of PIBF1 overexpression (k) and knockdown (l) on the proportion of intracellular perforin in NK cells (left) and the perforin level in the supernatant (right) after co-culture with HCT116 cells. n = 9 biologically independent samples per group. m. The effect of PIBF1 knockout on the proportion of intracellular perforin in NK cells (left) and the perforin level in the supernatant (right) from MC38 tumors. n = 5 biologically independent samples. n. Immunofluorescent staining showing colocalization of perforin (green) and NK1.1 (red) in MC38 tumors (n = 5 per group). Scale bars, 20 μm and 100 μm. Cell proliferation results are shown as the mean ± SEM (e, top; f, left). Colony formation ability is presented relative to control cells (set to 100%) (e, bottom; f, right). Data are shown as the mean ± SD (a-b; e, bottom; f, right; i; k-m). In g and h, data are shown as the median; box limits indicate the upper and lower quartiles and whiskers indicate the maximum and minimum. P values were calculated by unpaired two-sided Student's t-test (a-b, e-i, k-m).
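Several panels (for example g-h of Extended Data Fig. 8) use box plots whose whiskers span the full data range rather than the common 1.5 × IQR rule. A minimal sketch of that five-number summary follows. The quartile interpolation method ('inclusive', i.e. linear interpolation) is an assumption, since the legends do not state one, and the lysis values are toy numbers rather than study data.

```python
from statistics import quantiles


def box_summary(values):
    """Five-number summary for a box plot with min/max whiskers.

    Box limits are the lower and upper quartiles, the centre line is the
    median, and whiskers extend to the minimum and maximum (no outlier rule).
    Quartiles use linear ('inclusive') interpolation, an assumption here.
    """
    data = sorted(values)
    if len(data) < 2:
        raise ValueError("need at least two observations")
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    return {"min": data[0], "q1": q1, "median": median, "q3": q3, "max": data[-1]}


# Toy tumor cell lysis (%) values for one hypothetical group (n = 9).
lysis = [12, 18, 15, 20, 25, 22, 17, 14, 19]
print(box_summary(lysis))
```

In matplotlib, passing whis=(0, 100) to Axes.boxplot draws whiskers at the minimum and maximum, matching this convention; the default whis=1.5 would instead draw Tukey-style whiskers and plot points beyond them as outliers.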
Extended Data Fig. 9 The effect of CRE1 perturbation on CRC development in vitro and in vivo.
a-b. KLF5 (a) and PIBF1 (b) expression following CRE1 perturbation in SW480 and HCT116 cells, detected by qRT-PCR. Data are presented as the mean ± SD; n = 9 biologically independent samples. P values were calculated by unpaired two-sided Student's t-test. c-d. KLF5 (c) and PIBF1 (d) protein expression detected by western blot. e. The effect of CRE1 perturbation on cell proliferation (left) and colony formation ability (right) of SW480 and HCT116 cells. Cell proliferation results are shown as the mean ± SEM; n = 4 biologically independent samples. Colony formation ability is presented relative to control cells (set to 100%) as the mean ± SD; n = 9 biologically independent samples. P values were calculated by unpaired two-sided Student's t-test. f. Representative images and growth curves of tumors derived from SW480 and HCT116 cells with CRE1 perturbation in BALB/c nude mice. Results are shown as the mean ± SD (n = 5 per group). P values were calculated by unpaired two-sided Student's t-test. g. Identification of candidate downstream effector genes of CRE1 and KLF5 by integrating ChIP-seq and RNA-seq data from SW480 cells. n = 3 biologically independent samples. h. GO analysis of the candidate genes regulated by CRE1 and KLF5. Significant terms in the biological process (BP), molecular function (MF) and cellular component (CC) categories are shown, ranked by two-sided hypergeometric test P values. i. KEGG pathway enrichment analysis of candidate effector genes regulated by CRE1 and KLF5. Circle color represents the significance of enrichment and circle size denotes the number of genes in each pathway. j. Gene set enrichment analysis (GSEA) of candidate effector genes regulated by CRE1 and KLF5.
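Panel h ranks GO terms by two-sided hypergeometric P values. A minimal sketch of such a test is below. It assumes the "minimum-likelihood" definition of two-sidedness (summing the probabilities of every overlap no more likely than the observed one, the convention used by R's fisher.test), which the legend does not specify; the upper-tail P value, which tests over-representation only, is returned alongside for comparison. Gene counts and term names are hypothetical.

```python
from math import comb


def hypergeom_tests(k, n_pop, n_term, n_hits):
    """Hypergeometric P values for observing k term genes among n_hits candidates.

    n_pop: background genes; n_term: background genes annotated to the term;
    n_hits: candidate genes tested. Returns (upper_tail_p, two_sided_p).
    """
    lo = max(0, n_hits - (n_pop - n_term))
    hi = min(n_hits, n_term)
    if not lo <= k <= hi:
        raise ValueError("k is outside the support of the distribution")
    # Exact integer weights proportional to P(X = i), so no float tolerance is needed.
    weights = {i: comb(n_term, i) * comb(n_pop - n_term, n_hits - i)
               for i in range(lo, hi + 1)}
    total = comb(n_pop, n_hits)  # equals sum(weights.values()) by Vandermonde's identity
    upper = sum(w for i, w in weights.items() if i >= k) / total
    two_sided = sum(w for w in weights.values() if w <= weights[k]) / total
    return upper, two_sided


def rank_terms(terms, n_pop, n_hits):
    """Sort terms by two-sided P value; terms maps name -> (k, n_term)."""
    scored = {name: hypergeom_tests(k, n_pop, n_term, n_hits)[1]
              for name, (k, n_term) in terms.items()}
    return sorted(scored.items(), key=lambda item: item[1])


# Hypothetical background of 10 genes, 5 candidates and two toy terms.
print(rank_terms({"term_A": (5, 5), "term_B": (3, 5)}, n_pop=10, n_hits=5))
```

With many terms, a multiple-testing correction (such as Benjamini-Hochberg) is usually applied before ranking; the legend does not say whether raw or adjusted P values were used.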
Extended Data Fig. 10 CRE1 promotes CRC development by restraining the cytotoxicity of NK cells.
a-b. Validation of PIBF1 overexpression in MC38 cells by qPCR (a) and western blot (b). For qPCR assays, n = 9 biologically independent samples per group. c-d. The effect of CRE1 perturbation on PIBF1 expression in MC38 cells, detected by qPCR (c) and western blot (d). For qPCR assays, n = 9 biologically independent samples per group. e-f. The effect of CRE1 perturbation on proliferation (left) and colony formation ability (right) of SW480 (e) and HCT116 (f) cells after co-culture with NK cells. n = 4 biologically independent samples for proliferation and n = 9 for colony formation. g-h. The effect of CRE1 repression on tumor cell lysis (%) of SW480 (g) and HCT116 (h) cells after co-culture with NK cells. n = 9 biologically independent samples. i-j. The effect of CRE1 perturbation on the proportion of intracellular perforin in NK cells (left) and the perforin level in the supernatant (right) after co-culture with SW480 (i) or HCT116 (j) cells. n = 9 biologically independent samples. k-l. The effect of perforin (3 μg/mL for 24 hours) on proliferation and colony formation ability of SW480 (k) and HCT116 (l) cells compared with DMSO controls. n = 4 biologically independent samples for proliferation and n = 9 for colony formation. m. Representative images and growth curves of tumors derived from MC38 cells with CRE1 perturbation in C57BL/6 mice. n = 5 per group. n. The effect of CRE1 perturbation on the proportion of intracellular perforin in NK cells (left) and the perforin level in the supernatant (right) from MC38 tumors. n = 5 per group. Data are shown as the mean ± SD (a, c, g-j, m-n). In e-f and k-l, cell proliferation results are shown as the mean ± SEM and colony formation ability is presented relative to control cells (set to 100%) as the mean ± SD. P values were calculated by unpaired two-sided Student's t-test (a, c, e-n).
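These legends repeatedly report colony formation relative to control (set to 100%), distinguish mean ± SD from mean ± SEM, and use unpaired two-sided Student's t-tests. The sketch below shows those three calculations on toy numbers. It assumes normalization to the mean of the control group and the classic pooled-variance (equal-variance) Student's t statistic; the two-sided P value could then be obtained with scipy.stats.ttest_ind, whose default equal_var=True gives the same statistic. Function names and counts are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev, variance


def relative_to_control(values, control):
    """Express each value as a percentage of the control-group mean (control = 100%)."""
    ref = mean(control)
    return [100 * v / ref for v in values]


def mean_sd_sem(values):
    """Mean, sample SD (n - 1 denominator) and SEM = SD / sqrt(n)."""
    sd = stdev(values)
    return mean(values), sd, sd / sqrt(len(values))


def student_t(a, b):
    """Unpaired Student's t statistic with pooled variance; df = n_a + n_b - 2."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    t = (mean(a) - mean(b)) / sqrt(pooled * (1 / na + 1 / nb))
    return t, na + nb - 2


# Toy colony counts (not study data).
control_counts = [100, 110, 90]
treated_counts = [50, 60, 40]
control_pct = relative_to_control(control_counts, control_counts)
treated_pct = relative_to_control(treated_counts, control_counts)
print(mean_sd_sem(treated_pct))
print(student_t(control_pct, treated_pct))
```

Because the SEM shrinks with the square root of n, proliferation curves (mean ± SEM, n = 4) and colony formation bars (mean ± SD, n = 9) in the same panel describe different kinds of spread.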
Supplementary information
Supplementary Information
Supplementary Figs. 1–4.
Supplementary Tables
Supplementary Tables 1–43.
Source data
Source Data Fig. 1
Statistical source data.
Source Data Fig. 2
Statistical source data.
Source Data Fig. 3
Statistical source data.
Source Data Fig. 4
Statistical source data.
Source Data Fig. 5
Statistical source data.
Source Data Fig. 5
Unprocessed gels.
Source Data Fig. 6
Statistical source data.
Source Data Fig. 6
Unprocessed gels.
Source Data Fig. 7
Statistical source data.
Source Data Fig. 8
Statistical source data.
Source Data Extended Data Fig. 1
Statistical source data.
Source Data Extended Data Fig. 2
Statistical source data.
Source Data Extended Data Fig. 3
Statistical source data.
Source Data Extended Data Fig. 4
Statistical source data.
Source Data Extended Data Fig. 4
Unprocessed western blots.
Source Data Extended Data Fig. 5
Statistical source data.
Source Data Extended Data Fig. 5
Unprocessed western blots.
Source Data Extended Data Fig. 6
Statistical source data.
Source Data Extended Data Fig. 7
Statistical source data.
Source Data Extended Data Fig. 7
Unprocessed western blots.
Source Data Extended Data Fig. 8
Statistical source data.
Source Data Extended Data Fig. 8
Unprocessed western blots.
Source Data Extended Data Fig. 9
Statistical source data.
Source Data Extended Data Fig. 9
Unprocessed western blots.
Source Data Extended Data Fig. 10
Statistical source data.
Source Data Extended Data Fig. 10
Unprocessed western blots.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Lu, Z., Chen, C., Zhang, H. et al. Characterization of cis-regulatory elements and functional variants in colorectal cancer using epigenomics and CRISPRi screenings. Nat Cancer (2025). https://doi.org/10.1038/s43018-025-01031-z
DOI: https://doi.org/10.1038/s43018-025-01031-z