Characterization of cis-regulatory elements and functional variants in colorectal cancer using epigenomics and CRISPRi screenings

Abstract

Genetic variants associated with colorectal cancer (CRC) are primarily noncoding and reside in cis-regulatory elements (CREs), yet their underlying mechanisms remain elusive. Here we established a dynamic epigenetic atlas using multiomics data from 533 colorectal tissues spanning normal to advanced adenoma to cancer, identifying 7,492 differential CREs linked to 5,490 target genes. High-throughput CRISPR interference screening revealed 265 functional CREs involved in CRC cell proliferation. A polygenic risk score (PRS) based on functional CRE variants effectively predicted CRC and precancerous lesions among 476,770 individuals. Notably, the functional variant rs10871066 was significantly associated with increased risk of precancerous lesions and CRC (odds ratio = 1.27, P = 1.03 × 10⁻¹³). Mechanistically, rs10871066 triggers silencer-to-enhancer switching mediated by FOXP1 and TCF7L2, distally upregulating KLF5 to activate oncogenic pathways and PIBF1 to suppress natural killer cell cytotoxicity. Our study provides a comprehensive dynamic epigenomic atlas as a resource, a functionally informed PRS for risk prediction and insights into epigenetic mechanisms underlying CRC development.

Fig. 1: Genome-wide atlas of CREs across CRC dynamic continuum.
Fig. 2: Functional characterization of differential CREs and target gene profiles.
Fig. 3: High-throughput CRISPRi screening identified functional CREs essential for CRC cell proliferation.
Fig. 4: Effect of PRS models on the risk predictions for CRC and precancerous lesions.
Fig. 5: The functional CRE variant rs10871066 facilitates enhancer–silencer switching to modulate PIBF1 and KLF5 expression mediated by FOXP1 and TCF7L2.
Fig. 6: Direct effects of rs10871066 on PIBF1 and KLF5 expression and CRC cell proliferation by long-range chromatin interactions.
Fig. 7: KLF5 acts as a potential oncogene to promote CRC development in vitro and in vivo.
Fig. 8: PIBF1-induced activation of CRC pathogenesis by restraining the cytotoxicity of NK cells.

Data availability

Sequencing data (ATAC-seq, H3K27ac ChIP-seq, RNA-seq and Hi-C-seq) generated in this study were deposited to the Genome Sequence Archive at the BIG Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences (accession number HRA006866). The raw data are available under controlled access to comply with data protection regulations. Access is requested by completing the application form through the Genome Sequence Archive Human System and is granted by the corresponding Data Access Committee. Additional guidance can be found on the Genome Sequence Archive Human System website (https://ngdc.cncb.ac.cn/gsa-human/document/GSA-Human_Request_Guide_for_Users_us.pdf). A response to requests is expected within 15 working days. The primary and processed data used to generate the analyses presented here are available online. Summary-level data for the full set of Asian and European GWASs are available through the Biobank Japan (http://jenger.riken.jp/en/result), UK Biobank (https://github.com/Wittelab/pancancer_pleiotropy), CORECT (phs001499.v1.p1 and phs001856.v1.p1), and FinnGen (https://www.finngen.fi/en/access_results). For individual-level data, GECCO and PLCO were deposited to dbGaP (accession numbers phs001315.v1.p1, phs001415.v1.p1 and phs001078.v1.p1; phs000346.v2.p2, phs001554.v2.p1, phs001286.v4.p2 and phs001524.v3.p2). Individual-level data from the UK Biobank will be made available subject to standard institutional agreements. Genotype data from UK Biobank (http://www.ukbiobank.ac.uk) were obtained under application number 94939. We used the GRCh37 reference genome for data analysis (https://ftp.ensembl.org/pub/grch37/release-87/fasta/homo_sapiens/dna/). The epigenomic data were obtained from TCGA (https://gdc.cancer.gov/about-data/publications/ATACseq-AWG) and ENCODE (https://www.encodeproject.org/). Regulatory element datasets for CRC were obtained from EnhancerAtlas (http://enhanceratlas.org/), FANTOM5 (https://fantom.gsc.riken.jp/5/) and Li et al. (ref. 15). The loss-of-function knockout screens and RNAi screen of thousands of genes for CRC cells were obtained from DepMap (https://depmap.org/portal/download/all/). CRC-relevant expression data were obtained from the NCBI Gene Expression Omnibus database under accession numbers GSE14333 and GSE103512. RNA-seq profiles (level 3) and genotype data (level 2) of CRC were obtained from the TCGA data portal (https://portal.gdc.cancer.gov). The datasets and analysis results are described in the Supplementary Information. Source data are provided with this paper.

Code availability

The code used for data analysis and figure generation in this study was deposited to GitHub (https://github.com/Miaolab2025/Functional_CRE).

References

  1. Sung, H. et al. Global Cancer Statistics 2020: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J. Clin. 71, 209–249 (2021).

  2. Fernandez-Rozadilla, C. et al. Deciphering colorectal cancer genetics through multi-omic analysis of 100,204 cases and 154,587 controls of European and east Asian ancestries. Nat. Genet. 55, 89–99 (2023).

  3. Law, P. J. et al. Association analyses identify 31 new risk loci for colorectal cancer susceptibility. Nat. Commun. 10, 2154 (2019).

  4. Hindorff, L. A. et al. Potential etiologic and functional implications of genome-wide association loci for human diseases and traits. Proc. Natl Acad. Sci. USA 106, 9362–9367 (2009).

  5. Maurano, M. T. et al. Large-scale identification of sequence variants influencing human transcription factor occupancy in vivo. Nat. Genet. 47, 1393–1401 (2015).

  6. Maurano, M. T. et al. Systematic localization of common disease-associated variation in regulatory DNA. Science 337, 1190–1195 (2012).

  7. Tian, J. et al. Risk SNP-mediated enhancer–promoter interaction drives colorectal cancer through both FADS2 and AP002754.2. Cancer Res. 80, 1804–1818 (2020).

  8. Chatterjee, S. et al. Enhancer variants synergistically drive dysfunction of a gene regulatory network in Hirschsprung disease. Cell 167, 355–368 (2016).

  9. Zhang, Z. et al. Genetic analyses support the contribution of mRNA N6-methyladenosine (m6A) modification to human disease heritability. Nat. Genet. 52, 939–949 (2020).

  10. Tian, J. et al. N(6)-methyladenosine mRNA methylation of PIK3CB regulates AKT signalling to promote PTEN-deficient pancreatic cancer progression. Gut 69, 2180–2192 (2020).

  11. Noonan, J. P. & McCallion, A. S. Genomics of long-range regulatory elements. Annu. Rev. Genomics Hum. Genet. 11, 1–23 (2010).

  12. He, Y. et al. Spatiotemporal DNA methylome dynamics of the developing mouse fetus. Nature 583, 752–759 (2020).

  13. The ENCODE Project Consortium. An integrated encyclopedia of DNA elements in the human genome. Nature 489, 57–74 (2012).

  14. Lizio, M. et al. Update of the FANTOM web resource: expansion to provide additional transcriptome atlases. Nucleic Acids Res. 47, D752–D758 (2019).

  15. Li, Q. L. et al. Genome-wide profiling in colorectal cancer identifies PHF19 and TBC1D16 as oncogenic super enhancers. Nat. Commun. 12, 6407 (2021).

  16. Bergeron, B. P. et al. Epigenomic profiling of glucocorticoid responses identifies cis-regulatory disruptions impacting steroid resistance in childhood acute lymphoblastic leukemia. Leukemia 36, 2374–2383 (2022).

  17. Ahn, J. H. et al. Phase separation drives aberrant chromatin looping and cancer development. Nature 595, 591–595 (2021).

  18. Strum, W. B. Colorectal adenomas. N. Engl. J. Med. 374, 1065–1075 (2016).

  19. Fodde, R., Smits, R. & Clevers, H. APC, signal transduction and genetic instability in colorectal cancer. Nat. Rev. Cancer 1, 55–67 (2001).

  20. Leslie, A., Carey, F. A., Pratt, N. R. & Steele, R. J. The colorectal adenoma–carcinoma sequence. Br. J. Surg. 89, 845–860 (2002).

  21. Larson, M. H. et al. CRISPR interference (CRISPRi) for sequence-specific control of gene expression. Nat. Protoc. 8, 2180–2196 (2013).

  22. Alerasool, N., Segal, D., Lee, H. & Taipale, M. An efficient KRAB domain for CRISPRi applications in human cells. Nat. Methods 17, 1093–1096 (2020).

  23. Ahmed, M. et al. CRISPRi screens reveal a DNA methylation-mediated 3D genome dependent causal mechanism in prostate cancer. Nat. Commun. 12, 1781 (2021).

  24. Fulco, C. P. et al. Systematic mapping of functional enhancer–promoter connections with CRISPR interference. Science 354, 769–773 (2016).

  25. Fulco, C. P. et al. Activity-by-contact model of enhancer–promoter regulation from thousands of CRISPR perturbations. Nat. Genet. 51, 1664–1669 (2019).

  26. Li, K. et al. Noncoding variants connect enhancer dysregulation with nuclear receptor signaling in hematopoietic malignancies. Cancer Discov. 10, 724–745 (2020).

  27. Pomerantz, M. M. et al. The 8q24 cancer risk variant rs6983267 shows long-range interaction with MYC in colorectal cancer. Nat. Genet. 41, 882–884 (2009).

  28. Ogawa, R. et al. Loss of SMAD4 promotes colorectal cancer progression by recruiting tumor-associated neutrophils via the CXCL1/8–CXCR2 axis. Clin. Cancer Res. 25, 2887–2899 (2019).

  29. Zhang, B. et al. Large-scale genetic study in East Asians identifies six new loci associated with colorectal cancer risk. Nat. Genet. 46, 533–542 (2014).

  30. Wu, M. et al. Spatiotemporal transcriptomic changes of human ovarian aging and the regulatory role of FOXP1. Nat. Aging 4, 527–545 (2024).

  31. De Silva, P. et al. FOXP1 negatively regulates tumor infiltrating lymphocyte migration in human breast cancer. EBioMedicine 39, 226–238 (2019).

  32. Meyers, R. M. et al. Computational correction of copy number effect improves specificity of CRISPR–Cas9 essentiality screens in cancer cells. Nat. Genet. 49, 1779–1784 (2017).

  33. Jiang, Y. Y. et al. TP63, SOX2, and KLF5 establish a core regulatory circuitry that controls epigenetic and transcription patterns in esophageal squamous cell carcinoma cell lines. Gastroenterology 159, 1311–1327 (2020).

  34. Huang, B. et al. Interleukin-33-induced expression of PIBF1 by decidual B cells protects against preterm labor. Nat. Med. 23, 128–135 (2017).

  35. Li, X. et al. PIBF1 regulates multiple gene expression via impeding long-range chromatin interaction to drive the malignant transformation of HPV16 integration epithelial cells. J. Adv. Res. 57, 163–180 (2023).

  36. Kyurkchiev, D. et al. Cells isolated from human glioblastoma multiforme express progesterone-induced blocking factor (PIBF). Cell. Mol. Neurobiol. 34, 479–489 (2014).

  37. González-Arenas, A., Valadez-Cosmes, P., Jiménez-Arellano, C., López-Sánchez, M. & Camacho-Arroyo, I. Progesterone-induced blocking factor is hormonally regulated in human astrocytoma cells, and increases their growth through the IL-4R/JAK1/STAT6 pathway. J. Steroid Biochem. Mol. Biol. 144 Pt B, 463–470 (2014).

  38. Sheffer, M. et al. Genome-scale screens identify factors regulating tumor cell responses to natural killer cells. Nat. Genet. 53, 1196–1206 (2021).

  39. Sordo-Bahamonde, C., Lorenzo-Herrero, S., Payer, Á. R., Gonzalez, S. & López-Soto, A. Mechanisms of apoptosis resistance to NK cell-mediated cytotoxicity in cancer. Int. J. Mol. Sci. 21, 3726 (2020).

  40. Perdigoto, C. N. Epigenetic cancer evolution, one cell at a time. Nat. Rev. Genet. 20, 434–435 (2019).

  41. Rodriguez-Martin, B. et al. Pan-cancer analysis of whole genomes identifies driver rearrangements promoted by LINE-1 retrotransposition. Nat. Genet. 52, 306–319 (2020).

  42. Cohen, A. J. et al. Hotspots of aberrant enhancer activity punctuate the colorectal cancer epigenome. Nat. Commun. 8, 14400 (2017).

  43. Jing, Z. et al. NCAPD3 enhances Warburg effect through c-myc and E2F1 and promotes the occurrence and progression of colorectal cancer. J. Exp. Clin. Cancer Res. 41, 198 (2022).

  44. LaFave, L. M. et al. Epigenomic state transitions characterize tumor progression in mouse lung adenocarcinoma. Cancer Cell 38, 212–228 (2020).

  45. Phillips, R. E., Soshnev, A. A. & Allis, C. D. Epigenomic reprogramming as a driver of malignant glioma. Cancer Cell 38, 647–660 (2020).

  46. Zhao, J. et al. Tumor-derived CXCL5 promotes human colorectal cancer metastasis through activation of the ERK/Elk-1/Snail and AKT/GSK3β/β-catenin pathways. Mol. Cancer 16, 70 (2017).

  47. Zhao, G. et al. DDX39B drives colorectal cancer progression by promoting the stability and nuclear translocation of PKM2. Signal Transduct. Target. Ther. 7, 275 (2022).

  48. Zhu, Z. et al. ELK4 promotes colorectal cancer progression by activating the neoangiogenic factor LRG1 in a noncanonical SP1/3-dependent manner. Adv. Sci. 10, e2303378 (2023).

  49. Ying, Y. et al. Zinc finger protein 280C contributes to colorectal tumorigenesis by maintaining epigenetic repression at H3K27me3-marked loci. Proc. Natl Acad. Sci. USA 119, e2120633119 (2022).

  50. Choi, S. W., Mak, T. S. & O’Reilly, P. F. Tutorial: a guide to performing polygenic risk score analyses. Nat. Protoc. 15, 2759–2772 (2020).

  51. Amariuta, T. et al. Improving the trans-ancestry portability of polygenic risk scores by prioritizing variants in predicted cell-type-specific regulatory elements. Nat. Genet. 52, 1346–1354 (2020).

  52. Gerstein, M. B. et al. Architecture of the human regulatory network derived from ENCODE data. Nature 489, 91–100 (2012).

  53. Ernst, J. et al. Mapping and analysis of chromatin state dynamics in nine human cell types. Nature 473, 43–49 (2011).

  54. Xie, W. et al. Epigenomic analysis of multilineage differentiation of human embryonic stem cells. Cell 153, 1134–1148 (2013).

  55. Erceg, J. et al. Dual functionality of cis-regulatory elements as developmental enhancers and Polycomb response elements. Genes Dev. 31, 590–602 (2017).

  56. Hua, J. T. et al. Risk SNP-mediated promoter–enhancer switching drives prostate cancer through lncRNA PCAT19. Cell 174, 564–575 (2018).

  57. Wei, R. et al. Ketogenesis attenuates KLF5-dependent production of CXCL12 to overcome the immunosuppressive tumor microenvironment in colorectal cancer. Cancer Res. 82, 1575–1588 (2022).

  58. Xu, Z. et al. CCL7 and TGF-β secreted by MSCs play opposite roles in regulating CRC metastasis in a KLF5/CXCL5-dependent manner. Mol. Ther. 30, 2327–2341 (2022).

  59. Kozma, N. et al. Progesterone-induced blocking factor activates STAT6 via binding to a novel IL-4 receptor. J. Immunol. 176, 819–826 (2006).

  60. Hara, T. et al. Interactions between cancer cells and immune cells drive transitions to mesenchymal-like states in glioblastoma. Cancer Cell 39, 779–792 (2021).

  61. Zhang, M. et al. Genetic control of alternative splicing and its distinct role in colorectal cancer mechanisms. Gastroenterology 165, 1151–1167 (2023).

  62. Cai, Y. et al. An atlas of genetic effects on cellular composition of the tumor microenvironment. Nat. Immunol. 25, 1959–1975 (2024).

  63. Lu, Y. et al. Large-scale genome-wide association study of East Asians identifies loci associated with risk for colorectal cancer. Gastroenterology 156, 1455–1466 (2019).

  64. Houlston, R. S. et al. Meta-analysis of three genome-wide association studies identifies susceptibility loci for colorectal cancer at 1q41, 3q26.2, 12q13.13 and 20q13.33. Nat. Genet. 42, 973–977 (2010).

  65. Li, B. et al. Genetic variants that impact alternative polyadenylation in cancer represent candidate causal risk loci. Cancer Res. 83, 3650–3666 (2023).

  66. Gong, J. et al. PancanQTL: systematic identification of cis-eQTLs and trans-eQTLs in 33 cancer types. Nucleic Acids Res. 46, D971–D976 (2018).

  67. Gong, J. et al. Pancan-meQTL: a database to systematically evaluate the effects of genetic variants on methylation in human cancer. Nucleic Acids Res. 47, D1066–D1072 (2019).

  68. Tian, J. et al. CancerSplicingQTL: a database for genome-wide identification of splicing QTLs in human cancer. Nucleic Acids Res. 47, D909–D916 (2019).

  69. Concordet, J. P. & Haeussler, M. CRISPOR: intuitive guide selection for CRISPR/Cas9 genome editing experiments and screens. Nucleic Acids Res. 46, W242–W245 (2018).

  70. Yang, X. et al. Functional characterization of Alzheimer’s disease genetic variants in microglia. Nat. Genet. 55, 1735–1744 (2023).

  71. Schmidt, R. et al. CRISPR activation and interference screens decode stimulation responses in primary human T cells. Science 375, eabj4008 (2022).

  72. Li, W. et al. MAGeCK enables robust identification of essential genes from genome-scale CRISPR/Cas9 knockout screens. Genome Biol. 15, 554 (2014).

  73. Tian, J. et al. Systematic functional interrogation of genes in GWAS loci identified ATF1 as a key driver in colorectal cancer modulated by a promoter–enhancer interaction. Am. J. Hum. Genet. 105, 29–47 (2019).

  74. Chen, C. et al. Single-cell eQTL mapping reveals cell subtype-specific genetic control and mechanism in malignant transformation of colorectal cancer. Cancer Discov. https://doi.org/10.1158/2159-8290.CD-24-1561 (2025).

Acknowledgements

We thank the numerous participants, their families and referring physicians that participated in these studies. This work was supported by Key Program of National Natural Science Foundation of China (NSFC-82130098, to X.M.), National Science Fund for Excellent Young Scholars (NSFC-82322058, to J.T.), National Science Fund for Distinguished Young Scholars of China (NSFC-81925032, to X.M.), Fundamental Research Funds for the Central Universities (2042025kf0027, to J.T.), Noncommunicable Chronic Diseases National Science and Technology Major Project (2023ZD0501400, to J.T.), National Key R&D Program of China (2022YFA0806601 and 2024YFC3405804, to X.M.), Program of National Natural Science Foundation of China (NSFC-82273713, to J.T.), the Leading Talent Program of the Health Commission of Hubei Province, Knowledge Innovation Program of Wuhan (2023020201010060, to X.M.), Fundamental Research Funds for the Central Universities (2042022rc0026 and 2042024kf1007, to X.M.), Young Elite Scientists Sponsorship Program by CAST (2022QNRC001, to J.T.), National Science Fund for Distinguished Young Scholars of Hubei Province of China (2023AFA046, to J.T.), Knowledge Innovation Program of Wuhan (2023020201010073, to J.T.), National Key R&D Program of China (2024YFC3405803, to Y.Z.), Program of National Natural Science Foundation of China (NSFC-82373663, to Y.Z.), Natural Science Foundation of Hubei Province of China (2024AFB777, to Y.Z.), Program of Health Commission of Hubei Province (WJ2023M045, to Y.Z.) and Knowledge Innovation Program of Wuhan (2023020201020244, to Y.Z.). The funders had no role in study design, data collection and analysis, decision to publish or preparation of the manuscript. The numerical calculations in this paper were performed on the supercomputing system in the Supercomputing Center of Wuhan University.

Author information

Contributions

J.T. and X.M. were the overall principal investigators who conceptualized the study and obtained financial support. Y.Z., J.T. and X.M. were responsible for the study design and supervised the entire study. Z.L. and C.C. performed statistical analyses, interpreted the results and drafted the initial paper. Z.L., C.C., H.Z. and B.L. revised the paper. K.S., J.G. and R.X. assisted with data analysis. Y. Liu, Q.M., K.H., M.Z., Y.C., J.H., H.G., L.F., C.N., Y. Li, S.C., Meng J. and W.T. performed laboratory experiments. H.L., X.Y., C.H., Y.W., X.Z., X.L., Z.X., M.C., X.W., S.Z., H.C., M.D., Mingjuan J. and K.C. performed data curation and investigation. J.T. and X.M. were responsible for participant recruitment and sample preparation. All authors approved the final report for publication.

Corresponding authors

Correspondence to Jianbo Tian or Xiaoping Miao.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Cancer thanks Ronald Arnold and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Epigenomic reprogramming accompanies the development of cancer.

a. Illustration of 130 paired colorectal cancer samples used in the omics measurements (CRC, n = 110; adenoma individuals, n = 20). The clinical characteristics, ATAC-seq, H3K27ac ChIP-seq, RNA-seq, and Hi-C are colored blue, brown, purple, red, and light blue, respectively. For each omics dataset, the panels show, from top to bottom, normal, advanced adenoma and cancer tissues. b. Principal component analysis (PCA) of each omics dataset revealed that colorectal cancer tissues could be largely distinguished from advanced adenoma and normal tissues. c. Genomic distribution of ATAC-seq, H3K27ac ChIP-seq, and CREs in colorectal normal, advanced adenoma, and cancer tissues. d. Correlations between H3K27ac signals and ATAC signals for each CRE across multistage tissues. The two-sided P values were calculated by Pearson correlation analysis. e. Circle maps showing the H3K27ac, ATAC and RNA-seq signals of CREs in colorectal normal, advanced adenoma and cancer tissues. f-l. DNA motifs of TFs determined using the JASPAR website, and enrichment analyses for each TF motif among C1 (f), C2 (g), C3 (h), C4 (i), C5 (j), C6 (k) and C7 (l). P values were calculated by a two-sided hypergeometric test using HOMER.

Extended Data Fig. 2 Functional characterization for differential CREs.

a. Circle maps showing the H3K27ac ChIP-seq, ATAC-seq and RNA-seq of gradually gained CREs. b. Circle maps showing the H3K27ac ChIP-seq, ATAC-seq and RNA-seq of gradually lost CREs. c. The box plots show the expression level and chromatin activity of MYC. The scatter plots represent the correlation between the chromatin activity and gene expression of MYC among colorectal normal (n = 110, left), advanced adenoma (n = 30, middle) and cancer tissues (n = 110, right). d. The box plots show the expression and chromatin activity of SMAD4. The scatter plots represent the correlation between the chromatin activity and gene expression of SMAD4 among colorectal normal (n = 110, left), advanced adenoma (n = 30, middle) and cancer tissues (n = 110, right). e. Probability distribution density profile for the distance of differential CREs to their corresponding TSS. Each dot (red: H3K27ac ChIP signal, blue: ATAC signal) indicates a CRE plotted according to its distance to the corresponding TSS and its statistical significance in the differential analysis, using a two-sided Wald test with the Benjamini-Hochberg method. f. ORs for enrichment of CREs within previously identified regulatory elements compared with non-CREs. g. OR for enrichment of CREs in CRC GWAS loci (LD r² ≥ 0.2 with index variant) compared with non-CREs. h. OR for enrichment of recurrent CREs in CRC GWAS signals (LD r² ≥ 0.2 with index variant) compared with non-CREs. i. Pie charts show the proportion (red) of CRC GWAS loci overlapping with CRE variants from colorectal normal, advanced adenoma, and cancer tissues. j-m. Venn diagrams showing the overlap of eQTLs (j), sQTLs (k), apaQTLs (l), and meQTLs (m) versus variants in CREs. For c and d, boxes indicate medians and interquartile ranges (IQRs) and whiskers indicate the minimum and maximum values. The two-sided P values were calculated by one-way ANOVA and Pearson correlation analysis. For f-h, each dot represents the OR and bars indicate 95% CIs; P values were calculated by two-sided Fisher’s exact test.

Extended Data Fig. 3 Functional annotations for CRE genes and gradually changed CRE genes.

a-c. GO analysis for CRE target genes in normal (a), advanced adenoma (b) and cancer (c) tissues. The significant functional pathways among biological process (BP), molecular function (MF) and cell component (CC) are visualized. d. Differentially expressed genes across three stages were classified into nine patterns (C1-C9). The x-axis indicates the patterns, and the y-axis shows the number of genes in each pattern. e. GO analysis for the target genes of the 7,492 differential CREs. The significant functional pathways among biological process (BP), molecular function (MF) and cell component (CC) are visualized. f. DNA motifs of TFs determined using the JASPAR website, and enrichment analyses for each TF motif among gradually changed CREs. The two-sided P values were calculated by hypergeometric test using HOMER. g. Plots of -log10 P values (x-axis) and OR (y-axis) obtained from enrichment analysis of gradually changed CREs within binding sites for each TF (N = 60) from the ENCODE database. The dashed blue lines indicate OR = 1 and P = 0.05/60 = 8.33 × 10⁻⁴ (Bonferroni-corrected P value threshold; binding sites for 60 TFs were tested). h. ORs for enrichment of gradually changed CREs among regulatory elements compared with non-CREs. i. KEGG pathway enrichment of target genes of gradually changed CREs. The circle color represents the significance of enrichment, and the circle size denotes the number of CRE genes within each pathway. j. GO analysis for target genes of gradually changed CREs. Terms were ranked by two-sided hypergeometric test-derived P values. k. The proportion of the target genes of gradually changed CREs that were associated with immune cell infiltration estimated by EPIC in Timer. l. Representative correlations between expression of the target genes of gradually changed CREs and the infiltration of immune cells, using two-tailed Pearson correlation. The circle color represents the correlation degree and circle size represents the significance. P values were calculated by two-tailed Fisher’s exact test (g, h). Each dot represents the OR and bars indicate 95% CIs (h).
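The Bonferroni threshold in g and the OR-with-CI display in h can be reproduced with a few lines of arithmetic. The sketch below is illustrative only: the 2 × 2 counts are hypothetical, and the Wald (Woolf) interval on the log odds ratio is an assumption, since the legend does not state how the 95% CIs were computed.

```python
import math


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold after Bonferroni correction."""
    return alpha / n_tests


def odds_ratio_with_ci(a: int, b: int, c: int, d: int, z: float = 1.96):
    """Odds ratio and Wald 95% CI for a 2 x 2 table.

    a: CREs overlapping the feature      b: CREs not overlapping
    c: non-CREs overlapping the feature  d: non-CREs not overlapping
    The Wald interval is an assumption made for this sketch.
    """
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# 60 TFs tested, as stated in panel g
threshold = bonferroni_threshold(0.05, 60)
print(f"Bonferroni threshold: {threshold:.2e}")

# Hypothetical counts, for illustration only
odds_ratio, ci_lower, ci_upper = odds_ratio_with_ci(120, 880, 300, 8700)
print(f"OR = {odds_ratio:.2f} (95% CI {ci_lower:.2f}-{ci_upper:.2f})")
```

An enrichment would be called significant in g only if its Fisher's exact test P value fell below this threshold.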

Extended Data Fig. 4 High-throughput CRISPRi screening identified functional CREs essential for CRC cell proliferation and Polygenic Risk Stratification of CRC and precancerous lesions.

a. Relative mRNA expression of Cas9 in different types of cell lines. Data are shown as the median; box limits indicate upper and lower quartiles and whiskers indicate the maximum and minimum; n = 9 biologically independent samples. The P values were calculated by unpaired two-sided Student’s t-test. b. Relative protein expression of Cas9 in different types of cell lines. c-d. Correlation of sgRNA counts between replicates at two time points in SW480 (c) and HCT116 (d) cells. The x- and y-axes denote the sgRNA counts of replicates at day 0 and day 16. The two-sided P values were calculated using Pearson correlation analysis. e-f. Correlation of epigenomic signals with the negative RRA score in SW480 (e) and HCT116 (f) cells. The two-sided P values were estimated using Pearson correlation test. The color of the box denotes the correlation coefficient. g. The number of functional CREs across three stages in the seven different types (C1-C7). h. HRs for incident CRC risk of individuals in each PRS group compared with those in the bottom 20% under PRS323 in the UK Biobank CRC cohort, n = 380,525. i-j. HRs for incident non-advanced adenoma (i) and advanced neoplasm (j) risk in each PRS group compared with those in the bottom 20% under PRS323 in the PLCO cohort, n = 14,071. k. HRs for incident CRC risk of individuals in each PRS group compared with those in the bottom 20% under PRS224 in the UK Biobank CRC cohort, n = 380,525. l-m. HRs for incident non-advanced adenoma (l) and advanced neoplasm (m) risk in each PRS group compared with those in the bottom 20% under PRS224 in the PLCO cohort, n = 14,071. Participants were divided into low (bottom 20%), intermediate (middle 60%), and high-risk (top 20%) groups. The two-sided P values and HRs of CRC and precancerous lesions risk were calculated using Cox’s proportional hazards regression models (h-m).
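The grouping used in h-m (bottom 20%, middle 60%, top 20%) can be sketched as follows. This is a minimal illustration, assuming a PRS computed as a weighted sum of risk-allele dosages; the variant names, weights and scores are hypothetical and are not taken from PRS323 or PRS224.

```python
from typing import Dict, List


def polygenic_risk_score(dosages: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum of risk-allele dosages (0, 1 or 2 copies per variant)."""
    return sum(w * dosages.get(variant, 0.0) for variant, w in weights.items())


def assign_risk_groups(scores: List[float]) -> List[str]:
    """Label each score by rank: low (bottom 20%), intermediate (middle 60%), high (top 20%)."""
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    groups = [""] * n
    for rank, idx in enumerate(order):
        if rank * 5 < n:            # rank / n < 0.2
            groups[idx] = "low"
        elif rank * 5 >= 4 * n:     # rank / n >= 0.8
            groups[idx] = "high"
        else:
            groups[idx] = "intermediate"
    return groups


# Hypothetical per-variant weights (for example, log odds ratios from a GWAS)
weights = {"variant_1": 0.2, "variant_2": 0.1}
example_score = polygenic_risk_score({"variant_1": 2, "variant_2": 1}, weights)

# Hypothetical cohort of ten precomputed scores
cohort_scores = [0.1, 0.9, 0.4, 0.3, 0.7, 0.2, 0.8, 0.5, 0.6, 1.0]
cohort_groups = assign_risk_groups(cohort_scores)
print(example_score, cohort_groups)
```

The hazard ratios in the panels compare the intermediate and high groups against the low group in a Cox model, which this sketch does not attempt to reproduce.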

Source data

Extended Data Fig. 5 The functional CRE variant rs10871066 regulates PIBF1 and KLF5 expression by interacting with FOXP1 and TCF7L2.

a. The target genes of CRE1 predicted by the ABC model across the three stages. b. Normalized Hi-C interaction frequencies in small bowel, shown as a two-dimensional heatmap, denote the potential interaction between rs10871066 and its target genes (PIBF1, KLF5) in the region of CRE1; data were obtained from the 3D Genome browser (GRCh37/hg19; http://kobic.kr/3divv1/intro). Genomic tracks represent significant loops. c. The box plots show the expression level (left: PIBF1, middle: KLF5) and their corresponding chromatin activity; the two-sided P values were calculated by one-way ANOVA (top). The scatter plots represent the correlation between the chromatin activity and gene expression of KLF5 and PIBF1 among colorectal normal (n = 110, left), advanced adenoma (n = 30, middle) and cancer tissues (n = 110, right). The correlation was calculated by two-sided Pearson correlation. d. eQTL analyses for the association between rs10871066 genotypes and the expression level of PIBF1 and KLF5 in CRC samples from TCGA (nAA = 55, nAG = 111, nGG = 159); the two-sided P values were calculated by linear regression analysis. e. The relative luciferase activity of pGL3-promoter vectors containing the rs10871066[A] or rs10871066[G] allele in HCT116 cells. n = 9 biologically independent samples. The P values were calculated by unpaired two-sided Student’s t-test. f-g. Footprint profiles of FOXP1 (f) and TCF7L2 (g) at ATAC-seq peaks across all three stages of CRC. h. EMSAs with biotin-labeled probes containing rs10871066 in HCT116 cells. Arrows indicate allele-specific bands that interacted with nuclear protein in the cells. i. Super-shift EMSA using FOXP1 or TCF7L2 antibody in HCT116 cells. ‘I’ represents the allele-specific binding band. ‘II’ represents the super-shifted band. j-k. DNase I footprinting assay. The DNA fragments containing either rs10871066[A] (j) or rs10871066[G] (k) were labeled with 5’-FAM and incubated with different FOXP1 or TCF7L2 concentrations.
A black dotted box indicates the region that the protein protects from DNase I cleavage. DNA sequencing is shown at the bottom, and the sequence marked in blue represents the protein binding motif. Data were shown as the median; box limits indicate upper and lower quartiles and whiskers indicate the maximum and minimum (c-e).

Source data

Extended Data Fig. 6 FOXP1 and TCF7L2 preferentially bind to the rs10871066[A] and rs10871066[G] alleles to affect the expression of PIBF1 and KLF5, respectively.

a. The effect of FOXP1 or TCF7L2 overexpression on relative luciferase activity of vectors containing the rs10871066[A] or rs10871066[G] allele in HCT116 cells. The cells were first treated with plasmids containing varying doses of TCF7L2 or FOXP1 cDNA, followed by transfection with luciferase plasmids carrying the rs10871066[A] or [G] allele. The results were normalized against the group receiving pcDNA3.1 along with luciferase plasmids carrying the rs10871066[A] allele. n = 9 biologically independent samples. b. The correlations between TCF7L2 and KLF5 expression, and between FOXP1 and PIBF1 expression, were analyzed in the GSE14333 cohort (n = 290). c-d. Scatter plots show the correlations between PIBF1 and FOXP1 expression (top), as well as the correlations between KLF5 and TCF7L2 expression (bottom), stratified by rs10871066 genotype in our CRC samples (nAA+AG = 103, nGG = 51) (c) and TCGA CRC samples (nAA+AG = 166, nGG = 159) (d). e-f. The expression of TCF7L2 and FOXP1 in tumor and normal tissues from our CRC samples (nTumor = 154, nNormal = 154) (e) and GSE103512 (nTumor = 57, nNormal = 38) (f). g. FOXP1 (left) suppressed tumor growth, whereas TCF7L2 (right) promoted CRC growth, based on genome-wide CRISPR-Cas9-based loss-of-function screens. h-i. The effect of TCF7L2 (h) and FOXP1 (i) overexpression on the expression level of TCF7L2 and FOXP1 was validated by qRT-PCR analysis in SW480 cells (top) and HCT116 cells (bottom), respectively. n = 9 biologically independent samples per group. j-k. The effect of TCF7L2 (j) and FOXP1 (k) knockdown on the expression level of TCF7L2 and FOXP1 in CRISPR-edited SW480 (top) and HCT116 (bottom) cells was validated by qRT-PCR analysis. n = 9 biologically independent samples per group. Data were shown as the median; box limits indicate upper and lower quartiles and whiskers indicate the maximum and minimum (a, f). The P values were calculated by unpaired two-sided Student’s t-test (a, e, f).
The two-sided P values and correlation coefficient were calculated by Pearson correlation analysis (b-d). Data were presented as the mean ± SD. The P values were calculated by unpaired two-sided Student’s t-test (h-k).
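
The genotype-stratified correlations in panels c-d can be sketched as follows, using only the Python standard library. This is an illustrative sketch, not the authors' code. The helper names `pearson_r` and `pearson_by_group` are hypothetical, the numbers are toy values, and the two-sided P value reported in the legend is not computed here.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (>= 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


def pearson_by_group(x, y, groups):
    """Compute Pearson r separately within each group label (e.g. genotype)."""
    result = {}
    for label in sorted(set(groups)):
        xs = [a for a, g in zip(x, groups) if g == label]
        ys = [b for b, g in zip(y, groups) if g == label]
        result[label] = pearson_r(xs, ys)
    return result


# Toy expression values (made up for illustration, not study data)
tf_expr = [1, 2, 3, 1, 2, 3]
target_expr = [2, 4, 6, 3, 2, 1]
genotype = ["AA+AG"] * 3 + ["GG"] * 3
r_by_genotype = pearson_by_group(tf_expr, target_expr, genotype)
```

Computing the coefficient within each genotype group, rather than across all samples pooled, is what allows an allele-dependent relationship between a transcription factor and its target gene to become visible.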

Source data

Extended Data Fig. 7 KLF5 level in CRC cell lines and the correlations of the expression PIBF1 or FOXP1 with immune infiltration.

a. Representative H&E staining and immunohistochemical analysis (Ki67 and KLF5) in mouse tumors derived from lentivirus-mediated SW480 cells (left) and HCT116 cells (right) were shown (n = 5 per group). b-c. The effect of KLF5 overexpression (b) and knockdown (c) on KLF5 expression level in SW480 cells (top) and HCT116 cells (bottom) was validated by qRT-PCR analysis. Data were presented as the mean ± SD, n = 9 biologically independent samples. P values were calculated by unpaired two-sided Student’s t-test. d-f. Western blot assays validated the effect of KLF5 overexpression (d) and knockdown (e-f) on the expression level of KLF5 in SW480 cells (top) and HCT116 cells (bottom). g-h. The correlations between the expression of PIBF1 or FOXP1 and the infiltration of immune cells in the TCGA database (g, n = 325) and the GSE14333 dataset (h, n = 290). i. Scatter plots show the correlations between PIBF1 expression and CCL4, CCL5, FCGR3A and CD69 expression (top), as well as the correlations between FOXP1 expression and CCL4, CCL5, FCGR3A and CD69 expression (bottom) in TCGA CRC samples (n = 325). j. Scatter plots show the correlations between PIBF1 expression and CCL4, CCL5, FCGR3A and CD69 expression (top), as well as the correlations between FOXP1 expression and CCL4, CCL5, FCGR3A and CD69 expression (bottom) in GSE14333 CRC samples (n = 290). For g-j, the two-sided P values and correlation coefficients were calculated by Pearson correlation analysis. The circle color represents the degree of correlation and the circle size represents the significance of the correlation (g, h).

Source data

Extended Data Fig. 8 PIBF1-induced activation of CRC pathogenesis by suppressing perforin release from NK cells.

a-b. The effect of PIBF1 overexpression (a) and knockdown (b) on PIBF1 expression level in SW480 cells (top) and HCT116 cells (bottom) was validated by qRT-PCR analysis. n = 9 biologically independent samples for each group. c-d. Western blot assays validated the effect of PIBF1 overexpression (c) and knockdown (d) on the expression level of PIBF1 in SW480 cells (top) and HCT116 cells (bottom). e-f. The effect of PIBF1 overexpression (e) and knockdown (f) on cell proliferation and colony formation ability in HCT116 cells. n = 4 biologically independent samples for cell proliferation and n = 9 biologically independent samples for colony formation ability for each group. g-h. The effect of PIBF1 overexpression (g) and knockdown (h) on tumor cell lysis (%) in HCT116 cells after co-culturing with NK cells at NK cell:HCT116 cell ratios of 1:1, 2:1 and 5:1. n = 9 biologically independent samples per group. i. Representative images and growth curves of MC38 tumors with PIBF1 knockout in C57BL/6 mice. n = 5 per group. j. Representative H&E staining and immunohistochemical analysis (Ki67 and PIBF1). k-l. The effect of PIBF1 overexpression (k) and knockdown (l) on the proportion of intracellular perforin in NK cells (left) and the level of perforin in supernatant solution (right) co-cultured with HCT116 cells. n = 9 biologically independent samples for each group. m. The effect of PIBF1 knockout on the proportion of intracellular perforin in NK cells from the MC38 tumors (left). The effect of PIBF1 knockout on the level of perforin in supernatant solution from the MC38 tumors (right). n = 5 biologically independent samples. n. Immunofluorescent staining shows colocalization between perforin (green) and NK1.1 (red) in MC38 tumors (n = 5 per group). Scale bars, 20 μm and 100 μm. Results for cell proliferation were shown as the mean ± SEM (e, top, f, left). The results presented colony formation ability relative to control cells (set to 100%) (e, bottom, f, right).
Data were shown as the mean ± SD (a-b, e, bottom, f, right, i, k-m). Data were shown as the median; box limits indicate upper and lower quartiles and whiskers indicate the maximum and minimum (g, h). The P values were calculated by unpaired two-sided Student’s t-test (a-b, e-i, k-m).

Source data

Extended Data Fig. 9 The effect of CRE1 perturbation on CRC development in vitro and in vivo.

a-b. The expression of KLF5 (a) and PIBF1 (b) was detected by qRT-PCR analysis upon CRE1 perturbation in SW480 and HCT116 cells. Data were presented as the mean ± SD, n = 9 biologically independent samples. The P values were calculated by unpaired two-sided Student’s t-test. c-d. The protein expression of KLF5 (c) and PIBF1 (d) was detected by western blot assays. e. The effect of CRE1 perturbation on cell proliferation (left) and colony formation ability (right) in SW480 cells and HCT116 cells. Results for cell proliferation were shown as the mean ± SEM. n = 4 biologically independent samples. The results presented colony formation ability relative to control cells (set to 100%). Data were shown as the mean ± SD, n = 9 biologically independent samples. The P values were calculated by unpaired two-sided Student’s t-test. f. Representative images and growth curves of tumors derived from SW480 cells and HCT116 cells with CRE1 perturbation in BALB/c nude mice. The results are shown as the mean ± SD (n = 5 per group). The P values were calculated by unpaired two-sided Student’s t-test. g. Identification of candidate downstream effector genes of CRE1 and KLF5 by integration of ChIP-seq and RNA-seq in the SW480 cell line. n = 3 biologically independent samples. h. GO analysis for the candidate genes regulated by CRE1 and KLF5. The significant functional pathways among biological process (BP), molecular function (MF) and cell component (CC) were visualized. Terms were ranked by two-sided hypergeometric test-derived P values. i. KEGG pathway enrichment analysis of candidate effector genes regulated by CRE1 and KLF5. The circle color represents the significance of enrichment, and the circle size denotes the number of genes within each pathway. j. Gene-set enrichment analysis (GSEA) of candidate effector genes regulated by CRE1 and KLF5.

Source data

Extended Data Fig. 10 CRE1 promotes CRC development by restraining the cytotoxicity of NK cells.

a-b. The effect of PIBF1 overexpression on PIBF1 expression levels in MC38 cells, measured by qPCR (a) and western blot (b). For qPCR assays, n = 9 biologically independent samples per group. c-d. The effect of CRE1 perturbation on PIBF1 expression was detected by qPCR (c) and western blot (d) in MC38 cells. For qPCR assays, n = 9 biologically independent samples per group. e-f. The effect of CRE1 perturbation on proliferation (left) and colony formation ability (right) in SW480 (e) and HCT116 (f) cells after co-culturing with NK cells. n = 4 biologically independent samples for proliferation and n = 9 biologically independent samples for colony formation ability. g-h. The effect of CRE1 repression on tumor cell lysis (%) in SW480 (g) and HCT116 (h) cells after co-culturing with NK cells. n = 9 biologically independent samples. i-j. The effect of CRE1 perturbation on the proportion of intracellular perforin in NK cells (left) and the level of perforin in supernatant solution (right) co-cultured with SW480 (i) or HCT116 cells (j). n = 9 biologically independent samples. k-l. The effect of perforin (treatment for 24 hours at 3 μg/mL) on proliferation and colony formation ability in SW480 cells (k) and HCT116 cells (l) compared with DMSO controls. n = 4 biologically independent samples for proliferation and n = 9 biologically independent samples for colony formation ability. m. Representative images and growth curves of tumors derived from MC38 cells with CRE1 perturbation in C57BL/6 mice. n = 5 per group. n. The effect of CRE1 perturbation on the proportion of intracellular perforin in NK cells (left) and the level of perforin in supernatant solution (right) from MC38 tumors. n = 5 per group. Data were shown as the mean ± SD (a, c, g-j, m-n). Results for cell proliferation were shown as the mean ± SEM. The results presented colony formation ability relative to control cells (set to 100%). Data were presented as the mean ± SD (e-f, k-l).
The P values were calculated by unpaired two-sided Student’s t-test (a, c, e-n).

Source data

Supplementary information

Supplementary Information

Supplementary Figs. 1–4.

Reporting Summary

Supplementary Tables

Supplementary Tables 1–43.

Source data

Source Data Fig. 1

Statistical source data.

Source Data Fig. 2

Statistical source data.

Source Data Fig. 3

Statistical source data.

Source Data Fig. 4

Statistical source data.

Source Data Fig. 5

Statistical source data.

Source Data Fig. 5

Unprocessed gels.

Source Data Fig. 6

Statistical source data.

Source Data Fig. 6

Unprocessed gels.

Source Data Fig. 7

Statistical source data.

Source Data Fig. 8

Statistical source data.

Source Data Extended Data Fig. 1

Statistical source data.

Source Data Extended Data Fig. 2

Statistical source data.

Source Data Extended Data Fig. 3

Statistical source data.

Source Data Extended Data Fig. 4

Statistical source data.

Source Data Extended Data Fig. 4

Unprocessed western blots.

Source Data Extended Data Fig. 5

Statistical source data.

Source Data Extended Data Fig. 5

Unprocessed western blots.

Source Data Extended Data Fig. 6

Statistical source data.

Source Data Extended Data Fig. 7

Statistical source data.

Source Data Extended Data Fig. 7

Unprocessed western blots.

Source Data Extended Data Fig. 8

Statistical source data.

Source Data Extended Data Fig. 8

Unprocessed western blots.

Source Data Extended Data Fig. 9

Statistical source data.

Source Data Extended Data Fig. 9

Unprocessed western blots.

Source Data Extended Data Fig. 10

Statistical source data.

Source Data Extended Data Fig. 10

Unprocessed western blots.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Lu, Z., Chen, C., Zhang, H. et al. Characterization of cis-regulatory elements and functional variants in colorectal cancer using epigenomics and CRISPRi screenings. Nat Cancer (2025). https://doi.org/10.1038/s43018-025-01031-z

  • DOI: https://doi.org/10.1038/s43018-025-01031-z
