Abstract
Colorectal cancer (CRC) is a common complex disease caused by the combination of genetic variants and environmental factors. Genome-wide association studies (GWAS) have reported some novel CRC susceptibility variants. However, the potential genetic mechanisms underlying these newly identified CRC susceptibility variants are still unclear. Here, we selected 85 CRC susceptibility variants with suggestive association (P < 1.00E-05) from the National Human Genome Research Institute GWAS catalog. To investigate the genetic pathways in which the corresponding CRC susceptibility genes are significantly enriched, we conducted a functional annotation. Using two SNP-to-gene mapping methods, the nearest upstream and downstream gene method and ProxyGeneLD, we obtained 128 unique CRC susceptibility genes. We then conducted a pathway analysis in the GO database using these 128 genes. We identified 44 GO categories, 17 of which are regulatory pathways. We believe that our results may provide further insight into the underlying genetic mechanisms for these newly identified CRC susceptibility variants.
Introduction
Colorectal cancer (CRC) is the third most common form of cancer and the second leading cause of cancer-related death in the western world1,2. CRC is a leading cause of cancer-related deaths in the United States, where its lifetime risk is about 7%1,3. CRC is a common complex disease caused by the combination of genetic variants and environmental factors1. Genome-wide association studies (GWAS) are considered a new and powerful approach to detecting the genetic variants underlying human complex diseases. Recently, GWAS have reported some novel CRC susceptibility single nucleotide polymorphisms (SNPs) with genome-wide significance (P < 5.00E-08), and these SNPs have been replicated by meta-analysis methods4,5,6,7,8,9,10,11,12,13.
In 2012, Loo et al. conducted a cis-expression quantitative trait loci (cis-eQTL) analysis to investigate the possible regulatory functions of 19 CRC risk variants on the expression of neighboring genes (<2 Mb up- or down-stream)14. They identified three variants, rs10795668, rs4444235 and rs9929218, that were significantly associated with expression levels of nearby genes14. In 2014, Closa et al. analyzed the association between 26 CRC SNPs and the expression of genes within a 2 Mb region (cis-eQTLs) using 47 healthy colonic mucosa tissues, 97 normal mucosa tissues adjacent to colon cancer and 97 paired tumor tissues15. Using Bonferroni correction, they identified three significant cis-eQTLs: rs3802842 in 11q23.1, associated with the expression of C11orf53, COLCA1 and COLCA2; rs7136702 in 12q13.12, associated with the expression of DIP2B; and rs5934683 in Xp22.3, associated with the expression of SHROOM2 and GPR143. Closa et al. also reported other SNPs, including rs7130173 for 11q23.1 and rs61927768 for 12q13.12, which are in linkage disequilibrium (LD) with rs3802842 and rs7136702, respectively, are more strongly associated with the expression of the identified genes and are better functional candidates. In 2014, Yao et al. selected 25 CRC SNPs and tested the hypothesis that the CRC SNPs and/or correlated SNPs lie in elements that regulate gene expression3. They identified 23 promoters, 28 enhancers and 66 putative target genes of the risk-associated enhancers3.
Evidence shows that most variants for common human diseases are not correlated with protein-coding changes, indicating that susceptibility variants in regulatory regions may contribute to disease phenotypes16. For CRC, most risk variants also reside outside the coding regions of genes3,14,15. As described above, comprehensive functional studies of the effects of CRC SNPs on nearby gene expression have been reported3,14,15. The National Human Genome Research Institute (NHGRI) GWAS catalog shows that 85 CRC susceptibility variants reaching suggestive association (P < 1.00E-05) have been identified to date17,18. However, the exact genetic mechanisms for these newly identified CRC susceptibility variants are still unclear. To investigate the potential regulatory functions of these 85 CRC susceptibility variants, we conducted a pathway analysis of the CRC susceptibility genes around them.
Results
CRC susceptibility genes from ProxyGeneLD
Using ProxyGeneLD and the LD information from HapMap phase II Europe (CEU), 74 of these 85 unique CRC susceptibility variants were found in HapMap and were successfully mapped to 53 unique CRC susceptibility genes (Table 1). The other 11 SNPs, rs11196172, rs73376930, rs11255841, rs10849432, rs35509282, rs140355816, rs34245511, rs12412391, rs4849303, rs57046232 and rs7999699, were not found in HapMap.
CRC susceptibility genes for pathway analysis
Using the nearest upstream and downstream gene method in the NHGRI GWAS catalog, we obtained 106 unique CRC susceptibility gene IDs, as described in Materials and Methods. We compared these 106 genes with the 53 unique CRC susceptibility genes from ProxyGeneLD and found 31 shared genes. In the end, we obtained 128 unique CRC susceptibility genes, the union of the genes from both methods.
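The size of the union follows from the three counts reported above by inclusion-exclusion. The short check below is only an arithmetic illustration; the variable names are ours and are not part of the original analysis.

```python
# Counts reported in the text.
nearest_gene_count = 106   # nearest upstream and downstream gene method
proxygeneld_count = 53     # ProxyGeneLD
shared_count = 31          # genes reported by both methods

# Inclusion-exclusion: |A union B| = |A| + |B| - |A intersect B|
union_count = nearest_gene_count + proxygeneld_count - shared_count
print(union_count)  # 128
```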
Pathway analysis preprocessing
In the WebGestalt database, 120 of the 128 genes were successfully mapped to 120 unique Entrez Gene IDs19. The other 8 genes were mapped to multiple Entrez Gene IDs or could not be mapped to any Entrez Gene ID. The following pathway analysis is based on the 120 unique gene IDs.
Pathway analysis using GO database
Our pathway analysis in the GO database showed that these 120 CRC susceptibility genes were significantly enriched in 40 biological processes, 1 molecular function and 3 cellular components with adjusted P < 0.01. Seventeen of these 44 significant signals are regulatory pathways, such as regulation of epithelial to mesenchymal transition, negative regulation of Wnt receptor signaling pathway, regulation of pathway-restricted SMAD protein phosphorylation, positive regulation of nucleocytoplasmic transport, regulation of muscle organ development and positive regulation of intracellular protein transport. Interestingly, regulation of epithelial to mesenchymal transition (GO:0010717) is the most significant signal (Table 2). More detailed information, including the gene IDs, is given in supplementary Table 1.
Discussion
To date, 85 CRC susceptibility variants with suggestive association (P < 1.00E-05) have been reported17,18. To investigate the genetic pathways in which the corresponding CRC susceptibility genes are significantly enriched, we conducted a functional annotation. Using two SNP-to-gene mapping methods, the nearest upstream and downstream gene method and ProxyGeneLD, we obtained 128 unique CRC susceptibility genes. We then conducted a pathway analysis in the GO database using these 128 genes. We identified 44 GO categories, 17 of which are regulatory pathways.
Here, we identified the regulation of epithelial to mesenchymal transition (GO:0010717) as the most significant signal among all 44 GO categories and among the 17 regulatory pathways. The epithelial-mesenchymal transition-like dedifferentiation of tumor cells is reported to be a characteristic of CRC invasion20. Several studies have reviewed the association between epithelial-mesenchymal transition and CRC progression21,22,23. Our results suggest that these newly identified CRC susceptibility SNPs or genes may regulate epithelial-mesenchymal transition.
The negative regulation of Wnt receptor signaling pathway (GO:0030178) is the third most significant signal among the 44 GO categories and the second most significant among the 17 regulatory pathways. Evidence shows that aberrant regulation of the Wnt/β-catenin signaling pathway can cause CRC24. Loss-of-function mutations in the APC gene are reported to be common in CRC and can cause inappropriate activation of Wnt signaling24. Recently, several studies have reviewed the involvement of Wnt signaling in CRC development25,26,27. Masuda et al. reported Wnt signaling to be a potential therapeutic target in CRC through targeting of TNIK28. Our results suggest that these newly identified CRC susceptibility SNPs or genes may regulate the Wnt receptor signaling pathway.
The positive regulation of nucleocytoplasmic transport pathway (GO:0046824) is the 8th most significant signal among the 44 GO categories and the 4th most significant among the 17 regulatory pathways. Hill et al. reviewed the mechanisms and role of nucleocytoplasmic transport in cancer therapy29. We also identified pathway-restricted SMAD protein phosphorylation (GO:0060389) and regulation of pathway-restricted SMAD protein phosphorylation (GO:0060393) as the 5th and 7th most significant association signals, respectively. Interestingly, evidence shows that protein phosphorylation is a post-translational modification central to cancer biology30. Protein phosphorylation affects most eukaryotic cellular processes, and its deregulation is considered a hallmark of cancer31.
We also found that these newly identified CRC susceptibility SNPs or genes may regulate five GO categories related to cell differentiation: regulation of fat cell differentiation (GO:0045598), mesenchymal cell differentiation (GO:0048762), regulation of striated muscle cell differentiation (GO:0051153), negative regulation of myoblast differentiation (GO:0045662) and cell morphogenesis involved in differentiation (GO:0000904). Previous evidence supports the involvement of differentiation in CRC. Breaking the balance between proliferation and differentiation in animal cells can cause cancer32. PPAR-γ is a nuclear receptor with a dominant regulatory role in the differentiation of cells of the adipose lineage33, and it can modulate the growth and differentiation of CRC cells33. Differentiated human CRC cells protect tumor-initiating cells from irinotecan34. The resistance of colorectal tumors to irinotecan requires the cooperative action of tumor-initiating ALDHhigh/ABCB1negative cells and their differentiated, drug-expelling, ALDHlow/ABCB1positive daughter cells34. The calcium activated chloride channel A1 (CLCA1) is a member of the calcium sensitive chloride conductance family of proteins and is expressed mainly in the colon32. A recent study shows that CLCA1 plays an important role in the differentiation and proliferation of Caco-2 cells, can regulate the transition from proliferation to differentiation in CRC and may be a potential diagnostic marker for CRC prognosis32.
Taken together, our findings suggest that most CRC susceptibility variants are located in the intronic regions of protein-coding genes and are not correlated with protein-coding changes. Most of these 120 CRC susceptibility genes are involved in various regulatory pathways. Our results may provide further insight into the underlying genetic mechanisms for these newly identified CRC susceptibility variants.
Materials and Methods
CRC susceptibility variants
The CRC susceptibility variants were obtained from the NHGRI GWAS catalog, which collects the results of published GWAS in an online database18. We selected 85 unique CRC susceptibility variants with P < 1.00E-05 from the GWAS of CRC, CRC (calcium intake interaction) and CRC (diet interaction).
Data preprocessing
In the NHGRI GWAS catalog, these 85 unique CRC susceptibility variants were successfully mapped to 167 nearest upstream and downstream genes. We further analyzed these 167 genes and obtained 106 unique CRC susceptibility gene IDs. The detailed information is given in Table 1.
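The idea of a nearest upstream and downstream gene mapping can be illustrated with a minimal sketch. This is not the procedure used by the GWAS catalog itself: the function name, the gene coordinates and the SNP labels below are hypothetical, genes are assumed to be non-overlapping and sorted by start position on a single chromosome, and reporting only the containing gene for a SNP that lies inside a gene is an assumption of this sketch.

```python
def flanking_genes(snp_pos, genes):
    """Return the genes flanking a SNP position on one chromosome.

    genes: list of (start, end, name), non-overlapping and sorted by start.
    Assumption of this sketch: a SNP inside a gene maps to that gene only.
    """
    upstream = None
    downstream = None
    for start, end, name in genes:
        if start <= snp_pos <= end:
            return (name,)                 # SNP lies within this gene
        if end < snp_pos:
            upstream = name                # last gene ending before the SNP
        elif downstream is None:
            downstream = name              # first gene starting after the SNP
    return tuple(g for g in (upstream, downstream) if g is not None)


# Hypothetical coordinates and identifiers, for illustration only.
genes = [(100, 200, "GENE_A"), (500, 650, "GENE_B"), (900, 1000, "GENE_C")]
snps = {"rs_hypo1": 300, "rs_hypo2": 700, "rs_hypo3": 550}

mapped = {rs: flanking_genes(pos, genes) for rs, pos in snps.items()}

# Collapse the per-SNP gene lists into one set of unique gene IDs.
unique_genes = sorted({g for hits in mapped.values() for g in hits})
print(mapped)
print(unique_genes)
```

Because neighboring SNPs often share flanking genes, the number of unique genes is smaller than the number of SNP-gene pairs, which is the same kind of reduction as from 167 mapped genes to 106 unique gene IDs.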
Mapping SNPs to genes using the ProxyGeneLD
In addition to the nearest upstream and downstream gene method, we also used a Perl program named ProxyGeneLD. ProxyGeneLD maps these 85 SNPs to their corresponding genes using the linkage disequilibrium (LD) information from the HapMap genotyping data (HapMap phase II Europe (CEU), release 22)35. For details of the algorithm, please refer to the original study35.
CRC susceptibility genes
Here, we mapped these 85 SNPs to their corresponding genes using both methods described above. The final CRC susceptibility gene set is the union of the genes from both methods.
Pathway analysis using WebGestalt
We used the GO pathways in the WebGestalt database for pathway analysis19. The hypergeometric test was used to detect overrepresentation of CRC susceptibility genes among all of the genes in a given pathway19. The reference gene list was the entire Entrez gene set. The minimum number of genes for a category was 3. The FDR method was used to correct for multiple testing. GO pathways with an adjusted P < 0.05 were considered to be significantly associated with CRC.
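The enrichment test described above can be sketched as follows. This is a minimal illustration and not the WebGestalt implementation: the function names, the toy category sizes and the category labels are hypothetical, the FDR correction is assumed here to be the Benjamini-Hochberg procedure, and the minimum of 3 genes is interpreted as the number of listed genes falling in a category, which is also an assumption.

```python
from math import comb


def hypergeom_upper_tail(k, n, K, N):
    """P(X >= k) for n genes drawn from N reference genes, K of them in the category."""
    upper = min(n, K)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / comb(N, n)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted P values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Toy numbers, for illustration only (not the study's data).
N = 1000          # reference gene set size
n = 20            # size of the susceptibility gene list
MIN_GENES = 3     # minimum number of listed genes in a category (assumed meaning)
# category -> (genes in category, listed genes in category)
categories = {
    "GO:hypothetical_1": (10, 4),
    "GO:hypothetical_2": (50, 3),
    "GO:hypothetical_3": (200, 2),   # dropped: fewer than MIN_GENES listed genes
}

raw = {name: hypergeom_upper_tail(k, n, K, N)
       for name, (K, k) in categories.items() if k >= MIN_GENES}
adj = dict(zip(raw, benjamini_hochberg(list(raw.values()))))
significant = [name for name, p in adj.items() if p < 0.05]
print(adj)
print(significant)
```

The upper tail is used because the question is whether a category contains at least as many listed genes as observed, compared with what random draws from the reference set would give.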
Additional Information
How to cite this article: Lu, X. et al. Colorectal cancer risk genes are functionally enriched in regulatory pathways. Sci. Rep. 6, 25347; doi: 10.1038/srep25347 (2016).
References
Lindblom, A. et al. Colorectal cancer as a complex disease: defining at-risk subjects in the general population-a preventive strategy. Expert Rev Anticancer Ther 4, 377–385 (2004).
Quan, B. et al. Pathway analysis of genome-wide association study and transcriptome data highlights new biological pathways in colorectal cancer. Mol Genet Genomics 290, 603–610 (2015).
Yao, L., Tak, Y. G., Berman, B. P. & Farnham, P. J. Functional annotation of colon cancer risk SNPs. Nat Commun 5, 5114 (2014).
Zanke, B. W. et al. Genome-wide association scan identifies a colorectal cancer susceptibility locus on chromosome 8q24. Nat. Genet. 39, 989–994 (2007).
Tomlinson, I. et al. A genome-wide association scan of tag SNPs identifies a susceptibility variant for colorectal cancer at 8q24.21. Nat. Genet. 39, 984–988 (2007).
Tomlinson, I. P. et al. A genome-wide association study identifies colorectal cancer susceptibility loci on chromosomes 10p14 and 8q23.3. Nat. Genet. 40, 623–630 (2008).
Cui, R. et al. Common variant in 6q26-q27 is associated with distal colon cancer in an Asian population. Gut 60, 799–805 (2011).
Peters, U. et al. Identification of Genetic Susceptibility Loci for Colorectal Tumors in a Genome-Wide Meta-analysis. Gastroenterology 144, (799–807) e724 (2013).
Tenesa, A. et al. Genome-wide association scan identifies a colorectal cancer susceptibility locus on 11q23 and replicates risk loci at 8q24 and 18q21. Nat. Genet. 40, 631–637 (2008).
Broderick, P. et al. A genome-wide association study shows that common alleles of SMAD7 influence colorectal cancer risk. Nat. Genet. 39, 1315–1317 (2007).
Peters, U. et al. Meta-analysis of new genome-wide association studies of colorectal cancer risk. Hum. Genet. 131, 217–234 (2012).
Liao, M. et al. Analyzing large-scale samples confirms the association between rs16892766 polymorphism and colorectal cancer susceptibility. Sci Rep 5, 7957 (2015).
He, D. et al. Analyzing large-scale samples highlights significant association between rs10411210 polymorphism and colorectal cancer. Biomed Pharmacother 74, 164–168 (2015).
Loo, L. W. et al. cis-Expression QTL analysis of established colorectal cancer risk variants in colon tumors and adjacent normal tissue. PLoS One 7, e30477 (2012).
Closa, A. et al. Identification of candidate susceptibility genes for colorectal cancer through eQTL analysis. Carcinogenesis 35, 2039–2046 (2014).
Lonsdale, J. et al. The Genotype-Tissue Expression (GTEx) project. Nat. Genet. 45, 580–585 (2013).
Hindorff, L. A. et al. Potential etiologic and functional implications of genome-wide association loci for human diseases and traits. Proc. Natl. Acad. Sci. USA 106, 9362–9367 (2009).
Welter, D. et al. The NHGRI GWAS Catalog, a curated resource of SNP-trait associations. Nucleic Acids Res 42, D1001–1006 (2014).
Zhang, B., Kirov, S. & Snoddy, J. WebGestalt: an integrated system for exploring gene sets in various biological contexts. Nucleic Acids Res 33, W741–748 (2005).
Brabletz, T. et al. Invasion and metastasis in colorectal cancer: epithelial-mesenchymal transition, mesenchymal-epithelial transition, stem cells and beta-catenin. Cells Tissues Organs 179, 56–65 (2005).
Bates, R. C. Colorectal cancer progression: integrin alphavbeta6 and the epithelial-mesenchymal transition (EMT). Cell Cycle 4, 1350–1352 (2005).
Bates, R. C. & Mercurio, A. M. The epithelial-mesenchymal transition (EMT) and colorectal cancer progression. Cancer Biol Ther 4, 365–370 (2005).
Cao, H., Xu, E., Liu, H., Wan, L. & Lai, M. Epithelial-mesenchymal transition in colorectal cancer metastasis: A system review. Pathol. Res. Pract. 211, 557–569 (2015).
Lemieux, E., Cagnol, S., Beaudry, K., Carrier, J. & Rivard, N. Oncogenic KRAS signalling promotes the Wnt/beta-catenin pathway through LRP6 in colorectal cancer. Oncogene 34, 4914–4927 (2015).
Schneikert, J. & Behrens, J. The canonical Wnt signalling pathway and its APC partner in colon cancer development. Gut 56, 417–425 (2007).
Najdi, R., Holcombe, R. F. & Waterman, M. L. Wnt signaling and colon carcinogenesis: beyond APC. J Carcinog 10, 5 (2011).
Segditsas, S. & Tomlinson, I. Colorectal cancer and genetic alterations in the Wnt pathway. Oncogene 25, 7531–7537 (2006).
Masuda, M., Sawa, M. & Yamada, T. Therapeutic targets in the Wnt signaling pathway: Feasibility of targeting TNIK in colorectal cancer. Pharmacol Ther 156, 1–9 (2015).
Hill, R., Cautain, B., de Pedro, N. & Link, W. Targeting nucleocytoplasmic transport in cancer therapy. Oncotarget 5, 11–28 (2014).
Reimand, J., Wagih, O. & Bader, G. D. The mutational landscape of phosphorylation signaling in cancer. Sci Rep 3, 2651 (2013).
Gamez-Pozo, A. et al. Protein phosphorylation analysis in archival clinical cancer samples by shotgun and targeted proteomics approaches. Mol Biosyst 7, 2368–2374 (2011).
Yang, B., Cao, L., Liu, B., McCaig, C. D. & Pu, J. The transition from proliferation to differentiation in colorectal cancer is regulated by the calcium activated chloride channel A1. PLoS One 8, e60861 (2013).
Sarraf, P. et al. Differentiation and reversal of malignant changes in colon cancer through PPARgamma. Nat. Med. 4, 1046–1052 (1998).
Emmink, B. L. et al. Differentiated human colorectal cancer cells protect tumor-initiating cells from irinotecan. Gastroenterology 141, 269–278 (2011).
Hong, M. G., Pawitan, Y., Magnusson, P. K. & Prince, J. A. Strategies and issues in the detection of pathway enrichment in genome-wide association studies. Hum Genet 126, 289–301 (2009).
Acknowledgements
This work was supported by funding from the Chinese Medical Association (Grant No. 13050770462), Heilongjiang Postdoctoral Science Foundation (No.LBH-Z15159) and Scientific Foundation of the First Affiliated Hospital of Harbin Medical University (No.2016B008).
Author information
Contributions
Y.Y. and J.Z. conceived and initiated the project. X.L. and M.C. analyzed the data. Y.Y., J.Z., X.L., M.C. and S.H. wrote the manuscript. All authors reviewed the manuscript and contributed to the final manuscript.
Ethics declarations
Competing interests
The authors declare no competing financial interests.
Rights and permissions
This work is licensed under a Creative Commons Attribution 4.0 International License. The images or other third party material in this article are included in the article's Creative Commons license, unless indicated otherwise in the credit line; if the material is not included under the Creative Commons license, users will need to obtain permission from the license holder to reproduce the material. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/
About this article
Cite this article
Lu, X., Cao, M., Han, S. et al. Colorectal cancer risk genes are functionally enriched in regulatory pathways. Sci Rep 6, 25347 (2016). https://doi.org/10.1038/srep25347