Introduction

Colorectal cancer (CRC) is the third most common form of cancer and the second leading cause of cancer-related death in the western world1,2. CRC is also a leading cause of cancer-related deaths in the United States, where its lifetime risk is about 7%1,3. CRC is a common complex disease caused by the combination of genetic variants and environmental factors1. Genome-wide association studies (GWAS) are considered a new and powerful approach to detecting the genetic variants underlying human complex diseases. Recent GWAS have reported novel CRC susceptibility single nucleotide polymorphisms (SNPs) with genome-wide significance (P < 5.00E-08), and these SNPs have been replicated by meta-analysis methods4,5,6,7,8,9,10,11,12,13.

In 2012, Loo et al. conducted a cis-expression quantitative trait loci (cis-eQTL) analysis to investigate the possible regulatory functions of 19 CRC risk variants on the expression of neighboring genes (<2 Mb up- or down-stream)14. They identified three variants, rs10795668, rs4444235 and rs9929218, that were significantly associated with the expression levels of nearby genes14. In 2014, Closa et al. analyzed the association between 26 CRC SNPs and the expression of genes within a 2 Mb region (cis-eQTLs) using 47 healthy colonic mucosa tissues, 97 normal mucosa tissues adjacent to colon cancer and 97 paired tumor tissues15. Using Bonferroni correction, they identified three significant cis-eQTLs: rs3802842 in 11q23.1, associated with the expression of C11orf53, COLCA1 and COLCA2; rs7136702 in 12q13.12, associated with the expression of DIP2B; and rs5934683 in Xp22.3, associated with the expression of SHROOM2 and GPR143. Closa et al. also reported other SNPs, including rs7130173 for 11q23.1 and rs61927768 for 12q13.12, which are in linkage disequilibrium (LD) with rs3802842 and rs7136702, are more strongly associated with the expression of the identified genes and are better functional candidates. Also in 2014, Yao et al. selected 25 CRC SNPs and tested the hypothesis that the CRC SNPs and/or correlated SNPs lie in elements that regulate gene expression3. They identified 23 promoters, 28 enhancers and 66 putative target genes of the risk-associated enhancers3.

Evidence shows that most variants for common human diseases are not correlated with protein-coding changes, indicating that susceptibility variants in regulatory regions may contribute to disease phenotypes16. For CRC, most risk variants also reside outside the coding regions of genes3,14,15. As described above, comprehensive functional studies of the effects of CRC SNPs on nearby gene expression have been reported3,14,15. The National Human Genome Research Institute (NHGRI) GWAS catalog shows that 85 CRC susceptibility variants reaching suggestive association (P < 1.00E-05) have been identified to date17,18. However, the exact genetic mechanisms of these newly identified CRC susceptibility variants remain unclear. To investigate the potential regulatory functions of these 85 variants, we conducted a pathway analysis of the CRC susceptibility genes around them.

Results

CRC susceptibility genes from ProxyGeneLD

Using ProxyGeneLD and the LD information from the HapMap phase II European (CEU) panel, 74 of the 85 unique CRC susceptibility variants were found in HapMap and were successfully mapped to 53 unique CRC susceptibility genes (Table 1). The remaining 11 SNPs (rs11196172, rs73376930, rs11255841, rs10849432, rs35509282, rs140355816, rs34245511, rs12412391, rs4849303, rs57046232 and rs7999699) were not found in HapMap.

Table 1 The main information for 85 CRC susceptibility variants.

CRC susceptibility genes for pathway analysis

Using the nearest upstream and downstream gene method in the NHGRI GWAS catalog, we obtained 106 unique CRC susceptibility gene IDs (see Materials and Methods). We compared these 106 genes with the 53 unique CRC susceptibility genes from ProxyGeneLD and found 31 shared genes. In the end, we obtained 128 unique CRC susceptibility genes, the union of the genes from both methods.
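The gene counts above follow from inclusion-exclusion on the two gene sets (106 + 53 - 31 = 128). A minimal sketch of that set arithmetic is given here; the gene labels are hypothetical placeholders sized to match the reported counts, not the actual gene symbols, which are listed in Table 1.

```python
# Placeholder labels only (hypothetical); the real gene symbols are in Table 1.
shared = {f"shared_{i}" for i in range(31)}              # found by both methods
nearest_only = {f"nearest_{i}" for i in range(106 - 31)}  # nearest-gene method only
proxy_only = {f"proxy_{i}" for i in range(53 - 31)}       # ProxyGeneLD only

nearest_genes = shared | nearest_only       # 106 genes from the nearest-gene method
proxygeneld_genes = shared | proxy_only     # 53 genes from ProxyGeneLD

overlap = nearest_genes & proxygeneld_genes      # genes shared by both methods
union_genes = nearest_genes | proxygeneld_genes  # final susceptibility gene set

print(len(nearest_genes), len(proxygeneld_genes), len(overlap), len(union_genes))
# prints: 106 53 31 128
```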

Pathway analysis preprocessing

In the WebGestalt database, 120 of the 128 genes were successfully mapped to 120 unique Entrez Gene IDs19. The other 8 genes were mapped to multiple Entrez Gene IDs or could not be mapped to any Entrez Gene ID. The following pathway analysis is based on the 120 unique gene IDs.

Pathway analysis using GO database

Our pathway analysis in the GO database showed that these 120 CRC susceptibility genes were significantly enriched in 40 biological processes, 1 molecular function and 3 cellular components with adjusted P < 0.01. Seventeen of these 44 significant signals are regulatory pathways, such as regulation of epithelial to mesenchymal transition, negative regulation of Wnt receptor signaling pathway, regulation of pathway-restricted SMAD protein phosphorylation, positive regulation of nucleocytoplasmic transport, regulation of muscle organ development and positive regulation of intracellular protein transport. Interestingly, regulation of epithelial to mesenchymal transition (GO:0010717) is the most significant signal (Table 2). More detailed information, including the gene IDs, is given in Supplementary Table 1.

Table 2 Significant GO pathways from pathway analysis of 128 CRC susceptibility genes.

Discussion

To date, 85 CRC susceptibility variants with suggestive association (P < 1.00E-05) have been reported17,18. To investigate the genetic pathways in which these newly identified CRC susceptibility genes are significantly enriched, we conducted a functional annotation. Using two SNP-to-gene mapping methods, the nearest upstream and downstream gene method and ProxyGeneLD, we obtained 128 unique CRC susceptibility genes, 120 of which mapped to unique Entrez Gene IDs. We then conducted a pathway analysis of these genes in the GO database. We identified 44 GO categories, 17 of which are regulatory pathways.

Here, we identified the regulation of epithelial to mesenchymal transition (GO:0010717) as the most significant signal among all 44 GO categories and among the 17 regulatory pathways. It has been reported that epithelial-mesenchymal transition-like dedifferentiation of tumor cells is a characteristic of CRC invasion20. Several studies have reviewed the association between epithelial-mesenchymal transition and CRC progression21,22,23. Our results suggest that these newly identified CRC susceptibility SNPs or genes may regulate epithelial-mesenchymal transition.

The negative regulation of Wnt receptor signaling pathway (GO:0030178) is the third most significant signal among all 44 GO categories and the second most significant among the 17 regulatory pathways. Evidence shows that aberrant regulation of the Wnt/β-catenin signaling pathway can cause CRC24. Loss-of-function mutations in the APC gene are reported to be common in CRC and can cause inappropriate activation of Wnt signaling24. Recently, several studies have reviewed the involvement of Wnt signaling in CRC development25,26,27. Masuda et al. reported Wnt signaling as a potential therapeutic target in CRC through targeting of TNIK28. Here, our results suggest that these newly identified CRC susceptibility SNPs or genes may regulate the Wnt receptor signaling pathway.

The positive regulation of nucleocytoplasmic transport pathway (GO:0046824) is the 8th most significant signal among all 44 GO categories and the 4th most significant among the 17 regulatory pathways. Hill et al. reviewed the mechanisms and role of nucleocytoplasmic transport in cancer therapy29. We also identified pathway-restricted SMAD protein phosphorylation (GO:0060389) and regulation of pathway-restricted SMAD protein phosphorylation (GO:0060393) as the 5th and 7th most significant association signals, respectively. Interestingly, evidence shows that protein phosphorylation is a post-translational modification central to cancer biology30. Protein phosphorylation affects most eukaryotic cellular processes, and its deregulation is considered a hallmark of cancer31.

We also found that these newly identified CRC susceptibility SNPs or genes may regulate five GO categories related to cell differentiation: regulation of fat cell differentiation (GO:0045598), mesenchymal cell differentiation (GO:0048762), regulation of striated muscle cell differentiation (GO:0051153), negative regulation of myoblast differentiation (GO:0045662) and cell morphogenesis involved in differentiation (GO:0000904). Evidence supports the involvement of differentiation in CRC. Breaking the balance between proliferation and differentiation in animal cells can cause cancer32. PPAR-γ is a nuclear receptor with a dominant regulatory role in the differentiation of cells of the adipose lineage33. PPAR-γ can modulate the growth and differentiation of CRC cells33. Differentiated human CRC cells protect tumor-initiating cells from irinotecan34. The resistance of colorectal tumors to irinotecan requires the cooperative action of tumor-initiating ALDHhigh/ABCB1negative cells and their differentiated, drug-expelling, ALDHlow/ABCB1positive daughter cells34. The calcium-activated chloride channel A1 (CLCA1) is a member of the calcium-sensitive chloride conductance family of proteins and is expressed mainly in the colon32. A recent study shows that CLCA1 plays an important role in the differentiation and proliferation of Caco-2 cells, can regulate the transition from proliferation to differentiation in CRC and may be a potential marker for CRC prognosis32.

Taken together, our findings suggest that most CRC susceptibility variants are located in the intronic regions of protein-coding genes and are not correlated with protein-coding changes. Most of these 120 CRC susceptibility genes are involved in various regulatory pathways. Our results may provide further insight into the underlying genetic mechanisms of these newly identified CRC susceptibility variants.

Materials and Methods

CRC susceptibility variants

The CRC susceptibility variants were obtained from the NHGRI GWAS catalog, an online database that collects the results of published GWAS18. We selected 85 unique CRC susceptibility variants with P < 1.00E-05 from the catalog entries for CRC, CRC (calcium intake interaction) and CRC (diet interaction).

Data preprocessing

In the NHGRI GWAS catalog, these 85 unique CRC susceptibility variants were mapped to 167 nearest upstream and downstream genes. After removing duplicates among these 167 genes, we obtained 106 unique CRC susceptibility gene IDs. Detailed information is given in Table 1.
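The idea behind the nearest upstream and downstream gene method can be illustrated with a simplified sketch. It is not the GWAS catalog's own implementation: the function name, the gene names and the coordinates are all hypothetical, a single chromosome is assumed, "upstream" is taken to mean the lower genomic coordinate regardless of strand, and genes overlapping the SNP are ignored.

```python
from typing import NamedTuple, Optional, Sequence, Tuple


class Gene(NamedTuple):
    name: str
    start: int  # first base of the gene on the chromosome
    end: int    # last base of the gene on the chromosome


def nearest_flanking_genes(
    snp_pos: int, genes: Sequence[Gene]
) -> Tuple[Optional[Gene], Optional[Gene]]:
    """Return the closest gene ending before the SNP and the closest gene
    starting after it (None when no gene exists on that side)."""
    before = [g for g in genes if g.end < snp_pos]
    after = [g for g in genes if g.start > snp_pos]
    upstream = max(before, key=lambda g: g.end, default=None)
    downstream = min(after, key=lambda g: g.start, default=None)
    return upstream, downstream


# Toy coordinates (hypothetical, not real annotation).
toy_genes = [
    Gene("GENE_A", 1000, 2000),
    Gene("GENE_B", 5000, 6000),
    Gene("GENE_C", 9000, 9500),
]

up, down = nearest_flanking_genes(7000, toy_genes)
print(up.name, down.name)  # prints: GENE_B GENE_C
```

Applying such a lookup to every SNP and collecting the distinct gene names is what reduces a list of flanking genes to a set of unique gene IDs.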

Mapping SNPs to genes using the ProxyGeneLD

In addition to the nearest upstream and downstream gene method, we used a Perl program named ProxyGeneLD. ProxyGeneLD maps SNPs to their corresponding genes using the linkage disequilibrium (LD) information from the HapMap genotyping data (HapMap phase II European (CEU) panel, release 22)35. For details of the algorithm, please refer to the original study35.

CRC susceptibility genes

We mapped these 85 SNPs to their corresponding genes using both methods described above. The final CRC susceptibility gene set is the union of the genes from both methods.

Pathway analysis using WebGestalt

We used the GO pathways in the WebGestalt database for pathway analysis19. The hypergeometric test was used to detect the overrepresentation of CRC susceptibility genes among all of the genes in a given pathway19. The reference gene list was the entire Entrez gene set. The minimum number of genes for a category was 3. The FDR method was used to correct for multiple testing. GO pathways with an adjusted P < 0.05 were considered to be significantly associated with CRC.
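The enrichment test described above can be sketched in a few lines. This is an illustration only and does not reproduce WebGestalt's code. It assumes that the FDR correction is the Benjamini-Hochberg procedure (the text does not name the method), that the minimum of 3 genes applies to the observed overlap, and that the reference set holds 20,000 genes; the category sizes and overlaps are invented toy values.

```python
from math import comb


def hypergeom_pvalue(N: int, K: int, n: int, k: int) -> float:
    """Upper-tail hypergeometric P(X >= k): N reference genes, K of them in
    the category, n genes in the study list, k study genes in the category."""
    total = comb(N, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return tail / total


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted P values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


REFERENCE_SIZE = 20000  # assumed size of the Entrez reference set (hypothetical)
GENE_LIST_SIZE = 120    # mapped CRC susceptibility genes
MIN_GENES = 3           # minimum number of genes for a category

# Toy categories: name -> (category size K, observed overlap k). Invented values.
toy_categories = {
    "category_A": (50, 6),
    "category_B": (400, 4),
    "category_C": (1000, 7),
}

tested = {name: kk for name, kk in toy_categories.items() if kk[1] >= MIN_GENES}
raw = [hypergeom_pvalue(REFERENCE_SIZE, K, GENE_LIST_SIZE, k) for K, k in tested.values()]
adj = benjamini_hochberg(raw)

for name, p, q in zip(tested, raw, adj):
    print(f"{name}: raw P = {p:.3g}, adjusted P = {q:.3g}, significant = {q < 0.05}")
```

Exact integer binomial coefficients are used so that the tail sum does not underflow for large reference sets; a statistics library would normally be preferred in practice.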

Additional Information

How to cite this article: Lu, X. et al. Colorectal cancer risk genes are functionally enriched in regulatory pathways. Sci. Rep. 6, 25347; doi: 10.1038/srep25347 (2016).