Multi-tissue expression and splicing data prioritise anatomical subsite- and sex-specific colorectal cancer susceptibility genes

Hazelwood, Emma; Canson, Daffodil M.; Deslandes, Benedita; Wang, Xuemin; Kho, Pik Fang; Legge, Danny; Constantinescu, Andrei-Emil; Lee, Matthew A.; Bishop, D. Timothy; Chan, Andrew T.; Gruber, Stephen B.; Hampe, Jochen; Le Marchand, Loic; Woods, Michael O.; Pai, Rish K.; Schmit, Stephanie L.; Figueiredo, Jane C.; Zheng, Wei; Huyghe, Jeroen R.; Murphy, Neil; Gunter, Marc J.; Richardson, Tom G.; Whitehall, Vicki L. J.; Vincent, Emma E.; Glubb, Dylan M.; O’Mara, Tracy A.

doi:10.1038/s41467-025-60275-6

Article
Open access
Published: 30 May 2025

Nature Communications volume 16, Article number: 5043 (2025)

Abstract

Genome-wide association studies have suggested numerous colorectal cancer (CRC) susceptibility genes, but their causality and therapeutic potential remain unclear. To prioritise causal associations between gene expression/splicing and CRC risk (52,775 cases; 45,940 controls), we perform a transcriptome-wide association study (TWAS) across six tissues with Mendelian randomisation and colocalisation, integrating sex- and anatomical subsite-specific analyses. Here we reveal 37 genes with robust causal links to CRC risk, ten of which have not previously been reported by TWAS. Most likely causal genes with evidence of cancer cell dependency show elevated expression linked to risk, suggesting therapeutic potential. Notably, SEMA4D, encoding a protein targeted by an investigational CRC therapy, emerges as a key risk gene. We also identify a female-specific association with CRC risk for CCM2 expression and subsite-specific associations, including LAMC1 with rectal cancer risk. These findings offer valuable insights into CRC molecular mechanisms and support promising therapeutic avenues.

Introduction

Colorectal cancer (CRC) is the third most common cancer worldwide and the second most common cause of cancer-related death¹. There are several established risk factors for CRC, including obesity, alcohol consumption and tobacco use^{2,3,4,5,6,7,8,9}, and there is evidence of heterogeneity by sex and anatomical site^{2,10}. However, the biological pathways that causally affect CRC development remain poorly understood, which has limited the ability to design suitable therapeutic interventions for prevention and treatment^{2,11,12}. Indeed, understanding the genetics underlying disease susceptibility has become an important area of research; drugs with genetic support have been shown to be twice as likely to be successful in clinical trials^{13,14}.

Genome-wide association studies (GWAS) have identified common genetic risk variants at over 200 genetic loci associated with CRC risk, including those associated with anatomical subsite-specific CRC^10,15,16. However, the mechanisms by which these genetic variants affect disease development are generally unknown, hindering translation of these results into clinical applications. Most CRC genetic variants are located outside of coding sequences and their effects are assumed to be mediated through regulation of gene expression, adding complexity to the process of linking variants to the target gene. Given the potential to identify causal disease targets, establishing CRC susceptibility genes from GWAS presents an important opportunity for the development of new therapeutic targets. Indeed, studies have shown that genes or proteins identified through GWAS, or other genetic studies, of clinical phenotypes are more likely to be targeted by drugs approved for corresponding indications, compared to targets lacking such evidence^13,14.

Transcriptome-wide association studies (TWAS) are a form of post-GWAS analysis that establishes associations between gene expression and traits. In brief, gene expression is imputed to GWAS of traits of interest (here, CRC risk) using genetic variants which have been previously identified as being associated with gene expression in relevant tissues. Given the difficulty in accessing solid tissues for gene expression analyses, TWAS using these tissues are often limited by small sample sizes. S-MultiXcan and joint tissue imputation (JTI) are two TWAS methods which address this issue by incorporating information across multiple tissues to maximise statistical power^17,18. Including multiple tissues in a single analysis also allows for the identification of the relevant biological tissue for the gene identified—which is important information for drug development. Notably, the S-MultiXcan approach also facilitates analysis of trait associations with alternative splicing events (i.e. processes producing distinct transcripts from the same gene). Alternative splicing is an often neglected mechanism in linking genes to traits despite evidence suggesting that up to ~30% of GWAS signals may mediate their effects through splicing¹⁹.
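The imputation step described above can be sketched for a single gene in a single tissue. The example below uses the widely used summary-statistic approximation in which the gene-level Z-score is a weighted sum of GWAS variant Z-scores; it is an illustration only, not the multi-tissue S-MultiXcan or JTI procedure, and the weights, covariance matrix and Z-scores are hypothetical inputs rather than values from this study.

```python
import math
from typing import List


def predicted_expression_variance(weights: List[float], cov: List[List[float]]) -> float:
    """Variance of genetically predicted expression, w' * Cov * w."""
    n = len(weights)
    return sum(weights[i] * cov[i][j] * weights[j] for i in range(n) for j in range(n))


def gene_level_z(weights: List[float], cov: List[List[float]], gwas_z: List[float]) -> float:
    """Approximate gene-trait Z-score from GWAS summary statistics.

    weights : prediction-model weights for each variant (hypothetical values)
    cov     : variant genotype covariance matrix from a reference panel
    gwas_z  : per-variant GWAS Z-scores (beta / standard error)
    """
    var_g = predicted_expression_variance(weights, cov)
    if var_g <= 0:
        raise ValueError("predicted expression variance must be positive")
    sd_g = math.sqrt(var_g)
    # Each variant contributes weight * (variant SD / predicted-expression SD) * Z.
    return sum(
        w * math.sqrt(cov[i][i]) / sd_g * z
        for i, (w, z) in enumerate(zip(weights, gwas_z))
    )
```

With a single variant in the model, the gene-level Z-score reduces to the variant's own GWAS Z-score, signed by the direction of the prediction weight.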

TWAS have successfully identified potential susceptibility genes for many cancers, including breast²⁰, endometrial²¹, and CRC^{15,22,23,24,25}. However, no CRC TWAS performed thus far has stratified by anatomical subsite or sex, which are important aspects of CRC development^8,10,26. Additionally, TWAS for CRC have often lacked a causal framework analysis to account for bias from residual linkage disequilibrium between genetic variants^15,25. Consequently, it is likely that some previously identified genes represent spurious associations. Identifying genes that causally affect disease development is essential for revealing novel and effective avenues for CRC therapy and treatment.

In this study, we perform comprehensive multi-tissue expression and splicing TWAS analyses (outlined in Supplementary Fig. 1) to identify likely causal genes involved in CRC susceptibility, with a focus on sex- and anatomical subsite-specific associations. Here, we identify 37 genes with robust causal associations with CRC risk through a causal framework using Mendelian randomisation (MR) and genetic colocalisation. We highlight subsite-specific effects, such as rectal cancer risk linked to LAMC1, a clinically actionable drug target, and identify CCM2 expression as a female-specific CRC risk factor involved in progesterone signalling. Our framework also prioritises SEMA4D, a previously unreported CRC susceptibility gene encoding a protein targeted by investigational cancer therapies. Additionally, we evaluate the impact of established drug targets on CRC risk by applying the same framework to 1163 genes encoding proteins targeted by approved or clinically studied drugs²⁷ and prioritise four such genes. Collectively, our findings provide important insights into the molecular mechanisms underlying CRC risk and reveal promising avenues for the development of new therapeutic strategies.

Results

Multi-tissue TWAS analyses

To identify genes associated with CRC risk at both the expression and splicing level, we used two multi-tissue TWAS methods: S-MultiXcan and JTI. For S-MultiXcan, we imputed gene expression using expression quantitative trait loci (eQTLs) and splicing events using splicing quantitative trait loci (sQTLs). For JTI, we imputed gene expression only, as predictive models are not currently available for splicing events. For all TWAS approaches, gene expression or splicing events were imputed using data from the GTEx Project (version 8)²⁸. We performed TWAS analyses using data from six tissues previously linked to CRC (subcutaneous and visceral adipose, lymphocytes, and whole blood) or directly relevant to CRC (sigmoid and transverse colon). Associations were tested with risk of overall CRC, as well as sex- or subsite-specific disease. CRC anatomical subsites were defined as per Huyghe et al.¹⁰ (see “Methods”). Briefly, proximal colon, distal colon and rectal are mutually exclusive anatomical subsites designated by tumour location, whereas colon comprises proximal colon and distal colon tumours, as well as colon cancers with unspecified location.

Across all three multi-tissue TWAS analyses, 112 unique genes were associated with CRC risk after Bonferroni correction (p < 3.91 × 10⁻⁷ in S-MultiXcan eQTL analysis; p < 5.49 × 10⁻⁷ in S-MultiXcan sQTL analysis; p < 6.01 × 10⁻⁸ in JTI analysis; Supplementary Fig. 2 and Supplementary Data 1–3). Of these genes, 75 were identified in the eQTL TWAS analyses, with 30 identified by both JTI and S-MultiXcan approaches. The splicing S-MultiXcan analysis revealed 144 unique splicing events associated with CRC risk, mapping to 60 genes, 23 of which were also identified in at least one of the eQTL TWAS analyses. None of the genes encoding proteins targeted by clinically studied drugs (i.e. ‘druggable genes’) passed correction for multiple testing in any of the TWAS analyses, but 772 demonstrated nominal associations (p < 0.05).
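As a rough illustration of how the Bonferroni thresholds above relate to the number of tests, the sketch below divides a family-wise error rate by the number of tests and, conversely, recovers the approximate number of tests implied by a reported threshold. The family-wise error rate of 0.05 is an assumption; neither it nor the number of tests per analysis is stated in this section.

```python
def bonferroni_threshold(n_tests: int, alpha: float = 0.05) -> float:
    """Per-test significance threshold under Bonferroni correction.

    alpha = 0.05 is an assumed family-wise error rate.
    """
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


def implied_number_of_tests(threshold: float, alpha: float = 0.05) -> int:
    """Approximate number of tests implied by a reported threshold."""
    if threshold <= 0:
        raise ValueError("threshold must be positive")
    return round(alpha / threshold)


# Thresholds reported in the text for each TWAS analysis.
reported_thresholds = {
    "S-MultiXcan eQTL": 3.91e-7,
    "S-MultiXcan sQTL": 5.49e-7,
    "JTI": 6.01e-8,
}

for analysis, threshold in reported_thresholds.items():
    print(analysis, implied_number_of_tests(threshold))
```

Because the reported thresholds are rounded to three significant figures, the implied counts are approximate.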

MR analyses

To evaluate the causal effect of gene expression on CRC risk, we performed MR, which uses germline genetic variants as instrumental variables to provide causal estimates (subject to certain assumptions, see Methods)^29,30. Of the 112 genes identified by TWAS, 46 had available cis-genetic variants to proxy gene expression in at least one of the a priori selected tissues (minimum F-statistic: 30, median: 67). All genes had a single genetic instrument other than two genes (MICA and MICB), both of which had two genetic instruments. Among the genes with suitable genetic instruments, 29 passed multiple testing in MR analyses (Supplementary Data 4 and Supplementary Fig. 3). Of the 144 splicing events identified in the S-MultiXcan analysis, 37 had available genetic instruments to proxy the splicing event for MR analyses (minimum F statistic: 30, median: 63), with 27 passing the Bonferroni threshold, corresponding to 17 genes (Supplementary Data 5 and Supplementary Fig. 4). We also included the druggable genes in our causal framework analyses that were nominally associated with CRC risk from TWAS analysis, of which 380 had genetic instruments available according to our thresholds outlined in Methods (minimum F-statistic: 30, median: 60). The expression of seven of these genes passed multiple testing in MR analyses (Supplementary Data 6 and Supplementary Fig. 5).
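For a gene proxied by a single genetic instrument, a common MR estimator is the Wald ratio, and instrument strength is often summarised with an F-statistic. The sketch below shows both under stated assumptions: the first-order delta-method standard error and the approximation F = (beta / SE)² are standard choices, but this section does not state which estimator or formula the authors used, and the `Instrument` class and its field names are hypothetical. Genes with more than one instrument would need a combined estimator, which is not shown.

```python
import math
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Instrument:
    """Summary statistics for one variant (hypothetical container)."""
    beta_exposure: float  # variant effect on gene expression or splicing
    se_exposure: float
    beta_outcome: float   # variant effect on CRC risk (log odds)
    se_outcome: float


def f_statistic(inst: Instrument) -> float:
    """Approximate instrument strength as (beta / SE)^2 for the exposure."""
    return (inst.beta_exposure / inst.se_exposure) ** 2


def wald_ratio(inst: Instrument) -> Tuple[float, float, float]:
    """Return the Wald ratio estimate, its standard error and a two-sided p-value."""
    estimate = inst.beta_outcome / inst.beta_exposure
    # First-order delta method: ignores uncertainty in the exposure estimate.
    se = inst.se_outcome / abs(inst.beta_exposure)
    z = estimate / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return estimate, se, p_value
```

An instrument with an F-statistic of 30, the minimum reported above, corresponds to an exposure Z-score of about 5.5.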

Colocalisation analyses

Genetic colocalisation analysis can help assess the evidence for causal associations between traits by evaluating whether the same or distinct variant(s) underlie the association between two traits³¹. Colocalisation analyses were performed based on the tissues identified in the TWAS: if a gene was identified in all six tissues in the TWAS, colocalisation analysis was performed for all six tissues. Conversely, if a gene was identified in only one tissue in the TWAS, colocalisation was restricted to that single tissue, and so on. Of the 112 genes identified by TWAS, there was evidence for a shared causal variant between gene expression for 29 of these genes and CRC risk (H₄, posterior probability of a shared causal variant between the traits, >0.80; Supplementary Data 7), and for 19 splicing events that mapped to 12 genes (H₄ > 0.80; Supplementary Data 8). Of the 29 genes prioritised by MR analyses, 20 had been prioritised by the colocalisation analysis; and of the 27 splicing events prioritised by MR analyses, 12 were prioritised by the colocalisation analysis (corresponding to eight genes). Six druggable genes had evidence for a shared causal variant in colocalisation analyses (H₄ > 0.80) (Supplementary Data 9).
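The H₄ posterior probability used above can be illustrated with the standard approximate Bayes factor formulation of colocalisation under a single causal variant assumption. This is a minimal sketch, not the authors' pipeline: the prior probabilities below are the commonly used defaults of the coloc software and are an assumption here, as are the prior standard deviation passed to `log_abf` and all function names.

```python
import math
from typing import List


def log_sum_exp(xs: List[float]) -> float:
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def log_diff_exp(a: float, b: float) -> float:
    """log(exp(a) - exp(b)); returns -inf when the difference is not positive."""
    if b >= a:
        return float("-inf")
    return a + math.log1p(-math.exp(b - a))


def log_abf(beta: float, se: float, prior_sd: float) -> float:
    """Wakefield log approximate Bayes factor for one variant."""
    z = beta / se
    r = prior_sd ** 2 / (prior_sd ** 2 + se ** 2)
    return 0.5 * (math.log(1 - r) + r * z * z)


def coloc_posteriors(labf1: List[float], labf2: List[float],
                     p1: float = 1e-4, p2: float = 1e-4,
                     p12: float = 1e-5) -> List[float]:
    """Posterior probabilities for H0..H4 given per-variant log ABFs.

    labf1, labf2: log ABFs for the same variants in trait 1 and trait 2.
    p1, p2, p12 : assumed prior probabilities (not taken from this article).
    """
    if len(labf1) != len(labf2) or len(labf1) < 2:
        raise ValueError("need matching lists with at least two variants")
    both = [a + b for a, b in zip(labf1, labf2)]
    s1, s2, s12 = log_sum_exp(labf1), log_sum_exp(labf2), log_sum_exp(both)
    log_h = [
        0.0,                                                          # H0: no association
        math.log(p1) + s1,                                            # H1: trait 1 only
        math.log(p2) + s2,                                            # H2: trait 2 only
        math.log(p1) + math.log(p2) + log_diff_exp(s1 + s2, s12),     # H3: distinct variants
        math.log(p12) + s12,                                          # H4: shared variant
    ]
    total = log_sum_exp(log_h)
    return [math.exp(lh - total) for lh in log_h]
```

When both traits show their strongest evidence at the same variant, the H₄ posterior dominates; when the strongest evidence sits at different variants, H₃ dominates and the gene would fall below the 0.80 threshold.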

To avoid deprioritisation of CRC susceptibility genes or splicing events due to violations of the single causal variant assumption, we performed an additional colocalisation analysis using Pairwise Conditional Colocalisation (PWCoCo; described in “Methods”)³². In brief, we applied PWCoCo to any gene or splicing event that met the multiple testing threshold in the MR analysis but had an H₄ posterior probability ≤0.80 in the standard colocalisation analyses. This resulted in the inclusion of one additional gene based on expression (TCF19; Supplementary Data 10) and one additional splicing event (mapping to the gene LRRFIP2; Supplementary Data 11).

Likely causal associations with colorectal cancer risk

To identify likely causal gene associations with CRC risk, we used a stringent framework to prioritise genes: (1) passing Bonferroni correction in at least one TWAS analysis; (2) H₄ > 0.80 in genetic colocalisation analysis; and (3) passing Bonferroni correction in MR analysis or having no suitable genetic instruments available (Fig. 1a). Using this framework, we identified 37 genes with a likely causal association (Fig. 1b, Supplementary Fig. 6 and Table 1). Twenty likely causal susceptibility genes were identified solely through associations with expression and nine through associations with splicing alone. The largest magnitude of effect was observed for POU5F1B in the expression TWAS (Z-score in JTI = −13) and for COLCA1 in the splicing TWAS (Z-score in S-MultiXcan with sQTLs = 10). We performed functional enrichment analysis of the likely causal genes using g:Profiler³³ and found significant enrichment (p_adj < 0.05) for genes involved in POU domain binding and the mitochondrial complex IV assembly (Supplementary Data 12).
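The three prioritisation criteria can be expressed as a simple filter. The sketch below is illustrative only: the `GeneEvidence` record, its field names and the example values are hypothetical, and the H₄ value is assumed to be whichever colocalisation result is carried forward for the gene.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class GeneEvidence:
    """Evidence collected for one gene (hypothetical record)."""
    gene: str
    twas_bonferroni: bool          # criterion 1: passes Bonferroni in at least one TWAS
    coloc_h4: float                # criterion 2: posterior probability of a shared variant
    mr_bonferroni: Optional[bool]  # criterion 3: None means no suitable instrument


def is_likely_causal(ev: GeneEvidence, h4_threshold: float = 0.80) -> bool:
    """Apply the three criteria described in the text."""
    mr_ok = ev.mr_bonferroni is None or ev.mr_bonferroni
    return ev.twas_bonferroni and ev.coloc_h4 > h4_threshold and mr_ok


# Hypothetical examples, not results from the study.
candidates = [
    GeneEvidence("GENE_A", True, 0.93, True),   # passes all three criteria
    GeneEvidence("GENE_B", True, 0.91, None),   # no instrument, still eligible
    GeneEvidence("GENE_C", True, 0.55, True),   # fails colocalisation
    GeneEvidence("GENE_D", True, 0.95, False),  # fails MR
]
prioritised = [ev.gene for ev in candidates if is_likely_causal(ev)]
```

Encoding a missing instrument as `None` rather than `False` keeps the "no suitable genetic instruments" case distinct from a gene that was tested in MR and did not pass.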

**Fig. 1: Overview of multi-tissue TWAS, colocalisation, and MR-based gene prioritisation for colorectal cancer risk.**

Table 1 Summary table of prioritised genes

Since we used two different methods for the expression TWAS (i.e. S-MultiXcan and JTI), we evaluated whether genes identified by both methods were more likely to be prioritised by our framework (Fig. 1c and Supplementary Fig. 2). Of the 30 genes identified by both methods, 10 were prioritised (33%). In contrast, of the 19 gene expression associations identified by JTI alone, 12 were prioritised (63%), whereas only 2 of the 26 (8%) gene expression associations identified by S-MultiXcan alone were prioritised. These results suggest that JTI outperforms S-MultiXcan in prioritising genes with likely causal associations with CRC.

The likely causal genes included a previously unreported colorectal cancer susceptibility gene, SEMA4D, neither located at known colorectal cancer GWAS risk loci nor previously identified by colorectal cancer TWAS. A further ten genes were located at known colorectal cancer GWAS risk loci but had not been previously identified by colorectal cancer TWAS. Our analysis also revealed context-specific associations. Of the 37 likely causal genes, 23 showed tissue-specific associations (i.e. associations unique to expression or splicing in one tissue): five genes were found through analysis of subcutaneous adipose, one through visceral adipose, two through sigmoid colon, nine through transverse colon, three through lymphocytes and three through whole blood. Regarding anatomical subsites, two genes were exclusively associated with colon cancer risk (AAMP and ARPC2), three genes with both colon and proximal colon cancer risk (EPM2AIP1, MLH1 and RP11-129K12.1), one with distal colon cancer risk (ABCC2), one with proximal colon cancer risk (LRRFIP2) and three with rectal cancer (COLCA1, LAMC1 and GPATCH1) risk. For all but AAMP, differences in TWAS effect sizes for these genes were observed between subtypes (Figs. 2, 3). Lastly, one gene (CCM2) was specifically associated with female colorectal cancer risk (Fig. 2N).

**Fig. 2: Forest plots of JTI effect sizes across colorectal cancer anatomical subsites and sex for anatomical subsite- and sex-specific genes identified by JTI TWAS analysis.**

**Fig. 3: Forest plots of mean Z-score estimates from S-MultiXcan across colorectal cancer anatomical subsites for anatomical subsite-specific genes identified by S-MultiXcan (expression or splicing) TWAS analysis.**

For the druggable genes, we conducted an exploratory analysis focussing on genes that were nominally significant in at least one TWAS analysis. To prioritise genes for causality, we selected those with H₄ > 0.80 in genetic colocalisation analysis that also passed Bonferroni correction in MR analysis. This approach revealed four genes (GPBAR1, LTBR, PDCD1 and PTGER3) (Fig. 1a, Supplementary Fig. 4 and Table 1).

Splicing event annotation

To provide further support for likely causal splicing associations, we explored underlying splicing mechanisms. Using a bioinformatic splicing pipeline to analyse CRC GWAS risk variants for effects on the likely causal splicing events, we found that a single splicing event met the predetermined conditions indicative of a high-confidence splicing mechanism (see “Methods” for more information). This event, related to PLEKHG6 (intron_12_6317696_6317899; Supplementary Data 13), could be explained by rs1468603 (chr12:6317886C > T). Specifically, the T allele was predicted to activate an exonic cryptic acceptor, enhancing the inclusion of a truncated exon 10 (45 bp in-frame deletion) in PLEKHG6 (NM_001384598.1), corresponding to the intron_12_6317696_6317899 splicing event.

Evaluating drug targeting opportunities provided by likely causal susceptibility genes

In addition to specifically analysing druggable targets, we investigated the druggability of proteins encoded by the likely causal susceptibility genes using the Pharos³⁴ and Open Targets³⁵ platforms to identify drug repurposing opportunities for preclinical or clinical investigation. These databases identified proteins encoded by LAMC1 and SEMA4D as targets of clinically studied drugs. Laminin subunit gamma 1, encoded by LAMC1, is degraded by ocriplasmin, a recombinant proteinase drug used to treat vitreomacular adhesion. SEMA4D encodes semaphorin 4D which is inhibited by pepinemab, an antibody that has been clinically studied for treatment of several cancer types, including a phase I trial of CRC (Clinicaltrials.gov: NCT03373188). We also identified five genes (ABCC2, ATF1, FADS1, FEN1 and KLF5) whose protein products bind to small molecules, supporting their potential druggability.

We evaluated the potential for efficacy in therapeutic targeting of likely causal susceptibility genes by assessing whether their expression is required for CRC cell line viability. Using the BioGRID Open Repository on CRISPR Screens³⁶, we found CRC cell lines were dependent on 16 of the likely causal susceptibility genes, with nine genes demonstrating dependency in at least 15% of studies with available data (Supplementary Data 14). Among these 16 genes, 11 were identified through expression TWAS approaches (Table 1). Consistent with the dependency findings, increased expression of eight genes, including AAMP and FEN1, was associated with CRC risk. AAMP showed particularly consistent findings, with CRC cell lines demonstrating dependency on AAMP expression in 80% of the studies in which it was tested. CRC cell lines also showed frequent dependency on expression of FEN1 (48% of studies), which encodes a potentially druggable protein.

Shared causal pathways with known CRC risk factors

To investigate whether the likely causal susceptibility genes may relate to known CRC risk factors, we performed genetic colocalisation. We evaluated evidence for a shared causal variant between the expression of 28 likely causal susceptibility genes (i.e. those that passed both the colocalisation and MR thresholds, not including the nine genes that had robust evidence for splicing only) and each of four established CRC risk factors: body mass index (BMI), waist-to-hip ratio (WHR), alcohol consumption, and smoking initiation. Among these genes, we found evidence of colocalisation (posterior probability of H₄ > 0.80) for two genes (AAMP and TMBIM1) with WHR (Supplementary Data 15).

Discussion

Our analysis combined two multi-tissue TWAS methods with a causal framework to identify CRC susceptibility genes. Through this framework, we prioritised 37 genes with strong evidence for a causal role in colorectal cancer risk, with associations extending to specific disease subtypes and expression in distinct tissues, implicating the involvement of tissues outside the colon or rectum in CRC development. In addition, our analysis of the druggable genome revealed four genes with suggestive evidence for a causal role in colorectal cancer risk. The subsequent drug target analyses allowed us to highlight candidates for future investigation.

While previous TWAS for CRC have been conducted, these analyses have not been stratified by anatomical subsite or sex, which are important aspects of CRC aetiology. The importance of stratified analysis is demonstrated by our findings for a causal role of CCM2 in female-specific colorectal cancer. Cerebral cavernous malformation 2 (CCM2) is a component of the CCM signalling complex, which has a role in regulating several signalling cascades, including progesterone signalling^37,38. Notably, multiple studies have demonstrated a protective role for progesterone in CRC development (reviewed in Wenxuan et al.³⁹). Our findings of decreased CCM2 expression associating with increased CRC risk are consistent with this, supporting a potential sex-specific role for CCM2^37,38.

Nearly one third (11 of 37) of the susceptibility genes exhibited location-specific associations, highlighting the genetic heterogeneity of CRC. This subsite-level dissection provides a more nuanced understanding of this complex disease and underscores the importance of considering tumour location in genetic studies, with implications for developing more tailored treatment strategies. In addition, our findings are consistent with evidence from GWAS that genes at locus 3p22.2 (including MLH1 and EPM2AIP1) have proximal colon cancer-specific effects^{10,40,41}. Although loss-of-function MLH1 variants are known to be associated with proximal colon cancer, we found that increased MLH1 expression was associated with increased cancer risk. A similar, albeit only nominally significant, TWAS finding was previously reported²². Supporting these observations, it has been reported that MLH1 may have context-specific effects. For example, MLH1 has been found to be upregulated in mismatch repair-proficient CRC tumours and shown to have oncogenic effects in some contexts⁴². Nevertheless, further research is required to understand the direction of effect of MLH1 expression on proximal colon cancer risk.

Among the likely causal genes, SEMA4D emerged as a CRC susceptibility gene that is neither located at known CRC GWAS risk loci nor previously identified by CRC TWAS. SEMA4D was identified through association of its alternative splicing with colorectal cancer risk, highlighting the importance of studying this mechanism using TWAS approaches. SEMA4D encodes a protein with immunoregulatory activity⁴³, consistent with its association with CRC risk through splicing effects in lymphocytes, also highlighting a potential causal cell type. Moreover, in a preclinical mouse colon cancer model, antibody blockade of SEMA4D has been shown to enhance the infiltration of immune cells into tumours, thereby promoting anti-tumour immune responses⁴⁴. Importantly, our findings provide evidence to prioritise the clinical targeting of SEMA4D, currently being performed using an antibody treatment.

A further ten genes located at known CRC GWAS risk loci had not been previously identified by CRC TWAS. These findings may be due to the lack of anatomical subsite-stratified analyses in previous TWAS or to our inclusion of alternative splicing events. Indeed, four of these genes (including SEMA4D) were exclusively identified through splicing associations. Further supporting the relevance of our splicing analysis, we demonstrated a potential mechanism for PLEKHG6 splicing in CRC risk that involves the effect of a CRC GWAS SNP. These findings highlight the importance of incorporating splicing events in TWAS analyses, as they may reveal genes and mechanisms of genetic susceptibility that are not captured by gene expression alone.

LAMC1 emerged as another likely causal susceptibility gene encoding a target of a clinically studied drug (ocriplasmin). LAMC1 has previously been identified as a CRC susceptibility gene through GWAS and other approaches^15,45. The laminin family of proteins are key components of the basal membrane and have been implicated in CRC progression^46,47. We found genetically predicted increased expression of LAMC1 was associated with increased rectal cancer risk, providing support for therapeutic inhibition of LAMC1. Ocriplasmin, a synthetic form of plasmin which targets laminin, is currently used to treat eye-related diseases and is also in phase II trials for several other conditions, including stroke and deep vein thrombosis^48,49,50. While prior research has suggested ocriplasmin as a candidate drug for CRC treatment⁵¹, further drug development would be required due to the current need for its direct injection and its moderate stability⁵².

Evidence from publicly available data supports a role for several of the likely causal susceptibility genes in CRC, including CCM2 and SEMA4D as discussed. Furthermore, mechanistic studies at the 11q23.1 CRC GWAS locus have linked risk variation to POU2AF2 and demonstrated that this gene protects tuft cells in the colon while suppressing colonic tumourigenesis in a mouse model⁵³. This observation is consistent with our TWAS finding that decreased POU2AF2 expression is associated with increased CRC risk. Moreover, we have found that most likely causal susceptibility genes showing a dependency in CRC cell lines align with TWAS findings where increased expression was associated with increased CRC risk (e.g. AAMP and FEN1). This alignment underscores their relevance as candidate therapeutic targets. The most consistent findings of CRC dependency were for AAMP which encodes angio-associated migratory cell protein (AAMP), with a role in angiogenesis, cell migration⁵⁴, and CRC metastasis⁵⁵. We also found evidence for colocalisation of AAMP expression with WHR suggesting that AAMP may also impact CRC risk through effects on adipose distribution, or vice versa. Although there are no current inhibitors of AAMP, Open Targets indicates there is potential for inhibition through antibody or protein targeting chimera approaches. FEN1 also demonstrated consistent CRC dependency. The metallonuclease encoded by FEN1 has a role in DNA replication and double-strand break repair⁵⁶. Promisingly, FEN1 small molecule inhibitors have been developed that show anti-cancer effects in experimental models⁵⁷. These findings support the identification of druggable targets for CRC treatment, including corresponding candidate therapies or modalities, and provide valuable starting points for experimental validation and treatment development.

We also performed a comprehensive analysis of the “druggable genome”²⁷. We focussed on genes that were nominally significant in at least one TWAS analysis and prioritised genes with evidence of genetic colocalisation (H₄ > 0.80) with CRC risk and which passed Bonferroni correction in an MR analysis. This revealed suggestive evidence for a causal effect of expression of four genes (PDCD1, GPBAR1, PTGER3 and LTBR) on CRC risk. Among these, there were two tissue-specific associations observed in whole blood (GPBAR1 and PTGER3). Additionally, we found associations with unique anatomical subsite cancers: LTBR with risk of proximal colon cancer and PDCD1 with risk of rectal cancer. PDCD1 encodes programmed cell death 1 (PDCD-1 or PD-1) protein, which is targeted by inhibitors used to treat microsatellite instability-high or mismatch repair-deficient metastatic CRC^{58,59,60}. Our TWAS and MR analyses suggested that increased expression of PDCD1 (rather than decreased expression, which would mimic the effect of an inhibitor) reduced risk of rectal cancer. This conflicts with evidence that PDCD-1 suppresses the immune system’s ability to destroy cancer cells, as one would assume that in this case increased PDCD1 expression would increase (not decrease) cancer risk⁶¹. However, we note that we only see strong evidence for a causal role of PDCD1 expression in blood (not colon tissue) on cancer risk, suggesting that the mechanism linking PDCD1 expression and colorectal cancer risk may be more complex than the presumed local effects within colorectal tissue. PTGER3 encodes a receptor for prostaglandin E2 that is targeted by misoprostol, an approved drug for gastric ulcers and reflux disease that has shown efficacy in colon cancer xenograft models⁶². We replicated previous GWAS evidence that PTGER3 may have a role in proximal colon cancer and may be less relevant to rectal cancer¹⁰. LTBR encodes the tumour necrosis factor receptor lymphotoxin beta receptor (LTBR), which is targeted by an antibody agonist⁶³. However, an antibody antagonist is likely to be required for effective treatment, given that increased LTBR expression in several tissues was associated with risk of proximal colon cancer.

Our analysis aimed to robustly prioritise genes for CRC susceptibility by using multiple tissues alongside a causal framework. We combined two genetic epidemiological approaches to assess genes spuriously identified due to linkage disequilibrium (i.e. showing evidence for a causal role in MR but not colocalisation) and to identify possible non-causal biomarkers of disease or risk factors (i.e. those that colocalise but show null results in MR analyses). However, the sample sizes for available data for TWAS analyses are still relatively small compared to the CRC GWAS, which potentially impacts our ability to genetically predict gene expression and detect associations with CRC risk. In addition, our analyses were limited to genes with expression that can be predicted using available TWAS models, meaning some potentially causal genes may not be captured in our analyses. Additionally, many of our MR analyses were restricted to a single SNP, meaning we were unable to employ various “pleiotropy-robust” models to evaluate exclusion restriction assumptions. We did not exclude HLA in the MR analyses, which is a possible limitation due to the region’s high polymorphism and potential pleiotropic effects, which complicate causal interpretation. Linkage disequilibrium with other variants and unmeasured confounding factors further limit the ability to draw definitive conclusions. Furthermore, we did not evaluate the sensitivity of our colocalisation analyses to alternative window sizes or prior probabilities, which are important aspects of colocalisation analyses⁶⁴.

Our study also presents further limitations that could be addressed in future research: (1) our analyses were restricted to individuals of predominantly European ancestries, which limits the generalisability of our findings to other populations and contexts; (2) the MR analyses performed here assume linearity between gene expression and CRC risk, which may not capture more complex interactions and non-linear relationships; (3) the use of available summary data limited our ability to perform analyses with sex-specific gene expression data that could provide insights into differential CRC risk; and (4) similarly, because we used summary-level data, we were unable to evaluate interactions between sex and CRC subtype.

Given the increase in CRC worldwide, understanding the biological mechanisms leading to carcinogenesis is becoming increasingly important¹. Additionally, as more screening programmes are rolled out globally, opportunities to prevent CRC development in high-risk individuals are also increasing. Therefore, the identification of new pharmaceutical targets for the prevention and treatment of this disease remains a priority. Our analyses have identified genes with robust evidence for a potential causal role in CRC development, offering insights into its aetiology and presenting tangible opportunities for the exploration and development of new therapeutic strategies.

Methods

CRC GWAS

Supplementary Data 16 shows the GWAS used in all analyses. Summary genetic association data for CRC risk (52,775 cases, 45,940 controls) were obtained from a meta-analysis of the Colorectal Transdisciplinary Study (CORECT), the Colon Cancer Family Registry (CCFR), and the Genetics and Epidemiology of CRC (GECCO) consortium^10,16. Summary genetic association data were obtained stratified by site (colon, 28,736 cases; proximal colon, 14,416 cases; distal colon, 12,879 cases; and rectal, 14,150 cases; 43,099 controls) and sex (female, 24,594 cases, 23,936 controls; male, 28,271 cases, 22,351 controls). Sex was defined based on sex chromosomes and samples with discrepancies between reported and genotypic sex based on X chromosome heterozygosity were excluded^10,16. Colon cancer included proximal colon (any primary tumour arising in the caecum, ascending colon, hepatic flexure, or transverse colon), distal colon (any primary tumour arising in the splenic flexure, descending colon or sigmoid colon), and colon cases with unspecified site. Rectal cancer included any primary tumour arising in the rectum or rectosigmoid junction. CRC was classified using ICD-10 codes and most cases were incident CRC. All participants in the anatomical subsite-specific CRC analyses were of European ancestries, and approximately 92% of participants in the overall CRC GWAS were European (~8% were East Asian). Imputation of GWAS summary statistics was performed using the Michigan imputation server and HRC r1.0 reference panel. Regression models were adjusted for age, sex, genotyping platform, and genomic principal components as described previously¹⁶. All participants included in the CRC GWAS provided informed consent and ethics were approved by respective institutional review boards^10,16.