Figure 2: Cancer-associated promoters in GC.

(a) Cancer-associated promoters are frequently associated with non-RefSeq TSSs (‘cryptic promoters’). Cryptic promoter proportions associated with all promoters (‘total’) and promoters lost in cancer (‘loss’) are provided as references. Cancer-associated promoters are also associated with expressed non-RefSeq transcripts from RNAseq data (rightmost numbers). (b) Heatmap showing expression status (FPKM) of non-RefSeq transcripts exhibiting greater than fourfold expression changes between normal tissues and gastric tumours. The transcripts are associated with 192 cancer-associated promoters. (c) GREAT analysis demonstrating enriched gene categories for cancer-associated promoters. All enriched terms with P<6 × 10⁻⁶ from the original GREAT (ref. 34) output are listed. (d,e) Cryptic promoter-driven MET expression. RNAseq and K4me3 tracks are shown. (e) Close-up view of the cryptic promoter region. Mapped RNAseq reads that span exon junctions are connected with lines (forward read in blue, reverse read in red), showing representative ‘split’ RNAseq reads confirming linkage of the promoter to downstream MET exons. (f) MET functional domains. The predicted cryptic promoter-driven transcript encodes an N-terminal truncated protein lacking the Sema domain. (g) Cryptic promoter-driven NKX6-3 expression. RNAseq and K4me3 tracks are shown. RNAseq alignments are provided in Supplementary Fig. 13. (h) Cryptic promoter-driven HOXB9 expression. RNAseq alignments are provided in Supplementary Fig. 14. (i) Expression levels of K4me3-marked genes in GCs (n=185) and matched normals (n=89). A significant proportion of genes are upregulated in GC (upregulated genes=143; total target genes=218; one-sample proportion test P-value=5.68 × 10⁻⁶). (j) Survival analysis comparing patient groups with GCs exhibiting high and low expression of genes driven by cancer-associated promoters.
Kaplan–Meier survival analysis of clusters within the Singaporean cohort (total n=183) with ‘high’ (n=154) and ‘low’ (n=29) enrichment of the target gene signature. The signature is prognostic in this cohort (P=0.04, log-rank test), with worse prognosis observed for higher enrichment of the signature (hazard ratio (95% CI): 1.78 (1.02–3.13); Cox regression P-value=0.044).
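
The one-sample proportion test quoted in panel (i) can be sketched from the counts given in the legend (143 upregulated genes out of 218 target genes). The legend does not say which variant of the test was used, so the sketch below assumes a two-sided normal-approximation test against a null proportion of 0.5 with a Yates continuity correction; the function name and its parameters are illustrative and are not taken from the original analysis.

```python
import math

def one_sample_proportion_test(successes, total, p0=0.5, continuity=True):
    """Two-sided one-sample proportion test (normal approximation).

    Assumptions (not stated in the legend): null proportion p0 = 0.5 and
    a Yates continuity correction.
    """
    expected = total * p0
    gap = abs(successes - expected)
    # Yates correction, capped so it never exceeds the observed gap
    correction = min(0.5, gap) if continuity else 0.0
    z = (gap - correction) / math.sqrt(total * p0 * (1 - p0))
    # Two-sided standard normal tail: P(|Z| >= z) = erfc(z / sqrt(2))
    p_value = math.erfc(z / math.sqrt(2))
    return z, p_value

# Counts taken from panel (i): 143 upregulated genes out of 218 target genes
z, p_value = one_sample_proportion_test(successes=143, total=218)
print(f"proportion = {143 / 218:.3f}, z = {z:.3f}, P = {p_value:.3g}")
```

Under these assumptions the result is close to the P-value reported in panel (i); setting `continuity=False` gives a somewhat smaller P-value, which shows how much the choice of correction matters at this sample size.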