Figure 1

Transcription start sites, cloned insert sequences and analysis of splice sites in the TCF4 CTG repeat-containing region in the adult human cerebellum. (A) A schematic of the TCF4 gene, drawn to scale, with alternative 5′ exons marked as white boxes and internal exons marked as black boxes. Functional protein domains are marked under the exons, and the TCF4 CTG TNR region is marked on top with an arrow showing the repeat location in the intron between exons 3 and 4a. Alternative splice sites are marked with Roman numerals. Testis-specific exons are underlined and in bold. (B) Major alternative transcripts of the human TCF4 gene. Coding regions are represented as black boxes and untranslated regions are shown as white boxes. Transcripts are named after the 5′ exon and the splice site. The names of protein isoforms encoded by the transcripts are shown on the right. Locations of alternative splicing that generates full-length (FL), Δ, − and + isoforms are shown at the bottom. Panels A and B are adapted from Sepp et al.4. (C) A schematic drawing of the TCF4 gene region proximal to the CTG TNR. Genomic coordinates are based on the GRCh37/hg19 genome build. Black boxes indicate internal exons, white boxes indicate 5′ exons and a striped open box marks the location of the CTG TNR. ESTs from GenBank are shown with accession numbers on the right. TSS peaks (FANTOM5 DPI peak, robust set) and CAGE reads (total CAGE counts for the reverse strand, which encodes TCF4) from the FANTOM5 project are shown. (D) Transcripts identified by 5′ RACE: black boxes show sequenced components, white boxes unsequenced components. Transcription start regions of exon 4aI, exon 4b and exon 4c are indicated by arrows. (E) RT-PCR analysis of the splicing in the region. “US” indicates RT-PCR fragments amplified from gDNA or unspliced pre-mRNA. “S” denotes RT-PCR fragments from spliced mRNA. The numbers in brackets indicate different primer pairs used for PCR. 
Three different primer pairs were used to identify splice sites between exons 4a-I and 4, whereas two were used to identify splice sites between exons 4a-III and 4. (F) The structure of TCF4 transcripts identified by RT-PCR in E. (G) The cloned promoter regions are indicated with filled boxes and the firefly luciferase (ffLuc) reporter gene is indicated with an open box. AD activation domain, NLS nuclear localization domain, bHLH basic helix-loop-helix, FL full length, Δ lack of nuclear localization domain, EST expressed sequence tag, TSS transcription start site, FANTOM5 functional annotation of the mammalian genome, DPI decomposition-based peak identification, CAGE cap analysis of gene expression.