Supplementary Figure 2: Analysis of STAT3 binding sites in EGFRvIII-expressing astrocytes
From: Control of glioblastoma tumorigenesis by feed-forward cytokine signaling

(a-f) EGFRvIII/Stat3loxP/loxP astrocytes were subjected to ChIP-seq analyses using a STAT3 or an IgG control antibody. 7,725 STAT3 ChIP-seq peaks were called by BayesPeak with a posterior probability of at least 0.995. 16,078 negative peaks were called by BayesPeak and represent genomic regions enriched in the IgG control group relative to STAT3. 7,725 negative peaks with the same length distribution as the STAT3 ChIP-seq peaks were randomly selected from the negative peak pool and used as control. Peak height is defined as the area under the curve divided by peak length. Averaged conservation scores for STAT3 ChIP-seq peaks were calculated from the phastCons scores (0 to 1; UCSC 20-way placental mammals) of all the base pairs within each peak group. Base pairs located within exon features (based on the UCSC mm9 refGene table) and repetitive sequences (based on RepBase 14.09) were excluded from the calculation. STAT3 peaks were ranked by decreasing peak height and divided into 100 percentiles. (a) The number of STAT3 motif variants surrounding the centers of STAT3 ChIP-seq peaks is shown. The center of a STAT3 ChIP-seq peak was defined as the base pair located at the middle of the peak. The location of a STAT3 motif was defined as the position of the 5th base pair in the 9-nt STAT3 motif. Occurrences of each STAT3 motif variant across all STAT3 peaks were binned into 20-bp windows over a ±500-bp range around STAT3 peak centers, and the total number of each variant is indicated above each plot. The expected occurrence of each STAT3 motif variant within the same ±500-bp range around the 7,725 STAT3 peak centers was calculated from the sequence of each STAT3 motif variant and the background frequencies of the A/C/G/T nucleotides in the mouse genome. For each STAT3 motif variant, the p-value from a binomial test is shown if the observed occurrence of that variant exceeds the expected occurrence.
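The expected-occurrence calculation and binomial test described for panel a can be illustrated with a minimal sketch. It is not the authors' code. The function names, the example motif sequence, the background nucleotide frequencies, and the observed count are all hypothetical placeholders. The sketch also assumes that every base pair in the ±500-bp range counts as one independent trial on a single strand, which the legend does not specify.

```python
import math


def motif_probability(motif, base_freqs):
    """Chance of seeing `motif` at one position under independent background bases."""
    p = 1.0
    for base in motif:
        p *= base_freqs[base]
    return p


def scan_positions(n_peaks, half_window=500):
    """Number of candidate motif positions (assumption: one per bp in the +/- range)."""
    return n_peaks * (2 * half_window + 1)


def expected_occurrences(motif, base_freqs, n_peaks, half_window=500):
    """Expected motif count across all peak-centered windows."""
    return scan_positions(n_peaks, half_window) * motif_probability(motif, base_freqs)


def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p).

    Returns None when k does not exceed the expected count n * p, mirroring the
    legend, which reports a p-value only for over-represented motif variants.
    """
    if k <= n * p:
        return None
    total = 0.0
    for i in range(k, n + 1):
        log_term = (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                    + i * math.log(p) + (n - i) * math.log1p(-p))
        term = math.exp(log_term)
        total += term
        # Above the mean the terms shrink monotonically, so stop once negligible
        # (or once they underflow to zero).
        if term == 0.0 or term < total * 1e-16:
            break
    return total


if __name__ == "__main__":
    # Placeholder background frequencies and motif, for illustration only.
    background = {"A": 0.29, "C": 0.21, "G": 0.21, "T": 0.29}
    motif = "TTCCAGGAA"      # hypothetical 9-nt variant
    observed = 60            # made-up observed count, not from the paper
    n = scan_positions(7725)
    p = motif_probability(motif, background)
    print("expected:", expected_occurrences(motif, background, 7725))
    print("p-value:", binomial_upper_tail(observed, n, p))
```

The tail is summed directly in log space to keep the example dependency-free; a statistics library would normally be used instead.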
(b) A logo image of STAT3 motifs in astrocytes identified by MEME in 1,000 STAT3 ChIP-seq peaks is shown. (c) STAT3 peaks and negative peaks were divided into 100 percentiles in order of decreasing peak height. The average number of STAT3 motifs within each percentile was plotted. “Motif-containing STAT3 peaks” are defined as STAT3 peaks that contain STAT3 motifs. The occurrence of STAT3 motifs was calculated by “Find Individual Motif Occurrences” (FIMO, q-value ≤ 0.1) with the scoring matrix of the STAT3 motif identified by Multiple EM for Motif Elicitation (MEME) analysis in the 1,000 STAT3 ChIP-seq peaks with the highest posterior probabilities (p ≤ 1e-4). Notably, negative control peaks are relatively depleted of STAT3 motifs compared with STAT3 peaks. (d) The conservation score for each peak percentile is plotted. The highest-scoring peaks are relatively more conserved. (e) The distribution of STAT3 ChIP-seq peaks over genomic features is shown. Gene annotation is based on the UCSC mm9 refGene table. The genomic features are mutually exclusive. Where features overlap, base pairs within STAT3 ChIP-seq peaks were assigned to a single genomic feature in the following priority order: promoter/downstream regions > coding exon > 5' UTR > 3' UTR > intron > distal intergenic. Promoter regions were grouped into three 1-kb windows upstream of the transcription start site (TSS): -1 kb to TSS, -2 kb to -1 kb, and -3 kb to -2 kb. Downstream regions were grouped into three similar 1-kb windows downstream of the TSS. Distal intergenic regions were defined as genomic regions not overlapping with any gene-related features. (f) The distribution of genomic features within the mouse genome is shown. The native representation of the various genomic features within the mouse genome (2,654,895,218 bp) was calculated using the same method as described for panel e.
Compared with this genome-wide distribution, STAT3 ChIP-seq peaks predominantly occupy promoter and downstream regions surrounding the TSS and are particularly enriched within proximal promoter regions (-1 kb to TSS).
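The priority rule used for panels e and f, in which each base pair is assigned to exactly one genomic feature, can be sketched as follows. This is a simplified illustration and not the authors' pipeline. The function names, the interval representation (half-open `(start, end, feature)` tuples on a single chromosome), and the toy coordinates are assumptions made for the example. Only the priority order comes from the legend.

```python
# Priority order as stated in the legend, highest first.
PRIORITY = [
    "promoter/downstream",
    "coding exon",
    "5' UTR",
    "3' UTR",
    "intron",
    "distal intergenic",
]


def classify_base(features_at_base):
    """Return the highest-priority feature among those covering one base pair."""
    for name in PRIORITY:
        if name in features_at_base:
            return name
    # No gene-related feature overlaps this base pair.
    return "distal intergenic"


def feature_bp_counts(peak_start, peak_end, annotations):
    """Count base pairs per feature within the half-open peak [peak_start, peak_end).

    `annotations` is a list of (start, end, feature) half-open intervals.
    The per-base loop is kept for clarity and is not efficient for whole genomes.
    """
    counts = {name: 0 for name in PRIORITY}
    for pos in range(peak_start, peak_end):
        hits = {feature for start, end, feature in annotations if start <= pos < end}
        counts[classify_base(hits)] += 1
    return counts


if __name__ == "__main__":
    # Toy annotation: an intron overlapping a coding exon.
    toy = [(0, 6, "intron"), (4, 8, "coding exon")]
    print(feature_bp_counts(0, 10, toy))
```

Because every base pair receives a single label, the per-feature counts sum to the peak length, which is what makes the categories in panels e and f mutually exclusive.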