Extended Data Figure 2: Linear regression analysis and novel junction sequence considerations used to identify mammalian recursive splice sites.

a, Examples of RNA-seq read density patterns for three genes, together with the gradients calculated across (1) the first intron >50 kb and (2) all other introns >50 kb within the same gene, taken as an average. Because RNA-seq reads are grouped in 5-kb windows and linear regression is performed on the resulting histograms, gradients represent the change in summed read count per 5 kb. b, Density plot of the ratio between the average gradient of all other introns >50 kb within a gene and the gradient of the first intron >50 kb of that gene. Blue hashed line represents a ratio of 1, which would indicate that gradients for long introns within the same gene are comparable and that transcription is proceeding at a largely constant rate. c, Schematic of the bioinformatics pipeline used to identify novel junctions. d, Ranking of human 5′ splice site pentamer usage genome-wide. e, Nucleotide usage frequency at human 3′ splice sites genome-wide, and branch-point positioning relative to the 3′ splice site genome-wide.
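
The gradient calculation in panels a and b can be illustrated with a minimal Python sketch. This is not the authors' pipeline. The function names (`window_counts`, `gradient`, `gradient_ratio`) are hypothetical, read positions are assumed to be already mapped and oriented in the direction of transcription, a partial final window is assumed to be discarded, and ordinary least squares stands in for whatever regression routine was actually used.

```python
from statistics import mean

WINDOW = 5_000  # window size in bases (5 kb, as stated in the legend)


def window_counts(read_positions, intron_start, intron_end, window=WINDOW):
    """Sum reads in consecutive windows across [intron_start, intron_end).

    Assumption: a trailing partial window is dropped.
    """
    n_windows = (intron_end - intron_start) // window
    counts = [0] * n_windows
    for pos in read_positions:
        idx = (pos - intron_start) // window
        if 0 <= idx < n_windows:  # ignore reads outside the intron
            counts[idx] += 1
    return counts


def gradient(counts):
    """Least-squares slope of read count versus window index.

    The unit is change in summed read count per window (per 5 kb).
    Requires at least two windows.
    """
    xs = range(len(counts))
    x_bar, y_bar = mean(xs), mean(counts)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, counts))
    return sxy / sxx


def gradient_ratio(first_gradient, other_gradients):
    """Average gradient of the other long introns divided by the first intron's."""
    return mean(other_gradients) / first_gradient


# Illustrative (made-up) counts: a steady decline of 10 reads per window.
example_counts = [100, 90, 80, 70]
example_gradient = gradient(example_counts)
```

Under these assumptions, a ratio close to 1 from `gradient_ratio` corresponds to the case marked by the blue line in panel b, where long introns of the same gene show comparable gradients.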