Fig. 2: Overview of SSCVs identified from the Sequence Read Archive and The Cancer Genome Atlas.

a Frequencies of the transcriptome sequence data analyzed, binned by base count. Transcriptome data were also grouped by the number of detected SSCVs. For example, in the row [1.0, 2.0), which covers data with base counts of at least 1.0 Gbp and less than 2.0 Gbp, zero, one, two, and three or more SSCVs were identified in 74,805, 5464, 913, and 426 transcriptome sequence datasets, respectively. b Base substitution patterns of SSCVs according to their position relative to primary novel SSs. Colors indicate the alternative bases. The x-axes represent the reference bases, and the y-axes represent the numbers of variants. c Histograms showing the distribution of the positions of primary novel SSs relative to their hijacked SSs (original SSs) for donor-creating (left) and acceptor-creating (right) SSCVs. Red dashed lines represent exon-intron boundaries. d Fraction of SSCVs whose shift size (the difference between the primary novel SS and the hijacked SS) is a multiple of three, stratified by coding and non-coding genes. e Sequence motifs of SSCVs for which the position of the primary novel SS relative to the hijacked SS is -4 (left) and +5 (right). The “GT” dinucleotide at the intrinsic intron edge gives the -4 bp position the potential to form a novel donor site, with “GT” at the fifth and sixth positions of the new intron. In addition, the fifth and sixth bases of the original intron at the donor site are often “GT”, and this dinucleotide frequently becomes the first two intronic bases of a novel splice donor at the +5 bp position. a, b, c, d, e Source data are provided as a Source Data file.