Extended Data Fig. 9: Sequence and transcription features of internally duplicated genes.

a) Partial duplication of the 5’ UTR in CG3409. CG3409 encodes alternative 5’ UTRs; one of them is duplicated, generating a longer 5’ UTR. Because this 5’ exon is alternative, the original coding sequence is presumably not affected. b) Post-duplication intron retention in TRAF3IP3. Relative to the duplication boundaries (highlighted in light blue), two contigs indicate inclusive transcripts in which exons and their flanking introns are transcribed together. c) Out-of-frame transcript of DENND5A. The duplicated region is framed by dashed boxes. Codons are separated by commas, and the corresponding amino acids are shown. The incompatibility of codon phases causes the frameshift. d) Read depth-based quantification of isoforms (see also Methods). Two examples (CG9663 and SPG11) are shown. Because only 10 data points are available for the duplication-present lines of SPG11, all individual values are shown as grey dots. For each gene, we calculated the ratio of read depth in the duplicated region to that in the flanking region, separately for lines with and without the duplication. For these two examples, the ratio in lines with the duplication was significantly higher (one-sided Wilcoxon signed rank test, P ≤ 0.05) than that in lines without it, supporting the presence of inclusive isoforms. From the median ratios, we further calculated the relative expression ratio of inclusive to exclusive transcripts (Fig. 5f). e) qRT–PCR-based quantification of isoforms in flies. Whole-body samples of duplication-absent lines were used as controls. Primers targeting inclusive transcripts gave no signal in the duplication-absent lines, and primers targeting both inclusive and exclusive transcripts generally gave weaker signals in those lines than in the duplication-present lines. Error bars represent the standard deviation of three technical replicates.
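
The read depth comparison in panel d can be illustrated with a minimal sketch. It is not the procedure from Methods: the function names are hypothetical, the input numbers in the usage note are made up, and the conversion to an inclusive-to-exclusive ratio rests on an assumption stated in the code (an inclusive transcript carries the duplicated region twice, an exclusive transcript once).

```python
from statistics import median


def depth_ratio(dup_depth, flank_depth):
    """Read depth of the duplicated region divided by that of the flanking region."""
    if flank_depth <= 0:
        raise ValueError("flanking read depth must be positive")
    return dup_depth / flank_depth


def median_ratio(lines):
    """Median depth ratio over (dup_depth, flank_depth) pairs, one pair per line."""
    return median(depth_ratio(d, f) for d, f in lines)


def inclusive_to_exclusive(ratio_present, ratio_absent):
    """Estimate inclusive/exclusive expression (I/E) from two median depth ratios.

    ASSUMPTION (not taken from the source): inclusive transcripts cover the
    duplicated region twice and exclusive transcripts once, so the normalized
    ratio is r = (2I + E) / (I + E), which rearranges to I/E = (r - 1) / (2 - r).
    Returns None when r falls outside [1, 2), where this model does not apply.
    """
    r = ratio_present / ratio_absent  # normalize by the duplication-absent baseline
    if not 1.0 <= r < 2.0:
        return None
    return (r - 1.0) / (2.0 - r)
```

With made-up depths, `median_ratio([(30, 20), (45, 30), (15, 10)])` for duplication-present lines and `median_ratio([(20, 20), (10, 10)])` for duplication-absent lines can be passed to `inclusive_to_exclusive` to obtain the relative expression estimate under this model.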