Fig. 3: Characterisation of genes with non-AUG initiation. | Nature Communications


From: Thousands of human non-AUG extended proteoforms lack evidence of evolutionary selection among mammals


a, b Genes from both PhyloSET and RiboSET (a, CCDC8; b, SFPQ); the primary extension and the first 50 codons of the CDS are shown. The top panel is a colour-coded codon alignment. The middle panel shows the PhyloCSF score per codon (the bottom bar shows the positions of potential start codons). The bottom panel shows Trips-Viz subcodon Ribo-Seq profiles, with densities of ribosome footprints coloured according to the reading frame they support. The colours are matched to the reading frames in the ORF plot at the bottom. Black vertical lines indicate the start of the annotated CDS. Grey bars are extended CDS initiated at the proposed non-AUG starts. c Average footprint density at the CDS of PhyloSET genes (log10) compared with the Ribo-Seq rank of the N-terminal extensions (NTEs) in them; the lowest rank is 10470. Spearman correlation (two-sided) for genes with known rank (corr = −0.873, p value = 5.01e-15). d The size of the increase in the number of genes in the overlap between PhyloSET and RiboSET, depending on the Ribo-Seq NTE threshold. The distributions (from rank 500 to 6500, where they overlap) are compared with a two-sided Mann–Whitney U test (p value = 0.0016, statistic = 28.5). e Re-identification of non-AUG N-terminal extensions in 24 genes from the study by Ivanov et al. (ref. 22). The ‘artificial theoretical NTE’ starts from the most 3′ in-frame stop codon and stretches to the first ATG downstream of the non-AUG start. The PhyloCSF score is calculated for the first upstream 50 codons of the theoretical extension. f Comparison of RiboSET and PhyloSET with genes from ref. 22. ‘Phylo’: PhyloSET; ‘Ribo’: RiboSET; ‘ann_24’: genes with annotated non-AUG extensions in GENCODE v35; ‘un_28’ and ‘diff_utr’: genes whose non-AUG extensions are not annotated in GENCODE v35 (‘diff_utr’ genes have 5′ leaders that differ from RefSeq). g PhyloCSF score of upstream regions of ‘ann_24’ (with starts moved downstream) and ‘un_28’ genes (ref. 22). Box plots: the central line indicates the median, the box limits indicate the interquartile range and the whiskers indicate 1.5 × the interquartile range. One-sided Mann–Whitney U test (N = 24 and 28 genes, p value = 0.0086). h PhyloCSF score of upstream regions of RiboSET and UntranslSET genes. Source data are provided as a Source Data file.
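
The ‘artificial theoretical NTE’ in panel e can be illustrated with a short sketch. This is only one possible reading of the caption, not the authors' pipeline: it assumes the extension runs from the codon immediately after the nearest upstream in-frame stop codon to the downstream ATG, and that the 50 codons to be scored are those closest to that ATG. The function names `theoretical_nte` and `codons_to_score` are hypothetical.

```python
from typing import List, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}


def theoretical_nte(seq: str, atg_pos: int) -> Tuple[int, int]:
    """Return (start, end) of a theoretical N-terminal extension.

    Assumption: `atg_pos` is the 0-based index of the downstream ATG.
    We walk upstream codon by codon, in frame with that ATG, and stop
    at the first in-frame stop codon (or at the 5' end of `seq`).
    """
    seq = seq.upper()
    if seq[atg_pos:atg_pos + 3] != "ATG":
        raise ValueError("atg_pos must point at an ATG codon")
    start = atg_pos
    i = atg_pos - 3
    while i >= 0:
        if seq[i:i + 3] in STOP_CODONS:
            break  # the extension begins just after this stop codon
        start = i
        i -= 3
    return start, atg_pos


def codons_to_score(seq: str, start: int, end: int, n: int = 50) -> List[str]:
    """Return up to `n` codons of the extension closest to the ATG."""
    codons = [seq[j:j + 3].upper() for j in range(start, end, 3)]
    return codons[-n:]


# Toy sequence (made up for illustration): 2 nt, stop, 2 codons, ATG, 1 codon.
toy = "GG" + "TAA" + "CTG" + "GCC" + "ATG" + "AAA"
nte_start, nte_end = theoretical_nte(toy, 11)
nte_codons = codons_to_score(toy, nte_start, nte_end)
```

The codons returned this way would then be passed to a conservation scorer such as PhyloCSF, which is outside the scope of this sketch.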
