The non-canonical proteome, comprising thousands of unannotated peptides encoded by small open reading frames, significantly expands the functional landscape of our genome beyond conventional protein-coding genes. Recent work by Shi et al. published in Cell Research combines several detection techniques and functional validations to explore this uncharted territory, and shows that more than a thousand of these small peptides play significant roles in tumor cell growth, offering new insights into cancer biology.

Advanced technologies such as ribosome profiling (Ribo-seq)1 and mass spectrometry have revealed that many non-coding RNAs harbor small open reading frames (sORFs) encoding microproteins and peptides that are typically shorter than 100 amino acids.2,3,4 Ribo-seq identifies actively translated sORFs by capturing and sequencing fragments of messenger RNA protected by ribosomes, providing a detailed view of translation at nucleotide resolution. Ribo-seq does not directly detect proteins but provides evidence of ongoing translation. By contrast, mass spectrometry fragments proteins and measures the masses of the resulting fragments, making it possible to directly detect novel peptides encoded by sORFs.2,4 Unlike conventional “canonical” proteins annotated in databases, these small peptides have largely been overlooked and remain absent from major reference databases. However, recent initiatives aim to create standardized catalogs of non-canonical translation,3 offering valuable resources for biomedical research.
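
To make the notion of an sORF concrete, the following is a minimal sketch of how candidate sORFs could be enumerated from a transcript sequence before any experimental evidence is considered. It is an illustration only and not the approach used in the studies cited here: the function name `find_sorfs`, the `min_codons` cutoff, the restriction to ATG start codons, and the forward-strand-only scan are all assumptions made for brevity. Only the upper bound of 100 amino acids comes from the text above. Candidates of this kind would still need support from Ribo-seq or mass spectrometry to count as translated.

```python
# Hypothetical helper for illustration; not taken from any published pipeline.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_sorfs(seq, max_codons=100, min_codons=10):
    """Return (start, end, n_codons) for candidate sORFs on the forward strand.

    start is the 0-based index of the ATG, end is the index just past the
    stop codon, and n_codons counts the coding codons (stop codon excluded).
    Assumptions: only ATG starts are considered, and min_codons is an
    arbitrary lower cutoff chosen for this sketch.
    """
    seq = seq.upper()
    hits = []
    for frame in range(3):
        start = None
        # Walk the sequence codon by codon in the current reading frame.
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first in-frame ATG opens a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                n_codons = (i - start) // 3
                # Keep ORFs encoding fewer than max_codons amino acids.
                if min_codons <= n_codons < max_codons:
                    hits.append((start, i + 3, n_codons))
                start = None  # look for the next ORF in this frame
    return sorted(hits)


# Toy transcript: 2-nt leader, ATG, 11 alanine codons, stop codon, 2-nt trailer.
toy = "CC" + "ATG" + "GCT" * 11 + "TAA" + "GG"
candidates = find_sorfs(toy)
```

In this toy example the single candidate spans the ATG through the stop codon and encodes 12 amino acids. Real annotation efforts typically also consider non-ATG start codons and both strands, and rank candidates by the three-nucleotide periodicity of ribosome footprints.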

Non-canonical sORFs and their translated products are emerging as crucial players in diverse biological processes, including metabolism,4 cellular stress responses,5 RNA processing,6 and cell survival.7,8 Moreover, emerging evidence highlights the involvement of these novel peptides in diverse cancer mechanisms, emphasizing their potential as biomarkers and therapeutic targets.9 For example, a microprotein encoded by ASDURF promoted medulloblastoma cell survival through engagement with the prefoldin-like chaperone complex.7 Similarly, the GREP1 gene encodes a small secreted protein that is highly expressed in breast cancer. Knocking out this gene led to selective growth impairments in breast cancer cell lines.8 These findings suggest that peptides encoded by sORFs are a previously unrecognized part of the proteome, with important roles in health and disease. Yet, many questions remain unanswered, such as which non-canonical sORFs encode functional peptides and how they exert their effects.

In a recent study published in Cell Research by Shi et al.,10 the authors address this question by exploring the role of sORFs in gastric cancer (Fig. 1). Detecting stable peptides encoded by sORFs has historically been challenging because of their small size and low abundance. To overcome this, the authors developed an adapted ultrafiltration tandem mass spectrometry method to enrich small proteins and peptides. Using this approach, they created a comprehensive reference library of 8945 previously unannotated peptides encoded by sORFs in human gastric cancer tissues and cell lines. To evaluate their impact on cancer growth, they performed a large-scale functional screen, testing the effect of depleting thousands of individual sORFs on cell viability. The study identified over a thousand sORFs that significantly influenced tumor cell proliferation. Most of the peptides encoded by these sORFs promoted cancer growth, and a subset was further validated for essential roles in processes such as energy production, cholesterol metabolism, and cellular growth pathways.

Fig. 1: This illustration, adapted from the original publication by Shi et al., outlines a typical workflow for the discovery and characterization of the non-canonical proteome, focusing on peptides encoded by sORFs within the non-coding regions of the genome.

Non-coding regions account for > 98% of the genome, while canonical protein-coding regions represent < 2%. The identification of sORFs and their encoded peptides relies on two key techniques: ribosome profiling and mass spectrometry, both of which require a reference library of non-canonical sORFs. In their study, Shi and colleagues used mass spectrometry to identify 8945 novel peptides encoded by sORFs in human gastric cancer. They further validated the biological significance of these peptides by examining their peptide–protein interactome, evolutionary conservation, subcellular localization, and effects on cell viability. These findings highlight the potential roles of hundreds of novel peptides in cancer biology.

The findings of this study have several further important implications. First, they suggest that the non-canonical proteome contains a vast pool of uncharacterized peptides that play crucial roles in cancer development. This builds on findings from previous studies in other cancer types.7,8 Another intriguing aspect is that many sORFs seem to have evolved recently in humans and primates6,10,11 and that their peptides are often localized to mitochondria,2 the powerhouses of the cell. This adds a layer of evolutionary and functional complexity to the non-canonical proteome. The study also highlights the power of combining computational prediction, high-throughput screening, and experimental validation to systematically investigate the non-canonical proteome. The authors combined CRISPR genome editing, peptide–protein interaction mapping with artificial intelligence tools, and in vivo tumor models to explore how specific peptides drive cancer progression. They also showed that certain peptides are linked to poor clinical outcomes, emphasizing their potential as diagnostic markers and therapeutic targets. Notably, two peptides shorter than 20 amino acids, encoded by the non-coding genes SNHG14 and AL365361.1, were shown to regulate cancer cell proliferation. The effects of these peptides, which have favorable properties for drug development, were validated through in vitro synthesis and administration.

Overall, these findings challenge the traditional dichotomy of coding vs non-coding and underscore the need for further research into the mechanisms by which non-canonical peptides influence cellular function and cancer progression. This study has uncovered only a small portion of the human non-canonical proteome, emphasizing the need for additional efforts to collect and analyze translation data across other cancer types and diseases. Understanding how sORFs are translated and the pathways regulated by their peptides could open new avenues for developing therapies to target cancer and other conditions.