Machine learning-based proteomics profiling of ALS identifies downregulation of RPS29 that maintains protein homeostasis and STMN2 level

Xu, Wei; Guo, Zhipeng; Guan, Yian; Lv, Shihui; Gao, Xue; Luo, Wenchen; Cheng, Tianlin; Shao, Zhicheng; Tao, Bangbao; Wang, Tao; Qiu, Zhixin

doi:10.1038/s42003-025-08578-8

Article
Open access
Published: 07 August 2025

Communications Biology volume 8, Article number: 1177 (2025)

Abstract

Amyotrophic lateral sclerosis (ALS) is a devastating motor neuron disease. The molecular understanding of ALS is hampered by the lack of experimental models that recapitulate disease heterogeneity and of an analytical framework that integrates multi-omics datasets. Here, we developed a pipeline integrating machine learning and consensus clustering to analyze a large-scale dataset of patient-derived motor neuron models from Answer ALS. Compared with the transcriptome, the proteome correlates more closely with ALS pathology, and we interrogated it to identify 110 proteomics-based biomarkers (Proteomics Markers for ALS 110, PMA110). Functional enrichment highlights dysregulation of ALS-related pathways, including protein translation and neuronal function. By integrating ALS subtype-specific proteins with patient postmortem proteomics, we found that RPS29 was consistently downregulated in ALS models and patient motor neurons. RPS29 is required for neuronal viability: it maintains the rate and accuracy of translation and suppresses pathological translation. RPS29 downregulation suppresses translation of STMN2, an essential protein for motor neurons, in iPSC-derived motor neurons. Taken together, this study provides a robust framework for ALS proteomics, identifies RPS29 as a quality controller of protein translation, and presents a translational mechanism for STMN2 maintenance in ALS.

Introduction

Amyotrophic lateral sclerosis (ALS) is a fatal neurodegenerative disease of the central nervous system whose diagnosis is often protracted and challenging¹. The median time from symptom onset to death is typically 2 to 3 years, and the 5-year survival rate is around 20%²,³. Clinical presentation and prognosis vary widely across patients⁴. Approximately 5 to 10% of ALS cases are familial, with a family history of ALS, whereas the remaining ~90% are sporadic⁵. The etiology of ALS remains largely elusive, and no effective therapies are available⁶. A better understanding of the molecular mechanisms underlying this deadly disease is therefore crucial for identifying diagnostic biomarkers and therapeutic targets for this unmet medical need.

Recent years have witnessed remarkable advances in the molecular profiling of ALS samples, yielding a list of about 53 ALS-associated genes⁷. Most of these genes were discovered through genetic analyses such as genome-wide association studies and next-generation sequencing. However, each of these genes is altered in only a small fraction of the ALS population, and most ALS cases have no known driver gene. The most commonly identified ALS genes, C9orf72, SOD1, TARDBP, and FUS, are altered in 70% of familial ALS (fALS) but in only 10% of sporadic ALS (sALS) patients⁸. This tremendous genetic heterogeneity largely impedes the development of targeted treatments for ALS. Although several molecular targets are being developed for therapeutic treatment, including ATXN2⁹, STMN2¹⁰, NEK1¹¹, and SOD1¹², ALS remains fatal and incurable. Searching for biomarkers and targets at additional molecular levels, such as proteomics and epigenetics, which provide regulatory information downstream of genomics, holds promise for discovering novel targets and improving ALS treatment.

Because ALS postmortem tissues are scarce and unsuitable for experimental manipulation, ALS lacks a proper experimental platform that recapitulates its heterogeneity. Patient-derived induced pluripotent stem cells (iPSCs), which can be differentiated into motor neurons and preserve the genetic background of the donor, provide a representative model system for ALS¹³. Small panels of iPSC models with defined genetic backgrounds have been used to identify pathogenic mechanisms and potential therapeutic targets¹⁴⁻¹⁸. However, because these studies used a limited number of iPSC models, they largely neglected the genetic heterogeneity of ALS. The Answer ALS (AALS) initiative is an international effort to address these challenges by generating over 1000 iPSC lines and differentiating them into motor neurons, which are characterized with multiple modalities, including whole-genome sequencing, RNA sequencing, proteomics, and epigenetic profiling¹⁹. This comprehensive multi-omics dataset is a valuable resource for elucidating motor neuron-intrinsic molecular changes during ALS development. However, an analytical pipeline suited to this heterogeneous dataset is still under active development and would aid data interpretation in ALS.

In this study, we developed an integrative analytical pipeline to interrogate the multi-omics datasets of AALS motor neurons. Because of the molecular heterogeneity across models and the network crosstalk among molecular features, commonly used methods such as t-tests or DESeq2 fail to capture the differential expression patterns²⁰. In contrast, machine learning algorithms offer several advantages, such as the ability to model nonlinear relationships and fault tolerance, making them particularly suited to complex applications²¹. We employed machine learning algorithms to identify potential biomarkers that distinguish ALS cases from controls in the AALS proteomics dataset. By integrating machine learning with molecular subtypes of AALS, we resolved the molecular heterogeneity and identified potential proteomic targets for patient stratification and treatment.

Results

Analytical pipeline of ALS cohort and multi-omics dataset

To comprehensively characterize proteomic biomarkers and therapeutic targets for ALS, we interrogated the multi-omics dataset from the Answer ALS (AALS) project and other published patient cohorts using machine learning algorithms, bioinformatic analysis, and experimental validation (Fig. 1). The AALS dataset analyzed in this study includes 169 ALS patients and 33 controls (Table 1). Twenty-eight patients carried reported ALS-associated genetic variants, including C9orf72 (n = 16), SOD1 (n = 7), TARDBP (n = 1), and others (n = 4); the remaining patients were sporadic cases (n = 141). Peripheral blood mononuclear cells (PBMCs) were collected from these volunteers and reprogrammed into iPSCs using well-established protocols, followed by a standardized 32-day motor neuron induction protocol²². The resulting motor neurons were characterized by transcriptomic and proteomic profiling to systematically establish the molecular landscape of the ALS motor neuron panel. On average, 21,625 genes per sample were quantified in the transcriptome and 4441 proteins per sample in the proteome.

Table 1 Summary of the demographics of the AALS cohort

Machine learning and consensus clustering of the AALS proteomics dataset were applied to identify a protein signature associated with ALS pathology. Subsequent integration with a proteomics dataset of individual motor neurons (MNs) from ALS postmortem tissue, obtained by laser capture microdissection and single-cell mass spectrometry, prioritized candidate proteins for further validation²³.

Machine learning of ALS multi-omics dataset identifies 110 ALS-relevant protein markers

To identify potential biomarkers and targets in ALS motor neurons compared with controls, we first analyzed differentially expressed proteins (DEPs). However, DEP analysis failed to identify high-confidence candidates after correction for multiple comparisons (Supplementary Fig. 1a, b). Given the substantial heterogeneity of the ALS cohort data and the complexity of molecular regulatory networks, DEP analysis could not account for this heterogeneity or for non-linear molecular interactions. We hypothesized that machine learning techniques could model the crosstalk among features and identify potential biomarkers that classify ALS and control motor neurons. Surprisingly, proteomic features yielded better predictive performance than transcriptomic features across multiple machine learning algorithms (Random Forest, XGBoost, and SVM), suggesting the power of proteomic data for modeling ALS pathology (Fig. 2a and Supplementary Fig. 1c, d). Similarly, a graph convolutional network (GCN)-based algorithm, Multi-Omics Graph cOnvolutional NETworks (MOGONET)²⁴, applied to integrate transcriptomic and proteomic data, also ranked proteomic features above transcriptomic features in discriminating ALS from control motor neurons (Supplementary Fig. 1e). These data indicate that proteomic data are more informative than transcriptomic data for ALS classification, probably because protein-level regulation is closely related to ALS pathology. We therefore focused mainly on the proteomic data in this study.
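Why can per-protein testing come up empty? A minimal sketch, assuming the Benjamini-Hochberg procedure as the multiple-testing correction (the text does not name the correction used) and using invented p-values: four proteins pass a nominal 0.05 cutoff, yet none survives adjustment.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values (q-values) in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that q-values stay monotone
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p-values for five proteins (not data from the study)
raw = [0.012, 0.021, 0.033, 0.041, 0.5]
q = benjamini_hochberg(raw)
print(sum(p < 0.05 for p in raw), "nominal hits;", sum(x < 0.05 for x in q), "after BH")
# prints: 4 nominal hits; 0 after BH
```

With roughly 4441 quantified proteins per sample, m is large, so only very small raw p-values can survive such a correction.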

To establish prediction models and perform feature selection in the AALS proteomic dataset, we introduced a machine learning pipeline combined with a bootstrapping strategy. In each training iteration, 70% of the samples were randomly selected as the training subset, and proteins whose Random Forest feature importance exceeded a predetermined threshold were recorded. The selection frequency of each protein across 10,000 training iterations was quantified and ranked, ultimately generating a candidate list of 110 protein features (Proteomics Markers for ALS 110, PMA110; Fig. 2b, Supplementary Fig. 2a, b and Supplementary Data 1). Incorporating other variables, such as sex, age, and batch, into the machine learning-based prediction showed that these variables had only a limited impact on model performance, as shown by the overlap of the selected features with PMA110 (Supplementary Fig. 2c). In line with previous studies¹⁴, we performed PCA on the 500 most variable genes and examined the distribution of these variables, but found no strongly biased distribution (Supplementary Fig. 2d, e). Retraining with the PMA110 feature set improved accuracy and stability in predicting ALS and control samples (Supplementary Fig. 2f). Notably, several of these proteins, including DDX3Y²⁵, NMNAT1²⁶, PAK3²⁷, YY1²⁸, ACADM²⁹, and YTHDF1³⁰, have previously been implicated in ALS. Shapley Additive exPlanations (SHAP)³¹ is widely used to explain feature importance in machine learning and artificial intelligence models. We therefore used SHAP plots to visualize the impact of PMA110 features on model predictions, considering both the feature values (encoded by a color gradient) and the direction of each feature’s contribution (toward or away from an ALS prediction) (Supplementary Fig. 3a).
For instance, EIF4E2 exhibited large SHAP values, indicating substantial discriminatory power and a large contribution to the model’s output. Samples with lower EIF4E2 levels (blue) were more likely to be classified as ALS than those with higher levels (red).
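The subsample-and-count selection described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: to stay dependency-free, a simple univariate score (absolute mean difference over pooled SD) stands in for Random Forest importance, and the threshold (0.5), iteration count (200 rather than 10,000), and data are hypothetical. In practice `importance_fn` could wrap scikit-learn's `RandomForestClassifier(...).fit(X, y).feature_importances_`.

```python
import random
from collections import Counter
from statistics import mean, pstdev


def mean_diff_importance(rows, labels):
    """Placeholder importance: |mean(ALS) - mean(control)| / pooled SD per feature.

    The study used Random Forest importances; this univariate score only keeps
    the sketch dependency-free.
    """
    scores = []
    for j in range(len(rows[0])):
        case = [r[j] for r, y in zip(rows, labels) if y == 1]
        ctrl = [r[j] for r, y in zip(rows, labels) if y == 0]
        if not case or not ctrl:  # a subsample may miss one class entirely
            scores.append(0.0)
            continue
        sd = pstdev(case + ctrl) or 1.0  # guard against zero variance
        scores.append(abs(mean(case) - mean(ctrl)) / sd)
    return scores


def selection_frequency(rows, labels, n_iter=200, frac=0.7, threshold=0.5,
                        importance_fn=mean_diff_importance, seed=0):
    """Count how often each feature's importance exceeds `threshold` across
    `n_iter` random subsamples of `frac` of the samples; return
    (feature_index, count) pairs, most frequently selected first."""
    rng = random.Random(seed)
    n = len(rows)
    counts = Counter()
    for _ in range(n_iter):
        idx = rng.sample(range(n), int(frac * n))
        scores = importance_fn([rows[i] for i in idx], [labels[i] for i in idx])
        counts.update(j for j, s in enumerate(scores) if s > threshold)
    return counts.most_common()


# Synthetic example: feature 0 separates the classes, features 1-2 are noise
rng = random.Random(1)
labels = [1] * 20 + [0] * 20
rows = [[(5.0 if y else 0.0) + rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)]
        for y in labels]
ranked = selection_frequency(rows, labels)
print(ranked)
```

Ranking by selection frequency rather than by a single model's importance rewards features that stay informative across many resamples, which is the stability property the PMA110 ranking relies on.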

Within the PMA110 signature, only 45% of the protein features were differentially expressed at the protein level between ALS patients and controls (Student’s t-test p < 0.05), indicating the advantage of machine learning in identifying ALS-related protein features. We observed expression changes for EIF4A2, ACADM, YY1, and BRD2 at the protein level but not at the RNA level (Supplementary Fig. 3b–e), which could be explained by the marginal correlation between protein and RNA levels (Supplementary Fig. 3g). This finding highlights the importance of prioritizing ALS regulators at the protein level in future investigations. Taken together, these data suggest that machine learning on the proteomic dataset uncovers potential protein markers linked to ALS pathology. To explore the relationship between PMA110 and well-established ALS genes, we compared the PMA110 signature with two published ALS risk gene sets (ALS meta-set⁷ and RefMap³²), which were curated from genetic and epigenetic association studies. Overall, we observed limited overlap among the three gene sets, reflecting the complexity of ALS pathology across different levels of molecular regulation (Supplementary Fig. 3f). We next hypothesized that the PMA110 signature may interact with well-established ALS genes through protein-protein interactions (PPI). To this end, we examined the PPI distance between PMA110 proteins and ALS meta-set proteins (Supplementary Fig. 3h). PMA110 proteins lay closer to the ALS meta-set in the PPI network than randomly selected gene sets did, suggesting the functional relevance of PMA110 to ALS (Fig. 2c).
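The comparison with random gene sets can be illustrated with a small sketch. The text does not define its PPI distance, so this assumes a common "closest distance" measure: for each protein in one set, the shortest-path length to the nearest protein in the other set, averaged, then compared against size-matched random node sets. The graph is a toy path, not a real interactome.

```python
import random
from collections import deque


def bfs_distances(graph, source):
    """Unweighted shortest-path length from `source` to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nb in graph.get(node, ()):
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    return dist


def mean_closest_distance(graph, set_a, set_b):
    """Mean over nodes in set_a of the distance to the nearest node in set_b
    (nodes with no path to set_b are skipped)."""
    closest = []
    for a in set_a:
        dist = bfs_distances(graph, a)
        reachable = [dist[b] for b in set_b if b in dist]
        if reachable:
            closest.append(min(reachable))
    return sum(closest) / len(closest) if closest else float("inf")


def distance_vs_random(graph, set_a, set_b, n_perm=1000, seed=0):
    """Observed distance plus the fraction of size-matched random node sets
    that are at least as close to set_b (an empirical p-value)."""
    rng = random.Random(seed)
    nodes = sorted(graph)
    observed = mean_closest_distance(graph, set_a, set_b)
    hits = sum(mean_closest_distance(graph, rng.sample(nodes, len(set_a)), set_b) <= observed
               for _ in range(n_perm))
    return observed, (hits + 1) / (n_perm + 1)


# Toy "interactome": a 10-node path 0-1-2-...-9 (not a real PPI network)
graph = {i: [j for j in (i - 1, i + 1) if 0 <= j < 10] for i in range(10)}
observed, p = distance_vs_random(graph, set_a=[1, 2], set_b=[0])
print(observed, round(p, 3))
```

With a real network (for example, interactions exported from STRING) the node names would be gene symbols; the logic is unchanged.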

Functional enrichment of PMA110 signature reveals ALS pathological pathways

To gain a deeper understanding of the biological functions related to the PMA110 signature, we conducted functional enrichment analysis of the 110 candidate proteins. Gene Ontology (GO) and KEGG analyses revealed that these candidates are enriched in pathways associated with stress response, oxidative phosphorylation, RNA metabolism, DNA damage response, neuronal function, MAPK signaling, membrane regulation, and lipid metabolism, in line with previous findings³³⁻³⁹ (Fig. 2d, e). Using the STRING database, we grouped the candidate proteins into four distinct PPI clusters, each representing a different interaction pattern and functional relevance: Cluster 1 was associated with mitochondrial function, Cluster 2 with DNA damage response, Cluster 3 with RNA binding regulation, and Cluster 4 with protein transport and localization (Fig. 2f). Notably, the hub proteins at the center of each PPI cluster, including well-established ALS proteins such as TP53 and HNRNPA2B1, were differentially expressed between ALS and control groups (Supplementary Fig. 4a–d). In addition, the expression levels of interacting proteins were strongly correlated, further supporting functional cooperation within each PPI cluster in ALS (Supplementary Fig. 4e–h).
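Over-representation analyses of this kind commonly rest on a one-sided hypergeometric test. The text does not name the enrichment tool, so the sketch below is an assumption about the underlying statistic, and the term size, background size, and overlap are hypothetical.

```python
from math import comb


def enrichment_p(k, n, K, N):
    """One-sided hypergeometric P(X >= k): the chance of drawing at least k
    term-annotated genes when sampling n genes from a background of N genes,
    K of which carry the annotation."""
    total = comb(N, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return tail / total


# Hypothetical numbers: 110 query proteins, a term with 150 members in a
# 20,000-gene background, and 6 query proteins annotated to that term
print(enrichment_p(6, 110, 150, 20_000))
```

Here about 0.8 annotated proteins would be expected by chance, so observing 6 gives a small p-value; across many GO or KEGG terms these p-values would still need multiple-testing correction.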

In addition to functional and pathway enrichment analysis, we performed conservation analysis of PMA110 proteins to assess the functional significance of the candidate genes (Supplementary Fig. 5a–c). Compared with all protein-coding genes, these genes were more likely to be haploinsufficient, as indicated by their haploinsufficiency (HI) scores⁴⁰. Furthermore, these candidates exhibited greater intolerance to loss-of-function (LoF) mutations⁴¹ and various other mutation types, as measured by the LoFtool score and the Residual Variation Intolerance Score (RVIS)⁴². By contrast, a randomly selected protein set did not differ significantly from the background of protein-coding genes. Collectively, these results underscore the functional significance of the PMA110 signature and suggest that alterations in the levels of these proteins in patients may have pathological effects.

Inter-patient molecular heterogeneity is resolved by proteomics-based ALS subtypes

Considering the genetic heterogeneity across individuals in the AALS cohort, we hypothesized that molecular subtyping could reduce overall heterogeneity, stratify patients into subtypes with similar molecular backgrounds, and identify subtype-specific alterations in ALS. To this end, we performed consensus clustering based on transcriptomic and proteomic expression profiles. Consistent with the machine learning modeling, proteomics-based clustering outperformed transcriptomic clustering in distinguishing patient subtypes (Fig. 3a, Supplementary Fig. 6a–h). Consensus clustering divided the patients into four distinct subtypes (Proteomics Subtypes 1–4, ProS1–4), and principal component analysis (PCA) showed that this classification was robust (Fig. 3b). The four proteomic subtypes showed distinct protein expression patterns and clinical features (Fig. 3c and Table 2). ProS1 was associated with late ALS onset, as indicated by an older age at symptom onset. ProS2 consisted mainly of female patients and had higher C9orf72 repeat counts. These data are in line with previous findings that aging and sex differences are involved in ALS pathology¹⁴.
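The core object in consensus clustering is the consensus matrix: the fraction of resamples in which two samples fall in the same cluster, among resamples that contain both. A minimal sketch follows; the base clusterer (k-means with farthest-first initialization), the 80% resampling fraction, and the synthetic data are assumptions, since the text does not report these settings (R users typically rely on ConsensusClusterPlus for this step).

```python
import random
from itertools import combinations


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, rng, n_iter=20):
    """Lloyd's k-means with farthest-first initialisation (stand-in base clusterer)."""
    centers = [rng.choice(points)]
    while len(centers) < k:
        # next centre: the point farthest from every existing centre
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    labels = []
    for _ in range(n_iter):
        labels = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def consensus_matrix(samples, k, n_resamples=100, frac=0.8, seed=0):
    """M[i][j]: fraction of resamples containing both i and j in which they co-cluster."""
    rng = random.Random(seed)
    n = len(samples)
    together = [[0] * n for _ in range(n)]
    sampled = [[0] * n for _ in range(n)]
    for _ in range(n_resamples):
        idx = rng.sample(range(n), int(frac * n))
        labels = kmeans([samples[i] for i in idx], k, rng)
        for (a, la), (b, lb) in combinations(zip(idx, labels), 2):
            sampled[a][b] += 1
            sampled[b][a] += 1
            if la == lb:
                together[a][b] += 1
                together[b][a] += 1
    # Diagonal set to 1; pairs never sampled together default to 0
    return [[1.0 if i == j else (together[i][j] / sampled[i][j] if sampled[i][j] else 0.0)
             for j in range(n)] for i in range(n)]


# Two well-separated synthetic groups of 10 "patients" in a 2-feature space
gen = random.Random(2)
samples = ([[gen.gauss(0, 0.3), gen.gauss(0, 0.3)] for _ in range(10)]
           + [[gen.gauss(5, 0.3), gen.gauss(5, 0.3)] for _ in range(10)])
M = consensus_matrix(samples, k=2)
print(M[0][1], M[0][15])
```

A clean block structure of ones and zeros, as in this toy case, indicates a stable partition; values in between flag samples whose assignment depends on which other samples were drawn.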

Table 2 Summary of the demographics of the four subtypes in AALS cohort

To further characterize the functional differences among proteomic subtypes, we performed Ingenuity Pathway Analysis (IPA) on the subtype-specifically expressed proteins to identify subtype-specific activation or inhibition of pathways (Fig. 3d and Supplementary Fig. 7a–d). ProS1 showed activation of protein folding and extracellular matrix (ECM) pathways and decreased neuronal function; ProS2 exhibited increased ECM and antigen presentation but downregulation of the MAPK pathway; ProS3 showed upregulated protein transport and dysregulated mitochondrial function; and ProS4 showed significant activation of the AMPK pathway and inhibition of translation and ECM. Comparing these proteomic subtypes with the previously established ALS-Ox, ALS-Glia, and ALS-TE subtypes⁴³, we found that ProS1 was mainly enriched in the ALS-Glia subtype and ProS2 in the ALS-Ox subtype, whereas ProS3 and ProS4 were distributed across all three subtypes, with slight enrichment in ALS-TE (Supplementary Fig. 7e). We also applied an additional clustering method³⁷ to classify the AALS dataset (Supplementary Fig. 7f). The pathway activation pattern in the heatmap shows that ProS2, characterized by activation of immune and ECM pathways, aligned with the C3 subtype, whereas ProS3, which exhibits ECM suppression, resembled the C4 subtype. Overall, our analysis identified proteomic clusters consistent with previous studies and revealed a novel subtype, ProS4, characterized by translational dysregulation.

Next, we explored protein modules across subtypes using Weighted Gene Co-expression Network Analysis (WGCNA), identifying 14 proteomics-based co-expression modules (Fig. 3e and Supplementary Fig. 8a, b). We selected the most significantly up- or downregulated modules in each subtype for enrichment analysis (Fig. 3f). Among the upregulated modules, the red module (ProS1 and ProS3) was enriched for protein localization and mitophagy pathways, the yellow module (ProS2) for protein metabolism, neurodevelopment, and aging pathways, and the tan module (ProS4) for sugar metabolism pathways. Among the downregulated modules, the black module (ProS1) was enriched for energy metabolism and endocytosis, the turquoise module (ProS2) for RNA splicing-related pathways, the pink module (ProS3) for protein stability pathways, and the blue module (ProS4) for ribosome function and cytoplasmic translation. These data suggest remarkable inter-patient heterogeneity in ALS, with specific functional pathways dysregulated in only a subset of the ALS cohort.
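WGCNA starts from a soft-thresholded co-expression network. The sketch below computes the unsigned adjacency a_ij = |cor(x_i, x_j)|^β and each protein's connectivity; β = 6 is an illustrative value only (the text does not report β, which WGCNA usually picks with a scale-free topology criterion, e.g. `pickSoftThreshold` in the R package), and module detection via topological overlap and hierarchical clustering is not shown. The abundances are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def soft_threshold_adjacency(profiles, beta=6):
    """Unsigned WGCNA-style adjacency a_ij = |cor(x_i, x_j)| ** beta.

    `profiles` maps protein name -> abundance values across the same samples.
    Raising |r| to a power keeps strong correlations and shrinks weak ones.
    """
    names = list(profiles)
    adj = {}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            adj[(a, b)] = adj[(b, a)] = abs(pearson(profiles[a], profiles[b])) ** beta
    return adj


def connectivity(adj, names):
    """Whole-network connectivity k_i = sum over j of a_ij."""
    return {n: sum(w for (a, _), w in adj.items() if a == n) for n in names}


# Hypothetical abundances for three proteins across five samples
profiles = {"P1": [1, 2, 3, 4, 5], "P2": [2, 4, 6, 8, 10], "P3": [5, 3, 4, 1, 2]}
adj = soft_threshold_adjacency(profiles)
print(adj[("P1", "P2")], adj[("P1", "P3")])
print(connectivity(adj, profiles))
```

A correlation of -0.8 becomes an adjacency of about 0.26 at β = 6, which is how soft thresholding suppresses moderate correlations without imposing a hard cutoff.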

Consensus clustering integrated with machine learning modelling in AALS identifies subtype-specific targets in ALS patients

We examined the expression pattern of the PMA110 signature across the AALS proteomic subtypes and observed that certain proteins were differentially expressed among subtypes (Supplementary Fig. 9a). To identify potential targets for experimental validation, we hypothesized that consensus clustering could reduce patient heterogeneity by stratifying individuals into molecular subtypes, and that subtype-specific differentially expressed proteins (DEPs) would better explain the expression patterns of the features selected by machine learning, helping to elucidate subtype-specific regulators and identify biologically meaningful targets within ALS patient cohorts. To this end, we established a pipeline integrating AALS subtype-specific protein data (ProS) and the machine learning-identified protein signature (PMA110) with ALS patient proteomic analysis (Fig. 4a). In contrast to the whole-cohort analysis, differential expression analysis between each subtype and the control group in the AALS dataset yielded sets of differentially expressed proteins, indicating that patient heterogeneity was indeed resolved at the molecular subtype level (Fig. 4b, Supplementary Fig. 9b–e and Supplementary Data 2). Overlapping these subtype-specific DEPs with PMA110 yielded 34 proteins (Fig. 4c and Supplementary Fig. 9f). Because the AALS dataset was derived from iPSC-differentiated motor neurons, we sought to validate whether these motor neuron-intrinsic candidates were recapitulated in an ALS patient single-cell proteomic dataset obtained by laser-capture microdissection of individual motor neurons from the thoraco-lumbar ventral spinal horns. Owing to the technical limitations of proteomics, only 11 of the 34 candidates were detectable in postmortem tissue. Among these, five proteins (RPS29, RALA, GPS1, ARRB1, and TBC1D9B) were differentially expressed in ALS motor neurons compared with controls.
Notably, expression changes in AALS models and postmortem tissue were concordant for RPS29, RALA, and GPS1 (Fig. 4d–f). Given the involvement of RPS29 in translational regulation and ribosome integrity⁴⁴, we proceeded with functional validation of RPS29. In contrast to the downregulation of RPS29 protein, RPS29 mRNA levels were not reduced in ProS4 (Supplementary Fig. 9g). Because mis-splicing of RPS29 has been implicated in Diamond-Blackfan anemia⁴⁵, we examined alternative splicing events of RPS29, but detected no significant differences between ProS4 and the control group (Supplementary Fig. 9h). These findings suggest that RPS29 protein downregulation is not explained by changes in transcription or alternative splicing.
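The prioritization funnel in this section (34 overlapping proteins, 11 detected postmortem, 5 differentially expressed, 3 concordant) is essentially a chain of set filters. A minimal sketch with placeholder protein names and values; the significance cutoff (alpha = 0.05) and the sign-agreement rule for concordance are assumptions, since the exact criteria are not stated here.

```python
def prioritize(subtype_lfc, signature, postmortem, alpha=0.05):
    """Apply overlap, detection, significance, and direction filters in turn.

    subtype_lfc: protein -> log2 fold change (subtype vs control) for proteins
                 already called differentially expressed in a subtype.
    signature:   set of machine-learning-selected proteins (e.g. PMA110).
    postmortem:  protein -> (log2 fold change, p-value) in patient motor neurons;
                 proteins not detected are simply absent.
    """
    overlap = {p for p in subtype_lfc if p in signature}
    detected = {p for p in overlap if p in postmortem}
    significant = {p for p in detected if postmortem[p][1] < alpha}
    # keep proteins that change in the same direction in both datasets
    concordant = {p for p in significant if (subtype_lfc[p] > 0) == (postmortem[p][0] > 0)}
    return overlap, detected, significant, concordant


# Placeholder proteins and values, not the study's data
overlap, detected, significant, concordant = prioritize(
    subtype_lfc={"PROT_A": -1.2, "PROT_B": 0.8, "PROT_C": -0.5, "PROT_D": 0.3},
    signature={"PROT_A", "PROT_B", "PROT_C", "PROT_X"},
    postmortem={"PROT_A": (-0.9, 0.01), "PROT_B": (-0.4, 0.02), "PROT_C": (-0.2, 0.40)},
)
print(len(overlap), len(detected), len(significant), sorted(concordant))
# prints: 3 3 2 ['PROT_A']
```

Keeping every intermediate set makes each attrition step auditable, which matters when detection limits (as in single-cell proteomics) remove most candidates.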

**Fig. 4: Consensus clustering integrated with machine learning modelling in AALS identifies subtype-specific target in ALS patients.**

RPS29 serves as a quality control regulator of protein translation

Precise control of protein translation is essential for motor neuron function; however, its dysregulation in ALS is not well understood. Motivated by the findings from our integrative analysis of ALS patient-derived cellular models and postmortem tissues, we sought to determine whether reduced levels of RPS29, a ribosomal protein of the 40S ribosomal subunit, account for dysregulated translational control in ALS. We first analyzed the role of RPS29 in global protein translation using a puromycin incorporation assay and observed a significant reduction in the protein translation rate after RPS29 knockdown (Fig. 5a, b and Supplementary Fig. 10a). This reduced translation further disrupted protein homeostasis, as evidenced by a concomitant increase in global ubiquitylation, and led to neuronal cell death (Fig. 5c–e). Faithful translation of mRNA into protein requires intact ribosomal function, and neurons are particularly sensitive to translational infidelity⁴⁶. To analyze how RPS29 loss affects translational fidelity, we introduced two dual-luciferase reporters that measure stop codon readthrough and amino acid misincorporation, respectively, two common types of translational error⁴⁷,⁴⁸. Renilla luciferase serves as an internal control for both mRNA abundance and normal translation, whereas Firefly luciferase is silenced by either a key mutation (H245R) or an upstream stop codon and regains activity only when misincorporation or stop codon readthrough occurs (Fig. 5f). After RPS29 knockdown, both types of translation error increased (Fig. 5g). In contrast, the frequency of translation initiation at near-cognate start codons was not affected by RPS29 knockdown⁴⁹ (Supplementary Fig. 10b, c). These data suggest that RPS29 loss impairs both the rate and the accuracy of protein synthesis.
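The reporter readout reduces to a ratio of ratios. A minimal sketch, assuming per-well Firefly/Renilla normalization averaged over replicates and expressed as knockdown relative to control (the exact normalization is not spelled out here; some assays additionally normalize to an intact-Firefly reporter). The luminescence values are hypothetical.

```python
from statistics import mean


def normalized_signal(fluc, rluc):
    """Mean Firefly/Renilla ratio over replicate wells.

    Renilla corrects each well for reporter mRNA level and bulk translation, so
    the ratio tracks how often the silenced Firefly ORF is rescued by an error.
    """
    return mean(f / r for f, r in zip(fluc, rluc))


def error_fold_change(fluc_kd, rluc_kd, fluc_ctrl, rluc_ctrl):
    """Translation-error signal after knockdown relative to the control condition."""
    return normalized_signal(fluc_kd, rluc_kd) / normalized_signal(fluc_ctrl, rluc_ctrl)


# Hypothetical luminescence readings (three wells per condition)
fold = error_fold_change(fluc_kd=[40, 44, 36], rluc_kd=[1000, 1100, 900],
                         fluc_ctrl=[20, 22, 18], rluc_ctrl=[1000, 1100, 900])
print(round(fold, 2))
# prints: 2.0
```

Normalizing to Renilla within each well matters here because RPS29 knockdown lowers global translation; without it, a drop in overall reporter output could mask an increase in error frequency.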

RPS29 maintains STMN2 translation and protein expression

To further understand the molecular consequences of RPS29 loss in ALS, we performed ribosome profiling (Ribo-seq) combined with RNA-seq to characterize the translatome and transcriptome of wild-type and RPS29 knockdown cells and to explore RPS29-regulated cellular processes (Supplementary Fig. 10d–g). GSEA showed that a series of key pathways were dysregulated after RPS29 knockdown, including ribosomal proteins, translation and translational quality control, microtubule growth, and the DNA damage response (Fig. 6a, b). Interestingly, the p53 pathway was activated after RPS29 knockdown, in line with p53 activation in ALS patient samples⁵⁰. Consistent with the role of RPS29 in cytoplasmic ribosomes, the protein translation pathway was downregulated at the translatome level (Supplementary Fig. 10h).
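Pairing Ribo-seq with RNA-seq lets translation be separated from mRNA abundance through translational efficiency (TE), the ribosome-footprint signal per unit of mRNA. A minimal sketch, assuming CPM normalization and a pseudocount; the read counts are hypothetical, and a real analysis would use replicates and a dedicated count-based model (such as Xtail) for statistics.

```python
from math import log2


def cpm(counts):
    """Counts per million for a dict gene -> raw read count."""
    total = sum(counts.values())
    return {g: c * 1e6 / total for g, c in counts.items()}


def log2_te_change(ribo_wt, rna_wt, ribo_kd, rna_kd, pseudo=0.5):
    """log2(TE_kd / TE_wt) per gene, with TE = footprint CPM / mRNA CPM.

    Negative values mean fewer ribosomes per transcript after knockdown.
    """
    r_wt, m_wt, r_kd, m_kd = (cpm(x) for x in (ribo_wt, rna_wt, ribo_kd, rna_kd))
    change = {}
    for g in ribo_wt:
        te_wt = (r_wt[g] + pseudo) / (m_wt[g] + pseudo)
        te_kd = (r_kd[g] + pseudo) / (m_kd[g] + pseudo)
        change[g] = log2(te_kd / te_wt)
    return change


# Hypothetical read counts for two genes (not data from the study)
te = log2_te_change(ribo_wt={"GENE_A": 100, "GENE_B": 100},
                    rna_wt={"GENE_A": 100, "GENE_B": 100},
                    ribo_kd={"GENE_A": 50, "GENE_B": 150},
                    rna_kd={"GENE_A": 100, "GENE_B": 100},
                    pseudo=0.0)
print(te)
```

In this toy case mRNA levels are unchanged while GENE_A footprints halve, giving a log2 TE change of -1, the kind of signal that distinguishes translational regulation from transcriptional regulation.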

Next, we dissected the downstream targets of RPS29 involved in ALS. Ribo-seq profiling showed decreased translation of ALS-related genes, including TDP43, SOD1, FUS, hnRNPA2B1, CHCHD10, and VCP. Importantly, a panel of proteins essential for motor neurons whose functional loss contributes to ALS pathogenesis, including STMN2, KPNA2, and ELAVL3, showed decreased translation after RPS29 suppression¹⁰,³⁶,⁵¹,⁵² (Fig. 6c). We then confirmed significant reductions in these proteins using western blotting and qRT-PCR (Fig. 6d, e, Supplementary Fig. 10i). Among them, STMN2 is a microtubule regulator essential for motor neuron axon growth, and its reduced expression is a hallmark of ALS. While previous studies suggested that loss of STMN2 in ALS is caused by a TDP43-associated cryptic splicing and polyadenylation mechanism¹⁰,³⁶, our data indicate that RPS29 serves as an additional regulator that sustains STMN2 protein expression at the translational level.

Beyond conventional protein translation, we next analyzed the role of RPS29 in ALS-associated aberrant translation. Repeat-associated non-AUG (RAN) translation of the disease-causing expanded (GGGGCC)ₙ repeat in C9orf72, the most common genetic cause of ALS, produces highly toxic dipeptide-repeat (DPR) proteins in C9orf72-ALS patients. We developed a dual fluorescent reporter to monitor RAN translation activity in cells with RPS29 loss. This reporter generates a single mRNA encoding mCherry at its 5′ end, followed by the intron 1 A region of the human C9orf72 gene containing (GGGGCC)₉₆ repeats and a GFP gene lacking an AUG start codon. Whereas mCherry is produced by conventional translation, GFP can only be produced through cap-independent RAN translation of the upstream (GGGGCC)₉₆ repeats, yielding a poly(Gly-Ala)-GFP fusion protein. RAN translation activity was significantly enhanced by RPS29 knockdown, as evidenced by increased polyGA-GFP protein levels and high-molecular-weight aggregates (Fig. 6f, g). Consistently, we observed larger cytoplasmic GFP-positive polyGA inclusions in RPS29 knockdown cells (Fig. 6h).

Next, we explored the functional relevance of RPS29 in human iPSC-derived motor neurons (Supplementary Fig. 11a). Motor neurons showed a significant reduction in viability after RPS29 suppression (Fig. 6i). To confirm the RPS29-STMN2 regulatory axis, we further investigated the expression and distribution of STMN2 protein in motor neurons following RPS29 knockdown. In contrast to STMN2 RNA levels (Supplementary Fig. 11b), immunofluorescence revealed decreased axonal STMN2 after RPS29 suppression, in line with the role of STMN2 in axon protection (Fig. 6j). Taken together, these data suggest that RPS29 loss in ALS patients may disrupt protein homeostasis by inhibiting conventional translation and STMN2 protein expression while promoting translational errors and aberrant RAN translation.

To further validate the functional importance of RPS29, we examined whether re-expression of RPS29 could rescue the phenotypic defects induced by RPS29 downregulation in SH-SY5Y cells and motor neurons. RPS29 overexpression significantly restored the cell viability lost upon RPS29 inhibition (Fig. 6k), supporting the functional importance of RPS29 and suggesting the therapeutic potential of RPS29 gene therapy for patients with RPS29 downregulation.

Discussion

Genome-wide association studies (GWAS) have significantly advanced our understanding of the genetic architecture of ALS, defining a set of genomic alterations associated with the disease⁵³⁻⁵⁶. However, discovering therapeutic targets for ALS remains challenging because of the technical difficulty of targeting genetic alterations, such as correcting mutations with high efficiency and accuracy, and the complexity of their downstream biological consequences. Target discovery at regulatory levels, including proteins, epigenetics, and metabolism, therefore holds promise. In this study, we used proteomic and transcriptomic data from over 200 cases in the AALS database to construct a machine learning framework, through which we identified 110 proteins linked to ALS (the PMA110 signature). Interestingly, the PMA110 signature showed limited overlap with previously reported ALS-related genes, which likely reflects differences between the proteomic, transcriptomic, and genetic levels. In addition, heterogeneity among ALS patient cohorts, which include patients of different ethnicities across Europe, Africa, and Asia, contributes to the different gene sets reported in different studies, even when the datasets were derived from the same molecular level⁵³⁻⁵⁵,⁵⁷. By integrating the subtype-specific expression patterns of the PMA110 signature with proteomic profiling of ALS patient spinal cord, our study pinpointed a panel of proteomic biomarkers and targets, especially RPS29, suggesting dysregulation of the relevant biological functions in ALS subtypes.

One major obstacle to target discovery for brain disorders is the lack of patient-derived experimental models that represent patients’ pathology and molecular characteristics. Several research groups have generated iPSCs from patients harboring pathogenic ALS mutations, revealing disease-specific cellular phenotypes in vitro and dissecting the pathogenic role of genetic mutations⁵⁸. Moreover, iPSC-derived ALS motor neurons have been applied in high-throughput drug screening, identifying ropinirole as a promising therapeutic candidate⁵⁹. Together with these studies, our study highlights the potential value of iPSC models for drug screening, multi-omics analysis, and functional validation in neurodegenerative diseases. Further improvements, such as greater genetic diversity, optimized motor neuron differentiation, and multi-cell-type cultures such as organoids, would make iPSC models more representative of ALS patients.

In our investigation, we observed few differentially expressed genes between the ALS and control groups in iPSC-derived motor neurons. This observation aligns with previous findings that the iPSC motor neuron (iPSMN) model shows only mild differential gene expression⁶⁰, and it also underscores the inherent genetic heterogeneity of ALS. In recent years, artificial intelligence techniques, particularly machine learning tools, have been increasingly applied to sequencing data to enable precise diagnosis. For instance, machine learning has been used to analyze large-scale cerebrospinal fluid proteomic data, identifying novel biomarkers that improve the early and accurate diagnosis of Alzheimer’s disease and Parkinson's disease and contributing to a deeper understanding of disease mechanisms. To address this complexity in ALS, we applied several machine learning methods to identify proteomic features associated with ALS. Among the methods tested, Random Forest, XGBoost, and Support Vector Machines (SVM) showed comparable predictive performance, and the analysis yielded 110 ALS-associated protein markers. Several improvements to the statistical modeling of ALS could be made in future studies. First, a larger dataset with more samples would help machine learning models avoid overfitting and class imbalance. Second, additional regulatory omics data would provide complementary information; state-of-the-art algorithms that integrate proteomics, metabolomics, and epigenetics with biological pathways and functional associations would further improve the biological relevance of the machine learning process. Last, as sample sizes and multi-omics information grow, recently developed methods such as deep learning (e.g., convolutional neural networks) and large language models should be explored to handle the complexity of biological context and improve the accuracy and reliability of disease prediction models.

The PMA110 signature represents a novel list of proteomic candidates for ALS biomarkers and targets that warrants further experimental and clinical validation. Consistent with PMA110, subtype DEPs also showed little overlap with well-established ALS genes, suggesting a distinct role for protein-level regulation. We hypothesized that subtype-specific DEP expression would better explain the expression trends of features selected by ML-based models and help identify biologically meaningful targets in patient cohorts or subtypes. In this study, we focused on RPS29, a component of the 40S ribosomal subunit that plays a fundamental role in translation initiation. RPS29 is downregulated in ALS patients, and this reduction is not associated with any known genetic variant. RPS29 mutations have not been linked to ALS; instead, whole-exome sequencing and functional analyses have associated them with Diamond-Blackfan anemia (DBA), suggesting that the contribution of specific ribosomal proteins to ribosome assembly and protein translation may be cell- and tissue-specific⁴⁵. Notably, in zebrafish models of DBA, Rps29 mutation or knockout induced p53 pathway activation, and p53 suppression rescued the morphological and hematopoietic defects associated with RPS29 knockdown^61,62. This aligns with our finding that RPS29 knockdown upregulates the p53 signaling pathway, which is also activated in ALS samples. Therefore, our data suggest that RPS29 regulates p53 pathway activity in ALS.

We also explored the functional and mechanistic roles of RPS29 specifically in ALS. Beyond an overall reduction in protein translation, RPS29 inhibition significantly reduced the levels of several proteins essential for motor neurons, including STMN2. STMN2 is a member of the nervous system-specific stathmin family, which binds tubulin dimers to regulate microtubule stability, and it is essential for axonal regeneration and motor neuron survival. It has been reported that TDP-43 binds STMN2 pre-mRNA and suppresses the inclusion of a cryptic exon in the first intron. TDP-43 loss of function therefore leads to cryptic exon inclusion and reduced STMN2 mRNA expression, which is considered a primary pathological hallmark of ALS^36,63,64. Our data indicate an additional mechanism that maintains STMN2 expression, through RPS29-controlled STMN2 protein translation, alongside TDP-43-mediated RNA regulation. The interplay and relative contributions of these two regulatory mechanisms warrant further study. It would also be of particular interest to test whether the RPS29-STMN2 axis is preserved in large-scale patient samples and could be exploited for therapeutic intervention in ALS, for example through adeno-associated virus-mediated gene therapy to restore RPS29 expression in affected motor neurons.

Mounting evidence suggests that pathogenic RAN translation of expanded tandem repeats is closely associated with multiple neurological disorders, including C9orf72-ALS^65,66,67. Moreover, recent studies suggest a native function for RAN translation of tandem repeats within the physiological range⁶⁸. However, the underlying mechanism of RAN translation remains poorly understood. Previous studies suggest that RPS25, a protein component of the small (40S) ribosomal subunit, is required for efficient RAN translation⁶⁹. By contrast, in the present study, we demonstrate that another 40S ribosomal protein, RPS29, serves as a key quality controller of protein translation by suppressing RAN translation while sustaining conventional translation. Collectively, these data suggest that 40S ribosomal subunit function may be a key regulator of RAN translation.

Taken together, this study applied proof-of-concept machine learning modeling to proteomic data from ALS patient-derived motor neurons and identified 110 proteins for further validation as biomarkers or targets. One of these proteins, RPS29, which maintains the translation process, especially STMN2 translation, and suppresses aberrant RAN translation, represents a novel candidate therapeutic target.

Methods

Multi-omics datasets from Answer ALS cohort

The clinical information, RNA-seq data and proteomics data from iPSC-differentiated motor neurons were obtained from the Answer ALS Data Portal (https://dataportal.answerals.org) with approved permission.

Cell culture

Human SH-SY5Y neuroblastoma cells, HeLa cells, and 293T cells were cultured in Dulbecco’s modified Eagle’s medium (DMEM, Thermo Fisher Scientific; C11995500BT) containing 10% fetal bovine serum (FBS) (Oricell; FBSSR-01021-500), 50 units/mL penicillin, and 50 µg/mL streptomycin. The cell lines were maintained at 37 °C in an incubator with 5% CO₂.

Plasmid construction, lentiviral production, and transfection

The short hairpin RNA (shRPS29: 5’-GCTCTTGTCGTGTCTGTTCAA-3’) was inserted into the pLKO.1 plasmid between the AgeI and EcoRI restriction sites, and the insertion was confirmed by Sanger sequencing. The RPS29 overexpression vector was constructed by introducing a synonymous mutation (CGT to AGG at codon 22, both encoding arginine) at the shRNA binding site by PCR, thereby disrupting the shRNA recognition sequence. The mutated fragment was then cloned into the pLenti-PGK-DEST-Hygro lentiviral expression vector using Gateway recombination cloning (Thermo Fisher Scientific; 11789020; 11791020). For the repeat-associated non-AUG (RAN) translation reporter, three tandem stop codons were appended to the 3’ end of mCherry by PCR amplification, and the product was inserted upstream of the (GGGGCC)₉₆ repeat sequence, including the adjacent intronic sequences, in the pcDNA3.1-(GGGGCC)₉₆ vector using the XbaI and BamHI restriction sites. EGFP lacking the ATG start codon was then subcloned downstream of the (GGGGCC)₉₆ repeats, in frame with the poly(GA) reading frame. For the dual-luciferase reporter, Renilla luciferase and firefly luciferase were sequentially ligated into the same expression vector by PCR; in the control construct, the two luciferases were separated by an in-frame linker encoding a sense codon. To generate stop codon readthrough constructs, the linker codon CGA was mutated to the stop codon TGA (UGA in mRNA). For misincorporation measurements, a reporter harboring a mutation in the firefly luciferase active site (H245R; CAC to CGC at codon 245) was used. For the near-cognate start codon reporter, full-length EGFP and an EGFP variant with the start codon mutated to CTG were amplified by PCR. The CMV promoter was then inserted upstream of EGFP by overlap PCR, and the resulting construct was integrated into the stop codon region of mCherry in the pLenti-ef1a-mc vector via the XbaI and EcoRI sites.
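
The codon changes used in the rescue and reporter constructs can be sanity-checked with a few lines of code. The sketch below uses a hand-written subset of the standard genetic code that covers only the codons named in this paragraph (it is not a full translation table), and the helper names `describe_change` and `hamming` are hypothetical illustrations rather than part of the authors' workflow.

```python
# Subset of the standard genetic code covering only the codons named above.
CODONS = {
    "ATG": "M", "CTG": "L",
    "CGT": "R", "CGC": "R", "CGA": "R", "AGG": "R",
    "CAC": "H", "TGA": "*",
}

def describe_change(before: str, after: str) -> str:
    """Classify a single-codon substitution as synonymous, missense, or nonsense."""
    aa_before, aa_after = CODONS[before], CODONS[after]
    if aa_after == "*":
        return "nonsense"
    if aa_before == aa_after:
        return "synonymous"
    return "missense"

def hamming(a: str, b: str) -> int:
    """Number of differing nucleotide positions between two equal-length codons."""
    return sum(x != y for x, y in zip(a, b))

if __name__ == "__main__":
    # shRNA-resistant rescue: both codons encode Arg, two nucleotides differ.
    print(describe_change("CGT", "AGG"), hamming("CGT", "AGG"))
    # Readthrough linker: sense codon CGA replaced by the stop codon TGA.
    print(describe_change("CGA", "TGA"))
    # Misincorporation reporter: codon 245 CAC (His) to CGC (Arg).
    print(CODONS["CAC"], "->", CODONS["CGC"])
    # Near-cognate start: CTG differs from ATG at a single position.
    print(hamming("ATG", "CTG"))
```

Because the rescue change alters two of the three positions while keeping arginine, it both preserves the protein sequence and weakens shRNA base pairing at that site.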

293T cells were used to produce lentiviral particles by co-transfection with the packaging vectors psPAX2 and pMD2.G using LipoD293 In Vitro DNA Transfection Reagent (SignaGen Laboratories; SL100668). The medium was changed after 12 h. The lentiviral supernatant was collected 48 h after the medium change, filtered through a 0.45 μm filter, and concentrated (Beyotime; C2901S) according to the manufacturer’s instructions. Briefly, 9 mL of viral supernatant was concentrated and resuspended in 450 μL of resuspension solution (a 20-fold concentration) and stored at −80 °C until use. For lentiviral infection, SH-SY5Y cells, motor neurons, HeLa cells, and 293T cells were infected with concentrated shRNA lentivirus and RPS29-overexpressing lentivirus, and the medium was replaced after 72 h. Knockdown efficiency was measured by qRT-PCR or immunoblotting 48 h post-infection. For transient transfection, plasmids were transfected using Lipo8000™ Transfection Reagent according to the manufacturer’s instructions, and cells were analyzed 72–96 h post-transfection.

RNA Isolation and qRT-PCR

Total cellular RNA was extracted using the RNAprep Pure Cell/Bacteria Kit (TIANGEN; DP430) and reverse transcribed into cDNA using a reverse transcription premix (Accurate Biology; AG11706). Briefly, 1 μg of total RNA was used for cDNA synthesis with random primers according to the manufacturer’s instructions. Quantitative real-time PCR (qRT-PCR) was performed using SYBR Green Master Mix (Thermo Fisher Scientific; A25778) on an Applied Biosystems StepOnePlus Real-Time PCR System, and expression was normalized to 18S ribosomal RNA or GAPDH. qRT-PCR primers were designed using Primer3 (https://primer3.ut.ee/).
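
The Methods do not state how normalized expression was computed. The widely used 2^−ΔΔCt method is one common choice and is assumed here purely for illustration. The sketch below computes the fold change of a target gene relative to a reference gene (such as GAPDH or 18S rRNA) and a control condition, assuming roughly 100% amplification efficiency; the function name and example Ct values are hypothetical.

```python
from statistics import mean

def fold_change_ddct(target_ct, ref_ct, target_ct_ctrl, ref_ct_ctrl, base=2.0):
    """Relative expression by the 2^-ddCt method (an assumption, not stated in the Methods).

    Each argument is a list of replicate Ct values; base=2 assumes ~100% PCR efficiency.
    """
    d_ct_sample = mean(target_ct) - mean(ref_ct)              # dCt in the test condition
    d_ct_control = mean(target_ct_ctrl) - mean(ref_ct_ctrl)   # dCt in the control condition
    dd_ct = d_ct_sample - d_ct_control
    return base ** (-dd_ct)

# Hypothetical example: target Ct rises by 2 cycles relative to the reference gene.
print(fold_change_ddct([26.0, 26.2], [18.0, 18.2], [24.0, 24.2], [18.0, 18.2]))
```

In this hypothetical example a ΔΔCt of 2 gives a fold change of 0.25, which would correspond to about 75% knockdown.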

SH-SY5Y neuronal differentiation

Human SH-SY5Y cells were seeded at a density of 1 × 10⁵ cells per well in 6-well culture plates in DMEM (Thermo Fisher Scientific; C11995500BT) supplemented with 10% FBS (Oricell; FBSSR-01021-500), 50 units/mL penicillin, and 50 µg/mL streptomycin. After 24 h, the medium was changed to DMEM supplemented with 5% FBS, 50 units/mL penicillin, 50 µg/mL streptomycin, 4 mM L-glutamine, and 10 µM retinoic acid (MERCK; R2625). After 72 h, the medium was switched to Neurobasal medium (Thermo Fisher Scientific; 12348017) containing 1% N-2 supplement (100×; Thermo Fisher Scientific; 17502048), 50 units/mL penicillin, 50 µg/mL streptomycin, 1% L-glutamine, and 50 ng/mL human BDNF (PEPROTECH; 450-02). Cells were cultured for an additional 3 days to complete differentiation.

Motor neuron generation

The protocol for differentiating human iPSCs into motor neurons was based on a previous report with minor modifications⁷⁰. Briefly, on Day 0, a commercially obtained fibroblast-derived iPSC clone was dissociated and plated into a single well of a Matrigel-coated 6-well plate with 10 µM ROCK inhibitor (Selleck; S1049). On Day 1, the medium was changed to neuronal differentiation medium consisting of 50% Neurobasal medium (Thermo Fisher; 21103049) and 50% DMEM/F12 medium (Thermo Fisher; C11330500BT), supplemented with 1× GlutaMAX (Thermo Fisher; 35050061), 0.5× N2 (Thermo Fisher; 17502048), B27 supplement (Thermo Fisher; 17504044), and 0.1 mM ascorbic acid (Sigma; A4544). In addition, 3 µM CHIR99021 (Sigma; SML1046), 2 µM SB431542 (Selleck; S1067), and 2 µM DMH-1 (Selleck; S7146) were added, and the culture was maintained for 6 days. Cells were then dissociated and plated onto Matrigel-coated 10-cm dishes in neuronal medium containing 1 µM CHIR99021, 2 µM DMH-1, 2 µM SB431542, 0.1 µM retinoic acid (RA, Sigma; R2625-50MG), and 0.5 µM purmorphamine (Selleck; S3042). On Day 13, cells were dissociated using 1 U/mL dispase (Stemcell; 7923) and transferred to ultra-low-attachment 10-cm plates (LABSELECT; 12331) in neuronal medium supplemented with 0.5 µM RA and 0.2 µM purmorphamine. On Day 21, cells were detached using 1× Accutase (Stemcell; 07920) and seeded into PDL/laminin (Sigma; P7405, L2020)-coated 6-well plates at a density of 2 × 10⁶ cells/well and onto PDL/laminin-coated slides at a density of 1.6 × 10⁵ cells/well. The culture medium consisted of neuronal medium supplemented with 0.5 µM RA, 0.2 µM purmorphamine, and 0.1 µM Compound E (Sigma; 565790-500UG). To knock down RPS29 in iPSC-derived motor neurons, cells were transduced with lentivirus expressing shRPS29 for 12 h on Day 25, followed by an 8-day culture period before experimental analysis. To overexpress RPS29 in iPSC-derived motor neurons, cells were transduced with concentrated RPS29-overexpressing lentivirus for 12 h on Day 25 of differentiation, followed by an 8-day culture period to allow stable expression before experimental analysis.

Neuronal viability

Motor neurons were plated at a density of 2 × 10⁵ cells/well in 24-well plates with three or four replicate wells. Cell viability was measured using CellTiter-Glo (Promega), and the luminescence of each well was read on a Tecan Spark plate reader.

Dual luciferase assays for translation fidelity

Translational fidelity in cells was measured using a dual-luciferase reporter system. Plasmids were transfected using Lipo8000™ Transfection Reagent (Beyotime; C0533-1.5 ml) according to the manufacturer’s instructions, and cells were harvested 48 h after transfection. The percentage of stop codon readthrough or misincorporation was calculated by dividing the firefly-to-Renilla luciferase ratio of the experimental reporter by the mean firefly-to-Renilla ratio of the control reporter, as described previously^48,71. Luciferase activities were quantified using the Dual Luciferase Reporter Gene Assay Kit (Yeasen; 11402ES60). Cells from each sample were lysed in 200 µL of lysis buffer at 4 °C for 5 min, and the lysates were transferred to a black 96-well plate and analyzed using a Tecan Spark plate reader.
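
The readthrough and misincorporation calculation described above can be sketched as follows. It assumes firefly and Renilla readings are paired per well (read from the same lysate) and that the result is expressed as a percentage; the function name and example readings are hypothetical.

```python
from statistics import mean

def percent_readthrough(exp_fluc, exp_rluc, ctrl_fluc, ctrl_rluc):
    """Per-well readthrough (or misincorporation) as a percentage of the control reporter.

    Each argument is a list of luminescence readings; firefly and Renilla values
    are paired by index because they come from the same well.
    """
    # Mean firefly/Renilla ratio of the control (sense-codon) reporter.
    ctrl_ratio = mean(f / r for f, r in zip(ctrl_fluc, ctrl_rluc))
    # Each experimental well's ratio, normalized to the control mean.
    return [100.0 * (f / r) / ctrl_ratio for f, r in zip(exp_fluc, exp_rluc)]

# Hypothetical readings: control ratio 10, experimental ratios 0.2 and 0.3.
print(percent_readthrough([20, 30], [100, 100], [1000, 1000], [100, 100]))
```

Normalizing to the Renilla signal within each well corrects for differences in transfection efficiency and cell number before the comparison with the control reporter.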

Translation measurement using puromycin incorporation assay

This protocol measures global translation via puromycin incorporation, and the labeling itself has no discernible impact on overall translation⁷². To assess translation, puromycin was added to the culture medium at a final concentration of 5 µg/mL for 5 min before harvesting. Cells were then washed with ice-cold PBS and lysed in RIPA buffer (50 mM Tris, pH 8.0, 150 mM NaCl, 0.1% SDS, 1% NP40, and 1% sodium deoxycholate) supplemented with 1× protease inhibitor cocktail. Lysates were subjected to SDS-PAGE followed by western blot analysis.

Western blot

Cells were collected and lysed using RIPA buffer (50 mM Tris, pH 8.0, 150 mM NaCl, 0.1% SDS, 1% NP40, and 1% sodium deoxycholate) supplemented with a 1× protease inhibitor cocktail. Cell lysates were then mixed with 5× SDS sample buffer (250 mM Tris, pH 6.8, 750 mM NaCl, 5% NP40, 5% sodium deoxycholate, 10% SDS, 5% 2-mercaptoethanol, and 60 mM EDTA) and boiled for 10 min. The prepared samples were subjected to SDS-PAGE analysis, followed by protein transfer onto nitrocellulose membranes. The membranes were then probed with the indicated primary antibodies, with GAPDH or tubulin serving as loading controls.

Primary antibodies for immunoblotting were as follows: anti-puromycin (Sigma, MABE343, 1:500 dilution); anti-ubiquitin (Proteintech, 10201-2-AP, 1:5000 dilution); anti-RPS29 (Proteintech, 17374-1-AP, 1:2000 dilution); anti-STMN2 (Proteintech, 10586-1-AP, 1:2000 dilution); anti-ELAVL3 (Proteintech, 55047-1-AP, 1:2000 dilution); anti-KPNA2 (Proteintech, 10819-1-AP, 1:2000 dilution); anti-GAPDH (Proteintech, 60004-1-Ig, 1:500,000 dilution); anti-GFP-Tag (ABclonal, AE078, 1:10,000 dilution); anti-mCherry-Tag (ABclonal, AE002, 1:5000 dilution).

Immunofluorescence

For immunofluorescence analysis, cells grown on coverslips were rapidly rinsed with PBS and then fixed in 4% paraformaldehyde in PBS for 20 min at room temperature. Fixed cells were then permeabilized with 0.1% Triton X-100 in PBS and treated with blocking buffer (1× PBS, 3% BSA, and 0.1% Tween-20) for 30 min at room temperature. Cells were incubated with primary antibodies in blocking buffer overnight at 4 °C and washed with 1× PBS supplemented with 0.1% Tween-20. Then, cells were incubated with fluorescently conjugated secondary antibodies in blocking buffer for 1.5 h at room temperature and washed with 1× PBS supplemented with 0.1% Tween-20. Coverslips were mounted with ProLong Diamond with DAPI (Thermo Fisher).

Images were captured on an FV3000 confocal microscope (Olympus) with FV31S-SW software and a 60× oil-immersion objective, using identical settings to allow comparison of signal intensities across samples. Images were analyzed using Fiji. For each sample, imaging fields were randomly selected. For motor neuron image analysis, the soma of each neuron was first segmented to calculate the mean STMN2 intensity. The proximal ~40 µm of each axon, adjacent to the soma, was then segmented to calculate the mean axonal STMN2 intensity. The relative axonal STMN2 level was calculated for each neuron as the mean axonal intensity divided by the mean somatic intensity.
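
The per-neuron axon-to-soma ratio can be sketched in plain Python, assuming pixel intensities have already been extracted from the Fiji segmentations and that each axon trace is ordered from the soma outward. The pixel size and the helper names are hypothetical; only the 40 µm proximal window comes from the text.

```python
from statistics import mean

def proximal_axon(trace, pixel_size_um, length_um=40.0):
    """Keep the first `length_um` of an axon intensity trace (ordered from the soma outward)."""
    n_pixels = int(round(length_um / pixel_size_um))
    return trace[:n_pixels]

def axon_soma_ratio(soma_pixels, axon_trace, pixel_size_um):
    """Mean proximal-axon STMN2 intensity divided by mean somatic STMN2 intensity."""
    return mean(proximal_axon(axon_trace, pixel_size_um)) / mean(soma_pixels)

# Hypothetical neuron imaged at 0.5 µm/pixel: 40 µm = 80 pixels of axon are used.
trace = [50.0] * 80 + [5.0] * 20
print(axon_soma_ratio([100.0, 100.0], trace, pixel_size_um=0.5))
```

Computing the ratio within each neuron, rather than comparing raw axonal intensities, controls for cell-to-cell differences in overall STMN2 expression.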

MOGONET analysis

The Multi-Omics Graph cOnvolutional NETworks (MOGONET) framework was obtained from its GitHub repository²⁴. The parameters were modified to accommodate the two available omics data types (RNA-seq and proteomic data). For each omics dataset, the 200 features with the largest variance were selected to prioritize the most variable and potentially informative features. Each omics data type was individually scaled to [0, 1] by a linear transformation for training. Samples were randomly divided into a training set (70% of samples) and a testing set (30% of samples) for model training and evaluation. Finally, the top 50 biomarkers were identified for each of the transcriptomic and proteomic datasets.
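
A minimal sketch of the preprocessing described above (top-200 variance filtering, [0, 1] scaling, and a 70/30 random split), written in plain Python rather than with MOGONET's own code. Per-feature min-max scaling is an assumption about what "linear transformation" means here, and the function names are hypothetical.

```python
import random
from statistics import pvariance

def top_variance_features(data, k=200):
    """data: dict feature -> list of values across samples; keep the k most variable features."""
    ranked = sorted(data, key=lambda f: pvariance(data[f]), reverse=True)
    return {f: data[f] for f in ranked[:k]}

def minmax_scale(data):
    """Linearly rescale each feature to [0, 1]; constant features become all zeros."""
    scaled = {}
    for f, vals in data.items():
        lo, hi = min(vals), max(vals)
        span = hi - lo
        scaled[f] = [(v - lo) / span if span > 0 else 0.0 for v in vals]
    return scaled

def train_test_split_indices(n_samples, train_frac=0.7, seed=0):
    """Randomly partition sample indices into training and testing sets."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    cut = int(round(train_frac * n_samples))
    return sorted(idx[:cut]), sorted(idx[cut:])
```

Because variance is computed on unscaled values, features on larger numeric scales are favored; applying the filter separately within each omics type, as the text describes, keeps that effect within one data type.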

Machine learning-based PMA110 signature

We retrieved the Matrix of Intensities data from the AALS proteomics data portal (https://data.answerals.org/home). The data matrix was log2-transformed for normalization, missing values were imputed with zero, and proteins not detected in more than 75% of samples were filtered out. To address batch effects, batch information was included as a feature in the machine learning model, allowing the model to account for batch during analysis. Given the limited number of control samples (n = 33), the control class was oversampled in Python using “SMOTE” to balance the classes. The model was trained using “RandomForestClassifier” from the “sklearn” library, with 70% of the samples used for training and 30% for prediction in each run. Compared with leave-one-out cross-validation, this repeated random-split approach enabled more stable model evaluation and reduced the risk of overfitting due to the presence of synthetic samples. In each run, “permutation_importance” was used to identify features whose importance exceeded a threshold of 5 × 10⁻³. We tested multiple thresholds and found that different cutoff values yielded largely similar rankings of the selected features. For each feature, we counted the number of runs (out of 10,000) in which its importance exceeded the threshold, and features selected in more than 7000 runs were considered candidate features for further analysis.
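
The filtering and stability-selection logic described above can be sketched as follows. This is not the authors' code. It assumes log2 is applied to positive raw intensities, treats missing or non-positive values as undetected (imputed as 0), and takes per-run importance scores (for example, the mean importances returned by scikit-learn's `permutation_importance`) as already computed dictionaries. Model training and SMOTE are omitted, and the function names are hypothetical.

```python
import math
from collections import Counter

def preprocess(raw, max_missing_frac=0.75):
    """Log2-transform intensities, impute missing values as 0, and drop sparse proteins.

    raw: dict protein -> list of raw intensities (None = not detected).
    Proteins undetected in more than `max_missing_frac` of samples are removed.
    """
    kept = {}
    for protein, values in raw.items():
        missing = [v is None or v <= 0 for v in values]
        if sum(missing) / len(values) > max_missing_frac:
            continue
        kept[protein] = [0.0 if m else math.log2(v) for v, m in zip(values, missing)]
    return kept

def stable_features(importance_runs, threshold=5e-3, min_runs=7000):
    """Count how often each feature's importance exceeds `threshold` across runs.

    importance_runs: iterable of dicts (feature -> importance), one per run.
    Returns (sorted features selected in more than `min_runs` runs, Counter of counts).
    """
    counts = Counter()
    for run in importance_runs:
        counts.update(f for f, imp in run.items() if imp > threshold)
    selected = sorted(f for f, c in counts.items() if c > min_runs)
    return selected, counts
```

With 10,000 runs and `min_runs=7000`, this keeps features that pass the importance threshold in more than 70% of random splits, which favors features that are robust to the particular train/test partition.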

Protein-protein interaction (PPI) distance

We downloaded human PPIs from BioGRID (version 4.4.228), comprising 20,011 proteins and 1,096,621 interactions, and used this network to compute the PPI distance between each protein of interest and ALS-related proteins. For a given protein A, let set B be the proteins that interact with A. If B includes any protein in the ALS list, the PPI distance is 1. Otherwise, if set C, the proteins that interact with members of B, includes any ALS protein, the PPI distance is 2; otherwise, the PPI distance is ≥3.
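
The distance rule above amounts to a two-hop neighborhood search on an undirected interaction graph. The sketch below assumes the BioGRID interactions have been reduced to a list of (protein, protein) pairs; the edge list and function names are hypothetical, and a return value of 3 stands for "≥3".

```python
from collections import defaultdict

def build_network(edges):
    """Undirected adjacency sets from (protein, protein) pairs; self-interactions ignored."""
    graph = defaultdict(set)
    for a, b in edges:
        if a != b:
            graph[a].add(b)
            graph[b].add(a)
    return graph

def ppi_distance(protein, graph, als_proteins):
    """1 if a direct interactor is an ALS protein, 2 if a second-order one is, else 3 (>= 3)."""
    first = graph.get(protein, set())
    if first & als_proteins:
        return 1
    second = set()
    for partner in first:
        second |= graph.get(partner, set())
    second.discard(protein)  # a protein is not its own second-order interactor
    if second & als_proteins:
        return 2
    return 3

# Hypothetical toy network.
g = build_network([("A", "B"), ("B", "C"), ("C", "SOD1"), ("A", "TARDBP"), ("D", "E"), ("E", "F")])
print([ppi_distance(p, g, {"SOD1", "TARDBP"}) for p in ("A", "B", "D")])
```

Using `graph.get` rather than indexing avoids silently adding unknown proteins to the `defaultdict`.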

Enrichment analysis of PPI module

Enrichment analysis was performed using Metascape⁷³ for GO biological processes and KEGG pathways. All protein-coding genes were used as the background, and the most significant representative terms, after false discovery rate (FDR) adjustment, were selected for visualization. We used all protein-coding genes, rather than only the proteins detected in the proteomics dataset, as the background because technical limitations of proteomics allow only a subset of proteins to be accurately detected, and restricting the background could miss biologically meaningful pathway information. Protein interaction analysis was performed using STRING⁷⁴. The “cluster” function was used to divide the candidate protein list into four PPI network modules and to obtain enrichment results for each module. The interaction data were then imported into Cytoscape⁷⁵ for network analysis.
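
Over-representation analyses of this kind are typically based on a one-sided hypergeometric test followed by Benjamini-Hochberg FDR adjustment. The sketch below assumes that formulation (Metascape's exact implementation may differ). The `n_background` parameter is where the choice of all protein-coding genes, rather than detected proteins, enters the calculation. Function names are hypothetical.

```python
from math import comb

def hypergeom_upper_p(k, n_selected, n_annotated, n_background):
    """P(X >= k): chance of at least k annotated genes among n_selected drawn from the background."""
    denom = comb(n_background, n_selected)
    upper = min(n_annotated, n_selected)
    total = sum(comb(n_annotated, i) * comb(n_background - n_annotated, n_selected - i)
                for i in range(k, upper + 1))
    return total / denom  # exact integer arithmetic until this final division

def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values (q-values), returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    q = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        q[i] = running_min
    return q
```

A larger background generally makes the same overlap look more significant, which is one reason the background choice described above matters for interpretation.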

Consensus clustering and WGCNA

To classify ALS patients into molecular subtypes, we performed consensus clustering using the R package ConsensusClusterPlus⁷⁶. The maximum number of clusters (“maxK”) was set to 20, and k = 4 was selected based on the consensus clustering output.
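
ConsensusClusterPlus summarizes repeated subsampling runs in a consensus matrix. For samples i and j, the entry is the fraction of runs that placed them in the same cluster, counting only runs in which both were sampled. The sketch below computes that matrix from already-clustered runs; the clustering algorithm, the subsampling scheme, and the function name are assumptions left outside the sketch.

```python
def consensus_matrix(n_samples, runs):
    """runs: list of (sampled_indices, labels) pairs from resampled clustering runs.

    M[i][j] = (# runs where i and j share a cluster) / (# runs where both were sampled).
    """
    together = [[0] * n_samples for _ in range(n_samples)]
    sampled = [[0] * n_samples for _ in range(n_samples)]
    for indices, labels in runs:
        for a, i in enumerate(indices):
            for b, j in enumerate(indices):
                sampled[i][j] += 1
                if labels[a] == labels[b]:
                    together[i][j] += 1
    return [[together[i][j] / sampled[i][j] if sampled[i][j] else 0.0
             for j in range(n_samples)] for i in range(n_samples)]

# Hypothetical runs over 3 samples, each a subsample with its cluster labels.
runs = [([0, 1, 2], [0, 0, 1]), ([0, 1], [0, 1]), ([1, 2], [0, 0])]
print(consensus_matrix(3, runs))
```

Values near 0 or 1 indicate stable pairings. A k whose consensus matrix is cleanly block-structured is a natural candidate when choosing among cluster numbers up to maxK.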

To perform co-expression analysis, we used the R package WGCNA^77,78. Pairwise Pearson correlations were calculated to build signed co-expression networks. Adjacency matrices were constructed with soft thresholding to approximate scale-free network topology, and multiple co-expression modules were defined. WGCNA was performed with the soft-thresholding power selected at a scale-free topology fit (R²) cutoff of 0.85, a minimum module size of 30, and a module merging cut height of 0.25. Module eigengenes (signature genes or proteins) were calculated as the first principal component of each module. We then calculated correlations between WGCNA modules and the four proteomic subtypes, and modules highly correlated with specific subtypes were selected based on p values and correlation coefficients. Functional enrichment analysis was performed using Metascape.
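
In a signed WGCNA network, the adjacency between two features is ((1 + r)/2)^β, where r is their Pearson correlation and β is the soft-thresholding power. The sketch below illustrates that transformation in plain Python. β = 6 in the test values is a hypothetical choice, not the power used in this study.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def signed_adjacency(x, y, beta):
    """Signed adjacency ((1 + r) / 2) ** beta: r = 1 maps to 1, r = -1 maps to 0."""
    return ((1.0 + pearson(x, y)) / 2.0) ** beta
```

Unlike an unsigned network, this mapping treats strong negative correlation as disconnection rather than as strong co-expression. Raising the result to β suppresses weak correlations, which pushes the network toward scale-free topology.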

To classify AALS samples into the transcriptional subtypes defined by Tam et al.⁴³ (Retrotransposon Activation [TE], Oxidative Stress [Ox], and Activated Glia [Glia]), we implemented a weighted scoring approach based on subtype-specific marker genes. We retrieved the molecular signature markers for each subtype and calculated a Tam subtype score from weighted marker expression levels. Each AALS sample was assigned to the subtype with the highest score.
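
The weighted scoring scheme can be sketched as a weight-normalized sum of marker expression per subtype, followed by assignment to the highest-scoring subtype. The marker genes, weights, and normalization by total absolute weight below are hypothetical illustrations. The actual markers and weights come from Tam et al.

```python
def subtype_scores(expr, markers):
    """expr: gene -> expression in one sample; markers: subtype -> {gene: weight}.

    Each score is the weighted mean of the subtype's markers present in `expr`.
    """
    scores = {}
    for subtype, weights in markers.items():
        present = {g: w for g, w in weights.items() if g in expr}
        total = sum(abs(w) for w in present.values())
        scores[subtype] = (sum(w * expr[g] for g, w in present.items()) / total
                           if total else float("-inf"))
    return scores

def classify(expr, markers):
    """Assign the sample to the subtype with the highest score."""
    scores = subtype_scores(expr, markers)
    return max(scores, key=scores.get)

# Hypothetical markers and weights for illustration only.
markers = {"TE": {"g1": 1.0, "g2": 0.5}, "Ox": {"g3": 1.0}, "Glia": {"g4": 2.0}}
print(classify({"g1": 1, "g2": 1, "g3": 5, "g4": 2}, markers))
```

Normalizing by total weight keeps subtypes with many markers from scoring higher simply because they have more terms in the sum.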

RNA-seq

For transcriptomic analysis of RPS29-knockdown cells, total RNA was extracted using the RNAprep Pure Cell/Bacteria Kit (TIANGEN; DP430) following the manufacturer’s protocol. The extracted RNA was used for library construction with the Illumina TruSeq Stranded Total RNA Library Prep Kit, and libraries were sequenced with paired-end 150-bp reads. Raw FASTQ reads were trimmed with Trim Galore (https://www.bioinformatics.babraham.ac.uk/projects/trim_galore/), and transcripts were quantified using Salmon in quasi-mapping mode⁷⁹. Salmon “quant” files were then imported using tximport. Differentially expressed genes (DEGs) were identified using DESeq2⁸⁰. Gene set enrichment analysis (GSEA) was performed on a preranked gene list, with gene expression fold change as the ranking metric, using the GSEA desktop application⁸¹.
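
For readers unfamiliar with preranked GSEA, the core statistic is a weighted running-sum enrichment score. The sketch below follows the standard formulation with weight exponent p = 1, the default weighted scheme. It omits permutation-based significance testing and score normalization, and the gene names are hypothetical.

```python
def enrichment_score(ranked, gene_set, p=1.0):
    """Weighted running-sum enrichment score for a preranked list.

    ranked: list of (gene, metric) pairs sorted by metric, highest first.
    gene_set: set of gene identifiers.
    """
    hits = [gene in gene_set for gene, _ in ranked]
    n_miss = len(ranked) - sum(hits)
    # Hits step up in proportion to |metric|^p; misses step down uniformly.
    hit_weight = sum(abs(m) ** p for (_, m), h in zip(ranked, hits) if h)
    if hit_weight == 0 or n_miss == 0:
        raise ValueError("gene set must hit non-zero metrics and not cover the whole list")
    running, es = 0.0, 0.0
    for (_, metric), is_hit in zip(ranked, hits):
        running += abs(metric) ** p / hit_weight if is_hit else -1.0 / n_miss
        if abs(running) > abs(es):
            es = running  # maximum deviation from zero, keeping its sign
    return es

ranked = [("g1", 3.0), ("g2", 2.0), ("g3", 1.0), ("g4", -1.0)]
print(enrichment_score(ranked, {"g1"}), enrichment_score(ranked, {"g4"}))
```

A positive score indicates that the set is concentrated among genes with the largest fold changes (up in knockdown). A negative score indicates concentration at the bottom of the list.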

To investigate differential splicing events between ProS4 and control samples, raw FASTQ files were obtained from the AALS database and trimmed with Trim Galore (https://www.bioinformatics.babraham.ac.uk/projects/trim_galore/). Filtered reads were aligned to the GRCh38 human reference genome using STAR⁸². Splicing was quantified using rMATS to detect five major types of alternative splicing events: skipped exons (SE), alternative 5’ and 3’ splice sites (A5SS and A3SS), mutually exclusive exons (MXE), and retained introns (RI).
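
For each event, rMATS reports inclusion and skipping junction counts (IJC, SJC) together with effective form lengths (IncFormLen, SkipFormLen), and it derives a length-normalized inclusion level (PSI) from them. The sketch below reproduces that calculation and a group-mean ΔPSI. The counts are hypothetical, and rMATS's statistical test for differential splicing is not reproduced.

```python
from statistics import mean

def inclusion_level(ijc, sjc, inc_form_len, skip_form_len):
    """Length-normalized PSI: (IJC/IncFormLen) / (IJC/IncFormLen + SJC/SkipFormLen)."""
    inc = ijc / inc_form_len
    skip = sjc / skip_form_len
    total = inc + skip
    return inc / total if total > 0 else float("nan")

def delta_psi(group1, group2):
    """Mean PSI of group1 minus mean PSI of group2.

    Each group is a list of (ijc, sjc, inc_form_len, skip_form_len) tuples, one per replicate.
    """
    return (mean(inclusion_level(*r) for r in group1)
            - mean(inclusion_level(*r) for r in group2))

# Hypothetical counts: the inclusion form is twice as long as the skipping form.
print(inclusion_level(20, 10, 200, 100))
```

The length normalization matters because the inclusion isoform usually has more junctions (and thus more mappable positions) than the skipping isoform, so raw count ratios would overstate inclusion.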

Ribo-seq

Cells were lysed in a lysis buffer containing cycloheximide (50 µg/mL). The concentration of the lysate was measured using a Qubit fluorometer. To digest RNA other than ribosome-protected fragments (RPFs), cell or tissue lysates were treated with the nonspecific endoribonuclease RNase I. Monosomes were isolated by size-exclusion chromatography and purified. Both ends of the RPFs were phosphorylated and ligated with 5’ and 3’ adapters, respectively. RNA samples were treated with an rRNA depletion kit (Qiagen; 334387) to minimize rRNA contamination. The RNA fragments were reverse transcribed and amplified by PCR, followed by library construction using the Multiplex Small RNA Library Prep Set for Illumina (NEB; E7300L). The libraries were sequenced on an Illumina platform with single-end 50-bp (SE50) reads.

Raw FASTQ reads were trimmed using Trim Galore, retaining fragments of 20–40 bp. Ribosomal RNA (rRNA) and transfer RNA (tRNA) reads were removed by Bowtie2 alignment⁸³. The remaining reads were aligned to the hg38 human genome using STAR⁸², and gene expression was quantified using featureCounts⁸⁴. Differential analysis was performed using DESeq2⁸⁰.
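
The 20–40 bp length window can be illustrated with a small FASTQ filter, assuming standard four-line records and inclusive bounds. In this study the step was performed with Trim Galore; the function below is a hypothetical stand-in that shows the selection logic only.

```python
def filter_by_length(lines, min_len=20, max_len=40):
    """Yield 4-line FASTQ records whose read length lies within [min_len, max_len].

    `lines` is any iterable of FASTQ lines (e.g. an open file). Records are grouped
    four lines at a time; a trailing incomplete record is dropped.
    """
    records = zip(*[iter(lines)] * 4)
    for header, seq, plus, qual in records:
        if min_len <= len(seq.rstrip("\n")) <= max_len:
            yield header, seq, plus, qual
```

Ribosome footprints typically fall in a narrow size range, so a window like this enriches for genuine RPFs while discarding shorter degradation products and longer contaminating fragments.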

Statistical analyses

Unless otherwise stated, all data are presented as mean ± standard deviation (SD). Graphs were generated using the R package ggplot2⁸⁵ or Prism 9. Statistical significance is indicated as follows: *p < 0.05, **p < 0.01, ***p < 0.001, ****p < 0.0001. A two-tailed, unpaired t-test was used for comparisons between two groups. In box-and-whisker plots, the box indicates the interquartile range (IQR), the line within the box indicates the median, the whiskers extend to the most extreme points within Q1 − 1.5 × IQR and Q3 + 1.5 × IQR, and points beyond the whiskers are shown as outliers; Q1 and Q3 are the first and third quartiles, respectively.
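
The box-plot convention above (Tukey's 1.5 × IQR rule) can be reproduced with the Python standard library. Quantiles here use linear interpolation (`method="inclusive"`), which matches R's default quantile type 7. Whether the plotting software used exactly this quantile definition is an assumption, and the function name is hypothetical.

```python
from statistics import median, quantiles

def tukey_box(values):
    """Box-plot statistics with whiskers at the most extreme points inside 1.5 x IQR."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lo_fence, hi_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [v for v in values if lo_fence <= v <= hi_fence]
    return {
        "q1": q1,
        "median": median(values),
        "q3": q3,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": sorted(v for v in values if v < lo_fence or v > hi_fence),
    }

print(tukey_box([1, 2, 3, 4, 5, 6, 7, 8, 9, 100]))
```

Note that whiskers end at observed data points, not at the fences themselves, so a whisker is usually shorter than 1.5 × IQR.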

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Data availability

All raw sequencing data and processed data are available at the Gene Expression Omnibus through accession number GSE285221. Uncropped gels are provided in Supplementary Fig. 12. The source data behind the graphs in the paper can be found in Supplementary Data 3. Any additional data and information are available from the corresponding author upon request.

Code availability

All major analyses were carried out using publicly available tools as described in the Methods. RNA-seq and Ribo-seq data processing were performed in a Linux environment using standard bioinformatics pipelines. Data processing, statistical testing, and visualization (including PCA, volcano plots, and heatmaps) were conducted in R (version 4.2.3). The full set of R scripts used for analysis and figure generation is available from the corresponding author upon request.

References

Feldman, E. L. et al. Amyotrophic lateral sclerosis. Lancet 400, 1363–1380 (2022).
Westeneng, H.-J. et al. Prognosis for patients with amyotrophic lateral sclerosis: development and validation of a personalised prediction model. Lancet Neurol. 17, 423–433 (2018).
Hardiman, O., van den Berg, L. & Kiernan, M. Clinical diagnosis and management of amyotrophic lateral sclerosis. Nat. Rev. Neurol. 7, 639–649 (2011).
Kiernan, M. C. et al. Improving clinical trial outcomes in amyotrophic lateral sclerosis. Nat. Rev. Neurol. 17, 104–118 (2021).
Byrne, S. et al. Rate of familial amyotrophic lateral sclerosis: a systematic review and meta-analysis. J. Neurol. Neurosurg. Psychiatry 82, 623–627 (2011).
Mead, R. J., Shan, N., Reiser, H. J., Marshall, F. & Shaw, P. J. Amyotrophic lateral sclerosis: a neurodegenerative disorder poised for successful therapeutic translation. Nat. Rev. Drug Discov. 22, 185–212 (2023).
Udine, E., Jain, A. & van Blitterswijk, M. Advances in sequencing technologies for amyotrophic lateral sclerosis research. Mol. Neurodegener. 18, 4 (2023).
Suzuki, N., Nishiyama, A., Warita, H. & Aoki, M. Genetics of amyotrophic lateral sclerosis: seeking therapeutic targets in the era of gene therapy. J. Hum. Genet. 68, 131–152 (2023).
Elden, A. C. et al. Ataxin-2 intermediate-length polyglutamine expansions are associated with increased risk for ALS. Nature 466, 1069–1075 (2010).
Baughn, M. W. et al. Mechanism of STMN2 cryptic splice-polyadenylation and its correction for TDP-43 proteinopathies. Science 379, 1140–1149 (2023).
Mann, J. R. et al. Loss of function of the ALS-associated NEK1 kinase disrupts microtubule homeostasis and nuclear import. Sci. Adv. 9, eadi5548 (2023).
Miller, T. M. et al. Trial of Antisense Oligonucleotide Tofersen for SOD1 ALS. N. Engl. J. Med. 387, 1099–1110 (2022).
Giacomelli, E. et al. Human stem cell models of neurodegeneration: from basic science of amyotrophic lateral sclerosis to clinical translation. Cell Stem Cell 29, 11–35 (2022).
Workman, M. J. et al. Large-scale differentiation of iPSC-derived motor neurons from ALS and control subjects. Neuron 111, 1191–1204.e5 (2023).
Shi, Y. et al. Haploinsufficiency leads to neurodegeneration in C9ORF72 ALS/FTD human induced motor neurons. Nat. Med. 24, 313–325 (2018).
Coyne, A. N. et al. G4C2 repeat RNA initiates a POM121-mediated reduction in specific nucleoporins in C9orf72 ALS/FTD. Neuron 107, 1124–1140.e11 (2020).
Wang, T. et al. C9orf72 regulates energy homeostasis by stabilizing mitochondrial complex I assembly. Cell Metab. 33, 531–546.e9 (2021).
Wang, T. et al. Intracellular energy controls dynamics of stress-induced ribonucleoprotein granules. Nat. Commun. 13, 5584 (2022).
Baxi, E. G. et al. Answer ALS, a large-scale resource for sporadic and familial ALS combining clinical and multi-omics data from induced pluripotent cell lines. Nat. Neurosci. 25, 226–237 (2022).
Ng, S., Masarone, S., Watson, D. & Barnes, M. R. The benefits and pitfalls of machine learning for biomarker discovery. Cell Tissue Res. 394, 17–31 (2023).
Li, R. Data mining and machine learning methods for dementia research. Methods Mol. Biol. 1750, 363–370 (2018).
Sances, S. et al. Modeling ALS with motor neurons derived from human induced pluripotent stem cells. Nat. Neurosci. 19, 542–553 (2016).
Guise, A. J. et al. TDP-43-stratified single-cell proteomics of postmortem human spinal motor neurons reveals protein dynamics in amyotrophic lateral sclerosis. Cell Rep. 43, 113636 (2024).
Wang, T. et al. MOGONET integrates multi-omics data using graph convolutional networks allowing patient classification and biomarker identification. Nat. Commun. 12, 3445 (2021).
Shen, H. et al. Sexually dimorphic RNA helicases DDX3X and DDX3Y differentially regulate RNA metabolism through phase separation. Mol. Cell 82, 2588–2603.e9 (2022).
Harlan, B. A. et al. Evaluation of the NAD+ biosynthetic pathway in ALS patients and effect of modulating NAD+ levels in hSOD1-linked ALS mouse models. Exp. Neurol. 327, 113219 (2020).
Andrés-Benito, P., Moreno, J., Aso, E., Povedano, M. & Ferrer, I. Amyotrophic lateral sclerosis, gene deregulation in the anterior horn of the spinal cord and frontal cortex area 8: implications in frontotemporal lobar degeneration. Aging 9, 823–851 (2017).
Chen, Z. S. et al. Mutant GGGGCC RNA prevents YY1 from binding to Fuzzy promoter which stimulates Wnt/β-catenin pathway in C9ALS/FTD. Nat. Commun. 14, 8420 (2023).
Harvey, C. et al. Rare and common genetic determinants of mitochondrial function determine severity but not risk of amyotrophic lateral sclerosis. Heliyon 10, e24975 (2024).
Park, J. et al. Poly(GR) interacts with key stress granule factors promoting its assembly into cytoplasmic inclusions. Cell Rep. 42, 112822 (2023).
Lundberg, S. M. & Lee, S.-I. A unified approach to interpreting model predictions. In Proc. of the 31st International Conference on Neural Information Processing Systems 4768–4777 (Curran Associates Inc., 2017).
Zhang, S. et al. Genome-wide identification of the genetic basis of amyotrophic lateral sclerosis. Neuron 110, 992–1008.e11 (2022).
Kiskinis, E. et al. Pathways disrupted in human ALS motor neurons identified through genetic correction of mutant SOD1. Cell Stem Cell 14, 781–795 (2014).
Li, Y. et al. Globally reduced N6-methyladenosine (m6A) in C9ORF72-ALS/FTD dysregulates RNA metabolism and contributes to neurodegeneration. Nat. Neurosci. 26, 1328–1338 (2023).
Kok, J. R., Palminha, N. M., Dos Santos Souza, C., El-Khamisy, S. F. & Ferraiuolo, L. DNA damage as a mechanism of neurodegeneration in ALS and a contributor to astrocyte toxicity. Cell Mol. Life Sci. 78, 5707–5729 (2021).
Klim, J. R. et al. ALS-implicated protein TDP-43 sustains levels of STMN2, a mediator of motor neuron growth and repair. Nat. Neurosci. 22, 167–179 (2019).
Gomes, L. C. et al. Multiomic ALS signatures highlight subclusters and sex differences suggesting the MAPK pathway as therapeutic target. Nat. Commun. 15, 4893 (2024).
Larrea, D. et al. Altered mitochondria-associated ER membrane (MAM) function shifts mitochondrial metabolism in amyotrophic lateral sclerosis (ALS). Nat. Commun. 16, 379 (2025).
Lee, H. et al. Multi-omic analysis of selectively vulnerable motor neuron subtypes implicates altered lipid metabolism in ALS. Nat. Neurosci. 24, 1673–1685 (2021).
Huang, N., Lee, I., Marcotte, E. M. & Hurles, M. E. Characterising and predicting haploinsufficiency in the human genome. PLoS Genet. 6, e1001154 (2010).
Fadista, J., Oskolkov, N., Hansson, O. & Groop, L. LoFtool: a gene intolerance score based on loss-of-function variants in 60,706 individuals. Bioinformatics 33, 471–474 (2017).
Petrovski, S., Wang, Q., Heinzen, E. L., Allen, A. S. & Goldstein, D. B. Genic intolerance to functional variation and the interpretation of personal genomes. PLoS Genet. 9, e1003709 (2013).
Tam, O. H. et al. Postmortem cortex samples identify distinct molecular subtypes of ALS: retrotransposon activation, oxidative stress, and activated glia. Cell Rep. 29, 1164–1177.e5 (2019).
O’Donohue, M.-F., Choesmel, V., Faubladier, M., Fichant, G. & Gleizes, P.-E. Functional dichotomy of ribosomal proteins during the synthesis of mammalian 40S ribosomal subunits. J. Cell Biol. 190, 853–866 (2010).
Mirabello, L. et al. Whole-exome sequencing and functional studies identify RPS29 as a novel gene mutated in multicase Diamond-Blackfan anemia families. Blood 124, 24–32 (2014).
Kapur, M. & Ackerman, S. L. mRNA translation gone awry: translation fidelity and neurological disease. Trends Genet. 34, 218–231 (2018).
Martinez-Miguel, V. E. et al. Increased fidelity of protein synthesis extends lifespan. Cell Metab. 33, 2288–2300.e12 (2021).
Salas-Marco, J. & Bedwell, D. M. Discrimination between defects in elongation fidelity and termination efficiency provides mechanistic insights into translational readthrough. J. Mol. Biol. 348, 801–815 (2005).
She, R., Luo, J. & Weissman, J. S. Translational fidelity screens in mammalian cells reveal eIF3 and eIF4G2 as regulators of start codon selectivity. Nucleic Acids Res. 51, 6355–6369 (2023).
Maor-Nof, M. et al. p53 is a central regulator driving neurodegeneration caused by C9orf72 poly(PR). Cell 184, 689–708.e20 (2021).
Solomon, D. A. et al. A feedback loop between dipeptide-repeat protein, TDP-43, and karyopherin-α mediates C9orf72-related neurodegeneration. Brain 141, 2908–2924 (2018).
Diaz-Garcia, S. et al. Nuclear depletion of RNA-binding protein ELAVL3 (HuC) in sporadic and familial amyotrophic lateral sclerosis. Acta Neuropathol. 142, 985–1001 (2021).
van Rheenen, W. et al. Common and rare variant association analyses in amyotrophic lateral sclerosis identify 15 risk loci with distinct genetic architectures and neuron-specific biology. Nat. Genet. 53, 1636–1648 (2021).
Nicolas, A. et al. Genome-wide analyses identify KIF5A as a novel ALS gene. Neuron 97, 1268–1283.e6 (2018).
van Rheenen, W. et al. Genome-wide association analyses identify new risk variants and the genetic architecture of amyotrophic lateral sclerosis. Nat. Genet. 48, 1043–1048 (2016).
Hop, P. J. et al. Genome-wide study of DNA methylation shows alterations in metabolic, inflammatory, and cholesterol pathways in ALS. Sci. Transl. Med. 14, eabj0264 (2022).
Laaksovirta, H. et al. Chromosome 9p21 in amyotrophic lateral sclerosis in Finland: a genome-wide association study. Lancet Neurol. 9, 978–985 (2010).
Fujimori, K. et al. Modeling sporadic ALS in iPSC-derived motor neurons identifies a potential therapeutic agent. Nat. Med. 24, 1579–1589 (2018).
Morimoto, S. et al. Phase 1/2a clinical trial in ALS with ropinirole, a drug candidate identified by iPSC drug discovery. Cell Stem Cell 30, 766–780.e9 (2023).
Ziff, O. J. et al. Integrated transcriptome landscape of ALS identifies genome instability linked to TDP-43 pathology. Nat. Commun. 14, 2176 (2023).
Taylor, A. M. et al. Hematopoietic defects in rps29 mutant zebrafish depend upon p53 activation. Exp. Hematol. 40, 228–237.e5 (2012).
Taylor, A. et al. Calmodulin inhibitors improve erythropoiesis in Diamond-Blackfan anemia. Sci. Transl. Med. 12, eabb5831 (2020).
Krus, K. L. et al. Loss of Stathmin-2, a hallmark of TDP-43-associated ALS, causes motor neuropathy. Cell Rep. 39, 111001 (2022).
Klim, J. R., Pintacuda, G., Nash, L. A., Guerra San Juan, I. & Eggan, K. Connecting TDP-43 pathology with neuropathy. Trends Neurosci. 44, 424–440 (2021).
Depienne, C. & Mandel, J.-L. 30 years of repeat expansion disorders: What have we learned and what are the remaining challenges? Am. J. Hum. Genet. 108, 764–785 (2021).
Malik, I., Kelley, C. P., Wang, E. T. & Todd, P. K. Molecular mechanisms underlying nucleotide repeat expansion disorders. Nat. Rev. Mol. Cell Biol. 22, 589–607 (2021).
Nguyen, L., Cleary, J. D. & Ranum, L. P. W. Repeat-associated Non-ATG translation: molecular mechanisms and contribution to neurological disease. Annu. Rev. Neurosci. 42, 227–247 (2019).
Rodriguez, C. M. et al. A native function for RAN translation and CGG repeats in regulating fragile X protein synthesis. Nat. Neurosci. 23, 386–397 (2020).
Yamada, S. B. et al. RPS25 is required for efficient RAN translation of C9orf72 and other neurodegenerative disease-associated nucleotide repeats. Nat. Neurosci. 22, 1383–1388 (2019).
Du, Z.-W. et al. Generation and expansion of highly pure motor neuron progenitors from human pluripotent stem cells. Nat. Commun. 6, 6626 (2015).
Kramer, E. B., Vallabhaneni, H., Mayer, L. M. & Farabaugh, P. J. A comprehensive analysis of translational missense errors in the yeast Saccharomyces cerevisiae. RNA 16, 1797–1808 (2010).
Arnold, A. et al. Functional characterization of C. elegans Y-box-binding proteins reveals tissue-specific functions and a critical role in the formation of polysomes. Nucleic Acids Res. 42, 13353–13369 (2014).
Zhou, Y. et al. Metascape provides a biologist-oriented resource for the analysis of systems-level datasets. Nat. Commun. 10, 1523 (2019).
Szklarczyk, D. et al. The STRING database in 2023: protein-protein association networks and functional enrichment analyses for any sequenced genome of interest. Nucleic Acids Res. 51, D638–D646 (2023).
Shannon, P. et al. Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res. 13, 2498–2504 (2003).
Wilkerson, M. D. & Hayes, D. N. ConsensusClusterPlus: a class discovery tool with confidence assessments and item tracking. Bioinformatics 26, 1572–1573 (2010).
Langfelder, P. & Horvath, S. WGCNA: an R package for weighted correlation network analysis. BMC Bioinforma. 9, 559 (2008).
Langfelder, P. & Horvath, S. Fast R functions for robust correlations and hierarchical clustering. J. Stat. Softw. 46, 1–7 (2012).
Patro, R., Duggal, G., Love, M. I., Irizarry, R. A. & Kingsford, C. Salmon provides fast and bias-aware quantification of transcript expression. Nat. Methods 14, 417–419 (2017).
Love, M. I., Huber, W. & Anders, S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 15, 550 (2014).
Subramanian, A. et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc. Natl. Acad. Sci. USA 102, 15545–15550 (2005).
Dobin, A. et al. STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29, 15–21 (2013).
Langmead, B. & Salzberg, S. L. Fast gapped-read alignment with Bowtie 2. Nat. Methods 9, 357–359 (2012).
Liao, Y., Smyth, G. K. & Shi, W. featureCounts: an efficient general purpose program for assigning sequence reads to genomic features. Bioinformatics 30, 923–930 (2014).
Wickham, H. Data analysis. In ggplot2: Elegant Graphics for Data Analysis (ed. Wickham, H.) 189–201 (Springer International Publishing, 2016).


Acknowledgements

This work was supported by the National Science and Technology Innovation 2030 Major Projects of China (STI2030-Major Projects-2022ZD0212600), the National Key Research and Development Program of China (No. 2022YFA1106600), the National Natural Science Foundation of China (No. 82273473, 82473207, 32271000), and the Shanghai Municipal Health Commission (2022YQ062). We thank all the patients who participated in the study. Data used in the preparation of this article were obtained from the ANSWER ALS Data Portal (AALS-01184). For up-to-date information on the study, visit https://dataportal.answerals.org. A publication license was obtained for the graphical illustrations created with BioRender (www.biorender.com).

Author information

These authors contributed equally: Wei Xu, Zhipeng Guo.

Authors and Affiliations

Department of Anesthesiology, Shanghai Key Laboratory of Perioperative Stress and Protection, Zhongshan Hospital, Institute for Translational Brain Research, State Key Laboratory of Brain Function and Disorders, MOE Frontiers Center for Brain Science, MOE Innovative Center for New Drug Development of Immune Inflammatory Diseases, Fudan University, Shanghai, China
Wei Xu, Zhipeng Guo, Yian Guan, Shihui Lv, Xue Gao, Tao Wang & Zhixin Qiu
Department of Anesthesiology, Zhongshan Hospital, Fudan University, Shanghai, China
Wenchen Luo
Institute of Pediatrics, National Children’s Medical Center, Children’s Hospital, Institute for Translational Brain Research, State Key Laboratory of Brain Function and Disorders, MOE Frontiers Center for Brain Science, Fudan University, Shanghai, China
Tianlin Cheng
Department of Neurology, Zhongshan Hospital, Institute for Translational Brain Research, State Key Laboratory of Brain Function and Disorders, MOE Frontiers Center for Brain Science, Fudan University, Shanghai, China
Zhicheng Shao
Department of Neurosurgery, Xinhua Hospital, School of Medicine, Shanghai Jiaotong University, Shanghai, China
Bangbao Tao


Contributions

W.X. and Z.Q. conceptualized the study. W.X. performed data analysis and organized figures. Z.G., Y.G., W.X., T.W., and B.T. contributed to experimental verification and organized the experimental data and images. S.L., W.L., Z.S., and T.C. assisted with experimental design and procedures. X.G. provided insights for data analysis. W.X., Z.Q., B.T., and T.W. wrote the manuscript. All authors provided valuable suggestions and revisions to the manuscript. Z.Q., T.W., and B.T. supervised the study.

Corresponding authors

Correspondence to Bangbao Tao, Tao Wang or Zhixin Qiu.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Communications Biology thanks Hidenori Homma and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Primary Handling Editor: João Valente. A peer review file is available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Supplementary Data 1

Supplementary Data 2

Supplementary Data 3

Reporting Summary

Transparent Peer Review file

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.


About this article

Cite this article

Xu, W., Guo, Z., Guan, Y. et al. Machine learning-based proteomics profiling of ALS identifies downregulation of RPS29 that maintains protein homeostasis and STMN2 level. Commun Biol 8, 1177 (2025). https://doi.org/10.1038/s42003-025-08578-8


Received: 15 January 2025
Accepted: 22 July 2025
Published: 07 August 2025
Version of record: 07 August 2025
DOI: https://doi.org/10.1038/s42003-025-08578-8
