Abstract
Amyotrophic lateral sclerosis (ALS) is a devastating motor neuron disease. The molecular understanding of ALS is hampered by the lack of experimental models recapitulating disease heterogeneity and of analytical frameworks integrating multi-omics datasets. Here, we developed a pipeline integrating machine learning and consensus clustering to analyze a large-scale dataset of patient-derived motor neuron models from Answer ALS. Compared to the transcriptome, the proteome correlates more closely with ALS pathology and was interrogated to identify 110 proteomics-based biomarkers (Proteomics Markers for ALS 110, PMA110). Functional enrichment highlights dysregulation of ALS pathways, including protein translation and neuronal function. By integrating ALS subtype-specific proteins with patient postmortem proteomics, we found that RPS29 was consistently downregulated in ALS models and patient motor neurons. RPS29 is required for neuronal viability by maintaining ribosome profiling and accurate translation, and suppressing pathological translation. RPS29 downregulation suppresses translation of STMN2, an essential protein for motor neurons, in iPSC-derived motor neurons. Taken together, this study provides a robust framework for ALS proteomics, identifies RPS29 as a quality controller of protein translation, and presents a translational mechanism for STMN2 maintenance in ALS.
Introduction
Amyotrophic lateral sclerosis (ALS) is a fatal neurodegenerative disease affecting the central nervous system, characterized by a protracted and challenging diagnostic process1. The median time from symptom onset to death typically spans 2 to 3 years, resulting in a 5-year survival rate of around 20%2,3. There is significant heterogeneity in the clinical presentation and prognosis of ALS4. Approximately 5 to 10% of ALS cases are familial, with a family history of ALS, while the majority (90%) are classified as sporadic5. However, the etiology of ALS remains largely elusive, and no effective therapies are available6. Advancing the understanding of the molecular mechanisms underlying this deadly disease is crucial for identifying diagnostic biomarkers and therapeutic targets for this unmet medical need.
Recent years have witnessed remarkable advances in the molecular profiling of ALS samples, leading to a list of about 53 genes associated with ALS7. Most of these genes were discovered through genetic analysis, such as genome-wide association studies and next-generation sequencing. However, the alteration rate of these genes is rather low in the ALS population, and for the majority of ALS cases there are no known driver genes. The most commonly identified ALS genes include C9orf72, SOD1, TARDBP, and FUS, which are altered in 70% of familial ALS (fALS) but only in 10% of sporadic ALS (sALS) patients8. Such tremendous genetic heterogeneity largely impedes the development of targeted treatments for ALS. Although several molecular targets are being developed for therapeutic treatment, including ATXN2 (ref. 9), STMN2 (ref. 10), NEK1 (ref. 11), and SOD1 (ref. 12), ALS remains fatal and without a cure. Exploring biomarkers and targets at additional molecular levels, such as proteomics and epigenetics, which provide regulatory information downstream of genomics, holds promise for discovering novel targets and improving ALS treatment.
Due to the scarcity of ALS postmortem tissues and their unsuitability for experimental manipulation, ALS lacks an experimental platform that recapitulates its heterogeneity. Patient-derived induced pluripotent stem cells (iPSCs), which can be differentiated into motor neurons and preserve the genetic background of the donor patient, provide a representative model system for ALS13. Small panels of iPSC models with defined genetic backgrounds have been used to identify pathogenic mechanisms and potential therapeutic targets14,15,16,17,18. However, the genetic heterogeneity of ALS is largely neglected when only a limited number of iPSC models is used. The Answer ALS (AALS) initiative is an international effort to address these challenges by generating over 1000 iPSC lines and differentiating them into motor neurons, which are characterized with multiple modalities, including whole genome sequencing, RNA sequencing, proteomics, and epigenetic profiling19. This comprehensive multi-omics dataset provides a valuable resource to elucidate the motor neuron-intrinsic molecular changes during ALS development. However, a proper analytical pipeline for this heterogeneous dataset is still under active development and will contribute to data interpretation in ALS.
In this study, we developed an integrative analytical pipeline to interrogate the multi-omics datasets of AALS motor neurons. Due to the molecular heterogeneity across models and the network crosstalk among molecular features, commonly used methods, such as t-tests or DESeq2, failed to capture the differential expression patterns20. In contrast, machine learning algorithms offer several advantages, such as nonlinearity and fault tolerance, making them particularly suited for complex applications21. We employed machine learning algorithms to identify potential biomarkers that distinguish ALS cases from controls in the AALS proteomics dataset. By integrating machine learning with molecular subtypes of AALS, we resolved the molecular heterogeneity and identified potential proteomic targets for patient stratification and treatment.
Results
Analytical pipeline of ALS cohort and multi-omics dataset
To comprehensively characterize the proteomic biomarkers and therapeutic targets for ALS, we interrogated the multi-omics dataset from the Answer ALS (AALS) project and other published patient cohorts by applying machine learning algorithms, bioinformatic analysis and experimental validation (Fig. 1). The dataset from the AALS project analyzed in this study includes 169 ALS patients and 33 controls (Table 1). Twenty-eight patients were reported to carry ALS-associated genetic variants, including C9orf72 (n = 16), SOD1 (n = 7), TARDBP (n = 1), and others (n = 4). Most patients were sporadic cases (n = 141). Peripheral blood mononuclear cells (PBMCs) were collected from these volunteers and reprogrammed into iPSCs using well-established protocols, followed by a standardized 32-day motor neuron induction protocol22. The resulting motor neurons were further characterized by transcriptomic and proteomic profiling to systematically establish the molecular landscape of the ALS motor neuron panel. On average, 21,625 genes were quantified per sample at the transcriptome level, and 4441 proteins were quantified per sample in the proteomic data.
a Answer ALS (AALS) multi-omics dataset. Blood samples are obtained from volunteers, reprogrammed into iPSCs, followed by differentiation into motor neurons and multi-omics sequencing. b Proteomics for machine learning and feature selection by importance ranking to obtain a candidate list (Proteomics Markers for ALS 110, PMA110). c Functional enrichment and conservation analysis of PMA110 signature. d Consensus clustering of ALS proteomics subtypes and their functional relevance revealed by pathway enrichment and co-expression module analysis. e Proteomics dataset from patient postmortem tissues for target prioritization. f Experimental validation and multi-omics characterization to dissect target functions in ALS. Images in Fig. 1 were created with BioRender.com.
Machine learning and consensus clustering of the AALS proteomics dataset were applied to identify a protein signature associated with ALS pathology. Subsequent integration with a proteomics dataset of laser capture microdissection and single-cell mass spectrometry-based proteomics from individual motor neurons (MNs) from ALS postmortem tissues prioritized candidate proteins for further validation23.
Machine learning of ALS multi-omics dataset identifies 110 ALS-relevant protein markers
To identify potential biomarkers and targets in ALS motor neurons compared to controls, we first analyzed the differentially expressed proteins (DEPs). Unfortunately, DEP analysis failed to identify high-confidence candidates after multiple-comparison correction (Supplementary Fig. 1a, b). Given the substantial heterogeneity of the data from the ALS cohort and the complex molecular regulatory network, DEP analysis was not able to account for the heterogeneity and the non-linear molecular interactions. We hypothesized that machine learning techniques could model the crosstalk among features and identify potential biomarkers to classify ALS motor neurons and controls. Surprisingly, our analyses revealed that proteomic features yielded better predictive performance than transcriptomic features in multiple machine learning algorithms (Random Forest, XGBoost, and SVM), suggesting the power of proteomic data in modeling ALS pathology (Fig. 2a and Supplementary Fig. 1c, d). Similarly, we applied a graph convolutional network (GCN)-based algorithm, Multi-Omics Graph cOnvolutional NETworks (MOGONET)24, for integrative analysis of both transcriptomic and proteomic data, which also showed that the proteomic features ranked above the transcriptomic features in predicting ALS and control motor neurons (Supplementary Fig. 1e). These data indicate that proteomic data are more informative than transcriptomic data for ALS classification, which probably reflects a close relationship between protein regulation and ALS pathology. Therefore, we mainly focused on the proteomic data in this study.
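The AUC comparison between proteomic and transcriptomic classifiers can be illustrated with a minimal sketch. It computes the area under the ROC curve as the probability that a randomly chosen ALS sample receives a higher score than a randomly chosen control (ties count as one half). The scores and labels below are invented toy values, not AALS data, and the function name `roc_auc` is hypothetical.

```python
def roc_auc(labels, scores):
    """AUC as the chance that a random positive outscores a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0      # positive ranked above negative
            elif p == n:
                wins += 0.5      # ties contribute half a win
    return wins / (len(pos) * len(neg))


# Toy example: 1 = ALS, 0 = control; scores are invented classifier outputs.
labels = [1, 1, 1, 0, 0]
proteomic_scores = [0.9, 0.8, 0.4, 0.3, 0.1]
transcriptomic_scores = [0.6, 0.2, 0.7, 0.5, 0.3]

auc_prot = roc_auc(labels, proteomic_scores)
auc_rna = roc_auc(labels, transcriptomic_scores)
print(f"proteomics AUC = {auc_prot:.3f}, transcriptome AUC = {auc_rna:.3f}")
```

In practice the scores would come from held-out predictions of each trained model, so that the two AUC values are comparable.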
a The receiver-operating characteristic curve (ROC) plots showing the prediction accuracy using proteomics data and transcriptome data by Random Forest. Area under curve (AUC) values were calculated to indicate the overall model performance. b The bar graph showing the top features prioritized in 10,000 randomly sampled Random Forest training models. Specific features with reported ALS relevance were in red. c The plots showing the protein-protein interaction (PPI) distance between selected gene sets with the ALS meta-set. The red line represented the average PPI distance between PMA110 proteins and the ALS meta-set. The violin plot showed the average PPI distance between 110 randomly selected proteins (repeated 1000 times) and the ALS meta-set. d Gene ontology (GO) enrichment of PMA110 signature. Top, biological process gene sets from GO; Bottom, molecular function gene sets from GO. e Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment of PMA110 signature. f Gene ontology (GO) enrichment of the four protein-protein interaction (PPI) modules based on the PPI network among PMA110 signature.
To establish prediction models and perform feature selection in the AALS proteomic dataset, we introduced a machine learning pipeline combined with a bootstrapping strategy. For each training iteration, we randomly selected 70% of the samples in the dataset as the training subset. Proteins whose feature importance in the Random Forest model exceeded a predetermined threshold were identified in each iteration. The frequency with which each protein feature was selected across 10,000 training iterations was quantified and ranked, ultimately generating a candidate list of 110 protein features (Proteomics Markers for ALS 110, PMA110; Fig. 2b, Supplementary Fig. 2a, b and Supplementary Data 1). Incorporating other variables, such as gender, age and batch, into the machine learning-based prediction revealed that these features had only a limited impact on model performance, as shown by the overlap of the selected features with PMA110 (Supplementary Fig. 2c). In line with previous studies14, we used the top 500 most variable genes in PCA and checked the distribution of these variables, but overall we did not find a strongly biased distribution for them (Supplementary Fig. 2d, e). Subsequent retraining using the selected PMA110 protein feature set demonstrated improved accuracy and enhanced stability in predicting ALS and control samples (Supplementary Fig. 2f). Notably, several of these proteins, including DDX3Y (ref. 25), NMNAT1 (ref. 26), PAK3 (ref. 27), YY1 (ref. 28), ACADM (ref. 29), and YTHDF1 (ref. 30), have been previously implicated in ALS. Shapley Additive exPlanations (SHAP)31 is widely used to explain feature importance in machine learning and artificial intelligence models. Therefore, we utilized SHAP plots to visually evaluate the impact of PMA110 features on model predictions, considering both the feature values (encoded by gradient colors) and the direction of prediction contribution (indicating the likelihood of ALS) (Supplementary Fig. 3a). 
For instance, EIF4E2 exhibited large SHAP values, suggesting substantial discriminatory power and a significant contribution to the model’s output. Samples with lower EIF4E2 levels (indicated by blue) were more likely to be predicted as ALS than those with higher levels (indicated by red).
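The repeated-subsampling feature selection described at the start of this subsection can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's implementation: the Random Forest importance is swapped for a simple stand-in scorer (absolute difference of class means, scaled to sum to 1) so the example needs only the standard library; subsampling is stratified by class, which the text does not specify; and the threshold, iteration count, feature names and data are all hypothetical.

```python
import random
from collections import Counter


def mean_difference_importance(rows, labels):
    """Stand-in for a model-based importance (NOT Random Forest):
    |mean(ALS) - mean(control)| per feature, scaled to sum to 1."""
    cases = [r for r, y in zip(rows, labels) if y == 1]
    ctrls = [r for r, y in zip(rows, labels) if y == 0]
    raw = []
    for j in range(len(rows[0])):
        m_case = sum(r[j] for r in cases) / len(cases)
        m_ctrl = sum(r[j] for r in ctrls) / len(ctrls)
        raw.append(abs(m_case - m_ctrl))
    total = sum(raw) or 1.0
    return [v / total for v in raw]


def selection_frequency(rows, labels, names, importance_fn,
                        n_iter=200, frac=0.7, threshold=0.3, seed=0):
    """Count how often each feature exceeds the importance threshold
    across repeated 70% subsamples (stratified by class: an assumption)."""
    rng = random.Random(seed)
    case_idx = [i for i, y in enumerate(labels) if y == 1]
    ctrl_idx = [i for i, y in enumerate(labels) if y == 0]
    counts = Counter()
    for _ in range(n_iter):
        picked = (rng.sample(case_idx, max(1, round(frac * len(case_idx))))
                  + rng.sample(ctrl_idx, max(1, round(frac * len(ctrl_idx)))))
        sub_rows = [rows[i] for i in picked]
        sub_labels = [labels[i] for i in picked]
        for name, imp in zip(names, importance_fn(sub_rows, sub_labels)):
            if imp > threshold:
                counts[name] += 1
    return counts.most_common()  # features ranked by selection frequency


# Toy data: feature P1 separates the classes, P2 and P3 are near-constant.
names = ["P1", "P2", "P3"]
rows = ([[5.0 + 0.1 * i, 1.0 + 0.05 * (i % 3), 2.0 - 0.05 * (i % 2)]
         for i in range(10)]
        + [[1.0 + 0.1 * i, 1.0 + 0.05 * ((i + 1) % 3), 2.0 - 0.05 * ((i + 1) % 2)]
           for i in range(10)])
labels = [1] * 10 + [0] * 10

ranking = selection_frequency(rows, labels, names, mean_difference_importance)
print(ranking)
```

To mirror the text more closely, `importance_fn` could wrap a trained Random Forest's per-feature importances, and the top-ranked features by frequency would then form the candidate list.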
Among the PMA110 signature, only 45% of the protein features showed differential expression (Student’s t-test p < 0.05) at the protein level between ALS patients and controls, indicating the advantage of machine learning in identifying ALS-related protein features. We observed expression changes for EIF4A2, ACADM, YY1 and BRD2 at the protein level, but not at the RNA level (Supplementary Fig. 3b–e), which could be explained by the marginal correlation between protein and RNA levels (Supplementary Fig. 3g). This finding highlights the importance of prioritizing ALS regulators at the protein level in future investigations. Taken together, these data suggest that machine learning prediction on the proteomic dataset uncovers potential protein markers linked with ALS pathology. To further explore the relationship between PMA110 and the well-established ALS genes, we compared the PMA110 signature with two published ALS risk gene sets (ALS meta-set7 and RefMap32), which were curated from genetic and epigenetic associations. Overall, we noticed a limited overlap among the three gene sets, suggesting the complexity of ALS pathology at different levels of molecular regulation (Supplementary Fig. 3f). We next hypothesized that the PMA110 signature may interact with the well-established ALS genes through protein-protein interaction (PPI). To this end, we examined the PPI distance between the PMA110 proteins and the ALS meta-set proteins (Supplementary Fig. 3h). A strong interaction relationship between the two gene sets was identified, as revealed by the shorter distance between PMA110 and the ALS meta-set compared to randomly selected gene sets, suggesting the functional relevance of PMA110 to ALS (Fig. 2c).
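The comparison of PPI distance against randomly selected protein sets can be sketched as a permutation test on a graph. The text does not define the distance metric, so this sketch assumes one plausible choice: the mean number of hops from each query protein to its nearest reference protein, computed by multi-source breadth-first search. The toy network, the node names and the function names are hypothetical.

```python
import random
from collections import deque


def build_graph(edges):
    """Undirected adjacency map from a list of (protein, protein) pairs."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def distance_to_set(graph, targets):
    """Multi-source BFS: hops from each reachable node to its nearest target."""
    dist = {t: 0 for t in targets if t in graph}
    queue = deque(dist)
    while queue:
        node = queue.popleft()
        for nb in graph[node]:
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    return dist


def mean_distance(dist, query):
    reachable = [dist[q] for q in query if q in dist]
    if not reachable:
        raise ValueError("no query protein is connected to the reference set")
    return sum(reachable) / len(reachable)


def permutation_test(graph, query, reference, n_perm=1000, seed=0):
    """Compare the observed mean distance with same-sized random sets."""
    dist = distance_to_set(graph, reference)
    observed = mean_distance(dist, query)
    rng = random.Random(seed)
    background = sorted(dist)  # only nodes connected to the reference set
    null_means = [mean_distance(dist, rng.sample(background, len(query)))
                  for _ in range(n_perm)]
    # Empirical one-sided p-value with the usual +1 correction.
    p_value = (sum(d <= observed for d in null_means) + 1) / (n_perm + 1)
    return observed, null_means, p_value


# Toy network: a simple chain A-B-C-D-E-F with reference set {A}.
graph = build_graph([("A", "B"), ("B", "C"), ("C", "D"), ("D", "E"), ("E", "F")])
observed, null_means, p_value = permutation_test(graph, ["B", "C"], ["A"])
print(observed, round(p_value, 3))
```

With a real interactome, `query` would be the signature proteins, `reference` the curated risk-gene set, and `null_means` would correspond to the distribution drawn as a violin plot.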
Functional enrichment of PMA110 signature reveals ALS pathological pathways
To gain a deeper understanding of biological functions related to PMA110 signature, we conducted functional enrichment analysis using the 110 candidate proteins. Gene Ontology (GO) and KEGG analysis revealed that these candidates are enriched in pathways associated with stress response, oxidative phosphorylation, RNA metabolism, DNA damage response, neuron function, MAPK signaling, membrane regulation, and lipid metabolism, which was in line with previous findings33,34,35,36,37,38,39 (Fig. 2d, e). Utilizing STRING database, we grouped the candidate proteins into four distinct PPI clusters, each representing different interaction patterns and functional relevance. Cluster1 was associated with mitochondrial function, Cluster2 with DNA damage response, Cluster3 with RNA binding regulation, and Cluster4 with protein transport and localization (Fig. 2f). Notably, the hub proteins located at the central positions of each PPI cluster exhibited differential expression levels between ALS and control groups (Supplementary Fig. 4a–d), including the well-established ALS proteins, such as TP53 and HNRNPA2B1. In addition, the interacting proteins also displayed a strong correlation of protein expression levels, further supporting the functional cooperation within each PPI cluster for ALS (Supplementary Fig. 4e–h).
In addition to functional and pathway enrichment analysis, we performed conservation analysis of PMA110 proteins to assess the functional significance of the candidate genes (Supplementary Fig. 5a–c). This list of genes was found to be more haploinsufficient, as indicated by their haploinsufficiency (HI) scores40 when compared to all the protein-coding genes. Furthermore, these candidates exhibited less intolerance to loss-of-function (LoF) mutations41 and various other mutation types, as demonstrated by LoFtool score and the Residual Variation Intolerance Score (RVIS)42. By contrast, a randomly selected protein set did not show significant differences from the background coding proteins. Collectively, these results underscore the functional significance of the PMA110 signature and suggest that alterations of their protein levels in patients may confer pathological effects.
Inter-patient molecular heterogeneity is resolved by proteomics-based ALS subtypes
Considering the individual genetic heterogeneity across the AALS cohort, we hypothesized that molecular subtype analysis could reduce the overall heterogeneity, stratify patients into subtypes with similar molecular backgrounds, and identify subtype-specific alterations in ALS. To this end, we performed consensus clustering based on transcriptomic and proteomic expression profiles. Consistent with the machine learning modelling, proteomics-based clustering outperformed transcriptomic clustering in distinguishing patient subtypes (Fig. 3a, Supplementary Fig. 6a–h). Consensus clustering divided the patients into four distinct subtypes (Proteomics Subtype 1-4, ProS1-4), and principal component analysis (PCA) demonstrated that this classification was robust (Fig. 3b). The four proteomics subtypes showed distinct protein expression patterns and clinical relevance (Fig. 3c and Table 2). ProS1 showed a late ALS onset, as revealed by an older age at symptom onset. ProS2 consisted mainly of female patients and showed higher C9orf72 repeat numbers. These data were in line with previous findings that aging and sex differences are involved in ALS pathology14.
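The idea behind consensus clustering can be sketched with a small example: the samples are repeatedly subsampled, each subsample is clustered, and the consensus matrix records how often two samples fall in the same cluster when both were drawn. The base clusterer (a plain k-means), the 80% subsampling fraction, the number of resamples and the toy points are assumptions made for illustration; the study's actual settings are not given in this section, and the toy uses two groups rather than four.

```python
import random


def kmeans(points, k, rng, max_iter=50):
    """Plain k-means with random initial centers drawn from the points."""
    centers = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stable: converged
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster empties
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


def consensus_matrix(points, k, n_resamples=100, frac=0.8, seed=0):
    """Fraction of resamples in which each pair of samples co-clusters,
    counted only over resamples that contain both samples."""
    rng = random.Random(seed)
    n = len(points)
    together = [[0] * n for _ in range(n)]
    sampled = [[0] * n for _ in range(n)]
    size = max(k, round(frac * n))
    for _ in range(n_resamples):
        idx = rng.sample(range(n), size)
        labels = kmeans([points[i] for i in idx], k, rng)
        for a, i in enumerate(idx):
            for b, j in enumerate(idx):
                sampled[i][j] += 1
                if labels[a] == labels[b]:
                    together[i][j] += 1
    return [[together[i][j] / sampled[i][j] if sampled[i][j] else 0.0
             for j in range(n)] for i in range(n)]


# Toy profiles: two well-separated groups of four samples each.
points = [(0.0, 0.0), (0.1, 0.1), (0.2, 0.2), (0.3, 0.3),
          (10.0, 10.0), (10.1, 10.1), (10.2, 10.2), (10.3, 10.3)]
consensus = consensus_matrix(points, k=2)
print(consensus[0][1], consensus[0][4])
```

A consensus matrix with values close to 0 or 1 indicates a stable partition; in a full analysis the matrix is computed for several values of k and the most stable k is retained.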
a Heatmap showing the consensus matrix with 4 molecular clusters for the ALS patient group, identifying 4 proteomics-based subtypes (ProS) based on protein expression data. b Principal-component analysis (PCA) plots of the four ALS subtypes based on protein expression data. c Heatmap showing the protein expression levels of highly expressed proteins in the four proteomics subtypes. d Heatmap showing proteomics subtype-specific pathway enrichment using Ingenuity Pathway Analysis (IPA). Colors represent the IPA z-scores. e Heatmap showing protein expression modules identified by weighted gene co-expression network analysis (WGCNA) and their correlation with proteomic subtypes. The number in each heatmap box represents the correlation coefficient, with the p value in parentheses. f Functional enrichment analysis of gene sets in selected modules. Top: red, yellow, and tan modules are significantly up-regulated in ProS2, 3, and 4, respectively. Bottom: black, turquoise, pink, and blue modules are significantly down-regulated in ProS1, 2, 3, and 4, respectively.
To further characterize the functional differences among the proteomic subtypes, we conducted Ingenuity Pathway Analysis (IPA) using the subtype-specifically expressed proteins to identify subtype-specific activation or inhibition of pathways (Fig. 3d and Supplementary Fig. 7a–d). ProS1 showed activation of protein folding and extracellular matrix (ECM) pathways and decreased neuronal function; ProS2 exhibited increased ECM and antigen presentation, but downregulation of the MAPK pathway; ProS3 had upregulated protein transport and dysregulated mitochondrial function; ProS4 showed significant activation of the AMPK pathway and inhibition of translation and ECM. By comparing these proteomic subtypes with the previously established ALS subtypes ALS-Ox, ALS-Glia, and ALS-TE43, we found that ProS1 was mainly enriched in the ALS-Glia subtype and ProS2 in the ALS-Ox subtype, whereas ProS3 and ProS4 were distributed across all three subtypes with a slight enrichment in the ALS-TE subtype (Supplementary Fig. 7e). We also employed an additional clustering methodology37 to classify the AALS dataset (Supplementary Fig. 7f). The pathway activation pattern shown in the heatmap reveals that the ProS2 subtype, characterized by activation of immune pathways and extracellular matrix, aligned with the C3 subtype, whereas ProS3 exhibited ECM suppression, resembling the C4 subtype. Overall, our analysis identified proteomic clusters consistent with previous studies and revealed a novel subtype, ProS4, characterized by translational dysregulation.
Next, we explored the protein modules across subtypes using Weighted Gene Co-expression Network Analysis (WGCNA), identifying 14 proteomics-based co-expression modules (Fig. 3e and Supplementary Fig. 8a, b). We selected the most significantly up- or down-regulated modules in each cluster for enrichment analysis (Fig. 3f). Among the up-regulated modules, module red (ProS1 and ProS3) was enriched for protein localization and mitophagy pathways, module yellow (ProS2) for protein metabolism, neurodevelopment, and aging pathways, and module tan (ProS4) for pathways related to sugar metabolism. Among the down-regulated modules, module black (ProS1) was enriched for energy metabolism and endocytosis, module turquoise (ProS2) for RNA splicing-related pathways, module pink (ProS3) for protein stability pathways, and module blue (ProS4) for ribosome function and the cytoplasmic translation pathway. These data suggest remarkable inter-patient heterogeneity in ALS, with specific functional pathways being dysregulated in only a subset of the ALS cohort.
Consensus clustering integrated with machine learning modelling in AALS identifies subtype-specific targets in ALS patients
We examined the expression pattern of the PMA110 signature across the AALS proteomics subgroups and observed that certain proteins showed differential expression patterns among the subtypes (Supplementary Fig. 9a). To identify potential targets for further experimental validation, we hypothesized that consensus clustering could reduce patient heterogeneity by stratifying individuals into molecular subtypes, and that subtype-specific differentially expressed proteins (DEPs) would better explain the expression patterns of features selected by machine learning models, thereby helping to elucidate subtype-specific regulators and identify biologically meaningful targets within ALS patient cohorts. To this end, we established a pipeline integrating AALS subtype-specific protein data (ProS) and the machine learning-identified protein signature (PMA110) with ALS patient proteomic analysis (Fig. 4a). In contrast to the whole-cohort analysis, differential expression analysis between each subtype and the control group in the AALS dataset yielded a set of differentially expressed proteins, indicating that the patient heterogeneity was indeed resolved at the molecular subtype level (Fig. 4b, Supplementary Fig. 9b–e and Supplementary Data 2). Overlapping these subtype-specific DEPs with PMA110 yielded a list of 34 proteins (Fig. 4c and Supplementary Fig. 9f). Given that the AALS dataset was derived from iPSC-differentiated motor neurons, we sought to validate whether these motor neuron-intrinsic candidates could be recapitulated in an ALS patient single-cell proteomic dataset obtained from laser-capture microdissection of individual motor neurons from the thoraco-lumbar ventral spinal horns. Of the 34 protein candidates, only 11 were detectable in postmortem tissue, owing to the technical limitations of proteomics. Among these, five proteins (RPS29, RALA, GPS1, ARRB1, and TBC1D9B) showed differential expression in ALS motor neurons compared to controls. 
Notably, concordant expression changes between AALS models and postmortem tissue were observed for RPS29, RALA, and GPS1 (Fig. 4d–f). Given the involvement of RPS29 (ref. 44) in translational regulation and ribosome integrity, we proceeded with its functional validation. In contrast to the downregulation of RPS29 protein levels, RPS29 mRNA levels showed no reduction in ProS4 (Supplementary Fig. 9g). Considering that mis-splicing of RPS29 has been implicated in Diamond-Blackfan anemia45, we examined alternative splicing events of RPS29. However, no significant differences were detected between ProS4 and the control group (Supplementary Fig. 9h). These findings suggest that RPS29 protein downregulation was not associated with changes in transcription or alternative splicing.
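The prioritization funnel described in this subsection (signature proteins, then subtype-specific DEPs, then proteins detectable in postmortem tissue, then postmortem DEPs, then concordant direction) amounts to a sequence of set intersections plus a direction check. The sketch below uses placeholder protein names and directions, and it simplifies by giving each protein a single direction per dataset; none of the values come from the study.

```python
def prioritize(signature, subtype_deps, postmortem_detected, postmortem_deps):
    """Funnel candidate proteins through successive filters.

    subtype_deps / postmortem_deps map protein -> direction
    (+1 up-regulated, -1 down-regulated); one direction per protein
    is a simplification made for this sketch.
    """
    in_subtype = {p for p in signature if p in subtype_deps}
    detectable = in_subtype & set(postmortem_detected)
    changed = {p for p in detectable if p in postmortem_deps}
    concordant = {p for p in changed if subtype_deps[p] == postmortem_deps[p]}
    return in_subtype, detectable, changed, concordant


# Placeholder inputs (hypothetical protein names and directions).
signature = {"protA", "protB", "protC", "protD"}
subtype_deps = {"protA": -1, "protB": +1, "protC": -1, "protX": +1}
postmortem_detected = {"protA", "protB", "protD"}
postmortem_deps = {"protA": -1, "protB": -1}

in_subtype, detectable, changed, concordant = prioritize(
    signature, subtype_deps, postmortem_detected, postmortem_deps)
print(sorted(in_subtype), sorted(detectable), sorted(changed), sorted(concordant))
```

Keeping each intermediate set makes it easy to report how many candidates survive every step of the funnel.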
a Pipeline of potential target prioritization. PMA110 signature was interrogated against the differentially expressed proteins (DEPs) by comparing each subgroup to the control group, followed by supporting evidence of DEPs from ALS postmortem tissues. Images in Fig. 4a were created with BioRender.com. b Differentially expressed protein analysis (ALS Subtype vs Control) showing up- and down-regulated genes across all the four ALS subtypes. Proteins were considered as DEPs based on the threshold of fold change > 1.5 and adjusted p values < 0.0001. p values were calculated using a two-tailed t-test. Benjamini-Hochberg method was used for multiple testing correction to obtain the corrected p-value (padj). c Pie plots illustrating the percentage of subtype-specific DEPs and patient postmortem DEPs in PMA110 signature. d–f Boxplots showing the protein expression levels in proteomics subtypes of AALS dataset (top panels) and ALS postmortem tissue dataset (bottom panels). For box-and-whisker plot, the center line represents the median, the box bounds indicate the interquartile range (IQR) from the 25th to the 75th percentile, and the whiskers extend to 1.5 times the IQR below the first quartile and above the third quartile. The points beyond whiskers indicate outliers. p values were calculated using one-way ANOVA. For iPSC proteomic data, p values were adjusted for age at sampling using ANCOVA. *p < 0.05; ***p < 0.001; ****p < 0.0001; ns, not significant.
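The DEP criteria stated in the legend above (fold change > 1.5 and Benjamini-Hochberg adjusted p < 0.0001) can be illustrated with a short sketch of the adjustment and the filtering step. Treating the fold-change cutoff symmetrically (above 1.5 or below 1/1.5) is an assumption, and the protein names, fold changes and p values are invented for illustration.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def call_deps(names, fold_changes, pvalues, fc_cutoff=1.5, padj_cutoff=1e-4):
    """Return (name, direction, padj) for proteins passing both cutoffs.

    fold_changes are subtype/control ratios; the symmetric handling of
    down-regulation (ratio < 1 / fc_cutoff) is an assumption.
    """
    padj = benjamini_hochberg(pvalues)
    hits = []
    for name, fc, q in zip(names, fold_changes, padj):
        if q < padj_cutoff and (fc > fc_cutoff or fc < 1 / fc_cutoff):
            hits.append((name, "up" if fc > 1 else "down", q))
    return hits


# Invented example values (hypothetical protein names).
names = ["protA", "protB", "protC"]
fold_changes = [0.5, 1.2, 2.0]
pvalues = [1e-6, 1e-7, 0.2]

hits = call_deps(names, fold_changes, pvalues)
print(hits)
```

Here only the first protein passes: the second fails the fold-change cutoff and the third fails the adjusted p value cutoff.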
RPS29 serves as a quality control regulator of protein translation
Precise control of protein translation is essential for motor neuron function; however, its dysregulation in ALS is not well understood. Motivated by the findings from the integrative analytical pipeline of ALS patient-derived cellular models and postmortem tissues, we sought to test whether reduced RPS29, a ribosomal protein of the 40S ribosomal subunit, accounted for the dysregulated control of protein translation in ALS. We first analyzed the functional role of RPS29 in global protein translation using the puromycin incorporation assay. We observed a significant reduction in the protein translation rate after RPS29 knockdown (Fig. 5a, b and Supplementary Fig. 10a). Such reduced protein translation further led to disrupted protein homeostasis, as evidenced by a concomitant increase in the global ubiquitylation level, and to neuronal cell death (Fig. 5c–e). Faithful translation of mRNA into the corresponding protein requires intact ribosomal function, and neurons are particularly sensitive to translational infidelity46. To further analyze how translation fidelity is affected by RPS29 loss, we introduced two dual-luciferase reporters to measure stop codon readthrough and amino acid misincorporation, two common types of translational errors47,48. The Renilla luciferase serves as an internal control for both mRNA abundance and normal translation, whereas the Firefly luciferase is silenced by either a key mutation (H245R) or an upstream stop codon and regains its activity only when misincorporation or stop codon readthrough occurs (Fig. 5f). The results showed that, after RPS29 knockdown, both types of translation errors were increased (Fig. 5g). In contrast, the frequency of translation initiation at near-cognate start codons was not affected by RPS29 knockdown49 (Supplementary Fig. 10b, c). These data suggest that RPS29 loss impairs both the rate and the accuracy of protein synthesis.
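The readout of such dual-luciferase reporters is typically a ratio of ratios, which can be made explicit with a short worked example. The normalization used here (Firefly/Renilla of the error reporter divided by Firefly/Renilla of a control reporter with active Firefly) is an assumption about the calculation, and all luminescence values are invented.

```python
def firefly_over_renilla(firefly, renilla):
    """Firefly signal normalized to the Renilla internal control."""
    if renilla <= 0:
        raise ValueError("Renilla signal must be positive")
    return firefly / renilla


def error_frequency(test_firefly, test_renilla, ctrl_firefly, ctrl_renilla):
    """Error-reporter ratio relative to the control-reporter ratio
    (normalization scheme assumed for this sketch)."""
    return (firefly_over_renilla(test_firefly, test_renilla)
            / firefly_over_renilla(ctrl_firefly, ctrl_renilla))


# Invented luminescence counts for control shRNA and knockdown cells.
control_cells = error_frequency(200, 100_000, 50_000, 100_000)
knockdown_cells = error_frequency(400, 80_000, 40_000, 80_000)
fold_change = knockdown_cells / control_cells

print(f"control: {control_cells:.4f}, knockdown: {knockdown_cells:.4f}, "
      f"fold change: {fold_change:.2f}")
```

Normalizing to Renilla within each well corrects for differences in transfection and overall translation, so a higher ratio after knockdown reflects more frequent errors rather than more reporter mRNA.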
Immunoblots a and quantification results b showing the translation rate detected by puromycin incorporation assay in SH-SY5Y cells after RPS29 knockdown. Each dot represents one biological replicate (n = 3). Data are presented as mean ± SD. Immunoblots c and quantification results d showing the protein ubiquitination levels detected by ubiquitin antibody in SH-SY5Y cells after RPS29 knockdown. Each dot represents one biological replicate (n = 3). Data are presented as mean ± SD. e Relative neuronal viability of differentiated SH-SY5Y cells with shRNA-mediated knockdown of RPS29. The differentiation protocol is shown at the top. Each dot represents one biological replicate (n = 3). Data are presented as mean ± SD. f A scheme of the dual luciferase reporter and control reporter used to measure stop codon readthrough and misincorporation translational errors. g Translation accuracy levels measured in SH-SY5Y cells with shRNA-mediated knockdown of RPS29. Top, stop codon readthrough assay. Bottom, misincorporation translational error assay. Each dot represents one biological replicate (n = 6). All data are representative of at least three independent experiments. Data are presented as mean ± SD. The p value was calculated using a two-tailed t-test. *p < 0.05; **p < 0.01; ***p < 0.001; ****p < 0.0001.
RPS29 maintains STMN2 translation and protein expression
To further understand the molecular mechanisms of RPS29 loss in ALS, we performed Ribosome-seq (Ribo-seq) combined with RNA-seq to characterize the translatome and transcriptome in wildtype and RPS29 knockdown cells and to explore the RPS29-regulated cellular processes (Supplementary Fig. 10d–g). GSEA showed that a series of key pathways were dysregulated after RPS29 knockdown, including ribosomal proteins, translation and its related quality control, microtubule growth, and the DNA damage response (Fig. 6a, b). Interestingly, the p53 pathway was activated after RPS29 knockdown, which was in line with the p53 activation in ALS patient samples50. Consistent with the role of RPS29 in cytoplasmic ribosomes, the protein translation pathway was downregulated at the translational level (Supplementary Fig. 10h).
Bar plots showing the gene set enrichment analysis (GSEA) of RNA-seq a and Ribo-seq b after RPS29 knockdown in SH-SY5Y cells. NES, normalized enrichment score. c Heatmap showing the translation levels of key ALS-related proteins. Immunoblots d and quantification results e showing decreased expression levels of ALS-related proteins in SH-SY5Y cells after RPS29 knockdown. Each dot represents one biological replicate (n = 3). Data are presented as mean ± SD. The p value was calculated using a two-tailed t-test. Immunoblots f and quantification results g showing increased levels of poly-GA produced by repeat-associated non-AUG (RAN) translation of C9orf72 repeats after RPS29 knockdown in HEK-293T cells. Each dot represents one biological replicate (n = 3). Data are presented as mean ± SD. The p value was calculated using a two-tailed t-test. h Images (left) showing increased levels of poly-GA produced by repeat-associated non-AUG (RAN) translation of C9orf72 repeats after RPS29 knockdown in HEK-293T cells. The cell proportion (right) was obtained by dividing the number of DPR-positive cells in each field of view by the total number of cells in that field (n = 10). Data are presented as mean ± SD. The p value was calculated using a two-tailed t-test. i Relative cell viability of motor neurons with shRNA-mediated knockdown of RPS29. Each dot represents a biological replicate (n = 3). Data are presented as mean ± SD. The p value was calculated using a two-tailed t-test. j Images (left) and quantification results (right) showing decreased levels of STMN2 in the axons of motor neurons after RPS29 knockdown. The relative STMN2 expression level was obtained by dividing the mean intensity of the axons by the mean intensity of the soma in each neuron (n = 28). Data are presented as mean ± SD. The p value was calculated using a two-tailed t-test. TUJ1, a neuron marker. k Relative cell viability of SH-SY5Y cells and motor neurons with shRNA-mediated knockdown of RPS29 and RPS29 overexpression. 
Each dot represents one biological replicate (n = 3). All data are representative of at least three independent experiments. Data are presented as mean ± SD. The p value was calculated using one-way ANOVA. *p < 0.05; **p < 0.01; ****p < 0.0001; ns not significant.
Next, we dissected the downstream targets of RPS29 involved in ALS. Ribo-seq profiling showed decreased translation of ALS-related genes, including TDP43, SOD1, FUS, hnRNPA2B1, CHCHD10, and VCP. Importantly, a panel of motor neuron essential proteins whose functional loss contributes to the pathogenesis of ALS, including STMN2, KPNA2, and ELAVL3, exhibited decreased translation after RPS29 suppression10,36,51,52 (Fig. 6c). We then confirmed a significant reduction in these proteins’ levels using western blotting and qRT-PCR (Fig. 6d, e, Supplementary Fig. 10i). Among those proteins, STMN2 is a microtubule regulator essential for motor neuron axon growth, and its reduced expression is a hallmark of ALS. While previous studies suggested that loss of STMN2 in ALS is caused by a TDP43-associated cryptic splicing and polyadenylation mechanism10,36, our data indicate that RPS29 serves as an additional regulator of STMN2 that sustains its protein expression at the translational level.
In addition to conventional protein translation, we next analyzed the regulatory role of RPS29 in ALS-associated aberrant translation. Repeat-associated non-AUG (RAN) translation of a disease-causing expanded (GGGGCC)n repeat in C9orf72, the most common genetic cause of ALS, produces highly toxic dipeptide-repeat (DPR) proteins in C9orf72-ALS patients. We developed a dual fluorescent reporter to monitor RAN translation activity in cells with RPS29 loss. This reporter generates a single mRNA that carries the mCherry coding sequence at its 5’ end, followed by the intron 1 A region of the human C9orf72 gene containing (GGGGCC)96 repeats and a GFP coding sequence lacking an AUG start codon. While the mCherry protein can be generated by conventional translation, the GFP protein can only be produced through cap-independent RAN translation of its upstream (GGGGCC)96 repeats, resulting in a poly(Gly-Ala)-GFP fusion protein. Our data showed that RAN translation activity was significantly enhanced by RPS29 knockdown, as evidenced by the increased polyGA-GFP protein level and the high-molecular-weight aggregates (Fig. 6f, g). Consistently, we observed larger cytoplasmic GFP-positive polyGA inclusions in the RPS29 knockdown cells (Fig. 6h).
Next, we explored the functional relevance of RPS29 in motor neurons differentiated from human iPSCs (Supplementary Fig. 11a). Motor neurons showed a significant reduction in neuronal viability after RPS29 suppression (Fig. 6i). To confirm the RPS29-STMN2 regulatory axis, we further investigated the protein expression and distribution of STMN2 in motor neurons following RPS29 knockdown. Compared with the RNA levels (Supplementary Fig. 11b), immunofluorescence assays demonstrated decreased axonal STMN2 levels after RPS29 suppression, which was in line with the role of STMN2 in axon protection (Fig. 6j). Taken together, these data suggest that RPS29 loss in ALS patients disrupts protein homeostasis by inhibiting conventional translation and STMN2 protein expression while promoting translational errors and aberrant RAN translation.
To further validate the functional importance of RPS29, we examined whether re-expression of RPS29 could rescue the phenotypic defects induced by RPS29 downregulation in SH-SY5Y cells and motor neurons. Remarkably, RPS29 overexpression significantly restored the cell viability lost upon RPS29 inhibition, suggesting the functional importance of RPS29 and the therapeutic potential of RPS29 gene therapy for patients with RPS29 downregulation (Fig. 6k).
Discussion
Genome-wide association studies (GWAS) have significantly advanced our understanding of the genetic architecture of ALS, defining a set of genomic alterations associated with the disease53,54,55,56. However, the discovery of therapeutic targets for ALS remains challenging because of the technical limitations of targeting genetic alterations, such as the difficulty of correcting mutations with high efficiency and accuracy, and the complexity of the downstream biological consequences. Target discovery at regulatory levels, including protein, epigenetics, and metabolism, therefore holds promise. In this study, we utilized proteomic and transcriptomic data of over 200 cases from the Answer ALS (AALS) database to construct a machine learning framework, through which we identified 110 proteins linked to ALS (the PMA110 signature). Interestingly, the PMA110 signature showed limited overlap with previously reported ALS-related genes, which largely reflects the differences between the proteomic, transcriptomic, and genetic levels. In addition, heterogeneity among ALS patient cohorts, including patients of different ethnicities across Europe, Africa, and Asia, also contributes to the different gene sets reported in different studies, even when the datasets were derived from the same molecular level53,54,55,57. By integrating molecular subtype-specific expression patterns of the PMA110 signature with proteomic profiling of ALS patient spinal cord, our study pinpointed a panel of proteomic biomarkers and targets, especially RPS29, suggesting dysregulation of the relevant biological functions in ALS subtypes.
One major obstacle in target discovery for brain disorders is the lack of patient-derived experimental models that are representative of patients’ pathology and molecular characteristics. Several research groups have successfully generated iPSCs from patients harboring pathogenic mutations in ALS, revealing disease-specific phenotypes at the cellular level in vitro and dissecting the pathogenic role of genetic mutations58. Moreover, iPSC-derived ALS motor neurons have been applied to high-throughput drug screening, identifying ropinirole as a promising therapeutic candidate59. Our study, together with these studies, highlights the potential value of iPSC models in drug screening, multi-omics analysis, and functional validation for neurodegenerative diseases. Further improvement of iPSC models, such as increased genetic heterogeneity, optimized motor neuron differentiation, and multi-cell-type culture as organoids, would make iPSC models more representative of ALS patients.
In our investigation, we observed a limited number of differentially expressed genes between ALS and control groups in iPSC-derived motor neurons. This observation aligns with previous findings that the iPSC motor neuron (iPSMN) model demonstrates only mild differential gene expression60, which also underscores the inherent genetic heterogeneity in ALS. In recent years, artificial intelligence techniques, particularly machine learning tools, have been increasingly applied to sequencing data to enable precise diagnosis. For instance, machine learning has been used to analyze cerebrospinal fluid proteomic data from large-scale samples, identifying novel biomarkers that improve the early and accurate diagnosis of Alzheimer’s disease and Parkinson’s disease and thereby contributing to a deeper understanding of disease mechanisms and potential diagnostic tools. To address this complexity in ALS, we employed several machine learning methods to identify proteomic features associated with ALS. The three methods tested, Random Forest, XGBoost, and Support Vector Machines (SVM), showed comparable predictive performance, and the analysis yielded 110 ALS-associated protein markers. The statistical modelling of ALS could be improved in several ways in future studies. First, a larger dataset with more samples would help the machine learning models avoid overfitting and sample imbalance. Next, other regulatory omics data would provide additional information. State-of-the-art algorithms that integrate proteomics, metabolomics, and epigenetics, together with biological pathways and functional associations, would further improve the biological relevance of the machine learning process. 
Last, with the increase in sample scale and multi-omics information, the recently developed methodologies, such as deep learning (e.g., convolutional neural networks) and large language models, should be investigated to deal with the complexity of biological context and enhance the accuracy and reliability of disease prediction models.
The PMA110 signature represents a novel list of proteomic candidates for ALS biomarkers and targets, which warrants further experimental and clinical validation. In line with PMA110, subtype DEPs also showed minor overlap with the well-established ALS genes, suggesting a unique role of protein-level regulation. We hypothesized that information about subtype-specific DEP expression would better explain the expression trends of features selected by ML-based models and help to identify biologically meaningful targets in the patient cohort or its subtypes. In this study, we focused on RPS29, which is a component of the 40S ribosomal subunit and plays a fundamental role in translation initiation. RPS29 is downregulated in ALS patients, and this reduction is not associated with any known genetic variation. RPS29 mutation has not been linked to ALS, but it has been reported, through whole-exome sequencing and functional analyses, to be associated with Diamond-Blackfan anemia (DBA), suggesting that the contribution of specific ribosomal proteins to ribosome assembly and protein translation may be cell- and tissue-specific45. Nevertheless, in a zebrafish model of DBA, Rps29 mutation or knockout induced p53 pathway activation, and suppression of p53 rescued the morphological and hematopoietic defects associated with RPS29 knockdown61,62. This aligns with our finding that RPS29 knockdown leads to upregulation of the p53 signaling pathway, which is also observed in ALS samples. Therefore, our data indicate that RPS29 regulates p53 pathway activity in ALS.
We also explored the functional and mechanistic role of RPS29 specifically in ALS. Besides the overall reduction of protein translation, RPS29 inhibition significantly impaired the expression of several motor neuron essential proteins, including STMN2. STMN2 is a member of the nervous system-specific stathmin family, which binds to tubulin dimers to regulate microtubule stability. STMN2 is essential for axonal regeneration and motor neuron survival. It has been reported that TDP-43 binds to the STMN2 pre-mRNA and suppresses the inclusion of a cryptic exon located in the first intron. TDP-43 loss of function leads to cryptic exon inclusion in STMN2 and reduced mRNA expression, which is considered a primary pathological hallmark of ALS36,63,64. Our data indicate an additional mechanism that maintains STMN2 expression, RPS29-controlled translation of STMN2, alongside TDP-43-mediated RNA regulation. The underlying mechanisms and relative contributions of these two regulatory mechanisms warrant further study. It would also be of particular interest to test whether the RPS29-STMN2 axis is conserved in large-scale patient samples and could be exploited for therapeutic intervention in ALS, for example by using adeno-associated virus-mediated gene therapy to restore RPS29 expression in impaired motor neurons.
Mounting evidence suggests that pathogenic RAN translation of expanded tandem repeats is closely associated with multiple neurological disorders, including C9orf72-ALS65,66,67. Moreover, recent studies suggested a native function of RAN translation of tandem repeats within the physiological range68. However, the underlying mechanism of RAN translation remains unknown. Previous studies suggest that a protein component of the small (40S) ribosomal subunit, RPS25, is required for effective RAN translation69. In contrast, in the present study, we demonstrate that another 40S ribosomal protein, RPS29, serves as a key quality controller of protein translation by suppressing RAN translation while sustaining conventional translation. Thus, these data collectively suggest that 40S ribosomal subunit function may be a key regulatory component of RAN translation.
Taken together, this study performed proof-of-concept machine learning modelling of proteomics data from ALS patient-derived motor neuron samples and identified a list of 110 proteins for further validation as biomarkers or targets. One of these proteins, RPS29, maintains translation, especially STMN2 translation, and suppresses aberrant RAN translation, and therefore represents a novel target for therapy development.
Methods
Multi-omics datasets from Answer ALS cohort
The clinical information, RNA-seq data and proteomics data from iPSC-differentiated motor neurons were obtained from the Answer ALS Data Portal (https://dataportal.answerals.org) with approved permission.
Cell culture
Human SH-SY5Y neuroblastoma cells, HeLa cells, and 293T cells were cultured in Dulbecco’s modified Eagle’s medium (DMEM, Thermo Fisher Scientific; C11995500BT) containing 10% fetal bovine serum (FBS) (Oricell; FBSSR-01021-500), 50 units/mL penicillin, and 50 mg/mL streptomycin. All cell lines were maintained in a 37 °C incubator with 5% CO2.
Plasmid construction, lentiviral production, and transfection
The short hairpin RNA (shRPS29: 5’-GCTCTTGTCGTGTCTGTTCAA-3’) was inserted into the plasmid pLKO.1 at restriction enzyme cutting sites of AgeI and EcoRI. Sanger DNA sequencing was used to confirm the insertion of shRNA. The RPS29 overexpression vector was constructed by introducing a synonymous mutation (CGT to AGG at codon 22 encoding Arginine) at the shRNA binding site through PCR, thereby disrupting the shRNA recognition sequence. The mutated fragment was subsequently cloned into the pLenti-PGK-DEST-Hygro lentiviral expression vector using Gateway recombination cloning technology (Thermo Fisher Scientific; 11789020;11791020). For the repeat-associated non-AUG (RAN) translation reporter, a 3x stop codon was appended to the 3’ end of mCherry via PCR amplification and subsequently inserted upstream of the (GGGGCC)96 repeat sequence, including the adjacent intronic sequences, within the pcDNA 3.1-(GGGGCC)96 vector using XbaI and BamHI restriction enzyme sites. The EGFP, lacking the ATG start codon, was then subcloned downstream of the (GGGGCC)96 repeats in-frame with GA. For the dual-luciferase reporter, Renilla luciferase and firefly luciferase were sequentially ligated into the same expression vector by PCR, with the two luciferases separated by an in-frame linker sequence encoding a sense codon in the control construct. To generate stop codon readthrough structures, the linker sequence codon CGA was mutated to the stop codon UGA. For misincorporation measurements, a reporter construct harboring a mutation in the active site of firefly luciferase (H245K; CAC245CGC) was utilized. For the near-start codon reporter, full-length EGFP and an EGFP variant with the start codon mutated to CTG were both amplified by PCR. The CMV promoter was subsequently inserted upstream of EGFP by overlap PCR, and the resulting construct was integrated into the stop codon region of mCherry in the pLenti-ef1a-mc vector via XbaI and EcoRI sites.
293T cells were used to generate lentiviral particles through co-transfection of the packaging vectors psPAX2 and pMD2.G using LipoD293 In Vitro DNA Transfection Reagent (SignaGen Laboratories; SL100668). Media were changed after 12 h. The lentivirus supernatant was collected 48 h after the media change, filtered through a 0.45 μm filter, and concentrated with a virus concentration reagent (Beyotime; C2901S) according to the manufacturer’s instructions. Briefly, 9 mL of viral supernatant was resuspended with 450 μL of resuspension solution and stored at −80 °C for future use. For lentiviral infections, SH-SY5Y, motor neuron, HeLa, and 293T cells were infected with concentrated shRNA lentivirus and RPS29-overexpressing lentivirus, and the media were replaced after 72 h. Knockdown efficiency was measured by qRT-PCR or immunoblotting 48 h post-infection. Transient plasmid transfection was carried out using Lipo8000™ Transfection Reagent according to the manufacturer’s instructions. Cells were analyzed 72–96 h post-transfection.
RNA Isolation and qRT-PCR
Total cellular RNA was extracted using the RNAprep Pure Cell/Bacteria Kit (TIANGEN; DP430), and reverse transcription to cDNA was performed using the reverse transcription reagent premix (Accurate Biology; AG11706). Briefly, 1 μg of total RNA was used for cDNA synthesis using random primers according to the manufacturer’s instructions. Quantitative real-time PCR (qRT-PCR) was performed using the SYBR Green Master mix (Thermo Fisher Scientific; A25778) on an Applied Biosystems StepOnePlus Real-Time PCR System and normalized to 18S ribosomal RNA or GAPDH. The qRT-PCR primers were designed using Primer3 (https://primer3.ut.ee/).
SH-SY5Y neuronal differentiation
Human SH-SY5Y cells were seeded at a density of 1 × 10⁵ cells per well of a 6-well culture plate in DMEM (Thermo Fisher Scientific; C11995500BT) supplemented with 10% FBS (Oricell; FBSSR-01021-500), 50 units/mL penicillin, and 50 mg/mL of streptomycin. After 24 h, the medium was changed to DMEM supplemented with 5% FBS, 50 units/mL penicillin, 50 mg/mL of streptomycin, 4 mM L-glutamine, and 10 µM retinoic acid (MERCK; R2625). After 72 h, the medium was switched to neurobasal medium (Thermo Fisher Scientific; 12348017) containing 1% N-2 supplement 100x (Thermo Fisher Scientific; 17502048), 50 units/mL penicillin, 50 mg/mL of streptomycin, 1% L-glutamine, and 50 ng/mL human BDNF (PEPROTECH; 450-02). Cells were cultured for an additional 3 days for differentiation.
Motor neuron generation
The method for differentiating human iPSCs into motor neurons is based on a previous report with minor modifications70. Briefly, on Day 0, a commercially obtained fibroblast-induced iPSC clone was dissociated and plated into a single well of a Matrigel-coated 6-well plate, supplemented with 10 µM Rock inhibitor (Selleck; S1049). On Day 1, the medium was changed to neuronal differentiation medium, consisting of 50% Neurobasal medium (Thermo Fisher; 21103049) and 50% DMEM/F12 medium (Thermo Fisher; C11330500BT), supplemented with 1X GlutaMAX (Thermo Fisher; 35050061), 0.5X N2 (Thermo Fisher; 17502048), B27 supplement (Thermo Fisher; 17504044), and 0.1 mM ascorbic acid (Sigma; A4544). Additionally, 3 µM CHIR99021 (Sigma; SML1046), 2 µM SB431542 (Selleck; S1067), and 2 µM DMH-1 (Selleck; S7146) were added, and the culture was maintained for 6 days. Cells were dissociated and plated onto Matrigel-coated 10-cm dishes in neuronal medium containing 1 µM CHIR99021, 2 µM DMH-1, 2 µM SB431542, 0.1 µM retinoic acid (RA, Sigma; R2625-50MG), and 0.5 µM purmorphamine (Selleck; S3042). On Day 13, cells were dissociated using 1 U/mL dispase (Stemcell; 7923) and transferred to ultra-low adhesion 10-cm plates (LABSELECT; 12331) in neuronal medium supplemented with 0.5 µM RA and 0.2 µM purmorphamine. On Day 21, cells were detached using 1X Accutase (Stemcell; 07920) and seeded into PDL/laminin (Sigma; P7405, L2020) coated 6-well plates at a density of 2 × 10⁶ cells/well and onto PDL/laminin-coated slides at a density of 1.6 × 10⁵ cells/well. The culture medium contained neuronal medium supplemented with 0.5 µM RA, 0.2 µM purmorphamine, and 0.1 µM Compound E (Sigma; 565790-500UG). To knock down RPS29 in the iPSC-derived motor neurons, cells were transduced with lentivirus expressing shRPS29 for 12 h on Day 25, followed by an 8-day culture period before experimental analysis. To overexpress RPS29 protein in the iPSC-derived motor neurons, cells were transduced with concentrated RPS29-overexpressing lentivirus for 12 h on Day 25 of differentiation, followed by an 8-day culture period to allow stable expression before experimental analysis.
Neuronal viability
Motor neurons were plated at a density of 2 × 10⁵ cells/well in a 24-well plate with three or four replicate wells. Cell viability was measured using the CellTiter-Glo assay (Promega), and the luminescence of each well was read on a Tecan Spark plate reader.
Dual luciferase assays for translation fidelity
Translational fidelity measurements in cells were conducted using a dual-luciferase reporter system. Plasmid transfection was performed using the Lipo8000™ Transfection Reagent (Beyotime; C0533-1.5 ml), following the manufacturer’s instructions, and cells were harvested after 48 h of transfection. Percentage of stop codon readthrough or misincorporation was determined by dividing the firefly-to-Renilla luciferase ratio of the experimental reporter by the mean firefly-to-Renilla ratio of the control reporter, as described previously48,71. Luciferase activities were quantified using the Dual Luciferase Reporter Gene Assay Kit (Yeasen; 11402ES60). Each sample was incubated in 200 µL of cell lysate at 4 °C for 5 min, transferred to a black 96-well plate, and analyzed using a Tecan Spark plate reader.
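The normalization described above can be sketched in a few lines of Python. This is only an illustration of the stated calculation (experimental firefly-to-Renilla ratio divided by the mean control ratio, expressed as a percentage); the function name and the example readings are hypothetical and are not taken from the study.

```python
from statistics import mean


def fidelity_percent(exp_firefly, exp_renilla, ctrl_firefly, ctrl_renilla):
    """Percent readthrough (or misincorporation) for each experimental replicate.

    Each argument is a list of raw luciferase readings, one value per replicate.
    """
    # Mean firefly-to-Renilla ratio of the control (sense-codon) reporter.
    ctrl_ratio = mean(f / r for f, r in zip(ctrl_firefly, ctrl_renilla))
    # Normalize each experimental replicate's ratio to the control mean.
    return [100.0 * (f / r) / ctrl_ratio for f, r in zip(exp_firefly, exp_renilla)]


# Hypothetical readings in arbitrary luminescence units, for illustration only.
example = fidelity_percent(
    exp_firefly=[20.0, 30.0],
    exp_renilla=[1000.0, 1000.0],
    ctrl_firefly=[500.0, 1500.0],
    ctrl_renilla=[1000.0, 1000.0],
)
```

Dividing by the Renilla signal first corrects for differences in transfection efficiency and overall expression between wells, so the final percentage reflects only the fidelity event reported by firefly luciferase.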
Translation measurement using puromycin incorporation assay
This experimental protocol measures translation via puromycin incorporation and does not exhibit any discernible impact on overall translation72. To assess translation, puromycin was added to the culture medium at a final concentration of 5 µg/mL for 5 min prior to harvesting. Following treatment, cells were washed with ice-cold PBS and lysed in RIPA buffer (50 mM Tris, pH 8.0, 150 mM NaCl, 0.1% SDS, 1% NP40, and 1% sodium deoxycholate) supplemented with a 1× protease inhibitor cocktail. Lysates were subjected to SDS-PAGE followed by western blot analysis.
Western blot
Cells were collected and lysed using RIPA buffer (50 mM Tris, pH 8.0, 150 mM NaCl, 0.1% SDS, 1% NP40, and 1% sodium deoxycholate) supplemented with a 1× protease inhibitor cocktail. Cell lysates were then mixed with 5× SDS sample buffer (250 mM Tris, pH 6.8, 750 mM NaCl, 5% NP40, 5% sodium deoxycholate, 10% SDS, 5% 2-mercaptoethanol, and 60 mM EDTA) and boiled for 10 min. The prepared samples were subjected to SDS-PAGE analysis, followed by protein transfer onto nitrocellulose membranes. The membranes were then probed with the indicated primary antibodies, with GAPDH or tubulin serving as loading controls.
Primary antibodies for immunoblotting were as follows: anti-puromycin (Sigma, MABE343, 1:500 dilution); anti-ubiquitin (Proteintech, 10201-2-AP, 1:5000 dilution); anti-RPS29 (Proteintech, 17374-1-AP, 1:2000 dilution); anti-STMN2 (Proteintech, 10586-1-AP, 1:2000 dilution); anti-ELAVL3 (Proteintech, 55047-1-AP, 1:2000 dilution); anti-KPNA2 (Proteintech, 10819-1-AP, 1:2000 dilution); anti-GAPDH (Proteintech, 60004-1-Ig, 1:500000 dilution); anti-GFP-Tag (ABclonal, AE078, 1:10000 dilution); anti-mCherry-Tag (ABclonal, AE002, 1:5000 dilution).
Immunofluorescence
For immunofluorescence analysis, cells grown on coverslips were rapidly rinsed with PBS and then fixed in 4% paraformaldehyde in PBS for 20 min at room temperature. Fixed cells were then permeabilized with 0.1% Triton X-100 in PBS and treated with blocking buffer (1× PBS, 3% BSA, and 0.1% Tween-20) for 30 min at room temperature. Cells were incubated with primary antibodies in blocking buffer overnight at 4 °C and washed with 1× PBS supplemented with 0.1% Tween-20. Then, cells were incubated with fluorescently conjugated secondary antibodies in blocking buffer for 1.5 h at room temperature and washed with 1× PBS supplemented with 0.1% Tween-20. Coverslips were mounted with ProLong Diamond with DAPI (Thermo Fisher).
Images were captured by an FV3000 Confocal Microscope (Olympus) equipped with the FV31S-SW Software and a 60X oil objective using the same settings to allow the comparison of signal intensities across samples. Images were analyzed using Fiji software. For each sample, imaging fields were randomly selected. For motor neuron imaging analysis, the soma of each neuron was first segmented to calculate the average intensity of STMN2. About 40 µm of axons near the soma side were then segmented to calculate the average intensity of STMN2 in axons. Relative STMN2 expression level was obtained by dividing the mean intensity of the axons by the mean intensity of the soma in each neuron.
MOGONET analysis
The analysis framework of Multi-Omics Graph cOnvolutional NETworks (MOGONET) was obtained from the GitHub repository24. The parameters were modified to accommodate the two available omics data types (RNA-seq and proteomic data). For each omics dataset, the 200 features with the largest variance were selected to prioritize the most variable and potentially informative features. Each type of omics data was individually scaled to [0, 1] through linear transformations for training. The data were randomly divided into a training set (70% of the total samples) and a testing set (30% of the total samples) to facilitate model training and evaluation. Finally, the top 50 biomarkers for both transcriptional and proteomic data were identified.
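The preprocessing steps described above (variance-based feature selection, scaling to [0, 1], and a random 70/30 split) can be sketched as follows. This is a minimal standard-library illustration and not the MOGONET code itself; the helper names, the random seed, the toy matrix, and the order in which scaling and splitting are applied are assumptions.

```python
import random
from statistics import pvariance


def top_variance_features(matrix, k):
    """Return the column indices of the k features with the largest variance.

    matrix is a list of samples, each a list of feature values.
    """
    n_features = len(matrix[0])
    variances = [pvariance([row[j] for row in matrix]) for j in range(n_features)]
    ranked = sorted(range(n_features), key=lambda j: variances[j], reverse=True)
    return sorted(ranked[:k])


def min_max_scale(matrix):
    """Linearly rescale every column to [0, 1]; constant columns are set to 0."""
    columns = list(zip(*matrix))
    lows = [min(c) for c in columns]
    spans = [max(c) - min(c) for c in columns]
    return [
        [(v - lo) / sp if sp else 0.0 for v, lo, sp in zip(row, lows, spans)]
        for row in matrix
    ]


def split_indices(n_samples, train_fraction=0.7, seed=0):
    """Randomly split sample indices into training and testing sets."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    cut = round(n_samples * train_fraction)
    return indices[:cut], indices[cut:]


# Toy matrix (4 samples x 3 features); the study kept 200 features per omics layer.
toy = [[1.0, 10.0, 5.0], [2.0, 30.0, 5.0], [3.0, 20.0, 5.0], [4.0, 40.0, 5.0]]
keep = top_variance_features(toy, k=2)
scaled = min_max_scale([[row[j] for j in keep] for row in toy])
train_idx, test_idx = split_indices(10)
```

In this toy example the constant third feature has zero variance and is dropped, which is the behavior that makes variance filtering useful for removing uninformative features before training.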
Machine learning-based PMA110 signature
We retrieved the Matrix of Intensities data from the AALS proteomics dataset portal (https://data.answerals.org/home). The data matrix was log2-transformed for normalization. Missing values were imputed with zero. Proteins that were not expressed in over 75% of the samples were filtered out. For batch correction, we incorporated batch information as a feature in the machine learning model, allowing the model to account for batch effects during analysis. Given the limited number of control samples (n = 33), the control class was oversampled in Python using the “SMOTE” function to balance the classes. The model was trained using the “RandomForestClassifier” in the “sklearn” library, with 70% of the samples used for training and 30% for prediction in each run. This approach, compared to Leave-One-Out Cross-Validation, enabled more stable model evaluation and reduced the risk of overfitting due to the presence of synthetic samples. In each run, “permutation_importance” was used to obtain the features whose importance was above the threshold of 5 × 10⁻³. We tested multiple thresholds and found that different cutoff values yielded largely similar rankings of the selected features. For each feature, we counted the number of runs, out of 10,000, in which its importance exceeded the threshold, and features selected in more than 7000 runs were considered candidate features for further analysis.
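The final selection step, counting how often each feature passes the importance threshold across repeated runs, can be illustrated with the sketch below. It assumes that the per-run permutation importances have already been computed and collected as one dictionary per run; the function name, the feature names, and the toy values are hypothetical, and model training and oversampling are deliberately left out.

```python
from collections import Counter


def stable_features(importances_per_run, importance_threshold=5e-3, min_count=7000):
    """Select features whose importance exceeds the threshold in more than min_count runs.

    importances_per_run is a list with one dict per run, mapping
    feature name to permutation importance.
    """
    counts = Counter()
    for run in importances_per_run:
        # Count a feature once for every run in which it passes the threshold.
        counts.update(f for f, imp in run.items() if imp > importance_threshold)
    return sorted(f for f, c in counts.items() if c > min_count)


# Toy example with 10 runs instead of 10,000, so the count cutoff is scaled to 7.
toy_runs = [
    {
        "PROT_A": 0.010 if i < 8 else 0.001,  # passes in 8 of 10 runs
        "PROT_B": 0.010 if i < 7 else 0.001,  # passes in 7 of 10 runs
        "PROT_C": 0.006 if i < 2 else 0.000,  # passes in 2 of 10 runs
    }
    for i in range(10)
]
selected = stable_features(toy_runs, importance_threshold=5e-3, min_count=7)
```

Only `PROT_A` is retained here, because the rule requires strictly more than the cutoff number of runs. Selecting by frequency across many random splits favors features that are informative regardless of which samples fall in the training set.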
Protein-protein interaction (PPI) distance
We downloaded the human PPIs from BioGRID (version 4.4.228), including 20,011 proteins and 1,096,621 PPIs, followed by interrogation of PPI distance between a specific protein and ALS-related proteins using this PPI network. When calculating the interaction distance between protein A and ALS protein, if the protein set B that interacts with protein A includes the ALS list, the PPI distance is taken as 1. If set B does not include the ALS list, then if the protein set C that interacts with the proteins in set B includes the ALS list, the PPI distance is 2; otherwise, the PPI distance is greater than or equal to 3.
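The distance rule above amounts to a two-step neighborhood search in the interaction network. The sketch below assumes that "includes the ALS list" means the neighbor set contains at least one ALS-related protein, and it returns 3 to stand for "greater than or equal to 3"; the function name and the toy network are hypothetical.

```python
def ppi_distance(protein, interactions, als_proteins):
    """Return 1, 2, or 3 (where 3 means 'greater than or equal to 3')."""
    # Build an undirected adjacency map from the list of interaction pairs.
    neighbours = {}
    for a, b in interactions:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)

    als = set(als_proteins)
    # Set B: direct interactors of the query protein.
    set_b = neighbours.get(protein, set())
    if set_b & als:
        return 1
    # Set C: interactors of any protein in set B.
    set_c = set()
    for partner in set_b:
        set_c |= neighbours.get(partner, set())
    if set_c & als:
        return 2
    return 3


# Toy chain ALS1 - P1 - P2 - P3, for illustration only.
toy_edges = [("ALS1", "P1"), ("P1", "P2"), ("P2", "P3")]
distances = {p: ppi_distance(p, toy_edges, ["ALS1"]) for p in ["P1", "P2", "P3", "P4"]}
```

Stopping after two steps keeps the search cheap on a network of roughly one million interactions, at the cost of not distinguishing distances beyond 2.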
Enrichment analysis of PPI module
Enrichment analysis was performed using Metascape73, with biological processes for GO enrichment and KEGG pathway enrichment. The enrichment analysis was performed against all protein-coding genes as background, and the most significant representative terms were adjusted by false discovery rate (FDR) and selected for visualization. It should be noted that the background in the functional enrichment analysis is all protein-coding genes, rather than the proteins detected in the proteomics dataset. Owing to technical limitations of proteomics technology, only a subset of proteins can be accurately detected. To avoid missing biologically meaningful pathway information, we chose all protein-coding genes as background for statistical analysis. Protein interaction analysis was performed using STRING74. The “cluster” function was used to divide the candidate protein list into four modules of the PPI network and to obtain the enrichment analysis results of the four modules. The protein interaction data were retrieved and imported into Cytoscape75 software to visualize the protein interactions.
Consensus clustering and WGCNA
In order to classify ALS patients into molecular subtypes, we used the “ConsensusClusterPlus”76 package in R for consensus clustering. The maximum classification parameter “maxK” was set to 20, and the parameter k = 4 was selected based on the output cluster performance.
To perform gene co-expression analysis, we used the “WGCNA”77,78 package in R. Pairwise Pearson correlations were calculated to build signed regulatory networks using WGCNA. We applied soft thresholding to approximate scale-free network topology by constructing adjacency matrices, and defined multiple modules of gene co-expression. WGCNA was performed using the following parameters: soft-thresholding power = 0.85, minimum module size = 30, and module similarity cut-height = 0.25. Signature genes or proteins were calculated as the first principal component of each module. We calculated the relationship between WGCNA modules and the four proteomics subtypes. Modules that were highly correlated with specific proteomics subtypes were screened based on the p values and correlation coefficients. Functional enrichment analysis was performed using Metascape.
To classify samples from the AALS database into the transcriptional subtypes defined by Tam et al.43 (Retrotransposon Activation [TE], Oxidative Stress [Ox], and Activated Glia [Glia]), we implemented a weighted scoring approach based on subtype-specific marker genes. We retrieved the molecular signature markers for each subtype and calculated a Tam subtype score using weighted marker expression levels. Each AALS sample was assigned to the subtype with the highest score.
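A weighted marker score of this kind can be sketched as below. The text does not specify how the weights were derived or whether expression values were normalized first, so the weighted sum, the marker names, and the weights used here are purely illustrative assumptions.

```python
def classify_subtype(expression, subtype_markers):
    """Assign a sample to the subtype with the highest weighted marker score.

    expression maps gene name to expression level for one sample.
    subtype_markers maps subtype name to a dict of {marker gene: weight}.
    """
    scores = {
        subtype: sum(weight * expression.get(gene, 0.0) for gene, weight in markers.items())
        for subtype, markers in subtype_markers.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical markers and weights; the real signatures come from Tam et al.
toy_markers = {
    "TE": {"GENE_A": 1.0, "GENE_B": 0.5},
    "Ox": {"GENE_C": 1.0},
    "Glia": {"GENE_D": 2.0},
}
toy_sample = {"GENE_A": 2.0, "GENE_B": 2.0, "GENE_C": 1.0, "GENE_D": 0.25}
label, scores = classify_subtype(toy_sample, toy_markers)
```

Because subtypes can have different numbers of markers, a real implementation would likely need to normalize scores (for example, by the sum of weights) before comparing them; that step is omitted here because it is not described in the text.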
RNA-seq
For transcriptomic data of RPS29-knockdown cells, total RNA was extracted using the RNAprep Pure Cell/Bacteria Kit (TIANGEN; DP430) following the manufacturer’s protocol. The extracted RNA was utilized for library construction with the Illumina TruSeq Stranded Total RNA Library Prep Kit. The library was sequenced using paired-end 150-bp reads. Raw FASTQ reads were processed by trimming with Trim Galore (https://www.bioinformatics.babraham.ac.uk/projects/trim_galore/), and transcript quantification was conducted using Salmon software in quasi-mapping mode79. Salmon “quant” files were subsequently converted using the Tximport function. Differentially expressed genes (DEGs) were identified using DESeq280. Gene set enrichment analysis (GSEA) was performed on a preranked gene list, with gene expression fold change as the ranking metric, using the GSEA desktop application81.
To investigate differential splicing events between ProS4 and control samples, raw FASTQ files were obtained from the AALS database. Raw FASTQ reads were processed by trimming with Trim Galore (https://www.bioinformatics.babraham.ac.uk/projects/trim_galore/). Filtered reads were aligned to the GRCh38 human reference genome using STAR82. Splicing quantification was performed using rMATS to detect five major types of alternative splicing events: skipped exons (SE), alternative 5’/3’ splice sites (A5SS/A3SS), mutually exclusive exons (MXE), and retained introns (RI).
Ribo-seq
Cells were treated with a specific lysis buffer containing cycloheximide (50 mg/mL) to obtain the lysate. The concentration of the lysate was measured using a Qubit fluorometer. To digest RNA other than ribosome-protected fragments (RPFs), cell or tissue lysates were treated with the unspecific endoribonuclease RNase I. Monosomes were isolated by size-exclusion chromatography and purification. Both ends of the RPFs were phosphorylated and ligated with 5’ and 3’ adapters, respectively. RNA samples were treated with an rRNA depletion kit (Qiagen; 334387) to minimize rRNA contamination. The RNA fragments were reverse transcribed and amplified by PCR, followed by library construction using the Multiplex Small RNA Library Prep Set for Illumina (NEB; E7300L). The libraries were subjected to Illumina sequencing with single-end 50 bp (SE50) sequencing.
Raw FASTQ reads were trimmed using Trim Galore to retain fragments between 20 and 40 base pairs. Ribosomal RNA (rRNA) and transfer RNA (tRNA) sequences were removed using Bowtie2 alignment83. The remaining sequences were aligned to the hg38 human genome using STAR82, and gene expression was quantified using featureCounts84. Differential analysis was performed using DESeq280.
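The length window applied to ribosome-protected fragments can be illustrated with a short Python sketch that keeps only FASTQ records whose sequence is 20 to 40 bases long. This only illustrates the filtering criterion; it does not reproduce Trim Galore's adapter or quality trimming, and the function names and example reads are hypothetical.

```python
from itertools import islice


def read_fastq(lines):
    """Yield (header, sequence, separator, quality) tuples from FASTQ lines.

    Assumes the simple four-line-per-record layout; an incomplete trailing
    record is ignored.
    """
    it = iter(lines)
    while True:
        record = [line.rstrip("\n") for line in islice(it, 4)]
        if len(record) < 4:
            return
        yield tuple(record)


def filter_by_length(records, min_len=20, max_len=40):
    """Keep records whose sequence length lies within [min_len, max_len]."""
    for header, seq, sep, qual in records:
        if min_len <= len(seq) <= max_len:
            yield header, seq, sep, qual


def make_record(name, length):
    """Build a synthetic FASTQ record of the given length (example data)."""
    return [f"@{name}\n", "A" * length + "\n", "+\n", "I" * length + "\n"]


# Three hypothetical reads: too short, within range, too long.
example_lines = (
    make_record("short", 15) + make_record("rpf", 28) + make_record("long", 45)
)
kept = list(filter_by_length(read_fastq(example_lines)))
```

With real data, `lines` would be an open text handle on the trimmed FASTQ file (for example via `gzip.open(path, "rt")`).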
Statistical analyses
Unless otherwise stated, all data are presented as the mean ± standard deviation (SD). Graphs were generated using the ggplot2 R package85 or Prism 9. Statistical significance is indicated as follows: *p < 0.05, **p < 0.01, ***p < 0.001, ****p < 0.0001. A two-tailed, unpaired t-test was used for comparisons between two groups. For box-and-whisker plots, the box indicates the interquartile range (IQR), the line in the box indicates the median, the whiskers extend to the most extreme points within Q1 − 1.5 × IQR and Q3 + 1.5 × IQR, and points beyond the whiskers indicate outliers. Q1 and Q3 denote the first and third quartiles, respectively.
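The significance notation and the box-and-whisker convention described above can be expressed as a minimal Python sketch. The quartile method (linear interpolation, `method="inclusive"`) and the "ns" label for p ≥ 0.05 are assumptions made for illustration; plotting software may compute quartiles slightly differently, and the function names and example values are hypothetical.

```python
import statistics


def significance_stars(p):
    """Map a p-value to the star notation used in the figures."""
    if p < 0.0001:
        return "****"
    if p < 0.001:
        return "***"
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    return "ns"  # assumed label for non-significant comparisons


def box_stats(values):
    """Compute box, whisker and outlier values for one group.

    Whiskers extend to the most extreme data points that still lie within
    Q1 - 1.5 * IQR and Q3 + 1.5 * IQR; points outside are outliers.
    """
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    inside = [v for v in values if lower_fence <= v <= upper_fence]
    outliers = sorted(v for v in values if v < lower_fence or v > upper_fence)
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": outliers,
    }


# Hypothetical measurements with one extreme value.
example_stats = box_stats([1, 2, 3, 4, 5, 6, 7, 8, 100])
```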
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
Data availability
All raw sequencing data and processed data are available at the Gene Expression Omnibus through accession number GSE285221. Uncropped gels are provided in Supplementary Fig. 12. The source data behind the graphs in the paper can be found in Supplementary Data 3. Any additional data and information are available from the corresponding author upon request.
Code availability
All major analyses were carried out using publicly available tools as described in the Methods. RNA-seq and Ribo-seq data processing were performed in a Linux environment using standard bioinformatics pipelines. Data processing, statistical testing, and visualization—including PCA, volcano plots, and heatmaps—were conducted in R (version 4.2.3). The full set of R scripts used for analysis and figure generation is available from the corresponding author upon request.
References
Feldman, E. L. et al. Amyotrophic lateral sclerosis. Lancet 400, 1363–1380 (2022).
Westeneng, H.-J. et al. Prognosis for patients with amyotrophic lateral sclerosis: development and validation of a personalised prediction model. Lancet Neurol. 17, 423–433 (2018).
Hardiman, O., van den Berg, L. & Kiernan, M. Clinical diagnosis and management of amyotrophic lateral sclerosis. Nat. Rev. Neurol. 7, 639–649 (2011).
Kiernan, M. C. et al. Improving clinical trial outcomes in amyotrophic lateral sclerosis. Nat. Rev. Neurol. 17, 104–118 (2021).
Byrne, S. et al. Rate of familial amyotrophic lateral sclerosis: a systematic review and meta-analysis. J. Neurol. Neurosurg. Psychiatry 82, 623–627 (2011).
Mead, R. J., Shan, N., Reiser, H. J., Marshall, F. & Shaw, P. J. Amyotrophic lateral sclerosis: a neurodegenerative disorder poised for successful therapeutic translation. Nat. Rev. Drug Discov. 22, 185–212 (2023).
Udine, E., Jain, A. & van Blitterswijk, M. Advances in sequencing technologies for amyotrophic lateral sclerosis research. Mol. Neurodegener. 18, 4 (2023).
Suzuki, N., Nishiyama, A., Warita, H. & Aoki, M. Genetics of amyotrophic lateral sclerosis: seeking therapeutic targets in the era of gene therapy. J. Hum. Genet. 68, 131–152 (2023).
Elden, A. C. et al. Ataxin-2 intermediate-length polyglutamine expansions are associated with increased risk for ALS. Nature 466, 1069–1075 (2010).
Baughn, M. W. et al. Mechanism of STMN2 cryptic splice-polyadenylation and its correction for TDP-43 proteinopathies. Science 379, 1140–1149 (2023).
Mann, J. R. et al. Loss of function of the ALS-associated NEK1 kinase disrupts microtubule homeostasis and nuclear import. Sci. Adv. 9, eadi5548 (2023).
Miller, T. M. et al. Trial of Antisense Oligonucleotide Tofersen for SOD1 ALS. N. Engl. J. Med. 387, 1099–1110 (2022).
Giacomelli, E. et al. Human stem cell models of neurodegeneration: from basic science of amyotrophic lateral sclerosis to clinical translation. Cell Stem Cell 29, 11–35 (2022).
Workman, M. J. et al. Large-scale differentiation of iPSC-derived motor neurons from ALS and control subjects. Neuron 111, 1191–1204.e5 (2023).
Shi, Y. et al. Haploinsufficiency leads to neurodegeneration in C9ORF72 ALS/FTD human induced motor neurons. Nat. Med. 24, 313–325 (2018).
Coyne, A. N. et al. G4C2 repeat RNA initiates a POM121-mediated reduction in specific nucleoporins in C9orf72 ALS/FTD. Neuron 107, 1124–1140.e11 (2020).
Wang, T. et al. C9orf72 regulates energy homeostasis by stabilizing mitochondrial complex I assembly. Cell Metab. 33, 531–546.e9 (2021).
Wang, T. et al. Intracellular energy controls dynamics of stress-induced ribonucleoprotein granules. Nat. Commun. 13, 5584 (2022).
Baxi, E. G. et al. Answer ALS, a large-scale resource for sporadic and familial ALS combining clinical and multi-omics data from induced pluripotent cell lines. Nat. Neurosci. 25, 226–237 (2022).
Ng, S., Masarone, S., Watson, D. & Barnes, M. R. The benefits and pitfalls of machine learning for biomarker discovery. Cell Tissue Res. 394, 17–31 (2023).
Li, R. Data mining and machine learning methods for dementia research. Methods Mol. Biol. 1750, 363–370 (2018).
Sances, S. et al. Modeling ALS with motor neurons derived from human induced pluripotent stem cells. Nat. Neurosci. 19, 542–553 (2016).
Guise, A. J. et al. TDP-43-stratified single-cell proteomics of postmortem human spinal motor neurons reveals protein dynamics in amyotrophic lateral sclerosis. Cell Rep. 43, 113636 (2024).
Wang, T. et al. MOGONET integrates multi-omics data using graph convolutional networks allowing patient classification and biomarker identification. Nat. Commun. 12, 3445 (2021).
Shen, H. et al. Sexually dimorphic RNA helicases DDX3X and DDX3Y differentially regulate RNA metabolism through phase separation. Mol. Cell 82, 2588–2603.e9 (2022).
Harlan, B. A. et al. Evaluation of the NAD+ biosynthetic pathway in ALS patients and effect of modulating NAD+ levels in hSOD1-linked ALS mouse models. Exp. Neurol. 327, 113219 (2020).
Andrés-Benito, P., Moreno, J., Aso, E., Povedano, M. & Ferrer, I. Amyotrophic lateral sclerosis, gene deregulation in the anterior horn of the spinal cord and frontal cortex area 8: implications in frontotemporal lobar degeneration. Aging 9, 823–851 (2017).
Chen, Z. S. et al. Mutant GGGGCC RNA prevents YY1 from binding to Fuzzy promoter which stimulates Wnt/β-catenin pathway in C9ALS/FTD. Nat. Commun. 14, 8420 (2023).
Harvey, C. et al. Rare and common genetic determinants of mitochondrial function determine severity but not risk of amyotrophic lateral sclerosis. Heliyon 10, e24975 (2024).
Park, J. et al. Poly(GR) interacts with key stress granule factors promoting its assembly into cytoplasmic inclusions. Cell Rep. 42, 112822 (2023).
Lundberg, S. M. & Lee, S.-I. A unified approach to interpreting model predictions. In Proc. of the 31st International Conference on Neural Information Processing Systems 4768–4777 (Curran Associates Inc., 2017).
Zhang, S. et al. Genome-wide identification of the genetic basis of amyotrophic lateral sclerosis. Neuron 110, 992–1008.e11 (2022).
Kiskinis, E. et al. Pathways disrupted in human ALS motor neurons identified through genetic correction of mutant SOD1. Cell Stem Cell 14, 781–795 (2014).
Li, Y. et al. Globally reduced N6-methyladenosine (m6A) in C9ORF72-ALS/FTD dysregulates RNA metabolism and contributes to neurodegeneration. Nat. Neurosci. 26, 1328–1338 (2023).
Kok, J. R., Palminha, N. M., Dos Santos Souza, C., El-Khamisy, S. F. & Ferraiuolo, L. DNA damage as a mechanism of neurodegeneration in ALS and a contributor to astrocyte toxicity. Cell Mol. Life Sci. 78, 5707–5729 (2021).
Klim, J. R. et al. ALS-implicated protein TDP-43 sustains levels of STMN2, a mediator of motor neuron growth and repair. Nat. Neurosci. 22, 167–179 (2019).
Gomes, L. C. et al. Multiomic ALS signatures highlight subclusters and sex differences suggesting the MAPK pathway as therapeutic target. Nat. Commun. 15, 4893 (2024).
Larrea, D. Altered mitochondria-associated ER membrane (MAM) function shifts mitochondrial metabolism in amyotrophic lateral sclerosis (ALS). Nat. Commun. 16, 379 (2025).
Lee, H. Multi-omic analysis of selectively vulnerable motor neuron subtypes implicates altered lipid metabolism in ALS. Nat. Neurosci. 24, 1673–1685 (2021).
Huang, N., Lee, I., Marcotte, E. M. & Hurles, M. E. Characterising and predicting haploinsufficiency in the human genome. PLoS Genet. 6, e1001154 (2010).
Fadista, J., Oskolkov, N., Hansson, O. & Groop, L. LoFtool: a gene intolerance score based on loss-of-function variants in 60,706 individuals. Bioinformatics 33, 471–474 (2017).
Petrovski, S., Wang, Q., Heinzen, E. L., Allen, A. S. & Goldstein, D. B. Genic intolerance to functional variation and the interpretation of personal genomes. PLoS Genet. 9, e1003709 (2013).
Tam, O. H. et al. Postmortem cortex samples identify distinct molecular subtypes of ALS: retrotransposon activation, oxidative stress, and activated glia. Cell Rep. 29, 1164–1177.e5 (2019).
O’Donohue, M.-F., Choesmel, V., Faubladier, M., Fichant, G. & Gleizes, P.-E. Functional dichotomy of ribosomal proteins during the synthesis of mammalian 40S ribosomal subunits. J. Cell Biol. 190, 853–866 (2010).
Mirabello, L. et al. Whole-exome sequencing and functional studies identify RPS29 as a novel gene mutated in multicase Diamond-Blackfan anemia families. Blood 124, 24–32 (2014).
Kapur, M. & Ackerman, S. L. mRNA translation gone awry: translation fidelity and neurological disease. Trends Genet. 34, 218–231 (2018).
Martinez-Miguel, V. E. et al. Increased fidelity of protein synthesis extends lifespan. Cell Metab. 33, 2288–2300.e12 (2021).
Salas-Marco, J. & Bedwell, D. M. Discrimination between defects in elongation fidelity and termination efficiency provides mechanistic insights into translational readthrough. J. Mol. Biol. 348, 801–815 (2005).
She, R., Luo, J. & Weissman, J. S. Translational fidelity screens in mammalian cells reveal eIF3 and eIF4G2 as regulators of start codon selectivity. Nucleic Acids Res. 51, 6355–6369 (2023).
Maor-Nof, M. et al. p53 is a central regulator driving neurodegeneration caused by C9orf72 poly(PR). Cell 184, 689–708.e20 (2021).
Solomon, D. A. et al. A feedback loop between dipeptide-repeat protein, TDP-43, and karyopherin-α mediates C9orf72-related neurodegeneration. Brain 141, 2908–2924 (2018).
Diaz-Garcia, S. et al. Nuclear depletion of RNA-binding protein ELAVL3 (HuC) in sporadic and familial amyotrophic lateral sclerosis. Acta Neuropathol. 142, 985–1001 (2021).
van Rheenen, W. et al. Common and rare variant association analyses in amyotrophic lateral sclerosis identify 15 risk loci with distinct genetic architectures and neuron-specific biology. Nat. Genet. 53, 1636–1648 (2021).
Nicolas, A. et al. Genome-wide analyses identify KIF5A as a novel ALS gene. Neuron 97, 1268–1283.e6 (2018).
PARALS Registry et al. Genome-wide association analyses identify new risk variants and the genetic architecture of amyotrophic lateral sclerosis. Nat. Genet. 48, 1043–1048 (2016).
Hop, P. J. et al. Genome-wide study of DNA methylation shows alterations in metabolic, inflammatory, and cholesterol pathways in ALS. Sci. Transl. Med. 14, eabj0264 (2022).
Laaksovirta, H. et al. Chromosome 9p21 in amyotrophic lateral sclerosis in Finland: a genome-wide association study. Lancet Neurol. 9, 978–985 (2010).
Fujimori, K. et al. Modeling sporadic ALS in iPSC-derived motor neurons identifies a potential therapeutic agent. Nat. Med. 24, 1579–1589 (2018).
Morimoto, S. et al. Phase 1/2a clinical trial in ALS with ropinirole, a drug candidate identified by iPSC drug discovery. Cell Stem Cell 30, 766–780.e9 (2023).
Ziff, O. J. et al. Integrated transcriptome landscape of ALS identifies genome instability linked to TDP-43 pathology. Nat. Commun. 14, 2176 (2023).
Taylor, A. M. et al. Hematopoietic defects in rps29 mutant zebrafish depend upon p53 activation. Exp. Hematol. 40, 228–237.e5 (2012).
Taylor, A. et al. Calmodulin inhibitors improve erythropoiesis in Diamond-Blackfan anemia. Sci. Transl. Med. 12, eabb5831 (2020).
Krus, K. L. et al. Loss of Stathmin-2, a hallmark of TDP-43-associated ALS, causes motor neuropathy. Cell Rep. 39, 111001 (2022).
Klim, J. R., Pintacuda, G., Nash, L. A., Guerra San Juan, I. & Eggan, K. Connecting TDP-43 pathology with neuropathy. Trends Neurosci. 44, 424–440 (2021).
Depienne, C. & Mandel, J.-L. 30 years of repeat expansion disorders: What have we learned and what are the remaining challenges? Am. J. Hum. Genet. 108, 764–785 (2021).
Malik, I., Kelley, C. P., Wang, E. T. & Todd, P. K. Molecular mechanisms underlying nucleotide repeat expansion disorders. Nat. Rev. Mol. Cell Biol. 22, 589–607 (2021).
Nguyen, L., Cleary, J. D. & Ranum, L. P. W. Repeat-associated Non-ATG translation: molecular mechanisms and contribution to neurological disease. Annu. Rev. Neurosci. 42, 227–247 (2019).
Rodriguez, C. M. et al. A native function for RAN translation and CGG repeats in regulating fragile X protein synthesis. Nat. Neurosci. 23, 386–397 (2020).
Yamada, S. B. et al. RPS25 is required for efficient RAN translation of C9orf72 and other neurodegenerative disease-associated nucleotide repeats. Nat. Neurosci. 22, 1383–1388 (2019).
Du, Z.-W. et al. Generation and expansion of highly pure motor neuron progenitors from human pluripotent stem cells. Nat. Commun. 6, 6626 (2015).
Kramer, E. B., Vallabhaneni, H., Mayer, L. M. & Farabaugh, P. J. A comprehensive analysis of translational missense errors in the yeast Saccharomyces cerevisiae. RNA 16, 1797–1808 (2010).
Arnold, A. et al. Functional characterization of C. elegans Y-box-binding proteins reveals tissue-specific functions and a critical role in the formation of polysomes. Nucleic Acids Res. 42, 13353–13369 (2014).
Zhou, Y. et al. Metascape provides a biologist-oriented resource for the analysis of systems-level datasets. Nat. Commun. 10, 1523 (2019).
Szklarczyk, D. et al. The STRING database in 2023: protein-protein association networks and functional enrichment analyses for any sequenced genome of interest. Nucleic Acids Res. 51, D638–D646 (2023).
Shannon, P. et al. Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res. 13, 2498–2504 (2003).
Wilkerson, M. D. & Hayes, D. N. ConsensusClusterPlus: a class discovery tool with confidence assessments and item tracking. Bioinformatics 26, 1572–1573 (2010).
Langfelder, P. & Horvath, S. WGCNA: an R package for weighted correlation network analysis. BMC Bioinforma. 9, 559 (2008).
Langfelder, P. & Horvath, S. Fast R functions for robust correlations and hierarchical clustering. J. Stat. Softw. 46, 1–7 (2012).
Patro, R., Duggal, G., Love, M. I., Irizarry, R. A. & Kingsford, C. Salmon provides fast and bias-aware quantification of transcript expression. Nat. Methods 14, 417–419 (2017).
Love, M. I., Huber, W. & Anders, S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 15, 550 (2014).
Subramanian, A. et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc. Natl. Acad. Sci. USA 102, 15545–15550 (2005).
Dobin, A. et al. STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29, 15–21 (2013).
Langmead, B. & Salzberg, S. L. Fast gapped-read alignment with Bowtie 2. Nat. Methods 9, 357–359 (2012).
Liao, Y., Smyth, G. K. & Shi, W. featureCounts: an efficient general purpose program for assigning sequence reads to genomic features. Bioinformatics 30, 923–930 (2014).
Wickham, H. Data analysis. In ggplot2: Elegant Graphics for Data Analysis (ed. Wickham, H.) 189–201 (Springer International Publishing, 2016).
Acknowledgements
This work was supported by the National Science and Technology Innovation 2030 Major Projects of China (STI2030-Major Projects-2022ZD0212600), the National Key Research and Development Program of China (No. 2022YFA1106600), the National Natural Science Foundation of China (No. 82273473, 82473207, 32271000), and the Shanghai Municipal Health Commission (2022YQ062). We thank all the patients who participated in the study. Data used in the preparation of this article were obtained from the ANSWER ALS Data Portal (AALS-01184). For up-to-date information on the study, visit https://dataportal.answerals.org. A publication license was obtained for graphical illustrations created with BioRender (www.biorender.com).
Author information
Contributions
W.X. and Z.Q. conceptualized the study. W.X. performed data analysis and organized figures. Z.G., Y.G., W.X., T.W., and B.T. contributed to experimental verifications and organized the experimental data and images. S.L., W.L., Z.S., and T.C. provided assistance with experimental design and process. X.G. provided insights for data analysis. W.X., Z.Q., B.T., and T.W. wrote the manuscript. All authors provided valuable suggestions and revisions to the manuscript. Z.Q., T.W., and B.T. supervised the study.
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Communications Biology thanks Hidenori Homma and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Primary Handling Editors: João Valente. A peer review file is available.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Xu, W., Guo, Z., Guan, Y. et al. Machine learning-based proteomics profiling of ALS identifies downregulation of RPS29 that maintains protein homeostasis and STMN2 level. Commun Biol 8, 1177 (2025). https://doi.org/10.1038/s42003-025-08578-8
DOI: https://doi.org/10.1038/s42003-025-08578-8