Fig. 2: Machine learning on an ALS multi-omics dataset identifies 110 ALS-relevant protein markers.

a Receiver operating characteristic (ROC) curves showing the prediction accuracy of Random Forest models trained on proteomics data and on transcriptome data. Area under the curve (AUC) values indicate overall model performance. b Bar graph showing the top features prioritized across 10,000 randomly sampled Random Forest training models. Features with reported ALS relevance are shown in red. c Protein-protein interaction (PPI) distance between selected gene sets and the ALS meta-set. The red line represents the average PPI distance between the PMA110 proteins and the ALS meta-set. The violin plot shows the distribution of average PPI distances between 110 randomly selected proteins (repeated 1000 times) and the ALS meta-set. d Gene Ontology (GO) enrichment of the PMA110 signature. Top, GO biological process gene sets; bottom, GO molecular function gene sets. e Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment of the PMA110 signature. f GO enrichment of the four PPI modules derived from the PPI network among the PMA110 signature.
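
The random-set comparison in panel c can be illustrated with a minimal sketch. It rests on assumptions not stated in the caption: PPI distance is taken here as the shortest-path hop count from a protein to the nearest meta-set member, random sets are drawn from proteins connected to the meta-set, and the network is a toy path graph rather than a real interactome. All function and variable names are hypothetical.

```python
import random
from collections import deque
from statistics import mean


def distances_to_set(graph, targets):
    """Multi-source BFS: hop count from each reachable node to its nearest target."""
    dist = {t: 0 for t in targets if t in graph}
    queue = deque(dist)
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def mean_distance(proteins, dist):
    """Average distance of a protein set to the target set.

    Assumption: proteins with no path to the target set are skipped.
    """
    return mean(dist[p] for p in proteins if p in dist)


def empirical_null(graph, targets, query, n_draws=1000, seed=0):
    """Compare the query set's mean distance with size-matched random sets."""
    rng = random.Random(seed)
    dist = distances_to_set(graph, targets)
    background = sorted(dist)  # assumption: sample only from reachable proteins
    observed = mean_distance(query, dist)
    null = [
        mean_distance(rng.sample(background, len(query)), dist)
        for _ in range(n_draws)
    ]
    # One-sided empirical p-value: fraction of random sets at least as close.
    p_value = (1 + sum(d <= observed for d in null)) / (n_draws + 1)
    return observed, null, p_value


# Toy undirected network: a path 0-1-2-...-49 (illustrative only).
toy_graph = {i: [] for i in range(50)}
for i in range(49):
    toy_graph[i].append(i + 1)
    toy_graph[i + 1].append(i)

meta_set = {0}        # stand-in for the ALS meta-set
query_set = [1, 2, 3]  # stand-in for the PMA110 proteins

observed, null, p_value = empirical_null(toy_graph, meta_set, query_set)
print(observed, min(null), max(null), p_value)
```

In this sketch, `observed` plays the role of the red line and `null` the values summarized by the violin plot; with a real network, the query set would hold 110 proteins and each random draw would match that size.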