Fig. 3: Application of LIME to identify genes influencing the classification decision by neural networks.
From: Neural networks reveal novel gene signatures in Parkinson disease from single-nuclei transcriptomes

A Scatter plots showing the cell type-specific correlations between the LIME feature importance Z-scores in the exploratory datasets. Pearson’s correlation coefficient (R) and the corresponding P-value are shown at the top of each panel. B Dot plots showing the neural network (NN) disease classification balanced accuracy obtained from permutation tests with decreasing numbers of input features. HVGs with a mean LIME feature importance Z-score > 1.00 were incrementally eliminated based on their Z-score percentile rank until only the most important features remained. The same permutation tests were performed with an equal number of randomly selected genes as a benchmark. The dashed line indicates the optimal threshold: the number of input genes that maximized the discrepancy in balanced accuracy between LIME-identified genes and random genes across both Kamath et al. (top) and Wang et al. (bottom). C Bar plots showing the median NN accuracy from a leave-one-subject-out approach with LIME-identified genes or an equal number of randomly selected genes. Error bars represent the mean absolute deviation of the median accuracy across ten permutations for each subject. D Bar plots showing the dataset- and cell type-specific NN balanced accuracy when using the LIME-identified genes or an equal number of randomly selected genes. Error bars represent the standard deviation of the balanced accuracy across ten permutations. Wilcoxon rank-sum tests were used to compare model performance with LIME-identified genes versus randomly selected genes; a distinct set of random genes was used for each permutation. *P < 0.05; **P < 0.01; ***P < 0.001. DaNeurons dopaminergic neurons, HVGs highly variable genes, NS not significant, oligo oligodendrocytes, OPC oligodendrocyte precursor cells.
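The gene-selection and comparison steps described for panels B and D can be sketched roughly as follows. This is a minimal, standard-library Python illustration under stated assumptions, not the authors' pipeline: all function names are hypothetical, combining Kamath et al. and Wang et al. by the mean LIME-minus-random gap is an assumption, and the rank-sum test uses the normal approximation without tie correction because the caption does not specify which variant was used. The accuracy values at the bottom are toy numbers, not results from the figure.

```python
import math

def balanced_accuracy(y_true, y_pred):
    """Mean per-class recall, which is insensitive to class imbalance."""
    recalls = []
    for c in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == c]
        recalls.append(sum(y_pred[i] == c for i in idx) / len(idx))
    return sum(recalls) / len(recalls)

def candidate_genes(mean_z, cutoff=1.0):
    """Genes with mean LIME Z-score above the cutoff, ranked most to least important."""
    kept = {g: z for g, z in mean_z.items() if z > cutoff}
    return sorted(kept, key=kept.get, reverse=True)

def keep_top_fraction(ranked_genes, fraction):
    """Eliminate the lowest-ranked genes, keeping the top `fraction` (at least one)."""
    n = max(1, math.ceil(len(ranked_genes) * fraction))
    return ranked_genes[:n]

def optimal_gene_count(results):
    """results maps n_genes -> {dataset: (lime_acc, random_acc)}.
    Assumption: the two datasets are combined via the mean LIME-minus-random gap."""
    def mean_gap(n):
        gaps = [lime - rand for lime, rand in results[n].values()]
        return sum(gaps) / len(gaps)
    return max(results, key=mean_gap)

def average_ranks(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test (normal approximation, no tie correction)."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    w = sum(ranks[:n1])                            # rank sum of the first sample
    mu = n1 * (n1 + n2 + 1) / 2                    # expected rank sum under H0
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mu) / sigma
    return z, math.erfc(abs(z) / math.sqrt(2))     # two-sided P-value

def stars(p):
    """Significance labels matching the figure legend."""
    if p < 0.001:
        return "***"
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    return "NS"

# Toy balanced accuracies over ten permutations (illustrative only).
lime_acc = [0.81, 0.83, 0.80, 0.84, 0.82, 0.85, 0.79, 0.86, 0.83, 0.82]
random_acc = [0.70, 0.68, 0.72, 0.69, 0.71, 0.67, 0.73, 0.66, 0.70, 0.74]
z, p = rank_sum_test(lime_acc, random_acc)
print(stars(p))  # prints ***
```

With only ten permutations per group, an exact rank-sum test would be a reasonable alternative to the normal approximation; the sketch uses the approximation only because it needs nothing beyond the standard library.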