Fig. 3: Positive-unlabeled random forest model interpretation based on known antigens. | npj Systems Biology and Applications

From: Positive-unlabeled learning identifies vaccine candidate antigens in the malaria parasite Plasmodium falciparum

a The left panel displays the permutation-based variable importance analysis of the final tree-filtered model. The x-axis shows the mean decrease in accuracy (scaled by the standard error) on the known antigen set (n = 52) after permuting the variables for each tree in the model. The y-axis lists the ten most important variables for predicting the known antigens. Colors indicate the property group of each variable. The right panel summarizes the values of these ten variables for the known antigens (red) and for randomly selected proteins (n = 52; blue) that the final tree-filtered model predicts as non-antigens. Points represent proteins. Boxplots show the median with the first and third quartiles, and the whiskers extend 1.5 times the interquartile range beyond the first and third quartiles. Numbers on the right show adjusted p-values calculated using two-sided Mann–Whitney tests. Variable values were normalized based on the entire data set.

b Permutation-based group variable importance analysis. Variable importance was calculated on the known antigens, and the decrease in accuracy after variable permutation was recorded. Variables in the same property group were permuted together. The mean decrease in accuracy was standardized using the standard error computed across all trees in the model.
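The grouped permutation importance described in panel b can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the toy data, and the black-box `predict` callable are all hypothetical, and the standard error is computed across repeated permutations rather than across the trees of a random forest, as the caption describes.

```python
import random
import statistics


def accuracy(predict, rows, labels):
    """Fraction of rows whose predicted label matches the true label."""
    hits = sum(1 for row, y in zip(rows, labels) if predict(row) == y)
    return hits / len(rows)


def group_permutation_importance(predict, rows, labels, groups,
                                 n_repeats=30, seed=0):
    """Return {group: (mean accuracy drop, drop / standard error)}.

    All columns in a group are shuffled with the same row permutation,
    so the variables of one property group are permuted together.
    Assumption: the standard error is taken over repeated permutations,
    not over individual trees.
    """
    rng = random.Random(seed)
    baseline = accuracy(predict, rows, labels)
    result = {}
    for name, cols in groups.items():
        drops = []
        for _ in range(n_repeats):
            order = list(range(len(rows)))
            rng.shuffle(order)
            permuted = [list(row) for row in rows]
            for i, src in enumerate(order):
                for c in cols:
                    permuted[i][c] = rows[src][c]
            drops.append(baseline - accuracy(predict, permuted, labels))
        mean_drop = statistics.mean(drops)
        std_err = statistics.stdev(drops) / len(drops) ** 0.5
        scaled = mean_drop / std_err if std_err > 0 else float("nan")
        result[name] = (mean_drop, scaled)
    return result


# Toy example (hypothetical data): column 0 determines the label,
# columns 1 and 2 are noise that the predictor ignores.
_rng = random.Random(1)
toy_rows = [[i / 39, _rng.random(), _rng.random()] for i in range(40)]
toy_labels = [int(row[0] > 0.5) for row in toy_rows]


def toy_predict(row):
    # Stand-in for a trained classifier; uses only column 0.
    return int(row[0] > 0.5)


scores = group_permutation_importance(
    toy_predict, toy_rows, toy_labels,
    groups={"informative": [0], "noise": [1, 2]},
)
```

In this toy setting, permuting the "noise" group leaves accuracy unchanged, while permuting the "informative" group lowers it. A group whose drop never varies has zero standard error, so its scaled value is reported as NaN.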
