Figure 3

Paired multi-way importance plots of the investigated machine learning algorithms, derived from the random forest meta-/ensemble learner fitted on the probability estimates of the eight ML models as features (Table 3), based on the findings (a,b), impressions (c,d) and both (e,f) report sections. The axes of subplots (a,c,e) measure the prediction-related relevance of a variable. Here, the y-axes (Gini_decrease) display the Gini feature importance-based mean decrease in node impurity, while the x-axes (Accuracy_decrease) show the more robust mean decrease in accuracy (type = 1) variable importance measure6,62,63,64. P-values (legend: red, green and blue patches and colored text brackets) were derived from the binomial distribution of the number of nodes split on the variable, assuming random draws. On subplots (b,d,f), the y-axes (Times_a_root) show the number of trees whose root is split on that variable (i.e. ML classifier), whereas the x-axes (Mean_minimal_depth) show the mean depth of the first split on the variable. Because these two measures are negatively associated, the most important variables are located in the upper-left corner. The area of the points is proportional to the total number of nodes (no_of_nodes) in the forest that split on that variable, and the points are blue if the variable was used as a root (top) split. When ML classifiers trained only on the findings sections were fed to the RF ensemble (a), XGBoost (p < 0.01) was the only significant predictor, while linear kernel SVM showed a weak trend (p < 0.1). Underscoring XGBoost's importance (b), it was used in the most nodes and most often as the root split. Among the models developed on the impressions (c), fastText (p < 0.01) was the most important predictor, followed by SVM-LK (p < 0.01), while the Brier score-tuned RF (tRF-BS) showed a weak trend (p < 0.1). Based on the tree splitting measures (d), fastText and SVM-LK were the most relevant classifiers.
Likewise, when all 16 ML models were combined (e), the impressions-based fastText (p < 0.01) and SVM-LK (p < 0.01) dominated the importance rankings; the findings-based XGBoost, although less relevant, still showed a weak trend (p < 0.1). Plots were created on the first outer fold test set (Ntest.1.0 = 42).
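The importance measures plotted above can be reproduced in any random forest framework. The sketch below is a minimal, hedged illustration in Python/scikit-learn on synthetic data: all names (`n_reports`, `n_models`) and the fabricated meta-features are assumptions standing in for the paper's actual probability-estimate features, and scikit-learn's measures are close analogues of (not identical to) the R implementations behind the figure. It computes a Gini-based mean decrease in impurity (y-axis of a,c,e), a permutation-based mean decrease in accuracy (x-axis of a,c,e), and a times-a-root count (y-axis of b,d,f).

```python
# Hedged sketch: importance measures for an RF meta-learner fitted on
# (synthetic stand-ins for) base-classifier probability estimates.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(0)

# Fabricated meta-features: probabilities from 8 hypothetical base models
# on 42 test reports (mirroring Ntest.1.0 = 42 in the caption).
n_reports, n_models = 42, 8
X = rng.random((n_reports, n_models))
y = (X[:, 0] + 0.2 * rng.standard_normal(n_reports) > 0.5).astype(int)

rf = RandomForestClassifier(n_estimators=500, random_state=0).fit(X, y)

# Gini-based mean decrease in node impurity (normalized to sum to 1).
gini_decrease = rf.feature_importances_

# Permutation-based mean decrease in accuracy: shuffle each feature and
# measure the resulting drop in accuracy (the "more robust" measure).
perm = permutation_importance(rf, X, y, n_repeats=30,
                              scoring="accuracy", random_state=0)
accuracy_decrease = perm.importances_mean

# Times a root: count the trees whose root node splits on each variable.
root_features = [tree.tree_.feature[0] for tree in rf.estimators_]
times_a_root = np.bincount(root_features, minlength=n_models)

print(gini_decrease.round(3))
print(accuracy_decrease.round(3))
print(times_a_root)
```

In this sketch the strongest feature should tend to sit in the "upper-left corner" regime described in the caption: high impurity decrease, frequent root splits, and a low mean minimal depth (which could be obtained by a full traversal of each `tree_` rather than inspecting only the root).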