Fig. 4

Feature Engineering Steps in the Application of Machine Learning Algorithms. Subfigures (A), (B), and (C) present the log-transformed distributions and Q‒Q plots of the REVEL, MetaLR, and SIFT scores, respectively. In each of these subfigures, the left panel depicts the log-transformed distribution and the right panel presents the corresponding Q‒Q plot. Subfigure (D) shows the combined density plots for the selected in silico tools; the distributions remain nonnormal even after data transformation. The 3D scatter plot in (E) visualizes the local outlier factor (LOF) analysis and highlights the 28 outliers that were subsequently excluded from the study. Subfigure (F) presents a heatmap of SHAP values, illustrating the impact of each feature on model predictions across instances. The dot plot in (G) further details the per-feature impacts, whereas subfigure (H) offers a summary plot of SHAP values, showing the combined influence of the top features within the triple in silico tool model. Notably, MetaLR has a substantial negative effect on the model’s output. Finally, the feature importance metrics in subfigure (I) show that REVEL is the largest contributor, accounting for approximately half of the model’s predictive power.
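
As a rough illustration of the preprocessing summarized in panels (A) to (E), the following sketch log-transforms three score columns and computes a local outlier factor for each variant. It is a minimal sketch under stated assumptions, not the pipeline used in the study: the toy scores, the use of `log1p`, the neighbourhood size `k = 3`, and the flagging threshold of 2.0 are all hypothetical, and the LOF here is a simplified pure-Python version that takes exactly `k` neighbours and assumes no duplicate points. In practice a library implementation such as scikit-learn's `LocalOutlierFactor` would normally be used.

```python
import math

# Hypothetical (REVEL, MetaLR, SIFT) scores for seven variants; not study data.
scores = [
    (0.80, 0.75, 0.02),
    (0.82, 0.78, 0.01),
    (0.79, 0.74, 0.03),
    (0.81, 0.77, 0.02),
    (0.83, 0.76, 0.04),
    (0.78, 0.79, 0.01),
    (0.10, 0.15, 0.90),  # deliberately placed far from the other points
]


def log_transform(rows):
    """Apply log(1 + x) to every score (assumed form of the log transform)."""
    return [tuple(math.log1p(v) for v in row) for row in rows]


def local_outlier_factor(points, k=3):
    """Simplified LOF: exactly k neighbours per point, no duplicate points."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]

    neighbours, k_distance = [], []
    for i in range(n):
        # Indices of the other points, ordered by distance from point i.
        order = sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])
        neighbours.append(order[:k])
        k_distance.append(dist[i][order[k - 1]])

    # Local reachability density: inverse of the mean reachability distance.
    lrd = []
    for i in range(n):
        reach = [max(k_distance[j], dist[i][j]) for j in neighbours[i]]
        lrd.append(len(reach) / sum(reach))

    # LOF: mean density of the neighbours relative to the point's own density.
    return [sum(lrd[j] for j in neighbours[i]) / (k * lrd[i]) for i in range(n)]


transformed = log_transform(scores)
lof = local_outlier_factor(transformed, k=3)

# Values near 1 indicate inliers; values well above 1 indicate outliers.
threshold = 2.0  # hypothetical cut-off
flagged = [i for i, value in enumerate(lof) if value > threshold]
print("LOF scores:", [round(value, 2) for value in lof])
print("Flagged variant indices:", flagged)
```

Points whose LOF is close to 1 sit in a region about as dense as their neighbours' regions, whereas a point with a much larger LOF lies in a sparser region and is a candidate for exclusion, which is the role the 28 highlighted points play in panel (E).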