Fig. 5: SHAP beeswarm plots showing how individual features affect model output for each featurization.

a atom pair-count, b PaDEL, c MACCS. Each dot represents one instance (molecule); dots pile up along each feature row to show density. Each row corresponds to one feature, and rows are sorted by mean absolute SHAP value. Color indicates the original value of the feature, whereas the SHAP value measures the impact of that feature value on the model output. Larger absolute SHAP values correspond to a larger impact on the model output.
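
The row ordering described in the caption can be illustrated with a minimal sketch. It assumes the SHAP values are available as a molecules-by-features table; the function name, feature names, and numbers below are hypothetical and are not taken from the study.

```python
def rank_features_by_mean_abs_shap(shap_values, feature_names):
    """Return (feature name, mean |SHAP|) pairs, largest mean first."""
    n_rows = len(shap_values)
    scores = []
    for j, name in enumerate(feature_names):
        # Average the magnitude of the SHAP values over all molecules.
        mean_abs = sum(abs(row[j]) for row in shap_values) / n_rows
        scores.append((name, mean_abs))
    # Features with the largest mean |SHAP| come first (top row of the plot).
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical SHAP values: 3 molecules x 3 features (illustrative only).
shap_values = [
    [0.0, 0.5, -0.25],
    [0.125, -1.0, 0.25],
    [0.0, 0.75, -0.5],
]
feature_names = ["feat_A", "feat_B", "feat_C"]

ranking = rank_features_by_mean_abs_shap(shap_values, feature_names)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

Taking the absolute value before averaging means that a feature that pushes predictions strongly in both directions still ranks high, even though its signed SHAP values may average close to zero.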