Fig. 5: Summary of the training, validation, and model interpretation pipeline.

Data were preprocessed, vectorized, and split into training (80%) and holdout (20%) sets, as shown in panel (a). Binary classifiers were trained as shown in panel (b), and the model was evaluated as shown in panel (c). Model interpretation relied on word coefficients and on human review of misclassified examples.
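
The stages in this caption can be illustrated with a minimal, standard-library-only sketch. The caption does not name the vectorizer or the classifier, so everything specific here is an assumption: bag-of-words counts stand in for the vectorization step, a logistic regression trained by stochastic gradient descent stands in for the binary classifier (chosen only because a linear model exposes one coefficient per word), and the toy corpus, labels, learning rate, and epoch count are hypothetical.

```python
import math
import random
from collections import Counter

# Hypothetical toy corpus (not from the paper): 1 = positive, 0 = negative.
texts = [
    "good clear result", "good strong signal", "good stable fit",
    "good clean data", "good useful model",
    "bad noisy result", "bad weak signal", "bad unstable fit",
    "bad messy data", "bad useless model",
]
labels = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]

# Panel (a): vectorize as bag-of-words counts (assumed), then split 80/20.
vocab = sorted({w for t in texts for w in t.split()})
col = {w: j for j, w in enumerate(vocab)}
X = []
for t in texts:
    row = [0.0] * len(vocab)
    for word, count in Counter(t.split()).items():
        row[col[word]] = float(count)
    X.append(row)

order = list(range(len(texts)))
random.Random(0).shuffle(order)          # fixed seed for a repeatable split
cut = (len(order) * 80) // 100           # 80% training, 20% holdout
train_idx, hold_idx = order[:cut], order[cut:]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w, b, x):
    """Probability of the positive class under a linear model."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Panel (b): train a binary classifier (assumed: logistic regression, SGD).
w, b = [0.0] * len(vocab), 0.0
learning_rate, epochs = 0.1, 200         # hypothetical settings
for _ in range(epochs):
    for i in train_idx:
        err = predict_proba(w, b, X[i]) - labels[i]
        w = [wj - learning_rate * err * xj for wj, xj in zip(w, X[i])]
        b -= learning_rate * err

# Panel (c): evaluate on the holdout set.
preds = [int(predict_proba(w, b, X[i]) >= 0.5) for i in hold_idx]
accuracy = sum(p == labels[i] for p, i in zip(preds, hold_idx)) / len(hold_idx)

# Interpretation: rank words by coefficient and collect misclassified
# holdout texts for human review.
ranked = sorted(zip(vocab, w), key=lambda pair: pair[1])
misclassified = [texts[i] for p, i in zip(preds, hold_idx) if p != labels[i]]

print("holdout accuracy:", accuracy)
print("most negative words:", ranked[:3])
print("most positive words:", ranked[-3:])
print("for human review:", misclassified)
```

In this sketch the sign of a word's coefficient indicates which class it pushes a prediction toward, and the `misclassified` list is what a human reviewer would read to interpret errors. The sketch follows the order given in the caption (vectorize, then split); a stricter variant would build the vocabulary from the training set only.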