Table 2 Out-of-sample classification performance.

From: Computer-assisted classification of contrarian claims about climate change

 

Scores on the validation set (noisy) and the test set (noise free):

| Model | Val. precision | Val. recall | Val. F1 | Test precision | Test recall | Test F1 |
|---|---|---|---|---|---|---|
| Logistic (unweighted) | 0.71 | 0.55 | 0.62 | 0.83 | 0.57 | 0.68 |
| Logistic (weighted) | 0.62 | 0.68 | 0.65 | 0.75 | 0.70 | 0.72 |
| SVM (unweighted) | 0.66 | 0.56 | 0.61 | 0.77 | 0.58 | 0.66 |
| SVM (weighted) | 0.60 | 0.68 | 0.64 | 0.74 | 0.70 | 0.72 |
| ULMFiT | 0.69 | 0.69 | 0.69 | 0.77 | 0.67 | 0.72 |
| ULMFiT (weighted) | 0.66 | 0.60 | 0.62 | 0.76 | 0.60 | 0.65 |
| ULMFiT (oversampled) | 0.41 | 0.73 | 0.50 | 0.46 | 0.75 | 0.55 |
| ULMFiT (focal loss) | 0.66 | 0.58 | 0.60 | 0.73 | 0.56 | 0.61 |
| ULMFiT-logistic | 0.71 | 0.70 | 0.70 | 0.77 | 0.72 | 0.75 |
| ULMFiT-SVM | 0.74 | 0.65 | 0.70 | 0.81 | 0.63 | 0.71 |
| RoBERTa | 0.75 | 0.77 | 0.76 | 0.82 | 0.75 | 0.77 |
| RoBERTa-logistic | 0.76 | 0.77 | 0.76 | 0.83 | 0.75 | 0.79 |

  1. The table reports macro-averaged precision, recall, and F1 scores to compare model fit across "shallow" descriptive classifiers and "deep" transfer-learning architectures. Logistic (unweighted): a logistic regression classifier using TF-IDF-weighted features, optimized via grid search. Logistic (weighted): a logistic regression classifier using TF-IDF-weighted features and class-imbalance weighting, optimized via grid search. SVM (unweighted): a linear support vector machine classifier using TF-IDF-weighted features, optimized via grid search. SVM (weighted): a linear support vector machine classifier using TF-IDF-weighted features and class-imbalance weighting, optimized via grid search. ULMFiT models: we start from a language model pre-trained on the Wiki-103 corpus. First, we fine-tuned the pre-trained language model using our training set (n = 23,436) and a large random sample (n = 100,000) of unannotated blog and CTT paragraphs. Second, we trained the classification model using the training and validation sets described above. Given the observed class imbalance, we examined four variations of the ULMFiT architecture: a model that (1) ignores class imbalance; (2) oversamples each minibatch to adjust for class imbalance; (3) weights the loss function for class imbalance following the "balanced" procedure used in the scikit-learn library; and (4) uses a focal loss function. RoBERTa models: see discussion in Methods.
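To make the "macro-averaged" scoring concrete: each class gets its own precision, recall, and F1, and the per-class scores are averaged with equal weight regardless of class frequency. A minimal sketch with toy labels (not the paper's data):

```python
# Macro averaging: per-class precision/recall/F1, averaged with equal
# class weight. The label vectors below are illustrative toy values.
from sklearn.metrics import precision_recall_fscore_support

y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
p, r, f1, _ = precision_recall_fscore_support(y_true, y_pred, average="macro")
print(round(p, 2), round(r, 2), round(f1, 2))
```

Because every class counts equally, macro averaging penalizes a model that does well only on the majority class, which is why it suits the imbalanced claim categories here.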
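The "Logistic (weighted)" baseline can be sketched as a scikit-learn pipeline: TF-IDF features, `class_weight="balanced"` for the imbalance correction, and hyperparameters chosen by grid search. The toy texts, labels, and `C` grid below are illustrative assumptions, not the paper's actual data or search space:

```python
# Sketch of a TF-IDF + class-weighted logistic regression baseline,
# tuned by grid search. Texts/labels and the C grid are toy assumptions.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

texts = [
    "warming is driven by human emissions",
    "the climate has always changed naturally",
    "models overstate future warming",
    "co2 is a greenhouse gas",
]
labels = [0, 1, 1, 0]  # 1 = contrarian claim (toy labels)

pipeline = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("clf", LogisticRegression(class_weight="balanced", max_iter=1000)),
])

# Grid search over regularization strength, scored by macro F1.
grid = GridSearchCV(pipeline, {"clf__C": [0.1, 1.0, 10.0]},
                    cv=2, scoring="f1_macro")
grid.fit(texts, labels)
preds = grid.predict(texts)
print(grid.best_params_)
```

The unweighted variant is the same pipeline with `class_weight=None`; the SVM variants substitute `LinearSVC` for the logistic regression step.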
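The focal loss variant (4) down-weights the contribution of well-classified examples so training focuses on hard, often minority-class cases. A minimal binary sketch of the standard formulation FL(p_t) = -(1 - p_t)^gamma * log(p_t); gamma = 2.0 is a common default, not necessarily the paper's setting:

```python
# Minimal focal loss sketch: easy (high-confidence correct) examples
# are down-weighted by the (1 - p_t)^gamma factor. gamma=2.0 is an
# assumed default, not taken from the paper.
import numpy as np

def focal_loss(probs, targets, gamma=2.0):
    """Mean focal loss for binary targets, given predicted P(y=1)."""
    probs = np.clip(probs, 1e-7, 1 - 1e-7)
    p_t = np.where(targets == 1, probs, 1 - probs)  # prob. of true class
    return float(np.mean(-((1 - p_t) ** gamma) * np.log(p_t)))

# A confident correct prediction contributes far less than a wrong one.
easy = focal_loss(np.array([0.95]), np.array([1]))
hard = focal_loss(np.array([0.10]), np.array([1]))
print(easy < hard)  # True
```

At gamma = 0 this reduces to ordinary cross-entropy; larger gamma shifts more of the gradient toward misclassified examples.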