Table 1 Best classification scores achieved using a logistic regression classifier with L₂ regularization.

	F1	AUROC	Accuracy	Precision	Recall
Initial set	0.73	0.81	0.71	0.72	0.73
Final set	0.81	0.89	0.81	0.81	0.81
Final set - out-of-sample	0.79	—	0.72	0.79	0.79

For the training set obtained with the final set of hashtags, classification scores are computed using 10-fold cross-validation. For the training set obtained with the initial set of hashtags, scores are computed on the tweets that belong to the final set but were not used to train the classifier. F₁, Precision and Recall are each reported as the average of the two scores obtained by taking each class in turn as the positive class. The out-of-sample scores are computed on a random sample of 500 manually annotated tweets.
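
The class-averaged F₁, Precision and Recall described in this note can be illustrated with a minimal sketch. It is not the authors' code: the function names `prf_for_positive` and `macro_scores`, the 0/1 label encoding, and the toy labels are assumptions made for illustration only, and a score is set to 0 when its denominator is empty.

```python
def prf_for_positive(y_true, y_pred, positive):
    """Precision, recall and F1 when `positive` is treated as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    # Assumption: a score with an empty denominator is reported as 0.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def macro_scores(y_true, y_pred, classes=(0, 1)):
    """Average each score over the classes, taking each class as positive in turn."""
    per_class = [prf_for_positive(y_true, y_pred, c) for c in classes]
    n = len(per_class)
    return {
        "precision": sum(s[0] for s in per_class) / n,
        "recall": sum(s[1] for s in per_class) / n,
        "f1": sum(s[2] for s in per_class) / n,
    }


# Toy labels (hypothetical, not taken from the annotated tweets).
y_true = [1, 1, 0, 0]
y_pred = [1, 1, 1, 0]
scores = macro_scores(y_true, y_pred)
print(scores)
```

Averaging over both classes keeps the reported score from depending on which of the two classes is arbitrarily labelled as positive.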
