Table 1. F1-scores (F) for “defect” (+), “possible defect” (?), and “non-defect” (−) tweet classes for three classifiers trained on the original, under-sampled, and over-sampled data sets

From: Towards scaling Twitter for digital epidemiology of birth defects

| Classifier | Training set (number of tweets) | F (+) | F (?) | F (−) |
|---|---|---|---|---|
| NB | Original, imbalanced training set (14,716) | 0.54 | 0.44 | 0.94 |
| NB | Under-sampling based on similar majority class tweets in original training set (5551)ᵃ | 0.46 | 0.38 | 0.92 |
| NB | Under-sampling based on similar false-negative majority class tweets (8015)ᵇ | 0.44 | 0.40 | 0.92 |
| NB | Random under-sampling control set (5551)ᶜ | 0.50 | 0.43 | 0.93 |
| NB | Random under-sampling control set (8015)ᶜ | 0.51 | 0.44 | 0.93 |
| NB | Over-sampling instances of minority classes with replacement (40,675)ᵈ | 0.49 | 0.40 | 0.93 |
| NB | SMOTE on original training set (39,148)ᵉ | 0.36 | 0.30 | 0.95 |
| SVM | Original, imbalanced training set (14,716) | 0.62 | 0.52 | 0.96 |
| SVM | Under-sampling based on similar majority class tweets in original training set (5551)ᵃ | 0.62 | 0.43 | 0.96 |
| SVM | Under-sampling based on similar false-negative majority class tweets (8015)ᵇ | 0.58 | 0.51 | 0.95 |
| SVM | Random under-sampling control set (5551)ᶜ | 0.62 | 0.49 | 0.96 |
| SVM | Random under-sampling control set (8015)ᶜ | 0.62 | 0.50 | 0.96 |
| SVM | Over-sampling instances of minority classes with replacement (40,675)ᵈ | 0.62 | 0.46 | 0.95 |
| SVM | SMOTE on original training set (39,148)ᵉ | 0.62 | 0.51 | 0.96 |
| LSTM | Original, imbalanced training set (14,716) | 0.60 | 0.35 | 0.96 |
| LSTM | Under-sampling based on similar majority class tweets in original training set (5551)ᵃ | 0.55 | 0.33 | 0.91 |
| LSTM | Under-sampling based on similar false-negative majority class tweets (8015)ᵇ | 0.48 | 0.36 | 0.90 |
| LSTM | Random under-sampling control set (5551)ᶜ | 0.54 | 0.37 | 0.92 |
| LSTM | Random under-sampling control set (8015)ᶜ | 0.59 | 0.45 | 0.95 |
| LSTM | Over-sampling instances of minority classes with replacement (40,675)ᵈ | 0.55 | 0.45 | 0.95 |

ᵃ Method (1) described in the “Methods” section
ᵇ Method (2) described in the “Methods” section
ᶜ Method (3) described in the “Methods” section
ᵈ Method (4) described in the “Methods” section
ᵉ Method (5) described in the “Methods” section
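
The F columns of Table 1 report one F1-score per tweet class. As a minimal sketch of how such per-class scores are commonly computed (one-vs-rest, with F1 = 2TP / (2TP + FP + FN)), the following may help in reading the table. The function name `per_class_f1` and the toy labels are hypothetical and invented for illustration; they are not data or code from the study, and the ASCII `-` stands in for the “−” class symbol.

```python
from collections import Counter


def per_class_f1(gold, predicted, labels=("+", "?", "-")):
    """Return {label: F1}, treating each class one-vs-rest."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, predicted):
        if g == p:
            tp[g] += 1      # correct prediction for class g
        else:
            fp[p] += 1      # p was predicted but is wrong
            fn[g] += 1      # g was the true class but was missed
    scores = {}
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        # Define F1 as 0.0 when the class never occurs and is never predicted.
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores


# Toy annotations, invented for illustration only (not from the study).
gold = ["+", "+", "?", "-", "-", "-", "-", "?"]
pred = ["+", "-", "?", "-", "-", "-", "?", "?"]
scores = per_class_f1(gold, pred)
```

Reporting a separate score per class, rather than a single accuracy figure, keeps the large “non-defect” class from masking weak performance on the two minority classes.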