Table 4 Overview of input text length and its impact on coding performance with prediction probabilities greater than 0.5.
From: Multi-label text classification via secondary use of large clinical real-world data sets
# of Tokens | > 512 | | | | \(\le\) 512 | | | |
---|---|---|---|---|---|---|---|---|
Model | Precision | Recall | F1-measure | MAP | Precision | Recall | F1-measure | MAP |
medBERT.de | 0.717 | 0.538 | 0.579 | 0.762 | 0.793 | 0.767 | 0.767 | 0.812 |
surgeryBERT.at | 0.711 | 0.527 | 0.572 | 0.758 | 0.777 | 0.743 | 0.747 | 0.793 |
fastText | 0.659 | 0.513 | 0.544 | 0.710 | 0.730 | 0.719 | 0.709 | 0.766 |
CNN | 0.660 | 0.464 | 0.511 | 0.712 | 0.714 | 0.674 | 0.681 | 0.731 |
SVM | 0.773 | 0.593 | 0.639 | 0.836 | 0.758 | 0.725 | 0.728 | 0.780 |
LR | 0.557 | 0.386 | 0.425 | 0.593 | 0.598 | 0.568 | 0.571 | 0.616 |
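
The split used in the table (inputs longer than 512 tokens versus inputs of at most 512 tokens, with labels kept only when their predicted probability exceeds 0.5) can be illustrated with a minimal sketch. The averaging scheme below (per-document scores averaged within each length group) is an assumption, not something the table states. It is one scheme under which a reported F1-measure need not equal the harmonic mean of the reported precision and recall. The field names `n_tokens`, `probs`, and `gold`, the function names, and the labels `A`, `B`, `C` are hypothetical and chosen only for illustration. MAP is not covered here.

```python
from typing import Dict, List, Set, Tuple

THRESHOLD = 0.5    # from the caption: keep labels with probability > 0.5
TOKEN_LIMIT = 512  # from the table header: split inputs at 512 tokens


def prf_for_document(probs: Dict[str, float], gold: Set[str]) -> Tuple[float, float, float]:
    """Precision, recall and F1 for one document's thresholded label set."""
    predicted = {label for label, p in probs.items() if p > THRESHOLD}
    hits = len(predicted & gold)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1


def averaged_by_length(docs: List[dict]) -> Dict[str, Tuple[float, ...]]:
    """Average per-document scores separately for long and short inputs."""
    groups: Dict[str, list] = {"> 512": [], "<= 512": []}
    for doc in docs:
        key = "> 512" if doc["n_tokens"] > TOKEN_LIMIT else "<= 512"
        groups[key].append(prf_for_document(doc["probs"], doc["gold"]))
    result = {}
    for key, scores in groups.items():
        if scores:  # skip empty groups to avoid dividing by zero
            n = len(scores)
            result[key] = tuple(sum(s[i] for s in scores) / n for i in range(3))
    return result


# Toy data (hypothetical): one long document and one short document.
docs = [
    {"n_tokens": 600, "probs": {"A": 0.9, "B": 0.6, "C": 0.2}, "gold": {"A", "C"}},
    {"n_tokens": 100, "probs": {"A": 0.8, "B": 0.1}, "gold": {"A"}},
]
scores = averaged_by_length(docs)
```

In the toy data, the long document predicts `A` and `B` but its gold labels are `A` and `C`, so one of two predictions is correct and one of two gold labels is found. The short document is predicted exactly.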