Table 2. Statistics of the generic-domain datasets.
From: Intelligent recognition of counterfeit goods text based on BERT and multimodal feature fusion
| Training Data | # Lines | Avg. Length | # Errors |
|---|---|---|---|
| SIGHAN 2013 | 350 | 49.2 | 350 |
| SIGHAN 2014 | 6,526 | 49.7 | 10,087 |
| SIGHAN 2015 | 3,174 | 30.0 | 4,237 |
| Wang271K | 271,329 | 44.4 | 382,704 |

| Test Data | # Lines | Avg. Length | # Errors |
|---|---|---|---|
| SIGHAN 2013 | 1,000 | 74.1 | 1,227 |
| SIGHAN 2014 | 1,062 | 50.1 | 782 |
| SIGHAN 2015 | 1,100 | 30.5 | 715 |