Table 2 Statistical data from generic domain datasets.

From: Intelligent recognition of counterfeit goods text based on BERT and multimodal feature fusion

| Training Data | # Lines | Avg. Length | # Errors |
|---|---:|---:|---:|
| SIGHAN 2013 | 350 | 49.2 | 350 |
| SIGHAN 2014 | 6,526 | 49.7 | 10,087 |
| SIGHAN 2015 | 3,174 | 30.0 | 4,237 |
| Wang271K | 271,329 | 44.4 | 382,704 |

| Test Data | # Lines | Avg. Length | # Errors |
|---|---:|---:|---:|
| SIGHAN 2013 | 1,000 | 74.1 | 1,227 |
| SIGHAN 2014 | 1,062 | 50.1 | 782 |
| SIGHAN 2015 | 1,100 | 30.5 | 715 |