Table 3 Results of k-fold cross-validation (k = 5, 10) applied to spaCy with the manual corpus, artificial data, and combined data.

From: A novel automated label data extraction and data base generation system from herbarium specimen images using OCR and NER

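| Training data | k | Statistic | Num_entities | Num_predictions | Num_correct | Precision | Recall | f_value |
|---|---|---|---|---|---|---|---|---|
| With manual corpus | 5 | Mean | 2165.2 | 2089.2 | 1630.2 | 0.781 | 0.753 | 0.766 |
| With manual corpus | 5 | S.D. | 35.1 | 91.0 | 34.1 | 0.020 | 0.016 | 0.006 |
| With manual corpus | 10 | Mean | 1082.6 | 1050.2 | 821.1 | 0.782 | 0.759 | 0.770 |
| With manual corpus | 10 | S.D. | 33.6 | 38.2 | 20.6 | 0.016 | 0.019 | 0.015 |
| With manual corpus + artificial data (10,000) | 5 | Mean | 2165.2 | 2056.2 | 1688.8 | 0.821 | 0.780 | 0.800 |
| With manual corpus + artificial data (10,000) | 5 | S.D. | 35.1 | 33.9 | 22.0 | 0.005 | 0.004 | 0.002 |
| With manual corpus + artificial data (10,000) | 10 | Mean | 1082.6 | 1031.7 | 843 | 0.817 | 0.779 | 0.798 |
| With manual corpus + artificial data (10,000) | 10 | S.D. | 33.5 | 33.4 | 19.8 | 0.013 | 0.022 | 0.016 |
| Artificial data only (10,000) | 5 | Mean | 2165.2 | 1365.6 | 1063.4 | 0.779 | 0.491 | 0.602 |
| Artificial data only (10,000) | 5 | S.D. | 35.1 | 16.3 | 18.6 | 0.008 | 0.012 | 0.011 |
| Artificial data only (10,000) | 10 | Mean | 1082.6 | 682.8 | 531.7 | 0.779 | 0.491 | 0.602 |
| Artificial data only (10,000) | 10 | S.D. | 33.6 | 15.5 | 15.5 | 0.011 | 0.015 | 0.013 |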
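The three ratio columns can be related to the three count columns with a short sketch. It assumes the standard definitions (precision = Num_correct / Num_predictions, recall = Num_correct / Num_entities, f_value = harmonic mean of the two); the table itself does not state the formulas, and the function name `prf` is hypothetical. Because the table reports means of per-fold ratios, recomputing from the mean counts is expected to agree only approximately.

```python
def prf(num_entities, num_predictions, num_correct):
    """Precision, recall and F-value from entity counts.

    Assumes the standard definitions; this is an illustration,
    not necessarily the authors' exact scoring procedure.
    """
    precision = num_correct / num_predictions if num_predictions else 0.0
    recall = num_correct / num_entities if num_entities else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_value = 2 * precision * recall / (precision + recall)
    return precision, recall, f_value


# Mean counts from the "With manual corpus", k = 5 row of Table 3.
p, r, f = prf(num_entities=2165.2, num_predictions=2089.2, num_correct=1630.2)
print(f"precision={p:.3f} recall={r:.3f} f_value={f:.3f}")
```

Under these assumptions the recomputed values fall within about 0.002 of the reported means (0.781, 0.753, 0.766); the small gap is consistent with averaging ratios per fold rather than taking a ratio of averaged counts.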