Table 1 Number of matched cases using OCR text extracted from 10,000 herbarium specimen images.

From: A novel automated label data extraction and data base generation system from herbarium specimen images using OCR and NER

| NER label | Meaning | Matched cases |
| --- | --- | --- |
| en_family_name | Family name of the plants (English) | 1880/10,000 |
| jp_family_name | Family name of the plants (Japanese) | 3124/10,000 |
| en_name | Scientific name | 2637/10,000 |
| jp_name | Japanese name | 7009/10,000 |
| collect_country | Country | 0/10,000 |
| collect_pref | Prefecture | 6830/10,000 |
| collect_city | Locality | 5802/10,000 |
| collect_addr | Street | 3954/10,000 |
| collect_date | Date of collection | 0/10,000 |
| collect_person | Collector(s) | 4381/10,000 |
| collect_number | Collector number | 8757/10,000 |
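
The matched-case counts in Table 1 are easier to compare as rates. The following is a minimal sketch, not part of the original system, that converts each count to a percentage of the 10,000 images and ranks the labels. The counts are copied from the table; the names `matched_cases`, `TOTAL_IMAGES`, and `match_rates` are hypothetical and chosen only for illustration.

```python
# Matched-case counts copied from Table 1 (each out of 10,000 images).
TOTAL_IMAGES = 10_000

matched_cases = {
    "en_family_name": 1880,
    "jp_family_name": 3124,
    "en_name": 2637,
    "jp_name": 7009,
    "collect_country": 0,
    "collect_pref": 6830,
    "collect_city": 5802,
    "collect_addr": 3954,
    "collect_date": 0,
    "collect_person": 4381,
    "collect_number": 8757,
}


def match_rates(counts, total):
    """Return {label: percentage of total}, sorted from highest to lowest."""
    rates = {label: 100.0 * n / total for label, n in counts.items()}
    return dict(sorted(rates.items(), key=lambda kv: kv[1], reverse=True))


rates = match_rates(matched_cases, TOTAL_IMAGES)
for label, pct in rates.items():
    print(f"{label:<16} {pct:6.2f}%")
```

Ranked this way, `collect_number` has the highest match rate and `collect_country` and `collect_date` have none, which is simply a restatement of the table rather than a new result.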