Table 1 Description of the columns in the dataset table (PV600.csv).

From: Annotated textual dataset PV600 of perovskite bandgaps for information extraction from literature

| No. | Column name | Description | Example |
| --- | --- | --- | --- |
| 1 | Snippet name | Name of the snippet file | FAPI_10.1016–j.jmrt.2021.03.107_31 |
| 2 | Article identifier | DOI or other identifier of the article the snippet comes from | 10.1016/j.jmrt.2021.03.107 |
| 3 | Publisher | Data provider of the article | Elsevier |
| 4 | Year | Publication year of the article | 2021 |
| 5 | Material | Material of interest | FAPI |
| 6 | Text | Full text of the snippet | The strong characteristic diffraction……cells or optoelectronic applications. |
| 7 | Annotations | Whether the snippet contains an annotation (yes or no) | yes |
| 8 | Annotation_1 | Annotated value number 1 | 1.5-1.4 |
| 9 | Start_index_1 | Start character index of annotation 1, counted from the beginning of the snippet | 1312 |
| 10 | End_index_1 | End character index of annotation 1, counted from the beginning of the snippet | 1319 |
| 11 | Special_character_1 | Whether the annotation contains anything other than a plain number (e.g. ‘-’) | yes |
| 12 | Processed_annotation_1 | Numeric value of the annotation: a range is averaged, and an error marking is removed | 1.45 |
| 13 | Bandgap_type_1 | Type of the bandgap | Literature |

  1. Columns 8-13 define one annotation in the snippet. Any further columns repeat the same structure, with each group of six columns corresponding to one additional annotation.
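
The repeating six-column layout can be read programmatically. The following is a minimal sketch, not the dataset authors' tooling. It assumes that later annotation groups use the suffixes `_2`, `_3`, and so on, following the `_1` pattern in the table, and that unused groups are empty or absent. It also assumes 0-based, half-open character indices. The example row is consistent with the half-open reading (1319 - 1312 = 7, the length of "1.5-1.4"), but whether indexing starts at 0 is not stated. The helper names `iter_annotations` and `span_matches` and the one-row sample are hypothetical; the sample text is synthetic, not taken from PV600.csv.

```python
import csv
import io

# Per-annotation column stems, taken from rows 8-13 of Table 1.
STEMS = ("Annotation", "Start_index", "End_index",
         "Special_character", "Processed_annotation", "Bandgap_type")


def iter_annotations(row):
    """Yield one dict per annotation group (suffix _1, _2, ...) in a CSV row."""
    n = 1
    # Assumption: groups are numbered consecutively; stop at the first
    # missing or empty Annotation_<n> column.
    while row.get(f"Annotation_{n}"):
        yield {stem: row.get(f"{stem}_{n}", "") for stem in STEMS}
        n += 1


def span_matches(text, ann):
    """Compare the annotation with text[start:end] (assumed 0-based, half-open)."""
    start, end = int(ann["Start_index"]), int(ann["End_index"])
    return text[start:end] == ann["Annotation"]


# Synthetic one-row stand-in for PV600.csv (hypothetical text and indices).
sample_csv = io.StringIO(
    "Text,Annotations,Annotation_1,Start_index_1,End_index_1,"
    "Special_character_1,Processed_annotation_1,Bandgap_type_1\n"
    "A bandgap of 1.5-1.4 eV was reported.,yes,1.5-1.4,13,20,yes,1.45,Literature\n"
)
rows = list(csv.DictReader(sample_csv))
annotations = list(iter_annotations(rows[0]))
```

To apply the same helpers to the real file, the `io.StringIO` object would be replaced by an opened `PV600.csv`. Running `span_matches` over all rows is one way to check the indexing assumption before relying on it.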