Table 2 Named entity recognition and relation extraction scores for three tasks in materials science using models with a JSON output schema

From: Structured information extraction from scientific text with large language models

| Task    | Relation                | E.M. Precision (GPT-3) | E.M. Recall (GPT-3) | E.M. F1 (GPT-3) | E.M. Precision (Llama-2) | E.M. Recall (Llama-2) | E.M. F1 (Llama-2) |
|---------|-------------------------|------------------------|---------------------|-----------------|--------------------------|-----------------------|-------------------|
| Doping  | host-dopant             | 0.772 | 0.684 | 0.726 | 0.836 | 0.807 | 0.821^a |
| General | formula-name            | 0.507 | 0.429 | 0.456 | 0.462 | 0.417 | 0.367 |
| General | formula-acronym         | 0.500 | 0.250 | 0.333 | 0.333 | 0.250 | 0.286 |
| General | formula-structure/phase | 0.538 | 0.439 | 0.482 | 0.551 | 0.432 | 0.470 |
| General | formula-application     | 0.542 | 0.543 | 0.537 | 0.545 | 0.496 | 0.516 |
| General | formula-description     | 0.362 | 0.350 | 0.354 | 0.347 | 0.342 | 0.340 |
| MOFs    | name-formula            | 0.425 | 0.688 | 0.483 | 0.460 | 0.454 | 0.276 |
| MOFs    | name-guest specie       | 0.789 | 0.576 | 0.616 | 0.497 | 0.407 | 0.408 |
| MOFs    | name-application        | 0.657 | 0.518 | 0.573 | 0.507 | 0.562 | 0.531 |
| MOFs    | name-description        | 0.493 | 0.475 | 0.404 | 0.432 | 0.411 | 0.389 |
  1. Exact-match (E.M.) scores are evaluated on a per-word basis, and a link is counted as correct only if both entities and the relationship are correct. The exact-match metric counts output that contains the correct information but is worded differently as incorrect, making these scores a rough lower bound on the true performance of the models. Precision, recall, and F1 reflect scores on a held-out test set for the doping models and averages over five cross-validation sets for the general and MOF models. A simplified scoring sketch is given after these notes.
  2. ^a Best F1 scores for each task are shown in bold.
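
As a rough illustration of how such exact-match relation scores can be computed, the following minimal Python sketch (not the authors' evaluation code) scores predicted (entity, relation, entity) links against gold annotations at the whole-triple level; the paper's per-word accounting is more fine-grained, and the example triples are hypothetical.

```python
# Illustrative sketch only: exact-match precision/recall/F1 over relation triples.
# A predicted link counts as correct only if both entities and the relation match
# a gold triple exactly (here at the whole-triple level, a simplification of the
# paper's per-word evaluation).

from collections import Counter


def exact_match_scores(predicted, gold):
    """Return (precision, recall, F1) for exact-match triple extraction."""
    pred_counts = Counter(predicted)
    gold_counts = Counter(gold)
    # A triple is a true positive only when it appears verbatim in both lists.
    tp = sum(min(pred_counts[t], gold_counts[t]) for t in pred_counts)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return precision, recall, f1


# Hypothetical example: a doping sentence should yield one host-dopant link.
predicted = [("SrTiO3", "host-dopant", "Nb"), ("SrTiO3", "host-dopant", "La")]
gold = [("SrTiO3", "host-dopant", "Nb")]
print(exact_match_scores(predicted, gold))  # -> (0.5, 1.0, 0.666...)
```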