Table 4. Preselection evaluation results: precision (P), recall (R) and F1-score (F1) of the LLMs.

From: Annotated textual dataset PV600 of perovskite bandgaps for information extraction from literature

 

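|    | Mixtral | Mixtral-Instruct | Llama-3.1 | Llama3-ChatQA | GPT-4o |
|----|---------|------------------|-----------|---------------|--------|
| P  | 33.7 | 74.3 | 32.8 | 35.8 | 87.8 (±0.6) |
| R  | 66.0 | 55.3 | 92.0 | 97.3 | 94.7 (±1.8) |
| F1 | 44.6 | 63.4 | 48.3 | 52.4 | 91.6 (±0.2) |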

  1. The GPT-4o results are presented as average ± standard deviation.
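
As a reading aid, the sketch below recomputes F1 from the tabulated P and R values using the standard definition of F1 as the harmonic mean of precision and recall. It assumes the values are percentages and that this standard definition applies here; the function name `f1_score` and the `reported` dictionary are illustrative and do not come from the source. Small differences from the tabulated F1 are expected because P and R are already rounded to one decimal. GPT-4o is left out because its entries are averages over several runs, and an average of per-run F1 scores need not equal the F1 of the averaged P and R.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same scale as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (P, R, reported F1) copied from the table; assumed to be percentages.
reported = {
    "Mixtral": (33.7, 66.0, 44.6),
    "Mixtral-Instruct": (74.3, 55.3, 63.4),
    "Llama-3.1": (32.8, 92.0, 48.3),
    "Llama3-ChatQA": (35.8, 97.3, 52.4),
}

for model, (p, r, f1_reported) in reported.items():
    f1 = f1_score(p, r)
    print(f"{model:17s} recomputed F1 = {f1:.1f} (reported {f1_reported})")
```

The recomputed values agree with the tabulated ones to within about 0.1, which is consistent with rounding of the inputs.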