Table 4 Preselection evaluation results: precision (P), recall (R) and F1-score (F1) of LLMs (all values in %).
From: Annotated textual dataset PV600 of perovskite bandgaps for information extraction from literature
Metric | Mixtral | Mixtral-Instruct | Llama-3.1 | Llama3-ChatQA | GPT-4o |
---|---|---|---|---|---|
P | 33.7 | 74.3 | 32.8 | 35.8 | 87.8 (±0.6) |
R | 66.0 | 55.3 | 92.0 | 97.3 | 94.7 (±1.8) |
F1 | 44.6 | 63.4 | 48.3 | 52.4 | 91.6 (±0.2) |
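As a quick consistency check, the F1-score is the harmonic mean of precision and recall, F1 = 2PR/(P + R). The minimal sketch below recomputes F1 from the P and R values in Table 4 for the four models reported without a spread. GPT-4o is left out: its ± values suggest its figures are averages over repeated runs (an assumption), and the F1 of averaged P and R need not equal the averaged F1. The dictionary name `table_4` and the helper `f1` are illustrative only.

```python
# Precision, recall and reported F1 (all in %) copied from Table 4.
# GPT-4o is omitted because its values carry ± spreads (assumed to be run averages).
table_4 = {
    "Mixtral": (33.7, 66.0, 44.6),
    "Mixtral-Instruct": (74.3, 55.3, 63.4),
    "Llama-3.1": (32.8, 92.0, 48.3),
    "Llama3-ChatQA": (35.8, 97.3, 52.4),
}


def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall, in the same units as the inputs."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


for model, (p, r, reported) in table_4.items():
    # Small differences (at most about 0.1) come from P and R being rounded to one decimal.
    print(f"{model}: recomputed F1 = {f1(p, r):.2f}, reported F1 = {reported}")
```

The recomputed values agree with the reported F1-scores to within about 0.1 percentage points, which is consistent with rounding of P and R in the table.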