Table 2 Performance comparison on the test set between LLM-Prop and a text-description-based baseline (MatBERT)

From: LLM-Prop: predicting the properties of crystalline materials using large language models

| Model | Band gap (eV) ↓ | Volume (Å³/cell) ↓ | FEPA (eV/atom) ↓ | EPA (eV/atom) ↓ | Ehull (eV/atom) ↓ | Is-gap-direct (AUC) ↑ |
|---|---|---|---|---|---|---|
| Text-based | | | | | | |
| MatBERT w/ Numbers | 0.258 | 56.613 | 0.071 | 0.100 | 0.058 | 0.710 |
| MatBERT w/o Numbers | 0.262 | 54.969 | 0.079 | 0.104 | 0.053 | 0.714 |
| MatBERT w/ [NUM]&[ANG] | 0.260 | 55.984 | 0.076 | 0.098 | 0.050 | 0.722 |
| LLM-Prop w/ Numbers | 0.232 | 39.138 | 0.056 | 0.071 | 0.049 | 0.835 |
| LLM-Prop w/o Numbers | 0.231 | 39.252 | 0.056 | 0.072 | 0.047 | 0.839 |
| LLM-Prop w/ [NUM]&[ANG] | 0.234 | 40.123 | 0.057 | 0.067 | 0.047 | 0.857 |

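“w/ Numbers” denotes retaining both bond lengths and bond angles in the crystal description, “w/o Numbers” denotes removing both bond lengths and bond angles from it, and “w/ [NUM]&[ANG]” denotes replacing bond lengths with the [NUM] token and bond angles with the [ANG] token. FEPA: formation energy per atom; EPA: energy per atom; Ehull: energy above the convex hull. ↓ means lower is better; ↑ means higher is better.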
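The note above describes three variants of the crystal description. The sketch below shows how they could be produced as a text-preprocessing step. It is a minimal Python illustration, not the LLM-Prop authors' code, and it rests on these assumptions:

- Bond lengths appear as a number, or an en-dash range, followed by "Å".
- Bond angles appear as a number followed by "°", as in Robocrystallographer-style descriptions.
- The unit symbol is dropped together with the number.
- The function `make_variant` and its regular expressions are hypothetical names chosen for illustration.

```python
import re

# Hypothetical patterns (assumption): a number or an en-dash range such as "2.33–2.40".
NUM = r"\d+(?:\.\d+)?(?:\s*–\s*\d+(?:\.\d+)?)?"
# Bond length: number followed by an angstrom sign (U+00C5 or U+212B).
LENGTH_RE = re.compile(NUM + r"\s*[\u00c5\u212b]")
# Bond angle: number followed by a degree sign.
ANGLE_RE = re.compile(NUM + r"\s*°")


def make_variant(description: str, mode: str) -> str:
    """Return one of the three description variants compared in Table 2."""
    if mode == "w/ Numbers":
        return description  # keep bond lengths and angles as they are
    if mode == "w/o Numbers":
        text = LENGTH_RE.sub("", description)  # drop bond lengths
        text = ANGLE_RE.sub("", text)          # drop bond angles
    elif mode == "w/ [NUM]&[ANG]":
        text = LENGTH_RE.sub("[NUM]", description)  # bond lengths -> [NUM]
        text = ANGLE_RE.sub("[ANG]", text)          # bond angles -> [ANG]
    else:
        raise ValueError(f"unknown mode: {mode}")
    # Tidy up whitespace left before punctuation and any repeated spaces.
    text = re.sub(r"\s+([.,;:])", r"\1", text)
    return re.sub(r"\s+", " ", text).strip()


desc = "Na–Cl bond lengths are 2.85 Å. The corner-sharing octahedral tilt angles are 0°."
for mode in ("w/ Numbers", "w/o Numbers", "w/ [NUM]&[ANG]"):
    print(f"{mode}: {make_variant(desc, mode)}")
```

Replacing values with dedicated tokens keeps a marker in the text showing that a length or angle was present, while discarding the exact value. This is the distinction between the “w/o Numbers” and “w/ [NUM]&[ANG]” rows. In a real pipeline the [NUM] and [ANG] strings would also need to be registered as tokens with the model's tokenizer. That step is not shown here.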