Table 3 Performance comparison on test set when the input is a structure

From: LLM-Prop: predicting the properties of crystalline materials using large language models

| Model | Band gap (eV) ↓ | Volume (Å³/cell) ↓ | FEPA (eV/atom) ↓ | EPA (eV/atom) ↓ | Ehull (eV/atom) ↓ | Is-gap-direct (AUC) ↑ |
| --- | --- | --- | --- | --- | --- | --- |
| Structure-based | | | | | | |
| CGCNN | 0.293 | 188.834 | 0.046 | 0.082 | 0.040 | 0.830 |
| MEGNet | 0.304 | 297.948 | 0.077 | 0.056 | 0.051 | N/A |
| ALIGNN | 0.250 | 129.580 | 0.027 | 0.059 | 0.028 | 0.678 |
| DeeperGATGNN | 0.291 | 111.857 | 0.081 | 0.116 | 0.045 | N/A |
| RF (Robo-struct.) | 0.958 | 271.006 | 0.765 | 1.271 | 0.180 | 0.581 |
| XGBoost (Robo-struct.) | 0.984 | 274.104 | 0.761 | 1.266 | 0.178 | 0.586 |
| MatBERT (Robo-struct.) | 0.379 | 47.936 | 0.079 | 0.099 | 0.064 | 0.723 |
| MatBERT (CIF-struct.) | 0.347 | 46.727 | 0.077 | 0.099 | 0.064 | 0.716 |
| LLM-Prop (Robo-struct.) | 0.280 | 36.546 | 0.057 | 0.064 | 0.048 | 0.695 |
| LLM-Prop (CIF-struct.) | 0.269 | 36.546 | 0.056 | 0.065 | 0.048 | 0.695 |

  1. “Robo-struct.” means that the input is a condensed structure in a JSON format generated by Robocrystallographer while “CIF-struct.” denotes that the input is a CIF file. For the GNN-based models, the input is always a CIF file.
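As a reading aid, the arrows in the column headers can be applied mechanically: ↓ columns are minimized and the ↑ column is maximized. The following is a minimal sketch that does this over the values transcribed from the table above. The names `COLUMNS`, `HIGHER_IS_BETTER`, `TABLE`, and `best_per_column` are illustrative and not part of the paper; N/A entries are stored as `None` and skipped, and ties are reported together.

```python
# Values transcribed from Table 3 above; None marks an N/A entry.
# All identifiers here are illustrative, not taken from the paper.
COLUMNS = ["band_gap", "volume", "fepa", "epa", "ehull", "is_gap_direct"]
HIGHER_IS_BETTER = {"is_gap_direct"}  # the only column marked with an up arrow

TABLE = {
    "CGCNN":                   [0.293, 188.834, 0.046, 0.082, 0.040, 0.830],
    "MEGNet":                  [0.304, 297.948, 0.077, 0.056, 0.051, None],
    "ALIGNN":                  [0.250, 129.580, 0.027, 0.059, 0.028, 0.678],
    "DeeperGATGNN":            [0.291, 111.857, 0.081, 0.116, 0.045, None],
    "RF (Robo-struct.)":       [0.958, 271.006, 0.765, 1.271, 0.180, 0.581],
    "XGBoost (Robo-struct.)":  [0.984, 274.104, 0.761, 1.266, 0.178, 0.586],
    "MatBERT (Robo-struct.)":  [0.379, 47.936, 0.079, 0.099, 0.064, 0.723],
    "MatBERT (CIF-struct.)":   [0.347, 46.727, 0.077, 0.099, 0.064, 0.716],
    "LLM-Prop (Robo-struct.)": [0.280, 36.546, 0.057, 0.064, 0.048, 0.695],
    "LLM-Prop (CIF-struct.)":  [0.269, 36.546, 0.056, 0.065, 0.048, 0.695],
}


def best_per_column(table):
    """Return, for each column, the sorted list of models with the best value."""
    best = {}
    for i, col in enumerate(COLUMNS):
        # Skip models that report N/A for this column.
        scored = {m: row[i] for m, row in table.items() if row[i] is not None}
        pick = max if col in HIGHER_IS_BETTER else min
        target = pick(scored.values())
        # Keep every model that matches the best value, so ties are visible.
        best[col] = sorted(m for m, v in scored.items() if v == target)
    return best


best = best_per_column(TABLE)
for col in COLUMNS:
    print(f"{col}: {', '.join(best[col])}")
```

Listing ties explicitly matters here because the two LLM-Prop rows report the same volume value as transcribed.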