Table 3. Performance comparison on the test set when the input is a structure.

Model	Band gap (eV) ↓	Volume (Å³/cell) ↓	FEPA (eV/atom) ↓	EPA (eV/atom) ↓	Ehull (eV/atom) ↓	Is-gap-direct (AUC) ↑
Structure-based
CGCNN	0.293	188.834	0.046	0.082	0.040	0.830
MEGNet	0.304	297.948	0.077	0.056	0.051	N/A
ALIGNN	0.250	129.580	0.027	0.059	0.028	0.678
DeeperGATGNN	0.291	111.857	0.081	0.116	0.045	N/A
RF (Robo-struct.)	0.958	271.006	0.765	1.271	0.180	0.581
XGBoost (Robo-struct.)	0.984	274.104	0.761	1.266	0.178	0.586
MatBERT (Robo-struct.)	0.379	47.936	0.079	0.099	0.064	0.723
MatBERT (CIF-struct.)	0.347	46.727	0.077	0.099	0.064	0.716
LLM-Prop (Robo-struct.)	0.280	36.546	0.057	0.064	0.048	0.695
LLM-Prop (CIF-struct.)	0.269	36.546	0.056	0.065	0.048	0.695

“Robo-struct.” means that the input is a condensed structure in JSON format generated by Robocrystallographer, while “CIF-struct.” means that the input is a CIF file. For the GNN-based models, the input is always a CIF file.
