Table 5. The contribution of each preprocessing strategy to LLM-Prop performance.

Model	Band gap ↓	Volume ↓	Is-gap-direct ↑
LLM-Prop (baseline)	0.256	69.352	0.796
+ modified tokenizer	0.247	78.632	0.785
+ label scaling	0.242	44.515	N/A
+ [CLS] token	0.231	39.520	0.842
+ [NUM] token	0.251	86.090	0.793
+ [ANG] token	0.242	64.965	0.810
− stopwords	0.252	56.593	0.779
LLM-Prop+all (without space group)	0.235	97.457	0.705
LLM-Prop+all	0.229	42.259	0.857

We compare the baseline (input crystal descriptions and targets left untouched, with the default T5 tokenizer) with the setting in which all strategies are combined.
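To make the baseline-versus-combined comparison concrete, the relative change for each property can be computed directly from the two corresponding rows of the table. The following is a minimal sketch rather than part of the original evaluation: the dictionary `TABLE5` and the helper `relative_improvement` are hypothetical names introduced here for illustration, and the numbers are copied from the "LLM-Prop (baseline)" and "LLM-Prop+all" rows. It assumes that ↓ marks a metric where lower is better and ↑ one where higher is better.

```python
# Values copied from Table 5: (baseline, LLM-Prop+all, lower_is_better).
# The names below are illustrative and do not come from the paper's code.
TABLE5 = {
    "Band gap": (0.256, 0.229, True),
    "Volume": (69.352, 42.259, True),
    "Is-gap-direct": (0.796, 0.857, False),
}


def relative_improvement(baseline, combined, lower_is_better):
    """Return the improvement of `combined` over `baseline` in percent.

    A positive value means the combined model is better, whichever
    direction the metric is optimised in.
    """
    change = baseline - combined if lower_is_better else combined - baseline
    return 100.0 * change / baseline


improvements = {
    name: relative_improvement(*values) for name, values in TABLE5.items()
}

for name, pct in improvements.items():
    print(f"{name}: {pct:+.1f}%")
```

Under these assumptions, combining all strategies improves on the baseline by roughly 10.5% for band gap, 39.1% for volume, and 7.7% for Is-gap-direct.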
