Table 5. Contribution of each preprocessing strategy to LLM-Prop performance
From: LLM-Prop: predicting the properties of crystalline materials using large language models
Model | Band gap ↓ | Volume ↓ | Is-gap-direct ↑ |
---|---|---|---|
LLM-Prop (baseline) | 0.256 | 69.352 | 0.796 |
+ modified tokenizer | 0.247 | 78.632 | 0.785 |
+ label scaling | 0.242 | 44.515 | N/A |
+ [CLS] token | 0.231 | 39.520 | 0.842 |
+ [NUM] token | 0.251 | 86.090 | 0.793 |
+ [ANG] token | 0.242 | 64.965 | 0.810 |
− stopwords | 0.252 | 56.593 | 0.779 |
LLM-Prop+all (without space group) | 0.235 | 97.457 | 0.705 |
LLM-Prop+all | 0.229 | 42.259 | 0.857 |
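
The size of each effect is easier to compare as a percentage change against the baseline row. The following is a minimal sketch of that arithmetic, using only the values transcribed from the table above. The names `ROWS`, `relative_change`, and `ablation_deltas` are hypothetical helpers introduced for illustration and do not come from the paper. Following the arrows in the header, a negative change is an improvement for Band gap and Volume, and a positive change is an improvement for Is-gap-direct.

```python
# Values transcribed from Table 5, in column order:
# (band gap, volume, is-gap-direct). None marks the N/A entry.
ROWS = {
    "LLM-Prop (baseline)": (0.256, 69.352, 0.796),
    "+ modified tokenizer": (0.247, 78.632, 0.785),
    "+ label scaling": (0.242, 44.515, None),
    "+ [CLS] token": (0.231, 39.520, 0.842),
    "+ [NUM] token": (0.251, 86.090, 0.793),
    "+ [ANG] token": (0.242, 64.965, 0.810),
    "− stopwords": (0.252, 56.593, 0.779),
    "LLM-Prop+all (without space group)": (0.235, 97.457, 0.705),
    "LLM-Prop+all": (0.229, 42.259, 0.857),
}

COLUMNS = ("Band gap", "Volume", "Is-gap-direct")


def relative_change(value, baseline):
    """Return the percentage change of value relative to baseline, or None for N/A."""
    if value is None:
        return None
    return 100.0 * (value - baseline) / baseline


def ablation_deltas(rows, baseline_name="LLM-Prop (baseline)"):
    """Map each non-baseline row to its per-column percentage change."""
    base = rows[baseline_name]
    return {
        name: tuple(relative_change(v, b) for v, b in zip(values, base))
        for name, values in rows.items()
        if name != baseline_name
    }


if __name__ == "__main__":
    for name, deltas in ablation_deltas(ROWS).items():
        cells = [
            f"{col}: n/a" if d is None else f"{col}: {d:+.1f}%"
            for col, d in zip(COLUMNS, deltas)
        ]
        print(f"{name:36s} " + " | ".join(cells))
```

Read this way, the row without the space group stands out: its Band gap value stays close to the full configuration, while Volume and Is-gap-direct both end up worse than the baseline.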