Table 3 Comparative performance of three ChemBERTa-based models on the twelve QM9 properties. Each row corresponds to one QM9 property; the columns list R² and RMSE for (i) ChemBERTa-zinc-base-v1 (“v1”), (ii) ChemBERTa-77M-MLM (“v2”), and (iii) the SELFIES-domain-adapted model (“SELFIES DA”). The SELFIES-domain-adapted model outperforms the smaller SMILES model on all twelve targets and exceeds the larger model on most of them (nine of twelve by mean R², the exceptions being \(\epsilon_{\mathrm{HOMO}}\), \(\epsilon_{\mathrm{LUMO}}\), and \(\epsilon_{\mathrm{gap}}\)).

From: Domain adaptation of a SMILES chemical transformer to SELFIES with limited computational resources

| Property | R² (v1) | R² (v2) | R² (SELFIES DA) | RMSE (v1) | RMSE (v2) | RMSE (SELFIES DA) |
|---|---|---|---|---|---|---|
| μ | 0.6725 ± 0.0078 | 0.7439 ± 0.0054 | 0.7440 ± 0.0067 | 0.8593 ± 0.0124 | 0.7598 ± 0.0048 | 0.7596 ± 0.0083 |
| α | 0.6809 ± 0.0075 | 0.9451 ± 0.0036 | 0.9686 ± 0.0022 | 4.6133 ± 0.0727 | 1.9123 ± 0.0609 | 1.4463 ± 0.0490 |
| \(\epsilon_{\mathrm{HOMO}}\) | 0.7769 ± 0.0113 | 0.8874 ± 0.0022 | 0.8804 ± 0.0053 | 0.0104 ± 0.0003 | 0.0074 ± 0.0001 | 0.0076 ± 0.0002 |
| \(\epsilon_{\mathrm{LUMO}}\) | 0.9242 ± 0.0070 | 0.9578 ± 0.0021 | 0.9526 ± 0.0040 | 0.0129 ± 0.0006 | 0.0096 ± 0.0002 | 0.0102 ± 0.0004 |
| \(\epsilon_{\mathrm{gap}}\) | 0.8959 ± 0.0054 | 0.9381 ± 0.0013 | 0.9352 ± 0.0014 | 0.0152 ± 0.0004 | 0.0117 ± 0.0001 | 0.0120 ± 0.0001 |
| \(\langle R^{2}\rangle\) | 0.6552 ± 0.0124 | 0.9341 ± 0.0027 | 0.9514 ± 0.0037 | 164.7080 ± 3.7830 | 72.0325 ± 2.1329 | 61.7844 ± 2.3331 |
| zpve | 0.8578 ± 0.0082 | 0.9677 ± 0.0024 | 0.9813 ± 0.0016 | 0.0125 ± 0.0003 | 0.0059 ± 0.0002 | 0.0045 ± 0.0002 |
| U0 | 0.7245 ± 0.0129 | 0.9644 ± 0.0011 | 0.9828 ± 0.0012 | 20.9064 ± 0.5646 | 7.5164 ± 0.0979 | 5.2261 ± 0.1893 |
| U | 0.7436 ± 0.0063 | 0.9653 ± 0.0011 | 0.9832 ± 0.0008 | 20.1722 ± 0.3222 | 7.4171 ± 0.1182 | 5.1588 ± 0.1020 |
| H | 0.7420 ± 0.0088 | 0.9637 ± 0.0021 | 0.9826 ± 0.0012 | 20.2327 ± 0.3604 | 7.5918 ± 0.2141 | 5.2573 ± 0.1938 |
| G | 0.7468 ± 0.0049 | 0.9657 ± 0.0034 | 0.9819 ± 0.0014 | 20.0472 ± 0.2422 | 7.3691 ± 0.3614 | 5.3600 ± 0.1806 |
| Cv | 0.7769 ± 0.0069 | 0.9591 ± 0.0019 | 0.9729 ± 0.0014 | 1.9192 ± 0.0239 | 0.8222 ± 0.0232 | 0.6689 ± 0.0156 |
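
For readers who want to reproduce entries of this form, the following is a minimal sketch of how R², RMSE, and a "mean ± spread" summary could be computed. It assumes that each ± value is the sample standard deviation over repeated runs (for example, different seeds or folds); the table itself does not state this. The function names `r2_score`, `rmse`, and `summarize` and the toy numbers are illustrative only and are not taken from the paper.

```python
import math
import statistics


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = statistics.fmean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def rmse(y_true, y_pred):
    """Root-mean-square error, in the same units as the target."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(statistics.fmean(squared_errors))


def summarize(values):
    """Format per-run scores as 'mean ± sample standard deviation'.

    Assumption: the spread is the sample standard deviation (n - 1
    denominator); the source table does not say which estimator was used.
    """
    mean = statistics.fmean(values)
    spread = statistics.stdev(values)
    return f"{mean:.4f} ± {spread:.4f}"


if __name__ == "__main__":
    # Toy targets and predictions, not QM9 data.
    y_true = [1.0, 2.0, 3.0, 4.0]
    y_pred = [1.1, 1.9, 3.2, 3.8]
    print("R2:", r2_score(y_true, y_pred))
    print("RMSE:", rmse(y_true, y_pred))

    # Hypothetical R2 scores from three repeated runs.
    print(summarize([0.98, 0.97, 0.99]))
```

Because RMSE carries the units of each property, the RMSE columns are only comparable within a row, whereas R² is dimensionless and can be compared across rows.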