Table 3 Model performance on the CoREMOF dataset. Root-mean-square deviation (RMSD) and coefficient of determination (R\(^2\) score) for predicting the Henry's coefficients (log K\(_{H}\)) of CO\(_{2}\) and CH\(_{4}\) and the gas uptakes of CH\(_{4}\) on the CoREMOF dataset. Results are shown for different feature sets (S = baseline structural, T = topological, T + WE = topological and word embeddings).

From: Machine learning with persistent homology and chemical word embeddings improves prediction accuracy and interpretability in metal-organic frameworks

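| Target | RMSD (S) | RMSD (T) | RMSD (T + WE) | RMSD \(\Delta\) | R\(^2\) (S) | R\(^2\) (T) | R\(^2\) (T + WE) | R\(^2\) \(\Delta\) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| log(K\(_{H}\)) CO\(_2\) | 0.90 | 0.73 | 0.60 | 33.3% | 0.26 | 0.53 | 0.69 | 165% |
| log(K\(_{H}\)) CH\(_4\) | 0.34 | 0.30 | 0.24 | 29.4% | 0.55 | 0.65 | 0.78 | 41.2% |
| 5.8 bar CH\(_4\) | 27.15 | 22.00 | 20.19 | 25.7% | 0.47 | 0.65 | 0.71 | 51.1% |
| 65 bar CH\(_4\) | 32.06 | 25.57 | 24.57 | 23.1% | 0.76 | 0.85 | 0.87 | 14.5% |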

  1. The units are mol kg\(^{-1}\) Pa\(^{-1}\) for the Henry's coefficients and V\(_{STP}\)/V for the CH\(_4\) uptakes. The best model is in bold. Because the model using topology + word embeddings (T + WE) always outperforms the one using structural features (S), the percentage improvement of T + WE over S (a decrease in the case of RMSD and an increase in the case of R\(^2\) score) is also shown (\(\Delta \)).
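
The \(\Delta\) columns can be checked with a short calculation. The sketch below is not taken from the article: it assumes \(\Delta\) is the relative change of T + WE with respect to S, and the function names `rmsd_improvement` and `r2_improvement` are hypothetical. It uses the rounded S and T + WE values printed in the table, so the recomputed percentages may differ from the published ones by a few tenths of a percentage point, presumably because the published figures were derived from unrounded scores.

```python
def rmsd_improvement(baseline, model):
    """Percentage decrease in RMSD relative to the baseline (lower is better)."""
    return 100.0 * (baseline - model) / baseline


def r2_improvement(baseline, model):
    """Percentage increase in R^2 relative to the baseline (higher is better)."""
    return 100.0 * (model - baseline) / baseline


# (S, T + WE) pairs copied from the table; these are rounded values.
rows = {
    "log(K_H) CO2": {"rmsd": (0.90, 0.60), "r2": (0.26, 0.69)},
    "log(K_H) CH4": {"rmsd": (0.34, 0.24), "r2": (0.55, 0.78)},
    "5.8 bar CH4": {"rmsd": (27.15, 20.19), "r2": (0.47, 0.71)},
    "65 bar CH4": {"rmsd": (32.06, 24.57), "r2": (0.76, 0.87)},
}

for target, scores in rows.items():
    d_rmsd = rmsd_improvement(*scores["rmsd"])
    d_r2 = r2_improvement(*scores["r2"])
    print(f"{target}: RMSD improvement {d_rmsd:.1f}%, R2 improvement {d_r2:.1f}%")
```

Both helpers divide by the baseline (S) score, so a positive result always means that T + WE is the better model, whichever direction the underlying metric improves in.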