Table 9 Details of the machine learning (ML) approaches used to model and predict different MXene outputsᵃ

From: First-principles and machine-learning approaches for interpreting and predicting the properties of MXenes

| Year, authors | Output | Data size | Feature selection | ML problem | Algorithm | Validation / test |
|---|---|---|---|---|---|---|
| Rajan et al.⁵⁴ | Metal–semiconductor identification | 643 | LASSO | Classification | Bagging | 5-fold CV, 10 % test |
| Rajan et al.⁵⁴ | Band-gap estimation | 70 | LASSO | Regression | Gaussian process | 5-fold CV, 10 % test |
| Frey et al.²⁰⁹ | Synthesizability | 792 | PCA | Positive and unlabeled learning | Robust ensemble SVM | 10-fold CV |
| Mishra et al.²¹² | Valence band | 76 | LASSO, NCA | Regression | Gaussian process | 10 % test |
| Marchwiany et al.²¹⁹ | Cytotoxicity | 71 | Random forest feature importance | Classification | Random forest | 10-fold CV |
| Venturi et al.²¹⁴ | Mechanical strength, band gap and formation energy | >3500 | --- | Regression | CGCNNs | 15 % validation; 15 % test |
| Wang et al.²²² | HER | 420 | RFE; feature importance; Pearson correlation coefficient | Regression | AdaBoost | 10-fold CV |
| Zheng et al.²²¹ | HER | 299 | Pearson correlation coefficient | Regression | Random forest | 10-fold CV, 25 % test |
| He et al.²¹⁰ | Stability | 85 | Pearson correlation coefficient; symbolic regression | Classification | SVM | 20 % test |
| Li et al.²²⁶ | Energy storage (gravimetric capacity, voltage, and induced charge) | 360 | RFE | Regression and classification | Multi-output random forest | 5-fold CV; 20 % test |
| Song et al.²¹⁸ | Saturation magnetization | 23825 | --- | Classification | AdaBoost | 10-fold CV; 10 % test |
| Tian et al.²¹⁶ | Tensile stiffness | 157 | --- | Regression | SISSO | --- |
| Abraham et al.²²⁹ | CO₂ activation | 114 | Feature importance | Regression | Random forest | 5-fold CV; 20 % test |
| Boonpalit et al.²³⁰ | CO sensing | 450 | --- | Regression | CGCNNs | 20 % validation |
| Chen et al.²²⁴ | ORR and OER | 78 | Feature importance | Regression | Random forest | 4-fold CV; 25 % test |
| Cheng et al.²²⁷ | Hydrogen adsorption distance | 12647 | --- | Classification and regression | ALIGNNs | 5-fold CV |
| Ding et al.²³² | Solar spectral absorption | 500 | --- | Regression | Random forest | 10 % test |
| Jiao et al.²³¹ | C–N coupling | 54 | LASSO | Regression | SISSO | 10-fold CV |
| Ma et al.²²⁵ | ORR and OER | 42 | Expert knowledge criteria | Regression | Random forest and gradient boosting | 5-fold CV; 10 % test |
| Liang et al.²²³ | HER (Gibbs free energy of hydrogen adsorption) | 264 | Pearson correlation coefficient; feature importance | Regression | Random forest | 30 % test |
| Liang et al.²²³ | HER (cohesive energy) | 264 | Pearson correlation coefficient; feature importance | Regression | Random forest | 30 % test |
| Roy et al.²¹³ | Work function | 275 and 315 | Genetic algorithm | Regression | Artificial neural networks | 5-fold CV; 20 % test |

ᵃ Definition of the abbreviations: ALIGNNs – physics-informed atomistic line graph neural networks; CGCNNs – crystal graph convolutional neural networks; CV – cross-validation; HER – hydrogen evolution reaction; LASSO – least absolute shrinkage and selection operator; NCA – neighborhood component analysis; OER – oxygen evolution reaction; ORR – oxygen reduction reaction; PCA – principal component analysis; RFE – recursive feature elimination; SISSO – sure independence screening and sparsifying operator; SVM – support vector machine.
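
Entries in the validation column such as "5-fold CV; 20 % test" combine a held-out test set with k-fold cross-validation on the remaining samples. The sketch below illustrates that protocol in general terms, using only the Python standard library. It is not the procedure of any cited study: the table does not report how each work shuffled or stratified its data, so the function name `holdout_then_kfold`, its parameters, and the random seed are all assumptions made for illustration. The sample count of 360 is borrowed from the Li et al. row only to make the fold sizes concrete.

```python
import random


def holdout_then_kfold(n_samples, test_fraction=0.2, k=5, seed=0):
    """Split sample indices into a held-out test set and k CV folds.

    Returns (test_idx, folds), where folds is a list of
    (train_idx, val_idx) pairs drawn only from the non-test samples.
    Hypothetical helper for illustration; not taken from any cited work.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # seeded shuffle for reproducibility

    # Reserve the test set first so it never takes part in cross-validation.
    n_test = round(n_samples * test_fraction)
    test_idx = indices[:n_test]
    dev_idx = indices[n_test:]

    # Deal the remaining samples into k nearly equal chunks.
    chunks = [dev_idx[i::k] for i in range(k)]

    folds = []
    for i in range(k):
        val_idx = chunks[i]
        train_idx = [j for m, chunk in enumerate(chunks) if m != i for j in chunk]
        folds.append((train_idx, val_idx))
    return test_idx, folds


# Example: 360 samples with a 20 % test set and 5-fold CV on the rest.
test_idx, folds = holdout_then_kfold(360, test_fraction=0.2, k=5)
print(len(test_idx), [len(val) for _, val in folds])
```

With these settings, 72 samples are held out and the other 288 are rotated through five validation folds of 57 or 58 samples each. Model selection would use only the folds, and the test set would be scored once at the end.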