Table 6 Comparison of model performance on the test set with and without data augmentation.

From: Data-augmented machine learning for personalized carbohydrate-protein supplement recommendation for endurance

CV columns report cross-validation performance on the development set; Test columns report final performance on the hold-out test set.

| Model | Condition | Key optimized hyperparameters | CV train MAE | CV train R² | CV valid MAE | CV valid RMSE | CV valid R² | Test MAE | Test RMSE | Test R² |
|---|---|---|---|---|---|---|---|---|---|---|
| XGBoost | Baseline | n_estimators = 500, max_depth = 5, learning_rate = 0.05 | 68.66 | 0.98 | 587.65 | 742.96 | 0.42 | 642.40 | 751.26 | 0.48 |
| XGBoost | Augmented | n_estimators = 400, max_depth = 3, learning_rate = 0.05 | 276.98 | 0.84 | 614.90 | 759.11 | 0.39 | **632.05** | **715.97** | **0.53** |
| SVR | Baseline | kernel='poly', C = 500, degree = 3 | 356.48 | 0.65 | 644.75 | 830.19 | 0.24 | 620.12 | 773.31 | 0.45 |
| SVR | Augmented | kernel='rbf', C = 500, gamma='auto' | 284.18 | 0.73 | 660.34 | 840.97 | 0.26 | 644.96 | 765.48 | 0.46 |
| MLP | Baseline | hidden_dims=[32, 16], learning_rate = 0.001, batch_size = 16 | 434.94 | 0.71 | 638.54 | 785.69 | 0.33 | **556.96** | **686.76** | **0.57** |
| MLP | Augmented | hidden_dims=[128, 64], learning_rate = 0.005, batch_size = 16 | 256.23 | 0.86 | 706.37 | 868.69 | 0.20 | 641.48 | 771.49 | 0.46 |

Note: Bold values highlight the metrics of the best-performing model on the hold-out test set for each data condition (Baseline and Augmented).
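
The selection described in the note can be checked directly against the reported test metrics. The following is a minimal sketch, not the authors' code: the dictionary simply copies the Test MAE, Test RMSE, and Test R² values from Table 6, and the helper name `best_by_condition` and the choice of lowest test MAE as the ranking criterion are assumptions made for illustration. In this table the choice of metric does not matter, since the model with the lowest MAE in each condition also has the lowest RMSE and the highest R².

```python
# Test-set metrics copied from Table 6: (model, condition) -> (MAE, RMSE, R2)
test_metrics = {
    ("XGBoost", "Baseline"): (642.40, 751.26, 0.48),
    ("XGBoost", "Augmented"): (632.05, 715.97, 0.53),
    ("SVR", "Baseline"): (620.12, 773.31, 0.45),
    ("SVR", "Augmented"): (644.96, 765.48, 0.46),
    ("MLP", "Baseline"): (556.96, 686.76, 0.57),
    ("MLP", "Augmented"): (641.48, 771.49, 0.46),
}


def best_by_condition(metrics):
    """Return {condition: model} with the lowest test MAE per condition.

    Ranking by MAE is an assumption for illustration; the source table
    does not state which metric defines "best-performing".
    """
    best = {}
    for (model, condition), (mae, _rmse, _r2) in metrics.items():
        current = best.get(condition)
        if current is None or mae < metrics[(current, condition)][0]:
            best[condition] = model
    return best


best = best_by_condition(test_metrics)
print(best)  # {'Baseline': 'MLP', 'Augmented': 'XGBoost'}
```

Under this criterion the MLP is the best baseline model and XGBoost is the best augmented model, which matches the rows whose test metrics are highlighted.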