Table 1 Data distribution of train and test sets with 80–20 split.

From: Profiling low-proficiency science students in the Philippines using machine learning

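| Data split    | Poor performance (Level ≤ 1b) | Better performance (Level ≥ 1a) | Total |
|---------------|-------------------------------|---------------------------------|-------|
| Training data | 2419                          | 3297                            | 5716  |
| Test data     | 628                           | 801                             | 1429  |
| Total         | 3047                          | 4098                            | 7145  |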

Note: The training data are imbalanced, with fewer poor-performing students (2419) than better-performing students (3297).
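
The split proportions and the class imbalance can be checked directly from the counts in Table 1. The following is a minimal sketch, not code from the article: the dictionary layout and the `summarize` helper are illustrative assumptions, and only the four counts come from the table.

```python
# Counts copied from Table 1. The names "poor" and "better" and the
# summarize() helper are hypothetical, chosen only for this illustration.
counts = {
    "train": {"poor": 2419, "better": 3297},
    "test": {"poor": 628, "better": 801},
}


def summarize(counts):
    """Return per-split totals, share of all samples, and poor-class fraction."""
    grand_total = sum(sum(split.values()) for split in counts.values())
    summary = {}
    for name, split in counts.items():
        total = sum(split.values())
        summary[name] = {
            "total": total,
            "share_of_data": total / grand_total,
            "poor_fraction": split["poor"] / total,
        }
    return summary


summary = summarize(counts)
for name, row in summary.items():
    print(
        f"{name}: n={row['total']}, "
        f"share={row['share_of_data']:.3f}, "
        f"poor={row['poor_fraction']:.3f}"
    )
```

Under these counts, the training set holds 80% of the 7145 samples, and poor-performing students make up roughly 42% of the training data and 44% of the test data, so the class ratio is similar across the two splits.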