Table 1 Literature review.
From: Using machine learning to predict student outcomes for early intervention and formative assessment
| Theme | Authors | Algorithms used | Aim | Key variables | Dataset size |
|---|---|---|---|---|---|
| Prediction of optimal learning materials / methods | | k-NN | Determine appropriate study material level | Pretest scores measuring computer thinking skills | N/A |
| | | Decision Tree | Recommend optimal study methods | Demographic, academic, and lecturer characteristics | 248 |
| | | Naïve Bayes, J48, SMO, JRip | Predict variation in learning motivation by socio-demographic and study method | LMS achievement data, demographics | 18,988 |
| Academic performance prediction | | k-NN, ANN, Genetic Algorithms | Predict distance learning success | Grades | 227 |
| | | DT, SVM, ANN, RF, Naïve Bayes | Predict grades in Portuguese and mathematics | Demographic, social, school-related features | 227 |
| | | Naïve Bayes, OneR | Binary classification of success in high school | Demographic, social, and grade data | 1,969 |
| | | Decision Tree | Identify students likely to fail | Demographics, past performance | 346 |
| | | ID3 | Predict future grades and target weaknesses | Grades, participation measures | 1,547 |
| | | SVM, C4.5, CART, Bayes Network, Naïve Bayes | Predict pass/fail status | Academic and demographic data | 776 |
| Academic performance prediction | | PCR Model | Predict performance based on behavior and assessments | Viewing behavior, quizzes, assignments | 220 |
| | | k-NN, SVM, ANN, DT, NB, LR | Compare ML methods for exam performance | Demographic, academic, engagement data | 3,166 |
| | | ANN | Predict final performance | Content usage, attendance, homework | 3,518 |
| | | Linear Regression, SVR | Predict academic performance | Personal, educational, extracurricular | 85 |
| | | k-NN, DT, RF, LR, SVM, NB, ANN | Three-type classification of students | Performance, demographics, school | 649 |
| | | GA, DT, k-NN | Predict marks/grades | Academic history | 90,000 |
| | | DT, SVM, ANN, RF, GBM, XGBoost, Bagging, NB | Predict college success | Academic and socio-economic data | 6,690 |
| | | DT, SVM, RF, GB, XGBoost, CatBoost, LGBM | Predict academic performance | Demographic, socio-economic, academic path | 4,424 |
| Dropout prediction / Early warning systems | | Naïve Bayes, k-NN | Classify dropout vs. non-dropout | Demographics, grades | 498 |
| | | RF | Identify at-risk students | Attendance, punctuality, activity data | 165,715 |
| | | N/A | Identify dropout reasons | Individual, institutional, economic factors | OECD dataset |
| | | RF, J48, LR, Bagging | Identify at-risk students | Demographics | 64,754 |
| | | Logistic Model, SVM, RF | Early warning system | Demographics, academic performance | 758 |
| | | RF, NN, SVM, LR, NB, k-NN | Early risk prediction | Midterm grades, department/faculty data | 1,854 |
| | | LR, RF, MLP, LR | Predict future performance and key areas | Academic and demographic data | 3,687 |
| Classification and success factors | | ID3 | Focus on failure areas to reduce negative outcomes | Previous marks, tests, seminar grades | 50 |
| | | OneR, DT, NN, k-NN | Student classification | Demographics, prior scores, admission info | 10,330 |
| | | Naïve Bayes, J48, REPTree, SMO, MLP | Identify slow learners | Demographics, grades | 152 |
| | | ID3, C4.5, CART, CHAID | Predict factors affecting performance | Demographics, GPA | 270 |
| | | NB, RF, CART, Bayes Net | Predict graduation | Personal, family, academic, institutional | 412 |
| | | Elastic Net, RF | Identify most effective performance variables | Demographics, socio-economic variables, GPA | 50,095 |