Table 1 Literature review.

From: Using machine learning to predict student outcomes for early intervention and formative assessment

| Theme | Authors | Algorithms used | Aim | Key variables | Dataset size |
| --- | --- | --- | --- | --- | --- |
| Prediction of optimal learning materials / methods | 37 | k-NN | Determine appropriate study material level | Pretest scores measuring computer thinking skills | N/A |
| | 38 | Decision Tree | Recommend optimal study methods | Demographic, academic, and lecturer characteristics | 248 |
| | 39 | Naïve Bayes, J48, SMO, JRip | Predict variation in learning motivation by socio-demographic and study method | LMS achievement data, demographics | 18,988 |
| Academic performance prediction | 40 | k-NN, ANN, Genetic Algorithms | Predict distance learning success | Grades | 227 |
| | 41 | DT, SVM, ANN, RF, Naïve Bayes | Predict grades in Portuguese and mathematics | Demographic, social, school-related features | 227 |
| | 42 | Naïve Bayes, OneR | Binary classification of success in high school | Demographic, social, and grade data | 1,969 |
| | 43 | Decision Tree | Identify students likely to fail | Demographics, past performance | 346 |
| | 44 | ID3 | Predict future grades and target weaknesses | Grades, participation measures | 1,547 |
| | 45 | SVM, C4.5, CART, Bayes Network, Naïve Bayes | Predict pass/fail status | Academic and demographic data | 776 |
| | 46 | PCR Model | Predict performance based on behavior and assessments | Viewing behavior, quizzes, assignments | 220 |
| | 47 | k-NN, SVM, ANN, DT, NB, LR | Compare ML methods for exam performance | Demographic, academic, engagement data | 3,166 |
| | 48 | ANN | Predict final performance | Content usage, attendance, homework | 3,518 |
| | 49 | Linear Regression, SVR | Predict academic performance | Personal, educational, extracurricular | 85 |
| | 50 | KNN, DT, RF, LR, SVM, NB, ANN | Three-type classification of students | Performance, demographics, school | 649 |
| | 51 | GA, DT, KNN | Predict marks/grades | Academic history | 90,000 |
| | 52 | DT, SVM, ANN, RF, GBM, XGBoost, Bagging, NB | Predict college success | Academic and socio-economic data | 6,690 |
| | 53 | DT, SVM, RF, GB, XGBoost, CatBoost, LGBM | Predict academic performance | Demographic, socio-economic, academic path | 4,424 |
| Dropout prediction / Early warning systems | 54 | Naïve Bayes, k-NN | Classify dropout vs. non-dropout | Demographics, grades | 498 |
| | 55 | RF | Identify at-risk students | Attendance, punctuality, activity data | 165,715 |
| | 56 | N/A | Identify dropout reasons | Individual, institutional, economic factors | OECD dataset |
| | 57 | RF, J48, LR, Bagging | Identify at-risk students | Demographics | 64,754 |
| | 58 | Logistic Model, SVM, RF | Early warning system | Demographics, academic performance | 758 |
| | 59 | RF, NN, SVM, LR, NB, k-NN | Early risk prediction | Midterm grades, department/faculty data | 1,854 |
| | 60 | LR, RF, MLP, LR | Predict future performance and key areas | Academic and demographic data | 3,687 |
| Classification and success factors | 61 | ID3 | Focus on failure areas to reduce negative outcomes | Previous marks, tests, seminar grades | 50 |
| | 62 | OneR, DT, NN, k-NN | Student classification | Demographics, prior scores, admission info | 10,330 |
| | 63 | Naïve Bayes, J48, REPTree, SMO, MLP | Identify slow learners | Demographics, grades | 152 |
| | 64 | ID3, C4.5, CART, CHAID | Predict factors affecting performance | Demographics, GPA | 270 |
| | 65 | NB, RF, CART, Bayes Net | Predict graduation | Personal, family, academic, institutional | 412 |
| | 66 | Elastic Net, RF | Identify most effective performance variables | Demographics, socio-economic variables, GPA | 50,095 |