Table 1 Literature survey.

From: Robust diabetic prediction using ensemble machine learning models with synthetic minority over-sampling technique

| Author (ref.) | Dataset | Limitation | Reported performance |
| --- | --- | --- | --- |
| 16 | Pima Indians Diabetes Dataset | Compares only a few ML algorithms | J48 decision tree: 94.44% accuracy; Random Forest: 94% accuracy |
| 18 | Diabetes dataset from a Frankfurt hospital | Limited data preprocessing | Decision tree: 99% accuracy |
| 19 | Diabetes 130 | Practical implementation is difficult | Light gradient boosting model performed better |
| 20 | NHANES, MIMIC-III, and MIMIC-IV | Dataset is specific to a certain population | Logistic regression achieved the highest accuracy, 86% |
| 21 | Dataset from the TUBITAK study | Relatively small dataset, which may limit generalizability | Best model, an RNN-LSTM, achieved an AUC of 98% |
| 22 | T2DM database | Models used are prone to overfitting | LGBM achieved the highest accuracy, 95.2% |
| 23 | NHANES | Logistic regression may not capture complex relationships | Logistic regression achieved an accuracy of approximately 78.26% |