Table 1. Literature survey.
| Author | Dataset | Limitation | Reported performance |
|---|---|---|---|
| | Pima Indians Diabetes Dataset | Compares only a few ML algorithms | J48 decision tree: 94.44% accuracy; Random Forest: 94% accuracy |
| | Diabetes dataset from a Frankfurt hospital | Limited data preprocessing | Decision tree: 99% accuracy |
| | Diabetes 130-US Hospitals dataset | Practical implementation is difficult | Light gradient boosting machine (LightGBM) was the best-performing model |
| | National Health and Nutrition Examination Survey (NHANES), MIMIC-III and MIMIC-IV | Data are specific to a particular population | Logistic regression achieved the highest accuracy, 86% |
| | Dataset from the TUBITAK study | Relatively small dataset, which may limit generalizability | The best model, an RNN-LSTM, achieved an AUC of 98% |
| | T2DM database | Models used are prone to overfitting | LightGBM (LGBM) achieved the highest accuracy, 95.2% |
| | NHANES | Logistic regression may not capture complex, nonlinear relationships | Logistic regression achieved an accuracy of approximately 78.26% |
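The performance column mixes metrics: most studies report classification accuracy, while the TUBITAK study reports AUC, so the figures are not directly comparable. The following is a minimal sketch, using only the Python standard library and hypothetical toy labels and scores (not data from any surveyed study), that computes both metrics on the same predictions. Accuracy depends on a chosen decision threshold (assumed here to be 0.5), whereas ROC AUC measures how well the scores rank positives above negatives across all thresholds.

```python
from itertools import product


def accuracy(y_true, y_score, threshold=0.5):
    """Fraction of samples whose thresholded score matches the true label."""
    preds = [1 if s >= threshold else 0 for s in y_score]
    return sum(p == t for p, t in zip(preds, y_true)) / len(y_true)


def roc_auc(y_true, y_score):
    """ROC AUC via pairwise ranking (Mann-Whitney formulation): the probability
    that a randomly chosen positive is scored above a randomly chosen negative.
    Tied scores count as half a win."""
    pos = [s for s, t in zip(y_score, y_true) if t == 1]
    neg = [s for s, t in zip(y_score, y_true) if t == 0]
    wins = 0.0
    for p, n in product(pos, neg):
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = diabetic, 0 = non-diabetic; scores are model outputs.
y_true = [0, 0, 0, 0, 1, 1]
y_score = [0.1, 0.2, 0.3, 0.6, 0.7, 0.8]

print(f"accuracy = {accuracy(y_true, y_score):.3f}")  # 5 of 6 correct at threshold 0.5
print(f"ROC AUC  = {roc_auc(y_true, y_score):.3f}")   # every positive outranks every negative
```

Here the scores rank the classes perfectly (AUC = 1.0), yet one negative scored 0.6 is misclassified at the 0.5 threshold, giving an accuracy of about 0.833. A study's reported accuracy and another study's reported AUC can therefore diverge even for the same model.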