Table 1 Summary of the state-of-the-art literature.

From: Efficient diagnosis of diabetes mellitus using an improved ensemble method

| Name and author | Dataset | Methodology | Results and accuracy | Limitations | Future scope |
| --- | --- | --- | --- | --- | --- |
| Yadav and Pal [31] | UCI Repository | J48, Decision Stump, REP, RF, Gradient Boosting, AdaBoost M1, XGBoost | RF (parallel) = 100%, XGBoost (sequential) = 98.05% | Limited to ensemble methods | Explore hybrid models combining sequential and parallel approaches |
| Kumari, Kumar, and Mittal [39] | PIMA diabetes dataset (UCI) | RF, Logistic Regression, Naive Bayes | 79.08% (PIMA), 97.02% (breast cancer) | Focused on soft-voting ensembles | Extend to other medical datasets and more classifiers |
| Tewari and Dwivedi [32] | UCI dataset | JRip, OneR, Decision Table, Boosting, Bagging | Bagging = 98% | Limited feature selection methods | Investigate more advanced feature selection techniques |
| Ghosh et al. (2021) | PIMA Indians diabetes dataset | Gradient Boosting, SVM, AdaBoost, RF with and without mRMR feature selection | RF = 99.35% with mRMR | High complexity of the mRMR feature selection process | Simplify feature selection and test on other datasets |
| Atif, Anwer, and Talib [44] | PIMA Indians dataset, Early-Stage Diabetes | Hard-voting classifier (Logistic Regression, Decision Tree, SVM) | 81.17% (PIMA), 94.23% (Early-Stage Diabetes) | Voting scheme limited to hard voting | Explore soft or weighted voting for improved results |
| Rashid, Yaseen, Saeed, and Alasaady [45] | PIMA Indians Diabetes Dataset (PIDD) | Decision Tree, Logistic Regression, KNN, RF, XGBoost | 81% after standardization and imputation | Focused only on ensemble voting techniques | Test other ensemble strategies such as bagging or boosting |
| Zhou, Xin, and Li [46] | PIMA Indian diabetes dataset | Boruta feature selection, K-Means++ clustering, stacking ensemble learning | 98% | High computational cost of clustering | Reduce computation and test scalability |
| Kawarkhe and Kaur [47] | PIMA Indians | CatBoost, LDA, LR, RF, GBC with preprocessing techniques | 90.62% | Limited to specific preprocessing techniques | Broaden the preprocessing techniques and methods |
| Reza, Amin, Yasmin, Kulsum, and Ruhi [48] | PIMA Indian diabetes dataset, local healthcare data | Stacking ensemble with classical and deep neural networks | 77.10% (PIMA), 95.50% (simulation) | Limited to stacking approaches | Explore other ensemble methods or hybrid approaches |
| Thongkam et al. [17] | Breast cancer dataset | AdaBoost | Improved prediction and diagnosis | Initially applied only to breast cancer | Extend to other medical conditions |
| Velu and Kashwan [18] | Various datasets | SVM, Radial Basis Function, Multi-Layer Perceptron, Multi-Level Counter Propagation Network | High accuracy in various applications | Complexity of model selection | Test different combinations and optimizations |
| Temurtas et al. [19] | PIMA diabetes dataset | Multilayer neural network | Improved accuracy | Focused on the PIMA dataset | Apply to other chronic disease datasets |
| Ayo et al. [20] | Heart disease dataset | Levenberg–Marquardt approach, Probabilistic Neural Network, Naive Bayes, SVM | High accuracy in diagnosing cardiac disease | Limited to cardiac disease prediction | Broaden to include other comorbidities |
| Farvaresh and Sepehri [21] | Various medical datasets | C4.5 decision tree, bagging with C4.5, Naive Bayes | Improved prediction of cardiac illness | Initial focus on cardiac illness | Expand to other diseases and datasets |
| Kalman filter theory [22] | PIMA Indian dataset | Adaptive and personalized insulin recommendation | Enhanced classification accuracy | Focused on insulin recommendation systems | Broaden to other therapeutic recommendations |
| Ajagbe et al. [23] | Various applications | Multimedia analytic techniques, metadata annotation, MPEG-7 | Improved semantic analysis | Limited to MPEG-7 | Explore alternative multimedia retrieval frameworks |
| Gong and Kim [24] | Imbalanced datasets | RHSBoost for balanced classification | High accuracy and prediction | Designed for imbalanced datasets | Apply to other datasets and test alternative balancing methods |
| Purnami et al. [25] | Diabetes detection | ANFIS and PCA | Enhanced detection | Initial partitioning approach | Broaden to include other feature extraction methods |
| Rani and Jyothi [26] | Diabetes dataset | Bayesian classification, J48, KNN, Filtered Classifier, ANN, Naive Bayes | 77.01% accuracy | Lack of cross-validation | Implement cross-validation and expand dataset usage |
| Zheng et al. [27] | Various datasets | KNN, Naive Bayes, Decision Tree, RF, SVM, Logistic Regression | Improved recall and accuracy | Filtering criteria could be improved | Enhance feature selection and parameter tuning |
| Komi et al. [28] | Sample datasets | ELM, ANN, LR, GMM, SVM | Better accuracy with fewer samples | Small amount of sample data | Increase sample size and test on more complex datasets |
| Sai et al. [29] | Diabetes dataset | Weighted-voting approach for ensemble prediction models | Enhanced predictive performance | Focused on a single ensemble prediction model | Explore ensemble expansion and optimization |
| Rustam et al. [56] | Multiple datasets | Ensemble of CNN and LSTM for feature extraction, Random Forest for prediction | Accuracy of 0.99 using CNN-LSTM features with Random Forest | Limited dataset size; generalizability issues in existing approaches | Explore other ensemble models; improve dataset diversity and real-world applicability |
| Faustin and Zou [57] | PIMA Indian Diabetes Dataset | Genetic algorithm (GA) enhanced with a two-step crossover operator for feature selection | Accuracy 97.5%, precision 98%, recall 97%, F1-score 97% | Premature convergence due to insufficient population diversity in the GA | Apply the improved GA to other datasets; refine the crossover technique |
| Reza et al. [58] | PIMA Indian diabetes dataset, local healthcare data | Stacking ensemble combining classical and deep neural network models for diabetes classification | Stacking ensemble with NN architectures: accuracy 95.50%, precision 94%, recall 97%, F1-score 96% | Limited to the datasets used in the study; more diverse data needed for generalization | Explore other datasets and apply to real-time healthcare systems |
| Saihood and Sonuç [59] | PIMA Indians Diabetes Database | Ensemble machine learning models (bagging, boosting, stacking) with hyperparameter tuning and data preprocessing | Stacking (RF & SVM): 97.50%; bagging (RF): 97.20%; boosting (XGB): 97.10% | Framework limited to the PIMA Indians dataset; real-world data may vary | Extend the framework to more diverse datasets and explore real-time healthcare applications |
| Daza et al. [60] | Diabetes dataset (768 patient records) | Stacking ensemble of 7 base algorithms, with oversampling to balance the dataset and cross-validation for model training | Best accuracy 91.5%, sensitivity 91.6%, F1-score 91.49%, precision 91.5%, ROC curve 97% | Performance depends on the dataset and oversampling method | Improve generalizability by testing on other datasets and enhancing the model |
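Voting and stacking ensembles recur throughout the surveyed work (hard voting in Atif et al., soft voting in Kumari et al., stacking in Zhou et al., Reza et al., Saihood and Sonuç, and Daza et al.). The sketch below illustrates these three schemes with scikit-learn; the synthetic 8-feature, 768-sample dataset is only a stand-in for PIMA-style data, and the base learners and hyperparameters are illustrative assumptions, not the configurations used in any of the cited studies.

```python
# Illustrative sketch of hard voting, soft voting, and stacking ensembles,
# assuming scikit-learn and a synthetic stand-in for PIMA-style data
# (8 features, 768 samples, binary outcome).
from sklearn.datasets import make_classification
from sklearn.ensemble import (RandomForestClassifier, StackingClassifier,
                              VotingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=768, n_features=8, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25,
                                          random_state=42)

base = [
    ("lr", LogisticRegression(max_iter=1000)),
    ("rf", RandomForestClassifier(n_estimators=100, random_state=42)),
    ("nb", GaussianNB()),
]

# Hard voting: each base learner casts one label vote; majority wins.
hard = VotingClassifier(estimators=base, voting="hard").fit(X_tr, y_tr)

# Soft voting: averages predicted class probabilities before deciding.
soft = VotingClassifier(estimators=base, voting="soft").fit(X_tr, y_tr)

# Stacking: a meta-learner is trained on the base learners' predictions.
stack = StackingClassifier(
    estimators=base,
    final_estimator=LogisticRegression(max_iter=1000),
).fit(X_tr, y_tr)

print(f"hard voting accuracy:  {hard.score(X_te, y_te):.3f}")
print(f"soft voting accuracy:  {soft.score(X_te, y_te):.3f}")
print(f"stacking accuracy:     {stack.score(X_te, y_te):.3f}")
```

Soft voting and stacking both require base learners that expose `predict_proba`, which is why probabilistic models (logistic regression, naive Bayes) are common choices in the soft-voting studies above.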