Table 1 Comparative analysis of the literature survey.

From: Hybrid Synthetic Minority Over-sampling Technique (HSMOTE) and Ensemble Deep Dynamic Classifier Model (EDDCM) for big data analytics

| Study/method | Key issue in previous methods | Proposed solution/approach | How the issue was resolved/alleviated |
| --- | --- | --- | --- |
| Wang et al.20: SA-EFS (Sort Aggregation-based EFS) | Single FS methods struggle with stability and accuracy in high-dimensional datasets | Combined CST, Maximum Data Coefficient, and XGBoost via AM and GM aggregation (score aggregation is sketched below the table) | AM aggregation improved accuracy significantly compared with single FS, with an optimal T interval (0.1) enhancing robustness across classifiers |
| Chandralekha and Shebagavadivu21: Wrapper + Random Trees EFS | Traditional FS often selects irrelevant features, reducing classifier accuracy | Wrapper-based RT + bagging + probability weighting to refine feature selection | Removed irrelevant features, achieving better attribute selection and a mean classification accuracy of 92%, outperforming other ensemble methods |
| Elgin Christo et al.22: Correlation-based EFS + GD-BPNN | Existing FS methods fail to consider correlation and domain-specific datasets | Correlation-based EFS + neural network (GD-BPNN) with tenfold CV | Improved disease classification on the WDBC and Hepatitis datasets; adaptable for clinical DM systems, addressing feature redundancy issues |
| Rezaee et al.23: Two-step gene expression FS + DNN | Gene expression FS suffers from poor generalizability and high error rates | Wrapper-based gene ranking (kNN) + soft ensembling + stacked DNN | Found efficient gene subsets, reduced error rates, and validated generalizability on unseen MS and SRBCT datasets |
| Rashid et al.24: Random Feature Grouping (RFG) + CCFS | Existing CC-based FS ignores feature interactions, reducing accuracy | Introduced RFG variants within CCFS to dynamically group interacting features | Improved accuracy across 7 datasets with kNN, J48, RF, SVM and NB, outperforming baseline CC-based FS (CCEAFS) |
| You et al.25: PSO-based two-stage weighted ensemble | Difficulty balancing diversity vs. accuracy in ensemble classifiers | Stage 1: mixed-binary PSO for learner diversity; Stage 2: weighted ensemble optimization (a simplified weighted-voting sketch also follows the table) | Struck a balance between diversity and accuracy; outperformed state-of-the-art methods on 30 UCI datasets |
| Uddin and Halder26: Multi-Layer Dynamic System (MLDS) | Base classifiers underperform in CVD prediction due to weak FS | Multi-layer FS (CAE, GRAE, IGAE, Lasso, ETC) + ensemble (RF, NB, GB) + kNN for local refinement | Improved predictive accuracy on the Cleveland, Hungarian and Long Beach datasets; surpassed 5 baseline models |
| Wang et al.27: IDE-TSK-FC (Improved Deep-Ensemble TSK Fuzzy Classifier) | Class-imbalanced data weakens classifier learning, especially for minority classes | Layered zero-order TSK fuzzy subclassifiers with ensemble stacking | Enhanced minority-class detection; real-world and public datasets showed better performance than standard zero-order TSK classifiers |
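Several of the surveyed methods aggregate scores from multiple feature selectors before thresholding. The minimal sketch below illustrates that general idea with arithmetic-mean (AM) and geometric-mean (GM) aggregation; the individual scorers (chi-square, mutual information, gradient-boosting importances), the synthetic data, and the 0.1 threshold are illustrative assumptions and do not reproduce the exact SA-EFS pipeline of Wang et al.20.

```python
"""Generic ensemble feature selection by score aggregation (AM and GM).

Illustrative sketch only: the scorers, toy data, and threshold are assumptions,
not the cited paper's exact components.
"""
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_selection import chi2, mutual_info_classif
from sklearn.preprocessing import MinMaxScaler

# Synthetic high-dimensional toy data; features rescaled to [0, 1] so chi2 is valid.
X, y = make_classification(n_samples=300, n_features=50, n_informative=8, random_state=0)
X = MinMaxScaler().fit_transform(X)

# Feature scores from three different selection views.
chi2_scores, _ = chi2(X, y)
mi_scores = mutual_info_classif(X, y, random_state=0)
gb_scores = GradientBoostingClassifier(random_state=0).fit(X, y).feature_importances_

def normalize(s):
    """Rescale a score vector to [0, 1] so the three views are comparable."""
    s = np.asarray(s, dtype=float)
    return (s - s.min()) / (s.max() - s.min() + 1e-12)

S = np.vstack([normalize(chi2_scores), normalize(mi_scores), normalize(gb_scores)])

am = S.mean(axis=0)                           # arithmetic-mean aggregation
gm = np.exp(np.log(S + 1e-12).mean(axis=0))   # geometric-mean aggregation

T = 0.1  # retention threshold (assumed value, echoing the 0.1 interval in the row above)
selected_am = np.where(am >= T)[0]
selected_gm = np.where(gm >= T)[0]
print(f"AM keeps {selected_am.size} features, GM keeps {selected_gm.size} features")
```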
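Likewise, the weighted ensemble combination theme in the You et al.25 and Uddin and Halder26 rows can be illustrated with a simple validation-weighted soft-voting ensemble. The base learners and the accuracy-based weighting below are assumptions made for illustration; the cited works instead optimize the combination with PSO and a multi-layer pipeline, respectively.

```python
"""Generic weighted soft-voting ensemble (illustrative sketch only)."""
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Toy data split into train / validation / test sets.
X, y = make_classification(n_samples=600, n_features=20, random_state=1)
X_tr, X_tmp, y_tr, y_tmp = train_test_split(X, y, test_size=0.4, random_state=1)
X_val, X_te, y_val, y_te = train_test_split(X_tmp, y_tmp, test_size=0.5, random_state=1)

learners = [RandomForestClassifier(random_state=1),
            GaussianNB(),
            GradientBoostingClassifier(random_state=1)]

# Weight each learner by its validation accuracy: a simple stand-in for the
# weight-optimization stages described in the table.
weights = []
for clf in learners:
    clf.fit(X_tr, y_tr)
    weights.append(accuracy_score(y_val, clf.predict(X_val)))
weights = np.array(weights) / np.sum(weights)

# Weighted average of class-probability outputs, then argmax for the label.
proba = sum(w * clf.predict_proba(X_te) for w, clf in zip(weights, learners))
y_pred = np.argmax(proba, axis=1)
print(f"Weighted ensemble test accuracy: {accuracy_score(y_te, y_pred):.3f}")
```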