Table 2 Comprehensive review of recent crash severity prediction methods.
| References | Year | Applied dataset | Dataset size | No. records | Applied algorithms | Results |
|---|---|---|---|---|---|---|
| | 2024 | Highway 401 (Canada), U.S. highways | Large | ~1 M records | Deep Neural Networks, Adaptive Feature Selection | Deep Neural Networks: ~90% accuracy |
| | 2022 | Traffic accident records | Moderate | ~500 K records | Boruta algorithm, RF, XGBoost, Naïve Bayes | XGBoost: 82.10%; Naïve Bayes: 79.52% |
| | 2022 | Various disease risk datasets | Moderate | ~100 K–500 K records | Feature selection review | No accuracy reported (review of feature selection methods) |
| | 2024 | UAH-DriveSet dataset | Large | ~1.2 M records | Wrapper FS, RF | RF: 96.4%; KNN: 96.29% |
| | 2023 | Traffic accident data | Large | 750 K accidents | DT, RF, LR | RF: 85%; DT: 80% |
| | 2025 | Traffic data from Turkey | Small | 13,234 accident records | KNN, RF, XGBoost, DNN, DNN + RF | Accuracy: 92% |
| | 2024 | Real-time road data | Moderate | ~600 K real-time traffic records | SMOTE, AdaBoost, XGBoost, RF | SMOTE + XGBoost: 88%; AdaBoost: 85% |
| | 2020 | Real-time vehicle and environmental data | Large | ~850 K real-time traffic records | Bayesian learners, KNN, SVM, MLP | Boosting model: 93.66% F1-score |
| | 2022 | NGSIM traffic data | Large | ~900 K lane-change instances | XGBoost, Recursive Feature Elimination | XGBoost with feature engineering: 97.6% |
| | 2023 | Real-time vehicle telemetry | Large | ~1 M vehicle telemetry records | LGBM, ENN-SMOTE-Tomek Link | LGBM with feature selection: 91.5% |
| | 2023 | GIDAS (German In-Depth Accident Study) | Small | 11,074 collision scenarios | K-Means++, k-NN | 35 clusters of car-to-car collision configurations |
| | 2021 | UCI Automobile Dataset | Moderate | ~150 K automobile price records | LASSO regression, stepwise selection | LASSO regression (testing): 87% accuracy |
| | 2022 | Azure ML Studio dataset | Moderate | ~300 K samples | Spearman correlation, Fisher Score, Pearson correlation | Fisher Score: best among tested methods |
| | 2019 | Vehicle trajectory data | Small | 822 candidate features | Feature selection and prediction models | RF with SMOTETomek: 80.3% |
| | 2022 | Accident severity datasets | Moderate | ~650 K crash severity records | Gradient Boosting, feature engineering | Gradient Boosting: 89%; LR: 85% |
| | 2023 | New Zealand road accident data | Moderate | 67,971 records | RF, AdaBoost, XGBoost, LGBM, CatBoost | RF with SMOTE: 81.45% |
| | 2023 | Traffic accident data from Qassim Province, Saudi Arabia | Small | 3,506 accidents | LR, RF, XGBoost | XGBoost: AUC 87%; RF: AUC 87%; LR: AUC 62% |
| | 2022 | Traffic crash data from Al-Ahsa, Saudi Arabia | Small | 9,031 records | Binary logistic regression, regression tree (CART) model | LR: 73%; CART model: 74% |
| | 2022 | Road traffic crash data from Highway 15, Saudi Arabia | Small | 3,439 records | RF, KNN | RF: 78.7%; KNN: 75% |
| | 2025 | Crash Report Sampling System | Moderate | 45,373 records | Transformer architecture | Test accuracy: 93% |
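Several of the surveyed studies pair a tree ensemble with SMOTE or one of its variants (SMOTETomek, ENN-SMOTE-Tomek Link), since severe and fatal crashes typically make up a small minority of records. The following is a minimal, dependency-free Python sketch of the core SMOTE step only, under stated assumptions: features are already numeric and scaled, neighbours are found by brute-force Euclidean distance, and the helper names `smote` and `balance_class` and the toy data are hypothetical, not taken from any cited study or library. It omits the Tomek-link and ENN cleaning steps used by the variants listed in the table.

```python
import math
import random
from collections import Counter
from typing import List, Sequence, Tuple

Point = List[float]  # assumption: features are already numeric and scaled


def _k_nearest(points: Sequence[Point], idx: int, k: int) -> List[int]:
    """Indices of the k nearest neighbours of points[idx] by Euclidean distance (brute force)."""
    others = [j for j in range(len(points)) if j != idx]
    others.sort(key=lambda j: math.dist(points[idx], points[j]))
    return others[:k]


def smote(minority: Sequence[Point], n_synthetic: int, k: int = 5, seed: int = 0) -> List[Point]:
    """Hypothetical helper: create n_synthetic points on segments between
    minority samples and one of their k nearest minority neighbours."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    k = min(k, len(minority) - 1)
    neighbours = [_k_nearest(minority, i, k) for i in range(len(minority))]
    synthetic: List[Point] = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))  # pick a random minority sample
        j = rng.choice(neighbours[i])  # pick one of its nearest minority neighbours
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(minority[i], minority[j])])
    return synthetic


def balance_class(features: Sequence[Point], labels: Sequence[str], minority_label: str,
                  k: int = 5, seed: int = 0) -> Tuple[List[Point], List[str]]:
    """Hypothetical helper: oversample one class up to the size of the largest class."""
    counts = Counter(labels)
    deficit = max(counts.values()) - counts[minority_label]
    minority = [x for x, y in zip(features, labels) if y == minority_label]
    new = smote(minority, deficit, k, seed) if deficit > 0 else []
    return list(features) + new, list(labels) + [minority_label] * len(new)


if __name__ == "__main__":
    # Toy, made-up data: 8 "slight" crashes and 3 "fatal" crashes, two features each.
    X = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [0.3, 0.3], [0.5, 0.4], [0.6, 0.6],
         [0.7, 0.5], [0.9, 0.9], [1.0, 1.0], [0.8, 1.0], [1.0, 0.8]]
    y = ["slight"] * 8 + ["fatal"] * 3
    X_bal, y_bal = balance_class(X, y, "fatal", k=5, seed=1)
    print(dict(Counter(y_bal)))  # {'slight': 8, 'fatal': 8}
```

Because every synthetic record is an interpolation between two real minority records, resampling of this kind should be applied to the training split only; otherwise synthetic neighbours of test records can inflate the reported accuracy. Library implementations such as imbalanced-learn's `SMOTE` provide further options not shown in this sketch.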