Table 2. Summary of the analyzed related works.

From: Real-time driver drowsiness detection using transformer architectures: a novel deep learning approach

| Ref | Year | Dataset | Classification model | Accuracy | Strengths and Weaknesses |
|---|---|---|---|---|---|
| 19 | 2019 | Self-prepared dataset | RF and non-linear SVM | RF: 88.37% to 91.18% | Strengths: Good performance with varying epoch lengths. Weaknesses: SVM accuracy is consistently lower. |
| 20 | 2019 | CelebA, YawDD | Multiple CNN-kernelized correlation filters method | 92% | Strengths: High accuracy in a variety of conditions, robust to environmental variations. Weaknesses: Limited to CNN models, lacks broader evaluation across other architectures. |
| 21 | 2020 | 300-W dataset | Mamdani fuzzy inference system | 95.5% | Strengths: Incorporates fuzzy logic for drowsiness detection, useful for real-time applications. Weaknesses: May struggle with fine-tuning or handling complex image data without enhancement. |
| 22 | 2020 | Self-prepared thermal image dataset | SVM and KNN | SVM: 90%, KNN: 83% | Strengths: Non-invasive method, useful for thermal monitoring in different lighting conditions. Weaknesses: Thermal imaging may require high-end equipment, and KNN struggles in complex environments. |
| 23 | 2020 | Self-prepared ZJU dataset | FD-NN, TL-VGG16, and TL-VGG19 | FD-NN: 98.15%, TL-VGG16: 95.45%, TL-VGG19: 95% | Strengths: Impressive performance for fatigue detection, able to capture fine-grain eye movement features. Weaknesses: Limited dataset, reliance on predefined classifiers. |
| 24 | 2020 | Self-prepared dataset (DROZY database) | Multilayer perceptron, RF, and SVM | SVM: 94.9% | Strengths: Good feature extraction using basic neural networks for fatigue detection. Weaknesses: Lack of deep learning-based approaches, might miss subtle facial cues. |
| 10 | 2021 | NTHU-DDD video dataset | Deep-CNN-based ensemble | 85% | Strengths: Ensemble learning improves detection, high accuracy in diverse environments. Weaknesses: Lower overall performance compared to transformer-based models. |
| 25 | 2022 | CEW, ZJU, MRL | Dual CNN Ensemble (DCNNE) | CEW: 97.56%, ZJU: 97.99%, MRL: 98.98% | Strengths: High performance across multiple datasets with ensemble methods. Weaknesses: May not generalize well to datasets outside of the tested range. |
| 26 | 2022 | UTA-RLDD dataset | RNN and CNN | 60% | Strengths: Low computational cost, suitable for mobile applications. Weaknesses: Low accuracy, particularly for complex real-time applications. |
| 27 | 2022 | Self-prepared dataset for traffic signs | CNN | 98.53% | Strengths: Strong accuracy in traffic-related scenarios, applicable to many monitoring systems. Weaknesses: Lacks focus on drowsiness detection, limited to specific contexts. |
| 13 | 2022 | NTHU-DDD | CNN + LSTM | 97.3% | Strengths: Efficient for sequential data analysis, effective in dynamic environments. Weaknesses: Struggles with long-duration analysis or sustained predictions in real-time. |
| 32 | 2022 | Public dataset using gas sensor, temperature sensor, and digital camera | t-SNE for feature extraction + Isolation Forest (iF) for anomaly detection | 95% | Strengths: Effective detection using only normal (non-drunk) data; handles nonlinear, high-dimensional data well; unsupervised approach suitable for real-time detection. Weaknesses: t-SNE is computationally intensive and not ideal for real-time processing; model performance may vary with different sensor quality or configurations. |
| 11 | 2023 | Drowsiness dataset | CNN and VGG16 | CNN: 97%, VGG16: 94% | Strengths: High performance in detecting drowsiness from real-time data. Weaknesses: Limited to VGG16 architecture, potential underperformance in new data types. |
| 28 | 2023 | MRL | VGG16, VGG19, and 4D | VGG16: 95.93%, VGG19: 95.03%, 4D: 97.53% | Strengths: Strong results across multiple configurations, robust for real-time driver drowsiness detection. Weaknesses: Dependence on specific VGG-based models may limit flexibility in dynamic environments. |
| 29 | 2023 | NTHU-DDD dataset | RF, SVM, and sequential NN | RF: 99%, SVM: 80%, 4D: 96% | Strengths: RF offers excellent performance, especially for simple fatigue detection scenarios. Weaknesses: SVM underperformed significantly, less robust across diverse environmental conditions. |
| 33 | 2024 | Public dataset using gas sensor, temperature sensor, and digital camera | ICA for feature extraction + Kantorovitch Distance (KD) + DEWMA for anomaly detection; XGBoost for SHAP analysis | F1-score = 98% | Strengths: Does not require labeled data (semi-supervised); high sensitivity using DEWMA with nonparametric threshold; SHAP adds explainability to the model; effective on non-Gaussian multivariate data. Weaknesses: Complexity due to integration of multiple techniques; DEWMA and KD may require careful tuning for different datasets; potentially computationally intensive for real-time systems. |
| 30 | 2024 | NTHU-DDD | VGG19 | 96.51% | Strengths: Efficient in various lighting and environmental conditions. Weaknesses: Performance variation across datasets, still limited by fixed network architectures. |
| 31 | 2024 | YawDD, MRL | VGG16 and CNN | VGG16: 95.85%, CNN: 96.54% | Strengths: High performance in real-time detection with various feature extraction methods. Weaknesses: Not ideal for low-complexity devices, might need more robust processing power. |
| 18 | 2024 | MRL | CNN, InceptionV3, and MobileNetV2 | CNN: 96%, MobileNetV2: 97%, InceptionV3: 98% | Strengths: Excellent performance with quick response times, particularly in driver monitoring. Weaknesses: InceptionV3 and MobileNetV2 still face computational trade-offs. |