Table 16 Performance comparison with recent SOTA models (NASA + Fire Videos datasets).

From: Real time fire and smoke detection using vision transformers and spatiotemporal learning

| Model | Accuracy (%) | Precision (%) | Recall (%) | F1-score (%) | AUC-ROC (%) | FPS (GPU) |
|---|---|---|---|---|---|---|
| Proposed hybrid model | 98.8 | 98.6 | 98.3 | 98.4 | 98.9 | 32 |
| YOLOv13 [28] | 97.8 | 97.5 | 97.9 | 97.7 | 98.2 | 28 |
| YOLO-NAS [40] | 98.1 | 98.0 | 97.8 | 97.9 | 98.5 | 29 |
| MobileViT [30] | 96.5 | 96.0 | 96.3 | 96.1 | 97.2 | 35 |
| EfficientViT [25] | 96.8 | 96.4 | 96.7 | 96.5 | 97.4 | 33 |
| FireViTNet [14] | 97.2 | 97.0 | 97.1 | 97.0 | 97.8 | 27 |
| Smoke detection transformer [7] | 97.5 | 97.2 | 97.3 | 97.2 | 98.0 | 26 |
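As an illustrative cross-check (not part of the paper's code), the F1-score column follows from the precision and recall columns via the standard harmonic-mean formula, F1 = 2PR / (P + R):

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (inputs and output in percent)."""
    return 2 * precision * recall / (precision + recall)

# Proposed hybrid model row: precision 98.6%, recall 98.3%
print(f"{f1_score(98.6, 98.3):.1f}")  # 98.4, matching the table

# YOLOv13 row: precision 97.5%, recall 97.9%
print(f"{f1_score(97.5, 97.9):.1f}")  # 97.7, matching the table
```

The same check reproduces the remaining F1 entries to within one-decimal rounding.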

  1. The proposed hybrid model corresponds to the full configuration integrating Vision Transformers (ViTs), 3D-CNNs, Transformer attention, and multi-task learning. The reported 98.8% accuracy is the average performance on the combined test set (NASA + Fire Videos datasets): individually, the model achieved 99.2% on the NASA dataset and 98.3% on the Fire Videos dataset, for an average of 98.75%, which rounds to the 98.8% shown in the table.
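The averaging in the note can be verified directly. The sketch below assumes an unweighted mean over the two datasets (the paper's combined test set may instead weight by dataset size) and uses `decimal` to avoid binary floating-point rounding surprises:

```python
from decimal import Decimal, ROUND_HALF_UP

# Per-dataset accuracies from the note (percent).
nasa = Decimal("99.2")         # NASA dataset
fire_videos = Decimal("98.3")  # Fire Videos dataset

# Unweighted mean across the two datasets.
average = (nasa + fire_videos) / 2
print(average)  # 98.75

# Rounded to one decimal place, this gives the 98.8% table entry.
print(average.quantize(Decimal("0.1"), rounding=ROUND_HALF_UP))  # 98.8
```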