Table 14 Comparison of CNN/LSTM and transformer-based HAR models with XTinyHAR.

From: A tiny inertial transformer for human activity recognition via multimodal knowledge distillation and explainable AI

| Model | Architecture type | Accuracy (%) | Modality | Model complexity |
|---|---|---|---|---|
| CNN/LSTM-based HAR models | | | | |
| CNN-LSTM [19] | Deep learning hybrid | 90.89 | IMU | High |
| Attention-LSTM [20] | Attention-based RNN | 94.30 | IMU | High |
| Self-supervised CNN [21] | Conv-based SSL | 96.50 | IMU | Medium |
| DeepConvLSTM [22] | CNN + LSTM | 93.70 | IMU | High |
| MC-HARNet [23] | Multiscale CNN | 95.20 | IMU | Medium |
| Transformer-based HAR models | | | | |
| Spectro-transformer [25] | Transformer (spectrogram) | 97.80 | IMU | High |
| Contrastive transformer [26] | Contrastive + Transformer | 98.60 | IMU | High |
| RFID-transformer [27] | RFID + Transformer | 99.10 | RFID | High |
| Skeleton-ViT [28] | ViT for skeleton data | 98.50 | Skeleton | High |
| Student-teacher HAR [29] | Transformer KD | 98.70 | Multi to IMU | Medium |
| DMFT [30] | Distilled multi-modal transformer | 93.97 | Multi-modal | Medium |
| XTinyHAR (ours) | Lightweight inertial transformer (KD) | 98.71 (UTD), 98.55 (MM-Fit) | IMU | Low (2.45 MB) |