Table 2 Inference speed, in frames per second (fps), of each included model on an NVIDIA GTX 2080 Ti. Each value is the average speed over all 4500 frames of the dataset.

From: Comparison of marker-less 2D image-based methods for infant pose estimation

| Model | ViTPose-huge | HRNet-w48 | MediaPipe | OpenPose | Retrained ViTPose | AggPose | AGMA-HRNet48 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Inference speed | 1 fps | 1.2 fps | 3 fps (CPU) | 11.5 fps | 1.1 fps | 2.7 fps | 3 fps |
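An average fps figure such as those in the table is the number of frames processed divided by the elapsed wall-clock time. The source does not describe its timing procedure, so the following is only a minimal sketch under that assumption. The names `average_fps`, `total_seconds`, and `process_frame` are hypothetical, and whether model loading or warm-up was included in the reported figures is not stated.

```python
import time


def average_fps(process_frame, frames, clock=time.perf_counter):
    """Return frames processed per second of wall-clock time.

    process_frame is a hypothetical callable that runs one model on one
    frame; clock is injectable so the function can be tested without
    real waiting.
    """
    start = clock()
    count = 0
    for frame in frames:
        process_frame(frame)
        count += 1
    elapsed = clock() - start
    if count == 0 or elapsed <= 0:
        raise ValueError("need at least one frame and a positive elapsed time")
    return count / elapsed


def total_seconds(n_frames, fps):
    """Time needed to process n_frames at a given average speed."""
    return n_frames / fps
```

Read the other way around, at 1 fps the 4500 frames take 4500 s (75 minutes), while at 11.5 fps they take about 391 s (roughly 6.5 minutes).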