Table 1. Benchmarking summary: performance of the 3D pose estimation neural networks and their 2D backbones on the SCAI-Gait and Healthy datasets

From: 3D pose estimation for scalable remote gait kinematics assessment

SCAI-Gait Dataset

| 3D pose estimator | 2D backbone | 1920 × 1080, 50 FPS | 720 × 576, 25 FPS | Overall |
| --- | --- | --- | --- | --- |
| RTMPose3D [14] | RTMPose | 111.0 ± 25.9 (34,561) | 132.9 ± 30.2 (9613) | 120.4 ± 29.9 (44,174) |
| BlazePose [15] | MediaPipe | 108.1 ± 16.1 (23,004) | 137.3 ± 30.2 (5883) | 120.3 ± 24.2 (28,887) |
| MotionBERT [12] | AlphaPose | 84.6 ± 23.4 (25,882) | 119.8 ± 32.6 (9409) | 97.9 ± 33.9 (35,291) |
| MotionAGFormer [17] | YOLOv3+HRNet | 82.8 ± 28.9 (29,696) | 118.0 ± 28.3 (8695) | 98.3 ± 32.3 (38,391) |
| VideoPose3D [16] | Detectron2 | **78.1 ± 32.9 (35,925)** | **107.694 ± 25.1 (9792)** | **90.7 ± 33.6 (45,717)** |

Healthy Dataset

| 3D pose estimator | 2D backbone | 1000 × 1000, 50 FPS | 640 × 480, 25 FPS | Overall |
| --- | --- | --- | --- | --- |
| VideoPose3D | Detectron2 | 11.5 ± 5.7 (38,881) | 91.0 ± 7.8 (8367) | 24.8 ± 30.2 (47,248) |

  1. The SCAI-Gait Dataset includes videos at 1920 × 1080 (50 FPS) and 720 × 576 (25 FPS), with a total of 46,717 frames, of which 45,717 frames contained valid 3D pose detections (values in parentheses indicate the number of frames with detected 3D pose keypoints). The Healthy Dataset comprises 47,248 frames, with 38,881 frames at 1000 × 1000 (50 FPS) from H3.6M and 8367 frames at 640 × 480 (25 FPS) from HumanEva-I. Performance is evaluated using the Procrustes Adjusted Mean Per Joint Position Error (PA-MPJPE). As expected, higher-resolution, higher-frame-rate videos yield lower errors due to reduced pixel distortion and better temporal fidelity, particularly improving localization of distal joints. The reported average PA-MPJPE values highlight the robustness of each estimator across varying resolutions and datasets.
  2. Bold values mark the best-performing model, VideoPose3D, which achieved the lowest PA-MPJPE of all models.
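
The PA-MPJPE metric named in the table notes can be illustrated with a minimal sketch. It is not the authors' evaluation code: the function name `pa_mpjpe`, the `(J, 3)` array layout, and the choice to exclude reflections during alignment are all assumptions made for illustration. The idea is to align the predicted skeleton to the ground truth with the best similarity transform (scale, rotation, translation) and then average the per-joint Euclidean distances.

```python
import numpy as np


def pa_mpjpe(pred, gt):
    """Procrustes-aligned mean per-joint position error for one frame.

    pred, gt: arrays of shape (J, 3) holding J joint positions (assumed layout).
    The result is in the same length unit as the inputs.
    """
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)

    # Remove translation by centering both skeletons on their centroids.
    mu_gt = gt.mean(axis=0)
    p = pred - pred.mean(axis=0)
    g = gt - mu_gt

    # Optimal rotation from the SVD of the 3x3 cross-covariance (Kabsch/Umeyama).
    u, s, vt = np.linalg.svd(p.T @ g)

    # Assumption: forbid reflections so the alignment is a proper rotation.
    d = 1.0 if np.linalg.det(vt.T @ u.T) >= 0 else -1.0
    signs = np.array([1.0, 1.0, d])
    rot = vt.T @ np.diag(signs) @ u.T

    # Optimal isotropic scale for the centered, rotated prediction.
    scale = (s * signs).sum() / (p ** 2).sum()

    # Apply the similarity transform and average the per-joint distances.
    aligned = scale * p @ rot.T + mu_gt
    return np.linalg.norm(aligned - gt, axis=1).mean()
```

Under these assumptions, a prediction that differs from the ground truth only by a rotation, a uniform scale, and a translation scores approximately zero, so the metric isolates errors in pose shape from errors in global placement. A dataset-level figure such as those in the table would then be an aggregate of per-frame values over the frames with detected keypoints; the exact aggregation used in the source is not specified here.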