Fig. 6: Description of our video-based learning framework.
From: Learning aggressive animal locomotion skills for quadrupedal robots solely from monocular videos

We divide the system into two stages. The 3D motion estimation module takes monocular videos of animals as input and reconstructs a spatio-temporal 3D skeleton motion graph. The motion imitation module retargets the animal motion data to the robot's joint space and trains the robot to master these dynamic movements, enabling it to execute them proficiently in its environment. a Perception pipeline for animal 2D pose estimation and joint tracking. b 3D pose estimation module and structure of STGNet. c Control and deployment pipeline for imitating multiple video-extracted motions, such as backflips and bipedal movements.