Fig. 2: Video preprocessing and dataset construction workflow.
From: Video-based cattle behaviour detection for digital twin development in precision dairy systems

Continuous barn surveillance footage was processed through a multi-stage pipeline involving (1) raw video input from multiple cameras, (2) frame-wise cow detection using YOLOv11, (3) identity-preserving multi-object tracking via ByteTrack, and (4) per-cow bounding box cropping to generate standardized 10-s clips at 224×224 px resolution. This pipeline enabled the creation of a balanced, behaviour-focused dataset suitable for spatiotemporal model training and digital-twin integration.
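
Stage (4) of the workflow can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the detector and tracker have already produced records of the form (frame index, track ID, bounding box), and it assumes a frame rate of 25 fps, which the caption does not state. The function names `group_by_track`, `split_into_clips`, and `square_crop` are hypothetical. The sketch groups records per cow, cuts each track into non-overlapping 10-s windows, and computes a square crop region that could then be resized to 224×224 px.

```python
from collections import defaultdict

CLIP_SECONDS = 10   # clip length given in the caption
CROP_SIZE = 224     # target side length in px given in the caption
FPS = 25            # assumption: the caption does not state the frame rate


def group_by_track(detections):
    """Group (frame_idx, track_id, box) records by track ID, sorted by frame."""
    tracks = defaultdict(list)
    for frame_idx, track_id, box in detections:
        tracks[track_id].append((frame_idx, box))
    for records in tracks.values():
        records.sort(key=lambda r: r[0])
    return dict(tracks)


def split_into_clips(records, fps=FPS, seconds=CLIP_SECONDS):
    """Cut one track into consecutive, non-overlapping windows.

    Simplification: windows are counted in records, so gaps in the
    track are ignored. The incomplete tail window is dropped.
    """
    n = fps * seconds
    return [records[i:i + n] for i in range(0, len(records) - n + 1, n)]


def square_crop(box, frame_w, frame_h):
    """Expand (x1, y1, x2, y2) to a square around its centre.

    The square is shifted, not shrunk, to stay inside the frame, so a
    later resize to CROP_SIZE x CROP_SIZE keeps the aspect ratio.
    """
    x1, y1, x2, y2 = box
    side = min(max(x2 - x1, y2 - y1), frame_w, frame_h)
    cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
    left = min(max(cx - side / 2, 0), frame_w - side)
    top = min(max(cy - side / 2, 0), frame_h - side)
    return (int(left), int(top), int(left + side), int(top + side))


if __name__ == "__main__":
    # Synthetic example: one cow tracked for 520 frames with a fixed box.
    dets = [(i, 1, (100, 200, 300, 260)) for i in range(520)]
    tracks = group_by_track(dets)
    clips = split_into_clips(tracks[1])
    print(len(clips), "clips of", len(clips[0]), "frames each")
    print(square_crop((100, 200, 300, 260), 1920, 1080))
```

In a real pipeline the crop region would be applied to each frame and the result resized to 224×224 px with an image library. Whether the original work pads, stretches, or squares the boxes before resizing is not stated in the caption.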