Fig. 2

Mask R-CNN-based visual perception pipeline. The RGB image is passed through a ResNet-50 + FPN backbone. The Region Proposal Network (RPN) generates candidate regions; RoIAlign extracts a fixed-size feature map for each region, which is then processed by the classification and segmentation heads. The output is a per-pixel binary mask of the navigable trail, used downstream for decision-making and control.
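
The final stage of the pipeline, turning per-instance predictions into a single binary trail mask, can be illustrated with a minimal sketch. The caption does not specify how instances are merged, so the following is an assumption rather than the authors' method: each detection is taken to carry a soft mask (per-pixel probabilities), a confidence score, and a class label, and the confident trail instances are combined by union. The function name `merge_trail_masks`, the class id `trail_label`, and both thresholds are hypothetical.

```python
from typing import List, Sequence

Mask = List[List[float]]  # H x W grid of per-pixel probabilities


def merge_trail_masks(
    masks: Sequence[Mask],
    scores: Sequence[float],
    labels: Sequence[int],
    trail_label: int = 1,       # hypothetical class id for "trail"
    score_thresh: float = 0.5,  # assumed detection-confidence cutoff
    mask_thresh: float = 0.5,   # assumed per-pixel probability cutoff
) -> List[List[int]]:
    """Union all confident trail instances into one binary mask."""
    if not masks:
        return []  # no detections: nothing to merge
    height, width = len(masks[0]), len(masks[0][0])
    out = [[0] * width for _ in range(height)]
    for mask, score, label in zip(masks, scores, labels):
        # Skip detections of other classes or with low confidence.
        if label != trail_label or score < score_thresh:
            continue
        for y in range(height):
            for x in range(width):
                if mask[y][x] >= mask_thresh:
                    out[y][x] = 1
    return out


# Toy 2 x 3 example with two detections of the trail class.
masks = [
    [[0.9, 0.2, 0.1], [0.8, 0.7, 0.1]],  # confident instance, kept
    [[0.1, 0.1, 0.9], [0.1, 0.1, 0.9]],  # low-score instance, ignored
]
binary = merge_trail_masks(masks, scores=[0.95, 0.30], labels=[1, 1])
print(binary)  # [[1, 0, 0], [1, 1, 0]]
```

Taking the union keeps a pixel navigable if any confident instance covers it; a system that expects a single trail region might instead keep only the highest-scoring instance.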