Fig. 3
From: ScaleFormer architecture for scale invariant human pose estimation with enhanced mixed features

ScaleFormer Algorithm Flowchart. The framework uses a dual-branch parallel architecture in which a Swin Transformer branch extracts hierarchical global features while a ConvNeXt Enhancement branch refines fine-grained local features. The multi-scale features are then fused to produce robust pose estimates.
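The caption does not specify how the multi-scale features are fused, so the following is only a toy sketch of the general idea under stated assumptions. Single-channel 2D lists stand in for the Swin Transformer pyramid (`global_maps`) and the ConvNeXt Enhancement output (`local_map`). The functions `upsample_nearest`, `fuse` and `argmax_2d`, the nearest-neighbour upsampling, the averaging of scales and the blend weight `w_global` are all hypothetical choices for illustration, not ScaleFormer's actual method.

```python
from typing import List, Tuple

Map = List[List[float]]  # a single-channel 2D feature/heatmap


def upsample_nearest(m: Map, factor: int) -> Map:
    """Nearest-neighbour upsampling of a 2D map by an integer factor."""
    return [[m[i // factor][j // factor] for j in range(len(m[0]) * factor)]
            for i in range(len(m) * factor)]


def fuse(global_maps: List[Map], local_map: Map, w_global: float = 0.5) -> Map:
    """Hypothetical fusion: bring every global-branch scale up to the local
    map's resolution, average the scales, then blend with the local map."""
    h, w = len(local_map), len(local_map[0])
    acc = [[0.0] * w for _ in range(h)]
    for g in global_maps:
        factor = h // len(g)
        # Assumption: each coarse map divides the fine resolution evenly and
        # uses the same integer factor along both axes.
        assert factor * len(g) == h and factor * len(g[0]) == w
        up = upsample_nearest(g, factor)
        for i in range(h):
            for j in range(w):
                acc[i][j] += up[i][j] / len(global_maps)
    return [[w_global * acc[i][j] + (1 - w_global) * local_map[i][j]
             for j in range(w)] for i in range(h)]


def argmax_2d(m: Map) -> Tuple[int, int]:
    """Return (row, col) of the heatmap peak, i.e. the predicted keypoint."""
    _, i, j = max((v, i, j) for i, row in enumerate(m) for j, v in enumerate(row))
    return i, j


# Usage: two coarse "global" scales (1x1 and 2x2) plus one fine 4x4 "local" map.
global_maps = [[[1.0]], [[0.0, 0.0], [0.0, 1.0]]]
local_map = [[0.0] * 4 for _ in range(4)]
local_map[3][2] = 1.0
fused = fuse(global_maps, local_map)
print(argmax_2d(fused))  # (3, 2)
```

In this toy case the coarse global maps narrow the peak down to the bottom-right quadrant, and the fine local map picks out the exact cell within it. That mirrors, very loosely, the division of labour described in the caption.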