Fig. 2
From: Dual attention for multi object tracking with intra sample context and cross sample interaction

The overall architecture of the proposed network is as follows. A lightweight backbone extracts the fundamental visual features \(B1, B2, B3, B4\). The sample-perception features \(P2, P3, P4, E2, E3, E4\) are then obtained through a dual attention mechanism (DAM), which comprises an intra-sample local attention mechanism (SLAM) and an inter-sample global attention mechanism (SGAM). SLAM extracts distinctive context information within each sample, while SGAM captures instance-level semantics shared across samples. Finally, the tracking association component predicts and assigns object trajectories, ensuring accurate and reliable tracking.
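The following minimal NumPy sketch illustrates the difference between attending within a sample and attending across samples. It does not reproduce the authors' SLAM or SGAM, because the caption does not describe their internals. Several details are assumptions: the identity query/key/value projections, the mean-pooled per-sample descriptor, the residual additions, and the SLAM-then-SGAM ordering. The function names are hypothetical. Features are assumed to be flattened to shape (B, N, C), meaning B samples, N spatial positions, and C channels.

```python
import numpy as np


def softmax(z, axis=-1):
    # Numerically stable softmax along the given axis.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def intra_sample_attention(x):
    """Hypothetical SLAM stand-in: self-attention among the N positions of each sample.

    x: (B, N, C). Identity query/key/value projections are assumed for brevity.
    """
    scale = 1.0 / np.sqrt(x.shape[-1])
    attn = softmax(np.einsum("bnc,bmc->bnm", x, x) * scale, axis=-1)  # (B, N, N)
    return x + np.einsum("bnm,bmc->bnc", attn, x)  # residual connection


def cross_sample_attention(x):
    """Hypothetical SGAM stand-in: attention across the B samples in a batch.

    Each sample is summarized by mean pooling over positions. The samples then
    attend to each other, and the mixed descriptor is added back to every position.
    """
    desc = x.mean(axis=1)  # (B, C) per-sample descriptor
    scale = 1.0 / np.sqrt(x.shape[-1])
    attn = softmax(desc @ desc.T * scale, axis=-1)  # (B, B)
    shared = attn @ desc  # (B, C) semantics pooled across samples
    return x + shared[:, None, :]  # broadcast over the N positions


def dual_attention(x):
    # Assumed ordering: local context first, then cross-sample interaction.
    return cross_sample_attention(intra_sample_attention(x))


rng = np.random.default_rng(0)
feats = rng.standard_normal((4, 16, 32))  # 4 samples, 4x4 map flattened, 32 channels
out = dual_attention(feats)
print(out.shape)  # (4, 16, 32)
```

The two functions differ structurally in only one respect, which is the axis that attention runs over. The first builds an N×N map over the positions inside each sample. The second builds a B×B map over the samples in the batch.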