Fig. 2
From: End to end polysemantic cooperative mixed task trainer for UAV target detection

Architecture of the traditional DETR. From left to right, the input image passes through the Backbone module and then through the traditional Transformer Encoder and Decoder; the decoder's predictions are assigned to ground-truth labels under a one-to-one label matching rule.
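The one-to-one label matching rule named in the caption can be illustrated with a minimal sketch. It assumes a precomputed cost matrix between predictions and ground-truth objects; the function name `one_to_one_match` and the cost values are hypothetical and do not come from the paper. DETR is usually described as solving this assignment with the Hungarian algorithm; for brevity the sketch swaps in a brute-force search over permutations, which returns the same optimal assignment on small inputs but does not scale.

```python
from itertools import permutations


def one_to_one_match(cost):
    """Return (assignment, total_cost) with the lowest summed cost.

    cost[i][j] is a hypothetical cost of pairing prediction i with
    ground-truth object j. The number of predictions must be at least
    the number of targets. assignment[j] is the index of the prediction
    matched to target j; no prediction is used twice.
    """
    num_preds = len(cost)
    num_targets = len(cost[0]) if cost else 0
    best_assignment, best_total = None, float("inf")
    # Brute force (not the Hungarian algorithm): try every way of
    # choosing distinct predictions for the targets.
    for perm in permutations(range(num_preds), num_targets):
        total = sum(cost[pred][tgt] for tgt, pred in enumerate(perm))
        if total < best_total:
            best_assignment, best_total = perm, total
    return best_assignment, best_total


# Illustrative values only: 3 predictions, 2 ground-truth objects.
cost = [
    [0.9, 0.2],
    [0.1, 0.8],
    [0.5, 0.6],
]
assignment, total = one_to_one_match(cost)
# Predictions left without a target; DETR-style training would treat
# these as "no object".
unmatched = [p for p in range(len(cost)) if p not in assignment]
```

Here target 0 is paired with prediction 1 and target 1 with prediction 0, leaving prediction 2 unmatched. Each ground-truth object receives exactly one prediction, which is the property the caption refers to.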