Fig. 5
From: Automatic damage detection and localization of ancient city walls—a case study of the Great Wall

The white dashed boxes contain the network input and output. The gray dashed boxes enclose the main parts of the network architecture (backbone, neck, and head). The rounded rectangles represent individual network layers. The stacked gray rectangles indicate feature maps; the values beneath them give their dimensions (width × height × channels). The red cubes denote prediction heads; the values beneath them give the feature map dimensions (width × height) × the prediction data dimensions.
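
To make the dimension notation concrete, the following is a minimal sketch of how labels of the form (width × height) × prediction data dimensions could be derived for a multi-scale detector. The input size, the strides, and the prediction layout used here are illustrative assumptions, not values taken from the figure or the paper, and `head_shapes` is a hypothetical helper.

```python
from typing import List, Tuple


def head_shapes(
    input_w: int, input_h: int, strides: Tuple[int, ...], pred_dim: int
) -> List[Tuple[int, int, int]]:
    """Return (width, height, pred_dim) for each prediction head.

    Hypothetical helper: assumes each head operates on the input
    downsampled by an integer stride.
    """
    shapes = []
    for stride in strides:
        # Spatial size of the feature map seen by this head.
        shapes.append((input_w // stride, input_h // stride, pred_dim))
    return shapes


# Assumed values for illustration only: a 640 x 640 input, three heads at
# strides 8, 16 and 32, and 3 anchors x (4 box + 1 objectness + 1 class).
shapes = head_shapes(640, 640, (8, 16, 32), 3 * (4 + 1 + 1))
for w, h, d in shapes:
    print(f"({w} x {h}) x {d}")
```

Under these assumptions, each head's label pairs the spatial grid of its feature map with the length of the prediction vector stored at every grid cell, which is the reading the caption gives for the values beneath the red cubes.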