Fig. 3
From: A multi-head YOLOv12 with self-supervised pretraining for urinary sediment particle detection

Overview of the proposed two-stage deep learning method. (A) An encoder–decoder network is pretrained via self-supervised reconstruction on unlabeled urinary sediment images to learn feature representations. (B) The pretrained encoder is then used as the detection backbone and fine-tuned with six parallel heads for multi-class detection of urinary particles.
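
The two-stage flow in the caption can be illustrated with a toy, dependency-free sketch. It is not the authors' YOLOv12 architecture: a linear autoencoder stands in for the encoder–decoder of (A), six linear maps stand in for the detection heads of (B), and only the forward pass of stage B is shown rather than the fine-tuning itself. All names and sizes (`pretrain_step`, `detect`, `N_PIXELS`, `N_HEAD_OUT`, the learning rate) are hypothetical, and dropping the decoder after pretraining is an assumption, not something the caption states.

```python
import random

def init_linear(n_in, n_out, rng):
    """Weight matrix (list of rows) with small random entries."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]

def matvec(weights, x):
    """Apply a linear map to a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]

def reconstruction_loss(encoder, decoder, x):
    """Mean squared error between the input and its reconstruction."""
    x_hat = matvec(decoder, matvec(encoder, x))
    return sum((a - b) ** 2 for a, b in zip(x_hat, x)) / len(x)

def pretrain_step(encoder, decoder, x, lr=0.05):
    """One gradient-descent step on the reconstruction loss (stage A)."""
    z = matvec(encoder, x)                      # encoder features
    x_hat = matvec(decoder, z)                  # reconstruction
    r = [2 * (a - b) / len(x) for a, b in zip(x_hat, x)]  # dL/dx_hat
    # dL/dz uses the decoder weights before they are updated.
    dz = [sum(decoder[i][j] * r[i] for i in range(len(r))) for j in range(len(z))]
    for i in range(len(decoder)):
        for j in range(len(z)):
            decoder[i][j] -= lr * r[i] * z[j]
    for j in range(len(encoder)):
        for k in range(len(x)):
            encoder[j][k] -= lr * dz[j] * x[k]

def detect(encoder, heads, x):
    """Stage B forward pass: shared encoder features feed every head."""
    features = matvec(encoder, x)
    return [matvec(head, features) for head in heads]

rng = random.Random(0)
N_PIXELS, N_FEATURES, N_HEADS, N_HEAD_OUT = 8, 4, 6, 3  # hypothetical sizes

# Stage A: self-supervised pretraining on unlabeled "images" (random vectors here).
encoder = init_linear(N_PIXELS, N_FEATURES, rng)
decoder = init_linear(N_FEATURES, N_PIXELS, rng)
unlabeled = [[rng.random() for _ in range(N_PIXELS)] for _ in range(16)]
for _ in range(20):
    for image in unlabeled:
        pretrain_step(encoder, decoder, image)

# Stage B: the pretrained encoder is reused as the backbone for six parallel
# heads (assumption: the decoder is not needed at this stage).
heads = [init_linear(N_FEATURES, N_HEAD_OUT, rng) for _ in range(N_HEADS)]
outputs = detect(encoder, heads, unlabeled[0])
```

In a real detector each head would emit box and class predictions, and fine-tuning would update the encoder and the heads jointly on labeled images; here `outputs` is simply a list of six small vectors, one per head.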